external/arm-optimized-routines.git

Age	Commit message	Author
2022-02-10	Update license to MIT OR Apache-2.0 WITH LLVM-exception	Szabolcs Nagy
	The outgoing license was MIT only. The new dual license allows using the code under the Apache-2.0 WITH LLVM-exception license too.
2021-02-17	Update copyright years	Szabolcs Nagy
	Scripted copyright year updates based on git committer date.
2020-01-14	math: add vector pow	Szabolcs Nagy
	This implementation is a wrapper around the scalar pow with the appropriate call ABI. As such it is not expected to be faster than scalar calls; the new double-precision vector pow symbols are provided for completeness.
2019-11-05	Add vector exp2f	Szabolcs Nagy
	Same design as in expf. Worst-case error of __v_exp2f and __v_exp2f_1u is 1.96 and 0.88 ulp respectively. It is not clear whether round/convert instructions or the +- Shift approach is better. For expf the latter and for exp2f the former seems more consistently faster, but both options are kept in the code for now.
2019-10-14	Add vector log	Szabolcs Nagy
	Worst-case error is 1.67 ulp. The polynomial was generated by sollya. Uses a 128 entry (2KB) lookup table. Special cases fall back to a scalar log call.
2019-10-14	Add vector sin and cos	Szabolcs Nagy
	Worst-case error is 3.5 ulp. The polynomial was generated by sollya. For large (>2^23) and special inputs the code falls back to scalar sin and cos.
2019-10-14	Add vector powf	Szabolcs Nagy
	Essentially the scalar powf algorithm is used for each element in the vector, just inlined for better scheduling and simpler special case handling. The log polynomial is smaller, as less accuracy is enough. Worst-case error is 2.6 ulp.
2019-10-14	Add vector sinf and cosf	Szabolcs Nagy
	The polynomials were produced by searching the coefficient space using heuristics and ideas from https://arxiv.org/abs/1508.03211. The worst-case error is 1.886 ulp; large inputs (> 2^20) and other special cases use scalar sinf and cosf.
2019-10-14	Add vector logf	Szabolcs Nagy
	The polynomial was produced by searching the coefficient space using heuristics and ideas from https://arxiv.org/abs/1508.03211. The worst-case error is 3.34 ulp; subnormal range inputs and other special cases use scalar logf.
2019-10-14	Add vector exp, expf and related vector math support code	Szabolcs Nagy
	Vector math routines are added to the same libmathlib library as the scalar ones. The difficulty is that they are not always available: the external ABI depends on the compiler version used for the build.
	Currently only aarch64 AdvSIMD is supported. There are 4 new sets of symbols for a scalar math function foo:
	- __s_foo is a scalar function with identical result to the vector one,
	- __v_foo is a vector function using the base PCS,
	- __vn_foo uses the vector PCS,
	- _ZGV*_foo is the vector ABI symbol alias of __vn_foo.
	The test and benchmark code got extended to handle vector functions.
	Vector functions aim for < 5 ulp worst-case error, only support the nearest rounding mode and don't support floating-point exceptions. Vector functions may call scalar functions to handle special cases, but for a single value they should return the same result independently of the values in other vector lanes or the position of the value in the vector.
	The __v_expf and __v_expf_1u polynomials were produced by searching the coefficient space with some heuristics and ideas from https://arxiv.org/abs/1508.03211. Their worst-case error is 1.95 and 0.866 ulp respectively. The exp polynomial was produced by sollya; it uses a 128 element (1KB) lookup table and has 2.38 ulp worst-case error.
2019-08-27	math: fix duplicated declaration in mathlib.h	Szabolcs Nagy
	Removed tanf declaration since the implementation got removed too.
2018-11-22	Relicence the project under the MIT License	Szabolcs Nagy

2018-06-12	Add pow to mathlib.h	Szabolcs Nagy
	Update mathlib.h to use GNU style declarations and add missing pow.
2018-06-06	Add new log2 implementation	Szabolcs Nagy
	A similar algorithm is used as in log, but there are more operations (and more error) due to the 1/ln2 multiplier. There is a separate code path when the fma instruction is not available for computing x/c - 1 precisely, for which the table size is doubled, and to compute (x/c - 1)/ln2 precisely. The worst-case error is 0.547 ULP (0.55 without fma), the read-only global data size is 1168 bytes (2192 without fma). The non-nearest rounding error is less than 1 ULP.
	Improvements on Cortex-A72 compared to current glibc master:
	log latency: 2.04x
	log throughput: 1.87x
2018-06-05	Add new double precision functions to mathlib.h	Szabolcs Nagy

2018-05-16	Improve performance of sinf/cosf/sincosf	Wilco Dijkstra
	This patch is a complete rewrite of sinf, cosf and sincosf. The new version is significantly faster, as well as simple and accurate. The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all 4 billion inputs. In non-nearest rounding modes the error is 1 ULP.
	The algorithm uses 3 main cases: small inputs which don't need argument reduction, small inputs which need a simple range reduction, and large inputs requiring complex range reduction. The code uses approximate integer comparisons to quickly decide between these cases. On some targets this may be slow, so this can be configured to use floating-point comparisons.
	The small range reducer uses a single reduction step to handle values up to 120.0. It is fastest on targets which support inlined round instructions.
	The large range reducer uses integer arithmetic for simplicity. It does a 32x96 bit multiply to compute a 64-bit modulo result. This is more than accurate enough to handle the worst-case cancellation for values close to an integer multiple of PI/4. It could be further optimized, however it is already much faster than necessary.
2018-05-16	Remove the ARM__ symbol prefix	Szabolcs Nagy
	Use standard math symbols so it's easy to override libm functions. The arm_math.h header is no longer necessary: user code can just use math.h. A header is kept for freestanding code.
2018-05-16	Reformat the license headers	Szabolcs Nagy
	Use standard name for the LICENSE file. Use consistent license text across files:
	- "ARM" is changed to Arm,
	- "All Rights Reserved" is dropped (not needed),
	- "This file is part of.." is dropped,
	- Text is formatted as is recommended by the LICENSE file.
2017-08-11	Add new expf, exp2f, logf, log2f and powf implementations	Szabolcs Nagy

2015-11-19	Initial release of Optimized Routines	George Lander