external/arm-optimized-routines.git

Age	Commit message	Author
2018-11-22	Relicence the project under the MIT License	Szabolcs Nagy

2018-09-05	Document the log table generation method	Szabolcs Nagy
	Add comments with enough detail so the log lookup tables can be recreated.
2018-06-29	Fix GNU style issues	Szabolcs Nagy
	Whitespace changes only.
2018-06-22	Improve pow implementation	Szabolcs Nagy
	The log part of pow was rewritten to use a slightly different algorithm. This improves precision and throughput while keeping the same table size. Cases near 1 are no longer special-cased, so there is a slight performance regression there. When the fma instruction is not available, this algorithm is expected to have slightly worse performance. Worst-case error improved from 0.67 ULP to 0.57 ULP. On Cortex-A72 I see: throughput near 1: 7% worse; latency near 1: 2% worse; throughput general: 8% better; latency general: 2% better.
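The ULP figures quoted above measure how far a computed result lies from the exact value, in units of the spacing between adjacent doubles. The following is a minimal sketch of one common way to compute that distance, not the project's own test harness; the helper name `ulp_error` is hypothetical, and it assumes the ULP is taken at the correctly rounded value (definitions differ slightly near powers of two).

```python
import math
from fractions import Fraction


def ulp_error(computed: float, exact: Fraction) -> float:
    """Distance between `computed` and `exact`, in ULPs of the rounded exact value.

    Hypothetical helper for illustration; requires Python 3.9+ for math.ulp.
    """
    nearest = float(exact)                # correctly rounded double of the exact value
    spacing = Fraction(math.ulp(nearest)) # gap between adjacent doubles at that value
    # Fraction(computed) is exact, so the difference below has no rounding error.
    return float(abs(Fraction(computed) - exact) / spacing)


if __name__ == "__main__":
    # A correctly rounded result is never more than 0.5 ULP from the exact value.
    print(ulp_error(1 / 3, Fraction(1, 3)))
```

A worst-case bound such as 0.57 ULP means that no tested input produced a larger value of this measure.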
2018-06-11	Add new pow implementation	Szabolcs Nagy
	The algorithm is exp(y * log(x)), where log(x) is computed with about 1.8*2^-66 relative error, returning the result in two doubles, and the exp part uses the same algorithm (and lookup tables) as exp, but takes the input as two doubles and a sign (to handle negative bases with odd integer exponent). There is a separate code path when fma is not available, but the worst-case error is about 0.67 ULP in both cases. The lookup table and constants for log are 4224 bytes, the code is 1196 bytes. The error in non-nearest rounding modes is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: latency: 1.8x; throughput: 2.5x.
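The overall shape described above, exp(y * log(x)) with a separate sign for negative bases, can be sketched as follows. This is only an illustration under simplifying assumptions and not the committed C code: the name `pow_sketch` is hypothetical, log(x) is kept in a single double rather than two, no lookup tables are used, and only finite nonzero x and finite y are handled, so the accuracy is well short of the bounds quoted in the commit.

```python
import math


def pow_sketch(x: float, y: float) -> float:
    """Naive pow(x, y) as sign * exp(y * log|x|).

    Hypothetical illustration: assumes finite nonzero x and finite y.
    """
    sign = 1.0
    if x < 0.0:
        if not y.is_integer():
            # A negative base with a non-integer exponent has no real result.
            raise ValueError("negative base requires an integer exponent")
        if math.fmod(y, 2.0) != 0.0:
            # Odd integer exponent: the result is negative.
            sign = -1.0
        x = -x
    # Assumption: log(x) is held in one double here. The commit instead returns
    # it as two doubles so the rounding error of y * log(x) stays small.
    return sign * math.exp(y * math.log(x))


if __name__ == "__main__":
    print(pow_sketch(2.0, 10.0))   # close to 1024
    print(pow_sketch(-2.0, 3.0))   # close to -8
```

Carrying log(x) as two doubles matters because any error in y * log(x) is magnified by exp in proportion to the size of its argument; a single-double product like the one in this sketch loses accuracy for large |y * log(x)|.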