Age | Commit message | Author | |
---|---|---|---|
2018-11-22 | Relicence the project under the MIT License | Szabolcs Nagy | |
2018-09-05 | Document the log table generation method | Szabolcs Nagy | |
Add comments with enough detail so the log lookup tables can be recreated. | |||
2018-06-29 | Fix GNU style issues | Szabolcs Nagy | |
Whitespace changes only. | |||
2018-06-22 | Improve pow implementation | Szabolcs Nagy | |
The log part of pow was rewritten to use a slightly different algorithm. This improves precision and throughput while keeping the same table size. Inputs near 1 are no longer special-cased, so there is a slight performance regression in that case. When the fma instruction is not available, this algorithm is expected to perform slightly worse. Worst-case error improved from 0.67 ULP to 0.57 ULP. On Cortex-A72 I see: throughput near 1: 7% worse; latency near 1: 2% worse; throughput general: 8% better; latency general: 2% better. | |||
2018-06-11 | Add new pow implementation | Szabolcs Nagy | |
The algorithm is exp(y * log(x)), where log(x) is computed with about 1.8*2^-66 relative error and returned as two doubles. The exp part uses the same algorithm (and lookup tables) as exp, but takes its input as two doubles plus a sign (to handle negative bases with odd integer exponents). There is a separate code path for when fma is not available, but the worst-case error is about 0.67 ULP in both cases. The lookup table and constants for log take 4224 bytes; the code is 1196 bytes. The error in non-nearest rounding modes is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: latency: 1.8x; throughput: 2.5x. |