path: root/math/include
Age | Commit message | Author
2022-02-10 | Update lincense to MIT OR Apache-2.0 WITH LLVM-exception | Szabolcs Nagy
The outgoing license was MIT only. The new dual license allows using the code under Apache-2.0 WITH LLVM-exception license too.
2021-02-17 | Update copyright years | Szabolcs Nagy
Scripted copyright year updates based on git committer date.
2020-01-14 | math: add vector pow | Szabolcs Nagy
This implementation is a wrapper around the scalar pow with the appropriate call ABI. As such, it is not expected to be faster than scalar calls; the new double-precision vector pow symbols are provided for completeness.
2019-11-05 | Add vector exp2f | Szabolcs Nagy
Same design as in expf. Worst-case error of __v_exp2f and __v_exp2f_1u is 1.96 and 0.88 ulp respectively. It is not clear whether round/convert instructions or the +- Shift trick is better. For expf the latter and for exp2f the former seems more consistently faster, but both options are kept in the code for now.
2019-10-14 | Add vector log | Szabolcs Nagy
Worst-case error is 1.67 ulp, the polynomial was generated by sollya. Uses a 128 entry (2KB) lookup table. Special cases fall back to scalar log call.
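For readers unfamiliar with the ulp figures quoted in this log, the following is a minimal sketch of how the error of a single-precision result can be expressed in ulps against a higher-precision reference. The helper `ulp_error_f32` is hypothetical and is not the project's test code; it assumes the reference is in the normal float range.

```c
#include <math.h>

/* Error of a float result `got` against a double reference `want`,
   measured in units of the last place of a float near `want`.
   Assumes `want` lies in the normal range of float. */
static double ulp_error_f32(float got, double want)
{
    int e;
    if (want == 0.0 || !isfinite(want))
        return got == want ? 0.0 : INFINITY;
    frexp(want, &e);                 /* |want| = m * 2^e, 0.5 <= m < 1 */
    double ulp = ldexp(1.0, e - 24); /* float has 24 significand bits */
    return fabs((double)got - want) / ulp;
}
```

A worst-case figure is then the maximum of this quantity over the tested inputs.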
2019-10-14 | Add vector sin and cos | Szabolcs Nagy
Worst-case error is 3.5 ulp, the polynomial was generated by sollya. For large (>2^23) and special inputs the code falls back to scalar sin and cos.
2019-10-14 | Add vector powf | Szabolcs Nagy
Essentially the scalar powf algorithm is used for each element in the vector, just inlined for better scheduling and simpler special case handling. The log polynomial is smaller, as less accuracy is enough. Worst-case error is 2.6 ulp.
2019-10-14 | Add vector sinf and cosf | Szabolcs Nagy
The polynomials were produced by searching the coefficient space using heuristics and ideas from https://arxiv.org/abs/1508.03211
The worst-case error is 1.886 ulp; large inputs (> 2^20) and other special cases use scalar sinf and cosf.
2019-10-14 | Add vector logf | Szabolcs Nagy
The polynomial was produced by searching the coefficient space using heuristics and ideas from https://arxiv.org/abs/1508.03211
The worst-case error is 3.34 ulp; subnormal range inputs and other special cases use scalar logf.
2019-10-14 | Add vector exp, expf and related vector math support code | Szabolcs Nagy
Vector math routines are added to the same libmathlib library as scalar ones. The difficulty is that they are not always available: the external ABI depends on the compiler version used for the build.

Currently only aarch64 AdvSIMD is supported. There are 4 new sets of symbols for a scalar math function foo:
- __s_foo is a scalar function with identical result to the vector one,
- __v_foo is a vector function using the base PCS,
- __vn_foo uses the vector PCS,
- _ZGV*_foo is the vector ABI symbol alias of __vn_foo.

The test and benchmark code got extended to handle vector functions.

Vector functions aim for < 5 ulp worst case error, only support nearest rounding mode and don't support floating-point exceptions. Vector functions may call scalar functions to handle special cases, but for a single value they should return the same result independently of values in other vector lanes or the position of the value in the vector.

The __v_expf and __v_expf_1u polynomials were produced by searching the coefficient space with some heuristics and ideas from https://arxiv.org/abs/1508.03211
Their worst case error is 1.95 and 0.866 ulp respectively. The exp polynomial was produced by sollya; it uses a 128 element (1KB) lookup table and has 2.38 ulp worst case error.
2019-08-27 | math: fix duplicated declaration in mathlib.h | Szabolcs Nagy
Removed tanf declaration since the implementation got removed too.
2018-11-22 | Relicence the project under the MIT License | Szabolcs Nagy
2018-06-12 | Add pow to mathlib.h | Szabolcs Nagy
Update mathlib.h to use GNU style declarations and add missing pow.
2018-06-06 | Add new log2 implementation | Szabolcs Nagy
A similar algorithm is used as in log, but there are more operations (and more error) due to the 1/ln2 multiplier. There is a separate code path, used when the fma instruction is not available, for computing x/c - 1 precisely (for which the table size is doubled) and for computing (x/c - 1)/ln2 precisely.

The worst case error is 0.547 ULP (0.55 without fma), the read only global data size is 1168 bytes (2192 without fma). The non-nearest rounding error is less than 1 ULP.

Improvements on Cortex-A72 compared to current glibc master:
log latency: 2.04x
log thruput: 1.87x
2018-06-05 | Add new double precision functions to mathlib.h | Szabolcs Nagy
2018-05-16 | Improve performance of sinf/cosf/sincosf | Wilco Dijkstra
This patch is a complete rewrite of sinf, cosf and sincosf. The new version is significantly faster, as well as simple and accurate. The worst-case ULP is 0.56072, maximum relative error is 0.5303p-23 over all 4 billion inputs. In non-nearest rounding modes the error is 1ULP.

The algorithm uses 3 main cases: small inputs which don't need argument reduction, small inputs which need a simple range reduction and large inputs requiring complex range reduction. The code uses approximate integer comparisons to quickly decide between these cases - on some targets this may be slow, so this can be configured to use floating point comparisons.

The small range reducer uses a single reduction step to handle values up to 120.0. It is fastest on targets which support inlined round instructions.

The large range reducer uses integer arithmetic for simplicity. It does a 32x96 bit multiply to compute a 64-bit modulo result. This is more than accurate enough to handle the worst-case cancellation for values close to an integer multiple of PI/4. It could be further optimized, however it is already much faster than necessary.
2018-05-16 | Remove the ARM__ symbol prefix | Szabolcs Nagy
Use standard math symbols so it's easy to override libm functions. The arm_math.h header is no longer necessary, user code can just use math.h, but keep a header for freestanding code.
2018-05-16 | Reformat the license headers | Szabolcs Nagy
Use standard name for the LICENSE file. Use consistent license text across files:
- "ARM" is changed to Arm,
- "All Rights Reserved" is dropped (not needed),
- "This file is part of.." is dropped,
- Text is formatted as is recommended by the LICENSE file.
2017-08-11 | Add new expf, exp2f, logf, log2f and powf implementations | Szabolcs Nagy
2015-11-19 | Initial release of Optimized Routines | George Lander