Add new log2 implementation

The algorithm is similar to the one used in log, but there are more operations (and more error) due to the 1/ln2 multiplier.

There is a separate code path for when the fma instruction is not available. It computes x/c - 1 precisely, for which the table size is doubled, and it computes (x/c - 1)/ln2 precisely.

The worst-case error is 0.547 ULP (0.55 without fma). The read-only global data size is 1168 bytes (2192 without fma). The error in non-nearest rounding modes is less than 1 ULP.

Improvements on Cortex-A72 compared to current glibc master:
log latency: 2.04x
log throughput: 1.87x
author: Szabolcs Nagy <szabolcs.nagy@arm.com> 2018-06-05 16:15:27 +0100
committer: Szabolcs Nagy <szabolcs.nagy@arm.com> 2018-06-06 16:17:19 +0100
commit: d69e504577169c5f75803f1b97a42822898a78b3 (patch)
tree: 6196f61c3386e50ad8257d6a1f21c90ef39dddb8 /math/include
parent: a7711a35d57cae0c9fcf0cd61903bbf4701240cf (diff)
download: arm-optimized-routines-d69e504577169c5f75803f1b97a42822898a78b3.tar.gz
1 file changed, 1 insertion, 0 deletions
diff --git a/math/include/mathlib.h b/math/include/mathlib.h
index 17fdcf4..7a544d1 100644
--- a/math/include/mathlib.h
+++ b/math/include/mathlib.h
@@ -32,3 +32,4 @@ void sincosf(float, float*, float*);
 double exp(double);
 double exp2(double);
 double log(double);
+double log2(double);