external/arm-optimized-routines.git

Age	Commit message	Author
2023-01-24	pl/math: Fix a copyright notice for consistency	Szabolcs Nagy
	The (c) is not strictly required, but it was only missing from one file.
2023-01-23	pl/math: Reduce order of single-precision tan polynomial	Joe Ramsay
	For both vector and scalar routines we reduce the order from 6 to 5. For vector routines, this requires reducing RangeVal, since for large values the tan polynomial is not quite accurate enough. The scalar routine already uses the cotan polynomial in the inaccurate region, so it does not need to change, and its accuracy is unchanged. Both vector routines are now accurate to 3.45 ULP, with the same worst case.
2023-01-19	pl/math: Add vector/Neon tan	Joe Ramsay
	New routine uses a similar technique to the single-precision Neon routine, but with an extra reduction to pi/8 using the double-angle formula. It is accurate to 3.5 ULP.
2023-01-09	pl/math: Fix benchmark entries for SVE bivariate functions	Pierre Blanchard
	The variant was set incorrectly in the structures used to benchmark SVE functions; before this change, only half of the lanes were set as expected. Also reformat for readability.
2023-01-06	pl/math: Update copyright years	Joe Ramsay
	All files in pl/math updated to 2023.
2023-01-05	Rewrite two abs masks as literals	Joe Ramsay
	These were technically undefined behaviour - they have been rewritten without the shift so that their type is unsigned int by default.
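A minimal C sketch of the issue, assuming a 32-bit `int`; the mask names below are illustrative and are not taken from the library.

```c
#include <stdint.h>
#include <string.h>

/* `1 << 31` would shift into the sign bit of a signed int, which is
   undefined behaviour. A hexadecimal literal that does not fit in int,
   such as 0x80000000 with a 32-bit int, has type unsigned int instead. */
#define SIGN_MASK 0x80000000
#define ABS_MASK (~SIGN_MASK) /* 0x7fffffff, also unsigned int. */

/* fabsf by clearing the sign bit of the IEEE-754 single-precision encoding. */
static float abs_by_mask(float x)
{
  uint32_t ix;
  memcpy(&ix, &x, sizeof ix);
  ix &= ABS_MASK;
  memcpy(&x, &ix, sizeof x);
  return x;
}
```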
2023-01-05	pl/math: Add vector/Neon acosh	Joe Ramsay
	New routine is based on a vector implementation of log1p reused from Neon atanh, with some modification for improved accuracy close to 0. Accurate to 3.5 ULP.
2023-01-05	pl/math: Add vector/Neon acoshf	Joe Ramsay
	New routine uses inlined log1pf helper, and is accurate to 3.1 ULP (2.8 ULP if fp exceptions are enabled).
2023-01-05	pl/math: Add scalar & vector/Neon tanh	Joe Ramsay
	New routines use the same algorithm, reliant on a modified version of expm1, and are accurate to 3 ULP.
2022-12-30	pl/math: Add vector/SVE log2	Pierre Blanchard
	The new SVE implementation is a direct port of Neon log2, and is accurate to 2.58 ULPs. Also update the error comments for Neon log2 (new approximate argmax, same threshold).
2022-12-30	pl/math: Add vector/SVE log2f	Pierre Blanchard
	New SVE routine is an SVE port of the Neon algorithm and is accurate to 2.48 ULPs.
2022-12-22	pl/math: Add scalar & vector/Neon atanh	Joe Ramsay
	New routines are both based on existing log1p routines. Scalar is accurate to 3 ULP, Neon to 3.5 ULP. Both set fp exceptions correctly regardless of build config.
2022-12-22	pl/math: Add scalar atan and set fenv in Neon atan	Joe Ramsay
	The simplest way to set fenv in Neon atan is to use a scalar fallback for under/overflow cases. However, this routine did not have a scalar counterpart, so we add a new one, based on the same algorithm and polynomial as the vector variants and accurate to 2.5 ULP. It is now used as the fallback for all lanes whenever any lane of the Neon input is special.
2022-12-22	pl/math: Fix fp exceptions in Neon sinhf and sinh	Joe Ramsay
	Both routines previously relied on the vector expm1(f) routine exposed by the library, whose fenv behaviour depended on WANT_SIMD_EXCEPT. However, both routines were expected to always trigger fp exceptions correctly. To remedy this, both routines now use an inlined expm1 helper (reused from vector tanhf in the case of sinhf), and special-case small as well as large input when WANT_SIMD_EXCEPT is enabled.
2022-12-20	Correct exit code from runulp.sh	Joe Ramsay
	The pipe prevented FAILs and PASSes from being counted properly, because the while-read loop on the right of a pipe typically runs in a subshell, so its counters are lost. The loop has been rewritten without a pipe, as it was before the recent runulp.sh changes. fenv checking is temporarily disabled in Neon sinh and sinhf, as they do not set exceptions correctly yet; it will be re-enabled once they have been fixed.
2022-12-20	pl/math: Update ULP threshold for SVE erf	Pierre Blanchard
	Updated comment and test threshold.
2022-12-20	pl/math: Add scalar atanf and set fenv in Neon atanf	Joe Ramsay
	The simplest way to set fenv in Neon atanf is to use a scalar fallback for under/overflow cases. However, this routine did not have a scalar counterpart, so we add a new one, based on the same algorithm and polynomial as the vector variants and accurate to 2.9 ULP. It is now used as the fallback for all lanes whenever any lane of the Neon input is special.
2022-12-20	pl/math: Add scalar & vector/Neon cbrt	Joe Ramsay
	New routines use the same algorithm, with simplified argument reduction and recombination in the vector variant. Both are accurate to 2 ULP.
2022-12-19	pl/math: Update ULP threshold for Neon asinh	Joe Ramsay
	New max observed - updated filenames, comments and runulp threshold.
2022-12-19	pl/math: Replace WANT_ERRNO with WANT_SIMD_EXCEPT for Neon fenv	Joe Ramsay
	We were previously misusing the WANT_ERRNO build flag. This is now replaced everywhere appropriate with WANT_SIMD_EXCEPT. A small number of vector routines get fp exceptions right with no modification - the tests have been updated to track this.
2022-12-19	pl/math: Improve vector/Neon log2f	Pierre Blanchard
	A new implementation based on the same approach as Neon logf, which is accurate to 2.48 ULPs. Flags are set correctly regardless of WANT_ERRNO.
2022-12-15	pl/math: Move test intervals to routine source files	Joe Ramsay
	To conclude the work on simplifying the runulp.sh script, a new macro has been introduced for specifying, in the routine source, the intervals in which that routine should be tested. These are then consumed by runulp.sh.
2022-12-15	pl/math: Move fenv expectations out of runulp.sh	Joe Ramsay
	Introduces a new macro, similar to how ULP thresholds are now handled, that emits a list of routines which are expected to correctly trigger fenv exceptions, to be consumed by runulp.sh. All scalar routines are expected to do so. A small number of Neon routines are also expected to, dependent on WANT_ERRNO.
2022-12-15	pl/math: Move ULP limits to routine source files	Joe Ramsay
	Introduces a new set of macros and Make rules for mechanically generating a list of ULP limits for each routine, to be consumed by runulp.sh. This removes the need to maintain long lists of thresholds in runulp.sh.
2022-12-15	pl/math: Auto-generate mathbench and ulp headers	Joe Ramsay
	Instead of maintaining three separate lists of routines, which are cumbersome and prone to merge conflicts, we provide a new macro, PL_SIG, which uses some preprocessor machinery to output the lists in the required format (macro formats have been changed very slightly to make the generation simpler). Only routines with simple signatures are handled: binary functions still need mathbench wrappers defined manually. Likewise, routines with non-standard references (i.e. powi/powk) still need entries and wrappers defined manually.
2022-12-13	pl/math: Set fenv flags in Neon log1p	Joe Ramsay
	New behaviour is hidden behind WANT_ERRNO config option.
2022-12-13	pl/math: Set fenv flags in Neon tanf	Joe Ramsay
	New behaviour is hidden behind WANT_ERRNO config option.
2022-12-13	pl/math: Set fenv flags in Neon log2f	Joe Ramsay
	Flags set correctly regardless of WANT_ERRNO.
2022-12-13	pl/math: Update ULP threshold for SVE atan2	Pierre Blanchard
	Test threshold fixed.
2022-12-13	pl/math: Set fenv flags in Neon log1pf	Joe Ramsay
	New behaviour is hidden behind WANT_ERRNO config option.
2022-12-09	pl/math: Add polynomial helpers	Joe Ramsay
	Add macros for simplifying polynomial evaluation using either Horner, pairwise Horner or Estrin. Several routines have been modified to use the new helpers. Readability is improved slightly, and we expect that this will make prototyping new routines simpler.
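A scalar C sketch of the three evaluation schemes for a degree-7 polynomial. The function names are hypothetical, and the reading of "pairwise Horner" used here (Horner in x^2 over coefficient pairs) is an assumption.

```c
/* Three ways to evaluate p(x) = c[0] + c[1] x + ... + c[7] x^7. */

/* Horner: fewest operations, but one long dependency chain. */
static double horner_7(double x, const double c[8])
{
  double p = c[7];
  for (int i = 6; i >= 0; i--)
    p = p * x + c[i];
  return p;
}

/* Pairwise Horner: Horner in x^2 over the pairs c[2i] + c[2i+1] x.
   The pair terms do not depend on each other. */
static double pairwise_horner_7(double x, const double c[8])
{
  double x2 = x * x;
  double p = c[6] + c[7] * x;
  for (int i = 4; i >= 0; i -= 2)
    p = p * x2 + (c[i] + c[i + 1] * x);
  return p;
}

/* Estrin: a balanced tree of depth 3, exposing the most instruction-level
   parallelism at the cost of also computing x^2 and x^4. */
static double estrin_7(double x, const double c[8])
{
  double x2 = x * x, x4 = x2 * x2;
  double p01 = c[0] + c[1] * x, p23 = c[2] + c[3] * x;
  double p45 = c[4] + c[5] * x, p67 = c[6] + c[7] * x;
  double p03 = p01 + p23 * x2, p47 = p45 + p67 * x2;
  return p03 + p47 * x4;
}
```

All three compute the same polynomial; they differ in operation count and in how much of the work can proceed in parallel, which is why a choice between them is useful when prototyping.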
2022-12-09	pl/math/test: Simplify runulp.sh	Joe Ramsay
	Small simplification - pl routines do not support different rounding modes, so there is no need to support them in runulp.sh. As a result we can also remove Ldir.
2022-12-08	pl/math: Fix fenv in asinh	Joe Ramsay
	Special lanes were not being properly masked when a lane was tiny. This is now fixed.
2022-12-08	pl/math: Fix vector/SVE erf	Pierre Blanchard
	Fix a bug that could produce unpredictable results in the boring domain, by saturating the index at an appropriate value.
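A sketch of index saturation for a table-driven erf, in C. The table size, spacing and function name are all hypothetical and do not describe the library's table.

```c
#include <stddef.h>

/* Hypothetical layout: erf sampled every 1/ERF_TABLE_SCALE on
   [0, (ERF_TABLE_SIZE - 1) / ERF_TABLE_SCALE]. */
#define ERF_TABLE_SIZE 769
#define ERF_TABLE_SCALE 128.0

/* Index of the nearest table node for ax = |x|. Without the clamp, large ax
   would index past the end of the table and read arbitrary memory. Writing
   the test as !(i < ...) also sends NaN to the last entry. */
static size_t erf_index(double ax)
{
  double i = ax * ERF_TABLE_SCALE + 0.5;
  if (!(i < ERF_TABLE_SIZE - 1))
    return ERF_TABLE_SIZE - 1;
  return (size_t) i;
}
```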
2022-12-06	pl/math: Set fenv flags in Neon asinhf	Joe Ramsay
	Routine no longer relies on vector log1pf, as that routine would have to become more complex to deal with fenv itself. Instead, we reuse a log1pf helper from Neon atanhf, which does no special-case handling and leaves it all to the main routine. We now simply fall back to the scalar routine for special-case handling. This uncovered a mistake in asinhf's handling of NaNs, which has been fixed.
2022-12-05	pl/math: Avoid UB in scalar tanhf	Joe Ramsay
	The ldexp shortcut was left-shifting a signed value. We now bias the exponent first, which allows the shift to be done on an unsigned value.
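A minimal C sketch of the fixed shortcut, assuming IEEE-754 single precision; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Multiply x by 2^n by building the float 2^n directly from its bits.
   Only valid while 2^n is a normal float, i.e. -126 <= n <= 127.
   Left-shifting a negative signed n would be undefined behaviour, so the
   exponent bias (127) is added first and the shift is done on an unsigned
   value. */
static float scale_by_pow2(float x, int32_t n)
{
  uint32_t bits = (uint32_t) (n + 127) << 23;
  float s;
  memcpy(&s, &bits, sizeof s);
  return x * s;
}
```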
2022-11-30	pl/math: Add scalar and vector/Neon tanhf	Joe Ramsay
	Both routines use simplified inline versions of expm1f, and are accurate to 2.6 ULP.
2022-11-29	pl/math: Add vector/Neon asinh	Joe Ramsay
	New routine uses two separate algorithms for input greater and less than 1 (similar to the scalar routine). It is accurate to 2.5 ULP.
2022-11-24	pl/math: Update ULP threshold for vector atans	Joe Ramsay
	New max observed for both Neon and SVE.
2022-11-22	pl/math: Add scalar & vector/Neon cbrtf	Joe Ramsay
	Both routines use the same algorithm: one Newton iteration, with the initial guess obtained from a low-order polynomial. The scalar routine is used as a fallback for subnormal and special cases in the vector routine, which allows vastly simplified argument reduction and reassembly. Both routines are accurate to 1.5 ULP.
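The overall scheme (reduce, polynomial guess, Newton refinement, reassemble) can be sketched in scalar C as below. The quadratic guess is a crude interpolant chosen for this example, not the library's polynomial, so the sketch needs two Newton steps rather than one; `cbrtf_sketch` is a hypothetical name.

```c
#include <math.h>

/* cbrt(2^r) for r = 0, 1, 2. */
static const float cbrt2_pow[3] = {1.0f, 1.2599210498948732f, 1.5874010519681994f};

float cbrtf_sketch(float x)
{
  if (x == 0.0f || !isfinite(x))
    return x; /* +-0, +-inf and NaN are returned unchanged. */

  int e;
  float m = frexpf(fabsf(x), &e); /* |x| = m * 2^e, m in [0.5, 1). */
  int r = ((e % 3) + 3) % 3;      /* e = 3q + r with r in {0, 1, 2}. */
  int q = (e - r) / 3;

  /* Initial guess: quadratic through cbrt(m) at m = 0.5, 0.75 and 1,
     about 0.1% relative error on [0.5, 1). */
  float t = m - 0.75f;
  float y = 0.9085603f + t * (0.41259895f - 0.18736053f * t);

  /* Newton steps for y^3 = m: y <- (2y + m / y^2) / 3. Each step roughly
     squares the relative error; two are used because the guess is crude. */
  for (int i = 0; i < 2; i++)
    y = (2.0f * y + m / (y * y)) * (1.0f / 3.0f);

  /* cbrt(x) = sign(x) * cbrt(m) * cbrt(2^r) * 2^q. */
  return copysignf(ldexpf(y * cbrt2_pow[r], q), x);
}
```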
2022-11-22	pl/math: Add scalar and vector/Neon atanhf	Joe Ramsay
	Both routines are based on a simplified version of log1pf, and are accurate to 3.1 ULP. Also enable the -c flag in runulp.sh: atanhf needs it so that the control lane can be set to something other than 1, since atanh(1) is infinite.
2022-11-17	pl/math: Add scalar & vector/Neon cosh	Joe Ramsay
	New routines are based on double-precision exp, both accurate to 2 ULP.
2022-11-17	pl/math: Add scalar and vector/Neon sinh	Joe Ramsay
	New routines are based on the single-precision versions and are accurate to 3 ULP.
2022-11-15	pl/math: Use order-6 polynomial in Vector/Neon log2	Nicholas Dingle
	Reduce the order of the polynomial used in Neon log2 by one (from 7 to 6). In order to calculate the new coefficients required we rescale the coefficients from log_data.c by log2(e) in extended precision and round back. The maximum observed error is unchanged (2.59 ULPs) but the point at which it is observed has changed slightly.
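The rescaling step can be sketched as follows. The helper name is hypothetical; in practice it would be applied to each coefficient of the existing natural-log polynomial.

```c
/* If ln(1 + r) ~= sum c_i r^i, then
   log2(1 + r) = log2(e) * ln(1 + r) ~= sum (log2(e) * c_i) r^i,
   so each coefficient can be scaled independently. */
static double ln_coeff_to_log2(double c)
{
  /* log2(e) = 1 / ln(2), as a long double literal. */
  const long double log2e = 1.44269504088896340735992468100189214L;
  /* Multiply in long double and round back to double once. If long double
     has the same format as double on the target, this gives no extra
     precision. */
  return (double) (log2e * c);
}
```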
2022-11-15	pl/math: Change conflicting variable names	Joe Ramsay
	There is a collision between math/ and pl/math in the math-tests and math-rtests names, which can lead to failures if both are run concurrently. We rename the pl-specific lists to avoid this.
2022-11-11	pl/math: Fix minus zero in vector expm1	Joe Ramsay
	Extra special-case check.
2022-11-11	pl/math: Fix SVE mathbench wrappers	Joe Ramsay
	These were broken in the previous patch, now fixed.
2022-11-09	pl/math/test: Simplify ulp and bench macros	Joe Ramsay
	Reduces the amount of boilerplate developers need to write for new routines.
2022-11-09	pl/math: Add vector/Neon expm1	Joe Ramsay
	New routine is a vector port of the scalar algorithm, with fallback to the scalar variant for large and special input. This allows us to simplify parts of the algorithm that were only needed for large input. It also means that, as long as we also fall back to the scalar routine for tiny input (dependent on the value of WANT_ERRNO), the routine sets fenv flags correctly.
2022-11-09	pl/math: Add scalar expm1	Joe Ramsay
	New routine uses the same algorithm as the single-precision routine, and is accurate to 2.5 ULP.