Age | Commit message | Author |
|
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2292660
Change-Id: I66ad991cf1fac06969f6ef33a633feb1c7b0f9b4
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2292660
Change-Id: Ia9342d61d347c848df56818a754589a4e94bba9c
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2292660
Change-Id: I426442ab3a92baaf5697cc7a95a27fec14121094
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
Test: treehugger
Change-Id: I64eb06a9d17c229abb026439d0cdd36ba646eaf4
|
|
8783f524be am: 8a481eb48c
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2402215
Change-Id: Id535bcfef3f5b328f4666742a1623794a20f982d
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2402215
Change-Id: Ief5438cf89d86c43e5f52d5e48b882bacb945170
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2402215
Change-Id: Ic29e1a6f8a0485f6317699bfda38fb79e7022ee1
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
This project was upgraded with external_updater.
Usage: tools/external_updater/updater.sh update arm-optimized-routines
For more info, check https://cs.android.com/android/platform/superproject/+/master:tools/external_updater/README.md
Test: TreeHugger
Change-Id: I03de0ecb0ea89c2a77b1c411685af09fd480b63e
|
|
* Project changes
* All files are under a new dual license now (MIT OR Apache-2.0 WITH
LLVM-exception at the election of the user).
* Added MAINTAINERS file describing who maintains the subdirectories.
* Added README.contributors files documenting contribution
requirements.
* Added new pl/ subdirectory for Arm's Performance Library related
routines.
* String routine changes
* Added memset benchmark.
* Improved strlen and memcpy benchmarks.
* Added SVE memcpy.
* Updated arm string functions to support M-profile PACBTI.
* Merged the MTE and generic versions of strcmp, strncmp, strcpy and
stpcpy into one implementation.
* Optimized memcmp, memchr-mte, memrchr, strchr-mte, strchrnul-mte,
strrchr-mte, strlen, strlen-mte, strnlen, strcpy.
* Math routine changes
* Fixed constants in sinf, cosf and sincosf to be compile time
computed even with gcc-12 -frounding-math.
* Fixed an invalid shift in logf.
* Support floating-point exceptions in vector math routines when
WANT_SIMD_EXCEPT is set.
|
|
The (c) is not strictly required, but it was only missing from one file.
|
|
Scripted copyright year updates based on git committer date.
|
|
Improve SVE memcpy by copying 2 vectors. This avoids a check on vector length
and improves performance of random memcpy.
|
|
For both vector and scalar routines we reduce the order from 6 to
5. For vector routines, this requires reducing RangeVal, as for large
values the tan polynomial is not quite accurate enough. However, the
cotan polynomial is used in the inaccurate region in the scalar
routine, so this does not need to change.
The accuracy of the scalar routine is unchanged. Accuracy in both vector
routines is now 3.45 ULP, with the same worst-case.
|
|
New routine uses a similar technique to the single-precision Neon
routine, but with an extra reduction to pi/8 using the double-angle
formula. It is accurate to 3.5 ULP.
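
For reference, the double-angle identity for tangent is the standard one below. How the routine applies it (for example, which interval the polynomial itself covers) is not spelled out in this message, so this is only a reminder of the identity, not a description of the implementation.

```latex
\[
  \tan(x) = \frac{2\,\tan(x/2)}{1 - \tan^{2}(x/2)}
\]
```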
|
|
This is a partial revert of b7e368fb. If SVE assembly is guarded by
__ARM_FEATURE_SVE, it cannot build when SVE is not enabled by the build
system. This is fine for AOR, but because Android (bionic) uses ifuncs to
select the appropriate assembly at runtime, these need to compile
regardless of whether the target actually supports the instructions.
Check for AArch64 and GCC >= 8 or Clang >= 5 so that SVE is not used on
compilers that do not support it. This condition will always be true on
future builds of Android for AArch64.
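
A minimal sketch of the kind of compiler check described above, assuming the guard is written with the usual predefined macros. The macro name EXAMPLE_HAVE_SVE_ASM and the helper function are hypothetical and are not taken from the tree; the actual guard may be spelled differently.

```c
/* Hypothetical guard: enable SVE assembly on AArch64 when the compiler is
   new enough to assemble it, independent of __ARM_FEATURE_SVE.
   Clang also defines __GNUC__, so it is tested first. */
#if defined(__aarch64__) \
    && ((defined(__clang__) && __clang_major__ >= 5) \
        || (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 8))
# define EXAMPLE_HAVE_SVE_ASM 1
#else
# define EXAMPLE_HAVE_SVE_ASM 0
#endif

/* Returns 1 if the SVE sources would be compiled in, 0 otherwise. */
int example_have_sve_asm(void)
{
  return EXAMPLE_HAVE_SVE_ASM;
}
```

With a check of this shape, the decision depends on what the toolchain can assemble rather than on what the build target enables, which is what runtime selection through ifuncs needs.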
|
|
Optimize strcpy main loop - large strings are ~22% faster.
|
|
Use shrn for narrowing the mask which simplifies code. Unroll the
strchr search loop which improves performance on large strings.
|
|
Variant was wrongly set in structures used to benchmark SVE functions.
Before this change only half of the lanes were set as expected.
Also reformat for ease of reading.
|
|
All files in pl/math updated to 2023.
|
|
These were technically undefined behaviour - they have been rewritten
without the shift so that their type is unsigned int by default.
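
As an illustration of this class of problem (the affected constants are not named in this message, so the example below is an assumption): with a 32-bit int, `1 << 31` shifts into the sign bit of a signed type, which is undefined behaviour in C, whereas the hex literal `0x80000000` does not fit in int and therefore has type unsigned int. The names EXAMPLE_SIGN_MASK and example_abs_bits are hypothetical.

```c
#include <stdint.h>

/* Undefined behaviour with 32-bit int: the result is not representable.
   #define EXAMPLE_SIGN_MASK (1 << 31) */

/* Written without the shift: a hex literal too large for int but small
   enough for unsigned int has type unsigned int (assumes 32-bit int). */
#define EXAMPLE_SIGN_MASK 0x80000000

_Static_assert(_Generic(EXAMPLE_SIGN_MASK, unsigned int: 1, default: 0),
               "mask is expected to have type unsigned int");

/* Clear the sign bit of an IEEE-754 single-precision bit pattern. */
uint32_t example_abs_bits(uint32_t ix)
{
  return ix & ~EXAMPLE_SIGN_MASK;
}
```

For example, the bit pattern of -1.0f (0xBF800000) maps to that of 1.0f (0x3F800000).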
|
|
New routine is based on a vector implementation from log1p, which has
been reused (with some modification for improved accuracy close to 0)
from Neon atanh. Accurate to 3.5 ULP.
|
|
New routine uses inlined log1pf helper, and is accurate to 3.1 ULP
(2.8 ULP if fp exceptions are enabled).
|
|
New routines use the same algorithm, reliant on a modified version of
expm1, and are accurate to 3 ULP.
|
|
The new SVE implementation is a direct port of Neon log2, and is
accurate to 2.58 ULPs.
Update error threshold and comments for Neon log2 too, new
approximate argmax but same threshold.
|
|
New SVE routine is an SVE port of the Neon algorithm
and is accurate to 2.48 ULPs.
|
|
New routines are both based on existing log1p routines. Scalar is
accurate to 3 ULP, Neon to 3.5 ULP. Both set fp exceptions correctly
regardless of build config.
|
|
The simplest way to set fenv in Neon atan is to use a scalar
fallback for under/overflow cases. However, this routine did not have a
scalar counterpart, so we add a new one, based on the same algorithm
and polynomial as the vector variants and accurate to 2.5 ULP. This
is now used as the fallback for all lanes when any lane of the Neon
input is special.
|
|
Both routines previously relied on the vector expm1(f) routine exposed
by the library, which depended on WANT_SIMD_EXCEPT for its fenv
behaviour. However, both routines were expected to always trigger fp
exceptions correctly. To remedy this, both routines now use an inlined
helper for expm1 (reused from vector tanhf in the case of sinhf), and
special-case small input as well as large when WANT_SIMD_EXCEPT is
enabled.
|
|
The pipe prevented FAILs and PASSes from being counted properly - the
while read loop has been rewritten without a pipe, as it was prior to
the changes here.
fenv checking is temporarily disabled in Neon sinh and sinhf, as they
do not get it right. This will be re-enabled once they have been
fixed.
|
|
Updated comment and test threshold.
|
|
The simplest way to set fenv in Neon atanf is to use a scalar
fallback for under/overflow cases. However, this routine did not have a
scalar counterpart, so we add a new one, based on the same algorithm
and polynomial as the vector variants and accurate to 2.9 ULP. This
is now used as the fallback for all lanes when any lane of the Neon
input is special.
|
|
New routines use the same algorithm, with simplified argument
reduction and recombination in the vector variant. Both are accurate
to 2 ULP.
|
|
New max observed - updated filenames, comments and runulp threshold.
|
|
We were previously misusing the WANT_ERRNO build flag. This is now
replaced everywhere appropriate with WANT_SIMD_EXCEPT. A small number
of vector routines get fp exceptions right with no modification - the
tests have been updated to track this.
|
|
A new implementation based on the same approach as
Neon logf, that is accurate to 2.48 ULPs.
Flags set correctly regardless of WANT_ERRNO.
|
|
To conclude the work on simplifying the runulp.sh script, a new macro
has been introduced to specify the intervals in which a routine should
be tested in the routine source. This is eventually consumed by
runulp.sh.
|
|
Introduces a new macro, similar to how ULP thresholds are now
handled, that emits a list of routines which are expected to
correctly trigger fenv exceptions, to be consumed by runulp.sh.
All scalar routines are expected to do so. A small number of Neon
routines are also expected to, dependent on WANT_ERRNO.
|
|
Introduces a new set of macros and Make rules for mechanically
generating a list of ULP limits for each routine, to be consumed
by runulp.sh. This removes the need to maintain long lists of
thresholds in runulp.sh.
|
|
Instead of maintaining three separate lists of routines, which
are cumbersome and prone to merge conflicts, we provide a new
macro, PL_SIG, which by some preprocessor machinery outputs the
lists in the required format (macro formats have been changed
very slightly to make the generation simpler). Only routines with
simple signatures are handled - binary functions still need
mathbench wrappers defined manually. Routines with non-standard
references (e.g. powi/powk) also still need entries and wrappers
defined manually.
|
|
edaf4d6088 am: 0bc581b0e4
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2338622
Change-Id: Id7177c8b85844f9ce0b189a5b4a72cbc85073daf
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2338622
Change-Id: I1007ac06e0383b6b89cec2d0a3e73f001fb80a7c
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2338622
Change-Id: Iac9d067e65a1452f7f3447795d80378af429e4a8
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
|
|
New behaviour is hidden behind WANT_ERRNO config option.
|
|
New behaviour is hidden behind WANT_ERRNO config option.
|
|
Flags set correctly regardless of WANT_ERRNO.
|
|
Test threshold fixed.
|
|
New behaviour is hidden behind WANT_ERRNO config option.
|
|
Migrate the libc dependencies for Rust to Bionic, but keep using this
library as a back-end for arm64. Update the visibility to point to
libc instead of to client code directly.
Test: m pvmfw_bin && atest vmbase_example.integration_test
Change-Id: I0e8b2ef862fcb47fcb66494aa180a7e66575a0a7
|
|
Add macros for simplifying polynomial evaluation using either Horner,
pairwise Horner or Estrin. Several routines have been modified to use
the new helpers. Readability is improved slightly, and we expect that
this will make prototyping new routines simpler.
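
To illustrate two of the evaluation schemes named above, here is a minimal sketch for a degree-3 polynomial, written as plain functions rather than macros. The function names are hypothetical and do not correspond to the helpers added by this change.

```c
/* Horner: c0 + x*(c1 + x*(c2 + x*c3)).
   One long dependency chain of multiply-adds. */
double example_horner_3(double x, const double *c)
{
  return c[0] + x * (c[1] + x * (c[2] + x * c[3]));
}

/* Estrin: (c0 + c1*x) + x^2 * (c2 + c3*x).
   The two inner terms are independent, so they can be computed in
   parallel at the cost of one extra multiply for x^2. */
double example_estrin_3(double x, const double *c)
{
  double x2 = x * x;
  return (c[0] + c[1] * x) + x2 * (c[2] + c[3] * x);
}
```

Both functions compute the same polynomial. They can differ in the last bits of the result because the rounding steps occur in a different order, which is one reason a routine's ULP bound may need re-checking when the scheme is changed.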
|