Age | Commit message (Collapse) | Author |
|
Modify previously defined PACBTI macros to allow for
more flexible push/pop expressions at function prologue/epilogues,
allowing further simplification of code predicated on the use of
M-profile PACBTI hardware features.
This patch also allows for the specification of whether generated pac
keys are pushed onto the stack for leaf functions where this may not
be necessary.
It defines the following preprocessor macros:
* HAVE_PAC_LEAF: Indicates whether pac-signing has been requested for
leaf functions.
* PAC_LEAF_PUSH_IP: Whether leaf functions should push the pac code
to the stack irrespective of whether the ip register is clobbered in
the function or not.
* PAC_CFI_ADJ: Given values for the above two parameters, this
holds the calculated offset applied to default CFI address/offset
values as a consequence of potentially pushing the pac-code to the
stack.
It also defines the following assembler macros:
* prologue: In addition to pushing any callee-saved registers onto
the stack, it generates any requested pacbti instructions.
Pushed registers are specified via the optional `first', `last' and
`savepac' macro argument parameters.
When a single register number is provided, it pushes that
register. When two register numbers are provided, they specify a
range to save. If savepac is non-zero, the ip register is also
saved.
For example:
prologue savepac=1      -> push {ip}
prologue 1              -> push {r1}
prologue 1 savepac=1    -> push {r1, ip}
prologue 1 4            -> push {r1-r4}
prologue 1 4 savepac=1  -> push {r1-r4, ip}
* epilogue: pops registers off the stack and emits pac key signing
instruction if requested. The optional `first', `last' and
`savepac' arguments function as per the prologue macro, generating a
pop instead of a push instruction.
* cfisavelist - prologue macro helper function, generating
necessary .cfi_offset directives associated with push instruction.
Therefore, the net effect of calling `prologue 1 2 savepac=1' is
to generate the following:
push {r1-r2, ip}
.cfi_adjust_cfa_offset 12
.cfi_offset 143, -4
.cfi_offset 2, -8
.cfi_offset 1, -12
* cfirestorelist - epilogue macro helper function, emitting
.cfi_restore instructions prior to resetting the cfa offset. As
such, calling `epilogue 1 2 savepac=1' will produce:
pop {r1-r2, ip}
.cfi_restore 143
.cfi_restore 2
.cfi_restore 1
.cfi_def_cfa_offset 0
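
The argument handling described above can be modelled outside the
assembler. The following is a minimal Python sketch, not the macros
themselves: the helper names (reg_list, slot_count, prologue, epilogue)
are hypothetical, and it only reproduces the push/pop register list and
the CFA adjustment, leaving out the per-register .cfi_offset and
.cfi_restore lines.

```python
def reg_list(first=None, last=None, savepac=False):
    """Build the register list text for a push or pop, e.g. 'r1-r4, ip'."""
    parts = []
    if first is not None:
        if last is None or last == first:
            parts.append(f"r{first}")          # single register
        else:
            parts.append(f"r{first}-r{last}")  # register range
    if savepac:
        parts.append("ip")  # ip carries the pac code, so it joins the list
    return ", ".join(parts)


def slot_count(first=None, last=None, savepac=False):
    """Number of 4-byte stack slots the register list occupies."""
    count = 0
    if first is not None:
        count = (last if last is not None else first) - first + 1
    return count + (1 if savepac else 0)


def prologue(first=None, last=None, savepac=False):
    """Model of the push and the matching CFA adjustment (assumed shape)."""
    regs = reg_list(first, last, savepac)
    if not regs:
        return []  # nothing to save, so no push is modelled
    size = 4 * slot_count(first, last, savepac)
    return [f"push {{{regs}}}", f".cfi_adjust_cfa_offset {size}"]


def epilogue(first=None, last=None, savepac=False):
    """Model of the pop that mirrors the prologue's push."""
    regs = reg_list(first, last, savepac)
    return [f"pop {{{regs}}}"] if regs else []


if __name__ == "__main__":
    for line in prologue(1, 2, savepac=True) + epilogue(1, 2, savepac=True):
        print(line)
```

Keeping the register-list construction in one helper mirrors the way the
epilogue is described as reusing the prologue's arguments.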
|
|
As leaf functions cannot throw exceptions, with EHABI only supporting
synchronous exceptions, add support for emitting a `.cantunwind'
directive prior to `.fnend' in ARM_FNEND preprocessor macro.
This ensures no personality routine or exception table data is
generated. Existing `.save' directives used in leaf functions are also
removed.
Built w/ arm-none-linux-gnueabihf, ran make check-string w/ qemu-arm-static.
|
|
New routine is accurate to 2 ulps.
|
|
New routine is accurate to 3 ulps.
|
|
New routine uses polynomial on a reduced interval, and is accurate to
2.5 ulp.
|
|
New routine uses polynomial on a reduced interval, and is accurate to
2.9 ulp.
|
|
An implementation based on SVE trigonometric instructions.
It relies on a similar range reduction as Vector/Neon
sin, but to [-pi/4, pi/4] instead of [-pi/2, pi/2].
The estimated maximum error is 1.95 ULPs.
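
For illustration, a reduction to [-pi/4, pi/4] can be sketched in
scalar Python as below. This is an assumption-laden model, not the SVE
routine: the function names are hypothetical, math.sin and math.cos
stand in for the polynomial kernels, and the single-constant
subtraction is far less careful than a production range reduction.

```python
import math

HALF_PI = math.pi / 2


def reduce_quarter_pi(x):
    """Return (r, k) with x ~= r + k*(pi/2) and |r| <= pi/4, up to rounding."""
    k = round(x / HALF_PI)  # nearest multiple of pi/2
    r = x - k * HALF_PI     # remainder lies in [-pi/4, pi/4]
    return r, k


def sin_via_reduction(x):
    """Reconstruct sin(x) from the reduced argument and the quadrant k mod 4."""
    r, k = reduce_quarter_pi(x)
    quadrant = k % 4  # Python's % is non-negative for a positive modulus
    if quadrant == 0:
        return math.sin(r)
    if quadrant == 1:
        return math.cos(r)
    if quadrant == 2:
        return -math.sin(r)
    return -math.cos(r)
```

Reducing by multiples of pi/2 rather than pi halves the interval the
kernels must cover, at the cost of selecting between a sine and a
cosine kernel per quadrant.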
|
|
An implementation based on Taylor series expansion of sin.
The maximum measured error is 1.89 ULPs.
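
As a rough picture of what a measured ULP error means, the sketch
below samples a truncated Taylor series for sin against a reference
and reports the worst error in units of the reference's ULP. It works
in double precision, treats math.sin as the reference (itself only an
approximation of the true value), and uses hypothetical helper names;
it is not the project's ulp test tool.

```python
import math


def ulp_error(approx, exact):
    """Error of approx relative to exact, in ULPs of exact."""
    return abs(approx - exact) / math.ulp(exact)


def taylor_sin(x, terms=8):
    """Truncated Taylor series x - x^3/3! + x^5/5! - ... with `terms` terms."""
    total, term = 0.0, x
    for n in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2n+2)(2n+3)).
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total


def max_ulp_error(f, ref, lo, hi, samples=1000):
    """Worst sampled ULP error of f against ref on [lo, hi]."""
    worst = 0.0
    for i in range(samples + 1):
        x = lo + (hi - lo) * i / samples
        worst = max(worst, ulp_error(f(x), ref(x)))
    return worst


if __name__ == "__main__":
    print(max_ulp_error(taylor_sin, math.sin, -math.pi / 4, math.pi / 4))
```

Uniform sampling only gives a lower bound on the true worst case,
which is why measured figures are usually quoted with the input where
they occur.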
|
|
Prevent inlining in most cases - change to use AOR style (NOINLINE).
|
|
New routine uses the same algorithm as vector log10, with scaled
coefficients. Accurate to 2.5 ulp.
|
|
The new routine is based on the scalar algorithm in the main math
directory, but with all arithmetic done in single precision. invc
is represented with a high and a low part. The routine is accurate
to 2.6 ULPs.
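
The high/low representation can be illustrated with a small Python
sketch that splits a double into two single-precision parts. The
helper names and the sample value 1/3 are hypothetical; the actual
invc table entries and how the routine consumes them are not shown in
this message.

```python
import struct


def to_f32(v):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def split_hi_lo(v):
    """Split v into hi + lo, where both parts fit in single precision."""
    hi = to_f32(v)       # leading bits of v
    lo = to_f32(v - hi)  # what single precision dropped, itself rounded
    return hi, lo


if __name__ == "__main__":
    hi, lo = split_hi_lo(1.0 / 3.0)
    print(hi, lo, hi + lo)
```

Carrying the low part lets single-precision arithmetic recover most of
the precision that rounding the table entry to one float would lose.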
|
|
Add the `.cfi_register 143, 12' directive immediately after pac
instruction is emitted.
Ensures unwind info consumers know immediately that if they need
the PAC for the function, they can find it in ip register.
|
|
Move away from use of the non-portable __ARM_ARCH_8M_MAIN__ feature
test macro in favour of __ARM_ARCH >= 8 when selecting the target
architecture.
|
|
Adjust the criterion for M-profile PACBTI signing of leaf functions to
be contingent on the +leaf option being passed to the
-mbranch-protection compilation option.
|
|
An implementation based on SVE trigonometric instructions.
It relies on the same range reduction as Vector/Neon
cos, with a slight modification of the shift.
The maximum measured error is 2.11 ULPs around x = 205.522.
|
|
An implementation based on SVE trigonometric instructions.
It relies on the same range reduction as Vector/Neon
cosf, with a slight modification of the shift.
The maximum measured error is 2.06 ULPs.
|
|
This is required when running `make check`, in order to avoid
running ulp tests on SVE routines when SVE is disabled.
The definition of cflags for SVE is kept in the config file to
allow user control over `-march`.
|
|
Change-Id: I983613368e28bf4b4f2b952e93512ee0ea5cf532
|
|
5e88621e0d am: 0b204e7d3b am: d810904721
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: Ib6f68f1c58fcda877d5640a60c897bf10abda16d
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
5e88621e0d am: 0b204e7d3b
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: Iecb24cc82429ee083523a3769049cb7f907fb48b
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
5e88621e0d
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: I1f5046f625ca1ae155065ea7b3a1a584ddc9bda9
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: I69e4ac7d42caef55ea972e6a9406928b3052f413
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: I5ff52167cce5d4fe77e559f4d19015ffaa4898f9
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
|
|
The Rust compiler may emit calls to memcmp, memcpy, memmove and memset.
Usually these come from libc, but in bare-metal binaries they must be
provided somewhere. For now we only use bare-metal Rust on aarch64, so
the Arm optimized-routines implementation seems like the best choice.
Bug: 223166344
Test: atest vmbase_example.integration_test
Change-Id: Id2439208160411dcde12be76fbaa22c30c24b81d
|
|
The new routine uses vector log1pf, and is accurate to 2.7 ulp.
|
|
The new routine is a Neon port of the scalar algorithm, using the same
coefficients. The worst-case error is about 2.1 ulp when using Estrin,
at the same value as the scalar algorithm.
|
|
The new routine uses a polynomial on a reduced interval. Worst-case
error is about 2.1 ULP, with an option in math_config.h to use Horner
instead of Estrin for the polynomial, which gives worst-case error of
1.3 ULP.
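
The two evaluation schemes mentioned here can be compared with a short
Python sketch. The coefficients in the usage example are arbitrary
placeholders, not those of the routine, and the function names are
hypothetical.

```python
def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... with one sequential multiply-add chain."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def estrin(coeffs, x):
    """Evaluate the same polynomial by combining neighbouring pairs per level."""
    terms = list(coeffs)  # must be non-empty
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0.0)  # pad so every term has a partner
        # Each level halves the number of terms: t[i] + power * t[i+1].
        terms = [terms[i] + power * terms[i + 1] for i in range(0, len(terms), 2)]
        power *= power  # x, x^2, x^4, ...
    return terms[0]


if __name__ == "__main__":
    coeffs = [1.0, 2.0, 3.0, 4.0]  # 1 + 2x + 3x^2 + 4x^3
    print(horner(coeffs, 2.0), estrin(coeffs, 2.0))
```

The pairs within one Estrin level are independent of each other, so
they can be computed in parallel, whereas each Horner step depends on
the previous one. The two schemes round differently, which is
consistent with the different worst-case errors quoted above.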
|
|
New routine uses a polynomial on reduced interval. Worst-case error is
about 1.7 ULP.
|
|
The new routine uses a similar approach to asinhf, using a polynomial
only in the region where either returning x or calculating the result
directly is not sufficiently precise.
Worst-case error is about 2 ULP, for inputs close to 1. There are 4
intervals with slightly different error behaviour, as follows:
Interval               Worst-case accuracy (ulp)
|x| < 2^-26            0.0
|x| < 1                1.5
|x| < ~sqrt(DBL_MAX)   2.0
|x| < infinity         1.0
log has been copied from the main math directory so that it can be
used in asinh. The only modifications to the relevant files are to
remove aliases and rename log itself to an internal 'helper' name.
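
The four intervals can be pictured as a dispatch on |x|. The Python
sketch below is an assumption-based model only: where the routine uses
a polynomial for |x| < 1, the sketch substitutes a log1p identity, and
the thresholds are taken from the table above. The name asinh_sketch
is hypothetical.

```python
import math
import sys

TINY = 2.0 ** -26                       # below this, asinh(x) rounds to x
HUGE = math.sqrt(sys.float_info.max)    # above this, x*x would overflow


def asinh_sketch(x):
    """Interval-dispatched asinh; the |x| < 1 branch is a stand-in formula."""
    ax = abs(x)
    if ax < TINY:
        return x
    if ax < 1.0:
        # Stand-in for the polynomial: asinh(x) = log1p(x + x^2/(1 + sqrt(1 + x^2))).
        y = math.log1p(ax + ax * ax / (1.0 + math.sqrt(1.0 + ax * ax)))
    elif ax < HUGE:
        # Direct formula: log(x + sqrt(x^2 + 1)).
        y = math.log(ax + math.sqrt(ax * ax + 1.0))
    else:
        # x^2 + 1 is indistinguishable from x^2 here, so asinh(x) ~= log(2x).
        y = math.log(ax) + math.log(2.0)
    return math.copysign(y, x)  # asinh is odd
```

Splitting off the largest inputs avoids squaring values whose square
does not fit in a double.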
|
|
asinhf depends on logf, which has been copied over from the main math
directory. The only modification was to change the name logf to
optr_aor_log_f32 to resolve any ambiguity with libm.
Worst-case error is about 3.4 ULP, at very large input. There are 4
intervals with slightly different error behaviour, as follows:
Interval               Worst-case accuracy (ulp)
|x| < 2^-12            0
|x| < 1                1.3
|x| < sqrt(FLT_MAX)    2.0
|x| < infinity         3.4
|
|
These cause compile warnings and are unnecessary as strong_alias is a
macro.
|
|
Optimize
__memchr_aarch64_mte
__memrchr_aarch64
__strchrnul_aarch64_mte
__stpcpy_aarch64
__strcpy_aarch64
__strlen_aarch64_mte
using the shrn instruction for computing the nibble mask instead of
and + addp, which reduces instruction count.
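
The nibble-mask idea can be emulated in Python to show why a single
narrowing shift is enough. This is a sketch under assumptions: it
models a byte-wise compare producing 0xFF/0x00 lanes followed by a
shift-right-narrow by 4 over eight 16-bit lanes, uses hypothetical
function names, and does not reproduce the routines' actual code.

```python
def nibble_mask(chunk, target):
    """Return a 64-bit mask with one 4-bit nibble per byte of a 16-byte chunk."""
    if len(chunk) != 16:
        raise ValueError("chunk must be exactly 16 bytes")
    # Byte-wise compare: 0xFF where the byte matches, 0x00 elsewhere.
    cmp_bytes = [0xFF if b == target else 0x00 for b in chunk]
    mask = 0
    for lane in range(8):
        # Treat two adjacent bytes as one little-endian 16-bit lane.
        half = cmp_bytes[2 * lane] | (cmp_bytes[2 * lane + 1] << 8)
        # Shift right by 4 and keep the low 8 bits (the narrowing step).
        narrowed = (half >> 4) & 0xFF
        mask |= narrowed << (8 * lane)
    return mask


def first_match(chunk, target):
    """Index of the first matching byte in the chunk, or -1 if none."""
    mask = nibble_mask(chunk, target)
    if mask == 0:
        return -1
    trailing_zeros = (mask & -mask).bit_length() - 1
    return trailing_zeros >> 2  # four mask bits per input byte


if __name__ == "__main__":
    print(first_match(b"abcdefgh12345678", ord("d")))
```

Because every input byte maps to exactly four mask bits, the position
of the first match is the trailing-zero count divided by four.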
|
|
Merge stack pop instructions prior to returning from function. This also
introduces fixes to CFI offset calculations to reflect the register
ordering on push and pop instructions, with the lowest-numbered register
saved to the lowest memory address.
|
|
Merge stack pop instructions prior to returning from function. This also
introduces fixes to CFI offset calculations to reflect the register
ordering on push and pop instructions, with the lowest-numbered register
saved to the lowest memory address.
|
|
Fix build failure introduced by
commit 40b662ce7b65d5eaefa40fd8046d6f3c6b3238c1
string: add .fnstart and .fnend directives to ENTRY/END macros
|
|
Ensure BTI indirect branch landing pads (BTI) and pointer authentication
code generation (PAC) and verification instructions (BXAUT) are
conditionally added to assembly when branch protection is requested.
|
|
Ensure BTI indirect branch landing pads (BTI) and pointer authentication
code generation (PAC) and verification instructions (BXAUT) are
conditionally added to assembly when branch protection is requested.
|
|
Ensure BTI indirect branch landing pads (BTI) and pointer authentication
code generation (PAC) and verification instructions (BXAUT) are
conditionally added to assembly when branch protection is requested.
NOTE: The ENTRY_ALIGN() macro is factored out, as the .fnstart and
.cfi_startproc directives needed to be moved before L(fastpath_exit).
|
|
The header adds an assembler macro to handle Pointer Authentication and
Branch Target Identification assembly instructions in function prologues
and epilogues according to the flags selected at compile-time.
|
|
Modify the ENTRY_ALIGN and END assembler macros to mark the start and
end of functions for arm unwind tables.
Enables the pacbti epilogue function to emit .save{} directives for
stack unwinding.
|
|
Fix missing include directive for use of ENTRY_ALIGN and END macros.
|
|
Remove unnecessary sys/mman.h dependency.
|
|
The polynomial was mistakenly using double-precision fma, where
single is sufficiently accurate. New underflow special cases have
been handled accordingly.
|
|
Increase polynomial order to 12, and update summation scheme to
match AOR log. New coefficients are copied from AOR log.
|
|
Successfully ran tests and benchmarks. New routine is accurate to 3 ulps.
|
|
Successfully ran tests and benchmarks. New routine is accurate to 3 ulps.
|
|
Successfully ran tests and benchmarks. New routine is accurate to 3 ulps.
|
|
Successfully ran tests and benchmarks. New routine is accurate to 3.0 ulps.
|
|
Ran make check and benchmarks. New routine is accurate to 3.0 ULP.
|
|
Ran tests and benchmarks. The new routine is accurate to 2.0 ulps.
|