Age | Commit message | Author
2022-08-22 | string: arm: Augment M-profile PACBTI-enablement macros | Victor Do Nascimento
Modify previously defined PACBTI macros to allow for more flexible push/pop expressions at function prologues/epilogues, allowing further simplification of code predicated on the use of M-profile PACBTI hardware features. This patch also allows specifying whether generated pac keys are pushed onto the stack for leaf functions, where this may not be necessary.
It defines the following preprocessor macros:
* HAVE_PAC_LEAF: Indicates whether pac-signing has been requested for leaf functions.
* PAC_LEAF_PUSH_IP: Whether leaf functions should push the pac code to the stack irrespective of whether the ip register is clobbered in the function or not.
* PAC_CFI_ADJ: Given values for the above two parameters, this holds the calculated offset applied to default CFI address/offset values as a consequence of potentially pushing the pac-code to the stack.
It also defines the following assembler macros:
* prologue: In addition to pushing any callee-saved registers onto the stack, it generates any requested pacbti instructions. Pushed registers are specified via the optional `first', `last' and `savepac' macro arguments. When a single register number is provided, it pushes that register. When two register numbers are provided, they specify a range to save. If savepac is non-zero, the ip register is also saved. For example:
    prologue savepac=1     -> push {ip}
    prologue 1             -> push {r1}
    prologue 1 savepac=1   -> push {r1, ip}
    prologue 1 4           -> push {r1-r4}
    prologue 1 4 savepac=1 -> push {r1-r4, ip}
* epilogue: Pops registers off the stack and emits pac key signing instruction if requested. The optional `first', `last' and `savepac' arguments function as per the prologue macro, generating a pop instead of a push instruction.
* cfisavelist: prologue macro helper function, generating the necessary .cfi_offset directives associated with the push instruction. Therefore, the net effect of calling `prologue 1 2 savepac=1' is to generate the following:
    push {r1-r2, ip}
    .cfi_adjust_cfa_offset 12
    .cfi_offset 143, -12
    .cfi_offset 2, -8
    .cfi_offset 1, -4
* cfirestorelist: epilogue macro helper function, emitting .cfi_restore instructions prior to resetting the cfa offset. As such, calling `epilogue 1 2 savepac=1' will produce:
    pop {r1-r2, ip}
    .cfi_restore 143
    .cfi_restore 2
    .cfi_restore 1
    .cfi_def_cfa_offset 0
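The register-list rules described for the prologue and epilogue macros can be modelled in a few lines of Python. This is only a sketch of the expansion as the commit message describes it, not the assembler macros themselves: the names `reg_list`, `prologue` and `epilogue`, and the use of -1 as a sentinel for an omitted register number, are assumptions made for illustration. It models the push/pop only and ignores the pacbti and CFI directives.

```python
def reg_list(first=-1, last=-1, savepac=0):
    """Build the brace-enclosed register list for a push/pop.

    first/last are register numbers; -1 (a hypothetical sentinel) means
    "not given". A non-zero savepac appends the ip register.
    """
    regs = []
    if first >= 0:
        if last < 0 or last == first:
            regs.append(f"r{first}")          # single register
        else:
            regs.append(f"r{first}-r{last}")  # register range
    if savepac:
        regs.append("ip")
    return "{" + ", ".join(regs) + "}"


def prologue(first=-1, last=-1, savepac=0):
    """Model only the push emitted by the prologue macro."""
    return "push " + reg_list(first, last, savepac)


def epilogue(first=-1, last=-1, savepac=0):
    """Model only the pop emitted by the epilogue macro."""
    return "pop " + reg_list(first, last, savepac)


print(prologue(1))                # push {r1}
print(prologue(1, 4, savepac=1))  # push {r1-r4, ip}
print(epilogue(1, 2, savepac=1))  # pop {r1-r2, ip}
```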
2022-08-22 | string: arm: Prevent leaf function unwinding | Victor Do Nascimento
As leaf functions cannot throw exceptions, with EHABI only supporting synchronous exceptions, add support for emitting a `.cantunwind' directive prior to `.fnend' in ARM_FNEND preprocessor macro. This ensures no personality routine or exception table data is generated. Existing `.save' directives used in leaf functions are also removed. Built w/ arm-none-linux-gnueabihf, ran make check-string w/ qemu-arm-static.
2022-08-17 | pl/math: Add vector/SVE atan2 | Joe Ramsay
New routine is accurate to 2 ulps.
2022-08-17 | pl/math: Add vector/SVE atan2f | Joe Ramsay
New routine is accurate to 3 ulps.
2022-08-17 | pl/math: Add vector/SVE atan | Joe Ramsay
New routine uses polynomial on a reduced interval, and is accurate to 2.5 ulp.
2022-08-17 | pl/math: Add vector/SVE atanf | Joe Ramsay
New routine uses polynomial on a reduced interval, and is accurate to 2.9 ulp.
2022-08-17 | pl/math: Add Vector/SVE sin | Joe Ramsay
An implementation based on SVE trigonometric instructions. It relies on a range reduction similar to that of Vector/Neon sin, but to [-pi/4, pi/4] instead of [-pi/2, pi/2]. The estimated maximum error is 1.95 ULPs.
2022-08-17 | pl/math: Add Vector/SVE sinf | Joe Ramsay
An implementation based on Taylor series expansion of sin. The maximum measured error is 1.89 ULPs.
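As a rough illustration of a Taylor-series sine, the sketch below reduces the argument to [-pi/2, pi/2] and evaluates an odd polynomial in the reduced value. The reduction scheme, the polynomial degree (13) and the use of double precision are all assumptions chosen for clarity; the commit does not state which the SVE routine uses, and `sin_taylor` is a hypothetical name.

```python
import math

# Coefficients of sin(r) = r - r^3/3! + r^5/5! - ... written as a
# polynomial in r^2 (lowest order first), to be multiplied by r.
_SIN_COEFFS = [(-1.0) ** k / math.factorial(2 * k + 1) for k in range(7)]


def sin_taylor(x):
    """Approximate sin(x) for moderate |x| with a truncated Taylor series."""
    n = round(x / math.pi)      # nearest multiple of pi
    r = x - n * math.pi         # reduced argument in [-pi/2, pi/2]
    r2 = r * r
    p = 0.0
    for c in reversed(_SIN_COEFFS):
        p = p * r2 + c          # Horner evaluation in r^2
    result = r * p
    # sin(r + n*pi) = (-1)^n * sin(r)
    return -result if n % 2 else result


for x in (0.1, 2.5, 10.0):
    print(x, sin_taylor(x), math.sin(x))
```

The naive `x - n * math.pi` reduction loses accuracy for large |x|; production routines typically split pi into several parts for this step.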
2022-08-15 | pl/math: Audit Neon special-case handlers | Joe Ramsay
Prevent inlining in most cases - change to use AOR style (NOINLINE).
2022-08-15 | pl/math: Add vector/Neon log2 | Joe Ramsay
New routine uses the same algorithm as vector log10, with scaled coefficients. Accurate to 2.5 ulp.
2022-08-15 | pl/math: Add vector/Neon log2f | Joe Ramsay
The new routine is based on the scalar algorithm in the main math directory, but with all arithmetic done in single precision. invc is represented with a high and a low part. The routine is accurate to 2.6 ULPs.
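Representing a value with a high and a low single-precision part can be sketched as follows. This is a generic illustration of the technique, not the routine's table-generation code: the helper names `to_f32` and `split_hi_lo` and the sample value c = 1.3 are hypothetical, and the commit does not say how the two parts of invc are derived.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split_hi_lo(v):
    """Split v into two single-precision values with hi + lo close to v."""
    hi = to_f32(v)       # leading ~24 bits of v
    lo = to_f32(v - hi)  # what single precision could not hold
    return hi, lo


c = 1.3                  # hypothetical table entry
invc = 1.0 / c
hi, lo = split_hi_lo(invc)
print("error with hi only:", abs(hi - invc))
print("error with hi + lo:", abs((hi + lo) - invc))
```

Carrying the low part lets arithmetic done entirely in single precision recover much of the accuracy that a single float would lose.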
2022-08-03 | string: arm: Augment unwind information for PAC instructions | Victor Do Nascimento
Add the `.cfi_register 143, 12' directive immediately after the pac instruction is emitted. This ensures unwind info consumers know at once that, if they need the PAC for the function, they can find it in the ip register.
2022-08-03 | string: arm: Update feature test macro use in .arch selection | Victor Do Nascimento
Move away from use of the non-portable __ARM_ARCH_8M_MAIN__ feature test macro in favour of __ARM_ARCH >= 8 when selecting the target architecture.
2022-08-03 | string: arm: Implement conditional leaf PAC signing | Victor Do Nascimento
Adjust the criterion for M-profile PACBTI signing of leaf functions to be contingent on the +leaf option being passed to the -mbranch-protection compilation option.
2022-07-21 | pl/math: Add Vector/SVE cos. | Joe Ramsay
An implementation based on SVE trigonometric instructions. It relies on the same range reduction as Vector/Neon cos, with a slight modification of the shift. The maximum measured error is 2.11 ULPs around x = 205.522.
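For readers unfamiliar with the ULP figures quoted throughout this log, the sketch below shows one way an error in ULPs can be measured in double precision. It is an illustration only and not the project's ulp test tool: `ulp_error` and `max_ulp_error` are hypothetical names, `math.ulp` requires Python 3.9 or later, and a real measurement would compare against a higher-precision reference rather than `math.cos`.

```python
import math


def ulp_error(got, want):
    """Distance between got and want, in units of ulp(want)."""
    return abs(got - want) / math.ulp(want)


def max_ulp_error(approx, reference, inputs):
    """Return (x, error) for the input with the largest ULP error."""
    worst_x, worst = None, 0.0
    for x in inputs:
        err = ulp_error(approx(x), reference(x))
        if err > worst:
            worst_x, worst = x, err
    return worst_x, worst


# Hypothetical approximation under test: cos(x) computed as sin(x + pi/2).
inputs = [205.5 + i * 0.001 for i in range(50)]
print(max_ulp_error(lambda x: math.sin(x + math.pi / 2), math.cos, inputs))
```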
2022-07-21 | pl/math: Add Vector/SVE cosf. | Joe Ramsay
An implementation based on SVE trigonometric instructions. It relies on the same range reduction as Vector/Neon cosf, with a slight modification of the shift. The maximum measured error is 2.06 ULPs.
2022-07-21 | Update config.mk example to define WANT_SVE_MATH. | Pierre Blanchard
This is required when running `make check`, in order to avoid running ulp tests on SVE routines when SVE is disabled. The definition of cflags for SVE is kept in the config file to allow user control over `-march`.
2022-07-19 | Snap for 8843601 from a597e56fe5d1d57af3f980c8d49c29b3375d23e5 to udc-release | Android Build Coastguard Worker
Change-Id: I983613368e28bf4b4f2b952e93512ee0ea5cf532
2022-07-18 | Add library for core memory routines. am: 4d56ab71ea am: 6c969b7270 am: 5e88621e0d am: 0b204e7d3b am: d810904721 | Andrew Walbran
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: Ib6f68f1c58fcda877d5640a60c897bf10abda16d
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18 | Add library for core memory routines. am: 4d56ab71ea am: 6c969b7270 am: 5e88621e0d am: 0b204e7d3b | Andrew Walbran
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: Iecb24cc82429ee083523a3769049cb7f907fb48b
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18 | Add library for core memory routines. am: 4d56ab71ea am: 6c969b7270 am: 5e88621e0d | Andrew Walbran
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: I1f5046f625ca1ae155065ea7b3a1a584ddc9bda9
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18 | Add library for core memory routines. am: 4d56ab71ea am: 6c969b7270 | Andrew Walbran
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: I69e4ac7d42caef55ea972e6a9406928b3052f413
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18 | Add library for core memory routines. am: 4d56ab71ea | Andrew Walbran
Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622
Change-Id: I5ff52167cce5d4fe77e559f4d19015ffaa4898f9
Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18 | Add library for core memory routines. | Andrew Walbran
The Rust compiler may emit calls to memcmp, memcpy, memmove and memset. Usually these come from libc, but in bare-metal binaries they must be provided somewhere. For now we only use bare-metal Rust on aarch64, so the Arm optimized-routines implementation seems like the best choice.
Bug: 223166344
Test: atest vmbase_example.integration_test
Change-Id: Id2439208160411dcde12be76fbaa22c30c24b81d
2022-07-15 | pl/math: Add vector/Neon asinhf | Joe Ramsay
The new routine uses vector log1pf, and is accurate to 2.7 ulp.
2022-07-14 | pl/math: Add vector/Neon log1pf | Joe Ramsay
The new routine is a Neon port of the scalar algorithm, using the same coefficients. The worst-case error is about 2.1 ulp when using Estrin, at the same value as the scalar algorithm.
2022-07-14 | pl/math: Add scalar log1pf | Joe Ramsay
The new routine uses a polynomial on a reduced interval. Worst-case error is about 2.1 ULP, with an option in math_config.h to use Horner instead of Estrin for the polynomial, which gives worst-case error of 1.3 ULP.
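The two evaluation schemes mentioned here can be sketched as below. Horner forms one long dependency chain, while Estrin pairs up coefficients so that independent sub-expressions can be computed in parallel, at the cost of a different rounding pattern. The coefficients are illustrative only (the first Taylor coefficients of log1p(x)/x), not those used by the routine, and the function names are hypothetical.

```python
def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... as one serial chain."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial by repeatedly combining adjacent pairs."""
    terms = list(coeffs)
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0.0)  # pad so every term has a partner
        # Each pair (a, b) becomes a + b * power; pairs are independent.
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power *= power         # x, x^2, x^4, ...
    return terms[0]


# Illustrative coefficients: log1p(x)/x = 1 - x/2 + x^2/3 - ...
coeffs = [(-1.0) ** k / (k + 1) for k in range(8)]
x = 0.1
print(horner(coeffs, x), estrin(coeffs, x))
```

Both functions compute the same polynomial in exact arithmetic; any difference in the printed values comes from the order of floating-point operations, which is the effect behind the differing worst-case errors quoted in the commit.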
2022-07-12 | pl/math: Add scalar log1p | Joe Ramsay
New routine uses a polynomial on a reduced interval. Worst-case error is about 1.7 ULP.
2022-07-12 | pl/math: Add scalar asinh | Joe Ramsay
The new routine uses a similar approach to asinhf, using a polynomial only in the region where either returning x or calculating the result directly is not sufficiently precise. Worst-case error is about 2 ULP, close to 1. There are 4 intervals with slightly different error behaviour, as follows:
    Interval                 Worst-case accuracy (ulp)
    |x| < 2^-26              0.0
    |x| < 1                  1.5
    |x| < ~sqrt(DBL_MAX)     2.0
    |x| < infinity           1.0
log has been copied from the main math directory so that it can be used in asinh. The only modifications to the relevant files are to remove aliases and rename log itself to an internal 'helper' name.
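The four-interval structure can be sketched as follows. This is not the routine's algorithm: where the commit uses a polynomial for |x| < 1, the sketch substitutes a log1p identity as a stand-in, and the formulas used in the two upper intervals are assumptions based on the standard identities asinh(x) = log(x + sqrt(x^2 + 1)) and asinh(x) ~ log(2x) for very large x. The name `asinh_sketch` is hypothetical.

```python
import math
import sys


def asinh_sketch(x):
    """Interval-based asinh, mirroring the thresholds quoted in the commit."""
    ax = abs(x)
    if ax < 2.0 ** -26:
        return x                                   # asinh(x) rounds to x
    if ax < 1.0:
        # Stand-in for the polynomial region: a log1p form that avoids
        # cancellation for small arguments.
        r = math.log1p(ax + ax * ax / (1.0 + math.sqrt(1.0 + ax * ax)))
    elif ax < math.sqrt(sys.float_info.max):
        r = math.log(ax + math.sqrt(ax * ax + 1.0))  # direct formula
    else:
        r = math.log(ax) + math.log(2.0)           # x*x would overflow
    return math.copysign(r, x)                     # asinh is odd


for x in (1e-10, 0.5, 3.0, 1e200):
    print(x, asinh_sketch(x), math.asinh(x))
```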
2022-07-12 | pl/math: Add scalar asinhf | Joe Ramsay
asinhf depends on logf, which has been copied over from the main math directory. The only modification was to change the name logf to optr_aor_log_f32 to resolve any ambiguity with libm. Worst-case error is about 3.4 ULP, at very large input. There are 4 intervals with slightly different error behaviour, as follows:
    Interval                 Worst-case accuracy (ulp)
    |x| < 2^-12              0
    |x| < 1                  1.3
    |x| < sqrt(FLT_MAX)      2.0
    |x| < infinity           3.4
2022-07-11 | pl/math: Remove some stray semi-colons | Joe Ramsay
These cause compile warnings and are unnecessary as strong_alias is a macro.
2022-07-06 | string: Optimize string functions with shrn instruction | Danila Kutenin
Optimize __memchr_aarch64_mte, __memrchr_aarch64, __strchrnul_aarch64_mte, __stpcpy_aarch64, __strcpy_aarch64 and __strlen_aarch64_mte using the shrn instruction to compute the nibble mask instead of and + addp, which reduces the instruction count.
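The nibble-mask idea can be emulated in plain Python. A byte-wise compare yields 16 bytes of 0x00 or 0xff; treating them as eight little-endian 16-bit lanes, shifting each right by 4 and narrowing to 8 bits (the effect of a shift-right-narrow by 4) leaves 4 bits per input byte in a 64-bit mask. The sketch below is a scalar model under those assumptions, not the assembly from the commit, and the function names are hypothetical.

```python
def shrn_nibble_mask(cmp_bytes):
    """Model shift-right-narrow by 4 on eight 16-bit lanes.

    cmp_bytes holds 16 values, each 0x00 or 0xff. Input byte i ends up
    as nibble i of the returned 64-bit mask.
    """
    if len(cmp_bytes) != 16:
        raise ValueError("expected exactly 16 bytes")
    mask = 0
    for lane in range(8):
        h = cmp_bytes[2 * lane] | (cmp_bytes[2 * lane + 1] << 8)
        narrowed = (h >> 4) & 0xFF       # shift right, keep the low 8 bits
        mask |= narrowed << (8 * lane)
    return mask


def first_match_index(data, ch):
    """Index of the first byte equal to ch in a 16-byte chunk, or -1."""
    cmp_bytes = [0xFF if b == ch else 0x00 for b in data]  # byte-wise compare
    mask = shrn_nibble_mask(cmp_bytes)
    if mask == 0:
        return -1
    trailing_zeros = (mask & -mask).bit_length() - 1
    return trailing_zeros // 4           # 4 mask bits per input byte


chunk = b"hello, world!!!!"
print(first_match_index(chunk, ord("o")))  # 4
print(first_match_index(chunk, ord("z")))  # -1
```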
2022-07-04 | string: simplify M-profile strlen PACBTI epilogue | Victor Do Nascimento
Merge stack pop instructions prior to returning from function. This also introduces fixes to CFI offset calculations to reflect the register ordering on push and pop instructions, with the lowest-numbered register saved to the lowest memory address.
2022-07-04 | string: simplify M-profile memchr PACBTI epilogue | Victor Do Nascimento
Merge stack pop instructions prior to returning from function. This also introduces fixes to CFI offset calculations to reflect the register ordering on push and pop instructions, with the lowest-numbered register saved to the lowest memory address.
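The ordering rule stated above (lowest-numbered register at the lowest address) determines each register's offset from the canonical frame address. A minimal sketch, assuming 4-byte registers and a CFA equal to the stack pointer before the push; `cfi_offsets` is a hypothetical helper and the register numbers are arbitrary examples:

```python
def cfi_offsets(regs):
    """Map each pushed register number to its offset from the CFA.

    The lowest-numbered register sits at the lowest address, so it gets
    the most negative offset; the highest-numbered one sits at CFA - 4.
    """
    ordered = sorted(regs)
    n = len(ordered)
    return {reg: -4 * (n - i) for i, reg in enumerate(ordered)}


for reg, off in cfi_offsets([4, 5, 6]).items():
    print(f".cfi_offset {reg}, {off}")
```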
2022-06-24 | string: Fix ARM_FNSTART on non-arm targets | Szabolcs Nagy
Fix build failure introduced by commit 40b662ce7b65d5eaefa40fd8046d6f3c6b3238c1 ("string: add .fnstart and .fnend directives to ENTRY/END macros").
2022-06-22 | string: Add M-profile PACBTI implementation of memchr | Victor Do Nascimento
Ensure BTI indirect branch landing pads (BTI) and pointer authentication code generation (PAC) and verification instructions (BXAUT) are conditionally added to assembly when branch protection is requested.
2022-06-22 | string: Add M-profile PACBTI implementation of strlen | Victor Do Nascimento
Ensure BTI indirect branch landing pads (BTI) and pointer authentication code generation (PAC) and verification instructions (BXAUT) are conditionally added to assembly when branch protection is requested.
2022-06-22 | string: Add M-profile PACBTI implementation of strcmp | Victor Do Nascimento
Ensure BTI indirect branch landing pads (BTI) and pointer authentication code generation (PAC) and verification instructions (BXAUT) are conditionally added to assembly when branch protection is requested. NOTE: the ENTRY_ALIGN() macro was factored out, as the .fnstart and .cfi_startproc directives needed to be moved before L(fastpath_exit).
2022-06-22 | string: Add M-profile PACBTI-enablement header file | Victor Do Nascimento
Header adds assembler macro to handle Pointer Authentication and Branch Target Identification assembly instructions in function prologues and epilogues according to selected flags at compile-time.
2022-06-22 | string: add .fnstart and .fnend directives to ENTRY/END macros | Victor Do Nascimento
Modify the ENTRY_ALIGN and END assembler macros to mark the start and end of functions for arm unwind tables. This enables the pacbti epilogue function to emit .save{} directives for stack unwinding.
2022-06-22 | string: Fix header file issue in arm strcmp-armv6m.S | Victor Do Nascimento
Fix missing include directive for use of ENTRY_ALIGN and END macros.
2022-06-22 | string: Fix header file issue in strlen test | Victor Do Nascimento
Remove unnecessary sys/mman.h dependency.
2022-06-22 | pl/math: Use single-precision fma in atan2f | Joe Ramsay
The polynomial was mistakenly using double-precision fma, where single is sufficiently accurate. New underflow special cases have been handled accordingly.
2022-06-20 | pl/math: Improve accuracy in log10 | Joe Ramsay
Increase polynomial order to 12, and update summation scheme to match AOR log. New coefficients are copied from AOR log.
2022-06-17 | pl/math: Add vector/Neon atanf | Joe Ramsay
Successfully ran tests and benchmarks. New routine is accurate to 3 ulps.
2022-06-17 | pl/math: Add vector/Neon atan2f | Joe Ramsay
Successfully ran tests and benchmarks. New routine is accurate to 3 ulps.
2022-06-16 | pl/math: Add vector/Neon atan | Joe Ramsay
Successfully ran tests and benchmarks. New routine is accurate to 3 ulps.
2022-06-16 | pl/math: Add vector/Neon atan2 | Joe Ramsay
Successfully ran tests and benchmarks. New routine is accurate to 3.0 ulps.
2022-06-16 | pl/math: Add scalar atan2f | Joe Ramsay
Ran make check and benchmarks. New routine is accurate to 3.0 ULP.
2022-06-16 | pl/math: Add scalar atan2 | Joe Ramsay
Ran tests and benchmarks. The new routine is accurate to 2.0 ulps.