external/arm-optimized-routines.git

Age	Commit message	Author
2022-08-22	string: arm: Augment M-profile PACBTI-enablement macros	Victor Do Nascimento
	Modify the previously defined PACBTI macros to allow more flexible push/pop expressions in function prologues and epilogues, further simplifying code predicated on the use of M-profile PACBTI hardware features. This patch also allows specifying whether generated pac keys are pushed onto the stack for leaf functions, where this may not be necessary.
	It defines the following preprocessor macros:
	* HAVE_PAC_LEAF: Indicates whether pac-signing has been requested for leaf functions.
	* PAC_LEAF_PUSH_IP: Whether leaf functions should push the pac code to the stack irrespective of whether the ip register is clobbered in the function or not.
	* PAC_CFI_ADJ: Given values for the above two parameters, this holds the calculated offset applied to default CFI address/offset values as a consequence of potentially pushing the pac-code to the stack.
	It also defines the following assembler macros:
	* prologue: In addition to pushing any callee-saved registers onto the stack, it generates any requested pacbti instructions. Pushed registers are specified via the optional `first', `last' and `savepac' macro arguments. When a single register number is provided, it pushes that register. When two register numbers are provided, they specify a range to save. If savepac is non-zero, the ip register is also saved. For example:
	    prologue savepac=1     -> push {ip}
	    prologue 1             -> push {r1}
	    prologue 1 savepac=1   -> push {r1, ip}
	    prologue 1 4           -> push {r1-r4}
	    prologue 1 4 savepac=1 -> push {r1-r4, ip}
	* epilogue: Pops registers off the stack and emits the pac key signing instruction if requested. The optional `first', `last' and `savepac' arguments function as per the prologue macro, generating a pop instead of a push instruction.
	* cfisavelist: prologue macro helper function, generating the necessary .cfi_offset directives associated with the push instruction. Therefore, the net effect of calling `prologue 1 2 savepac=1' is to generate the following:
	    push {r1-r2, ip}
	    .cfi_adjust_cfa_offset 12
	    .cfi_offset 143, -12
	    .cfi_offset 2, -8
	    .cfi_offset 1, -4
	* cfirestorelist: epilogue macro helper function, emitting .cfi_restore instructions prior to resetting the cfa offset. As such, calling `epilogue 1 2 savepac=1' will produce:
	    pop {r1-r2, ip}
	    .cfi_restore 143
	    .cfi_restore 2
	    .cfi_restore 1
	    .cfi_def_cfa_offset 0
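The argument rules for the prologue macro described above can be modelled in a few lines. This is only a sketch of the register-list expansion, not the assembler macro itself: the function name `push_list` and its keyword arguments are hypothetical, and the behaviour with no arguments at all is an assumption, since the commit message does not cover that case.

```python
def push_list(first=None, last=None, savepac=False):
    """Model the push instruction implied by the prologue macro arguments.

    `first` and `last` are register numbers; `savepac` adds the ip register.
    Hypothetical helper for illustration, not part of the library.
    """
    regs = []
    if first is not None:
        if last is None or last == first:
            regs.append(f"r{first}")          # single register
        else:
            regs.append(f"r{first}-r{last}")  # register range
    if savepac:
        regs.append("ip")                     # ip carries the pac code
    if not regs:
        return ""  # assumption: nothing to save means no push is emitted
    return "push {" + ", ".join(regs) + "}"


for args, kwargs in [((1,), {}), ((1, 4), {}), ((1, 4), {"savepac": True})]:
    print(args, kwargs, "->", push_list(*args, **kwargs))
```

The same expansion with `pop` in place of `push` would correspond to the epilogue macro.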
2022-08-22	string: arm: Prevent leaf function unwinding	Victor Do Nascimento
	As leaf functions cannot throw exceptions, with EHABI only supporting synchronous exceptions, add support for emitting a `.cantunwind' directive prior to `.fnend' in ARM_FNEND preprocessor macro. This ensures no personality routine or exception table data is generated. Existing `.save' directives used in leaf functions are also removed. Built w/ arm-none-linux-gnueabihf, ran make check-string w/ qemu-arm-static.
2022-08-17	pl/math: Add vector/SVE atan2	Joe Ramsay
	New routine is accurate to 2 ulps.
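The accuracy figures quoted throughout this log are in units in the last place (ULP). As a rough illustration of what such a number means, the sketch below measures the distance between a computed value and a reference in ULPs of the reference. It assumes Python 3.9 or later for `math.ulp` and `math.nextafter`, and it uses a double-precision reference for simplicity; a real accuracy test would compare against a higher-precision reference. The name `ulp_error` is hypothetical.

```python
import math


def ulp_error(got: float, want: float) -> float:
    """Distance between `got` and `want`, in units of ulp(want)."""
    return abs(got - want) / math.ulp(want)


want = math.atan2(1.0, 2.0)
# The neighbouring double above `want` is, by construction, one ULP away.
one_off = math.nextafter(want, math.inf)
print(ulp_error(one_off, want))
```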
2022-08-17	pl/math: Add vector/SVE atan2f	Joe Ramsay
	New routine is accurate to 3 ulps.
2022-08-17	pl/math: Add vector/SVE atan	Joe Ramsay
	New routine uses polynomial on a reduced interval, and is accurate to 2.5 ulp.
2022-08-17	pl/math: Add vector/SVE atanf	Joe Ramsay
	New routine uses polynomial on a reduced interval, and is accurate to 2.9 ulp.
2022-08-17	pl/math: Add Vector/SVE sin	Joe Ramsay
	An implementation based on SVE trigonometric instructions. It relies on a range reduction similar to that of Vector/Neon sin, but to [-pi/4, pi/4] instead of [-pi/2, pi/2]. The estimated maximum error is 1.95 ULPs.
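A reduction to [-pi/4, pi/4] means the input is split into a multiple of pi/2 plus a small remainder, and the quadrant then selects between a sine and a cosine approximation of the remainder. The sketch below shows that idea in double-precision Python. It is not the SVE routine: plain Taylor polynomials stand in for the SVE trigonometric instructions, the reduction uses a single rounded pi/2 constant (so it is only reasonable for moderate inputs), and all names are hypothetical.

```python
import math


def _poly_sin(r):
    # Taylor series of sin(r) up to r^13, adequate for |r| <= pi/4.
    r2 = r * r
    return r * (1 + r2 * (-1 / 6 + r2 * (1 / 120 + r2 * (-1 / 5040
        + r2 * (1 / 362880 + r2 * (-1 / 39916800 + r2 / 6227020800))))))


def _poly_cos(r):
    # Taylor series of cos(r) up to r^12, adequate for |r| <= pi/4.
    r2 = r * r
    return 1 + r2 * (-1 / 2 + r2 * (1 / 24 + r2 * (-1 / 720
        + r2 * (1 / 40320 + r2 * (-1 / 3628800 + r2 / 479001600)))))


def sin_reduced(x):
    n = round(x / (math.pi / 2))   # nearest multiple of pi/2
    r = x - n * (math.pi / 2)      # remainder, |r| <= pi/4 up to rounding
    quadrant = n % 4               # non-negative in Python, even for n < 0
    if quadrant == 0:
        return _poly_sin(r)
    if quadrant == 1:
        return _poly_cos(r)
    if quadrant == 2:
        return -_poly_sin(r)
    return -_poly_cos(r)


print(sin_reduced(2.0), math.sin(2.0))
```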
2022-08-17	pl/math: Add Vector/SVE sinf	Joe Ramsay
	An implementation based on the Taylor series expansion of sin. The maximum measured error is 1.89 ULPs.
2022-08-15	pl/math: Audit Neon special-case handlers	Joe Ramsay
	Prevent inlining in most cases - change to use AOR style (NOINLINE).
2022-08-15	pl/math: Add vector/Neon log2	Joe Ramsay
	New routine uses the same algorithm as vector log10, with scaled coefficients. Accurate to 2.5 ulp.
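Scaling coefficients works because log2(x) = ln(x) / ln(2): a polynomial that approximates a natural logarithm can be turned into one for log2 by dividing every coefficient by ln(2) once, ahead of time. The sketch below illustrates this with Taylor coefficients of ln(1 + r). These are illustrative values only, not the coefficients used by the routine, and the names are hypothetical.

```python
import math

# Taylor coefficients of ln(1 + r) for r^1 .. r^12 (illustrative only).
LN_COEFFS = [(-1) ** (k + 1) / k for k in range(1, 13)]
# log2(1 + r) = ln(1 + r) / ln(2), so each coefficient is scaled once.
LOG2_COEFFS = [c / math.log(2) for c in LN_COEFFS]


def eval_poly(coeffs, r):
    """Evaluate sum(coeffs[k] * r**(k + 1)) with Horner's scheme."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * r + c
    return acc * r


r = 0.05
print(eval_poly(LOG2_COEFFS, r), math.log2(1 + r))
```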
2022-08-15	pl/math: Add vector/Neon log2f	Joe Ramsay
	The new routine is based on the scalar algorithm in the main math directory, but with all arithmetic done in single precision. invc is represented with a high and a low part. The routine is accurate to 2.6 ULPs.
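Representing a constant with a high and a low part is a common way to keep extra precision when all arithmetic is single precision: the high part is the constant rounded to float, and the low part is the rounded remainder. The sketch below emulates this in Python by rounding through `struct`. The value used for `invc` is a made-up example, not an entry from the routine's table, and the helper names are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split_hi_lo(c: float):
    """Split c into two single-precision values with hi + lo close to c."""
    hi = to_f32(c)
    lo = to_f32(c - hi)   # what the high part missed, itself rounded to float
    return hi, lo


invc = 1.0 / 1.3          # hypothetical table entry
hi, lo = split_hi_lo(invc)
print(hi, lo, abs(hi - invc), abs(hi + lo - invc))
```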
2022-08-03	string: arm: Augment unwind information for PAC instructions	Victor Do Nascimento
	Add the `.cfi_register 143, 12' directive immediately after the pac instruction is emitted. This ensures that unwind info consumers know immediately that, if they need the PAC for the function, they can find it in the ip register.
2022-08-03	string: arm: Update feature test macro use in .arch selection	Victor Do Nascimento
	Move away from use of the non-portable __ARM_ARCH_8M_MAIN__ feature test macro in favour of __ARM_ARCH >= 8 when selecting the target architecture.
2022-08-03	string: arm: Implement conditional leaf PAC signing	Victor Do Nascimento
	Adjust the criterion for M-profile PACBTI signing of leaf functions to be contingent on the +leaf option being passed to the -mbranch-protection compilation option.
2022-07-21	pl/math: Add Vector/SVE cos.	Joe Ramsay
	An implementation based on SVE trigonometric instructions. It relies on the same range reduction as Vector/Neon cos, with a slight modification of the shift. The maximum measured error is 2.11 ULPs around x = 205.522.
2022-07-21	pl/math: Add Vector/SVE cosf.	Joe Ramsay
	An implementation based on SVE trigonometric instructions. It relies on the same range reduction as Vector/Neon cosf, with a slight modification of the shift. The maximum measured error is 2.06 ULPs.
2022-07-21	Update config.mk example to define WANT_SVE_MATH.	Pierre Blanchard
	This is required when running `make check`, in order to avoid running ulp tests on SVE routines when SVE is disabled. The definition of cflags for SVE is kept in the config file to allow user control over `-march`.
2022-07-19	Snap for 8843601 from a597e56fe5d1d57af3f980c8d49c29b3375d23e5 to udc-release	Android Build Coastguard Worker
	Change-Id: I983613368e28bf4b4f2b952e93512ee0ea5cf532
2022-07-18	Add library for core memory routines. am: 4d56ab71ea am: 6c969b7270 am: 5e88621e0d am: 0b204e7d3b am: d810904721	Andrew Walbran
	Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622 Change-Id: Ib6f68f1c58fcda877d5640a60c897bf10abda16d Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18	Add library for core memory routines. am: 4d56ab71ea am: 6c969b7270 am: 5e88621e0d am: 0b204e7d3b	Andrew Walbran
	Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622 Change-Id: Iecb24cc82429ee083523a3769049cb7f907fb48b Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18	Add library for core memory routines. am: 4d56ab71ea am: 6c969b7270 am: 5e88621e0d	Andrew Walbran
	Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622 Change-Id: I1f5046f625ca1ae155065ea7b3a1a584ddc9bda9 Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18	Add library for core memory routines. am: 4d56ab71ea am: 6c969b7270	Andrew Walbran
	Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622 Change-Id: I69e4ac7d42caef55ea972e6a9406928b3052f413 Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18	Add library for core memory routines. am: 4d56ab71ea	Andrew Walbran
	Original change: https://android-review.googlesource.com/c/platform/external/arm-optimized-routines/+/2153622 Change-Id: I5ff52167cce5d4fe77e559f4d19015ffaa4898f9 Signed-off-by: Automerger Merge Worker <android-build-automerger-merge-worker@system.gserviceaccount.com>
2022-07-18	Add library for core memory routines.	Andrew Walbran
	The Rust compiler may emit calls to memcmp, memcpy, memmove and memset. Usually these come from libc, but in bare-metal binaries they must be provided somewhere. For now we only use bare-metal rust on aarch64, so the Arm optimized-routines implementation seems like the best choice. Bug: 223166344 Test: atest vmbase_example.integration_test Change-Id: Id2439208160411dcde12be76fbaa22c30c24b81d
2022-07-15	pl/math: Add vector/Neon asinhf	Joe Ramsay
	The new routine uses vector log1pf, and is accurate to 2.7 ulp.
2022-07-14	pl/math: Add vector/Neon log1pf	Joe Ramsay
	The new routine is a Neon port of the scalar algorithm, using the same coefficients. The worst-case error is about 2.1 ulp when using Estrin, at the same value as the scalar algorithm.
2022-07-14	pl/math: Add scalar log1pf	Joe Ramsay
	The new routine uses a polynomial on a reduced interval. Worst-case error is about 2.1 ULP, with an option in math_config.h to use Horner instead of Estrin for the polynomial, which gives worst-case error of 1.3 ULP.
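Horner and Estrin evaluate the same polynomial with different groupings of operations: Horner is one long dependency chain, while Estrin combines coefficient pairs that can be computed independently and then merges them using successive squares of x. Because the rounding happens in a different order, the two schemes can give slightly different floating-point results, which fits the different worst-case errors quoted above. The sketch below is a generic illustration, with coefficients in increasing-degree order; it is not the code from math_config.h or the routine.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) as one sequential chain."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial by repeatedly combining adjacent pairs."""
    terms = list(coeffs)
    if not terms:
        return 0.0
    while len(terms) > 1:
        paired = []
        for i in range(0, len(terms) - 1, 2):
            # Each pair is independent of the others at this level.
            paired.append(terms[i] + x * terms[i + 1])
        if len(terms) % 2:
            paired.append(terms[-1])   # odd coefficient carried up unchanged
        terms = paired
        x = x * x                      # the next level works in powers of x^2
    return terms[0]


coeffs = [1.0, -0.5, 1.0 / 3.0, -0.25, 0.2]   # made-up example coefficients
print(horner(coeffs, 0.1), estrin(coeffs, 0.1))
```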
2022-07-12	pl/math: Add scalar log1p	Joe Ramsay
	New routine uses a polynomial on reduced interval. Worst-case error is about 1.7 ULP.
2022-07-12	pl/math: Add scalar asinh	Joe Ramsay
	The new routine uses a similar approach to asinhf, using a polynomial only in the region where either returning x or calculating the result directly is not sufficiently precise. Worst-case error is about 2 ULP, for inputs close to 1. There are 4 intervals with slightly different error behaviour, as follows:
	    Interval                Worst-case accuracy (ulp)
	    |x| < 2^-26             0.0
	    |x| < 1                 1.5
	    |x| < ~sqrt(DBL_MAX)    2.0
	    |x| < infinity          1.0
	log has been copied from the main math directory so that it can be used in asinh. The only modifications to the relevant files are to remove aliases and rename log itself to an internal 'helper' name.
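The four intervals correspond to four ways of computing the result. The sketch below mirrors that structure in Python, with one stated substitution: where the routine uses a polynomial for |x| < 1, this sketch uses a log1p-based identity instead, because the routine's coefficients are not given here. The thresholds follow the intervals listed above; the function and constant names are hypothetical.

```python
import math
import sys

TINY = 2.0 ** -26                        # below this, asinh(x) rounds to x
HUGE = math.sqrt(sys.float_info.max)     # above this, x * x would overflow


def asinh_sketch(x):
    ax = abs(x)
    if ax < TINY:
        res = ax
    elif ax < 1.0:
        # Stand-in for the polynomial: asinh(x) = log1p(x + x^2 / (1 + sqrt(1 + x^2)))
        res = math.log1p(ax + ax * ax / (1.0 + math.sqrt(1.0 + ax * ax)))
    elif ax < HUGE:
        # Direct formula: asinh(x) = log(x + sqrt(x^2 + 1))
        res = math.log(ax + math.sqrt(ax * ax + 1.0))
    else:
        # For very large x, sqrt(x^2 + 1) is x, so asinh(x) = log(2x)
        res = math.log(ax) + math.log(2.0)
    return math.copysign(res, x)         # asinh is an odd function


for v in (1e-10, 0.5, -3.0, 1e200):
    print(v, asinh_sketch(v), math.asinh(v))
```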
2022-07-12	pl/math: Add scalar asinhf	Joe Ramsay
	asinhf depends on logf, which has been copied over from the main math directory. The only modification was to change the name logf to optr_aor_log_f32 to resolve any ambiguity with libm. Worst-case error is about 3.4 ULP, at very large input. There are 4 intervals with slightly different error behaviour, as follows:
	    Interval                Worst-case accuracy (ulp)
	    |x| < 2^-12             0
	    |x| < 1                 1.3
	    |x| < sqrt(FLT_MAX)     2.0
	    |x| < infinity          3.4
2022-07-11	pl/math: Remove some stray semi-colons	Joe Ramsay
	These cause compile warnings and are unnecessary as strong_alias is a macro.
2022-07-06	string: Optimize string functions with shrn instruction	Danila Kutenin
	Optimize __memchr_aarch64_mte __memrchr_aarch64 __strchrnul_aarch64_mte __stpcpy_aarch64 __strcpy_aarch64 __strlen_aarch64_mte using the shrn instruction for computing the nibble mask instead of and + addp, which reduces instruction count.
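The nibble mask mentioned above can be emulated in plain integers. After a byte-wise compare, every byte of a 16-byte vector is 0xFF or 0x00; treating the vector as eight 16-bit lanes, shifting each lane right by 4 and narrowing it to 8 bits (which is what shrn does) leaves 4 bits per original byte packed into a 64-bit value. The sketch below models this for a little-endian layout and uses the count of trailing zero bits to locate the first match. It is an emulation for illustration, with hypothetical function names, not the assembly from the routines.

```python
def nibble_mask(chunk: bytes, target: int) -> int:
    """Return a 64-bit mask with one nibble (0xF or 0x0) per input byte."""
    if len(chunk) != 16:
        raise ValueError("expected a 16-byte chunk")
    # Byte-wise compare: 0xFF where the byte matches, 0x00 elsewhere.
    cmp_bytes = bytes(0xFF if b == target else 0x00 for b in chunk)
    mask = 0
    for lane in range(8):
        half = int.from_bytes(cmp_bytes[2 * lane:2 * lane + 2], "little")
        narrowed = (half >> 4) & 0xFF        # shift right by 4, keep low 8 bits
        mask |= narrowed << (8 * lane)
    return mask


def first_match(chunk: bytes, target: int) -> int:
    """Index of the first byte equal to target, or -1 if there is none."""
    mask = nibble_mask(chunk, target)
    if mask == 0:
        return -1
    trailing_zeros = (mask & -mask).bit_length() - 1
    return trailing_zeros >> 2               # four mask bits per input byte


print(hex(nibble_mask(b"hello, world!!!!", ord("w"))))
print(first_match(b"hello, world!!!!", ord("w")))
```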
2022-07-04	string: simplify M-profile strlen PACBTI epilogue	Victor Do Nascimento
	Merge stack pop instructions prior to returning from function. This also introduces fixes to CFI offset calculations to reflect the register ordering on push and pop instructions, with the lowest-numbered register saved to the lowest memory address.
2022-07-04	string: simplify M-profile memchr PACBTI epilogue	Victor Do Nascimento
	Merge stack pop instructions prior to returning from function. This also introduces fixes to CFI offset calculations to reflect the register ordering on push and pop instructions, with the lowest-numbered register saved to the lowest memory address.
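The ordering rule stated above determines where each register sits relative to the canonical frame address (CFA). The sketch below computes those offsets for a list of core registers pushed by a single instruction, assuming 4-byte registers and a CFA equal to the stack pointer value before the push. The function name is hypothetical and the register list is a made-up example, not one taken from strlen or memchr.

```python
def cfi_offsets(regs):
    """Map each pushed core register number to its offset from the CFA.

    Assumes 4-byte registers and that the CFA is the value of sp before
    the push, so every saved slot lies at a negative offset.
    """
    ordered = sorted(set(regs))     # lowest-numbered register, lowest address
    total = 4 * len(ordered)
    return {reg: -total + 4 * slot for slot, reg in enumerate(ordered)}


# push {r4, r5, r6, lr}, with lr being r14
for reg, offset in cfi_offsets([4, 5, 6, 14]).items():
    print(f".cfi_offset {reg}, {offset}")
```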
2022-06-24	string: Fix ARM_FNSTART on non-arm targets	Szabolcs Nagy
	Fix build failure introduced by commit 40b662ce7b65d5eaefa40fd8046d6f3c6b3238c1 ("string: add .fnstart and .fnend directives to ENTRY/END macros").
2022-06-22	string: Add M-profile PACBTI implementation of memchr	Victor Do Nascimento
	Ensure BTI indirect branch landing pads (BTI) and pointer authentication code generation (PAC) and verification instructions (BXAUT) are conditionally added to assembly when branch protection is requested.
2022-06-22	string: Add M-profile PACBTI implementation of strlen	Victor Do Nascimento
	Ensure BTI indirect branch landing pads (BTI) and pointer authentication code generation (PAC) and verification instructions (BXAUT) are conditionally added to assembly when branch protection is requested.
2022-06-22	string: Add M-profile PACBTI implementation of strcmp	Victor Do Nascimento
	Ensure BTI indirect branch landing pads (BTI) and pointer authentication code generation (PAC) and verification instructions (BXAUT) are conditionally added to assembly when branch protection is requested. NOTE: The ENTRY_ALIGN() macro is factored out, as the .fnstart and .cfi_startproc directives needed to be moved before L(fastpath_exit).
2022-06-22	string: Add M-profile PACBTI-enablement header file	Victor Do Nascimento
	Header adds assembler macro to handle Pointer Authentication and Branch Target Identification assembly instructions in function prologues and epilogues according to selected flags at compile-time.
2022-06-22	string: add .fnstart and .fnend directives to ENTRY/END macros	Victor Do Nascimento
	Modify the ENTRY_ALIGN and END assembler macros to mark the start and end of functions for arm unwind tables. Enables the pacbti epilogue function to emit .save{} directives for stack unwinding.
2022-06-22	string: Fix header file issue in arm strcmp-armv6m.S	Victor Do Nascimento
	Fix missing include directive for use of ENTRY_ALIGN and END macros.
2022-06-22	string: Fix header file issue in strlen test	Victor Do Nascimento
	Remove unnecessary sys/mman.h dependency.
2022-06-22	pl/math: Use single-precision fma in atan2f	Joe Ramsay
	The polynomial was mistakenly using double-precision fma, where single is sufficiently accurate. New underflow special cases have been handled accordingly.
2022-06-20	pl/math: Improve accuracy in log10	Joe Ramsay
	Increase polynomial order to 12, and update summation scheme to match AOR log. New coefficients are copied from AOR log.
2022-06-17	pl/math: Add vector/Neon atanf	Joe Ramsay
	Successfully ran tests and benchmarks. New routine is accurate to 3 ulps.
2022-06-17	pl/math: Add vector/Neon atan2f	Joe Ramsay
	Successfully ran tests and benchmarks. New routine is accurate to 3 ulps.
2022-06-16	pl/math: Add vector/Neon atan	Joe Ramsay
	Successfully ran tests and benchmarks. New routine is accurate to 3 ulps.
2022-06-16	pl/math: Add vector/Neon atan2	Joe Ramsay
	Successfully ran tests and benchmarks. New routine is accurate to 3.0 ulps.
2022-06-16	pl/math: Add scalar atan2f	Joe Ramsay
	Ran make check and benchmarks. New routine is accurate to 3.0 ULP.
2022-06-16	pl/math: Add scalar atan2	Joe Ramsay
	Ran tests and benchmarks. The new routine is accurate to 2.0 ulps.