path: root/string
Age | Commit message | Author
2023-01-10 | string: Compile memcpy-sve.S for aarch64 if compiler supports it | Jake Weinstein
This is a partial revert of b7e368fb. If SVE assembly is guarded by __ARM_FEATURE_SVE, it cannot build when SVE is not enabled by the build system. This is fine for AOR, but because Android (bionic) uses ifuncs to select the appropriate assembly at runtime, these files need to compile regardless of whether the target actually supports the instructions. Check for AArch64 and GCC >= 8 or Clang >= 5 so that SVE is not used on compilers that do not support it. This condition will always be true on future builds of Android for AArch64.
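A minimal sketch of the kind of compiler check the commit above describes, written as a C preprocessor guard. Only the three conditions (AArch64, GCC >= 8, Clang >= 5) come from the commit message; the macro name `HAVE_SVE_ASM` and the helper `sve_asm_enabled` are hypothetical and are not taken from the source tree.

```c
#include <assert.h>

/* Hypothetical guard: enable the SVE sources on AArch64 whenever the
   compiler is new enough to assemble them, independent of
   __ARM_FEATURE_SVE.  Clang also defines __GNUC__ (as 4), so the two
   compilers are tested separately. */
#if defined(__aarch64__) \
    && ((defined(__clang__) && __clang_major__ >= 5) \
        || (!defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 8))
# define HAVE_SVE_ASM 1
#else
# define HAVE_SVE_ASM 0
#endif

#if !defined(__aarch64__)
/* Compile-time check: the guard must stay off on other architectures. */
static_assert(HAVE_SVE_ASM == 0, "SVE sources must stay disabled off AArch64");
#endif

/* Returns 1 when the SVE sources would be compiled in, 0 otherwise. */
static inline int sve_asm_enabled(void)
{
    return HAVE_SVE_ASM;
}
```

The guard decides only whether the code is built. Whether it is used at runtime is left to the ifunc resolver mentioned in the commit message.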
2023-01-10 | string: Optimize strcpy | Wilco Dijkstra
Optimize strcpy main loop - large strings are ~22% faster.
2023-01-10 | string: Improve strrchr-mte | Wilco Dijkstra
Use shrn for narrowing the mask which simplifies code. Unroll the strchr search loop which improves performance on large strings.
2022-12-07 | string: arm: Fix cfi restore info for hot loop exit | Victor Do Nascimento
The branch out of the core memchr loop to label 60 jumps over the popping of registers r4-r7. The restoration of the cfi state at 60 is adjusted to reflect this fact, avoiding restoring a state where r4-r7 have already been popped off the stack. Built w/ arm-none-linux-gnueabihf, ran make check-string w/ qemu-arm-static.
2022-12-07 | string: arm: Ensure correct cfi state at strcmp entry | Victor Do Nascimento
Move the code fragment corresponding to L(fastpath_exit) to after function entry so that a .cfi_remember_state/.cfi_restore_state pair is not needed prior to strcmp start. The resulting reshuffle of code cleans up the entry part, fixing the .size directive calculation, which at present calculates the function size based on the address of __strcmp_arm and not L(strcmp_start_addr).
2022-11-17 | string: arm: Refactor ENTRY/END macros | Szabolcs Nagy
The .fnstart/.fnend directives can be inlined now that asmdefs.h is arm specific.
2022-11-17 | string: arm: Use /**/ comments in asmdefs.h | Szabolcs Nagy
This is preprocessed asm code, so /**/ style comments are most appropriate.
2022-11-17 | string: arm: Include asmdefs.h even into empty asm files | Szabolcs Nagy
Currently this is not expected to change behaviour, but if global directives are added in asmdefs.h (like .thumb) those should be in all asm files in case the link ABI is affected.
2022-11-17 | string: Add separate asmdefs.h per target | Szabolcs Nagy
The definitions in this header are necessarily target specific, so better to have a separate version in each target directory.
2022-11-17 | string: arm: Fix build failure | Szabolcs Nagy
asmdefs.h ifdef logic was wrong: arm only macro definitions were outside of defined(__arm__). Added some ifdef indentation to make the code more readable.
2022-10-21 | string: arm: Add new functionality to prologue/epilogue assembler macros. | Victor Do Nascimento
This patch adds options for automatic alignment enforcement and for pushing/popping the lr register to the prologue and epilogue assembler macros, while making the pushing of the ip register optional for PACBTI. Furthermore, as the use of these macros is independent of PACBTI and may be used on architectures without the feature, the macros are moved to a common header. Improvements are also made to cfi handling. Where absolute cfi offset calculation is complicated by optional function prologue parameters (e.g. the pushing of pac-codes to the stack for M-profile pacbti on function entry and the pushing of a dummy register when alignment is required), replace the use of .cfi_offset with .cfi_rel_offset, simplifying cfi calculations by basing offsets on SP rather than the cfa. Finally, extensive in-source documentation is added to these macros to facilitate their use and further development. Built w/ arm-none-linux-gnueabihf, ran make check-string w/ qemu-arm-static.
2022-08-23 | string: Optimize memchr-mte | Wilco Dijkstra
Optimize the main loop - large strings are 40% faster.
2022-08-23 | string: Optimize memrchr | Wilco Dijkstra
Optimize the main loop - large strings are 43% faster.
2022-08-23 | string: Improve strchr-mte | Wilco Dijkstra
Simplify calculation of the mask using shrn. Unroll the main loop. Small strings are 20% faster.
2022-08-23 | string: Improve strchrnul-mte | Wilco Dijkstra
Unroll the main loop, which gives a small gain.
2022-08-23 | string: Improve strlen | Wilco Dijkstra
Use shrn for the mask, merge tst+bne into cbnz, tweak code alignment. The random strlen test improves by 2%.
2022-08-23 | string: Optimize strlen-mte | Wilco Dijkstra
Optimize strlen by unrolling the main loop. Large strings are 64% faster.
2022-08-23 | string: Optimize strnlen | Wilco Dijkstra
Optimize strnlen using the shrn instruction and improve the main loop. Small strings are 10% faster, large strings are 40% faster.
2022-08-23 | string: arm: Fix CFI auto-alignment issues. | Victor Do Nascimento
The use of the PAC_CFI_ADJ macro for calculating the effect of pushing the IP register onto the stack assumes that pushing the register is always optional and is always suppressed when PAC_LEAF_PUSH_IP is set to 0. This leads to CFI alignment issues for functions where the IP register is clobbered and thus where IP is always pushed to the stack in the function prologue. This patch introduces a new macro PAC_CFI_ADJ_DEFAULT whose value is never zeroed when PAC signing is requested, irrespective of the PAC_LEAF_PUSH_IP setting.
Example:
  * HAVE_PAC_LEAF == 1 && PAC_LEAF_PUSH_IP == 1: PAC_CFI_ADJ = 4, PAC_CFI_ADJ_DEFAULT = 4
  * HAVE_PAC_LEAF == 1 && PAC_LEAF_PUSH_IP == 0: PAC_CFI_ADJ = 0, PAC_CFI_ADJ_DEFAULT = 4
Built w/ arm-none-linux-gnueabihf, ran make check-string w/ qemu-arm-static.
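The relationship between the two adjustments in the commit above can be modelled in plain C. This is a hedged sketch, not the header's actual preprocessor logic: the function names are hypothetical, and returning 0 when leaf PAC signing is not requested is an assumption, since the commit message only gives the two HAVE_PAC_LEAF == 1 cases.

```c
#include <assert.h>

/* Model of PAC_CFI_ADJ: 4 bytes only when leaf signing is requested AND
   pushing ip was explicitly asked for. */
static inline int pac_cfi_adj(int have_pac_leaf, int pac_leaf_push_ip)
{
    return (have_pac_leaf && pac_leaf_push_ip) ? 4 : 0;
}

/* Model of PAC_CFI_ADJ_DEFAULT: 4 bytes whenever leaf signing is
   requested, because ip is clobbered and therefore always pushed.
   The value 0 without signing is an assumption of this sketch. */
static inline int pac_cfi_adj_default(int have_pac_leaf)
{
    return have_pac_leaf ? 4 : 0;
}

/* Re-checks the two example rows given in the commit message. */
static inline void check_commit_examples(void)
{
    assert(pac_cfi_adj(1, 1) == 4 && pac_cfi_adj_default(1) == 4);
    assert(pac_cfi_adj(1, 0) == 0 && pac_cfi_adj_default(1) == 4);
}
```

In this model, a function that always pushes ip would use the default adjustment, so its CFI offsets stay correct even when PAC_LEAF_PUSH_IP is 0.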
2022-08-22 | string: arm: Augment M-profile PACBTI-enablement macros | Victor Do Nascimento
Modify previously defined PACBTI macros to allow for more flexible push/pop expressions at function prologue/epilogues, allowing further simplification of code predicated on the use of M-profile PACBTI hardware features. This patch also allows for the specification of whether generated pac keys are pushed onto the stack for leaf functions where this may not be necessary.
It defines the following preprocessor macros:
  * HAVE_PAC_LEAF: Indicates whether pac-signing has been requested for leaf functions.
  * PAC_LEAF_PUSH_IP: Whether leaf functions should push the pac code to the stack irrespective of whether the ip register is clobbered in the function or not.
  * PAC_CFI_ADJ: Given values for the above two parameters, this holds the calculated offset applied to default CFI address/offset values as a consequence of potentially pushing the pac-code to the stack.
It also defines the following assembler macros:
  * prologue: In addition to pushing any callee-saved registers onto the stack, it generates any requested pacbti instructions. Pushed registers are specified via the optional `first', `last' and `savepac' macro arguments. When a single register number is provided, it pushes that register. When two register numbers are provided, they specify a range to save. If savepac is non-zero, the ip register is also saved. For example:
      prologue savepac=1 -> push {ip}
      prologue 1 -> push {r1}
      prologue 1 savepac=1 -> push {r1, ip}
      prologue 1 4 -> push {r1-r4}
      prologue 1 4 savepac=1 -> push {r1-r4, ip}
  * epilogue: Pops registers off the stack and emits the pac key signing instruction if requested. The optional `first', `last' and `savepac' arguments function as per the prologue macro, generating a pop instead of a push instruction.
  * cfisavelist: prologue macro helper, generating the necessary .cfi_offset directives associated with the push instruction. Therefore, the net effect of calling `prologue 1 2 savepac=1' is to generate the following:
      push {r1-r2, ip}
      .cfi_adjust_cfa_offset 12
      .cfi_offset 143, -12
      .cfi_offset 2, -8
      .cfi_offset 1, -4
  * cfirestorelist: epilogue macro helper, emitting .cfi_restore directives prior to resetting the cfa offset. As such, calling `epilogue 1 2 savepac=1' will produce:
      pop {r1-r2, ip}
      .cfi_restore 143
      .cfi_restore 2
      .cfi_restore 1
      .cfi_def_cfa_offset 0
2022-08-22 | string: arm: Prevent leaf function unwinding | Victor Do Nascimento
As leaf functions cannot throw exceptions, with EHABI only supporting synchronous exceptions, add support for emitting a `.cantunwind' directive prior to `.fnend' in ARM_FNEND preprocessor macro. This ensures no personality routine or exception table data is generated. Existing `.save' directives used in leaf functions are also removed. Built w/ arm-none-linux-gnueabihf, ran make check-string w/ qemu-arm-static.
2022-08-03 | string: arm: Augment unwind information for PAC instructions | Victor Do Nascimento
Add the `.cfi_register 143, 12' directive immediately after pac instruction is emitted. Ensures unwind info consumers know immediately that if they need the PAC for the function, they can find it in ip register.
2022-08-03 | string: arm: Update feature test macro use in .arch selection | Victor Do Nascimento
Move away from use of the non-portable __ARM_ARCH_8M_MAIN__ feature test macro in favour of __ARM_ARCH >= 8 when selecting the target architecture.
2022-08-03 | string: arm: Implement conditional leaf PAC signing | Victor Do Nascimento
Adjust the criterion for M-profile PACBTI signing of leaf functions to be contingent on the +leaf option being passed to the -mbranch-protection compilation option.
2022-07-06 | string: Optimize string functions with shrn instruction | Danila Kutenin
Optimize __memchr_aarch64_mte, __memrchr_aarch64, __strchrnul_aarch64_mte, __stpcpy_aarch64, __strcpy_aarch64 and __strlen_aarch64_mte using the shrn instruction for computing the nibble mask instead of and + addp, which reduces instruction count.
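To illustrate the nibble mask mentioned in the commit above, here is a portable C emulation of a byte compare followed by a narrowing shift right by 4 on a 16-byte chunk. It is a sketch of the idea only, not the assembly in the tree, and it assumes a little-endian lane layout. The function names `nibble_mask` and `first_match` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates "compare bytes, then shrn #4": each of the 16 input bytes ends
   up as one nibble of the 64-bit result (byte i -> bits 4*i .. 4*i+3). */
static inline uint64_t nibble_mask(const unsigned char *chunk, unsigned char c)
{
    assert(chunk != NULL);
    uint64_t mask = 0;
    for (int lane = 0; lane < 8; lane++) {
        /* Byte compare: 0xff on match, 0x00 otherwise. */
        uint16_t lo = chunk[2 * lane] == c ? 0xff : 0x00;
        uint16_t hi = chunk[2 * lane + 1] == c ? 0xff : 0x00;
        /* View the two bytes as one little-endian 16-bit lane. */
        uint16_t lane16 = (uint16_t)(lo | (hi << 8));
        /* Narrowing shift: shift right by 4, keep the low 8 bits. */
        uint8_t narrowed = (uint8_t)(lane16 >> 4);
        mask |= (uint64_t)narrowed << (8 * lane);
    }
    return mask;
}

/* Index of the first byte equal to c in the chunk, or -1 if none.
   The lowest set bit of the mask divided by 4 is the byte index. */
static inline int first_match(const unsigned char *chunk, unsigned char c)
{
    uint64_t mask = nibble_mask(chunk, c);
    if (mask == 0)
        return -1;
    int bit = 0;
    while (((mask >> bit) & 1) == 0)
        bit++;
    return bit / 4;
}
```

For the chunk "abcdefghijklmnop", searching for 'e' sets only bits 16 to 19 of the mask, so `first_match` returns 4.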
2022-07-04 | string: simplify M-profile strlen PACBTI epilogue | Victor Do Nascimento
Merge stack pop instructions prior to returning from function. This also introduces fixes to CFI offset calculations to reflect the register ordering on push and pop instructions, with the lowest-numbered register saved to the lowest memory address.
2022-07-04 | string: simplify M-profile memchr PACBTI epilogue | Victor Do Nascimento
Merge stack pop instructions prior to returning from function. This also introduces fixes to CFI offset calculations to reflect the register ordering on push and pop instructions, with the lowest-numbered register saved to the lowest memory address.
2022-06-24 | string: Fix ARM_FNSTART on non-arm targets | Szabolcs Nagy
Fix build failure introduced by commit 40b662ce7b65d5eaefa40fd8046d6f3c6b3238c1 ("string: add .fnstart and .fnend directives to ENTRY/END macros").
2022-06-22 | string: Add M-profile PACBTI implementation of memchr | Victor Do Nascimento
Ensure BTI indirect branch landing pads (BTI) and pointer authentication code generation (PAC) and verification instructions (BXAUT) are conditionally added to assembly when branch protection is requested.
2022-06-22 | string: Add M-profile PACBTI implementation of strlen | Victor Do Nascimento
Ensure BTI indirect branch landing pads (BTI) and pointer authentication code generation (PAC) and verification instructions (BXAUT) are conditionally added to assembly when branch protection is requested.
2022-06-22 | string: Add M-profile PACBTI implementation of strcmp | Victor Do Nascimento
Ensure BTI indirect branch landing pads (BTI) and pointer authentication code generation (PAC) and verification instructions (BXAUT) are conditionally added to assembly when branch protection is requested. NOTE: The ENTRY_ALIGN() macro is factored out, as the .fnstart and .cfi_startproc directives needed to be moved to before L(fastpath_exit).
2022-06-22 | string: Add M-profile PACBTI-enablement header file | Victor Do Nascimento
Header adds assembler macro to handle Pointer Authentication and Branch Target Identification assembly instructions in function prologues and epilogues according to selected flags at compile-time.
2022-06-22 | string: add .fnstart and .fnend directives to ENTRY/END macros | Victor Do Nascimento
Modify the ENTRY_ALIGN and END assembler macros to mark the start and end of functions for arm unwind tables. Enables the pacbti epilogue function to emit .save{} directives for stack unwinding.
2022-06-22 | string: Fix header file issue in arm strcmp-armv6m.S | Victor Do Nascimento
Fix missing include directive for use of ENTRY_ALIGN and END macros.
2022-06-22 | string: Fix header file issue in strlen test | Victor Do Nascimento
Remove unnecessary sys/mman.h dependency.
2022-02-10 | Add README.contributors | Szabolcs Nagy
Document contributor requirements.
2022-02-10 | Update license to MIT OR Apache-2.0 WITH LLVM-exception | Szabolcs Nagy
The outgoing license was MIT only. The new dual license allows using the code under Apache-2.0 WITH LLVM-exception license too.
2022-02-10 | string: Merge MTE versions of strcpy and stpcpy | Wilco Dijkstra
Merge the MTE and non-MTE versions of strcpy and stpcpy since the MTE versions are faster.
2022-02-10 | string: Merge MTE versions of strcmp and strncmp | Wilco Dijkstra
Merge the MTE and non-MTE versions of strcmp and strncmp since the MTE versions are faster.
2022-02-10 | string: Add SVE memcpy | Wilco Dijkstra
Add an initial SVE memcpy implementation. Copies up to 32 bytes use SVE vectors which improves the random memcpy benchmark significantly.
2021-10-29 | string: Optimize memcmp | Wilco Dijkstra
Rewrite memcmp to improve performance. On small and medium inputs performance is typically 25% better. Large inputs use a SIMD loop processing 64 bytes per iteration, which is 50% faster than the previous version.
2021-10-04 | string: Improve memcpy benchmark | Wilco Dijkstra
Improve memcpy benchmark. Double the number of random tests and the memory size. Add separate tests using a direct call to memcpy to compare with indirect call to GLIBC memcpy. Add a test for small aligned and unaligned memcpy.
2021-10-04 | string: Improve strlen benchmark | Wilco Dijkstra
Increase the number of iterations of the random test. Minor code cleanup.
2021-10-04 | string: Add memset benchmark | Wilco Dijkstra
Add a randomized memset benchmark using string length and alignment distribution based on SPEC2017.
2021-02-17 | Update copyright years | Szabolcs Nagy
Scripted copyright year updates based on git committer date.
2021-02-12 | string: add __mtag_tag_zero_region | Szabolcs Nagy
Add optimized __mtag_tag_zero_region(dst, len) operation to AOR. It tags the memory according to the tag of the dst pointer then memsets it to 0 and returns dst. It requires MTE support. The memory remains untagged if tagging is not enabled for it. The dst must be 16 bytes aligned and len must be a multiple of 16. Similar to __mtag_tag_region, but uses the zeroing instructions.
2021-02-12 | string: add __mtag_tag_region | Szabolcs Nagy
Add optimized __mtag_tag_region(dst, len) operation to AOR. It tags the given memory region according to the tag of the dst pointer and returns dst. It requires MTE support. The memory remains untagged if tagging is not enabled for it. The dst must be 16 bytes aligned and len must be a multiple of 16.
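The argument requirements stated for __mtag_tag_region and __mtag_tag_zero_region (dst aligned to 16 bytes, len a multiple of 16) can be expressed as a small caller-side check. This is a hedged sketch: `MTE_GRANULE`, `mtag_args_valid` and `mtag_checked_dst` are hypothetical names for illustration and are not part of AOR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MTE tags memory in 16-byte granules. */
#define MTE_GRANULE 16

/* Returns 1 if (dst, len) satisfies the documented preconditions:
   dst is 16-byte aligned and len is a multiple of 16. */
static inline int mtag_args_valid(const void *dst, size_t len)
{
    return ((uintptr_t)dst % MTE_GRANULE) == 0 && (len % MTE_GRANULE) == 0;
}

/* Debug helper: asserts the preconditions and hands dst back, so it can
   wrap the dst argument of a tagging call in debug builds. */
static inline void *mtag_checked_dst(void *dst, size_t len)
{
    assert(mtag_args_valid(dst, len));
    return dst;
}
```

A caller could validate its arguments with `mtag_args_valid` before invoking either tagging routine. The sketch does not call the routines themselves, since they require MTE support.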
2021-01-08 | string: Assembly code cleanup | Wilco Dijkstra
Clean up spurious .text and .arch. Use ENTRY rather than ENTRY_ALIGN.
2021-01-04 | string/test: Fix strrchr '\0' error report | Richard Henderson
The error report was copied from the seekchar test above, and needs adjustment to match the gating IF.
2021-01-04 | string: Reduce alignment in strncmp | Richard Henderson
There were nops before the beginning of the function to place the main loop on a 64-byte boundary, but the addition of BTI and instructions for ILP32 has corrupted that. As per review, drop 64-byte alignment entirely, and use the default 16-byte alignment from ENTRY.