path: root/string/include
Age | Commit message | Author
2022-02-10 | Update license to MIT OR Apache-2.0 WITH LLVM-exception | Szabolcs Nagy
The outgoing license was MIT only. The new dual license allows using the code under Apache-2.0 WITH LLVM-exception license too.
2022-02-10 | string: Merge MTE versions of strcpy and stpcpy | Wilco Dijkstra
Merge the MTE and non-MTE versions of strcpy and stpcpy since the MTE versions are faster.
2022-02-10 | string: Merge MTE versions of strcmp and strncmp | Wilco Dijkstra
Merge the MTE and non-MTE versions of strcmp and strncmp since the MTE versions are faster.
2022-02-10 | string: Add SVE memcpy | Wilco Dijkstra
Add an initial SVE memcpy implementation. Copies up to 32 bytes use SVE vectors which improves the random memcpy benchmark significantly.
2021-02-17 | Update copyright years | Szabolcs Nagy
Scripted copyright year updates based on git committer date.
2021-02-12 | string: add __mtag_tag_zero_region | Szabolcs Nagy
Add optimized __mtag_tag_zero_region(dst, len) operation to AOR. It tags the memory according to the tag of the dst pointer then memsets it to 0 and returns dst. It requires MTE support. The memory remains untagged if tagging is not enabled for it. The dst must be 16 bytes aligned and len must be a multiple of 16. Similar to __mtag_tag_region, but uses the zeroing instructions.
2021-02-12 | string: add __mtag_tag_region | Szabolcs Nagy
Add optimized __mtag_tag_region(dst, len) operation to AOR. It tags the given memory region according to the tag of the dst pointer and returns dst. It requires MTE support. The memory remains untagged if tagging is not enabled for it. The dst must be 16 bytes aligned and len must be a multiple of 16.
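The alignment and length requirements shared by __mtag_tag_region and __mtag_tag_zero_region can be checked by a caller before invoking either routine. The following is a minimal sketch of such a check; `mtag_args_ok` and `MTAG_GRANULE` are hypothetical names used only for illustration and are not part of the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Tag granule size in bytes, taken from the commit message above. */
#define MTAG_GRANULE 16u

/* Hypothetical helper (not part of optimized-routines): returns true when
   dst is 16-byte aligned and len is a multiple of 16. It never
   dereferences dst. */
static bool mtag_args_ok(const void *dst, size_t len)
{
    bool dst_aligned = ((uintptr_t)dst % MTAG_GRANULE) == 0;
    bool len_multiple = (len % MTAG_GRANULE) == 0;
    return dst_aligned && len_multiple;
}
```

A caller might assert `mtag_args_ok(p, n)` in debug builds before tagging a freshly allocated block.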
2020-05-20 | string: Add optimized strcpy-mte and stpcpy-mte | Wilco Dijkstra
Add optimized MTE-compatible strcpy-mte and stpcpy-mte. On various microarchitectures the speedup over the non-MTE version is 53% on large strings and 20-60% on small strings.
2020-05-12 | string: Add optimized memrchr | Wilco Dijkstra
Add optimized MTE-compatible memrchr. This walks the input backwards using the same algorithm as memchr-mte.
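For reference, the semantics that the optimized routine has to match can be written as a portable byte-at-a-time loop. This sketch only models the result of a backwards search; it is not the vectorized algorithm used in the assembly, and `memrchr_ref` is a hypothetical name.

```c
#include <stddef.h>

/* Portable reference for memrchr semantics: returns a pointer to the last
   byte equal to (unsigned char)c within the first n bytes of s, or NULL
   when no such byte exists. */
static void *memrchr_ref(const void *s, int c, size_t n)
{
    const unsigned char *p = (const unsigned char *)s + n;

    /* Walk from the end of the buffer towards the start. */
    while (n--) {
        if (*--p == (unsigned char)c)
            return (void *)p;
    }
    return NULL;
}
```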
2020-04-30 | string: ARMv8.5 MTE: Add MTE compatible version of strncmp. | Branislav Rankov
Reading outside the range of the string is only allowed within 16 byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strncmp.S. Change the case when strings are misaligned: align the pointers down, and ignore bytes before the start of the string. Carry the part that is not compared to the next comparison.
Testing done: string/test/strncmp.c on big endian, little endian and with MTE support. Booted nanodroid with MTE enabled. Benchmarked on Pixel4.
2020-04-30 | string: ARMv8.5 MTE: Add MTE compatible version of strcmp. | Branislav Rankov
Reading outside the range of the string is only allowed within 16 byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strcmp.S. Change the case when strings are misaligned: align the pointers down, and ignore bytes before the start of the string. Carry the part that is not compared to the next comparison.
Testing done: optimized-routines/string/test/strcmp.c on big and little endian. Booted nanodroid with MTE enabled. bionic string tests with MTE enabled.
Benchmark results: Ran both bionic benchmarks and glibc benchmarks on Pixel4. Cores A76 and A55.
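The "align the pointers down, and ignore bytes before the start of the string" step used by the MTE strcmp and strncmp can be illustrated with plain address arithmetic. This is a sketch under the 16-byte granule size stated above; the helper names `granule_base` and `bytes_to_ignore` are hypothetical and do not appear in the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Granule size in bytes, as stated in the commit message. */
#define GRANULE 16u

/* Round an address down to the start of the granule that contains it.
   An aligned 16-byte load from this address stays inside one granule. */
static uintptr_t granule_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(GRANULE - 1);
}

/* Number of bytes that an aligned load picks up before the real start of
   the string; the comparison has to ignore these. */
static size_t bytes_to_ignore(uintptr_t addr)
{
    return (size_t)(addr - granule_base(addr));
}
```

For a string starting at address 0x1007, the aligned load starts at 0x1000 and the first 7 bytes are ignored.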
2020-04-30 | ARMv8.5 MTE: Add MTE compatible version of strrchr. | Gabor Kertesz
Reading outside the range of the string is only allowed within 16 byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strrchr.S.
Testing done: optimized-routines/string/test/strrchr.c. Booted nanodroid with MTE enabled. Bionic string tests with MTE enabled. Big endian with Qemu: qemu-aarch64_be
2020-04-08 | ARMv8.5 MTE: Add MTE compatible version of strchrnul. | Gabor Kertesz
Reading outside the range of the string is only allowed within 16 byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strchr-mte.S and string/aarch64/strchrnul.S.
Testing done: optimized-routines/string/test/strchrnul.c. Booted nanodroid with MTE enabled. bionic string tests with MTE enabled. Big endian with Qemu: qemu-aarch64_be
2020-04-08 | ARMv8.5 MTE: Add MTE compatible version of memchr. | Gabor Kertesz
Reading outside the range of the string is only allowed within 16 byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/memchr.S. The 64-bit syndrome value is changed to contain only 16 bytes of data. The 32 byte loop is unrolled to two 16 byte reads.
Testing done: optimized-routines/string/test/memchr.c. Booted nanodroid with MTE enabled. bionic string tests with MTE enabled.
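A 64-bit syndrome that covers 16 bytes leaves 4 bits per input byte. The scalar model below shows how such a value can locate the first match in a chunk. It is a sketch only: the exact bit layout (little-endian order, all four bits set per matching byte) is an assumption, the assembly builds the value with vector instructions, and `syndrome16` and `first_match` are hypothetical names. `__builtin_ctzll` is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Build a 64-bit syndrome for one 16-byte chunk: nibble i is 0xf when
   chunk[i] == c and 0 otherwise (assumed layout, byte 0 in the lowest
   nibble). */
static uint64_t syndrome16(const unsigned char chunk[16], unsigned char c)
{
    uint64_t syn = 0;
    for (int i = 0; i < 16; i++) {
        if (chunk[i] == c)
            syn |= (uint64_t)0xf << (4 * i);
    }
    return syn;
}

/* Index of the first matching byte in the chunk, or -1 when the syndrome
   is zero. Each byte owns 4 bits, so the bit index is divided by 4. */
static int first_match(uint64_t syn)
{
    if (syn == 0)
        return -1;
    return __builtin_ctzll(syn) / 4;
}
```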
2020-03-31 | string: Add memcpy benchmark | Wilco Dijkstra
Add memcpy benchmark based on size and alignment distribution of SPEC2017.
2020-02-25 | ARMv8.5 MTE: Add MTE compatible version of strchr. | Gabor Kertesz
Reading outside the range of the string is only allowed within 16 byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strchr.S. The 64-bit syndrome value is changed to contain only 16 bytes of data. The 32 byte loop is unrolled to two 16 byte reads.
2020-02-25 | string: Add stpcpy | Wilco Dijkstra
Add support for stpcpy on AArch64.
2020-02-18 | ARMv8.5 MTE: Add MTE compatible version of strlen. | Branislav Rankov
Reading outside the range of the string is only allowed within 16 byte aligned granules when MTE is enabled. This implementation is based on string/aarch64/strlen.S. Merged the page cross code into the main path and optimized it. Modified the zeroones mask to ignore the bytes that are loaded but are not part of the string. Made a special case for when there are 8 bytes or fewer to check before the alignment boundary.
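The zero-byte test that a "zeroones" constant (0x01 repeated in every byte) supports is the classic word-at-a-time trick, sketched below for one 64-bit word. Note the swapped-in technique: the commit adjusts the mask itself, whereas this sketch forces the ignored bytes to be non-zero before testing, which is a simpler way to get the same effect. Little-endian byte order is assumed, and the names `ONES`, `HIGHS`, `has_zero_byte` and `ignore_leading` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ONES  UINT64_C(0x0101010101010101) /* 0x01 in every byte */
#define HIGHS UINT64_C(0x8080808080808080) /* 0x80 in every byte */

/* Non-zero exactly when at least one byte of x is zero. */
static int has_zero_byte(uint64_t x)
{
    return ((x - ONES) & ~x & HIGHS) != 0;
}

/* Force the first `skip` bytes (lowest addresses on little-endian) to be
   non-zero so that bytes loaded before the start of the string cannot be
   mistaken for the terminator. */
static uint64_t ignore_leading(uint64_t x, unsigned skip)
{
    assert(skip < 8);
    return x | ((UINT64_C(1) << (8 * skip)) - 1);
}
```

With `skip` set to the number of bytes between the aligned load address and the start of the string, `has_zero_byte(ignore_leading(word, skip))` only reports a terminator that belongs to the string.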
2020-01-14 | string: Remove memcpy_bytewise | Wilco Dijkstra
This was a placeholder for testing the build system before we added optimized string code and is thus no longer needed.
2020-01-06 | string: Add strrchr | Wilco Dijkstra
Add strrchr for AArch64. Originally written by Richard Earnshaw; the same code is present in newlib. This copy has minor edits for inclusion into the optimized-routines repo.
2019-12-10 | aarch64: Combine memcpy and memmove implementations | Krzysztof Koch
Modify integer and SIMD versions of memcpy to handle overlaps correctly. Make __memmove_aarch64 and __memmove_aarch64_simd alias to __memcpy_aarch64 and __memcpy_aarch64_simd respectively. Complete sharing of code between memcpy and memmove implementations is possible without noticeable performance penalty. This is thanks to moving the source and destination buffer overlap detection after the code for handling small and medium copies which are overlap-safe anyway. Benchmarking shows that keeping two versions of memcpy is necessary because newer platforms favor aligning src over destination for large copies. Using NEON registers also gives a small speedup. However, aligning dst and using general-purpose registers works best for older platforms. Consequently, memcpy.S and memcpy_simd.S contain memcpy code which is identical except for the registers used and src vs dst alignment.
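The overlap detection that lets one routine serve as both memcpy and memmove can be modelled in C as a single unsigned comparison. This is a sketch of the general technique, not a transcription of memcpy.S; `needs_backward_copy` and `copy_overlap_safe` are hypothetical names, and the byte loops stand in for the wide copies of the real code.

```c
#include <stddef.h>
#include <stdint.h>

/* True when dst lies inside [src, src + n), so that a forward copy would
   overwrite source bytes before reading them. Unsigned wraparound makes a
   single comparison sufficient when dst is below src. */
static int needs_backward_copy(const void *dst, const void *src, size_t n)
{
    return (uintptr_t)dst - (uintptr_t)src < n;
}

/* Overlap-safe copy: picks the copy direction from the check above. */
static void *copy_overlap_safe(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    if (needs_backward_copy(dst, src, n)) {
        /* Copy from the last byte down to the first. */
        while (n--)
            d[n] = s[n];
    } else {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

When dst equals src the check selects the backward path, which is harmless because every byte is copied onto itself.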
2019-11-26 | aarch64: Add SIMD version of memcpy | Krzysztof Koch
Create a new memcpy implementation for targets with the NEON extension. __memcpy_aarch64_simd has been tested on a range of modern microarchitectures. It turned out to be faster than __memcpy_aarch64 on all of them, with a performance improvement of 3-11% depending on the platform.
2019-08-28 | Import aarch64 sve strrchr | Adhemerval Zanella
The only difference is changing the symbol name from strrchr to __strrchr_aarch64_sve.
2019-08-28 | Import aarch64 sve strnlen | Adhemerval Zanella
The only difference is changing the symbol name from strnlen to __strnlen_aarch64_sve.
2019-08-28 | Import aarch64 sve strncmp | Adhemerval Zanella
The only difference is changing the symbol name from strncmp to __strncmp_aarch64_sve.
2019-08-28 | Import aarch64 sve strlen | Adhemerval Zanella
The only difference is changing the symbol name from strlen to __strlen_aarch64_sve.
2019-08-28 | Import aarch64 sve strcpy | Adhemerval Zanella
The only difference is changing the symbol name from strcpy to __strcpy_aarch64_sve.
2019-08-28 | Import aarch64 sve strcmp | Adhemerval Zanella
The only difference is changing the symbol name from strcmp to __strcmp_aarch64_sve.
2019-08-28 | Import aarch64 sve strchr and strchrnul | Adhemerval Zanella
The only difference is changing the symbol name from strchr/strchrnul to __strchr_aarch64_sve and __strchrnul_aarch64_sve.
2019-08-28 | Import aarch64 sve memcmp | Adhemerval Zanella
The only difference is changing the symbol name from memcmp to __memcmp_aarch64_sve.
2019-08-28 | Import aarch64 sve memchr | Adhemerval Zanella
The only difference is changing the symbol name from memchr to __memchr_aarch64_sve.
2019-08-23 | Import arm strlen armv6t2 | Adhemerval Zanella
The only difference is changing the symbol name from strlen to __strlen_armv6t2.
2019-08-23 | Import arm strcmp armv6-m | Adhemerval Zanella
The only difference is changing the symbol name from strcmp to __strcmp_armv6m.
2019-08-23 | Import arm strcmp | Adhemerval Zanella
The only difference is changing the symbol name from strcmp to __strcmp_arm.
2019-08-23 | Import arm strcpy | Adhemerval Zanella
The differences from cortex-strings are:
- Simplify the thumb-2/thumb selection by removing the usage of PREFER_SIZE_OVER_SPEED and __OPTIMIZE_SIZE__.
- Removed the dumb byte-per-byte loops.
2019-08-23 | Import arm memchr | Adhemerval Zanella
The only difference is changing the symbol name from memchr to __memchr_arm and the final .size directive.
2019-08-23 | Import arm memset | Adhemerval Zanella
The only difference is changing the symbol name from memset to __memset_arm and the final .size directive.
2019-08-23 | Import arm memcpy | Adhemerval Zanella
The only difference is changing the symbol name from memcpy to __memcpy_arm.
2019-08-23 | Import aarch64 strncmp | Adhemerval Zanella
The only difference is changing the symbol name from strncmp to __strncmp_aarch64.
2019-08-23 | Import aarch64 strnlen | Adhemerval Zanella
The only difference is changing the symbol name from strnlen to __strnlen_aarch64.
2019-08-23 | Import aarch64 strlen | Adhemerval Zanella
The only difference is changing the symbol name from strlen to __strlen_aarch64.
2019-08-23 | Import aarch64 strchrnul | Adhemerval Zanella
The only difference is changing the symbol name from strchrnul to __strchrnul_aarch64.
2019-08-23 | Import aarch64 strchr | Adhemerval Zanella
The only difference is changing the symbol name from strchr to __strchr_aarch64.
2019-08-23 | Import aarch64 strcmp | Adhemerval Zanella
The only difference is changing the symbol name from strcmp to __strcmp_aarch64.
2019-08-23 | Import aarch64 strcpy | Adhemerval Zanella
The only difference is changing the symbol name from strcpy to __strcpy_aarch64.
2019-08-23 | Import aarch64 memcmp | Adhemerval Zanella
The only difference is changing the symbol name from memcmp to __memcmp_aarch64.
2019-08-23 | Import aarch64 memchr | Adhemerval Zanella
The only difference is changing the symbol name from memchr to __memchr_aarch64.
2019-08-23 | Import aarch64 memset | Adhemerval Zanella
The only difference is changing the symbol name from memset to __memset_aarch64.
2019-08-23 | Import aarch64 memmove | Adhemerval Zanella
The only difference is changing the symbol name from memmove to __memmove_aarch64 and the memcpy branch to __memcpy_aarch64.
2019-08-23 | Import aarch64 memcpy | Adhemerval Zanella
The only difference is changing the symbol name from memcpy to __memcpy_aarch64.