The outgoing license was MIT only. The new dual license also allows
using the code under the Apache-2.0 WITH LLVM-exception license.
|
|
Merge the MTE and non-MTE versions of strcpy and stpcpy since the MTE
versions are faster.
|
|
Merge the MTE and non-MTE versions of strcmp and strncmp since the MTE
versions are faster.
|
|
Add an initial SVE memcpy implementation. Copies of up to 32 bytes use
SVE vectors, which improves the random memcpy benchmark significantly.
|
|
Scripted copyright year updates based on git committer date.
|
|
Add optimized __mtag_tag_zero_region(dst, len) operation to AOR. It tags
the memory according to the tag of the dst pointer, then memsets it to 0
and returns dst. It requires MTE support. The memory remains untagged if
tagging is not enabled for it. dst must be 16-byte aligned and len must
be a multiple of 16.
Similar to __mtag_tag_region, but uses the zeroing instructions.
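The contract above can be sketched in portable C. This is a hypothetical
stand-in, not the AOR implementation: where tagging is not enabled for the
region, the documented behaviour reduces to checking the alignment
preconditions, zeroing the region, and returning dst.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-in illustrating the documented contract of
 * __mtag_tag_zero_region: dst must be 16-byte aligned and len a multiple
 * of 16. Without MTE tagging enabled for the region, the observable
 * effect is zeroing the memory and returning dst. */
static void *tag_zero_region_fallback(void *dst, size_t len)
{
    assert(((uintptr_t)dst & 15) == 0);  /* 16-byte aligned destination */
    assert((len & 15) == 0);             /* length is a multiple of 16  */
    memset(dst, 0, len);                 /* tagging itself needs MTE hw */
    return dst;
}
```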
|
|
Add optimized __mtag_tag_region(dst, len) operation to AOR. It tags the
given memory region according to the tag of the dst pointer and returns
dst. It requires MTE support. The memory remains untagged if tagging is
not enabled for it. dst must be 16-byte aligned and len must be a
multiple of 16.
|
|
Add optimized MTE-compatible strcpy-mte and stpcpy-mte. On various
microarchitectures the speedup over the non-MTE version is 53% on large
strings and 20-60% on small strings.
|
|
Add optimized MTE-compatible memrchr. This walks the input backwards
using the same algorithm as memchr-mte.
|
|
Reading outside the range of the string is only allowed within 16-byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strncmp.S.
Change the case when the strings are misaligned: align the pointers
down, ignore bytes before the start of the string, and carry the part
that is not compared over to the next comparison.
Testing done:
string/test/strncmp.c on big endian, little endian, and with MTE support.
Booted nanodroid with MTE enabled.
Benchmarked on Pixel 4.
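The align-down trick above can be illustrated in portable C. This is a
conceptual sketch, not the assembly: under MTE a load may not stray out of
bounds across a 16-byte granule boundary, but reading the whole granule
that contains the string start is safe. first_nul_in_granule is a
hypothetical helper that aligns the pointer down and then inspects only
the in-bounds bytes.

```c
#include <stdint.h>
#include <stddef.h>

/* Conceptual sketch (portable C, not the assembly): align the pointer
 * down to the 16-byte granule boundary, then ignore the bytes that
 * precede the start of the string. Returns the offset of the first NUL
 * within the string's first granule, or the number of in-bounds bytes
 * inspected if no NUL is found there. */
static size_t first_nul_in_granule(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    const char *granule = (const char *)(addr & ~(uintptr_t)15);
    size_t skip = addr & 15;           /* bytes before the string start  */
    for (size_t i = skip; i < 16; i++) /* granule + skip == s: in bounds */
        if (granule[i] == '\0')
            return i - skip;
    return 16 - skip;                  /* no NUL in this granule */
}
```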
|
|
Reading outside the range of the string is only allowed within 16-byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strcmp.S.
Change the case when the strings are misaligned: align the pointers
down, ignore bytes before the start of the string, and carry the part
that is not compared over to the next comparison.
Testing done:
optimized-routines/string/test/strcmp.c on big and little endian.
Booted nanodroid with MTE enabled.
bionic string tests with MTE enabled.
Benchmark results:
Ran both bionic and glibc benchmarks on Pixel 4 (Cortex-A76 and
Cortex-A55 cores).
|
|
Reading outside the range of the string is only allowed within
16-byte aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strrchr.S.
Testing done:
optimized-routines/string/test/strrchr.c
Booted nanodroid with MTE enabled.
Bionic string tests with MTE enabled.
Big endian with QEMU: qemu-aarch64_be
|
|
Reading outside the range of the string is only allowed within
16-byte aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strchr-mte.S and
string/aarch64/strchrnul.S
Testing done:
optimized-routines/string/test/strchrnul.c
Booted nanodroid with MTE enabled.
bionic string tests with MTE enabled.
Big endian with QEMU: qemu-aarch64_be
|
|
Reading outside the range of the string is only allowed within 16-byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/memchr.S.
The 64-bit syndrome value is changed to contain only 16 bytes of data.
The 32-byte loop is unrolled into two 16-byte reads.
Testing done:
optimized-routines/string/test/memchr.c
Booted nanodroid with MTE enabled.
bionic string tests with MTE enabled.
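The syndrome idea can be modeled in portable C. A sketch, not the NEON
code: the assembly packs per-byte comparison results into a 64-bit
syndrome; this hypothetical helper models it with one bit per byte and
uses count-trailing-zeros to locate the first match (__builtin_ctz
assumes GCC/Clang).

```c
#include <stdint.h>

/* Portable model of the "syndrome" used by memchr-mte (a sketch, not the
 * NEON code): compare each byte of a 16-byte granule against the target,
 * pack the results into a bitmask, then count trailing zeros to find the
 * index of the first matching byte, or -1 if there is none. */
static int first_match_in_granule(const unsigned char *granule, unsigned char c)
{
    uint32_t syndrome = 0;
    for (int i = 0; i < 16; i++)
        if (granule[i] == c)
            syndrome |= 1u << i;       /* bit i set => byte i matches */
    return syndrome ? __builtin_ctz(syndrome) : -1;
}
```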
|
|
Add memcpy benchmark based on size and alignment distribution of SPEC2017.
|
|
Reading outside the range of the string is only allowed within 16-byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strchr.S.
The 64-bit syndrome value is changed to contain only 16 bytes of data.
The 32-byte loop is unrolled into two 16-byte reads.
|
|
Add support for stpcpy on AArch64.
|
|
Reading outside the range of the string is only allowed within 16-byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strlen.S.
Merged the page-cross code into the main path and optimized it.
Modified the zeroones mask to ignore the bytes that are loaded but are
not part of the string. Made a special case for when there are 8 bytes
or fewer to check before the alignment boundary.
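The zeroones mask refers to the classic word-at-a-time NUL detector. A
portable sketch (an illustration, not the strlen.S code; skip_bytes
models ignoring loaded bytes that precede the string start on a
little-endian load, and is assumed to be less than 8):

```c
#include <stdint.h>

/* Sketch of the word-at-a-time NUL check behind the zeroones mask (an
 * illustration, not strlen.S): a 64-bit word contains a zero byte iff
 * (x - 0x0101..01) & ~x & 0x8080..80 is nonzero. Bytes loaded from
 * before the string start are ignored by clearing their detection bits;
 * skip_bytes < 8 counts those leading (low, little-endian) bytes. */
static int word_has_nul(uint64_t x, unsigned skip_bytes)
{
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    uint64_t found = (x - ones) & ~x & highs;       /* per-byte NUL flags */
    if (skip_bytes)                                 /* assume skip_bytes < 8 */
        found &= ~(uint64_t)0 << (8 * skip_bytes);  /* mask leading bytes */
    return found != 0;
}
```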
|
|
This was a placeholder for testing the build system before optimized
string code was added, and is thus no longer needed.
|
|
Add strrchr for AArch64. Originally written by Richard Earnshaw; the
same code is present in newlib. This copy has minor edits for inclusion
into the optimized-routines repo.
|
|
Modify integer and SIMD versions of memcpy to handle overlaps correctly.
Make __memmove_aarch64 and __memmove_aarch64_simd alias to
__memcpy_aarch64 and __memcpy_aarch64_simd respectively.
Complete sharing of code between the memcpy and memmove implementations
is possible without a noticeable performance penalty. This is achieved
by moving the source/destination overlap detection after the code for
handling small and medium copies, which are overlap-safe anyway.
Benchmarking shows that keeping two versions of memcpy is necessary
because newer platforms favor aligning src over destination for large
copies. Using NEON registers also gives a small speedup. However,
aligning dst and using general-purpose registers works best for older
platforms. Consequently, memcpy.S and memcpy_simd.S contain memcpy
code which is identical except for the registers used and src vs dst
alignment.
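The overlap-safety of the small-copy paths can be seen in a portable
sketch (an illustration of the technique, not the memcpy.S code;
copy_8_to_16 is a hypothetical helper for 8 <= n <= 16): loading both
ends of the source before storing anything means an overlapping
destination cannot clobber source bytes that are still needed.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of why small copies need no overlap check (not the memcpy.S
 * code; a hypothetical helper for 8 <= n <= 16): read both ends of the
 * source into temporaries first, then store both ends, so even a fully
 * overlapping destination never clobbers unread source bytes. The two
 * 8-byte spans overlap in the middle when n < 16, which is harmless. */
static void copy_8_to_16(unsigned char *dst, const unsigned char *src, size_t n)
{
    uint64_t head, tail;
    memcpy(&head, src, 8);          /* read everything first ...         */
    memcpy(&tail, src + n - 8, 8);
    memcpy(dst, &head, 8);          /* ... then write, overlap-safe      */
    memcpy(dst + n - 8, &tail, 8);
}
```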
|
|
Create a new memcpy implementation for targets with the NEON extension.
__memcpy_aarch64_simd has been tested on a range of modern
microarchitectures. It turned out to be faster than __memcpy_aarch64 on
all of them, with a performance improvement of 3-11% depending on the
platform.
|
|
The only difference is changing the symbol name from strrchr
to __strrchr_aarch64_sve.
|
|
The only difference is changing the symbol name from strnlen
to __strnlen_aarch64_sve.
|
|
The only difference is changing the symbol name from strncmp
to __strncmp_aarch64_sve.
|
|
The only difference is changing the symbol name from strlen
to __strlen_aarch64_sve.
|
|
The only difference is changing the symbol name from strcpy
to __strcpy_aarch64_sve.
|
|
The only difference is changing the symbol name from strcmp
to __strcmp_aarch64_sve.
|
|
The only difference is changing the symbol name from strchr/strchrnul
to __strchr_aarch64_sve and __strchrnul_aarch64_sve.
|
|
The only difference is changing the symbol name from memcmp
to __memcmp_aarch64_sve.
|
|
The only difference is changing the symbol name from memchr
to __memchr_aarch64_sve.
|
|
The only difference is changing the symbol name from strlen
to __strlen_armv6t2.
|
|
The only difference is changing the symbol name from strcmp
to __strcmp_armv6m.
|
|
The only difference is changing the symbol name from strcmp
to __strcmp_arm.
|
|
The differences from cortex-strings are:
- Simplified the thumb-2/thumb selection by removing the use of
PREFER_SIZE_OVER_SPEED and __OPTIMIZE_SIZE__.
- Removed the naive byte-by-byte loops.
|
|
The only difference is changing the symbol name from memchr
to __memchr_arm and the final .size directive.
|
|
The only difference is changing the symbol name from memset
to __memset_arm and the final .size directive.
|
|
The only difference is changing the symbol name from memcpy
to __memcpy_arm.
|
|
The only difference is changing the symbol name from strncmp
to __strncmp_aarch64.
|
|
The only difference is changing the symbol name from strnlen
to __strnlen_aarch64.
|
|
The only difference is changing the symbol name from strlen
to __strlen_aarch64.
|
|
The only difference is changing the symbol name from strchrnul
to __strchrnul_aarch64.
|
|
The only difference is changing the symbol name from strchr
to __strchr_aarch64.
|
|
The only difference is changing the symbol name from strcmp
to __strcmp_aarch64.
|
|
The only difference is changing the symbol name from strcpy
to __strcpy_aarch64.
|
|
The only difference is changing the symbol name from memcmp
to __memcmp_aarch64.
|
|
The only difference is changing the symbol name from memchr
to __memchr_aarch64.
|
|
The only difference is changing the symbol name from memset
to __memset_aarch64.
|
|
The only difference is changing the symbol name from memmove
to __memmove_aarch64 and the memcpy branch to __memcpy_aarch64.
|
|
The only difference is changing the symbol name from memcpy
to __memcpy_aarch64.
|