Age | Commit message | Author |
|
Scripted copyright year updates based on git committer date.
|
|
Add optimized __mtag_tag_zero_region(dst, len) operation to AOR. It tags
the memory according to the tag of the dst pointer, then memsets it to 0
and returns dst. It requires MTE support. The memory remains untagged if
tagging is not enabled for it. The dst must be 16-byte aligned and len
must be a multiple of 16.
Similar to __mtag_tag_region, but uses the zeroing instructions.
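The contract described above can be sketched in portable C. This is only a model under stated assumptions: the names mtag_args_ok and model_tag_zero_region are hypothetical, and the actual tagging step is left out because it needs MTE instructions. Only the argument preconditions, the zeroing, and the return value are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16 /* MTE tag granule size in bytes */

/* Hypothetical helper: returns 1 if dst is 16-byte aligned and len is a
   multiple of 16, as the commit message requires.  */
static int mtag_args_ok(const void *dst, size_t len)
{
  return ((uintptr_t) dst % GRANULE) == 0 && (len % GRANULE) == 0;
}

/* Hypothetical portable model of the operation.  Real tagging is NOT
   performed here; only the memset to 0 and the returned dst are modelled.  */
static void *model_tag_zero_region(void *dst, size_t len)
{
  assert(mtag_args_ok(dst, len));
  return memset(dst, 0, len);
}
```

A caller would pass a granule-aligned buffer, for example one declared with _Alignas(16), and a length such as 16 or 32.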
|
|
Add optimized __mtag_tag_region(dst, len) operation to AOR. It tags the
given memory region according to the tag of the dst pointer and returns
dst. It requires MTE support. The memory remains untagged if tagging is
not enabled for it. The dst must be 16-byte aligned and len must be a
multiple of 16.
|
|
The error report was copied from the seekchar test above,
and needs adjustment to match the gating IF.
|
|
Set tags for every test case so that boundaries are as narrow as
possible. There is no handling of tag faults, so the test will
crash if there is an MTE problem.
The implementations that are not compatible are excluded, including
the standard symbols that may come from an MTE-incompatible libc.
|
|
Clean up code and improve test coverage.
|
|
Clean up code and improve test coverage.
|
|
Clean up code and improve test coverage.
|
|
Clean up the stpcpy test and improve test coverage.
|
|
Clean up the strcpy test and improve test coverage.
|
|
Clean up the strnlen test and improve test coverage.
|
|
Clean up the strlen test and improve test coverage.
|
|
Add optimized MTE-compatible strcpy-mte and stpcpy-mte. On various
microarchitectures the speedup over the non-MTE version is 53% on large
strings and 20-60% on small strings.
|
|
Add new memrchr test.
|
|
Improve memchr test coverage and clean up code.
|
|
Improve strnlen test coverage and clean up code.
|
|
Use the GNU style consistently in the string test code.
Add clang-format guard comments where necessary so the
code can be reformatted using the clang-format tool and
the GNU style settings from gcc contrib/clang-format.
|
|
Reading outside the range of the string is only allowed within 16 byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strncmp.S
Change the case when strings are misaligned: align the pointers
down and ignore bytes before the start of the string. Carry the part
that is not compared to the next comparison.
Testing done:
string/test/strncmp.c on big endian, little endian and with MTE support.
Booted nanodroid with MTE enabled.
Benchmarked on Pixel4.
|
|
Reading outside the range of the string is only allowed within 16 byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strcmp.S
Change the case when strings are misaligned: align the pointers
down and ignore bytes before the start of the string. Carry the part
that is not compared to the next comparison.
Testing done:
optimized-routines/string/test/strcmp.c on big and little endian.
Booted nanodroid with MTE enabled.
bionic string tests with MTE enabled.
Benchmark results:
Ran both bionic benchmarks and glibc benchmarks on Pixel4. Cores A76 and A55.
|
|
Reading outside the range of the string is only allowed within
16 byte aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strrchr.S.
Testing done:
optimized-routines/string/test/strrchr.c
Booted nanodroid with MTE enabled.
Bionic string tests with MTE enabled.
Big endian with Qemu: qemu-aarch64_be
|
|
Use matching and null characters in the padding area around the string.
Remove large input tests.
|
|
Tests printed too much output for a broken string function
and the output was not entirely useful.
Added a new header file with some common logic for
printing buffers nicely.
In str* tests len now means string length (not buffer
size, which was confusing).
|
|
Reading outside the range of the string is only allowed within
16 byte aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strchr-mte.S and
string/aarch64/strchrnul.S
Testing done:
optimized-routines/string/test/strchrnul.c
Booted nanodroid with MTE enabled.
bionic string tests with MTE enabled.
Big endian with Qemu: qemu-aarch64_be
|
|
Reading outside the range of the string is only allowed within 16 byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/memchr.S
The 64-bit syndrome value is changed to contain only 16 bytes of data.
The 32 byte loop is unrolled to two 16 byte reads.
Testing done:
optimized-routines/string/test/memchr.c
Booted nanodroid with MTE enabled.
bionic string tests with MTE enabled.
|
|
Memchr's length parameter is unsigned and is allowed to be huge, so any
algorithm that treats it as a signed number should fail the test.
This patch adds cases where the length is bigger than the inspected
array, but the sought character is within the valid range.
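The kind of case this adds can be sketched as follows. Both functions are hypothetical illustrations, not code from the repository: ref_memchr only ever uses the length as an unsigned count, while signed_len_memchr deliberately converts it to a signed type. The signed conversion of a huge size_t is implementation-defined; the sketch assumes the common behaviour where SIZE_MAX becomes -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference memchr: the length is only used as an unsigned count, so a
   huge n is fine as long as the match is found inside the array.  */
static void *ref_memchr(const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  for (; n != 0; n--, p++)
    if (*p == (unsigned char) c)
      return (void *) p;
  return NULL;
}

/* Deliberately broken variant: treats the length as signed, so a huge
   length looks negative (assumed: SIZE_MAX converts to -1) and nothing
   is searched.  */
static void *signed_len_memchr(const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  for (ptrdiff_t left = (ptrdiff_t) n; left > 0; left--, p++)
    if (*p == (unsigned char) c)
      return (void *) p;
  return NULL;
}
```

With a buffer holding "abcdef", searching for 'd' with a length of SIZE_MAX should return the address of the fourth byte; the signed variant misses it on typical platforms.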
|
|
Reading outside the range of the string is only allowed within 16 byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strchr.S
The 64-bit syndrome value is changed to contain only 16 bytes of
data.
The 32 byte loop is unrolled by two 16 byte reads.
|
|
Add support for stpcpy on AArch64.
|
|
Reading outside the range of the string is only allowed within 16 byte
aligned granules when MTE is enabled.
This implementation is based on string/aarch64/strlen.S
Merged the page cross code into the main path and optimized it.
Modified the zeroones mask to ignore the bytes that are loaded but are
not part of the string. Made a special case for when there are 8 bytes
or fewer to check before the alignment boundary.
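The idea of reading only whole aligned units and masking out the bytes in front of the string can be illustrated in C. This is a simplified sketch, not the assembly from the commit: it uses 8-byte words instead of 16-byte granules, the name sketch_strlen is hypothetical, and it assumes the string lives in a buffer that is 8-byte aligned with a size that is a multiple of 8, so every aligned read stays inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff at least one byte of x is zero.  */
static uint64_t has_zero_byte(uint64_t x)
{
  return (x - ONES) & ~x & HIGHS;
}

/* Hypothetical sketch: strlen using only 8-byte aligned reads.  */
static size_t sketch_strlen(const char *s)
{
  uintptr_t addr = (uintptr_t) s;
  size_t skip = addr % 8; /* bytes loaded in front of the string */
  const unsigned char *p = (const unsigned char *) (addr - skip);
  unsigned char ignore[8] = { 0 };
  uint64_t word, mask;
  size_t i = skip; /* where the byte scan starts in the final word */

  /* Force the bytes before s to be nonzero so they can never match.
     Building the mask through memcpy keeps this endian-neutral.  */
  memset(ignore, 0xff, skip);
  memcpy(&mask, ignore, 8);
  memcpy(&word, p, 8);
  word |= mask;

  while (!has_zero_byte(word))
    {
      p += 8;
      memcpy(&word, p, 8);
      i = 0;
    }

  /* A zero byte is known to be in this word; find its position.  */
  while (p[i] != 0)
    i++;
  return (size_t) (p + i - (const unsigned char *) s);
}
```

The mask plays the role described for the zeroones mask above: bytes that were loaded but precede the string are made to look nonzero before the zero-byte test runs.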
|
|
This was a placeholder for testing the build system before we added
optimized string code and is thus no longer needed.
|
|
Add strrchr for AArch64. Originally written by Richard Earnshaw; the
same code is present in newlib. This copy has minor edits for inclusion
into the optimized-routines repo.
|
|
Modify integer and SIMD versions of memcpy to handle overlaps correctly.
Make __memmove_aarch64 and __memmove_aarch64_simd alias to
__memcpy_aarch64 and __memcpy_aarch64_simd respectively.
Complete sharing of code between memcpy and memmove implementations is
possible without noticeable performance penalty. This is thanks to
moving the source and destination buffer overlap detection after
the code for handling small and medium copies which are overlap-safe
anyway.
Benchmarking shows that keeping two versions of memcpy is necessary
because newer platforms favor aligning src over destination for large
copies. Using NEON registers also gives a small speedup. However,
aligning dst and using general-purpose registers works best for older
platforms. Consequently, memcpy.S and memcpy_simd.S contain memcpy
code which is identical except for the registers used and src vs dst
alignment.
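The overlap detection mentioned above can be sketched in C with a single unsigned comparison. This is an illustrative byte-wise model, not the assembly in memcpy.S: the name sketch_memmove is hypothetical, and the overlap-safe small and medium copy paths (which precede the check in the real code) are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: one routine that serves as both memcpy and
   memmove by choosing the copy direction from an overlap check.  */
static void *sketch_memmove(void *dstin, const void *srcin, size_t n)
{
  unsigned char *dst = dstin;
  const unsigned char *src = srcin;

  /* Unsigned wrap-around makes this true only when dst lies in
     [src, src + n), the case where a forward copy would overwrite
     source bytes before they are read.  */
  if ((uintptr_t) dst - (uintptr_t) src < n)
    {
      /* Copy backwards, last byte first.  */
      while (n-- != 0)
        dst[n] = src[n];
    }
  else
    {
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    }
  return dstin;
}
```

Because the check costs one subtract and one compare, it is cheap enough to sit on the large-copy path only, which is consistent with the claim that sharing the code has no noticeable performance penalty.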
|
|
Create a new memcpy implementation for targets with the NEON extension.
__memcpy_aarch64_simd has been tested on a range of modern
microarchitectures. It turned out to be faster than __memcpy_aarch64 on
all of them, with a performance improvement of 3-11% depending on the
platform.
|
|
Without printing anything on success, it is unclear if the right set
of functions got hooked up in the test code.
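A minimal sketch of the idea, assuming a table-driven test: each function under test is registered with a name, and the name is printed for passes as well as failures. The struct layout, funtab, and run_tests are hypothetical names for illustration, and only the standard strlen is registered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: a printable name and the function to test.  */
struct fun
{
  const char *name;
  size_t (*fun) (const char *);
};

/* Only the libc strlen is listed; optimized variants would be added
   as further entries.  The table ends with a null entry.  */
static const struct fun funtab[] = {
  { "strlen", strlen },
  { 0, 0 }
};

/* Runs one trivial check per entry, prints PASS or FAIL with the
   function name, and returns the number of failures.  */
static int run_tests(void)
{
  int failures = 0;
  for (const struct fun *f = funtab; f->name; f++)
    {
      assert(f->fun != 0);
      int err = f->fun("abc") != 3;
      printf("%s %s\n", err ? "FAIL" : "PASS", f->name);
      failures += err;
    }
  return failures;
}
```

Reading the PASS lines shows at a glance which symbols were actually exercised in a given build.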
|
|
The only difference is changing the symbol name from strrchr
to __strrchr_aarch64_sve.
|
|
The only difference is changing the symbol name from strnlen
to __strnlen_aarch64_sve.
|
|
The only difference is changing the symbol name from strncmp
to __strncmp_aarch64_sve.
|
|
The only difference is changing the symbol name from strlen
to __strlen_aarch64_sve.
|
|
The only difference is changing the symbol name from strcpy
to __strcpy_aarch64_sve.
|
|
The only difference is changing the symbol name from strcmp
to __strcmp_aarch64_sve.
|
|
The only difference is changing the symbol name from strchr/strchrnul
to __strchr_aarch64_sve and __strchrnul_aarch64_sve.
|
|
The only difference is changing the symbol name from memcmp
to __memcmp_aarch64_sve.
|
|
The only difference is changing the symbol name from memchr
to __memchr_aarch64_sve.
|
|
The only difference is changing the symbol name from strlen
to __strlen_armv6t2.
|
|
The only difference is changing the symbol name from strcmp
to __strcmp_armv6m.
|
|
The only difference is changing the symbol name from strcmp
to __strcmp_arm.
|
|
The differences from cortex-strings are:
- Simplify the thumb-2/thumb selection by removing the usage of
PREFER_SIZE_OVER_SPEED and __OPTIMIZE_SIZE__.
- Remove the dumb byte-per-byte loops.
|
|
The only difference is changing the symbol name from memchr
to __memchr_arm and the final .size directive.
|
|
The only difference is changing the symbol name from memset
to __memset_arm and the final .size directive.
|
|
The only difference is changing the symbol name from memcpy
to __memcpy_arm.
|