[AArch64] Avoid unnecessary lane-indexed loads in READYUV

The existing code makes use of a pair of lane-indexed load instructions to fill the two halves of the input vector, however this has the effect of introducing an unnecessary dependency on the value of the vector from the previous loop iteration. This doesn't really seem to affect little core performance since these cores never execute enough work concurrently to hit the bottleneck, however we can improve performance on mid and big cores quite a bit by using LDR instead of LD1 to load the low lane, zeroing the upper portion of the vector rather than keeping the previous value. Reduction in runtime for select kernels (no observed performance delta on Cortex-A55): Kernel | Cortex-A76 | Cortex-X2 I422ToARGB4444Row_NEON | -23.1% | -49.3% I422ToARGBRow_NEON | -1.2% | -2.5% I422ToRGB24Row_NEON | -11.7% | -7.0% I422ToRGBARow_NEON | -4.7% | -3.4% I444AlphaToARGBRow_NEON | -1.1% | -2.4% I444ToARGBRow_NEON | -1.6% | -3.2% I444ToRGB24Row_NEON | -9.6% | -6.8% Bug: libyuv:976 Change-Id: I8c9413e0e6ed97b8f060ce42b6e8abdfb77914b9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5365868 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>
author: George Steed <george.steed@arm.com> 2024-03-12 17:05:08 +0000
committer: libyuv LUCI CQ <libyuv-scoped@luci-project-accounts.iam.gserviceaccount.com> 2024-03-13 18:35:31 +0000
commit: 188e4e3afbfe90f6f13f59db6c929762de07921b (patch)
tree: d719993a8eec9279ca5cb130cc6debf79cdb8241
parent: 772bddaed79860d05f5d3b5ade09ec251b8cbbdb (diff)
download: libyuv-188e4e3afbfe90f6f13f59db6c929762de07921b.tar.gz
1 files changed, 2 insertions, 2 deletions
diff --git a/source/row_neon64.cc b/source/row_neon64.cc
index 1679f87c..a51b51aa 100644
--- a/source/row_neon64.cc
+++ b/source/row_neon64.cc
@@ -28,7 +28,7 @@ extern "C" {
 // Read 8 Y, 4 U and 4 V from 422
 #define READYUV422                               \
   "ldr        d0, [%[src_y]], #8             \n" \
-  "ld1        {v1.s}[0], [%[src_u]], #4      \n" \
+  "ldr        s1, [%[src_u]], #4             \n" \
   "ld1        {v1.s}[1], [%[src_v]], #4      \n" \
   "zip1       v0.16b, v0.16b, v0.16b         \n" \
   "prfm       pldl1keep, [%[src_y], 448]     \n" \
@@ -39,7 +39,7 @@ extern "C" {
 // Read 8 Y, 8 U and 8 V from 444
 #define READYUV444                               \
   "ldr        d0, [%[src_y]], #8             \n" \
-  "ld1        {v1.d}[0], [%[src_u]], #8      \n" \
+  "ldr        d1, [%[src_u]], #8             \n" \
   "prfm       pldl1keep, [%[src_y], 448]     \n" \
   "ld1        {v1.d}[1], [%[src_v]], #8      \n" \
   "prfm       pldl1keep, [%[src_u], 448]     \n" \
author	George Steed <george.steed@arm.com>	2024-03-12 17:05:08 +0000
committer	libyuv LUCI CQ <libyuv-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-03-13 18:35:31 +0000
commit	188e4e3afbfe90f6f13f59db6c929762de07921b (patch)
tree	d719993a8eec9279ca5cb130cc6debf79cdb8241
parent	772bddaed79860d05f5d3b5ade09ec251b8cbbdb (diff)
download	libyuv-188e4e3afbfe90f6f13f59db6c929762de07921b.tar.gz