author Benoit Jacob <benoitjacob@google.com> 2019-08-20 10:25:05 -0400
committer Benoit Jacob <benoitjacob@google.com> 2020-03-10 16:36:41 -0400
commit fa69a4bbdf3b676156668842b5d2e042cd4cd1f7
tree 2f19d08f4e70d5ab400aa64d0eaf341fa299dc01 /BUILD
parent 9a8ac17ea97b04776c6c0ab9f90ee4f9c3636afe
download ruy-fa69a4bbdf3b676156668842b5d2e042cd4cd1f7.tar.gz
Some more fixes to arm32 asm:
- Use vld1.8, not vld1.32, to load 8-bit values. Especially in packing code, the source pointers are not guaranteed to have any alignment. In kernels, they are more or less guaranteed to be aligned, but .8 is more idiomatic. If we ever notice a performance benefit of .32 (news to me) justifying this choice, we could then use .32 in kernels only, with a comment recording the performance rationale.
- One vld1 was passing a single d-register without enclosing it in {} to make it a register list.
- Pack8bitNeonOutOfOrder{LHS,RHS} renamed to Pack8bitNeonOutOfOrder{4Cols,2Cols} because that is more descriptive of the actual difference between these functions.

PiperOrigin-RevId: 264378751
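
For readers unfamiliar with the alignment point in the first item, here is a rough portable C++ analogy. It is not code from ruy and does not use NEON. The helper name SumEightBytes is hypothetical. It only illustrates that reading 8-bit data through byte-typed accesses makes no assumption about the alignment of the source pointer, which is the property the commit message relies on when preferring .8 over .32.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, for illustration only (not part of ruy).
// Sums 8 consecutive 8-bit values starting at src.
// The bytes are copied with std::memcpy, so src may have any alignment.
// Reading the same data through a std::uint32_t* would instead require
// src to be suitably aligned for std::uint32_t.
inline std::uint32_t SumEightBytes(const std::uint8_t* src) {
  std::uint8_t lanes[8];
  std::memcpy(lanes, src, sizeof(lanes));  // byte-wise copy, alignment-agnostic
  std::uint32_t sum = 0;
  for (std::uint8_t v : lanes) {
    sum += v;
  }
  return sum;
}
```

Calling SumEightBytes(buf + 1) on a byte buffer is well defined for any offset, which mirrors the situation in packing code where the source pointer can land on any byte boundary.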
Diffstat (limited to 'BUILD')
0 files changed, 0 insertions, 0 deletions