path: root/scripts
Age | Commit message | Author
2021-02-26 | QS8 Neon IGEMM microkernels with 8 bit MUL using DUP | Frank Barchard
PiperOrigin-RevId: 359852046
2021-02-23 | QS8 Neon IGEMM C16 microkernel with two 8 bit multiplies and vpadal to accumulate. | Frank Barchard
Based on C8, but uses ld128, which means fewer loads, fewer registers and no remainder code.
PiperOrigin-RevId: 359009745
2021-02-19 | QS8 Neon GEMM C16 microkernel with two 8 bit multiplies and vpadal to accumulate. | Frank Barchard
C16 partial sums kernel using mull to multiply 8 bit values into 16 bit products, then mlal to multiply a second set of 8 bit values and add them into the same 16 bit lanes. Then padal adds adjacent pairs and lengthens them to 32 bit, accumulating. The 4 int accumulators together represent 1 byte in the final output, so there is one vector for each element of the output matrix. The 4 ints are added together outside the loop.
PiperOrigin-RevId: 358428739
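The C16 accumulation in the entry above can be modeled lane by lane in scalar Python. This is a minimal sketch, not the XNNPACK kernel: the names `wrap_i16` and `dot_c16` are hypothetical, and it assumes the 16 bytes from one ld128 are split into low and high halves, with mull applied to the low halves and mlal to the high halves. It also makes one arithmetic hazard visible: a single int8 * int8 product lies in [-16256, 16384], so the sum of two products in one int16 lane can reach 32768 (only when all four operands are -128) and wrap. Keeping one operand in [-127, 127] keeps every lane in range; whether the kernels rely on exactly that restriction is an assumption here.

```python
from typing import Sequence


def wrap_i16(x: int) -> int:
    """Wrap an integer to signed 16 bits, like an int16 NEON lane."""
    return (x + 0x8000) % 0x10000 - 0x8000


def dot_c16(a: Sequence[int], b: Sequence[int]) -> int:
    """Hypothetical scalar model of one C16 output element.

    a and b hold int8 values; their length must be a multiple of 16.
    int32 wraparound of the accumulators is not modeled.
    """
    assert len(a) == len(b) and len(a) % 16 == 0
    acc = [0, 0, 0, 0]  # one int32x4 accumulator vector per output element
    for k in range(0, len(a), 16):
        # mull: low 8 bytes, int8 * int8 -> int16 (a single product always fits)
        lanes = [a[k + i] * b[k + i] for i in range(8)]
        # mlal: add the high-8-byte products into the same int16 lanes (may wrap)
        lanes = [wrap_i16(lanes[i] + a[k + 8 + i] * b[k + 8 + i]) for i in range(8)]
        # padal: add adjacent int16 pairs, widen to int32, accumulate
        for j in range(4):
            acc[j] += lanes[2 * j] + lanes[2 * j + 1]
    # outside the loop: add the 4 ints together
    return sum(acc)
```

With one operand kept in [-127, 127] the result matches a plain dot product; with every operand at -128, each int16 lane wraps from 32768 to -32768 and the result goes negative.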
2021-02-16 | Remove scalar C4 QS8 and QU8 gemm microkernels. | Frank Barchard
PiperOrigin-RevId: 357789675
2021-02-15 | QS8 C2 Neon igemm | Frank Barchard
PiperOrigin-RevId: 357621434
2021-02-15 | QS8 C8 Neon igemm | Frank Barchard
PiperOrigin-RevId: 357611597
2021-02-03 | C2 QS8 microkernel using mull then mlal with KC loop of 16 | Frank Barchard
PiperOrigin-RevId: 355524975
2021-01-29 | QS8 Neon GEMM C8 microkernel with 8 bit multiply and vpadal to accumulate. | Frank Barchard
C8 partial sums kernel using mull to multiply 8 bit values into 16 bit products, a full 64 bits at a time. Then padal adds adjacent pairs and lengthens them to 32 bit, accumulating. The 4 int accumulators together represent 1 byte in the final output, so there is one vector for each element of the output matrix. The 4 ints are added together outside the loop.
PiperOrigin-RevId: 354631007
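A scalar model of the C8 scheme in the entry above, for checking the lane arithmetic. It is a sketch under stated assumptions rather than the kernel itself: `dot_c8` is a hypothetical name, each 64-bit load is modeled as 8 int8 values, and int32 accumulator overflow is not modeled. Because a single int8 * int8 product lies in [-16256, 16384], the 16-bit result of mull cannot overflow, so no wrapping step is needed before padal.

```python
from typing import Sequence


def dot_c8(a: Sequence[int], b: Sequence[int]) -> int:
    """Hypothetical scalar model of one C8 output element.

    a and b hold int8 values; their length must be a multiple of 8.
    """
    assert len(a) == len(b) and len(a) % 8 == 0
    acc = [0, 0, 0, 0]  # one int32x4 accumulator vector per output element
    for k in range(0, len(a), 8):
        # mull: 8 products, int8 * int8 -> int16, 64 bits of input at a time
        lanes = [a[k + i] * b[k + i] for i in range(8)]
        assert all(-32768 <= p <= 32767 for p in lanes)  # int16 never overflows
        # padal: add adjacent int16 pairs, widen to int32, accumulate
        for j in range(4):
            acc[j] += lanes[2 * j] + lanes[2 * j + 1]
    # outside the loop: add the 4 ints together
    return sum(acc)
```

Each call corresponds to one element of the output matrix, matching the point that there is one accumulator vector per element.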
2021-01-22 | Implement bilinear upsampling (CHW layout) for ARM architecture | Artsiom Ablavatski
PiperOrigin-RevId: 353317573
2021-01-22 | QS8 Neon GEMM microkernel with 8 bit multiply and vpadal to accumulate | Frank Barchard
PiperOrigin-RevId: 353315852
2021-01-15 | QS8 GEMM and IGEMM 3x8 and 3x16, and IGEMM 4x8 and 4x16 | Frank Barchard
PiperOrigin-RevId: 352033627
2021-01-14 | QS8 Neon GEMM microkernel with 8 bit multiply | Frank Barchard
PiperOrigin-RevId: 351893800
2021-01-12 | Add 4x8 and 4x16 qs8 gemm microkernels | Frank Barchard
PiperOrigin-RevId: 351464523
2020-12-21 | WebAssembly DWConv2D 3x3 stride 2 loadsplat | Frank Barchard
PiperOrigin-RevId: 348584331
2020-12-21 | WebAssembly DWConv2D 5x5 stride 2 loadsplat | Frank Barchard
PiperOrigin-RevId: 348498716
2020-12-15 | WebAssembly DWConv2D 3x3p1 adapted from NEON | Frank Barchard
PiperOrigin-RevId: 347733375
2020-12-15 | WASMSIMD dwconv2d 5x5p2 uses loadsplat | Frank Barchard
PiperOrigin-RevId: 347719530
2020-12-11 | Additional SSE/SSE2 GEMM/IGEMM microkernels | Marat Dukhan
4x8 LOAD1 version is still the fastest on Silvermont
PiperOrigin-RevId: 347077581
2020-12-11 | Rename WASMSIMD dwconv2d functions to splat or loadsplat | Frank Barchard
- 3x3p1 is renamed to loadsplat. This function was ported from SSSE3.
- 3x3s2p1 is renamed to splat.
- 5x5p2 is renamed to splat.
- 5x5s2p2 is renamed to splat.
PiperOrigin-RevId: 347042306
2020-12-07 | Rename WebAssembly SIMD source files and functions with x86 or arm suffix after wasmsimd | Frank Barchard
- BUILD/CMakeLists.txt sorted alphanumerically
PiperOrigin-RevId: 346133603
2020-12-06 | Refactor accuracy evaluation benchmarks | Marat Dukhan
PiperOrigin-RevId: 346004052
2020-12-06 | NEON versions of non-blocked F32 SpMM microkernels | Marat Dukhan
PiperOrigin-RevId: 346003353
2020-12-04 | WebAssembly SIMD DWConv2D 3x3 stride-2 adapted from NEON | Frank Barchard
PiperOrigin-RevId: 345689106
2020-12-03 | WebAssembly SIMD DWConv2D 5x5 stride 2 adapted from NEON | Frank Barchard
PiperOrigin-RevId: 345557870
2020-12-03 | Remove code generator for old 5x5p2 | Frank Barchard
PiperOrigin-RevId: 345423998
2020-12-01 | Vector ELU microkernels | Marat Dukhan
PiperOrigin-RevId: 345108685
2020-11-30 | WebAssembly DWConv2D f32_dwconv2d_chw_ukernel_5x5p2__wasmsimd adapted from Neon | Frank Barchard
PiperOrigin-RevId: 344959802
2020-11-22 | WAsm SIMD version of DWCONV2D CHW 3x3p1 | Frank Barchard
PiperOrigin-RevId: 343748483
2020-11-16 | WasmSIMD dwconv2d: generate x86-optimized version. | Frank Barchard
PiperOrigin-RevId: 342804477
2020-11-03 | Pipelined WebAssembly Sparse Matrix Multiply | Frank Barchard
PiperOrigin-RevId: 340501315
2020-10-30 | Rename unroll to x for SpMM microkernels with unrolled loop | Frank Barchard
PiperOrigin-RevId: 339990626
2020-10-30 | SSE variant of 5x5s2 DWCONV CHW micro-kernels | Marat Dukhan
PiperOrigin-RevId: 339968575
2020-10-30 | SSE variants of 5x5 DWCONV CHW micro-kernels | Marat Dukhan
PiperOrigin-RevId: 339873989
2020-10-29 | Auto-generate 5x5s2 DWCONV CHW micro-kernels | Marat Dukhan
PiperOrigin-RevId: 339770846
2020-10-28 | Auto-generate 5x5s2p2 DWCONV CHW micro-kernels | Marat Dukhan
PiperOrigin-RevId: 339600408
2020-10-26 | Add 32x1, 32x2 and 32x4 SpMM microkernels and remove 4x1, 4x2 and 4x4 for WASMSIMD, Neon and SSE | Frank Barchard
PiperOrigin-RevId: 339125492
2020-10-26 | Auto-generate NEON 5x5p2 DWCONV micro-kernels | Marat Dukhan
PiperOrigin-RevId: 339097659
2020-10-25 | Auto-generate scalar 5x5p2 DWCONV CHW micro-kernels | Marat Dukhan
PiperOrigin-RevId: 338981996
2020-10-25 | Auto-generate scalar versions of DWCONV2D CHW 3x3s2p1 micro-kernels | Marat Dukhan
PiperOrigin-RevId: 338962809
2020-10-25 | Auto-generate NEON/NEONFMA versions of DWCONV2D CHW 3x3s2p1 micro-kernels | Marat Dukhan
PiperOrigin-RevId: 338962587
2020-10-25 | Auto-generate SSE versions of DWCONV2D CHW 3x3s2p1 micro-kernels | Marat Dukhan
PiperOrigin-RevId: 338962367
2020-10-24 | Auto-generate scalar versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan
PiperOrigin-RevId: 338847524
2020-10-24 | NEON versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan
PiperOrigin-RevId: 338809743
2020-10-24 | Auto-generate AArch64 NEONFMA versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan
PiperOrigin-RevId: 338806381
2020-10-23 | SSSE3 versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan
PiperOrigin-RevId: 338801635
2020-10-23 | Auto-generate SSE versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan
PiperOrigin-RevId: 338798065
2020-10-23 | Add WebAssembly SIMD IBILINEAR microkernels for CHW layout | XNNPACK Team
PiperOrigin-RevId: 338792392
2020-10-23 | Rename DWCONV CHW microkernels to DWCONV2D CHW | Marat Dukhan
Explicitly mention that microkernels assume 2D layout
PiperOrigin-RevId: 338788486
2020-10-23 | Roll back the decision to split the packed weights for the CHW IBILINEAR microkernel interface | XNNPACK Team
PiperOrigin-RevId: 338785454
2020-10-23 | Generate DWCONV CHW microkernel tests from a YAML specification | Marat Dukhan
PiperOrigin-RevId: 338783962