Date | Commit message | Author | |
---|---|---|---|
2021-02-26 | QS8 Neon IGEMM microkernels with 8 bit MUL using DUP | Frank Barchard | |
PiperOrigin-RevId: 359852046 | |||
2021-02-23 | QS8 Neon IGEMM C16 microkernel with two 8 bit multiplies and vpadal to accumulate | Frank Barchard | |
Based on C8 but with ld128, which means fewer loads, fewer registers and no remainder code. PiperOrigin-RevId: 359009745 | |||
2021-02-19 | QS8 Neon GEMM C16 microkernel with two 8 bit multiplies and vpadal to accumulate | Frank Barchard | |
C16 partial-sums kernel: mull multiplies 8 bit values into 16 bit products, then mlal multiplies the next 8 bit values and accumulates them into the same 16 bit lanes. Then padal adds adjacent pairs and lengthens them to 32 bit, accumulating. The 4 int32 accumulators together represent 1 byte of the final output, so there is one accumulator vector for each element of the output matrix. The 4 ints are added together outside the loop. PiperOrigin-RevId: 358428739 | |||
2021-02-16 | Remove scalar C4 QS8 and QU8 gemm microkernels. | Frank Barchard | |
PiperOrigin-RevId: 357789675 | |||
2021-02-15 | QS8 C2 Neon igemm | Frank Barchard | |
PiperOrigin-RevId: 357621434 | |||
2021-02-15 | QS8 C8 Neon igemm | Frank Barchard | |
PiperOrigin-RevId: 357611597 | |||
2021-02-03 | C2 QS8 microkernel using mull then mlal with KC loop of 16 | Frank Barchard | |
PiperOrigin-RevId: 355524975 | |||
2021-01-29 | QS8 Neon GEMM C8 microkernel with 8 bit multiply and vpadal to accumulate. | Frank Barchard | |
C8 partial-sums kernel using mull to multiply 8 bit values into 16 bit products, a full 64 bits (8 bytes) at a time. Then padal adds adjacent pairs and lengthens them to 32 bit, accumulating. The 4 int32 accumulators together represent 1 byte of the final output, so there is one accumulator vector for each element of the output matrix. The 4 ints are added together outside the loop. PiperOrigin-RevId: 354631007 | |||
2021-01-22 | Implement bilinear upsampling (CHW layout) for ARM architecture | Artsiom Ablavatski | |
PiperOrigin-RevId: 353317573 | |||
2021-01-22 | QS8 Neon GEMM microkernel with 8 bit multiply and vpadal to accumulate | Frank Barchard | |
PiperOrigin-RevId: 353315852 | |||
2021-01-15 | QS8 GEMM and IGEMM 3x8 and 3x16, and IGEMM 4x8 and 4x16 | Frank Barchard | |
PiperOrigin-RevId: 352033627 | |||
2021-01-14 | QS8 Neon GEMM microkernel with 8 bit multiply | Frank Barchard | |
PiperOrigin-RevId: 351893800 | |||
2021-01-12 | Add 4x8 and 4x16 qs8 gemm microkernels | Frank Barchard | |
PiperOrigin-RevId: 351464523 | |||
2020-12-21 | WebAssembly DWConv2D 3x3 stride 2 loadsplat | Frank Barchard | |
PiperOrigin-RevId: 348584331 | |||
2020-12-21 | WebAssembly DWConv2D 5x5 stride 2 loadsplat | Frank Barchard | |
PiperOrigin-RevId: 348498716 | |||
2020-12-15 | WebAssembly DWConv2D 3x3p1 adapted from NEON | Frank Barchard | |
PiperOrigin-RevId: 347733375 | |||
2020-12-15 | WASMSIMD dwconv2d 5x5p2 use loadsplat | Frank Barchard | |
PiperOrigin-RevId: 347719530 | |||
2020-12-11 | Additional SSE/SSE2 GEMM/IGEMM microkernels | Marat Dukhan | |
4x8 LOAD1 version is still the fastest on Silvermont PiperOrigin-RevId: 347077581 | |||
2020-12-11 | Rename WASMSIMD dwconv2d functions to splat or loadsplat | Frank Barchard | |
3x3p1 is renamed to loadsplat (this function was ported from SSSE3); 3x3s2p1, 5x5p2 and 5x5s2p2 are renamed to splat. PiperOrigin-RevId: 347042306 | |||
2020-12-07 | Rename WebAssembly SIMD source files and functions with x86 or arm suffix after wasmsimd | Frank Barchard | |
BUILD/CMakeLists.txt sorted alphanumerically. PiperOrigin-RevId: 346133603 | |||
2020-12-06 | Refactor accuracy evaluation benchmarks | Marat Dukhan | |
PiperOrigin-RevId: 346004052 | |||
2020-12-06 | NEON versions of non-blocked F32 SpMM microkernels | Marat Dukhan | |
PiperOrigin-RevId: 346003353 | |||
2020-12-04 | WebAssembly SIMD DWConv2D 3x3 stride-2 adapted from NEON | Frank Barchard | |
PiperOrigin-RevId: 345689106 | |||
2020-12-03 | WebAssembly SIMD DWConv2D 5x5 stride 2 adapted from NEON | Frank Barchard | |
PiperOrigin-RevId: 345557870 | |||
2020-12-03 | Remove code generator for old 5x5p2 | Frank Barchard | |
PiperOrigin-RevId: 345423998 | |||
2020-12-01 | Vector ELU microkernels | Marat Dukhan | |
PiperOrigin-RevId: 345108685 | |||
2020-11-30 | WebAssembly DWConv2D f32_dwconv2d_chw_ukernel_5x5p2__wasmsimd adapted from Neon | Frank Barchard | |
PiperOrigin-RevId: 344959802 | |||
2020-11-22 | WAsm SIMD version of DWCONV2D CHW 3x3p1 | Frank Barchard | |
PiperOrigin-RevId: 343748483 | |||
2020-11-16 | WasmSIMD dwconv2d generate x86 optimized version. | Frank Barchard | |
PiperOrigin-RevId: 342804477 | |||
2020-11-03 | Pipelined WebAssembly Sparse Matrix Multiply | Frank Barchard | |
PiperOrigin-RevId: 340501315 | |||
2020-10-30 | Rename unroll to x for SpMM microkernels with unrolled loop | Frank Barchard | |
PiperOrigin-RevId: 339990626 | |||
2020-10-30 | SSE variant of 5x5s2 DWCONV CHW micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 339968575 | |||
2020-10-30 | SSE variants of 5x5 DWCONV CHW micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 339873989 | |||
2020-10-29 | Auto-generate 5x5s2 DWCONV CHW micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 339770846 | |||
2020-10-28 | Auto-generate 5x5s2p2 DWCONV CHW micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 339600408 | |||
2020-10-26 | Add 32x1 32x2 32x4 SPMM microkernels and remove 4x1 4x2 4x4 for WASMSIMD, Neon and SSE | Frank Barchard | |
PiperOrigin-RevId: 339125492 | |||
2020-10-26 | Auto-generate NEON 5x5p2 DWCONV micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 339097659 | |||
2020-10-25 | Auto-generate scalar 5x5p2 DWCONV CHW micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 338981996 | |||
2020-10-25 | Auto-generate scalar versions of DWCONV2D CHW 3x3s2p1 micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 338962809 | |||
2020-10-25 | Auto-generate NEON/NEONFMA versions of DWCONV2D CHW 3x3s2p1 micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 338962587 | |||
2020-10-25 | Auto-generate SSE versions of DWCONV2D CHW 3x3s2p1 micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 338962367 | |||
2020-10-24 | Auto-generate scalar versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 338847524 | |||
2020-10-24 | NEON versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 338809743 | |||
2020-10-24 | Auto-generate AArch64 NEONFMA versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 338806381 | |||
2020-10-23 | SSSE3 versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 338801635 | |||
2020-10-23 | Auto-generate SSE versions of DWCONV2D CHW 3x3p1 micro-kernels | Marat Dukhan | |
PiperOrigin-RevId: 338798065 | |||
2020-10-23 | Add WebAssembly SIMD IBILINEAR microkernels for CHW layout | XNNPACK Team | |
PiperOrigin-RevId: 338792392 | |||
2020-10-23 | Rename DWCONV CHW microkernels to DWCONV2D CHW | Marat Dukhan | |
Explicitly mention that microkernels assume 2D layout PiperOrigin-RevId: 338788486 | |||
2020-10-23 | Roll back the decision to split the packed weights for the CHW IBILINEAR microkernel interface | XNNPACK Team | |
PiperOrigin-RevId: 338785454 | |||
2020-10-23 | Generate DWCONV CHW microkernel tests from a YAML specification | Marat Dukhan | |
PiperOrigin-RevId: 338783962 |
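
The C8 partial-sums scheme in the 2021-01-29 commit can be modeled in scalar C. This is a minimal sketch, not XNNPACK code. `mull_s8`, `padal_s16` and `dot_c8` are hypothetical names. They emulate what the NEON `vmull_s8` and `vpadalq_s16` intrinsics do for a single output element, and `kc` is assumed to be a multiple of 8. An 8 bit by 8 bit product always fits in int16, since its magnitude is at most 128 × 128 = 16384. Because of that, the mull step cannot overflow. Only the padal step widens to int32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vmull_s8: 8 x (int8 * int8) -> 8 x int16.
 * Each product has magnitude <= 16384, so it always fits in int16. */
static void mull_s8(const int8_t a[8], const int8_t b[8], int16_t prod[8]) {
  for (int i = 0; i < 8; i++) {
    prod[i] = (int16_t)(a[i] * b[i]);
  }
}

/* Hypothetical scalar model of vpadalq_s16: add adjacent int16 pairs,
 * widen to int32, and accumulate into the 4 int32 lanes. */
static void padal_s16(int32_t acc[4], const int16_t v[8]) {
  for (int i = 0; i < 4; i++) {
    acc[i] += (int32_t)v[2 * i] + (int32_t)v[2 * i + 1];
  }
}

/* One output element: dot product of a row of A with a column of B.
 * Assumption: kc is a multiple of 8 (no remainder handling here). */
static int32_t dot_c8(const int8_t* a, const int8_t* b, size_t kc) {
  assert(kc % 8 == 0);
  int32_t acc[4] = {0, 0, 0, 0}; /* one accumulator vector per output element */
  int16_t prod[8];
  for (size_t k = 0; k < kc; k += 8) {
    mull_s8(a + k, b + k, prod); /* 64 bits of A and B per iteration */
    padal_s16(acc, prod);
  }
  /* Outside the loop: add the 4 int32 lanes together. */
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Keeping 4 int32 lanes per output element means the horizontal add happens once per output, not once per loop iteration.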
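
The C16 variant in the 2021-02-19 commit processes 16 bytes per iteration. It multiplies the low 8 bytes with mull, then multiplies the high 8 bytes with mlal into the same int16 lanes, and finally pairwise-accumulates with padal. The scalar model below is a sketch under stated assumptions, not XNNPACK code. The names `mull_mlal_s8`, `padal_s16_acc` and `dot_c16` are hypothetical, and `kc` is assumed to be a multiple of 16.

Unlike C8, the int16 lane now holds the sum of two products, and that sum can overflow. It does so only when both products are (-128) × (-128), giving 32768. Keeping one operand in [-127, 127] avoids this case. The sketch therefore asserts that each pair sum stays in int16 range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vmull_s8 on the low 8 bytes followed by vmlal_s8
 * on the high 8 bytes: lane i holds a[i]*b[i] + a[i+8]*b[i+8] as int16. */
static void mull_mlal_s8(const int8_t a[16], const int8_t b[16], int16_t prod[8]) {
  for (int i = 0; i < 8; i++) {
    int32_t s = (int32_t)a[i] * b[i] + (int32_t)a[i + 8] * b[i + 8];
    /* Assumption: the pair sum fits in int16; it can only overflow when
     * both products are (-128) * (-128). */
    assert(s >= INT16_MIN && s <= INT16_MAX);
    prod[i] = (int16_t)s;
  }
}

/* Hypothetical model of vpadalq_s16: add adjacent int16 pairs, widen to
 * int32, and accumulate into the 4 int32 lanes. */
static void padal_s16_acc(int32_t acc[4], const int16_t v[8]) {
  for (int i = 0; i < 4; i++) {
    acc[i] += (int32_t)v[2 * i] + (int32_t)v[2 * i + 1];
  }
}

/* One output element; assumption: kc is a multiple of 16. */
static int32_t dot_c16(const int8_t* a, const int8_t* b, size_t kc) {
  assert(kc % 16 == 0);
  int32_t acc[4] = {0, 0, 0, 0};
  int16_t prod[8];
  for (size_t k = 0; k < kc; k += 16) {
    mull_mlal_s8(a + k, b + k, prod); /* two 8 bit multiplies per lane */
    padal_s16_acc(acc, prod);
  }
  /* Outside the loop: add the 4 int32 lanes together. */
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```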