author | Tim Murray <timmurray@google.com> | 2015-03-30 15:14:36 -0700 |
---|---|---|
committer | Tim Murray <timmurray@google.com> | 2015-04-03 13:40:20 -0700 |
commit | aff744561bea3c8a7a7d59c0cb8cd9438f6dcd1c (patch) | |
tree | 9374920f3746bd76b27c692c94823f10398cc09d /rsDefines.h | |
parent | 9270cd93e444d11d6e1b49653613409f34a0cc35 (diff) | |
download | rs-aff744561bea3c8a7a7d59c0cb8cd9438f6dcd1c.tar.gz |
Add eight-bit GEMM-like intrinsic.
Change-Id: I9b920900b4cb8b27e2ab27386d05f4175142d6b2
Diffstat (limited to 'rsDefines.h')
-rw-r--r-- | rsDefines.h | 295 |
1 files changed, 151 insertions, 144 deletions
diff --git a/rsDefines.h b/rsDefines.h
index 4ccdeb85..8b334c69 100644
--- a/rsDefines.h
+++ b/rsDefines.h
@@ -277,148 +277,151 @@ enum RsBlasSide {
 
 enum RsBlasFunction {
     RsBlas_nop = 0,
-    RsBlas_sdsdot,
-    RsBlas_dsdot,
-    RsBlas_sdot,
-    RsBlas_ddot,
-    RsBlas_cdotu_sub,
-    RsBlas_cdotc_sub,
-    RsBlas_zdotu_sub,
-    RsBlas_zdotc_sub,
-    RsBlas_snrm2,
-    RsBlas_sasum,
-    RsBlas_dnrm2,
-    RsBlas_dasum,
-    RsBlas_scnrm2,
-    RsBlas_scasum,
-    RsBlas_dznrm2,
-    RsBlas_dzasum,
-    RsBlas_isamax,
-    RsBlas_idamax,
-    RsBlas_icamax,
-    RsBlas_izamax,
-    RsBlas_sswap,
-    RsBlas_scopy,
-    RsBlas_saxpy,
-    RsBlas_dswap,
-    RsBlas_dcopy,
-    RsBlas_daxpy,
-    RsBlas_cswap,
-    RsBlas_ccopy,
-    RsBlas_caxpy,
-    RsBlas_zswap,
-    RsBlas_zcopy,
-    RsBlas_zaxpy,
-    RsBlas_srotg,
-    RsBlas_srotmg,
-    RsBlas_srot,
-    RsBlas_srotm,
-    RsBlas_drotg,
-    RsBlas_drotmg,
-    RsBlas_drot,
-    RsBlas_drotm,
-    RsBlas_sscal,
-    RsBlas_dscal,
-    RsBlas_cscal,
-    RsBlas_zscal,
-    RsBlas_csscal,
-    RsBlas_zdscal,
-    RsBlas_sgemv,
-    RsBlas_sgbmv,
-    RsBlas_strmv,
-    RsBlas_stbmv,
-    RsBlas_stpmv,
-    RsBlas_strsv,
-    RsBlas_stbsv,
-    RsBlas_stpsv,
-    RsBlas_dgemv,
-    RsBlas_dgbmv,
-    RsBlas_dtrmv,
-    RsBlas_dtbmv,
-    RsBlas_dtpmv,
-    RsBlas_dtrsv,
-    RsBlas_dtbsv,
-    RsBlas_dtpsv,
-    RsBlas_cgemv,
-    RsBlas_cgbmv,
-    RsBlas_ctrmv,
-    RsBlas_ctbmv,
-    RsBlas_ctpmv,
-    RsBlas_ctrsv,
-    RsBlas_ctbsv,
-    RsBlas_ctpsv,
-    RsBlas_zgemv,
-    RsBlas_zgbmv,
-    RsBlas_ztrmv,
-    RsBlas_ztbmv,
-    RsBlas_ztpmv,
-    RsBlas_ztrsv,
-    RsBlas_ztbsv,
-    RsBlas_ztpsv,
-    RsBlas_ssymv,
-    RsBlas_ssbmv,
-    RsBlas_sspmv,
-    RsBlas_sger,
-    RsBlas_ssyr,
-    RsBlas_sspr,
-    RsBlas_ssyr2,
-    RsBlas_sspr2,
-    RsBlas_dsymv,
-    RsBlas_dsbmv,
-    RsBlas_dspmv,
-    RsBlas_dger,
-    RsBlas_dsyr,
-    RsBlas_dspr,
-    RsBlas_dsyr2,
-    RsBlas_dspr2,
-    RsBlas_chemv,
-    RsBlas_chbmv,
-    RsBlas_chpmv,
-    RsBlas_cgeru,
-    RsBlas_cgerc,
-    RsBlas_cher,
-    RsBlas_chpr,
-    RsBlas_cher2,
-    RsBlas_chpr2,
-    RsBlas_zhemv,
-    RsBlas_zhbmv,
-    RsBlas_zhpmv,
-    RsBlas_zgeru,
-    RsBlas_zgerc,
-    RsBlas_zher,
-    RsBlas_zhpr,
-    RsBlas_zher2,
-    RsBlas_zhpr2,
-    RsBlas_sgemm,
-    RsBlas_ssymm,
-    RsBlas_ssyrk,
-    RsBlas_ssyr2k,
-    RsBlas_strmm,
-    RsBlas_strsm,
-    RsBlas_dgemm,
-    RsBlas_dsymm,
-    RsBlas_dsyrk,
-    RsBlas_dsyr2k,
-    RsBlas_dtrmm,
-    RsBlas_dtrsm,
-    RsBlas_cgemm,
-    RsBlas_csymm,
-    RsBlas_csyrk,
-    RsBlas_csyr2k,
-    RsBlas_ctrmm,
-    RsBlas_ctrsm,
-    RsBlas_zgemm,
-    RsBlas_zsymm,
-    RsBlas_zsyrk,
-    RsBlas_zsyr2k,
-    RsBlas_ztrmm,
-    RsBlas_ztrsm,
-    RsBlas_chemm,
-    RsBlas_cherk,
-    RsBlas_cher2k,
-    RsBlas_zhemm,
-    RsBlas_zherk,
-    RsBlas_zher2k
+    RsBlas_sdsdot = 1,
+    RsBlas_dsdot = 2,
+    RsBlas_sdot = 3,
+    RsBlas_ddot = 4,
+    RsBlas_cdotu_sub = 5,
+    RsBlas_cdotc_sub = 6,
+    RsBlas_zdotu_sub = 7,
+    RsBlas_zdotc_sub = 8,
+    RsBlas_snrm2 = 9,
+    RsBlas_sasum = 10,
+    RsBlas_dnrm2 = 11,
+    RsBlas_dasum = 12,
+    RsBlas_scnrm2 = 13,
+    RsBlas_scasum = 14,
+    RsBlas_dznrm2 = 15,
+    RsBlas_dzasum = 16,
+    RsBlas_isamax = 17,
+    RsBlas_idamax = 18,
+    RsBlas_icamax = 19,
+    RsBlas_izamax = 20,
+    RsBlas_sswap = 21,
+    RsBlas_scopy = 22,
+    RsBlas_saxpy = 23,
+    RsBlas_dswap = 24,
+    RsBlas_dcopy = 25,
+    RsBlas_daxpy = 26,
+    RsBlas_cswap = 27,
+    RsBlas_ccopy = 28,
+    RsBlas_caxpy = 29,
+    RsBlas_zswap = 30,
+    RsBlas_zcopy = 31,
+    RsBlas_zaxpy = 32,
+    RsBlas_srotg = 33,
+    RsBlas_srotmg = 34,
+    RsBlas_srot = 35,
+    RsBlas_srotm = 36,
+    RsBlas_drotg = 37,
+    RsBlas_drotmg = 38,
+    RsBlas_drot = 39,
+    RsBlas_drotm = 40,
+    RsBlas_sscal = 41,
+    RsBlas_dscal = 42,
+    RsBlas_cscal = 43,
+    RsBlas_zscal = 44,
+    RsBlas_csscal = 45,
+    RsBlas_zdscal = 46,
+    RsBlas_sgemv = 47,
+    RsBlas_sgbmv = 48,
+    RsBlas_strmv = 49,
+    RsBlas_stbmv = 50,
+    RsBlas_stpmv = 51,
+    RsBlas_strsv = 52,
+    RsBlas_stbsv = 53,
+    RsBlas_stpsv = 54,
+    RsBlas_dgemv = 55,
+    RsBlas_dgbmv = 56,
+    RsBlas_dtrmv = 57,
+    RsBlas_dtbmv = 58,
+    RsBlas_dtpmv = 59,
+    RsBlas_dtrsv = 60,
+    RsBlas_dtbsv = 61,
+    RsBlas_dtpsv = 62,
+    RsBlas_cgemv = 63,
+    RsBlas_cgbmv = 64,
+    RsBlas_ctrmv = 65,
+    RsBlas_ctbmv = 66,
+    RsBlas_ctpmv = 67,
+    RsBlas_ctrsv = 68,
+    RsBlas_ctbsv = 69,
+    RsBlas_ctpsv = 70,
+    RsBlas_zgemv = 71,
+    RsBlas_zgbmv = 72,
+    RsBlas_ztrmv = 73,
+    RsBlas_ztbmv = 74,
+    RsBlas_ztpmv = 75,
+    RsBlas_ztrsv = 76,
+    RsBlas_ztbsv = 77,
+    RsBlas_ztpsv = 78,
+    RsBlas_ssymv = 79,
+    RsBlas_ssbmv = 80,
+    RsBlas_sspmv = 81,
+    RsBlas_sger = 82,
+    RsBlas_ssyr = 83,
+    RsBlas_sspr = 84,
+    RsBlas_ssyr2 = 85,
+    RsBlas_sspr2 = 86,
+    RsBlas_dsymv = 87,
+    RsBlas_dsbmv = 88,
+    RsBlas_dspmv = 89,
+    RsBlas_dger = 90,
+    RsBlas_dsyr = 91,
+    RsBlas_dspr = 92,
+    RsBlas_dsyr2 = 93,
+    RsBlas_dspr2 = 94,
+    RsBlas_chemv = 95,
+    RsBlas_chbmv = 96,
+    RsBlas_chpmv = 97,
+    RsBlas_cgeru = 98,
+    RsBlas_cgerc = 99,
+    RsBlas_cher = 100,
+    RsBlas_chpr = 101,
+    RsBlas_cher2 = 102,
+    RsBlas_chpr2 = 103,
+    RsBlas_zhemv = 104,
+    RsBlas_zhbmv = 105,
+    RsBlas_zhpmv = 106,
+    RsBlas_zgeru = 107,
+    RsBlas_zgerc = 108,
+    RsBlas_zher = 109,
+    RsBlas_zhpr = 110,
+    RsBlas_zher2 = 111,
+    RsBlas_zhpr2 = 112,
+    RsBlas_sgemm = 113,
+    RsBlas_ssymm = 114,
+    RsBlas_ssyrk = 115,
+    RsBlas_ssyr2k = 116,
+    RsBlas_strmm = 117,
+    RsBlas_strsm = 118,
+    RsBlas_dgemm = 119,
+    RsBlas_dsymm = 120,
+    RsBlas_dsyrk = 121,
+    RsBlas_dsyr2k = 122,
+    RsBlas_dtrmm = 123,
+    RsBlas_dtrsm = 124,
+    RsBlas_cgemm = 125,
+    RsBlas_csymm = 126,
+    RsBlas_csyrk = 127,
+    RsBlas_csyr2k = 128,
+    RsBlas_ctrmm = 129,
+    RsBlas_ctrsm = 130,
+    RsBlas_zgemm = 131,
+    RsBlas_zsymm = 132,
+    RsBlas_zsyrk = 133,
+    RsBlas_zsyr2k = 134,
+    RsBlas_ztrmm = 135,
+    RsBlas_ztrsm = 136,
+    RsBlas_chemm = 137,
+    RsBlas_cherk = 138,
+    RsBlas_cher2k = 139,
+    RsBlas_zhemm = 140,
+    RsBlas_zherk = 141,
+    RsBlas_zher2k = 142,
+
+    // BLAS extensions start here
+    RsBlas_bgemm = 1000,
 };
 
 // custom complex types because of NDK support
@@ -432,7 +435,7 @@ typedef struct {
     double i;
 } RsDoubleComplex;
 
-typedef union {
+typedef union {
     float f;
     RsFloatComplex c;
     double d;
@@ -455,8 +458,12 @@ typedef struct {
     int incY;
     int KL;
     int KU;
+    uint32_t a_offset;
+    uint32_t b_offset;
+    uint32_t c_offset;
+    uint32_t c_mult_int;
 } RsBlasCall;
-
+
 #ifdef __cplusplus
 };
 #endif
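The patch only adds the `RsBlas_bgemm` enum value and four `uint32_t` fields; it does not show how they are consumed. As a rough illustration of what an eight-bit GEMM-like operation with such parameters could look like, here is a minimal C sketch. It assumes a common quantized-GEMM convention: add `a_offset` and `b_offset` to the inputs, accumulate in a wider integer, add `c_offset`, multiply by `c_mult_int`, round, shift right, and clamp to 0..255. The `QuantParams` struct, the function names, the row-major layout, and the `shift` parameter are all hypothetical and are not taken from the RenderScript sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the four fields the patch adds to RsBlasCall. */
typedef struct {
    uint32_t a_offset;
    uint32_t b_offset;
    uint32_t c_offset;
    uint32_t c_mult_int;
} QuantParams;

/* Clamp a wide intermediate value to the uint8_t range. */
static uint8_t clamp_u8(int64_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/*
 * Sketch: C (m x n) = requantize(A (m x k) * B (k x n)), all row-major.
 * `shift` is an assumed extra parameter; the diff does not define one.
 * Offsets are treated as small non-negative values here (an assumption).
 */
static void gemm_u8_sketch(const uint8_t *a, const uint8_t *b, uint8_t *c,
                           size_t m, size_t n, size_t k,
                           const QuantParams *q, unsigned shift) {
    assert(shift < 32);
    /* Round to nearest when shifting right by a nonzero amount. */
    const int64_t rounding = shift ? ((int64_t)1 << (shift - 1)) : 0;

    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            int64_t acc = 0;
            for (size_t l = 0; l < k; l++) {
                int64_t av = (int64_t)a[i * k + l] + (int64_t)q->a_offset;
                int64_t bv = (int64_t)b[l * n + j] + (int64_t)q->b_offset;
                acc += av * bv;
            }
            int64_t scaled =
                (acc + (int64_t)q->c_offset) * (int64_t)q->c_mult_int + rounding;
            c[i * n + j] = clamp_u8(scaled >> shift);
        }
    }
}
```

With all offsets at zero, `c_mult_int = 1`, and `shift = 0`, this reduces to a plain integer matrix product saturated to eight bits, which is a convenient sanity check for any real implementation.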