aboutsummaryrefslogtreecommitdiff
path: root/README.md
diff options
context:
space:
mode:
authorMárton Danóczy <marton.danoczy@nielsen.com>2015-09-30 18:14:38 +0200
committerMárton Danóczy <marton.danoczy@nielsen.com>2015-09-30 18:14:38 +0200
commiteee294f730e8eed08f95bbc8451d010f266bdbf3 (patch)
tree01bbc4a6adf31795e508c4e50d5f27579638f634 /README.md
parent64b8a2ed6a1b624849e7b369959524d346c598d9 (diff)
downloadpffft-eee294f730e8eed08f95bbc8451d010f266bdbf3.tar.gz
fixed formatting
Diffstat (limited to 'README.md')
-rw-r--r--README.md507
1 files changed, 261 insertions, 246 deletions
diff --git a/README.md b/README.md
index 2343216..01a6434 100644
--- a/README.md
+++ b/README.md
@@ -30,8 +30,8 @@ is a library that is written in FORTRAN 77, a language that is now
considered as a bit antiquated by many. FFTPACKv4 was written in 1985,
by Dr Paul Swarztrauber of NCAR, more than 25 years ago ! And despite
its age, benchmarks show it that it still a very good performing FFT
-library, see for example the 1d single precision benchmarks here:
-http://www.fftw.org/speed/opteron-2.2GHz-32bit/ . It is however not
+library, see for example the 1d single precision benchmarks
+[here](http://www.fftw.org/speed/opteron-2.2GHz-32bit/). It is however not
competitive with the fastest ones, such as FFTW, Intel MKL, AMD ACML,
Apple vDSP. The reason for that is that those libraries do take
advantage of the SSE SIMD instructions available on Intel CPUs,
@@ -43,16 +43,16 @@ a 4-fold performance improvement.
The idea was to take this fortran fftpack v4 code, translate to C,
modify it to deal with those SSE instructions, and check that the
final performance is not completely ridiculous when compared to other
-SIMD FFT libraries. Translation to C was performed with f2c (
-http://www.netlib.org/f2c/ ). The resulting file was a bit edited in
+SIMD FFT libraries. Translation to C was performed with [f2c](
+http://www.netlib.org/f2c/). The resulting file was a bit edited in
order to remove the thousands of gotos that were introduced by
f2c. You will find the fftpack.h and fftpack.c sources in the
-repository, this a complete translation of
-http://www.netlib.org/fftpack/ , with the discrete cosine transform
+repository, this a complete translation of [fftpack](
+http://www.netlib.org/fftpack/), with the discrete cosine transform
and the test program. There is no license information in the netlib
repository, but it was confirmed to me by the fftpack v5 curators that
-the same terms do apply to fftpack v4:
-http://www.cisl.ucar.edu/css/software/fftpack5/ftpk.html . This is a
+the [same terms do apply to fftpack v4]
+(http://www.cisl.ucar.edu/css/software/fftpack5/ftpk.html). This is a
"BSD-like" license, it is compatible with proprietary projects.
Adapting fftpack to deal with the SIMD 4-element vectors instead of
@@ -63,8 +63,8 @@ more code than I planned..
## The code:
-Only two files, in good old C, pffft.c and pffft.h . The API is very
-very simple, just make sure that you read the comments in pffft.h.
+Only two files, in good old C, `pffft.c` and `pffft.h`. The API is very
+very simple, just make sure that you read the comments in `pffft.h`.
## Comparison with other FFTs:
@@ -89,244 +89,257 @@ operation.
The benchmark shows the performance of various fft implementations measured in
MFlops, with the number of floating point operations being defined as 5Nlog2(N)
for a length N complex fft, and 2.5*Nlog2(N) for a real fft.
-See http://www.fftw.org/speed/method.html for an explanation of these formulas.
+See [here](http://www.fftw.org/speed/method.html) for an explanation of these formulas.
-MacOS Lion, gcc 4.2, 64-bit, fftw 3.3 on a 3.4 GHz core i7 2600
+### MacOS Lion, gcc 4.2, 64-bit, fftw 3.3 on a 3.4 GHz core i7 2600
Built with:
- gcc-4.2 -o test_pffft -arch x86_64 -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L/usr/local/lib -I/usr/local/include/ -DHAVE_VECLIB -framework veclib -DHAVE_FFTW -lfftw3f
-
-| input len |real FFTPack| real vDSP | real FFTW | real PFFFT | |cplx FFTPack| cplx vDSP | cplx FFTW | cplx PFFFT |
-|-----------+------------+------------+------------+------------| |------------+------------+------------+------------|
-| 64 | 2816 | 8596 | 7329 | 8187 | | 2887 | 14898 | 14668 | 11108 |
-| 96 | 3298 | n/a | 8378 | 7727 | | 3953 | n/a | 15680 | 10878 |
-| 128 | 3507 | 11575 | 9266 | 10108 | | 4233 | 17598 | 16427 | 12000 |
-| 160 | 3391 | n/a | 9838 | 10711 | | 4220 | n/a | 16653 | 11187 |
-| 192 | 3919 | n/a | 9868 | 10956 | | 4297 | n/a | 15770 | 12540 |
-| 256 | 4283 | 13179 | 10694 | 13128 | | 4545 | 19550 | 16350 | 13822 |
-| 384 | 3136 | n/a | 10810 | 12061 | | 3600 | n/a | 16103 | 13240 |
-| 480 | 3477 | n/a | 10632 | 12074 | | 3536 | n/a | 11630 | 12522 |
-| 512 | 3783 | 15141 | 11267 | 13838 | | 3649 | 20002 | 16560 | 13580 |
-| 640 | 3639 | n/a | 11164 | 13946 | | 3695 | n/a | 15416 | 13890 |
-| 768 | 3800 | n/a | 11245 | 13495 | | 3590 | n/a | 15802 | 14552 |
-| 800 | 3440 | n/a | 10499 | 13301 | | 3659 | n/a | 12056 | 13268 |
-| 1024 | 3924 | 15605 | 11450 | 15339 | | 3769 | 20963 | 13941 | 15467 |
-| 2048 | 4518 | 16195 | 11551 | 15532 | | 4258 | 20413 | 13723 | 15042 |
-| 2400 | 4294 | n/a | 10685 | 13078 | | 4093 | n/a | 12777 | 13119 |
-| 4096 | 4750 | 16596 | 11672 | 15817 | | 4157 | 19662 | 14316 | 14336 |
-| 8192 | 3820 | 16227 | 11084 | 12555 | | 3691 | 18132 | 12102 | 13813 |
-| 9216 | 3864 | n/a | 10254 | 12870 | | 3586 | n/a | 12119 | 13994 |
-| 16384 | 3822 | 15123 | 10454 | 12822 | | 3613 | 16874 | 12370 | 13881 |
-| 32768 | 4175 | 14512 | 10662 | 11095 | | 3881 | 14702 | 11619 | 11524 |
-| 262144 | 3317 | 11429 | 6269 | 9517 | | 2810 | 11729 | 7757 | 10179 |
-| 1048576 | 2913 | 10551 | 4730 | 5867 | | 2661 | 7881 | 3520 | 5350 |
-|-----------+------------+------------+------------+------------| |------------+------------+------------+------------|
-
-
-Debian 6, gcc 4.4.5, 64-bit, fftw 3.3.1 on a 3.4 GHz core i7 2600
+```
+gcc-4.2 -o test_pffft -arch x86_64 -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L/usr/local/lib -I/usr/local/include/ -DHAVE_VECLIB -framework veclib -DHAVE_FFTW -lfftw3f
+```
+
+| input len |real FFTPack| real vDSP | real FFTW | real PFFFT |cplx FFTPack| cplx vDSP | cplx FFTW | cplx PFFFT |
+|----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
+| 64 | 2816 | 8596 | 7329 | 8187 | 2887 | 14898 | 14668 | 11108 |
+| 96 | 3298 | n/a | 8378 | 7727 | 3953 | n/a | 15680 | 10878 |
+| 128 | 3507 | 11575 | 9266 | 10108 | 4233 | 17598 | 16427 | 12000 |
+| 160 | 3391 | n/a | 9838 | 10711 | 4220 | n/a | 16653 | 11187 |
+| 192 | 3919 | n/a | 9868 | 10956 | 4297 | n/a | 15770 | 12540 |
+| 256 | 4283 | 13179 | 10694 | 13128 | 4545 | 19550 | 16350 | 13822 |
+| 384 | 3136 | n/a | 10810 | 12061 | 3600 | n/a | 16103 | 13240 |
+| 480 | 3477 | n/a | 10632 | 12074 | 3536 | n/a | 11630 | 12522 |
+| 512 | 3783 | 15141 | 11267 | 13838 | 3649 | 20002 | 16560 | 13580 |
+| 640 | 3639 | n/a | 11164 | 13946 | 3695 | n/a | 15416 | 13890 |
+| 768 | 3800 | n/a | 11245 | 13495 | 3590 | n/a | 15802 | 14552 |
+| 800 | 3440 | n/a | 10499 | 13301 | 3659 | n/a | 12056 | 13268 |
+| 1024 | 3924 | 15605 | 11450 | 15339 | 3769 | 20963 | 13941 | 15467 |
+| 2048 | 4518 | 16195 | 11551 | 15532 | 4258 | 20413 | 13723 | 15042 |
+| 2400 | 4294 | n/a | 10685 | 13078 | 4093 | n/a | 12777 | 13119 |
+| 4096 | 4750 | 16596 | 11672 | 15817 | 4157 | 19662 | 14316 | 14336 |
+| 8192 | 3820 | 16227 | 11084 | 12555 | 3691 | 18132 | 12102 | 13813 |
+| 9216 | 3864 | n/a | 10254 | 12870 | 3586 | n/a | 12119 | 13994 |
+| 16384 | 3822 | 15123 | 10454 | 12822 | 3613 | 16874 | 12370 | 13881 |
+| 32768 | 4175 | 14512 | 10662 | 11095 | 3881 | 14702 | 11619 | 11524 |
+| 262144 | 3317 | 11429 | 6269 | 9517 | 2810 | 11729 | 7757 | 10179 |
+| 1048576 | 2913 | 10551 | 4730 | 5867 | 2661 | 7881 | 3520 | 5350 |
+
+
+### Debian 6, gcc 4.4.5, 64-bit, fftw 3.3.1 on a 3.4 GHz core i7 2600
Built with:
-gcc -o test_pffft -DHAVE_FFTW -msse2 -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L$HOME/local/lib -I$HOME/local/include/ -lfftw3f -lm
-| N (input length) | real FFTPack | real FFTW | real PFFFT | | cplx FFTPack | cplx FFTW | cplx PFFFT |
-|------------------+--------------+--------------+--------------| |--------------+--------------+--------------|
-| 64 | 3840 | 7680 | 8777 | | 4389 | 20480 | 11171 |
-| 96 | 4214 | 9633 | 8429 | | 4816 | 22477 | 11238 |
-| 128 | 3584 | 10240 | 10240 | | 5120 | 23893 | 11947 |
-| 192 | 4854 | 11095 | 12945 | | 4854 | 22191 | 14121 |
-| 256 | 4096 | 11703 | 16384 | | 5120 | 23406 | 13653 |
-| 384 | 4395 | 14651 | 12558 | | 4884 | 19535 | 14651 |
-| 512 | 5760 | 13166 | 15360 | | 4608 | 23040 | 15360 |
-| 768 | 4907 | 14020 | 16357 | | 4461 | 19628 | 14020 |
-| 1024 | 5120 | 14629 | 14629 | | 5120 | 20480 | 15754 |
-| 2048 | 5632 | 14080 | 18773 | | 4693 | 12516 | 16091 |
-| 4096 | 5120 | 13653 | 17554 | | 4726 | 7680 | 14456 |
-| 8192 | 4160 | 7396 | 13312 | | 4437 | 14791 | 13312 |
-| 9216 | 4210 | 6124 | 13473 | | 4491 | 7282 | 14970 |
-| 16384 | 3976 | 11010 | 14313 | | 4210 | 11450 | 13631 |
-| 32768 | 4260 | 10224 | 10954 | | 4260 | 6816 | 11797 |
-| 262144 | 3736 | 6896 | 9961 | | 2359 | 8965 | 9437 |
-| 1048576 | 2796 | 4534 | 6453 | | 1864 | 3078 | 5592 |
-|------------------+--------------+--------------+--------------| |--------------+--------------+--------------|
-
-
-
-MacOS Snow Leopard, gcc 4.0, 32-bit, fftw 3.3 on a 1.83 GHz core 1 duo
+```
+gcc -o test_pffft -DHAVE_FFTW -msse2 -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L$HOME/local/lib -I$HOME/local/include/ -lfftw3f -lm
+```
+
+| N (input length) | real FFTPack | real FFTW | real PFFFT | cplx FFTPack | cplx FFTW | cplx PFFFT |
+|-----------------:|-------------:|-------------:|-------------:|-------------:|-------------:|-------------:|
+| 64 | 3840 | 7680 | 8777 | 4389 | 20480 | 11171 |
+| 96 | 4214 | 9633 | 8429 | 4816 | 22477 | 11238 |
+| 128 | 3584 | 10240 | 10240 | 5120 | 23893 | 11947 |
+| 192 | 4854 | 11095 | 12945 | 4854 | 22191 | 14121 |
+| 256 | 4096 | 11703 | 16384 | 5120 | 23406 | 13653 |
+| 384 | 4395 | 14651 | 12558 | 4884 | 19535 | 14651 |
+| 512 | 5760 | 13166 | 15360 | 4608 | 23040 | 15360 |
+| 768 | 4907 | 14020 | 16357 | 4461 | 19628 | 14020 |
+| 1024 | 5120 | 14629 | 14629 | 5120 | 20480 | 15754 |
+| 2048 | 5632 | 14080 | 18773 | 4693 | 12516 | 16091 |
+| 4096 | 5120 | 13653 | 17554 | 4726 | 7680 | 14456 |
+| 8192 | 4160 | 7396 | 13312 | 4437 | 14791 | 13312 |
+| 9216 | 4210 | 6124 | 13473 | 4491 | 7282 | 14970 |
+| 16384 | 3976 | 11010 | 14313 | 4210 | 11450 | 13631 |
+| 32768 | 4260 | 10224 | 10954 | 4260 | 6816 | 11797 |
+| 262144 | 3736 | 6896 | 9961 | 2359 | 8965 | 9437 |
+| 1048576 | 2796 | 4534 | 6453 | 1864 | 3078 | 5592 |
+
+
+
+### MacOS Snow Leopard, gcc 4.0, 32-bit, fftw 3.3 on a 1.83 GHz core 1 duo
Built with:
- gcc -o test_pffft -DHAVE_FFTW -DHAVE_VECLIB -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L/usr/local/lib -I/usr/local/include/ -lfftw3f -framework veclib
-
-| input len |real FFTPack| real vDSP | real FFTW | real PFFFT | |cplx FFTPack| cplx vDSP | cplx FFTW | cplx PFFFT |
-|-----------+------------+------------+------------+------------| |------------+------------+------------+------------|
-| 64 | 745 | 2145 | 1706 | 2028 | | 961 | 3356 | 3313 | 2300 |
-| 96 | 877 | n/a | 1976 | 1978 | | 1059 | n/a | 3333 | 2233 |
-| 128 | 951 | 2808 | 2213 | 2279 | | 1202 | 3803 | 3739 | 2494 |
-| 192 | 1002 | n/a | 2456 | 2429 | | 1186 | n/a | 3701 | 2508 |
-| 256 | 1065 | 3205 | 2641 | 2793 | | 1302 | 4013 | 3912 | 2663 |
-| 384 | 845 | n/a | 2759 | 2499 | | 948 | n/a | 3729 | 2504 |
-| 512 | 900 | 3476 | 2956 | 2759 | | 974 | 4057 | 3954 | 2645 |
-| 768 | 910 | n/a | 2912 | 2737 | | 975 | n/a | 3837 | 2614 |
-| 1024 | 936 | 3583 | 3107 | 3009 | | 1006 | 4124 | 3821 | 2697 |
-| 2048 | 1057 | 3585 | 3091 | 2837 | | 1089 | 3889 | 3701 | 2513 |
-| 4096 | 1083 | 3524 | 3092 | 2733 | | 1039 | 3617 | 3462 | 2364 |
-| 8192 | 874 | 3252 | 2967 | 2363 | | 911 | 3106 | 2789 | 2302 |
-| 9216 | 898 | n/a | 2420 | 2290 | | 865 | n/a | 2676 | 2204 |
-| 16384 | 903 | 2892 | 2506 | 2421 | | 899 | 3026 | 2797 | 2289 |
-| 32768 | 965 | 2837 | 2550 | 2358 | | 920 | 2922 | 2763 | 2240 |
-| 262144 | 738 | 2422 | 1589 | 1708 | | 610 | 2038 | 1436 | 1091 |
-| 1048576 | 528 | 1207 | 845 | 880 | | 606 | 1020 | 669 | 1036 |
-|-----------+------------+------------+------------+------------| |------------+------------+------------+------------|
-
-
-
-Ubuntu 11.04, gcc 4.5, 32-bit, fftw 3.2 on a 2.66 core 2 quad
+```
+gcc -o test_pffft -DHAVE_FFTW -DHAVE_VECLIB -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L/usr/local/lib -I/usr/local/include/ -lfftw3f -framework veclib
+```
+
+| input len |real FFTPack| real vDSP | real FFTW | real PFFFT |cplx FFTPack| cplx vDSP | cplx FFTW | cplx PFFFT |
+|----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
+| 64 | 745 | 2145 | 1706 | 2028 | 961 | 3356 | 3313 | 2300 |
+| 96 | 877 | n/a | 1976 | 1978 | 1059 | n/a | 3333 | 2233 |
+| 128 | 951 | 2808 | 2213 | 2279 | 1202 | 3803 | 3739 | 2494 |
+| 192 | 1002 | n/a | 2456 | 2429 | 1186 | n/a | 3701 | 2508 |
+| 256 | 1065 | 3205 | 2641 | 2793 | 1302 | 4013 | 3912 | 2663 |
+| 384 | 845 | n/a | 2759 | 2499 | 948 | n/a | 3729 | 2504 |
+| 512 | 900 | 3476 | 2956 | 2759 | 974 | 4057 | 3954 | 2645 |
+| 768 | 910 | n/a | 2912 | 2737 | 975 | n/a | 3837 | 2614 |
+| 1024 | 936 | 3583 | 3107 | 3009 | 1006 | 4124 | 3821 | 2697 |
+| 2048 | 1057 | 3585 | 3091 | 2837 | 1089 | 3889 | 3701 | 2513 |
+| 4096 | 1083 | 3524 | 3092 | 2733 | 1039 | 3617 | 3462 | 2364 |
+| 8192 | 874 | 3252 | 2967 | 2363 | 911 | 3106 | 2789 | 2302 |
+| 9216 | 898 | n/a | 2420 | 2290 | 865 | n/a | 2676 | 2204 |
+| 16384 | 903 | 2892 | 2506 | 2421 | 899 | 3026 | 2797 | 2289 |
+| 32768 | 965 | 2837 | 2550 | 2358 | 920 | 2922 | 2763 | 2240 |
+| 262144 | 738 | 2422 | 1589 | 1708 | 610 | 2038 | 1436 | 1091 |
+| 1048576 | 528 | 1207 | 845 | 880 | 606 | 1020 | 669 | 1036 |
+
+
+
+### Ubuntu 11.04, gcc 4.5, 32-bit, fftw 3.2 on a 2.66 core 2 quad
Built with:
-gcc -o test_pffft -DHAVE_FFTW -msse -mfpmath=sse -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L/usr/local/lib -I/usr/local/include/ -lfftw3f -lm
-| input len |real FFTPack| real FFTW | real PFFFT | |cplx FFTPack| cplx FFTW | cplx PFFFT |
-|-----------+------------+------------+------------| |------------+------------+------------|
-| 64 | 1920 | 3614 | 5120 | | 2194 | 7680 | 6467 |
-| 96 | 1873 | 3549 | 5187 | | 2107 | 8429 | 5863 |
-| 128 | 2240 | 3773 | 5514 | | 2560 | 7964 | 6827 |
-| 192 | 1765 | 4569 | 7767 | | 2284 | 9137 | 7061 |
-| 256 | 2048 | 5461 | 7447 | | 2731 | 9638 | 7802 |
-| 384 | 1998 | 5861 | 6762 | | 2313 | 9253 | 7644 |
-| 512 | 2095 | 6144 | 7680 | | 2194 | 10240 | 7089 |
-| 768 | 2230 | 5773 | 7549 | | 2045 | 10331 | 7010 |
-| 1024 | 2133 | 6400 | 8533 | | 2133 | 10779 | 7877 |
-| 2048 | 2011 | 7040 | 8665 | | 1942 | 10240 | 7768 |
-| 4096 | 2194 | 6827 | 8777 | | 1755 | 9452 | 6827 |
-| 8192 | 1849 | 6656 | 6656 | | 1752 | 7831 | 6827 |
-| 9216 | 1871 | 5858 | 6416 | | 1643 | 6909 | 6266 |
-| 16384 | 1883 | 6223 | 6506 | | 1664 | 7340 | 6982 |
-| 32768 | 1826 | 6390 | 6667 | | 1631 | 7481 | 6971 |
-| 262144 | 1546 | 4075 | 5977 | | 1299 | 3415 | 3551 |
-| 1048576 | 1104 | 2071 | 1730 | | 1104 | 1149 | 1834 |
-|-----------+------------+------------+------------| |------------+------------+------------|
-
-
-
-Ubuntu 11.04, gcc 4.5, 32-bit, fftw 3.3 on a 1.6 GHz Atom N270
+```
+gcc -o test_pffft -DHAVE_FFTW -msse -mfpmath=sse -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L/usr/local/lib -I/usr/local/include/ -lfftw3f -lm
+```
+
+| input len |real FFTPack| real FFTW | real PFFFT |cplx FFTPack| cplx FFTW | cplx PFFFT |
+|----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
+| 64 | 1920 | 3614 | 5120 | 2194 | 7680 | 6467 |
+| 96 | 1873 | 3549 | 5187 | 2107 | 8429 | 5863 |
+| 128 | 2240 | 3773 | 5514 | 2560 | 7964 | 6827 |
+| 192 | 1765 | 4569 | 7767 | 2284 | 9137 | 7061 |
+| 256 | 2048 | 5461 | 7447 | 2731 | 9638 | 7802 |
+| 384 | 1998 | 5861 | 6762 | 2313 | 9253 | 7644 |
+| 512 | 2095 | 6144 | 7680 | 2194 | 10240 | 7089 |
+| 768 | 2230 | 5773 | 7549 | 2045 | 10331 | 7010 |
+| 1024 | 2133 | 6400 | 8533 | 2133 | 10779 | 7877 |
+| 2048 | 2011 | 7040 | 8665 | 1942 | 10240 | 7768 |
+| 4096 | 2194 | 6827 | 8777 | 1755 | 9452 | 6827 |
+| 8192 | 1849 | 6656 | 6656 | 1752 | 7831 | 6827 |
+| 9216 | 1871 | 5858 | 6416 | 1643 | 6909 | 6266 |
+| 16384 | 1883 | 6223 | 6506 | 1664 | 7340 | 6982 |
+| 32768 | 1826 | 6390 | 6667 | 1631 | 7481 | 6971 |
+| 262144 | 1546 | 4075 | 5977 | 1299 | 3415 | 3551 |
+| 1048576 | 1104 | 2071 | 1730 | 1104 | 1149 | 1834 |
+
+
+
+### Ubuntu 11.04, gcc 4.5, 32-bit, fftw 3.3 on a 1.6 GHz Atom N270
Built with:
-gcc -o test_pffft -DHAVE_FFTW -msse -mfpmath=sse -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L/usr/local/lib -I/usr/local/include/ -lfftw3f -lm
-| N (input length) | real FFTPack | real FFTW | real PFFFT | | cplx FFTPack | cplx FFTW | cplx PFFFT |
-|------------------+--------------+--------------+--------------| |--------------+--------------+--------------|
-| 64 | 452 | 1041 | 1336 | | 549 | 2318 | 1781 |
-| 96 | 444 | 1297 | 1297 | | 503 | 2408 | 1686 |
-| 128 | 527 | 1525 | 1707 | | 543 | 2655 | 1886 |
-| 192 | 498 | 1653 | 1849 | | 539 | 2678 | 1942 |
-| 256 | 585 | 1862 | 2156 | | 594 | 2777 | 2244 |
-| 384 | 499 | 1870 | 1998 | | 511 | 2586 | 1890 |
-| 512 | 562 | 2095 | 2194 | | 542 | 2973 | 2194 |
-| 768 | 545 | 2045 | 2133 | | 545 | 2365 | 2133 |
-| 1024 | 595 | 2133 | 2438 | | 569 | 2695 | 2179 |
-| 2048 | 587 | 2125 | 2347 | | 521 | 2230 | 1707 |
-| 4096 | 495 | 1890 | 1834 | | 492 | 1876 | 1672 |
-| 8192 | 469 | 1548 | 1729 | | 438 | 1740 | 1664 |
-| 9216 | 468 | 1663 | 1663 | | 446 | 1585 | 1531 |
-| 16384 | 453 | 1608 | 1767 | | 398 | 1476 | 1664 |
-| 32768 | 456 | 1420 | 1503 | | 387 | 1388 | 1345 |
-| 262144 | 309 | 385 | 726 | | 262 | 415 | 840 |
-| 1048576 | 280 | 351 | 739 | | 261 | 313 | 797 |
-|------------------+--------------+--------------+--------------| |--------------+--------------+--------------|
-
-
-
-Windows 7, visual c++ 2010 on a 1.6 GHz Atom N270
+```
+gcc -o test_pffft -DHAVE_FFTW -msse -mfpmath=sse -O3 -Wall -W pffft.c test_pffft.c fftpack.c -L/usr/local/lib -I/usr/local/include/ -lfftw3f -lm
+```
+
+| N (input length) | real FFTPack | real FFTW | real PFFFT | cplx FFTPack | cplx FFTW | cplx PFFFT |
+|-----------------:|-------------:|-------------:|-------------:|-------------:|-------------:|-------------:|
+| 64 | 452 | 1041 | 1336 | 549 | 2318 | 1781 |
+| 96 | 444 | 1297 | 1297 | 503 | 2408 | 1686 |
+| 128 | 527 | 1525 | 1707 | 543 | 2655 | 1886 |
+| 192 | 498 | 1653 | 1849 | 539 | 2678 | 1942 |
+| 256 | 585 | 1862 | 2156 | 594 | 2777 | 2244 |
+| 384 | 499 | 1870 | 1998 | 511 | 2586 | 1890 |
+| 512 | 562 | 2095 | 2194 | 542 | 2973 | 2194 |
+| 768 | 545 | 2045 | 2133 | 545 | 2365 | 2133 |
+| 1024 | 595 | 2133 | 2438 | 569 | 2695 | 2179 |
+| 2048 | 587 | 2125 | 2347 | 521 | 2230 | 1707 |
+| 4096 | 495 | 1890 | 1834 | 492 | 1876 | 1672 |
+| 8192 | 469 | 1548 | 1729 | 438 | 1740 | 1664 |
+| 9216 | 468 | 1663 | 1663 | 446 | 1585 | 1531 |
+| 16384 | 453 | 1608 | 1767 | 398 | 1476 | 1664 |
+| 32768 | 456 | 1420 | 1503 | 387 | 1388 | 1345 |
+| 262144 | 309 | 385 | 726 | 262 | 415 | 840 |
+| 1048576 | 280 | 351 | 739 | 261 | 313 | 797 |
+
+
+
+### Windows 7, visual c++ 2010 on a 1.6 GHz Atom N270
Built with:
+
+```
cl /Ox -D_USE_MATH_DEFINES /arch:SSE test_pffft.c pffft.c fftpack.c
+```
(visual c++ is definitively not very good with SSE intrinsics...)
-| N (input length) | real FFTPack | real PFFFT | | cplx FFTPack | cplx PFFFT |
-|------------------+--------------+--------------| |--------------+--------------|
-| 64 | 173 | 1009 | | 174 | 1159 |
-| 96 | 169 | 1029 | | 188 | 1201 |
-| 128 | 195 | 1242 | | 191 | 1275 |
-| 192 | 178 | 1312 | | 184 | 1276 |
-| 256 | 196 | 1591 | | 186 | 1281 |
-| 384 | 172 | 1409 | | 181 | 1281 |
-| 512 | 187 | 1640 | | 181 | 1313 |
-| 768 | 171 | 1614 | | 176 | 1258 |
-| 1024 | 186 | 1812 | | 178 | 1223 |
-| 2048 | 190 | 1707 | | 186 | 1099 |
-| 4096 | 182 | 1446 | | 177 | 975 |
-| 8192 | 175 | 1345 | | 169 | 1034 |
-| 9216 | 165 | 1271 | | 168 | 1023 |
-| 16384 | 166 | 1396 | | 165 | 949 |
-| 32768 | 172 | 1311 | | 161 | 881 |
-| 262144 | 136 | 632 | | 134 | 629 |
-| 1048576 | 134 | 698 | | 127 | 623 |
-|------------------+--------------+--------------| |--------------+--------------|
-
-
-
-Ubuntu 12.04, gcc-4.7.3, 32-bit, with fftw 3.3.3 (built with --enable-neon), on a 1.2GHz ARM Cortex A9 (Tegra 3)
+| N (input length) | real FFTPack | real PFFFT | cplx FFTPack | cplx PFFFT |
+|-----------------:|-------------:|-------------:|-------------:|-------------:|
+| 64 | 173 | 1009 | 174 | 1159 |
+| 96 | 169 | 1029 | 188 | 1201 |
+| 128 | 195 | 1242 | 191 | 1275 |
+| 192 | 178 | 1312 | 184 | 1276 |
+| 256 | 196 | 1591 | 186 | 1281 |
+| 384 | 172 | 1409 | 181 | 1281 |
+| 512 | 187 | 1640 | 181 | 1313 |
+| 768 | 171 | 1614 | 176 | 1258 |
+| 1024 | 186 | 1812 | 178 | 1223 |
+| 2048 | 190 | 1707 | 186 | 1099 |
+| 4096 | 182 | 1446 | 177 | 975 |
+| 8192 | 175 | 1345 | 169 | 1034 |
+| 9216 | 165 | 1271 | 168 | 1023 |
+| 16384 | 166 | 1396 | 165 | 949 |
+| 32768 | 172 | 1311 | 161 | 881 |
+| 262144 | 136 | 632 | 134 | 629 |
+| 1048576 | 134 | 698 | 127 | 623 |
+
+
+
+### Ubuntu 12.04, gcc-4.7.3, 32-bit, with fftw 3.3.3 (built with --enable-neon), on a 1.2GHz ARM Cortex A9 (Tegra 3)
Built with:
-gcc-4.7 -O3 -DHAVE_FFTW -march=armv7-a -mtune=cortex-a9 -mfloat-abi=hard -mfpu=neon -ffast-math test_pffft.c pffft.c -o test_pffft_arm fftpack.c -lm -I/usr/local/include/ -L/usr/local/lib/ -lfftw3f
-| input len |real FFTPack| real FFTW | real PFFFT | |cplx FFTPack| cplx FFTW | cplx PFFFT |
-|-----------+------------+------------+------------| |------------+------------+------------|
-| 64 | 549 | 452 | 731 | | 512 | 602 | 640 |
-| 96 | 421 | 272 | 702 | | 496 | 571 | 602 |
-| 128 | 498 | 512 | 815 | | 597 | 618 | 652 |
-| 160 | 521 | 536 | 815 | | 586 | 669 | 625 |
-| 192 | 539 | 571 | 883 | | 485 | 597 | 626 |
-| 256 | 640 | 539 | 975 | | 569 | 611 | 671 |
-| 384 | 499 | 610 | 879 | | 499 | 602 | 637 |
-| 480 | 518 | 507 | 877 | | 496 | 661 | 616 |
-| 512 | 524 | 591 | 1002 | | 549 | 678 | 668 |
-| 640 | 542 | 612 | 955 | | 568 | 663 | 645 |
-| 768 | 557 | 613 | 981 | | 491 | 663 | 598 |
-| 800 | 514 | 353 | 882 | | 514 | 360 | 574 |
-| 1024 | 640 | 640 | 1067 | | 492 | 683 | 602 |
-| 2048 | 587 | 640 | 908 | | 486 | 640 | 552 |
-| 2400 | 479 | 368 | 777 | | 422 | 376 | 518 |
-| 4096 | 511 | 614 | 853 | | 426 | 640 | 534 |
-| 8192 | 415 | 584 | 708 | | 386 | 622 | 516 |
-| 9216 | 419 | 571 | 687 | | 364 | 586 | 506 |
-| 16384 | 426 | 577 | 716 | | 398 | 606 | 530 |
-| 32768 | 417 | 572 | 673 | | 399 | 572 | 468 |
-| 262144 | 219 | 380 | 293 | | 255 | 431 | 343 |
-| 1048576 | 202 | 274 | 237 | | 265 | 282 | 355 |
-|-----------+------------+------------+------------| |------------+------------+------------|
+```
+gcc-4.7 -O3 -DHAVE_FFTW -march=armv7-a -mtune=cortex-a9 -mfloat-abi=hard -mfpu=neon -ffast-math test_pffft.c pffft.c -o test_pffft_arm fftpack.c -lm -I/usr/local/include/ -L/usr/local/lib/ -lfftw3f
+```
+
+| input len |real FFTPack| real FFTW | real PFFFT |cplx FFTPack| cplx FFTW | cplx PFFFT |
+|----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
+| 64 | 549 | 452 | 731 | 512 | 602 | 640 |
+| 96 | 421 | 272 | 702 | 496 | 571 | 602 |
+| 128 | 498 | 512 | 815 | 597 | 618 | 652 |
+| 160 | 521 | 536 | 815 | 586 | 669 | 625 |
+| 192 | 539 | 571 | 883 | 485 | 597 | 626 |
+| 256 | 640 | 539 | 975 | 569 | 611 | 671 |
+| 384 | 499 | 610 | 879 | 499 | 602 | 637 |
+| 480 | 518 | 507 | 877 | 496 | 661 | 616 |
+| 512 | 524 | 591 | 1002 | 549 | 678 | 668 |
+| 640 | 542 | 612 | 955 | 568 | 663 | 645 |
+| 768 | 557 | 613 | 981 | 491 | 663 | 598 |
+| 800 | 514 | 353 | 882 | 514 | 360 | 574 |
+| 1024 | 640 | 640 | 1067 | 492 | 683 | 602 |
+| 2048 | 587 | 640 | 908 | 486 | 640 | 552 |
+| 2400 | 479 | 368 | 777 | 422 | 376 | 518 |
+| 4096 | 511 | 614 | 853 | 426 | 640 | 534 |
+| 8192 | 415 | 584 | 708 | 386 | 622 | 516 |
+| 9216 | 419 | 571 | 687 | 364 | 586 | 506 |
+| 16384 | 426 | 577 | 716 | 398 | 606 | 530 |
+| 32768 | 417 | 572 | 673 | 399 | 572 | 468 |
+| 262144 | 219 | 380 | 293 | 255 | 431 | 343 |
+| 1048576 | 202 | 274 | 237 | 265 | 282 | 355 |
Same platform as above, but this time pffft and fftpack are built with clang 3.2:
+```
clang -O3 -DHAVE_FFTW -march=armv7-a -mtune=cortex-a9 -mfloat-abi=hard -mfpu=neon -ffast-math test_pffft.c pffft.c -o test_pffft_arm fftpack.c -lm -I/usr/local/include/ -L/usr/local/lib/ -lfftw3f
-
-| input len |real FFTPack| real FFTW | real PFFFT | |cplx FFTPack| cplx FFTW | cplx PFFFT |
-|-----------+------------+------------+------------| |------------+------------+------------|
-| 64 | 427 | 452 | 853 | | 427 | 602 | 1024 |
-| 96 | 351 | 276 | 843 | | 337 | 571 | 963 |
-| 128 | 373 | 512 | 996 | | 390 | 618 | 1054 |
-| 160 | 426 | 536 | 987 | | 375 | 669 | 914 |
-| 192 | 404 | 571 | 1079 | | 388 | 588 | 1079 |
-| 256 | 465 | 539 | 1205 | | 445 | 602 | 1170 |
-| 384 | 366 | 610 | 1099 | | 343 | 594 | 1099 |
-| 480 | 356 | 507 | 1140 | | 335 | 651 | 931 |
-| 512 | 411 | 591 | 1213 | | 384 | 649 | 1124 |
-| 640 | 398 | 612 | 1193 | | 373 | 654 | 901 |
-| 768 | 409 | 613 | 1227 | | 383 | 663 | 1044 |
-| 800 | 411 | 348 | 1073 | | 353 | 358 | 809 |
-| 1024 | 427 | 640 | 1280 | | 413 | 692 | 1004 |
-| 2048 | 414 | 626 | 1126 | | 371 | 640 | 853 |
-| 2400 | 399 | 373 | 898 | | 319 | 368 | 653 |
-| 4096 | 404 | 602 | 1059 | | 357 | 633 | 778 |
-| 8192 | 332 | 584 | 792 | | 308 | 616 | 716 |
-| 9216 | 322 | 561 | 783 | | 299 | 586 | 687 |
-| 16384 | 344 | 568 | 778 | | 314 | 617 | 745 |
-| 32768 | 342 | 564 | 737 | | 314 | 552 | 629 |
-| 262144 | 201 | 383 | 313 | | 227 | 435 | 413 |
-| 1048576 | 187 | 262 | 251 | | 228 | 281 | 409 |
-|-----------+------------+------------+------------| |------------+------------+------------|
+```
+
+| input len |real FFTPack| real FFTW | real PFFFT |cplx FFTPack| cplx FFTW | cplx PFFFT |
+|----------:|-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|
+| 64 | 427 | 452 | 853 | 427 | 602 | 1024 |
+| 96 | 351 | 276 | 843 | 337 | 571 | 963 |
+| 128 | 373 | 512 | 996 | 390 | 618 | 1054 |
+| 160 | 426 | 536 | 987 | 375 | 669 | 914 |
+| 192 | 404 | 571 | 1079 | 388 | 588 | 1079 |
+| 256 | 465 | 539 | 1205 | 445 | 602 | 1170 |
+| 384 | 366 | 610 | 1099 | 343 | 594 | 1099 |
+| 480 | 356 | 507 | 1140 | 335 | 651 | 931 |
+| 512 | 411 | 591 | 1213 | 384 | 649 | 1124 |
+| 640 | 398 | 612 | 1193 | 373 | 654 | 901 |
+| 768 | 409 | 613 | 1227 | 383 | 663 | 1044 |
+| 800 | 411 | 348 | 1073 | 353 | 358 | 809 |
+| 1024 | 427 | 640 | 1280 | 413 | 692 | 1004 |
+| 2048 | 414 | 626 | 1126 | 371 | 640 | 853 |
+| 2400 | 399 | 373 | 898 | 319 | 368 | 653 |
+| 4096 | 404 | 602 | 1059 | 357 | 633 | 778 |
+| 8192 | 332 | 584 | 792 | 308 | 616 | 716 |
+| 9216 | 322 | 561 | 783 | 299 | 586 | 687 |
+| 16384 | 344 | 568 | 778 | 314 | 617 | 745 |
+| 32768 | 342 | 564 | 737 | 314 | 552 | 629 |
+| 262144 | 201 | 383 | 313 | 227 | 435 | 413 |
+| 1048576 | 187 | 262 | 251 | 228 | 281 | 409 |
So it looks like, on ARM, gcc 4.7 is the best at scalar floating point
(the fftpack performance numbers are better with gcc), while clang is
@@ -334,36 +347,38 @@ the best with neon intrinsics (see how pffft perf has improved with
clang 3.2).
-NVIDIA Jetson TK1 board, gcc-4.8.2. The cpu is a 2.3GHz cortex A15 (Tegra K1).
+### NVIDIA Jetson TK1 board, gcc-4.8.2. The cpu is a 2.3GHz cortex A15 (Tegra K1).
Built with:
-gcc -O3 -march=armv7-a -mtune=native -mfloat-abi=hard -mfpu=neon -ffast-math test_pffft.c pffft.c -o test_pffft_arm fftpack.c -lm
-| input len |real FFTPack| real PFFFT | |cplx FFTPack| cplx PFFFT |
-|-----------+------------+------------| |------------+------------|
-| 64 | 1735 | 3308 | | 1994 | 3744 |
-| 96 | 1596 | 3448 | | 1987 | 3572 |
-| 128 | 1807 | 4076 | | 2255 | 3960 |
-| 160 | 1769 | 4083 | | 2071 | 3845 |
-| 192 | 1990 | 4233 | | 2017 | 3939 |
-| 256 | 2191 | 4882 | | 2254 | 4346 |
-| 384 | 1878 | 4492 | | 2073 | 4012 |
-| 480 | 1748 | 4398 | | 1923 | 3951 |
-| 512 | 2030 | 5064 | | 2267 | 4195 |
-| 640 | 1918 | 4756 | | 2094 | 4184 |
-| 768 | 2099 | 4907 | | 2048 | 4297 |
-| 800 | 1822 | 4555 | | 1880 | 4063 |
-| 1024 | 2232 | 5355 | | 2187 | 4420 |
-| 2048 | 2176 | 4983 | | 2027 | 3602 |
-| 2400 | 1741 | 4256 | | 1710 | 3344 |
-| 4096 | 1816 | 3914 | | 1851 | 3349 |
-| 8192 | 1716 | 3481 | | 1700 | 3255 |
-| 9216 | 1735 | 3589 | | 1653 | 3094 |
-| 16384 | 1567 | 3483 | | 1637 | 3244 |
-| 32768 | 1624 | 3240 | | 1655 | 3156 |
-| 262144 | 1012 | 1898 | | 983 | 1503 |
-| 1048576 | 876 | 1154 | | 868 | 1341 |
-|-----------+------------+------------| |------------+------------|
+```
+gcc -O3 -march=armv7-a -mtune=native -mfloat-abi=hard -mfpu=neon -ffast-math test_pffft.c pffft.c -o test_pffft_arm fftpack.c -lm
+```
+
+| input len |real FFTPack| real PFFFT |cplx FFTPack| cplx PFFFT |
+|----------:|-----------:|-----------:|-----------:|-----------:|
+| 64 | 1735 | 3308 | 1994 | 3744 |
+| 96 | 1596 | 3448 | 1987 | 3572 |
+| 128 | 1807 | 4076 | 2255 | 3960 |
+| 160 | 1769 | 4083 | 2071 | 3845 |
+| 192 | 1990 | 4233 | 2017 | 3939 |
+| 256 | 2191 | 4882 | 2254 | 4346 |
+| 384 | 1878 | 4492 | 2073 | 4012 |
+| 480 | 1748 | 4398 | 1923 | 3951 |
+| 512 | 2030 | 5064 | 2267 | 4195 |
+| 640 | 1918 | 4756 | 2094 | 4184 |
+| 768 | 2099 | 4907 | 2048 | 4297 |
+| 800 | 1822 | 4555 | 1880 | 4063 |
+| 1024 | 2232 | 5355 | 2187 | 4420 |
+| 2048 | 2176 | 4983 | 2027 | 3602 |
+| 2400 | 1741 | 4256 | 1710 | 3344 |
+| 4096 | 1816 | 3914 | 1851 | 3349 |
+| 8192 | 1716 | 3481 | 1700 | 3255 |
+| 9216 | 1735 | 3589 | 1653 | 3094 |
+| 16384 | 1567 | 3483 | 1637 | 3244 |
+| 32768 | 1624 | 3240 | 1655 | 3156 |
+| 262144 | 1012 | 1898 | 983 | 1503 |
+| 1048576 | 876 | 1154 | 868 | 1341 |
The performance on the tegra K1 is pretty impressive. I'm not
including the FFTW numbers as they as slightly below the scalar