Age | Commit message | Author | |
---|---|---|---|
2020-12-05 | Implement 6D parallelization with 1D and no tiling | Marat Dukhan | |
2020-12-05 | Use __STDC_NO_ATOMICS__ to detect C11 compilers without stdatomic.h | Marat Dukhan | |
Replace MSVC-specific check from #10 | |||
2020-12-05 | Support pre-C11 GCC intrinsics for atomics | Marat Dukhan | |
2020-10-05 | Fix MSVC build (#10) | peterjc123 | |
Fix MSVC build | |||
2020-05-26 | Use cpuinfo_get_current_uarch_index_with_default for parallelization with uarch index | Marat Dukhan | |
2020-05-26 | 3D/4D/5D parallelization functions with 1D or no tiling | Marat Dukhan | |
2020-05-16 | Guard against generating ARM yield instruction for unsupporting processors | Marat Dukhan | |
2020-05-08 | Reorder C11 atomics before MSVC x64 atomics | Marat Dukhan | |
clang-cl, which supports both, should prefer C11 atomics | |||
2020-05-08 | Use platform-specific yield/pause instructions | Marat Dukhan | |
2020-05-07 | MSVC-compatible FPU state functions | Marat Dukhan | |
2020-05-07 | Thumb-1 compatible assembly for disable_fpu_denormals | Marat Dukhan | |
2020-05-04 | Avoid including stdatomic.h in any WAsm builds | Marat Dukhan | |
2020-05-02 | Fast path using atomic decrement instead of atomic compare-and-swap | Marat Dukhan | |
50% higher throughput on x86 (disabled on other platforms) | |||
2020-04-22 | Reorder C11 atomics before MSVC atomics | Marat Dukhan | |
clang-cl, which supports both, should prefer C11 atomics | |||
2020-04-16 | Recognize Cygwin as Windows | Marat Dukhan | |
2020-04-14 | Use load-acquire + store-release on synchronization variables | Marat Dukhan | |
Synchronization using relaxed atomics + fences instead of LA/SR violates the C11/C++11 memory model and causes failures under thread sanitizer | |||
2020-04-10 | Support Windows on ARM/ARM64 | Marat Dukhan | |
2020-04-10 | Replace atomic fetch_sub with decrement_fetch primitive | Marat Dukhan | |
Decrement-fetch is a closer match to the primitive used in implementation | |||
2020-04-10 | Add compiler barriers to MSVC atomics implementation | Marat Dukhan | |
2020-04-10 | Fix race condition in Windows implementation | Marat Dukhan | |
The command event for the next command must be reset before write-release of the new command, because as soon as the worker threads observe the new command, they may complete it and switch to waiting on the next command event | |||
2020-04-10 | Rewrite work spreading between threads | Marat Dukhan | |
Avoid word x word -> doubleword multiplication; avoid doubleword / word -> word division; replace remaining division with multiplication via FXdiv; improve portability through removal of platform-dependent multiply_divide function | |||
2020-04-10 | Direct implementation pthreadpool_try_decrement_relaxed_size_t | Marat Dukhan | |
Replace implementation of pthreadpool_try_decrement_relaxed_size_t on top of emulated pthreadpool_compare_exchange_weak_relaxed_size_t with a direct implementation using platform intrinsics | |||
2020-04-10 | Return static thread pool pointer in shim implementation | Marat Dukhan | |
Makes pthreadpool tests pass in WebAssembly builds | |||
2020-04-07 | Minor fixes in Windows implementation | Marat Dukhan | |
2020-04-07 | Windows implementation using Events | Marat Dukhan | |
2020-04-05 | Fix erroneous narrowing in pthreadpool_fetch_sub_relaxed_size_t | Marat Dukhan | |
2020-04-05 | Optimized pthreadpool_parallelize_* functions | Marat Dukhan | |
Eliminate function call and division per each processed item in the multi-threaded case | |||
2020-04-01 | Implementation using Grand Central Dispatch | Marat Dukhan | |
2020-04-01 | Refactor pthreadpool implementation | Marat Dukhan | |
Split implementation into two types of components: components dependent on threading API, and portable components | |||
2020-04-01 | Remove unused per-thread wakeup_condvar | Marat Dukhan | |
2020-03-26 | Microarchitecture-aware parallelization functions | Marat Dukhan | |
2020-03-26 | Refactor multi-threaded case of parallelization functions | Marat Dukhan | |
Extract multi-threaded setup logic into a generalized pthreadpool_parallelize function; call into pthreadpool_parallelize directly from tiled and 2+-dimensional functions | |||
2020-03-23 | Implement atomic_decrement with LL-SC on ARM/ARM64 | Marat Dukhan | |
2020-03-23 | Minor refactoring in pthreadpool_destroy | Marat Dukhan | |
2020-03-23 | Fix race conditions in non-futex implementation | Marat Dukhan | |
2020-03-23 | Futex-based WebAssembly+Threads implementation | Marat Dukhan | |
2020-03-23 | Support WebAssembly+Threads build | Marat Dukhan | |
Abstract away atomic operations and data type from the source file; polyfill atomic operations for Clang targeting WAsm+Threads; set Emscripten link options for WebAssembly+Threads builds | |||
2020-03-23 | Remove redundant barriers | Marat Dukhan | |
2020-03-23 | Simplify parallel task initialization | Marat Dukhan | |
2020-03-23 | Avoid spinning thread-pool when task has only one item | Marat Dukhan | |
2020-03-05 | Remove Native Client support | Marat Dukhan | |
2020-03-05 | PTHREADPOOL_FLAG_YIELD_WORKERS flag to bypass spin-wait | Marat Dukhan | |
Makes it possible to signal the last operation in a sequence of computations, so pthreadpool workers don't spin in vain. | |||
2020-03-05 | Minor cleanup | Marat Dukhan | |
2020-03-01 | Build on Windows/mingw64 (#6) | mattn | |
Support Windows/mingw64 build | |||
2019-10-19 | Switch to C11 atomics for synchronization | Marat Dukhan | |
2019-10-08 | Make inline assembly compatible with old toolchain | Marat Dukhan | |
Fix #4 | |||
2019-09-30 | Fix typo in comment | Marat Dukhan | |
2019-09-30 | Enable spin-wait in the main thread | Marat Dukhan | |
2019-09-30 | New pthreadpool_parallelize_* API | Marat Dukhan | |
2019-09-30 | Enable spin-wait in worker threads | Marat Dukhan | |
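
Two entries in this log ("Direct implementation pthreadpool_try_decrement_relaxed_size_t" and "Use load-acquire + store-release on synchronization variables") name techniques that may be easier to follow with a small illustration. The following is a minimal sketch in portable C11, not the code from these commits. The names `try_decrement_relaxed`, `publish`, `consume`, `command`, and `payload` are hypothetical and chosen only for this example, and the sketch uses a generic compare-and-swap loop rather than the platform intrinsics the commit message mentions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decrement *value unless it is already zero.
 * Returns true if a decrement happened, false if the counter was zero. */
static bool try_decrement_relaxed(atomic_size_t* value) {
    size_t current = atomic_load_explicit(value, memory_order_relaxed);
    while (current != 0) {
        /* On failure, `current` is refreshed with the value actually observed,
         * so the loop re-checks it against zero before retrying. */
        if (atomic_compare_exchange_weak_explicit(
                value, &current, current - 1,
                memory_order_relaxed, memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

/* Hypothetical synchronization variable and the plain data it guards. */
static atomic_uint command;
static size_t payload;

/* Writer: fill in the data first, then publish it with a store-release. */
static void publish(size_t data, unsigned new_command) {
    payload = data;
    atomic_store_explicit(&command, new_command, memory_order_release);
}

/* Reader: a load-acquire that observes the published command also
 * makes the earlier write to `payload` visible. */
static bool consume(unsigned expected_command, size_t* out) {
    if (atomic_load_explicit(&command, memory_order_acquire) != expected_command) {
        return false;
    }
    *out = payload;
    return true;
}
```

In this sketch the release store and acquire load are paired on the same atomic variable, which is the pattern the C11 memory model defines as establishing a happens-before relationship between the writer and the reader.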