Diffstat (limited to 'FAQ.md')
-rw-r--r-- FAQ.md 227
1 files changed, 227 insertions, 0 deletions
diff --git a/FAQ.md b/FAQ.md
new file mode 100644
index 0000000..11117d3
--- /dev/null
+++ b/FAQ.md
@@ -0,0 +1,227 @@
+# Rayon FAQ
+
+This file is for general questions that don't fit into the README or
+crate docs.
+
+## How many threads will Rayon spawn?
+
+By default, Rayon uses the same number of threads as the number of
+CPUs available. Note that on systems with hyperthreading enabled this
+equals the number of logical cores and not the physical ones.
+
+If you want to alter the number of threads spawned, you can set the
+environment variable `RAYON_NUM_THREADS` to the desired number of
+threads or use the
+[`ThreadPoolBuilder::build_global`](https://docs.rs/rayon/*/rayon/struct.ThreadPoolBuilder.html#method.build_global)
+method.
+
+## How does Rayon balance work between threads?
+
+Behind the scenes, Rayon uses a technique called **work stealing** to
+try and dynamically ascertain how much parallelism is available and
+exploit it. The idea is very simple: we always have a pool of worker
+threads available, waiting for some work to do. When you call `join`
+the first time, we shift over into that pool of threads. But if you
+call `join(a, b)` from a worker thread W, then W will place `b` into
+its work queue, advertising that this is work that other worker
+threads might help out with. W will then start executing `a`.
+
+While W is busy with `a`, other threads might come along and take `b`
+from its queue. That is called *stealing* `b`. Once `a` is done, W
+checks whether `b` was stolen by another thread and, if not, executes
+`b` itself. If W runs out of jobs in its own queue, it will look
+through the other threads' queues and try to steal work from them.
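
To make the bookkeeping concrete, here is a toy, single-threaded model of what happens to `b` during `join(a, b)`. It is only a sketch: `WorkerQueue` and `join_model` are hypothetical names invented for this illustration, and a plain `VecDeque` stands in for the real concurrent queue, so this is not Rayon's actual implementation.

```rust
use std::collections::VecDeque;

/// Toy model of one worker's job queue (hypothetical, not Rayon's real type).
struct WorkerQueue {
    jobs: VecDeque<&'static str>,
}

impl WorkerQueue {
    fn new() -> Self {
        WorkerQueue { jobs: VecDeque::new() }
    }

    /// The owner advertises a job that other workers may take.
    fn push(&mut self, job: &'static str) {
        self.jobs.push_back(job);
    }

    /// The owner takes back its most recently pushed job, if it is still there.
    fn pop(&mut self) -> Option<&'static str> {
        self.jobs.pop_back()
    }

    /// Another worker takes a job from the opposite end of the queue.
    fn steal(&mut self) -> Option<&'static str> {
        self.jobs.pop_front()
    }
}

/// Models `join(a, b)` on worker W and reports who ended up running `b`.
fn join_model(thief_arrives: bool) -> &'static str {
    let mut w = WorkerQueue::new();
    w.push("b"); // W advertises `b` ...
    // ... and starts executing `a`. Meanwhile a thief may come along.
    let stolen = if thief_arrives { w.steal() } else { None };
    // `a` is done: W checks whether `b` is still in its queue.
    match w.pop() {
        Some(_) => "W", // not stolen, so W executes `b` itself
        None => {
            assert!(stolen.is_some());
            "thief"
        }
    }
}
```

Calling `join_model(false)` models the case where nobody steals `b`, and `join_model(true)` the case where another thread takes it while W is busy with `a`.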
+
+This technique is not new. It was first introduced by the
+[Cilk project][cilk], done at MIT in the late nineties. The name Rayon
+is an homage to that work.
+
+[cilk]: http://supertech.csail.mit.edu/cilk/
+
+## What should I do if I use `Rc`, `Cell`, `RefCell` or other non-Send-and-Sync types?
+
+There are a number of non-threadsafe types in the Rust standard library,
+and if your code is using them, you will not be able to combine it
+with Rayon. Similarly, even if you don't have such types, but you try
+to have multiple closures mutating the same state, you will get
+compilation errors; for example, this function won't work, because
+both closures access `slice`:
+
+```rust
+/// Increment all values in slice.
+fn increment_all(slice: &mut [i32]) {
+    rayon::join(|| process(slice), || process(slice));
+}
+```
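
When the two closures can work on disjoint parts of the data, one possible fix is to split the slice first so that each closure gets its own half. The sketch below makes two assumptions: `process` is a stand-in that adds one to every element, and `std::thread::scope` is used in place of `rayon::join` so that the example compiles with the standard library alone.

```rust
use std::thread;

/// Stand-in for `process` (an assumption): adds one to every element.
fn process(slice: &mut [i32]) {
    for v in slice.iter_mut() {
        *v += 1;
    }
}

/// Increment all values in slice, giving each closure a disjoint half.
fn increment_all(slice: &mut [i32]) {
    let mid = slice.len() / 2;
    // `split_at_mut` yields two non-overlapping mutable borrows.
    let (left, right) = slice.split_at_mut(mid);
    // With Rayon this would be:
    //     rayon::join(|| process(left), || process(right));
    thread::scope(|s| {
        s.spawn(|| process(left));
        process(right);
    });
}
```

Because `left` and `right` never overlap, the borrow checker can see that the two closures do not touch the same state.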
+
+The correct way to resolve such errors will depend on the case. Some
+cases are easy: for example, uses of [`Rc`] can typically be replaced
+with [`Arc`], which is basically equivalent, but thread-safe.
+
+Code that uses `Cell` or `RefCell`, however, can be somewhat more complicated.
+If you can refactor your code to avoid those types, that is often the best way
+forward, but otherwise, you can try to replace those types with their threadsafe
+equivalents:
+
+- `Cell` -- replacement: `AtomicUsize`, `AtomicBool`, etc
+- `RefCell` -- replacement: `RwLock`, or perhaps `Mutex`
+
+However, you have to be wary! The parallel versions of these types
+have different atomicity guarantees. For example, with a `Cell`, you
+can increment a counter like so:
+
+```rust
+let value = counter.get();
+counter.set(value + 1);
+```
+
+But when you use the equivalent `AtomicUsize` methods, you are
+actually introducing a potential race condition (not a data race,
+technically, but it can be an awfully fine distinction):
+
+```rust
+let value = tscounter.load(Ordering::SeqCst);
+tscounter.store(value + 1, Ordering::SeqCst);
+```
+
+You can already see that the `AtomicUsize` API is a bit more complex,
+as it requires you to specify an
+[ordering](http://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html). (I
+won't go into the details on ordering here, but suffice it to say that if
+you don't know what an ordering is, and probably even if you do, you
+should use `Ordering::SeqCst`.) The danger in this parallel version of
+the counter is that other threads might be running at the same time
+and they could cause our counter to get out of sync. For example, if
+we have two threads, then they might both execute the "load" before
+either has a chance to execute the "store":
+
+```
+Thread 1                                          Thread 2
+let value = tscounter.load(Ordering::SeqCst);
+// value = X                                      let value = tscounter.load(Ordering::SeqCst);
+                                                  // value = X
+tscounter.store(value+1);                         tscounter.store(value+1);
+// tscounter = X+1                                // tscounter = X+1
+```
+
+Now even though we've had two increments, we'll only increase the
+counter by one! Even though we've got no data race, this is still
+probably not the result we wanted. The problem here is that the `Cell`
+API doesn't make clear the scope of a "transaction" -- that is, the
+set of reads/writes that should occur atomically. In this case, we
+probably wanted the get/set to occur together.
+
+In fact, when using the `Atomic` types, you very rarely want a plain
+`load` or plain `store`. You probably want the more complex
+operations. A counter, for example, would use `fetch_add` to
+atomically load and increment the value in one step. Compare-and-swap
+is another popular building block.
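
As a minimal sketch of the `fetch_add` version of the counter, the function below has several threads bump one shared `AtomicUsize`. The name `count_in_parallel` and the use of `std::thread::scope` (rather than Rayon) are assumptions made so the example is self-contained.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Spawns `threads` threads that each bump a shared counter `per_thread` times.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let tscounter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Load and increment happen as one indivisible step,
                    // so no increment can be lost between threads.
                    tscounter.fetch_add(1, Ordering::SeqCst);
                }
            });
        }
    });
    // All scoped threads have finished by the time `scope` returns.
    tscounter.load(Ordering::SeqCst)
}
```

With `fetch_add`, `count_in_parallel(4, 1000)` always yields 4000, whereas the separate `load` and `store` shown earlier could come up short.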
+
+A similar problem can arise when converting `RefCell` to `RwLock`, but
+it is somewhat less likely, because the `RefCell` API does in fact
+have a notion of a transaction: the scope of the handle returned by
+`borrow` or `borrow_mut`. So if you convert each call to `borrow` to
+`read` (and `borrow_mut` to `write`), things will mostly work fine in
+a parallel setting, but there can still be changes in behavior.
+Consider using a `handle: RefCell<Vec<i32>>` like:
+
+```rust
+let len = handle.borrow().len();
+for i in 0 .. len {
+    let data = handle.borrow()[i];
+    println!("{}", data);
+}
+```
+
+In sequential code, we know that this loop is safe. But if we convert
+this to parallel code with an `RwLock`, we do not: this is because
+another thread could come along and do
+`handle.write().unwrap().pop()`, and thus change the length of the
+vector. In fact, even in *sequential* code, using very small borrow
+sections like this is an anti-pattern: you ought to be enclosing the
+entire transaction together, like so:
+
+```rust
+let vec = handle.borrow();
+let len = vec.len();
+for i in 0 .. len {
+    let data = vec[i];
+    println!("{}", data);
+}
+```
+
+Or, even better, using an iterator instead of indexing:
+
+```rust
+let vec = handle.borrow();
+for data in vec.iter() {
+    println!("{}", data);
+}
+```
+
+There are several reasons to prefer one borrow over many. The most
+obvious is that it is more efficient, since each borrow has to perform
+some safety checks. But it's also more reliable: suppose we modified
+the loop above to not just print things out, but also call into a
+helper function:
+
+```rust
+let vec = handle.borrow();
+for data in vec.iter() {
+    helper(...);
+}
+```
+
+And now suppose, independently, this helper fn evolved and had to pop
+something off of the vector:
+
+```rust
+fn helper(...) {
+    handle.borrow_mut().pop();
+}
+```
+
+Under the old model, where we did lots of small borrows, this would
+yield precisely the same error that we saw in parallel land using an
+`RwLock`: the length would be out of sync and our indexing would fail
+(note that in neither case would there be an actual *data race* and
+hence there would never be undefined behavior). But now that we use a
+single borrow, we'll see a borrow error instead, which is much easier
+to diagnose, since it occurs at the point of the `borrow_mut`, rather
+than downstream. Similarly, if we move to an `RwLock`, we'll find that
+the code either deadlocks (if the write is on the same thread as the
+read) or, if the write is on another thread, works just fine. Both of
+these are preferable to random failures in my experience.
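
Here is a rough sketch of the single-borrow pattern carried over to an `RwLock`, with the write happening on another thread. The names `read_all` and `demo` are hypothetical, and `std::thread::scope` is used instead of Rayon to keep the example self-contained.

```rust
use std::sync::RwLock;
use std::thread;

/// Reads the whole vector under one read guard, so the length cannot
/// change in the middle of the loop.
fn read_all(handle: &RwLock<Vec<i32>>) -> Vec<i32> {
    let vec = handle.read().unwrap(); // one guard for the whole transaction
    let mut seen = Vec::with_capacity(vec.len());
    for data in vec.iter() {
        seen.push(*data);
    }
    seen
    // The guard is dropped here; only then can a writer proceed.
}

/// Runs the reader while another thread pops; returns what the reader saw.
fn demo() -> Vec<i32> {
    let handle = RwLock::new(vec![1, 2, 3, 4]);
    thread::scope(|s| {
        s.spawn(|| {
            handle.write().unwrap().pop();
        });
        read_all(&handle)
    })
}
```

Depending on which thread gets the lock first, the reader sees either all four elements or only the first three, but never a vector that shrinks halfway through its loop.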
+
+## But wait, isn't Rust supposed to free me from this kind of thinking?
+
+You might think that Rust is supposed to mean that you don't have to
+think about atomicity at all. In fact, if you avoid interior
+mutability (`Cell` and `RefCell` in a sequential setting, or
+`AtomicUsize`, `RwLock`, `Mutex`, et al. in parallel code), then this
+is true: the type system will basically guarantee that you don't have
+to think about atomicity at all. But often there are times when you
+WANT threads to interleave in the ways I showed above.
+
+Consider for example when you are conducting a search in parallel, say
+to find the shortest route. To avoid fruitless search, you might want
+to keep a cell with the shortest route you've found thus far. This
+way, when you are searching down some path that's already longer than
+this shortest route, you can just stop and avoid wasted effort. In
+sequential land, you might model this "best result" as a shared value
+like `Rc<Cell<usize>>` (here the `usize` represents the length of the
+best path found so far); in parallel land, you'd use an `Arc<AtomicUsize>`.
+Now we can make our search function look like:
+
+```rust
+fn search(path: &Path, cost_so_far: usize, best_cost: &Arc<AtomicUsize>) {
+    if cost_so_far >= best_cost.load(Ordering::SeqCst) {
+        return;
+    }
+    ...
+    best_cost.store(...);
+}
+```
+
+Now in this case, we really WANT to see results from other threads
+interjected into our execution!
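
One detail to watch when recording a new best cost: a plain `store` after the `load` check is the same load-then-store pattern as the counter earlier, so a slower thread could overwrite a better result. As a hedged sketch of one way around that, `AtomicUsize::fetch_min` keeps the smaller of the old and new values in a single atomic step. The helper names `should_prune`, `record_route` and `best_of` are hypothetical, the search itself is reduced to a list of candidate costs, and a plain `&AtomicUsize` with `std::thread::scope` replaces `&Arc<AtomicUsize>` and Rayon for brevity.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// True if a path costing `cost_so_far` can no longer beat the best known route.
fn should_prune(cost_so_far: usize, best_cost: &AtomicUsize) -> bool {
    cost_so_far >= best_cost.load(Ordering::SeqCst)
}

/// Records a finished route, keeping the smaller of the old and new cost
/// in one atomic step so a better result is never overwritten.
fn record_route(cost: usize, best_cost: &AtomicUsize) {
    best_cost.fetch_min(cost, Ordering::SeqCst);
}

/// Hypothetical driver: each thread reports one candidate route cost.
fn best_of(costs: &[usize]) -> usize {
    let best_cost = AtomicUsize::new(usize::MAX);
    thread::scope(|s| {
        for &cost in costs {
            let best_cost = &best_cost;
            s.spawn(move || {
                // Results from other threads are visible here, which is
                // exactly the interleaving we want for pruning.
                if !should_prune(cost, best_cost) {
                    record_route(cost, best_cost);
                }
            });
        }
    });
    best_cost.load(Ordering::SeqCst)
}
```

Whatever order the threads run in, `best_of(&[7, 3, 9, 5])` ends with the smallest cost recorded.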