Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 126 |
1 file changed, 83 insertions(+), 43 deletions(-)
@@ -1,7 +1,6 @@
-[base64](https://crates.io/crates/base64)
-===
+# [base64](https://crates.io/crates/base64)
-[![](https://img.shields.io/crates/v/base64.svg)](https://crates.io/crates/base64) [![Docs](https://docs.rs/base64/badge.svg)](https://docs.rs/base64) [![Build](https://travis-ci.org/marshallpierce/rust-base64.svg?branch=master)](https://travis-ci.org/marshallpierce/rust-base64) [![codecov](https://codecov.io/gh/marshallpierce/rust-base64/branch/master/graph/badge.svg)](https://codecov.io/gh/marshallpierce/rust-base64) [![unsafe forbidden](https://img.shields.io/badge/unsafe-forbidden-success.svg)](https://github.com/rust-secure-code/safety-dance/)
+[![](https://img.shields.io/crates/v/base64.svg)](https://crates.io/crates/base64) [![Docs](https://docs.rs/base64/badge.svg)](https://docs.rs/base64) [![CircleCI](https://circleci.com/gh/marshallpierce/rust-base64/tree/master.svg?style=shield)](https://circleci.com/gh/marshallpierce/rust-base64/tree/master) [![codecov](https://codecov.io/gh/marshallpierce/rust-base64/branch/master/graph/badge.svg)](https://codecov.io/gh/marshallpierce/rust-base64) [![unsafe forbidden](https://img.shields.io/badge/unsafe-forbidden-success.svg)](https://github.com/rust-secure-code/safety-dance/)
 
 <a href="https://www.jetbrains.com/?from=rust-base64"><img src="/icon_CLion.svg" height="40px"/></a>
 
@@ -9,58 +8,98 @@ Made with CLion. Thanks to JetBrains for supporting open source!
 It's base64. What more could anyone want?
 
-This library's goals are to be *correct* and *fast*. It's thoroughly tested and widely used. It exposes functionality at multiple levels of abstraction so you can choose the level of convenience vs performance that you want, e.g. `decode_config_slice` decodes into an existing `&mut [u8]` and is pretty fast (2.6 GiB/s for a 3 KiB input), whereas `decode_config` allocates a new `Vec<u8>` and returns it, which might be more convenient in some cases, but is slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
+This library's goals are to be *correct* and *fast*. It's thoroughly tested and widely used. It exposes functionality at
+multiple levels of abstraction so you can choose the level of convenience vs performance that you want,
+e.g. `decode_engine_slice` decodes into an existing `&mut [u8]` and is pretty fast (2.6 GiB/s for a 3 KiB input),
+whereas `decode_engine` allocates a new `Vec<u8>` and returns it, which might be more convenient in some cases, but is
+slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
-Example
----
+See the [docs](https://docs.rs/base64) for all the details.
-```rust
-extern crate base64;
+## FAQ
-use base64::{encode, decode};
+### I need to decode base64 with whitespace/null bytes/other random things interspersed in it. What should I do?
-fn main() {
-    let a = b"hello world";
-    let b = "aGVsbG8gd29ybGQ=";
+Remove non-base64 characters from your input before decoding.
-    assert_eq!(encode(a), b);
-    assert_eq!(a, &decode(b).unwrap()[..]);
-}
-```
+If you have a `Vec` of base64, [retain](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain) can be used to
+strip out whatever you need removed.
-See the [docs](https://docs.rs/base64) for all the details.
+If you have a `Read` (e.g. reading a file or network socket), there are various approaches.
-Rust version compatibility
----
+- Use [iter_read](https://crates.io/crates/iter-read) together with `Read`'s `bytes()` to filter out unwanted bytes.
+- Implement `Read` with a `read()` impl that delegates to your actual `Read`, and then drops any bytes you don't want.
-The minimum required Rust version is 1.34.0.
+### I need to line-wrap base64, e.g. for MIME/PEM.
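The FAQ text added above recommends `Vec::retain` for stripping interspersed junk before decoding. A minimal std-only sketch of that step might look like the following (the function names are illustrative, not this crate's API, and the actual decode call is omitted so the example stands alone; the predicate covers only the standard alphabet, not the url-safe one):

```rust
// Sketch: strip whitespace/null bytes/other junk from base64 input
// before handing it to a decoder. Std-only; names are illustrative.

// Accepts bytes from the standard base64 alphabet plus '=' padding.
fn is_base64_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'+' || b == b'/' || b == b'='
}

// retain keeps only the bytes the predicate accepts, in place,
// preserving their relative order.
fn strip_non_base64(mut input: Vec<u8>) -> Vec<u8> {
    input.retain(|&b| is_base64_byte(b));
    input
}

fn main() {
    let noisy = b"aGVs\nbG8g\0d29y bGQ=".to_vec();
    assert_eq!(strip_non_base64(noisy), b"aGVsbG8gd29ybGQ=".to_vec());
}
```

The same predicate can be reused with `iter_read` or a filtering `Read` wrapper for the streaming cases mentioned above.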
-Developing
----
+[line-wrap](https://crates.io/crates/line-wrap) does just that.
-Benchmarks are in `benches/`. Running them requires nightly rust, but `rustup` makes it easy:
+### I want canonical base64 encoding/decoding.
-```bash
-rustup run nightly cargo bench
-```
+First, don't do this. You should no more expect base64 to be canonical than you should expect compression algorithms to
+produce canonical output across all usage in the wild (hint: they don't).
+However, [people are drawn to their own destruction like moths to a flame](https://eprint.iacr.org/2022/361), so here we
+are.
+
+There are two opportunities for non-canonical encoding (and thus, detection of the same during decoding): the final bits
+of the last encoded token in two- or three-token suffixes, and the `=` token used to inflate the suffix to a full four
+tokens.
+
+The trailing-bits issue is unavoidable: with 6 bits available in each encoded token, 1 input byte takes 2 tokens,
+with the second one having some bits unused. Same for two input bytes: 16 bits, but 3 tokens have 18 bits. Unless we
+decide to stop shipping whole bytes around, we're stuck with those extra bits that a sneaky or buggy encoder might set
+to 1 instead of 0.
+
+The `=` pad bytes, on the other hand, are entirely a self-own by the base64 standard. They do not affect decoding other
+than to provide an opportunity to say "that padding is incorrect". Exabytes of storage and transfer have no doubt been
+wasted on pointless `=` bytes. Somehow we all seem to be quite comfortable with, say, hex-encoded data just stopping
+when it's done rather than requiring a confirmation that the author of the encoder could count to four. Anyway, there
+are two ways to make pad bytes predictable: require canonical padding to the next multiple of four bytes as per the RFC,
+or, if you control all producers and consumers, save a few bytes by requiring no padding (especially applicable to the
+url-safe alphabet).
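The trailing-bits argument above can be made concrete with a small std-only check. This is a sketch, not this crate's API: `decode_val` is a hypothetical lookup for the standard alphabet, and the function only inspects the unpadded final token group.

```rust
// Sketch: detect non-canonical trailing bits in the final token group.
// Std-only; decode_val is a hypothetical standard-alphabet lookup.
fn decode_val(c: u8) -> Option<u8> {
    match c {
        b'A'..=b'Z' => Some(c - b'A'),
        b'a'..=b'z' => Some(c - b'a' + 26),
        b'0'..=b'9' => Some(c - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// True if the unpadded suffix `tokens` (the last 2 or 3 base64 chars)
/// has all of its unused trailing bits set to zero.
fn trailing_bits_canonical(tokens: &[u8]) -> bool {
    match tokens.len() {
        // 2 tokens encode 1 byte: the second token carries only 2 used
        // bits, so its low 4 bits must be zero.
        2 => decode_val(tokens[1]).map_or(false, |v| v & 0x0f == 0),
        // 3 tokens encode 2 bytes: the third token carries only 4 used
        // bits, so its low 2 bits must be zero.
        3 => decode_val(tokens[2]).map_or(false, |v| v & 0x03 == 0),
        // full 4-token groups have no unused bits
        _ => true,
    }
}

fn main() {
    // "Zg" encodes the single byte b'f' canonically...
    assert!(trailing_bits_canonical(b"Zg"));
    // ...while "Zh" decodes to the same byte with an unused bit set to 1.
    assert!(!trailing_bits_canonical(b"Zh"));
}
```

A decoder that wants to reject non-canonical input applies this check to the final partial group; one that merely wants to be lenient can ignore it, which is exactly the policy choice the text describes.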
+
+All `Engine` implementations must at a minimum support treating non-canonical padding of both types as an error, and
+optionally may allow other behaviors.
-Decoding is aided by some pre-calculated tables, which are generated by:
+## Rust version compatibility
+
+The minimum supported Rust version is 1.57.0.
+
+## Contributing
+
+Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all
+PRs will be carefully scrutinized. Beyond that, this sort of low-level library simply needs to be 100% correct. Nobody
+wants to chase bugs in encoding of any sort.
+
+All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the
+free time to give each PR the attention it deserves. I will get to everyone eventually!
+
+## Developing
+
+Benchmarks are in `benches/`. Running them requires nightly rust, but `rustup` makes it easy:
 
 ```bash
-cargo run --example make_tables > src/tables.rs.tmp && mv src/tables.rs.tmp src/tables.rs
+rustup run nightly cargo bench
 ```
 
-no_std
----
+## no_std
 
-This crate supports no_std. By default the crate targets std via the `std` feature. You can deactivate the `default-features` to target core instead. In that case you lose out on all the functionality revolving around `std::io`, `std::error::Error` and heap allocations. There is an additional `alloc` feature that you can activate to bring back the support for heap allocations.
+This crate supports no_std. By default the crate targets std via the `std` feature. You can deactivate
+the `default-features` to target `core` instead. In that case you lose out on all the functionality revolving
+around `std::io`, `std::error::Error`, and heap allocations. There is an additional `alloc` feature that you can
+activate to bring back the support for heap allocations.
 
-Profiling
----
+## Profiling
 
-On Linux, you can use [perf](https://perf.wiki.kernel.org/index.php/Main_Page) for profiling. Then compile the benchmarks with `rustup nightly run cargo bench --no-run`.
+On Linux, you can use [perf](https://perf.wiki.kernel.org/index.php/Main_Page) for profiling. Then compile the
+benchmarks with `rustup run nightly cargo bench --no-run`.
 
-Run the benchmark binary with `perf` (shown here filtering to one particular benchmark, which will make the results easier to read). `perf` is only available to the root user on most systems as it fiddles with event counters in your CPU, so use `sudo`. We need to run the actual benchmark binary, hence the path into `target`. You can see the actual full path with `rustup run nightly cargo bench -v`; it will print out the commands it runs. If you use the exact path that `bench` outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want to `cargo clean` so you have only one `benchmarks-` binary (they tend to accumulate).
+Run the benchmark binary with `perf` (shown here filtering to one particular benchmark, which will make the results
+easier to read). `perf` is only available to the root user on most systems as it fiddles with event counters in your
+CPU, so use `sudo`. We need to run the actual benchmark binary, hence the path into `target`. You can see the actual
+full path with `rustup run nightly cargo bench -v`; it will print out the commands it runs. If you use the exact path
+that `bench` outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want
+to `cargo clean` so you have only one `benchmarks-` binary (they tend to accumulate).
 
 ```bash
 sudo perf record target/release/deps/benchmarks-* --bench decode_10mib_reuse
@@ -72,7 +111,10 @@ Then analyze the results, again with perf:
 sudo perf annotate -l
 ```
 
-You'll see a bunch of interleaved rust source and assembly like this. The section with `lib.rs:327` is telling us that 4.02% of samples saw the `movzbl` aka bit shift as the active instruction. However, this percentage is not as exact as it seems due to a phenomenon called *skid*. Basically, a consequence of how fancy modern CPUs are is that this sort of instruction profiling is inherently inaccurate, especially in branch-heavy code.
+You'll see a bunch of interleaved rust source and assembly like this. The section with `lib.rs:327` is telling us that
+4.02% of samples saw the `movzbl` aka bit shift as the active instruction. However, this percentage is not as exact as
+it seems due to a phenomenon called *skid*. Basically, a consequence of how fancy modern CPUs are is that this sort of
+instruction profiling is inherently inaccurate, especially in branch-heavy code.
 
 ```text
 lib.rs:322 0.70 : 10698: mov %rdi,%rax
@@ -94,11 +136,10 @@ You'll see a bunch of interleaved rust source and assembly like this. The sectio
 0.00 : 106ab: je 1090e <base64::decode_config_buf::hbf68a45fefa299c1+0x46e>
 ```
 
+## Fuzzing
-
-Fuzzing
----
-
-This uses [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz). See `fuzz/fuzzers` for the available fuzzing scripts. To run, use an invocation like these:
+This uses [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz). See `fuzz/fuzzers` for the available fuzzing scripts.
+To run, use an invocation like these:
 
 ```bash
 cargo +nightly fuzz run roundtrip
@@ -107,8 +148,7 @@ cargo +nightly fuzz run roundtrip_random_config -- -max_len=10240
 cargo +nightly fuzz run decode_random
 ```
 
-
-License
----
+## License
 
 This project is dual-licensed under MIT and Apache 2.0.
+
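As a closing footnote to the padding discussion in the FAQ above: the "canonical padding to the next multiple of four bytes as per the RFC" rule can be sketched in a few lines of std-only Rust. This is illustrative, not this crate's `Engine` API; it only checks the shape of the `=` padding, not the alphabet or trailing bits.

```rust
// Sketch: validate canonical '=' padding. Canonical padded base64 is a
// multiple of four tokens long, with at most two '=' bytes, all at the end.
fn padding_canonical(encoded: &[u8]) -> bool {
    if encoded.len() % 4 != 0 {
        return false; // padded output is always a multiple of 4 tokens
    }
    // count trailing '=' bytes
    let pads = encoded.iter().rev().take_while(|&&b| b == b'=').count();
    // at most two pad bytes, and no '=' anywhere else in the input
    pads <= 2 && !encoded[..encoded.len() - pads].contains(&b'=')
}

fn main() {
    assert!(padding_canonical(b"Zm9vYg=="));
    assert!(!padding_canonical(b"Zm9vYg=")); // length not a multiple of 4
    assert!(!padding_canonical(b"Zm9v====")); // too many pad bytes
}
```

The unpadded variant mentioned in the FAQ is the complement of this: reject any `=` at all and accept lengths that are not multiples of four (except length ≡ 1 mod 4, which no byte count can produce).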