Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 222 |
1 files changed, 222 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0adc1b1
--- /dev/null
+++ b/README.md
@@ -0,0 +1,222 @@
+# virtio-queue
+
+The `virtio-queue` crate provides a virtio device implementation for a virtio
+queue, a virtio descriptor and a chain of such descriptors.
+Two formats of virtio queues are defined in the specification: split virtqueues
+and packed virtqueues. The `virtio-queue` crate offers support only for the
+[split virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-240006)
+format.
+The purpose of the virtio-queue API is to be consumed by virtio device
+implementations (such as the block device or vsock device).
+The main abstraction is the `Queue`. The crate also defines a state object
+for the queue, i.e. `QueueState`.
+
+## Usage
+
+Let’s take a concrete example of how a device would work with a queue, using
+the MMIO bus.
+
+First, it is important to mention that the mandatory parts of the virtio
+interface are the following:
+
+- the device status field → provides an indication of
+  [the completed steps](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
+  of the device initialization routine,
+- the feature bits →
+  [the features](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
+  the driver/device understand(s),
+- [notifications](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-170003),
+- one or more
+  [virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-230005)
+  → the mechanism for data transport between the driver and device.
+
+Each virtqueue consists of three parts:
+
+- Descriptor Table,
+- Available Ring,
+- Used Ring.
+
+Before booting the virtual machine (VM), the VMM does the following setup:
+
+1. initialize an array of Queues using the Queue constructor.
+2. register the device to the MMIO bus, so that the driver can later send
+   read/write requests from/to the MMIO space; some of those requests also set
+   up the queues’ state.
+3. other pre-boot configurations, such as registering an fd for the interrupt
+   assigned to the device; the device will later use this fd to inform
+   the driver that it has information to communicate.
+
+After the boot of the VM, the driver starts sending read/write requests to
+configure things like:
+
+* the supported features;
+* queue parameters. The following setters are used for the queue setup:
+    * `set_size` → sets the size of the queue.
+    * `set_ready` → configures the queue to the `ready for processing` state.
+    * `set_desc_table_address`, `set_avail_ring_address`,
+      `set_used_ring_address` → configure the guest addresses of the constituent
+      parts of the queue.
+    * `set_event_idx` → called as part of the feature negotiation in
+      the `virtio-device` crate; it enables or disables the
+      VIRTIO_F_RING_EVENT_IDX feature.
+* the device activation. As part of this activation, the device can also create
+  a queue handler, which can later be used to process the queue.
+
+Once the queues are ready, the device can be used.
+
+The steady state operation of a virtio device follows a model where the driver
+produces descriptor chains which are consumed by the device, and both parties
+need to be notified when new elements have been placed on the associated ring to
+avoid busy polling. The precise notification mechanism is left up to the VMM
+that incorporates the devices and queues (it usually involves things like MMIO
+VM exits and interrupt injection into the guest). The queue implementation is
+agnostic to the notification mechanism in use, and it exposes methods and
+functionality (such as iterators) that are called from the outside in response
+to a notification event.
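
The driver-initiated queue configuration described above can be illustrated with a small, self-contained sketch. `ToyQueueConfig`, its getters and `is_configured` are hypothetical names invented for this illustration and are not part of `virtio-queue`; the setter names only mirror the ones listed above. The size check (a non-zero power of two, no larger than the advertised maximum) is an assumption based on the split virtqueue rules in the specification, not a statement about what the crate validates.

```rust
/// Hypothetical stand-in for the per-queue state that the driver configures
/// through MMIO writes. This is NOT the `virtio-queue` `Queue` type.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ToyQueueConfig {
    max_size: u16,
    size: u16,
    ready: bool,
    desc_table: Option<u64>,
    avail_ring: Option<u64>,
    used_ring: Option<u64>,
    event_idx: bool,
}

impl ToyQueueConfig {
    /// Create a queue that advertises `max_size` entries to the driver.
    pub fn new(max_size: u16) -> Self {
        ToyQueueConfig { max_size, size: max_size, ..Default::default() }
    }

    /// Accept only a non-zero power of two that does not exceed the maximum
    /// (assumed rule, see the paragraph above).
    pub fn set_size(&mut self, size: u16) -> Result<(), String> {
        if size == 0 || size > self.max_size || !size.is_power_of_two() {
            return Err(format!("invalid queue size {}", size));
        }
        self.size = size;
        Ok(())
    }

    pub fn size(&self) -> u16 {
        self.size
    }

    /// Guest addresses of the three queue parts.
    pub fn set_desc_table_address(&mut self, addr: u64) {
        self.desc_table = Some(addr);
    }

    pub fn set_avail_ring_address(&mut self, addr: u64) {
        self.avail_ring = Some(addr);
    }

    pub fn set_used_ring_address(&mut self, addr: u64) {
        self.used_ring = Some(addr);
    }

    /// Record the outcome of the event index feature negotiation.
    pub fn set_event_idx(&mut self, enabled: bool) {
        self.event_idx = enabled;
    }

    pub fn event_idx(&self) -> bool {
        self.event_idx
    }

    pub fn set_ready(&mut self, ready: bool) {
        self.ready = ready;
    }

    /// True once the driver marked the queue ready and supplied all addresses.
    pub fn is_configured(&self) -> bool {
        self.ready
            && self.desc_table.is_some()
            && self.avail_ring.is_some()
            && self.used_ring.is_some()
    }
}
```

In this toy model a device would refuse to process the queue until `is_configured()` returns true, which is the same ordering the list above describes: parameters first, activation last.
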
+
+### Data transmission using virtqueues
+
+The basic principle of how the queues are used by the device/driver is the
+following, as also shown in the diagram below:
+
+1. when the guest driver has a new request (buffer), it allocates free
+   descriptor(s) for the buffer in the descriptor table, chaining as necessary.
+2. the driver adds a new entry with the head index of the descriptor chain
+   describing the request, in the available ring entries.
+3. the driver increments the `idx` by the number of new entries; the diagram
+   shows the simple use case of only one new entry.
+4. the driver sends an available buffer notification to the device if such
+   notifications are not suppressed.
+5. the device will at some point consume that request, by first reading the
+   `idx` field from the available ring. This can be directly achieved with
+   `Queue::avail_idx`, but we do not recommend that consumers of the crate
+   use it directly, because it is already called behind the scenes by the iterator
+   over all available descriptor chain heads.
+6. the device gets the index of the descriptor chain(s) corresponding to the
+   read `idx` value.
+7. the device reads the corresponding descriptor(s) from the descriptor table.
+8. the device adds a new entry in the used ring by using `Queue::add_used`; the
+   entry is defined in the spec as `virtq_used_elem`, and in `virtio-queue` as
+   `VirtqUsedElem`. This structure holds both the index of the descriptor
+   chain and the number of bytes that were written to the memory as part of
+   serving the request.
+9. the device increments the `idx` from the used ring; this is done as part of
+   the `Queue::add_used` that was mentioned above.
+10. the device sends a used buffer notification to the driver if such
+    notifications are not suppressed.
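
The ten steps above can be walked through with a toy, in-memory model of a split virtqueue. Everything in this sketch (`ToyVirtqueue`, `Desc`, `UsedElem` and the method names) is hypothetical and only mimics the layout; it does not use guest memory or the `virtio-queue` API, free descriptor management is reduced to a caller-supplied start index, and the notification steps (4 and 10) are left out. The two flag values are the ones defined by the virtio specification.

```rust
pub const VIRTQ_DESC_F_NEXT: u16 = 1; // another descriptor follows in the chain
pub const VIRTQ_DESC_F_WRITE: u16 = 2; // buffer is device-writable

#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct Desc { pub addr: u64, pub len: u32, pub flags: u16, pub next: u16 }

#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct UsedElem { pub id: u32, pub len: u32 }

/// Toy split virtqueue living in ordinary vectors instead of guest memory.
pub struct ToyVirtqueue {
    pub desc: Vec<Desc>,
    pub avail_ring: Vec<u16>,
    pub avail_idx: u16, // written by the driver
    pub used_ring: Vec<UsedElem>,
    pub used_idx: u16, // written by the device
    pub next_avail: u16, // device-private cursor into the available ring
}

impl ToyVirtqueue {
    pub fn new(size: u16) -> Self {
        let n = size as usize;
        ToyVirtqueue {
            desc: vec![Desc::default(); n],
            avail_ring: vec![0; n],
            avail_idx: 0,
            used_ring: vec![UsedElem::default(); n],
            used_idx: 0,
            next_avail: 0,
        }
    }

    /// Steps 1-3: write a chain of (addr, len, device_writable) buffers starting
    /// at descriptor `first_free`, publish its head, then bump `avail_idx`.
    pub fn driver_publish(&mut self, first_free: u16, bufs: &[(u64, u32, bool)]) -> u16 {
        let size = self.desc.len() as u16;
        let head = first_free % size;
        for (i, &(addr, len, device_writable)) in bufs.iter().enumerate() {
            let idx = first_free.wrapping_add(i as u16) % size;
            let is_last = i + 1 == bufs.len();
            let mut flags = 0;
            if device_writable { flags |= VIRTQ_DESC_F_WRITE; }
            if !is_last { flags |= VIRTQ_DESC_F_NEXT; }
            let next = if is_last { 0 } else { (idx + 1) % size };
            self.desc[idx as usize] = Desc { addr, len, flags, next };
        }
        self.avail_ring[(self.avail_idx % size) as usize] = head;
        self.avail_idx = self.avail_idx.wrapping_add(1);
        head
    }

    /// Steps 5-6: compare the private cursor with `avail_idx` and fetch a head.
    pub fn device_pop(&mut self) -> Option<u16> {
        if self.next_avail == self.avail_idx {
            return None;
        }
        let slot = self.next_avail as usize % self.avail_ring.len();
        self.next_avail = self.next_avail.wrapping_add(1);
        Some(self.avail_ring[slot])
    }

    /// Step 7: follow `next` links; the walk is bounded by the table size so
    /// that a malformed, looping chain still terminates.
    pub fn walk_chain(&self, head: u16) -> Vec<Desc> {
        let mut chain = Vec::new();
        let mut idx = head as usize;
        for _ in 0..self.desc.len() {
            let d = self.desc[idx % self.desc.len()];
            chain.push(d);
            if (d.flags & VIRTQ_DESC_F_NEXT) == 0 { break; }
            idx = d.next as usize;
        }
        chain
    }

    /// Steps 8-9: record the used element, then bump the used ring `idx`.
    pub fn device_add_used(&mut self, head: u16, len: u32) {
        let slot = self.used_idx as usize % self.used_ring.len();
        self.used_ring[slot] = UsedElem { id: head as u32, len };
        self.used_idx = self.used_idx.wrapping_add(1);
    }
}
```

Note that both ring indices only ever increase (with wrap-around) and are reduced modulo the queue size when used as ring positions, which is why the sketch uses `wrapping_add` rather than resetting them.
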
+
+![queue](https://raw.githubusercontent.com/rust-vmm/vm-virtio/main/crates/virtio-queue/docs/images/queue.png)
+
+A descriptor stores four fields, with the first two, `addr` and `len`,
+pointing to the data in memory to which the descriptor refers, as shown in the
+diagram below. The `flags` field is useful for indicating if, for example, the
+buffer is device readable or writable, or if we have another descriptor chained
+after this one (VIRTQ_DESC_F_NEXT flag set). The `next` field stores the index
+of the next descriptor if VIRTQ_DESC_F_NEXT is set.
+
+![descriptor](https://raw.githubusercontent.com/rust-vmm/vm-virtio/main/crates/virtio-queue/docs/images/descriptor.png)
+
+**Requirements for device implementation**
+
+* Abstractions from virtio-queue such as `DescriptorChain` can be used to parse
+  descriptors provided by the driver, which represent input or output memory
+  areas for device I/O. A descriptor is essentially an (address, length) pair,
+  which is subsequently used by the device model operation. We do not check the
+  validity of the descriptors, and instead expect any validations to happen
+  when the device implementation is attempting to access the corresponding
+  areas. Early checks can add non-negligible additional costs, and exclusively
+  relying upon them may lead to time-of-check-to-time-of-use race conditions.
+* The device should validate before reading/writing to a buffer that it is
+  device-readable/device-writable.
+
+## Design
+
+`QueueT` is a trait that allows different implementations for a `Queue`
+object for single-threaded context and multi-threaded context. The
+implementations provided in `virtio-queue` are:
+
+1. `Queue` → it is used for the single-threaded context.
+2. `QueueSync` → it is used for the multi-threaded context, and is simply
+   a wrapper over an `Arc<Mutex<Queue>>`.
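
The pattern described above (one trait, a plain implementation for single-threaded use and an `Arc<Mutex<...>>` wrapper for multi-threaded use) can be sketched with the standard library alone. `ToyQueueT`, `ToyQueue`, `ToyQueueSync` and `configure` are hypothetical names used for illustration; the real `QueueT` trait has many more methods, and its signatures are not reproduced here.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, heavily reduced counterpart of a queue trait.
pub trait ToyQueueT {
    fn set_size(&mut self, size: u16);
    fn size(&self) -> u16;
}

/// Single-threaded flavour: plain state, no locking.
#[derive(Debug, Default)]
pub struct ToyQueue {
    size: u16,
}

impl ToyQueueT for ToyQueue {
    fn set_size(&mut self, size: u16) {
        self.size = size;
    }

    fn size(&self) -> u16 {
        self.size
    }
}

/// Multi-threaded flavour: a cheaply clonable handle around the same state.
#[derive(Clone, Debug, Default)]
pub struct ToyQueueSync {
    inner: Arc<Mutex<ToyQueue>>,
}

impl ToyQueueT for ToyQueueSync {
    fn set_size(&mut self, size: u16) {
        // Take the lock, then delegate to the single-threaded implementation.
        self.inner.lock().unwrap().set_size(size);
    }

    fn size(&self) -> u16 {
        self.inner.lock().unwrap().size()
    }
}

/// Device code written against the trait works with either flavour.
pub fn configure<Q: ToyQueueT>(queue: &mut Q, size: u16) -> u16 {
    queue.set_size(size);
    queue.size()
}
```

Because every clone of `ToyQueueSync` shares the same inner state, a change made through one handle (for example from another thread) is visible through all the others, at the cost of taking the lock on each call.
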
+
+Besides the above abstractions, the `virtio-queue` crate also provides the
+following ones:
+
+* `Descriptor` → which mostly offers accessors for the members of the
+  `Descriptor`.
+* `DescriptorChain` → provides accessors for the `DescriptorChain`’s members
+  and an `Iterator` implementation for iterating over the `DescriptorChain`;
+  there is also an abstraction for iterators over just the device readable or
+  just the device writable descriptors (`DescriptorChainRwIter`).
+* `AvailIter` → a consuming iterator over all available descriptor chain
+  heads in the queue.
+
+## Save/Restore Queue
+
+The `Queue` allows saving the state through the `state` function which returns
+a `QueueState`. `Queue` objects can be created from a previously saved state by
+using `QueueState::try_from`. The VMM should check for errors when restoring
+a `Queue` from a previously saved state.
+
+### Notification suppression
+
+A big part of the `virtio-queue` crate consists of the notification suppression
+support. As already mentioned, the driver can send an available buffer
+notification to the device when there are new entries in the available ring,
+and the device can send a used buffer notification to the driver when there are
+new entries in the used ring. There might be cases when sending a notification
+each time these scenarios happen is not efficient; for example, while the driver
+is processing the used ring, it does not need to receive another used buffer
+notification. The mechanism for suppressing the notifications is detailed in
+the following sections from the specification:
+- [Used Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-400007),
+- [Available Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-4800010).
+
+The `Queue` abstraction proposes the following sequence of steps for
+processing new available ring entries:
+
+1. the device first disables the notifications to make the driver aware it is
+   processing the available ring and does not want interruptions, by using
+   `Queue::disable_notification`. Notifications are disabled in one of two ways:
+   if VIRTIO_F_EVENT_IDX is not negotiated, VIRTQ_USED_F_NO_NOTIFY
+   is set in the `flags` field of the used ring; if VIRTIO_F_EVENT_IDX is
+   negotiated, the `avail_event` value is not updated, i.e. it remains set to
+   the latest `idx` value of the available ring that was already notified by
+   the driver.
+2. the device processes the new entries by using the `AvailIter` iterator.
+3. the device can enable the notifications now, by using
+   `Queue::enable_notification`. Notifications are enabled in one of two ways:
+   if VIRTIO_F_EVENT_IDX is not negotiated, 0 is set in the `flags` field
+   of the used ring; if VIRTIO_F_EVENT_IDX is negotiated, the `avail_event`
+   value is set to the smallest `idx` value of the available ring that was not
+   already notified by the driver. This way the device makes sure that it won’t
+   miss any notification.
+
+The above steps should be done in a loop to also handle the less likely case
+where the driver added new entries just before we re-enabled notifications.
+
+For notifying the driver, the `Queue` provides the `needs_notification` method,
+which should be used each time the device adds a new entry to the used ring.
+Depending on the `used_event` value and on the last used value
+(`signalled_used`), `needs_notification` returns true to let the device know it
+should send a notification to the guest.
+
+## Assumptions
+
+We assume the users of the `Queue` implementation won’t attempt to use the
+queue before checking that the `ready` bit is set. This can be verified by
+calling `Queue::is_valid`, which also checks that the three
+queue parts are valid memory regions.
+We assume consumers will use `AvailIter::go_to_previous_position` only in
+single-threaded contexts.
+We assume the users will consume the entries from the available ring in the
+recommended way from the documentation, i.e. the device starts processing the
+available ring entries, disables the notifications, processes the entries,
+and then re-enables notifications.
+
+## License
+
+This project is licensed under either of
+
+- [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0
+- [BSD-3-Clause License](https://opensource.org/licenses/BSD-3-Clause)