Diffstat (limited to 'README.md')
-rw-r--r--README.md222
1 files changed, 222 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0adc1b1
--- /dev/null
+++ b/README.md
@@ -0,0 +1,222 @@
+# virtio-queue
+
+The `virtio-queue` crate provides device-side implementations of a virtio
+queue, a virtio descriptor, and a chain of such descriptors.
+Two formats of virtio queues are defined in the specification: split virtqueues
+and packed virtqueues. The `virtio-queue` crate supports only the
+[split virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-240006)
+format.
+The virtio-queue API is meant to be consumed by virtio device implementations
+(such as the block device or the vsock device).
+The main abstraction is the `Queue`. The crate also defines a state object for
+the queue, `QueueState`.
+
+## Usage
+
+Let’s take a concrete example of how a device would work with a queue, using
+the MMIO bus.
+
+The mandatory parts of the virtio interface are the following:
+
+- the device status field → provides an indication of
+ [the completed steps](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
+ of the device initialization routine,
+- the feature bits →
+ [the features](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
+ the driver/device understand(s),
+- [notifications](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-170003),
+- one or more
+ [virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-230005)
+ → the mechanism for data transport between the driver and device.
+
+Each virtqueue consists of three parts:
+
+- Descriptor Table,
+- Available Ring,
+- Used Ring.
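+
+As a rough illustration of how these parts are laid out, the sketch below
+computes the size and required alignment of each part for a given queue size,
+following the table in the split virtqueue section of the virtio 1.1
+specification. `PartLayout`, `split_queue_layout` and `is_aligned` are
+hypothetical helpers written for this example; they are not part of the
+`virtio-queue` API.
+
+```rust
+/// Size in bytes and required alignment of one split-virtqueue part.
+/// Hypothetical type, for illustration only.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct PartLayout {
+    pub size: u64,
+    pub align: u64,
+}
+
+/// Returns the layouts of [descriptor table, available ring, used ring]
+/// for a queue with `queue_size` entries (virtio 1.1, split virtqueues).
+pub fn split_queue_layout(queue_size: u16) -> [PartLayout; 3] {
+    let n = u64::from(queue_size);
+    [
+        // Descriptor Table: `n` descriptors of 16 bytes each.
+        PartLayout { size: 16 * n, align: 16 },
+        // Available Ring: flags + idx + ring[n] of u16 + used_event.
+        PartLayout { size: 6 + 2 * n, align: 2 },
+        // Used Ring: flags + idx + ring[n] of 8-byte elements + avail_event.
+        PartLayout { size: 6 + 8 * n, align: 4 },
+    ]
+}
+
+/// Returns true if the guest address `addr` satisfies the part's alignment.
+pub fn is_aligned(addr: u64, part: PartLayout) -> bool {
+    addr % part.align == 0
+}
+```
+
+For a queue of 256 entries, this gives 4096 bytes for the descriptor table,
+518 bytes for the available ring and 2054 bytes for the used ring.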
+
+Before booting the virtual machine (VM), the VMM performs the following setup:
+
+1. initialize an array of queues using the `Queue` constructor.
+2. register the device on the MMIO bus, so that the driver can later send
+ read/write requests to the MMIO space; some of those requests also set up
+ the queues’ state.
+3. perform other pre-boot configuration, such as registering an fd for the
+ interrupt assigned to the device; the device later uses this fd to inform
+ the driver that it has information to communicate.
+
+After the VM boots, the driver starts sending read/write requests to
+configure things like:
+
+* the supported features;
+* queue parameters. The following setters are used for the queue setup:
+ * `set_size` → sets the size of the queue.
+ * `set_ready` → puts the queue in the `ready for processing` state.
+ * `set_desc_table_address`, `set_avail_ring_address`,
+ `set_used_ring_address` → configure the guest addresses of the constituent
+ parts of the queue.
+ * `set_event_idx` → called as part of feature negotiation in the
+ `virtio-device` crate; it enables or disables the VIRTIO_F_EVENT_IDX
+ feature.
+* the device activation. As part of activation, the device can also create
+ a queue handler, which can later be used to process the queue.
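+
+The setters above update a handful of per-queue fields. The following is a
+simplified, hypothetical model of that state (`QueueConfig` is not the
+crate's `Queue`), assuming the MMIO transport, where each 64-bit guest address
+is written as a low and a high 32-bit register. It also assumes the split
+virtqueue rule that the queue size is a power of two, and it silently ignores
+invalid sizes; a real implementation may report an error instead.
+
+```rust
+/// Hypothetical model of the per-queue parameters written by the driver.
+#[derive(Debug, Default)]
+pub struct QueueConfig {
+    pub max_size: u16,
+    pub size: u16,
+    pub ready: bool,
+    pub desc_table: u64,
+    pub avail_ring: u64,
+    pub used_ring: u64,
+    pub event_idx_enabled: bool,
+}
+
+/// Replaces the low and/or high 32-bit half of a 64-bit guest address.
+fn update_halves(current: u64, low: Option<u32>, high: Option<u32>) -> u64 {
+    let low = low.map_or(current & 0xffff_ffff, u64::from);
+    let high = high.map_or(current >> 32, u64::from);
+    (high << 32) | low
+}
+
+impl QueueConfig {
+    pub fn new(max_size: u16) -> Self {
+        // The device starts out advertising its maximum size.
+        QueueConfig { max_size, size: max_size, ..Default::default() }
+    }
+
+    /// Accepts only non-zero powers of two that do not exceed `max_size`
+    /// (`is_power_of_two` is false for 0).
+    pub fn set_size(&mut self, size: u16) {
+        if size.is_power_of_two() && size <= self.max_size {
+            self.size = size;
+        }
+    }
+
+    pub fn set_ready(&mut self, ready: bool) {
+        self.ready = ready;
+    }
+
+    pub fn set_desc_table_address(&mut self, low: Option<u32>, high: Option<u32>) {
+        self.desc_table = update_halves(self.desc_table, low, high);
+    }
+
+    pub fn set_avail_ring_address(&mut self, low: Option<u32>, high: Option<u32>) {
+        self.avail_ring = update_halves(self.avail_ring, low, high);
+    }
+
+    pub fn set_used_ring_address(&mut self, low: Option<u32>, high: Option<u32>) {
+        self.used_ring = update_halves(self.used_ring, low, high);
+    }
+
+    /// Called once feature negotiation has settled VIRTIO_F_EVENT_IDX.
+    pub fn set_event_idx(&mut self, enabled: bool) {
+        self.event_idx_enabled = enabled;
+    }
+}
+```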
+
+Once the queues are ready, the device can be used.
+
+The steady state operation of a virtio device follows a model where the driver
+produces descriptor chains which are consumed by the device, and both parties
+need to be notified when new elements have been placed on the associated ring to
+avoid busy polling. The precise notification mechanism is left up to the VMM
+that incorporates the devices and queues (it usually involves things like MMIO
+vm exits and interrupt injection into the guest). The queue implementation is
+agnostic to the notification mechanism in use, and it exposes methods and
+functionality (such as iterators) that are called from the outside in response
+to a notification event.
+
+### Data transmission using virtqueues
+
+The basic principle of how the queues are used by the device/driver is the
+following, as shown in the diagram below:
+
+1. when the guest driver has a new request (buffer), it allocates free
+ descriptor(s) for the buffer in the descriptor table, chaining as necessary.
+2. the driver adds an entry containing the head index of the descriptor chain
+ that describes the request to the available ring.
+3. the driver increments the available ring's `idx` by the number of new
+ entries; the diagram shows the simple case of a single new entry.
+4. the driver sends an available buffer notification to the device if such
+ notifications are not suppressed.
+5. the device will at some point consume that request, by first reading the
+ `idx` field from the available ring. This can be done directly with
+ `Queue::avail_idx`, but we do not recommend that consumers of the crate call
+ it directly, because it is already called behind the scenes by the iterator
+ over all available descriptor chain heads.
+6. the device gets the index of the descriptor chain(s) corresponding to the
+ read `idx` value.
+7. the device reads the corresponding descriptor(s) from the descriptor table.
+8. the device adds a new entry to the used ring by using `Queue::add_used`; the
+ entry is defined in the spec as `virtq_used_elem`, and in `virtio-queue` as
+ `VirtqUsedElem`. This structure holds both the index of the descriptor
+ chain and the number of bytes that were written to memory while serving
+ the request.
+9. the device increments the `idx` of the used ring; this is done as part of
+ the `Queue::add_used` call mentioned above.
+10. the device sends a used buffer notification to the driver if such
+ notifications are not suppressed.
+
+![queue](https://raw.githubusercontent.com/rust-vmm/vm-virtio/main/crates/virtio-queue/docs/images/queue.png)
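+
+The steps above can be simulated without guest memory. In the hypothetical
+sketch below, `ToyQueue` keeps the available and used rings in `Vec`s instead
+of guest memory, and the `serve` closure stands in for the device's request
+handling. The real crate performs the device-side steps through the available
+ring iterator and `Queue::add_used`; this only models the index bookkeeping.
+
+```rust
+/// Mirrors the spec's `virtq_used_elem`: chain head index and bytes written.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct UsedElem {
+    pub id: u32,
+    pub len: u32,
+}
+
+/// Hypothetical in-memory model of a split virtqueue's two rings.
+pub struct ToyQueue {
+    pub size: u16,
+    pub avail_idx: u16,       // available ring `idx`, written by the driver
+    pub avail_ring: Vec<u16>, // heads of available descriptor chains
+    pub used_idx: u16,        // used ring `idx`, written by the device
+    pub used_ring: Vec<UsedElem>,
+    next_avail: u16,          // device-private: next available entry to consume
+}
+
+impl ToyQueue {
+    /// Assumes `size` is a non-zero power of two, as for split queues, so
+    /// that `idx % size` stays correct when the u16 counters wrap.
+    pub fn new(size: u16) -> Self {
+        ToyQueue {
+            size,
+            avail_idx: 0,
+            avail_ring: vec![0; usize::from(size)],
+            used_idx: 0,
+            used_ring: vec![UsedElem { id: 0, len: 0 }; usize::from(size)],
+            next_avail: 0,
+        }
+    }
+
+    /// Driver side (steps 1-3): publish one chain head, then bump `idx`.
+    pub fn driver_push(&mut self, head: u16) {
+        let slot = usize::from(self.avail_idx % self.size);
+        self.avail_ring[slot] = head;
+        self.avail_idx = self.avail_idx.wrapping_add(1);
+    }
+
+    /// Device side (steps 5-9): consume every new head and add a used
+    /// element for it. Returns the number of requests handled.
+    pub fn device_process<F: FnMut(u16) -> u32>(&mut self, mut serve: F) -> usize {
+        let mut handled = 0;
+        // Indices are free-running counters, so compare with `!=`, not `<`.
+        while self.next_avail != self.avail_idx {
+            let head = self.avail_ring[usize::from(self.next_avail % self.size)];
+            let len = serve(head);
+            let used_slot = usize::from(self.used_idx % self.size);
+            self.used_ring[used_slot] = UsedElem { id: u32::from(head), len };
+            self.used_idx = self.used_idx.wrapping_add(1);
+            self.next_avail = self.next_avail.wrapping_add(1);
+            handled += 1;
+        }
+        handled
+    }
+}
+```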
+
+A descriptor stores four fields. The first two, `addr` and `len`, point to the
+data in memory to which the descriptor refers, as shown in the diagram below.
+The `flags` field indicates, for example, whether the buffer is device-readable
+or device-writable, or whether another descriptor is chained after this one
+(VIRTQ_DESC_F_NEXT flag set). The `next` field stores the index of the next
+descriptor if VIRTQ_DESC_F_NEXT is set.
+
+![descriptor](https://raw.githubusercontent.com/rust-vmm/vm-virtio/main/crates/virtio-queue/docs/images/descriptor.png)
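+
+The descriptor layout maps directly onto a `#[repr(C)]` struct. The sketch
+below mirrors `struct virtq_desc` and the descriptor flag values from the
+virtio 1.1 specification. The `RawDescriptor` name and its helper methods are
+illustrative and are not claimed to match the crate's `Descriptor` type. The
+sketch also assumes a little-endian host, because the specification stores
+these fields in little-endian order and no byte swapping is done here.
+
+```rust
+use std::mem::size_of;
+
+/// Descriptor flag values from the virtio 1.1 specification.
+pub const VIRTQ_DESC_F_NEXT: u16 = 1;
+pub const VIRTQ_DESC_F_WRITE: u16 = 2;
+pub const VIRTQ_DESC_F_INDIRECT: u16 = 4;
+
+/// Same field order and widths as `struct virtq_desc` (hypothetical name).
+#[repr(C)]
+#[derive(Debug, Clone, Copy, Default)]
+pub struct RawDescriptor {
+    pub addr: u64,  // guest physical address of the buffer
+    pub len: u32,   // buffer length in bytes
+    pub flags: u16, // VIRTQ_DESC_F_* bits
+    pub next: u16,  // index of the next descriptor, valid if F_NEXT is set
+}
+
+impl RawDescriptor {
+    /// True if another descriptor follows this one in the chain.
+    pub fn has_next(&self) -> bool {
+        (self.flags & VIRTQ_DESC_F_NEXT) != 0
+    }
+
+    /// True if the device may write to the buffer (otherwise device-readable).
+    pub fn is_device_writable(&self) -> bool {
+        (self.flags & VIRTQ_DESC_F_WRITE) != 0
+    }
+}
+
+// Compile-time check that the layout matches the spec's 16-byte descriptor.
+const _: () = assert!(size_of::<RawDescriptor>() == 16);
+```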
+
+**Requirements for device implementation**
+
+* Abstractions from virtio-queue such as `DescriptorChain` can be used to parse
+ descriptors provided by the driver, which represent input or output memory
+ areas for device I/O. A descriptor is essentially an (address, length) pair,
+ which is subsequently used by the device model operation. We do not check the
+ validity of the descriptors, and instead expect any validation to happen
+ when the device implementation attempts to access the corresponding
+ areas. Early checks can add non-negligible costs, and relying exclusively
+ upon them may lead to time-of-check-to-time-of-use race conditions.
+* Before reading from or writing to a buffer, the device should validate that
+ the buffer is device-readable or device-writable, respectively.
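+
+To illustrate checking at access time rather than at parse time, the
+hypothetical sketch below models guest memory as a byte slice and validates a
+descriptor only at the moment data is written into its buffer: direction
+first, then length and bounds, using checked arithmetic. `Desc`,
+`AccessError` and `write_to_buffer` are names invented for this example; a
+real device would go through a guest memory abstraction (for instance the
+`vm-memory` crate) rather than a slice.
+
+```rust
+pub const VIRTQ_DESC_F_WRITE: u16 = 2;
+
+/// Hypothetical, already-parsed descriptor (address, length, flags).
+#[derive(Debug, Clone, Copy)]
+pub struct Desc {
+    pub addr: u64,
+    pub len: u32,
+    pub flags: u16,
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum AccessError {
+    NotWritable,
+    TooLarge,
+    OutOfBounds,
+}
+
+/// Copies `data` into the buffer described by `desc`, validating the
+/// direction and the memory range only now, at the time of the access.
+pub fn write_to_buffer(mem: &mut [u8], desc: Desc, data: &[u8]) -> Result<(), AccessError> {
+    if (desc.flags & VIRTQ_DESC_F_WRITE) == 0 {
+        return Err(AccessError::NotWritable);
+    }
+    if data.len() > desc.len as usize {
+        return Err(AccessError::TooLarge);
+    }
+    // Checked conversions and arithmetic: a hostile `addr` must not wrap.
+    let start = usize::try_from(desc.addr).map_err(|_| AccessError::OutOfBounds)?;
+    let end = start.checked_add(data.len()).ok_or(AccessError::OutOfBounds)?;
+    let dst = mem.get_mut(start..end).ok_or(AccessError::OutOfBounds)?;
+    dst.copy_from_slice(data);
+    Ok(())
+}
+```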
+
+## Design
+
+`QueueT` is a trait that allows different implementations of a `Queue` object
+for single-threaded and multi-threaded contexts. The implementations provided
+in `virtio-queue` are:
+
+1. `Queue` → used in single-threaded contexts.
+2. `QueueSync` → used in multi-threaded contexts; it is simply a wrapper over
+ an `Arc<Mutex<Queue>>`.
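+
+The split between `Queue` and `QueueSync` follows a common Rust pattern: one
+trait, a plain implementation, and a cloneable handle that forwards each call
+through a mutex. The sketch below shows that pattern with a hypothetical
+`QueueOps` trait and two toy types; it does not reproduce the actual methods
+of `QueueT`.
+
+```rust
+use std::sync::{Arc, Mutex};
+
+/// Hypothetical stand-in for `QueueT`: what a device needs from a queue,
+/// independent of how the queue is shared.
+pub trait QueueOps {
+    fn next_used(&self) -> u16;
+    fn add_used(&mut self, head: u16, len: u32);
+}
+
+/// Single-threaded variant: plain owned state, no locking.
+#[derive(Debug, Default)]
+pub struct SimpleQueue {
+    used: Vec<(u16, u32)>,
+    next_used: u16,
+}
+
+impl QueueOps for SimpleQueue {
+    fn next_used(&self) -> u16 {
+        self.next_used
+    }
+    fn add_used(&mut self, head: u16, len: u32) {
+        self.used.push((head, len));
+        self.next_used = self.next_used.wrapping_add(1);
+    }
+}
+
+/// Multi-threaded variant: a cloneable handle to shared state; every call
+/// takes the lock and forwards to the inner single-threaded queue.
+#[derive(Debug, Clone, Default)]
+pub struct SharedQueue {
+    inner: Arc<Mutex<SimpleQueue>>,
+}
+
+impl QueueOps for SharedQueue {
+    fn next_used(&self) -> u16 {
+        self.inner.lock().unwrap().next_used()
+    }
+    fn add_used(&mut self, head: u16, len: u32) {
+        self.inner.lock().unwrap().add_used(head, len)
+    }
+}
+```
+
+Cloning `SharedQueue` only clones the `Arc`, so every clone operates on the
+same underlying queue, and each call holds the lock only for its own duration.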
+
+Besides the above abstractions, the `virtio-queue` crate also provides the
+following:
+
+* `Descriptor` → mostly offers accessors for the members of the `Descriptor`.
+* `DescriptorChain` → provides accessors for the `DescriptorChain`’s members
+ and an `Iterator` implementation for iterating over the `DescriptorChain`;
+ there is also an abstraction for iterating over only the device-readable or
+ only the device-writable descriptors (`DescriptorChainRwIter`).
+* `AvailIter` → a consuming iterator over all available descriptor chain
+ heads in the queue.
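+
+The walk that `DescriptorChain` performs can be sketched over an in-memory
+descriptor table: start at the head index, follow `next` while
+VIRTQ_DESC_F_NEXT is set, and filter on VIRTQ_DESC_F_WRITE to keep only the
+device-writable descriptors. `Desc`, `ChainIter` and `writable` are
+hypothetical names. The hop limit (one pass over the table) is an assumption
+made here to stop on chains that loop back on themselves, and a real
+implementation also has to read the descriptors from guest memory.
+
+```rust
+pub const VIRTQ_DESC_F_NEXT: u16 = 1;
+pub const VIRTQ_DESC_F_WRITE: u16 = 2;
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct Desc {
+    pub addr: u64,
+    pub len: u32,
+    pub flags: u16,
+    pub next: u16,
+}
+
+/// Hypothetical iterator over one descriptor chain in an in-memory table.
+pub struct ChainIter<'a> {
+    table: &'a [Desc],
+    next: Option<u16>,
+    ttl: usize, // remaining hops; bounds the walk if the chain loops
+}
+
+impl<'a> ChainIter<'a> {
+    pub fn new(table: &'a [Desc], head: u16) -> Self {
+        ChainIter { table, next: Some(head), ttl: table.len() }
+    }
+}
+
+impl<'a> Iterator for ChainIter<'a> {
+    type Item = Desc;
+
+    fn next(&mut self) -> Option<Desc> {
+        let idx = self.next.take()?;
+        if self.ttl == 0 {
+            return None; // more hops than descriptors: the chain loops
+        }
+        self.ttl -= 1;
+        // An out-of-range index ends the walk instead of panicking.
+        let desc = *self.table.get(usize::from(idx))?;
+        if (desc.flags & VIRTQ_DESC_F_NEXT) != 0 {
+            self.next = Some(desc.next);
+        }
+        Some(desc)
+    }
+}
+
+/// Only the device-writable descriptors of a chain.
+pub fn writable<'a>(chain: ChainIter<'a>) -> impl Iterator<Item = Desc> + 'a {
+    chain.filter(|d| (d.flags & VIRTQ_DESC_F_WRITE) != 0)
+}
+```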
+
+## Save/Restore Queue
+
+The `Queue` allows saving its state through the `state` function, which
+returns a `QueueState`. `Queue` objects can be created from a previously saved
+state by using `QueueState::try_from`. The VMM should check for errors when
+restoring a `Queue` from a previously saved state.
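+
+The save/restore flow can be sketched with the standard `TryFrom` trait:
+saving copies the state out, and restoring re-validates it, because a saved
+state may come from outside the VMM process (for example, a migration
+stream). `Snapshot`, `RestoredQueue` and `RestoreError` are hypothetical
+stand-ins for `QueueState`, `Queue` and the crate's error type, and the single
+size check is only an example of the validation a restore path might do.
+
+```rust
+use std::convert::TryFrom;
+
+/// Hypothetical snapshot of queue state, in the spirit of `QueueState`.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct Snapshot {
+    pub max_size: u16,
+    pub size: u16,
+    pub ready: bool,
+    pub next_avail: u16,
+    pub next_used: u16,
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum RestoreError {
+    InvalidSize(u16),
+}
+
+#[derive(Debug)]
+pub struct RestoredQueue {
+    state: Snapshot,
+}
+
+impl RestoredQueue {
+    /// Save: copy out everything needed to rebuild the queue later.
+    pub fn state(&self) -> Snapshot {
+        self.state
+    }
+}
+
+impl TryFrom<Snapshot> for RestoredQueue {
+    type Error = RestoreError;
+
+    /// Restore: re-validate the snapshot instead of trusting it.
+    fn try_from(s: Snapshot) -> Result<Self, Self::Error> {
+        if !s.size.is_power_of_two() || s.size > s.max_size {
+            return Err(RestoreError::InvalidSize(s.size));
+        }
+        Ok(RestoredQueue { state: s })
+    }
+}
+```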
+
+## Notification suppression
+
+A significant part of the `virtio-queue` crate consists of notification
+suppression support. As already mentioned, the driver can send an available
+buffer notification to the device when there are new entries in the available
+ring, and the device can send a used buffer notification to the driver when
+there are new entries in the used ring. Sending a notification every time this
+happens is not always efficient; for example, while the driver is processing
+the used ring, it does not need to receive another used buffer notification.
+The mechanism for suppressing notifications is detailed in the following
+sections of the specification:
+- [Used Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-400007),
+- [Available Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-4800010).
+
+The `Queue` abstraction proposes the following sequence of steps for
+processing new available ring entries:
+
+1. the device first disables notifications, using
+ `Queue::disable_notification`, to make the driver aware that it is
+ processing the available ring and does not want interruptions. If
+ VIRTIO_F_EVENT_IDX is not negotiated, the device disables notifications by
+ setting VIRTQ_USED_F_NO_NOTIFY in the `flags` field of the used ring; if
+ VIRTIO_F_EVENT_IDX is negotiated, it does so by not updating `avail_event`,
+ i.e. leaving it set to the latest `idx` value of the available ring for
+ which the driver already sent a notification.
+2. the device processes the new entries by using the `AvailIter` iterator.
+3. the device can now re-enable notifications, using
+ `Queue::enable_notification`. If VIRTIO_F_EVENT_IDX is not negotiated, the
+ device enables notifications by setting the `flags` field of the used ring
+ to 0; if VIRTIO_F_EVENT_IDX is negotiated, it sets `avail_event` to the
+ smallest `idx` value of the available ring for which the driver has not yet
+ sent a notification. This way the device makes sure that it won’t miss any
+ notification.
+
+The above steps should be done in a loop to also handle the less likely case
+where the driver added new entries just before we re-enabled notifications.
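+
+The loop can be sketched with a toy model of the available ring. In the
+hypothetical example below, `Avail` tracks the driver's `idx` and the
+device's next position, `enable_notification` reports whether unconsumed
+entries are present once notifications are back on, and the `on_batch_done`
+callback lets a test simulate the driver racing in new entries just before
+notifications are re-enabled. None of these names are the crate's API.
+
+```rust
+/// Hypothetical device-side view of the available ring.
+pub struct Avail {
+    pub driver_idx: u16, // available ring `idx`, advanced by the driver
+    pub next_avail: u16, // next entry the device will consume
+    pub notify_enabled: bool,
+}
+
+impl Avail {
+    pub fn disable_notification(&mut self) {
+        self.notify_enabled = false;
+    }
+
+    /// Re-enables notifications and reports whether entries are waiting
+    /// that the device has not consumed yet.
+    pub fn enable_notification(&mut self) -> bool {
+        self.notify_enabled = true;
+        self.next_avail != self.driver_idx
+    }
+}
+
+/// Disable -> process -> enable, repeated until the ring is still empty
+/// after notifications are back on. Returns the number of entries handled.
+pub fn process_queue<F: FnMut(&mut Avail)>(q: &mut Avail, mut on_batch_done: F) -> u32 {
+    let mut processed = 0;
+    loop {
+        q.disable_notification();
+        while q.next_avail != q.driver_idx {
+            q.next_avail = q.next_avail.wrapping_add(1); // "handle" one entry
+            processed += 1;
+        }
+        // The driver may add entries here, while notifications are still off.
+        on_batch_done(&mut *q);
+        if !q.enable_notification() {
+            break;
+        }
+    }
+    processed
+}
+```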
+
+For used buffer notifications towards the driver, the `Queue` provides the
+`needs_notification` method, which should be called each time the device adds
+a new entry to the used ring. Depending on the `used_event` value and on the
+last signalled used index (`signalled_used`), `needs_notification` returns
+`true` to let the device know it should send a notification to the guest.
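+
+When VIRTIO_F_EVENT_IDX is negotiated, this decision reduces to the
+`vring_need_event` check given in the virtio specification; otherwise the
+device looks at the VIRTQ_AVAIL_F_NO_INTERRUPT flag of the available ring.
+The sketch below ports that check to Rust. `needs_notification` here is a
+hypothetical free function with illustrative parameters, not the signature of
+`Queue::needs_notification`.
+
+```rust
+/// Flag the driver sets in the available ring's `flags` field to ask the
+/// device not to send used buffer notifications (without VIRTIO_F_EVENT_IDX).
+pub const VIRTQ_AVAIL_F_NO_INTERRUPT: u16 = 1;
+
+/// Port of `vring_need_event` from the virtio specification: true if
+/// `event_idx` lies in the half-open window [old_idx, new_idx), i.e. the
+/// used `idx` moved past `event_idx` since the last notification. All
+/// arithmetic wraps modulo 2^16, like the ring indices themselves.
+pub fn vring_need_event(event_idx: u16, new_idx: u16, old_idx: u16) -> bool {
+    new_idx.wrapping_sub(event_idx).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
+}
+
+/// Hypothetical helper: `signalled_used` is the used `idx` at the previous
+/// check, `next_used` is the current used `idx`.
+pub fn needs_notification(
+    event_idx_negotiated: bool,
+    avail_flags: u16,
+    used_event: u16,
+    signalled_used: u16,
+    next_used: u16,
+) -> bool {
+    if event_idx_negotiated {
+        vring_need_event(used_event, next_used, signalled_used)
+    } else {
+        (avail_flags & VIRTQ_AVAIL_F_NO_INTERRUPT) == 0
+    }
+}
+```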
+
+## Assumptions
+
+We assume the users of the `Queue` implementation won’t attempt to use the
+queue before checking that the `ready` bit is set. This can be verified by
+calling `Queue::is_valid`, which also checks that the three queue parts are
+valid memory regions.
+We assume consumers will use `AvailIter::go_to_previous_position` only in
+single-threaded contexts.
+We assume the users will consume the entries from the available ring in the
+way recommended by the documentation, i.e. when the device starts processing
+the available ring entries, it disables notifications, processes the entries,
+and then re-enables notifications.
+
+## License
+
+This project is licensed under either of
+
+- [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0
+- [BSD-3-Clause License](https://opensource.org/licenses/BSD-3-Clause)