summaryrefslogtreecommitdiff
path: root/Documentation/virtual
diff options
context:
space:
mode:
authorRusty Russell <rusty@rustcorp.com.au>2012-05-22 12:16:09 +0930
committerRusty Russell <rusty@rustcorp.com.au>2012-05-22 12:16:09 +0930
commit33950c6e2269f516059e2fa777f8c7559dfa31a5 (patch)
tree3823b5c6609b3b08551ea0cffb0ab7c3a3a4b754 /Documentation/virtual
parent76e10d158efb6d4516018846f60c2ab5501900bc (diff)
downloadlwn-33950c6e2269f516059e2fa777f8c7559dfa31a5.tar.gz
lwn-33950c6e2269f516059e2fa777f8c7559dfa31a5.zip
virtio: update documentation to v0.9.5 of spec
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Diffstat (limited to 'Documentation/virtual')
-rw-r--r--Documentation/virtual/virtio-spec.txt1164
1 files changed, 1087 insertions, 77 deletions
diff --git a/Documentation/virtual/virtio-spec.txt b/Documentation/virtual/virtio-spec.txt
index da094737e2f8..0d6ec85481cb 100644
--- a/Documentation/virtual/virtio-spec.txt
+++ b/Documentation/virtual/virtio-spec.txt
@@ -1,11 +1,11 @@
[Generated file: see http://ozlabs.org/~rusty/virtio-spec/]
Virtio PCI Card Specification
-v0.9.1 DRAFT
+v0.9.5 DRAFT
-
-Rusty Russell <rusty@rustcorp.com.au>IBM Corporation (Editor)
+Rusty Russell <rusty@rustcorp.com.au> IBM Corporation (Editor)
-2011 August 1.
+2012 May 7.
Purpose and Description
@@ -68,11 +68,11 @@ and consists of three parts:
+-------------------+-----------------------------------+-----------+
-When the driver wants to send buffers to the device, it puts them
-in one or more slots in the descriptor table, and writes the
-descriptor indices into the available ring. It then notifies the
-device. When the device has finished with the buffers, it writes
-the descriptors into the used ring, and sends an interrupt.
+When the driver wants to send a buffer to the device, it fills in
+a slot in the descriptor table (or chains several together), and
+writes the descriptor index into the available ring. It then
+notifies the device. When the device has finished a buffer, it
+writes the descriptor into the used ring, and sends an interrupt.
Specification
@@ -106,8 +106,14 @@ for informational purposes by the guest).
+----------------------+--------------------+---------------+
| 6 | ioMemory | - |
+----------------------+--------------------+---------------+
+| 7 | rpmsg | Appendix H |
++----------------------+--------------------+---------------+
+| 8 | SCSI host | Appendix I |
++----------------------+--------------------+---------------+
| 9 | 9P transport | - |
+----------------------+--------------------+---------------+
+| 10 | mac80211 wlan | - |
++----------------------+--------------------+---------------+
Device Configuration
@@ -127,7 +133,7 @@ Note that this is possible because while the virtio header is PCI
the native endian of the guest (where such distinction is
applicable).
- Device Initialization Sequence
+ Device Initialization Sequence<sub:Device-Initialization-Sequence>
We start with an overview of device initialization, then expand
on the details of the device and how each step is preformed.
@@ -177,7 +183,10 @@ The virtio header looks as follows:
If MSI-X is enabled for the device, two additional fields
-immediately follow this header:
+immediately follow this header:[footnote:
+ie. once you enable MSI-X on the device, the other fields move.
+If you turn it off again, they move back!
+]
+------------++----------------+--------+
@@ -191,20 +200,6 @@ immediately follow this header:
+------------++----------------+--------+
-Finally, if feature bits (VIRTIO_F_FEATURES_HI) this is
-immediately followed by two additional fields:
-
-
-+------------++----------------------+----------------------
-| Bits || 32 | 32
-+------------++----------------------+----------------------
-| Read/Write || R | R+W
-+------------++----------------------+----------------------
-| Purpose || Device | Guest
-| || Features bits 32:63 | Features bits 32:63
-+------------++----------------------+----------------------
-
-
Immediately following these general headers, there may be
device-specific headers:
@@ -238,31 +233,25 @@ at least one bit should be set:
may be a significant (or infinite) delay before setting this
bit.
- DRIVER_OK (3) Indicates that the driver is set up and ready to
+ DRIVER_OK (4) Indicates that the driver is set up and ready to
drive the device.
- FAILED (8) Indicates that something went wrong in the guest,
+ FAILED (128) Indicates that something went wrong in the guest,
and it has given up on the device. This could be an internal
error, or the driver didn't like the device for some reason, or
even a fatal error during device operation. The device must be
reset before attempting to re-initialize.
- Feature Bits
+ Feature Bits<sub:Feature-Bits>
-The least significant 31 bits of the first configuration field
-indicates the features that the device supports (the high bit is
-reserved, and will be used to indicate the presence of future
-feature bits elsewhere). If more than 31 feature bits are
-supported, the device indicates so by setting feature bit 31 (see
-[cha:Reserved-Feature-Bits]). The bits are allocated as follows:
+Thefirst configuration field indicates the features that the
+device supports. The bits are allocated as follows:
0 to 23 Feature bits for the specific device type
- 24 to 40 Feature bits reserved for extensions to the queue and
+ 24 to 32 Feature bits reserved for extensions to the queue and
feature negotiation mechanisms
- 41 to 63 Feature bits reserved for future extensions
-
For example, feature bit 0 for a network device (i.e. Subsystem
Device ID 1) indicates that the device supports checksumming of
packets.
@@ -286,10 +275,6 @@ will not see that feature bit in the Device Features field and
can go into backwards compatibility mode (or, for poor
implementations, set the FAILED Device Status bit).
-Access to feature bits 32 to 63 is enabled by Guest by setting
-feature bit 31. If this bit is unset, Device must assume that all
-feature bits > 31 are unset.
-
Configuration/Queue Vectors
When MSI-X capability is present and enabled in the device
@@ -324,7 +309,7 @@ success, the previously written value is returned, and on
failure, NO_VECTOR is returned. If a mapping failure is detected,
the driver can retry mapping with fewervectors, or disable MSI-X.
- Virtqueue Configuration
+ Virtqueue Configuration<sec:Virtqueue-Configuration>
As a device can have zero or more virtqueues for bulk data
transport (for example, the network driver has two), the driver
@@ -587,7 +572,7 @@ and Red Hat under the (3-clause) BSD license so that it can be
freely used by all other projects, and is reproduced (with slight
variation to remove Linux assumptions) in Appendix A.
- Device Operation
+ Device Operation<sec:Device-Operation>
There are two parts to device operation: supplying new buffers to
the device, and processing used buffers from the device. As an
@@ -813,7 +798,7 @@ vring.used->ring[vq->last_seen_used%vsz];
}
- Dealing With Configuration Changes
+ Dealing With Configuration Changes<sub:Dealing-With-Configuration>
Some virtio PCI devices can change the device configuration
state, as reflected in the virtio header in the PCI configuration
@@ -1260,18 +1245,6 @@ Currently there are five device-independent feature bits defined:
driver should ignore the used_event field; the device should
ignore the avail_event field; the flags field is used
- VIRTIO_F_BAD_FEATURE(30) This feature should never be
- negotiated by the guest; doing so is an indication that the
- guest is faulty[footnote:
-An experimental virtio PCI driver contained in Linux version
-2.6.25 had this problem, and this feature bit can be used to
-detect it.
-]
-
- VIRTIO_F_FEATURES_HIGH(31) This feature indicates that the
- device supports feature bits 32:63. If unset, feature bits
- 32:63 are unset.
-
Appendix C: Network Device
The virtio network device is a virtual ethernet card, and is the
@@ -1335,11 +1308,17 @@ were required.
VIRTIO_NET_F_CTRL_VLAN (19) Control channel VLAN filtering.
+ VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous
+ packets.
+
Device configuration layout Two configuration fields are
currently defined. The mac address field always exists (though
is only valid if VIRTIO_NET_F_MAC is set), and the status field
- only exists if VIRTIO_NET_F_STATUS is set. Only one bit is
- currently defined for the status field: VIRTIO_NET_S_LINK_UP. #define VIRTIO_NET_S_LINK_UP 1
+ only exists if VIRTIO_NET_F_STATUS is set. Two read-only bits
+ are currently defined for the status field:
+ VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE. #define VIRTIO_NET_S_LINK_UP 1
+
+#define VIRTIO_NET_S_ANNOUNCE 2
@@ -1377,12 +1356,19 @@ struct virtio_net_config {
packets by negotating the VIRTIO_NET_F_CSUM feature. This “
checksum offload” is a common feature on modern network cards.
- If that feature is negotiated, a driver can use TCP or UDP
- segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4
- (IPv4 TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and
- VIRTIO_NET_F_HOST_UFO (UDP fragmentation) features. It should
- not send TCP packets requiring segmentation offload which have
- the Explicit Congestion Notification bit set, unless the
+ If that feature is negotiated[footnote:
+ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
+dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload
+features must offer the checksum feature, and a driver which
+accepts the offload features must accept the checksum feature.
+Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
+depending on VIRTIO_NET_F_GUEST_CSUM.
+], a driver can use TCP or UDP segmentation offload by
+ negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4 TCP),
+ VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
+ (UDP fragmentation) features. It should not send TCP packets
+ requiring segmentation offload which have the Explicit
+ Congestion Notification bit set, unless the
VIRTIO_NET_F_HOST_ECN feature is negotiated.[footnote:
This is a common restriction in real, older network cards.
]
@@ -1403,7 +1389,7 @@ segmentation, if both guests are amenable.
Packets are transmitted by placing them in the transmitq, and
buffers for incoming packets are placed in the receiveq. In each
-case, the packet itself is preceded by a header:
+case, the packet itself is preceeded by a header:
struct virtio_net_hdr {
@@ -1462,9 +1448,10 @@ It will have a 14 byte ethernet header and 20 byte IP header
followed by the TCP header (with the TCP checksum field 16 bytes
into that header). csum_start will be 14+20 = 34 (the TCP
checksum includes the header), and csum_offset will be 16. The
-value in the TCP checksum field will be the sum of the TCP pseudo
-header, so that replacing it by the ones' complement checksum of
-the TCP header and body will give the correct result.
+value in the TCP checksum field should be initialized to the sum
+of the TCP pseudo header, so that replacing it by the ones'
+complement checksum of the TCP header and body will give the
+correct result.
]
<enu:If-the-driver>If the driver negotiated
@@ -1483,8 +1470,8 @@ Due to various bugs in implementations, this field is not useful
as a guarantee of the transport header size.
]
- gso_size is the size of the packet beyond that header (ie.
- MSS).
+ gso_size is the maximum size of each packet beyond that header
+ (ie. MSS).
If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the
VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as well,
@@ -1567,7 +1554,9 @@ Processing packet involves:
If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
negotiated, then the “gso_type” may be something other than
VIRTIO_NET_HDR_GSO_NONE, and the “gso_size” field indicates the
- desired MSS (see [enu:If-the-driver]).Control Virtqueue
+ desired MSS (see [enu:If-the-driver]).
+
+ Control Virtqueue
The driver uses the control virtqueue (if VIRTIO_NET_F_VTRL_VQ is
negotiated) to send commands to manipulate various features of
@@ -1642,7 +1631,7 @@ struct virtio_net_ctrl_mac {
The device can filter incoming packets by any number of
destination MAC addresses.[footnote:
-Since there are no guarantees, it can use a hash filter
+Since there are no guarentees, it can use a hash filter
orsilently switch to allmulti or promiscuous mode if it is given
too many addresses.
] This table is set using the class VIRTIO_NET_CTRL_MAC and the
@@ -1665,6 +1654,38 @@ can control a VLAN filter table in the device.
Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL
command take a 16-bit VLAN id as the command-specific-data.
+ Gratuitous Packet Sending
+
+If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE (depends
+on VIRTIO_NET_F_CTRL_VQ), it can ask the guest to send gratuitous
+packets; this is usually done after the guest has been physically
+migrated, and needs to announce its presence on the new network
+links. (As hypervisor does not have the knowledge of guest
+network configuration (eg. tagged vlan) it is simplest to prod
+the guest in this way).
+
+#define VIRTIO_NET_CTRL_ANNOUNCE 3
+
+ #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
+
+The Guest needs to check VIRTIO_NET_S_ANNOUNCE bit in status
+field when it notices the changes of device configuration. The
+command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
+driver has recevied the notification and device would clear the
+VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
+this command.
+
+Processing this notification involves:
+
+ Sending the gratuitous packets or marking there are pending
+ gratuitous packets to be sent and letting deferred routine to
+ send them.
+
+ Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control
+ vq.
+
+ .
+
Appendix D: Block Device
The virtio block device is a simple virtual block device (ie.
@@ -1699,8 +1720,6 @@ device except where noted.
VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
-
-
Device configuration layout The capacity of the device
(expressed in 512-byte sectors) is always present. The
availability of the others all depend on various feature bits
@@ -1743,8 +1762,6 @@ device except where noted.
If the VIRTIO_BLK_F_RO feature is set by the device, any write
requests will fail.
-
-
Device Operation
The driver queues requests to the virtqueue, and they are used by
@@ -1805,7 +1822,7 @@ the FLUSH and FLUSH_OUT types are equivalent, the device does not
distinguish between them
]). If the device has VIRTIO_BLK_F_BARRIER feature the high bit
(VIRTIO_BLK_T_BARRIER) indicates that this request acts as a
-barrier and that all preceding requests must be complete before
+barrier and that all preceeding requests must be complete before
this one, and all following requests must not be started until
this is complete. Note that a barrier does not flush caches in
the underlying backend device in host, and thus does not serve as
@@ -2118,7 +2135,7 @@ This is historical, and independent of the guest page size
Otherwise, the guest may begin to re-use pages previously given
to the balloon before the device has acknowledged their
- withdrawal. [footnote:
+ withdrawl. [footnote:
In this case, deflation advice is merely a courtesy
]
@@ -2198,3 +2215,996 @@ as follows:
VIRTIO_BALLOON_S_MEMTOT The total amount of memory available
(in bytes).
+Appendix H: Rpmsg: Remote Processor Messaging
+
+Virtio rpmsg devices represent remote processors on the system
+which run in asymmetric multi-processing (AMP) configuration, and
+which are usually used to offload cpu-intensive tasks from the
+main application processor (a typical SoC methodology).
+
+Virtio is being used to communicate with those remote processors;
+empty buffers are placed in one virtqueue for receiving messages,
+and non-empty buffers, containing outbound messages, are enqueued
+in a second virtqueue for transmission.
+
+Numerous communication channels can be multiplexed over those two
+virtqueues, so different entities, running on the application and
+remote processor, can directly communicate in a point-to-point
+fashion.
+
+ Configuration
+
+ Subsystem Device ID 7
+
+ Virtqueues 0:receiveq. 1:transmitq.
+
+ Feature bits
+
+ VIRTIO_RPMSG_F_NS (0) Device sends (and capable of receiving)
+ name service messages announcing the creation (or
+ destruction) of a channel:/**
+
+ * struct rpmsg_ns_msg - dynamic name service announcement
+message
+
+ * @name: name of remote service that is published
+
+ * @addr: address of remote service that is published
+
+ * @flags: indicates whether service is created or destroyed
+
+ *
+
+ * This message is sent across to publish a new service (or
+announce
+
+ * about its removal). When we receives these messages, an
+appropriate
+
+ * rpmsg channel (i.e device) is created/destroyed.
+
+ */
+
+struct rpmsg_ns_msgoon_config {
+
+ char name[RPMSG_NAME_SIZE];
+
+ u32 addr;
+
+ u32 flags;
+
+} __packed;
+
+
+
+/**
+
+ * enum rpmsg_ns_flags - dynamic name service announcement flags
+
+ *
+
+ * @RPMSG_NS_CREATE: a new remote service was just created
+
+ * @RPMSG_NS_DESTROY: a remote service was just destroyed
+
+ */
+
+enum rpmsg_ns_flags {
+
+ RPMSG_NS_CREATE = 0,
+
+ RPMSG_NS_DESTROY = 1,
+
+};
+
+ Device configuration layout
+
+At his point none currently defined.
+
+ Device Initialization
+
+ The initialization routine should identify the receive and
+ transmission virtqueues.
+
+ The receive virtqueue should be filled with receive buffers.
+
+ Device Operation
+
+Messages are transmitted by placing them in the transmitq, and
+buffers for inbound messages are placed in the receiveq. In any
+case, messages are always preceded by the following header: /**
+
+ * struct rpmsg_hdr - common header for all rpmsg messages
+
+ * @src: source address
+
+ * @dst: destination address
+
+ * @reserved: reserved for future use
+
+ * @len: length of payload (in bytes)
+
+ * @flags: message flags
+
+ * @data: @len bytes of message payload data
+
+ *
+
+ * Every message sent(/received) on the rpmsg bus begins with
+this header.
+
+ */
+
+struct rpmsg_hdr {
+
+ u32 src;
+
+ u32 dst;
+
+ u32 reserved;
+
+ u16 len;
+
+ u16 flags;
+
+ u8 data[0];
+
+} __packed;
+
+Appendix I: SCSI Host Device
+
+The virtio SCSI host device groups together one or more virtual
+logical units (such as disks), and allows communicating to them
+using the SCSI protocol. An instance of the device represents a
+SCSI host to which many targets and LUNs are attached.
+
+The virtio SCSI device services two kinds of requests:
+
+ command requests for a logical unit;
+
+ task management functions related to a logical unit, target or
+ command.
+
+The device is also able to send out notifications about added and
+removed logical units. Together, these capabilities provide a
+SCSI transport protocol that uses virtqueues as the transfer
+medium. In the transport protocol, the virtio driver acts as the
+initiator, while the virtio SCSI host provides one or more
+targets that receive and process the requests.
+
+ Configuration
+
+ Subsystem Device ID 8
+
+ Virtqueues 0:controlq; 1:eventq; 2..n:request queues.
+
+ Feature bits
+
+ VIRTIO_SCSI_F_INOUT (0) A single request can include both
+ read-only and write-only data buffers.
+
+ VIRTIO_SCSI_F_HOTPLUG (1) The host should enable
+ hot-plug/hot-unplug of new LUNs and targets on the SCSI bus.
+
+ Device configuration layout All fields of this configuration
+ are always available. sense_size and cdb_size are writable by
+ the guest.struct virtio_scsi_config {
+
+ u32 num_queues;
+
+ u32 seg_max;
+
+ u32 max_sectors;
+
+ u32 cmd_per_lun;
+
+ u32 event_info_size;
+
+ u32 sense_size;
+
+ u32 cdb_size;
+
+ u16 max_channel;
+
+ u16 max_target;
+
+ u32 max_lun;
+
+};
+
+ num_queues is the total number of request virtqueues exposed by
+ the device. The driver is free to use only one request queue,
+ or it can use more to achieve better performance.
+
+ seg_max is the maximum number of segments that can be in a
+ command. A bidirectional command can include seg_max input
+ segments and seg_max output segments.
+
+ max_sectors is a hint to the guest about the maximum transfer
+ size it should use.
+
+ cmd_per_lun is a hint to the guest about the maximum number of
+ linked commands it should send to one LUN. The actual value
+ to be used is the minimum of cmd_per_lun and the virtqueue
+ size.
+
+ event_info_size is the maximum size that the device will fill
+ for buffers that the driver places in the eventq. The driver
+ should always put buffers at least of this size. It is
+ written by the device depending on the set of negotated
+ features.
+
+ sense_size is the maximum size of the sense data that the
+ device will write. The default value is written by the device
+ and will always be 96, but the driver can modify it. It is
+ restored to the default when the device is reset.
+
+ cdb_size is the maximum size of the CDB that the driver will
+ write. The default value is written by the device and will
+ always be 32, but the driver can likewise modify it. It is
+ restored to the default when the device is reset.
+
+ max_channel, max_target and max_lun can be used by the driver
+ as hints to constrain scanning the logical units on the
+ host.h
+
+ Device Initialization
+
+The initialization routine should first of all discover the
+device's virtqueues.
+
+If the driver uses the eventq, it should then place at least a
+buffer in the eventq.
+
+The driver can immediately issue requests (for example, INQUIRY
+or REPORT LUNS) or task management functions (for example, I_T
+RESET).
+
+ Device Operation: request queues
+
+The driver queues requests to an arbitrary request queue, and
+they are used by the device on that same queue. It is the
+responsibility of the driver to ensure strict request ordering
+for commands placed on different queues, because they will be
+consumed with no order constraints.
+
+Requests have the following format:
+
+struct virtio_scsi_req_cmd {
+
+ // Read-only
+
+ u8 lun[8];
+
+ u64 id;
+
+ u8 task_attr;
+
+ u8 prio;
+
+ u8 crn;
+
+ char cdb[cdb_size];
+
+ char dataout[];
+
+ // Write-only part
+
+ u32 sense_len;
+
+ u32 residual;
+
+ u16 status_qualifier;
+
+ u8 status;
+
+ u8 response;
+
+ u8 sense[sense_size];
+
+ char datain[];
+
+};
+
+
+
+/* command-specific response values */
+
+#define VIRTIO_SCSI_S_OK 0
+
+#define VIRTIO_SCSI_S_OVERRUN 1
+
+#define VIRTIO_SCSI_S_ABORTED 2
+
+#define VIRTIO_SCSI_S_BAD_TARGET 3
+
+#define VIRTIO_SCSI_S_RESET 4
+
+#define VIRTIO_SCSI_S_BUSY 5
+
+#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
+
+#define VIRTIO_SCSI_S_TARGET_FAILURE 7
+
+#define VIRTIO_SCSI_S_NEXUS_FAILURE 8
+
+#define VIRTIO_SCSI_S_FAILURE 9
+
+
+
+/* task_attr */
+
+#define VIRTIO_SCSI_S_SIMPLE 0
+
+#define VIRTIO_SCSI_S_ORDERED 1
+
+#define VIRTIO_SCSI_S_HEAD 2
+
+#define VIRTIO_SCSI_S_ACA 3
+
+The lun field addresses a target and logical unit in the
+virtio-scsi device's SCSI domain. The only supported format for
+the LUN field is: first byte set to 1, second byte set to target,
+third and fourth byte representing a single level LUN structure,
+followed by four zero bytes. With this representation, a
+virtio-scsi device can serve up to 256 targets and 16384 LUNs per
+target.
+
+The id field is the command identifier (“tag”).
+
+task_attr, prio and crn should be left to zero. task_attr defines
+the task attribute as in the table above, but all task attributes
+may be mapped to SIMPLE by the device; crn may also be provided
+by clients, but is generally expected to be 0. The maximum CRN
+value defined by the protocol is 255, since CRN is stored in an
+8-bit integer.
+
+All of these fields are defined in SAM. They are always
+read-only, as are the cdb and dataout field. The cdb_size is
+taken from the configuration space.
+
+sense and subsequent fields are always write-only. The sense_len
+field indicates the number of bytes actually written to the sense
+buffer. The residual field indicates the residual size,
+calculated as “data_length - number_of_transferred_bytes”, for
+read or write operations. For bidirectional commands, the
+number_of_transferred_bytes includes both read and written bytes.
+A residual field that is less than the size of datain means that
+the dataout field was processed entirely. A residual field that
+exceeds the size of datain means that the dataout field was
+processed partially and the datain field was not processed at
+all.
+
+The status byte is written by the device to be the status code as
+defined in SAM.
+
+The response byte is written by the device to be one of the
+following:
+
+ VIRTIO_SCSI_S_OK when the request was completed and the status
+ byte is filled with a SCSI status code (not necessarily
+ "GOOD").
+
+ VIRTIO_SCSI_S_OVERRUN if the content of the CDB requires
+ transferring more data than is available in the data buffers.
+
+ VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an
+ ABORT TASK or ABORT TASK SET task management function.
+
+ VIRTIO_SCSI_S_BAD_TARGET if the request was never processed
+ because the target indicated by the lun field does not exist.
+
+ VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus
+ or device reset (including a task management function).
+
+ VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a
+ problem in the connection between the host and the target
+ (severed link).
+
+ VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a
+ failure and the guest should not retry on other paths.
+
+ VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure
+ but retrying on other paths might yield a different result.
+
+ VIRTIO_SCSI_S_BUSY if the request failed but retrying on the
+ same path should work.
+
+ VIRTIO_SCSI_S_FAILURE for other host or guest error. In
+ particular, if neither dataout nor datain is empty, and the
+ VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
+ request will be immediately returned with a response equal to
+ VIRTIO_SCSI_S_FAILURE.
+
+ Device Operation: controlq
+
+The controlq is used for other SCSI transport operations.
+Requests have the following format:
+
+struct virtio_scsi_ctrl {
+
+ u32 type;
+
+ ...
+
+ u8 response;
+
+};
+
+
+
+/* response values valid for all commands */
+
+#define VIRTIO_SCSI_S_OK 0
+
+#define VIRTIO_SCSI_S_BAD_TARGET 3
+
+#define VIRTIO_SCSI_S_BUSY 5
+
+#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
+
+#define VIRTIO_SCSI_S_TARGET_FAILURE 7
+
+#define VIRTIO_SCSI_S_NEXUS_FAILURE 8
+
+#define VIRTIO_SCSI_S_FAILURE 9
+
+#define VIRTIO_SCSI_S_INCORRECT_LUN 12
+
+The type identifies the remaining fields.
+
+The following commands are defined:
+
+ Task management function
+#define VIRTIO_SCSI_T_TMF 0
+
+
+
+#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
+
+#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
+
+#define VIRTIO_SCSI_T_TMF_CLEAR_ACA 2
+
+#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3
+
+#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
+
+#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
+
+#define VIRTIO_SCSI_T_TMF_QUERY_TASK 6
+
+#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
+
+
+
+struct virtio_scsi_ctrl_tmf
+
+{
+
+ // Read-only part
+
+ u32 type;
+
+ u32 subtype;
+
+ u8 lun[8];
+
+ u64 id;
+
+ // Write-only part
+
+ u8 response;
+
+}
+
+
+
+/* command-specific response values */
+
+#define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
+
+#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 10
+
+#define VIRTIO_SCSI_S_FUNCTION_REJECTED 11
+
+ The type is VIRTIO_SCSI_T_TMF; the subtype field defines. All
+ fields except response are filled by the driver. The subtype
+ field must always be specified and identifies the requested
+ task management function.
+
+ Other fields may be irrelevant for the requested TMF; if so,
+ they are ignored but they should still be present. The lun
+ field is in the same format specified for request queues; the
+ single level LUN is ignored when the task management function
+ addresses a whole I_T nexus. When relevant, the value of the id
+ field is matched against the id values passed on the requestq.
+
+ The outcome of the task management function is written by the
+ device in the response field. The command-specific response
+ values map 1-to-1 with those defined in SAM.
+
+ Asynchronous notification query
+#define VIRTIO_SCSI_T_AN_QUERY 1
+
+
+
+struct virtio_scsi_ctrl_an {
+
+ // Read-only part
+
+ u32 type;
+
+ u8 lun[8];
+
+ u32 event_requested;
+
+ // Write-only part
+
+ u32 event_actual;
+
+ u8 response;
+
+}
+
+
+
+#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE 2
+
+#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT 4
+
+#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST 8
+
+#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16
+
+#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST 32
+
+#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY 64
+
+ By sending this command, the driver asks the device which
+ events the given LUN can report, as described in paragraphs 6.6
+ and A.6 of the SCSI MMC specification. The driver writes the
+ events it is interested in into the event_requested; the device
+ responds by writing the events that it supports into
+ event_actual.
+
+ The type is VIRTIO_SCSI_T_AN_QUERY. The lun and event_requested
+ fields are written by the driver. The event_actual and response
+ fields are written by the device.
+
+ No command-specific values are defined for the response byte.
+
+ Asynchronous notification subscription
+#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
+
+
+
+struct virtio_scsi_ctrl_an {
+
+ // Read-only part
+
+ u32 type;
+
+ u8 lun[8];
+
+ u32 event_requested;
+
+ // Write-only part
+
+ u32 event_actual;
+
+ u8 response;
+
+}
+
+ By sending this command, the driver asks the specified LUN to
+ report events for its physical interface, again as described in
+ the SCSI MMC specification. The driver writes the events it is
+ interested in into the event_requested; the device responds by
+ writing the events that it supports into event_actual.
+
+ Event types are the same as for the asynchronous notification
+ query message.
+
+ The type is VIRTIO_SCSI_T_AN_SUBSCRIBE. The lun and
+ event_requested fields are written by the driver. The
+ event_actual and response fields are written by the device.
+
+ No command-specific values are defined for the response byte.
+
+ Device Operation: eventq
+
+The eventq is used by the device to report information on logical
+units that are attached to it. The driver should always leave a
+few buffers ready in the eventq. In general, the device will not
+queue events to cope with an empty eventq, and will end up
+dropping events if it finds no buffer ready. However, when
+reporting events for many LUNs (e.g. when a whole target
+disappears), the device can throttle events to avoid dropping
+them. For this reason, placing 10-15 buffers on the event queue
+should be enough.
+
+Buffers are placed in the eventq and filled by the device when
+interesting events occur. The buffers should be strictly
+write-only (device-filled) and the size of the buffers should be
+at least the value given in the device's configuration
+information.
+
+Buffers returned by the device on the eventq will be referred to
+as "events" in the rest of this section. Events have the
+following format:
+
+#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000
+
+
+
+struct virtio_scsi_event {
+
+ // Write-only part
+
+ u32 event;
+
+ ...
+
+}
+
+If bit 31 is set in the event field, the device failed to report
+an event due to missing buffers. In this case, the driver should
+poll the logical units for unit attention conditions, and/or do
+whatever form of bus scan is appropriate for the guest operating
+system.
+
+Other data that the device writes to the buffer depends on the
+contents of the event field. The following events are defined:
+
+ No event
+#define VIRTIO_SCSI_T_NO_EVENT 0
+
+ This event is fired in the following cases:
+
+ When the device detects in the eventq a buffer that is shorter
+ than what is indicated in the configuration field, it might
+ use it immediately and put this dummy value in the event
+ field. A well-written driver will never observe this
+ situation.
+
+ When events are dropped, the device may signal this event as
+ soon as the drivers makes a buffer available, in order to
+ request action from the driver. In this case, of course, this
+ event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED
+ flag.
+
+ Transport reset
+#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
+
+
+
+struct virtio_scsi_event_reset {
+
+ // Write-only part
+
+ u32 event;
+
+ u8 lun[8];
+
+ u32 reason;
+
+}
+
+
+
+#define VIRTIO_SCSI_EVT_RESET_HARD 0
+
+#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
+
+#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
+
+ By sending this event, the device signals that a logical unit
+ on a target has been reset, including the case of a new device
+ appearing or disappearing on the bus.The device fills in all
+ fields. The event field is set to
+ VIRTIO_SCSI_T_TRANSPORT_RESET. The lun field addresses a
+ logical unit in the SCSI host.
+
+ The reason value is one of the three #define values appearing
+ above:
+
+ VIRTIO_SCSI_EVT_RESET_REMOVED (“LUN/target removed”) is used if
+ the target or logical unit is no longer able to receive
+ commands.
+
+ VIRTIO_SCSI_EVT_RESET_HARD (“LUN hard reset”) is used if the
+ logical unit has been reset, but is still present.
+
+ VIRTIO_SCSI_EVT_RESET_RESCAN (“rescan LUN/target”) is used if a
+ target or logical unit has just appeared on the device.
+
+ The “removed” and “rescan” events, when sent for LUN 0, may
+ apply to the entire target. After receiving them the driver
+ should ask the initiator to rescan the target, in order to
+ detect the case when an entire target has appeared or
+ disappeared. These two events will never be reported unless the
+ VIRTIO_SCSI_F_HOTPLUG feature was negotiated between the host
+ and the guest.
+
+ Events will also be reported via sense codes (this obviously
+ does not apply to newly appeared buses or targets, since the
+ application has never discovered them):
+
+ “LUN/target removed” maps to sense key ILLEGAL REQUEST, asc
+ 0x25, ascq 0x00 (LOGICAL UNIT NOT SUPPORTED)
+
+ “LUN hard reset” maps to sense key UNIT ATTENTION, asc 0x29
+ (POWER ON, RESET OR BUS DEVICE RESET OCCURRED)
+
+ “rescan LUN/target” maps to sense key UNIT ATTENTION, asc 0x3f,
+ ascq 0x0e (REPORTED LUNS DATA HAS CHANGED)
+
+ The preferred way to detect transport reset is always to use
+ events, because sense codes are only seen by the driver when it
+ sends a SCSI command to the logical unit or target. However, in
+ case events are dropped, the initiator will still be able to
+ synchronize with the actual state of the controller if the
+ driver asks the initiator to rescan of the SCSI bus. During the
+ rescan, the initiator will be able to observe the above sense
+ codes, and it will process them as if it the driver had
+ received the equivalent event.
+
+ Asynchronous notification
+#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
+
+
+
+struct virtio_scsi_event_an {
+
+ // Write-only part
+
+ u32 event;
+
+ u8 lun[8];
+
+ u32 reason;
+
+}
+
+ By sending this event, the device signals that an asynchronous
+ event was fired from a physical interface.
+
+ All fields are written by the device. The event field is set to
+ VIRTIO_SCSI_T_ASYNC_NOTIFY. The lun field addresses a logical
+ unit in the SCSI host. The reason field is a subset of the
+ events that the driver has subscribed to via the "Asynchronous
+ notification subscription" command.
+
+ When dropped events are reported, the driver should poll for
+ asynchronous events manually using SCSI commands.
+
+Appendix X: virtio-mmio
+
+Virtual environments without PCI support (a common situation in
+embedded devices models) might use simple memory mapped device (“
+virtio-mmio”) instead of the PCI device.
+
+The memory mapped virtio device behaviour is based on the PCI
+device specification. Therefore most of operations like device
+initialization, queues configuration and buffer transfers are
+nearly identical. Existing differences are described in the
+following sections.
+
+ Device Initialization
+
+Instead of using the PCI IO space for virtio header, the “
+virtio-mmio” device provides a set of memory mapped control
+registers, all 32 bits wide, followed by device-specific
+configuration space. The following list presents their layout:
+
+ Offset from the device base address | Direction | Name
+ Description
+
+ 0x000 | R | MagicValue
+ “virt” string.
+
+ 0x004 | R | Version
+ Device version number. Currently must be 1.
+
+ 0x008 | R | DeviceID
+ Virtio Subsystem Device ID (ie. 1 for network card).
+
+ 0x00c | R | VendorID
+ Virtio Subsystem Vendor ID.
+
+ 0x010 | R | HostFeatures
+ Flags representing features the device supports.
+ Reading from this register returns 32 consecutive flag bits,
+ first bit depending on the last value written to
+ HostFeaturesSel register. Access to this register returns bits HostFeaturesSel*32
+
+ to (HostFeaturesSel*32)+31
+, eg. feature bits 0 to 31 if
+ HostFeaturesSel is set to 0 and features bits 32 to 63 if
+ HostFeaturesSel is set to 1. Also see [sub:Feature-Bits]
+
+ 0x014 | W | HostFeaturesSel
+ Device (Host) features word selection.
+ Writing to this register selects a set of 32 device feature bits
+ accessible by reading from HostFeatures register. Device driver
+ must write a value to the HostFeaturesSel register before
+ reading from the HostFeatures register.
+
+ 0x020 | W | GuestFeatures
+ Flags representing device features understood and activated by
+ the driver.
+ Writing to this register sets 32 consecutive flag bits, first
+ bit depending on the last value written to GuestFeaturesSel
+ register. Access to this register sets bits GuestFeaturesSel*32
+
+ to (GuestFeaturesSel*32)+31
+, eg. feature bits 0 to 31 if
+ GuestFeaturesSel is set to 0 and features bits 32 to 63 if
+ GuestFeaturesSel is set to 1. Also see [sub:Feature-Bits]
+
+ 0x024 | W | GuestFeaturesSel
+ Activated (Guest) features word selection.
+ Writing to this register selects a set of 32 activated feature
+ bits accessible by writing to the GuestFeatures register.
+ Device driver must write a value to the GuestFeaturesSel
+ register before writing to the GuestFeatures register.
+
+ 0x028 | W | GuestPageSize
+ Guest page size.
+ Device driver must write the guest page size in bytes to the
+ register during initialization, before any queues are used.
+ This value must be a power of 2 and is used by the Host to
+ calculate Guest address of the first queue page (see QueuePFN).
+
+ 0x030 | W | QueueSel
+ Virtual queue index (first queue is 0).
+ Writing to this register selects the virtual queue that the
+ following operations on QueueNum, QueueAlign and QueuePFN apply
+ to.
+
+ 0x034 | R | QueueNumMax
+ Maximum virtual queue size.
+ Reading from the register returns the maximum size of the queue
+ the Host is ready to process or zero (0x0) if the queue is not
+ available. This applies to the queue selected by writing to
+ QueueSel and is allowed only when QueuePFN is set to zero
+ (0x0), so when the queue is not actively used.
+
+ 0x038 | W | QueueNum
+ Virtual queue size.
+ Queue size is a number of elements in the queue, therefore size
+ of the descriptor table and both available and used rings.
+ Writing to this register notifies the Host what size of the
+ queue the Guest will use. This applies to the queue selected by
+ writing to QueueSel.
+
+ 0x03c | W | QueueAlign
+ Used Ring alignment in the virtual queue.
+ Writing to this register notifies the Host about alignment
+ boundary of the Used Ring in bytes. This value must be a power
+ of 2 and applies to the queue selected by writing to QueueSel.
+
+ 0x040 | RW | QueuePFN
+ Guest physical page number of the virtual queue.
+ Writing to this register notifies the host about location of the
+ virtual queue in the Guest's physical address space. This value
+ is the index number of a page starting with the queue
+ Descriptor Table. Value zero (0x0) means physical address zero
+ (0x00000000) and is illegal. When the Guest stops using the
+ queue it must write zero (0x0) to this register.
+ Reading from this register returns the currently used page
+ number of the queue, therefore a value other than zero (0x0)
+ means that the queue is in use.
+ Both read and write accesses apply to the queue selected by
+ writing to QueueSel.
+
+ 0x050 | W | QueueNotify
+ Queue notifier.
+ Writing a queue index to this register notifies the Host that
+ there are new buffers to process in the queue.
+
+ 0x60 | R | InterruptStatus
+Interrupt status.
+Reading from this register returns a bit mask of interrupts
+ asserted by the device. An interrupt is asserted if the
+ corresponding bit is set, ie. equals one (1).
+
+ Bit 0 | Used Ring Update
+This interrupt is asserted when the Host has updated the Used
+ Ring in at least one of the active virtual queues.
+
+ Bit 1 | Configuration change
+This interrupt is asserted when configuration of the device has
+ changed.
+
+ 0x064 | W | InterruptACK
+ Interrupt acknowledge.
+ Writing to this register notifies the Host that the Guest
+ finished handling interrupts. Set bits in the value clear the
+ corresponding bits of the InterruptStatus register.
+
+ 0x070 | RW | Status
+ Device status.
+ Reading from this register returns the current device status
+ flags.
+ Writing non-zero values to this register sets the status flags,
+ indicating the Guest progress. Writing zero (0x0) to this
+ register triggers a device reset.
+ Also see [sub:Device-Initialization-Sequence]
+
+ 0x100+ | RW | Config
+ Device-specific configuration space starts at an offset 0x100
+ and is accessed with byte alignment. Its meaning and size
+ depends on the device and the driver.
+
+Virtual queue size is a number of elements in the queue,
+therefore size of the descriptor table and both available and
+used rings.
+
+The endianness of the registers follows the native endianness of
+the Guest. Writing to registers described as “R” and reading from
+registers described as “W” is not permitted and can cause
+undefined behavior.
+
+The device initialization is performed as described in [sub:Device-Initialization-Sequence]
+ with one exception: the Guest must notify the Host about its
+page size, writing the size in bytes to GuestPageSize register
+before the initialization is finished.
+
+The memory mapped virtio devices generate single interrupt only,
+therefore no special configuration is required.
+
+ Virtqueue Configuration
+
+The virtual queue configuration is performed in a similar way to
+the one described in [sec:Virtqueue-Configuration] with a few
+additional operations:
+
+ Select the queue writing its index (first queue is 0) to the
+ QueueSel register.
+
+ Check if the queue is not already in use: read QueuePFN
+ register, returned value should be zero (0x0).
+
+ Read maximum queue size (number of elements) from the
+ QueueNumMax register. If the returned value is zero (0x0) the
+ queue is not available.
+
+ Allocate and zero the queue pages in contiguous virtual memory,
+ aligning the Used Ring to an optimal boundary (usually page
+ size). Size of the allocated queue may be smaller than or equal
+ to the maximum size returned by the Host.
+
+ Notify the Host about the queue size by writing the size to
+ QueueNum register.
+
+ Notify the Host about the used alignment by writing its value
+ in bytes to QueueAlign register.
+
+ Write the physical number of the first page of the queue to the
+ QueuePFN register.
+
+The queue and the device are ready to begin normal operations
+now.
+
+ Device Operation
+
+The memory mapped virtio device behaves in the same way as
+described in [sec:Device-Operation], with the following
+exceptions:
+
+ The device is notified about new buffers available in a queue
+ by writing the queue index to register QueueNum instead of the
+ virtio header in PCI I/O space ([sub:Notifying-The-Device]).
+
+ The memory mapped virtio device is using single, dedicated
+ interrupt signal, which is raised when at least one of the
+ interrupts described in the InterruptStatus register
+ description is asserted. After receiving an interrupt, the
+ driver must read the InterruptStatus register to check what
+ caused the interrupt (see the register description). After the
+ interrupt is handled, the driver must acknowledge it by writing
+ a bit mask corresponding to the serviced interrupt to the
+ InterruptACK register.
+