
Conversation

@bonzini (Member) commented Oct 26, 2021

The multithreaded QueueStateSync introduced in #76 has two problems:

  • its extremely short critical sections do not make it possible to enforce any invariant on the virtqueue, making QueueStateSync more or less useless without other locking built outside the crate
  • because it tries to share the same API between the single- and multithreaded cases, the multithreaded API's lock method also takes a &mut self; this requires unnecessary cloning of the Queue or QueueStateSync objects in order to ensure that you have a mutable reference.

The root issue here is that the locking should be done at the Queue level, because Queue already manages interior mutability of the GuestAddressSpace; the queue can easily provide you with both the current GuestMemory of the address space and a QueueState to operate on (including via an AvailIter, this time).
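
A minimal sketch of that shape, using hypothetical stand-in types (GuestMemory, QueueState, QueueSync and QueueGuard below are simplified placeholders, not the crate's actual definitions): the queue resolves its address space once per critical section and hands back a guard that couples that GuestMemory with the locked QueueState.

    use std::sync::{Mutex, MutexGuard};

    struct GuestMemory;                      // placeholder for a vm-memory GuestMemory
    struct QueueState { next_avail: u16 }    // placeholder for the real QueueState

    struct QueueGuard<'a> {
        mem: &'a GuestMemory,                // one memory view for the whole critical section
        state: MutexGuard<'a, QueueState>,
    }

    struct QueueSync {
        mem: GuestMemory,                    // stands in for GuestAddressSpace::memory()
        state: Mutex<QueueState>,
    }

    impl QueueSync {
        // lock() only needs &self, so no cloning is required to get mutable access.
        fn lock(&self) -> QueueGuard<'_> {
            QueueGuard {
                mem: &self.mem,
                state: self.state.lock().unwrap(),
            }
        }
    }

    fn main() {
        let queue = QueueSync {
            mem: GuestMemory,
            state: Mutex::new(QueueState { next_avail: 0 }),
        };
        let mut guard = queue.lock();        // a shared reference is enough
        guard.state.next_avail = 1;          // operate on the state...
        let _mem = guard.mem;                // ...against a consistent GuestMemory
    }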

On top of this, this pull request also adds tests for the new QueueSync, as @lauralt reminded us during the discussion in #76 (I might add that listening to her would probably have made the deficiencies more obvious). In order to reuse the test code, a private API is added that is similar to #76's lock(&mut self). This API is not public because it has all the defects listed above, but it is fine for tests because (unlike real-world code) they have easy access to a &mut QueueSync.

@bonzini (Member Author) commented Oct 26, 2021

I'm going to use this PR to more or less rewrite the public API of #76.

@bonzini force-pushed the qss-tests branch 2 times, most recently from 2e68088 to b87f9f1 on October 26, 2021 15:39
@bonzini (Member Author) commented Oct 26, 2021

BTW, the next decision should be whether to keep the methods in Queue, or force all users to use acquire() and go through QueueGuard. I think they should, but I am biased...

@bonzini force-pushed the qss-tests branch 2 times, most recently from 6ae196d to 65c816d on October 26, 2021 15:44
fn set_next_avail(&mut self, next_avail: u16);
}

/// Struct to maintain information and manipulate state of a virtio queue.
Member:

We have some usage scenarios that use QueueStateSync for multi-threading. The idea is that the guard object returned by QueueStateT::lock() preserves the queue object's invariants across a series of queue operations. So if we want to execute a single operation on a queue, we may call QueueStateSync::method() directly. On the other hand, if we want to execute a series of operations on a queue atomically, we get a guard object via QueueStateSync::lock() and then execute the operations against the guard object.

@bonzini (Member Author):

What operations do you need? We can add them to QueueSync if needed, but I'd prefer to keep the external API as small as possible and force you to write lock() every time.

Member:

The workload pattern may be abstracted as:

  1. a single thread waits for queue events and fetches available descriptors from the queue.
  2. the thread creates an asynchronous task for each available descriptor and dispatches the task to a multi-threaded async engine.
  3. the async task does some I/O operations to fulfill the service request.
  4. the async task eventually adds the descriptor to the used ring and notifies the guest.

On the other hand, much of our existing code makes use of the original queue interface, which is similar to QueueState and QueueStateSync. So we would appreciate keeping the flexibility for clients to choose among Queue, QueueState and QueueStateSync.
Thanks!
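
A rough sketch of the four steps above, using hypothetical stand-in types and plain threads instead of an async engine; the point is that the worker tasks only ever see a shared handle to the queue, so locking has to go through &self rather than &mut self.

    use std::sync::{Arc, Mutex};
    use std::thread;

    struct QueueState { used_idx: u16 }      // placeholder for the real QueueState

    fn main() {
        let queue = Arc::new(Mutex::new(QueueState { used_idx: 0 }));

        // Steps 1/2: one thread pops available descriptors and dispatches a task per
        // descriptor; here each descriptor is just an index.
        let workers: Vec<_> = (0u16..4)
            .map(|desc_index| {
                let q = Arc::clone(&queue);
                thread::spawn(move || {
                    // Step 3: perform the I/O for this descriptor (omitted).
                    // Step 4: add the descriptor to the used ring and notify the guest.
                    let mut state = q.lock().unwrap();
                    state.used_idx = state.used_idx.wrapping_add(1);
                    let _ = desc_index;
                })
            })
            .collect();

        for w in workers {
            w.join().unwrap();
        }
    }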

@bonzini (Member Author):

My preference is to start small, and only add new functionality with examples of how it improves real-world code without affecting performance negatively (as is often the case with overly small critical sections).

Member:

Yes, we are trying to avoid the "too small critical sections" problem via QueueStateT::lock() and QueueStateGuard. But it's true that we need to be careful when using these interfaces for I/O-intensive devices.

@bonzini (Member Author):

How do you use lock()? It takes a &mut so in reality there's nothing to lock.

Member:

Here we are taking a mutable reference to an Arc<Mutex<QueueState>> and locking the underlying QueueState object.

@bonzini (Member Author):

Since an Arc is by definition providing only shared references, how are you getting a mutable one?

Member:

Here is an example test case to use the QueueStateSync struct:

    #[test]
    fn test_queue_state_sync() {
        let mut q = QueueStateSync::new(0x1000);
        let mut q2 = q.clone();
        let q3 = q.clone();
        let barrier = Arc::new(Barrier::new(3));
        let b2 = barrier.clone();
        let b3 = barrier.clone();

        // Reader that accesses the queue through the lock() guard.
        let t1 = std::thread::spawn(move || {
            {
                let guard = q2.lock();
                assert_eq!(guard.ready(), false);
            }
            b2.wait();
            b2.wait();
            {
                let guard = q2.lock();
                assert_eq!(guard.ready(), true);
            }
        });

        // Reader that calls ready() on its QueueStateSync clone directly.
        let t2 = std::thread::spawn(move || {
            assert_eq!(q3.ready(), false);
            b3.wait();
            b3.wait();
            assert_eq!(q3.ready(), true);
        });

        // Flip the ready flag after both readers have observed `false`
        // and before they re-check it.
        barrier.wait();
        q.set_ready(true);
        barrier.wait();

        t1.join().unwrap();
        t2.join().unwrap();
    }

@bonzini (Member Author):

Thanks for the example. With the proposed API, q2 need not be mut anymore; q3 needs an explicit lock().

@bonzini changed the title from "Cover QueueStateSync in tests" to "Replace QueueStateSync with QueueSync" on Oct 26, 2021
@bonzini force-pushed the qss-tests branch 2 times, most recently from fa7bae8 to efd1d22 on October 26, 2021 16:24
@sboeuf self-requested a review on November 3, 2021 09:42
@sboeuf left a comment:

This PR is another major change in the way virtio queues are managed. At some point we have to stop breaking the API if we want projects to be able to use this crate (I know we try very hard to do so in Cloud Hypervisor).
The gain of changing the API is not entirely clear to me, and more than that, the part which binds the guest memory with the queue feels like one step too far for this interface.
I'd like to hear from other users (@alexandruag @slp) about this as well.

@bonzini (Member Author) commented Nov 4, 2021

The gain in changing the API is mostly to allow using shared references more easily, and using the same standard interior mutability pattern (lock()) for both the Queue and the GuestAddressSpace. It should improve performance by avoiding small critical sections.

As an aside, using an enum to dispatch trait methods is a pattern that I've never seen in code other than rust-vmm (similar to rust-vmm/vm-memory@855306d). I'm not experienced enough with Rust to say whether it's an antipattern, but this series also removes it.

alexandruag and others added 6 commits November 4, 2021 11:28
We are using a `: GuestAddressSpace` trait bound for `AvailIter` and
`DescriptorChain`, but the objects are actually only interested in
the fact that `M` can dereference to a `GuestMemory` implementation.

Signed-off-by: Alexandru Agache <[email protected]>
The extremely short critical sections do not make it possible to enforce
any invariant on the virtqueue, making QueueStateSync more or less
useless.

Rip it out, the mutex abstraction will be introduced at the Queue
level in the next commits.

Signed-off-by: Paolo Bonzini <[email protected]>
QueueGuard encapsulates Queue during a sequence of accesses under
the same critical section.  A QueueGuard always uses the same
GuestMemory for all operations, and can be extended to be a single
critical section in the case of multithreaded VMMs.

Signed-off-by: Paolo Bonzini <[email protected]>
This will allow reusing the same tests for QueueState and
QueueStateSync.

The trait is _not_ part of the public API, which consists exclusively
of acquire() and lock().  This is because with() is mostly useless for
the multithreaded case, where you have a &self but not a &mut self.

Signed-off-by: Paolo Bonzini <[email protected]>
Avoid constant repetition of "GuestMemoryMmap::<()>".

Signed-off-by: Paolo Bonzini <[email protected]>
Make all accesses to the underlying QueueState go through the with()
function.  The function will take care of locking when testing
QueueSync.

Signed-off-by: Paolo Bonzini <[email protected]>
Similar to Queue, QueueSync associates a GuestAddressSpace to a
QueueState.  The QueueState however is wrapped with reference
counting and with a mutex.  All access to the QueueSync object
has to go through a QueueGuard, which takes care of locking the
mutex.

Signed-off-by: Paolo Bonzini <[email protected]>
The tests simply invoke the same code twice, once with a Queue
and once with a QueueSync.

test_queue_guard/test_queue_guard_sync are different, because
one uses lock() and one uses acquire().

Suggested-by: Laura Loghin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
@jiangliu (Member) commented Nov 4, 2021

The gain in changing the API is mostly to allow using shared references more easily, and using the same standard interior mutability pattern (lock()) for both the Queue and the GuestAddressSpace.

As an aside, using an enum to dispatch trait methods is a pattern that I've never seen in code other than rust-vmm (similar to rust-vmm/vm-memory@855306d). I'm not experienced enough with Rust to say whether it's an antipattern, but this series also removes it.

Introducing an enum for the lock() operation was my fault. Your previous patch set, which removes the enum, seems really great to me, and we should go that way.

But it's really hard for us to make use of the Queue or QueueSync objects with an associated GuestAddressSpace object. We hope we can directly access QueueState and QueueStateSync, which would simplify our implementation and leave room for us to optimize ArcSwap-related performance issues.

@bonzini (Member Author) commented Nov 4, 2021

If you cannot use Queue, you can use a Mutex<QueueState> and call QueueGuard::new in your own code. QueueGuard::new creates a combination of a GuestMemory and a MutexGuard<QueueState> so it should work for you.

But one point of this series is in fact to limit the cost of ArcSwap, because now you only pay the price of ArcSwap when you already have to lock a mutex (which is more expensive than getting an ArcSwap guard).

Signed-off-by: Paolo Bonzini <[email protected]>
@andreeaflorescu (Member) commented:

This PR is another major change in the way virtio queues are managed. At some point we have to stop breaking the API if we want projects to be able to use this crate (I know we try very hard to do so in Cloud Hypervisor).

I understand your concern. We're now in the process of figuring out what we actually want to publish to crates.io as an initial interface, so for the upcoming 2-3 weeks, until we manage to get the interface clean and tidy, we're probably still going to introduce breaking changes. Once it is published it should be much easier, because you can consume the version from crates.io. Are you already using this in Cloud Hypervisor? Would it be possible to temporarily pause the updates until we get the interface into a more stable state?

@slp (Collaborator) commented Nov 9, 2021

I've been playing with this PR in the context of vhost-user-backend and virtiofsd-rs. In general, I think it makes the API more elegant, but I also think it'd be nicer if it were more flexible and didn't force the user to always go through QueueGuard.

For instance, there might be cases in which the queue is going to be solely owned by a struct that wraps it, adding more functionality, and managing both locking and atomic_mem on its own (VringMutex in vhost-user-backend is an example of such a struct).

I think one way to satisfy both a use case in which the queue is going to act as a completely independent entity and another in which it is going to be wrapped inside another object would be to make all methods of QueueState public. That way users can choose between relying on the additional protection implemented by QueueGuard or just going with the bare QueueState and doing their own thing.

That said, I also agree with @sboeuf that we should find an API that we all find acceptable and commit to it for 1.0.0, leaving breaking changes for 2.0.0 (following semver).

@jiangliu (Member) commented:

I've been playing with this PR in the context of vhost-user-backend and virtiofsd-rs. In general, I think it makes the API more elegant, but I also think it'd be nicer if it were more flexible and didn't force the user to always go through QueueGuard.

For instance, there might be cases in which the queue is going to be solely owned by a struct that wraps it, adding more functionality, and managing both locking and atomic_mem on its own (VringMutex in vhost-user-backend is an example of such a struct).

I think one way to satisfy both a use case in which the queue is going to act as a completely independent entity and another in which it is going to be wrapped inside another object would be to make all methods of QueueState public. That way users can choose between relying on the additional protection implemented by QueueGuard or just going with the bare QueueState and doing their own thing.

That said, I also agree with @sboeuf that we should find an API that we all find acceptable and commit to it for 1.0.0, leaving breaking changes for 2.0.0 (following semver).

Great to hear that you have experimented with vhost-user-backend/virtiofsd, it's on my TODO list :)
So, any chance of upstreaming the required changes?

@slp (Collaborator) commented Nov 12, 2021

Great to hear that you have experimented with vhost-user-backend/virtiofsd, it's on my TODO list :)
So, any chance of upstreaming the required changes?

Sure, I've just pushed my test branches:

Those changes are made with the assumption that all methods in QueueState have been made public, allowing Vring to take direct ownership of it.

@bonzini (Member Author) commented Nov 12, 2021

@slp, can you try passing the mem to get_queue()/get_queue_mut(), and returning the QueueGuard from there (the declarations can return a Deref<Target = QueueState> and a DerefMut<Target = QueueState>, respectively)? The nice part of the QueueGuard, in theory, would be that it ensures that all operations use the same GuestMemory consistently, without having to pass mem to many places.

I can separate the guard object into a QueueReadGuard and a QueueWriteGuard too, if it's useful.

Then we can even decide that Queue/QueueSync are unnecessary; but I'd really prefer if all users that need to access memory did go through QueueGuard's API with implicit GuestMemory.

@slp (Collaborator) commented Nov 12, 2021

@bonzini In the context of vhost_user_backend::Vring, I could, but that would mean holding a second lock, despite knowing that I'm holding the one guarding Vring, and that this one is the sole owner of the queue. In addition to being more expensive, it also introduces the unnecessary hurdle of having to coordinate both locks.

On a more general level, I'm starting to suspect the reason why we still haven't found an API for virtio-queue that satisfies everyone is that we're trying to push too much policy into it (and I'm talking in general, not just about this PR). This can be quite a problem, especially because Queue frequently plays a part in critical sections. I think that, even if we provide a safe interface to it, we should always allow users to make direct use of Queue and QueueState, in case it makes sense for their particular use case.

@bonzini (Member Author) commented Nov 12, 2021

I could, but that would mean holding a second lock

QueueGuard is flexible and it does not have to be tied to a lock guard. It can be tied to a simple &mut QueueState, in which case all it does is tie a GuestMemory and a QueueState for the lifetime of the QueueGuard.
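
As an illustration of that flexibility, here is a sketch with hypothetical stand-in types, where the guard is generic over anything that dereferences mutably to the queue state, so the same guard type can wrap either a MutexGuard or a plain &mut:

    use std::ops::DerefMut;
    use std::sync::Mutex;

    struct GuestMemory;                      // placeholder memory handle
    struct QueueState { ready: bool }        // placeholder queue state

    struct QueueGuard<'a, S: DerefMut<Target = QueueState>> {
        mem: &'a GuestMemory,                // pinned for the guard's lifetime
        state: S,
    }

    impl<'a, S: DerefMut<Target = QueueState>> QueueGuard<'a, S> {
        fn set_ready(&mut self, ready: bool) {
            self.state.ready = ready;        // every operation sees the same `mem`
        }
    }

    fn main() {
        let mem = GuestMemory;

        // Multithreaded case: the guard owns a MutexGuard<QueueState>.
        let locked = Mutex::new(QueueState { ready: false });
        let mut g = QueueGuard { mem: &mem, state: locked.lock().unwrap() };
        g.set_ready(true);
        drop(g);

        // Single-threaded case: the guard just borrows the state mutably.
        let mut plain = QueueState { ready: false };
        let mut g = QueueGuard { mem: &mem, state: &mut plain };
        g.set_ready(true);
    }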

On a more general level, I'm starting to suspect the reason why we still haven't found an API for virtio-queue that satisfies everyone is that we're trying to push too much policy into it

Right, so I'm trying to understand if QueueGuard alone provides the right boundary between not-too-much-policy and consistent access to the GuestAddressSpace.

@jiangliu (Member) commented:

Great to hear that you have experimented with vhost-user-backend/virtiofsd, it's on my TODO list :)
So, any chance of upstreaming the required changes?

Sure, I've just pushed my test branches:

Those changes are made with the assumption that all methods in QueueState have been made public, allowing Vring to take direct ownership of it.

That's what we have been expecting for a long time :)
Adding policies over QueueState helps some usage cases, but there are still usage cases where consumers need/want raw access to the QueueState object. So it would be great to provide two sets of interfaces: advanced and raw APIs. Actually, providing only the advanced APIs has blocked us from migrating to the upstream virtio-queue for a while.

@slp (Collaborator) commented Nov 17, 2021

I could, but that would mean holding a second lock

QueueGuard is flexible and it does not have to be tied to a lock guard. It can be tied to a simple &mut QueueState, in which case all it does is tie a GuestMemory and a QueueState for the lifetime of the QueueGuard.

@bonzini I see your point, and certainly the interface looks better this way. We still have one problem though, which I think is the same @jiangliu is facing. Let me illustrate it with an example (I've simplified the code for readability):

    pub struct VringState<M: GuestAddressSpace = GuestMemoryAtomic<GuestMemoryMmap>> {
        queue: Queue<M>,
        call: Option<EventFd>,
    }

    impl<M: GuestAddressSpace> VringState<M> {
        /// Notify the vhost-user master that used descriptors have been put into the used queue.
        pub fn signal_used_queue(&self) -> io::Result<()> {
            if let Some(call) = self.call.as_ref() {
                call.write(1)
            } else {
                Ok(())
            }
        }
    }

In this scenario, if you acquire a QueueGuard with VringState::queue.acquire(), you lose the ability to call VringState::signal_used_queue, as you're already borrowing from VringState. This implies having to drop QueueGuard (and thus, its GuestAddressSpace reference) to signal the queue, which has a performance penalty on the critical section of processing a virtqueue.

So far the only solution I've been able to come up with is directly having QueueState as a field of VringState, which would then allow, if desired, implementing a VringGuard providing semantics similar to QueueGuard, tying its lifetime to a GuestMemory. Perhaps there is a better solution I haven't come up with?

@bonzini (Member Author) commented Nov 17, 2021

you're already borrowing from VringState

IIUC that's because you're already borrowing mutably when building the QueueGuard:

    let queue = vring_state.get_queue_mut();
    ...
    vring_state.signal_used_queue().unwrap();

For the specific case of signal_used_queue(), which can be a method on &self, would it work to use a RefCell<QueueState> in Vring? For the multithreaded case, Mutex<QueueState> should already work because lock() is a method on &self (unlike acquire() which is on &mut self, hence the different names)
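
A sketch of that suggestion, reusing the simplified VringState from above with hypothetical stand-in types; both the queue access and signal_used_queue() now take &self, so the borrows no longer conflict, at the cost of RefCell's runtime borrow checking:

    use std::cell::RefCell;
    use std::io;

    struct QueueState { used_idx: u16 }      // placeholder for the real QueueState
    struct EventFd;
    impl EventFd {
        fn write(&self, _v: u64) -> io::Result<()> { Ok(()) }
    }

    struct VringState {
        queue: RefCell<QueueState>,
        call: Option<EventFd>,
    }

    impl VringState {
        fn process_queue(&self) -> io::Result<()> {
            // Mutable access to the queue now goes through &self...
            let mut queue = self.queue.borrow_mut();
            queue.used_idx = queue.used_idx.wrapping_add(1);
            // ...so signal_used_queue() can be called while the RefMut is still alive.
            // (Re-borrowing self.queue here would panic at runtime, which is the
            // caveat raised below.)
            self.signal_used_queue()
        }

        fn signal_used_queue(&self) -> io::Result<()> {
            if let Some(call) = self.call.as_ref() {
                call.write(1)
            } else {
                Ok(())
            }
        }
    }

    fn main() {
        let vring = VringState {
            queue: RefCell::new(QueueState { used_idx: 0 }),
            call: Some(EventFd),
        };
        vring.process_queue().unwrap();
    }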

So far the only solution I've been able to come up with is directly having QueueState as a field of VringState, which would then allow, if desired, implementing a VringGuard providing semantics similar to QueueGuard, tying its lifetime to a GuestMemory

Yes, that would be a valid solution too. Not a great one because it pushes the responsibility out of vm-virtio, but I guess it's okay.

@slp (Collaborator) commented Nov 17, 2021

For the specific case of signal_used_queue(), which can be a method on &self, would it work to use a RefCell<QueueState> in Vring? For the multithreaded case, Mutex<QueueState> should already work because lock() is a method on &self (unlike acquire() which is on &mut self, hence the different names)

Hm... I guess there might be a way to make it work with RefCell, but I'm not sure if forcing users to rely on it is a good idea. It's too easy to come up with code that compiles but fails at runtime in some code path.

Yes, that would be a valid solution too. Not a great one because it pushes the responsibility out of vm-virtio, but I guess it's okay.

We can give users both options, documenting that the preferred one is consuming Queue and/or QueueSync.

@jiangliu @alexandruag what do you think?

@bonzini (Member Author) commented Nov 17, 2021

Maybe signal_used_queue() needs to be a trait that QueueGuard implements. Using some kind of smart pointer it should be possible to create a QueueGuard instance that holds a &mut VringState...

@jiangliu (Member) commented:

For the specific case of signal_used_queue(), which can be a method on &self, would it work to use a RefCell<QueueState> in Vring? For the multithreaded case, Mutex<QueueState> should already work because lock() is a method on &self (unlike acquire() which is on &mut self, hence the different names)

Hm... I guess there might be a way to make it work with RefCell, but I'm not sure if forcing users to rely on it is a good idea. It's too easy to come up with code that compiles but fails at runtime in some code path.

Yes, that would be a valid solution too. Not a great one because it pushes the responsibility out of vm-virtio, but I guess it's okay.

We can give users both options, documenting that the preferred one is consuming Queue and/or QueueSync.

@jiangliu @alexandruag what do you think?

It would be great to provide raw access to QueueState. After all, it's needed for VM snapshot/upgrade/migration.
I feel it's ok for virtio-queue to provide a set of low-level APIs and a set of high-level APIs.

@bonzini (Member Author) commented Nov 18, 2021

I agree that it's okay to give access to QueueState, but it's also important that QueueGuard be usable.

lauralt added a commit to alexandruag/vm-virtio that referenced this pull request Nov 18, 2021
There were a couple of changes in
rust-vmm#76 that are not
yet tested. The PR that covers that functionality is in review
here: rust-vmm#111.
Until that one is merged, the coverage needs to be decreased.

Signed-off-by: Laura Loghin <[email protected]>
bonzini pushed a commit that referenced this pull request Nov 18, 2021
There were a couple of changes in
#76 that are not
yet tested. The PR that covers that functionality is in review
here: #111.
Until that one is merged, the coverage needs to be decreased.

Signed-off-by: Laura Loghin <[email protected]>
@jiangliu (Member) commented:

Hi @bonzini, we have refactored QueueGuard recently and converted it from an enum into a trait. Please refer to
https://github.com/rust-vmm/vm-virtio/blob/main/crates/virtio-queue/src/lib.rs#L80 for the latest code.

The current implementation works for Cloud Hypervisor, virtiofsd, nydus and dragonball, and there are proposed non-breaking enhancements for Firecracker. So is it feasible to publish v0.1 soon and postpone this PR to v0.2?

@bonzini (Member Author) commented Dec 21, 2021

Yes, I agree this is the best that can be done without breaking the API.

@bonzini closed this on Dec 21, 2021