
Conversation

@bonzini (Member) commented Oct 26, 2021

The multithreaded QueueStateSync introduced in #76 has two problems:

  • its extremely short critical sections do not make it possible to enforce any invariant on the virtqueue, making QueueStateSync more or less useless without other locking built outside the crate
  • because it tries to share the same API between the single- and multithreaded cases, the multithreaded API's lock method also takes a &mut self; this requires unnecessary cloning of the Queue or QueueStateSync objects in order to ensure that you have a mutable reference.

The root issue here is that the locking should be done at the Queue level, because Queue already manages interior mutability of the GuestAddressSpace; the queue can easily provide you with both the current GuestMemory of the address space and a QueueState to operate on (including via an AvailIter, this time).
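
A minimal sketch of that shape, using hypothetical stand-in types (GuestMemory, QueueState, QueueSync and QueueGuard below are simplified placeholders, not the crate's actual definitions): the queue resolves its address space once per critical section and hands back a guard that couples that GuestMemory with the locked QueueState.

    use std::sync::{Mutex, MutexGuard};

    struct GuestMemory;                      // placeholder for a vm-memory GuestMemory
    struct QueueState { next_avail: u16 }    // placeholder for the real QueueState

    struct QueueGuard<'a> {
        mem: &'a GuestMemory,                // one memory view for the whole critical section
        state: MutexGuard<'a, QueueState>,
    }

    struct QueueSync {
        mem: GuestMemory,                    // stands in for GuestAddressSpace::memory()
        state: Mutex<QueueState>,
    }

    impl QueueSync {
        // lock() only needs &self, so no cloning is required to get mutable access.
        fn lock(&self) -> QueueGuard<'_> {
            QueueGuard {
                mem: &self.mem,
                state: self.state.lock().unwrap(),
            }
        }
    }

    fn main() {
        let queue = QueueSync {
            mem: GuestMemory,
            state: Mutex::new(QueueState { next_avail: 0 }),
        };
        let mut guard = queue.lock();        // a shared reference is enough
        guard.state.next_avail = 1;          // operate on the state...
        let _mem = guard.mem;                // ...against a consistent GuestMemory
    }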

On top of this, this pull request also adds tests for the new QueueSync, as @lauralt reminded us during the discussion in #76 (I might add that listening to her would probably have made the deficiencies more obvious). In order to reuse the test code, a private API is added that is similar to #76's lock(&mut self). This API is not public because it has all the defects listed above, but it is fine for tests because (unlike real-world code) they have easy access to a &mut QueueSync.

@bonzini (Member Author) commented Oct 26, 2021

I'm going to use this PR to more or less rewrite the public API of #76.

@bonzini force-pushed the qss-tests branch 2 times, most recently from 2e68088 to b87f9f1 on October 26, 2021 15:39
@bonzini (Member Author) commented Oct 26, 2021

BTW, the next decision should be whether to keep the methods in Queue, or force all users to use acquire() and go through QueueGuard. I think they should, but I am biased...

@bonzini force-pushed the qss-tests branch 2 times, most recently from 6ae196d to 65c816d on October 26, 2021 15:44
fn set_next_avail(&mut self, next_avail: u16);
}

/// Struct to maintain information and manipulate state of a virtio queue.
Member:

We have some usage scenarios that use QueueStateSync for multi-threading. The idea is that the guard object returned by QueueStateT::lock() preserves the queue object's invariants across a series of queue operations. So if we want to execute a single operation on a queue, we may call QueueStateSync::method() directly. On the other hand, if we want to execute a series of operations on a queue atomically, we get a guard object via QueueStateSync::lock() and then execute the operations against the guard object.

@bonzini (Member Author):

What operations do you need? We can add them to QueueSync if needed, but I'd prefer to keep the external API as small as possible and force you to write lock() every time.

Member:

The workload pattern may be abstracted as:

  1. a single thread waits for queue events and fetches available descriptors from the queue.
  2. the thread creates an asynchronous task for each available descriptor and dispatches the task to a multi-threaded async engine.
  3. the async task does some I/O operations to fulfill the service request.
  4. the async task eventually adds the descriptor to the used ring and notifies the guest.

On the other hand, much of our existing code makes use of the original queue interface, which is similar to QueueState and QueueStateSync. So we would appreciate keeping the flexibility for clients to choose among Queue, QueueState and QueueStateSync.
Thanks!
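
A rough sketch of the four steps above, using hypothetical stand-in types and plain threads instead of an async engine; the point is that the worker tasks only ever see a shared handle to the queue, so locking has to go through &self rather than &mut self.

    use std::sync::{Arc, Mutex};
    use std::thread;

    struct QueueState { used_idx: u16 }      // placeholder for the real QueueState

    fn main() {
        let queue = Arc::new(Mutex::new(QueueState { used_idx: 0 }));

        // Steps 1/2: one thread pops available descriptors and dispatches a task per
        // descriptor; here each descriptor is just an index.
        let workers: Vec<_> = (0u16..4)
            .map(|desc_index| {
                let q = Arc::clone(&queue);
                thread::spawn(move || {
                    // Step 3: perform the I/O for this descriptor (omitted).
                    // Step 4: add the descriptor to the used ring and notify the guest.
                    let mut state = q.lock().unwrap();
                    state.used_idx = state.used_idx.wrapping_add(1);
                    let _ = desc_index;
                })
            })
            .collect();

        for w in workers {
            w.join().unwrap();
        }
    }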

@bonzini (Member Author):

My preference is to start small, and only add new functionality with examples of how it improves real-world code without affecting performance negatively (as is often the case with overly small critical sections).

Member:

Yes, we are trying to avoid the "too small critical sections" problem via QueueStateT::lock() and QueueStateGuard. But it's true that we need to be careful when using these interfaces for I/O-intensive devices.

@bonzini (Member Author):

How do you use lock()? It takes a &mut so in reality there's nothing to lock.

Member:

Here we are taking a mutable reference to an Arc<Mutex<QueueState>> and locking the underlying QueueState object.

@bonzini (Member Author):

Since an Arc is by definition providing only shared references, how are you getting a mutable one?

Member:

Here is an example test case to use the QueueStateSync struct:

    #[test]
    fn test_queue_state_sync() {
        let mut q = QueueStateSync::new(0x1000);
        let mut q2 = q.clone();
        let q3 = q.clone();
        let barrier = Arc::new(Barrier::new(3));
        let b2 = barrier.clone();
        let b3 = barrier.clone();

        // Reader that accesses the queue through the lock() guard.
        let t1 = std::thread::spawn(move || {
            {
                let guard = q2.lock();
                assert_eq!(guard.ready(), false);
            }
            b2.wait();
            b2.wait();
            {
                let guard = q2.lock();
                assert_eq!(guard.ready(), true);
            }
        });

        // Reader that calls ready() on its QueueStateSync clone directly.
        let t2 = std::thread::spawn(move || {
            assert_eq!(q3.ready(), false);
            b3.wait();
            b3.wait();
            assert_eq!(q3.ready(), true);
        });

        // Flip the ready flag after both readers have observed `false`
        // and before they re-check it.
        barrier.wait();
        q.set_ready(true);
        barrier.wait();

        t1.join().unwrap();
        t2.join().unwrap();
    }

@bonzini (Member Author):

Thanks for the example. With the proposed API, q2 need not be mut anymore; q3 needs an explicit lock().

@bonzini changed the title from "Cover QueueStateSync in tests" to "Replace QueueStateSync with QueueSync" on Oct 26, 2021
@bonzini force-pushed the qss-tests branch 2 times, most recently from fa7bae8 to efd1d22 on October 26, 2021 16:24
@sboeuf self-requested a review on November 3, 2021 09:42
@sboeuf left a comment:

This PR is another major change in the way virtio queues are managed. At some point we have to stop breaking the API if we want projects to be able to use this crate (I know we try very hard to do so in Cloud Hypervisor).
The gain of changing the API is not entirely clear to me, and more than that, the part which binds the guest memory with the queue feels like one step too far for this interface.
I'd like to hear from other users (@alexandruag @slp) about this as well.

@bonzini (Member Author) commented Nov 4, 2021

The gain in changing the API is mostly to allow using shared references more easily, and using the same standard interior mutability pattern (lock()) for both the Queue and the GuestAddressSpace. It should improve performance by avoiding small critical sections.

As an aside, using an enum to dispatch trait methods is a pattern that I've never seen in code other than rust-vmm (similar to rust-vmm/vm-memory@855306d). I'm not experienced enough with Rust to say whether it's an antipattern, but this series also removes it.

alexandruag and others added 6 commits November 4, 2021 11:28
We are using a `: GuestAddressSpace` trait bound for `AvailIter` and
`DescriptorChain`, but the objects are actually only interested in
the fact that `M` can dereference to a `GuestMemory` implementation.

Signed-off-by: Alexandru Agache <[email protected]>
The extremely short critical sections do not make it possible to enforce
any invariant on the virtqueue, making QueueStateSync more or less
useless.

Rip it out, the mutex abstraction will be introduced at the Queue
level in the next commits.

Signed-off-by: Paolo Bonzini <[email protected]>
QueueGuard encapsulates Queue during a sequence of accesses under
the same critical section.  A QueueGuard always uses the same
GuestMemory for all operations, and can be extended to be a single
critical section in the case of multithreaded VMMs.

Signed-off-by: Paolo Bonzini <[email protected]>
This will allow reusing the same tests for QueueState and
QueueStateSync.

The trait is _not_ part of the public API, which consists exclusively
of acquire() and lock().  This is because with() is mostly useless for
the multithreaded case, where you have a &self but not a &mut self.

Signed-off-by: Paolo Bonzini <[email protected]>
Avoid constant repetition of "GuestMemoryMmap::<()>".

Signed-off-by: Paolo Bonzini <[email protected]>
Make all accesses to the underlying QueueState go through the with()
function.  The function will take care of locking when testing
QueueSync.

Signed-off-by: Paolo Bonzini <[email protected]>
Similar to Queue, QueueSync associates a GuestAddressSpace to a
QueueState.  The QueueState however is wrapped with reference
counting and with a mutex.  All access to the QueueSync object
has to go through a QueueGuard, which takes care of locking the
mutex.

Signed-off-by: Paolo Bonzini <[email protected]>
The tests simply invoke the same code twice, once with a Queue
and once with a QueueSync.

test_queue_guard/test_queue_guard_sync are different, because
one uses lock() and one uses acquire().

Suggested-by: Laura Loghin <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
@jiangliu (Member) commented Nov 4, 2021

The gain in changing the API is mostly to allow using shared references more easily, and using the same standard interior mutability pattern (lock()) for both the Queue and the GuestAddressSpace.

As an aside, using an enum to dispatch trait methods is a pattern that I've never seen in code other than rust-vmm (similar to rust-vmm/vm-memory@855306d). I'm not experienced enough with Rust to say whether it's an antipattern, but this series also removes it.

Introducing an enum for the lock() operation was my fault. Your previous patch set, which removes the enum, seems really great to me, and we should go that way.

But it's really hard for us to make use of the Queue or QueueSync objects with an associated GuestAddressSpace object. We hope we can directly access QueueState and QueueStateSync, which would simplify our implementation and leave room for us to optimize ArcSwap-related performance issues.

@bonzini (Member Author) commented Nov 4, 2021

If you cannot use Queue, you can use a Mutex<QueueState> and call QueueGuard::new in your own code. QueueGuard::new creates a combination of a GuestMemory and a MutexGuard<QueueState> so it should work for you.

But one point of this series is in fact to limit the cost of ArcSwap, because now you only pay the price of ArcSwap when you already have to lock a mutex (which is more expensive than getting an ArcSwap guard).

Signed-off-by: Paolo Bonzini <[email protected]>
@andreeaflorescu (Member) commented:

This PR is another major change in the way virtio queues are managed. At some point we have to stop breaking the API if we want projects to be able to use this crate (I know we try very hard to do so in Cloud Hypervisor).

I understand your concern. We're now in the process of figuring out what we actually want to publish to crates.io as an initial interface, so for the upcoming 2-3 weeks, until we manage to get the interface clean and tidy, we're probably still going to introduce breaking changes. Once it is published it should be much easier, because you can consume the version from crates.io. Are you already using this in Cloud Hypervisor? Would it be possible to temporarily pause the updates until we get the interface into a more stable state?

@slp (Collaborator) commented Nov 9, 2021

I've been playing with this PR in the context of vhost-user-backend and virtiofsd-rs. In general, I think it makes the API more elegant, but I also think it'd be nicer if it were more flexible and didn't force the user to always go through QueueGuard.

For instance, there might be cases in which the queue is going to be solely owned by a struct that wraps it, adding more functionality, and managing both locking and atomic_mem on its own (VringMutex in vhost-user-backend is an example of such a struct).

I think one way to satisfy both a use case in which the queue is going to act as a completely independent entity and another in which it is going to be wrapped inside another object would be to make all methods of QueueState public. That way users can choose between relying on the additional protection implemented by QueueGuard or just going with the bare QueueState and doing their own thing.

That said, I also agree with @sboeuf that we should find an API that we all find acceptable and commit to it for 1.0.0, leaving breaking changes for 2.0.0 (following semver).

@jiangliu (Member) commented:

I've been playing with this PR in the context of vhost-user-backend and virtiofsd-rs. In general, I think it makes the API more elegant, but I also think it'd be nicer if it were more flexible and didn't force the user to always go through QueueGuard.

For instance, there might be cases in which the queue is going to be solely owned by a struct that wraps it, adding more functionality, and managing both locking and atomic_mem on its own (VringMutex in vhost-user-backend is an example of such a struct).

I think one way to satisfy both a use case in which the queue is going to act as a completely independent entity and another in which it is going to be wrapped inside another object would be to make all methods of QueueState public. That way users can choose between relying on the additional protection implemented by QueueGuard or just going with the bare QueueState and doing their own thing.

That said, I also agree with @sboeuf that we should find an API that we all find acceptable and commit to it for 1.0.0, leaving breaking changes for 2.0.0 (following semver).

Great to hear that you have experimented with vhost-user-backend/virtiofsd, it's on my TODO list :)
So, any chance of upstreaming the required changes?

@slp (Collaborator) commented Nov 12, 2021

Great to hear that you have experimented with vhost-user-backend/virtiofsd, it's on my TODO list :)
So, any chance of upstreaming the required changes?

Sure, I've just pushed my test branches:

Those changes are made with the assumption that all methods in QueueState have been made public, allowing Vring to take direct ownership of it.

@bonzini (Member Author) commented Nov 12, 2021

@slp, can you try passing the mem to get_queue()/get_queue_mut(), and returning the QueueGuard from there (the declarations can return a Deref<Target = QueueState> and a DerefMut<Target = QueueState>, respectively)? The nice part of the QueueGuard, in theory, would be that it ensures that all operations use the same GuestMemory consistently, without having to pass mem to many places.

I can separate the guard object into a QueueReadGuard and a QueueWriteGuard too, if it's useful.

Then we can even decide that Queue/QueueSync are unnecessary; but I'd really prefer if all users that need to access memory did go through QueueGuard's API with implicit GuestMemory.

@slp (Collaborator) commented Nov 12, 2021

@bonzini In the context of vhost_user_backend::Vring, I could, but that would mean holding a second lock, despite knowing that I'm holding the one guarding Vring, and that this one is the sole owner of the queue. In addition to being more expensive, it also introduces the unnecessary hurdle of having to coordinate both locks.

On a more general level, I'm starting to suspect the reason why we still haven't found an API for virtio-queue that satisfies everyone is that we're trying to push too much policy into it (and I'm talking in general, not just about this PR). This can be quite a problem, especially because Queue frequently plays a part in critical sections. I think that, even if we provide a safe interface to it, we should always allow users to make direct use of Queue and QueueState, in case it makes sense for their particular use case.

@bonzini (Member Author) commented Nov 12, 2021

I could, but that would mean holding a second lock

QueueGuard is flexible and it does not have to be tied to a lock guard. It can be tied to a simple &mut QueueState, in which case all it does is tie a GuestMemory and a QueueState for the lifetime of the QueueGuard.
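
As an illustration of that flexibility, here is a sketch with hypothetical stand-in types, where the guard is generic over anything that dereferences mutably to the queue state, so the same guard type can wrap either a MutexGuard or a plain &mut:

    use std::ops::DerefMut;
    use std::sync::Mutex;

    struct GuestMemory;                      // placeholder memory handle
    struct QueueState { ready: bool }        // placeholder queue state

    struct QueueGuard<'a, S: DerefMut<Target = QueueState>> {
        mem: &'a GuestMemory,                // pinned for the guard's lifetime
        state: S,
    }

    impl<'a, S: DerefMut<Target = QueueState>> QueueGuard<'a, S> {
        fn set_ready(&mut self, ready: bool) {
            self.state.ready = ready;        // every operation sees the same `mem`
        }
    }

    fn main() {
        let mem = GuestMemory;

        // Multithreaded case: the guard owns a MutexGuard<QueueState>.
        let locked = Mutex::new(QueueState { ready: false });
        let mut g = QueueGuard { mem: &mem, state: locked.lock().unwrap() };
        g.set_ready(true);
        drop(g);

        // Single-threaded case: the guard just borrows the state mutably.
        let mut plain = QueueState { ready: false };
        let mut g = QueueGuard { mem: &mem, state: &mut plain };
        g.set_ready(true);
    }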

On a more general level, I'm starting to suspect the reason why we still haven't found an API for virtio-queue that satisfies everyone is that we're trying to push too much policy into it

Right, so I'm trying to understand if QueueGuard alone provides the right boundary between not-too-much-policy and consistent access to the GuestAddressSpace.

@jiangliu (Member) commented:

Great to hear that you have experimented with vhost-user-backend/virtiofsd, it's on my TODO list :)
So, any chance of upstreaming the required changes?

Sure, I've just pushed my test branches:

Those changes are made with the assumption that all methods in QueueState have been made public, allowing Vring to take direct ownership of it.

That's what we have been expecting for a long time :)
Adding policies over QueueState helps some usage cases, but there are still usage cases where consumers need/want raw access to the QueueState object. So it would be great to provide two sets of interfaces: advanced and raw APIs. Actually, providing only the advanced APIs has blocked us from migrating to the upstream virtio-queue for a while.

@slp (Collaborator) commented Nov 17, 2021

I could, but that would mean holding a second lock

QueueGuard is flexible and it does not have to be tied to a lock guard. It can be tied to a simple &mut QueueState, in which case all it does is tie a GuestMemory and a QueueState for the lifetime of the QueueGuard.

@bonzini I see your point, and certainly the interface looks better this way. We still have one problem though, which I think is the same @jiangliu is facing. Let me illustrate it with an example (I've simplified the code for readability):

    pub struct VringState<M: GuestAddressSpace = GuestMemoryAtomic<GuestMemoryMmap>> {
        queue: Queue<M>,
        call: Option<EventFd>,
    }

    impl<M: GuestAddressSpace> VringState<M> {
        /// Notify the vhost-user master that used descriptors have been put into the used queue.
        pub fn signal_used_queue(&self) -> io::Result<()> {
            if let Some(call) = self.call.as_ref() {
                call.write(1)
            } else {
                Ok(())
            }
        }
    }

In this scenario, if you acquire a QueueGuard with VringState::queue.acquire(), you lose the ability to call VringState::signal_used_queue, as you're already borrowing from VringState. This implies having to drop QueueGuard (and thus, its GuestAddressSpace reference) to signal the queue, which has a performance penalty on the critical section of processing a virtqueue.

So far the only solution I've been able to come up with is directly having QueueState as a field of VringState, which would then allow, if desired, implementing a VringGuard providing semantics similar to QueueGuard, tying its lifetime to a GuestMemory. Perhaps there is a better solution I haven't come up with?

@bonzini (Member Author) commented Nov 17, 2021

you're already borrowing from VringState

IIUC that's because you're already borrowing mutably when building the QueueGuard:

    let queue = vring_state.get_queue_mut();
    ...
    vring_state.signal_used_queue().unwrap();

For the specific case of signal_used_queue(), which can be a method on &self, would it work to use a RefCell<QueueState> in Vring? For the multithreaded case, Mutex<QueueState> should already work because lock() is a method on &self (unlike acquire() which is on &mut self, hence the different names)
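
A sketch of that suggestion, reusing the simplified VringState from above with hypothetical stand-in types; both the queue access and signal_used_queue() now take &self, so the borrows no longer conflict, at the cost of RefCell's runtime borrow checking:

    use std::cell::RefCell;
    use std::io;

    struct QueueState { used_idx: u16 }      // placeholder for the real QueueState
    struct EventFd;
    impl EventFd {
        fn write(&self, _v: u64) -> io::Result<()> { Ok(()) }
    }

    struct VringState {
        queue: RefCell<QueueState>,
        call: Option<EventFd>,
    }

    impl VringState {
        fn process_queue(&self) -> io::Result<()> {
            // Mutable access to the queue now goes through &self...
            let mut queue = self.queue.borrow_mut();
            queue.used_idx = queue.used_idx.wrapping_add(1);
            // ...so signal_used_queue() can be called while the RefMut is still alive.
            // (Re-borrowing self.queue here would panic at runtime, which is the
            // caveat raised below.)
            self.signal_used_queue()
        }

        fn signal_used_queue(&self) -> io::Result<()> {
            if let Some(call) = self.call.as_ref() {
                call.write(1)
            } else {
                Ok(())
            }
        }
    }

    fn main() {
        let vring = VringState {
            queue: RefCell::new(QueueState { used_idx: 0 }),
            call: Some(EventFd),
        };
        vring.process_queue().unwrap();
    }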

So far the only solution I've been able to come up with is directly having QueueState as a field of VringState, which would then allow, if desired, implementing a VringGuard providing semantics similar to QueueGuard, tying its lifetime to a GuestMemory

Yes, that would be a valid solution too. Not a great one because it pushes the responsibility out of vm-virtio, but I guess it's okay.

@slp (Collaborator) commented Nov 17, 2021

For the specific case of signal_used_queue(), which can be a method on &self, would it work to use a RefCell<QueueState> in Vring? For the multithreaded case, Mutex<QueueState> should already work because lock() is a method on &self (unlike acquire() which is on &mut self, hence the different names)

Hm... I guess there might be a way to make it work with RefCell, but I'm not sure if forcing users to rely on it is a good idea. It's too easy to come up with code that compiles but fails at runtime in some code path.

Yes, that would be a valid solution too. Not a great one because it pushes the responsibility out of vm-virtio, but I guess it's okay.

We can give users both options, documenting that the preferred one is consuming Queue and/or QueueSync.

@jiangliu @alexandruag what do you think?

@bonzini (Member Author) commented Nov 17, 2021

Maybe signal_used_queue() needs to be a trait that QueueGuard implements. Using some kind of smart pointer it should be possible to create a QueueGuard instance that holds a &mut VringState...

@jiangliu (Member) commented:

For the specific case of signal_used_queue(), which can be a method on &self, would it work to use a RefCell<QueueState> in Vring? For the multithreaded case, Mutex<QueueState> should already work because lock() is a method on &self (unlike acquire() which is on &mut self, hence the different names)

Hm... I guess there might be a way to make it work with RefCell, but I'm not sure if forcing users to rely on it is a good idea. It's too easy to come up with code that compiles but fails at runtime in some code path.

Yes, that would be a valid solution too. Not a great one because it pushes the responsibility out of vm-virtio, but I guess it's okay.

We can give users both options, documenting that the preferred one is consuming Queue and/or QueueSync.

@jiangliu @alexandruag what do you think?

It would be great to provide raw access to QueueState. After all, it's needed for VM snapshot/upgrade/migration.
I feel it's ok for virtio-queue to provide a set of low-level APIs and a set of high-level APIs.

@bonzini (Member Author) commented Nov 18, 2021

I agree that it's okay to give access to QueueState, but it's also important that QueueGuard be usable.

lauralt added a commit to alexandruag/vm-virtio that referenced this pull request Nov 18, 2021
There were a couple of changes in
rust-vmm#76 that are not
yet tested. The PR that covers that functionality is in review
here: rust-vmm#111.
Until that one is merged, the coverage needs to be decreased.

Signed-off-by: Laura Loghin <[email protected]>
bonzini pushed a commit that referenced this pull request Nov 18, 2021
There were a couple of changes in
#76 that are not
yet tested. The PR that covers that functionality is in review
here: #111.
Until that one is merged, the coverage needs to be decreased.

Signed-off-by: Laura Loghin <[email protected]>
@jiangliu (Member) commented:

Hi @bonzini, we have refactored QueueGuard recently and converted it from an enum into a trait. Please refer to
https://github.com/rust-vmm/vm-virtio/blob/main/crates/virtio-queue/src/lib.rs#L80 for the latest code.

The current implementation works for Cloud Hypervisor, virtiofsd, nydus and dragonball, and there are proposed non-breaking enhancements for Firecracker. So is it feasible to publish v0.1 soon and postpone this PR to v0.2?

@bonzini (Member Author) commented Dec 21, 2021

Yes, I agree this is the best that can be done without breaking the API.

@bonzini closed this on Dec 21, 2021