-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Add critsec to rp2040 xfer, check endpoint status #2474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ae077b0
to
bbf9ea0
Compare
bbf9ea0
to
f2f0d25
Compare
f2f0d25
to
5ee7ae9
Compare
This did not work so well for me as already mentioned there: #1764 (comment) |
5ee7ae9
to
dcf472c
Compare
A lot has change in my project since I first came across this issue and now somehow even with the workarounds I had built-in I found a use case where I would always get So I eventually decided to checkout your branch gemarcano:rp2040_irq_lock_fix and sure enough it worked 🥳 Then I decided to remove the workaround I had in place to take care of the worst cases. It's basically filtering out some HID reports causing multiple USB remounts. Without it still did not panic but the USB connection seems to be broken somehow, not responding. I'll have to leave that workaround for now but it looks like I'll keep using your patch for the time being. |
As I was trying to rebase that on the tip of master I realised the tip of master works just fine too. Same symptom as described above. So I'm afraid I can't tell if that patch helps in any way after all. What's sure is that it works just as well as the tip of master apparently. I could rebase it with no conflict too. |
Tip of master still gave me that Panic after all, so I've started trying that patch again. Sure enough, I have not seen that Panic since. A couple of crashes and reboots maybe but possibly unrelated. |
5441fb3
to
2318575
Compare
2318575
to
1eb62ba
Compare
1eb62ba
to
f5b2459
Compare
There were some recent changes in the rp2040 code adding ISO endpoints, so I had to tweak this PR a little bit to carry over the null-pointer dereference checks. The main change is that Thinking about it more, I should probably split the locking implementation from the null-pointer deref checks/fixes. Thoughts? |
f5b2459
to
ca0d53e
Compare
@gemarcano The last commit ca0d53e Is working well for me. |
- Implemented a critical section for the rp2040 port, which now prevents an IRQ from clearing the endpoint data structures while a transfer was in progress, which would then lead to a null pointer derefence. - Fixed a null-pointer dereference regarding ep->endpoint_control for endpoint 0, which does not have a control register.
ca0d53e
to
056520e
Compare
I have been working on building CircuitPython for the Olimex RP2350pc and have modified the CircuitPython core to allow an rp2350 build to use the native usb pins for usb host instead of the normal CircuitPython device usage. I believe this is the first board attempting to use tinyUSB/RP2040 host with CircuitPython (non-PIO). I seem to be running into the issue this PR is attempting to address as my initial build immediately would panic in rp2040_usb.c with "ep xx was already available". When I applied the changes from this PR the panic moved to hcd_rp2040.c with "Seq Error: [0] = 0xnnnn [1] = 0xnnnn". Which makes me think that there is still an unguarded transfer being initiated somewhere. I don't know if it's any help but I also discovered that simply commenting out the original panic statement in rp2040_usb.c ( I was also able to largely mitigate the original panic by introducing "wait" loops in the CircuitPython core which effectively turned the usb (mouse) reads blocking. CircuitPython was using a short timeout period with the read transactions to provide a non-blocking read. This results in lots of transactions being started and then aborted. Adding wait loops to the core avoids the panic, however for reasons I'm not entirely sure (but I suspect it has to do with a callback routine not be used for transfer aborts) the wait ends up blocking any other USB transfers effectively disabling USB keyboard input when mouse reads are being performed. I'll continue to poke around and see if I can identify any unguarded transfers but the USB protocol is all a little out of my depth. I do suspect/hope that this PR is on the right path though. Edit: After 2-3 hours of exercising (playing minesweeper) the firmware that had the panic statement commented out finally crashed into safe mode, as I probably should have expected..... |
Describe the PR
hw_endpoint_lock_update
is unimplemented for the rp2040 port. This PR takes a stab at implementing it. Without it, if the USB port is disconnected, say withtud_disconnect
, a race condition can happen where a transfer is in progress and the USB IRQ can fire, callingreset_non_control_endpoints
and leaving the ongoing transfer in a bad state, as it tries to access endpoint data structures that are now zero'd.While debugging that main issue, I also found some cases of the rp2040 port assuming that an endpoint data structure exists for all endpoints, which is not the case as endpoint 0 has none. I've also included a few simple fixes for these.
Additional context
I've documented a lot of my debugging attempts in this discussion: #1764
I'm not super familiar with the tinyusb coding style, and I'm not sure my lock implementation is right (very naive, likely bad reference counting, might need atomic_int instead of int?).