Skip to content

Conversation

gemarcano
Copy link

Describe the PR
hw_endpoint_lock_update is unimplemented for the rp2040 port. This PR takes a stab at implementing it. Without it, if the USB port is disconnected, say with tud_disconnect, a race condition can happen where a transfer is in progress and the USB IRQ can fire, calling reset_non_control_endpoints and leaving the ongoing transfer in a bad state, as it tries to access endpoint data structures that are now zero'd.

While debugging that main issue, I also found some cases of the rp2040 port assuming that an endpoint data structure exists for all endpoints, which is not the case as endpoint 0 has none. I've also included a few simple fixes for these.

Additional context
I've documented a lot of my debugging attempts in this discussion: #1764

I'm not super familiar with the tinyusb coding style, and I'm not sure my lock implementation is right (very naive, likely bad reference counting, might need atomic_int instead of int?).

@Slion
Copy link
Contributor

Slion commented Jul 12, 2024

This did not work so well for me as already mentioned there: #1764 (comment)
Still, very interesting stuff, thanks for sharing.

@gemarcano gemarcano force-pushed the rp2040_irq_lock_fix branch from 5ee7ae9 to dcf472c Compare July 16, 2024 17:49
@Slion
Copy link
Contributor

Slion commented Oct 7, 2024

A lot has change in my project since I first came across this issue and now somehow even with the workarounds I had built-in I found a use case where I would always get PANIC ep 0 out was already available

So I eventually decided to checkout your branch gemarcano:rp2040_irq_lock_fix and sure enough it worked 🥳

Then I decided to remove the workaround I had in place to take care of the worst cases. It's basically filtering out some HID reports causing multiple USB remounts. Without it still did not panic but the USB connection seems to be broken somehow, not responding. I'll have to leave that workaround for now but it looks like I'll keep using your patch for the time being.

@Slion
Copy link
Contributor

Slion commented Oct 7, 2024

As I was trying to rebase that on the tip of master I realised the tip of master works just fine too. Same symptom as described above. So I'm afraid I can't tell if that patch helps in any way after all. What's sure is that it works just as well as the tip of master apparently. I could rebase it with no conflict too.

@Slion
Copy link
Contributor

Slion commented Oct 19, 2024

Tip of master still gave me that Panic after all, so I've started trying that patch again. Sure enough, I have not seen that Panic since. A couple of crashes and reboots maybe but possibly unrelated.

@gemarcano gemarcano force-pushed the rp2040_irq_lock_fix branch 2 times, most recently from 5441fb3 to 2318575 Compare November 13, 2024 06:16
@gemarcano gemarcano force-pushed the rp2040_irq_lock_fix branch from 1eb62ba to f5b2459 Compare December 6, 2024 20:25
@gemarcano
Copy link
Author

There were some recent changes in the rp2040 code adding ISO endpoints, so I had to tweak this PR a little bit to carry over the null-pointer dereference checks. The main change is that ep->configured = true; in dcd_rp2040.c is now set in the hw_endpoint_init function so it is set when either type of endpoint is initialized. This is similar to what happens in hcd_rp2040.c.

Thinking about it more, I should probably split the locking implementation from the null-pointer deref checks/fixes. Thoughts?

@FearLabsAudio
Copy link

@gemarcano The last commit ca0d53e Is working well for me.

 - Implemented a critical section for the rp2040 port, which now
   prevents an IRQ from clearing the endpoint data structures while a
   transfer was in progress, which would then lead to a null pointer
   derefence.
 - Fixed a null-pointer dereference regarding ep->endpoint_control for
   endpoint 0, which does not have a control register.
@RetiredWizard
Copy link

RetiredWizard commented Sep 12, 2025

I have been working on building CircuitPython for the Olimex RP2350pc and have modified the CircuitPython core to allow an rp2350 build to use the native usb pins for usb host instead of the normal CircuitPython device usage. I believe this is the first board attempting to use tinyUSB/RP2040 host with CircuitPython (non-PIO).

I seem to be running into the issue this PR is attempting to address as my initial build immediately would panic in rp2040_usb.c with "ep xx was already available". When I applied the changes from this PR the panic moved to hcd_rp2040.c with "Seq Error: [0] = 0xnnnn [1] = 0xnnnn". Which makes me think that there is still an unguarded transfer being initiated somewhere.

I don't know if it's any help but I also discovered that simply commenting out the original panic statement in rp2040_usb.c (panic("ep %02X was already available", ep->ep_addr);) and not applying this PR left me with what appears to be a fully functioning firmware. I exercised it for about 30 minutes without issue.

I was also able to largely mitigate the original panic by introducing "wait" loops in the CircuitPython core which effectively turned the usb (mouse) reads blocking. CircuitPython was using a short timeout period with the read transactions to provide a non-blocking read. This results in lots of transactions being started and then aborted. Adding wait loops to the core avoids the panic, however for reasons I'm not entirely sure (but I suspect it has to do with a callback routine not be used for transfer aborts) the wait ends up blocking any other USB transfers effectively disabling USB keyboard input when mouse reads are being performed.

I'll continue to poke around and see if I can identify any unguarded transfers but the USB protocol is all a little out of my depth. I do suspect/hope that this PR is on the right path though.

Edit: After 2-3 hours of exercising (playing minesweeper) the firmware that had the panic statement commented out finally crashed into safe mode, as I probably should have expected.....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants