-
Notifications
You must be signed in to change notification settings - Fork 1.3k
ohci: Implement support for d-cache operations #3248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The initial motivation for doing this is to remove the `used` flag in the TD. If we use this flag, we end up being required to read from TDs that the OHCI controller might be modifying (i.e. the OHCI controller logically owns the TD). This happens when we try to allocate a new, empty TD while the OHCI host controller is working on a transfer. Move the `used` flag to `gtd_extra_data_t`. This data is only used by the CPU, and the OHCI controller never accesses it. The existing allocation method for TDs does *not* put an empty TD onto each ED (i.e it does *not* do what is shown in Figure 5-6 of the OHCI specification). Instead, the NextTD field of the last TD is set to 0. The TailP field of the ED is also set to 0. This works in many cases. However, this implementation means that the CPU may end up trying to write to the NextTD field of an in-progress transfer while the OHCI host controller logically owns it. Change the implementation to use an empty TD, as suggested by the specification, for endpoints other than EP0. This avoids the above issue. It is not necessary to make the change for EP0 because only at most one TD can ever be pending at a time. The above change should also remove the need for the stall workaround. In the future, we want to modify the code to access EDs through an uncached mapping. Because uncached mappings are slow, we want to access EDs as little as possible. Currently, when a TD completes, we access an ED in order to figure out the device address and endpoint number of the TD which was completed. Because moving `used` to `gtd_extra_data_t` necessitates expanding it, we have enough room to also store the device address and endpoint number of the TD. This patch does so. With the above two changes, we no longer need to access an ED when a TD completes. Also remove the `index` field from TDs as it is no longer necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your PR. I have a few questions regarding the changes.
TU_ATTR_WEAK bool hcd_dcache_clean(void const* addr, uint32_t data_size) { (void) addr; (void) data_size; return true; } | ||
TU_ATTR_WEAK bool hcd_dcache_invalidate(void const* addr, uint32_t data_size) { (void) addr; (void) data_size; return true; } | ||
#ifndef hcd_dcache_uncached | ||
#define hcd_dcache_uncached(x) (x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what this function does specifically ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my application, I define it like this:
#define hcd_dcache_uncached(x) (*(volatile __typeof__(x) *)((uintptr_t)(&(x)) + MAGIC_OFFSET))
In words, it takes the address of x
, adds a specific offset to it, and then accesses this new address (the address with the offset added) with the same type as the original x
.
This is why it is a macro and not a function -- the return type is generic and always matches the type of x
.
The reason this is able to work is because of my application's platform setup, which is not shown anywhere in the TinyUSB code. The SoC I am using has a MMU. As part of setting up the MMU, I configure:
Virtual Address | Physical Address | Attributes |
---|---|---|
0xCxxxxxxx | 0xCxxxxxxx | Cacheable |
0xDxxxxxxx | 0xCxxxxxxx | Not cacheable |
System RAM is at physical address 0xCxxxxxxx, and the system RAM can be accessed from two different virtual address ranges. Most data (including OHCI TDs) is accessed using the cacheable range at VA 0xCxxxxxxx. In order to pass data between the CPU and the OHCI controller, manual cache operations are used.
However, OHCI EDs are accessed using the uncacheable range at VA 0xDxxxxxxx. This accesses the same physical memory, but it bypasses the cache. This means that I do not have to use manual cache operations on EDs. I can access individual words inside an ED without worrying that the other words in the same ED will get cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we can use the following virt to phys to do the conversion. You can implement it in the app in the way to check the range to do the offset correctly ? That would make thing consistent
TU_ATTR_WEAK extern void* tusb_app_virt_to_phys(void *virt_addr);
TU_ATTR_WEAK extern void* tusb_app_phys_to_virt(void *phys_addr);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those functions have slightly different semantics, especially since TDs do not go through the alternate uncached memory address. TDs use the normal memory mapping and use explicit cache operations.
It could be possible to use an approach like the following:
- Modify these functions to take a parameter specifying the type of object being translated (TD, ED, HCCA)
- Add compiler directives to place uncached data in a different linker section (placing it at 0xDxxxxxxx)
Would this be a better approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just realized that this should be possible to implement without adding a type parameter.
Instead, the reason I did not originally implement it this way was because I would have to split up ohci_data_t
into separate cacheable and uncacheable parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it, but hcd_dcache_uncached()
is too specific to your set-up. It makes the code not as readible as I would like. Can you centralize the uncached in the get_ed() function only ?
TU_VERIFY_STATIC( sizeof(ohci_gtd_t) == 16, "size is not correct" ); | ||
TU_VERIFY_STATIC( sizeof(ohci_gtd_t) == CFG_TUH_MEM_DCACHE_ENABLE ? CFG_TUH_MEM_DCACHE_LINE_SIZE : 16, "size is not correct" ); | ||
|
||
typedef union { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we need to have these as separated typedef ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These separate typedefs are used because I want to be able to explicitly and manually read and write each word in the ED. This is because uncached memory access is slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use following typedef to keep the code least modified
// Word 0
union {
struct {
uint32_t dev_addr : 7;
uint32_t ep_number : 4;
uint32_t pid : 2;
uint32_t speed : 1;
uint32_t skip : 1;
uint32_t is_iso : 1;
uint32_t max_packet_size : 11;
// HCD: make use of 5 reserved bits
uint32_t used : 1;
uint32_t is_interrupt_xfer : 1;
uint32_t is_stalled : 1;
uint32_t : 2;
};
uint32_t word0;
}
|
||
typedef struct { | ||
uint16_t expected_bytes; // up to 8192 bytes so max is 13 bits | ||
uint8_t dev_addr : 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need this here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to avoid having to access an ED in done_queue_isr
, because I am in general trying to access an ED as infrequently as possible, and I noticed that there were two reasons to access an ED there:
- To apply a workaround for a STALL situation
- To obtain the
dev_addr
andep_addr
for callinghcd_event_xfer_complete
For reason 1, I believe the STALL workaround is not necessary if EDs have an empty TD like the OHCI specification suggests.
After making that change to TD allocation, I noticed that reason 2 can be eliminated by moving those fields out of the ED to somewhere else.
As an alternative, the dev_addr
and ep_addr
fields can be packed into unused bits in the TD word 0. However, this does not work for isochronous TDs (not enough free bits).
Using TD word 0 to store those fields and then packing used
alongside expected_bytes
can save some memory. Do you want me to use this approach instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you quote the section where OHCI specs suggest that. I wrote this long time ago and forgot most of the thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements manual d-cache support for the OHCI host controller driver to prevent data corruption when the CPU and OHCI controller access shared memory structures. The changes add cache-conscious data structure alignment, explicit cache operations, and modify struct access patterns to work with cache line boundaries.
Key changes:
- Adds conditional alignment of OHCI transfer descriptors to cache line boundaries when d-cache is enabled
- Introduces weak cache operation functions (clean/invalidate) and uncached access macros
- Refactors struct field access to use union types and cache-aware operations throughout the driver
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
src/portable/ohci/ohci.h | Adds cache line alignment for TDs, introduces union types for ED fields, and extensive documentation on cache coherency design |
src/portable/ohci/ohci.c | Implements cache operations throughout the driver, refactors struct access patterns, and updates transfer descriptor management |
This code is written very carefully to always use an uncached view of memory to read/write EDs. An uncached view must *always* be used, or else cache behavior can corrupt the ED. As part of this change, combine access into as few word-sized accesses as possible. This makes the code perform better. Doing this involves giving type names to the bitfields that make up the ED's data words.
8ead4e4
to
b9378eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check below feedback
|
||
typedef struct { | ||
uint16_t expected_bytes; // up to 8192 bytes so max is 13 bits | ||
uint8_t dev_addr : 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you quote the section where OHCI specs suggest that. I wrote this long time ago and forgot most of the thing.
TU_ATTR_WEAK bool hcd_dcache_clean(void const* addr, uint32_t data_size) { (void) addr; (void) data_size; return true; } | ||
TU_ATTR_WEAK bool hcd_dcache_invalidate(void const* addr, uint32_t data_size) { (void) addr; (void) data_size; return true; } | ||
#ifndef hcd_dcache_uncached | ||
#define hcd_dcache_uncached(x) (x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it, but hcd_dcache_uncached()
is too specific to your set-up. It makes the code not as readible as I would like. Can you centralize the uncached in the get_ed() function only ?
This will likely fix #2687 too |
@ArcaneNibble @Ryzee119 yeah, I agree we should follow the specs suggestion. I was trying to save SRAM footprint, but it cause more troubles than expected. I am in middle of other works, will review this asap, please be patient meanwhile |
This PR implements manual d-cache support for the OHCI host controller driver, using the approach described in #3208
This has been tested with only the USB HID driver on the TI AM1808