Skip to content

Conversation

@DrTobe
Copy link
Contributor

@DrTobe DrTobe commented Jul 23, 2025

In #3213, I had already experienced wakeup issues with the RTC-based low-power feature. Since we were running into that issue again, I have done some debugging. With a few more trace! calls in the corresponding code sections, I could observe that the on_wakeup_irq handler is not called for a considerable amount of WFE calls. The timings suggest that many of the WFE calls do not trigger the configured STOP mode at all. Instead, program execution seems to go on immediately. I have left a pretty long comment about that in the code.

In this PR, I can only offer a workaround which fixes the issue but I do not understand the root cause of the problem, why the WFE calls do not trigger the STOP mode in the first place. But from my perspective, this workaround fixes a large issue with a tiny overhead, so I would definitely like to see this merged.

@0e4ef622
Copy link
Contributor

0e4ef622 commented Jul 23, 2025

The timings suggest that many of the WFE calls do not trigger the configured STOP mode at all. Instead, program execution seems to go on immediately.

The WFE instruction doesn't wait if the Event Register is set. Could that explain the behavior you're seeing? From the Armv7-M reference:

The WaitForEvent() pseudocode procedure optionally suspends execution until a WFE wakeup event or reset
occurs, or until some earlier time if the implementation chooses. It is IMPLEMENTATION DEFINED whether restarting
execution after the period of suspension causes a ClearEventRegister() to occur.

This means after a WFE instruction finishes, the Event Register can still be set which means the next iteration in the loop won't wait.

@DrTobe
Copy link
Contributor Author

DrTobe commented Jul 23, 2025

In one test, I have observed that every other WFE call fails. So while your suggestion might be related, it does not explain why every other WFE then succeeds. Then, only a single WFE call would ever succeed.

The WaitForEvent() pseudocode procedure optionally suspends execution until a WFE wakeup event or reset
occurs, or until some earlier time if the implementation chooses.

"optionally" and "some earlier time if the implementation chooses" really does not sound good to me :)

Looking at the problem from another perspective, we could also consider calling time_driver.resume_time() after WFE instead of in the interrupt handler a real fix because the WFE is generally allowed to wakeup due to non-interrupt events, if I understand that correctly. For these cases, we need to resume the normal time handling after WFE anyway, not only in the interrupt handler(s).

@0e4ef622
Copy link
Contributor

Then, only a single WFE call would ever succeed.

The relevent pseudocode for WFE looks like this:

if EventRegistered() then
  ClearEventRegister();
else
  WaitForEvent();
  // Implementation defined whether the Event Register is still set or not here

This seems consistent with the every other WFE call fails behavior you see, since every second loop the Event Register is set.

For these cases, we need to resume the normal time handling after WFE anyway, not only in the interrupt handler(s).

I agree, generally the code should be made to work if the WFEs were replaced with NOPs.

@DrTobe
Copy link
Contributor Author

DrTobe commented Jul 24, 2025

This seems consistent with the every other WFE call fails behavior you see, since every second loop the Event Register is set.

Regarding the pseudocode you shared, that totally makes sense!

I agree, generally the code should be made to work if the WFEs were replaced with NOPs.

Yes, just to be sure.

So it seems we start to properly understand the issue. What about the following two changes:

  1. The interrupt handler should ClearEventRegister, but it should not resume_time
  2. After WFE, we resume_time

Do you happen to know how to ClearEventRegister?

@Dirbaio
Copy link
Member

Dirbaio commented Jul 24, 2025

the way to clear the event register is doing "SEV; WFE".

However we should never do this, it's guaranteed to cause issues. If some task got woken before clearing, the wake will get lost. Note not all wakes come from interrupts, it's possible for one task to wake another with e.g. signals, mutexes, channels. These wakes do just a SEV, clearing the event flag would lose them.

In these cases it's expected that the next WFE doesn't sleep. The executor really needs to do another loop of polling tasks right now, because there's more work to do. So the issue you describe "The timings suggest that many of the WFE calls do not trigger the configured STOP mode at all. Instead, program execution seems to go on immediately." is not necessarily an issue.

What we can do is turn off SEVONPEND. This'll make interrupts not set the event flag (unless the irq handler does a SEV). This will eliminate "WFE returns immediately" in some cases where it's not needed, but there's still cases where it is.


So, the issue IMO is the executor mainloop is wrong, it assumes the RTC WKUP irq will be called when running WFE which is not necessarily true since it is possible (and expected) for WFE to not enter stop at all.

So perhaps the fix is to make the mainloop manage both stopping and starting time:

loop {
    executor.inner.poll();
    self.configure_pwr();
    interrupt::free(|| {
        stop_time();
        wfe();
        resume_time();
    });
}

this is sort of similar to your propoesd fix, except we also change the RTC IRQ handler to not resume time. Maybe it can just do a SEV. Maybe it can be removed entirely?

The interrupt::free is so that we're guaranteed to resume time before interrupt handlers run. Might be needed if the interrupt handler runs code that expects high-precision timing, perhaps in an InterruptExecutor.

@DrTobe
Copy link
Contributor Author

DrTobe commented Jul 24, 2025

So, the issue IMO is the executor mainloop is wrong, it assumes the RTC WKUP irq will be called when running WFE which is not necessarily true since it is possible (and expected) for WFE to not enter stop at all.

I came to the same conclusion.

I will try to remove the resume_time from the IRQs, check your SEVONPEND suggestion, evaluate if it works in an interrupt-free block.

DrTobe added 2 commits July 24, 2025 14:47
…h are not required anymore because the resume_time happens in the main thread now
@0e4ef622
Copy link
Contributor

If some task got woken before clearing, the wake will get lost.

I don't follow, how would it get lost? The main loop would look like this:

loop {
    executor.inner.poll()
    asm!("wfe");
    asm!("sev");
    asm!("wfe");
}

If another SEV happens anywhere in here, poll() will always get called afterwards. Although there might be an argument against using SEV; WFE on multi-core chips since SEV wakes all of them and they could enter a loop where one core wakes the other and vice versa.

What we can do is turn off SEVONPEND. This'll make interrupts not set the event flag

SEVONPEND is already disabled by default. SEVONPEND is about letting disabled interrupts set the event register, you can't prevent enabled interrupts from setting the event register. You don't want to disable that anyways, otherwise you could lose interrupts that happen after poll() but before WFE.

@Dirbaio
Copy link
Member

Dirbaio commented Jul 24, 2025

The main loop would look like this:

if you do it that way the "sev, wfe" has no effect. To guarantee the WFE enters sleep you have to first clear it, then do the real WFE.

loop {
    executor.inner.poll()
    // <---- INTERRUPT HERE
    asm!("sev; wfe");  // clear
    asm!("wfe"); // actual wfe that's guaranteed to sleep
}

If an interrupt happens in the marked place, its SEV is lost.

@0e4ef622
Copy link
Contributor

if you do it that way the "sev, wfe" has no effect.

It clears the event register, which can still be set after WFE wakes up from sleep (see pseudocode above).

@DrTobe
Copy link
Contributor Author

DrTobe commented Jul 24, 2025

  1. Stopping and resuming time is only done from the main loop now.
  2. I can confirm that SEVONPEND has not been set before. I do not dare to change that because this would affect overall interrupt behavior, and we wanted to have it disabled anyway. The wakeup interrupts trigger the WFE wakeup as desired.
  3. Since SEVONPEND is not set, we can not do the pause-wfe-resume sequence in an interrupt-free context. If an interrupt-handler needs the resumed time, this is not guaranteed now. But as far as I can see, this was not guaranteed before either.

@Dirbaio
Copy link
Member

Dirbaio commented Jul 24, 2025

if you do it that way the "sev, wfe" has no effect.

It clears the event register, which can still be set after WFE wakes up from sleep (see pseudocode above).

you mean because of the "implementation defined" thing? Not sure where that's from, in ARM M-profile the WFE always clears the event.

@0e4ef622
Copy link
Contributor

0e4ef622 commented Jul 24, 2025

Not sure where that's from, in ARM M-profile the WFE always clears the event.

The Armv7-M Architecture Reference Manual, section B1.5.18 (Wait For Event and Send Event). I've definitely seen this behavior on the nRF52840, and it sounds like the same is happening here. The key is that WFE only clears the event if it is already set, if it's not set then it goes to sleep, a wakeup event sets the event register and continues execution, and the WFE might not clear the event after waking up.

EDIT: Couldn't find the same "implementation defined" language in the Armv8-M reference manual but the pseudocode is the same, honestly it seems more like the event register being set after WFE wakes up from sleep is the expected behavior.

@Dirbaio Dirbaio added this pull request to the merge queue Jul 24, 2025
@DrTobe
Copy link
Contributor Author

DrTobe commented Jul 24, 2025

Guys, I think you are talking about two different goals:

  1. @Dirbaio talks about clearing the event register before calling the "real" WFE to ensure that the following WFE actually triggers a STOP mode. I guess we all agree that this can not be achieved due to race conditions.
  2. @0e4ef622 talks about clearing the event register after calling the "real" WFE to ensure that it is definitely cleared, no matter if it was cleared by the previous WFE or not. If I am not mistaken, this seems to be possible:

The main loop would look like this:

loop {
    executor.inner.poll()
                 // here, the event register may or may not be set, so ...
    asm!("wfe"); // this may a) return immediately and clear the event register or b) trigger stop mode
                 // again, the event register may or may not be set
    asm!("sev"); // now, it is definitely set
    asm!("wfe"); // so this should return immediately and clear the event register
}

If this works as expected, this may be a good improvement for the current implementation because in many cases, it will prevent the doubled WFEs we are currently experiencing.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 24, 2025
@Dirbaio
Copy link
Member

Dirbaio commented Jul 24, 2025

seems this breaks the stop mode test on H5.

about clearing the event register after calling the "real" WFE to ensure that it is definitely cleared, no matter if it was cleared by the previous WFE or not

yes, I understand that. I thought that even if it's "implementation defined" whether the event is cleared on wakeup, implementations do it in practice. Is this really not the case?

If so, not sure what the fix is. Forcing clearing by doing "sev; wfe" will cause endless polling on multicores such as rp2040 as mentioned above...

@DrTobe
Copy link
Contributor Author

DrTobe commented Jul 28, 2025

seems this breaks the stop mode test on H5.

Where is that test? Can I reproduce this somehow?

@DrTobe
Copy link
Contributor Author

DrTobe commented Aug 18, 2025

@Dirbaio can you tell me what needs to be fixed or how I can reproduce the failing test?

If this can not be fixed quickly, shall we merge the workaround that this thread/PR started with?

@0e4ef622
Copy link
Contributor

@DrTobe The logs are in the CI run from the merge queue. I'm guessing the test code is here.

@DrTobe
Copy link
Contributor Author

DrTobe commented Aug 25, 2025

@0e4ef622 Thanks, but how can I find the logs for a specific commit or PR?

I will get an H5 and investigate the issue.

@0e4ef622
Copy link
Contributor

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants