Skip to content

Conversation

@bengsparks
Copy link

@bengsparks bengsparks commented Sep 9, 2025

This allows a job queue controller to detect whether a job has started / is being worked on.
InProgress is also the only ACK message that does not have a corresponding advisory message, so this fills the gap.

Signed-off-by: Benjamin Sparks [email protected]

This allows a job queue controller to detect whether a job has started / is being worked on.
InProgress is also the only ACK message that does not have a corresponding advisory message, so this fills the gap.

Signed-off-by: Benjamin Sparks <[email protected]>
@bengsparks bengsparks requested a review from a team as a code owner September 9, 2025 12:31
@ripienaar
Copy link
Contributor

ripienaar commented Sep 10, 2025

Generally not against this, but the progress ack is not about indicating a job has started, this is not what the feature is for.

It's lets say you have a minute to handle a email delivery but the SMTP server is a bit slow and you need another minute, thats what its for.

These acks - and so advisories - are for exceptional cases but from the description it sounds a bit like you want to use them every time a job starts? We should not use server generated advisories for that imo.

Advisories are for exceptional cases generally not for something that happens on every message.

@bengsparks
Copy link
Author

bengsparks commented Sep 10, 2025

@ripienaar hey 😄 thanks for the quick response.

Depending on the task, a long time between job enqueuement and its completion / failure / error can pass, so it would be great to be able to monitor which jobs are outstanding, requiring the worker to only supply the corresponding Ack* messages. AFAIK, this is currently not possible with NATS JetStream.

This could be aimed more at long-lived tasks, which would benefit more from precise lifecycle tracking, e.g. video encoding, where one might have to wait a long time to see the result, and not know if the task has been dispatched yet / is being worked upon.
If a task is shortlived enough, then it execution time will hopefully remain below AckWait, so the InProgress shouldn't be sent.

For example, Redis exposes this information with XPENDING for its Streams:

Fetching data from a stream via a consumer group, and not acknowledging such data, has the effect of creating pending entries. [...]
In the extended form we no longer see the summary information, instead there is detailed information for each message in the pending entries list: 1. The ID of the message.

Adding this advisory message would implement similar functionality, as monitoring for this new advisory message allows detection of which messages are pending completion / currently being worked on.
This would allow a workflow engine to detect job lifecycles through the advisory alone.

#[derive(Default)]
enum JobState {
    // Worker detected a transient failure and can retry,
    // and has therefore `AckNak`'ed the message,
    // which triggered the `CONSUMER.MSG_NAKED` advisory message.
    // This is also the default state upon insertion into storage.
    #[default]
    Enqueued,

    // Worker signalled that it is actively processing the requested job
    // and has therefore `AckInProgress`'ed the message,
    // which SHOULD trigger a `CONSUMER.IN_PROGRESS` advisory message
    Active, // <-- this is the state I'd like to trigger using this PR 

    // Worker has successfully completed the job
    // and indicates this by having `AckAck`'ed the message.
    // triggering the `CONSUMER.ACK` advisory message
    Success,

    // Worker detects that the job cannot be completed as was requested,
    // sending `AckTerm` to not reattempt the task,
    // triggering the `CONSUMER.MSG_TERMINATED` advisory message
    Error,

    // Worker detected a transient failure, but cannot retry due to
    // having exhausted `max_deliver`,
    // which triggers the `CONSUMER.MAX_DELIVERIES` advisory message
    Failure,
}

@ripienaar
Copy link
Contributor

But you can also just have a KV with a job status per job ID that you write into as things progress right? Those wishing to know job progress can watch the KV and be notified. With the added benefit of it being reliable (unlike advisories) and doing exactly what you want or choose to implement

@bengsparks
Copy link
Author

bengsparks commented Sep 10, 2025

That is true, but it places additional responsibility on the worker to be aware of what to write where, which might not be feasible e.g. if the worker crashes (job gets stuck in Active state, despite the message no longer being worked on, requiring $AckWait amount of time to pass, which can be high for long-lived tasks), or should not / does not have direct access to the desired KV.
As JetStream already offers AckWait and InProgress, all the worker has to do is interact with the JetStream API, and the rest can happen by itself. In turn, a workflow controller can watch the advisory to observe these events.

I'm not sure if I understand what you mean by advisories being unreliable, do you mean meeting delivery guarantees?
The docs mention storing advisory messages in streams, and due to the advisory subject structures, it is straight-forward to accomplish for precisely the streams and / or consumers you desire.

@ripienaar
Copy link
Contributor

Advisories are published using core nats, with no retries, no effort to make sure they reach the stream even if configured to listen on subjects. It's best efforts and subject to at-most-once guarantees. For the stream example, if a advisory gets published and the stream is leader-less or unreachable the advisory will just be lost.

They really are only advisories that something exceptional has happened but for which delivery is not essential.

You should not build application reliability logic ontop of these, strongly suggest you put your job management business status flow needs in your business domain

@bengsparks
Copy link
Author

bengsparks commented Sep 10, 2025

I see, that's unfortunate, but thanks very much for the advice on advisories and watchable KVs, that is very helpful to know 😄
I'll keep the PR up if you'd like the feature to be merged anyway, as InProgress stands out as the one ACK message that does not have a corresponding advisory message, but no hard feelings if you think it's not a good idea.

Thanks for your time, and for the continued development of NATS ❤️

@ripienaar
Copy link
Contributor

Lets park it for after 1.12 release, there's a few ack related things I'd like to see:

  • Reasons being supported on more ack types - like this one, wouldn't it be good to have a reason field where you could add something like upstream API failed 10 times
  • Next versions of each of the ack types
  • Expanding stats around acks - we have sampling now its nice but could be nicer

So the first one there could certainly impact what we do here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants