Skip to content

Conversation

@nachogiljaldo
Copy link

@nachogiljaldo nachogiljaldo commented Sep 30, 2023

Closes ThreeDotsLabs/watermill#374

Why?

It would be nice to have a simple way to increase the concurrency of message processing. With the current options, it is only possible to consume one message at the time (i.e. you need to get > process > ACK > get > process > ...).

That is not actually required in many use cases where events with the same key are sent to the same partition and, therefore, a partial order exists within the partition.

What?

This PR:

  • documents the previous and new consumption models
  • extracts the messageHandler struct to an interface, so multiple implementations can be provided while preserving the subscriber's structure
  • creates 2 new implementations of the interface:
    • batchedMessageHandler works by collecting batches of messages up to a certain time or duration. Those messages are then sent to the output channel. NACKed messages are managed by resending every message following the NACKed one in that topic / partition
    • partitionConcurrentMessageHandler works by multiplexing each claim into the output channel, that allows having up to N messages in-flight, where N is the number of topics/partitions in the subscriber. NACKs / ACKs are handled by each multiplexed
  • updates the tests so they run with both models (which makes them slower :/ )

IMPORTANT: I tried to preserve the current default behavior. Therefore, there is a Default consumption model which behaves the same way it does now.

Potential follow-up

A potential follow-up would be a consumer that allows for up to N messages being in-flight. That would allow for cases in which the developers want to maximize parallelism at the expense of complexity on their code.

@nachogiljaldo nachogiljaldo force-pushed the 374-allow_concurrent_consumption_of_events branch from 1cfa4da to dafe0de Compare October 1, 2023 18:25
@nachogiljaldo nachogiljaldo force-pushed the 374-allow_concurrent_consumption_of_events branch from 7466597 to c316ad9 Compare October 13, 2023 21:15
@nachogiljaldo nachogiljaldo marked this pull request as ready for review October 13, 2023 22:04
@nachogiljaldo
Copy link
Author

Hey @m110 sorry for the direct ping, but I wonder if you could, as member of the org, review the PR or ask somebody who could do it.

@nachogiljaldo nachogiljaldo force-pushed the 374-allow_concurrent_consumption_of_events branch 2 times, most recently from 644709c to f528a17 Compare March 8, 2024 14:28
@nachogiljaldo nachogiljaldo force-pushed the 374-allow_concurrent_consumption_of_events branch from f528a17 to 5c1acf0 Compare March 8, 2024 14:48
@nachogiljaldo
Copy link
Author

@aldinugroho24 @yangyangrusli @m110 are you the maintainers of this repo? If yes, would you mind having a look at this PR? I think it would be a nice improvement. Also I wonder if you could use any help maintaining this repo. I see there's a bunch of piled PRs and I'm affected by some of those bugs, so I'd be happy to help if it works.

@m110
Copy link
Member

m110 commented Aug 23, 2024

I'm sorry about the long wait @nachogiljaldo, I'm on it now. Thanks for contributing!

@kmuzychko
Copy link

Any updates? May I help in any way to speed it up?

@nachogiljaldo
Copy link
Author

To be honest, after 2 years waiting I just moved to raw sarama long time ago :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[watermill-kafka] - cannot process events concurrently without ACKing the event

3 participants