Add suggestions for enqueuing many jobs and emails #369
Conversation
README.adoc (Outdated)
@@ -1956,6 +1956,30 @@ travel_to(Time.current.to_time)
freeze_time
----

== Performance
I felt that the category "Performance" is a bit too broad. I’m not entirely confident, but after looking over the categories, it seems to me that adding "Jobs" under "Mailers" would be a good fit. Strictly speaking, it covers not only Active Job but also Action Mailer, but I made this judgment because the link URL itself includes "jobs".
@rubocop/style-guide-editors If there is a better approach, please suggest it.
Did a split.
Force-pushed from 6cec25c to 4f0fcaa
Does this usually say which Rails version introduced this?

I guess it's not a bad idea to add an admonition (e.g. …)
Force-pushed from 4f0fcaa to 749d5ff
Pushed a note. When implementing a cop, is it OK to create a single cop for both cases (it would be a cop that detects single operations in a loop which can be performed in bulk)? I also want it to be extensible to Sidekiq's …
# good
emails = users.map { |user| Notifier.welcome(user) }
ActionMailer.deliver_all_later(emails)
----
Since it is described as an independent rule, the NOTE should also be included here.
Fixed.
=== Enqueuing Many Emails at Once [[enqueuing-many-emails]]

Prefer enqueuing many emails at once instead of one by one in a loop.
This can greatly reduce the number of round-trips to the queue datastore.
Are you certain that a single batch is better for every store? And that multiple messages are worse than one larger message for every protocol?
Batches are usually of size 1000, so all the args may be split into multiple batches.
A single insert is faster than 1000 individual SQL queries for a database. A single Redis command is faster than 1000 Redis commands. Maybe there are cases where multiple commands can be better, but I can't think of one, and in 99.9% of cases batches are better.
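The round-trip argument can be sketched with a toy queue client that just counts trips to the datastore (everything here is illustrative; `FakeQueue`, `push`, and `push_bulk` are made-up stand-ins, with the 1000-item batch size mentioned above):

```ruby
# Illustrative only: a fake queue client counting network round-trips,
# to show why one bulk call beats N individual calls.
class FakeQueue
  BATCH_SIZE = 1000 # batch size mentioned in the comment above

  attr_reader :round_trips, :jobs

  def initialize
    @round_trips = 0
    @jobs = []
  end

  def push(job)
    @round_trips += 1 # one round-trip per job
    @jobs << job
  end

  def push_bulk(jobs)
    jobs.each_slice(BATCH_SIZE) do |batch|
      @round_trips += 1 # one round-trip per batch of up to 1000 jobs
      @jobs.concat(batch)
    end
  end
end

one_by_one = FakeQueue.new
2500.times { |i| one_by_one.push(i) }
puts one_by_one.round_trips # => 2500

bulk = FakeQueue.new
bulk.push_bulk((0...2500).to_a)
puts bulk.round_trips # => 3 (2500 jobs in batches of 1000)
```

The same number of jobs ends up in the queue either way; only the number of trips to the datastore differs.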
@@ -1640,6 +1640,40 @@ Sending emails while generating page response should be avoided.
It causes delays in loading of the page and request can timeout if multiple email are sent.
To overcome this emails can be sent in background process with the help of https://github.com/mperham/sidekiq[sidekiq] gem.

=== Enqueuing Many Emails at Once [[enqueuing-many-emails]]
There’s https://github.com/toptal/active-job-style-guide, but I don’t know who maintains/owns it now.
This is for emails. Another one was added for jobs in a newly created "Jobs" section.
Force-pushed from 749d5ff to 78fa70d
ids.each { |id| MyJob.perform_later(id) }

# good
jobs = ids.map { |id| MyJob.new(id) }
I’m puzzled as to why this is needed in the first place. It was always a good practice to accumulate jobs and submit them all at once (to the queue) only when the DB transaction commits, with sidekiq-postpone. Now there’s a built-in option for that in Sidekiq, and in Rails, too.
Why is such an explicit manual approach needed?
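The accumulate-then-flush idea behind sidekiq-postpone can be sketched like this (a minimal illustration, not the gem's real API; `JobBuffer`, `enqueue`, and `commit!` are made-up names):

```ruby
# Sketch of "accumulate during a transaction, flush in bulk on commit".
# Names are illustrative, not sidekiq-postpone's actual interface.
class JobBuffer
  def initialize(queue)
    @queue = queue
    @pending = []
  end

  # Called wherever code would normally push a job immediately.
  def enqueue(job)
    @pending << job # buffered; nothing reaches the queue yet
  end

  # Called when the surrounding DB transaction commits.
  def commit!
    @queue.push_bulk(@pending) unless @pending.empty?
    @pending.clear
  end
end
```

The side effect of this pattern is exactly what the rule under discussion asks for: the queue sees one bulk submission instead of one push per job.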
> I’m puzzled as to why this is needed in the first place. It was always a good practice to accumulate jobs and submit them all at once (to the queue) only when the DB transaction commits, with sidekiq-postpone.

Haven't ever seen this. It was always done manually where I saw it.
But not all jobs and emails are enqueued within transactions; most are not. I think the linked gems deal only with transactions.
Can you give me a few practical examples of code that schedules thousands of jobs/emails and is so trivially simple that it doesn't have to be atomic, and is perfectly linear (writes to the DB, and only then schedules thousands of jobs)?
Examples where there are no transactions:
- reminding thousands of users about appointments
- scheduling jobs for thousands of users to sync their calendars
- scheduling jobs for thousands of users to check if they are active
- notifying thousands of course members about a new course item being unlocked
- etc.
If I got what was asked.
Is it really the best practice to schedule thousands of jobs? Or should those better be iterable jobs like https://github.com/sidekiq/sidekiq/wiki/Iteration that you yourself contributed?
I see no big problem with enqueuing even millions of jobs if your job storage can handle it. It takes advantage of the job backend to execute them in parallel, depending on the queue size, while iterable jobs would do so in sequence.
BTW, even Rails has absorbed a similar feature to what Sidekiq has, called "Continuations": rails/rails#55127. Or at least I think they're similar; I haven't tried it out yet, and it will only be released in Rails 8.1.
That said, when I had to enqueue huge amounts of jobs, it was pretty much always a one-off thing, so (enqueue) performance didn't really matter that much in those cases.
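The parallel-versus-sequential point can be put as back-of-the-envelope arithmetic (illustrative numbers; `fanout_ms` and `iterable_ms` are made-up helper names, and this ignores per-job overhead and queue contention):

```ruby
# Rough model: N enqueued jobs run in parallel "waves" across workers,
# while an iterable job processes every item in sequence.
def fanout_ms(items, per_item_ms, workers)
  # ceiling division: number of waves needed to drain the queue
  ((items + workers - 1) / workers) * per_item_ms
end

def iterable_ms(items, per_item_ms)
  items * per_item_ms
end

puts fanout_ms(10_000, 100, 50) # => 20000 (~20 s across 50 workers)
puts iterable_ms(10_000, 100)   # => 1000000 (~17 min in sequence)
```

This is the trade-off discussed above: fan-out exploits the backend's parallelism, while an iterable job trades throughput for a single resumable unit of work.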
There are valid use cases for iterable jobs and valid use cases for enqueuing many jobs at once. Not all jobs should be converted to iterable.
I have seen code paths go from minutes to seconds when using batch enqueuing of jobs (and these code paths can run pretty regularly). And when using batches, it takes less time, fewer requests, and less load on Redis; and with fewer requests, there is a smaller chance that any of them fails and the whole process has to start over.
We discussed examples with a very large number of jobs. But this applies even to hundreds of jobs. The performance difference may be less visible for Redis, but for database-backed queues, 1 query vs 100 queries can make a difference (given that it is executed regularly).
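The database-backed case can be sketched by comparing the SQL that each approach would send (a toy illustration; the `jobs` table and `arg` column are made up, and real adapters would use bind parameters rather than string interpolation):

```ruby
# Sketch: a database-backed queue can accept one multi-row INSERT
# instead of N single-row INSERTs, i.e. 1 query instead of 100.
def single_inserts(args)
  args.map { |a| "INSERT INTO jobs (arg) VALUES (#{a});" }
end

def bulk_insert(args)
  "INSERT INTO jobs (arg) VALUES #{args.map { |a| "(#{a})" }.join(', ')};"
end

puts single_inserts((1..100).to_a).length # => 100 statements
puts bulk_insert([1, 2, 3])
# => INSERT INTO jobs (arg) VALUES (1), (2), (3);
```

Each statement in the first version is a separate round-trip to the database; the second carries all rows in one.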
Discussion - rubocop/rubocop-rails#1517