Conversation

fatkodima (Contributor)

README.adoc Outdated
@@ -1956,6 +1956,30 @@ travel_to(Time.current.to_time)
freeze_time
----

== Performance
Member

I felt that the category "Performance" is a bit too broad. I’m not entirely confident, but after looking over the categories, it seems to me that adding "Jobs" under "Mailers" would be a good fit. Strictly speaking, it covers not only Active Job but also Action Mailer, but I made this judgment because the link URL itself includes "jobs".

@rubocop/style-guide-editors If there is a better approach, please suggest it.

Contributor Author

Did a split.

@fatkodima fatkodima force-pushed the enqueuing-many-jobs branch from 6cec25c to 4f0fcaa Compare August 19, 2025 08:49
Earlopain (Contributor)

Does this guide usually say which Rails version introduced a feature? deliver_all_later will be part of 8.1, which is not released yet.

bbatsov (Collaborator) commented Aug 19, 2025

I guess it's not a bad idea to add an admonition (e.g. NOTE:) to make this clearer.

@fatkodima fatkodima force-pushed the enqueuing-many-jobs branch from 4f0fcaa to 749d5ff Compare August 19, 2025 11:53
fatkodima (Contributor, Author)

Pushed a note.

When implementing a cop, is it OK to create a single cop for both cases (i.e., a cop that detects single operations performed in a loop that could be done in bulk)? I also want it to be extensible later to support Sidekiq's perform_bulk method.

# good
emails = users.map { |user| Notifier.welcome(user) }
ActionMailer.deliver_all_later(emails)
----
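The loop-based counterpart this rule discourages is not visible in the excerpt above. As a rough sketch of the round-trip difference, with stand-in names (`FakeQueue` and the `[:welcome, user]` payloads are hypothetical, not part of the guide or of Action Mailer):

```ruby
# Hypothetical queue adapter that counts network round-trips.
class FakeQueue
  attr_reader :round_trips

  def initialize
    @round_trips = 0
  end

  # One round-trip per call, regardless of payload size.
  def push(payload)
    @round_trips += 1
  end
end

users = (1..100).to_a

# bad: one round-trip per email
loop_queue = FakeQueue.new
users.each { |user| loop_queue.push([:welcome, user]) }
loop_queue.round_trips # => 100

# good: all emails handed over in a single round-trip,
# in the spirit of deliver_all_later
batch_queue = FakeQueue.new
emails = users.map { |user| [:welcome, user] }
batch_queue.push(emails)
batch_queue.round_trips # => 1
```

The point is only the call count: the batched form pays the queue's round-trip cost once instead of once per email.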
Member

Since it is described as an independent rule, the NOTE should also be included here.

Contributor Author

Fixed.

=== Enqueuing Many Emails at Once [[enqueuing-many-emails]]

Prefer enqueuing many emails at once instead of one by one in a loop.
This can greatly reduce the number of round-trips to the queue datastore.
Member

Are you certain that a single batch is better for every store? That multiple messages are worse than one larger message for every protocol?

Contributor Author

Batches are usually of size 1000, so all the args can be split into multiple batches.
A single insert is faster than 1000 individual SQL queries for a database. A single Redis command is faster than 1000 Redis commands. Maybe there are cases where multiple commands can be better, but I can't think of one, and in 99.9% of cases batches are better.

@@ -1640,6 +1640,40 @@ Sending emails while generating page response should be avoided.
It causes delays in loading the page, and the request can time out if multiple emails are sent.
To overcome this emails can be sent in background process with the help of https://github.com/mperham/sidekiq[sidekiq] gem.

=== Enqueuing Many Emails at Once [[enqueuing-many-emails]]
pirj (Member) commented Aug 19, 2025

There’s https://github.com/toptal/active-job-style-guide, but I don’t know who maintains/owns it now.

Contributor Author

This is for emails. Another one was added for jobs in a newly created "Jobs" section.

@fatkodima fatkodima force-pushed the enqueuing-many-jobs branch from 749d5ff to 78fa70d Compare August 19, 2025 18:47
ids.each { |id| MyJob.perform_later(id) }

# good
jobs = ids.map { |id| MyJob.new(id) }
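The quoted hunk is cut off after the `map`. As a minimal stand-in for the shape of the batch API under discussion (these `MyJob` and `FakeActiveJob` definitions are illustrative stubs, not the real Active Job implementation), the pattern is: build job instances first, then hand the whole array to a single bulk call:

```ruby
# Illustrative stub of a job class: captures its arguments,
# performs nothing.
class MyJob
  attr_reader :args

  def initialize(*args)
    @args = args
  end
end

# Illustrative stub of a bulk-enqueue entry point: accepts an array of
# already-built job instances and enqueues them in one call.
module FakeActiveJob
  def self.enqueued
    @enqueued ||= []
  end

  def self.perform_all_later(jobs)
    jobs.each { |job| enqueued << job }
  end
end

ids = [1, 2, 3]
jobs = ids.map { |id| MyJob.new(id) }
FakeActiveJob.perform_all_later(jobs)
FakeActiveJob.enqueued.size # => 3
```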
Member

I’m puzzled as to why this is needed in the first place. It was always a good practice to accumulate jobs and submit them all at once (to the queue) only when the DB transaction commits, with sidekiq-postpone. Now, there’s a built-in option for that in Sidekiq, and in Rails, too.
Why is such an explicit manual approach needed?

Contributor Author

I’m puzzled as to why this is needed in the first place. It was always a good practice to accumulate jobs and submit them all at once (to the queue) only when the DB transaction commits, with sidekiq-postpone

Haven't ever seen this. It was always done manually where I saw it.

Member

Contributor Author

But not all jobs and emails are enqueued within transactions; most are not. I think the linked gems deal only with transactions.

Member

Can you give me a few practical examples of code that schedules thousands of jobs/emails and is so trivially simple that it doesn't have to be atomic, and is perfectly linear (write to the DB, and only then schedule thousands of jobs)?

Contributor Author

Examples where there is no transaction:

  1. Reminding thousands of users about appointments
  2. Scheduling jobs for thousands of users to sync their calendars
  3. Scheduling jobs for thousands of users to check if they are active
  4. Notifying thousands of course members about a newly unlocked course item
  5. etc.

If I understood what was asked correctly.

Member

Is it really the best practice to schedule thousands of jobs? Or should those better be iterable jobs like https://github.com/sidekiq/sidekiq/wiki/Iteration that you yourself contributed?

Earlopain (Contributor) commented Aug 20, 2025

I see no big problem with enqueuing even millions of jobs if your job storage can handle it. It takes advantage of the job backend to execute them in parallel, depending on the queue size, while an iterable job would process them in sequence.

BTW, even Rails has absorbed a feature similar to what Sidekiq has, called "Continuations": rails/rails#55127. Or at least I think they're similar; I haven't tried it out yet, and it will only be released in Rails 8.1.

That said, when I had to enqueue huge amounts of jobs, it was pretty much always a one-off thing, so (enqueue) performance didn't really matter that much in those cases.

fatkodima (Contributor, Author) commented Aug 20, 2025

There are valid use cases for iterable jobs and valid use cases for enqueuing many jobs at once. Not all jobs should be converted to iterable.

I have seen code paths go from minutes to seconds when using batch enqueuing of jobs (and these code paths can run pretty regularly). Batching also means less time, fewer requests, and less load on Redis; and with fewer requests, there is a smaller chance that any of them fails and the whole process has to start over.

We discussed examples with a very large number of jobs, but this applies even to hundreds of jobs. The performance difference may be less visible for Redis, but for database-backed queues, 1 query vs. 100 queries can be a difference (given that the code is executed regularly).
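The "1 query vs. 100 queries" point for a database-backed queue can be illustrated with plain string construction (no real database here; the `jobs` table and `arg` column are made-up names):

```ruby
job_ids = (1..100).to_a

# bad: 100 separate INSERT statements, one database round-trip each
individual_queries = job_ids.map do |id|
  "INSERT INTO jobs (arg) VALUES (#{id})"
end
individual_queries.size # => 100

# good: a single multi-row INSERT, one database round-trip
values = job_ids.map { |id| "(#{id})" }.join(", ")
bulk_query = "INSERT INTO jobs (arg) VALUES #{values}"
```

Real adapters would use bind parameters rather than interpolated values; the sketch only contrasts the query counts.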
