-
Notifications
You must be signed in to change notification settings - Fork 9.4k
Description
This issue is automatically created based on existing pull request: #40271: Minor performance improvement of async order grid processing
The following should help add a minor performance improvement to async order grid processing, we saw this recently on a high volume store which has a large number of orders and continued to get more orders added at a high throughput.
On this system we had a bug which was causing the redis cache to be purged more frequently than it should have been which caused the symptoms to be more apparent in our case. For all normal usages, this will only really be a problem post deployment or if people are very eagerly flushing the full cache storage.
With a fix like this in place we can stop the query from trying to load data from the entire sales_order more frequently than it ought to (and other associated grid tables).
How async order grid processing works
magento2/app/code/Magento/Sales/Model/ResourceModel/Provider/UpdatedAtListProvider.php
Lines 12 to 17 in 182fdaa
| /** | |
| * Retrieves ID's of not synced by `updated_at` column entities. | |
| * The result should contain list of entities ID's from the main table which have `updated_at` column greater | |
| * than in the grid table. | |
| */ | |
| class UpdatedAtListProvider implements NotSyncedDataProviderInterface |
The idea is that we avoid populating the order grid tables synchronously during the order placement process, and offload it to a background job. We should then only update the order grid tables for entities that have been recently updated.
We can see that the functionality to get the order IDs to reprocess attaches $select->where('main_table.updated_at > ?', $lastUpdatedAt); to the query to keep the amount of data being queried to the minimal amount
magento2/app/code/Magento/Sales/Model/ResourceModel/Provider/UpdatedAtListProvider.php
Lines 50 to 63 in 182fdaa
| public function getIds($mainTableName, $gridTableName) | |
| { | |
| $select = $this->connection->select() | |
| ->from(['main_table' => $this->resourceConnection->getTableName($mainTableName)], ['main_table.entity_id']) | |
| ->joinInner( | |
| ['grid_table' => $this->resourceConnection->getTableName($gridTableName)], | |
| 'main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_at', | |
| [] | |
| ); | |
| $lastUpdatedAt = $this->lastUpdateTimeCache->get($gridTableName); | |
| if ($lastUpdatedAt) { | |
| $select->where('main_table.updated_at > ?', $lastUpdatedAt); | |
| } |
The issue in our case, is that $this->lastUpdateTimeCache->get($gridTableName); is pulled from a transient storage, which means we can be running more expensive queries on very large tables more often than we actually need to. In the event of a cache miss we have to scan the whole table.
How this can be improved
Swapping the storage mechanism from using a cache storage to being held in the database makes sense to me. This is very like how we have version_id on the mview_state table which gets updated as the mview indexers process through their backlog.
I am not certain that we need a whole new table, so I thought about placing this data into the more permanent flag table.
This way even if deployments occur or caches are flushed, we don't lose the pointer to the last updated timestamp.
On small stores this is barely a blip, but when you start approaching millions+ of orders and more constantly streaming in, every little helps.
Manual testing scenarios (*)
For regression testing of the core functionality
bin/magento config:set --lock-env dev/grid/async_indexing 1; php bin/magento app:config:import- Place orders
- Run the
sales_grid_order_async_insertcron - Ensure orders end up in the grid
For manual testing and deep inspection of the queries produced and the process
You can enable async indexing and the db logger for easy analysis of queries.
bin/magento config:set --lock-env dev/grid/async_indexing 1;
php bin/magento app:config:import
bin/magento dev:query-log:enableYou can place orders properly through the frontend and see them synced over into the grid when the cron runs, but for a quick and dirty look at the process we can spoof things like so.
A query to directly insert a dummy order into sales_order (on 2.4.8)
INSERT INTO sales_order (state,status,store_id,customer_id,customer_is_guest,customer_email,customer_firstname,customer_lastname,base_grand_total,grand_total,base_subtotal,subtotal,base_tax_amount,tax_amount,base_shipping_amount,shipping_amount,base_currency_code,order_currency_code,store_currency_code,global_currency_code,increment_id,total_item_count,created_at,updated_at) VALUES ('processing','processing',1,NULL,1,CONCAT('guest',FLOOR(RAND()*1000000),'@example.com'),'John','Doe',120.0000,120.0000,100.0000,100.0000,20.0000,20.0000,0.0000,0.0000,'GBP','GBP','GBP','GBP',CONCAT('TEST',UNIX_TIMESTAMP(NOW()),LPAD(FLOOR(RAND()*1000),3,'0')),1,NOW(),NOW());Initial data looks like
MariaDB [magento]> select * from flag where flag_code like 'LAST%' limit 1 ; select count(*) from sales_order_grid;
Empty set (0.001 sec)
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.000 sec)I trigger a sales_order insert using above spoof query.
I trigger the cron and inspect the query produced, see that it does not include a timestamp for filtering, as this is the first one.
$ n98-magerun sys:cron:run sales_grid_order_async_insert; grep main_table.updated_at var/debug/db.log
Run SalesOrderIndexGridAsyncInsertCron::execute done
INNER JOIN `sales_order_grid` AS `grid_table` ON main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_atI inspect the data in the database, i can see my flag persisted and the order grid populating.
MariaDB [magento]> select * from flag where flag_code like 'LAST%' limit 1 ; select count(*) from sales_order_grid;
+---------+----------------------------------------+-------+-----------------------+---------------------+
| flag_id | flag_code | state | flag_data | last_update |
+---------+----------------------------------------+-------+-----------------------+---------------------+
| 7 | LAST_GRID_UPDATE_TIME:sales_order_grid | 0 | "2025-11-05 17:27:33" | 2025-11-05 17:27:35 |
+---------+----------------------------------------+-------+-----------------------+---------------------+
1 row in set (0.013 sec)
+----------+
| count(*) |
+----------+
| 1 |
+----------+
1 row in set (0.005 sec)I trigger a sales_order insert using above spoof query.
I trigger the cron and inspect the query produced, see that the query produced now includes the timestamp for filtering.
$ n98-magerun sys:cron:run sales_grid_order_async_insert; grep main_table.updated_at var/debug/db.log
Run SalesOrderIndexGridAsyncInsertCron::execute done
INNER JOIN `sales_order_grid` AS `grid_table` ON main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_at
INNER JOIN `sales_order_grid` AS `grid_table` ON main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_at WHERE (main_table.updated_at > '2025-11-05 17:27:33')I can repeat this process of insert order, run cron, inspect query and see that the timestamp moves along as expected.
$ grep main_table.updated_at var/debug/db.log
INNER JOIN `sales_order_grid` AS `grid_table` ON main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_at
INNER JOIN `sales_order_grid` AS `grid_table` ON main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_at WHERE (main_table.updated_at > '2025-11-05 17:27:33')
INNER JOIN `sales_order_grid` AS `grid_table` ON main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_at WHERE (main_table.updated_at > '2025-11-05 17:29:20')
INNER JOIN `sales_order_grid` AS `grid_table` ON main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_at WHERE (main_table.updated_at > '2025-11-05 17:29:32')
INNER JOIN `sales_order_grid` AS `grid_table` ON main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_at WHERE (main_table.updated_at > '2025-11-05 17:29:46')
INNER JOIN `sales_order_grid` AS `grid_table` ON main_table.entity_id = grid_table.entity_id AND main_table.updated_at > grid_table.updated_at WHERE (main_table.updated_at > '2025-11-05 17:29:52')Contribution checklist (*)
- Pull request has a meaningful description of its purpose
- All commits are accompanied by meaningful commit messages
- All new or changed code is covered with unit/integration tests (if applicable)
- README.md files for modified modules are updated and included in the pull request if any README.md predefined sections require an update
- All automated tests passed successfully (all builds are green)
Metadata
Metadata
Assignees
Labels
Type
Projects
Status