
Conversation

@AmirhosseinPoolad (Contributor) commented Oct 21, 2025

Previously we used a non-conformant proxy iterator to sort the RR edges (which are stored in a struct-of-arrays fashion). That was very fragile and caused multiple problems when updating compilers and standard libraries. This PR replaces the previous edge-sorting approach with a slower but standards-conformant method.

Sorting edges is now slower and requires building four entirely new edge arrays, but since we only sort twice while building the RR graph, this should be fine.
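
To make the idea concrete, here is a minimal sketch of the permutation-plus-gather pattern described above. It is an illustration rather than the PR's code: plain std::vector and int fields stand in for the vtr::vector<RREdgeId, ...> columns, and sorting by source node is an assumed key.

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Illustration only: std::vector and int fields stand in for the real
// vtr::vector<RREdgeId, ...> columns, and the sort key (source node) is an assumption.
struct EdgeColumns {
    std::vector<int> src_node;
    std::vector<int> dest_node;
    std::vector<int> switch_id;
};

void sort_edges(EdgeColumns& edges) {
    const std::size_t num_edges = edges.src_node.size();

    // 1. Sort a vector of plain indices rather than the struct-of-arrays data
    //    itself, so std::sort never needs a proxy iterator.
    std::vector<std::size_t> order(num_edges);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return edges.src_node[a] < edges.src_node[b];
    });

    // 2. Gather every column into freshly allocated arrays in the sorted order,
    //    then move them over the old columns.
    std::vector<int> new_src(num_edges), new_dest(num_edges), new_switch(num_edges);
    for (std::size_t new_index = 0; new_index < num_edges; ++new_index) {
        const std::size_t edge_index = order[new_index];
        new_src[new_index] = edges.src_node[edge_index];
        new_dest[new_index] = edges.dest_node[edge_index];
        new_switch[new_index] = edges.switch_id[edge_index];
    }
    edges.src_node = std::move(new_src);
    edges.dest_node = std::move(new_dest);
    edges.switch_id = std::move(new_switch);
}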

@github-actions github-actions bot added the lang-cpp C/C++ code label Oct 21, 2025
@AmirhosseinPoolad AmirhosseinPoolad changed the title [WIP] Add edge sorting method to rr_graph_storage Add edge sorting method to rr_graph_storage Oct 21, 2025
@AmirhosseinPoolad (Contributor, Author)

I ran some performance tests and the runtime numbers don't seem to have changed much. Now that I think about it, this might actually be faster than the in-place sorting we had before: this method swaps only indices while sorting and copies the edge data around just once, while the previous method swapped (and copied) edge data far more during sorting.

@soheilshahrouz (Contributor) left a comment

Thanks @AmirhosseinPoolad.

I suggested a few minor changes.

new_edge_src_node_[RREdgeId(new_index)] = edge_src_node_[RREdgeId(edge_index)];
new_edge_dest_node_[RREdgeId(new_index)] = edge_dest_node_[RREdgeId(edge_index)];
new_edge_switch_[RREdgeId(new_index)] = edge_switch_[RREdgeId(edge_index)];
new_edge_remapped_[RREdgeId(new_index)] = edge_remapped_[RREdgeId(edge_index)];
Contributor

If you use a vtr::vector<RREdgeId, RREdgeId> for edge_indices, you can use the vtr::vector::pairs() method to automatically iterate over new_index and edge_index. This would also get rid of the type casting.

Contributor Author

I think this would somewhat obfuscate what the code is doing and make it harder to understand. I did the new_index cast on a separate line to make these four lines more readable.

@github-actions github-actions bot added the VPR VPR FPGA Placement & Routing Tool label Oct 21, 2025
@AmirhosseinPoolad (Contributor, Author)

Ran the VTR circuits; the results look very good.

Metric                 Vs Master
Total VPR runtime      0.968
Max mem. usage         1.030
RRG generation time    0.805

This appears to be a pretty good change overall. The new code is (imo) more readable, less fragile, and faster, with only a slight memory cost.

@vaughnbetz (Contributor) commented Oct 22, 2025 via email

@AmirhosseinPoolad (Contributor, Author)

I did the permutations one by one instead of all at once, as you suggested. It certainly helps with memory usage and doesn't show a noticeable slowdown beyond what could be attributed to machine load/noise.

Metric                 At once vs Master    One by one vs Master
Total VPR runtime      0.968                0.971
Max mem. usage         1.030                1.012
RRG generation time    0.805                0.799

Same results, but only for circuits whose memory usage exceeds 200 MB:

Metric                 At once vs Master    One by one vs Master
Total VPR runtime      0.960                0.967
Max mem. usage         1.055                1.025
RRG generation time    0.803                0.800

Detailed results are added to the previous link.
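
For illustration, here is a minimal sketch of the one-by-one variant under the same simplifying assumptions as the earlier sketch (plain std::vector columns and a precomputed sort order): only one scratch column is alive at a time, so the peak overhead is one extra column rather than four.

#include <cstddef>
#include <utility>
#include <vector>

// Apply a precomputed permutation to a single column: order[new_index] is the
// old position whose element should end up at new_index.
template <typename T>
void apply_permutation(std::vector<T>& col, const std::vector<std::size_t>& order) {
    std::vector<T> scratch(col.size());
    for (std::size_t new_index = 0; new_index < order.size(); ++new_index) {
        scratch[new_index] = col[order[new_index]];
    }
    col = std::move(scratch); // the old buffer is released before the next column is permuted
}

// Hypothetical call sites, permuting one column at a time:
//   apply_permutation(edge_src_node, order);
//   apply_permutation(edge_dest_node, order);
//   apply_permutation(edge_switch, order);
//   apply_permutation(edge_remapped, order);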

@vaughnbetz (Contributor) left a comment

Nice! The code looks good, but I have a question/concern about .shrink_to_fit and .capacity. I think we can make the code safer about memory size (and maybe save some memory, depending on what the standard library is doing).


// Since vec could have any type, we need to figure out its type to allocate new_vec.
// The scary std::remove_reference stuff does exactly that; this does nothing other than build a new vector the same size as 'vec'.
typename std::remove_reference<decltype(vec)>::type new_vec(vec.size(), default_value);
Contributor

I wonder if we have to do a .reserve or .shrink_to_fit to make sure this new_vec does not have any extra capacity. We're careful to do that in the RR graph, so I imagine the original edge_src_node_ etc. have done that. If we don't do it for the new one, there's some chance the standard library created a vector with extra capacity for us, which we don't want with these big vectors.

Seems worth testing, and also a good practice to use .shrink_to_fit for safety in case some versions of the standard library default to capacity == requested constructor size for big vectors and some default to something else (e.g. 2x size for any constructor that asks for a size).
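
A minimal sketch of the safeguard being discussed, using std::vector for illustration (whether the real vtr::vector needs the same treatment is an assumption): shrink_to_fit() is a non-binding request, but it makes the intent explicit on these large per-edge vectors.

#include <cassert>
#include <cstddef>
#include <vector>

int main() {
    const std::size_t num_edges = 1'000'000;

    // The sized constructor usually yields capacity == size, but the standard
    // only guarantees capacity >= size.
    std::vector<int> new_vec(num_edges, 0);
    new_vec.shrink_to_fit(); // non-binding request to drop any slack capacity

    assert(new_vec.capacity() >= new_vec.size()); // guaranteed; equality is the expectation
    return 0;
}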

Contributor Author

t_rr_graph_storage actually has a shrink_to_fit method that does this for all the vectors in t_rr_graph_storage. Currently it's only used when building tileable RR graphs. I think we should test calling t_rr_graph_storage::shrink_to_fit at the end of RR graph generation in a separate PR, since doing it in the sorting function wouldn't change the peak memory usage of the sorting function and would only help in the long term. It might also have an extra memory and runtime cost (the underlying allocator might have to allocate a new, smaller buffer and copy all the elements to it).

@AmirhosseinPoolad (Contributor, Author) commented Oct 23, 2025

Ran Titan on the current version of the code (without shrink_to_fit). Full data is added to the link.

Metric                 Vs Master
Total VPR runtime      1.050
Max mem. usage         0.996
RRG generation time    0.878

I'm pretty sure the runtime values were affected by machine load. Packing (which has nothing to do with the RR graph) took 4.6% longer, placement took 7.5% longer (with the exact same number of swaps), and routing took 1.2% longer. QoR is exactly the same as master, which strongly implies that the RR graph has not changed at all. I also wrote the RR graph for stereo_vision to a file and the diff was empty.

I think the new code uses less memory here because the RR graph generation code over-allocates the edge arrays (it first allocates 10x the number of nodes for edges and doubles that each time it needs more), while the new sorted edge vectors allocate exactly what they need.

@vaughnb-cerebras

Thanks. Looks OK.
Placement may include rr-graph building time; we need to build it to create the placement delay estimator, so depending on where we put the timers it may end up in placement time. You should be able to tell from the log file, as we should print messages when we create the rr-graph, compute the placement delay matrix, etc.
I still think we should do a shrink_to_fit since it gives us more control .... Can you try that out?

@vaughnb-cerebras

Hmmm .... good point that the shrink_to_fit could hurt if we get a copy. How about this: just print out the .size() and .capacity() of these four vectors on some large design. If they're equal, I'll stop worrying about it, since at least one version of the standard library does the logical thing and sets .capacity to .size when the constructor specifies an initial size. Let me know what you find ...
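
A sketch of the diagnostic being requested; the helper below is hypothetical, and the commented-out call sites reuse the member names from the snippet quoted earlier, assuming vtr::vector exposes size() and capacity() like std::vector does.

#include <cstddef>
#include <cstdio>

// Print size vs. capacity for one container; anything exposing size() and
// capacity() (std::vector, and presumably vtr::vector) fits.
template <typename Vec>
void report_capacity(const char* name, const Vec& v) {
    std::printf("%-20s size = %zu  capacity = %zu\n", name,
                static_cast<std::size_t>(v.size()),
                static_cast<std::size_t>(v.capacity()));
}

// Hypothetical call sites on a large design:
//   report_capacity("edge_src_node_", edge_src_node_);
//   report_capacity("edge_dest_node_", edge_dest_node_);
//   report_capacity("edge_switch_", edge_switch_);
//   report_capacity("edge_remapped_", edge_remapped_);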
