-
Notifications
You must be signed in to change notification settings - Fork 428
Add edge sorting method to rr_graph_storage #3316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
24e62ad
to
afe997a
Compare
2108493
to
a9472ca
Compare
Running some performance tests and runtime numbers don't seem to be changed much. Now that I think about it this might actually be faster than the in-place sorting we had before. This method swaps only some indices when sorting and only copying edge data around once, while the previous method swapped (and copied) edge data a lot more during sorting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @AmirhosseinPoolad.
I suggested a few minor changes.
new_edge_src_node_[RREdgeId(new_index)] = edge_src_node_[RREdgeId(edge_index)]; | ||
new_edge_dest_node_[RREdgeId(new_index)] = edge_dest_node_[RREdgeId(edge_index)]; | ||
new_edge_switch_[RREdgeId(new_index)] = edge_switch_[RREdgeId(edge_index)]; | ||
new_edge_remapped_[RREdgeId(new_index)] = edge_remapped_[RREdgeId(edge_index)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use a vtr::vector<RREdgeId, RRedgeId> for edge_indices, you can use vtr::vector::pairs() method to automatically iterate over new_index and edge_index. This can also help get rid of type casting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would kind of obfuscate what the code is doing and make it harder to understand. I did the new_index casting in a separate line to make these four lines more readable.
Ran the VTR circuits, results look very good.
This appears to be a pretty good change overall. New code is (imo) more readable, less fragile and faster with only a slight memory cost. |
The largest circuits look like a larger men increase — maybe 8%. We could reallocate the vectors one by one at some cpu cost but probably less peak memory. I’d zoom in on LU32 or neuron and try that.VaughnOn Oct 21, 2025, at 8:44 PM, Amir Poolad ***@***.***> wrote:AmirhosseinPoolad left a comment (verilog-to-routing/vtr-verilog-to-routing#3316)
Ran the VTR circuits, results look very good.
Metric
Vs Master
Total VPR runtime
0.968
Max mem. usage
1.030
RRG generation time
0.805
This appears to be a pretty good change overall. New code is (imo) more readable, less fragile and faster with only a slight memory cost.
@vaughnbetz
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>
|
This helps with potential linkage errors
Did the permutations one by one instead of all at once like you suggested. It certainly helps with memory usage and doesn't really seem to result in a noticeable slowdown that couldn't be attributed to machine load/noise.
Same results, but only for circuits whose memory usage exceed 200MB:
Detailed results are added to the previous link. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Code looks good, but I have a question / concern about .shrink_to_fit and .capacity. I think we can make the code more memory size safe (and maybe it will save some memory, depending on what the standard library is doing).
|
||
// Since vec could have any type, we need to figure out it's type to allocate new_vec. | ||
// The scary std::remove_reference stuff does exactly that. This does nothing other than building a new 'vec' sized vector. | ||
typename std::remove_reference<decltype(vec)>::type new_vec(vec.size(), default_value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we have to do a .reserve or .shrink_to_fit to make sure this new_vec does not have any extra capacity? We're careful to do that in the RR graph, so I imagine the original edge_src_node_ etc. have done that. If we don't do it for the new one, there's some chance the standard library created a vector with extra capacity for us, which we don't want with these big vectors.
Seems worth testing, and also a good practice to use .shrink_to_fit for safety in case some versions of the standard library default to capacity == requested constructor size for big vectors and some default to something else (e.g. 2x size for any constructor that asks for a size).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t_rr_graph_storage actually has a shrink_to_fit method that does this for all the vectors in t_rr_graph_storage. Currently it's only used for building tileable RR graphs. I think we should test calling t_rr_graph_storage::shrink_to_fit at the end of RR graph generation in a separate PR, since doing it in the sorting function wouldn't actually change the peak memory usage of the sorting function and would only help in "long term." It might also have an extra memory and runtime cost (The underlying allocator might have to allocate a new smaller buffer and copy all the elements to it.)
Ran Titan on the current version of the code (without shrink_to_fit). Full data is added to the link.
Pretty sure the runtime values were affected by machine load. Packing (which has nothing to do with the RR Graph) took 4.6% longer and placement took 7.5% longer (with the exact same #swaps) and routing took 1.2% longer. QoR is exactly the same as master which strongly implies that the RR Graph is not changed at all. I also wrote the RR Graph for stereo_vision to a file and the diff was empty. I think the new code uses less memory here since the rr graph generation code over-allocates the edges (first allocates 10x number of nodes for edges and doubles it each time it needs more) but the new sorted edge vectors allocate precisely what they need. |
Thanks. Looks OK. |
Hmmm .... good point that the shrink_to_fit could hurt if we get a copy. How about this: just print out the .size() and .capacity() of these four vectors on some large design. If they're equal, I'll stop worrying about it, since at least one version of the standard library does the logical thing and sets .capacity to .size when the constructor specifies an initial size. Let me know what you find ... |
Previously we used a non-conformant proxy iterator to sort the rr edges (which were stored in a struct-of-array fashion). That was very fragile and resulted in multiple problems when updating compilers and standard libraries. This PR addresses the issues we had with the previous methods we used to sort edges by using a slower but conformant method.
Sorting edges is now much slower and requires building 4 entire new arrays for edges, but since we only sort two times when building the RR graph it should be fine.