fix(autograd): include traced keys in HDF5 hash input #2875
base: develop
Conversation
5 files reviewed, no comments
Force-pushed 161273f to a2fa0ec
Diff Coverage
Diff: origin/develop...HEAD, staged and unstaged changes
Summary
tidy3d/components/autograd/field_map.py
add_data_to_file(data_dict=self.dict())
traced_keys_payload = self._serialized_traced_field_keys()
if traced_keys_payload:
    f_handle.attrs[TRACED_FIELD_KEYS_ATTR] = traced_keys_payload
Wondering if it makes a bit more sense to use a new dataset (like we do for the simulation json) rather than the attrs directly. Feels like a bit of an abuse of even the hdf5 attrs' purpose.

For the simulation json, I remember Tyler first tried putting it in attrs too, but it can be a problem when it becomes large. Here it's unlikely that the json will be very big, but still potentially better to just make a dataset?
I would actually argue that this here is metadata and exactly the kind of thing that should go in hdf5 attrs 😄 - it's a short list of paths, it never grows beyond a couple of bytes, it never needs random access, and hdf5 attrs are intended for exactly that style of short descriptive information. I think it's different from the simulation json, because that really isn't metadata and, as you said, is potentially quite large.
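To make the attrs-style usage concrete, here is a minimal sketch (assuming `h5py`; the attribute name comes from this PR, but the JSON payload and the example paths are only illustrative) of writing and reading such a short metadata attribute:

```python
# Minimal sketch of the attrs-style usage discussed above (assumes h5py).
# The attribute name matches the PR; the payload shape and paths are illustrative only.
import json

import h5py

traced_paths = ["structures/0/medium/permittivity"]  # hypothetical traced paths

with h5py.File("simulation.hdf5", "w") as f:
    f.create_dataset("data", data=[1.0, 2.0, 3.0])  # stand-in for the real datasets
    # short, descriptive, read back whole -> a good fit for an attribute
    f.attrs["__tidy3d_traced_field_keys__"] = json.dumps(traced_paths)

with h5py.File("simulation.hdf5", "r") as f:
    keys = json.loads(f.attrs["__tidy3d_traced_field_keys__"])
    print(keys)
```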
Alright, yeah, fine by me.
hm, but I also feel uneasy about this. what is the disadvantage to just making a dataset? I feel it makes things more explicit that this is some data that we care about. It's not purely metadata.
it's a fair point, maybe the line between metadata and model data is a bit fuzzy here. i lean toward calling it metadata, because it's a deterministic byproduct of the actual model and the user inputs. if the simulation is rebuilt from scratch, then the same traced paths will be "rediscovered", so it's closer to a property than to part of the model.
including it as a dataset also might require some additional plumbing for the hdf5 reader/writer (haven't checked), and attrs don't show up in the json reconstruction. that being said, the difference is mostly semantic 🤷
ok I asked chat gpt and it agrees with yannick :D. so I'm fine with current implementation
Short answer:
- Use HDF5 attributes if the new info is small (a few strings/ints/bools), describes the run, and you'll read it whole.
- Use a dataset only if it's large, tabular/array-like, needs chunking/compression, or will be queried/sliced.

This info is metadata (provenance/config that affects processing, not the simulation arrays themselves).

Why attrs fit here
- Attributes are meant for descriptive/run-level info (schema versions, flags, tool versions).
- Cheap to read, no chunking needed, live with the object they describe (usually the root group).
- Keep them small; if it grows or becomes structured, promote to a dataset or a small /cache_key group.

Cache/hash tips (to avoid fragile keys)
- Scope the key: hash a defined subset only (inputs + this new flag/signature + relevant versions), not the whole file.
- Canonicalization: store under /cache_key (either as attributes or tiny datasets). When building the hash, read keys, sort by name, encode deterministically (e.g., canonical JSON with sorted keys / UTF-8, or CBOR), and hash.
- Versioning: include cache_key_version and producer/tool version fields. Bump when semantics change.
- Defaults for back-compat: treat "missing field" as an explicit default (e.g., "backend_signature":"v0"), so old files don't slip through incorrectly.
- Auditability: also store the computed cache_key back into the file (attr) so you can explain cache hits later.

Rule of thumb
- Attrs: small, descriptive, per-run constants → ✅
- Dataset: big/structured, per-step/per-record data or anything you might slice/compress → ✅

Given your description, I'd put the new field(s) as root attributes (or attributes under /cache_key) and include them in the deterministic hash. Yes—consider it metadata.
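A minimal sketch of the deterministic-key recipe quoted above (sorted keys, canonical JSON, UTF-8, then SHA-256); the field names below are made up for illustration and are not tidy3d's actual cache-key schema:

```python
# Illustrative only: build a deterministic cache key from a small mapping of inputs,
# following the recipe above (sorted keys, canonical JSON, UTF-8 encoding, then hash).
import hashlib
import json


def cache_key(fields: dict) -> str:
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_traced = cache_key({"cache_key_version": 1, "traced_field_keys": ["structures/0/medium"]})
key_static = cache_key({"cache_key_version": 1, "traced_field_keys": []})
assert key_traced != key_static  # a different traced set yields a different key
```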
Force-pushed a2fa0ec to 7236cf8
Summary
- Serialize the traced field keys into the exported HDF5 as a new attribute (`__tidy3d_traced_field_keys__`) so cache hashes differ when the tracked set changes
- Move the `Tracer`, `FieldMap`, and `TracerKeys` helpers into `tidy3d/components/autograd` and reuse them from both the serializer and the web pipeline

Alternatives Considered
- Keeping the traced keys only in the sidecar `TracerKeys` file: this leaves the cache blind to tracer changes, because the hash sees identical HDF5 exports
- Storing the keys in `model.attrs`: those attrs are user-controlled metadata, and cached JSON already assumes they are non-functional

Rationale
Writing the traced-key summary to an HDF5 attribute keeps the metadata with the exported simulation, feeds into the hashing logic, and avoids making autograd bookkeeping part of the user-facing schema.
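As a toy illustration of that rationale (not the tidy3d implementation; the helper names and payload format are hypothetical, only the attribute name comes from the PR), two exports with identical datasets but different traced-key attributes no longer hash the same, so the cache can tell them apart:

```python
# Toy illustration: identical datasets, different traced-key attribute -> different file hash.
import hashlib

import h5py


def export_sim(path: str, traced_keys: list[str]) -> None:
    """Write the same 'static' content, plus the traced-key attribute if any keys are traced."""
    with h5py.File(path, "w") as f:
        f.create_dataset("permittivity", data=[2.25], track_times=False)
        if traced_keys:
            f.attrs["__tidy3d_traced_field_keys__"] = ",".join(traced_keys)


def file_hash(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


export_sim("sim_traced.hdf5", ["structures/0/medium/permittivity"])
export_sim("sim_static.hdf5", [])
assert file_hash("sim_traced.hdf5") != file_hash("sim_static.hdf5")
```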
Greptile Overview
Updated On: 2025-10-08 08:26:25 UTC
Summary
This PR fixes a critical autograd cache bug where simulations with identical static parameters but different traced field sets would incorrectly share cache entries. The core issue was that the cache hash only considered the HDF5 file contents (which are identical for the same static simulation) but ignored the sidecar `TracerKeys` file that tracks which fields are being traced by autograd.

The solution serializes traced field keys directly into the HDF5 metadata using a new attribute, `__tidy3d_traced_field_keys__`. This ensures that simulations tracking different autograd parameters generate different cache hashes, preventing incorrect cache hits that could lead to wrong gradient computations in optimization workflows.

The implementation includes a significant refactoring that moves the `Tracer`, `FieldMap`, and `TracerKeys` helper classes from `tidy3d/web/api/autograd/utils.py` into a new shared module, `tidy3d/components/autograd/field_map.py`. This centralization enables code reuse between the web pipeline and the serialization components while maintaining consistency in traced field metadata handling.

A new regression test, `test_sim_hash_changes_with_traced_keys()`, validates that when the traced field set changes (by converting a structure to static), both the field map and the simulation hash change accordingly, ensuring proper cache invalidation.

Important Files Changed
Confidence score: 4/5
Sequence Diagram