
Conversation

@yaugenst-flex (Collaborator) commented Oct 8, 2025

Summary

  • serialize the adjoint traced-key paths into the forward simulation’s HDF5 metadata (__tidy3d_traced_field_keys__) so cache hashes differ when the tracked set changes (see the sketch after this list)
  • move the Tracer, FieldMap, and TracerKeys helpers into tidy3d/components/autograd and reuse them from both the serializer and the web pipeline
  • add a regression test confirming simulations with identical statics but different tracer sets now hash differently
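
For illustration, a minimal sketch of the serialization idea using h5py directly; the helper name, path encoding, and file handling are assumptions of this sketch, not the actual tidy3d implementation:

    import json

    import h5py

    TRACED_FIELD_KEYS_ATTR = "__tidy3d_traced_field_keys__"

    def write_traced_keys(fname: str, traced_paths: list[tuple]) -> None:
        """Store traced-key paths in the file's root attrs (hypothetical helper)."""
        # Encode each path deterministically so the byte payload, and hence the
        # cache hash, changes exactly when the traced set changes.
        keys = sorted("/".join(map(str, path)) for path in traced_paths)
        with h5py.File(fname, "a") as f_handle:
            f_handle.attrs[TRACED_FIELD_KEYS_ATTR] = json.dumps(keys)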

Alternatives Considered

  • keep relying on the existing sidecar TracerKeys file: this leaves the cache blind to tracer changes, because the hash sees identical HDF5 exports
  • expose traced keys as a new model field: that would surface runtime autograd state in the public schema even though users neither set nor persist it
  • store the paths in model.attrs: those attrs are user-controlled metadata and cached JSON already assumes they are non-functional

Rationale

Writing the traced-key summary to an HDF5 attribute keeps the metadata with the exported simulation, feeds into the hashing logic, and avoids making autograd bookkeeping part of the user-facing schema.
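
Because the cache hash is computed over the in-memory HDF5 export, the new attribute participates in the hash automatically. A rough sketch of that flow, assuming to_hdf5 accepts a file-like object and using sha256 purely for illustration:

    import hashlib
    import io

    def simulation_hash(sim) -> str:
        """Hash the full HDF5 export; root attrs are part of the byte stream."""
        buf = io.BytesIO()
        sim.to_hdf5(buf)  # assumed to also write __tidy3d_traced_field_keys__
        return hashlib.sha256(buf.getvalue()).hexdigest()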

Greptile Overview

Updated On: 2025-10-08 08:26:25 UTC

Summary

This PR fixes a critical autograd cache bug where simulations with identical static parameters but different traced field sets would incorrectly share cache entries. The core issue was that the cache hash only considered the HDF5 file contents (which are identical for the same static simulation) but ignored the sidecar `TracerKeys` file that tracks which fields are being traced by autograd.

The solution serializes traced field keys directly into the HDF5 metadata using a new attribute `__tidy3d_traced_field_keys__`. This ensures that simulations tracking different autograd parameters generate different cache hashes, preventing incorrect cache hits that could lead to wrong gradient computations in optimization workflows.
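
A toy demonstration of the mechanism, independent of tidy3d: two files with identical datasets but different root attrs yield different byte streams, and therefore different hashes.

    import hashlib
    import io

    import h5py

    def toy_file_hash(traced_keys_json: str) -> str:
        """Build an in-memory HDF5 file and hash its bytes."""
        buf = io.BytesIO()
        with h5py.File(buf, "w") as f:
            f.create_dataset("eps", data=[1.0, 2.0])  # identical "static" content
            f.attrs["__tidy3d_traced_field_keys__"] = traced_keys_json
        return hashlib.sha256(buf.getvalue()).hexdigest()

    assert toy_file_hash('["structures/0/medium"]') != toy_file_hash('["structures/1/medium"]')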

The implementation includes a significant refactoring that moves the `Tracer`, `FieldMap`, and `TracerKeys` helper classes from `tidy3d/web/api/autograd/utils.py` into a new shared module `tidy3d/components/autograd/field_map.py`. This centralization enables code reuse between the web pipeline and the serialization components while maintaining consistency in traced field metadata handling.
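
The coverage excerpt further down shows part of the new module's shape. A condensed sketch of what such a helper might look like; the signatures are inferred from that excerpt, and the rest may differ from the real field_map.py:

    from typing import Any

    PathType = tuple[Any, ...]

    def _encoded_path(path: PathType) -> str:
        """Encode a traced-field path as a stable string (hypothetical encoding)."""
        return "/".join(str(part) for part in path)

    class TracerKeys:
        def __init__(self, keys: list[PathType]) -> None:
            self.keys = keys

        def encoded_keys(self) -> list[str]:
            """Return the JSON-encoded representation of keys."""
            return [_encoded_path(path) for path in self.keys]

        @classmethod
        def from_field_mapping(cls, field_mapping: dict[PathType, Any]) -> "TracerKeys":
            """Collect traced paths from an AutogradFieldMap-like mapping."""
            return cls(keys=list(field_mapping.keys()))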

A new regression test test_sim_hash_changes_with_traced_keys() validates that when the traced field set changes (by converting a structure to static), both the field map and the simulation hash change accordingly, ensuring proper cache invalidation.
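
In spirit, that test does something like the following; make_sim is a placeholder for the fixture that builds the simulation, while _strip_traced_fields and _hash_self come from the flow shown in the sequence diagram below:

    def test_sim_hash_changes_with_traced_keys():
        """Same statics, different traced sets -> different field maps and hashes."""
        sim_traced = make_sim(traced=True)   # placeholder: a structure parameter is traced
        sim_static = make_sim(traced=False)  # placeholder: same structure, made static

        # the traced field maps must differ...
        assert sim_traced._strip_traced_fields() != sim_static._strip_traced_fields()
        # ...and so must the resulting cache hashes
        assert sim_traced._hash_self() != sim_static._hash_self()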

Important Files Changed

Changed Files
Filename | Score | Overview
tests/test_components/autograd/test_autograd.py | 5/5 | Adds regression test validating hash changes when traced keys differ
tidy3d/components/base.py | 4/5 | Implements core fix by adding traced field keys to HDF5 metadata
tidy3d/web/api/autograd/utils.py | 5/5 | Refactors to import helper classes from centralized location
tidy3d/components/autograd/field_map.py | 4/5 | New module containing extracted autograd field mapping utilities
tidy3d/web/api/autograd/io_utils.py | 5/5 | Updates imports to use centralized field mapping classes

Confidence score: 4/5

  • This PR addresses a critical correctness issue in autograd caching with a well-reasoned solution
  • Score reflects solid implementation with proper testing, though the core change touches critical caching logic
  • Pay close attention to tidy3d/components/base.py where the HDF5 metadata serialization is implemented

Sequence Diagram

sequenceDiagram
    participant User
    participant Simulation
    participant BaseModel as "Tidy3dBaseModel"
    participant FieldMap as "FieldMap/TracerKeys"
    participant HDF5 as "HDF5 File"
    participant Cache as "Cache System"

    User->>Simulation: "Create simulation with traced fields"
    Simulation->>BaseModel: "_strip_traced_fields()"
    BaseModel-->>Simulation: "AutogradFieldMap"
    
    User->>Simulation: "Export to HDF5"
    Simulation->>BaseModel: "to_hdf5()"
    BaseModel->>BaseModel: "_serialized_traced_field_keys()"
    BaseModel->>FieldMap: "TracerKeys.from_field_mapping()"
    FieldMap-->>BaseModel: "TracerKeys instance"
    BaseModel->>FieldMap: "tracer_keys.json()"
    FieldMap-->>BaseModel: "JSON serialized keys"
    BaseModel->>HDF5: "Store traced keys in attributes"
    BaseModel->>HDF5: "Write simulation data"
    
    User->>Cache: "Check cache hash"
    Cache->>BaseModel: "_hash_self()"
    BaseModel->>BaseModel: "to_hdf5() (in memory)"
    Note over BaseModel,HDF5: "HDF5 now includes traced keys in metadata"
    BaseModel-->>Cache: "Hash includes traced keys"
    
    Note over User,Cache: "Different traced sets now produce different hashes"

@greptile-apps (bot) left a comment


5 files reviewed, no comments


@yaugenst-flex force-pushed the FXC-3376-hash-must-include-traced-keys-remote-autograd-cache-bug branch from 161273f to a2fa0ec on October 8, 2025, 08:34
github-actions (bot) commented Oct 8, 2025

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

  • tidy3d/components/autograd/field_map.py (97.0%): Missing lines 62
  • tidy3d/components/base.py (100%)
  • tidy3d/web/api/autograd/io_utils.py (100%)

Summary

  • Total: 45 lines
  • Missing: 1 line
  • Coverage: 97%

tidy3d/components/autograd/field_map.py

  58     )
  59 
  60     def encoded_keys(self) -> list[str]:
  61         """Return the JSON-encoded representation of keys."""
! 62         return [_encoded_path(path) for path in self.keys]
  63 
  64     @classmethod
  65     def from_field_mapping(
  66         cls,
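
The single uncovered line is the body of encoded_keys(); a direct call along these lines would exercise it (constructor signature assumed, as in the sketch above):

    # Hypothetical: covers field_map.py line 62 by calling encoded_keys() directly.
    assert TracerKeys(keys=[("structures", 0, "medium")]).encoded_keys()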

Review thread on the new serialization code in to_hdf5:

    add_data_to_file(data_dict=self.dict())
    # store the traced-key metadata alongside the exported data
    traced_keys_payload = self._serialized_traced_field_keys()
    if traced_keys_payload:
        f_handle.attrs[TRACED_FIELD_KEYS_ATTR] = traced_keys_payload
Collaborator

Wondering if it makes a bit more sense to use a new dataset (like we do for the simulation json) rather than the attrs directly. Feels like a bit of an abuse of the HDF5 attrs' intended purpose.

For the simulation json, I remember Tyler first tried putting it in attrs too, but it can be a problem when it becomes large. Here it's unlikely that the json will be very big, but still potentially better to just make a dataset?
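
For reference, the two storage options being weighed look roughly like this; the helper and dataset name are hypothetical:

    import h5py

    TRACED_FIELD_KEYS_ATTR = "__tidy3d_traced_field_keys__"

    def store_keys(fname: str, payload: str, as_dataset: bool) -> None:
        """Illustrate both options from this thread (hypothetical helper)."""
        with h5py.File(fname, "a") as f_handle:
            if as_dataset:
                # suggested alternative: a dedicated dataset, as done for the simulation json
                f_handle.create_dataset("traced_field_keys", data=payload)
            else:
                # current PR: a root attribute
                f_handle.attrs[TRACED_FIELD_KEYS_ATTR] = payload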

Collaborator Author

I would actually argue that this is metadata and exactly the kind of thing that should go in HDF5 attrs 😄 - it's a short list of paths, it never grows beyond a couple of bytes, never needs random access, and HDF5 attrs are intended for exactly that style of short descriptive information. I think it's different from the simulation json, because that really isn't metadata and, as you said, is potentially quite large.

Collaborator

Alright, yeah, fine by me.

Collaborator

Hm, but I also feel uneasy about this. What is the disadvantage to just making a dataset? I feel it makes things more explicit that this is data we care about. It's not purely metadata.

Collaborator Author

It's a fair point; maybe the line between metadata and model data is a bit fuzzy here. I lean toward calling it metadata, because it's a deterministic byproduct of the actual model and the user inputs. If the simulation is rebuilt from scratch, the same traced paths will be "rediscovered", so it's closer to a property than part of the model.
Including it as a dataset might also require some additional plumbing for the HDF5 reader/writer (haven't checked); attrs don't show up in the JSON reconstruction. That being said, the difference is mostly semantic 🤷

Collaborator

OK, I asked ChatGPT and it agrees with Yannick :D, so I'm fine with the current implementation.

Short answer:

  • Use HDF5 attributes if the new info is small (a few strings/ints/bools), describes the run, and you'll read it whole.
  • Use a dataset only if it's large, tabular/array-like, needs chunking/compression, or will be queried/sliced.
  • This info is metadata (provenance/config that affects processing, not the simulation arrays themselves).

Why attrs fit here:

  • Attributes are meant for descriptive/run-level info (schema versions, flags, tool versions).
  • Cheap to read, no chunking needed, and they live with the object they describe (usually the root group).
  • Keep them small; if the info grows or becomes structured, promote it to a dataset or a small /cache_key group.

Cache/hash tips (to avoid fragile keys):

  • Scope the key: hash a defined subset only (inputs + this new flag/signature + relevant versions), not the whole file.
  • Canonicalization: store under /cache_key (either as attributes or tiny datasets). When building the hash, read the keys, sort by name, encode deterministically (e.g., canonical JSON with sorted keys / UTF-8, or CBOR), and hash.
  • Versioning: include cache_key_version and producer/tool version fields. Bump when semantics change.
  • Defaults for back-compat: treat "missing field" as an explicit default (e.g., "backend_signature":"v0"), so old files don't slip through incorrectly.
  • Auditability: also store the computed cache_key back into the file (attr) so you can explain cache hits later.

Rule of thumb:

  • Attrs: small, descriptive, per-run constants → ✅
  • Dataset: big/structured, per-step/per-record data, or anything you might slice/compress → ✅

Given your description, I'd put the new field(s) as root attributes (or attributes under /cache_key) and include them in the deterministic hash. Yes, consider it metadata.
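
The canonicalization advice above boils down to something like the following; this is illustrative, not tidy3d code:

    import hashlib
    import json

    def cache_key(fields: dict, version: str = "v0") -> str:
        """Deterministic key: sorted keys, compact separators, UTF-8, then hash."""
        payload = {"cache_key_version": version, **fields}
        blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()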

@yaugenst-flex force-pushed the FXC-3376-hash-must-include-traced-keys-remote-autograd-cache-bug branch from a2fa0ec to 7236cf8 on October 8, 2025, 09:20