
Conversation

@yaugenst-flex (Collaborator) commented Oct 8, 2025

Summary

  • serialize the adjoint traced-key paths into the forward simulation’s HDF5 metadata (__tidy3d_traced_field_keys__) so cache hashes differ when the tracked set changes (see the sketch after this list)
  • move the Tracer, FieldMap, and TracerKeys helpers into tidy3d/components/autograd and reuse them from both the serializer and the web pipeline
  • add a regression test confirming simulations with identical statics but different tracer sets now hash differently
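
For illustration, a minimal sketch of the serialization idea using h5py directly; the helper name, path encoding, and file handling are assumptions of this sketch, not the actual tidy3d implementation:

    import json

    import h5py

    TRACED_FIELD_KEYS_ATTR = "__tidy3d_traced_field_keys__"

    def write_traced_keys(fname: str, traced_paths: list[tuple]) -> None:
        """Store traced-key paths in the file's root attrs (hypothetical helper)."""
        # Encode each path deterministically so the byte payload, and hence the
        # cache hash, changes exactly when the traced set changes.
        keys = sorted("/".join(map(str, path)) for path in traced_paths)
        with h5py.File(fname, "a") as f_handle:
            f_handle.attrs[TRACED_FIELD_KEYS_ATTR] = json.dumps(keys)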

Alternatives Considered

  • keep relying on the existing sidecar TracerKeys file: this leaves the cache blind to tracer changes, because the hash sees identical HDF5 exports
  • expose traced keys as a new model field: that would surface runtime autograd state in the public schema even though users neither set nor persist it
  • store the paths in model.attrs: those attrs are user-controlled metadata and cached JSON already assumes they are non-functional

Rationale

Writing the traced-key summary to an HDF5 attribute keeps the metadata with the exported simulation, feeds into the hashing logic, and avoids making autograd bookkeeping part of the user-facing schema.
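
Because the cache hash is computed over the in-memory HDF5 export, the new attribute participates in the hash automatically. A rough sketch of that flow, assuming to_hdf5 accepts a file-like object and using sha256 purely for illustration:

    import hashlib
    import io

    def simulation_hash(sim) -> str:
        """Hash the full HDF5 export; root attrs are part of the byte stream."""
        buf = io.BytesIO()
        sim.to_hdf5(buf)  # assumed to also write __tidy3d_traced_field_keys__
        return hashlib.sha256(buf.getvalue()).hexdigest()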

Greptile Overview

Updated On: 2025-10-08 08:26:25 UTC

Summary

This PR fixes a critical autograd cache bug where simulations with identical static parameters but different traced field sets would incorrectly share cache entries. The core issue was that the cache hash only considered the HDF5 file contents (which are identical for the same static simulation) but ignored the sidecar `TracerKeys` file that tracks which fields are being traced by autograd.

The solution serializes traced field keys directly into the HDF5 metadata using a new attribute `__tidy3d_traced_field_keys__`. This ensures that simulations tracking different autograd parameters generate different cache hashes, preventing incorrect cache hits that could lead to wrong gradient computations in optimization workflows.
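
A toy demonstration of the mechanism, independent of tidy3d: two files with identical datasets but different root attrs yield different byte streams, and therefore different hashes.

    import hashlib
    import io

    import h5py

    def toy_file_hash(traced_keys_json: str) -> str:
        """Build an in-memory HDF5 file and hash its bytes."""
        buf = io.BytesIO()
        with h5py.File(buf, "w") as f:
            f.create_dataset("eps", data=[1.0, 2.0])  # identical "static" content
            f.attrs["__tidy3d_traced_field_keys__"] = traced_keys_json
        return hashlib.sha256(buf.getvalue()).hexdigest()

    assert toy_file_hash('["structures/0/medium"]') != toy_file_hash('["structures/1/medium"]')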

The implementation includes a significant refactoring that moves the `Tracer`, `FieldMap`, and `TracerKeys` helper classes from `tidy3d/web/api/autograd/utils.py` into a new shared module `tidy3d/components/autograd/field_map.py`. This centralization enables code reuse between the web pipeline and the serialization components while maintaining consistency in traced field metadata handling.
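
The coverage excerpt further down shows part of the new module's shape. A condensed sketch of what such a helper might look like; the signatures are inferred from that excerpt, and the rest may differ from the real field_map.py:

    from typing import Any

    PathType = tuple[Any, ...]

    def _encoded_path(path: PathType) -> str:
        """Encode a traced-field path as a stable string (hypothetical encoding)."""
        return "/".join(str(part) for part in path)

    class TracerKeys:
        def __init__(self, keys: list[PathType]) -> None:
            self.keys = keys

        def encoded_keys(self) -> list[str]:
            """Return the JSON-encoded representation of keys."""
            return [_encoded_path(path) for path in self.keys]

        @classmethod
        def from_field_mapping(cls, field_mapping: dict[PathType, Any]) -> "TracerKeys":
            """Collect traced paths from an AutogradFieldMap-like mapping."""
            return cls(keys=list(field_mapping.keys()))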

A new regression test test_sim_hash_changes_with_traced_keys() validates that when the traced field set changes (by converting a structure to static), both the field map and the simulation hash change accordingly, ensuring proper cache invalidation.
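
In spirit, that test does something like the following; make_sim is a placeholder for the fixture that builds the simulation, while _strip_traced_fields and _hash_self come from the flow shown in the sequence diagram below:

    def test_sim_hash_changes_with_traced_keys():
        """Same statics, different traced sets -> different field maps and hashes."""
        sim_traced = make_sim(traced=True)   # placeholder: a structure parameter is traced
        sim_static = make_sim(traced=False)  # placeholder: same structure, made static

        # the traced field maps must differ...
        assert sim_traced._strip_traced_fields() != sim_static._strip_traced_fields()
        # ...and so must the resulting cache hashes
        assert sim_traced._hash_self() != sim_static._hash_self()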

Important Files Changed

Changed Files
Filename | Score | Overview
tests/test_components/autograd/test_autograd.py | 5/5 | Adds regression test validating hash changes when traced keys differ
tidy3d/components/base.py | 4/5 | Implements core fix by adding traced field keys to HDF5 metadata
tidy3d/web/api/autograd/utils.py | 5/5 | Refactors to import helper classes from centralized location
tidy3d/components/autograd/field_map.py | 4/5 | New module containing extracted autograd field mapping utilities
tidy3d/web/api/autograd/io_utils.py | 5/5 | Updates imports to use centralized field mapping classes

Confidence score: 4/5

  • This PR addresses a critical correctness issue in autograd caching with a well-reasoned solution
  • Score reflects solid implementation with proper testing, though the core change touches critical caching logic
  • Pay close attention to tidy3d/components/base.py where the HDF5 metadata serialization is implemented

Sequence Diagram

sequenceDiagram
    participant User
    participant Simulation
    participant BaseModel as "Tidy3dBaseModel"
    participant FieldMap as "FieldMap/TracerKeys"
    participant HDF5 as "HDF5 File"
    participant Cache as "Cache System"

    User->>Simulation: "Create simulation with traced fields"
    Simulation->>BaseModel: "_strip_traced_fields()"
    BaseModel-->>Simulation: "AutogradFieldMap"
    
    User->>Simulation: "Export to HDF5"
    Simulation->>BaseModel: "to_hdf5()"
    BaseModel->>BaseModel: "_serialized_traced_field_keys()"
    BaseModel->>FieldMap: "TracerKeys.from_field_mapping()"
    FieldMap-->>BaseModel: "TracerKeys instance"
    BaseModel->>FieldMap: "tracer_keys.json()"
    FieldMap-->>BaseModel: "JSON serialized keys"
    BaseModel->>HDF5: "Store traced keys in attributes"
    BaseModel->>HDF5: "Write simulation data"
    
    User->>Cache: "Check cache hash"
    Cache->>BaseModel: "_hash_self()"
    BaseModel->>BaseModel: "to_hdf5() (in memory)"
    Note over BaseModel,HDF5: "HDF5 now includes traced keys in metadata"
    BaseModel-->>Cache: "Hash includes traced keys"
    
    Note over User,Cache: "Different traced sets now produce different hashes"

@greptile-apps (bot) left a comment


5 files reviewed, no comments


@yaugenst-flex force-pushed the FXC-3376-hash-must-include-traced-keys-remote-autograd-cache-bug branch from 161273f to a2fa0ec on October 8, 2025, 08:34
github-actions (bot) commented Oct 8, 2025

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

  • tidy3d/components/autograd/field_map.py (97.0%): Missing lines 62
  • tidy3d/components/base.py (100%)
  • tidy3d/web/api/autograd/io_utils.py (100%)

Summary

  • Total: 45 lines
  • Missing: 1 line
  • Coverage: 97%

tidy3d/components/autograd/field_map.py

  58     )
  59 
  60     def encoded_keys(self) -> list[str]:
  61         """Return the JSON-encoded representation of keys."""
! 62         return [_encoded_path(path) for path in self.keys]
  63 
  64     @classmethod
  65     def from_field_mapping(
  66         cls,
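
The single uncovered line is the body of encoded_keys(); a direct call along these lines would exercise it (constructor signature assumed, as in the sketch above):

    # Hypothetical: covers field_map.py line 62 by calling encoded_keys() directly.
    assert TracerKeys(keys=[("structures", 0, "medium")]).encoded_keys()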

Review thread on the new serialization code in to_hdf5:

    add_data_to_file(data_dict=self.dict())
    # store the traced-key metadata alongside the exported data
    traced_keys_payload = self._serialized_traced_field_keys()
    if traced_keys_payload:
        f_handle.attrs[TRACED_FIELD_KEYS_ATTR] = traced_keys_payload
Collaborator

Wondering if it makes a bit more sense to use a new dataset (like we do for the simulation json) rather than the attrs directly. Feels like a bit of an abuse of the HDF5 attrs' intended purpose.

For the simulation json, I remember Tyler first tried putting it in attrs too, but it can be a problem when it becomes large. Here it's unlikely that the json will be very big, but still potentially better to just make a dataset?
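
For reference, the two storage options being weighed look roughly like this; the helper and dataset name are hypothetical:

    import h5py

    TRACED_FIELD_KEYS_ATTR = "__tidy3d_traced_field_keys__"

    def store_keys(fname: str, payload: str, as_dataset: bool) -> None:
        """Illustrate both options from this thread (hypothetical helper)."""
        with h5py.File(fname, "a") as f_handle:
            if as_dataset:
                # suggested alternative: a dedicated dataset, as done for the simulation json
                f_handle.create_dataset("traced_field_keys", data=payload)
            else:
                # current PR: a root attribute
                f_handle.attrs[TRACED_FIELD_KEYS_ATTR] = payload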

Collaborator Author

I would actually argue that this is metadata and exactly the kind of thing that should go in HDF5 attrs 😄 - it's a short list of paths, it never grows beyond a couple of bytes, never needs random access, and HDF5 attrs are intended for exactly that style of short descriptive information. I think it's different from the simulation json, because that really isn't metadata and, as you said, is potentially quite large.

Collaborator

Alright, yeah, fine by me.

Collaborator

Hm, but I also feel uneasy about this. What is the disadvantage to just making a dataset? I feel it makes things more explicit that this is data we care about. It's not purely metadata.

Collaborator Author

It's a fair point; maybe the line between metadata and model data is a bit fuzzy here. I lean toward calling it metadata, because it's a deterministic byproduct of the actual model and the user inputs. If the simulation is rebuilt from scratch, the same traced paths will be "rediscovered", so it's closer to a property than part of the model.
Including it as a dataset might also require some additional plumbing for the HDF5 reader/writer (haven't checked); attrs don't show up in the JSON reconstruction. That being said, the difference is mostly semantic 🤷

Collaborator

OK, I asked ChatGPT and it agrees with Yannick :D, so I'm fine with the current implementation.

Short answer:

  • Use HDF5 attributes if the new info is small (a few strings/ints/bools), describes the run, and you'll read it whole.
  • Use a dataset only if it's large, tabular/array-like, needs chunking/compression, or will be queried/sliced.
  • This info is metadata (provenance/config that affects processing, not the simulation arrays themselves).

Why attrs fit here:

  • Attributes are meant for descriptive/run-level info (schema versions, flags, tool versions).
  • Cheap to read, no chunking needed, and they live with the object they describe (usually the root group).
  • Keep them small; if the info grows or becomes structured, promote it to a dataset or a small /cache_key group.

Cache/hash tips (to avoid fragile keys):

  • Scope the key: hash a defined subset only (inputs + this new flag/signature + relevant versions), not the whole file.
  • Canonicalization: store under /cache_key (either as attributes or tiny datasets). When building the hash, read the keys, sort by name, encode deterministically (e.g., canonical JSON with sorted keys / UTF-8, or CBOR), and hash.
  • Versioning: include cache_key_version and producer/tool version fields. Bump when semantics change.
  • Defaults for back-compat: treat "missing field" as an explicit default (e.g., "backend_signature":"v0"), so old files don't slip through incorrectly.
  • Auditability: also store the computed cache_key back into the file (attr) so you can explain cache hits later.

Rule of thumb:

  • Attrs: small, descriptive, per-run constants → ✅
  • Dataset: big/structured, per-step/per-record data, or anything you might slice/compress → ✅

Given your description, I'd put the new field(s) as root attributes (or attributes under /cache_key) and include them in the deterministic hash. Yes, consider it metadata.
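
The canonicalization advice above boils down to something like the following; this is illustrative, not tidy3d code:

    import hashlib
    import json

    def cache_key(fields: dict, version: str = "v0") -> str:
        """Deterministic key: sorted keys, compact separators, UTF-8, then hash."""
        payload = {"cache_key_version": version, **fields}
        blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()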

@yaugenst-flex force-pushed the FXC-3376-hash-must-include-traced-keys-remote-autograd-cache-bug branch from a2fa0ec to 7236cf8 on October 8, 2025, 09:20