Skip to content

Conversation

@maxdml
Copy link
Collaborator

@maxdml maxdml commented Oct 27, 2025

This PR:

  • Lift encoding outside of the system database
  • Improve the gob decoder to reconstruct pointer types automatically
  • Performs gob registration lazily before encode/decode
  • Distinguish between nil values and empty strings stored in the DB (we use a wrapper over the value to gob encode/decode)

What we do not support:

  • Workflow signatures with interface types (works if the user manually gob.Register the concrete implementation of the interface)
  • Workflow signatures with nested pointers

Tests

  • Getting result from direct handle, polling handle, listing workflows and getting wf steps
  • send/recv, setEvent/getEvent
  • workflow recovery, workflow queues
  • Test workflows with a "concrete" signature, and a workflow with a typeless signature (call the later with many concrete types)

Comment on lines -21 to -26
if inputStr, ok := workflow.Input.(string); ok {
if strings.Contains(inputStr, "Failed to decode") {
ctx.logger.Warn("Skipping workflow recovery due to input decoding failure", "workflow_id", workflow.ID, "name", workflow.Name)
continue
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused

@maxdml maxdml force-pushed the custom-serializer branch from f9c14ce to c609278 Compare October 28, 2025 20:20
@maxdml maxdml force-pushed the custom-serializer branch from cd0b4be to c48821f Compare October 30, 2025 20:58
@maxdml maxdml marked this pull request as ready for review October 30, 2025 21:31
@maxdml maxdml changed the title custom serializer Improve workflow/steps input/outputs serializer Nov 4, 2025
Copy link
Member

@kraftp kraftp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing it, but where in this PR enforces what can and can't be serialized?

@maxdml
Copy link
Collaborator Author

maxdml commented Nov 5, 2025

I'm probably missing it, but where in this PR enforces what can and can't be serialized?

There is actually no way to declare a generic type that exclude all interfaces or nested pointers.

We'd have to do it at runtime, both in RegisterWorkflow (as early as it gets) and RunAsStep (late feedback.)

But doing it for interfaces would also prevent users from registering themselves the interface. Manual registration from the user makes the whole thing work. I think we should just clearly document it.

Do you think we should error / panic at runtime if we identify nested pointers?

@kraftp
Copy link
Member

kraftp commented Nov 5, 2025

I'm probably missing it, but where in this PR enforces what can and can't be serialized?

There is actually no way to declare a generic type that exclude all interfaces or nested pointers.

We'd have to do it at runtime, both in RegisterWorkflow (as early as it gets) and RunAsStep (late feedback.)

But doing it for interfaces would also prevent users from registering themselves the interface. Manual registration from the user makes the whole thing work. I think we should just clearly document it.

Do you think we should error / panic at runtime if we identify nested pointers?

It's fine if there's no compile-time way to identify unserializable types. There isn't any in the other languages either. But we should clearly document what can and can't be serialized (ideally supplemented with links to gob docs) and should throw clear errors if we get something that isn't serializable.

@maxdml
Copy link
Collaborator Author

maxdml commented Nov 5, 2025

I'm probably missing it, but where in this PR enforces what can and can't be serialized?

There is actually no way to declare a generic type that exclude all interfaces or nested pointers.
We'd have to do it at runtime, both in RegisterWorkflow (as early as it gets) and RunAsStep (late feedback.)
But doing it for interfaces would also prevent users from registering themselves the interface. Manual registration from the user makes the whole thing work. I think we should just clearly document it.
Do you think we should error / panic at runtime if we identify nested pointers?

It's fine if there's no compile-time way to identify unserializable types. There isn't any in the other languages either. But we should clearly document what can and can't be serialized (ideally supplemented with links to gob docs) and should throw clear errors if we get something that isn't serializable.

Added the detection / prevention of nested pointers in eb4371a

Copy link
Member

@kraftp kraftp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, what's wrong with nested pointers?

@maxdml
Copy link
Collaborator Author

maxdml commented Nov 6, 2025

Just to be clear, what's wrong with nested pointers?

On the GetWorkflowSteps and ListWorkflows, we decode values into an any variable. gob cannot decode into any a value that was encoded from a concrete type. To address this, we wrap every encoded value in:

type gobValue struct {
    value any
}

Doing so allows Gob to decode anything into an any value (the get steps/ list wf paths).

On the typed paths (e.g., RetrieveWorkflow[T]), because we don't automatically decode into T, we must reconstruct it using reflection. If T is a pointer type, we must dynamically detect it and re-create a pointer type (we can't just do value.(T)). I elected to not perform recursive reflection on T to identify the nesting level (this is un-haltable.)

@kraftp
Copy link
Member

kraftp commented Nov 7, 2025

Just to be clear, what's wrong with nested pointers?

On the GetWorkflowSteps and ListWorkflows, we decode values into an any variable. gob cannot decode into any a value that was encoded from a concrete type. To address this, we wrap every encoded value in:

type gobValue struct {
    value any
}

Doing so allows Gob to decode anything into an any value (the get steps/ list wf paths).

On the typed paths (e.g., RetrieveWorkflow[T]), because we don't automatically decode into T, we must reconstruct it using reflection. If T is a pointer type, we must dynamically detect it and re-create a pointer type (we can't just do value.(T)). I elected to not perform recursive reflection on T to identify the nesting level (this is un-haltable.)

I don't get it--why is this PR so complex and why is there so much special-casing? We've had no serialization issues in the other three languages. Is Go just really bad for this, or are we doing something wrong or overcomplicating the problem or trying to do too much?

@maxdml
Copy link
Collaborator Author

maxdml commented Nov 7, 2025

Just to be clear, what's wrong with nested pointers?

On the GetWorkflowSteps and ListWorkflows, we decode values into an any variable. gob cannot decode into any a value that was encoded from a concrete type. To address this, we wrap every encoded value in:

type gobValue struct {
    value any
}

Doing so allows Gob to decode anything into an any value (the get steps/ list wf paths).
On the typed paths (e.g., RetrieveWorkflow[T]), because we don't automatically decode into T, we must reconstruct it using reflection. If T is a pointer type, we must dynamically detect it and re-create a pointer type (we can't just do value.(T)). I elected to not perform recursive reflection on T to identify the nesting level (this is un-haltable.)

I don't get it--why is this PR so complex and why is there so much special-casing? We've had no serialization issues in the other three languages. Is Go just really bad for this, or are we doing something wrong or overcomplicating the problem or trying to do too much?

It all boils down to how one can decode a value (json or gob) into a variable and to the fact there are no heterogeneous generic collections in Go.

If you decode directly into the target type, the decoder implementation should know how to reconstruct the target type properly. If you don't, i.e., you decode into any, you can't reconstruct beyond what gob stored alongside the value.

When you want to return a list of workflows/steps, you can't gather them in a slice where the type is known.

The only way out I can think out of this requires:

  • Maintaining a registry of step name / workflow name -> reflect.Type (no heterogeneous generic maps in Go)
  • query workflows / steps from Postgres (what we do now)
  • For each workflow/step name, instantiate a new WorkflowStatus/stepInfo of the correct type, done by reflection using something like:
if t, ok := types[name]; ok {
    return reflect.New(t).Interface(), true
}
  • At this point, T should be reconstructed and we can make a new copy of the entry into an any version, and return a single list

JSON as roughly the same issues

This is a limitation of the language. For example the Temporal go sdk does not even have a list workflowS (plural) method, and its "list workflow" method doesn't even return workflow inputs/outputs directly.

edit: we could also decide to disable loading inputs/outputs in the list methods, but that would make the client sides more difficult.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants