Skip to content

Conversation

MasterSkepticista
Copy link
Collaborator

@MasterSkepticista MasterSkepticista commented Apr 4, 2025

Status quo

Today, only np.float32 arrays are transported over gRPC. There is a need for supporting wider range of (still numpy array) dtypes e.g. key shares in secure aggregation are strings.

There are two other issues:

  • Use of other numerics (np.intX, np.floatX) leads to silent typecasts to np.float32. User is not aware of this and also may lead to loss of precision.
  • Other dtypes fail with undefined behavior and/or garbage output.

Proposal

This PR proposes a generalized lossless NumpyArrayToBytes transformer which supports all dtypes that numpy supports. A new field in the tensor metadata is inserted: string dtype.

Impact

Secure aggregation currently creates a dependency on the component to serialize keys as json-encoded strings. It is not possible to separate TensorCodec from the Aggregator or Collaborator component without an informed way of sending arbitrary dtype arrays over communication channel.

Changes tested, ready for review (modulo any tests that failed due to tip moving ahead).

Signed-off-by: Shah, Karan <[email protected]>
Signed-off-by: Shah, Karan <[email protected]>
@MasterSkepticista MasterSkepticista force-pushed the karansh1/nparray_tobytes branch from 2a28559 to 1a4923a Compare April 4, 2025 11:13
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewer(s): Changes in this file are unrelated to the PR. Variables are renamed for readability and python GC action

@MasterSkepticista MasterSkepticista force-pushed the karansh1/nparray_tobytes branch from 1a4923a to 439a6a7 Compare April 4, 2025 11:27
@payalcha
Copy link
Collaborator

payalcha commented Apr 4, 2025

All tests are failing. Attached participant logs for reference.
collaborator1.log
aggregator.log
collaborator2.log

Copy link
Collaborator

@theakshaypant theakshaypant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons for the CI failing seems to be the absence of "dtype" in metadata. I believe we also need to modify places where metadata dictionary is generated from proto as it does not take proto.dtype into account.
Specifically these lines need to be changed such that

        metadata_dict[tensor_proto.name] = [
            {
                "int_to_float": proto.int_to_float,
                "int_list": proto.int_list,
                "bool_list": proto.bool_list,
                "dtype": proto.dtype,
            }
            for proto in tensor_proto.transformer_metadata
        ]

Another such instance can be found here.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

openfl/protocols/base.proto:25

  • The 'dtype' field is defined as a repeated string, yet the transformer sets a single dtype value. Consider changing the field to 'string dtype = 4;' to match the expected single value.
repeated string dtype = 4;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants