Support generalized dtype `np.ndarray` transformers #1518

MasterSkepticista · 2025-04-04T11:11:56Z

Status quo

Today, only np.float32 arrays are transported over gRPC. There is a need to support a wider range of dtypes (still NumPy arrays); for example, key shares in secure aggregation are strings.

There are two other issues:

Other numeric dtypes (np.intX, np.floatX) are silently cast to np.float32. The user is not made aware of this, and the cast can lose precision.
Other dtypes fail with undefined behavior and/or garbage output.
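
To make the precision concern concrete, here is a minimal standalone sketch. It does not call any OpenFL code; it only mimics a transport that casts to np.float32 and back, and the variable names are illustrative.

```python
import numpy as np

# 2**24 + 1 is the smallest positive integer that float32 cannot represent exactly.
original = np.array([2**24 + 1], dtype=np.int64)

# Mimic a transport that silently casts to float32, then cast back on receipt.
round_tripped = original.astype(np.float32).astype(np.int64)

# The value changes without any warning or error being raised.
lossy = bool(original[0] != round_tripped[0])
print(int(original[0]), int(round_tripped[0]), lossy)
```

No exception is raised anywhere in this round trip, which is why the cast goes unnoticed.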

Proposal

This PR proposes a generalized, lossless NumpyArrayToBytes transformer that supports all dtypes that NumPy supports. A new field is added to the tensor metadata: string dtype.
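
For illustration only, a dtype-preserving round trip could look roughly like the sketch below. This is not the NumpyArrayToBytes implementation from this PR: the function names `array_to_bytes` and `bytes_to_array` and the `shape` metadata key are assumptions made for the example, and it only covers plain numeric, boolean, and fixed-width string dtypes (object arrays are rejected, structured dtypes are not handled).

```python
import numpy as np


def array_to_bytes(array):
    """Serialize an array to raw bytes plus the metadata needed to restore it."""
    if array.dtype.hasobject:
        # Object arrays store Python references, so their raw bytes are not portable.
        raise TypeError("object dtypes are not supported in this sketch")
    # dtype.str encodes byte order, kind, and item size, e.g. '<i8' or '<U7'.
    metadata = {"dtype": array.dtype.str, "shape": list(array.shape)}
    # tobytes() emits C-order bytes even for non-contiguous inputs.
    return array.tobytes(), metadata


def bytes_to_array(data, metadata):
    """Rebuild the array from raw bytes using the recorded dtype and shape."""
    flat = np.frombuffer(data, dtype=np.dtype(metadata["dtype"]))
    # The result is a read-only view over `data`; call .copy() if it must be mutated.
    return flat.reshape(metadata["shape"])


# Usage: string key shares and large integers both survive the round trip.
keys = np.array(["share-a", "share-b"])
payload, meta = array_to_bytes(keys)
restored_keys = bytes_to_array(payload, meta)

big = np.array([[2**24 + 1, 7]], dtype=np.int64)
restored_big = bytes_to_array(*array_to_bytes(big))
```

The point of the sketch is that the dtype travels with the payload as a string, so the receiver never has to assume np.float32.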

Impact

Secure aggregation currently depends on the component serializing keys as JSON-encoded strings. TensorCodec cannot be separated from the Aggregator or Collaborator component without a well-defined way of sending arrays of arbitrary dtype over the communication channel.

Changes tested, ready for review (modulo any tests that failed due to tip moving ahead).

Signed-off-by: Shah, Karan <[email protected]>

MasterSkepticista · 2025-04-04T11:16:52Z

openfl/pipelines/tensor_codec.py

Note for reviewer(s): Changes in this file are unrelated to the PR. Variables are renamed for readability and to aid Python garbage collection.

payalcha · 2025-04-04T12:33:46Z

All tests are failing. Attached participant logs for reference.
collaborator1.log
aggregator.log
collaborator2.log

theakshaypant

One of the reasons for the CI failure seems to be the absence of "dtype" in the metadata. I believe we also need to modify the places where the metadata dictionary is generated from the proto, since they do not take proto.dtype into account.
Specifically, these lines need to be changed to:

        metadata_dict[tensor_proto.name] = [
            {
                "int_to_float": proto.int_to_float,
                "int_list": proto.int_list,
                "bool_list": proto.bool_list,
                "dtype": proto.dtype,
            }
            for proto in tensor_proto.transformer_metadata
        ]

Another such instance can be found here.

Copilot

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

openfl/protocols/base.proto:25

The 'dtype' field is defined as a repeated string, yet the transformer sets a single dtype value. Consider changing the field to 'string dtype = 4;' to match the expected single value.

repeated string dtype = 4;

Signed-off-by: Shah, Karan <[email protected]>

Misc: variable reuse

f9fe941

Signed-off-by: Shah, Karan <[email protected]>

MasterSkepticista requested review from kminhta, psfoley and teoparvanov as code owners April 4, 2025 11:11

Generic ndarray transformer

439a6a7

Signed-off-by: Shah, Karan <[email protected]>

MasterSkepticista force-pushed the karansh1/nparray_tobytes branch from 2a28559 to 1a4923a Compare April 4, 2025 11:13

MasterSkepticista commented Apr 4, 2025


MasterSkepticista force-pushed the karansh1/nparray_tobytes branch from 1a4923a to 439a6a7 Compare April 4, 2025 11:27

theakshaypant reviewed Apr 5, 2025


theakshaypant mentioned this pull request Apr 7, 2025

Add CollaboratorSerialiser middleware for serialisation/deserialisation between collaborator and aggregator client #1476

Open

rahulga1 requested a review from Copilot April 8, 2025 09:49

Copilot AI reviewed Apr 8, 2025


MasterSkepticista added 2 commits April 10, 2025 17:26

Extract dtype field at relevant points

194ad13

Signed-off-by: Shah, Karan <[email protected]>

For the tests

55fc949

Signed-off-by: Shah, Karan <[email protected]>

MasterSkepticista requested review from noopurintel, payalcha, rahulga1, rajithkrishnegowda, ishaileshpant and tanwarsh as code owners April 10, 2025 12:21

MasterSkepticista added 2 commits April 10, 2025 17:51

Merge branch 'develop' into karansh1/nparray_tobytes

78478d3

Merge branch 'develop' into karansh1/nparray_tobytes

82fa232

MasterSkepticista added workflow_interface eden_compression labels Apr 11, 2025

MasterSkepticista added 2 commits April 11, 2025 15:47

Replace f32 transformer with generic class

c18253c

Signed-off-by: Shah, Karan <[email protected]>

Add dtype field

b38bec3

Signed-off-by: Shah, Karan <[email protected]>
