Conversation

j143
Member

@j143 j143 commented Aug 31, 2025

..

Member Author

@j143 j143 left a comment

Hi @mboehm7 , I've attempted to implement matrix-matrix multiplication. There are some concerns.

  1. The copy() method: is the explicit copy needed?
  2. Different blocks may have different sparsity. Since I'm not handling this explicitly, could that be handled?

Contributor

@mboehm7 mboehm7 left a comment

Thanks for your effort here @j143; my comments are below. However, in the future let's please better synchronize on who is doing which task (I thought I shared that @jessicapriebe is working on OOC matrix multiplication; one PR is merged, but additional ones are to come). For now please continue with this PR, since Jessica is busy for the next few weeks anyway.


// Create a copy
MatrixBlock sourceBlock = (MatrixBlock) tmpA.getValue();
MatrixBlock blockCopy = new MatrixBlock(sourceBlock);
Contributor

There is no need for explicit copies: for every block that is streamed in, we perform a block matrix multiplication, which creates new output blocks, and these output blocks are then aggregated if necessary. By default we have copy-on-write semantics, which means operations are never performed in place unless we explicitly say so.
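To illustrate the point about copy-on-write semantics, here is a minimal sketch that uses plain `double[][]` arrays as stand-ins for matrix blocks. The class and method names (`BlockMultiplySketch`, `multiply`, `add`) are hypothetical and are not the SystemDS API; the sketch only shows that a multiplication which allocates its own result never touches its inputs, so a defensive copy of the input adds nothing.

```java
final class BlockMultiplySketch {
    // Multiplies a (m x k) by b (k x n) into a freshly allocated result.
    // Both inputs are only read, so the caller does not need to copy them first.
    static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length, k = b.length, n = b[0].length;
        double[][] c = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int p = 0; p < k; p++) {
                double aip = a[i][p];
                for (int j = 0; j < n; j++) {
                    c[i][j] += aip * b[p][j];
                }
            }
        }
        return c;
    }

    // Aggregates two partial results into a new array; x and y stay unchanged.
    static double[][] add(double[][] x, double[][] y) {
        double[][] sum = new double[x.length][x[0].length];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[0].length; j++) {
                sum[i][j] = x[i][j] + y[i][j];
            }
        }
        return sum;
    }
}
```

With this shape, a streamed-in block can be passed straight to `multiply`, and partial products for the same output index can be combined with `add`, without any intermediate copy.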

Member Author

done

This patch introduces the MatrixMatrix multiplication logic. It performs a
shuffle-based matrix multiplication on two large matrix streams.

Implementation Detail:

Asynchronous Producer: The processInstruction method launches a background
thread to perform the entire two-phase multiplication, but returns control
to the main thread immediately. This non-blocking setup allows the compiler
to build the downstream execution plan while the OOC operation prepares to run
upon data request.
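
The non-blocking setup described above can be sketched with standard `java.util.concurrent` classes. This is an illustration under stated assumptions, not the code in this patch: `AsyncProducerSketch`, `start`, `drainAndCount`, and the `POISON` end-of-stream marker are hypothetical names, and `int[]` values stand in for output blocks.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class AsyncProducerSketch {
    // Hypothetical end-of-stream marker, compared by reference.
    static final int[] POISON = new int[0];

    // Starts the work on a background thread and returns the output queue
    // immediately, so the caller is not blocked while blocks are produced.
    static BlockingQueue<int[]> start(int numBlocks) {
        BlockingQueue<int[]> out = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < numBlocks; i++) {
                    out.put(new int[] {i, i * i}); // stand-in for a computed block
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                out.offer(POISON); // always signal the end of the stream
            }
        });
        producer.setDaemon(true);
        producer.start();
        return out;
    }

    // Consumer side: blocks only when it actually requests data.
    static int drainAndCount(BlockingQueue<int[]> queue) {
        int count = 0;
        try {
            while (queue.take() != POISON) {
                count++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return count;
    }
}
```

Here `start` plays the role of the instruction entry point: it hands back the stream handle right away, and the consumer waits only inside `drainAndCount`.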

Two-Phase Streaming Logic: The background thread implements a shuffle-based
algorithm to handle two large inputs:

    * Phase 1 (Grouping/Shuffle): It first consumes both input streams entirely.
      Blocks from each stream (A_ik and B_kj) are partitioned into groups based
      on the output block index (C_ij) they contribute to. A HashMap stores these
      groups, effectively "shuffling" the data for parallel processing.

    * Phase 2 (Aggregation/Reduce): After grouping, it processes each group
      independently. Within a group, it pairs the corresponding blocks using their
      common index k, performs the block-level multiplication, and aggregates the
      results to produce a single, final output block, which is then enqueued to
      the output stream.

Robust Block Identification: A TaggedMatrixValue wrapper is used during the grouping
phase to explicitly tag each block with its source matrix (A or B).
This ensures correct and unambiguous identification during the aggregation phase,
a critical requirement that cannot be met by relying on block dimensions alone.
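
The two phases and the source tagging can be sketched as follows. This is a simplified illustration, not the patch itself: `ShuffleMatMultSketch`, `Indexed`, `Tagged`, `group`, and `reduce` are hypothetical names, blocks are plain `double[][]` arrays, and the sketch assumes the number of block rows of A and block columns of B is known up front so that each block can be routed to every output index it contributes to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShuffleMatMultSketch {
    // Input block together with its (row, col) block index.
    static final class Indexed {
        final int row, col;
        final double[][] block;
        Indexed(int row, int col, double[][] block) {
            this.row = row; this.col = col; this.block = block;
        }
    }

    // Hypothetical tagged wrapper: source matrix, common index k, payload.
    static final class Tagged {
        final boolean fromA;
        final int k;
        final double[][] block;
        Tagged(boolean fromA, int k, double[][] block) {
            this.fromA = fromA; this.k = k; this.block = block;
        }
    }

    // Phase 1: route every block to each output index "i,j" it contributes to.
    static Map<String, List<Tagged>> group(List<Indexed> a, List<Indexed> b,
                                           int rowBlocksA, int colBlocksB) {
        Map<String, List<Tagged>> groups = new HashMap<>();
        for (Indexed x : a) {                       // A_ik feeds C_ij for every j
            for (int j = 0; j < colBlocksB; j++) {
                groups.computeIfAbsent(x.row + "," + j, key -> new ArrayList<>())
                      .add(new Tagged(true, x.col, x.block));
            }
        }
        for (Indexed y : b) {                       // B_kj feeds C_ij for every i
            for (int i = 0; i < rowBlocksA; i++) {
                groups.computeIfAbsent(i + "," + y.col, key -> new ArrayList<>())
                      .add(new Tagged(false, y.row, y.block));
            }
        }
        return groups;
    }

    // Phase 2: within one group, pair A and B blocks on k, multiply, and sum.
    static double[][] reduce(List<Tagged> group) {
        Map<Integer, double[][]> aByK = new HashMap<>();
        Map<Integer, double[][]> bByK = new HashMap<>();
        for (Tagged t : group) {
            (t.fromA ? aByK : bByK).put(t.k, t.block); // the tag decides the side
        }
        double[][] c = null;
        for (Map.Entry<Integer, double[][]> e : aByK.entrySet()) {
            double[][] bk = bByK.get(e.getKey());
            if (bk == null) continue;               // no partner: treated as a zero block
            double[][] ak = e.getValue();
            if (c == null) c = new double[ak.length][bk[0].length];
            for (int i = 0; i < ak.length; i++)
                for (int p = 0; p < bk.length; p++)
                    for (int j = 0; j < bk[0].length; j++)
                        c[i][j] += ak[i][p] * bk[p][j];
        }
        return c;                                   // null if the group had no A/B pair
    }
}
```

When all blocks are square and equally sized, an A block and a B block in the same group have identical dimensions, so only the `fromA` tag tells `reduce` which operand goes on which side of the product.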

Integration: The new instruction is fully integrated into the OOC framework:
  * The OOCInstructionParser is updated to recognize the aggregate binary instruction in the OOC context.
@j143 j143 force-pushed the SYSTEMDS-3910-ooc-matrix-matrix-operations branch from 648d9bf to eb44ff0 September 2, 2025 17:26
Member Author

j143 commented Sep 2, 2025

Hi @mboehm7, if there are any code style or commit style suggestions that I might be missing, please do let me know. I will try to take care of them in future code changes, across all my code.

Labels
None yet
Projects
Status: In Progress

2 participants