https://grok.com/share/bGVnYWN5_f0051360-ed42-4ba7-a89d-b70eacb4427d
The code looks good: it just uses the intrinsic matmul on each rank and distributes the data around. I have not tested it. I think it distributes full rows of the matrix to each rank; a better algorithm is likely to partition both rows and columns, but that can be done later.
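To make the row-wise distribution concrete, here is a minimal sketch in plain Python that simulates the ranks in a single process. It is not the code from the link above and does not use any real parallel runtime. The names `row_ranges` and `distributed_matmul_rows` are made up for illustration, and the local `matmul` helper only stands in for the intrinsic.

```python
def matmul(a, b):
    """Plain triple-loop product, standing in for the intrinsic matmul."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def row_ranges(n_rows, n_ranks):
    """Split n_rows into n_ranks contiguous chunks, as evenly as possible."""
    base, extra = divmod(n_rows, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional row each.
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def distributed_matmul_rows(a, b, n_ranks):
    """Simulate C = A*B where each rank owns full rows of A and a copy of B."""
    c = []
    for start, stop in row_ranges(len(a), n_ranks):
        local_a = a[start:stop]       # "scatter": this rank's rows of A
        local_c = matmul(local_a, b)  # local product on the rank
        c.extend(local_c)             # "gather": append blocks in rank order
    return c
```

In this layout every rank needs all of B, which is the main reason a partition over both rows and columns would likely scale better: each rank would then hold only blocks of A and B rather than a full copy of B.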