Skip to content

cspan: Suggesting an ergonomic C99 API with multidimensional matrices for BLIS #772

@tylov

Description

@tylov

This is more a question whether there is interest for a small, fast and generic C99 implementation of a numpy-alike multidimensional array-view API (currently header file ~200 LOC)? I am not currently a BLIS user, but I gathered it may have some interest for C users of BLIS. It does not currently compile with C++ (can be done), but C is the primary target. It supports

  • row- and column-major layouts, transpose
  • type safety, bounds checking by default
  • full numpy slicing/subview capablilities (currently not negative increment step)
  • fast linear iteration of multidimensional (and sliced) arrays.

Note that C++23 std::mdspan will not support the two last bullets at all (maybe in C++26).
The API could possibly be adapted at some level as a "contribution API", for making it more convenient to use BLIS directly from C, by extending it to wrap BLIS-function calls, as shown in the example below. I am not aware of any similar C library with these specific features, ergonomics, and small package. I made a recent reddit post here. The example below is a rewrite of the example in the c++ std::mdspan proposal

// C99:
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <string.h>
#include "cspan.h"

using_cspan3(Mat, float); // shorthand for defining Mat, Mat2, and Mat3

typedef Mat2 OutMat;
typedef struct { Mat2 m00, m01, m10, m11; } Partition;

Partition partition(Mat2 A)
{
  int32_t M = A.shape[0];
  int32_t N = A.shape[1];
  return (Partition){
    .m00 = cspan_slice(Mat2, &A, {0, M/2}, {0, N/2}),
    .m01 = cspan_slice(Mat2, &A, {0, M/2}, {N/2, N}),
    .m10 = cspan_slice(Mat2, &A, {M/2, M}, {0, N/2}),
    .m11 = cspan_slice(Mat2, &A, {M/2, M}, {N/2, N}),
  };
}

#if 0
void base_case_matrix_product(Mat2 A, Mat2 B, OutMat C)
{
  cblas_sgemm(
    CblasColMajor, CblasNoTrans, CblasNoTrans,
    C.shape[0], C.shape[1], A.shape[1], 1.0f,
    A.data, A.stride.d[1], B.data, B.stride.d[1], 1.0f,
    C.data, C.stride.d[1]);
}
#else
// Slow generic implementation
void base_case_matrix_product(Mat2 A, Mat2 B, OutMat C)
{
  for (int j = 0; j < C.shape[1]; ++j) {
    for (int i = 0; i < C.shape[0]; ++i) {
      Mat2_value C_ij = 0;
      for (int k = 0; k < A.shape[1]; ++k) {
        C_ij += *cspan_at(&A, i,k) * *cspan_at(&B, k,j);
      }
      *cspan_at(&C, i,j) += C_ij;
    }
  }
}
#endif

void recursive_matrix_product(Mat2 A, Mat2 B, OutMat C)
{
  // Some hardware-dependent constant
  enum {recursion_threshold = 32};
  if (C.shape[0] <= recursion_threshold || C.shape[1] <= recursion_threshold) {
    base_case_matrix_product(A, B, C);
  } else {
    Partition c = partition(C),
              a = partition(A),
              b = partition(B);
    recursive_matrix_product(a.m00, b.m00, c.m00);
    recursive_matrix_product(a.m01, b.m10, c.m00);
    recursive_matrix_product(a.m10, b.m00, c.m10);
    recursive_matrix_product(a.m11, b.m10, c.m10);
    recursive_matrix_product(a.m00, b.m01, c.m01);
    recursive_matrix_product(a.m01, b.m11, c.m01);
    recursive_matrix_product(a.m10, b.m01, c.m11);
    recursive_matrix_product(a.m11, b.m11, c.m11);
  }
}


int main(void)
{
  enum {N = 10, D = 256};
  float* values = malloc(N * D * D * sizeof values[0]);
  float out[D * D];

  Mat3 data = cspan_md(values, N, D, D); // default row-major
  c_foreach (i, Mat3, data)
    *i.ref = ((double)rand()/RAND_MAX - 0.5)*4.0;

  OutMat c = cspan_md_layout(c_COLMAJOR, out, D, D);
  Mat2 a = cspan_submd3(&data, 0);
  cspan_transpose(&a);

  clock_t t = clock();
  for (int i=1; i<N; ++i) {
    Mat2 b = cspan_submd3(&data, i);
    cspan_transpose(&b);
    memset(out, 0, sizeof out);
    //base_case_matrix_product(a, b, c);
    recursive_matrix_product(a, b, c);  // 2x faster -O3
  }
  t = clock() - t;

  double sum = 0.0;
  c_foreach (i, Mat2, c) sum += *i.ref;  // linear iterate md span.
  printf("sum=%.16g, %f ms\n", sum, (double)t*1000.0/CLOCKS_PER_SEC);
  free(values);
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions