Framelib: Declarative Data Architecture

Framelib transforms how you manage data projects.

Instead of juggling hardcoded paths and implicit data structures, you can define your entire data architecture—files, folders, schemas, and even embedded databases—as clean, self-documenting, and type-safe Python classes.

It leverages pathlib, polars, narwhals, and duckdb to provide a robust framework for building maintainable and scalable data pipelines.

Simple Example

import polars as pl
import framelib as fl
from pathlib import Path

df = pl.DataFrame(
    {
        "user_id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "value": [10.5, 20.75, 30.0],
    }
)


class MySchema(fl.Schema):
    user_id = fl.UInt16(primary_key=True)
    name = fl.String()
    value = fl.Float32()


class MyData(fl.Folder):
    my_csv = fl.CSV(model=MySchema)


MyData.my_csv.write(df)
MyData.my_csv.scan_cast().select(MySchema.value.pl_col.sum()).collect()



class MyJsonData(fl.Folder):
    __source__ = Path(__file__).parent  # override inferred source path if needed
    infos = fl.Json()
    sales = fl.Json()
    clients = fl.Json()

# Many convenience methods are available thanks to framelib and pyochain working together.
# For example, rewrite all JSON files as NDJSON using the schema API.
def rewrite_json_to_ndjson() -> None:
    return (
        MyJsonData.schema()
        .map_values(lambda x: x.read().write_ndjson(x.source.with_suffix(".ndjson")))
        .pipe(lambda _: print(f"success: {MyJsonData.show_tree()}"))
    )

Why Framelib?

🏛️ Declare Your Architecture Once

Define your project's file and database layout using intuitive Python classes.

Each class represents a folder, file, schema, or database table, making your data structure explicit and easy to understand.

If no source is provided, the source path is automatically inferred from the class name and its position in the hierarchy.

This applies to each file declared as an attribute of a Folder class, and to each Column declared in a Schema class.

Define once, use everywhere. Your data structure definitions are reusable across your entire codebase.
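
To illustrate how a source path can be derived from class names and attribute names, here is a minimal standard-library sketch. It is not framelib's implementation: `FolderSketch`, `FileSketch`, and the lowercase naming rule are hypothetical stand-ins chosen for this example.

```python
from pathlib import Path


class FileSketch:
    """Hypothetical file entry; its path is resolved from the owning folder."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix
        self.name = ""

    def __set_name__(self, owner: type, name: str) -> None:
        # Python calls this when the attribute is assigned in a class body.
        self.name = name

    def __get__(self, instance: object, owner: type) -> Path:
        # Resolve against whichever folder class the entry is accessed from.
        return owner.source() / f"{self.name}{self.suffix}"


class FolderSketch:
    """Hypothetical folder base: each subclass adds its lowercased name to the path."""

    @classmethod
    def source(cls) -> Path:
        # Walk the class hierarchy from the outermost folder to this one.
        parts = [
            klass.__name__.lower()
            for klass in reversed(cls.__mro__)
            if issubclass(klass, FolderSketch) and klass is not FolderSketch
        ]
        return Path(*parts)


class Data(FolderSketch):
    pass


class Raw(Data):
    prices = FileSketch(".csv")


print(Raw.prices.as_posix())  # data/raw/prices.csv
```

The attribute name and the class hierarchy are the only inputs, so renaming a class or moving it in the hierarchy changes the path in exactly one place.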

📜 Enforce Data Contracts

Framelib provides a Schema class, with an API strongly inspired by dataframely, to define data schemas with strong typing and validation.

A Schema is a specialized Layout that only accepts Column entries.

A Column represents a single column in a data schema, with optional primary key designation.

Various Column types are available, such as Int32, Enum, Struct, and more.

Each Column can then be converted to its corresponding polars, narwhals, or SQL datatype.

For example, Column.UInt32.pl_dtype returns an instance of pl.UInt32.

You can cast data to the defined schema when reading from files or databases, ensuring consistency and reducing runtime errors.

Together, this interoperability and validation uphold the declarative, DRY philosophy of framelib.
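
The idea of a schema that only accepts column entries, each convertible to a SQL type, can be sketched with the standard library alone. The names `ColumnSketch` and `SchemaSketch` and the `to_sql` method are hypothetical and do not reflect framelib's API; the type names used (`USMALLINT`, `VARCHAR`, `FLOAT`) are DuckDB SQL types.

```python
from dataclasses import dataclass


@dataclass
class ColumnSketch:
    """Hypothetical column declaration; the name comes from the attribute."""

    sql_type: str
    primary_key: bool = False
    name: str = ""

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name


class SchemaSketch:
    """Hypothetical schema base that only accepts ColumnSketch entries."""

    def __init_subclass__(cls) -> None:
        super().__init_subclass__()
        # Reject any public attribute that is not a column declaration.
        for name, value in vars(cls).items():
            if not name.startswith("_") and not isinstance(value, ColumnSketch):
                raise TypeError(f"{cls.__name__}.{name} must be a ColumnSketch")

    @classmethod
    def columns(cls) -> dict[str, ColumnSketch]:
        return {k: v for k, v in vars(cls).items() if isinstance(v, ColumnSketch)}

    @classmethod
    def to_sql(cls, table: str) -> str:
        parts = [
            f"{col.name} {col.sql_type}" + (" PRIMARY KEY" if col.primary_key else "")
            for col in cls.columns().values()
        ]
        return f"CREATE TABLE {table} ({', '.join(parts)})"


class UserSchema(SchemaSketch):
    user_id = ColumnSketch("USMALLINT", primary_key=True)
    name = ColumnSketch("VARCHAR")
    value = ColumnSketch("FLOAT")


print(UserSchema.to_sql("users"))
# CREATE TABLE users (user_id USMALLINT PRIMARY KEY, name VARCHAR, value FLOAT)
```

Because the column names live in one class, the same declaration can drive a cast, a validation step, or a table definition without repeating them.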

🚀 Streamline Workflows

Read, write, and process data with a high-level API that abstracts away boilerplate code.

You never have to pass a path to polars.scan_parquet by hand again: simply call MyFolder.myfile.scan() and framelib handles the rest.

At a glance, you can then check:

  • where is my data stored?
  • in which format?
  • with which schema?
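
As a rough illustration of why binding location, format, and columns to one object removes boilerplate, consider the sketch below. It uses only the standard library `csv` module; `CsvSketch` is a hypothetical class for this example and not part of framelib.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CsvSketch:
    """Hypothetical entry that remembers where the data lives and its columns."""

    source: Path
    columns: tuple[str, ...]

    def write(self, rows: list[dict[str, str]]) -> None:
        with self.source.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=self.columns)
            writer.writeheader()
            writer.writerows(rows)

    def read(self) -> list[dict[str, str]]:
        with self.source.open(newline="") as handle:
            return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as tmp:
    users = CsvSketch(Path(tmp) / "users.csv", ("user_id", "name"))
    users.write([{"user_id": "1", "name": "Alice"}])
    loaded = users.read()  # no path or format repeated at the call site

print(loaded)  # [{'user_id': '1', 'name': 'Alice'}]
```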

🌲 Visualize Your Project

Automatically generate a recursive tree view of your data layout for easy navigation and documentation.
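
A recursive tree view of a layout can be sketched in a few lines. The example below renders a nested dictionary rather than framelib classes; the `render_tree` function and the sample layout are hypothetical and only show the recursion involved.

```python
def render_tree(layout: dict, prefix: str = "") -> list[str]:
    """Render a nested dict as tree lines; files are keys mapped to None."""
    lines = []
    entries = list(layout.items())
    for index, (name, children) in enumerate(entries):
        last = index == len(entries) - 1
        lines.append(f"{prefix}{'└── ' if last else '├── '}{name}")
        if children:  # a non-empty dict is a sub-folder
            lines.extend(render_tree(children, prefix + ("    " if last else "│   ")))
    return lines


layout = {"data": {"raw": {"prices.csv": None}, "users.parquet": None}}
print("\n".join(render_tree(layout)))
# └── data
#     ├── raw
#     │   └── prices.csv
#     └── users.parquet
```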

📦 Embedded Data Warehouse

Manage and query an embedded DuckDB database with the same declarative approach.

Get your DuckDB query results back as a narwhals LazyFrame, and write your queries with polars syntax.

Installation

uv add git+https://github.com/OutSquareCapital/framelib.git

Development

uv run -m tests.main

Quickstart

A marimo notebook with more detailed examples is available at https://static.marimo.app/static/example-z9f2

Credits

Heavily inspired by dataframely: https://github.com/quantco/dataframely

License

MIT License. See LICENSE for details.
