Framelib transforms how you manage data projects.
Instead of juggling hardcoded paths and implicit data structures, you can define your entire data architecture—files, folders, schemas, and even embedded databases—as clean, self-documenting, and type-safe Python classes.
It leverages pathlib, polars, narwhals, and duckdb to provide a robust framework for building maintainable and scalable data pipelines.
```python
import polars as pl

import framelib as fl

df = pl.DataFrame(
    {
        "user_id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "value": [10.5, 20.75, 30.0],
    }
)


class MySchema(fl.Schema):
    user_id = fl.UInt16(primary_key=True)
    name = fl.String()
    value = fl.Float32()


class MyData(fl.Folder):
    my_csv = fl.CSV(model=MySchema)


MyData.my_csv.write(df)
MyData.my_csv.scan_cast().select(MySchema.value.pl_col.sum()).collect()
```
```python
from pathlib import Path

import framelib as fl


class MyJsonData(fl.Folder):
    __source__ = Path(__file__).parent  # override the inferred source path if needed
    infos = fl.Json()
    sales = fl.Json()
    clients = fl.Json()


# Lots of convenient methods available thanks to framelib + pyochain working together.
# Rewrite all JSON files to NDJSON format using the schema API.
def rewrite_json_to_ndjson() -> None:
    return (
        MyJsonData.schema()
        .map_values(lambda x: x.read().write_ndjson(x.source.with_suffix(".ndjson")))
        .pipe(lambda _: print(f"success: {MyJsonData.show_tree()}"))
    )
```
Define your project's file and database layout using intuitive Python classes.
Each class represents a folder, a file, a schema, or a database table, making your data structure explicit and easy to understand.
If no source is provided, the source path is automatically inferred from the class name and its position in the hierarchy.
This applies to each file declared as an attribute of a Folder class, and to each Column declared in a Schema class.
Define once, use everywhere. Your data structure definitions are reusable across your entire codebase.
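As a minimal sketch of path inference (the folder and file names below are made up, and it assumes a CSV can be declared without a model, just as Json is in the example above; `source` and `show_tree()` are the attributes used elsewhere in this README):

```python
import framelib as fl


class Reports(fl.Folder):
    monthly = fl.CSV()   # no explicit path: inferred from the class and attribute names
    summary = fl.Json()


# each declared file carries its resolved location
print(Reports.monthly.source)
print(Reports.show_tree())
```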
Framelib provides a Schema class, with an API strongly inspired by dataframely, to define data schemas with strong typing and validation.
A Schema is a specialized Layout that only accepts Column entries.
A Column represents a single column in a data schema, with optional primary key designation.
Various Column types are available, such as Int32, Enum, Struct, and more.
Each Column can then be converted to its corresponding polars, narwhals, or SQL datatype.
For example, a UInt32 column's pl_dtype attribute returns an instance of pl.UInt32.
You can cast data to the defined schema when reading from files or databases, ensuring consistency and reducing runtime errors.
This interoperability and data validation maintain framelib's core declarative, DRY philosophy.
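For instance, reusing MySchema from the quick-start example above, the dtype and expression helpers can be used directly (the exact equality check is illustrative):

```python
import polars as pl

# a Column converts to its polars counterparts
assert MySchema.user_id.pl_dtype == pl.UInt16    # polars dtype instance
mean_value = MySchema.value.pl_col.mean()        # polars expression bound to the column name
```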
Read, write, and process data with a high-level API that abstracts away boilerplate code.
You never have to manually pass arguments to polars.scan_parquet again: simply call MyFolder.myfile.scan() and framelib handles the rest (see the sketch after the list below).
At a glance, you can then check:
- where is my data stored?
- in which format?
- with which schema?
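A rough sketch, reusing the quick-start layout above (scan() returning a polars LazyFrame is implied by the scan_parquet comparison, not spelled out here):

```python
lf = MyData.my_csv.scan()                        # lazily scan the declared CSV, no path needed
validated = MyData.my_csv.scan_cast().collect()  # scan and cast to MySchema in one step
```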
Automatically generate a recursive tree view of your data layout for easy navigation and documentation.
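For example, printing the tree of the quick-start folder (the output shown in the comment is illustrative):

```python
print(MyData.show_tree())
# MyData/
# └── my_csv.csv
```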
Manage and query an embedded DuckDB database with the same declarative approach.
Get your DuckDB query results back as narwhals LazyFrames, and write your queries with polars syntax.
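framelib's own database classes aren't shown in this README, so here is a plain duckdb + narwhals sketch of the underlying idea (it assumes a narwhals version with DuckDB lazy support and does not use framelib's declarative API):

```python
import duckdb
import narwhals as nw

rel = duckdb.sql("SELECT * FROM range(5) AS t(n)")  # an embedded DuckDB query
lf = nw.from_native(rel)                            # wrap the relation as a narwhals LazyFrame
total = lf.select(nw.col("n").sum())                # polars-style expression syntax
print(total.to_native())
```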
```sh
uv add git+https://github.com/OutSquareCapital/framelib.git
uv run -m tests.main
```
A marimo notebook with more detailed examples is available at https://static.marimo.app/static/example-z9f2
Heavily inspired by dataframely: https://github.com/quantco/dataframely
MIT License. See LICENSE for details.