
Datafuse is a free cloud-native analytics DBMS (inspired by ClickHouse), implemented in Rust

Databend

ANY DATA. ANY SCALE. ONE DATABASE.

Multimodal data warehouse for the AI era with Snowflake-compatible SQL



Why Databend?

Multimodal Data Warehouse: Analyze structured, semi-structured, vector, and geospatial data with unified Snowflake-compatible SQL.

AI-Native Platform: Built-in vector search, AI functions, embedding generation, and full-text search - no separate systems needed.

10x Faster & 90% Cost Reduction: Rust-powered vectorized execution with S3-native storage eliminates vendor lock-in and proprietary overhead.

Deploy Anywhere, Connect Everything: 100% open source - run locally with pip install databend, self-host, or use managed cloud clusters. All instances share the same data seamlessly.

Production Proven: Trusted by world-class enterprises managing 800+ petabytes and 100+ million queries daily.

Enterprise Ready: Fine-grained access control, data masking, and audit logging with complete data sovereignty.

Quick Start

Option 1: Databend Cloud Warehouse (Recommended)

Start with Databend Cloud - Serverless warehouse clusters, production-ready in 60 seconds

Option 2: Local Development with Python

pip install databend

import databend

ctx = databend.SessionContext()

# Local table for quick testing
ctx.sql("CREATE TABLE products (id INT, name STRING, price FLOAT)").collect()
ctx.sql("INSERT INTO products VALUES (1, 'Laptop', 1299.99), (2, 'Phone', 899.50)").collect()
ctx.sql("SELECT * FROM products").show()

# S3 remote table (same as cloud warehouse)
ctx.create_s3_connection("s3", "your_key", "your_secret")
ctx.sql("CREATE TABLE sales (id INT, revenue FLOAT) 's3://bucket/sales/' CONNECTION=(connection_name='s3')").collect()
ctx.sql("SELECT COUNT(*) FROM sales").show()

Option 3: Docker (Self-Host Experience)

docker run -p 8000:8000 datafuselabs/databend

Experience the full warehouse capabilities locally - same features as cloud clusters.
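
Once the container is running, queries can be sent to the published port over HTTP. The sketch below is a minimal illustration and not taken from this README: it assumes the HTTP handler accepts a JSON body at `/v1/query` and that the image exposes a `root` user with an empty password. The helper names `build_query_request` and `run_query` are hypothetical.

```python
import base64
import json
import urllib.request


def build_query_request(sql, host="localhost", port=8000, user="root", password=""):
    """Build (but do not send) a POST request carrying one SQL statement."""
    # Assumption: the query endpoint lives at /v1/query on the mapped port.
    url = f"http://{host}:{port}/v1/query"
    # Assumption: HTTP basic auth with the default user and an empty password.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def run_query(sql, **kwargs):
    """Send the request to a running server and return the parsed JSON reply."""
    request = build_query_request(sql, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the Docker container from the command above running, `run_query("SELECT 1")` should return the server's JSON response. Building the request separately from sending it keeps the payload easy to inspect without a live server.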

Benchmarks

Performance: TPC-H vs Snowflake | ClickBench Results

Cost: 90% Cost Reduction

Architecture

Databend Architecture

Multimodal Cloud Warehouse: Production clusters analyze structured, semi-structured, vector, and geospatial data with Snowflake-compatible SQL. Local development environments can attach to the same warehouse data for seamless development.

Use Cases

  • Data Analytics: Snowflake alternative with significant cost reduction
  • AI/ML Pipelines: Vector search and AI functions built-in
  • Real-time Analytics: High-performance queries on petabyte-scale data
  • Data Lake Analytics: Query Parquet, CSV, TSV, NDJSON, Avro, ORC directly from S3

Community

Contributors get immortalized in the system.contributors table! 🏆

📄 License

Apache License 2.0 + Elastic License 2.0 | Licensing FAQs


Built by engineers who redefine what's possible with data
🌐 Website | 🐦 Twitter | 🗺️ Roadmap

Languages

  • Rust 96.3%
  • Shell 2.1%
  • Python 1.5%
  • Jinja 0.1%
  • Lua 0.0%
  • Dockerfile 0.0%