
GlassFlow Overview

GlassFlow for ClickHouse Streaming ETL is a real-time stream processor that simplifies building and managing data pipelines from Kafka into ClickHouse. It provides a user-friendly interface for creating and operating real-time pipelines, with built-in support for deduplication and temporal joins.

Built specifically for data engineers, GlassFlow handles late-arriving events, ensures exactly-once correctness, and scales to high-throughput workloads. It delivers accurate, low-latency results from streaming data without sacrificing simplicity or performance. Pipelines are configured and monitored through a web interface, and the architecture is designed for reliable data processing.

Key Features

  • Streaming Deduplication:

    • Real-time deduplication of Kafka streams before ingestion into ClickHouse
    • Configurable time windows up to 7 days for deduplication
    • Simple configuration of deduplication keys and time windows
    • One-click setup for deduplicated data pipelines
    • Prevents duplicate data from reaching ClickHouse
  • Temporal Stream Joins:

    • Join two Kafka streams in real time
    • Configurable time windows up to 7 days for stream joins
    • Configure join keys and time windows through the UI
    • Simplified join setup process
    • Produce joined streams ready for ClickHouse ingestion
  • Built-in Kafka Connector:

    • Automatic data extraction from Kafka topics
    • Seamless integration with Kafka clusters
    • No manual data pulling required
    • Supports multiple Kafka topics and partitions
    • Native support for JSON data types
  • Optimized ClickHouse Sink:

    • Native ClickHouse connection for maximum performance
    • Configurable batch sizes for efficient data ingestion
    • Adjustable wait times for optimal throughput
    • Built-in retry mechanisms
    • Automatic schema detection and management
    • Full support for JSON data types in ClickHouse
  • User-Friendly Interface: Web-based UI for pipeline configuration and management

  • Local Development: Includes a demo setup with local Kafka and ClickHouse instances

  • Docker Support: Easy deployment using Docker and docker-compose

  • Self-Hosted: Open-source solution that can be self-hosted in your infrastructure
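To make the streaming deduplication bullets concrete, here is a minimal sketch of key-based deduplication over a time window. It is not GlassFlow's implementation or API: the class name `WindowDeduplicator`, the `accept` method, and the choice to measure the window from a key's first occurrence are hypothetical assumptions for illustration. An event is forwarded only if no event with the same key was seen within the last `window_seconds`.

```python
from collections import OrderedDict


class WindowDeduplicator:
    """Forward an event only if its key was not seen in the last `window_seconds`.

    Hypothetical sketch (not GlassFlow's code): the window is measured from a
    key's first occurrence, and `now` must be non-decreasing (e.g. arrival
    time), so the oldest keys always sit at the front of the OrderedDict.
    """

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._seen = OrderedDict()  # key -> time the key was first seen

    def _evict(self, now: float) -> None:
        # Forget keys whose window has expired so they can be accepted again.
        while self._seen:
            first_seen = next(iter(self._seen.values()))
            if now - first_seen < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def accept(self, key: str, now: float) -> bool:
        """Return True to forward the event, False if it is a duplicate."""
        self._evict(now)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


if __name__ == "__main__":
    dedup = WindowDeduplicator(window_seconds=7 * 24 * 3600)  # a 7-day window
    events = [("order-1", 0.0), ("order-2", 5.0), ("order-1", 60.0)]
    kept = [key for key, ts in events if dedup.accept(key, ts)]
    print(kept)  # ['order-1', 'order-2']
```

The state held by such a deduplicator grows with the number of distinct keys seen inside the window, which is why the window length is a bounded, configurable setting.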
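The temporal join feature pairs events from two streams that share a join key and occur close together in time. The following is a minimal sketch of a windowed stream-stream join, not GlassFlow's implementation: the class `TemporalJoin`, the `push` method, the `"left"`/`"right"` side labels, and the `ts` timestamp field (in seconds) are hypothetical assumptions. Each side is buffered per key, and a pair is emitted when the timestamps differ by at most `window_seconds`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TemporalJoin:
    """Hypothetical sketch of a windowed join of two streams on a shared key.

    Assumes every event is a dict with a numeric `ts` field and the join key
    field. Buffered events older than the window, relative to the newest
    timestamp seen so far, are discarded.
    """

    def __init__(self, key_field: str, window_seconds: float):
        self.key_field = key_field
        self.window_seconds = window_seconds
        self._buffers: Dict[str, Dict[str, List[dict]]] = {
            "left": defaultdict(list),
            "right": defaultdict(list),
        }
        self._max_ts = float("-inf")

    def _evict(self) -> None:
        # Drop buffered events that fall outside the window (a simple full scan).
        cutoff = self._max_ts - self.window_seconds
        for buffer in self._buffers.values():
            for key in list(buffer):
                buffer[key] = [e for e in buffer[key] if e["ts"] >= cutoff]
                if not buffer[key]:
                    del buffer[key]

    def push(self, side: str, event: dict) -> List[Tuple[dict, dict]]:
        """Add an event from `side` and return the newly joined (left, right) pairs."""
        if side not in self._buffers:
            raise ValueError("side must be 'left' or 'right'")
        other = "right" if side == "left" else "left"
        self._max_ts = max(self._max_ts, event["ts"])
        self._evict()
        key = event[self.key_field]
        matches = [
            o for o in self._buffers[other].get(key, [])
            if abs(o["ts"] - event["ts"]) <= self.window_seconds
        ]
        self._buffers[side][key].append(event)
        # Always return pairs in (left, right) order, whichever side arrived last.
        return [(event, o) if side == "left" else (o, event) for o in matches]


if __name__ == "__main__":
    join = TemporalJoin(key_field="user_id", window_seconds=3600)  # 1-hour window
    join.push("left", {"user_id": "u1", "ts": 0, "page": "/pricing"})
    rows = join.push("right", {"user_id": "u1", "ts": 120, "plan": "pro"})
    print(rows)  # one (left, right) pair for user u1
```

Buffering both sides is what lets either stream arrive first; the window bounds how long an unmatched event is kept waiting for its partner.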
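The ClickHouse sink settings trade latency for efficiency: larger batches mean fewer inserts, and a maximum wait time caps how long rows sit in the buffer. Below is a minimal sketch of that size-or-timeout batching pattern, assuming a caller-supplied `flush_fn` that performs the actual insert. The names `BatchingSink`, `max_batch_size`, and `max_wait_seconds` are hypothetical and are not GlassFlow's configuration keys.

```python
import time
from typing import Callable, List, Optional


class BatchingSink:
    """Hypothetical sketch: buffer rows and hand them to `flush_fn` in batches.

    A batch is flushed when it reaches `max_batch_size` rows, or when a row
    arrives and the oldest buffered row has waited `max_wait_seconds`.
    Time is only checked on `add`, so callers should also call `flush()`
    periodically and on shutdown.
    """

    def __init__(
        self,
        flush_fn: Callable[[List[dict]], None],
        max_batch_size: int = 1000,
        max_wait_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.flush_fn = flush_fn
        self.max_batch_size = max_batch_size
        self.max_wait_seconds = max_wait_seconds
        self.clock = clock
        self._rows: List[dict] = []
        self._first_row_at: Optional[float] = None

    def add(self, row: dict) -> None:
        now = self.clock()
        if not self._rows:
            self._first_row_at = now  # start the wait timer with the first buffered row
        self._rows.append(row)
        too_big = len(self._rows) >= self.max_batch_size
        too_old = now - self._first_row_at >= self.max_wait_seconds
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            self.flush_fn(self._rows)
            self._rows = []
            self._first_row_at = None


if __name__ == "__main__":
    batches = []
    sink = BatchingSink(batches.append, max_batch_size=2, max_wait_seconds=5.0)
    for i in range(5):
        sink.add({"id": i})
    sink.flush()  # flush the remainder, e.g. on shutdown
    print([len(b) for b in batches])  # [2, 2, 1]
```

Injecting `clock` keeps the timeout logic testable without real waiting; a production sink would typically also run a background timer and wrap `flush_fn` with retries.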

Getting Started

To get started with GlassFlow, visit our main repository at glassflow/clickhouse-etl. The repository contains:

  • Complete documentation
  • Quick start guide
  • Example configurations
  • Docker setup instructions
  • API documentation

Clone the repository to get started:

git clone https://github.com/glassflow/clickhouse-etl.git

Pinned

  1. clickhouse-etl Public

    Real-time deduplication and temporal joins for streaming data

    TypeScript · 356 stars · 19 forks

  2. clickhouse-etl-py-sdk Public

    Python SDK to create GlassFlow ClickHouse ETL pipelines

    Python · 6 stars

  3. glassgen Public

    A flexible synthetic data generator with configurable schemas, multiple sinks, and controlled event duplication.

    Python · 9 stars

Repositories

Showing 10 of 19 repositories
  • clickhouse-etl Public

    Real-time deduplication and temporal joins for streaming data

    TypeScript · 356 stars · Apache-2.0 · 19 forks · 0 issues · 3 pull requests · Updated Nov 10, 2025
  • glassflow-python-sdk Public

    GlassFlow Python SDK to publish data to and consume data from your pipelines at glassflow.dev

    Python · 9 stars · MIT · 1 fork · 0 issues · 1 pull request · Updated Nov 10, 2025
  • cli Public

    GlassFlow CLI to create and manage real-time data pipelines

    Shell · 5 stars · 0 forks · 0 issues · 1 pull request · Updated Nov 10, 2025
  • opentelemetry-demo

    Makefile · 0 stars · 0 forks · 0 issues · 0 pull requests · Updated Nov 7, 2025
  • charts Public

    Smarty · 1 star · MIT · 0 forks · 0 issues · 2 pull requests · Updated Oct 26, 2025
  • kafka-kerberos-gateway Public

    A lightweight Go service that provides HTTP API endpoints for Kafka operations with Kerberos (GSSAPI) authentication.

    Go · 0 stars · Apache-2.0 · 0 forks · 0 issues · 0 pull requests · Updated Oct 23, 2025
  • glassflow-etl-k8s-operator Public

    Kubernetes (K8s) operator for orchestrating ETL components

    Go · 5 stars · 0 forks · 1 issue (1 needs help) · 1 pull request · Updated Oct 22, 2025
  • glassgen Public

    A flexible synthetic data generator with configurable schemas, multiple sinks, and controlled event duplication.

    Python · 9 stars · MIT · 0 forks · 0 issues · 0 pull requests · Updated Oct 14, 2025
  • opentelemetry-collector-contrib Public Forked from open-telemetry/opentelemetry-collector-contrib

    Contrib repository for the OpenTelemetry Collector

    Go · 0 stars · Apache-2.0 · 3,135 forks · 0 issues · 0 pull requests · Updated Oct 7, 2025
  • clickhouse-etl-py-sdk Public

    Python SDK to create GlassFlow ClickHouse ETL pipelines

    Python · 6 stars · MIT · 0 forks · 0 issues · 0 pull requests · Updated Sep 4, 2025