
DataFlow Operator

Kubernetes operator for streaming and scheduled data pipelines between Kafka, PostgreSQL, ClickHouse, Trino, and Nessie.


Current Versions

Component Version
DataFlow Operator
Helm Charts
DataFlow MCP
DataFlow Web

Explore the docs

  • Getting Started: Install via Helm and create your first pipeline in minutes
  • DataFlow: Continuous streaming pipelines backed by a Deployment
  • DataFlowCron: Scheduled batch runs with optional post-run triggers
  • Workload Types: Choose between DataFlow and DataFlowCron
  • Connectors: Kafka, PostgreSQL, ClickHouse, Trino, Nessie, Iceberg
  • Transformations: Filter, mask, route, flatten, and more
  • Agent Skills: Portable AI guides for deploy, config, and fault tolerance (any IDE)

Overview

DataFlow Operator lets you declaratively define data flows between sources and sinks through Kubernetes CRDs. The operator manages processor lifecycle, applies transformations, and supports both continuous (DataFlow) and scheduled (DataFlowCron) workloads.

Key Features

Multiple Data Source Support

  • Kafka — TLS, SASL, Avro, Schema Registry
  • PostgreSQL — custom SQL, batch inserts, UPSERT
  • ClickHouse — polling, batch inserts, auto-create MergeTree
  • Trino — SQL queries, Keycloak OAuth2
  • Nessie — Apache Iceberg via Nessie catalog
  • Iceberg — Apache Iceberg via REST Catalog API

Rich Transformation Set

Timestamp, Flatten, Filter, Mask, Router, Select, Remove, SnakeCase, CamelCase, DebeziumUnwrap

Flexible Routing

Route messages to different sinks using JSONPath conditions.

Secure Configuration

Configure connectors from Kubernetes Secrets via SecretRef.
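
As a rough sketch of the first half of that workflow, connector credentials can be stored in a Kubernetes Secret with standard `kubectl`. The helper name `create_connector_secret`, the secret name, and the key names `username` and `password` are hypothetical. The SecretRef fields that point at the Secret are not shown here; check the Connectors reference for them.

```shell
# Hypothetical helper: store connector credentials in a Kubernetes Secret.
# The key names (username, password) are illustrative assumptions,
# not names defined by the operator.
create_connector_secret() {
  secret_name="$1"
  username="$2"
  password="$3"

  # `kubectl create secret generic` builds an Opaque Secret from literals.
  kubectl create secret generic "$secret_name" \
    --from-literal=username="$username" \
    --from-literal=password="$password"
}
```

For example, `create_connector_secret kafka-credentials my-user 'my-password'` creates a Secret named `kafka-credentials` in the current namespace.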

Quick Start

helm install dataflow-operator oci://ghcr.io/dataflow-operator/helm-charts/dataflow-operator
kubectl apply -f dataflow/config/samples/kafka-to-postgres.yaml
kubectl get dataflow kafka-to-postgres
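
After the three commands above, a few standard `helm` and `kubectl` checks can confirm that the release and the sample pipeline exist. This is a minimal sketch. The release name `dataflow-operator` and the resource name `kafka-to-postgres` come from the commands above. The function name `check_quick_start` and the assumption that the operator's CRD names contain "dataflow" are illustrative.

```shell
# Hypothetical helper: sanity-check the result of the Quick Start.
check_quick_start() {
  release="${1:-dataflow-operator}"
  pipeline="${2:-kafka-to-postgres}"

  # Show the Helm release status in the current namespace.
  helm status "$release" || return 1

  # List installed CRDs whose name mentions "dataflow" (assumed naming).
  kubectl get crd | grep -i dataflow || return 1

  # Show the status and recent events of the sample pipeline.
  kubectl describe dataflow "$pipeline"
}
```

Calling `check_quick_start` with no arguments uses the names from the Quick Start. Pass a different release or pipeline name as the first and second argument if yours differ.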

Documentation map

Topic: Link
Installation: Getting Started
DataFlow CRD: Overview · Spec · Lifecycle
DataFlowCron CRD: Overview · Triggers · Examples
Operations: Errors · Fault Tolerance · Metrics
Tools: Web GUI · MCP · Agent Skills
Development: Developer Guide

License

Apache License 2.0