# DataFlow Operator
A Kubernetes operator for streaming data between heterogeneous data sources and sinks, with support for message transformations.
## Overview
The DataFlow Operator lets you declaratively define data flows between sources and sinks through Kubernetes Custom Resource Definitions (CRDs). The operator manages the lifecycle of each data flow automatically, processing messages and applying the configured transformations.
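As a sketch of the declarative model, a DataFlow resource pairs a source with a sink and an optional chain of transformations. The API version and `spec` field names below are assumptions, not the operator's documented schema; the CRD installed in the next section defines the real one.

```yaml
# Hypothetical skeleton of a DataFlow resource -- the v1alpha1 version
# and spec layout are assumptions, shown only to illustrate the model.
apiVersion: dataflow.dataflow.io/v1alpha1
kind: DataFlow
metadata:
  name: example-flow
spec:
  source: {}           # where messages are read from (e.g. Kafka)
  transformations: []  # optional message transformations
  sink: {}             # where messages are written to (e.g. PostgreSQL)
```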
## Quick Start
### Installing the Operator
```bash
# Install operator via Helm from OCI registry
helm install dataflow-operator oci://ghcr.io/dataflow-operator/helm-charts/dataflow-operator

# Verify installation
kubectl get pods -l app.kubernetes.io/name=dataflow-operator

# Verify CRD
kubectl get crd dataflows.dataflow.dataflow.io
```
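For a repeatable install, the chart version can be pinned with Helm's standard `--version` flag; the version number below is purely illustrative.

```bash
# Pin a specific chart version; 0.1.0 is an illustrative placeholder
helm install dataflow-operator \
  oci://ghcr.io/dataflow-operator/helm-charts/dataflow-operator \
  --version 0.1.0
```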
### Creating Your First Data Flow
Create a simple data flow from Kafka to PostgreSQL:
```bash
kubectl apply -f dataflow/config/samples/kafka-to-postgres.yaml
```
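The bundled sample is the authoritative manifest; as a rough idea of its shape, a Kafka-to-PostgreSQL flow might look like the sketch below. Connector field names such as `bootstrapServers`, `topic`, `host`, and `table` are assumptions, not the documented schema.

```yaml
# Hypothetical sketch of a Kafka -> PostgreSQL flow; see the bundled
# sample manifest for the operator's real connector schema.
apiVersion: dataflow.dataflow.io/v1alpha1
kind: DataFlow
metadata:
  name: kafka-to-postgres
spec:
  source:
    kafka:
      bootstrapServers: ["kafka:9092"]
      topic: events
  sink:
    postgres:
      host: postgres:5432
      database: analytics
      table: events
```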
Check status:
```bash
kubectl get dataflow kafka-to-postgres
kubectl describe dataflow kafka-to-postgres
```
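To block until the flow has been reconciled, `kubectl wait` can poll a status condition; `Ready` is an assumed condition name, so check the `kubectl describe` output for the conditions this operator actually reports.

```bash
# Wait up to two minutes for the DataFlow to report Ready
# ("Ready" is an assumed condition name for this CRD).
kubectl wait --for=condition=Ready dataflow/kafka-to-postgres --timeout=120s
```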
## Documentation
Documentation is available in two languages:
- English Documentation - Complete documentation in English
- Russian Documentation (Русская документация) - Complete documentation in Russian
## Key Features
- Connectors: Kafka, PostgreSQL, ClickHouse, Trino, Nessie (sources and sinks)
- Transformations: Timestamp, Flatten, Filter, Mask, Router, Select, Remove, SnakeCase, CamelCase
- Security: credentials supplied through Kubernetes Secrets via `SecretRef` (see the sketch after this list)
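The fragment below illustrates how a transformation chain and a Secret-backed credential might be combined. The transformation fields and the `passwordSecretRef` name are assumptions about the spec, shown only to illustrate the feature set; only the transformation type names and the `SecretRef` mechanism come from the list above.

```yaml
# Hypothetical spec fragment; field names are assumptions, not the
# operator's documented schema.
spec:
  transformations:
    - type: Filter        # drop messages that fail a predicate
      condition: "status == 'active'"
    - type: Mask          # redact a sensitive field
      fields: ["email"]
    - type: SnakeCase     # normalize key naming
  sink:
    postgres:
      host: postgres:5432
      database: analytics
      table: events
      passwordSecretRef:  # credential pulled from a Kubernetes Secret
        name: postgres-credentials
        key: password
```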
## License
Apache License 2.0