Skip to content

DataFlow Spec Reference

This page documents the DataFlow spec fields. For orchestration (Deployment, reconciliation, status), see Lifecycle & Status.

CRD structure

flowchart TB
  subgraph DataFlow["DataFlow"]
    Spec["spec"]
    Status["status"]
  end

  subgraph SpecFields["spec fields"]
    Source["source (required)"]
    Sink["sink (required)"]
    Trans["transformations (optional)"]
    Errors["errors (optional)"]
    Resources["resources (optional)"]
    Scheduling["scheduling (optional)"]
    Checkpoint["checkpointPersistence (optional)"]
    ChannelBuffer["channelBufferSize (optional)"]
    Replicas["replicas (optional, Kafka)"]
    Image["processorImage / processorVersion (optional)"]
  end

  Source --> SourceTypes["type: kafka | postgresql | trino | clickhouse | nessie"]
  Sink --> SinkTypes["type: kafka | postgresql | trino | clickhouse | nessie"]
  Trans --> TransTypes["timestamp | flatten | filter | mask | router | select | remove | snakeCase | camelCase"]

  Spec --> Source
  Spec --> Sink
  Spec --> Trans
  Spec --> Errors
  Spec --> Resources
  Spec --> Scheduling
  Spec --> Checkpoint
  Spec --> ChannelBuffer
  Spec --> Replicas
  Spec --> Image

Field reference

Field Required Description
source Yes Source connector type and config. See Connectors.
sink Yes Main destination connector.
transformations No Ordered list of message transformers. See Transformations.
errors No Optional error sink for failed writes to the main sink.
resources No CPU/memory for the processor pod.
nodeSelector, affinity, tolerations No Pod scheduling constraints.
checkpointPersistence No Default true. Polling sources persist read position to a ConfigMap. For Nessie, applies when source.config.incrementalBySnapshot: true. Set false to disable.
channelBufferSize No Default 100. Buffer between source, processor, and sink. Use 500–1000 for high Kafka throughput.
replicas No Default 1. Values > 1 allowed only for Kafka (consumer group). Webhook rejects replicas > 1 for polling sources.
processorImage / processorVersion No Override processor container image.
imagePullSecrets No Pull secrets for the processor pod.

Secrets

Credentials can be referenced via SecretRef in connector config. The operator resolves secrets before writing spec.json into the ConfigMap. See Connectors — Using Kubernetes Secrets.

Validation

When the validating webhook is enabled (Helm: webhook.enabled and webhook.caBundle), invalid specs are rejected at admission time — before ConfigMap or Deployment creation.

The same validation rules apply to the embedded DataFlowSpec inside DataFlowCron.

See also