Skip to content

DataFlow Lifecycle & Status

This page covers cluster objects, reconciliation, and status for the DataFlow CRD. For spec fields, see Spec Reference.

Resources created per DataFlow

For each DataFlow named <name> in a namespace:

Resource Name Purpose
ConfigMap df-<name>-spec Holds spec.json (resolved spec with secrets inlined).
ConfigMap df-<name>-checkpoint Read position for polling sources (default). Omitted when checkpointPersistence: false.
Deployment df-<name> Processor pod(s).
ServiceAccount, Role, RoleBinding df-<name>-processor RBAC for checkpoint ConfigMap access. Omitted when checkpointPersistence: false.

Processor pod

  • Image: from env PROCESSOR_IMAGE (often same as operator image/tag).
  • Args: --spec-path=/etc/dataflow/spec.json, --namespace=..., --name=....
  • Volume: ConfigMap df-<name>-spec mounted at /etc/dataflow (read-only).
  • Env: LOG_LEVEL (e.g. from PROCESSOR_LOG_LEVEL).

The controller sets owner references from the DataFlow to owned resources so they are deleted when the DataFlow is deleted.

Reconciliation loop

For each DataFlow, DataFlowReconciler runs:

  1. Get DataFlow — if deleting, clean up Deployment, ConfigMaps, RBAC; set status Stopped.
  2. Resolve secrets — substitute all SecretRef fields via SecretResolver.
  3. ConfigMap — create/update df-<name>-spec with spec.json.
  4. Checkpoint & RBAC (when checkpointPersistence is not false) — create checkpoint ConfigMap and processor RBAC.
  5. Deployment — create/update df-<name> with processor image, volume, resources, affinity.
  6. Deployment status — map Deployment readiness to DataFlow Phase / Message.
  7. Update status — write status back to the DataFlow resource.
flowchart TD
  A[Get DataFlow] --> B{Deleted?}
  B -->|Yes| C[Cleanup Deployment, ConfigMaps, RBAC]
  C --> D[Update Status Stopped]
  B -->|No| E[Resolve Secrets]
  E --> F[Create or Update ConfigMap]
  F --> F2{CheckpointPersistence?}
  F2 -->|Yes| F3[Create Checkpoint ConfigMap and RBAC]
  F2 -->|No| G
  F3 --> G[Create or Update Deployment]
  G --> H[Read Deployment Status]
  H --> I[Update DataFlow Status]

Status fields

Each DataFlow resource exposes status including:

Field Description
Phase e.g. Running, Pending, Error, Stopped
Message Additional status detail
LastProcessedTime Time of last processed message
ProcessedCount Messages processed
ErrorCount Processing errors
kubectl get dataflow
kubectl describe dataflow <name>
kubectl get deployment,configmap -l app.kubernetes.io/instance=<name>

See Metrics for Prometheus counters and Kubernetes Events for cluster events.

RBAC

The operator ClusterRole allows:

  • Read/write DataFlow and status.
  • Create/patch events.
  • Read secrets (for resolution).
  • Create/update/delete ConfigMaps and Deployments in DataFlow namespaces.
  • Create ServiceAccounts, Roles, RoleBindings for checkpoint access when enabled.

See Helm templates (clusterrole.yaml) for exact rules.

See also