DataFlow Lifecycle & Status
This page covers cluster objects, reconciliation, and status for the DataFlow CRD. For spec fields, see Spec Reference.
Resources created per DataFlow
For each DataFlow named <name> in a namespace:
| Resource | Name | Purpose |
|---|---|---|
| ConfigMap | df-<name>-spec | Holds spec.json (resolved spec with secrets inlined). |
| ConfigMap | df-<name>-checkpoint | Read position for polling sources (default). Omitted when checkpointPersistence: false. |
| Deployment | df-<name> | Processor pod(s). |
| ServiceAccount, Role, RoleBinding | df-<name>-processor | RBAC for checkpoint ConfigMap access. Omitted when checkpointPersistence: false. |
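The naming scheme in the table can be expressed as a small helper. This is only an illustrative sketch: the function `owned_resources` is hypothetical and not part of the operator; it just restates the table, including the rule that the checkpoint ConfigMap and processor RBAC are skipped when checkpointPersistence is false.

```python
def owned_resources(name, checkpoint_persistence=True):
    """Return (kind, name) pairs for the objects created for one DataFlow.

    Hypothetical helper that mirrors the table above; it does not talk to a cluster.
    """
    resources = [
        ("ConfigMap", f"df-{name}-spec"),
        ("Deployment", f"df-{name}"),
    ]
    if checkpoint_persistence:
        # Checkpoint storage and the RBAC objects that grant access to it.
        resources.append(("ConfigMap", f"df-{name}-checkpoint"))
        for kind in ("ServiceAccount", "Role", "RoleBinding"):
            resources.append((kind, f"df-{name}-processor"))
    return resources


if __name__ == "__main__":
    for kind, res_name in owned_resources("orders"):
        print(f"{kind}/{res_name}")
```

With checkpoint persistence disabled, only the spec ConfigMap and the Deployment remain in the result.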
Processor pod
- Image: from env PROCESSOR_IMAGE (often same as operator image/tag).
- Args: --spec-path=/etc/dataflow/spec.json, --namespace=..., --name=....
- Volume: ConfigMap df-<name>-spec mounted at /etc/dataflow (read-only).
- Env: LOG_LEVEL (e.g. from PROCESSOR_LOG_LEVEL).
The controller sets owner references from the DataFlow to owned resources so they are deleted when the DataFlow is deleted.
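As a rough illustration of what an owner reference looks like on an owned object, the sketch below builds one from a DataFlow manifest held as a plain dict. The field names (apiVersion, kind, name, uid, controller, blockOwnerDeletion) are standard Kubernetes ownerReference fields; the helper names and the `example.com/v1` API group in the usage are assumptions, since this page does not state the CRD's group or version.

```python
def owner_reference(dataflow):
    """Build an ownerReference entry pointing at the given DataFlow object."""
    return {
        "apiVersion": dataflow["apiVersion"],
        "kind": dataflow["kind"],
        "name": dataflow["metadata"]["name"],
        "uid": dataflow["metadata"]["uid"],
        "controller": True,
        "blockOwnerDeletion": True,
    }


def set_owner(child, dataflow):
    """Attach the owner reference to a child manifest, without duplicating it."""
    refs = child.setdefault("metadata", {}).setdefault("ownerReferences", [])
    ref = owner_reference(dataflow)
    if not any(existing.get("uid") == ref["uid"] for existing in refs):
        refs.append(ref)
    return child


if __name__ == "__main__":
    # "example.com/v1" is a placeholder API group, not the real one.
    df = {
        "apiVersion": "example.com/v1",
        "kind": "DataFlow",
        "metadata": {"name": "orders", "uid": "1234"},
    }
    cm = {"kind": "ConfigMap", "metadata": {"name": "df-orders-spec"}}
    print(set_owner(cm, df)["metadata"]["ownerReferences"])
```

Once such a reference is present, Kubernetes garbage collection removes the child when the owning DataFlow is deleted.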
Reconciliation loop
For each DataFlow, DataFlowReconciler runs:
- Get DataFlow: if deleting, clean up Deployment, ConfigMaps, RBAC; set status Stopped.
- Resolve secrets: substitute all SecretRef fields via SecretResolver.
- ConfigMap: create/update df-<name>-spec with spec.json.
- Checkpoint & RBAC (when checkpointPersistence is not false): create checkpoint ConfigMap and processor RBAC.
- Deployment: create/update df-<name> with processor image, volume, resources, affinity.
- Deployment status: map Deployment readiness to DataFlow Phase / Message.
- Update status: write status back to the DataFlow resource.
flowchart TD
A[Get DataFlow] --> B{Deleted?}
B -->|Yes| C[Cleanup Deployment, ConfigMaps, RBAC]
C --> D[Update Status Stopped]
B -->|No| E[Resolve Secrets]
E --> F[Create or Update ConfigMap]
F --> F2{CheckpointPersistence?}
F2 -->|Yes| F3[Create Checkpoint ConfigMap and RBAC]
F2 -->|No| G
F3 --> G[Create or Update Deployment]
G --> H[Read Deployment Status]
H --> I[Update DataFlow Status]
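The branching in the loop above can be summarized in a few lines of code. The sketch below is not the DataFlowReconciler itself; it is a hypothetical pure function, with made-up step labels, that returns which steps would run for a given DataFlow state so the two branches (deletion, checkpoint persistence) are easy to see.

```python
def reconcile_steps(deleting, checkpoint_persistence=True):
    """Return the ordered step labels for one reconciliation pass.

    Step labels are illustrative names, not identifiers from the operator.
    """
    if deleting:
        # Deletion path: clean up owned objects, then report Stopped.
        return ["get-dataflow", "cleanup", "set-status-stopped"]

    steps = ["get-dataflow", "resolve-secrets", "spec-configmap"]
    if checkpoint_persistence:
        steps.append("checkpoint-configmap-and-rbac")
    steps += ["deployment", "read-deployment-status", "update-status"]
    return steps


if __name__ == "__main__":
    print(reconcile_steps(deleting=False))
    print(reconcile_steps(deleting=False, checkpoint_persistence=False))
    print(reconcile_steps(deleting=True))
```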
Status fields
Each DataFlow resource exposes status including:
| Field | Description |
|---|---|
| Phase | e.g. Running, Pending, Error, Stopped |
| Message | Additional status detail |
| LastProcessedTime | Time of last processed message |
| ProcessedCount | Messages processed |
| ErrorCount | Processing errors |
kubectl get dataflow
kubectl describe dataflow <name>
kubectl get deployment,configmap -l app.kubernetes.io/instance=<name>
See Metrics for Prometheus counters and Kubernetes Events for cluster events.
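To make the "map Deployment readiness to Phase / Message" step more concrete, here is a minimal sketch of one possible mapping. The exact rules the operator applies are not documented on this page, so the thresholds, the message strings, and the function name are all assumptions; the Error phase is left out because this page does not say what triggers it.

```python
def phase_from_deployment(desired, ready, deleting=False):
    """Map Deployment replica counts to a (phase, message) pair.

    Assumed mapping for illustration only:
      - deleting           -> Stopped
      - all replicas ready -> Running
      - otherwise          -> Pending
    """
    if deleting:
        return "Stopped", "DataFlow is being deleted"
    if desired > 0 and ready >= desired:
        return "Running", f"{ready}/{desired} replicas ready"
    return "Pending", f"{ready}/{desired} replicas ready"


if __name__ == "__main__":
    print(phase_from_deployment(desired=1, ready=1))
    print(phase_from_deployment(desired=2, ready=1))
    print(phase_from_deployment(desired=1, ready=0, deleting=True))
```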
RBAC
The operator ClusterRole allows:
- Read/write DataFlow and status.
- Create/patch events.
- Read secrets (for resolution).
- Create/update/delete ConfigMaps and Deployments in DataFlow namespaces.
- Create ServiceAccounts, Roles, RoleBindings for checkpoint access when enabled.
See Helm templates (clusterrole.yaml) for exact rules.
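For the per-DataFlow checkpoint access mentioned above, a namespaced Role scoped to a single ConfigMap could look roughly like the following. This is a sketch, not the operator's actual output: the manifest structure is standard Kubernetes RBAC (rbac.authorization.k8s.io/v1), but the verb list and the helper name `processor_role` are assumptions.

```python
def processor_role(name, namespace):
    """Build a Role manifest limiting access to one checkpoint ConfigMap."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": f"df-{name}-processor", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # core API group, where ConfigMaps live
                "resources": ["configmaps"],
                "resourceNames": [f"df-{name}-checkpoint"],
                # Assumed verbs: read the checkpoint and write new positions.
                "verbs": ["get", "update", "patch"],
            }
        ],
    }


if __name__ == "__main__":
    import json

    print(json.dumps(processor_role("orders", "default"), indent=2))
```

Restricting the rule with resourceNames keeps each processor limited to its own checkpoint rather than every ConfigMap in the namespace.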
See also
- Architecture — operator deployment, processor runtime, webhook
- DataFlow Overview
- DataFlowCron Lifecycle — CronJob-based runs