DataFlowCron
DataFlowCron is a namespaced CRD (dataflowcrons, kind DataFlowCron, group dataflow.dataflow.io) for running the same Source → Transformations → Sink pipeline as DataFlow, but on a cron schedule using Kubernetes CronJob / Job instead of a long-lived Deployment.
Use it when the workload is naturally batch-oriented (e.g. poll a table, export a window, then stop) or when you want a periodic run with optional post-processing steps (triggers) after the processor finishes successfully.
How it differs from DataFlow
| | DataFlow | DataFlowCron |
|---|---|---|
| Orchestration | Deployment (always on) | CronJob → Job per tick |
| Post-run hooks | — | Optional triggers chain |
| Best sources | Kafka streaming | Polling / batch sources |
See Workload Types for decision guidance.
Execution flow
- On each schedule tick, the CronJob starts a Job whose pod runs the processor until the source is exhausted or the process exits.
- If `triggers` is non-empty, after that Job succeeds, the operator enqueues trigger Jobs in order.
- Status tracks phases such as runs in progress, `RunningTriggers`, `Completed`, or `Failed`.
```mermaid
flowchart LR
    Tick[Cron schedule] --> CJ[CronJob]
    CJ --> PJ[Processor Job]
    PJ -->|success| T1[Trigger Job 1]
    T1 -->|success| T2[Trigger Job 2]
    PJ -->|fail| Fail[Status Failed]
```
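The flow above can be sketched as a small simulation. This is not the operator's code, only a minimal model of the ordering it describes: each callable stands in for a Job and returns True on success. The phase names `RunningTriggers`, `Completed`, and `Failed` come from the text; `Running` is a placeholder for the in-progress phase, whose exact name is not given here. The sketch also assumes that a failed trigger stops the chain and marks the run `Failed`, which this page does not state explicitly.

```python
from typing import Callable, List, Sequence

# RunningTriggers, Completed and Failed are named in the text above.
# "Running" is a placeholder: the real in-progress phase name is not given.
RUNNING = "Running"
RUNNING_TRIGGERS = "RunningTriggers"
COMPLETED = "Completed"
FAILED = "Failed"


def simulate_tick(
    processor: Callable[[], bool],
    triggers: Sequence[Callable[[], bool]] = (),
) -> List[str]:
    """Return the phases that one schedule tick passes through.

    Each callable models a Job and returns True when that Job succeeds.
    """
    phases = [RUNNING]
    if not processor():
        # Processor Job failed: no trigger Job is started.
        phases.append(FAILED)
        return phases
    if triggers:
        phases.append(RUNNING_TRIGGERS)
        for trigger in triggers:
            # Assumption: a failed trigger stops the chain immediately.
            if not trigger():
                phases.append(FAILED)
                return phases
    phases.append(COMPLETED)
    return phases
```

For example, `simulate_tick(lambda: True, [lambda: True, lambda: True])` passes through `Running`, `RunningTriggers`, and `Completed`, while a failing processor never reaches the trigger phase.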
Source types and run completion
- Polling sources (PostgreSQL, Trino, ClickHouse, Nessie) typically finish when the source is exhausted, so the processor Job can complete and triggers can run.
- Kafka is streaming: the processor often does not stop by itself, so a Cron-driven Kafka pipeline may not reach “success → triggers” unless you design for completion. Prefer polling sources for scheduled post-triggers.
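The guidance above could be turned into a pre-deployment check along these lines. This is a hypothetical helper, not part of the operator: the function name and the lowercase source identifiers are assumptions made for illustration, and the polling/streaming classification simply mirrors the two bullets above.

```python
from typing import List

# Classification taken from the bullets above; the lowercase identifiers
# are assumed spellings, not confirmed field values from the CRD.
POLLING_SOURCES = {"postgresql", "trino", "clickhouse", "nessie"}
STREAMING_SOURCES = {"kafka"}


def lint_cron_pipeline(source_type: str, trigger_count: int) -> List[str]:
    """Return warnings for a scheduled pipeline (hypothetical helper)."""
    warnings: List[str] = []
    kind = source_type.strip().lower()
    if kind in STREAMING_SOURCES and trigger_count > 0:
        # A streaming processor may never exit, so triggers may never start.
        warnings.append(
            f"source '{kind}' is streaming; the processor Job may not "
            "complete, so triggers may never run"
        )
    elif kind not in POLLING_SOURCES and kind not in STREAMING_SOURCES:
        warnings.append(f"unknown source type '{kind}'; completion not checked")
    return warnings
```

A polling source with triggers yields no warnings, whereas `lint_cron_pipeline("kafka", 2)` returns a single warning about the processor Job possibly never completing.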
Documentation in this section
- Spec & Schedule — embedded DataFlowSpec, cron fields, cluster objects
- Triggers — ordered post-run Jobs, fields, debugging
- Examples — YAML samples, status, suspend