DataFlowCron

DataFlowCron is a namespaced CRD (resource dataflowcrons, kind DataFlowCron, group dataflow.dataflow.io). It runs the same Source → Transformations → Sink pipeline as DataFlow, but on a cron schedule: a Kubernetes CronJob creates a Job for each run instead of keeping a long-lived Deployment.

Use it when the workload is naturally batch-oriented (e.g. poll a table, export a window, then stop) or when you want a periodic run with optional post-processing steps (triggers) after the processor finishes successfully.

How it differs from DataFlow

| Aspect | DataFlow | DataFlowCron |
| --- | --- | --- |
| Orchestration | Deployment (always on) | CronJob → Job per tick |
| Post-run hooks | None | Optional `triggers` chain |
| Best sources | Kafka streaming | Polling / batch sources |

See Workload Types for decision guidance.

Execution flow

On each schedule tick, the CronJob starts a Job whose pod runs the processor until the source is exhausted or the process exits.
If `triggers` is non-empty, the operator enqueues the trigger Jobs in order once that processor Job has succeeded.
Status reports the current phase of the run: in progress, RunningTriggers, Completed, or Failed.

flowchart LR
  Tick[Cron schedule] --> CJ[CronJob]
  CJ --> PJ[Processor Job]
  PJ -->|success| T1[Trigger Job 1]
  T1 -->|success| T2[Trigger Job 2]
  PJ -->|fail| Fail[Status Failed]
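
The ordering in the flowchart can be modelled in a few lines of Python. This is only a sketch of the sequencing, not the operator's implementation: `run_tick` and the `Running` phase name are hypothetical, and it assumes that a failed trigger stops the chain and marks the run Failed, which the flowchart suggests but does not state. Jobs are stood in for by plain callables that return `True` on success.

```python
from typing import Callable, List

# Phase names. RunningTriggers, Completed and Failed appear in the text above;
# "Running" is a placeholder for the "run in progress" phase (assumption).
RUNNING = "Running"
RUNNING_TRIGGERS = "RunningTriggers"
COMPLETED = "Completed"
FAILED = "Failed"

# A Job is modelled as a callable that returns True on success.
Job = Callable[[], bool]


def run_tick(processor: Job, triggers: List[Job]) -> List[str]:
    """Simulate one schedule tick and return the phases it passes through."""
    phases = [RUNNING]

    # The processor Job runs first; triggers are never started if it fails.
    if not processor():
        phases.append(FAILED)
        return phases

    if triggers:
        phases.append(RUNNING_TRIGGERS)
        # Triggers run strictly in order. Stopping at the first failure is
        # an assumption of this sketch.
        for trigger in triggers:
            if not trigger():
                phases.append(FAILED)
                return phases

    phases.append(COMPLETED)
    return phases


if __name__ == "__main__":
    ok = lambda: True
    bad = lambda: False
    print(run_tick(ok, [ok, ok]))  # processor and both triggers succeed
    print(run_tick(bad, [ok]))     # processor fails, trigger is skipped
```

The first call passes through `Running`, `RunningTriggers`, and `Completed`; the second goes straight from `Running` to `Failed` without starting any trigger.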

Source types and run completion

Polling sources (PostgreSQL, Trino, ClickHouse, Nessie) typically finish when the source is exhausted, so the processor Job can complete and triggers can run.
Kafka is streaming: the processor often does not stop by itself, so a Cron-driven Kafka pipeline may not reach “success → triggers” unless you design for completion. Prefer polling sources for scheduled post-triggers.

Documentation in this section

Spec & Schedule — embedded DataFlowSpec, cron fields, cluster objects
Triggers — ordered post-run Jobs, fields, debugging
Examples — YAML samples, status, suspend
