Reliable Data Integration: Event-Driven Patterns, CDC, and the Outbox
Reliable data integration avoids dual writes by pairing state changes with outbox rows in a single transaction, uses CDC when legacy applications cannot emit events, and makes every consumer idempotent because production brokers deliver at-least-once.
Reliable data integration moves facts between services, databases, and analytics systems without losing updates or double-applying them when networks retry. Three dominant patterns are domain events (publish from the app), the transactional outbox (atomic commit with state), and change data capture (CDC—stream changes from the database WAL/binlog). This article explains when each fits, the ordering and idempotency requirements, and the operational checklist we use before recommending one on client builds. For transactionally consistent messaging, enterprise integration literature and the documentation for Debezium (an open-source CDC platform) describe outbox and CDC as complementary tools: outbox for application-intent events, CDC for database-level change streams.
Key takeaways
Never dual-write: do not commit a row then separately fire a message without a shared transactional guarantee—you will eventually have one without the other.
Use outbox when the message must reflect business intent committed with the same ACID transaction as your state change.
Use CDC when many consumers need all row changes or when you cannot change the legacy application but can read replication logs.
All consumers must be idempotent; assume at-least-once delivery from Kafka, SQS, Pub/Sub, or SNS in real deployments.
Treat message schemas like public API contracts—consumers across teams will break silently if evolution rules are vague.
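The idempotency takeaway above can be sketched concretely: record each message ID in a dedup table inside the same transaction as the state change, so redeliveries become no-ops. This is a minimal illustration using SQLite as a stand-in for the consumer's state store; the table and handler names are hypothetical.

```python
import sqlite3

# In-memory DB standing in for the consumer's state store (illustrative).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")
db.execute("INSERT INTO balances VALUES ('acct-1', 100)")

def handle(message_id: str, account: str, delta: int) -> bool:
    """Apply the message once; redeliveries become no-ops.

    The dedup insert and the business update commit in one
    transaction, so a crash cannot apply one without the other.
    """
    try:
        with db:  # one transaction: dedup row + business update
            db.execute("INSERT INTO processed VALUES (?)", (message_id,))
            db.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (delta, account))
        return True        # first delivery: applied
    except sqlite3.IntegrityError:
        return False       # redelivery: primary key conflict, skip

handle("msg-42", "acct-1", 25)   # applied
handle("msg-42", "acct-1", 25)   # at-least-once redelivery, ignored
```

The primary-key constraint on `message_id` is what makes the dedup race-safe: two concurrent deliveries cannot both insert the same row.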
Pattern comparison table (text)
Outbox — Consistency: strong with OLTP write. — Best for: explicit domain events, cross-service workflows. — Caveat: application must write outbox rows; schema migrations touch publisher.
CDC — Consistency: eventual from DB commit point. — Best for: search indexes, warehouses, caches fed from existing tables. — Caveat: exposes physical schema; refactors may break consumers without contracts.
Choreographed sagas without outbox — Consistency: fragile. — Best for: rarely. — Caveat: compensations and orphaned steps multiply; prefer outbox or orchestration with a durable process manager for money-moving flows.
Implementing the transactional outbox
Add an outbox table (id, aggregate_type, aggregate_id, payload, created_at) in the same database as your write model. In one transaction: update business row, insert outbox row.
A relay process (polling or log-tailing) publishes to the bus and marks rows published or deletes them; use lease/lock to avoid duplicate publish under crashes.
Messages carry a schema version (e.g., event schema v2) and correlation IDs for tracing across services.
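The three steps above can be sketched end to end. This is a minimal illustration using SQLite and a polling relay; the table layout follows the article, but the `place_order` function, event fields, and `publish` callback are hypothetical stand-ins for your write path and broker client.

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("""CREATE TABLE outbox (
    id TEXT PRIMARY KEY, aggregate_type TEXT, aggregate_id TEXT,
    payload TEXT, created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    published INTEGER DEFAULT 0)""")

def place_order(order_id: str) -> None:
    # Business row and outbox row commit atomically in one transaction.
    event = {"type": "OrderPlaced", "schema": "v2", "order_id": order_id,
             "correlation_id": str(uuid.uuid4())}
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        db.execute(
            "INSERT INTO outbox (id, aggregate_type, aggregate_id, payload) "
            "VALUES (?, 'order', ?, ?)",
            (str(uuid.uuid4()), order_id, json.dumps(event)))

def relay(publish) -> int:
    """Poll unpublished rows, publish, then mark them published.

    A crash between publish and mark causes a re-publish on restart:
    at-least-once delivery, hence idempotent consumers downstream.
    """
    rows = db.execute("SELECT id, payload FROM outbox WHERE published = 0 "
                      "ORDER BY created_at").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))          # e.g. a Kafka produce call
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                       (row_id,))
    return len(rows)

sent = []
place_order("ord-1")
relay(sent.append)   # publishes the pending event
relay(sent.append)   # nothing left to publish
```

In production the relay also needs the lease/lock mentioned above so two relay instances do not double-publish the same rows; that coordination is omitted here for brevity.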
Implementing CDC safely
Run a CDC connector (Debezium, AWS DMS, Fivetran, etc.) against a read replica or primary per vendor guidance. Monitor replication lag; analytics fed from CDC is only as fresh as lag allows.
Treat CDC events as physical change feeds: renames and drops break downstream parsers. Mitigate with consumer contracts, deprecation windows, or views as stable publication surfaces.
For GDPR and right-to-erasure, ensure downstream sinks honor deletes—CDC emits tombstones in Kafka compacted topics when modeled correctly; batch warehouses may need periodic reconcile jobs.
Schema registry discipline matters: version event payloads and test consumers against forward-compatible contracts where possible.
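To make the tombstone and delete handling above concrete, here is a minimal sketch of applying Debezium-style change events (the `op`/`before`/`after` envelope) to a local cache; the dict cache stands in for a search index or materialized view, and the event payloads are illustrative.

```python
# A dict standing in for a search index or cache fed by CDC.
cache = {}

def apply_change(event, key):
    """Apply one CDC event. `None` models a Kafka tombstone record
    in a compacted topic; the envelope fields follow Debezium's
    op codes: c=create, u=update, r=snapshot read, d=delete."""
    if event is None:              # tombstone: erase downstream state
        cache.pop(key, None)
        return
    op = event["op"]
    if op in ("c", "u", "r"):
        cache[key] = event["after"]
    elif op == "d":                # delete: honor erasure downstream
        cache.pop(key, None)

apply_change({"op": "c", "after": {"id": "u1", "email": "a@example.com"}}, "u1")
apply_change({"op": "u", "after": {"id": "u1", "email": "b@example.com"}}, "u1")
apply_change({"op": "d", "before": {"id": "u1"}}, "u1")
```

Note the GDPR point from above: the `d` event and the trailing tombstone both have to reach every sink, or the erased row lingers in a warehouse until a reconcile job catches it.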
Orchestration versus choreography (with money on the line)
Choreography via events is elegant until compensations multiply; money-moving flows often benefit from a durable orchestrator or saga manager with explicit state.
Outbox pairs naturally with choreography for domain events; orchestration still needs idempotent steps and durable checkpoints—do not confuse a diagram with enforcement.
Schema evolution and compatibility testing
Publish compatibility rules: which fields are optional, how enums evolve, and how consumers handle unknown fields. Breaking changes should be multi-phase with dual-write/dual-read windows when needed.
Automate contract tests in CI that replay sample event fixtures against consumer handlers—catch drift before production.
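A contract test of the kind described above pairs a tolerant-reader handler with replayed fixtures: required fields are validated, unknown fields are ignored so forward-compatible producer additions do not break the consumer. The handler and field names here are hypothetical.

```python
import json

def handle_order_event(raw: bytes) -> dict:
    """Consumer handler written as a tolerant reader: validate the
    fields this consumer needs, silently ignore fields it does not
    know about (forward compatibility with newer producers)."""
    event = json.loads(raw)
    for field in ("order_id", "status"):
        if field not in event:
            raise ValueError(f"missing required field: {field}")
    return {"order_id": event["order_id"], "status": event["status"]}

# Fixture replayed in CI: a v2 payload carrying a field this
# consumer predates ("discount_code"); the handler must not break.
fixture_v2 = json.dumps({"order_id": "ord-1", "status": "PLACED",
                         "discount_code": "SPRING"}).encode()
result = handle_order_event(fixture_v2)
```

Checking fixtures from every live producer version into the consumer's repo is what turns "we think it's compatible" into a CI failure before a production incident.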
Ordering and partitioning
If per-order events must stay in order, partition the topic by order_id (or tenant + order). Global ordering is expensive and rarely needed.
Consumer parallelism increases throughput but breaks per-partition ordering guarantees if you shard work incorrectly—preserve partition affinity for ordered handlers.
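Key-based partitioning reduces to a deterministic hash of the ordering key. Kafka's default partitioner uses murmur2; this sketch uses MD5 purely as a stable illustration of the same idea: every event for one order lands on one partition, preserving relative order there.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map an ordering key (e.g. order_id) to a partition.
    Deterministic, so all events for the same key are appended to
    the same partition and consumed in order by its single reader."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event for the same order hashes to the same partition.
p1 = partition_for("order-123", 12)
p2 = partition_for("order-123", 12)
```

Note the corollary for repartitioning: changing `num_partitions` remaps keys, so in-flight events for one order can temporarily straddle two partitions during an expansion.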
Operational checklist before go-live
Dead-letter queues and replay tooling for poison messages.
Metrics: publish lag (outbox unprocessed count), consumer lag, CDC connector lag.
Load tests on peak publish rates; brokers and consumers sized with headroom.
Disaster recovery: can you rebuild search or warehouse from a snapshot + CDC backlog? Test it once per quarter.
Access controls: topic ACLs aligned to least privilege; separate prod/stage clusters or namespaces to prevent crosstalk accidents.
On-call runbooks for broker outages, poison messages, and replay procedures—practice a tabletop exercise before launch week.
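The dead-letter item on the checklist above can be sketched as a bounded-retry wrapper: a poison message is retried a few times, then routed to a DLQ with its failure reason so replay tooling can inspect it later. The function and queue shapes here are illustrative, not a specific broker API.

```python
def consume_with_dlq(messages, handler, dlq, max_attempts=3):
    """Retry each message up to max_attempts, then park it on a
    dead-letter queue with the error and attempt count attached,
    so the main consumer is never wedged by one poison message."""
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
                break                      # success: next message
            except Exception as exc:
                if attempt == max_attempts:
                    dlq.append({"message": msg, "error": str(exc),
                                "attempts": attempt})

processed, dlq = [], []

def flaky(msg):
    if msg == "bad":                       # simulated poison message
        raise ValueError("cannot parse payload")
    processed.append(msg)

consume_with_dlq(["ok", "bad"], flaky, dlq)
```

The attempt count and error string on the DLQ record are what make the replay runbook usable at 3 a.m.; a bare payload with no context forces on-call to reconstruct the failure.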
Limitations
Outbox couples event shape to OLTP schema deployment cadence; CDC couples to physical tables—choose based on who owns evolution.
Cross-database transactions are not possible with a single outbox; cross-database sagas need careful design or consolidation.
Vendor-specific CDC limitations (data types, DDL filters) vary—pilot on a copy of production traffic before committing.
Security and access patterns for brokers and sinks
Apply least privilege to producers and consumers: a service that only needs to publish order events should not read unrelated topics.
Encrypt data in transit; consider encryption at rest for topics carrying PII. Logs that echo payloads can become accidental data leaks—scrub carefully.
Testing event-driven systems
Use contract tests and local broker containers in CI; spin up consumers against fixture streams to catch deserialization failures early.
Chaos experiments for broker slowdowns and consumer crashes validate that backpressure and retries behave—tabletop theory is insufficient.
Reconciliation jobs: when events and truth disagree
Even with outbox and CDC, periodic reconciliation catches edge cases: manual DB fixes, failed consumers that skipped offsets, or partial deploys.
Design reconciliation to be observable: metrics on mismatches, alerts when drift exceeds thresholds, and idempotent repair scripts.
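A reconciliation pass of the shape described above compares source-of-truth rows to a sink, repairs the drift with plain upserts and deletes (so re-running it is safe), and returns mismatch counts for metrics and alerting. The dict-based source and sink here are illustrative stand-ins for a table and an index.

```python
def reconcile(source, sink):
    """Compare truth to a downstream sink, repair drift, and return
    mismatch counts for metrics. Repairs are upserts/deletes keyed
    by id, so running the job twice converges to zero mismatches."""
    missing = [k for k in source if k not in sink]
    stale = [k for k in source if k in sink and sink[k] != source[k]]
    orphaned = [k for k in sink if k not in source]
    for k in missing + stale:
        sink[k] = source[k]            # upsert from truth
    for k in orphaned:
        del sink[k]                    # remove rows truth no longer has
    return {"missing": len(missing), "stale": len(stale),
            "orphaned": len(orphaned)}

truth = {"a": {"v": 1}, "b": {"v": 2}}
index = {"a": {"v": 0}, "c": {"v": 9}}   # drifted sink
report = reconcile(truth, index)
```

Emit the three counts as gauges and alert when any stays above zero across consecutive runs; a persistently nonzero `orphaned` count often means deletes are not propagating.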
Multi-region and disaster recovery for eventing
Document broker failover behaviour: split-brain risk, retention during outages, and whether producers pause or buffer.
Practice failover quarterly; discovering retention limits during an actual region outage is expensive.
Evolving away from risky dual-writes
If today you write to a database and fire a message without transactional guarantees, plan a migration: introduce outbox in the write path, backfill missing events via reconciliation, then cut over consumers to trust the outbox stream.
Run shadow consumers that compare legacy message paths to outbox-published events until discrepancy rates hit zero.
Communicate consumer cutover windows clearly; dual publishers should be temporary, not permanent.
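The shadow-consumer comparison above reduces to indexing both streams by a stable event ID and diffing the sets; cut over only when both sides of the diff stay empty across the comparison window. The `event_id` field is a hypothetical stable identifier shared by both publish paths.

```python
def compare_streams(legacy, outbox, key="event_id"):
    """Diff the legacy message path against the outbox-published
    stream during dual-publish. Returns events seen on only one
    side; both lists must stay empty before consumer cutover."""
    legacy_ids = {e[key] for e in legacy}
    outbox_ids = {e[key] for e in outbox}
    return {"only_legacy": sorted(legacy_ids - outbox_ids),
            "only_outbox": sorted(outbox_ids - legacy_ids)}

legacy = [{"event_id": "e1"}, {"event_id": "e2"}]
outbox = [{"event_id": "e1"}, {"event_id": "e2"}, {"event_id": "e3"}]
diff = compare_streams(legacy, outbox)
```

An extra outbox-only event (as in this sample) is usually benign backfill; a legacy-only event means the new write path is dropping intent and blocks cutover.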