Reliable Data Integration: Event-Driven Patterns, CDC, and the Outbox
Reliable data integration avoids dual writes by pairing state changes with outbox rows in a single transaction, uses CDC when legacy applications cannot emit events, and makes every consumer idempotent because production brokers deliver at-least-once.
Reliable data integration moves facts between services, databases, and analytics systems without losing updates or double-applying them when networks retry. Three dominant patterns are domain events (publish from the app), the transactional outbox (atomic commit with state), and change data capture (CDC—stream changes from the database WAL/binlog). This article explains when each fits, the ordering and idempotency requirements, and the operational checklist we use before recommending one on client builds. For transactionally consistent messaging, enterprise integration literature and the documentation for Debezium (an open-source CDC platform) describe outbox and CDC as complementary tools: outbox for application-intent events, CDC for database-level change streams.
Key takeaways
Never dual-write: do not commit a row then separately fire a message without a shared transactional guarantee—you will eventually have one without the other.
Use outbox when the message must reflect business intent committed with the same ACID transaction as your state change.
Use CDC when many consumers need all row changes or when you cannot change the legacy application but can read replication logs.
All consumers must be idempotent; assume at-least-once delivery from Kafka, SQS, Pub/Sub, or SNS in real deployments.
Treat message schemas like public API contracts—consumers across teams will break silently if evolution rules are vague.
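The idempotency takeaway above can be sketched concretely: record each message ID in a dedup table inside the same transaction as the state change, so redeliveries become no-ops. This is a minimal illustration using SQLite as a stand-in for the consumer's state store; the table and handler names are hypothetical.

```python
import sqlite3

# In-memory DB standing in for the consumer's state store (illustrative).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")
db.execute("INSERT INTO balances VALUES ('acct-1', 100)")

def handle(message_id: str, account: str, delta: int) -> bool:
    """Apply the message once; redeliveries become no-ops.

    The dedup insert and the business update commit in one
    transaction, so a crash cannot apply one without the other.
    """
    try:
        with db:  # one transaction: dedup row + business update
            db.execute("INSERT INTO processed VALUES (?)", (message_id,))
            db.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (delta, account))
        return True        # first delivery: applied
    except sqlite3.IntegrityError:
        return False       # redelivery: primary key conflict, skip

handle("msg-42", "acct-1", 25)   # applied
handle("msg-42", "acct-1", 25)   # at-least-once redelivery, ignored
```

The primary-key constraint on `message_id` is what makes the dedup race-safe: two concurrent deliveries cannot both insert the same row.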
Pattern comparison table (text)
Outbox — Consistency: strong with OLTP write. — Best for: explicit domain events, cross-service workflows. — Caveat: application must write outbox rows; schema migrations touch publisher.
CDC — Consistency: eventual from DB commit point. — Best for: search indexes, warehouses, caches fed from existing tables. — Caveat: exposes physical schema; refactors may break consumers without contracts.
Choreographed sagas without outbox — Consistency: fragile. — Best for: rarely. — Caveat: compensations and orphaned steps multiply; prefer outbox or orchestration with a durable process manager for money-moving flows.
Implementing the transactional outbox
Add an outbox table (id, aggregate_type, aggregate_id, payload, created_at) in the same database as your write model. In one transaction: update business row, insert outbox row.
A relay process (polling or log-tailing) publishes to the bus and marks rows published or deletes them; use lease/lock to avoid duplicate publish under crashes.
Messages carry a schema version (e.g., event schema v2) and correlation IDs for tracing across services.
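The three steps above can be sketched end to end. This is a minimal illustration using SQLite and a polling relay; the table layout follows the article, but the `place_order` function, event fields, and `publish` callback are hypothetical stand-ins for your write path and broker client.

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("""CREATE TABLE outbox (
    id TEXT PRIMARY KEY, aggregate_type TEXT, aggregate_id TEXT,
    payload TEXT, created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    published INTEGER DEFAULT 0)""")

def place_order(order_id: str) -> None:
    # Business row and outbox row commit atomically in one transaction.
    event = {"type": "OrderPlaced", "schema": "v2", "order_id": order_id,
             "correlation_id": str(uuid.uuid4())}
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        db.execute(
            "INSERT INTO outbox (id, aggregate_type, aggregate_id, payload) "
            "VALUES (?, 'order', ?, ?)",
            (str(uuid.uuid4()), order_id, json.dumps(event)))

def relay(publish) -> int:
    """Poll unpublished rows, publish, then mark them published.

    A crash between publish and mark causes a re-publish on restart:
    at-least-once delivery, hence idempotent consumers downstream.
    """
    rows = db.execute("SELECT id, payload FROM outbox WHERE published = 0 "
                      "ORDER BY created_at").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))          # e.g. a Kafka produce call
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?",
                       (row_id,))
    return len(rows)

sent = []
place_order("ord-1")
relay(sent.append)   # publishes the pending event
relay(sent.append)   # nothing left to publish
```

In production the relay also needs the lease/lock mentioned above so two relay instances do not double-publish the same rows; that coordination is omitted here for brevity.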
Implementing CDC safely
Run a CDC connector (Debezium, AWS DMS, Fivetran, etc.) against a read replica or primary per vendor guidance. Monitor replication lag; analytics fed from CDC is only as fresh as lag allows.
Treat CDC events as physical change feeds: renames and drops break downstream parsers. Mitigate with consumer contracts, deprecation windows, or views as stable publication surfaces.
For GDPR and right-to-erasure, ensure downstream sinks honor deletes—CDC emits tombstones in Kafka compacted topics when modeled correctly; batch warehouses may need periodic reconcile jobs.
Schema registry discipline matters: version event payloads and test consumers against forward-compatible contracts where possible.
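To make the tombstone and delete handling above concrete, here is a minimal sketch of applying Debezium-style change events (the `op`/`before`/`after` envelope) to a local cache; the dict cache stands in for a search index or materialized view, and the event payloads are illustrative.

```python
# A dict standing in for a search index or cache fed by CDC.
cache = {}

def apply_change(event, key):
    """Apply one CDC event. `None` models a Kafka tombstone record
    in a compacted topic; the envelope fields follow Debezium's
    op codes: c=create, u=update, r=snapshot read, d=delete."""
    if event is None:              # tombstone: erase downstream state
        cache.pop(key, None)
        return
    op = event["op"]
    if op in ("c", "u", "r"):
        cache[key] = event["after"]
    elif op == "d":                # delete: honor erasure downstream
        cache.pop(key, None)

apply_change({"op": "c", "after": {"id": "u1", "email": "a@example.com"}}, "u1")
apply_change({"op": "u", "after": {"id": "u1", "email": "b@example.com"}}, "u1")
apply_change({"op": "d", "before": {"id": "u1"}}, "u1")
```

Note the GDPR point from above: the `d` event and the trailing tombstone both have to reach every sink, or the erased row lingers in a warehouse until a reconcile job catches it.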
Orchestration versus choreography (with money on the line)
Choreography via events is elegant until compensations multiply; money-moving flows often benefit from a durable orchestrator or saga manager with explicit state.
Outbox pairs naturally with choreography for domain events; orchestration still needs idempotent steps and durable checkpoints—do not confuse a diagram with enforcement.
Schema evolution and compatibility testing
Publish compatibility rules: which fields are optional, how enums evolve, and how consumers handle unknown fields. Breaking changes should be multi-phase with dual-write/dual-read windows when needed.
Automate contract tests in CI that replay sample event fixtures against consumer handlers—catch drift before production.
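A contract test of the kind described above pairs a tolerant-reader handler with replayed fixtures: required fields are validated, unknown fields are ignored so forward-compatible producer additions do not break the consumer. The handler and field names here are hypothetical.

```python
import json

def handle_order_event(raw: bytes) -> dict:
    """Consumer handler written as a tolerant reader: validate the
    fields this consumer needs, silently ignore fields it does not
    know about (forward compatibility with newer producers)."""
    event = json.loads(raw)
    for field in ("order_id", "status"):
        if field not in event:
            raise ValueError(f"missing required field: {field}")
    return {"order_id": event["order_id"], "status": event["status"]}

# Fixture replayed in CI: a v2 payload carrying a field this
# consumer predates ("discount_code"); the handler must not break.
fixture_v2 = json.dumps({"order_id": "ord-1", "status": "PLACED",
                         "discount_code": "SPRING"}).encode()
result = handle_order_event(fixture_v2)
```

Checking fixtures from every live producer version into the consumer's repo is what turns "we think it's compatible" into a CI failure before a production incident.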
Ordering and partitioning
If per-order events must stay in order, partition the topic by order_id (or tenant + order). Global ordering is expensive and rarely needed.
Consumer parallelism increases throughput but breaks per-partition ordering guarantees if you shard work incorrectly—preserve partition affinity for ordered handlers.
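Key-based partitioning reduces to a deterministic hash of the ordering key. Kafka's default partitioner uses murmur2; this sketch uses MD5 purely as a stable illustration of the same idea: every event for one order lands on one partition, preserving relative order there.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map an ordering key (e.g. order_id) to a partition.
    Deterministic, so all events for the same key are appended to
    the same partition and consumed in order by its single reader."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event for the same order hashes to the same partition.
p1 = partition_for("order-123", 12)
p2 = partition_for("order-123", 12)
```

Note the corollary for repartitioning: changing `num_partitions` remaps keys, so in-flight events for one order can temporarily straddle two partitions during an expansion.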
Operational checklist before go-live
Dead-letter queues and replay tooling for poison messages.
Metrics: publish lag (outbox unprocessed count), consumer lag, CDC connector lag.
Load tests on peak publish rates; brokers and consumers sized with headroom.
Disaster recovery: can you rebuild search or warehouse from a snapshot + CDC backlog? Test it once per quarter.
Access controls: topic ACLs aligned to least privilege; separate prod/stage clusters or namespaces to prevent crosstalk accidents.
On-call runbooks for broker outages, poison messages, and replay procedures—practice a tabletop exercise before launch week.
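The dead-letter item on the checklist above can be sketched as a bounded-retry wrapper: a poison message is retried a few times, then routed to a DLQ with its failure reason so replay tooling can inspect it later. The function and queue shapes here are illustrative, not a specific broker API.

```python
def consume_with_dlq(messages, handler, dlq, max_attempts=3):
    """Retry each message up to max_attempts, then park it on a
    dead-letter queue with the error and attempt count attached,
    so the main consumer is never wedged by one poison message."""
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(msg)
                break                      # success: next message
            except Exception as exc:
                if attempt == max_attempts:
                    dlq.append({"message": msg, "error": str(exc),
                                "attempts": attempt})

processed, dlq = [], []

def flaky(msg):
    if msg == "bad":                       # simulated poison message
        raise ValueError("cannot parse payload")
    processed.append(msg)

consume_with_dlq(["ok", "bad"], flaky, dlq)
```

The attempt count and error string on the DLQ record are what make the replay runbook usable at 3 a.m.; a bare payload with no context forces on-call to reconstruct the failure.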
Limitations
Outbox couples event shape to OLTP schema deployment cadence; CDC couples to physical tables—choose based on who owns evolution.
Cross-database transactions are not possible with a single outbox; cross-database sagas need careful design or consolidation.
Vendor-specific CDC limitations (data types, DDL filters) vary—pilot on a copy of production traffic before committing.
Security and access patterns for brokers and sinks
Apply least privilege to producers and consumers: a service that only needs to publish order events should not read unrelated topics.
Encrypt data in transit; consider encryption at rest for topics carrying PII. Logs that echo payloads can become accidental data leaks—scrub carefully.
Testing event-driven systems
Use contract tests and local broker containers in CI; spin up consumers against fixture streams to catch deserialization failures early.
Chaos experiments for broker slowdowns and consumer crashes validate that backpressure and retries behave—tabletop theory is insufficient.
Reconciliation jobs: when events and truth disagree
Even with outbox and CDC, periodic reconciliation catches edge cases: manual DB fixes, failed consumers that skipped offsets, or partial deploys.
Design reconciliation to be observable: metrics on mismatches, alerts when drift exceeds thresholds, and idempotent repair scripts.
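A reconciliation pass of the shape described above compares source-of-truth rows to a sink, repairs the drift with plain upserts and deletes (so re-running it is safe), and returns mismatch counts for metrics and alerting. The dict-based source and sink here are illustrative stand-ins for a table and an index.

```python
def reconcile(source, sink):
    """Compare truth to a downstream sink, repair drift, and return
    mismatch counts for metrics. Repairs are upserts/deletes keyed
    by id, so running the job twice converges to zero mismatches."""
    missing = [k for k in source if k not in sink]
    stale = [k for k in source if k in sink and sink[k] != source[k]]
    orphaned = [k for k in sink if k not in source]
    for k in missing + stale:
        sink[k] = source[k]            # upsert from truth
    for k in orphaned:
        del sink[k]                    # remove rows truth no longer has
    return {"missing": len(missing), "stale": len(stale),
            "orphaned": len(orphaned)}

truth = {"a": {"v": 1}, "b": {"v": 2}}
index = {"a": {"v": 0}, "c": {"v": 9}}   # drifted sink
report = reconcile(truth, index)
```

Emit the three counts as gauges and alert when any stays above zero across consecutive runs; a persistently nonzero `orphaned` count often means deletes are not propagating.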
Multi-region and disaster recovery for eventing
Document broker failover behaviour: split-brain risk, retention during outages, and whether producers pause or buffer.
Practice failover quarterly; discovering retention limits during an actual region outage is expensive.
Evolving away from risky dual-writes
If today you write to a database and fire a message without transactional guarantees, plan a migration: introduce outbox in the write path, backfill missing events via reconciliation, then cut over consumers to trust the outbox stream.
Run shadow consumers that compare legacy message paths to outbox-published events until discrepancy rates hit zero.
Communicate consumer cutover windows clearly; dual publishers should be temporary, not permanent.
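The shadow-consumer comparison above reduces to indexing both streams by a stable event ID and diffing the sets; cut over only when both sides of the diff stay empty across the comparison window. The `event_id` field is a hypothetical stable identifier shared by both publish paths.

```python
def compare_streams(legacy, outbox, key="event_id"):
    """Diff the legacy message path against the outbox-published
    stream during dual-publish. Returns events seen on only one
    side; both lists must stay empty before consumer cutover."""
    legacy_ids = {e[key] for e in legacy}
    outbox_ids = {e[key] for e in outbox}
    return {"only_legacy": sorted(legacy_ids - outbox_ids),
            "only_outbox": sorted(outbox_ids - legacy_ids)}

legacy = [{"event_id": "e1"}, {"event_id": "e2"}]
outbox = [{"event_id": "e1"}, {"event_id": "e2"}, {"event_id": "e3"}]
diff = compare_streams(legacy, outbox)
```

An extra outbox-only event (as in this sample) is usually benign backfill; a legacy-only event means the new write path is dropping intent and blocks cutover.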