CDC (Change Data Capture)
Category: infrastructure
A set of software design patterns used to programmatically determine and track the specific data rows that have changed in a source database.
CDC reads transactional log files (like PostgreSQL write-ahead logs or MySQL binlogs) asynchronously, extracting row-level mutations (inserts, updates, deletes) in real-time. Instead of executing heavy, destructive batch queries against production databases, CDC streams these individual state deltas via event brokers down to target analytical storage, keeping downstream systems fully mirrored with zero production performance drag.
Common Examples
- We set up a Debezium-driven CDC pipeline to capture customer record updates from our production database and stream them instantly into our ClickHouse data engine.
- Relying on a log-based CDC workflow allows us to track historical state changes without polluting our application logic with custom auditing columns.