dbt, Incremental Models, Merge Strategy, Performance Optimization, Analytics Engineering
dbt Incremental Models: Efficient Transformations
Master dbt incremental models to optimize data transformations with merge strategies, streaming integration, and performance best practices for analytics engineering
In modern data warehouses, transforming billions of rows daily becomes expensive and time-consuming when using traditional full-refresh approaches. dbt incremental models solve this challenge by processing only new or changed records, dramatically reducing compute costs and transformation times.

Understanding Incremental Models
Incremental models operate on a simple but powerful principle: instead of rebuilding entire tables from scratch, they identify and process only the delta (the new or modified records since the last run). This approach transforms what might be hour-long jobs into minute-long operations.
Consider a typical scenario: you have an events table with billions of historical records. A full refresh would reprocess every single event, even though only today's data has changed. An incremental model processes just today's events, appending or merging them into the existing table.
Basic Incremental Configuration
Let's start with a fundamental incremental model:
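A minimal sketch of such a model (the stg_events source and its column names are illustrative):

```sql
-- models/fct_events.sql
{{
  config(
    materialized='incremental'
  )
}}

select
    event_id,
    user_id,
    event_type,
    event_timestamp
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- On incremental runs, only pick up rows newer than what is already loaded
  where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```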
This example uses dbt's Jinja templating syntax (the double curly braces {{ }}), which allows dynamic SQL generation. The is_incremental() macro is crucial: it returns false on the first run, building the full table. On subsequent runs, it returns true, triggering the filter that selects only new records. The {{ this }} reference is a special variable pointing to the current model's table in your data warehouse.
Merge Strategies Explained
dbt supports multiple strategies for handling incremental updates, each suited to different use cases.
Append Strategy
With the simplest approach, new rows are added without checking for duplicates:
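A minimal append-only sketch (again using illustrative stg_events columns):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='append'
  )
}}

select
    event_id,
    user_id,
    event_type,
    event_timestamp
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- New rows are simply appended; no uniqueness check is performed
  where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```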
When to use append: Choose this strategy for truly immutable event streams where:
Each record represents a point-in-time event that never changes (clicks, page views, transactions)
Duplicates are impossible by design, or deduplication happens in an earlier pipeline stage
You need maximum write performance since no uniqueness checks occur
Merge Strategy
The merge strategy (default for Snowflake, BigQuery, Databricks) uses unique_key to update existing records and insert new ones:
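A sketch of a merge-based model, assuming an orders staging model with an order_id key and an updated_at column (names are illustrative):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id'
  )
}}

select
    order_id,
    customer_id,
    order_status,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- Rows created or updated since the last run: matching order_ids are
  -- updated in place, new order_ids are inserted
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```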
This executes as a MERGE statement (or equivalent), perfect for slowly changing dimensions where records update over time.
When to use merge: Choose this strategy when:
Records can be updated after initial creation (user profiles, order statuses, product catalogs)
You need to maintain the latest version of each record
Your data warehouse supports efficient merge operations (most modern warehouses do)
Delete+Insert Strategy
For data warehouses lacking efficient merge operations (like Redshift), delete+insert removes matching records before inserting new ones:
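A sketch of a delete+insert model that reprocesses a trailing 7-day window (the dateadd syntax shown is Snowflake/Redshift-style and may differ on other warehouses; table and column names are illustrative):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='order_date'
  )
}}

select
    order_date,
    order_id,
    customer_id,
    order_total
from {{ ref('stg_orders') }}

{% if is_incremental() %}
  -- Rows for these dates are deleted from the target and re-inserted as a batch
  where order_date >= dateadd(day, -7, current_date)
{% endif %}
```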
This strategy processes batches (like the last 7 days) and completely refreshes those records, handling late-arriving updates effectively.
When to use delete+insert: Choose this strategy when:
Your warehouse lacks efficient merge operations (older Redshift clusters)
You need to handle late-arriving data within a specific window
You want transactional consistency within each batch
Microbatch Strategy (2025 Best Practice)
For time-series data, dbt 1.9 introduced the microbatch strategy, which has become the standard approach for incremental processing in 2025:
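A sketch of a microbatch configuration (the event_time, batch_size, begin, and lookback values are illustrative):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_timestamp',
    batch_size='day',
    begin='2024-01-01',
    lookback=3
  )
}}

select
    event_id,
    user_id,
    event_type,
    event_timestamp
from {{ ref('stg_events') }}
-- No is_incremental() filter is needed: dbt slices the data into daily batches
-- based on event_time and processes each batch independently
```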
Microbatch automatically divides your data into time-based batches and processes them independently. This provides:
Automatic backfill orchestration: dbt handles missing time periods intelligently
Parallel processing: Multiple batches can run simultaneously
Idempotent reruns: Each batch produces identical results regardless of execution timing
Built-in late data handling: Overlapping windows catch late arrivals without custom logic
When to use microbatch: This is the recommended approach for any time-series data with consistent timestamps, such as logs, events, IoT sensor data, or streaming analytics.
Streaming Integration with Kafka
Modern analytics pipelines increasingly integrate streaming data. When connecting dbt to Kafka topics, incremental models become essential for managing continuous data flows.
Conduktor provides enterprise-grade Kafka management, allowing you to monitor, transform, and route streaming data with features like schema registry integration, data quality rules, and real-time governance. Learn more about managing Kafka topics and Kafka Connect for streaming-to-warehouse pipelines. When paired with dbt incremental models, you can efficiently transform streaming data into analytics-ready tables:
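A sketch of an offset-based incremental model, assuming a Kafka connector has landed the topic into a raw_events source table that exposes _partition and _offset metadata columns (the source, model, and payload column names are illustrative):

```sql
-- models/stg_kafka_events.sql
{{
  config(
    materialized='incremental',
    incremental_strategy='append',
    on_schema_change='sync_all_columns'
  )
}}

select
    src._partition as kafka_partition,
    src._offset    as kafka_offset,
    src.user_id,
    src.event_type,
    src.event_timestamp
from {{ source('kafka', 'raw_events') }} as src

{% if is_incremental() %}
-- Only process messages beyond the highest offset already loaded per partition
left join (
    select kafka_partition, max(kafka_offset) as max_offset
    from {{ this }}
    group by kafka_partition
) as loaded
    on src._partition = loaded.kafka_partition
where src._offset > coalesce(loaded.max_offset, -1)
{% endif %}
```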
Key concepts explained:
Kafka offsets: Sequential IDs for each message in a partition. Using offsets as your incremental filter guarantees exactly-once processing: each message is processed exactly one time, preventing duplicates and missing data.
Schema evolution handling: The on_schema_change='sync_all_columns' option automatically adds new columns when your Kafka topic schema changes, preventing pipeline failures.
Metadata columns: Most Kafka connectors (like Snowflake's Kafka connector) expose _partition and _offset as columns, enabling offset-based incremental processing.
For production Kafka integration, use Conduktor's schema registry to manage Avro/Protobuf schemas and enforce data contracts across your streaming pipeline.
Performance Optimization Patterns
Partitioned Tables
Combine incremental models with table partitioning for maximum efficiency. This BigQuery example shows the configuration syntax:
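A sketch of the BigQuery partition_by configuration (table and column names are illustrative):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='transaction_id',
    partition_by={
      'field': 'transaction_date',
      'data_type': 'date',
      'granularity': 'day'
    }
  )
}}

select
    transaction_id,
    customer_id,
    transaction_date,
    amount
from {{ ref('stg_transactions') }}

{% if is_incremental() %}
  -- BigQuery prunes partitions, so only recent partitions are scanned
  where transaction_date >= date_sub(current_date(), interval 3 day)
{% endif %}
```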
This limits scans to relevant partitions, dramatically reducing query costs. For example, querying last week's transactions scans only 7 partitions instead of the entire table, a typical cost reduction of 90%+ for large historical tables.
Other warehouse syntax:
Snowflake: Use cluster_by=['date_column'] for automatic clustering
Databricks: Specify partitioning in table properties: partition_cols=['date']
Clustered Keys
Add clustering for frequently filtered columns:
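One possible configuration using cluster_by, which both the Snowflake and BigQuery adapters support (column names are illustrative):

```sql
{{
  config(
    materialized='incremental',
    unique_key='event_id',
    cluster_by=['customer_id', 'event_type']
  )
}}

select
    event_id,
    customer_id,
    event_type,
    event_timestamp
from {{ ref('stg_events') }}

{% if is_incremental() %}
  where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```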
Lookback Windows
For late-arriving data, implement lookback windows:
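A sketch of a 3-day lookback window (the dateadd syntax is Snowflake/Redshift-style; names are illustrative):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id'
  )
}}

select
    event_id,
    user_id,
    event_type,
    event_timestamp
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- Reprocess everything within 3 days of the latest loaded timestamp
  -- so late-arriving events are merged in
  where event_timestamp > (
      select dateadd(day, -3, max(event_timestamp)) from {{ this }}
  )
{% endif %}
```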
This reprocesses the last 3 days, catching late arrivals while avoiding full refreshes. Late data arrives due to:
Clock skew: Different systems' clocks being slightly out of sync
Network delays: Temporary connectivity issues causing message delays
Data pipeline latency: Upstream processing taking longer than expected
Incremental Predicates (dbt 1.7+)
For more efficient filtering, especially with partitioned tables, use incremental predicates to push filters down to the source query:
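A sketch using incremental_predicates, where DBT_INTERNAL_DEST is the alias dbt gives the existing table in its generated merge statement on adapters such as Snowflake (column names and the 7-day window are illustrative):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id',
    incremental_predicates=[
      "DBT_INTERNAL_DEST.event_date >= dateadd(day, -7, current_date)"
    ]
  )
}}

select
    event_id,
    user_id,
    event_type,
    event_date
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- Filter the source side as well, so both sides of the merge scan only recent data
  where event_date >= dateadd(day, -7, current_date)
{% endif %}
```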
The incremental_predicates filter applies to both the source data and the existing table during merge operations, reducing the amount of data scanned.
Testing and Monitoring
Always implement tests for incremental models:
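A minimal schema.yml sketch covering the uniqueness and not-null checks an incremental model typically needs (model and column names are illustrative):

```yaml
# models/schema.yml
version: 2

models:
  - name: fct_events
    columns:
      - name: event_id
        tests:
          - unique      # catches duplicates an incremental merge should prevent
          - not_null
      - name: event_timestamp
        tests:
          - not_null    # a null timestamp would break the incremental filter
```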
2025 Monitoring Options:
dbt Cloud: Built-in observability with model timing, row counts, and test results
Elementary Data: Open-source data observability that monitors anomalies, schema changes, and data quality issues directly in your warehouse
dbt Mesh: For large organizations, implement cross-project dependencies and centralized monitoring across multiple dbt projects
Custom macros: Log metrics (row counts, runtime, error rates) to dedicated monitoring tables for trend analysis
Real-world example: A properly configured incremental model on a 5 billion row events table reduces processing from 2 hours (full refresh, $50 in compute) to 5 minutes (incremental, $2 in compute), a 96% cost reduction.
When NOT to Use Incremental Models
While incremental models offer significant benefits, they're not always the right choice:
Small tables (< 1 million rows): Full refresh is simpler and fast enough
Frequently changing dimensions: If >50% of rows update each run, full refresh may be faster
Complex business logic: When your transformation logic itself changes frequently, full refreshes ensure consistency
Initial development: Start with full refresh (materialized='table'), then optimize to incremental once logic stabilizes
Consider dbt snapshots (Type 2 Slowly Changing Dimensions) instead of incremental models when you need to track the full history of how records change over time, not just the current state.
Common Pitfalls and Troubleshooting
Forgetting the initial load: Always test your model with a full refresh (dbt run --full-refresh) to ensure it works without the incremental filter.
Ignoring idempotency: Incremental models should produce identical results whether run once or multiple times; this is critical for backfills and reruns. Test this by running your model twice and comparing results.
Over-relying on timestamps: Late-arriving data can cause missed records. Solutions:
Use sequence numbers or Kafka offsets instead of timestamps when available
Implement lookback windows to reprocess recent data
Use the microbatch strategy which handles this automatically
Common error: "Compilation Error in model": Usually indicates syntax errors in Jinja logic. Check that all {% if %} blocks have matching {% endif %} tags.
Duplicate records despite unique_key: Verify your unique_key is truly unique with a test. Composite keys require list syntax: unique_key=['col1', 'col2'].
Conclusion
Incremental models represent a fundamental shift from batch-oriented to continuous transformation patterns. By processing only what's changed, they enable real-time analytics at scale while controlling costs. Whether you're integrating Kafka streams with Conduktor or transforming traditional database changes, mastering incremental strategies is essential for modern analytics engineering.
Quick decision guide:
Immutable events → Use microbatch (2025 best practice) or append
Updating dimensions → Use merge
Late-arriving data in batches → Use delete+insert
Real-time streams → Use append with Kafka offsets
Start with simple strategies for immutable events, graduate to merge strategies for changing dimensions, and leverage streaming offsets for real-time pipelines. With proper testing, monitoring via Elementary Data or dbt Cloud, and the right incremental strategy, you can achieve 90%+ cost reductions while maintaining data freshness.
Related Concepts
dbt Tests and Data Quality Checks - Validating incremental models to ensure data quality
Streaming ETL vs Traditional ETL - How incremental models bridge batch and streaming paradigms
Data Contracts for Reliable Pipelines - Defining expectations for incremental model inputs