September 29, 2025
Incremental population of fact tables in a dbt model
Data Build Tool (dbt) helps keep analytical data in the warehouse consistent and fresh. However, as projects scale, a key challenge emerges: implementing incremental updates for fact tables that contain large volumes of event-based data spanning long time periods. This article explores how to automate such updates in dbt models, with a focus on controlling update windows and preventing data loss.
When migrating from a traditional project where data was updated using standalone SQL procedures to dbt models, it’s necessary to configure incremental updates for fact tables. Unlike dimension tables, where a simple merge strategy on a unique key often suffices, event-based fact tables require a more tailored approach due to their size and the dynamic nature of update periods.
For fact tables that collect data over defined reporting periods, the delete+insert strategy is generally preferred. It deletes records for a selected time window and reinserts the current data for that period. This approach helps minimise discrepancies and simplifies the incremental loading process.
Defining the update window using variables
The data update window is determined dynamically, for example covering the last two full months plus the current month up to the execution date. In dbt, it's convenient to define such periods with variables passed through the Jinja templating engine.
Example of passing variables when running a model:
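A run command might look like the following sketch; the model name fct_currency_rates and the variable names start_date and end_date are illustrative, not taken from the original project:

```shell
dbt run --select fct_currency_rates \
  --vars '{"start_date": "2025-07-01", "end_date": "2025-09-29"}'
```

Variables passed on the command line take precedence over any defaults defined in the project configuration.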
Defining variables in dbt_project.yml (with default values):
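A possible fragment of dbt_project.yml; the variable names and default dates are assumptions for illustration:

```yaml
# dbt_project.yml (fragment)
vars:
  start_date: "2025-07-01"  # fallback start of the update window
  end_date: "2025-09-29"    # fallback end of the update window
```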
Using variables in a dbt fact table model:
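A minimal sketch of how the variables might be read inside the model's SELECT via the var() function; the date column name rate_date is a hypothetical example:

```sql
-- fragment of a fact table model; rate_date is an illustrative column name
select *
from {{ ref('stg_1C_ERP_info_register_currency_rates') }}
where rate_date >= '{{ var("start_date") }}'
  and rate_date <  '{{ var("end_date") }}'
```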
Source data validation and removal of old records using hooks
Before inserting new data into the target fact table, it's essential to verify that the source contains data for the required period; this prevents data loss when old rows are deleted from the target. For this, dbt provides hooks: SQL statements (or lists of SQL statements) that run automatically before (pre-hook) or after (post-hook) a given step in the dbt workflow, for example:
- before running a model (table),
- after it has been successfully created,
- before or after running tests,
- at the start or end of the entire dbt run.
Example of a pre-hook for batch deletion of rows with data existence check:
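A sketch of such a pre-hook, written as a reusable macro and assuming a SQL Server-style warehouse (batched DELETE TOP with @@ROWCOUNT). The macro name, date column, and batch size are illustrative, and whether is_incremental() is available inside hooks can depend on the dbt version:

```sql
-- macros/delete_update_window.sql (hypothetical macro; SQL Server dialect assumed)
{% macro delete_update_window(relation, date_column) %}
    {% if is_incremental() %}
    if exists (
        select 1
        from {{ ref('stg_1C_ERP_info_register_currency_rates') }}
        where {{ date_column }} >= '{{ var("start_date") }}'
          and {{ date_column }} <  '{{ var("end_date") }}'
    )
    begin
        declare @rows int = 1;
        while @rows > 0
        begin
            -- remove the update window from the target in batches of 100,000 rows
            delete top (100000) from {{ relation }}
            where {{ date_column }} >= '{{ var("start_date") }}'
              and {{ date_column }} <  '{{ var("end_date") }}';
            set @rows = @@rowcount;
        end
    end
    {% endif %}
{% endmacro %}
```

The macro would be attached in the model's config as pre_hook="{{ delete_update_window(this, 'rate_date') }}", so that {{ this }} resolves to the target table. Deleting in batches keeps transaction log growth and lock duration under control on large tables.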
In this pre-hook, we perform the following checks:
- Using is_incremental(), we ensure that the target table already exists, its materialisation is set to incremental, and the dbt model is being run without the --full-refresh flag (which would rebuild the model from scratch).
- The if exists condition checks whether the source table stg_1C_ERP_info_register_currency_rates contains data for the specified period.
- If data exists in the source table, we delete rows from the target table (referenced as {{ this }} — the current model where the pre-hook is defined) for the update period, in batches of 100,000 rows.
After the pre-hook completes, the model inserts new data for the update window defined by the variables. Additionally, after the table is created (or rebuilt with --full-refresh), a post-hook executes a SQL script that creates a clustered index on the date column, which speeds up queries that filter by date.
Example of a complete dbt model with incremental fact table updates:
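A sketch of what the complete model might look like. The model, macro, column, and index names are assumptions; the pre-hook is assumed to be a macro that performs the batched, existence-checked delete described above, and the post-hook guards index creation with a sys.indexes check (SQL Server dialect assumed):

```sql
-- models/fct_currency_rates.sql (all names are illustrative)
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    pre_hook="{{ delete_update_window(this, 'rate_date') }}",
    post_hook="if not exists (
                   select 1 from sys.indexes
                   where object_id = object_id('{{ this }}')
                     and name = 'ix_fct_currency_rates_date'
               )
               create clustered index ix_fct_currency_rates_date
               on {{ this }} (rate_date)"
) }}

select
    rate_date,
    currency_code,
    rate
from {{ ref('stg_1C_ERP_info_register_currency_rates') }}
where rate_date >= '{{ var("start_date") }}'
  and rate_date <  '{{ var("end_date") }}'
```

Since the pre-hook already clears the update window, the SELECT only needs to return the rows for that same window; the post-hook runs after every build but creates the index only when it is missing, i.e. effectively on the initial build or a full refresh.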
Conclusion
Based on the above, dbt provides a convenient and reliable platform for building and managing analytical models. For incremental updates of fact tables with dynamic data load periods, the most suitable option is incremental materialisation with the "delete+insert" strategy, which keeps the data clean and consistent.
Using Jinja variables to define the update window when running dbt models allows flexible control over data loads without constant changes to the SQL code, while pre- and post-hooks add an extra layer of control over data integrity and overall performance.
Thus, it is entirely feasible to implement resilient and consistent fact table updates in the data warehouse using dbt models, provided the processes are properly configured and the framework’s tools are used effectively. This approach improves the quality and freshness of analytics while keeping operational costs to a minimum.