info@lasmart.biz

december 13, 2024

Comparative analysis of approaches to DWH modelling: Dimensional Modeling (Snowflake), Data Vault 2.0, Anchor Modeling

Data Warehouses (DWH) play a key role in supporting analytical and business decisions by providing structured data for reporting and analysis. Proper data modelling is crucial for the successful functioning of a DWH.

In this article, we will compare three popular modelling approaches: Anchor Modeling, Data Vault 2.0, and the Snowflake model. We will examine their core concepts, advantages, disadvantages, and areas of application, helping to choose the approach that best aligns with business requirements.

Snowflake

The Snowflake model is a data model based on two types of tables: facts and dimensions. The former are used to define quantitative data (events) – metrics or indicators of the system, while the latter provide context to the facts (where, when, and in what format the events occurred). These same concepts are used in the data mart layer – a level focused on providing an easy-to-use structure for analytics and report creation.

Figure 1 – example ER diagram for Dimensional Modeling
At the center is the fact table «Receipt» (a sales event), which is linked to the dimension tables

Data Vault 2.0

Data Vault 2.0 is a more flexible methodology designed to ensure scalability and track changes in data (historical tracking).

The core concepts of the model are:

  • Hub – an entity or event in the business. It contains the business key, hash key, a reference to the source, and the record’s addition date. A hub can only be linked to a link or satellite.
  • Link – relationships between hubs. They contain hash keys of the related entities, a reference to the source, and the record’s addition date. Links can be associated with hubs and satellites.
  • Satellite – contextual data for entities and relationships, mutable attributes, linked to hubs and links.

Figure 2 – ER diagram for Data Vault 2.0
Hubs are marked in green, links in blue, and satellites in grey

Anchor Modeling

Anchor Modeling is a flexible methodology designed to handle data that constantly changes in structure. It supports full historical tracking.

Key concepts of the model:

  • Anchor – a business entity or event. It contains only surrogate keys, a reference to the source, and a timestamp indicating when the record was added.
  • Link – an entity similar to a link in Data Vault 2.0, with minor differences: it cannot have satellites (attributes).
  • Attribute – models the context for anchors and contains their characteristics. A distinctive feature is that a separate table must be created for each attribute (e.g., name, colour, gender, etc.).

Figure 3 – ER diagram fragment for Anchor Modeling
Anchors are marked in blue-green, links in green, and anchor attributes in orange

Advantages and disadvantages of the models

Snowflake

This model is designed for regulated analytical systems that do not require (or at least frequently require) changes in the data structure.

Pros:

  • Simplicity of implementation, low learning curve for specialists – the model is intuitive
  • High query performance due to the small number of tables and the joins between them

Cons:

  • Low scalability, difficulty in adding new data sources
  • Data duplication and null values, leading to increased storage volume and complexity in data updates

Data Vault 2.0

This model is designed for systems that require historical data tracking and systems that change over time.

Pros:

  • Flexibility and scalability, ease in adding new data sources
  • Full support for historical tracking

Cons:

  • More complex structure, leading to difficulties in developing ETL processes
  • Reduced performance due to a larger number of tables and their joins

Anchor Modeling

This model is suitable for fast-growing businesses and systems with frequent changes and updates.

Pros:

  • Ease of supporting any changes in the data structure
  • Full historical tracking support
  • Low data redundancy due to high normalisation (no duplication, no null values)

Cons:

  • High learning curve for specialists, difficulty in understanding the concept
  • Low performance even for basic queries
  • Requirement for automated creation of entities (attributes, anchors, links)

Comparative analysis of the models

Parameter Snowflake Data Vault 2.0 Anchor Modeling
Scalability
Low
High
Very high
Historical tracking support
Partial
Full
Full
Performance
High
Medium
Low
Complexity
Low
Medium
High

Table 1 – Comparison by key criteria

Performance testing for data retrieval

The following typical queries were selected for the analysis:

  • Retrieving the entire «Receipts» fact table
  • Generating a three-month report by store for plan/actual analysis
  • Generating an ABC analysis report

The queries were executed in a test environment with approximately 2.5 million records in the «Receipts» fact table. For the second and third queries, additional dimensions such as Products (26.3 thousand rows), Stores (32 rows), and Product Groups (983 rows) were used.

Query results:

Snowflake Data Vault 2.0 Anchor Modeling
Retrieving fact table (Receipts)
24 sec
35 sec
72 sec
Plan/actual analysis report
2.2 sec
8.6 sec
13.3 sec
ABC analysis report
13 sec
29,4 sec
60 sec

Table 2 – Execution time

Snowflake Data Vault 2.0 Anchor Modeling
Retrieving fact table (Receipts)
92 MB
856 МВ
1641 MB
Plan/actual analysis report
52 MB
459 МВ
791 MB
ABC analysis report
86 MB
922 МВ
1344 MB

Table 3 – Memory usage

Snowflake Data Vault 2.0 Anchor Modeling
Retrieving fact table (Receipts)
152966400 B
171422100 В
334613800 B
Plan/actual analysis report
764040 B
792600 В
933484 B
ABC analysis report
69865 B
75310 В
95120 B

Table 4 – Data reads in bytes

Snowflake Data Vault 2.0 Anchor Modeling
Retrieving fact table (Receipts)
146 MB
163,5 МВ
319 MB
Plan/actual analysis report
0,75 MB
0,76 МВ
0,9 MB
ABC analysis report
0,068 MB
0,072 МВ
0,092 MB

Table 5 – Data reads in megabytes

Conclusions

Thus, each methodology has its own strengths and weaknesses, caters to specific business needs, and when chosen correctly, can improve the efficiency of a Data Warehouse (DWH).

Anchor Modeling offers high flexibility and resilience to changes due to its atomic structure. This approach is ideal for systems where data and business requirements frequently change, providing ease of adaptation.

Data Vault 2.0 is also characterised by high resilience and modularity. Its structure allows effective management of historical data and ensures data integrity, even in environments with frequent changes. This approach has proven especially effective for enterprise data warehouses, where data traceability and integration from multiple sources are crucial.

On the other hand, Snowflake is better suited for stable analytical systems. Its comparatively lower level of normalisation ensures better performance for analytical queries. However, its drawbacks become evident in environments with frequent changes — modifying the structure can require significant effort and impact the related parts of the model.

For more information, please get in touch here:

Contact Form main