info@lasmart.biz

november 27, 2024

Comparison of Anchor Modeling and Dimensional Modeling

Choosing the right data modeling methodology for a data warehouse at the design stage is crucial. It ensures the necessary level of flexibility and scalability while aligning with business objectives.

In this article, we will review the strengths and weaknesses of two popular methodologies—Anchor Modeling and Dimensional Modeling—and compare their performance with typical queries.

DIMENSIONAL MODELING, or size-based modeling, is a data modeling approach where facts and dimensions are key components for organising and describing entities in a data warehouse.

  • Facts – describe business logic events (e.g., sale, purchase, sending a letter).
  • Dimensions – their characteristics (e.g., store, product, sender) that add context to the stored events in fact tables.

Typical examples of such models include the Star and Snowflake schemas. Both models imply linking a fact table to dimension tables. In the Snowflake schema, dimensions can also be connected to other dimensions, increasing the level of normalisation up to the Third Normal Form (3NF).

Figure 1 – example ER diagram for Dimensional Modeling.
At the center is the fact table «Receipt» (a sales event), which is linked to the dimension tables.

ANCHOR MODELING is a data modeling method that enables efficient handling of large volumes of data that change in structure and content. This method is based on a high degree of normalisation (up to the Sixth Normal Form (6NF)).

The main concepts of the model include:

  • Anchor – represents an entity or event, contains surrogate keys and (optionally) a timestamp for when the record was added.
  • Attribute – used for modeling the properties and characteristics of anchors, contains the anchor’s surrogate key, the attribute value, and a timestamp to maintain data history.
  • Tie – models relationships between anchors, contains surrogate keys and a timestamp for when the record was added. Ties cannot have attributes.

Each of these concepts is represented by a separate table in the data warehouse for each entity. For example, if an anchor has 10 different attributes, there must be 10 separate independent tables in the warehouse for these attributes.

Figure 2 – ER diagram fragment for Anchor Modeling
Anchors are marked in blue-green, ties (links) in green, and anchor attributes in orange.

Attributes and ties are divided into static and historical. The latter are used when it is necessary to preserve all changes that have occurred to an entity. There are several approaches to implementing historical tracking; we used one of the basic methods—recording the last update date for each entry (the last_update_date field) and a flag indicating data relevance for selecting all the latest updates (the is_actual field).

Let’s compare the pros and cons of each model.

Anchor Modeling

Pros:

  • Flexibility in modeling and high scalability (due to the extremely high degree of data normalisation);
  • Space savings due to the absence of null values (if an anchor does not have a value for a given attribute, the corresponding record simply does not exist in the table) and reduced duplication;
  • Greater resilience to development errors (including in related sources), as data is stored independently;
  • Simplicity in maintaining data history, including old attributes, ties, and anchors that are no longer in use: it is enough to remove the join for that entity from queries, as the table is stored independently of others;

Cons:

  • A large number of tables and joins, making it challenging to optimise queries effectively;
  • High learning curve for specialists, with complexity in understanding the concept and management;
  • High load on the database, even for standard queries (discussed further below);
  • The large number of tables requires automation of the creation and updating process (potentially using Python);

Dimensional Modeling

Pros:

  • Simple architecture, low learning curve for specialists;
  • High performance in data retrieval (discussed further below);

Cons:

  • Low scalability, making the process of changing the data structure difficult;
  • Data duplication and null values, leading to increased storage volume and complexity in data updates;

Comparative performance analysis of Anchor Modeling and Dimensional Modeling

To conduct the analysis, the following typical queries were selected:

  1. Retrieving the entire «Receipts» fact table.
  2. Generating a three-month report by store for plan/actual analysis.
  3. Generating an ABC analysis report.

It is important to note that in Anchor Modeling, all work with tables is done through views, as all tables need to be denormalised into a convenient format for analysis.

The queries were performed in a test environment with approximately 2.5 million records in the «Receipts» fact table. For the second and third queries, additional dimensions such as Products (26.3 thousand rows), Stores (32 rows), and Product Groups (983 rows) were used.

Query results:

Dimensional Modeling Anchor Modeling
Retrieving fact table (Receipts)
24 s
72 s
Plan/actual analysis report
2.2 s
13.3 s
ABC analysis report
13 s
60 s

Table 1 – comparison of execution times

Dimensional Modeling Anchor Modeling
Retrieving fact table (Receipts)
92 MB
1641 MB
Plan/actual analysis report
52 MB
791 MB
ABC analysis report
86 MB
1344 MB

Table 2 – comparison of memory usage

Dimensional Modeling Anchor Modeling
Retrieving fact table (Receipts)
152966400 B
334613800 B
Plan/actual analysis report
764040 B
933484 B
ABC analysis report
69865 B
95120 B

Table 3 – comparison of data reads in bytes

Dimensional Modeling Anchor Modeling
Retrieving fact table (Receipts)
146 MB
319 MB
Plan/actual analysis report
0,75 MB
0,9 MB
ABC analysis report
0,068 MB
0,092 MB

Table 4 – comparison of data reads in megabytes

In terms of performance, due to the specifics of the schema, Anchor Modeling falls short.

Both models were tested within a database schema where maintaining the history of all fields was a mandatory requirement: changes could occur in the product name as well as in the receipt issue date. Therefore, in the case of Anchor Modeling, even to retrieve a single fact, it was necessary to first query the current data (latest updates) from about 40 tables before accessing the main fact table. This created a high load and degraded query performance compared to Dimensional Modeling.

Conclusion

  • If detailed information about business processes and modeling flexibility are required, Anchor Modeling is the best choice.
  • If quick analytics and simplicity of implementation are needed, the Dimensional Model is a more suitable solution.

Thus, the choice between Anchor Modeling and Dimensional Modeling depends on the specific needs of the project and the characteristics of the data. It is important to thoroughly analyse both options before making an implementation decision.

For more information, please get in touch here:

Contact Form main