july 13, 2024

Automatic statistics update

SQL Server uses a query optimizer that analyses statistics to determine the most efficient execution plan for a query. Therefore, the relevance of statistics directly affects query speed. This article discusses statistics and the MS SQL Server options that allow for their automatic updating.

Statistics are information about the distribution of values in one or more fields of a table. They consist of a histogram of values and a density vector. You can view the statistics in the Object Explorer in the tables section.

A histogram shows the frequency of values in a table field. The query optimizer uses it to calculate the number of rows that will be read and used further. An example of a histogram is shown in Figure 1.

Figure 1. Example of a histogram

A histogram consists of multiple groups of rows, with a maximum of 200 such groups. The RANGE_HI_KEY defines the upper boundary of a group, meaning the group includes values from the RANGE_HI_KEY of the previous group (exclusive) to the current value (inclusive). Statistics are gathered for each group, specifically:

The number of values not equal to RANGE_HI_KEY (RANGE_ROWS)
The number of values equal to RANGE_HI_KEY (EQ_ROWS)
The number of unique rows (DISTINCT_RANGE_ROWS)
The average number of unique rows (AVG_RANGE_ROWS = DISTINCT_RANGE_ROWS / RANGE_ROWS)

For example, let’s describe the statistics for the third group, where the RANGE_HI_KEY is 159. The group consists of values from 157 (exclusive) to 159 (inclusive). RANGE_ROWS is 1, meaning there is one value where 157 < n < 159, which in this case is 158. EQ_ROWS is also 1, indicating there is one value equal to 159 in the table. DISTINCT_RANGE_ROWS is 1, representing the single unique value 158, and AVG_RANGE_ROWS is 1.

Density is information about the selectivity of a field. It is equal to one divided by the number of unique values of the field or unique combinations of several fields. An example of density is shown in Figure 2.

Figure 2. Density of the ID field

According to the image, the density of the ID field is 0.005464481, which corresponds to 183 unique values. The Average Length column shows the average size of the value in bytes.

The density for multiple fields forms a vector. For example, if the statistics consist of the fields Customer, Item, and Price, the vector will include the densities of:

1) Customer

2) Customer and Item

3) Customer, Item and Price.

In our work, we use three options to regulate the automatic creation of statistics. Below are recommendations based on the analysis of statistical data from several projects.

1. Auto create statistics

When this option is enabled, the optimizer creates statistics for each field in the query if they do not already exist. By default, this option is enabled.

It is recommended to leave this option enabled; otherwise, the absence of statistics will lead to inefficient query plans and decreased performance.

You can disable or enable this option for the master database with the following query:

2. Auto update statistics

Modifying table data changes the distribution of values and renders statistics outdated. The AUTO_CREATE_STATISTICS option only creates missing statistics but does not update them. When the optimizer uses outdated statistics, it risks selecting an inefficient plan. Therefore, it is recommended to enable the AUTO_UPDATE_STATISTICS option, which allows the optimizer to periodically update statistics.

Before choosing an optimal plan, the optimizer checks the relevance of the statistics used. It compares the number of data modification operations for this field with a threshold value, which is based on the number of rows in the table. The required number of modifications for statistics to be considered outdated is presented in Table 1.

Table 1. Number of modifications required for statistics to be considered outdated

Type of table	Number of rows, n	Up to SQL Server 2014 (12.x)	After SQL Server 2016 (13.x)
Temporary	n < 6	6	6
Temporary	6 <= n <= 500	500	500
Regular	n <= 500	500	500
Temporary or Regular	500 < n <= 20000*	500 + (0.20 x n)	500 + (0.20 x n)
Temporary or Regular	n > 20000*	500 + (0.20 x n)	SQRT (1,000 x n)

* Approximate values

By default, this option is disabled. You can enable or disable the option for the master database with the following query:

The AUTO_UPDATE_STATISTICS option updates statistics synchronously, meaning the optimizer waits for the update to complete before selecting the plan for the current query. With asynchronous updating enabled, the optimizer does not wait for the statistics update to finish when choosing a plan. This results in the current query being executed with the old, potentially inefficient plan. Therefore, it is not recommended to use this option.

You can enable or disable the option for the master database with the following query:

3. Auto create incremental statistics

Updating statistics requires reading the entire table, which can be time-consuming. Often, such tables are partitioned, and it is possible to create statistics for each partition. This type of statistics is called incremental and does not require reading the entire table.

By default, this option is disabled. It is recommended to enable this option if the database uses large partitioned tables. You can enable or disable the option with the following query:

The following query creates incremental statistics:

You can check the value of these options for each database on the server with the following query:

Time comparison

We will compare the execution time of a simple query that joins two tables with up-to-date and outdated statistics. First, we will create statistics on an empty table, fill it with 6 million rows without updating the statistics, and execute the query. After that, we will update the statistics and run the same query. As a result, the query with outdated statistics took 69 seconds, while the query with up-to-date statistics took 52 seconds. The query time was reduced by 25% thanks to the statistics update. It is worth noting that for larger tables, the difference will be more significant.

It is important to note that statistics play a critical role in forming an efficient query plan. Therefore, it is recommended to enable the creation and updating of statistics and to create incremental statistics for partitioned tables.

info@lasmart.biz

Automatic statistics update

july 13, 2024

Automatic statistics update

1. Auto create statistics

2. Auto update statistics

3. Auto create incremental statistics

Time comparison

For more information, please get in touch here:

Go back

info@lasmart.biz