info@lasmart.biz

october 28, 2024

Database maintenance

Database maintenance is a comprehensive task that covers the entire lifecycle of a data repository. It is a crucial process that helps prevent database degradation.

This article will discuss the aspects of database maintenance that require special attention: table maintenance, index maintenance, statistics management, and fragmentation control.

Indexes, fragmentation and statistics

Indexes in databases

An index is a data structure used to speed up data retrieval in a database table. It works similarly to an index in a book, allowing quick access to information without having to look through every page (or row, in the case of a database). Indexes can significantly improve query performance, especially when working with large volumes of data.

Statistics are a set of metrics about the state of tables and indexes, used to optimise query execution plans. This information helps choose the most efficient way to process a query.

System and user statistics

System statistics are generated automatically by SQL Server and contain information about data distribution in tables and indexes. By default, SQL Server automatically updates system statistics during DML operations (INSERT, UPDATE, DELETE).

User statistics allow database users to explicitly define the data distribution in tables and indexes, which can be useful for optimising queries in cases where automatic statistics do not provide sufficient information. They are generated using the CREATE STATISTICS procedure and can be used by the query engine instead of system statistics. User statistics are created using the WITH FULLSCAN keyword, which means that all rows will be read to generate the statistics.

Why is it necessary to update statistics?

The system may use outdated statistics for query planning, leading to improper optimisation and reduced performance. When a significant amount of data is added, deleted, or modified, the statistics may become inaccurate, which also affects query execution plans. Regularly updating statistics helps maintain high system performance, preventing the negative effects of outdated information.

How to update statistics

In SQL Server, the UPDATE STATISTICS command is used to update statistics for specific objects.

In the database settings, you can enable the «Auto Update Statistics» option, which automatically updates statistics after a certain number of insert, update, or delete operations. By default, this option is set to «True». Auto update statistics is used in most cases. However, if regular statistics updates are required or if frequent automatic updates consume a lot of time/resources, it can be disabled by setting the flag to «False».

Another important parameter is «Auto Create Incremental Statistics». Incremental statistics are updated only for the rows that have changed, rather than the entire dataset. This type of statistics gathering is faster. However, this method can introduce a small margin of error, so it is not used in situations where precision is critical. By default, this option is set to «False».

Statistics in partitioned tables

For partitioned tables, SQL Server creates separate statistical objects for each partition. This means that the statistics for each partition can be updated independently. Automatic statistics updates occur when DML operations are performed within each partition, but sometimes a manual update of statistics may be necessary to improve query performance.

To update statistics on partitioned tables, the UPDATE STATISTICS command is used with the specified partition number. This allows for precise control over which partitions will be updated.

Suppose we have a table OrderDetails in the Sales schema, partitioned by the SaleYear field:

To update the statistics, you can use the following code:

Fragmentation

Index fragmentation occurs when the data in an index is disorganised or unevenly distributed. This can result in decreased search efficiency and increased query execution times. Fragmentation can arise due to the deletion or insertion of records, which causes the remaining index elements to shift.

Viewing RowStore fragmentation

To obtain detailed information about the physical state of RowStore indexes, you can use system views and functions, such as sys.dm_db_index_physical_stats.

The query will return a list of tables and indexes with a fragmentation percentage above zero, allowing you to identify areas for further investigation and potential rebuilding.

Viewing ColumnStore fragmentation

For ColumnStore indexes, specialised system views and functions are available, such as sys.dm_db_column_store_row_group_physical_stats.

This query allows you to view all fragmented segments of ColumnStore indexes.

Rebuild and Reorganize

The REBUILD and REORGANIZE operations are used to improve performance and reduce fragmentation in indexes and tables. These operations have different purposes and mechanisms, making them suitable for different scenarios.

Rebuild

  • Purpose: Complete reconstruction of an index or table. This is the most thorough way to address fragmentation, as it completely recreates the index or table from scratch, removing all old data and restoring the structure to an optimal state.
  • Mechanism: The REBUILD operation moves all data into a new data block, creates new pages and indexes, and deletes the old data. This can take a considerable amount of time, especially for large objects, and requires additional disk space for temporary files.
  • Example use case for indexes:
  • Example use case for tables:

Reorganize

  • Purpose: Improve performance by reducing fragmentation while preserving the primary structure of the index or table. It is a less time-consuming and resource-intensive operation compared to REBUILD.
  • Mechanism: REORGANIZE moves only the necessary parts of the data to reduce fragmentation while keeping the remaining data and indexes intact. This allows for a significant improvement in performance with less execution time and resource usage.
  • Example use case for indexes:
  • Example use case for tables:

Difference between REBUILD and REORGANIZE

REBUILD takes more time and resources because it completely recreates the index or table, whereas REORGANIZE performs a lighter operation by moving only the fragmented data. While REORGANIZE is faster, REBUILD can offer a more long-term solution for severe fragmentation, especially if it is caused by the deletion of a large amount of data.

The choice between REBUILD and REORGANIZE depends on the specific situation, the level of fragmentation, and the available resources. In most cases, REORGANIZE is the preferred option for daily maintenance, while REBUILD can be used to address more serious fragmentation issues.

Updating indexes in partitioned tables

Indexes of partitioned tables can be updated for each partition individually by adding PARTITION = partition_number after REBUILD/REORGANIZE:

Using Hints with REBUILD

Hints are instructions that provide additional guidance to the query optimiser on how to best execute a query. They can be used to improve query performance, but their use should be justified, as incorrect use of hints can lead to undesired outcomes.

Below are the most commonly used hints in conjunction with REBUILD:

  • ONLINE
    Allows maintaining the availability of tables/indexes during the REBUILD operation.

(Index operations in «online» mode are only supported in the SQL Server Enterprise Edition)

  • SORT_IN_TEMPDB
    Specifies where data sorting will occur during these operations. By default, sorting is performed in place, which can lead to significant memory and CPU usage on the server. Using SORT_IN_TEMPDB allows this process to be moved to tempdb, which can reduce the load on the main database and improve server performance.

Conclusion

Proper use of the described rules and tools will help optimise database performance and maintain high query execution speeds. Neglecting maintenance will inevitably lead to a decline in database performance.

For more information, please get in touch here:

Contact Form main