November 13, 2024
Partitioning in SQL Server
Partitioning of data is the process of dividing large volumes of data into smaller, more manageable parts known as partitions.
It is an essential tool for optimising database performance, especially in high-volume and high-load environments. Data partitioning allows large tables to be split into smaller, independent parts that can be indexed, processed, and restored separately. This improves query performance, reduces recovery time after a failure, and simplifies data management.
In MS SQL Server, table and index partitioning is built on two objects: a partition function and a partition scheme. In this article, we will explore the basics and practical aspects of partitioning in MS SQL Server.
Partitioning function
A partitioning function is a database object that defines how rows in a table or index are mapped to a set of partitions based on the values of a specific column, known as the partitioning column. Each value in the partitioning column is the input for the partitioning function, which returns the number of the partition that the row belongs to.
A partitioning function is created using the CREATE PARTITION FUNCTION statement. It defines the rules by which rows in a table or index will be distributed across partitions based on the values of the specified column. For example, you can create a partitioning function that distributes rows by months of the year:
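A minimal sketch of such a function, assuming a partitioning column of type date and the calendar year 2024. The name pf_MonthOfYear and the boundary values are illustrative and not taken from a real system:

```sql
-- Hypothetical name and boundaries: 11 boundary values give 12 partitions.
-- RANGE RIGHT means each boundary value belongs to the partition on its right,
-- so partition 1 holds dates before 2024-02-01, partition 2 holds February, and so on.
CREATE PARTITION FUNCTION pf_MonthOfYear (date)
AS RANGE RIGHT FOR VALUES (
    '20240201', '20240301', '20240401', '20240501', '20240601', '20240701',
    '20240801', '20240901', '20241001', '20241101', '20241201'
);
GO  -- batch separator used by SSMS and sqlcmd

-- Ask the function which partition a given value maps to.
SELECT $PARTITION.pf_MonthOfYear(CAST('20240615' AS date)) AS partition_number;
```

With these boundaries, a date in June 2024 maps to partition 6. Dates after 2024 all fall into the last partition until a new boundary is added.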
Next, a partition scheme is created, which defines how the data will actually be distributed across the database’s filegroups:
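A sketch of a partition scheme and a table placed on it. The names pf_MonthOfYear, ps_MonthOfYear and dbo.Sales are hypothetical, and mapping every partition to the PRIMARY filegroup is a simplification; a production setup would more likely spread partitions over several filegroups:

```sql
-- Create the month-based partition function only if it does not exist yet
-- (hypothetical name and boundaries for the year 2024).
IF NOT EXISTS (SELECT 1 FROM sys.partition_functions WHERE name = N'pf_MonthOfYear')
    CREATE PARTITION FUNCTION pf_MonthOfYear (date)
    AS RANGE RIGHT FOR VALUES (
        '20240201', '20240301', '20240401', '20240501', '20240601', '20240701',
        '20240801', '20240901', '20241001', '20241101', '20241201'
    );
GO

-- Map all partitions of the function to one filegroup.
CREATE PARTITION SCHEME ps_MonthOfYear
AS PARTITION pf_MonthOfYear
ALL TO ([PRIMARY]);
GO

-- Create a table on the scheme; SaleDate is the partitioning column.
-- The clustered primary key includes the partitioning column,
-- which a unique index on a partitioned table requires.
CREATE TABLE dbo.Sales (
    SaleId   int IDENTITY(1, 1) NOT NULL,
    SaleDate date               NOT NULL,
    Amount   decimal(10, 2)     NOT NULL,
    CONSTRAINT PK_Sales PRIMARY KEY CLUSTERED (SaleDate, SaleId)
) ON ps_MonthOfYear (SaleDate);
```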
When querying a partitioned table, the actual execution plan shows how many partitions were accessed: the properties of the scan or seek operator include the Actual Partition Count.
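One way to check this, sketched under the assumption that a table dbo.Sales exists and is partitioned on its SaleDate column by a partition function named pf_MonthOfYear (all hypothetical names):

```sql
-- Assumed objects (hypothetical): dbo.Sales partitioned on SaleDate by pf_MonthOfYear.

-- Count rows per partition by applying the partition function to every row.
SELECT $PARTITION.pf_MonthOfYear(SaleDate) AS partition_number,
       COUNT(*)                            AS row_count
FROM dbo.Sales
GROUP BY $PARTITION.pf_MonthOfYear(SaleDate)
ORDER BY partition_number;
GO

-- Return the actual execution plan as XML together with the query result.
SET STATISTICS XML ON;
GO
SELECT SUM(Amount) AS june_total
FROM dbo.Sales
WHERE SaleDate >= '20240601' AND SaleDate < '20240701';
GO
SET STATISTICS XML OFF;
GO
```

Because the filter is on the partitioning column and covers a single month, the plan for the second query should report one accessed partition rather than all of them.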
Maintenance of partitioned tables
Maintenance of partitioned tables in MS SQL Server involves several key aspects aimed at ensuring high performance, reliability, and data security. Here are a few details to consider:
- Performance monitoring: Regularly tracking performance metrics, such as query execution time, CPU, and memory usage, helps identify bottlenecks and optimise system performance.
- Query analysis: Inspecting estimated and actual execution plans (for example, with SET SHOWPLAN_XML, SET STATISTICS XML, or the plan display in SQL Server Management Studio) shows how queries are executed and where they can be optimised.
- Data placement management: In some cases (e.g., when data within a partition grows), it may be necessary to move data between partitions or relocate partitions themselves to balance load or optimise storage.
- Backup: To ensure fault tolerance and recovery after failures, regular backups of data, including system tables and user data, must be performed.
- Automatic statistics updates: To maintain the accuracy of query execution plans, it is important to regularly update partition statistics. In MS SQL Server, automatic statistics updates can be configured to enhance query performance.
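A few of these tasks can be sketched in T-SQL. The example assumes hypothetical objects: a partition function pf_MonthOfYear, a partition scheme ps_MonthOfYear, and a table dbo.Sales with a clustered index PK_Sales. The boundary value and partition number are placeholders:

```sql
-- Assumed objects (hypothetical): pf_MonthOfYear, ps_MonthOfYear, dbo.Sales, PK_Sales.

-- Data placement: choose the filegroup that will receive the next new partition.
ALTER PARTITION SCHEME ps_MonthOfYear NEXT USED [PRIMARY];

-- Data placement: add a boundary so that rows dated 2025-01-01 or later
-- go to their own partition instead of growing the last one.
ALTER PARTITION FUNCTION pf_MonthOfYear() SPLIT RANGE ('20250101');

-- Index maintenance on a single partition instead of the whole index.
ALTER INDEX PK_Sales ON dbo.Sales REBUILD PARTITION = 6;

-- Manual statistics refresh for the table.
UPDATE STATISTICS dbo.Sales;
```

Splitting a range that already contains rows moves data and can be expensive, so new boundaries are usually added while the affected partition is still empty.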
Detailed information on maintaining partitioned tables can be found in the article “Database Maintenance”.
Incremental statistics
Incremental statistics is a method of gathering statistics per partition: after data changes, only the statistics for the affected partitions are refreshed and merged into the table-level statistics, rather than rescanning the entire table each time. This is particularly useful for large tables with frequent changes.
Why is incremental statistics needed?
- Query plan efficiency: The database can better optimise query execution plans by using up-to-date information about data distribution.
- Fast statistics update: After data changes, statistics are updated quickly, minimising delays in query execution.
Configuring incremental statistics
In MS SQL Server (2014 and later), incremental statistics can be enabled with the INCREMENTAL = ON option in the WITH clause of the CREATE STATISTICS statement.
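A minimal sketch, assuming a hypothetical partitioned table dbo.Sales with an Amount column. The statistics name and the partition number are illustrative:

```sql
-- Assumed objects (hypothetical): partitioned table dbo.Sales with column Amount.

-- Create per-partition (incremental) statistics on a column.
CREATE STATISTICS st_Sales_Amount
ON dbo.Sales (Amount)
WITH FULLSCAN, INCREMENTAL = ON;

-- After loading data into one partition, refresh only that partition's statistics.
UPDATE STATISTICS dbo.Sales (st_Sales_Amount)
WITH RESAMPLE ON PARTITIONS (6);
```

Statistics that belong to an index can be made incremental in a similar way with the STATISTICS_INCREMENTAL = ON index option.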
Optimising data deletion with TRUNCATE PARTITION
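Partition truncation removes all rows from specific partitions without logging individual row deletions. There is no separate TRUNCATE PARTITION statement: the operation is written as TRUNCATE TABLE ... WITH (PARTITIONS (...)) and is available in SQL Server 2016 and later.
This is much faster than deleting the same rows using DELETE. It also means that data due for removal, such as the oldest month, can be kept in its own partition and then discarded as a single block.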
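A short sketch, assuming a hypothetical partitioned table dbo.Sales whose indexes are all aligned with the table (partitioned on the same function). The partition numbers are placeholders:

```sql
-- Assumed object (hypothetical): partitioned table dbo.Sales with aligned indexes.

-- Remove all rows from partition 1 only; other partitions are untouched.
TRUNCATE TABLE dbo.Sales WITH (PARTITIONS (1));

-- Individual partitions and ranges can be combined in one statement.
TRUNCATE TABLE dbo.Sales WITH (PARTITIONS (2, 4 TO 6));
```

The usual TRUNCATE TABLE restrictions still apply, for example the table cannot be referenced by a foreign key constraint.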
Conclusion
Partitioning is a powerful tool for optimising queries on large tables and indexes, as well as for dividing data by importance.
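For example, when a query filters on the partitioning column, SQL Server can skip the partitions that cannot contain matching rows (partition elimination), so the number of data reads is reduced compared to the same query on a non-partitioned table.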