SQL Server 2008 R2 : SQL Server Index Maintenance (part 1)

8/18/2013 11:14:16 AM

SQL Server indexes are self-maintaining, which means that any time a data modification (such as an update, a delete, or an insert) takes place on a table, the index B-tree is automatically updated to reflect the correct data values and current rows. Generally, you do not have to do any maintenance of the indexes, but indexes and tables can become fragmented over time. There are two types of fragmentation: external fragmentation and internal fragmentation.

External fragmentation occurs when the logical order of pages does not match the physical order or the extents allocated to the table are not contiguous. These situations occur typically with clustered tables as a result of page splits and pages being allocated and linked into the page chain from other extents. External fragmentation is usually not much of an issue for most queries performing small result set retrievals via an index. It’s more of a performance issue for ordered scans of all or part of a table or index. If the table is heavily fragmented and the pages are not contiguous, scanning the page chain is more expensive.

Internal fragmentation occurs when an index is not using up all the space within the pages in the table or index. Fragmentation within an index page can happen for the following reasons:

As more records are added to a table, space is used on the data page and on the index page. As a result, the page eventually becomes completely full. If another insert takes place on that page and there is no more room for the new row, SQL Server splits the page into two, each page now being about 50% full. If the clustered key values being inserted are not evenly distributed throughout the table (as often happens with clustered indexes on sequential keys), this extra free space might not be used.
Frequent update statements can cause fragmentation in the database at the data and index page level because the updates cause rows to move to other pages. Again, if future clustered key values inserted into the table are not evenly distributed throughout the table, the empty slots left behind might not be used.
As rows are deleted, space becomes freed up on data and index pages. If no new rows within the range of deleted values on the page are inserted, the page remains sparse.

Note

Internal fragmentation is not always a bad thing. Although pages that are not completely full use up more space and require more I/O during retrieval, free space within a page allows for rows to be added without having to perform an expensive page split. For some environments where the activity is more insert intensive than query intensive, you might want more free space in pages. This can be accomplished by applying the fill factor when creating the index on the table. Applying the fill factor is described in more detail in the next section.

Usually, in a system, all these factors contribute to the fragmentation of data within the data pages and index pages. In an environment subject to a lot of data modification, you might see a lot of fragmentation on the data and index pages over a period of time. These sparse and fragmented pages remain allocated to the table or index even if they have only a single row or two, and the extent containing the page remains allocated to the table or index.

Data fragmentation can adversely affect performance for table or index scanning operations because the data is spread across more pages than necessary. More I/Os are required to retrieve the data. SQL Server provides a dynamic management view, sys.dm_db_index_physical_stats, which is a multistatement table-valued function that returns size and fragmentation information for the data and indexes of a specified table or view. The results from the function are returned by a normal SELECT statement and thus can be saved to a table for reporting purposes and historical analysis. The syntax of dm_db_index_physical_stats is as follows:

sys.dm_db_index_physical_stats (
    { database_id | NULL | 0 | DEFAULT }
    , { object_id | NULL | 0 | DEFAULT }
    , { index_id | NULL | 0 | -1 | DEFAULT }
    , { partition_number | NULL | 0 | DEFAULT }
    , { mode | NULL | DEFAULT } )

The parameters for dm_db_index_physical_stats are summarized in Table 1.

Table 1. dm_db_index_physical_stats Parameters
Parameter	Description
database_id	The ID of the database. The default is 0, which returns information for all databases. NULL, 0, and DEFAULT are equivalent values in this context. If you specify NULL or 0, for database_id, you must specify NULL for object_id, index_id, and partition_number.
object_id	The object ID of the table or view the index is on. Valid inputs are the ID number of a table or view, NULL, 0, or DEFAULT. The default is 0, which returns information for all tables and views in the specified database. NULL, 0, and DEFAULT are equivalent values in this context.
index_id	The ID of the index. Valid inputs are the ID number of an index, 0 if object_id is a heap, NULL, -1, or DEFAULT. The default is -1, which returns information for all indexes for a table or view. NULL, -1, and DEFAULT are equivalent values in this context. If you specify NULL for index_id, you must also specify NULL for partition_number.
partition_number	The partition number in the object. Valid inputs are the partition_number of an index or a heap, NULL, 0, or DEFAULT. The default is 0, which returns information for all partitions of the object. NULL, 0, and DEFAULT are equivalent values in this context. Use a partition_number of 1 for a nonpartitioned index or heap.
mode	The scan level used to obtain physical index statistics. Valid inputs are DEFAULT, NULL, LIMITED, SAMPLED, or DETAILED. The default mode is LIMITED. NULL and DEFAULT are equivalent values in this context.

The sys.dm_db_index_physical_stats function requires only an Intent-Shared table lock, regardless of the mode in which it runs. This provides for the capability to run the sys.dm_db_index_physical_stats function online without blocking update activity on a table.

The scan-level mode determines the level of scanning performed by the function to obtain the physical statistics for the index. The LIMITED mode is the fastest and scans the smallest number of pages. It scans all data pages for a heap but scans only leaf-level pages for an index. It also returns only a subset of the data columns, as shown in Table 2. The SAMPLED mode returns statistics based on a 1% sample of all the pages in the index or heap. If the index or heap has fewer than 10,000 pages, DETAILED mode is used instead of SAMPLED. The SAMPLED scan mode displays information for only data pages of a heap and leaf-level pages of an index. The DETAILED mode scans all pages and returns all statistics for all data and index levels.

Table 2. dm_db_index_physical_stats Result Columns
Column Name	Data Type	Description	Displayed in LIMITED Scan Mode
database_id	Smallint	A database ID database containing the table or view.	Yes
object_id	int	The object ID of the table or view where the index is located.	Yes
index_id	int	The index ID of the index. 0 indicates a heap.	Yes
partition_number	int	A partition number within the owning table, view, or index.	Yes
index_type_desc	nvarchar(60)	The index type. Values are HEAP, CLUSTERED INDEX, NONCLUSTERED INDEX, PRIMARY XML INDEX, and XML INDEX.	Yes
alloc_unit_type_desc	nvarchar(60)	A description of the allocation unit type. Values are IN_ROW_DATA, LOB_DATA, and ROW_OVERFLOW_DATA.	Yes
index_depth	tinyint	The number of index levels.	Yes
index_level	tinyint	The current level of the index. 0 indicates index leaf levels, heaps, and LOB_DATA or ROW_OVERFLOW_DATA allocation units.	Yes
avg_fragmentation_in_percent	float	The percentage of logical fragmentation (out-of-order pages in the index).	Yes
fragment_count	bigint	The number of fragments (physically consecutive leaf pages) in the index.	Yes
avg_fragment_size_in_pages	float	The average number of pages in one fragment in an index.	Yes
page_count	bigint	The total number of index or data pages at the current level.	Yes
avg_page_space_used_in_percent	Float	The average percentage of available data storage space used in all pages.	No
record_count	Bigint	The total number of records at the current level.	No
ghost_record_count	Bigint	The number of ghost records ready for removal by the ghost cleanup task.	No
version_ghost_record_count	Bigint	The number of ghost records retained by an outstanding snapshot isolation transaction in an allocation unit.	No
min_record_size_in_bytes	Int	The minimum record size, in bytes.	No
max_record_size_in_bytes	Int	The maximum record size, in bytes.	No
avg_record_size_in_bytes	Float	The average record size, in bytes.	No
forwarded_record_count	Bigint	The number of forwarded records in a heap.	No
compressed_page_count_	Bigint	The number of compressed pages in a heap.	No

Tip

The scan modes get progressively slower from LIMITED to DETAILED because more work is performed in each mode. To quickly gauge the size or fragmentation level of a table or an index, first use the LIMITED mode. It is the fastest and does not return a row for each nonleaf level in the IN_ROW_DATA allocation unit of the index.

Table 2 describes the result columns returned by the dm_db_index_physical_stats table-valued function.

Listing 1 shows examples of running sys.dm_db_index_physical_stats on the sales_big table, using both LIMITED and DETAILED scan modes.

Listing 1. sys.dm_db_index_physical_stats Examples

Code View: Scroll / Show All

use bigpubs2008
go
select str(index_id,3,0) as indid,
      left(index_type_desc, 20) as index_type_desc,
      index_depth as idx_depth,
      index_level as idx_level,
      str(avg_fragmentation_in_percent, 5,2) as avg_frgmnt_pct,
      str(page_count, 10,0) as pg_cnt
  FROM sys.dm_db_index_physical_stats
      (db_id(), object_id('sales_big'),null, 0, 'LIMITED')

select str(index_id,3,0) as indid,
      left(index_type_desc, 20) as index_type_desc,
      index_depth as idx_depth,
      index_level as idx_level,
      str(avg_fragmentation_in_percent, 5,2) as avg_frgmnt_pct,
      str(page_count, 10,0) as pg_cnt
  FROM sys.dm_db_index_physical_stats
    (db_id(), object_id('sales_big'),null, 0, 'DETAILED')
go

indid index_type_desc      idx_depth idx_level avg_frgmnt_pct pg_cnt
----- -------------------- --------- --------- -------------- ----------
  1   CLUSTERED INDEX      3         0         63.42               14519
  2   NONCLUSTERED INDEX   3         0         14.90                4571

indid index_type_desc      idx_depth idx_level avg_frgmnt_pct pg_cnt
----- -------------------- --------- --------- -------------- ----------
  1   CLUSTERED INDEX      3         0         63.42               14519
  1   CLUSTERED INDEX      3         1         92.11                  38
  1   CLUSTERED INDEX      3         2          0.00                   1
  2   NONCLUSTERED INDEX   3         0         14.90                4571
  2   NONCLUSTERED INDEX   3         1         87.50                   8
  2   NONCLUSTERED INDEX   3         2          0.00                   1

Again, you can see from the output in Listing 34.6 that the logical fragmentation (avg_frgmnt_pct) is 63.42% for the leaf level of the clustered index (idx_level = 0). This indicates that nearly two thirds of the data pages are out of sequence in relation to ordering of the clustered key values. If you want to improve the performance of table scans or clustered index scans for the sales_big table, you need to decide whether to rebuild the index or simply defragment the index.

The degree of fragmentation helps you decide which defragmentation method to use. A rough guideline to use to help decide is to examine the avg_fragmentation_in_percent value returned by the sys.dm_db_index_physical_stats function. If the avg_fragmentation_in_percent value is greater than 5% but less than 30%, you should reorganize the index. If the avg_fragmentation_in_percent value is greater than 30%, you should consider rebuilding the index. If you also have a dedicated maintenance window large enough to perform a rebuild instead of simply reorganizing the index, you may as well run a rebuild because it performs a more thorough defragmentation than reorganizing the index.

Another factor in determining whether an index needs to be defragmented is how the data is accessed. If your applications are performing primarily single-row lookups, randomly accessing individual rows of data, the internal or external fragmentation is not a factor when it comes to query performance. Accessing one row from a fragmented table is just as easy as from an unfragmented table. However, if your applications are performing ordered range scan operations and reading all or large numbers of the pages in a table, excessive fragmentation can greatly slow down the scan. The more contiguous and full the pages, the better the performance will be of the scanning operations.

Tip

If you have very low levels of fragmentation (less than 5%), it is recommended that you not bother with either a reorganization or a rebuild because the benefit of removing such a small amount of fragmentation is not enough to justify the cost of reorganizing or rebuilding the index.

In SQL Server 2008, the ALTER INDEX command provides options for defragmenting an index. Following is the syntax for the ALTER INDEX command:

ALTER INDEX { index_name | ALL }
    ON  [ database_name. [ schema_name ] . | schema_name. ]
        table_or_view_name
    { REBUILD
        [ [PARTITION = ALL]
                    [ WITH ( <rebuild_index_option> [ ,...n ] ) ]
          | [ PARTITION = partition_number
                [ WITH ( <single_partition_rebuild_index_option>
                        [ ,...n ] )
                ]
            ]
        ]
    | DISABLE
    | REORGANIZE
        [ PARTITION = partition_number ]
        [ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
  | SET ( <set_index_option> [ ,...n ] )
    }
[ ; ]

<rebuild_index_option > ::=
{
    PAD_INDEX = { ON | OFF }
  | FILLFACTOR = fillfactor
  | SORT_IN_TEMPDB = { ON | OFF }
  | IGNORE_DUP_KEY = { ON | OFF }
  | STATISTICS_NORECOMPUTE = { ON | OFF }
  | ONLINE = { ON | OFF }
  | ALLOW_ROW_LOCKS = { ON | OFF }
  | ALLOW_PAGE_LOCKS = { ON | OFF }
  | MAXDOP = max_degree_of_parallelism
  | DATA_COMPRESSION = { NONE | ROW | PAGE }
     [ ON PARTITIONS ( { <partition_number_expression> | <range> }
     [ , ...n ] ) ]
}
<range> ::=
<partition_number_expression> TO <partition_number_expression>
}

<single_partition_rebuild_index_option> ::=
{
    SORT_IN_TEMPDB = { ON | OFF }
  | MAXDOP = max_degree_of_parallelism
  | DATA_COMPRESSION = { NONE | ROW | PAGE } }
}

<set_index_option>::=
{
    ALLOW_ROW_LOCKS = { ON | OFF }
  | ALLOW_PAGE_LOCKS = { ON | OFF }
  | IGNORE_DUP_KEY = { ON | OFF }
  | STATISTICS_NORECOMPUTE = { ON | OFF }
}

The REORGANIZE option is always performed online, regardless of which edition of SQL Server 2008 you are running, allowing for other users to continue to update and query the underlying data in the table while the REORGANIZE process is running. The REBUILD option can be executed online only if you are running SQL Server 2008 Enterprise or Developer Editions. In all other editions of SQL Server 2008, the REBUILD option is executed offline. When it is executed offline, SQL Server acquires exclusive locks on the underlying data and associated indexes so any data modifications to the table are blocked until the rebuild completes.

Reorganizing an index uses minimal system resources to defragment only the leaf level of clustered and nonclustered indexes of tables and views. The first phase of the reorganization process compacts the rows on the leaf pages, reapplying the current fill factor value to reduce the internal fragmentation. To view the current fill factor setting, you can run a query such as the following against the sys.indexes system catalog view:

select cast(name as varchar(30)) as name, index_id, fill_factor
   from sys.indexes
   where object_id = object_id('sales_big')
go
name                           index_id    fill_factor
---------------------------- ----------- -----------
ci_sales_big                   1           0
idx1                           2           0

The second phase of the reorganization process involves the rearranging of the leaf-level pages so that the logical and physical order of the pages match, thereby reducing the external fragmentation of the leaf level of the index. SQL Server 2008 runs a REORGANIZATION of an index online because the second phase processes only two pages at a time, in an operation similar to a bubble sort. When defragmenting the index, SQL Server 2008 determines the first physical page belonging to the leaf level and the first logical page in the leaf level, and it swaps the data on those two pages. It then identifies the next logical and physical page and swaps them, and so on, until no more swaps need to be made. At this point, the logical page ordering matches the physical page ordering. While swapping the logical and physical pages, SQL Server uses an additional new page as a temporary storage area. After each page swap, SQL Server releases all locks and latches and saves the key of the last moved page.

The following example uses ALTER TABLE to reorganize the clustered index on the sales_big table:

ALTER INDEX ci_sales_big on sales_big REORGANIZE

After running this command, you can run a query similar to the query in Listing 34.6 to display the fragmentation of the ci_sales_big index on the sales_big table:

select str(s.index_id,3,0) as indid,
      left(i.name, 20) as index_name,
      left(index_type_desc, 20) as index_type_desc,
      index_depth as idx_depth,
      index_level as level,
      str(avg_fragmentation_in_percent, 5,2) as avg_frgmnt_pct,
      str(page_count, 10,0) as pg_cnt
  FROM sys.dm_db_index_physical_stats
      (db_id('bigpubs2008'), object_id('sales_big'),1, 0, 'DETAILED') s
    join sys.indexes i on s.object_id = i.object_id and s.index_id = i.index_id
go

indid index_name   index_type_desc idx_depth level avg_frgmnt_pct pg_cnt
----- ------------ ------------- --------- ----- -------------- ------
  1   ci_sales_big CLUSTERED INDEX 3         0      0.32           14519
  1   ci_sales_big CLUSTERED INDEX 3         1     92.11              38
  1   ci_sales_big CLUSTERED INDEX 3         2      0.00               1

As you can see, the average fragmentation percentage is down to .32% from 63.42%, indicating that the index is now mostly defragmented. However, the average fragmentation percentage of the intermediate level of the index (level = 1) is still 92.11%, indicating that it is heavily fragmented. To defragment the nonleaf levels of the index, you need to rebuild the index. The following example shows how to rebuild the index using the ALTER INDEX command:

ALTER INDEX ci_sales_big on sales_big REBUILD

After running this command, you can again run a query similar to the query in Listing 34.6 to display the fragmentation of the ci_sales_big index on the sales_big table:

select str(s.index_id,3,0) as indid,
      left(i.name, 20) as index_name,
      left(index_type_desc, 20) as index_type_desc,
      index_depth as idx_depth,
      index_level as level,
      str(avg_fragmentation_in_percent, 5,2) as avg_frgmnt_pct,
      str(page_count, 10,0) as pg_cnt
  FROM sys.dm_db_index_physical_stats
      (db_id('bigpubs2008'), object_id('sales_big'),1, 0, 'DETAILED') s
    join sys.indexes i on s.object_id = i.object_id and s.index_id = i.index_id
go

indid index_name   index_type_desc idx_depth level avg_frgmnt_pct pg_cnt
----- ------------ ------------- --------- ----- -------------- --------
  1   ci_sales_big CLUSTERED INDEX 3         0      0.01           14520
  1   ci_sales_big CLUSTERED INDEX 3         1      5.26              38
  1   ci_sales_big CLUSTERED INDEX 3         2      0.00               1

You can see from these results that the REBUILD option performs a more thorough defragmentation of the ci_sales_big index than REORGANIZE. The average fragmentation percentage of both the leaf and intermediate levels is significantly less.

Note

When you rebuild a nonclustered index, the rebuild operation requires enough temporary disk space to store both the old and new indexes. However, if the index is disabled before being rebuilt, the disk space made available by disabling the index can be reused by the subsequent rebuild or any other operation. No additional space is required except for temporary disk space for sorting, which is typically only about 20% of the index size.

Therefore, if disk space is limited, it may be helpful to disable a nonclustered index before rebuilding it.

One of the other options to the CREATE INDEX and ALTER INDEX commands is the FILLFACTOR option. The fill factor allows you to specify, as a percentage, the fullness of the pages at the data and leaf index page levels, essentially deciding how much free space to create in the index and data pages to make room for new rows and avoid page splits.

Others

- SQL Server 2008 R2 : Index Statistics (part 2) - Estimating Rows Using Index Statistics, Generating and Maintaining Index and Column Statistics

- SQL Server 2008 R2 : Index Statistics (part 1) - The Statistics Histogram, How the Statistics Histogram Is Used, Index Densities

- Keeping Windows 7 and Other Software Up to Date : Installing and Removing Software (part 2)

- Keeping Windows 7 and Other Software Up to Date : Installing and Removing Software (part 1)

- Keeping Windows 7 and Other Software Up to Date : Service Packs

- SharePoint 2010 : Performance Monitoring - Implementing Visual Round Trip Analyzer

- SharePoint 2010 : Performance Monitoring - What and how to monitor with Performance Monitor

- SharePoint 2010 : Performance Monitoring - Using SQL Profiler

- SharePoint 2010 : Performance Monitoring - Enabling HTTP Request Monitoring and Throttling

- SharePoint 2010 : Monitoring and Reporting - Enabling the Developer Dashboard

SQL Server 2008 R2 : SQL Server Index Maintenance (part 1)

Table 1. dm_db_index_physical_stats Parameters

Table 2. dm_db_index_physical_stats Result Columns

Listing 1. sys.dm_db_index_physical_stats Examples