SQL Server indexes are self-maintaining, which means
that any time a data modification (such as an update, a delete, or an
insert) takes place on a table, the index B-tree is automatically
updated to reflect the correct data values and current rows. Generally,
you do not have to do any maintenance of the indexes, but indexes and
tables can become fragmented over time. There are two types of
fragmentation: external fragmentation and internal fragmentation.
External fragmentation occurs when the logical order
of pages does not match the physical order or the extents allocated to
the table are not contiguous. These situations occur typically with
clustered tables as a result of page splits and pages being allocated
and linked into the page chain from other extents. External
fragmentation is usually not much of an issue for most queries
performing small result set retrievals via an index. It’s more of a
performance issue for ordered scans of all or part of a table or index.
If the table is heavily fragmented and the pages are not contiguous,
scanning the page chain is more expensive.
Internal fragmentation occurs when an index is not
using up all the space within the pages in the table or index.
Fragmentation within an index page can happen for the following reasons:
As more records are added to a table, space
is used on the data page and on the index page. As a result, the page
eventually becomes completely full. If another insert takes place on
that page and there is no more room for the new row, SQL Server splits
the page into two, each page now being about 50% full. If the clustered
key values being inserted are not evenly distributed throughout the
table (as often happens with clustered indexes on sequential keys), this
extra free space might not be used.
Frequent update
statements can cause fragmentation in the database at the data and
index page level because the updates cause rows to move to other pages.
Again, if future clustered key values inserted into the table are not
evenly distributed throughout the table, the empty slots left behind
might not be used.
As rows are deleted,
space becomes freed up on data and index pages. If no new rows within
the range of deleted values on the page are inserted, the page remains
sparse.
Note
Internal fragmentation is not always a bad thing.
Although pages that are not completely full use up more space and
require more I/O during retrieval, free space within a page allows for
rows to be added without having to perform an expensive page split. For
some environments where the activity is more insert intensive than query
intensive, you might want more free space in pages. This can be
accomplished by applying the fill factor when creating the index on the
table. Applying the fill factor is described in more detail in the next
section.
Usually, in a system, all these factors contribute to
the fragmentation of data within the data pages and index pages. In an
environment subject to a lot of data modification, you might see a lot
of fragmentation on the data and index pages over a period of time.
These sparse and fragmented pages remain allocated to the table or index
even if they have only a single row or two, and the extent containing
the page remains allocated to the table or index.
Data fragmentation can adversely affect performance
for table or index scanning operations because the data is spread across
more pages than necessary. More I/Os are required to retrieve the data.
SQL Server provides a dynamic management view, sys.dm_db_index_physical_stats,
which is a multistatement table-valued function that returns size and
fragmentation information for the data and indexes of a specified table
or view. The results from the function are returned by a normal SELECT statement and thus can be saved to a table for reporting purposes and historical analysis. The syntax of dm_db_index_physical_stats is as follows:
sys.dm_db_index_physical_stats (
{ database_id | NULL | 0 | DEFAULT }
, { object_id | NULL | 0 | DEFAULT }
, { index_id | NULL | 0 | -1 | DEFAULT }
, { partition_number | NULL | 0 | DEFAULT }
, { mode | NULL | DEFAULT } )
The parameters for dm_db_index_physical_stats are summarized in Table 1.
Table 1. dm_db_index_physical_stats Parameters
Parameter | Description |
---|
database_id | The ID of the database. The default is 0, which returns information for all databases. NULL, 0, and DEFAULT are equivalent values in this context. If you specify NULL or 0, for database_id, you must specify NULL for object_id, index_id, and partition_number. |
object_id | The object ID of the table or view the index is on. Valid inputs are the ID number of a table or view, NULL, 0, or DEFAULT. The default is 0, which returns information for all tables and views in the specified database. NULL, 0, and DEFAULT are equivalent values in this context. |
index_id | The ID of the index. Valid inputs are the ID number of an index, 0 if object_id is a heap, NULL, -1, or DEFAULT. The default is -1, which returns information for all indexes for a table or view. NULL, -1, and DEFAULT are equivalent values in this context. If you specify NULL for index_id, you must also specify NULL for partition_number. |
partition_number | The partition number in the object. Valid inputs are the partition_number of an index or a heap, NULL, 0, or DEFAULT. The default is 0, which returns information for all partitions of the object. NULL, 0, and DEFAULT are equivalent values in this context. Use a partition_number of 1 for a nonpartitioned index or heap. |
mode | The scan level used to obtain physical index statistics. Valid inputs are DEFAULT, NULL, LIMITED, SAMPLED, or DETAILED. The default mode is LIMITED. NULL and DEFAULT are equivalent values in this context. |
The sys.dm_db_index_physical_stats function
requires only an Intent-Shared table lock, regardless of the mode in
which it runs. This provides for the capability to run the sys.dm_db_index_physical_stats function online without blocking update activity on a table.
The scan-level mode determines the level of scanning
performed by the function to obtain the physical statistics for the
index. The LIMITED mode is the fastest and scans the smallest
number of pages. It scans all data pages for a heap but scans only
leaf-level pages for an index. It also returns only a subset of the data
columns, as shown in Table 2. The SAMPLED
mode returns statistics based on a 1% sample of all the pages in the
index or heap. If the index or heap has fewer than 10,000 pages, DETAILED mode is used instead of SAMPLED. The SAMPLED scan mode displays information for only data pages of a heap and leaf-level pages of an index. The DETAILED mode scans all pages and returns all statistics for all data and index levels.
Table 2. dm_db_index_physical_stats Result Columns
Column Name | Data Type | Description | Displayed in LIMITED Scan Mode |
---|
database_id | Smallint | A database ID database containing the table or view. | Yes |
object_id | int | The object ID of the table or view where the index is located. | Yes |
index_id | int | The index ID of the index. 0 indicates a heap. | Yes |
partition_number | int | A partition number within the owning table, view, or index. | Yes |
index_type_desc | nvarchar(60) | The index type. Values are HEAP, CLUSTERED INDEX, NONCLUSTERED INDEX, PRIMARY XML INDEX, and XML INDEX. | Yes |
alloc_unit_type_desc | nvarchar(60) | A description of the allocation unit type. Values are IN_ROW_DATA, LOB_DATA, and ROW_OVERFLOW_DATA. | Yes |
index_depth | tinyint | The number of index levels. | Yes |
index_level | tinyint | The current level of the index. 0 indicates index leaf levels, heaps, and LOB_DATA or ROW_OVERFLOW_DATA allocation units. | Yes |
avg_fragmentation_in_percent | float | The percentage of logical fragmentation (out-of-order pages in the index). | Yes |
fragment_count | bigint | The number of fragments (physically consecutive leaf pages) in the index. | Yes |
avg_fragment_size_in_pages | float | The average number of pages in one fragment in an index. | Yes |
page_count | bigint | The total number of index or data pages at the current level. | Yes |
avg_page_space_used_in_percent | Float | The average percentage of available data storage space used in all pages. | No |
record_count | Bigint | The total number of records at the current level. | No |
ghost_record_count | Bigint | The number of ghost records ready for removal by the ghost cleanup task. | No |
version_ghost_record_count | Bigint | The number of ghost records retained by an outstanding snapshot isolation transaction in an allocation unit. | No |
min_record_size_in_bytes | Int | The minimum record size, in bytes. | No |
max_record_size_in_bytes | Int | The maximum record size, in bytes. | No |
avg_record_size_in_bytes | Float | The average record size, in bytes. | No |
forwarded_record_count | Bigint | The number of forwarded records in a heap. | No |
compressed_page_count_ | Bigint | The number of compressed pages in a heap. | No |
Tip
The scan modes get progressively slower from LIMITED to DETAILED
because more work is performed in each mode. To quickly gauge the size
or fragmentation level of a table or an index, first use the LIMITED mode. It is the fastest and does not return a row for each nonleaf level in the IN_ROW_DATA allocation unit of the index.
Table 2 describes the result columns returned by the dm_db_index_physical_stats table-valued function.
Listing 1 shows examples of running sys.dm_db_index_physical_stats on the sales_big table, using both LIMITED and DETAILED scan modes.
Listing 1. sys.dm_db_index_physical_stats Examples
use bigpubs2008
go
select str(index_id,3,0) as indid,
left(index_type_desc, 20) as index_type_desc,
index_depth as idx_depth,
index_level as idx_level,
str(avg_fragmentation_in_percent, 5,2) as avg_frgmnt_pct,
str(page_count, 10,0) as pg_cnt
FROM sys.dm_db_index_physical_stats
(db_id(), object_id('sales_big'),null, 0, 'LIMITED')
select str(index_id,3,0) as indid,
left(index_type_desc, 20) as index_type_desc,
index_depth as idx_depth,
index_level as idx_level,
str(avg_fragmentation_in_percent, 5,2) as avg_frgmnt_pct,
str(page_count, 10,0) as pg_cnt
FROM sys.dm_db_index_physical_stats
(db_id(), object_id('sales_big'),null, 0, 'DETAILED')
go
indid index_type_desc idx_depth idx_level avg_frgmnt_pct pg_cnt
----- -------------------- --------- --------- -------------- ----------
1 CLUSTERED INDEX 3 0 63.42 14519
2 NONCLUSTERED INDEX 3 0 14.90 4571
indid index_type_desc idx_depth idx_level avg_frgmnt_pct pg_cnt
----- -------------------- --------- --------- -------------- ----------
1 CLUSTERED INDEX 3 0 63.42 14519
1 CLUSTERED INDEX 3 1 92.11 38
1 CLUSTERED INDEX 3 2 0.00 1
2 NONCLUSTERED INDEX 3 0 14.90 4571
2 NONCLUSTERED INDEX 3 1 87.50 8
2 NONCLUSTERED INDEX 3 2 0.00 1
|
Again, you can see from the output in Listing 34.6 that the logical fragmentation (avg_frgmnt_pct) is 63.42% for the leaf level of the clustered index (idx_level = 0). This indicates that nearly two thirds of the data pages are out of sequence in relation to ordering of the clustered key values. If you want to improve the performance of table scans or clustered index scans for the sales_big table, you need to decide whether to rebuild the index or simply defragment the index.
The degree of fragmentation helps you decide which
defragmentation method to use. A rough guideline to use to help decide
is to examine the avg_fragmentation_in_percent value returned by the sys.dm_db_index_physical_stats function. If the avg_fragmentation_in_percent value is greater than 5% but less than 30%, you should reorganize the index. If the avg_fragmentation_in_percent
value is greater than 30%, you should consider rebuilding the index. If
you also have a dedicated maintenance window large enough to perform a
rebuild instead of simply reorganizing the index, you may as well run a
rebuild because it performs a more thorough defragmentation than
reorganizing the index.
Another factor in determining whether an index needs
to be defragmented is how the data is accessed. If your applications are
performing primarily single-row lookups, randomly accessing individual
rows of data, the internal or external fragmentation is not a factor
when it comes to query performance. Accessing one row from a fragmented
table is just as easy as from an unfragmented table. However, if your
applications are performing ordered range scan operations and reading
all or large numbers of the pages in a table, excessive fragmentation
can greatly slow down the scan. The more contiguous and full the pages,
the better the performance will be of the scanning operations.
Tip
If you have very low levels of fragmentation (less
than 5%), it is recommended that you not bother with either a
reorganization or a rebuild because the benefit of removing such a small
amount of fragmentation is not enough to justify the cost of
reorganizing or rebuilding the index.
In SQL Server 2008, the ALTER INDEX command provides options for defragmenting an index. Following is the syntax for the ALTER INDEX command:
ALTER INDEX { index_name | ALL }
ON [ database_name. [ schema_name ] . | schema_name. ]
table_or_view_name
{ REBUILD
[ [PARTITION = ALL]
[ WITH ( <rebuild_index_option> [ ,...n ] ) ]
| [ PARTITION = partition_number
[ WITH ( <single_partition_rebuild_index_option>
[ ,...n ] )
]
]
]
| DISABLE
| REORGANIZE
[ PARTITION = partition_number ]
[ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
| SET ( <set_index_option> [ ,...n ] )
}
[ ; ]
<rebuild_index_option > ::=
{
PAD_INDEX = { ON | OFF }
| FILLFACTOR = fillfactor
| SORT_IN_TEMPDB = { ON | OFF }
| IGNORE_DUP_KEY = { ON | OFF }
| STATISTICS_NORECOMPUTE = { ON | OFF }
| ONLINE = { ON | OFF }
| ALLOW_ROW_LOCKS = { ON | OFF }
| ALLOW_PAGE_LOCKS = { ON | OFF }
| MAXDOP = max_degree_of_parallelism
| DATA_COMPRESSION = { NONE | ROW | PAGE }
[ ON PARTITIONS ( { <partition_number_expression> | <range> }
[ , ...n ] ) ]
}
<range> ::=
<partition_number_expression> TO <partition_number_expression>
}
<single_partition_rebuild_index_option> ::=
{
SORT_IN_TEMPDB = { ON | OFF }
| MAXDOP = max_degree_of_parallelism
| DATA_COMPRESSION = { NONE | ROW | PAGE } }
}
<set_index_option>::=
{
ALLOW_ROW_LOCKS = { ON | OFF }
| ALLOW_PAGE_LOCKS = { ON | OFF }
| IGNORE_DUP_KEY = { ON | OFF }
| STATISTICS_NORECOMPUTE = { ON | OFF }
}
The REORGANIZE option is always performed
online, regardless of which edition of SQL Server 2008 you are running,
allowing for other users to continue to update and query the underlying
data in the table while the REORGANIZE process is running. The REBUILD
option can be executed online only if you are running SQL Server 2008
Enterprise or Developer Editions. In all other editions of SQL Server
2008, the REBUILD option is executed offline. When it is
executed offline, SQL Server acquires exclusive locks on the underlying
data and associated indexes so any data modifications to the table are
blocked until the rebuild completes.
Reorganizing an index uses minimal system resources
to defragment only the leaf level of clustered and nonclustered indexes
of tables and views. The first phase of the reorganization process
compacts the rows on the leaf pages, reapplying the current fill factor
value to reduce the internal fragmentation. To view the current fill
factor setting, you can run a query such as the following against the sys.indexes system catalog view:
select cast(name as varchar(30)) as name, index_id, fill_factor
from sys.indexes
where object_id = object_id('sales_big')
go
name index_id fill_factor
---------------------------- ----------- -----------
ci_sales_big 1 0
idx1 2 0
The second phase of the reorganization process
involves the rearranging of the leaf-level pages so that the logical and
physical order of the pages match, thereby reducing the external
fragmentation of the leaf level of the index. SQL Server 2008 runs a REORGANIZATION
of an index online because the second phase processes only two pages at
a time, in an operation similar to a bubble sort. When defragmenting
the index, SQL Server 2008 determines the first physical page belonging
to the leaf level and the first logical page in the leaf level, and it
swaps the data on those two pages. It then identifies the next logical
and physical page and swaps them, and so on, until no more swaps need to
be made. At this point, the logical page ordering matches the physical
page ordering. While swapping the logical and physical pages, SQL Server
uses an additional new page as a temporary storage area. After each
page swap, SQL Server releases all locks and latches and saves the key
of the last moved page.
The following example uses ALTER TABLE to reorganize the clustered index on the sales_big table:
ALTER INDEX ci_sales_big on sales_big REORGANIZE
After running this command, you can run a query similar to the query in Listing 34.6 to display the fragmentation of the ci_sales_big index on the sales_big table:
select str(s.index_id,3,0) as indid,
left(i.name, 20) as index_name,
left(index_type_desc, 20) as index_type_desc,
index_depth as idx_depth,
index_level as level,
str(avg_fragmentation_in_percent, 5,2) as avg_frgmnt_pct,
str(page_count, 10,0) as pg_cnt
FROM sys.dm_db_index_physical_stats
(db_id('bigpubs2008'), object_id('sales_big'),1, 0, 'DETAILED') s
join sys.indexes i on s.object_id = i.object_id and s.index_id = i.index_id
go
indid index_name index_type_desc idx_depth level avg_frgmnt_pct pg_cnt
----- ------------ ------------- --------- ----- -------------- ------
1 ci_sales_big CLUSTERED INDEX 3 0 0.32 14519
1 ci_sales_big CLUSTERED INDEX 3 1 92.11 38
1 ci_sales_big CLUSTERED INDEX 3 2 0.00 1
As
you can see, the average fragmentation percentage is down to .32% from
63.42%, indicating that the index is now mostly defragmented. However,
the average fragmentation percentage of the intermediate level of the
index (level = 1) is still 92.11%, indicating that it
is heavily fragmented. To defragment the nonleaf levels of the index,
you need to rebuild the index. The following example shows how to
rebuild the index using the ALTER INDEX command:
ALTER INDEX ci_sales_big on sales_big REBUILD
After running this command, you can again run a query similar to the query in Listing 34.6 to display the fragmentation of the ci_sales_big index on the sales_big table:
select str(s.index_id,3,0) as indid,
left(i.name, 20) as index_name,
left(index_type_desc, 20) as index_type_desc,
index_depth as idx_depth,
index_level as level,
str(avg_fragmentation_in_percent, 5,2) as avg_frgmnt_pct,
str(page_count, 10,0) as pg_cnt
FROM sys.dm_db_index_physical_stats
(db_id('bigpubs2008'), object_id('sales_big'),1, 0, 'DETAILED') s
join sys.indexes i on s.object_id = i.object_id and s.index_id = i.index_id
go
indid index_name index_type_desc idx_depth level avg_frgmnt_pct pg_cnt
----- ------------ ------------- --------- ----- -------------- --------
1 ci_sales_big CLUSTERED INDEX 3 0 0.01 14520
1 ci_sales_big CLUSTERED INDEX 3 1 5.26 38
1 ci_sales_big CLUSTERED INDEX 3 2 0.00 1
You can see from these results that the REBUILD option performs a more thorough defragmentation of the ci_sales_big index than REORGANIZE. The average fragmentation percentage of both the leaf and intermediate levels is significantly less.
Note
When
you rebuild a nonclustered index, the rebuild operation requires enough
temporary disk space to store both the old and new indexes. However, if
the index is disabled before being rebuilt, the disk space made
available by disabling the index can be reused by the subsequent rebuild
or any other operation. No additional space is required except for
temporary disk space for sorting, which is typically only about 20% of
the index size.
Therefore, if disk space is limited, it may be
helpful to disable a nonclustered index before rebuilding it.
One of the other options to the CREATE INDEX and ALTER INDEX commands is the FILLFACTOR
option. The fill factor allows you to specify, as a percentage, the
fullness of the pages at the data and leaf index page levels,
essentially deciding how much free space to create in the index and data
pages to make room for new rows and avoid page splits.