As mentioned earlier, the selectivity of a key is an
important factor that determines whether an index will be used to
retrieve the data rows that satisfy a query. SQL Server stores the
selectivity and a histogram of sample values of the key; based on the
statistics stored for the key columns for the index and the SARGs
specified for the query, the Query Optimizer decides which index to use.
To see the statistical information stored for an index, use the DBCC SHOW_STATISTICS command, which returns the following pieces of information:
A histogram that contains an even sampling of
the values for the first column in the index key. SQL Server stores up
to 200 sample values in the histogram.
Index
densities for the combination of columns in the index. Index density
indicates the uniqueness of the index key(s) and is discussed later in
this section.
The number of rows in the table at the time the statistics were computed.
The number of rows sampled to generate the statistics.
The number of sample values (steps) stored in the histogram.
The average key length.
Whether the index is defined on a string column.
The date and time the statistics were generated.
The syntax for DBCC SHOW_STATISTICS is as follows:
DBCC SHOW_STATISTICS (tablename, index)
Listing 1 displays the abbreviated output from DBCC SHOW_STATISTICS, showing the statistical information for the aunmind nonclustered index on the au_lname and au_fname columns of the authors table.
Listing 1. DBCC SHOW_STATISTICS Output for the aunmind Index on the authors Table
dbcc show_statistics (authors, aunmind )
go
Name Updated Rows Rows Sampled Steps Density
Average key length String Index Filter Expression Unfiltered Rows
---------- -------------------- ---- --------------- ------ --------
--------------------------------------------------- ---------------
aunmind Mar 14 2010 10:20PM 172 172 148 1
24.06977 YES NULL 172
(1 row(s) affected)
All density Average Length Columns
------------- -------------- -------------------------
0.00625 6.406977 au_lname
0.005813953 13.06977 au_lname, au_fname
0.005813953 24.06977 au_lname, au_fname, au_id
(3 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
----------------------------------------------------------------------
Ahlberg 0 2 0 1
Alexander 0 1 0 1
Amis 0 1 0 1
Arendt 0 1 0 1
Arnosky 0 1 0 1
Bate 0 1 0 1
Bauer 0 1 0 1
Benchley 0 1 0 1
Bennet 0 1 0 1
Blotchet-Halls 0 1 0 1
...
del Castillo 0 1 0 1
Dillard 0 1 0 1
Doctorow 0 1 0 1
Doyle 0 1 0 1
Durrenmatt 2 1 2 1
Eastman 0 1 0 1
...
Gringlesby 0 1 0 1
Grisham 0 1 0 1
Gunning 0 1 0 1
Hill 0 1 0 1
Hutchins 3 2 3 1
Ionesco 0 1 0 1
Ishiguro 0 1 0 1
...
Tyler 0 1 0 1
Van Allsburg 0 1 0 1
Van der 0 1 0 1
Van der Meer 0 1 0 1
von Goethe 0 1 0 1
Walker 0 1 0 1
Warner 0 1 0 1
White 0 2 0 1
Wilder 0 1 0 1
Williams 0 2 0 1
Wilson 0 1 0 1
Yates 0 1 0 1
Yokomoto 0 1 0 1
Young 0 1 0 1
|
Looking
at the output, you can determine that the statistics were last updated
on March 14, 2010. At the time the statistics were generated, the table
had 172 rows, and all 172 rows were sampled to generate the statistics
(no filtering was applied). The average key length is 24.06977 bytes.
From the All density information, you can see that this index
is highly selective. (A low density means high selectivity; index
densities are covered shortly.)
After the general information and the index densities, the index histogram is displayed.
1. The Statistics Histogram
Up to 200 sample values can be stored in the statistics histogram. Each sample value is called a step. The sample value stored in each step is the endpoint of a range of values. Three values are stored for each step:
- RANGE_ROWS—This
indicates how many other rows are inside the range between the current
step and the step prior, not including the step values themselves.
- EQ_ROWS—This
is the number of rows that have the same value as the sample value. In
other words, it is the number of duplicate values for the step.
- Range density—This
indicates the number of distinct values within the range. The range
density information is actually displayed in two separate columns, DISTINCT_RANGE_ROWS and AVG_RANGE_ROWS:
DISTINCT_RANGE_ROWS is the number of distinct values between the current step and the step prior, not including the step values itself.
AVG_RANGE_ROWS is the average number of rows per distinct value within the range of the step.
In the output in Listing 34.4,
distinct key values in the first column of the index are stored as the
sample values in the histogram. Because most of the values for au_lname are unique, most of the range values are 0. You can see that there is a duplicate in the index key for the last name of Hutchins (EQ_ROWS is 2). For comparison purposes, Listing 2 shows a snippet of the DBCC SHOW_STATISTICS output for the titleidind index on the sales table in bigpubs2008.
Listing 2. DBCC SHOW_STATISTICS Output for the titleidind Index on the sales Table in the bigpubs2008 Database
dbcc show_statistics (sales, 'titleidind')
go
Name Updated Rows Rows Sampled Steps Density
Average key length String Index Filter Expression Unfiltered Rows
----------- ------------------- ------- ------------ ------ -------------
------------------ ------------ ----------------- ---------------
titleidind Mar 14 2010 10:39PM 168725 152432 188 0.003537365
26.40519 YES NULL 168725
All density Average Length Columns
------------- -------------- --------------------------------------------------
0.001858736 6 title_id
5.99844E-06 10 title_id, stor_id
5.926804E-06 26.4007 title_id, stor_id, ord_num
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ---------- -------- -------------------- --------------
BI0194 0 274.8199 0 1
BI2184 639.6047 312.9337 2 277.1448
BI2915 893.1208 271.811 3 261.8779
BI3976 637.2789 260.778 2 276.137
BI8448 1685.068 281.8409 6 300.0652
BU1111 616.3464 276.8259 2 267.0668
BU7832 357.0157 299.8948 1 296.2236
CH0249 1067.558 279.8349 3 313.0259
CH0639 1019.879 284.8499 3 299.0454
CH0671 316.3136 259.7751 1 262.4521
CH0847 1333.867 266.796 5 295.557
CH1260 1069.884 287.8589 3 313.7079
CH1380 612.8576 311.9307 2 265.5551
CH1568 645.4193 286.8559 2 279.6643
CH1692 974.525 275.8229 3 285.7469
CH2080 329.1057 285.8529 1 273.066
CH2240 715.1943 273.817 2 309.8983
CH2256 352.364 310.9277 1 292.364
CH2360 630.3014 293.8768 2 273.1136
CH2480 626.8126 311.9307 2 271.6019
CH2574 679.1439 279.8349 2 294.2774
CH2610 334.9203 280.8379 1 277.8905
CH2706 343.0607 300.8978 1 284.6448
CH2856 326.7799 287.8589 1 271.1362
...
FI9853 623.3239 295.8828 2 270.0902
FI9965 625.6497 323.9666 2 271.098
LC1680 629.1384 286.8559 2 272.6097
LC5292 647.7451 265.793 2 280.6721
MC3021 610.5318 244.7302 2 264.5473
NF2924 652.3968 266.796 2 282.6877
NF8918 669.8406 310.9277 2 290.2462
PC9999 665.1889 275.8229 2 288.2306
PS2106 709.3798 259.7751 2 307.3788
TC3218 617.5093 291.8708 2 267.5707
TC4203 29.23513 293.8768 0 284.9097
TC7777 29.23513 269.805 0 284.9097
|
As you can see in this example, there are a greater
number of rows per range and a greater number of duplicates for each
step value. Also, 188 steps in the histogram are used, and the sample
values for the 168,725 rows in the table are distributed across those
188 step values. Also, in this example, 152,432 rows, rather than the
whole table, were sampled to generate the statistics.
2. How the Statistics Histogram Is Used
The histogram steps are used for SARGs only when a
constant expression is compared against an indexed column and the value
of the constant expression is known at query compile time. The following
SARG examples show where histogram steps can be used:
Some constant expressions cannot be evaluated until
query runtime. They include search arguments that contain local
variables or subqueries and also join clauses, such as the following:
For
these types of statements, you need some other way of estimating the
number of matching rows. In addition, because histogram steps are kept
only on the first column of the index, SQL Server must use a different
method for determining the number of matching rows for SARGs that
specify multiple column values for a composite index, such as the
following:
select * from sales
where title_id = 'BI3976'
and stor_id = 'P648'
When the histogram is not used or cannot be used, SQL
Server uses the index density values to estimate the number of matching
rows.
3. Index Densities
SQL Server stores the density values of each column
in the index for use in queries where the SARG value is not known until
runtime or when the SARG is on multiple columns of the index. For
composite keys, SQL Server stores the density for the first column of
the composite key; for the first and second columns; for the first,
second, and third columns; and so on. This information is shown in the All density section of the DBCC SHOW_STATISTICS output in Listings 34.4 and 34.5.
Index density essentially represents the inverse of
all unique key values of the key. The density of each key is calculated
by using the following formula:
Key density = 1.00 / Count of distinct key values in the table |
Therefore, the density for the au_lname column in the authors table in the bigpubs2008 database is calculated as follows:
Select Density = 1.00/ (select count(distinct au_lname) from authors)
go
Density
---------------------------------------
0.0062500000000
The density for the combination of the columns au_lname and au_fname is as follows:
Select Density = 1.00/ (select count(distinct au_lname + au_fname) from authors)
go
Density
----------------
0.0058139534883
Notice that, unlike with the selectivity ratio, a smaller
index density indicates a more selective index. As the density value
approaches 1, the index becomes less selective and essentially useless.
When the index selectivity is poor, the Query Optimizer might choose to
do a table scan or a leaf-level index scan rather than perform an index
seek because it is more cost-effective.
Tip
Watch
out for database indexes that have poor selectivity. Such indexes are
often more of a detriment to the performance of the system than they are
a help. Not only are they usually not used for data retrieval, but they
also slow down your data modification statements because of the
additional index overhead. You should identify such indexes and consider
dropping them.
Typically, the density value should become smaller
(that is, more selective) as you add more columns to the key. For
example, in Listing 34.5, the densities get progressively smaller (and thus, more selective) as additional columns are factored in, as shown in Table 3.
Table 3. Index Densities for the titleidind Index on the sales Table
Key Column | Index Density |
---|
title_id | 0.001858736 |
title_id, stor_id | 5.99844E-06 (.00000599844) |
title_id, stor_id, ord_num | 5.926804E-06 (.000005926804) |