SQL Server 2008 R2 : Index Statistics (part 1) - The Statistics Histogram, How the Statistics Histogram Is Used, Index Densities

8/18/2013 11:05:59 AM

As mentioned earlier, the selectivity of a key is an important factor that determines whether an index will be used to retrieve the data rows that satisfy a query. SQL Server stores the selectivity and a histogram of sample values of the key; based on the statistics stored for the key columns for the index and the SARGs specified for the query, the Query Optimizer decides which index to use.

To see the statistical information stored for an index, use the DBCC SHOW_STATISTICS command, which returns the following pieces of information:

A histogram that contains an even sampling of the values for the first column in the index key. SQL Server stores up to 200 sample values in the histogram.
Index densities for the combination of columns in the index. Index density indicates the uniqueness of the index key(s) and is discussed later in this section.
The number of rows in the table at the time the statistics were computed.
The number of rows sampled to generate the statistics.
The number of sample values (steps) stored in the histogram.
The average key length.
Whether the index is defined on a string column.
The date and time the statistics were generated.

The syntax for DBCC SHOW_STATISTICS is as follows:

DBCC SHOW_STATISTICS (tablename, index)

Listing 1 displays the abbreviated output from DBCC SHOW_STATISTICS, showing the statistical information for the aunmind nonclustered index on the au_lname and au_fname columns of the authors table.

Listing 1. DBCC SHOW_STATISTICS Output for the aunmind Index on the authors Table

dbcc show_statistics (authors, aunmind )
go

Name       Updated              Rows Rows Sampled  Steps  Density
Average key length String Index Filter Expression   Unfiltered Rows
---------- -------------------- ---- --------------- ------ --------
--------------------------------------------------- ---------------
aunmind    Mar 14 2010 10:20PM  172  172           148    1
24.06977           YES          NULL                172

(1 row(s) affected)

All density   Average Length Columns
------------- -------------- -------------------------
0.00625       6.406977       au_lname
0.005813953   13.06977       au_lname, au_fname
0.005813953   24.06977       au_lname, au_fname, au_id

(3 row(s) affected)

RANGE_HI_KEY   RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
----------------------------------------------------------------------
Ahlberg        0          2       0                    1
Alexander      0          1       0                    1
Amis           0          1       0                    1
Arendt         0          1       0                    1
Arnosky        0          1       0                    1
Bate           0          1       0                    1
Bauer          0          1       0                    1
Benchley       0          1       0                    1
Bennet         0          1       0                    1
Blotchet-Halls 0          1       0                    1
...
del Castillo   0          1       0                    1
Dillard        0          1       0                    1
Doctorow       0          1       0                    1
Doyle          0          1       0                    1
Durrenmatt     2          1       2                    1
Eastman        0          1       0                    1
...
Gringlesby     0          1       0                    1
Grisham        0          1       0                    1
Gunning        0          1       0                    1
Hill           0          1       0                    1
Hutchins       3          2       3                    1
Ionesco        0          1       0                    1
Ishiguro       0          1       0                    1
...
Tyler          0          1       0                    1
Van Allsburg   0          1       0                    1
Van der        0          1       0                    1
Van der Meer   0          1       0                    1
von Goethe     0          1       0                    1
Walker         0          1       0                    1
Warner         0          1       0                    1
White          0          2       0                    1
Wilder         0          1       0                    1
Williams       0          2       0                    1
Wilson         0          1       0                    1
Yates          0          1       0                    1
Yokomoto       0          1       0                    1
Young          0          1       0                    1

Looking at the output, you can determine that the statistics were last updated on March 14, 2010. At the time the statistics were generated, the table had 172 rows, and all 172 rows were sampled to generate the statistics (no filtering was applied). The average key length is 24.06977 bytes. From the All density information, you can see that this index is highly selective. (A low density means high selectivity; index densities are covered shortly.)

After the general information and the index densities, the index histogram is displayed.

1. The Statistics Histogram

Up to 200 sample values can be stored in the statistics histogram. Each sample value is called a step. The sample value stored in each step is the endpoint of a range of values. Three values are stored for each step:

RANGE_ROWS—This indicates how many other rows are inside the range between the current step and the step prior, not including the step values themselves.
EQ_ROWS—This is the number of rows that have the same value as the sample value. In other words, it is the number of duplicate values for the step.
Range density—This indicates the number of distinct values within the range. The range density information is actually displayed in two separate columns, DISTINCT_RANGE_ROWS and AVG_RANGE_ROWS:
- DISTINCT_RANGE_ROWS is the number of distinct values between the current step and the step prior, not including the step values itself.
- AVG_RANGE_ROWS is the average number of rows per distinct value within the range of the step.

In the output in Listing 34.4, distinct key values in the first column of the index are stored as the sample values in the histogram. Because most of the values for au_lname are unique, most of the range values are 0. You can see that there is a duplicate in the index key for the last name of Hutchins (EQ_ROWS is 2). For comparison purposes, Listing 2 shows a snippet of the DBCC SHOW_STATISTICS output for the titleidind index on the sales table in bigpubs2008.

Listing 2. DBCC SHOW_STATISTICS Output for the titleidind Index on the sales Table in the bigpubs2008 Database

dbcc show_statistics (sales, 'titleidind')
go

Name        Updated              Rows    Rows Sampled Steps  Density
Average key length String Index Filter Expression Unfiltered Rows
----------- ------------------- ------- ------------ ------ -------------
------------------ ------------ ----------------- ---------------
titleidind  Mar 14 2010 10:39PM  168725  152432       188    0.003537365
26.40519           YES          NULL              168725

All density   Average Length Columns
------------- -------------- --------------------------------------------------
0.001858736   6              title_id
5.99844E-06   10             title_id, stor_id
5.926804E-06  26.4007        title_id, stor_id, ord_num

RANGE_HI_KEY RANGE_ROWS EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------ ---------- -------- -------------------- --------------
BI0194       0          274.8199 0                    1
BI2184       639.6047   312.9337 2                    277.1448
BI2915       893.1208   271.811  3                    261.8779
BI3976       637.2789   260.778  2                    276.137
BI8448       1685.068   281.8409 6                    300.0652
BU1111       616.3464   276.8259 2                    267.0668
BU7832       357.0157   299.8948 1                    296.2236
CH0249       1067.558   279.8349 3                    313.0259
CH0639       1019.879   284.8499 3                    299.0454
CH0671       316.3136   259.7751 1                    262.4521
CH0847       1333.867   266.796  5                    295.557
CH1260       1069.884   287.8589 3                    313.7079
CH1380       612.8576   311.9307 2                    265.5551
CH1568       645.4193   286.8559 2                     279.6643
CH1692       974.525    275.8229 3                    285.7469
CH2080       329.1057   285.8529 1                    273.066
CH2240       715.1943   273.817  2                    309.8983
CH2256       352.364    310.9277 1                    292.364
CH2360       630.3014   293.8768 2                    273.1136
CH2480       626.8126   311.9307 2                    271.6019
CH2574       679.1439   279.8349 2                    294.2774
CH2610       334.9203   280.8379 1                    277.8905
CH2706       343.0607   300.8978 1                    284.6448
CH2856       326.7799   287.8589 1                    271.1362
...
FI9853       623.3239   295.8828 2                    270.0902
FI9965       625.6497   323.9666 2                    271.098
LC1680       629.1384   286.8559 2                    272.6097
LC5292       647.7451   265.793  2                    280.6721
MC3021       610.5318   244.7302 2                    264.5473
NF2924       652.3968   266.796  2                    282.6877
NF8918       669.8406   310.9277 2                    290.2462
PC9999       665.1889   275.8229 2                    288.2306
PS2106       709.3798   259.7751 2                    307.3788
TC3218       617.5093   291.8708 2                    267.5707
TC4203       29.23513   293.8768 0                    284.9097
TC7777       29.23513   269.805  0                     284.9097

As you can see in this example, there are a greater number of rows per range and a greater number of duplicates for each step value. Also, 188 steps in the histogram are used, and the sample values for the 168,725 rows in the table are distributed across those 188 step values. Also, in this example, 152,432 rows, rather than the whole table, were sampled to generate the statistics.

2. How the Statistics Histogram Is Used

The histogram steps are used for SARGs only when a constant expression is compared against an indexed column and the value of the constant expression is known at query compile time. The following SARG examples show where histogram steps can be used:

where col_a = getdate()
where cust_id = 12345
where monthly_sales < 10000 / 12
where l_name like "Smith" + "%"

Some constant expressions cannot be evaluated until query runtime. They include search arguments that contain local variables or subqueries and also join clauses, such as the following:

where price = @avg_price
where total_sales > (select sum(qty) from sales)
where titles.pub_id = publishers.pub_id

For these types of statements, you need some other way of estimating the number of matching rows. In addition, because histogram steps are kept only on the first column of the index, SQL Server must use a different method for determining the number of matching rows for SARGs that specify multiple column values for a composite index, such as the following:

select * from sales
   where title_id = 'BI3976'
     and stor_id = 'P648'

When the histogram is not used or cannot be used, SQL Server uses the index density values to estimate the number of matching rows.

3. Index Densities

SQL Server stores the density values of each column in the index for use in queries where the SARG value is not known until runtime or when the SARG is on multiple columns of the index. For composite keys, SQL Server stores the density for the first column of the composite key; for the first and second columns; for the first, second, and third columns; and so on. This information is shown in the All density section of the DBCC SHOW_STATISTICS output in Listings 34.4 and 34.5.

Index density essentially represents the inverse of all unique key values of the key. The density of each key is calculated by using the following formula:

Key density = 1.00 / Count of distinct key values in the table

Therefore, the density for the au_lname column in the authors table in the bigpubs2008 database is calculated as follows:

Select Density = 1.00/ (select count(distinct au_lname) from authors)
go
Density
---------------------------------------
0.0062500000000

The density for the combination of the columns au_lname and au_fname is as follows:

Code View: Scroll / Show All

Select Density = 1.00/ (select count(distinct au_lname + au_fname) from authors)
go
Density
----------------
0.0058139534883

Notice that, unlike with the selectivity ratio, a smaller index density indicates a more selective index. As the density value approaches 1, the index becomes less selective and essentially useless. When the index selectivity is poor, the Query Optimizer might choose to do a table scan or a leaf-level index scan rather than perform an index seek because it is more cost-effective.

Tip

Watch out for database indexes that have poor selectivity. Such indexes are often more of a detriment to the performance of the system than they are a help. Not only are they usually not used for data retrieval, but they also slow down your data modification statements because of the additional index overhead. You should identify such indexes and consider dropping them.

Typically, the density value should become smaller (that is, more selective) as you add more columns to the key. For example, in Listing 34.5, the densities get progressively smaller (and thus, more selective) as additional columns are factored in, as shown in Table 3.

Table 3. Index Densities for the titleidind Index on the sales Table
Key Column	Index Density
title_id	0.001858736
title_id, stor_id	5.99844E-06 (.00000599844)
title_id, stor_id, ord_num	5.926804E-06 (.000005926804)

Others

- Keeping Windows 7 and Other Software Up to Date : Installing and Removing Software (part 2)

- Keeping Windows 7 and Other Software Up to Date : Installing and Removing Software (part 1)

- Keeping Windows 7 and Other Software Up to Date : Service Packs

- SharePoint 2010 : Performance Monitoring - Implementing Visual Round Trip Analyzer

- SharePoint 2010 : Performance Monitoring - What and how to monitor with Performance Monitor

- SharePoint 2010 : Performance Monitoring - Using SQL Profiler

- SharePoint 2010 : Performance Monitoring - Enabling HTTP Request Monitoring and Throttling

- SharePoint 2010 : Monitoring and Reporting - Enabling the Developer Dashboard

- SharePoint 2010 : Monitoring and Reporting - Troubleshooting with correlation IDs