Exploring Statistics - Albert Masoliver's learning site

# Exploring Statistics > [!Note] > Ensuring the optimizer has **accurate statistics**—through both automatic and manual methods—relies on understanding **histograms**, **density**, **string statistics**, the variety of **single-column**, **multi-column**, and **filtered** statistics, and knowing when to apply **FULLSCAN** or rely on default sampling, all while monitoring the **staleness threshold** for updates. SQL Server can **automatically** generate and refresh statistics under the `AUTO_CREATE_STATISTICS` and `AUTO_UPDATE_STATISTICS` settings, building single-column histograms on demand when queries reference unindexed columns. You may also create statistics explicitly with the `CREATE STATISTICS` command to cover multi-column correlations or filtered subsets of data. At the heart of each statistics object lies a **histogram**, which captures the distribution of up to 200 value ranges for the leading column. **Density** metrics complement this by quantifying how selective each column is for equality predicates. For character data, **string statistics** enhance the optimizer’s ability to estimate `LIKE` and text patterns accurately. When maximum precision is required—especially after substantial data loads—you can force a full population of the histogram with `UPDATE STATISTICS … WITH FULLSCAN`. This bypasses the default sampling algorithm to ensure every row contributes to the distribution map. To prevent estimates from drifting, SQL Server defines a **staleness threshold**: once **500** plus **20%** of the rows in the key column have changed, the engine triggers an automatic statistics update on that object. Below is a concise tabular summary of these key statistics behaviors: |**Aspect**|**Details**| |---|---| |**Creation & Refresh**|Automatic via **AUTO_CREATE_STATISTICS** / **AUTO_UPDATE_STATISTICS**, or manual with `CREATE STATISTICS`| |**Core Components**|**Histogram** (value distribution), **Density** (equality selectivity), **String Stats** (text pattern estimates)| |**Types Available**|**Single-column** (auto), **Multi-column** (manual), **Filtered** (manual)| |**Advanced Update Option**|`UPDATE STATISTICS … WITH FULLSCAN` for complete histogram accuracy versus default sampled population| |**Staleness Threshold**|**500 + 20%** of modifications on the key column triggers automatic update| > [!tip] > Use **FULLSCAN** sparingly—reserve it for critical tables after bulk loads—to avoid prolonged maintenance windows and high I/O during sampling. > [!warning] > Overly delayed or infrequent statistics updates on high-churn tables can mislead the optimizer into poor plan choices, so monitor staleness via `sys.dm_db_stats_properties`. --- ## ## References - Korotkevitch, D. (2022). _SQL Server advanced troubleshooting and performance tuning: Best practices and techniques_. O’Reilly Media. - Nevarez, B. (2022). _SQL Server query tuning and optimization: Optimize Microsoft SQL Server 2022 queries and applications_. Packt Publishing.