Friday, September 2, 2016

SQL Server Statistics - When they are automatically getting updated?

You know that SQL Server uses an Execution Plan for every query you execute and the plan is created using multiple factors. Among the factors like Indexes and Statistics, Statistics play a key role because the cost for the plan is decided based on information provided by Statistics.

Statistics provides information on the data distribution in columns, tables and indexes. The Query optimizer uses statistics for estimating number of rows that query could be returned, and then decides the cost for CPU and other resources. Since the optimizer is a Cost-Based-Optimizer, the accuracy of the statistics is matter a lot because the plan selected for the query is based on the cost involved with it.

How often these statistics are getting updated? Since it updates automatically (by default, it is on), do we really need to care about it? Let's do a test and get answers for all the questions. Let's create a table and insert some records.

USE tempdb;
GO

-- DROP TABLE dbo.Person
-- Create the table
CREATE TABLE dbo.Person
(
 PersonId int identity(1,1) primary key
 , FirstName varchar(50) not null
 , LastName varchar(50) not null index IX_Person_LastName nonclustered
)

-- Insert 1000 records from Person.Person table
INSERT INTO dbo.Person
 (FirstName, LastName)
SELECT TOP (1000) FirstName, LastName 
FROM AdventureWorks2014.Person.Person

-- Query for one person named Abel and see
-- Enable Execution Plan too.
SET STATISTICS IO ON
SELECT * 
FROM dbo.Person
WHERE LastName = 'Abel'


As you see, it returns only one record and it uses the non-clustered index created for seeing the record. It is the best plan because SQL Server knows that there is only one record for Abel and Seek is the best operation for finding the record. How does it know? Statistics, see below code and the output. It says that there is only one record for Abel.

DBCC SHOW_STATISTICS ('dbo.Person', 'IX_Person_LastName')


Let's add some more records. But this time, I will be adding 699 records (will tell you the reason later) with the same Last Name : Abel.

INSERT INTO dbo.Person
 (FirstName, LastName)
SELECT TOP (699) FirstName, 'Abel'  
FROM AdventureWorks2014.Person.Person

Since there are many records for Abel now, for a query like above will be performed well with Scan operation instead of Seek but you will see that SQL Server still uses Seek instead of Scan.


Now you know the reason why it has not used Scan operation. As you see, still the statistics shows Estimated Number of Rows as 1.699 though there are 700 records. We can manually update statistics using UPDATE STATISTICS statement but let's try to understand why it has not been updated.

Generally, when the Auto Update Statistics is enabled, it will update statistics when it detects 500+20% of row changes. This algorithm gets changed with larger number of records but generally this is how it works. In our case, we have 1000 records and 20% of it is 200. Therefore it needs 200+500, which is 700 changes for updating statistics. Since we have inserted only 699, it does not fire statistics update.

Now you can either manually update or insert another record. Let's insert one more records and see how it works.

-- Can manually update
-- UPDATE STATISTICS dbo.Person IX_Person_LastName
-- or
-- Insert new record

INSERT INTO dbo.Person
 (FirstName, LastName)
SELECT TOP (1) FirstName, 'Abel'  
FROM AdventureWorks2014.Person.Person

-- Check the query again
SET STATISTICS IO ON
SELECT * 
FROM dbo.Person
WHERE LastName = 'Abel'


Statistics return Estimated Number of Rows accurately, hence optimizer decides to use Scan instead of Seek. Note that, logical reads has gone down to 12 now since not much lookups.

This is the reason why experts recommend update statistics manually if a large number of record are changed by an operation.

You can read more info on statistics at: https://blogs.msdn.microsoft.com/srgolla/2012/09/04/sql-server-statistics-explained/

No comments: