Wednesday, September 29, 2010

Table variable slower than Temporary table???

I tend to use table variables when the resultset is small and only needs to be held temporarily. The main reason for this choice is that data manipulation with table variables is faster than with temporary tables when the container holds a small amount of data. There were some instances, especially with JOINs, where table variables showed slowness; I had been meaning to investigate but had no time. I thought of starting the investigation again when I saw this post by Jack Li.

Simple Test
Let me start with an example, similar to an issue I had. The following code creates a table variable and loads sales data into it. Later, the table variable is joined with a permanent table to get a resultset. Note that there is no business logic in the query; it is just for testing. Turn "Include Actual Execution Plan" on and run the query below.

USE AdventureWorks2008
GO
 
DECLARE @TestTable TABLE 
(
    SalesOrderId int 
    , SalesOrderDetailID int
    , LineTotal money 
    , PRIMARY KEY (SalesOrderId, SalesOrderDetailID)
)
 
INSERT INTO @TestTable
SELECT SalesOrderID
    , SalesOrderDetailID
    , LineTotal 
FROM Sales.SalesOrderDetail
 
SELECT t.SalesOrderId, s.UnitPrice
FROM @TestTable t
INNER JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID

[Execution plan for the table variable query]
Now investigate the execution plan for the SELECT query. See "Estimated Number of Rows": it shows 1, which is incorrect. This causes the Optimizer to use Nested Loops instead of a Merge Join and reduces the performance of data retrieval.

Is Merge Join better than Nested Loops?
According to BOL, the Query Optimizer determines the cost of a query plan and selects one based on two factors:

  • The total number of rows processed at each level of a query plan, referred to as the cardinality of the plan.
  • The cost model of the algorithm dictated by the operators used in the query.

The second factor depends on the first. Therefore, having correct cardinality helps ensure that a faster execution plan is generated. As the plan generated for our code does not have correct cardinality, the algorithm used for joining is not an optimal way to join the two results.

The Optimizer selects Nested Loops when one of the tables contains a small number of records, based on the Estimated Number of Rows; that is the reason the Nested Loops algorithm was selected here. But in reality, the number of records in the table variable is not small (in this case).

Merge Join works differently. Usually the Optimizer selects this algorithm when both resultsets are sorted; in our case they are, since both inputs are sorted on the primary key columns. The matching operation for this join works differently, and it is usually faster than Nested Loops on larger inputs. For more info, read this.
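One way to see the difference on the table variable query itself is to force the join strategy with a join hint and compare the two actual plans. This is only a test sketch; hints override the Optimizer and should not be left in production code.

```sql
-- Sketch: force a Merge Join on the table variable query with a join
-- hint, purely to compare its cost against the Nested Loops plan.
-- This must run in the same batch as the DECLARE @TestTable above.
SELECT t.SalesOrderId, s.UnitPrice
FROM @TestTable t
INNER MERGE JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID
```

Comparing the relative costs of the hinted and unhinted plans in the same batch shows whether the Merge Join strategy really is the cheaper one for this row count.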

Using #Temporary Table
The code below does the same; the only difference is that a temporary table is used instead of a table variable.

USE AdventureWorks2008
GO
 
CREATE TABLE #TestTable 
(
    SalesOrderId int 
    , SalesOrderDetailID int
    , LineTotal money
    , CONSTRAINT pk_TestTable PRIMARY KEY
        (SalesOrderId, SalesOrderDetailID)
)
 
INSERT INTO #TestTable
SELECT SalesOrderID
    , SalesOrderDetailID
    , LineTotal 
FROM Sales.SalesOrderDetail
 
SELECT t.SalesOrderId, s.UnitPrice
FROM #TestTable t
INNER JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID

[Execution plan for the temporary table query]

Now you can see that the cardinality of the plan is correct and a Merge Join is used as the algorithm for joining the two results.

A reason for the wrong "Estimated Number of Rows" with a table variable
As Jack Li explains, when the code is compiled as a batch, the table variable contains no data at the initial stage, so the optimizer makes an assumption that the table variable holds a very low number of rows, resulting in a plan with Nested Loops. Another reason is the unavailability of statistics: SQL Server does not maintain statistics for table variables.
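The low estimate is easy to observe without reading the graphical plan. As a sketch, STATISTICS PROFILE prints, for each operator, the optimizer's EstimateRows next to the actual Rows:

```sql
-- Sketch: compare the optimizer's estimate with the actual row count.
SET STATISTICS PROFILE ON;

DECLARE @TestTable TABLE 
(
    SalesOrderId int 
    , SalesOrderDetailID int
    , LineTotal money 
    , PRIMARY KEY (SalesOrderId, SalesOrderDetailID)
)

INSERT INTO @TestTable
SELECT SalesOrderID, SalesOrderDetailID, LineTotal 
FROM Sales.SalesOrderDetail

-- In the profile output, the scan of @TestTable shows EstimateRows = 1,
-- while the Rows column shows the actual number of rows loaded from
-- Sales.SalesOrderDetail.
SELECT t.SalesOrderId, s.UnitPrice
FROM @TestTable t
INNER JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID

SET STATISTICS PROFILE OFF;
```

Seeing EstimateRows = 1 against the real row count makes it clear why the optimizer favours Nested Loops.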

Solving the issue
Again, the solution is from Jack Li: all we have to do is recompile the SELECT statement at execution time. By then the table variable contains data, hence the Estimated Number of Rows is correctly determined. See the code below:

SELECT t.SalesOrderId, s.UnitPrice
FROM @TestTable t
INNER JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID
OPTION (RECOMPILE)    

[Execution plan for the table variable query with OPTION (RECOMPILE)]

There will be a cost for recompiling; hence, always compare the recompile cost against the improvement, and then select the best approach.
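To weigh that trade-off, run both versions with timing turned on. This is only a sketch; the numbers will vary with hardware and cache state, so run each a few times.

```sql
-- Sketch: measure CPU and elapsed time for both variants and compare.
-- Run in the same batch as the DECLARE @TestTable above.
SET STATISTICS TIME ON;

-- Plain SELECT: compiled with the 1-row estimate (Nested Loops plan).
SELECT t.SalesOrderId, s.UnitPrice
FROM @TestTable t
INNER JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID

-- Recompiled SELECT: the estimate reflects the loaded rows (Merge Join
-- plan), at the price of an extra compilation on every execution.
SELECT t.SalesOrderId, s.UnitPrice
FROM @TestTable t
INNER JOIN Sales.SalesOrderDetail s
    ON t.SalesOrderId = s.SalesOrderID
        AND t.SalesOrderDetailID = s.SalesOrderDetailID
OPTION (RECOMPILE)

SET STATISTICS TIME OFF;
```

Compare the "SQL Server parse and compile time" and "SQL Server Execution Times" messages for the two statements; if the recompile overhead outweighs the execution saving, a temporary table may be the better choice.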
