Thursday, March 5, 2015

How to load/insert a large amount of data efficiently

Importing data from other sources is a very common operation with most database solutions, as not all data can be entered row by row. When a large amount of data needs to be imported, we always look at the constraints added to the table and at minimal logging, expecting improved performance.

SQL Server supports a set of tools for importing data: the Import and Export Wizard, bcp (Bulk Copy Program), BULK INSERT, and OPENROWSET (BULK). Preference usually goes to either bcp or BULK INSERT; however, to get the best performance out of them, some options have to be considered and set. This post explains how to load a large amount of data using bcp with various options.

To test this, I created a database called TestDatabase and a table called dbo.Sales. I prepared a text file called Sales.txt that holds 7,301,921 records matching the dbo.Sales table structure. The Recovery Model of the database is initially set to Full.
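The exact structure of dbo.Sales does not matter for the test, but for anyone who wants to repeat it, a minimal sketch of the setup could look like this. The column list below is only an assumption; use a structure that matches your own data file.

--A minimal sketch of the test setup; the dbo.Sales columns are assumptions for illustration only
CREATE DATABASE TestDatabase;
GO
ALTER DATABASE TestDatabase SET RECOVERY FULL;
GO
USE TestDatabase;
GO
CREATE TABLE dbo.Sales
(
    SalesId INT NOT NULL,           -- assumed column
    SalesDate DATE NOT NULL,        -- assumed column
    ProductId INT NOT NULL,         -- assumed column
    Quantity INT NOT NULL,          -- assumed column
    Amount DECIMAL(16, 2) NOT NULL  -- assumed column
);
GO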

Then I loaded the table with the following bcp commands. Note that each command was run after truncating and shrinking the log file, as shown in the sketch below.
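Something along the following lines can be used to reset things between runs. The logical log file name is an assumption (check sys.database_files for yours), and remember that under the Full recovery model a log backup is needed before log space can be reused; backing up to the NUL device is only acceptable on a throwaway test database like this one.

--Resetting the table and the log between runs (logical log file name is an assumption)
USE TestDatabase;
GO
TRUNCATE TABLE dbo.Sales;                   -- empty the table for the next run
GO
BACKUP LOG TestDatabase TO DISK = N'NUL';   -- under Full recovery, a log backup is needed
                                            -- before log space can be reused (test databases only)
DBCC SHRINKFILE (TestDatabase_log, 1);      -- shrink the log file back down
GO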

--Loading without any specific option
bcp TestDatabase.dbo.Sales in C:\Temp\Sales.txt -S DINESH-PC\SQL2014,8527 -T -c

--Loading with the TABLOCK option to force locking of the entire table
bcp TestDatabase.dbo.Sales in C:\Temp\Sales.txt -S DINESH-PC\SQL2014,8527 -T -c -h "TABLOCK"

--Loading with our own batch size
bcp TestDatabase.dbo.Sales in C:\Temp\Sales.txt -S DINESH-PC\SQL2014,8527 -T -c -b 3000000 

--Loading with our own batch size and TABLOCK
bcp TestDatabase.dbo.Sales in C:\Temp\Sales.txt -S DINESH-PC\SQL2014,8527 -T -c -b 3000000 -h "TABLOCK"

Once the table had been loaded four times, it was loaded again under the Bulk Logged recovery model. All commands were executed just like before, making sure that the log was truncated and shrunk before each execution.
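Switching the recovery model is a single statement, and the log file size shown in the table below can be read from sys.database_files after each load.

--Switching to the Bulk Logged recovery model before repeating the four commands
ALTER DATABASE TestDatabase SET RECOVERY BULK_LOGGED;
GO

--Checking the log file size after each load (size is reported in 8 KB pages)
SELECT name, size * 8 AS SizeKB
FROM TestDatabase.sys.database_files
WHERE type_desc = 'LOG';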

Finally, here are the results.

Command                       Recovery Model   Time (ms)   Log file size after loading
With no specific option       Full             189,968     3,164,032 KB
With TABLOCK                  Full             275,500     1,475,904 KB
With batch size               Full             108,500     2,377,088 KB
With batch size and TABLOCK   Full              99,609     1,964,480 KB
With no specific option       Bulk Logged      140,422     3,164,032 KB
With TABLOCK                  Bulk Logged      239,938        26,816 KB
With batch size               Bulk Logged      121,828     2,377,088 KB
With batch size and TABLOCK   Bulk Logged       86,422     1,475,904 KB

As you can see, the best speed comes from setting the batch size together with the TABLOCK hint under the bulk-logged recovery model. If log file growth is your main concern, then the TABLOCK option alone under the bulk-logged recovery model is the best choice.

I believe BULK INSERT would offer similar results; however, it has yet to be tested in the same way.
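For reference, a BULK INSERT statement roughly equivalent to the last bcp command would look something like this; the field and row terminators below are assumptions about the format of Sales.txt.

--A BULK INSERT equivalent of the last bcp command (terminators are assumptions)
BULK INSERT TestDatabase.dbo.Sales
FROM 'C:\Temp\Sales.txt'
WITH
(
    DATAFILETYPE = 'char',      -- character data, like bcp -c
    FIELDTERMINATOR = '\t',     -- assumed tab-delimited, matching the bcp -c default
    ROWTERMINATOR = '\n',
    BATCHSIZE = 3000000,        -- same batch size used with -b
    TABLOCK                     -- same table-level lock hint used with -h "TABLOCK"
);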
