a collection of technical fixes and other random stuff

Spodworld


Speeding up SQL Server for millions of records

Recently I've been required to insert 6 million database records into the latest version of one of our products.

Actually, it's a lot more than this, because it is 6 million business objects, which of course tend to have their related lookup tables, so the real figure is probably much higher.

The problem I have is that, for reliable data, the information needs to be inserted through the interface to guarantee it is like a production environment, and this is slow... especially when you get over 1 million database records in your table.

 

As an experienced database developer, my first thought was to look at Profiler and get some information on improving the performance with code tweaks or better indexing. Being an almost mature product, there was little to do here, but a few changes to the indexes and stored procedures helped a bit. This gave me about a 50% speed improvement, but it didn't last for long.

Soon the process of inserting records, plus the background logic doing lookups to check for duplicates etc., was slowing down to three or four records per second, which was going to take forever. At 2 million records, it was clear that this needed a different approach.

 

To rub salt in the wound, there were other factors stopping me from doing my task.

  • I had a brilliant load testing app I had written that could reduce call time/latency to the website by doing all this multithreaded. Unfortunately the code wasn't very thread-safe, and performance got worse with multiple threads due to all the locking.
  • My super-fast servers from last year had been hijacked by IT to run SharePoint.
  • I could just run several databases at low volumes and piece them all together. This involved a lot of manual manipulation of the data (a risk) and also... I didn't have time. At tables of this size, even SQL Server's interface was struggling.

 

The solution was based around previous tests where we found a super-fast SAN storage system sped up performance significantly. However, I didn't have access to one at such short notice.

I decided to improvise and create a RAM disk to store the SQL database on, and hopefully speed things up. In theory this should be much faster at seeking, reading and writing than a standard mechanical hard drive.

I used a PC with 12GB of RAM and created a RAM drive.

I downloaded a trial version of this software: http://www.superspeed.com/servers/ramdisk.php and created an 8GB drive within 5 mins.

I then 'detached' my database using the SQL Server interface, and moved the .mdf and .ldf files to the RAM drive.

I then re-attached the database, making sure SQL Server had permission to the new files and their storage location.
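For reference, the detach and re-attach can also be scripted rather than clicked through. A minimal T-SQL sketch, assuming the database is called MyAppDb and the RAM drive is mounted as R: (both names are placeholders for your own):

    USE master;
    GO
    -- Detach (make sure no connections are using the database first)
    EXEC sp_detach_db @dbname = 'MyAppDb';
    GO
    -- ...copy MyAppDb.mdf and MyAppDb_log.ldf to R:\ here...
    -- Re-attach from the new location; the SQL Server service account
    -- needs read/write permission on R:\
    CREATE DATABASE MyAppDb
        ON (FILENAME = 'R:\MyAppDb.mdf'),
           (FILENAME = 'R:\MyAppDb_log.ldf')
        FOR ATTACH;
    GO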

After a database shrink and an index rebuild on all tables, I restarted my task.
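That step can be scripted too; a sketch using the same placeholder database name (sp_MSforeachtable is undocumented but ships with SQL Server):

    USE MyAppDb;
    GO
    -- Reclaim free space after the move
    DBCC SHRINKDATABASE (MyAppDb);
    GO
    -- Rebuild every index on every user table
    EXEC sp_MSforeachtable 'ALTER INDEX ALL ON ? REBUILD';
    GO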

Insertion continued at 50-100 inserts per second, which was much faster than before. And the slowdown has stopped, or at least the rate is degrading much more slowly than it was.

 

You can also move the system tempdb files to the RAM disk too, which can speed things up further. However, bear in mind that if you need to do any large data deletion or modification on large numbers of rows, you may find tempdb grows and fills your RAM disk while trying to do this. For this reason, I left this out on my final run.
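For what it's worth, moving tempdb is a documented file move rather than a detach/attach. A sketch, again assuming an R: RAM drive (tempdev and templog are the default logical file names):

    USE master;
    GO
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'R:\tempdb.mdf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'R:\templog.ldf');
    GO
    -- The files are created in the new location the next time SQL Server restarts.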

 

Don't forget a large database will probably need several maintenance tasks carried out on it over a growth like this, so it may be handy to set up an hourly index rebuild task while your insertion run is executing (the rebuild snippet above can be reused for that).

Annoyingly, this should all be ready just before our much more efficient mass insertion feature passes testing.

If you have the resources, I'd try a solid-state drive or a SAN system, as you must remember your data will be lost as soon as the power goes off.

Don't forget to back it up once you've finished.
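A one-line backup is enough, as long as the target is a physical disk rather than the RAM drive (the path and database name are placeholders):

    BACKUP DATABASE MyAppDb
        TO DISK = 'D:\Backups\MyAppDb_after_load.bak'
        WITH INIT;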

How to use SQL Profiler (quick guide)

SQL Profiler usage

 

SQL Profiler is a great tool for finding problems with your database. It can highlight where errors are occurring and, to some extent, assist with improving performance.
I say to some extent because, once you get things running really fast, you need to take into account that Profiler itself slows down the responses coming from your database. My policy is to always test and improve on a slow machine or virtual machine, and then deploy the changes to your fast server. Simply throwing more CPU and memory at the problem means you have to test with larger volumes, and makes you work harder rather than your tools.

 

  • First you'll need SQL Management Studio, either the full version or a developer version (SQL Express won't have this).

  • Connect to the database using SQL Management Studio. (You could try remoting to the server and using the Management Studio that's installed there if it's not on your machine.)

  • SQL Profiler can be found in [Start]->[Programs]->[Microsoft SQL 2005]->[Performance Tools], or through the [Tools] menu option in SQL Management Studio itself.

  • On initial startup you are presented with a blank screen. You need to create a new trace (the process whereby data is collected).
    Click [File]->[New trace].
    Select the database server to connect to (usually the one you're working on). Note: you will probably need admin permission or the correct settings to do this.

  • Use the standard template if this is your first attempt, but click on the [Events Selection] tab to have a look at what's available.
    Here you can add events to be monitored by ticking the check boxes. More options are available if you choose "Show all events" and/or "Show all columns".
    You can also add filters to narrow down what you're looking for by right-clicking over a column header and selecting "Edit column filter". I mainly use this to select long-running events, e.g. over 60ms initially, to filter out negligible actions.

  • When you select OK, the trace starts and the events are displayed onscreen.

  • Pause/Stop/Continue buttons are available at the top to allow you to browse the captured data once it's come in.

  • You can use this to gather information on what actions are taking the longest to execute, or... you can save the trace for the Database Engine Tuning Advisor to process and make suggestions on improving performance.
    To do this click [File]->[Save as]->[Trace file].
    Once saved, go to [Tools]->[Database Engine Tuning Advisor] and use this to analyse your trace file.
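A saved trace file can also be loaded straight into a query window with the built-in sys.fn_trace_gettable function, which is handy for sorting captured events by duration without re-opening Profiler. A minimal sketch (the file path is a placeholder):

    SELECT TextData, Duration, Reads, Writes, CPU
    FROM sys.fn_trace_gettable('C:\Traces\MyTrace.trc', DEFAULT)
    WHERE Duration IS NOT NULL
    ORDER BY Duration DESC;

Note that from SQL 2005 onwards, Duration in the trace file is stored in microseconds, even though Profiler's UI displays milliseconds.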

 

SQL optimisation checklist

  • Place all your code in stored procedures where possible
  • Move declarations to the start of stored procedures.
  • Are transactions required? If there are no updates or inserts, then you should remove any transactions and ROLLBACK commands. These will just fill your transaction log up.
  • Repeated GETDATE() calls: why not declare a variable at the start of the stored procedure and re-use it later on in the code? (See the sketch after this list.)
  • COUNT(*) and COUNT(fieldname): if you're deciding whether there is data in a table, or looping through a number of rows, use EXISTS instead. If a count is required, use one field instead of * as it reduces the number of reads.
  • Rollback transactions as soon as possible. Sometimes people build error messages before rolling back... these can be done outside the transaction.
  • XML use... try upgrading to the latest XML methods.
  • Prevent recompilation of stored procedures
  • Avoid temporary table modifications
  • Reduce multiple lookups into one.
  • Move SELECT statements to the start of stored procedures where possible. This reduces record locking times.
  • Modify multiple update statements to use CASE.
  • Reduce UNION commands
  • Get a tool to help you... MS Query Analyser is pretty valuable.
  • Avoid nested loops
  • Avoid functions in the WHERE clause as this affects performance.
  • Use performance monitors where available.
  • Use a fixed load rather than the maximum load when trying to improve performance
  • Call stored procedures using their fully qualified name
  • Split larger stored procedures into smaller ones and have a controlling procedure.
  • Do not use sp_ as a stored procedure prefix
  • Consider returning an integer status via the RETURN statement instead of returning it as part of a recordset.
  • Avoid temporary tables
  • SET NOCOUNT ON
  • Use the sp_executesql stored procedure instead of the EXECUTE statement.
  • Check the order of fields in the WHERE clause
  • Consider additional indexes on popular search fields
  • Reduce the data returned (avoid select *)
  • Are ORDER BY and GROUP BY required? These are quite expensive operations.
  • Increase worker threads
  • Increase CPUs / hard drive speed / Memory
  • Consider offloading work.
  • Use SELECT statements with the TOP keyword, or the SET ROWCOUNT statement, if you need to return only the first n rows.
  • Clustered indexes are preferable to nonclustered ones if you need to select by a range of values or sort the result set with GROUP BY or ORDER BY.
  • Try to avoid using the DISTINCT clause
  • Use table variables instead of temporary tables
  • Create indexes on XML columns
  • Consider de-normalization
  • Avoid use of NULL
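To make a few of those concrete, here's a short stored procedure sketch pulling together the cached GETDATE(), the EXISTS check, SET NOCOUNT ON and a RETURN value instead of a recordset. The table and column names (dbo.Orders, Processed, ProcessedDate) are invented purely for illustration:

    CREATE PROCEDURE dbo.usp_ProcessNewOrders
    AS
    BEGIN
        SET NOCOUNT ON;            -- suppress 'n rows affected' messages

        DECLARE @now DATETIME;
        SET @now = GETDATE();      -- call GETDATE() once and re-use it

        -- EXISTS instead of COUNT(*) when all we need to know is 'any rows?'
        IF NOT EXISTS (SELECT 1 FROM dbo.Orders WHERE Processed = 0)
            RETURN 0;              -- report status via RETURN, not a recordset

        UPDATE dbo.Orders
        SET Processed = 1,
            ProcessedDate = @now
        WHERE Processed = 0;

        RETURN 1;
    END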