Microsoft Data & AI

SQL Server Integration Services

Top 5 Best Practices for SSIS Performance

5/24/2015

Microsoft has built multiple data flow performance features into SSIS, which you can read about here (http://msdn.microsoft.com/en-us/library/ms141031.aspx).  The following list is not all-inclusive, but these best practices will help you avoid the majority of common SSIS oversights and mistakes.

1. Give your SSIS process its own server.  The resources needed for data integration (primarily memory, and lots of it) are different from those needed for data storage.  Granted, if your entire ETL process runs in just a few hours overnight when no end users are connecting, a case can be made for sharing servers; more often, however, real-time requirements and on-demand transformation are reasons to give your ETL process dedicated hardware.

2. Only update rows in your data warehouse that have been changed or deleted from your source system(s).
  • Use SQL Server’s change data capture abilities whenever possible if you are working with many updates and deletes in the source system.  This works well in a one-source-to-one-destination transform, but can become quite complex in a multi-source-to-one-destination ETL package.
  • Consider the use of hash values when your source system does not record a last-updated date/time, or when your transforms are multi-source-to-one-destination.
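The hash-value approach above can be sketched in a few lines: hash the tracked business columns of each incoming row and compare against the hash stored in the warehouse, so only genuinely changed rows flow to the update path.  A minimal Python sketch (the column names and rows are hypothetical; in practice you might compute the hash in T-SQL or in an SSIS Script Component instead):

```python
import hashlib

def row_hash(row, columns):
    """Concatenate the tracked columns with a delimiter and hash them.
    A change in any tracked column changes the hash, flagging the row."""
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

tracked = ["name", "city"]
source_row = {"id": 7, "name": "Acme", "city": "Austin"}
stored_hash = row_hash({"id": 7, "name": "Acme", "city": "Dallas"}, tracked)

if row_hash(source_row, tracked) != stored_hash:
    print("row changed; route to update path")
```

Storing the hash alongside the warehouse row means the comparison needs no per-column logic, which is what makes this practical for multi-source-to-one-destination transforms.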

3. Install and test for adequate RAM on the SSIS server.  Sorting 2 TB of data requires 2 TB of RAM, and SSIS will start to write data to disk once all available memory is taken.  As part of your test/QA methodology, use Performance Monitor and have your network team notify you whenever Buffers Spooled goes above zero or Avg. Disk sec/Transfer rises above 10 ms.  Test and monitor the following PerfMon counters.
  • % Processor Time
  • Avg Disk sec/Transfer
  • Available Mbytes
  • Buffer memory
  • Buffers in use
  • Buffers spooled
  • Private buffer memory
  • Private buffers in use
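One lightweight way to act on these counters is to log them with a tool such as typeperf and flag any nonzero Buffers Spooled samples, since a nonzero value means SSIS ran out of memory and spilled buffers to disk.  A minimal Python sketch of such a check (the server name, counter path, and sample values are hypothetical, and the CSV layout mirrors a typical Performance Monitor CSV export):

```python
import csv
import io

# Hypothetical counter log: a timestamp column, then one counter column.
sample_log = r'''"(PDH-CSV 4.0)","\\ETL01\SQLServer:SSIS Pipeline\Buffers spooled"
"05/24/2015 02:00:00.000","0"
"05/24/2015 02:01:00.000","12"
"05/24/2015 02:02:00.000","0"
'''

def spooled_alerts(csv_text):
    """Return (timestamp, value) pairs where Buffers Spooled rose above
    zero, i.e. SSIS started writing pipeline buffers to disk."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # skip the header row
    return [(ts, float(v)) for ts, v in rows if float(v) > 0]

print(spooled_alerts(sample_log))
```

A scheduled job running a check like this can page the team the moment spooling begins, rather than waiting for an overnight load to miss its window.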

4. Take note of your MaxConcurrentExecutables package property.   This defines how many tasks can run simultaneously.  The default value is negative one (-1), which means the number of physical or logical processors plus two (2) is the number of control flow items that can be executed in parallel.  It is generally recommended that you leave this setting at the default unless you are absolutely sure that parallelism is causing an issue.
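The way the default resolves can be shown with a tiny helper (a hypothetical function for illustration; SSIS performs this resolution internally):

```python
def effective_max_concurrent(setting: int, logical_processors: int) -> int:
    """MaxConcurrentExecutables of -1 (the default) resolves to the
    processor count plus 2; any other value is used as-is."""
    return logical_processors + 2 if setting == -1 else setting

print(effective_max_concurrent(-1, 8))  # default on an 8-core box -> 10
print(effective_max_concurrent(4, 8))   # explicit cap -> 4
```

So on an 8-processor server the default allows up to 10 control flow tasks to run in parallel, which is usually plenty.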

5. Adequately test and update data flow properties that impact performance.
  • EngineThreads
  • DefaultBufferMaxRows
  • DefaultBufferSize
You can find a plethora of information on these package properties on the Internet, so their definitions will not be reiterated here; the important thing is not to push a large ETL project into production with the SSIS defaults.  Best practice is to develop small, mid-sized, and large data transforms with the SSIS package defaults, and then take the time to test changes to each of these properties based on volume.  Unfortunately, there is no exact science.
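To reason about how DefaultBufferMaxRows and DefaultBufferSize interact, consider that each pipeline buffer holds the smaller of the row cap and however many rows fit in the buffer's byte budget.  A rough sketch using the documented package defaults of 10,000 rows and 10 MB (the helper is an approximation for tuning intuition, not the engine's exact algorithm):

```python
def rows_per_buffer(row_bytes: int,
                    default_buffer_size: int = 10_485_760,   # 10 MB default
                    default_max_rows: int = 10_000) -> int:
    """Approximate rows per pipeline buffer: the lesser of the row cap
    and how many rows of this width fit in the buffer's byte budget."""
    return min(default_max_rows, default_buffer_size // row_bytes)

print(rows_per_buffer(200))    # narrow rows: capped by DefaultBufferMaxRows
print(rows_per_buffer(5000))   # wide rows: capped by DefaultBufferSize
```

The takeaway for tuning: with wide rows the byte budget is the binding limit, so raising DefaultBufferMaxRows alone does nothing, which is why both properties must be tested together against realistic volumes.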
