• Home
  • Modeling for BI
  • DBA
  • Azure
  • SSIS
  • SSAS
  • SSRS - Reporting
  • PBI - Analytics
  • Consulting
  • About
Microsoft Data & AI

SQL Server Integration Services

Top 5 Best Practices for SSIS Design

5/23/2015

2 Comments

 
1. A good SSIS package design will be repeatable.  If you find yourself adding new tasks and data flow exceptions to your packages, you need to stop and reevaluate the original layout.  One SSIS project will have several “templates” that are reused without a single design change for the life of the data warehouse.  You should only have one or two template variations for each of these areas:
  • Staging package
  • Fact package
  • dimension package
  • Slowly changing dimension type 2 package
  • Control package
The key to manageable QA and the longevity of your ETL process is using a repeatable process.  The same thing should happen over and over again as data is moved from staging to your IDS (Information Data Store) to your EDW (Enterprise Data Warehouse).  Consider using t-SQL in USPs to work out complex business logic.

2. Plan for restartability.  As of SQL 2014, SSIS checkpoint files still did not work with sequence containers.  (The whole sequence container will restart including successfully completed tasks.)  The solution is to build Restartability into your ABC framework.

3. Verify your ETL process.  Just because your ETL finished without error – or you successfully handled your errors, doesn’t necessarily mean the SUM() of SourceSystemNet equals the SUM() of SSAScubeNet.  Use your verification process this way.
a. It should be the final step of your ETL / ELT process
b. It should confirm that strategic SUM() and row COUNT() are accurate
c. It should report on dropped rows discarded during the ETL from an INNER JOIN or WHERE clause
d. It should automatically send emails of errors that have been allowed to “report and pass”.

4. Collect Metadata!  Audit, balance and control (ABC) should be planned for and implemented from the very first day.  You should store this ABC data in a separate SQL Server database, and at any point in time be able to determine the following:
  • What packages are currently running?  (Your SQL Agent will kick off a master package but won’t tell you what child packages / sequence containers are in process.)
  • When did a package last successfully execute?
  • How many records were selected vs. inserted, updated or deleted from any given task?
  • At what data flow process did a package fail, and where should it restart?
  • How long did each task take to run?
  • How long did each package take to run?
  • What tasks are taking the longest to execute?

5. Trap for Errors both through On Error events and through precedence constraints.  There are two types of errors to successfully handle in an ETL / ELT process
  • Report and fail.  Use Event Handles and your package properties for this.
  •  Report and pass.  Use your precedence constraints to allow for errors, but always keep row-level reporting so that someone can be notified and the problem can be researched and corrected
With everything else, there must be a balance.  It is not appropriate to have every single task with an Error Output.  Choose wisely.  It is appropriate often to ‘report and fail’, but when there are hundreds of packages and tables churning through your ETL system, you cannot have it breaking constantly. (This is often the case when source systems keep sending through data quality “surprises”.)  Don’t jeopardize the reliability of your data warehouse by not handling a source system that continues to behave badly.  The DW is your responsibility.  The source system is often outside of your control.

For a full list of MS BI Best practices, download the following file:
ms_bi_best_practices.docx
File Size: 2137 kb
File Type: docx
Download File

2 Comments
Santosh
7/18/2016 11:42:54 am

Hi,

I work for HP, found this document is useful. you made the HP KB document to public for download.. isn’t it?

Reply
SYlvie Levesque
6/5/2021 03:27:13 am

Merci

Reply



Leave a Reply.

    Categories

    All
    Best Practices
    Design
    Error Handling
    SCD2
    SSIS

    RSS Feed

    Tweets by @SqlServerSME
Powered by Create your own unique website with customizable templates.
  • Home
  • Modeling for BI
  • DBA
  • Azure
  • SSIS
  • SSAS
  • SSRS - Reporting
  • PBI - Analytics
  • Consulting
  • About