Microsoft Data & AI

All Things Azure

Azure Data Factory (ADFv2) Parameter Passing: Date Filtering (blog post 1 of 3)

4/25/2017

10 Comments

 
I just had the pleasure of deleting, from my blog backlog, three ADFv1 blog posts that were in the making.  The arrival of Azure Data Factory v2 (ADFv2) makes me want to stand up and sing Handel's Hallelujah Chorus.  Yes, my fine friend, ADFv2 is a real game changer now.  Let us begin!

Assumptions:
  1. You have an ADFv2 environment in which to work.  These examples will not work in ADFv1.
  2. You are using VSTS Git for source code control.  It also makes copying and editing the JSON much easier.
  3. You are familiar with creating ADF linked services aka data connections.

Summary: This blog post will give you examples of the following ADFv2 functionality
  1. Calling a SQL Server stored procedure via ADFv2's new Lookup capability, and retaining the output dataset in memory
  2. Passing dynamic date filters to the Salesforce SELECT statement.  (I use Salesforce because most people can figure out everything else, but SOQL is not very intuitive.)
  3. Creating date-specific destination folders in Azure Blob Storage (ABS).
  4. Concatenating string values
  5. Correctly referencing pipeline parameters and activities in ADFv2 dynamic content
This blog post is one of three in a series.  I will update this post with a link when they become available.
  • Azure Data Factory (ADF) v2 Parameter Passing: Table Names (2 of 3): I do not like hard-coding table names.  This blog post will parameterize the table name, remove all hard-coded column definitions, and use a SELECT *.
  • Azure Data Factory (ADF) v2 Parameter Passing: Putting it All Together (3 of 3): When you combine a Salesforce filter with a parameterized table name, the SELECT * no longer works.  This blog post will show you how to parameterize a list of columns and put together both date filtering and a fully parameterized pipeline.

Setup
To get "this extract start time" we obviously have to incorporate a metadata layer in our solution.  ADFv2 can now Lookup to Cosmos DB as well as several other data sources listed here, but I'm an old-fashioned SQL Server lady and I prefer to use an Azure SQL Database for things like this.  It doesn't hurt to keep tables and procs, like those included in the sample T-SQL file at the end of this post, in an Azure Data Warehouse, but honestly, replicated across 60 distributions?  If you can afford it, stand up a simple Azure SQL Database for metadata collection.

Step #1: Create your ADFv2 control objects in your metadata SQL database
  1. etl.ADFControl table
  2. etl.vwADFControl view
  3. etl.usp_ADF_Get_ADFControl_NextExtract stored procedure
  4. etl.usp_ADF_Set_ADFControl stored procedure
The CREATE T-SQL scripts for these objects can be downloaded below.
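If you just want the shape of the control layer before downloading the scripts, here is a minimal sketch.  The column and parameter names below are my assumptions for illustration only; the downloadable file contains the actual definitions, including the etl.vwADFControl view and the Set procedure.

```sql
-- Minimal sketch of the control table (assumed columns, for illustration only)
CREATE TABLE etl.ADFControl
(
    ADFControlKey        INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    TableName            NVARCHAR(128)     NOT NULL,
    ExtractStartDateTime DATETIME2(0)      NULL,  -- set when the pipeline starts
    ExtractEndDateTime   DATETIME2(0)      NULL   -- set when the copy succeeds
);
GO

-- Returns the next extract start date for one table; called by the Lookup activity
CREATE PROCEDURE etl.usp_ADF_Get_ADFControl_NextExtract
    @TableName NVARCHAR(128)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT TOP (1) ExtractEndDateTime AS NextExtractStartDateTime
    FROM   etl.ADFControl
    WHERE  TableName = @TableName
    ORDER BY ADFControlKey DESC;
END
GO
```

The companion etl.usp_ADF_Set_ADFControl procedure simply writes the start and end timestamps, and is called by the two stored procedure activities shown in Step #4.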

Step #2: Create your three ADFv2 linked services (data connections)
  1. SQL Server metadata source
  2. Salesforce, or some other data source
  3. Azure Blob Storage, or some other data destination
Tip: If you have never created a linked service before in ADF, use the wizard and do not try to copy someone's JSON.  A limited amount of data connection information is (understandably) stored in VSTS GIT, so you will need to manually update your data connections in each environment (DEV, STG, UAT, PRD etc.) anyway.

Step #3: Create your ADFv2 datasets
I am going to show you screen prints here, but the JSON for these datasets is provided at the end of this blog post.  This blog post uses a hard-coded table name.  In blog post 2 of 3, I'll show you how to use a single input and output dataset to extract all tables from a single source.
Metadata DB used for all objects
Your lookup and execute SQL tasks will use this same dataset.
Input dataset for your source
This is your source dataset.  You actually do not need to define every column, but we'll tackle that later.
Output dataset for your destination
This is your destination dataset definition.  You actually do not need to define every column here either, but more on that later.
File Path Part 1
@concat(pipeline().parameters.blobContainerName, '/', formatDateTime(pipeline().parameters.actualRunTime, 'yyyy'), '/', formatDateTime(pipeline().parameters.actualRunTime, 'MM'), '/')

File Path Part 2
@concat('AccountPartner-', formatDateTime(pipeline().parameters.actualRunTime, 'yyyy'), formatDateTime(pipeline().parameters.actualRunTime, 'MM'), formatDateTime(pipeline().parameters.actualRunTime, 'dd'), ' ', formatDateTime(pipeline().parameters.actualRunTime, 'hh'), formatDateTime(pipeline().parameters.actualRunTime, 'mm'), '.txt')
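To see what these two expressions resolve to, assume (hypothetically) that blobContainerName is 'salesforce' and actualRunTime is 2017-04-25T09:30:00Z.  The folder path and file name would come out as:

```
salesforce/2017/04/
AccountPartner-20170425 0930.txt
```

One tip: ADFv2's formatDateTime follows .NET custom format strings, so 'hh' is the 12-hour clock; use 'HH' if you want 24-hour file names.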
You may be thinking, "This is a lot of stuff just to import one table!" but stay tuned.   I promise you will have lower development time, cost of ownership, and downtime moving data with ADFv2 than you would with SSIS packages, especially if you have > 2 GB of data in a source object.  Using parameters is the key to success.

Step #4: Create your ADFv2 pipeline
Lookup Activity: GetLastExtractStart
Click <Preview data> to see the result set from the stored procedure.  This value can be referenced with the following syntax:
activity('GetLastExtractStart').output.firstRow.NextExtractStartDateTime
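When the Lookup runs with "First row only" checked, its output is shaped roughly like the JSON below.  The column name comes from the stored procedure; the timestamp value is hypothetical.  The .output.firstRow prefix in the reference syntax is simply walking this structure.

```json
{
    "firstRow": {
        "NextExtractStartDateTime": "2017-04-18T09:30:00Z"
    }
}
```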

Stored Procedure Activities: Set Start and End Dates
SetLastExtractStart activity
SetLastExtractEnd activity

Copy Activity: The actual copy of data from source to destination, filtered by YourSource.LastModifiedDate
Source Query dynamic syntax:
SELECT Id, SystemModstamp FROM AccountPartner WHERE SystemModstamp >= @{formatDateTime(activity('GetLastExtractStart').output.firstRow.NextExtractStartDateTime,'yyyy-MM-ddTHH:mm:ssZ')}
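At run time the @{...} expression is evaluated before the query is sent to Salesforce.  Assuming the Lookup returned 2017-04-18T09:30:00Z (a hypothetical value), the query that actually executes is plain SOQL:

```
SELECT Id, SystemModstamp FROM AccountPartner WHERE SystemModstamp >= 2017-04-18T09:30:00Z
```

Note that SOQL dateTime literals are not quoted, which is why no extra quotes are needed around the expression.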
If you are new to ADFv2, here are a couple of screen prints on how to validate, save, publish and run your pipeline on demand.
Conclusion of the Matter
This blog post is intended for developers just starting out in ADFv2, and especially for ADFv2 developers pulling Salesforce data.  I do not expect you to need all of the information provided, but I am hoping you'll find a tip or trick that will save you time.  Referencing activities and pipeline parameters is easy to do, for example, but when I first worked on this, I found very few examples.  Finding the right syntax for a Salesforce date filter took me about a week.

You may be frustrated transitioning from SSIS to ADFv2, but don't give up.  Remember that ADFv2 is not a data transformation tool -- it is a data copy tool.  SSIS <> ADFv2, and in a world where we are moving to MPP architectures and ELT via stored procedures, ADFv2 as an "orchestrator" is worth considering.


Supporting Downloadable Files
json_for_adfv2_parameter_passing_1.zip
File Size: 2 kb
File Type: zip
Download File

adfcontrol_sql_server_metadata_objects.zip
File Size: 1 kb
File Type: zip
Download File

10 Comments
Mike Kiser
5/28/2019 07:45:26 am

Awesome Awesome Article. It made me feel better when you said that it took you a week to get the syntax correct, because I am experiencing the same. Your article is the first that I have found that covers what I need to do. I have been working on a very similar project for 3 weeks total with very little to show. Your article explains everything VERY clearly and, like you say in the article, there is very little out there on this topic. Thanks for all 3 parts!!

Reply
Saiprasad
8/30/2019 05:07:35 am

I really loved your constructive details. I have one similar scenario where I have just started working on creating a pipeline, but got stuck in one scenario where I have to do stream catch-up whenever the stream run date is less than the current date, if not exit... If there is any such scenario, kindly help me in creating the pipeline. If you need any further details, kindly tell me

Reply
Delora Bradish link
11/11/2019 08:50:48 pm

Saiprasad, I am unable to provide custom solutions, but you can reach out to Pragmatic Works for a consulting contract. Kindly, Delora

Reply
Nandini
12/4/2019 10:01:40 pm

Very well explained!!! It helped me a lot to build the framework using ADFv2. Thanks a lot for all the 3 posts.

Reply
Delora Bradish link
1/8/2020 09:30:28 am

Thank you, Nandini.

Reply
Rizal Ang
1/8/2020 02:39:42 am

Excellent post Delora! I learned a ton from you, thank you so much.

I do have one question about this syntax:

SELECT ... WHERE SystemModstamp >= @{formatDateTime(...}

Do we not need to add two single quotes in between the @{formatDateTime(...}? To make the SQL datetime comparison work?

Many thanks!
RA

Reply
Delora link
1/8/2020 09:29:55 am

Rizal, no extra formatting is needed. Just copy and paste the JSON provided; however, I did find it much easier in the end to set up a SQL metadata table which passed a correctly formatted WHERE clause. It was also much easier to use a SQL view to apply logic to pass 'full load' or 'incremental load' parameter values. Anytime I find myself in a big brouhaha in ADF expressions, I move it out to a SQL table with a Lookup() activity to pull what I need into ADF. Kindly, Delora

Reply


