Microsoft Data & AI

All Things Azure

Azure Data Factory (ADFv1) JSON Example for Salesforce Incremental Load of RelationalSource Pipeline

4/25/2017

What I am sharing in this blog post isn't rocket science, but when I first began working with ADF, I would have been genuinely grateful to find a JSON script example for incremental loads.  Consequently, this blog post is here in hopes that it finds someone who just needs a quick pick-me-up with ADF.

You will want to first reference my blog post on initial data loads with ADF, because I build here on the same resources and linked services.  What is different is the pipeline JSON script; follow the initial data load instructions for everything else.

Using the Account table from Salesforce as an example, here is what I call an incremental load pipeline:

{
  "name": "InboundPipeline-SalesforceIncrementalLoad",
  "properties": {
    "activities": [
      {
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "RelationalSource",
            "query": "$$Text.Format('select * from Account where LastModifiedDate >= {{ts\\'{0:yyyy-MM-dd HH:mm:ss}\\'}} AND LastModifiedDate < {{ts\\'{1:yyyy-MM-dd HH:mm:ss}\\'}}', WindowStart, WindowEnd)"
          },
          "sink": {
            "type": "AzureDataLakeStoreSink",
            "writeBatchSize": 0,
            "writeBatchTimeout": "00:00:00"
          }
        },
        "inputs": [
          {
            "name": "InputDataset-Account"
          }
        ],
        "outputs": [
          {
            "name": "OutputDataset-Account"
          }
        ],
        "policy": {
          "timeout": "1.00:00:00",
          "concurrency": 1,
          "executionPriorityOrder": "NewestFirst",
          "style": "StartOfInterval",
          "retry": 3,
          "longRetry": 0,
          "longRetryInterval": "00:00:00"
        },
        "scheduler": {
          "frequency": "Day",
          "interval": 1
        },
        "name": "Activity-Account"
      }
    ],
    "start": "2017-04-17T19:48:55.667Z",
    "end": "2099-12-31T05:00:00Z",
    "isPaused": false,
    "pipelineMode": "Scheduled"
  }
}
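To see what ADF actually sends to Salesforce at run time, here is a small Python sketch (not part of ADF itself) that mimics the $$Text.Format expansion of the "query" property for one daily slice.  The slice boundaries are made up for illustration; note that .NET's yyyy-MM-dd HH:mm:ss pattern corresponds to Python's %Y-%m-%d %H:%M:%S:

```python
from datetime import datetime, timedelta

# Hypothetical slice boundaries that ADF would supply via the
# WindowStart and WindowEnd system variables for a daily schedule.
window_start = datetime(2017, 4, 18)
window_end = window_start + timedelta(days=1)

# Mimic the $$Text.Format substitution of {0} and {1} in the "query"
# property.  Doubled braces produce literal { } in the output, so the
# ODBC timestamp escape {ts'...'} survives the formatting.
query = (
    "select * from Account "
    "where LastModifiedDate >= {{ts'{0:%Y-%m-%d %H:%M:%S}'}} "
    "AND LastModifiedDate < {{ts'{1:%Y-%m-%d %H:%M:%S}'}}"
).format(window_start, window_end)

print(query)
# select * from Account
#   where LastModifiedDate >= {ts'2017-04-18 00:00:00'}
#   AND LastModifiedDate < {ts'2017-04-19 00:00:00'}
```

Each slice therefore pulls only the rows modified within its own window, which is what makes the load incremental.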


Unfortunately, at this time the "query" property isn't very flexible.  Microsoft has told me that the next version of ADF, scheduled for release in fall 2017, will include input parameters.  I have tested and was able to use a REPLACE() function in the SELECT above, but only because, under the covers, Microsoft uses the Simba ODBC driver for its Salesforce connection.  Each data source will have different capabilities within its SELECT statement.  Having identified several challenges with the current version of ADF, I was just happy to be able to use a WHERE clause.
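For instance, a source definition that strips dashes out of a phone number with REPLACE() might look like the following.  This is only a sketch assuming the Simba driver's REPLACE() syntax, and the Phone column is chosen purely for illustration:

  "source": {
    "type": "RelationalSource",
    "query": "$$Text.Format('select Id, REPLACE(Phone, \\'-\\', \\'\\') as Phone from Account where LastModifiedDate >= {{ts\\'{0:yyyy-MM-dd HH:mm:ss}\\'}} AND LastModifiedDate < {{ts\\'{1:yyyy-MM-dd HH:mm:ss}\\'}}', WindowStart, WindowEnd)"
  }

The same escaping rules apply as in the main pipeline: single quotes inside the $$Text.Format string are escaped with \\', and literal braces for the ODBC timestamp syntax are doubled.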



