• Home
  • Modeling for BI
  • DBA
  • Azure
  • SSIS
  • SSAS
  • SSRS - Reporting
  • PBI - Analytics
  • Consulting
  • About
Microsoft Data & AI

All Things Azure

ADW PolyBase Error: Error converting RCFile Type string to Sql type NVARCHAR

8/10/2018

When testing Azure Data Lake (ADL) to Azure Data Warehouse (ADW) file ingestion, this error kept coming up on various external table SELECTs.  The confusing part was that the ADL contained only parquet files.  There was only one external file format defined, and that too was for parquet.  So where was an RCFile error coming from?  The bottom line, in this particular engagement, was that this error is actually a truncation error: a string value in the file was longer than the NVARCHAR() length declared for that column in the external table.

Things to Verify:
  1. The external table column list is defined in the right order.   If all your parquet file columns are defined as string, a misordered column list will not be evident until you look at the values returned by your SELECT query.  However, if you have defined actual data types in the file store, those data types must align with the SQL data types in your external table definition.
  2. All NVARCHAR() columns in the external table definition are long enough.
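
One way to review both points is to list the NVARCHAR columns declared on your external tables, along with their position and length.  This is only a minimal sketch against the catalog views; it assumes sys.external_tables is available in your environment, and it reports declared lengths only, not what is actually in the files.

```sql
-- List NVARCHAR columns on all external tables with their declared lengths.
-- sys.columns.max_length is in bytes; NVARCHAR uses 2 bytes per character.
SELECT  s.name            AS SchemaName
      , t.name            AS ExternalTableName
      , c.name            AS ColumnName
      , c.column_id       AS ColumnOrder
      , c.max_length / 2  AS DeclaredLength
FROM    sys.external_tables AS t
JOIN    sys.schemas         AS s  ON s.schema_id = t.schema_id
JOIN    sys.columns         AS c  ON c.object_id = t.object_id
JOIN    sys.types           AS ty ON ty.user_type_id = c.user_type_id
WHERE   ty.name = N'nvarchar'
ORDER BY s.name, t.name, c.column_id;
```

Compare ColumnOrder against the column order in the parquet files, and DeclaredLength against the longest value you expect in each column.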

Solution:
  1. Align your external table columns properly with the file store
  2. Enlarge the NVARCHAR() column length in your external table definition.  External tables do not store data; they just point to an external data source.  Therefore, consider going with NVARCHAR(4000) to avoid this error in the future.  The ELT data transforms are where you should CAST() all data to the correct formats and lengths anyway.
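
As a rough illustration of that last point, the sketch below first measures the longest value that actually arrives in a widened column, then loads a staging table with CREATE TABLE AS SELECT and casts the column down.  The [stg] schema, the staging table name, and NVARCHAR(200) are hypothetical placeholders, not values from this engagement.

```sql
-- Step 1: with the external column widened to NVARCHAR(4000),
-- find the longest value currently in the files.
SELECT MAX(LEN([ColumnName2])) AS MaxLenColumnName2
FROM   [ext].[MyExternalTableName];

-- Step 2: load with CTAS, casting to the length the data actually needs.
-- [stg].[MyStagingTableName] and NVARCHAR(200) are hypothetical.
CREATE TABLE [stg].[MyStagingTableName]
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT  [ColumnName1]
      , CAST([ColumnName2] AS NVARCHAR(200)) AS [ColumnName2]
      , [ColumnName3]
      , [ColumnName4]
      , [ADLcheckSum]
      , [ADFIngestionId]
FROM    [ext].[MyExternalTableName];
```

Keep in mind that CAST() to a shorter NVARCHAR truncates silently, so choose the target length from the measured maximum rather than by guessing.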

Supporting t-sql Scripts
If you are new to PolyBase and external tables in SQL Server environments, here are three t-sql scripts that serve as supporting references for the error resolution given above.

Example CREATE EXTERNAL DATA SOURCE t-SQL.  Click here for more information.

CREATE EXTERNAL DATA SOURCE [MyDataSourceName]
WITH (TYPE = HADOOP,
LOCATION = N'adl://MyDataLakeName.azuredatalakestore.net',
CREDENTIAL = [MyCredential])
GO
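
The data source above references [MyCredential], which has to exist first.  The following is a hedged sketch of how such a credential is typically created for an Azure Data Lake Store using a service principal; every value in angle brackets is a placeholder you must supply, and none of them come from the original scripts.

```sql
-- A master key must exist before a database scoped credential can be created.
CREATE MASTER KEY;
GO

-- Placeholders: <client_id> is the application (service principal) id,
-- <OAuth_2.0_Token_EndPoint> is the tenant's OAuth 2.0 token endpoint URL,
-- and <key> is the application's secret.
CREATE DATABASE SCOPED CREDENTIAL [MyCredential]
WITH IDENTITY = N'<client_id>@<OAuth_2.0_Token_EndPoint>',
     SECRET   = N'<key>';
GO
```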


Example CREATE EXTERNAL FILE FORMAT t-SQL.  Click here for more information.
CREATE EXTERNAL FILE FORMAT [MyFileFormatName]
WITH (FORMAT_TYPE = PARQUET
, DATA_COMPRESSION = N'org.apache.hadoop.io.compress.SnappyCodec')
GO


Example CREATE EXTERNAL TABLE t-sql script.  Click here for more information.
BEGIN TRY DROP EXTERNAL TABLE [ext].[MyExternalTableName] END TRY BEGIN CATCH END CATCH
 
CREATE EXTERNAL TABLE [ext].[MyExternalTableName]
(
 [ColumnName1] bigint NULL  
,[ColumnName2] nvarchar(4000) NULL   -- if this value is too small, you will get the conversion error
,[ColumnName3] bit NULL  
,[ColumnName4] datetime NULL  
,[ADLcheckSum] nvarchar(64) NULL   -- if this value is too small, you will get the conversion error
,[ADFIngestionId] nvarchar(64) NULL   -- if this value is too small, you will get the conversion error
)
WITH (DATA_SOURCE = [MyDataSourceName]
, LOCATION = N'/Folder1/Folder2/'
, FILE_FORMAT = [MyFileFormatName]
, REJECT_TYPE = VALUE
,REJECT_VALUE = 0)
GO 


Azure SQL Database Elastic Query

7/20/2018

Querying across cloud databases is supported in Azure through elastic queries (in preview).  You can read more about that here, but I thought a good talking point would be to briefly compare elastic query to PolyBase.  You can read about PolyBase here.
Picture
Note: At the writing of this blog post, an Azure Data Warehouse could not serve as a "principal" in an elastic query, but it could be the "secondary".

These two Azure features have a similar setup.  They both require ...
  1. A master key
  2. A database scoped credential
  3. External table definitions
Elastic queries, however, allow you not only to SELECT from an external data source but also to execute stored procedures.  In my thinking, the elastic query is as close as we are going to come to linked servers in Azure.  The definite downside is having to define external tables in the principal.  These definitions must match the secondary's schema name and table or view name, unless you map them explicitly with the SCHEMA_NAME and OBJECT_NAME options.  The external table can omit columns, but it cannot rename or add columns.   This poses a bit of a deployment problem for the secondary when table definitions change: the DDL changes must then be propagated to the external table definition in the principal.  The image above depicts vertical partitioning, but horizontal partitioning is worth a read.
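
To make the setup concrete, here is a minimal sketch of the three pieces for a vertically partitioned elastic query, run in the principal database, followed by a remote stored procedure call.  All object names, column definitions, and bracketed placeholders are hypothetical; adjust them to match your secondary.

```sql
-- 1. Master key (protects the credential secret).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
GO

-- 2. Database scoped credential: a SQL login that exists on the secondary.
CREATE DATABASE SCOPED CREDENTIAL [MyElasticCredential]
WITH IDENTITY = '<sql login on the secondary>',
     SECRET   = '<password>';
GO

-- Elastic query data sources use TYPE = RDBMS (PolyBase uses TYPE = HADOOP).
CREATE EXTERNAL DATA SOURCE [MySecondaryDataSource]
WITH (TYPE = RDBMS,
      LOCATION = '<myserver>.database.windows.net',
      DATABASE_NAME = '<MySecondaryDatabase>',
      CREDENTIAL = [MyElasticCredential]);
GO

-- 3. External table. Here the schema and table name match the secondary.
-- Columns may be omitted, but not renamed or added.
CREATE EXTERNAL TABLE [dbo].[MyReferenceTable]
(
 [Id] int NOT NULL
,[Name] nvarchar(100) NULL
)
WITH (DATA_SOURCE = [MySecondaryDataSource]);
GO

-- Execute a (hypothetical) stored procedure on the secondary.
EXEC sp_execute_remote
     @data_source_name = N'MySecondaryDataSource',
     @stmt = N'EXEC [dbo].[MyRemoteProcedure]';
GO
```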

PolyBase is about linking to unstructured data, not another database.  That is truly the short version of the matter.  On both principal servers shown above, the t-sql syntax is the same: SELECT ColumnName FROM externalSchemaName.TableName.   From the query alone, it is not evident which feature you are using: Elastic Query or PolyBase.  Although you can JOIN an internal and an external table together, this might fall under the heading "I can, but I won't".  It really depends on the size of your tables.  I personally do not feel that UNION ALL poses the same performance risk.
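
For reference, the UNION ALL pattern mentioned above might look like the sketch below.  [dbo].[MyLocalTable] is a hypothetical internal table with a compatible column; the external table name is the one used in the example syntax above.

```sql
-- Append rows from an external table to rows from an internal table.
-- No join predicate has to be evaluated across the two sources.
SELECT [ColumnName] FROM [dbo].[MyLocalTable]
UNION ALL
SELECT [ColumnName] FROM [externalSchemaName].[TableName];
```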

Conclusion: All said, elastic query is a really nice Azure feature that can solve data migration problems and make it easy to share reference data.  It surely is not a replacement for ETL; all things in moderation, my friend!  There remains a solid need for SSIS or ADFv2.   For every Azure offering there is an appropriate place to implement it.