When testing Azure Data Lake (ADL) to Azure Data Warehouse (ADW) file ingestion, this error continued to come up on various external table SELECTs. The confusion was that the ADL only contained parquet files. There was only one external file format defined, and that too was obviously for parquet. From where was an RCFile error originating? The bottom line, in this particular engagement scenario, was that this error is actually a truncation error.
Things to Verify:
Supporting t-sql Scripts
If you are new to PolyBase and external tables in SQL Server environments, here are a three t-sql scripts that are supporting references to the error resolution given above.
Example CREATE EXTERNAL DATA SOURCE t-SQL. Click here for more information.
CREATE EXTERNAL DATA SOURCE [MyDataSourceName]
WITH (TYPE = HADOOP,
LOCATION = N'adl://MyDataLakeName.azuredatalakestore.net',
CREDENTIAL = [MyCredential])
Example CREATE EXTERNAL FILE FORMAT t-SQL. Click here for more information.
CREATE EXTERNAL FILE FORMAT [MyFileFormatName]
WITH (FORMAT_TYPE = PARQUET
, DATA_COMPRESSION = N'org.apache.hadoop.io.compress.SnappyCodec')
Example CREATE EXTERNAL TABLE t-sql script. Click here for more information.
BEGIN TRY DROP EXTERNAL TABLE [ext].[MyExternalTableName] END TRY BEGIN CATCH END CATCH
CREATE EXTERNAL TABLE [ext].[MyExternalTableName]
[ColumnName1] bigint NULL
,[ColumnName2] nvarchar(4000) NULL -- if this value is too small, you will get the conversion error
,[ColumnName3] bit NULL
,[ColumnName4] datetime NULL
,[ADLcheckSum] nvarchar(64) NULL -- if this value is too small, you will get the conversion error
,[ADFIngestionId] nvarchar(64) NULL -- if this value is too small, you will get the conversion error
WITH (DATA_SOURCE = [MyDataSourceName]
, LOCATION = N'/Folder1/Folder2/'
, FILE_FORMAT = [MyFileFormatName]
, REJECT_TYPE = VALUE
,REJECT_VALUE = 0)
|Microsoft Business Intelligence||
All Things Azure