SSAS Impact of Setting ErrorConfiguration - KeyDuplicate property to IgnoreError

11/1/2015

Having come from an OLTP background, I used to think that ignoring duplicate keys in a dimension resulted in duplicated fact records (measures) in a multidimensional cube. This, however, is not the case. Even if the primary key of the dimension is duplicated, SSAS is smart enough to not duplicate related fact rows. The POC explained in this blog post was originally created to prove the very opposite, but now I have learned a different version of the truth, and I thought this little POC might interest someone else. So, let’s get started:

First, just because we can get away with something, is not a good reason to do it. In other words, even though SSAS lets us have duplicate primary dimension keys, and what SSAS considers duplicate values in columns, is not an excuse to allow them. Data integrity should be considered non-negotiable.

Second, the dimension property I am talking about can be found in any dimension under [Dimension Name] --> Properties --> ErrorConfiguration --> KeyDuplicate.

99.9% of the time, the correct selection for the KeyDuplicate property is ReportAndStop. I strongly suggest that each of us just decide to set it now and work through the data cleansing issues that this error handling will almost assuredly expose. Specifically, the error that SSAS generates is as follows:

Dimension Processing Error: Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: 'Dimension Name', Column: 'Column Name', Value: 'abc123’. The attribute is 'Attribute Name'.

Third, let us be clear, “duplicate key” does not just mean “duplicate primary key”. What Analysis Services is talking about is the KeyColumns property associated with each attribute within a dimension. In the screen print below, the key of the Product attribute is the ProductKey column from the Product table. This happens to also be the primary key of both the table and the dimension.

However, every attribute in a dimension has a KeyColumns property and for most attributes, the KeyColumns value is the column itself. If the attribute is used in a hierarchy, you will often find that the KeyColumns property is a composite key containing greater than one column from the source table. In the screen print below, the key of the Calendar Quarter attribute is the composite column values of CalendarYear and CalendarQuarter columns from the Date table.

Now, let’s mock up a little data and see what really happens if we set the ErrorConfiguration:KeyDuplicate property to IgnoreError instead of ReportAndStop. The DSV tables I created are pictured here:

For the Fact table….
SELECT        1 AS PK, 100 AS NetSales, 1 AS DimFK
UNION
SELECT        2 AS PK, 999 AS NetSales, 2 AS DimFK

For the DuplicateStringsDIM …
SELECT        1 AS PK, 'CODE 1' AS Code, 'Descr 1' AS Description, LEN('Descr 1') AS DescrLen
UNION
SELECT        2 AS PK, 'Code 1' AS Expr1, 'Descr 1' + CHAR(9) AS Expr2, LEN('Descr 1' + CHAR(9))
UNION
SELECT        3 AS PK, 'Code 3' AS Expr1, 'Descr 3' AS Expr2, LEN('Descr 3')

For the DuplicatePKsDIM….
SELECT        1 AS PK, 'Code 1' AS Code, 'Descr 1' AS Description
UNION
SELECT        1 AS PK, 'Code 2' AS Expr1, 'Descr 2' AS Expr2
UNION
SELECT        2 AS PK, 'Code 3' AS Expr1, 'Descr 3' AS Expr2

I then created two dimensions – nothing fancy, just regular old dimensions with no hierarchies.

Each dimension has ErrorConfiguration:KeyDuplicate set to IgnoreError.

Next, I built a simple cube, and processed the cube.

While the cube was taking 5 seconds to process full, I jumped over to SSMS (SQL Server Management Studio) and wrote a little t-SQL.

--Create the table to hold duplicate code values
insert @DuplicateStringsDIM
SELECT *
FROM (
SELECT        1 AS PK, 'Code 1' AS Code, 'Descr 1' AS Description
UNION
SELECT        2 AS PK, 'CODE 1' AS Expr1, 'Descr 1' + CHAR(9) AS Expr2
UNION
SELECT        3 AS PK, 'Code 3' AS Expr1, 'Descr 3' AS Expr2
) x

--Create the table to hold duplicate primary key values
DECLARE @DuplicatePKsDIM TABLE (dPK int, Code varchar(30), Descr varchar(60))
insert @DuplicatePKsDIM
SELECT *
FROM (
SELECT        1 AS PK, 'Code 1' AS Code, 'Descr 1' AS Description
UNION
SELECT        1 AS PK, 'Code 2' AS Expr1, 'Descr 2' AS Expr2
UNION
SELECT        2 AS PK, 'Code 3' AS Expr1, 'Descr 3' AS Expr2
) x

--create the table to hold the measures
DECLARE @FactTbl TABLE (fPK int, NetSales int, DimFK int)
insert @FactTbl
SELECT *
FROM (
SELECT        1 AS PK, 100 AS NetSales, 1 AS DimFK
UNION
SELECT        2 AS PK, 999 AS NetSales, 2 AS DimFK
) x

--the results are as expected
--the fact record is duplicated for each duplicate DIM PK
SELECT *
from @FactTbl f
LEFT JOIN @DuplicatePKsDIM d1 on f.DimFK = d1.dPK

--the results are as expected:
--the 2nd 'Descr 1' value has a special character, and therefore groups into two separate rows
SELECT d2.Descr, SUM(NetSales) as SumOfNetSales
from @FactTbl f
LEFT JOIN @DuplicateStringsDIM d2 on f.DimFK = d2.dPK
GROUP BY d2.Descr

The above results in t-SQL makes sense

We had a duplicate primary key in the @DuplicatePKsDIM, so we doubled our NetSales amount.
We had a hidden special character lurking in a string column in the @DuplicateStringDIM, so we ended up with two code values, Code 1, that look duplicated.

Now let’s browse the cube whose data source view contained the exact same rows. First, test the measure all by itself. All is well.

Now look what happens when we slice by PK.

The reality is that there are two PKs with a value of #1 in the DSV, but SSAS has not duplicated the measure amount. We still have just 100 net sales for dimension PK #1. Furthermore, we have lost ‘Code 2’ and ‘Descr 2’, why? Analysis services has taken the row values of the first instance of the PK, and has ignored the second.

This phenomena has nothing to do with the measure group. The following MDX query shows what is actually in the [Duplicate PKs DIM] -- two rows and not three.

Don’t forget what was really in our data source table, @DuplicatePKsDim:

Now let’s look at what SSAS considers a “duplicate” in the [DuplicateStringsDIM] dimension.

If ErrorConfiguration:KeyDuplicate is set to ReportAndStop...

Primary Key #1 will result in a duplicate key error because SSAS knows this is the PK and there are two identical values.
[Code 1] will result in a duplicate key error because of the case difference.
[Descr 1] will result in a duplicate key error because of the hidden special character.

However, in our POC, we have IgnoreError selected, so again, SSAS has taken the first instance of the KeyColumn
attribute.

The [Duplicate PKs DIM] only has one instance of primary key values #1
The [Duplicate Strings DIM] has the first instance of [Code 1] and the first instance of [Descr 1]

When we slice by the primary key, our Net Sales is not duplicated.

In summary,

Duplicate primary keys, collation differences or special characters will not duplicate related measures.
This is not a good reason to allow unclean data into the data sources of the data source view tables or named queries.
However, if you cannot control the ETL data cleansing process, but must control daily cube processing success, ignoring duplicate key errors might be something you consider as a last resort.
Understand that Analysis Services will select the “first instance” of a duplicate key, so you might not get the row / attribute values of preference.

0 Comments

SQL Server Analysis Services

SSAS Impact of Setting ErrorConfiguration - KeyDuplicate property to IgnoreError

Leave a Reply.

Categories