SQL Server Profiler vs Extended Events for Analysis Services (SSAS) 2016

3/6/2017

I recently taught a live webinar for Pragmatic Works on SSAS Monitoring and Performance Tuning with Extended Events. It is worth noting that Profiler is still available to us in SQL Server 2016 and continues to provide benefits that Extended Events do not. This is how I see it.

SQL Server Profiler 2016

1. This very fine SQL Server tool is being deprecated by Microsoft in a later version, as stated on MSDN here. The good news is that Profiler is not deprecated quite yet. Hooray!

2. This is still the fastest way to log SSAS query activity for immediate analysis. You do not need to write out to a file, view said file, and import it; Profiler provides a very handy real-time viewer. When you need to do a one-off monitoring run on a problematic query, SSIS package, or SSRS report, this is the low-hanging fruit.

3. Granted, Profiler has a heavier "footprint" than Extended Events (i.e., it takes more system resources). Coupled with the fact that it is going away ... eventually ... it should not be considered the long-term monitoring tool of choice. When I say "long-term monitoring," I am referring to monitoring scheduled through an SSIS package.

4. When working with Profiler and SSAS, it is important to choose Show All Properties on the Events Selection tab of the Trace Properties window and mark the relevant SSAS events. Not all SSAS events are marked by default when you connect to an SSAS instance and take the default SSAS trace. The list of tabular and multidimensional events I use most often is given below.

Extended Events (EE) in SQL Server 2016

1. It is sometimes misunderstood, but EE is not a new SQL 2016 SSAS feature; you could create EE sessions with XMLA in earlier SQL Server versions. Chris Webb has a very good BI Blog post here that talks about doing this very thing in SQL 2012.

2. What is new in SQL 2016 is the user interface for creating SSAS EE sessions. Unfortunately, I feel that Microsoft made a half-hearted effort here, and that is the politically correct way of saying it. In an SSAS instance, you can create EE sessions, but you cannot edit them. For real? The workaround is to use the GUI to set up the session, then be sure to script it out before you save it.

3. When working with EE for SSAS, I recommend you use an XMLA query in SQL Server Management Studio (SSMS). Attached to this post is the XMLA for creating, stopping, deleting, and querying current EE sessions for both tabular models and multidimensional cubes.

4. If you are using EE to warm your SSAS multidimensional cube cache, or to monitor user query durations and execution counts, EE plus SSIS is definitely the best solution. Use an SSIS package on a scheduled job to automatically stop, import, and restart EE sessions, then use T-SQL against the resulting SQL Server database table, and SSRS, to automatically report long-running queries (a sketch of such a query follows the event list below).

Profiler Trace / Extended Events SSAS Events

This is not a complete listing, but these are the events that I find most helpful when monitoring SSAS.

Multidimensional
- ExecuteMDXScriptEnd
- GetDataFromAggregations
- GetDataFromCache
- ProgressReportEnd
- QueryCubeEnd
- QueryEnd
- QuerySubcube
- QuerySubcubeVerbose

Tabular
- DAXQueryPlan
- QueryCubeEnd
- QueryEnd
- QueryExceedsMemoryLimit
- VertiPaqSEQueryEnd (using the VertiPaq Scan subclass event)
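As promised in EE point 4, here is a minimal sketch of a long-running-query report. The table name dbo.SSASQueryLog and its columns are hypothetical stand-ins for whatever your SSIS import actually produces, and Duration is assumed to be in milliseconds.

-- A sketch only: dbo.SSASQueryLog is a hypothetical table loaded by the SSIS
-- package from an imported EE session; adjust names to your own schema.
SELECT TOP (25)
     NTUserName                                       -- who ran the query
    ,CONVERT(nvarchar(4000), TextData) AS QueryText   -- statement captured at QueryEnd
    ,COUNT(*)      AS ExecutionCount
    ,AVG(Duration) AS AvgDurationMs
    ,MAX(Duration) AS MaxDurationMs
FROM dbo.SSASQueryLog
WHERE EventClass = 'QueryEnd'                         -- assumes event names imported as text
GROUP BY NTUserName, CONVERT(nvarchar(4000), TextData)
ORDER BY AvgDurationMs DESC;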
XMLA File Downloads
To prove out SQL Server's columnstore capabilities, I wanted to set up my own POC (proof of concept). I found, however, that to do this, I needed a more fundamental understanding of how and when SQL Server actually takes advantage of columnstore. So, having worked through this for myself, I thought I'd document my columnstore POC for you. The purpose of this article is to help you set up your own POC so you can test this out for yourself with your own data; the intent is not to explain columnstore in general.

First, to set up your POC, make sure your data store and T-SQL meet these qualifications:

1. Have at least one fact table with at least 100 million rows. I set up the following (the index DDL is sketched just after this list):

a. Four fact tables with identical values and 100.2 million rows each:
- A heap table with no indexes or keys, created for the sake of example
- An indexed table with a clustered PK but no columnstore, a common SQL 2008 R2 data warehouse scenario
- A nonclustered columnstore indexed table which included all columns, a SQL 2012 scenario
- A clustered columnstore indexed table which included all columns, a SQL 2014 scenario

b. One degenerate dimension table containing a clustered PK. This table had a one-to-one relationship with my fact table.

c. Two dimension tables containing a clustered PK, one being a date or calendar table.

In the end, my table names were suffixed as follows:
- NoCSI = no columnstore index (clustered PK)
- WithCCSI = with clustered columnstore index (no PK)
- WithNCCSI = with nonclustered columnstore index (no PK)

2. Understand that a clustered columnstore index is the primary storage for the entire table. A nonclustered columnstore (NCCS) index is a secondary index: a table with an NCCS is actually a rowstore table, and the index is a copy of the rows and columns in the table.

3. Model your POC in a proper star schema, using integer data types for PKs, SKs, etc.

4. To test columnstore, your T-SQL must use an aggregate such as SUM() or COUNT(), a GROUP BY, and preferably an ORDER BY.

5. To test segment elimination, use a WHERE clause that will pull back only a subset of the data.

6. To take advantage of batch mode query processing, your T-SQL should use INNER JOINs, not LEFT OUTER or FULL OUTER JOINs. This was true in SQL Server 2012; in SQL Server 2014, I could not replicate the 'Row' Actual Execution Mode for outer joins.

7. This might seem obvious, but make sure you are working in the SQL Server 2014 version of SSMS (SQL Server Management Studio). You can actually create a clustered columnstore index from SQL 2012 SSMS on a table stored on a SQL 2014 instance, but it just gets confusing because the SQL 2012 GUI won't visually indicate it.
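Here is the index DDL sketch promised in item 1. The table and index names are mine from this POC, and the column list on the nonclustered columnstore is abbreviated to the columns used in the queries below; in practice I included all columns.

-- FactTransactionsHeap: no indexes or keys at all.

-- FactTransactionsNoCSI: clustered PK only (the SQL 2008 R2 scenario).
ALTER TABLE dbo.FactTransactionsNoCSI
    ADD CONSTRAINT PK_FactTransactionsNoCSI PRIMARY KEY CLUSTERED (trn_PK);

-- FactTransactionsWithNCCSI: nonclustered columnstore (the SQL 2012 scenario).
-- Column list abbreviated here; include all columns in practice.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncsi_FactTransactions
    ON dbo.FactTransactionsWithNCCSI (trn_PK, trn_TransDateSK, trn_ItemSK, trn_NetAmt);

-- FactTransactionsWithCCSI: clustered columnstore (the SQL 2014 scenario).
-- No column list: a clustered columnstore index IS the table's storage.
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_FactTransactions
    ON dbo.FactTransactionsWithCCSI;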
When you think you are ready to go, double-check your table design and data by running some system views.

1. Verify that the columnstore indexes really exist:

SELECT DISTINCT t.name
FROM sys.column_store_row_groups csrg
LEFT JOIN sys.tables t ON csrg.object_id = t.object_id

2. Verify that your fact tables contain the same number of rows. You can do this a couple of ways.

a. Script to select table metadata:

SELECT   tbl_ObjectID = t.object_id
        ,tbl_SchemaName = s.name
        ,tbl_TableName = t.name
        ,tbl_TableType = t.type
        ,tbl_TableTypeDescr = t.type_desc
        ,tbl_DateCreated = t.create_date
        ,tbl_RowCount = si.rowcnt
        ,tbl_ColumnCount = c.ColumnCount
FROM sys.tables t WITH (NOLOCK)
LEFT JOIN sys.schemas s WITH (NOLOCK) ON t.schema_id = s.schema_id
LEFT JOIN sys.sysindexes si WITH (NOLOCK) ON t.object_id = si.id AND si.indid < 2
LEFT JOIN (SELECT object_id, COUNT(*) AS ColumnCount FROM sys.columns c GROUP BY c.object_id) c
    ON t.object_id = c.object_id

b. Script to get space used:

EXEC sp_spaceused '[dbo].[FactTransactionsHeap]', true;
EXEC sp_spaceused '[dbo].[FactTransactionsNoCSI]', true;
EXEC sp_spaceused '[dbo].[FactTransactionsWithCCSI]', true;
EXEC sp_spaceused '[dbo].[FactTransactionsWithNCCSI]', true;

Before we start testing for query performance, also take a look at the space used. It should come as no surprise that the clustered columnstore table uses 32% less space than the other tables. You can read about sp_spaceused on MSDN here (https://msdn.microsoft.com/en-us/library/ms188776.aspx), but a brief explanation is as follows:

- Reserved is the total amount of space allocated by objects in the database.
- Data is the total amount of space used by data.
- Index_size is the total amount of space used by indexes.
- Unused is the total amount of space reserved for objects in the database but not yet used.

Now for the good stuff! Let's write a typical OLAP query and run it against each fact table, recording execution times.

SET STATISTICS TIME ON
GO

SELECT   c.cal_Year
        ,i.itm_Code
        ,SUM(trn_NetAmt) AS SumOfNetAmt
FROM dbo.FactTransactionsHeap t
--dbo.FactTransactionsNoCSI t
--dbo.FactTransactionsWithCCSI t
--dbo.FactTransactionsWithNCCSI t
INNER JOIN dbo.DimCalendar c ON t.trn_TransDateSK = c.cal_DateIntPK
INNER JOIN dbo.DimITem i ON t.trn_ItemSK = i.itm_PK
GROUP BY c.cal_Year, i.itm_Code
ORDER BY i.itm_Code, c.cal_Year

Returning 177,099 aggregated rows, these were my execution times.

Before moving on to add a WHERE clause, let's take a look at our query execution plans. As expected, our heap table used a table scan. Our table with only a clustered PK (the SQL 2008 R2 scenario) used a clustered index scan. Our table with a nonclustered columnstore index (the SQL 2012 scenario) used a columnstore index scan, and we got almost identical results with our clustered columnstore indexed table (the SQL 2014 scenario).

Let's take this one step further. Hover your mouse over the Columnstore Index Scan operator. If your query is optimized, the Actual Execution Mode should say 'Batch'. This is a fundamental principle of columnstore query performance: you want SQL Server to use 'Batch' mode rather than 'Row' mode as much as possible. If you look under the Heap or NoCSI scan operators, you will see that they use a 'Row' Actual Execution Mode, and therefore they take longer to execute.

Okay, now let's add a WHERE clause, as 99% of the OLAP queries I have seen want to filter or slice on some sort of date value. We should also talk about segment elimination, which you can read up on at TechNet here --> (http://social.technet.microsoft.com/wiki/contents/articles/5651.understanding-segment-elimination.aspx). The short version is that SQL Server stores columnstore data in segments of roughly 1 million rows each, and the metadata associated with each segment exposes the MIN and MAX values of that segment.
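You can peek at that segment metadata yourself. Here is a minimal sketch against sys.column_store_segments, assuming SQL Server 2014 behavior, where the view joins to sys.partitions on hobt_id; note that for most data types, min_data_id and max_data_id hold encoded value IDs rather than raw column values.

-- Per-segment metadata for one fact table; expect roughly one segment
-- per million rows, per column.
SELECT   s.column_id
        ,s.segment_id
        ,s.row_count
        ,s.min_data_id   -- encoded MIN for the segment
        ,s.max_data_id   -- encoded MAX for the segment
FROM sys.column_store_segments s
INNER JOIN sys.partitions p ON s.hobt_id = p.hobt_id
WHERE p.object_id = OBJECT_ID('dbo.FactTransactionsWithCCSI')
ORDER BY s.column_id, s.segment_id;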
So when you filter on DimDate.DateKey between [x] and [y], SQL Server will look at the segment metadata and automatically skip past those segments that do not contain qualifying column values. An example of how to verify columnstore segment elimination can be found here --> (http://social.technet.microsoft.com/wiki/contents/articles/5611.verifying-Columnstore-segment-elimination.aspx), but for our purposes, let's have an element of trust and look at the execution results from a filtered query:

SELECT   c.cal_Year
        ,i.itm_Code
        ,SUM(trn_NetAmt) AS SumOfNetAmt
FROM
--dbo.FactTransactionsHeap t
--dbo.FactTransactionsNoCSI t
dbo.FactTransactionsWithCCSI t
--dbo.FactTransactionsWithNCCSI t
INNER JOIN dbo.DimCalendar c ON t.trn_TransDateSK = c.cal_DateIntPK
INNER JOIN dbo.DimITem i ON t.trn_ItemSK = i.itm_PK
WHERE t.trn_TransDateSK BETWEEN 20060101 AND 20121231
GROUP BY c.cal_Year, i.itm_Code
ORDER BY i.itm_Code, c.cal_Year

Returning 93,469 aggregated rows, these were my execution times.

A quick note about our Heap table and the NoCSI table, which contains a clustered primary key: how many times have you heard the phrase "index lite"? We have a clustered PK on FactTransactionsNoCSI and on our one-to-one degenerate dimension DimTransactions to support that relationship, yet notice how the Heap table actually had better query performance.

It is only fair that we take a minute to demo a couple of the anti-performance practices listed in TechNet's SQL Server Columnstore Performance Tuning article, found here --> (http://social.technet.microsoft.com/wiki/contents/articles/4995.sql-server-Columnstore-performance-tuning.aspx):

1. String filters
2. OUTER JOINs
3. Joining to an equally large degenerate dimension table that does not have a columnstore index

Here is our T-SQL, where we want only OPEN status transactions:

SELECT   c.cal_Year
        ,i.itm_Code
        ,SUM(trn_NetAmt) AS SumOfNetAmt
FROM dbo.FactTransactionsWithCCSI t
INNER JOIN dbo.DimCalendar c ON t.trn_TransDateSK = c.cal_DateIntPK
INNER JOIN dbo.DimITem i ON t.trn_ItemSK = i.itm_PK
INNER JOIN dbo.DimTransactions dt ON t.trn_PK = dt.trn_PK
WHERE t.trn_TransDateSK BETWEEN 20060101 AND 20121231
    AND dt.trn_StatusCode <> 'CLOSED'
GROUP BY c.cal_Year, i.itm_Code
ORDER BY i.itm_Code, c.cal_Year

Here is our query execution plan, where the filter cost is 86% of the total execution time. Note the 'Row' Actual Execution Mode. The case can be made that this is expected behavior and not the fault of our fact table's columnstore index, and that is correct: we are filtering on DimTransactions, which does not have a columnstore index.
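Before we fix the dimension itself, note the usual workaround for the string-filter anti-pattern: filter on an integer instead. This is only a sketch; the trn_StatusSK column is hypothetical and would have to be added to the model, with the status strings moved to a small lookup table.

-- Hypothetical: trn_StatusSK is an integer surrogate (for example, 1 = OPEN,
-- 2 = CLOSED), so the predicate compares integers instead of strings.
SELECT   c.cal_Year
        ,i.itm_Code
        ,SUM(trn_NetAmt) AS SumOfNetAmt
FROM dbo.FactTransactionsWithCCSI t
INNER JOIN dbo.DimCalendar c ON t.trn_TransDateSK = c.cal_DateIntPK
INNER JOIN dbo.DimITem i ON t.trn_ItemSK = i.itm_PK
INNER JOIN dbo.DimTransactions dt ON t.trn_PK = dt.trn_PK
WHERE t.trn_TransDateSK BETWEEN 20060101 AND 20121231
    AND dt.trn_StatusSK <> 2       -- integer compare instead of <> 'CLOSED'
GROUP BY c.cal_Year, i.itm_Code
ORDER BY i.itm_Code, c.cal_Year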
Let's put a columnstore index on DimTransactions, since it has as many rows as our fact table, and see what happens.

CREATE TABLE [dbo].[DimTransactionsWithCCSI](
    [trn_PK] [int] NOT NULL,
    [trn_No] [int] NOT NULL,
    [trn_SeqNo] [int] NOT NULL,
    [trn_SourceCode] [varchar](50) NOT NULL,
    [trn_RouteCode] [varchar](50) NULL,
    [trn_StatusCode] [varchar](50) NULL
) ON [PRIMARY]
GO

INSERT dbo.DimTransactionsWithCCSI
SELECT * FROM dbo.DimTransactions

CREATE CLUSTERED COLUMNSTORE INDEX dtClustColStoreIndex
    ON dbo.DimTransactionsWithCCSI
GO

SELECT   c.cal_Year
        ,i.itm_Code
        ,SUM(trn_NetAmt) AS SumOfNetAmt
FROM dbo.FactTransactionsWithCCSI t
INNER JOIN dbo.DimCalendar c ON t.trn_TransDateSK = c.cal_DateIntPK
INNER JOIN dbo.DimITem i ON t.trn_ItemSK = i.itm_PK
INNER JOIN dbo.DimTransactionsWithCCSI dt ON t.trn_PK = dt.trn_PK
WHERE t.trn_TransDateSK BETWEEN 20060101 AND 20121231
    AND dt.trn_StatusCode <> 'CLOSED'
GROUP BY c.cal_Year, i.itm_Code
ORDER BY i.itm_Code, c.cal_Year

The result is that DimTransactions now has a clustered columnstore index. Consequently, we get a 'Batch' Actual Execution Mode for DimTransactions instead of 'Row', and our query is off and flying!
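As a quick sanity check on the new index, you can confirm that its rowgroups compressed. A minimal sketch using the same sys.column_store_row_groups view we used earlier:

-- Expect state_description = 'COMPRESSED'; rows still sitting in the
-- deltastore show up as 'OPEN'.
SELECT   row_group_id
        ,state_description
        ,total_rows
        ,deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.DimTransactionsWithCCSI')
ORDER BY row_group_id;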
Now change the INNER JOINs to LEFT OUTER JOINs:

SELECT   c.cal_Year
        ,i.itm_Code
        ,SUM(trn_NetAmt) AS SumOfNetAmt
FROM dbo.FactTransactionsWithCCSI t
LEFT OUTER JOIN dbo.DimCalendar c ON t.trn_TransDateSK = c.cal_DateIntPK
LEFT OUTER JOIN dbo.DimITem i ON t.trn_ItemSK = i.itm_PK
LEFT OUTER JOIN dbo.DimTransactionsWithCCSI dt ON t.trn_PK = dt.trn_PK
WHERE t.trn_TransDateSK BETWEEN 20060101 AND 20121231
    AND dt.trn_StatusCode <> 'CLOSED'
GROUP BY c.cal_Year, i.itm_Code
ORDER BY i.itm_Code, c.cal_Year

The result in SQL 2014 is that my LEFT OUTER JOINs still used 'Batch' Actual Execution Mode, and SQL Server was even aware enough to turn the one-to-one relationship between DimTransactions and FactTransactions into an INNER join.

In summary:

1. With each new version of SQL Server, columnstore will be enhanced, so do not assume that all prior guidelines and best practices remain the same.
2. Columnstore indexes do work, and if you are running Enterprise edition, take advantage of them!
3. They are intended primarily for OLAP environments.
4. Executing "SELECT * FROM MyColumnstoreIndexedTable" will not show you the benefits. You have to aggregate, group, order, and filter some data, which is exactly the scenario of data warehousing queries.
5. You will only see significant benefits from columnstore when working with large data volumes.
6. Once again, columnstore, just like SSAS, produces the best results when working with star schema data models.

Please remember that the purpose of this documentation is not to explain columnstore, but to help you put together a POC and prove out the capabilities of columnstore indexes for yourself. A blog that I feel goes above and beyond for columnstore documentation is Niko Neugebauer's, which can be found here (http://www.nikoport.com/Columnstore/). Niko's site has 56 tips and tricks for columnstore. Enjoy!

I will be writing more about SQL Server 2016, so this post just covers step #1: installation. After downloading from Microsoft (http://www.microsoft.com/en-us/server-cloud/products/sql-server-2016/default.aspx), installation was typical. I only ran into one anomaly: the Oracle JRE.

1. If you choose to install PolyBase Query Service for External Data, setup will look for the Java SE Runtime Environment v7 or higher, which you can download from here --> http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html. Without the Oracle JRE installed, setup fails with an error message.

2. Take the defaults, with these exceptions:
a. Specify a named instance.
b. Be sure to add yourself as a SQL Server admin. I also always add a local system account that I have created specifically for SQL Server.
c. Explicitly choose your data directories.
d. If you are installing on a local PC, you may wish to decrease the number of TempDB files.

3. Be prepared to restart your computer.

4. Install a second and a third instance for Tabular and PowerPivot SSAS. (You do not need additional database engines, DQS, or SSRS for your additional SSAS instances; you only want SSAS.) Note: PolyBase Query Service for External Data can only be installed on one SQL Server instance.

5. When complete, open your new SQL Server 2016 CTP2.0 Configuration Manager and confirm your SQL 2016 installed instances.
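You can also run a quick check from SSMS once an instance is up. A minimal sketch; SERVERPROPERTY('IsPolyBaseInstalled') is available starting with SQL Server 2016:

-- Confirm the version, edition, instance name, and whether PolyBase
-- landed on this instance (it can only be installed on one).
SELECT   SERVERPROPERTY('ProductVersion')      AS ProductVersion
        ,SERVERPROPERTY('Edition')             AS Edition
        ,SERVERPROPERTY('InstanceName')        AS InstanceName
        ,SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;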
That's it! Stay tuned for more to come on SQL Server 2016 new features.