SSAS ProcessFull results in inflated aggregated data

Every night we run a full cube processing through an SSIS job that runs a script:
<Batch>
  <Process>
    <Type>ProcessFull</Type>
    <Object>
      <DatabaseID>OurDatabase</DatabaseID>
    </Object>
  </Process>
</Batch>
After adding a measure and processing, the cube is now showing measure data that is inflated far beyond what it should be. A measure that should read 11.8 million now reads 684 million. The underlying data is accurate, but the aggregations are not. There is no pattern to the inflation of the numbers that I can see.
However, if I redeploy the cube via XMLA with full processing attached to the Alter, it works fine. I would rather not have to do this by hand every morning at 1 am... so any ideas would be helpful.
It should also be noted that we rolled back to the previous cube schema and still have this problem. We have also tried restarting the SSAS service in production with no success. This problem cannot be recreated in any other environment.

Do you have any partitions that could have inaccurate filters, resulting in multiple partitions reading in the same data?
Alternatively, have you tried:
<Batch Transaction="true" ProcessAffectedObjects="true">
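For reference, a complete batch along those lines might look like the sketch below. It reuses the DatabaseID from your script; whether ProcessAffectedObjects is appropriate for your job is an assumption you would need to verify.
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
       Transaction="true" ProcessAffectedObjects="true">
  <!-- Run the processing as a single transaction and also reprocess dependent objects -->
  <Process>
    <Type>ProcessFull</Type>
    <Object>
      <DatabaseID>OurDatabase</DatabaseID>
    </Object>
  </Process>
</Batch>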

Related

ProcessUpdate of the dimension triggers processing of all partitions of all measure groups in the cube

I have Account and Customer dimensions in the cube that are connected to the same measure groups (there are about 15 - 20 measure groups in the cube).
When I run an XMLA command to process update these two dimensions, like this:
<Batch>
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>My Database</DatabaseID>
        <DimensionID>Dim Customer</DimensionID>
      </Object>
      <Type>ProcessUpdate</Type>
      <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
    </Process>
  </Parallel>
</Batch>
In the case of the Account dimension it finishes in a couple of minutes, because it doesn't trigger processing of all partitions of all measure groups. But in the case of the Customer dimension it does trigger processing of all partitions of all measure groups, so a process update of this dimension takes longer than full processing of the entire cube.
I am not sure why this processing is triggered for one dimension and not for the other. For both dimensions, Process affected objects is set to Do not process. Where should I look, what should I check, and can I somehow prevent this reprocessing from happening?
Thanks!
The documentation of ProcessUpdate states that
Forces a re-read of data and an update of dimension attributes. Flexible aggregations and indexes on related partitions will be dropped.
It can cause the aggregations to be dropped.
Specifically, there is an MSDN blog post about the different processing options, which details when the aggregations can be dropped:
Depending on the nature of the changes in the dimension table, ProcessUpdate can affect dependent partitions. If only new members were added, then the partitions are not affected. But if members were deleted or if member relationships changed (e.g., a Customer moved from Redmond to Seattle), then some of the aggregation data and bitmap indexes on the partitions are dropped.
Chris Webb, one of the key people in the BI world, has blogged about this as well; specifically, he has the following to say:
The act of clearing the indexes/aggregations also shows up as "partition processing operations" in Profiler
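If the dropped flexible aggregations are the real cost here, a common follow-up step (a sketch on my part, reusing the DatabaseID from the question; the CubeID is a placeholder) is to rebuild them afterwards with a ProcessIndexes command on the cube:
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>My Database</DatabaseID>
      <CubeID>My Cube</CubeID>
    </Object>
    <!-- Rebuilds bitmap indexes and aggregations for partitions whose data is already processed -->
    <Type>ProcessIndexes</Type>
  </Process>
</Batch>
ProcessIndexes only rebuilds indexes and aggregations, not the partition data itself, so it is considerably cheaper than reprocessing the partitions.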

Save history partitions in Analysis Services measure group

We have a multidimensional cube with a stock on hand measure group, partitioned by year. The underlying table has 1.5 billion rows in it, which works out to around 275 million rows per partition.
Every night we do a process full on the entire SSAS database, and of course all these history partitions (like SOH 2011, SOH 2012, etc.) are processed all over again, even though the data never changes.
I would like to know if there is a way of still executing a process full of the SSAS database, but preserving the data in those history partitions so that they do not get processed.
Update: in reply to one of the comments about just processing the latest measure group partitions: of course that is an option, but it implies creating a set of customised jobs / steps to process dimensions and then certain measure group partitions. This is more challenging to maintain, and you also have to be as smart as the SSAS engine about parallel processing options, etc.
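For illustration, a targeted batch of that kind might look something like the sketch below; the database, dimension, cube, measure group, and partition IDs are placeholders, not taken from the question. Commands in a batch run in order, so the dimension is updated before the partition is processed.
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Pick up dimension changes without reprocessing partition data -->
  <Process>
    <Object>
      <DatabaseID>Our Database</DatabaseID>
      <DimensionID>Dim Date</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <!-- Fully process only the current partition; the history partitions are left alone -->
  <Process>
    <Object>
      <DatabaseID>Our Database</DatabaseID>
      <CubeID>Stock Cube</CubeID>
      <MeasureGroupID>Stock On Hand</MeasureGroupID>
      <PartitionID>SOH 2014</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>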
My ideal solution would be to somehow mark those partitions as not needing processing, or restore the processed partitions from a previous process.

Loading huge flatfiles to SQL table is too slow via SSIS package

I receive about 8 huge delimited flat files to be loaded into a SQL Server (2012) table once every week. The total number of rows across all the files is about 150 million, and each file has a different number of rows. I have a simple SSIS package which loads data from the flat files (using a Foreach container) into a history table. Then a select query runs on this history table to pick out the current week's data and load it into a staging table.
We ran into problems as the history table grew very large (8 billion rows), so I decided to back up the data in the history table and truncate it. Before truncation, the package execution time had grown from 15 hours to 63 hours. We hoped that after truncation it would go back to 15 hours or less, but to my surprise, even after 20+ hours the package is still running. The worst part is that it is still loading the history table; the latest count is around 120 million rows. It still has to load the staging data, and that might take just as long.
Neither the history table nor the staging table has any indexes, which is why the select query on the history table used to take most of the execution time. But loading from all the flat files into the history table was always under 3 hours.
I hope I'm making sense. Can someone help me understand what could be the reason behind this unusual execution time this week? Thanks.
Note: the biggest file (8 GB) was read at the flat file source in 3 minutes, so I don't think the source is the bottleneck here.
There's no good reason, IMHO, why that server should take that long to load that much data. Are you saying that the process which used to take 3 hours, now takes 60+? Is it the first (data-load) or the second (history-table) portion that has suddenly become slow? Or, both at once?
I think the first thing that I would do is to "trust, but verify" that there are no indexes at play here. The second thing I'd look at is the storage allocation for this tablespace ... is it running out of room, such that the SQL server is having to do a bunch of extra calisthenics to obtain and to maintain storage? How does this process COMMIT? After every row? Can you prove that the package definition has not changed in the slightest, recently?
Obviously, "150 million rows" is not a lot of data, these days; neither is 8GB. If you were "simply" moving those rows into an un-indexed table, "3 hours" would be a generous expectation. Obviously, the only credible root-cause of this kind of behavior is that the disk-I/O load has increased dramatically, and I am healthily suspicious that "excessive COMMITs" might well be part of the cause: re-writing instead of "lazy-writing," re-reading instead of caching.

SSAS: Multiple Partitions

I have a situation where I have 3 partitions in SSAS (BIDS 2008) for different years. I need to know which partition is used in the current context and why. How do I change that manually?
For example, I have partitions P2001, P2002 and P2001-2002; a user queries for sales in 2002.
In this case, which partition comes into play, and why only that one? How can I change this? I want to use P2001 when a user queries for sales in 2002 (it makes no sense logically, but it will clarify my doubts).
I hope I made sense in elaborating my idea.
Thanks in advance.
First of all, your partitions should not have overlapping data: SSAS will read the overlapping data twice (or as many times as the number of partitions that data is in). You do not control which partition is read; SSAS knows which partition each key is in, so it will just read that partition when running a query.
You can use SQL Server Profiler to look at the queries being run and see which partitions are being read.
In order to be able to query without any cached data (to make sure you see which partitions are being read), you can run this XMLA to clear the cache for your cube, then run your queries again:
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ClearCache>
    <Object>
      <DatabaseID> database id </DatabaseID>
      <CubeID> cube id </CubeID>
    </Object>
  </ClearCache>
</Batch>

SQL Cube Processing Window

I've got Dim Tables, Fact Tables, ETL and a cube. I'm now looking to make sure my cube only holds the previous 2 months worth of data. Should this be done by forcing my fact table to hold only 2 months of data and doing a "full process", or is there a way to trim outdated data from my cube?
Your data is already dimensionalized through ETL and you have a cube built on top of it?
And you want to retain the data in the Fact table, but not necessarily need it in the cube for more than the last 2 months?
If you don't even want to retain the data, I would simply purge the fact table by date, because you're probably going to want that space reclaimed anyway.
But there are also settings in the cube build itself; or build your cube off dynamic views that only expose the last two months, and then the cube (re-)build can be done before you've even purged the underlying fact tables.
You can also look into partitioning by date (a sketch of a date-bound partition follows the links below):
http://www.mssqltips.com/tip.asp?tip=1549
http://www.sqlmag.com/Articles/ArticleID/100645/100645.html?Ad=1
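As a rough sketch of that idea (every object, data source, table, and column name below is a placeholder, not something from the question), a partition bound to a query that exposes only the last two months could be created like this:
<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ParentObject>
    <DatabaseID>My Database</DatabaseID>
    <CubeID>My Cube</CubeID>
    <MeasureGroupID>Fact Sales</MeasureGroupID>
  </ParentObject>
  <ObjectDefinition>
    <Partition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <ID>Fact Sales Last 2 Months</ID>
      <Name>Fact Sales Last 2 Months</Name>
      <!-- The partition reads only the rolling two-month window from the fact table -->
      <Source xsi:type="QueryBinding">
        <DataSourceID>My Data Source</DataSourceID>
        <QueryDefinition>
          SELECT * FROM dbo.FactSales
          WHERE SaleDate >= DATEADD(MONTH, -2, GETDATE())
        </QueryDefinition>
      </Source>
    </Partition>
  </ObjectDefinition>
</Create>
Reprocessing just this partition then refreshes the rolling window without having to purge the fact table first, which gives essentially the same effect as the dynamic-view approach above.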