I have read various articles discussing different methods to optimize SSAS cube processing time, but each of them takes much longer than I expected.
Background
I have a couple of tables in a database that undergo transactions 24/7. I have created a view and populate my cube and dimension from that view. I have only one cube, with only one measure and only one dimension.
Initially I ran "Process Full" on my dimension and cube respectively (it takes almost an hour). Now I want to update the cube once every hour to reflect the current statistics.
Question
Which "Process Option" do I need to use to process the cube quickly?
What i have tried so far
Process Default: When I run "Process Default" on both the dimension and the cube, it succeeds in less than a minute but reflects no recent changes.
Process Update: When I run "Process Update" on both the dimension and the cube, it takes almost an hour every time (once every hour) to succeed and reflect recent changes.
Process Full: When I run "Process Full" on both the dimension and the cube, it also takes almost an hour every time (once every hour) to succeed and reflect recent changes.
Process Incremental: This option only adds new records; it does not reflect any updates or deletes to records already present in the cube.
Kindly give me any tips to resolve the matter.
Related
We have a multidimensional cube, with a stock on hand measure group, partitioned by year. The underlying table has 1.5 billion rows in it, and effectively equals around 275 million rows per partition.
Every night we do a process full on the entire SSAS database, and of course all these history partitions (like SOH 2011, SOH 2012 etc) process all over again, even though the data never changes.
I would like to know if there is a way of still executing a Process Full of the SSAS database, while preserving the data in those history partitions so that they do not get processed again.
Update: In reply to one of the comments, about just processing the latest measure group partitions. Of course that is an option, and what that implies is that you are going to create a set of customised jobs / steps to process Dimensions, and then certain measure group partitions. This is more challenging to maintain, and also you have to be as smart as the SSAS engine to decide on parallel processing options etc.
My ideal solution would be to somehow mark those partitions as not needing processing, or restore the processed partitions from a previous process.
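For what it's worth, the "customised job" route mentioned in the update can be kept fairly small by generating the XMLA batch instead of hand-maintaining steps. Below is a minimal Python sketch of that idea, not a definitive implementation. The element names (`Batch`, `Parallel`, `Process`, `Object`, `Type`) are standard XMLA, but every ID passed in the usage example ("StockDB", "Date", "SOH 2024" and so on) is hypothetical, and `build_batch` is a helper invented for illustration. Wrapping the commands in `Parallel` is intended to leave scheduling to the server rather than to the job author.

```python
import xml.etree.ElementTree as ET

# Namespace used by XMLA processing commands.
XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def add_process(parent, object_ids, process_type):
    """Append one <Process> command for the object identified by object_ids."""
    process = ET.SubElement(parent, "Process")
    obj = ET.SubElement(process, "Object")
    for tag, value in object_ids:
        ET.SubElement(obj, tag).text = value
    ET.SubElement(process, "Type").text = process_type


def build_batch(database_id, dimension_ids, cube_id, measure_group_id, partition_ids):
    """Build a batch that updates dimensions and fully processes only the
    listed partitions; partitions that are not listed are left untouched."""
    batch = ET.Element("Batch", {"xmlns": XMLA_NS})
    parallel = ET.SubElement(batch, "Parallel")
    for dimension_id in dimension_ids:
        add_process(
            parallel,
            [("DatabaseID", database_id), ("DimensionID", dimension_id)],
            "ProcessUpdate",
        )
    for partition_id in partition_ids:
        add_process(
            parallel,
            [
                ("DatabaseID", database_id),
                ("CubeID", cube_id),
                ("MeasureGroupID", measure_group_id),
                ("PartitionID", partition_id),
            ],
            "ProcessFull",
        )
    return ET.tostring(batch, encoding="unicode")


# Hypothetical IDs: only the current-year partition is reprocessed.
xmla = build_batch("StockDB", ["Date", "Product"], "Stock", "SOH", ["SOH 2024"])
print(xmla)
```

The history partitions (SOH 2011, SOH 2012, ...) are simply never named in the batch, so their processed data stays as it is.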
I have a scenario where an SSAS cube's data needs to be refreshed. We want to avoid a full refresh, which takes an hour, and do a 'delta' refresh instead. The delta refresh should:
1) Update fact records that have changed
2) Insert fact records that are new
3) Delete fact records that no longer exist
Consider a fact table with three dimensions: Company, Security, FiscalYear
and two measures: Qty, Amount
Scenario: In the fact table, a record with Company A, Security A, FiscalYear A has the measure Qty changed from 2 to 20. Previously the cube correctly showed the Qty as 2. After the update:
If we do a full refresh, it correctly shows 20. But to get this, we had to suffer a full hour of cube processing.
We tried adding a timestamp column to the fact table, splitting the cube into Current and Old partitions, fully refreshing the Current partition and merging it into the Old partition, as seems to be the popular suggestion. When we browse the cube, it shows 22, which is incorrect.
We tried an incremental refresh of the cube, with the same issue: it shows 22, which is also incorrect.
So what I am trying to ascertain here, is whether there is no way to process a cube so it only takes the changes (and by that I mean Updates, Inserts AND deletes, not just Inserts!) and applies them to the data inside an SSAS cube?
Any help would be greatly appreciated!
Thanks!
No, there is no way to do this. The only control you have over processing is the granularity of what you process. For instance, if you know that data over a certain age will never change, you can put data over that age in a partition, and not include it in your processing.
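To see where the 22 in the question comes from, here is a toy Python model of an additive measure. It is only a sketch of the arithmetic, not of SSAS internals: `aggregate` is a made-up helper, and the "compensating row" variant assumes your ETL can work out the difference between the old and new values, which the question does not say is available.

```python
from collections import defaultdict


def aggregate(rows):
    """Sum Qty per (Company, Security, FiscalYear) key, the way an additive
    measure accumulates every row it is given."""
    totals = defaultdict(int)
    for key, qty in rows:
        totals[key] += qty
    return dict(totals)


KEY = ("Company A", "Security A", "FiscalYear A")
already_processed = [(KEY, 2)]  # what the cube loaded before the update

# Incremental load of the changed row as-is: the old 2 is still there.
naive_total = aggregate(already_processed + [(KEY, 20)])[KEY]

# Assumed workaround: load only the difference (new value minus old value).
delta_total = aggregate(already_processed + [(KEY, 20 - 2)])[KEY]

# A delete expressed the same way: a row that cancels the original.
deleted_total = aggregate(already_processed + [(KEY, -2)])[KEY]

print(naive_total, delta_total, deleted_total)  # 22 20 0
```

Appending rows can only add to what is already stored, which is why both the merge and the incremental refresh end up at 22 unless the old row is removed by reprocessing the partition that contains it.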
I have an 'Employee' dimension that is modified every day. I made monthly partitions in the cube and only Process Full the current month's partition. Lately I found that the past months' aggregations are not dropped. I tried 'ProcessUpdate' on this dimension and 'ProcessIndexes' on the partition, but nothing changed. I also tried the 'ProcessAffectedObjects' setting followed by 'ProcessIndexes' again, with the same result, and tried with lazy processing set to both true and false, with no luck.
So my question is: how do I drop the stale aggregations on past months and rebuild them explicitly?
It is a distinct count measure, and no aggregations were designed via the wizard.
I tried dropping the indexes with 'ProcessClearIndexes' in an XMLA command, which worked fine, and 'ProcessIndexes' did rebuild the indexes and aggregations; I saw them in the SSMS query execution messages.
So might it only be related to the distinct count, just because it is a non-additive measure?
"Non-additive measures create the following problems on a typical OLAP system:
Roll-ups are not possible. When pre-calculating results during cube processing, the system cannot deduce summaries from other summaries. All results must be calculated from the detail data. This situation places a heavy burden in processing time.
All results must be pre-calculated. With non-additive measures, there is no way to deduce the result for a higher-level summary query from one pre-calculated aggregation. Failure to pre-calculate the results in advance means that the results are not available. It is impossible to perform and maintain incremental updates to the system. A single transaction added to the cube usually invalidates huge portions of previously pre-calculated results. In order to recover from this, a complete recalculation is needed."
"Aggregations
As mentioned before, DISTINCT COUNTs are not additive (and this is the main reason why these measures are so problematic). Therefore, the aggregations, which are all derived from additive operators, are completely useless;"
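The non-additivity described in the two quotes is easy to check by hand. A small Python sketch with made-up employee IDs and two monthly partitions: adding up the per-partition distinct counts over-counts anyone who appears in both months, so the correct figure has to be computed from the detail rows.

```python
def rolled_up_count(partitions):
    """Add up per-partition distinct counts, as if the measure were additive."""
    return sum(len(set(partition)) for partition in partitions)


def distinct_count(partitions):
    """Distinct count computed from the detail rows of all partitions."""
    return len(set().union(*partitions))


# Hypothetical employee keys seen in two monthly partitions.
january = ["emp1", "emp2", "emp3"]
february = ["emp2", "emp3", "emp4"]

print(rolled_up_count([january, february]))  # 6
print(distinct_count([january, february]))   # 4
```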
Someone answered my question on MSDN:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/7302227f-11b8-4adc-98ff-72b6c395775b/ssas-update-a-dimension-wont-drop-aggregation-process-index-wont-rebuild-aggregation?forum=sqlanalysisservices
If you use materialized reference dimensions ensure you do ProcessFull to reprocess the fact tables again fully. The reason is that the join to the intermediate dimension happens in the measure group partition processing query:
http://sqlblog.com/blogs/alberto_ferrari/archive/2009/02/25/ssas-reference-materialized-dimension-might-produce-incorrect-results.aspx
I'm building a dimensional data warehouse and learning how to model my various business processes from my source system in my warehouse.
I'm currently modelling a "Bid" (bid for work) from our source system in our data warehouse as a fact table which contains information such as:
Bid amount
Projected revenue
Sales employee
Bid status (active, pending, rejected, etc)
etc.
The problem is that the bid (or most any other process I'm trying to model) can go through various states and have its information updated at any given moment in the source system. According to Ralph Kimball, fact tables should only be updated if they are considered "accumulating snapshot" and I'm sure that not all of these processes would be considered an "accumulating snapshot" by the definition below.
How should these types of processes be modeled in the data warehouse according to the Kimball Group's recommendations? Furthermore, what type of fact table would work for a bid (given the facts I've outlined above)?
Excerpt from http://www.kimballgroup.com/2008/11/fact-tables/
The transaction grain corresponds to a measurement taken at a single
instant. The grocery store beep is a transaction grain. The measured
facts are valid only for that instant and for that event. The next
measurement event could happen one millisecond later or next month or
never. Thus, transaction grain fact tables are unpredictably sparse or
dense. We have no guarantee that all the possible foreign keys will be
represented. Transaction grain fact tables can be enormous, with the
largest containing many billions of records.
The periodic snapshot grain corresponds to a predefined span of time,
often a financial reporting period. Figure 1 illustrates a monthly
account periodic snapshot. The measured facts summarize activity
during or at the end of the time span. The periodic snapshot grain
carries a powerful guarantee that all of the reporting entities (such
as the bank account in Figure 1) will appear in each snapshot, even if
there is no activity. The periodic snapshot is predictably dense, and
applications can rely on combinations of keys always being present.
Periodic snapshot fact tables can also get large. A bank with 20
million accounts and a 10-year history would have 2.4 billion records
in the monthly account periodic snapshot!
The accumulating snapshot fact table corresponds to a predictable
process that has a well-defined beginning and end. Order processing,
claims processing, service call resolution and college admissions are
typical candidates. The grain of an accumulating snapshot for order
processing, for example, is usually the line item on the order. Notice
in Figure 1 that there are multiple dates representing the standard
scenario that an order undergoes. Accumulating snapshot records are
revisited and overwritten as the process progresses through its steps
from beginning to end. Accumulating snapshot fact tables generally are
much smaller than the other two types because of this overwriting
strategy.
As one of the comments mentions, Change Data Capture is a fairly generic term for "how do I handle changes to data entities over time", and there are entire books on it (and a gazillion posts and articles).
Regardless of any statements that seem to suggest a clear black-and-white or always-do-it-like-this answer, the real answer, as usual, is "it depends" - in your case, on what grain you need for your particular fact table.
If your data changes in unpredictable ways or very often, it can become challenging to implement Kimball's version of an accumulating snapshot (picture how many "milestone" date columns, etc. you might end up needing).
So, if you prefer, you can make your fact table a transaction fact table rather than a snapshot, where the fact key would be (Bid Key, Timestamp). Then, in your application layer (whether a view, materialized view, actual app, or whatever), you can ensure that a given query only gets the latest version of each Bid; this can be thought of as a kind of virtual accumulating snapshot. If you find that you don't need the previous versions (the history of each Bid), you can have a routine that prunes them (i.e. deletes them or moves them somewhere else).
Alternatively, you can only allow the fact (Bid) to be added when it is in its final state, but then you will likely have a significant lag where a new (updateable) Bid doesn't make it to the fact table for some time.
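As a rough illustration of that "latest version per Bid" layer, here is a minimal Python sketch. The row layout (`bid_key`, `timestamp`, `status`, `bid_amount`) and both function names are assumptions made up for the example; in practice this logic would more likely live in a view over the fact table.

```python
from datetime import datetime


def latest_versions(fact_rows):
    """Return the most recent row per bid_key from a transaction-grain table
    keyed by (bid_key, timestamp)."""
    latest = {}
    for row in fact_rows:
        seen = latest.get(row["bid_key"])
        if seen is None or row["timestamp"] > seen["timestamp"]:
            latest[row["bid_key"]] = row
    return latest


def prune_history(fact_rows):
    """Keep only the latest version of each bid, dropping older versions."""
    return list(latest_versions(fact_rows).values())


# Hypothetical bid history: bid 1 was revised once, bid 2 never changed.
rows = [
    {"bid_key": 1, "timestamp": datetime(2024, 1, 5), "status": "pending", "bid_amount": 1000},
    {"bid_key": 1, "timestamp": datetime(2024, 2, 1), "status": "active", "bid_amount": 1200},
    {"bid_key": 2, "timestamp": datetime(2024, 1, 9), "status": "rejected", "bid_amount": 500},
]

current = latest_versions(rows)
print(current[1]["status"], current[1]["bid_amount"])  # active 1200
print(len(prune_history(rows)))                        # 2
```

Every change to a bid is inserted as a new row, so the fact table itself is never updated; only the reading side decides which version counts.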
Either way, there are several solid and proven techniques for handling this - you just have to clearly identify the business requirements and design accordingly.
Good luck!
In a financial system, the transactions of each year are stored in a separate table. So, there are Transactions2007, Transactions2008, ..., Transactions2012 tables in the system. They all have the same table design. The data in the tables of previous years never changes, but the current year's data is updated daily.
I want to build a cube on the union of tables of all years. The question is how to prevent SSAS from reprocessing previous years.
When processing the cube, you can set the process option to Process Incremental and then, in the configuration dialog, specify a query that selects data only from the most recent tables.
I handled it by partitioning the cube (by time dimension) and processing only the most recent partition.
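With one table per year, that partitioning maps naturally to one partition per table. A small Python sketch of the bookkeeping, under the assumption that each partition is bound to a query over its own yearly table; the partition names and the `partition_plan` helper are hypothetical, and only the `Transactions<year>` table names come from the question.

```python
def partition_plan(years, current_year):
    """Describe one partition per yearly table and flag which ones still
    need processing (only the current year, since older data never changes)."""
    plan = []
    for year in years:
        plan.append(
            {
                "partition": f"Transactions {year}",
                "source_query": f"SELECT * FROM Transactions{year}",
                "needs_processing": year >= current_year,
            }
        )
    return plan


plan = partition_plan(range(2007, 2013), current_year=2012)
to_process = [p["partition"] for p in plan if p["needs_processing"]]
print(to_process)  # ['Transactions 2012']
```

Previous years are processed once when their partition is created and then left alone.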