In a financial system, each year's transactions are stored in a separate table, so there are Transactions2007, Transactions2008, ..., Transactions2012 tables in the system. They all share the same design. The data in previous years' tables never changes, but the current year's data is updated daily.
I want to build a cube over the union of all these tables. The question is how to prevent SSAS from reprocessing the previous years.
When processing the cube, you can set the process option to Process Incremental and then, in the configuration dialog, supply a query that selects data only from the recent tables.
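Under the hood, Process Incremental issues an XMLA ProcessAdd command with an out-of-line query binding. Here is a minimal sketch of what the dialog generates; the IDs (FinanceDB, Transactions, FinanceDW) are placeholders, not names from the question:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>FinanceDB</DatabaseID>
        <CubeID>Transactions</CubeID>
        <MeasureGroupID>Transactions</MeasureGroupID>
        <PartitionID>Transactions</PartitionID>
      </Object>
      <Type>ProcessAdd</Type>
    </Process>
  </Parallel>
  <Bindings>
    <!-- Out-of-line binding: only the current year's table is read for this run -->
    <Binding>
      <DatabaseID>FinanceDB</DatabaseID>
      <CubeID>Transactions</CubeID>
      <MeasureGroupID>Transactions</MeasureGroupID>
      <PartitionID>Transactions</PartitionID>
      <Source xsi:type="QueryBinding">
        <DataSourceID>FinanceDW</DataSourceID>
        <QueryDefinition>SELECT * FROM dbo.Transactions2012</QueryDefinition>
      </Source>
    </Binding>
  </Bindings>
</Batch>

Note that ProcessAdd only appends the rows the query returns, so the query must exclude anything already in the partition or those rows will be counted twice.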
I handled it by partitioning the cube (by time dimension) and processing only the most recent partition.
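Scripted, that nightly step is a ProcessFull on the single partition that still changes. A sketch, again with hypothetical IDs:

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>FinanceDB</DatabaseID>
    <CubeID>Transactions</CubeID>
    <MeasureGroupID>Transactions</MeasureGroupID>
    <PartitionID>Transactions 2012</PartitionID>
  </Object>
  <Type>ProcessFull</Type>
</Process>

The 2007-2011 partitions keep their processed data and are never touched.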
We have a multidimensional cube with a stock-on-hand measure group, partitioned by year. The underlying table has 1.5 billion rows in it, which works out to around 275 million rows per partition.
Every night we do a Process Full on the entire SSAS database, and of course all the history partitions (SOH 2011, SOH 2012, etc.) are processed all over again, even though their data never changes.
I would like to know if there is a way of still executing a Process Full of the SSAS database while preserving the data in those history partitions, so that they do not get processed.
Update: in reply to one of the comments about just processing the latest measure group partitions: of course that is an option, but it implies building a set of customised jobs/steps to process the dimensions and then certain measure group partitions. That is more challenging to maintain, and you also have to be as smart as the SSAS engine about parallel processing options, etc.
My ideal solution would be to somehow mark those partitions as not needing processing, or restore the processed partitions from a previous process.
I have a scenario where an SSAS cube's data needs to be refreshed. We want to avoid a full refresh that takes an hour and do a 'delta' refresh instead. The delta refresh should:
1) Update fact records that have changed
2) Insert fact records that are new
3) Delete fact records that no longer exist
Consider a fact table with three dimensions (Company, Security, FiscalYear) and two measures (Qty, Amount).
Scenario: in the fact table, a record with Company A, Security A, FiscalYear A has its Qty measure changed from 2 to 20. Previously the cube correctly showed the Qty to be 2. After the update:
If we do a full refresh, it correctly shows 20, but to get this we had to suffer a full hour of cube processing.
We tried adding a timestamp column to the fact table, splitting the cube into Current and Old partitions, full-refreshing the Current partition and merging it into the Old partition, as seems to be the popular suggestion. When we browse the cube, it shows 22, which is incorrect.
We tried an incremental refresh of the cube: same issue, it shows 22, also incorrect.
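For anyone following that popular suggestion, the merge step is the XMLA MergePartitions command; a sketch with placeholder IDs:

<MergePartitions xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Sources>
    <Source>
      <DatabaseID>FinanceDB</DatabaseID>
      <CubeID>Facts</CubeID>
      <MeasureGroupID>Facts</MeasureGroupID>
      <PartitionID>Current</PartitionID>
    </Source>
  </Sources>
  <Target>
    <DatabaseID>FinanceDB</DatabaseID>
    <CubeID>Facts</CubeID>
    <MeasureGroupID>Facts</MeasureGroupID>
    <PartitionID>Old</PartitionID>
  </Target>
</MergePartitions>

Merging simply appends the source partition's fact data to the target, which is exactly why the updated row ends up counted twice here: the old 2 in one partition plus the new 20 in the other gives 22.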
So what I am trying to ascertain here is whether there is any way to process a cube so that it picks up only the changes (and by that I mean updates, inserts AND deletes, not just inserts!) and applies them to the data inside an SSAS cube.
Any help would be greatly appreciated!
Thanks!
No, there is no way to do this. The only control you have over processing is the granularity of what you process. For instance, if you know that data over a certain age will never change, you can put that older data in its own partition and leave it out of your processing.
I have an SSAS cube with rigid attribute relationships. Each day I get data from the source for the last 2 months only. My cube has data from 2010 onwards.
I am planning to partition that cube and then process it. My questions are:
I know that with rigid relationships I have to go with Process Full. Does that mean I have to process every partition with Process Full, or can I run Process Full on selected partitions only?
How should I design my partition strategy? If I create 2-month partitions, I will end up with 6 partitions per year, and more later. I thought of going with 6-month partitions, but then in the 7th month or the 1st month I would have to process two partitions (the current one plus the previous 6-month one). Is that good enough?
Marking attribute relationships as Rigid when they actually do change (meaning the rollups change, such as Product A moving from the Cereal category to the Oatmeal category) is a bad idea; just mark them as Flexible relationships. Rigid vs. Flexible doesn't impact query performance, only processing performance. And if Rigid causes you to do a ProcessFull on dimensions, that will force you to reprocess all your measure group partitions. So change the relationships to Flexible unless you are 100% sure you never run an UPDATE statement against your dimension table in your ETL.
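Concretely, the switch is the RelationshipType property on each attribute relationship in the dimension's definition. An ASSL fragment for a hypothetical Product dimension:

<AttributeRelationship>
  <AttributeID>Category</AttributeID>
  <!-- Flexible (the default) lets ProcessUpdate absorb changed rollups;
       Rigid raises a processing error when a member moves, forcing a ProcessFull -->
  <RelationshipType>Flexible</RelationshipType>
</AttributeRelationship>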
I would partition by month. Then you can just process the most recent two months every day. To be more explicit (an XMLA sketch of the sequence follows these steps):
ProcessUpdate your dimensions
ProcessData the most recent two months of partitions.
ProcessIndexes on your cube (which rebuilds indexes and flexible aggs on older partitions)
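A hedged sketch of that nightly sequence as a single XMLA batch; the IDs (SalesDB, Sales, Product, and the monthly partition names) are illustrative, not from the question:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Step 1: ProcessUpdate picks up changed dimension rows
       without invalidating the measure group partitions -->
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>SalesDB</DatabaseID>
        <DimensionID>Product</DimensionID>
      </Object>
      <Type>ProcessUpdate</Type>
    </Process>
  </Parallel>
  <!-- Step 2: reload fact data for the two most recent monthly partitions -->
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>SalesDB</DatabaseID>
        <CubeID>Sales</CubeID>
        <MeasureGroupID>Sales</MeasureGroupID>
        <PartitionID>Sales 2012-11</PartitionID>
      </Object>
      <Type>ProcessData</Type>
    </Process>
    <Process>
      <Object>
        <DatabaseID>SalesDB</DatabaseID>
        <CubeID>Sales</CubeID>
        <MeasureGroupID>Sales</MeasureGroupID>
        <PartitionID>Sales 2012-12</PartitionID>
      </Object>
      <Type>ProcessData</Type>
    </Process>
  </Parallel>
  <!-- Step 3: rebuild indexes and flexible aggregations across the cube -->
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>SalesDB</DatabaseID>
        <CubeID>Sales</CubeID>
      </Object>
      <Type>ProcessIndexes</Type>
    </Process>
  </Parallel>
</Batch>

Each Parallel block finishes before the next begins, so the dimensions are consistent before the partitions reload, and the indexes are rebuilt last.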
I have an SSAS 2008 cube that is being used to house end of day financial data from the stock market. The cube is only processed once a day after the market closes, so it never has any information about the current intraday trading data. I also have a relational database that houses the current intraday trading information for stocks. I am trying to find a way to combine those two data sources so that I can perform calculations such as a 30 day moving average for a stock that is based off of its current price, as well as the previous 29 days of historical data. I am using SSAS Standard edition, so I don't have access to features such as Proactive Caching or multiple partitions to help me process the current data in near real time.
Is there any way I can dynamically include rows from my SQL database in the fact table, just for the context of an individual query? Essentially, bring a small subset of data into the cube temporarily in order to evaluate a certain calculation?
No. You should create a measure group that maps to your OLTP table instead.
You should be able to create a partition for the current days data and specify ROLAP as the storage mode.
To simplify maintenance, I would probably create a view for the fact table and, in the definition, use date functions in the where clause. Something like:
CREATE VIEW CurrentTrades AS
SELECT *
FROM factTrades
-- Half-open range: BETWEEN would wrongly include rows stamped exactly at tomorrow's midnight
WHERE TradingDate >= DATEADD(dd, DATEDIFF(dd, 0, GETDATE()), 0)
  AND TradingDate < DATEADD(dd, DATEDIFF(dd, 0, GETDATE()) + 1, 0)
You could then use that view as the data source for the ROLAP partition.
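The partition definition would then look roughly like this ASSL fragment (a sketch only; 'Market Data' is a hypothetical data source ID):

<Partition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ID>Current Trades</ID>
  <Name>Current Trades</Name>
  <Source xsi:type="QueryBinding">
    <DataSourceID>Market Data</DataSourceID>
    <QueryDefinition>SELECT * FROM CurrentTrades</QueryDefinition>
  </Source>
  <!-- ROLAP leaves the data in SQL Server and queries it live -->
  <StorageMode>Rolap</StorageMode>
</Partition>

With ROLAP storage, queries against this partition hit the view directly, so the intraday rows show up without any reprocessing.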
You can also incrementally process the cube's data at specific intervals during the day, depending on how long it takes to process the new data (assuming some delay is acceptable, of course).
It is possible to write your own DLL and call it from within MDX. It's not terribly graceful but I've done it in the past.
Not a great idea for thousands of rows of data, but if you need fewer than 100, your function call could pass a value from MDX to the DLL, which can query the SQL database and return the numbers. Your results then get displayed in the cellset alongside the numbers from OLAP.
I've got dim tables, fact tables, ETL and a cube. I'm now looking to make sure my cube only holds the previous 2 months' worth of data. Should this be done by forcing my fact table to hold only 2 months of data and doing a "full process", or is there a way to trim outdated data from the cube itself?
Your data is already dimensionalized through ETL and you have a cube built on top of it?
And you want to retain the data in the fact table, but don't necessarily need it in the cube for more than the last 2 months?
If you don't even want to retain the data, I would simply purge the fact table by date, since you're probably going to want that space reclaimed anyway.
But there are also settings in the cube build itself, or you can build your cube off dynamic views that only expose the last two months; then the cube (re-)build can be done before you've even purged the underlying fact tables.
You can also look into partitioning by date:
http://www.mssqltips.com/tip.asp?tip=1549
http://www.sqlmag.com/Articles/ArticleID/100645/100645.html?Ad=1