I have a cube that is partitioned by year. I want to run a full process on only the last couple of years, as only data from this period can have been changed, added or deleted. I cannot figure out how to specify that only certain partitions should be processed. Can anyone help me with this?
Partitions are defined on a measure group. If your entire cube is partitioned by year, that suggests every measure group sliced by the Time dimension is time-partitioned (you may have only one measure group in your cube, though).
You can see partitions in the "Partitions" tab of the cube design window. You can process partitions there (manually, at the dev stage), from SSMS, or by generating an XMLA script to run on a schedule. Essentially, you process the partition rather than the entire measure group.
If you use SSIS, there is a task called Analysis Services Processing Task. In the Processing Settings of this task you can select the objects you want to process, including one or more partitions of a measure group.
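For reference, an XMLA processing command addresses a partition by its database, cube, measure group and partition IDs. A minimal sketch that fully processes only the two most recent yearly partitions might look like the following (all object IDs are hypothetical placeholders; you can get the real ones by right-clicking a partition in SSMS, choosing Process, and using the Script button):

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <!-- Fully reprocess only the partitions whose source data can still change -->
    <Process>
      <Object>
        <DatabaseID>SalesDW</DatabaseID>
        <CubeID>Sales</CubeID>
        <MeasureGroupID>Fact Sales</MeasureGroupID>
        <PartitionID>Fact Sales 2023</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
    <Process>
      <Object>
        <DatabaseID>SalesDW</DatabaseID>
        <CubeID>Sales</CubeID>
        <MeasureGroupID>Fact Sales</MeasureGroupID>
        <PartitionID>Fact Sales 2024</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
  </Parallel>
</Batch>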
I have an application that is driven by a Microsoft Analysis Services Multidimensional Cube.
Users access the application and load data periodically to the underlying SQL database.
The cube is processed fully over night so users may see updated data the next day.
Users may also kick off a cube partition process via the interface if they need data to be available in the application sooner.
The cube processing is executed in the background using SQL Agent jobs with XMLA scripts.
The partition processing happens in two steps:
Dimension processing
Partition processing
The cube process is one step:
Process Full
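For illustration, the two-step partition job described above could be an XMLA batch along these lines. The object IDs are hypothetical, the processing types (Process Update on the dimensions, Process Full on the partition) are assumptions about the job, and a real script would include every dimension the new data feeds; commands listed directly in a Batch run in order, so the dimensions are refreshed before the partition:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Step 1: pick up new or changed dimension members without invalidating the cube -->
  <Process>
    <Object>
      <DatabaseID>AppCube</DatabaseID>
      <DimensionID>Dim Customer</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <!-- Step 2: reload the partition that received the new data -->
  <Process>
    <Object>
      <DatabaseID>AppCube</DatabaseID>
      <CubeID>App Cube</CubeID>
      <MeasureGroupID>Fact Loads</MeasureGroupID>
      <PartitionID>Fact Loads Current</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>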
Recently, I have run into an issue where users load a significant amount of data late in the evening and kick off a partition process that then runs at the same time the cube is being processed. Often this isn't a problem beyond longer-running processing, but I have run into failures periodically.
The Ask
Using XMLA, is it possible to modify the cube partition script to only run if any part of the cube is not already being processed?
I am aware that I could probably accomplish this with SSIS, but that seems like overkill if it is possible with straight XMLA.
We have various SQL jobs for processing SSAS tabular models.
Each job runs daily, starting half an hour after the previous one.
Currently, we are using the Full Process mode, which consumes a lot of memory and causes some jobs to fail.
Thus, we need to understand how the other processing modes work in SSAS Tabular.
What will "Process Default" do?
What will "Process Data" do?
Do the Process Data and Process Default modes update existing data, or only insert new rows?
Process Default checks the processed state of the object(s) you select and brings them up to a fully processed state. It will rebuild empty data tables or partitions, relationships and hierarchies. The difference between Process Full and Process Default is that Full drops the data, hierarchies and relationships first and then reprocesses, while Default just brings them up to date. NB: Default will not process data in an object if data is already there, even if you know it is incomplete.
Process Data will process all the data from the data source but will not process relationships and hierarchies. If you only ever want to process some of the data in your ETL, then you should look at partitioning your data and just processing the partitions. There is no 'merge' aspect to Process Data.
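As a rough sketch (the database ID is a placeholder, and this assumes the models are still processed with XMLA-style commands rather than TMSL JSON), a common lighter-weight pattern is Process Data followed by Process Default, so the structures Process Data leaves unprocessed get rebuilt afterwards:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Reload the data only; relationships and hierarchies are left unprocessed -->
  <Process>
    <Object>
      <DatabaseID>FinanceModel</DatabaseID>
    </Object>
    <Type>ProcessData</Type>
  </Process>
  <!-- Bring whatever is still unprocessed up to date without reloading existing data -->
  <Process>
    <Object>
      <DatabaseID>FinanceModel</DatabaseID>
    </Object>
    <Type>ProcessDefault</Type>
  </Process>
</Batch>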
Can the underlying data being used by the SSAS cube be updating while the cube is updating?
We process our cube in full once a week to clean it up (with Process Update and Process Index runs during the week). However, there is a demand to process the data in full more than once a week. The data warehouse also has daily jobs to update data, and our full cube process takes 24 hours. Currently we schedule our daily updates after their jobs, and the full cube processing is timed to avoid colliding with their data load jobs. But if we are to meet the demand of processing the data in full more often, we would run into times when the data warehouse is updating.
Does this just cause the cube processing to take longer as it waits for the underlying data changes to stop? Or, does it grab a snapshot as it goes?
Thank you!
The default is just standard read-locks. You can verify this in the Datasource for the cube - It'll probably say "Read Committed" for the isolation level. This means it'll take locks and release them as it reads. If data is modified after the read starts, it may be included in the cube process if that row hasn't been read yet.
Have you considered either snapshot isolation, or setting the database into Read Committed Snapshot mode? I did the latter with my DW and haven't looked back. I have my cube process incrementally after regular ETL loads, and with RCS I can also do SQL queries against the DW while the ETL is loading (Readers don't block writers).
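For reference, the isolation setting mentioned above lives on the SSAS data source; when you script the data source out, the relevant part of the definition looks roughly like this (IDs and connection details are placeholders):

<DataSource xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:type="RelationalDataSource">
  <ID>DW</ID>
  <Name>DW</Name>
  <ConnectionString>Provider=SQLNCLI11;Data Source=myServer;Initial Catalog=DW;Integrated Security=SSPI</ConnectionString>
  <!-- ReadCommitted is the default; Snapshot makes the cube process read a
       consistent point-in-time view via SQL Server snapshot isolation -->
  <Isolation>ReadCommitted</Isolation>
</DataSource>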
I am having a strange problem when building a cube on SSAS. I have a fact table, let's say FactActivity. Then I have a dimension DimActivity, which has a 1 to 1 relationship with this fact, and all the foreign keys are bound to the dimension. So date dimensions, product dimensions and so on, are all bound to the DimActivity.
When I build the whole cube, it seems to build the fact before the dimension, and therefore gives me errors. If, however, I manually build the dimension before the fact, it works.
Is there anywhere in the SSAS that I can configure the build order, other than doing this from SSIS with the use of the Analysis Services Processing Task?
Many thanks!
Processing a cube will not process the dimensions it relates to because they are constructed as separate entities in SSAS. In practice, this means that a dimension can exist, be processed and accessed without a relationship to a cube.
There is no such thing as a "general build order to configure". It is up to you to decide how AS objects should be processed. There are many tools that facilitate this, and they will all do the same thing: construct XMLA scripts to run on the AS server.
SSIS: Analysis Services Processing task
Configure a SQL agent job.
Perform a manual process using SSMS.
Program your processing activities using AMO
...
The important thing is that you process your dimensions before you process your cube. A simple solution is to process the entire SSAS database (containing your cubes and dimensions); SSAS will then automatically process the dimensions before the cubes.
Documentation on processing Analysis Services objects
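As a sketch of that simple solution (the database ID is a placeholder), processing at the database level is a single XMLA command, and the server works out the dimension-before-cube ordering itself:

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Processing the whole database processes dimensions before the cubes that use them -->
  <Object>
    <DatabaseID>MySsasDatabase</DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>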
When Processing a Dimension or the whole cube, before you click 'Run', click the 'Change Settings...' button. There you can change the way it should process. This link describes the effect of the options available.
http://technet.microsoft.com/en-us/library/ms174774.aspx
HTH
For others who are encountering similar problems....
The reason I was occasionally getting cube processing errors is that the refresh was happening at the same time as the scheduled hourly imports.
I am now using a log table to see which SSIS package is running. When an import starts, I insert a record into this table with a "Running" status.
Before processing the cube, I have a semaphore check to see whether any records in this table are data imports with a "Running" status. I only allow the cube refresh to happen if no imports are currently running. When the cube is processing, the imports apply the same semaphore and will not start unless no cube processing is currently "Running".
After implementing this logic, I've never gotten any errors when processing the cubes.
We have an Analysis Services cube that needs to be as real-time as possible. It's a relatively small cube that currently takes a couple of seconds to process.
Are there any guidelines for this? I'm curious what other folks are doing.
Also, what would be the impact of processing the cube too frequently? Would the main concern be the load on the SSAS server and the source DB? In our case it would be fairly nominal. How would SSAS clients be affected? Current SSAS consumers are Excel, PerformancePoint, and Sharepoint/Excel Services.
I would say the first issue you'd have to consider is how much this cube is going to grow over time. If it is constantly updated and processed, that couple of seconds could quickly turn into 20 minutes.
For example, we currently have a cube with 20 million rows (probably more by now) of financial data related to hospital billing and charges; it takes about 20 minutes to process and we do it once a day in the morning. Depending on the time of year we sometimes process again during the day, but there have been no complaints as long as we notify people that we are doing this.
Have you considered a real-time (ROLAP) partition to store the current day's data? This way, you get the performance of MOLAP for all your data prior to the current day, which you can process nightly, but have ROLAP's low latency for the data collected since the last cube process.
If your cube is small enough, you could even stretch that to be the current week's data, or more.
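As a rough sketch of such a partition (IDs, table and column names are hypothetical, and the date-boundary logic is illustrative only), the current-day partition binds to a query that selects just today's rows and uses ROLAP storage:

<Partition xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ID>Fact Sales Today</ID>
  <Name>Fact Sales Today</Name>
  <!-- Bind the partition to only the rows not yet covered by the nightly MOLAP partitions -->
  <Source xsi:type="QueryBinding">
    <DataSourceID>DW</DataSourceID>
    <QueryDefinition>
      SELECT * FROM dbo.FactSales WHERE SaleDate >= CAST(GETDATE() AS date)
    </QueryDefinition>
  </Source>
  <!-- ROLAP: queries against this partition go straight to the relational source -->
  <StorageMode>Rolap</StorageMode>
</Partition>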
As far as the disadvantages of processing frequently, check out the below article, which says: "If the processing job succeeds, an exclusive lock is put on the object when changes are being committed, which means the object is temporarily unavailable for query or processing. During the commit phase of the transaction, queries can still be sent to the object, but they will be queued until the commit is completed."
http://technet.microsoft.com/en-us/library/ms174860.aspx
So your users will see an impact in query performance.
It may be that you have to 'put it out there' and track how it performs.
Once you can see how people are using the cube, you can determine if constant reprocessing is really necessary and if it is, you may have to optimise how this occurs.
Specifically, using "usage based optimisation" as described here:
http://www.databasejournal.com/features/mssql/article.php/3575751/Usage-Based-Optimization-in-Analysis-Services-2005.htm