How do the different process modes in SSAS Tabular work?

We have various SQL jobs for processing SSAS tabular models.
Each job runs daily, half an hour after the previous one.
Currently, we are using the Full Process mode, which consumes a lot of memory and causes some jobs to fail.
So we need to understand how the other processing modes work in SSAS Tabular.
What does "Process Default" do?
What does "Process Data" do?
Do the Process Data and Process Default modes update existing data, or only insert new rows?

Process Default checks the processing state of the object(s) you select and brings them up to a fully processed state. It loads data into empty tables or partitions and rebuilds relationships and hierarchies where needed. The difference between Process Full and Process Default is that Full drops data, hierarchies and relationships first and then reprocesses everything, whereas Default just brings unprocessed objects up to date. NB: Default will not reprocess data in an object that already contains data, even if you know that data is incomplete.
Process Data will process all the data from the data source but will not process relationships and hierarchies. If you only ever want to process some of the data in your ETL, then you should look at partitioning your data and just processing the partitions. There is no 'merge' aspect to Process Data.
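To make the modes concrete, here is a minimal sketch that builds a TMSL `refresh` command for a chosen mode. It assumes a model at compatibility level 1200 or higher (where TMSL applies); the database, table and partition names are hypothetical, and the mapping from the SSMS mode names to TMSL refresh types is my reading of the documentation rather than anything stated in the question.

```python
import json

# Assumed mapping from SSMS processing-mode names to TMSL refresh types.
MODE_TO_TMSL_TYPE = {
    "Process Full": "full",
    "Process Default": "automatic",
    "Process Data": "dataOnly",
    "Process Recalc": "calculate",
    "Process Clear": "clearValues",
}


def build_refresh(mode, database, table=None, partition=None):
    """Return a TMSL refresh command (as a dict) targeting one object."""
    target = {"database": database}
    if table is not None:
        target["table"] = table
    if partition is not None:
        if table is None:
            raise ValueError("a partition needs its parent table")
        target["partition"] = partition
    return {"refresh": {"type": MODE_TO_TMSL_TYPE[mode], "objects": [target]}}


if __name__ == "__main__":
    # Hypothetical names: reload the data of a single partition only.
    cmd = build_refresh("Process Data", "SalesModel", "FactSales", "FactSales_2024")
    print(json.dumps(cmd, indent=2))
```

The resulting JSON could be pasted into an XMLA query window in SSMS or into a SQL Agent job step. Targeting one partition with Process Data is what keeps the memory footprint below that of a full database process.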

Related

Check and process an SSAS cube when it's not already processing

I have an application that is driven by a Microsoft Analysis Services Multidimensional Cube.
Users access the application and load data periodically to the underlying SQL database.
The cube is processed fully overnight so users can see updated data the next day.
Users may also kick off a cube partition process via the interface if they need data to be available in the application sooner.
The cube processing is executed in the background using SQL Agent jobs with XMLA scripts.
The partition processing happens in two steps:
Dimension processing
Partition processing
The cube process is one step:
Process Full
Recently, I have run into an issue where users load a significant amount of data late in the evening and kick off a partition process that then runs at the same time as the full cube process. Often this isn't an issue, other than longer-running processing, but I have periodically run into failures.
The Ask
Using XMLA, is it possible to modify the cube partition script to only run if any part of the cube is not already being processed?
I am aware that I could probably accomplish this with SSIS, but it seems overkill if there is a possibility to use straight XMLA.

Process only certain partitions of cube

I have a cube that is partitioned by year. I want to run a full process on only the last couple of years, as only data from this period can have been changed, added or deleted. I am unable to figure out how to specify that only certain partitions should be processed. Can anyone help me with this?
Partitions are defined on a measure group. If your entire cube is time-partitioned, that sounds as though every measure group that's sliced by the Time dimension is time-partitioned. (You may have only one measure group in your cube, though.)
You can see partitions in the "Partitions" tab of the Cube design window. You can process partitions here (manually, at the dev stage), or from SSMS, or by generating an XMLA script to be run on a scheduled basis. Essentially what you do is process the partition rather than the entire measuregroup.
If you use SSIS, there is a task called Analysis Services Processing Task. In the Processing Settings of this task you can select the kinds of objects you want to process, including one or more partitions for a table.
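As a rough illustration of the "generate an XMLA script" option, the sketch below builds a `Batch` that processes only the listed partitions. The object IDs are hypothetical placeholders; in practice you would copy the real IDs from a script generated by SSMS.

```python
import xml.etree.ElementTree as ET

XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def build_partition_batch(database_id, cube_id, measure_group_id,
                          partition_ids, process_type="ProcessFull"):
    """Build an XMLA <Batch> that processes only the listed partitions."""
    batch = ET.Element("Batch", {"xmlns": XMLA_NS})
    # Partitions inside <Parallel> may be processed concurrently.
    parallel = ET.SubElement(batch, "Parallel")
    for partition_id in partition_ids:
        process = ET.SubElement(parallel, "Process")
        obj = ET.SubElement(process, "Object")
        ET.SubElement(obj, "DatabaseID").text = database_id
        ET.SubElement(obj, "CubeID").text = cube_id
        ET.SubElement(obj, "MeasureGroupID").text = measure_group_id
        ET.SubElement(obj, "PartitionID").text = partition_id
        ET.SubElement(process, "Type").text = process_type
    return ET.tostring(batch, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical IDs: process only the two most recent yearly partitions.
    print(build_partition_batch("SalesDW", "Sales", "Fact Sales",
                                ["Sales_2023", "Sales_2024"]))
```

The output can be scheduled as an "SQL Server Analysis Services Command" step in a SQL Agent job, so only the recent years are reprocessed.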

SSAS update the cube while the underlying data is updating

Can the underlying data being used by the SSAS cube be updating while the cube is updating?
We process our cube in full once a week to clean it up (with process update and process indexes during the week). However, there is a demand to process the data in full more than once a week. The data warehouse also has daily jobs to update data, and our full cube process takes 24 hours. Currently we stage our daily updates after their jobs, and the full cube processing is scheduled to avoid colliding with their data load jobs. But if we are to meet the demand for more frequent full processing, we would run into times when the data warehouse is updating.
Does this just cause the cube processing to take longer as it waits for the underlying data changes to stop? Or, does it grab a snapshot as it goes?
Thank you!
The default is just standard read locks. You can verify this in the data source for the cube: it'll probably say "Read Committed" for the isolation level. This means it takes locks and releases them as it reads. If data is modified after the read starts, it may be included in the cube process if that row hasn't been read yet.
Have you considered either snapshot isolation, or setting the database into Read Committed Snapshot mode? I did the latter with my DW and haven't looked back. I have my cube process incrementally after regular ETL loads, and with RCS I can also do SQL queries against the DW while the ETL is loading (Readers don't block writers).

Building whole SSAS cube does not work, building dimension by dimension works - build order?

I am having a strange problem when building a cube on SSAS. I have a fact table, let's say FactActivity. Then I have a dimension DimActivity, which has a 1 to 1 relationship with this fact, and all the foreign keys are bound to the dimension. So date dimensions, product dimensions and so on, are all bound to the DimActivity.
When I build the whole cube, it seems to build the fact before the dimension, and therefore it gives me errors. If, however, I manually build the dimension before the fact, it works.
Is there anywhere in the SSAS that I can configure the build order, other than doing this from SSIS with the use of the Analysis Services Processing Task?
Many thanks!
Processing a cube will not process the dimensions it relates to because they are constructed as separate entities in SSAS. In practice, this means that a dimension can exist, be processed and accessed without a relationship to a cube.
There is no such thing as a "general build order to configure". It is up to you to decide how AS objects should be processed. There are many tools that facilitate this, and they will all do the same thing: construct XMLA scripts to run on the AS server.
SSIS: Analysis Services Processing task
Configure a SQL agent job.
Perform a manual process using SSMS.
Program your processing activities using AMO
...
The important point is that you should process your dimensions before you process your cube. A simple solution is to process the entire SSAS database (containing your cubes and dimensions). This way, SSAS will automatically process the dimensions before processing the cubes.
Documentation on processing Analysis Services objects
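If you prefer to control the order yourself rather than process the whole database, one option is an XMLA batch that lists the dimensions first and the cube last. The sketch below generates such a batch. The IDs are hypothetical, and it relies on my understanding that `Process` commands placed directly in a `Batch` (outside a `Parallel` element) run one after another.

```python
import xml.etree.ElementTree as ET

XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"


def add_process(parent, object_ids, process_type):
    """Append one <Process> command for the object described by object_ids."""
    process = ET.SubElement(parent, "Process")
    obj = ET.SubElement(process, "Object")
    for tag, value in object_ids:
        ET.SubElement(obj, tag).text = value
    ET.SubElement(process, "Type").text = process_type


def build_ordered_batch(database_id, dimension_ids, cube_id):
    """Build a batch that processes every dimension before the cube."""
    batch = ET.Element("Batch", {"xmlns": XMLA_NS})
    for dimension_id in dimension_ids:
        add_process(batch,
                    [("DatabaseID", database_id), ("DimensionID", dimension_id)],
                    "ProcessFull")
    # The cube comes last, once all of its dimensions are processed.
    add_process(batch,
                [("DatabaseID", database_id), ("CubeID", cube_id)],
                "ProcessFull")
    return ET.tostring(batch, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical IDs based on the names used in the question.
    print(build_ordered_batch("ActivityDW", ["DimDate", "DimActivity"], "Activity"))
```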
When Processing a Dimension or the whole cube, before you click 'Run', click the 'Change Settings...' button. There you can change the way it should process. This link describes the effect of the options available.
http://technet.microsoft.com/en-us/library/ms174774.aspx
HTH
For others who are encountering similar problems....
The reason I was occasionally getting cube processing errors is that the cube refresh was running at the same time as the scheduled hourly imports.
I am now using a log table to see which SSIS package is running. When an activity import starts, it inserts a record into this table with a "Running" status.
Before processing the cube, a semaphore checks whether this table contains any data-import records with a "Running" status. I only allow the cube refresh to happen if no imports are currently running. The imports have a semaphore too: they will not start while any cube processing is "Running".
After implementing this logic, I've never gotten any errors when processing the cubes.
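For anyone wanting to try the same idea, here is a minimal sketch of the log-table semaphore. It uses an in-memory SQLite database purely so the example is runnable; the table and column names (`job_log`, `job_type`, `status`) are made up for illustration, and in SQL Server you would additionally need the check and the insert to be atomic (for example via a locking hint or an application lock).

```python
import sqlite3


def make_log(conn):
    """Create the hypothetical log table used as a semaphore."""
    conn.execute(
        "CREATE TABLE job_log ("
        "id INTEGER PRIMARY KEY, job_type TEXT, status TEXT)"
    )


def try_start(conn, job_type, blocking_type):
    """Mark job_type as Running unless a blocking_type job is Running.

    Returns the new log row id, or None if the caller has to wait.
    """
    with conn:  # commits the insert when the block exits
        running = conn.execute(
            "SELECT COUNT(*) FROM job_log "
            "WHERE job_type = ? AND status = 'Running'",
            (blocking_type,),
        ).fetchone()[0]
        if running:
            return None
        cur = conn.execute(
            "INSERT INTO job_log (job_type, status) VALUES (?, 'Running')",
            (job_type,),
        )
        return cur.lastrowid


def finish(conn, row_id):
    """Release the semaphore by marking the job as Done."""
    with conn:
        conn.execute("UPDATE job_log SET status = 'Done' WHERE id = ?", (row_id,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_log(conn)
    import_id = try_start(conn, "import", blocking_type="cube")
    print(try_start(conn, "cube", blocking_type="import"))  # blocked by the import
    finish(conn, import_id)
    print(try_start(conn, "cube", blocking_type="import") is not None)
```

Both sides call the same function with the roles swapped, so the import and the cube refresh each wait for the other.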

Processing partitions takes longer than processing entire database

I have a Tabular Model cube where I have split the tables into partitions to make processing more efficient.
When I Process Full the daily partition only, it takes 2h 45m. However, when I Process Full the entire database (that includes daily and historical data), it takes 1h 10m.
Does anyone know what could be causing this?
Thanks!
ProcessFull within a Tabular model basically is a combination of ProcessData (grab the data from the source, build dictionaries, etc.) and ProcessReCalc (build up indexes, attribute hierarchies, etc.). While the ProcessData is only grabbing the most recent data (i.e. the data for the partition), the ProcessReCalc itself needs to be executed on the entire database. A good reference is Cathy Dumas' blog post: http://cathydumas.com/2012/01/25/processing-data-transactionally-in-amo/
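One way to see where the time goes is to run the two phases as separate commands and time each. The sketch below builds them as TMSL `refresh` commands, which assumes a model at compatibility level 1200 or higher (older models would use AMO or XMLA instead); all object names are hypothetical.

```python
import json


def data_then_recalc(database, table, partition):
    """Return two TMSL commands: load one partition, then recalc the database."""
    load = {
        "refresh": {
            "type": "dataOnly",  # ProcessData on the single partition
            "objects": [
                {"database": database, "table": table, "partition": partition}
            ],
        }
    }
    recalc = {
        "refresh": {
            "type": "calculate",  # ProcessRecalc always covers the whole database
            "objects": [{"database": database}],
        }
    }
    return [load, recalc]


if __name__ == "__main__":
    # Hypothetical names: run each command on its own and compare durations.
    for command in data_then_recalc("SalesModel", "FactSales", "FactSales_Daily"):
        print(json.dumps(command, indent=2))
```

If the `calculate` step dominates, the partition itself is not the bottleneck.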
To find the cause of the slow processing, it's best to dig into the Profiler traces / logs to determine which actions are taking a very long time to complete. By any chance, does your data contain a lot of repeating values, as audit logs do? It may be faster to process the entire database (vs. a single partition) because the engine can compress and organize repeated data more efficiently, which takes up less memory. A potential way to check this is to compare the model size after running ProcessFull on the partition vs. running it on the entire database. If this is the case, processing the entire database will result in a smaller database.
HTH!