I have an application that is driven by a Microsoft Analysis Services Multidimensional Cube.
Users access the application and periodically load data into the underlying SQL database.
The cube is processed in full overnight so users see updated data the next day.
Users may also kick off a cube partition process via the interface if they need the data to be available in the application sooner.
The cube processing is executed in the background using SQL Agent jobs with XMLA scripts.
The partition processing happens in two steps:
Dimension processing
Partition processing
The cube process is one step:
Process Full
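For reference, a minimal XMLA sketch of what such a two-step partition script could look like, assuming Process Update for the dimension and Process Full for the partition; the database, dimension, cube, measure group and partition IDs below are placeholders and would need to match the real object IDs in the project:

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Step 1: bring the dimension up to date without dropping existing data -->
  <Process>
    <Object>
      <DatabaseID>MyOlapDb</DatabaseID>
      <DimensionID>Dim Date</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <!-- Step 2: fully process the single partition the users loaded data into -->
  <Process>
    <Object>
      <DatabaseID>MyOlapDb</DatabaseID>
      <CubeID>My Cube</CubeID>
      <MeasureGroupID>My Measure Group</MeasureGroupID>
      <PartitionID>My Partition</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```

Commands placed directly inside a Batch (without a Parallel element) run sequentially, so the dimension step always completes before the partition step.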
Recently I have run into an issue where users load a significant amount of data late in the evening and kick off a partition process that then runs at the same time as the nightly cube process. Often this causes nothing worse than longer-running processing, but periodically I run into failures.
The Ask
Using XMLA, is it possible to modify the cube partition script to only run if any part of the cube is not already being processed?
I am aware that I could probably accomplish this with SSIS, but that seems like overkill if it is possible with straight XMLA.
Related
We have various SQL jobs for processing SSAS tabular models.
Each job runs daily, starting half an hour after the previous one.
Currently, we are using the Full Process mode, which consumes a lot of memory and causes some jobs to fail.
Thus, we need to understand how the other processing modes work in SSAS Tabular.
What "Process Default" will do?
What "Process Data" will do?
Do Process Data and process default modes update existing data, or only insert new ones?
Process Default checks the processed state of the object(s) you select and brings them up to a fully processed state. It will rebuild empty data tables or partitions, as well as relationships and hierarchies. The difference between Process Full and Process Default is that Full drops the data, hierarchies and relationships first and then reprocesses, whereas Default only brings them up to date. NB: Default will not process data in an object if data is already there, even if you know it is incomplete.
Process Data will process all the data from the data source but will not process relationships and hierarchies. If you only ever want to process some of the data in your ETL, then you should look at partitioning your data and just processing the partitions. There is no 'merge' aspect to Process Data.
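To make the distinction concrete, here is a sketch of the common tabular pattern: load data only, then rebuild calculated columns, relationships and hierarchies in a separate step. The database ID is a placeholder, and this XMLA form applies to models at the 1100/1103 compatibility level; 1200+ models are scripted with TMSL instead:

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Load data from the source; hierarchies and relationships are left unprocessed -->
  <Process>
    <Object>
      <DatabaseID>MyTabularDb</DatabaseID>
    </Object>
    <Type>ProcessData</Type>
  </Process>
  <!-- Rebuild calculated columns, relationships and hierarchies over the loaded data -->
  <Process>
    <Object>
      <DatabaseID>MyTabularDb</DatabaseID>
    </Object>
    <Type>ProcessRecalc</Type>
  </Process>
</Batch>
```

Splitting the work this way also lets you run Process Data on individual tables or partitions in separate jobs and finish with a single Process Recalc, which tends to spread the memory load compared with one big Process Full.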
I have a cube that is partitioned by year. I want to do a full process of only the last couple of years, as only data from this period can have been changed, added or deleted. I am unable to figure out how to specify that only certain partitions should be processed. Can anyone help me with this?
Partitions are defined on a Measuregroup. If your entire cube is time-partitioned, that sounds as though every measuregroup that's sliced by the Time dimension is time-partitioned. (You may have only one measuregroup in your cube, though).
You can see partitions in the "Partitions" tab of the Cube design window. You can process partitions here (manually, at the dev stage), or from SSMS, or by generating an XMLA script to be run on a scheduled basis. Essentially what you do is process the partition rather than the entire measuregroup.
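As a sketch, an XMLA script that fully processes only the partitions for the last couple of years could look like the one below. The database, cube, measure group and partition IDs are placeholders; the real IDs can be found by right-clicking a partition in SSMS, choosing Process, and scripting the command:

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Only the partitions whose data can still change are reprocessed -->
  <Process>
    <Object>
      <DatabaseID>MyOlapDb</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Fact Sales</MeasureGroupID>
      <PartitionID>Fact Sales 2023</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
  <Process>
    <Object>
      <DatabaseID>MyOlapDb</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Fact Sales</MeasureGroupID>
      <PartitionID>Fact Sales 2024</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```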
If you use SSIS, there is a task called the Analysis Services Processing Task. In the Processing Settings of this task you can select the objects you want to process, including one or more individual partitions.
This is the error I get from the Log while trying to process a SQL Server 2012 MOLAP Cube.
"Time-out occurred while waiting for buffer latch type 3 for page (1:2044928) database ID 2.; 42000." Source="Microsoft SQL Server 2012 Analysis Services" HelpFile="Error ErrorCode="3240034318" Description="Errors in the OLAP storage engine: An error occurred while processing the 'Measurement' partition of the measure group for the 'PE cube' cube from the Cube database."
I have scripted the processing task in XMLA and execute the processing via a SSAS Command in an Agent Job.
The first step is to Process Update all dimensions, and this succeeds, but when I run Process Data on the cube the load fails and this error pops up.
I first tried processing with an SSIS package, but this caused the whole server to crash instead of just the job failing. This leads me to believe it is a performance issue, but the machine running the job is an Azure VM with 16 processors and 112 GB RAM, so I don't know where to look. I also tried running the job without any other activity on the server, but that did not help.
The disk containing the SSAS Instance still has 500GB Free.
The measure group is querying a table containing 180 million records.
While processing the cube on a Dev server with far less data, there are no issues. I once succeeded in doing a Process Full of the whole cube directly within SSAS, but via DTEXEC, SSISDB or SSDT the processing results in a server crash.
Earlier I got different time-out errors, but after setting the SSAS ExternalCommandTimeout, ExternalConnectionTimeout and ForceCommitTimeout properties to 0 these no longer occur.
I have tried multiple processing settings, but because I think it is a performance issue I tried to make the processing as light on the server as possible.
Processing Settings:
Object: Cube; Option: Process Data;
Processing Order: Sequential with Separate Transactions;
Writeback Table Option: Use Existing;
Do not process affected objects.
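For reference, those dialog settings correspond roughly to an XMLA batch like the sketch below; the database and cube IDs are assumed to match the names from the error message and may well differ in the real project:

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
       Transaction="false" ProcessAffectedObjects="false">
  <!-- No Parallel element, so commands run sequentially in separate transactions -->
  <Process>
    <Object>
      <DatabaseID>Cube database</DatabaseID>
      <CubeID>PE cube</CubeID>
    </Object>
    <Type>ProcessData</Type>
    <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
  </Process>
</Batch>
```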
Update:
I have processed the measure group which triggered the error on its own; this did not finish, and in the Activity Monitor I saw a lot of the wait types IO_COMPLETION and CXPACKET. When querying sys.dm_exec_requests I see a SELECT with wait type IO_COMPLETION which has already been running for a long time with a lot of reads.
Last night I tried to process all measure groups excluding the one which triggered the error earlier, but unfortunately the whole server crashed again...
Update2:
We have looked into upgrading to premium storage, but this means we have to switch from A11 to a DS- or GS-series VM. That means resizing the whole VM, which contains live solutions, resulting in downtime plus the effort of restoring the VHDs and replacing the current OS disk, which contains parts of live solutions.
Another option we identified is applying partitioning or improving the underlying queries for the measures. Unfortunately that is far more effort than anticipated; a quick workaround for now would help a lot in selling a long-term improvement.
Update3:
We have had contact with Microsoft and they advise migrating from the A11 VM to a D14 v2 and upgrading to premium storage disks. This will be our next step and will be executed this coming Friday. After the migration I will update or close this post.
If any information is missing, please let me know. Any suggestions that would help me pinpoint the issue would be much appreciated!
The upgrade to a VM better suited to the situation (DS14 v2) and the switch to P30 premium storage disks have resolved the issues. The problem was not in the way the cube was being processed or configured, but in the hardware used.
Can the underlying data being used by the SSAS cube be updating while the cube is updating?
We process our cube in full once a week to clean it up (Process Update and Process Indexes during the week). However, there is a demand to process the data in full more than once a week. The data warehouse also has daily jobs to update data, and our full cube process takes 24 hours. Currently we stage our daily updates after their jobs, and the full cube processing is scheduled to avoid colliding with their data load jobs. But if we are to meet the demand of processing the data in full more often, we would run into times when the data warehouse is updating.
Does this just cause the cube processing to take longer as it waits for the underlying data changes to stop? Or, does it grab a snapshot as it goes?
Thank you!
The default is just standard read-locks. You can verify this in the Datasource for the cube - It'll probably say "Read Committed" for the isolation level. This means it'll take locks and release them as it reads. If data is modified after the read starts, it may be included in the cube process if that row hasn't been read yet.
Have you considered either snapshot isolation, or setting the database into Read Committed Snapshot mode? I did the latter with my DW and haven't looked back. I have my cube process incrementally after regular ETL loads, and with RCS I can also do SQL queries against the DW while the ETL is loading (Readers don't block writers).
I am having a strange problem when building a cube on SSAS. I have a fact table, let's say FactActivity. Then I have a dimension DimActivity, which has a 1 to 1 relationship with this fact, and all the foreign keys are bound to the dimension. So date dimensions, product dimensions and so on, are all bound to the DimActivity.
When I build the whole cube, it seems to build the fact before the dimension, and therefore it gives me errors. If, however, I manually build the dimension before the fact, it works.
Is there anywhere in SSAS where I can configure the build order, other than doing this from SSIS with the Analysis Services Processing Task?
Many thanks!
Processing a cube will not process the dimensions it relates to because they are constructed as separate entities in SSAS. In practice, this means that a dimension can exist, be processed and accessed without a relationship to a cube.
There is no such thing as a "general build order to configure". It is up to you to decide how AS objects should be processed. There are many tools that facilitate this, and they will all do the same thing: construct XMLA scripts to run on the AS server.
SSIS: Analysis Services Processing task
Configure a SQL agent job.
Perform a manual process using SSMS.
Program your processing activities using AMO
...
The important thing is that you process your dimensions before you process your cube. A simple solution is to process the entire SSAS database (containing your cubes and dimensions); this way, SSAS will automatically process the dimensions before processing the cubes.
Documentation on processing Analysis Services objects
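As a minimal sketch of the database-level approach, the XMLA below (with a hypothetical database ID) processes the entire SSAS database and leaves the dependency ordering to the server:

```xml
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- Processing the whole database processes its dimensions before its cubes -->
  <Object>
    <DatabaseID>MyOlapDb</DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>
```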
When Processing a Dimension or the whole cube, before you click 'Run', click the 'Change Settings...' button. There you can change the way it should process. This link describes the effect of the options available.
http://technet.microsoft.com/en-us/library/ms174774.aspx
HTH
For others who are encountering similar problems....
The reason I was occasionally getting cube processing errors is that the refresh was happening at the same time as the scheduled hourly imports.
I am now using a log table to see which SSIS package is running. When an import is active, I insert a record into this table with a "Running" status.
Before processing the cube, a semaphore checks whether this table contains data-import records with a "Running" status. I only allow the refresh of the cube to happen if no imports are currently running. While the cube is processing, the imports use the same semaphore and will not start unless no cube processing is currently "Running".
After implementing this logic, I've never gotten any errors when processing the cubes.