Can the underlying data being used by the SSAS cube be updating while the cube is updating?
We process our cube in full once a week to clean it up (Process Update and Process Indexes run during the week). However, there is demand to run the full process more than once a week. The data warehouse also has daily jobs to update data, and our full cube process takes 24 hours. Currently our daily updates are staged after their jobs, and the full cube processing is scheduled to avoid colliding with their data load jobs. But if we are to meet the demand of processing the data in full more often, we would run into times when the data warehouse is updating.
Does this just cause the cube processing to take longer as it waits for the underlying data changes to stop? Or, does it grab a snapshot as it goes?
Thank you!
The default is just standard read locks. You can verify this in the data source for the cube; it will probably say "Read Committed" for the isolation level. This means it takes shared locks and releases them as it reads. If data is modified after the read starts, it may be included in the cube process if that row hasn't been read yet.
Have you considered either snapshot isolation, or setting the database into Read Committed Snapshot mode? I did the latter with my DW and haven't looked back. I process my cube incrementally after the regular ETL loads, and with RCS I can also run SQL queries against the DW while the ETL is loading (writers no longer block readers).
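If you want to try it, a minimal sketch, assuming a data warehouse database named DW:

    -- Optional: allow sessions to explicitly request SNAPSHOT isolation.
    ALTER DATABASE DW SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- Make the default READ COMMITTED level use row versioning instead of shared locks.
    -- ROLLBACK IMMEDIATE kicks out other connections so the change can complete.
    ALTER DATABASE DW SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;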
I have an application that is driven by a Microsoft Analysis Services Multidimensional Cube.
Users access the application and load data periodically to the underlying SQL database.
The cube is processed fully overnight so users see updated data the next day.
Users may also kick off a cube partition process via the interface if they need data to be available in the application sooner.
The cube processing is executed in the background using SQL Agent jobs with XMLA scripts.
The partition processing happens in two steps:
Dimension processing
Partition processing
The cube process is one step:
Process Full
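For illustration, the partition job issues an XMLA batch roughly like the following (the object IDs and process types here are placeholders, not the actual ones):

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <!-- Step 1: bring the dimension up to date without dropping it. -->
      <Process>
        <Type>ProcessUpdate</Type>
        <Object>
          <DatabaseID>MyOlapDatabase</DatabaseID>
          <DimensionID>Dim Date</DimensionID>
        </Object>
      </Process>
      <!-- Step 2: reprocess the partition the users just loaded into. -->
      <Process>
        <Type>ProcessFull</Type>
        <Object>
          <DatabaseID>MyOlapDatabase</DatabaseID>
          <CubeID>MyCube</CubeID>
          <MeasureGroupID>Fact Loads</MeasureGroupID>
          <PartitionID>Fact Loads Current</PartitionID>
        </Object>
      </Process>
    </Batch>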
Recently, I have run into an issue where users load a significant amount of data late in the evening and kick off a partition process that then runs at the same time the cube is being processed. Often this isn't a problem, other than longer-running processing, but I have run into failures periodically.
The Ask
Using XMLA, is it possible to modify the partition-processing script so that it runs only if no part of the cube is already being processed?
I am aware that I could probably accomplish this with SSIS, but that seems like overkill if it is possible with straight XMLA.
We have various SQL jobs for processing SSAS tabular models.
Each job runs daily, starting half an hour after the previous one.
Currently we are using Process Full, which consumes a lot of memory and causes some jobs to fail.
Thus, we need to understand how the other processing modes work in SSAS Tabular.
What "Process Default" will do?
What "Process Data" will do?
Do Process Data and process default modes update existing data, or only insert new ones?
Process Default checks the processed state of the object(s) you select for processing and brings them up to a fully processed state. It will rebuild empty data tables or partitions, relationships, and hierarchies. The difference between Process Full and Process Default is that Full will drop data, hierarchies, and relationships first and then reprocess, while Default will just bring them up to date. NB: Default will not process data in an object if data is already there, even if you know it is incomplete.
Process Data will process all the data from the data source but will not process relationships and hierarchies. If you only ever want to process some of the data in your ETL, then you should look at partitioning your data and just processing the partitions. There is no 'merge' aspect to Process Data.
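As a rough illustration, for an older compatibility-level (1100/1103) tabular model, Process Data on a single table plus the usual follow-up Process Recalc can be issued via XMLA along these lines (the database and table IDs are placeholders; 1200+ models use the equivalent TMSL refresh types instead):

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <!-- Process Data: reload rows for one table (tables appear as dimensions in tabular XMLA). -->
      <Process>
        <Type>ProcessData</Type>
        <Object>
          <DatabaseID>MyTabularModel</DatabaseID>
          <DimensionID>Sales</DimensionID>
        </Object>
      </Process>
      <!-- Process Recalc: rebuild calculated columns, relationships and hierarchies afterwards. -->
      <Process>
        <Type>ProcessRecalc</Type>
        <Object>
          <DatabaseID>MyTabularModel</DatabaseID>
        </Object>
      </Process>
    </Batch>

Process Default against the same objects would simply use <Type>ProcessDefault</Type> and only touch whatever is not already in a processed state.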
I have a cube that is partitioned by year. I want to do a full process of only the last couple of years, since only data from this period can have changed, been added, or deleted. I can't figure out how to specify that only certain partitions should be processed. Can anyone help me with this?
Partitions are defined on a measure group. If your entire cube is time-partitioned, that sounds as though every measure group that's sliced by the Time dimension is time-partitioned. (You may have only one measure group in your cube, though.)
You can see partitions in the "Partitions" tab of the cube design window. You can process partitions there (manually, at the dev stage), from SSMS, or by generating an XMLA script to be run on a scheduled basis. Essentially, you process the partition rather than the entire measure group.
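For example, a scheduled XMLA batch that fully processes only the two most recent year partitions could look roughly like this (the database, cube, measure group, and partition IDs are placeholders for your own):

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Parallel>
        <!-- Only the partitions named here are touched; older years are left alone. -->
        <Process>
          <Type>ProcessFull</Type>
          <Object>
            <DatabaseID>MyOlapDatabase</DatabaseID>
            <CubeID>MyCube</CubeID>
            <MeasureGroupID>Fact Sales</MeasureGroupID>
            <PartitionID>Fact Sales 2023</PartitionID>
          </Object>
        </Process>
        <Process>
          <Type>ProcessFull</Type>
          <Object>
            <DatabaseID>MyOlapDatabase</DatabaseID>
            <CubeID>MyCube</CubeID>
            <MeasureGroupID>Fact Sales</MeasureGroupID>
            <PartitionID>Fact Sales 2024</PartitionID>
          </Object>
        </Process>
      </Parallel>
    </Batch>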
If you use SSIS, there is a task called the Analysis Services Processing Task. In the Processing Settings of this task you can select the objects you want to process, including one or more individual partitions.
Every day we have an SSIS package running to import data into a database.
Sometimes people are querying the database at the same time.
The loading (data import) times out because there is a table lock on the specific table.
What is the standard protocol for inserting data and querying data at the same time?
First you need to figure out where those locks are coming from. Use the link below to see whether any locks are being held.
How to: Determine Which Queries Are Holding Locks
If another process holds a table lock, there is not much you can do.
Are you sure the error is "not able to OBTAIN a table lock"? If so, look at changing your SSIS package so that it does not take table locks (for example, by clearing the Table lock option on the OLE DB Destination).
There are several strategies.
One approach is to design your ETL pipeline so as to minimize lock time. All the data is prepared in staging tables and then, when complete, is switched in using fast partition switch operations; see Transferring Data Efficiently by Using Partition Switching. This way the ETL blocks reads only for a very short duration. It also has the advantage that the reads see all the ETL data at once, not intermediate stages. The drawback is a more difficult implementation.
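A minimal sketch of the switch-in step, assuming a partitioned fact table dbo.FactSales, a matching staging table dbo.FactSales_Staging loaded by the ETL, and an empty archive table dbo.FactSales_Old (the table names and partition number are placeholders):

    -- Load and index the staging table first, outside of any user-facing locks.
    -- The switch itself is a metadata-only operation, so the blocking window is tiny.
    BEGIN TRAN;

    -- Move the old contents of partition 5 out of the live table...
    ALTER TABLE dbo.FactSales
        SWITCH PARTITION 5 TO dbo.FactSales_Old;

    -- ...and switch the freshly loaded staging data in to take its place.
    ALTER TABLE dbo.FactSales_Staging
        SWITCH TO dbo.FactSales PARTITION 5;

    COMMIT;

The staging and archive tables must match the fact table's schema, sit on the same filegroup as the target partition, and the staging table needs a trusted check constraint matching the partition boundary.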
Another approach is to enable snapshot isolation and/or read committed snapshot in the database; see Row Versioning-based Isolation Levels in the Database Engine. This way reads no longer block behind the locks held by the ETL. The drawback is resource consumption: the hardware must be able to handle the additional load of row versioning.
Yet another approach is to move the data querying to a read-only standby server, e.g. using log shipping.
We have an Analysis Services cube that needs to be as real-time as possible. It's a relatively small cube that currently takes a couple of seconds to process.
Are there any guidelines for this? I'm curious what other folks are doing.
Also, what would be the impact of processing the cube too frequently? Would the main concern be the load on the SSAS server and the source DB? In our case it would be fairly nominal. How would SSAS clients be affected? Current SSAS consumers are Excel, PerformancePoint, and SharePoint/Excel Services.
I would say the first issue you'd have to consider is how much this cube is going to grow over time. If it is constantly updated and processed, that couple of seconds could quickly turn into 20 minutes.
For example, we currently have a cube with 20 million rows (probably more now, hehe) of financial data related to hospital billing and charges that takes about 20 minutes to process, and we do it once a day in the morning. Depending on the time of year we sometimes process again during the day, but there have been no complaints as long as we notify people that we are doing this.
Have you considered a real-time (ROLAP) partition to store the current day's data? This way, you get the performance of MOLAP for all your data prior to the current day, which you can process nightly, but have ROLAP's low latency for the data collected since the last cube process.
If your cube is small enough, you could even stretch that to be the current week's data, or more.
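As a rough sketch, the XMLA definition of such a current-day ROLAP partition might look like this (all IDs, the data source, and the query are placeholders; normally you would define it in the cube designer rather than by hand):

    <Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <ParentObject>
        <DatabaseID>MyOlapDatabase</DatabaseID>
        <CubeID>Sales</CubeID>
        <MeasureGroupID>Fact Sales</MeasureGroupID>
      </ParentObject>
      <ObjectDefinition>
        <Partition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
          <ID>Fact Sales Today</ID>
          <Name>Fact Sales Today</Name>
          <!-- Bind the partition to just the current day's rows. -->
          <Source xsi:type="QueryBinding">
            <DataSourceID>DW</DataSourceID>
            <QueryDefinition>
              SELECT * FROM dbo.FactSales WHERE OrderDate &gt;= CAST(GETDATE() AS date)
            </QueryDefinition>
          </Source>
          <!-- ROLAP: queries against this partition go straight to the relational source. -->
          <StorageMode>Rolap</StorageMode>
        </Partition>
      </ObjectDefinition>
    </Create>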
As far as the disadvantages of processing frequently, check out the article below, which says: "If the processing job succeeds, an exclusive lock is put on the object when changes are being committed, which means the object is temporarily unavailable for query or processing. During the commit phase of the transaction, queries can still be sent to the object, but they will be queued until the commit is completed."
http://technet.microsoft.com/en-us/library/ms174860.aspx
So your users will see an impact in query performance.
It may be that you have to 'put it out there' and track how it performs.
Once you can see how people are using the cube, you can determine if constant reprocessing is really necessary and if it is, you may have to optimise how this occurs.
Specifically, using "usage based optimisation" as described here:
http://www.databasejournal.com/features/mssql/article.php/3575751/Usage-Based-Optimization-in-Analysis-Services-2005.htm