We have an Analysis Services cube that needs to be as real-time as possible. It's a relatively small cube that currently takes a couple of seconds to process.
Are there any guidelines for this? I'm curious what other folks are doing.
Also, what would be the impact of processing the cube too frequently? Would the main concern be the load on the SSAS server and the source DB? In our case that would be fairly nominal. How would SSAS clients be affected? Current SSAS consumers are Excel, PerformancePoint, and SharePoint/Excel Services.
I would say the first issue you'd have to consider is how much this cube is going to grow over time. If it is constantly updated and processed, that couple of seconds could quickly turn into 20 minutes.
For example, we currently have a cube with 20 million rows (probably more by now) of financial data related to hospital billing and charges. It takes about 20 minutes to process and we do it once a day in the morning. Depending on the time of year we sometimes process again during the day, and there have been no complaints as long as we notify people we are doing this.
Have you considered a real-time (ROLAP) partition to store the current day's data? This way, you get the performance of MOLAP for all your data prior to the current day, which you can process nightly, but have ROLAP's low latency for the data collected since the last cube process.
If your cube is small enough, you could even stretch that to be the current week's data, or more.
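To make the split concrete, here is a minimal sketch of how the two partitions' source queries might be defined, assuming a hypothetical dbo.FactSales table with an OrderDate column (the names are placeholders, not from the original question):

```sql
-- MOLAP partition: all history up to (but not including) today; processed nightly
SELECT *
FROM dbo.FactSales
WHERE OrderDate < CAST(GETDATE() AS date);

-- ROLAP partition: today's rows, read live from the relational source at query time
SELECT *
FROM dbo.FactSales
WHERE OrderDate >= CAST(GETDATE() AS date);
```

In practice you would usually drive the boundary from a small control table or a view rather than GETDATE(), so the cut-over only moves once the MOLAP partition has actually been reprocessed; otherwise a late nightly process can leave a gap or an overlap between the two partitions.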
As far as the disadvantages of processing frequently, check out the below article, which says: "If the processing job succeeds, an exclusive lock is put on the object when changes are being committed, which means the object is temporarily unavailable for query or processing. During the commit phase of the transaction, queries can still be sent to the object, but they will be queued until the commit is completed."
http://technet.microsoft.com/en-us/library/ms174860.aspx
So your users will see an impact on query performance whenever a process commits.
It may be that you have to 'put it out there' and track how it performs.
Once you can see how people are using the cube, you can determine if constant reprocessing is really necessary and if it is, you may have to optimise how this occurs.
Specifically, using "usage based optimisation" as described here:
http://www.databasejournal.com/features/mssql/article.php/3575751/Usage-Based-Optimization-in-Analysis-Services-2005.htm
Related
I'm trying to make sense of Process Default behavior on SSAS 2017 Enterprise Edition.
My cube is processed daily in this standard sequence:
Loop through 30 dimensions, performing Process Add or Process Update as required.
Process approximately 80 partitions for the previous day.
Execute a Process Default as the final step.
Everything works just fine, and for the amount of data involved, performs really well. However I have observed that after the process default completes, if I re-run the process default step manually (with no other activity having occurred whatsoever), it will take exactly the same time as the first run.
My understanding was that this step basically scans the cube looking for unprocessed objects and will process any objects found to be unprocessed. Given the flow of dimension processing, and subsequent partition processing, I'd certainly expect some objects to be unprocessed on the first run - particularly aggregations and indexes.
The end to end processing time is around 65 mins, but 10 mins of this is the final process default step.
One explanation would be that the Process Default isn't actually finding anything to do, and the elapsed time is just the cost of scanning the metadata. But that seems an excessive amount of time, and if I don't run the step the cube doesn't come online, which suggests it is definitely doing something.
I've had a trawl through Profiler to try to find events to capture what process default is doing, but I'm not able to find anything that would capture the event specifically. I've also monitored the server performance during the step, and nothing is under any real load.
Any suggestions or clarifications?
In our company we have an "office Monday": every office/shop/department (roughly 2,000+ distinct users) generates its reports that day, especially the shops (SSRS reports connecting to a tabular model at the 1500 compatibility level). We see very high resource usage for 3+ hours (100% CPU across multiple cores), and the session queue keeps growing and never flushes. A report that takes 2 minutes off-peak can take more than an hour due to the overload. This is an on-premises machine. The problem doesn't occur for the rest of the week (the workload is about 10 times lower and peak CPU usage is under 30%).
Unfortunately, from a business point of view, we cannot spread the load over the remaining days of the week. We also have no influence on how many users will run the reports at a given time (load distribution throughout the day).
What we have tried already:
Rewriting report queries from the old MDX to DAX (always checking the performance of each query with Server Timings in DAX Studio)
Rewriting measures to be less expensive
Tuning the model (for example, switching columns to less memory-hungry data types and removing unused columns)
We can't migrate this model to Azure.
We can't make any hardware changes on this machine.
Maybe we can change some server properties? Model properties? Connection properties?
Can we control which reports/queries Tabular should keep cached when it runs short of resources? For example, for a group of store reports that we know will generate many similar queries (e.g. only the store number changes).
Any advice?
If you are reporting on the previous week, could you automate SSRS to output the reports on Sunday night?
Does a simple-recovery database record transaction log entries when data is selected from a full-recovery database? I mean, we have a full-recovery database, and it records too much transaction log, causing its size to grow.
My question is: does the simple-recovery database still do its minimal logging even if the data is selected from a full-recovery-model database? Thank you!
One thing has nothing to do with the other. Where the data comes from does not affect logging of changes to the tables in the db it's going to.
However, as Martin Smith pointed out, this is solving a symptom: there's naff all point in having full recovery mode on if you (they?) aren't backing up the transaction logs frequently enough to make the overhead useful. The whole point of them, aside from restoring up to a particular transaction in the event of some catastrophe in your applications, is speed and granularity.
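As a starting point, a minimal T-SQL sketch of checking the recovery model and backing up the log often enough to keep it in check ([YourDb] and the backup path are placeholders):

```sql
-- Which recovery model is each database actually using?
SELECT name, recovery_model_desc
FROM sys.databases;

-- Back up the transaction log so its space can be reused;
-- under full recovery this typically runs every 15-60 minutes via SQL Server Agent
BACKUP LOG [YourDb]
TO DISK = N'D:\Backups\YourDb_log.trn'
WITH COMPRESSION;
```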
Please read the MSDN page for recovery models.
http://msdn.microsoft.com/en-us/library/ms189275.aspx
Here is a quick summary from MSDN.
1 - Simple model - Automatically reclaims log space to keep space requirements small, essentially eliminating the need to manage the transaction log space.
2 - Bulk-Logged model - An adjunct of the full recovery model that permits high-performance bulk copy operations.
**The first two do not support point-in-time recovery!**
3 - Full model - Can recover to an arbitrary point in time (for example, prior to application or user error). If no tail-log backup is possible, recover to the end of the last log backup.
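For completeness, switching models is a single command per database; a sketch using a hypothetical [YourDb] (dropping to SIMPLE breaks the log backup chain, so take a full or differential backup after switching back to FULL):

```sql
-- Give up point-in-time recovery in exchange for a self-truncating log
ALTER DATABASE [YourDb] SET RECOVERY SIMPLE;

-- Return to full recovery, then restart the log backup chain
ALTER DATABASE [YourDb] SET RECOVERY FULL;
BACKUP DATABASE [YourDb] TO DISK = N'D:\Backups\YourDb_full.bak';
```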
So your problem is with either log usage or log backups.
A - Are you deleting from temporary tables instead of truncating? http://msdn.microsoft.com/en-us/library/ms177570.aspx The DELETE operation will log each row in the transaction log, whereas TRUNCATE is minimally logged (see the sketch after this list).
B - Are you inserting large amounts of data via an ETL job? Each insert will get logged in the T-Log.
If you use bulk copy or an ETL tool that supports fast (bulk) loads, the operation will be minimally logged.
However, page density and fill factor come into play when determining the size of the T-LOG.
http://blogs.msdn.com/b/sqlserverfaq/archive/2011/01/07/using-bulk-logged-recovery-model-for-bulk-operations-will-reduce-the-size-of-transaction-log-backups-myths-and-truths.aspx
C - How often are you taking transaction log backups? After each backup, the T-Log space can be reused, resulting in an overall smaller size.
D - How fragmented is the T-LOG? I suggest reducing and re-growing the log during a maintenance period. A 20% log to data ratio with hourly backups worked fine at my old company. It all depends on how many changes you are making. http://craftydba.com/?p=3374
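To make items A and B concrete, a minimal T-SQL sketch; dbo.StagingSales and the file path are hypothetical placeholders:

```sql
-- A: TRUNCATE is minimally logged, while DELETE writes one log record per row
TRUNCATE TABLE dbo.StagingSales;      -- prefer this for full clear-downs
-- DELETE FROM dbo.StagingSales;      -- fully logged; grows the T-Log

-- B: a bulk load with TABLOCK can be minimally logged
-- (under the SIMPLE or BULK_LOGGED recovery model, into a heap or empty table)
BULK INSERT dbo.StagingSales
FROM 'D:\Feeds\sales_extract.csv'
WITH (TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
```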
In summary, these are the places you should be looking, not the old data in the system, since it is probably not being modified.
Moving the old data to a read-only reporting database for ad hoc queries from novice T-SQL users might not be a bad idea, but that solves other problems: possible blocking and deadlocks in your OLTP database.
Can the underlying data being used by the SSAS cube be updating while the cube is updating?
We process our cube in full once a week to clean it up (Process Update and Process Indexes during the week). However, there is a demand to process the data in full more than once a week. The data warehouse also has daily jobs to update data, and our full cube process takes 24 hours. Currently we stage our daily updates after their jobs, and the full cube processing is scheduled to avoid colliding with their data load jobs. But if we are to meet the demand of processing the data more often, we would run into times when the data warehouse is updating.
Does this just cause the cube processing to take longer as it waits for the underlying data changes to stop? Or, does it grab a snapshot as it goes?
Thank you!
The default is just standard read locks. You can verify this in the data source for the cube - it'll probably say "Read Committed" for the isolation level. This means it'll take locks and release them as it reads. If data is modified after the read starts, it may be included in the cube process if that row hasn't been read yet.
Have you considered either snapshot isolation, or setting the database into Read Committed Snapshot mode? I did the latter with my DW and haven't looked back. I have my cube process incrementally after regular ETL loads, and with RCS I can also do SQL queries against the DW while the ETL is loading (Readers don't block writers).
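A minimal T-SQL sketch of both options, assuming a warehouse database called [YourDW] (a placeholder); test it outside production first, since row versioning adds some tempdb overhead:

```sql
-- Option 1: read committed snapshot for the whole database
-- (needs a moment with no other active connections, hence ROLLBACK IMMEDIATE)
ALTER DATABASE [YourDW] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Option 2: allow snapshot isolation and opt in per connection
ALTER DATABASE [YourDW] SET ALLOW_SNAPSHOT_ISOLATION ON;
-- then either SET TRANSACTION ISOLATION LEVEL SNAPSHOT in the session,
-- or set the cube data source's Isolation property to Snapshot
```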
I'm a newbie to SSAS. I have a database with an agreement table in which the status of the agreements changes over time; this is stored in the agreement log, and the status can be any combination over an extended period of time. One set of questions I will need to answer is how many agreements are of a given status, and also to show trends in the status over time. I'm reading Kimball, and a periodic snapshot seems to be the best fit, but I'm at a loss how to design the fact table. Do I pre-aggregate the data into periods broken down by status? And then how do I manipulate it in SSAS, and how do aggregations work, since it's more like a bank balance? I sort of get some of the concepts but I'm still pretty confused.
Agreed, this is a good case for Periodic Snapshot.
In this case, you need a status dimension,
and a fact with a period indicator. Your reports will also need to filter on the period.
ETL is a bit more complicated, as during the current period you clear down and reload the current period's data. Periods prior to the current one are fixed. Obviously, you lose visibility of statuses that change multiple times within a period, so the period should be chosen based on how quickly the data changes as well as how often it's reported. This is also why periodic snapshots are often used in conjunction with transaction fact tables.
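As a rough illustration (all names hypothetical, not from the original question), the snapshot grain is one row per agreement per period:

```sql
-- Status dimension: one row per possible agreement status
CREATE TABLE dbo.DimAgreementStatus (
    AgreementStatusKey int IDENTITY(1,1) PRIMARY KEY,
    StatusCode         varchar(20)  NOT NULL,
    StatusDescription  varchar(100) NOT NULL
);

-- Periodic snapshot fact: the status each agreement was in at the end of each period
CREATE TABLE dbo.FactAgreementStatusSnapshot (
    PeriodKey          int NOT NULL,  -- e.g. 202401; joins to the date/period dimension
    AgreementKey       int NOT NULL,  -- joins to the agreement dimension
    AgreementStatusKey int NOT NULL,  -- joins to dbo.DimAgreementStatus
    CONSTRAINT PK_FactAgreementStatusSnapshot PRIMARY KEY (PeriodKey, AgreementKey)
);
```

Counting agreements by status is then just a row count within one period. Like a bank balance, the count is semi-additive: in SSAS, don't sum it across periods; either use a LastNonEmpty-style aggregation or make sure reports always filter on a single period, as noted above.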