I want to create a KPI for a financial mart. I can create the KPI in SSAS, or I can calculate it in the ETL process. Which way has better performance?
Performance-wise, it's always good to do most of the processing at the ETL level, so that less processing is involved when browsing the cube and users get results faster. On the other hand, having extra columns in the fact table will increase database size and cube processing time. It all depends on your business case; see the link below for more insight.
https://social.msdn.microsoft.com/forums/sqlserver/en-US/6a3d2a3f-8a32-4ce5-8828-2998ccea6ab1/best-practice-for-calculations
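For context, calculating the KPI in ETL usually means materialising the value as a column while the fact table is loaded, so the cube only has to aggregate it. A minimal sketch in T-SQL, with all table and column names hypothetical:

-- Hypothetical ETL step: derive the margin amount once per row while
-- loading the fact table; the additive column then rolls up cheaply.
INSERT INTO dbo.FactSales (DateKey, ProductKey, SalesAmount, CostAmount, MarginAmount)
SELECT
    s.DateKey,
    s.ProductKey,
    s.SalesAmount,
    s.CostAmount,
    s.SalesAmount - s.CostAmount AS MarginAmount  -- precomputed in ETL
FROM staging.Sales AS s;

Note that ratio-style KPIs (e.g. margin percent) are still best defined in the cube as a ratio of sums; precomputing a percentage per row and then aggregating it would give wrong totals, so the ETL approach pays off mainly for additive values like the margin amount above.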
Related
I have SQL 2012 tabular and multidimensional models which are currently processed through SQL jobs. All the models are processed with the 'Process Full' option every day. However, some of the models are taking a long time to process. Can anyone tell me the best processing option that will not affect the performance of the SQL instance?
Without taking a look at your DB it's hard to know, but maybe I can give you a couple of hints:
It depends on how the data in the DB is being updated. If all the fact table data is deleted and re-inserted every night, Process Full is probably the best way to go. But maybe you can partition the data by date and process only the affected partitions, as sketched below.
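To illustrate the partitioning idea: each SSAS partition is bound to a slice of the fact table, typically one slice per month, so only the slice whose data changed needs reprocessing. Hypothetical partition source queries (table and column names are assumptions):

-- Source query for a FactSales partition covering January 2017
SELECT * FROM dbo.FactSales
WHERE DateKey >= 20170101 AND DateKey < 20170201;

-- Source query for the February 2017 partition, and so on
SELECT * FROM dbo.FactSales
WHERE DateKey >= 20170201 AND DateKey < 20170301;

The nightly job can then issue a Process Full for just the current month's partition (via XMLA or AMO) instead of processing the whole measure group.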
On a multidimensional model you should check whether aggregations are taking too much time. If so, you should consider redesigning them.
On tabular models I have found that sometimes unnecessarily big varchar fields can take huge amounts of time and memory to process. I found Kasper de Jonge's BISM Server Memory Report very helpful in identifying this kind of problem:
http://www.powerpivotblog.nl/what-is-using-all-that-memory-on-my-analysis-server-instance/bismservermemoryreport/
I am curious about what "reporting cubes" are and how they relate to Oracle SQL?
I read that they are similar to V-Lookup in Excel, but I'm not understanding much else.
Thanks!
They're rather more than that! A Cube is an Online Analytical Processing (OLAP) database, as opposed to a normal DB, which is an Online Transaction Processing (OLTP) DB. It's a database optimised for reporting, many times faster to query than an OLTP database. For example, I had a DB from which it took users up to 2 hours to get reports out. We put the data in an OLAP cube and the queries took less than 10 seconds.
This Wikipedia article is a reasonable place to start.
Note that most OLAP databases will not be updated in real time as the OLTP db is updated, but will have to have extracts made on a regular basis. Also, designing an OLAP db is not like designing an OLTP one. You need to analyse the queries the users are going to want, and split your data into Fact tables (the base data which is being reported) and Dimensions (how the users will want the data selected or summed), as sketched below. Not too difficult once you get your head round the idea, though.
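A minimal star-schema sketch of that split, with all table and column names hypothetical:

-- Dimension: how users will want the data selected or summed
CREATE TABLE DimProduct (
    ProductKey  INT           NOT NULL PRIMARY KEY,
    ProductName NVARCHAR(100) NOT NULL,
    Category    NVARCHAR(50)  NOT NULL
);

-- Fact: the base data being reported, keyed to its dimensions
CREATE TABLE FactSales (
    DateKey     INT            NOT NULL,  -- would reference a DimDate table
    ProductKey  INT            NOT NULL REFERENCES DimProduct (ProductKey),
    SalesAmount DECIMAL(18, 2) NOT NULL
);

-- A typical report: aggregate the fact, sliced by a dimension attribute
SELECT p.Category, SUM(f.SalesAmount) AS TotalSales
FROM FactSales AS f
JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY p.Category;

An OLAP cube built on top of such a schema pre-computes exactly this kind of aggregate, which is where the speed-up comes from.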
When attempting to do some benchmarking-type reports, I run into extreme slowness due to the amount of data residing in the database, and this will get incrementally worse. I'm curious what would be considered the best approach for reports that show, for example, the percentage of patients entering the hospital within a certain date range that were there due to a specific condition, as well as how that particular hospital compares to the state percentage and also the national percentage. Of course this is all based on the hospitals whose data resides in the database. I have just been writing stored procedures to calculate these percentages, but I know this isn't the best approach. I'm curious how other more experienced reporting professionals would tackle this. I'm currently using SSRS for reporting. I know a little about SSAS, but not enough to know whether I should consider it for this type of reporting.
This all depends on the data structure and the kind of calculations you have to do.
You try to narrow down the amount of data you have to process and the complexity of the operations in every possible way.
If you have lots of data on a slow system, you first try to select only the needed data, transfer it to the calculation point, and then keep it cached as long as you can.
If you have huge amounts of data you try to preprocess it as much as you can. E.g., data warehouses have a date table with year/month/day/day-of-week/week-of-year etc. in it, which the other tables simply reference; this way you can avoid time-consuming date calculations (see the sketch below).
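A sketch of such a date table, with hypothetical names; the attributes are computed once at load time instead of in every report query:

-- One row per calendar day, with derived attributes precomputed
CREATE TABLE DimDate (
    DateKey    INT      NOT NULL PRIMARY KEY,  -- e.g. 20240315
    FullDate   DATE     NOT NULL,
    [Year]     SMALLINT NOT NULL,
    [Month]    TINYINT  NOT NULL,
    [Day]      TINYINT  NOT NULL,
    DayOfWeek  TINYINT  NOT NULL,
    WeekOfYear TINYINT  NOT NULL
);

-- Fact rows carry only the key (hypothetical admissions fact)
CREATE TABLE FactAdmissions (
    DateKey      INT NOT NULL REFERENCES DimDate (DateKey),
    HospitalKey  INT NOT NULL,
    ConditionKey INT NOT NULL
);

-- Reports filter on precomputed attributes, no per-row date math
SELECT d.[Year], d.WeekOfYear, COUNT(*) AS Admissions
FROM FactAdmissions AS f
JOIN DimDate AS d ON d.DateKey = f.DateKey
WHERE d.[Year] = 2016
GROUP BY d.[Year], d.WeekOfYear;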
If the operations are complex you have to analyze them to make them simpler/faster, but at this point it is impossible to predict how much room for improvement there is (if any).
It all depends on your understanding of the data structure and the processes you need it for, in order to improve everything as much as you can.
I haven't worked with SSAS myself yet, but it is also a great tool, though (imho) more suited to lots of different kinds of analysis.
I'm new to Analysis Services
My first cube has been deployed and it seems to work.
Dimension tables are ok and fact tables are ok.
My question is very simple: if I add a new record to the related data source table,
then browse the cube, I don't see the new record until I process the cube again.
In my mind, if new records are added, the cube should reflect the changes.
How do I solve this issue? Do I need to reprocess the cube every time a new record is added? That is impossible, of course.
You understand that essentially your cube represents a bunch of aggregated measures? That means that when the cube is processed, it looks at all the data in your fact tables and processes the measures (according to the dimensions).
The result of this is that you're able to access the data in the cube quickly and efficiently. The downside, as you have mentioned, is that when new data is added to the fact table, the cube isn't updated.
Typically there will be a daily batch job that updates the cube with the latest fact data; depending on the amount of data you have and the "real-time" requirements, this could be done more than once per day. A lot of people do this out of hours.
If you look closely in BIDS you will notice on the Partitions tab that each partition has a Storage Mode which you can define.
I would recommend you read this article: http://sqlblog.com/blogs/jorg_klein/archive/2008/03/27/ssas-molap-rolap-and-holap-storage-types.aspx
Basically, there are a few different modes you can use:
MOLAP (Multidimensional Online Analytical Processing)
MOLAP is the most used storage type. It's designed to offer maximum query performance to the users. Data AND aggregations are stored in an optimized format in the cube. The data inside the cube will refresh only when the cube is processed, so latency is high.
ROLAP (Relational Online Analytical Processing)
ROLAP does not have the high latency disadvantage of MOLAP. With ROLAP, the data and aggregations are stored in relational format. This means that there will be zero latency between the relational source database and the cube.
The disadvantage of this mode is performance: it gives the poorest query performance, because no objects benefit from multidimensional storage.
HOLAP (Hybrid Online Analytical Processing)
HOLAP is a storage type between MOLAP and ROLAP. Data will be stored in relational format (ROLAP), so there will also be zero latency with this storage type.
Aggregations, on the other hand, are stored in multidimensional format (MOLAP) in the cube to give better query performance. SSAS listens for notifications from the source relational database; when changes are made, SSAS gets a notification and processes the aggregations again.
With this mode it’s possible to offer zero latency to the users but with medium query performance compared to MOLAP and ROLAP.
To get real-time reporting without having to reprocess your cube you will need to try out ROLAP, but beware: performance will suffer (depending on the size of your cube and server!).
I'm a few months into developing a reporting solution. Currently I am loading a relational data warehouse (Fact and Dimension tables) using SSIS. SSAS cubes and dimensions are then created from the relational Data warehouse. I then use SSRS to build reports using MDX queries.
The problem I have is that things are starting to get rather complicated trying to understand how multidimensional modelling works as well as MDX and cubes. Since the organization it's being designed for is rather small, I'm thinking that I should re-evaluate my approach.
I think maybe I should just eliminate SSAS from the picture and simply create reports that report directly off the relational data warehouse using SQL queries. The relational data warehouse could still be loaded nightly to allow up to date data for reporting.
I'm just wondering if that would be a good idea, considering I'm not very experienced with data warehousing and SSAS. Also, I wanted to know whether keeping my relational data warehouse in dimension and fact tables would still work with SQL queries, or whether I would need to redesign the tables. I don't want to make the decision to eliminate SSAS if that will end up causing more headaches or issues.
The reports will not include complicated calculations besides row counts and YTD percentages, for example "How many callers were male?" and "How many callers called for Product A?", each broken down by month.
Any comments or suggestions are much appreciated, because I'm starting to feel rather frustrated trying to get SSAS cubes developed properly.
I was in a similar situation at my company. I had never used SSAS, and I was asked to do research on the benefits of using cubes to do some reporting. It was a pretty steep learning curve because my background is in development not data and reporting. SSAS is most useful when aggregate queries on a relational database are time consuming and if reports need to be broken down into hierarchies that an analyst can use to better understand the state of the business. Since SSAS stores aggregate info, queries of that nature are very quick. If your organization's data is small, the relational queries might be quick enough that you don't really need the benefit of storing aggregates.
Also, you need to take into consideration the maintainability of using SSAS. If you're having trouble figuring out SSAS and MDX, how easy a time will others have? I tried to explain an MDX query I wrote to my boss, who is experienced with SQL, but it's really quite different from relational queries. How easy is it going to be to add more complex reports?
A benefit to using SSAS is it can put the analyst in control of the report. Second, there are great tools and support. Finally, it's pretty easy to deploy and connect.
Yes, you can remove SSAS from your architecture, because all the results you can get from an MDX query against SSAS you can also get from a T-SQL query against your data warehouse, since the cube was built by reading data from the DW. BUT bear in mind the following: the main advantage of an OLAP cube, in my view, is aggregations.
A very simple explanation: let's say you have a fact table called orders with 1 million orders per month. If you want to know how much you sold in that month, using SQL you need to read row by row and sum the values to produce the total; that's about 1 million reads on your DB. If you have a cube with the proper aggregations configured, that value can be pre-calculated and pre-stored in your cube, so finding out how much you sold in a month takes only one read of the cube.
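To make that concrete in SQL terms (a sketch only; FactOrders and its columns are hypothetical, and a real cube aggregation is managed by SSAS rather than hand-built):

-- Without aggregations: every request scans ~1 million fact rows
SELECT SUM(OrderAmount) AS MonthTotal
FROM FactOrders
WHERE OrderMonth = 201601;

-- A cube aggregation behaves conceptually like a pre-stored summary:
CREATE TABLE AggOrdersByMonth (
    OrderMonth INT            NOT NULL PRIMARY KEY,
    MonthTotal DECIMAL(18, 2) NOT NULL
);

-- The same question then costs a single-row lookup
SELECT MonthTotal
FROM AggOrdersByMonth
WHERE OrderMonth = 201601;

The relational counterpart in SQL Server would be an indexed view or a summary table maintained by your ETL, which can recover some of this benefit without a cube.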
It's a matter of analysing your situation: if you have a small cube, maybe aggregations are not necessary and you can do fine with SQL, but depending on the situation they can be very helpful.