What is the recommended practice in SSAS: to have a data source per DWH or per Cube?
In all the samples I've seen, there is a single data source defined, but only one cube.
You can have multiple cubes in one project.
The data source is defined for the whole project.
A separate data source for each cube is not necessary and is sometimes redundant.
I was wondering if anyone here knows the exact differences between these two modes, more specifically:
What can we do in one model that we can't do with the other? (Multi-dimensional vs Tabular and vice versa)
How is the data stored in one model versus another?
If I am writing an SSRS / Power BI / Excel report against this, what limitations does one model have over the other?
Does the tabular model have cubes? If not, what is the alternative storage medium and how does it differ from cubes (and maybe provide some background on what cubes are to begin with)?
What are the differences in security considerations? As I understand it, with the Multidimensional model, row-level, column-level, level, and even cell-level security can be applied. What is available for the Tabular model?
Also, as I understand it, SQL Server 2016 is moving to the Tabular model by default, and there may be some differences/improvements over what is currently in use (SQL Server 2014). Can you please provide a list of what those are?
Thank you so much in advance.
A good place to start might be these articles which should be accurate as to the differences in SSAS 2014.
Advice on the decision points for choosing to build a Tabular or Multidimensional model
Paul Turley’s high-level description of Tabular strengths and weaknesses
Dimension relationships
Summary level presentation
Many-to-many relationships, writeback, scope statements, and non-visual dimension security are some of the biggest missing features in SSAS 2014 Tabular, in my opinion.
Tabular security is row-based and supports only visual totals, not non-visual totals or cell security. But in many cases you don't want to use cell security anyway, for performance reasons.
Tabular uses in-memory columnar storage; Multidimensional uses disk-based row-based storage. So scanning a billion-row fact table in Multidimensional requires reading all columns from disk, and a query against a fact table that large takes a minute or two to return. If you optimize the Multidimensional model by building an aggregation, the query may take seconds. Tabular scans only the columns used in the query, so simple queries or calculations even on a billion-row table may return in under a second.
With SSAS 2016 Tabular the bidirectional relationship was added which was a very big deal for modeling flexibility and allowing many-to-many relationships. And parallel partition processing made loading large models feasible.
The SQL Server 2017 installer has Tabular as the default mode for SSAS.
If you have the option of using SSAS 2016 Tabular or above, it is highly recommended for performance and modeling flexibility. Here is what's new in SSAS 2016 and SSAS 2017.
New to DW concepts and SSAS. I'm reading a lot that normalized relational DBs are optimal for OLTP because the typical workload is many small, single-transaction batches, and that denormalization is generally better for DW/BI applications because reporting queries are more batch-oriented... there were other reasons that I don't recall right now.
It sounds like the advice is to create a denormalized model, populate it from the base relational model, and then build your cubes off the denormalized model. Assuming you're using the MOLAP storage type, your cube will store and incrementally update your data in a multidimensional store that it builds behind the scenes.
So now we have essentially the same data stored three times!
Am I reading that right? Why do we even need that intermediate denormalized model? It can't be to optimize report queries, because those are run against the multidimensional SSAS data store. Why not just build your cubes against a DSV whose definition is basically a set of views over the relational DB?
The Multidimensional model needs the relational data to be available as star schemas (that is what you are calling the "denormalized model") for loading. And in many cases there is some processing involved: combining data from different sources, keeping the data for reporting longer than it is needed in the OLTP world, or keeping historical views (like old regional or department structures) available for analysis, which are not needed and hence get overwritten in the OLTP world. Hence this intermediate step makes sense in many cases. You might also want clear cut-off times, i.e. always report data for complete days (or, in some cases, months) rather than having some data for the last day available and some not. That makes comparing numbers for a day easier than comparing, e.g., today's sales containing only the data up to 10 o'clock with the sales of the whole day yesterday.
In some simple cases, the intermediate relational data structure need not be available physically. A few days ago I prepared a prototype cube where the star schema was just a set of views on the source data. In this case, of course, the data was only physically available in the original source form and in the cube. The structure of the source data did not make the views that inefficient, and thus data loading to the cube was fast enough for the prototype.
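To make that concrete, here is a minimal sketch of the "star schema as views" idea, with entirely hypothetical table and column names: a denormalized dimension view built over normalized source tables, which the cube's data source view can then treat as part of a star schema without a physical copy of the data.

```sql
-- Hypothetical normalized source tables: Sales.Customer and Sales.CustomerCategory.
-- The view flattens them into a single denormalized customer dimension that the
-- cube's data source view (DSV) can use as if it were a physical dimension table.
CREATE VIEW dw.DimCustomer
AS
SELECT
    c.CustomerID    AS CustomerKey,      -- key the fact table joins on
    c.CustomerName,
    cc.CategoryName AS CustomerCategory, -- attribute denormalized from the lookup table
    c.City,
    c.Country
FROM Sales.Customer AS c
JOIN Sales.CustomerCategory AS cc
    ON cc.CategoryID = c.CategoryID;
```

Note that with MOLAP storage the cube still copies the data into its own multidimensional store when it is processed, so views like this only remove the physical intermediate tables, not the copy held inside the cube.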
Long-time SQL Server relational DBA building my second-ever cube in my first-ever SSAS database (using 2008 R2).
I am trying to add to that second cube a database dimension already associated with the first cube. No matter how I launch it, the Add Cube Dimension dialog lists only the dimensions that have already been associated with the second cube, not the one I want (nor any others) already associated with the first cube. Based on multiple web-search results, I was under the impression that database dimensions are designed to be added to multiple cubes within the same database, and that this should be easy. What am I missing?
I believe what I'm asking is nearly identical to the question posed here, but I haven't had the same epiphany that the original poster did. The table behind the dimension I want is already in the DSV.
I have created a dimensional model which is similar in structure to the financial reporting design in the AdventureworksDW environment, where the value of each account is held as a single value column in the fact table and the dimensions give the data its semantic meaning.
There are over a thousand columns in this model so it works well for adding or deleting additional columns. Here is a really good blog on this design: http://garrettedmondson.wordpress.com/2011/10/26/dimensional-modeling-financial-data-in-ssas/
Although this model works well for querying the dimensional model, and there are examples supporting this model for dimensional analysis, I'm concerned that this model is not standard for cube development or data mining which seem to prefer wider tables.
Questions:
Is this design categorized as Entity-Attribute-Value (EAV)?
Would a design using multiple fact tables be better? That is, many wide fact tables (up to 10) with 200-300 columns each, but fewer rows.
Should I expect more performance issues with the much wider tables?
You are right: that specific design is considered an EAV model.
By using such a design, you can easily add new accounts, hierarchies, etc. You don't need to update your model.
I would not recommend the one-column-per-measure approach. Most accounts will be null in most of the rows. Also, with such a design you need to read all of your measures even if you only need to retrieve one of them.
We heavily use the account dimension in our cubes. Unfortunately, things like shared members are not as easy to handle in SSAS as they are in Essbase.
You need to create an Account dimension that is parent-child, and you also need to have the key of this account dimension in the fact table as usual.
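A minimal sketch of what that can look like relationally, with made-up table and column names (the UnaryOperator and AccountType columns are just the attributes SSAS typically maps for rollup and time-balance behaviour):

```sql
-- Hypothetical parent-child account dimension: ParentAccountKey points back at
-- the same table, which SSAS exposes as a parent-child hierarchy.
CREATE TABLE dbo.DimAccount
(
    AccountKey       int           NOT NULL PRIMARY KEY,
    ParentAccountKey int           NULL REFERENCES dbo.DimAccount (AccountKey),
    AccountName      nvarchar(100) NOT NULL,
    UnaryOperator    nchar(1)      NULL,  -- '+', '-', '~', ... controls how the member rolls up
    AccountType      nvarchar(50)  NULL   -- e.g. Asset, Liability, Flow; drives time balance
);

-- Single-value-column fact table (the EAV-style design discussed above):
-- one row per date/account combination instead of one column per measure.
CREATE TABLE dbo.FactFinance
(
    DateKey    int            NOT NULL,
    AccountKey int            NOT NULL REFERENCES dbo.DimAccount (AccountKey),
    Amount     decimal(19, 4) NOT NULL
);
```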
By using the account dimension, you get nice support for time balance functionality. The built-in time balance functionality of SSAS is supposed to be faster than custom MDX code.
We are converting unary operators and parent-child relationships to formulas at the moment.
So basically we have normal formulas, and parents in hierarchies also work as formulas.
In the end we flatten the hierarchy, so it is not possible to drill down in the account dimension. We use the account dimension as a calculation engine only.
It is possible to have proper hierarchies as well, but we decided not to mix custom rollup members and unary operators at the same time.
Shared members and all our formulas are implemented as custom rollup members.
I'm new to Analysis Services.
My first cube has been deployed and it seems to work.
Dimension tables are OK and fact tables are OK.
My question is very simple: if I add a new record to the related data source table and then browse the cube, I don't see the new record until I process the cube again.
In my mind, if new records are added, the cube should reflect the changes.
How do I solve this? Do I need to reprocess the cube every time a new record is added? That is impossible, of course.
You understand that essentially your cube represents a bunch of aggregated measures? That means that when the cube is processed it looks at all the data that is in your fact tables and processes the Measures (according to the dimensions).
The result is that you're able to access the data in the cube quickly and efficiently. The downside, as you have mentioned, is that when new data is added to the fact table, the cube isn't updated.
Typically there will be a daily batch job that updates the cube with the latest fact data; depending on the amount of data you have and your "real-time" requirements, this could be done more than once per day. A lot of people do this out of hours.
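As a rough sketch of that batch job (all object names below are placeholders, not anything from the original question): the processing command itself is XMLA, and it can be scheduled from a SQL Server Agent job, either in an "SQL Server Analysis Services Command" job step or, as shown here, pushed through a linked server that points at the SSAS instance.

```sql
-- Assumes a linked server named SSAS_OLAP created with the MSOLAP provider
-- (RPC Out enabled) and an Analysis Services database whose ID is MyOlapDatabase.
-- Both names are hypothetical.
DECLARE @xmla nvarchar(max) = N'
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>MyOlapDatabase</DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>';

-- Pass the XMLA processing command through to the SSAS instance.
EXEC (@xmla) AT SSAS_OLAP;
```

Schedule that (or an equivalent Agent/SSIS processing step) overnight and the cube picks up whatever has landed in the fact tables since the last run.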
If you look closely in BIDS, you will notice on the Partitions tab that each partition has a Storage Mode you can define.
I would recommend you read this article: http://sqlblog.com/blogs/jorg_klein/archive/2008/03/27/ssas-molap-rolap-and-holap-storage-types.aspx
Basically, there are a few different modes you can use:
MOLAP (Multidimensional Online Analytical Processing)
MOLAP is the most commonly used storage type. It's designed to offer maximum query performance to the users. Data AND aggregations are stored in optimized format in the cube. The data inside the cube is refreshed only when the cube is processed, so latency is high.
ROLAP (Relational Online Analytical Processing)
ROLAP does not have the high-latency disadvantage of MOLAP. With ROLAP, the data and aggregations are stored in relational format, which means there is zero latency between the relational source database and the cube.
The disadvantage of this mode is performance: it gives the poorest query performance because no objects benefit from multidimensional storage.
HOLAP (Hybrid Online Analytical Processing)
HOLAP is a storage type between MOLAP and ROLAP. Data is stored in relational format (ROLAP), so there is also zero latency with this storage type.
Aggregations, on the other hand, are stored in multidimensional format (MOLAP) in the cube to give better query performance. SSAS listens for notifications from the source relational database; when changes are made, SSAS gets a notification and processes the aggregations again.
With this mode it's possible to offer zero latency to the users, with query performance somewhere between MOLAP and ROLAP.
To get real-time reporting without having to reprocess your cube, you will need to try out ROLAP. But beware: performance will suffer (depending on the size of your cube and server!).