Question about measures in SSAS Tabular: is it better to have the physical measure in the fact table as a column and then do a simple SUM measure? e.g.
Imagine the scenario: I have a measure called Income in the fact table, but the user wants to see ProductA Income and ProductB Income as individual measures (not using the Income measure with the product dimension, which, yes, gives the same result).
Or is it better to do a DAX calculation with a sum and a filter based on the product dimension? e.g. ProductB Income := CALCULATE(SUM('fact'[Income]); VALUES('Product Type dim'); ('Product Type dim'[Product Type] = "ProductB"))
I have tried both methods and both return the same result... I just want to know what would be best practice here (fact table around 300 million rows).
The fewer the measures, the more performant your report will be in the end. I would recommend, in this situation, just sticking to a simple SUM() measure and applying it to each product. This would be the most efficient approach; doing a calculation on 300 million rows whilst filtering and invoking VALUES will definitely slow you down.
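For illustration, a minimal sketch of the two styles being compared, reusing the table and column names from the question (the measure names are only placeholders):
-- Simple base measure over the physical column
Total Income := SUM ( 'fact'[Income] )
-- Filtered variant, equivalent to the CALCULATE/VALUES expression in the question
ProductB Income := CALCULATE ( SUM ( 'fact'[Income] ), 'Product Type dim'[Product Type] = "ProductB" )
The boolean filter form is just shorthand for filtering the product-type column, so it should return the same numbers as the question's expression; the debate is only about maintainability and speed.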
DC07, how are your end users viewing the model? Are they using it in PowerPivot (Excel) or Power BI? I would suggest performing those calculations either in the visualization layer (Power BI) or in PowerPivot as a way of displaying the calculation.
Good afternoon everyone, I have a problem with the result of aggregating a calculated member in a cube. Namely: I have two measure groups, Sales and Product, with two measures, the sales qty and the product weight, and I need to calculate the qty of products sold in kg.
I created a calculated member where I multiplied these two measures.
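For illustration, a minimal sketch of the kind of calculated member described; the measure names [Sales Qty] and [Product Weight] are assumptions:
CREATE MEMBER CURRENTCUBE.[Measures].[Sold Qty Kg] AS
    [Measures].[Sales Qty] * [Measures].[Product Weight],
    FORMAT_STRING = "#,##0.00",
    VISIBLE = 1;
A calculated member like this is evaluated after aggregation, so at higher levels it multiplies the already-aggregated totals rather than summing the row-level products, which matches the aggregation problem described above.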
I understand that it would be more correct to implement this directly in the fact table, but there are some difficulties with that.
I found many examples using the SCOPE operator, but there are problems with that as well; and since in the DSV I use named queries rather than tables, I can't create a named calculation on the fact table. Are there any other solutions to the problem?
An example picture is presented below.
Thank you in advance.
My picture link is in a comment.
Just to give you an overview, I am very new to SSAS cubes and MDX calculations in general. I'm currently having a hard time grasping the concept of querying an OLAP cube.
So say I have a Movement dimension with a unique key, MovementID. I'm trying to create a calculation in SSAS that counts the total number of distinct MovementIDs in this dimension. So, a few questions:
Can I create this using the New Measure option in SSAS? It throws an error if there is a MovementID in the dimension that is not present in my fact table, so I'm trying to do this in an MDX calculation instead. Is my understanding here correct?
How do I write this script as an SSAS MDX calculation? The SQL equivalent would simply be SELECT COUNT(PFNo) FROM Dim_Movement. I know that an OLAP query works on axes rather than the traditional table format, so how can I achieve a simple DISTINCT COUNT of a column in an MDX script?
If I were to make the script compute the SUM of the DISTINCT COUNT of the column instead, what would the script look like?
Thanks!
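For reference, a hedged sketch of how such a count is often written in a cube's MDX calculation script; every dimension, hierarchy, and measure name here is an assumption and needs to match the actual model:
CREATE MEMBER CURRENTCUBE.[Measures].[Movement Count] AS
    Count( [Movement].[MovementID].[MovementID].MEMBERS ),
    VISIBLE = 1;
Count defaults to INCLUDEEMPTY, so MovementIDs with no rows in the fact table are still counted, matching the SELECT COUNT(PFNo) FROM Dim_Movement intent.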
I would like to understand how OLAP-cube operations (i.e. drilling up/down, slicing/dicing and pivoting) and MDX are related.
My current guess is that OLAP-cube operations are to MDX what relational algebra is to SQL. However, I do not see how some basic features of MDX correspond to OLAP-cube operations. For example, consider the following query on the demo "Sales" cube that comes with icCube:
SELECT {([Ottawa],[2009]), ([United States],[Feb 2010])} on Rows,
[Measures].members on Columns
FROM [Sales]
How does the use of tuples (e.g. ([Ottawa],[2009])) correspond to an OLAP-cube operation?
Yes, "OLAP-cube operations are what visualization tools are expected to implement". MDX is the query language that is executed against a cube that produces a result. OLAP clients generally run MDX against a cube. "OLAP cube operations" as described in that wikipedia are usually as a result of a person performing adhoc analysis against a cube in an client application.
Cube provide a structure and an access language that usually makes these types of operations easier (or at least faster)
How does MDX relate to a "drill down" operation? for example?
Firstly, some MDX has already been run and yielded some kind of view of the cube (generally some rows, some columns, and a measure at the intersection, although the MDX syntax doesn't limit you to two axes).
A person sees this information and decides to drill down on a single item in the rows (an item that was previously returned by some MDX). The OLAP client then generates some MDX that provides the drilled-down view of that item.
It might just add the children MDX function to the item in question, or it might do it some other way. It depends on the client.
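As a sketch reusing the [Sales] query from the question, drilling down on [Ottawa] might turn into nothing more than:
SELECT {[Ottawa].children} on Rows,
       [Measures].members on Columns
FROM [Sales]
The exact MDX generated depends on the client, as noted above.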
Here's some introductory info on how you can eavesdrop on the interactions between an OLAP client (which one doesn't matter) and an SSAS cube:
https://learn.microsoft.com/en-us/sql/analysis-services/instances/introduction-to-monitoring-analysis-services-with-sql-server-profiler
You can think of an MDX query as specifying areas or regions of space within a cube - the tuple is the primary way of giving the processor co-ordinates that correspond to the part of the cube you are interested in.
It is the intersection of the co-ordinates and slices you specify that gives you a result.
MDX is strongly related to set theory, as the main types within the cube are dimensions, sets, tuples, members, etc.
An MDX query defines a table, and for each table cell we have a tuple. In your scenario, assuming we have two measures (Meas1, Meas2):
([Ottawa],[2009],[Meas1]) ([Ottawa],[2009],[Meas2])
([United States],[Feb 2010],[Meas1]) ([United States],[Feb 2010],[Meas2])
On top of these cell tuples you might add the WHERE clause, the subquery, and default members that are different from ALL (not advised). Remember, ALL is a 'special' member that is ignored.
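For example, adding a slicer in the WHERE clause simply appends one more coordinate to every cell tuple; the [Product].[Drink] member here is only an assumed example, not necessarily part of the demo cube:
SELECT {([Ottawa],[2009]), ([United States],[Feb 2010])} on Rows,
       [Measures].members on Columns
FROM [Sales]
WHERE [Product].[Drink]
Every cell tuple above now also contains the [Drink] coordinate.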
A tuple contains a single measure, Meas1 or Meas2; this selects the 'fact table' with a measure column, usually a numerical value. The other members of the tuple select rows in that table, and the aggregation defined by the measure (sum, min, max, ...) is applied to all rows matching the tuple members, Ottawa and 2009 for example. As whytheq explains, you have a lot of transformations to 'play' with members as you would with sets.
This is the simplified picture: you can also use calculated members that define a transformation instead of a simple row aggregation (e.g. the difference with the previous year), and some aggregations are a bit trickier (open, close, ...).
But if you understand this well, you have a solid foundation for understanding MDX.
Using an accumulating snapshot fact table, I have multiple role-playing date dimensions in my Tabular cube.
Users would like to be able to see when ANY of the date events occurred during a given period (as opposed to ALL of the date events, which is quite natural in the tool).
This is essentially an OR statement.
I have tried adding another instance of the date dimension and then joining all of the role-playing dimensions to it (shown below), but am not having much success.
Not fully shown, but indicated are two fact tables related to the dimensions as well.
How can I essentially apply an OR condition to multiple dimensions from a pivot table?
The problem at hand is to retrieve the number of orders in a given month that are Received, Returned, or Invoiced. As in:
Time Period = January 2016
Received Count = 20
Returned Count = 16
Invoiced Count = 32
Thus, a fact record with ReceivedDateSID = 20160101 and ReturnedDateSID = 20160115 and InvoicedDateSID = 20160130 should count once in each measure.
One straightforward approach that will perform great but require 3x more memory is to:
Have one DimDate
Have FactReceipt, FactReturn, FactInvoice. FactReceipt joins to DimDate on ReceiptDateKey. FactReturn joins to DimDate on ReturnedDateKey. FactInvoice joins to DimDate on InvoiceDateKey.
You can even put a WHERE clause on the SQL views defining those fact tables. For example, FactReturn only needs the 1% of orders which are actually returned.
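A minimal sketch of such a view; the source table name FactOrderSnapshot and the OrderKey column are assumptions:
CREATE VIEW dbo.FactReturn AS
SELECT OrderKey,
       ReturnedDateSID AS ReturnedDateKey  -- joins to DimDate
FROM   dbo.FactOrderSnapshot
WHERE  ReturnedDateSID IS NOT NULL;        -- keep only the orders that were actually returned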
I personally prefer this approach to tons of complex DAX. Anytime a measure (like Returned Count) only makes sense with one role-playing date dimension, I consider this approach.
You may also consider hiding all but one of those fact tables and putting all the calculated measures inside the one main fact table. That may reduce confusion for your users, though drillthrough wouldn't work properly then.
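Under that layout the measures themselves can stay trivial; a hedged sketch, using the table names from this answer (the measure definitions are assumptions):
Received Count := COUNTROWS ( FactReceipt )
Returned Count := COUNTROWS ( FactReturn )
Invoiced Count := COUNTROWS ( FactInvoice )
Selecting January 2016 on DimDate then filters each fact table through its own relationship, so an order is counted by every measure whose event date falls in that month, which is the behaviour requested in the question.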
I think I know what a domain table is (it basically contains all the possible values that some other column can contain), and I've looked up dimension table on Wikipedia. Unfortunately, I'm having a hard time understanding the description they have there, because they explain it in terms of another piece of jargon: "fact table", which is explained as consisting of "the measurements, metrics or facts of a business process." To me, that's very tautological, which is not helpful. Can someone explain this in plain English?
Short version:
Domains represent data you've pulled out of your fact table to make the fact table smaller.
Dimensions represent axes that you've pre-aggregated along for faster querying.
Here is a long version in plain English:
You start with some set of facts. For instance every single sale your company has received, with date, product, price, geographical location, customer name - whatever your full combination of information might be - for each sale. You can put these facts into a huge table.
A large variety of queries that you want to run are in principle some fairly simple query on your fact table. However, your fact table is freaking huge. You need to make the queries faster.
(1) The first trick to making it faster is to move data out of it so it is smaller. So you can take every column that is "long text", put its possible values into a domain table, and replace the original column with an id into that table. This will make your fact table much smaller, and you can still get at your original data if you need it. This makes it much faster to query all rows since they take up less data.
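A hedged sketch of trick (1); all table and column names are made up:
-- Domain table: each distinct "long text" value stored once
CREATE TABLE ProductDomain (
    ProductId   INT PRIMARY KEY,
    ProductName VARCHAR(200) NOT NULL
);
-- Fact table: the long text column is replaced by a small integer id
CREATE TABLE SalesFact (
    SaleDate   DATE,
    ProductId  INT REFERENCES ProductDomain (ProductId),
    State      VARCHAR(50),
    Price      DECIMAL(12,2),
    CustomerId INT
);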
That's fine if you have a small enough data set that querying the whole fact table is acceptably fast. But a lot of companies have too much data for this to be enough, so they have to be smarter.
(2) The second trick to making it faster is to pre-compute queries. Here is one way to do this. Identify a set of dimensions, and then pre-compute along dimensions and combinations of dimensions.
For instance, customer name is one dimension: some queries are per customer name, and others are across all customers. So you can add to your fact table pre-computed facts that have pre-aggregated data across all customers, and customer name has become a dimension.
Another good candidate for a dimension is geographical location. You can add summary records that aggregate by county, by state, and across all locations. This summarizing is done after you've done the customer-name aggregation, and so it will automatically have a record for total sales for all customers in a given zip code.
Repeat for any number of other dimensions.
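A hedged sketch of trick (2), continuing with the made-up names above: a summary table pre-aggregated along the geographical dimension, so per-state queries no longer have to scan the individual sale rows:
-- Pre-computed facts aggregated by state
-- (the exact CREATE TABLE ... AS syntax varies by database)
CREATE TABLE SalesByState AS
SELECT State,
       SUM(Price) AS TotalSales,
       COUNT(*)   AS SaleCount
FROM   SalesFact
GROUP BY State;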
Now when someone comes along with a query, odds are good that their query can be rewritten to take advantage of your pre-aggregated dimensions to only look at a few pre-aggregated facts, rather than all of the individual sales records. This will greatly speed up queries.
In practice, this winds up pre-aggregating more than you really need. So people building data warehouses do clever things that let them trade off the effort spent pre-aggregating combinations that nobody may ever want against the run-time effort of having to compute, on the fly, a combination that could have been done in advance.
You can start with http://en.wikipedia.org/wiki/Star_schema if you want to dig deeper on this topic.
Fact Tables and Dimension Tables, taken together, make up a Star Schema. A Star Schema is a representation, in SQL tables, of a Multidimensional data model. A multidimensional data model stores statistics, "facts", as values in a multidimensional space, where the "location" in each dimension establishes part of the context for the fact. The multidimensional data model was developed in the context of advancing the concept of data warehousing.
Dimension tables provide a key to each dimension, and attributes relevant to that dimension.
An MDDB can be stored in a data cube specially built for that purpose instead of using an SQL (relational) database. Cognos is one vendor that has its own data cube product out there. There are some advantages to using an SQL database and a star schema over a special-purpose data cube product, and other advantages to using a data cube product. Sometimes the advantages of the SQL-plus-star-schema approach outweigh those of a data cube product.
Some of the advantages obtained by normalization can be obtained by designing a snowflake schema instead of a star schema. However, neither the star schema nor the snowflake schema is going to be free from update anomalies. They are generally used in data warehousing or reporting databases, and copying data from an operational DB into one of these databases is something of a programming challenge. There are tools sold for this purpose.
A fact table is a table which contains the measurements, metrics, or facts of business processes. Examples:
"monthly sales number" in the Sales business process
"monthly profit amount" in the Profit business process
Most of them are additive (sales, profit), some are semi-additive (balance as of a date), and some are non-additive (unit price).
The level of detail in a fact table is called the "grain" of the table, i.e. the granularity can be fine or coarse. A fact table also contains foreign keys to the dimension tables.
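A hedged sketch (with made-up names) of a monthly-grain sales fact table with its foreign keys and measures:
CREATE TABLE DimProduct (
    ProductKey  INT PRIMARY KEY,      -- surrogate key
    ProductName VARCHAR(100),
    Category    VARCHAR(50)
);
CREATE TABLE FactMonthlySales (
    MonthKey     INT,                                    -- FK to a date/month dimension
    ProductKey   INT REFERENCES DimProduct (ProductKey), -- FK to DimProduct
    SalesAmount  DECIMAL(12,2),                          -- additive measure
    UnitPrice    DECIMAL(12,2)                           -- non-additive measure
);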
Dimension tables, on the other hand, are tables which contain attributes that help describe the facts in the fact table.
The following are types of Dimension Tables:
Slowly Changing Dimensions
Junk Dimensions
Conformed Dimensions
Degenerate Dimensions
To learn more, you can go through data warehousing tutorials.