I have this fact table
I want to create a calculated member like this : S = the sum of the "TOT_LIV" that have the "NUM_PROD" = 1
For example:
S should be equal to 50
How could I do it ?
Assuming you have a measure defined on this fact table named "TOT LIV", this measure has "Sum" defined as its aggregate method, and that NUM_PROD is a foreign key to a dimension named Prod having an attribute named Num Prod which I assume being based on the primary key of the dimension, and hence it would be 1 as well for the records that are references by the NUM_PROD primary key, you would use
([Measures].[TOT LIV], [Prod].[Num Prod].[1])
You see there is no 1:1 translation from SQL to MDX, a lot of things depend on cube setup, which predefines many behaviors that you have to repeat again and again in SQL queries (like the usage of sum).
Related
I have a data set with three dimensions that I would like to store for use with a website:
A list of companies (about 1000)
Information about the company (about 15 things)
Time (monthly)
Essentially, I want to track this information over time and keep it up to date.
When I start, the data will be 1000x15x1, after a year it will be 1000x15x12, and after 10 years if will be 1000x15x120.
The main queries I would make are:
Get all information for one company over all times
Get all information for one particular time
What would be a good database configuration for doing this? I'm open to either SQL or noSQL solutions.
In case it matters, the website is on Google App Engine.
From the relational database schema design perspective:
If the goal is analytics / ad-hoc querying / OLAP in general only, then you can use star-schema which is well suited for these type of analytics. But beware, OLAP databases are de-normalized and not suitable for operational transaction storage / OLTP in general, if you are planning to do both on this database.
The beauty of the Star schema:
The fact tables are usually all numeric, making the tables very small even though there are too many records. Small table means it is very fast to read (I/O).
All joins from the fact table to dimension tables are based on foreign keys (single column, numeric, indexable foreign keys)
All dimension tables have surrogate key, which is a single column primary key. Single column primary key is easier to JOIN than a multi-column primary key and also easier to index.
There is no NULL in foreign keys in fact tables. This makes JOIN operations straightforward, i.e. always JOIN fact table to all of its dimension tables. If you need NULL case, you need to add that as a special case in your dimension table. For example: if a company is not listed on stock market, and one of the thing you track is stock price, then you enter 0 or NULL into the fact for the stock price table depending on (how you want to do SUM(), AVG() etc later) and then add a special case into your StockSymbols dimension table called 'Private company' and add the foreign key of this special case into the fact table as your foreign key.
Almost all filtering is done through the dimension tables that are much much smaller than the fact tables. This requires having a Date dimension to be able to do date-based queries.
If you can stay in pure Star schema, then all yours JOINs are single hop (i.e. no join between two tables through another table).
All these makes JOIN operations very fast, simple and straightforward. That's why the Star schema is at the heart of data-warehousing designs.
https://en.wikipedia.org/wiki/Star_schema
https://en.wikipedia.org/wiki/Data_warehouse
One level up from this is OLAP (SSAS SQL Server Analyses Services for example) which does pre-processing of the data to make it fast to query but it involves more learning than pure start-schema and it's an overkill in your case
For your example
In Star schema,
Companies will be a dimension table
You will need Month dimension table. It's simplified version of Date dimension, just for month info. An example of Date dimension is here.
https://www.codeproject.com/Articles/647950/Create-and-Populate-Date-Dimension-for-Data-Wareho
The information about the company (15 things you say) will be fact tables. The facts must be numeric (b/c ideally all non-numeric values is saved in dimension tables). This means taking the non-numeric part of a fact to a dimension table. For example: if you are keeping revenue and would like to keep the currency type too, then the you will need a Currency dimension and save only the amount in the fact table and a foreign key to the Currency dimension table.
If you have any non-numeric facts, you need to store the distinct list in a dimension table and add foreign key to that dimension table inside your Fact table (this is called factless fact table). The only exception to that is if the cardinality of the dimension and the fact table is very similar, then you can just store the non-numeric fact value inside the fact table directly as there is no benefit in having a dimension table (in fact a disadvantage).
Also the facts can be grouped by their granularity. For example you could have company_monthly_summary fact table and keep more than one fact in that table (which are all joining to Company dimension and Month dimension). This is all up-to-you how you would like to group facts table. But if their granularity are not the same, they should not be grouped as that will cause sparse fact tables and harder to query.
You will use foreign keys in Fact tables to join to your Dimension tables
Add index for your Dimension tables' most used columns
Add a numeric surrogate key to your dimension. It is usually an auto-increment number but that's up-to you. One exception people prefers for the surrogate key of Date dimension is using the format YYYYMMDD (as integer). This makes is easier on WHERE clause: i.e instead of filtering for the Date column (a DATETIME value), which will do search to find the surrogate keys, you just provide the surrogate keys directly b/c you know the format. Depending on your business domain, you may have other similar useful surrogate key patterns that you may want to consider and use. But just know, in case of a business domain change, you will have have to update all fact records. Simple auto-increment surrogate key does not have that problem. In your case, the surrogate key for the month can be actual month number (1 for Jan)
That being said, 1 million rows in 5 years is easy to query even without a Star-schema design (with proper indexing, database maintenance). But if this is part of a larger analytics system, then go with Star schema
The simplest way.
Create a table, companyname + info you needto store + column for year-month.
Ex:
CREATE TABLE tablename (
id int(11) NOT NULL AUTO_INCREMENT,
companyname varchar(255) ,
info1 int(11) NOT NULL,
info2 datetime ,
info3 varchar(255) ,
info4 bool ,
yearmonth datetime,
PRIMARY KEY (id) );
#queries
select * from tablename where companyname="nameofthecompany";
select * from tablename where yearmonth="year-month"; #can use between here
Firstly gonna show you example. We've got a fact table with some id, which is not primary key. Also we have dimension with all ids from fact table and names for that. Our id from fact table is a measure with aggregation function max. Is it possible to create calculated member, which will show name from our dimension using id from fact table? I know that it could be solved using rn and that structure:
Dimension.Hierahchy.Level.Item (meadures.rn).name
But is it possible to solve this another way?
We need to get key for number from measure. Something like that
Dimension.Hierahchy.Level.&[value of measures.maxid]
In mdx you can easily extract a maximum key of a set of members.
MAX(
Dimension.Hierahchy.Level.MEMBERS,
Dimension.Hierahchy.CurrentMember.MEMBERKEY
)
(the above is total guesswork as your current question does not include any example of mdx that you have already tried)
I have a fact table that has 4 date columns CreatedDate, LoginDate, ActiveDate and EngagedDate. I have a dimension table called DimDate whose primary key can be used as foreign key for all the 4 date columns in fact table. So the model looks like this.
But the problem is, when I want to do sub-filtering for the measures based on the date column. For ex: Count all users who were created in the last month and are engaged in this month. This is not possible to do with this design, coz when I filter the measure with create date , I can’t further filter for a different time window for engaged date. Since all the connected to same dimension, they are not working independently.
However, If I create a separate date dimension table for each of the columns, and join them like this then it works.
But this looks very cumbersome when I have 20 different date columns in fact table in real world scenario, where I have to create 20 different dimensions and connect them one by one. Is there any other way I can achieve my scenario w/o creating multiple duplicated date dimensions?
This concept is called a role-playing dimension. You don't have to add the table to the DSV or the actual dimensions one time for each date. Instead add the date once, then go to the dimension usage tab. Click Add Cube Dimension, and then choose the date dim. Right-click and rename it. Then update the relationship to use the correct fields.
There's a good article on MSSQLTips.com that covers this topic.
I wish to change the default aggregation from SUM to SUM on Distinct ID Values.
This is the current behaviour
ID Amount
1 $10
1 $10
2 $20
3 $30
3 $30
Sum Total = $90
By default, I am getting a sum of $90. I wish to do the sum on distinct ids and get a value of $60. How would I modify the default Aggregation Behavior to achieve this result?
Design your data as a many-to-many relationship: create one table/view having one record per ID and the amount column from the data shown in your question (the main fact table), and one table/view having one record per record of your data as shown in your question, presumably having another column, as otherwise it would not make any sense to have the data as shown in your question). This will be the m2m dimension table. Then, create a bridge table/view having the id of the m2m dimension table and your ID column.
Then create the following AS objects: A measure group from the main fact table, a dimension on column ID of the same table (in case there is no other column making a dimension table meaningful, in that case, you would better have a separate dimension table having ID as the primary key). Create a dimension from the m2m dimension table, and a measure group having only the invisible measure "count" from the bridge table. Finally, on the "Dimension Usage" tab of Cube Designer, set the relationship between the m2m dimension and the main measure group to be many to many via the bridge measure group.
See http://technet.microsoft.com/en-us/library/ms170463.aspx for a tutorial on many-to-many relationships.
I am trying to build a calculated measure in SSAS that incorporates a dimension parameter. I have two facts: Members & Orders and one Dimension: Date. Members represents all the unique members on my site. Orders are related to members by a fact key representing a unique user. Orders also contains a key representing the vendor for an order. Orders contains a key to the date dimension.
FactMember
- MemberFactKey
- MemberId
FactOrder
- FactOrderKey
- OrderId
- FactMemberKey
- DimVendorKey
- DimDateKey
DimDate
- DimDateKey
- FYYear
The calculated measure I am trying to build is the number of unique vendors a member has ordered from. The value of the calculation must of course change based on the date dimension.
Wouldn't the DISTINCTCOUNT function be the one to use here? Creating a distinct count of Vendors could then be used in this query and elsewhere.
WITH MEMBER [Test]
AS
DISTINCTCOUNT([Vendor].[Vendor].[Vendor])
I will say in advance that this may well be slow (Depending on data volume/distribution), so if this query will be a popular/big part of the design it may be worth considering a restructure.
I am confused, it would make more sense to make Members and Orders both separate dimensions and then reference them from a FACT table, say Fact.Sales. This would eliminate the need to even build a calculated member if you keyed your Members dimension on some sort of member_key.