Apologies if I've posted this in the wrong place. I appreciate this is a very hard question to answer as there are too many variables, but any advice or pointers would be very much appreciated.
We have an incredibly large, unwieldy, badly designed cube. It's that horrible 'one cube to rule them all' type as demonstrated below. Please note that Dimensions with names which may betray the place I am working for etc have been obfuscated.
What I am trying to get a feel for is how much data a cube can hold, as a generic rule of thumb. I (and several experts, of which I do not claim to be one!) have stated to management that if they continue to add data (and attributes) to the cube at the current rate, it will fail. What we'd like is a way to work out whether this will be this year, next year, this month etc...and yes, I know this isn't going to have an exact formula answer. Any guidelines would be very helpful as I can't find any online; only best practice for build, and I already know that this doesn't conform to that! I'm trying to get budget approval to redesign it, hence the question...
23 dimensions, No KPIs, 4 calculated measures and 46 other measures
[Dim 1] - 11 attributes
no hierarchies
4 address lines, email address, full name, postcode, text provider type
[Area Detail] - 21 attributes
no hierarchies
2 address lines, postcode, various name and code fields (string)
[Area Main 1 Month Prior] - 5 attributes
2 hierarchies
[Area Main 4 Months Prior] - 5 attributes
2 hierarchies
[Area Main Dimension] - 5 attributes
2 hierarchies
[Type Dim 1] - 1 attribute
no hierarchies
[Date Dimension] - 36 attributes
4 hierarchies
[Event Dimension] - 29 attributes
no hierarchies
includes 5 dates which are not linked to date dimension but actually entered
[Event Rank Dimension] - 18 attributes
no hierarchies
[Event Track Dimension] - 21 attributes
no hierarchies
14 date fields
7 comment fields (freetext)
[History Date Dimension] - 4 attributes
no hierarchies
all date data
[Dim 2] - 5 attributes
no hierarchies
all freetext fields apart from code
[Official Date Dimension] - 9 attributes
no hierarchies
Date field and data about the date
[Previous Dim 2 Dimension] - 4 attributes
no hierarchies
[xxx Current Record Dimension] - 1 attribute
no hierarchies
[xxx Dimension] - 102 attributes
no hierarchies
4 address fields, postcode, 2 email fields, website
[xxx Dimension 1 Month Prior] - as above
[xxx Dimension 4 Months Prior] - as above
[Dim 3] - 12 attributes
no hierarchies
[Question Dimension] - 11 attributes
1 hierarchy
4 large text fields
[yyy Combination Dimension] - 1 attribute
no hierarchies
[yyy Current Record Dimension] - 1 attribute
no hierarchies
[yyy Status Dimension] - 3 attributes
no hierarchies
[Response Dimension] - 4 attributes
no hierarchies
2 large text fields
[zzz Area Dimension] - 4 attributes
no hierarchies
2 text fields
[zzz Type Dimension] - 1 attribute
no hierarchies
I hope this makes sense but happy to provide/clarify detail.
From my experience, the metrics you posted are mainly relevant to usability - adding more dimensions and measures will not cause your cube to "fail". I have successful stable cubes with many more dimensions and measures e.g. double or more.
The "one cube to rule them all" is an architectural advance introduced in SQL 2005. It optimises the build time, storage and query performance. With SQL Enterprise Edition you can present slices of it as "Perspectives", but I'm not a fan. I prefer to follow carefully planned Dimension and Measure naming as most client tools sort those objects alphabetically.
What can cause your cube to struggle and perhaps eventually "fail" is the volume of data in your larger dimensions and measure groups. Dimensions under 1m rows are normally no drama. Measure Groups under 100m rows are also usually fine with some basic Aggregations. Bigger than that and you may need to put more work into the design. I aim for sub 5 second response times for 95% of queries with simple end-user tools e.g. Excel 2010+.
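To turn those rules of thumb into a rough "when" for management, one option is to project the row counts forward. The following is only a minimal sketch: the function name, the compound monthly growth model, and the example figures (40m fact rows growing 3% a month) are all assumptions for illustration, and the 100m threshold is just the guideline quoted above, not a hard limit.

```python
import math

def months_until(current_rows, monthly_growth_rate, threshold_rows):
    """Months until a row count growing at a compound monthly rate reaches a threshold.

    Returns 0 if the threshold is already reached and None if there is no growth.
    """
    if current_rows >= threshold_rows:
        return 0
    if monthly_growth_rate <= 0:
        return None  # flat or shrinking: the threshold is never reached
    months = math.log(threshold_rows / current_rows) / math.log(1 + monthly_growth_rate)
    return math.ceil(months)

# Hypothetical figures: a 40m-row measure group growing 3% per month,
# compared against the 100m-row guideline for measure groups.
fact_months = months_until(40_000_000, 0.03, 100_000_000)
print(fact_months)
```

The same function can be run against the largest dimensions with the 1m-row guideline; whichever object crosses its guideline first gives a date to plan the redesign around.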
I want to define a cube measure in a SSAS Analysis Services Cube (multidimensional model) that calculates ratios for the selection a user makes for a predefined hierarchy. The following example illustrates the desired behavior:
|-City----|---|
| Hamburg | 2 |
| Berlin | 1 |
| Munich | 3 |
This is my base table. What I want to achieve is a cube measure that calculates ratios based on a user's selection. E.g. when the user queries Hamburg (2) and Berlin (1), the measure should return the values 67% (for Hamburg) and 33% (for Berlin). However, if Munich (3) is added to the same query, the return values would be 33% (Hamburg), 17% (Berlin) and 50% (Munich). The sum of the values should always equal 100%, no matter how many hierarchy members have been included in the MDX query.
So far I have come up with different measures, but they all seem to suffer from the same problem: it seems impossible to access the context of the whole MDX query from within a cell.
My first approach to this was the following measure:
[Measures].[Ratio] AS [Measures].[Amount]/SUM([City].MEMBERS,[Measures].[Amount])
This, however, sums the amount of all cities regardless of the user's selection and thus always returns the ratio of a city with regard to the whole city hierarchy.
I also tried to restrict the members to the query context by adding the EXISTING keyword.
[Measures].[Ratio] AS [Measures].[Amount]/SUM(EXISTING [City].MEMBERS,[Measures].[Amount])
But this seems to restrict the context to the cell, which means that I get 100% as a result for each cell (because EXISTING [City].MEMBERS is now restricted to a cell, it only returns the city of the current cell).
I also googled to find out whether it is possible to add a column or row with totals, but that does not seem possible within MDX either.
The closest I got was with the following measure:
[Measures].[Ratio] AS [Measures].[Amount]/SUM(Axis(1),[Measures].[Amount])
Along with this MDX query
SELECT {[Measures].[Ratio]} ON 0, {[City].[Hamburg],[City].[Berlin]} ON 1 FROM [Cube]
it would yield the correct result. However, this requires the user to put the correct hierarchy for this specific measure onto a specific axis - very error prone, very unintuitive, I don't want to go this way.
Are there any other ideas or approaches that could help me to define this measure?
I would first define a set with the selected cities
[GeoSet] AS {[City].[Hamburg],[City].[Berlin]}
Then the Ratio
[Measures].[Ratio] AS [Measures].[Amount]/SUM([GeoSet],[Measures].[Amount])
To get the ratio of that city to the set of cities. Lastly
SELECT [Measures].[Ratio] ON COLUMNS,
[GeoSet] ON ROWS
FROM [Cube]
Whenever you select a list of cities, change the [GeoSet] to the list of cities, or other levels in the hierarchy, as long as you don't select 2 overlapping values ([City].[Hamburg] and [Region].[DE6], for example).
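For anyone who wants to check the arithmetic outside the cube, here is a small Python sketch of what the [GeoSet] ratio computes: each selected city's amount divided by the sum over the selected set only. The function name and data structure are made up for illustration; the amounts are the ones from the question's base table.

```python
def selection_ratios(amounts, selected):
    """Share of each selected member in the total of the selected members only."""
    total = sum(amounts[city] for city in selected)
    return {city: amounts[city] / total for city in selected}

# Base table from the question.
amounts = {"Hamburg": 2, "Berlin": 1, "Munich": 3}

two_cities = selection_ratios(amounts, ["Hamburg", "Berlin"])
three_cities = selection_ratios(amounts, ["Hamburg", "Berlin", "Munich"])

for city, ratio in three_cities.items():
    print(f"{city}: {ratio:.0%}")
```

Because the denominator is built from the selection itself, the ratios add up to 100% whatever subset is chosen, which is the behaviour the question asks for.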
I have data in a cube, organized across 5 axes:
Source (data provider)
GEO (country)
Product (A or B)
Item (Sales, Production, Sales_national)
Date
In short, I have multiple data providers for different Product, Item, GEO and Date, i.e. for different slices of the cube.
Not all "sources" cover all dates, products and countries. Some will have more up-to-date information, but it will be preliminary.
The core of the problem is to have a synthesis of what all sources say.
Importantly, the choice of data provider for each "slice" of the cube is made by the user/analyst and needs to be so (business knowledge of provider methodology, quality etc).
What I am looking for, is a way to create a 'central dictionary' with all the calculation-types.
Such dictionary would be organized like this:
Operation           Source   GEO  Item   Product    Date_start  Date_end
Assign              Source3  ITA  Sales  Product_A  01/01/2016  01/01/2017
Assign              Source1  ITA  Sales  Product_A  01/01/2017  last
Assign with %delta  Source2  ITA  Sales  Product_A  01/01/2018  last
This means:
From Jan2016 to Jan 2017, ProdA Sales in Italy, take Source 3
From Jan17 to last available, take Source 1
From Jan18 to last available, take the existing, add %difference across time from Source 2
The data and calculations are examples; there are others that are more complex, but the gist of it is putting slices of the "Source" 5-dimensional cube into a "Target" 4-dimensional cube, with a set of sequential calculations.
In SQL, it is the equivalent of a bunch of filtered SELECTs + INSERT, but the complexity of the calculations will probably lead to lots of nested JOINs.
The solution will most likely be custom functions, but I was wondering if anyone is aware of a language or software other than DAX/MDX which would allow this to be done with minimal customization?
Many thanks
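To make the "dictionary of sequential calculations" idea concrete, here is a rough Python sketch of how such rules could be applied in order. Everything in it is an assumption for illustration: the function name, the operation labels `assign` and `assign_pct_delta`, the use of `None` for "last", the made-up values, and in particular the reading of "Assign with %delta" as "take the previous target value and apply the source's period-on-period change".

```python
from datetime import date

def apply_rules(source, rules):
    """Apply rules in order to build the 4-dimensional target.

    source: {(source, geo, item, product, date): value}
    rules:  list of (operation, source, geo, item, product, date_start, date_end),
            where date_end None stands for "last".
    Returns {(geo, item, product, date): value}.
    """
    target = {}
    for op, src, geo, item, product, start, end in rules:
        # Values this source provides for the slice, in chronological order.
        series = sorted(
            (d, v) for (s, g, i, p, d), v in source.items()
            if (s, g, i, p) == (src, geo, item, product)
        )
        for pos, (d, value) in enumerate(series):
            if d < start or (end is not None and d > end):
                continue
            key = (geo, item, product, d)
            if op == "assign":
                target[key] = value  # later rules overwrite earlier ones
            elif op == "assign_pct_delta" and pos > 0:
                # Assumed meaning: previous target value times the source's growth.
                prev_d, prev_value = series[pos - 1]
                prev_key = (geo, item, product, prev_d)
                if prev_key in target and prev_value != 0:
                    target[key] = target[prev_key] * value / prev_value
    return target

slice_ = ("ITA", "Sales", "Product_A")
source = {  # hypothetical yearly values
    ("Source3", *slice_, date(2016, 1, 1)): 100.0,
    ("Source3", *slice_, date(2017, 1, 1)): 110.0,
    ("Source1", *slice_, date(2017, 1, 1)): 120.0,
    ("Source2", *slice_, date(2017, 1, 1)): 50.0,
    ("Source2", *slice_, date(2018, 1, 1)): 55.0,
}
rules = [
    ("assign", "Source3", *slice_, date(2016, 1, 1), date(2017, 1, 1)),
    ("assign", "Source1", *slice_, date(2017, 1, 1), None),
    ("assign_pct_delta", "Source2", *slice_, date(2018, 1, 1), None),
]
target = apply_rules(source, rules)
```

With these made-up numbers, 2017 ends up taken from Source1 (the later rule wins) and 2018 is the 2017 value grown by Source2's 10% change. The rule list is plain data, so it could equally be loaded from a table maintained by the analysts.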
Considering that I have the following 3 attributes:
WeatherCondition
- rainy
- sunny
- cloudy
Daytime
- day
- night
RoadType
- city
- highway
- underconstruction
And I want to map these values to indexes (day - 1, night - 2, etc.).
My question is: how should I design this, given that at any time I may want to add 2-3 more choices to an attribute, or even add new attributes?
Solution1:
AttributesTable:
ID AttributeType AttributeValue
AT1 WeatherCondition rainy
AT2 WeatherCondition sunny
AT3 Daytime day
AT4 Daytime night
AT5 WeatherCondition cloudy
Solution2:
Separate tables for each attribute with only 2 columns (ID and value). WeatherCondition table with values (1,rainy; 2,sunny; 3,cloudy)
Daytime table with values (1,day; 2,night)
I'm somewhat reluctant about the second solution, thinking that I may have to create 30 tables.
The final result, is that I want to have a "lookup" or "bridge" table with the ID FK from another table like this:
FinalConditions
ID Attribute
1 AT1
1 AT3
1 AT5
2 AT2
2 AT5
Also, it's important to me to create reports by joining all this data together; I'm thinking that with Solution 2 it will be harder to join all 30 tables together.
I think it all comes down to scalability - how many rows are the tables expected to hold (all of them, summed up). If they're never going higher than say 10k, you shouldn't worry - Solution 1 will do.
However, if you expect the eventual number to be, say, in the millions of rows, Solution 2 is definitely the way to go - it'll lead to far fewer locks and it will probably be a lot easier to maintain (albeit harder to implement - you might have to "create 30 tables").
Hope this helps.
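To show what Solution 1 looks like in practice, here is a small runnable sketch using Python's built-in sqlite3 module. The table name `Attributes`, the column types and the constraints are my assumptions; the rows are the ones from the question. It shows that reporting needs only a single join from the bridge table to the one lookup table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attributes (
    ID             TEXT PRIMARY KEY,
    AttributeType  TEXT NOT NULL,
    AttributeValue TEXT NOT NULL,
    UNIQUE (AttributeType, AttributeValue)
);
CREATE TABLE FinalConditions (
    ID        INTEGER NOT NULL,
    Attribute TEXT NOT NULL REFERENCES Attributes(ID),
    PRIMARY KEY (ID, Attribute)
);
INSERT INTO Attributes VALUES
    ('AT1', 'WeatherCondition', 'rainy'),
    ('AT2', 'WeatherCondition', 'sunny'),
    ('AT3', 'Daytime', 'day'),
    ('AT4', 'Daytime', 'night'),
    ('AT5', 'WeatherCondition', 'cloudy');
INSERT INTO FinalConditions VALUES
    (1, 'AT1'), (1, 'AT3'), (1, 'AT5'), (2, 'AT2'), (2, 'AT5');
""")

# One join resolves every attribute of every condition, whatever its type.
rows = conn.execute("""
    SELECT fc.ID, a.AttributeType, a.AttributeValue
    FROM FinalConditions AS fc
    JOIN Attributes AS a ON a.ID = fc.Attribute
    ORDER BY fc.ID, a.ID
""").fetchall()

for row in rows:
    print(row)
```

Adding a new attribute type or a new choice is then just an INSERT into `Attributes`, with no schema change; with Solution 2 the same report would need one join per attribute table.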
I am using SQL Server 2014 and Visual Studio 2015.
I have an SCD2 for staff names, for example
SK AltKey Name Gender IsActive
1 15 Sven Svensson M 1
2 16 Jo Jonsson M 1
and in the fact table
SK AgentSK CallDuration DateKey
100 1 335 20160808
101 2 235 20160809
So, you can see the cube is currently linked on FctAgentSK and DimSK. This works as planned. However, when Jo changes gender the SCD2 makes the row inactive (0) and inserts a new row with the new gender and IsActive of '1'.
The problem I face is that the factSK 101 still references the 'OLD' details for the Agent. How should I deal with this to be able to still report on the call, but also reference the "correct" details of the Agent - reflecting their current gender.
When a new fact is inserted it will have the 'NEW' SK assigned, but basically I would need to report on ALL calls that have happened either side of the gender change.
Any suggestions please?
Thank you.
As Nick.McDermaid suggested, if you don't want SCD2 functionality, you could remove it from the dimension design (I've often seen it over-implemented when it's not actually wanted: perhaps you've inherited that kind of setup?).
If you want to (or must) keep the SCD2 design, but want to report on current staff attributes (gender and any other SCD2 attributes), there are two options:
1. Kimball documents a "Type 6" here: SCD types 0,4,5,6,7. You add a "current" value of the attribute to an existing Type 2 design. You could then report on the "current" attributes only.
2. I'm assuming that the Staff Name "Alt Key" is the durable staff-member key that stays the same through changes in staff attributes? You could make a slightly different Employee dimension (or a hierarchy inside the Employee dimension) that has Alt Key as its leaf-level key. If you don't still have SK as a dimension attribute, this will make the dimension table "collapse" into one member per AltKey, not one member per SK. Obviously, you can't add any SCD2 attributes to this Alt Key hierarchy, as there won't be a single value per key; and this raises special problems about what to call the durable "employee" (i.e. what the Name Column of the leaf level will be), since Employee Name is one of the most obvious SCD2 attributes that will not remain the same. Probably this approach is best combined with an underlying "Type 6" inclusion of the "current value" in the dimension data, as described in (1) above.
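As a rough illustration of the "current value" idea, the Python sketch below stamps every dimension row (active or not) with the gender from the active row that shares its AltKey, so old facts can be grouped by the current attribute. The function name, the `CurrentGender` column name, dimension row SK 3 and fact 102 are hypothetical additions to the question's sample data; in practice this would be done in the dimension load or a view.

```python
def add_current_attribute(dim_rows, attr="Gender"):
    """Add a 'Current<attr>' column taken from the active row of the same AltKey."""
    current = {row["AltKey"]: row[attr] for row in dim_rows if row["IsActive"] == 1}
    return [{**row, "Current" + attr: current.get(row["AltKey"])} for row in dim_rows]

dim = [
    {"SK": 1, "AltKey": 15, "Name": "Sven Svensson", "Gender": "M", "IsActive": 1},
    {"SK": 2, "AltKey": 16, "Name": "Jo Jonsson", "Gender": "M", "IsActive": 0},
    # Hypothetical new SCD2 row inserted after the change.
    {"SK": 3, "AltKey": 16, "Name": "Jo Jonsson", "Gender": "F", "IsActive": 1},
]
facts = [
    {"SK": 100, "AgentSK": 1, "CallDuration": 335, "DateKey": 20160808},
    {"SK": 101, "AgentSK": 2, "CallDuration": 235, "DateKey": 20160809},
    # Hypothetical call made after the change, keyed to the new SK.
    {"SK": 102, "AgentSK": 3, "CallDuration": 180, "DateKey": 20160901},
]

by_sk = {row["SK"]: row for row in add_current_attribute(dim)}

# Total call duration grouped by the agent's *current* gender.
totals = {}
for fact in facts:
    gender = by_sk[fact["AgentSK"]]["CurrentGender"]
    totals[gender] = totals.get(gender, 0) + fact["CallDuration"]

print(totals)
```

Fact 101 still points at the old SK 2, but because that row now carries the current value as well, calls from either side of the change land in the same group.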
I am new to SSAS and have a situation where I need help.
I have a many to many relationship table which contains info about the competitors of a property and the type of competitor it is at any given date.
So, something like this:
PID Type CompID Date
1 A 1 1/1/2001
1 A 2 1/1/2001
1 B 1 2/1/2001
1 B 1 3/1/2001
2 A 1 1/1/2001
2 B 1 1/1/2001
Now I need to include this in the cube and relate it to the main fact table. I have defined the relationship as a many to many but while writing the query to retrieve the information using MDX, I am stuck.
What I need is all the measures for a given property and all the aggregated measures of all its competitors of a given type at a given date.
So, given a property ID, I need to identify the list of its competitors of a given type on a given date, and then I have to aggregate the measures for all these competitor properties.
I am stuck at this place where I have to identify all the competitors of a given property.
e.g. if I fire this query:
Select
{
TYNBC,YOYNBC_Improvement,LYNBC,TYADR,YOYADR_Improvement,LYADR
}
on 1,
Stay_DATE.Month.Month on 0
FROM Cube1
where {Hotel.Hotel_Key.&[480]}*{Stay_DATE.Hierarchy.Year.&[2015]}
The result would be the measures for a given property.
What I want in the result is all the above measures and the same measures for the competitors of property 480 for a given date and a given competitor type. The issue I am facing is in identifying the competitors of the property in MDX, because the competitor table is added as a factless fact with a many-to-many relationship in the cube. So, how do I retrieve the list of competitor properties when there is no hierarchy defined, as it is not a dimension?
Thanks for your help in advance.
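To pin down the logic being asked for, independent of MDX, here is a small Python sketch of the two steps: look up the competitors of a property for a given type and date in the bridge table, then aggregate a measure over them. The bridge rows are the sample rows above; the function names and the per-property measure values are made up for illustration, and this is not a statement of how the cube itself should resolve the relationship.

```python
# (PID, Type, CompID, Date) rows from the sample bridge table.
bridge = [
    (1, "A", 1, "1/1/2001"),
    (1, "A", 2, "1/1/2001"),
    (1, "B", 1, "2/1/2001"),
    (1, "B", 1, "3/1/2001"),
    (2, "A", 1, "1/1/2001"),
    (2, "B", 1, "1/1/2001"),
]

def competitors(bridge_rows, pid, comp_type, on_date):
    """Competitor IDs of a property for one competitor type on one date."""
    return {
        comp for p, t, comp, d in bridge_rows
        if (p, t, d) == (pid, comp_type, on_date)
    }

def competitor_total(bridge_rows, measure, pid, comp_type, on_date):
    """Sum a per-(property, date) measure over the property's competitors."""
    comps = competitors(bridge_rows, pid, comp_type, on_date)
    return sum(measure.get((comp, on_date), 0.0) for comp in comps)

# Hypothetical measure values keyed by (property, date).
measure = {(1, "1/1/2001"): 10.0, (2, "1/1/2001"): 20.0}

comp_ids = competitors(bridge, 1, "A", "1/1/2001")
total = competitor_total(bridge, measure, 1, "A", "1/1/2001")
print(comp_ids, total)
```

In cube terms, this lookup is what exposing the competitor property and competitor type as dimensions on the bridge measure group would have to provide.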