Scenario
Designing a star schema for an OLAP environment for the Incident Management process. Management wants to be able both to filter on SLA status (breached, achieved or in progress) and to calculate the percentage of SLAs achieved vs breached. Reporting will be done in Excel/SSRS through SSAS (Tabular).
Question
I’m reasonably inexperienced in designing for an OLAP environment. I know my idea will work, but I’m concerned this is not the best approach.
My idea:
SLA needs to be both a measure and a dimension.
DimSLA
…
(Nullable bool) Sla Achieved -> Yes=True, No=False, and InProgress=NULL
…
FactIncident
…
(Nullable Integer) Sla Achieved -> Yes=1, No=0, and In Progress=NULL
…
Then in SSAS, publish a calculated percentage field which averages FactIncident.SlaAchieved.
Is this the right/advisable way to do it?
As you describe it, "SLA achieved" should be an attribute, as you want to classify by it, not sum it. The only things you want to sum or aggregate would be other measures (maybe an incident count) under the condition that the "SLA achieved" attribute has certain values like "achieved" or "not achieved". This is the main rule in dimensional design: things you use for classifying or breaking down are attributes, and things that you calculate are measures. There are a few cases where you need a column for both, but not many.
Do not just use a boolean value. Use a string value that is easily understood by users, like the texts "SLA achieved", "SLA not achieved", "in progress". This makes it much easier for non-technical users to use the cube. If you use this in a dimension table, there would be just three records with the strings, and the fact table would reference them with maybe a byte foreign key, so the more meaningful texts do not use up millions of bytes.
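For illustration, here is a minimal sketch of that shape in SQL (table and column names are only examples, not part of the original design):

CREATE TABLE DimSlaStatus (
    SlaStatusKey   TINYINT     NOT NULL PRIMARY KEY,  -- small surrogate key
    SlaStatusName  VARCHAR(20) NOT NULL               -- user-friendly text
);

INSERT INTO DimSlaStatus (SlaStatusKey, SlaStatusName) VALUES
    (1, 'SLA achieved'),
    (2, 'SLA not achieved'),
    (3, 'In progress');

CREATE TABLE FactIncident (
    IncidentKey    INT     NOT NULL PRIMARY KEY,
    SlaStatusKey   TINYINT NOT NULL REFERENCES DimSlaStatus (SlaStatusKey),
    IncidentCount  INT     NOT NULL DEFAULT 1         -- measure to aggregate
    -- ... other dimension keys and measures
);

The achieved-vs-breached percentage then becomes a calculated measure over IncidentCount filtered on the status attribute, rather than an average of a nullable flag.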
Related
I need some more advanced MDX knowledge than I currently have.
I need to get the RepoRate_MAX for repo products at book and instrument level; looking at the Java code I'm replacing, that code always uses the max MurexId.
How can I perform the below (I've placed MAX on the dimension here, but this is wrong)? I need the combination of the dimensions and also the MAX MurexId:
[Measures].[RepoRate_VAL] = (([Deal].[ProductType].&[REPO],[Deal].[Book],[Deal].[Instrument],MAX([Deal].[MurexId])),[Measures].[RepoRate_MAX])
I'm sure it's a simple one but my mind is part way between the Java OO and MDX worlds currently haha :D
Thanks
Leigh
So after some experimenting I found out about the TAIL and Item MDX functions.
I think at one point I did get it working, but didn't make a note of what did work. I was playing around with this and variants of it, but most versions ended up in unusable query times:
[Measures].[RepoRate_VAL] = (([Deal].[ProductType].&[REPO],[Deal].[Book],[Deal].[Instrument],TAIL(EXISTING([Deal].[MurexId].[MurexId])).Item(0)),[Measures].[RepoRate_MAX])
So I then decided to push the RepoRate calculation back to the SQL data preparation script. Cleaner/smoother data is always better, and it then allows for simple calculated members.
I used SQL to determine the RepoRate from trade level with MAX(MurexId) and GROUP BY on Book, Instrument, and then updated my main fact table to ensure that the correct RepoRate was set at Book, Instrument level.
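Roughly, the preparation step was along these lines (table and column names simplified, so treat this as a sketch rather than the exact script):

-- Per Book/Instrument, take the repo rate of the trade with the highest MurexId
UPDATE f
SET    f.RepoRate = t.RepoRate
FROM   FactDeal AS f
JOIN (
    SELECT tl.Book, tl.Instrument, tl.RepoRate
    FROM   TradeLevel AS tl
    JOIN (
        SELECT Book, Instrument, MAX(MurexId) AS MaxMurexId
        FROM   TradeLevel
        WHERE  ProductType = 'REPO'
        GROUP BY Book, Instrument
    ) AS m
      ON  m.Book = tl.Book
      AND m.Instrument = tl.Instrument
      AND m.MaxMurexId = tl.MurexId
) AS t
  ON  t.Book = f.Book
  AND t.Instrument = f.Instrument;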
Thus the calculated member is then:
[Measures].[RepoRate_VAL] = (([Deal].[Book],[Deal].[Instrument]),[Measures].[RepoRate_MAX])
Fast data prep and a fast calculated member on the Excel/Pivot/UI layer.
I am building a data model with PowerPivot for Excel 2013 and need to be able to identify the max number of emails sent per person. The DAX formula below gives me the result that I am looking for, but performance is incredibly slow. Is there an alternative that will compute a maximum by group without the performance hit?
Maximum Emails per Constituent:
=MAXX(SUMMARIZE('Email Data','Email Data'[person_id],"MAX Value",
([Emails Sent]/[Unique Count People])),[MAX Value])
So, without the measure definitions for [Emails Sent] or [Unique Count People], it is not possible to give definitive advice on performance. I'm going to assume they are trivial measures, though, based on their names - note that this is an assumption and its truth will affect the rest of my post. That being said, there is an obvious optimization to make to start with.
Maximum Emails per Constituent:=
MAXX(
ADDCOLUMNS(
VALUES('Email Data'[person_id])
,"MAX Value"
,[Emails Sent] / [Unique Count People]
)
,[MAX Value]
)
I used ADDCOLUMNS() rather than SUMMARIZE() to calculate the new column. See this post for an explanation of the performance implications.
Additionally, since you're not grouping by multiple columns, there's no need to use SUMMARIZE(). The performance impact of using VALUES() instead should be minimal.
The other question that comes to mind is whether this needs to be a measure. Are you going to be slicing by other dimensions? If not, this becomes a static attribute of [person_id] which could be calculated during ETL, or in a calculated column.
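For example, if [Emails Sent] boils down to a row count per person (an assumption, since the measure definitions aren't shown), the per-person figure could be pre-computed in the source query with something like this (names hypothetical):

-- Hypothetical ETL query: one row per person with their email count,
-- which could feed a static attribute instead of a runtime measure
SELECT person_id,
       COUNT(*) AS emails_sent
FROM   EmailData
GROUP BY person_id;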
A final note - I've also been assuming that your model is optimal. Again, we'd need to see it to comment on whether you could see performance issues from something you're doing there.
How do I create custom rollup types in icCube?
Say I need WAvg (which is already implemented there) instead of the plain Avg function, but it is not on the dropdown list in the measure creation form. What should I do now?
Alexander, I assume you're talking about the cube builder.
The weighted average is not available in the list of available aggregation types because there's no straightforward way to implement it at cube level. Aggregation types available for standard measures are simple calculations. Those calculations are meant to be very fast for millions of rows. You have two kinds of average available for standard measures: 'average on leafs (rows)' and 'average on children', which might be near what you're looking for.
In the case of a weighted average you have to create a calculated measure: you need to define the values to "weight" your underlying measure against. The weighted-average documentation gives several examples.
Starting with the customary "please excuse me as this is my first post and I'm a relative beginner" disclaimer, I have the following question...
I work for a not-for-profit campaigning organisation. I've set up an SSAS solution to measure campaigning actions (e.g. emailing the prime minister) taken by a set of campaigners (customers). The main fact table has a count of actions as its measure, and is sliceable by, say, time and geography....
... but I also want to have another factless fact table that can show a count of how many campaigners are in which mailing segment... so I think what I need to do is basically dump a copy of my campaigner dimension (which is slowly changing for people moving geography etc.) into its own factless fact table, with columns FK_campaigner, segment_id, start_date, end_date. But then how do I link that into the time dimension, as it doesn't have an FK_time (merely a start and end date)? I guess what I want to do is relate the factless table to the time table on a "where PK_time > start_date and < end_date" and then slice for me... but HOW? And is this possible, or do I have to go down the route of loading one fact for each day that someone was in a segment?
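In SQL terms, the join I'm imagining is something like this (table names made up for illustration):

-- One row per day per campaigner-segment membership
SELECT d.PK_time, f.FK_campaigner, f.segment_id
FROM   FactlessCampaignerSegment AS f
JOIN   DimTime AS d
  ON   d.PK_time > f.start_date
 AND   d.PK_time < f.end_date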
Many thanks to anybody who can point me in the right direction, either structurally (is the broad approach wrong?) or, even better, in the practicalities of actually doing it in SSAS.
AJ
If you just want to analyse this data for a single point in time (e.g. show me what my numbers looked like at point x), then you could have the time dimension be the "effective date".
This would be semi-additive, and you would not be able to aggregate the data across time.
However, if what you are interested in is analyzing the transition between time periods, then there is a "Many to Many" solution that would allow this:
Many to Many revolution white paper
The whitepaper provides several models; the ones that would be relevant in your scenario would be either "Cross Time" or "Transition Matrix".
Good luck
Problem:
A relational database (Postgres) storing timeseries data of various measurement values. Each measurement value can have a specific "measurement type" (e.g. temperature, dissolved oxygen, etc) and can have specific "measurement units" (e.g. Fahrenheit/Celsius/Kelvin, percent/milligrams per liter, etc).
Question:
Has anyone built a similar database such that dimensional integrity is conserved? Have any suggestions?
I'm considering building a measurement_type and a measurement_unit table; both of these would have two columns, ID and text. Then I would create foreign keys to these tables in the measured_value table. Text worries me somewhat because there's the possibility for non-unique duplicates (e.g. 'ug/l' vs 'µg/l' for micrograms per liter).
The purpose of this would be so that I can both convert and verify units on queries, or via programming externally. Ideally, I would have the ability later to include strict dimensional analysis (e.g. linking µg/l to the value 'M/V' (mass divided by volume)).
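Concretely, I'm picturing something like this (Postgres sketch, names not final):

CREATE TABLE measurement_type (
    id   SERIAL PRIMARY KEY,
    text VARCHAR NOT NULL UNIQUE       -- e.g. 'temperature', 'dissolved oxygen'
);

CREATE TABLE measurement_unit (
    id   SERIAL PRIMARY KEY,
    text VARCHAR NOT NULL UNIQUE       -- UNIQUE stops exact duplicates only;
                                       -- 'ug/l' vs 'µg/l' still needs curation
);

CREATE TABLE measured_value (
    id                  SERIAL PRIMARY KEY,
    measurement_type_id INT NOT NULL REFERENCES measurement_type (id),
    measurement_unit_id INT NOT NULL REFERENCES measurement_unit (id),
    measured_at         TIMESTAMP NOT NULL,
    value               DOUBLE PRECISION NOT NULL
);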
Is there a more elegant way to accomplish this?
I produced a database sub-schema for handling units an aeon ago (okay, I exaggerate slightly; it was about 20 years ago, though). Fortunately, it only had to deal with simple mass, length, time dimensions - not temperature, or electric current, or luminosity, etc. Rather less simple was the currency side of the game - there were a myriad of different ways of converting between one currency and another depending on date, currency, and the period over which the conversion rate was valid. That was handled separately from the physical units.
Fundamentally, I created a table 'measures' with an 'id' column, a name for the measure, an abbreviation, and a set of dimension exponents - one each for mass, length, time. This gets populated with names such as 'volume' (length = 3, mass = 0, time = 0), 'density' (length = -3, mass = 1, time = 0) - and the like.
There was a second table of units, which identified a measure and then the actual units used by a particular measurement. For example, there were barrels, and cubic metres, and all sorts of other units of relevance.
There was a third table that defined conversion factors between specific units. This consisted of two units and the multiplicative conversion factor that converted unit 1 to unit 2. The biggest problem here was the dynamic range of the conversion factors. If the conversion from U1 to U2 is 1.234E+10, then the inverse is a rather small number (8.103727714749e-11).
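A rough rendering of those three tables in modern SQL might look like this (column names are mine, not the original schema):

CREATE TABLE measures (
    id           INTEGER PRIMARY KEY,
    name         VARCHAR(50) NOT NULL,    -- e.g. 'volume', 'density'
    abbreviation VARCHAR(10) NOT NULL,
    mass_exp     SMALLINT NOT NULL,       -- dimension exponents
    length_exp   SMALLINT NOT NULL,
    time_exp     SMALLINT NOT NULL
);

CREATE TABLE units (
    id         INTEGER PRIMARY KEY,
    measure_id INTEGER NOT NULL REFERENCES measures (id),
    name       VARCHAR(50) NOT NULL       -- e.g. 'barrel', 'cubic metre'
);

CREATE TABLE conversion_factors (
    from_unit_id INTEGER NOT NULL REFERENCES units (id),
    to_unit_id   INTEGER NOT NULL REFERENCES units (id),
    factor       DOUBLE PRECISION NOT NULL,  -- multiply from-unit values by this
    PRIMARY KEY (from_unit_id, to_unit_id)
);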
The comment from S.Lott about temperatures is interesting - we didn't have to deal with those. A stored procedure would have addressed that - though integrating one stored procedure into the system might have been tricky.
The scheme I described allowed most conversions to be described once (including hypothetical units such as furlongs per fortnight, or less hypothetical but equally obscure ones - outside the USA - like acre-feet), and the conversions could be validated (for example, both units in the conversion factor table had to have the same measure). It could be extended to handle most of the other units - though the dimensionless units such as angles (or solid angles) present some interesting problems. There was supporting code that would handle arbitrary conversions - or generate an error when the conversion could not be supported. One reason for this system was that the various international affiliate companies would report their data in their locally convenient units, but the HQ system had to accept the original data and yet present the resulting aggregated data in units that suited the managers - where different managers each had their own idea (based on their national background and length of duty in the HQ) about the best units for their reports.
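That particular validation can be expressed directly against the sketch above, along the lines of:

-- Flag conversion factors whose two units belong to different measures
SELECT cf.from_unit_id, cf.to_unit_id
FROM   conversion_factors AS cf
JOIN   units AS u1 ON u1.id = cf.from_unit_id
JOIN   units AS u2 ON u2.id = cf.to_unit_id
WHERE  u1.measure_id <> u2.measure_id;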
"Text worries me somewhat because there's the possibility for non-unique duplicates"
Right. So don't use text as a key. Use the ID as a key.
"Is there a more elegant way to accomplish this?"
Not really. It's hard. Temperature is its own problem because temperature is itself an average, and doesn't sum like distance does; plus F to C conversion is not a simple multiply (as it is with every other unit conversion).
A note about conversions: a lot of units are linearly related, and can be converted using a formula like "y = A + Bx", where A and B are constants which could be stored in the database for each pair of units that you need to convert between. For example, for Celsius to Fahrenheit the constants are A=32, B=1.8.
However, there are also rare exceptions. Converting between logarithmic and non-logarithmic units, for example. Or converting between mass-per-volume and molar-mass-per-volume (in which case you would need to know the molar mass of the compound being measured).
Of course, if you are sure that all the conversions required by the system are linear, then there's no need for over-engineering: just store the two constants. You can then extract standardized results from the database using straight SQL joins with calculated fields.
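As a sketch of that last point, assuming a hypothetical unit_conversion table holding the two constants per unit pair and the measured_value layout sketched in the question:

-- Convert every stored value to a chosen target unit using y = A + B * x
SELECT mv.id,
       c.a + c.b * mv.value AS value_in_target_unit
FROM   measured_value AS mv
JOIN   unit_conversion AS c
  ON   c.from_unit_id = mv.measurement_unit_id
 AND   c.to_unit_id   = :target_unit_id;   -- parameter: the unit to standardize to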