Trying to calculate quartiles in MDX

My data looks like this:
ID |PersonID |CompanyID |DateID |Throughput |AmountType
33F467AC-F35B-4F24-A05B-FC35CF005981 |7 |53 |200802 |3 |0
04EE0FF0-511D-48F5-AA58-7600B3A69695 |18 |4 |201309 |5 |0
AB058AA5-6228-4E7C-9469-55827A5A34C3 |25 |69 |201108 |266 |0
with around a million rows. The columns named *ID refer to other tables, so they can be used as dimensions.
I have an OLAP cube with the column Throughput as Measure and the rest as dimensions.
I want to calculate Quartile 1 and 3 of the Throughput measure.
I followed this guide: https://electrovoid.wordpress.com/2011/06/24/ssas-quartile/
together with this post: Calculating Quartiles in Analysis Services
From those I tried to use this MDX query:
WITH
SET selection as ([Dates].[Year].&[2014],[Dates].[Month].&[1])
SET [NonEmptyIds] AS
NonEmpty(
[ThroughputID].[ID].[id]
*[ThroughputID].[ID].[Id].ALLMEMBERS
,
{[Measures].[Throughput]} * [selection]
)
SET [ThroughputData] AS
ORDER
(
[NonEmptyIds],
[Measures].[Throughput],
BASC
)
MEMBER [Measures].[RowCount] AS COUNT (ThroughputData)
MEMBER [Measures].[i25] AS ( .25 * ( [RowCount] - 1 ) ) + 1
MEMBER [Measures].[i25Lo] AS FIX([i25]) - 1
MEMBER [Measures].[i25Rem] AS ([i25] - FIX([i25]))
MEMBER [Measures].[n25Lo] AS (ThroughputData.Item([i25Lo]), [Throughput])
MEMBER [Measures].[n25Hi] AS (ThroughputData.Item([i25Lo] + 1), [Throughput])
MEMBER [Measures].[Quartile1] AS [n25Lo] + ( [i25Rem] * ( [n25Hi] - [n25Lo] ))
SELECT
selection ON 0,
[Measures].[Quartile1]
ON 1
FROM (SELECT [Dates].[Y-H-Q-M].MEMBERS ON 0 FROM [Throughput])
But I get: 'Query (6, 7) The ID hierarchy is used more than once in the Crossjoin function.'
I am quite new to OLAP and MDX. Any ideas what's wrong and how I should calculate the quartiles correctly?
I read somewhere that I needed the ID dimensions to be able to get a set with all values instead of aggregated values when calculating Quartiles...

The culprit is the following piece of code:
SET [NonEmptyIds] AS
NonEmpty(
[ThroughputID].[ID].[id]
*[ThroughputID].[ID].[Id].ALLMEMBERS
,
{[Measures].[Throughput]} * [selection]
)
You can't use the same hierarchy more than once in a crossjoin. Here [ThroughputID].[ID] is used twice. Try this instead:
SET [NonEmptyIds] AS
NonEmpty(
[ThroughputID].[ID].[Id].ALLMEMBERS
,
{[Measures].[Throughput]} * [selection]
)
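
As a cross-check outside the cube, the linear interpolation that the [i25], [i25Lo], [i25Rem], [n25Lo] and [n25Hi] members in the question build toward can be sketched in Python. This is only an illustration under assumptions: the function name quartile and the plain-list input are hypothetical, and the last step interpolates between the lower and upper neighbour (n_lo + rem * (n_hi - n_lo)), which is the usual form of this method.

```python
def quartile(values, p):
    """Linear-interpolation quantile, mirroring the MDX members in the question."""
    data = sorted(values)               # ORDER(..., BASC)
    n = len(data)                       # [RowCount]
    if n == 0:
        raise ValueError("quartile() needs at least one value")
    pos = p * (n - 1) + 1               # [i25]: 1-based fractional position
    lo = int(pos) - 1                   # [i25Lo]: 0-based index of the lower neighbour
    rem = pos - int(pos)                # [i25Rem]: fractional part of the position
    n_lo = data[lo]                     # [n25Lo]
    n_hi = data[min(lo + 1, n - 1)]     # [n25Hi], clamped at the last element
    return n_lo + rem * (n_hi - n_lo)


# Throughput values of the three sample rows shown in the question
throughput = [3, 5, 266]
print(quartile(throughput, 0.25))  # 4.0
print(quartile(throughput, 0.75))  # 135.5
```

Comparing a few values computed this way against the cube output is a cheap way to confirm that the MDX set really contains one tuple per fact row rather than aggregated values.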

Related

Find missing entries in a SQL table conditional on criteria

I have modest SQL experience (using MS SQL Server 2012 here), but this one eludes me. I wish to output distinct names from a table (previously created successfully from a join) that have some required entries missing, but conditional on the existence of another similar entry. For anyone who has location 90, I want to check that they also have locations 10 and 20...
For example, consider this table:
Name |Number |Location
--------|-------|--------
Alice |136218 |90
Alice |136218 |10
Alice |136218 |20
Alice |136218 |40
Bob |121478 |10
Bob |121478 |90
Chris |147835 |20
Chris |147835 |90
Don |138396 |20
Don |138396 |10
Emma |136412 |10
Emma |136412 |20
Emma |136412 |90
Fred |158647 |90
Gay |154221 |90
Gay |154221 |10
Gay |154221 |30
So formally, I would like to obtain the Names (and Numbers) of those entries in the table who:
Have an entry at location 90
AND do not have all the other required location entries - in this case also 10 and 20.
So in the example above
Alice and Emma are not output by this query, they have entries for 90, 10 & 20 (all present and correct, we ignore the location 40 entry).
Don is not output by this query, he does not have an entry for location 90.
Bob and Gay are output by this query, they are both missing location 20 (we ignore Gay's location 30 entry).
Chris is output by this query, he is missing location 10.
Fred is output by this query, he is missing locations 10 & 20.
The desired query output is therefore something like:
Name |Number |Location
--------|-------|--------
Bob |121478 |20
Chris |147835 |10
Fred |158647 |10
Fred |158647 |20
Gay |154221 |20
I've tried a few approaches with left/right joins where B.Key is null, and select from ... except, but so far I can't quite get the logic right. In the original table there are hundreds of thousands of entries and only a few tens of valid missing matches. Unfortunately I can't use anything that counts entries, as the query has to be location-specific and there are other valid table entries at locations outside of the desired ones.
I feel that the correct way to do this is something like a left outer join, but as the starting table is the output of another join, does this require declaring an intermediate table and then outer joining the intermediate table with itself? Note there is no requirement to fill in any gaps or enter items into the table.
Any advice would be very much appreciated.
===Answered and used code pasted here===
--STEP 0: Create a CTE of all valid actual data in the ranges that we want
WITH ValidSplits AS
(
SELECT DISTINCT C.StartNo, S.ChipNo, S.TimingPointId
FROM Splits AS S INNER JOIN Competitors AS C
ON S.ChipNo = C.ChipNo
AND (
S.TimingPointId IN (SELECT TimingPointId FROM #TimingPointCheck)
OR
S.TimingPointId = @TimingPointMasterCheck
)
),
--STEP 1: Create a CTE of the actual data that is specific to the precondition of passing @TimingPointMasterCheck
MasterSplits AS
(
SELECT DISTINCT StartNo, ChipNo, TimingPointId
FROM ValidSplits
WHERE TimingPointId = @TimingPointMasterCheck
)
--STEP 2: Create table of the other data we wish to see, i.e. a representation of the StartNo, ChipNo and TimingPointId of the finishers at the locations in #TimingPointCheck
--The key part here is the CROSS JOIN which makes a copy of every Start/ChipNo for every TimingPointId
SELECT StartNo, ChipNo, Missing.TimingPointId
FROM MasterSplits
CROSS JOIN (SELECT * FROM #TimingPointCheck) AS Missing(TimingPointId)
EXCEPT
SELECT StartNo, ChipNo, TimingPointId FROM ValidSplits
ORDER BY StartNo

Welcome to Stack Overflow.
What you need is a bit challenging, since you want to see data that does not exist.
Thus, we must first create all possible rows, then subtract the ones that exist:
select ppl_with_90.Name,ppl_with_90.Number,search_if_miss.Location
from
(
select distinct Name,Number
from yourtable t
where Location=90
)ppl_with_90 -- All Name/Numbers that have the 90
cross join (values (10),(20)) as search_if_miss(Location) -- For all the previous, combine them with both 10 and 20
except -- remove the lines already existing
select *
from yourtable
where Location in (10,20)

You need to generate the sets consisting of name, number, 10_and_20 for all rows where location = 90. You can then use your favorite method (left join + null, not exists, not in) to filter the rows that do not exist:
WITH name_number_location AS (
SELECT t.Name, t.Number, v.Location
FROM #yourdata AS t
CROSS JOIN (VALUES (10), (20)) AS v(Location)
WHERE t.Location = 90
)
SELECT *
FROM name_number_location AS r
WHERE NOT EXISTS (
SELECT *
FROM #yourdata AS t
WHERE r.Name = t.Name AND r.Location = t.Location
)
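
The "generate every required row, then remove the ones that exist" idea used in the answers above can be checked against the sample table with plain Python sets. This is a sketch only: the function name missing_locations and its parameters are hypothetical, and set difference stands in for EXCEPT / NOT EXISTS.

```python
# Sample rows from the question: (Name, Number, Location)
rows = [
    ("Alice", 136218, 90), ("Alice", 136218, 10), ("Alice", 136218, 20),
    ("Alice", 136218, 40), ("Bob", 121478, 10), ("Bob", 121478, 90),
    ("Chris", 147835, 20), ("Chris", 147835, 90), ("Don", 138396, 20),
    ("Don", 138396, 10), ("Emma", 136412, 10), ("Emma", 136412, 20),
    ("Emma", 136412, 90), ("Fred", 158647, 90), ("Gay", 154221, 90),
    ("Gay", 154221, 10), ("Gay", 154221, 30),
]


def missing_locations(rows, master=90, required=(10, 20)):
    """Return (Name, Number, Location) rows that should exist but do not."""
    existing = set(rows)
    # Everyone who has the master location (the "ppl_with_90" subquery)
    with_master = {(name, number) for name, number, loc in rows if loc == master}
    # CROSS JOIN with the required locations
    candidates = {(name, number, loc)
                  for name, number in with_master
                  for loc in required}
    # EXCEPT: drop the combinations that are already present
    return sorted(candidates - existing)


for row in missing_locations(rows):
    print(row)
```

On the sample data this yields the five rows listed as the desired output in the question (Bob/20, Chris/10, Fred/10, Fred/20, Gay/20).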

Aggregated sum in DAX

I'm leasing a car, which I use myself but also rent out for other people to use. I have 2000 km I can drive each month, so I'm trying to build an area pivot chart that tracks how much I use it vs. how much it's rented out.
I have a table with columns for the rented mileage and my own mileage:
___________________________________
|Date |Rented mileage|Own mileage|
|23/03-18| 315| 117|
|07-04-18| 255| 888|
|07/04-18| 349| 0|
|13/04-18| 114| 0|
|21/04-18| 246| 113|
|28/04-18| 1253| 0|
|01/05-18| 1253| 0|
So far I have two measures:
RentedMileage:=SUM(Table1[Rented Mileage])
OwnMileage:=SUM(Table1[Own Mileage])
When I plot these on the pivot chart, it looks like this:
I would like the mileage to be accumulated, with a line that shows when I'm exceeding my 2000 km limit, so it would look something like this:
But I can't for the life of me figure out how to compute a running total over my table.

The issue was solved by adding the following code to the measure:
Cumulative Quantity :=
CALCULATE (
SUM ( Transactions[Quantity] ),
FILTER (
ALL ( 'Date'[Date] ),
'Date'[Date] <= MAX ( 'Date'[Date] )
)
)
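
The measure above computes, for each date in context, the sum over all dates up to and including that date. The same logic can be sketched in Python to sanity-check the numbers. Assumptions in this sketch: the function name cumulative_by_date is hypothetical, the dates in the question's table are read as day/month/year, and only the Rented mileage column is used.

```python
from datetime import date

# (date, rented mileage) taken from the question's table, dates read as dd/mm/yy
rented = [
    (date(2018, 3, 23), 315),
    (date(2018, 4, 7), 255),
    (date(2018, 4, 7), 349),
    (date(2018, 4, 13), 114),
    (date(2018, 4, 21), 246),
    (date(2018, 4, 28), 1253),
    (date(2018, 5, 1), 1253),
]


def cumulative_by_date(rows):
    """For each distinct date, sum every quantity dated on or before it."""
    per_day = {}
    for day, qty in rows:
        per_day[day] = per_day.get(day, 0) + qty
    running = 0
    result = {}
    for day in sorted(per_day):   # same idea as 'Date'[Date] <= MAX('Date'[Date])
        running += per_day[day]
        result[day] = running
    return result


LIMIT_KM = 2000
for day, total in cumulative_by_date(rented).items():
    flag = "over limit" if total > LIMIT_KM else ""
    print(day, total, flag)
```

Note that this running total never resets; if the 2000 km allowance is per month, the filter would additionally need to be restricted to the current month.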

MDX/SSAS sum of certain values over totals - calculate success/failure rate

I have a simplified example cube used for learning purposes, and to try to figure out a more complex problem.
The cube represents a small web server log,
number of hits as a measure
hostname as a dimension
http status code as a dimension
I can get a breakdown on number of hits per host and http status code with the MDX
SELECT NON EMPTY { [Measures].[CNT HITS] } ON COLUMNS,
NON EMPTY { ([DIM NOS STATUSCODE].[Statuscode].[Statuscode].ALLMEMBERS *
[DIM NOS HOST].[HOST].[HOST].ALLMEMBERS ) } ON ROWS
FROM [DW]
Now what I would like is to group various HTTP status codes, e.g. to show the percentage of successful hits (all 2xx status codes) and the percentage of unsuccessful hits (all non-2xx status codes).
I can do this with SQL, but I'm at a loss on how to do it with MDX. e.g. with SQL I'd do:
select HOST,
sum(CNT_HITS) as HITS ,
SUM(CASE WHEN s.statuscode div 100 = 2 THEN CNT_HITS ELSE 0 END)/sum(CNT_HITS) * 100 as success_percent,
SUM(CASE WHEN s.statuscode div 100 = 2 THEN 0 ELSE CNT_HITS END)/sum(CNT_HITS) * 100 as failed_percent,
sum(CASE WHEN s.statuscode = 401 THEN CNT_HITS ELSE 0 END)/sum(CNT_HITS) * 100 as auth_fail_percent
from FACT_NOS_HTTPLOG fact
group by HOST;
And for the data shown in the above screenshot, I'd get
+-----------------+------+-----------------+----------------+-------------------+
| HOST | HITS | success_percent | failed_percent | auth_fail_percent |
+-----------------+------+-----------------+----------------+-------------------+
| www.example.com | 1610 | 93.1677 | 6.8323 | 6.2112 |
| www.test.com | 50 | 0.0000 | 100.0000 | 0.0000 |
+-----------------+------+-----------------+----------------+-------------------+
But how can I accomplish this with MDX ?

I think the easiest way to accomplish this is to add a column to your fact table (or view/query) that would contain keys for either success_percent, failed_percent or auth_fail_percent. Then create a new dimension with these 3 members. Join to the fact and you have your solution without the need for any MDX at all.

Add an extra attribute [Status] to your [DIM NOS STATUSCODE] dimension and use MDX for percentage, like this:
([DIM NOS STATUSCODE].[Status].&[Failed],[Measures].[CNT HITS]) / [Measures].[CNT HITS]

It will involve a certain amount of hard coding - although you could add these measures into your cube script.
WITH
MEMBER [Measures].[failed_percent] AS
DIVIDE(
(
[DIM NOS STATUSCODE].[Status].&[Failed]
,[DIM NOS HOST].[HOST].currentmember
,[Measures].[CNT HITS]
)
, (
[DIM NOS STATUSCODE].[Status].[All]
,[DIM NOS HOST].[HOST].currentmember
,[Measures].[CNT HITS]
)
)
SELECT
NON EMPTY
{
[Measures].[CNT HITS]
,[Measures].[failed_percent]
} ON COLUMNS,
NON EMPTY
[DIM NOS HOST].[HOST].[HOST].ALLMEMBERS
ON ROWS
FROM [DW];
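
To verify the percentages a cube returns, the same ratios can be computed directly from the fact rows. The sketch below is illustrative only: the function name rates is hypothetical, and the input rows are made-up values chosen to be consistent with the result table in the question (the original screenshot data is not available).

```python
from collections import defaultdict

# Hypothetical (host, status code, hits) rows, consistent with the question's totals
rows = [
    ("www.example.com", 200, 1500),
    ("www.example.com", 401, 100),
    ("www.example.com", 404, 10),
    ("www.test.com", 500, 50),
]


def rates(rows):
    """Per host: total hits and success / failed / auth-failure percentages."""
    total = defaultdict(int)
    ok = defaultdict(int)
    auth_fail = defaultdict(int)
    for host, status, hits in rows:
        total[host] += hits
        if status // 100 == 2:        # all 2xx codes count as success
            ok[host] += hits
        if status == 401:
            auth_fail[host] += hits
    return {
        host: {
            "hits": total[host],
            "success_percent": 100 * ok[host] / total[host],
            "failed_percent": 100 * (total[host] - ok[host]) / total[host],
            "auth_fail_percent": 100 * auth_fail[host] / total[host],
        }
        for host in total
    }


for host, stats in rates(rows).items():
    print(host, {k: round(v, 4) for k, v in stats.items()})
```

The numerator is the measure restricted to a status group and the denominator is the measure over all status codes, which is the same split the tuple over tuple division in the MDX expresses.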

ACCESS: calculate timestamp difference between rows

Here is the data I am working with in MS Access, from a system that tracks when an agent makes system changes:
|agentid|eventtype|reasoncode|eventdatetimelocal |
|1830 |2 |32762 |01/01/2014 7:11:44 PM|
|1830 |3 |0 |01/01/2014 7:13:46 PM|
|1830 |2 |32762 |01/01/2014 7:14:55 PM|
|1833 |2 |0 |01/01/2014 7:11:35 PM|
|1833 |3 |32762 |01/01/2014 7:13:25 PM|
I need to determine the number of seconds which elapsed between rows by agent. I would also like to preserve the detail of the eventtype and reasoncode.
I tried joining on a subquery, but it's not working:
SELECT sub1.agentid,
sub1.eventtype,
sub1.reasoncode,
sub1.eventdatetimelocal,
(sub1.next_timestamp-sub1.eventdatetimelocal) AS duration
FROM (SELECT i.agentid,
eventdatetimelocal,
eventtype,
reasoncode, (SELECT
Min([eventdatetimelocal])
FROM state_detail_tbl
WHERE [eventdatetimelocal] > i.eventdatetimelocal
) AS next_timestamp
FROM state_detail_tbl AS i
WHERE i.eventdatetimelocal BETWEEN #01/01/2014# AND #01/31/2014#
) AS sub1;

You can try this query
SELECT sub1.agentid,
sub1.eventtype,
sub1.reasoncode,
sub1.eventdatetimelocal,
(SELECT TOP 1 sub2.eventdatetimelocal - sub1.eventdatetimelocal
FROM state_detail_tbl AS sub2
WHERE sub1.agentid=sub2.agentid
AND sub2.eventdatetimelocal > sub1.eventdatetimelocal
ORDER BY sub2.eventdatetimelocal) AS duration
FROM state_detail_tbl sub1
WHERE (SELECT TOP 1 eventdatetimelocal
FROM state_detail_tbl AS s3
WHERE sub1.agentid=s3.agentid
AND s3.eventdatetimelocal > sub1.eventdatetimelocal) Is Not Null
AND sub1.eventdatetimelocal BETWEEN #01/01/2014# AND #01/31/2014#
ORDER BY sub1.agentid, sub1.eventdatetimelocal;

I received an error from the query below stating something along the lines of "this query can only return a maximum of one row". But the query, along with this reference: Calculating time difference between activity timestamps in a query, gave me what I needed, which I am listing below for reference. I decided that my initial needs were too broad, so I simplified the query to return the bare minimum data needed to bring the timestamp up to the previous row. I can layer another query on top with DateDiff to figure out the seconds. It takes a while to process, but it works and can run overnight if required.
SELECT i.agentid, i.eventtype, i.reasoncode, eventdatetimelocal, (SELECT
Min([eventdatetimelocal]) FROM state_detail_subqry WHERE agentid = i.agentid
AND [eventdatetimelocal]>i.[eventdatetimelocal]) AS next_timestamp
FROM state_detail_subqry AS i
ORDER BY agentid, eventdatetimelocal;
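
The intended result (seconds until the same agent's next event) can be checked against the sample rows with a short Python sketch. The function name durations is hypothetical, and the timestamp format string is an assumption based on how the sample data is printed.

```python
from datetime import datetime
from itertools import groupby

FMT = "%m/%d/%Y %I:%M:%S %p"  # assumed format of eventdatetimelocal

# (agentid, eventtype, reasoncode, eventdatetimelocal) from the question
events = [
    (1830, 2, 32762, "01/01/2014 7:11:44 PM"),
    (1830, 3, 0, "01/01/2014 7:13:46 PM"),
    (1830, 2, 32762, "01/01/2014 7:14:55 PM"),
    (1833, 2, 0, "01/01/2014 7:11:35 PM"),
    (1833, 3, 32762, "01/01/2014 7:13:25 PM"),
]


def durations(events):
    """Return (agentid, eventtype, reasoncode, seconds_to_next_event) per row."""
    parsed = sorted(
        ((agent, datetime.strptime(ts, FMT), etype, reason)
         for agent, etype, reason, ts in events),
        key=lambda row: (row[0], row[1]),
    )
    result = []
    for agent, group in groupby(parsed, key=lambda row: row[0]):
        group = list(group)
        # Pair each event with the next one of the same agent;
        # the last event per agent has no successor and is skipped.
        for cur, nxt in zip(group, group[1:]):
            seconds = int((nxt[1] - cur[1]).total_seconds())
            result.append((agent, cur[2], cur[3], seconds))
    return result


for row in durations(events):
    print(row)
```

The key detail, mirrored by the agentid condition in the correlated subqueries above, is that the "next" timestamp is looked up within the same agent only.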

MDX calculation has wrong order of precedence

I'm having an issue with an MDX query, and I think it boils down to the order of precedence between calculating an aggregate and a calculated member.
Let me start with the underlying data, which revolves around a valuation (which has a date and some other data such as a member type, a scheme and, crucially for this question, a loading factor) and an associated value.
The data
Valuation Table
Id | Valuation Date | Member Type | Scheme | Loading Factor
=============================================================
1 | 2010-01-01 | TypeA | Scheme X | 0.02
2 | 2010-01-01 | TypeB | Scheme X | 0.02
3 | 2010-01-01 | TypeA | Scheme Y | 0.02
4 | 2010-01-01 | TypeB | Scheme Y | 0.02
ValuationValue table
ValuationId | Value
====================
1 | 1000.0
2 | 2000.0
3 | 3000.0
4 | 4000.0
When loaded into a cube, this gives a Valuation dimension with attributes MemberType, Scheme and Date, a ValuationValue measure group containing the Value measure, and a Valuation measure group containing Loading Factor, like so:
Cube
-Measure Groups
- Valuation
|_Loading Factor
- ValuationValue
|_Value
- Dimensions
- Valuation
|_MemberType
|_Scheme
|_Date
The question
Loading factor is used to load the Value, think of it like a tax, so 0.02 means "Loading amount is 2% of the value". When returning Value from a query, I need to also calculate the amount to load this value by. A typical query might look like
SELECT
{
[Measures].[Value]
} ON 0,
[Valuation].[Scheme] ON 1
FROM Cube
This would return 2 rows, and as you can see by comparing to the data above it correctly sums across memberType:
Scheme | Value
=================
Scheme X | 3000.0
Scheme Y | 7000.0
Now, if I try to calculate my loading value in that query, it all goes wrong. I'll demonstrate. Given the following query:
WITH MEMBER [Measures].[Loading Value]
AS
(
[Measures].[Value] * [Measures].[Loading Factor]
)
SELECT
{
[Measures].[Value] ,
[Measures].[Loading Value]
} ON 0,
[Valuation].[Scheme] ON 1
FROM Cube
I get the result
Scheme | Value | Loading Value
=================================
Scheme X | 3000.0 | 120.0
Scheme Y | 7000.0 | 280.0
Basically, what is happening is that it is summing my Loading Factor and then multiplying that by the sum of my values. The first row above should be 1000 * 0.02 + 2000 * 0.02 = 60; instead it's calculating 3000 * 0.04 = 120.
This is of course a contrived example, and my actual structure is a bit more complex, but I think this demonstrates the problem. I was under the impression that the calculated member in the example above would be evaluated on a row-by-row basis, instead of after the aggregation of my Value measure.
Thanks for any replies.

Your [Measures].[Loading Factor]: how is it aggregated, is it a SUM?
If I remember correctly, calculated members are generally evaluated against the cells returned (i.e. after aggregation), unless you specify otherwise.
If you want an example, take a look at the currency conversion wizard output. It does something similar using the Leaves function. You will need to do this in the MDX script as a SCOPE'd assignment though.
Given your description, the code could be something like:
CREATE MEMBER [Measures].[Loading Value] AS NULL;
Scope( { [Measures].[Loading Value] } );
Scope( Leaves([Valuation]) );
This = [Measures].[Value] * [Measures].[Loading Factor];
Format_String(This) = "#,##0.00;-#,##0.00";
End Scope;
End Scope;

I'm not sure I follow your example completely, but you might try using SOLVE_ORDER and SCOPE_ISOLATION to manipulate the order of the calculations.
For example,
WITH
MEMBER [Measures].[Custom Calculation] AS
'([Measures].[Sales Count] - [Measures].[Unit Returns])',
SOLVE_ORDER = 65535, SCOPE_ISOLATION = CUBE
SELECT
{[Measures].[Custom Calculation]} ON COLUMNS,
NON EMPTY [Time].[YQMD].[Day].AllMembers ON ROWS
FROM [Waremart]

This one turned out to be REALLY easy.
WITH MEMBER [Measures].[Loading Value]
AS
(
[Measures].[Value] * [Measures].[Loading Factor]
)
-- Sum the per-valuation loading over the valuations that exist in the current context
MEMBER [Measures].[Total Loading Value]
AS
SUM (
EXISTING [Valuation].[Id].[Id],
[Measures].[Loading Value]
)
SELECT
{
[Measures].[Value] ,
[Measures].[Total Loading Value]
} ON 0,
[Valuation].[Scheme] ON 1
FROM Cube
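
The difference between "multiply per valuation, then sum" and "sum, then multiply" is easy to reproduce with the four sample rows from the question. This is a plain Python illustration, not cube code; the function name loading_by_scheme is hypothetical.

```python
from collections import defaultdict

# (Id, Scheme, Value, Loading Factor) from the question's two tables, joined on Id
valuations = [
    (1, "Scheme X", 1000.0, 0.02),
    (2, "Scheme X", 2000.0, 0.02),
    (3, "Scheme Y", 3000.0, 0.02),
    (4, "Scheme Y", 4000.0, 0.02),
]


def loading_by_scheme(rows):
    """Return (row-level result, aggregate-then-multiply result) per scheme."""
    per_row = defaultdict(float)      # what SUM(EXISTING [Id], Value * Factor) gives
    sum_value = defaultdict(float)
    sum_factor = defaultdict(float)
    for _id, scheme, value, factor in rows:
        per_row[scheme] += value * factor
        sum_value[scheme] += value
        sum_factor[scheme] += factor
    # What the naive calculated member gives: SUM(Value) * SUM(Factor)
    aggregated = {s: sum_value[s] * sum_factor[s] for s in sum_value}
    return dict(per_row), aggregated


correct, naive = loading_by_scheme(valuations)
print(correct)  # roughly 60 for Scheme X and 140 for Scheme Y
print(naive)    # roughly 120 for Scheme X and 280 for Scheme Y
```

The naive figures match the wrong results shown in the question (120 and 280), which confirms that the calculated member was being applied to already-aggregated measures.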
