Get the first date and time from within a group - mdx

Many of the questions we get asked are related to how much time passes between two events (e.g. - patient was admitted, some assessment occurred). Some events take place multiple times. How can I find the first date and time that an event occurred per patient visit?
In the model is a table called Clinical Queries. In that table is a column called Activity Date Time. There is a calculated column called Date and another called Time that are based off of the Activity Date Time. These are related to a Date dimension table and a Time dimension table.
I've managed to build a query that gives me the earliest date, and it gives me the earliest time. However, it returns the earliest time regardless of the date. For example, if a patient has two assessments performed, one on 1/1/2017 at 23:59 and another on 1/2/2017 at 00:01, the query returns 1/1/2017 at 00:01.
WITH
MEMBER [Measures].[FirstInterventionDate] AS (
NonEmpty(existing{[Date].[Fiscal].[Date].MEMBERS}, {[Measures].[Clinical Queries Interventions Performed]}).Item(0).Name
)
MEMBER [Measures].[FirstInterventionTime] AS (
NonEmpty(existing{[Time].[Time].[Time].MEMBERS}, {[Measures].[Clinical Queries Interventions Performed]}).Item(0).Name
)
SELECT {[Measures].[Clinical Queries Interventions Performed], [Measures].[FirstInterventionDate], [Measures].[FirstInterventionTime]} on 0,
NON EMPTY([Clinical Queries].[Account Number].Children) ON 1
FROM (
SELECT { [Clinical Queries].[InterventionID].&[3000195],
[Clinical Queries].[InterventionID].&[3000186],
[Clinical Queries].[InterventionID].&[3000184],
[Clinical Queries].[InterventionID].&[3000182],
[Clinical Queries].[InterventionID].&[3000184] } ON 0
FROM (
SELECT { [Clinical Queries].[Account Number].&[ACCT992],
[Clinical Queries].[Account Number].&[ACCT064] } ON 0
FROM [Model]
))
What do I need to change to make it return the earliest date, and the earliest time on that earliest date? For the example above it would be 1/1/2017 at 23:59.

Try the following code:
select
{[Clinical Queries].[Account Number].&[ACCT992],
[Clinical Queries].[Account Number].&[ACCT064]} *
{[Measures].[Clinical Queries Interventions Performed]} on 0,
Generate(
{[Clinical Queries].[InterventionID].&[3000195],
[Clinical Queries].[InterventionID].&[3000186],
[Clinical Queries].[InterventionID].&[3000184],
[Clinical Queries].[InterventionID].&[3000182],
[Clinical Queries].[InterventionID].&[3000184]},
Head(
NonEmpty(
[Clinical Queries].[InterventionID].CurrentMember *
[Date].[Fiscal].[Date].Members *
[Time].[Time].[Time].Members,
[Measures].[Clinical Queries Interventions Performed]
),
1
)
) on 1
from [Model]
Performance may be poor with this approach, so I strongly recommend moving the logic to the DWH level: add a measure column that encodes the timestamp as a number such as 201703301020 (3/30/2017 10:20) and give it Min aggregation. You can then parse the value in an MDX calculated measure using the Left() and Right() functions.
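To illustrate why the numeric encoding works, here is a minimal Python sketch (not MDX). It assumes a minute-resolution key in the form YYYYMMDDHHMM; the function names to_key and from_key are made up for this illustration. Because the digits run from most to least significant, the smallest key is also the earliest timestamp, so a plain Min aggregation returns the date and time together.

```python
from datetime import datetime

KEY_FORMAT = "%Y%m%d%H%M"  # assumed layout: YYYYMMDDHHMM, e.g. 201703301020


def to_key(ts: datetime) -> int:
    """Encode a timestamp as a sortable integer (hypothetical helper)."""
    return int(ts.strftime(KEY_FORMAT))


def from_key(key: int) -> datetime:
    """Decode the integer back into a timestamp (hypothetical helper)."""
    return datetime.strptime(str(key), KEY_FORMAT)


# The two assessments from the question.
events = [datetime(2017, 1, 2, 0, 1), datetime(2017, 1, 1, 23, 59)]

# Min over the encoded keys picks the earliest date AND its time together.
first = from_key(min(to_key(ts) for ts in events))
print(first)  # 2017-01-01 23:59:00
```

Taking the minimum date and the minimum time separately, as the original query does, would instead combine 1/1/2017 with 00:01.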

SQL: Apply an aggregate result per day using window functions

Consider a time-series table that contains three fields: time of type timestamptz, balance of type numeric, and is_spent_column of type text.
The following query generates a valid result for the last day of the given interval.
SELECT
MAX(DATE_TRUNC('DAY', (time))) as last_day,
SUM(balance) FILTER ( WHERE is_spent_column is NULL ) AS value_at_last_day
FROM tbl
2010-07-12 18681.800775017498741407984000
However, I need an equivalent query based on window functions that reports, for each day, the total of the balance column over all days up to and including that day.
Here is what I've tried so far, but without any valid result:
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(sum(balance) FILTER ( WHERE is_spent_column is NULL ) ) OVER ( ORDER BY DATE_TRUNC('DAY', (time)) ) AS total_value_per_day
FROM tbl
group by 1
order by 1 desc
2010-07-12 16050.496339044977568391974000
2010-07-11 13103.159119670350269890284000
2010-07-10 12594.525752964512456914454000
2010-07-09 12380.159588711091681327014000
2010-07-08 12178.119542536668113577014000
2010-07-07 11995.943973804127033140014000
EDIT:
Here is a sample dataset:
LINK REMOVED
The running total can be computed by applying the first query above on the entire dataset up to and including the desired day. For example, for day 2009-01-31, the result is 97.13522530000000000000, or for day 2009-01-15 when we filter time as time < '2009-01-16 00:00:00' it returns 24.446144000000000000.
What I need is an alternative query that computes the running total for each day in a single query.
EDIT 2:
Thank you all so very much for your participation and support.
The reason for differences in result sets of the queries was on the preceding ETL pipelines. Sorry for my ignorance!
Below I've provided a sample schema to test the queries.
https://www.db-fiddle.com/f/veUiRauLs23s3WUfXQu3WE/2
Now both queries given above and the query given in the answer below return the same result.
Consider calculating the running total with a window function after aggregating the data to day level. And since you aggregate with a single condition, the FILTER condition can be converted to a basic WHERE clause:
SELECT daily,
SUM(total_balance) OVER (ORDER BY daily) AS total_value_per_day
FROM (
SELECT
DATE_TRUNC('DAY', (time)) AS daily,
SUM(balance) AS total_balance
FROM tbl
WHERE is_spent_column IS NULL
GROUP BY 1
) AS daily_agg
ORDER BY daily
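The two steps in the query above (aggregate per day, then accumulate in day order) can be sketched in Python to make the logic easy to check by hand. This is only an illustration: the function name and the sample rows are made up, and the timestamps are naive, whereas DATE_TRUNC on a timestamptz depends on the session time zone.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal
from itertools import accumulate


def running_daily_totals(rows):
    """rows: iterable of (time, balance, is_spent_column) tuples."""
    daily = defaultdict(Decimal)
    for time, balance, is_spent in rows:
        if is_spent is None:               # WHERE is_spent_column IS NULL
            daily[time.date()] += balance  # GROUP BY day, SUM(balance)
    days = sorted(daily)                   # ORDER BY daily
    totals = accumulate(daily[d] for d in days)  # SUM(...) OVER (ORDER BY daily)
    return list(zip(days, totals))


# Made-up sample rows, not taken from the linked dataset.
sample = [
    (datetime(2009, 1, 14, 9, 0), Decimal("10.5"), None),
    (datetime(2009, 1, 14, 18, 0), Decimal("2.0"), "spent"),
    (datetime(2009, 1, 15, 7, 30), Decimal("4.5"), None),
]
for day, total in running_daily_totals(sample):
    print(day, total)
```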

Get schedules of particular date from date ranges with MDX query

I have already posted this problem several times, but unfortunately nobody could understand it. I'm sorry for my poor English :(
Let me rephrase ...
I have following fact table
I want to get the records whose date range (DATE_DEB, DATE_FIN) contains a particular date and whose day of week (JOUR) matches that date.
I can do that in SQL like this:
SELECT DATE_DEB,
DATE_FIN,
ID_HOR,
to_char(HR_DEB,'hh24:mi:ss') as HR_DEB,
to_char(HR_FIN,'hh24:mi:ss') as HR_FIN,
JOUR
FROM GRP_HOR HOR, GRP
WHERE GRP.ID_ACTIV_GRP = HOR.ID_ACTIV_GRP
AND TO_DATE('1998-01-08', 'YYYY-MM-DD') between DATE_DEB and DATE_FIN
AND 1 + TRUNC(TO_DATE('1998-01-08', 'YYYY-MM-DD')) - TRUNC(TO_DATE('1998-01-08', 'YYYY-MM-DD'), 'IW') = JOUR
So I get 29 records (see below), each of which falls within its date range and matches the day of week (JOUR). After that I want to extend this to hours (HR_DEB, HR_FIN).
The problem is: what is the best way to do this?
Create 2 date dimensions and link them to DATE_DEB and DATE_FIN.
Create 2 time dimensions and link them to HR_DEB and HR_FIN.
How can I implement the SQL BETWEEN clause in MDX? Or greater than or less than?
Thank you in advance.
OUTPUT :
To specify a range of dates in MDX you just use the colon operator (:).
Here is the documentation on MSDN: https://learn.microsoft.com/en-us/sql/mdx/range-mdx?view=sql-server-2017
This is the example they give:
With Member [Measures].[Freight Per Customer] as
(
[Measures].[Internet Freight Cost]
/
[Measures].[Customer Count]
)
SELECT
{[Ship Date].[Calendar].[Month].&[2004]&[1] : [Ship Date].[Calendar].[Month].&[2004]&[3]} ON 0,
[Product].[Category].[Category].Members ON 1
FROM
[Adventure Works]
WHERE
([Measures].[Freight Per Customer])
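For reference, the row-level condition from the SQL in the question can be written out in Python. This sketch only restates that WHERE clause and is not an MDX solution; the function name and the sample values are hypothetical. Oracle's 1 + d - TRUNC(d, 'IW') numbers the days Monday = 1 through Sunday = 7, which is the same numbering as Python's isoweekday().

```python
from datetime import date


def schedule_matches(target: date, date_deb: date, date_fin: date, jour: int) -> bool:
    """True when target lies in [date_deb, date_fin] and falls on weekday jour.

    jour uses ISO numbering (Monday = 1 ... Sunday = 7), matching
    1 + TRUNC(d) - TRUNC(d, 'IW') in the SQL above.
    """
    in_range = date_deb <= target <= date_fin  # BETWEEN is inclusive on both ends
    return in_range and target.isoweekday() == jour


# Hypothetical schedule row: valid for all of January 1998, on Thursdays.
target = date(1998, 1, 8)
print(schedule_matches(target, date(1998, 1, 1), date(1998, 1, 31), 4))
```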

SQL query with summed statistical data, grouped by date

I'm trying to wrap my head around a problem with making a query for a statistical overview of a system.
The table I want to pull data from is called 'Event', and holds the following columns (among others; only the necessary ones are listed):
date (as timestamp)
positionId (as number)
eventType (as string)
Another table that is most likely necessary is 'Location', which holds, among others, the following columns:
id (as number)
clinic (as boolean)
What I want is a sum of events in different conditions, grouped by days. The user can give an input over the range of days wanted, which means the output should only show a line per day inside the given limits. The columns should be the following:
date: a date, grouping the data by days
deliverySum: A sum of entries for the given day, where eventType is 'sightingDelivered', and the Location with id=positionId has clinic=true
pickupSum: Same as deliverySum, but eventType is 'sightingPickup'
rejectedSum: A sum over events for the day, where the positionId is 4000
acceptedSum: Same as rejectedSum, but positionId is 3000
So, one line should show the sums for the given day over the different criteria.
I'm fairly well read in SQL, but my experience is quite low, which led me to ask here.
Any help would be appreciated
SQL Server has neither timestamps nor booleans, so I'll answer this for MySQL.
select date(e.date) as date,
sum( e.eventType = 'sightingDelivered' and l.clinic ) as deliverySum,
sum( e.eventType = 'sightingPickup' and l.clinic ) as pickupSum,
sum( e.positionId = 4000 ) as rejectedSum,
sum( e.positionId = 3000 ) as acceptedSum
from event e left join
location l
on e.positionId = l.id
where e.date >= $date1 and e.date < $date2 + interval 1 day
group by date(e.date);
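The query relies on MySQL treating a boolean expression as 0 or 1, so summing it counts the rows where the condition holds. The same counting logic, written as a small Python sketch, may make that easier to follow. The function name and the input shapes (a list of event tuples and a dict from location id to its clinic flag) are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import datetime


def daily_stats(events, clinic_by_id):
    """events: iterable of (date, positionId, eventType) tuples.
    clinic_by_id: dict mapping Location.id to its clinic flag.
    """
    stats = defaultdict(lambda: {
        "deliverySum": 0, "pickupSum": 0, "rejectedSum": 0, "acceptedSum": 0,
    })
    for when, position_id, event_type in events:
        row = stats[when.date()]                         # group by date(date)
        at_clinic = clinic_by_id.get(position_id, False)  # left join: no match counts as false
        row["deliverySum"] += int(event_type == "sightingDelivered" and at_clinic)
        row["pickupSum"] += int(event_type == "sightingPickup" and at_clinic)
        row["rejectedSum"] += int(position_id == 4000)
        row["acceptedSum"] += int(position_id == 3000)
    return dict(stats)


# Made-up sample data.
clinics = {7: True, 8: False}
events = [
    (datetime(2020, 1, 1, 9, 0), 7, "sightingDelivered"),
    (datetime(2020, 1, 1, 10, 0), 8, "sightingDelivered"),
    (datetime(2020, 1, 1, 11, 0), 4000, "other"),
    (datetime(2020, 1, 2, 9, 0), 7, "sightingPickup"),
]
for day, row in sorted(daily_stats(events, clinics).items()):
    print(day, row)
```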

DAX running total based on 3 columns, one of which is a repeating integer running total

Very new to DAX/PowerPivot, and faced with devilishly tricky question on day one.
I have some data (90,000 rows) I'm trying to use to calculate a cumulative fatigue score for folk working shifts (using PowerPivot/Excel 2016). As per the below screenshot, the dataset is shift data for multiple employees. It has a cumulative count of days worked vs. days off that resets back to 1 whenever they switch from one state to the other, and a 'Score' column that in my production data contains a measure of how fatigued they are.
I would like to cumulatively sum that fatigue score, and reset it whenever they move between the 'Days worked' and 'Days off' states. My desired output is in the 'Desired' column far right, and I've used green highlighting to show days worked vs. days off as well as put a bold border around separate Emp_ID blocks to help demonstrate the data.
There is some similarity between my question and the SO post at DAX running total (or count) across 2 groups except that one of my columns (i.e. the Cumulative Days one) is in a repeating sequence from 1 to x. And Javier Guillén's post would probably make a good starting point if I'd had a couple of months of DAX under my belt, rather than the couple of hours I've gained today.
I can barely begin to conceptualize what the DAX would need to look like, given I'm a DAX newbie (my background is VBA, SQL, and Excel formulas). But lest someone berate me for not even providing a starting point, I tried to tweak the following DAX without really having a clue what I was doing:
Cumulative:=CALCULATE(
SUM( Shifts[Score] ) ,
FILTER(Shifts,Shifts[Cumulative Days] <= VALUES(Shifts[Cumulative Days] )) ,
ALLEXCEPT( shifts, Shifts[Workday],Shifts[EMP_ID] ) )
Now I'll be the first to admit that this code is the DAX equivalent of the Infinite Monkey Theorem. And alas, I have no bananas today, and my only hope is that someone finds this problem suitably a-peeling.
The problem with this table is that there is no way to determine when to stop summing while performing the cumulative total.
I think one way to achieve it could be calculating the next date on which the continuous workday status changes.
For example, the workday status in the first three rows for EMP_ID 70073 is the same until the fourth row, dated 04-May, which is the date the workday status changes. My idea is to create a calculated column that finds the status change date for each workday series. That column lets us implement the cumulative sum.
Below is the expression for the calculated column I named Helper.
Helper =
IF (
ISBLANK (
CALCULATE (
MIN ( [Date] ),
FILTER (
'Shifts',
'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
&& 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
&& [Date] > EARLIER ( 'Shifts'[Date] )
)
)
),
CALCULATE (
MAX ( [Date] ),
FILTER (
Shifts,
Shifts[Date] >= EARLIER ( Shifts[Date] )
&& Shifts[EMP_ID] = EARLIER ( Shifts[EMP_ID] )
)
)
+ 1,
CALCULATE (
MIN ( [Date] ),
FILTER (
'Shifts',
'Shifts'[EMP_ID] = EARLIER ( 'Shifts'[EMP_ID] )
&& 'Shifts'[Workday] <> EARLIER ( 'Shifts'[Workday] )
&& [Date] > EARLIER ( 'Shifts'[Date] )
)
)
)
In short, the expression says: if the status change date for the current workday series comes back blank, use the last date for that EMP_ID plus one day.
Note there is no way to calculate the change date for the last workday series (in this case the 08-May rows). So if the calculation returns blank, the row belongs to the last series, and the expression returns the max date for that EMP_ID plus one day.
Once the calculated column is in the table you can use the following expression to create a measure for the cumulative value:
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER ( ALL ( 'Shifts'[Helper] ), [Helper] = MAX ( [Helper] ) ),
FILTER ( ALL ( 'Shifts'[Date] ), [Date] <= MAX ( [Date] ) )
)
In a table in Power BI (I will not have access to PowerPivot for at least eight hours) the result is this:
I think there is an easier solution. My first thought was to use a variable, but variables are only supported in DAX 2015 and later, and it is quite possible you are not using Excel 2016.
UPDATE: Leaving only one filter in the measure calculation. FILTER iterates over the entire table, so using a single FILTER with logical operators could perform better.
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER (
ALL ( 'Shifts'[Helper], Shifts[Date] ),
[Helper] = MAX ( [Helper] )
&& [Date] <= MAX ( [Date] )
)
)
UPDATE 2: Solution for pivot tables (matrix), since the previous expression worked only for a tabular visualization. The measure expression was also optimized to use only one filter.
This should be the final expression for pivot table:
Cumulative Score =
CALCULATE (
SUM ( 'Shifts'[Score] ),
FILTER (
ALLSELECTED ( Shifts ),
[Helper] = MAX ( [Helper] )
&& [EMP_ID] = MAX ( Shifts[EMP_ID] )
&& [Date] <= MAX ( Shifts[Date] )
)
)
Note: If you want to ignore filters use ALL instead of ALLSELECTED.
Results in Power BI Matrix:
Results in PowerPivot Pivot Table:
Let me know if this helps.
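As a way to check the expected numbers outside of DAX, the reset-on-status-change logic can be sketched in a few lines of Python. This is not a translation of the Helper column; it swaps in a different technique, grouping consecutive rows that share the same employee and workday status and summing within each run. The function name and sample rows are made up, and the rows are assumed to be sorted by EMP_ID and then Date.

```python
from itertools import groupby


def cumulative_with_reset(rows):
    """rows: list of (emp_id, date, workday, score), sorted by emp_id then date.

    Returns the rows with a running total of score appended. The total
    restarts whenever the employee or the workday status changes.
    """
    out = []
    # groupby only merges consecutive rows with an equal key, which is
    # exactly a "run" of the same status for the same employee.
    for _, run in groupby(rows, key=lambda r: (r[0], r[2])):
        total = 0
        for emp_id, day, workday, score in run:
            total += score
            out.append((emp_id, day, workday, score, total))
    return out


# Made-up sample: employee 1 works two days then has a day off.
shifts = [
    (1, "2017-05-01", True, 2),
    (1, "2017-05-02", True, 3),
    (1, "2017-05-03", False, 1),
    (2, "2017-05-01", False, 4),
    (2, "2017-05-02", False, 5),
]
for row in cumulative_with_reset(shifts):
    print(row)
```

Comparing this output against the measure on a small extract is a quick way to confirm the Helper column marks the series boundaries as intended.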

Using intersect with 2 large sets to get the distinct count - MDX

I have a calculated member which represents an active customer. That would be the following:
WITH MEMBER [Measures].[Active Customers] AS
Count ( nonempty( Filter (
( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
[Measures].[Turnover] > 0
) ) )
This works great, when I want to get active customers in the current period and previous ones, as I get my time dimension, and use the CurrentMember, CurrentMember.PrevMember and CurrentMember with the Lag function in order to get customers who were active in previous periods.
My problem is when I want to get the count of customers, who are common in different members. Say I want to get customers who are active in the current period, and NOT in the previous period. Or another case, active in current, and active in previous. Because of this, I would need to use the INTERSECT function, and my customer dimension has 4 million records. This is already a subset of 9 million records.
So when checking for a customer who is active in 2 consecutive periods, I do this (The Active Previous Period, and Active Current Period is basically the calculated member above, however with CurrentMember and CurrentMember.PrevMember) :
set [Previous Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
[Measures].[Active Previous Period] > 0
)
set [Current Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Current Period] ),
[Measures].[Active Current Period] > 0
)
member [Measures].[Active 2 consecutive periods] as
count(INTERSECT([Current Active Customers Set],[Previous Active Customers Set]) )
This takes forever. Is there any way to improve on, or work around, this performance problem of using INTERSECT with large sets? Or maybe optimizations to the MDX query? I always tried using a subset of my customers dimension, but this only reduced the number of records to less than 4 million, so it's still large. Any help would be appreciated!
I would assume you can speed this up if you avoid using named sets and calculated members as far as possible.
One step towards this would be as follows: Create a new fact table with foreign keys just to your customer and time dimension, and add a record to it if a customer was active on that day. Build a measure group, let's say "activeCustomers" based on this table, just using "count" as the measure. But make this invisible, as we do not need it.
Then, you can replace
count( nonempty( Filter (
( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
[Measures].[Turnover] > 0
) ) )
with
count( Exists(
[Customer].[Customer Key].Members,
<state your time selection here>,
"activeCustomers"
) )
Exists should be more efficient than Filter.
Another optimization approach: instead of intersecting two sets generated via Filter, you could define one set with a more complex filter. That way Analysis Services does not have to loop over the customers twice and then intersect the results:
set [Active Customers Set] AS
Filter (
( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
[Measures].[Active Previous Period] > 0
AND
[Measures].[Active Current Period] > 0
)
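The difference between the two approaches can be pictured with a small Python sketch. It says nothing about how Analysis Services evaluates the sets internally; it only shows that a combined condition gives the same count in one pass, without building two intermediate sets. The function names and the dict-based inputs (customer key to turnover per period) are hypothetical.

```python
def count_via_intersect(customers, current, previous):
    """Two filter passes over the customers, then an intersection."""
    cur_active = {c for c in customers if current.get(c, 0) > 0}
    prev_active = {c for c in customers if previous.get(c, 0) > 0}
    return len(cur_active & prev_active)


def count_single_pass(customers, current, previous):
    """One pass with both conditions combined, no intermediate sets."""
    return sum(
        1 for c in customers
        if previous.get(c, 0) > 0 and current.get(c, 0) > 0
    )


# Made-up turnover per customer key for two consecutive periods.
customers = ["A", "B", "C", "D"]
current = {"A": 120.0, "B": 0.0, "C": 35.5}
previous = {"A": 80.0, "B": 10.0, "D": 5.0}

print(count_via_intersect(customers, current, previous))
print(count_single_pass(customers, current, previous))
```

For "active in the current period but not in the previous one", the second condition would simply be negated inside the same single pass.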