Functionality Similar to SUMIFS with DAX? - powerpivot

Sample workbook: http://1drv.ms/1VDgAjf
I've got a table similar to:
ActiveDate CommenceDate Amount
-------------------------------------------
20150115 20150201 10
20150115 20150201 2
20150223 20150301 3
20150223 20150202 5
I need to calculate the following:
Date Amount
---------------------
25-Jan-15 0
30-Jan-15 0
04-Feb-15 12
09-Feb-15 12
14-Feb-15 12
19-Feb-15 12
24-Feb-15 17
01-Mar-15 20
06-Mar-15 20
11-Mar-15 20
So.. in Excel I've tested this with the following statement:
=SUMIFS(
Table[amount]
,Table[commence] ,"<="&TEXT(<<DateRef>>, "yyyymmdd")
,Table[active] ,"<="&TEXT(<<DateRef>>, "yyyymmdd")
)
This works fine. My question is: how do I replicate this in DAX?
Here is my best stab (assuming a date dimension, and it connected to "CommenceDate"):
TotalAmount :=
CALCULATE (
SUM ( Table[Amount] ),
FILTER (
ALL ( 'Date'[Date] ),
'Date'[Date] <= MAX ( 'Date'[Date] )
)
)
My best idea (and I think it's a pretty crappy idea) is to add a new column that gives me the greater of ActiveDate and CommenceDate, then join to that column with an inactive relationship and make the relationship active just for this calculation:
=IF([active]>[commence], [active], [commence])
Thoughts?

Your thought of creating an additional column is probably the best in this case. No matter what you do, you'll have to escape the active relationship between Transactions[commence] and DimDate[DateKey] to get this logic working correctly.
Without the extra column, after escaping the relationship you would still have to filter on two columns ([active], [commence]) against the current date in context. That is more cumbersome and less efficient than navigating a single relationship and using a single filter.
The thing to keep in mind is that you need to apply the relationship manipulation step before, not with the filter manipulation step. This looks like the following nested CALCULATE():
TotalAmount:=
CALCULATE(
CALCULATE(
SUM(Transactions[Amount])
,USERELATIONSHIP(Transactions[MaxDateKey], DimDate[DateKey])
)
,FILTER(
ALL(DimDate)
,DimDate[DateKey] <= MAX(DimDate[DateKey])
)
)
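As a sanity check on the logic (not on the DAX itself), here is a small Python sketch that reproduces the expected numbers from the question. It assumes MaxDateKey is simply the later of ActiveDate and CommenceDate, as proposed in the question; the name total_amount is made up for illustration.

```python
from datetime import date

# Sample rows from the question: (ActiveDate, CommenceDate, Amount).
ROWS = [
    (date(2015, 1, 15), date(2015, 2, 1), 10),
    (date(2015, 1, 15), date(2015, 2, 1), 2),
    (date(2015, 2, 23), date(2015, 3, 1), 3),
    (date(2015, 2, 23), date(2015, 2, 2), 5),
]


def total_amount(as_of, rows=ROWS):
    """Sum amounts whose later date (active or commence) is on or before as_of."""
    return sum(
        amount
        for active, commence, amount in rows
        if max(active, commence) <= as_of
    )


# Print the running total for a few of the report dates in the question.
for d in (date(2015, 1, 25), date(2015, 2, 4), date(2015, 2, 24), date(2015, 3, 1)):
    print(d.isoformat(), total_amount(d))
```

Filtering on the single "later of the two" date is equivalent to the two SUMIFS conditions, because both dates are on or before the reference date exactly when the larger one is.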


Snowflake Unpivot data with boolean

I have data as following
STORE_NO STORE_ADDRESS STORE_TYPE STORE_OWNER STORE_HOURS
1 123 Drive Thru Harpo 24hrs
1 123 Curbside Harpo 24hrs
1 123 Counter Harpo 24hrs
2 456 Drive Thru Groucho 9 to 9
2 456 Counter Groucho 9 to 9
And I want to pivot it as following.
STORE_NO STORE_ADDRESS Drive Thru Curbside Counter STORE_OWNER STORE_HOURS
1 123 TRUE TRUE TRUE Harpo 24hrs
2 456 TRUE FALSE TRUE Groucho 9 to 9
Here is what I have
select *
from stores
pivot(count(STORE_TYPE) for STORE_TYPE in ('Drive Thru', 'Curbside', 'Counter'))
as store_flattened;
But this returns a 1 or a 0. How do I convert to TRUE / FALSE without making this a CTE?
If you are OK with listing the column names rather than using select *, then the following can be used:
select STORE_NO,STORE_ADDRESS,STORE_OWNER,STORE_HOURS,
"'Drive Thru'"=1 as drivethru,
"'Curbside'"=1 as curbside,
"'Counter'"=1 as counter
from stores
pivot(count(STORE_TYPE) for STORE_TYPE in ('Drive Thru', 'Curbside', 'Counter'))
as store_flattened;
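To show what the boolean pivot is meant to produce, here is a Python sketch of the same reshaping on the sample data. It is only an illustration of the logic, not Snowflake code, and the function name pivot_to_flags is hypothetical. It tests for presence of a store type rather than a count of exactly 1, which also stays TRUE if a store ever lists the same type twice.

```python
STORE_TYPES = ("Drive Thru", "Curbside", "Counter")

# Sample rows: (STORE_NO, STORE_ADDRESS, STORE_TYPE, STORE_OWNER, STORE_HOURS).
ROWS = [
    (1, "123", "Drive Thru", "Harpo", "24hrs"),
    (1, "123", "Curbside", "Harpo", "24hrs"),
    (1, "123", "Counter", "Harpo", "24hrs"),
    (2, "456", "Drive Thru", "Groucho", "9 to 9"),
    (2, "456", "Counter", "Groucho", "9 to 9"),
]


def pivot_to_flags(rows, store_types=STORE_TYPES):
    """Return one record per store with a True/False flag per store type."""
    seen = {}
    for store_no, address, store_type, owner, hours in rows:
        # Group on every non-pivoted column, collecting the types seen.
        seen.setdefault((store_no, address, owner, hours), set()).add(store_type)

    result = []
    for (store_no, address, owner, hours), types in seen.items():
        record = {"STORE_NO": store_no, "STORE_ADDRESS": address}
        record.update({t: t in types for t in store_types})
        record.update({"STORE_OWNER": owner, "STORE_HOURS": hours})
        result.append(record)
    return result


for record in pivot_to_flags(ROWS):
    print(record)
```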
I honestly think you should leave it as is. Any workaround will either complicate the pivot logic or force you to specify the column names manually in multiple places, especially with the pivoted columns appearing before the rest. Having said that, if you must find a way to do this, here is an attempt.
I know you wanted to avoid a CTE, but I am using it for a different purpose than the one you might have had in mind. The general idea, in steps:
1. In a CTE, sub-select the columns you want to appear before the pivoted columns. Create a flag based on whether the store_type (b.value) from the lateral flatten matches the existing store_type for a given row. The values passed to input=> can be copy-pasted into the pivot clause.
2. Pivot using max(flag), which turns (false,true) into true and (false,false) into false. Run the CTE portion on its own to see why that matters and how it solves the main issue.
3. Finally, use a natural join with the main table to append the rest of the columns (this is the first time I have found a natural join useful enough to keep; I actively avoid them otherwise).
with cte (store_no, store_address, store_type, flag) as
(select store_no, store_address, b.value::string, b.value::string = store_type
from t, lateral flatten(input=>['Drive Thru', 'Curbside', 'Counter']) b)
select *
from cte pivot(max(flag) for store_type in ('Drive Thru', 'Curbside', 'Counter'))
natural join (select distinct store_no, store_owner, store_hours from t)

Is there a way to dynamically set ROWS BETWEEN X PRECEDING AND CURRENT ROW?

I'm looking for a way to dynamically set the start of the window frame in a SQL Server query using ROWS BETWEEN.
Something like:
SUM(field) OVER(ORDER BY field2 ROWS BETWEEN field3 PRECEDING AND CURRENT ROW)
field3 holds the number of items (via GROUP BY from a CTE) that make up a group.
Is that possible, or should I try a different approach?
>> EDIT
My query is too big and messy to share here, but let me try to explain what I need. It comes from a report builder that lets users create custom formulas, like "employees/10". It also lets the user enter a constant formula like "12", and I need to calculate subtotals and the grand total for those too. When the formula uses a field, like "employees", everything works fine. But for constant values I can't sum the values without rewriting a lot of stuff (which I'm trying to avoid).
So, consider a CTE called "aggregator" and the following query:
SELECT
*,
"employees"/10 as "ten_percent",
12 as "twelve"
FROM aggregator
This query returns this output:
row_type counter company_name department_name employees ten_percent twelve
data 1 A A1 10 1 12
data 1 A A2 15 1,5 12
data 1 A A3 10 1 12
subtotal 3 A 35 3,5 12
data 1 B B1 10 1 12
subtotal 1 B 10 1 12
total 4 45 4,5 12
As you can see, the values for "twelve" are wrong for the subtotal and total row types. I'm trying to solve this without changing the CTE.
ROLLUP won't work because I already have the sums for the other columns.
I tried this (I omitted "row_type_sort" from the table above; it defines the sorting):
CASE
WHEN row_type = 'data' THEN
MAX(aggregator.[twelve])
ELSE
SUM(SUM(aggregator.[twelve]))
OVER (ORDER BY "row_type_sort" ROWS BETWEEN unbounded PRECEDING AND CURRENT ROW)
END AS "twelve"
This would work if I could replace "unbounded" with the value of the "counter" column, which was my original question.
LAG/LEAD weren't helpful either.
I'm out of ideas. Is it possible to achieve what I need by changing only this part of the query, or does the result of the CTE have to change as well?
Thanks
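To pin down the numbers the query is supposed to produce, here is a Python sketch of the intended arithmetic. It is not a SQL Server solution and does not use a window frame at all. It assumes that "counter" on a subtotal or total row equals the number of data rows that row covers (as in the sample output above), so a constant column can be totalled as constant times counter. The helper name fix_constant_column is hypothetical.

```python
# Rows as returned by the sample query, reduced to the relevant columns.
ROWS = [
    {"row_type": "data", "counter": 1, "company_name": "A", "twelve": 12},
    {"row_type": "data", "counter": 1, "company_name": "A", "twelve": 12},
    {"row_type": "data", "counter": 1, "company_name": "A", "twelve": 12},
    {"row_type": "subtotal", "counter": 3, "company_name": "A", "twelve": 12},
    {"row_type": "data", "counter": 1, "company_name": "B", "twelve": 12},
    {"row_type": "subtotal", "counter": 1, "company_name": "B", "twelve": 12},
    {"row_type": "total", "counter": 4, "company_name": "", "twelve": 12},
]


def fix_constant_column(rows, column):
    """Scale a constant column on subtotal/total rows by the rows they cover."""
    fixed = []
    for row in rows:
        value = row[column]
        if row["row_type"] != "data":
            # Assumption: counter == number of data rows behind this row.
            value = value * row["counter"]
        fixed.append({**row, column: value})
    return fixed


for row in fix_constant_column(ROWS, "twelve"):
    print(row["row_type"], row["company_name"], row["twelve"])
```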

Counting latest instance of multiple only based on filter context

I've got a large table of events that have occurred in an inventory of vehicles, which affect whether they are in service or out of service. I would like to create a measure that would be able to count the number of vehicles in the various inventories at any point in time, based on the events in this table.
This table is pulled from a SQL database into an Excel 2016 sheet, and I'm using PowerPivot to try to come up with the DAX measure.
Here is some example data event_list:
vehicle_id event_date event event_sequence inventory
100 2018-01-01 purchase 1 in-service
101 2018-01-01 purchase 1 in-service
102 2018-02-04 purchase 1 in-service
100 2018-02-07 maintenance 2 out-of-service
101 2018-02-14 damage 2 out-of-service
101 2018-02-18 repaired 3 in-service
100 2018-03-15 repaired 3 in-service
102 2018-05-01 damage 2 out-of-service
103 2018-06-03 purchase 1 in-service
I'd like to be able to create a pivot table in Excel (or use CUBE functions, etc) to get an output table like this:
date in-service out-of-service
2018-02-04 3 0
2018-02-14 1 2
2018-03-15 3 0
2018-06-03 3 1
Essentially, I want to be able to calculate the inventory based on any date in time. The example only has a few dates, but hopefully provides enough of a picture.
I've basically come up with this so far, but it counts more vehicles than desired - I can't figure out how to only take the latest event_sequence or event_date and use that to count the inventory.
cumulative_vehicles_at_date:=CALCULATE(
COUNTA([vehicle_id]),
IF(IF(HASONEVALUE (event_list[event_date]), VALUES (event_list[event_date]))>=event_list[event_date],event_list[event_date])
)
I tried using MAX() and EARLIER() functions, but they don't seem to work.
Edit: Added the PowerBI tag as I'm now using that software to attempt to solve this as well. See comments on Alexis Olson's answer.
I think I've found a much cleaner method than I gave previously.
Let's add two columns onto the event_list table. One which counts vehicles "in-service" on that date and one which counts vehicles "out-of-service" on that date.
InService =
VAR Summary = SUMMARIZE(
FILTER(event_list,
event_list[event_date] <= EARLIER(event_list[event_date])),
event_list[vehicle_id],
"MaxSeq", MAX(event_list[event_sequence]))
VAR Filtered = FILTER(event_list,
event_list[event_sequence] =
MAXX(
FILTER(Summary,
event_list[vehicle_id] = EARLIER(event_list[vehicle_id])),
[MaxSeq]))
RETURN SUMX(Filtered, 1 * (event_list[inventory] = "in-service"))
You can create an analogous calculated column for OutOfService or you can just take the total minus the InService count.
OutOfService =
CALCULATE(
DISTINCTCOUNT(event_list[vehicle_id]),
FILTER(event_list,
event_list[event_date] <= EARLIER(event_list[event_date])))
- event_list[InService]
Now all you have to do is put event_date on the matrix visual rows section and add the InService and OutOfService columns to the values section (use Maximum or Minimum for the aggregation option rather than Sum).
Here's the logic behind the calculated column InService:
We first create a Summary table which calculates the maximal event_sequence value for each vehicle. (We filter the event_date to only consider dates up to the current one we are working with.)
Now that we know what the last event_sequence value is for each vehicle, we use that to filter the entire table down to just the rows that correspond to those vehicles and sequence values. The filter goes through the table row by row and checks to see if the sequence value matches the one we calculated in the Summary table. Note that when we filter the Summary table to just the vehicle we are currently working with, we only get a single row. I'm just using MAXX to extract the [MaxSeq] value. (It's kind of like using LOOKUPVALUE, but you can't use that on a variable.)
Now that we've filtered the table just to the most recent events for each vehicle, all we need to do is count how many of them are "in-service". I used a SUMX here where the 1*(True/False) coerces the boolean value to return 1 or 0.
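The three steps above can be checked against the expected output in the question with a short Python sketch. It mirrors the logic only (latest event per vehicle up to a date, then count by inventory) and is not a translation of the DAX; the name inventory_at is made up for illustration.

```python
from datetime import date

# Sample event_list rows: (vehicle_id, event_date, event_sequence, inventory).
EVENTS = [
    (100, date(2018, 1, 1), 1, "in-service"),
    (101, date(2018, 1, 1), 1, "in-service"),
    (102, date(2018, 2, 4), 1, "in-service"),
    (100, date(2018, 2, 7), 2, "out-of-service"),
    (101, date(2018, 2, 14), 2, "out-of-service"),
    (101, date(2018, 2, 18), 3, "in-service"),
    (100, date(2018, 3, 15), 3, "in-service"),
    (102, date(2018, 5, 1), 2, "out-of-service"),
    (103, date(2018, 6, 3), 1, "in-service"),
]


def inventory_at(as_of, events=EVENTS):
    """Count vehicles per inventory using each vehicle's latest event up to as_of."""
    latest = {}  # vehicle_id -> (event_sequence, inventory)
    for vehicle_id, event_date, sequence, inventory in events:
        if event_date > as_of:
            continue  # ignore events after the date being evaluated
        if vehicle_id not in latest or sequence > latest[vehicle_id][0]:
            latest[vehicle_id] = (sequence, inventory)

    counts = {"in-service": 0, "out-of-service": 0}
    for _, inventory in latest.values():
        counts[inventory] += 1
    return counts


for d in (date(2018, 2, 4), date(2018, 2, 14), date(2018, 3, 15), date(2018, 6, 3)):
    print(d.isoformat(), inventory_at(d))
```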
This is pretty difficult. I don't have a great answer, but here's something that kind of works.
You'll create a new calculated table where you'll calculate the status for each vehicle on each date. Start with the base cross join for each vehicle and each date:
= CROSSJOIN(VALUES(event_list[vehicle_id]), VALUES(event_list[event_date]))
Then add a calculated column to find the max sequence number for each vehicle on that date.
Sequence = MAXX(
FILTER(event_list,
event_list[event_date] <= Cross[event_date] &&
event_list[vehicle_id] = Cross[vehicle_id]),
event_list[event_sequence])
Now you can lookup the inventory value for each vehicle/sequence pair with another calculated column:
Inventory = LOOKUPVALUE(
event_list[inventory],
event_list[vehicle_id], Cross[vehicle_id],
event_list[event_sequence], Cross[Sequence])
Once you have this, you can create a matrix using this calculated table. Put the event_date on the rows and Inventory on the columns. Filter out blank inventory values in the visual level filter and put the vehicle_id in the values field, using a count or distinct count as the aggregation method (instead of the default sum).

Efficiently identify all FK items with n>3 dates within any 8 week period from a SQL table?

I have a ~400,000 row table containing the dates at which a collection of ~30,000 people had appointments. Each row has the patient ID number and an appointment date. I want to efficiently select people who had at least 4 appointments in an 8 week span. Ideally, I would also flag the appointments that were within this 8 week span as I did so. I am working in a server environment that does not allow CLR aggregate functions. Is this possible to do in SQL server? If so, how?
What I've thought about:
If I could write my own aggregate function to do this via GROUP BY that would obviously be best - but I can't seem to find any way to do it with the built in aggregate functions.
I can add a column to my original table giving a date 8 weeks out from any given appointment, but can't come up with any way that doesn't involve a for loop to then ask the question row by row whether there are at least 3 other appointments within that window.
Finally, I've even thought that perhaps I could just do GROUP BY but somehow create 100 new columns (as there are up to that many appointments for some patients) to create a table that contains every appointment indexed by patient, but even as a SQL newbie I'm pretty sure that as soon as I get to the point of imagining adding 100 new columns I'm going down the wrong road....
For clarity of discussion, here is some notation:
MyTable:
ApptID PatientID ApptDate (in smalldatetime)
--------------------------------------------------
Apt1 Pt1 Datetime1
Apt2 Pt1 Datetime2
Apt3 Pt2 Datetime3
... ... ...
Desired output (one option):
PatientID 4aptsIn8weeks? (Boolean) InitialApptDateForWin
Pt1 1 Datetime1
Pt2 0 NULL
Pt3 1 Datetime3
...
Desired output (another option):
ApptID PatientID ApptDate InAn8wkWindow? InitialApptDateForWin
Apt1 Pt1 Datetime1 1 Datetime1
Apt2 Pt1 Datetime2 1 Datetime1
Apt3 Pt2 Datetime3 0 NULL
... ... ...
But really, any output format that will in the end let me select patients and appointments that meet this criterion would be dandy....
Thanks for any ideas!
EDIT: Here's a slightly decompressed outline of my implementation of the selected answer below, just in case the details are helpful for anyone else (being new to SQL, it took me a couple stabs to get it working):
WITH MyTableAlias AS (
SELECT * FROM MyTable
)
SELECT MyTableAlias.PatientID, MyTable.Apptdate AS V1,
MyTableAlias.Apptdate AS V2
INTO temp1
FROM MyTable INNER JOIN MyTableAlias
ON (
MyTable.PatientID = MyTableAlias.PatientID
AND (DATEDIFF(Wk,MyTable.Apptdate,MyTableAlias.Apptdate) <=8 )
);
-- Since this gives for any given two visit dates 3 hits
-- (V1-V1, V1-V2, V2-V2), delete the ones where the second visit is being
-- selected as V1:
DELETE FROM temp1
WHERE V2<V1;
-- So far we have just selected pairs of visits within an 8 week
-- span of each other, including an entry for each visit being
-- within 8 weeks of itself, but for the rest only including the item
-- where the second visit is after the first. Now we want to look
-- for examples of first visits where there are at least 4 hits:
SELECT PatientID, V1, MAX(V2) AS lastvisitinspan, DATEDIFF(Wk,V1,MAX(V2))
AS nWeeksInSpan, COUNT(*) AS nVisitsInSpan
INTO MyOutputTable
FROM temp1
GROUP BY PatientID, V1
HAVING COUNT(*)>3;
-- From here on it's just a matter of how I want to handle patients with two
-- separate V1 examples meeting criteria...
Rough outline of the query:
INNER JOIN the table ("table") with itself ("alias"), the ON clause would be:
table.patientid = alias.patientid
table.appointment_date < alias.appointment_date
datediff(table.appointment_date, alias.appointment_date) <= 8 week
Then GROUP BY table.patientid, table.appointment_date
Output table.patientid, table.appointment_date, MAX(alias.appointment_date), COUNT(*)
Add a HAVING COUNT(*) > n clause
There are some issues though:
With 400,000 rows the JOIN could produce a very large result set
It will count some date ranges twice. E.g. if there were 4 visits in 9 week period then it will return two rows (#1, #2, #3 and #2, #3, #4).
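For comparison, here is a Python sketch of the same check done with a different technique: a sliding window over each patient's sorted dates instead of a self-join. It is meant only to make the criterion concrete, not as a replacement for the SQL; the function name first_qualifying_window and the sample dates are invented for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def first_qualifying_window(appointments, min_visits=4, span=timedelta(weeks=8)):
    """Map each qualifying patient to the first date that starts a dense window.

    appointments is an iterable of (patient_id, appointment_date) pairs.
    """
    by_patient = defaultdict(list)
    for patient_id, appt_date in appointments:
        by_patient[patient_id].append(appt_date)

    hits = {}
    for patient_id, dates in by_patient.items():
        dates.sort()
        # Visit i and visit i + min_visits - 1 bound a run of min_visits visits.
        for i in range(len(dates) - min_visits + 1):
            if dates[i + min_visits - 1] - dates[i] <= span:
                hits[patient_id] = dates[i]
                break  # report only the earliest window per patient
    return hits


# Hypothetical sample: Pt1 has 4 visits within 8 weeks, Pt2 does not.
SAMPLE = [
    ("Pt1", date(2015, 1, 1)), ("Pt1", date(2015, 1, 15)),
    ("Pt1", date(2015, 2, 1)), ("Pt1", date(2015, 2, 20)),
    ("Pt2", date(2015, 1, 1)), ("Pt2", date(2015, 2, 1)),
    ("Pt2", date(2015, 3, 1)), ("Pt2", date(2015, 4, 1)),
]

print(first_qualifying_window(SAMPLE))
```

Because it stops at the first qualifying window per patient, this sketch sidesteps the double counting of overlapping ranges noted above, at the cost of not listing every window.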

finding range by comparing two tables

I have a table in database as "EXPERIENCE RANGE" with rows as (I can also edit this table according to my need)
0
0.5
1
2
3
5
10
20
I have total experience as integer. I need to display the range in which it lies.
Example - for experience of 8, Range will be 5 - 10
I need to write a sql query. Any ideas will be quite helpful as I am new to SQL.
I cannot hard code it..need to take values from tables only.
Assuming that you are using Oracle, the following query works fine with your existing table:
SELECT
( SELECT MAX( value ) FROM experience_range WHERE value <= :search_value ) AS range_start,
( SELECT MIN( value ) FROM experience_range WHERE value > :search_value ) AS range_end
FROM dual;
No need to hardcode the values, and no need to store the lower and upper bounds redundantly.
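The two subqueries amount to finding the neighbours of the search value in the sorted list of boundaries. A Python sketch of that lookup, using the boundary values from the question, is below; the name find_range is hypothetical, and None plays the role of the NULL the SQL returns when no boundary exists on one side.

```python
from bisect import bisect_right

# Boundary values from the EXPERIENCE RANGE table in the question.
RANGE_VALUES = [0, 0.5, 1, 2, 3, 5, 10, 20]


def find_range(search_value, values=RANGE_VALUES):
    """Return (largest value <= search_value, smallest value > search_value)."""
    ordered = sorted(values)
    i = bisect_right(ordered, search_value)  # index of first value > search_value
    range_start = ordered[i - 1] if i > 0 else None
    range_end = ordered[i] if i < len(ordered) else None
    return range_start, range_end


print(find_range(8))
```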
You can do it with a CASE expression; the syntax is:
SELECT
CASE
WHEN experience >= 0 and experience <= 4 THEN '0-4'
WHEN experience >= 5 and experience <= 10 THEN '5-10'
.....
ELSE 'No Range'
END as Range
FROM Table_Name
If you do need to store the ranges in a table, I would personally suggest altering the structure of the range table (Assuming you are able to), maybe something like:
|--------------------------------------|
|ID|DESCRIPTION|LOWER_LIMIT|UPPER_LIMIT|
|1 | 0 - 0.5 | 0 | 0.5 |
|2 | 0.5 - 1 | 0.5 | 1 |
...
Then you could get your range by running something like:
SELECT DESCRIPTION FROM [RANGES] WHERE <VALUE> >= LOWER_LIMIT AND <VALUE> < UPPER_LIMIT
EDIT - Mikhail's answer also works, defining the ranges within the query itself is also an option and probably simpler providing you don't need to get these ranges from several reports. (That would require editing every report/query individually)
EDIT 2 - I see you are not able to hardcode the ranges, in which case the above would be best. Can I ask why you are unable to hardcode them?