Counting latest instance of multiple only based on filter context - ssas

I've got a large table of events that have occurred in an inventory of vehicles, which affect whether they are in service or out of service. I would like to create a measure that would be able to count the number of vehicles in the various inventories at any point in time, based on the events in this table.
This table is pulled from a SQL database into an Excel 2016 sheet, and I'm using PowerPivot to try to come up with the DAX measure.
Here is some example data from the table event_list:
vehicle_id  event_date  event        event_sequence  inventory
100         2018-01-01  purchase     1               in-service
101         2018-01-01  purchase     1               in-service
102         2018-02-04  purchase     1               in-service
100         2018-02-07  maintenance  2               out-of-service
101         2018-02-14  damage       2               out-of-service
101         2018-02-18  repaired     3               in-service
100         2018-03-15  repaired     3               in-service
102         2018-05-01  damage       2               out-of-service
103         2018-06-03  purchase     1               in-service
I'd like to be able to create a pivot table in Excel (or use CUBE functions, etc.) to get an output table like this:
date        in-service  out-of-service
2018-02-04  3           0
2018-02-14  1           2
2018-03-15  3           0
2018-06-03  3           1
Essentially, I want to be able to calculate the inventory as of any date. The example only has a few dates, but hopefully it gives enough of a picture.
So far I've come up with the measure below, but it counts more vehicles than it should: I can't figure out how to take only the latest event_sequence (or event_date) for each vehicle and use that to count the inventory.
cumulative_vehicles_at_date:=CALCULATE(
    COUNTA([vehicle_id]),
    IF(
        IF(HASONEVALUE(event_list[event_date]), VALUES(event_list[event_date]))
            >= event_list[event_date],
        event_list[event_date]
    )
)
I tried using MAX() and EARLIER() functions, but they don't seem to work.
Edit: Added the PowerBI tag as I'm now using that software to attempt to solve this as well. See comments on Alexis Olson's answer.

I think I've found a much cleaner method than I gave previously.
Let's add two calculated columns to the event_list table: one that counts the vehicles "in-service" as of that row's date and one that counts the vehicles "out-of-service" as of that date.
InService =
VAR Summary =
    SUMMARIZE(
        FILTER(event_list,
            event_list[event_date] <= EARLIER(event_list[event_date])),
        event_list[vehicle_id],
        "MaxSeq", MAX(event_list[event_sequence]))
VAR Filtered =
    FILTER(event_list,
        event_list[event_sequence] =
            MAXX(
                FILTER(Summary,
                    event_list[vehicle_id] = EARLIER(event_list[vehicle_id])),
                [MaxSeq]))
RETURN SUMX(Filtered, 1 * (event_list[inventory] = "in-service"))
You can create an analogous calculated column for OutOfService or you can just take the total minus the InService count.
OutOfService =
CALCULATE(
DISTINCTCOUNT(event_list[vehicle_id]),
FILTER(event_list,
event_list[event_date] <= EARLIER(event_list[event_date])))
- event_list[InService]
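Now all you have to do is put event_date in the Rows section of a matrix visual and add the InService and OutOfService columns to the Values section. Use Maximum or Minimum as the aggregation rather than Sum: every row with the same event_date has the same column values, so Sum would multiply the count by the number of events on that date.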
Here's the logic behind the calculated column InService:
We first create a Summary table that calculates the maximum event_sequence value for each vehicle. (We filter on event_date so that only dates up to the current row's date are considered.)
Now that we know the last event_sequence value for each vehicle, we use it to filter the entire table down to just the rows that correspond to those vehicle and sequence values. The filter goes through the table row by row and checks whether the sequence value matches the one calculated in the Summary table. Note that when we filter the Summary table to just the vehicle we are currently working with, we get only a single row; I'm just using MAXX to extract its [MaxSeq] value. (It's kind of like using LOOKUPVALUE, but you can't use that on a variable.)
Now that we've filtered the table down to the most recent event for each vehicle, all we need to do is count how many of them are "in-service". I used SUMX here, where 1 * (TRUE/FALSE) coerces the boolean to 1 or 0.
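To make the three steps concrete outside DAX, here is a minimal Python sketch of the same logic, run against the sample rows from the question. It is an illustration, not the DAX itself: the function name inventory_as_of is hypothetical, and it assumes event_sequence is unique and increasing per vehicle, as in the sample data.

```python
from datetime import date

# Sample rows from event_list: (vehicle_id, event_date, event, event_sequence, inventory)
EVENTS = [
    (100, date(2018, 1, 1), "purchase", 1, "in-service"),
    (101, date(2018, 1, 1), "purchase", 1, "in-service"),
    (102, date(2018, 2, 4), "purchase", 1, "in-service"),
    (100, date(2018, 2, 7), "maintenance", 2, "out-of-service"),
    (101, date(2018, 2, 14), "damage", 2, "out-of-service"),
    (101, date(2018, 2, 18), "repaired", 3, "in-service"),
    (100, date(2018, 3, 15), "repaired", 3, "in-service"),
    (102, date(2018, 5, 1), "damage", 2, "out-of-service"),
    (103, date(2018, 6, 3), "purchase", 1, "in-service"),
]


def inventory_as_of(events, as_of):
    """Return (in_service, out_of_service) vehicle counts as of a date."""
    # Summary: highest event_sequence per vehicle among events on or before as_of
    max_seq = {}
    for vehicle_id, event_date, _event, seq, _inventory in events:
        if event_date <= as_of:
            max_seq[vehicle_id] = max(seq, max_seq.get(vehicle_id, 0))
    # Filtered: keep only the row whose sequence equals that vehicle's maximum
    latest = [inv for vehicle_id, _d, _e, seq, inv in events
              if max_seq.get(vehicle_id) == seq]
    in_service = sum(1 for inv in latest if inv == "in-service")
    # OutOfService: distinct vehicles seen so far minus the in-service count
    return in_service, len(max_seq) - in_service


if __name__ == "__main__":
    # One output line per distinct event_date, like the rows of the matrix visual
    for d in sorted({e[1] for e in EVENTS}):
        print(d, *inventory_as_of(EVENTS, d))
```

For the four dates in the desired output this gives (3, 0), (1, 2), (3, 0) and (3, 1), matching the table in the question.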

This is pretty difficult. I don't have a great answer, but here's something that kind of works.
You'll create a new calculated table (called Cross below) in which you calculate the status of each vehicle on each date. Start with the base cross join of every vehicle with every date:
Cross = CROSSJOIN(VALUES(event_list[vehicle_id]), VALUES(event_list[event_date]))
Then add a calculated column to find the max sequence number for each vehicle on that date.
Sequence = MAXX(
FILTER(event_list,
event_list[event_date] <= Cross[event_date] &&
event_list[vehicle_id] = Cross[vehicle_id]),
event_list[event_sequence])
Now you can lookup the inventory value for each vehicle/sequence pair with another calculated column:
Inventory = LOOKUPVALUE(
event_list[inventory],
event_list[vehicle_id], Cross[vehicle_id],
event_list[event_sequence], Cross[Sequence])
The result should look something like this (screenshot not included):
Once you have this, you can create a matrix from this calculated table. Put event_date on the rows and Inventory on the columns. Filter out blank inventory values with a visual-level filter (blanks occur for vehicles that have no events yet on that date), and put vehicle_id in the Values field, using Count or Count (Distinct) as the aggregation instead of the default Sum.
It should look like this (screenshot not included):
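The cross-join approach can also be sketched in Python to check the counts. This is a minimal illustration under stated assumptions: the function names build_cross and count_matrix are hypothetical, a dict keyed by (vehicle_id, event_sequence) plays the role of LOOKUPVALUE, and None stands in for a DAX blank.

```python
from collections import Counter
from datetime import date
from itertools import product

# (vehicle_id, event_date, event_sequence, inventory) from the sample event_list
EVENTS = [
    (100, date(2018, 1, 1), 1, "in-service"),
    (101, date(2018, 1, 1), 1, "in-service"),
    (102, date(2018, 2, 4), 1, "in-service"),
    (100, date(2018, 2, 7), 2, "out-of-service"),
    (101, date(2018, 2, 14), 2, "out-of-service"),
    (101, date(2018, 2, 18), 3, "in-service"),
    (100, date(2018, 3, 15), 3, "in-service"),
    (102, date(2018, 5, 1), 2, "out-of-service"),
    (103, date(2018, 6, 3), 1, "in-service"),
]


def build_cross(events):
    """One row per (vehicle_id, event_date): (vehicle, date, Sequence, Inventory)."""
    vehicles = sorted({e[0] for e in events})
    dates = sorted({e[1] for e in events})
    # Stand-in for LOOKUPVALUE: inventory keyed by (vehicle_id, event_sequence)
    lookup = {(v, s): inv for v, _d, s, inv in events}
    rows = []
    for v, d in product(vehicles, dates):
        # Sequence column: max sequence for this vehicle up to this date
        seqs = [s for ev, ed, s, _inv in events if ev == v and ed <= d]
        seq = max(seqs) if seqs else None  # blank if the vehicle has no events yet
        rows.append((v, d, seq, lookup.get((v, seq))))
    return rows


def count_matrix(rows):
    """Count vehicles per (event_date, Inventory), skipping blank inventory."""
    return Counter((d, inv) for _v, d, _s, inv in rows if inv is not None)


if __name__ == "__main__":
    matrix = count_matrix(build_cross(EVENTS))
    for (d, inv), n in sorted(matrix.items()):
        print(d, inv, n)
```

Because the cross join has exactly one row per vehicle and date, a plain count and a distinct count of vehicle_id give the same result here.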

Related

SQL new variable using multiple conditions (count of occurrences in 6 month look-back period using timestamp for each unique ID)

I am trying to achieve the following:
Attached is what my data looks like.
I want to create 2 new variables that count the number of times 'Target' (variable 1) and 'Competitor' (variable 2) appear within the 6 months before a given date_of_prescription. This would be done for every unique D_PRESCRIBER_ID.
For example, ID 1003000902 prescribed the COMPETITOR drug on 2020-03-18. Looking at the rows before that, within the 6 months prior to 2020-03-18 there are 2 Target drugs prescribed and 0 Competitor drugs prescribed, so my variable values would be 2 (variable 1) and 0 (variable 2).
My data is much larger than the screenshot shows. It has more variables and thousands of unique D_PRESCRIBER_IDs. Each row is not a unique ID: the same ID appears in several rows with different date_of_prescription timestamps. These variables need to be created in my SELECT statement so that the rest of the data stays the same.
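The counting rule described above can be sketched in Python before translating it to SQL. This is a minimal illustration under loud assumptions: the drug_class field holding 'TARGET' or 'COMPETITOR' is hypothetical, "within the last 6 months" is taken to mean from the same day six calendar months earlier up to, but excluding, the current date_of_prescription, and the sample rows are made up rather than taken from the screenshot. In SQL, the same rule is usually written as a correlated subquery or a self-join on D_PRESCRIBER_ID with a date-range condition.

```python
import calendar
from collections import defaultdict
from datetime import date


def months_back(d, months):
    """Same day `months` calendar months earlier, clamped to that month's last day."""
    y, m0 = divmod(d.year * 12 + (d.month - 1) - months, 12)
    last_day = calendar.monthrange(y, m0 + 1)[1]
    return date(y, m0 + 1, min(d.day, last_day))


def lookback_counts(rows, months=6):
    """rows: (D_PRESCRIBER_ID, date_of_prescription, drug_class) tuples.

    Returns each row extended with (target_count, competitor_count) for the
    same prescriber in the window [date - months, date).
    """
    history = defaultdict(list)
    for pid, d, drug in rows:
        history[pid].append((d, drug))
    result = []
    for pid, d, drug in rows:
        start = months_back(d, months)
        prior = [c for pd, c in history[pid] if start <= pd < d]
        result.append((pid, d, drug, prior.count("TARGET"), prior.count("COMPETITOR")))
    return result


# Hypothetical sample rows (not the data from the screenshot)
ROWS = [
    (1003000902, date(2019, 8, 1), "TARGET"),  # more than 6 months before 2020-03-18
    (1003000902, date(2019, 10, 5), "TARGET"),
    (1003000902, date(2020, 1, 20), "TARGET"),
    (1003000902, date(2020, 3, 18), "COMPETITOR"),
    (1234567890, date(2020, 1, 1), "COMPETITOR"),
]

if __name__ == "__main__":
    for row in lookback_counts(ROWS):
        print(row)
```

With these rows, the 2020-03-18 prescription gets 2 and 0, mirroring the example in the question. Whether the window should include the start day or same-day prescriptions is a business rule to confirm before writing the SQL.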
Any help here would be awesome. Thanks!

TOTAL vs Aggr in QlikView

I'm trying to understand how TOTAL and Aggr work in QlikView. Could someone please explain the difference between the two examples below, and if possible please illustrate with a SQL query?
Example1:
Max({<Field1=>} Aggr(Sum({<Field2={'Value'}, Field1=>} StuffCount), Field1))
Example2:
Max({<Field1=>} TOTAL Aggr(Sum({<Field2={'Value'}, Field1=>} StuffCount), Field1))
I'm not sure what you mean by an SQL query in this example. Anyway, imagine you have this list of customers (CustomerID) and sales (Sales):
CustomerID  Sales
Customer1   25
Customer2   20
Customer1   10
Customer1   5
Customer1   20
Customer3   30
Customer2   30
Then you want to show it on a pivot table with dimension CustomerID and two expressions:
Max(Aggr(Sum(Sales), CustomerID)) // this will show 60 for the first customer, 50 for the second and 30 for the third one
Max(TOTAL Aggr(Sum(Sales),CustomerID)) //this will show 60 in every row of your table (which is the maximum sum of sales among all customers)
So basically, Aggr creates a temporary list of whatever you put in its first argument (in this case Sum(Sales)), grouped by the dimension in its second argument (CustomerID). You can then perform operations on that list (such as Max, Min, Avg...). If you add TOTAL and use the expression in a pivot table, the chart's dimensions are ignored in that calculation.
Hope it helps
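Since the question asked for an SQL-style illustration: roughly, Aggr(Sum(Sales), CustomerID) behaves like a grouped subquery (SELECT CustomerID, SUM(Sales) ... GROUP BY CustomerID), and TOTAL makes the outer Max run over that whole subquery instead of only the part matching the current chart row. The Python sketch below models that analogy with the sales list above. It is a simplification of QlikView's chart evaluation, not its actual engine, and the function names are hypothetical.

```python
from collections import defaultdict

# The CustomerID / Sales list from the answer above
SALES = [
    ("Customer1", 25), ("Customer2", 20), ("Customer1", 10), ("Customer1", 5),
    ("Customer1", 20), ("Customer3", 30), ("Customer2", 30),
]


def aggr_sum(rows):
    """Like Aggr(Sum(Sales), CustomerID): a virtual table of one sum per customer."""
    totals = defaultdict(int)
    for customer, sales in rows:
        totals[customer] += sales
    return dict(totals)


def pivot(rows):
    """For each chart row (CustomerID): (Max(Aggr(...)), Max(TOTAL Aggr(...)))."""
    virtual = aggr_sum(rows)
    result = {}
    for customer in sorted(virtual):
        # Without TOTAL, the chart dimension limits the Aggr table to this customer
        per_row = max(v for c, v in virtual.items() if c == customer)
        # With TOTAL, the chart dimension is ignored, so Max sees every customer
        overall = max(virtual.values())
        result[customer] = (per_row, overall)
    return result


print(pivot(SALES))
# {'Customer1': (60, 60), 'Customer2': (50, 60), 'Customer3': (30, 60)}
```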
The TOTAL keyword is useful in charts and pivot tables: it applies the same calculation to every data point in the chart, regardless of the chart's dimensions.
Therefore, if you put your expressions in a pivot table, the first option may display a different value in each cell (if the Aggr is relevant), whereas the second will show the same value in every cell.
The Aggr function lets you nest aggregations (avg of sum, max of count, etc.) over a grouping that can differ from the chart's dimensions.

How to insert uneven data rows into matrix in SAS?

I have an originations data set with loan ids. I also have a corresponding data set with performance data for each of these loan ids, which can have anywhere from 10 to 40 rows per loan.
The start date of each loan's performance data is not the same either, although some do overlap. What I want to do is take every loan id group in the performance data set and turn the values of a certain column into a single row for that id. It doesn't matter that they start on different dates; I just want to align the values by position (first value, second value, and so on) for each loan id.
For example:
ID  Date    Val
3   201601  100
3   201602  102
3   201603  103
--> Result:
ID  Val1  Val2  Val3
3   100   102   103
I'm having two issues. One is the differing size of performance data for each id. I can't construct a matrix with differing lengths of rows. I'm assuming I'll need to append 0's to the end of each row to meet a predefined width.
My second issue is that I'm not sure how to read through the performance data set to group loans, extract the value column, turn that column into a row for each id, and then insert it into a matrix. I know how I would do this in Python, but I need to use SAS. I can construct tables in SAS, but I'm not sure how to append rows, only columns.
If someone could provide some guidance on this it'd be a great help.
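For anyone who runs into a similar issue: it ended up being only a few lines of code.
/* BY-group processing requires new_data to be sorted (or indexed) by id */
proc sort data = new_data;
  by id;
run;
/* One output row per id; the trans_state values become Val1, Val2, ... */
proc transpose data = new_data out = new_data1 prefix = Val;
  var trans_state;
  by id;
run;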
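The output will be one row per id, with one column per value (plus a _NAME_ column identifying the transposed variable).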
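For comparison with the Python approach the asker mentioned, here is a minimal sketch of the same reshaping, including the zero padding to a common width described in the question. PROC TRANSPOSE itself leaves the extra columns of shorter BY groups as missing values rather than 0. The function name and the second loan id in the sample are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def transpose_by_id(rows, pad=0):
    """rows: (id, date, value) tuples.

    Returns {id: [value1, value2, ...]} with each id's values ordered by date
    and padded with `pad` to the length of the longest group.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))  # like sorting BY id date
    groups = {
        key: [value for _id, _date, value in grp]
        for key, grp in groupby(ordered, key=itemgetter(0))
    }
    width = max((len(vals) for vals in groups.values()), default=0)
    return {key: vals + [pad] * (width - len(vals)) for key, vals in groups.items()}


rows = [
    (3, 201601, 100), (3, 201602, 102), (3, 201603, 103),
    (4, 201605, 7), (4, 201606, 8),  # hypothetical second loan with fewer rows
]
print(transpose_by_id(rows))
# {3: [100, 102, 103], 4: [7, 8, 0]}
```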

counting and numbering in a select statement in Access SQL

Could you please help me figure out how to accomplish the following?
I have a table containing the number of products available between one date and another as per below:
TABLE MyProducts
DateProduct  ProductId  Quantity  Price
26/02/2016   7          2         100
27/02/2016   7          3         100
28/02/2016   7          4         100
I have created a form where users need to select a date range and the number of products they are looking for (in my example, the number of products is going to be 1).
In this example, let's say that a user makes the following selection:
SELECT SUM(MyProducts.Price) As TotalPrice
FROM MyProducts WHERE MyProducts.DateProduct
Between #2/26/2016# And #2/29/2016#-1 AND MyProducts.Quantity>=1
Now the user can see the total amount that 1 product costs: 300
For this date range, however, I also want to let users select from a combobox the number of products they can still buy: looking at the Quantity column for this date range, a user can buy a maximum of 2 products, because 2 is the lowest quantity available across all the dates in the query.
First question: how can I feed the combobox with a "1 to 2" list (in this case), given that 2 is the lowest quantity available across all the dates queried by this user?
Second question: how can I keep track of the products that users have purchased?
Let's say that one user has purchased 1 product within this date range and a second user has purchased the same quantity (1) for the very same date range, for a total of 2 products already purchased. How can I see that, in this case, the number of products actually available for this date range is:
DateProduct  ProductId  Quantity  Price
26/02/2016   7          0         100
27/02/2016   7          1         100
28/02/2016   7          2         100
Thank you in advance and please let me know should you need further information.
You could create a table with an integer field counting from 1 to whatever max qty you could expect. Then create a query that will only return rows from your new table up to the min() qty in the MyProducts table. Use that query as the control source of your combobox.
EDIT: You will actually need two queries. The first should be:
SELECT Min(MyProducts.Quantity) AS MinQty FROM MyProducts;
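which I saved as "qryMinimumProductQty". (In practice it should use the same date-range WHERE clause as the price query, so that it only considers the dates the user selected.) I then created a table called "Numbering" with a single integer field called "Sequence". The second query: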
SELECT Numbering.Sequence FROM Numbering, qryMinimumProductQty WHERE Numbering.Sequence<=qryMinimumProductQty.MinQty;
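The logic of these two queries can be sketched in Python to show what ends up in the combobox: take the minimum Quantity over the selected dates, then list 1 up to that minimum. The function name is hypothetical, and unlike qryMinimumProductQty as written, it also filters by product and date range, which is an assumption about how the query would be used.

```python
from datetime import date

# MyProducts rows: (DateProduct, ProductId, Quantity, Price)
MY_PRODUCTS = [
    (date(2016, 2, 26), 7, 2, 100),
    (date(2016, 2, 27), 7, 3, 100),
    (date(2016, 2, 28), 7, 4, 100),
]


def combo_choices(products, product_id, first_day, last_day):
    """Return 1..MinQty for the dates first_day..last_day (inclusive, like BETWEEN)."""
    quantities = [qty for d, pid, qty, _price in products
                  if pid == product_id and first_day <= d <= last_day]
    if not quantities:
        return []
    min_qty = min(quantities)           # what qryMinimumProductQty computes
    return list(range(1, min_qty + 1))  # Numbering.Sequence <= MinQty


print(combo_choices(MY_PRODUCTS, 7, date(2016, 2, 26), date(2016, 2, 28)))
# [1, 2]
```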
AFAIK there is no Access function or feature that fills in a series of numbers in a combobox control source; you have to build the control source yourself. (Someone with more VBA experience might have a solution, but I do not.)
It pains me to think of an entire table with a single integer column used only for a combobox, though. A simpler approach would be to show the quantity available in a control on your form, provide an unbound text box for the user to enter their order quantity, and add a validation rule that stops the order and notifies the user if they enter a number greater than the quantity on hand. (Just a thought.)
As for your second question, I don't really understand what you're looking for. It sounds like there may be another table of purchases? If so, it should be a simple query to relate MyProducts to Purchases and take the difference between MyProducts!qty and Purchases!qty. If you don't have a table to store purchases, one might be warranted, based on my cursory understanding of your system.
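Here is a minimal sketch of the "MyProducts minus Purchases" idea, assuming a hypothetical Purchases table that stores a product id, an inclusive date range and a quantity per purchase (the question does not show one). Each purchase is spread over its dates and subtracted from that day's Quantity, which reproduces the expected table from the question.

```python
from collections import defaultdict
from datetime import date, timedelta


def available(products, purchases):
    """products: (DateProduct, ProductId, Quantity, Price) rows.
    purchases: (ProductId, first_day, last_day, qty) rows (hypothetical table).
    Returns the products rows with Quantity reduced by the quantity booked on each date.
    """
    booked = defaultdict(int)
    for pid, first_day, last_day, qty in purchases:
        day = first_day
        while day <= last_day:
            booked[(day, pid)] += qty
            day += timedelta(days=1)
    return [(d, pid, qty - booked[(d, pid)], price) for d, pid, qty, price in products]


MY_PRODUCTS = [
    (date(2016, 2, 26), 7, 2, 100),
    (date(2016, 2, 27), 7, 3, 100),
    (date(2016, 2, 28), 7, 4, 100),
]
# Two users, each buying 1 unit of product 7 for 26/02/2016 to 28/02/2016
PURCHASES = [(7, date(2016, 2, 26), date(2016, 2, 28), 1)] * 2

for row in available(MY_PRODUCTS, PURCHASES):
    print(row[0].strftime("%d/%m/%Y"), *row[1:])
# 26/02/2016 7 0 100
# 27/02/2016 7 1 100
# 28/02/2016 7 2 100
```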

Dynamic use of MDX AVG function

Does anyone have advice on how to build an average measure that is dynamic, i.e. one that doesn't specify a particular slice but uses the current view instead? I'm working in a front-end OLAP viewer (Strategy Companion), and I need a "dynamic" implementation based on the dimensions that are currently filtered in the data view.
My fact table looks something like this:
Key  AmountA  IndicatorA  AmountB  Other Data
1    5        1           null     25
2    6        1           null     52
3    7        1           2        106
4    null     0           4        108
Now I can specify a simple average for "[Measures].[AmountA]" with "[Measures].[AmountA] / [Measures].[IndicatorA]" which works great - "[IndicatorA]" sums up to the number of non-null values of "[AmountA]". And this also works great no matter what dimensions are selected in the view - it always divides by the count of rows that have been filtered in.
But what about [AmountB]? I don't have a null indicator column. I want to get an average value of [AmountB] for whatever rows have been filtered in for my current view. If I try to use the count of rows as a simple formula (pseudo-code "[Measures].[AmountB] / Count([Measures].[Key])"), I get the wrong result, because it counts all the null rows in the average.
So, I need a way to use the AVG function to specify the average of [AmountB] over the set of "whatever rows I'm currently filtering in, based on whatever dimensions I'm currently using". How do I specify this dynamic set?
I've tried several different uses of the AVG function and they have either returned null or summed up to huge numbers, clearly not the average I'm looking for.
Thanks-
Matt
Sorry, my first suggestion was wrong. If you don't have access to the OLAP cube itself, you can't write an MDX query for this purpose (IMHO), because at that access level you don't have the detailed data from your fact table; you can only use the aggregated data and dimensions of your cube.
Otherwise (if you do have access to the OLAP database), you can create this metric (a count of non-NULL rows) in your measure group and then use it for the average calculation, either as a calculated member in your cube or in the WITH section of your MDX query.
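The arithmetic behind this can be checked with a small Python sketch over the sample fact table (the Other Data column is left out, and None stands for null). Dividing the sum of AmountB by the row count includes the null rows, while dividing by a count of non-null values, the measure suggested above, gives the intended average. In the cube, the set of rows would be whatever the current view filters in; here it is simply the list passed to the function, whose name is hypothetical.

```python
# Sample fact table: (Key, AmountA, IndicatorA, AmountB); None stands for null
FACT = [
    (1, 5, 1, None),
    (2, 6, 1, None),
    (3, 7, 1, 2),
    (4, None, 0, 4),
]


def averages(rows):
    """Return (avg_a, naive_avg_b, avg_b) for the rows currently 'filtered in'."""
    sum_a = sum(r[1] for r in rows if r[1] is not None)
    indicator_a = sum(r[2] for r in rows)
    avg_a = sum_a / indicator_a if indicator_a else None  # AmountA / IndicatorA
    b_values = [r[3] for r in rows if r[3] is not None]
    naive_avg_b = sum(b_values) / len(rows)  # divides by all rows, nulls included
    avg_b = sum(b_values) / len(b_values) if b_values else None  # non-null count
    return avg_a, naive_avg_b, avg_b


print(averages(FACT))      # (6.0, 1.5, 3.0)
print(averages(FACT[2:]))  # (7.0, 3.0, 3.0)
```

The second call stands in for a view that filters out the first two rows: the non-null average stays correct without any slice being hard-coded.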