I'm new to DAX and I have a problem that I don't know how to solve, so I'll simplify it with an artificial example. I'm working in the context of an SSAS tabular model.
Let's say I have a factory that makes "zirkbols" (an invented product) and a table representing the sales of zirkbols. Each customer bought a different number of zirkbols and gave a rating from 1 to 5.
The table can be generated with this code:
= DATATABLE(
"ClientId"; INTEGER;
"CountryCode"; STRING;
"OrderDate"; DATETIME;
"OrderAmount"; DOUBLE;
"Rating"; INTEGER;
{
{123; "US"; "2018-01-01"; 502; 1};
{124; "US"; "2018-01-01"; 400; 4};
{125; "US"; "2018-01-03"; 60; 5};
{126; "US"; "2018-01-02"; 160; 4};
{124; "US"; "2018-01-05"; 210; 3};
{128; "JP"; "2018-01-03"; 22; 5};
{129; "JP"; "2018-01-07"; 540; 2};
{130; "JP"; "2018-01-03"; 350; 4};
{131; "JP"; "2018-01-09"; 405; 4};
{132; "JP"; "2018-01-09"; 85; 5}
}
)
I need to create measures that give me statistics for the sample of clients who, taken from the most satisfied downwards, account for at least 30% of my sales. This means that I need to rank by Rating and sum the OrderAmount values until I reach at least 30% of the total. This sample is my happy zirkbol owners, and for these happy zirkbol owners I would like to know, for example, their average rating.
I think this would be easier if I could put the running total of the order amounts in a calculated column, but I would like to give the analyst the possibility to filter, for example, only the US sales, and I don't know if that is possible with a calculated column.
On the other hand, I suppose the ranking by rating can be stored in a calculated column (Ranking = RANK.EQ([Rating];ClientOrders[Rating])).
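For illustration, the kind of running-total calculated column I have in mind would be something like this (a rough, untested sketch building on the Ranking column above; "Running amount" is just an illustrative name, and since a calculated column is evaluated at data refresh I doubt it can react to a CountryCode filter, which is exactly my concern):
Running amount =
// Illustrative calculated column; relies on the Ranking calculated column mentioned above.
VAR CurrentRank = ClientOrders[Ranking]
RETURN
    SUMX (
        FILTER ( ClientOrders, ClientOrders[Ranking] <= CurrentRank ),
        ClientOrders[OrderAmount]
    )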
I expect the following result:
As I said, I'm new to SSAS and DAX, so I don't know whether I'm approaching this problem from the wrong angle...
Regards,
Nicola
P.S. Please see the comments on the accepted answer as well
I've got some DAX mostly working, but I'll need to come back to it.
In the meantime, here's some of the code:
Happy owners amount =
VAR Summary =
SUMMARIZE (
Orders,
Orders[CountryCode],
Orders[ClientId],
Orders[Rating],
"Amount", SUM ( Orders[OrderAmount] )
)
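// Rank the summarized rows: a higher Rating ranks first, and 1/[Amount] breaks ties in favour of smaller amounts.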
VAR Ranked =
ADDCOLUMNS ( Summary, "Rank", RANKX ( Summary, Orders[Rating] + 1 / [Amount] ) )
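// Running total of the order amounts in rank order: CumAmt for a given rank is the sum of amounts at that rank or better.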
VAR Cumulative =
ADDCOLUMNS (
Ranked,
"CumAmt", CALCULATE (
SUM ( Orders[OrderAmount] ),
FILTER ( Ranked, [Rank] <= EARLIER ( [Rank] ) )
)
)
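// The first (lowest) rank at which the running total exceeds 30% of the order amount total over ALLSELECTED.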
VAR CutOff =
MINX (
FILTER (
Cumulative,
[CumAmt]
> 0.3 * CALCULATE ( SUM ( Orders[OrderAmount] ), ALLSELECTED ( Orders ) )
),
[Rank]
)
RETURN
SUMX ( FILTER ( Cumulative, [Rank] <= CutOff ), [Amount] )
Related
I need to create a measure based on the maximum PlanId for each employee. I tried a DAX query in SSAS (SSMS) but I'm not getting the expected results.
Logic: we need to calculate the count for each status based on the latest plan.
For example: employee 5245 has two plans (8869, 6988).
When we calculate the count for ACCEPTED we have to include this employee, because the status of the latest PlanId (8869) is ACCEPTED; i.e. this employee should be counted under ACCEPTED only, because that is the status of its latest PlanId. (A rough sketch of this "latest plan" logic follows the query below.)
Source Data
Expected Output
I am using the query below:
DEFINE
    MEASURE table[AcceptedCount] =
        COUNTROWS (
            FILTER (
                SUMMARIZE ( table, table[EmployeeNumber], table[status] ),
                table[status] = "ACCEPTED"
            )
        )
    MEASURE table[EscalatedCount] =
        COUNTROWS (
            FILTER (
                SUMMARIZE ( table, table[EmployeeNumber], table[status] ),
                table[status] = "ESCALATED"
            )
        )
    MEASURE table[overall] =
        COUNTROWS (
            FILTER (
                SUMMARIZE ( table, table[EmployeeNumber], table[status] ),
                table[status] IN { "ESCALATED", "ACCEPTED" }
            )
        )
    VAR status_count =
        SUMMARIZECOLUMNS (
            table[unit],
            "AcceptedCount_status", table[AcceptedCount],
            "EscalatedCount_status", table[EscalatedCount],
            "Overallcount", table[overall]
        )

EVALUATE status_count
The output of the above query gives the wrong results.
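To make the intended logic concrete, here is a rough, untested sketch of the "latest plan per employee" idea for the ACCEPTED count (I'm assuming the plan number is stored in a table[PlanId] column; the other names are taken from the query above):
MEASURE table[AcceptedCountLatest] =
    COUNTROWS (
        FILTER (
            VALUES ( table[EmployeeNumber] ),
            // NOTE: table[PlanId] is an assumed column name for the plan number.
            // For each employee, find the latest plan and read that plan's status.
            VAR LatestPlan =
                CALCULATE ( MAX ( table[PlanId] ) )
            VAR LatestStatus =
                CALCULATE (
                    SELECTEDVALUE ( table[status] ),
                    table[PlanId] = LatestPlan
                )
            RETURN
                LatestStatus = "ACCEPTED"
        )
    )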
Can you please help me to get the item with the highest count using DAX?
Measure = FIRSTNONBLANK('Table1'[ItemName],CALCULATE(COUNT('Table2'[Instance])))
This shows the first ItemName in the table, but it doesn't get the ItemName with the highest count.
Thanks
Well, it's more complicated than I would have wanted, but here's what I came up with.
There are a couple of things you're hoping to do that are not so straightforward in DAX. First, you want an aggregated aggregation ;) -- in this case, the Max of a Count. Second, you want to use a value from one column that you identify by what's in another column. That's row-based thinking, and DAX prefers column-based thinking.
So, to do the aggregate of aggregates, we just have to slog through it. SUMMARIZE gives us counts of items. Max and Rank functions could help us find the biggest count, but they wouldn't be so useful for getting the ItemName. TOPN gives us the whole row where our count is the biggest.
But now we need to get our ItemName out of that row, so SELECTCOLUMNS lets us pick the field to work with. Finally, we really want a scalar value, not a 1-column, 1-row table, so FIRSTNONBLANK finishes the job.
Hope it helps.
Here's my DAX
MostFrequentItem =
VAR SummaryTable = SUMMARIZE ( 'Table', 'Table'[ItemName], "CountsByItem", COUNT ( 'Table'[ItemName] ) )
VAR TopSummaryItemRow = TOPN(1, SummaryTable, [CountsByItem], DESC)
VAR TopItem = SELECTCOLUMNS (TopSummaryItemRow, "TopItemName", [ItemName])
RETURN FIRSTNONBLANK (TopItem, [TopItemName])
Here's the DAX without using variables (not tested, sorry. Should be close):
MostFrequentItem_2 =
FIRSTNONBLANK (
SELECTCOLUMNS (
TOPN (
1,
SUMMARIZE ( 'Table', 'Table'[ItemName], "Count", COUNT ( 'Table'[ItemName] ) ),
[Count], DESC
),
"ItemName", [ItemName]
),
[ItemName]
)
Here's the mock data:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcipNSspJTS/NVYrVIZ/nnFmUnJOKznRJzSlJxMlyzi9PSs3JAbODElMyizNQmLEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Stuff = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Stuff", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Stuff", "ItemName"}})
in
#"Renamed Columns"
This DAX formula issue is frustrating me to no end, so I appreciate any help or another way to look at this.
Both of these formulas calculate the same value (that has been confirmed), so don't worry about what LY_Key actually equals. BUT the one with variables removes the ability to drill down into separate years in the same table. My problem really arises when using weeks, but it's slightly easier to understand with these tables.
Can you see a difference between these two formulas that would remove the drilldown capability? Thank you in advance for any assistance.
---Original
Net Sales Trailing 3 Periods LY:=
CALCULATE (
factSales[Net Sales],
ALL(dimDate),
FILTER (
ALL ( 'dimPeriod' ),
'dimPeriod'[PeriodKey] <= MAX ( 'dimPeriod'[PeriodKey] ) - 14
&& 'dimPeriod'[PeriodKey] >= MAX ( 'dimPeriod'[PeriodKey] ) - 16
)
)
---With Variables
Net Sales Trailing 3 Periods LY:=
VAR
LY_FPW = MAX('dimPeriod'[YYYYFP]) - 100
VAR
LY_Key = MAXX(FILTER('dimPeriod', 'dimPeriod'[YYYYFP] = LY_FPW), 'dimPeriod'[PeriodKey])
RETURN
CALCULATE (
factSales[Net Sales],
ALL(dimDate),
FILTER (
ALL ( 'dimPeriod' ),
'dimPeriod'[PeriodKey] <= LY_Key - 1
&& 'dimPeriod'[PeriodKey] >= LY_Key - 3
)
)
I'm trying to compare scraped retail item price data in BigQuery (~2-3B rows, depending on the time period and retailers included), with the intent to identify meaningful price differences. For example, $1.99 vs $2.00 isn't meaningful, but $1.99 vs $2.50 is meaningful. Meaningful is quantified as a 2% difference between prices.
Example dataset for one item looks like this:
ITEM Price($) Meaningful (This is the column I'm trying to flag)
Apple $1.99 Y (lowest price would always be flagged)
Apple $2.00 N ($1.99 v $2.00)
Apple $2.01 N ($1.99 v $2.01) Still using $1.99 for comparison
Apple $2.50 Y ($1.99 v $2.50) Still using $1.99 for comparison
Apple $2.56 Y ($2.50 v $2.56) Now using $2.50 as new comp. price
Apple $2.62 Y ($2.56 v $2.62) Now using $2.56 as new comp. price
I was hoping to solve the problem using just SQL window functions (LEAD, LAG, PARTITION BY, etc.), comparing each row's price to the previous row's. However, that doesn't work once I reach a non-meaningful price, because I always want the next value to be compared to the most recent meaningful price (see the $2.50 row example above, which is compared to $1.99 and NOT to the $2.01 in the prior row).
My Questions:
Is it possible to solve this with SQL alone in BigQuery? (e.g. What creative SQL logic solution am I overlooking, like bucketing based on the variance amounts?)
What programmatic options do I have since I can't use stored procedures with BQ? Python/Dataframes in GCP Datalab? BQ UDFs?
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMPORARY FUNCTION x(prices ARRAY<FLOAT64>)
RETURNS ARRAY<STRUCT<price FLOAT64, flag STRING>>
LANGUAGE js AS """
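// Walk the prices (already sorted ascending); flag 'Y' when a price is more than 2% above the last flagged price.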
var result = [];
var last = 0;
var flag = '';
for (var i = 0; i < prices.length; i++){
if (i == 0) {
last = prices[i];
flag = 'Y'
} else {
if ((prices[i] - last)/last > 0.02) {
last = prices[i];
flag = 'Y'
} else {flag = 'N'}
}
var rec = {};
rec.price = prices[i];
rec.flag = flag;
result.push(rec);
}
return result;
""";
SELECT item, rec.*
FROM (
SELECT item, ARRAY_AGG(price ORDER BY price) AS prices
FROM `yourTable`
GROUP BY item
), UNNEST(x(prices) ) AS rec
-- ORDER BY item, price
You can play with / test it using the dummy data from your question:
#standardSQL
CREATE TEMPORARY FUNCTION x(prices ARRAY<FLOAT64>)
RETURNS ARRAY<STRUCT<price FLOAT64, flag STRING>>
LANGUAGE js AS """
var result = [];
var last = 0;
var flag = '';
for (var i = 0; i < prices.length; i++){
if (i == 0) {
last = prices[i];
flag = 'Y'
} else {
if ((prices[i] - last)/last > 0.02) {
last = prices[i];
flag = 'Y'
} else {flag = 'N'}
}
var rec = {};
rec.price = prices[i];
rec.flag = flag;
result.push(rec);
}
return result;
""";
WITH `yourTable` AS (
SELECT 'Apple' AS item, 1.99 AS price UNION ALL
SELECT 'Apple', 2.00 UNION ALL
SELECT 'Apple', 2.01 UNION ALL
SELECT 'Apple', 2.50 UNION ALL
SELECT 'Apple', 2.56 UNION ALL
SELECT 'Apple', 2.62
)
SELECT item, rec.*
FROM (
SELECT item, ARRAY_AGG(price ORDER BY price) AS prices
FROM `yourTable`
GROUP BY item
), UNNEST(x(prices) ) AS rec
ORDER BY item, price
Result is as below
item price flag
---- ----- ----
Apple 1.99 Y
Apple 2.0 N
Apple 2.01 N
Apple 2.5 Y
Apple 2.56 Y
Apple 2.62 Y
From r In ReceiptLines
Where
r.RECEIPT.RECEIPTDATE >= _reportStartDate
And r.RECEIPT.RECEIPTDATE <= _reportEndDate
Let amount = r.QUANTITY * r.PRICE
Let discount = r.RECEIPTDISCOUNTs.Sum(Function(d) d.DISCOUNT)
where discount > 0
Group By Department = r.ITEMSTYLE.ITEM.CATEGORY.DEPARTMENT.DEPARTMENTNAME
Into Sales = Sum(amount - discount),
Average = Average(amount - discount),
Count = Count()
I am fetching all departments and their sales, average, and count from the ReceiptLine, Receipt, and ReceiptDiscount tables. The problem I am facing is that if I remove the "where discount > 0" clause, I get a null exception, but if I include it, I only get the sales that have a discount.
How would I write the query so that it brings back all sales less their discount (if they have one)? Any help is highly appreciated.
This is a common pitfall with LINQ2SQL.
The SUM function in SQL returns NULL when there are no rows to aggregate, but the signature of Enumerable.Sum() returns an Integer. This causes a runtime exception when the SQL query returns NULL where the LINQ2SQL provider expects an integer.
The solution is to cast the result of the sum to a nullable integer and use GetValueOrDefault to convert the null case to 0.
Replace
Let discount = r.RECEIPTDISCOUNTs.Sum(Function(d) d.DISCOUNT)
with
Let discount = CType(r.RECEIPTDISCOUNTs.Sum(Function(d) d.DISCOUNT), Integer?).GetValueOrDefault(0)
Have you tried:
...
Let amount = r.QUANTITY * r.PRICE
Let nDiscount = r.RECEIPTDISCOUNTs.Sum(Function(d) d.DISCOUNT)
Let discount = If(nDiscount Is Nothing, 0, nDiscount)
Group By Department = r.ITEMSTYLE.ITEM.CATEGORY.DEPARTMENT.DEPARTMENTNAME
...