How to apply multiple logical layers to a Measure to create another Measure in DAX - powerpivot

My data is similar to the following:
Type Compliant Non Compliant
A 1 0
B 0 1
C 1 0
I have the following measure that returns the percentage of compliance:
sum(Table[Compliance])/(sum(Table[Compliance])+sum(Table[NonCompliant]))
which gives something like this:
Type Compliance %
A 74.45
B 53.36
C 29.88
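(For reference, the SWITCH snippets below refer to this measure as [MeasureCompliancePCT]; written out as a named measure it would presumably look something like the following, with DIVIDE used here only as a sketch to guard against division by zero.)
MeasureCompliancePCT :=
DIVIDE (
    SUM ( Table[Compliance] ),
    SUM ( Table[Compliance] ) + SUM ( Table[NonCompliant] )
)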
Great so far, but now I want to categorize the percentage by % range AND type:
Type Compliance % Class
A 74.45 5
B 53.36 3
C 29.88 1
Currently, I have one measure for each type, because the thresholds for each percentage are different:
Type A Category= CALCULATE(
SWITCH (
TRUE (),
[MeasureCompliancePCT] <= 0.25, 1,
[MeasureCompliancePCT] <= 0.50, 2,
[MeasureCompliancePCT] <= 0.61, 3,
[MeasureCompliancePCT] <= 0.80, 4,
5),Table[Type]="A"
)
While this works, these pivoted measures are hard to work with, so I tried to make a nested SWITCH calculation:
Class= CALCULATE(SUM(
SWITCH (
TRUE (),
[MeasureCompliancePCT] <= SWITCH(TRUE(), Table[Type]="A", -1,
Table[Type]="B", -1),0,
[MeasureCompliancePCT] <= SWITCH(TRUE(), Table[Type]="A", 0.25,
Table[Type]="B", 0.49),1,
[MeasureCompliancePCT] <= SWITCH(TRUE(), Table[Type]="A", 0.50,
Table[Type]="B", 0.66),2,
[MeasureCompliancePCT] <= SWITCH(TRUE(), Table[Type]="A", 0.61,
Table[Type]="B", 0.78),3,
[MeasureCompliancePCT] <= SWITCH(TRUE(), Table[Type]="A", 0.80,
Table[Type]="B", 0.86),4,
5)))
But so far, no combination of formulas that I have tried avoids the dreaded "single value cannot be determined" error.
I know this must be possible in the language, but there's some element that I'm missing. If only the table relationship supported a BETWEEN operator...

Rather than try to implement an entire table in a measure, a little bit of modeling will go a long way.
Create two more tables:
One, with just the types -
Type
A
B
C
And one with your type class bands -
Type BandBegin BandEnd Class
A -1 0.25 1
A 0.25 0.5 2
A 0.5 0.61 3
A 0.61 0.8 4
A 0.8 10 5
B -1 0.49 1
B 0.49 0.66 2
B 0.66 0.78 3
B 0.78 0.86 4
B 0.86 10 5
Join up your Compliance table and the Classes table to the Types table on Type.
Now you just need to create two measures:
CompliancePct:=SUM( Compliance[Compliant] ) / COUNTROWS( Compliance )
CorrectBand := CALCULATE( MAX(Classes[Class])
, FILTER (
Classes,
Classes[BandBegin] <= Compliance[CompliancePct]
&& Classes[BandEnd] > Compliance[CompliancePct]
)
)
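If your version of Power Pivot supports DAX variables, the same band lookup can be written with the percentage captured once up front, which some find easier to read. This is just a sketch of the same idea, assuming the CompliancePct measure and Classes table defined above:
CorrectBand :=
VAR CurrentPct = [CompliancePct]
RETURN
    CALCULATE (
        MAX ( Classes[Class] ),
        FILTER (
            Classes,
            Classes[BandBegin] <= CurrentPct
                && Classes[BandEnd] > CurrentPct
        )
    )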

Related

How would I select multiple summed columns each with their own condition in Postgres?

Basically:
I have a number representing an amount of time in minutes, which we'll call my_minutes
I have an aircraft type that any record in the table must first match to qualify (the WHERE clause)
There are 12 months' worth of minutes data, one column per month, in the form of month_01_minutes, month_02_minutes, month_03_minutes...
If a particular month's value is within +/- 10% of the provided number of minutes, add it to the sum
If the value isn't within +/- 10% of my provided amount of minutes (my_minutes), return 0 for that selected column instead
At the end, I'd like to sum up each of the selected values for a grand total of everything
select
    sum( if(month_01_minutes <= 0.9 * my_minutes and month_01_minutes >= 1.1 * my_minutes, month_01_minutes, 0 ) ),
    sum( if(month_02_minutes <= 0.9 * my_minutes and month_02_minutes >= 1.1 * my_minutes, month_02_minutes, 0 ) ),
    ...
from [table with all of the minute data]
where tableName.aircraft_type = providedAircraftType
I've tried it with just one column, but the query below just returns zero, despite there being a field whose value is within +/- 10% of 225:
select
sum(case when c.month_01_minutes <= 0.9 * 225 and c.month_01_minutes >= 1.1 * 225 then c.month_01_minutes else 0 end)
from fumes_schema.consumption c
where c.aircraft_type = 'xyz';
The predicate was wrong: a number cannot be smaller than 90% of a value and bigger than 110% of it at the same time, so the condition always returns false.
select
sum(case when /* c.month_01_minutes <= 0.9 * 225 and */ c.month_01_minutes >= 1.1 * 225 then c.month_01_minutes else 0 end)
from fumes_schema.consumption c
where c.aircraft_type = 'xyz';
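If the goal really is "within +/- 10% of 225", the comparison presumably needs to run the other way around (at least 90% and at most 110% of the target); a minimal sketch of that check against the same table, repeating the pattern for the remaining month columns:
select
    sum(case when c.month_01_minutes between 0.9 * 225 and 1.1 * 225
             then c.month_01_minutes else 0 end)
  + sum(case when c.month_02_minutes between 0.9 * 225 and 1.1 * 225
             then c.month_02_minutes else 0 end)
  -- ... repeat for month_03_minutes through month_12_minutes
  as total_within_tolerance
from fumes_schema.consumption c
where c.aircraft_type = 'xyz';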

Find rolling line of best fit in SQL

I'm trying to find the rolling line of best fit for a set of data, when we look at groups of five points at a time, ordered by the x value. In other words:
For rows 1-4 there is no value, because we don't have 5 total values yet
For row 5, get the slope and yIntercept for rows 1-5
For row 6, get the slope and yIntercept for rows 2-6
For row 7, get the slope and yIntercept for rows 3-7
For row 8, get the slope and yIntercept for rows 4-8
For row 9, get the slope and yIntercept for rows 5-9
Here are the values I'm aiming for, in an Excel sheet and plot. The values for slope and yIntercept are correct according to pen-and-paper and online linear-regression calculations:
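Worked out by hand, those targets for the sample data used in the SQL below (x = 1 through 9, y = 9, 7, 5, 3, 1, 1, 1, 1, 1) come to:
x = 5: slope = -2.0, yIntercept = 11.0
x = 6: slope = -1.6, yIntercept = 9.8
x = 7: slope = -1.0, yIntercept = 7.2
x = 8: slope = -0.4, yIntercept = 3.8
x = 9: slope = 0.0, yIntercept = 1.0
(rows 1 through 4 have no value, since a full window of five points isn't available yet)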
...and here's the SQL I have so far:
WITH dataset AS (
SELECT 1 AS x, 9 AS y UNION ALL
SELECT 2 AS x, 7 AS y UNION ALL
SELECT 3 AS x, 5 AS y UNION ALL
SELECT 4 AS x, 3 AS y UNION ALL
SELECT 5 AS x, 1 AS y UNION ALL
SELECT 6 AS x, 1 AS y UNION ALL
SELECT 7 AS x, 1 AS y UNION ALL
SELECT 8 AS x, 1 AS y UNION ALL
SELECT 9 AS x, 1 AS y
),
rollingAverages AS (
SELECT
dataset.*,
AVG(dataset.x * 1.00) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS [xMean],
AVG(dataset.y * 1.00) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS [yMean],
SUM(1) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS [yCount]
FROM dataset
),
mValue AS (
SELECT
*,
CASE WHEN yCount < 5 THEN NULL ELSE x - yCount + 1 END AS xStart,
CASE WHEN yCount < 5 THEN NULL ELSE x END AS xEnd,
CASE
WHEN yCount < 5 THEN NULL
WHEN SUM((x - xMean) * (x - xMean)) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) = 0
THEN 0
ELSE
SUM((x - xMean) * (y - yMean)) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
/ SUM((x - xMean) * (x - xMean)) OVER (ORDER BY x ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
END AS slope
FROM rollingAverages
),
-- This is the y intercept at the start of the range, i.e. 40 trading days before "today"
yIntercept AS (
SELECT
*,
yMean - slope * xMean AS yIntercept
FROM mValue
),
channelNowMidpoint AS (
SELECT
*
FROM yIntercept
)
SELECT *
FROM channelNowMidpoint
ORDER BY x
I'm not getting the correct values for slope or yIntercept, I think because the line-of-best-fit algorithm I'm using expects an unbounded set of values, so the calculations I get for xMean and yMean have lost context by the time I get to the CTE named mValue. For reference, you can find a line-of-best-fit algorithm that uses the "least squares" method here.
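(Concretely, for each five-row window that method gives slope = SUM((x - xMean) * (y - yMean)) / SUM((x - xMean) * (x - xMean)) and yIntercept = yMean - slope * xMean, where xMean and yMean are the means of that window; that is what the window sums above are attempting to reproduce.)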
See below for the values I'm getting when I run this SQL in SSMS:
As you can see, where x = 5 the slope and yIntercept are correct, but after that they are incorrect. I'm not sure where I'm going wrong or how to get the values I'm aiming for.
OK, I've figured it out. The problem with using window functions here is that, inside the windowed SUMs, each row of the frame contributes its own xMean and yMean, rather than the xMean and yMean calculated for the current row's window.
To fix this, the mValue CTE needs to join back to the dataset CTE to get the raw x and y values, and use a GROUP BY instead of window functions, so that the xMean and yMean values stay fixed for each output row:
mValue AS (
SELECT
ra.*,
CASE WHEN ra.yCount < 5 THEN NULL ELSE ra.x - ra.yCount + 1 END AS xStart,
CASE WHEN ra.yCount < 5 THEN NULL ELSE ra.x END AS xEnd,
CASE
WHEN ra.yCount < 5 THEN NULL
WHEN SUM((ds.x - xMean) * (ds.x - xMean)) = 0
THEN 0
ELSE
SUM((ds.x - xMean) * (ds.y - yMean)) / SUM((ds.x - xMean) * (ds.x - xMean))
END AS slope
FROM rollingAverages AS ra
INNER JOIN dataset AS ds
ON ra.x - ds.x BETWEEN 0 AND 4
GROUP BY
ra.x,
ra.y,
ra.xMean,
ra.yMean,
ra.yCount
),
Results:

Get bins range from temporary table SQL

I have a question related to my previous one.
What I have is a database that looks like:
category price date
-------------------------
Cat1 37 2019-03
Cat2 65 2019-03
Cat3 34 2019-03
Cat1 45 2019-03
Cat2 100 2019-03
Cat3 60 2019-03
This db has hundreds of categories and comes from another one that has different attributes for each observation.
With this code:
WITH table AS
(
SELECT
category, price, date,
substring(date, 1, 4) AS year,
substring(date, 6, 2) as month
FROM
original_table
WHERE
(year = "2019" or year = "2020")
AND (month = "03")
AND product = "XXXXX"
ORDER BY
anno
)
-- I get this from a bigger table, but prefer to make small steps
-- so that anyone in the future can understand where this comes from,
-- as the original table is expected to grow fast
SELECT
category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM
(SELECT
*,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM
table)
WHERE
next_date IS NOT NULL AND Pct_change >= 0
ORDER BY
Pct_change DESC
This code gets me a view of the data that looks like:
category Pct_change period
cat1 0.21 2019-2020
cat2 0.53 2019-2020
cat3 0.76 2019-2020
This is great! But my next view has to take this one and provide me with a range that shows how many categories are in each range.
It should look like:
range avg num_cat_in
[0.1- 0.4] 0.3 3
This last table is just an example of what I expect
I have been trying with code that looks like this, but I get nothing:
WITH table AS (
SELECT category, price, date, substring(date, 1, 4) AS year, substring(date, 6, 2) as month
FROM original_table
WHERE (year= "2019" or year= "2020") and (month= "03") and product = "XXXXX"
order by anno
)
-- I get this from a bigger table, but prefer to make small steps that anyone in the future can understand where this comes from as the original table is expected to grow fast
SELECT category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM (
SELECT *,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM table
)
WHERE next_date IS NOT NULL AND Pct_change>=0
ORDER BY Pct_change DESC
WHERE next_date IS NOT NULL AND Pct_change>=0
)
SELECT
count(CASE WHEN Pct_change> 0.12 AND Pct_change <= 0.22 THEN 1 END) AS [12 - 22],
count(CASE WHEN Pct_change> 0.22 AND Pct_change <= 0.32 THEN 1 END) AS [22 - 32],
count(CASE WHEN Pct_change> 0.32 AND Pct_change <= 0.42 THEN 1 END) AS [32 - 42],
count(CASE WHEN Pct_change> 0.42 AND Pct_change <= 0.52 THEN 1 END) AS [42 - 52],
count(CASE WHEN Pct_change> 0.52 AND Pct_change <= 0.62 THEN 1 END) AS [52 - 62],
count(CASE WHEN Pct_change> 0.62 AND Pct_change <= 0.72 THEN 1 END) AS [62 - 72],
count(CASE WHEN Pct_change> 0.72 AND Pct_change <= 0.82 THEN 1 END) AS [72 - 82]
Thank you!!!
cf. my comment, I'm first assuming that your ranges are not hard-coded and that you wish to evenly split your data across quantiles of Pct_change. What this means is that the calculation will figure out the ranges which split your sample as uniformly as possible. In this case, the following would work (where theview is the name of your previous view which calculates the percentages):
select
concat('[',min(Pct_change),'-',max(Pct_change),']') as `range`
, avg(Pct_change) as `avg`
, count(*) as num_cat_in
from(
select *
, ntile(5)over(order by Pct_change) as bin
from theview
) t
group by bin
order by bin;
Here is a fiddle.
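For example, with 12 rows, ntile(5) ordered by Pct_change would produce buckets of 3, 3, 2, 2, 2 rows (groups differ in size by at most one, larger groups first), so the printed ranges follow the data rather than fixed cut-offs.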
If on the other hand your ranges are hard-coded, I assume the ranges are in a table such as the one I create:
create table theranges (lower DOUBLE, upper DOUBLE);
insert into theranges values (0,0.2),(0.2,0.4),(0.4,0.6),(0.6,0.8),(0.8,1);
(You have to make sure that the ranges are non-overlapping. By convention I include percentages in the range from the lower bound included to the upper bound excluded, except for the upper bound of 1 which is included.) It is then a matter of left-joining the tables:
select
concat('[',lower,'-',upper,']') as `range`
, avg(Pct_change) as `avg`
, sum(if(Pct_change is null, 0, 1)) as num_cat_in
from theranges left join theview on (Pct_change>=lower and if(upper=1,true,Pct_change<upper))
group by lower, upper
order by lower;
(Note that in the bit that says upper=1, you must change 1 to whatever your highest hard-coded range is; here I am assuming your percentages are between 0 and 1.)
Here is the second fiddle.

Follow up: Access considers query too complex after normalization

First of all, I want to apologize for the n+1st query-too-complex question.
Since I've been told that my database deserves normalisation, I tried to do the very same thing with a normalised setup. However, Access now complains about a too complex query.
What I want to do: The starting point is a query that yields the fields Item ID, Difference in attribute 1, Group of attribute 1, Difference in attribute 2, Group of attribute 2, ... (about 10,000 rows; the query is just an equijoin of the two datasets to compare). For each attribute, I want to draw a histogram showing the distribution of differences for that attribute. In fact, I want to draw two histograms on one canvas, one of which is constrained to Group = 1.
What I tried:
The first step was to build a normalisation query (based on unions) which yields the columns Item ID, attribute, difference, group. This query yields about 100,000 rows.
The next step is to draw the histogram. Unfortunately, Access has no built-in histogram chart, so I resorted to an xy-chart whose data source is a query that is a union of four queries yielding the four corners of the bins.
Since I want to draw two histograms, this data source query is unioned with a copy of itself that bears the additional Group = 1 condition.
So I ended up with the following query:
select bin, max(c), max(c2) from (
-- collect the four corners of the first histogram
select bin, cnt as c, 1 as c2, ord from (
-- top right
select
cdbl(bin(difference, 0.1, -2, 2) + 0.05) AS bin,
Count(bin) AS cnt,
1 as ord
FROM normalized_data
WHERE difference Is Not Null
and attribut='attibute_name'
and difference between -2 and 2
GROUP BY
cdbl(bin(difference, 0.1, -2, 2) + 0.05)
union all
-- bottom right
select
cdbl(bin(difference, 0.1, -2, 2) + 0.05) AS bin,
0 AS cnt,
2 as ord
FROM normalized_data
WHERE difference Is Not Null
and attribut='attibute_name'
and difference between -2 and 2
GROUP BY
cdbl(bin(difference, 0.1, -2, 2) + 0.05)
union all
-- bottom left
select
cdbl(bin(difference, 0.1, -2, 2) - 0.05) AS bin,
0 AS cnt,
3 as ord
FROM normalized_data
WHERE difference Is Not Null
and attribut='attibute_name'
and difference between -2 and 2
GROUP BY
cdbl(bin(difference, 0.1, -2, 2) - 0.05)
union all
-- top left
select
cdbl(bin(difference, 0.1, -2, 2) - 0.05) AS bin,
Count(bin) AS cnt,
4 as ord
FROM normalized_data
WHERE difference Is Not Null
and attribut='attibute_name'
and difference between -2 and 2
GROUP BY
cdbl(bin(difference, 0.1, -2, 2) - 0.05)
order by bin, ord asc
)
union all
-- connect the corners of the other one
select bin, 1 as c, cnt as c2, ord from (
select
cdbl(bin(difference, 0.1, -2, 2) + 0.05) AS bin,
Count(bin) AS cnt,
1 as ord
FROM normalized_data
WHERE difference Is Not Null
and attribut='attibute_name'
and difference between -2 and 2
AND (GR=1)
GROUP BY
cdbl(bin(difference, 0.1, -2, 2) + 0.05)
union all
select
cdbl(bin(difference, 0.1, -2, 2) + 0.05) AS bin,
0 AS cnt,
2 as ord
FROM normalized_data
WHERE difference Is Not Null
and attribut='attibute_name'
and difference between -2 and 2
AND (GR=1)
GROUP BY
cdbl(bin(difference, 0.1, -2, 2) + 0.05)
union all
select
cdbl(bin(difference, 0.1, -2, 2) - 0.05) AS bin,
0 AS cnt,
3 as ord
FROM normalized_data
WHERE difference Is Not Null
and attribut='attibute_name'
and difference between -2 and 2
AND (GR=1)
GROUP BY
cdbl(bin(difference, 0.1, -2, 2) - 0.05)
union all
select
cdbl(bin(difference, 0.1, -2, 2) - 0.05) AS bin,
Count(bin) AS cnt,
4 as ord
FROM normalized_data
WHERE difference Is Not Null
and attribut='attibute_name'
and difference between -2 and 2
AND (GR=1)
GROUP BY
cdbl(bin(difference, 0.1, -2, 2) - 0.05)
order by bin, ord asc
)
)
group by bin, ord
order by bin, ord asc
The two innermost unions compute fine, as do the two queries at the middle level. However, when I try to compute the outermost union, Access complains about the query's complexity (which did not happen with the unnormalized query).
Question: Do I have any chance to resolve this?
Remark: The middle step was introduced to ease automatic code generation. Removing it doesn't solve the problem.
Edit: What is in the tables: As requested, I shall add some information about what is stored in the tables and what is queried. I am given two tables
create table A (
id integer not null primary key,
[attribute 1] integer,
[attribute 2] integer,
...
)
create table B (
id integer not null primary key,
[attribute 1] integer,
[attribute 2] integer,
...
)
and a query differences to compare them:
select
A.id,
A.[attribute 1] - B.[attribute 1] as [delta 1],
...
from A
inner join B on A.id = B.id
I am aware that this unnormalized data model is bad design, but I am not the person in charge of the model. This is why I have built a query normalized_data which unpivots the data from differences:
select
id,
'attribute 1' as attribute,
[delta 1] as difference
from differences
union all
select
id,
'attribute 2' as attribute,
[delta 2] as difference
from differences
union all
...
Note that the code for the query in question above does not change very much if one uses the unnormalized data from differences as input, or the unpivoted data from normalized_data.

Calculating progressive pricing in PostgreSQL

I need to calculate revenue based on how many items a user has.
So, for example, the first 10 items are free, items up to 100 cost 0.50 each, up to 200 cost 0.25, and up to 500 cost 0.15.
I have no idea where to start with this, can I get some direction please?
E.g. if a user has 365 items, this would be (10 * 0) + (90 * 0.5) + (100 * 0.25) + (165 * 0.15).
Ideally I'd be doing this in python or something, but the dashboarding tool doesn't have that capability...
EDIT:
I should have mentioned that the number of items isn't actually the number they have, it's the limit they have chosen. The limit is saved as a single number in a subscription event. So for each user I will have an integer representing their max items eg. 365
First, number the items using the window function row_number,
then use a CASE expression to assign the proper price to each item.
Simple example: http://sqlfiddle.com/#!17/32e4a/9
SELECT user_id,
SUM(
CASE
WHEN rn <= 10 THEN 0
WHEN rn <= 100 THEN 0.5
WHEN rn <= 200 THEN 0.25
WHEN rn <= 500 THEN 0.15
ELSE 0.05
END
) As revenue
FROM (
SELECT *,
row_number() OVER (partition by user_id order by item_no ) As rn
FROM mytable
) x
GROUP BY user_id
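Applied to the example from the question, a user with 365 item rows in mytable would come out to 10 * 0 + 90 * 0.5 + 100 * 0.25 + 165 * 0.15 = 94.75.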
I should have mentioned that the number of items isn't actually the
number they have, it's the limit they have chosen. The limit is saved
as a single number in a subscription event. So for each user I will
have an integer representing their max items eg. 365
In this case the below query probably fits your needs:
Demo: http://sqlfiddle.com/#!17/e7a6a/2
SELECT *,
(SELECT SUM(
CASE
WHEN rn <= 10 THEN 0
WHEN rn <= 100 THEN 0.5
WHEN rn <= 200 THEN 0.25
WHEN rn <= 500 THEN 0.15
ELSE 0.05
END
)
FROM generate_series(1,t.user_limit) rn
)
FROM mytab t;
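If generating a series per row ever becomes a concern for large limits, the same tiers can also be expressed in closed form with LEAST/GREATEST. This is only a sketch under the same assumptions as the query above (a mytab table with a user_limit column, and the extra 0.05 tier above 500 kept for parity):
SELECT *,
       LEAST(GREATEST(user_limit - 10, 0), 90) * 0.5      -- items 11 to 100
     + LEAST(GREATEST(user_limit - 100, 0), 100) * 0.25   -- items 101 to 200
     + LEAST(GREATEST(user_limit - 200, 0), 300) * 0.15   -- items 201 to 500
     + GREATEST(user_limit - 500, 0) * 0.05               -- items above 500
       AS revenue
FROM mytab t;
For user_limit = 365 this gives 90 * 0.5 + 100 * 0.25 + 165 * 0.15 = 94.75, matching the worked example in the question.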