Turning Percentile_cont/disc (Median) into scalar function - sql

Editing this since it turns out we were trying to reinvent the wheel.
The query below works perfectly for determining the median. Now, how would we go about converting it into a function, so that we can call median(column) instead of repeating the query each time? Below does the trick:
select percentile_cont(0.5) within group (order by n) over (PARTITION BY [column1])
from t;
Ah, I see. Is it possible to group so that the median is calculated only across rows where column1 = A, B, C, so the output would be:
A median of values with A identifier
B median of values with B identifier
C median of values with C identifier

You should just use the percentile_cont() or percentile_disc() window functions:
select percentile_cont(0.5) within group (order by n) over (),
percentile_disc(0.5) within group (order by n) over ()
from t;
There is no need to re-invent the wheel.
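For the follow-up about getting one median per identifier, here is a minimal sketch, assuming SQL Server syntax and a hypothetical inline sample in place of the real table t (the values are made up for illustration):

```sql
-- Hypothetical sample data: column1 is the group identifier, n the measured value.
with t(column1, n) as (
    select * from (values ('A', 1), ('A', 3), ('A', 10),
                          ('B', 2), ('B', 4),
                          ('C', 7)) v(column1, n)
)
-- percentile_cont() is evaluated for every row, so DISTINCT collapses
-- the repeated medians to a single row per column1 value.
select distinct column1,
       percentile_cont(0.5) within group (order by n)
           over (partition by column1) as median_n
from t
order by column1;
```

With this sample, A and B should both come out as 3 (B is interpolated between 2 and 4) and C as 7.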

Related

How to generate ranges of a column based on condition

There is a column with numbers. I would like to develop a report that categorizes the values of this column into ranges (lower limit and upper limit). A new range must start whenever the difference between consecutive values is more than 10. Is this achievable with a query in either Power BI or SQL Server?
In SQL, I would use lag() and a window sum() to define the groups, and then aggregate:
select min(x) lower_limit, max(x) upper_limit
from (
select x, sum(case when x <= lag_x + 10 then 0 else 1 end) over(order by x) grp
from (select x, lag(x) over(order by x) lag_x from mytable) t
) t
group by grp
lag() gives you the previous value. Then, the window sum implements the following logic: every time the difference between the current and the previous value is more than 10, a new group starts. Finally, the outer query aggregates by group and computes the lower and upper bounds.
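To make the grouping concrete, here is the same query run against a hypothetical inline sample (the six values are made up for illustration; the VALUES-based CTE is assumed to stand in for mytable):

```sql
-- Hypothetical sample: 1, 5, 12 are within 10 of each other, as are 30 and 35; 60 stands alone.
with mytable(x) as (
    select * from (values (1), (5), (12), (30), (35), (60)) v(x)
)
select min(x) lower_limit, max(x) upper_limit
from (
    -- Running sum of "new group" flags: 1, 1, 1, 2, 2, 3 for the sample above.
    select x, sum(case when x <= lag_x + 10 then 0 else 1 end) over(order by x) grp
    from (select x, lag(x) over(order by x) lag_x from mytable) t
) t
group by grp
order by lower_limit;
```

For the first row lag_x is null, so the comparison is not true and the case expression falls through to 1, which starts the first group. The sample should yield the ranges 1 to 12, 30 to 35, and 60 to 60.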
GMB's solution is definitely the canonical approach to solving this, by treating it as a variant of gaps-and-islands. I was wondering if there is a way to do this without two levels of subqueries. And there is:
select coalesce(lag(next_x) over (order by x), first_x) as lower,
x as upper
from (select t.*,
first_value(x) over (order by x) as first_x,
lead(x) over (order by x) as next_x
from t
) t
where next_x is null or next_x > x + 10;
It would be interesting to compare the performance on a large set of data -- 2 window functions + aggregation versus 3 window functions + filtering.

RANK repeating data set

Assuming that I have data like the ones in columns A and B, how can I rank them like column C? I have tried multiple varieties of RANK and NTILE but have been unsuccessful. Thank you.
Note: There are not always 3 rows for each group, it varies.
SQL tables are inherently unordered. There is no distinguishing between the 1st and 4th row, with the data as you've presented. You can generate an equivalent result set, but the ordering may differ.
Simple arithmetic may do the trick:
select a,
( (row_number() over (order by a) + 2) / 3 ) as c
from t
order by a, b, c;
A better method uses the b column:
select a,
row_number() over (partition by b order by a) as c
from t
order by a, c;
You can use ntile as below:
Select *, ntile(2) over(order by (Select NULL)) from #data
Instead of (Select NULL), you can provide any other valid ordering column based on your data.

SQL find median of col2 for every distinct value of col1

I'm trying to calculate the median time for every distinct value in column1, which stores some kind of IDs. The second column stores time in milliseconds. I want to calculate the median of the records for every ID. I have this:
Declare @Median varchar(max)
SELECT @Median = PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY ExecTime) OVER ()
FROM
(
SELECT ExecTime
FROM logs
WHERE Message Like '<%'
) AS median
SELECT @Median as Median --, Name
This calculates the median of all values in col2 (I deleted extra conditions which are not relevant at this point). I think it's just one step away from the solution, but I can't figure it out.
I think you are looking for the partition by clause:
SELECT DISTINCT column1,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ExecTime) OVER (PARTITION BY column1) as median
FROM logs l
WHERE l.Message Like '<%';

Row_number function

I'm trying to update a column called Rank in table X, where rank would be the rank of a column called annual sales, computed using row_number () over (order by annual sales desc). Since this is a function that can only be part of a select clause, it cannot be used directly in an update, so we have written something like this:
UPDATE X
SET rank = SELECT acc_id,
annual_call,
ROW_NUMBER() OVER (ORDER BY annual sales DESC)
FROM x
GROUP BY acc_id,annual_call
But this is throwing an error.
As a general rule, when you ask a question and you mention an error, say what error you get. It helps us help you; you don't have to make it a mystery novel...
Here is how you do it:
with cte as (
select [rank], row_number() over (
partition by acc_id, annual_call
order by [annual sales] desc) as [row_number]
from x)
update cte
set [rank] = [row_number];
Of course, persisting such a rank is usually doomed, since it will become incorrect after the first update, but that is a different topic.
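One way to avoid persisting the rank at all is to compute it at query time. A minimal sketch, assuming SQL Server and the ranking described in the question (a single ordering by [annual sales] across the whole table); the view name x_ranked and the alias sales_rank are hypothetical:

```sql
-- Hypothetical view: x_ranked and sales_rank are illustrative names,
-- while x, acc_id and [annual sales] are taken from the question.
create view x_ranked as
select acc_id,
       [annual sales],
       -- Recomputed on every read, so it never goes stale after an update.
       row_number() over (order by [annual sales] desc) as sales_rank
from x;
```

Queries would then read from x_ranked instead of relying on a stored rank column.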
In this statement:
set rank = select acc_id,annual_call,row_number ()
over (order by annual sales desc)
from x
the subquery must select only one column, because the value assigned to rank has to be a single value. You need to implement some other logic for this.

Evaluating the median absolute deviation of a set of numbers in Oracle

I'm trying to implement a procedure to evaluate the median absolute deviation of a set of numbers (usually obtained via a GROUP BY clause).
An example of a query where I'd like to use this is:
select id, mad(values) from mytable group by id;
I'm going by the aggregate function example but am a little confused since the function needs to know the median of all the numbers before all the iterations are done.
Any pointers to how such a function could be implemented would be much appreciated.
In Oracle 10g+:
SELECT MEDIAN(ABS(value - med))
FROM (
SELECT value, MEDIAN(value) OVER() AS med
FROM mytable
)
Or the same with GROUP BY:
SELECT id, MEDIAN(ABS(value - med))
FROM (
SELECT id, value, MEDIAN(value) OVER(PARTITION BY id) AS med
FROM mytable
)
GROUP BY
id
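As a worked example of the grouped version, here is a sketch against a hypothetical inline sample built from dual (the five values are made up for illustration). The median of 1, 2, 3, 4, 100 is 3, the absolute deviations are 2, 1, 0, 1, 97, and their median is 1:

```sql
-- Hypothetical sample rows for a single id; the outlier 100 barely affects the result.
WITH mytable AS (
    SELECT 1 AS id, 1 AS value FROM dual UNION ALL
    SELECT 1, 2 FROM dual UNION ALL
    SELECT 1, 3 FROM dual UNION ALL
    SELECT 1, 4 FROM dual UNION ALL
    SELECT 1, 100 FROM dual
)
SELECT id, MEDIAN(ABS(value - med)) AS mad
FROM (
    -- The analytic MEDIAN attaches the per-id median to every row,
    -- so the outer aggregate can use it without a second pass.
    SELECT id, value, MEDIAN(value) OVER (PARTITION BY id) AS med
    FROM mytable
)
GROUP BY id;
```

This should return a single row with id 1 and a median absolute deviation of 1.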