Calculate SQL Median - sql

I was trying to implement a median from this solution (among others, but this seemed the simplest Median code): Function to Calculate Median in Sql Server
However, I'm having difficulty in its application. This is my current SQL query. My goal is to find the Median TotalTimeOnCall for CallerComplaintTypeID on a given Week, Month, and Department. I think my biggest issue is that I'm just fundamentally not understanding how to apply this Median function to achieve my results.
For example, if I needed an Average, instead, I could just change that ORDER BY to a GROUP BY and then slap an AVG(TotalTimeOnCall) instead. How do I accomplish this idea with this Median solution, instead?
This is the "raw data" query:
WITH rawData as (
SELECT
DepartmentName
,MONTH(PlacedOnLocal) AS MonthNumber
,CASE
WHEN Datepart(day, PlacedOnLocal) < 8 THEN '1'
WHEN Datepart(day, PlacedOnLocal) < 15 THEN '2'
WHEN Datepart(day, PlacedOnLocal) < 22 THEN '3'
WHEN Datepart(day, PlacedOnLocal) < 29 THEN '4'
ELSE '5'
END AS WeekNumber
,CallerComplaintTypeID
,TotalTimeOnCall
FROM [THE_RELEVANT_TABLE]
WHERE PlacedOnLocal BETWEEN '2014-09-01' AND '2014-12-31'
AND CallerComplaintTypeID IN (5,89,9,31,203)
AND TotalTimeOnCall IS NOT NULL
)
SELECT
DepartmentName,
MonthNumber,
WeekNumber,
CallerComplaintTypeID,
TotalTimeOnCall
FROM
rawData
ORDER BY DepartmentName, MonthNumber, WeekNumber, CallerComplaintTypeID
with this sample output:
DepartmentName MonthNumber WeekNumber CallerComplaintTypeID TotalTimeOnCall
Dept_01 9 1 5 654
Dept_01 9 1 5 156
Dept_01 9 1 5 21
Dept_01 9 1 5 67
Dept_01 9 1 5 13
Dept_01 9 1 5 97
Dept_01 9 1 5 87
Dept_01 9 1 5 16
this is the Median solution from above:
SELECT
(
(
SELECT MAX(TotalTimeOnCall)
FROM
(
SELECT TOP 50 PERCENT TotalTimeOnCall
FROM rawData
WHERE TotalTimeOnCall IS NOT NULL
ORDER BY TotalTimeOnCall
) AS BottomHalf
)
+
(
SELECT MIN(TotalTimeOnCall)
FROM
(
SELECT TOP 50 PERCENT TotalTimeOnCall
FROM rawData
WHERE TotalTimeOnCall IS NOT NULL
ORDER BY TotalTimeOnCall DESC
) AS TopHalf
)
) / 2 AS Median

Here is a simple median solution that allows you to get a median per group.
-- Example of how to get median from a set of data
;with cte_my_query as (
-- this cte simulates the query that would return your data
select '2016-01-01' as dt, 1 as val
union
select '2016-01-01' as dt, 10 as val
union
select '2016-01-01' as dt, 7 as val
union
select '2016-01-01' as dt, 16 as val
union
select '2016-01-01' as dt, 11 as val
union
select '2016-01-01' as dt, 2 as val
union
select '2016-01-01' as dt, 5 as val
union
select '2016-01-02' as dt, 6 as val
union
select '2016-01-02' as dt, 13 as val
union
select '2016-01-02' as dt, 7 as val
union
select '2016-01-02' as dt, 9 as val
union
select '2016-01-02' as dt, 18 as val
)
,cte_dates as (
-- get the distinct key we want to get median for
select distinct dt from cte_my_query
)
select dt, median.val
from cte_dates
cross apply (
-- of the top 50% (below), take the top 1, desc, which is the median value
select top 1 val from (
-- for each date, get the top 50% of the values
select top 50 percent val
from cte_my_query
where cte_dates.dt = cte_my_query.dt
order by val -- order by the value, not dt, so that the top 50% is the lower half
) as inner_median
order by inner_median.val desc
) median
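As a cross-check outside the database, here is a minimal Python sketch that computes a median per group for the sample rows in the question. It assumes the grouping key is every column except TotalTimeOnCall; the function name medians_by_group is made up for illustration. It reports two values: the lower middle value, which is what a "top 1 of the lower 50 percent" pattern returns for an even row count, and the mean of the two middle values, which is what the MAX/MIN-of-halves query in the question computes.

```python
from collections import defaultdict
from statistics import median, median_low

# (DepartmentName, MonthNumber, WeekNumber, CallerComplaintTypeID, TotalTimeOnCall)
rows = [
    ("Dept_01", 9, 1, 5, 654),
    ("Dept_01", 9, 1, 5, 156),
    ("Dept_01", 9, 1, 5, 21),
    ("Dept_01", 9, 1, 5, 67),
    ("Dept_01", 9, 1, 5, 13),
    ("Dept_01", 9, 1, 5, 97),
    ("Dept_01", 9, 1, 5, 87),
    ("Dept_01", 9, 1, 5, 16),
]


def medians_by_group(rows):
    """Group on every column but the last and return both medians per group."""
    groups = defaultdict(list)
    for *key, value in rows:
        groups[tuple(key)].append(value)
    return {
        key: {"lower": median_low(values), "mean_of_middle": median(values)}
        for key, values in groups.items()
    }


result = medians_by_group(rows)
for key, values in result.items():
    print(key, values)
```

With an even number of rows the two definitions differ, so it is worth deciding which one you want before comparing against the SQL output.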

Related

How can I divide hours to next working days in SQL?

I have a table that stores a start date and a number of hours. I also have a calendar table that lists the working days. My goal is to divide these hours across the working days.
For example:
ID Date Hour
1 20210504 40
I want it to be structured as
ID Date Hour
1 20210504 8
1 20210505 8
1 20210506 8
1 20210507 8
1 20210510 8
I managed to divide the hours with the code below, but couldn't restrict the result to working days.
WITH cte1 AS
(
select 1 AS ID, 20210504 AS Date, 40 AS Hours --just a test case
), working_days AS
(
select date from dateTable
),
cte2 AS
(
select ID, Date, Hours, IIF(Hours<=8, Hours, 8) AS dailyHours FROM cte1
UNION ALL
SELECT
cte2.ID,
cte2.Date + 1
,cte2.Hours - 8
,IIF(Hours<=8, Hours, 8)
FROM cte2
JOIN cte1 t ON cte2.ID = t.ID
WHERE cte2.HOURS > 8 AND cte2.Date + 1 IN (select * from working_days)
When I run it like this, the output is missing one day:
ID Date Hour
1 20210504 8
1 20210505 8
1 20210506 8
1 20210507 8
To solve your problem you need to build the calendar first and add a ROW_NUMBER to working_days,
so that consecutive working days are numbered without gaps.
declare @date_start date = '2021-05-01'
;WITH
cte1 AS (
SELECT * FROM
(VALUES
(1, '20210504', 40),
(2, '20210505', 55),
(3, '20210503', 44)
) X (ID, Date, Hour)
),
numbers as (
SELECT ROW_NUMBER() over (order by o.object_id) N
FROM sys.objects o
),
cal as (
SELECT cast(DATEADD(day, n, @date_start) as date) d, n-1 n
FROM numbers n
where n.n<32
),
working_days as (
select d, ROW_NUMBER() over (order by n) dn
from cal
where DATEPART(weekday, d) < 6 /* monday to friday in italy (country dependent) */
),
base as (
SELECT t.ID, t.Hour, w.d, w.dn
from cte1 t
join working_days w on w.d = t.date
)
SELECT t.ID, w.d, iif((8*n)<=Hour, 8, 8 + Hour - (8*n) ) h
FROM base t
join numbers m on m.n <= (t.Hour / 8.0) + 0.5
join working_days w on w.dn = t.dn + N -1
order by 1,2
You can use a recursive CTE. This should do the trick:
with cte as (
select id, date, 8 as hour, hour - 8 as total_hour -- total_hour = hours still to allocate after this row
from t
union all
select id, dateadd(day, 1, date),
(case when total_hour < 8 then total_hour else 8 end),
total_hour - 8
from cte
where total_hour > 0
)
select *
from cte;
Note: This assumes that total_hour is at least 8, just to avoid a case expression in the anchor part of the CTE. That can trivially be added.
Also, if there might be more than 100 days, you will need option (maxrecursion 0).
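To make the expected result concrete, here is a small Python sketch of the allocation itself: at most 8 hours per day, skipping non-working days. It assumes Monday to Friday are working days and ignores holidays; in the real query that rule would come from the calendar table. The name split_hours and its parameters are hypothetical.

```python
from datetime import date, timedelta


def split_hours(start, total_hours, per_day=8, is_working_day=None):
    """Yield (day, hours) pairs, giving at most per_day hours to each working day."""
    if is_working_day is None:
        # Assumption: Monday (0) to Friday (4) are working days, no holidays.
        is_working_day = lambda d: d.weekday() < 5
    day, remaining = start, total_hours
    while remaining > 0:
        if is_working_day(day):
            hours = min(per_day, remaining)
            yield day, hours
            remaining -= hours
        day += timedelta(days=1)


schedule = list(split_hours(date(2021, 5, 4), 40))
for day, hours in schedule:
    print(day.strftime("%Y%m%d"), hours)
```

For ID 1 (20210504, 40 hours) this skips the weekend of 8 and 9 May and ends on 20210510, which is the row missing from the question's output.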

Oracle SQL recursive adding values

I have the following data in the table
Period Total_amount R_total
01/01/20 2 2
01/02/20 5 null
01/03/20 3 null
01/04/20 8 null
01/05/20 31 null
Based on the above data I would like to have the following situation.
Period Total_amount R_total
01/01/20 2 2
01/02/20 5 3
01/03/20 3 0
01/04/20 8 8
01/05/20 31 23
Additional data
01/06/20 21 0 (previously it would be -2)
01/07/20 25 25
01/08/20 29 4
Pattern to the additional data is:
if total_amount < previous(r_total) then 0
Based on the filled data, we can spot the pattern is:
R_total = total_amount - previous(R_total)
Could you please help me out with this issue?
As Gordon Linoff suspected, it is possible to solve this problem with analytic functions. The benefit is that the query will likely be much faster. The price to pay for that benefit is that you need to do a bit of math beforehand (before ever thinking about "programming" and "computers").
A bit of elementary arithmetic shows that R_TOTAL is an alternating sum of TOTAL_AMOUNT. This can be arranged easily by using ROW_NUMBER() (to get the signs) and then an analytic SUM(), as shown below.
Table setup:
create table sample_data (period, total_amount) as
select to_date('01/01/20', 'mm/dd/rr'), 2 from dual union all
select to_date('01/02/20', 'mm/dd/rr'), 5 from dual union all
select to_date('01/03/20', 'mm/dd/rr'), 3 from dual union all
select to_date('01/04/20', 'mm/dd/rr'), 8 from dual union all
select to_date('01/05/20', 'mm/dd/rr'), 31 from dual
;
Query and result:
with
prep (period, total_amount, sgn) as (
select period, total_amount,
case mod(row_number() over (order by period), 2) when 0 then 1 else -1 end
from sample_data
)
select period, total_amount,
sgn * sum(sgn * total_amount) over (order by period) as r_total
from prep
;
PERIOD TOTAL_AMOUNT R_TOTAL
-------- ------------ ----------
01/01/20 2 2
01/02/20 5 3
01/03/20 3 0
01/04/20 8 8
01/05/20 31 23
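The arithmetic can be checked with a short Python sketch. It compares the recurrence R[n] = T[n] - R[n-1] against the alternating-sum closed form on the five sample rows, and then applies the "if negative then 0" rule to the additional rows from the question. The function names are made up for illustration. As far as I can tell, the closed form only matches while no value has to be floored at 0, so treat it as valid for the first five rows only.

```python
from itertools import accumulate

totals = [2, 5, 3, 8, 31]
extended = totals + [21, 25, 29]  # the "additional data" rows from the question


def running_difference(totals, clamp=False):
    """R[0] = T[0]; R[n] = T[n] - R[n-1], optionally floored at 0."""
    result, previous = [], 0
    for amount in totals:
        current = amount - previous
        if clamp:
            current = max(current, 0)
        result.append(current)
        previous = current
    return result


def alternating_sum(totals):
    """Closed form: sign[n] * cumulative sum of sign[k] * T[k]."""
    # First row gets -1, second +1, and so on, as in the ROW_NUMBER() query.
    signs = [1 if i % 2 else -1 for i in range(len(totals))]
    cumulative = accumulate(s * t for s, t in zip(signs, totals))
    return [s * c for s, c in zip(signs, cumulative)]


print(running_difference(totals))
print(alternating_sum(totals))
print(running_difference(extended, clamp=True))
print(alternating_sum(extended))
```

On the extended data the two diverge from the sixth row on, which suggests the floor-at-0 rule needs the recursive approach (or some other row-by-row evaluation).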
This may be possible with window functions, but the simplest method is probably a recursive CTE:
with t as (
select t.*, row_number() over (order by period) as seqnum
from yourtable t
),
cte(period, total_amount, r_amount, seqnum) as (
select period, total_amount, r_total, seqnum
from t
where seqnum = 1
union all
select t.period, t.total_amount, t.total_amount - cte.r_amount, t.seqnum
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select *
from cte;
This question explicitly talks about "recursively" adding values. If you want to solve this using another mechanism, you might explain the logic in detail and ask if there is a non-recursive CTE solution.

take sum of last 7 days from the observed date in BigQuery

I have a table on which I want to compute the sum of revenue on last 7 days from the observed day. Here is my table -
with temp as
(
select DATE('2019-06-29') as transaction_date, "x"as id, 0 as revenue
union all
select DATE('2019-06-30') as transaction_date, "x"as id, 80 as revenue
union all
select DATE('2019-07-04') as transaction_date, "x"as id, 64 as revenue
union all
select DATE('2019-07-06') as transaction_date, "x"as id, 64 as revenue
union all
select DATE('2019-07-11') as transaction_date, "x"as id, 75 as revenue
union all
select DATE('2019-07-12') as transaction_date, "x"as id, 0 as revenue
)
select * from temp
I want to take a sum of last 7 days for each transaction_date. For instance for the last record which has transaction_date = 2019-07-12, I would like to add another column which adds up revenue for last 7 days from 2019-07-12 (which is until 2019-07-05), hence the value of new rollup_revenue column would be 0 + 75 + 64 = 139. Likewise, I need to compute the rollup for all the dates for every ID.
Note - the ID may or may not appear daily.
I have tried self join but I am unable to figure it out.
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
You can test and play with the query above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.temp` AS (
SELECT DATE '2019-06-29' AS transaction_date, 'x' AS id, 0 AS revenue UNION ALL
SELECT '2019-06-30', 'x', 80 UNION ALL
SELECT '2019-07-04', 'x', 64 UNION ALL
SELECT '2019-07-06', 'x', 64 UNION ALL
SELECT '2019-07-11', 'x', 75 UNION ALL
SELECT '2019-07-12', 'x', 0
)
SELECT *,
SUM(revenue) OVER(
PARTITION BY id ORDER BY UNIX_DATE(transaction_date)
RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
) rollup_revenue
FROM `project.dataset.temp`
-- ORDER BY transaction_date
with result
Row transaction_date id revenue rollup_revenue
1 2019-06-29 x 0 0
2 2019-06-30 x 80 80
3 2019-07-04 x 64 144
4 2019-07-06 x 64 208
5 2019-07-11 x 75 139
6 2019-07-12 x 0 139
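The result table can be reproduced outside BigQuery with a few lines of Python. This is only a sketch of the window logic: for each row it sums revenue for the same id over the current day and the 6 days before it, which is how I read RANGE BETWEEN 6 PRECEDING AND CURRENT ROW over UNIX_DATE. The function name rolling_revenue is hypothetical.

```python
from datetime import date, timedelta

rows = [
    (date(2019, 6, 29), "x", 0),
    (date(2019, 6, 30), "x", 80),
    (date(2019, 7, 4), "x", 64),
    (date(2019, 7, 6), "x", 64),
    (date(2019, 7, 11), "x", 75),
    (date(2019, 7, 12), "x", 0),
]


def rolling_revenue(rows, days=7):
    """For each row, sum revenue of the same id over the last `days` days, inclusive."""
    out = []
    for day, key, revenue in rows:
        window_start = day - timedelta(days=days - 1)
        total = sum(r for d, k, r in rows if k == key and window_start <= d <= day)
        out.append((day, key, revenue, total))
    return out


rollup = rolling_revenue(rows)
for day, key, revenue, total in rollup:
    print(day, key, revenue, total)
```

Because the window is defined on dates rather than on row counts, missing days for an id need no special handling.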
One option uses a correlated subquery to find the rolling sum:
SELECT
transaction_date,
revenue,
(SELECT SUM(t2.revenue) FROM temp t2 WHERE t2.id = t1.id AND t2.transaction_date
BETWEEN DATE_SUB(t1.transaction_date, INTERVAL 7 DAY) AND
t1.transaction_date) AS rev_7_days
FROM temp t1
ORDER BY
transaction_date;

Select min/max dates for periods that don't intersect

Example: I have a table with 4 columns. The date format is dd.MM.yy.
id ban start end
1 1 01.01.15 31.12.18
1 1 02.02.15 31.12.18
1 1 05.04.15 31.12.17
In this case dates from rows 2 and 3 are included in dates from row 1
1 1 02.04.19 31.12.20
1 1 05.05.19 31.12.20
In this case dates from row 5 are included in dates from rows 4. Basically we have 2 periods that don't intersect.
01.01.15 31.12.18
and
02.04.19 31.12.20
Situation where a date starts in one period and ends in another are impossible. The end result should look like this
1 1 01.01.15 31.12.18
1 1 02.04.19 31.12.20
I tried using analytical functions (LAG):
select id
, ban
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(start) over (partition by id, ban order by start, end asc), start)
else start
end as start
, case
when start >= nvl(lag(start) over (partition by id, ban order by start, end asc), start)
and end <= nvl(lag(end) over (partition by id, ban order by start, end asc), end)
then nvl(lag(end) over (partition by id, ban order by start, end asc), end)
else end
end as end
from table
Where I order rows and if current dates are included in previous I replace them. It works if I have just 2 rows. For example this
1 1 08.09.15 31.12.99
1 1 31.12.15 31.12.99
turns into this
1 1 08.09.15 31.12.99
1 1 08.09.15 31.12.99
which I can then group by all fields and get what I want, but if there are more
1 2 13.11.15 31.12.99
1 2 31.12.15 31.12.99
1 2 16.06.15 31.12.99
I get
1 2 16.06.15 31.12.99
1 2 16.06.15 31.12.99
1 2 13.11.15 31.12.99
I understand why this happens, but how do I work around it? Running the query multiple times is not an option.
This query looks promising:
-- test data
with t(id, ban, dtstart, dtend) as (
select 1, 1, date '2015-01-01', date '2015-03-31' from dual union all
select 1, 1, date '2015-02-02', date '2015-03-31' from dual union all
select 1, 1, date '2015-03-15', date '2015-03-31' from dual union all
select 1, 1, date '2015-08-05', date '2015-12-31' from dual union all
select 1, 2, date '2015-01-01', date '2016-12-31' from dual union all
select 2, 1, date '2016-01-01', date '2017-12-31' from dual),
-- end of test data
step1 as (select id, ban, dt, to_number(inout) direction
from t unpivot (dt for inout in (dtstart as '1', dtend as '-1'))),
step2 as (select distinct id, ban, dt, direction,
sum(direction) over (partition by id, ban order by dt) sm
from step1),
step3 as (select id, ban, direction, dt dt1,
lead(dt) over (partition by id, ban order by dt) dt2
from step2
where (direction = 1 and sm = 1) or (direction = -1 and sm = 0) )
select id, ban, dt1, dt2
from step3 where direction = 1 order by id, ban, dt1
step1 - unpivot dates and assign 1 for start date, -1 for end
date (column direction)
step2 - add cumulative sum for direction
step3 - filter only interesting dates, pivot second date using lead()
You can shorten this syntax; I divided it into steps to show what's going on.
Result:
ID BAN DT1 DT2
------ ---------- ----------- -----------
1 1 2015-01-01 2015-03-31
1 1 2015-08-05 2015-12-31
1 2 2015-01-01 2016-12-31
2 1 2016-01-01 2017-12-31
I assumed that the calculation has to be done separately for each (ID, BAN) pair. If not, change the partitioning and ordering in sum() and lead().
PIVOT and UNPIVOT work in Oracle 11 and later; for earlier versions you need CASE WHEN.
BTW, START is a reserved word in Oracle, so I changed the column names slightly in my example.
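The three steps can be sketched in Python to check the logic on the same test data: turn each row into a +1 event at its start and a -1 event at its end, keep a running sum per (id, ban), and open a period when the sum leaves 0 and close it when it returns to 0. The function name merge_periods is made up, and processing starts before ends on the same date is my own tie-breaking assumption.

```python
from collections import defaultdict
from datetime import date

rows = [
    (1, 1, date(2015, 1, 1), date(2015, 3, 31)),
    (1, 1, date(2015, 2, 2), date(2015, 3, 31)),
    (1, 1, date(2015, 3, 15), date(2015, 3, 31)),
    (1, 1, date(2015, 8, 5), date(2015, 12, 31)),
    (1, 2, date(2015, 1, 1), date(2016, 12, 31)),
    (2, 1, date(2016, 1, 1), date(2017, 12, 31)),
]


def merge_periods(rows):
    """rows: (id, ban, start, end). Returns the merged periods per (id, ban)."""
    events = defaultdict(list)
    for id_, ban, start, end in rows:
        events[(id_, ban)].append((start, 1))   # period opens
        events[(id_, ban)].append((end, -1))    # period closes
    merged = []
    for key in sorted(events):
        open_count, period_start = 0, None
        # Sort by date; on equal dates handle starts (+1) before ends (-1).
        for dt, direction in sorted(events[key], key=lambda e: (e[0], -e[1])):
            if direction == 1 and open_count == 0:
                period_start = dt
            open_count += direction
            if open_count == 0:
                merged.append((*key, period_start, dt))
    return merged


periods = merge_periods(rows)
for row in periods:
    print(row)
```

This handles any depth of nesting in one pass, which is the case the LAG attempt in the question could not cover.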
I like to do this by identifying the period starts, then doing a cumulative sum to define the group, and a final aggregation:
select id, ban, min(start), max(end)
from (select t.*, sum(start_flag) over (partition by id, ban order by start) as grp
from (select t.*,
(case when exists (select 1
from t t2
where t2.id = t.id and t2.ban = t.ban and
t2.start < t.start and t2.end >= t.start
)
then 0 else 1
end) as start_flag
from t
) t
) t
group by id, ban, grp;

SQL - Count number of changes in an ordered list

Say I've got a table with two columns (date and price). If I select over a range of dates, then is there a way to count the number of price changes over time?
For instance:
Date | Price
22-Oct-11 | 3.20
23-Oct-11 | 3.40
24-Oct-11 | 3.40
25-Oct-11 | 3.50
26-Oct-11 | 3.40
27-Oct-11 | 3.20
28-Oct-11 | 3.20
In this case, I would like it to return a count of 4 price changes.
Thanks in advance.
You can use the analytic functions LEAD and LAG to access the prior and next rows of a result set, and then use that to see whether there are changes.
with t as (
select date '2011-10-22' dt, 3.2 price from dual union all
select date '2011-10-23', 3.4 from dual union all
select date '2011-10-24', 3.4 from dual union all
select date '2011-10-25', 3.5 from dual union all
select date '2011-10-26', 3.4 from dual union all
select date '2011-10-27', 3.2 from dual union all
select date '2011-10-28', 3.2 from dual
)
select sum(is_change)
from (
select dt,
price,
lag(price) over (order by dt) prior_price,
(case when lag(price) over (order by dt) != price
then 1
else 0
end) is_change
from t);
SUM(IS_CHANGE)
--------------
4
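The same count can be verified with a few lines of Python: order the rows by date and compare each price with the one before it, which is what LAG does in the query. The function name count_changes is hypothetical, and the dates are written as ISO strings so that plain sorting orders them correctly.

```python
prices = [
    ("2011-10-22", 3.20),
    ("2011-10-23", 3.40),
    ("2011-10-24", 3.40),
    ("2011-10-25", 3.50),
    ("2011-10-26", 3.40),
    ("2011-10-27", 3.20),
    ("2011-10-28", 3.20),
]


def count_changes(prices):
    """Count rows whose price differs from the previous row, ordered by date."""
    ordered = [price for _, price in sorted(prices)]
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev != cur)


changes = count_changes(prices)
print(changes)
```

The first row has no predecessor and is never counted, matching the NULL that LAG returns for it.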
Try this
select count(*)
from
(select date,price from table where date between X and Y
group by date,price )
Depending on the Oracle version use either analytical functions (see answer from Justin Cave) or this
SELECT
SUM (CASE WHEN PREVPRICE != PRICE THEN 1 ELSE 0 END) CNTCHANGES
FROM
(
SELECT
C.DATE,
C.PRICE,
MAX ( D.PRICE ) PREVPRICE
FROM
(
SELECT
A.Date,
A.Price,
(SELECT MAX (B.DATE) FROM MyTable B WHERE B.DATE < A.DATE) PrevDate
FROM MyTable A
WHERE A.DATE BETWEEN YourStartDate AND YourEndDate
) C
INNER JOIN MyTable D ON D.DATE = C.PREVDATE
GROUP BY C.DATE, C.PRICE
)