Running median last n days group by user. BigQuery SQL - sql

I have a table like this:
date
user_id
revenue
2021-10-01
1
2
2021-10-02
1
3
2021-10-05
1
2
2021-10-09
1
3
2021-10-15
1
3
2021-10-01
2
2
2021-10-04
2
2
2021-10-10
2
1
2021-10-11
2
3
2021-10-11
2
3
2021-10-20
2
5
And I want to add column with median revenue for the last 5 days group by user_id. Desired output should looks like this:
date
user_id
revenue
median_last_5_days
2021-10-01
1
2
NULL
2021-10-02
1
3
2
2021-10-05
1
2
2.5
2021-10-09
1
3
2
2021-10-16
1
3
NULL
2021-10-01
2
2
NULL
2021-10-02
2
3
2
2021-10-03
2
3
2.5
2021-10-04
2
2
3
2021-10-10
2
1
NULL
2021-10-11
2
3
1
2021-10-11
2
3
2
2021-10-20
2
5
NULL
Can I produce this with SQL BigQuery?
Thanks for helping me)

Use of PERCENTILE_CONT or PERCENTILE_DISC to get the median will not work on these conditions as window_frame_clause are not allowed in Navigation functions.
Try this approach,
CREATE TEMP FUNCTION MEDIAN(arr ANY TYPE) AS ((
SELECT
IF(
MOD(ARRAY_LENGTH(arr), 2) = 0,
(arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2) - 1)] + arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]) / 2,
arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]
)
FROM (SELECT ARRAY_AGG(x ORDER BY x) AS arr FROM UNNEST(arr) AS x)
));
SELECT
date,
user_id,
revenue,
MEDIAN(
ARRAY_AGG(revenue) OVER (PARTITION BY user_id
ORDER BY datetime_diff(date, date('2000-01-01'), day)
RANGE BETWEEN 4 PRECEDING AND 1 PRECEDING)
) AS median_last_5_days
FROM `my-project.my-dataset.my-table`
GROUP BY date, user_id, revenue
ORDER BY user_id;
Sample data:
Output:

Related

Need YTD and MTD calculations in SQL

Date Amt ytd mtd
01-Jan-21 1 2 2
01-Jan-21 1 2 2
02-Jan-21 1 3 3
03-Jan-21 1 4 4
01-Feb-21 1 5 1
02-Feb-21 1 6 2
03-Feb-21 1 7 3
04-Feb-21 1 8 4
05-Feb-21 1 9 5
01-Mar-21 1 10 1
02-Mar-21 1 11 2
03-Mar-21 1 12 3
04-Mar-21 1 13 4
01-Apr-21 1 14 1
02-Apr-21 1 15 2
03-Apr-21 1 16 3
01-May-21 1 17 1
02-May-21 1 18 2
03-May-21 1 19 3
04-May-21 1 20 4
05-May-21 1 21 5
06-May-21 1 22 6
I have the first two columns (Date, Amt) and i need the YTD and MTD columns in MS SQL so that i can show the above table.
Seems like a rolling COUNT OVER was used to calculate the ytd & mtd in the Oracle source.
(Personally, I would prefere RANK or DENSE_RANK)
And since Oracle datestamps can be casted to a DATE as-is.
SELECT [Date], Amt
, ytd = COUNT(*) OVER (ORDER BY CAST([Date] AS DATE))
, mtd = COUNT(*) OVER (PARTITION BY EOMONTH(CAST([Date] AS DATE)) ORDER BY CAST([Date] AS DATE))
FROM your_table
ORDER BY CAST([Date] AS DATE)
Date
Amt
ytd
mtd
01-Jan-21
1
2
2
01-Jan-21
1
2
2
02-Jan-21
1
3
3
03-Jan-21
1
4
4
01-Feb-21
1
5
1
02-Feb-21
1
6
2
03-Feb-21
1
7
3
04-Feb-21
1
8
4
05-Feb-21
1
9
5
db<>fiddle here

Distinguish the first rows where a given column's value changes in a grouped result

I want to create a select query in SQL Server where I group the rows by a column (BaseId) and also order them by Status, RTime and Version. I want to add a column "isFirst" that has the value 1 if the BaseId value is the first in the group, and 0 if it's not.
My sample table:
Table name: Head
Id BaseId Name RTime Status Version
2 2 abc 04-12 12:34 1 1
3 3 xyz 04-12 13:10 9 1
4 2 abc 04-13 14:25 0 2
5 3 xyz 04-14 12:34 0 2
6 3 xyz 04-14 13:10 9 3
7 3 xyz 04-16 14:25 1 4
8 2 abc 04-16 17:40 1 3
9 9 sql 04-17 02:23 9 1
10 9 sql 04-17 07:31 0 2
Expected result:
isFirst Id BaseId Name RTime Status Version
1 10 9 sql 04-17 07:31 0 2
0 9 9 sql 04-17 02:23 9 1
1 5 3 xyz 04-14 12:34 0 2
0 7 3 xyz 04-16 14:25 1 4
0 6 3 xyz 04-14 13:10 9 3
0 3 3 xyz 04-12 13:10 9 1
1 4 2 abc 04-13 14:25 0 2
0 8 2 abc 04-16 17:40 1 3
0 2 2 abc 04-12 12:34 1 1
My query now looks like this:
SELECT *
FROM Head
ORDER BY BaseId desc, Status, RTime desc, Version desc
I think I should use CASE to create the isFirst column, but I've had no luck so far. Anyone could help me?
You can use row_number() and a case expression:
select
case when row_number() over(
partition by BaseId
order by Status, RTime desc, Version desc
) = 1
then 1
else 0
end isFirst,
h.*
from head h
order by BaseId desc, Status, RTime desc, Version desc

Restart Row Number Based on Date and Increments of N

I have a table with the following data that I generated with a date table
date day_num (DAY_NUM % 7)
2019-07-09 0 0
2019-07-10 1 1
2019-07-11 2 2
2019-07-12 3 3
2019-07-13 4 4
2019-07-14 5 5
2019-07-15 6 6
2019-07-16 7 0
I basically want to get a week number that restarts at 0 and I need help figuring out the last part
The final output would look like this
date day_num (DAY_NUM % 7) week num
2019-07-09 0 0 1
2019-07-10 1 1 1
2019-07-11 2 2 1
2019-07-12 3 3 1
2019-07-13 4 4 1
2019-07-14 5 5 1
2019-07-15 6 6 1
2019-07-16 7 0 2
This is the sql I have so far
select
SUB.*,
DAY_NUM%7
FROM(
SELECT
DISTINCT
id_date,
row_number() over(order by id_date) -1 as day_num
FROM schema.date_tbl
WHERE Id_date BETWEEN "2019-07-09" AND date_add("2019-07-09",146)
Building on your query:
select SUB.*, DAY_NUM%7,
DENSE_RANK() OVER (ORDER BY FLOOR(DAY_NUM / 7)) as weeknum
FROM (SELECT DISTINCT id_date,
row_number() over(order by id_date) -1 as day_num
FROM schema.date_tbl
WHERE Id_date BETWEEN "2019-07-09" AND date_add("2019-07-09", 146)
) x

SQL - Include a not exists line in a aggregation

I have a order table and I'm aggregation this table by this query:
SELECT dtdate, idsku, sum(vlorder) as vlorder, sum(qtditem) as qtditem
FROM order
GROUP BY dtdate, idsku
And I'm getting this results:
dtdate idsku vlorder qtditem
01/01/2016 1 4 8
02/01/2016 1 5 10
03/01/2016 1 3 6
04/01/2016 1 2 4
05/01/2016 1 3 6
06/01/2016 1 1 2
But, I don't have results for 07/01/2016 and idsku = 1 because doesn't exists on the database (sounds dummy). And I have to include this "empty" line 07/01/2016 1 0 0, like this:
dtdate idsku vlorder qtditem
01/01/2016 1 4 8
02/01/2016 1 5 10
03/01/2016 1 3 6
04/01/2016 1 2 4
05/01/2016 1 3 6
06/01/2016 1 1 2
07/01/2016 1 0 0
Is this possible?
In Postgres, you can use generate_series():
select g.mon, o.idsku,
sum(o.vlorder) as vlorder, sum(o.qtditem) as qtditem
from generate_series('2016-01-01'::timestamp, '2016-07-01'::timestamp,
interval '1' month) g(mon) left join
orders o
on o.dtdate = g.mon
group by g.mon, idsku;
If your dates are not on the first of the month, then you can use date_trunc('day', o.dtdate) = g.mon.

sql Query on effective date

I would like to get report for drink purchased in whole month but price of the drink can change any time in month and I would like to get report for a month with price change
I have two tables
SELECT [ID]
,[DrinkID]
,[UserID]
,[qty]
,[DateTaken]
FROM [Snacks].[dbo].[DrinkHistory]
SELECT [ID]
,[DrinkID]
,[UserID]
,[qty]
,[DateTaken]
FROM [Snacks].[dbo].[DrinkHistory]
[DrinkHistory]:
ID DrinkID UserID qty DateTaken
----------------------------------------------------------------------
1 1 1 1 2014-05-10
2 1 1 2 2014-05-15
3 2 1 1 2014-06-01
4 2 1 4 2014-06-01
5 1 1 3 2014-05-20
6 1 1 4 2014-05-30
[DrinkPricesEffect]:
PriceID DrinkID DrinkPrice PriceEffectiveDate IsCurrent
-----------------------------------------------------------------------------------
1 1 10.00 2014-05-01 1
2 1 20.00 2014-05-20 1
3 2 9.00 2014-06-01 1
4 2 8.00 2014-01-01 1
5 1 30.00 2014-05-25 1
6 1 40.00 2014-05-28 1
I would like to have result as under date taken between 2014-05-1 to 2014-05-31
DrinkId Qty Price DateTaken PriceEffectiveDate
-----------------------------------------------------------------------
1 1 10 2014-05-10 2014-05-01
1 2 10 2014-05-15 2014-05-01
1 3 20 2014-05-20 2014-05-20
1 4 40 2014-05-30 2014-05-28
Is there any who can give me some idea or write query for me?
If your drink price can change any time in a month you could additionaly save the price for each purchase. I would add a column [PricePaid] to the table [DrinkHistory].
When adding a record to [DrinkHistory], the price for the drink at the moment is known, but later it might change so you save the current price to the history...
Then for your result you could just display the Whole [DrinkHistory]
SELECT * FROM DrinkHistory;
This should work:
Select
DH.DrinkId,
DH.Qty,
DPE.DrinkPrice AS Price,
DH.DateTaken,
DPE.PriceEffectiveDate
FROM DrinkHistory DH
JOIN DrinkPricesEffect DPE ON DPE.PriceID =
(
Select Top 1 PriceID FROM
(
Select PriceID,RANK() OVER(ORDER BY PriceEffectiveDate DESC ) AS rnk
FROM DrinkPricesEffect
WHERE DH.DrinkId = DrinkId AND
DH.DateTaken >= PriceEffectiveDate
)SubQ WHERE rnk = 1
)
WHERE DH.DateTaken Between '2014-05-01' AND '2014-05-30'
Here you can find the SQL Fiddle link: http://sqlfiddle.com/#!6/5f8fb/26/0