Bigquery Week level info - google-bigquery

I have a query that returns daily data for the last 7 days. I would like to know the syntax for getting weekly data for the last 4 weeks using BigQuery, in this form:
Week Total
week 1 15
week 2 20
week 3 35

Something along those lines:
SELECT
YEAR(day) AS year,
WEEK(day) AS week,
SUM(metric) AS total
FROM YourTable
WHERE WEEK(CURRENT_DATE()) - WEEK(day) < 4
GROUP BY 1, 2
To test/play, you can use the approach below, which hopefully mimics your data:
SELECT
YEAR(day) AS year,
WEEK(day) AS week,
SUM(metric) AS total
FROM (
SELECT
DATE(DATE_ADD(TIMESTAMP('2016-01-01'), pos - 1, "DAY")) AS day,
CAST(100 * RAND() AS INTEGER) AS metric
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP('2016-01-01')), '.'),'') AS h
FROM (SELECT NULL)),h
)))
) AS YourTable
WHERE WEEK(CURRENT_DATE()) - WEEK(day) < 4
GROUP BY 1, 2
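For checking the expected totals by hand, the same grouping can be sketched in Python. This is only an illustration, not the BigQuery query itself: it assumes ISO week numbering (which may differ from what `WEEK()` returns in legacy SQL), and it finds the 4-week window by date arithmetic rather than by subtracting week numbers. The function name `weekly_totals` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_totals(rows, today, weeks=4):
    """Sum (day, metric) rows per ISO (year, week) for the last `weeks` weeks."""
    # Monday of the current ISO week, then step back weeks-1 whole weeks.
    this_monday = today - timedelta(days=today.weekday())
    first_day = this_monday - timedelta(weeks=weeks - 1)
    totals = defaultdict(int)
    for day, metric in rows:
        if first_day <= day <= today:
            iso = day.isocalendar()
            totals[(iso[0], iso[1])] += metric
    return dict(sorted(totals.items()))

# Hypothetical sample: one unit per day from 2016-01-01 to 2016-02-29.
rows = [(date(2016, 1, 1) + timedelta(days=i), 1) for i in range(60)]
print(weekly_totals(rows, today=date(2016, 2, 29)))
```

Because the window is computed from dates, this version is not affected when the 4 weeks straddle a year boundary.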

Related

How can i do a rolling 12 month sum when some year month values are missing?

I am calculating rolling sum as such:
select
city,
month_year,
person,
sum(total) over (partition by person,city order by month_year rows between 11 preceding and current row) rolling_one_year
from
(select
city,
month_year,
person,
sum(amount_dollar) as total
from db1 d
group by 1,2,3) ;
However, not every person has a value for every month_year. The rolling 12-month sum above is correct only if the month values are consecutive.
If a month is missing for a person, e.g. 202208, the logic above would sum 202201 - 202301, which as we know spans 13 months.
How can I adapt my code above to ensure that the range of months selected is within 1 year?
A possible solution is to LEFT JOIN your data to the calendar table.
Here is a guide on how to create the calendar table if you don't have one.
Create a date table in hive
You should use a logical window frame, RANGE, instead of ROWS. Consider the query below.
WITH monthly_total AS (
SELECT '201911' year_month, 4 total UNION ALL
SELECT '201912' year_month, 10 total UNION ALL
SELECT '202201' year_month, 1 total UNION ALL
SELECT '202202' year_month, 3 total UNION ALL
SELECT '202203' year_month, 9 total UNION ALL
SELECT '202204' year_month, 4 total UNION ALL
SELECT '202205' year_month, 2 total UNION ALL
SELECT '202206' year_month, 8 total UNION ALL
SELECT '202207' year_month, 6 total UNION ALL
SELECT '202209' year_month, 3 total UNION ALL
SELECT '202210' year_month, 10 total UNION ALL
SELECT '202211' year_month, 1 total UNION ALL
SELECT '202212' year_month, 3 total UNION ALL
SELECT '202301' year_month, 50 total
)
SELECT *, SUM(total) OVER w AS rolling_12m_sum
FROM monthly_total
WINDOW w AS (
ORDER BY CAST(SUBSTR(year_month, 1, 4) AS INTEGER) * 12 + CAST(SUBSTR(year_month, 5, 2) AS INTEGER)
RANGE BETWEEN 11 PRECEDING AND CURRENT ROW
) ORDER BY year_month;
I've ignored partition by person,city for simplicity.
The link below may be helpful in case you're not familiar with RANGE:
https://learnsql.com/blog/difference-between-rows-range-window-functions/
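To see what the RANGE frame does with a missing month, here is a minimal Python sketch of the same logic, using the 2022-2023 rows from the query above. The helper names `month_index` and `rolling_12m` are made up for this illustration; the month index mirrors the `year * 12 + month` expression in the ORDER BY.

```python
def month_index(year_month):
    """Map 'YYYYMM' to a running month number (year * 12 + month)."""
    return int(year_month[:4]) * 12 + int(year_month[4:6])

def rolling_12m(rows):
    """rows: list of (year_month, total). Sum rows whose month falls in the
    12 calendar months ending at each row, like RANGE 11 PRECEDING."""
    out = []
    for ym, _ in rows:
        hi = month_index(ym)
        window = sum(t for m, t in rows if hi - 11 <= month_index(m) <= hi)
        out.append((ym, window))
    return out

rows = [
    ("202201", 1), ("202202", 3), ("202203", 9), ("202204", 4),
    ("202205", 2), ("202206", 8), ("202207", 6), ("202209", 3),
    ("202210", 10), ("202211", 1), ("202212", 3), ("202301", 50),
]
for ym, total in rolling_12m(rows):
    print(ym, total)
```

For 202301 the window covers 202202 through 202301, so 202201 is excluded even though 202208 is missing. A frame of 12 physical rows would have reached back to 202201.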

How can I divide hours to next working days in SQL?

I have a table that stores the start date and the number of hours. I also have a calendar table as a reference for working days. My main goal is to divide these hours across the working days.
For example:
ID Date Hour
1 20210504 40
I want it to be structured as
ID Date Hour
1 20210504 8
1 20210505 8
1 20210506 8
1 20210507 8
1 20210510 8
I managed to divide the hours with the code below, but couldn't restrict it to working days.
WITH cte1 AS
(
select 1 AS ID, 20210504 AS Date, 40 AS Hours --just a test case
), working_days AS
(
select date from dateTable
),
cte2 AS
(
select ID, Date, Hours, IIF(Hours<=8, Hours, 8) AS dailyHours FROM cte1
UNION ALL
SELECT
cte2.ID,
cte2.Date + 1
,cte2.Hours - 8
,IIF(Hours<=8, Hours, 8)
FROM cte2
JOIN cte1 t ON cte2.ID = t.ID
WHERE cte2.HOURS > 8 AND cte2.Date + 1 IN (select * from working_days)
When I use it like this, it only gives me this output, with one day missing:
ID Date Hour
1 20210504 8
1 20210505 8
1 20210506 8
1 20210507 8
To solve your problem you need to build your calendar in the right way,
adding also to working_days a ROW_NUMBER to get correct progression.
declare @date_start date = '2021-05-01'
;WITH
cte1 AS (
SELECT * FROM
(VALUES
(1, '20210504', 40),
(2, '20210505', 55),
(3, '20210503', 44)
) X (ID, Date, Hour)
),
numbers as (
SELECT ROW_NUMBER() over (order by o.object_id) N
FROM sys.objects o
),
cal as (
SELECT cast(DATEADD(day, n, @date_start) as date) d, n-1 n
FROM numbers n
where n.n<32
),
working_days as (
select d, ROW_NUMBER() over (order by n) dn
from cal
where DATEPART(weekday, d) < 6 /* monday to friday in italy (country dependent) */
),
base as (
SELECT t.ID, t.Hour, w.d, w.dn
from cte1 t
join working_days w on w.d = t.date
)
SELECT t.ID, w.d, iif((8*n)<=Hour, 8, 8 + Hour - (8*n) ) h
FROM base t
join numbers m on m.n <= (t.Hour / 8.0) + 0.5
join working_days w on w.dn = t.dn + N -1
order by 1,2
You can use a recursive CTE. This should do the trick:
with cte as (
select id, date, 8 as hour, hour - 8 as total_hour -- total_hour holds the hours still to be allocated
from t
union all
select id, dateadd(day, 1, date),
(case when total_hour < 8 then total_hour else 8 end),
total_hour - 8
from cte
where total_hour > 0
)
select *
from cte;
Note: This assumes that total_hour is at least 8, just to avoid a case expression in the anchor part of the CTE. That can trivially be added.
Also, if there might be more than 100 days, you will need option (maxrecursion 0).
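As a cross-check of the expected output, the allocation can be sketched in Python. This is an illustration under assumptions, not either answer's method: working days are taken to be Monday to Friday (in the question they come from a calendar table, which could be plugged in through the hypothetical `is_working_day` parameter), and the function name `split_hours` is made up.

```python
from datetime import date, timedelta

def split_hours(start, hours, per_day=8,
                is_working_day=lambda d: d.weekday() < 5):
    """Return [(day, hours)], filling working days from `start`
    with at most `per_day` hours until `hours` is used up."""
    out = []
    day = start
    remaining = hours
    while remaining > 0:
        if is_working_day(day):
            chunk = min(per_day, remaining)
            out.append((day, chunk))
            remaining -= chunk
        day += timedelta(days=1)  # weekends are skipped, not counted
    return out

for day, h in split_hours(date(2021, 5, 4), 40):
    print(day.strftime("%Y%m%d"), h)
```

With the sample row (20210504, 40) this yields 8 hours on each of 20210504, 20210505, 20210506, 20210507 and 20210510, which is the layout asked for in the question.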

Bigquery - How to Calculate the sum of two continuous rows

How can I get the sum of every two consecutive rows? For instance, if I have 5 rows in total, I should get 3 rows as a result.
Below is my table:
2020-08-01 1
2020-08-02 3
2020-08-03 4
2020-08-04 2
2020-08-05 4
I want to achieve this:
4
6
4
August 1 and 2 = 4
August 3 and 4 = 6
August 5 = 4
You could use ROW_NUMBER here:
WITH cte AS (
SELECT dt, val, ROW_NUMBER() OVER (ORDER BY dt) rn
FROM yourTable
)
SELECT SUM(val)
FROM cte
GROUP BY FLOOR((rn - 1) / 2)
ORDER BY MIN(dt);
Here is a demo link, shown in SQL Server, but whose logic should also work for BigQuery:
Demo
Below is for Bigquery Standard SQL
#standardSQL
SELECT SUM(value) AS value,
STRING_AGG(FORMAT_DATE('%B %d', day), ' and ') || ' = ' || CAST(SUM(value) AS STRING) AS calc
FROM (
SELECT day, value, DIV(ROW_NUMBER() OVER(ORDER BY day) - 1, 2) grp
FROM `project.dataset.table` t
)
GROUP BY grp
ORDER BY grp
You can test and play with the above using sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2020-08-01' day, 1 value UNION ALL
SELECT '2020-08-02', 3 UNION ALL
SELECT '2020-08-03', 4 UNION ALL
SELECT '2020-08-04', 2 UNION ALL
SELECT '2020-08-05', 4
)
SELECT SUM(value) AS value,
STRING_AGG(FORMAT_DATE('%B %d', day), ' and ') || ' = ' || CAST(SUM(value) AS STRING) AS calc
FROM (
SELECT day, value, DIV(ROW_NUMBER() OVER(ORDER BY day) - 1, 2) grp
FROM `project.dataset.table` t
)
GROUP BY grp
ORDER BY grp
with output
Row value calc
1 4 August 01 and August 02 = 4
2 6 August 03 and August 04 = 6
3 4 August 05 = 4
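The grouping trick in both answers is integer division of a zero-based row number by 2. A minimal Python sketch of that idea, with the hypothetical helper name `pair_sums` and the sample values from the question (assumed already ordered by day):

```python
def pair_sums(values):
    """Sum consecutive pairs; rows i and i+1 share the group i // 2."""
    groups = {}
    for i, v in enumerate(values):
        groups[i // 2] = groups.get(i // 2, 0) + v
    return [groups[g] for g in sorted(groups)]

print(pair_sums([1, 3, 4, 2, 4]))
```

An odd row count simply leaves the last row in a group of its own, as with August 5 above.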

sql command to find average count of user visits to a website from past 6 months

I have a table with 2 columns, Date and number of visits.
I need to calculate the average difference in visit counts by month for the past 6 months.
Date Number_of_Visits
2018-04-06 5
2018-02-06 6
2017-04-10 3
2017-02-10 9
SQL should output
Avg_count difference visits past 6 months
5-3=2
6-9=-3
(-3+2)/2=-0.5
The SQL query output should be -0.5.
I started writing the SQL as below:
With cte as (
SELECT Year(v1.date) as Year, Month(v1.date) as Month, sum(v1.visits) as SumCount
FROM visits_table v1
group by Year(v1.date), Month(v1.date)
)
Do you want the average of the differences for the same month across years, i.e. a year-on-year comparison?
This will give you the result that you want, -0.5:
; With
cte as
(
SELECT Year(v1.date) as Year, Month(v1.date) as Month, sum(v1.visits) as SumCount
FROM visits_table v1
WHERE v1.date >= DATEADD(MONTH, -6, GETDATE()) -- Add here
group by Year(v1.date), Month(v1.date)
)
SELECT AVG (diff * 1.0)
FROM
(
SELECT *, diff = SumCount
- LAG (SumCount) OVER (PARTITION BY Month
ORDER BY Year)
FROM cte
) d
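The arithmetic behind the expected -0.5 can be reproduced with a short Python sketch. It mirrors the LAG-per-month idea only: it does not apply the 6-month date filter, and the function name `avg_yoy_diff` is made up for this illustration.

```python
from collections import defaultdict
from datetime import date

def avg_yoy_diff(rows):
    """rows: list of (date, visits). Average of each month's total minus
    the same month's total in the previous available year."""
    monthly = defaultdict(int)
    for d, visits in rows:
        monthly[(d.month, d.year)] += visits
    diffs = []
    prev = {}  # last seen total per calendar month
    for month, year in sorted(monthly):
        if month in prev:
            diffs.append(monthly[(month, year)] - prev[month])
        prev[month] = monthly[(month, year)]
    return sum(diffs) / len(diffs) if diffs else None

rows = [
    (date(2018, 4, 6), 5),
    (date(2018, 2, 6), 6),
    (date(2017, 4, 10), 3),
    (date(2017, 2, 10), 9),
]
print(avg_yoy_diff(rows))
```

Months with no earlier year contribute no difference, which corresponds to the NULL that LAG returns for the first row of each partition and that AVG ignores.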

How to use lead lag function in oracle

I have written a query that produces the result set below.
Note: I have months starting from jan-2016 to jan-2018.
There are two types, either 'hist' or 'future'.
Resulting dataset:
In this example, let's consider the combination of id1+id2+id3 as 1,2,3.
type month id1 id2 id3 value
hist jan-17 1 2 3 10
hist feb-17 1 2 3 20
future jan-17 1 2 3 15
future feb-17 1 2 3 1
hist mar-17 1 2 3 2
future apr-17 1 2 3 5
My calculation logic depends on the position of the month within its quarter.
For example, for January (first month of the quarter) I want the value to be: future value of Jan + future value of Feb + future value of March.
So for jan-17, the output should be 15+1+0 (for March there is no corresponding future value).
For February (2nd month of the quarter), the value should be: hist of Jan + future of Feb + future of March, i.e. 10+1+0 (future of March is not available).
Similarly for March, the value should be: hist of Jan + hist of Feb + future of March, i.e. 10+20+0 (forecast of March is not present).
The same applies to April, May and June (depending on the position of the month within the quarter).
I am aware of the lead and lag functions, but I am not able to apply them here.
Can someone please help?
I would not mess with lag, this can all be done with a group by if you convert your dates to quarters:
WITH
dset
AS
(SELECT DATE '2017-01-17' month, 5 VALUE
FROM DUAL
UNION ALL
SELECT DATE '2017-02-17' month, 6 VALUE
FROM DUAL
UNION ALL
SELECT DATE '2017-03-25' month, 7 VALUE
FROM DUAL
UNION ALL
SELECT DATE '2017-05-25' month, 4 VALUE
FROM DUAL)
SELECT SUM (VALUE) value_sum, TO_CHAR (month, 'q') quarter, TO_CHAR (month, 'YYYY') year
FROM dset
GROUP BY TO_CHAR (month, 'q'), TO_CHAR (month, 'YYYY');
This results in:
VALUE_SUM QUARTER YEAR
18 1 2017
4 2 2017
We can use an analytic function if you need the result on each record:
SELECT SUM (VALUE) OVER (PARTITION BY TO_CHAR (month, 'q'), TO_CHAR (month, 'YYYY')) quarter_sum, month, VALUE
FROM dset
This results in:
QUARTER_SUM MONTH VALUE
18 1/17/2017 5
18 2/17/2017 6
18 3/25/2017 7
4 5/25/2017 4
Make certain you include year, you don't want to combine quarters from different years.
As said in one of the comments, the trick lies in another question of yours and the corresponding answer. It goes somewhat like this:
with
x as
(select 'hist' type, To_Date('JAN-2017','MON-YYYY') ym , 10 value from dual union all
select 'future' type, To_Date('JAN-2017','MON-YYYY'), 15 value from dual union all
select 'future' type, To_Date('FEB-2017','MON-YYYY'), 1 value from dual),
y as
(select * from x Pivot(Sum(Value) For Type in ('hist' as h,'future' as f))),
/* Pivot for easy lag,lead query instead of working with rows..*/
z as
(
select ym,sum(h) H,sum(f) F from (
Select y.ym,y.H,y.F from y
union all
select add_months(to_Date('01-JAN-2017','DD-MON-YYYY'),rownum-1) ym, 0 H, 0 F
from dual connect by rownum <=3 /* depends on how many months you are querying...
so this dual adds the corresponding missing 0 records...*/
) group by ym
)
select
ym,
Case
When MOD(Extract(Month from YM),3) = 1
Then F + Lead(F,1) Over(Order by ym) + Lead(F,2) Over(Order by ym)
When MOD(Extract(Month from YM),3) = 2
Then Lag(H,1) Over(Order by ym) + F + Lead(F,1) Over(Order by ym)
When MOD(Extract(Month from YM),3) = 0 /* third month of the quarter: MOD(...,3) is 0, never 3 */
Then Lag(H,2) Over(Order by ym) + Lag(H,1) Over(Order by ym) + F
End Required_Value
from z
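The rule from the question (hist values for the earlier months of the quarter, future values for the current and later months, 0 where a value is missing) can be checked with a small Python sketch. It is a plain restatement of the rule rather than a translation of the Oracle query, and the name `quarter_value` and the dict layout are assumptions made for the illustration.

```python
def quarter_value(year, month, hist, future):
    """hist/future: dicts keyed by (year, month_number).
    Months of the quarter before `month` use hist; the rest use future."""
    first = month - (month - 1) % 3  # first month of the quarter
    total = 0
    for m in range(first, first + 3):
        source = hist if m < month else future
        total += source.get((year, m), 0)  # missing value counts as 0
    return total

# Sample data from the question (id1, id2, id3 = 1, 2, 3).
hist = {(2017, 1): 10, (2017, 2): 20, (2017, 3): 2}
future = {(2017, 1): 15, (2017, 2): 1, (2017, 4): 5}

for m in (1, 2, 3, 4):
    print(m, quarter_value(2017, m, hist, future))
```

For January, February and March 2017 this gives 16, 11 and 30, matching the 15+1+0, 10+1+0 and 10+20+0 worked out in the question.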