BigQuery - How to calculate the sum of two consecutive rows - google-bigquery

How can I get the sum of two rows clubbed together? For instance, if I have 5 rows in total, I should get 3 rows as a result.
Below is my table:
2020-08-01 1
2020-08-02 3
2020-08-03 4
2020-08-04 2
2020-08-05 4
I want to achieve this:
4
6
4
August 1 and 2 = 4
August 3 and 4 = 6
August 5 = 4

You could use ROW_NUMBER here:
WITH cte AS (
SELECT dt, val, ROW_NUMBER() OVER (ORDER BY dt) rn
FROM yourTable
)
SELECT SUM(val)
FROM cte
GROUP BY FLOOR((rn - 1) / 2)
ORDER BY MIN(dt);
Here is a demo, shown in SQL Server, but the logic should also work in BigQuery:
Demo
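The pairing works because FLOOR((rn - 1) / 2) maps every two consecutive row numbers to one bucket. A minimal BigQuery sketch of just that mapping (illustration only, not part of the original answer):
-- rn 1,2 -> bucket 0; rn 3,4 -> bucket 1; rn 5 -> bucket 2
SELECT rn, FLOOR((rn - 1) / 2) AS grp
FROM UNNEST(GENERATE_ARRAY(1, 5)) AS rn;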

Below is for BigQuery Standard SQL
#standardSQL
SELECT SUM(value) AS value,
STRING_AGG(FORMAT_DATE('%B %d', day), ' and ') || ' = ' || CAST(SUM(value) AS STRING) AS calc
FROM (
SELECT day, value, DIV(ROW_NUMBER() OVER(ORDER BY day) - 1, 2) grp
FROM `project.dataset.table` t
)
GROUP BY grp
ORDER BY grp
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2020-08-01' day, 1 value UNION ALL
SELECT '2020-08-02', 3 UNION ALL
SELECT '2020-08-03', 4 UNION ALL
SELECT '2020-08-04', 2 UNION ALL
SELECT '2020-08-05', 4
)
SELECT SUM(value) AS value,
STRING_AGG(FORMAT_DATE('%B %d', day), ' and ') || ' = ' || CAST(SUM(value) AS STRING) AS calc
FROM (
SELECT day, value, DIV(ROW_NUMBER() OVER(ORDER BY day) - 1, 2) grp
FROM `project.dataset.table` t
)
GROUP BY grp
ORDER BY grp
with output
Row value calc
1 4 August 01 and August 02 = 4
2 6 August 03 and August 04 = 6
3 4 August 05 = 4

Related

SQL count distinct over partition by cumulatively

I am using AWS Athena (Presto based) and I have this table named base:
| id | category | year | month |
|----|----------|------|-------|
| 1  | a        | 2021 | 6     |
| 1  | b        | 2022 | 8     |
| 1  | a        | 2022 | 11    |
| 2  | a        | 2022 | 1     |
| 2  | a        | 2022 | 4     |
| 2  | b        | 2022 | 6     |
I would like to craft a query that counts the distinct values of the categories per id, cumulatively per month and year, but retaining the original columns:
| id | category | year | month | sumC |
|----|----------|------|-------|------|
| 1  | a        | 2021 | 6     | 1    |
| 1  | b        | 2022 | 8     | 2    |
| 1  | a        | 2022 | 11    | 2    |
| 2  | a        | 2022 | 1     | 1    |
| 2  | a        | 2022 | 4     | 1    |
| 2  | b        | 2022 | 6     | 2    |
I've tried doing the following query with no success:
SELECT id,
category,
year,
month,
COUNT(category) OVER (PARTITION BY id ORDER BY year, month) AS sumC FROM base;
This results in 1, 2, 3, 1, 2, 3, which is not what I'm looking for. What I'd need is something like a COUNT(DISTINCT) inside a window function, but that isn't supported as a construct.
I also tried the DENSE_RANK trick:
DENSE_RANK() OVER (PARTITION BY id ORDER BY category)
+ DENSE_RANK() OVER (PARTITION BY id ORDER BY category DESC)
- 1 as sumC
Though, because there is no ordering between year and month, it just results in 2, 2, 2, 2, 2, 2.
Any help is appreciated!
One option is:
1. creating a new column that flags when each "category" is seen for the first time (partitioning by "id", "category" and ordering by "year", "month");
2. computing a running sum of that flag, partitioned by "id" alone and ordered the same way.
WITH cte AS (
SELECT *,
CASE WHEN ROW_NUMBER() OVER(
PARTITION BY id, category
ORDER BY year, month) = 1
THEN 1
ELSE 0
END AS rn1
FROM base
)
SELECT id,
category,
year,
month,
SUM(rn1) OVER(
PARTITION BY id
ORDER BY year, month
) AS sumC
FROM cte
ORDER BY id, year, month
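If you want to test this without creating the table, a self-contained version with inline VALUES (sample data copied from the question) should run on Athena/Presto:
WITH base AS (
SELECT * FROM (
VALUES
(1, 'a', 2021, 6),
(1, 'b', 2022, 8),
(1, 'a', 2022, 11),
(2, 'a', 2022, 1),
(2, 'a', 2022, 4),
(2, 'b', 2022, 6)
) AS t (id, category, year, month)
),
cte AS (
SELECT *,
CASE WHEN ROW_NUMBER() OVER(
PARTITION BY id, category
ORDER BY year, month) = 1
THEN 1
ELSE 0
END AS rn1
FROM base
)
SELECT id, category, year, month,
SUM(rn1) OVER(
PARTITION BY id
ORDER BY year, month
) AS sumC
FROM cte
ORDER BY id, year, month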

sum values based on 7-day cycle in SQL Oracle

I have dates and some values, and I would like to sum the values within a 7-day cycle starting from the first date.
date value
01-01-2021 1
02-01-2021 1
05-01-2021 1
07-01-2021 1
10-01-2021 1
12-01-2021 1
13-01-2021 1
16-01-2021 1
18-01-2021 1
22-01-2021 1
23-01-2021 1
30-01-2021 1
This is my input data; it splits into 4 groups under the 7-day cycle.
A cycle should start with the first date and sum all values within the 7 days after that first date, the first date included.
Then a new group starts with the next date plus another 7 days (10-01 till 17-01), then again a new group from 18-01 till 25-01, and so on.
so the output will be
group1 4
group2 4
group3 3
group4 1
With MATCH_RECOGNIZE this would be easy (current_day < first_day + 7 as a condition for the pattern), but please don't use the MATCH_RECOGNIZE clause as the solution!
One approach is a recursive CTE:
with tt as (
select dte, value, row_number() over (order by dte) as seqnum
from t
),
cte (dte, value, seqnum, firstdte) as (
select tt.dte, tt.value, tt.seqnum, tt.dte
from tt
where seqnum = 1
union all
select tt.dte, tt.value, tt.seqnum,
(case when tt.dte < cte.firstdte + interval '7' day then cte.firstdte else tt.dte end)
from cte
join tt on tt.seqnum = cte.seqnum + 1
)
select firstdte, sum(value)
from cte
group by firstdte
order by firstdte;
This identifies the groups by the first date. You can use row_number() over (order by firstdte) if you want a number.
Here is a db<>fiddle.
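For reference only (the question explicitly rules it out), the MATCH_RECOGNIZE version would look roughly like this; a sketch assuming the same t(dte, value) columns as in the answer above, not a tested solution:
select firstdte, value_sum
from t
match_recognize (
  order by dte
  measures first(dte) as firstdte, sum(value) as value_sum
  one row per match
  pattern (a b*)
  define b as dte < a.dte + 7  -- row falls within 7 days of the group's first date
);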

SQL: How to create a weekly user count summary by month

I’m trying to create a week-over-week active user count summary report/table aggregated by month. I have one table for June 2017 and one table for May 2017 which I need to join together to compare. The date field is created_utc, a UNIX timestamp, which I can transform into a human-readable format and from there extract the week-of-year value (1 through 52). The questions I have are:
1. Numbering the weeks just by values of 1 through 4, so week 1 for June, week 1 for May, week 2 for June, week 2 for May, and so on.
2. Joining the tables based on those week 1 through 4 values.
3. Pivoting the table and adding a WOW_Change variable.
I'd like the final table to look like this:
| Week | June_count | May_count |WOW_Change |
|:-----------|:-----------:|:------------:|:----------:|
| Week_1 | 5 | 8 | 0.6 |
| Week_2 | 2 | 1 | -0.5 |
| Week_3 | 10 | 5 | -0.5 |
| Week_4 | 30 | 6 | 1 |
Below is some sample data as well as the code I've started.
CREATE TABLE June
(created_utc int, userid varchar(6))
;
INSERT INTO June
(created_utc, userid)
VALUES
(1496354167, '6eq4xf'),
(1496362973, '6eqzz3'),
(1496431934, '6ewlm8'),
(1496870877, '6fwied'),
(1496778080, '6fo79k'),
(1496933893, '6g1gcg'),
(1497154559, '6gjkid'),
(1497618561, '6hmeud'),
(1497377349, '6h1osm'),
(1497221017, '6god73'),
(1497731470, '6hvmic'),
(1497273130, '6gs4ay'),
(1498080798, '6ioz8q'),
(1497769316, '6hyer4'),
(1497415729, '6h5cgu'),
(1497978764, '6iffwq')
;
CREATE TABLE May
(created_utc int, userid varchar(6))
;
INSERT INTO May
(created_utc, userid)
VALUES
(1493729491, '68sx7k'),
(1493646801, '68m2s2'),
(1493747285, '68uohf'),
(1493664087, '68ntss'),
(1493690759, '68qe5k'),
(1493829196, '691fy9'),
(1493646344, '68m1dv'),
(1494166859, '69rhkl'),
(1493883023, '6963qb'),
(1494362328, '6a83wv'),
(1494525998, '6alv6c'),
(1493945230, '69bkhb'),
(1494050355, '69jqtz'),
(1494418011, '6accd0'),
(1494425781, '6ad0xm'),
(1494024697, '69hx2z'),
(1494586576, '6aql9y')
;
#standardSQL
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM June
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM May
Below is for BigQuery Standard SQL
#standardSQL
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
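The week bucket here comes from integer division on the day of the month. A quick check of the mapping (illustration only):
SELECT d, DIV(d - 1, 7) + 1 AS week
FROM UNNEST(GENERATE_ARRAY(1, 31)) AS d;
-- days 1-7 -> week 1, 8-14 -> week 2, 15-21 -> week 3, 22-28 -> week 4, 29-31 -> week 5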
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.June` AS (
SELECT 1496354167 created_utc, '6eq4xf' userid UNION ALL
SELECT 1496362973, '6eqzz3' UNION ALL
SELECT 1496431934, '6ewlm8' UNION ALL
SELECT 1496870877, '6fwied' UNION ALL
SELECT 1496778080, '6fo79k' UNION ALL
SELECT 1496933893, '6g1gcg' UNION ALL
SELECT 1497154559, '6gjkid' UNION ALL
SELECT 1497618561, '6hmeud' UNION ALL
SELECT 1497377349, '6h1osm' UNION ALL
SELECT 1497221017, '6god73' UNION ALL
SELECT 1497731470, '6hvmic' UNION ALL
SELECT 1497273130, '6gs4ay' UNION ALL
SELECT 1498080798, '6ioz8q' UNION ALL
SELECT 1497769316, '6hyer4' UNION ALL
SELECT 1497415729, '6h5cgu' UNION ALL
SELECT 1497978764, '6iffwq'
), `project.dataset.May` AS (
SELECT 1493729491 created_utc, '68sx7k' userid UNION ALL
SELECT 1493646801, '68m2s2' UNION ALL
SELECT 1493747285, '68uohf' UNION ALL
SELECT 1493664087, '68ntss' UNION ALL
SELECT 1493690759, '68qe5k' UNION ALL
SELECT 1493829196, '691fy9' UNION ALL
SELECT 1493646344, '68m1dv' UNION ALL
SELECT 1494166859, '69rhkl' UNION ALL
SELECT 1493883023, '6963qb' UNION ALL
SELECT 1494362328, '6a83wv' UNION ALL
SELECT 1494525998, '6alv6c' UNION ALL
SELECT 1493945230, '69bkhb' UNION ALL
SELECT 1494050355, '69jqtz' UNION ALL
SELECT 1494418011, '6accd0' UNION ALL
SELECT 1494425781, '6ad0xm' UNION ALL
SELECT 1494024697, '69hx2z' UNION ALL
SELECT 1494586576, '6aql9y'
)
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
-- ORDER BY week
with this result (the sample data covers only the first two weeks, so the result shows only two weeks as well; that should not be an issue when you apply it to real data):
Row Week June_count May_count WOW_Change
1 Week_1 5 12 1.4
2 Week_2 6 5 -0.17
Use arithmetic on the day of the month to get the week:
SELECT j.week_number, j.user_count as june_user_count,
m.user_count as may_user_count
FROM (SELECT FLOOR((EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1) / 7) as week_number,
COUNT(distinct userid) as user_count
FROM June
GROUP BY week_number
) j JOIN
(SELECT FLOOR((EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1) / 7) as week_number,
COUNT(distinct userid) as user_count
FROM May
GROUP BY week_number
) m
ON m.week_number = j.week_number;
Note that splitting data into different tables just based on the date is a bad idea. The data should all go into one table, perhaps partitioned if data volume is an issue.
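A minimal sketch of that recommendation as BigQuery DDL (the table name `project.dataset.events` and the event_date column are assumptions for illustration, not from the original answer):
CREATE TABLE `project.dataset.events`
(
  created_utc INT64,
  userid STRING,
  event_date DATE  -- e.g. DATE(TIMESTAMP_SECONDS(created_utc)), materialized for partitioning
)
PARTITION BY event_date;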

How to use lead lag function in Oracle

I have written a query that produces the result set below.
Note: I have months starting from jan-2016 to jan-2018.
There are two types, either 'hist' or 'future'.
Resultant dataset:
In this example, consider the combination of id1+id2+id3 to be 1, 2, 3.
type month id1 id2 id3 value
hist jan-17 1 2 3 10
hist feb-17 1 2 3 20
future jan-17 1 2 3 15
future feb-17 1 2 3 1
hist mar-17 1 2 3 2
future apr-17 1 2 3 5
My calculation logic depends on the month's position within its quarter.
For example, for January (the first month of a quarter) I want the value to be: future of Jan + future of Feb + future of Mar.
So for jan-17 the output should be: 15 + 1 + 0 (for March there is no corresponding future value).
For February (the 2nd month of a quarter), the value should be: hist of Jan + future of Feb + future of Mar, i.e. 10 + 1 + 0 (the future of March is not available).
Similarly for March, the value should be: hist of Jan + hist of Feb + future of Mar, i.e. 10 + 20 + 0 (the future of March is not present).
And similarly for April, May, June (depending on the month's position in the quarter).
I am aware of the LEAD and LAG functions, but I am not able to apply them here.
Can someone please help?
I would not mess with LAG; this can all be done with a GROUP BY if you convert your dates to quarters:
WITH
dset
AS
(SELECT DATE '2017-01-17' month, 5 VALUE
FROM DUAL
UNION ALL
SELECT DATE '2017-02-17' month, 6 VALUE
FROM DUAL
UNION ALL
SELECT DATE '2017-03-25' month, 7 VALUE
FROM DUAL
UNION ALL
SELECT DATE '2017-05-25' month, 4 VALUE
FROM DUAL)
SELECT SUM (VALUE) value_sum, TO_CHAR (month, 'q') quarter, TO_CHAR (month, 'YYYY') year
FROM dset
GROUP BY TO_CHAR (month, 'q'), TO_CHAR (month, 'YYYY');
This results in:
VALUE_SUM QUARTER YEAR
18 1 2017
4 2 2017
We can use an analytic function if you need the result on each record:
SELECT SUM (VALUE) OVER (PARTITION BY TO_CHAR (month, 'q'), TO_CHAR (month, 'YYYY')) quarter_sum, month, VALUE
FROM dset
This results in:
QUARTER_SUM MONTH VALUE
18 1/17/2017 5
18 2/17/2017 6
18 3/25/2017 7
4 5/25/2017 4
Make certain you include the year; you don't want to combine quarters from different years.
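A minimal illustration of why the year matters (hypothetical rows; without the year in the GROUP BY, Q1 of 2017 and Q1 of 2018 collapse into one group):
SELECT TO_CHAR (month, 'q') quarter, SUM (value) value_sum
FROM (SELECT DATE '2017-01-17' month, 5 value FROM dual
      UNION ALL
      SELECT DATE '2018-01-17', 6 FROM dual)
GROUP BY TO_CHAR (month, 'q');
-- one row: quarter 1, value_sum 11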
As said in one of the comments, the trick lies in another question of yours and the corresponding answer. It goes somewhat like this:
with
x as
(select 'hist' type, To_Date('JAN-2017','MON-YYYY') ym , 10 value from dual union all
select 'future' type, To_Date('JAN-2017','MON-YYYY'), 15 value from dual union all
select 'future' type, To_Date('FEB-2017','MON-YYYY'), 1 value from dual),
y as
(select * from x Pivot(Sum(Value) For Type in ('hist' as h,'future' as f))),
/* Pivot for easy lag,lead query instead of working with rows..*/
z as
(
select ym,sum(h) H,sum(f) F from (
Select y.ym,y.H,y.F from y
union all
select add_months(to_Date('01-JAN-2017','DD-MON-YYYY'),rownum-1) ym, 0 H, 0 F
from dual connect by rownum <=3 /* depends on how many months you are querying...
so this dual adds the corresponding missing 0 records...*/
) group by ym
)
select
ym,
Case
When MOD(Extract(Month from YM),3) = 1
Then F + Lead(F,1) Over(Order by ym) + Lead(F,2) Over(Order by ym)
When MOD(Extract(Month from YM),3) = 2
Then Lag(H,1) Over(Order by ym) + F + Lead(F,1) Over(Order by ym)
When MOD(Extract(Month from YM),3) = 0 /* third month of the quarter: MOD gives 0, not 3 */
Then Lag(H,2) Over(Order by ym) + Lag(H,1) Over(Order by ym) + F
End Required_Value
from z
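Since the CASE branches hinge on MOD arithmetic, a quick sanity check of the month-to-quarter-position mapping may help (illustration only); note the third month of each quarter yields 0, which is why the last branch tests for 0:
-- month:          1  2  3  4  5  6
-- MOD(month, 3):  1  2  0  1  2  0
SELECT LEVEL AS mon, MOD(LEVEL, 3) AS pos_in_quarter
FROM dual
CONNECT BY LEVEL <= 6;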

SQL - Count number of changes in an ordered list

Say I've got a table with two columns (date and price). If I select over a range of dates, then is there a way to count the number of price changes over time?
For instance:
Date | Price
22-Oct-11 | 3.20
23-Oct-11 | 3.40
24-Oct-11 | 3.40
25-Oct-11 | 3.50
26-Oct-11 | 3.40
27-Oct-11 | 3.20
28-Oct-11 | 3.20
In this case, I would like it to return a count of 4 price changes.
Thanks in advance.
You can use the analytic functions LEAD and LAG to access the prior and next rows of a result set, and then use that to see whether there are changes.
with t as (
  select date '2011-10-22' dt, 3.2 price from dual union all
  select date '2011-10-23', 3.4 from dual union all
  select date '2011-10-24', 3.4 from dual union all
  select date '2011-10-25', 3.5 from dual union all
  select date '2011-10-26', 3.4 from dual union all
  select date '2011-10-27', 3.2 from dual union all
  select date '2011-10-28', 3.2 from dual
)
select sum(is_change)
from (select dt,
             price,
             lag(price) over (order by dt) prior_price,
             (case when lag(price) over (order by dt) != price
                   then 1
                   else 0
              end) is_change
      from t);

SUM(IS_CHANGE)
--------------
             4
Try this
select count(*)
from
(select date, price
 from table
 where date between X and Y
 group by date, price)
Depending on the Oracle version, use either analytic functions (see the answer from Justin Cave) or this:
SELECT
SUM (CASE WHEN PREVPRICE != PRICE THEN 1 ELSE 0 END) CNTCHANGES
FROM
(
SELECT
C.DATE,
C.PRICE,
MAX ( D.PRICE ) PREVPRICE
FROM
(
SELECT
A.Date,
A.Price,
(SELECT MAX (B.DATE) FROM MyTable B WHERE B.DATE < A.DATE) PrevDate
FROM MyTable A
WHERE A.DATE BETWEEN YourStartDate AND YourEndDate
) C
INNER JOIN MyTable D ON D.DATE = C.PREVDATE
GROUP BY C.DATE, C.PRICE
)