How to find the highest sales in each year in BigQuery? - google-bigquery

The following table contains the phone name, number of items sold, month, and year.
with table1 as(
select "iphone" as phone,3 as sold_out,"Jan" as month,2015 as year union all
select "iphone",10,"Feb",2015 union all
select "samsung",4,"March",2015 union all
select "Lava",14,"June",2016 union all
select "Lenova",8,"July",2016 union all
select "Lenova",10,"Sep",2016 union all
select "Motorola",8,"Jan",2017 union all
select "Nokia",7,"Jan",2017 union all
select "Nokia",3,"Feb",2017
)
and I would like to get a result like this:
-----------------------------
year Phone sales
-----------------------------
2015 iphone 13
2016 lenova 18
2017 Nokia 10
-----------------------------
I haven't tried anything because honestly I don't know where to start.

Below is for BigQuery Standard SQL
#standardSQL
SELECT
year,
ARRAY_AGG(STRUCT(phone, sales) ORDER BY sales DESC LIMIT 1)[OFFSET(0)].*
FROM (
SELECT year, phone, SUM(sold_out) sales
FROM `project.dataset.table1`
GROUP BY year, phone
)
GROUP BY year
You can test or play with the above using dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table1` AS(
SELECT "iphone" AS phone,3 AS sold_out,"Jan" AS month,2015 AS year UNION ALL
SELECT "iphone",10,"Feb",2015 UNION ALL
SELECT "samsung",4,"March",2015 UNION ALL
SELECT "Lava",14,"June",2016 UNION ALL
SELECT "Lenova",8,"July",2016 UNION ALL
SELECT "Lenova",10,"Sep",2016 UNION ALL
SELECT "Motorola",8,"Jan",2017 UNION ALL
SELECT "Nokia",7,"Jan",2017 UNION ALL
SELECT "Nokia",3,"Feb",2017
)
SELECT
year,
ARRAY_AGG(STRUCT(phone, sales) ORDER BY sales DESC LIMIT 1)[OFFSET(0)].*
FROM (
SELECT year, phone, SUM(sold_out) sales
FROM `project.dataset.table1`
GROUP BY year, phone
)
GROUP BY year
ORDER BY year
with result
Row year phone sales
1 2015 iphone 13
2 2016 Lenova 18
3 2017 Nokia 10
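The same "aggregate, then keep the top row per group" idea can be checked outside BigQuery. Below is a minimal sketch in Python with the standard-library sqlite3 module, assuming a SQLite build with window-function support (3.25 or newer). ROW_NUMBER() stands in here for the ARRAY_AGG ... LIMIT 1 trick; the table and column names follow the question.

```python
import sqlite3

# Sample rows from the question: (phone, sold_out, month, year)
rows = [
    ("iphone", 3, "Jan", 2015), ("iphone", 10, "Feb", 2015),
    ("samsung", 4, "March", 2015), ("Lava", 14, "June", 2016),
    ("Lenova", 8, "July", 2016), ("Lenova", 10, "Sep", 2016),
    ("Motorola", 8, "Jan", 2017), ("Nokia", 7, "Jan", 2017),
    ("Nokia", 3, "Feb", 2017),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (phone TEXT, sold_out INTEGER, month TEXT, year INTEGER)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)

# Step 1 (innermost): total sales per year and phone.
# Step 2: number the phones within each year, best seller first.
# Step 3: keep only the first phone of each year.
query = """
SELECT year, phone, sales
FROM (
    SELECT year, phone, sales,
           ROW_NUMBER() OVER (PARTITION BY year ORDER BY sales DESC) AS rn
    FROM (
        SELECT year, phone, SUM(sold_out) AS sales
        FROM table1
        GROUP BY year, phone
    )
)
WHERE rn = 1
ORDER BY year
"""
top_per_year = conn.execute(query).fetchall()
for row in top_per_year:
    print(row)
```

Like LIMIT 1 in the ARRAY_AGG version, ROW_NUMBER() keeps a single phone per year even when two phones tie on sales.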

SELECT year AS year, phone AS Phone, sum(sold_out) AS sales
FROM table1
GROUP BY year, Phone
HAVING COUNT(Phone)=2
ORDER BY year ASC
;
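This gives the desired output for the sample data in Standard SQL, but only because each top-selling phone happens to have exactly two rows in its year; HAVING COUNT(Phone)=2 filters on the number of rows, not on sales, so it does not generalize.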
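A quick way to see how the HAVING COUNT(Phone)=2 filter behaves is to run it on slightly different data. The sketch below uses Python's sqlite3 module and adds one hypothetical row (a second samsung sale in 2015) that is not in the question; everything else follows the sample data.

```python
import sqlite3

# Sample rows from the question: (phone, sold_out, month, year)
rows = [
    ("iphone", 3, "Jan", 2015), ("iphone", 10, "Feb", 2015),
    ("samsung", 4, "March", 2015), ("Lava", 14, "June", 2016),
    ("Lenova", 8, "July", 2016), ("Lenova", 10, "Sep", 2016),
    ("Motorola", 8, "Jan", 2017), ("Nokia", 7, "Jan", 2017),
    ("Nokia", 3, "Feb", 2017),
    ("samsung", 1, "Apr", 2015),  # hypothetical extra row, not in the question
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (phone TEXT, sold_out INTEGER, month TEXT, year INTEGER)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)

# The filter keeps every (year, phone) group with exactly two rows,
# regardless of how much was sold.
query = """
SELECT year, phone, SUM(sold_out) AS sales
FROM table1
GROUP BY year, phone
HAVING COUNT(phone) = 2
ORDER BY year, phone
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With the extra row, 2015 comes back twice (iphone and samsung), because both phones now have two rows in that year.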

Related

How can i do a rolling 12 month sum when some year month values are missing?

I am calculating rolling sum as such:
select
city,
month_year,
person,
sum(total) over (partition by person,city order by month_year rows between 11 preceding and current row) rolling_one_year
from
(select
city,
month_year,
person,
sum(amount_dollar) as total
from db1 d
group by 1,2,3) ;
However, not every person has a row for every month_year. With consecutive months, a frame of 11 preceding rows plus the current row covers exactly 12 months.
But what if a month is missing for a person, e.g. 202208? With the logic above, the frame ending at 202301 would cover 202201 - 202301, which is 13 months.
How can I adapt my code above to ensure that the range of months selected stays within 1 year?
A possible solution is to LEFT JOIN your data to the calendar table.
Here is a guide on how to create the calendar table if you don't have one.
Create a date table in hive
You should use a logical window frame, RANGE, instead of ROWS. Consider the query below.
WITH monthly_total AS (
SELECT '201911' year_month, 4 total UNION ALL
SELECT '201912' year_month, 10 total UNION ALL
SELECT '202201' year_month, 1 total UNION ALL
SELECT '202202' year_month, 3 total UNION ALL
SELECT '202203' year_month, 9 total UNION ALL
SELECT '202204' year_month, 4 total UNION ALL
SELECT '202205' year_month, 2 total UNION ALL
SELECT '202206' year_month, 8 total UNION ALL
SELECT '202207' year_month, 6 total UNION ALL
SELECT '202209' year_month, 3 total UNION ALL
SELECT '202210' year_month, 10 total UNION ALL
SELECT '202211' year_month, 1 total UNION ALL
SELECT '202212' year_month, 3 total UNION ALL
SELECT '202301' year_month, 50 total
)
SELECT *, SUM(total) OVER w AS rolling_12m_sum
FROM monthly_total
WINDOW w AS (
ORDER BY CAST(SUBSTR(year_month, 1, 4) AS INTEGER) * 12 + CAST(SUBSTR(year_month, 5, 2) AS INTEGER)
RANGE BETWEEN 11 PRECEDING AND CURRENT ROW
) ORDER BY year_month;
I've ignored partition by person,city for simplicity.
Below would be helpful in case you're not familiar with RANGE
https://learnsql.com/blog/difference-between-rows-range-window-functions/
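To make the ROWS versus RANGE difference concrete, here is a small pure-Python sketch that mimics both frames on the sample data from the query above. The function names are made up for illustration; month_index mirrors the year * 12 + month expression used in the ORDER BY.

```python
# (year_month, total) rows from the sample data, sorted by month; 202208 is missing
monthly_total = [
    ("201911", 4), ("201912", 10), ("202201", 1), ("202202", 3),
    ("202203", 9), ("202204", 4), ("202205", 2), ("202206", 8),
    ("202207", 6), ("202209", 3), ("202210", 10), ("202211", 1),
    ("202212", 3), ("202301", 50),
]


def month_index(year_month):
    """Map 'YYYYMM' to a running month number (year * 12 + month)."""
    return int(year_month[:4]) * 12 + int(year_month[4:6])


def rows_frame_sum(data, i, size=12):
    """Mimic ROWS BETWEEN size-1 PRECEDING AND CURRENT ROW: counts physical rows."""
    return sum(total for _, total in data[max(0, i - size + 1): i + 1])


def range_frame_sum(data, i, size=12):
    """Mimic RANGE BETWEEN size-1 PRECEDING AND CURRENT ROW: compares key values."""
    current = month_index(data[i][0])
    return sum(
        total for ym, total in data
        if current - (size - 1) <= month_index(ym) <= current
    )


last = len(monthly_total) - 1  # the 202301 row
rows_sum = rows_frame_sum(monthly_total, last)    # reaches back to 202201 (13 months)
range_sum = range_frame_sum(monthly_total, last)  # reaches back to 202202 (12 months)
print(rows_sum, range_sum)
```

For 202301 the row-based frame picks up 202201 because 202208 is absent, while the value-based frame stops at 202202.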

Selecting min count('x') by year

I'm trying to create a table that displays the song(s) with the minimum number of plays by year.
For instance if Song1 and Song2 both only had 1 play and Song3 had 2 plays in 2018 and Song1 had 1 play in 2017 and Song2 had 2 plays in 2017, I want a table that would return 3 rows:
Song1 - 2018 - 1 play
Song2 - 2018 - 1 play
Song1 - 2017 - 1 play
Is there a way to display the songs where min(count('x')) = count('x').
I'm sure that isn't the proper syntax but it's essentially what I'm trying to find.
SELECT * FROM music
NATURAL JOIN (SELECT extract(year from date) AS yr, song_code, COUNT('x')
FROM singles NATURAL JOIN plays
GROUP BY extract(year from date), song_code
ORDER BY yr desc, COUNT('x') desc);
Currently I have the songs grouped by number of plays a year, but I'm not sure how to only show those that have played the minimum amount of times.
-- You can prefer using analytic functions such as dense_rank() rather than joins or in-subqueries.
with songs( id, year, play_id ) as
(
select 1, 2018, 1 from dual union all
select 2, 2018, 1 from dual union all
select 3, 2018, 1 from dual union all
select 3, 2018, 2 from dual union all
select 1, 2017, 1 from dual union all
select 2, 2017, 1 from dual union all
select 2, 2017, 2 from dual
)
select id, year, play_cnt
from
(select s.*, dense_rank() over (partition by year order by play_cnt) dr
from
(select id, year, count(play_id) as play_cnt
from songs s
group by id, year
) s
)
where dr = 1;
ID YEAR PLAY_CNT
---------- ---------- ----------
1 2017 1
2 2018 1
1 2018 1
This should work. It is also a good example of why a WITH clause (subquery factoring) is preferable to an inline view: create a song_count factored subquery with year, song_code and count, then reference it twice.
Also, I would explicitly define the joins for clarity
WITH song_count as (
SELECT extract(year from date) AS song_year, song_code, count(*) as play_count
FROM singles NATURAL JOIN plays
group by extract(year from date),song_code
)
select * from song_count
where
(song_year,play_count) in (select song_year,min(play_count) from song_count group by song_year)
You can try with row_number():
SELECT *
FROM music
NATURAL JOIN
(select yr, song_code, play_count
from
(SELECT extract(year from date) AS yr, song_code, COUNT('x') play_count,
-- partition by year only; use rank() instead of row_number() to keep ties
row_number() over (partition by extract(year from date) order by COUNT('x')) rn
FROM singles NATURAL JOIN plays
GROUP BY extract(year from date), song_code
)
where rn = 1);
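The answers above differ in how they treat ties, which matters because the question asks for every song that shares the minimum. The sketch below runs both ranking functions on the sample rows from the dense_rank() answer, using Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions). The helper name least_played is made up for illustration.

```python
import sqlite3

# (id, year, play_id) rows from the dense_rank() answer
plays = [
    (1, 2018, 1), (2, 2018, 1), (3, 2018, 1), (3, 2018, 2),
    (1, 2017, 1), (2, 2017, 1), (2, 2017, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (id INTEGER, year INTEGER, play_id INTEGER)")
conn.executemany("INSERT INTO songs VALUES (?, ?, ?)", plays)


def least_played(ranking_function):
    """Return the songs ranked first per year by play count."""
    query = f"""
    SELECT id, year, play_cnt
    FROM (
        SELECT id, year, play_cnt,
               {ranking_function}() OVER (PARTITION BY year ORDER BY play_cnt) AS rnk
        FROM (
            SELECT id, year, COUNT(play_id) AS play_cnt
            FROM songs
            GROUP BY id, year
        )
    )
    WHERE rnk = 1
    ORDER BY year, id
    """
    return conn.execute(query).fetchall()


with_ties = least_played("rank")           # keeps every song tied for the minimum
one_per_year = least_played("row_number")  # keeps exactly one song per year
print(with_ties)
print(one_per_year)
```

rank() and dense_rank() give the same rows for rank 1; row_number() returns a single, arbitrarily chosen song when several tie.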

SQL: How to create a weekly user count summary by month

I'm trying to create a week-over-week active user count summary report/table aggregated by month. I have one table for June 2017 and one table for May 2016, which I need to join together to build it. The timestamp column is created_utc, a UNIX timestamp, which I can convert into a human-readable format and from there extract the week of the year (1 through 52). The questions I have are:
Number the weeks just by values of 1 through 4. So, week 1 for June, Week 1 for May, Week 2 for June week 2 for May and so on.
Joining the tables based by those weeks 1 through 4 values
Pivoting the table and adding a WOW Change variable.
I'd like the final table to look like this:
| Week | June_count | May_count |WOW_Change |
|:-----------|:-----------:|:------------:|:----------:|
| Week_1 | 5 | 8 | 0.6 |
| Week_2 | 2 | 1 | -0.5 |
| Week_3 | 10 | 5 | -0.5 |
| Week_4 | 30 | 6 | 1 |
Below is some sample data as well as the code I've started.
CREATE TABLE June
(created_utc int, userid varchar(6))
;
INSERT INTO June
(created_utc, userid)
VALUES
(1496354167, '6eq4xf'),
(1496362973, '6eqzz3'),
(1496431934, '6ewlm8'),
(1496870877, '6fwied'),
(1496778080, '6fo79k'),
(1496933893, '6g1gcg'),
(1497154559, '6gjkid'),
(1497618561, '6hmeud'),
(1497377349, '6h1osm'),
(1497221017, '6god73'),
(1497731470, '6hvmic'),
(1497273130, '6gs4ay'),
(1498080798, '6ioz8q'),
(1497769316, '6hyer4'),
(1497415729, '6h5cgu'),
(1497978764, '6iffwq')
;
CREATE TABLE May
(created_utc int, userid varchar(6))
;
INSERT INTO May
(created_utc, userid)
VALUES
(1493729491, '68sx7k'),
(1493646801, '68m2s2'),
(1493747285, '68uohf'),
(1493664087, '68ntss'),
(1493690759, '68qe5k'),
(1493829196, '691fy9'),
(1493646344, '68m1dv'),
(1494166859, '69rhkl'),
(1493883023, '6963qb'),
(1494362328, '6a83wv'),
(1494525998, '6alv6c'),
(1493945230, '69bkhb'),
(1494050355, '69jqtz'),
(1494418011, '6accd0'),
(1494425781, '6ad0xm'),
(1494024697, '69hx2z'),
(1494586576, '6aql9y')
;
#standardSQL
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM June
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM May
Below is for BigQuery Standard SQL
#standardSQL
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.June` AS (
SELECT 1496354167 created_utc, '6eq4xf' userid UNION ALL
SELECT 1496362973, '6eqzz3' UNION ALL
SELECT 1496431934, '6ewlm8' UNION ALL
SELECT 1496870877, '6fwied' UNION ALL
SELECT 1496778080, '6fo79k' UNION ALL
SELECT 1496933893, '6g1gcg' UNION ALL
SELECT 1497154559, '6gjkid' UNION ALL
SELECT 1497618561, '6hmeud' UNION ALL
SELECT 1497377349, '6h1osm' UNION ALL
SELECT 1497221017, '6god73' UNION ALL
SELECT 1497731470, '6hvmic' UNION ALL
SELECT 1497273130, '6gs4ay' UNION ALL
SELECT 1498080798, '6ioz8q' UNION ALL
SELECT 1497769316, '6hyer4' UNION ALL
SELECT 1497415729, '6h5cgu' UNION ALL
SELECT 1497978764, '6iffwq'
), `project.dataset.May` AS (
SELECT 1493729491 created_utc, '68sx7k' userid UNION ALL
SELECT 1493646801, '68m2s2' UNION ALL
SELECT 1493747285, '68uohf' UNION ALL
SELECT 1493664087, '68ntss' UNION ALL
SELECT 1493690759, '68qe5k' UNION ALL
SELECT 1493829196, '691fy9' UNION ALL
SELECT 1493646344, '68m1dv' UNION ALL
SELECT 1494166859, '69rhkl' UNION ALL
SELECT 1493883023, '6963qb' UNION ALL
SELECT 1494362328, '6a83wv' UNION ALL
SELECT 1494525998, '6alv6c' UNION ALL
SELECT 1493945230, '69bkhb' UNION ALL
SELECT 1494050355, '69jqtz' UNION ALL
SELECT 1494418011, '6accd0' UNION ALL
SELECT 1494425781, '6ad0xm' UNION ALL
SELECT 1494024697, '69hx2z' UNION ALL
SELECT 1494586576, '6aql9y'
)
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
-- ORDER BY week
with result (the sample data has rows for both months only in the first two weeks, so the result shows just two weeks; this should not be an issue when you apply it to real data)
Row Week June_count May_count WOW_Change
1 Week_1 5 12 1.4
2 Week_2 6 5 -0.17
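The week-of-month arithmetic, DIV(day - 1, 7) + 1, can be traced step by step in plain Python. This is only a sketch of the same logic on the question's sample data; the function names are made up, and timestamps are read as UTC, which is what DATE(TIMESTAMP_SECONDS(...)) does by default.

```python
from datetime import datetime, timezone

# (created_utc, userid) rows from the question
june = [
    (1496354167, "6eq4xf"), (1496362973, "6eqzz3"), (1496431934, "6ewlm8"),
    (1496870877, "6fwied"), (1496778080, "6fo79k"), (1496933893, "6g1gcg"),
    (1497154559, "6gjkid"), (1497618561, "6hmeud"), (1497377349, "6h1osm"),
    (1497221017, "6god73"), (1497731470, "6hvmic"), (1497273130, "6gs4ay"),
    (1498080798, "6ioz8q"), (1497769316, "6hyer4"), (1497415729, "6h5cgu"),
    (1497978764, "6iffwq"),
]
may = [
    (1493729491, "68sx7k"), (1493646801, "68m2s2"), (1493747285, "68uohf"),
    (1493664087, "68ntss"), (1493690759, "68qe5k"), (1493829196, "691fy9"),
    (1493646344, "68m1dv"), (1494166859, "69rhkl"), (1493883023, "6963qb"),
    (1494362328, "6a83wv"), (1494525998, "6alv6c"), (1493945230, "69bkhb"),
    (1494050355, "69jqtz"), (1494418011, "6accd0"), (1494425781, "6ad0xm"),
    (1494024697, "69hx2z"), (1494586576, "6aql9y"),
]


def week_of_month(created_utc):
    """Days 1-7 map to week 1, days 8-14 to week 2, and so on."""
    day = datetime.fromtimestamp(created_utc, tz=timezone.utc).day
    return (day - 1) // 7 + 1


def users_per_week(events):
    """Count distinct user ids per week of the month."""
    users = {}
    for created_utc, userid in events:
        users.setdefault(week_of_month(created_utc), set()).add(userid)
    return {week: len(ids) for week, ids in users.items()}


june_counts = users_per_week(june)
may_counts = users_per_week(may)

# Inner join on week number: only weeks present in both months are reported.
report = [
    (f"Week_{week}", june_counts[week], may_counts[week],
     round((may_counts[week] - june_counts[week]) / june_counts[week], 2))
    for week in sorted(june_counts.keys() & may_counts.keys())
]
for line in report:
    print(line)
```

A LEFT JOIN (or FULL JOIN) instead of the inner join would also keep weeks that exist in only one of the two months.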
Use arithmetic on the day of the month to get the week:
SELECT j.week_number, j.user_count as june_user_count,
m.user_count as may_user_count
-- DIV gives integer division; "/" would return FLOAT64 in BigQuery
FROM (SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) as week_number,
COUNT(distinct userid) as user_count
FROM June
GROUP BY week_number
) j JOIN
(SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) as week_number,
COUNT(distinct userid) as user_count
FROM May
GROUP BY week_number
) m
ON m.week_number = j.week_number;
Note that splitting data into different tables just based on the date is a bad idea. The data should all go into one table, perhaps partitioned if data volume is an issue.

SQL SUMs in where clause with conditionals

I want to get a "totals" report for business XYZ. They want the season, term, distinct count of employees, and the employees' total dropped hours, counted only when an employee's dropped hours are not offset by adds that equal the drops.
I am trying to do something like this:
select year,
season,
(select count(distinct empID)
from tableA
where a.season = season
and a.year = year) "Employees",
(select sum(hours)
from(
select distinct year,season,empID,hours
from tableA
where code like 'Drop%'
)
where a.season = season
and a.year = year) "Dropped"
from tableA a
-- need help below
where (select sum(hours)
from(
select distinct year,season,empID,hours
from tableA
where code like 'Drop%'
)
where a.season = season
and a.year = year
and a.emplID = emplID)
!=
(select sum(hours)
from(
select distinct year,season,empID,hours
from tableA
where code like 'Add%'
)
where a.season = season
and a.year = year
and a.emplID = emplID)
group by year,season
It appears my where clause is not correct. I don't believe I am matching emplID to each emplID correctly to exclude those whose "drops" equal their "adds".
EDIT:
sample data:
year,season,EmplID,hours,code
2015, FALL, 001,10,Drop
2015, FALL, 001,10,Add
2015,FALL,002,5,Drop
2015,FALL,003,10,Drop
The total hours should be 15. EmplID 001 should be removed from the total because his drops are exactly equal to his adds.
I managed to work it out with a bit of analytics .. ;)
with tableA as (
select 2015 year, 1 season, 1234 empID, 2 hours , 'Add' code from dual union all
select 2015 year, 1 season, 1234 empID, 3 hours , 'Add' code from dual union all
select 2015 year, 1 season, 1234 empID, 4 hours , 'Add' code from dual union all
select 2015 year, 1 season, 1234 empID, 2 hours , 'Drop' code from dual union all
select 2015 year, 1 season, 2345 empID, 5 hours , 'Add' code from dual union all
select 2015 year, 1 season, 2345 empID, 3.5 hours, 'Add' code from dual union all
select 2015 year, 2 season, 1234 empID, 7 hours , 'Add' code from dual union all
select 2015 year, 2 season, 1234 empID, 5 hours , 'Add' code from dual union all
select 2015 year, 2 season, 2345 empID, 5 hours , 'Add' code from dual union all
select 2015 year, 2 season, 7890 empID, 3 hours , 'Add' code from dual union all
select 2014 year, 1 season, 1234 empID, 1 hours , 'Add' code from dual union all
select 2014 year, 1 season, 1234 empID, 2 hours , 'Add' code from dual union all
select 2014 year, 1 season, 1234 empID, 4 hours , 'Add' code from dual
),
w_group as (
select year, season, empID, hours, code,
lead(hours) over (partition by year, season, empID, hours
order by case when code like 'Drop%' then 'DROP'
when code like 'Add%' then 'ADD'
else NULL end ) new_hours
from tableA
)
select year, season, count(distinct empID),
sum(hours-nvl(new_hours,0)) total_hours
from w_group
where code like 'Add%'
group by year, season
/
YEAR SEASON COUNT(DISTINCTEMPID) TOTAL_HOURS
---------- ---------- -------------------- -----------
2015 1 2 15.5
2014 1 1 7
2015 2 3 20
(the first part "with tableA" is just faking some data, since you didn't provide any) :)
[edit]
corrected based on your data, and your explanation - in short, you're counting the DROPs, (minus the ADDs), I was doing the reverse
[edit2] replaced the query below with a minor tweak based on comment/feedback (don't count an empID if their DROP-ADD zero out)
with tableA as (
select 2015 year, 'FALL' season, '001' empID, 10 hours, 'Drop' code from dual union all
select 2015 year, 'FALL' season, '001' empID, 10 hours, 'Add' code from dual union all
select 2015 year, 'FALL' season, '002' empID, 5 hours, 'Drop' code from dual union all
select 2015 year, 'FALL' season, '003' empID, 10 hours, 'Drop' code from dual
),
w_group as (
select year, season, empID, hours, code,
lag(hours) over (partition by year, season, empID, hours
order by case when code like 'Drop%' then 'DROP'
when code like 'Add%' then 'ADD'
else NULL end ) new_hours
from tableA
)
select year, season, count(distinct empID),
sum(hours-nvl(new_hours,0)) total_hours
from w_group
where code like 'Drop%'
and hours - nvl(new_hours,0) > 0
group by year, season
/
YEAR SEAS COUNT(DISTINCTEMPID) TOTAL_HOURS
---------- ---- -------------------- -----------
2015 FALL 2 15
[/edit]
I think you can do what you want with just conditional aggregation. Something like this:
select year, season, count(distinct empID) as Employees,
sum(case when code like 'Drop%' then hours end) as Dropped
from tableA
group by year, season;
It is hard to tell exactly what you want, because you do not have sample data and desired results (or better yet, a SQL Fiddle). You might also want a having clause:
having (sum(case when code like 'Drop%' then hours end) <>
sum(case when code like 'Add%' then hours end)
)
Are you wanting the result of something like this?
SELECT
year
,season
,COUNT(DISTINCT empID) AS Employees
,SUM(CASE WHEN code LIKE 'Drop%' THEN hours ELSE 0 END) AS Dropped
FROM
TableA
GROUP BY
year
,season
HAVING
(
SUM(CASE WHEN code LIKE 'Drop%' THEN hours ELSE 0 END)
- SUM(CASE WHEN code LIKE 'Add%' THEN hours ELSE 0 END)
) <> 0
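The HAVING clauses above compare drops and adds for a whole year and season, while the question asks to leave out individual employees whose drops equal their adds. One possible reading is to aggregate per employee first, filter, and then total per season. The sketch below tries that on the question's sample data with Python's sqlite3 module; the aliases dropped and added are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (year, season, empID, hours, code)
rows = [
    (2015, "FALL", "001", 10, "Drop"),
    (2015, "FALL", "001", 10, "Add"),
    (2015, "FALL", "002", 5, "Drop"),
    (2015, "FALL", "003", 10, "Drop"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (year INTEGER, season TEXT, empID TEXT, hours REAL, code TEXT)"
)
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?, ?, ?)", rows)

# Inner query: one row per employee with dropped and added hours side by side.
# Outer query: keep employees who dropped hours and whose drops differ from
# their adds, then total per year and season.
query = """
SELECT year, season, COUNT(*) AS employees, SUM(dropped) AS dropped
FROM (
    SELECT year, season, empID,
           SUM(CASE WHEN code LIKE 'Drop%' THEN hours ELSE 0 END) AS dropped,
           SUM(CASE WHEN code LIKE 'Add%' THEN hours ELSE 0 END) AS added
    FROM tableA
    GROUP BY year, season, empID
)
WHERE dropped > 0 AND dropped <> added
GROUP BY year, season
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Employee 001 is filtered out before the seasonal total is computed, which gives the 15 hours the question expects.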

Moving average of 2 columns

Hello, I have a problem. I know how to calculate a moving average over the last 3 months using Oracle analytic functions, but my situation is a little different.
Month-----ProductType-----Sales----------Average(HAVE TO FIND THIS)
1---------A---------------10
1---------B---------------12
1---------C---------------17
2---------A---------------21
3---------C---------------2
3---------B---------------21
4---------B---------------23
5
6
7
8
9
So we have sales for each month and each product type... I need to calculate the moving average of the last 3 months and the particular product.
example:
For month 4 and Product B it would be (21+0+12)/3
Any ideas ?
Another option is to use the windowing clause of analytic functions
with my_data as (
select 1 as month, 'A' as product, 10 as sales from dual union all
select 1 as month, 'B' as product, 12 as sales from dual union all
select 1 as month, 'C' as product, 17 as sales from dual union all
select 2 as month, 'A' as product, 21 as sales from dual union all
select 3 as month, 'C' as product, 2 as sales from dual union all
select 3 as month, 'B' as product, 21 as sales from dual union all
select 4 as month, 'B' as product, 23 as sales from dual
)
select
month,
product,
sales,
nvl(sum(sales)
over (partition by product order by month
range between 3 preceding and 1 preceding),0)/3 as average_sales
from my_data
order by month, product
-- note: lag() counts rows, not months, so this assumes every product has a row for every month
SELECT month,
productType,
sales,
(lag(sales, 3) over (partition by productType order by month) +
lag(sales, 2) over (partition by productType order by month) +
lag(sales, 1) over (partition by productType order by month)) / 3 moving_avg
FROM your_table_name
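The expected value from the question, (21+0+12)/3 for month 4 and product B, treats a missing month as zero sales. A small pure-Python sketch of that rule follows; the function name is made up, and the data is the question's table.

```python
# (month, product) -> sales, taken from the question's table
sales = {
    (1, "A"): 10, (1, "B"): 12, (1, "C"): 17,
    (2, "A"): 21,
    (3, "C"): 2, (3, "B"): 21,
    (4, "B"): 23,
}


def moving_average(sales, product, month, window=3):
    """Average over the `window` calendar months before `month`.

    Months without a row for the product count as 0 sales.
    """
    total = sum(sales.get((m, product), 0) for m in range(month - window, month))
    return total / window


avg_b_month4 = moving_average(sales, "B", 4)  # months 1, 2, 3 -> (12 + 0 + 21) / 3
print(avg_b_month4)
```

This matches the RANGE BETWEEN 3 PRECEDING AND 1 PRECEDING frame, which selects by month value; a row-based lag() would instead reach back to whatever rows exist for the product.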