Ranking and obtaining data across moving window - google-bigquery

I have the following table:
create table iphone_defects(
product string
,defect string
,qty int64
,fwkyr int64
,fwenddate date
);
insert into iphone_defects values ('iPhone','Glass breakage',100,202112,'2020-09-20');
insert into iphone_defects values ('iPhone','No sound',30,202111,'2020-09-30');
insert into iphone_defects values ('iPhone','Glass breakage',25,202110,'2020-09-06');
insert into iphone_defects values ('iPhone','Audio problem',20,202109,'2020-08-30');
insert into iphone_defects values ('iPhone','No sound',60,202108,'2020-08-23');
insert into iphone_defects values ('iPhone','Empty boxes',30,202107,'2020-08-16');
insert into iphone_defects values ('iPhone','Audio problem',25,202106,'2020-08-09');
I am expecting the following result:
fwkyr is the financial week of the year. I have added an additional column, fwenddate, which holds the last date in that financial week.
The ask is to obtain the defect with the largest quantity in a 4-week window counted back from the current week. For example, for fwkyr 202112 the top defect is 'Glass breakage' and the total quantity is 100.
The window size is fixed. My actual use case needs 52 weeks.
Without the moving window I know I can rank and get the data, but I am not sure how to even approach this problem. Any help?

Per the updated question, my updated solution is much longer and has changed quite a bit.
I am still not sure whether the user selects the week from which the 52 weeks are counted, or whether the calculation always starts at week 1 of each year.
I also assume there is a typo in one of your insert statements, because it does not match your desired output table, so I changed it to fit.
1. Create table
create table table.defects(
product string
,defect string
,qty int64
,fwkyr int64
,fwenddate date
);
2. Insert data (adjusted last insert to match your output table)
insert into table.defects values ('iPhone','Glass breakage',100,202112,'2020-09-20');
insert into table.defects values ('iPhone','No sound',30,202111,'2020-09-30');
insert into table.defects values ('iPhone','Glass breakage',25,202110,'2020-09-06');
insert into table.defects values ('iPhone','Audio problem',20,202109,'2020-08-30');
insert into table.defects values ('iPhone','No sound',60,202108,'2020-08-23');
insert into table.defects values ('iPhone','Empty boxes',30,202107,'2020-08-16');
insert into table.defects values ('iPhone','Audio problem',55,202106,'2020-08-09');
3. Query for results
###############################################################################
### start count of weeks since selected first week and
### get number of weeks by desired range
###############################################################################
WITH
get_weeks AS (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_numbering,
SPLIT(CAST(ROW_NUMBER() OVER(PARTITION BY product ORDER BY fwkyr)/4 AS string), '.')[OFFSET(0)] AS week_id_0
FROM
table.defects
ORDER BY
fwkyr DESC
),
###############################################################################
### produce filter column for each window period by offsetting
###############################################################################
get_weeks_consequtive AS (
SELECT
*,
LAG(week_id_0,1) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_1,
LAG(week_id_0,2) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_2,
LAG(week_id_0,3) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_3
FROM
get_weeks ),
###############################################################################
### create tables and calculations per window using filter column where you group by for qty and keep top qty only
###############################################################################
week_id_0 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_0 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_1 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_1 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_2 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_2 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_3 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_3 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1)
###############################################################################
### union all selected windows
###############################################################################
SELECT
*
FROM
week_id_0
UNION ALL
SELECT
*
FROM
week_id_1
UNION ALL
SELECT
*
FROM
week_id_2
UNION ALL
SELECT
*
FROM
week_id_3
ORDER BY
week_id DESC
(Screenshots of the intermediate outputs followed here: get_weeks, get_weeks_consequtive, week_id_1, and the final result.)
PS ---
I brainstormed this quickly after your update; perhaps there is a better way, and I would be interested in seeing it.
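One possibly shorter route, offered as a sketch under assumptions rather than a definitive answer: number the weeks per product, join each week to itself and the n - 1 weeks before it, sum qty per defect inside that window, and keep the top defect. The example below runs on SQLite through Python's sqlite3 module so it can be tried locally (it assumes a SQLite build with window functions, 3.25 or later); the table name defects, the helper names, and the tie-break on defect name are my own choices, and the SQL would need adapting for BigQuery. The rows are the adjusted inserts from step 2, without fwenddate.

```python
import sqlite3

# Rows from step 2 above (fwenddate omitted because the query does not use it).
ROWS = [
    ("iPhone", "Glass breakage", 100, 202112),
    ("iPhone", "No sound", 30, 202111),
    ("iPhone", "Glass breakage", 25, 202110),
    ("iPhone", "Audio problem", 20, 202109),
    ("iPhone", "No sound", 60, 202108),
    ("iPhone", "Empty boxes", 30, 202107),
    ("iPhone", "Audio problem", 55, 202106),
]

QUERY = """
WITH indexed AS (            -- consecutive week index, safe across year boundaries
  SELECT *, DENSE_RANK() OVER (PARTITION BY product ORDER BY fwkyr) AS wk
  FROM defects
),
anchors AS (                 -- one row per product and week: the window's last week
  SELECT DISTINCT product, fwkyr, wk FROM indexed
),
totals AS (                  -- qty per defect over the anchor week and the n-1 before it
  SELECT a.product, a.fwkyr, d.defect, SUM(d.qty) AS qty
  FROM anchors a
  JOIN indexed d
    ON d.product = a.product
   AND d.wk BETWEEN a.wk - (:n - 1) AND a.wk
  GROUP BY a.product, a.fwkyr, d.defect
),
ranked AS (                  -- top defect per window; ties broken by defect name
  SELECT *, ROW_NUMBER() OVER (
           PARTITION BY product, fwkyr ORDER BY qty DESC, defect) AS rn
  FROM totals
)
SELECT fwkyr, product, defect, qty
FROM ranked
WHERE rn = 1
ORDER BY fwkyr DESC
"""


def load_sample(conn):
    """Create the hypothetical defects table and load the sample rows."""
    conn.execute(
        "CREATE TABLE defects (product TEXT, defect TEXT, qty INTEGER, fwkyr INTEGER)"
    )
    conn.executemany("INSERT INTO defects VALUES (?, ?, ?, ?)", ROWS)


def top_defect_per_window(conn, n_weeks=4):
    """Return (fwkyr, product, defect, qty) for the top defect of each window."""
    return conn.execute(QUERY, {"n": n_weeks}).fetchall()
```

Calling top_defect_per_window(conn, 52) would switch to the 52-week window without any extra CTEs. Note that weeks near the start of the data get shorter windows, because fewer than n weeks precede them.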
Anyhow, with such lengthy queries I typically write a Python script with text templates for the repetitive parts, and use a loop to expand them to the desired length by incrementing the changing values and inserting them with f-strings.
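A minimal sketch of that templating idea, assuming the query keeps the shape shown in step 3. The function names are hypothetical. It only expands the text of the repeated pieces (the LAG columns, the per-window CTEs, and the final UNION ALL); whether the week_id_0 bucketing still marks the right rows for 52 weeks of real data is a separate question.

```python
def lag_columns(n: int) -> str:
    """Build the LAG(...) AS week_id_1 .. week_id_{n-1} column list."""
    return ",\n".join(
        f"LAG(week_id_0, {k}) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_{k}"
        for k in range(1, n)
    )


def window_cte(k: int, n: int) -> str:
    """Build one 'week_id_k AS (...)' block; only k and the EXCEPT list vary."""
    except_cols = ", ".join(f"week_id_{i}" for i in range(n))
    return f"""week_id_{k} AS (
SELECT SUM(qty) AS qty, product, defect, week_id
FROM (
  SELECT * EXCEPT({except_cols}), MAX(fwkyr) OVER() AS week_id
  FROM get_weeks_consequtive
  WHERE week_id_{k} = '1')
GROUP BY 2, 3, 4
ORDER BY 1 DESC
LIMIT 1)"""


def union_all(n: int) -> str:
    """Build the closing SELECT ... UNION ALL ... ORDER BY section."""
    selects = "\nUNION ALL\n".join(f"SELECT * FROM week_id_{k}" for k in range(n))
    return f"{selects}\nORDER BY week_id DESC"


def repetitive_parts(n: int) -> str:
    """Join the three generated pieces so they can be pasted into the query."""
    ctes = ",\n".join(window_cte(k, n) for k in range(n))
    return f"{lag_columns(n)}\n\n{ctes}\n\n{union_all(n)}"
```

For example, print(repetitive_parts(52)) would emit 51 LAG columns, 52 window CTEs, and a 52-way UNION ALL.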

Related

Display duplicate row indicator and get only one row when duplicate

I built the schema at http://sqlfiddle.com/#!18/7e9e3
CREATE TABLE BoatOwners
(
BoatID INT,
OwnerDOB DATETIME,
Name VARCHAR(200)
);
INSERT INTO BoatOwners (BoatID, OwnerDOB,Name)
VALUES (1, '2021-04-06', 'Bob1'),
(1, '2020-04-06', 'Bob2'),
(1, '2019-04-06', 'Bob3'),
(2, '2012-04-06', 'Tom'),
(3, '2009-04-06', 'David'),
(4, '2006-04-06', 'Dale1'),
(4, '2009-04-06', 'Dale2'),
(4, '2013-04-06', 'Dale3');
I would like to write a query whose result has the following characteristics:
Returns only one owner per boat.
When a boat has multiple owners, returns the youngest owner.
Displays a column indicating whether the boat has multiple owners.
So applying that query to the data set above would produce
I tried
ROW_NUMBER() OVER (PARTITION BY ....
but haven't had much luck so far.
with data as (
select BoatID, OwnerDOB, Name,
row_number() over (partition by BoatID order by OwnerDOB desc) as rn,
count(*) over (partition by BoatID) as cnt
from BoatOwners
)
select BoatID, OwnerDOB, Name,
case when cnt > 1 then 'Yes' else 'No' end as MultipleOwner
from data
where rn = 1
This is just a case of numbering the rows for each BoatId group and also counting the rows in each group, then filtering accordingly:
select BoatId, OwnerDob, Name, Iif(qty=1,'No','Yes') MultipleOwner
from (
select *, Row_Number() over(partition by boatid order by OwnerDOB desc)rn, Count(*) over(partition by boatid) qty
from BoatOwners
)b where rn=1
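To check the number-and-count idea against the sample rows from the question, here is a small sketch that runs it on SQLite via Python's sqlite3 (an assumption made so it can be executed locally; the original schema is SQL Server, hence CASE instead of Iif and ISO date strings instead of DATETIME). The helper names are hypothetical.

```python
import sqlite3

OWNERS = [
    (1, "2021-04-06", "Bob1"), (1, "2020-04-06", "Bob2"), (1, "2019-04-06", "Bob3"),
    (2, "2012-04-06", "Tom"), (3, "2009-04-06", "David"),
    (4, "2006-04-06", "Dale1"), (4, "2009-04-06", "Dale2"), (4, "2013-04-06", "Dale3"),
]


def load_owners(conn):
    """Create BoatOwners and load the question's sample rows."""
    conn.execute("CREATE TABLE BoatOwners (BoatID INTEGER, OwnerDOB TEXT, Name TEXT)")
    conn.executemany("INSERT INTO BoatOwners VALUES (?, ?, ?)", OWNERS)


def youngest_owner_per_boat(conn):
    """One row per boat: the owner with the latest date of birth, plus a flag."""
    return conn.execute("""
        WITH data AS (
          SELECT BoatID, OwnerDOB, Name,
                 -- rn = 1 marks the latest date of birth, i.e. the youngest owner
                 ROW_NUMBER() OVER (PARTITION BY BoatID ORDER BY OwnerDOB DESC) AS rn,
                 -- cnt is the number of owners recorded for the boat
                 COUNT(*) OVER (PARTITION BY BoatID) AS cnt
          FROM BoatOwners
        )
        SELECT BoatID, OwnerDOB, Name,
               CASE WHEN cnt > 1 THEN 'Yes' ELSE 'No' END AS MultipleOwner
        FROM data
        WHERE rn = 1
        ORDER BY BoatID
    """).fetchall()
```

On the sample data this should return Bob1 and Dale3 flagged 'Yes', and Tom and David flagged 'No'.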

How to insert line break in between ranked items in SQL?

I would like to ask how to insert a line break in SQL.
For example, in the image below I have a few items ranked from the highest price to the lowest. I would like to insert a line break between the groups of items.
My query is as such
with
item_list as (
Select
item_name, price
from table_A
where month = 3 and year = 2020
order by price desc
)
SELECT *
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY a.item_name
ORDER BY price DESC) as rank, *
From item_list a)
WHERE rank <= 5
ORDER BY item_name
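A query returns rows rather than formatted text, so one option is to leave the ranking in SQL and add the blank lines where the rows are displayed. The sketch below assumes that approach and is not the only way to do it; it uses Python's sqlite3 so it can be run locally, the function name is hypothetical, and it expects a table_A with the columns item_name, price, month, and year as in the query above.

```python
import sqlite3
from itertools import groupby


def top_n_with_breaks(conn, n=5):
    """Return display lines for the top-n prices per item, blank line between items."""
    rows = conn.execute("""
        SELECT item_name, price, rnk
        FROM (
          SELECT item_name, price,
                 ROW_NUMBER() OVER (PARTITION BY item_name ORDER BY price DESC) AS rnk
          FROM table_A
          WHERE month = 3 AND year = 2020
        )
        WHERE rnk <= ?
        ORDER BY item_name, rnk
    """, (n,)).fetchall()

    lines = []
    # rows are already sorted by item_name, so groupby yields one group per item
    for i, (_, group) in enumerate(groupby(rows, key=lambda r: r[0])):
        if i:  # separator before every group except the first
            lines.append("")
        lines.extend(f"{rnk} {name} {price}" for name, price, rnk in group)
    return lines
```

Printing "\n".join(top_n_with_breaks(conn)) would show each item's ranked rows followed by an empty line before the next item.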

How to find the frequency that a specific entity occurs in an SQL table?

I have a question for a school assignment that gives us the following data:
insert into vehicles values ('U200', 'Chevrolet', 'Camaro', 1969, 'red');
insert into vehicles values ('U201', 'Toyoto', 'Corolla', 2012, 'red');
insert into vehicles values ('U202', 'Toyoto', 'RAV4', 2013, 'red');
insert into vehicles values ('U203', 'Kia', 'Cube', 2013, 'red');
insert into vehicles values ('U300', 'Mercedes', 'SL 230', 1964, 'black');
insert into vehicles values ('U301', 'Audi', 'A4', 2013, 'black');
insert into vehicles values ('U302', 'Toyoto', 'RAV4', 2012, 'black');
insert into vehicles values ('U303', 'Mercedes', 'SL 230', 2014, 'black');
insert into vehicles values ('U400', 'Chevrolet', 'Camaro', 2012, 'black');
EDIT: I trimmed down some of the data since it made the question too long. I think the idea is still clear without all the data values entered.
We're tasked with finding, for each year of a car, the most frequent make and the most frequent car color. In case of ties, list all of them.
We're given this table and we're supposed to make a query for this data that answers the question.
vehicles(vin,make,model,year,color)
I tried to use 'join' or 'group by' to somehow group everything together, but I have no idea if my methodology even makes sense. I tried something like the code I have here, but I'm kinda lost on how the rest should function.
select year, make, color, count(make)
from vehicles
where (Not sure what to put here)...
order by year desc
having count(*)
The expected result should be something like this:
1964|Mercedes|black
1969|Chevrolet|red
1969|Chevrolet|white
2012|Chevrolet|white
2012|Toyoto|white
2013|Audi|red
2014|Audi|white
2015|Audi|white
2016|Audi|white
Your description and comments imply that you want something like this:
WITH cte1 AS (
SELECT *,
COUNT(*) OVER (PARTITION BY year, make) make_count,
COUNT(*) OVER (PARTITION BY year, color) color_count
FROM vehicles
),
cte2 AS (
SELECT *,
RANK() OVER (PARTITION BY year ORDER BY make_count DESC) make_rnk,
RANK() OVER (PARTITION BY year ORDER BY color_count DESC) color_rnk
FROM cte1
)
SELECT year, make, color
FROM cte2
WHERE 1 IN (make_rnk, color_rnk)
ORDER BY year, make;
As I understand the question, I believe you need two queries (although they can be combined, as below). The phrase "each year of a car" is confusing, so this is based on each year.
As such I believe that the following will do what you appear to want :-
WITH
colour_year_counts AS (SELECT color AS colour, year, count(*) AS c FROM vehicles GROUP BY year, color)
SELECT * FROM colour_year_counts AS cyctrim WHERE c >= (SELECT max(c) FROM colour_year_counts WHERE year = cyctrim.year) ORDER BY year;
WITH
car_make_counts AS (SELECT make,year,count() AS c FROM vehicles GROUP BY year, make)
SELECT * FROM car_make_counts AS cmctrim WHERE c >= (SELECT max(c) FROM car_make_counts WHERE year = cmctrim.year) ORDER BY year;
This results in :-
For colour :-
For make :-
The following combines the two into a single query :-
WITH
colour_year_counts AS (SELECT color AS colour, year, count(*) AS c FROM vehicles GROUP BY year, color),
trimmedcyc AS (SELECT * FROM colour_year_counts AS cyctrim WHERE c >= (SELECT max(c) FROM colour_year_counts WHERE year = cyctrim.year) ORDER BY year),
car_make_counts AS (SELECT make,year,count() AS c FROM vehicles GROUP BY year, make),
trimmedcmc AS (SELECT * FROM car_make_counts AS cmctrim WHERE c >= (SELECT max(c) FROM car_make_counts WHERE year = cmctrim.year) ORDER BY year),
report AS (
SELECT 'Colour' AS Item ,'Year' AS Year,'Count' AS Count, 0 AS sortorder
UNION SELECT *, 1 AS sortorder FROM trimmedcyc
UNION SELECT 'Make','','', 2 as sortorder
UNION SELECT *, 3 AS sortorder FROM trimmedcmc
)
SELECT Item,Year,Count FROM report ORDER BY sortorder,year;
and results in :-
Note: the line that appears as Item, Year, Count is not part of the output; it is the column names as shown by the tool.
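As a cross-check of the tie handling, here is a sketch that applies the grouped-count idea with RANK() to the nine trimmed sample rows from the question. It runs on SQLite through Python's sqlite3 and uses the question's column name color; the helper names and the fixed list of allowed columns are my own assumptions.

```python
import sqlite3

VEHICLES = [
    ("U200", "Chevrolet", "Camaro", 1969, "red"),
    ("U201", "Toyoto", "Corolla", 2012, "red"),
    ("U202", "Toyoto", "RAV4", 2013, "red"),
    ("U203", "Kia", "Cube", 2013, "red"),
    ("U300", "Mercedes", "SL 230", 1964, "black"),
    ("U301", "Audi", "A4", 2013, "black"),
    ("U302", "Toyoto", "RAV4", 2012, "black"),
    ("U303", "Mercedes", "SL 230", 2014, "black"),
    ("U400", "Chevrolet", "Camaro", 2012, "black"),
]


def load_vehicles(conn):
    """Create the vehicles table and load the trimmed sample rows."""
    conn.execute(
        "CREATE TABLE vehicles (vin TEXT, make TEXT, model TEXT, year INTEGER, color TEXT)"
    )
    conn.executemany("INSERT INTO vehicles VALUES (?, ?, ?, ?, ?)", VEHICLES)


def most_frequent(conn, column):
    """Per year, return every value of `column` that shares the highest count."""
    if column not in ("make", "color"):  # column name is interpolated, so restrict it
        raise ValueError(f"unsupported column: {column}")
    sql = f"""
        WITH counts AS (
          SELECT year, {column} AS value, COUNT(*) AS c
          FROM vehicles
          GROUP BY year, {column}
        ),
        ranked AS (
          -- RANK gives every tied value rank 1, so ties are all kept
          SELECT *, RANK() OVER (PARTITION BY year ORDER BY c DESC) AS rnk
          FROM counts
        )
        SELECT year, value, c FROM ranked WHERE rnk = 1 ORDER BY year, value
    """
    return conn.execute(sql).fetchall()
```

With the trimmed data, 2013 has three makes with one car each, so most_frequent(conn, "make") lists all three for that year.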

SQL Running total grouped by ID

Using this query, I need to populate the NULL column with a running total for each row: the paid amount accumulated year to date within the calendar year, for the current table. This running total should be grouped by member_id.
SELECT id=identity(int,1,1), cast(null as numeric(22,3)) as max_running_total, *
INTO #temp
FROM Customer_DB..Sales_Table
ORDER BY Date_Column asc
UPDATE #temp
SET max_running_total = (SELECT SUM(paid_amount)
FROM #temp
WHERE id <= id
GROUP BY member_id)
Since you have not given the schema, I have used a sample schema and computed a rolling sum on it. You can use the same SQL window functions to achieve your result.
CREATE TABLE amt
(
id INT,
paid_amount DECIMAL,
running_total DECIMAL
)
insert INTO amt VALUES (1, 100, NULL), (2, 50, NULL), (3, 50, NULL)
SELECT id, paid_amount,
SUM(paid_amount) over(ORDER BY id ROWS BETWEEN unbounded preceding AND CURRENT ROW) AS running_total
FROM amt
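The sample above accumulates over the whole table. To restart the total per member and per calendar year, as the question asks, the window can be partitioned. A minimal sketch, assuming a hypothetical sales table with the columns id, member_id, sale_date (ISO text), and paid_amount, run on SQLite via Python's sqlite3; on SQL Server the year would come from YEAR(sale_date) instead of strftime.

```python
import sqlite3

SALES = [
    (1, "2020-01-05", 100),
    (1, "2020-02-01", 50),
    (2, "2020-01-10", 70),
    (1, "2021-01-03", 30),
    (2, "2020-03-01", 5),
]


def load_sales(conn):
    """Create the hypothetical sales table and load a few illustrative rows."""
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, member_id INTEGER,"
        " sale_date TEXT, paid_amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales (member_id, sale_date, paid_amount) VALUES (?, ?, ?)", SALES
    )


def running_totals(conn):
    """Year-to-date running total of paid_amount, restarted per member and year."""
    return conn.execute("""
        SELECT member_id, sale_date, paid_amount,
               SUM(paid_amount) OVER (
                 PARTITION BY member_id, strftime('%Y', sale_date)
                 ORDER BY sale_date, id   -- id breaks ties on the same date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS running_total
        FROM sales
        ORDER BY member_id, sale_date, id
    """).fetchall()
```

Member 1's total restarts at 30 for the 2021 row because the year is part of the partition.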

SQL Group BY SUM one column and select of first row of grouped items

I have a part table with 5 fields. I want to sum the QTY per MFGPN while showing the first returned row for the other 3 fields (Manufacturer, DateCode, Description). I initially thought of using the MIN function as follows, but that doesn't really help, since the data is not an int data type. How would I go about doing this? Right now I'm stuck at the following query:
SELECT SUM([QTY]) AS QTY
,[MFGPN]
,MIN([MANUFACTURER]) AS MANUFACTURER
,MIN([DATECODE]) AS DateCode
,MIN([DESCRIPTION]) AS DESCRIPTION
INTO part
GROUP BY MFGPN, MANUFACTURER, DATECODE, description
ORDER BY mfgpn ASC
Would CROSS APPLY work for you?
SELECT
SUM(a.[QTY]) AS QTY
,a.[MFGPN]
,c.[MANUFACTURER]
,c.[DATECODE]
,c.[DESCRIPTION]
FROM part a
CROSS APPLY (SELECT TOP 1 * FROM part b WHERE a.[MFGPN] = b.[MFGPN]) c
GROUP BY
a.[MFGPN]
,c.[MANUFACTURER]
,c.[DATECODE]
,c.[DESCRIPTION]
Tested with the following:
-- table variables are declared with an @ prefix
DECLARE @T1 AS TABLE (
[QTY] int
,[MFGPN] NVARCHAR(50)
,[MANUFACTURER] NVARCHAR(50)
,[DATECODE] DATE
,[DESCRIPTION] NVARCHAR(50));
INSERT @T1 VALUES
(2, 'MFGPN-1', 'MANUFACTURER-A', '20120101', 'A-1'),
(4, 'MFGPN-1', 'MANUFACTURER-B', '20120102', 'B-1'),
(3, 'MFGPN-1', 'MANUFACTURER-C', '20120103', 'C-1'),
(1, 'MFGPN-2', 'MANUFACTURER-A', '20120101', 'A-2'),
(5, 'MFGPN-2', 'MANUFACTURER-B', '20120101', 'B-2')
SELECT
SUM(a.[QTY]) AS QTY
,a.[MFGPN]
,c.[MANUFACTURER]
,c.[DATECODE]
,c.[DESCRIPTION]
FROM @T1 a
-- TOP 1 without ORDER BY returns an arbitrary row per MFGPN
CROSS APPLY (SELECT TOP 1 * FROM @T1 b WHERE a.[MFGPN] = b.[MFGPN]) c
GROUP BY
a.[MFGPN]
,c.[MANUFACTURER]
,c.[DATECODE]
,c.[DESCRIPTION]
Produces
QTY MFGPN MANUFACTURER DATECODE DESCRIPTION
9 MFGPN-1 MANUFACTURER-A 2012-01-01 A-1
6 MFGPN-2 MANUFACTURER-A 2012-01-01 A-2
This can be easily managed with a windowed SUM():
WITH summed_and_ranked AS (
SELECT
MFGPN,
MANUFACTURER,
DATECODE,
DESCRIPTION,
QTY = SUM(QTY) OVER (PARTITION BY MFGPN),
RNK = ROW_NUMBER() OVER (
PARTITION BY MFGPN
ORDER BY DATECODE -- or which column should define the order?
)
FROM atable
)
SELECT
MFGPN,
MANUFACTURER,
DATECODE,
DESCRIPTION,
QTY
INTO parts
FROM summed_and_ranked
WHERE RNK = 1
;
For every row, the total group quantity and the ranking within the group is calculated. When actually getting rows for inserting into the new table (the main SELECT), only rows with RNK values of 1 are pulled. Thus you get a result set containing group totals as well as details of certain rows.
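To see the windowed SUM() plus ROW_NUMBER() combination on concrete rows, here is a sketch that runs it against the five test rows from the CROSS APPLY answer. It uses SQLite via Python's sqlite3 (an assumption made so it can be executed locally), so it uses AS aliases instead of the T-SQL "alias = expression" form and omits SELECT ... INTO. The extra MANUFACTURER tie-break is my addition, because two MFGPN-2 rows share a DATECODE.

```python
import sqlite3

PARTS = [
    (2, "MFGPN-1", "MANUFACTURER-A", "20120101", "A-1"),
    (4, "MFGPN-1", "MANUFACTURER-B", "20120102", "B-1"),
    (3, "MFGPN-1", "MANUFACTURER-C", "20120103", "C-1"),
    (1, "MFGPN-2", "MANUFACTURER-A", "20120101", "A-2"),
    (5, "MFGPN-2", "MANUFACTURER-B", "20120101", "B-2"),
]


def load_parts(conn):
    """Create the part table and load the test rows."""
    conn.execute(
        "CREATE TABLE part (QTY INTEGER, MFGPN TEXT, MANUFACTURER TEXT,"
        " DATECODE TEXT, DESCRIPTION TEXT)"
    )
    conn.executemany("INSERT INTO part VALUES (?, ?, ?, ?, ?)", PARTS)


def summed_first_rows(conn):
    """Group total per MFGPN alongside the details of the first row in each group."""
    return conn.execute("""
        WITH summed_and_ranked AS (
          SELECT MFGPN, MANUFACTURER, DATECODE, DESCRIPTION,
                 SUM(QTY) OVER (PARTITION BY MFGPN) AS QTY,
                 ROW_NUMBER() OVER (
                   PARTITION BY MFGPN
                   ORDER BY DATECODE, MANUFACTURER  -- tie-break is an assumption
                 ) AS RNK
          FROM part
        )
        SELECT MFGPN, MANUFACTURER, DATECODE, DESCRIPTION, QTY
        FROM summed_and_ranked
        WHERE RNK = 1
        ORDER BY MFGPN
    """).fetchall()
```

The totals match the CROSS APPLY output above (9 for MFGPN-1 and 6 for MFGPN-2), but here the "first" row is chosen by an explicit ordering.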