Related
I would like to know the sum of a value in the first n items in a related table. For example, I want to get the sum of a companies first 6 invoices (the invoices can be sorted by ID asc)
Current SQL:
SELECT invoices.company_id, SUM(invoices.amount)
FROM invoices
JOIN companies on invoices.company_id = companies.id
GROUP BY invoices.company_id
This seems simple but I can't wrap my head around it.
Consider also below approach
select company_id, (
select sum(amount)
from t.amounts amount
) as top_six_invoices_amount
from (
select invoices.company_id,
array_agg(invoices.amount order by invoices.invoice_id limit 6) amounts
from your_table invoices
group by invoices.company_id
) t
You can create order row numbers to the lines in a partition based on invoice id and filter to it, something like this:
with array_table as (
select 'a' field, * from unnest([3, 2, 1 ,4, 6, 3]) id
union all
select 'b' field, * from unnest([1, 2, 1, 7]) id
)
select field, sum(id) from (
select field, id, row_number() over (partition by a.field order by id desc) rownum
from array_table a
)
where rownum < 3
group by field
More examples for analytical examples here:
https://medium.com/#aliz_ai/analytic-functions-in-google-bigquery-part-1-basics-745d97958fe2
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
I have following table -
create table iphone_defects(
product string
,defect string
,qty int64
,fwkyr int64
,fwenddate date
);
insert into iphone_defects values ('iPhone','Glass breakage',100,202112,'2020-09-20');
insert into iphone_defects values ('iPhone','No sound',30,202111,'2020-09-30');
insert into iphone_defects values ('iPhone','Glass breakage',25,202110,'2020-09-06');
insert into iphone_defects values ('iPhone','Audio problem',20,202109,'2020-08-30');
insert into iphone_defects values ('iPhone','No sound',60,202108,'2020-08-23');
insert into iphone_defects values ('iPhone','Empty boxes',30,202107,'2020-08-16');
insert into iphone_defects values ('iPhone','Audio problem',25,202106,'2020-08-09');
Am expecting the following result -
fwkyr refers to Financial Week in a year. I have added in additional column fwenddate basically referring to max date in the financial week of the year.
Basically the ask is to obtain the defect with largest quantity in a 4 week window from the current week. Say for the fwkyr - 202112, the highest defects is for 'Glass breakage' and the total quantity is 100.
This is a static window. My actual use case needs 52 week.
Without the moving window, I know that I can rank and get the data but not sure on how to even approach this problem. Any help?
Per updated question my updated solution gets much longer and changes quite a bit.
I am still not sure if user selects from which week you need another 52 weeks or if you are looking at this calculation from start (week 1) of every year.
I also assume that you have a typo in one of your insert statements when I compare to your desired output table. So I changed it to fit your output table.
1. Create table
create table table.defects(
product string
,defect string
,qty int64
,fwkyr int64
,fwenddate date
);
2. Insert data (adjusted last insert to match your output table)
insert into table.defects values ('iPhone','Glass breakage',100,202112,'2020-09-20');
insert into table.defects values ('iPhone','No sound',30,202111,'2020-09-30');
insert into table.defects values ('iPhone','Glass breakage',25,202110,'2020-09-06');
insert into table.defects values ('iPhone','Audio problem',20,202109,'2020-08-30');
insert into table.defects values ('iPhone','No sound',60,202108,'2020-08-23');
insert into table.defects values ('iPhone','Empty boxes',30,202107,'2020-08-16');
insert into table.defects values ('iPhone','Audio problem',55,202106,'2020-08-09');
3. Query for results
###############################################################################
### start count of weeks since selected first week and
### get number of weeks by desired range
###############################################################################
WITH
get_weeks AS (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_numbering,
SPLIT(CAST(ROW_NUMBER() OVER(PARTITION BY product ORDER BY fwkyr)/4 AS string), '.')[
OFFSET
(0)] AS week_id_0,
FROM
table.defects
ORDER BY
fwkyr DESC
),
###############################################################################
### produce filter column for each window period by offsetting
###############################################################################
get_weeks_consequtive AS (
SELECT
*,
LAG(week_id_0,1) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_1,
LAG(week_id_0,2) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_2,
LAG(week_id_0,3) OVER(PARTITION BY product ORDER BY fwkyr DESC) AS week_id_3
FROM
get_weeks ),
###############################################################################
### create tables and calculations per window using filter column where you group by for qty and keep top qty only
###############################################################################
week_id_0 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_0 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_1 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_1 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_2 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_2 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1),
week_id_3 AS (
SELECT
SUM(qty) AS qty,
product,
defect,
week_id
FROM (
SELECT
* EXCEPT(week_id_0,
week_id_1,
week_id_2,
week_id_3),
MAX(fwkyr) OVER() AS week_id
FROM
get_weeks_consequtive
WHERE
week_id_3 = '1' )
GROUP BY
2,
3,
4
ORDER BY
1 DESC
LIMIT
1)
###############################################################################
### union all selected windows
###############################################################################
SELECT
*
FROM
week_id_0
UNION ALL
SELECT
*
FROM
week_id_1
UNION ALL
SELECT
*
FROM
week_id_2
UNION ALL
SELECT
*
FROM
week_id_3
ORDER BY
week_id DESC
get_weeks
get_weeks_consequtive
week_id_1
result
PS ---
I brainstormed this quick per your update perhaps there is a better way and I would be interested in seeing it.
Anyhow, with such lengthy queries I typically produce a python script with text templates for repetitive parts and use a loop to expand repetitive parts to desired lengths by incrementing changing values and inserting them with so called f strings.
I have a SQL database with a list of Customer IDs CustomerID and invoices, the specific product purchased in each invoice ProductID, the Date and the Income of each invoice . I need to write a query that will retrieve for each product, which was the second customer who made a purchase
How do I do that?
EDIT:
I have come up with the following query:
SELECT *,
LEAD(CustomerID) OVER (ORDER BY ProductID, Date) AS 'Second Customer Who Made A Purchase'
FROM a
ORDER BY ProductID, Date ASC
However, this query presents multiple results for products that have more than two purchases. Can you advise?
SELECT a2.ProductID,
(
SELECT a1.CustomerID
FROM a a1
WHERE a1.ProductID = a2.ProductID
ORDER BY Date asc
LIMIT 1,1
) as SecondCustomer
FROM a a2
GROUP BY a2.ProductID
I need to write a query that will retrieve for each product, which was the second customer who made a purchase
This sounds like a window function:
select a.*
from (select a.*,
row_number() over (partition by productid order by date asc) as seqnum
from a
) a
where seqnum = 2;
I'm trying to get the most expensive and cheapest items from two different tables.
The output should be one row with the values for MostExpensiveItem, MostExpensivePrice, CheapestItem, CheapestPrice
I was able to get the price of the most expensive and cheapest items in the two tables with following query:
SELECT
MAX(ExtrasPrice) as MostExpensivePrice, MIN(ExtrasPrice) as CheapestPrice
FROM
(
SELECT ExtrasPrice FROM Extras
UNION ALL
SELECT ItemPrice FROM Items
) foo
How can I add the names of the items (ItemName, ExtrasName) to my output? Again, there should only be one row as the output.
Try this:
SELECT TOP 1 FIRST_VALUE(Price) OVER (ORDER BY Price) AS MinPrice,
FIRST_VALUE(Name) OVER (ORDER BY Price) AS MinName,
LAST_VALUE(Price) OVER (ORDER BY Price DESC) AS MaxPrice,
LAST_VALUE(Name) OVER (ORDER BY Price DESC) AS MaxName
FROM (
SELECT ExtrasName AS Name, ExtrasPrice AS Price FROM Extras
UNION ALL
SELECT ItemName As Name, ItemPrice AS Price FROM Items) u
SQL Fiddle Demo
TOP 1 with order by clause should work for you. Try this
SELECT *
FROM (SELECT TOP 1 ExtrasPrice,ExtrasName
FROM Extras ORDER BY ExtrasPrice Asc),
(SELECT TOP 1 ItemPrice,ItemName
FROM Items ORDER BY ItemPrice Desc)
Note: Comma can be replaced with CROSS JOIN
You can use row_number() for this. If you are satisfied with two rows:
SELECT item, price
FROM (SELECT foo.*, row_number() over (order by price) as seqnum_asc,
row_number() over (order by price) as seqnum_desc
FROM (SELECT item, ExtrasPrice as price FROM Extras
UNION ALL
SELECT item, ItemPrice FROM Items
) foo
) t
WHERE seqnum_asc = 1 or seqnum_desc = 1;
EDIT:
If you have an index on "price" in both tables, then the cheapest method is probably:
with exp as (
(select top 1 item, ExtrasPrice as price
from Extras e
order by price desc
) union all
(select top 1 i.item, ItemPrice
from Items i
order by price desc
)
),
cheap as (
(select top 1 item, ExtrasPrice as price
from Extras e
order by price asc
) union all
(select top 1 i.item, ItemPrice
from Items i
order by price asc
)
)
select top 1 *
from exp
order by price desc
union all
select top 1 *
from cheap
order by price asc;
If you want this in one row, you can replace the final query with:
select e.*, c.*
from (select top 1 *
from exp
order by price desc
) e cross join
(select top 1 *
from cheap
order by price asc
) c
I want to write a stored procedure.
In that stored procedure, I want to find duplicate row values from a table, and calculate sum operation on these rows to the same table.
Let's say, I have a CustomerSales table;
ID SalesRepresentative Customer Quantity
1 Michael CustA 55
2 Michael CustA 10
and I need to turn table to...
ID SalesRepresentative Customer Quantity
1 Michael CustA 65
2 Michael CustA 0
When I find SalesRepresentative and Customer duplicates at the same time, I want to sum all Quantity values of these rows and assign to the first row of a table, and others will be '0'.
Could you help me.
To aggregate duplicates into one row:
SELECT min(ID) AS ID, SalesRepresentative, Customer
,sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
ORDER BY min(ID)
Or, if you actually want those extra rows with 0 as Quantity in the result:
SELECT ID, SalesRepresentative, Customer
,CASE
WHEN (count(*) OVER (PARTITION BY SalesRepresentative,Customer)) = 1
THEN Quantity
WHEN (row_number() OVER (PARTITION BY SalesRepresentative,Customer
ORDER BY ID)) = 1
THEN sum(Quantity) OVER (PARTITION BY SalesRepresentative,Customer)
ELSE 0
END AS Quantity
FROM CustomerSales
ORDER BY ID
This makes heavy use of window functions.
Alternative version without window functions:
SELECT min(ID) AS ID, SalesRepresentative, Customer, sum(Quantity) AS Quantity
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
UNION ALL
SELECT ID, SalesRepresentative, Customer, 0 AS Quantity
FROM CustomerSales c
GROUP BY SalesRepresentative, Customer
LEFT JOIN (
SELECT min(ID) AS ID
FROM CustomerSales
GROUP BY SalesRepresentative, Customer
) x ON (x.ID = c.ID)
WHERE x.ID IS NULL
ORDER BY ID