BigQuery SQL: Sum of first N related items

I would like to know the sum of a value in the first n items in a related table. For example, I want to get the sum of a company's first 6 invoices (the invoices can be sorted by ID asc).
Current SQL:
SELECT invoices.company_id, SUM(invoices.amount)
FROM invoices
JOIN companies on invoices.company_id = companies.id
GROUP BY invoices.company_id
This seems simple but I can't wrap my head around it.

Also consider the approach below:
select company_id, (
select sum(amount)
from t.amounts amount
) as top_six_invoices_amount
from (
select invoices.company_id,
array_agg(invoices.amount order by invoices.invoice_id limit 6) amounts
from your_table invoices
group by invoices.company_id
) t

You can assign row numbers to the rows within each partition, ordered by invoice id, and then filter on them. Something like this (this toy example keeps the two largest ids per field):
with array_table as (
select 'a' field, * from unnest([3, 2, 1 ,4, 6, 3]) id
union all
select 'b' field, * from unnest([1, 2, 1, 7]) id
)
select field, sum(id) from (
select field, id, row_number() over (partition by a.field order by id desc) rownum
from array_table a
)
where rownum < 3
group by field
More examples of analytic functions:
https://medium.com/#aliz_ai/analytic-functions-in-google-bigquery-part-1-basics-745d97958fe2
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
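Here is the row-number approach applied to the original question. It is a minimal sketch run against SQLite through Python's built-in sqlite3 module, which needs SQLite 3.25+ for window functions. SQLite is standing in for BigQuery here. The invoices table, its columns (id, company_id, amount) and the sample rows are hypothetical.

```python
import sqlite3

# Sketch only: SQLite stands in for BigQuery; table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, company_id INTEGER, amount INTEGER)")
rows = [(i, 1, 10 * i) for i in range(1, 9)]   # company 1: 8 invoices, amounts 10, 20, ..., 80
rows += [(i, 2, 5) for i in range(9, 12)]      # company 2: only 3 invoices of 5 each
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)

FIRST_N = 6
query = """
SELECT company_id, SUM(amount) AS first_n_amount
FROM (
    -- number each company's invoices by ascending invoice id
    SELECT company_id, amount,
           ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY id ASC) AS rn
    FROM invoices
)
WHERE rn <= ?          -- keep only the first N invoices per company
GROUP BY company_id
ORDER BY company_id
"""
result = conn.execute(query, (FIRST_N,)).fetchall()
print(result)  # [(1, 210), (2, 15)]
```

A company with fewer than six invoices simply gets the sum of all of them. In this example, that is company 2.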

Related

How to select columns that aren't part of an aggregate query using HAVING SUM() in the WHERE and selecting only certain rows on db2

Using AS400 db2 for this.
I have a table of orders. From that table I have to:
Get all orders from a specified list of order IDs and type
Group by the user_id on those orders
Check to make sure the total order amount on the group is greater than $100
Return all the orders that belong to the matching groups, ungrouped, including order_id, which is not part of the GROUP BY
I got a bit stuck because the AS400 did not like that I was asking to select a field that wasn't part of the group, which I need.
I came up with this query, but it's slow.
-- Create a common temp table we can use in both places
WITH wantedOrders AS (
SELECT order_id FROM orders
WHERE
-- Only orders from the web
order_type = 'web'
-- And only orders that we want to get at this time
AND order_id IN
(
50,
20,
30
)
)
-- Our main select that gets all order information, even the non-grouped stuff
SELECT
t1.order_id,
t1.user_id,
t1.amount,
t2.total_amount,
t2.count
FROM orders AS t1
-- Join in the group data where we can do our query
JOIN (
SELECT
user_id,
SUM(amount) as total_amount,
COUNT(*) AS count
FROM
orders
-- Re use the temp table to get the order numbers
WHERE order_id IN (SELECT order_id FROM wantedOrders)
GROUP BY
user_id
HAVING SUM(amount)>100
) AS t2 ON t2.user_id=t1.user_id
-- Make sure we only use the order numbers
WHERE order_id IN (SELECT order_id FROM wantedOrders)
ORDER BY t1.user_id ASC;
What's the better way to write this query?
Try this:
WITH
wantedOrders (order_id) AS
(
VALUES 1, 2
)
, orders (order_id, user_id, amount) AS
(
VALUES
(1, 1, 50)
, (2, 1, 50)
, (1, 2, 60)
, (2, 2, 60)
, (3, 3, 200)
, (4, 3, 200)
)
-- Our main select that gets all order information, even the non-grouped stuff
SELECT *
FROM
(
SELECT
order_id,
user_id,
amount,
SUM (amount) OVER (PARTITION BY user_id) AS total_amount,
COUNT (*) OVER (PARTITION BY user_id) AS count
FROM orders t
WHERE EXISTS
(
SELECT 1
FROM wantedOrders w
WHERE w.order_id = t.order_id
)
) A
WHERE total_amount > 100
ORDER BY user_id ASC
Result:
ORDER_ID  USER_ID  AMOUNT  TOTAL_AMOUNT  COUNT
1         2        60      120           2
2         2        60      120           2
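The window-function answer above can be checked with a small sketch. It runs the same query shape against SQLite via Python's sqlite3 module, which needs SQLite 3.25+ for window functions. The sample rows mirror the answer's VALUES lists. However, they are loaded into plain SQLite tables rather than DB2 VALUES CTEs, and the COUNT alias is renamed to cnd.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE wantedOrders (order_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 50), (2, 1, 50), (1, 2, 60), (2, 2, 60), (3, 3, 200), (4, 3, 200)])
conn.executemany("INSERT INTO wantedOrders VALUES (?)", [(1,), (2,)])

query = """
SELECT *
FROM (
    -- window aggregates are computed after WHERE, so only wanted orders count
    SELECT order_id, user_id, amount,
           SUM(amount) OVER (PARTITION BY user_id) AS total_amount,
           COUNT(*)    OVER (PARTITION BY user_id) AS cnt
    FROM orders t
    WHERE EXISTS (SELECT 1 FROM wantedOrders w WHERE w.order_id = t.order_id)
) a
WHERE total_amount > 100
ORDER BY user_id, order_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 2, 60, 120, 2), (2, 2, 60, 120, 2)]
```

User 1's wanted orders total exactly 100, so they fail the `> 100` test. User 3's orders are not in the wanted list at all.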
If order_id is the PK of the table, just add the columns you need to the wantedOrders query and use it as your "base" (instead of using orders and filtering it again). You should end up joining wantedOrders with itself.
You can do:
select t.*
from orders t
join (
select user_id
from orders o
where order_type = 'web'
and order_id in (50, 20, 30)
group by user_id
having sum(amount) > 100
) s on s.user_id = t.user_id
where t.order_type = 'web'
and t.order_id in (50, 20, 30)
The first table, orders as t, produces the data you want. It is filtered by the second table expression, s, which preselects the groups according to your logic.
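The join-plus-HAVING idea can be sketched against SQLite (Python's sqlite3 module) as shown below. The order-type and order-id filters are applied both inside the derived table and on the outer rows. That way, only the wanted orders of qualifying users come back. The order_type column comes from the question. The specific rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, "
             "amount INTEGER, order_type TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (20, 1, 80, "web"),
    (30, 1, 40, "web"),
    (50, 2, 150, "store"),  # in the wanted list, but not a web order
    (60, 1, 500, "web"),    # web order for user 1, but not in the wanted list
])

query = """
SELECT t.order_id, t.user_id, t.amount
FROM orders t
JOIN (
    -- users whose wanted web orders add up to more than 100
    SELECT user_id
    FROM orders
    WHERE order_type = 'web' AND order_id IN (50, 20, 30)
    GROUP BY user_id
    HAVING SUM(amount) > 100
) s ON s.user_id = t.user_id
-- repeat the filter so other orders of those users are not returned
WHERE t.order_type = 'web' AND t.order_id IN (50, 20, 30)
ORDER BY t.user_id, t.order_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(20, 1, 80), (30, 1, 40)]
```

Without the outer WHERE, order 60 would also be returned, because user 1 qualifies as a group.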

Efficient way to combine 2 tables and get the row with max year with preference to one of the table

I am trying to combine 2 tables (key_ratios_cnd and key_ratios_snd). Both tables are identical, and the primary key columns of both are symbol and fiscal_year.
In the final result set I want, for each symbol, the row with the maximum year across both tables. If the row with the maximum year is present in both tables, the row from key_ratios_cnd should be selected.
I came up with the SQL query below to produce the result. I wanted to know if there is any other way to write the query that is more optimized.
select sq2.*
from
(select sq.*,
max(id) over(partition by sq.symbol) as max_id,
max(fiscal_year) over(partition by sq.symbol) as max_year
from
( select *,'2' as id
from test.key_ratios_cnd
union all
select *,'1' as id
from test.key_ratios_snd
) as sq
) as sq2
where id = max_id and fiscal_year = max_year
order by symbol asc
I would select a row from each table first and then combine them. Postgres has DISTINCT ON, which is perfect for this purpose.
select distinct on (symbol) sc.*
from ((select distinct on (cnd.symbol) cnd.*, 1 as ord
from test.key_ratios_cnd cnd
order by cnd.symbol, cnd.fiscal_year desc
) union all
(select distinct on (snd.symbol) snd.*, 2 as ord
from test.key_ratios_snd snd
order by snd.symbol, snd.fiscal_year desc
)
) sc
order by symbol, fiscal_year desc, ord;
To speed this up, add an index on (symbol, fiscal_year desc) to each table.
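SQLite has no DISTINCT ON. The sketch below therefore swaps in ROW_NUMBER() to pick one row per symbol with the same preference rule: highest fiscal_year first, then key_ratios_cnd (ord = 1) over key_ratios_snd (ord = 2). It runs through Python's sqlite3 module, which needs SQLite 3.25+ for window functions. The ratio column, the src label and the rows are hypothetical, and the test schema is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("key_ratios_cnd", "key_ratios_snd"):
    conn.execute(f"CREATE TABLE {table} (symbol TEXT, fiscal_year INTEGER, ratio INTEGER, "
                 "PRIMARY KEY (symbol, fiscal_year))")
conn.executemany("INSERT INTO key_ratios_cnd VALUES (?, ?, ?)",
                 [("AAA", 2020, 10), ("AAA", 2021, 11), ("BBB", 2019, 20)])
conn.executemany("INSERT INTO key_ratios_snd VALUES (?, ?, ?)",
                 [("AAA", 2021, 99), ("BBB", 2020, 25), ("CCC", 2018, 30)])

query = """
SELECT symbol, fiscal_year, ratio, src
FROM (
    -- rank rows per symbol: latest year first, cnd before snd on a tie
    SELECT u.*,
           ROW_NUMBER() OVER (PARTITION BY symbol
                              ORDER BY fiscal_year DESC, ord) AS rn
    FROM (
        SELECT symbol, fiscal_year, ratio, 'cnd' AS src, 1 AS ord FROM key_ratios_cnd
        UNION ALL
        SELECT symbol, fiscal_year, ratio, 'snd' AS src, 2 AS ord FROM key_ratios_snd
    ) u
)
WHERE rn = 1
ORDER BY symbol
"""
result = conn.execute(query).fetchall()
print(result)  # [('AAA', 2021, 11, 'cnd'), ('BBB', 2020, 25, 'snd'), ('CCC', 2018, 30, 'snd')]
```

AAA has 2021 in both tables, so the cnd row wins. BBB's latest year exists only in snd, so that row is kept.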

Select MAX Value for Each ROW - Oracle Sql

I have a question.
I need to find the latest occurrence for a specific list of customers; to simplify, let's say I need it for 3 customers out of 100.
I need to check when each of them last got a bonus.
The table would be:
EVENT_TBL
Fields: Account ID, EVENT_DATE, BONUS ID, ....
Can you suggest a way to get the latest (MAX) EVENT_DATE, i.e. one row per account?
I'm using SELECT ... IN to specify the Account IDs, but I'm not sure how to use MAX, GROUP BY, etc. (if they are needed at all).
Use the ROW_NUMBER() analytic function:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( PARTITION BY Account_id ORDER BY event_date DESC ) AS rn
FROM EVENT_TBL t
WHERE Account_ID IN ( 123, 456, 789 )
)
WHERE rn = 1
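The same ROW_NUMBER() pattern can be sketched in SQLite via Python's sqlite3 module, which needs SQLite 3.25+ for window functions. The account_id, event_date and bonus_id column names and the rows are assumptions. Dates are stored as ISO-8601 text, which sorts chronologically, in place of Oracle DATE values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ISO-8601 text dates sort chronologically; they stand in for Oracle DATE here.
conn.execute("CREATE TABLE event_tbl (account_id INTEGER, event_date TEXT, bonus_id INTEGER)")
conn.executemany("INSERT INTO event_tbl VALUES (?, ?, ?)", [
    (123, "2021-01-10", 1), (123, "2021-03-05", 2),
    (456, "2020-12-31", 3), (456, "2021-02-01", 4),
    (789, "2019-06-30", 5),
    (999, "2022-01-01", 6),   # not in the list we ask for
])

query = """
SELECT account_id, event_date, bonus_id
FROM (
    -- rn = 1 marks the most recent event of each account
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY event_date DESC) AS rn
    FROM event_tbl t
    WHERE account_id IN (123, 456, 789)
)
WHERE rn = 1
ORDER BY account_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(123, '2021-03-05', 2), (456, '2021-02-01', 4), (789, '2019-06-30', 5)]
```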
Alternatively, you can join each row to its account's maximum date:
with AccountID_Max_EVENT_DATE as (
select AccountID, max(EVENT_DATE) MAX_D
from EVENT_TBL
where AccountID in (123, 456, 789)
group by AccountID
)
SELECT E.*
FROM EVENT_TBL E
INNER JOIN AccountID_Max_EVENT_DATE M
ON (E.AccountID = M.AccountID AND M.MAX_D = E.EVENT_DATE)
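One behavioral difference from ROW_NUMBER() is worth seeing. If an account has several events on its latest date, the join approach returns all of them. Here is a sketch in SQLite via Python's sqlite3 module; the column names and rows are assumptions, and the dates are ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_tbl (account_id INTEGER, event_date TEXT, bonus_id INTEGER)")
conn.executemany("INSERT INTO event_tbl VALUES (?, ?, ?)", [
    (123, "2021-01-10", 1),
    (123, "2021-03-05", 2),
    (123, "2021-03-05", 7),   # same latest date as bonus 2: a tie
    (456, "2021-02-01", 4),
    (999, "2022-01-01", 6),   # filtered out by the IN list
])

query = """
WITH max_event AS (
    -- latest event date per requested account
    SELECT account_id, MAX(event_date) AS max_d
    FROM event_tbl
    WHERE account_id IN (123, 456)
    GROUP BY account_id
)
SELECT e.account_id, e.event_date, e.bonus_id
FROM event_tbl e
JOIN max_event m
  ON e.account_id = m.account_id AND e.event_date = m.max_d
ORDER BY e.account_id, e.bonus_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(123, '2021-03-05', 2), (123, '2021-03-05', 7), (456, '2021-02-01', 4)]
```

If exactly one row per account is required, ties must be broken explicitly. For example, ROW_NUMBER() with an extra ORDER BY column does this.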

ERROR: ORA-00923: FROM keyword not found where expected

I tried to fetch data from an Oracle SQL table along with the count of records, like this:
SELECT *,
(COUNT(BRAND_ID) AS TOTAL)
FROM
(
SELECT BRAND_ID,
BRAND_CODE,
BRAND_TITLE
FROM BRAND
WHERE ACTIVE = '1'
ORDER BY BRAND_TITLE ASC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY
) BRAND
LEFT JOIN
((
SELECT PRODUCT_ID,
PRODUCT_SKU_ID,
PRODUCT_WEB_ID,
PRODUCT_TITLE,
PRODUCT_SALES_PRICE,
PRODUCT_REGULAR_PRICE,
PRODUCT_RATING
FROM PRODUCT
WHERE
(
PRODUCT_TYPE='B'
OR PRODUCT_TYPE='R'
)
AND AVAILABILITY='1'
) PRDUCT ) ON BRAND.BRAND_CODE= PRDUCT.BRAND_CODE
When I'm executing this I got the following error,
ERROR: ORA-00923: FROM keyword not found where expected
How can I fix this?
Thanks in Advance!
I guess you should remove the * from the SELECT list in the first line, and the alias must not be inside the parentheses. Try the one below:
SELECT COUNT(BRAND_ID) AS TOTAL
FROM
(
SELECT BRAND_ID,
BRAND_CODE,
BRAND_TITLE
FROM BRAND
WHERE ACTIVE = '1'
ORDER BY BRAND_TITLE ASC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY
) BRAND
LEFT JOIN
((
SELECT BRAND_CODE,
PRODUCT_ID,
PRODUCT_SKU_ID,
PRODUCT_WEB_ID,
PRODUCT_TITLE,
PRODUCT_SALES_PRICE,
PRODUCT_REGULAR_PRICE,
PRODUCT_RATING
FROM PRODUCT
WHERE
(
PRODUCT_TYPE='B'
OR PRODUCT_TYPE='R'
)
AND AVAILABILITY='1'
) PRDUCT ) ON BRAND.BRAND_CODE= PRDUCT.BRAND_CODE
You are using an aggregate function in the SELECT list, so you cannot simply use SELECT * for the other columns.
First, give aliases to the columns selected in the inner queries, for convenience.
Then select those columns in the outer SELECT.
Since one of the columns in the SELECT uses an aggregate function, all the other selected columns must appear in a GROUP BY.
Here, for convenience, I named the columns c2, c3, ...; rename them as you like.
If you don't give aliases, refer to the columns by their original names.
SELECT c2,c3,c4,c5,c6,c7,c8,c9,c10,
COUNT(BRAND_ID) AS TOTAL
FROM
(
SELECT BRAND_ID ,
BRAND_CODE AS c2,
BRAND_TITLE AS c3
FROM BRAND
WHERE ACTIVE = '1'
ORDER BY BRAND_TITLE ASC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY
) BRAND
LEFT JOIN
((
SELECT BRAND_CODE,
PRODUCT_ID AS c4,
PRODUCT_SKU_ID AS c5,
PRODUCT_WEB_ID AS c6,
PRODUCT_TITLE AS c7,
PRODUCT_SALES_PRICE AS c8,
PRODUCT_REGULAR_PRICE AS c9,
PRODUCT_RATING AS c10
FROM PRODUCT
WHERE
(
PRODUCT_TYPE='B'
OR PRODUCT_TYPE='R'
)
AND AVAILABILITY='1'
) PRDUCT ) ON BRAND.c2 = PRDUCT.BRAND_CODE
Group By c2,c3,c4,c5,c6,c7,c8,c9,c10
I don't have 12c, so can't test, but maybe this is what you're after?
SELECT *
FROM
(
SELECT BRAND_ID,
BRAND_CODE,
BRAND_TITLE,
COUNT(BRAND_ID) OVER () AS TOTAL
FROM BRAND
WHERE ACTIVE = '1'
ORDER BY BRAND_TITLE ASC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY
) BRAND
LEFT JOIN
((
SELECT BRAND_CODE,
PRODUCT_ID,
PRODUCT_SKU_ID,
PRODUCT_WEB_ID,
PRODUCT_TITLE,
PRODUCT_SALES_PRICE,
PRODUCT_REGULAR_PRICE,
PRODUCT_RATING
FROM PRODUCT
WHERE
(
PRODUCT_TYPE='B'
OR PRODUCT_TYPE='R'
)
AND AVAILABILITY='1'
) PRDUCT ) ON BRAND.BRAND_CODE= PRDUCT.BRAND_CODE;
This uses an analytic function to get the count of all brand_ids that pass the WHERE filter, before OFFSET/FETCH limits the rows. I'm not sure whether you wanted the count per brand_id (count(*) over (partition by brand_id)) or perhaps the count of distinct brand_ids (count(distinct brand_id) over ()), though, so you'll have to experiment with the count function to get the results you're after.
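The key point is that an analytic COUNT(...) OVER () is evaluated before the row limit, so every returned row carries the total. The sketch below shows this in SQLite via Python's sqlite3 module, which needs SQLite 3.25+ for window functions. SQLite's LIMIT ... OFFSET stands in for Oracle 12c's OFFSET ... FETCH NEXT. The brand rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brand (brand_id INTEGER PRIMARY KEY, brand_title TEXT, active TEXT)")
# 25 active brands (titles zero-padded so they sort by number) and 5 inactive ones.
conn.executemany("INSERT INTO brand VALUES (?, ?, ?)",
                 [(i, f"Brand {i:02d}", "1" if i <= 25 else "0") for i in range(1, 31)])

query = """
SELECT brand_id, brand_title, COUNT(*) OVER () AS total
FROM brand
WHERE active = '1'
ORDER BY brand_title
LIMIT 10 OFFSET 10
"""
page = conn.execute(query).fetchall()
print([row[0] for row in page])   # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
print({row[2] for row in page})   # {25}: the total is counted before LIMIT/OFFSET
```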

"Group" some rows together before sorting (Oracle)

I'm using Oracle Database 11g.
I have a query that selects, among other things, an ID and a date from a table. Basically, what I want to do is keep the rows that have the same ID together, and then sort those "groups" of rows by the most recent date in the "group".
So if my original result was this:
ID Date
3 11/26/11
1 1/5/12
2 6/3/13
2 10/15/13
1 7/5/13
The output I'm hoping for is:
ID Date
3 11/26/11 <-- (Using this date for "group" ID = 3)
1 1/5/12
1 7/5/13 <-- (Using this date for "group" ID = 1)
2 6/3/13
2 10/15/13 <-- (Using this date for "group" ID = 2)
Is there any way to do this?
One way to get this is by using analytic functions; I don't have an example of that handy.
This is another way to get the specified result, without using an analytic function (this is ordering first by the most_recent_date for each ID, then by ID, then by Date):
SELECT t.ID
, t.Date
FROM mytable t
JOIN ( SELECT s.ID
, MAX(s.Date) AS most_recent_date
FROM mytable s
WHERE s.Date IS NOT NULL
GROUP BY s.ID
) r
ON r.ID = t.ID
ORDER
BY r.most_recent_date
, t.ID
, t.Date
The "trick" here is to return "most_recent_date" for each ID, and then join that to each row. The result can be ordered by that first, then by whatever else.
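The join-with-MAX ordering can be checked against the sample data from the question. This is a sketch in SQLite via Python's sqlite3 module. The table name mytable comes from the answer. The date column is renamed date_col (an assumption) and holds ISO-8601 text so that it sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date_col TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [
    (3, "2011-11-26"), (1, "2012-01-05"), (2, "2013-06-03"),
    (2, "2013-10-15"), (1, "2013-07-05"),
])

query = """
SELECT t.id, t.date_col
FROM mytable t
JOIN (
    -- most recent date per id, used as the group's sort key
    SELECT s.id, MAX(s.date_col) AS most_recent_date
    FROM mytable s
    WHERE s.date_col IS NOT NULL
    GROUP BY s.id
) r ON r.id = t.id
ORDER BY r.most_recent_date, t.id, t.date_col
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

This prints the groups in the order 3, 1, 2, with each group's rows in date order, matching the desired output in the question.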
You can use the MAX ... KEEP aggregate as an analytic function (with OVER) to create your sort key:
with
sample_data as
(select 3 id, to_date('11/26/11','MM/DD/RR') date_col from dual union all
select 1, to_date('1/5/12','MM/DD/RR') date_col from dual union all
select 2, to_date('6/3/13','MM/DD/RR') date_col from dual union all
select 2, to_date('10/15/13','MM/DD/RR') date_col from dual union all
select 1, to_date('7/5/13','MM/DD/RR') date_col from dual)
select
id,
date_col,
-- For illustration purposes, does not need to be selected:
max(date_col) keep (dense_rank last order by date_col) over (partition by id) sort_key
from sample_data
order by max(date_col) keep (dense_rank last order by date_col) over (partition by id);
Here is the query using analytic functions:
select
id
, date_
, max(date_) over (partition by id) as max_date
from table_name
order by max_date, id, date_
;
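The analytic version can be sketched the same way in SQLite via Python's sqlite3 module, which needs SQLite 3.25+ for window functions. The column is again renamed date_col, an assumption, with ISO-8601 text dates. The trailing date_col in ORDER BY keeps the rows inside each group in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date_col TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [
    (3, "2011-11-26"), (1, "2012-01-05"), (2, "2013-06-03"),
    (2, "2013-10-15"), (1, "2013-07-05"),
])

query = """
SELECT id,
       date_col,
       MAX(date_col) OVER (PARTITION BY id) AS max_date  -- group's most recent date
FROM mytable
ORDER BY max_date, id, date_col
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Unlike the join version, this reads the table only once. The per-group maximum is attached to every row by the window function.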