Bigquery - aggregate while filtering out values

I'm sure this question has been answered elsewhere but I can't find it.
I have a table of invoices like
id  company  index  date_sent   amount
1   Com1     1      2022-01-01  100
2   Com1     2      2022-02-01  100
3   Com1     3      2022-03-01  100
4   Com1     4      2022-04-01  100
5   Com2     1      2022-02-01  100
6   Com2     2      2022-03-01  100
7   Com2     3      2022-04-01  100
8   Com3     1      2022-01-01  100
9   Com3     2      2022-02-01  100
10  Com4     1      2022-01-01  100
(Index here is added by basically doing RANK() OVER (PARTITION BY co ORDER BY date_sent) as index)
I'd like to return the companies that have at least 3 invoices, along with the sum of their first 3 invoices and the date the 3rd invoice was sent.
For example, for the data above, the returned data should be:
company  date_3rd    amount_sum_3
Com1     2022-03-01  300
Com2     2022-04-01  300
So far I've got:
select company,
(select sum(amount) from grouped_invs.amount_sum_3 amount) as amount_sum_3,
from (
select company,
array_agg(invoices.amount order by invoices.index limit 3) amount_sum_3,
from `data` invoices
group by invoices.company
having count(*) >= 3
) grouped_invs
which gives me
company  amount_sum_3
Com1     300
Com2     300
But I can't figure out how to get the 3rd date sent out from there.
Thanks in advance

You might consider the query below:
SELECT (SELECT AS STRUCT
ANY_VALUE(company) AS company,
MAX(date_sent) date_3rd,
SUM(amount) amount_sum_3
FROM grouped_invs.amount_sum_3).*
FROM (
SELECT ARRAY_AGG(invoices ORDER BY index LIMIT 3) amount_sum_3
FROM `data` invoices
GROUP BY invoices.company HAVING COUNT(*) >= 3
) grouped_invs;
Assuming your data already has an index column, the query below returns the same results:
SELECT company, MAX(date_sent) date_3rd, SUM(amount) amount_sum_3
FROM (
SELECT * FROM `data` invoices
WHERE index <= 3
QUALIFY COUNT(*) OVER (PARTITION BY company) >= 3
)
GROUP BY 1;
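To check the logic against the sample data without a BigQuery project, here is a minimal sketch that runs a variation of the second query in SQLite through Python's sqlite3 module. It is not the answer's exact query: SQLite has no QUALIFY, so this version filters to the first three invoices and then uses HAVING to drop companies with fewer than three. The table name invoices, the column name idx (standing in for index) and the function name are assumptions made for this example, and it also assumes the index values are unique within each company.

```python
import sqlite3

# Sample rows from the question: (id, company, idx, date_sent, amount).
# "idx" stands in for the question's "index" column (an assumed rename,
# because INDEX is a reserved word in SQLite).
INVOICES = [
    (1, "Com1", 1, "2022-01-01", 100),
    (2, "Com1", 2, "2022-02-01", 100),
    (3, "Com1", 3, "2022-03-01", 100),
    (4, "Com1", 4, "2022-04-01", 100),
    (5, "Com2", 1, "2022-02-01", 100),
    (6, "Com2", 2, "2022-03-01", 100),
    (7, "Com2", 3, "2022-04-01", 100),
    (8, "Com3", 1, "2022-01-01", 100),
    (9, "Com3", 2, "2022-02-01", 100),
    (10, "Com4", 1, "2022-01-01", 100),
]


def first_three_summary(rows):
    """Return (company, date_3rd, amount_sum_3) for companies with >= 3 invoices."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices "
        "(id INTEGER, company TEXT, idx INTEGER, date_sent TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT company, MAX(date_sent) AS date_3rd, SUM(amount) AS amount_sum_3
        FROM invoices
        WHERE idx <= 3            -- keep only each company's first three invoices
        GROUP BY company
        HAVING COUNT(*) >= 3      -- drop companies with fewer than three invoices
        ORDER BY company
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(first_three_summary(INVOICES))
```

Because the dates are ISO-formatted strings, MAX(date_sent) over the first three rows is the date of the 3rd invoice.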

Related

Grouping and Summarize SQL

My table looks like the following:
income  date        productid  invoiceid  customerid
300     2015-01-01  A          1234551    1
300     2016-01-02  A          1234552    1
300     2016-01-03  B          1234553    2
300     2016-01-03  A          1234553    2
300     2016-01-04  C          1234554    3
300     2016-01-04  C          1234554    3
300     2016-01-08  A          1234556    3
300     2016-01-08  B          1234556    3
300     2016-01-11  C          1234557    3
I need to know, for each number of invoices per customer, how many customers have that many invoices (for example: one invoice = several customers, two invoices = two customers, three invoices = three customers, and so on).
What is the syntax for this query?
In my sample data above, customer 1 has two invoices, customer 2 one invoice and customer 3 three invoices. So there is one customer each with a count of 1, 2, and 3 invoices in my example.
Expected result:
invoice_count  customers_with_this_invoice_count
1              1
2              1
3              1
I tried this syntax and I'm still stuck:
select * from
(
select CustomerID,count(distinct InvoiceID) as 'Total Invoices'
from exam
GROUP BY CustomerID
) a

Select Count(customerID),CustomerID From a
Group By customerID
Having Count(customerID) > 1
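One possible way to get the expected result is to aggregate twice: first count distinct invoices per customer, then count customers per invoice count. The sketch below tries this in SQLite through Python's sqlite3 module, using the sample rows above. The column name inv_date (in place of date), the subquery alias per_customer and the function name are assumptions made for this example, and the SQL may need small adjustments for other database engines.

```python
import sqlite3

# Sample rows from the question:
# (income, inv_date, productid, invoiceid, customerid).
EXAM_ROWS = [
    (300, "2015-01-01", "A", 1234551, 1),
    (300, "2016-01-02", "A", 1234552, 1),
    (300, "2016-01-03", "B", 1234553, 2),
    (300, "2016-01-03", "A", 1234553, 2),
    (300, "2016-01-04", "C", 1234554, 3),
    (300, "2016-01-04", "C", 1234554, 3),
    (300, "2016-01-08", "A", 1234556, 3),
    (300, "2016-01-08", "B", 1234556, 3),
    (300, "2016-01-11", "C", 1234557, 3),
]


def customers_per_invoice_count(rows):
    """Return (invoice_count, number_of_customers) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE exam (income INTEGER, inv_date TEXT, productid TEXT, "
        "invoiceid INTEGER, customerid INTEGER)"
    )
    conn.executemany("INSERT INTO exam VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT invoice_count,
               COUNT(*) AS customers_with_this_invoice_count
        FROM (
            -- inner level: one row per customer with its distinct invoice count
            SELECT customerid, COUNT(DISTINCT invoiceid) AS invoice_count
            FROM exam
            GROUP BY customerid
        ) AS per_customer
        GROUP BY invoice_count    -- outer level: customers per invoice count
        ORDER BY invoice_count
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(customers_per_invoice_count(EXAM_ROWS))
```

COUNT(DISTINCT invoiceid) matters here because one invoice can span several product rows.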

Access sql Moving Average of Top N With 2 criterias

I have been searching the forum and found a single post that is a little similar to my problem: Calculate average for Top n combined with SQL Group By.
My situation is:
I have a table tblWEIGHT that contains: ID, Date, idPONR, Weight
I have a second table tblSALES that contains: ID, Date, Sales, idPONR
I have a third table tblPONR that contains: ID, PONR, idProduct
And a fourth table tblPRODUCT that contains: ID, Product
The linking:
tblWEIGHT.idPONR = tblPONR.ID
tblSALES.idPONR = tblPONR.ID
tblPONR.idProduct = tblPRODUCT.ID
The main table of my query is tblSALES. I want all my sales listed, with the moving average of the top 5 weights of the PRODUCT where the date of the weight is earlier than the sales date and the product is the same as the sold product. It's IMPORTANT that the result isn't grouped by the date. I need all the records of tblSALES.
I have gotten as far as getting the top 1 weight, but I'm not able to get the moving average instead.
The query that gets the top 1 is the following, and I am guessing that the query I need is going to look a lot like it.
SELECT tblSALES.ID, tblSALES.Date, tblPONR.idPRODUCT,
(
SELECT top 1 Weight FROM tblWEIGHT INNER JOIN tblPONR ON tblWeight.idPONR = tblPONR.ID
WHERE tblPONR.idPRODUCT = idPRODUCT AND
tblSALES.Date > tblWEIGHT.Date
ORDER BY tblWEIGHT.Date desc
) AS LatestWeight
FROM tblSALES INNER JOIN tblPONR ON tblSALES.idPONR = tblPONR.ID
This is not my exact query, since I'm Danish and the original names wouldn't make sense. I know I'm not supposed to use Date as a field name.
I imagine the final query would be something like:
SELECT tblSALES.ID..... avg(SELECT TOP 5 weight .........)
but when I do this I keep getting the error "At most one record can be returned by this subquery".
Final question:
How do I make a query that creates a moving average of the top 5 weights of my sold product, where the date of the weight is earlier than the date I sold the product?
EDIT Sampledata:
DATEFORMAT: dd/mm/yyyy
tblWEIGHT
ID Date idPONR Weight
1 01-01-2020 1 100
2 02-01-2020 2 200
3 03-01-2020 3 200
4 04-01-2020 3 400
5 05-01-2020 2 250
6 06-01-2020 1 150
7 07-01-2020 2 200
tblSALES
ID Date Sales(amt) idPONR
1 05-01-2020 30 1
2 06-01-2020 15 2
3 10-01-2020 20 3
tblPONR
ID PONR(production Number) idProduct
1 2521 1
2 1548 1
3 5484 2
tblPRODUCT
ID Product
1 Bricks
2 Tiles
Desired outcome (read the comments for AvgWeight):
tblSALES.ID tblSALES.Date tblSales.Sales(amt) AvgWeight
1 05-01-2020 30 123 -->avg(top 5 newest weight of both idPONR 1 And 2 because they are the same product, and where tblWeight.Date<05-01-2020)
2 06-01-2020 15 123 -->avg(top 5 newest weight of both idPONR 1 And 2 because they are the same product, and where tblWeight.Date<06-01-2020)
3 10-01-2020 20 123 -->avg(top 5 newest weight of idPONR 3 since thats the only idPONR with that product, and where tblWeight.Date<10-01-2020)

Consider:
Query1
SELECT tblWeight.ID AS WeightID, tblWeight.Date AS WtDate,
tblWeight.idPONR, tblPONR.PONR, tblPONR.idProduct, tblWeight.Weight, tblSales.SalesAmt,
tblSales.ID AS SalesID, tblSales.Date AS SalesDate
FROM (tblPONR INNER JOIN tblWeight ON tblPONR.ID = tblWeight.idPONR)
INNER JOIN tblSales ON tblPONR.ID = tblSales.idPONR;
Query2
SELECT * FROM Query1 WHERE WeightID IN (
SELECT TOP 5 WeightID FROM Query1 AS Dupe WHERE Dupe.idProduct = Query1.idProduct
AND Dupe.WtDate<Query1.SalesDate ORDER BY Dupe.WtDate DESC);
Query3
SELECT Query2.SalesID, Query2.SalesDate, Query2.SalesAmt,
First(DAvg("Weight","Query2","idProduct=" & [idProduct] & " AND WtDate<#" & [SalesDate] & "#")) AS AvgWt
FROM Query2
GROUP BY Query2.SalesID, Query2.SalesDate, Query2.SalesAmt;
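For comparison, here is a rough sketch of the same idea as a single correlated subquery, run in SQLite through Python's sqlite3 module rather than in Access. It is a swapped-in technique, not the three saved queries above: LIMIT replaces TOP, the date columns are renamed WtDate and SaleDate, and the sample dates are rewritten as yyyy-mm-dd strings so that plain string comparison orders them correctly. The function name avg_recent_weights and its parameter n are hypothetical.

```python
import sqlite3

# Sample data from the question, with dates rewritten as yyyy-mm-dd.
WEIGHTS = [  # (ID, WtDate, idPONR, Weight)
    (1, "2020-01-01", 1, 100),
    (2, "2020-01-02", 2, 200),
    (3, "2020-01-03", 3, 200),
    (4, "2020-01-04", 3, 400),
    (5, "2020-01-05", 2, 250),
    (6, "2020-01-06", 1, 150),
    (7, "2020-01-07", 2, 200),
]
SALES = [  # (ID, SaleDate, Sales, idPONR)
    (1, "2020-01-05", 30, 1),
    (2, "2020-01-06", 15, 2),
    (3, "2020-01-10", 20, 3),
]
PONR = [(1, 2521, 1), (2, 1548, 1), (3, 5484, 2)]  # (ID, PONR, idProduct)


def avg_recent_weights(n=5):
    """For every sale, average the n newest earlier weights of the same product."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblWEIGHT (ID INTEGER, WtDate TEXT, idPONR INTEGER, Weight REAL);
        CREATE TABLE tblSALES (ID INTEGER, SaleDate TEXT, Sales INTEGER, idPONR INTEGER);
        CREATE TABLE tblPONR (ID INTEGER, PONR INTEGER, idProduct INTEGER);
    """)
    conn.executemany("INSERT INTO tblWEIGHT VALUES (?, ?, ?, ?)", WEIGHTS)
    conn.executemany("INSERT INTO tblSALES VALUES (?, ?, ?, ?)", SALES)
    conn.executemany("INSERT INTO tblPONR VALUES (?, ?, ?)", PONR)
    query = """
        SELECT s.ID, s.SaleDate, s.Sales,
               (SELECT AVG(w.Weight)
                FROM tblWEIGHT AS w
                WHERE w.ID IN (
                    SELECT w2.ID
                    FROM tblWEIGHT AS w2
                    JOIN tblPONR AS wp ON w2.idPONR = wp.ID
                    WHERE wp.idProduct = sp.idProduct  -- same product as the sale
                      AND w2.WtDate < s.SaleDate       -- weighed before the sale
                    ORDER BY w2.WtDate DESC            -- newest first
                    LIMIT ?)) AS AvgWeight
        FROM tblSALES AS s
        JOIN tblPONR AS sp ON s.idPONR = sp.ID
        ORDER BY s.ID
    """
    rows = conn.execute(query, (n,)).fetchall()
    conn.close()
    return rows


for row in avg_recent_weights():
    print(row)
```

Every sale keeps its own row because the average is computed per sale in the SELECT list instead of through a GROUP BY on the date. With the sample data no sale has more than three earlier weights, so the limit of 5 only starts to matter on larger tables.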

Need to group records based on matching reversal in sql

I have a tricky scenario to aggregate the data.
Data in my source table is as follows.
CustomerId Transaction Type Transaction Amount
1 Payment 100
1 ReversePayment -100
1 payment 100
1 ReversePayment -100
1 Payment 100
1 Payment 100
Requirement is as follows:
If the payment has an associated ReversePayment with a matching amount, sum these two records.
If the payment does not have an associated ReversePayment, consider it an orphan (don't sum it).
I want output to be like this.
CustomerId Transaction Type Transaction Amount
1 Payment,ReversePayment 0
1 payment,ReversePayment 0
1 payment 100
1 Payment 100
In this scenario:
The first record, which is a payment, has an associated reverse payment (2nd record), hence the sum becomes 0.
The third record, which is a payment, has an associated reverse payment (4th record), so the sum becomes 0.
The fifth and sixth records do not have associated reversals. Don't sum these records.
Second Example:
Data in the source as follows:
CustomerId Transaction Type Transaction Amount
1 Payment 100
1 ReversePayment -100
1 payment 300
1 ReversePayment -300
1 Payment 400
1 Payment 500
Expected Output
CustomerId Transaction Type Transaction Amount
1 Payment,ReversePayment 0
1 payment,ReversePayment 0
1 payment 400
1 Payment 500
Second example requirement:
- As the first and second records (a payment and its associated reverse payment) match, sum these two records; the output is 0.
- As the third and fourth records (a payment and its associated reverse payment) match, sum these two records; the output is 0.
- The fifth and sixth records do not have associated reversals. Don't sum these records.
I got solutions in the group, but the orphan records are not always guaranteed to be 'Payments'. Sometimes they are 'Payments' and sometimes they are 'ReversePayments'. Can someone help me get output like the below (using rank or row number functions) so that I can group by the RRR column?
CustomerId Transaction Type Transaction Amount RRR
1 Payment 100 1
1 ReversePayment -100 1
1 payment 100 2
1 ReversePayment -100 2
1 Payment 100 3
1 Payment 100 4
CustomerId Transaction Type Transaction Amount RRR
1 Payment 100 1
1 ReversePayment -100 1
1 payment 300 2
1 ReversePayment -300 2
1 Payment 400 3
1 Payment 500 4

You can enumerate the different types and then aggregate:
select customerid,
listagg(ttype, ',') within group (order by ttype) as types,
sum(amount) as amount
from (select t.*,
row_number() over (partition by customerid, ttype, amount order by customerid) as seqnum
from t
) t
group by customerid, seqnum;
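The row-numbering idea can be tried on both examples with the sketch below, which runs in SQLite (3.25 or later, for window functions) through Python's sqlite3 module. It is a variation, not the query above verbatim: it lowercases the type before partitioning (the sample mixes 'Payment' and 'payment'), adds ABS(amount) to the GROUP BY so that different amounts are not merged, and uses GROUP_CONCAT in place of listagg. The txn_id ordering column, the table layout and the function name are assumptions made for this example.

```python
import sqlite3

# (txn_id, customer_id, ttype, amount); txn_id is a hypothetical column that
# records the order in which the rows are listed in the question.
EXAMPLE_1 = [
    (1, 1, "Payment", 100),
    (2, 1, "ReversePayment", -100),
    (3, 1, "payment", 100),
    (4, 1, "ReversePayment", -100),
    (5, 1, "Payment", 100),
    (6, 1, "Payment", 100),
]
EXAMPLE_2 = [
    (1, 1, "Payment", 100),
    (2, 1, "ReversePayment", -100),
    (3, 1, "payment", 300),
    (4, 1, "ReversePayment", -300),
    (5, 1, "Payment", 400),
    (6, 1, "Payment", 500),
]


def pair_reversals(rows):
    """Pair the n-th payment of an amount with the n-th reversal of that amount."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (txn_id INTEGER, customer_id INTEGER, ttype TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT customer_id,
               GROUP_CONCAT(ttype, ',') AS types,
               SUM(amount) AS amount
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id, LOWER(ttype), amount
                       ORDER BY txn_id
                   ) AS seqnum               -- plays the role of the RRR column
            FROM t
        ) AS numbered
        GROUP BY customer_id, ABS(amount), seqnum
        ORDER BY MIN(txn_id)
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in pair_reversals(EXAMPLE_2):
    print(row)
```

Note that this pairs rows by occurrence count, not by adjacency, so a reversal is matched with the n-th payment of the same amount wherever it appears. The order of the names inside the concatenated types string is not guaranteed by SQLite.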

Edited to include your second scenario:
This uses rownum to enforce an inherent ordering (i.e. the transactions happened in the order you've listed), since your example is missing a transaction id or transaction time.
SQL> select * from trans_data2;
CUSTOMER_ID TRANSACTION_TY TRANSACTION_AMOUNT
----------- -------------- ------------------
1 Payment 100
1 ReversePayment -100
1 payment 300
1 ReversePayment -300
1 Payment 400
1 Payment 500
6 rows selected.
SQL> select customer_id,
       case
         when upper(next_transaction) = 'REVERSEPAYMENT' then transaction_type||','||next_transaction
         else transaction_type
       end transaction_type,
       case
         when upper(next_transaction) = 'REVERSEPAYMENT' then transaction_amount + next_transaction_amount
         else transaction_amount
       end transaction_amount
     from (
       select customer_id, transaction_type, transaction_amount,
         lead (transaction_type) over ( partition by customer_id order by transaction_id ) next_transaction,
         nvl(lead (transaction_amount) over ( partition by customer_id order by transaction_id),0) next_transaction_amount
       from ( select rownum transaction_id, t.* from trans_data2 t )
     ) where upper(transaction_type) = 'PAYMENT'
     ;
CUSTOMER_ID TRANSACTION_TYPE TRANSACTION_AMOUNT
----------- ----------------------------- ------------------
1 Payment,ReversePayment 0
1 payment,ReversePayment 0
1 Payment 400
1 Payment 500

How to get latest records based on two columns of max

I have a table called Inventory with the below columns
item warehouse date sequence number value
111 100 2019-09-25 12:29:41.000 1 10
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 1 5
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-19 12:05:23.000 1 4
333 300 2020-01-20 12:05:23.000 1 5
Expected Output:
item warehouse date sequence number value
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-20 12:05:23.000 1 5
For each item and warehouse, I need to pick the value with the latest date and, within that date, the latest sequence number.
I tried with below code
select item,warehouse,sequencenumber,sum(value),max(date) as date1
from Inventory t1
where
t1.date IN (select max(date) from Inventory t2
where t1.warehouse=t2.warehouse
and t1.item = t2.item
group by t2.item,t2.warehouse)
group by t1.item,t1.warehouse,t1.sequencenumber
It's working for the latest date but not for the latest sequence number.
Can you please suggest how to write a query that gives my expected output?

You can use row_number() for this:
select *
from (
select
t.*,
row_number() over(
partition by item, warehouse
order by date desc, sequence_number desc, value desc
) rn
from mytable t
) t
where rn = 1
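As a quick check against the sample rows, the sketch below runs the same row_number() pattern in SQLite (3.25 or later, for window functions) through Python's sqlite3 module. The table name inventory, the column names inv_date and sequence_number, and the function name are assumptions made for this example. The final tie-breaker on value is left out because the sample has no ties on date and sequence number.

```python
import sqlite3

# Sample rows from the question:
# (item, warehouse, inv_date, sequence_number, value).
INVENTORY = [
    (111, 100, "2019-09-25 12:29:41.000", 1, 10),
    (111, 100, "2019-09-26 12:29:41.000", 1, 20),
    (222, 200, "2019-09-21 16:07:10.000", 1, 5),
    (222, 200, "2019-09-21 16:07:10.000", 2, 10),
    (333, 300, "2020-01-19 12:05:23.000", 1, 4),
    (333, 300, "2020-01-20 12:05:23.000", 1, 5),
]


def latest_rows(rows):
    """Return the newest row per (item, warehouse), breaking date ties by sequence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (item INTEGER, warehouse INTEGER, inv_date TEXT, "
        "sequence_number INTEGER, value INTEGER)"
    )
    conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT item, warehouse, inv_date, sequence_number, value
        FROM (
            SELECT inv.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY item, warehouse
                       ORDER BY inv_date DESC, sequence_number DESC
                   ) AS rn                   -- rn = 1 marks the latest row per group
            FROM inventory AS inv
        ) AS ranked
        WHERE rn = 1
        ORDER BY item
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in latest_rows(INVENTORY):
    print(row)
```

Item 222 shows why the second sort key is needed: both of its rows share the same timestamp, and only the sequence number separates them.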

Sum and subtract operations in select query with multiple joins

I have following tables:
TABLE ITEMS Contains
ITEM_ID ITEM
-------------------
1 Food
2 Medical
3 Shopping
4 Others
TABLE EXPENSE_DURATION Contains
E_ID NAME FROM_DATE TO_DATE
----------------------------------------------------------------
1 FEB_2012 1-Feb-2013 12:00:00 AM 28-Feb-2013 12:00:00 AM
2 MAR_2012 1-Mar-2013 12:00:00 AM 31-Mar-2013 12:00:00 AM
TABLE AMOUNT_FOR_EXPENSE Contains
AFE_ID E_ID ITEM_LIST AMOUNT
------------------------------------
1 1 1,2,3,4 5000
2 2 1,2,3,4 6000
TABLE EXPENSE Contains
EXPENSE_ID E_ID ITEM_ID DATE AMOUNT
---------------------------------------------------------------------
1 1 1 1-Feb-2013 12:00:00 AM 250
2 1 2 1-Feb-2013 12:00:00 AM 450
3 1 3 1-Feb-2013 12:00:00 AM 300
4 1 4 1-Feb-2013 12:00:00 AM 100
5 1 1 2-Feb-2013 12:00:00 AM 4500
6 1 2 2-Feb-2013 12:00:00 AM 3500
7 1 3 2-Feb-2013 12:00:00 AM 2000
8 1 4 2-Feb-2013 12:00:00 AM 1500
Now I want to make a stored procedure that gives me an expense summary. I am passing just E_ID as a parameter to this stored procedure.
As a result I need one table that contains the respective summary.
Example: E_ID=1
Result:
TOTAL_OUT TOTAL_IN SUMMARY (IN-OUT)
12600 5000 -7600
I know only
SELECT SUM(AMOUNT) FROM EXPENSE WHERE E_ID=1
Result > 12600
And
SELECT AMOUNT FROM AMOUNT_FOR_EXPENSE WHERE E_ID=1
Result > 5000
I know these two separate queries, but I don't know how to merge them or how to perform subtraction in a select query with joins.
Please help me write a select query / stored procedure so that I can generate the result I need.

You can use the following query to get the result:
select e.e_id,
e.Total_out,
a.amount Total_in,
(e.Total_out - a.amount) * -1 Summary
from
(
select sum(amount) Total_out,
e_id
from expense
group by e_id
) e
left join AMOUNT_FOR_EXPENSE a
on e.e_id = a.e_id
where e.e_id = 1
See SQL Fiddle with Demo
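For readers without access to the fiddle, here is a minimal sketch that runs an equivalent query in SQLite through Python's sqlite3 module, with E_ID passed as a bound parameter in place of a stored procedure argument. The summary is written as amount minus total_out, which equals the (total_out - amount) * -1 form above. The lowercase table and column names and the function name are assumptions made for this example, and the DATE column is omitted because the summary does not use it.

```python
import sqlite3

# Sample rows from the question, without the DATE column.
EXPENSES = [  # (expense_id, e_id, item_id, amount)
    (1, 1, 1, 250),
    (2, 1, 2, 450),
    (3, 1, 3, 300),
    (4, 1, 4, 100),
    (5, 1, 1, 4500),
    (6, 1, 2, 3500),
    (7, 1, 3, 2000),
    (8, 1, 4, 1500),
]
AMOUNTS_FOR_EXPENSE = [(1, 1, 5000), (2, 2, 6000)]  # (afe_id, e_id, amount)


def expense_summary(e_id):
    """Return (e_id, total_out, total_in, summary), or None if nothing was spent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expense "
        "(expense_id INTEGER, e_id INTEGER, item_id INTEGER, amount INTEGER)"
    )
    conn.execute(
        "CREATE TABLE amount_for_expense (afe_id INTEGER, e_id INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO expense VALUES (?, ?, ?, ?)", EXPENSES)
    conn.executemany("INSERT INTO amount_for_expense VALUES (?, ?, ?)", AMOUNTS_FOR_EXPENSE)
    query = """
        SELECT e.e_id,
               e.total_out,
               a.amount AS total_in,
               a.amount - e.total_out AS summary   -- IN minus OUT
        FROM (
            SELECT e_id, SUM(amount) AS total_out
            FROM expense
            GROUP BY e_id
        ) AS e
        LEFT JOIN amount_for_expense AS a ON a.e_id = e.e_id
        WHERE e.e_id = ?
    """
    row = conn.execute(query, (e_id,)).fetchone()
    conn.close()
    return row


print(expense_summary(1))
```

Because the query starts from the aggregated expense rows, a period with no expenses at all (E_ID 2 in the sample) produces no row.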

try this
SELECT SUM(AMOUNT),AMOUNT FROM EXPENSE ,AMOUNT_FOR_EXPENSE WHERE E_ID=1
