Grouping and Summarize SQL - sql

My table looks like the following:
income  date        productid  invoiceid  customerid
300     2015-01-01  A          1234551    1
300     2016-01-02  A          1234552    1
300     2016-01-03  B          1234553    2
300     2016-01-03  A          1234553    2
300     2016-01-04  C          1234554    3
300     2016-01-04  C          1234554    3
300     2016-01-08  A          1234556    3
300     2016-01-08  B          1234556    3
300     2016-01-11  C          1234557    3
I need to know, for each number of invoices per customer, how many customers have that many invoices (for example: one invoice = several customers, two invoices = two customers, three invoices = three customers, and so on).
What is the syntax for this query?
In my sample data above, customer 1 has two invoices, customer 2 one invoice and customer 3 three invoices. So there is one customer each with a count of 1, 2, and 3 invoices in my example.
Expected result:
invoice_count  customers_with_this_invoice_count
1              1
2              1
3              1
I tried this syntax and I'm still stuck:
select * from
(
select CustomerID,count(distinct InvoiceID) as 'Total Invoices'
from exam
GROUP BY CustomerID
) a

Select Count(customerID),CustomerID From a
Group By customerID
Having Count(customerID) > 1
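One way to get the expected result is to group twice: first count distinct invoices per customer, then count customers per invoice count. The sketch below runs that idea against the sample data using Python's built-in sqlite3 module. The table name exam comes from the attempt above; the alias per_customer and the use of SQLite are assumptions made only so the example can run, and the inner query should carry over to other databases with little change.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exam (income INTEGER, date TEXT, productid TEXT, "
    "invoiceid INTEGER, customerid INTEGER)"
)
rows = [
    (300, "2015-01-01", "A", 1234551, 1),
    (300, "2016-01-02", "A", 1234552, 1),
    (300, "2016-01-03", "B", 1234553, 2),
    (300, "2016-01-03", "A", 1234553, 2),
    (300, "2016-01-04", "C", 1234554, 3),
    (300, "2016-01-04", "C", 1234554, 3),
    (300, "2016-01-08", "A", 1234556, 3),
    (300, "2016-01-08", "B", 1234556, 3),
    (300, "2016-01-11", "C", 1234557, 3),
]
conn.executemany("INSERT INTO exam VALUES (?, ?, ?, ?, ?)", rows)

# Inner query: distinct invoices per customer.
# Outer query: number of customers sharing each invoice count.
query = """
SELECT invoice_count,
       COUNT(*) AS customers_with_this_invoice_count
FROM (
    SELECT customerid, COUNT(DISTINCT invoiceid) AS invoice_count
    FROM exam
    GROUP BY customerid
) AS per_customer
GROUP BY invoice_count
ORDER BY invoice_count
"""
result = conn.execute(query).fetchall()
print(result)
```

The key difference from the attempt above is that the outer query groups by the computed invoice count rather than by CustomerID.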

Related

Bigquery - aggregate while filtering out values

I'm sure this question has been answered elsewhere but I can't find it.
I have a table of invoices like
id  company  index  date_sent   amount
1   Com1     1      2022-01-01  100
2   Com1     2      2022-02-01  100
3   Com1     3      2022-03-01  100
4   Com1     4      2022-04-01  100
5   Com2     1      2022-02-01  100
6   Com2     2      2022-03-01  100
7   Com2     3      2022-04-01  100
8   Com3     1      2022-01-01  100
9   Com3     2      2022-02-01  100
10  Com4     1      2022-01-01  100
(Index here is added by basically doing RANK() OVER (PARTITION BY co ORDER BY date_sent) as index)
I'd like to return the companies that have at least 3 invoices, the sum of their first 3 invoices, and the date the 3rd invoice was sent.
For example, for the data above, the returned data should be:
company  date_3rd    amount_sum_3
Com1     2022-03-01  300
Com2     2022-04-01  300
So far I've got:
select company,
(select sum(amount) from grouped_invs.amount_sum_3 amount) as amount_sum_3,
from (
select company,
array_agg(invoices.amount order by invoices.index limit 3) amount_sum_3,
from `data` invoices
group by invoices.company
having count(*) >= 3
) grouped_invs
which gives me
company  amount_sum_3
Com1     300
Com2     300
But I can't figure out how to get the 3rd date sent out from there.
Thanks in advance
You might consider the query below:
SELECT (SELECT AS STRUCT
ANY_VALUE(company) AS company,
MAX(date_sent) date_3rd,
SUM(amount) amount_sum_3
FROM grouped_invs.amount_sum_3).*
FROM (
SELECT ARRAY_AGG(invoices ORDER BY index LIMIT 3) amount_sum_3
FROM `data` invoices
GROUP BY invoices.company HAVING COUNT(*) >= 3
) grouped_invs;
Assuming that your data already has an index column, the query below returns the same results.
SELECT company, MAX(date_sent) date_3rd, SUM(amount) amount_sum_3
FROM (
SELECT * FROM `data` invoices
WHERE index <= 3
QUALIFY COUNT(*) OVER (PARTITION BY company) >= 3
)
GROUP BY 1;
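For databases without QUALIFY, the same filter can be sketched with a plain WHERE plus HAVING. The example below uses Python's built-in sqlite3 module on the sample data. The column is renamed idx here (an assumption, to avoid the INDEX keyword in SQLite), and the sketch assumes the index values are unique per company, which RANK() does not guarantee when two invoices share a date.

```python
import sqlite3

# In-memory copy of the sample invoices; "idx" stands in for the "index" column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER, company TEXT, idx INTEGER, "
    "date_sent TEXT, amount INTEGER)"
)
rows = [
    (1, "Com1", 1, "2022-01-01", 100),
    (2, "Com1", 2, "2022-02-01", 100),
    (3, "Com1", 3, "2022-03-01", 100),
    (4, "Com1", 4, "2022-04-01", 100),
    (5, "Com2", 1, "2022-02-01", 100),
    (6, "Com2", 2, "2022-03-01", 100),
    (7, "Com2", 3, "2022-04-01", 100),
    (8, "Com3", 1, "2022-01-01", 100),
    (9, "Com3", 2, "2022-02-01", 100),
    (10, "Com4", 1, "2022-01-01", 100),
]
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?)", rows)

# Keep only the first three invoices per company, then drop companies
# that do not have all three.
query = """
SELECT company,
       MAX(date_sent) AS date_3rd,
       SUM(amount)    AS amount_sum_3
FROM data
WHERE idx <= 3
GROUP BY company
HAVING COUNT(*) >= 3
ORDER BY company
"""
result = conn.execute(query).fetchall()
print(result)
```

Because the rows are already limited to idx <= 3, COUNT(*) can never exceed 3 under the uniqueness assumption, so the HAVING clause only removes companies with fewer than three invoices.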

Aggregate payments per year per customer per type

Please consider the following payment data:
customerID  paymentID  paymentType  paymentDate  paymentAmount
--------------------------------------------------------------
1           1          A            2015-11-28   500
1           2          A            2015-11-29   -150
1           3          B            2016-03-07   300
2           4          A            2015-03-03   200
2           5          B            2016-05-25   -100
2           6          C            2016-06-24   700
1           7          B            2015-09-22   110
2           8          B            2016-01-03   400
I need to tally per year, per customer, the sum of the diverse payment types (A = invoice, B = credit note, etc), as follows:
year  customerID  paymentType  paymentSum
-----------------------------------------
2015  1           A            350  : paymentID 1 + 2
2015  1           B            110  : paymentID 7
2015  1           C            0
2015  2           A            200  : paymentID 4
2015  2           B            0
2015  2           C            0
2016  1           A            0
2016  1           B            300  : paymentID 3
2016  1           C            0
2016  2           A            0
2016  2           B            300  : paymentID 5 + 8
2016  2           C            700  : paymentID 6
It is important that there are values for every category (so for 2015, customer 1 has 0 payment value for type C, but still it is good to see this).
In reality, there are over 10 payment types and about 30 customers. The total date range is 10 years.
Is this possible to do in only SQL, and if so could somebody show me how? If possible by using relatively easy queries so that I can learn from it, for instance by storing intermediary result into a #temptable.
Any help is greatly appreciated!
A simple GROUP BY with SUM() on paymentAmount will give you what you want:
select year = datepart(year, paymentDate),
customerID,
paymentType,
paymentSum = sum(paymentAmount)
from payment_data
group by datepart(year, paymentDate), customerID, paymentType
This is a simple query that generates the required 0s. Note that it may not be the most efficient way to generate this result set. If you already have lookup tables for customers or payment types, it would be preferable to use those rather than the CTEs [1] I use here:
-- Table variable holding the sample data from the question
declare @t table (customerID int,paymentID int,paymentType char(1),paymentDate date,
                  paymentAmount int)
insert into @t(customerID,paymentID,paymentType,paymentDate,paymentAmount) values
(1,1,'A','20151128', 500),
(1,2,'A','20151129',-150),
(1,3,'B','20160307', 300),
(2,4,'A','20150303', 200),
(2,5,'B','20160525',-100),
(2,6,'C','20160624', 700),
(1,7,'B','20150922', 110),
(2,8,'B','20160103', 400)

-- Build every customer / payment type / year combination, then left join the data onto it
;With Customers as (
    select DISTINCT customerID from @t
), PaymentTypes as (
    select DISTINCT paymentType from @t
), Years as (
    select DISTINCT DATEPART(year,paymentDate) as Yr from @t
), Matrix as (
    select
        customerID,
        paymentType,
        Yr
    from
        Customers
            cross join
        PaymentTypes
            cross join
        Years
)
select
    m.customerID,
    m.paymentType,
    m.Yr,
    COALESCE(SUM(paymentAmount),0) as Total
from
    Matrix m
        left join
    @t t
        on
            m.customerID = t.customerID and
            m.paymentType = t.paymentType and
            m.Yr = DATEPART(year,t.paymentDate)
group by
    m.customerID,
    m.paymentType,
    m.Yr
Result:
customerID paymentType Yr Total
----------- ----------- ----------- -----------
1 A 2015 350
1 A 2016 0
1 B 2015 110
1 B 2016 300
1 C 2015 0
1 C 2016 0
2 A 2015 200
2 A 2016 0
2 B 2015 0
2 B 2016 300
2 C 2015 0
2 C 2016 700
(We may also want to play games with a numbers table and/or generate actual start and end dates for years if the date processing above needs to be able to use an index)
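As a rough illustration of the start/end date idea, the sketch below joins on a date range instead of extracting the year from each payment date, so the date column itself stays comparable against an index. It runs on SQLite through Python's sqlite3 module purely so it can be executed; the payments and years tables and the year_start/year_end columns are hypothetical names, and the year rows are typed in by hand rather than generated from a numbers table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (customerID INTEGER, paymentID INTEGER,
                       paymentType TEXT, paymentDate TEXT, paymentAmount INTEGER);
INSERT INTO payments VALUES
 (1,1,'A','2015-11-28', 500),
 (1,2,'A','2015-11-29',-150),
 (1,3,'B','2016-03-07', 300),
 (2,4,'A','2015-03-03', 200),
 (2,5,'B','2016-05-25',-100),
 (2,6,'C','2016-06-24', 700),
 (1,7,'B','2015-09-22', 110),
 (2,8,'B','2016-01-03', 400);

-- Hypothetical lookup table: one row per year with a half-open date range.
CREATE TABLE years (yr INTEGER, year_start TEXT, year_end TEXT);
INSERT INTO years VALUES
 (2015,'2015-01-01','2016-01-01'),
 (2016,'2016-01-01','2017-01-01');
""")

# Cross join every year / customer / payment type, then left join the
# payments whose date falls inside the year's range.
query = """
WITH customers AS (SELECT DISTINCT customerID FROM payments),
     payment_types AS (SELECT DISTINCT paymentType FROM payments)
SELECT y.yr, c.customerID, p.paymentType,
       COALESCE(SUM(t.paymentAmount), 0) AS total
FROM years y
CROSS JOIN customers c
CROSS JOIN payment_types p
LEFT JOIN payments t
       ON t.customerID  = c.customerID
      AND t.paymentType = p.paymentType
      AND t.paymentDate >= y.year_start
      AND t.paymentDate <  y.year_end
GROUP BY y.yr, c.customerID, p.paymentType
ORDER BY y.yr, c.customerID, p.paymentType
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The half-open range (>= start, < end) avoids having to know the last valid moment of a year.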
Note also how similar the top of my script is to the sample data in your question - except it's actual code that generates the sample data. You may wish to consider presenting sample code in such a way in the future since it simplifies the process of actually being able to test scripts in answers.
[1] CTEs: Common Table Expressions. They may be thought of as conceptually similar to temp tables, except that we don't (necessarily) materialize the results. They are also incorporated into the single query that follows them, and the whole query is optimized as one unit.
Your suggestion to use temp tables means that you'd be breaking this into multiple separate queries that then necessarily force SQL to perform the task in an order that we have selected rather than letting the optimizer choose the best approach for the above single query.

Calculate payment date for each invoice

Please consider the following table, transaction: a company regularly sends its customers invoices that are part of the same order. The company's clients often pay only once every few weeks.
(trans_date in format yyyy-mm-dd)
id order_id trans_type trans_date trans_amount
----------------------------------------------------------
1 1 invoice 2017-01-10 100
2 1 invoice 2017-05-23 150
3 1 invoice 2017-05-28 200
4 2 invoice 2017-03-01 700
5 2 payment 2017-06-16 700
6 1 payment 2017-10-12 450
7 3 invoice 2017-06-24 199
The company would like to see on what date each invoice was paid for. For example: invoice (id) 1 (part of order_id=1 group) was sent on 2017-01-10 and paid on 2017-10-12 (id=6). Invoice with id=7 has not been paid at all.
The desired output would be the payment date for each invoice (payment_date):
id order_id trans_type trans_date trans_amount payment_date
--------------------------------------------------------------------------
1 1 invoice 2017-01-10 100 2017-10-12
2 1 invoice 2017-05-23 150 2017-10-12
3 1 invoice 2017-05-28 200 2017-10-12
4 2 invoice 2017-03-01 700 2017-06-16
5 2 payment 2017-06-16 700
6 1 payment 2017-10-12 450
7 3 invoice 2017-06-24 199
For transactions 5, 6 and 7, the payment_date is empty because it is either a payment (id=5 and 6) or an unpaid invoice (id=7).
I don't understand how I should solve this issue. In combination with regular scripting, I would get the whole set and loop through it to find each payment. But how can this be solved in SQL only?
Any help would be greatly appreciated!
Did you try a simple left join?
The code below is close to standard SQL (ISNULL with two arguments is SQL Server syntax; COALESCE is the standard equivalent).
Select a.id , a.order_id, a.trans_type, a.trans_date, a.trans_amount, isnull(b.trans_date, '') As payment_date
From transaction a
Left join transaction b
On a.order_id = b.order_id
And a.trans_type = 'invoice'
And b.trans_type = 'payment'
You can do a cumulative sum of payments and invoices per order and get the first date when the payment total meets or exceeds the invoice total:
with ip as (
      -- running totals are kept separately for each order
      select ip.*,
             sum(case when ip.trans_type = 'invoice' then ip.trans_amount else 0 end) over (partition by ip.order_id order by ip.trans_date) as running_invoice,
             sum(case when ip.trans_type = 'payment' then ip.trans_amount else 0 end) over (partition by ip.order_id order by ip.trans_date) as running_payment
      from invoicepayments ip
     )
select ip.*,
       -- earliest date in the same order on which payments cover this invoice
       (select min(ip2.trans_date)
        from ip ip2
        where ip2.order_id = ip.order_id and
              ip2.running_payment >= ip.running_invoice and
              ip.trans_type = 'invoice'
       ) as payment_date
from ip;
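To check the cumulative-sum idea against the sample data, here is a runnable sketch using Python's sqlite3 module. It assumes the running totals should be kept per order_id (which the expected output in the question implies), reuses the table name invoicepayments from the query above, and needs SQLite 3.25 or later for window functions. The alias cur is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoicepayments (id INTEGER, order_id INTEGER, trans_type TEXT,
                              trans_date TEXT, trans_amount INTEGER);
INSERT INTO invoicepayments VALUES
 (1, 1, 'invoice', '2017-01-10', 100),
 (2, 1, 'invoice', '2017-05-23', 150),
 (3, 1, 'invoice', '2017-05-28', 200),
 (4, 2, 'invoice', '2017-03-01', 700),
 (5, 2, 'payment', '2017-06-16', 700),
 (6, 1, 'payment', '2017-10-12', 450),
 (7, 3, 'invoice', '2017-06-24', 199);
""")

# Running invoice and payment totals per order, then for each invoice the
# earliest date in the same order on which payments cover the invoiced total.
query = """
WITH ip AS (
    SELECT t.*,
           SUM(CASE WHEN trans_type = 'invoice' THEN trans_amount ELSE 0 END)
               OVER (PARTITION BY order_id ORDER BY trans_date) AS running_invoice,
           SUM(CASE WHEN trans_type = 'payment' THEN trans_amount ELSE 0 END)
               OVER (PARTITION BY order_id ORDER BY trans_date) AS running_payment
    FROM invoicepayments t
)
SELECT cur.id,
       (SELECT MIN(ip2.trans_date)
        FROM ip AS ip2
        WHERE ip2.order_id = cur.order_id
          AND ip2.running_payment >= cur.running_invoice
          AND cur.trans_type = 'invoice') AS payment_date
FROM ip AS cur
ORDER BY cur.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Payment rows and the unpaid invoice (id 7) come back with a NULL payment_date, shown as None in Python.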

SQL Group by range and total column

With the following function and stored procedure I get a result set with 2 columns.
What I additionally need is a third column with the total of all open invoices within each score range.
I would be grateful for any ideas.
ALTER FUNCTION dbo.OpenOrders
(
    @MandantId int
)
RETURNS TABLE
AS
RETURN
    -- open amount per invoice (invoice amount minus payments), skipping settled invoices
    SELECT SUM(Invoices.Amount - ISNULL(Payment.Amount, 0)) AS Op
    FROM Invoices INNER JOIN
         Orders ON Invoices.OrderId = Orders.OrderId LEFT OUTER JOIN
         Payment ON Invoices.InvoiceId = Payment.InvoiceId
    WHERE (Orders.MandantId = @MandantId)
    GROUP BY Invoices.InvoiceId, Invoices.Amount
    HAVING (SUM(Invoices.Amount - ISNULL(Payment.Amount, 0)) <> 0)
GO

ALTER PROCEDURE dbo.GetOpRanges
    @MandantId int
AS
BEGIN
    SELECT * INTO #tmp_ranges
    FROM
    -- wrong in first post -> OPDebitorByMandant(@MandantId)
    OpenOrders(@MandantId)

    -- count the open amounts per range
    SELECT op AS [score range], COUNT(*) AS [number of occurences]
    FROM
    (SELECT CASE
         WHEN op BETWEEN 0 AND 50 THEN ' 0- 50'
         WHEN op BETWEEN 50 AND 100 THEN ' 50-100'
         WHEN op BETWEEN 100 AND 500 THEN '100-500'
         ELSE '500 and >' END AS op
     FROM [#tmp_ranges]) AS t
    GROUP BY op
    RETURN
END
Result:
score range  number of occurences
------------+--------------------
0- 50        23
50-100       4
100-500      4
500 and >    21
What I additionally need is a third column with the total of all open invoices within each score range.
Target result:
score range  number of occurences  Total
------------+---------------------+------
0- 50        23                    1.150
50-100       4                     400
100-500      4                     2.000
500 and >    21                    22.000
Tables:
Invoices
InvoiceId CustomerId OrderId DateOfInvoice Amount
----------+----------+-------+-------------+------
1 1 20 20160301 1000.00
2 2 22 20160501 2000.00
3 1 102 20160601 3000.00
...
Orders
OrderId MandantId CustomerId DateOfOrder Amount
-------+---------+----------+-----------+-----------
20 1 1 20160101 1000.00
22 1 2 20160101 2000.00
102 1 1 20160101 3000.00
...
Payment
PaymentId MandantId CustomerId InvoiceId OrderId DateOfPayment Amount
---------+---------+----------+---------+-------+-------------+-------------
1 1 1 1 20 20160310 1000.00
2 1 2 2 22 20160505 2000.00
3 1 1 3 102 20160610 3000.00
...
Hope this is helpful, and thanks again in advance for any solution.
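One possible way to get the Total column is to keep the numeric open amount alongside the range label in the inner query, so the outer query can apply SUM() to it. The sketch below shows the idea on SQLite via Python's sqlite3 module. The table tmp_ranges stands in for #tmp_ranges, and the six open amounts are made-up illustration values, not data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_ranges (op REAL)")
# Hypothetical open amounts, invented for this example only.
conn.executemany(
    "INSERT INTO tmp_ranges VALUES (?)",
    [(10.0,), (40.0,), (75.0,), (250.0,), (300.0,), (900.0,)],
)

# The inner query returns both the raw amount and its range label,
# so the outer query can count rows and total the amounts per label.
query = """
SELECT score_range,
       COUNT(*) AS number_of_occurrences,
       SUM(op)  AS total
FROM (
    SELECT op,
           CASE
               WHEN op BETWEEN 0 AND 50    THEN '0-50'
               WHEN op BETWEEN 50 AND 100  THEN '50-100'
               WHEN op BETWEEN 100 AND 500 THEN '100-500'
               ELSE '500 and >'
           END AS score_range
    FROM tmp_ranges
) AS t
GROUP BY score_range
ORDER BY MIN(op)
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In the original procedure the inner query aliases the CASE result as op, which hides the numeric value; giving the label its own name is what makes the sum possible.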

Joining to another table only on the first occurrence of a field

(Note: I have tried to simplify the below to make it easier both for me and for anyone else to understand; the tables I reference below are in fact sub-queries joining a lot of different data together from different sources.)
I have a table of purchased items:
Items
ItemSaleID CustomerID ItemCode
1 100 A
2 100 B
3 100 C
4 200 A
5 200 C
I also have transaction header and detail tables coming from a till system:
TranDetail
TranDetailID TranHeaderID ItemSaleID Cost
11 51 1 $10
12 51 2 $10
13 51 3 $10
14 52 4 $20
15 52 5 $10
TranHeader
TranHeaderID CustomerID Payment Time
51 100 $100 11:00
52 200 $50 12:00
53 100 $20 13:00
I want to get to a point where I have a table like:
ItemSaleID  CustomerID  ItemCode  Cost  Payment  Time
1           100         A         $10   $120     11:00
2           100         B         $10            11:00
3           100         C         $10            11:00
4           200         D         $20   $50      12:00
5           200         E         $10            12:00
I have a query which produces the results, but when I add the ROW_NUMBER() CASE expression the run time goes from 2 minutes to 30+ minutes.
The query is further confused because I need to supply the earliest date relating to the list of transactions and the total price paid (could be many transactions throughout the day for upgrades etc)
Query below:
SELECT ItemSaleID
, CustomerID
, ItemCode
, Cost
, CASE WHEN ROW_NUMBER() OVER (PARTITION BY TranHeaderID ORDER BY ItemSaleID) = 1
THEN TRN.Payment ELSE NULL END AS Payment
FROM Items I
OUTER APPLY (
SELECT TOP 1 SUB.Payment, Time
FROM TranHeader H
INNER JOIN TranDetail D ON H.TranHeaderID = D.TranHeaderID
OUTER APPLY (SELECT SUM(Payment) AS Payment
FROM TranHeader H2
WHERE H2.CustomerID = Items.CustomerID
) SUB
WHERE D.CustomerID = I.CustomerID
) TRN
WHERE ...
Is there a way to show the payment only on the first occurrence of each customer ID while maintaining performance?
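One approach that might help is to aggregate once per customer in derived tables and join them, instead of running a correlated lookup and ROW_NUMBER() for every row. The sketch below tries this on the sample tables using Python's sqlite3 module. The aliases p and f and the columns TotalPayment, FirstTime and FirstItemSaleID are names invented for the example, the dollar amounts are stored as plain integers, and whether this is actually faster on the real sub-queries would have to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (ItemSaleID INTEGER, CustomerID INTEGER, ItemCode TEXT);
INSERT INTO Items VALUES
 (1,100,'A'), (2,100,'B'), (3,100,'C'), (4,200,'A'), (5,200,'C');

CREATE TABLE TranDetail (TranDetailID INTEGER, TranHeaderID INTEGER,
                         ItemSaleID INTEGER, Cost INTEGER);
INSERT INTO TranDetail VALUES
 (11,51,1,10), (12,51,2,10), (13,51,3,10), (14,52,4,20), (15,52,5,10);

CREATE TABLE TranHeader (TranHeaderID INTEGER, CustomerID INTEGER,
                         Payment INTEGER, Time TEXT);
INSERT INTO TranHeader VALUES
 (51,100,100,'11:00'), (52,200,50,'12:00'), (53,100,20,'13:00');
""")

# p: total payment and earliest time per customer (computed once per customer).
# f: the first ItemSaleID per customer, used to decide where the payment shows.
query = """
SELECT i.ItemSaleID, i.CustomerID, i.ItemCode, d.Cost,
       CASE WHEN i.ItemSaleID = f.FirstItemSaleID
            THEN p.TotalPayment END AS Payment,
       p.FirstTime AS Time
FROM Items i
JOIN TranDetail d ON d.ItemSaleID = i.ItemSaleID
JOIN (SELECT CustomerID,
             SUM(Payment) AS TotalPayment,
             MIN(Time)    AS FirstTime
      FROM TranHeader
      GROUP BY CustomerID) p ON p.CustomerID = i.CustomerID
JOIN (SELECT CustomerID, MIN(ItemSaleID) AS FirstItemSaleID
      FROM Items
      GROUP BY CustomerID) f ON f.CustomerID = i.CustomerID
ORDER BY i.ItemSaleID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The ItemCode values follow the Items table, so items 4 and 5 show A and C rather than the D and E in the desired output above.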