SQL - Knocking off Accordingly - sql

I have an issue in SQL where all transactions come into one giant messy table.
Example:
1 | Invoice | $300
2 | Invoice | $250
3 | Payment | $100
4 | Invoice | $200
5 | Payment | $300
So I will have 3 invoices and 2 payments, but
the Payment on line 3 can only knock off the Invoice on line 1,
and the Payment on line 5 is for the Invoice on line 2.
I want to net off each Payment and find whether it is an Overpayment, an Underpayment, or it knocks off the entire invoice.
How can I do this?

This is easy using ROW_NUMBER(). Assume there must be an invoice before payment.
SELECT
    i.Id AS InvoiceId, p.Id AS PaymentId,
    CASE
        WHEN RefAmount = 0 THEN 'Settled'
        WHEN RefAmount > 0 THEN 'Underpayment'
        ELSE 'Overpayment'
    END AS State
FROM
(
    SELECT ROW_NUMBER() OVER (ORDER BY Id) AS GroupId, *
    FROM messy WHERE Type = 'Invoice'
) i
LEFT OUTER JOIN
(
    SELECT ROW_NUMBER() OVER (ORDER BY Id) AS GroupId, *
    FROM messy WHERE Type = 'Payment'
) p ON i.GroupId = p.GroupId
CROSS APPLY (SELECT i.Amount - ISNULL(p.Amount, 0) AS RefAmount) r
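As a rough check of the pairing idea above (n-th invoice matched to n-th payment), here is a minimal sketch that runs an equivalent query against the sample rows. It assumes SQLite through Python's sqlite3 module instead of SQL Server, so CROSS APPLY and ISNULL are swapped for a repeated expression and IFNULL; the column names Id, Type and Amount are taken from the query, and the integer amounts are an assumption.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messy (Id INTEGER PRIMARY KEY, Type TEXT, Amount INTEGER);
INSERT INTO messy VALUES
  (1, 'Invoice', 300), (2, 'Invoice', 250), (3, 'Payment', 100),
  (4, 'Invoice', 200), (5, 'Payment', 300);
""")

# Number invoices and payments separately, then pair equal row numbers.
query = """
WITH i AS (
  SELECT ROW_NUMBER() OVER (ORDER BY Id) AS GroupId, Id, Amount
  FROM messy WHERE Type = 'Invoice'
), p AS (
  SELECT ROW_NUMBER() OVER (ORDER BY Id) AS GroupId, Id, Amount
  FROM messy WHERE Type = 'Payment'
)
SELECT i.Id AS InvoiceId, p.Id AS PaymentId,
       i.Amount - IFNULL(p.Amount, 0) AS RefAmount,
       CASE
         WHEN i.Amount - IFNULL(p.Amount, 0) = 0 THEN 'Settled'
         WHEN i.Amount - IFNULL(p.Amount, 0) > 0 THEN 'Underpayment'
         ELSE 'Overpayment'
       END AS State
FROM i LEFT JOIN p ON i.GroupId = p.GroupId
ORDER BY i.Id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An invoice with no matching payment comes back with a NULL PaymentId and is reported as an underpayment of the full amount.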

Related

How to prevent duplicates when getting sum of multiple columns with multiple joins

Let's say I have 3 tables: Invoices, Charges, and Payments. Invoices can have multiple charges, and charges can have multiple payments.
Doing a simple join, data would look like this:
invoiceid | chargeid | charge | payment
----------------------------------
1 | 1 | 50 | 50
2 | 2 | 100 | 25
2 | 2 | 100 | 75
2 | 3 | 30 | 10
2 | 3 | 30 | 5
If I do a join with sums,
select invoiceid, sum(charge), sum(payment)
from invoices i
inner join charges c on i.invoiceid = c.invoiceid
inner join payments p on p.chargeid = c.chargeid
group by invoiceid
The sum of payments would be correct but charges would include duplicates:
invoiceid | charges | payments
--------------------------------------
1 | 50 | 50
2 | 260 | 115
I want a query to get a list of invoices with the sum of payments and sum of charges per invoice, like this:
invoiceid | charges | payments
--------------------------------------
1 | 50 | 50
2 | 130 | 115
Is there any way to do this by modifying the query above WITHOUT using subqueries since subqueries can be quite slow when dealing with a large amount of data? I feel like there must be a way to only include unique charges in the sum.
You can also achieve this by using LATERAL joins:
SELECT
    i.invoiceid,
    chgs.total_charges,
    pays.total_payments
FROM
    invoices AS i
JOIN LATERAL (
    SELECT
        SUM( c.charge ) AS total_charges
    FROM
        charges AS c
    WHERE
        c.invoiceid = i.invoiceid
) AS chgs ON TRUE
JOIN LATERAL (
    -- Payments reference charges, so go through charges to reach the invoice
    SELECT
        SUM( p.payment ) AS total_payments
    FROM
        charges AS c
        JOIN payments AS p ON p.chargeid = c.chargeid
    WHERE
        c.invoiceid = i.invoiceid
) AS pays ON TRUE
One way is to aggregate each table on the grouping value before joining:
SELECT i.invoiceid, c.SumOfCharge, p.SumOfPayment
FROM invoices i
INNER JOIN (SELECT invoiceid, SUM(charge) AS SumOfCharge
            FROM charges
            GROUP BY invoiceid) c
  ON i.invoiceid = c.invoiceid
INNER JOIN (SELECT c.invoiceid, SUM(p.payment) AS SumOfPayment
            FROM charges c
            INNER JOIN payments p ON p.chargeid = c.chargeid
            GROUP BY c.invoiceid) p
  ON i.invoiceid = p.invoiceid
Another way would be to do it inline per invoice using a correlated subquery:
SELECT i.invoiceid
     , (SELECT SUM(charge) FROM charges c WHERE c.invoiceid = i.invoiceid) SumOfCharge
     , SUM(p.payment) SumOfPayment
FROM invoices i
INNER JOIN charges c
  ON i.invoiceid = c.invoiceid
INNER JOIN payments p
  ON p.chargeid = c.chargeid
GROUP BY i.invoiceid
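To see why aggregating before the join avoids the duplicated charges, here is a small sketch that runs both the naive join and a pre-aggregated version on the question's sample data. It assumes SQLite via Python's sqlite3 module; the paymentid column and the integer amounts are hypothetical, added only to make the tables complete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (invoiceid INTEGER PRIMARY KEY);
CREATE TABLE charges (chargeid INTEGER PRIMARY KEY, invoiceid INTEGER, charge INTEGER);
-- paymentid is a hypothetical key; the question only shows chargeid and payment
CREATE TABLE payments (paymentid INTEGER PRIMARY KEY, chargeid INTEGER, payment INTEGER);
INSERT INTO invoices VALUES (1), (2);
INSERT INTO charges VALUES (1, 1, 50), (2, 2, 100), (3, 2, 30);
INSERT INTO payments VALUES (1, 1, 50), (2, 2, 25), (3, 2, 75), (4, 3, 10), (5, 3, 5);
""")

# Naive join: each charge row repeats once per payment, inflating SUM(charge).
naive = """
SELECT i.invoiceid, SUM(c.charge), SUM(p.payment)
FROM invoices i
JOIN charges c ON c.invoiceid = i.invoiceid
JOIN payments p ON p.chargeid = c.chargeid
GROUP BY i.invoiceid
ORDER BY i.invoiceid
"""

# Pre-aggregated: each derived table has one row per invoice before joining.
pre_aggregated = """
SELECT i.invoiceid, ch.SumOfCharge, pay.SumOfPayment
FROM invoices i
JOIN (SELECT invoiceid, SUM(charge) AS SumOfCharge
      FROM charges GROUP BY invoiceid) ch ON ch.invoiceid = i.invoiceid
JOIN (SELECT c.invoiceid, SUM(p.payment) AS SumOfPayment
      FROM charges c JOIN payments p ON p.chargeid = c.chargeid
      GROUP BY c.invoiceid) pay ON pay.invoiceid = i.invoiceid
ORDER BY i.invoiceid
"""
naive_rows = conn.execute(naive).fetchall()
fixed_rows = conn.execute(pre_aggregated).fetchall()
print(naive_rows)
print(fixed_rows)
```

The derived tables are still subqueries, but each one is evaluated once over its table instead of once per invoice row.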
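I hope this will help. Note that SUM(DISTINCT charge) is only correct when no two charges on the same invoice have the same amount, because equal amounts are counted once.
select invoiceid, sum(distinct charge) as charges, sum(payment) as payments
from yourtable
group by invoiceid;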

How to create this BigQuery query on a retail dataset

I have a table with user retail transactions. It includes sales and cancels. If Qty is positive, the row is a sale; if negative, it is a cancel. I want to attach each cancel to the most appropriate sale. So, I have a table like this:
| CustomerId | StockId | Qty | Date |
|--------------+-----------+-------+------------|
| 1 | 100 | 50 | 2020-01-01 |
| 1 | 100 | -10 | 2020-01-10 |
| 1 | 100 | 60 | 2020-02-10 |
| 1 | 100 | -20 | 2020-02-10 |
| 1 | 100 | 200 | 2020-03-01 |
| 1 | 100 | 10 | 2020-03-05 |
| 1 | 100 | -90 | 2020-03-10 |
User with ID 1 has the following actions: buy 50 -> return 10 -> buy 60 -> return 20 -> buy 200 -> buy 10 -> return 90. For each cancel row (with negative Qty), I want to find the previous row (by Date) with a positive Qty that is greater than the cancel Qty.
So I need a BigQuery query that creates a table like this:
| CustomerId | StockId | Qty | Date | CancelQty |
|--------------+-----------+-------+------------+-------------|
| 1 | 100 | 50 | 2020-01-01 | -10 |
| 1 | 100 | 60 | 2020-02-10 | -20 |
| 1 | 100 | 200 | 2020-03-01 | -90 |
| 1 | 100 | 10 | 2020-03-05 | 0 |
Can anybody help me with these queries? I have created one candidate query (split cancels and sales, join them, and do some stuff to remove rows), but it works incorrectly in the above case.
I use BigQuery, so any BQ SQL features could be applied.
Any ideas will be helpful.
You can use the following query.
;WITH result AS (
select t1.*,t2.Qty as cQty,t2.Date as Date_t2 from
(select *,ROW_NUMBER() OVER (ORDER BY qty DESC) AS [ROW NUMBER] from Test) t1
join
(select *,ROW_NUMBER() OVER (ORDER BY qty) AS [ROW NUMBER] from Test) t2
on t1.[ROW NUMBER] = t2.[ROW NUMBER]
)
select CustomerId,StockId,Qty,Date,ISNULL(cQty, 0) As CancelQty,Date_t2
from (select CustomerId,StockId,Qty,Date,case
when cQty < 0 then cQty
else NULL
end AS cQty,
case
when cQty < 0 then Date_t2
else NULL
end AS Date_t2 from result) t
where qty > 0
order by cQty desc
result: https://dbfiddle.uk
You can do this as a gaps-and-islands problem. Basically, add a grouping column to the rows based on a cumulative reverse count of negative values. Then within each group, choose the first row where the sum is positive. So:
select t.* except (cancelqty, grp),
       (case when min(case when cancelqty + qty >= 0 then date end) over (partition by customerid, grp) = date
             then cancelqty
             else 0
        end) as cancelqty
from (select t.*,
             -- min(qty) picks up the (negative) cancel row of the group
             min(qty) over (partition by customerid, grp) as cancelqty
      from (select t.*,
                   countif(qty < 0) over (partition by customerid order by date desc) as grp
            from transactions t
           ) t
     ) t;
Note: This works for the data you have provided. However, there may be complicated scenarios where this does not work. In fact, I don't think there is a simple optimal solution assuming that the returns are not connected to the original sales. I would suggest that you fix the data model so you record where the returns come from.
The query below seems to satisfy the conditions and produce the output mentioned. The solution maps each row of the base table (t) to the corresponding canceled-qty row from the same table (t1).
First, a self join on CustomerId and StockId is done, since the rows need to correspond to the same customer and product.
Additionally, we bring in only the canceled transactions t1 that happened on or after the base row in table t (t.Dt<=t1.Dt), and the clause t1.Qty<0 ensures they have a negative qty.
Further, we cannot attribute a canceled qty that is larger than the original qty, so I check that the positive qty is at least as large as the canceled qty. This is done by negating the cancel qty so that the two can be compared easily: -(t1.Qty)<=t.Qty
After the join, we are interested only in the positive qty rows, so a where clause (t.Qty>0) filters the canceled rows out of the base table t.
Now each base row is joined to every qualifying canceled qty row dated on or after it. For example, the Qty 50 can have several canceled quantities mapped to it, but we are interested only in the one that came immediately after. So we first group by the base row values and then keep the earliest canceled-qty date with the condition HAVING IFNULL(t1.dt, '0')=MIN(IFNULL(t1.dt, '0'))
Finally, we get the rows we need, and we can exclude the last column if required using an outer select query.
SELECT t.CustomerId,t.StockId,t.Qty,t.Dt,IFNULL(t1.Qty, 0) CancelQty
,t1.dt dt_t1
FROM tbl t
LEFT JOIN tbl t1 ON t.CustomerId=t1.CustomerId AND
t.StockId=t1.StockId
AND t.Dt<=t1.Dt AND t1.Qty<0 AND -(t1.Qty)<=t.Qty
WHERE t.Qty>0
GROUP BY 1,2,3,4
HAVING IFNULL(t1.dt, '0')=MIN(IFNULL(t1.dt, '0'))
ORDER BY 1,2,4,3
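The matching rule described above (earliest cancel on or after the sale, no larger than the sale) can also be sketched with a correlated subquery, which avoids relying on how a particular engine treats non-aggregated columns in HAVING. This is only an illustration, assuming SQLite through Python's sqlite3 module, the tbl table and Dt column from the query above, and dates stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (CustomerId INTEGER, StockId INTEGER, Qty INTEGER, Dt TEXT);
INSERT INTO tbl VALUES
  (1, 100, 50,  '2020-01-01'),
  (1, 100, -10, '2020-01-10'),
  (1, 100, 60,  '2020-02-10'),
  (1, 100, -20, '2020-02-10'),
  (1, 100, 200, '2020-03-01'),
  (1, 100, 10,  '2020-03-05'),
  (1, 100, -90, '2020-03-10');
""")

# For every sale, look up the earliest cancel for the same customer and
# product that is dated on or after the sale and is not larger than it.
query = """
SELECT s.CustomerId, s.StockId, s.Qty, s.Dt,
       IFNULL((SELECT c.Qty
               FROM tbl c
               WHERE c.CustomerId = s.CustomerId
                 AND c.StockId = s.StockId
                 AND c.Qty < 0
                 AND c.Dt >= s.Dt
                 AND -c.Qty <= s.Qty
               ORDER BY c.Dt
               LIMIT 1), 0) AS CancelQty
FROM tbl s
WHERE s.Qty > 0
ORDER BY s.Dt
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Like the join version, nothing here stops one cancel from being attached to more than one sale on other data, so treat it as a sketch of the rule, not a complete allocation.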
Consider below approach
with sales as (
select * from `project.dataset.table` where Qty > 0
), cancels as (
select * from `project.dataset.table` where Qty < 0
)
select any_value(s).*,
ifnull(array_agg(c.Qty order by c.Date limit 1)[offset(0)], 0) as CancelQty
from sales s
left join cancels c
on s.CustomerId = c.CustomerId
and s.StockId = c.StockId
and s.Date <= c.Date
and s.Qty > abs(c.Qty)
group by format('%t', s)
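If applied to the sample data in your question, the output matches the expected table: CancelQty is -10, -20, -90, and 0 for the sales of 50, 60, 200, and 10.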

SQL Server map payments to products

We have shopping carts that might have included products and payments.
Since these payments will be made to the carts, there will be no relation between the products and the payments except that they are in the same cart.
There are cases that these products will be invoiced individually even though they are in the same cart.
To create the invoices for the products we need the payment details, so we have to map the products to the payments.
These are our tables:
create table Products
(
ItemId int primary key,
CartId int not null,
ItemAmount smallmoney not null
)
create table Payments
(
PaymentId int primary key,
CartId int not null,
PaymentAmount smallmoney not null
)
create table MappedTable
(
ItemId int not null,
PaymentId int not null,
Amount smallmoney not null
)
INSERT INTO Products (ItemId, CartId, ItemAmount)
VALUES (1, 1, 143.49), (2, 1, 143.49), (3, 1, 143.49), (4, 2, 50.00), (5, 3, 75.00), (6, 3, 75.00)
INSERT INTO Payments (PaymentId, CartId, PaymentAmount)
VALUES (1, 1, 376.47), (2, 1, 54.00), (3, 2, 60.00), (4, 3, 140.00)
--select * from Products
--select * from Payments
--DROP TABLE Products
--DROP TABLE Payments
--DROP TABLE MappedTable
Products
ItemId | CartId | ItemAmount
------ | ------ | ----------
1 | 1 | 143.49
2 | 1 | 143.49
3 | 1 | 143.49
4 | 2 | 50.00
5 | 3 | 75.00
6 | 3 | 75.00
Payments
PaymentId | CartId | PaymentAmount
--------- | ------ | -------------
1 | 1 | 376.47
2 | 1 | 54.00
3 | 2 | 60.00
4 | 3 | 140.00
The order of the products and the payments may differ.
We need the output to look like this:
MappingTable
ItemId | PaymentId | MappedAmount
------ | --------- | ------------
1 | 1 | 143.49
2 | 1 | 143.49
3 | 1 | 89.49
3 | 2 | 54.00
4 | 3 | 50.00 (Remaining 10.00 from Payment 3 will be ignored)
5 | 4 | 75.00
6 | 4 | 65.00 (Missing 10.00 from Payment 4 will be ignored)
Cart 1: Sum of payments = sum of product costs
Cart 2: Sum of payments > sum of product costs. Only take the total product cost. Ignore the remaining 10.00
Cart 3: Sum of payments < sum of product costs. Take all the payments, ignore the fact that the payment is 10.00 short.
I thought that a query like the one below may solve the problem, but no luck.
insert into MappedTable
select
prd.ItemId, pay.PaymentId,
(Case
when prd.ItemAmount - isnull((select sum(m.Amount)
from MappedTable m
where m.ItemId = prd.ItemId), 0) <= pay.PaymentAmount - isnull((select sum(m.Amount) from MappedTable m where m.PaymentId = pay.PaymentId), 0)
then prd.ItemAmount - isnull((select sum(m.Amount) from MappedTable m where m.ItemId = prd.ItemId), 0)
else pay.PaymentAmount - isnull((select sum(m.Amount) from MappedTable m where m.PaymentId = pay.PaymentId), 0)
end)
from
Products prd
inner join
Payments pay on pay.CartId = prd.CartId
where
prd.ItemAmount > isnull((select sum(m.Amount) from MappedTable m where m.ItemId = prd.ItemId), 0)
and pay.PaymentAmount > isnull((select sum(m.Amount) from MappedTable m where m.PaymentId = pay.PaymentId), 0)
I've read about CTE (Common Table Expressions) and set-based approaches but I couldn't handle the issue.
So is this possible without using a cursor or a while loop?
Generally, this kind of task is referred to as a "knapsack problem", which is known to have no solution more efficient than brute forcing all possible combinations. In your case, however, you have additional conditions, namely ordered sets of both items and payments, so it is actually solvable using "overlapping intervals" technique.
The idea is to generate ranges of items and payments (1 pair of ranges per cart) and then look which payments overlap with which items, sequentially.
For any item-payment combination, there are 3 possible scenarios:
Payment covers the beginning of the item range (possibly completely covering the item);
Payment is completely inside the item;
Payment covers the end of item, thus "closing" it.
So, all that is needed is, for every item, find all suitable payments that match the aforementioned criteria, and order them by their identifiers. Here is a query that does it:
with cte as (
-- Project payment ranges, per cart
select pm.*, sum(pm.PaymentAmount) over(partition by pm.CartId order by pm.PaymentId) as [RT]
from #Payments pm
)
select q.ItemId, q.PaymentId,
-- Calculating the amount from payment that goes for this item
case q.Match
when 1 then q.PaymentRT - (q.ItemRT - q.ItemAmount)
when 2 then q.PaymentAmount
when 3 then case
-- Single payment spans over several items
when q.PaymentRT >= q.ItemRT and q.PaymentRT - q.PaymentAmount <= q.ItemRT - q.ItemAmount then q.ItemAmount
-- Payment is smaller than item
else q.ItemRT - (q.PaymentRT - q.PaymentAmount)
end
end as [Amount]
--, q.*
from (
select
sq.ItemId, pm.PaymentId, sq.ItemAmount, sq.RT as [ItemRT],
pm.PaymentAmount, pm.RT as [PaymentRT],
row_number() over(partition by sq.CartId, sq.ItemId, pm.PaymentId order by pm.RT) as [RN],
pm.Match
--, sq.CartId
from (
select pr.*, sum(pr.ItemAmount) over(partition by pr.CartId order by pr.ItemId) as [RT]
from #Products pr
) sq
outer apply (
-- First payment to partially cover this item
select top (1) c.*, 1 as [Match] from cte c where c.CartId = sq.CartId
and c.RT > sq.RT - sq.ItemAmount and c.RT < sq.RT
order by sq.RT
union all
-- Any payments that cover this item only
select c.*, 2 as [Match] from cte c where c.CartId = sq.CartId
and c.RT - c.PaymentAmount > sq.RT - sq.ItemAmount
and c.RT < sq.RT
union all
-- Last payment that covers this item
select top (1) c.*, 3 as [Match] from cte c where c.CartId = sq.CartId
and c.RT >= sq.RT
order by sq.RT
) pm
) q
where q.RN = 1;
The outer apply section is where I get payments related to each item. The only problem is, if a payment covers an item in its entirety, it will be listed several times. In order to remove these duplicates, I have ordered matches using row_number() and added an additional wrapping level - subquery with the q alias - where I cut off any duplicated rows by filtering the row number value.
P.S. If your SQL Server version is prior to 2012, you will need to calculate running totals using one of many approaches available on the Internet, because sum() over(order by ...) is only available on 2012 and later versions.
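The overlapping-intervals idea can be checked on the question's data with a more compact formulation: give every item and every payment a [lo, hi) range from a running total per cart, join ranges that overlap, and take the length of the overlap as the mapped amount. This is a simplified sketch of the same technique, not the query above; it assumes SQLite via Python's sqlite3 module, and the amounts are stored as integer cents (a hypothetical ItemCents / PaymentCents column) to avoid floating-point rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ItemId INTEGER PRIMARY KEY, CartId INTEGER, ItemCents INTEGER);
CREATE TABLE Payments (PaymentId INTEGER PRIMARY KEY, CartId INTEGER, PaymentCents INTEGER);
INSERT INTO Products VALUES
  (1, 1, 14349), (2, 1, 14349), (3, 1, 14349),
  (4, 2, 5000), (5, 3, 7500), (6, 3, 7500);
INSERT INTO Payments VALUES
  (1, 1, 37647), (2, 1, 5400), (3, 2, 6000), (4, 3, 14000);
""")

query = """
WITH items AS (
  -- running total per cart gives each item the range [lo, hi)
  SELECT ItemId, CartId,
         SUM(ItemCents) OVER (PARTITION BY CartId ORDER BY ItemId) - ItemCents AS lo,
         SUM(ItemCents) OVER (PARTITION BY CartId ORDER BY ItemId) AS hi
  FROM Products
), pays AS (
  SELECT PaymentId, CartId,
         SUM(PaymentCents) OVER (PARTITION BY CartId ORDER BY PaymentId) - PaymentCents AS lo,
         SUM(PaymentCents) OVER (PARTITION BY CartId ORDER BY PaymentId) AS hi
  FROM Payments
)
-- two ranges overlap when each starts before the other ends;
-- the overlap length is the amount of the payment applied to the item
SELECT i.ItemId, p.PaymentId,
       MIN(i.hi, p.hi) - MAX(i.lo, p.lo) AS MappedCents
FROM items i
JOIN pays p
  ON p.CartId = i.CartId AND p.lo < i.hi AND i.lo < p.hi
ORDER BY i.ItemId, p.PaymentId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Overpayments and shortfalls need no special handling here: the part of a range with nothing on the other side to overlap is simply never joined. The two-argument MIN and MAX are SQLite scalar functions; on SQL Server a CASE expression would play the same role.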

Grouping in SQL Table [closed]

Suppose I have a Table such that:
| ID | product | orderid | brand | number of product cust ord |
|----|---------|---------|-------|----------------------------|
| 1  | 123     | 111     | br    | 1                          |
| 1  | 234     | 111     | br    | 1                          |
| 1  | 345     | 333     | br    | 1                          |
| 2  | 123     | 211     | br    | 1                          |
| 2  | 456     | 212     | br    | 2                          |
| 3  | 567     | 213     | br    | 1                          |
What I'd like to do is group them as:
| ID | brand | number of product cust ord |
|----|-------|----------------------------|
| 1  | br    | 3                          |
| 2  | br    | 4                          |
Further to that, I'd like to classify them. I tried a case...when but can't seem to get it right.
If an ID purchases more than 3 unique products and orders more than twice, I'd like to call them a frequent buyer (in the above example, ID '1' would be a 'frequent buyer'). If the average number of products they purchase is higher than the average number of that product sold, I'd like to call them a 'merchant'; otherwise just a purchaser.
I've renamed the last field to qty for brevity and called the table test1.
To get frequent buyers, use the query below. Note that I used >= instead of >. I changed this based on your example, where ID 1 is a "frequent buyer" even though he bought only 3 products, not more than 3.
SELECT ID, count(distinct product) as DistinctProducts, count(distinct orderid) DistinctOrders
FROM test1
GROUP BY ID
HAVING count(distinct product) >= 3 and count(distinct orderid) >= 2
Not sure if I understood the merchant logic correctly. Below is the query which will give you customers that on average purchased more than overall average of product for any given product. There are none in the data.
SELECT DISTINCT c.ID
FROM
(select ID, product, avg(qty) as AvgQty
FROM test1
GROUP BY ID, product) as c
FULL OUTER JOIN
(select product, avg(qty) as AvgQty
FROM test1
GROUP BY product) p ON p.product = c.product
WHERE c.AvgQty > p.AvgQty;
To get "purchasers", do an EXCEPT between all customers and the UNION of merchants and frequent buyers:
select distinct ID from test1
EXCEPT
(SELECT ID FROM (
select ID, count(distinct product) as DistinctProducts, count(distinct orderid) DistinctOrders
FROM test1
GROUP BY ID
HAVING count(distinct product) >= 3 and count(distinct orderid) >= 2) t
UNION
SELECT DISTINCT c.ID
FROM
(select ID, product, avg(qty) as AvgQty
FROM test1
GROUP BY ID, product) as c
FULL OUTER JOIN
(select product, avg(qty) as AvgQty
FROM test1
GROUP BY product) p ON p.product = c.product
WHERE c.AvgQty > p.AvgQty
);
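Since the question asked for a case...when classification, the three queries above can also be folded into a single labelled result. The following is a sketch under a few assumptions: SQLite through Python's sqlite3 module, the test1 table and qty column named as in this answer, the same >= thresholds, and "frequent buyer" taking priority when a customer qualifies for both labels (the question does not say which should win).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test1 (ID INTEGER, product INTEGER, orderid INTEGER, brand TEXT, qty INTEGER);
INSERT INTO test1 VALUES
  (1, 123, 111, 'br', 1), (1, 234, 111, 'br', 1), (1, 345, 333, 'br', 1),
  (2, 123, 211, 'br', 1), (2, 456, 212, 'br', 2), (3, 567, 213, 'br', 1);
""")

query = """
WITH freq AS (
  -- at least 3 distinct products across at least 2 distinct orders
  SELECT ID FROM test1
  GROUP BY ID
  HAVING COUNT(DISTINCT product) >= 3 AND COUNT(DISTINCT orderid) >= 2
), merch AS (
  -- customer's average qty for a product exceeds the overall average for it
  SELECT DISTINCT c.ID
  FROM (SELECT ID, product, AVG(qty) AS AvgQty FROM test1 GROUP BY ID, product) c
  JOIN (SELECT product, AVG(qty) AS AvgQty FROM test1 GROUP BY product) p
    ON p.product = c.product
  WHERE c.AvgQty > p.AvgQty
)
SELECT DISTINCT t.ID,
       CASE
         WHEN t.ID IN (SELECT ID FROM freq) THEN 'frequent buyer'
         WHEN t.ID IN (SELECT ID FROM merch) THEN 'merchant'
         ELSE 'purchaser'
       END AS label
FROM test1 t
ORDER BY t.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data nobody qualifies as a merchant, which agrees with the observation above that there are none.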
This is one way that you could do it. Note that according to the description you gave, buyers could be constantly being reclassified between 'Merchant' and 'Purchaser' as the average goes up and down. That might not be what you want.
With cte As (
Select ID,
Brand,
DistinctOrders = Count(Distinct OrderID), -- How many separate orders by this customer for the brand?
DistinctProducts = Count(Distinct Product), -- How many different products by this customer for the brand?
[number of product cust ord] = Sum(CountOfProduct), -- Total number of items by this customer for the brand.
AverageCountOfProductPerBuyer =
Sum(Sum(CountOfProduct)) Over () * 1.0 / (Select Count(*) From (Select Distinct ID, Brand From #table) As tbl)
-- Average number of items per customer (for all customers) for this brand
From #table
Group By ID, Brand)
Select ID, Brand, DistinctOrders, DistinctProducts, [number of product cust ord],
IsFrequentBuyer = iif(DistinctOrders > 1 And DistinctProducts > 2, 'Frequent Buyer', NULL),
IsMerchant = iif(AverageCountOfProductPerBuyer < [number of product cust ord], 'Merchant', 'Purchaser')
From cte;
This query could be written without the common-table expression, but was written this way to avoid defining expressions multiple times.
Note that I have the first ID as a 'Frequent Buyer' based on your description, so I'm assuming that when you say 'more than 3 unique products' you mean 3 or more. Likewise with two or more distinct orders.

Partition By in Nested Select Returning Strange Results

I came across a strange behavior in SQLServer 2008 that I didn't understand. I wanted to quickly just pair a unique customer with a unique payment.
Using this query, I get the results that I am expecting. Each CustomerId is paired with a different PaymentId.
SELECT CustomerId, PaymentId, RowNumber1, RowNumber2
FROM (
SELECT
c.Id as CustomerId,
p.Id as PaymentId,
ROW_NUMBER() OVER (PARTITION BY p.Id ORDER BY p.Id) AS RowNumber1,
ROW_NUMBER() OVER (PARTITION BY c.Id ORDER BY c.Id) AS RowNumber2
FROM Customer as c
CROSS JOIN Payment as p
) AS INNERSELECT WHERE RowNumber2 = 1
+------------+-----------+------------+------------+
| CustomerId | PaymentId | RowNumber1 | RowNumber2 |
+------------+-----------+------------+------------+
| 4 | 1 | 1 | 1 |
| 5 | 2 | 2 | 1 |
+------------+-----------+------------+------------+
However, if I remove the RowNumber1 column from the outer select, the results seem to change. Now every value of PaymentId is 1, even though I did not touch the inner select statement.
SELECT CustomerId, PaymentId, RowNumber2
FROM (
SELECT
c.Id as CustomerId,
p.Id as PaymentId,
ROW_NUMBER() OVER (PARTITION BY p.Id ORDER BY p.Id) AS RowNumber1,
ROW_NUMBER() OVER (PARTITION BY c.Id ORDER BY c.Id) AS RowNumber2
FROM Customer as c
CROSS JOIN Payment as p
) AS INNERSELECT WHERE RowNumber2 = 1
+------------+-----------+------------+
| CustomerId | PaymentId | RowNumber2 |
+------------+-----------+------------+
| 4 | 1 | 1 |
| 5 | 1 | 1 |
+------------+-----------+------------+
Could anyone explain to me why removing a column from the outer select causes the values in the PaymentId column to change? What other method could I use to achieve my desired goal without needing the row numbers in the result set?
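It's because the order for row_number() inside your subquery is not defined: each window partitions by and orders by the same column, so all rows within a partition tie and the server may number them in any order.
When you make a cross join, the rows could come back in any order.
It could be: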
CUSTOMERID PAYMENTID
4 1
4 2
5 2
5 1
or it could be
CUSTOMERID PAYMENTID
4 1
4 2
5 1
5 2
when you compute row_number in first recordset partition by CUSTOMERID, you get
CUSTOMERID PAYMENTID ROWNUMBER
4 1 1
4 2 2
5 2 1
5 1 2
when you compute row_number in second recordset partition by CUSTOMERID, you get
CUSTOMERID PAYMENTID ROWNUMBER
4 1 1
4 2 2
5 1 1
5 2 2
If you just want to pair random customers and payments, you can do this:
with cte_cust as (
select id, row_number() over (order by id) as row_num
from Customer
), cte_pay as (
select id, row_number() over (order by id) as row_num
from Payment
)
select
c.id as CustomerId,
p.id as PaymentId
from cte_cust as c
inner join cte_pay as p on p.row_num = c.row_num
Note that if you have more customers than payments, some customers will not appear in the result (and vice versa).
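Here is a small runnable check of that pairing, including the case where one side has extra rows. It assumes SQLite via Python's sqlite3 module and made-up sample IDs (customers 4, 5, 6 and payments 1, 2); only the Customer and Payment table names and the Id column come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Id INTEGER PRIMARY KEY);
CREATE TABLE Payment (Id INTEGER PRIMARY KEY);
INSERT INTO Customer VALUES (4), (5), (6);  -- one more customer than payments
INSERT INTO Payment VALUES (1), (2);
""")

# Each table is numbered on its own with a deterministic ORDER BY,
# then rows with the same number are paired.
query = """
WITH cte_cust AS (
  SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS row_num FROM Customer
), cte_pay AS (
  SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS row_num FROM Payment
)
SELECT c.Id AS CustomerId, p.Id AS PaymentId
FROM cte_cust AS c
INNER JOIN cte_pay AS p ON p.row_num = c.row_num
ORDER BY c.Id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Customer 6 has no partner and drops out of the inner join; a LEFT JOIN from cte_cust would keep it with a NULL PaymentId.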