How to use a subselect in a LEFT JOIN ON clause? - sql

I have a table t with:
| ORD_DATE | ORD_ID | ORD_REF | ORD_TYPE1 | ORD_TYPE2 | PRODNUM | PRODQUAL | PRICE
| 2020-09-01 | 101 | 101 | ORDER | ORDER | 456 | F | 555
| 2020-09-02 | 102 | 101 | CONF | ORDER | 456 | F | 555
| 2020-11-30 | 103 | 102 | ORDER | ORDER | 123 | K | 444
| 2020-12-01 | 104 | 102 | CONF | ORDER | 123 | K | 444
| 2020-12-01 | 105 | 103 | ORDER | ORDER | 123 | K | 444
| 2020-12-01 | 106 | 104 | ORDER | ORDER | 123 | K | 333
| 2020-12-02 | 107 | 104 | CONF | ORDER | 123 | K | 333
| 2020-12-08 | 108 | 104 | CONF | RETURN | 123 | K | -333
| 2020-12-01 | 109 | 105 | ORDER | ORDER | 123 | F | 222
| 2020-12-02 | 110 | 105 | CONF | ORDER | 123 | F | 222
and a table s with:
| ORD_DATE | PROD_NUMBER | PROD_QUAL
| 2020-12-01-00.00.00.000000 | 123 | K
| 2020-12-01-00.00.00.000000 | 123 | L
Table t contains all sales per day.
A sale has two stages: first the order is generated when the customer buys something ("ORDER"/"ORDER"). It is then confirmed, normally on the next day or within the next few days ("CONF"/"ORDER"). If a customer sends the product back, it is a return ("CONF"/"RETURN").
Table s contains the products that are "second hand".
If a product is in that table, all sales from table t with
ORDER_TYPE_1 = "ORDER"
AND ORDER_TYPE_2 = "ORDER"
AND t.ORD_DATE >= s.ORD_DATE
AND t.PROD_NUMBER = s.PROD_NUMBER
AND t.PROD_QUAL = s.PROD_QUAL
count as "second hand".
I need the sum of all "second hand" sales that are confirmed from the year 2021 and month 12. But only rows with CONF/ORDER or CONF/RETURN should be in the calculation. I have CAL_YEAR and CAL_MONTH in table t for that (omitted for less clutter).
From table t only ORD_REF 104 matches that, and the sum would be 0 because only these 2 rows matter:
| 2020-12-02 | 107 | 104 | CONF | ORDER | 123 | K | 333
| 2020-12-08 | 108 | 104 | CONF | RETURN | 123 | K | -333
My code so far:
SELECT SUM(PRICE)
FROM t
--
LEFT JOIN s
ON t.PRODNUM = s.PRODNUM
AND t.PRODQUAL = s.PRODQUAL
AND (SELECT ORD_DATE FROM t WHERE ORDER_TYPE_1 = 'ORDER' AND ORDER_TYPE_2 = 'ORDER') >= s.ORD_DATE
--
WHERE CAL_YEAR = 2021
AND CAL_MONTH = 12
AND ORDER_TYPE_1 = 'CONF'
AND ORDER_TYPE_2 IN ('ORDER', 'RETURN')
--
GROUP BY PRICE
;
SQL error: "single-row subquery returns more than one row"
My problem is limiting the LEFT JOIN to ORDER/ORDER rows (so that ORD_REF 104 is in) while using only CONF/ORDER and CONF/RETURN rows for the sum (so that ORD_REF 102 is out).
Can anyone help?

The simplest way I can think of would be to do a self-join, where you join a second copy of table t aliased t2 to use for the CONF/ORDER and CONF/RETURN rows, while you use t for the ORDER/ORDER rows.
SELECT SUM(t2.PRICE)
FROM t
--
INNER JOIN t t2
ON t2.ORD_REF = t.ORD_REF
AND t2.ORDER_TYPE_1 = 'CONF'
AND t2.ORDER_TYPE_2 IN ('ORDER', 'RETURN')
--
INNER JOIN s -- inner join, so only sales with a matching row in s are summed
ON t.PRODNUM = s.PRODNUM
AND t.PRODQUAL = s.PRODQUAL
AND t.ORD_DATE >= s.ORD_DATE
--
WHERE t.CAL_YEAR = 2021
AND t.CAL_MONTH = 12
AND t.ORDER_TYPE_1 = 'ORDER'
AND t.ORDER_TYPE_2 = 'ORDER'
;
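To check the self-join idea against the sample rows, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. Several things are assumptions of this sketch rather than part of the question: the sqlite3 harness itself, the column names (ORDER_TYPE_1, ORDER_TYPE_2, PRODNUM, PRODQUAL in both tables), storing s.ORD_DATE as a plain date, and joining s with an inner join so that only second-hand sales are summed. The CAL_YEAR/CAL_MONTH filter is left out because those columns are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (ORD_DATE TEXT, ORD_ID INTEGER, ORD_REF INTEGER,
                ORDER_TYPE_1 TEXT, ORDER_TYPE_2 TEXT,
                PRODNUM INTEGER, PRODQUAL TEXT, PRICE INTEGER);
CREATE TABLE s (ORD_DATE TEXT, PRODNUM INTEGER, PRODQUAL TEXT);
INSERT INTO t VALUES
  ('2020-09-01', 101, 101, 'ORDER', 'ORDER',  456, 'F',  555),
  ('2020-09-02', 102, 101, 'CONF',  'ORDER',  456, 'F',  555),
  ('2020-11-30', 103, 102, 'ORDER', 'ORDER',  123, 'K',  444),
  ('2020-12-01', 104, 102, 'CONF',  'ORDER',  123, 'K',  444),
  ('2020-12-01', 105, 103, 'ORDER', 'ORDER',  123, 'K',  444),
  ('2020-12-01', 106, 104, 'ORDER', 'ORDER',  123, 'K',  333),
  ('2020-12-02', 107, 104, 'CONF',  'ORDER',  123, 'K',  333),
  ('2020-12-08', 108, 104, 'CONF',  'RETURN', 123, 'K', -333),
  ('2020-12-01', 109, 105, 'ORDER', 'ORDER',  123, 'F',  222),
  ('2020-12-02', 110, 105, 'CONF',  'ORDER',  123, 'F',  222);
-- Assumption: plain dates, so that string comparison orders them correctly.
INSERT INTO s VALUES ('2020-12-01', 123, 'K'), ('2020-12-01', 123, 'L');
""")

# Shared FROM/WHERE part: o is the ORDER/ORDER row, c holds its CONF rows.
JOINS = """
FROM t AS o
JOIN t AS c
  ON c.ORD_REF = o.ORD_REF
 AND c.ORDER_TYPE_1 = 'CONF'
 AND c.ORDER_TYPE_2 IN ('ORDER', 'RETURN')
JOIN s
  ON o.PRODNUM = s.PRODNUM
 AND o.PRODQUAL = s.PRODQUAL
 AND o.ORD_DATE >= s.ORD_DATE
WHERE o.ORDER_TYPE_1 = 'ORDER'
  AND o.ORDER_TYPE_2 = 'ORDER'
"""

# Which CONF rows end up in the sum, and what the sum is.
contributing = conn.execute(
    "SELECT c.ORD_ID, c.PRICE " + JOINS + " ORDER BY c.ORD_ID"
).fetchall()
total = conn.execute("SELECT COALESCE(SUM(c.PRICE), 0) " + JOINS).fetchone()[0]

print(contributing)  # the CONF rows of ORD_REF 104
print(total)
```

On the sample data only ORD_ID 107 and 108 contribute, so the total is 0. ORD_REF 102 drops out because its ORDER/ORDER row is dated before the entry in s.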
If you need it to be more efficient, you could use analytic/window functions to pull the summed price from the CONF rows into the ORDER/ORDER row as a new column. This way it will only query table t once instead of twice.
SELECT SUM(t2.order_price_sum)
FROM (select t.*,
sum(case when ORDER_TYPE_1 = 'CONF'
AND ORDER_TYPE_2 IN ('ORDER', 'RETURN')
then t.price
else 0 end) over (partition by ord_ref) as order_price_sum
from t) t2
--
INNER JOIN s -- inner join, so only sales with a matching row in s are summed
ON t2.PRODNUM = s.PRODNUM
AND t2.PRODQUAL = s.PRODQUAL
AND t2.ord_date >= s.ORD_DATE
--
WHERE CAL_YEAR = 2021
AND CAL_MONTH = 12
AND ORDER_TYPE_1 = 'ORDER'
AND ORDER_TYPE_2 = 'ORDER'
;
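The window-function variant can be tried the same way. The following is a rough sketch under the same assumptions as before (Python's sqlite3 as a stand-in database, normalized column names, plain dates in s, inner join to s, no CAL_YEAR/CAL_MONTH filter). It also needs an SQLite build with window-function support (3.25 or later). The name conf_sum is made up for this example.

```python
import sqlite3

t_rows = [
    ("2020-09-01", 101, 101, "ORDER", "ORDER", 456, "F", 555),
    ("2020-09-02", 102, 101, "CONF", "ORDER", 456, "F", 555),
    ("2020-11-30", 103, 102, "ORDER", "ORDER", 123, "K", 444),
    ("2020-12-01", 104, 102, "CONF", "ORDER", 123, "K", 444),
    ("2020-12-01", 105, 103, "ORDER", "ORDER", 123, "K", 444),
    ("2020-12-01", 106, 104, "ORDER", "ORDER", 123, "K", 333),
    ("2020-12-02", 107, 104, "CONF", "ORDER", 123, "K", 333),
    ("2020-12-08", 108, 104, "CONF", "RETURN", 123, "K", -333),
    ("2020-12-01", 109, 105, "ORDER", "ORDER", 123, "F", 222),
    ("2020-12-02", 110, 105, "CONF", "ORDER", 123, "F", 222),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ORD_DATE, ORD_ID, ORD_REF, ORDER_TYPE_1, "
             "ORDER_TYPE_2, PRODNUM, PRODQUAL, PRICE)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?, ?)", t_rows)
conn.execute("CREATE TABLE s (ORD_DATE, PRODNUM, PRODQUAL)")
conn.executemany("INSERT INTO s VALUES (?, ?, ?)",
                 [("2020-12-01", 123, "K"), ("2020-12-01", 123, "L")])

# Every row of t gets the summed CONF price of its ORD_REF as an extra column.
WINDOWED = """
SELECT t.*,
       SUM(CASE WHEN ORDER_TYPE_1 = 'CONF'
                 AND ORDER_TYPE_2 IN ('ORDER', 'RETURN')
                THEN PRICE ELSE 0 END) OVER (PARTITION BY ORD_REF) AS conf_sum
FROM t
"""

# The new column as seen on each ORDER/ORDER row (before filtering by s).
per_order = conn.execute(f"""
    SELECT ORD_REF, conf_sum FROM ({WINDOWED}) AS w
    WHERE ORDER_TYPE_1 = 'ORDER' AND ORDER_TYPE_2 = 'ORDER'
    ORDER BY ORD_REF
""").fetchall()

# Keep only second-hand ORDER/ORDER rows and add up their CONF sums.
total = conn.execute(f"""
    SELECT COALESCE(SUM(w.conf_sum), 0)
    FROM ({WINDOWED}) AS w
    JOIN s ON w.PRODNUM = s.PRODNUM
          AND w.PRODQUAL = s.PRODQUAL
          AND w.ORD_DATE >= s.ORD_DATE
    WHERE w.ORDER_TYPE_1 = 'ORDER' AND w.ORDER_TYPE_2 = 'ORDER'
""").fetchone()[0]

print(per_order)
print(total)
```

One caveat with this shape: if an ORD_REF ever had more than one ORDER/ORDER row, its CONF sum would be counted once per such row.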

Related

Joining multiple tables and getting MAX value in subquery PostgreSQL

I have 4 Tables in PostgreSQL with the following structure as you can see below:
"Customers"
ID | NAME
101 Max
102 Peter
103 Alex
"orders"
ID | customer_id | CREATED_AT
1 101 2022-05-12
2 101 2022-06-14
3 101 2022-07-9
4 102 2022-02-14
5 102 2022-06-18
6 103 2022-05-22
"orderEntry"
ID | order_id | product_id |
1 3 10
2 3 20
3 3 30
4 5 20
5 5 40
6 6 20
"product"
ID | min_duration
10 P10D
20 P20D
30 P30D
40 P40D
50 P50D
Firstly I need to select "orders" with the max(created_at) date for each customer this is done with the query (it works!):
SELECT c.id as customerId,
o.id as orderId,
o.created_at
FROM Customer c
INNER JOIN Orders o
ON c.id = o.customer_id
INNER JOIN
(
SELECT customer_id, MAX(created_at) Max_Date
FROM Orders
GROUP BY customer_id
) res ON o.customer_id = res.customer_id AND
o.created_at = res.Max_date
the result will look like this:
customer_id | order_id | CREATED_AT
101 3 2022-07-9
102 5 2022-06-18
103 6 2022-05-22
Secondly I need to select for each order_id from "orderEntry" Table, "products" with the max(min_duration) the result should be:
order_id | max(min_duration)
3 P30D
5 P40D
6 P20D
and then join results from 1) and 2) queries by "order_id" and the total result which I'm trying to get should look like this:
customer_name | customer_id | Order_ID | Order_CREATED_AT | Max_Duration
Max 101 3 2022-07-9 P30D
Peter 102 5 2022-06-18 P40D
Alex 103 6 2022-05-22 P20D
I'm struggling to write the query for 2) and then join everything with the query from 1) to get the result. I would appreciate any help!
You could turn the first query into a CTE and join the remaining tables to it, like this:
WITH CTE AS ( SELECT c.id as customerId,
o.id as orderId,
o.created_at
FROM Customer c
INNER JOIN Orders o
ON c.id = o.customer_id
INNER JOIN
(
SELECT customer_id, MAX(created_at) Max_Date
FROM Orders
GROUP BY customer_id
) res ON o.customer_id = res.customer_id AND
o.created_at = res.Max_date)
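SELECT customerId, orderId, created_at, pr.min_duration
FROM CTE
-- MAX(product_id) picks the highest product id per order; in the sample data that is also the product with the longest min_duration
JOIN (SELECT order_id, MAX(product_id) AS product_id FROM "orderEntry" GROUP BY order_id) oe ON CTE.orderId = oe.order_id
JOIN product pr ON oe.product_id = pr.id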
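If the longest min_duration should be chosen by the duration itself rather than by product id, one option is to aggregate on the day count. The sketch below uses Python's sqlite3 with the sample data and assumes a few things not stated in the question: every duration has the form PnD (whole days only), the table is named order_entry instead of "orderEntry", and dates are zero-padded. In PostgreSQL itself, casting min_duration to interval before taking MAX would probably be the more natural route.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER, created_at TEXT);
CREATE TABLE order_entry (id INTEGER, order_id INTEGER, product_id INTEGER);
CREATE TABLE product (id INTEGER, min_duration TEXT);
INSERT INTO customers VALUES (101, 'Max'), (102, 'Peter'), (103, 'Alex');
INSERT INTO orders VALUES
  (1, 101, '2022-05-12'), (2, 101, '2022-06-14'), (3, 101, '2022-07-09'),
  (4, 102, '2022-02-14'), (5, 102, '2022-06-18'), (6, 103, '2022-05-22');
INSERT INTO order_entry VALUES
  (1, 3, 10), (2, 3, 20), (3, 3, 30), (4, 5, 20), (5, 5, 40), (6, 6, 20);
INSERT INTO product VALUES
  (10, 'P10D'), (20, 'P20D'), (30, 'P30D'), (40, 'P40D'), (50, 'P50D');
""")

result = conn.execute("""
WITH latest AS (              -- step 1: newest order per customer
  SELECT o.id AS order_id, o.customer_id, o.created_at
  FROM orders o
  JOIN (SELECT customer_id, MAX(created_at) AS max_date
        FROM orders GROUP BY customer_id) m
    ON m.customer_id = o.customer_id AND m.max_date = o.created_at
),
longest AS (                  -- step 2: longest duration per order, in days
  SELECT oe.order_id,
         MAX(CAST(substr(p.min_duration, 2, length(p.min_duration) - 2)
                  AS INTEGER)) AS max_days
  FROM order_entry oe
  JOIN product p ON p.id = oe.product_id
  GROUP BY oe.order_id
)
SELECT c.name, c.id, l.order_id, l.created_at, 'P' || g.max_days || 'D'
FROM latest l
JOIN customers c ON c.id = l.customer_id
JOIN longest g ON g.order_id = l.order_id
ORDER BY c.id
""").fetchall()

for row in result:
    print(row)
```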

Tracing original Value through Iteration SQL

Suppose there is a data collection system in which, whenever a record is altered, it is saved as a new record with a prefix (say M-[the most recent number in the queue, which is unique]).
Suppose I am given the following data set:
Customer | Original_Val
1 1020
2 1011
3 1001
I need to find the most recent value for each customer given the following table:
Customer | Most_Recent_Val | Pretained_To_Val | date
1 M-2000 M-1050 20170225
1 M-1050 M-1035 20170205
1 M-1035 1020 20170131
1 1020 NULL 20170101
2 M-1031 1011 20170105
2 1011 NULL 20161231
3 1001 NULL 20150101
My desired output would be:
Customer | Original_Val | Most_Recent_Val | date
1 1020 M-2000 20170225
2 1011 M-1031 20170105
3 1001 1001 20150101
For customer 1, there are 4 levels i.e (M-2000 <- M-1050 <- M-1035 <- 1020) Note that there would be no more than 10 levels of depth for each customer.
Much Appreciated! Thanks in advance.
Find the earliest and the latest row for each customer (by date) and then join the two together. Something like this:
Select
[min].Customer
,[min].Most_Recent_Val as Original_Val
,[max].Most_Recent_Val as Most_Recent_Val
,[max].date
From
(
Select
Customer
,Most_Recent_Val
,date
From
table t1
inner join (
Select
Customer
,MIN(date) as MIN_Date
From
table
Group By
Customer
) t2 ON t2.Customer = t1.Customer
and t2.MIN_Date = t1.Date
) [min]
inner join (
Select
Customer
,Most_Recent_Val
,date
From
table t1
inner join (
Select
Customer
,MAX(date) as MAX_Date
From
table
Group By
Customer
) t2 ON t2.Customer = t1.Customer
and t2.MAX_Date = t1.Date
) [max] ON [max].Customer = [min].Customer
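The min/max approach relies on the dates being in chain order. If the chain should instead be followed link by link through Pretained_To_Val, a recursive CTE is one way to do it. This is only a sketch, run here through Python's sqlite3. The table name history and the depth column are invented for the example, and it assumes each value is superseded at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (customer INTEGER, most_recent_val TEXT, "
             "pretained_to_val TEXT, date TEXT)")
conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?)", [
    (1, "M-2000", "M-1050", "20170225"),
    (1, "M-1050", "M-1035", "20170205"),
    (1, "M-1035", "1020", "20170131"),
    (1, "1020", None, "20170101"),
    (2, "M-1031", "1011", "20170105"),
    (2, "1011", None, "20161231"),
    (3, "1001", None, "20150101"),
])

result = conn.execute("""
WITH RECURSIVE chain(customer, original_val, current_val, date, depth) AS (
  -- anchor: the original records, which point to nothing
  SELECT customer, most_recent_val, most_recent_val, date, 0
  FROM history
  WHERE pretained_to_val IS NULL
  UNION ALL
  -- step: the record that replaced the current value
  SELECT h.customer, c.original_val, h.most_recent_val, h.date, c.depth + 1
  FROM chain c
  JOIN history h
    ON h.customer = c.customer
   AND h.pretained_to_val = c.current_val
)
SELECT customer, original_val, current_val, date
FROM chain c
-- keep only the end of each customer's chain
WHERE depth = (SELECT MAX(depth) FROM chain c2 WHERE c2.customer = c.customer)
ORDER BY customer
""").fetchall()

for row in result:
    print(row)
```

Because the question caps the depth at 10 levels, the recursion stays small. On SQL Server the same idea should carry over with WITH instead of WITH RECURSIVE.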

Subtracting rows depending on values of another column

I have a Purchases table and want to subtract purchase dates depending on the customer ID. Customer IDs repeat, so I want to subtract the purchase dates of the rows for customer ID 105 from each other, likewise for 108, and so on.
I have the following code, but it is subtracting each purchase date from the next purchase date:
SELECT DATEDIFF(DAY,P1.PURCHASEDATE,P2.PURCHASEDATE) AS "diff in days since last purchase"
FROM Purchases P1
JOIN Purchases P2
ON P1.CustomerID= P2.CustomerID
Try adding a not-equal condition to your ON clause, P1.PURCHASEID <> P2.PURCHASEID, so that a row is never paired with itself. Something like this:
SELECT DATEDIFF(DAY,P1.PURCHASEDATE,P2.PURCHASEDATE) AS "diff in days"
FROM Purchases P1
JOIN Purchases P2
ON P1.CustomerID = P2.CustomerID AND P1.PURCHASEID <> P2.PURCHASEID
You can use OUTER APPLY:
;WITH Purchases AS (
SELECT *
FROM (VALUES
(1,'2012-08-15',1,105,'a510'),
(2,'2012-08-15',2,102,'a510'),
(3,'2012-08-15',3,103,'a506'),
(4,'2012-08-16',1,105,'a510'),
(5,'2012-08-17',5,106,'a507'),
(6,'2012-08-17',5,107,'a509'),
(7,'2012-08-18',4,108,'a502'),
(8,'2012-08-19',2,108,'a510'),
(9,'2012-08-19',3,109,'a502'),
(10,'2012-08-20',3,110,'a503')
) as t(PurchaseID,PurchaseDate,Qty,CustomerID,ProductID)
)
SELECT p1.*,
DATEDIFF(DAY,P2.PurchaseDate,P1.PurchaseDate) as ddiff
FROM Purchases p1
OUTER APPLY (
SELECT TOP 1 *
FROM Purchases
WHERE p1.CustomerID = CustomerID
AND PurchaseDate < p1.PurchaseDate
ORDER BY PurchaseDate DESC
) p2
Will output:
PurchaseID PurchaseDate Qty CustomerID ProductID ddiff
1 2012-08-15 1 105 a510 NULL
2 2012-08-15 2 102 a510 NULL
3 2012-08-15 3 103 a506 NULL
4 2012-08-16 1 105 a510 1
5 2012-08-17 5 106 a507 NULL
6 2012-08-17 5 107 a509 NULL
7 2012-08-18 4 108 a502 NULL
8 2012-08-19 2 108 a510 1
9 2012-08-19 3 109 a502 NULL
10 2012-08-20 3 110 a503 NULL
Also you can use LAG (SQL Server 2012 and up):
SELECT *,
DATEDIFF(DAY,LAG(PurchaseDate,1,NULL) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate),PurchaseDate) as ddiff
FROM Purchases
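For a quick check of the LAG approach outside SQL Server, here is a small sketch using Python's sqlite3 (SQLite 3.25 or later for window functions). SQLite has no DATEDIFF, so the day difference is computed with julianday, which is a substitution made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchases (PurchaseID INTEGER, PurchaseDate TEXT, "
             "Qty INTEGER, CustomerID INTEGER, ProductID TEXT)")
conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?, ?, ?)", [
    (1, "2012-08-15", 1, 105, "a510"),
    (2, "2012-08-15", 2, 102, "a510"),
    (3, "2012-08-15", 3, 103, "a506"),
    (4, "2012-08-16", 1, 105, "a510"),
    (5, "2012-08-17", 5, 106, "a507"),
    (6, "2012-08-17", 5, 107, "a509"),
    (7, "2012-08-18", 4, 108, "a502"),
    (8, "2012-08-19", 2, 108, "a510"),
    (9, "2012-08-19", 3, 109, "a502"),
    (10, "2012-08-20", 3, 110, "a503"),
])

# LAG fetches the same customer's previous purchase date (NULL for the first).
rows = conn.execute("""
SELECT PurchaseID, CustomerID,
       CAST(julianday(PurchaseDate)
            - julianday(LAG(PurchaseDate) OVER (PARTITION BY CustomerID
                                                ORDER BY PurchaseDate))
            AS INTEGER) AS ddiff
FROM Purchases
ORDER BY PurchaseID
""").fetchall()

ddiffs = [ddiff for _, _, ddiff in rows]
print(ddiffs)
```

Only purchases 4 and 8 have an earlier purchase by the same customer, so they are the only rows with a non-NULL difference.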

Select 1+ most recent rows

Given is a table with articles. The following exemplary table contains one article in different variations:
ID ARTICLE_NUMBER STORE_ID COUNTRY TYPE VALID_FROM
----------------------------------------------------------------
100 1 22 DE A 2015-11-01
101 1 22 DE A 2015-11-02
102 1 22 DE A 2015-11-03
103 1 22 DE A 2015-11-04
104 1 22 DE B 2015-11-10
105 1 22 DE B 2015-11-11
106 1 22 DE B 2015-11-11
What I need is a query which returns just the ID of the article with
article_number = 1 AND
store_id = 22 AND
country = 'DE' AND
the latest valid_from timestamp.
So far, the query should return ID = 105 or 106 (both have the same valid_from date, but I want only the one or the other in my result, no matter which, but not both). AND: because there are two types for this article (A + B), I also need ID = 103 in my result set.
What must the query look like?
You could try a HAVING clause in your filter and select MAX(ID).
Or with a subselect:
SELECT [Type],(SELECT TOP(1) ID from dbo.articles S WHERE S.[Type] = A.Type AND S.Valid_From = MAX(A.Valid_From))
FROM dbo.articles A
WHERE
ARTICLE_NUMBER = 1
AND STORE_ID = 22
AND Country = 'DE'
-- AND Valid_FROM = (SELECT MAX(VALID_FROM) FROM dbo.articles)
GROUP BY [Type]
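Another way to get exactly one row per type is ROW_NUMBER, which is a different technique from the subselect above. This is a hedged sketch, run through Python's sqlite3 (3.25 or later). The tie between IDs 105 and 106 is broken by taking the higher ID, which is an arbitrary choice since the question accepts either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, article_number INTEGER, "
             "store_id INTEGER, country TEXT, type TEXT, valid_from TEXT)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?, ?, ?)", [
    (100, 1, 22, "DE", "A", "2015-11-01"),
    (101, 1, 22, "DE", "A", "2015-11-02"),
    (102, 1, 22, "DE", "A", "2015-11-03"),
    (103, 1, 22, "DE", "A", "2015-11-04"),
    (104, 1, 22, "DE", "B", "2015-11-10"),
    (105, 1, 22, "DE", "B", "2015-11-11"),
    (106, 1, 22, "DE", "B", "2015-11-11"),
])

# Number the rows of each type from newest to oldest, then keep number 1.
latest = conn.execute("""
SELECT id, type
FROM (
  SELECT id, type,
         ROW_NUMBER() OVER (PARTITION BY type
                            ORDER BY valid_from DESC, id DESC) AS rn
  FROM articles
  WHERE article_number = 1 AND store_id = 22 AND country = 'DE'
) AS ranked
WHERE rn = 1
ORDER BY type
""").fetchall()

print(latest)
```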

SQL Server : take 1 to many record set and make 1 record per id

I need some help. I need to take the data from these 3 tables and create an output that looks like below. The plan_name_x and pending_tallyx columns are derived to make one line per claim id. Each claim id can be associated to up to 3 plans and I want to show each plan and tally amounts in one record. What is the best way to do this?
Thanks for any ideas. :)
Output result set needed:
claim_id ac_name plan_name_1 pending_tally1 plan_name_2 Pending_tally2 plan_name_3 pending_tally3
-------- ------- ----------- -------------- ----------- -------------- ----------- --------------
1234 abc cooks delux_prime 22 prime_express 23 standard_prime 2
2341 zzz bakers delux_prime 22 standard_prime 2 NULL NULL
3412 azb pasta's prime_express 23 NULL NULL NULL NULL
SQL Server 2005 table to use for the above result set:
company_claims
claim_id ac_name
1234 abc cooks
2341 zzz bakers
3412 azb pasta's
claim_plans
claim_id plan_id plan_name
1234 101 delux_prime
1234 102 Prime_express
1234 103 standard_prime
2341 101 delux_prime
2341 103 standard_prime
3412 102 Prime_express
Pending_amounts
claim_id plan_id Pending_tally
1234 101 22
1234 102 23
1234 103 2
2341 101 22
2341 103 2
3412 102 23
If you know that 3 is always the max amount of plans then some left joins will work fine:
select c.claim_id, c.ac_name,
cp1.plan_name as plan_name_1, pa1.pending_tally as pending_tally1,
cp2.plan_name as plan_name_2, pa2.pending_tally as pending_tally2,
cp3.plan_name as plan_name_3, pa3.pending_tally as pending_tally3
from company_claims c
left join claim_plans cp1 on c.claim_id = cp1.claim_id and cp1.plan_id = 101
left join claim_plans cp2 on c.claim_id = cp2.claim_id and cp2.plan_id = 102
left join claim_plans cp3 on c.claim_id = cp3.claim_id and cp3.plan_id = 103
left join pending_amounts pa1 on cp1.claim_id = pa1.claim_id and cp1.plan_id = pa1.plan_id
left join pending_amounts pa2 on cp2.claim_id = pa2.claim_id and cp2.plan_id = pa2.plan_id
left join pending_amounts pa3 on cp3.claim_id = pa3.claim_id and cp3.plan_id = pa3.plan_id
I would first join all your data so that you get the relevant columns: claim_id, ac_name, plan_name, pending tally.
Then I would transform this to get the plan name and the plan tally on different rows, with a label tying them together.
Then it should be easy to pivot.
I would tie these together with common table expressions.
Here's the query:
with X as (
select cc.*, cp.plan_name, pa.pending_tally,
rank() over (partition by cc.claim_id order by plan_name) as r
from company_claims cc
join claim_plans cp on cp.claim_id = cc.claim_id
join pending_amounts pa on pa.claim_id = cp.claim_id
and pa.plan_id = cp.plan_id
), P as (
select
X.claim_id,
x.ac_name,
x.plan_name as value,
'plan_name_' + cast(r as varchar(max)) as label
from x
union all
select
X.claim_id,
x.ac_name,
cast(x.pending_tally as varchar(max)) as value,
'pending_tally' + cast(r as varchar(max)) as label
from x
)
select claim_id, ac_name, [plan_name_1], [pending_tally1],[plan_name_2], [pending_tally2],[plan_name_3], [pending_tally3]
from (select * from P) p
pivot (
max(value)
for label in ([plan_name_1], [pending_tally1],[plan_name_2], [pending_tally2],[plan_name_3], [pending_tally3])
) as pvt
order by pvt.claim_id, ac_name
Here's a fiddle showing it in action: http://sqlfiddle.com/#!3/68f62/10
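For comparison, the same "number the plans, then spread them across columns" idea can be written with conditional aggregation instead of PIVOT. The sketch below runs on Python's sqlite3 (3.25 or later), which is only a stand-in here: SQLite has no PIVOT, and the ordering by plan_id is an assumption chosen because it reproduces the column order in the requested output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company_claims (claim_id INTEGER, ac_name TEXT);
CREATE TABLE claim_plans (claim_id INTEGER, plan_id INTEGER, plan_name TEXT);
CREATE TABLE pending_amounts (claim_id INTEGER, plan_id INTEGER,
                              pending_tally INTEGER);
""")
conn.executemany("INSERT INTO company_claims VALUES (?, ?)", [
    (1234, "abc cooks"), (2341, "zzz bakers"), (3412, "azb pasta's"),
])
conn.executemany("INSERT INTO claim_plans VALUES (?, ?, ?)", [
    (1234, 101, "delux_prime"), (1234, 102, "Prime_express"),
    (1234, 103, "standard_prime"), (2341, 101, "delux_prime"),
    (2341, 103, "standard_prime"), (3412, 102, "Prime_express"),
])
conn.executemany("INSERT INTO pending_amounts VALUES (?, ?, ?)", [
    (1234, 101, 22), (1234, 102, 23), (1234, 103, 2),
    (2341, 101, 22), (2341, 103, 2), (3412, 102, 23),
])

rows = conn.execute("""
WITH ranked AS (   -- number each claim's plans 1, 2, 3
  SELECT cc.claim_id, cc.ac_name, cp.plan_name, pa.pending_tally,
         ROW_NUMBER() OVER (PARTITION BY cc.claim_id
                            ORDER BY cp.plan_id) AS r
  FROM company_claims cc
  JOIN claim_plans cp ON cp.claim_id = cc.claim_id
  JOIN pending_amounts pa ON pa.claim_id = cp.claim_id
                         AND pa.plan_id = cp.plan_id
)
SELECT claim_id, ac_name,
       MAX(CASE WHEN r = 1 THEN plan_name END)     AS plan_name_1,
       MAX(CASE WHEN r = 1 THEN pending_tally END) AS pending_tally1,
       MAX(CASE WHEN r = 2 THEN plan_name END)     AS plan_name_2,
       MAX(CASE WHEN r = 2 THEN pending_tally END) AS pending_tally2,
       MAX(CASE WHEN r = 3 THEN plan_name END)     AS plan_name_3,
       MAX(CASE WHEN r = 3 THEN pending_tally END) AS pending_tally3
FROM ranked
GROUP BY claim_id, ac_name
ORDER BY claim_id
""").fetchall()

for row in rows:
    print(row)
```

ROW_NUMBER is available from SQL Server 2005 on, so the CTE and the CASE expressions should carry over largely unchanged. Unlike the PIVOT version, the tallies stay numeric because nothing has to be cast to varchar.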