SQL joining tables based off latest previous date


Let's say I have two tables for example:
Table 1 - customer order information
+---------+------+------------+
| cust_id | item | order date |
+---------+------+------------+
| 1       | 100  | 01/01/2020 |
| 1       | 112  | 03/07/2022 |
| 2       | 100  | 01/02/2020 |
| 2       | 168  | 05/03/2022 |
| 3       | 200  | 15/06/2021 |
+---------+------+------------+
and Table 2 - customer membership status
+---------+--------+------------+
| cust_id | Status | startdate  |
+---------+--------+------------+
| 1       | silver | 01/01/2019 |
| 1       | bronze | 05/12/2019 |
| 1       | gold   | 05/06/2022 |
| 2       | silver | 24/12/2021 |
+---------+--------+------------+
I want to join the two tables so that I can see what their membership status was at the time of purchase, to produce something like this:
+---------+------+------------+--------+
| cust_id | item | order date | status |
+---------+------+------------+--------+
| 1       | 100  | 01/01/2020 | bronze |
| 1       | 112  | 03/07/2022 | gold   |
| 2       | 100  | 01/02/2020 | NULL   |
| 2       | 168  | 05/03/2022 | silver |
| 3       | 200  | 15/06/2021 | NULL   |
+---------+------+------------+--------+
I have tried multiple approaches, including min/max, >=, and GROUP BY ... HAVING, with no luck. I feel like multiple joins are going to be needed here, but I can't figure it out. Any help would be greatly appreciated.
(Also note: dates are in European/Australian format (dd/mm/yyyy), not American format.)

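Try the following, which uses the LEAD function to define the period limits for each status. Each period runs from its startdate up to, but not including, the next startdate, so an order placed on the day a status changes matches only the new status; the latest status has no end date:
SELECT T.cust_id, T.item, T.orderdate, D.status
FROM order_information T
LEFT JOIN
(
SELECT cust_id, Status, startdate,
LEAD(startdate) OVER (PARTITION BY cust_id ORDER BY startdate) AS enddate
FROM customer_membership
) D
ON T.cust_id = D.cust_id AND
T.orderdate >= D.startdate AND
(D.enddate IS NULL OR T.orderdate < D.enddate)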
See a demo on SQL Server.
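For readers who want to run the LEAD approach locally, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server. It assumes SQLite 3.25 or later (for window functions), stores the question's dates as ISO strings so that plain string comparison orders them correctly, and uses a half-open interval (start inclusive, next start exclusive). The table names follow the answer above; the Python variable names are illustrative only.

```python
import sqlite3

# In-memory copy of the question's sample data, dates rewritten as YYYY-MM-DD.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_information (cust_id INTEGER, item INTEGER, orderdate TEXT);
CREATE TABLE customer_membership (cust_id INTEGER, status TEXT, startdate TEXT);
INSERT INTO order_information VALUES
  (1, 100, '2020-01-01'), (1, 112, '2022-07-03'),
  (2, 100, '2020-02-01'), (2, 168, '2022-03-05'),
  (3, 200, '2021-06-15');
INSERT INTO customer_membership VALUES
  (1, 'silver', '2019-01-01'), (1, 'bronze', '2019-12-05'),
  (1, 'gold',   '2022-06-05'), (2, 'silver', '2021-12-24');
""")

# LEAD gives each status the start date of the next status as its end date;
# the most recent status gets NULL, meaning "still current".
QUERY = """
SELECT t.cust_id, t.item, t.orderdate, d.status
FROM order_information AS t
LEFT JOIN (
    SELECT cust_id, status, startdate,
           LEAD(startdate) OVER (PARTITION BY cust_id ORDER BY startdate) AS enddate
    FROM customer_membership
) AS d
  ON t.cust_id = d.cust_id
 AND t.orderdate >= d.startdate
 AND (d.enddate IS NULL OR t.orderdate < d.enddate)
ORDER BY t.cust_id, t.orderdate
"""

status_at_purchase = conn.execute(QUERY).fetchall()
for row in status_at_purchase:
    print(row)
```

On the sample data this yields bronze and gold for customer 1, NULL (None in Python) then silver for customer 2, and NULL for customer 3, which matches the output requested in the question.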

SELECT
[cust_id],
[item],
[order date],
[status]
FROM
(
SELECT
t1.[cust_id],
t1.[item],
t1.[order date],
t2.[status],
ROW_NUMBER() OVER (PARTITION BY t1.[cust_id], t1.[item], t1.[order date] ORDER BY t2.[startdate] DESC) rn
FROM #t1 t1
LEFT JOIN #t2 t2
ON t1.[cust_id] = t2.[cust_id] AND t1.[order date] >= t2.[startdate]
) a
WHERE rn = 1

SELECT
    o.cust_id,
    o.item,
    o.order_date,
    m.status
FROM
    customer_order o
LEFT JOIN
    customer_membership m
    ON o.cust_id = m.cust_id
    -- keep only the latest membership row that started on or before the order date
    AND m.start_date = (
        SELECT Max(m2.start_date)
        FROM customer_membership m2
        WHERE m2.cust_id = o.cust_id
          AND m2.start_date <= o.order_date
    );

Related

Getting date, and count of unique customers when first order was placed

I have a table called orders that looks like this:
+--------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| memberid | int(11) | YES | | NULL | |
| deliverydate | date | YES | | NULL | |
+--------------+---------+------+-----+---------+-------+
And that contains the following data:
+------+----------+--------------+
| id | memberid | deliverydate |
+------+----------+--------------+
| 1 | 991 | 2019-10-25 |
| 2 | 991 | 2019-10-26 |
| 3 | 992 | 2019-10-25 |
| 4 | 992 | 2019-10-25 |
| 5 | 993 | 2019-10-24 |
| 7 | 994 | 2019-10-21 |
| 6 | 994 | 2019-10-26 |
| 8 | 995 | 2019-10-26 |
+------+----------+--------------+
I would like a result set returning each unique date, with a separate column showing how many customers placed their first order on that day.
I'm having problems querying this the right way, especially when the data contains multiple orders on the same day from the same customer.
My approach has been to:
1. Get all unique memberids that placed an order during the time period I want to look at.
2. Keep only the ones that placed their first order during the period, by excluding the memberids that placed an order before the time period.
3. Group by delivery date and count all unique memberids (but this obviously counts unique memberids for each day individually!).
Here's the corresponding SQL:
SELECT deliverydate,COUNT(DISTINCT memberid) FROM orders
WHERE
MemberId IN (SELECT DISTINCT memberid FROM orders WHERE deliverydate BETWEEN '2019-10-25' AND '2019-10-26')
AND NOT
MemberId In (SELECT DISTINCT memberid FROM orders WHERE deliverydate < '2019-10-25')
GROUP BY deliverydate
ORDER BY deliverydate ASC;
But this results in the following with the above data:
+--------------+--------------------------+
| deliverydate | COUNT(DISTINCT memberid) |
+--------------+--------------------------+
| 2019-10-25 | 2 |
| 2019-10-26 | 2 |
+--------------+--------------------------+
The count for 2019-10-26 should be 1.
Appreciate any help :)

You can aggregate twice:
select first_deliverydate, count(*) cnt
from (
select min(deliverydate) first_deliverydate
from orders
group by memberid
) t
group by first_deliverydate
order by first_deliverydate
The subquery gives you the first order date of each member, then the outer query aggregates and counts by first order date.
This demo on DB Fiddle with your sample data returns:
first_deliverydate | cnt
:----------------- | --:
2019-10-21 | 1
2019-10-24 | 1
2019-10-25 | 2
2019-10-26 | 1
In MySQL 8.0, this can also be achieved with window functions:
select deliverydate first_deliverydate, count(*) cnt
from (
select deliverydate, row_number() over(partition by memberid order by deliverydate) rn
from orders
) t
where rn = 1
group by deliverydate
order by deliverydate
Demo on DB Fiddle
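The two-level aggregation can be checked locally with a small sketch like the one below. It uses Python's built-in sqlite3 module rather than MySQL, and it adds a filter on the first-order date so that only the period from the question (2019-10-25 to 2019-10-26) is reported; that filter and the Python variable names are assumptions added for illustration, not part of the answer above.

```python
import sqlite3

# In-memory copy of the question's orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, memberid INTEGER, deliverydate TEXT);
INSERT INTO orders VALUES
  (1, 991, '2019-10-25'), (2, 991, '2019-10-26'),
  (3, 992, '2019-10-25'), (4, 992, '2019-10-25'),
  (5, 993, '2019-10-24'), (7, 994, '2019-10-21'),
  (6, 994, '2019-10-26'), (8, 995, '2019-10-26');
""")

# Inner query: one row per member with that member's first delivery date.
# Outer query: count members per first delivery date, limited to a period.
FIRST_ORDERS = """
SELECT first_deliverydate, COUNT(*) AS cnt
FROM (
    SELECT MIN(deliverydate) AS first_deliverydate
    FROM orders
    GROUP BY memberid
) AS t
WHERE first_deliverydate BETWEEN ? AND ?
GROUP BY first_deliverydate
ORDER BY first_deliverydate
"""

first_order_counts = conn.execute(
    FIRST_ORDERS, ("2019-10-25", "2019-10-26")
).fetchall()
print(first_order_counts)
```

Because each member contributes exactly one row to the inner query, repeat orders on the same day (member 992) or on a later day (members 991 and 994) cannot be counted twice, which gives 2 for 2019-10-25 and 1 for 2019-10-26 as the question expects.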

You first have to figure out each member's first delivery date:
SELECT firstdeliverydate,COUNT(DISTINCT memberid) FROM (
select memberid, min(deliverydate) as firstdeliverydate
from orders
WHERE
MemberId IN (SELECT DISTINCT memberid FROM orders WHERE deliverydate BETWEEN '2019-10-25' AND '2019-10-26')
AND NOT
MemberId In (SELECT DISTINCT memberid FROM orders WHERE deliverydate < '2019-10-25')
group by memberid)
t1
group by firstdeliverydate

Get the first order of each customer with NOT EXISTS and then GROUP BY deliverydate to count the distinct customers who placed their order:
select o.deliverydate, count(distinct o.memberid) counter
from orders o
where not exists (
select 1 from orders
where memberid = o.memberid and deliverydate < o.deliverydate
)
group by o.deliverydate
See the demo.
Results:
| deliverydate | counter |
| ------------------- | ------- |
| 2019-10-21 00:00:00 | 1 |
| 2019-10-24 00:00:00 | 1 |
| 2019-10-25 00:00:00 | 2 |
| 2019-10-26 00:00:00 | 1 |
But if you want results for all the dates in the table, including those dates where there were no orders from new customers (so the counter will be 0):
select d.deliverydate, count(distinct o.memberid) counter
from (
select distinct deliverydate
from orders
) d left join orders o
on o.deliverydate = d.deliverydate and not exists (
select 1 from orders
where memberid = o.memberid and deliverydate < o.deliverydate
)
group by d.deliverydate

Retrieve the minimal create date with multiple rows

I have an issue with an SQL query that I am trying to write. I am trying to retrieve the row that has the minimal create_dt for each inst (see table) and amount (which isn't unique).
Unfortunately I can't use group by as the amount column isn't unique.
+--------------+--------+------+-------------+
| Company_Name | Amount | inst | Create Date |
+--------------+--------+------+-------------+
| Company A | 1000 | 4545 | 01/10/2018 |
| Company A | 400 | 4545 | 01/11/2018 |
| Company A | 200 | 4545 | 31/10/2018 |
| Company B | 2000 | 4893 | 01/10/2016 |
| Company B | 212 | 4893 | 04/10/2016 |
| Company B | 100 | 4893 | 10/10/2017 |
| Company B | 20 | 4893 | 04/10/2018 |
+--------------+--------+------+-------------+
In the above example I expect to see:
+--------------+--------+------+-------------+
| Company_Name | Amount | inst | Create Date |
+--------------+--------+------+-------------+
| Company A | 1000 | 4545 | 01/10/2018 |
| Company B | 2000 | 4893 | 01/10/2016 |
+--------------+--------+------+-------------+
Code:
SELECT
bill_company, bill_name, account_no
FROM
dbo.customer_information;
SELECT
balance_id, balance_id2, minus_balance,new_balance,
create_date, account_no
FROM
dbo.btr
SELECT
balance_id, balance_id2, expired_Date, amount, balance_type, account_no
FROM
dbo.btr_balance
SELECT
balance_ist, expired_date, account_no, balance_type
FROM
dbo.BALANCE_inst
Retrieve the minimal create date for a balance instance with the lowest balance for a balance inst.
(SELECT
bill_company,
bill_name,
account_no,
balance_ist,
amount,
MIN(create_date)
FROM
dbo.mtr btr
LEFT JOIN
btr_balance btrb ON btr.balance_id = btrb.balance_id
AND btr.balance_id2 = btrb.balance_id2
LEFT JOIN
balance_inst bali ON btr.account_no = bali.account_no
AND btrb.expired_date = bali.expired_date
GROUP BY
bill_company, bill_name, account_no,amount, balance_ist)
I have seen some solutions that use a correlated query, but I can't seem to get my head around them.

Common Table Expression (CTE) will help you.
;with cte as (
select *, row_number() over(partition by company_name order by create_date) rn
from dbo.myTable
)
select * from cte
where rn = 1;
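As a runnable illustration of the ROW_NUMBER pattern, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions). The table name balances is hypothetical, the dates are rewritten as ISO strings so they sort correctly, and the partition is on inst because the question asks for the earliest row per inst.

```python
import sqlite3

# Hypothetical table holding the flattened sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE balances (company_name TEXT, amount INTEGER, inst INTEGER, create_date TEXT);
INSERT INTO balances VALUES
  ('Company A', 1000, 4545, '2018-10-01'),
  ('Company A',  400, 4545, '2018-11-01'),
  ('Company A',  200, 4545, '2018-10-31'),
  ('Company B', 2000, 4893, '2016-10-01'),
  ('Company B',  212, 4893, '2016-10-04'),
  ('Company B',  100, 4893, '2017-10-10'),
  ('Company B',   20, 4893, '2018-10-04');
""")

# Number the rows of each inst by create_date and keep the first one.
# No GROUP BY is needed, so the non-unique amount column is not a problem.
EARLIEST = """
WITH ranked AS (
    SELECT company_name, amount, inst, create_date,
           ROW_NUMBER() OVER (PARTITION BY inst ORDER BY create_date) AS rn
    FROM balances
)
SELECT company_name, amount, inst, create_date
FROM ranked
WHERE rn = 1
ORDER BY inst
"""

earliest_rows = conn.execute(EARLIEST).fetchall()
for row in earliest_rows:
    print(row)
```

If two rows of the same inst share the earliest create_date, ROW_NUMBER picks one of them arbitrarily; adding a tie-breaker column to the ORDER BY makes the choice deterministic.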

Use row_number(). I assumed bill_company is your company name:
select * from
( SELECT bill_company,
bill_name,
account_no,
balance_ist,
amount,
create_date,
row_number() over(partition by bill_company order by create_date) rn
FROM dbo.mtr btr left join btr_balance btrb
on btr.balance_id = btrb.balance_id and btr.balance_id2 = btrb.balance_id2
left join balance_inst bali
on btr.account_no = bali.account_no and btrb.expired_date = bali.expired_date
) t where t.rn=1

SQL sum of different status within 24 hrs group by hours

I am trying to sum status within 24 hour groups by hours. I have an order, order status and status table.
Order Table:
+---------+-------------------------+
| orderid | orderdate |
+---------+-------------------------+
| 1 | 2015-09-16 00:04:19.100 |
| 2 | 2015-09-16 00:01:19.490 |
| 3 | 2015-09-16 00:02:33.733 |
| 4 | 2015-09-16 00:03:58.800 |
| 5 | 2015-09-16 00:01:16.020 |
| 6 | 2015-09-16 00:01:16.677 |
| 7 | 2015-09-16 00:02:06.920 |
+---------+-------------------------+
Order Status Table:
+---------+----------+
| orderid | statusid |
+---------+----------+
| 1 | 11 |
| 2 | 22 |
| 3 | 22 |
| 4 | 11 |
| 5 | 22 |
| 6 | 33 |
| 7 | 11 |
+---------+----------+
Status Table:
+----------+----------+
| statusid | status |
+----------+----------+
| 11 | PVC |
| 22 | CCC |
| 33 | WWW |
| | |
+----------+----------+
I am trying to write SQL that displays the count of each status within the last 24 hours for distinct orderids, grouped by hour, like below:
+------+-----+-----+-----+
| Hour | PVC | CCC | WWW |
+------+-----+-----+-----+
| 1 | 0 | 2 | 1 |
| 2 | 1 | 1 | 0 |
| 3 | 1 | 0 | 0 |
| 4 | 1 | 0 | 0 |
+------+-----+-----+-----+
This is my SQL so far. I am stuck trying to get the sum of each order status:
SELECT
DATEPART(hour, o.orderdate) AS Hour,
SUM(
CASE (
SELECT stat.status
FROM Status stat, orderstatus os
WHERE stat.status IN ('PVC') AND os.orderid = o.id AND os.statusid = stat.id
)
WHEN 'PVC' THEN 1
ELSE 0
END
) AS PVC,
SUM(
CASE (
SELECT stat.status
FROM Status stat, orderstatus os
WHERE stat.status IN ('WWW') AND os.orderid = o.id AND os.statusid = stat.id
)
WHEN 'CCC' THEN 1
ELSE 0
END
) AS CCC,
SUM(
CASE (
SELECT stat.status
FROM Status stat, orderstatus os
WHERE stat.status IN ('CCC') AND os.orderid = o.id AND os.statusid = stat.id)
WHEN 'WWW' THEN 1
ELSE 0
END
) AS WWW
FROM orders o
WHERE o.orderdate BETWEEN DATEADD(d,-1,CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
GROUP BY DATEPART(hour, o.orderdate)
ORDER BY DATEPART(hour, o.orderdate);

Here you go -- I'm ignoring the errors in your data since this will fail if the status table really had duplicate ids like in your example data.
SELECT hour, sum(PVC) as PVC, sum(CCC) as CCC, sum(WWW) as WWW
from (
select datepart(hour,orderdate) as hour,
case when s.status = 'PVC' then 1 else 0 end as PVC,
case when s.status = 'CCC' then 1 else 0 end as CCC,
case when s.status = 'WWW' then 1 else 0 end as WWW
from orders o
join orderstatus os on o.orderid = os.orderid
join status s on s.statusid = os.statusid
) sub
group by hour
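The conditional-aggregation idea can be tried out with the sketch below, which runs against SQLite through Python's sqlite3 module rather than SQL Server. It assumes the status table has no blank or duplicate rows, replaces DATEPART with SQLite's strftime, and takes a hypothetical as_of parameter in place of CURRENT_TIMESTAMP so that the 24-hour window can cover the 2015 sample data.

```python
import sqlite3

# In-memory copy of the three tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderid INTEGER, orderdate TEXT);
CREATE TABLE orderstatus (orderid INTEGER, statusid INTEGER);
CREATE TABLE status (statusid INTEGER, status TEXT);
INSERT INTO orders VALUES
  (1, '2015-09-16 00:04:19.100'), (2, '2015-09-16 00:01:19.490'),
  (3, '2015-09-16 00:02:33.733'), (4, '2015-09-16 00:03:58.800'),
  (5, '2015-09-16 00:01:16.020'), (6, '2015-09-16 00:01:16.677'),
  (7, '2015-09-16 00:02:06.920');
INSERT INTO orderstatus VALUES (1,11),(2,22),(3,22),(4,11),(5,22),(6,33),(7,11);
INSERT INTO status VALUES (11,'PVC'),(22,'CCC'),(33,'WWW');
""")

# One SUM(CASE ...) per status turns rows into columns; the WHERE clause
# keeps only orders from the 24 hours before the as_of timestamp.
HOURLY = """
SELECT CAST(strftime('%H', o.orderdate) AS INTEGER) AS hour,
       SUM(CASE WHEN s.status = 'PVC' THEN 1 ELSE 0 END) AS PVC,
       SUM(CASE WHEN s.status = 'CCC' THEN 1 ELSE 0 END) AS CCC,
       SUM(CASE WHEN s.status = 'WWW' THEN 1 ELSE 0 END) AS WWW
FROM orders AS o
JOIN orderstatus AS os ON os.orderid = o.orderid
JOIN status AS s ON s.statusid = os.statusid
WHERE o.orderdate > datetime(:as_of, '-1 day')
  AND o.orderdate <= :as_of
GROUP BY hour
ORDER BY hour
"""

hourly_counts = conn.execute(HOURLY, {"as_of": "2015-09-16 12:00:00"}).fetchall()
print(hourly_counts)
```

All seven sample orders fall in hour 0, so the sketch returns a single row with 3 PVC, 3 CCC and 1 WWW. The four rows in the question's expected output correspond to grouping the same data by minute rather than by hour.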

This should get you closer; then you have to pivot:
SELECT
DATEPART(HOUR,o.orderdate) AS orderDate_hour,
s.status,
COUNT(DISTINCT o.orderid) AS count_orderID
FROM
orders o INNER JOIN
orderstatus os ON
o.orderid = os.orderid INNER JOIN
status s ON
os.statusid = s.statusid
WHERE
o.orderdate >= DATEADD(d,-1,CURRENT_TIMESTAMP)
GROUP BY
DATEPART(HOUR,o.orderdate) , s.status
ORDER BY
DATEPART(HOUR,o.orderdate)
Try this for the pivot:
SELECT
*
FROM
(SELECT
DATEPART(HOUR,o.orderdate) AS orderDate_hour,
s.status,
COUNT(DISTINCT o.orderid) AS count_orderID
FROM
orders o INNER JOIN
orderstatus os ON
o.orderid = os.orderid INNER JOIN
status s ON
os.statusid = s.statusid
WHERE
o.orderdate >= DATEADD(d,-1,CURRENT_TIMESTAMP)
GROUP BY
DATEPART(HOUR,o.orderdate) , s.status) s
PIVOT ( MAX(count_orderID) FOR status IN ([PVC],[CCC],[WWW])) AS p
ORDER BY
orderDate_hour

Choose column based on max() of another column

Given the data below from the two tables cases and acct_transaction, how can I include just the acct_transaction.create_date of the largest acct_transaction amount whilst also calculating the sum of all amounts and the value of the largest amount? Platform is t-sql.
id | amount | create_date |
---|----------|------------|
1 | 1.99 | 01/09/2009 |
1 | 2.99 | 01/13/2009 |
1 | 578.23 | 11/03/2007 |
1 | 64.57 | 03/03/2008 |
1 | 3.99 | 12/12/2012 |
1 | 31337.00 | 04/18/2009 |
1 | 123.45 | 05/12/2008 |
1 | 987.65 | 10/10/2010 |
Result set should look like this:
id | amount | create_date | sum | max_amount | max_amount_date
---|----------|------------|----------|-----------|-----------
1 | 1.99 | 01/09/2009 | 33099.87 | 31337.00 | 04/18/2009
1 | 2.99 | 01/13/2009 | 33099.87 | 31337.00 | 04/18/2009
1 | 578.23 | 11/03/2007 | 33099.87 | 31337.00 | 04/18/2009
1 | 64.57 | 03/03/2008 | 33099.87 | 31337.00 | 04/18/2009
1 | 3.99 | 12/12/2012 | 33099.87 | 31337.00 | 04/18/2009
1 | 31337.00 | 04/18/2009 | 33099.87 | 31337.00 | 04/18/2009
1 | 123.45 | 05/12/2008 | 33099.87 | 31337.00 | 04/18/2009
1 | 987.65 | 10/10/2010 | 33099.87 | 31337.00 | 04/18/2009
This is what I have so far, I just don't know how to pull the date of the largest acct_transaction amount for max_amount_date column.
SELECT cases.id, acct_transaction.amount, acct_transaction.create_date AS 'create_date', SUM(acct_transaction.amount) OVER () AS 'sum', MAX(acct_transaction.amount) OVER () AS 'max_amount'
FROM cases INNER JOIN
acct_transaction ON cases.id = acct_transaction.id
WHERE (cases.id = '1')

;WITH x AS
(
SELECT c.id, t.amount, t.create_date,
s = SUM(t.amount) OVER(),
m = MAX(t.amount) OVER(),
rn = ROW_NUMBER() OVER(ORDER BY t.amount DESC)
FROM dbo.cases AS c
INNER JOIN dbo.acct_transaction AS t
ON c.id = t.id
)
SELECT x.id, x.amount, x.create_date,
[sum] = y.s,
max_amount = y.m,
max_amount_date = y.create_date
FROM x CROSS JOIN x AS y WHERE y.rn = 1;
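The CTE-plus-cross-join idea can be exercised with the following sketch, which ports the query to SQLite (3.25 or later) via Python's sqlite3 module. The T-SQL alias syntax "name = expression" is rewritten as "expression AS name", the US-format dates from the question are stored as ISO strings, and the column name sum_amount is chosen here only to avoid reusing the function name sum.

```python
import sqlite3

# In-memory copy of the question's sample data for case id 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (id INTEGER);
CREATE TABLE acct_transaction (id INTEGER, amount REAL, create_date TEXT);
INSERT INTO cases VALUES (1);
INSERT INTO acct_transaction VALUES
  (1, 1.99, '2009-01-09'), (1, 2.99, '2009-01-13'),
  (1, 578.23, '2007-11-03'), (1, 64.57, '2008-03-03'),
  (1, 3.99, '2012-12-12'), (1, 31337.00, '2009-04-18'),
  (1, 123.45, '2008-05-12'), (1, 987.65, '2010-10-10');
""")

# x carries the overall sum and max on every row, plus a rank by amount.
# Cross joining x with its own rank-1 row attaches the date of the largest
# transaction to every row.
QUERY = """
WITH x AS (
    SELECT c.id, t.amount, t.create_date,
           SUM(t.amount) OVER () AS s,
           MAX(t.amount) OVER () AS m,
           ROW_NUMBER() OVER (ORDER BY t.amount DESC) AS rn
    FROM cases AS c
    INNER JOIN acct_transaction AS t ON c.id = t.id
)
SELECT x.id, x.amount, x.create_date,
       y.s AS sum_amount,
       y.m AS max_amount,
       y.create_date AS max_amount_date
FROM x CROSS JOIN x AS y
WHERE y.rn = 1
ORDER BY x.create_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Every one of the eight rows carries the same max_amount_date (2009-04-18). With more than one case id, the window functions would need PARTITION BY c.id and the cross join would become a join on id.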

You can just do a full outer join to the table which defines the aggregates:
select id, amount, create_date, x.sum, x.max_amount, x.max_amount_date
from table1
full outer join
(select sum(amount) as sum, max(amount) as max_amount,
(select top 1 create_date from table1 where amount = (select max(amount) from table1)) as max_amount_date
from table1) x
on 1 = 1
SQL Fiddle demo

Try this abomination of a query... I make no claims for its speed or elegance. It's likely I should pray that Cod have mercy on my soul.
Here is the output of a join on the two tables that you mention but for which you do not provide schemas.
SQL Fiddle
SELECT A.case_id
,A.trans_id
,A.trans_amount
,A.trans_create_date
,A.trans_type
,B.max_amount
,B.max_amount_date
,E.sum_amount
FROM acct_transaction AS A
INNER JOIN (select C.case_id
,C.trans_amount AS max_amount
,C.trans_create_date AS max_amount_date
,ROW_NUMBER() OVER (PARTITION BY C.case_id ORDER BY C.trans_amount DESC) AS rn
FROM acct_transaction AS C) AS B ON B.case_id = A.case_id AND B.rn = 1
inner JOIN (select D.case_id, SUM(D.trans_amount) AS sum_amount FROM acct_transaction AS D GROUP BY D.case_id) AS E on E.case_id = A.case_id
WHERE (A.case_id = '1') AND (A.trans_type = 'F')

Thanks, that got me on the right track to this which is working:
,CAST((SELECT TOP 1 t2.create_date from acct_transaction t2
WHERE t2.case_sk = act.case_sk AND (t2.trans_type = 'F')
order by t2.amount DESC, t2.create_date DESC) AS date) AS 'max_date'
It won't let me upvote because I have less than 15 rep :(

SQL sub select if exists

I am using SQL Server 2012. I have two tables to hold orders for products. Order which has a received date and OrderItem which has a price and order id fk.
I am trying to write a query get all orders within a date range, group them by the date, and then sum all the order items price to get the total of all orders for that date.
I have this working. Now I want to add another column to select the difference between the total price for that day, and 7 days ago. If there are no orders 7 days ago then the column should be null.
So at the moment I have the below query:
select cast(o.ReceivedDate as date) as OrderDate,
coalesce(count(orderItems.orderId), 0) as Orders,
coalesce(sum(orderItems.Price), 0) as Price
from [Order] o
left outer join (
select o.Id as orderId, sum(ot.Price) as Price
from OrderItem ot
join [Order] o on ot.OrderId = o.Id
where o.ReceivedDate >= @DateFrom and o.ReceivedDate <= @DateTo
group by o.Id
) as orderItems on o.Id = orderItems.orderId
where o.ReceivedDate >= @DateFrom and o.ReceivedDate <= @DateTo
group by cast(o.ReceivedDate as date)
order by cast(o.ReceivedDate as date) desc
So how can I add my other column to this query? I need to do something like:
//pseudo
if o.ReceivedDate - 7 exists then orderItems.Price - Price from 7 days ago else null
But I am not sure how to do this? I have created a sqlfiddle to help explain http://sqlfiddle.com/#!6/8b837/1
So from my sample data what I want to achieve is results like this:
| ORDERDATE | ORDERS | PRICE | DIFF7DAYS |
---------------------------------------------
| 2013-01-25 | 3 | 38 | 28 |
| 2013-01-24 | 1 | 12 | null |
| 2013-01-23 | 1 | 10 | null |
| 2013-01-22 | 1 | 33 | null |
| 2013-01-18 | 1 | 10 | null |
| 2013-01-10 | 1 | 3 | -43 |
| 2013-01-08 | 2 | 11 | null |
| 2013-01-04 | 1 | 1 | null |
| 2013-01-03 | 3 | 46 | null |
As you can see, the 25th has an order 7 days earlier, so the difference is shown. The 24th doesn't, so null is displayed.
Any help would be much appreciated.

Not sure why you are using a left outer join between the [Order] table and the subquery, as there cannot be orders without order items (in general).
To get your results, you could use a simplified version with a CTE:
SQL-FIDDLE-DEMO
;with cte as (
select convert(date,o.ReceivedDate) orderDate,
count(distinct o.Id) as Orders,
coalesce(sum(ot.Price),0) as Price
from OrderItem ot
join [Order] o on ot.OrderId = o.Id
where o.ReceivedDate >= @DateFrom and o.ReceivedDate <= @DateTo
group by convert(date,o.ReceivedDate)
)
select c1.orderDate, c1.Orders, c1.Price, c1.Price-c2.Price DIFF7DAYS
from cte c1 left join cte c2 on dateadd(day,-7,c1.orderdate) = c2.orderdate
order by c1.orderdate desc
| ORDERDATE | ORDERS | PRICE | DIFF7DAYS |
-------------------------------------------
| 2013-01-25 | 3 | 38 | 28 |
| 2013-01-24 | 1 | 12 | (null) |
| 2013-01-23 | 1 | 10 | (null) |
| 2013-01-22 | 1 | 33 | (null) |
| 2013-01-18 | 1 | 10 | (null) |
| 2013-01-10 | 1 | 3 | -43 |
| 2013-01-08 | 2 | 11 | (null) |
| 2013-01-04 | 1 | 1 | (null) |
| 2013-01-03 | 3 | 46 | (null) |
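The self-join on a seven-day offset can be reproduced with the sketch below. Because the question's raw Order and OrderItem rows are not shown, it starts from a hypothetical daily_totals table filled with the per-day figures from the result table above, which plays the role of the CTE. It runs on SQLite via Python's sqlite3 module, so dateadd(day,-7,...) becomes date(..., '-7 days').

```python
import sqlite3

# Hypothetical pre-aggregated table standing in for the CTE: one row per day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_totals (order_date TEXT, orders INTEGER, price INTEGER);
INSERT INTO daily_totals VALUES
  ('2013-01-25', 3, 38), ('2013-01-24', 1, 12), ('2013-01-23', 1, 10),
  ('2013-01-22', 1, 33), ('2013-01-18', 1, 10), ('2013-01-10', 1, 3),
  ('2013-01-08', 2, 11), ('2013-01-04', 1, 1),  ('2013-01-03', 3, 46);
""")

# Left join each day to the day exactly one week earlier. When no such day
# exists, c2.price is NULL and so is the difference.
DIFF = """
SELECT c1.order_date, c1.orders, c1.price,
       c1.price - c2.price AS diff7days
FROM daily_totals AS c1
LEFT JOIN daily_totals AS c2
       ON c2.order_date = date(c1.order_date, '-7 days')
ORDER BY c1.order_date DESC
"""

diff_rows = conn.execute(DIFF).fetchall()
for row in diff_rows:
    print(row)
```

Only 2013-01-25 (38 - 10 = 28) and 2013-01-10 (3 - 46 = -43) have a matching day one week earlier; every other row gets NULL, shown as None in Python.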

Use a temp table and join it on the datediff.
DECLARE @DateFrom datetime
SET @DateFrom = '2012-12-02'
DECLARE @DateTo datetime
SET @DateTo = '2013-03-13'
CREATE TABLE #temp ( orderdate date, orders int, price money)
INSERT INTO #temp
SELECT cast(o.ReceivedDate AS date) AS OrderDate,
coalesce(count(orderItems.orderId), 0) AS Orders,
coalesce(sum(orderItems.Price), 0) AS Price
FROM [Order] o
LEFT OUTER JOIN (
SELECT o.Id AS orderId, sum(ot.Price) AS Price
FROM OrderItem ot
JOIN [Order] o ON ot.OrderId = o.Id
WHERE o.ReceivedDate >= @DateFrom AND o.ReceivedDate <= @DateTo
GROUP BY o.Id
) AS orderItems ON o.Id = orderItems.orderId
WHERE o.ReceivedDate >= @DateFrom AND o.ReceivedDate <= @DateTo
GROUP BY cast(o.ReceivedDate AS date)
SELECT t1.orderdate, t1.orders, t1.price,
t1.price - t2.price AS diff7days
FROM #temp t1 LEFT JOIN #temp t2
ON datediff(DAY, t2.orderdate, t1.orderdate) = 7
ORDER BY t1.orderdate DESC
http://sqlfiddle.com/#!6/8b837/34
