Is there a way to get the most recent and the original record with a number of conditions included using SQL - sql

I want to get the most recent data and also the original data for each group in a table but with a set of conditions.
Below is the current structure of dataset/table.
Each group can have multiple items
Each item_id can share an item_name with other item_ids; these are known as change item_names, and the one significant difference is the number in parentheses, which shows how many iterations of changes have been made.
Each item_id can have multiple statuses, but the example below is simplified to only two: Draft -> Approved.
group date item_id item_name status price stock
A 2022-01-01 36FG-34-45 AB-1234 Draft 15 100
B 2022-01-02 28AE-23-67 CD-4567 Approved 30 120
A 2022-01-05 45RE-12-99 DE-1234 Approved 20 300
C 2022-01-07 78ED-14-88 EA-4532 Draft 10 500
B 2022-01-05 45AB-16-77 CD-4567(1) Draft 35 200
A 2022-01-03 76JJ-98-66 DE-1234(1) Approved 50 250
A 2022-02-02 17KL-10-43 DE-1234(2) Draft 12 400
C 2022-03-03 97EE-42-17 AE-2468 Approved 25 450
The output required: take the most recent item_id for each group, and when it is involved in the change process and its status is not Approved, also take the most recent item_id that has been approved for that group.
Also note that the approved record won't necessarily be the second most recent record per group; it can be further back in the timeline and process.
group date item_id item_name status price stock original_item_id original_item_name original_status original_price original_stock
A 2022-02-02 17KL-10-43 DE-1234(2) Draft 12 400 76JJ-98-66 DE-1234(1) Approved 50 250
B 2022-01-05 28AE-23-67 CD-4567(1) Draft 35 200 45AB-16-77 CD-4567 Approved 30 120
C 2022-03-03 97EE-42-17 AE-2468 Approved 25 450 NULL NULL NULL NULL NULL

Your example output for group A shows the original item name (most recent item name for that group that was approved) as DE-1234(1). This has a date of 1/3/2022, however item name DE-1234 has a date of 1/5/2022 making it the most recent item id that was approved from group A. Because of that, my output differs from yours for that reference.
Here is a link to the SQL Fiddle where I recreated this.
Here is the query I created for this:
First we create a CTE that ranks your items by group to get the most recent per group.
WITH cte AS--rank records by group ordered by date DESC
(
SELECT
[group]
,[date]
,item_id
,item_name
,status
,price
,stock
,ROW_NUMBER() OVER (PARTITION BY [group] ORDER BY [date] DESC) AS rn
FROM t
)
Then we join the CTE to itself, keep only the earlier approved records, and re-rank to get the most recently approved record per group.
,cte2 AS--rank joined records where status = approved by group ordered by date DESC
(
SELECT
a.[group]
,a.[date]
,a.item_id
,a.item_name
,a.status
,a.price
,a.stock
,b.[group] AS original_group
,b.[date] AS original_date
,b.item_id AS original_item_id
,b.item_name AS original_item_name
,b.status AS original_status
,b.price AS original_price
,b.stock AS original_stock
,ROW_NUMBER() OVER (PARTITION BY a.[group] ORDER BY b.rn) rn--get most recent record that was approved
FROM cte a
LEFT JOIN cte b ON
a.[group] = b.[group]
AND b.rn > a.rn--b is a previous record
AND b.status = 'Approved'
WHERE a.rn = 1--Most recent item id
)
Lastly, we query cte2 filtering for only the most recent record that was approved
SELECT
[group]
,[date]
,item_id
,item_name
,status
,price
,stock
--,original_group
--,original_date
,original_item_id
,original_item_name
,original_status
,original_price
,original_stock
FROM cte2
WHERE rn = 1--filter for most recent record that was approved
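As a rough check of the approach above, here is a minimal sketch that runs a trimmed version of the same query against the sample rows. It assumes SQLite (3.25 or newer, for window functions) through Python's sqlite3 module rather than the SQL Server used in the answer, and it renames the group column to the hypothetical name grp to avoid quoting a reserved word.

```python
import sqlite3

# Sample rows from the question; "grp" is an assumed stand-in for the "group" column.
ROWS = [
    ("A", "2022-01-01", "36FG-34-45", "AB-1234", "Draft", 15, 100),
    ("B", "2022-01-02", "28AE-23-67", "CD-4567", "Approved", 30, 120),
    ("A", "2022-01-05", "45RE-12-99", "DE-1234", "Approved", 20, 300),
    ("C", "2022-01-07", "78ED-14-88", "EA-4532", "Draft", 10, 500),
    ("B", "2022-01-05", "45AB-16-77", "CD-4567(1)", "Draft", 35, 200),
    ("A", "2022-01-03", "76JJ-98-66", "DE-1234(1)", "Approved", 50, 250),
    ("A", "2022-02-02", "17KL-10-43", "DE-1234(2)", "Draft", 12, 400),
    ("C", "2022-03-03", "97EE-42-17", "AE-2468", "Approved", 25, 450),
]

QUERY = """
WITH ranked AS (          -- rank records per group, newest first
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY date DESC) AS rn
    FROM t
),
paired AS (               -- pair the newest record with every earlier approved one
    SELECT a.grp, a.item_id, a.item_name, a.status,
           b.item_id   AS original_item_id,
           b.item_name AS original_item_name,
           ROW_NUMBER() OVER (PARTITION BY a.grp ORDER BY b.rn) AS pick
    FROM ranked a
    LEFT JOIN ranked b
           ON a.grp = b.grp
          AND b.rn > a.rn
          AND b.status = 'Approved'
    WHERE a.rn = 1
)
SELECT grp, item_id, item_name, status, original_item_id, original_item_name
FROM paired
WHERE pick = 1            -- keep only the most recent approved match
ORDER BY grp
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (grp TEXT, date TEXT, item_id TEXT, item_name TEXT, "
    "status TEXT, price INTEGER, stock INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On this data the sketch pairs group A with DE-1234 (approved on 2022-01-05) rather than DE-1234(1), which is the difference from the question's expected output described at the start of this answer.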

Related

Subquery in BigQuery (JOIN on same Table)

I have a BigQuery table with this data
client spent balance date
A 20 500 2022-01-01
A 10 490 2022-01-02
A 50 440 2022-01-03
B 200 1000 1995-07-09
B 300 700 1998-08-11
B 100 600 2002-04-17
C 2 100 2021-01-04
C 10 90 2021-06-06
C 70 20 2021-10-07
I need the latest balance of each client based on the date:
client spent balance date
A 50 440 2022-01-03
B 100 600 2002-04-17
C 70 20 2021-10-07
DISTINCT does not work like it does in other SQL dialects, and grouping on client does not work either, because I then need COUNT, SUM, etc. on the other columns.
For just one client I use:
SELECT balance FROM `table` WHERE client = "A" ORDER BY date DESC LIMIT 1
But how can I get this data for every client in just one statement?
I tried with subselect
SELECT client,
(SELECT balance FROM `table` WHERE client = tb.client ORDER BY date DESC LIMIT 1) AS bal
FROM `table` AS tb;
and got the error:
Correlated subqueries that reference other tables are not supported
unless they can be de-correlated, such as by transforming them into an
efficient JOIN.
I don’t know how to make a JOIN out of this subquery to make it work.
Hope you have an idea.
Use the query below:
select * from your_table
qualify 1 = row_number() over(partition by client order by date desc)
If applied to the sample data in your question, the output is the latest row for each client.
Have you tried using the ROW_NUMBER window function?
select client, spent, balance, date
from (
select client, spent, balance, date
, ROW_NUMBER() OVER (PARTITION BY client ORDER BY date DESC) AS row_num -- adding row number, starting from latest date
from table
)
where row_num = 1 -- filter out only the latest date
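Since the question asks how to turn the correlated subquery into a JOIN, here is one possible sketch of that form: join the table to a per-client MAX(date) aggregate. It is run here on SQLite through Python's sqlite3 module rather than BigQuery, and the table name balances is an assumption made for the example. If a client had two rows on the same latest date, this form would return both.

```python
import sqlite3

# Sample rows from the question: (client, spent, balance, date).
ROWS = [
    ("A", 20, 500, "2022-01-01"), ("A", 10, 490, "2022-01-02"),
    ("A", 50, 440, "2022-01-03"), ("B", 200, 1000, "1995-07-09"),
    ("B", 300, 700, "1998-08-11"), ("B", 100, 600, "2002-04-17"),
    ("C", 2, 100, "2021-01-04"), ("C", 10, 90, "2021-06-06"),
    ("C", 70, 20, "2021-10-07"),
]

# De-correlated form: aggregate once per client, then join back on (client, date).
QUERY = """
SELECT t.client, t.spent, t.balance, t.date
FROM balances AS t
JOIN (
    SELECT client, MAX(date) AS max_date
    FROM balances
    GROUP BY client
) AS m
  ON m.client = t.client
 AND m.max_date = t.date
ORDER BY t.client
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balances (client TEXT, spent INTEGER, balance INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO balances VALUES (?, ?, ?, ?)", ROWS)

latest = conn.execute(QUERY).fetchall()
for row in latest:
    print(row)
```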

SQL JOIN - retrieve MAX DateTime from second table and the first DateTime after previous MAX for other value

I have an issue creating a proper SQL expression.
I have table TICKET with column TICKETID
TICKETID
1000
1001
I then have a table STATUSHISTORY, from which I need to retrieve the last (maximum) time the ticket entered VENDOR status and the time it exited that status. By exiting VENDOR status I mean the first INPROG status after that VENDOR status; the next status after VENDOR is always INPROG. It is also possible that a VENDOR status does not exist at all in STATUSHISTORY for an ID (then nulls should be returned), but INPROG always exists: it can come before VENDOR, and also after it if the ID is no longer in VENDOR status.
Here is the example of STATUSHISTORY.
ID TICKETID STATUS DATETIME
1 1000 INPROG 01.01.2017 10:00
2 1000 VENDOR 02.01.2017 10:00
3 1000 INPROG 03.01.2017 10:00
4 1000 VENDOR 04.01.2017 10:00
5 1000 INPROG 05.01.2017 10:00
6 1000 HOLD 06.01.2017 10:00
7 1000 INPROG 07.01.2017 10:00
8 1001 INPROG 02.02.2017 10:00
9 1001 VENDOR 03.02.2017 10:00
10 1001 INPROG 04.02.2017 10:00
11 1001 VENDOR 05.02.2017 10:00
So the result when doing the query from TICKET table and doing the JOIN with table STATUSHISTORY should be:
ID VENDOR_ENTERED VENDOR_EXITED
1000 04.01.2017 10:00 05.01.2017 10:00
1001 05.02.2017 10:00 null
Because for ID 1000 last VENDOR status was at 04.01.2017 and the first INPROG status after the VENDOR status for that ID was at 05.01.2017 while for ID 1001 the last VENDOR status was at 05.02.2017 and after that INPROG status did not happen yet.
If VENDOR did not exist then both columns should be null in result.
I am really stuck with this, trying different JOINs but without any progress.
Thank you in advance if you can help me.
You can do this with window functions. First, assign a "vendor" group to the tickets. You can do this using a cumulative sum counting the number of "vendor" records on or before each record.
Then, aggregate the records to get one record per "vendor" group. And use row numbers to get the most recent records. So:
with vg as (
select ticketid,
min(datetime) as vendor_entered, -- for grp > 0 the earliest row is the VENDOR row
min(case when status = 'INPROG' then datetime end) as vendor_exited -- first INPROG in the group
from (select sh.*,
-- running count of VENDOR rows: rows from one VENDOR up to the next share a grp
sum(case when status = 'VENDOR' then 1 else 0 end) over (partition by ticketid order by datetime) as grp
from statushistory sh
) sh
group by ticketid, grp
)
select vg.ticketid, vg.vendor_entered, vg.vendor_exited
from (select vg.*,
row_number() over (partition by ticketid order by vendor_entered desc) as seqnum -- latest group first
from vg
) vg
where seqnum = 1;
You can aggregate to get max time, then join onto all of the date values higher than that time, and then re-aggregate:
select a.TicketID,
a.VENDOR_ENTERED,
min( EXIT_TIME ) as VENDOR_EXITED
from (
select TicketID,
max( DATETIME ) as VENDOR_ENTERED
from StatusHistory
where Status = 'VENDOR'
group by TicketID
) as a
left join
(
select TicketID,
DATETIME as EXIT_TIME
from StatusHistory
where Status = 'INPROG'
) as b
on a.TicketID = b.TicketID
and EXIT_TIME >= a.VENDOR_ENTERED
group by a.TicketID,
a.VENDOR_ENTERED
DB2 is not supported in SQLfiddle, but a standard SQL example can be found here.
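The question also asks that tickets with no VENDOR row come back with nulls. One way to get that, sketched below under stated assumptions, is to start from TICKET and LEFT JOIN both the latest VENDOR time and the later INPROG rows. The sketch runs on SQLite through Python's sqlite3 module rather than DB2, the dates are rewritten in ISO format so that they sort as text, and ticket 1002 is a hypothetical extra row added only to show the all-null case.

```python
import sqlite3

# Sample history from the question, dates converted to ISO text.
# Ticket 1002 is hypothetical: it never enters VENDOR.
HISTORY = [
    (1, 1000, "INPROG", "2017-01-01 10:00"), (2, 1000, "VENDOR", "2017-01-02 10:00"),
    (3, 1000, "INPROG", "2017-01-03 10:00"), (4, 1000, "VENDOR", "2017-01-04 10:00"),
    (5, 1000, "INPROG", "2017-01-05 10:00"), (6, 1000, "HOLD", "2017-01-06 10:00"),
    (7, 1000, "INPROG", "2017-01-07 10:00"), (8, 1001, "INPROG", "2017-02-02 10:00"),
    (9, 1001, "VENDOR", "2017-02-03 10:00"), (10, 1001, "INPROG", "2017-02-04 10:00"),
    (11, 1001, "VENDOR", "2017-02-05 10:00"), (12, 1002, "INPROG", "2017-03-01 10:00"),
]

QUERY = """
SELECT t.ticketid,
       v.vendor_entered,
       MIN(s.datetime) AS vendor_exited      -- first INPROG after the last VENDOR
FROM ticket AS t
LEFT JOIN (
    SELECT ticketid, MAX(datetime) AS vendor_entered
    FROM statushistory
    WHERE status = 'VENDOR'
    GROUP BY ticketid
) AS v
       ON v.ticketid = t.ticketid
LEFT JOIN statushistory AS s
       ON s.ticketid = t.ticketid
      AND s.status = 'INPROG'
      AND s.datetime > v.vendor_entered      -- never true when vendor_entered is NULL
GROUP BY t.ticketid, v.vendor_entered
ORDER BY t.ticketid
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticket (ticketid INTEGER)")
conn.execute(
    "CREATE TABLE statushistory (id INTEGER, ticketid INTEGER, status TEXT, datetime TEXT)"
)
conn.executemany("INSERT INTO ticket VALUES (?)", [(1000,), (1001,), (1002,)])
conn.executemany("INSERT INTO statushistory VALUES (?, ?, ?, ?)", HISTORY)

vendor_times = conn.execute(QUERY).fetchall()
for row in vendor_times:
    print(row)
```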

SQL select specific group from table

I have a table named trades like this:
id trade_date trade_price trade_status seller_name
1 2015-01-02 150 open Alex
2 2015-03-04 500 close John
3 2015-04-02 850 close Otabek
4 2015-05-02 150 close Alex
5 2015-06-02 100 open Otabek
6 2015-07-02 200 open John
I want to sum up trade_price grouped by seller_name when last (by trade_date) trade_status was 'open'. That is:
sum_trade_price seller_name
700 John
950 Otabek
The rows where seller_name is Alex are skipped because the last trade_status was 'close'.
Although I can get the desired output with the help of a nested select,
SELECT SUM(t1.trade_price), t1.seller_name
FROM trades t1
WHERE t1.seller_name NOT IN
(SELECT t2.seller_name FROM trades t2
WHERE t2.seller_name = t1.seller_name AND t2.trade_status = 'close'
ORDER BY t2.trade_date DESC LIMIT 1)
GROUP BY t1.seller_name
it takes more than 1 minute to execute the above query (I have approximately 100K rows).
Is there another way to handle it?
I am using PostgreSQL.
I would approach this with window functions:
SELECT SUM(t.trade_price), t.seller_name
FROM (SELECT t.*,
FIRST_VALUE(trade_status) OVER (PARTITION BY seller_name ORDER BY trade_date desc) as last_trade_status
FROM trades t
) t
WHERE last_trade_status <> 'close'
GROUP BY t.seller_name;
This should perform reasonably with an index on seller_name
select
sum(trade_price) as sum_trade_price,
seller_name
from
trades
inner join
(
select distinct on (seller_name) seller_name, trade_status
from trades
order by seller_name, trade_date desc
) s using (seller_name)
where s.trade_status = 'open'
group by seller_name
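To see the window-function idea on the sample rows, here is a small sketch that tags every trade with its seller's latest status and then sums only the sellers whose latest status is open. It assumes SQLite 3.25 or newer through Python's sqlite3 module instead of PostgreSQL; FIRST_VALUE behaves the same way in both for this query.

```python
import sqlite3

# Sample rows from the question: (id, trade_date, trade_price, trade_status, seller_name).
TRADES = [
    (1, "2015-01-02", 150, "open", "Alex"),
    (2, "2015-03-04", 500, "close", "John"),
    (3, "2015-04-02", 850, "close", "Otabek"),
    (4, "2015-05-02", 150, "close", "Alex"),
    (5, "2015-06-02", 100, "open", "Otabek"),
    (6, "2015-07-02", 200, "open", "John"),
]

QUERY = """
SELECT SUM(t.trade_price) AS sum_trade_price, t.seller_name
FROM (
    SELECT trades.*,
           -- status of the seller's most recent trade, repeated on every row
           FIRST_VALUE(trade_status) OVER (
               PARTITION BY seller_name ORDER BY trade_date DESC
           ) AS last_trade_status
    FROM trades
) AS t
WHERE t.last_trade_status = 'open'
GROUP BY t.seller_name
ORDER BY t.seller_name
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (id INTEGER, trade_date TEXT, trade_price INTEGER, "
    "trade_status TEXT, seller_name TEXT)"
)
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?)", TRADES)

sums = conn.execute(QUERY).fetchall()
for row in sums:
    print(row)
```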

Firebird Query- Return first row each group

In a firebird database with a table "Sales", I need to select the first sale of all customers. See below a sample that show the table and desired result of query.
---------------------------------------
SALES
---------------------------------------
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
3 25 05/04/16 08:10
4 31 07/03/16 10:22
5 22 01/02/16 12:30
6 22 10/01/16 08:45
Result: only first sale, based on sale date.
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
4 31 07/03/16 10:22
6 22 10/01/16 08:45
I've already tested the code from "Select first row in each GROUP BY group?", but it did not work.
In Firebird 2.5 you can do this with the following query; this is a minor modification of the second part of the accepted answer of the question you linked to tailored to your schema and requirements:
select x.id,
x.customerid,
x.dthrsale
from sales x
join (select customerid,
min(dthrsale) as first_sale
from sales
group by customerid) p on p.customerid = x.customerid
and p.first_sale = x.dthrsale
order by x.id
The order by is not necessary, I just added it to make it give the order as shown in your question.
With Firebird 3 you can use the window function ROW_NUMBER which is also described in the linked answer. The linked answer incorrectly said the first solution would work on Firebird 2.1 and higher. I have now edited it.
Search for the sales with no earlier sales:
SELECT S1.*
FROM SALES S1
LEFT JOIN SALES S2 ON S2.CUSTOMERID = S1.CUSTOMERID AND S2.DTHRSALE < S1.DTHRSALE
WHERE S2.ID IS NULL
Define an index over (customerid, dthrsale) to make it fast.
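A minimal sketch of this anti-join, including the suggested index, is shown below. It assumes SQLite through Python's sqlite3 module rather than Firebird, the index name idx_sales_customer_date is made up for the example, and the dd/mm/yy sample dates are rewritten in ISO format so that the text comparison orders them correctly.

```python
import sqlite3

# Sample rows from the question: (id, customerid, dthrsale), dates converted to ISO text.
SALES = [
    (1, 25, "2016-04-01 09:32"),
    (2, 30, "2016-04-02 11:22"),
    (3, 25, "2016-04-05 08:10"),
    (4, 31, "2016-03-07 10:22"),
    (5, 22, "2016-02-01 12:30"),
    (6, 22, "2016-01-10 08:45"),
]

# Anti-join: keep a sale only if no earlier sale exists for the same customer.
QUERY = """
SELECT s1.id, s1.customerid, s1.dthrsale
FROM sales AS s1
LEFT JOIN sales AS s2
       ON s2.customerid = s1.customerid
      AND s2.dthrsale < s1.dthrsale
WHERE s2.id IS NULL
ORDER BY s1.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, customerid INTEGER, dthrsale TEXT)")
# Hypothetical index name; the columns follow the suggestion above.
conn.execute("CREATE INDEX idx_sales_customer_date ON sales (customerid, dthrsale)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)

first_sales = conn.execute(QUERY).fetchall()
for row in first_sales:
    print(row)
```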
In Firebird 3, get the first row for each customer by minimum sales_date:
SELECT id, customer_id, total, sales_date
FROM (
SELECT id, customer_id, total, sales_date
, row_number() OVER(PARTITION BY customer_id ORDER BY sales_date ASC ) AS rn
FROM SALES
) sub
WHERE rn = 1;
If you want to get other related columns, this is where your self-answer fails:
select customer_id , min(sales_date)
, id, total --what about other columns
from SALES
group by customer_id
As simple as:
select CUSTOMERID, min(DTHRSALE) from SALES group by CUSTOMERID

Find the value of one field that matches the maximum value of data in another field

I'm trying to write a query that gets the value of one field that's associated with the maximum value of another field (or fields). Let's say I have the following table of data:
OrderID CustomerID OrderDate LocationID
1 4 1/1/2001 1001
2 4 1/2/2001 1003
3 4 1/3/2001 1001
4 5 1/4/2001 1001
5 5 1/5/2001 1001
6 5 1/6/2001 1003
7 5 1/7/2001 1002
8 5 1/8/2001 1003
9 5 1/8/2001 1002
Grouping by CustomerID, I want to get the maximum OrderDate and then the LocationID associated with whatever is the maximum OrderDate. If there are several records that share the maximum order date, then take the LocationID associated with the maximum OrderID from among those records with the maximum date.
The final set of data should look like this:
CustomerID OrderDate LocationID
4 1/3/2001 1001
5 1/8/2001 1002
I had been trying to write a query with lots of nested subqueries and ugly joins, but I'm not really getting anywhere. What SQL do I need to write to get this result?
with cte As
(
select *,
row_number() over (partition by CustomerID
order by OrderDate desc, OrderId desc) as rn
from yourtable
)
select CustomerID, OrderDate,LocationID
from cte
where rn=1;
SELECT
C.Name,
C.CustomerID,
X.*
FROM
Customers C
CROSS APPLY (
SELECT TOP 1 OrderDate, LocationID
FROM Orders O
WHERE C.CustomerID = O.CustomerID
ORDER BY OrderDate Desc, OrderID Desc
) X
If you will pull any columns from the Customers table, this will probably outperform other methods.
If not, then the Row_Number answer, pulling only from Orders, will probably be best. But if you restrict by Customer in any way, then the CROSS APPLY will again be best. Possibly by a big margin.
The trick is to use a subquery as a value, not as a join:
select customerId,orderDate,locationId
from orders o1
where orderDate = (
select top 1 orderdate
from orders o2
where o1.customerId = o2.customerId
order by orderdate desc
)
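The tie on 1/8/2001 for customer 5 is where these approaches can differ, so the sketch below compares two of them on the sample data. It assumes SQLite 3.25 or newer through Python's sqlite3 module, uses LIMIT 1 in place of SQL Server's TOP 1, and stores the dates as ISO text; the snake_case column names are chosen for the example.

```python
import sqlite3

# Sample rows from the question: (order_id, customer_id, order_date, location_id).
ORDERS = [
    (1, 4, "2001-01-01", 1001), (2, 4, "2001-01-02", 1003),
    (3, 4, "2001-01-03", 1001), (4, 5, "2001-01-04", 1001),
    (5, 5, "2001-01-05", 1001), (6, 5, "2001-01-06", 1003),
    (7, 5, "2001-01-07", 1002), (8, 5, "2001-01-08", 1003),
    (9, 5, "2001-01-08", 1002),
]

# ROW_NUMBER with order_id as the tie-breaker: exactly one row per customer.
ROW_NUMBER_QUERY = """
WITH ranked AS (
    SELECT o.*,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY order_date DESC, order_id DESC
           ) AS rn
    FROM orders AS o
)
SELECT customer_id, order_date, location_id
FROM ranked
WHERE rn = 1
ORDER BY customer_id
"""

# Subquery used as a value: matches every row that shares the latest date.
SUBQUERY_QUERY = """
SELECT o1.customer_id, o1.order_date, o1.location_id
FROM orders AS o1
WHERE o1.order_date = (
    SELECT o2.order_date
    FROM orders AS o2
    WHERE o2.customer_id = o1.customer_id
    ORDER BY o2.order_date DESC
    LIMIT 1
)
ORDER BY o1.order_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "order_date TEXT, location_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)

by_row_number = conn.execute(ROW_NUMBER_QUERY).fetchall()
by_subquery = conn.execute(SUBQUERY_QUERY).fetchall()
print(by_row_number)
print(by_subquery)
```

With this data the ROW_NUMBER form returns one row per customer, while the date-only subquery form returns both of customer 5's orders from the tied date, so it would need an extra OrderID condition to satisfy the tie-break rule in the question.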