How to select the latest date for each group by number? - sql

I've been stuck on this question for a while and was hoping the community could point me in the right direction.
I have some tag IDs that need to be grouped, with exceptions (column: deleted) that must be retained in the results. Then, for each grouped tag ID, I need to select the row with the latest date. How can I do this? An example is below:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
4 | 400 | 05/01/20 | null
5 | 400 | 04/01/20 | null
6 | 500 | 03/01/20 | null
7 | 500 | 02/01/20 | null
I am trying to reach this outcome:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
6 | 500 | 03/01/20 | null
So, firstly if there is a date in the "DELETED" column, I would like the row to be present. Secondly, for each unique tag ID, I would like the row with the latest "DATE" to be present.
Hopefully this question is clear. Would appreciate your feedback and help! A big thanks in advance.

Your expected results seem to match a query like this:
select t.*
from (select t.*,
             row_number() over (partition by tag_id, deleted order by date desc) as seqnum
      from t
     ) t
where seqnum = 1 or deleted is not null;
For each tag_id, this keeps the most recent row among those where deleted is null. It also keeps every row where deleted is not null.

You need two conditions combined with OR in the WHERE clause:
the 1st is that deleted is not null, or
the 2nd is that there is no other row with the same tag_id and a date later than the current row's date, meaning that the current row's date is the latest:
select t.*
from tablename t
where t.deleted is not null
   or not exists (
        select 1
        from tablename
        where tag_id = t.tag_id and date > t.date
      )
See the demo.
Results:
| id | tag_id | date | deleted |
| --- | ------ | ---------- | -------- |
| 1 | 300 | 2020-05-01 | |
| 2 | 300 | 2020-03-01 | 04/01/20 |
| 3 | 400 | 2020-06-01 | |
| 6 | 500 | 2020-03-01 | |
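To check the NOT EXISTS query against the sample data without a database server, a small harness can help. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module rather than the asker's database, it reuses the table name `tablename` from the answer, and it stores the dates as ISO strings (as in the results table) so that `>` compares them chronologically. The helper name `kept_ids` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, tag_id, date, deleted).
# Dates are written as ISO strings (an assumption) so text comparison is chronological.
ROWS = [
    (1, 300, "2020-05-01", None),
    (2, 300, "2020-03-01", "2020-04-01"),
    (3, 400, "2020-06-01", None),
    (4, 400, "2020-05-01", None),
    (5, 400, "2020-04-01", None),
    (6, 500, "2020-03-01", None),
    (7, 500, "2020-02-01", None),
]

QUERY = """
SELECT t.id
FROM tablename AS t
WHERE t.deleted IS NOT NULL
   OR NOT EXISTS (
        SELECT 1
        FROM tablename AS o
        WHERE o.tag_id = t.tag_id AND o.date > t.date
      )
ORDER BY t.id
"""


def kept_ids(rows):
    """Load the rows into an in-memory SQLite table and return the ids the query keeps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (id INTEGER, tag_id INTEGER, date TEXT, deleted TEXT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


print(kept_ids(ROWS))
```

With the sample rows, the ids kept are the four listed in the results table above.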

Related

How do I get the latest user-updated column value in a table based on a timestamp entry in a different table in SQL Server?

I have a temp table #StatusInfo with the following data
+---------+--------------+-------+-------------------------+------+
| OrderNo | GroupLineNum | Type1 | UpdateDate              | Mark |
+---------+--------------+-------+-------------------------+------+
| Order85 | NULL         | 1     | 2019-11-25 05:15:55.000 |      |
+---------+--------------+-------+-------------------------+------+
| Order86 | NULL         | 1     | 2019-11-25 05:15:55.000 |      |
+---------+--------------+-------+-------------------------+------+
| Order86 | 2            | 2     | 2019-11-25 05:32:23.773 |      |
+---------+--------------+-------+-------------------------+------+
| Order87 | NULL         | 1     | 2019-11-25 05:15:55.000 |      |
+---------+--------------+-------+-------------------------+------+
| Order87 | 1            | 2     | 2019-11-25 05:43:37.637 | B    |
+---------+--------------+-------+-------------------------+------+
| Order87 | 2            | 2     | 2019-11-25 05:42:32.390 | A    |
+---------+--------------+-------+-------------------------+------+
| Order88 | NULL         | 1     | 2019-11-25 06:35:13.000 |      |
+---------+--------------+-------+-------------------------+------+
| Order88 | 1            | 2     | 2019-11-25 06:39:16.170 |      |
+---------+--------------+-------+-------------------------+------+
Any update the user makes to an order is pulled into this temp table. A value of 2 in the Type1 column denotes a change to the 'Required Date' field by the user. The UpdateDate column holds the timestamp of the change.
I have another temp table, #LineInfo, with the following data. This table is created by joining other tables, including a left join with the table above. The 'LineNum' column in the table below matches the 'GroupLineNum' column in the table above for Type1 = 2.
+---------+-----------+---------+------------+-------------------------+-------+
| OrderNo | RowNumber | LineNum | TotalCost | ReqDate | Type1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order85 | 1 | 1 | 309.110000 | 2019-10-30 23:59:00.000 | 1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order85 | 2 | 2 | 265.560000 | 2019-10-30 23:59:00.000 | 1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order86 | 1 | 1 | 309.110000 | 2019-10-30 23:59:00.000 | 1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order86 | 2 | 2 | 265.560000 | 2019-12-28 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order87 | 1 | 1 | 309.110000 | 2020-01-31 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order87 | 2 | 2 | 265.560000 | 2020-01-01 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order88 | 1 | 1 | 309.110000 | 2019-11-29 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order88 | 2 | 2 | 265.560000 | 2019-12-31 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
I will be joining #LineInfo with other tables to generate a new table with only one record per OrderNo; it is grouped by OrderNo.
What I need is for the new select query to have a 'ReqDate' column holding the latest ReqDate value for the order.
For example, Order87 has two lines. The user updated Line 2 first, at '2019-11-25 05:42:32.390' (the row marked 'A' in the first table), and then Line 1, at '2019-11-25 05:43:37.637' (the row marked 'B').
The new query should have the data from #LineInfo, and only the 'ReqDate' value of the 'LineNum' that has the maximum 'UpdateDate' for Type1 = 2, grouped by OrderNo.
So in our example, the output should have the ReqDate value '2020-01-31 23:59:00.000'.
In short, an order should have the most recently updated required date. An order can have multiple line items whose ReqDate was updated. If there is no entry in #StatusInfo with Type1 = 2 for an order, then any one of the ReqDate values from #LineInfo will suffice, for example the first line's.
I wrote something like this, but it doesn't pull orders without any entry in the #StatusInfo table. Those orders will have a default value even though the user didn't update it, and I am not sure how to join the result of this with #LineInfo to set the latest value:
SELECT SIT.OrderNo, a.max_date, SIT.GroupLineNum
FROM #StatusInfo SIT
INNER JOIN
    (SELECT SI.OrderNo, MAX(SI.UpdateDate) AS max_date
     FROM #StatusInfo SI
     WHERE SI.Type1 = 2
     GROUP BY SI.OrderNo) a
    ON a.OrderNo = SIT.OrderNo AND a.max_date = SIT.UpdateDate
This is what I did. I created the CTE below to load the orders with a required-date change, ordered by update date, and assigned each a row number. The record with row number 1 is the most recently updated one.
;WITH cteLatestReqDate AS (
    -- Pull the latest ReqDate value the user set: order #StatusInfo by UpdateDate
    -- (newest first), number the rows per order, and pick up each line's ReqDate.
    SELECT SIT.OrderNo, SIT.UpdateDate, SIT.GroupLineNum, LLI.ReqDate,
           ROW_NUMBER() OVER (PARTITION BY SIT.OrderNo ORDER BY SIT.UpdateDate DESC) AS RowNum
    FROM #StatusInfo SIT
    INNER JOIN #LineInfo LLI
        ON SIT.OrderNo = LLI.OrderNo AND SIT.GroupLineNum = LLI.LineNum
    WHERE SIT.Type1 = 2
)
Then I added the condition below to my select query. The select query shown is partial.
SELECT
    CASE WHEN MAX(LRD.ReqDate) IS NULL THEN CAST(FORMAT(MAX(LLI.ReqDate), 'yyMMdd') AS NVARCHAR(10))
         ELSE CAST(FORMAT(MAX(LRD.ReqDate), 'yyMMdd') AS NVARCHAR(10)) END AS LatestReqDate
FROM #LineInfo LLI
LEFT JOIN (SELECT * FROM cteLatestReqDate WHERE RowNum = 1) LRD
    ON LRD.OrderNo = LLI.OrderNo AND LRD.GroupLineNum = LLI.LineNum
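The intended result can also be checked outside SQL Server. The sketch below is an illustration under assumptions, not the asker's actual query: it uses Python's built-in sqlite3 module, plain tables named `StatusInfo` and `LineInfo` instead of temp tables, only the columns needed for the logic, and a hypothetical helper `latest_req_dates`. It takes the ReqDate of the line with the newest Type1 = 2 update and falls back to the first line's ReqDate when an order has no such update.

```python
import sqlite3

STATUS = [  # (OrderNo, GroupLineNum, Type1, UpdateDate), copied from the question
    ("Order85", None, 1, "2019-11-25 05:15:55.000"),
    ("Order86", None, 1, "2019-11-25 05:15:55.000"),
    ("Order86", 2, 2, "2019-11-25 05:32:23.773"),
    ("Order87", None, 1, "2019-11-25 05:15:55.000"),
    ("Order87", 1, 2, "2019-11-25 05:43:37.637"),
    ("Order87", 2, 2, "2019-11-25 05:42:32.390"),
    ("Order88", None, 1, "2019-11-25 06:35:13.000"),
    ("Order88", 1, 2, "2019-11-25 06:39:16.170"),
]
LINES = [  # (OrderNo, LineNum, ReqDate), copied from the question
    ("Order85", 1, "2019-10-30 23:59:00.000"),
    ("Order85", 2, "2019-10-30 23:59:00.000"),
    ("Order86", 1, "2019-10-30 23:59:00.000"),
    ("Order86", 2, "2019-12-28 23:59:00.000"),
    ("Order87", 1, "2020-01-31 23:59:00.000"),
    ("Order87", 2, "2020-01-01 23:59:00.000"),
    ("Order88", 1, "2019-11-29 23:59:00.000"),
    ("Order88", 2, "2019-12-31 23:59:00.000"),
]

QUERY = """
SELECT o.OrderNo,
       COALESCE(
           -- ReqDate of the line whose Type1 = 2 update is the most recent
           (SELECT li.ReqDate
              FROM StatusInfo AS si
              JOIN LineInfo AS li
                ON li.OrderNo = si.OrderNo AND li.LineNum = si.GroupLineNum
             WHERE si.OrderNo = o.OrderNo AND si.Type1 = 2
             ORDER BY si.UpdateDate DESC
             LIMIT 1),
           -- fallback (assumption): ReqDate of the first line of the order
           (SELECT li.ReqDate
              FROM LineInfo AS li
             WHERE li.OrderNo = o.OrderNo
             ORDER BY li.LineNum
             LIMIT 1)
       ) AS LatestReqDate
FROM (SELECT DISTINCT OrderNo FROM LineInfo) AS o
ORDER BY o.OrderNo
"""


def latest_req_dates():
    """Return {OrderNo: LatestReqDate} for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE StatusInfo (OrderNo TEXT, GroupLineNum INTEGER, Type1 INTEGER, UpdateDate TEXT)"
    )
    conn.execute("CREATE TABLE LineInfo (OrderNo TEXT, LineNum INTEGER, ReqDate TEXT)")
    conn.executemany("INSERT INTO StatusInfo VALUES (?, ?, ?, ?)", STATUS)
    conn.executemany("INSERT INTO LineInfo VALUES (?, ?, ?)", LINES)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


print(latest_req_dates())
```

In T-SQL the same shape would use TOP 1 instead of LIMIT 1; the timestamps compare correctly as text here only because they are in a fixed ISO-like format.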

How do I do multiple selection based on a flowchart of criteria?

Table name: Copies
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May 16      |
| 900      | 2     | null     | Y       | 3            | Oct 16      |
| 900      | 3     | null     | N       | 9            | Oct 16      |
| 901      | 4     | 378      | Y       | 3            | Oct 16      |
| 901      | 5     | null     | N       | 2            | Oct 16      |
| 902      | 6     | null     | N       | 5            | May 16      |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
| 903      | 9     | null     | Y       | 3            | May 16      |
| 904      | 10    | null     | N       | 0            | May 16      |
| 904      | 11    | null     | N       | 0            | May 16      |
+----------+-------+----------+---------+--------------+-------------+
Output table
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May 16      |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
+----------+-------+----------+---------+--------------+-------------+
Hi all, I need help with a query that returns one record within a group based on the importance of each field. The fields are ranked as follows:
previous - if any record within the group_id has a non-null value, then no record from that group_id is returned (because according to our rules, all records within a group should have the same previous value)
in_this - if one record is Y and the other is N within a group_id, then we keep the Y; if all records are Y or all are N, then we move to the next attribute
higher_value - if all records are equal in the 'in_this' field, then we select the record with the greater value in this field; if the records have an equal value, we move to the next attribute
most_recent - if all records are equal in the 'higher_value' field, then we take the newest record; if these are equal too, then nothing is returned.
This is a simplified version of the table I am looking at, but I just would like to get the gist of how something like this would work. Basically, my table has multiple copies of records that have been grouped through some algorithm. I have been tasked with selecting which of these records within a group is the ‘good’ one, and we are basing this on these fields.
I’d like the output to actually show all fields, because I will likely attempt to refine the query to include other fields (there are over 40 to consider), but the most important is the group_id and my_id fields. It would be neat if we could also somehow flag why each record got picked, but that isn’t necessary.
It seems like something like this should be easy, but I have a hard time wrapping my head around how to pick from within a group_id. Thanks for your help.
You can use analytic functions for this. The trick is establishing the right variables for each condition:
select t.*
from (select t.*,
             max(in_this) over (partition by group_id) as max_in_this,
             min(higher_value) over (partition by group_id) as min_higher_value,
             max(higher_value) over (partition by group_id) as max_higher_value,
             row_number() over (partition by group_id, higher_value order by my_id) as seqnum_ghv,
             min(most_recent) over (partition by group_id) as min_most_recent,
             max(most_recent) over (partition by group_id) as max_most_recent,
             row_number() over (partition by group_id order by most_recent) as seqnum_mr
      from t
     ) t
where max_in_this is not null and
      ( (min_higher_value <> max_higher_value and seqnum_ghv = 1) or
        (min_higher_value = max_higher_value and min_most_recent <> max_most_recent and seqnum_mr = 1
        )
      );
The third condition as stated makes no sense, but you should get the idea for how to implement this.
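One way to make the four rules concrete is to rank the rows within each group by the attributes in priority order and keep a group's top row only when it is unique. The sketch below is a hedged illustration, not the answer's query: it runs on Python's built-in sqlite3 (window functions need SQLite 3.25 or later), the helper name `pick_rows` is hypothetical, and most_recent is assumed to be a month/year stored as sortable text ('2016-05' for May 16, '2016-10' for Oct 16).

```python
import sqlite3

COPIES = [  # (group_id, my_id, previous, in_this, higher_value, most_recent)
    (900, 1, None, "Y", 7, "2016-05"),
    (900, 2, None, "Y", 3, "2016-10"),
    (900, 3, None, "N", 9, "2016-10"),
    (901, 4, 378, "Y", 3, "2016-10"),
    (901, 5, None, "N", 2, "2016-10"),
    (902, 6, None, "N", 5, "2016-05"),
    (902, 7, None, "N", 9, "2016-10"),
    (903, 8, None, "Y", 3, "2016-10"),
    (903, 9, None, "Y", 3, "2016-05"),
    (904, 10, None, "N", 0, "2016-05"),
    (904, 11, None, "N", 0, "2016-05"),
]

QUERY = """
WITH ranked AS (
    SELECT c.*,
           -- number of non-null 'previous' values in the group
           COUNT(previous) OVER (PARTITION BY group_id) AS n_previous,
           -- 'Y' sorts after 'N', so DESC puts Y first; ties share a rank
           RANK() OVER (PARTITION BY group_id
                        ORDER BY in_this DESC, higher_value DESC, most_recent DESC) AS rnk
    FROM Copies AS c
),
top_rows AS (
    -- the WHERE runs before the window function, so n_top counts only rank-1 rows
    SELECT *, COUNT(*) OVER (PARTITION BY group_id) AS n_top
    FROM ranked
    WHERE rnk = 1
)
SELECT group_id, my_id
FROM top_rows
WHERE n_previous = 0   -- rule 1: drop groups where any 'previous' is set
  AND n_top = 1        -- last rule: drop groups whose best rows tie on everything
ORDER BY group_id
"""


def pick_rows(rows):
    """Return (group_id, my_id) for the single 'good' row of each qualifying group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Copies (group_id INTEGER, my_id INTEGER, previous INTEGER, "
        "in_this TEXT, higher_value INTEGER, most_recent TEXT)"
    )
    conn.executemany("INSERT INTO Copies VALUES (?, ?, ?, ?, ?, ?)", rows)
    picked = conn.execute(QUERY).fetchall()
    conn.close()
    return picked


print(pick_rows(COPIES))
```

Adding further tie-breaking fields would only mean extending the ORDER BY list inside RANK(), which may be convenient given that there are over 40 fields to consider.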

How can I do SQL query count based on certain criteria including row order

I've come across certain logic that I need for my SQL query. Given that I have a table as such:
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 1 | null | 2016-05-10 |
| 1 | null | 2016-05-09 |
| 1 | yes | 2016-05-08 |
+----------+-------+------------+
This table is produced by a simple query:
SELECT * FROM products WHERE product = 1 ORDER BY date desc
Now what I need to do is create a query that counts the number of nulls for a given product, in date order (newest first), until there is a yes value. In the above example the count would be 2, as there are 2 nulls before the first yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 2 | null | 2016-05-10 |
| 2 | yes | 2016-05-09 |
| 2 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 1 as there is 1 null until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 3 | yes | 2016-05-10 |
| 3 | yes | 2016-05-09 |
| 3 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 0.
You need a Correlated Subquery like this:
SELECT COUNT(*)
FROM products AS p1
WHERE product = 1
  AND Date >
      ( -- maximum date with 'yes'
        SELECT MAX(Date)
        FROM products AS p2
        WHERE p1.product = p2.product
          AND Valid = 'yes'
      )
This should do it:
select count(1) from products p where product = 1 and valid is null and date > (select max(date) from products where product = p.product and valid = 'yes')
I am not sure whether the logic you describe covers every edge case, but the following should do what you are after:
select a.product,
       count(IIF(a.valid is null and a.date > b.maxdate, a.date, null)) as total
from sometable a
inner join (
    select product, max(date) as maxdate
    from sometable
    where valid = 'yes'
    group by product
) b
    on a.product = b.product
group by a.product
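To compare the approaches on all three sample products at once, the per-product count can be reproduced in a small test harness. This is a sketch under assumptions: it uses Python's built-in sqlite3 instead of the asker's database, CASE in place of IIF, and a hypothetical helper `nulls_before_yes`. It also uses a LEFT JOIN so that a product with no 'yes' row would be reported with all of its nulls counted, which is an assumption the question does not settle.

```python
import sqlite3

PRODUCTS = [  # (product, valid, date), the three examples from the question
    (1, None, "2016-05-10"),
    (1, None, "2016-05-09"),
    (1, "yes", "2016-05-08"),
    (2, None, "2016-05-10"),
    (2, "yes", "2016-05-09"),
    (2, None, "2016-05-08"),
    (3, "yes", "2016-05-10"),
    (3, "yes", "2016-05-09"),
    (3, None, "2016-05-08"),
]

QUERY = """
SELECT p.product,
       SUM(CASE
             WHEN p.valid IS NULL
                  -- assumption: with no 'yes' row at all, every null is counted
                  AND (y.last_yes IS NULL OR p.date > y.last_yes)
             THEN 1 ELSE 0
           END) AS nulls_before_yes
FROM products AS p
LEFT JOIN (
    -- newest 'yes' date per product
    SELECT product, MAX(date) AS last_yes
    FROM products
    WHERE valid = 'yes'
    GROUP BY product
) AS y ON y.product = p.product
GROUP BY p.product
ORDER BY p.product
"""


def nulls_before_yes(rows):
    """Return {product: number of nulls newer than the newest 'yes'}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product INTEGER, valid TEXT, date TEXT)")
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    counts = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return counts


print(nulls_before_yes(PRODUCTS))
```

The comparison against the newest 'yes' date is what stops the count at the first yes when reading newest to oldest.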

Filter by value in last row of LEFT OUTER JOIN table

I have a Clients table in PostgreSQL (version 9.1.11), and I would like to write a query to filter that table. The query should return only clients which meet one of the following conditions:
--The client's last order (based on orders.created_at) has a fulfill_by_date in the past.
OR
--The client has no orders at all
I've looked for around 2 months, on and off, for a solution.
I've looked at custom last aggregate functions in Postgres, but could not get them to work, and feel there must be a built-in way to do this.
I've also looked at Postgres last_value window functions, but most of the examples are of a single table, not of a query joining multiple tables.
Any help would be greatly appreciated! Here is a sample of what I am going for:
Clients table:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 2 | SecondClient |
| 3 | ThirdClient |
Orders table:
| order_id | client_id | fulfill_by_date | created_at |
-------------------------------------------------------
| 1 | 1 | 3000-01-01 | 2013-01-01 |
| 2 | 1 | 1999-01-01 | 2013-01-02 |
| 3 | 2 | 1999-01-01 | 2013-01-01 |
| 4 | 2 | 3000-01-01 | 2013-01-02 |
Desired query result:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 3 | ThirdClient |
Try it this way
SELECT c.client_id, c.client_name
FROM clients c LEFT JOIN
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) rnum
    FROM orders
) o
    ON c.client_id = o.client_id
   AND o.rnum = 1
WHERE o.fulfill_by_date < CURRENT_DATE
   OR o.order_id IS NULL
Output:
| CLIENT_ID | CLIENT_NAME |
|-----------|-------------|
| 1 | FirstClient |
| 3 | ThirdClient |
Here is SQLFiddle demo
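The same query shape can be tried locally. The sketch below is an assumption-laden illustration: it runs on Python's built-in sqlite3 (SQLite 3.25 or later for ROW_NUMBER) rather than PostgreSQL, passes "today" as a named parameter instead of using CURRENT_DATE so the result is reproducible, and uses a hypothetical helper `clients_to_contact`.

```python
import sqlite3

CLIENTS = [(1, "FirstClient"), (2, "SecondClient"), (3, "ThirdClient")]
ORDERS = [  # (order_id, client_id, fulfill_by_date, created_at)
    (1, 1, "3000-01-01", "2013-01-01"),
    (2, 1, "1999-01-01", "2013-01-02"),
    (3, 2, "1999-01-01", "2013-01-01"),
    (4, 2, "3000-01-01", "2013-01-02"),
]

QUERY = """
SELECT c.client_id, c.client_name
FROM clients AS c
LEFT JOIN (
    -- number each client's orders, newest first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) AS rnum
    FROM orders
) AS o
    ON o.client_id = c.client_id
   AND o.rnum = 1                      -- keep only the latest order in the join
WHERE o.fulfill_by_date < :today       -- latest order is already due
   OR o.order_id IS NULL               -- or the client has no orders at all
ORDER BY c.client_id
"""


def clients_to_contact(today):
    """Return (client_id, client_name) rows matching either condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clients (client_id INTEGER, client_name TEXT)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, client_id INTEGER, "
        "fulfill_by_date TEXT, created_at TEXT)"
    )
    conn.executemany("INSERT INTO clients VALUES (?, ?)", CLIENTS)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)
    rows = conn.execute(QUERY, {"today": today}).fetchall()
    conn.close()
    return rows


print(clients_to_contact("2014-01-01"))
```

Keeping `o.rnum = 1` in the ON clause rather than the WHERE clause is what lets clients without any orders survive the LEFT JOIN.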

SQL Query to Join Two Tables Based On Closest Timestamp

I need to retrieve the records from dbo.transaction (transactions of all users, with more than one transaction per user) whose timestamp is closest to the time in dbo.bal (current balance details of each user, with only one record per user),
i.e., the number of resulting records should equal the number of records in dbo.bal.
I tried the query below, but I only get the records with a timestamp less than the time in dbo.bal. However, some records have a timestamp greater than, and closest to, dbo.bal.time.
SELECT dbo.bal.uid,
       dbo.bal.userId,
       dbo.bal.balance,
       dbo.bal.time,
       (SELECT TOP 1 transactionBal
        FROM dbo.transaction
        WHERE TIMESTAMP <= dbo.bal.time
        ORDER BY TIMESTAMP DESC) AS newBal
FROM dbo.bal
WHERE dbo.bal.time IS NOT NULL
ORDER BY dbo.bal.time DESC
Here is my table structure:
dbo.transaction
---------------
| uid| userId | description| timestamp | credit | transactionBal
-------------------------------------------------------------------------
| 1 | 101 | buy credit1| 2012-01-25 03:23:31.624 | 100 | 500
| 2 | 102 | buy credit5| 2012-01-18 03:13:12.657 | 500 | 700
| 3 | 103 | buy credit3| 2012-01-15 02:16:34.667 | 300 | 300
| 4 | 101 | buy credit2| 2012-01-13 05:34:45.637 | 200 | 300
| 5 | 101 | buy credit1| 2012-01-12 07:45:21.457 | 100 | 100
| 6 | 102 | buy credit2| 2012-01-01 08:18:34.677 | 200 | 200
dbo.bal
-------
| uid| userId | balance | time |
-----------------------------------------------------
| 1 | 101 | 500 | 2012-01-13 05:34:45.645 |
| 2 | 102 | 700 | 2012-01-01 08:18:34.685 |
| 3 | 103 | 300 | 2012-01-15 02:16:34.672 |
And the result should be like,
| Id | userId | balance | time | credit | transactionBal
-----------------------------------------------------------------------------
| 1 | 101 | 500 | 2012-01-13 05:34:45.645 | 200 | 300
| 2 | 102 | 700 | 2012-01-01 08:18:34.685 | 200 | 200
| 3 | 103 | 300 | 2012-01-15 02:16:34.672 | 300 | 300
Please help me. Any help is much appreciated. Thank you.
It would be helpful if you posted your table structures, but ...
I think your inner query needs a join condition. (That is not actually in your question)
Your ORDER BY clause in the inner query could be ABS(TIMESTAMP - dbo.bal.time). That should give you the smallest difference between the two.
Does that help ?
Based on the following SQL Fiddle, http://sqlfiddle.com/#!3/7a900/15, I came up with ...
SELECT
    bal.uid,
    bal.userId,
    bal.balance,
    bal.time,
    trn.timestamp,
    trn.description,
    datediff(ms, bal.time, trn.timestamp)
FROM
    money_balances bal
    JOIN money_transaction trn on
        trn.userid = bal.userid and
        trn.uid =
        (
            select top 1 uid
            from money_transaction trn2
            where trn2.userid = trn.userid
            order by abs(datediff(ms, bal.time, trn2.timestamp))
        )
WHERE
    bal.time IS NOT NULL
ORDER BY
    bal.time DESC
I cannot vouch for its performance because I know nothing of your data, but I believe it works.
I have simplified my answer - I believe what you need is
SELECT
    bal.uid as baluid,
    (
        select top 1 uid
        from money_transaction trn2
        where trn2.userid = bal.userid
        order by abs(datediff(ms, bal.time, trn2.timestamp))
    ) as tranuid
FROM
    money_balances bal
and from that you can derive all the datasets you need.
for example :
with matched_credits as
(
    SELECT
        bal.uid as baluid,
        (
            select top 1 uid
            from money_transaction trn2
            where trn2.userid = bal.userid
            order by abs(datediff(ms, bal.time, trn2.timestamp))
        ) as tranuid
    FROM
        money_balances bal
)
select
    *
from
    matched_credits mc
    join money_balances mb on
        mb.uid = mc.baluid
    join money_transaction trn on
        trn.uid = mc.tranuid
Try:
SELECT dbo.bal.uid,
       dbo.bal.userId,
       dbo.bal.balance,
       dbo.bal.time,
       (SELECT TOP 1 trn.transactionBal
        FROM dbo.[transaction] trn
        WHERE trn.userId = dbo.bal.userId -- only this user's transactions
        ORDER BY abs(datediff(ms, dbo.bal.time, trn.[timestamp]))) AS newBal
FROM dbo.bal
WHERE dbo.bal.time IS NOT NULL
ORDER BY dbo.bal.time DESC