How can I return rows that meet criteria for occurring in one day, but over a date range? - sql

I have a query (shown below) that returns all rows for UserID that have :
a JOIN,
a subsequent CANCEL, and then
a subsequent JOIN
But: I need to return UserIDs that meet this criteria of having a JOIN,CANCEL, then JOIN in sequence ON THE SAME DAY, but for a date range: for example BETWEEN 2016-11-01 and 2016-11-30. So in the example table below, UserIDs 12345, 9876, and 33445 would be returned.
I'm not sure how this is achieved - would be involve some sort of grouping on the timestamp date? Would a stored procedure that iterates over conditional tests for UserID and ActionType be a viable solution?
+--------+--------+----------------------+------------+------------------+
| rownum | UserID | Timestamp | ActionType | Return in query? |
+--------+--------+----------------------+------------+------------------+
| 1 | 12345 | 2016-11-01 08:25:39 | JOIN | yes |
| 2 | 12345 | 2016-11-01 08:27:00 | NULL | yes |
| 3 | 12345 | 2016-11-01 08:28:20 | DOWNGRADE | yes |
| 4 | 12345 | 2016-11-01 08:31:34 | NULL | yes |
| 5 | 12345 | 2016-11-01 08:32:44 | CANCEL | yes |
| 6 | 12345 | 2016-11-01 08:45:51 | NULL | yes |
| 7 | 12345 | 2016-11-01 08:50:57 | JOIN | yes |
| 1 | 9876 | 2016-11-01 16:05:42 | JOIN | yes |
| 2 | 9876 | 2016-11-01 16:07:33 | CANCEL | yes |
| 3 | 9876 | 2016-11-01 16:09:09 | JOIN | yes |
| 1 | 56565 | 2016-11-01 18:15:16 | JOIN | no |
| 2 | 56565 | 2016-11-01 19:22:25 | CANCEL | no |
| 3 | 56565 | 2016-11-01 20:05:05 | CANCEL | no |
| 1 | 34343 | 2016-11-01 05:32:56 | JOIN | no |
NEXT DAY
| 1 | 7878 | 2016-11-02 10:05:04 | JOIN | no |
| 2 | 7878 | 2016-11-02 10:06:06 | JOIN | no |
| 1 | 33445 | 2016-11-02 02:33:34 | JOIN | yes |
| 2 | 33445 | 2016-11-02 02:33:34 | NULL | yes |
| 3 | 33445 | 2016-11-02 02:37:56 | CANCEL | yes |
| 4 | 33445 | 2016-11-02 02:38:01 | JOIN | yes |
+--------+--------+----------------------+------------+------------------+
Here is a link to the question which led me to the query that pulls data for exactly one day (not a range): How can I return rows that meet a specific sequence of events?
Here is the query:
SELECT *
FROM T
WHERE USERID IN (
select distinct userid
from t first_join
inner join t cancel
on first_join.tmstmp < cancel.tmstp
and first_join.userid = cancel.userid
inner join t.second_join
on second_join.tmstmp > cancel.tmstp
and second_join.userid = cancel.userid
where first_join.actiontype = 'JOIN'
and cancel.actiontype = 'CANCEL'
and second_join.actiontype = 'JOIN'
)
Clarification of comments/questions:
vkp:
QUESTION: can join,cancel,join be on different days with other values in between? ANSWER: No, I need to find the join>cancel>join that occur in one day only. If there is a join on 11/1 and a cancel on 11/2, that UserID does not need to be returned.
QUESTION: if a particular date satisfies cancel,join,cancel in the date range, will that be enough for a user to be included in the results? ANSWER: No, I am specifically looking at rows that meet the ActionType sequence in one day, not over a range of days.
THANK YOU!

To get all the users and the days when they have the specified sequence of events happen, use
select distinct userid,tmstamp::date
from t
where ActionType = 'CANCEL' and tmstamp::date between date '2016-11-01' and date '2016-11-30' and
exists (select 1
from t t2
where t2.userId = t.userId and
t2.actiontype = 'JOIN' and
t2.tmstamp < t.tmstamp and
t2.tmstamp::date = t.tmstamp::date
) and
exists (select 1
from t t3
where t3.userId = t.userId and
t3.actiontype = 'JOIN' and
t3.tmstamp > t.tmstamp and
t3.tmstamp::date = t.tmstamp::date
)
To get all the rows for such users on those days, wrap the previous query as a subquery against the original table.
select * from t where (userid,tmstamp::date) in (
select distinct userid,tmstamp::date
from t
where ActionType = 'CANCEL'
and tmstamp::date between date '2016-11-01' and date '2016-11-30' and
exists (select 1
from t t2
where t2.userId = t.userId and
t2.actiontype = 'JOIN' and
t2.tmstamp < t.tmstamp and
t2.tmstamp::date = t.tmstamp::date
) and
exists (select 1
from t t3
where t3.userId = t.userId and
t3.actiontype = 'JOIN' and
t3.tmstamp > t.tmstamp and
t3.tmstamp::date = t.tmstamp::date
)
)
Sample Demo
Note that this is a minor tweak to #Gordon's query (to check for these sequence of events on a particular day) in your previous question which i felt was the best.
Edit: An alternate approach with window functions
select * from t where (userid,tmstamp::date) in (
select distinct userid,tmstamp::date from (
select t.*
,min(case when actiontype = 'JOIN' then 1 else 2 end) over(partition by t.userid,t.tmstamp::date order by t.tmstamp rows between unbounded preceding and 1 preceding) min_before
,min(case when actiontype = 'JOIN' then 1 else 2 end) over(partition by t.userid,t.tmstamp::date order by t.tmstamp rows between 1 following and unbounded following) min_after
from (select userid,tmstamp from t where actiontype='CANCEL') tc
join t on t.userid=tc.userid and t.tmstamp::date=tc.tmstamp::date
) x
where min_before=1 and min_after=1
)
1) Using a case expression we designate all actiontype JOIN rows as 1 and 2 for all other actiontypes.
2) We join it with the actiontype CANCEL rows.
3) Then we check for the minimum value before CANCEL and minimum value after CANCEL for each date and userid combination. Per the case expression defined, it should be 1.
4) Get all such dates and userid's and fetch the corresponding rows.

Related

How to select the latest date for each group by number?

I've been stuck on this question for a while, and I was wondering if the community would be able to direct me in the right direction?
I have some tag IDs that needs to be grouped, with exceptions (column: deleted) that need to be retained in the results. After which, for each grouped tag ID, I need to select the one with the latest date. How can I do this? An example below:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
4 | 400 | 05/01/20 | null
5 | 400 | 04/01/20 | null
6 | 500 | 03/01/20 | null
7 | 500 | 02/01/20 | null
I am trying to reach this outcome:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
6 | 500 | 03/01/20 | null
So, firstly if there is a date in the "DELETED" column, I would like the row to be present. Secondly, for each unique tag ID, I would like the row with the latest "DATE" to be present.
Hopefully this question is clear. Would appreciate your feedback and help! A big thanks in advance.
Your results seem to be something like this:
select t.*
from (select t.*,
row_number() over (partition by tag_id, deleted order by date desc) as seqnum
from t
) t
where seqnum = 1 or deleted is not null;
This takes one row where deleted is null -- the most recent row. It also keeps each row where deleted is not null.
You need 2 conditions combined with OR in the WHERE clause:
the 1st is deleted is not null, or
the 2nd that there isn't any other row with the same tag_id and date later than the current row's date, meaning that the current row's date is the latest:
select t.* from tablename t
where t.deleted is not null
or not exists (
select 1 from tablename
where tag_id = t.tag_id and date > t.date
)
See the demo.
Results:
| id | tag_id | date | deleted |
| --- | ------ | ---------- | -------- |
| 1 | 300 | 2020-05-01 | |
| 2 | 300 | 2020-03-01 | 04/01/20 |
| 3 | 400 | 2020-06-01 | |
| 6 | 500 | 2020-03-01 | |

How do i get the latest user udpated column value in a table based on timestamp entry on a different table in SQL Server?

I have a temp table #StatusInfo with the following data
+---------+--------------+-------+-------------------------+--+
| OrderNo | GroupLineNum | Type1 | UpdateDate | |
+---------+--------------+-------+-------------------------+--+
| Order85 | NULL | 1 | 2019-11-25 05:15:55.000 | |
+---------+--------------+-------+-------------------------+--+
| Order86 | NULL | 1 | 2019-11-25 05:15:55.000 | |
+---------+--------------+-------+-------------------------+--+
| Order86 | 2 | 2 | 2019-11-25 05:32:23.773 | |
+---------+--------------+-------+-------------------------+--+
| Order87 | NULL | 1 | 2019-11-25 05:15:55.000 | |
+---------+--------------+-------+-------------------------+--+
| Order87 | 1 | 2 | 2019-11-25 05:43:37.637 | | B
+---------+--------------+-------+-------------------------+--+
| Order87 | 2 | 2 | 2019-11-25 05:42:32.390 | | A
+---------+--------------+-------+-------------------------+--+
| Order88 | NULL | 1 | 2019-11-25 06:35:13.000 | |
+---------+--------------+-------+-------------------------+--+
| Order88 | 1 | 2 | 2019-11-25 06:39:16.170 | |
+---------+--------------+-------+-------------------------+--+
Any update the user does on an order will be pulled into this temp table. Type 1 column with value 2 denotes a 'Required Date' field change by the user. The timestamp when the user made the change is the last column.
I have another temp table #LineInfo with the following data. This table is created by joining other tables and a left join with the above table too. The 'LineNum' column from below table will match the 'GroupLineNum' column in the above table for Type1=2
+---------+-----------+---------+------------+-------------------------+-------+
| OrderNo | RowNumber | LineNum | TotalCost | ReqDate | Type1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order85 | 1 | 1 | 309.110000 | 2019-10-30 23:59:00.000 | 1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order85 | 2 | 2 | 265.560000 | 2019-10-30 23:59:00.000 | 1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order86 | 1 | 1 | 309.110000 | 2019-10-30 23:59:00.000 | 1 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order86 | 2 | 2 | 265.560000 | 2019-12-28 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order87 | 1 | 1 | 309.110000 | 2020-01-31 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order87 | 2 | 2 | 265.560000 | 2020-01-01 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order88 | 1 | 1 | 309.110000 | 2019-11-29 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
| Order88 | 2 | 2 | 265.560000 | 2019-12-31 23:59:00.000 | 2 |
+---------+-----------+---------+------------+-------------------------+-------+
I will be joining #lineInfo with other tables to generate a new table with only one record for an orderno. Its grouped by orderno.
What I need to do is ensure that the new selectquery will have a column 'ReqDate' which will be the latest ReqDate value for the order.
For example, Order87 has two lines in the order. User updated Line 2 first at '2019-11-25 05:42:32.390' as seen in the row marked 'A' followed by Line 1 marked B # '2019-11-25 05:43:37.637 ' from the first table.
The new query should have the data from LineInfo and only the 'ReqDate' value matching the 'LineNum' that has the maximum of 'UpdateDate' column for Type1=2 and group by orderno.
So in our example, the output should have the ReqDate value '2020-01-31 23:59:00.000'.
In short, an order should have the most recently updated required date. Order can have multiple line items where reqdate is udpated. If there is no entry in #StatusInfo table with Type2 for an order, then any one of the ReqDate value from the #LineInfo table will suffice. Maybe the first line
I wrote something like this but it doesnt pull orders without any entry in StatusInfo table. Those orders will have a default value even though user didnt udpate and i am not sure how to join the result of this with LineInfo table to set the latest value
Select SIT.Orderno, max_date,grouplinenum
from #StatusInfo SIT
inner join
(SELECT Orderno, MAX(ActDate) as max_date
FROM #StatusInfo SI
WHERE SI.Type1=2
GROUP BY SI.Orderno)a
on a.Orderno = SIT.Orderno and a.max_date = SIT.ActDate
This is what I did. I created the blow CTE to load orders with req date change in order of Updated date and assigned it row number. Record with row number 1 will be the most recently updated date
;WITH cteLatestReqDate AS ( --We need to pull the latest ReqDate value the user set. So we are are ordering the SIT table by ActDate and assigning a row number and respective line's required date here
SELECT SIT.OrderNo, SIT.UpdateDate, SIT.GroupLineNum, LLI.ReqDate,
ROW_NUMBER() OVER (PARTITION BY SIT.OrderNo ORDER BY ActDate DESC) AS RowNum
FROM #StatusInfo SIT INNER JOIN #LineLevelInfo LLI ON SIT.OrderNo = OI.OrderNo AND SIT.GroupLineNum = LLI.LineNum
WHERE SIT.Type1 = 2
)
and then I added the below condition to my select query. Below select query is partial
SELECT
CASE WHEN MAX(LRD.ReqDate) IS NULL THEN CAST(FORMAT(MAX(LLI.ReqDate), 'yyMMdd') AS NVARCHAR(10))
ELSE CAST(FORMAT(MAX(LRD.ReqDate), 'yyMMdd') AS NVARCHAR(10)) END AS LatestReqDate
FROM #LineLevelInfo LLI
LEFT JOIN(SELECT * FROM cteLatestReqDate WHERE RowNum = 1)LRD ON LRD.OrderNo = LLI.OrderNo And LRD.GroupLineNum = LLI.LineNum

How can I do SQL query count based on certain criteria including row order

I've come across certain logic that I need for my SQL query. Given that I have a table as such:
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 1 | null | 2016-05-10 |
| 1 | null | 2016-05-09 |
| 1 | yes | 2016-05-08 |
+----------+-------+------------+
This table is produced by a simple query:
SELECT * FROM products WHERE product = 1 ORDER BY date desc
Now what I need to do is create a query to count the number of nulls for certain products by order of date until there is a yes value. So the above example the count would be 2 as there are 2 nulls until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 2 | null | 2016-05-10 |
| 2 | yes | 2016-05-09 |
| 2 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 1 as there is 1 null until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 3 | yes | 2016-05-10 |
| 3 | yes | 2016-05-09 |
| 3 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 0.
You need a Correlated Subquery like this:
SELECT COUNT(*)
FROM products AS p1
WHERE product = 1
AND Date >
( -- maximum date with 'yes'
SELECT MAX(Date)
FROM products AS p2
WHERE p1.product = p2.product
AND Valid = 'yes'
)
This should do it:
select count(1) from table where valid is null and date > (select min(date) from table where valid = 'yes')
Not sure if your logic provided covers all the possible weird and wonderful extreme scenarios but the following piece of code would do what you are after:
select a.product,
count(IIF(a.valid is null and a.date >maxdate,a.date,null)) as total
from sometable a
inner join (
select product, max(date) as Maxdate
from sometable where valid='yes' group by product
) b
on a.product=b.product group by a.product

Filter by value in last row of LEFT OUTER JOIN table

I have a Clients table in PostgreSQL (version 9.1.11), and I would like to write a query to filter that table. The query should return only clients which meet one of the following conditions:
--The client's last order (based on orders.created_at) has a fulfill_by_date in the past.
OR
--The client has no orders at all
I've looked for around 2 months, on and off, for a solution.
I've looked at custom last aggregate functions in Postgres, but could not get them to work, and feel there must be a built-in way to do this.
I've also looked at Postgres last_value window functions, but most of the examples are of a single table, not of a query joining multiple tables.
Any help would be greatly appreciated! Here is a sample of what I am going for:
Clients table:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 2 | SecondClient |
| 3 | ThirdClient |
Orders table:
| order_id | client_id | fulfill_by_date | created_at |
-------------------------------------------------------
| 1 | 1 | 3000-01-01 | 2013-01-01 |
| 2 | 1 | 1999-01-01 | 2013-01-02 |
| 3 | 2 | 1999-01-01 | 2013-01-01 |
| 4 | 2 | 3000-01-01 | 2013-01-02 |
Desired query result:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 3 | ThirdClient |
Try it this way
SELECT c.client_id, c.client_name
FROM clients c LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) rnum
FROM orders
) o
ON c.client_id = o.client_id
AND o.rnum = 1
WHERE o.fulfill_by_date < CURRENT_DATE
OR o.order_id IS NULL
Output:
| CLIENT_ID | CLIENT_NAME |
|-----------|-------------|
| 1 | FirstClient |
| 3 | ThirdClient |
Here is SQLFiddle demo

SQL Query to Join Two Tables Based On Closest Timestamp

I need to retrieve the records from dbo.transaction (transaction of all users-more than one transaction for each user) that having timestamp which is closest to the time in dbo.bal (current balance details of each user-only one record for each user)
ie, the resultant records should equal to the no of records in the dbo.bal
Here i tried the below query, am getting only the records less than the time in dbo.bal. But there are some record having timestamp greater than and closest to dbo.bal.time
SELECT dbo.bal.uid,
dbo.bal.userId,
dbo.bal.balance,
dbo.bal.time,
(SELECT TOP 1 transactionBal
FROM dbo.transaction
WHERE TIMESTAMP <= dbo.bal.time
ORDER BY TIMESTAMP DESC) AS newBal
FROM dbo.bal
WHERE dbo.bal.time IS NOT NULL
ORDER BY dbo.bal.time DESC
here is my table structure,
dbo.transaction
---------------
| uid| userId | description| timestamp | credit | transactionBal
-------------------------------------------------------------------------
| 1 | 101 | buy credit1| 2012-01-25 03:23:31.624 | 100 | 500
| 2 | 102 | buy credit5| 2012-01-18 03:13:12.657 | 500 | 700
| 3 | 103 | buy credit3| 2012-01-15 02:16:34.667 | 300 | 300
| 4 | 101 | buy credit2| 2012-01-13 05:34:45.637 | 200 | 300
| 5 | 101 | buy credit1| 2012-01-12 07:45:21.457 | 100 | 100
| 6 | 102 | buy credit2| 2012-01-01 08:18:34.677 | 200 | 200
dbo.bal
-------
| uid| userId | balance | time |
-----------------------------------------------------
| 1 | 101 | 500 | 2012-01-13 05:34:45.645 |
| 2 | 102 | 700 | 2012-01-01 08:18:34.685 |
| 3 | 103 | 300 | 2012-01-15 02:16:34.672 |
And the result should be like,
| Id | userId | balance | time | credit | transactionBal
-----------------------------------------------------------------------------
| 1 | 101 | 500 | 2012-01-13 05:34:45.645 | 200 | 300
| 2 | 102 | 700 | 2012-01-01 08:18:34.685 | 200 | 200
| 3 | 103 | 300 | 2012-01-15 02:16:34.672 | 300 | 300
Please help me.. Any help is must appreciated...Thankyou
It would be helpful if you posted your table structures, but ...
I think your inner query needs a join condition. (That is not actually in your question)
Your ORDER BY clause in the inner query could be ABS(TIMESTAMP - DB0.BAL.TIME). That should give you the smallest difference between the 2.
Does that help ?
Based on the follwing Sql Fiddle http://sqlfiddle.com/#!3/7a900/15 I came up with ...
SELECT
bal.uid,
bal.userId,
bal.balance,
bal.time,
trn.timestamp,
trn.description,
datediff(ms, bal.time, trn.timestamp)
FROM
money_balances bal
JOIN money_transaction trn on
trn.userid = bal.userid and
trn.uid =
(
select top 1 uid
from money_transaction trn2
where trn2.userid = trn.userid
order by abs(datediff(ms, bal.time, trn2.timestamp))
)
WHERE
bal.time IS NOT NULL
ORDER BY
bal.time DESC
I cannot vouch for its performance because I know nothing of your data, but I believe it works.
I have simplified my answer - I believe what you need is
SELECT
bal.uid as baluid,
(
select top 1 uid
from money_transaction trn2
where trn2.userid = bal.userid
order by abs(datediff(ms, bal.time, trn2.timestamp))
) as tranuid
FROM
money_balances bal
and from that you can derive all the datasets you need.
for example :
with matched_credits as
(
SELECT
bal.uid as baluid,
(
select top 1 uid
from money_transaction trn2
where trn2.userid = bal.userid
order by abs(datediff(ms, bal.time, trn2.timestamp))
) as tranuid
FROM
money_balances bal
)
select
*
from
matched_credits mc
join money_balances mb on
mb.uid = mc.baluid
join money_transaction trn on
trn.uid = mc.tranuid
Try:
SELECT dbo.bal.uid,
dbo.bal.userId,
dbo.bal.balance,
dbo.bal.time,
(SELECT TOP 1 transactionBal
FROM dbo.transaction
ORDER BY abs(datediff(ms, dbo.bal.time, TIMESTAMP))) AS newBal
FROM dbo.bal
WHERE dbo.bal.time IS NOT NULL
ORDER BY dbo.bal.time DESC