Get time stamp of change in column value - sql

I have a table that tracks a certain status using a bit column. I want to get the first timestamp of the latest status change. I get the desired output using a temp table, but is there a better way to do this?
I get the max timestamp for status 1, then the min timestamp for status 0; if the min timestamp for status 0 is greater than the max timestamp for status 1, I include it in the result set.
Sample data
123 0 2016-12-21 20:04:56.217
123 0 2016-12-21 19:00:28.980
123 0 2016-12-21 17:00:10.207 <-- Get this record because this is the latest status change from 1 to 0
123 1 2016-12-20 16:15:58.787
123 1 2016-12-20 16:11:36.523
123 1 2016-12-20 14:20:02.467
123 1 2016-12-20 13:57:57.623
123 0 2016-12-20 13:55:31.421 <-- This is also a status change, but it should not be included in the result because it is not the latest
123 1 2016-12-20 13:54:57.307
123 0 2016-12-19 12:23:46.103
123 0 2016-12-18 11:47:21.267
SQL
CREATE TABLE #temp_status_changed
(
id VARCHAR(22) NOT NULL,
enabled BIT NOT NULL,
dt_create DATETIME NOT null
)
INSERT INTO #temp_status_changed
SELECT id,enabled,MAX(dt_create) FROM mytable WHERE enabled=1
GROUP BY id,enabled
SELECT a.id,a.enabled,MIN(a.dt_create) FROM mytable a
JOIN #temp_status_changed b ON a.id=b.id
WHERE a.enabled=0
GROUP BY a.id,a.enabled
HAVING MIN(a.dt_create) > (SELECT dt_create FROM #temp_status_changed WHERE id=a.id)
DROP TABLE #temp_status_changed

There are several ways to achieve that.
For example, using LAG() function you can always get the previous value and compare it:
SELECT * FROM
(
SELECT *, LAG(Enabled) OVER (PARTITION BY id ORDER BY dt_create) PrevEnabled
FROM YourTable
) x
WHERE Enabled = 0 AND PrevEnabled = 1
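Note that the LAG() query as written returns every 1 to 0 change, including the earlier one the question wants to exclude. A minimal sketch of one way to keep only the latest change is shown here, run through Python's built-in sqlite3 module so it can be tried against the sample data. SQLite stands in for SQL Server, timestamps are stored as ISO text, and the last_on column name is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT, enabled INTEGER, dt_create TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("123", 0, "2016-12-21 20:04:56.217"),
        ("123", 0, "2016-12-21 19:00:28.980"),
        ("123", 0, "2016-12-21 17:00:10.207"),
        ("123", 1, "2016-12-20 16:15:58.787"),
        ("123", 1, "2016-12-20 16:11:36.523"),
        ("123", 1, "2016-12-20 14:20:02.467"),
        ("123", 1, "2016-12-20 13:57:57.623"),
        ("123", 0, "2016-12-20 13:55:31.421"),
        ("123", 1, "2016-12-20 13:54:57.307"),
        ("123", 0, "2016-12-19 12:23:46.103"),
        ("123", 0, "2016-12-18 11:47:21.267"),
    ],
)

query = """
WITH changes AS (
    SELECT id, enabled, dt_create,
           -- status of the previous row for the same id
           LAG(enabled) OVER (PARTITION BY id ORDER BY dt_create) AS prev_enabled,
           -- latest timestamp at which the id had status 1 (hypothetical helper column)
           MAX(CASE WHEN enabled = 1 THEN dt_create END)
               OVER (PARTITION BY id) AS last_on
    FROM mytable
)
SELECT id, enabled, dt_create
FROM changes
WHERE enabled = 0
  AND prev_enabled = 1      -- a 1 -> 0 change
  AND dt_create > last_on   -- and no status 1 row comes after it
"""
result = conn.execute(query).fetchall()
print(result)
```

With the sample rows this keeps only the 2016-12-21 17:00:10.207 record. An id whose most recent status is 1, or that never had status 1, produces no row, which matches the behaviour of the temp table approach described in the question.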

Another approach without window functions would be:
SELECT
sc.id,
sc.enabled,
dt_create = MIN(sc.dt_create)
FROM
YourTable AS sc
JOIN (
SELECT
id,
max_dt_create = MAX(dt_create)
FROM
YourTable
WHERE
enabled = 1
GROUP BY
id
) as MaxStatusChanges
ON sc.id = MaxStatusChanges.id AND
sc.dt_create > MaxStatusChanges.max_dt_create
GROUP BY
sc.id,
sc.enabled
The query returns no rows for an id if there are no rows with status 1 for that id, or if the most recent status for the id is 1. A nonclustered index on the enabled column that includes the id and dt_create columns could improve query performance.

Related

Get certain rows, plus rows before and after

Let's say I have the following data set:
ID   Identifier  Admission_Date  Release_Date
234  2           5/1/22          5/5/22
234  1           4/25/22         4/30/22
234  2           4/20/22         4/24/22
234  2           4/15/22         4/18/22
789  1           7/15/22         7/19/22
789  2           7/8/22          7/14/22
789  2           7/1/22          7/5/22
321  2           6/1/21          6/3/21
321  2           5/27/21         5/31/21
321  1           5/20/21         5/26/21
321  2           5/15/21         5/19/21
321  2           5/6/21          5/10/21
I want all rows with identifier=1. I also want rows that are either directly below or above rows with Identifier=1 - sorted by most recent to least recent.
There is always a row below rows with identifier=1. There may or may not be a row above. If there is no row with identifier=1 for an ID, then it will not be brought in with a prior step.
The resulting data set should be as follows:
ID   Identifier  Admission_Date  Release_Date
234  2           5/1/22          5/5/22
234  1           4/25/22         4/30/22
234  2           4/20/22         4/24/22
789  1           7/15/22         7/19/22
789  2           7/8/22          7/14/22
321  2           5/27/21         5/31/21
321  1           5/20/21         5/26/21
321  2           5/15/21         5/19/21
I am using DBeaver connected to a PostgreSQL database.
I admittedly don't know Postgres well, so the following could possibly be optimised. Using a combination of lag and lead to obtain the previous and next dates (assuming Admission_Date is the column to order by), you could try:
with d as (
select *,
case when identifier = 1 then Lag(admission_date) over(partition by id order by Admission_Date desc) end pd,
case when identifier = 1 then Lead(admission_date) over(partition by id order by Admission_Date desc) end nd
from t
)
select id, Identifier, Admission_Date, Release_Date
from d
where identifier = 1
or exists (
select * from d d2
where d2.id = d.id
and (d.Admission_Date = d2.pd or d.Admission_Date = d2.nd)
)
order by Id, Admission_Date desc;
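A simpler variant of the same lag/lead idea is to look at the neighbouring row's identifier instead of its date, which avoids the EXISTS subquery. The sketch below is not the answer's query; it runs in SQLite through Python's built-in sqlite3 module, with the dates rewritten as ISO strings so they sort correctly, and the prev_identifier and next_identifier column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, identifier INTEGER, "
    "admission_date TEXT, release_date TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (234, 2, "2022-05-01", "2022-05-05"),
        (234, 1, "2022-04-25", "2022-04-30"),
        (234, 2, "2022-04-20", "2022-04-24"),
        (234, 2, "2022-04-15", "2022-04-18"),
        (789, 1, "2022-07-15", "2022-07-19"),
        (789, 2, "2022-07-08", "2022-07-14"),
        (789, 2, "2022-07-01", "2022-07-05"),
        (321, 2, "2021-06-01", "2021-06-03"),
        (321, 2, "2021-05-27", "2021-05-31"),
        (321, 1, "2021-05-20", "2021-05-26"),
        (321, 2, "2021-05-15", "2021-05-19"),
        (321, 2, "2021-05-06", "2021-05-10"),
    ],
)

query = """
WITH d AS (
    SELECT *,
           -- identifier of the row just before / just after, per id
           LAG(identifier)  OVER (PARTITION BY id ORDER BY admission_date) AS prev_identifier,
           LEAD(identifier) OVER (PARTITION BY id ORDER BY admission_date) AS next_identifier
    FROM t
)
SELECT id, identifier, admission_date, release_date
FROM d
WHERE identifier = 1          -- the row itself
   OR prev_identifier = 1     -- the row directly after an identifier = 1 row
   OR next_identifier = 1     -- the row directly before an identifier = 1 row
ORDER BY id, admission_date DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this returns the eight rows of the expected result, ordered by id rather than in the order shown in the question.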
One way:
SELECT (x.my_row).* -- decompose fields from row type
FROM (
SELECT identifier
, lag(t) OVER w AS t0 -- take whole row
, t AS t1
, lead(t) OVER w AS t2
FROM tbl t
WINDOW w AS (PARTITION BY id ORDER BY admission_date)
) sub
CROSS JOIN LATERAL (
VALUES (t0), (t1), (t2) -- pivot
) x(my_row)
WHERE sub.identifier = 1
AND (x.my_row).id IS NOT NULL; -- exclude rows with NULL ( = missing row)
The query is designed to only make a single pass over the table.
Uses some advanced SQL / Postgres features.
About LATERAL:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
About the VALUES expression:
Postgres: convert single row to multiple rows (unpivot)
The manual about extracting fields from a composite type.
If there are many rows per id, other solutions will be (much) faster - with proper index support. You did not specify ...

Select rows from a particular row to latest row if that particular row type exist

I want to satisfy these two requirements with a single query. Currently I use two queries in the program and do the processing in C#, something like this.
Pseudocode
select top 1 id from table where type = 'b' order by id desc
if (result.rows.count > 0) {
    var typeBid = row["id"]
    select * from table where id >= {typeBid}
} else {
    select * from table
}
Req1: If records with type=b exist, the result should be the latest row with type=b and all rows added after it.
Table
--------------------
id type date
--------------------
1 b 2021-10-15
2 a 2021-11-16
3 b 2021-11-19
4 a 2021-12-02
5 c 2021-12-12
6 a 2021-12-16
Result
--------------------
id type date
--------------------
3 b 2021-11-19
4 a 2021-12-02
5 c 2021-12-12
6 a 2021-12-16
Req2: If NO record with type=b exists, the query should select all the records in the table.
Table
---------------------
id type date
---------------------
1 a 2021-10-15
2 a 2021-11-16
3 a 2021-11-19
4 a 2021-12-02
5 c 2021-12-12
6 a 2021-12-16
Result
--------------------
id type date
--------------------
1 a 2021-10-15
2 a 2021-11-16
3 a 2021-11-19
4 a 2021-12-02
5 c 2021-12-12
6 a 2021-12-16
with max_b_date as (select max(date) as date
from table1 where type = 'b')
select t1.*
from table1 t1
cross join max_b_date
where t1.date >= max_b_date.date
or max_b_date.date is null
(table is a SQL reserved word, https://en.wikipedia.org/wiki/SQL_reserved_words, so I used table1 as table name instead.)
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=bd05543a9712e27f01528708f10b209f
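To check the CTE query against both requirements, here is a small sketch that runs it through Python's built-in sqlite3 module on the two sample tables from the question. SQLite stands in for SQL Server here, and the run helper is a hypothetical name used only for this illustration.

```python
import sqlite3

QUERY = """
WITH max_b_date AS (
    SELECT MAX(date) AS date FROM table1 WHERE type = 'b'
)
SELECT t1.id, t1.type, t1.date
FROM table1 AS t1
CROSS JOIN max_b_date
WHERE t1.date >= max_b_date.date   -- Req1: latest 'b' row and everything after it
   OR max_b_date.date IS NULL      -- Req2: no 'b' row at all, so keep every row
ORDER BY t1.id
"""

def run(rows):
    """Load rows into a fresh in-memory table and return the selected ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, type TEXT, date TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    return [row[0] for row in conn.execute(QUERY)]

dates = ["2021-10-15", "2021-11-16", "2021-11-19",
         "2021-12-02", "2021-12-12", "2021-12-16"]

# Req1 sample: rows 1 and 3 have type b
with_b = run(list(zip(range(1, 7), "babaca", dates)))
# Req2 sample: no row has type b
without_b = run(list(zip(range(1, 7), "aaaaca", dates)))

print(with_b)     # ids 3 to 6
print(without_b)  # ids 1 to 6
```

The query compares dates rather than ids, so it relies on the date column increasing with id, as it does in the sample data.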
Please try this (it's somewhat involved, but it might be exactly what you are looking for):
select ab.* from
((select top 1 id, type, date from test where type = 'b' order by id desc)
union
select * from test where type != 'b') as ab
where ab.id >= (select COALESCE((select top 1 id from test where type = 'b' order by id desc), 0))
order by ab.id;
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=739eb6bfee787e5079e616bbf4e933b1
It looks like you can use an OR condition here:
SELECT
*
FROM
(
SELECT
*,
BCount = COUNT(CASE type WHEN 'b' THEN 1 END) OVER () -- count of rows with type b across the whole table
FROM table1
) Q
WHERE
(
BCount > 0 AND id >= (SELECT TOP 1 id FROM table1 WHERE type = 'b' ORDER BY id DESC) -- rows with type b exist: Req1
)
OR
(
BCount = 0 -- no rows with type b: select all rows (Req2)
)

Getting aggregate data in MySql

I am attempting to write a sql query to fetch aggregate data from a table. I have a table with data that looks as follows (example data):
trackingId  numberOfRecords  totalRecords  dateSubmitted  fileName     checkpoint   status
1           10               100           01/01/2021     example.doc  gateway      in-progress
1           20               100           02/01/2021     null         checkpoint1  in-progress
1           20               100           03/01/2021     null         checkpoint2  in-progress
The aggregate data I would like to query would look like:
trackingId  numberOfRecords  totalRecords  dateSubmitted  fileName     checkpoint   status
1           50               100           03/01/2021     example.doc  checkpoint2  in-progress
In summary, I would like to:
group on trackingId (done)
Sum of all records fetched (done)
get the latest date (done)
name of original document (not sure how to fetch a value from the first row only, I am trying to avoid subqueries due to inefficiencies)
latest checkpoint (get value from the newest record)
latest status (get value from the newest record)
My issue mainly is fetching specific data from either the newest or oldest record.
Thanks.
Consider the query below (BigQuery syntax; MySQL has no array_agg):
select trackingId,
sum(numberOfRecords) as numberOfRecords,
any_value(totalRecords) as totalRecords,
max(dateSubmitted) as dateSubmitted,
array_agg(fileName order by dateSubmitted limit 1)[offset(0)] as fileName,
array_agg(checkpoint order by dateSubmitted desc limit 1)[offset(0)] as checkpoint,
array_agg(status order by dateSubmitted desc limit 1)[offset(0)] as status,
from `project.dataset.table`
group by trackingId
Applied to the sample data in your question, this produces the aggregated row you described.
Consider this example:
CREATE TABLE test ( id INT, -- used for ordering
cat INT, -- used for grouping
col1 INT, -- used to get the SUM
col2 INT, -- used to get the value from the first row
col3 INT -- used to get the value from the last row
);
INSERT INTO test VALUES
(1,1,11,111,1111), (2,1,22,222,2222), (3,1,33,333,3333),
(4,2,4,4,4), (5,2,5,5,5);
SELECT * FROM test;
id  cat  col1  col2  col3
1   1    11    111   1111
2   1    22    222   2222
3   1    33    333   3333
4   2    4     4     4
5   2    5     5     5
SELECT cat,
SUM(col1) col1_sum,
SUBSTRING_INDEX(GROUP_CONCAT(col2 ORDER BY id), ',', 1) col2_first,
SUBSTRING_INDEX(GROUP_CONCAT(col3 ORDER BY id), ',', -1) col3_last
FROM test
GROUP BY cat;
cat  col1_sum  col2_first  col3_last
1    66        111         3333
2    9         4           5
The values processed by GROUP_CONCAT() must not contain commas.
PS. Do not forget about group_concat_max_len, especially when a single value in the column may be long.
PPS. The expression for the last value may also be written as SUBSTRING_INDEX(GROUP_CONCAT(col3 ORDER BY id DESC), ',', 1) col3_last.

Time consuming query to Skip First inserted record of Id list

In PostgreSQL I have a table with multiple rows per articleId. Whenever I query it, the result should skip the first inserted record of a particular userId for each articleId in a specified list.
select * from (
select * , row_number() over (partition by articleId order by date) rn
from table where articleId in (1200) and userId = 1
) t
where t.rn > 1
It returns the expected records, skipping the first inserted record of each articleId for the given userId.
But the query takes a long time to execute when the table is large.
table:
id  name  articleId  date                 userId
1   abc   1200       2021-05-01 06:09:35  1
2   bcd   1400       2021-05-02 06:08:35  1
3   xyz   1200       2021-05-03 09:09:35  2
4   pqr   1200       2021-05-04 08:09:35  1
5   xyz   1200       2021-05-05 09:09:35  3
Expected query Output:
id  name  articleId  date                 userId
4   pqr   1200       2021-05-04 08:09:35  1
Try adding the following index, which should cover the call to ROW_NUMBER as well as the WHERE clause:
CREATE INDEX idx ON yourTable (articleId, date, userId);
This should speed up your current query. As always, check the execution plan before and after using EXPLAIN.
I would suggest using a correlated subquery with the right indexing:
select *
from t
where t.articleid = 1200 and t.userId = 1 and
t.date > (select min(t2.date)
from t t2
where t2.articleId = t.articleId
);
Then for this query, you want two indexes: (articleid, userId) and (articleId, date).
Note: I'm a bit surprised that userId is not in the partition by clause.
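The correlated subquery above takes the earliest date across all users of an article, while the question's ROW_NUMBER query numbers only the rows of the given user. The two can differ when another user inserted the article's first record. The sketch below assumes the intent is to skip each user's own first record and therefore adds a userId condition to the subquery. It runs in SQLite through Python's built-in sqlite3 module, and the table name articles is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER, name TEXT, articleId INTEGER, "
    "date TEXT, userId INTEGER)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?, ?)",
    [
        (1, "abc", 1200, "2021-05-01 06:09:35", 1),
        (2, "bcd", 1400, "2021-05-02 06:08:35", 1),
        (3, "xyz", 1200, "2021-05-03 09:09:35", 2),
        (4, "pqr", 1200, "2021-05-04 08:09:35", 1),
        (5, "xyz", 1200, "2021-05-05 09:09:35", 3),
    ],
)

# The question's approach: number the user's rows per article, drop the first.
window_query = """
SELECT id FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY articleId ORDER BY date) AS rn
    FROM articles
    WHERE articleId IN (1200) AND userId = 1
) AS t
WHERE t.rn > 1
"""

# Correlated variant: keep rows later than the same user's earliest row
# for the same article (the userId condition is the assumed addition).
correlated_query = """
SELECT t.id
FROM articles AS t
WHERE t.articleId IN (1200) AND t.userId = 1
  AND t.date > (SELECT MIN(t2.date)
                FROM articles AS t2
                WHERE t2.articleId = t.articleId
                  AND t2.userId = t.userId)
"""

window_ids = [r[0] for r in conn.execute(window_query)]
correlated_ids = [r[0] for r in conn.execute(correlated_query)]
print(window_ids, correlated_ids)
```

Both queries return only row 4 on the sample data. For this variant a single index on (articleId, userId, date) would plausibly serve both the outer filter and the subquery, but that is a guess worth confirming with EXPLAIN on the real table.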

How to use a select command to find all the records that has the maximum date value for a specific item?

Say I have a table like this, we call it tbl_test
ID thedate actionid songid
1 2014-10-01 100 10
2 2014-09-30 100 10
3 2014-10-01 80 10
4 2014-09-30 80 10
5 2014-10-01 80 21
6 2014-09-30 100 21
Now I want to find all the records in tbl_test where actionid=100 with the latest [thedate] value for each songid. In this case, I want the final select result to be
(this is the result I want, not an existing table)
ID thedate actionid songid
1 2014-10-01 100 10
6 2014-09-30 100 21
Question: how can I do that using nothing but a single select command in MS SQL Server?
Use a join to a query that returns the latest date for each song:
select tbl_test.*
from tbl_test
join (select songid, max(theDate) maxDate
from tbl_test
where actionId = 100
group by songid) t on t.songId = tbl_test.songId and theDate = maxDate
where actionid = 100
This should perform pretty well as it makes only 2 passes over the table: one for the inner query that determines the latest date, and another to output the matching rows.
A general SQL way to get this is using not exists:
select t.*
from tbl_test t
where actionid = 100 and
not exists (select 1
from tbl_test t2
where t2.songid = t.songid and t2.actionid = 100 and t2.thedate > t.thedate
);
For performance, you want an index on songid, actionid, thedate.
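As a quick check that the join and the not exists formulations agree, the following sketch runs both against the sample table using Python's built-in sqlite3 module. SQLite stands in for SQL Server, dates are stored as ISO text, and the alias m and the column name maxdate are chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_test (ID INTEGER, thedate TEXT, actionid INTEGER, songid INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_test VALUES (?, ?, ?, ?)",
    [
        (1, "2014-10-01", 100, 10),
        (2, "2014-09-30", 100, 10),
        (3, "2014-10-01", 80, 10),
        (4, "2014-09-30", 80, 10),
        (5, "2014-10-01", 80, 21),
        (6, "2014-09-30", 100, 21),
    ],
)

# Join against the latest date per song among actionid = 100 rows.
join_query = """
SELECT t.ID
FROM tbl_test AS t
JOIN (SELECT songid, MAX(thedate) AS maxdate
      FROM tbl_test
      WHERE actionid = 100
      GROUP BY songid) AS m
  ON m.songid = t.songid AND t.thedate = m.maxdate
WHERE t.actionid = 100
ORDER BY t.ID
"""

# Keep a row only if no later actionid = 100 row exists for the same song.
not_exists_query = """
SELECT t.ID
FROM tbl_test AS t
WHERE t.actionid = 100
  AND NOT EXISTS (SELECT 1
                  FROM tbl_test AS t2
                  WHERE t2.songid = t.songid
                    AND t2.actionid = 100
                    AND t2.thedate > t.thedate)
ORDER BY t.ID
"""

join_ids = [r[0] for r in conn.execute(join_query)]
not_exists_ids = [r[0] for r in conn.execute(not_exists_query)]
print(join_ids, not_exists_ids)
```

Both queries select rows 1 and 6, the result asked for in the question. If two rows of the same song share the latest date, both formulations return both rows.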