Select last row where given value changed - sql

I want to select the last row (by mod_date) where column new_status differs from the previous entry for a given object_id.
At first I tried row_number but couldn't make it work; later I came up with the lead/lag functions, and I think I'm closer to a solution, but the results still aren't right.
Here's the code and the fiddle:
https://www.db-fiddle.com/f/kS8SAi2WsAjfFLomd7t2it/0
CREATE TABLE changes
(object_id integer,
new_status smallint,
comment text,
mod_date timestamp);
INSERT INTO changes
VALUES
(1001, 0, null, '2020-06-01 12:01'),
(1001, 1, 'XYZ', '2020-06-01 12:05'),
(1001, 1, 'YZX', '2020-06-01 12:11'),
(1002, 1, 'XYZ', '2020-06-01 13:21'),
(1002, 1, 'AAA', '2020-06-01 13:25'),
(1002, 0, 'BCA', '2020-06-01 14:11'),
(1003, 1, 'AXX', '2020-06-01 14:12'),
(1003, 0, 'YZX', '2020-06-01 14:13'),
(1003, 0, 'YYY', '2020-06-01 14:17');
SELECT object_id, min(mod_date), new_status FROM (
SELECT
object_id
, mod_date
, new_status
--, row_number() over (partition BY object_id ORDER BY mod_date desc) rn
, lag(new_status) OVER (partition by object_id ORDER BY mod_date desc) as next_status
FROM changes
ORDER BY 1)x
WHERE new_status = next_status
OR next_status is null
GROUP BY 1,3
The output for 1001 and 1003 is fine; for 1002 it should be the row with status 0.
Appreciate any help and suggestions!

I think you want:
select distinct on (object_id) c.*
from (select c.*,
lag(new_status) over (partition by object_id order by mod_date) as prev_ns
from changes c
) c
where prev_ns is distinct from new_status
order by object_id, mod_date desc;
Here is a db<>fiddle.
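DISTINCT ON and IS DISTINCT FROM are PostgreSQL features. As a rough sketch of the same idea for engines that lack them, the query below swaps in ROW_NUMBER() and SQLite's null-safe IS NOT operator. It is run through Python's sqlite3 module (assuming a bundled SQLite of 3.25 or newer for window functions) against the sample rows from the question; the CTE names flagged and ranked are made up for illustration.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE changes (object_id INTEGER, new_status INTEGER, comment TEXT, mod_date TEXT);
INSERT INTO changes VALUES
  (1001, 0, NULL,  '2020-06-01 12:01'),
  (1001, 1, 'XYZ', '2020-06-01 12:05'),
  (1001, 1, 'YZX', '2020-06-01 12:11'),
  (1002, 1, 'XYZ', '2020-06-01 13:21'),
  (1002, 1, 'AAA', '2020-06-01 13:25'),
  (1002, 0, 'BCA', '2020-06-01 14:11'),
  (1003, 1, 'AXX', '2020-06-01 14:12'),
  (1003, 0, 'YZX', '2020-06-01 14:13'),
  (1003, 0, 'YYY', '2020-06-01 14:17');
""")

last_changes = conn.execute("""
WITH flagged AS (
  -- previous status per object, in chronological order
  SELECT object_id, new_status, mod_date,
         LAG(new_status) OVER (PARTITION BY object_id ORDER BY mod_date) AS prev_ns
  FROM changes
), ranked AS (
  -- keep only rows where the status changed (IS NOT is null-safe in SQLite),
  -- then number them newest first
  SELECT object_id, new_status, mod_date,
         ROW_NUMBER() OVER (PARTITION BY object_id ORDER BY mod_date DESC) AS rn
  FROM flagged
  WHERE prev_ns IS NOT new_status
)
SELECT object_id, new_status, mod_date
FROM ranked
WHERE rn = 1
ORDER BY object_id
""").fetchall()

for row in last_changes:
    print(row)
```

For 1002 this returns the 14:11 row with status 0, which is what the question asked for.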

Related

I want to find the date intervals at which the employee comes on a regular basis

Imagine an employee who works for a company under a contract for a specific task; he comes in on the start date and leaves on the end date. I want to get the intervals during which the employee came to the office without any absence.
Example Data:
CREATE TABLE #TimeClock (PunchID INT IDENTITY, EmployeeID INT, PunchinDate DATE)
INSERT INTO #TimeClock (EmployeeID, PunchInDate) VALUES
(1, '2020-01-01'), (1, '2020-01-02'), (1, '2020-01-03'), (1, '2020-01-04'),
(1, '2020-01-05'), (1, '2020-01-06'), (1, '2020-01-07'), (1, '2020-01-08'),
(1, '2020-01-09'), (1, '2020-01-10'), (1, '2020-01-11'), (1, '2020-01-12'),
(1, '2020-01-13'), (1, '2020-01-14'), (1, '2020-01-16'),
(1, '2020-01-17'), (1, '2020-01-18'), (1, '2020-01-19'), (1, '2020-01-20'),
(1, '2020-01-21'), (1, '2020-01-22'), (1, '2020-01-23'), (1, '2020-01-24'),
(1, '2020-01-25'), (1, '2020-01-26'), (1, '2020-01-27'), (1, '2020-01-28'),
(1, '2020-01-29'), (1, '2020-01-30'), (1, '2020-01-31'),
(1, '2020-02-01'), (1, '2020-02-02'), (1, '2020-02-03'), (1, '2020-02-04'),
(1, '2020-02-05'), (1, '2020-02-06'), (1, '2020-02-07'), (1, '2020-02-08'),
(1, '2020-02-09'), (1, '2020-02-10'), (1, '2020-02-12'),
(1, '2020-02-13'), (1, '2020-02-14'), (1, '2020-02-15'), (1, '2020-02-16');
--the output shall look like this '2020-01-01 to 2020-02-10' as this is the interval at which the employee comes without any leave
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-01') as START_DATE, FORMAT( getdate(), '2020-01-10') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-11') as START_DATE, FORMAT( getdate(), '2020-01-15') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-01-21') as START_DATE, FORMAT( getdate(), '2020-01-31') as END_DATE union all
SELECT 1 AS ID, FORMAT( getdate(), '2020-02-01') as START_DATE, FORMAT( getdate(), '2020-02-10') as END_DATE
--the output shall look like this '2020-01-01 to 2020-01-15' and '2020 01-21 to 2020-02-10'as these are the intervals at which the employee comes without any leave
Using the example data provided we can query the table like this:
;WITH iterate AS (
SELECT *, DATEADD(DAY,1,PunchinDate) AS NextDate
FROM #TimeClock
), base AS (
SELECT *
FROM (
SELECT *, CASE WHEN DATEADD(DAY,-1,PunchInDate) = LAG(PunchinDate,1) OVER (PARTITION BY EmployeeID ORDER BY PunchinDate) THEN PunchInDate END AS s
FROM iterate
) a
WHERE s IS NULL
), rCTE AS (
SELECT EmployeeID, PunchInDate AS StartDate, PunchInDate AS EndDate, NextDate
FROM base
UNION ALL
SELECT a.EmployeeID, a.StartDate, r.PunchInDate, r.NextDate
FROM rCTE a
INNER JOIN iterate r
ON a.NextDate = r.PunchinDate
AND a.EmployeeID = r.EmployeeID
)
SELECT EmployeeID, StartDate, MAX(EndDate) AS EndDate, DATEDIFF(DAY,StartDate,MAX(EndDate)) AS Streak
FROM rCTE
GROUP BY rCTE.EmployeeID, rCTE.StartDate
This uses a recursive common table expression, which lets us walk from each row to the next related row. In this case we're looking for rows that form a streak of consecutive days, and we want to restart that streak any time we encounter a break. We're using a windowed function called LAG to look back one row to the previous value and compare it to the one we have now. If the previous date isn't yesterday, we start a new streak.
EmployeeID StartDate EndDate Streak
------------------------------------------
1 2020-01-01 2020-01-15 14
1 2020-01-17 2020-02-10 24
1 2020-02-12 2020-02-16 4
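For comparison, streaks like these can also be found without recursion, using the common "date minus row number" grouping trick: within a run of consecutive days, the day number minus the row number stays constant. The sketch below is not the query above but a swapped-in alternative, run in SQLite through Python's sqlite3 module (assuming SQLite 3.25+). It uses a shortened, hypothetical set of punch dates, a table named TimeClock without the # prefix, and assumes at most one punch per employee per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TimeClock (EmployeeID INTEGER, PunchinDate TEXT);
-- hypothetical, shortened data: 2020-01-04 is missing, so there are two streaks
INSERT INTO TimeClock VALUES
  (1, '2020-01-01'), (1, '2020-01-02'), (1, '2020-01-03'),
  (1, '2020-01-05'), (1, '2020-01-06');
""")

streaks = conn.execute("""
WITH numbered AS (
  -- day number minus row number is constant inside a run of consecutive days
  SELECT EmployeeID, PunchinDate,
         CAST(julianday(PunchinDate) AS INTEGER)
           - ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY PunchinDate) AS grp
  FROM TimeClock
)
SELECT EmployeeID,
       MIN(PunchinDate) AS StartDate,
       MAX(PunchinDate) AS EndDate,
       COUNT(*) - 1     AS Streak   -- days between start and end, like DATEDIFF above
FROM numbered
GROUP BY EmployeeID, grp
ORDER BY EmployeeID, StartDate
""").fetchall()

for row in streaks:
    print(row)
```

One reason to consider this form: it needs no recursion, so it should not run into SQL Server's default recursion limit of 100 levels on streaks longer than 100 days.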

can not make a select with the desired values in SQL Server

Creating the table
CREATE TABLE dbo.factura
(
customer_code varchar(20),
invoice_number char(4),
line_number char(2),
data date
);
DROP TABLE dbo.factura;
SELECT * FROM dbo.factura;
Populating the table:
INSERT INTO dbo.factura VALUES ('ABC', '0012', '01', '2020-10-01');
INSERT INTO dbo.factura VALUES ('ABC', '0012', '02', '2020-11-01');
INSERT INTO dbo.factura VALUES ('ABC', '0012', '03', '2020-11-01');
INSERT INTO dbo.factura VALUES ('ABC', '0013', '08', '2021-01-21');
INSERT INTO dbo.factura VALUES ('ABC', '0013', '09', '2020-09-01');
INSERT INTO dbo.factura VALUES ('SLIK', '0001', '01', '2021-01-01');
INSERT INTO dbo.factura VALUES ('SLIK', '0001', '02', '2020-02-01');
Write a SQL statement to return the recordset:
CUSTOMER_CODE, INVOICE_NR, LINE_NR
where LINE_NR is the line number with the most recent value in the DATA column for each invoice of each customer in the table.
I tried to do with a self join, but it doesn't work.
Do you have any ideas how to write this query?
If you want the rows sorted by line_number in descending order, then the query will be:
with cte as (
select CUSTOMER_CODE, INVOICE_number, LINE_number , row_number()over(partition by customer_code,invoice_number order by LINE_number desc) rn
from factura
)
select CUSTOMER_CODE, INVOICE_number, LINE_number from cte where rn=1
If you want the result in descending order of the data field, then the query will be:
with cte as (
select CUSTOMER_CODE, INVOICE_number, LINE_number , row_number()over(partition by customer_code,invoice_number order by data desc) rn
from factura
)
select CUSTOMER_CODE, INVOICE_number, LINE_number from cte where rn=1
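One thing to watch with the sample data: invoice 0012 has two lines (02 and 03) sharing the most recent date, so ROW_NUMBER() ordered by data alone may pick either one. A minimal sketch of a deterministic variant follows, run in SQLite via Python's sqlite3 module (assuming SQLite 3.25+). Breaking ties by the higher line_number is an assumption for illustration, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE factura (customer_code TEXT, invoice_number TEXT, line_number TEXT, data TEXT);
INSERT INTO factura VALUES
  ('ABC',  '0012', '01', '2020-10-01'),
  ('ABC',  '0012', '02', '2020-11-01'),
  ('ABC',  '0012', '03', '2020-11-01'),
  ('ABC',  '0013', '08', '2021-01-21'),
  ('ABC',  '0013', '09', '2020-09-01'),
  ('SLIK', '0001', '01', '2021-01-01'),
  ('SLIK', '0001', '02', '2020-02-01');
""")

latest_lines = conn.execute("""
WITH cte AS (
  SELECT customer_code, invoice_number, line_number,
         ROW_NUMBER() OVER (
           PARTITION BY customer_code, invoice_number
           -- assumed tie-breaker: on equal dates, prefer the higher line number
           ORDER BY data DESC, line_number DESC
         ) AS rn
  FROM factura
)
SELECT customer_code, invoice_number, line_number
FROM cte
WHERE rn = 1
ORDER BY customer_code, invoice_number
""").fetchall()

for row in latest_lines:
    print(row)
```

If every tied line should be returned instead of just one, RANK() in place of ROW_NUMBER() (without the tie-breaker) would keep both.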

Split Table into Windows with Recurring Attributes

My title is awful, because I am not sure how to describe the challenge. I would love an edit if someone can think of a more descriptive title. Hopefully my input/desired output will help explain. Here is some sample input data:
create table #input (
num varchar(10),
code varchar(10),
event_date date
)
insert into #input (num, code, event_date)
values('123456', 'Active', '2007-09-10'),
('123456', 'Active', '2010-09-15'),
('123456', 'Active', '2010-09-24'),
('123456', 'Inactive', '2018-09-17'),
('123456', 'Inactive', '2019-01-01'),
('123456', 'Active', '2019-02-08')
select *
from #input
order by event_date
I want to tag each record for each group of num + code with the same number. However, I want the time periods to stay separate. Here is the desired result:
create table #result (
num varchar(10),
code varchar(10),
event_date date,
tag int
)
insert into #result (num, code, event_date, tag)
values('123456', 'Active', '2007-09-10', 1),
('123456', 'Active', '2010-09-15', 1),
('123456', 'Active', '2010-09-24', 1),
('123456', 'Inactive', '2018-09-17', 2),
('123456', 'Inactive', '2019-01-01', 2),
('123456', 'Active', '2019-02-08', 3)
select *
from #result
order by event_date
Obviously normal window partitions like this...
select *, row_number() over(partition by num, code order by event_date) rn
from #input
order by event_date
...don't work, because there is no field on which to partition that would split the two "Active" groups (two groups, because they happen during two time frames). How would I reach my desired result? I have a hunch that a series of lag() and lead() functions might work, but I couldn't get anywhere meaningful.
Alternatively, how would I achieve the results so the categories overlap by one?
create table #result_new (
num varchar(10),
code varchar(10),
event_date date,
tag int
)
insert into #result_new (num, code, event_date, tag)
values('123456', 'Active', '2007-09-10', 1),
('123456', 'Active', '2010-09-15', 1),
('123456', 'Active', '2010-09-24', 1),
('123456', 'Inactive', '2018-09-17', 1),
('123456', 'Inactive', '2019-01-01', 2),
('123456', 'Active', '2019-02-08', 2)
select *
from #result_new
order by event_date
LAG gets you halfway there, but not the whole way. You can use LAG to check the value of the previous row and create (what I have called) a switch. You can then use a SUM window function with a ROWS BETWEEN clause to get the value for tag:
WITH CTE AS(
SELECT num,
code,
event_date,
CASE WHEN code = LAG(code) OVER (PARTITION BY num ORDER BY event_date) THEN 0 ELSE 1 END AS Switch
FROM #input)
SELECT num,
code,
event_date,
SUM(Switch) OVER (PARTITION BY num ORDER BY event_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tag
FROM CTE;
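To check the switch-and-running-sum idea against the sample rows, here is a small sketch that runs essentially the same query in SQLite through Python's sqlite3 module (assuming SQLite 3.25+). The table name input_events stands in for #input, since SQLite has no # temp-table syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE input_events (num TEXT, code TEXT, event_date TEXT);
INSERT INTO input_events VALUES
  ('123456', 'Active',   '2007-09-10'),
  ('123456', 'Active',   '2010-09-15'),
  ('123456', 'Active',   '2010-09-24'),
  ('123456', 'Inactive', '2018-09-17'),
  ('123456', 'Inactive', '2019-01-01'),
  ('123456', 'Active',   '2019-02-08');
""")

rows = conn.execute("""
WITH cte AS (
  -- Switch is 1 on the first row and whenever code differs from the previous row
  SELECT num, code, event_date,
         CASE WHEN code = LAG(code) OVER (PARTITION BY num ORDER BY event_date)
              THEN 0 ELSE 1 END AS Switch
  FROM input_events
)
SELECT num, code, event_date,
       -- running total of switches = group number
       SUM(Switch) OVER (PARTITION BY num ORDER BY event_date
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tag
FROM cte
ORDER BY event_date
""").fetchall()

tags = [row[3] for row in rows]
print(tags)
```

The first row gets Switch = 1 because LAG returns NULL there, so the comparison is not true and the ELSE branch applies.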

Get first record change timestamp when grouped by two columns SQL Server

I have a table that tracks version history. I need to get the record that shows the timestamp of the first change to the most recent version.
EDIT: Added more records to illustrate what I am looking for.
For example
id version timestamp
(123, 1.5, '2015-03-28 08:21:04'),
(123, 1.5, '2015-03-28 07:21:04'),
(123, 1.5, '2015-03-27 07:21:04'), <-- Latest version,first change for id 123
(123, 1.2, '2015-03-22 12:58:24'),
(123, 1.2, '2015-03-21 13:32:05'),
(123, 1.0, '2015-03-21 09:18:37'),
(123, 1.0, '2015-03-20 04:44:59'),
(234, 1.5, '2016-10-15 23:08:09'), <-- Latest version,first change for id 234
(345, 1.5, '2016-10-10 15:18:09'),
(345, 1.5, '2016-09-02 21:30:00'),
(345, 1.5, '2016-09-01 21:30:00'),
(345, 1.5, '2016-08-02 21:30:00'), <-- Latest version,first change for id 345
(345, 1.0, '2016-07-02 21:30:00')
Expected output
id version timestamp
(123, 1.5, '2015-03-27 07:21:04')
(234, 1.5, '2016-10-15 23:08:09')
(345, 1.5, '2016-08-02 21:30:00')
I was able to get this by using temp tables. I get the min(dt_create) for each id and version and store it in a temp table. Then I get the max date from this table for each id and join it back to get the version. Is there a better way to do this?
create table #temp_version
(
id varchar(22) NOT NULL,
version varchar(50) NOT NULL,
dt_create datetime NOT null
)
insert into #temp_version
select id,version,min(dt_create) as dt_create
from [version_history] (nolock)
group by id,version
create table #temp_min_date
(
id varchar(22) NOT NULL,
dt_create datetime NOT null
)
insert into #temp_min_date
select id,max(dt_create)
from #temp_version
group by id
select a.id,a.version,a.dt_create
from #temp_version a
join #temp_min_date b on a.id = b.id and
a.dt_create=b.dt_create
drop table #temp_min_date
drop table #temp_version
Here's a query without any joins:
select
id, [version], dt_create
from (
select
id,
[version],
rank() over (partition by id order by [version] desc, dt_create desc) as rnk,
min(dt_create) over (partition by id, [version]) as dt_create
from #history
) res
where
rnk = 1
And here is the whole query with your test data:
create table #history (id varchar(22), [version] varchar(50), dt_create datetime);
insert into #history(id, [version], dt_create) values
('123', '1.5', '2015-03-28 08:21:04'),
('123', '1.5', '2015-03-28 07:21:04'),
('123', '1.5', '2015-03-27 07:21:04'),
('123', '1.2', '2015-03-22 12:58:24'),
('123', '1.2', '2015-03-21 13:32:05'),
('123', '1.0', '2015-03-21 09:18:37'),
('123', '1.0', '2015-03-20 04:44:59'),
('234', '1.5', '2016-10-15 23:08:09'),
('345', '1.5', '2016-10-10 15:18:09'),
('345', '1.5', '2016-09-02 21:30:00'),
('345', '1.5', '2016-09-01 21:30:00'),
('345', '1.5', '2016-08-02 21:30:00'),
('345', '1.0', '2016-07-02 21:30:00')
select
id, [version], dt_create
from (
select
id,
[version],
rank() over (partition by id order by [version] desc, dt_create desc) as rnk,
min(dt_create) over (partition by id, [version]) as dt_create
from #history
) res
where
rnk = 1
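As a quick sanity check of the rank-plus-windowed-min approach, the sketch below runs an equivalent query in SQLite via Python's sqlite3 module (assuming SQLite 3.25+) on a trimmed subset of the test rows. The table name history and the alias first_seen are illustrative choices, not names from the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (id TEXT, version TEXT, dt_create TEXT);
-- trimmed subset of the test data
INSERT INTO history VALUES
  ('123', '1.5', '2015-03-28 07:21:04'),
  ('123', '1.5', '2015-03-27 07:21:04'),
  ('123', '1.2', '2015-03-22 12:58:24'),
  ('345', '1.5', '2016-09-01 21:30:00'),
  ('345', '1.5', '2016-08-02 21:30:00'),
  ('345', '1.0', '2016-07-02 21:30:00');
""")

first_seen_latest = conn.execute("""
SELECT id, version, first_seen
FROM (
  SELECT id, version,
         -- rank 1 = newest row of the highest version for this id
         RANK() OVER (PARTITION BY id ORDER BY version DESC, dt_create DESC) AS rnk,
         -- earliest timestamp at which this id had this version
         MIN(dt_create) OVER (PARTITION BY id, version) AS first_seen
  FROM history
) res
WHERE rnk = 1
ORDER BY id
""").fetchall()

for row in first_seen_latest:
    print(row)
```

Note that version is stored as text here, as in the original, so ORDER BY version compares strings: '1.10' would sort below '1.5'. That is harmless for this data but worth keeping in mind if version numbers can reach two digits.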
Solution without CTE or subquery:
SELECT TOP(1) WITH TIES
id,
version,
dt_create = timestamp
FROM [version_history]
ORDER BY ROW_NUMBER() OVER (PARTITION BY id ORDER BY version desc, timestamp)
You can use a window function with a sub-query or CTE. Note that in your expected output, for ID 123 you didn't select the first instance... just the one with the minimum TIME, without consideration to the date. If that's what you really want, then you just need to use the time portion in the order by. i.e. (partition by id order by cast(timestamp as time))
select t.id, t.version, t.timestamp
from
YourTable t
inner join
(select
id,
version,
timestamp,
row_number() over (partition by id order by timestamp) as rn
from YourTable) x on x.id = t.id and x.timestamp = t.timestamp and x.version = t.version
where x.rn = 1
Try
create table #temp_version
(
id varchar(22) NOT NULL,
version varchar(50) NOT NULL,
dt_create datetime NOT null
)
insert into #temp_version values
(123, 1.5, '2015-03-28 08:21:04'),(123, 1.5, '2015-03-28 07:21:04'),(123, 1.5, '2015-03-27 07:21:04')
,(123, 1.0, '2015-03-21 12:58:24'),(123, 1.0, '2015-03-20 12:58:24'),(123, 1.2, '2015-03-22 12:58:24')
,(123, 1.2, '2015-03-21 12:58:24'),(234, 1.5, '2016-10-15 23:08:09'),(345, 1.5, '2016-10-10 15:18:09')
,(345, 1.5, '2016-09-02 21:30:00'),(345, 1.5, '2016-09-01 21:30:00'),(345, 1.5, '2016-08-02 21:30:00');
select top(1) with ties id, version, dt_create
from (
select *, lag(version) over(partition by id order by dt_create) prev
from #temp_version
) t
where version != prev or prev is null -- first change
-- recent version
order by row_number() over(partition by id order by dt_create desc);
-- or may be .. order by version .. depending on what is "recent"

How to retrieve WTD,YTD,MTD users from a user traffic table in the same query?

In a user traffic table like the one below, I would like to compute the week-to-date (WTD), month-to-date (MTD), and year-to-date (YTD) new and returned user counts.
Test data :
create table user_traffic (session_id number(6), session_day date,
user_id number(6), product_id number(6));
insert into user_traffic values ( 1, date '2016-09-07', 101, 1);
insert into user_traffic values ( 2, date '2016-09-07', 101, 4);
insert into user_traffic values ( 3, date '2016-09-07', 102, 1);
insert into user_traffic values ( 4, date '2016-09-08', 101, 2);
insert into user_traffic values ( 5, date '2016-09-08', 101, 4);
insert into user_traffic values ( 6, date '2016-09-09', 102, 1);
insert into user_traffic values ( 7, date '2016-09-10', 102, 1);
insert into user_traffic values ( 8, date '2016-09-10', 103, 3);
insert into user_traffic values ( 9, date '2016-09-25', 104, 3);
insert into user_traffic values ( 10, date '2016-10-01', 103, 1);
insert into user_traffic values ( 11, date '2016-10-02', 104, 3);
Expected Output :-
Week_Start_Day, WTD_new_cnt, WTD_returned_cnt
Month_Start_Day, MTD_new_cnt, MTD_returned_cnt
Year_Start_Day, YTD_new_cnt, YTD_returned_cnt
Comments :-
For example: in the user traffic table above, userid=104 visited on Oct 2nd, and the WTD, MTD, YTD new/returned counts would be as below.
WTD,new,return
2016-09-26(Mon)(Week start day ), 1,0 ( For userid = 104 )
MTD,new,return
2016-09,1,1
2016-10,0,1
YTD,new,return
2016,0,1
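To make the period boundaries in this example concrete, here is a minimal sketch of how the week, month, and year start days could be derived for a given session day. It assumes weeks start on Monday, as the "2016-09-26(Mon)" line suggests; the function name period_starts is hypothetical. It covers only the bucketing, not the new/returned classification.

```python
from datetime import date, timedelta


def period_starts(day: date) -> tuple:
    """Return (week_start, month_start, year_start) for a session day.

    Assumes weeks begin on Monday (date.weekday() returns 0 for Monday).
    """
    week_start = day - timedelta(days=day.weekday())
    month_start = day.replace(day=1)
    year_start = day.replace(month=1, day=1)
    return week_start, month_start, year_start


# The visit by userid=104 on 2016-10-02 from the example above.
starts = period_starts(date(2016, 10, 2))
print(starts)
```

In Oracle, which the test data appears to target, TRUNC(session_day, 'IW'), TRUNC(session_day, 'MM') and TRUNC(session_day, 'YYYY') give the corresponding start days.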
What I have tried?
select session_day,
COUNT( distinct user_id ) AS user_cnt,
count(distinct user_id) - lag(count(distinct user_id))
over (order by session_day) gain,
count(newu) AS newu, count(returnu) AS returnu
from
(
select session_id,
session_day,
user_id,
CASE WHEN
count(*) over ( partition by user_id ORDER BY
session_day,session_id ROWS
BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW
)
= 1
THEN 1
END
AS newu,
CASE WHEN
lag( session_day,1 ) over ( partition by user_id ORDER
BY session_day,session_id
)
<>
lag( session_day,1 ) over ( order by
session_day,session_id
)
THEN 1
END AS returnu
from user_traffic u
)
group by session_day
order by session_day;
I built this SQL to compute the new/returned users from the user traffic table at the session_day level.