Flag "yes/No" if the subsequent row has same ID - sql

I have data like this. If the same ID is present in the next row, I want to flag the row as 'Yes'; if it is not, I want to flag it as 'No'. Can you kindly help me with the query?
Thanks

The problem with multiple rows for the same ID, and no other column that can be used to further narrow the sort sequence, is that you need an order you can rely on. The typical solution for comparing with the next row's ID is LEAD, so you'll have two ORDER BY clauses in your query, one for LEAD and one for the query result, and you need to force them to obey the same sort order. ORDER BY id alone is not sufficient.
The best and easiest approach is probably to number the rows first, and then work on this data set.
with numbered as
(
select
id,
row_number() over (order by id) as rn
from mytable
)
select
id,
case when id = lead(id) over (order by rn) then 'yes' else 'no' end as flag
from numbered
order by rn;
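The number-then-LEAD idea can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) instead of the asker's database, and the sample IDs are hypothetical:

```python
import sqlite3

# Hypothetical sample data; the real table and its contents are not shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER)")
conn.executemany("INSERT INTO mytable (id) VALUES (?)",
                 [(1,), (1,), (2,), (3,), (3,), (3,)])

query = """
WITH numbered AS (
    -- Fix one row order first, so LEAD and the final ORDER BY agree.
    SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM mytable
)
SELECT id,
       CASE WHEN id = LEAD(id) OVER (ORDER BY rn) THEN 'yes' ELSE 'no' END AS flag
FROM numbered
ORDER BY rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The last row of each ID group gets 'no', because the next row either holds a different ID or does not exist (LEAD returns NULL, so the comparison is not true).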

You can use LEAD, which gets the value of the next row.
SELECT
CASE
WHEN ID = LEAD(ID) OVER (ORDER BY ID) THEN 'yes'
ELSE 'no'
END
FROM [MyTableName]
ORDER BY ID
You can read more about LEAD here.

select ID, Lag(ID) OVER(Order by ID desc) as [NextVal],
case when
Lag(ID) OVER(Order by ID desc) = ID THEN 'yes'
ELSE 'no'
END as 'FLAG'
from tableName
order by 1 , 2

Related

Row_number skip values

I have a table like this:
The idea was to count only the rows that have "Include" in column include_appt. When it finds NULL, it should skip the row (setting it to NULL or 0), and on the next "Include" it should go back to counting where it stopped.
In the screenshot above I was almost able to do it, but unfortunately the count didn't reset on the next value.
PS: I can't use over partition because I have to keep the order by id ASC
I suggest using the DENSE_RANK() with the columns you have hidden (--*,):
SELECT
row_num AS id,
include_appt,
CASE WHEN include_appt is not null
THEN ROW_NUMBER() OVER(ORDER BY (SELECT 0))
+ 1
- DENSE_RANK() OVER(
PARTITION BY /*some hidden columns*/
ORDER BY/*some hidden columns*/)
ELSE NULL
END AS row_num2
FROM C
ORDER BY row_num
Then the result will be:
If you are trying to prevent row numbers being added for NULL/0 values, why not try a query like this instead?
SELECT
row_num AS id,
include_appt,
ROW_NUMBER() OVER
(
ORDER BY (SELECT 0)
) AS row_num2
FROM C
WHERE ISNULL(C.include_appt, 0) <> 0
ORDER BY row_num
I would recommend reconsidering the column names/aliases you want to have displayed in your final result to avoid confusion, but the above should effectively do what you are wanting.
You need a PARTITION BY clause
SELECT
row_num AS id,
include_appt,
CASE WHEN include_appt IS NULL
THEN 0
ELSE
ROW_NUMBER() OVER (PARTITION BY include_appt ORDER BY (SELECT 0))
END AS row_num2
FROM C
ORDER BY row_num
SELECT id, include_appt,
CASE WHEN include_appt IS NULL THEN 0
ELSE ROW_NUMBER() OVER (PARTITION BY include_appt ORDER BY id ASC)
END AS row_num
FROM #1 ORDER BY id asc
This can be easily done with a partition by include_appt as in another answer below, yet after playing around with the query plans I've decided that it is still worthwhile to consider this slightly different approach which might offer a performance boost. I believe the benefit is gained by being able to use the clustered index without involving a sort on the flag column:
select id, flag,
case when flag is not null
then row_number() over (order by id)
- count(case when flag is null then 1 end) over (order by id)
else 0 end /* count up the skips */ as new_rn
from T
order by id
Examples (including a "reset" behavior): https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=c9f4c187c494d2a402e43a3b24924581
Performance comparison:
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=719f7bd26135ab498d11c786f1b1b28b
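The "row number minus running count of skips" approach can be checked with a small sketch. It assumes Python's sqlite3 module (SQLite 3.25+) rather than SQL Server, and the table T and its six rows are hypothetical:

```python
import sqlite3

# Hypothetical data: rows 3 and 5 are the ones to skip.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, flag TEXT)")
conn.executemany(
    "INSERT INTO T (id, flag) VALUES (?, ?)",
    [(1, "Include"), (2, "Include"), (3, None),
     (4, "Include"), (5, None), (6, "Include")],
)

query = """
SELECT id, flag,
       CASE WHEN flag IS NOT NULL
            -- position so far, minus the number of NULL rows seen so far
            THEN ROW_NUMBER() OVER (ORDER BY id)
                 - COUNT(CASE WHEN flag IS NULL THEN 1 END) OVER (ORDER BY id)
            ELSE 0
       END AS new_rn
FROM T
ORDER BY id
"""
rows = conn.execute(query).fetchall()
new_rn = [r[2] for r in rows]
print(new_rn)
```

Both window functions use the same ORDER BY id, so no partitioning on the flag column is involved.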

SQL create flag based on earliest/latest date

I have a data set with the following attributes:
- IDs are not unique and have multiple rows
- Each ID has a different date called 'Start Date'
I am trying to add a flag (Y/N) to determine which ID row to use, based on the earliest date.
This is what I have so far:
SELECT *,
min(Start_Date) OVER (PARTITION BY ID) AS FirstEntryFlag,
From `table`
Could someone please give me guidance on how I would achieve this? Thank you
Is this what you want?
select (case when start_date = min(Start_Date) OVER (PARTITION BY ID)
then 1 else 0
end) as FirstEntryFlag
from t;
If the start date has duplicates for an id and you want only one row flagged, use row_number():
select (case when 1 = row_number() over (partition by id order by Start_Date)
then 1 else 0
end) as FirstEntryFlag
from t;
Finally, some databases support boolean types, so the case is not necessary. Just the conditional expression can return a valid value.
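The difference between the two variants shows up when an ID has a duplicated earliest date. The following sketch assumes Python's sqlite3 module (SQLite 3.25+) and hypothetical rows; it is not the asker's data:

```python
import sqlite3

# Hypothetical data: id 1 has its earliest date twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, start_date TEXT)")
conn.executemany(
    "INSERT INTO t (id, start_date) VALUES (?, ?)",
    [(1, "2020-01-01"), (1, "2020-01-01"), (1, "2020-02-01"), (2, "2020-03-01")],
)

query = """
SELECT id, start_date,
       -- flags every row that carries the earliest date
       CASE WHEN start_date = MIN(start_date) OVER (PARTITION BY id)
            THEN 1 ELSE 0 END AS flag_min,
       -- flags exactly one row per id
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date) = 1
            THEN 1 ELSE 0 END AS flag_rn
FROM t
ORDER BY id, start_date
"""
rows = conn.execute(query).fetchall()
flagged_by_min = sum(r[2] for r in rows)
flagged_by_rn = sum(r[3] for r in rows)
print(flagged_by_min, flagged_by_rn)
```

With ROW_NUMBER, which of the tied rows gets the flag is arbitrary unless a tie-breaking column is added to the ORDER BY.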

Oracle LEAD - return next matching column value

I have the data below in one table.
I want to get the NEXT out data from the OUT column, so I used the LEAD function in the query below.
SELECT ROW_NUMBER,TIMESTAMP,IN,OUT,LEAD(OUT) OVER (PARTITION BY NULL ORDER BY TIMESTAMP) AS NEXT_OUT
FROM MYTABLE;
It gives the data shown in the NEXT_OUT column below.
But I need the matching next column value in sequential order, as in the DESIRED columns. Please let me know how I can achieve this with the Oracle LEAD function.
THANKS
Assign row number to all INs and OUTs separately, sort the results by placing them in a single column and calculate LEADs:
WITH cte AS (
SELECT t.*
, CASE WHEN "IN" IS NOT NULL THEN COUNT("IN") OVER (ORDER BY "TIMESTAMP") END AS rn1
, CASE WHEN "OUT" IS NOT NULL THEN COUNT("OUT") OVER (ORDER BY "TIMESTAMP") END AS rn2
FROM t
)
SELECT cte.*
, LEAD("OUT") OVER (ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST) AS NEXT_OUT
FROM cte
ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST
Demo on db<>fiddle
Enumerate the "in"s and the "out"s and use that information for matching. (IN is a reserved word in Oracle, so the column names are quoted here, as in the other answer.)
select tin.*, tout."OUT" as next_out
from (select t.*,
count("IN") over (order by "TIMESTAMP") as seqnum_in
from t
) tin left join
(select t.*,
count("OUT") over (order by "TIMESTAMP") as seqnum_out
from t
) tout
on tin."IN" is not null and
tout."OUT" is not null and
tin.seqnum_in = tout.seqnum_out;
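The enumerate-and-join idea pairs the n-th "in" with the n-th "out". A minimal sketch, assuming Python's sqlite3 module (SQLite 3.25+) instead of Oracle, with hypothetical column names ts, in_val and out_val and made-up rows:

```python
import sqlite3

# Hypothetical data: three "in" events and three "out" events, interleaved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts INTEGER, in_val TEXT, out_val TEXT)")
conn.executemany(
    "INSERT INTO t (ts, in_val, out_val) VALUES (?, ?, ?)",
    [(1, "A", None), (2, "B", None), (3, None, "X"),
     (4, None, "Y"), (5, "C", None), (6, None, "Z")],
)

query = """
SELECT tin.ts, tin.in_val, tout.out_val AS next_out
FROM (SELECT t.*, COUNT(in_val) OVER (ORDER BY ts) AS seqnum_in FROM t) tin
LEFT JOIN
     (SELECT t.*, COUNT(out_val) OVER (ORDER BY ts) AS seqnum_out FROM t) tout
  ON tin.in_val IS NOT NULL
 AND tout.out_val IS NOT NULL
 AND tin.seqnum_in = tout.seqnum_out   -- n-th "in" meets n-th "out"
ORDER BY tin.ts
"""
rows = conn.execute(query).fetchall()
pairs = [(in_val, next_out) for _, in_val, next_out in rows if in_val is not None]
print(pairs)
```

COUNT over a column ignores NULLs, so the running count only advances on rows that actually carry a value; rows without an "in" value get NULL for next_out.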

SQL Find the minimum date based on consecutive values

I'm having trouble constructing a query that can find consecutive values meeting a condition. Example data below, note that Date is sorted DESC and is grouped by ID.
To be selected, for each ID, the most recent RESULT must be 'Fail', and what I need back is the earliest date in that run of 'Fails'. For ID==1, only the first two values are of interest (the last doesn't count due to the prior 'Complete'). ID==2 doesn't count at all, failing the first condition, and for ID==3, only the first value matters.
A result table might be:
The trick seems to be doing some type of run-length encoding, but even with several attempts manipulating ROW_NUM and an attempt at the tabibitosan method for grouping consecutive values, I've been unable to gain traction.
Any help would be appreciated.
If your database supports window functions, you can do
select id, case when result='Fail' then earliest_fail_date end earliest_fail_date
from (
select t.*
,row_number() over(partition by id order by dt desc) rn
,min(case when result = 'Fail' then dt end) over(partition by id) earliest_fail_date
from tablename t
) x
where rn=1
Use row_number to get the latest row in the table. min() over() to get the earliest fail date for each id. If the first row has status Fail, you select the earliest_fail_date or else it would be null.
It should be noted that the expected result for id=1 is wrong. It should be 2016-09-20 as it is the earliest fail date.
Edit: Having re-read the question, I think this is what you might be looking for: getting the minimum Fail date from the latest consecutive group of Fail rows.
with grps as (
select t.*,row_number() over(partition by id order by dt desc) rn
,row_number() over(partition by id order by dt)-row_number() over(partition by id,result order by dt) grp
from tablename t
)
,maxfailgrp as (
select g.*,
max(case when result = 'Fail' then grp end) over(partition by id) maxgrp
from grps g
)
select id,
case when result = 'Fail' then (select min(f.dt) from maxfailgrp f where f.id = m.id and f.grp = m.maxgrp and f.result = 'Fail') end earliest_fail_date
from maxfailgrp m
where rn=1
Sample Demo
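The row-number-difference grouping can be exercised with a small sketch. It assumes Python's sqlite3 module (SQLite 3.25+) and hypothetical rows modelled on the question's description. The subquery filters on result = 'Fail' because the grp value is only unique per result, so a 'Complete' row can share the same grp number as the latest run of fails:

```python
import sqlite3

# Hypothetical data following the question's description of IDs 1 to 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER, dt TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO tablename (id, dt, result) VALUES (?, ?, ?)",
    [(1, "2016-09-22", "Fail"), (1, "2016-09-21", "Fail"),
     (1, "2016-09-20", "Complete"), (1, "2016-09-19", "Fail"),
     (2, "2016-09-22", "Complete"), (2, "2016-09-21", "Fail"),
     (3, "2016-09-22", "Fail"), (3, "2016-09-21", "Complete")],
)

query = """
WITH grps AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS rn,
           -- constant within each run of equal results
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt)
             - ROW_NUMBER() OVER (PARTITION BY id, result ORDER BY dt) AS grp
    FROM tablename t
),
maxfailgrp AS (
    SELECT g.*,
           MAX(CASE WHEN result = 'Fail' THEN grp END)
               OVER (PARTITION BY id) AS maxgrp
    FROM grps g
)
SELECT m.id,
       CASE WHEN m.result = 'Fail'
            THEN (SELECT MIN(f.dt) FROM maxfailgrp f
                  WHERE f.id = m.id AND f.grp = m.maxgrp AND f.result = 'Fail')
       END AS earliest_fail_date
FROM maxfailgrp m
WHERE m.rn = 1
ORDER BY m.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

For ID 1 the 'Complete' row on 2016-09-20 has the same grp value as the two latest fails, which is why the extra filter matters.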

Select rows based on two columns in SQL Server

I have a table in which data has accidentally been stored multiple times because of case sensitivity for the username field in server-side code. The username field should be regarded as case insensitive. The important columns and data for the table can be found below.
My requirement now is to delete all but the most recently saved data. I'm writing an SQL script for this, and started out by identifying all rows that are duplicates. This selection returns a table like the one below.
For each row, the most recent save is LASTUPDATEDDATE if it exists, otherwise CREATEDDATE. For this example, the most recent save for 'username' would be row 3.
ID CREATEDDATE LASTUPDATEDDATE USERNAME
-- ----------- --------------- --------
1 11-NOV-11 USERNAME
2 01-NOV-11 02-NOV-11 username
3 8-JAN-12 USERname
My script (which selects all rows where a duplicated username appears) looks like:
SELECT
id, createddate, lastupdateddate, username
FROM
table
WHERE
LOWER(username)
IN
(
SELECT
LOWER(username)
FROM
table
GROUP BY
LOWER(username)
HAVING
COUNT(*) > 1
)
ORDER BY
LOWER(username)
My question now is: How do I select everything but row 3? I have searched Stack Overflow for a good match to this question, but found no match good enough. I know I probably have to make a join of some kind, but can't really get my head around it. Would be really thankful for a push in the right direction.
We are using SQL Server, probably a quite new version.
To delete duplicates, you can use:
with todelete as (
select t.*,
row_number() over (partition by lower(username) order by createddate desc) as seqnum
from table t
)
delete from todelete
where seqnum > 1
This assigns a sequential number to each row, starting with 1 for the most recent. It then deletes all but the most recent.
For two dates, you can use:
with todelete as (
select t.*,
row_number() over (partition by lower(username) order by thedate desc) as seqnum
from (select t.*,
(case when createddate >= coalesce(lastupdateddate, createddate)
then createddate
else lastupdateddate
end) as thedate
from table t
) t
)
delete from todelete
where seqnum > 1
A couple of things to note: there is no reason to use LOWER in your query if the column uses a case-insensitive collation, which is the usual default in SQL Server. In that case A = a.
Also, to get the correct date, you can use COALESCE to determine if LastUpdatedDate exists and if so, sort by it, else sort by CreatedDate.
Putting that together, this should work:
DELETE T
FROM YourTable T
JOIN (
SELECT *, ROW_NUMBER() OVER (PARTITION BY username
ORDER BY COALESCE(lastupdateddate, createddate) DESC) as RN
FROM YourTable
) T2 ON T.Id = T2.Id
WHERE T2.RN > 1
Here is a sample fiddle: http://www.sqlfiddle.com/#!3/51f7c/1
As @Gordon correctly suggests, you could also use a CTE depending on the version of SQL Server you use (2005+):
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY username
ORDER BY COALESCE(lastupdateddate, createddate) DESC) as RN
FROM YourTable
)
DELETE FROM CTE WHERE RN > 1
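The keep-the-newest logic can be tried on the question's three rows with a sketch that assumes Python's sqlite3 module (SQLite 3.25+) rather than SQL Server. SQLite cannot delete through a CTE and compares text case-sensitively by default, so this version uses DELETE ... WHERE id IN (...) and keeps LOWER(username); the table name users and the ISO date strings are assumptions made for the example:

```python
import sqlite3

# The question's three rows, with dates rewritten as ISO strings so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, createddate TEXT, "
    "lastupdateddate TEXT, username TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "2011-11-11", None, "USERNAME"),
     (2, "2011-11-01", "2011-11-02", "username"),
     (3, "2012-01-08", None, "USERname")],
)

delete_sql = """
DELETE FROM users
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY LOWER(username)
                   ORDER BY COALESCE(lastupdateddate, createddate) DESC
               ) AS rn
        FROM users
    )
    WHERE rn > 1   -- everything except the most recent save per username
)
"""
conn.execute(delete_sql)
remaining = conn.execute("SELECT id, username FROM users ORDER BY id").fetchall()
print(remaining)
```

Only row 3 survives, matching the question's statement that it is the most recent save.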