Delete duplicates and keep one with condition in SQL Server

I am using SQL Server.
I have a table with the following design:
ID bigint
Number varchar(50)
Processed int
The Number column contains a lot of duplicates.
For each repeated Number I want to delete the extra rows and keep the one where Processed = 1.
For example, if I have:
Number --- Processed
111 --- 0
111 --- 0
111 --- 1
I want to delete the first two rows and keep the last one.
Any help would be appreciated.

Here is one method:
with todelete as (
select t.*,
row_number() over (partition by number order by processed desc) as seqnum
from table t
)
delete from todelete
where seqnum > 1;
The row_number() enumerates the rows, using the processed as a priority. The logic ensures that exactly one row remains, even if none have processed = 1.
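To try the ranking logic without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is a port, not the T-SQL above: SQLite cannot delete through a CTE, so the ranked rows are matched back by ID. The table name t, the sample rows, and the ID DESC tie-breaker are assumptions made for illustration, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ID INTEGER PRIMARY KEY, Number TEXT, Processed INTEGER);
    INSERT INTO t (Number, Processed) VALUES
        ('111', 0), ('111', 0), ('111', 1),   -- duplicates, one processed
        ('222', 0), ('222', 0),               -- duplicates, none processed
        ('333', 1);                           -- no duplicates
""")

# Rank rows per Number (processed rows first) and delete everything ranked > 1.
conn.execute("""
    DELETE FROM t
    WHERE ID IN (
        SELECT ID FROM (
            SELECT ID,
                   ROW_NUMBER() OVER (PARTITION BY Number
                                      ORDER BY Processed DESC, ID DESC) AS seqnum
            FROM t
        )
        WHERE seqnum > 1
    )
""")
conn.commit()

remaining = conn.execute(
    "SELECT ID, Number, Processed FROM t ORDER BY Number"
).fetchall()
print(remaining)
```

Exactly one row per Number survives, and for '222', where no row has Processed = 1, the tie-breaker decides which row stays.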

If you are just trying to delete the rows where Number equals 111 and Processed does not equal 1, you can do:
delete from <table>
where
Number = '111' and
Processed <> 1
Assuming the ID is sequential and you want to keep the last row for each Number, you can do:
delete t
from <table> t
left join (
select
MAX(ID) filter_ID
from <table>
group by
Number
) filter on
t.ID = filter.filter_ID
where
filter.filter_ID is null
To keep at least one row per Number, giving priority to Processed = 1:
delete t
from <table> t
left join (
select
ID filter_ID
from (
select
ROW_NUMBER() OVER (
PARTITION BY
Number
ORDER BY
Processed DESC,
ID DESC
) last_R,
ID
from <table>
) ranked
where
last_R = 1
) filter on
t.ID = filter.filter_ID
where
filter.filter_ID is null

Here is how I would approach this problem, using a loop:
DECLARE @NUM VARCHAR(50)
-- Numbers that occur more than once
DECLARE @TAB TABLE
(
NUMBER VARCHAR(50)
)
INSERT INTO @TAB
SELECT number FROM <table> GROUP BY number HAVING COUNT(*) > 1
-- One row to keep per duplicated number
DECLARE @IDToKEEP TABLE
(
id BIGINT,
number VARCHAR(50)
)
WHILE (SELECT COUNT(*) FROM @TAB) > 0
BEGIN
SELECT TOP 1 @NUM = number FROM @TAB
-- Prefer the processed = 1 row; break ties by the highest id
INSERT INTO @IDToKEEP
SELECT TOP 1 id, number FROM <table> WHERE number = @NUM ORDER BY processed DESC, id DESC
DELETE FROM @TAB WHERE number = @NUM
END
-- @TAB is empty at this point, so filter on the numbers stored in @IDToKEEP
DELETE FROM <table> WHERE number IN (SELECT number FROM @IDToKEEP) AND id NOT IN (SELECT id FROM @IDToKEEP)

Related

Write Cross Apply to select last row with condition

I'm trying to make request for SQL Table which looks like:
CREATE TABLE StudentMark (
Id int NOT NULL IDENTITY(1,1),
StudentId int NOT NULL,
Mark int
);
Is it possible to select, for each student, the last StudentMark row with a mark greater than 4?
I'm trying to accomplish that with:
SELECT *
FROM [dbo].StudentMark outer
CROSS APPLY (
SELECT TOP(1) *
FROM [dbo].StudentMark inner
WHERE inner.StudentId= outer.StudentId AND inner.Mark>4
) cApply
But that doesn't do what's needed. Could anyone help?

I am curious if something like this will actually get what you are looking for:
SELECT Data.StudentId,
Data.Id,
Data.Mark
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY StudentMark.StudentId ORDER BY StudentMark.Id DESC) AS RowNumber,
StudentMark.StudentId,
StudentMark.Id,
StudentMark.Mark
FROM dbo.StudentMark
) AS Data
WHERE Data.RowNumber = 1
This assigns a row number to each row within its StudentId partition and then filters on that number. Because the rows are sorted by Id DESC, row number 1 is effectively the last row entered for each student ID.

Presumably, "last row" is based on id. If so:
SELECT *
FROM [dbo].StudentMark sm CROSS APPLY
(SELECT TOP (1) sm2.*
FROM [dbo].StudentMark sm2
WHERE sm2.StudentId = sm.StudentId AND sm2.Mark > 4
ORDER BY sm2.id DESC
) sm2;
EDIT:
If you only want the last row per student with Mark > 4, then use filtering:
select sm.*
from dbo.StudentMark sm
where sm.id = (select max(sm2.id)
from dbo.StudentMark sm2
where sm2.studentId = sm.studentId and sm2.Mark > 4
);
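The filtering query can be checked with a small in-memory sketch using Python's sqlite3 module. The sample marks are invented for illustration, and SQLite's AUTOINCREMENT stands in for IDENTITY(1,1); the SELECT itself is the same correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentMark (
        Id INTEGER PRIMARY KEY AUTOINCREMENT,
        StudentId INTEGER NOT NULL,
        Mark INTEGER
    );
    INSERT INTO StudentMark (StudentId, Mark) VALUES
        (1, 5), (1, 3), (1, 6), (1, 2),   -- student 1: last mark > 4 is the 6
        (2, 4), (2, 1),                   -- student 2: no mark > 4
        (3, 7);                           -- student 3: single qualifying row
""")

# For each student, keep only the row whose Id is the highest among marks > 4.
rows = conn.execute("""
    SELECT sm.Id, sm.StudentId, sm.Mark
    FROM StudentMark sm
    WHERE sm.Id = (SELECT MAX(sm2.Id)
                   FROM StudentMark sm2
                   WHERE sm2.StudentId = sm.StudentId AND sm2.Mark > 4)
    ORDER BY sm.StudentId
""").fetchall()
print(rows)
```

Student 2 drops out entirely because the subquery returns NULL when no mark exceeds 4.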

SELECT *
FROM StudentMark A
CROSS APPLY
(
SELECT TOP 1 Mark
FROM StudentMark B
WHERE A.StudentId = B.StudentId
ORDER BY B.Id DESC
)M
WHERE M.Mark > 4

Update based on group by, top 1 row and where case

I have a table as follows:
This is a result of this select:
SELECT ParentID, ID, [Default], IsOnTop, OrderBy
FROM [table]
WHERE ParentID IN (SELECT ParentID
FROM [table]
GROUP BY ParentID
HAVING SUM([Default]) <> 1)
ORDER BY ParentID
Now, what I want to do is to: for each ParentID group, set one of the rows as a Default ([Default] = 1), where the row is chosen using this logic:
if group has a row with IsOnTop = 1 then take this row, otherwise take top 1 row ordered by OrderBy.
I'm completely clueless as to how to do that in SQL, and I have over 40 such groups, so I'd like to ask for your help, preferably with some explanation of your query.

Just slightly modify your current query by assigning a row number, across each ParentID group. The ordering logic for the row number assignment is that records with IsOnTop values of 1 come first, and after that the OrderBy column determines position. I update the CTE under the condition that only the first record in each ParentID group gets assigned a Default value of 1.
WITH cte AS (
SELECT ParentID, ID, [Default], IsOnTop, OrderBy,
ROW_NUMBER() OVER (PARTITION BY ParentID
ORDER BY IsOnTop DESC, OrderBy) rn
FROM [table]
WHERE ParentID IN (SELECT ParentID FROM [table]
GROUP BY ParentID HAVING SUM([Default]) <> 1)
)
UPDATE cte
SET [Default] = 1
WHERE rn = 1;
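A minimal way to see this ranking pick the right row per group is the sqlite3 sketch below. It is a port rather than the T-SQL above: SQLite cannot update through a CTE, so the chosen rows are matched back by ID. The table name items, the column name IsDefault (standing in for [Default]), and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        ID INTEGER PRIMARY KEY,
        ParentID INTEGER, IsDefault INTEGER, IsOnTop INTEGER, OrderBy INTEGER
    );
    INSERT INTO items (ID, ParentID, IsDefault, IsOnTop, OrderBy) VALUES
        (1, 10, 0, 0, 1), (2, 10, 0, 1, 2),   -- group 10: no default, row 2 is on top
        (3, 20, 0, 0, 2), (4, 20, 0, 0, 1),   -- group 20: no default, nothing on top
        (5, 30, 1, 0, 1), (6, 30, 0, 1, 2);   -- group 30: already has one default
""")

# Rank rows inside each group that lacks exactly one default, then flag rank 1.
conn.execute("""
    UPDATE items SET IsDefault = 1
    WHERE ID IN (
        SELECT ID FROM (
            SELECT ID,
                   ROW_NUMBER() OVER (PARTITION BY ParentID
                                      ORDER BY IsOnTop DESC, OrderBy) AS rn
            FROM items
            WHERE ParentID IN (SELECT ParentID FROM items
                               GROUP BY ParentID HAVING SUM(IsDefault) <> 1)
        )
        WHERE rn = 1
    )
""")
conn.commit()

defaults = conn.execute(
    "SELECT ParentID, ID FROM items WHERE IsDefault = 1 ORDER BY ParentID"
).fetchall()
print(defaults)
```

Group 30 is left alone because it already has exactly one default, even though another of its rows has IsOnTop = 1.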

There might be a quicker way but this is how I would do it.
First we create a CTE that adds a row_number over each ParentID group, ordered so that a row with IsOnTop = 1 comes first; if there is none, the first row by the OrderBy column comes first.
Then we update the rows with row number 1.
WITH FindSoonToBeDefault AS (
SELECT ParentID, ID, [Default], IsOnTop, OrderBy, row_number() OVER(PARTITION BY ParentID ORDER BY IsOnTop DESC, [OrderBy] ASC) AS [rn]
FROM [table]
WHERE ParentID IN (SELECT ParentID
FROM [table]
GROUP BY ParentID
HAVING SUM([Default]) <> 1)
)
UPDATE FindSoonToBeDefault
SET [Default] = 1
WHERE [rn] = 1
In your screenshot, row 12 will become the default and row 13 will not.

(1-IsOnTop)*OrderBy combines IsOnTop and OrderBy into a single result that can be ranked so that the lowest value is the one you want. Use a derived table to identify the lowest result for each ParentID, then JOIN to that to identify your defaults.
UPDATE [table]
SET [Default] = 1
FROM [table]
INNER JOIN
(
SELECT ParentID, MIN((1-IsOnTop)*OrderBy) DefaultRank
FROM [table]
GROUP BY ParentID
) AS rankForDefault
ON rankForDefault.ParentID=[table].ParentID
AND rankForDefault.DefaultRank=(1-[table].IsOnTop)*[table].OrderBy

Selecting only one row if the same ID - SQL Server

I'm trying to learn SQL and am currently working with a query that lists all customers whose status is active (ID = 1) or active-busy (ID = 2).
The problem is that some customers have the same ID but different types. For example, I have a customer with ID 1 and Type 3, but the same customer also has ID 1 and Type 1, and I want to select only one of those rows. So if the ID is the same and the Types are 1 and 3, select only Type 3.
SELECT
CASE
WHEN corel.opts LIKE 3
THEN (SELECT corel.opts
WHERE corel.objid = rel.id
AND corel.type IN (1, 2)
AND corel.opts = 3
ELSE corel.opts 1
END)
This is not the complete query, because it contains many other things that I can't post, but if you could show me how to accomplish this I would appreciate it. I just don't know how to express: if the same ID appears in the table with different Types, select only Type 3. Each customer has a different ID, but customers can share the same Type.

Use ROW_NUMBER() like this:
DECLARE @T TABLE
(
Id INT,
TypeNo INT
)
INSERT INTO @T
VALUES(1,1),(1,3),(2,1),(2,3),(3,1),(4,3)
;WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY TypeNo DESC),
Id,
TypeNo
FROM @T
)
SELECT
Id,
TypeNo
FROM CTE
WHERE RN = 1

The test scenario is borrowed from Jayasurya Satheesh, thanks, I voted yours up!
DECLARE @T TABLE
(
Id INT,
TypeNo INT
)
INSERT INTO @T
VALUES(1,1),(1,3),(2,1),(2,3),(3,1),(4,3)
--The query will use ROW_NUMBER with PARTITION BY to start a row count for each T.Id separately.
--SELECT TOP 1 WITH TIES will take all first place rows and not just the very first:
SELECT TOP 1 WITH TIES
Id,
TypeNo
FROM @T AS T
ORDER BY ROW_NUMBER() OVER(PARTITION BY T.Id ORDER BY T.TypeNo DESC)
If your Type=3 is not the highest type code the simple ORDER BY T.TypeNo DESC won't be enough, but you can easily use a CASE to solve this.
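As a rough illustration of the CASE idea, the sqlite3 sketch below ranks Type 3 first no matter what other codes exist. The extra type code 5 and the fallback ordering by TypeNo are assumptions added for the example, and the query uses a derived table with ROW_NUMBER() because SQLite has no TOP 1 WITH TIES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (Id INTEGER, TypeNo INTEGER);
    -- Type 5 is a hypothetical code that is higher than the preferred Type 3.
    INSERT INTO T VALUES (1, 1), (1, 3), (1, 5), (2, 1), (2, 5), (3, 1), (4, 3);
""")

# The CASE expression sorts Type 3 ahead of everything else in each Id group;
# the remaining rows fall back to ascending TypeNo.
rows = conn.execute("""
    SELECT Id, TypeNo FROM (
        SELECT Id, TypeNo,
               ROW_NUMBER() OVER (
                   PARTITION BY Id
                   ORDER BY CASE WHEN TypeNo = 3 THEN 0 ELSE 1 END, TypeNo
               ) AS rn
        FROM T
    )
    WHERE rn = 1
    ORDER BY Id
""").fetchall()
print(rows)
```

The same CASE expression can be placed in the ORDER BY of the window function in the T-SQL queries above.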

As far as I understand, you need something like:
SELECT c1.*
FROM corel c1
LEFT OUTER JOIN corel c2 ON c1.objid=c2.objid AND c1.type <> c2.type
WHERE (c1.type=1 AND c2.type IS NULL) OR (c1.type=3 AND c2.type=1)

Find the lowest non-contiguous value

I was asked to write a T-SQL statement that will find the lowest unused value of MyId in the sequence below (i.e. in this case the result should be 3):
DECLARE @MyTable TABLE (MyId INT);
INSERT INTO @MyTable(MyId) VALUES(1),(2),(4),(5);

;With CTE
AS
(
SELECT * , ROW_NUMBER() OVER (ORDER BY MyID ASC) AS RN
FROM @MyTable
)
SELECT TOP 1 rn
FROM CTE
WHERE Rn <> MyId
ORDER BY MyId ASC

Here is how I would probably answer that question (though it is impossible for anyone here to know exactly what the interviewer was after).
First, you can easily generate a sequence of contiguous numbers from existing tables or views in any SQL Server system. For this, let's use master..spt_values (which will cover a sequence of about 2000 values, depending on version):
SELECT TOP (5) n = number + 1
FROM master.dbo.spt_values
WHERE type = N'P'
ORDER BY number;
Results:
n
------
1
2
3
4
5
Now, you don't know in advance that you need 5, so you can determine the number you need by taking the min and max from the table:
DECLARE @min INT, @max INT;
SELECT @min = MIN(MyId), @max = MAX(MyId) FROM @MyTable;
Now you can get the exact set you need (since it may not always start at 1):
SELECT TOP (@max-@min+1) number
FROM master.dbo.spt_values
WHERE number >= @min AND type = N'P'
ORDER BY number;
Now, finally, we can perform a left anti-semi-join to find the first value that exists in our contiguous set but not in the table:
;WITH x AS
(
SELECT TOP (@max-@min+1) number
FROM master.dbo.spt_values
WHERE number >= @min AND type = N'P'
ORDER BY number
)
SELECT MIN(number) FROM x
WHERE NOT EXISTS
(SELECT 1 FROM @MyTable WHERE MyId = x.number);
If you need more than 2000 values, you can use other things like sys.all_columns and if that isn't enough you can CROSS JOIN multiple tables. See http://www.sqlperformance.com/generate-a-set-1, http://www.sqlperformance.com/generate-a-set-2 and http://www.sqlperformance.com/generate-a-set-3.
Of course, if you know the sequence should always start at 1, rather than the minimum value in the table, then the other answers are slightly simpler. This caters to the case where the set doesn't necessarily start with 1, and you don't care about "missing" values that are below the minimum value.
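The same generate-then-anti-join idea can be sketched outside SQL Server with Python's sqlite3 module. Since master..spt_values does not exist in SQLite, this version swaps in a recursive CTE to build the contiguous range from the minimum to the maximum MyId; the CTE names bounds, lo, and hi are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (MyId INTEGER);
    INSERT INTO MyTable (MyId) VALUES (1), (2), (4), (5);
""")

# Build every integer from MIN(MyId) to MAX(MyId), then keep the smallest
# one that has no matching row in MyTable.
row = conn.execute("""
    WITH RECURSIVE bounds(lo, hi) AS (
        SELECT MIN(MyId), MAX(MyId) FROM MyTable
    ),
    x(number) AS (
        SELECT lo FROM bounds
        UNION ALL
        SELECT number + 1 FROM x, bounds WHERE number < hi
    )
    SELECT MIN(number) FROM x
    WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE MyId = x.number)
""").fetchone()
first_missing = row[0]
print(first_missing)
```

If the table has no gaps, the query returns NULL (None in Python), matching the behaviour of the MIN(number) query above.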

with cte as (
select MyId, row_number() over (order by MyId asc) RowId
from @MyTable
)
select top 1 c1.MyId + 1 FirstMissingMyId
from cte c1
join cte c2
on c1.RowId + 1 = c2.RowId
where c1.MyId + 1 <> c2.MyId
order by c1.MyId asc

The idea is to find the first non-sequential number in a sequence that (presumably) starts with 1. You can do this by comparing the values to sequential numbers generated by the ROW_NUMBER() function:
SELECT TOP(1) RN FROM
(SELECT MyID, ROW_NUMBER() OVER (ORDER BY MyId) AS RN FROM @MyTable) MT
WHERE MyID <> RN
ORDER BY MyID ASC
Demo: http://sqlfiddle.com/#!3/c1a90/2

SQL Server - Counting number of times an attribute in a dataset changes (non-concurrently)

I have a query that returns either a 1 or 0 based on whether or not an event occurred on a given date. This is ordered by date. Basically, a simple result set is:
Date | Type
---------------------
2010-09-27 1
2010-10-11 1
2010-11-29 0
2010-12-06 0
2010-12-13 1
2010-12-15 0
2010-12-17 0
2011-01-03 1
2011-01-04 0
What I would now like to be able to do is to count the number of separate, non-concurrent instances of '0's there are - i.e. count how many different groups of 0s appear.
In the above instance, the answer should be 3 (1 group of 2, then another group of 2, then finally 1 to end with).
Hopefully, the above example illustrates what I am trying to get at. I have been searching for a while, but am finding it difficult to succinctly describe what I am looking for, and hence haven't found anything of relevance.
Thanks in advance,
Josh

You could give each row a number in a CTE. Then you can join the table on itself to find the previous row. Knowing the previous row, you can sum the number of times the previous row is 1 and the current row is 0. For example:
; with NumberedRows as
(
select row_number() over (order by date) as rn
, type
from YourTable
)
select sum(case when cur.type = 0 and IsNull(prev.type,1) = 1 then 1 end)
from NumberedRows cur
left join
NumberedRows prev
on cur.rn = prev.rn + 1

This is a variant of the "islands" problem. My first answer uses Itzik Ben Gan's double row_number trick to identify contiguous groups of data efficiently. The combination of Type,Grp identifies each individual island in the data.
You can read more about the different approaches to tackling this problem here.
;WITH T AS (
SELECT *,
ROW_NUMBER() OVER(ORDER BY Date) -
ROW_NUMBER() OVER(PARTITION BY Type ORDER BY Date) AS Grp
FROM YourTable
)
SELECT COUNT(DISTINCT Grp)
FROM T
WHERE Type=0
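To see why the difference of the two row numbers is constant inside each island, the query can be run against the question's sample data. This is a sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later); dates are stored as ISO text, which sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (Date TEXT, Type INTEGER);
    INSERT INTO YourTable VALUES
        ('2010-09-27', 1), ('2010-10-11', 1), ('2010-11-29', 0),
        ('2010-12-06', 0), ('2010-12-13', 1), ('2010-12-15', 0),
        ('2010-12-17', 0), ('2011-01-03', 1), ('2011-01-04', 0);
""")

# Overall position minus position within the same Type stays fixed while
# consecutive rows share a Type, so each island of 0s gets one Grp value.
islands = conn.execute("""
    WITH T AS (
        SELECT Type,
               ROW_NUMBER() OVER (ORDER BY Date)
             - ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Date) AS Grp
        FROM YourTable
    )
    SELECT COUNT(DISTINCT Grp) FROM T WHERE Type = 0
""").fetchone()[0]
print(islands)
```

For the Type = 0 rows the overall positions are 3, 4, 6, 7, 9 and the per-type positions are 1 to 5, so Grp takes the values 2, 2, 3, 3, 4: three distinct islands.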
My second answer requires a single pass through the data. It is not guaranteed to work but is on the same principle as a technique that many people successfully use to concatenate strings without problems.
DECLARE @Count int = 0
SELECT @Count = CASE WHEN Type = 0 AND @Count <=0 THEN -@Count+1
WHEN Type = 1 AND @Count > 0 THEN - @Count
ELSE @Count END
FROM YourTable
ORDER BY Date
SELECT ABS(@Count)

Have a look at this example, using SQL Server 2005+:
DECLARE @Table TABLE(
Date DATETIME,
Type INT
)
INSERT INTO @Table SELECT '2010-09-27',1
INSERT INTO @Table SELECT '2010-10-11',1
INSERT INTO @Table SELECT '2010-11-29',0
INSERT INTO @Table SELECT '2010-12-06',0
INSERT INTO @Table SELECT '2010-12-13',1
INSERT INTO @Table SELECT '2010-12-15',0
INSERT INTO @Table SELECT '2010-12-17',0
INSERT INTO @Table SELECT '2011-01-03',1
INSERT INTO @Table SELECT '2011-01-04',0
;WITH Vals AS (
SELECT *,
ROW_NUMBER() OVER(ORDER BY Date) ROWID
FROM @Table
)
SELECT v.*
FROM Vals v LEFT JOIN
Vals vNext ON v.ROWID + 1 = vNext.ROWID
WHERE v.Type = 0
AND (vNext.Type = 1 OR vNext.Type IS NULL)
