MS SQL - Select only one row for an ID

I have a changelog with insert / update / delete operations:
change_id | object_id | operation
----------+-----------+----------
1 | 1 | insert
2 | 2 | insert
3 | 1 | delete
4 | 1 | insert
5 | 3 | insert
6 | 2 | delete
7 | 4 | insert
8 | 3 | update
I need to select only the last row for each object_id and keep the result sorted by change_id. The result should look like this:
change_id | object_id | operation
----------+-----------+----------
4 | 1 | insert
6 | 2 | delete
7 | 4 | insert
8 | 3 | update
How can I do this? Is it possible with a simple query, without stored procedures?

SELECT c.change_id, c.object_id, c.operation
FROM
(
SELECT MAX(change_id) AS CID
FROM changelog
GROUP BY object_id
) s
INNER JOIN changelog c ON c.change_id = s.CID
ORDER BY c.change_id

;WITH MyCTE AS
(
SELECT change_id,
object_id,
operation,
ROW_NUMBER() OVER(PARTITION BY object_id ORDER BY change_id DESC) AS rn
FROM ChangeLog
)
SELECT change_id,
object_id,
operation
FROM MyCTE
WHERE rn = 1
ORDER BY change_id

Using SQL Server ranking functions:
SELECT change_id, object_id, operation
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY object_id ORDER BY change_id DESC) as rownum,
change_id, object_id, operation
FROM yourtable
)a
WHERE rownum = 1
ORDER BY change_id
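To check that the MAX/GROUP BY join and the ROW_NUMBER variant agree, here is a minimal sketch that runs both against the sample changelog. It assumes Python's built-in sqlite3 module with a SQLite build that supports window functions (3.25 or later), not SQL Server, so the setup code is only an illustration; the two queries themselves are standard SQL.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE changelog (change_id INTEGER PRIMARY KEY, object_id INTEGER, operation TEXT)"
)
conn.executemany(
    "INSERT INTO changelog VALUES (?, ?, ?)",
    [
        (1, 1, "insert"), (2, 2, "insert"), (3, 1, "delete"), (4, 1, "insert"),
        (5, 3, "insert"), (6, 2, "delete"), (7, 4, "insert"), (8, 3, "update"),
    ],
)

# Variant 1: join back to the highest change_id per object_id.
by_max = conn.execute("""
    SELECT c.change_id, c.object_id, c.operation
    FROM changelog AS c
    JOIN (SELECT MAX(change_id) AS cid FROM changelog GROUP BY object_id) AS s
      ON c.change_id = s.cid
    ORDER BY c.change_id
""").fetchall()

# Variant 2: number the rows per object_id, newest first, and keep number 1.
by_row_number = conn.execute("""
    SELECT change_id, object_id, operation
    FROM (
        SELECT change_id, object_id, operation,
               ROW_NUMBER() OVER (PARTITION BY object_id ORDER BY change_id DESC) AS rn
        FROM changelog
    ) AS numbered
    WHERE rn = 1
    ORDER BY change_id
""").fetchall()

print(by_max)
print(by_max == by_row_number)
```

Both variants need the final ORDER BY change_id, because neither the join nor the window function guarantees the output order on its own.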

The notion of "last" is easy to establish if you add a timestamp column (e.g. LAST_UPDATE) that captures when the operation occurred. You would then SELECT, for each object_id, the row WHERE LAST_UPDATE is the maximum value.

Related

SQL select all rows in a single row's "history"

I have a table that looks like this:
ID | PARENT_ID
--------------
0 | NULL
1 | 0
2 | NULL
3 | 1
4 | 2
5 | 4
6 | 3
Being an SQL noob, I'm not sure if I can accomplish what I would like in a single command.
What I would like is to start at row 6, and recursively follow the "history", using the PARENT_ID column to reference the ID column.
The result (in my mind) should look something like:
6|3
3|1
1|0
0|NULL
I already tried something like this:
SELECT T1.ID
FROM Table T1, Table T2
WHERE T1.ID = 6
OR T1.PARENT_ID = T2.PARENT_ID;
but that just gave me a strange result.
With a recursive cte.
If you want to start from the maximum id:
with recursive cte (id, parent_id) as (
select t.*
from (
select *
from tablename
order by id desc
limit 1
) t
union all
select t.*
from tablename t inner join cte c
on t.id = c.parent_id
)
select * from cte
If you want to start specifically from id = 6:
with recursive cte (id, parent_id) as (
select *
from tablename
where id = 6
union all
select t.*
from tablename t inner join cte c
on t.id = c.parent_id
)
select * from cte;
Results:
| id | parent_id |
| --- | --------- |
| 6 | 3 |
| 3 | 1 |
| 1 | 0 |
| 0 | |
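A minimal sketch of the same recursive walk, assuming Python's built-in sqlite3 module (SQLite also accepts WITH RECURSIVE). The depth column and the history() helper are additions for illustration only; depth makes the output order explicit instead of relying on the order in which the engine happens to produce the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(0, None), (1, 0), (2, None), (3, 1), (4, 2), (5, 4), (6, 3)],
)

def history(conn, start_id):
    """Follow parent_id links from start_id up to the root (hypothetical helper)."""
    return conn.execute(
        """
        WITH RECURSIVE cte (id, parent_id, depth) AS (
            -- anchor member: the starting row
            SELECT id, parent_id, 0 FROM tablename WHERE id = ?
            UNION ALL
            -- recursive member: the parent of the row found in the previous step
            SELECT t.id, t.parent_id, c.depth + 1
            FROM tablename AS t
            JOIN cte AS c ON t.id = c.parent_id
        )
        SELECT id, parent_id FROM cte ORDER BY depth
        """,
        (start_id,),
    ).fetchall()

print(history(conn, 6))
```

The recursion stops on its own when a row has a NULL parent_id, since the join condition t.id = c.parent_id can no longer match.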

Query to delete last two rows as one row is duplicate in SQL Server

+-------+--------+
| EMPID | SALARY |
+-------+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
| 4 | 400 |
| 4 | 400 |
| 5 | 500 |
+-------+--------+
Please help me find a query that deletes the last two rows. I have already tried using a WHERE condition, but the last three rows are deleted because (4,400) is a duplicate.
You can set a limit for that (this is MySQL syntax; SQL Server does not support LIMIT):
DELETE FROM `employee` ORDER BY `EMPID` DESC LIMIT 2
You can also add a WHERE clause, for example WHERE EMPID = 104 OR EMPID = 105.
Try to use this query:
WITH CTE AS(
SELECT [EMPID], [SALARY],
RN = ROW_NUMBER()OVER(PARTITION BY [EMPID], [SALARY] ORDER BY (SELECT 0))
FROM dbo.myTable
)
DELETE FROM CTE WHERE RN > 1
Try this if you want to delete all copies of the duplicated records. It deletes every row for an employee whose EMPID appears more than once, which removes two records from your sample data.
WITH CTE
AS (SELECT *,
cnt= Count(1)OVER(PARTITION BY EMPID
ORDER BY (select null))
FROM yourTable)
DELETE FROM CTE
WHERE cnt > 1
Assuming you want to delete the last 2 rows as stated (one of which is duplicate), this should work.
Query
;WITH D as (
SELECT ROW_NUMBER() OVER(ORDER BY EMPID DESC) as rn,EMPID, SALARY
FROM YourTable
)
DELETE FROM D
WHERE rn <= 2;
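Here is a rough sketch of the "number the rows, then delete the first two" idea, assuming Python's built-in sqlite3 module rather than SQL Server. SQLite cannot delete through a CTE, so this version swaps in SQLite's implicit rowid to identify the numbered rows; that part is SQLite-specific and not what the T-SQL above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (EMPID INTEGER, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, 100), (2, 200), (3, 300), (4, 400), (4, 400), (5, 500)],
)

# Number all rows from the highest EMPID downwards and delete numbers 1 and 2.
# The two (4, 400) rows get different numbers, so only one of them is removed.
conn.execute("""
    DELETE FROM employee
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (ORDER BY EMPID DESC) AS rn
            FROM employee
        ) AS numbered
        WHERE rn <= 2
    )
""")

remaining = conn.execute(
    "SELECT EMPID, SALARY FROM employee ORDER BY EMPID"
).fetchall()
print(remaining)
```

Which of the two identical (4, 400) rows is deleted is arbitrary, but since they are identical the remaining data is the same either way.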

SQL delete almost identical rows

I have a table that has 5 columns, and instead of updating the rows I inserted all of them again (stupid mistake). How can I get rid of the duplicated records? They are identical except for the id. I can't remove all records; I want to delete half of them.
ex. table:
+-----+-------+--------+-------+
| id | name | name2 | user |
+-----+-------+--------+-------+
| 1 | nameA | name2A | u1 |
| 12 | nameA | name2A | u1 |
| 2 | nameB | name2B | u2 |
| 192 | nameB | name2B | u2 |
+-----+-------+--------+-------+
How to do this?
I'm using Microsoft Sql Server.
Try the following.
DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY Name, Name2, [User])
That is untested, so it may need adapting.
This is a more specific query than @TechDo's, as it finds duplicates where name, name2 and user are all identical, not only name.
with duplicates as
(
select t.id, ROW_NUMBER() over (partition by t.name, t.name2, t.[user] order by t.id) as RowNumber
from YourTable t
)
delete duplicates
where RowNumber > 1
Please try:
with c as
(
select
*, row_number() over(partition by name, name2, [user] order by id) as n
from YourTable
)
delete from c
where n > 1;
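A small sketch of the ROW_NUMBER de-duplication on the sample rows, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) instead of SQL Server. Because SQLite cannot delete through a CTE, the numbered rows are matched back by id, which works here since id is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE YourTable (id INTEGER PRIMARY KEY, name TEXT, name2 TEXT, "user" TEXT)'
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        (1, "nameA", "name2A", "u1"),
        (12, "nameA", "name2A", "u1"),
        (2, "nameB", "name2B", "u2"),
        (192, "nameB", "name2B", "u2"),
    ],
)

# Within each (name, name2, user) group the lowest id gets n = 1 and is kept;
# every later copy gets n > 1 and is deleted.
conn.execute("""
    DELETE FROM YourTable
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY name, name2, "user" ORDER BY id) AS n
            FROM YourTable
        ) AS numbered
        WHERE n > 1
    )
""")

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM YourTable ORDER BY id")]
print(remaining_ids)
```

Ordering by id ascending keeps the original rows; ordering by id DESC would keep the later inserts instead, as the MAX(ID) answer does.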

Selecting row with highest ID based on another column

In SQL Server 2008 R2, suppose I have a table layout like this...
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 1 | 1 | TEST 1 |
| 2 | 1 | TEST 2 |
| 3 | 3 | TEST 3 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 6 | 6 | TEST 6 |
| 7 | 6 | TEST 7 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Is it possible to select the row with the highest UniqueID for each GroupID? According to the table above, if I ran the query, I would expect this...
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 2 | 1 | TEST 2 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Been chomping on this for a while, but can't seem to crack it.
Many thanks,
SELECT *
FROM (SELECT uniqueid, groupid, title,
Row_number()
OVER ( partition BY groupid ORDER BY uniqueid DESC) AS rn
FROM TableName) a
WHERE a.rn = 1
With SQL Server as the RDBMS you can use a ranking function like ROW_NUMBER:
WITH CTE AS
(
SELECT UniqueID, GroupID, Title,
RN = ROW_NUMBER() OVER (PARTITION BY GroupID
ORDER BY UniqueID DESC)
FROM dbo.TableName
)
SELECT UniqueID, GroupID, Title
FROM CTE
WHERE RN = 1
This returns exactly one record for each GroupID even if there are multiple rows with the highest UniqueID (the column name suggests there should not be). If you want to return all of those rows in that case, use DENSE_RANK instead of ROW_NUMBER.
Here you can see all functions and how they work: http://technet.microsoft.com/en-us/library/ms189798.aspx
Since you have not mentioned any RDBMS, this statement below will work on almost all RDBMS. The purpose of the subquery is to get the greatest uniqueID for every GROUPID. To be able to get the other columns, the result of the subquery is joined on the original table.
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT GroupID, MAX(uniqueID) uniqueID
FROM tableName
GROUP By GroupID
) b ON a.GroupID = b.GroupID
AND a.uniqueID = b.uniqueID
If your RDBMS supports analytic functions, you can use ROW_NUMBER():
SELECT uniqueid, groupid, title
FROM
(
SELECT uniqueid, groupid, title,
ROW_NUMBER() OVER (PARTITION BY groupid
ORDER BY uniqueid DESC) rn
FROM tableName
) x
WHERE x.rn = 1
TSQL Ranking Functions
ROW_NUMBER() generates a sequential number that you can filter on. In this case the numbering restarts for each groupid and follows uniqueid in descending order, so the row with the greatest uniqueid in each group has a value of 1 in rn.
SELECT *
FROM the_table tt
WHERE NOT EXISTS (
SELECT *
FROM the_table nx
WHERE nx.GroupID = tt.GroupID
AND nx.UniqueID > tt.UniqueID
)
;
This should work in any DBMS (no window functions or CTEs are needed) and is probably faster than a subquery with an aggregate.
Keeping it simple:
select * from test2
where UniqueID in (select max(UniqueID) from test2 group by GroupID)
Considering:
create table test2
(
UniqueID numeric,
GroupID numeric,
Title varchar(100)
)
insert into test2 values(1,1,'TEST 1')
insert into test2 values(2,1,'TEST 2')
insert into test2 values(3,3,'TEST 3')
insert into test2 values(4,3,'TEST 4')
insert into test2 values(5,5,'TEST 5')
insert into test2 values(6,6,'TEST 6')
insert into test2 values(7,6,'TEST 7')
insert into test2 values(8,6,'TEST 8')
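To make the numbering visible, here is a minimal sketch that prints the rn value for every row of test2 and then checks that filtering on rn = 1 gives the same rows as the NOT EXISTS query. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server 2008 R2, and uses INTEGER columns instead of numeric for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test2 (UniqueID INTEGER, GroupID INTEGER, Title TEXT)")
conn.executemany(
    "INSERT INTO test2 VALUES (?, ?, ?)",
    [
        (1, 1, "TEST 1"), (2, 1, "TEST 2"), (3, 3, "TEST 3"), (4, 3, "TEST 4"),
        (5, 5, "TEST 5"), (6, 6, "TEST 6"), (7, 6, "TEST 7"), (8, 6, "TEST 8"),
    ],
)

# rn restarts at 1 in every GroupID; the highest UniqueID of a group gets rn = 1.
numbered = conn.execute("""
    SELECT UniqueID, GroupID,
           ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY UniqueID DESC) AS rn
    FROM test2
    ORDER BY UniqueID
""").fetchall()
for row in numbered:
    print(row)

top_by_rn = [(unique_id, group_id) for unique_id, group_id, rn in numbered if rn == 1]

# Same result without window functions: keep rows that have no "newer" sibling.
top_by_not_exists = conn.execute("""
    SELECT tt.UniqueID, tt.GroupID
    FROM test2 AS tt
    WHERE NOT EXISTS (
        SELECT 1
        FROM test2 AS nx
        WHERE nx.GroupID = tt.GroupID
          AND nx.UniqueID > tt.UniqueID
    )
    ORDER BY tt.UniqueID
""").fetchall()

print(top_by_rn == top_by_not_exists)
```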

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers so that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
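The two steps (flag each group start with LAG, then take a running sum of the flags) can be tried on the irregular sequences from the clarified requirements. This is only a sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) instead of SQL Server 2012; CASE replaces IIF for portability, and the ten sample rows are made up from the "1,1,2,2,3,4" and "1,2,4,6" examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER)")
# IDs 1-6 form the first group (1,1,2,2,3,4), IDs 7-10 the second (1,2,4,6).
row_numbers = [1, 1, 2, 2, 3, 4, 1, 2, 4, 6]
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    list(enumerate(row_numbers, start=1)),
)

grouped = conn.execute("""
    WITH T1 AS (
        SELECT ID, RowNumber,
               LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
        FROM YourTable
    ), T2 AS (
        -- 1 marks the first row of a group: no previous row, or the number dropped
        SELECT ID, RowNumber,
               CASE WHEN PrevRowNumber IS NULL OR PrevRowNumber > RowNumber
                    THEN 1 ELSE 0 END AS NewGroup
        FROM T1
    )
    SELECT ID, RowNumber,
           SUM(NewGroup) OVER (ORDER BY ID
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
    FROM T2
    ORDER BY ID
""").fetchall()

group_ids = [grp for _, _, grp in grouped]
print(group_ids)
```

Repeated values such as 1,1 stay in one group because the flag is only set when the previous RowNumber is strictly greater than the current one.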
If the ids are truly sequential, and RowNumber increases by exactly 1 within each group, you can do:
select t.*,
(id - rowNumber) as grp
from t
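The id - rowNumber trick works because both values increase by 1 from row to row inside a group, so their difference stays constant until RowNumber restarts. A quick sketch in plain Python (the helper name grp_by_difference is made up for illustration) shows this on the original sample, and also shows that the difference stops being constant for an irregular sequence like "1,1,2,2,3,4".

```python
def grp_by_difference(rows):
    """Return id - row_number for each (id, row_number) pair."""
    return [id_ - row_number for id_, row_number in rows]

# Original sample: RowNumber counts 1,2,3 / 1,2 / 1,2,3,4 without gaps.
consecutive = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 1), (7, 2), (8, 3), (9, 4)]
print(grp_by_difference(consecutive))

# A single group with repeated row numbers, as in the clarified requirements.
irregular = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 4)]
print(grp_by_difference(irregular))
```

In the second case one logical group is split into several keys, so this shortcut only fits data where RowNumber has no repeats or gaps.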
You can also use a recursive CTE:
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
OPTION (MAXRECURSION 0) -- lifts SQL Server's default limit of 100 recursion levels
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select max(ID) from [Your Table] where ID <= t.ID and RowNumber = 1) as grp -- ID of the row that starts the current group
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.