Selecting row with highest ID based on another column

Selecting row with highest ID based on another column - sql

In SQL Server 2008 R2, suppose I have a table layout like this...
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 1 | 1 | TEST 1 |
| 2 | 1 | TEST 2 |
| 3 | 3 | TEST 3 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 6 | 6 | TEST 6 |
| 7 | 6 | TEST 7 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Is it possible to select every row with the highest UniqueID number, for each GroupID. So according to the table above - if I ran the query, I would expect this...
+----------+---------+-------------+
| UniqueID | GroupID | Title |
+----------+---------+-------------+
| 2 | 1 | TEST 2 |
| 4 | 3 | TEST 4 |
| 5 | 5 | TEST 5 |
| 8 | 6 | TEST 8 |
+----------+---------+-------------+
Been chomping on this for a while, but can't seem to crack it.
Many thanks,

SELECT *
FROM (SELECT uniqueid, groupid, title,
Row_number()
OVER ( partition BY groupid ORDER BY uniqueid DESC) AS rn
FROM table) a
WHERE a.rn = 1

With SQL-Server as rdbms you can use a ranking function like ROW_NUMBER:
WITH CTE AS
(
SELECT UniqueID, GroupID, Title,
RN = ROW_NUMBER() OVER (PARTITON BY GroupID
ORDER BY UniqueID DESC)
FROM dbo.TableName
)
SELECT UniqueID, GroupID, Title
FROM CTE
WHERE RN = 1
This returns exactly one record for each GroupID even if there are multiple rows with the highest UniqueID (the name does not suggest so). If you want to return all rows in then use DENSE_RANK instead of ROW_NUMBER.
Here you can see all functions and how they work: http://technet.microsoft.com/en-us/library/ms189798.aspx

Since you have not mentioned any RDBMS, this statement below will work on almost all RDBMS. The purpose of the subquery is to get the greatest uniqueID for every GROUPID. To be able to get the other columns, the result of the subquery is joined on the original table.
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT GroupID, MAX(uniqueID) uniqueID
FROM tableName
GROUP By GroupID
) b ON a.GroupID = b.GroupID
AND a.uniqueID = b.uniqueID
In the case that your RDBMS supports Qnalytic functions, you can use ROW_NUMBER()
SELECT uniqueid, groupid, title
FROM
(
SELECT uniqueid, groupid, title,
ROW_NUMBER() OVER (PARTITION BY groupid
ORDER BY uniqueid DESC) rn
FROM tableName
) x
WHERE x.rn = 1
TSQL Ranking Functions
The ROW_NUMBER() generates sequential number which you can filter out. In this case the sequential number is generated on groupid and sorted by uniqueid in descending order. The greatest uniqueid will have a value of 1 in rn.

SELECT *
FROM the_table tt
WHERE NOT EXISTS (
SELECT *
FROM the_table nx
WHERE nx.GroupID = tt.GroupID
AND nx.UniqueID > tt.UniqueID
)
;
Should work in any DBMS (no window functions or CTEs are needed)
is probably faster than a sub query with an aggregate

Keeping it simple:
select * from test2
where UniqueID in (select max(UniqueID) from test2 group by GroupID)
Considering:
create table test2
(
UniqueID numeric,
GroupID numeric,
Title varchar(100)
)
insert into test2 values(1,1,'TEST 1')
insert into test2 values(2,1,'TEST 2')
insert into test2 values(3,3,'TEST 3')
insert into test2 values(4,3,'TEST 4')
insert into test2 values(5,5,'TEST 5')
insert into test2 values(6,6,'TEST 6')
insert into test2 values(7,6,'TEST 7')
insert into test2 values(8,6,'TEST 8')

Related

Find the count of IDs that have the same value

I'd like to get a count of all of the Ids that have have the same value (Drops) as other Ids. For instance, the illustration below shows you that ID 1 and 3 have A drops so the query would count them. Similarly, ID 7 & 18 have B drops so that's another two IDs that the query would count totalling in 4 Ids that share the same values so that's what my query would return.
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 2 | C |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
I've tried the several approaches but the following query was my last attempt.
With cte1 (Id1, D1) as
(
select Id, Drops
from Posts
),
cte2 (Id2, D2) as
(
select Id, Drops
from Posts
)
Select count(distinct c1.Id1) newcnt, c1.D1
from cte1 c1
left outer join cte2 c2 on c1.D1 = c2.D2
group by c1.D1
The result if written out in full would be a single value output but the records that the query should be choosing should look as follows:
+------+-------+
| ID | Drops |
+------+-------+
| 1 | A |
| 3 | A |
| 7 | B |
| 18 | B |
+------+-------+
Any advice would be great. Thanks

You can use a CTE to generate a list of Drops values that have more than one corresponding ID value, and then JOIN that to Posts to find all rows which have a Drops value that has more than one Post:
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT P.*
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output:
ID Drops
1 A
3 A
7 B
18 B
If desired you can then count those posts in total (or grouped by Drops value):
WITH CTE AS (
SELECT Drops
FROM Posts
GROUP BY Drops
HAVING COUNT(*) > 1
)
SELECT COUNT(*) AS newcnt
FROM Posts P
JOIN CTE ON P.Drops = CTE.Drops
Output
newcnt
4
Demo on SQLFiddle

You may use dense_rank() to resolve your problem. if drops has the same ID then dense_rank() will provide the same rank.
Here is the demo.
with cte as
(
select
drops,
count(distinct rnk) as newCnt
from
( select
*,
dense_rank() over (partition by drops order by id) as rnk
from myTable
) t
group by
drops
having count(distinct rnk) > 1
)
select
sum(newCnt) as newCnt
from cte
Output:
|newcnt |
|------ |
| 4 |

First group the count of the ids for your drops and then sum the values greater than 1.
select sum(countdrops) as total from
(select drops , count(id) as countdrops from yourtable group by drops) as temp
where countdrops > 1;

How to get 50% records from a table in SQL Server?

Suppose I have a table with 1000 rows and I want 50% of it in the output. How can I do that? Does it have any in-built function?

Use :
SELECT
TOP 50 PERCENT *
FROM
Table1;
with Row_number
SELECT
TOP 50 PERCENT Row_Number() over (order by Column1) ,*
FROM
Table1;
Note: Row_number should have a over clause with order by column or partition by columns

The top syntax supports a percent modifier, which you can use:
SELECT TOP 50 PERCENT *
FROM mytable

Here is the solution:
select top 50 percent *
from TableName

In TSQL you can use TOP n PERCENT but you should also order the output so that the "percentage of" is also specified, otherwise the result is indeterminate. By way of a simple example if rows are unordered (in this case the first insert is 6 not 1):
CREATE TABLE mytable (id INT)
INSERT INTO mytable (id)
VALUES
(6)
, (7)
, (8)
, (9)
, (10)
, (1)
, (2)
, (3)
, (4)
, (5) ;
This, if we simply ask for top 50 percent the output is
select top 50 percent
id
from mytable
| id |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
But if we use an order by clause then the result is more meaningful.
select top 50 percent
id
from mytable
order by id
| id |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
It was also asked if a similar result could be determined using row_number(), so here is a method
select
id
from (
select
id
, count(*) over(partition by (select 1)) all_count
, row_number() over(order by id) rn
from mytable
) d
where rn <= all_count / 2
| id |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
db<>fiddle here

SELECT * FROM table
LIMIT (SELECT COUNT(*)/2 FROM table)

Get row which matched in each group

I am trying to make a sql query. I got some results from 2 tables below. Below results are good for me. Now I want those values which is present in each group. for example, A and B is present in each group(in each ID). so i want only A and B in result. and also i want make my query dynamic. Could anyone help?
| ID | Value |
|----|-------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | A |
| 3 | B |

In the following query, I have placed your current query into a CTE for further use. We can try selecting those values for which every ID in your current result appears. This would imply that such values are associated with every ID.
WITH cte AS (
-- your current query
)
SELECT Value
FROM cte
GROUP BY Value
HAVING COUNT(DISTINCT ID) = (SELECT COUNT(DISTINCT ID) FROM cte);
Demo

The solution is simple - you can do this in two ways at least. Group by letters (Value), aggregate IDs with SUM or COUNT (distinct values in ID). Having that, choose those letters that have the value for SUM(ID) or COUNT(ID).
select Value from MyTable group by Value
having SUM(ID) = (SELECT SUM(DISTINCT ID) from MyTable)
select Value from MyTable group by Value
having COUNT(ID) = (SELECT COUNT(DISTINCT ID) from MyTable)

Use This
WITH CTE
AS
(
SELECT
Value,
Cnt = COUNT(DISTINCT ID)
FROM T1
GROUP BY Value
)
SELECT
Value
FROM CTE
WHERE Cnt = (SELECT COUNT(DISTINCT ID) FROM T1)

MS SQL - Select only one row for an ID

I have a changelog with insert / update / delete operations:
change_id | object_id | operation
----------+-----------+----------
1 | 1 | insert
2 | 2 | insert
3 | 1 | delete
4 | 1 | insert
5 | 3 | insert
6 | 2 | delete
7 | 4 | insert
8 | 3 | update
I need to select only the last row for each object_id and keep the result sorted by change_id. The result should look like this:
change_id | object_id | operation
----------+-----------+----------
4 | 1 | insert
6 | 2 | delete
7 | 4 | insert
8 | 3 | update
How can I do this? Is it possible with a simple query, without stored procedures?

SQL Fiddle:
SELECT c.change_id, c.object_id, c.operation
FROM
(
SELECT MAX(change_id) AS CID
FROM changelog
GROUP BY object_id
) s
INNER JOIN changelog c on c.change_id = s.CID

;WITH MyCTE AS
(
SELECT change_id,
object_id,
operation,
ROW_NUMBER() OVER(PARTITION BY object_id ORDER BY change_id DESC) AS rn
FROM ChangeLog
)
SELECT change_id,
object_id,
operation
FROM MyCTE
WHERE rn = 1
SQL Fiddle Demo

Using Ranking functions SQL Server Ranking Functions:
SELECT change_id, object_id, operation
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY object_id ORDER BY change_id DESC) as rownum,
change_id, object_id, operation
FROM yourtable
)a
WHERE rownum = 1

The notion of "last" is easy to establish if you add a timestamp column (e.g. LAST_UPDATE) that captures when the operation occurred. You would SELECT the row WHERE the LAST_UPDATE is the max value

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.

For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
SQL Fiddle
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.

If the ids are truly sequential, you can do:
select t.*,
(id - rowNumber) as grp
from t

Also you can use recursive CTE
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
Demo on SQLFiddle

How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Selecting row with highest ID based on another column - sql

SELECT * FROM (SELECT uniqueid, groupid, title, Row_number() OVER ( partition BY groupid ORDER BY uniqueid DESC) AS rn FROM table) a WHERE a.rn = 1

SELECT * FROM the_table tt WHERE NOT EXISTS ( SELECT * FROM the_table nx WHERE nx.GroupID = tt.GroupID AND nx.UniqueID > tt.UniqueID ) ; Should work in any DBMS (no window functions or CTEs are needed) is probably faster than a sub query with an aggregate

Related

Find the count of IDs that have the same value

How to get 50% records from a table in SQL Server?

Get row which matched in each group

MS SQL - Select only one row for an ID

Grouping SQL Results based on order

Categories

Resources