Find distinct groups within table in SQL Server - sql

I have a table in SQL Server which has records like:
ID Name
---------------------
1 CTSH
1 JPMC
1 CSFB
2 CSFB
2 JPMC
2 CTSH
3 CTSH
3 MSSB
4 CTSH
4 JPMC
4 CSFB
5 CTSH
5 MSSB
I want to find out all the distinct groups based on Name. For example, all the Names with ID 1 are exactly same as the Name with ID 2 and 4. In this case, I would like to select all the records for ID 1 only.
Here is how my final output should look like:
ID Name
---------------------
1 CTSH
1 JPMC
1 CSFB
3 MSSB
3 CTSH

You just need to aggregate the ID for every name using MIN()
SELECT MIN(ID) ID, Name
FROM tableName
GROUP BY Name
SQLFiddle Demo

By simply doing this all the distinct names would be displayed along with their min ID allocated to it:
SELECT DISTINCT Name , MIN(ID) ID
FROM tableName
Group BY NAME

This is rather complicated, because you are trying to match two sets. Here is one way to approach this, using full outer join:
select *
from t
where t.id in (
select distinct min(a.id) as idunique
from (select t1.id, t2.id
from (select t.*, count(*) over (partition by id) as NumNames
from t
) t1 full outer join
(select t.*, count(*) over (partition by id) as NumNames
from t
) t2
on t1.name = t2.name
group by t1.id, t2.id
having count(*) = t1.NumNames and count(*) = t2.NumNames
) a
group by t2.id
)
Ok, this is rather complicated. Two ids have the asme set of names when all the names match and the number of matching names is the number of names on each one. This is what the aggregation/full-outer-join subquery does. The result is a set of all pairs of ids that match (including the identity).
Then, the minimum id is extracted from these pairs, using aggregation with min(), and this id is the one chosen for the final join to get all the rows corresponding to that set.

Related

COUNT of GROUP of two fields in SQL Query -- Postgres

I have a table in postgres with 2 fields: they are columns of ids of users who have looked at some data, under two conditions:
viewee viewer
------ ------
93024 66994
93156 93151
93163 113671
137340 93161
92992 93161
93161 93135
93156 93024
And I want to group them by both viewee and viewer field, and count the number of occurrences, and return that count
from high to low:
id count
------ -----
93161 3
93156 2
93024 2
137340 1
66994 1
92992 1
93135 1
93151 1
93163 1
I have been running two queries, one for each column, and then combining the results in my JavaScript application code. My query for one field is...
SELECT "viewer",
COUNT("viewer")
FROM "public"."friend_currentfriend"
GROUP BY "viewer"
ORDER BY count DESC;
How would I rewrite this query to handle both fields at once?
You can combine to columns from the table into a single one by using union all then use group by as below:
select id ,count(*) Count from (
select viewee id from vv
union all
select viewer id from vv) t
group by id
order by count(*) desc
Results:
This is a good place to use a lateral join:
select v.viewx, count(*)
from t cross join lateral
(values (t.viewee), (t.viewer)) v(viewx)
group by v.viewx
order by count(*) desc;
You can try this :
SELECT a.ID,
SUM(a.Total) as Total
FROM (SELECT t.Viewee AS ID,
COUNT(t.Viewee) AS Total
FROM #Temp t
GROUP BY t.Viewee
UNION
SELECT t.Viewer AS ID,
COUNT(t.Viewer) AS Total
FROM #Temp t
GROUP BY t.Viewer
) a
GROUP BY a.ID
ORDER BY SUM(a.Total) DESC

How to find Max value in a column in SQL Server 2012

I want to find the max value in a column
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
2 1 10 P2
3 2 50 P2
4 2 80 P1
Above is my table structure. I just want to find the max total value only from the table. In that four row ID 1 and 2 have same value in CName but total val and PName has different values. What I am expecting is have to find the max value in ID 1 and 2
Expected result:
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
4 2 80 P1
I need result same as like mention above
select Max(Tot_Val), CName
from table1
where PName in ('P1', 'P2')
group by CName
This is query I have tried but my problem is that I am not able to bring PName in this table. If I add PName in the select list means it will showing the rows doubled e.g. Result is 100 rows but when I add PName in selected list and group by list it showing 600 rows. That is the problem.
Can someone please help me to resolve this.
One possible option is to use a subquery. Give each row a number within each CName group ordered by Tot_Val. Then select the rows with a row number equal to one.
select x.*
from ( select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt ) x
where x.No = 1;
An alternative would be to use a common table expression (CTE) instead of a subquery to isolate the first result set.
with x as
(
select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt
)
select x.*
from x
where x.No = 1;
See both solutions in action in this fiddle.
You can search top-n-per-group for this kind of a query.
There are two common ways to do it. The most efficient method depends on your indexes and data distribution and whether you already have another table with the list of all CName values.
Using ROW_NUMBER
WITH
CTE
AS
(
SELECT
ID, CName, Tot_Val, PName,
ROW_NUMBER() OVER (PARTITION BY CName ORDER BY Tot_Val DESC) AS rn
FROM table1
)
SELECT
ID, CName, Tot_Val, PName
FROM CTE
WHERE rn=1
;
Using CROSS APPLY
WITH
CTE
AS
(
SELECT CName
FROM table1
GROUP BY CName
)
SELECT
A.ID
,A.CName
,A.Tot_Val
,A.PName
FROM
CTE
CROSS APPLY
(
SELECT TOP(1)
table1.ID
,table1.CName
,table1.Tot_Val
,table1.PName
FROM table1
WHERE
table1.CName = CTE.CName
ORDER BY
table1.Tot_Val DESC
) AS A
;
See a very detailed answer on dba.se Retrieving n rows per group
, or here Get top 1 row of each group
.
CROSS APPLY might be as fast as a correlated subquery, but this often has very good performance (and better than ROW_NUMBER():
select t.*
from t
where t.tot_val = (select max(t2.tot_val)
from t t2
where t2.cname = t.cname
);
Note: The performance depends on having an index on (cname, tot_val).

Fastest way to check for unique entries in Postgres

I have a table that looks something like this:
first | last
John | Smith
Bob | dfgdf
John | fggf
John | Smith
And I want to run a query that will return only rows that have a unique last name for each first name. So only Bob dfgdf should be returned. Currently, I'm grouping twice and checking if count = 1, but is there a faster way?
SELECT first FROM (
SELECT first, last FROM table1 GROUP BY first, last
)as t1 GROUP BY first HAVING COUNT(*) = 1
Try this version:
SELECT first
FROM table1
GROUP BY first
HAVING COUNT(*) = COUNT(DISTINCT last);
Demo
This just retains only first names whose record count is coincident with the count of distinct last names, which would imply that each first name maps to a distinct last name.
Edit:
If you want all columns from all matching rows, then you may try:
WITH cte AS (
SELECT first
FROM table1
GROUP BY first
HAVING COUNT(*) = COUNT(DISTINCT last)
)
SELECT t1.*
FROM table1 t1
INNER JOIN cte t2
ON t1.first = t2.first;
I would do this as:
SELECT first
FROM table1
GROUP BY first
HAVING MIN(last) = MAX(last);
Actually, this should make use of an index on table1(first, last).
If the above doesn't use the index, then I would expect the fastest way to be:
select distinct on (first) first
from table1 t1
where not exists (select 1 from table1 tt1 where tt1.first = t1.first and tt1.last <> t1.last)
order by first;
This can make use of an index on table1(first, last) for performance.

SQL Query to get all rows with duplicate values but are not part of the same group

The database schema is organized as follows:
ID | GroupID | VALUE
--------------------
1 | 1 | A
2 | 1 | A
3 | 2 | B
4 | 3 | B
In this example, I want to GET all Rows with duplicate VALUE, but are not part of the same group. So the desired result set should be IDs (3, 4), because they are not in the same group (2, 3) but still have the same VALUE (B).
I'm having trouble writing a SQL Query and would appreciate any guidance. Thanks.
So far, I'm using SQL Count, but can't figure out what to do with the GroupId.
SELECT *
FROM TABLE T
HAVING COUNT(T.VALUE) > 1
GROUP BY ID, GroupId, VALUE
The simplest method for this is using EXISTS:
SELECT
ID
FROM
MyTable T1
WHERE
EXISTS (SELECT 1
FROM MyTable
WHERE Value = t1.Value
AND GroupID <> t1.GroupID)
Here is one method. First you have to identify the values that appear in more than one group and then use that information to find the right rows in the original table:
select *
from t
where value in (SELECT value
FROM TABLE T
GROUP BY VALUE
HAVING COUNT(distinct groupid) > 1
)
order by value
Actually, I prefer a slight variant in this case, by changing the HAVING clause:
HAVING min(groupid) <> max(groupid)
This works when you are looking for more than one group and should be faster than the COUNT DISTINCT version.
SELECT ALL_.*
FROM (SELECT *
FROM TABLE_
GROUP BY ID, GROUPID, VALUE
ORDER BY ID) GROUPED,
TABLE_ ALL_
WHERE GROUPED.VALUE = ALL_.VALUE
AND GROUPED.GROUPID <> ALL_.GROUPID

Fetch the row which has the Max value for a column in SQL Server

I found a question that was very similar to this one, but using features that seem exclusive to Oracle. I'm looking to do this in SQL Server.
I have a table like this:
MyTable
--------------------
MyTableID INT PK
UserID INT
Counter INT
Each user can have multiple rows, with different values for Counter in each row. I need to find the rows with the highest Counter value for each user.
How can I do this in SQL Server 2005?
The best I can come up with is a query the returns the MAX(Counter) for each UserID, but I need the entire row because of other data in this table not shown in my table definition for simplicity's sake.
EDIT: It has come to my attention from some of the answers in this post, that I forgot an important detail. It is possible to have 2+ rows where a UserID can have the same MAX counter value. Example below updated for what the expected data/output should be.
With this data:
MyTableID UserID Counter
--------- ------- --------
1 1 4
2 1 7
3 4 3
4 11 9
5 11 3
6 4 6
...
9 11 9
I want these results for the duplicate MAX values, select the first occurance in whatever order SQL server selects them. Which rows are returned isn't important in this case as long as the UserID/Counter pairs are distinct:
MyTableID UserID Counter
--------- ------- --------
2 1 7
4 11 9
6 4 6
I like to use a Common Table Expression for that case, with a suitable ROW_NUMBER() function in it:
WITH MaxPerUser AS
(
SELECT
MyTableID, UserID, Counter,
ROW_NUMBER() OVER(PARTITION BY userid ORDER BY Counter DESC) AS 'RowNumber'
FROM dbo.MyTable
)
SELECT MyTableID, UserID, Counter
FROM MaxPerUser
WHERE RowNumber = 1
THat partitions the data over the UserID, orders it by Counter (descending) for each user, and then labels each of the rows starting with 1 for each user. Select only those rows with a 1 for rownumber and you have your max. values per user.
It's that easy :-) And I get results something like this:
MyTableID UserID Counter
2 1 7
6 4 6
4 11 9
Only one entry per user, no matter how many rows per user happen to have the same max value.
I think this will help you.
SELECT distinct(a.userid), MAX(a.counterid) as counterid
FROM mytable a INNER JOIN mytable b ON a.mytableid = b.mytableid
GROUP BY a.userid
There are several ways to do this, take a look at this Including an Aggregated Column's Related Values Several methods are shown including the performance differences
Here is one example
select t1.*
from(
select UserID, max(counter) as MaxCount
from MyTable
group by UserID) t2
join MyTable t1 on t2.UserID =t1.UserID
and t1.counter = t2.counter
Try this... I'm pretty sure this is the only way to truly make sure you get one row per User.
SELECT MT.*
FROM MyTable MT
INNER JOIN (
SELECT MAX(MID.MyTableId) AS MaxMyTableId,
MID.UserId
FROM MyTable MID
INNER JOIN (
SELECT MAX(Counter) AS MaxCounter, UserId
FROM MyTable
GROUP BY UserId
) AS MC
ON (MID.UserId = MC.UserId
AND MID.Counter = MC.MaxCounter)
GROUP BY MID.UserId
) AS MID
ON (MID.UserId = MC.UserId
AND MID.MyTableId = MC.MaxMyTableId)
select m.*
from MyTable m
inner join (
select UserID, max(Counter) as MaxCounter
from MyTable
group by UserID
) mm on m.UserID = mm.UserID and m.Counter = mm.MaxCounter