Fastest way to check for unique entries in Postgres - sql

I have a table that looks something like this:
first | last
John | Smith
Bob | dfgdf
John | fggf
John | Smith
And I want to run a query that will return only rows that have a unique last name for each first name. So only Bob dfgdf should be returned. Currently, I'm grouping twice and checking if count = 1, but is there a faster way?
SELECT first FROM (
SELECT first, last FROM table1 GROUP BY first, last
)as t1 GROUP BY first HAVING COUNT(*) = 1

Try this version:
SELECT first
FROM table1
GROUP BY first
HAVING COUNT(*) = COUNT(DISTINCT last);
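This keeps only the first names whose row count equals their count of distinct last names, i.e. first names for which no last name is repeated. Note that this is not quite the same condition as in the question's query, which keeps first names that have exactly one distinct last name; the two agree on the sample data, where only Bob is returned.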
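As a quick check of the COUNT comparison, here is a minimal sketch that runs the query against the question's sample rows. It assumes Python's built-in sqlite3 module as a stand-in for Postgres; the column names are double-quoted only because FIRST and LAST are keywords in SQLite.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 ("first" TEXT, "last" TEXT)')
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("John", "Smith"), ("Bob", "dfgdf"), ("John", "fggf"), ("John", "Smith")],
)

# Keep first names whose row count equals their number of distinct last names.
count_query = """
    SELECT "first"
    FROM table1
    GROUP BY "first"
    HAVING COUNT(*) = COUNT(DISTINCT "last")
"""
count_result = [row[0] for row in conn.execute(count_query)]
print(count_result)
```

John has three rows but only two distinct last names, so he is filtered out; Bob has one row and one distinct last name, so he is kept.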
Edit:
If you want all columns from all matching rows, then you may try:
WITH cte AS (
SELECT first
FROM table1
GROUP BY first
HAVING COUNT(*) = COUNT(DISTINCT last)
)
SELECT t1.*
FROM table1 t1
INNER JOIN cte t2
ON t1.first = t2.first;

I would do this as:
SELECT first
FROM table1
GROUP BY first
HAVING MIN(last) = MAX(last);
Actually, this should make use of an index on table1(first, last).
If the above doesn't use the index, then I would expect the fastest way to be:
select distinct on (first) first
from table1 t1
where not exists (select 1 from table1 tt1 where tt1.first = t1.first and tt1.last <> t1.last)
order by first;
This can make use of an index on table1(first, last) for performance.
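The two queries above can be compared side by side in a small sketch. It assumes Python's sqlite3 module in place of Postgres, so `DISTINCT ON (first)` is replaced by a plain `DISTINCT`. The two Ann rows are a hypothetical addition, included to show that a first name with the same last name repeated is still returned by both queries.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
# Column names are quoted because FIRST and LAST are keywords in SQLite.
conn.execute('CREATE TABLE table1 ("first" TEXT, "last" TEXT)')
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        ("John", "Smith"), ("Bob", "dfgdf"), ("John", "fggf"), ("John", "Smith"),
        ("Ann", "Lee"), ("Ann", "Lee"),  # hypothetical rows, not in the question
    ],
)
# The index both queries are expected to benefit from.
conn.execute('CREATE INDEX idx_first_last ON table1 ("first", "last")')

# A first name has a single last name exactly when MIN(last) = MAX(last).
min_max_query = """
    SELECT "first"
    FROM table1
    GROUP BY "first"
    HAVING MIN("last") = MAX("last")
    ORDER BY "first"
"""
min_max = [row[0] for row in conn.execute(min_max_query)]

# Keep rows for which no other row has the same first name and a different last name.
not_exists_query = """
    SELECT DISTINCT t1."first"
    FROM table1 t1
    WHERE NOT EXISTS (
        SELECT 1 FROM table1 tt1
        WHERE tt1."first" = t1."first" AND tt1."last" <> t1."last"
    )
    ORDER BY t1."first"
"""
not_exists = [row[0] for row in conn.execute(not_exists_query)]
print(min_max, not_exists)
```

Whether the index is actually used depends on the planner and the data, so it is worth checking with EXPLAIN on the real table.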

Related

Stuck to select maximum row

I have a table with columns:
ID | FULLNAME | VALUE
01 | Joseph | 10
02 | Sam | 50
... | ... | ...
I need to select the row with the maximum value and show it like this:
FULLNAME | VALUE
I tried using the group function MAX(), but then I can't select fullname: if I add it as a GROUP BY expression, I get the maximum within each group instead.
Another way is to use a WITH statement, order the table by value desc, and use the
rank() OVER (PARTITION BY ID) AS max_id
function, so that the maximum value ends up at max_id = 1, and then use
WHERE max_id = 1
to remove the other rows.
But I think there is a better way to do this, and I can't find one.
UPDATE:
A tricky solution to this problem is
SELECT *
FROM t t1
LEFT JOIN t t2 ON t1.value<t2.value
WHERE t2.value IS NULL
The simplest way is to sort the data and pull one row:
select t.*
from t
order by value desc
fetch first 1 row only;
If you want ties, use fetch first 1 row with ties instead of fetch first 1 row only.
Another method is:
select t.*
from t
where t.value = (select max(t2.value) from t t2);
This can have very good performance with an index on value.
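Both approaches can be tried out with a small sketch. It assumes Python's sqlite3 module rather than the asker's database, so `fetch first 1 row only` is written as SQLite's `LIMIT 1`; the third row (Ana) is a hypothetical addition to the two rows shown in the question.

```python
import sqlite3

# In-memory SQLite database (an assumption; the question does not name its DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, fullname TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("01", "Joseph", 10), ("02", "Sam", 50), ("03", "Ana", 30)],  # Ana is hypothetical
)
# Index that the MAX() subquery can use.
conn.execute("CREATE INDEX idx_t_value ON t (value)")

# Sort descending and keep one row (LIMIT 1 plays the role of FETCH FIRST 1 ROW ONLY).
top_by_sort = conn.execute(
    "SELECT fullname, value FROM t ORDER BY value DESC LIMIT 1"
).fetchall()

# Compare each row against the overall maximum; this variant also returns ties.
top_by_max = conn.execute(
    "SELECT fullname, value FROM t WHERE value = (SELECT MAX(t2.value) FROM t t2)"
).fetchall()
print(top_by_sort, top_by_max)
```

The two results differ only when several rows share the maximum value: the sorted version returns one of them, the MAX() version returns all of them.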

How to find Max value in a column in SQL Server 2012

I want to find the max value in a column
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
2 1 10 P2
3 2 50 P2
4 2 80 P1
Above is my table structure. I want to find only the rows with the maximum total value. Of these four rows, IDs 1 and 2 have the same CName but different Tot_Val and PName values. I expect to get the row with the maximum Tot_Val among IDs 1 and 2.
Expected result:
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
4 2 80 P1
I need the result to look like the one shown above.
select Max(Tot_Val), CName
from table1
where PName in ('P1', 'P2')
group by CName
This is the query I have tried, but my problem is that I am not able to bring PName into the result. If I add PName to the select list, the number of rows multiplies: for example, the result has 100 rows, but when I add PName to the select list and the group by list, it shows 600 rows. That is the problem.
Can someone please help me to resolve this.
One possible option is to use a subquery. Give each row a number within each CName group ordered by Tot_Val. Then select the rows with a row number equal to one.
select x.*
from ( select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt ) x
where x.No = 1;
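The row-numbering approach can be checked against the question's four rows with a short sketch. It assumes Python's sqlite3 module in place of SQL Server (window functions need SQLite 3.25 or newer), and it names the row-number column `rn` instead of `No`, since NO is a keyword in SQLite.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, CName INTEGER, Tot_Val INTEGER, PName TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, 1, 100, "P1"), (2, 1, 10, "P2"), (3, 2, 50, "P2"), (4, 2, 80, "P1")],
)

# Number the rows within each CName from highest Tot_Val down, then keep number 1.
row_number_query = """
    SELECT x.ID, x.CName, x.Tot_Val, x.PName
    FROM (
        SELECT mt.ID, mt.CName, mt.Tot_Val, mt.PName,
               ROW_NUMBER() OVER (PARTITION BY mt.CName ORDER BY mt.Tot_Val DESC) AS rn
        FROM MyTable mt
    ) x
    WHERE x.rn = 1
    ORDER BY x.ID
"""
top_rows = conn.execute(row_number_query).fetchall()
print(top_rows)
```

If two rows in a group tie on Tot_Val, ROW_NUMBER() picks one of them arbitrarily; RANK() would keep both.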
An alternative would be to use a common table expression (CTE) instead of a subquery to isolate the first result set.
with x as
(
select mt.ID,
mt.CName,
mt.Tot_Val,
mt.PName,
row_number() over(partition by mt.CName order by mt.Tot_Val desc) as No
from MyTable mt
)
select x.*
from x
where x.No = 1;
See both solutions in action in this fiddle.
You can search top-n-per-group for this kind of a query.
There are two common ways to do it. The most efficient method depends on your indexes and data distribution and whether you already have another table with the list of all CName values.
Using ROW_NUMBER
WITH
CTE
AS
(
SELECT
ID, CName, Tot_Val, PName,
ROW_NUMBER() OVER (PARTITION BY CName ORDER BY Tot_Val DESC) AS rn
FROM table1
)
SELECT
ID, CName, Tot_Val, PName
FROM CTE
WHERE rn=1
;
Using CROSS APPLY
WITH
CTE
AS
(
SELECT CName
FROM table1
GROUP BY CName
)
SELECT
A.ID
,A.CName
,A.Tot_Val
,A.PName
FROM
CTE
CROSS APPLY
(
SELECT TOP(1)
table1.ID
,table1.CName
,table1.Tot_Val
,table1.PName
FROM table1
WHERE
table1.CName = CTE.CName
ORDER BY
table1.Tot_Val DESC
) AS A
;
See the very detailed answer on dba.se, "Retrieving n rows per group", or "Get top 1 row of each group".
CROSS APPLY might be as fast as a correlated subquery, but the correlated subquery often has very good performance (and better than ROW_NUMBER()):
select t.*
from t
where t.tot_val = (select max(t2.tot_val)
from t t2
where t2.cname = t.cname
);
Note: The performance depends on having an index on (cname, tot_val).
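A minimal sketch of the correlated-subquery version, including the index the note refers to. It assumes Python's sqlite3 module in place of SQL Server, and the index name is made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, cname INTEGER, tot_val INTEGER, pname TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 1, 100, "P1"), (2, 1, 10, "P2"), (3, 2, 50, "P2"), (4, 2, 80, "P1")],
)
# Composite index so the per-cname MAX() lookup can be answered from the index.
conn.execute("CREATE INDEX idx_t_cname_tot_val ON t (cname, tot_val)")

# For every row, compare tot_val with the maximum tot_val of its own cname.
correlated_query = """
    SELECT t.id, t.cname, t.tot_val, t.pname
    FROM t
    WHERE t.tot_val = (SELECT MAX(t2.tot_val)
                       FROM t t2
                       WHERE t2.cname = t.cname)
    ORDER BY t.id
"""
max_rows = conn.execute(correlated_query).fetchall()
print(max_rows)
```

Unlike the ROW_NUMBER() version, this returns every row that ties for the maximum within its group.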

Select single first occurrence of row against distinct local ID from a table and insert it in another table

I want a PostgreSQL query that selects only the first row for each distinct LocalID and inserts the result into another table.
Records:
ID | LocalID | Name
1 | 233 | Tim
2 | 633 | John
3 | 633 | Alex
4 | 234 | Mike
5 | 233 | Dave
6 | 556 | Kim
Wanted result:
ID | LocalID | Name
1 | 233 | Tim
2 | 633 | John
4 | 234 | Mike
6 | 556 | Kim
I tried using
CREATE TABLE Weeklylist AS (select distinct on (localid) * from Monthlylist)
But this query selects the last record for each distinct LocalID and enters it into the table. All I want is for the first occurrence of each distinct LocalID to be entered into the table.
The use of distinct on in your existing statement indicates that you are using Postgres.
The problem with your query is that it is missing an ORDER BY clause. Without it, it is undefined which record will be selected (you are seeing the last record being picked, but this is not guaranteed to be consistent over subsequent executions of the same query). So, add the ORDER BY clause:
create table Weeklylist as
select distinct on (localid) * from Monthlylist order by localid, id
Side note: parentheses around the select statement are superfluous here.
You can use DISTINCT ON in PostgreSQL:
CREATE TABLE Weeklylist
AS
SELECT DISTINCT ON (LocalID) *
FROM Monthlylist ml
ORDER BY LocalID, ID -- Missing in your query
In older versions of MySQL, a correlated subquery is one way:
SELECT ml.*
FROM Monthlylist ml
WHERE ml.id = (SELECT MIN(ml1.id) FROM Monthlylist ml1 WHERE ml1.LocalID = ml.LocalID);
This will give you what you need:
select *
from Monthlylist
where id in (
select min(id)
from Monthlylist
group by localid
);
To store the result in the new table instead:
create table WeeklyList as
select *
from Monthlylist
where id in (
select min(id)
from Monthlylist
group by localid
)
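The MIN(id) variant is plain SQL, so it can be tried outside Postgres too. The sketch below assumes Python's sqlite3 module and loads the question's six rows before creating WeeklyList.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Monthlylist (ID INTEGER, LocalID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO Monthlylist VALUES (?, ?, ?)",
    [(1, 233, "Tim"), (2, 633, "John"), (3, 633, "Alex"),
     (4, 234, "Mike"), (5, 233, "Dave"), (6, 556, "Kim")],
)

# Keep, for every LocalID, the row with the smallest ID and store it in a new table.
conn.execute("""
    CREATE TABLE WeeklyList AS
    SELECT *
    FROM Monthlylist
    WHERE ID IN (SELECT MIN(ID) FROM Monthlylist GROUP BY LocalID)
""")
weekly_rows = conn.execute("SELECT ID, LocalID, Name FROM WeeklyList ORDER BY ID").fetchall()
print(weekly_rows)
```

This treats "first occurrence" as "smallest ID", which matches the wanted result in the question.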
You need a subquery and join to get your desired output.
create table Weeklylist AS (
select t.* from Monthlylist t
inner join (select distinct on (localid) * from Monthlylist) t1 on t1.localid = t.localid and t.id = t1.id
order by id, localid)

What exactly does SELECT DISTINCT(COUNT(*)) do?

I used the following query and it returned what I wanted it to return, but I'm having a tough time wrapping my head around what the query is doing.
Query is nothing fancier than what's in the title: select distinct(count(*)) from table1
Distinct is not required in your SQL, as you are going to get only one result: count(*) without a group by clause returns the count of all rows in that table.
Hence try this:
select count(*) from table1
Distinct is used for finding the distinct values in a group of values.
Say you have table1, with column1 as:
Column1
----------
a
a
b
b
a
c
Running the following queries gives:
1) select count(*) from table1
output: 6
2) select distinct(count(*)) from table1
output: 6
3) select count(distinct column1) from table1
output: 3
Usually distinct is used inside count, preferably with a particular column:
select count( distinct column_name_n ) from table1
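The three outputs listed above can be reproduced with a short sketch. It assumes Python's sqlite3 module as the database and loads the six sample values into table1.

```python
import sqlite3

# In-memory SQLite database (an assumption; the question does not name its DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT)")
# The six sample values from the answer: a, a, b, b, a, c.
conn.executemany("INSERT INTO table1 VALUES (?)", [(v,) for v in "aabbac"])

# 1) Count of all rows.
total = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
# 2) DISTINCT applied to the single-row result of COUNT(*): still one row.
distinct_total = conn.execute("SELECT DISTINCT(COUNT(*)) FROM table1").fetchall()
# 3) Count of distinct values in column1.
distinct_values = conn.execute("SELECT COUNT(DISTINCT column1) FROM table1").fetchone()[0]
print(total, distinct_total, distinct_values)
```

Queries 1 and 2 return the same single value, which shows that the outer DISTINCT has nothing to deduplicate.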
The distinct is redundant. Select Count(*) without a group by can only generate one value, so distinct (which would eliminate duplicates) is irrelevant.
If you had multiple outputs (if, for example, you were grouping on something), then it would cause the query to display only one output row for every distinct value of count(*) that would otherwise be generated.
if, for example, you had
name
Bob
Bob
Bob
Bob
Mary
Mary
Mary
Mary
Dave
Dave
Al
George
then
select count(*)
From table
group By name
would result in
4
4
2
1
1
but
select distinct count(*)
From table
group By name
would result in
4
2
1
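The grouped example can be reproduced with a short sketch. It assumes Python's sqlite3 module and uses a hypothetical table name, `names`, because `table` is a reserved word. Without an ORDER BY the row order is not guaranteed, so the counts are sorted in Python before printing.

```python
import sqlite3

# In-memory SQLite database (an assumption; the question does not name its DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")  # hypothetical table name
sample = ["Bob"] * 4 + ["Mary"] * 4 + ["Dave"] * 2 + ["Al", "George"]
conn.executemany("INSERT INTO names VALUES (?)", [(n,) for n in sample])

# One count per name.
counts = sorted(
    (row[0] for row in conn.execute("SELECT COUNT(*) FROM names GROUP BY name")),
    reverse=True,
)
# Same counts, with duplicate count values collapsed by DISTINCT.
distinct_counts = sorted(
    (row[0] for row in conn.execute("SELECT DISTINCT COUNT(*) FROM names GROUP BY name")),
    reverse=True,
)
print(counts, distinct_counts)
```

Here DISTINCT does change the result, because Bob and Mary both yield 4 and Al and George both yield 1.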

Find distinct groups within table in SQL Server

I have a table in SQL Server which has records like:
ID Name
---------------------
1 CTSH
1 JPMC
1 CSFB
2 CSFB
2 JPMC
2 CTSH
3 CTSH
3 MSSB
4 CTSH
4 JPMC
4 CSFB
5 CTSH
5 MSSB
I want to find all the distinct groups based on Name. For example, the Names with ID 1 are exactly the same as the Names with ID 2 and ID 4. In this case, I would like to select the records for ID 1 only.
Here is what my final output should look like:
ID Name
---------------------
1 CTSH
1 JPMC
1 CSFB
3 MSSB
3 CTSH
You just need to aggregate the ID for every name using MIN()
SELECT MIN(ID) ID, Name
FROM tableName
GROUP BY Name
By simply doing this, all the distinct names are displayed along with the minimum ID assigned to each:
SELECT DISTINCT Name , MIN(ID) ID
FROM tableName
Group BY NAME
This is rather complicated, because you are trying to match two sets. Here is one way to approach this, using full outer join:
select *
from t
where t.id in (
    select distinct min(a.id1) as idunique
    from (select t1.id as id1, t2.id as id2
          from (select t.*, count(*) over (partition by id) as NumNames
                from t
               ) t1 full outer join
               (select t.*, count(*) over (partition by id) as NumNames
                from t
               ) t2
               on t1.name = t2.name
          group by t1.id, t2.id, t1.NumNames, t2.NumNames
          having count(*) = t1.NumNames and count(*) = t2.NumNames
         ) a
    group by a.id2
)
Ok, this is rather complicated. Two ids have the same set of names when all the names match and the number of matching names is the number of names on each one. This is what the aggregation/full-outer-join subquery does. The result is a set of all pairs of ids that match (including the identity).
Then, the minimum id is extracted from these pairs, using aggregation with min(), and this id is the one chosen for the final join to get all the rows corresponding to that set.
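The set-matching idea can be sketched in a runnable form. This is a variant, not the exact query above: it assumes Python's sqlite3 module in place of SQL Server, uses an inner join instead of a full outer join (unmatched rows never satisfy the count condition anyway), and gets the per-id name counts from grouped subqueries instead of window functions. It also assumes a name appears at most once per id.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "CTSH"), (1, "JPMC"), (1, "CSFB"), (2, "CSFB"), (2, "JPMC"), (2, "CTSH"),
     (3, "CTSH"), (3, "MSSB"), (4, "CTSH"), (4, "JPMC"), (4, "CSFB"),
     (5, "CTSH"), (5, "MSSB")],
)

# Two ids match when the number of shared names equals the name count of both ids.
# For every id, the smallest matching id represents its group.
group_query = """
    SELECT t.id, t.name
    FROM t
    WHERE t.id IN (
        SELECT MIN(a.id1)
        FROM (
            SELECT t1.id AS id1, t2.id AS id2
            FROM t AS t1
            JOIN t AS t2 ON t1.name = t2.name
            JOIN (SELECT id, COUNT(*) AS num_names FROM t GROUP BY id) AS c1 ON c1.id = t1.id
            JOIN (SELECT id, COUNT(*) AS num_names FROM t GROUP BY id) AS c2 ON c2.id = t2.id
            GROUP BY t1.id, t2.id, c1.num_names, c2.num_names
            HAVING COUNT(*) = c1.num_names AND COUNT(*) = c2.num_names
        ) AS a
        GROUP BY a.id2
    )
    ORDER BY t.id, t.name
"""
group_rows = conn.execute(group_query).fetchall()
print(group_rows)
```

On the sample data, ids 1, 2 and 4 collapse to id 1, and ids 3 and 5 collapse to id 3, which gives the five rows of the expected output.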