Distinct on specific columns in SQL - sql

I know similar questions have already been asked here. However, most of them still want to return the first or last row when multiple rows share the same attributes. In my case, I simply want to discard the rows that have the same specific attributes.
For example, I have a toy dataset like this:
gender age name
f 20 zoe
f 20 natalia
m 39 tom
f 20 erika
m 37 eric
m 37 shane
f 22 jenn
I want to check for duplicates on gender and age only, then discard every row whose gender/age combination occurs more than once, which returns:
gender age name
m 39 tom
f 22 jenn

You could use the window (analytic) variant of COUNT to find the rows that have just one occurrence of the gender/age combination:
SELECT gender, age, name
FROM (SELECT gender, age, name, COUNT(*) OVER (PARTITION BY gender, age) AS cnt
FROM mytable) t
WHERE cnt = 1
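To see the window-count query run end to end, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory copy of the toy dataset from the question. The table name mytable comes from the query above. The choice of SQLite (which needs version 3.25 or later for window functions) and the added ORDER BY for a stable result are assumptions made for illustration.

```python
import sqlite3

# In-memory database holding the toy dataset from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (gender TEXT, age INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("f", 20, "zoe"),
        ("f", 20, "natalia"),
        ("m", 39, "tom"),
        ("f", 20, "erika"),
        ("m", 37, "eric"),
        ("m", 37, "shane"),
        ("f", 22, "jenn"),
    ],
)

# Count rows per (gender, age) without collapsing them, then keep only
# the rows whose combination occurs exactly once.
query = """
SELECT gender, age, name
FROM (SELECT gender, age, name,
             COUNT(*) OVER (PARTITION BY gender, age) AS cnt
      FROM mytable) t
WHERE cnt = 1
ORDER BY name  -- assumption: added only to make the output order stable
"""
unique_rows = conn.execute(query).fetchall()
print(unique_rows)
```

Only the tom and jenn rows survive, because every other gender/age pair appears at least twice.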

Use the HAVING clause in a CTE.
;WITH DistinctGenderAges AS
(
SELECT gender
,age
FROM YourTable
GROUP BY gender
,age
HAVING COUNT(*) = 1
)
SELECT yt.gender, yt.age, yt.name
FROM DistinctGenderAges dga
INNER JOIN YourTable yt ON dga.gender = yt.gender AND dga.age = yt.age

No matter what, you have to tell the database which value to pick for name. If you don't care, an easy solution is to group:
SELECT gender, age, MIN(name) as name FROM mytable GROUP BY gender, age HAVING COUNT(*)=1
You can use any valid aggregate for name, but you have to pick something.
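As a quick check of the grouping approach, the sketch below runs the GROUP BY ... HAVING query against the question's toy data using Python's sqlite3 module. SQLite and the added ORDER BY are assumptions for illustration. Since HAVING COUNT(*) = 1 keeps only single-row groups, MIN(name) simply returns that one row's name.

```python
import sqlite3

# In-memory database holding the toy dataset from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (gender TEXT, age INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("f", 20, "zoe"),
        ("f", 20, "natalia"),
        ("m", 39, "tom"),
        ("f", 20, "erika"),
        ("m", 37, "eric"),
        ("m", 37, "shane"),
        ("f", 22, "jenn"),
    ],
)

# Groups with more than one row are dropped by HAVING; for the remaining
# single-row groups the aggregate just passes the only name through.
query = """
SELECT gender, age, MIN(name) AS name
FROM mytable
GROUP BY gender, age
HAVING COUNT(*) = 1
ORDER BY gender, age  -- assumption: added only for a stable output order
"""
singles = conn.execute(query).fetchall()
print(singles)
```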

Related

How to get values of one column without the aggregate column?

I have this table:
first_name last_name age country
John Doe 31 USA
Robert Luna 22 USA
David Robinson 22 UK
John Reinhardt 25 UK
Betty Doe 28 UAE
How can I get only the names of the oldest per country?
When I do this query
SELECT first_name,last_name, MAX(age)
FROM Customers
GROUP BY country
I get this result:
first_name last_name MAX(age)
Betty Doe 31
John Reinhardt 22
John Doe 31
But I want to get only first name and last name without the aggregate function.
If window functions are an option, you can use ROW_NUMBER for this task.
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY country ORDER BY age DESC) AS rn
FROM Customers
)
SELECT first_name, last_name, age, country
FROM cte
WHERE rn = 1
It sounds like you want to get the oldest age per country first:
SELECT Country, MAX(age) AS MAX_AGE_IN_COUNTRY
FROM Customers
GROUP BY Country
With that, you want to match that back to the original table (aka a join) to see which names they match up to.
So, something like this perhaps:
SELECT Customers.*
FROM Customers
INNER JOIN
(
SELECT Country, MAX(age) AS MAX_AGE_IN_COUNTRY
FROM Customers
GROUP BY Country
) AS max_per_country_query
ON Customers.Country = max_per_country_query.Country
AND Customers.Age = max_per_country_query.MAX_AGE_IN_COUNTRY
If your database supports it, I prefer using the CTE style of handling these subqueries because it's easier to read and debug.
WITH cte_max_per_country AS (
SELECT Country, MAX(age) AS MAX_AGE_IN_COUNTRY
FROM Customers
GROUP BY Country
)
SELECT C.*
FROM Customers C
INNER JOIN cte_max_per_country
ON C.Country = cte_max_per_country.Country
AND C.Age = cte_max_per_country.MAX_AGE_IN_COUNTRY
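To make the join-back idea concrete, here is a small sketch that loads the question's five customers into an in-memory SQLite database via Python's sqlite3 module and runs a CTE version of the query. SQLite, the lowercase column names, the alias m, and the ORDER BY are assumptions for illustration. Only the names and the country are selected, since the asker did not want the aggregate in the output.

```python
import sqlite3

# In-memory database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers "
    "(first_name TEXT, last_name TEXT, age INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        ("John", "Doe", 31, "USA"),
        ("Robert", "Luna", 22, "USA"),
        ("David", "Robinson", 22, "UK"),
        ("John", "Reinhardt", 25, "UK"),
        ("Betty", "Doe", 28, "UAE"),
    ],
)

# Step 1 (CTE): highest age per country.
# Step 2: join back to the table to recover the matching names.
query = """
WITH cte_max_per_country AS (
    SELECT country, MAX(age) AS max_age_in_country
    FROM Customers
    GROUP BY country
)
SELECT C.first_name, C.last_name, C.country
FROM Customers C
INNER JOIN cte_max_per_country m
    ON C.country = m.country
   AND C.age = m.max_age_in_country
ORDER BY C.country  -- assumption: added only for a stable output order
"""
oldest = conn.execute(query).fetchall()
print(oldest)
```

Note that this form returns every customer who ties for the maximum age in a country, not just one of them.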

Select all unique values of all attributes in one query

I have a table and I want to select all unique values of all attributes in one query.
For example table Person with 3 columns name, age, city.
Example:
Name  age  city
Alex  34   New York
Leo   34   London
Roy   20   London
Alex  28   Moscow
Mike  36   London
And I want to have a result with unique values of every attribute
Name  age  city
Alex  20   New York
Leo   28   London
Roy   34   Moscow
      36
Is it possible to do this query?
I tried some queries with DISTINCT and UNION, but the result was always a multiplication of rows.
This is not how relational databases work, but sometimes you got to do what you got to do.
You can do:
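-- DISTINCT has to run before row_number(); otherwise every (value, rn) pair is already unique
select a.name, b.age, c.city
from (select name, row_number() over() as rn
from (select distinct name from t) x) a
full join (select age, row_number() over() as rn
from (select distinct age from t) x) b on b.rn = a.rn
full join (select city, row_number() over() as rn
from (select distinct city from t) x) c
on c.rn = coalesce(a.rn, b.rn)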
One option (in PostgreSQL) is to aggregate into arrays, then unnest those arrays:
select x.*
from (
select array_agg(distinct name) as names,
array_agg(distinct age) as ages,
array_agg(distinct city) as cities
from the_table
) d
cross join lateral unnest(d.names, d.ages, d.cities) with ordinality as x(name, age, city);
I would expect this to be quite slow if you really have many distinct values ("millions"), but if you only expect very few distinct values ("hundreds" or "thousands"), then this might be OK.
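If doing the alignment in SQL is not a hard requirement, a different technique is to fetch the distinct values of each column separately and pad the shorter columns in application code. The sketch below is only an illustration of that alternative, not one of the queries above: it assumes Python with the built-in sqlite3 module, a table named person, and a hypothetical helper distinct_values. Each column is sorted so the output is reproducible.

```python
import sqlite3
from itertools import zip_longest

# In-memory database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("Alex", 34, "New York"),
        ("Leo", 34, "London"),
        ("Roy", 20, "London"),
        ("Alex", 28, "Moscow"),
        ("Mike", 36, "London"),
    ],
)


def distinct_values(column):
    """Return the sorted distinct values of one column (hypothetical helper).

    Column names cannot be bound as SQL parameters, so this is only ever
    called with the fixed, known names below, never with user input.
    """
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM person ORDER BY {column}"
    )
    return [row[0] for row in rows]


columns = [distinct_values(c) for c in ("name", "age", "city")]

# Line the columns up side by side; shorter columns are padded with None.
aligned = list(zip_longest(*columns))
for row in aligned:
    print(row)
```

The values in each output row are unrelated to one another; the rows exist only to lay the three independent lists out side by side.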

Select distinct lines by a field

I am making a select that returns a table like this:
Name surname
Jhon a
Jhon b
Jhon c
Joe a
Joe b
Joe c
But what I need is just one occurrence of Jhon and one of Joe, each with one of the surnames.
I can only have one Jhon with one surname and one Joe with one surname.
I cannot use an ORDER BY because I need to select both Name and surname. Also, if I use DISTINCT I still get all the Jhons and Joes.
Can you help me?
You can just use aggregation:
select name, max(surname) as surname
from table t
group by name;
You can also do something similar with analytic functions:
select t.name, t.surname
from (select t.*, row_number() over (partition by name order by name) as seqnum
from table t
) t
where seqnum = 1;
This is particularly useful if you want to get more than one column from the same row.
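Here is a minimal runnable sketch of the row_number() variant, using Python's sqlite3 module (SQLite 3.25 or later) and the sample rows from the question. The table name people is hypothetical, chosen because table is a reserved word. Ordering each partition by surname instead of name is also an assumption, made so that the row picked per name is deterministic.

```python
import sqlite3

# In-memory database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Jhon", "a"), ("Jhon", "b"), ("Jhon", "c"),
        ("Joe", "a"), ("Joe", "b"), ("Joe", "c"),
    ],
)

# Number the rows within each name, then keep only the first one.
# Ordering by surname makes "first" well defined (alphabetically smallest).
query = """
SELECT name, surname
FROM (SELECT p.*,
             ROW_NUMBER() OVER (PARTITION BY name ORDER BY surname) AS seqnum
      FROM people p) t
WHERE seqnum = 1
ORDER BY name
"""
one_per_name = conn.execute(query).fetchall()
print(one_per_name)
```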

Simple SQL query for Min and Max

I am trying to find the age of the oldest and youngest male and female patients, along with the average age of male and female patients, in the clinic where I work. I am new to SQL, but I believe it all comes from one table named "Patients". The Patients table has a Gender column containing either M for male or F for female, and there is also an Age column. I am guessing this is really simple and I am just making it too complicated, but could someone help me out?
My query is pretty limited. I know that you can do something along the lines of:
Select
Min(AGE) AS AGEMIN,
MAX(AGE) AS AGEMAX
From Patients
Use the GROUP BY clause:
select * from #MyTable
M 10
M 15
M 20
F 30
F 35
F 40
select Gender, MIN(Age), MAX(Age), AVG(Age)
from #MyTable
group by Gender
F 30 40 35
M 10 20 15
Here you go
SELECT gender, AVG(age) as avgage, MAX(age) as maxage, MIN(age) as minage
FROM patients
group by gender;
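For anyone who wants to try this without access to the clinic database, the sketch below reproduces the grouped query against the six sample rows shown earlier, using Python's sqlite3 module. SQLite, the lowercase table and column names, and the ORDER BY are assumptions for illustration. Note that SQLite's AVG returns a floating-point value.

```python
import sqlite3

# In-memory database with the six sample rows (gender, age).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [("M", 10), ("M", 15), ("M", 20), ("F", 30), ("F", 35), ("F", 40)],
)

# One output row per gender, each with its own min, max and average age.
query = """
SELECT gender, MIN(age) AS minage, MAX(age) AS maxage, AVG(age) AS avgage
FROM patients
GROUP BY gender
ORDER BY gender  -- assumption: added only for a stable output order
"""
stats = conn.execute(query).fetchall()
for gender, minage, maxage, avgage in stats:
    print(gender, minage, maxage, avgage)
```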

In a SQL GROUP BY query, what value is used for the non-aggregate columns?

Say I've got the following data back from a SQL query:
Lastname Firstname Age
Anderson Jane 28
Anderson Lisa 22
Anderson Jack 37
If I want to know the age of the oldest person with the last name Anderson, I can select MAX(Age) and GROUP BY Lastname. But I also want to know the first name of that oldest person. How can I make sure that, when the Firstname values are collapsed into one row by the GROUP BY, I get the Firstname value from the same row where I got the max age?
For those RDBMS that support it (e.g., SQL Server 2005+), you can use a window function:
select t.Lastname, t.Firstname, t.Age
from (select Lastname, Firstname, Age,
row_number() over (partition by Lastname order by Age desc) as RowNum
from YourTable
) t
where t.RowNum = 1
For others, you'd need a subquery on Lastname and a join to get Firstname:
select yt.Lastname, yt.Firstname, yt.Age
from YourTable yt
inner join (select LastName, max(Age) as MaxAge
from YourTable
group by LastName) q
on yt.Lastname = q.Lastname
and yt.Age = q.MaxAge
You have to join back to the table from your grouped results, i.e. create a view or a nested query to contain the GROUP BY.
Whatever your approach, the main thing to watch out for is that more than one Firstname may share the same age for a given Lastname.
This query will return just 1 row, but if your data set had more than one 'Anderson' aged 37, it could return either one:
select firstname, age
from yourtable
where lastname = 'Anderson'
order by age desc limit 1
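If you would rather see every tied person instead of an arbitrary one, one option is to rank the rows with RANK() rather than ROW_NUMBER(), since RANK() gives tied rows the same number. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module (SQLite 3.25 or later), and it adds a hypothetical fourth row, John Anderson aged 37, to the question's data to create a tie.

```python
import sqlite3

# In-memory database: the three rows from the question plus one
# hypothetical row (John, 37) added to create a tie for the oldest.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (Lastname TEXT, Firstname TEXT, Age INTEGER)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        ("Anderson", "Jane", 28),
        ("Anderson", "Lisa", 22),
        ("Anderson", "Jack", 37),
        ("Anderson", "John", 37),  # hypothetical tie, not in the question
    ],
)

# RANK() assigns 1 to every row that shares the highest age, so all
# tied rows pass the filter; ROW_NUMBER() would keep only one of them.
query = """
SELECT Lastname, Firstname, Age
FROM (SELECT Lastname, Firstname, Age,
             RANK() OVER (PARTITION BY Lastname ORDER BY Age DESC) AS rnk
      FROM YourTable) t
WHERE rnk = 1
ORDER BY Firstname
"""
oldest = conn.execute(query).fetchall()
print(oldest)
```

The join-back query shown earlier behaves the same way on ties, because the join matches every row whose age equals the maximum.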