How to find columns that only have one value - Postgresql - sql

I have 2 tables, person(email, first_name, last_name, postcode, place_name) and location(postcode, place_name). I am trying to find people that live in places where only one person lives. I tried using SELECT COUNT() but failed because I couldn't figure out what to count in this situation.
SELECT DISTINCT email,
first_name,
last_name
FROM person
INNER JOIN location USING(postcode,
place_name)
WHERE 1 <=
(SELECT COUNT(?))

Conditions on aggregate functions go in HAVING rather than WHERE. Group by location and keep only the groups that contain exactly one person:
SELECT min(email) AS email,
min(first_name) AS first_name,
min(last_name) AS last_name
FROM person
INNER JOIN location USING(postcode,
place_name)
GROUP BY postcode,
place_name
HAVING count(*) = 1
Each remaining group holds a single row, so min() just returns that row's value, and no window function (like first_value) is needed.

I would do this as follows. I find it plain and simple.
select p1.* from
person p1
join
(
select p.postcode, p.place_name, count(*) cnt from
person p
group by p.postcode, p.place_name
) t on p1.postcode = t.postcode and p1.place_name = t.place_name and t.cnt = 1
How does it work?
In the inner query (aliased t) we just count how many people live in each location.
Then we join the result of it (t) with the table person (aliased p1) and in the join we require t.cnt = 1. This is probably the most natural way of doing it, I think.
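To try the derived-table join out quickly, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, and the sample rows are made up for illustration; the query itself is the one described above.

```python
import sqlite3

# In-memory database with made-up sample rows (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (email TEXT, first_name TEXT, last_name TEXT,
                         postcode TEXT, place_name TEXT);
    INSERT INTO person VALUES
        ('ann@example.com', 'Ann', 'Lee',   '1000', 'Alpha'),
        ('bob@example.com', 'Bob', 'Ray',   '1000', 'Alpha'),
        ('cat@example.com', 'Cat', 'Stone', '2000', 'Beta');
""")

# Count residents per location in the derived table t,
# then keep only people whose location has exactly one resident.
sole_residents = conn.execute("""
    SELECT p1.email, p1.first_name, p1.last_name
    FROM person p1
    JOIN (SELECT postcode, place_name, COUNT(*) AS cnt
          FROM person
          GROUP BY postcode, place_name) t
      ON p1.postcode = t.postcode
     AND p1.place_name = t.place_name
     AND t.cnt = 1
""").fetchall()

print(sole_residents)
```

Ann and Bob share a location, so only Cat should come back.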

Thanks to the help of people here, I found this answer:
SELECT first_name,
last_name,
email
FROM person
WHERE (postcode, place_name) IN
(SELECT postcode,
place_name
FROM person
GROUP BY postcode,
place_name
HAVING COUNT(*)=1)
Comparing the (postcode, place_name) pair in a single subquery keeps both columns tied to the same location; checking each column against its own subquery can match a postcode from one location and a place_name from another.

Related

Selecting rows with the most repeated values at specific column

The problem in general terms: I need to select a value from one table that is linked to the most frequently repeated values in another table.
The table structure was shown in two screenshots (not reproduced here).
The task is to find the country whose sportsmen have the most results.
First, I INNER JOIN the tables to relate each result to a country:
SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id);
Then I count how many times each country appears:
SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id))
GROUP BY country
;
And got this result (screenshot not reproduced here).
Now it feels like I'm one step away from the solution.
I guess it's possible with one more SELECT FROM (SELECT ...) and MAX(), but I can't wrap it up.
PS: I did it by duplicating the query like this, but I suspect it's inefficient if there are millions of rows.
SELECT country
FROM (SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id)
) GROUP BY country
)
WHERE highest_participation = (SELECT MAX(highest_participation)
FROM (SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id)
) GROUP BY country
))
I also did it with a view:
CREATE VIEW temp AS
SELECT country as country_with_most_participations, COUNT(country) as country_participate_in_#_comp
FROM(
SELECT country, competition_id FROM result
INNER JOIN sportsman USING(sportsman_id)
)
GROUP BY country;
SELECT country_with_most_participations FROM temp
WHERE country_participate_in_#_comp = (SELECT MAX(country_participate_in_#_comp) FROM temp);
But I'm not sure if that's the easiest way.
If I understand this correctly you want to rank the countries per competition count and show the highest ranking country (or countries) with their count. I suggest you use RANK for the ranking.
select country, competition_count
from
(
select
s.country,
count(*) as competition_count,
rank() over (order by count(*) desc) as rn
from sportsman s
inner join result r using (sportsman_id)
group by s.country
) ranked_by_count
where rn = 1
order by country;
If the order of the result rows doesn't matter, you can shorten this to:
select s.country, count(*) as competition_count
from sportsman s
inner join result r using (sportsman_id)
group by s.country
order by count(*) desc
fetch first rows with ties;
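To see how the RANK version treats ties, here is a small sketch run through Python's built-in sqlite3 module. The sample data is invented, SQLite is only a stand-in for the original database, and window functions need SQLite 3.25 or newer. FETCH FIRST ... WITH TIES is not available in SQLite, so only the RANK form is shown.

```python
import sqlite3

# Made-up sample data: Kenya and Norway tie with two results each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sportsman (sportsman_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE result (competition_id INTEGER, sportsman_id INTEGER);
    INSERT INTO sportsman VALUES (1, 'Kenya'), (2, 'Norway'), (3, 'Japan');
    INSERT INTO result VALUES (10, 1), (11, 1), (10, 2), (11, 2), (10, 3);
""")

# Rank countries by their result count and keep every country ranked first.
top_countries = conn.execute("""
    SELECT country, competition_count
    FROM (SELECT s.country,
                 COUNT(*) AS competition_count,
                 RANK() OVER (ORDER BY COUNT(*) DESC) AS rn
          FROM sportsman s
          INNER JOIN result r USING (sportsman_id)
          GROUP BY s.country) ranked_by_count
    WHERE rn = 1
    ORDER BY country
""").fetchall()

print(top_countries)
```

Both tied countries are returned, which is the behaviour you want if "the country with the most results" may not be unique.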
You seem to be overcomplicating this. Starting from your existing join query, you can aggregate, order the results and keep the top row(s) only.
select s.country, count(*) cnt
from sportsman s
inner join result r using (sportsman_id)
group by s.country
order by cnt desc
fetch first 1 row with ties
Note that this allows top ties, if any.
SELECT country
FROM (SELECT country, COUNT(country) AS highest_participation
FROM (SELECT competition_id, country FROM result
INNER JOIN sportsman USING (sportsman_id)
) GROUP BY country
order by 2 desc
)
where rownum=1

Ambiguous Columns and Count

My query requires me to find the busiest location by number of people (Schema Attached)
select DISTINCT location.name as 'Location', f_name|| ' ' || l_name as 'Citizen'
from CheckIn
join location on checkin.LocID = Location.LocID
join person on person.PersonID = CheckIn.PersonID
With the above query I can find the people who visited each location, but I cannot find which location had the most visitors, because every person shows up as an individual row. I know I need to add a count or a group by.
If I try to add the count as below:
select DISTINCT location.name as 'Location', f_name|| ' ' || l_name as 'Citizen', count (personid)
from CheckIn
join location on checkin.LocID = Location.LocID
join person on person.PersonID = CheckIn.PersonID
It reports an ambiguous column. I understand that is because the column is used in the join, but then how do I count the persons if I cannot reuse it?
How do I fix this code to show me the busiest location by number of people?
This query:
SELECT LID, COUNT(*)
FROM CheckIn
GROUP BY LID
returns the number of check-ins for each LID (a person who checks in twice is counted twice).
You can also use window function RANK() to rank each location by the number of people:
SELECT LID, RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM CheckIn
GROUP BY LID
Finally you use this query with the operator IN to return the details of the locations ranked first:
SELECT * FROM Location
WHERE LocationID IN (
SELECT LID
FROM (
SELECT LID, RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM CheckIn
GROUP BY LID
)
WHERE rnk = 1
)
You can try below:
select
name, COUNT(P.PersonID) as People_Count
from Location AS L
inner join checkin as C on C.LocID = L.LocID
inner join person as P on P.PersonID = C.PersonID
group by name
order by COUNT(P.PersonID) desc
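The "ambiguous column" error from the question goes away once personid is qualified with a table name or alias, as in the query above. If "busiest" should mean distinct people rather than check-ins, COUNT(DISTINCT ...) makes the difference visible. A rough sketch with Python's built-in sqlite3 module follows; the table layout is assumed from the question's joins and the rows are made up.

```python
import sqlite3

# Assumed layout (from the joins in the question) with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Location (LocID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE CheckIn (PersonID INTEGER, LocID INTEGER);
    INSERT INTO Location VALUES (1, 'Cafe'), (2, 'Gym');
    -- Person 1 checks in at the Cafe three times; the Gym gets three different people.
    INSERT INTO CheckIn VALUES (1, 1), (1, 1), (1, 1), (2, 1),
                               (1, 2), (2, 2), (3, 2);
""")

# Qualifying PersonID with the alias C avoids the ambiguity.
rows = conn.execute("""
    SELECT L.name,
           COUNT(DISTINCT C.PersonID) AS people_count,
           COUNT(*) AS checkin_count
    FROM Location AS L
    INNER JOIN CheckIn AS C ON C.LocID = L.LocID
    GROUP BY L.LocID, L.name
    ORDER BY people_count DESC
""").fetchall()

busiest = rows[0]  # first row after sorting; ties would need RANK() instead
print(rows)
```

Here the Cafe has more check-ins, but the Gym has more distinct visitors, so the choice of count changes the answer.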

project to which maximum number of employees have been allocated

I have these tables with the following columns :
Employee24 (EMPLOYEEID, FIRSTNAME, LASTNAME, GENDER);
PROJECT24 (PROJECTID, PROJECTNAME, EMPLOYEEID);
I want to write a query to find the project to which the maximum number of employees is allocated.
SELECT FIRSTNAME, LASTNAME
FROM EMPLOYEE24 E
WHERE E.EMPLOYEEID IN ( SELECT L2.EMPLOYEEID
FROM PROJECT24 L2 group by l2.employeeid)
What do you want to do if there are ties? This is an important question and why row_number()/rank() might be a better choice:
select p.*
from (select p.projectid, p.projectname, count(*) as num_employees,
rank() over (order by count(*) desc) as seqnum
from project24 p
group by p.projectid, p.projectname
) p
where seqnum = 1;
Notes:
The above query returns all tied projects if there are ties. If you want only one (arbitrary) project when there is a tie, then use row_number().
I see no reason to join to employee24.
Your data structure is strange. The relationship between projects and employees should be in a separate table, say project_employees. That should have projectid, but not the name. The name should be in project24.
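The difference between rank() and row_number() on ties can be checked with a short sketch like the one below, run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The rows and the top_projects helper are hypothetical, made up for the illustration.

```python
import sqlite3

# Invented rows: Apollo and Borealis tie with two employees each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project24 (projectid INTEGER, projectname TEXT, employeeid INTEGER);
    INSERT INTO project24 VALUES
        (1, 'Apollo', 101), (1, 'Apollo', 102),
        (2, 'Borealis', 103), (2, 'Borealis', 104),
        (3, 'Cygnus', 105);
""")

def top_projects(ranking_fn):
    """Return the projects ranked first by employee count (hypothetical helper)."""
    if ranking_fn not in ("RANK", "ROW_NUMBER"):
        raise ValueError("ranking_fn must be RANK or ROW_NUMBER")
    query = f"""
        SELECT projectid, projectname, num_employees
        FROM (SELECT projectid, projectname, COUNT(*) AS num_employees,
                     {ranking_fn}() OVER (ORDER BY COUNT(*) DESC) AS seqnum
              FROM project24
              GROUP BY projectid, projectname) p
        WHERE seqnum = 1
        ORDER BY projectid
    """
    return conn.execute(query).fetchall()

with_rank = top_projects("RANK")              # every tied project
with_row_number = top_projects("ROW_NUMBER")  # exactly one, chosen arbitrarily
print(with_rank, with_row_number)
```

Which of the tied projects row_number() keeps is not defined unless the window's ORDER BY includes a tie-breaker.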
You might try something like this (though I'm quite sure it can be done in other ways):
SELECT *
FROM (SELECT prj.projectid,
prj.projectname,
COUNT(*) AS number_employees
FROM project24 prj
JOIN employee24 emp
ON prj.employeeid = emp.employeeid
GROUP BY prj.projectid,
prj.projectname
ORDER BY number_employees DESC)
WHERE ROWNUM = 1;

SQL query to find students who received an A for every course taken

I have following schema:
Students(sid, firstname, lastname, status, gpa, email)
Courses(dept_code, course#, title)
Classes(classid, dept_code, course#, sect#, year, semester, limit, class_size)
Enrollments(sid, classid, lgrade)
I need some help to find out all the students who received an A for every course taken.
I might suggest doing this with an aggregation:
select e.sid
from enrollments e
group by e.sid
having min(lgrade) = max(lgrade) and min(lgrade) = 'A';
Try this:
select * from students
where sid not in (select distinct sid from enrollments where coalesce (lgrade,'X') <> 'A')
It means: take all students who have no grade other than A.
If you also want the names of the class and course, you have to join those two tables as well.
Try not to overthink this:
SELECT s.*
FROM STUDENTS s
WHERE s.GPA = 4.0
That'll work for the A=4, B=3, C=2, D=1, F=0 case (standard American grading system).
For non-standard systems (such as my kids' high school, where A=5 for honors and advanced placement classes) we can't trust the GPA:
SELECT s.*
FROM STUDENTS s
INNER JOIN (SELECT SID, COUNT(*) AS CLASS_COUNT
FROM ENROLLMENTS
GROUP BY SID) cc
ON cc.SID = s.SID
INNER JOIN (SELECT SID, COUNT(*) AS A_GRADE_COUNT
FROM ENROLLMENTS
WHERE LGRADE = 'A'
GROUP BY SID) ag
ON ag.SID = s.SID
WHERE CLASS_COUNT = A_GRADE_COUNT
Best of luck.
I think this is the clearest:
select e.sid
from enrollments e
group by e.sid
having sum(case when lgrade = 'A' then 1 else 0 end) = count(*)
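One thing to watch with conditional aggregation: COUNT counts every non-NULL value, so a CASE expression with ELSE 0 inside COUNT counts all rows, not just the A grades. SUM (or a CASE without the ELSE branch) is needed for the comparison to mean anything. A small sketch with Python's built-in sqlite3 module and invented rows shows the difference.

```python
import sqlite3

# Invented rows: student 1 has only A grades, students 2 and 3 do not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollments (sid INTEGER, classid TEXT, lgrade TEXT);
    INSERT INTO enrollments VALUES
        (1, 'c1', 'A'), (1, 'c2', 'A'),
        (2, 'c1', 'A'), (2, 'c2', 'B'),
        (3, 'c1', 'B');
""")

# SUM adds 1 per A grade and 0 otherwise, so it equals COUNT(*) only for all-A students.
all_a_with_sum = conn.execute("""
    SELECT sid FROM enrollments
    GROUP BY sid
    HAVING SUM(CASE WHEN lgrade = 'A' THEN 1 ELSE 0 END) = COUNT(*)
    ORDER BY sid
""").fetchall()

# COUNT treats the 0 from the ELSE branch as just another non-NULL value,
# so the condition holds for every student.
all_a_with_count = conn.execute("""
    SELECT sid FROM enrollments
    GROUP BY sid
    HAVING COUNT(CASE WHEN lgrade = 'A' THEN 1 ELSE 0 END) = COUNT(*)
    ORDER BY sid
""").fetchall()

print(all_a_with_sum, all_a_with_count)
```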

sql : how to find duplicates rows?

I have a table users (id, first_name, last_name, ...).
I want to find duplicate users in that table (users that have the same first_name AND the same last_name).
Let's say my data is:
1;bill;campton
2;sarah;connor
3;bill;campton
I need to get
1;bill;campton;3;bill;campton
I don't want to get
1;bill;campton;3;bill;campton
3;bill;campton;1;bill;campton
How could I do that?
I use SQL Server 2005.
Thank you
One way:
select first_name, last_name
from users
group by first_name, last_name
having count(*) > 1
If you also want the IDs, then you can do this:
SELECT t1.*
FROM users t1
join
(select first_name, last_name
from users
group by first_name, last_name
having count(*) > 1) x ON t1.last_name = x.last_name
AND t1.first_name = x.first_name
You could use:
select u1.id, u2.id, u1.first_name, u1.last_name
from users u1
inner join users u2
on u1.first_name = u2.first_name
and u1.last_name = u2.last_name
where u2.id > u1.id
Or, to get your 6 columns, use
select u1.id, u1.first_name, u1.last_name, u2.id, u2.first_name, u2.last_name
etc.
I just figured this out. It's very simple: you can use a Common Table Expression and a window partition.
This example finds all students with the same name and DOB. The fields you want to check for duplication go in the partition. You can include whatever other fields you want in the projection.
with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
Order By DOB
Given the output format you said you wanted, this works:
select
o.id,
o.first_name,
o.last_name,
d.id,
d.first_name,
d.last_name
from
users o
join users d on d.first_name = o.first_name and d.last_name = o.last_name and o.id < d.id
Note that if a name is duplicated more than once, you will get one row for every pair, which is probably not what you want, so SQLMenace's solution is probably much better overall.
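To illustrate both the pairing query and the caveat about more than two duplicates, here is a minimal sketch using Python's built-in sqlite3 module (SQLite stands in for SQL Server here). It starts with the sample rows from the question; the find_pairs helper and the extra fourth row are hypothetical additions.

```python
import sqlite3

# The three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO users VALUES
        (1, 'bill', 'campton'), (2, 'sarah', 'connor'), (3, 'bill', 'campton');
""")

def find_pairs():
    """Return each duplicate pair once, lower id first (hypothetical helper)."""
    return conn.execute("""
        SELECT u1.id, u1.first_name, u1.last_name,
               u2.id, u2.first_name, u2.last_name
        FROM users u1
        INNER JOIN users u2
           ON u1.first_name = u2.first_name
          AND u1.last_name = u2.last_name
        WHERE u2.id > u1.id
        ORDER BY u1.id, u2.id
    """).fetchall()

pairs = find_pairs()
print(pairs)

# Adding a third 'bill campton' produces one row per pair: (1,3), (1,4), (3,4).
conn.execute("INSERT INTO users VALUES (4, 'bill', 'campton')")
pairs_with_three = find_pairs()
print(len(pairs_with_three))
```

With n copies of a name the self-join returns n*(n-1)/2 rows, which is why the GROUP BY ... HAVING count(*) > 1 approach scales better for heavily duplicated data.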