PostgreSQL: how to get max() of GROUP BY?

I have a table called personshistory with the fields id, lastname, event and date.
It stores events for the people who work in my office, such as "Entering office", "Exiting office", "Illness", "Holidays", etc.
I need to get the last event of every person.
This is what I tried but it doesn't work:
SELECT lastname, event, max(date)
FROM personshistory
GROUP BY lastname, event;

You can use distinct on:
select distinct on (lastname) ph.*
from personshistory ph
order by lastname, date desc;
distinct on is a very convenient Postgres extension. It keeps one row for each value of the expressions in parentheses. The specific row is determined by the order by clause -- based on the keys that follow the distinct on keys.
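For anyone not on Postgres, the same "one row per lastname, newest first" idea can be sketched with row_number(). This is only an illustration under assumptions: it runs against an in-memory SQLite database through Python's built-in sqlite3 module (a stand-in for PostgreSQL, needs SQLite 3.25 or later for window functions), and the sample rows are invented.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personshistory "
    "(id INTEGER PRIMARY KEY, lastname TEXT, event TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO personshistory (lastname, event, date) VALUES (?, ?, ?)",
    [
        ("Smith", "Entering office", "2024-03-01"),
        ("Smith", "Exiting office", "2024-03-02"),
        ("Jones", "Holidays", "2024-02-10"),
        ("Jones", "Illness", "2024-02-20"),
    ],
)

# Number each person's rows from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    SELECT lastname, event, date
    FROM (
        SELECT ph.*,
               ROW_NUMBER() OVER (PARTITION BY lastname ORDER BY date DESC) AS rn
        FROM personshistory ph
    ) AS ranked
    WHERE rn = 1
    ORDER BY lastname
    """
).fetchall()

print(latest)
```

The PARTITION BY column plays the role of the distinct on key, and the window's ORDER BY plays the role of the keys that follow it.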

With NOT EXISTS:
SELECT p.lastname, p.event, p.date
FROM personshistory p
WHERE NOT EXISTS (
SELECT 1 FROM personshistory
WHERE lastname = p.lastname and date > p.date
)
or join your query (without the event column) to the table:
SELECT p.lastname, p.event, p.date
FROM personshistory p INNER JOIN (
SELECT lastname, MAX(date) maxdate
FROM personshistory
GROUP BY lastname
) g on g.lastname = p.lastname and g.maxdate = p.date
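One thing worth knowing about both of these queries: if a person has two events with the same latest date, both rows come back, whereas distinct on keeps exactly one. A small sketch of that behaviour, again assuming an in-memory SQLite database via Python's sqlite3 as a stand-in for PostgreSQL, with invented rows:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE personshistory "
    "(id INTEGER PRIMARY KEY, lastname TEXT, event TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO personshistory (lastname, event, date) VALUES (?, ?, ?)",
    [
        ("Smith", "Entering office", "2024-03-01"),
        ("Smith", "Illness", "2024-03-05"),
        ("Smith", "Exiting office", "2024-03-05"),  # ties with the row above
        ("Jones", "Holidays", "2024-02-10"),
    ],
)

# Variant 1: keep rows for which no later row exists for the same person.
not_exists_rows = conn.execute(
    """
    SELECT p.lastname, p.event, p.date
    FROM personshistory p
    WHERE NOT EXISTS (
        SELECT 1 FROM personshistory q
        WHERE q.lastname = p.lastname AND q.date > p.date
    )
    ORDER BY p.lastname, p.event
    """
).fetchall()

# Variant 2: join back to the per-person maximum date.
join_rows = conn.execute(
    """
    SELECT p.lastname, p.event, p.date
    FROM personshistory p
    INNER JOIN (
        SELECT lastname, MAX(date) AS maxdate
        FROM personshistory
        GROUP BY lastname
    ) g ON g.lastname = p.lastname AND g.maxdate = p.date
    ORDER BY p.lastname, p.event
    """
).fetchall()

print(not_exists_rows)
print(join_rows)
```

Both variants return two rows for Smith here. If that is not wanted, a timestamp with finer resolution or a tie-breaker such as id is needed.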

Related

Get a column without adding it to the group by

select year, gender, max(nHospitalizations) from (select TO_CHAR(i.since, 'YYYY') as year, u.gender, h.name,count(h.name) as nHospitalizations from hospital h
join hospitalization i on i.hospital = h.name
join person u on i.person = u.numberID
group by TO_CHAR(i.since, 'YYYY'), u.gender, h.name)
group by year, gender
order by year desc, gender asc
;
This query does pretty much what I want, except that I also want the name of the hospital with the most hospitalizations per year and gender. When I add h.name to the select, SQL makes me add it to the outer group by as well, which gives me the count per year, gender and hospital name, just like the subquery. How can I add h.name to the outer query without adding it to the outer group by?
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
You want to use window functions for this:
select *
from (select count(*) as nHospitalizations, to_char(i.since, 'YYYY') as year,
u.gender, h.name,
row_number() over (partition by to_char(i.since, 'YYYY'), u.gender order by count(*) desc) as seqnum
from hospital h join
hospitalization i
on i.hospital = h.name join
person u
on i.person = u.numberID
group by to_char(i.since, 'YYYY'), u.gender, h.name
) x
where seqnum = 1
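Here is a runnable sketch of the same idea, split into two steps (count first, then rank) so it is easy to follow. Assumptions: it uses an in-memory SQLite database via Python's sqlite3 instead of the asker's database, strftime('%Y', ...) replaces TO_CHAR(..., 'YYYY'), and the sample rows are invented.

```python
import sqlite3

# In-memory SQLite is a stand-in database; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hospital (name TEXT PRIMARY KEY);
    CREATE TABLE person (numberID INTEGER PRIMARY KEY, gender TEXT);
    CREATE TABLE hospitalization (hospital TEXT, person INTEGER, since TEXT);
    """
)
conn.executemany("INSERT INTO hospital VALUES (?)", [("General",), ("Mercy",)])
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "F"), (2, "F"), (3, "M")])
conn.executemany(
    "INSERT INTO hospitalization VALUES (?, ?, ?)",
    [
        ("General", 1, "2023-01-10"),
        ("General", 2, "2023-02-11"),
        ("Mercy", 1, "2023-05-01"),
        ("Mercy", 3, "2023-06-01"),
        ("Mercy", 3, "2024-01-15"),
    ],
)

top = conn.execute(
    """
    WITH counts AS (
        -- Step 1: hospitalizations per year, gender and hospital.
        SELECT strftime('%Y', i.since) AS year, u.gender AS gender,
               h.name AS name, COUNT(*) AS nHospitalizations
        FROM hospital h
        JOIN hospitalization i ON i.hospital = h.name
        JOIN person u ON i.person = u.numberID
        GROUP BY strftime('%Y', i.since), u.gender, h.name
    )
    -- Step 2: rank hospitals inside each (year, gender) and keep the first.
    SELECT year, gender, name, nHospitalizations
    FROM (
        SELECT c.*,
               ROW_NUMBER() OVER (PARTITION BY year, gender
                                  ORDER BY nHospitalizations DESC) AS seqnum
        FROM counts c
    ) x
    WHERE seqnum = 1
    ORDER BY year DESC, gender ASC
    """
).fetchall()

print(top)
```

The hospital name rides along with the row that wins the ranking, so it never has to appear in an outer group by.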

Ambiguous Columns and Count

My query requires me to find the busiest location by number of people (Schema Attached)
select DISTINCT location.name as 'Location', f_name|| ' ' || l_name as 'Citizen'
from CheckIn
join location on checkin.LocID = Location.LocID
join person on person.PersonID = CheckIn.PersonID
With the above query I can find the people who visited each location, but I cannot find which location had the most visitors, because every check-in is listed individually. I know I need to add a count or group by.
If I add the count as below:
select DISTINCT location.name as 'Location', f_name|| ' ' || l_name as 'Citizen', count (personid)
from CheckIn
join location on checkin.LocID = Location.LocID
join person on person.PersonID = CheckIn.PersonID
I get an ambiguous column error. I understand this is because the column is used in the join, but how do I count the persons if I cannot reuse it?
How do I fix this code to show me the busiest location by number of people?
This query:
SELECT LocID, COUNT(*)
FROM CheckIn
GROUP BY LocID
returns the number of check-ins for each location. Use COUNT(DISTINCT PersonID) instead if a person who checks in several times should be counted once.
You can also use the window function RANK() to rank each location by that count:
SELECT LocID, RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM CheckIn
GROUP BY LocID
Finally, use this query with the IN operator to return the details of the locations ranked first:
SELECT * FROM Location
WHERE LocID IN (
SELECT LocID
FROM (
SELECT LocID, RANK() OVER (ORDER BY COUNT(*) DESC) rnk
FROM CheckIn
GROUP BY LocID
) t
WHERE rnk = 1
)
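To see the count, rank and filter steps work end to end, here is a small sketch. Assumptions: the table and column names (Location.LocID, Location.name, CheckIn.LocID, CheckIn.PersonID) are taken from the joins in the question since the schema image is not available, the database is an in-memory SQLite one driven from Python's sqlite3, and the rows are invented. It counts distinct people so repeat check-ins are not double counted.

```python
import sqlite3

# In-memory SQLite is a stand-in database; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Location (LocID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE CheckIn (LocID INTEGER, PersonID INTEGER);
    """
)
conn.executemany(
    "INSERT INTO Location VALUES (?, ?)",
    [(1, "Library"), (2, "Gym"), (3, "Cafe")],
)
conn.executemany(
    "INSERT INTO CheckIn VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (2, 12), (3, 10)],
)

busiest = conn.execute(
    """
    WITH counts AS (
        -- Qualifying the column (c.PersonID) avoids the ambiguity error.
        SELECT c.LocID AS LocID, COUNT(DISTINCT c.PersonID) AS people
        FROM CheckIn c
        GROUP BY c.LocID
    )
    SELECT l.name, r.people
    FROM (
        SELECT LocID, people, RANK() OVER (ORDER BY people DESC) AS rnk
        FROM counts
    ) r
    JOIN Location l ON l.LocID = r.LocID
    WHERE r.rnk = 1
    ORDER BY l.name
    """
).fetchall()

print(busiest)
```

Library and Gym tie with two distinct visitors each, and RANK() returns both.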
You can try below:
select
name, COUNT(P.PersonID) as People_Count
from Location AS L
inner join checkin as C on C.LocID = L.LocID
inner join person as P on P.PersonID = C.PersonID
group by name
order by COUNT(P.PersonID) desc

How to find columns that only have one value - Postgresql

I have 2 tables, person(email, first_name, last_name, postcode, place_name) and location(postcode, place_name). I am trying to find people that live in places where only one person lives. I tried using SELECT COUNT() but failed because I couldn't figure out what to count in this situation.
SELECT DISTINCT email,
first_name,
last_name
FROM person
INNER JOIN location USING(postcode,
place_name)
WHERE 1 <=
(SELECT COUNT(?))
A condition on an aggregate goes in HAVING rather than WHERE. Every group that passes the filter holds exactly one person, so min() simply returns that person's values:
SELECT min(email) AS email,
min(first_name) AS first_name,
min(last_name) AS last_name
FROM person
INNER JOIN location USING(postcode,
place_name)
GROUP BY postcode,
place_name
HAVING count(*) = 1
Note that the non-grouped columns have to be wrapped in an aggregate (or fetched through a join); a window function such as first_value cannot reference them directly once GROUP BY is applied.
I would do this as follows. I find it plain and simple.
select p1.* from
person p1
join
(
select p.postcode, p.place_name, count(*) cnt from
person p
group by p.postcode, p.place_name
) t on p1.postcode = t.postcode and p1.place_name = t.place_name and t.cnt = 1
How does it work?
In the inner query (aliased t) we just count how many people live in each location.
Then we join the result of it (t) with the table person (aliased p1) and in the join we require t.cnt = 1. This is probably the most natural way of doing it, I think.
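A quick way to convince yourself is to run it on a few rows. The sketch below assumes an in-memory SQLite database via Python's sqlite3 rather than PostgreSQL, and the people in it are invented. Note that Dan shares a place_name with Bob and Cat but has a different postcode, so he still counts as living alone.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person "
    "(email TEXT PRIMARY KEY, first_name TEXT, last_name TEXT, "
    "postcode TEXT, place_name TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        ("ann@example.com", "Ann", "Lee", "1000", "Springfield"),
        ("bob@example.com", "Bob", "Ray", "2000", "Shelbyville"),
        ("cat@example.com", "Cat", "Fox", "2000", "Shelbyville"),
        ("dan@example.com", "Dan", "Poe", "1000", "Shelbyville"),
    ],
)

alone = conn.execute(
    """
    SELECT p1.email, p1.first_name, p1.last_name
    FROM person p1
    JOIN (
        -- How many people live at each (postcode, place_name) pair.
        SELECT postcode, place_name, COUNT(*) AS cnt
        FROM person
        GROUP BY postcode, place_name
    ) t ON p1.postcode = t.postcode
       AND p1.place_name = t.place_name
       AND t.cnt = 1
    ORDER BY p1.email
    """
).fetchall()

print(alone)
```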
Thanks to the help of people here, I found this answer:
SELECT first_name,
last_name,
email
FROM person
WHERE (postcode, place_name) IN
(SELECT postcode,
place_name
FROM person
GROUP BY postcode,
place_name
HAVING COUNT(*)=1)

need help writing subquery

I'm using a health database and trying to display patients who have visited the health facility more than two times. The basic query I have so far is
SELECT FirstName, LastName
FROM PATIENT
I know I have to use a subquery in there somehow, but I don't know if I need to use Count or any other operators to find patients visiting more than two times.
You could join PATIENT to VISIT, count the distinct VisitDate values per patient, and filter with HAVING for a count greater than 2:
SELECT FirstName, LastName , count(distinct VisitDate)
FROM PATIENT
inner join VISIT on VISIT.patientID = PATIENT.PatientID
group by PATIENT.PatientID, FirstName, LastName
having count(distinct VisitDate) > 2
Use the aggregate function COUNT with a HAVING clause for the comparison:
SELECT P.FirstName, P.LastName,COUNT(V.VisitID) as numberOfVisit
FROM VISIT V
JOIN PATIENT P ON P.PatientID = V.PatientID
GROUP BY V.PatientID, P.FirstName, P.LastName
HAVING COUNT(V.VisitID) > 2
A subquery gives the same result, but it isn't needed; the first query is more appropriate:
select * from (
SELECT P.FirstName, P.LastName,COUNT(V.VisitID) as numberOfVisit
FROM VISIT V
JOIN PATIENT P ON P.PatientID = V.PatientID
GROUP BY V.PatientID, P.FirstName, P.LastName
) as T where T.numberOfVisit>2
select x.FirstName,x.LastName from (
SELECT a.FirstName, a.LastName,count(*) n
FROM PATIENT a, visit b
where a.patientid = b.patientid
group by a.FirstName, a.LastName having count(*) > 2
) x
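To check the "more than two visits" logic on real rows, here is a minimal sketch. Assumptions: the VISIT columns (VisitID, PatientID, VisitDate) are taken from the answers above, the database is an in-memory SQLite one driven from Python's sqlite3, and the patients are invented. Two different patients share the name Ann Lee on purpose, to show why PatientID belongs in the GROUP BY.

```python
import sqlite3

# In-memory SQLite is a stand-in database; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PATIENT (PatientID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE VISIT (VisitID INTEGER PRIMARY KEY, PatientID INTEGER, VisitDate TEXT);
    """
)
conn.executemany(
    "INSERT INTO PATIENT VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Ann", "Lee")],
)
conn.executemany(
    "INSERT INTO VISIT (PatientID, VisitDate) VALUES (?, ?)",
    [
        (1, "2024-01-05"), (1, "2024-02-05"), (1, "2024-03-05"),  # 3 visits
        (2, "2024-01-07"), (2, "2024-02-07"),                      # 2 visits
        (3, "2024-01-09"),                                          # 1 visit
    ],
)

frequent = conn.execute(
    """
    SELECT P.FirstName, P.LastName, COUNT(V.VisitID) AS numberOfVisit
    FROM VISIT V
    JOIN PATIENT P ON P.PatientID = V.PatientID
    GROUP BY V.PatientID, P.FirstName, P.LastName
    HAVING COUNT(V.VisitID) > 2
    """
).fetchall()

print(frequent)
```

Grouping by name alone would merge the two Ann Lees into one group of four visits.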

project to which maximum number of employees have been allocated

I have these tables with the following columns :
Employee24 (EMPLOYEEID, FIRSTNAME, LASTNAME, GENDER);
PROJECT24 (PROJECTID, PROJECTNAME, EMPLOYEEID);
I want to write a query to find the project to which the maximum number of employees is allocated.
SELECT FIRSTNAME, LASTNAME
FROM EMPLOYEE24 E
WHERE E.EMPLOYEEID IN ( SELECT L2.EMPLOYEEID
FROM PROJECT24 L2 group by l2.employeeid)
What do you want to do if there are ties? This is an important question and why row_number()/rank() might be a better choice:
select p.*
from (select p.projectid, p.projectname, count(*) as num_employees,
rank() over (order by count(*) desc) as seqnum
from project24 p
group by p.projectid, p.projectname
) p
where seqnum = 1;
Notes:
The above query returns every tied project if there are ties. If you want only one (arbitrary) project when there is a tie, then use row_number().
I see no reason to join to employee24.
Your data structure is strange. The relationship between projects and employees should be in a separate table, say project_employees. That should have projectid, but not the name. The name should be in project24.
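The difference between rank() and row_number() on ties is easy to see with a few rows. This is a sketch under assumptions: an in-memory SQLite database via Python's sqlite3 instead of the asker's database, invented rows, and a hypothetical helper top_projects that only swaps the ranking expression. The projectid tie-breaker in the row_number() variant is my addition to make the single result deterministic.

```python
import sqlite3

# In-memory SQLite is a stand-in database; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project24 (projectid INTEGER, projectname TEXT, employeeid INTEGER)"
)
conn.executemany(
    "INSERT INTO project24 VALUES (?, ?, ?)",
    [
        (1, "Alpha", 100), (1, "Alpha", 101),
        (2, "Beta", 102), (2, "Beta", 103),
        (3, "Gamma", 104),
    ],
)

QUERY = """
SELECT projectid, projectname, num_employees
FROM (
    SELECT c.*, {ranking} AS seqnum
    FROM (
        SELECT projectid, projectname, COUNT(*) AS num_employees
        FROM project24
        GROUP BY projectid, projectname
    ) c
) r
WHERE seqnum = 1
ORDER BY projectid
"""


def top_projects(ranking):
    """Hypothetical helper: run the query with the given ranking expression."""
    return conn.execute(QUERY.format(ranking=ranking)).fetchall()


# rank() gives tied projects the same number, so both survive the filter.
with_rank = top_projects("RANK() OVER (ORDER BY num_employees DESC)")

# row_number() numbers rows uniquely; projectid breaks the tie deterministically.
with_row_number = top_projects(
    "ROW_NUMBER() OVER (ORDER BY num_employees DESC, projectid)"
)

print(with_rank)
print(with_row_number)
```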
You might try something like this (though I'm quite sure it can be done in other ways):
SELECT *
FROM (SELECT prj.projectid,
prj.projectname,
COUNT(*) AS number_employees
FROM project24 prj
JOIN employee24 emp
ON prj.employeeid = emp.employeeid
GROUP BY prj.projectid,
prj.projectname
ORDER BY number_employees DESC)
WHERE ROWNUM = 1;