Better way to demand, in SQL, that a column contains every specified value - sql

Imagine you have two tables, with a one to many relationship.
For this example, I will suggest that there are two tables: Person, and Homes.
The Person table holds a person's name and gives them an ID. The Homes table holds the association of homes to a person; PID joins to "Person.ID".
And, in this tiny DB, a person can have no homes, or many homes.
I hope I drew that right.
How do I write a select, that returns everyone with every specified house type?
Let's say these are valid "Types" in the homes table:
Cottage, Main, Mansion, Spaceport.
I want to return everyone, in the Person table, who has a spaceport and a Cottage.
The best I could come up with was this:
SELECT DISTINCT( p.name ) AS name
FROM person p
INNER JOIN homes h ON h.pid = p.id
WHERE 'spaceport' in (
SELECT DISTINCT( type ) AS type
FROM homes
WHERE pid = p.id
)
AND 'cottage' in (
SELECT DISTINCT( type ) AS type
FROM homes
WHERE pid = p.id
)
That works, but I'm pretty sure there has to be a better way.

The HAVING clause here will guarantee that the persons returned have both types, not just one or the other.
SELECT p.name
FROM person p
INNER JOIN homes h
ON p.id = h.pid
AND h.type IN ('spaceport', 'cottage')
GROUP BY p.id, p.name
HAVING COUNT(DISTINCT h.type) = 2
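To see why the DISTINCT inside the count matters, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows (Ann, Bob, Cy) are invented for illustration; the table and column names follow the question, and grouping on p.id is an assumption that keeps two people with the same name apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE homes (pid INTEGER, type TEXT);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO homes VALUES
        (1, 'spaceport'), (1, 'cottage'),   -- Ann has both types
        (2, 'cottage'), (2, 'cottage'),     -- Bob has two cottages only
        (3, 'spaceport'), (3, 'mansion');   -- Cy has no cottage
""")

QUERY = """
    SELECT p.name
    FROM person p
    INNER JOIN homes h ON h.pid = p.id
       AND h.type IN ('spaceport', 'cottage')
    GROUP BY p.id, p.name
    HAVING {count_expr} = 2
    ORDER BY p.name
"""

# COUNT(DISTINCT ...) counts each type once, so Bob's two cottages count as 1.
with_distinct = [r[0] for r in conn.execute(
    QUERY.format(count_expr="COUNT(DISTINCT h.type)"))]
# COUNT(*) counts rows, so Bob's two cottages wrongly satisfy "= 2".
with_count_star = [r[0] for r in conn.execute(
    QUERY.format(count_expr="COUNT(*)"))]

print(with_distinct)
print(with_count_star)
```

With this sample data only Ann passes the DISTINCT version, while the plain row count also lets Bob through.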

select * from homes;
home_id  person_id  type
-------  ---------  -------
1        1          cottage
2        1          mansion
3        2          cottage
4        3          mansion
5        4          cottage
6        4          cottage
To find the id numbers of every person who has both a cottage and a mansion, group by the id number, restrict the output to cottages and mansions, and count the distinct types.
select person_id
from homes
where type in ('cottage','mansion')
group by person_id
having count(distinct type) = 2;
person_id
--
1
You can use this query in a join to get all the columns from the person table.
select person.*
from person
inner join (select person_id
from homes
where type in ('cottage','mansion')
group by person_id
having count(distinct type) = 2) T
on person.person_id = T.person_id;
Thanks to Joe for pointing out an error in my count().

Not sure about the performance on this one, but here goes:
SELECT PID FROM (
SELECT PID, COUNT(PID) cnt FROM (
SELECT DISTINCT PID, Type FROM Homes
WHERE Type IN ('Type1', 'Type2', 'Type3')
) a
GROUP BY PID
) b
WHERE b.cnt = 3
You'd have to dynamically generate your IN clause as well as the WHERE b.CNT clause.
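One way to generate both the IN list and the count from the same list of wanted types is sketched below with Python's sqlite3 module. The helper name pids_with_all_types and the sample rows are hypothetical, and the sketch uses the flatter GROUP BY ... HAVING COUNT(DISTINCT ...) form rather than the nested subqueries.

```python
import sqlite3

def pids_with_all_types(conn, wanted):
    """Return the PIDs that have every type in `wanted` (hypothetical helper)."""
    wanted = sorted(set(wanted))  # de-duplicate so the count matches
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT PID
        FROM Homes
        WHERE Type IN ({placeholders})
        GROUP BY PID
        HAVING COUNT(DISTINCT Type) = ?
        ORDER BY PID
    """
    # Only "?" markers are formatted into the SQL; the values are bound.
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Homes (PID INTEGER, Type TEXT);
    INSERT INTO Homes VALUES
        (1, 'Type1'), (1, 'Type2'), (1, 'Type3'),
        (2, 'Type1'), (2, 'Type2'),
        (3, 'Type1'), (3, 'Type1');
""")
all_three = pids_with_all_types(conn, ["Type1", "Type2", "Type3"])
first_two = pids_with_all_types(conn, ["Type1", "Type2"])
print(all_three, first_two)
```

Because the count is derived from the length of the list, the IN clause and the HAVING threshold cannot drift apart.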

Related

SQL statement : average

My question: what is the average age at which a man first becomes a grandpa? The result should be returned as average_age. A person becomes a grandpa on the day his first grandchild is born.
Relations:
human (name, gender, age)
parent (ParentName, ChildName) -> is subset of human(name).
Table:
I know that a grandpa is a person who appears in parentname with a child in childname, where that child also appears in parentname with children of their own in childname (the grandchildren). The problem is how to get the average age at which someone becomes a grandpa.
What I got so far:
SELECT AVG(age) as average_age
FROM human h JOIN
parent p
ON h.name = p.parentname
WHERE h.gender = 'm' AND p.parentname = p.childname AND h.name = p.parentname
Expected outcome:
average_age : 52
It is extremely unusual to store the age of people in a table, because it changes every day. The data should be stored as a date of birth.
This is an aggregation query, but you have to join the tables multiple times. To get grandparents, you need a join on the parents table. Then you need to bring in humans for filtering:
select avg(min_age * 1.0)
from (select min(h_grandparent.age - h_grandchild.age) as min_age
from parent p join -- p.parentname is the grandparent
parent pchild
on p.childname = pchild.parentname join
human h_grandparent
on p.parentname = h_grandparent.name join
human h_grandchild
on pchild.childname = h_grandchild.name
where h_grandparent.gender = 'm'
group by h_grandparent.name
) a
I would address this with an exists condition that filters on humans that have grandchildren:
select avg(age) avg_age_of_grandpas
from human h
where
gender = 'm'
and exists (
select 1
from parent p1
inner join parent p2 on p2.parentName = p1.childName
where p1.parentName = h.name
)
The exists condition ensures that the person has at least one child and one grandchild. Then the outer query computes the average age of such humans. Given the information available in your table structure, this seems to me like the most logical approach. Unlike joins, using exists avoids duplicating the records (and getting wrong results in the average) when a person has more than one line of descendants.
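The duplication effect can be checked with a small sketch in Python's sqlite3 module. The people and ages below are invented for illustration; Al has three grandchildren, so a plain join counts his age three times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE human (name TEXT PRIMARY KEY, gender TEXT, age INTEGER);
    CREATE TABLE parent (parentName TEXT, childName TEXT);
    INSERT INTO human VALUES
        ('Al', 'm', 70), ('Bo', 'm', 45), ('Di', 'f', 43),
        ('Ed', 'm', 20), ('Flo', 'f', 18), ('Gus', 'm', 15),
        ('Hal', 'm', 60), ('Ian', 'm', 35), ('Jo', 'f', 10);
    INSERT INTO parent VALUES
        ('Al', 'Bo'), ('Al', 'Di'),
        ('Bo', 'Ed'), ('Di', 'Flo'), ('Di', 'Gus'),
        ('Hal', 'Ian'), ('Ian', 'Jo');
""")

# EXISTS keeps one row per grandpa: Al (70) and Hal (60).
avg_exists = conn.execute("""
    SELECT AVG(age) FROM human h
    WHERE gender = 'm'
      AND EXISTS (
          SELECT 1
          FROM parent p1
          INNER JOIN parent p2 ON p2.parentName = p1.childName
          WHERE p1.parentName = h.name)
""").fetchone()[0]

# A plain join yields one row per grandchild, so Al appears three times.
avg_join = conn.execute("""
    SELECT AVG(h.age) FROM human h
    INNER JOIN parent p1 ON p1.parentName = h.name
    INNER JOIN parent p2 ON p2.parentName = p1.childName
    WHERE h.gender = 'm'
""").fetchone()[0]

print(avg_exists, avg_join)
```

On this sample the EXISTS version averages the two grandpas once each, while the join version is skewed toward Al.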
If you want the age of the grand parent at the date when their first grand child was born, then it is a bit complicated. This should get you close to what you expect:
select avg(h.age - g.maxGrandChildAge) avg_age_of_grandpas
from human h
inner join (
select
p1.parentName grandParentName,
max(h1.age) maxGrandChildAge
from parent p1
inner join parent p2 on p2.parentName = p1.childName
inner join human h1 on h1.name = p2.childName
group by p1.parentName -- one row per grandparent
) g
on g.grandParentName = h.name
where h.gender = 'm' -- grandpas only
You can do this with two more joins, subtracting the age of the oldest grandchild:
SELECT AVG(p_age) average_age
FROM
(
SELECT h.name, h.age-MAX(h2.age) as p_age
FROM parent p1
LEFT JOIN parent p2 ON P1.childname = P2.parentname
INNER JOIN human h ON P1.parentname = h.name
INNER JOIN human h2 ON P2.ChildName = h2.name
WHERE h.gender = 'm' AND p2.childname IS NOT NULL
GROUP BY h.name, h.age
)pAges
Note that a name is not a reliable key for this task, since two people can share the same name.
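The "age at first grandchild" idea (current age minus the age of the oldest grandchild) can be tried out with the following sketch in Python's sqlite3 module. All rows are invented sample data, not the asker's table, so the resulting average is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE human (name TEXT PRIMARY KEY, gender TEXT, age INTEGER);
    CREATE TABLE parent (parentname TEXT, childname TEXT);
    INSERT INTO human VALUES
        ('Al', 'm', 70), ('Val', 'f', 68), ('Bo', 'm', 45), ('Di', 'f', 43),
        ('Ed', 'm', 20), ('Flo', 'f', 18), ('Gus', 'm', 15),
        ('Hal', 'm', 62), ('Ian', 'm', 35), ('Jo', 'f', 10);
    INSERT INTO parent VALUES
        ('Al', 'Bo'), ('Val', 'Bo'), ('Al', 'Di'),
        ('Bo', 'Ed'), ('Di', 'Flo'), ('Di', 'Gus'),
        ('Hal', 'Ian'), ('Ian', 'Jo');
""")

# One row per grandpa: his age minus the age of his oldest grandchild.
PER_GRANDPA = """
    SELECT h.name, h.age - MAX(gc.age) AS age_at_first
    FROM parent p1
    INNER JOIN parent p2 ON p2.parentname = p1.childname
    INNER JOIN human h ON h.name = p1.parentname
    INNER JOIN human gc ON gc.name = p2.childname
    WHERE h.gender = 'm'
    GROUP BY h.name, h.age
"""

per_grandpa = conn.execute(PER_GRANDPA + " ORDER BY h.name").fetchall()
average_age = conn.execute(
    f"SELECT AVG(age_at_first) FROM ({PER_GRANDPA}) AS first_ages"
).fetchone()[0]

print(per_grandpa, average_age)
```

Val is a grandmother in this sample and is excluded by the gender filter; Al's oldest grandchild is Ed, so Al became a grandpa at 50.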

PostgreSQL: How do I get data from table `A` filtered by a column in table `B`

I want to fetch all parents that have kids in a specific grade only in a school.
Below are trimmed-down versions of the tables.
TABLE students
id,
last_name,
grade_id,
school_id
TABLE parents_students
parent_id,
student_id
TABLE parents
id,
last_name,
school_id
I tried the below query but it doesn't really work as expected. It rather fetches all parents in a school disregarding the grade. Any help is appreciated. Thank you.
SELECT DISTINCT
p.id,
p.last_name,
p.school_id,
st.school_id,
st.grade_id,
FROM parents p
INNER JOIN students st ON st.school_id = p.school_id
WHERE st.grade_id = 118
AND st.school_id = 6
GROUP BY p.id,st.grade_id,st.school_id;
I would think:
select p.*
from parents p
where exists (select 1
from parents_students ps join
students s
on ps.student_id = s.id
where ps.parent_id = p.id and
s.grade_id = 118 and
s.school_id = 6
);
Your question says that you want information about the parents. If so, I don't see why you are including redundant information about the school and grade (it is redundant because the where clause specifies exactly what those values are).
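The difference between joining on school_id alone and going through parents_students can be seen in this small sketch using Python's sqlite3 module. The rows are invented sample data; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER, last_name TEXT, grade_id INTEGER, school_id INTEGER);
    CREATE TABLE parents (id INTEGER, last_name TEXT, school_id INTEGER);
    CREATE TABLE parents_students (parent_id INTEGER, student_id INTEGER);
    INSERT INTO students VALUES
        (1, 'Lee', 118, 6), (2, 'Lee', 118, 6),
        (3, 'Kim', 119, 6), (4, 'Roy', 118, 7);
    INSERT INTO parents VALUES
        (10, 'Lee', 6), (11, 'Kim', 6), (12, 'Roy', 7), (13, 'Poe', 6);
    INSERT INTO parents_students VALUES (10, 1), (10, 2), (11, 3), (12, 4);
""")

# Joining only on school_id pairs every parent in school 6 with every
# grade-118 student there, so the grade filter does not narrow the parents.
school_join_ids = [r[0] for r in conn.execute("""
    SELECT DISTINCT p.id
    FROM parents p
    INNER JOIN students st ON st.school_id = p.school_id
    WHERE st.grade_id = 118 AND st.school_id = 6
    ORDER BY p.id
""")]

# EXISTS through the link table keeps only parents of matching students,
# and returns each parent once even with two children in the grade.
exists_ids = [r[0] for r in conn.execute("""
    SELECT p.id
    FROM parents p
    WHERE EXISTS (SELECT 1
                  FROM parents_students ps
                  JOIN students s ON ps.student_id = s.id
                  WHERE ps.parent_id = p.id
                    AND s.grade_id = 118
                    AND s.school_id = 6)
    ORDER BY p.id
""")]

print(school_join_ids, exists_ids)
```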

Most efficient SQL for this example

Table A: Person: id, name
Table B: Toys: id, person_id, toy_name
I have a search screen that includes a dropdown of fixed toy names.
A search is found if a subset of the total set of toys for a person is matched.
Example, a person name=bob has toys: doll, car, house, hat
A search is done for person name=bob and toys=doll, hat.
I want to return bob and ALL of his toys, not just what toys were searched for(doll, hat).
Bob is found because a subset of his toys are a match.
I don't know the most efficient way (the fewest DB calls) to accomplish this.
I could search for bob, get all of his toys, and then parse the result set to see whether the searched-for toys match, but that seems wrong: the DB call could return rows for which no match is found.
okay,
select
p.id,
p.name,
t.id as toyid,
t.toy_name
from
person p
join
toys t
on p.id = t.person_id
where
p.id in (
select person_id from toys where toy_name = 'doll'
intersect
select person_id from toys where toy_name = 'hat');
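A quick way to try the INTERSECT query is the sketch below, using Python's sqlite3 module (SQLite also supports INTERSECT). The sample rows are invented: bob owns all four toys from the question, while amy owns only a doll and a car.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE toys (id INTEGER PRIMARY KEY, person_id INTEGER, toy_name TEXT);
    INSERT INTO person VALUES (1, 'bob'), (2, 'amy');
    INSERT INTO toys VALUES
        (1, 1, 'doll'), (2, 1, 'car'), (3, 1, 'house'), (4, 1, 'hat'),
        (5, 2, 'doll'), (6, 2, 'car');
""")

# The IN subquery keeps only people who own both a doll and a hat;
# the outer join then returns every toy those people own.
rows = conn.execute("""
    SELECT p.name, t.toy_name
    FROM person p
    JOIN toys t ON p.id = t.person_id
    WHERE p.id IN (
        SELECT person_id FROM toys WHERE toy_name = 'doll'
        INTERSECT
        SELECT person_id FROM toys WHERE toy_name = 'hat')
    ORDER BY t.id
""").fetchall()

print(rows)
```

Amy has a doll but no hat, so none of her rows come back, and bob is returned with all four toys in a single call.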
If you normalise your schema a little further,
create table Person
(
Id int,
Name varchar(100)
);
create table Toy
(
Id int,
Name varchar(100)
);
create table PersonToy
(
Id int,
PersonId int,
ToyId int
);
It should make the complexity of the problem clearer. It will also save some space. A statement of the form,
select
p.Name PersonName,
t.Name ToyName
from
Person p
join
PersonToy pt
on pt.PersonId = p.Id
join
Toy t
on t.Id = pt.ToyId
where
p.Id in
(
select PersonId from PersonToy where ToyId = 1
intersect
select PersonId from PersonToy where ToyId = 4
);
will work efficiently.
Here's one way to do it using a subquery and checking for the existence of Hat and Doll in the HAVING clause:
select p.id, p.name,
t.id as toyid, t.toy_name as toyname
from person p
inner join toys t on p.id = t.person_id
inner join (
select person_id
from toys
group by person_id
having sum(toy_name = 'hat') > 0 and
sum(toy_name = 'doll') > 0
) t2 on p.id = t2.person_id
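The sum(...) > 0 test relies on the database treating a comparison as 0 or 1, which MySQL does. A CASE expression is a more portable spelling of the same idea; the sketch below tries it with Python's sqlite3 module. The sample rows are invented, and the toy_name column follows the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE toys (id INTEGER PRIMARY KEY, person_id INTEGER, toy_name TEXT);
    INSERT INTO person VALUES (1, 'bob'), (2, 'amy'), (3, 'cat');
    INSERT INTO toys VALUES
        (1, 1, 'doll'), (2, 1, 'car'), (3, 1, 'house'), (4, 1, 'hat'),
        (5, 2, 'doll'), (6, 2, 'car'),
        (7, 3, 'hat');
""")

# The derived table t2 keeps people with at least one hat AND one doll;
# joining back to toys returns all of their toys.
rows = conn.execute("""
    SELECT p.name, t.toy_name
    FROM person p
    INNER JOIN toys t ON p.id = t.person_id
    INNER JOIN (
        SELECT person_id
        FROM toys
        GROUP BY person_id
        HAVING SUM(CASE WHEN toy_name = 'hat' THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN toy_name = 'doll' THEN 1 ELSE 0 END) > 0
    ) t2 ON p.id = t2.person_id
    ORDER BY t.id
""").fetchall()

print(rows)
```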

SQL query for finding row with same column values that was created most recently

Say my MySQL table people has three columns: id, name, and created, where name is a string and created is a timestamp. What's the appropriate query for the following scenario? There are 10 rows, each with a name. Every row has a unique id, but names can repeat, so there can be three Bobs, two Marys, one Jack and four Phils.
There is also a hobbies table with the columns id, hobby, person_id.
Basically I want a query that will do the following:
Return all of the people with zero hobbies, but only check by the latest distinct person created, if that makes sense. Meaning if there is a Bob person that was created yesterday, and one created today.. I only want to know if the Bob created today has zero hobbies. The one from yesterday is no longer relevant.
select pp.id
from people pp, (select name, max(created) as created from people group by name) p
where pp.name = p.name
and pp.created = p.created
and pp.id not in ( select person_id from hobbies )
SELECT latest_person.* FROM (
SELECT p1.* FROM people p1
WHERE NOT EXISTS (
SELECT * FROM people p2
WHERE p1.name = p2.name AND p1.created < p2.created
)
) AS latest_person
LEFT OUTER JOIN hobbies h ON h.person_id = latest_person.id
WHERE h.id IS NULL;
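The NOT EXISTS plus LEFT JOIN query can be exercised with the sketch below, using Python's sqlite3 module. The rows are invented: the newer Bob has a hobby, while the older Bob, Mary and Jack have none, and created is stored as an ISO date string so that plain comparison orders it correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, created TEXT);
    CREATE TABLE hobbies (id INTEGER PRIMARY KEY, hobby TEXT, person_id INTEGER);
    INSERT INTO people VALUES
        (1, 'Bob', '2024-01-01'), (2, 'Bob', '2024-01-02'),
        (3, 'Mary', '2024-01-01'), (4, 'Jack', '2024-01-03');
    INSERT INTO hobbies VALUES (1, 'golf', 2);
""")

# Inner query: keep a person only if nobody with the same name is newer.
# Outer query: keep those latest people that have no matching hobby row.
latest_without_hobbies = [r[0] for r in conn.execute("""
    SELECT latest_person.id FROM (
        SELECT p1.* FROM people p1
        WHERE NOT EXISTS (
            SELECT * FROM people p2
            WHERE p1.name = p2.name AND p1.created < p2.created)
    ) AS latest_person
    LEFT OUTER JOIN hobbies h ON h.person_id = latest_person.id
    WHERE h.id IS NULL
    ORDER BY latest_person.id
""")]

print(latest_without_hobbies)
```

The older Bob (id 1) has no hobbies but is ignored because a newer Bob exists, and the newer Bob is dropped because he has a hobby.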
Try this:
Select *
From people p
Where created =
(Select Max(created)
From people
Where name = p.Name
And not exists
(Select * From hobbies
Where person_id = p.id))

Get latest record from second table left joined to first table

I have a candidates table with only an id field, and I left joined a profiles table to it. The profiles table has two fields: candidate_id and name.
e.g. Table candidates:
id
----
1
2
and Table profiles:
candidate_id name
----------------------------
1 Foobar
1 Foobar2
2 Foobar3
I want the latest name of each candidate in a single query. My attempt is below:
SELECT C.id, P.name
FROM candidates C
LEFT JOIN profiles P ON P.candidate_id = C.id
GROUP BY C.id
ORDER BY P.name;
But this query returns:
1 Foobar
2 Foobar3
...Instead of:
1 Foobar2
2 Foobar3
The problem is that your PROFILES table doesn't provide a reliable means of figuring out what the latest name value is. There are two options for the PROFILES table:
Add a datetime column IE: created_date
Define an auto_increment column
The first option is the best - it's explicit, meaning the use of the column is absolutely obvious, and handles backdated entries better.
ALTER TABLE PROFILES ADD COLUMN created_date DATETIME
If you want the value to default to the current date & time when inserting a record if no value is provided, tack the following on to the end:
DEFAULT CURRENT_TIMESTAMP
With that in place, you'd use the following to get your desired result:
SELECT c.id,
p.name
FROM CANDIDATES c
LEFT JOIN PROFILES p ON p.candidate_id = c.id
JOIN (SELECT x.candidate_id,
MAX(x.created_date) AS max_date
FROM PROFILES x
GROUP BY x.candidate_id) y ON y.candidate_id = p.candidate_id
AND y.max_date = p.created_date
GROUP BY c.id
ORDER BY p.name
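Once a created_date column exists, the "latest row per candidate" idea can be tried with the sketch below in Python's sqlite3 module. It is a variant of the query above rather than a copy: it puts the MAX(created_date) lookup inside the LEFT JOIN condition, which is an assumption made here so that candidates without any profile still appear. The dates and candidate 3 are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidates (id INTEGER PRIMARY KEY);
    CREATE TABLE profiles (candidate_id INTEGER, name TEXT, created_date TEXT);
    INSERT INTO candidates VALUES (1), (2), (3);
    INSERT INTO profiles VALUES
        (1, 'Foobar',  '2024-01-01'),
        (1, 'Foobar2', '2024-02-01'),
        (2, 'Foobar3', '2024-01-15');
""")

# For each candidate, join only the profile row whose created_date equals
# that candidate's maximum created_date; candidate 3 has no profile at all.
rows = conn.execute("""
    SELECT c.id, p.name
    FROM candidates c
    LEFT JOIN profiles p
           ON p.candidate_id = c.id
          AND p.created_date = (SELECT MAX(x.created_date)
                                FROM profiles x
                                WHERE x.candidate_id = c.id)
    ORDER BY c.id
""").fetchall()

print(rows)
```

If two profile rows for one candidate share the same maximum created_date, both would be returned, so a tie-breaker column would be needed in that case.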
Use a correlated subquery:
SELECT C.id, (SELECT P.name FROM profiles P WHERE P.candidate_id = C.id ORDER BY P.name DESC LIMIT 1) AS name
FROM candidates C;