SQL - Joining a table over itself to find people with same parents

I have a table like this:
Id, Name, Surname, Father Name, Mother Name
---------------------------------------------
1, John, Green, James, Sue
2, Michael, Sloan, Barry, Lilly
3, Sally, Green, Andrew, Molly
4, Michael, Sloan, Barry, Lilly
5, Ned, White, James, Sue
I want a query that selects rows with the same father name and mother name for given first names. For the example table, when I want to select the Johns and Neds with the same parents, the query should return:
1, John, Green, James, Sue
5, Ned, White, James, Sue
I tried joining the table with itself, but no matter how I changed the WHERE criteria it returned a Cartesian product. Any tips?

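Use a subquery (this row-value form of IN works in MySQL, PostgreSQL, Oracle and SQLite 3.15+, but not in SQL Server):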
SELECT * FROM Table
WHERE (FatherName, MotherName) IN
(SELECT FatherName, MotherName FROM Table WHERE Name='John')

What you need is called relational division. However, it's slightly trickier in your case, since it normally returns aggregated data, and you need all rows from the table. So, indeed, a self join is required:
select t.*
from dbo.Table t
inner join (
select d.FatherName, d.MotherName
from dbo.Table d
group by d.FatherName, d.MotherName
having count(*) > 1
) sq on sq.FatherName = t.FatherName
and sq.MotherName = t.MotherName;
In the subquery, you select only Father+Mother combinations that have more than 1 entry in the table, and then join it with the table again to output everything from these parents' pairs.
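As a rough check of what this query returns, here is a minimal sketch that runs it through Python's sqlite3 module against the sample rows from the question. The table name people and the FatherName/MotherName column spellings are assumptions made for the sketch. Note that it returns every sibling group, including the two Michaels, so a filter on the first names would still be needed to get only John and Ned.

```python
import sqlite3

# Assumed schema: table "people" with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (Id INTEGER PRIMARY KEY, Name TEXT, Surname TEXT,"
    " FatherName TEXT, MotherName TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Green", "James", "Sue"),
        (2, "Michael", "Sloan", "Barry", "Lilly"),
        (3, "Sally", "Green", "Andrew", "Molly"),
        (4, "Michael", "Sloan", "Barry", "Lilly"),
        (5, "Ned", "White", "James", "Sue"),
    ],
)

# Parent pairs that occur more than once, joined back to the full rows.
sibling_rows = conn.execute(
    """
    SELECT t.Id, t.Name
    FROM people AS t
    INNER JOIN (
        SELECT FatherName, MotherName
        FROM people
        GROUP BY FatherName, MotherName
        HAVING COUNT(*) > 1
    ) AS sq
            ON sq.FatherName = t.FatherName
           AND sq.MotherName = t.MotherName
    ORDER BY t.Id
    """
).fetchall()

print(sibling_rows)
```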

You can try this (without group by and count(*))
with fm as
(select fathername, mothername,row_number() over (partition by fathername,
mothername order by id) rownum
from #tmp1
)
select b.*
from #tmp1 b
join fm
on b.fathername = fm.fathername
and b.mothername = fm.mothername
where fm.rownum = 2

A self-join is the appropriate method in this case.
SELECT DISTINCT t1.*
FROM MyTable AS t1
INNER JOIN MyTable AS t2
ON t1.FatherName=t2.FatherName
AND t1.MotherName=t2.MotherName
AND t1.Id<>t2.Id
WHERE t1.Name in ('John', 'Ned')
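The self-join can be tried out with Python's sqlite3 module. This is only a sketch: the table name people and the helper same_parents are hypothetical, and it adds one condition that the query above does not have, namely that the matching sibling (t2) must also carry one of the requested first names. Without that extra condition, a John whose only sibling is called something else would still be returned.

```python
import sqlite3

# Assumed schema: table "people" with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (Id INTEGER PRIMARY KEY, Name TEXT, Surname TEXT,"
    " FatherName TEXT, MotherName TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Green", "James", "Sue"),
        (2, "Michael", "Sloan", "Barry", "Lilly"),
        (3, "Sally", "Green", "Andrew", "Molly"),
        (4, "Michael", "Sloan", "Barry", "Lilly"),
        (5, "Ned", "White", "James", "Sue"),
    ],
)


def same_parents(conn, first_names):
    """Rows whose first name is in first_names and that share both parents
    with a different row whose first name is also in first_names."""
    marks = ", ".join("?" for _ in first_names)
    sql = f"""
        SELECT DISTINCT t1.Id, t1.Name, t1.Surname, t1.FatherName, t1.MotherName
        FROM people AS t1
        INNER JOIN people AS t2
                ON t1.FatherName = t2.FatherName
               AND t1.MotherName = t2.MotherName
               AND t1.Id <> t2.Id           -- never pair a row with itself
        WHERE t1.Name IN ({marks})
          AND t2.Name IN ({marks})          -- extra filter, not in the answer
        ORDER BY t1.Id
    """
    return conn.execute(sql, list(first_names) * 2).fetchall()


rows = same_parents(conn, ["John", "Ned"])
print(rows)
```

The join condition on both parent columns is what prevents the Cartesian product mentioned in the question; the Id inequality only removes the trivial match of each row with itself.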

Related

Summarize Null Values in Table with Group By

I have two tables:
Person(ID, Name)
Sports(person_ID, Sport)
The problem: Sport can be NULL. If any of a person's rows has a NULL sport, then the sport shown for that person after grouping by ID should be NULL.
SELECT p.ID, p.Name, s.Sport
FROM Person p
INNER JOIN Sports s ON p.ID=s.person_id
GROUP BY p.ID
Without the Group By the table looks like this:
p.ID p.Name s.Sport
1 tom soccer
1 tom NULL
2 lisa golf
2 lisa soccer
3 tim golf
3 tim NULL
What I want now:
1 tom NULL
2 lisa golf
3 tim NULL
But what I get:
1 tom soccer
2 lisa golf
3 tim golf
I've tried subselects and ifs but I couldn't get anything to work. Thanks in advance!

Here is a query which should generate your expected result set, though as @jarlh has pointed out, it isn't clear why Lisa should play golf over soccer.
SELECT
p.ID,
p.Name,
CASE WHEN COUNT(CASE WHEN s.Sport IS NULL THEN 1 END) > 0
THEN NULL ELSE MIN(s.Sport) END AS Sport
FROM Person p
INNER JOIN Sports s
ON p.ID = s.person_id
GROUP BY
p.ID,
p.name;
Note that I group by both the ID and name, which would be required on many databases (though perhaps not SQLite).
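For a quick check, the query can be run with Python's sqlite3 module against the rows shown in the question. This is a sketch under the assumption that the tables look exactly as described (Person and Sports, joined on person_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Sports (person_id INTEGER, Sport TEXT);
    INSERT INTO Person VALUES (1, 'tom'), (2, 'lisa'), (3, 'tim');
    INSERT INTO Sports VALUES
        (1, 'soccer'), (1, NULL),
        (2, 'golf'),   (2, 'soccer'),
        (3, 'golf'),   (3, NULL);
    """
)

# COUNT over the inner CASE counts only the NULL sports in each group;
# if there is at least one, the outer CASE forces the result to NULL.
summary = conn.execute(
    """
    SELECT p.ID,
           p.Name,
           CASE WHEN COUNT(CASE WHEN s.Sport IS NULL THEN 1 END) > 0
                THEN NULL
                ELSE MIN(s.Sport)
           END AS Sport
    FROM Person AS p
    INNER JOIN Sports AS s ON p.ID = s.person_id
    GROUP BY p.ID, p.Name
    ORDER BY p.ID
    """
).fetchall()

print(summary)
```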

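You can't pick up the NULL with an aggregate function such as MIN(), because aggregates skip NULL values,
but you could try mapping NULL to an empty string first (the result then shows '' rather than NULL):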
SELECT p.ID, p.Name, min(ifnull(s.Sport,''))
FROM Person p
INNER JOIN Sports s ON p.ID=s.person_id
GROUP BY p.ID, p.name

Assuming the version of SQLite you are using supports row_number() (3.25 or later), try the query below. It numbers each person's rows in order of s.Sport ASC and keeps the row numbered 1. SQLite sorts NULL before other values in ascending order, so a NULL sport, if present, ends up in that first row. You don't need group by:
;with cte as (
select p.ID, p.Name, s.Sport,
ROW_NUMBER() OVER (PARTITION BY p.ID ORDER BY s.Sport ASC) AS rn
FROM Person p INNER JOIN Sports s ON p.ID=s.person_id
)
select *
from cte
where rn=1

You can do this with a correlated subquery, avoiding the join in the outer query:
select p.*,
(select s.sport
from sports s
where s.person_id = p.id
order by (s.sport is null) desc, s.sport asc
limit 1
) as min_sport
from person p;
This may prove useful under some circumstances. With an index on sports(person_id, sport), it might be faster than the group by, depending on the data (lots of people, few sports per person).
Also, this is slightly different from your query because it returns all people, even those with no sports.
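The difference from the join-based queries shows up once a person without any sports exists. The sketch below uses Python's sqlite3 module; the fourth person, 'ann', is a hypothetical row added only to illustrate that point, and the subquery is limited to one row so that it is a valid scalar subquery in other databases too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Sports (person_id INTEGER, Sport TEXT);
    -- 'ann' is a made-up person with no rows in Sports.
    INSERT INTO Person VALUES (1, 'tom'), (2, 'lisa'), (3, 'tim'), (4, 'ann');
    INSERT INTO Sports VALUES
        (1, 'soccer'), (1, NULL),
        (2, 'golf'),   (2, 'soccer'),
        (3, 'golf'),   (3, NULL);
    """
)

# (Sport IS NULL) is 1 for NULL rows, so DESC puts them first;
# among non-NULL rows the alphabetically smallest sport wins.
per_person = conn.execute(
    """
    SELECT p.ID,
           p.Name,
           (SELECT s.Sport
            FROM Sports AS s
            WHERE s.person_id = p.ID
            ORDER BY (s.Sport IS NULL) DESC, s.Sport ASC
            LIMIT 1) AS min_sport
    FROM Person AS p
    ORDER BY p.ID
    """
).fetchall()

print(per_person)
```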

Return first not null result from a column

On SQL Server, I have the following query (minimized):
SELECT A.ID, A.OWNER, B.CAR
FROM TABLE A
LEFT JOIN TABLE B ON A.ID = B.CAR_ID
Which returns the following:
ID Owner Car
01 Bob BMW
02 Bob NULL
03 Bob BMW
04 Andy Audi
05 Andy Audi
I want to Group By Owner with the first not NULL result for car to get:
Owner Car
Bob BMW
Andy Audi
I could do:
SELECT A.OWNER, max(B.CAR) as Car
FROM TABLE A
LEFT JOIN TABLE B ON A.ID = B.CAR_ID
GROUP BY A.OWNER
But, is there way to do this with Coalesce()? Or something else that might work better with a more complex query?

When a car is present, your result set always associates 'Bob' with 'BMW' and 'Andy' with 'Audi'. I assume, however, that in the real dataset there are owners that can have more than one type of car. So the question then becomes: "which one do you choose?".
If it's really arbitrary and doesn't matter, then your existing approach using 'max' is fine. At least it has a predictable default ordering so that you'll get the same output on every run given the same state of data in the base tables.
However, if something else should count as 'first', such as if you wanted to base the comparison on the 'id' field, then you're going to want to use 'row_number' to order by that field for within each owner, such as in the code below.
select owner, car
from (
select *,
ord = row_number() over(partition by owner order by id)
from [Table A] a
left join [Table B] b on a.id = b.car_id
where b.car is not null
) orderings
where ord = 1
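To see how "first by id" differs from max, here is a small sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later). The single table owner_cars stands in for the joined result of the two tables, and its rows are made up so that Bob owns two different cars; both are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical flattened result of the A/B join.
    CREATE TABLE owner_cars (id INTEGER PRIMARY KEY, owner TEXT, car TEXT);
    INSERT INTO owner_cars VALUES
        (1, 'Bob',  NULL),
        (2, 'Bob',  'BMW'),
        (3, 'Bob',  'Fiat'),
        (4, 'Andy', 'Audi'),
        (5, 'Andy', NULL);
    """
)

# First non-NULL car per owner, where "first" means lowest id.
first_by_id = conn.execute(
    """
    SELECT owner, car
    FROM (
        SELECT owner, car,
               ROW_NUMBER() OVER (PARTITION BY owner ORDER BY id) AS ord
        FROM owner_cars
        WHERE car IS NOT NULL
    ) AS orderings
    WHERE ord = 1
    ORDER BY owner
    """
).fetchall()

# MAX ignores NULLs too, but picks the alphabetically last car instead.
by_max = conn.execute(
    "SELECT owner, MAX(car) FROM owner_cars GROUP BY owner ORDER BY owner"
).fetchall()

print(first_by_id)
print(by_max)
```

One consequence of filtering with car IS NOT NULL inside the derived table is that an owner whose cars are all NULL disappears from the result; MAX with GROUP BY would keep that owner with a NULL car.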

I'm not sure what you mean by first result. If you want to go by the default order, you could do:
If you were ordering by ID, then it would be
SELECT DISTINCT FIRST_VALUE(Owner) OVER(PARTITION BY Owner ORDER BY ID), FIRST_VALUE(Car) OVER(PARTITION BY Owner ORDER BY ID)
FROM Table_Name WHERE Car IS NOT NULL

You could do the following:
SELECT TOP 1 WITH TIES A.OWNER, B.CAR
FROM TABLE A
LEFT JOIN TABLE B ON A.ID = B.CAR_ID
ORDER BY ROW_NUMBER() OVER (PARTITION BY A.OWNER ORDER BY IIF(B.CAR IS NOT NULL, 0, 1), A.ID)
By splitting the inner ORDER BY in two, you place all NULLs last and then order by the given ID in your result set. Within each distinct A.OWNER exactly one row receives ROW_NUMBER() 1. With TOP 1 WITH TIES you keep all the rows numbered 1 without using a subquery, leaving only one row per A.OWNER.

How to have IN and NOT IN at same time

Can someone help me to figure out how is the best way to do this?
I have a list of people with cars. I need to execute a query that will return people that have a type of car and don't have another type at the same time.
Here is my example:
ID Name CarType
----------- ---------- ----------
1 John MINI VAN
1 John SUV
2 Mary SUV
2 Mary SEDAN
3 Paul SPORT
3 Paul TRUCK
4 Joe SUV
4 Joe MINI VAN
For instance, I want to display only people that have SUV AND DON'T have MINI VAN. If we try the clause CarType IN ('SUV') AND NOT IN ('MINI VAN'), this will not work, because the second statement is just ignored.
In order to return people that have a type but don't have another type at the same time, I tried the following:
Create a temporary table with the IN clause, let's say #Contains
Create a temporary table with the NOT IN clause, let's say #DoesNotContain
Join table with #Contains, this will do the IN clause
On the where clause, look for IDs that are not in #DoesNotContain table.
The query that I am using is this:
--This is the IN Clause
create table #Contains (
ID int not null
)
--This is the NOT IN Clause
create table #DoesNotContains (
ID int not null
)
--Select IN
insert into #Contains
SELECT ID from #temp where CarType = 'SUV'
--Select NOT IN
insert into #DoesNotContains
SELECT ID from #temp where CarType = 'MINI VAN'
SELECT
a.ID, Name
FROM
#temp a
INNER JOIN #Contains b on b.ID = a.ID
WHERE
a.ID NOT IN (SELECT ID FROM #DoesNotContains)
Group by
a.ID, Name
This will return Mary because she has an SUV but does not have a MINI VAN.
Here are my questions:
Is it possible to execute this IN and NOT IN in the query, without temp tables? Is there something new in SQL that does that? (Sorry, last time I worked with SQL was SQL 2005)
Should we use temp tables for this?
If this is the way to go, should I use IN and NOT IN instead of the JOIN?
How to replace the NOT IN clause with a JOIN?
Thank y'all!
EDIT
I just tested the solutions but unfortunately I did not specify that I need a combination of cartypes. My bad :(
For instance, I may want all users that have both SUV and MINI VAN but neither TRUCK nor SEDAN. In this case only John is returned.

This is normally accomplished with a single query in standard SQL, using NOT EXISTS:
SELECT *
FROM mytable AS t1
WHERE CarType = 'SUV' AND
NOT EXISTS (SELECT *
FROM mytable AS t2
WHERE t1.Name = t2.Name AND t2.CarType = 'MINI VAN')
The above query selects all people who have CarType = 'SUV' but do not have CarType = 'MINI VAN'.
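A minimal way to try the NOT EXISTS form is Python's sqlite3 module with the rows from the question. The table name cars is an assumption, and this sketch correlates on ID rather than Name, on the assumption that two different people could share a first name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (ID INTEGER, Name TEXT, CarType TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        (1, "John", "MINI VAN"), (1, "John", "SUV"),
        (2, "Mary", "SUV"),      (2, "Mary", "SEDAN"),
        (3, "Paul", "SPORT"),    (3, "Paul", "TRUCK"),
        (4, "Joe", "SUV"),       (4, "Joe", "MINI VAN"),
    ],
)

# Keep SUV owners for whom no MINI VAN row with the same ID exists.
suv_without_minivan = conn.execute(
    """
    SELECT DISTINCT c.ID, c.Name
    FROM cars AS c
    WHERE c.CarType = 'SUV'
      AND NOT EXISTS (SELECT 1
                      FROM cars AS x
                      WHERE x.ID = c.ID
                        AND x.CarType = 'MINI VAN')
    ORDER BY c.ID
    """
).fetchall()

print(suv_without_minivan)
```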

Here's one way
SELECT Id, Name
FROM Cars
WHERE CarType = 'SUV'
EXCEPT
SELECT Id, Name
FROM Cars
WHERE CarType = 'MINI VAN'
Or another
SELECT Id, Name
FROM Cars
WHERE CarType IN ('SUV', 'MINI VAN')
GROUP BY Id, Name
HAVING MIN(CarType) = 'SUV'
Or a more generic version that addresses the different requirement in the comment.
SELECT Id,
NAME
FROM Cars
WHERE CarType IN ( 'SUV', 'MINI VAN', 'TRUCK')
GROUP BY Id,
NAME
HAVING COUNT(DISTINCT CASE
WHEN CarType IN ( 'SUV', 'MINI VAN' ) THEN CarType
END) = 2
AND COUNT(DISTINCT CASE
WHEN CarType IN ( 'TRUCK' ) THEN CarType
END) = 0
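The generic HAVING version can be parameterised so that the required and excluded car types are passed in as lists. The sketch below uses Python's sqlite3 module; the helper name owners_with and the table name cars are hypothetical. With the sample rows exactly as listed in the question, Joe has the same two car types as John, so both are returned for the SUV plus MINI VAN case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (ID INTEGER, Name TEXT, CarType TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        (1, "John", "MINI VAN"), (1, "John", "SUV"),
        (2, "Mary", "SUV"),      (2, "Mary", "SEDAN"),
        (3, "Paul", "SPORT"),    (3, "Paul", "TRUCK"),
        (4, "Joe", "SUV"),       (4, "Joe", "MINI VAN"),
    ],
)


def owners_with(conn, required, excluded):
    """People who own every car type in `required` and none in `excluded`."""
    required = sorted(set(required))
    excluded = sorted(set(excluded))
    req_marks = ", ".join("?" for _ in required)
    exc_marks = ", ".join("?" for _ in excluded)
    sql = f"""
        SELECT ID, Name
        FROM cars
        GROUP BY ID, Name
        -- all required types present: distinct matches equal the list length
        HAVING COUNT(DISTINCT CASE WHEN CarType IN ({req_marks})
                                   THEN CarType END) = ?
        -- no excluded type present
           AND COUNT(DISTINCT CASE WHEN CarType IN ({exc_marks})
                                   THEN CarType END) = 0
        ORDER BY ID
    """
    params = [*required, len(required), *excluded]
    return conn.execute(sql, params).fetchall()


both = owners_with(conn, ["SUV", "MINI VAN"], ["TRUCK", "SEDAN"])
suv_only = owners_with(conn, ["SUV"], ["MINI VAN"])
print(both)
print(suv_only)
```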

Using LEFT JOIN:
SELECT a.ID,
Name
FROM #temp a
INNER JOIN #Contains b ON b.ID = a.ID
LEFT OUTER JOIN #DoesNotContains c ON c.ID = a.ID
WHERE c.ID IS NULL
The INNER JOIN will return records where b.ID and a.ID match.
The LEFT OUTER JOIN returns all records, with NULL where there is no match - adding WHERE c.ID IS NULL returns records from a that don't match to c.

The keyword except is your friend. This is the general idea
where carType in
(select carType
from cars
where you want to include them
except
select carType
from cars
where you want to exclude them)
You can work out the details.

MSSQL: Display Rows for a Select with Case and Count even if Count = 0

I hope I can explain my problem in enough detail for you to understand it.
I've created a small example.
I have a table which looks like this:
City | Name
Berlin | Mike
Berlin City| Peter
Stuttgart | Boris
Here is my query:
SELECT CASE
WHEN City like '%Berlin%' THEN 'Count Person in Berlin:'
WHEN City like '%Stuttgart%' THEN 'Count Person in Stuttgart:'
WHEN City like '%Dresden%' THEN 'Count Person in Dresden:'
ELSE 'unknown'
END AS Text,
COUNT(Name) AS countPersons
FROM tblTest
GROUP BY City
This is the result:
Count Person in Berlin: 2
Count Person in Stuttgart: 1
But my desired result is:
Count Person in Berlin: 2
Count Person in Stuttgart: 1
Count Person in Dresden: 0
How can I get my desired result? I hope you can help me.
Thanks in advance.

If you don't have a table with the list of cities, then you can use a subquery. The key to solving this type of problem is left outer join:
select cities.city, count(t.city) as numpeople
from (select 'Berlin' as city union all
select 'Stuttgart' union all
select 'Dresden'
) cities left outer join
tbltest t
on t.city = cities.city
group by cities.city;
If you want to have 'unknown' as well, then full outer join can be used:
select coalesce(cities.city, 'unknown') as city, count(t.city) as numpeople
from (select 'Berlin' as city union all
select 'Stuttgart' union all
select 'Dresden'
) cities full outer join
tbltest t
on t.city = cities.city
group by coalesce(cities.city, 'unknown');
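The left outer join idea can be checked with Python's sqlite3 module. This sketch deviates from the queries above in one labelled way: it joins with LIKE instead of equality, so that 'Berlin City' is counted under 'Berlin' as the question's CASE expression intends. That choice is an assumption about the desired matching, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblTest (City TEXT, Name TEXT)")
conn.executemany(
    "INSERT INTO tblTest VALUES (?, ?)",
    [("Berlin", "Mike"), ("Berlin City", "Peter"), ("Stuttgart", "Boris")],
)

# The inline list of cities is the left side, so every city yields a row;
# COUNT(t.City) skips the NULL produced when nothing matches, giving 0.
counts = conn.execute(
    """
    SELECT cities.city, COUNT(t.City) AS numpeople
    FROM (SELECT 'Berlin' AS city UNION ALL
          SELECT 'Stuttgart' UNION ALL
          SELECT 'Dresden') AS cities
    LEFT OUTER JOIN tblTest AS t
           ON t.City LIKE '%' || cities.city || '%'   -- assumption: substring match
    GROUP BY cities.city
    ORDER BY cities.city
    """
).fetchall()

print(counts)
```

Counting t.City rather than using COUNT(*) matters here: COUNT(*) would report 1 for Dresden because the outer join still produces one row for it.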

SQL query or sub-query?

I've got a table of student information in MySQL that looks like this (simplified):
| age : int | city : text | name : text |
-----------------------------------------------------
| | | |
I wish to select all student names and ages within a given city, and also, per student, how many other students in his age group (that is, how many students share his age value).
I managed to do this with a sub-query; something like:
select
s1.name,
s1.age,
(select
count(*)
from
tbl_students s2
where
s2.age = s1.age) as same_age
from
tbl_students s1
where
s1.city = 'ny'
But it seems a bit slow, and I'm no SQL-wiz, so I figure I'd ask if there's a smarter way of doing this. The table is indexed by age and city.

select
t1.name,
t1.age as a,
count(t2.age) NumberSameAge
from
tbl_students t1 inner join tbl_students t2
on t1.age=t2.age
where
t1.city = 'ny'
group by t1.name, t1.age
Not tested, but something like that; in other words, a group by on a join. This can sometimes be faster, because your query runs a nested subquery for every row returned, whereas the query above (or at least its structure, a join plus a group by) queries the related students just once.

It might be easier to use a subquery that computes all the counts at once (rather than running the subquery 1000 times for 1000 rows).
SELECT Age, count(*) AS SameAge FROM tbl_students GROUP BY Age
Making the full query:
SELECT t.Name, t.Age, s.SameAge
FROM tbl_students t
INNER JOIN (
SELECT Age, count(*) AS SameAge FROM tbl_students GROUP BY Age
) AS s
ON (t.Age = s.Age) -- m:1
WHERE t.City = 'NY'
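The pre-aggregated join can be exercised with Python's sqlite3 module. The four students below are made-up rows for illustration. Note that the count includes the student, so subtracting 1 would give the number of other students of the same age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_students (name TEXT, age INTEGER, city TEXT);
    -- Hypothetical sample data.
    INSERT INTO tbl_students VALUES
        ('ann', 20, 'ny'),
        ('bob', 20, 'la'),
        ('cid', 21, 'ny'),
        ('dee', 20, 'ny');
    """
)

# The derived table counts students per age once, over all cities;
# the outer query then attaches that count to each student in 'ny'.
rows = conn.execute(
    """
    SELECT t.name, t.age, s.SameAge
    FROM tbl_students AS t
    INNER JOIN (
        SELECT age, COUNT(*) AS SameAge
        FROM tbl_students
        GROUP BY age
    ) AS s ON t.age = s.age
    WHERE t.city = 'ny'
    ORDER BY t.name
    """
).fetchall()

print(rows)
```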
