How to return results where one related field exists, but another does not? - sql

I'm sure there is an easy way to do this, but I can't even think of how to accurately describe what I want to do, so I can't Google it. My actual scenario is slightly more complicated than this, but a very basic version would be having two separate tables: one showing the names and dates of people who bought tickets for the bus, and another showing the names and dates of people who travelled on the bus, like so:
tableTicket                 tableTravel
custName   ticketDate       custName   travelDate
--------   ----------       --------   ----------
tim        01-jul-15        tim        01-jul-15
tim        03-jul-15        tim        02-jul-15
anna       15-jul-15        tim        03-jul-15
anna       16-jul-15        anna       15-jul-15
anna       20-jul-15        anna       16-jul-15
emily      02-jul-15        rob        07-jul-15
rob        07-jul-15        rob        12-jul-15
rob        12-jul-15        rob        13-jul-15
I want to return only the names and dates of people who travelled without a ticket, but not those who bought a ticket and didn't travel. So I would want to return Tim's journey on 02-jul-15, but not emily's unused ticket on 02-jul-15.
The problem I'm having is that I can't just look for dates that appear in travelDate but don't appear in ticketDate, because then it will match tim's journey with emily's ticket.
I presume that the best way to do this would be with joins, but I'm not sure how. If I join on custName, it creates a row where tim buys a ticket and travels on 01-jul-15 (which is fine) but also creates a row where he bought a ticket on 01-jul-15 and travels on 02-jul-15.
Ideally, I think I want to end up with a table like this, but I've got no idea how to get there
custName   ticketDate   travelDate
--------   ----------   ----------
tim        01-jul-15    01-jul-15
tim        NULL         02-jul-15
tim        03-jul-15    03-jul-15
anna       15-jul-15    15-jul-15
anna       16-jul-15    16-jul-15
anna       20-jul-15    NULL
emily      02-jul-15    NULL
rob        07-jul-15    07-jul-15
rob        12-jul-15    12-jul-15
rob        NULL         13-jul-15
I could then select all rows where ticketDate IS NULL and it would give me the details of the journeys that Tim and Rob took without a ticket.
Apologies if I'm barking up entirely the wrong tree. I'm sure it's something really easy, but I just can't wrap my head around it.

You seem to want a FULL OUTER JOIN:
select coalesce(tk.custName, tr.custName) as custName, tk.ticketDate, tr.travelDate
from tableticket tk full outer join
tabletravel tr
on tr.custname = tk.custname and tr.travelDate = tk.ticketDate;
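If only the unticketed journeys are needed, a LEFT JOIN from the travel table with an IS NULL filter may be enough, and it also runs on engines without FULL OUTER JOIN. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module and dates stored as text that match exactly; the table and column names come from the question, everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableTicket (custName TEXT, ticketDate TEXT);
    CREATE TABLE tableTravel (custName TEXT, travelDate TEXT);
""")

# Sample data copied from the question.
tickets = [
    ("tim", "01-jul-15"), ("tim", "03-jul-15"),
    ("anna", "15-jul-15"), ("anna", "16-jul-15"), ("anna", "20-jul-15"),
    ("emily", "02-jul-15"), ("rob", "07-jul-15"), ("rob", "12-jul-15"),
]
travels = [
    ("tim", "01-jul-15"), ("tim", "02-jul-15"), ("tim", "03-jul-15"),
    ("anna", "15-jul-15"), ("anna", "16-jul-15"),
    ("rob", "07-jul-15"), ("rob", "12-jul-15"), ("rob", "13-jul-15"),
]
conn.executemany("INSERT INTO tableTicket VALUES (?, ?)", tickets)
conn.executemany("INSERT INTO tableTravel VALUES (?, ?)", travels)

# Keep every journey; the ticket side is NULL when no ticket exists
# for the same customer on the same date.
unticketed = conn.execute("""
    SELECT tr.custName, tr.travelDate
    FROM tableTravel AS tr
    LEFT JOIN tableTicket AS tk
           ON tk.custName = tr.custName
          AND tk.ticketDate = tr.travelDate
    WHERE tk.custName IS NULL
    ORDER BY tr.custName, tr.travelDate
""").fetchall()

print(unticketed)
```

Because the join condition uses both the name and the date, emily's unused ticket never pairs with tim's journey on the same day.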

Related

select the row with max values grouped by other rows

I have a table like this
Name   Project Name   Version deployed   Deployment date
----   ------------   ----------------   ---------------
John   Car            1.9.8              2022-09-23
John   Car            2.2.4              2022-10-15
John   Car            2.2.5-beta3        2022-10-14
John   Plane          4.9.345            2020-03-12
John   Plane          6.7.89             2022-05-05
Jack   Plane          6.7.89             2022-05-05
Jack   Plane          6.5                2022-05-07
Jack   Dog            4.6.6              2022-08-23
Jack   Plane          6.7.89             2022-05-05
...
I would like a SQL query that returns ONE row per (Name, Project Name) group, containing the last deployed version with its deployment date (and possibly other columns not displayed here). For the example above, the output would be:
Name   Project Name   Version deployed   Deployment date
----   ------------   ----------------   ---------------
John   Car            2.2.4              2022-10-15
John   Plane          6.7.89             2022-05-05
Jack   Plane          6.5                2022-05-07
Jack   Dog            4.6.6              2022-08-23
...
I have already tried different answers from this website, but the ones I found either group on only one column or do not return the other columns.
EDIT:
The question that this has been marked as a duplicate of asks exactly what I mentioned last: grouping on one column.
The answers there do not explain how to extend them to grouping by multiple columns.
The answer provided by @YuriLevinsky in the comments here, on the other hand, solves exactly what I wanted.
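The answer referred to in the comments is not reproduced here. One common way to pick a single latest row per group of several columns is ROW_NUMBER() with a multi-column PARTITION BY. The sketch below assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module; the table name `deployments` and the column names are hypothetical stand-ins for the table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the data is the sample from the question.
conn.execute(
    "CREATE TABLE deployments (name TEXT, project TEXT, version TEXT, deployed_on TEXT)"
)
rows = [
    ("John", "Car", "1.9.8", "2022-09-23"),
    ("John", "Car", "2.2.4", "2022-10-15"),
    ("John", "Car", "2.2.5-beta3", "2022-10-14"),
    ("John", "Plane", "4.9.345", "2020-03-12"),
    ("John", "Plane", "6.7.89", "2022-05-05"),
    ("Jack", "Plane", "6.7.89", "2022-05-05"),
    ("Jack", "Plane", "6.5", "2022-05-07"),
    ("Jack", "Dog", "4.6.6", "2022-08-23"),
    ("Jack", "Plane", "6.7.89", "2022-05-05"),
]
conn.executemany("INSERT INTO deployments VALUES (?, ?, ?, ?)", rows)

# Number the rows inside each (name, project) group, newest first,
# then keep only the first row of every group.
latest = conn.execute("""
    SELECT name, project, version, deployed_on
    FROM (
        SELECT d.*,
               ROW_NUMBER() OVER (
                   PARTITION BY name, project
                   ORDER BY deployed_on DESC
               ) AS rn
        FROM deployments AS d
    )
    WHERE rn = 1
    ORDER BY name, project
""").fetchall()

for row in latest:
    print(row)
```

ROW_NUMBER() returns exactly one row per group even when two rows share the latest date; which of the tied rows wins is arbitrary unless a tie-breaker is added to the ORDER BY.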

Using HAVING to restrict results when using GROUP BY on multiple columns

Given a list of physician and patient interaction dates (VisitSchedule), I want to select all those physicians who have seen more than 2 unique patients. My problem is that in order to isolate physician/patients I need to group by both physician and patient. How do I then restrict the results such that only Dr. Moody is returned? Since he has seen three (3) unique patients and Dr. Franks has seen only two (2) unique patients even though he has had more visits in total?
Physician Patient VisitDate
-------------------------------------
Dr. Moody Danny 5/1/2013
Dr. Moody Danny 5/3/2013
Dr. Moody Danny 5/7/2013
Dr. Moody Paul 4/11/2013
Dr. Moody Paul 5/10/2013
Dr. Moody James 5/1/2013
Dr. Franks Allison 4/18/2013
Dr. Franks Allison 4/24/2013
Dr. Franks Tammy 4/11/2013
Dr. Franks Tammy 4/14/2013
Dr. Franks Tammy 5/11/2013
Dr. Franks Tammy 5/12/2013
Dr. Franks Tammy 5/17/2013
SELECT Physician
FROM VisitSchedule
GROUP BY Physician, Patient
HAVING (COUNT(Physician) > 2)
Am I using COUNT incorrectly?
Please note that my last question was related to this one, but I realized I didn't properly explain the grouping by two columns. I was trying to simplify the question so as not to make it too verbose, and I ended up over-simplifying it.
P.S. If anyone has any suggestions on "SQL Puzzle" books that would help one practice problems like these that would be great.
You started quite well; this query should finish the job:
SELECT
vs.Physician
FROM
VisitSchedule vs
GROUP BY
vs.Physician
HAVING
COUNT(DISTINCT vs.Patient) > 2
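To check the query against the sample data, here is a minimal sketch assuming SQLite through Python's sqlite3 module. The table and column names are those from the question; storing VisitDate as plain text is an assumption made only for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VisitSchedule (Physician TEXT, Patient TEXT, VisitDate TEXT)")

# Sample data copied from the question.
visits = [
    ("Dr. Moody", "Danny", "5/1/2013"), ("Dr. Moody", "Danny", "5/3/2013"),
    ("Dr. Moody", "Danny", "5/7/2013"), ("Dr. Moody", "Paul", "4/11/2013"),
    ("Dr. Moody", "Paul", "5/10/2013"), ("Dr. Moody", "James", "5/1/2013"),
    ("Dr. Franks", "Allison", "4/18/2013"), ("Dr. Franks", "Allison", "4/24/2013"),
    ("Dr. Franks", "Tammy", "4/11/2013"), ("Dr. Franks", "Tammy", "4/14/2013"),
    ("Dr. Franks", "Tammy", "5/11/2013"), ("Dr. Franks", "Tammy", "5/12/2013"),
    ("Dr. Franks", "Tammy", "5/17/2013"),
]
conn.executemany("INSERT INTO VisitSchedule VALUES (?, ?, ?)", visits)

# Group by physician only; COUNT(DISTINCT ...) collapses repeat visits
# by the same patient before the HAVING filter is applied.
busy_physicians = conn.execute("""
    SELECT vs.Physician
    FROM VisitSchedule AS vs
    GROUP BY vs.Physician
    HAVING COUNT(DISTINCT vs.Patient) > 2
""").fetchall()

print(busy_physicians)
```

Grouping by both Physician and Patient, as in the original attempt, makes COUNT tally visits per physician/patient pair instead of patients per physician, which is why that attempt does not isolate Dr. Moody.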

VBA/SQL recordsets

The project I'm asking about is for sending an email to teachers asking what books they're using for the classes they're teaching next semester, so that the books can be ordered. I have a query that compares the course number of this upcoming semester's classes to the course numbers of historical textbook orders, pulling out only those classes that are being taught this semester. That's where I get lost.
I have a table that contains the following:
Professor
Course Number
Year
Book Title
The data looks like this:
professor year course number title
--------- ---- ------------- -------------------
smith 13 1111 Pride and Prejudice
smith 13 1111 The Fountainhead
smith 13 1222 The Alchemist
smith 12 1111 Pride and Prejudice
smith 11 1222 Infinite Jest
smith 10 1333 The Bible
smith 13 1333 The Bible
smith 12 1222 The Alchemist
smith 10 1111 Moby Dick
johnson 12 1222 The Tipping Point
johnson 11 1333 Anna Kerenina
johnson 10 1333 Everything is Illuminated
johnson 12 1222 The Savage Detectives
johnson 11 1333 In Search of Lost Time
johnson 10 1333 Great Expectations
johnson 9 1222 Proust on the Shore
Here's what I need the code to do "on paper":
Group the records by professor. Determine every unique course number in that group, and group records by course number. For each unique course number, determine the highest year associated. Then spit out every record with that professor+course number+year combination.
With the sample data, the results would be:
professor year course number title
--------- ---- ------------- -------------------
smith 13 1111 Pride and Prejudice
smith 13 1111 The Fountainhead
smith 13 1222 The Alchemist
smith 13 1333 The Bible
johnson 12 1222 The Tipping Point
johnson 11 1333 Anna Kerenina
johnson 12 1222 The Savage Detectives
johnson 11 1333 In Search of Lost Time
I'm thinking I should make a record set for each teacher, and within that, another record set for each course number. Within the course number record set, I need the system to determine what the highest year number is - maybe store that in a variable? Then pull out every associated record so that if the teacher ordered 3 books the last time they taught that class (whether it was in 2013 or 2012 and so on) all three books display. I'm not sure I'm thinking of record sets in the right way, though.
My SQL so far is basic and clearly doesn't work:
SELECT [All].Professor, [All].Course, Max([All].Year)
FROM [All]
GROUP BY [All].Professor, [All].Course;
Use your query as a subquery and INNER JOIN it back to the [ALL] table to filter the rows.
SELECT
a.Professor,
a.Year,
a.Course,
a.title
FROM
[ALL] AS a
INNER JOIN
(
SELECT [All].Professor, [All].Course, Max([All].Year) AS MaxOfYear
FROM [All]
GROUP BY [All].Professor, [All].Course
) AS sub
ON
a.Professor = sub.Professor
AND a.Course = sub.Course
AND a.Year = sub.MaxOfYear;
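The join-back pattern can be tried on the sample data with the sketch below, which assumes SQLite through Python's sqlite3 module rather than Access. The table name `book_orders` is a hypothetical stand-in for [All], and the column names are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the [All] table in the question.
conn.execute("CREATE TABLE book_orders (professor TEXT, year INTEGER, course TEXT, title TEXT)")

orders = [
    ("smith", 13, "1111", "Pride and Prejudice"),
    ("smith", 13, "1111", "The Fountainhead"),
    ("smith", 13, "1222", "The Alchemist"),
    ("smith", 12, "1111", "Pride and Prejudice"),
    ("smith", 11, "1222", "Infinite Jest"),
    ("smith", 10, "1333", "The Bible"),
    ("smith", 13, "1333", "The Bible"),
    ("smith", 12, "1222", "The Alchemist"),
    ("smith", 10, "1111", "Moby Dick"),
    ("johnson", 12, "1222", "The Tipping Point"),
    ("johnson", 11, "1333", "Anna Kerenina"),
    ("johnson", 10, "1333", "Everything is Illuminated"),
    ("johnson", 12, "1222", "The Savage Detectives"),
    ("johnson", 11, "1333", "In Search of Lost Time"),
    ("johnson", 10, "1333", "Great Expectations"),
    ("johnson", 9, "1222", "Proust on the Shore"),
]
conn.executemany("INSERT INTO book_orders VALUES (?, ?, ?, ?)", orders)

# The subquery finds the latest year per professor and course; joining it
# back keeps every book ordered in that latest year.
latest_books = conn.execute("""
    SELECT a.professor, a.year, a.course, a.title
    FROM book_orders AS a
    INNER JOIN (
        SELECT professor, course, MAX(year) AS max_year
        FROM book_orders
        GROUP BY professor, course
    ) AS sub
        ON a.professor = sub.professor
       AND a.course = sub.course
       AND a.year = sub.max_year
    ORDER BY a.professor, a.course, a.title
""").fetchall()

for row in latest_books:
    print(row)
```

No recordset loops or variables are needed for this step: the grouping and the "highest year" lookup both happen inside the single query.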

duplicate fields with an inner join

I'm having trouble understanding how to do a multi-table join without generating lots of duplicate fields.
Let's say that I have three tables:
family: id, name
parent: id, family, name
child: id, family, name
If I do a simple select:
select family.id, family.name from family
order by family.id;
I get a simple list:
ID Name
1 Smith
2 Jones
3 Wong
If I add an inner join:
select family.id, family.name, parent.first_name, parent.last_name
from family
inner join parent
on parent.family = family.id
order by family.id;
I get some duplicated fields:
ID Name Parent
1 Smith Howard Smith
1 Smith Janet Smith
2 Jones Phil Jones
2 Jones Harriet Jones
3 Wong Billy Wong
3 Wong Rachel Wong
And if I add another inner join:
select family.id, family.name, parent.first_name, parent.last_name, child.first_name, child.last_name
from family
inner join parent
on parent.family = family.id
inner join child
on child.family = family.id
order by family.id;
I get even more duplicated fields:
ID Name Parent Child
1 Smith Howard Smith Peter Smith
1 Smith Howard Smith Sally Smith
1 Smith Howard Smith Fred Smith
1 Smith Janet Smith Peter Smith
1 Smith Janet Smith Sally Smith
1 Smith Janet Smith Fred Smith
2 Jones Phil Jones Mark Jones
2 Jones Phil Jones Melissa Jones
2 Jones Harriet Jones Mark Jones
2 Jones Harriet Jones Melissa Jones
3 Wong Billy Wong Mary Wong
3 Wong Billy Wong Jennifer Wong
3 Wong Rachel Wong Mary Wong
3 Wong Rachel Wong Jennifer Wong
What I would prefer, because it's more human readable, is something like this:
ID  Name   Parent         Child
1   Smith  Howard Smith   Peter Smith
           Janet Smith    Sally Smith
                          Fred Smith
2   Jones  Phil Jones     Mark Jones
           Harriet Jones  Melissa Jones
3   Wong   Billy Wong     Mary Wong
           Rachel Wong    Jennifer Wong
I know that one of the benefits of an inner join is to avoid presenting excess information through a Cartesian product. But it seems that I get something similar with a multi-table join. Is there a way to summarize each group as shown above or will this require post-processing with a scripting language like Python?
Thanks,
--Dan
This is precisely the way relational databases work: each row must contain all of its information in itself, with every single field that you request. In other words, each row needs to make sense in isolation from all other rows. If you run a single query and need all three levels of information, you have to eliminate the duplicates yourself to get the desired formatting.
Alternatively, you can run three separate queries and then do in-memory joins in code. Although this may be desirable in certain rare situations, it is generally a poor use of your development time, because an RDBMS is usually much more efficient at joining relational data.
You've hit it on the head. You'll need some post processing to get the results you're looking for.
SQL query results are always simple tabular data, so to get the results you're looking for would definitely not be a pretty query. You could do it, but it would involve quite a bit of query voodoo, storing things in temporary tables or using cursors, or some other funky workaround.
I'd definitely suggest using an external application to retrieve your data and format it appropriately from there.
ORMs like Entity Framework in .NET can probably do this pretty easily, but you could definitely do this with a few nested collections or dictionaries in any language.
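As a rough illustration of the "nested collections or dictionaries" idea, here is a minimal Python sketch. It assumes the rows have already been fetched from the two-join query as (id, family name, parent, child) tuples; the function names and column widths are made up for the example.

```python
from itertools import zip_longest

# Rows as the two-join query would return them (sample data from the question).
rows = [
    (1, "Smith", "Howard Smith", "Peter Smith"),
    (1, "Smith", "Howard Smith", "Sally Smith"),
    (1, "Smith", "Howard Smith", "Fred Smith"),
    (1, "Smith", "Janet Smith", "Peter Smith"),
    (1, "Smith", "Janet Smith", "Sally Smith"),
    (1, "Smith", "Janet Smith", "Fred Smith"),
    (2, "Jones", "Phil Jones", "Mark Jones"),
    (2, "Jones", "Phil Jones", "Melissa Jones"),
    (2, "Jones", "Harriet Jones", "Mark Jones"),
    (2, "Jones", "Harriet Jones", "Melissa Jones"),
    (3, "Wong", "Billy Wong", "Mary Wong"),
    (3, "Wong", "Billy Wong", "Jennifer Wong"),
    (3, "Wong", "Rachel Wong", "Mary Wong"),
    (3, "Wong", "Rachel Wong", "Jennifer Wong"),
]


def group_families(rows):
    """Collect each family's distinct parents and children in first-seen order."""
    families = {}
    for fam_id, fam_name, parent, child in rows:
        entry = families.setdefault((fam_id, fam_name), {"parents": [], "children": []})
        if parent not in entry["parents"]:
            entry["parents"].append(parent)
        if child not in entry["children"]:
            entry["children"].append(child)
    return families


def format_report(families):
    """Lay parents and children side by side, one output line per pair."""
    lines = []
    for (fam_id, fam_name), entry in families.items():
        pairs = zip_longest(entry["parents"], entry["children"], fillvalue="")
        for i, (parent, child) in enumerate(pairs):
            # Show the family id and name only on the first line of each group.
            id_text = str(fam_id) if i == 0 else ""
            name_text = fam_name if i == 0 else ""
            lines.append(f"{id_text:<4}{name_text:<7}{parent:<15}{child}".rstrip())
    return lines


families = group_families(rows)
report = format_report(families)
print("\n".join(report))
```

The database still returns the repeated rows; the de-duplication and the blank cells are purely a presentation step done after the query.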

Joining SQL Tables

I have two tables:
RecommendedFriends and AddedFriends
Each of the tables has a User field and a Friend field. I am trying to figure out how I can see how many friends a User added that they were also recommended. Here's an example of the tables:
RecommendedFriends
User Friends Time
------------------------------------
Jake Eric 8:00am
Jake John 8:00am
Jake Jack 8:30am
Greg John 8:30am
Greg Tim 9:00am
Greg Steve 9:30am
Will Jackson 9:30am
AddedFriends
User Friends Time
------------------------------------
Jake Jack 8:35am
Greg John 8:35am
Greg Tim 9:00pm
Greg Jim 10:30am
Greg Tina 10:45am
Greg Bob 10:00am
Charlie Brian 11:00am
So the table I need would look like this:
Results
User      RecFriends   AddFriends
------------------------------------
Jake      Eric
Jake      John
Jake      Jack         Jack
Greg      John         John
Greg      Tim          Tim
Greg      Steve
Greg                   Tina
Will      Jackson
Charlie                Brian
So I can go in and say 3 people added friends they were recommended, 4 Recommendations failed, and 2 people added someone they weren't recommended.
I think what you want is full outer join:
select coalesce(rf.USER, af.user) as user, rf.friends as RecFriends, af.Friends as AddFriends
from RecommendedFriends rf full outer join
AddedFriends af
on rf.user = af.user and
rf.Friends = af.Friends
This doesn't take time into account. You might want to check that the time of the add is after the time of the recommendation, if you want to infer causality between the recommendation and the add.
If you are using a database that doesn't support full outer join (can anyone say "MySQL"), you can get the same result doing:
select t.user, MAX(case when which = 'rec' then friends end) as RecFriends,
MAX(case when which = 'add' then friends end) as AddFriends
from ((select rf.user, rf.friends, 'rec' as which
from RecommendedFriends rf
) union all
(select af.user, af.friends, 'add' as which
from AddedFriends af
)
) t
group by t.user, t.friends
This version has the nice feature that it will not produce duplicate records, in the event of multiple recommendations or adds.
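The UNION ALL approach can be tried without FULL OUTER JOIN support using the sketch below, which assumes SQLite through Python's sqlite3 module. It groups by both user and friend so that each recommended or added pair gets its own row; the Time column is left out because the query does not use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RecommendedFriends (User TEXT, Friends TEXT);
    CREATE TABLE AddedFriends (User TEXT, Friends TEXT);
""")

# Sample data copied from the question (Time column omitted).
recommended = [
    ("Jake", "Eric"), ("Jake", "John"), ("Jake", "Jack"),
    ("Greg", "John"), ("Greg", "Tim"), ("Greg", "Steve"),
    ("Will", "Jackson"),
]
added = [
    ("Jake", "Jack"), ("Greg", "John"), ("Greg", "Tim"),
    ("Greg", "Jim"), ("Greg", "Tina"), ("Greg", "Bob"),
    ("Charlie", "Brian"),
]
conn.executemany("INSERT INTO RecommendedFriends VALUES (?, ?)", recommended)
conn.executemany("INSERT INTO AddedFriends VALUES (?, ?)", added)

# Stack both tables, tag each row with its origin, then pivot the tag
# into two columns with one output row per (user, friend) pair.
pairs = conn.execute("""
    SELECT t.User,
           MAX(CASE WHEN t.which = 'rec' THEN t.Friends END) AS RecFriends,
           MAX(CASE WHEN t.which = 'add' THEN t.Friends END) AS AddFriends
    FROM (
        SELECT User, Friends, 'rec' AS which FROM RecommendedFriends
        UNION ALL
        SELECT User, Friends, 'add' AS which FROM AddedFriends
    ) AS t
    GROUP BY t.User, t.Friends
    ORDER BY t.User, t.Friends
""").fetchall()

# Summary counts of the kind the question asks for.
matched = sum(1 for _, rec, add in pairs if rec and add)
rec_only = sum(1 for _, rec, add in pairs if rec and not add)
add_only = sum(1 for _, rec, add in pairs if add and not rec)

print(matched, rec_only, add_only)
```

On the sample tables this counts every row of AddedFriends, including Greg's adds of Jim and Bob, which the expected Results table in the question leaves out.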