I have a MySQL database, in which I have a table of monkeys:
id name
1 Alice
2 Bill
3 Donkey Kong
4 Edna
5 Feefee
I also have a table of bananas and where they were picked from.
id where_from
1 USA
2 Botswana
3 Banana-land
4 USA
Finally, I have a table matches that describes which bananas belong to which monkeys. Each monkey can only have one banana, and no monkeys can share a banana. Some monkeys may lack a banana.
id monkey_id banana_id
1 3 4
2 4 1
3 5 2
How can I use a single SQL statement to retrieve all the matches? For each match, I want the name of the monkey as well as where the banana is from.
I have tried the following 3 SQL statements, which work:
SELECT * FROM matches
SELECT * FROM monkeys WHERE id=[monkey_id from 1st SQL query]
SELECT * FROM bananas WHERE id=[banana_id from 1st SQL query]
I feel that 3 SQL statements is cumbersome though. Any ideas on how I can just use a single SQL statement? I am just learning SQL and am monkeying around with the basics. Thanks!
Since some monkeys may lack a banana, that implies a LEFT JOIN between matches and monkeys. That will ensure all monkeys are listed, even if they have no bananas in matches.
SELECT
monkeys.name,
bananas.where_from
FROM
monkeys
/* List all monkeys, even if they have no match */
LEFT JOIN matches ON monkeys.id = matches.monkey_id
/* And another LEFT JOIN to link matches to bananas */
LEFT JOIN bananas ON bananas.id = matches.banana_id
Here is an example on SQLfiddle.com
I very highly recommend reading over Jeff Atwood's (co-founder of Stack Overflow) excellent article explaining SQL joins.
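To see the difference between the LEFT JOIN above and a plain INNER JOIN, here is a minimal sketch that loads the sample data from the question into an in-memory SQLite database (an assumption for convenience; the question uses MySQL, but these joins behave the same way there). The variable names `left_rows` and `inner_rows` are purely illustrative.

```python
import sqlite3

# In-memory database populated with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monkeys (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bananas (id INTEGER PRIMARY KEY, where_from TEXT);
    CREATE TABLE matches (id INTEGER PRIMARY KEY, monkey_id INTEGER, banana_id INTEGER);
    INSERT INTO monkeys VALUES (1,'Alice'),(2,'Bill'),(3,'Donkey Kong'),(4,'Edna'),(5,'Feefee');
    INSERT INTO bananas VALUES (1,'USA'),(2,'Botswana'),(3,'Banana-land'),(4,'USA');
    INSERT INTO matches VALUES (1,3,4),(2,4,1),(3,5,2);
""")

# LEFT JOIN: every monkey is listed; monkeys without a match get NULL (None).
left_rows = conn.execute("""
    SELECT monkeys.name, bananas.where_from
    FROM monkeys
    LEFT JOIN matches ON monkeys.id = matches.monkey_id
    LEFT JOIN bananas ON bananas.id = matches.banana_id
    ORDER BY monkeys.id
""").fetchall()

# INNER JOIN starting from matches: only monkeys that actually have a banana.
inner_rows = conn.execute("""
    SELECT monkeys.name, bananas.where_from
    FROM matches
    INNER JOIN monkeys ON monkeys.id = matches.monkey_id
    INNER JOIN bananas ON bananas.id = matches.banana_id
    ORDER BY matches.id
""").fetchall()

print(left_rows)
print(inner_rows)
```

If only the matched pairs are wanted, as the question literally asks, the INNER JOIN form is probably the closer fit; the LEFT JOIN form is useful when banana-less monkeys should still appear.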
This has been driving me and my team up the wall. I cannot compose a query that strictly matches a single record with a specific permutation of lookups.
We have a single lookup table
room_member_lookup:
room | member
---------------
A | Michael
A | Josh
A | Kyle
B | Kyle
B | Monica
C | Michael
I need to match a room with an exact list of members, but everything else I've tried from Stack Overflow still matches room A even if I ask for a room with ONLY Josh and Kyle.
I've tried queries like
SELECT room FROM room_member_lookup
WHERE member IN ('Josh', 'Michael')
GROUP BY room
HAVING COUNT(1) = 2
However, this still returns room A even though that room has 3 members. I need the query to match a room only on its exact set of members, not on partial matches.
SELECT room
FROM room_member_lookup a
WHERE member IN ('Monica', 'Kyle')
-- Make sure that the room a.room has exactly two members in total
AND (SELECT COUNT(*)
FROM room_member_lookup b
WHERE a.room = b.room) = 2
GROUP BY room
-- and both requested members are in that room
HAVING COUNT(1) = 2
Depending on the SQL dialect, one can also build a dynamic table (a CTE or SELECT ... UNION ALL) to hold the member set (Monica and Kyle, for example), and then test for set equivalence using the MINUS/EXCEPT SQL operators.
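As a rough illustration of exact-set matching, the sketch below uses a single GROUP BY pass rather than the correlated subquery: a room qualifies only if its total member count equals the size of the requested set and all of those members are in the set. It runs against an in-memory SQLite copy of the lookup table; the helper name `rooms_with_exactly` is hypothetical, and it assumes the requested members are distinct and that each (room, member) pair appears only once.

```python
import sqlite3

# In-memory copy of the lookup table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE room_member_lookup (room TEXT, member TEXT);
    INSERT INTO room_member_lookup VALUES
        ('A','Michael'),('A','Josh'),('A','Kyle'),
        ('B','Kyle'),('B','Monica'),
        ('C','Michael');
""")

def rooms_with_exactly(members):
    """Return rooms whose member set equals `members` (hypothetical helper)."""
    placeholders = ",".join("?" for _ in members)
    sql = f"""
        SELECT room
        FROM room_member_lookup
        GROUP BY room
        -- total members in the room must equal the size of the requested set
        HAVING COUNT(*) = ?
        -- and every one of those members must be in the requested set
           AND SUM(CASE WHEN member IN ({placeholders}) THEN 1 ELSE 0 END) = ?
        ORDER BY room
    """
    params = [len(members), *members, len(members)]
    return [row[0] for row in conn.execute(sql, params)]

print(rooms_with_exactly(["Monica", "Kyle"]))
print(rooms_with_exactly(["Josh", "Kyle"]))
```

Asking for Josh and Kyle returns nothing here, because room A has a third member and no other room contains both.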
I am trying to work with two tables on BigQuery. From table1 I want to find the accession ID of all records that are "World", and then from each of those accession numbers I want to create a column with every name in a separate row. Unfortunately, when I run this:
Select name
From `table2`
Where acc IN (Select acc
From `table1`
WHERE source = 'World')
Instead of getting something like this:
Acc1   Acc2   Acc3
Jeff   Jeff   Ted
Chris  Ted    Blake
Rob    Jack   Jack
I get something more like this:
row name
1 Jeff
2 Chris
3 Rob
4 Jack
5 Jeff
6 Jack
7 Ted
8 Blake
Ultimately, I am hoping to download the data and use Python or something similar to take each name and count the number of times it shows up with each other name at a given accession number, and furthermore to measure the degree to which each pairing is also found with third names in any given column, i.e. the degree to which they share a cohort. So I need to preserve the groupings that exist with each accession number, but I am struggling to find info on how one might do this.
Could anybody point me in the right direction for this? Or, if that is my end goal, is the way I am going about this even wise?
Thanks!
This is not a direct answer to the question you asked. In general, it is easier to handle multiple rows rather than multiple columns.
So, I would recommend that you put each acc value in a separate row and then list the names as an array:
select t2.acc, array_agg(t2.name order by t2.name) as names
from `table2` t2
where t2.acc in (Select t1.acc
From `table1` t1
where t1.source = 'World'
)
group by t2.acc;
Otherwise, you are going to have a challenge just naming the columns in your result set.
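Once the names are grouped per accession like this, the pair counting the question describes becomes fairly direct. The following is only a sketch: `names_by_acc` is made-up data using names from the question, standing in for whatever the downloaded (acc, names) rows turn out to be, and `count_pairs` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# Illustrative data only: one list of names per accession, shaped like the
# (acc, names) rows the array_agg query would return.
names_by_acc = {
    "Acc1": ["Jeff", "Chris", "Rob"],
    "Acc2": ["Jeff", "Ted", "Jack"],
    "Acc3": ["Ted", "Blake", "Jack"],
}

def count_pairs(groups):
    """Count how many accessions each unordered pair of names shares."""
    counts = Counter()
    for names in groups.values():
        # sorted(set(...)) removes repeats within one accession and gives
        # every pair a canonical (alphabetical) order.
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(names_by_acc)
print(pair_counts.most_common(3))
```

Keeping one row per accession preserves the grouping, which is what makes this kind of co-occurrence count possible in the first place.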
I am new to SQL and SSRS and can already do many things, but I think I must be missing some basics, and so I keep banging my head against the wall.
I have a report that is almost working, but it needs to include more results, based on conditions.
My working query so far is like this:
SELECT projects.project_number, project_phases.project_phase_id, project_phases.project_phase_number, project_phases.project_phase_header, project_phase_expensegroups.projectphase_expense_total, invoicerows.invoicerow_total
FROM projects INNER JOIN
project_phases ON projects.project_id = project_phases.project_id
LEFT OUTER JOIN
project_phase_expensegroups ON project_phases.project_phase_id = project_phase_expensegroups.project_phase_id
LEFT OUTER JOIN
invoicerows ON project_phases.project_phase_id = invoicerows.project_phase_id
WHERE ( projects.project_number = @iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total >0 )
The parameter comes from a selection list that is used to choose a project for the report.
How can I also include records where
( project_phase_expensegroups.projectphase_expense_total ) is 0 but there might be invoices for that project phase?
I already tried adding another condition like this:
WHERE ( projects.project_number = @iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total > 0 )
OR
( invoicerows.invoicerow_total > 0 )
While this gives some results, including the ones where projectphase_expense_total is 0, the report is a total mess.
So my question is: what am I doing wrong here?
There is a core problem with your query in that you are left joining to two tables, implying that rows may not exist, but then putting conditions on those tables, which will eliminate NULLs. That means your query is internally inconsistent as is.
The next problem is that you're joining two tables to project_phases that both may have multiple rows. Since these data are not related to each other (as proven by the fact that you have no join condition between project_phase_expensegroups and invoicerows), your query is not going to work correctly. For example, given a list of people, a list of those people's favorite foods, and a list of their favorite colors like so:
People
Person
------
Joe
Mary
FavoriteFoods
Person Food
------ ---------
Joe Broccoli
Joe Bananas
Mary Chocolate
Mary Cake
FavoriteColors
Person Color
------ ----------
Joe Red
Joe Blue
Mary Periwinkle
Mary Fuchsia
When you join these with links between Person <-> Food and Person <-> Color, you'll get a result like this:
Person Food Color
------ --------- ----------
Joe Broccoli Red
Joe Bananas Red
Joe Broccoli Blue
Joe Bananas Blue
Mary Chocolate Periwinkle
Mary Chocolate Fuchsia
Mary Cake Periwinkle
Mary Cake Fuchsia
This is essentially a cross-join, also known as a Cartesian product, between the Foods and the Colors, because they have a many-to-one relationship with each person, but no relationship with each other.
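The fan-out is easy to reproduce. This sketch loads the three example tables into an in-memory SQLite database (used here only because it needs no setup) and counts the joined rows.

```python
import sqlite3

# The People / FavoriteFoods / FavoriteColors example from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (Person TEXT);
    CREATE TABLE FavoriteFoods (Person TEXT, Food TEXT);
    CREATE TABLE FavoriteColors (Person TEXT, Color TEXT);
    INSERT INTO People VALUES ('Joe'),('Mary');
    INSERT INTO FavoriteFoods VALUES
        ('Joe','Broccoli'),('Joe','Bananas'),('Mary','Chocolate'),('Mary','Cake');
    INSERT INTO FavoriteColors VALUES
        ('Joe','Red'),('Joe','Blue'),('Mary','Periwinkle'),('Mary','Fuchsia');
""")

# Foods and colors are each joined to the person but not to each other,
# so every food is paired with every color of the same person.
rows = conn.execute("""
    SELECT p.Person, f.Food, c.Color
    FROM People p
    JOIN FavoriteFoods f ON f.Person = p.Person
    JOIN FavoriteColors c ON c.Person = p.Person
""").fetchall()

print(len(rows))  # 2 people x 2 foods x 2 colors
```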
There are a few ways to deal with this in the report.
Create ExpenseGroup and InvoiceRow subreports, that are called from the main report by a combination of project_id and project_phase_id parameters.
Summarize one or the other set of data into a single value. For example, you could sum the invoice rows. Or, you could concatenate the expense groups into a single string separated by commas.
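The second option can be sketched by summing each child table in its own derived table before joining, so that each phase contributes exactly one row. The schema below is a cut-down, hypothetical version of the tables in the question (only the key and total columns, with invented values), and it summarizes both sides, although summarizing just one of them already stops the row multiplication.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns needed for the illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_phases (project_phase_id INTEGER PRIMARY KEY);
    CREATE TABLE project_phase_expensegroups (project_phase_id INTEGER, projectphase_expense_total INTEGER);
    CREATE TABLE invoicerows (project_phase_id INTEGER, invoicerow_total INTEGER);
    INSERT INTO project_phases VALUES (1),(2);
    INSERT INTO project_phase_expensegroups VALUES (1,100),(1,50),(2,0);
    INSERT INTO invoicerows VALUES (1,10),(1,20),(2,5);
""")

# Each derived table collapses its child rows to one row per phase, so the
# joins can no longer multiply expense rows by invoice rows.
rows = conn.execute("""
    SELECT pp.project_phase_id,
           COALESCE(e.expense_total, 0) AS expense_total,
           COALESCE(i.invoice_total, 0) AS invoice_total
    FROM project_phases pp
    LEFT JOIN (SELECT project_phase_id, SUM(projectphase_expense_total) AS expense_total
               FROM project_phase_expensegroups
               GROUP BY project_phase_id) e
           ON e.project_phase_id = pp.project_phase_id
    LEFT JOIN (SELECT project_phase_id, SUM(invoicerow_total) AS invoice_total
               FROM invoicerows
               GROUP BY project_phase_id) i
           ON i.project_phase_id = pp.project_phase_id
    ORDER BY pp.project_phase_id
""").fetchall()

print(rows)
```

Phase 2 still appears even though its expense total is 0, because nothing in the query filters on the expense side.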
Some notes:
Please, please format your query before posting it in a question. It is almost impossible to read when not formatted. It seems pretty clear that you're using a GUI to create the query, but do us the favor of not making us format it ourselves just to help you.
While formatting, please use aliases. Don't use full table names; they just make the query that much harder to understand.
You need an extra pair of parentheses in your WHERE clause in order to get the logic right.
WHERE ( projects.project_number = @iProjectNumber )
AND (
(project_phase_expensegroups.projectphase_expense_total > 0)
OR
(invoicerows.invoicerow_total > 0)
)
Also, you're using a column in your WHERE clause from a table that is left joined without checking for NULLs. That basically makes it a (slow) inner join. If you want to include rows that don't match from that table, you also need to check for NULL. Any comparison against a NULL value other than IS NULL (or IS NOT NULL) evaluates to UNKNOWN, which the WHERE clause treats as not true, so the row is dropped. See this page for more information about SQL's three-valued predicate logic: http://www.firstsql.com/idefend3.htm
To keep your LEFT JOINs working as you intended you would need to do this:
WHERE ( projects.project_number = @iProjectNumber )
AND (
project_phase_expensegroups.projectphase_expense_total > 0
OR project_phase_expensegroups.project_phase_id IS NULL
OR invoicerows.invoicerow_total > 0
OR invoicerows.project_phase_id IS NULL
)
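The effect of the NULL check can be seen on a tiny example. The two tables below are hypothetical stand-ins (`phases` and `invoices`, with invented rows), and the `phase_ids` helper interpolates its WHERE clause directly into the SQL purely for demonstration.

```python
import sqlite3

# Hypothetical tables: phase 3 has no invoice rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phases (phase_id INTEGER PRIMARY KEY);
    CREATE TABLE invoices (phase_id INTEGER, total INTEGER);
    INSERT INTO phases VALUES (1),(2),(3);
    INSERT INTO invoices VALUES (1,25),(2,0);
""")

def phase_ids(where_clause):
    # String interpolation is for demonstration only; never build SQL from
    # untrusted input this way.
    sql = f"""
        SELECT p.phase_id
        FROM phases p
        LEFT JOIN invoices i ON i.phase_id = p.phase_id
        WHERE {where_clause}
        ORDER BY p.phase_id
    """
    return [row[0] for row in conn.execute(sql)]

# Filtering on the left-joined column alone silently drops phase 3, because
# NULL > 0 is not true: the LEFT JOIN behaves like an INNER JOIN.
filtered = phase_ids("i.total > 0")

# Adding the IS NULL check keeps the unmatched phase.
null_aware = phase_ids("i.total > 0 OR i.phase_id IS NULL")

print(filtered, null_aware)
```

Another common option, not shown here, is to move the filter into the ON clause of the LEFT JOIN so that it restricts which rows are joined rather than which rows survive.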
I found the solution, and it was kind of easy after all. I changed only the second LEFT OUTER JOIN to an INNER JOIN and left out the condition that limited the query to results over zero. I also used SELECT DISTINCT.
Now my report is working perfectly.
I've written a query that's producing ghost records. Here is the statement, which produces correct results from one table JOINed to a second table to grab the student's LAST_ATTEND_DATE. Notice that the LAST_ATTEND_DATE column is commented out, so it won't display:
SELECT DISTINCT TOP 500
SAC.STC_PERSON_ID AS CCID#,
SAC.STC_COURSE_NAME AS CourseName,
SAC.STC_TITLE AS Title,
SAC.STC_VERIFIED_GRADE AS Grade,
--CONVERT(varchar(10),SCS.SCS_LAST_ATTEND_DATE,101) AS LastAttended,
SAC.STC_REPORTING_TERM AS Term,
SAC.STC_ACAD_LEVEL AS AcadLevel
FROM STUDENT_ACAD_CRED SAC
JOIN STUDENT_COURSE_SEC SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT
WHERE (SAC.STC_ACAD_LEVEL = 'UG') AND (SCS.SCS_LAST_ATTEND_DATE IS NOT NULL)
ORDER BY SAC.STC_PERSON_ID;
This produces what I need, except that I also need to display the student's Last Attended Date in the resulting data. If I un-comment the statement above to display the LAST_ATTEND_DATE, 4 records appear, of which 2 are ghost records. For example, student ID = '0000002' took English 1010 once in the Fall of 1992 and made a D, then retook the course in the Fall of 1993 and made a B.
0000002 ENGL*1010 English I D 92/FA UG
0000002 ENGL*1010 English I B 93/FA UG
With the LAST_ATTEND_DATE statement (CONVERT(varchar(10),SCS.SCS_LAST_ATTEND_DATE,101) AS LastAttended) un-commented to display the date, then 3 additional records appear...
I've tried changing the query between the 2 tables from JOIN, to LEFT JOIN, FULL JOIN and RIGHT JOIN. I always get 3 additional records that don't exist.
0000002 ENGL*1010 English I B 01/19/1995 93/FA UG
0000002 ENGL*1010 English I B 07/18/1996 93/FA UG
0000002 ENGL*1010 English I B 09/25/1992 93/FA UG
0000002 ENGL*1010 English I D 01/19/1995 92/FA UG
0000002 ENGL*1010 English I D 07/18/1996 92/FA UG
Would anyone know the correct syntax to JOIN these 2 tables correctly to display the data correctly?
Thanks so much for sharing your knowledge,
Donald, Casper College
Most likely the Student_Course_Sec table contains more than one record per student, which your join statement is not accounting for.
For example, if the SCS table consists of:
SCS_Student SCS_CourseName SCS_LastAttendDate
1 English 1/1/2014
1 Calculus 2/1/2014
2 English 3/1/2014
2 Philosophy 4/1/2014
And your SAC table consists of:
STC_PERSON_ID STC_COURSE_NAME etc.
1 English
1 Calculus
2 English
2 Philosophy
then when you SELECT * FROM SAC JOIN SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT, your result set looks like this:
(row) STC_ID STC_Course SCS_ID SCS_Course SCS_Date
1 1 English 1 English 1/1/2014
2 1 English 1 Calculus 2/1/2014
3 1 Calculus 1 English 1/1/2014
4 1 Calculus 1 Calculus 2/1/2014
5 2 English 2 English 3/1/2014
6 2 English 2 Philosophy 4/1/2014
7 2 Philosophy 2 English 3/1/2014
8 2 Philosophy 2 Philosophy 4/1/2014
Your WHERE clause then filters out all the rows where STC_COURSE is not "English", leaving you with 4 rows (row numbers 1,2,5,6) instead of just the 2 you really want (rows 1 and 5). (And, because you're not reporting any of the other fields, it just looks like "phantom records" appear out of nowhere.)
To fix it, you need additional conditions on your JOIN specifying what else besides the ID needs to match up. In my contrived case, you would need to say
JOIN STUDENT_COURSE_SEC SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT AND SAC.STC_COURSE_NAME = SCS.SCS_COURSE_NAME
which selects only rows where both the student and the course are a proper match.
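Here is the contrived case as a runnable sketch, using an in-memory SQLite database in place of the real SQL Server tables. The short table names `SAC` and `SCS` and the ISO date strings are simplifications for the example; `SCS_COURSE_NAME` is the assumed column name from the contrived table above, not a confirmed column of the real schema.

```python
import sqlite3

# Contrived data from the answer; dates rewritten as ISO strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SAC (STC_PERSON_ID INTEGER, STC_COURSE_NAME TEXT);
    CREATE TABLE SCS (SCS_STUDENT INTEGER, SCS_COURSE_NAME TEXT, SCS_LAST_ATTEND_DATE TEXT);
    INSERT INTO SAC VALUES (1,'English'),(1,'Calculus'),(2,'English'),(2,'Philosophy');
    INSERT INTO SCS VALUES
        (1,'English','2014-01-01'),(1,'Calculus','2014-02-01'),
        (2,'English','2014-03-01'),(2,'Philosophy','2014-04-01');
""")

# Joining on the student ID alone pairs every course with every section
# of the same student.
id_only = conn.execute("""
    SELECT COUNT(*)
    FROM SAC
    JOIN SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT
""").fetchone()[0]

# Adding the course to the join condition leaves one row per course taken.
id_and_course = conn.execute("""
    SELECT SAC.STC_PERSON_ID, SAC.STC_COURSE_NAME, SCS.SCS_LAST_ATTEND_DATE
    FROM SAC
    JOIN SCS ON SAC.STC_PERSON_ID = SCS.SCS_STUDENT
            AND SAC.STC_COURSE_NAME = SCS.SCS_COURSE_NAME
    ORDER BY SAC.STC_PERSON_ID, SCS.SCS_LAST_ATTEND_DATE
""").fetchall()

print(id_only, len(id_and_course))
```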
The 'ghost' records are actually the true result set. The reason that they don't display when you comment out SCS.SCS_LAST_ATTEND_DATE is that you are creating duplicate records since the date is the only differentiator, and your DISTINCT is suppressing the duplicates.
If you remove the DISTINCT, and leave SCS.SCS_LAST_ATTEND_DATE commented out, you should then get the same number of rows as when you uncomment the date.
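A small sketch of that effect, using the five rows shown in the question as if they were the already-joined result. The single table `joined` and its column names are hypothetical simplifications, and SQLite stands in for SQL Server.

```python
import sqlite3

# The five joined rows from the question, reduced to the columns that matter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE joined (student TEXT, grade TEXT, last_attended TEXT);
    INSERT INTO joined VALUES
        ('0000002','B','1995-01-19'),('0000002','B','1996-07-18'),('0000002','B','1992-09-25'),
        ('0000002','D','1995-01-19'),('0000002','D','1996-07-18');
""")

def count(sql):
    return conn.execute(sql).fetchone()[0]

# With the date selected, every row is distinct.
with_date = count("SELECT COUNT(*) FROM (SELECT DISTINCT student, grade, last_attended FROM joined)")
# Without the date, DISTINCT collapses the rows to one per grade.
without_date = count("SELECT COUNT(*) FROM (SELECT DISTINCT student, grade FROM joined)")
# Without DISTINCT, the duplicates are visible again.
no_distinct = count("SELECT COUNT(*) FROM (SELECT student, grade FROM joined)")

print(with_date, without_date, no_distinct)
```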
Playing around with the JOIN types implies that you don't really know what you are trying to query. As @MarkD said in the comments, we would need to see your data model in order to help you further.
I am using MS Access and I have a rather complex situation.
I have Respondents who are linked to varying numbers of different Companies via 2 connecting tables. I want to be able to create a list of distinct customers which excludes any customer associated with Company X.
Here is a pic of the relationships that are involved with the query.
And here is an example of what I'm trying to achieve.
RespondentRef | Respondent Name
8 Joe Bloggs
.
RespondentRef | GroupRef
8 2
.
GroupRef | CompanyRef
2 10
.
CompanyRef | CompanyName
10 Ball of String
I want a query where I enter in 'Ball of String' for the company name, and then it produces a list of all the Respondents (taken from Tbl_Respondent) which completely excludes Respondent 8 (as he is linked to CompanyName: Ball of String).
Tbl_Respondent
RespondentRef | Respondent Name
... ...
7 Bob Carlyle
9 Anton Boyle
I have tried many combinations of subqueries with <> and NOT EXISTS and NOT IN and nothing seems to work. I suspect the way these tables are linked may have something to do with it.
Any help you could offer would be very much appreciated. If you have any questions let me know. (I have made best efforts, but please accept my apologies for any formatting conventions or etiquette faux-pas I may have committed.)
Thank you very much.
EDIT:
My formatted version of Frazz's code is still resulting in a syntax error. Any help would be appreciated.
SELECT *
FROM Tbl_Respondent
WHERE RespondentRef NOT IN (
SELECT tbl_Group_Details_Respondents.RespondentRef
FROM tbl_Group_Details_Respondents
JOIN tbl_Group_Details ON tbl_Group_Details.GroupReference = tbl_Group_Details_Respondents.GroupReference
JOIN tbl_Company_Details ON tbl_Company_Details.CompanyReference = tbl_Group_Details.CompanyReference
WHERE tbl_Company_Details.CompanyName = "Ball of String"
)
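This should do what you need. Note that Access does not accept a bare JOIN keyword: it has to be written as INNER JOIN, and multiple joins need to be nested in parentheses.
SELECT *
FROM Tbl_Respondent
WHERE RespondentRef NOT IN (
SELECT gdr.RespondentRef
FROM (Tbl_Group_Details_Respondent AS gdr
INNER JOIN Tbl_Group_Details AS gd ON gd.GroupRef = gdr.GroupRef)
INNER JOIN Tbl_Company_Details AS cd ON cd.CompanyRef = gd.CompanyRef
WHERE cd.CompanyName = 'Ball of String'
)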
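The same exclusion can also be written with NOT EXISTS, which sidesteps the usual NOT IN pitfall when the subquery can return NULLs. The sketch below runs against an in-memory SQLite database rather than Access, so the join syntax differs from what Access accepts. The table and column names follow the answer, `RespondentName` is written without the space, and the company 'Another Co' and Anton Boyle's link to it are invented purely to show that unrelated links are not excluded.

```python
import sqlite3

# Sample data from the question plus one invented, unrelated company link.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tbl_Respondent (RespondentRef INTEGER PRIMARY KEY, RespondentName TEXT);
    CREATE TABLE Tbl_Group_Details_Respondent (RespondentRef INTEGER, GroupRef INTEGER);
    CREATE TABLE Tbl_Group_Details (GroupRef INTEGER, CompanyRef INTEGER);
    CREATE TABLE Tbl_Company_Details (CompanyRef INTEGER PRIMARY KEY, CompanyName TEXT);
    INSERT INTO Tbl_Respondent VALUES (7,'Bob Carlyle'),(8,'Joe Bloggs'),(9,'Anton Boyle');
    INSERT INTO Tbl_Group_Details_Respondent VALUES (8,2),(9,3);
    INSERT INTO Tbl_Group_Details VALUES (2,10),(3,11);
    INSERT INTO Tbl_Company_Details VALUES (10,'Ball of String'),(11,'Another Co');
""")

def respondents_excluding(company_name):
    """Respondents with no link to the given company (hypothetical helper)."""
    sql = """
        SELECT r.RespondentRef, r.RespondentName
        FROM Tbl_Respondent r
        WHERE NOT EXISTS (
            SELECT 1
            FROM Tbl_Group_Details_Respondent gdr
            JOIN Tbl_Group_Details gd ON gd.GroupRef = gdr.GroupRef
            JOIN Tbl_Company_Details cd ON cd.CompanyRef = gd.CompanyRef
            WHERE gdr.RespondentRef = r.RespondentRef
              AND cd.CompanyName = ?
        )
        ORDER BY r.RespondentRef
    """
    return conn.execute(sql, (company_name,)).fetchall()

print(respondents_excluding("Ball of String"))
```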