SQL Slightly complicated query - sql

Question: Write a query to find academics that are authors and that have only ever coauthored papers with authors from institutes in the same state as their own. List their academic number, title and last name.
I've been working on this question for some time and I haven't been able to think of a proper query.
My schema for the database (tables) I need to use:
ACADEMIC(ACNUM, DEPTNUM, FAMNAME, TITLE)
DEPARTMENT(DEPTNUM, STATE)
AUTHOR(PANUM, ACNUM)
There are multiple ACNUM's (Account number) for 1 PANUM (Page number) inside the Author table.
I tried planning it and all I could think of was something along the lines of this:
Need to loop through Author table,
SELECT PANUM
FROM AUTHOR A
Then need to find all authors of that paper:
SELECT ACNUM
FROM AUTHOR B
WHERE PANUM = A.PANUM;
Then need to intersect all states:
SELECT DEPTNUM, TITLE, FAMNAME, UPPER(STATE)
FROM ACADEMIC C, DEPARTMENT D
WHERE C.ACNUM = B.ACNUM AND D.DEPTNUM = C.DEPTNUM;
Could you guys give me some assistance on how I could do something like this? I appreciate any help.
EDIT: Some more information
I haven't actually figured out a desired result, as there are hundreds of rows of data. Essentially, I have to query the database by selecting the page number from the Author table, finding all the account numbers that the Author table has for that page number, and then making sure those accounts are all in the same state. E.g. accounts 100 and 101 worked on page number 300 together and both are in state VIC, so I would list those academics' information (famname, title and acnum).

You need to build a derived table joining the ACADEMIC, DEPARTMENT, and AUTHOR tables, then self-join that derived table by paper and group by academic, keeping only the academics for whom every author on every one of their papers (themselves included) is in a single state:
SELECT t1.ACNUM, t1.TITLE, t1.FAMNAME
FROM (SELECT a1.ACNUM, a1.TITLE, a1.FAMNAME, auth1.PANUM
      FROM ACADEMIC a1
      INNER JOIN DEPARTMENT d1 ON a1.DEPTNUM = d1.DEPTNUM
      INNER JOIN AUTHOR auth1 ON a1.ACNUM = auth1.ACNUM) t1
INNER JOIN
     (SELECT UPPER(d2.STATE) AS STATE, auth2.PANUM
      FROM ACADEMIC a2
      INNER JOIN DEPARTMENT d2 ON a2.DEPTNUM = d2.DEPTNUM
      INNER JOIN AUTHOR auth2 ON a2.ACNUM = auth2.ACNUM) t2
  ON t1.PANUM = t2.PANUM
GROUP BY t1.ACNUM, t1.TITLE, t1.FAMNAME
HAVING COUNT(DISTINCT t2.STATE) = 1;
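The same requirement can also be phrased with NOT EXISTS: keep an academic who is an author, provided no paper of theirs has an author from a different state. Below is a minimal sketch of that alternative, run against an in-memory SQLite database from Python. The sample rows, the function name same_state_only and the use of SQLite are all assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

def same_state_only(conn):
    """Authors with no co-author (on any of their papers) from another state."""
    return conn.execute("""
        SELECT a.ACNUM, a.TITLE, a.FAMNAME
        FROM ACADEMIC a
        JOIN DEPARTMENT d ON d.DEPTNUM = a.DEPTNUM
        WHERE EXISTS (SELECT 1 FROM AUTHOR x WHERE x.ACNUM = a.ACNUM)
          AND NOT EXISTS (
              SELECT 1
              FROM AUTHOR mine
              JOIN AUTHOR other ON other.PANUM = mine.PANUM
              JOIN ACADEMIC a2 ON a2.ACNUM = other.ACNUM
              JOIN DEPARTMENT d2 ON d2.DEPTNUM = a2.DEPTNUM
              WHERE mine.ACNUM = a.ACNUM
                AND UPPER(d2.STATE) <> UPPER(d.STATE))
        ORDER BY a.ACNUM
    """).fetchall()

# Hypothetical sample data, not taken from the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT(DEPTNUM INTEGER PRIMARY KEY, STATE TEXT);
    CREATE TABLE ACADEMIC(ACNUM INTEGER PRIMARY KEY, DEPTNUM INTEGER,
                          FAMNAME TEXT, TITLE TEXT);
    CREATE TABLE AUTHOR(PANUM INTEGER, ACNUM INTEGER);
    INSERT INTO DEPARTMENT VALUES (1, 'VIC'), (2, 'vic'), (3, 'NSW');
    INSERT INTO ACADEMIC VALUES
        (100, 1, 'Adams', 'Dr'), (101, 2, 'Brown', 'Prof'),
        (102, 3, 'Clark', 'Dr'), (103, 1, 'Davis', 'Mr');
    -- paper 300: 100 + 101 (both VIC); paper 301: 101 (VIC) + 102 (NSW)
    INSERT INTO AUTHOR VALUES (300, 100), (300, 101), (301, 101), (301, 102);
""")

result = same_state_only(conn)
print(result)
```

With this data only academic 100 qualifies: 101 and 102 share paper 301 across two states, and 103 has not authored anything.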

Related

SQLite Subqueries and Inner Joins

I was doing a practice question for SQL which asks to create a list of album titles and unit prices for the artist "Audioslave" and find out how many records are returned.
Here is the relational database picture given in the question:
Initially, I used an inner join to retrieve the list and actually got the correct answer (40 records returned). The code is shown below:
select a.Title, t.UnitPrice
from albums a
inner join tracks t on t.AlbumId = a.AlbumId
inner join artists ar on ar.ArtistId = a.ArtistId
where ar.Name = 'Audioslave';
Although I finished the question, I was curious to try to solve this problem using nested subqueries instead and tried to first retrieve the AlbumId and UnitPrice from tracks. I got the correct answer but not the correct list (the question asked for album title and not AlbumId). Here is the code:
select AlbumId, UnitPrice
from tracks
where AlbumId in (
select AlbumId
from albums
where ArtistId in (
select ArtistId
from artists
where Name = 'Audioslave'));
In order to solve the problem with the list, I tried combining the previous codes. However, I get a completely different amount of records being returned (10509).
select a.Title, t.UnitPrice
from albums a
inner join tracks t
where a.AlbumId in (
select AlbumId
from albums
where ArtistId in (
select ArtistId
from artists
where Name = 'Audioslave'));
I don't understand what I'm doing wrong with the last code...Any help would be appreciated! Also, sorry if I wrote too much, I just wanted to convey my thinking process clearly.
Some databases (SQLite, MySQL, Maria, maybe others) allow you to write an INNER JOIN without specifying ON, and they just cross every record on the left with every record on the right in that case. If there were 2 albums and 3 tracks, 6 rows would result. If the albums were A and B, and the tracks were 1, 2 and 3, the rows would be the combination of all: A1, A2, A3, B1, B2, B3
Other databases (Postgres, SQLServer, Oracle, maybe others) refuse to do it unless you specify ON. To get an "every row on the left combined with every row on the right" you have to write CROSS JOIN (or write an inner join with an ON that is always true)
It might help your mental model of what happens during a join to consider that the db takes all the rows on the left and connects them to all the rows on the right, then for each combination of rows, assesses the truth of the ON clause, and the WHERE clause, before deciding to return the row
For example, this will return 10509 rows:
SELECT * FROM albums INNER JOIN tracks ON 1=1
The ON clause is always true.
This will return 10509 rows, but only if the query is run on a Monday (strftime returns text, so the comparison is against '1', and 'now' is evaluated in UTC by default):
SELECT * FROM albums INNER JOIN tracks ON strftime('%w', 'now') = '1'
What goes in the ON or WHERE clause doesn't have to have anything to do with the data in the table; it just has to be something that resolves to a Boolean.
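To see the 2 albums x 3 tracks case from above in action, here is a small sketch using Python's built-in sqlite3 module. The two tables are cut-down, made-up versions of the ones in the question (only the columns needed here), so the row counts are for this toy data only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums(AlbumId INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE tracks(TrackId INTEGER PRIMARY KEY, AlbumId INTEGER,
                        UnitPrice REAL);
    INSERT INTO albums VALUES (1, 'A'), (2, 'B');
    INSERT INTO tracks VALUES (1, 1, 0.99), (2, 1, 0.99), (3, 2, 0.99);
""")

def count(sql):
    """Run a SELECT COUNT(*) statement and return the single number."""
    return conn.execute(sql).fetchone()[0]

# No ON clause: SQLite pairs every album with every track.
no_on = count("SELECT COUNT(*) FROM albums a INNER JOIN tracks t")

# An ON clause that is always true behaves the same way.
always_true = count(
    "SELECT COUNT(*) FROM albums a INNER JOIN tracks t ON 1=1")

# A real join condition keeps only the matching pairs.
with_on = count(
    "SELECT COUNT(*) FROM albums a "
    "INNER JOIN tracks t ON t.AlbumId = a.AlbumId")

print(no_on, always_true, with_on)
```

So the fix for the query in the question is simply to put ON t.AlbumId = a.AlbumId back on the join; the subquery in the WHERE clause can stay as it is.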

Counting positions in a row oracle

I'm making a bookstore database, and in the table "Authors" I've got a column where I'd like to store the number of books written by each author in the table "Books". Table "Books" has a foreign key "id_author". I have no idea how to do it; it's something like
SELECT COUNT(*) FROM Books WHERE id_author = "id of chosen author"
What to write in code in place of "id of chosen author"?
How to put it in a row in table "Author"?
You could join on the authors table and query by its columns (e.g., the first and last name):
SELECT COUNT(*)
FROM books b
JOIN author a ON b.id_author = a.id
WHERE a.firstname = 'John' AND a.lastname = 'Doe'
Depending on your application, it might already have the author id. In that case you can skip the join and pass the id in as a bind variable:
SELECT COUNT(*) FROM Books WHERE id_author = :author_id
Although the difference is probably not large compared to the join-based answer above, this can improve performance on large databases, as you do not have to join the author and book tables.
Number of books by author:
select a.author, count(1)
from author a
join books b
on a.id = b.id_author
group by a.author
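One thing to watch with the inner join: authors who have no books disappear from the result. A LEFT JOIN keeps them with a count of 0. The sketch below tries this out on SQLite from Python rather than Oracle, and the column names (id, author, title) and sample rows are guesses for illustration, since the question does not show the real table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author(id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE books(id INTEGER PRIMARY KEY, id_author INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Doe'), (2, 'Roe');
    INSERT INTO books VALUES (10, 1, 'First'), (11, 1, 'Second');
""")

# COUNT(b.id) ignores the NULLs produced by the LEFT JOIN,
# so an author without books gets 0 rather than 1.
per_author = conn.execute("""
    SELECT a.id, a.author, COUNT(b.id) AS book_count
    FROM author a
    LEFT JOIN books b ON b.id_author = a.id
    GROUP BY a.id, a.author
    ORDER BY a.id
""").fetchall()

# Count for one author when the application already knows the id.
one_author = conn.execute(
    "SELECT COUNT(*) FROM books WHERE id_author = ?", (1,)
).fetchone()[0]

print(per_author, one_author)
```

Computing the count in a query (or a view) like this is usually easier to keep correct than storing it in a column of the author table, which would have to be updated every time a book is added or removed.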

Select * from a table using data from specific entry in table

I have an author table
| au_id | au_fname | au_lname | city | state |
What I am trying to do is get a query of first and last names based on who lives in the same state as Sarah.
Here's what I have so far:
SELECT AU_FNAME, AU_LNAME FROM authors WHERE "STATE" like 'CA'
I don't want to use a static state in my code, I want it to be based on the selected person - Sarah in this case.
Thanks
Use a Sub-Query to find the state of Sarah and filter that state
Try this
SELECT AU_FNAME, AU_LNAME
FROM authors
WHERE STATE in (select state from authors where au_fname = 'Sarah')
SELECT AU_FNAME, AU_LNAME FROM authors WHERE STATE in (select state from authors where au_fname = 'sarah')
or
select a1.AU_FNAME, a1.AU_LNAME FROM authors a1
inner join authors a2 on a1.state = a2.state
where a2.au_fname = 'sarah'
If there is only one "Sarah", then it is best to use = rather than in:
select a.AU_FNAME, a.AU_LNAME
from authors a
where a.state = (select a2.state from authors a2 where a2.au_fname = 'Sarah');
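Of course, if there could be more than one Sarah, then in is needed: with =, most databases raise an error when the subquery returns more than one row. Where a single value is guaranteed, = can perform better, because the database engine knows to search for a single value.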
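Here is a quick sketch of how the in version behaves with one Sarah and then with two, using Python's sqlite3 module. The rows are invented for illustration and the helper name same_state_as is hypothetical; the first name is passed as a bound parameter instead of being hard-coded.

```python
import sqlite3

def same_state_as(conn, first_name):
    """Names of all authors living in a state where `first_name` lives."""
    return conn.execute("""
        SELECT au_fname, au_lname
        FROM authors
        WHERE state IN (SELECT state FROM authors WHERE au_fname = ?)
        ORDER BY au_id
    """, (first_name,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors(au_id INTEGER PRIMARY KEY, au_fname TEXT,
                         au_lname TEXT, city TEXT, state TEXT);
    INSERT INTO authors VALUES
        (1, 'Sarah', 'Lee', 'Oakland', 'CA'),
        (2, 'Tom',   'Ng',  'Fresno',  'CA'),
        (3, 'Ann',   'Poe', 'Albany',  'NY');
""")

# One Sarah (CA): only the CA authors come back.
one_sarah = same_state_as(conn, 'Sarah')

# A second Sarah in NY: IN now matches both states.
conn.execute(
    "INSERT INTO authors VALUES (4, 'Sarah', 'Fox', 'Buffalo', 'NY')")
two_sarahs = same_state_as(conn, 'Sarah')

print(one_sarah)
print(two_sarahs)
```

Note that Sarah herself is in the result; add a condition on au_fname (or better, au_id) to the outer query if she should be left out.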
The question is defined a bit vaguely, so based on a qualified guess, it seems asking for the solution to retrieve data rows (containing either ALL data fields, i.e. SELECT * as per the title, or just first/last names as per description, i.e. SELECT AU_FNAME, AU_LNAME) for the states related to some au_fname (e.g. Sarah) passed as a parameter to a SQL query. In the most general case, there can be multiple different authors with the same first name related to multiple different states. Based on this assumption, the query/sub-query may look like
SELECT * FROM authors
WHERE STATE IN (SELECT [state] from authors WHERE au_fname = @afname)
The exact syntax may depend on the specifics of the Database used in the solution.
An inner join would be my first choice over a subquery... some db engines have poor performance with subqueries (older MySQL versions come to mind), but just about any engine ought to be optimized for an inner join:
SELECT a1.AU_FNAME, a1.AU_LNAME
FROM Authors AS a1
INNER JOIN Authors AS a2 ON a1.state = a2.state
WHERE a2.au_fname = 'sarah'
GROUP BY a1.AU_FNAME, a1.AU_LNAME;

SQLZOO #12 -- confused about multiple select & join statements

I am attempting to answer question #12 on sqlzoo.net
(http://sqlzoo.net/wiki/More_JOIN_operations). I couldn't figure out the answer on my own but I did manage to find the answer online.
12: Which were the busiest years for 'John Travolta', show the year and the number of movies he made each year for any year in which he made more than 2 movies.
Answer:
SELECT yr,COUNT(title) FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
WHERE name='John Travolta'
GROUP BY yr
HAVING COUNT(title)=(SELECT MAX(c) FROM
(SELECT yr,COUNT(title) AS c FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
WHERE name='John Travolta'
GROUP BY yr) AS t)
One of the parts that I do not fully understand is the multiple joins:
FROM movie
JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
Is Actor being joined only with Movie, or is actor being joined with Movie JOIN Casting?
I am trying to find a website that explains complex join statements as my attempted answer was far from correct (missing many sections). I think subselect statements with multiple complex join statements is a bit confusing at the moment. But, I could not find a good website that breaks the information up to help me form my own queries.
The other part I don't fully understand is this:
(SELECT yr,COUNT(title) AS c FROM
movie JOIN casting ON movie.id=movieid
JOIN actor ON actorid=actor.id
WHERE name='John Travolta'
GROUP BY yr) AS t)
What is the above code trying to find?
Ok, glad you are not afraid to ask, and I'll do my best to help clarify what is going on... Please excuse my re-formatting of the query to my mindset of writing queries. It better shows the relationships of where things are coming from (my perspective), and may help you too.
A few other things about my rewrite. I also like to use alias references to the tables so every column is qualified with the table (or alias) it originates from. It prevents ambiguity, especially for someone who does not know your table structures and relationships between tables. (m = alias to movie, c = alias for casting, a = alias for actor tables). For the sub query, and to keep alias confusion clear, I suffixed them with 2, such as m2, c2, a2.
SELECT
m.yr,
COUNT(m.title)
FROM
movie m
JOIN casting c
ON m.id = c.movieid
JOIN actor a
ON c.actorid = a.id
WHERE
a.name = 'John Travolta'
GROUP BY
m.yr
HAVING
COUNT(m.title) = ( SELECT MAX(t.movieCount)
FROM
( SELECT m2.yr,
COUNT(m2.title) AS movieCount
FROM
movie m2
JOIN casting c2
ON m2.id = c2.movieid
JOIN actor a2
ON c2.actorid = a2.id
WHERE
a2.name='John Travolta'
GROUP BY
m2.yr ) AS t
)
First, notice that the outermost query (aliases m, c, a) and the innermost query (aliases m2, c2, a2) are virtually identical.
The query has to run from the deepest query first... in this case the m2, c2, a2 query. Look at it and see what IT is going to deliver. If you ran that, you would get every year he had a movie and the number of movies... starting result from their sample data goes from 1976 all the way to 2010. So far, nothing complex unto itself (about 20 rows). Now, just as each table may have an alias, each subquery (such as this one) MUST have an alias, which is why the "AS t" is there. So, there is no true table; it is wrapping the entire query's result set and assigning THAT the alias of "t".
So now, go one level up in the query also wrapped in parens...
SELECT MAX(t.movieCount)
FROM (EntireSubquery as t)
Although abbreviated, this is what the engine is doing: looking at the subquery result, given the alias "t", and finding the maximum "movieCount" value, which is the highest number of movies made in any single year. In this case, the actual number is 3 and we are almost done.
Now, to the outermost query... again, this is virtually identical to the innermost query. The only difference is the HAVING clause, which is applied after all the grouping per year is performed. It compares each year's movie count to the value 3 returned by SELECT MAX(t.movieCount)...
So, all the years that had only 1 or 2 movies are excluded from the result, and only the one year that had 3 movies is included.
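If it helps, the three levels described above can be run one at a time. This is only a sketch on a tiny made-up data set in SQLite (via Python's sqlite3), not the sqlzoo data: here the invented actor rows give John Travolta one movie in 1976, two in 1994 and three in 1998.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie(id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
    CREATE TABLE actor(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE casting(movieid INTEGER, actorid INTEGER);
    INSERT INTO actor VALUES (1, 'John Travolta'), (2, 'Someone Else');
    INSERT INTO movie VALUES
        (1, 'M1', 1976), (2, 'M2', 1994), (3, 'M3', 1994),
        (4, 'M4', 1998), (5, 'M5', 1998), (6, 'M6', 1998), (7, 'M7', 1998);
    INSERT INTO casting VALUES (1,1), (2,1), (3,1), (4,1), (5,1), (6,1), (7,2);
""")

# Level 1 (innermost): one row per year with the number of movies.
PER_YEAR = """
    SELECT m.yr, COUNT(m.title) AS movieCount
    FROM movie m
    JOIN casting c ON m.id = c.movieid
    JOIN actor a ON c.actorid = a.id
    WHERE a.name = 'John Travolta'
    GROUP BY m.yr
"""
per_year = conn.execute(PER_YEAR + " ORDER BY m.yr").fetchall()

# Level 2: wrap level 1 as derived table "t" and take the largest count.
MAX_COUNT = f"SELECT MAX(t.movieCount) FROM ({PER_YEAR}) AS t"
max_count = conn.execute(MAX_COUNT).fetchone()[0]

# Level 3 (outermost): same grouping, keeping only years that hit the max.
busiest = conn.execute(
    PER_YEAR + f" HAVING COUNT(m.title) = ({MAX_COUNT})"
).fetchall()

print(per_year, max_count, busiest)
```

Each level is just the previous one wrapped in parentheses, which is why reading the query from the inside out works.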
Now, to clarify the JOINs. Each table should have a relationship with one or more other tables. Some of them are linking tables, such as the casting table, which holds both a movie ID and an actor/actress ID. So, think of the join as "how do I put the tables in order so that each one can touch a piece of the other until I have them all chained together?" In this case:
Movie -> Casting linked by the movie ID, then Casting -> Actor by the actor ID, so that is how I lay it out visually and hierarchically... I am starting FROM the Movie table, JOINing to the Casting table based ON Movie ID = Casting Movie ID. Then the Casting table is joined to the Actor table based on the common Actor ID field.
FROM
movie m
JOIN casting c
ON m.id = c.movieid
JOIN actor a
ON c.actorid = a.id
Now, this is a simple relationship, but you COULD have one primary table with multiple child-level tables. You could join multiple tables based on the respective data. Very simple sample to clarify the point. You have a student table going to a school. A student has a degree major, an ethnicity, an address state (assuming an online school and students can be from any state). If you had lookup tables for degrees, ethnicity and states, you might come up with something like...
select
s.firstname,
s.lastname,
d.DegreeDescription,
e.ethnicityDescription,
st.stateName
from
students s
join degrees d
on s.degreemajor = d.degreeID
join ethnicity e
on s.ethnicityID = e.id
join states st
on s.homeState = st.stateID
Notice the hierarchical representation that each table is directly associated under that of the student. Not all tables need to be one deeper than the last.
So, there are many sites out there, such as the w3schools as offered by Mark, but learn to dissect small pieces at a time... what are the bare minimum tables to get from point-A to point-Z and draw the relationships. THEN, pare down based on the requirement criteria you are looking for.
The correct answer would be:
SELECT yr, COUNT(title)
FROM movie m
JOIN casting c ON m.id=c.movieid JOIN actor a ON c.actorid=a.id
WHERE name='John Travolta'
GROUP BY yr
HAVING COUNT(title) > 2;
The answer you found (which seems to be a mistake on the sqlzoo site) is looking for any year that has a count equal to the year with the highest count.
I used table aliases in the query above to clear up how the tables are joined. Movie is joined to casting and casting is joined to actor.
The subquery that confuses you is listing each year and a count of movies for that year that star John Travolta. It's not needed if you're answering the question as written.
As for learning resources, make sure you have the basics down. Understand everything at http://w3schools.com/sql. Try searching for "sql joining multiple tables" in your favorite search engine when you're ready for more.
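To make the difference between the two HAVING clauses concrete, here is a small sketch in Python with sqlite3. The data is invented so that the two versions disagree: one movie in 1976, three in 1994 and four in 1998, all starring the same actor. The sqlzoo data will give different numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie(id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
    CREATE TABLE actor(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE casting(movieid INTEGER, actorid INTEGER);
    INSERT INTO actor VALUES (1, 'John Travolta');
""")
years = [1976] + [1994] * 3 + [1998] * 4   # hypothetical filmography
conn.executemany(
    "INSERT INTO movie(id, title, yr) VALUES (?, ?, ?)",
    [(i, f"Movie {i}", yr) for i, yr in enumerate(years, start=1)])
conn.executemany(
    "INSERT INTO casting VALUES (?, 1)",
    [(i,) for i in range(1, len(years) + 1)])

GROUPED = """
    SELECT yr, COUNT(title) AS n
    FROM movie m
    JOIN casting c ON m.id = c.movieid
    JOIN actor a ON c.actorid = a.id
    WHERE a.name = 'John Travolta'
    GROUP BY yr
"""

# The question as written: every year with more than 2 movies.
more_than_two = conn.execute(
    GROUPED + " HAVING COUNT(title) > 2 ORDER BY yr").fetchall()

# The answer found online: only the year(s) matching the highest count.
only_max = conn.execute(
    GROUPED + f" HAVING COUNT(title) = (SELECT MAX(n) FROM ({GROUPED}) AS t)"
).fetchall()

print(more_than_two, only_max)
```

The two versions only return the same rows when every year with more than 2 movies also happens to tie for the maximum.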

SQL Query multiple tables same values

I'm having an issue creating a query. Here are the specifics.
There are 2 tables company_career and company_people.
People contains person information (Name, Address, etc) and Career contains historical career information (job_title, department, etc.)
People is linked to Career by job_ref_id.
Direct_Report_id lies in the career table and contains a unique id that correlates to job_ref_id.
Example: job_ref_id = '1' results in direct_report_id ='A'. I then use the value produced from direct-report_id (i.e., 'A') and query the job_ref_id = 'A' and this produces the employee name. Since it produces the employee name (which is actually the manager) I need to know how I would query this to present this as the manager name.
I think I know what you are looking for, you just need to use joins and aliases. For example:
SELECT
cp.name AS [EmployeeName],
cp.address AS [EmployeeAddress],
cc.job_title AS [EmployeeTitle],
cc.department AS [EmployeeDept],
m.name AS [ManagerName]
FROM company_people cp
LEFT JOIN company_career cc ON cc.job_ref_id = cp.job_ref_id
LEFT JOIN company_people m ON m.job_ref_id = cc.direct_report_id
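To check the idea, here is a minimal sketch with two made-up employees in SQLite (via Python's sqlite3). The column list is guessed from the description in the question, the sample rows are invented, and plain aliases are used instead of the bracketed ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company_people(name TEXT, address TEXT, job_ref_id TEXT);
    CREATE TABLE company_career(job_ref_id TEXT, job_title TEXT,
                                department TEXT, direct_report_id TEXT);
    INSERT INTO company_people VALUES
        ('Bob',   '1 Main St', '1'),
        ('Alice', '2 High St', 'A');
    -- Bob (job_ref_id '1') reports to 'A', which is Alice's job_ref_id.
    INSERT INTO company_career VALUES
        ('1', 'Analyst',  'Finance', 'A'),
        ('A', 'Director', 'Finance', NULL);
""")

# company_people is joined twice: once as the employee (cp) and once,
# through direct_report_id, as the manager (m).
rows = conn.execute("""
    SELECT cp.name      AS employee_name,
           cc.job_title AS employee_title,
           m.name       AS manager_name
    FROM company_people cp
    LEFT JOIN company_career cc ON cc.job_ref_id = cp.job_ref_id
    LEFT JOIN company_people m  ON m.job_ref_id = cc.direct_report_id
    ORDER BY cp.name
""").fetchall()

print(rows)
```

Because both joins are LEFT JOINs, Alice still appears even though she has no manager; her manager_name simply comes back as NULL (None in Python).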