Why can the output of a SELECT be another SELECT? - sql

I am rather confused about the following SQL query:
SELECT (SELECT S.name FROM student AS S
WHERE S.sid = E.sid) AS sname
FROM enrolled as E
WHERE cid='15-445';
SELECT should be followed by the output columns, so why is there another SELECT here? How should I understand this query step by step?
The following query returns the same result as the one above, but its meaning is more explicit: the output of the inner SELECT is passed to the IN() operator.
SELECT name FROM student
WHERE sid IN (
SELECT sid FROM enrolled
WHERE cid = '15-445'
);
Here are the original tables of this question:
mysql> select * from student;
+-------+--------+------------+------+---------+
| sid | name | login | age | gpa |
+-------+--------+------------+------+---------+
| 53666 | Kanye | kayne#cs | 39 | 4.00000 |
| 53688 | Bieber | jbieber#cs | 22 | 3.90000 |
| 53655 | Tupac | shakur#cs | 26 | 3.50000 |
+-------+--------+------------+------+---------+
mysql> select * from enrolled;
+-------+--------+-------+
| sid | cid | grade |
+-------+--------+-------+
| 53666 | 15-445 | C |
| 53688 | 15-721 | A |
| 53688 | 15-826 | B |
| 53655 | 15-445 | B |
| 53666 | 15-721 | C |
+-------+--------+-------+
mysql> select * from course;
+--------+------------------------------+
| cid | name |
+--------+------------------------------+
| 15-445 | Database Systems |
| 15-721 | Advanced Database Systems |
| 15-826 | Data Mining |
| 15-823 | Advanced Topics in Databases |
+--------+------------------------------+

In real life I'd say both queries are just two creepy ways to avoid joins.
But in this particular case they were included in the slides you found in order to show how many places nested queries can be used.
Both queries do the same thing as the following join:
SELECT name
FROM student s
JOIN enrolled e
ON s.sid = e.sid
WHERE cid = '15-445';
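The sketch below checks the equivalence claim on the sample data. It loads the student and enrolled tables into an in-memory SQLite database from Python and runs all three queries. SQLite stands in for MySQL here, and the column types are assumptions read off the printed tables. None of the queries has an ORDER BY, so the results are compared after sorting.

```python
import sqlite3

# In-memory copy of the sample student/enrolled data (column types assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE enrolled (sid INTEGER, cid TEXT, grade TEXT);
INSERT INTO student VALUES
  (53666, 'Kanye', 'kayne#cs', 39, 4.0),
  (53688, 'Bieber', 'jbieber#cs', 22, 3.9),
  (53655, 'Tupac', 'shakur#cs', 26, 3.5);
INSERT INTO enrolled VALUES
  (53666, '15-445', 'C'), (53688, '15-721', 'A'), (53688, '15-826', 'B'),
  (53655, '15-445', 'B'), (53666, '15-721', 'C');
""")

QUERIES = {
    "scalar_subquery": """
        SELECT (SELECT S.name FROM student AS S WHERE S.sid = E.sid) AS sname
        FROM enrolled AS E
        WHERE cid = '15-445'""",
    "in_subquery": """
        SELECT name FROM student
        WHERE sid IN (SELECT sid FROM enrolled WHERE cid = '15-445')""",
    "join": """
        SELECT name
        FROM student s JOIN enrolled e ON s.sid = e.sid
        WHERE cid = '15-445'""",
}

# No ORDER BY in the queries, so sort each result before comparing.
results = {label: sorted(row[0] for row in conn.execute(sql))
           for label, sql in QUERIES.items()}
for label, names in results.items():
    print(label, names)
```

The three forms agree on this data, but they are not interchangeable in general. Suppose a student were enrolled twice in the same course. That student would appear twice in the join and in the scalar-subquery version, but only once in the IN version.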
As for your question about the step-by-step meaning of the first query, it goes like this:
1. Loop through every record in the "enrolled" table that has cid = '15-445':
FROM enrolled as E
WHERE cid='15-445';
2. For every record from step 1, run the following query with that record's E.sid, and output its result as sname:
SELECT S.name
FROM student AS S
WHERE S.sid = E.sid;
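The two steps can be modelled as a plain nested loop. The sketch below is only a conceptual Python model: a real optimizer is free to rewrite the subquery, for example into a join. The row lists copy the sample data (only the columns used), and the helper names `scalar_subquery` and `outer_query` are hypothetical.

```python
# Sample rows: (sid, name) from student and (sid, cid) from enrolled.
student = [(53666, "Kanye"), (53688, "Bieber"), (53655, "Tupac")]
enrolled = [(53666, "15-445"), (53688, "15-721"), (53688, "15-826"),
            (53655, "15-445"), (53666, "15-721")]


def scalar_subquery(e_sid):
    """Model of: SELECT S.name FROM student AS S WHERE S.sid = E.sid."""
    matches = [name for sid, name in student if sid == e_sid]
    if len(matches) > 1:
        # MySQL, for example, raises "Subquery returns more than 1 row" here.
        raise ValueError("scalar subquery returned more than one row")
    return matches[0] if matches else None  # no row -> NULL


def outer_query(course):
    result = []
    # Step 1: FROM enrolled AS E WHERE cid = course
    for e_sid, cid in enrolled:
        if cid != course:
            continue
        # Step 2: run the inner SELECT with this row's E.sid.
        result.append(scalar_subquery(e_sid))
    return result


print(outer_query("15-445"))
```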

This construct:
SELECT (SELECT S.name FROM student S WHERE S.sid = E.sid) AS sname
-------^
is called a scalar subquery. This is a special type of subquery that has two important properties:
It returns one column.
It returns at most one row.
In this case, the scalar subquery is also a correlated subquery, meaning that it references columns of the outer query (here E.sid, in its WHERE clause).
A scalar subquery can be used almost anywhere that a scalar (i.e. a single value, such as a constant) can be used in a query. They can be handy, but they are not exactly equivalent to a join, because:
An inner join can filter out rows. A scalar subquery that finds no rows returns NULL instead, so the outer row is kept.
A join can multiply the number of rows. A scalar subquery that finds more than one row raises an error instead.
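The small SQLite sketch below shows the NULL-versus-filter difference. The enrollment for sid 99999 is a hypothetical row, added so that one outer row has no matching student. The more-than-one-row case is not shown because SQLite, unlike MySQL, does not raise an error there. It simply uses the first row the subquery produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrolled (sid INTEGER, cid TEXT);
INSERT INTO student VALUES (53666, 'Kanye'), (53655, 'Tupac');
-- 99999 is a hypothetical enrollment with no matching student row.
INSERT INTO enrolled VALUES (53666, '15-445'), (53655, '15-445'), (99999, '15-445');
""")

# Scalar subquery: the unmatched outer row survives with a NULL (None) name.
scalar = conn.execute("""
    SELECT E.sid, (SELECT S.name FROM student AS S WHERE S.sid = E.sid) AS sname
    FROM enrolled AS E WHERE E.cid = '15-445' ORDER BY E.sid""").fetchall()

# Inner join: the unmatched outer row is filtered out.
joined = conn.execute("""
    SELECT E.sid, S.name
    FROM enrolled AS E JOIN student AS S ON S.sid = E.sid
    WHERE E.cid = '15-445' ORDER BY E.sid""").fetchall()

print(scalar)  # [(53655, 'Tupac'), (53666, 'Kanye'), (99999, None)]
print(joined)  # [(53655, 'Tupac'), (53666, 'Kanye')]
```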

If you want to get information like:
Name of student | CID | Grade |
You can do something like:
select t.name, e.cid, e.grade
from enrolled e
inner join student t on (e.sid = t.sid)
Or without a join, using a scalar subquery (which may help optimization, depending on the engine):
select (select t.name from student t where t.sid = e.sid) as name, e.cid, e.grade
from enrolled e
The results are the same (provided every e.sid has a matching student row), but in the second one you're avoiding the join.

Related

Oracle SQL query comparing multiple rows with same identifier

I'm honestly not sure how to title this - so apologies if it is unclear.
I have two tables I need to compare. One table contains tree names and nodes that belong to that tree. Each Tree_name/Tree_node combo will have its own line. For example:
Table: treenode
| TREE_NAME | TREE_NODE |
|-----------|-----------|
| 1 | A |
| 1 | B |
| 1 | C |
| 1 | D |
| 1 | E |
| 2 | A |
| 2 | B |
| 2 | D |
| 3 | C |
| 3 | D |
| 3 | E |
| 3 | F |
I have another table that contains names of queries and what tree_nodes they use. Example:
Table: queryrecord
| QUERY | TREE_NODE |
|---------|-----------|
| Alpha | A |
| Alpha | B |
| Alpha | D |
| BRAVO | A |
| BRAVO | B |
| BRAVO | D |
| CHARLIE | A |
| CHARLIE | B |
| CHARLIE | F |
I need to write an SQL query where I input the QUERY name, and it returns any ‘TREE_NAME’ that includes all the nodes associated with that query. So if I input ‘ALPHA’, it would return TREE_NAME 1 & 2. If I ask it for CHARLIE, it would return nothing.
I only have read access, and don’t believe I can create temp tables, so I’m not sure if this is possible. Any advice would be amazing. Thank you!
You can use group by and having as follows:
Select t.tree_name
From treenode t
join queryrecord q
on t.tree_node = q.tree_node
WHERE q.query = 'ALPHA'
Group by t.tree_name
Having count(distinct t.tree_node)
= (Select count(distinct q.tree_node) From queryrecord q WHERE q.query = 'ALPHA');
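The HAVING condition is a form of relational division: a tree qualifies when every node used by the query is among the tree's nodes. The same test can be written with Python sets over the sample rows, which may help when checking the SQL by hand. The function name `trees_containing_all` is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (TREE_NAME, TREE_NODE) and (QUERY, TREE_NODE).
treenode = [(1, "A"), (1, "B"), (1, "C"), (1, "D"), (1, "E"),
            (2, "A"), (2, "B"), (2, "D"),
            (3, "C"), (3, "D"), (3, "E"), (3, "F")]
queryrecord = [("Alpha", "A"), ("Alpha", "B"), ("Alpha", "D"),
               ("BRAVO", "A"), ("BRAVO", "B"), ("BRAVO", "D"),
               ("CHARLIE", "A"), ("CHARLIE", "B"), ("CHARLIE", "F")]


def trees_containing_all(query):
    needed = {node for q, node in queryrecord if q == query}
    nodes_by_tree = defaultdict(set)
    for tree, node in treenode:
        nodes_by_tree[tree].add(node)
    # Mirrors the SQL: count the distinct matching nodes per tree and keep
    # trees whose count equals the number of distinct nodes in the query.
    # An unknown query has no nodes, so (like the join) it matches nothing.
    return sorted(tree for tree, nodes in nodes_by_tree.items()
                  if needed and len(nodes & needed) == len(needed))


print(trees_containing_all("Alpha"))    # [1, 2]
print(trees_containing_all("CHARLIE"))  # []
```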
Using an IN condition (a semi-join, which can save time compared with a join):
with prep (tree_node) as (select tree_node from queryrecord where query = :q)
select tree_name
from treenode
where tree_node in (select tree_node from prep)
group by tree_name
having count(*) = (select count(*) from prep)
;
:q in the prep subquery (in the with clause) is the bind variable to which you will assign the various QUERY values at runtime.
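The sketch below runs the IN/WITH version against SQLite from Python, assuming the sample data from the question. SQLite's named parameter :q stands in for the Oracle bind variable. The query column is double-quoted as a precaution against keyword clashes, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE treenode (tree_name INTEGER, tree_node TEXT);
CREATE TABLE queryrecord ("query" TEXT, tree_node TEXT);
INSERT INTO treenode VALUES
  (1,'A'),(1,'B'),(1,'C'),(1,'D'),(1,'E'),
  (2,'A'),(2,'B'),(2,'D'),
  (3,'C'),(3,'D'),(3,'E'),(3,'F');
INSERT INTO queryrecord VALUES
  ('Alpha','A'),('Alpha','B'),('Alpha','D'),
  ('BRAVO','A'),('BRAVO','B'),('BRAVO','D'),
  ('CHARLIE','A'),('CHARLIE','B'),('CHARLIE','F');
""")

# Same shape as the answer; :q is bound from a dict at execution time.
SQL = """
with prep (tree_node) as (select tree_node from queryrecord where "query" = :q)
select tree_name
from treenode
where tree_node in (select tree_node from prep)
group by tree_name
having count(*) = (select count(*) from prep)
order by tree_name
"""


def trees_for(query_name):
    return [row[0] for row in conn.execute(SQL, {"q": query_name})]


print(trees_for("Alpha"))    # [1, 2]
print(trees_for("CHARLIE"))  # []
print(trees_for("ALPHA"))    # [] because = on TEXT is case sensitive by default in SQLite
```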
EDIT
I don't generally set up the test case on online engines; but in a comment below this answer, the OP said the query didn't work for him. So, I set up the example on SQLFiddle, here:
http://sqlfiddle.com/#!4/b575e/2
A couple of notes. First, for some reason SQLFiddle thinks table names should be at most eight characters, so I had to change the second table name to queryrec (instead of queryrecord); I changed the name in the query too, of course. Second, I don't know how to give bind values on SQLFiddle, so I hard-coded the name 'Alpha'. (Note also that in the OP's sample data this query value is not capitalized, while the other two are. String comparisons in Oracle are case sensitive by default, so pay attention when testing.)
You can do this with a join and aggregation. The trick is to count the number of nodes in query_record before joining:
select qr.query, t.tree_name
from (select qr.*,
count(*) over (partition by query) as num_tree_node
from query_record qr
) qr join
tree_node t
on t.tree_node = qr.tree_node
where qr.query = 'ALPHA'
group by qr.query, t.tree_name, qr.num_tree_node
having count(*) = qr.num_tree_node;
Here is a db<>fiddle.
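The window-function version can be tried the same way. This sketch assumes SQLite 3.25 or later, which is needed for window functions. It uses the answer's table names tree_node and query_record, passes the query name as a ? parameter, and adds an ORDER BY for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree_node (tree_name INTEGER, tree_node TEXT);
CREATE TABLE query_record ("query" TEXT, tree_node TEXT);
INSERT INTO tree_node VALUES
  (1,'A'),(1,'B'),(1,'C'),(1,'D'),(1,'E'),
  (2,'A'),(2,'B'),(2,'D'),
  (3,'C'),(3,'D'),(3,'E'),(3,'F');
INSERT INTO query_record VALUES
  ('Alpha','A'),('Alpha','B'),('Alpha','D'),
  ('BRAVO','A'),('BRAVO','B'),('BRAVO','D'),
  ('CHARLIE','A'),('CHARLIE','B'),('CHARLIE','F');
""")

# count(*) over (...) attaches each query's total node count to every row
# before the join, so the HAVING can compare matches against that total.
SQL = """
select qr."query", t.tree_name
from (select qr.*,
             count(*) over (partition by "query") as num_tree_node
      from query_record qr
     ) qr
join tree_node t on t.tree_node = qr.tree_node
where qr."query" = ?
group by qr."query", t.tree_name, qr.num_tree_node
having count(*) = qr.num_tree_node
order by t.tree_name
"""


def matching_trees(query_name):
    return conn.execute(SQL, (query_name,)).fetchall()


print(matching_trees("Alpha"))    # [('Alpha', 1), ('Alpha', 2)]
print(matching_trees("CHARLIE"))  # []
```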

How do I structure my SQL query to prevent the return of duplicate rows with related data?

I need some help with an SQL query. I have a database table that is related to other tables. When I query it, I get a duplicate row for every row of related data. The tables look like this:
|-------------| |-------------| |-------------|
| Cars | | Options | | Value |
|-------------| ------> |-------------| ------> |-------------|
| CarId | | OptionsId | | ValueId |
| CarMake | | OptionName | | CostValue |
| CarModel | | Confirmed | | CarId |
|-------------| | CarId | | OptionsId |
|-------------| |-------------|
|
|
---------------> |-------------|
| Warranty |
|-------------|
| WarrantyId |
| WarrantyType|
| CarId |
|-------------|
The query I made was designed in the SSMS query builder (which is why it uses no aliases and uses three-part names; this will be changed). It is as follows:
SELECT dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model,
dbo.Options.OptionName,
dbo.Warranty.WarrantyType,
dbo.Value.CostValue
FROM dbo.Cars
LEFT JOIN dbo.Options ON dbo.Cars.CarId = dbo.Options.CarId
LEFT JOIN Value ON Options.OptionsId = Value.OptionsId
LEFT JOIN dbo.Warranty on dbo.Cars.CarId = dbo.Warranty.CarId
Executing this query as it stands returns my data; however, for cars with multiple options I receive duplicate rows, e.g.:
Id | Make | Model | Option Name | Warranty Type | Value
27 | Ford | Fiesta | Heated Seats | Static | 500
27 | Ford | Fiesta | Front Fog Lights | Static | 400
I've been looking around for possible answers to this question and found that the proposed solution is to use the keyword DISTINCT or to create a subquery. I added DISTINCT to my query but the same data was returned, probably because each row is distinct in its own right (the option names and values differ), but I'm only guessing.
I'm happy to use a subquery but I'm not sure how to apply it to my query above. All I want to do here is return a single row for each car, with the highest option value, i.e.:
27 | Ford | Fiesta | Heated Seats | Static | 500
Can anyone help me write this query? I think I've included everything in this question but if I can offer more, please let me know.
Instead of joining the Value table, which gives you multiple rows, join this query:
SELECT
dbo.Value.CarId,
dbo.Value.OptionsId,
MAX(dbo.Value.CostValue) AS CostValue
FROM dbo.Value
GROUP BY dbo.Value.CarId, dbo.Value.OptionsId
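which gives you, from the Value table, the maximum CostValue for each combination of CarId and OptionsId. Note that this is still one row per option, so a car with several options keeps several rows.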
So try this:
SELECT dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model,
dbo.Options.OptionName,
v.CostValue,
dbo.Warranty.WarrantyType
FROM dbo.Cars
LEFT JOIN dbo.Options ON dbo.Cars.CarId = dbo.Options.CarId
INNER JOIN (
SELECT
dbo.Value.CarId,
dbo.Value.OptionsId,
MAX(dbo.Value.CostValue) AS CostValue
FROM dbo.Value
GROUP BY dbo.Value.CarId, dbo.Value.OptionsId
) AS v ON Options.OptionsId = v.OptionsId
LEFT JOIN dbo.Warranty on dbo.Cars.CarId = dbo.Warranty.CarId
You can try something like the following, using a window function:
with cte as(
SELECT dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model,
dbo.Options.OptionName,
Value.CostValue,
dbo.Warranty.WarrantyType,
row_number() over(partition by dbo.Cars.CarId,
dbo.Cars.Make,
dbo.Cars.Model order by Value.CostValue desc) rn
FROM dbo.Cars
LEFT JOIN dbo.Options ON dbo.Cars.CarId = dbo.Options.CarId
LEFT JOIN Value ON Options.OptionsId = Value.OptionsId
LEFT JOIN dbo.Warranty on dbo.Cars.CarId = dbo.Warranty.CarId
) select * from cte where rn=1
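Below is a sketch of the ROW_NUMBER approach in SQLite (3.25+ for window functions), with the dbo. prefixes dropped. It uses a small hypothetical data set: car 27 from the question, plus a made-up car 28 with one option. The sketch partitions by CarId alone, which should be equivalent to partitioning by CarId, Make and Model when Make and Model depend only on CarId. Ties on CostValue would be broken arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cars (CarId INTEGER PRIMARY KEY, Make TEXT, Model TEXT);
CREATE TABLE Options (OptionsId INTEGER PRIMARY KEY, OptionName TEXT, Confirmed INTEGER, CarId INTEGER);
CREATE TABLE Value (ValueId INTEGER PRIMARY KEY, CostValue INTEGER, CarId INTEGER, OptionsId INTEGER);
CREATE TABLE Warranty (WarrantyId INTEGER PRIMARY KEY, WarrantyType TEXT, CarId INTEGER);
INSERT INTO Cars VALUES (27, 'Ford', 'Fiesta'), (28, 'Ford', 'Focus');
INSERT INTO Options VALUES
  (1, 'Heated Seats', 1, 27), (2, 'Front Fog Lights', 1, 27), (3, 'Sat Nav', 1, 28);
INSERT INTO Value VALUES (1, 500, 27, 1), (2, 400, 27, 2), (3, 300, 28, 3);
INSERT INTO Warranty VALUES (1, 'Static', 27), (2, 'Extended', 28);
""")

SQL = """
WITH cte AS (
  SELECT Cars.CarId, Cars.Make, Cars.Model, Options.OptionName,
         Warranty.WarrantyType, Value.CostValue,
         -- number each car's rows from the most to the least valuable option
         ROW_NUMBER() OVER (PARTITION BY Cars.CarId
                            ORDER BY Value.CostValue DESC) AS rn
  FROM Cars
  LEFT JOIN Options ON Cars.CarId = Options.CarId
  LEFT JOIN Value ON Options.OptionsId = Value.OptionsId
  LEFT JOIN Warranty ON Cars.CarId = Warranty.CarId
)
SELECT CarId, Make, Model, OptionName, WarrantyType, CostValue
FROM cte
WHERE rn = 1  -- keep only the top-valued row per car
ORDER BY CarId
"""
rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```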

How do I use group by with a value that should always be included?

I asked a similar question; however, I asked it incorrectly before. Let's say I have the following table:
+-----------+------------+-------+
| quiz_type | student_id | score |
+-----------+------------+-------+
| class | NULL | 10 |
+-----------+------------+-------+
| class | NULL | 9 |
+-----------+------------+-------+
| student | A | 5 |
+-----------+------------+-------+
| student | B | 7 |
+-----------+------------+-------+
| student | A | 6 |
+-----------+------------+-------+
I want to get the standard deviation of the scores for each student, but need to include the class scores for every student. In reality, the quiz_type column doesn't exist (it's just to better show the example). I need to do a GROUP BY student_id, but include the NULL values with every group. I've been struggling with this for quite a bit. Is there a good solution?
For the sake of example, I'd like to use the aggregate AVG function to get a table like the following:
+------------+---------+
| student_id | Average |
+------------+---------+
| A | 7.5 |
+------------+---------+
| B | 8.67 |
+------------+---------+
In reality I will be calling the STDDEV_SAMP function.
One clever way to do this is to self join your table in such a way that the NULL values get paired up with every non NULL entry. Then, you can use both score columns in your calculation. Try something like this:
SELECT t2.student_id,
SUM(t2.score) / (SELECT SUM(CASE WHEN student_id IS NULL THEN 1 ELSE 0 END) FROM students) AS nonNullScore,
(SUM(t1.score) / COUNT(*)) * (SELECT SUM(CASE WHEN student_id IS NULL THEN 1 ELSE 0 END) FROM students) AS nullScore
FROM students t1
INNER JOIN students t2
ON t1.student_id IS NULL AND t2.student_id IS NOT NULL
GROUP BY t2.student_id
I tested this query in MySQL Workbench and it appears to be working.
Output:
student_id | nonNullScore | nullScore
A | 11.0000 | 19.0000
B | 7.0000 | 19.0000
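A variant of the same self-join idea computes the per-student average directly. It joins each distinct student_id to its own rows plus every NULL (class) row, then aggregates. This is a sketch in SQLite with a hypothetical students table holding only the columns from the example. In MySQL, STDDEV_SAMP(t.score) should be usable in place of AVG(t.score). SQLite has no built-in STDDEV_SAMP, so the sketch uses AVG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id TEXT, score REAL);
INSERT INTO students VALUES (NULL, 10), (NULL, 9), ('A', 5), ('B', 7), ('A', 6);
""")

# Each real student is paired with its own rows and with every NULL row.
SQL = """
SELECT s.student_id, AVG(t.score) AS average
FROM (SELECT DISTINCT student_id FROM students WHERE student_id IS NOT NULL) AS s
JOIN students AS t
  ON t.student_id = s.student_id OR t.student_id IS NULL
GROUP BY s.student_id
ORDER BY s.student_id
"""
rows = conn.execute(SQL).fetchall()
for student_id, average in rows:
    print(student_id, round(average, 2))
# A 7.5
# B 8.67
```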
From the question, the mean value of score can be adjusted by adding the scores for null values to the total. The adjusted standard deviation can then be calculated from the adjusted mean per student.
SELECT
student_id,
SQRT(AVG(squared_diff)) adjusted_std_deviation
FROM (SELECT
t.student_id,
pow((t.score - x.adjmean), 2) squared_diff
FROM t
CROSS JOIN (SELECT avg(1.0*score) adjmean FROM t) x
WHERE student_id IS NOT NULL) y
GROUP BY student_id
ORDER BY 1

Issue with JOIN command

Hi, I'm a newcomer to Stack Overflow, so I'm sorry if I do not present this correctly.
I have used Google and W3Schools and read the FAQ on SQL.
I am running SQL using the SQL command line in WAMP 2.0. I am currently doing a project where the aim is to create a mini university DB, with students, grades, programmes, modules, etc.
One of the tasks is to list all the students, their modules and their corresponding grades. To do this, I am trying to use a JOIN to select all the names from the students table, together with their corresponding modules and grades from the records table.
+------------+-------+------------+-----------------+
| Student_id | Name | DOB | Address |
+------------+-------+------------+-----------------+
| 4665236 | Paddy | 1985-09-18 | 123 Fake Street |
| 5665236 | Paul | 1984-06-12 | Good manlane |
| 6665236 | John | 1984-03-09 | Docotor town |
| 7665236 | Aidan | 1983-07-09 | Banker worlds |
| 8665236 | Joe | 1983-07-09 | 24 hitherwood |
+------------+-------+------------+-----------------+
+------------+--------+------+-------+
| Student_id | Mod_id | GPA | Grade |
+------------+--------+------+-------+
| 4655236 | 2222 | 3.84 | A- |
| 5655236 | 11111 | 3.44 | B+ |
| 6655236 | 33333 | 3.24 | B |
| 7655236 | 44444 | 2.45 | C- |
| 8655236 | 44444 | 2.45 | C- |
+------------+--------+------+-------+
The PRIMARY KEY in the students table is Student_id INT(11).
The PRIMARY KEY for records is (Student_id, Mod_id).
Individual SELECT ... FROM statements work fine on both tables.
The issue occurs when I use:
SELECT students.Name, records.Grade
FROM students
INNER JOIN records
ON students.Student_id=Student_id
ORDER BY students.Name
I get the following error
ERROR 1052 (23000): Column 'Student_id' in on clause is ambiguous
Thanks for the amazingly fast response. I tried:
SELECT students.Name, records.Grade
FROM students
INNER JOIN records
ON students.Student_id=records.Student_id
ORDER BY students.Name;
And got "Empty set (0.00 sec)". Why?
You have to qualify the Student_id column with a table name or alias, something like records.Student_id, so that it is unambiguous in the ON clause, or:
SELECT s.Name, r.Grade
FROM students AS s
INNER JOIN records AS r ON s.Student_id= r.Student_id
ORDER BY s.Name
You need to supply the table name for the column Student_id to avoid ambiguity, because it exists in both tables.
SELECT students.Name, records.Grade
FROM students
INNER JOIN records
ON students.Student_id = records.Student_id -- << THIS
ORDER BY students.Name
The reason your column name is being flagged as ambiguous is that you've got two different tables that each have a Student_id field. You can even join a table to itself, so even though you've qualified the first instance of the field, you need to qualify both.
Try the following code:
SELECT students.Name, records.Grade
FROM students
INNER JOIN records
ON students.Student_id=records.Student_id
ORDER BY students.Name
You can also alias the tables if that helps your code look cleaner by using the following:
SELECT s.Name, r.Grade
FROM students s
INNER JOIN records r
ON s.Student_id=r.Student_id
ORDER BY s.Name
However, this only works if the student IDs match in both tables. In the example data you've presented there are no matching records: 4665236 != 4655236.
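Both problems can be reproduced with the sample data in SQLite. The sketch assumes the column types shown in the question. SQLite reports the unqualified column as ambiguous, though the exact error text differs from MySQL's ERROR 1052. The correctly qualified join comes back empty because no IDs match. The LEFT JOIN at the end is an extra diagnostic, not part of the original answers, that lists students with no matching record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (Student_id INTEGER PRIMARY KEY, Name TEXT, DOB TEXT, Address TEXT);
CREATE TABLE records (Student_id INTEGER, Mod_id INTEGER, GPA REAL, Grade TEXT,
                      PRIMARY KEY (Student_id, Mod_id));
INSERT INTO students VALUES
  (4665236, 'Paddy', '1985-09-18', '123 Fake Street'),
  (5665236, 'Paul',  '1984-06-12', 'Good manlane'),
  (6665236, 'John',  '1984-03-09', 'Docotor town'),
  (7665236, 'Aidan', '1983-07-09', 'Banker worlds'),
  (8665236, 'Joe',   '1983-07-09', '24 hitherwood');
INSERT INTO records VALUES
  (4655236, 2222,  3.84, 'A-'),
  (5655236, 11111, 3.44, 'B+'),
  (6655236, 33333, 3.24, 'B'),
  (7655236, 44444, 2.45, 'C-'),
  (8655236, 44444, 2.45, 'C-');
""")

# 1. The original ON clause: the bare Student_id exists in both tables.
try:
    conn.execute("""SELECT students.Name, records.Grade
                    FROM students INNER JOIN records
                    ON students.Student_id = Student_id""")
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# 2. Properly qualified join: valid SQL, but no IDs match (4665236 != 4655236).
joined = conn.execute("""SELECT s.Name, r.Grade
                         FROM students AS s INNER JOIN records AS r
                         ON s.Student_id = r.Student_id
                         ORDER BY s.Name""").fetchall()

# 3. Diagnostic: students with no matching row in records.
unmatched = [row[0] for row in conn.execute("""
    SELECT s.Name FROM students AS s
    LEFT JOIN records AS r ON s.Student_id = r.Student_id
    WHERE r.Student_id IS NULL
    ORDER BY s.Name""")]

print(ambiguous_error)
print(joined)     # []
print(unmatched)  # ['Aidan', 'Joe', 'John', 'Paddy', 'Paul']
```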

SQL LEFT JOIN help

My scenario: there are 3 tables for storing TV show information: season, episode and episode_translation.
My data: there are 3 seasons with 3 episodes each, but there is a translation for only one episode.
My objective: I want to get a list of all the seasons and episodes for a show. If a translation is available in a specified language, show it; otherwise show NULL.
My attempt to get the information for series 1 in language 1:
SELECT
season_number AS season,number AS episode,name
FROM
season NATURAL JOIN episode
NATURAL LEFT JOIN episode_trans
WHERE
id_serie=1 AND
id_lang=1
ORDER BY
season_number,number
result:
+--------+---------+--------------------------------+
| season | episode | name |
+--------+---------+--------------------------------+
| 3 | 3 | Episode translated into lang 1 |
+--------+---------+--------------------------------+
expected result
+--------+---------+--------------------------------+
| season | episode | name |
+--------+---------+--------------------------------+
| 1 | 1 | NULL |
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 2 | 1 | NULL |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
| 3 | 1 | NULL |
| 3 | 2 | NULL |
| 3 | 3 | Episode translated into lang 1 |
+--------+---------+--------------------------------+
Full DB dump
http://pastebin.com/Y8yXNHrH
I tested the following on MySQL 4.1 - it returns your expected output:
SELECT s.season_number AS season,
e.number AS episode,
et.name
FROM SEASON s
JOIN EPISODE e ON e.id_season = s.id_season
LEFT JOIN EPISODE_TRANS et ON et.id_episode = e.id_episode
AND et.id_lang = 1
WHERE s.id_serie = 1
ORDER BY s.season_number, e.number
Generally, when you use ANSI-92 JOIN syntax you need to specify the join criteria in the ON clause. In MySQL, I know that not providing it for INNER JOINs results in a cross join -- a cartesian product.
LEFT JOIN episode_trans
ON episode_trans.id_episode = episode.id_episode
AND episode_trans.id_lang = 1
WHERE id_serie=1
You probably need to move the id_lang = 1 condition into the LEFT JOIN's ON clause instead of the WHERE clause. Think of it this way: for all of the rows with no translation, the LEFT JOIN gives you back NULLs in all of the translation columns. Then the WHERE clause checks whether id_lang is equal to 1. NULL = 1 evaluates to NULL (unknown) rather than TRUE, so those rows are filtered out.
It would probably be easier if you included your code in the question next time instead of in a link.
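Here is a minimal SQLite sketch of the ON-versus-WHERE difference. It uses a hypothetical trimmed-down schema with one show's two episodes and a single translation. The season table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE episode (id_episode INTEGER PRIMARY KEY, number INTEGER);
CREATE TABLE episode_trans (id_episode INTEGER, id_lang INTEGER, name TEXT);
INSERT INTO episode VALUES (1, 1), (2, 2);
INSERT INTO episode_trans VALUES (2, 1, 'Episode translated into lang 1');
""")

# Filter in WHERE: untranslated episodes get id_lang NULL and are dropped.
in_where = conn.execute("""
    SELECT e.number, et.name
    FROM episode e LEFT JOIN episode_trans et ON et.id_episode = e.id_episode
    WHERE et.id_lang = 1
    ORDER BY e.number""").fetchall()

# Filter in ON: the condition only decides which translation rows match,
# so every episode is kept and untranslated ones get NULL names.
in_on = conn.execute("""
    SELECT e.number, et.name
    FROM episode e LEFT JOIN episode_trans et
      ON et.id_episode = e.id_episode AND et.id_lang = 1
    ORDER BY e.number""").fetchall()

# NULL = 1 is NULL (unknown), which WHERE treats as "not true".
null_cmp = conn.execute("SELECT NULL = 1").fetchone()[0]

print(in_where)  # [(2, 'Episode translated into lang 1')]
print(in_on)     # [(1, None), (2, 'Episode translated into lang 1')]
print(null_cmp)  # None
```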
Can you try using
LEFT OUTER JOIN
instead of
NATURAL LEFT JOIN