I am new to SQL. I am trying to write a query which requires contains operator. I am looking at many examples for the contains clause, but they seem different than what I need to use. I want a divide operator(relational algebra) equivalent in sql.
a sample of what I am doing is :
Give the students names who are enrolled for computer architecture course and have satisfied all its prerequisites. I have a prereq table and a course table which lists all courses taken by student
so what I ultimately want is
(get all courses taken by a student) contains (get all prerequisits)
for what exactly shall I look for for sql equivalent of divide operation in relational algebra?
You can do this by comparing counts.
select s.StudentName
from StudentCourses s, PrereqCourses p
where s.CourseId = p.CourseId
group by s.StudentName
having count(*) = (select count(*) from PrereqCourses)
See: http://www.dba-oracle.com/t_sql_patterns_relational_division.htm
Related
I have the following relations:
teaches(ID,course_id,sec_id,semester,year)
instructor(ID,name,dept_name,salary)
I am trying to express the following as an SQL query:
Find the ID and name of the instructor who has taught the most courses(i.e has the most tuples in teaches).
My Query
select ID, name
from teaches
natural join instructor
group by ID
order by count(*) desc
I know this isn't correct, but I feel like I'm on the right track. In order to answer the question, you need to work with both relations, hence the natural join operation is required. Since the question asks for the instructor that has taught the most courses, that tells me that we are trying to count the number of times each instructor ID appears in the teaches relation. From what I understand, we are looking to count distinct instructor IDs, hence the group by command is needed.
Don't use natural joins: all they do is rely on column names to decide which columns relate across tables (they don't check for foreign keys constraints or the-like, as you would thought). This is unreliable by nature.
You can use a regular inner join:
select i.id, i.name
from teaches t
inner join instructor i on i.id = t.sec_id
group by i.id, i.name
order by count(*) desc
limit 1
Notes:
this assumes that column teaches.sec_id relates to instructor.id (I cannot see which other column could be used)
I added a limit clause to the query since you stated that you want the top instructor - the syntax may vary across databases
always prefix the column names with the table they belong to, to make the query unambiguous and easier to understand
it is a good practice (and a requirement in many databases) that in an aggregate query all non-aggregared columns listed in the select clause should appear in the group by clause; I added the instructur name to your group by clause
I'm trying to create a view that returns the amount of courses each Californian student is enrolled in. There are 4 CA students listed in my 'Students' table, so it should return that many rows.
create or replace view s2018_courses as
select Students.*, COUNT(Current_Schedule.ID) EnrolledCourses
from Students, Current_Schedule
where Students.ID = Current_Schedule.ID AND state='CA';
This query, however, only returns a single row with one student's information, and the total number of courses all CA students are in (in this case 14, since each student is enrolled in 3-5 classes).
I created a view similar to this recently (in a different DB) and it worked well and executed multiple rows, so I'm not sure what is going wrong? Sorry if this is a confusing question, I'm new to SQL and StackOverflow!! Thank you in advance for any advice!
You are missing an aggregation step in your query/view:
CREATE OR REPLACE VIEW s2018_courses AS
SELECT
s.ID, COUNT(cs.ID) EnrolledCourses
FROM Students s
INNER JOIN Current_Schedule cs
ON s.ID = cs.ID
WHERE
state = 'CA'
GROUP BY
s.ID;
The logical problem with your current query is that you are using COUNT(*) without GROUP BY, and MySQL is interpreting this to mean that you want to take the count of the entire table. Note also that I only select ID in my query above, because this is what is being used to aggregate.
In class, professor said that SQL language does not provide 'for all' operator.
In order to use 'for all' you have to use 'not exist( X except Y)'
At this point, I can't figure out why 'for all' is same meaning as 'not exist( X except Y)'
I give you example relation:
course (cID,title,deptName,credit),
teaches (pID,cID,semester,year,classroom),
student (sID,name,gender,deptName)
Q: Find all student names who have taken all courses offered in 'CS' department
The answer is:
Select distinct
S.sid, S.name
from
student as S
where
not exists (
(select cID from course where deptName = 'CS')
except
(select T.cID from takes as T where S.sID = T.sID)
);
Can you give me specific explain about that?
ps. Sorry for my English skill
You professor is right. SQL has no direct way to query all records that have all possible relations of a certain type.
It's easy to query which relations of a certain type a record has. Just INNER JOIN the two tables and you are done.
But in an M:N relationship like "students" to "taken courses" it's not that simple.
To answer the question "which student has taken all possible courses" you must find out which relations could possibly exist and then make sure that all of them do actually exist.
select distinct
S.sid, S.name
from
student as S
where
not exists (
(select cID from course where deptName = 'CS')
except
(select T.cID from takes as T where S.sID = T.sID)
);
can be translated as
give me all students SELECT
for whom it is true: WHERE
that the following set is empty NOT EXISTS
(any course in 'CS') "all relations that can possibly exist"
minus EXCEPT
(all courses the student has taken) "the ones that do actually exist"
In other words: Of all possible relations there is no relation that does not exist.
There are other ways of expressing the same thought that can be used in database systems without support for EXCEPT.
For example
select
S.sid,
S.name
from
student as S
inner join takes as T on T.sID = S.sID
inner join course as C on C. cID = T. cID
where
c. deptName = 'CS'
group by
S.sid,
S.name
having
count(*) = (select count(*) from course where deptName = 'CS');
From your table definition and requirement its not clear what is the use of teaches table. You want the list of students names those have taken all courses offered by 'CS' department. For this students and course table is enough.
SELECT name
FROM
(
SELECT B.name, A.cid
FROM course A
INNER JOIN student B ON A.deptName = B.deptName
WHERE A.deptName = 'CS'
GROUP BY A.cid, B.name
) A
GROUP BY name
HAVING COUNT(name) >= (SELECT COUNT(cid) FROM course WHERE deptName = 'CS')
Internal query just selects all students those have taken any course offered by 'CS' dept and with group by I just make sure that in case a student take same course twice they will be counted as one row. Next I just select those students take all course offered by 'CS' dept.
I think you have some gap to understand your requirement properly. In your requirement no relation with teaches table is specified.
Q: Find all student names who have taken all courses offered in 'CS'
department
NOT EXISTS returns true if the query passed to it contains 0 records.
In this case, your sub-query from NOT EXISTS selects all the courses offered in 'CS', and subtract from this result set all the courses taken by specific student.
If the student have taken all the courses then except will remove all and the sub-query will return 0 records, which in pair with NOT EXISTS will give you true for specific student, and it will be displayed in final result set.
Brief history: Codd invented the Relational Model (RM), some people created a DBMS loosely based on RM to prove a RM product could be performant, and the SQL language emerged based on that DBMS (i.e. not directly based on the RM).
Codd came up with a set of primitive operators to define a database as being relationally complete. His algebra included product, where two relations are 'multiplied' together to give a combined relation; this made it into SQL as CROSS JOIN. [Side note: people refer to this operator as 'Cartesian product', which results in a set of ordered pairs. However, product in RM results in a relation (as do all relational operators), and CROSS JOIN results in a table expression (loosely speaking).]
Codd's algebra also included a division operator. I guess the thinking is, we should be able to take the result of product and one of the relations and use an operator to result in the other relation. But it does has some practical use too, of course. It is commonly expressed as 'the supplier who supplies all products', after Chris Date's parts and suppliers database found in his books. SQL lacks an explicit division operator, so we need to use other operators to get the desired result.
Note there are two flavours of division, being exact division ("suppliers who supply all the parts we are interested in and no more") and division with remainder ("suppliers who supply at least all the parts we are interested in and possibly more"). I tend to be wary of the answers here that do not mention either the name 'division' or that you need to decide whether you need to deal with remainders.
The thinking behind your professor's answer is that a double negative (in mathematics and English) i.e. if the statement "there is no part I don't supply" is true for a given supplier then that supplier will be in the result.
Note there are operators that Codd omitted (e.g. rename and summerize) that can now be found in SQL, so it's a shame we are still waiting for division!
I want to retrieve the course_id in table course that is not in the table takes. Table takes only contains course_id of courses taken by students. The problem is that if I have:
select count (distinct course.course_id)
from course, takes
where course.course_id = (takes.course_id);
the result is 85 which is smaller than the total number of course_id in table course, which is 200. The result is correct.
But I want to find the number of course_id that are not in the table takes, and I have:
select count (distinct course.course_id)
from course, takes
where course.course_id != (takes.course_id);
The result is 200, which is equal the number of course_id in table course. What is wrong with my code?
This SQL will give you the count of course_id in table course that aren't in the table takes:
select count (*)
from course c
where not exists (select *
from takes t
where c.course_id = t.course_id);
You didn't specify your DBMS, however, this SQL is pretty standard so it should work in the popular DBMSs.
There are a few different ways to accomplish what you're looking for. My personal favorite is the LEFT JOIN condition. Let me walk you through it:
Fact One: You want to return a list of courses
Fact Two: You want to
filter that list to not include anything in the Takes table.
I'd go about this by first mentally selecting a list of courses:
SELECT c.Course_ID
FROM Course c
and then filtering out the ones I don't want. One way to do this is to use a LEFT JOIN to get all the rows from the first table, along with any that happen to match in the second table, and then filter out the rows that actually do match, like so:
SELECT c.Course_ID
FROM
Course c
LEFT JOIN -- note the syntax: 'comma joins' are a bad idea.
Takes t ON
c.Course_ID = t.Course_ID -- at this point, you have all courses
WHERE t.Course_ID IS NULL -- this phrase means that none of the matching records will be returned.
Another note: as mentioned above, comma joins should be avoided. Instead, use the syntax I demonstrated above (INNER JOIN or LEFT JOIN, followed by the table name and an ON condition).
I have two tables in a Database
and
I need to retrieve the number of staff per manager in the following format
I've been trying to adapt an answer to another question
SELECT bankNo AS "Bank Number",
COUNT (*) AS "Total Branches"
FROM BankBranch
GROUP BY bankNo
As
SELECT COUNT (*) AS StaffCount ,
Employee.Name AS Name
FROM Employee, Stafflink
GROUP BY Name
As I look at the Group BY I'm thinking I should be grouping by The ManID in the Stafflink Table.
My output with this query looks like this
So it is counting correctly but as you can see it's far off the output I need to get.
Any advice would be appreciated.
You need to join the Employee and Stafflink tables. It appears that your FROM clause should look like this:
FROM Employee INNER JOIN StaffLink ON Employee.ID = StaffLink.ManID
You have to join the Eployee table twice to get the summary of employees under manager
select count(*) as StaffCount,Manager.Name
from Employee join Stafflink on employee.Id = StaffLink.EmpId
join Employee as Manager on StaffLink.ManId = Manager.Id
Group by Manager.Name
The answers that advise you on how to join are correct, assuming that you want to learn how to use SQL in MS Access. But there is a way to accomplish the same thing using the ACCESS GUI for designing queries, and this involves a shorter learning curve than learning SQL.
The key to using the GUI when more than one table is involved is to realize that you have to define the relationships between tables in the relationship manager. Once you do that, designing the query you are after is a piece of cake, just point and click.
The tricky thing in your case is that there are two relationships between the two tables. One relationship links EmpId to ID and the other links ManId to ID.
If, however, you want to learn SQL, then this shortcut will be a digression.
If you don't specify a join between the tables, a so called Cartesian product will be built, i.e., each record from one table will be paired with every record from the other table. If you have 7 records in one table and 10 in the other you will get 70 pairs (i.e. rows) before grouping. This explains why you are getting a count of 7 per manager name.
Besides joining the tables, I would suggest you to group on the manager id instead of the manager name. The manager id is known to be unique per manager, but not the name. This then requires you to either group on the name in addition, because the name is in the select list or to apply an aggregate function on the name. Each additional grouping slows down the query; therefore I prefer the aggregate function.
SELECT
COUNT(*) AS StaffCount,
FIRST(Manager.Name) AS ManagerName
FROM
Stafflink
INNER JOIN Employee AS Manager
ON StaffLink.ManId = Manager.Id
GROUP BY
StaffLink.ManId
I don't know if it makes a performance difference, but I prefer to group on StaffLink.ManId than on Employee.Id, since StaffLink is the main table here and Employee is just used as lookup table in this query.