SQL Query Involving Finding Most Frequent Tuple Value in Column

SQL Query Involving Finding Most Frequent Tuple Value in Column - sql

I have the following relations:
teaches(ID,course_id,sec_id,semester,year)
instructor(ID,name,dept_name,salary)
I am trying to express the following as an SQL query:
Find the ID and name of the instructor who has taught the most courses(i.e has the most tuples in teaches).
My Query
select ID, name
from teaches
natural join instructor
group by ID
order by count(*) desc
I know this isn't correct, but I feel like I'm on the right track. In order to answer the question, you need to work with both relations, hence the natural join operation is required. Since the question asks for the instructor that has taught the most courses, that tells me that we are trying to count the number of times each instructor ID appears in the teaches relation. From what I understand, we are looking to count distinct instructor IDs, hence the group by command is needed.

Don't use natural joins: all they do is rely on column names to decide which columns relate across tables (they don't check for foreign keys constraints or the-like, as you would thought). This is unreliable by nature.
You can use a regular inner join:
select i.id, i.name
from teaches t
inner join instructor i on i.id = t.sec_id
group by i.id, i.name
order by count(*) desc
limit 1
Notes:
this assumes that column teaches.sec_id relates to instructor.id (I cannot see which other column could be used)
I added a limit clause to the query since you stated that you want the top instructor - the syntax may vary across databases
always prefix the column names with the table they belong to, to make the query unambiguous and easier to understand
it is a good practice (and a requirement in many databases) that in an aggregate query all non-aggregared columns listed in the select clause should appear in the group by clause; I added the instructur name to your group by clause

Related

'for all' Queries in sql

In class, professor said that SQL language does not provide 'for all' operator.
In order to use 'for all' you have to use 'not exist( X except Y)'
At this point, I can't figure out why 'for all' is same meaning as 'not exist( X except Y)'
I give you example relation:
course (cID,title,deptName,credit),
teaches (pID,cID,semester,year,classroom),
student (sID,name,gender,deptName)
Q: Find all student names who have taken all courses offered in 'CS' department
The answer is:
Select distinct
S.sid, S.name
from
student as S
where
not exists (
(select cID from course where deptName = 'CS')
except
(select T.cID from takes as T where S.sID = T.sID)
);
Can you give me specific explain about that?
ps. Sorry for my English skill

You professor is right. SQL has no direct way to query all records that have all possible relations of a certain type.
It's easy to query which relations of a certain type a record has. Just INNER JOIN the two tables and you are done.
But in an M:N relationship like "students" to "taken courses" it's not that simple.
To answer the question "which student has taken all possible courses" you must find out which relations could possibly exist and then make sure that all of them do actually exist.
select distinct
S.sid, S.name
from
student as S
where
not exists (
(select cID from course where deptName = 'CS')
except
(select T.cID from takes as T where S.sID = T.sID)
);
can be translated as
give me all students SELECT
for whom it is true: WHERE
that the following set is empty NOT EXISTS
(any course in 'CS') "all relations that can possibly exist"
minus EXCEPT
(all courses the student has taken) "the ones that do actually exist"
In other words: Of all possible relations there is no relation that does not exist.
There are other ways of expressing the same thought that can be used in database systems without support for EXCEPT.
For example
select
S.sid,
S.name
from
student as S
inner join takes as T on T.sID = S.sID
inner join course as C on C. cID = T. cID
where
c. deptName = 'CS'
group by
S.sid,
S.name
having
count(*) = (select count(*) from course where deptName = 'CS');

From your table definition and requirement its not clear what is the use of teaches table. You want the list of students names those have taken all courses offered by 'CS' department. For this students and course table is enough.
SELECT name
FROM
(
SELECT B.name, A.cid
FROM course A
INNER JOIN student B ON A.deptName = B.deptName
WHERE A.deptName = 'CS'
GROUP BY A.cid, B.name
) A
GROUP BY name
HAVING COUNT(name) >= (SELECT COUNT(cid) FROM course WHERE deptName = 'CS')
Internal query just selects all students those have taken any course offered by 'CS' dept and with group by I just make sure that in case a student take same course twice they will be counted as one row. Next I just select those students take all course offered by 'CS' dept.
I think you have some gap to understand your requirement properly. In your requirement no relation with teaches table is specified.

Q: Find all student names who have taken all courses offered in 'CS'
department
NOT EXISTS returns true if the query passed to it contains 0 records.
In this case, your sub-query from NOT EXISTS selects all the courses offered in 'CS', and subtract from this result set all the courses taken by specific student.
If the student have taken all the courses then except will remove all and the sub-query will return 0 records, which in pair with NOT EXISTS will give you true for specific student, and it will be displayed in final result set.

Brief history: Codd invented the Relational Model (RM), some people created a DBMS loosely based on RM to prove a RM product could be performant, and the SQL language emerged based on that DBMS (i.e. not directly based on the RM).
Codd came up with a set of primitive operators to define a database as being relationally complete. His algebra included product, where two relations are 'multiplied' together to give a combined relation; this made it into SQL as CROSS JOIN. [Side note: people refer to this operator as 'Cartesian product', which results in a set of ordered pairs. However, product in RM results in a relation (as do all relational operators), and CROSS JOIN results in a table expression (loosely speaking).]
Codd's algebra also included a division operator. I guess the thinking is, we should be able to take the result of product and one of the relations and use an operator to result in the other relation. But it does has some practical use too, of course. It is commonly expressed as 'the supplier who supplies all products', after Chris Date's parts and suppliers database found in his books. SQL lacks an explicit division operator, so we need to use other operators to get the desired result.
Note there are two flavours of division, being exact division ("suppliers who supply all the parts we are interested in and no more") and division with remainder ("suppliers who supply at least all the parts we are interested in and possibly more"). I tend to be wary of the answers here that do not mention either the name 'division' or that you need to decide whether you need to deal with remainders.
The thinking behind your professor's answer is that a double negative (in mathematics and English) i.e. if the statement "there is no part I don't supply" is true for a given supplier then that supplier will be in the result.
Note there are operators that Codd omitted (e.g. rename and summerize) that can now be found in SQL, so it's a shame we are still waiting for division!

oracle 10g contains clause

I am new to SQL. I am trying to write a query which requires contains operator. I am looking at many examples for the contains clause, but they seem different than what I need to use. I want a divide operator(relational algebra) equivalent in sql.
a sample of what I am doing is :
Give the students names who are enrolled for computer architecture course and have satisfied all its prerequisites. I have a prereq table and a course table which lists all courses taken by student
so what I ultimately want is
(get all courses taken by a student) contains (get all prerequisits)
for what exactly shall I look for for sql equivalent of divide operation in relational algebra?

You can do this by comparing counts.
select s.StudentName
from StudentCourses s, PrereqCourses p
where s.CourseId = p.CourseId
group by s.StudentName
having count(*) = (select count(*) from PrereqCourses)
See: http://www.dba-oracle.com/t_sql_patterns_relational_division.htm

SQL Aggregation AVG statement

Ok, so I have real difficulty with the following question.
Table 1: Schema for the bookworm database. Primary keys are underlined. There are some foreign key references to link the tables together; you can make use of these with natural joins.
For each publisher, show the publisher’s name and the average price per page of books published by the publisher. Average price per page here means the total price divided by the total number of pages for the set of books; it is not the average of (price/number of pages). Present the results sorted by average price per page in ascending order.
Author(aid, alastname, afirstname, acountry, aborn, adied).
Book(bid, btitle, pid, bdate, bpages, bprice).
City(cid, cname, cstate, ccountry).
Publisher(pid, pname).
Author_Book(aid, bid).
Publisher_City(pid, cid).
So far I have tried:
SELECT
pname,
bpages,
AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP BY AVG(bpages) ASC;
and receive
ERROR: syntax error at or near "asc"
LINE 3: group by avg(bpages) asc;

You can't group by an aggregate, at least not like that. Also don't use natural join, it's bad habit to get into because most of the time you'll have to specify join conditions. It's one of those things you see in text books but almost never in real life.
OK with that out of the way, and this being homework so I don't want to just give you an answer without an explanation, aggregate functions (sum in this case) affect all values for a column within a group as limited by the where clause and join conditions, so unless your doing every row you have to specify what column contains the values you are grouping by. In this case our group is Publisher name, they want to know per publisher, what the price per page is. Lets work out a quick select statement for that:
select Pname as Publisher
, Sum(bpages) as PublishersTotalPages
, sum(bprice) as PublishersTotalPrice
, sum(bprice)/Sum(bpages) as PublishersPricePerPage
Next up we have to determine where to get the information and how the tables relate to eachother, we will use books as the base (though due to the nature of left or right joins it's less important than you think). We know there is a foreign key relation between the column PID in the book table and the column PID in the Publisher table:
From Book B
Join Publisher P on P.PID = B.PID
That's what is called an explicit join, we are explicitly stating equivalence between the two columns in the two tables (vs. implying equivalence if it's done in the where clause). This gives us a many to one relation ship, because each publisher has many books published. To see that just run the below:
select b.*, p.*
From Book B
Join Publisher P on P.PID = B.PID
Now we get to the part that seems to have stumped you, how to get the many to one relationship between books and the publishers down to one row per publisher and perform an aggregation (sum in this case) on the page count per book and price per book. The aggregation portion was already done in our selection section, so now we just have to state what column the values our group will come from, since they want to know a per publisher aggregate we'll use the publisher name to group on:
Group by Pname
Order by PublishersPricePerPage Asc
There is a little gotcha in that last part, publisherpriceperpage is a column alias for the formula sum(bprice)/Sum(bpages). Because order by is done after all other parts of the query it's unique in that we can use a column alias no other part of a query allows that, without nesting the original query. so now that you have patiently waded through my explanation, here is the final product:
select Pname as Publisher
, Sum(bpages) as PublishersTotalPages
, sum(bprice) as PublishersTotalPrice
, sum(bprice)/Sum(bpages) as PublishersPricePerPage
From Book B
Join Publisher P on P.PID = B.PID
Group by Pname
Order by PublishersPricePerPage Asc
Good luck and hope the explanation helped you get the concept.

You need ORDER BY clause and not GROUP BY to sort record. So change your query to:
SELECT pname, AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP by pname
ORDER BY AVG(bpages) ASC;

You need Order By for sorting, which was missing:
SELECT
pname,
bpages,
AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP BY pname, bpages
order by AVG(bpages) ASC;

Base on what you're trying to achieve. You can try my query below. I used the stated formula in a CASE statement to catch the error when a bprice is divided by zero(0). Also I added ORDER BY clause in your query and there's no need for the AVG aggregates.
SELECT
pname,
CASE WHEN SUM(bpages)=0 THEN '' ELSE SUM(bprice)/SUM(bpages) END price
FROM book NATURAL JOIN publisher
GROUP BY pname
ORDER BY pname ASC;

The ASC is part of the ORDER BY clause. You are missing the ORDER BY here.
Reference: http://www.tutorialspoint.com/sql/sql-group-by.htm

Check this
SELECT pname, AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP by pname
ORDER BY AVG(bpages)

Ms-Access: counting from 2 tables

I have two tables in a Database
and
I need to retrieve the number of staff per manager in the following format
I've been trying to adapt an answer to another question
SELECT bankNo AS "Bank Number",
COUNT (*) AS "Total Branches"
FROM BankBranch
GROUP BY bankNo
As
SELECT COUNT (*) AS StaffCount ,
Employee.Name AS Name
FROM Employee, Stafflink
GROUP BY Name
As I look at the Group BY I'm thinking I should be grouping by The ManID in the Stafflink Table.
My output with this query looks like this
So it is counting correctly but as you can see it's far off the output I need to get.
Any advice would be appreciated.

You need to join the Employee and Stafflink tables. It appears that your FROM clause should look like this:
FROM Employee INNER JOIN StaffLink ON Employee.ID = StaffLink.ManID

You have to join the Eployee table twice to get the summary of employees under manager
select count(*) as StaffCount,Manager.Name
from Employee join Stafflink on employee.Id = StaffLink.EmpId
join Employee as Manager on StaffLink.ManId = Manager.Id
Group by Manager.Name

The answers that advise you on how to join are correct, assuming that you want to learn how to use SQL in MS Access. But there is a way to accomplish the same thing using the ACCESS GUI for designing queries, and this involves a shorter learning curve than learning SQL.
The key to using the GUI when more than one table is involved is to realize that you have to define the relationships between tables in the relationship manager. Once you do that, designing the query you are after is a piece of cake, just point and click.
The tricky thing in your case is that there are two relationships between the two tables. One relationship links EmpId to ID and the other links ManId to ID.
If, however, you want to learn SQL, then this shortcut will be a digression.

If you don't specify a join between the tables, a so called Cartesian product will be built, i.e., each record from one table will be paired with every record from the other table. If you have 7 records in one table and 10 in the other you will get 70 pairs (i.e. rows) before grouping. This explains why you are getting a count of 7 per manager name.
Besides joining the tables, I would suggest you to group on the manager id instead of the manager name. The manager id is known to be unique per manager, but not the name. This then requires you to either group on the name in addition, because the name is in the select list or to apply an aggregate function on the name. Each additional grouping slows down the query; therefore I prefer the aggregate function.
SELECT
COUNT(*) AS StaffCount,
FIRST(Manager.Name) AS ManagerName
FROM
Stafflink
INNER JOIN Employee AS Manager
ON StaffLink.ManId = Manager.Id
GROUP BY
StaffLink.ManId
I don't know if it makes a performance difference, but I prefer to group on StaffLink.ManId than on Employee.Id, since StaffLink is the main table here and Employee is just used as lookup table in this query.

sql resultset order defined?

Say I've employee table where any employee can be related to any other employee (many to many). Each employee has many characteristics that are stored separately.
emp: id,name
related: name,emp1role,emp1id,emp2role,emp2id
chars: empid,name,value
I want to get all the characteristics of employees who are related via 'xxx' along with the relation. I am currently using this query:
SELECT c.empid, c.Name, c.Value
FROM chars as c, related as r
WHERE r.name='xxx' AND (r.emp1id=c.empid OR r.emp2id=c.empid)
This works and it gives related employees one after another i.e. if emp22 & emp43 are related via 'xxx' then I am getting chars of emp43 followed by emp22 and so on. This way I am able to know which two employees are related (which is needed). However, I want to know if this order is mere luck or is it well-defined. This is in SQLite.
If it is not defined way, how else can I do it? Also, I need to know their respective roles. I want to preferably do it in one query. Can you think of some other query?
Thanks in advance,
Manish
PS: These are not actual tables. They are here for simplicity of asking question.

In SQL, ordering of the result is undefined unless you have an explicit ORDER BY clause in your query. So I believe you want:
SELECT c.empid, c.Name, c.Value
FROM chars as c, related as r
WHERE r.name='xxx' AND (r.emp1id=c.empid OR r.emp2id=c.empid)
ORDER BY c.empid ASC
I tend to add a unique field (such as a primary key) at the end, to if not get an obvious ordering when there are multiple records matching in all other order fields, at least get deterministic ordering. But that's largely a matter of style and choice; it's by no means required.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas