SQL Aggregation AVG statement - sql

Ok, so I have real difficulty with the following question.
Table 1: Schema for the bookworm database. Primary keys are underlined. There are some foreign key references to link the tables together; you can make use of these with natural joins.
For each publisher, show the publisher’s name and the average price per page of books published by the publisher. Average price per page here means the total price divided by the total number of pages for the set of books; it is not the average of (price/number of pages). Present the results sorted by average price per page in ascending order.
Author(aid, alastname, afirstname, acountry, aborn, adied).
Book(bid, btitle, pid, bdate, bpages, bprice).
City(cid, cname, cstate, ccountry).
Publisher(pid, pname).
Author_Book(aid, bid).
Publisher_City(pid, cid).
So far I have tried:
SELECT
pname,
bpages,
AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP BY AVG(bpages) ASC;
and receive
ERROR: syntax error at or near "asc"
LINE 3: group by avg(bpages) asc;

You can't group by an aggregate, at least not like that. Also, don't use natural joins; it's a bad habit to get into, because most of the time you'll have to specify the join conditions yourself. It's one of those things you see in textbooks but almost never in real life.
OK, with that out of the way: since this is homework, I don't want to just hand you an answer without an explanation. Aggregate functions (SUM in this case) operate on all values of a column within a group, after the WHERE clause and join conditions have limited the rows. So unless you're aggregating over every row as a single group, you have to specify which column contains the values you are grouping by. In this case the group is the publisher name: they want to know, per publisher, what the price per page is. Let's work out a quick SELECT list for that:
select Pname as Publisher
, Sum(bpages) as PublishersTotalPages
, sum(bprice) as PublishersTotalPrice
, sum(bprice)/Sum(bpages) as PublishersPricePerPage
Next we have to determine where to get the information and how the tables relate to each other. We will use Book as the base table (for an inner join like this one the choice does not change the result; it only matters for left or right outer joins). We know there is a foreign key relation between the column PID in the Book table and the column PID in the Publisher table:
From Book B
Join Publisher P on P.PID = B.PID
That's what is called an explicit join: we explicitly state the equivalence between the two columns in the join condition (as opposed to an implicit join, where the tables are listed in FROM and the equivalence is stated in the WHERE clause). This gives us a many-to-one relationship, because each publisher has many published books. To see that, just run the query below:
select b.*, p.*
From Book B
Join Publisher P on P.PID = B.PID
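To make the explicit/implicit distinction concrete, here is a minimal sketch that runs both forms against an in-memory SQLite database from Python. The sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (pid INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE book (bid INTEGER PRIMARY KEY, btitle TEXT, pid INTEGER,
                       bpages INTEGER, bprice REAL);
    INSERT INTO publisher VALUES (1, 'Acme'), (2, 'Birch');
    INSERT INTO book VALUES (1, 'First', 1, 100, 25.0),
                            (2, 'Second', 1, 300, 150.0),
                            (3, 'Third', 2, 200, 50.0);
""")

# Explicit join: the join condition lives in the ON clause.
explicit_rows = conn.execute("""
    SELECT b.btitle, p.pname
    FROM book b
    JOIN publisher p ON p.pid = b.pid
    ORDER BY b.bid
""").fetchall()

# Implicit join: both tables in FROM, the condition in WHERE.
implicit_rows = conn.execute("""
    SELECT b.btitle, p.pname
    FROM book b, publisher p
    WHERE p.pid = b.pid
    ORDER BY b.bid
""").fetchall()

print(explicit_rows)
print(explicit_rows == implicit_rows)
```

Both forms return one row per book with its publisher attached, which is the many-to-one shape described above.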
Now we get to the part that seems to have stumped you: how to get the many-to-one relationship between books and publishers down to one row per publisher, and perform an aggregation (SUM in this case) on the page count and price of each book. The aggregation was already done in our SELECT list, so now we just have to state which column the groups come from. Since they want a per-publisher aggregate, we'll group on the publisher name:
Group by Pname
Order by PublishersPricePerPage Asc
There is a little gotcha in that last part: PublishersPricePerPage is a column alias for the formula sum(bprice)/Sum(bpages). Because ORDER BY is evaluated after the SELECT list, it can refer to a column alias. In standard SQL the other clauses cannot (though some databases relax this), so anywhere else you would have to repeat the expression or nest the original query. Now that you have patiently waded through my explanation, here is the final product:
select Pname as Publisher
, Sum(bpages) as PublishersTotalPages
, sum(bprice) as PublishersTotalPrice
, sum(bprice)/Sum(bpages) as PublishersPricePerPage
From Book B
Join Publisher P on P.PID = B.PID
Group by Pname
Order by PublishersPricePerPage Asc
Good luck and hope the explanation helped you get the concept.
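If you want to see why the assignment insists on total price divided by total pages, here is a small sketch using SQLite from Python. The rows are invented, and the extra avg_of_ratios column is only there for comparison; it is not part of the answer above.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the bookworm schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (pid INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE book (bid INTEGER PRIMARY KEY, pid INTEGER,
                       bpages INTEGER, bprice REAL);
    INSERT INTO publisher VALUES (1, 'Acme'), (2, 'Birch');
    INSERT INTO book VALUES (1, 1, 100, 25.0),
                            (2, 1, 300, 150.0),
                            (3, 2, 200, 50.0);
""")

# bprice is REAL here; with integer columns some databases would
# truncate the division, so cast one side if that applies to you.
rows = conn.execute("""
    SELECT p.pname AS publisher,
           SUM(b.bprice) / SUM(b.bpages) AS price_per_page,  -- what is asked for
           AVG(b.bprice / b.bpages)      AS avg_of_ratios    -- what is NOT asked for
    FROM book b
    JOIN publisher p ON p.pid = b.pid
    GROUP BY p.pid, p.pname
    ORDER BY price_per_page ASC
""").fetchall()

for row in rows:
    print(row)
```

For Acme the two columns differ (175 / 400 versus the mean of 0.25 and 0.5), which is exactly the distinction the question spells out.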

You need an ORDER BY clause, not GROUP BY, to sort records. So change your query to:
SELECT pname, AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP by pname
ORDER BY AVG(bpages) ASC;

You need Order By for sorting, which was missing:
SELECT
pname,
bpages,
AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP BY pname, bpages
order by AVG(bpages) ASC;

Based on what you're trying to achieve, you can try my query below. I used the stated formula inside a CASE expression to avoid a divide-by-zero error when a publisher's total page count is zero (0). I also added an ORDER BY clause to your query; there's no need for the AVG aggregate.
SELECT
pname,
CASE WHEN SUM(bpages) = 0 THEN NULL ELSE SUM(bprice) / SUM(bpages) END AS price
FROM book NATURAL JOIN publisher
GROUP BY pname
ORDER BY price ASC;
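A shorter way to write the same zero-pages guard is NULLIF, which turns a zero divisor into NULL. This is a sketch with invented rows, run on SQLite from Python; the publisher named 'Empty' is hypothetical and exists only to trigger the zero case.

```python
import sqlite3

# Hypothetical sample data; 'Empty' has a single book with zero pages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (pid INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE book (bid INTEGER PRIMARY KEY, pid INTEGER,
                       bpages INTEGER, bprice REAL);
    INSERT INTO publisher VALUES (1, 'Acme'), (2, 'Empty');
    INSERT INTO book VALUES (1, 1, 100, 25.0),
                            (2, 2, 0, 5.0);
""")

# NULLIF(x, 0) yields NULL when x is 0, so the division yields NULL too.
# SQLite returns NULL for division by zero anyway; the guard matters on
# engines such as PostgreSQL, which raise an error instead.
rows = conn.execute("""
    SELECT p.pname,
           SUM(b.bprice) / NULLIF(SUM(b.bpages), 0) AS price
    FROM book b
    JOIN publisher p ON p.pid = b.pid
    GROUP BY p.pid, p.pname
    ORDER BY p.pname
""").fetchall()

print(rows)
```

Returning NULL rather than an empty string keeps the column numeric, so it can still be sorted and compared.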

The ASC is part of the ORDER BY clause. You are missing the ORDER BY here.
Reference: http://www.tutorialspoint.com/sql/sql-group-by.htm

Check this
SELECT pname, AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP by pname
ORDER BY AVG(bpages)

Related

SQL Query Involving Finding Most Frequent Tuple Value in Column

I have the following relations:
teaches(ID,course_id,sec_id,semester,year)
instructor(ID,name,dept_name,salary)
I am trying to express the following as an SQL query:
Find the ID and name of the instructor who has taught the most courses(i.e has the most tuples in teaches).
My Query
select ID, name
from teaches
natural join instructor
group by ID
order by count(*) desc
I know this isn't correct, but I feel like I'm on the right track. In order to answer the question, you need to work with both relations, hence the natural join operation is required. Since the question asks for the instructor that has taught the most courses, that tells me that we are trying to count the number of times each instructor ID appears in the teaches relation. From what I understand, we are looking to count distinct instructor IDs, hence the group by command is needed.
Don't use natural joins: all they do is rely on column names to decide which columns relate across tables (they don't check foreign key constraints or the like, as you might have thought). This is unreliable by nature.
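Here is a small sketch of how that can go wrong, using SQLite from Python. The year column on instructor is a hypothetical addition (say, a hire year) that is not in the schema above; it is there only to create an accidental name clash with teaches.year.

```python
import sqlite3

# Hypothetical schema variant: instructor gains a 'year' column,
# which happens to share its name with teaches.year.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID INTEGER PRIMARY KEY, name TEXT, year INTEGER);
    CREATE TABLE teaches (ID INTEGER, course_id TEXT, year INTEGER);
    INSERT INTO instructor VALUES (1, 'Kim', 2015);
    INSERT INTO teaches VALUES (1, 'CS101', 2015), (1, 'CS102', 2020);
""")

# NATURAL JOIN silently matches on every shared column name: ID AND year.
natural_count = conn.execute(
    "SELECT COUNT(*) FROM teaches NATURAL JOIN instructor"
).fetchone()[0]

# The explicit join matches only on the column we actually mean.
explicit_count = conn.execute(
    "SELECT COUNT(*) FROM teaches t JOIN instructor i ON i.ID = t.ID"
).fetchone()[0]

print(natural_count, explicit_count)
```

The natural join quietly drops the course taught in a different year, with no error or warning to tell you the join condition changed.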
You can use a regular inner join:
select i.id, i.name
from teaches t
inner join instructor i on i.id = t.id
group by i.id, i.name
order by count(*) desc
limit 1
Notes:
- this assumes that column teaches.ID holds the instructor's ID and relates to instructor.ID (it is the only column name the two tables share, so it is also what your natural join matched on)
- I added a limit clause to the query since you stated that you want the top instructor; the syntax may vary across databases
- always prefix the column names with the table they belong to, to make the query unambiguous and easier to understand
- it is good practice (and a requirement in many databases) that, in an aggregate query, all non-aggregated columns listed in the select clause also appear in the group by clause; I added the instructor name to your group by clause
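One thing a limit of one row cannot do is report ties. As a possible variation (not the query above), this sketch keeps every instructor whose course count equals the maximum. It assumes teaches.ID holds the instructor's ID, and the rows are invented; it runs on SQLite from Python.

```python
import sqlite3

# Hypothetical sample data; Kim and Lee are tied with two courses each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE teaches (ID INTEGER, course_id TEXT);
    INSERT INTO instructor VALUES (1, 'Kim'), (2, 'Lee'), (3, 'Park');
    INSERT INTO teaches VALUES (1, 'CS101'), (1, 'CS102'),
                               (2, 'CS201'), (2, 'CS202'),
                               (3, 'CS301');
""")

# Keep every group whose count equals the largest per-instructor count.
rows = conn.execute("""
    SELECT i.ID, i.name, COUNT(*) AS courses
    FROM teaches t
    JOIN instructor i ON i.ID = t.ID
    GROUP BY i.ID, i.name
    HAVING COUNT(*) = (
        SELECT MAX(cnt)
        FROM (SELECT COUNT(*) AS cnt FROM teaches GROUP BY ID) AS per_instructor
    )
    ORDER BY i.ID
""").fetchall()

print(rows)
```

Whether ties should be reported depends on how you read "the instructor who has taught the most courses"; the limit version is fine if any one of them will do.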

Interview Question about SQL group by and having

This problem is from the following
https://www.programmerinterview.com/index.php/database-sql/advanced-sql-interview-questions-and-answers/
Assume we have two tables:
Salesperson
ID Name Age Salary
Orders
Number order_date cust_id salesperson_id Amount
The question is following:
We want to retrieve the names of all salespeople that have more than 1 order from the tables above. You can assume that each salesperson only has one ID. I would probably also assume that names are all distinct.
My answer was this.
select Name from
salesperson S inner join Orders O
on S.ID=O.salesperson_id
group by Name
having count(number) >=2
However, the given answer is following:
SELECT Name
FROM Orders inner join Salesperson
On Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id, NAME
Having count(salesperson_id) > 1
If name and salesperson_id is one to one, is there any reason we have to add salesperson_id into the group by statement here? Also, if name and salesperson_id relationship is just one to one, wouldn't count(salesperson_id) be always 1 if we group by salesperson_id, name?
I was a bit confused about this, and I was wondering if anybody encountered this problem before and found this weird as well.
Both your solution and the accepted one are functionally identical, except for the GROUP BY clause.
The likely reason why the accepted solution is aggregating both by Name and salesperson_id is that it could be the case that two or more salespeople happen to have the same name. Should this occur, your query would report only a single name, but with aggregate results from more than one salesperson. But, the combination of salesperson_id and Name should always be unique.
Other than this, I actually prefer your version, and I would start joining from the salesperson table out to the Orders table.
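To see the duplicate-name case in action, here is a sketch on SQLite from Python. The rows are invented: two different salespeople are both called Ann and each has a single order, while Bob has two.

```python
import sqlite3

# Hypothetical sample data; IDs 1 and 2 share the name 'Ann'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesperson (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (number INTEGER PRIMARY KEY, salesperson_id INTEGER);
    INSERT INTO salesperson VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO orders VALUES (10, 1), (20, 2), (30, 3), (40, 3);
""")

# Grouping by name alone merges the two Anns into one group of 2 orders.
by_name = conn.execute("""
    SELECT s.name
    FROM salesperson s
    JOIN orders o ON s.id = o.salesperson_id
    GROUP BY s.name
    HAVING COUNT(o.number) >= 2
    ORDER BY s.name
""").fetchall()

# Grouping by ID and name keeps each salesperson separate.
by_id_and_name = conn.execute("""
    SELECT s.name
    FROM salesperson s
    JOIN orders o ON s.id = o.salesperson_id
    GROUP BY s.id, s.name
    HAVING COUNT(o.number) >= 2
    ORDER BY s.name
""").fetchall()

print(by_name)
print(by_id_and_name)
```

With distinct names the two queries agree, so under the assumption stated in the question either one is acceptable.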

Correlated Subquery to find AVG of certain groups

Question! I'm trying to list all books and their corresponding book subject, average cost and regular cost.
So far my query is...
SELECT BOOK_SUBJECT, AVG( BOOK_COST )
FROM BOOK
GROUP BY BOOK_SUBJECT
This query gives me the average cost for each of the four subjects. The final output should also bring in BOOK_NUM, BOOK_TITLE, BOOK_SUBJECT, and BOOK_COST, but I'm unable to figure it out. Can someone help? Would a correlated subquery work?
Please try the following...
SELECT BOOK_NUM AS Number,
BOOK_TITLE AS Title,
BOOK.BOOK_SUBJECT AS Subject,
BOOK_COST AS Cost,
AVG_BOOK_COST AS 'Avg Cost'
FROM BOOK
JOIN ( SELECT BOOK_SUBJECT AS BOOK_SUBJECT,
AVG( BOOK_COST ) AS AVG_BOOK_COST
FROM BOOK
GROUP BY BOOK_SUBJECT
) AS SUBJECT_AVG_FINDER ON BOOK.BOOK_SUBJECT = SUBJECT_AVG_FINDER.BOOK_SUBJECT
ORDER BY BOOK_NUM;
To calculate the average for each subject we have to group the contents of BOOK by BOOK_SUBJECT and use AVG( BOOK_COST ) to find the mean for each group. But we also wish to avoid grouping the other fields in BOOK; instead, we want the specified fields from each record in BOOK displayed with their BOOK_SUBJECT's average cost tacked on the end. This suggests an INNER JOIN between BOOK and a subquery that finds the mean cost for each subject.
I used the following code to find the mean average cost for each subject listed in BOOK...
SELECT BOOK_SUBJECT AS BOOK_SUBJECT,
AVG( BOOK_COST ) AS AVG_BOOK_COST
FROM BOOK
GROUP BY BOOK_SUBJECT
We need to select BOOK_SUBJECT partly because the GROUP BY clause requires it and partly because we will need it to join the table generated by this subquery to the ungrouped listing of BOOK.
Giving AVG( BOOK_COST ) the alias of AVG_BOOK_COST makes referring to this generated field much easier.
In the absence of a join type before the word JOIN most versions of SQL will assume an INNER JOIN, although all allow INNER JOIN to be used and some require you to do so. By default I and many others simply use JOIN.
Once the join is performed, each record from BOOK will have a copy of its corresponding record from our subquery (which I have given the alias SUBJECT_AVG_FINDER), leaving each record with two fields called BOOK_SUBJECT. So as not to confuse your version of SQL, we must specify the table or subquery along with the field name wherever such duplication occurs, hence BOOK.BOOK_SUBJECT in the third line of the overall statement.
Each field has been given an alias as per your desired final output image.
I have assumed that there is no need to replicate the row number field. If that is incorrect, then please state otherwise.
Finally, I have sorted the results as per your desired output by adding the line ORDER BY BOOK_NUM.
As a tip, although it's allowed, you should avoid shouting (i.e. using full uppercase for) your field names, table names, and aliases (unless you are required to do so), but still shout the SQL keywords (like SELECT, FROM, AS, etc.). This can make a statement easier to read and debug by providing a visual clue as to how you are trying to use each word. I suggest the following way of presenting our SQL statement instead...
SELECT book_num AS Number,
book_title AS Title,
book.book_subject AS Subject,
book_cost AS Cost,
avg_book_cost AS 'Avg Cost'
FROM book
JOIN ( SELECT book_subject AS book_subject,
AVG( book_cost ) AS avg_book_cost
FROM book
GROUP BY book_subject
) AS subject_avg_finder ON book.book_subject = subject_avg_finder.book_subject
ORDER BY book_num;
If you have any questions or comments, then please feel free to post a Comment accordingly.
Use a subquery to do this:
SELECT BOOK.BOOK_NUM, BOOK.BOOK_TITLE, BOOK.BOOK_SUBJECT, BOOK.BOOK_COST, T.AVG_COST
FROM BOOK
INNER JOIN (
SELECT BOOK_SUBJECT, AVG(BOOK_COST) AVG_COST
FROM BOOK
GROUP BY BOOK_SUBJECT
) T ON BOOK.BOOK_SUBJECT = T.BOOK_SUBJECT
Use this code
SELECT BOOK_NUM, BOOK_TITLE, BOOK_SUBJECT, BOOK_COST,
AVG(BOOK_COST) OVER(PARTITION BY BOOK_SUBJECT) AS AVG_COST
FROM BOOK
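The join-to-a-grouped-subquery form and the window-function form should return the same rows. Here is a sketch that checks this on SQLite from Python, with invented book rows; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data; column names follow the BOOK table in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (book_num INTEGER PRIMARY KEY, book_title TEXT,
                       book_subject TEXT, book_cost REAL);
    INSERT INTO book VALUES (1, 'A', 'DB', 10.0),
                            (2, 'B', 'DB', 30.0),
                            (3, 'C', 'ML', 50.0);
""")

# Variant 1: join each book to a per-subject average computed in a subquery.
joined = conn.execute("""
    SELECT b.book_num, b.book_title, b.book_subject, b.book_cost, t.avg_cost
    FROM book b
    JOIN (SELECT book_subject, AVG(book_cost) AS avg_cost
          FROM book
          GROUP BY book_subject) t ON t.book_subject = b.book_subject
    ORDER BY b.book_num
""").fetchall()

# Variant 2: a window function computes the same average without a join.
# Requires window-function support (SQLite 3.25+).
windowed = conn.execute("""
    SELECT book_num, book_title, book_subject, book_cost,
           AVG(book_cost) OVER (PARTITION BY book_subject) AS avg_cost
    FROM book
    ORDER BY book_num
""").fetchall()

print(joined == windowed)
for row in windowed:
    print(row)
```

Neither variant is a correlated subquery; both compute the per-subject average once per subject rather than once per row.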

How to modify query to walk entire table rather than a single

I wrote several SQL queries and executed them against my table. Each individual query worked. I kept adding functionality until I got a really ugly working query. The problem is that I have to manually change a value every time I want to use it. Can you assist in making this query automatic rather than “manual”?
I am working with DB2.
Table below shows customers (cid) from 1 to 3. 'club' is a book seller, and 'qnty' is the number of books the customer bought from each 'club'. The full table has 45 customers.
Image below shows all the table elements for the first 3 users (cid=1 OR cid=2 OR cid=3). The final purpose of all my queries (once combined) is to find the single 'club' with the largest 'qnty' for each 'cid'. So for 'cid=1' the 'club' is Readers Digest with 'qnty' of 3. For 'cid=2' the 'club' is YRB Gold with 'qnty' of 5. And so on until cid 45 is reached.
To give you a background on what I did here are my queries:
(Query 1-starting point for cid=1)
SELECT * FROM yrb_purchase WHERE cid=1
(Query 2 - find the 'club' with the highest 'qnty' for cid=1)
SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC
(Query 3 - combine the record from the above query with its cid)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club
(Query 4 - make sure there is only one record for cid=1)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club FETCH FIRST ROWS ONLY
To get the 'club' with the highest 'qnty' for customer 2, I would simply change the text cid=1 to cid=2 in the last query above. My query seems to always produce the correct results. My question is: how do I modify my query to get the results for all 'cid's from 1 to 45 in a single table? That is, how do I get one table with every cid value, the club that sold that cid the most books, and how many books were sold? Please keep in mind I am hoping you can modify my query as opposed to providing a better query.
If you decide that my query is way too ugly (I agree with you) and choose to provide another query, please be aware that I just started learning SQL and may not be able to understand your query. You should be aware that I already asked this question: For common elements, how to find the value based on two columns? SQL but I was not able to make the answer work (due to my SQL limitations - not because the answer wasn't good); and in the absence of a working answer I could not reverse engineer it to understand how it works.
Thanks in advance
You could use OLAP/Window Functions to achieve this:
SELECT
cid,
club,
qnty
FROM
(
SELECT
cid,
club,
qnty,
ROW_NUMBER() OVER (PARTITION BY cid order by qnty desc) as cid_club_rank
FROM
(
SELECT
cid,
club,
sum(qnty) as qnty
FROM yrb_purchase
GROUP BY cid, club
) as sub1
) as sub2
WHERE cid_club_rank = 1
The innermost statement (sub1) just grabs the total quantity for each cid/club combination. The next statement out (sub2) assigns a row number to each cid/club combination within its cid, ordered by quantity (highest first). The outermost query then keeps only the records where that row number is 1.
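Here is a sketch of that query running against a tiny made-up yrb_purchase table, using SQLite from Python instead of DB2 (window functions need SQLite 3.25 or newer). The club named 'Basic' and the individual purchase rows are invented; the expected winners for cid 1 and 2 match the ones described in the question. The extra `, club` in the window ordering is my addition so that ties are broken the same way on every run.

```python
import sqlite3

# Hypothetical purchase rows; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yrb_purchase (cid INTEGER, club TEXT, qnty INTEGER);
    INSERT INTO yrb_purchase VALUES
        (1, 'Readers Digest', 2),
        (1, 'Readers Digest', 1),
        (1, 'Basic', 1),
        (2, 'YRB Gold', 5),
        (2, 'Basic', 2);
""")

rows = conn.execute("""
    SELECT cid, club, qnty
    FROM (
        SELECT cid, club, qnty,
               -- rank clubs within each cid, biggest total first;
               -- 'club' is an added tie-breaker for deterministic output
               ROW_NUMBER() OVER (PARTITION BY cid
                                  ORDER BY qnty DESC, club) AS cid_club_rank
        FROM (
            SELECT cid, club, SUM(qnty) AS qnty
            FROM yrb_purchase
            GROUP BY cid, club
        ) AS sub1
    ) AS sub2
    WHERE cid_club_rank = 1
    ORDER BY cid
""").fetchall()

for row in rows:
    print(row)
```

Because the cid is handled by PARTITION BY rather than a WHERE filter, the same statement covers every customer without editing a value by hand.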

SQL Counting and Joining

I'm taking a database course this semester, and we're learning SQL. I understand most simple queries, but I'm having some difficulty using the count aggregate function.
I'm supposed to relate an advertisement number to a property number to a branch number so that I can tally up the amount of advertisements by branch number and compute their cost. I set up what I think are two appropriate new views, but I'm clueless as to what to write for the select statement. Am I approaching this the correct way? I have a feeling I'm over complicating this bigtime...
with ad_prop(ad_no, property_no, overseen_by) as
(select a.ad_no, a.property_no, p.overseen_by
from advertisement as a, property as p
where a.property_no = p.property_no)
with prop_branch(property_no, overseen_by, allocated_to) as
(select p.property_no, p.overseen_by, s.allocated_to
from property as p, staff as s
where p.overseen_by = s.staff_no)
select distinct pb.allocated_to as branch_no, count( ??? ) * 100 as ad_cost
from prop_branch as pb, ad_prop as ap
where ap.property_no = pb.property_no
group by branch_no;
Any insight would be greatly appreciated!
You could simplify it like this:
advertisement
- ad_no
- property_no
property
- property_no
- overseen_by
staff
- staff_no
- allocated_to
SELECT s.allocated_to AS branch, COUNT(*) as num_ads, COUNT(*)*100 as ad_cost
FROM advertisement AS a
INNER JOIN property AS p ON a.property_no = p.property_no
INNER JOIN staff AS s ON p.overseen_by = s.staff_no
GROUP BY s.allocated_to;
Update: changed above to match your schema needs
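If it helps to see the joins and the count working together, here is a sketch on SQLite from Python. All rows are invented, the branch codes 'B1' and 'B2' are hypothetical, and the cost of 100 per advertisement is taken from the multiplier in your own query.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, allocated_to TEXT);
    CREATE TABLE property (property_no INTEGER PRIMARY KEY, overseen_by INTEGER);
    CREATE TABLE advertisement (ad_no INTEGER PRIMARY KEY, property_no INTEGER);
    INSERT INTO staff VALUES (10, 'B1'), (11, 'B2');
    INSERT INTO property VALUES (100, 10), (101, 10), (102, 11);
    INSERT INTO advertisement VALUES (1, 100), (2, 100), (3, 101), (4, 102);
""")

# Each advertisement row survives both joins exactly once, so COUNT(*)
# per branch is the number of advertisements for that branch.
rows = conn.execute("""
    SELECT s.allocated_to AS branch,
           COUNT(*)       AS num_ads,
           COUNT(*) * 100 AS ad_cost
    FROM advertisement AS a
    INNER JOIN property AS p ON a.property_no = p.property_no
    INNER JOIN staff AS s ON p.overseen_by = s.staff_no
    GROUP BY s.allocated_to
    ORDER BY s.allocated_to
""").fetchall()

for row in rows:
    print(row)
```

Note that DISTINCT is unnecessary here, since GROUP BY already produces one row per branch.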
You can condense your WITH clauses into a single statement. Then, the piece I think you are missing is that columns referenced in the column definition have to be aggregated if they aren't included in the GROUP BY clause. So you GROUP BY your distinct column then apply your aggregation and math in your column definitions.
SELECT
s.allocated_to AS branch_no
,COUNT(a.ad_no) AS ad_count
,(COUNT(a.ad_no) * 100) AS ad_cost
...
GROUP BY s.allocated_to
I can tell you that you are making it way too complicated. It should be a single SELECT statement with a couple of joins. You should re-read the chapter on joins or take a look at the following link:
http://www.sql-tutorial.net/SQL-JOIN.asp
A join allows you to "combine" the data from two tables based on a common key between the two tables (you can chain more tables together with more joins). Once you have this "joined" table, you can pretend that it is really one table (aliases are used to indicate where that column came from). You understand how aggregates work on a single table right?
I'd prefer not to give you the answer so that you can actually learn :)