Correlated Subquery to find AVG of certain groups - sql

Question! I'm trying to list all books and their corresponding book subject, average cost and regular cost.
So far my query is...
SELECT BOOK_SUBJECT, AVG( BOOK_COST )
FROM BOOK
GROUP BY BOOK_SUBJECT
This query gives me the average cost for each of the four subject groups. The final output also needs to bring in BOOK_NUM, BOOK_TITLE, BOOK_SUBJECT, and BOOK_COST, but I'm unable to figure it out. Can someone help? Is a correlated subquery the way to go?

Please try the following...
SELECT BOOK_NUM AS Number,
BOOK_TITLE AS Title,
BOOK.BOOK_SUBJECT AS Subject,
BOOK_COST AS Cost,
AVG_BOOK_COST AS 'Avg Cost'
FROM BOOK
JOIN ( SELECT BOOK_SUBJECT AS BOOK_SUBJECT,
AVG( BOOK_COST ) AS AVG_BOOK_COST
FROM BOOK
GROUP BY BOOK_SUBJECT
) AS SUBJECT_AVG_FINDER ON BOOK.BOOK_SUBJECT = SUBJECT_AVG_FINDER.BOOK_SUBJECT
ORDER BY BOOK_NUM;
To calculate the average for each subject, we have to group the contents of BOOK by BOOK_SUBJECT and use AVG( BOOK_COST ) to find the mean for each group. But we also want to avoid grouping the other fields in BOOK; instead, the specified fields from each record in BOOK should be displayed with their BOOK_SUBJECT's average cost tacked on the end. This suggests an INNER JOIN between BOOK and a subquery that finds the mean cost for each subject.
I used the following code to find the mean average cost for each subject listed in BOOK...
SELECT BOOK_SUBJECT AS BOOK_SUBJECT,
AVG( BOOK_COST ) AS AVG_BOOK_COST
FROM BOOK
GROUP BY BOOK_SUBJECT
We need to select BOOK_SUBJECT partly because the GROUP BY clause requires it and partly because we will need it to join the table generated by this subquery to the ungrouped listing of BOOK.
Giving AVG( BOOK_COST ) the alias of AVG_BOOK_COST makes referring to this generated field much easier.
In the absence of a join type before the word JOIN most versions of SQL will assume an INNER JOIN, although all allow INNER JOIN to be used and some require you to do so. By default I and many others simply use JOIN.
Once the join is performed, each record from BOOK will have a copy of its corresponding record from our subquery (which I have given an alias of SUBJECT_AVG_FINDER), leaving each record with two fields called BOOK_SUBJECT. So as not to confuse your version of SQL, we must specify the table / subquery along with the field name where such duplication occurs, hence BOOK.BOOK_SUBJECT in the third line of the overall statement.
Each field has been given an alias as per your desired final output image.
I have assumed that there is no need to replicate the row number field. If that is incorrect, then please state otherwise.
Finally, I have sorted the results as per your desired output by adding the line ORDER BY BOOK_NUM.
As a tip, although it's allowed, you should avoid shouting (i.e. using full uppercase for) your field names, table names, and aliases (unless you are required to do so), but still shout the SQL keywords (like SELECT, FROM, AS, etc.). This can make a statement easier to read and debug by providing a visual clue as to how you are trying to use each word. I suggest the following way of presenting our SQL statement instead...
SELECT book_num AS Number,
book_title AS Title,
book.book_subject AS Subject,
book_cost AS Cost,
avg_book_cost AS 'Avg Cost'
FROM book
JOIN ( SELECT book_subject AS book_subject,
AVG( book_cost ) AS avg_book_cost
FROM book
GROUP BY book_subject
) AS subject_avg_finder ON book.book_subject = subject_avg_finder.book_subject
ORDER BY book_num;
If you have any questions or comments, then please feel free to post a Comment accordingly.

Use a subquery to do this:
SELECT BOOK.BOOK_NUM, BOOK.BOOK_TITLE, BOOK.BOOK_SUBJECT, BOOK.BOOK_COST, T.AVG_COST
FROM BOOK
INNER JOIN (
SELECT BOOK_SUBJECT, AVG(BOOK_COST) AVG_COST
FROM BOOK
GROUP BY BOOK_SUBJECT
) T ON BOOK.BOOK_SUBJECT = T.BOOK_SUBJECT

Use this code
SELECT BOOK_NUM, BOOK_TITLE, BOOK_SUBJECT, BOOK_COST,
AVG(BOOK_COST) OVER(PARTITION BY BOOK_SUBJECT) AS AVG_COST
FROM BOOK
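Since the question asks about a correlated subquery, here is a minimal sketch of that form as well, run through Python's sqlite3 module so it can be tried end to end. The sample rows are made up for illustration (they are not the asker's data), and the lowercase names are only a style choice. The last lines compare the result with the derived-table join used in the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (
        book_num     INTEGER PRIMARY KEY,
        book_title   TEXT,
        book_subject TEXT,
        book_cost    REAL
    );
    -- Hypothetical sample rows, not the asker's data.
    INSERT INTO book VALUES
        (1, 'Intro to SQL',  'Database',    40.0),
        (2, 'Advanced SQL',  'Database',    60.0),
        (3, 'Python Basics', 'Programming', 30.0),
        (4, 'Cloud Primer',  'Cloud',       50.0);
""")

# Correlated subquery: the inner SELECT is evaluated per outer row and
# averages only the books that share that row's subject.
correlated = """
    SELECT b.book_num, b.book_title, b.book_subject, b.book_cost,
           ( SELECT AVG(b2.book_cost)
             FROM book AS b2
             WHERE b2.book_subject = b.book_subject ) AS avg_cost
    FROM book AS b
    ORDER BY b.book_num
"""

# Derived-table join, same shape as the answers above.
joined = """
    SELECT b.book_num, b.book_title, b.book_subject, b.book_cost, t.avg_cost
    FROM book AS b
    INNER JOIN ( SELECT book_subject, AVG(book_cost) AS avg_cost
                 FROM book
                 GROUP BY book_subject ) AS t
            ON b.book_subject = t.book_subject
    ORDER BY b.book_num
"""

correlated_rows = conn.execute(correlated).fetchall()
joined_rows = conn.execute(joined).fetchall()
for row in correlated_rows:
    print(row)
```

Both forms should return the same rows; which one performs better depends on the database engine and its optimizer, so it is worth checking the execution plan on your own system.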

Related

how to use View function in SQL

I have a question for an assignment and it is listed below here.
"List a query that list the film genres and gross revenue for that genre, conditional to the gross revenue for that genre being higher than average gross revenue per genre. Hint: Use a View to simplify the query."
My code is as follows:
CREATE VIEW grossrevenue as (
SELECT category.name, SUM(payment.amount) as sumpay
FROM payment
JOIN rental ON payment.rental_id = rental.rental_id
JOIN inventory ON rental.inventory_id = inventory.inventory_id
JOIN film_category ON inventory.film_id = film_category.film_id
JOIN category ON film_category.category_id = category.category_id
GROUP by category.name
);
Then:
Select sumpay
FROM grossrevenue
WHERE sumpay > AVG(sumpay);
What is wrong with it and why is it not running?
Syntax wise, AVG() does not make sense without a group of records. It’s for aggregating multiple rows.
Try putting a sub query to calculate the average, or rethink your view.
Requested Edits
People generally shouldn't give answers to school assignments, so I didn't dig into your view SQL too much, and my answer was somewhat vague on purpose.
But here's some info to clarify my above answer so you can go the right direction:
Select sumpay
FROM grossrevenue
WHERE sumpay > AVG(sumpay);
You can't do sumpay > AVG(sumpay) as they both operate on just one row. AVG() of one row doesn't make sense.
If AVG(sumpay) was the only thing in the select clause like this, then it makes sense as it will average all the rows and give you literally just the average.
Select AVG(sumpay)
FROM grossrevenue
But in your where clause, it's literally just operating on the current row and doesn't mean anything. There is nothing to average; that's not how it works.
You can use a "subquery" to get the average (literally replace AVG(sumpay) with a subquery after googling it). That would be one option.
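To make the subquery idea concrete, here is a small sketch using Python's sqlite3 module. It is not a solution to the assignment as written: grossrevenue is stood in for by a plain table with three invented rows (an assumption, the real view is built from the payment and rental tables), which is enough to show the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the grossrevenue view; the rows are invented.
    CREATE TABLE grossrevenue (name TEXT, sumpay REAL);
    INSERT INTO grossrevenue VALUES
        ('Action', 300.0),
        ('Comedy', 100.0),
        ('Drama',  200.0);
""")

# The original form: an aggregate directly in WHERE is rejected.
try:
    conn.execute("SELECT sumpay FROM grossrevenue WHERE sumpay > AVG(sumpay)")
    rejected = False
except sqlite3.Error as exc:
    rejected = True
    print("rejected:", exc)

# The scalar subquery computes the overall average once (200.0 here),
# and each row's sumpay is then compared against that single value.
above_average = conn.execute("""
    SELECT name, sumpay
    FROM grossrevenue
    WHERE sumpay > ( SELECT AVG(sumpay) FROM grossrevenue )
    ORDER BY name
""").fetchall()
print(above_average)
```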

SQL - count with or without subquery?

I have two tables in my DB:
Building(bno,address,bname) - PK is bno.
Room(bno,rno,floor,maxstud) - PK is bno,rno (together)
The Building table stands for a building number, address and name.
The Room table stands for building number, room number, floor number and maximum amount of students who can live in the room.
The query I have to write:
Find the buildings that have at least 10 rooms in which the maximum number of students who can live is 1. The columns should be bno, bname, number of such rooms.
What I wrote:
select building.bno, building.bname, count(rno)
from room natural join building
where maxstud =1
group by bno, bname
having count(rno)>=10
What the solution I have states:
with temp as (
select bno, count(distinct rno) as sumrooms
from room
where maxstud=1
group by bno
)
select bno, bname, sumrooms
from building natural join temp
where sumrooms>=10
Is my solution correct? I didn't see a reason to use a sub-query, but now I'm afraid I was wrong.
Thanks,
Alan
Your query will perform faster, but be aware that it won't compile unless every unaggregated column is included in the GROUP BY clause (here: building.bname).
Also, the solution you were given counts distinct room numbers, so one might conclude that a building can have several rooms with the same number, for example on different floors, in which case a room would only be identified uniquely by the triple (bno, rno, floor).
Given what I've written above, your query would look like this:
select building.bno, building.bname, count(distinct rno)
from room natural join building
where maxstud = 1
group by 1,2 -- I used positions here, you can use names if you wish
having count(distinct rno) >= 10
Your solution is better.
If you are unsure, run both queries on a sample dataset and convince yourself that the results are the same.
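One way to do that comparison is sketched below with Python's sqlite3 module. The buildings and rooms are invented sample data, and the CTE is named single_rooms instead of temp purely to steer clear of a keyword clash in SQLite; otherwise the two queries follow the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building (bno INTEGER PRIMARY KEY, address TEXT, bname TEXT);
    CREATE TABLE room (
        bno INTEGER, rno INTEGER, floor INTEGER, maxstud INTEGER,
        PRIMARY KEY (bno, rno)
    );
    -- Invented sample data.
    INSERT INTO building VALUES (1, '1 Main St', 'Alpha'), (2, '2 Side St', 'Beta');
""")

rooms = [(1, rno, 1, 1) for rno in range(1, 11)]   # ten single rooms in building 1
rooms.append((1, 11, 2, 2))                        # one double room in building 1
rooms += [(2, rno, 1, 1) for rno in range(1, 4)]   # three single rooms in building 2
conn.executemany("INSERT INTO room VALUES (?, ?, ?, ?)", rooms)

# The asker's version: filter, group, then keep groups with 10 or more rooms.
direct_rows = conn.execute("""
    SELECT bno, bname, COUNT(rno)
    FROM room NATURAL JOIN building
    WHERE maxstud = 1
    GROUP BY bno, bname
    HAVING COUNT(rno) >= 10
""").fetchall()

# The given solution: count per building in a CTE, then join and filter.
cte_rows = conn.execute("""
    WITH single_rooms AS (
        SELECT bno, COUNT(DISTINCT rno) AS sumrooms
        FROM room
        WHERE maxstud = 1
        GROUP BY bno
    )
    SELECT bno, bname, sumrooms
    FROM building NATURAL JOIN single_rooms
    WHERE sumrooms >= 10
""").fetchall()

print(direct_rows)
print(cte_rows)
```

Because (bno, rno) is the primary key of Room, COUNT(rno) and COUNT(DISTINCT rno) agree within a building, so the two results match on this data.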

SQL Aggregation AVG statement

Ok, so I have real difficulty with the following question.
Table 1: Schema for the bookworm database. Primary keys are underlined. There are some foreign key references to link the tables together; you can make use of these with natural joins.
For each publisher, show the publisher’s name and the average price per page of books published by the publisher. Average price per page here means the total price divided by the total number of pages for the set of books; it is not the average of (price/number of pages). Present the results sorted by average price per page in ascending order.
Author(aid, alastname, afirstname, acountry, aborn, adied).
Book(bid, btitle, pid, bdate, bpages, bprice).
City(cid, cname, cstate, ccountry).
Publisher(pid, pname).
Author_Book(aid, bid).
Publisher_City(pid, cid).
So far I have tried:
SELECT
pname,
bpages,
AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP BY AVG(bpages) ASC;
and receive
ERROR: syntax error at or near "asc"
LINE 3: group by avg(bpages) asc;
You can't group by an aggregate, at least not like that. Also, don't use natural join; it's a bad habit to get into because most of the time you'll have to specify join conditions. It's one of those things you see in textbooks but almost never in real life.
OK, with that out of the way: since this is homework, I don't want to just give you an answer without an explanation. Aggregate functions (SUM in this case) operate on all values of a column within a group, as limited by the where clause and join conditions, so unless you're aggregating over every row you have to specify which column contains the values you are grouping by. In this case our group is the publisher name: they want to know, per publisher, what the price per page is. Let's work out a quick select statement for that:
select Pname as Publisher
, Sum(bpages) as PublishersTotalPages
, sum(bprice) as PublishersTotalPrice
, sum(bprice)/Sum(bpages) as PublishersPricePerPage
Next up, we have to determine where to get the information and how the tables relate to each other. We will use Book as the base (though due to the nature of left or right joins it's less important than you think). We know there is a foreign key relation between the column PID in the Book table and the column PID in the Publisher table:
From Book B
Join Publisher P on P.PID = B.PID
That's what is called an explicit join: we are explicitly stating equivalence between the two columns in the two tables (vs. implying equivalence if it's done in the where clause). This gives us a many-to-one relationship, because each publisher has many books published. To see that, just run the below:
select b.*, p.*
From Book B
Join Publisher P on P.PID = B.PID
Now we get to the part that seems to have stumped you: how to get the many-to-one relationship between books and publishers down to one row per publisher and perform an aggregation (sum in this case) on the page count per book and price per book. The aggregation portion was already done in our select list, so now we just have to state which column the values for our groups will come from. Since they want a per-publisher aggregate, we'll use the publisher name to group on:
Group by Pname
Order by PublishersPricePerPage Asc
There is a little gotcha in that last part: PublishersPricePerPage is a column alias for the formula sum(bprice)/Sum(bpages). Because ORDER BY is evaluated after all other parts of the query, it can refer to a column alias; as a rule, the other clauses cannot do that without nesting the original query. So now that you have patiently waded through my explanation, here is the final product:
select Pname as Publisher
, Sum(bpages) as PublishersTotalPages
, sum(bprice) as PublishersTotalPrice
, sum(bprice)/Sum(bpages) as PublishersPricePerPage
From Book B
Join Publisher P on P.PID = B.PID
Group by Pname
Order by PublishersPricePerPage Asc
Good luck and hope the explanation helped you get the concept.
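As a quick check of the distinction the assignment draws (total price divided by total pages, versus the average of price/pages), here is a small sketch run through Python's sqlite3 module. The publishers and books are invented, and only the columns needed for the calculation are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Only the columns needed here; the rows are invented.
    CREATE TABLE publisher (pid INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE book (bid INTEGER PRIMARY KEY, pid INTEGER,
                       bpages INTEGER, bprice REAL);
    INSERT INTO publisher VALUES (1, 'Acme'), (2, 'Birch');
    INSERT INTO book VALUES
        (1, 1, 100, 10.0),
        (2, 1, 300, 50.0),
        (3, 2, 200, 10.0);
""")

price_rows = conn.execute("""
    SELECT p.pname,
           SUM(b.bprice) / SUM(b.bpages) AS price_per_page,  -- what is asked for
           AVG(b.bprice / b.bpages)      AS avg_of_ratios    -- what is NOT asked for
    FROM book AS b
    JOIN publisher AS p ON p.pid = b.pid
    GROUP BY p.pname
    ORDER BY price_per_page ASC
""").fetchall()

for pname, price_per_page, avg_of_ratios in price_rows:
    print(pname, round(price_per_page, 4), round(avg_of_ratios, 4))
```

For Acme the two figures differ (60/400 = 0.15 against the mean of 0.10 and about 0.1667), which is why the formula uses two SUMs and no AVG.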
You need an ORDER BY clause, not GROUP BY, to sort records. So change your query to:
SELECT pname, AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP by pname
ORDER BY AVG(bpages) ASC;
You need Order By for sorting, which was missing:
SELECT
pname,
bpages,
AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP BY pname, bpages
order by AVG(bpages) ASC;
Based on what you're trying to achieve, you can try my query below. I used the stated formula in a CASE expression to avoid a division-by-zero error when a publisher's total page count is zero (0). I also added an ORDER BY clause to your query, and there's no need for the AVG aggregates.
SELECT
pname,
CASE WHEN SUM(bpages)=0 THEN NULL ELSE SUM(bprice)/SUM(bpages) END price
FROM book NATURAL JOIN publisher
GROUP BY pname
ORDER BY price ASC;
The ASC is part of the ORDER BY clause. You are missing the ORDER BY here.
Reference: http://www.tutorialspoint.com/sql/sql-group-by.htm
Check this
SELECT pname, AVG(bprice)
FROM book NATURAL JOIN publisher
GROUP by pname
ORDER BY AVG(bpages)

I'm not sure what is the purpose of "group by" here

I'm struggling to understand what this query is doing:
SELECT branch_name, count(distinct customer_name)
FROM depositor, account
WHERE depositor.account_number = account.account_number
GROUP BY branch_name
What's the need of GROUP BY?
You must use GROUP BY in order to use an aggregate function like COUNT in this manner (using an aggregate function to aggregate data corresponding to one or more values within the table).
The query essentially selects distinct branch_names using that column as the grouping column, then within the group it counts the distinct customer_names.
You couldn't use COUNT to get the number of distinct customer_names per branch_name without the GROUP BY clause (at least not with a simple query specification - you can use other means, joins, subqueries etc...).
It's giving you the total number of distinct customers for each branch; GROUP BY tells COUNT which rows belong together, so that you get one count per branch.
It could be written also as:
SELECT branch_name, count(distinct customer_name)
FROM depositor INNER JOIN account
ON depositor.account_number = account.account_number
GROUP BY branch_name
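To see what the GROUP BY contributes, here is a small sketch run through Python's sqlite3 module, with invented depositors and accounts. It runs the grouped query, then the same count without GROUP BY (and without branch_name in the select list).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data.
    CREATE TABLE account (account_number INTEGER PRIMARY KEY, branch_name TEXT);
    CREATE TABLE depositor (customer_name TEXT, account_number INTEGER);
    INSERT INTO account VALUES (1, 'Downtown'), (2, 'Downtown'), (3, 'Uptown');
    INSERT INTO depositor VALUES ('Ann', 1), ('Ann', 2), ('Bob', 2), ('Cid', 3);
""")

# With GROUP BY: one row per branch, each with its own distinct-customer count.
per_branch = conn.execute("""
    SELECT branch_name, COUNT(DISTINCT customer_name)
    FROM depositor INNER JOIN account
      ON depositor.account_number = account.account_number
    GROUP BY branch_name
    ORDER BY branch_name
""").fetchall()

# Without GROUP BY: the whole join result is a single group, so one total.
overall = conn.execute("""
    SELECT COUNT(DISTINCT customer_name)
    FROM depositor INNER JOIN account
      ON depositor.account_number = account.account_number
""").fetchone()[0]

print(per_branch)
print(overall)
```

Ann holds two Downtown accounts but is counted once there, which is the effect of DISTINCT inside the COUNT.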
Let's take a step away from SQL for a moment and look at the relational training language Tutorial D.
Because the two relations (tables) are joined on the common attribute (column) name account_number, we can use a natural join:
depositor JOIN account
(Because the result is a relation, which by definition has only distinct tuples (rows), we don't need a DISTINCT keyword.)
Now we just need to aggregate using SUMMARIZE..BY:
SUMMARIZE (depositor JOIN account)
BY { branch_name }
ADD ( COUNT ( customer_name ) AS customer_tally )
Back in SQLland, the GROUP BY branch_name is doing the same as SUMMARIZE..BY { branch_name }. Because SQL has a very rigid structure, the branch_name column must be repeated in the SELECT clause.
If you want to COUNT something per value of another selected column (see the SELECT part of the statement), you have to use GROUP BY in order to tell the query what to aggregate over. The GROUP BY clause is used in conjunction with the aggregate functions to group the result set by one or more columns.
Neglecting it will lead to SQL errors in most RDBMS, or senseless results in others.
Useful link:
http://www.w3schools.com/sql/sql_groupby.asp

SQL Counting and Joining

I'm taking a database course this semester, and we're learning SQL. I understand most simple queries, but I'm having some difficulty using the count aggregate function.
I'm supposed to relate an advertisement number to a property number to a branch number so that I can tally up the amount of advertisements by branch number and compute their cost. I set up what I think are two appropriate new views, but I'm clueless as to what to write for the select statement. Am I approaching this the correct way? I have a feeling I'm over complicating this bigtime...
with ad_prop(ad_no, property_no, overseen_by) as
(select a.ad_no, a.property_no, p.overseen_by
from advertisement as a, property as p
where a.property_no = p.property_no)
with prop_branch(property_no, overseen_by, allocated_to) as
(select p.property_no, p.overseen_by, s.allocated_to
from property as p, staff as s
where p.overseen_by = s.staff_no)
select distinct pb.allocated_to as branch_no, count( ??? ) * 100 as ad_cost
from prop_branch as pb, ad_prop as ap
where ap.property_no = pb.property_no
group by branch_no;
Any insight would be greatly appreciated!
You could simplify it like this:
advertisement
- ad_no
- property_no
property
- property_no
- overseen_by
staff
- staff_no
- allocated_to
SELECT s.allocated_to AS branch, COUNT(*) as num_ads, COUNT(*)*100 as ad_cost
FROM advertisement AS a
INNER JOIN property AS p ON a.property_no = p.property_no
INNER JOIN staff AS s ON p.overseen_by = s.staff_no
GROUP BY s.allocated_to;
Update: changed above to match your schema needs
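Here is a minimal sketch of that query run through Python's sqlite3 module. The three tables hold invented rows and only the columns listed above; the cost of 100 per advertisement is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data using only the columns listed above.
    CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, allocated_to TEXT);
    CREATE TABLE property (property_no INTEGER PRIMARY KEY, overseen_by INTEGER);
    CREATE TABLE advertisement (ad_no INTEGER PRIMARY KEY, property_no INTEGER);
    INSERT INTO staff VALUES (10, 'B1'), (11, 'B1'), (12, 'B2');
    INSERT INTO property VALUES (100, 10), (101, 11), (102, 12);
    INSERT INTO advertisement VALUES (1, 100), (2, 100), (3, 101), (4, 102);
""")

# Each advertisement row is joined to its property and then to the staff
# member overseeing it; grouping by branch counts the ads per branch.
branch_rows = conn.execute("""
    SELECT s.allocated_to AS branch,
           COUNT(*)       AS num_ads,
           COUNT(*) * 100 AS ad_cost
    FROM advertisement AS a
    INNER JOIN property AS p ON a.property_no = p.property_no
    INNER JOIN staff    AS s ON p.overseen_by = s.staff_no
    GROUP BY s.allocated_to
    ORDER BY branch
""").fetchall()

for row in branch_rows:
    print(row)
```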
You can condense your WITH clauses into a single statement. Then, the piece I think you are missing is that columns referenced in the column definition have to be aggregated if they aren't included in the GROUP BY clause. So you GROUP BY your distinct column then apply your aggregation and math in your column definitions.
SELECT
s.allocated_to AS branch_no
,COUNT(a.ad_no) AS ad_count
,(COUNT(a.ad_no) * 100) AS ad_cost
...
GROUP BY s.allocated_to
I can tell you that you are making it way too complicated. It should be a select statement with a couple of joins. You should re-read the chapter on joins or take a look at the following link:
http://www.sql-tutorial.net/SQL-JOIN.asp
A join allows you to "combine" the data from two tables based on a common key between the two tables (you can chain more tables together with more joins). Once you have this "joined" table, you can pretend that it is really one table (aliases are used to indicate where that column came from). You understand how aggregates work on a single table right?
I'd prefer not to give you the answer so that you can actually learn :)