SQL query involving group by and joins

I couldn't come up with a more specific title, but I want to do something that is a little complex for me. I thought I had it working, but it turned out to be buggy.
I have three tables, as follows:
ProjectTable
idProject
title
idOwner
OfferTable
idOffer
idProject
idAccount
AccountTable
idAccount
Username
Now, in one query, I aim to list the projects with the most offers made, and in the same query I also want to get details such as the username of the owner, the username of the offerer*, etc., so that I don't have to query again for each project.
Here is my broken query. It's my first experiment with GROUP BY, and I probably didn't quite get it.
SELECT Project.addDate,Project.idOwner ,Account.Username,Project.idProject,
Project.Price,COUNT(Project.idProject) as offercount
FROM Project
INNER JOIN Offer
ON Project.idProject= Offer.idProject
INNER JOIN Account
ON Account.idAccount = Project.idOwner
GROUP BY Project.addDate,Project.idOwner,
Account.Username,Project.idProject,Project.Price
ORDER BY addDate DESC
*: I wrote that without thinking; I was just trying to come up with an example of extra information. As Hosam Aly pointed out, it is meaningless.

Try this (modified for projects with no offers):
SELECT
Project.addDate,
Project.idOwner,
Account.Username,
Project.idProject,
Project.Price,
ISNULL(q.offercount, 0) AS offercount
FROM
(
SELECT
o.idProject,
COUNT(o.idProject) as offercount
FROM Offer o
GROUP BY o.idProject
) AS q
RIGHT JOIN Project ON Project.idProject = q.idProject
INNER JOIN Account ON Account.idAccount = Project.idOwner
ORDER BY addDate DESC

I might switch the query slightly to this:
select p.addDate,
p.idOwner,
a.Username,
p.idProject,
p.price,
o.OfferCount
from project p
left join
(
select count(*) OfferCount, idproject
from offer
group by idproject
) o
on p.idproject = o.idproject
left join account a
on p.idowner = a.idaccount
This way, the count is computed per idproject alone, not across all of the other fields you are grouping by. I am also using a LEFT JOIN so that a project is still returned when it has no matching rows in the other tables; in that case OfferCount is NULL rather than 0.
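A minimal sketch of the pre-aggregation approach from the two answers above, run against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained. The table and column names follow the question; the sample rows are invented, and COALESCE stands in for ISNULL, which is specific to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (idAccount INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE Project (idProject INTEGER PRIMARY KEY, title TEXT, idOwner INTEGER);
    CREATE TABLE Offer   (idOffer INTEGER PRIMARY KEY, idProject INTEGER, idAccount INTEGER);

    -- Invented sample data: project 1 has two offers, project 2 one, project 3 none.
    INSERT INTO Account VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO Project VALUES (1, 'Website', 1), (2, 'Logo', 2), (3, 'App', 1);
    INSERT INTO Offer   VALUES (1, 1, 2), (2, 1, 3), (3, 2, 3);
""")

rows = conn.execute("""
    SELECT p.idProject, p.title, a.Username,
           COALESCE(o.OfferCount, 0) AS OfferCount   -- 0 for projects without offers
    FROM Project p
    LEFT JOIN (
        -- Count offers per project before joining anything else.
        SELECT idProject, COUNT(*) AS OfferCount
        FROM Offer
        GROUP BY idProject
    ) o ON o.idProject = p.idProject
    INNER JOIN Account a ON a.idAccount = p.idOwner
    ORDER BY OfferCount DESC, p.idProject
""").fetchall()

for row in rows:
    print(row)
```

Because the count is computed in its own derived table, the outer query needs no GROUP BY at all, and the project with no offers still appears with a count of 0.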

Your question is a bit vague, but here are some pointers:
To list the projects "with most offers made", ORDER BY offercount.
You're essentially querying for projects, so you should GROUP BY Project.idProject first before the other fields.
You're querying for the number of offers made on each project, yet you ask about offer details. It doesn't really make sense (syntax-wise) to ask for the two pieces of information together. If you want to get the total number of offers, repeated in every record of the result, along with offer information, you'll have to use an inner query for that.
An inner query can be made either in the FROM clause, as suggested by other answers, or directly in the SELECT clause, like so:
SELECT Project.idProject,
(SELECT COUNT(Offer.idOffer)
FROM Offer
WHERE Offer.idProject = Project.idProject
) AS OfferCount
FROM Project
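The third pointer above (offer details together with a per-project total) can be sketched as follows. This is only an illustration on invented data, using SQLite through Python's sqlite3 module; the schema follows the question, and the alias o2 is introduced here for the inner query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (idAccount INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE Project (idProject INTEGER PRIMARY KEY, title TEXT, idOwner INTEGER);
    CREATE TABLE Offer   (idOffer INTEGER PRIMARY KEY, idProject INTEGER, idAccount INTEGER);

    -- Invented sample data.
    INSERT INTO Account VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO Project VALUES (1, 'Website', 1), (2, 'Logo', 2), (3, 'App', 1);
    INSERT INTO Offer   VALUES (1, 1, 2), (2, 1, 3), (3, 2, 3);
""")

# One row per offer, with the offerer's username and the project's total
# offer count repeated on every row via a correlated subquery.
offer_rows = conn.execute("""
    SELECT p.idProject,
           a.Username AS offerer,
           (SELECT COUNT(*)
            FROM Offer o2
            WHERE o2.idProject = p.idProject) AS OfferCount
    FROM Project p
    INNER JOIN Offer o   ON o.idProject = p.idProject
    INNER JOIN Account a ON a.idAccount = o.idAccount
    ORDER BY OfferCount DESC, p.idProject, o.idOffer
""").fetchall()

for row in offer_rows:
    print(row)
```

Note that the inner join to Offer drops projects without any offers here, which is reasonable when each row describes an offer.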

Related

Designing query to select count from different tables

This is the schema for my questions
Hi, I don't have experience with SQL Developer, and I'm trying to build a query for the following question:
For each DVD in the catalog, I need to display the title, length, release_date, and how many times it has been checked out by all customers across all libraries.
I also want to include DVDs that have not been checked out yet, displaying 0 for them, and to sort the results by title.
So far I have the following query, but I'm stuck here:
--Question C. ************* VERIFY
Select
Catalog_Item.Title,
DVD.Length,
Catalog_Item.Release_Date,
(
Select
Count(Transaction.Transaction_ID)
From Transaction
Where
DVD.Catalog_Item_ID = Physical_Item.Catalog_Item_ID
And Physical_Item.Physical_Item_ID = Transaction.Physical_Item_ID
) as "Total_DVD"
From
Catalog_Item,DVD,
Physical_Item
Group by
Catalog_Item.Title,
DVD.Length,
Catalog_Item.Release_Date
If I run this exact query, I get the error
Not a Group By Expression
And if I exclude the GROUP BY, I get results, but they don't look like the correct output.
Any suggestions on what syntax I can use to achieve the desired output? Thanks!
You put three tables in the query but did not join them. Without join conditions, you will get many duplicated rows.
Also, the conditions in your subquery were wrong; I assume you tried to put there the join conditions that are missing from the main query.
I believe you need something like this:
Select
CI.Title
,DVD.Length
,CI.Release_Date
,NVL(TR.TotalTransactions,0) TotalTransactions
From Catalog_Item CI
INNER JOIN DVD ON DVD.Catalog_Item_ID = CI.Catalog_Item_ID
LEFT JOIN Physical_Item PHI ON CI.Catalog_Item_ID = PHI.Catalog_Item_ID
LEFT JOIN (SELECT Physical_Item_ID
, Count(Transaction_ID) TotalTransactions
FROM Transaction
GROUP BY Physical_Item_ID
) TR ON PHI.Physical_Item_ID = TR.Physical_Item_ID
For a start, join Catalog_Item, Physical_Item and DVD together. Without appropriate join conditions, these three tables will join using a cartesian product join - which is probably one of the reasons why you are seeing unexpected results.
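As a sketch of how the joins and the count can fit together so that each DVD appears once, here is one possible variant run on SQLite through Python's sqlite3 module. The column names are taken from the question, but the data is invented, and it is assumed that every row in Transaction represents one checkout. The table name is quoted because TRANSACTION is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Catalog_Item  (Catalog_Item_ID INTEGER PRIMARY KEY, Title TEXT, Release_Date TEXT);
    CREATE TABLE DVD           (Catalog_Item_ID INTEGER PRIMARY KEY, Length INTEGER);
    CREATE TABLE Physical_Item (Physical_Item_ID INTEGER PRIMARY KEY, Catalog_Item_ID INTEGER);
    CREATE TABLE "Transaction" (Transaction_ID INTEGER PRIMARY KEY, Physical_Item_ID INTEGER);

    -- Invented data: item 3 is not a DVD; 'Brazil' has a copy but no checkouts.
    INSERT INTO Catalog_Item  VALUES (1, 'Alien', '1979-05-25'), (2, 'Brazil', '1985-02-20'),
                                     (3, 'Some Book', '2001-01-01');
    INSERT INTO DVD           VALUES (1, 117), (2, 132);
    INSERT INTO Physical_Item VALUES (10, 1), (11, 1), (20, 2), (30, 3);
    INSERT INTO "Transaction" VALUES (100, 10), (101, 10), (102, 11), (103, 30);
""")

dvd_rows = conn.execute("""
    SELECT ci.Title, d.Length, ci.Release_Date,
           COUNT(t.Transaction_ID) AS Total_Checkouts  -- counts only non-NULL matches
    FROM Catalog_Item ci
    INNER JOIN DVD d            ON d.Catalog_Item_ID = ci.Catalog_Item_ID
    LEFT JOIN Physical_Item ph  ON ph.Catalog_Item_ID = ci.Catalog_Item_ID
    LEFT JOIN "Transaction" t   ON t.Physical_Item_ID = ph.Physical_Item_ID
    GROUP BY ci.Catalog_Item_ID, ci.Title, d.Length, ci.Release_Date
    ORDER BY ci.Title
""").fetchall()

for row in dvd_rows:
    print(row)
```

Counting t.Transaction_ID rather than * is what makes a DVD with no checkouts show 0: the LEFT JOIN produces a NULL there, and COUNT ignores NULLs.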

Using COUNT (DISTINCT..) when also using INNER JOIN to join 3 tables but Postgres keeps erroring

I need to use INNER JOINs to get a series of information and then I need to COUNT this info. I need to be able to "View all courses and the instructor taking them, the capacity of the course, and the number of members currently booked on the course."
To get all the info I have done the following query:
SELECT
C.coursename, Instructors.fname, Instructors.lname,C.maxNo, membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID;
but no matter where I put the COUNT it always tells me that whatever is outside the COUNT should be in the GROUP BY
I have worked out how to COUNT/GROUP BY the necessary info e.g.:
SELECT courseID, COUNT (DISTINCT MC.memno)
FROM Membercourse AS MC
GROUP BY MC.courseID;
but I don't know how to combine the two!
I think what you're looking for is a subquery. I'm a SQL-Server guy (not postgresql) but the concept looks to be almost identical after some crash-course postgresql googling.
Anyway, basically, when you write a SELECT statement, you can use a subquery instead of an actual table. So your SQL would look something like:
select count(*)
from
(
select stuff from table
inner join someOtherTable on someCondition
) as mySubQuery
... hopefully that makes sense. Instead of trying to write one big query where you're doing both the inner join and count, you're writing two: an inner one that gets your inner-join'ed data, and then an outer one to actually count the rows.
EDIT: To help explain a bit more on the thought process behind subqueries.
Subqueries are a way of logically breaking down the steps/processes on the data. Instead of trying to do everything in one big step, you do it in steps.
In this case, what's step one? It's to get a combined data source for your combined, inner-join'ed data.
Step 1: Write the Inner Join query
SELECT
C.coursename, Instructors.fname, Instructors.lname,C.maxNo,
membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID;
Okay, now, what next?
Well, let's say we want to get a count of how many entries there are for each 'memno' in that result above.
Instead of trying to figure out how to modify that query above, we instead use it as a data source, like it was a table itself.
Step 2 - Make it A Subquery
select * from
(
SELECT
C.coursename, Instructors.fname, Instructors.lname,C.maxNo,
membercourse.memno
FROM Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo
INNER JOIN Membercourse ON C.courseID = Membercourse.courseID
) mySubQuery
Step 3 - Modify your outer query to get the data you want.
Well, we wanted to group by 'memno', and get the count, right? So...
select memno, count(*)
from
(
-- all that same subquery stuff
) mySubQuery
group by memno
... make sense? Once you've got your subquery written out, you don't need to worry about it any more - you just treat it like a table you're working with.
This is actually incredibly important, and makes it much easier to read more intricate queries - especially since you can name your subqueries in a way that explains what the subquery represents data-wise.
There are many ways to solve this, such as using window functions. But you can also achieve it with a simple subquery:
SELECT
C.coursename,
Instructors.fname,
Instructors.lname,
C.maxNo,
(SELECT
COUNT(*)
FROM
membercourse
WHERE
C.courseID = Membercourse.courseID) AS members
FROM
Courses AS C
INNER JOIN Instructors ON C.instructorNo = Instructors.instructorNo;
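Another way to combine the asker's two queries is to keep the GROUP BY query as a derived table and join it to the course list. The sketch below assumes the column names from the question and uses invented rows, executed on SQLite through Python's sqlite3 module rather than on Postgres; the alias m and the column name members are introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Instructors  (instructorNo INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE Courses      (courseID INTEGER PRIMARY KEY, coursename TEXT,
                               instructorNo INTEGER, maxNo INTEGER);
    CREATE TABLE Membercourse (courseID INTEGER, memno INTEGER);

    -- Invented data: member 100 is booked twice on Yoga; Spin has no bookings.
    INSERT INTO Instructors  VALUES (1, 'Ann', 'Ray'), (2, 'Bo', 'Lee');
    INSERT INTO Courses      VALUES (1, 'Yoga', 1, 10), (2, 'Spin', 2, 8);
    INSERT INTO Membercourse VALUES (1, 100), (1, 101), (1, 100);
""")

course_rows = conn.execute("""
    SELECT c.coursename, i.fname, i.lname, c.maxNo,
           COALESCE(m.members, 0) AS members
    FROM Courses AS c
    INNER JOIN Instructors AS i ON c.instructorNo = i.instructorNo
    LEFT JOIN (
        -- The asker's working GROUP BY query, used as a data source.
        SELECT courseID, COUNT(DISTINCT memno) AS members
        FROM Membercourse
        GROUP BY courseID
    ) AS m ON m.courseID = c.courseID
    ORDER BY c.coursename
""").fetchall()

for row in course_rows:
    print(row)
```

The LEFT JOIN keeps courses that nobody has booked yet, and COUNT(DISTINCT memno) counts a member once even if a booking row is duplicated.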

Specifying SELECT, then joining with another table

I just hit a wall with my SQL query fetching data from my MS SQL Server.
To simplify, say I have one table for sales and one table for customers. They each have a corresponding userId which I can use to join the tables.
I wish to first SELECT from the sales table where, say, price is equal to 10, and then join it on the userId in order to get access to the name, address, etc. from the customer table.
In which order should I structure the query? Do I need some sort of subquery, or what do I do?
I have tried something like this
SELECT *
FROM Sales
WHERE price = 10
INNER JOIN Customers
ON Sales.userId = Customers.userId;
Needless to say this is very simplified and not my database schema, yet it explains my problem simply.
Any suggestions ? I am at a loss here.
The components of a SELECT statement have a fixed order.
In the simple form this is:
What do I select: column list
From where: table name and joined tables
Are there filters: WHERE
How to sort: ORDER BY
So most likely it is enough to change your statement to
SELECT *
FROM Sales
INNER JOIN Customers ON Sales.userId = Customers.userId
WHERE price = 10;
The WHERE clause must follow the joins:
SELECT * FROM Sales
INNER JOIN Customers
ON Sales.userId = Customers.userId
WHERE price = 10
This is simply the way SQL syntax works. You seem to be trying to put the clauses in the order that you think they should be applied, but SQL is a declarative language, not a procedural one: you are defining what you want to occur, not how it will be done.
You could also write the same thing like this:
SELECT * FROM (
SELECT * FROM Sales WHERE price = 10
) AS filteredSales
INNER JOIN Customers
ON filteredSales.userId = Customers.userId
This may seem to indicate a different order for the operations, but it is logically identical to the first query, and in either case the database engine may choose to perform the join and the filtering in either order, as long as the result is identical.
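A quick way to convince yourself that the two forms return the same rows is to run both on a small data set. The sketch below does that on SQLite through Python's sqlite3 module rather than on SQL Server; the saleId and name columns and all rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (userId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Sales     (saleId INTEGER PRIMARY KEY, userId INTEGER, price INTEGER);

    -- Invented data: sale 4 belongs to a user with no customer row.
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Sales     VALUES (1, 1, 10), (2, 2, 20), (3, 1, 10), (4, 3, 10);
""")

# Form 1: join, then filter in WHERE.
join_then_filter = conn.execute("""
    SELECT s.saleId, c.name
    FROM Sales s
    INNER JOIN Customers c ON s.userId = c.userId
    WHERE s.price = 10
    ORDER BY s.saleId
""").fetchall()

# Form 2: filter inside a derived table, then join.
filter_then_join = conn.execute("""
    SELECT f.saleId, c.name
    FROM (SELECT * FROM Sales WHERE price = 10) AS f
    INNER JOIN Customers c ON f.userId = c.userId
    ORDER BY f.saleId
""").fetchall()

print(join_then_filter)
print(filter_then_join)
```

Both queries describe the same result set, so the choice between them is mostly about readability.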
Sounds fine to me, did you run the query and check?
SELECT s.*, c.*
FROM Sales s
INNER JOIN Customers c
ON s.userId = c.userId
WHERE s.price = 10;

SQL count - first time

I am learning SQL (bit by bit!) and trying to run a query on our database that uses a count function, inside an inner join query, to show the total number of orders for each customer id.
Somehow the count function is pooling all the data together onto one customer, though.
Can someone please suggest where I am going wrong?
SELECT tbl_customers.*, tbl_stateprov.stprv_Name, tbl_custstate.CustSt_Destination, COUNT(order_id) as total
FROM tbl_stateprov
INNER JOIN (tbl_customers
INNER JOIN (tbl_custstate
INNER JOIN tbl_orders ON tbl_orders.order_CustomerID = tbl_custstate.CustSt_Cust_ID)
ON tbl_customers.cst_ID = tbl_custstate.CustSt_Cust_ID)
ON tbl_stateprov.stprv_ID = tbl_custstate.CustSt_StPrv_ID
WHERE tbl_custstate.CustSt_Destination='BillTo'
AND cst_LastName LIKE '#URL.Alpha#%'
You need a GROUP BY clause in this statement in order to get what you want. You need to figure out what level you want to group it by in order to select which fields to add to the group by clause. If you just wanted to see it on a per customer basis, and the customers table had an id field, it would look like this (at the very end of your sql):
GROUP BY tbl_customers.id
Now you can certainly group by more fields, it just depends how you want to slice the results.
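To illustrate why the count "pools" everything onto one row without GROUP BY, here is a small sketch on invented data using SQLite through Python's sqlite3 module. Only a few columns from the question are modelled. SQLite accepts the ungrouped query and returns a single row; many other engines reject it with an error instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_customers (cst_ID INTEGER PRIMARY KEY, cst_LastName TEXT);
    CREATE TABLE tbl_orders    (order_id INTEGER PRIMARY KEY, order_CustomerID INTEGER);

    -- Invented data: Adams has two orders, Baker has one.
    INSERT INTO tbl_customers VALUES (1, 'Adams'), (2, 'Baker');
    INSERT INTO tbl_orders    VALUES (1, 1), (2, 1), (3, 2);
""")

# Without GROUP BY the aggregate collapses the whole join into one row.
pooled = conn.execute("""
    SELECT c.cst_LastName, COUNT(o.order_id) AS total
    FROM tbl_customers c
    INNER JOIN tbl_orders o ON o.order_CustomerID = c.cst_ID
""").fetchall()

# With GROUP BY there is one row, and one count, per customer.
grouped = conn.execute("""
    SELECT c.cst_ID, c.cst_LastName, COUNT(o.order_id) AS total
    FROM tbl_customers c
    INNER JOIN tbl_orders o ON o.order_CustomerID = c.cst_ID
    GROUP BY c.cst_ID, c.cst_LastName
    ORDER BY c.cst_ID
""").fetchall()

print(pooled)
print(grouped)
```

In the pooled result the count covers all orders, and the last name shown next to it is just one arbitrary customer's.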
In your select statement you are using format like tableName.ColumnName but not for COUNT(order_id)
It should be COUNT(tableOrAlias.order_id)
Hope that helps.
As you are new to SQL it might also be worth considering the readability of your joins - the nested / bracketed joins you mentioned above are quite hard to read, and I would also personally alias your tables to make the query that bit more accessible:
SELECT
customer.cst_ID
,statep.stprv_Name
,state.CustSt_Destination
,COUNT(orders.order_id) as total
FROM tbl_stateprov statep
INNER JOIN tbl_custstate state ON statep.stprv_ID = state.CustSt_StPrv_ID
INNER JOIN tbl_customers customer ON customer.cst_ID = state.CustSt_Cust_ID
INNER JOIN tbl_orders orders ON orders.order_CustomerID = state.CustSt_Cust_ID
WHERE state.CustSt_Destination='BillTo'
AND customer.cst_LastName LIKE '#URL.Alpha#%'
GROUP BY
customer.cst_ID
,statep.stprv_Name
,state.CustSt_Destination
--And any other columns you want to include the count for

SQL primer: Why and how to use join statements?

I have a MySQL database that looks like this:
users
( id , name )
groups
( id , name )
group_users
( id , group_id , user_id )
Now to look up all the groups that a user belongs to, I would do something like this:
select * from `group_users` where `user_id` = 47;
This will probably return something like:
( 1 , 3 , 47 ),
( 2 , 4 , 47 ),
But when I want to display this to the user, I'm going to want to display the name of the groups that they belong to instead of the group_id. In a loop, I could fetch each group with the group_id that was returned, but that seems like the wrong way to do it. Is this a case where a join statement should be used? Which type of join should I use and how would I use it?
In general, you want to reduce the total number of queries to the database, by making each query do more. There are many reasons why this is a good thing, but the main one is that relational database management systems are specifically designed to be able to join tables quickly and are better at it than the equivalent code in some other language. Another reason is that it's usually more expensive to open many little queries than it is to run one large query that has everything you'll end up needing.
You want to take advantage of your RDBMS's strengths, so you should try to push data access into it in a few big queries rather than lots of little queries.
Now, that's just a general rule of thumb. There are cases when it's better to do some things outside of the database. It's important that you determine which is the right case for your situation by looking into bottlenecks if and only if they occur. Don't spend time worrying about performance until you find a performance problem.
But, in general, it's better to handle joins, lookups and all other query-related tasks in the database itself than it is to try to handle it in a general-purpose language.
That said, the kind of join you want is an inner join. You'd structure your join query like this:
SELECT groups.name, group_users.user_id
FROM group_users
INNER JOIN groups
ON group_users.group_id = groups.id
WHERE group_users.user_id = 47;
I always find these charts to be very useful when doing joins:
https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
SELECT gu.id, gu.group_id, g.name, gu.user_id
FROM group_users gu
INNER JOIN groups g
ON gu.group_id = g.id
WHERE gu.user_id = 47
That's a typical case for an inner join, such as:
select users.name, groups.name from groups
inner join group_users on groups.id = group_users.group_id
inner join users on group_users.user_id = users.id
where group_users.user_id = 47
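To make the "one join instead of a loop of lookups" point concrete, here is a small sketch using the schema from the question with invented rows. It runs on SQLite through Python's sqlite3 module rather than on MySQL, and the table name groups is quoted because GROUPS is a keyword in some engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "groups"    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE group_users (id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER);

    -- Invented data: user 47 belongs to groups 3 and 4.
    INSERT INTO users VALUES (12, 'eli'), (47, 'dana');
    INSERT INTO "groups" VALUES (3, 'admins'), (4, 'editors'), (5, 'guests');
    INSERT INTO group_users VALUES (1, 3, 47), (2, 4, 47), (3, 5, 12);
""")

# One query: the database resolves the group names through the join.
joined = [name for (name,) in conn.execute("""
    SELECT g.name
    FROM group_users gu
    INNER JOIN "groups" g ON g.id = gu.group_id
    WHERE gu.user_id = ?
    ORDER BY g.name
""", (47,))]

# Loop alternative: one query for the memberships plus one query per group.
looped = []
memberships = conn.execute(
    "SELECT group_id FROM group_users WHERE user_id = ?", (47,)).fetchall()
for (group_id,) in memberships:
    (name,) = conn.execute(
        'SELECT name FROM "groups" WHERE id = ?', (group_id,)).fetchone()
    looped.append(name)

print(joined)
print(sorted(looped))
```

Both approaches produce the same names, but the loop issues one extra query per membership, which is the cost the join avoids.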