I'm prepping for an SQL interview and was going over this guide.
The eventual code the author wrote was:
SELECT cust_id,
first_name,
sum(total_order_cost)
FROM customers
JOIN orders ON customers.id = orders.cust_id
GROUP BY cust_id,
first_name
MY QUESTION:
Why is first_name used in the GROUP BY? If I write the code without first_name in the GROUP BY, I get errors.
Thanks in advance.
Unaggregated columns in the SELECT (cust_id and first_name in this case) need to be listed in the GROUP BY. Even in cases like this one, where there's (presumably) only one first_name per cust_id, the DB engine still expects every column in the SELECT to either be in an aggregate function or in the GROUP BY.
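As a rough illustration of that rule, here is a minimal sketch that runs the grouped query against an in-memory SQLite database. The sample rows are invented; only the table and column names come from the question. Note that SQLite is lenient about bare columns and would not reproduce the error the asker saw, so only the fully grouped form is shown.

```python
import sqlite3

# Hypothetical sample data; only the table/column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, total_order_cost INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 50), (11, 1, 25), (12, 2, 40);
""")

# Every unaggregated column in the SELECT also appears in the GROUP BY.
rows = conn.execute("""
    SELECT cust_id, first_name, SUM(total_order_cost) AS total
    FROM customers
    JOIN orders ON customers.id = orders.cust_id
    GROUP BY cust_id, first_name
    ORDER BY cust_id
""").fetchall()

print(rows)
```

Each customer comes back as one row, with the order costs summed per (cust_id, first_name) pair.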
Related
I'm new to SQL and taking Coursera's "SQL for Data Science" course. I have the following question in a summary assignment:
Show the number of orders placed by each customer and sort the result by the number of orders in descending order.
I failed to write the correct code myself; the answer is supposed to be as follows (one of several options, of course):
SELECT *
,COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId
ORDER BY number_of_orders DESC
I am still having trouble understanding the query logic. I would appreciate your assistance in understanding this query.
I seriously hope that Coursera isn't giving you the query you cited above as the recommended answer. It won't run on most databases, and even in cases such as MySQL where it might run, it is not completely correct. You should be using this version:
SELECT CustomerId, COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId
ORDER BY number_of_orders DESC;
A basic rule of GROUP BY is that the only columns available for selection are those which appear in the GROUP BY clause. In addition to these columns, aggregates of any column(s) may also appear in the SELECT. The version I gave you above follows these rules and is ANSI compliant, meaning it should run on essentially any database.
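To see the corrected query produce one row per customer, here is a small sketch using SQLite from Python. The invoice rows are made up for illustration; only the Invoices table and its InvoiceId and CustomerId columns come from the question.

```python
import sqlite3

# Hypothetical invoices: customer 7 has three, customer 9 has two, customer 4 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER);
    INSERT INTO Invoices VALUES (1, 7), (2, 7), (3, 7), (4, 9), (5, 9), (6, 4);
""")

# Only the grouping column and an aggregate appear in the SELECT.
order_counts = conn.execute("""
    SELECT CustomerId, COUNT(InvoiceId) AS number_of_orders
    FROM Invoices
    GROUP BY CustomerId
    ORDER BY number_of_orders DESC
""").fetchall()

print(order_counts)
```

The ORDER BY refers to the alias of the aggregate, so the busiest customer is listed first.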
When you say SELECT *, it means ALL COLUMNS. But you are grouping only by CustomerId, so the other columns are neither grouped nor aggregated, which is invalid in standard SQL.
In the GROUP BY section, specify the other columns that you want to show.
The script should be something like
SELECT CustomerName, DateEntered
,COUNT (InvoiceId) AS number_of_orders
FROM Invoices
GROUP BY CustomerId, CustomerName, DateEntered
ORDER BY number_of_orders DESC
I have the tables:
Product(code (PK), pname, (....), sid (FK)),
Supplier(sid(PK), sname, (....))
The assignment is:
Find Suppliers that supply only one product. Display their name (sname) and product name (pname).
It seems to me like a GROUP BY problem, so I used:
SELECT sid FROM
Product GROUP BY sid
HAVING CAST(COUNT(*) AS INTEGER) = 1;
This query found the list of sids that supply only one product, but now I have run into a problem:
The assignment forbids any form of nested SELECT queries.
The result of the query I have written has only one column. (The sid column)
Thus, I am unable to access the product name, as it is not in the query result table. If I added it to the GROUP BY clause, the grouping would be based on the product name as well, which is unwanted behavior.
How should I approach the problem, then?
Note: I use PostgreSQL
You can phrase the query as:
SELECT s.sid, s.sname, MAX(p.pname) as pname
FROM Product p JOIN
Supplier s
ON p.sid = s.sid
GROUP BY s.sid, s.sname
HAVING COUNT(*) = 1;
You don't need to convert COUNT(*) to an integer. It already returns an integer type (bigint in PostgreSQL).
You could put
max(pname)
in the SELECT list. That's an aggregate, so it would be fine.
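The MAX(pname) trick can be checked with a quick sketch. This one uses SQLite from Python rather than PostgreSQL, and the suppliers and products are invented; only the table and column names come from the question. Because each surviving group contains exactly one product, MAX simply returns that product's name.

```python
import sqlite3

# Hypothetical data: Acme supplies two products, Bolt and Core supply one each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Product (code INTEGER PRIMARY KEY, pname TEXT,
                          sid INTEGER REFERENCES Supplier(sid));
    INSERT INTO Supplier VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Core');
    INSERT INTO Product VALUES (100, 'Anvil', 1), (101, 'Rocket', 1),
                               (102, 'Wrench', 2), (103, 'Gear', 3);
""")

# No nested SELECT: HAVING keeps single-product groups, MAX picks their only pname.
single_product = conn.execute("""
    SELECT s.sid, s.sname, MAX(p.pname) AS pname
    FROM Product p
    JOIN Supplier s ON p.sid = s.sid
    GROUP BY s.sid, s.sname
    HAVING COUNT(*) = 1
    ORDER BY s.sid
""").fetchall()

print(single_product)
```

Acme is filtered out by the HAVING clause because its group has two rows.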
I'm trying to run this query:
select s_name, course from Student group by course;
But I get an error (ORA-00979 Not a GROUP BY EXPRESSION).
I want to list the names of all the students that are in the same course.
Is there another method of doing this? If not, what is the proper way to implement this query? I would appreciate if someone could give me the exact code required.
One variant (Oracle 11g):
select course, listagg(s_name, ', ') within group (order by s_name)
from student
group by course;
Oracle 10g (using the undocumented, unsupported function wm_concat):
select course, wm_concat(s_name)
from student
group by course;
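For readers without an Oracle instance, the same idea of collapsing each course's students into one string can be sketched in SQLite, which has no listagg but offers group_concat as a rough stand-in (a swapped-in function, not the Oracle one). The student rows are invented. Since the order of names inside group_concat is not guaranteed here, the sketch sorts them in Python.

```python
import sqlite3

# Hypothetical students; only the Student table and its columns come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (s_name TEXT, course TEXT);
    INSERT INTO Student VALUES ('Ann', 'Math'), ('Bob', 'Math'), ('Cy', 'Art');
""")

# group_concat is an aggregate, so s_name no longer needs to be in the GROUP BY.
rows = conn.execute("""
    SELECT course, group_concat(s_name, ', ') AS students
    FROM Student
    GROUP BY course
    ORDER BY course
""").fetchall()

# Sort the names in Python because the concatenation order is unspecified.
names_by_course = {course: sorted(students.split(", ")) for course, students in rows}
print(names_by_course)
```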
For what you want you shouldn't use GROUP BY.
The intention of GROUP BY is to summarise information per group.
Since you want detail within each course, you should rather use ORDER BY to ensure that your output is simply sorted with students in the same course listed together.
select s_name, course
from Student
order by course
For an example of what GROUP BY is intended for, try the following:
select course, COUNT(*) as NumStudents
from Student
group by course
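The contrast between the two queries above can be seen side by side in a short sketch, run here against SQLite with invented student rows. An extra sort key (s_name) and an ORDER BY on the summary are added only to make the output order predictable.

```python
import sqlite3

# Hypothetical students; only the Student table and its columns come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (s_name TEXT, course TEXT);
    INSERT INTO Student VALUES ('Ann', 'Math'), ('Bob', 'Math'), ('Cy', 'Art');
""")

# Detail: one row per student, sorted so that classmates appear together.
detail = conn.execute(
    "SELECT s_name, course FROM Student ORDER BY course, s_name"
).fetchall()

# Summary: one row per course, with the students counted.
summary = conn.execute(
    "SELECT course, COUNT(*) AS NumStudents FROM Student GROUP BY course ORDER BY course"
).fetchall()

print(detail)
print(summary)
```

ORDER BY keeps all three student rows, while GROUP BY reduces them to one row per course.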
I'm struggling to understand what this query is doing:
SELECT branch_name, count(distinct customer_name)
FROM depositor, account
WHERE depositor.account_number = account.account_number
GROUP BY branch_name
What's the need of GROUP BY?
You must use GROUP BY in order to use an aggregate function like COUNT in this manner (using an aggregate function to aggregate data corresponding to one or more values within the table).
The query essentially selects distinct branch_names using that column as the grouping column, then within the group it counts the distinct customer_names.
You couldn't use COUNT to get the number of distinct customer_names per branch_name without the GROUP BY clause (at least not with a simple query specification - you can use other means, joins, subqueries etc...).
It's giving you the total number of distinct customers for each branch; GROUP BY defines the groups over which the COUNT function is computed.
It could be written also as:
SELECT branch_name, count(distinct customer_name)
FROM depositor INNER JOIN account
ON depositor.account_number = account.account_number
GROUP BY branch_name
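A small sketch can confirm that the comma-style join and the INNER JOIN form return the same per-branch counts. It uses SQLite from Python with invented rows; only the table and column names come from the question. Ann holds two Downtown accounts, which is why DISTINCT matters.

```python
import sqlite3

# Hypothetical data: Ann has two accounts at Downtown, Bob shares one of them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_number INTEGER PRIMARY KEY, branch_name TEXT);
    CREATE TABLE depositor (customer_name TEXT, account_number INTEGER);
    INSERT INTO account VALUES (1, 'Downtown'), (2, 'Downtown'), (3, 'Uptown');
    INSERT INTO depositor VALUES ('Ann', 1), ('Ann', 2), ('Bob', 2), ('Cy', 3);
""")

implicit_join = conn.execute("""
    SELECT branch_name, COUNT(DISTINCT customer_name)
    FROM depositor, account
    WHERE depositor.account_number = account.account_number
    GROUP BY branch_name
    ORDER BY branch_name
""").fetchall()

explicit_join = conn.execute("""
    SELECT branch_name, COUNT(DISTINCT customer_name)
    FROM depositor INNER JOIN account
      ON depositor.account_number = account.account_number
    GROUP BY branch_name
    ORDER BY branch_name
""").fetchall()

print(implicit_join)
print(explicit_join)
```

Without DISTINCT, Downtown would be counted as 3 because Ann's two accounts would each contribute a row.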
Let's take a step away from SQL for a moment and look at the relational training language Tutorial D.
Because the two relations (tables) are joined on the common attribute (column) name account_number, we can use a natural join:
depositor JOIN account
(Because the result is a relation, which by definition has only distinct tuples (rows), we don't need a DISTINCT keyword.)
Now we just need to aggregate using SUMMARIZE..BY:
SUMMARIZE (depositor JOIN account)
BY { branch_name }
ADD ( COUNT ( customer_name ) AS customer_tally )
Back in SQLland, the GROUP BY branch_name is doing the same as SUMMARIZE..BY { branch_name }. Because SQL has a very rigid structure, the branch_name column must be repeated in the SELECT clause.
If you want to COUNT something per group (see the SELECT part of the statement), you have to use GROUP BY to tell the query what to aggregate over. The GROUP BY clause is used in conjunction with the aggregate functions to group the result set by one or more columns.
Neglecting it will lead to SQL errors in most RDBMS, or senseless results in others.
Useful link:
http://www.w3schools.com/sql/sql_groupby.asp
I am a student and this is homework. I'm getting tired and confused. Any advice will be appreciated.
I have two tables.
Employee has the following columns:
Last_name
First_name
Address
Phone
Job_title(FK)
Wage
Job_title has
job_title(PK)
EEO classification
Job_description
Exempt_Non_Exempt
I need to select the employees’ last names and group them by salary within job titles that are grouped into exempt and non-exempt.
I'm using SQL Server to check my work, but it needs to be hand scripted.
Can you provide sample data? Because it's not clear to me what the data type for JOB_TITLE.exempt_non_exempt is, or what is to be accomplished by the specified grouping criteria - EMPLOYEE.last_name will be mostly unique (but it can't be guaranteed due to the Mr. Smith principle), so it sounds like there's a need for aggregate function use.
Based on what I've read, this looks to be what you're after:
SELECT e.last_name, e.wage, jt.exempt_non_exempt
FROM EMPLOYEE e
JOIN JOB_TITLE jt ON jt.job_title = e.job_title
GROUP BY e.last_name, e.wage, jt.exempt_non_exempt
You join on the foreign/primary key to get valid data from both tables.
The GROUP BY clause is where you define the grouping, but the SQL standard requires that any columns specified in the SELECT clause without being wrapped in an aggregate function (e.g. COUNT/MAX/MIN) also be specified in the GROUP BY.
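As a sketch of what that query returns, the following runs it against SQLite from Python with invented employees and only the columns the query touches. The wage values, the 'Exempt'/'Non-exempt' strings, and the added ORDER BY are assumptions for illustration. Two identical Smith rows are included to show that grouping on every selected column, with no aggregate, collapses duplicates.

```python
import sqlite3

# Hypothetical rows; the exempt_non_exempt values are an assumed encoding.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JOB_TITLE (job_title TEXT PRIMARY KEY, exempt_non_exempt TEXT);
    CREATE TABLE EMPLOYEE (last_name TEXT, wage INTEGER,
                           job_title TEXT REFERENCES JOB_TITLE(job_title));
    INSERT INTO JOB_TITLE VALUES ('Clerk', 'Non-exempt'), ('Manager', 'Exempt');
    INSERT INTO EMPLOYEE VALUES ('Smith', 15, 'Clerk'), ('Smith', 15, 'Clerk'),
                                ('Jones', 40, 'Manager');
""")

grouped = conn.execute("""
    SELECT e.last_name, e.wage, jt.exempt_non_exempt
    FROM EMPLOYEE e
    JOIN JOB_TITLE jt ON jt.job_title = e.job_title
    GROUP BY e.last_name, e.wage, jt.exempt_non_exempt
    ORDER BY jt.exempt_non_exempt, e.wage
""").fetchall()

print(grouped)
```

The two Smith rows come back as a single row, which is the "Mr. Smith" caveat mentioned above in practice.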