SQL: How to use sum in group by

SELECT idteam,
job,
price,
COUNT('X') as INFORMS,
SUM(COUNT('X') * price) as TOTAL
FROM REP
JOIN COSTS ON (job = categ AND to_number(to_char(REP,'YYYY')) = year)
GROUP BY idteam, job, price, TOTAL
ORDER BY IDTEAM;
I don't know why, but if I put TOTAL in the GROUP BY, SQL raises an error: invalid identifier.
I don't know how to resolve that.
Thanks.

The column "TOTAL" is an alias for SUM(COUNT('X') * price).
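It cannot be used as a column identifier in the GROUP BY clause, because "TOTAL" is unknown/not a column at the time of grouping. Writing GROUP BY SUM(COUNT('X') * price) instead does not help either, because an aggregate cannot be a grouping key. Drop TOTAL from the GROUP BY and group only by idteam, job, price.
Some databases let you refer to an alias such as "TOTAL" in a HAVING clause after grouping, but that is not portable.
In any case, the version/type of SQL you are using doesn't allow it.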
Additionally, why are you COUNTing 'X'? That X is a fixed value, and does not depend on any of your columns. If you would like to count each row, just use Count(1) or Count(*). Also, you don't need to SUM a COUNT. A COUNT is already summed.
You should post the structure of both REP and COSTS. Your linked image doesn't have enough info to support the query you wrote.
select
idteam,
-- job, /* not selected since it would need to be grouped*/
sum(price) as theSUM
from REP
join COSTS
on REP.categ = COSTS.job
and COSTS.year = 2016
group by idteam
order by idteam
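A minimal sketch of the fix for the original query, run through Python's sqlite3 module rather than the asker's database. The table layouts, the rep_year column, and the sample rows are assumptions made for illustration, since the real structure of REP and COSTS was not posted. The total is computed as COUNT(*) * price, and the alias is simply left out of the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data: the real REP and COSTS layouts are unknown.
conn.executescript("""
CREATE TABLE REP (idteam INTEGER, job TEXT, rep_year INTEGER);
CREATE TABLE COSTS (categ TEXT, year INTEGER, price REAL);
INSERT INTO REP VALUES (1, 'audit', 2016), (1, 'audit', 2016),
                       (1, 'survey', 2016), (2, 'audit', 2016);
INSERT INTO COSTS VALUES ('audit', 2016, 10.0), ('survey', 2016, 4.0);
""")

# Group only by the plain columns; the aggregate alias stays out of GROUP BY.
rows = conn.execute("""
SELECT idteam, job, price,
       COUNT(*) AS informs,
       COUNT(*) * price AS total
FROM REP
JOIN COSTS ON job = categ AND rep_year = year
GROUP BY idteam, job, price
ORDER BY idteam, job
""").fetchall()

for row in rows:
    print(row)
```

Because price is one of the grouping keys, multiplying it by the per-group count gives the total for each (idteam, job, price) combination without nesting SUM around COUNT.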

Related

How to fix this column doesn't exist error in SQL?

In the sales table, three columns are btl_price, bottle_qty, and total. The total for a transaction should be the product of btl_price and bottle_qty. How many transactions have a value of total that is not equal to btl_price times bottle_qty?
Here is the table:
Here is my code:
sql = """
Select (btl_price*bottle_qty) As total_sale, CAST(total AS money)
From sales
Where total != total_sale
"""
It keeps telling me "column "total_sale" does not exist".
Please help me to identify my mistakes.
PS: I code this in Jupyter Notebook. This is a practice of mine not in any DBMS.
You cannot use columns computed in the SELECT clause in the WHERE clause (in SQL, the latter is evaluated before the former).
Also, you need proper type casting to compare money and numbers.
Finally, you need to use aggregation to count the sales that satisfy the condition.
Assuming that you are using Postgres, that would be:
select count(*)
from sales
where total::numeric <> btl_price::numeric * bottle_qty
Try this:
SELECT *
FROM sales
WHERE total !=(btl_price * bottle_qty)
Good luck
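The same idea as a small runnable sketch, using Python's sqlite3 module since the question was asked from a notebook. The sample rows are made up, and the columns are assumed to be plain numbers, so the money cast from the Postgres answer is not needed here. The computed expression is repeated in the WHERE clause instead of being referenced by its alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; the real sales table has more columns.
conn.executescript("""
CREATE TABLE sales (btl_price REAL, bottle_qty INTEGER, total REAL);
INSERT INTO sales VALUES
    (2.5, 4, 10.0),   -- consistent: 2.5 * 4 = 10.0
    (3.0, 2, 6.0),    -- consistent: 3.0 * 2 = 6.0
    (1.5, 2, 4.0),    -- mismatch: 1.5 * 2 = 3.0
    (10.0, 1, 9.0);   -- mismatch: 10.0 * 1 = 10.0
""")

# Repeat the expression in WHERE rather than relying on the SELECT alias,
# and aggregate with COUNT(*) to get the number of mismatching rows.
mismatches = conn.execute("""
SELECT COUNT(*)
FROM sales
WHERE total <> btl_price * bottle_qty
""").fetchone()[0]

print(mismatches)
```

With real price data, exact comparison of floating-point values can be fragile, which is one reason the Postgres answer casts to numeric first.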

Difference between HAVING and WHERE in SQL

I've seen in other questions that the difference between HAVING and WHERE in SQL is that HAVING is used post-aggregation whereas WHERE is used pre-aggregation. However, I am still unsure about when to use pre-aggregation filtering or post-aggregation filtering.
As a concrete example, why don't these two queries yield the same result (the second sums quantity prematurely in a way that squashes the GROUP BY call)?
Using WHERE to obtain number of condo sales of each real estate agent.
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
WHERE s.houseId = h.houseId AND h.type = "condo"
GROUP BY agentId
ORDER BY total_sales;
Attempted use of HAVING to obtain the same quantity as above.
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
GROUP BY agentId
HAVING s.houseId = h.houseId AND h.type = "condo"
ORDER BY total_sales;
Note: these were written/tested/executed in sqlite3.
The simple way to think about it is to consider the order in which the steps are applied.
Step 1: Where clause filters data
Step 2: Group by is implemented (SUM / MAX / MIN / ETC)
Step 3: Having clause filters the results
So in your 2 examples:
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
WHERE s.houseId = h.houseId AND h.type = "condo"
GROUP BY agentId
ORDER BY total_sales;
Step 1: Filter by HouseId and Condo
Step 2: Add up the results
(total quantity for the sales whose house matches on houseId and is a condo)
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
GROUP BY agentId
HAVING s.houseId = h.houseId AND h.type = "condo"
ORDER BY total_sales;
Step 1: No Filter
Step 2: Add up quantity of all houses
Step 3: Filter the results by houseid and condo.
Hopefully this clears up what is happening.
The easiest way to decide which you should use is:
- Use WHERE to filter the data
- Use HAVING to filter the results of an aggregation (SUM / MAX / MIN / ETC)
WHERE filters rows from the database. Then, if the query has aggregation, aggregation is run based on the aggregate functions and GROUP BY clause in the query. After that point, HAVING is applied to filter the grouping results. The only filtering that HAVING allows is filtering on GROUP BY columns or calculated aggregates.
I must assume that you're using MySQL for your example query since, as other answers have noted, your HAVING clause doesn't make sense and MySQL has some default behaviors which are occasionally problematic and confusing.
First, learn to use proper, explicit, standard JOIN syntax.
Second, your query should look like:
SELECT s.agentId, SUM(s.quantity) as total_sales
FROM sales s JOIN
houses h
ON s.houseId = h.houseId
WHERE h.type = 'condo'
GROUP BY s.agentId
ORDER BY total_sales;
Your version of the query should generate an error in any reasonable database, because the HAVING clause has columns that are neither GROUP BY keys nor aggregation functions.
Additional notes:
The delimiter for a string is single quotes. If you use double quotes, things may not work as you expect.
You should qualify all column references, especially when your query references more than one table.
JOIN conditions belong in the ON clause, not in a WHERE clause.
Filtering on h.type after the aggregation makes no sense. If it did work, the sum() would include non-condos because the filtering is happening too late.
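To make the ordering concrete, here is a small sketch using Python's sqlite3 module with made-up rows (the sample data and the threshold of 2 are assumptions for illustration). The first query filters rows with WHERE before aggregating; the second adds a HAVING clause that filters on the aggregate itself, which is the kind of condition HAVING is meant for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the sales and houses tables.
conn.executescript("""
CREATE TABLE houses (houseId INTEGER, type TEXT);
CREATE TABLE sales (agentId INTEGER, houseId INTEGER, quantity INTEGER);
INSERT INTO houses VALUES (1, 'condo'), (2, 'house'), (3, 'condo');
INSERT INTO sales VALUES (10, 1, 1), (10, 3, 2), (10, 2, 5),
                         (20, 1, 1), (20, 2, 4);
""")

# WHERE removes non-condo rows before the groups are formed.
where_rows = conn.execute("""
SELECT s.agentId, SUM(s.quantity) AS total_sales
FROM sales s
JOIN houses h ON s.houseId = h.houseId
WHERE h.type = 'condo'
GROUP BY s.agentId
ORDER BY total_sales
""").fetchall()

# HAVING then filters the groups, here on the aggregated value.
having_rows = conn.execute("""
SELECT s.agentId, SUM(s.quantity) AS total_sales
FROM sales s
JOIN houses h ON s.houseId = h.houseId
WHERE h.type = 'condo'
GROUP BY s.agentId
HAVING SUM(s.quantity) >= 2
ORDER BY total_sales
""").fetchall()

print(where_rows)
print(having_rows)
```

The non-condo sales (quantities 5 and 4) never reach the SUM because WHERE drops them first; HAVING only decides which finished groups are returned.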

Get the product of two values from two different tables

If anyone can help me figure out where I am going wrong with this SQL, that would be great. Please see my attempt to answer it below. I have answered it how I think it should be answered, but I am very confused by the exam advice below, which says I should use a SUM function. I have googled this and I do not see how a SUM function can help here when I need to get the product of two values. Or am I missing something major?
Question: TotalValue is a column in Order relation that contains derived data representing total value (amount) of each order. Write a SQL SELECT statement that computes a value for this column.
My answer:
SELECT Product.ProductPrice * OrderLine.QuantityOrdered AS Total_Value
FROM Product,
OrderLine
GROUP BY Product;
Advice from exam paper:
This is a straightforward question. Tip: you need to use the SUM function. Also, note that you can take the sum of various records set using the GROUP BY clause.
OK, your question became a lot clearer once I clicked on the hyperlink (blue text).
Each order is going to be made up of a quantity of 1 or more products.
So there could be 3 Product A and 5 Product B etc.
So you have to get the total for each product which is your Price * Quantity, but then you need to add them all together which is where the SUM comes in.
Example:
3 * ProductA Price (e.g. €5) = 15
5 * ProductB Price (e.g. €4) = 20
Total Value = 35
So you need to use the Product, Order and OrderLine tables.
Something like (I haven't tested it):
SELECT SUM(Product.ProductPrice * OrderLine.QuantityOrdered)
FROM Product, Order, OrderLine
WHERE Order.OrderID = OrderLine.OrderID
AND Product.ProductID = OrderLine.ProductID
GROUP BY Order.OrderID
This should return one row containing the TotalValue for each order: the GROUP BY clause causes the SUM to be computed over each group, not over all the rows.
For a single order you would need to add (before the GROUP BY) "AND Order.OrderID = XXXXX" where XXXXX is the actual order's OrderID.
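The worked example above (3 × €5 plus 5 × €4 = 35) can be checked with a short sketch using Python's sqlite3 module. The schema and rows are assumptions based on the column names in the question, and the Order table is left out here because grouping on OrderLine.OrderID is enough for the illustration (ORDER is also a reserved word, so the table name would need quoting).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data reproducing the worked example:
# order 1 has 3 of product A (price 5) and 5 of product B (price 4).
conn.executescript("""
CREATE TABLE Product (ProductID TEXT, ProductPrice INTEGER);
CREATE TABLE OrderLine (OrderID INTEGER, ProductID TEXT, QuantityOrdered INTEGER);
INSERT INTO Product VALUES ('A', 5), ('B', 4);
INSERT INTO OrderLine VALUES (1, 'A', 3), (1, 'B', 5), (2, 'A', 2);
""")

# Multiply price by quantity per line, then SUM the line totals per order.
totals = conn.execute("""
SELECT ol.OrderID,
       SUM(p.ProductPrice * ol.QuantityOrdered) AS TotalValue
FROM OrderLine ol
JOIN Product p ON p.ProductID = ol.ProductID
GROUP BY ol.OrderID
ORDER BY ol.OrderID
""").fetchall()

print(totals)
```

The product alone gives one value per order line; it is the SUM with GROUP BY that collapses those line values into a single total per order.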

Multiple of same result even with group by

Alright so say I have a 'product_catalog', and 'orders' tables. Each order has the product_catalog_id as a foreign key. What I want to return as the query results is the product_code (name of the product associated with a specific product_catalog_id) + a count of how many of each product_code have been ordered. That's easy enough with something like this (Oracle SQL):
SELECT pc.product_code,
COUNT(*) as count
FROM orders o
join product_catalog pc on pc.product_catalog_id = o.product_catalog_id
GROUP BY pc.product_code
ORDER BY count DESC;
but I also want to print various pieces of information from the order table such as total of all monthly charges for that product_code. That would seem easy enough with something like this:
(o.monthly_base_charge*count(*)) as "Monthly Fee"
but the problem is that there have been various monthly fees for the same product_code over time. If I add the above line in and add 'o.monthly_base_charge' to the group by statement, then it will print out a unique row for every variation of pricing for that product_code. How do I get it to ignore those price variations and just add together every entry with that product code?
It is a little unclear what you are asking. My best guess is that you want the sum of the monthly base charge:
SELECT pc.product_code,
COUNT(*) as count,
sum(o.monthly_base_charge) as "Monthly Fee"
FROM orders o join
product_catalog pc
on pc.product_catalog_id = o.product_catalog_id
GROUP BY pc.product_code
ORDER BY count DESC;
I'm not sure if this is exactly what you want. What happens if you have two orders in the same month for the same product?
You may need to do something like this since SQL will not be able to know which monthly base charge to multiply by the count.
SELECT pc.product_code,
COUNT(*) as count,
(min(o.monthly_base_charge)*count(*)) as "Monthly Fee"
FROM orders o
join product_catalog pc on pc.product_catalog_id = o.product_catalog_id
GROUP BY pc.product_code
ORDER BY count DESC;
Or you will need to add o.monthly_base_charge to the group by in order for sql to know how to determine the count()
GROUP BY pc.product_code, o.monthly_base_charge
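A small sketch of the SUM-based answer, using Python's sqlite3 module instead of Oracle, with made-up rows in which the same product code was sold at different monthly charges. The table contents and the order_count alias are assumptions for illustration. Because monthly_base_charge is aggregated rather than grouped on, each product code still produces a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: BASIC was sold at two different monthly charges.
conn.executescript("""
CREATE TABLE product_catalog (product_catalog_id INTEGER, product_code TEXT);
CREATE TABLE orders (order_id INTEGER, product_catalog_id INTEGER,
                     monthly_base_charge INTEGER);
INSERT INTO product_catalog VALUES (1, 'BASIC'), (2, 'PRO');
INSERT INTO orders VALUES (100, 1, 10), (101, 1, 12), (102, 1, 10), (103, 2, 30);
""")

# SUM adds up every order's charge, so price variations do not split the group.
rows = conn.execute("""
SELECT pc.product_code,
       COUNT(*) AS order_count,
       SUM(o.monthly_base_charge) AS monthly_fee
FROM orders o
JOIN product_catalog pc ON pc.product_catalog_id = o.product_catalog_id
GROUP BY pc.product_code
ORDER BY order_count DESC
""").fetchall()

print(rows)
```

Grouping on o.monthly_base_charge as well would instead return separate BASIC rows for the 10 and 12 charges, which is the behavior the question wanted to avoid.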

Total Count in Grouped TSQL Query

I have a performance-heavy query that filters out many unwanted records based on data in other tables etc.
I am averaging a column, and also returning the count for each average group. This is all working fine.
However, I would also like to include the percentage of the TOTAL count.
Is there any way of getting this total count without rerunning the whole query, or increasing the performance load significantly?
I would also prefer if I didn't need to completely restructure the sub query (e.g. by getting the total count outside of it), but can do if necessary.
SELECT
data.EquipmentId,
AVG(MeasureValue) AS AverageValue,
COUNT(data.*) AS BinCount
COUNT(data.*)/ ???TotalCount??? AS BinCountPercentage
FROM
(SELECT * FROM MultipleTablesWithJoins) data
GROUP BY data.EquipmentId
See Window functions.
SELECT
data.EquipmentId,
AVG(MeasureValue) AS AverageValue,
COUNT(*) AS BinCount,
COUNT(*)/ cast (cnt as float) AS BinCountPercentage
FROM
(SELECT *,
-- Here is total count of records
count(*) over() cnt
FROM MultipleTablesWithJoins) data
GROUP BY data.EquipmentId, cnt
EDIT: forgot to actually divide the numbers.
Another approach:
with data as
(
SELECT * FROM MultipleTablesWithJoins
)
,grand as
(
select count(*) as cnt from data
)
SELECT
data.EquipmentId,
AVG(MeasureValue) AS AverageValue,
COUNT(*) AS BinCount,
COUNT(*) / cast(grand.cnt as float) AS BinCountPercentage
FROM data cross join grand
GROUP BY data.EquipmentId, grand.cnt
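The CTE approach can be tried out with a short sketch using Python's sqlite3 module rather than SQL Server. The measurements table and its rows are hypothetical stand-ins for MultipleTablesWithJoins, and multiplying by 1.0 is used here in place of the cast to float to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the expensive filtered subquery.
conn.executescript("""
CREATE TABLE measurements (EquipmentId INTEGER, MeasureValue INTEGER);
INSERT INTO measurements VALUES (1, 10), (1, 20), (1, 30), (2, 40);
""")

# 'data' is written once; 'grand' counts its rows, and the cross join makes
# that single total available to every group.
rows = conn.execute("""
WITH data AS (
    SELECT EquipmentId, MeasureValue FROM measurements
),
grand AS (
    SELECT COUNT(*) AS cnt FROM data
)
SELECT data.EquipmentId,
       AVG(data.MeasureValue) AS AverageValue,
       COUNT(*) AS BinCount,
       COUNT(*) * 1.0 / grand.cnt AS BinCountPercentage
FROM data
CROSS JOIN grand
GROUP BY data.EquipmentId, grand.cnt
ORDER BY data.EquipmentId
""").fetchall()

print(rows)
```

Whether the database evaluates the data CTE once or twice depends on its optimizer, so the performance benefit should be checked against the actual query plan.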