What is an aggregate function in SQL?

I have two queries, and both work fine when executed separately:
select distinct
style_ref
from
tbl_Size
where
order_ref='123'
select
sum(quantity)
from
tbl_size
where
order_ref='123'
But if I try to combine them, it does not work:
select distinct
style_ref, sum(quantity)
from
tbl_size
where
order_ref='123'
The following error appears:
Column 'tbl_Size.style_ref' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

An aggregate function is one that combines several records into a single one. In your case, SUM. You're taking the sum, clearly, of more than one row at a time. Another example might be AVG, to get the average of several values.
You can't run aggregate functions, as your error says, alongside ungrouped columns, because that introduces multiple "layers" of data. In one row, you'd have something that described the entire dataset, and you'd have something else that described only a single record. This would be confusing, not to mention inefficient.
Rather than using DISTINCT in your example, you're probably looking to GROUP BY your column:
SELECT style_ref, sum(quantity)
FROM tbl_size
WHERE order_ref='123'
GROUP BY style_ref
This groups the records by their style_ref value and returns the sum of the quantities for each group. Thus, assuming your schema naming is accurate, it tells you the total quantity for each style_ref within order 123.
The above query is equivalent in meaning to the following:
SELECT DISTINCT style_ref, (SELECT SUM(quantity)
FROM tbl_size AS B
WHERE B.order_ref = '123'
AND B.style_ref = tbl_size.style_ref)
FROM tbl_size
WHERE order_ref = '123'
As you can see, the GROUP BY solution is much cleaner and better to use. I included the longer form only because it arguably describes what the query returns in a more readable way. You can see here how the aggregate function (SUM) could be described as working on a separate plane from the style_ref column, so it would be hard to combine the two in a single query without GROUP BY.
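The equivalence claimed above can be checked on a small dataset. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than the asker's SQL Server; the sample rows are made up for illustration, and only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_size (order_ref TEXT, style_ref TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_size VALUES (?, ?, ?)",
    [
        ("123", "A", 5),
        ("123", "A", 7),
        ("123", "B", 2),
        ("456", "A", 100),  # belongs to another order, so it must be ignored
    ],
)

# GROUP BY form: one row per style_ref with its total quantity.
grouped = conn.execute(
    """
    SELECT style_ref, SUM(quantity)
    FROM tbl_size
    WHERE order_ref = '123'
    GROUP BY style_ref
    ORDER BY style_ref
    """
).fetchall()

# Correlated-subquery form: the same totals, computed the long way.
correlated = conn.execute(
    """
    SELECT DISTINCT style_ref,
           (SELECT SUM(quantity)
            FROM tbl_size AS B
            WHERE B.order_ref = '123'
              AND B.style_ref = tbl_size.style_ref)
    FROM tbl_size
    WHERE order_ref = '123'
    ORDER BY style_ref
    """
).fetchall()

print(grouped)
print(correlated)
```

Both queries return one row per style_ref, which is the "one result per group" behaviour the answer describes.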

An aggregate function is a function that returns one result for many rows - like sum in your example.
You can use them in conjunction with the group by clause in order to get one result per group:
select style_ref, sum(quantity)
from tbl_size
where order_ref='123'
group by style_ref

Related

SQL - using 'HAVING' with 'EXISTS' without using 'GROUP BY'

Using 'HAVING' without 'GROUP BY' is not allowed:
SELECT *
FROM products
HAVING unitprice > avg(unitprice)
Column 'products.UnitPrice' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
But when placing the same code under 'EXISTS' - no problems:
SELECT *
FROM products p
WHERE EXISTS (SELECT 1
FROM products
HAVING p.unitprice > avg(unitprice))
Can you please explain why?
Well, the error is clear: in the first query, UnitPrice is neither inside an aggregate nor in a GROUP BY.
In your second query, however, you are comparing p.unitprice from the outer table "products p", which doesn't need to be part of the subquery's aggregation or GROUP BY. Your second query is equivalent to:
select * from products p
where p.unitprice > (select avg(unitprice) FROM products)
which may make it clearer: SQL calculates avg(unitprice) and then compares it with the unitprice column of each product.
HAVING filters after aggregation according to the SQL standard and in most databases.
Without a GROUP BY, there is still aggregation.
But in your case, you simply want a subquery and WHERE:
SELECT p.*
FROM products p
WHERE p.unitprice > (SELECT AVG(p2.unitprice) FROM products p2);
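As a rough illustration of the subquery approach, here is a sketch using Python's sqlite3 module instead of the asker's database. The productname column and the three sample rows are assumptions made up for the example. It also shows that an aggregate without GROUP BY still collapses the table into a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# productname and the sample rows are hypothetical; unitprice comes from the question.
conn.execute("CREATE TABLE products (productname TEXT, unitprice REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Chai", 10.0), ("Chang", 20.0), ("Ikura", 60.0)],
)

# Aggregation without GROUP BY: the whole table is treated as one group.
average = conn.execute("SELECT AVG(unitprice) FROM products").fetchall()

# Compare each row against that single aggregated value.
above_average = conn.execute(
    """
    SELECT p.productname, p.unitprice
    FROM products p
    WHERE p.unitprice > (SELECT AVG(p2.unitprice) FROM products p2)
    ORDER BY p.productname
    """
).fetchall()

print(average)        # a single row holding the average price
print(above_average)  # only the products priced above that average
```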
The problem comes from the columns you select:
SELECT *
and
SELECT 1
Unlike ordinary functions, which are evaluated for each row, aggregate functions are computed once the whole dataset has been processed. This means that in theory (at least without a GROUP BY clause) you can't select both aggregates and regular columns in the same select list, even though some DBMSs tolerate it.
It's easier to see with SUM(). You're not supposed to have access to the total of a column before all rows have been read, which prevents you from writing something like SELECT price, SUM(price), for instance.
GROUP BY lets you group your rows according to a given criterion (in practice, a set of columns), so that the aggregate functions are computed for each group instead of for the whole dataset. Since the columns specified in GROUP BY have the same value throughout a given group, you're allowed to include them in your SELECT list.
This leads us to the actual cause of the failure: in the first query, you select all columns. In the second, you select none, only the constant 1, which is not part of the table itself.

SQL Group By Column Part Number giving the data from most recent received date

I'm new to SQL. My GROUP BY is not working: I want to pull the row with the most recent POReleases.DateReceived date for each part number. Here is what I have:
SELECT POReleases.PONum, POReleases.PartNo, POReleases.JobNo, POReleases.Qty, POReleases.QtyRejected, POReleases.QtyCanceled, POReleases.DueDate, POReleases.DateReceived, PODet.ProdCode, PODet.Unit, PODet.UnitCost, PODet.QtyOrd, PODet.QtyRec, PODet.QtyReject, PODet.QtyCancel
FROM Waples.dbo.PODet PODet, Waples.dbo.POReleases POReleases
WHERE PODet.PartNo = POReleases.PartNo AND PODet.PONum = POReleases.PONum AND ((POReleases.DateReceived>{ts '2010-01-01 00:00:00'}))
GROUP BY PartNo
For starters, every non-aggregated column in the SELECT list must also appear in the GROUP BY clause. In your case only "PartNo" is used in the GROUP BY clause, whereas many other columns are used in the SELECT statement.
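You can try a CTE with ROW_NUMBER() to achieve this:
WITH CTE AS (
-- Number the rows of each part, newest DateReceived first
SELECT *, ROW_NUMBER() OVER (PARTITION BY PartNo ORDER BY DateReceived DESC) AS PartNoCount
FROM TABLENAME
)
-- Keep only the most recent row per part
SELECT * FROM CTE WHERE PartNoCount = 1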
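To see the ROW_NUMBER() idea in action, here is a small sketch run through Python's sqlite3 module. It assumes SQLite 3.25 or later (the first version with window functions), a simplified single-table version of POReleases, and made-up rows with dates stored as ISO strings. Filtering on row number 1 is what picks the most recent row per part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of POReleases with ISO-formatted dates.
conn.execute("CREATE TABLE POReleases (PartNo TEXT, PONum INTEGER, DateReceived TEXT)")
conn.executemany(
    "INSERT INTO POReleases VALUES (?, ?, ?)",
    [
        ("P-1", 1001, "2010-02-01"),
        ("P-1", 1002, "2010-05-15"),
        ("P-2", 1003, "2010-03-10"),
        ("P-2", 1004, "2010-01-20"),
    ],
)

latest_per_part = conn.execute(
    """
    WITH ranked AS (
        SELECT PartNo, PONum, DateReceived,
               ROW_NUMBER() OVER (
                   PARTITION BY PartNo
                   ORDER BY DateReceived DESC
               ) AS rn
        FROM POReleases
    )
    SELECT PartNo, PONum, DateReceived
    FROM ranked
    WHERE rn = 1          -- rn = 1 is the newest row within each PartNo
    ORDER BY PartNo
    """
).fetchall()

print(latest_per_part)
```

Unlike GROUP BY, this keeps every column of the winning row, which is usually what "most recent record per part" means.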
When you write an SQL statement, you should think about the logical flow, which might be technically slightly inaccurate due to optimizations, but still, it is a good thing to think about it like this:
without the from clause specifying the source relation, the filter cannot be evaluated, so at least logically, the from is the first thing to evaluate
without the where clause specifying which records should be kept from the source relation, the filtered records cannot be grouped, so, at least logically, the where precedes the group by
without the group by, specifying the groups, you cannot select values from the groups, so, at least logically, group by precedes select
So, the projection (select) is executed on groups of filtered records. Since the groups have an attribute, namely PartNo, it becomes an aggregated column. The other columns, which were reachable before the group by, can no longer be reached in the select. If you want to reach them, you need to group by them as well or wrap them in aggregate functions: once you have a group by, you can select only aggregated columns, which are either aggregate functions or columns that became aggregated through their presence in the group by.
Since you did not specify how this query is not working, I will have to assume that you have a syntax error in the selection, because you refer to columns which are not aggregated. Also, you might want to use a join instead of a Cartesian product. Finally, if you want to filter the groups rather than the records of the initial relation (which is the result of a Cartesian product in your case), you might consider using a having clause.
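The logical order described above (from, then where, then group by, then having, then select) can be sketched as follows. This is only an illustration using Python's sqlite3 module: the two tables are cut down to a few columns, the rows are invented, and the threshold in HAVING is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical versions of the two tables from the question.
conn.execute("CREATE TABLE PODet (PONum INTEGER, PartNo TEXT, UnitCost REAL)")
conn.execute(
    "CREATE TABLE POReleases (PONum INTEGER, PartNo TEXT, Qty INTEGER, DateReceived TEXT)"
)
conn.executemany(
    "INSERT INTO PODet VALUES (?, ?, ?)",
    [(1, "P-1", 2.0), (2, "P-1", 2.5), (3, "P-2", 10.0)],
)
conn.executemany(
    "INSERT INTO POReleases VALUES (?, ?, ?, ?)",
    [
        (1, "P-1", 10, "2009-12-01"),  # removed by WHERE (too old)
        (1, "P-1", 5, "2010-03-01"),
        (2, "P-1", 20, "2010-04-01"),
        (3, "P-2", 1, "2010-06-01"),   # its group is removed by HAVING
    ],
)

result = conn.execute(
    """
    SELECT r.PartNo,
           SUM(r.Qty)          AS total_qty,
           MAX(r.DateReceived) AS last_received
    FROM POReleases r
    JOIN PODet d                          -- explicit join instead of a comma list
      ON d.PartNo = r.PartNo AND d.PONum = r.PONum
    WHERE r.DateReceived > '2010-01-01'   -- filters rows before grouping
    GROUP BY r.PartNo                     -- one group per part
    HAVING SUM(r.Qty) > 5                 -- filters groups after aggregation
    ORDER BY r.PartNo
    """
).fetchall()

print(result)
```

Every selected column is either the grouping column or wrapped in an aggregate, which is the rule the answer spells out.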

Is there any difference to use group by in a query ?

I have a query
SELECT bk_publisher, bk_price FROM books
GROUP BY bk_price, bk_publisher
and
SELECT bk_publisher ,bk_price FROM books
Both return the same results: I have 12 records in my table, and both queries return all 12. What is the difference? I know GROUP BY is normally used with aggregate functions, but I want to know whether it makes any difference here.
SELECT bk_publisher, bk_price FROM books
GROUP BY bk_price, bk_publisher
This will return the distinct pairs of (publisher, price), even if your table contains duplicated data.
SQL GROUP BY helps you group different results by some identical value (using aggregate functions on the other values).
In your case it doesn't change anything, but when you want to aggregate values based on an identical field, you use GROUP BY.
For example, if you want to get the max price of a publisher:
SELECT bk_publisher, max(bk_price) FROM books
GROUP BY bk_publisher
The GROUP BY statement is used to group the result set by one or more columns. It is used when you have repeating data and you want a single record for each entry.
When you use GROUP BY, multiple rows that have identical values in the columns listed in GROUP BY are squeezed into a single row in the output.
It also means that, in general, all other columns mentioned in the SELECT list must be wrapped in aggregate functions like sum(), avg(), count(), etc.
Some SQL engines like MySQL permit selecting such columns without an aggregate, but many people consider this a bug.
Here the GROUP BY clause appears to have no effect because there is no repeating combination of bk_price, bk_publisher values.
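The difference only shows up once the table contains duplicates. The sketch below uses Python's sqlite3 module with a made-up books table in which one (publisher, price) pair is repeated. It compares the plain query, the GROUP BY query, and the max-price-per-publisher query from the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (bk_publisher TEXT, bk_price INTEGER)")
# Hypothetical rows; ("Acme", 10) appears twice on purpose.
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Acme", 10), ("Acme", 10), ("Acme", 15), ("Zenith", 10)],
)

# Without GROUP BY every row comes back, duplicates included.
plain = conn.execute("SELECT bk_publisher, bk_price FROM books").fetchall()

# Grouping by both columns collapses identical pairs, much like DISTINCT.
grouped = conn.execute(
    """
    SELECT bk_publisher, bk_price
    FROM books
    GROUP BY bk_price, bk_publisher
    ORDER BY bk_publisher, bk_price
    """
).fetchall()

# Grouping by publisher only, with an aggregate on the price.
max_price = conn.execute(
    """
    SELECT bk_publisher, MAX(bk_price)
    FROM books
    GROUP BY bk_publisher
    ORDER BY bk_publisher
    """
).fetchall()

print(len(plain), grouped, max_price)
```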

SQL Nested Where with Sums

I've run into a syntax issue with SQL. What I'm trying to do here is add together all of the amounts paid on each order (PAIDEACH) and then select only those orders whose total is greater than the sum of PAIDEACH for a specific order# (1008). I've tried moving lots of different things around and I'm not having any luck.
This is what I have right now, though I've tried many variations. Running it simply returns an "SQL statement not ended properly" error. Any help you could give would be greatly appreciated. Do I have to use DISTINCT anywhere here?
SELECT ORDER#,
TO_CHAR(SUM(PAIDEACH), '$999.99') AS "Amount > Order 1008"
FROM ORDERITEMS
GROUP BY ORDER#
WHERE TO_CHAR > (SUM (PAIDEACH))
WHERE ORDER# = 1008;
Some versions of SQL regard the hash character (#) as the beginning of a comment. Others use double hyphen (--) and some use both. So, my first thought is that your ORDER# field is named incorrectly (though I can't imagine the engine would let you create a field with that name).
You have two WHERE keywords, which isn't allowed. If you have multiple WHERE conditions, you must link them together using boolean logic, with AND and OR keywords.
You have your WHERE condition after GROUP BY which should be reversed. Specify WHERE conditions before GROUP BY.
One of your WHERE conditions makes no sense: TO_CHAR > (SUM(paideach)). TO_CHAR() is, as far as I know, an Oracle function that converts numeric values to strings according to a specified format, so it needs arguments and cannot be compared on its own. The equivalent in SQL Server is CAST or CONVERT.
I'm guessing that you are trying to write a query that finds orders with amounts exceeding a particular value, but it's not very clear because one of your WHERE conditions specifies that the order number should be 1008, which would presumably only return one record.
The query should probably look more like this:
SELECT order#,
SUM(paideach) AS amount
FROM orderitems
GROUP BY order#
HAVING SUM(paideach) > 999.99;
This would select the orders in the orderitems table whose sum of paideach exceeds 999.99.
I'm not sure how order 1008 fits into things, so you will have to elaborate on that.
Others have commented on some of the things wrong with your query. I'll try to give more explicit hints about what I think you need to do to get the result I think you're looking for.
The problem seems to break into distinct sections, first finding the total for each order which you're close to and I think probably started from:
SELECT ORDER#, SUM(PAIDEACH) AS AMOUNT
FROM ORDERITEMS
GROUP BY ORDER#;
... finding the total for a specific order:
SELECT SUM(PAIDEACH)
FROM ORDERITEMS
WHERE ORDER# = 1008;
... and combining them, which is where you're stuck. The simplest way, and hopefully something you've recently been taught, is to use the HAVING clause, which comes after the GROUP BY and acts as a kind of filter that can be applied to the aggregated columns (which you can't do in the WHERE clause). If you had a fixed amount you could do this:
SELECT ORDER#, SUM(PAIDEACH) AS AMOUNT
FROM ORDERITEMS
GROUP BY ORDER#
HAVING SUM(PAIDEACH) > 5;
(Note that, as @Bridge indicated, you can't use the column alias, AMOUNT, in the HAVING clause; you have to repeat the aggregate function SUM.) But you don't have a fixed value: you want to use the actual total for order 1008, so you need to replace that fixed value with another query. I'll let you take that last step...
I'm not familiar with Oracle, and since it's homework I won't give you the answers, just a few ideas of what I think is wrong.
A SELECT statement should have only one WHERE clause. It can contain more than one condition, of course, joined by logical operators (rows for which the whole expression evaluates to true are included). E.g.: WHERE (column1 > column2) AND (column3 = 100)
GROUP BY clauses should come after WHERE clauses.
You can't refer to columns you've aliased in the select in the where clause of the same statement by their aliased name. For example this won't work:
SELECT column1 as hello
FROM table1
WHERE hello = 1
If there's a GROUP BY, the columns you're selecting should be the ones listed in that clause (or aggregates). This page explains this better than I do.
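For readers who want to check their own attempt, the hints in the answers above can be combined roughly as follows. This is a sketch, not the asker's Oracle schema: it runs on Python's sqlite3 module, renames ORDER# to the hypothetical column order_no to avoid quoting, and uses invented amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# order_no is a stand-in for ORDER#; all rows are made up for illustration.
conn.execute("CREATE TABLE orderitems (order_no INTEGER, paideach REAL)")
conn.executemany(
    "INSERT INTO orderitems VALUES (?, ?)",
    [
        (1007, 10.0), (1007, 5.0),   # total 15.0
        (1008, 20.0), (1008, 5.0),   # total 25.0 (the reference order)
        (1009, 30.0),                # total 30.0
        (1010, 25.0), (1010, 10.0),  # total 35.0
    ],
)

bigger_than_1008 = conn.execute(
    """
    SELECT order_no, SUM(paideach) AS amount
    FROM orderitems
    GROUP BY order_no
    -- One HAVING clause; the aggregate is repeated rather than using the alias,
    -- and the fixed value is replaced by a scalar subquery for order 1008.
    HAVING SUM(paideach) > (SELECT SUM(paideach)
                            FROM orderitems
                            WHERE order_no = 1008)
    ORDER BY order_no
    """
).fetchall()

print(bigger_than_1008)
```

No DISTINCT is needed, because GROUP BY already yields one row per order.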

Select all columns on a group by throws error

I ran a query against the Products table of the Northwind database, like below:
select * from Northwind.dbo.Products GROUP BY CategoryID
and I was hit with an error. I am sure you will be hit by the same error. So what is the correct statement to execute to group all products by their category IDs?
Edit: this link really helped me understand a lot:
http://weblogs.sqlteam.com/jeffs/archive/2007/07/20/but-why-must-that-column-be-contained-in-an-aggregate.aspx
You need to use an Aggregate function and then group by any non-aggregated columns.
I recommend reading up on GROUP BY.
If you're using GROUP BY in a query, every item in your SELECT list must either be contained in an aggregate function, e.g. Sum() or Count(), or be included in the GROUP BY clause.
Because you are using SELECT *, this is equivalent to listing ALL columns in your SELECT.
Therefore, either list them all in the GROUP BY too, use aggregating functions for the rest where possible, or only select the CategoryID.
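The options listed above might look like this in practice. The sketch uses Python's sqlite3 module with a tiny, hypothetical stand-in for the Northwind Products table (four columns, three invented rows). If the goal is a summary per category, aggregate the other columns; if the goal is simply to see the products arranged by category, ORDER BY is enough and no GROUP BY is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tiny stand-in for Northwind's Products table; rows are illustrative only.
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER, ProductName TEXT, "
    "CategoryID INTEGER, UnitPrice REAL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?, ?)",
    [
        (1, "Chai", 1, 18.0),
        (2, "Chang", 1, 19.0),
        (3, "Aniseed Syrup", 2, 10.0),
    ],
)

# Summary per category: every non-grouped column is wrapped in an aggregate.
per_category = conn.execute(
    """
    SELECT CategoryID, COUNT(*) AS product_count, AVG(UnitPrice) AS avg_price
    FROM Products
    GROUP BY CategoryID
    ORDER BY CategoryID
    """
).fetchall()

# Plain listing arranged by category: sorting, not grouping.
listing = conn.execute(
    """
    SELECT CategoryID, ProductName
    FROM Products
    ORDER BY CategoryID, ProductName
    """
).fetchall()

print(per_category)
print(listing)
```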