SQL Nested Where with Sums - sql

I've run into a syntax issue with SQL. What I'm trying to do here is add together all of the amounts paid on each order (paid each) an then only select those that are greater than sum of of paid each for a specific order# (1008). I've been trying to move around lots of different things here and I'm not having any luck.
This is what I have right now, though I've had many different things. Trying to use this simply returns an SQL statement not ended properly error. Any help you guys could give would be greatly appreciated. Do I have to use DISTINCT anywhere here?
SELECT ORDER#,
TO_CHAR(SUM(PAIDEACH), '$999.99') AS "Amount > Order 1008"
FROM ORDERITEMS
GROUP BY ORDER#
WHERE TO_CHAR > (SUM (PAIDEACH))
WHERE ORDER# = 1008;

Some versions of SQL regard the hash character (#) as the beginning of a comment. Others use double hyphen (--) and some use both. So, my first thought is that your ORDER# field is named incorrectly (though I can't imagine the engine would let you create a field with that name).
You have two WHERE keywords, which isn't allowed. If you have multiple WHERE conditions, you must link them together using boolean logic, with AND and OR keywords.
You have your WHERE condition after GROUP BY which should be reversed. Specify WHERE conditions before GROUP BY.
One of your WHERE conditions makes no sense. TO_CHAR > (SUM(paideach)): TO_CHAR() is a function which as far as I know is an Oracle function that converts numeric values to strings according to a specified format. The equivalent in SQL Server is CAST or CONVERT.
I'm guessing that you are trying to write a query that finds orders with amounts exceeding a particular value, but it's not very clear because one of your WHERE conditions specifies that the order number should be 1008, which would presumably only return one record.
The query should probably look more like this:
SELECT order,
SUM(paideach) AS amount
FROM orderitems
GROUP BY order
HAVING amount > 999.99;
This would select records from the orderitems table where the sum of paideach exceeds 999.99.
I'm not sure how order 1008 fits into things, so you will have to elaborate on that.

Other have commented on some of the things wrong with your query. I'll try to give more explicit hints about what I think you need to do to get the result I think you're looking for.
The problem seems to break into distinct sections, first finding the total for each order which you're close to and I think probably started from:
SELECT ORDER#, SUM(PAIDEACH) AS AMOUNT
FROM ORDERITEMS
GROUP BY ORDER#;
... finding the total for a specific order:
SELECT SUM(PAIDEACH)
FROM ORDERITEMS
WHERE ORDER# = 1008;
... and combining them, which is where you're stuck. The simplest way, and hopefully something you've recently been taught, is to use the HAVING clause, which comes after the GROUP BY and acts as a kind of filter that can be applied to the aggregated columns (which you can't do in the WHERE clause). If you had a fixed amount you could do this:
SELECT ORDER#, SUM(PAIDEACH) AS AMOUNT
FROM ORDERITEMS
GROUP BY ORDER#
HAVING SUM(PAIDEACH) > 5;
(Note that as #Bridge indicated you can't use the column alias, AMOUNT, in the having clause, you have to repeat the aggregation function SUM). But you don't have a fixed value, you want to use the actual total for order 1008, so you need to replace that fixed value with another query. I'll let you take that last step...

I'm not familiar with Oracle, and since it's homework I won't give you the answers, just a few ideas of what I think is wrong.
select statement should only have one where statement - can have more than one condition of course, just separated by logical operators (anything that evaluates to true will be included). E.g. : WHERE (column1 > column2) AND (column3 = 100)
Group by statements should after WHERE clauses
You can't refer to columns you've aliased in the select in the where clause of the same statement by their aliased name. For example this won't work:
SELECT column1 as hello
FROM table1
WHERE hello = 1
If there's a group by, the columns you're selecting should be the same as in that statement (or aggregates of those). This page does a better explanation of this than I do.

Related

SQL Group By Column Part Number giving the data from most recent received date

New qith SQL my group by is not working and I am wanting it to pull the most recent POReleases.DateReceived date and group by part number. Here is what I have
SELECT POReleases.PONum, POReleases.PartNo, POReleases.JobNo, POReleases.Qty, POReleases.QtyRejected, POReleases.QtyCanceled, POReleases.DueDate, POReleases.DateReceived, PODet.ProdCode, PODet.Unit, PODet.UnitCost, PODet.QtyOrd, PODet.QtyRec, PODet.QtyReject, PODet.QtyCancel
FROM Waples.dbo.PODet PODet, Waples.dbo.POReleases POReleases
WHERE PODet.PartNo = POReleases.PartNo AND PODet.PONum = POReleases.PONum AND ((POReleases.DateReceived>{ts '2010-01-01 00:00:00'}))
GROUP BY PartNo
For starters, columns specified in the GROUP BY should be present in the select statement too. Here in your case only "PartNo" is used in GROUP BY clause whereas so many columns are used in the SELECT statement.
You can try WITH CTE to achieve this,
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER( PARTITION BY PartNo ORDER BY POReleases.DateReceived DESC) AS PartNoCount
FROM TABLENAME
) SELECT * FROM CTE
When you write an SQL statement, you should think about the logical flow, which might be technically slightly inaccurate due to optimizations, but still, it is a good thing to think about it like this:
without the from clause specifying the source relation, the filter cannot be evaluated, so at least logically, the from is the first thing to evaluate
without the where clause specifying which records should be kept from the source relation, the filtered records cannot be grouped, so, at least logically, the where precedes the group by
without the group by, specifying the groups, you cannot select values from the groups, so, at least logically, group by precedes select
So, the projection (select) is executed on the groups of filtered records, which are groups themselves. Since the groups have an attribute, namely PartNo, it becomes an aggregated column. The other columns, which were reachable before the group by, can no longer be reached in the select. If you want to reach them, you need to group by them as well, or use aggregated functions for them, since if you have a group by, you will be able to select only the aggregated columns, which are either aggregated functions or columns which became aggregated due to their presence in the group by.
Since you did not specify how this query is not working, I will have to assume that you have a syntax error in the selection, due to the fact that you refer to columns which are not aggregated. Also, you might want to use join instead of Descartes multiplication and finally, if you want to filter the groups, not the records of the initial relation (which is the result of a Descartes multiplication in your case), then you might consider using a having clause.

what is aggregate function in sql?

I have two queries both are working fine when they executed separately:
select distinct
style_ref
from
tbl_Size
where
order_ref='123'
select
sum(quantity)
from
tbl_size
where
order_ref='123'
But if I try to combine them it does not work
select distinct
style_ref, sum(quantity)
from
tbl_size
where
order_ref='123'
ERROR appears:
Column 'tbl_Size.style_ref' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
An aggregate function is one that combines several records into a single one. In your case, SUM. You're taking the sum, clearly, of more than one row at a time. Another example might be AVG, to get the average of several values.
You can't run aggregate functions, as your error says, alongside ungrouped columns, because that introduces multiple "layers" of data. In one row, you'd have something that described the entire dataset, and you'd have something else that described only a single record. This would be confusing, not to mention inefficient.
Rather than using DISTINCT in your example, you're probably looking to GROUP BY your column:
SELECT style_ref, sum(quantity)
FROM tbl_size
WHERE order_ref='123'
GROUP BY style_ref
This will group up every set of records, based on their style_ref value, then tell you the sum of the quantities. Thus, assuming your schema naming is accurate, it will tell you how many orders were present for each style_ref.
The above query is equivalent in meaning to the following:
SELECT DISTINCT style_ref, (SELECT SUM(quantity)
FROM tbl_size AS B
WHERE B.order_ref = '123'
AND B.style_ref = tbl_size.style_ref)
FROM tbl_size
WHERE order_ref = '123'
As you can see, the GROUP BY solution is much, much cleaner and better to use. But I included this just to describe what it returns in a arguably a bit more of a readable way. You can see here how the aggregate function (SUM) could be described as working on a separate plane from the style_ref column, so it'd be hard to combine those into a single one without GROUP BY.
An aggregate function is a function that returns one result for many rows - like sum in your example.
You can use them in conjunction with the group by clause in order to get one result per group:
select style_ref, sum(quantity)
from tbl_size
where order_ref='123'
group by style_ref

Selecting counts of a substring

I know this has to be an easy select but I am having no luck figuring it out. I've got a table has a field of customer grouping codes and I'm trying to get a count of each distinct character 2 through 6 sets. In my past foxpro experience a simple
select distinct substr(custcode,2,5), count (*) from a group by 1
would work, but this doesn't appear to work in sql server queries. The error message indicated it didn't like using the number reference in the group by so I changed it to custcode but the count just returns 1 for each, as I assume the count is after the distinct occurs so there is only one. If I change the count to count(distinct substring(custcode,2,5)) and remove the first distinct substring I just get a count of how many different codes exist. Can someone point out what I'm doing wrong here? Thanks.
The DISTINCT and GROUP BY are redundant, you just want GROUP BY, and you want to GROUP BY the same thing you are selecting:
select substr(custcode,2,5), count (*)
from a
group by substr(custcode,2,5)
In SQL Server you can use column aliases/numbers in the ORDER BY clause, but not in GROUP BY.
Ie. ORDER BY 1 will order by the first selected column, but many consider it bad practice to use column indexes, using aliases/column names is clearer.

Can I use non-aggregate columns with group by?

You cannot (should not) put non-aggregates in the SELECT line of a GROUP BY query.
I would however like access the one of the non-aggregates associated with the max. In plain english, I want a table with the oldest id of each kind.
CREATE TABLE stuff (
id int,
kind int,
age int
);
This query gives me the information I'm after:
SELECT kind, MAX(age)
FROM stuff
GROUP BY kind;
But it's not in the most useful form. I really want the id associated with each row so I can use it in later queries.
I'm looking for something like this:
SELECT id, kind, MAX(age)
FROM stuff
GROUP BY kind;
That outputs this:
SELECT stuff.*
FROM
stuff,
( SELECT kind, MAX(age)
FROM stuff
GROUP BY kind) maxes
WHERE
stuff.kind = maxes.kind AND
stuff.age = maxes.age
It really seems like there should be a way to get this information without needing to join. I just need the SQL engine to remember the other columns when it's calculating the max.
You can't get the Id of the row that MAX found, because there might not be only one id with the maximum age.
You cannot (should not) put non-aggregates in the SELECT line of a GROUP BY query.
You can, and have to, define what you are grouping by for the aggregate function to return the correct result.
MySQL (and SQLite) decided in their infinite wisdom that they would go against spec, and allow queries to accept GROUP BY clauses missing columns quoted in the SELECT - it effectively makes these queries not portable.
It really seems like there should be a way to get this information without needing to join.
Without access to the analytic/ranking/windowing functions that MySQL doesn't support, the self join to a derived table/inline view is the most portable means of getting the result you desire.
I think it's tempting indeed to ask the system to solve the problem in one pass rather than having to do the job twice (find the max, and the find the corresponding id). You can do using CONCAT (as suggested in Naktibalda refered article), not sure that would be more effeciant
SELECT MAX( CONCAT( LPAD(age, 10, '0'), '-', id)
FROM STUFF1
GROUP BY kind;
Should work, you have to split the answer to get the age and the id.
(That's really ugly though)
In recent databases you can use sum() over (parition by ...) to solve this problem:
select id, kind, age as max_age from (
select id, kind, age, max(age) over (partition by kind) as mage
from table)
where age = mage
This can then be single pass
PostgesSQL's DISTINCT ON will be useful here.
SELECT DISTINCT ON (kind) kind, id, age
FROM stuff
ORDER BY kind, age DESC;
This groups by kind and returns the first row in the ordered format. As we have ordered by age in descending order, we will get the row with max age for kind.
P.S. columns in DISTINCT ON should appear first in order by
You have to have a join because the aggregate function max retrieves many rows and chooses the max.
So you need a join to choose the one that the agregate function has found.
To put it a different way how would you expect the query to behave if you replaced max with sum?
An inner join might be more efficient than your sub query though.

What is the difference between HAVING and WHERE in SQL?

What is the difference between HAVING and WHERE in an SQL SELECT statement?
EDIT: I have marked Steven's answer as the correct one as it contained the key bit of information on the link:
When GROUP BY is not used, HAVING behaves like a WHERE clause
The situation I had seen the WHERE in did not have GROUP BY and is where my confusion started. Of course, until you know this you can't specify it in the question.
HAVING: is used to check conditions after the aggregation takes place.
WHERE: is used to check conditions before the aggregation takes place.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Gives you a table of all cities in MA and the number of addresses in each city.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Having Count(1)>5
Gives you a table of cities in MA with more than 5 addresses and the number of addresses in each city.
HAVING specifies a search condition for a
group or an aggregate function used in SELECT statement.
Source
Number one difference for me: if HAVING was removed from the SQL language then life would go on more or less as before. Certainly, a minority queries would need to be rewritten using a derived table, CTE, etc but they would arguably be easier to understand and maintain as a result. Maybe vendors' optimizer code would need to be rewritten to account for this, again an opportunity for improvement within the industry.
Now consider for a moment removing WHERE from the language. This time the majority of queries in existence would need to be rewritten without an obvious alternative construct. Coders would have to get creative e.g. inner join to a table known to contain exactly one row (e.g. DUAL in Oracle) using the ON clause to simulate the prior WHERE clause. Such constructions would be contrived; it would be obvious there was something was missing from the language and the situation would be worse as a result.
TL;DR we could lose HAVING tomorrow and things would be no worse, possibly better, but the same cannot be said of WHERE.
From the answers here, it seems that many folk don't realize that a HAVING clause may be used without a GROUP BY clause. In this case, the HAVING clause is applied to the entire table expression and requires that only constants appear in the SELECT clause. Typically the HAVING clause will involve aggregates.
This is more useful than it sounds. For example, consider this query to test whether the name column is unique for all values in T:
SELECT 1 AS result
FROM T
HAVING COUNT( DISTINCT name ) = COUNT( name );
There are only two possible results: if the HAVING clause is true then the result with be a single row containing the value 1, otherwise the result will be the empty set.
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.
Check out this w3schools link for more information
Syntax:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value
A query such as this:
SELECT column_name, COUNT( column_name ) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
HAVING COUNT( column_name ) >= 3;
...may be rewritten using a derived table (and omitting the HAVING) like this:
SELECT column_name, column_name_tally
FROM (
SELECT column_name, COUNT(column_name) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
) pointless_range_variable_required_here
WHERE column_name_tally >= 3;
The difference between the two is in the relationship to the GROUP BY clause:
WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.
References
SQLite SELECT Statement Syntax/Railroad Diagram
Informix SELECT Statement Syntax/Railroad Diagram
HAVING is used when you are using an aggregate such as GROUP BY.
SELECT edc_country, COUNT(*)
FROM Ed_Centers
GROUP BY edc_country
HAVING COUNT(*) > 1
ORDER BY edc_country;
WHERE is applied as a limitation on the set returned by SQL; it uses SQL's built-in set oeprations and indexes and therefore is the fastest way to filter result sets. Always use WHERE whenever possible.
HAVING is necessary for some aggregate filters. It filters the query AFTER sql has retrieved, assembled, and sorted the results. Therefore, it is much slower than WHERE and should be avoided except in those situations that require it.
SQL Server will let you get away with using HAVING even when WHERE would be much faster. Don't do it.
WHERE clause does not work for aggregate functions
means : you should not use like this
bonus : table name
SELECT name
FROM bonus
GROUP BY name
WHERE sum(salary) > 200
HERE Instead of using WHERE clause you have to use HAVING..
without using GROUP BY clause, HAVING clause just works as WHERE clause
SELECT name
FROM bonus
GROUP BY name
HAVING sum(salary) > 200
Difference b/w WHERE and HAVING clause:
The main difference between WHERE and HAVING clause is, WHERE is used for row operations and HAVING is used for column operations.
Why we need HAVING clause?
As we know, aggregate functions can only be performed on columns, so we can not use aggregate functions in WHERE clause. Therefore, we use aggregate functions in HAVING clause.
One way to think of it is that the having clause is an additional filter to the where clause.
A WHERE clause is used filters records from a result. The filter occurs before any groupings are made. A HAVING clause is used to filter values from a group
In an Aggregate query, (Any query Where an aggregate function is used) Predicates in a where clause are evaluated before the aggregated intermediate result set is generated,
Predicates in a Having clause are applied to the aggregate result set AFTER it has been generated. That's why predicate conditions on aggregate values must be placed in Having clause, not in the Where clause, and why you can use aliases defined in the Select clause in a Having Clause, but not in a Where Clause.
I had a problem and found out another difference between WHERE and HAVING. It does not act the same way on indexed columns.
WHERE my_indexed_row = 123 will show rows and automatically perform a "ORDER ASC" on other indexed rows.
HAVING my_indexed_row = 123 shows everything from the oldest "inserted" row to the newest one, no ordering.
When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.
However, when GROUP BY is used:
The WHERE clause is used to filter records from a result. The
filtering occurs before any groupings are made.
The HAVING clause is used to filter values from a group (i.e., to
check conditions after aggregation into groups has been performed).
Resource from Here
From here.
the SQL standard requires that HAVING
must reference only columns in the
GROUP BY clause or columns used in
aggregate functions
as opposed to the WHERE clause which is applied to database rows
While working on a project, this was also my question. As stated above, the HAVING checks the condition on the query result already found. But WHERE is for checking condition while query runs.
Let me give an example to illustrate this. Suppose you have a database table like this.
usertable{ int userid, date datefield, int dailyincome }
Suppose, the following rows are in table:
1, 2011-05-20, 100
1, 2011-05-21, 50
1, 2011-05-30, 10
2, 2011-05-30, 10
2, 2011-05-20, 20
Now, we want to get the userids and sum(dailyincome) whose sum(dailyincome)>100
If we write:
SELECT userid, sum(dailyincome) FROM usertable WHERE
sum(dailyincome)>100 GROUP BY userid
This will be an error. The correct query would be:
SELECT userid, sum(dailyincome) FROM usertable GROUP BY userid HAVING
sum(dailyincome)>100
WHERE clause is used for comparing values in the base table, whereas the HAVING clause can be used for filtering the results of aggregate functions in the result set of the query
Click here!
When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.
However, when GROUP BY is used:
The WHERE clause is used to filter records from a result. The
filtering occurs before any groupings are made.
The HAVING clause is
used to filter values from a group (i.e., to check conditions after
aggregation into groups has been performed).
I use HAVING for constraining a query based on the results of an aggregate function. E.G. select * in blahblahblah group by SOMETHING having count(SOMETHING)>0