I have this statement:
SELECT ITEM.ITEMID, ITEMNAME, QUANTITY AS "QUANTITY SOLD"
FROM ORDERITEM,
NBUSER."ORDER",
PAYMENT,
ITEM
WHERE NBUSER."ORDER".PAYMENTID = PAYMENT.PAYMENTID
AND ITEM.ITEMID = ORDERITEM.ITEMID
AND PAYMENT.PAYMENTDATE BETWEEN '4/1/2017' AND '4/30/2017'
GROUP BY ITEM.ITEMID
ORDER BY ITEM.ITEMID DESC;
But it keeps giving me this error:
[Exception, Error code 30,000, SQLState 42Y36] Column reference
'ITEM.ITEMNAME' is invalid, or is part of an invalid expression. For
a SELECT list with a GROUP BY, the columns and expressions being
selected may only contain valid grouping expressions and valid
aggregate expressions. Line 1, column 1
I want to group the records with the same itemid together and add up the quantity for each itemid.
First of all, you have 4 tables in the statement, but only 2 joins.
Did you do this intentionally? If not, you need to specify one more join condition.
When you GROUP BY only some of the columns in the SELECT list, the remaining columns must be wrapped in an aggregate function. In your case, for example:
select ITEM.ITEMID, max(ITEMNAME), sum(QUANTITY) AS "QUANTITY SOLD" ...
It is also better to qualify every column with a table alias; otherwise the SQL statement is hard to follow.
You have qualified this column - ITEM.ITEMID (ITEM acting as the qualifier) - but not ITEMNAME.
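Putting those two points together, the full statement might look like the sketch below. Note that the ORDERITEM-to-NBUSER."ORDER" join uses a hypothetical ORDERID column (the missing third join), and QUANTITY is assumed to live in ORDERITEM; adjust both to your actual schema:
SELECT i.ITEMID, MAX(i.ITEMNAME) AS ITEMNAME, SUM(oi.QUANTITY) AS "QUANTITY SOLD"
FROM ORDERITEM oi
JOIN NBUSER."ORDER" o ON o.ORDERID = oi.ORDERID
JOIN PAYMENT p ON p.PAYMENTID = o.PAYMENTID
JOIN ITEM i ON i.ITEMID = oi.ITEMID
WHERE p.PAYMENTDATE BETWEEN '4/1/2017' AND '4/30/2017'
GROUP BY i.ITEMID
ORDER BY i.ITEMID DESC;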
For the future reader, who may use another DBMS:
Depending on the standard version and the vendor, it may be allowed to additionally use columns in the SELECT statement, which are only functionally dependent on the grouping columns.
Source: An SQL 2008 standard draft (Link), part II, 7.12 -> General rule number 15:
" If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained
in <select list>, each column reference that references a column of T shall reference some column C that
is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS.
"
Related
Suppose I'm writing a query with aggregate functions where I want the result based on conditions both on a column of the table and on an aggregate function. Is it possible to use the WHERE and HAVING clauses to get the expected result without a GROUP BY clause?
I wrote the following query for the above condition.
select *
from ORDER_DETAILS
where item_price > 1000
having count(item) >= 5 ;
First of all, HAVING is just like WHERE, but it can apply to aggregate function results.
You should keep track of the data rows and columns after each clause.
Imagine a row_id property that can be used to locate one single row of a table. The WHERE clause doesn't change the row_id.
When we use aggregate functions, multiple input rows go in and a single result comes out, which changes the row_id. In fact, no GROUP BY clause means everything goes into one bucket, and the output has only one row.
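A minimal illustration of that single-bucket case, using the same table (it just shows the shape of the result):
select count(item) from ORDER_DETAILS; -- always returns exactly one row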
My best guess is that you want the original data rows whose attributes pass an aggregated-value check. E.g. find order details where item price > 1000 (original filter) and there are more than 5 items in a single order (aggregated filter).
So GROUP BY + aggregate + HAVING gives you the aggregated-filter dataset; you can join this dataset back to the original table, and the result then has the same row_id as the original ORDER_DETAILS:
select *
from ORDER_DETAILS
where item_price > 1000
and order_id in (
select order_id
from ORDER_DETAILS
group by order_id
having count(item) >= 5
);
Note:
order_id is an example of the aggregated-filter column
I use an IN subquery for convenience; you can change it into a JOIN
If you are working with big-data SQL, like Hive/Spark, you can also use window functions to get the aggregate result on each row of the original table.
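For instance, a sketch of the same filter written with a window function (works in Hive/Spark SQL and most modern databases; it counts the items per order_id on every row, then filters):
select *
from (
select od.*, count(item) over (partition by order_id) as items_in_order
from ORDER_DETAILS od
) t
where item_price > 1000
and items_in_order >= 5;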
Using 'HAVING' without 'GROUP BY' is not allowed:
SELECT *
FROM products
HAVING unitprice > avg(unitprice)
Column 'products.UnitPrice' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
But when placing the same code under 'EXISTS' - no problems:
SELECT *
FROM products p
WHERE EXISTS (SELECT 1
FROM products
HAVING p.unitprice > avg(unitprice))
Can you please explain why?
Well, the error is clear: in the first query, UnitPrice is not part of an aggregation nor of the GROUP BY,
whereas in your second query you are comparing p.unitprice from the table "products p", which doesn't need to be part of an aggregation or GROUP BY. Your second query is equivalent to:
select * from products p
where p.unitprice > (select avg(unitprice) FROM products)
which may make it clearer: that SQL calculates avg(unitprice) and then compares it with the unitprice column from products.
HAVING filters after aggregation according to the SQL standard and in most databases.
Without a GROUP BY, there is still aggregation: the whole table is treated as a single group.
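For example, this HAVING without GROUP BY is legal (in SQL Server and most other databases) because it only references aggregates over that single implicit group; the threshold 10 is just an arbitrary value for illustration:
SELECT AVG(unitprice) AS avg_price
FROM products
HAVING AVG(unitprice) > 10;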
But in your case, you simply want a subquery and WHERE:
SELECT p.*
FROM products p
WHERE p.unitprice > (SELECT AVG(p2.unitprice) FROM products p2);
The problem comes from the columns you select:
SELECT *
and
SELECT 1
Unlike ordinary functions, which are evaluated for each row, aggregate functions are computed once the whole dataset has been processed, which means that in theory (at least without a GROUP BY clause) you can't select both aggregates and regular columns in the same select list (even if some DBMSs still tolerate this).
It's easier to see when considering SUM(). You're not supposed to have access to the total of a column before all rows have been returned, which prevents you from writing something like SELECT price, SUM(price), for instance.
GROUP BY, now, lets you regroup your rows according to a given criterion (actually, a set of columns), so the aggregate functions are computed at the end of each of these groups instead of over the whole dataset. And since all the columns specified in GROUP BY are the same within a given group, you're allowed to include them in your SELECT list.
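For instance, assuming a hypothetical category column on products, this is allowed because category is constant within each group while SUM(unitprice) is computed per group:
SELECT category, SUM(unitprice) AS total_price
FROM products
GROUP BY category;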
This leads us to the actual cause of the failure: in the first query, you select all columns. In the second one, you select none of them: only the constant 1, which is not part of the table itself.
My company has recently switched to BigQuery. One issue I am having right now is that BigQuery standard SQL does not accept column aliases in the WHERE clause.
For example, the query below returns Unrecognized name: product_code at [3:5].
Does anyone know a workaround for this issue?
select sales, t_001 as product_code
from "project_01.sales_001.trans_datamart"
where product_code = '001-40040-00'
According to the documentation, you cannot reference an alias defined in the SELECT list from a WHERE clause. The WHERE clause filters each row against a bool_expression.
However, there is a way for you to achieve what you want. Below is the syntax:
select sales, product_code
from (select *, t_001 as product_code from "project_01.sales_001.trans_datamart")
where product_code = '001-40040-00'
Therefore, you use the alias as a new column name within your FROM clause, which makes it possible to filter on the alias you just created in your WHERE clause.
I would also encourage you to check out this link with all the explanations about aliases in BigQuery.
I'm not aware of any SQL dialect that allows the use of a column alias in the WHERE clause.
Sticking to just the clauses in your example, SQL engines generally evaluate the FROM clause first, determining which tables to pull data from, then evaluate the WHERE clause to filter the retrieved data, and then the SELECT clause to determine what to display and how to display it.
Given that, the column alias is unknown to the SQL engine at the point that it's reading the WHERE clause.
So your options are to either use the column name in the WHERE clause, or, as Gordon suggests in the comments, put the alias in a sub-query or CTE that will be evaluated as part of the FROM clause.
Column name:
select sales, t_001 as product_code
from "project_01.sales_001.trans_datamart"
where t_001 = '001-40040-00' --<--- Modification here.
Sub-query:
select
sales,
product_code
from
(
select sales, t_001 as product_code
from "project_01.sales_001.trans_datamart"
) as d
where product_code = '001-40040-00'
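The CTE mentioned above is the same idea in a different shape; the alias is again defined before the outer WHERE clause runs:
with d as (
select sales, t_001 as product_code
from "project_01.sales_001.trans_datamart"
)
select sales, product_code
from d
where product_code = '001-40040-00'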
I have the following query in PostgreSQL:
SELECT
COUNT(a.log_id) AS overall_count
FROM
"Log" as a,
"License" as b
WHERE
a.license_id=7
AND
a.license_id=b.license_id
AND
b.limit_call > overall_count
GROUP BY
a.license_id;
Why do I get this error:
ERROR: column "overall_count" does not exist
My table structure:
License(license_id, license_name, limit_call, create_date, expire_date)
Log(log_id, license_id, log, call_date)
I want to check if a license has reached the limit for calls in a specific month.
SELECT a.license_id, a.limit_call
, count(b.license_id) AS overall_count
FROM "License" a
LEFT JOIN "Log" b USING (license_id)
WHERE a.license_id = 7
GROUP BY a.license_id -- , a.limit_call -- add in old versions
HAVING a.limit_call > count(b.license_id)
Since Postgres 9.1 the primary key covers all columns of a table in the GROUP BY clause. In older versions you'd have to add a.limit_call to the GROUP BY list. The release notes for 9.1:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause
Further reading:
Why can't I exclude dependent columns from `GROUP BY` when I aggregate by a key?
The condition you had in the WHERE clause has to move to the HAVING clause since it refers to the result of an aggregate function (after WHERE has been applied). And you cannot refer to output columns (column aliases) in the HAVING clause, where you can only reference input columns. So you have to repeat the expression. The manual:
An output column's name can be used to refer to the column's value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING
clauses; there you must write out the expression instead.
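A minimal contrast of the two forms (the first raises the same "column does not exist" error, the second works; the threshold 10 is arbitrary):
-- fails: overall_count is an output column
SELECT count(b.license_id) AS overall_count
FROM "Log" b
GROUP BY b.license_id
HAVING overall_count > 10;
-- works: repeat the expression
SELECT count(b.license_id) AS overall_count
FROM "Log" b
GROUP BY b.license_id
HAVING count(b.license_id) > 10;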
I reversed the order of tables in the FROM clause and cleaned up the syntax a bit to make it less confusing. USING is just a notational convenience here.
I used LEFT JOIN instead of JOIN, so you do not exclude licenses without any logs at all.
Only non-null values are counted by count(). Since you want to count related entries in table "Log" it is safer and slightly cheaper to use count(b.license_id). This column is used in the join, so we don't have to bother whether the column can be null or not.
count(*) is even shorter and slightly faster. If you don't mind getting a count of 1 for 0 rows in the left table, use that.
Aside: I would advise not to use mixed case identifiers in Postgres if possible. Very error prone.
Pure conditional count(*):
SELECT COUNT(*) FILTER (WHERE a.myfield > 0) AS my_count
FROM "Log" as a
GROUP BY a.license_id
so you:
get 0 for groups where the condition is never met
can add as many count(*) columns as you need
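For example, a sketch against the question's "Log" table that counts all calls and, in the same pass, only the current month's calls (FILTER needs Postgres 9.4+, and the call_date condition is just an assumed illustration of the per-month check):
SELECT a.license_id
     , COUNT(*) AS overall_count
     , COUNT(*) FILTER (WHERE a.call_date >= date_trunc('month', now())) AS calls_this_month
FROM "Log" a
GROUP BY a.license_id;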
Filter out the groups where the condition doesn't match:
NOTE: you cannot use HAVING b.limit_call > ... unless you group by limit_call. But you can use an aggregate function to map the many "limit_call" values in a group into a single value. For example, in your case, you can use MAX:
SELECT COUNT(a.log_id) AS overall_count
FROM "Log" as a
JOIN "License" b ON(a.license_id=b.license_id)
GROUP BY a.license_id
HAVING MAX(b.limit_call) > COUNT(a.log_id)
And don't worry about duplicating the COUNT(a.log_id) expression in the first and last lines. Postgres will optimize it.
The WHERE clause doesn't recognize your column alias, and furthermore, you're trying to filter rows after aggregation. Try:
SELECT
COUNT(a.log_id) AS overall_count
FROM
"Log" as a,
"License" as b
WHERE
a.license_id=7
AND
a.license_id=b.license_id
GROUP BY
a.license_id, b.limit_call -- limit_call must also be grouped (or aggregated) to appear in HAVING
having b.limit_call > count(a.log_id);
The having clause is similar to the where clause, except that it deals with columns after an aggregation, whereas the where clause works on columns before an aggregation.
Also, is there a reason why your table names are enclosed in double quotes?
I'm getting an error stating that I have not specified OrdID's table:
Here's what I have:
SELECT Last, OrderLine.OrdID, OrdDate, SUM(Price*Qty)
FROM ((Cus INNER JOIN Orders ON Cus.CID=Orders.CID)
INNER JOIN OrderLine
ON Orders.OrdID=OrderLine.OrdID)
INNER JOIN ProdFabric
ON OrderLine.PrID=ProdFabric.PrID
AND OrderLine.Fabric=ProdFabric.Fabric
GROUP BY Last
ORDER BY Last DESC, OrdID DESC;
When I hit run, it keeps saying that OrdID could refer to more than one table listed in the FROM clause.
Why does it keep saying that when I've specified which table OrdID comes from?
Tables:
Cus (**CID**, Last, First, Phone)
Orders (**OrdID**, OrdDate, ShipDate, CID)
Manu (**ManuID**, Name, Phone, City)
Prods (**PrID**, ManuID, Category)
ProdFabric (**PrID**, **Fabric**, Price)
Orderline (**OrdId**, **PrID**, Fabric, Qty)
Your ORDER BY clause is valid SQL-92 syntax.
Sadly, the Access database engine is not SQL-92 compliant. It doesn't let you use a column correlation name ('alias') from the SELECT clause. If you had used this:
SUM(Price * Qty) AS total_price
...
ORDER BY total_price
you would have received an error. (Aside: you should consider giving this expression a column correlation name anyhow.)
Instead of correlation names, the Access database engine expects either a column name or an expression (the latter being illegal in SQL-92); the specified columns need not appear in the SELECT clause (again illegal in SQL-92). Because any column from any table in the FROM clause can be used, you need to disambiguate them with the table name; and if you used a table correlation name in the FROM clause then you must use it in the ORDER BY clause (I don't make the rules!).
To satisfy the Access database engine's requirements, I think you need to change your ORDER BY clause to this:
ORDER BY Last DESC, OrderLine.OrdID DESC;
As an aside, I think your code would be more readable if you qualified your columns with table names in your SELECT clause even when they are unambiguous in context (I find using full table names a little wordy and prefer short table correlation names, specified in the data dictionary and used consistently in all queries). As it stands I can only guess that OrdDate is from Orders, and Price and Qty are from OrderLine. I've no idea what Last represents.
The ORDER BY clause is agnostic to what you've specified in the SELECT list. It's possible, for example, to order by a field that you don't actually include in the output select list.
Hence you need to be sure that the fields in the ORDER BY list are not ambiguous.
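For instance, a minimal sketch against the Cus table from the question, where First is ordered on but never appears in the SELECT list:
SELECT c.Last
FROM Cus AS c
ORDER BY c.First;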