SQL - using alias in Group By

Just curious about SQL syntax. So if I have
SELECT
itemName as ItemName,
substring(itemName, 1,1) as FirstLetter,
Count(itemName)
FROM table1
GROUP BY itemName, FirstLetter
This would be incorrect because
GROUP BY itemName, FirstLetter
really should be
GROUP BY itemName, substring(itemName, 1,1)
But why can't we simply use the former for convenience?

SQL is implemented as if a query were executed in the following order:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
For most relational database systems, this order explains which names (columns or aliases) are valid because they must have been introduced in a previous step.
So in Oracle and SQL Server, you cannot use a term in the GROUP BY clause that you define in the SELECT clause because the GROUP BY is executed before the SELECT clause.
There are exceptions, though: MySQL and Postgres are more lenient and do allow it.

You could always use a subquery so you can use the alias. Of course, check the performance: the database server will possibly run both the same way, but it never hurts to verify.
SELECT ItemName, FirstLetter, COUNT(ItemName)
FROM (
SELECT ItemName, SUBSTRING(ItemName, 1, 1) AS FirstLetter
FROM table1
) ItemNames
GROUP BY ItemName, FirstLetter
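To check that the subquery version and the repeated-expression version return the same rows, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. This is an assumption: SQLite rather than SQL Server or Oracle, and SQLite spells the function substr. The sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data standing in for table1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (itemName TEXT)")
conn.executemany(
    "INSERT INTO table1 (itemName) VALUES (?)",
    [("apple",), ("apple",), ("avocado",), ("banana",)],
)

# Option 1: repeat the expression in GROUP BY (portable everywhere).
repeated = conn.execute(
    """
    SELECT itemName, substr(itemName, 1, 1) AS FirstLetter, COUNT(itemName)
    FROM table1
    GROUP BY itemName, substr(itemName, 1, 1)
    ORDER BY itemName
    """
).fetchall()

# Option 2: define the alias in a derived table, then group on the alias.
via_subquery = conn.execute(
    """
    SELECT ItemName, FirstLetter, COUNT(ItemName)
    FROM (
        SELECT ItemName, substr(ItemName, 1, 1) AS FirstLetter
        FROM table1
    ) ItemNames
    GROUP BY ItemName, FirstLetter
    ORDER BY ItemName
    """
).fetchall()

print(repeated)
print(via_subquery)
```

The ORDER BY clauses are there only to make the row order deterministic so the two results can be compared.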

At least in PostgreSQL, you can use the column number from the result set in your GROUP BY clause:
SELECT
itemName as ItemName,
substring(itemName, 1,1) as FirstLetter,
Count(itemName)
FROM table1
GROUP BY 1, 2
Of course this starts to be a pain if you are doing this interactively and you edit the query to change the number or order of columns in the result. But still.
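SQLite also accepts column positions in GROUP BY, so the same approach can be sketched with Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; the rows are hypothetical). The positional query is compared against the version with the expressions spelled out.

```python
import sqlite3

# Hypothetical sample data standing in for table1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (itemName TEXT)")
conn.executemany(
    "INSERT INTO table1 (itemName) VALUES (?)",
    [("apple",), ("apple",), ("avocado",), ("banana",)],
)

# GROUP BY 1, 2 refers to the 1st and 2nd columns of the SELECT list.
by_position = conn.execute(
    """
    SELECT itemName AS ItemName, substr(itemName, 1, 1) AS FirstLetter,
           COUNT(itemName)
    FROM table1
    GROUP BY 1, 2
    ORDER BY 1
    """
).fetchall()

# The same grouping with the expressions written out in full.
by_expression = conn.execute(
    """
    SELECT itemName, substr(itemName, 1, 1), COUNT(itemName)
    FROM table1
    GROUP BY itemName, substr(itemName, 1, 1)
    ORDER BY itemName
    """
).fetchall()

print(by_position)
```

Because the numbers point at positions in the SELECT list, reordering that list silently changes what is grouped, which is exactly the interactive-editing pain described above.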

SQL Server doesn't allow you to reference the alias in the GROUP BY clause because of the logical order of processing. The GROUP BY clause is processed before the SELECT clause, so the alias is not known when the GROUP BY clause is evaluated. This also explains why you can use the alias in the ORDER BY clause, which is processed after the SELECT clause.
Here is one source for information on the SQL Server logical processing phases.

I'm not answering why it is so, but only wanted to show a way around that limitation in SQL Server by using CROSS APPLY to create the alias. You then use it in the GROUP BY clause, like so:
SELECT
itemName as ItemName,
FirstLetter,
Count(itemName)
FROM table1
CROSS APPLY (SELECT substring(itemName, 1,1) as FirstLetter) Alias
GROUP BY itemName, FirstLetter

A word of caution: using an alias in GROUP BY (in systems that support it, such as Postgres) can have unintended results. For example, if you create an alias with the same name as a column of the inner statement, GROUP BY will choose the inner column, not your alias.
-- Working example in postgres
select col1 as col1_1, avg(col3) as col2_1
from
(select gender as col1, maritalstatus as col2,
yearlyincome as col3 from customer) as layer_1
group by col1_1;
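-- Failing example in postgres: GROUP BY col1 binds to the inner col1 (gender),
-- so col2 is neither grouped nor aggregated and Postgres raises an error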
select col2 as col1, avg(col3)
from
(select gender as col1, maritalstatus as col2,
yearlyincome as col3 from customer) as layer_1
group by col1;
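The same name-resolution rule can be sketched in SQLite, which also resolves a GROUP BY name to an input column first and only falls back to a SELECT-list alias when no input column matches. Python's sqlite3 stands in for Postgres here (an assumption), so the "failing" query does not raise an error; it silently groups by the wrong column instead. The customer rows are hypothetical.

```python
import sqlite3

# Hypothetical customer rows; column names follow the Postgres example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (gender TEXT, maritalstatus TEXT, yearlyincome REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("F", "M", 100.0), ("F", "S", 50.0), ("M", "M", 80.0), ("M", "S", 40.0)],
)

# Fixed derived-table fragment (never user input), spliced into each query.
inner = """(SELECT gender AS col1, maritalstatus AS col2,
                   yearlyincome AS col3 FROM customer) AS layer_1"""

# Fresh alias: col1_1 is not an input column, so GROUP BY uses the alias.
working = conn.execute(
    f"SELECT col1 AS col1_1, avg(col3) AS col2_1 FROM {inner} "
    "GROUP BY col1_1 ORDER BY col1_1"
).fetchall()

# Clashing alias: GROUP BY col1 binds to the inner col1 (gender), not to the
# outer alias for col2. SQLite allows bare columns, so instead of an error the
# first column holds an arbitrary col2 value from each gender group.
shadowed = conn.execute(
    f"SELECT col2 AS col1, avg(col3) FROM {inner} GROUP BY col1"
).fetchall()

# Grouping by marital status requires naming col2 explicitly.
intended = conn.execute(
    f"SELECT col2 AS col1, avg(col3) FROM {inner} GROUP BY col2 ORDER BY col2"
).fetchall()

print(working, shadowed, intended)
```

The averages in shadowed are the per-gender averages, not the per-marital-status ones the alias suggests.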

Some DBMSs will let you use an alias instead of having to repeat the entire expression.
Teradata is one such example.
I avoid the ordinal position notation suggested by Bill, for reasons documented in this SO question.
The easy and robust alternative is to always repeat the expression in the GROUP BY clause.
DRY does NOT apply to SQL.

Beware of using aliases when grouping the results from a view in SQLite. You will get unexpected results if the alias name is the same as the column name of any of the view's underlying tables.

Back in the day, I found that Rdb, the former DEC product now supported by Oracle, allowed the column alias to be used in the GROUP BY. Mainstream Oracle through version 11 does not allow the column alias to be used in the GROUP BY. Not sure what PostgreSQL, SQL Server, MySQL, etc. will or won't allow. YMMV.

In at least Postgres, you can use the alias name in the group by clause:
SELECT
itemName as ItemName1,
substring(itemName, 1,1) as FirstLetter,
Count(itemName)
FROM table1
GROUP BY ItemName1, FirstLetter;
I wouldn't recommend an alias that differs from the column name only in capitalization; that causes confusion.
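SQLite behaves like Postgres here, so a minimal sketch with Python's sqlite3 module (a stand-in database, with hypothetical rows) shows the alias version working when the aliases do not collide with any column of table1:

```python
import sqlite3

# Hypothetical sample data standing in for table1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (itemName TEXT)")
conn.executemany(
    "INSERT INTO table1 (itemName) VALUES (?)",
    [("apple",), ("apple",), ("banana",)],
)

# ItemName1 and FirstLetter are not columns of table1, so the GROUP BY
# names fall back to the SELECT-list aliases.
rows = conn.execute(
    """
    SELECT itemName AS ItemName1,
           substr(itemName, 1, 1) AS FirstLetter,
           COUNT(itemName)
    FROM table1
    GROUP BY ItemName1, FirstLetter
    ORDER BY ItemName1
    """
).fetchall()

print(rows)
```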

Related

SAS SQL SELECT DISTINCT WITH GROUP BY

What if I have SQL code like the following?
Proc SQL;
SELECT DISTINCT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
QUIT;
I'm using DISTINCT together with GROUP BY. Can this combination cause any error, or is DISTINCT just a redundant word here?
This won't error, but it is unnecessary redundancy. GROUP BY ID guarantees that each ID appears on only one row in the result set. There is no benefit to adding DISTINCT here, and it makes the intent of the query harder to understand.
On the other hand, there are situations where you would use DISTINCT without GROUP BY: typically when you want to deduplicate a set of columns, but do not need to use aggregate functions (SUM(), COUNT()...).
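To see that DISTINCT changes nothing once the query is grouped by ID, the sketch below runs both variants through Python's sqlite3 module and compares the results. SQLite is a stand-in for SAS PROC SQL (an assumption), the rows are hypothetical, and the column NO is quoted because NO is a keyword in some dialects.

```python
import sqlite3

# Hypothetical rows for CUSTOMER_LIST.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE CUSTOMER_LIST (ID INTEGER, AMOUNT REAL, "NO" INTEGER)')
conn.executemany(
    "INSERT INTO CUSTOMER_LIST VALUES (?, ?, ?)",
    [(1, 10.0, 1), (1, 5.0, 2), (2, 7.5, 1)],
)

# Same query, with and without DISTINCT.
query = """
    SELECT {distinct} ID, SUM(AMOUNT) AS M, SUM("NO") AS CNT
    FROM CUSTOMER_LIST
    GROUP BY ID
    ORDER BY CNT DESC
"""
with_distinct = conn.execute(query.format(distinct="DISTINCT")).fetchall()
without_distinct = conn.execute(query.format(distinct="")).fetchall()

print(with_distinct)
print(without_distinct)
```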
SELECT ID,SUM(AMOUNT) AS M,SUM(NO) AS CNT
FROM CUSTOMER_LIST
GROUP BY ID
ORDER BY CNT DESC;
Since we already group by ID, DISTINCT ID is unnecessary.

Remove duplicate sub-query

I have a complex SQL query that can be simplified to the below:
Select ColA,ColB,ColC,ColD
From MyTable
Where (ColA In (Select ItemID From Items Where ItemName like '%xxx%')
or ColB In (Select ItemID From Items Where ItemName like '%xxx%'))
As you can see, the sub-query appears twice. Is the database engine intelligent enough to detect this and get the result of the sub-query only once? Or does the sub-query run twice?
FYI, table Items has about 20,000 rows and MyTable has about 200,000 rows.
Is there another way to re-write this SQL statement so that the sub-query appears/runs only once?
Update: The Where clause in the main query is dynamic and added only when needed (i.e. only when a user searches for 'xxx'). Hence changes to the main select statement or re-structuring of the query are not possible.
UPDATE: Given your requirement not to change the query, only the WHERE clause:
You can inline the CTE as a derived table directly where it is used (untested):
Select ColA,ColB,ColC,ColD
From MyTable
Where EXISTS (SELECT 1 FROM (Select i.ItemID
From Items AS i
Where i.ItemName like '%xxx%') AS itm
WHERE itm.ItemID=MyTable.ColA OR itm.ItemID=MyTable.ColB)
Previous answer:
I think this should be the same...
WITH MyCTE AS
(
Select ItemID From Items Where ItemName like '%xxx%'
)
Select ColA,ColB,ColC,ColD
From MyTable
Where EXISTS (SELECT 1 FROM MyCTE WHERE ItemID=ColA OR ItemID=ColB)
A LIKE search with a leading wildcard ('%xxx%') cannot use an ordinary index seek, so it will not be fast in any variant.
If your LIKE filter reduces Items to just a few rows, test which variant is fastest.
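A minimal sketch, using Python's sqlite3 module as a stand-in database with hypothetical rows, that checks the CTE + EXISTS rewrite returns the same rows as the original double-IN query. It only shows that the results match; it says nothing about how many times a given engine evaluates the sub-query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (ItemID INTEGER, ItemName TEXT)")
conn.execute("CREATE TABLE MyTable (ColA INTEGER, ColB INTEGER, ColC TEXT, ColD TEXT)")
# Hypothetical rows: items 1 and 3 match '%xxx%', item 2 does not.
conn.executemany("INSERT INTO Items VALUES (?, ?)",
                 [(1, "axxxb"), (2, "plain"), (3, "xxx")])
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, 9, "r1", "-"), (9, 3, "r2", "-"), (2, 9, "r3", "-"), (3, 1, "r4", "-")],
)

# Original form: the sub-query is written twice.
original = conn.execute("""
    SELECT ColA, ColB, ColC, ColD FROM MyTable
    WHERE (ColA IN (SELECT ItemID FROM Items WHERE ItemName LIKE '%xxx%')
        OR ColB IN (SELECT ItemID FROM Items WHERE ItemName LIKE '%xxx%'))
    ORDER BY ColC
""").fetchall()

# Rewrite: the sub-query is written once in a CTE and probed with EXISTS.
with_cte = conn.execute("""
    WITH MyCTE AS (SELECT ItemID FROM Items WHERE ItemName LIKE '%xxx%')
    SELECT ColA, ColB, ColC, ColD FROM MyTable
    WHERE EXISTS (SELECT 1 FROM MyCTE WHERE ItemID = ColA OR ItemID = ColB)
    ORDER BY ColC
""").fetchall()

print(original)
print(with_cte)
```

Row r4 matches on both columns but still appears once, because EXISTS only tests for the presence of a match.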
You can also write the query like this:
SELECT ColA, ColB, ColC, ColD
FROM MyTable
WHERE EXISTS(
(SELECT ItemID FROM Items WHERE ItemName LIKE '%xxx%')
INTERSECT
SELECT t.v FROM (VALUES (ColA), (ColB)) AS t(v) )
The way you write a query does not dictate the actual execution order; that is up to the optimizer. It may evaluate identical subqueries only once, but that is not guaranteed, so check the execution plan.
There is a WITH clause in standard SQL.
WITH mySubQuery AS
(
[the subquery code]
)
SELECT ColA, ColB, ColC, ColD
FROM MyTable
WHERE ColA IN (SELECT * FROM mySubQuery)
OR ColB IN (SELECT * FROM mySubQuery)
SQL programmers can use a CTE (Common Table Expression) in such cases.
You can define a CTE from the sub-query once and reference it more than once in the SQL statement, although whether the engine evaluates it only once is still up to the optimizer.
Please refer to the SQL CTE Common Table Expression tutorial for samples.
CTEs are very powerful tools for SQL developers, especially when used for recursive queries.

SQL alias for SELECT statement

I would like to do something like
(SELECT ... FROM ...) AS my_select
WHERE id IN (SELECT MAX(id) FROM my_select GROUP BY name)
Is it possible to somehow do the "AS my_select" part (i.e. assign an alias to a SELECT statement)?
(Note: This is a theoretical question. I realize that I can do it without assigning an alias to a SELECT statement, but I would like to know whether I can do it that way.)
Not sure exactly what you're trying to express with that syntax, but in almost all RDBMSs you can use a subquery in the FROM clause (sometimes called an "inline view"):
SELECT ...
FROM (
SELECT ...
FROM ...
) my_select
WHERE ...
In advanced "enterprise" RDBMSs (like Oracle, SQL Server, PostgreSQL) you can use common table expressions, which allow you to refer to a query by name and reuse it, even multiple times:
-- Define the CTE expression name and column list.
WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear)
AS
-- Define the CTE query.
(
SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
)
-- Define the outer query referencing the CTE name.
SELECT SalesPersonID, COUNT(SalesOrderID) AS TotalSales, SalesYear
FROM Sales_CTE
GROUP BY SalesYear, SalesPersonID
ORDER BY SalesPersonID, SalesYear;
(example from http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx)
You can do this using the WITH clause of the SELECT statement:
;
WITH my_select As (SELECT ... FROM ...)
SELECT * FROM foo
WHERE id IN (SELECT MAX(id) FROM my_select GROUP BY name)
That's the ANSI/ISO SQL Syntax. I know that SQL Server, Oracle and DB2 support it. Not sure about the others...
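Applied to the exact shape in the question, the CTE lets my_select be read twice: once as the outer source and once inside the IN subquery. A minimal sketch using Python's sqlite3 module (SQLite supports WITH as well); the events table and its active flag are hypothetical stand-ins for "(SELECT ... FROM ...)".

```python
import sqlite3

# Hypothetical source table; my_select keeps only the active rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "a", 1), (2, "a", 1), (3, "b", 1), (4, "b", 0), (5, "c", 0)],
)

# my_select is named once and referenced twice.
rows = conn.execute("""
    WITH my_select AS (SELECT id, name FROM events WHERE active = 1)
    SELECT id, name FROM my_select
    WHERE id IN (SELECT MAX(id) FROM my_select GROUP BY name)
    ORDER BY id
""").fetchall()

print(rows)
```

For each name, only the row with the highest id among the filtered rows survives.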
Yes, but your subselect can return only one column (and at most one row), since it is used as a single value:
SELECT (SELECT id FROM bla) AS my_select FROM bla2
You could store the result in a temporary table.
So instead of using a CTE or subquery, you would use a temp table.
Good article on these here http://codingsight.com/introduction-to-temporary-tables-in-sql-server/

How can I use the GROUP BY SQL clause with no aggregate function?

When I try to use the following SELECT statement:
SELECT [lots of columns]
FROM Client, Customer, Document, Group
WHERE [some conditions]
GROUP BY Group.id
SQL Server complains that the columns I selected are not part of the GROUP BY statement nor an aggregate function. Am I using GROUP BY wrong? What should I be using instead?
To return only the GROUP BY values that occur exactly once, together with their associated field values, write a query like:
select group_field,
max(other_field1),
max(other_field2),
...
from mytable1
join mytable2 on ...
group by group_field
having count(*) = 1;
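A minimal sketch of this pattern with Python's sqlite3 module, simplified to a single hypothetical table instead of the join. Because each surviving group holds exactly one row, MAX() simply returns that row's value for every other column.

```python
import sqlite3

# Hypothetical table: "x" occurs once, "y" occurs twice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable1 (group_field TEXT, other_field1 TEXT, other_field2 INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable1 VALUES (?, ?, ?)",
    [("x", "only", 1), ("y", "first", 2), ("y", "second", 3)],
)

# Keep only groups with a single row; MAX() then returns that row's values.
rows = conn.execute("""
    SELECT group_field, MAX(other_field1), MAX(other_field2)
    FROM mytable1
    GROUP BY group_field
    HAVING COUNT(*) = 1
""").fetchall()

print(rows)
```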
Yes, you are using GROUP BY incorrectly. The point of using GROUP BY is to use aggregate functions. If you have no aggregate functions, you probably want SELECT DISTINCT instead.
SELECT DISTINCT
col1,
col2,
-- etc
coln
FROM Client
JOIN Customer ON ...
JOIN Document ON ...
JOIN [Group] ON ...
WHERE ...
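Without aggregates, GROUP BY over the selected columns and SELECT DISTINCT return the same set of rows. A minimal sketch with Python's sqlite3 module; the Client table's ClientID and City columns are hypothetical.

```python
import sqlite3

# Hypothetical Client rows with a repeated City value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Client (ClientID INTEGER, City TEXT)")
conn.executemany(
    "INSERT INTO Client VALUES (?, ?)",
    [(1, "Oslo"), (2, "Oslo"), (3, "Rome")],
)

# Deduplicate with DISTINCT.
distinct_rows = conn.execute(
    "SELECT DISTINCT City FROM Client ORDER BY City"
).fetchall()

# Deduplicate with GROUP BY and no aggregate function.
grouped_rows = conn.execute(
    "SELECT City FROM Client GROUP BY City ORDER BY City"
).fetchall()

print(distinct_rows)
print(grouped_rows)
```

DISTINCT states the intent (deduplication) directly, which is why it is the usual choice when no aggregate is involved.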
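My first guess would be that the problem is that you have a table called Group, which is a reserved word in SQL. Try delimiting the name as [Group] in SQL Server (or "Group" in standard SQL); wrapping it in single quotes ' ' would turn it into a string literal instead.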
You want to group by all the columns you are selecting that are not in an aggregate function.
SELECT ProductName, ProductCategory, SUM(ProductAmount)
FROM Products
GROUP BY ProductName, ProductCategory
This will give you one row per distinct combination of product name and category, with the total ProductAmount across all rows in that group.
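A minimal sketch of that query with Python's sqlite3 module and hypothetical Products rows. Note that the same product name in two categories forms two separate groups.

```python
import sqlite3

# Hypothetical rows: "Pen" appears in two categories.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProductName TEXT, ProductCategory TEXT, ProductAmount INTEGER)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [("Pen", "Office", 2), ("Pen", "Office", 3), ("Pen", "Gift", 1), ("Mug", "Kitchen", 4)],
)

# Every non-aggregated column in the SELECT list appears in GROUP BY.
rows = conn.execute("""
    SELECT ProductName, ProductCategory, SUM(ProductAmount)
    FROM Products
    GROUP BY ProductName, ProductCategory
    ORDER BY ProductName, ProductCategory
""").fetchall()

print(rows)
```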