Using a calculation with an aliased column in ORDER BY - sql

As we all know, the ORDER BY clause is processed after the SELECT clause, so a column alias in the SELECT clause can be used.
However, I find that I can’t use the aliased column in a calculation in the ORDER BY clause.
WITH data AS(
SELECT *
FROM (VALUES
('apple'),
('banana'),
('cherry'),
('date')
) AS x(item)
)
SELECT item AS s
FROM data
-- ORDER BY s; -- OK
-- ORDER BY item + ''; -- OK
ORDER BY s + ''; -- Fails
I know there are alternative ways of doing this particular query, and I know that this is a trivial calculation, but I’m interested in why the column alias doesn’t work when in a calculation.
I have tested in PostgreSQL, MariaDB, SQLite and Oracle, and it works as expected. SQL Server appears to be the odd one out.

The documentation clearly states that:
The column names referenced in the ORDER BY clause must correspond to
either a column or column alias in the select list or to a column
defined in a table specified in the FROM clause without any
ambiguities. If the ORDER BY clause references a column alias from
the select list, the column alias must be used standalone, and not as
a part of some expression in ORDER BY clause:
Technically speaking, your query should work, since the ORDER BY clause is logically evaluated after the SELECT clause and should have access to all expressions declared in the SELECT clause. But without access to the SQL specs I cannot say whether this is a limitation of SQL Server or a bonus feature implemented by the other RDBMSs.
Anyway, you can use CROSS APPLY as a trick. It is part of the FROM clause, so its expressions should be available in all subsequent clauses:
SELECT item
FROM t
CROSS APPLY (SELECT item + '') AS CA(item_for_sort)
ORDER BY item_for_sort
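
For reference, here is a self-contained sketch of the same CROSS APPLY trick applied to the data from the question. The aliases d and ca are made up for illustration, and this is SQL Server syntax (other databases spell lateral joins differently):

```sql
-- Sketch only: the question's sample data, sorted by a calculated value
-- that is exposed through CROSS APPLY rather than through a SELECT alias.
WITH data AS (
    SELECT item
    FROM (VALUES ('apple'), ('banana'), ('cherry'), ('date')) AS x(item)
)
SELECT d.item AS s
FROM data AS d
CROSS APPLY (SELECT d.item + '' AS item_for_sort) AS ca  -- computed once per row
ORDER BY ca.item_for_sort;                               -- a real column, so expressions on it are fine too
```

Because item_for_sort is a column of the FROM clause rather than a select-list alias, it can also be used inside a larger ORDER BY expression.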

It is simply due to the way expressions are evaluated. A more illustrative example:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
)
SELECT item AS s
FROM data
ORDER BY CASE WHEN 1 = 1 THEN s END;
This returns the same Invalid column name error. The CASE expression (and the concatenation of s + '' in the simpler case) is evaluated before the alias in the select list is resolved.
One workaround for your simpler case is to append the empty string in the select list:
SELECT
item + '' AS s
...
ORDER BY s;
There are more complex ways, like using a derived table or CTE:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
),
step2 AS
(
SELECT item AS s FROM data
)
SELECT s FROM step2 ORDER BY s+'';
This is just the way that SQL Server works, and I think you could say "well SQL Server is bad because of this" but SQL Server could also say "what the heck is this use case?" :-)
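
For completeness, the derived-table variant mentioned above might look something like this. It is only a sketch; the aliases v and sq are arbitrary:

```sql
-- Sketch: the alias s is defined in the inner query, so in the outer query
-- it is an ordinary column and may appear inside an ORDER BY expression.
SELECT sq.s
FROM (
    SELECT v.item AS s
    FROM (VALUES ('apple'), ('banana')) AS v(item)
) AS sq
ORDER BY sq.s + '';
```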

Related

using subquery's column alias as a property in main query

I want to know if the main query can see the alias. Here's an example:
SELECT AVG(values)
FROM(
SELECT SUM(a1) AS values
FROM tableX
)
Does the first query see the alias "values"?
Yes, it does. The subquery creates a derived table, and aliases act as column names in that context. However, standard SQL requires that you give an alias to the subquery.
So:
SELECT AVG(vals)
FROM(
SELECT SUM(a1) AS vals
FROM tableX
) t --> alias of the subquery
Side notes:
values is a language keyword, hence not a good choice for a column name; I renamed it to vals in the query
Your example is really contrived; the subquery always returns one row, so aggregating again in the outer query makes little sense: this is guaranteed to return the same value as that of the subquery. A more useful example would put a group by clause in the subquery, like so
SELECT AVG(vals)
FROM(
SELECT SUM(a1) AS vals
FROM tableX
GROUP BY id
) t
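
If you want to try this without creating tableX, here is a sketch that uses made-up inline rows in its place (the values and the alias x are purely illustrative):

```sql
-- Sketch with hypothetical data: two groups (id 1 and id 2), each summing to 30.
SELECT AVG(t.vals) AS avg_of_sums
FROM (
    SELECT SUM(x.a1) AS vals          -- alias defined in the subquery...
    FROM (VALUES (1, 10), (1, 20), (2, 30)) AS x(id, a1)
    GROUP BY x.id
) AS t;                               -- ...and visible outside as t.vals
```

With these rows the inner query yields two sums of 30, so the outer average is 30.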

OUTER/CROSS APPLY Subquery without FROM clause

Most online documentation or tutorials discussing OUTER|CROSS APPLY describe something like:
SELECT columns
FROM table OUTER|CROSS APPLY (SELECT … FROM …);
The subquery is normally a full SELECT … FROM … query.
I must have read somewhere that the subquery doesn’t need a FROM in which case the columns appear to come from the main query:
SELECT columns
FROM table OUTER|CROSS APPLY (SELECT … );
because I have used it routinely as a method to pre-calculate columns.
The question is what is really happening if the FROM is omitted from the sub query? Is it short for something else? I found that it does not mean the same as from the main table.
I have a sample here: http://sqlfiddle.com/#!18/0188f7/4/1
First consider
SELECT o.name, o.type
FROM sys.objects o
Now consider
SELECT o.name, (SELECT o.type) AS type
FROM sys.objects o
A SELECT without a FROM behaves as though it were selecting from an imaginary single-row table. The above doesn't change the results: the scalar subquery just acts as a correlated subquery and uses the value from the outer query.
APPLY behaves in the same way. References to columns from the outer query are just passed in as correlated parameters. So this is the same as
SELECT o.name, ca.type
FROM sys.objects o
CROSS APPLY (SELECT o.type) AS ca
But APPLY in general is more capable than a scalar subquery in the SELECT (in that it can act to expand a row out or remove rows from the result)
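
As a rough illustration of that last point, here is a sketch with made-up inline data (the names t, copies, v and n are hypothetical) in which APPLY both expands and removes rows:

```sql
-- Sketch: the row with copies = 2 is expanded into two rows,
-- and the row with copies = 0 disappears because the applied query is empty for it.
SELECT t.id, ca.n
FROM (VALUES (1, 2), (2, 0)) AS t(id, copies)
CROSS APPLY (
    SELECT v.n
    FROM (VALUES (1), (2), (3)) AS v(n)
    WHERE v.n <= t.copies            -- correlated reference to the outer row
) AS ca;
```

Swapping CROSS APPLY for OUTER APPLY would keep the id = 2 row, with NULL in n.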
What you have mentioned is not a subquery. It is a separate table expression. Whether or not you use a FROM clause in the right expression is not a problem.
If you use a FROM clause in the right table expression, then that clause is the source of the data for the right table expression.
If you don't use a FROM clause in the right expression, your source of data is the left table expression.
First, let's look at what the APPLY operator is. Reference: BOL
Using APPLY
Both the left and right operands of the APPLY operator are table
expressions. The main difference between these operands is that the
right_table_source can use a table-valued function that takes a column
from the left_table_source as one of the arguments of the function.
The left_table_source can include table-valued functions, but it
cannot contain arguments that are columns from the right_table_source.
The APPLY operator works in the following way to produce the table
source for the FROM clause:
Evaluates right_table_source against each row of the left_table_source to produce rowsets.
The values in the right_table_source depend on left_table_source.
right_table_source can be represented approximately this way:
TVF(left_table_source.row), where TVF is a table-valued function.
Combines the result sets that are produced for each row in the evaluation of right_table_source with the left_table_source by
performing a UNION ALL operation.
The list of columns produced by the result of the APPLY operator is
the set of columns from the left_table_source that is combined with
the list of columns from the right_table_source.
Depending on how you use the APPLY operator, it will behave like a correlated subquery or like a CROSS JOIN.
Using values of the left table expression in right table expression
-- without FROM (similar to Correlated Subquery)
SELECT id, data, value
FROM test OUTER APPLY(SELECT data*10 AS value) AS sq;
Not using values of left table expression in right table expression
-- FROM table (Similar to cross join)
SELECT id, data, value
FROM test OUTER APPLY(SELECT data*10 AS value FROM test) AS sq;
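
To see the difference in row counts, here is a self-contained sketch. It assumes a two-row stand-in for the test table, declared as a table variable; the aliases t and i are only for illustration:

```sql
-- Hypothetical stand-in for the test table used above.
DECLARE @test TABLE (id int, data int);
INSERT INTO @test (id, data) VALUES (1, 5), (2, 7);

-- Without FROM: one applied row per outer row, so 2 rows in total.
SELECT t.id, t.data, sq.value
FROM @test AS t
OUTER APPLY (SELECT t.data * 10 AS value) AS sq;

-- With FROM: the applied query returns every row of the inner table
-- for each outer row, so 2 x 2 = 4 rows in total.
SELECT t.id, t.data, sq.value
FROM @test AS t
OUTER APPLY (SELECT i.data * 10 AS value FROM @test AS i) AS sq;
```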
Omitting the FROM statement is not specific to a CROSS/OUTER APPLY; any valid SQL select statement can omit it. By not using FROM you have no source for your data, so you can't specify columns within that source. Rather you can only select values that already exist; be that constants defined in the statement itself, or in some cases (e.g. subqueries) columns referenced from other parts of the query.
This is simpler to understand if you're familiar with Oracle's Dual table; a table with 1 row. In MS SQL that table would look like this:
-- Ref: https://blog.sqlauthority.com/2010/07/20/sql-server-select-from-dual-dual-equivalent/
CREATE TABLE DUAL
(
DUMMY VARCHAR(1) NOT NULL
, CONSTRAINT CHK_ColumnD_DocExc CHECK (DUMMY = 'X') -- ensure this column can only hold the value X
, CONSTRAINT PK_DUAL PRIMARY KEY (DUMMY) -- ensure we can only have unique values... combined with the above means we can only ever have 1 row
)
GO
INSERT INTO DUAL (DUMMY)
VALUES ('X')
GO
You can then do select 1 one, 'something else' two from dual. You're not really using dual; just ensuring that you have a table which will always return exactly 1 row.
Now, anywhere in SQL that you omit a FROM clause, read the statement as if it said FROM DUAL; it has the same meaning, only SQL allows this shorthand.
Update
You mention in the comments that you don't see how you can reference columns from the original statement when in a subquery (e.g. of the kind you may see when using APPLY). The code below shows this without the APPLY scenario. Admittedly, the demo code here is not something you'd ever use (since you could just do where Something like '%o%' on the original statement without needing the subquery/in statement), but for illustrative purposes it shows exactly the same sort of scenario as you've got with APPLY; i.e. the statement just returns the value of Something for the current row.
declare @someTable table (
Id bigint not null identity(1,1)
, Something nvarchar(32) not null
)
insert @someTable (Something) values ('one'), ('two'), ('three')
select *
from @someTable x
where x.Something in
(
-- this subquery references the SOMETHING column from above, but doesn't have a FROM statement
-- note: there is only 1 value at a time for something here; not all 3 values at once; it's the same single value as Something as we have before the in keyword above
select Something
where Something like '%o%'
)

ORDER BY in a partition - SELECT keyword

Please see the DDL below:
create table #names (name varchar(20), Gender char(1))
insert into #names VALUES ('Ian', 'M')
insert into #names values ('Marie', 'F')
insert into #names values ('andy', 'F')
insert into #names values ('karen', 'F')
and the SQL below:
select row_number() over (order by (select null)) from #names
This adds a unique number to each row. I could also do this (which does not add a unique number to each row, because the numbering restarts for each gender):
select row_number() over (partition by gender order by (name)) from #names
Why do you not need 'SELECT name', yet you do need 'SELECT null'?
As far as I can tell, this is just a quirk of SQL Server. SQL Server does not permit constants in ORDER BY (nor in GROUP BY, which can occur in other contexts).
Probably the origin of this is the ORDER BY clause in a SELECT statement:
ORDER BY 1
where "1" is a column reference rather than a constant. To prevent confusion, (I am guessing), the designers of the language do not allow other constants there. After all, would ORDER BY 2 + 1 refer to the third column? To the sum of the values in the two columns? To the constant 3?
I think this was just carried over into the window function syntax. There is a way around it -- as you have seen -- by using a subquery. The following should also work:
ROW_NUMBER() OVER (ORDER BY (CASE WHEN NAME = NULL THEN 'Never Happens' ELSE 'Always' END))
Because a column is mentioned, this is permitted. But, = NULL never returns true, so a constant is used for the sorting. I use the SELECT NULL subquery, however.
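
A small sketch of the SELECT NULL form, using inline rows instead of the #names table (the alias n is made up). The commented-out lines show the kind of bare constant that SQL Server rejects in a window ORDER BY, as described above:

```sql
-- Rejected by SQL Server (bare constant / integer index in a window ORDER BY):
--   ROW_NUMBER() OVER (ORDER BY NULL)
--   ROW_NUMBER() OVER (ORDER BY 1)

-- Accepted: the constant is wrapped in a subquery.
SELECT n.name,
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
FROM (VALUES ('Ian', 'M'), ('Marie', 'F'), ('andy', 'F')) AS n(name, Gender);
```

Since no real ordering is specified, which row receives which number is arbitrary.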
The Order By clause has 4 basic syntax structures:
1. Specifying a single column defined in the select list
2. Specifying a column that is not defined in the select list
3. Specifying an alias as the sort column
4. Specifying an expression as the sort column
You can review the MSDN documentation here.
https://msdn.microsoft.com/en-us/library/ms188385.aspx
I believe that your SELECT NULL, or really any constant that you want to specify in your order by clause, would require the select because the database engine is evaluating the constant as a #4 structure, an expression. As proof, in my query example, I have used a COUNT(*) in lieu of your select null.
I believe when you specify Name or Group in your order by clause, you are actually using a different order by structure, possibly #1. Here is my proof from the execution plan and results of your corrected first and original second query's sort operation. I have removed the partition because it's not relevant to our discussion.
select row_number() over (order by (select null)),name from #names
select row_number() over (order by name),name from #names
select row_number() over (order by Gender),name from #names
I have attached the execution plans for the three queries.
As you can see, no Sort Operation is performed on the data that is passed to the Segment operator which handles the window function. This is mirrored in the results of these queries, also pictured below.
So basically, SQL Server just ignored, or did not operate on, your Order By clause sub-query, because it could not associate the values returned by the sub-query with a particular parent column using method #4. The reason that you do not specify "SELECT name" in your Order By is that you are actually using a different Order By syntax structure.

Ordering by expression from Select

I need to make a query like this:
SELECT (t.a-t.b) AS 'difference'
FROM t
ORDER BY abs(t.a-t.b)
Is there a way not to duplicate code (t.a-t.b) ? Thank you for your answers
You can wrap the SQL statement and then perform the ORDER BY if you're performing an absolute value on it.
SELECT * FROM
(
SELECT (t.a-t.b) AS "difference"
FROM t
) a
ORDER BY abs(a.difference)
UPDATE: I used SQL Server the 1st time, but depending on your environment (Oracle, MySQL), you may need to include double quotes around the column alias, so:
SELECT * FROM
(
SELECT (t.a-t.b) AS "difference"
FROM t
) a
ORDER BY abs(a."difference")
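
If you are on SQL Server specifically, another option is to compute the expression once in a CROSS APPLY and reuse it in both places. This is only a sketch: the inline rows stand in for table t, and the names ca and diff are made up:

```sql
-- Sketch (SQL Server syntax): t.a - t.b is written once and reused.
SELECT ca.diff AS difference
FROM (VALUES (1, 5), (7, 2), (3, 3)) AS t(a, b)   -- hypothetical stand-in for t
CROSS APPLY (SELECT t.a - t.b AS diff) AS ca
ORDER BY ABS(ca.diff);
```

With these rows the differences are -4, 5 and 0, so the output order is 0, -4, 5.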

SQLServer SQL query with a row counter

I have a SQL query, that returns a set of rows:
SELECT id, name FROM users where group = 2
I need to also include a column that has an incrementing integer value, so the first row needs to have a 1 in the counter column, the second a 2, the third a 3 etc
The query shown here is just a simplified example, in reality the query could be arbitrarily complex, with several joins and nested queries.
I know this could be achieved using a temporary table with an autonumber field, but is there a way of doing it within the query itself ?
For starters, something along the lines of:
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
However, it's important to note that the ROW_NUMBER() OVER (ORDER BY ...) construct only determines the values of Row_Counter, it doesn't guarantee the ordering of the results.
Unless the SELECT itself has an explicit ORDER BY clause, the results could be returned in any order, dependent on how SQL Server decides to optimise the query. (See this article for more info.)
The only way to guarantee that the results will always be returned in Row_Counter order is to apply exactly the same ordering to both the SELECT and the ROW_NUMBER():
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
ORDER BY my_order_column -- exact copy of the ordering used for Row_Counter
The above pattern will always return results in the correct order and works well for simple queries, but what about an "arbitrarily complex" query with perhaps dozens of expressions in the ORDER BY clause? In those situations I prefer something like this instead:
SELECT t.*
FROM
(
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY ...) AS Row_Counter -- complex ordering
FROM my_table
) AS t
ORDER BY t.Row_Counter
Using a nested query means that there's no need to duplicate the complicated ORDER BY clause, which means less clutter and easier maintenance. The outer ORDER BY t.Row_Counter also makes the intent of the query much clearer to your fellow developers.
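
Here is a runnable sketch of that nested pattern. The inline rows stand in for the users table, and the choice of name, id as the ordering is just an example:

```sql
-- Sketch: the ordering is written once, inside ROW_NUMBER(),
-- and the outer query simply sorts by the generated counter.
SELECT t.id, t.name, t.Row_Counter
FROM (
    SELECT u.id, u.name,
           ROW_NUMBER() OVER (ORDER BY u.name, u.id) AS Row_Counter
    FROM (VALUES (10, 'carol'), (20, 'alice'), (30, 'bob')) AS u(id, name)
) AS t
ORDER BY t.Row_Counter;   -- guarantees the rows come back as 1, 2, 3
```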
In SQL Server 2005 and up, you can use the ROW_NUMBER() function, which has options for the sort order and the groups over which the counts are done (and reset).
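
For example, a counter that resets for each group might be sketched like this (the column grp and the inline rows are made up for illustration):

```sql
-- Sketch: PARTITION BY restarts the numbering at 1 for each grp value.
SELECT u.grp, u.name,
       ROW_NUMBER() OVER (PARTITION BY u.grp ORDER BY u.name) AS Row_Counter
FROM (VALUES (1, 'alice'), (1, 'bob'), (2, 'carol')) AS u(grp, name)
ORDER BY u.grp, Row_Counter;
```

Here alice and bob get 1 and 2 within group 1, and carol gets 1 within group 2.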
The simplest way is to use a variable as a row counter (note that this is MySQL user-variable syntax). However, it takes two actual SQL commands: one to set the variable, and then the query, as follows:
SET @n=0;
SELECT @n:=@n+1, a.* FROM tablename a
Your query can be as complex as you like with joins etc. I usually make this a stored procedure. You can have all kinds of fun with the variable, even use it to calculate against field values. The key is the :=
Here's a different approach.
If you have several tables of data that are not joinable, or for some reason you don't want to count all the rows at the same time but still want them to be part of the same row count, you can create a table that does the job for you.
Example:
create table #test (
rowcounter int identity,
invoicenumber varchar(30)
)
insert into #test(invoicenumber) select [column] from [Table1]
insert into #test(invoicenumber) select [column] from [Table2]
insert into #test(invoicenumber) select [column] from [Table3]
select * from #test
drop table #test