Using a function-generated column in the where clause - sql

I have an SQL query that calls a stored SQL function. I want to do this:
SELECT dbo.fn_is_current('''{round}''', r.fund_cd, r.rnd) as [current]
FROM BLAH
WHERE current = 1
The select works fine, however, it doesn't know "current". Even though (without the WHERE) the data it generates does have the "current" column, and it's correct.
So, I'm assuming that this is a notation issue.

You cannot use an alias from the select in the where clause (or even again in the same select). Just use a subquery:
SELECT t.*
FROM (SELECT dbo.fn_is_current('''{round}''', r.fund_cd, r.rnd) as [current]
FROM BLAH
) t
WHERE [current] = 1;
As a note: current is a very bad name for a column because it is a reserved word (in many databases at least, including SQL Server). The word is used when defining cursors. Use something else, such as currval.
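The derived-table workaround can be tried end to end with a small sketch. Everything here is an assumption made for illustration: SQLite (through Python's sqlite3 module) stands in for SQL Server, the Python function is_current is a hypothetical replacement for dbo.fn_is_current with made-up logic, and the alias is renamed to is_cur to avoid the reserved word.

```python
import sqlite3

def is_current(fund_cd, rnd):
    # Hypothetical stand-in for dbo.fn_is_current: treat rounds >= 3 as current.
    return 1 if rnd >= 3 else 0

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it as fn_is_current(fund_cd, rnd).
conn.create_function("fn_is_current", 2, is_current)
conn.execute("CREATE TABLE blah (fund_cd TEXT, rnd INTEGER)")
conn.executemany("INSERT INTO blah VALUES (?, ?)", [("A", 1), ("B", 3), ("C", 5)])

# The computed column is named in the inner query, so the outer WHERE can use it.
rows = conn.execute(
    """
    SELECT t.fund_cd, t.is_cur
    FROM (SELECT r.fund_cd,
                 fn_is_current(r.fund_cd, r.rnd) AS is_cur
          FROM blah r) t
    WHERE t.is_cur = 1
    ORDER BY t.fund_cd
    """
).fetchall()
print(rows)  # [('B', 1), ('C', 1)]
```

The inner SELECT gives the function result a name and the outer query filters on it, which is the shape that should carry over to SQL Server unchanged apart from the function call itself.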

Related

Using calculation with an aliased column in ORDER BY

As we all know, the ORDER BY clause is processed after the SELECT clause, so a column alias in the SELECT clause can be used.
However, I find that I can’t use the aliased column in a calculation in the ORDER BY clause.
WITH data AS(
SELECT *
FROM (VALUES
('apple'),
('banana'),
('cherry'),
('date')
) AS x(item)
)
SELECT item AS s
FROM data
-- ORDER BY s; -- OK
-- ORDER BY item + ''; -- OK
ORDER BY s + ''; -- Fails
I know there are alternative ways of doing this particular query, and I know that this is a trivial calculation, but I’m interested in why the column alias doesn’t work when in a calculation.
I have tested in PostgreSQL, MariaDB, SQLite and Oracle, and it works as expected. SQL Server appears to be the odd one out.
The documentation clearly states that:
The column names referenced in the ORDER BY clause must correspond to
either a column or column alias in the select list or to a column
defined in a table specified in the FROM clause without any
ambiguities. If the ORDER BY clause references a column alias from
the select list, the column alias must be used standalone, and not as
a part of some expression in ORDER BY clause:
Technically speaking, your query should work, since the order by clause is logically evaluated after the select clause and should have access to all expressions declared in the select clause. But without access to the SQL specs I cannot say whether this is a limitation of SQL Server or whether the other RDBMSs implement it as a bonus feature.
Anyway, you can use CROSS APPLY as a trick.... it is part of FROM clause so the expressions should be available in all subsequent clauses:
SELECT item
FROM t
CROSS APPLY (SELECT item + '') AS CA(item_for_sort)
ORDER BY item_for_sort
It is simply due to the way expressions are evaluated. A more illustrative example:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
)
SELECT item AS s
FROM data
ORDER BY CASE WHEN 1 = 1 THEN s END;
This returns the same Invalid column name error. The CASE expression (and the concatenation of s + '' in the simpler case) is evaluated before the alias in the select list is resolved.
One workaround for your simpler case is to append the empty string in the select list:
SELECT
item + '' AS s
...
ORDER BY s;
There are more complex ways, like using a derived table or CTE:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
),
step2 AS
(
SELECT item AS s FROM data
)
SELECT s FROM step2 ORDER BY s+'';
This is just the way that SQL Server works, and I think you could say "well SQL Server is bad because of this" but SQL Server could also say "what the heck is this use case?" :-)
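The two-step CTE shape is portable, so it can be checked on an engine that is easy to run locally. The sketch below assumes SQLite through Python's sqlite3 module, and it sorts on length(s) rather than s + '' so that the expression actually changes the order; on SQL Server the function would be LEN. The point is only that once the alias is a real column of step2, any expression over it is allowed in ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# step2 turns the alias s into an ordinary column, so the outer ORDER BY
# can use it inside an expression (here: longest names first, then by name).
rows = conn.execute(
    """
    WITH data(item) AS (VALUES ('cherry'), ('apple'), ('date'), ('banana')),
    step2 AS (SELECT item AS s FROM data)
    SELECT s FROM step2
    ORDER BY length(s) DESC, s
    """
).fetchall()
print(rows)  # [('banana',), ('cherry',), ('apple',), ('date',)]
```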

Generate a variable in a SQL statement

I would like to declare a variable in an Oracle SQL statement so that I can work with it in the following lines. Here is a simple statement as an example:
SELECT customer.surname, LENGTH(customer.name) long, customer.age
FROM customer
WHERE long > 4;
I didn't find any "clear" info on the web. Is that even possible?
The order of operations for a select statement is not the same order in which it is written.
FROM (including joins and subqueries; for each subquery the order of operations starts over, working from the inside out like parentheses in algebra)
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
There are some exceptions to the above, as not all engines process a query quite this way. It appears you may be able to use an alias in a group by if you're using MySQL. I'm not familiar enough with it to know whether that changes the processing or whether MySQL is just looking ahead.
In this order you can see the where executes before the 'long' alias is generated, so the DB Engine doesn't know what long is at the time it's being executed. Put another way, long is not in scope at the time the where clause is being evaluated.
This can be solved by simply repeating the calculation in the where clause or by nesting queries; the latter is wordier, and in some engines it may be less efficient.
In the example below I:
Aliased customer as c to save typing and improve readability.
Rewrote the where clause to use the formula instead of the alias.
Renamed your long alias because LONG is a reserved word.
SELECT c.surname, LENGTH(c.name) as Name_Len, c.age
FROM customer c
WHERE LENGTH(c.name) > 4;
In this next example we use the WITH keyword to generate a set of data called a CTE (Common Table Expression) with the length of the name already calculated. This in effect changes the order in which the where clause is processed.
In this case the FROM is processed in the CTE then the select including our calculated value but no where clause is applied. Then a second query is run selecting from the CTE data set with the where clause. Since the first dataset already calculated the Name_Len, we can now use it in the where clause.
WITH CTE AS (SELECT c.surname, LENGTH(c.name) as Name_Len, c.age
FROM customer c)
SELECT *
FROM CTE
WHERE Name_Len > 4;
This could also be done as a subquery; but after you nest a few of those, you can see using a with may make it easier to read/maintain.
SELECT CTE.*
FROM (SELECT c.surname, LENGTH(c.name) as Name_Len, c.age
FROM customer c) CTE
WHERE CTE.Name_Len > 4;
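To see that the three forms above (repeated expression, CTE, inline view) select the same rows, here is a rough sketch. It assumes SQLite through Python's sqlite3 module instead of Oracle, and the three sample customers are invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (surname TEXT, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("Smith", "Al", 30), ("Jones", "Maria", 41), ("Brown", "Stephen", 25)],
)

queries = {
    # 1. Repeat the calculation in the WHERE clause.
    "repeated expression": """
        SELECT c.surname, LENGTH(c.name) AS name_len, c.age
        FROM customer c
        WHERE LENGTH(c.name) > 4
        ORDER BY c.surname""",
    # 2. Compute the value in a CTE, then filter on its name.
    "cte": """
        WITH cte AS (SELECT c.surname, LENGTH(c.name) AS name_len, c.age
                     FROM customer c)
        SELECT * FROM cte WHERE name_len > 4 ORDER BY surname""",
    # 3. Same idea as an inline view (derived table).
    "inline view": """
        SELECT v.* FROM (SELECT c.surname, LENGTH(c.name) AS name_len, c.age
                         FROM customer c) v
        WHERE v.name_len > 4 ORDER BY v.surname""",
}

results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
for label, rows in results.items():
    print(label, rows)  # each prints [('Brown', 7, 25), ('Jones', 5, 41)]
```

Which form is cheapest depends on the engine's optimizer, so this only demonstrates that the results agree, not that the plans do.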
The way you phrased the question is not quite right, but there is a solution to your problem in SQL.
SELECT *
FROM (SELECT customer.surname,
LENGTH (customer.name) col_long,
customer.age
FROM customer)
WHERE col_long > 4;
The sub-query here is called an inline view. For more details, check the Oracle documentation online.
Also, LONG is a reserved keyword, so either rename the alias or quote it as "long".
Have you searched online? This is literally covered everywhere... Something like this probably;
DECLARE aVariable NUMBER;
BEGIN
SELECT someColumn INTO aVariable FROM aTable;
END;

SQL pattern to get "and" list of multiple-row matches?

I'm not a database programmer, but I have a simple database-backed app where I have items with tags. Each item may have multiple tags, so I'm using a typical junction table (like this), where each row represents the fact that the item with the appropriate ID has the tag with the appropriate ID.
This works very logically when I want to do something like select all items with a given tag.
But, what is the typical pattern for doing AND searches? That is, what if I want to find all items which have all of a certain set of tags? This is such a common operation that I'd think some of the intro tutorials would cover it, but I guess I'm not looking in the right places.
The approach I tried was to use INTERSECT, first directly and then with subqueries and IN. This works, but builds up long-seeming queries quickly as I add search terms. And, crucially, this approach appears to be about an order of magnitude slower than the approach of shoving all the tags as text into one "tags" column and using SQLite's full-text search. (And, as I would expect/hope, the FTS search gets faster as I add more terms, which doesn't seem to be the case with the INTERSECTS approach.)
What's the proper design pattern here, and what's the right way to make it snappy? I'm using SQLite in this case, but I'm most interested in a general answer, since this must be a common thing to do.
The following is the standard ANSI SQL solution which avoids synchronizing the number of ids and the ids themselves.
with tag_ids (tid) as (
values (1), (2)
)
select id
from tags
where id in (select tid from tag_ids)
group by id
having count(*) = (select count(*) from tag_ids);
The values clause ("row constructor") is supported by PostgreSQL and DB2. For databases that don't support it, you can replace it with a simple "select", e.g. in Oracle this would be:
with tag_ids (tid) as (
select 1 as tid from dual
union all
select 2 from dual
)
select id
from tags
where id in (select tid from tag_ids)
group by id
having count(*) = (select count(*) from tag_ids);
For SQL Server you would simply leave out the "from dual", as it does not require a FROM clause for a SELECT.
This assumes that one tag can only be assigned exactly once. If that isn't the case, you would need to use a count(distinct id) in the having clause.
I would be inclined to use a group by:
select id
from tags
where id in (<tag1>, <tag2>)
group by id
having count(*) = 2
This would guarantee that both appear.
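Since the question mentions SQLite, here is a minimal sketch of the GROUP BY / HAVING pattern run against it from Python. The junction table item_tags(item_id, tag_id) and its rows are assumptions made up for the example; COUNT(DISTINCT tag_id) is used so that a tag accidentally assigned twice to the same item is not counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical junction table: one row per (item, tag) pair.
conn.execute("CREATE TABLE item_tags (item_id INTEGER, tag_id INTEGER)")
conn.executemany(
    "INSERT INTO item_tags VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3), (4, 2), (4, 3)],
)

wanted = [1, 2]  # find items that carry ALL of these tags
placeholders = ", ".join("?" for _ in wanted)
sql = f"""
    SELECT item_id
    FROM item_tags
    WHERE tag_id IN ({placeholders})
    GROUP BY item_id
    HAVING COUNT(DISTINCT tag_id) = ?
    ORDER BY item_id
"""
# The last parameter is the number of wanted tags, so query and list stay in sync.
rows = conn.execute(sql, (*wanted, len(wanted))).fetchall()
print(rows)  # [(1,), (3,)]
```

Adding a search term only lengthens the parameter list; the query text keeps the same shape, unlike a chain of INTERSECTs. An index on (tag_id, item_id) is the usual first thing to try for speed, though whether it helps here would need measuring.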
For an unlimited size list, you could store the ids in a string, such as '|tag1|tag2|tag3|' (note delimiters on ends). Then you can do:
select id
from tags
where @taglist like '%|'+tag+'|%'
group by id
having count(*) = len(@taglist) - len(replace(@taglist, '|', '')) - 1
This uses SQL Server syntax, and it says two things. The WHERE clause says that the tag is in the list. The HAVING clause says that the number of matches equals the length of the list. It does this with a trick: counting the number of separators and subtracting 1.
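The separator-counting arithmetic is easy to get wrong by one, so here is the same calculation as a small Python check. The function name tag_count is only illustrative; it mirrors the expression "length of the list, minus its length with separators removed, minus 1".

```python
def tag_count(taglist: str, sep: str = "|") -> int:
    # Number of separators = total length minus length without separators.
    separators = len(taglist) - len(taglist.replace(sep, ""))
    # A list delimited on both ends has one more separator than it has tags.
    return separators - 1

taglist = "|tag1|tag2|tag3|"
print(tag_count(taglist))    # 3
print("|tag2|" in taglist)   # True: the same test the LIKE '%|tag|%' performs
print("|tag|" in taglist)    # False: the delimiters prevent partial matches
```

The delimiters on both ends are what make the membership test safe: without them, a tag named "tag" would match inside "tag1".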

How to refer to a variable created in the course of executing a query in T-SQL, in the WHERE clause?

What the title says, really.
If I SELECT [statement] AS whatever, why can't I refer to the whatever column in the body of the WHERE clause? Is there some kind of a workaround? It's driving me crazy.
As far as I'm aware, you can't directly do this in SQL Server.
If you REALLY have to use your column alias in the WHERE clause, you can do this, but it seems like overkill to use a subquery just for the alias:
SELECT *
FROM
(
SELECT [YourColumn] AS YourAlias, etc...
FROM Whatever
) YourSubquery
WHERE YourAlias > 2
You're almost certainly better off just using the contents of the original column in your WHERE clause.
It has to do with the way a SELECT statement gets translated into an abstract query tree: the 'whatever' only appears in the query result projection part of the tree, which is above the filtering part of the tree, so the WHERE clause cannot understand the 'whatever'. This is not some internal implementation detail, it is a fundamental behavior of relational queries: the projection of the result occurs after the evaluation of the joins and filters.
It is really trivial to work around the 'problem' by making the hierarchy of the query explicit:
select ...
from (
select [something] as whatever
from ...
) as subquery
WHERE whatever = ...;
A common table expression can also serve the same purpose:
with cte as (
select [something] as whatever
from ...)
select ... from cte
WHERE whatever = ...;
It has to do with the order of operations in the select statement. The WHERE clause is evaluated before the SELECT clause, so the alias isn't available there. It is available in the ORDER BY clause, though, because that is processed last.
As others have mentioned, a sub-query will get around this problem.

Why use Select Top 100 Percent?

I understand that prior to SQL Server 2005, you could "trick" SQL Server into allowing an order by in a view definition by also including TOP 100 PERCENT in the SELECT clause. But I have inherited other code which uses SELECT TOP 100 PERCENT ... within dynamic SQL statements (used in ADO in ASP.NET apps, etc). Is there any reason for this? Isn't the result the same as not including the TOP 100 PERCENT?
It was used for "intermediate materialization".
Good article: Adam Machanic: Exploring the secrets of intermediate materialization
He even raised an MS Connect so it can be done in a cleaner fashion
My view is "not inherently bad", but don't use it unless 100% sure. The problem is, it works only at the time you do it and probably not later (patch level, schema, index, row counts etc)...
Worked example
This may fail because you don't know in which order things are evaluated
SELECT foo From MyTable WHERE ISNUMERIC (foo) = 1 AND CAST(foo AS int) > 100
And this may also fail, for the same reason:
SELECT foo
FROM
(SELECT foo From MyTable WHERE ISNUMERIC (foo) = 1) bar
WHERE
CAST(foo AS int) > 100
However, this did not fail in SQL Server 2000. The inner query is evaluated and spooled:
SELECT foo
FROM
(SELECT TOP 100 PERCENT foo From MyTable WHERE ISNUMERIC (foo) = 1 ORDER BY foo) bar
WHERE
CAST(foo AS int) > 100
Note, this still works in SQL Server 2005
SELECT TOP 2000000000 ... ORDER BY...
TOP (100) PERCENT is completely meaningless in recent versions of SQL Server, and it (along with the corresponding ORDER BY, in the case of a view definition or derived table) is ignored by the query processor.
You're correct that once upon a time, it could be used as a trick, but even then it wasn't reliable. Sadly, some of Microsoft's graphical tools put this meaningless clause in.
As for why this might appear in dynamic SQL, I have no idea. You're correct that there's no reason for it, and the result is the same without it (and again, in the case of a view definition or derived table, without both the TOP and ORDER BY clauses).
...allow use of an ORDER BY in a view definition.
That's not a good idea. A view should never have an ORDER BY defined.
An ORDER BY has an impact on performance: using it in a view means that the ORDER BY will turn up in the explain plan. If you have a query where the view is joined to anything in the immediate query, or referenced in an inline view (CTE/subquery factoring), the ORDER BY is always run prior to the final ORDER BY (assuming one was defined). There's no benefit to ordering rows that aren't the final result set when the query isn't using TOP (or LIMIT for MySQL/Postgres).
Consider:
CREATE VIEW my_view AS
SELECT i.item_id,
i.item_description,
it.item_type_description
FROM ITEMS i
JOIN ITEM_TYPES it ON it.item_type_id = i.item_type_id
ORDER BY i.item_description
...
SELECT t.item_id,
t.item_description,
t.item_type_description
FROM my_view t
ORDER BY t.item_type_description
...is the equivalent to using:
SELECT t.item_id,
t.item_description,
t.item_type_description
FROM (SELECT i.item_id,
i.item_description,
it.item_type_description
FROM ITEMS i
JOIN ITEM_TYPES it ON it.item_type_id = i.item_type_id
ORDER BY i.item_description) t
ORDER BY t.item_type_description
This is bad because:
The example is ordering the list initially by the item description, and then it's reordered based on the item type description. It's wasted resources in the first sort - running as is does not mean it's running: ORDER BY item_type_description, item_description
It's not obvious what the view is ordered by due to encapsulation. This does not mean you should create multiple views with different sort orders...
If there is no ORDER BY clause, then TOP 100 PERCENT is redundant. (As you mention, this was the 'trick' with views)
[Hopefully the optimizer will optimize this away.]
I have seen other code which I have inherited which uses SELECT TOP 100 PERCENT
The reason for this is simple: Enterprise Manager used to try to be helpful and format your code to include this for you. There was no point ever trying to remove it as it didn't really hurt anything and the next time you went to change it EM would insert it again.
No reason but indifference, I'd guess.
Such query strings are usually generated by a graphical query tool. The user joins a few tables, adds a filter and a sort order, and tests the results. Since the user may want to save the query as a view, the tool adds a TOP 100 PERCENT. In this case, though, the user copies the SQL into his code, parameterizes the WHERE clause, and hides everything in a data access layer. Out of mind, out of sight.
Kindly try the below, Hope it will work for you.
SELECT TOP
( SELECT COUNT(foo)
From MyTable
WHERE ISNUMERIC (foo) = 1) *
FROM bar WITH(NOLOCK)
WHERE CAST(foo AS int) > 100
ORDER BY foo
The error says it all...
Msg 1033, Level 15, State 1, Procedure TestView, Line 5 The ORDER BY
clause is invalid in views, inline functions, derived tables,
subqueries, and common table expressions, unless TOP, OFFSET or FOR
XML is also specified.
Don't use TOP 100 PERCENT, use TOP n, where N is a number
The TOP 100 PERCENT (for reasons I don't know) is ignored by SQL Server VIEW (post 2012 versions), but I think MS kept it for syntax reasons. TOP n is better and will work inside a view and sort it the way you want when a view is used initially, but be careful.
I would suppose that you can use a variable in the result, but aside from getting the ORDER BY piece into a view, you will not see a benefit from explicitly stating "TOP 100 PERCENT":
declare @t int
set @t=100
select top (@t) percent * from tableOf
Just try this, it explains it pretty much itself. You can't create a view with an ORDER BY except if...
CREATE VIEW v_Test
AS
SELECT name
FROM sysobjects
ORDER BY name
GO
Msg 1033, Level 15, State 1, Procedure TestView, Line 5 The ORDER BY
clause is invalid in views, inline functions, derived tables,
subqueries, and common table expressions, unless TOP, OFFSET or FOR
XML is also specified.