Is it possible to use the result of a subquery in a case statement of the same outer query? - sql

I am writing a search routine with a ranking algorithm and would like to get this in one pass.
My Ideal query would be something like this....
select *, (select top 1 wordposition
from wordpositions
where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502
) as WordPos,
case when WordPos<11 then 1 else case WordPos<50 then 2 else case WordPos<100 then 3 else 4 end end end end as rank
from items
Is it possible to use WordPos in a case right there? It's generating an error on me , Invalid column name 'WordPos'.
I know I can redo the subquery for each case but I think it would actually re-run the case wouldn't it?
For example:
select *, case when (select top 1 wordposition from wordpositions where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502)<11 then 1 else case (select top 1 wordposition from wordpositions where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502)<50 then 2 else case (select top 1 wordposition from wordpositions where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502)<100 then 3 else 4 end end end end as rank from items
That works....but is it really re-running the identical query each time?
It's hard to tell from the tests as the first time it runs it's slow but subsequent runs are quick....it's caching...so would that mean that the first time it ran it for the first row, the subsequent three times it would get the result from cache?
Just curious what the best way to do this would be...
Thank you!
Ryan

You can do this using a subquery. I will stick with your SQL Server syntax, even though the question is tagged mysql:
select i.*,
(case when WordPos < 11 then 1
when WordPos < 50 then 2
when WordPos < 100 then 3
else 4
end) as rank
from (select i.*,
(select top 1 wpwordposition
from wordpositions wp
where recordid=i.pk_itemid and wordid=79588 and nextwordid=64502
) as WordPos
from items i
) i;
This also simplifies the case statement. You do not need nested case statements to handle multiple conditions, just multiple where clauses.

No. Identifiers introduced in the output clause (the fact that it comes from a sub-query is irrelevant) cannot be used within the same SELECT statement.
Here are some solutions:
Rewrite the query using a JOIN1, This will eliminate the issue entirely and fits well with RA.
Wrap the entire SELECT with the sub-query within another SELECT with the case. The outer select can access identifiers introduced by the inner SELECT's output clause.
Use a CTE (if SQL Server). This is similar to #2 in that it allows an identifier to be introduced.
While "re-writing" the sub-query for each case is very messy it should still result in an equivalent plan - but view the query profile! - as the results of the query are non-volatile. As such the equivalent sub-queries can be safely moved by the query planner which should move the sub-query/sub-queries to a JOIN to avoid any "re-running" in the first place.
1 Here is a conversion to use a JOIN, which is my preferred method. (I find that if a query can't be written in terms of a JOIN "easily" then it might be asking for the wrong thing or otherwise be showing issues with schema design.)
select
wp.wordposition as WordPos,
case wp.wordposition .. as Rank
from items i
left join wordpositions wp
on wp.recordid = i.pk_itemid
where wp.wordid = 79588
and wp.nextwordid = 64502
I've made assumptions about the multiplicity here (i.e. that wordid is unique) which should be verified. If this multiplicity is not valid and not correctable otherwise (and you're indeed using SQL Server), then I'd recommend using ROW_NUMBER() and a CTE.

Related

SQL CASE WHEN- can I do a function within a function? New to SQL

SELECT
SP.SITE,
SYS.COMPANY,
SYS.ADDRESS,
SP.CUSTOMER,
SP.STATUS,
DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) AS MONTH_COUNT
CASE WHEN(MONTH_COUNT = 0 THEN MONTH_COUNT = DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES) AS DAY_COUNT)
ELSE NULL
END
FROM SALEPASSES AS SP
INNER JOIN SYSTEM AS SYS ON SYS.SITE = SP.SITE
WHERE STATUS IN (7,27,29);
I am still trying to understand SQL. Is this the right order to have everything? I'm assuming my datediff() is unable to work because it's inside case when. What I am trying to do, is get the day count if month_count is less than 1 (meaning it's less than one month and we need to count the days between the dates instead). I need month_count to run first to see if doing the day_count would even be necessary. Please give me feedback, I'm new and trying to learn!
Case is an expression, it returns a value, it looks like you should be doing this:
DAY_COUNT =
CASE WHEN DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) = 0
THEN DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES))
ELSE NULL END
You shouldn't actually need else null as NULL is the default.
Note also you [usually] cannot refer to a derived column in the same select
It appears that what you are trying to do is define the MonthCount column's value, and then reuse that value in another column's definition. (The Don't Repeat Yourself principle.)
In most dialects of SQL, you can't do that. Including MS SQL Server.
That's because SQL is a "declarative" language. This means that SQL Server is free to calculate the column values in any order that it likes. In turn, that means you're not allowed to do anything that would rely on one column being calculated before another.
There are two basic ways around that...
First, use CTEs or sub-queries to create two different "scopes", allowing you to define MonthCount before DayCount, and so reuse the value without retyping the definition.
SELECT
*,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
(
SELECT
*,
bar AS MonthCount
FROM
x
)
AS derive_month
The second main way is to somehow derive the value Before the SELECT block is evaluated. In this case, using APPLY to 'join' a single value on to each input row...
SELECT
x.*,
MonthCount,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
x
CROSS APPLY
(
SELECT
bar AS MonthCount
)
AS derive_month

Tips about optimizing this multi-layered (with many layers of subqueries) SQL query

I need your kind help with this SQL query. I have a query with 6 layers of subqueries that's currently structured like this. I am looking forward to advice how to:
Reduce the layers without repeating the same statement (for example, I could replace 'case when E>200' with '(Case when T2.BB >100 then B+C else B+D end) > 200' and write the statement in layer 1, hence eliminating layer2. I can't do this because in my raw queries I have a computed column that is based on another computed column in its sub-query, which is then calculated based on another computed column in the sub-sub-query... So repeating codes 5/6 time will confuse me and drive me crazy...
Avoid using select 2., select 1. while still keeping all the columns (F,E,A,B,C,D,T2.BB) in the final output. I want to do this because in my raw query there is 5 select.* -- I feel like this causes the servers to do much redundant work and slows down query execution.
Thanks very much for your help!
Select
2.*,
case when E > 200 then 'OK' else 'OH NO' end F
From
(Select
1.*,
Case when T2.BB >100 then B+C else B+D end E
From
(Select
A, B, C, D, T2.BB
From
T1
Join
T2 on T1.A = T2.AA) 1
) 2
try WITH statements, they look cleaner because they help to maintain the order in which you apply the logic, so it will look like this:
WITH
t1 as (
select ...
from src_table
)
,t2 as (
select *, ...
from t1
)
<<as much layers as needed>>
also if you need to reuse something in 2 different places you can reference the prior WITH statement from any following statement, i.e. encapsulate that logic in one place

CASE Clause on select clause throwing 'SQLCODE=-811, SQLSTATE=21000' Error

This query is very well working in Oracle. But it is not working in DB2. It is throwing
DB2 SQL Error: SQLCODE=-811, SQLSTATE=21000, SQLERRMC=null, DRIVER=3.61.65
error when the sub query under THEN clause is returning 2 rows.
However, my question is why would it execute in the first place as my WHEN clause turns to be false always.
SELECT
CASE
WHEN (SELECT COUNT(1)
FROM STOP ST,
FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC=1
AND ST.SHIPMENT_ID = 2779) = 1
THEN
(SELECT ST.FACILITY_ALIAS_ID
FROM STOP ST,
FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC=1
AND ST.SHIPMENT_ID = 2779
)
ELSE NULL
END STAPPFAC
FROM SHIPMENT SHIPMENT
WHERE SHIPMENT.SHIPMENT_ID IN (2779);
The SQL standard does not require short cut evaluation (ie evaluation order of the parts of the CASE statement). Oracle chooses to specify shortcut evaluation, however DB2 seems to not do that.
Rewriting your query a little for DB2 (8.1+ only for FETCH in subqueries) should allow it to run (unsure if you need the added ORDER BY and don't have DB2 to test on at the moment)
SELECT
CASE
WHEN (SELECT COUNT(1)
FROM STOP ST,
FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC=1
AND ST.SHIPMENT_ID = 2779) = 1
THEN
(SELECT ST.FACILITY_ALIAS_ID
FROM STOP ST,
FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC=1
AND ST.SHIPMENT_ID = 2779
ORDER BY ST.SHIPMENT_ID
FETCH FIRST 1 ROWS ONLY
)
ELSE NULL
END STAPPFAC
FROM SHIPMENT SHIPMENT
WHERE SHIPMENT.SHIPMENT_ID IN (2779);
Hmm... you're running the same query twice. I get the feeling you're not thinking in sets (how SQL operates), but in a more procedural form (ie, how most common programming languages work). You probably want to rewrite this to take advantage of how RDBMSs are supposed to work:
SELECT Current_Stop.facility_alias_id
FROM SYSIBM/SYSDUMMY1
LEFT JOIN (SELECT MAX(Stop.facility_alias_id) AS facility_alias_id
FROM Stop
JOIN Facility
ON Facility.facility_id = Stop.facility_id
AND Facility.is_dock_sched_fac = 1
WHERE Stop.shipment_id = 2779
HAVING COUNT(*) = 1) Current_Stop
ON 1 = 1
(no sample data, so not tested. There's a couple of other ways to write this based on other needs)
This should work on all RDBMSs.
So what's going on here, why does this work? (And why did I remove the reference to Shipment?)
First, let's look at your query again:
CASE WHEN (SELECT COUNT(1)
FROM STOP ST, FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC = 1
AND ST.SHIPMENT_ID = 2779) = 1
THEN (SELECT ST.FACILITY_ALIAS_ID
FROM STOP ST, FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC = 1
AND ST.SHIPMENT_ID = 2779)
ELSE NULL END
(First off, stop using the implicit-join syntax - that is, comma-separated FROM clauses - always explicitly qualify your joins. For one thing, it's way too easy to miss a condition you should be joining on)
...from this it's obvious that your statement is the 'same' in both queries, and shows what you're attempting - if the dataset has one row, return it, otherwise the result should be null.
Enter the HAVING clause:
HAVING COUNT(*) = 1
This is essentially a WHERE clause for aggregates (functions like MAX(...), or here, COUNT(...)). This is useful when you want to make sure some aspect of the entire set matches a given criteria. Here, we want to make sure there's just one row, so using COUNT(*) = 1 as the condition is appropriate; if there's more (or less! could be 0 rows!) the set will be discarded/ignored.
Of course, using HAVING means we're using an aggregate, the usual rules apply: all columns must either be in a GROUP BY (which is actually an option in this case), or an aggregate function. Because we only want/expect one row, we can cheat a little, and just specify a simple MAX(...) to satisfy the parser.
At this point, the new subquery returns one row (containing one column) if there was only one row in the initial data, and no rows otherwise (this part is important). However, we actually need to return a row regardless.
FROM SYSIBM/SYSDUMMY1
This is a handy dummy table on all DB2 installations. It has one row, with a single column containing '1' (character '1', not numeric 1). We're actually interested in the fact that it has only one row...
LEFT JOIN (SELECT ... )
ON 1 = 1
A LEFT JOIN takes every row in the preceding set (all joined rows from the preceding tables), and multiplies it by every row in the next table reference, multiplying by 1 in the case that the set on the right (the new reference, our subquery) has no rows. (This is different from how a regular (INNER) JOIN works, which multiplies by 0 in the case that there is no row) Of course, we only maybe have 1 row, so there's only going to be a maximum of one result row. We're required to have an ON ... clause, but there's no data to actually correlate between the references, so a simple always-true condition is used.
To get our data, we just need to get the relevant column:
SELECT Current_Stop.facility_alias_id
... if there's the one row of data, it's returned. In the case that there is some other count of rows, the HAVING clause throws out the set, and the LEFT JOIN causes the column to be filled in with a null (no data) value.
So why did I remove the reference to Shipment? First off, you weren't using any data from the table - the only column in the result set was from the subquery. I also have good reason to believe that there would only be one row returned in this case - you're specifying a single shipment_id value (which implies you know it exists). If we don't need anything from the table (including the number of rows in that table), it's usually best to remove it from the statement: doing so can simplify the work the db needs to do.

How can you use COUNT() in a comparison in a SELECT CASE clause in Sql Server?

Let's say you want do something along the following lines:
SELECT CASE
WHEN (SELECT COUNT(id) FROM table WHERE column2 = 4) > 0
THEN 1 ELSE 0 END
Basically just return 1 when there's one or more rows in the table, 0 otherwise. There has to be a grammatically correct way to do this. What might it be? Thanks!
Question: return 1 when there's one or more rows in the table, 0 otherwise:
In this case, there is no need for COUNT. Instead, use EXISTS, which rather than counting all records will return as soon as any is found, which performs much better:
SELECT CASE
WHEN EXISTS (SELECT 1 FROM table WHERE column2 = 4)
THEN 1
ELSE 0
END
Mahmoud Gammal posted an answer with an interesting approach. Unfortunately the answer was deleted due to the fact that it returned the count of records instead of just 1. This can be fixed using the sign function, leading to this more compact solution:
SELECT sign(count(*)) FROM table WHERE column2 = 4
I posted this because I find it an interesting approach. In production I'd usually end up with something close to RedFilter's answer.
You could do this:
SELECT CASE WHEN COUNT(ID) >=1 THEN 1 WHEN COUNT (ID) <1 THEN 0 END FROM table WHERE Column2=4
Reference:
http://msdn.microsoft.com/en-us/library/ms181765.aspx

Why does "non exists" SQL query work and "not in" doesn't

I spent some time trying to figure out why this query isn't pulling the results i expected:
SELECT * FROM NGS WHERE ESPSSN NOT IN (SELECT SSN FROM CENSUS)
finally i tried writing the query another way and this ended up getting the expected results:
SELECT * FROM NGS n WHERE NOT EXISTS (SELECT * FROM CENSUS WHERE SSN = n.ESPSSN)
The first query seems more appropriate and "correct". I use "in" and "not in" all the time for similar selects and have never had a problem that i know of.
If you write out the syntactic sugar, x not in (1,2,3) becomes:
x <> 1 AND x <> 2 AND x <> 3
So if the ssn column contains a null value, the first query is the equivalent of:
WHERE ESPSSN <> NULL AND ESPSSN <> ...
The result of the comparison with NULL is unknown, so the query would not return anything.
As Andomar said, beware of NULL values when using NOT IN
Also note that a query using the NOT IN predicate will always perform nested full table scans, whereas a query using NOT EXISTS can use an index within the sub-query, and be much faster as a result.