This query is very well working in Oracle. But it is not working in DB2. It is throwing
DB2 SQL Error: SQLCODE=-811, SQLSTATE=21000, SQLERRMC=null, DRIVER=3.61.65
error when the sub query under THEN clause is returning 2 rows.
However, my question is why would it execute in the first place as my WHEN clause turns to be false always.
SELECT
CASE
WHEN (SELECT COUNT(1)
FROM STOP ST,
FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC=1
AND ST.SHIPMENT_ID = 2779) = 1
THEN
(SELECT ST.FACILITY_ALIAS_ID
FROM STOP ST,
FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC=1
AND ST.SHIPMENT_ID = 2779
)
ELSE NULL
END STAPPFAC
FROM SHIPMENT SHIPMENT
WHERE SHIPMENT.SHIPMENT_ID IN (2779);
The SQL standard does not require short cut evaluation (ie evaluation order of the parts of the CASE statement). Oracle chooses to specify shortcut evaluation, however DB2 seems to not do that.
Rewriting your query a little for DB2 (8.1+ only for FETCH in subqueries) should allow it to run (unsure if you need the added ORDER BY and don't have DB2 to test on at the moment)
SELECT
CASE
WHEN (SELECT COUNT(1)
FROM STOP ST,
FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC=1
AND ST.SHIPMENT_ID = 2779) = 1
THEN
(SELECT ST.FACILITY_ALIAS_ID
FROM STOP ST,
FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC=1
AND ST.SHIPMENT_ID = 2779
ORDER BY ST.SHIPMENT_ID
FETCH FIRST 1 ROWS ONLY
)
ELSE NULL
END STAPPFAC
FROM SHIPMENT SHIPMENT
WHERE SHIPMENT.SHIPMENT_ID IN (2779);
Hmm... you're running the same query twice. I get the feeling you're not thinking in sets (how SQL operates), but in a more procedural form (ie, how most common programming languages work). You probably want to rewrite this to take advantage of how RDBMSs are supposed to work:
SELECT Current_Stop.facility_alias_id
FROM SYSIBM/SYSDUMMY1
LEFT JOIN (SELECT MAX(Stop.facility_alias_id) AS facility_alias_id
FROM Stop
JOIN Facility
ON Facility.facility_id = Stop.facility_id
AND Facility.is_dock_sched_fac = 1
WHERE Stop.shipment_id = 2779
HAVING COUNT(*) = 1) Current_Stop
ON 1 = 1
(no sample data, so not tested. There's a couple of other ways to write this based on other needs)
This should work on all RDBMSs.
So what's going on here, why does this work? (And why did I remove the reference to Shipment?)
First, let's look at your query again:
CASE WHEN (SELECT COUNT(1)
FROM STOP ST, FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC = 1
AND ST.SHIPMENT_ID = 2779) = 1
THEN (SELECT ST.FACILITY_ALIAS_ID
FROM STOP ST, FACILITY FAC
WHERE ST.FACILITY_ID = FAC.FACILITY_ID
AND FAC.IS_DOCK_SCHED_FAC = 1
AND ST.SHIPMENT_ID = 2779)
ELSE NULL END
(First off, stop using the implicit-join syntax - that is, comma-separated FROM clauses - always explicitly qualify your joins. For one thing, it's way too easy to miss a condition you should be joining on)
...from this it's obvious that your statement is the 'same' in both queries, and shows what you're attempting - if the dataset has one row, return it, otherwise the result should be null.
Enter the HAVING clause:
HAVING COUNT(*) = 1
This is essentially a WHERE clause for aggregates (functions like MAX(...), or here, COUNT(...)). This is useful when you want to make sure some aspect of the entire set matches a given criteria. Here, we want to make sure there's just one row, so using COUNT(*) = 1 as the condition is appropriate; if there's more (or less! could be 0 rows!) the set will be discarded/ignored.
Of course, using HAVING means we're using an aggregate, the usual rules apply: all columns must either be in a GROUP BY (which is actually an option in this case), or an aggregate function. Because we only want/expect one row, we can cheat a little, and just specify a simple MAX(...) to satisfy the parser.
At this point, the new subquery returns one row (containing one column) if there was only one row in the initial data, and no rows otherwise (this part is important). However, we actually need to return a row regardless.
FROM SYSIBM/SYSDUMMY1
This is a handy dummy table on all DB2 installations. It has one row, with a single column containing '1' (character '1', not numeric 1). We're actually interested in the fact that it has only one row...
LEFT JOIN (SELECT ... )
ON 1 = 1
A LEFT JOIN takes every row in the preceding set (all joined rows from the preceding tables), and multiplies it by every row in the next table reference, multiplying by 1 in the case that the set on the right (the new reference, our subquery) has no rows. (This is different from how a regular (INNER) JOIN works, which multiplies by 0 in the case that there is no row) Of course, we only maybe have 1 row, so there's only going to be a maximum of one result row. We're required to have an ON ... clause, but there's no data to actually correlate between the references, so a simple always-true condition is used.
To get our data, we just need to get the relevant column:
SELECT Current_Stop.facility_alias_id
... if there's the one row of data, it's returned. In the case that there is some other count of rows, the HAVING clause throws out the set, and the LEFT JOIN causes the column to be filled in with a null (no data) value.
So why did I remove the reference to Shipment? First off, you weren't using any data from the table - the only column in the result set was from the subquery. I also have good reason to believe that there would only be one row returned in this case - you're specifying a single shipment_id value (which implies you know it exists). If we don't need anything from the table (including the number of rows in that table), it's usually best to remove it from the statement: doing so can simplify the work the db needs to do.
Related
I'm trying to get the bi-grams on a string column.
I've followed the approach here but Athena/Presto is giving me errors at the final steps.
Source code so far
with word_list as (
SELECT
transaction_id,
words,
n,
regexp_extract_all(f70_remittance_info, '([a-zA-Z]+)') as f70,
f70_remittance_info
FROM exploration_transaction
cross join unnest(regexp_extract_all(f70_remittance_info, '([a-zA-Z]+)')) with ordinality AS t (words, n)
where cardinality((regexp_extract_all(f70_remittance_info, '([a-zA-Z]+)'))) > 1
and f70_remittance_info is not null
limit 50 )
select wl1.f70, wl1.n, wl1.words, wl2.f70, wl2.n, wl2.words
from word_list wl1
join word_list wl2
on wl1.transaction_id = wl2.transaction_id
The specific issue I'm having is on the very last line, when I try to self join the transaction ids - it always returns zero rows. It does work if I join only by wl1.n = wl2.n-1 (the position on the array) which is useless if I can't constrain it to a same id.
Athena doesn't support the ngrams function by presto, so I'm left with this approach.
Any clues why this isn't working?
Thanks!
This is speculation. But I note that your CTE is using limit with no order by. That means that an arbitrary set of rows is being returned.
Although some databases materialize CTEs, many do not. They run the code independently each time it is referenced. My guess is that the code is run independently and the arbitrary set of 50 rows has no transaction ids in common.
One solution would be to add order by transacdtion_id in the subquery.
Why are these two SQL queries not equivalent? One uses a correlated subquery, the other uses group by. The first produces a little over 51000 rows from my database, the second nearly 66000. In both cases, I am simply trying to return all the parts meeting the stated condition, current revision only. A comparison of the output files shows that method #1 (oracle_test1.txt) fails to return quite a few values. Based on that, I can only assume that method #2 is correct. I have some code that has used method #1 for a long time, but it appears I will have to change it. My reasoning concerning the correlated subquery was that as the inner select is comparing the columns in the self join, it will find the max vaule for the prev value for all matches; then return that max prev value for use in the outer query. I designed that query long ago before becoming familiar with the use of group by. Any insights would be appreciated.
Query #1
select pobj_name, prev
from pfmc_part
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED'
and prev = (select max(prev) from pfmc_part a where a.pobj_name = pfmc_part.pobj_name)
order by pobj_name, prev"
Query #2
select pobj_name, max(prev) prev
from pfmc_part
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED'
group by pobj_name
order by pobj_name, prev"
Sample output:
Query #2 Query #1
P538512 B P538512 B
P538513 A P538513 A
P538514 C P538514 C
P538520 B
P538522 B P538522 B
P538525 A P538525 A
P538531 C P538531 C
P538533 A P538533 A
P538538 B
P538541 B
P538542 B
P538553 A P538553 A
P538569 A P538569 A
Query 1 is returning each of the max ids and then those that have a pmodel of the type specified within your where clause.
Whereas query 2 is selecting all items with a pmodel of the type specified in your where clause and each of the max ids of that.
You may have data which isn't the max id which satisfies your where clause in query 2 which is why it's being omitted in query 1
There are two differences and the rest of the answers focus on one. The "easy" difference is that the max() in the group by is affected by the filter clause. The max() in the other query has no filter, and so it might return no rows (when max(prev) is on a row otherwise filtered out by the where conditions).
In addition, the where version of the query might return duplicate rows when there are multiple rows with the same value of max(prev) for a given pobj_name. The group by will never return duplicate rows.
this query
select pobj_name, prev
from pfmc_part
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED'
and prev = (select max(prev) from pfmc_part a where a.pobj_name = pfmc_part.pobj_name)
order by pobj_name, prev"
has a where clause declaration causing it to return less rows -- specifically, only rows where prev = (subquery). that and prev makes it entirely different, and also assigns the value into prev in the first line
if you wanted them to be the more similar, you'd need to modify it like so
select pobj_name, prev, maxes.max
from pfmc_part
JOIN (select max(prev) as max from pfmc_part a where a.pobj_name = pfmc_part.pobj_name) maxes
where pmodel in ('PN-DWG', 'NO-DWG') and pstatus = 'RELEASED'
order by pobj_name, prev"
In query 1 you are ONLY selecting the rows whose prev field is equal to the max(prev) and in query 2 you are selecting all records ALONG WITH max(prev) that's meeting the conditions in the where and group by clause.
Basically, query 1 and query 2 have completely different where clauses. Hope this explains the missing records from query 1.
Your query #1 will certainly fail to return a row for a given pobj_name where maximum prev for that name does not correspond to a revision currently in the database. That could perhaps happen if a revision was skipped or if its row was deleted.
Your Query #2 does not suffer Query #1's limitation, and it may perform better on account of avoiding a correlated subquery. It would be inappropriate, however, if you wanted more data than just pobj_name and aggregate functions of the groups. And by the way, there's no point in including prev in the ORDER BY clause, since pobj_name will already be unique to each result row.
Overall, if the two queries happen to return similar results then that is a matter of the details of the data, not of the queries. They arrive at their results completely differently.
I am writing a search routine with a ranking algorithm and would like to get this in one pass.
My Ideal query would be something like this....
select *, (select top 1 wordposition
from wordpositions
where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502
) as WordPos,
case when WordPos<11 then 1 else case WordPos<50 then 2 else case WordPos<100 then 3 else 4 end end end end as rank
from items
Is it possible to use WordPos in a case right there? It's generating an error on me , Invalid column name 'WordPos'.
I know I can redo the subquery for each case but I think it would actually re-run the case wouldn't it?
For example:
select *, case when (select top 1 wordposition from wordpositions where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502)<11 then 1 else case (select top 1 wordposition from wordpositions where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502)<50 then 2 else case (select top 1 wordposition from wordpositions where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502)<100 then 3 else 4 end end end end as rank from items
That works....but is it really re-running the identical query each time?
It's hard to tell from the tests as the first time it runs it's slow but subsequent runs are quick....it's caching...so would that mean that the first time it ran it for the first row, the subsequent three times it would get the result from cache?
Just curious what the best way to do this would be...
Thank you!
Ryan
You can do this using a subquery. I will stick with your SQL Server syntax, even though the question is tagged mysql:
select i.*,
(case when WordPos < 11 then 1
when WordPos < 50 then 2
when WordPos < 100 then 3
else 4
end) as rank
from (select i.*,
(select top 1 wpwordposition
from wordpositions wp
where recordid=i.pk_itemid and wordid=79588 and nextwordid=64502
) as WordPos
from items i
) i;
This also simplifies the case statement. You do not need nested case statements to handle multiple conditions, just multiple where clauses.
No. Identifiers introduced in the output clause (the fact that it comes from a sub-query is irrelevant) cannot be used within the same SELECT statement.
Here are some solutions:
Rewrite the query using a JOIN1, This will eliminate the issue entirely and fits well with RA.
Wrap the entire SELECT with the sub-query within another SELECT with the case. The outer select can access identifiers introduced by the inner SELECT's output clause.
Use a CTE (if SQL Server). This is similar to #2 in that it allows an identifier to be introduced.
While "re-writing" the sub-query for each case is very messy it should still result in an equivalent plan - but view the query profile! - as the results of the query are non-volatile. As such the equivalent sub-queries can be safely moved by the query planner which should move the sub-query/sub-queries to a JOIN to avoid any "re-running" in the first place.
1 Here is a conversion to use a JOIN, which is my preferred method. (I find that if a query can't be written in terms of a JOIN "easily" then it might be asking for the wrong thing or otherwise be showing issues with schema design.)
select
wp.wordposition as WordPos,
case wp.wordposition .. as Rank
from items i
left join wordpositions wp
on wp.recordid = i.pk_itemid
where wp.wordid = 79588
and wp.nextwordid = 64502
I've made assumptions about the multiplicity here (i.e. that wordid is unique) which should be verified. If this multiplicity is not valid and not correctable otherwise (and you're indeed using SQL Server), then I'd recommend using ROW_NUMBER() and a CTE.
I have the following FireBird query:
update hrs h
set h.plan_week_id=
(select first 1 c.plan_week_id from calendar c
where c.calendar_id=h.calendar_id)
where coalesce(h.calendar_id,0) <> 0
(Intention: For records in hrs with a (non-zero) calendar_id
take calendar.plan_week_id and put it in hrs.plan_week_id)
The trick to select the first record in Oracle is to use WHERE ROWNUM=1, and if understand correctly I do not have to use ROWNUM in a separate outer query because I 'only' match ROWNUM=1 - thanks SO for suggesting Questions that may already have your answer ;-)
This would make it
update hrs h
set h.plan_week_id=
(select c.plan_week_id from calendar c
where (c.calendar_id=h.calendar_id) and (rownum=1))
where coalesce(h.calendar_id,0) <> 0
I'm actually using the 'first record' together with the selection of only one field to guarantee that I get one value back which can be put into h.plan_week_id.
Question: Will the above query work under Oracle as intended?
Right now, I do not have a filled Oracle DB at hand to run the query on.
Like Nicholas Krasnov said, you can test it in SQL Fiddle.
But if you ever find yourself about to use where rownum = 1 in a subquery, alarm bells should go off, because in 90% of the cases you are doing something wrong. Very rarely will you need a random value. Only when all selected values are the same, a rownum = 1 is valid.
In this case I expect calendar_id to be a primary key in calendar. Therefor each record in hrs can only have 1 plan_week_id selected per record. So the where rownum = 1 is not required.
And to answer your question: Yes, it will run just fine. Though the brackets around each where clause are also not required and in fact only confusing (me).
I have a table with actions that are being due in the future. I have a second table that holds all the cases, including the due date of the case. And I have a third table that holds numbers.
The problems is as follows. Our system automatically populates our table with future actions. For some clients however, we need to change these dates. I wanted to create an update query for this, and have this run through our scheduler. However, I am kind of stuck at the moment.
What I have on code so far is this:
UPDATE proxima_gestion p
SET fecha = (SELECT To_char(d.f_ult_vencim + c.hrem01, 'yyyyMMdd')
FROM deuda d,
c4u_activity_dates c,
proxima_gestion p
WHERE d.codigo_cliente = c.codigo_cliente
AND p.n_expediente = d.n_expediente
AND d.saldo > 1000
AND p.tipo_gestion_id = 914
AND p.codigo_oficina = 33
AND d.f_ult_vencim > sysdate)
WHERE EXISTS (SELECT *
FROM proxima_gestion p,
deuda d
WHERE p.n_expediente = d.n_expediente
AND d.saldo > 1000
AND p.tipo_gestion_id = 914
AND p.codigo_oficina = 33
AND d.f_ult_vencim > sysdate)
The field fecha is the current action date. Unfortunately, this is saved as a char instead of date. That is why I need to convert the date back to a char. F_ult_vencim is the due date, and hrem01 is the number of days the actions should be placed away from the due date. (for example, this could be 10, making the new date 10 days after the due date)
Apart from that, there are a few more criteria when we need to change the date (certain creditors, certain departments, only for future cases and starting from a certain amount, only for a certain action type.)
However, when I try and run this query, I get the error message
ORA-01427: single-row subquery returns more than one row
If I run both subqueries seperately, I get 2 results from both. What I am trying to accomplish, is that it connects these 2 queries, and updates the field to the new value. This value will be different for every case, as every due date will be different.
Is this even possible? And if so, how?
You're getting the error because the first SELECT is returning more than one row for each row in the table being updated.
The first thing I see is that the alias for the table in UPDATE is the same as the alias in both SELECTs (p). So all of the references to p in the subqueries are referencing proxima_gestion in the subquery rather than the outer query. That is, the subquery is not dependent on the outer query, which is required for an UPDATE.
Try removing "proxima_gestion p" from FROM in both subqueries. The references to p, then, will be to the outer UPDATE query.