Select In with invalid field [duplicate] - sql

As always, there will be a reasonable explanation for my surprise, but till then....
I have this query
delete from Photo where hs_id in (select hs_id from HotelSupplier where id = 142)
which executes just fine (later I found out that the entire Photo table was empty)
but the strange thing: there is no field hs_id in HotelSupplier, it is called hs_key!
So when I execute the last part
select hs_id from HotelSupplier where id = 142
separately (select that part of the query with the mouse and hit F5), I get an error, but when I use it in the IN clause, there is no error!
I wonder if this is normal behaviour?

It is taking the value of hs_id from the outer query.
It is perfectly valid to have a query that doesn't project any columns from the selected table in its select list.
For example
select 10 from HotelSupplier where id = 142
would return a result set with as many rows as matched the where clause and the value 10 for all rows.
Unqualified column references are resolved from the closest scope outwards, so this just gets treated as a correlated subquery.
The result of this query will be to delete all rows from Photo where hs_id is not null, as long as HotelSupplier has at least one row where id = 142 (and so the subquery returns at least one row).
It might be a bit clearer if you consider what the effect of this is
delete from Photo where Photo.hs_id in (select Photo.hs_id)
This is of course equivalent to
delete from Photo where Photo.hs_id = Photo.hs_id
By the way, this is far and away the most common "bug" that I personally have seen erroneously reported on Microsoft Connect. Erland Sommarskog includes it in his wishlist for SET STRICT_CHECKS ON.
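The effect described above can be reproduced in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable; SQLite resolves unqualified column names outwards in the same way). The photo_id column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema: HotelSupplier has hs_key, not hs_id.
    CREATE TABLE HotelSupplier (id INTEGER, hs_key INTEGER);
    CREATE TABLE Photo (photo_id INTEGER, hs_id INTEGER);
    INSERT INTO HotelSupplier VALUES (142, 7);
    INSERT INTO Photo VALUES (1, 7), (2, 99), (3, NULL);
""")

# hs_id does not exist in HotelSupplier, so inside the subquery it binds
# to the outer Photo.hs_id and the statement runs without an error.
conn.execute(
    "DELETE FROM Photo WHERE hs_id IN "
    "(SELECT hs_id FROM HotelSupplier WHERE id = 142)"
)

# Every row with a non-null hs_id is gone, including the unrelated hs_id 99.
remaining = conn.execute("SELECT photo_id, hs_id FROM Photo").fetchall()
print(remaining)
```

Only the row whose hs_id is NULL survives, because NULL IN (NULL) does not evaluate to true.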

It's a strong argument for keeping column names consistent between tables. As @Martin says, the SQL syntax allows column names to be resolved from the outer query when there's no match in the inner query. This is a boon when writing correlated subqueries, but can trip you up sometimes (as here).
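A related defensive habit, not mentioned in the answers above but following from the same name-resolution rule, is to qualify every column in the subquery with a table alias. A qualified name can only bind to that table, so the typo is rejected instead of silently binding to the outer query. The sketch below again uses sqlite3 as a runnable stand-in; the alias hs, the helper try_delete and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HotelSupplier (id INTEGER, hs_key INTEGER);
    CREATE TABLE Photo (photo_id INTEGER, hs_id INTEGER);
    INSERT INTO HotelSupplier VALUES (142, 7);
    INSERT INTO Photo VALUES (1, 7), (2, 99);
""")

def try_delete(sql):
    """Run a statement; return None on success or the error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)

# Qualified with the inner alias: the wrong column name is now rejected.
error = try_delete(
    "DELETE FROM Photo WHERE hs_id IN "
    "(SELECT hs.hs_id FROM HotelSupplier AS hs WHERE hs.id = 142)"
)
print(error)

# With the correct column name only the matching photo is deleted.
try_delete(
    "DELETE FROM Photo WHERE hs_id IN "
    "(SELECT hs.hs_key FROM HotelSupplier AS hs WHERE hs.id = 142)"
)
remaining = conn.execute("SELECT photo_id, hs_id FROM Photo").fetchall()
print(remaining)
```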

Related

How does a DELETE FROM with a SELECT in the WHERE work?

I am looking at an application and I found this SQL:
DELETE FROM Phrase
WHERE Modified < (SELECT Modified FROM PhraseSource WHERE Id = Phrase.PhraseId)
The intention of the SQL is to delete rows from Phrase where there are more recent rows in the PhraseSource table.
Now I know the tables Phrase and PhraseSource have the same columns and Modified holds the number of seconds since 1970, but I cannot understand how or why this works or what it is doing. When I look at it, it seems like on the left side of the < there is just one column and on the right side of the < there would be many rows. Does it even make any sense?
The two tables are identical and have the following structure
Id - GUID primary key
...
...
...
Modified int
the ... columns are about ten columns containing text and numeric data. The PhraseSource table may or may not contain more recent rows with a higher number in the Modified column and different text and numeric data.
The SELECT statement in parentheses is a subquery (nested query).
What happens is that for each row, the Modified column value is compared with the result of the subquery (which is logically run once for each of the rows in the Phrase table).
The subquery has a WHERE clause, so it finds the row that has the same Id as the PhraseId of the Phrase row currently being evaluated and returns its Modified value (which comes from a single row, so it is a single scalar value).
The two Modified values are compared and in case the Phrase's row has been modified before the row in PhraseSource, it is deleted.
As you can see this approach is not efficient, because it requires the database to run a separate query for each of the rows in the Phrase table (although I imagine that some databases might be smart enough to optimize this a little bit).
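The row-by-row comparison can be seen in a small sketch. It uses Python's sqlite3 module as a stand-in database (an assumption for runnability), with a cut-down hypothetical schema: only the PhraseId, Id and Modified columns, short text keys instead of GUIDs, and made-up timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Phrase (PhraseId TEXT PRIMARY KEY, Modified INTEGER);
    CREATE TABLE PhraseSource (Id TEXT PRIMARY KEY, Modified INTEGER);
    INSERT INTO Phrase VALUES ('a', 100), ('b', 200), ('c', 300);
    -- 'a' has a newer source row, 'b' an older one, 'c' has none.
    INSERT INTO PhraseSource VALUES ('a', 150), ('b', 50);
""")

# For each Phrase row the subquery yields one value (or NULL if no match).
conn.execute("""
    DELETE FROM Phrase
    WHERE Modified < (SELECT Modified FROM PhraseSource
                      WHERE Id = Phrase.PhraseId)
""")

remaining = conn.execute(
    "SELECT PhraseId, Modified FROM Phrase ORDER BY PhraseId"
).fetchall()
print(remaining)
```

Row 'a' is deleted because 100 < 150. Row 'b' stays because its source is older, and row 'c' stays because comparing with a NULL subquery result is never true.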
A better solution
The more efficient solution would be to use INNER JOIN:
DELETE p FROM Phrase p
INNER JOIN PhraseSource ps
ON p.PhraseId=ps.Id
WHERE p.Modified < ps.Modified
This should do the exact same thing as your query, but using the efficient JOIN mechanism. INNER JOIN uses the ON clause to choose how to "match" rows in the two tables (which is done very efficiently by the DB), and the WHERE clause then compares the Modified values of the matching rows. Note that the DELETE ... FROM ... JOIN syntax is not supported by every database.
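One way to check that the two formulations pick the same rows is to run both as SELECTs before deleting anything. The sketch below does that with sqlite3 as a stand-in (SQLite does not accept DELETE with a JOIN, so only the row selection is compared); the reduced schema and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Phrase (PhraseId TEXT PRIMARY KEY, Modified INTEGER);
    CREATE TABLE PhraseSource (Id TEXT PRIMARY KEY, Modified INTEGER);
    INSERT INTO Phrase VALUES ('a', 100), ('b', 200), ('c', 300);
    INSERT INTO PhraseSource VALUES ('a', 150), ('b', 50);
""")

# Rows the correlated-subquery DELETE would remove.
via_subquery = conn.execute("""
    SELECT PhraseId FROM Phrase
    WHERE Modified < (SELECT Modified FROM PhraseSource
                      WHERE Id = Phrase.PhraseId)
    ORDER BY PhraseId
""").fetchall()

# Rows the INNER JOIN formulation would remove.
via_join = conn.execute("""
    SELECT p.PhraseId FROM Phrase AS p
    INNER JOIN PhraseSource AS ps ON p.PhraseId = ps.Id
    WHERE p.Modified < ps.Modified
    ORDER BY p.PhraseId
""").fetchall()

print(via_subquery, via_join)
```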

Inconsistent results from BigQuery: same query, different number of rows

I noticed today that one of my queries was giving inconsistent results: every time I run it, a different number of rows is returned (cache deactivated).
Basically the query looks like this:
SELECT *
FROM mydataset.table1 AS t1
LEFT JOIN EACH mydataset.table2 AS t2
ON t1.deviceId=t2.deviceId
LEFT JOIN EACH mydataset.table3 AS t3
ON t2.email=t3.email
WHERE t3.email IS NOT NULL
AND (t3.date IS NULL OR DATE_ADD(t3.date, 5000, 'MINUTE')<TIMESTAMP('2016-07-27 15:20:11') )
The tables are not updated between queries, so I'm wondering if you have also noticed that kind of behaviour.
I usually run queries that return a lot of rows (>1000), so a few missing rows here and there are hardly noticeable. But this query returns only a few rows, and the count varies every time between 10 and 20 rows :-/
If a Google engineer is reading this, here are two Job ID of the same query with different results:
picta-int:bquijob_400dd739_1562d7e2410
picta-int:bquijob_304f4208_1562d7df8a2
Unless I'm missing something, the query that you provide is completely deterministic and so should give the same result every time you execute it. But you say it's "basically" the same as your real query, so this may be due to something you changed.
There's a couple of things you can do to try to find the cause:
replace select * by an explicit selection of fields from your tables (a combination of fields that uniquely determine each row)
order the table by these fields, so that the order becomes the same each time you execute the query
simplify your query. In the above query, you can remove the first condition and turn the two left outer joins into inner joins and get the same result. After that, you could start removing tables and conditions one by one.
After each step, check if you still get different result sets. Then when you have found the critical step, try to understand why it causes your problem. (Or ask here.)
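The simplification step can be checked on a toy dataset. The sketch below uses Python's sqlite3 module rather than BigQuery (an assumption for runnability), invents a few rows and a seen column, and leaves out the date condition. It compares the LEFT JOIN form filtered on t3.email IS NOT NULL with the plain INNER JOIN form, using an explicit column list and ORDER BY so the two results are directly comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (deviceId TEXT);
    CREATE TABLE table2 (deviceId TEXT, email TEXT);
    CREATE TABLE table3 (email TEXT, seen INTEGER);
    INSERT INTO table1 VALUES ('d1'), ('d2'), ('d3');
    INSERT INTO table2 VALUES ('d1', 'a@x.test'), ('d2', 'b@x.test');
    INSERT INTO table3 VALUES ('a@x.test', 1);
""")

columns = "t1.deviceId, t2.email, t3.seen"
order = "ORDER BY t1.deviceId, t2.email, t3.seen"

# Original shape: outer joins, then discard rows with no table3 match.
left_rows = conn.execute(f"""
    SELECT {columns}
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t1.deviceId = t2.deviceId
    LEFT JOIN table3 AS t3 ON t2.email = t3.email
    WHERE t3.email IS NOT NULL
    {order}
""").fetchall()

# Simplified shape: inner joins make the IS NOT NULL filter redundant.
inner_rows = conn.execute(f"""
    SELECT {columns}
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.deviceId = t2.deviceId
    INNER JOIN table3 AS t3 ON t2.email = t3.email
    {order}
""").fetchall()

print(left_rows == inner_rows, left_rows)
```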

SQL error ORA 01427

I am trying to update one of the columns in my table by collecting the values from another table in the data store using this query
UPDATE tablename PT
SET DID = (select distinct(did) from datastore.get_dept_alias
where upper(ltrim(rtrim(deptalias))) = upper(ltrim(rtrim(PT."Dept Descr")))
AND cid = PT.CID)
Note: Both the column names in the table are the same as entered
I get ORA 01427 error. Any idea about the issue?
I am trying to understand the other posts of this ORA error
As you can see here
SQL Error: ORA-01427: single-row subquery returns more than one row
This means that your sub-query
select distinct(did) from datastore.get_dept_alias
where upper(ltrim(rtrim(deptalias))) = upper(ltrim(rtrim(PT."Dept Descr")))
AND cid = PT.CID
is returning more than one row.
So, are you sure that distinct(did) yields a single value? It looks like it doesn't. I don't recommend using where rownum = 1, because you don't know which one of the values will be used for the update unless you also control the ordering with ORDER BY.
You're getting this error because your select statement can return more than one result. You cannot update a single cell with a query that can potentially return more than one result.
A common approach to avoid this in many SQL dialects is to use TOP 1 or something similar to assure the engine that you will only return one result. Note that you have to do this even if you know the query will only return one result. Just because YOU know it doesn't mean that the engine knows it. The engine also has to protect you from future possibilities, not just things as they are right at this moment.
Update:
I noticed you updated your question to Oracle. In that case you could limit the subquery to a single result using the where rownum = 1 clause. As the other answer pointed out, you'd have to use further logic to ensure that the one row coming back is the right one. If you don't know which one is the right one, then solve that first.
The thought also occurs to me that you might be misunderstanding what DISTINCT does. It ensures that the returned results are unique, but there could still be multiple unique results.
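To find out which keys are responsible, it can help to count the distinct did values per (cid, normalized alias) pair. The sketch below shows the idea with Python's sqlite3 module and invented sample rows; it is not Oracle, and TRIM stands in for ltrim(rtrim(...)). On the real system a similar GROUP BY ... HAVING query would be run against datastore.get_dept_alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE get_dept_alias (cid INTEGER, deptalias TEXT, did INTEGER);
    INSERT INTO get_dept_alias VALUES
        (1, ' Sales ', 10),
        (1, 'SALES',   10),   -- same did: DISTINCT collapses these two
        (1, 'sales',   11),   -- different did: still two distinct values
        (2, 'Ops',     20);
""")

# Keys for which the scalar subquery would return more than one row.
ambiguous = conn.execute("""
    SELECT cid, UPPER(TRIM(deptalias)) AS alias, COUNT(DISTINCT did) AS n
    FROM get_dept_alias
    GROUP BY cid, UPPER(TRIM(deptalias))
    HAVING COUNT(DISTINCT did) > 1
""").fetchall()
print(ambiguous)
```

Any key listed here maps to several did values, so the data (or the rule for choosing one value) needs fixing before the UPDATE can work.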
