Bizarre behavior of a query - sql

I found this query from a developer:
DELETE FROM [MYDB].[dbo].[MYSIGN] where USERID in
(select USERID from [MYDB].[dbo].[MYUSER] where Surname = 'Rossi');
This query deletes every record in table MYSIGN.
The field USERID does not exist in table MYUSER. If I run only the subquery:
select USERID from [MYDB].[dbo].[MYUSER] where Surname = 'Rossi'
It throws the expected error because of the missing column.
We corrected the query to use the right column, but we still couldn't figure out:
Why does the first query work?
Why does it delete every record?
Specs: database is on a SQL SERVER 2016 SP1, CU3.

Apparently [MYDB].[dbo].[MYSIGN] has a USERID column, and that explains it: SQL Server resolves the unprefixed USERID in (select USERID from [MYDB].[dbo].[MYUSER] where Surname = 'Rossi') to [MYDB].[dbo].[MYSIGN].USERID. The subquery then returns the outer row's own USERID once for every 'Rossi' row, so the IN test is true for every row of MYSIGN.
Use an alias and it will fail:
DELETE FROM [MYDB].[dbo].[MYSIGN] where USERID in
(select t.USERID from [MYDB].[dbo].[MYUSER] t where Surname = 'Rossi');
This is sometimes referred to as an "accidental correlated sub-query", as @NenadZivkovic named it. I like the term.

The problem is the scoping rules of subqueries. If the column is not found in the subquery tables, then the SQL engine starts looking at the next level out -- and so on (in the case of SQL Server).
Whenever you have multiple tables in a query, always qualify the column names. That means putting the table name (or alias) in front of the column name. Then you have no ambiguity:
DELETE
FROM [MYDB].[dbo].[MYSIGN] m
WHERE m.USERID IN (SELECT u.USERID FROM [MYDB].[dbo].[MYUSER] u WHERE u.Surname = 'Rossi');
A simple rule to follow that makes your code more readable and less prone to error.
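The scoping behaviour described above can be reproduced in a small sketch. This one uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption: SQLite follows the same outer-scope name resolution, but it is not the engine from the question), and the table contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server tables (illustrative data).
# MYUSER deliberately has no USERID column, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MYUSER (ID INTEGER, Surname TEXT);
    CREATE TABLE MYSIGN (USERID INTEGER, Note TEXT);
    INSERT INTO MYUSER VALUES (1, 'Rossi'), (2, 'Bianchi');
    INSERT INTO MYSIGN VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

# Qualified column: u.USERID does not exist, so the statement is rejected
# before anything is deleted.
try:
    conn.execute(
        "DELETE FROM MYSIGN WHERE USERID IN "
        "(SELECT u.USERID FROM MYUSER u WHERE u.Surname = 'Rossi')"
    )
    qualified_failed = False
except sqlite3.OperationalError:
    qualified_failed = True

rows_before = conn.execute("SELECT COUNT(*) FROM MYSIGN").fetchone()[0]

# Unqualified column: USERID is not found in MYUSER, so it binds to the
# outer MYSIGN.USERID. Each outer row is compared with its own value.
conn.execute(
    "DELETE FROM MYSIGN WHERE USERID IN "
    "(SELECT USERID FROM MYUSER WHERE Surname = 'Rossi')"
)
rows_after = conn.execute("SELECT COUNT(*) FROM MYSIGN").fetchone()[0]

print(qualified_failed, rows_before, rows_after)
```

The qualified statement fails and leaves the table intact, while the unqualified one empties it, because at least one 'Rossi' row exists in MYUSER.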

Related

how to optimize SQL sub queries?

Here is my scenario: I am searching for records in a SQL table by name, so I wrote a subquery that uses a LIKE-style pattern match in Postgres. The query works fine, but it takes a long time. When I checked why, the cause turned out to be the subquery, which hits every record in its table. How can I optimize the subquery?
SQL query:
SELECT
id, latitude,longitude,first_name,last_name,
contact_company_id,address,address2,city,state_id, zip,country_id,default_phone_id,last_contacted,image,contact_type_id
FROM contact
WHERE company_id = 001
AND contact_company_id IN (select id from contactcompany where lower( name ) ~*'jack')
When I run this query it takes 2 seconds, and the time is spent in the subquery, which scans every record in the contactcompany table.
How can I optimize the subquery?
Try rewriting the subquery as an inner join with the main table. Both queries give the same result, provided contactcompany.id is unique.
Example:
SELECT contact.id,
contact.latitude,
contact.longitude,
contact.first_name,
contact.last_name,
contact.contact_company_id,
contact.address,
contact.address2,
contact.city,
contact.state_id,
contact.zip,
contact.country_id,
contact.default_phone_id,
contact.last_contacted,
contact.image,
contact.contact_type_id
FROM contact AS contact
INNER JOIN contactcompany AS contactcompany ON contactcompany.id = contact.contact_company_id
WHERE contact.company_id = 001
AND contactcompany.name ~* 'jack'
I would start by writing the query using EXISTS. Then, company_id is either a string or a number. Let me guess that it is a string, because the constant is written with leading zeros. If so, use single quotes:
SELECT c.*
FROM contact c
WHERE company_id = '001' AND
EXISTS (SELECT 1
FROM contactcompany cc
WHERE cc.name ~* 'jack' AND
cc.id = c.contact_company_id
);
Then an index on contact(company_id, contact_company_id) makes sense. And for the subquery, an index on contactcompany(id, name).
There may be other alternatives for writing the query, but your question has not provided much information on table sizes, current performance, or the data types.
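As a rough check that the three formulations discussed here (IN, inner join, EXISTS) select the same rows, the following sketch runs them side by side. It uses Python's sqlite3 module rather than Postgres, so two things are assumptions: SQLite's case-insensitive LIKE stands in for the `~*` operator, and company_id is treated as a number. The rows are invented, and contactcompany.id is a primary key, which is what makes the join version safe from duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contactcompany (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        company_id INTEGER,
        contact_company_id INTEGER
    );
    INSERT INTO contactcompany VALUES
        (10, 'Jack & Co'), (20, 'Acme'), (30, 'Blackjack Ltd');
    INSERT INTO contact VALUES (1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 2, 10);
""")

params = (1, "%jack%")  # company_id, pattern (LIKE stands in for ~*)


def ids(sql):
    """Run a query with the shared parameters and return the contact ids."""
    return [row[0] for row in conn.execute(sql, params)]


in_ids = ids("""
    SELECT c.id FROM contact c
    WHERE c.company_id = ?
      AND c.contact_company_id IN
          (SELECT cc.id FROM contactcompany cc WHERE cc.name LIKE ?)
    ORDER BY c.id
""")

join_ids = ids("""
    SELECT c.id FROM contact c
    JOIN contactcompany cc ON cc.id = c.contact_company_id
    WHERE c.company_id = ? AND cc.name LIKE ?
    ORDER BY c.id
""")

exists_ids = ids("""
    SELECT c.id FROM contact c
    WHERE c.company_id = ?
      AND EXISTS (SELECT 1 FROM contactcompany cc
                  WHERE cc.id = c.contact_company_id AND cc.name LIKE ?)
    ORDER BY c.id
""")

print(in_ids, join_ids, exists_ids)
```

Which form is fastest still depends on the planner, the indexes, and the table sizes, so it is worth comparing the real plans with EXPLAIN on the actual database.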

Difference between DELETE and DELETE FROM in SQL?

Is there one? I am researching some stored procedures, and in one place I found the following line:
DELETE BI_Appointments
WHERE VisitType != (
SELECT TOP 1 CheckupType
FROM BI_Settings
WHERE DoctorName = @DoctorName)
Would that do the same thing as:
DELETE FROM BI_Appointments
WHERE VisitType != (
SELECT TOP 1 CheckupType
FROM BI_Settings
WHERE DoctorName = @DoctorName)
Or is it a syntax error, or something entirely different?
Assuming this is T-SQL or MS SQL Server, there is no difference and the statements are identical. The first FROM keyword is syntactically optional in a DELETE statement.
http://technet.microsoft.com/en-us/library/ms189835.aspx
The keyword is optional for two reasons.
First, the SQL standard requires the FROM keyword in the statement, so SQL Server has to accept it for standards compliance.
Second, although the keyword is redundant, that's probably not why it's optional. I believe that it's because SQL Server allows you to specify a JOIN in the DELETE statement, and making the first FROM mandatory makes it awkward.
For example, here's a normal delete:
DELETE FROM Employee WHERE ID = @value
And that can be shortened to:
DELETE Employee WHERE ID = @value
And SQL Server allows you to delete based on another table with a JOIN:
DELETE Employee
FROM Employee
JOIN Site
ON Employee.SiteID = Site.ID
WHERE Site.Status = 'Closed'
If the first FROM keyword were not optional, the query above would need to look like this:
DELETE FROM Employee
FROM Employee
JOIN Site
ON Employee.SiteID = Site.ID
WHERE Site.Status = 'Closed'
The query above is perfectly valid and does execute, but it is very awkward to read. It's hard to tell that it's a single query. It looks like two got mashed together because of the "duplicate" FROM clauses.
Side note: Your example subqueries are potentially non-deterministic since there is no ORDER BY clause.
There is no difference between DELETE and DELETE FROM in an Oracle database; the FROM keyword is optional. The standard way to write it, though, is:
DELETE FROM table [ WHERE condition ]
This is the SQL-92 standard form. Always write your code the standard way.

SQL Duplicate column name error

I am trying to find an error in a massive SQL statement (not mine) - I have cut a lot of it out to make it readable - even pared down it still throws the error
SELECT DISTINCT Profiles.ID
FROM
(select * from Profiles RIGHT JOIN FriendList ON (FriendList.Profile = 15237)
order by LastLoggedIn DESC ) as Profiles
This returns an error
Duplicate column name 'ID'
I have tested the last part (select * from Profiles ... order by LastLoggedIn DESC) and it works fine by itself
I have tried to troubleshoot by changing column names in the DISTINCT section without any luck.
One solution I read was to remove the DISTINCT, but that didn't help.
I just can't see where the duplicate column error can be coming from. Could it be a database integrity problem?
Any help much appreciated.
Your Profiles and FriendList tables both have an ID column. Because you say select *, you're getting two columns named ID in the sub-select, which is aliased to Profiles, and SQL doesn't know which one Profiles.ID refers to (note that Profiles here refers to the alias of the sub-query, not the table of the same name).
Since you only need the ID column, you can change it to this:
SELECT DISTINCT Profiles.ID FROM
( select Profiles.ID from Profiles RIGHT JOIN FriendList ON (FriendList.Profile = 15237)
order by LastLoggedIn DESC ) as Profiles
Replace the "select *" with "select col1, col2..." and the error should become apparent (i.e. multiple columns named "ID"). Nothing to do with distinct or database integrity.
You have a table called Profiles and you are "creating" a temp table called Profiles in your FROM. That would be my guess as to what is causing the problem. Call your temp table bananas, try SELECT DISTINCT bananas.ID FROM, and see if that works.
As the error says, each of the tables that you're joining together has a column named ID. You'll have to specify which ID column you want (Profiles.ID or FriendList.ID) or include ID in the join conditions.
Profiles and FriendList both have an ID column. You are asking to call the entire join "Profiles", and then using Profiles.ID, but SQL doesn't know which ID you mean.
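To see where the two ID columns come from, it can help to list the column names that `select *` actually produces for the join. The sketch below does that with Python's sqlite3 module. This is an assumption-laden stand-in: the original error comes from another engine, SQLite itself is more lenient about duplicate names, a plain JOIN replaces the RIGHT JOIN, and the table definitions and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Profiles (ID INTEGER, LastLoggedIn TEXT);
    CREATE TABLE FriendList (ID INTEGER, Profile INTEGER);
    INSERT INTO Profiles VALUES (15237, '2020-01-01'), (2, '2020-01-02');
    INSERT INTO FriendList VALUES (1, 15237);
""")

# SELECT * over the join exposes every column of both tables,
# including one ID from each side.
cur = conn.execute(
    "SELECT * FROM Profiles JOIN FriendList ON FriendList.Profile = 15237"
)
star_columns = [col[0] for col in cur.description]

# Naming the single column that is needed leaves no duplicate to trip over.
cur = conn.execute(
    "SELECT Profiles.ID FROM Profiles "
    "JOIN FriendList ON FriendList.Profile = 15237"
)
explicit_columns = [col[0] for col in cur.description]

print(star_columns)
print(explicit_columns)
```

A derived table needs unique column names to be addressable from the outer query, which is why listing the columns explicitly resolves the error.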

How do I exclude or negate two queries?

I am new to SQL, so this is probably very simple, however, I wasn't able to find the solution.
Basically my query is as follows:
SELECT UserID
FROM Users
NOT UNION
SELECT UserID
FROM User_Groups
WHERE GroupID = '$_[0]'
However, I am not sure what the syntax is to exclude one query from another.
What I am trying to say is give me all the user ID's except for those that are in group X.
SELECT UserID FROM Users
WHERE UserID NOT IN (SELECT UserID FROM User_Groups WHERE GroupID = ?)
P.S. Don't interpolate variables into your queries as this can lead to SQL injection vulnerabilities in your code. Use placeholders instead.
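SELECT Users.UserID
FROM Users
LEFT JOIN User_Groups ON Users.UserID = User_Groups.UserID
AND User_Groups.GroupID = '$_[0]'
WHERE User_Groups.UserID IS NULL
You can left join to the other table and then put an IS NULL check on the other table in your WHERE clause as I've shown. The group filter belongs in the ON clause, so that users with no matching row in that group come back with NULLs and pass the check.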
You could use EXCEPT as well:
SELECT UserID
FROM Users
EXCEPT
SELECT UserID
FROM User_Groups
WHERE GroupID = '$_[0]'
EXCEPT is SQL's version of set subtraction. Which of the various approaches (EXCEPT, NOT IN, ...) you should use depends, as usual, on your specific circumstances, what your database supports, and which one works best for you.
And eugene y has already mentioned the SQL injection issue with your code so I'll just consider that covered.
I linked to the PostgreSQL documentation even though this isn't a PostgreSQL question because the PostgreSQL documentation is quite good. SQLite does support EXCEPT:
The EXCEPT operator returns the subset of rows returned by the left SELECT that are not also returned by the right-hand SELECT. Duplicate rows are removed from the results of INTERSECT and EXCEPT operators before the result set is returned.
NOT IN() - Negating IN()
SELECT UserID FROM User_Groups WHERE GroupID NOT IN('1','2')
The IN() parameter can also be a sub-query.
Are you looking for a solution to be used with a PostgreSQL or a MySQL database?
Or are you looking for a plain SQL solution?
With PostgreSQL, a correlated subquery with "WHERE NOT EXISTS" might work, like:
SELECT u.UserID
FROM Users u
WHERE NOT EXISTS
(SELECT 1 FROM User_Groups g WHERE g.UserID = u.UserID AND g.GroupID = ?)
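The approaches suggested in this thread (NOT IN, LEFT JOIN with IS NULL, EXCEPT, NOT EXISTS) can be compared in one small sketch, using placeholders instead of string interpolation. It runs on Python's sqlite3 module with made-up rows; the table and column names follow the question, and everything else is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY);
    CREATE TABLE User_Groups (UserID INTEGER, GroupID TEXT);
    INSERT INTO Users VALUES (1), (2), (3), (4);
    INSERT INTO User_Groups VALUES (1, 'X'), (2, 'X'), (2, 'Y'), (3, 'Y');
""")

group_id = "X"  # passed as a bound parameter, never pasted into the SQL text


def ids(sql):
    """Run a query with the group id bound to its single placeholder."""
    return [row[0] for row in conn.execute(sql, (group_id,))]


not_in = ids("""
    SELECT UserID FROM Users
    WHERE UserID NOT IN (SELECT UserID FROM User_Groups WHERE GroupID = ?)
    ORDER BY UserID
""")

left_join = ids("""
    SELECT u.UserID FROM Users u
    LEFT JOIN User_Groups g ON g.UserID = u.UserID AND g.GroupID = ?
    WHERE g.UserID IS NULL
    ORDER BY u.UserID
""")

except_ids = ids("""
    SELECT UserID FROM Users
    EXCEPT
    SELECT UserID FROM User_Groups WHERE GroupID = ?
    ORDER BY UserID
""")

not_exists = ids("""
    SELECT u.UserID FROM Users u
    WHERE NOT EXISTS (SELECT 1 FROM User_Groups g
                      WHERE g.UserID = u.UserID AND g.GroupID = ?)
    ORDER BY u.UserID
""")

print(not_in, left_join, except_ids, not_exists)
```

All four return the users outside group X for this data. They can diverge when User_Groups.UserID contains NULLs, in which case NOT IN returns no rows at all.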

SQL - table alias scope

I've just learned ( yesterday ) to use "exists" instead of "in".
BAD
select * from table where nameid in (
select nameid from othertable where otherdesc = 'SomeDesc' )
GOOD
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
And I have some questions about this:
1) The explanation as I understood was: "The reason why this is better is because only the matching values will be returned instead of building a massive list of possible results". Does that mean that while the first subquery might return 900 results the second will return only 1 ( yes or no )?
2) In the past I have had the RDBMS complaining: "only the first 1000 rows might be retrieved". Would this second approach solve that problem?
3) What is the scope of the alias in the second subquery? Does the alias only live inside the parentheses?
for example
select * from table t where exists (
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeDesc' )
AND
select nameid from othertable o where t.nameid = o.nameid and otherdesc = 'SomeOtherDesc' )
That is, if I use the same alias ( o for table othertable ) In the second "exist" will it present any problem with the first exists? or are they totally independent?
Is this something Oracle only related or it is valid for most RDBMS?
Thanks a lot
It's specific to each DBMS and depends on the query optimizer. Some optimizers detect IN clause and translate it.
In all DBMSes I tested, alias is only valid inside the ( )
BTW, you can rewrite the query as:
select t.*
from table t
join othertable o on t.nameid = o.nameid
and o.otherdesc in ('SomeDesc','SomeOtherDesc');
And, to answer your questions:
1) Yes
2) Yes
3) Yes
You are treading into complicated territory, known as 'correlated sub-queries'. Since we don't have detailed information about your tables and the key structures, some of the answers can only be 'maybe'.
In your initial IN query, the notation would be valid whether or not OtherTable contains a column NameID (and, indeed, whether OtherDesc exists as a column in Table or OtherTable - which is not clear in any of your examples, but presumably is a column of OtherTable). This behaviour is what makes a correlated sub-query into a correlated sub-query. It is also a routine source of angst for people when they first run into it - invariably by accident. Since the SQL standard mandates the behaviour of interpreting a name in the sub-query as referring to a column in the outer query if there is no column with the relevant name in the tables mentioned in the sub-query but there is a column with the relevant name in the tables mentioned in the outer (main) query, no product that wants to claim conformance to (this bit of) the SQL standard will do anything different.
The answer to your Q1 is "it depends", but given plausible assumptions (NameID exists as a column in both tables; OtherDesc only exists in OtherTable), the results should be the same in terms of the data set returned, but may not be equivalent in terms of performance.
The answer to your Q2 is that in the past, you were using an inferior if not defective DBMS. If it supported EXISTS, then the DBMS might still complain about the cardinality of the result.
The answer to your Q3 as applied to the first EXISTS query is "t is available as an alias throughout the statement, but o is only available as an alias inside the parentheses". As applied to your second example box - with AND connecting two sub-selects (the second of which is missing the open parenthesis when I'm looking at it), then "t is available as an alias throughout the statement and refers to the same table, but there are two different aliases both labelled 'o', one for each sub-query". Note that the query might return no data if OtherDesc is unique for a given NameID value in OtherTable; otherwise, it requires two rows in OtherTable with the same NameID and the two OtherDesc values for each row in Table with that NameID value.
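The alias scoping described for Q3 can be checked with a short sketch. It uses Python's sqlite3 module (an assumption: the question is about Oracle, but the scoping rule comes from the SQL standard), the table is named main_table because `table` is a reserved word, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (nameid INTEGER);
    CREATE TABLE othertable (nameid INTEGER, otherdesc TEXT);
    INSERT INTO main_table VALUES (1), (2), (3);
    INSERT INTO othertable VALUES
        (1, 'SomeDesc'), (1, 'SomeOtherDesc'),
        (2, 'SomeDesc'), (3, 'SomeOtherDesc');
""")

# Two independent sub-queries, each with its own alias "o".
# The outer alias "t" is visible inside both.
both_ids = [row[0] for row in conn.execute("""
    SELECT t.nameid FROM main_table t
    WHERE EXISTS (SELECT 1 FROM othertable o
                  WHERE o.nameid = t.nameid AND o.otherdesc = 'SomeDesc')
      AND EXISTS (SELECT 1 FROM othertable o
                  WHERE o.nameid = t.nameid AND o.otherdesc = 'SomeOtherDesc')
    ORDER BY t.nameid
""")]

# The inner alias is not visible outside its parentheses.
try:
    conn.execute("""
        SELECT o.nameid FROM main_table t
        WHERE EXISTS (SELECT 1 FROM othertable o WHERE o.nameid = t.nameid)
    """)
    alias_leaked = True
except sqlite3.OperationalError:
    alias_leaked = False

print(both_ids, alias_leaked)
```

Only nameid 1 has both descriptions, so it is the only row returned, and referring to `o` from the outer query is rejected.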
Oracle-specific: When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query. When you write EXISTS in a where clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query. See "Difference between IN and EXISTS in subqueries".
Probably.
An alias declared inside a subquery lives only inside that subquery. By the way, I don't think your example with 2 ANDed subqueries is valid SQL. Did you mean UNION instead of AND?
Personally I would use a join, rather than a subquery for this.
SELECT t.*
FROM yourTable t
INNER JOIN otherTable ot
ON (t.nameid = ot.nameid AND ot.otherdesc = 'SomeDesc')
It is difficult to generalize that EXISTS is always better than IN. If that were the case, the SQL community would have replaced IN with EXISTS.
Also, note that IN and EXISTS are not the same; the results may differ, most visibly in their negated forms when NULLs are involved.
With IN, the inner table is typically scanned once in full and NULLs are not filtered out, so any NULLs in the inner result take part in the comparison. With EXISTS, a comparison against NULL simply fails to match, and for a correlated subquery the inner query is logically evaluated for every row of the outer query.
Assuming there are no NULLs and it's a simple query (with no correlation), EXISTS might perform better if the row you are looking for is not the last row. If it happens to be the last row, EXISTS may need to scan to the end just as IN does, so performance is similar.
But IN and EXISTS are not interchangeable.
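The place where the two constructs most clearly give different results is the negated form with a NULL in the inner table. A minimal sketch, using Python's sqlite3 module and invented tables (outer_t and inner_t are hypothetical names, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (id INTEGER);
    CREATE TABLE inner_t (id INTEGER);
    INSERT INTO outer_t VALUES (1), (2), (3);
    INSERT INTO inner_t VALUES (1), (NULL);
""")


def ids(sql):
    """Return the ids selected by the given query."""
    return [row[0] for row in conn.execute(sql)]


in_ids = ids(
    "SELECT id FROM outer_t WHERE id IN (SELECT id FROM inner_t) ORDER BY id"
)
exists_ids = ids(
    "SELECT o.id FROM outer_t o WHERE EXISTS "
    "(SELECT 1 FROM inner_t i WHERE i.id = o.id) ORDER BY o.id"
)

# "2 NOT IN (1, NULL)" evaluates to NULL rather than true, so no row passes.
not_in_ids = ids(
    "SELECT id FROM outer_t WHERE id NOT IN (SELECT id FROM inner_t) "
    "ORDER BY id"
)
# NOT EXISTS only asks whether a matching row was found; the NULL never matches.
not_exists_ids = ids(
    "SELECT o.id FROM outer_t o WHERE NOT EXISTS "
    "(SELECT 1 FROM inner_t i WHERE i.id = o.id) ORDER BY o.id"
)

print(in_ids, exists_ids, not_in_ids, not_exists_ids)
```

The positive forms agree, while NOT IN returns nothing and NOT EXISTS returns the two unmatched rows.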