SQL IN query produces strange result - sql

Please see the table structure below:
CREATE TABLE Person (id int not null, PID INT NOT NULL, Name VARCHAR(50))
CREATE TABLE [Order] (OID INT NOT NULL, PID INT NOT NULL)
INSERT INTO Person VALUES (1,1,'Ian')
INSERT INTO Person VALUES (2,2,'Maria')
INSERT INTO [Order] values (1,1)
Why does the following query return two results:
select * from Person WHERE id IN (SELECT ID FROM [Order])
ID does not exist in Order. Why does the query above produce results? I would expect it to error because I'd does not exist in order.

This behavior, while unintuitive, is very well defined in Microsoft's Knowledge Base:
KB #298674 : PRB: Subquery Resolves Names of Column to Outer Tables
From that article:
To illustrate the behavior, use the following two table structures and query:
CREATE TABLE X1 (ColA INT, ColB INT)
CREATE TABLE X2 (ColC INT, ColD INT)
SELECT ColA FROM X1 WHERE ColA IN (Select ColB FROM X2)
The query returns a result where the column ColB is considered from table X1.
By qualifying the column name, the error message occurs as illustrated by the following query:
SELECT ColA FROM X1 WHERE ColA in (Select X2.ColB FROM X2)
Server: Msg 207, Level 16, State 3, Line 1
Invalid column name 'ColB'.
Folks have been complaining about this issue for years, but Microsoft isn't going to fix it. It is, after all, complying with the standard, which essentially states:
If you don't find column x in the current scope, traverse to the next outer scope, and so on, until you find a reference.
More information in the following Connect "bugs" along with multiple official confirmations that this behavior is by design and is not going to change (so you'll have to change yours - i.e. always use aliases):
Connect #338468 : CTE Column Name resolution in Sub Query is not validated
Connect #735178 : T-SQL subquery not working in some cases when IN operator used
Connect #302281 : Non-existent column causes subquery to be ignored
Connect #772612 : Alias error not being reported when within an IN operator
Connect #265772 : Bug using sub select
In your case, this "error" will probably be much less likely to occur if you use more meaningful names than ID, OID and PID. Does Order.PID point to Person.id or Person.PID? Design your tables so that people can figure out the relationships without having to ask you. A PersonID should always be a PersonID, no matter where in the schema it is; same with an OrderID. Saving a few characters of typing is not a good price to pay for a completely ambiguous schema.
You could write an EXISTS clause instead:
... FROM dbo.Person AS p WHERE EXISTS
(
SELECT 1 FROM dbo.[Order] AS o
WHERE o.PID = p.id -- or is it PID? See why it pays to be explicit?
);

The problem here is that you're not using Table.Column notation in your subquery, table Order doesn't have column ID and ID in subquery really means Person.ID, not [Order].ID. That's why I always insist on using aliases for tables in production code. Compare these two queries:
select * from Person WHERE id IN (SELECT ID FROM [Order]);
select * from Person as p WHERE p.id IN (SELECT o.ID FROM [Order] as o)
The first one will execute but will return incorrect results, and the second one will raise an error. It's because the outer query's columns may be referenced in a subquery, so in this case you can use Person columns inside the subquery.
Perhaps you wanted to use the query like this:
select * from Person WHERE pid IN (SELECT PID FROM [Order])
But you never know when the schema of the [Order] table changes, and if somebody drops the column PID from [Order] then your query will return all rows from the table Person. Therefore, use aliases:
select * from Person as P WHERE P.pid IN (SELECT O.PID FROM [Order] as O)
Just quick note - this is not SQL Server specific behaviour, it's standard SQL:
SQL Server demo
PostgreSQL demo
MySQL demo
Oracle demo

Order table doesnt have id column
Try these instead:
select * from Person WHERE id IN (SELECT OID FROM [Order])
OR
select * from Person WHERE pid IN (SELECT PID FROM [Order])

Related

SQL - Operator IN with operator WITH AS

Using postgres
The following SQL creates TestsTodo, having all the informations about a table tests.
It also have a TestsTodoIds, having only the ids of TestsTodo
Then, i want to update all the rows of a table "test_results", with all the ids in TestsTodoIds. I can do it by setting "WHERE test_id IN (select id from TestsTodo)" But i cant do it with "WHERE test_id IN TestsTodoIds", which is basically the same, i don't understand why.
WITH
TestsTodo AS
(
-- Selecting from table tests
),
TestsTodoIds AS
(
SELECT id FROM TestsTodo -- This returns all the. ids from TestsTodo
)
--UPDATE test_results
--SET status = 'FOUND_IN_DB'
--WHERE test_id IN (SELECT id FROM TestsTodo)
--RETURNING *
-- This works
UPDATE test_results
SET status = 'FOUND_IN_DB'
WHERE test_id IN TestsTodoIds
RETURNING * -- This does not
Error: ERROR: syntax error at or near "TestsTodoIds"
LINE 31: WHERE test_id IN TestsTodoIds
^
Your CTE called TestsTodoIds is a virtual table with one column. It isn't a set of values, and IN needs a set of values. You could use
WHERE test_id IN (SELECT id FROM TestsTodoIds)
and your query would function correctly.
That CTE is, in my opinion, unnecessary. The WHERE clause I suggested will perform exactly the same as
WHERE test_id IN (SELECT id FROM TestsTodo)
and you'll have less complexity to cope with when reading and reasoning about the query.

SELECT query to return a row from a table with all values set to Null

I need to make a query but get the value in every field empty. Gordon Linoff give me the clue to this need here:
SQL Empty query results
which is:
select t.*
from (select 1 as val
) v left outer join
table t
on 1 = 0;
This query wors perfectly on PostgreSQL but gets an error when trying to execute it in Microsoft Access, it says that 1 = 0 expression is not admitted. How could it be fixed to work on microsoft access?
Regards,
If the table has a numeric primary key column whose values are non-negative then the following query will work in Access. The primary key field is [ID].
SELECT t2.*
FROM
myTable AS t2
RIGHT JOIN
(
SELECT TOP 1 (ID * -1) AS badID
FROM myTable AS t1
) AS rowStubs
ON t2.ID = rowStubs.badID
This was tested with Access 2010.
I am offering this answer here, even though you didn't think it worked in my edit to your original question. What is the problem?
select t.*
from (select max(col) as maxval from table as t
) as v left join
table as t
on v.val < t.col;
You can use the following query, but it would still need a little "manual coding".
EDITS:
Actually, you do not need the SWITCH function. Modified query below.
Removed the reference to Description column from one line. Still, you would need to use a Text column name (such as Description) in the last line of the query.
For example, the following query would work for the Months table:
select Months.*
from Months
RIGHT OUTER JOIN
(select "" as DummyColumn from Months) Blank_Data
ON Months.Description = Blank_Data.DummyColumn; --hardcoded Description column

The result of a scalar fullselect, SELECT INTO statement, or VALUES INTO statement is more than one row

I have a problem about my query and i dont know what to do about it.Here is my query.and error i take.
select prd.product_id,prd.prd_name ,prd.prd_longname ,prd.prd_brand ,prd.prd_picture ,prd.market_comment ,prd.categ ,prd.status_id ,prd.status ,prd.active_stock ,prd.slot_date ,prd.currency ,prd.selling_price ,prd.old_price ,prd.type_of_sell ,prd.catalog_id ,prd.catalog_name ,prd.demo ,prd.demo_id,
(select coalesce(count(prd_attribute_id),0) from PRD_ATTRIBUTE where status_id = 1 and product_id = prd.product_id and batch_code <> '0000') as ATTR_CNT ,
(select prd_attribute_id from PRD_ATTRIBUTE where product_id = prd.product_id and batch_code = '0000' and status_id = 1),
(select categ_url from DBNAME.PRD_CATEGORY
where parameter_id = prd.categ_id)||'/'|| (select prd_url from DBNAME.PRODUCT_URL where product_id = prd.product_id) as CATEG_URL
from TEMP_WEB_PRD prd
order by slotdate desc
fetch first 12 rows only
Error:
[IBM][CLI Driver][DB2/AIX64] SQL0811N The result of a scalar fullselect, SELECT INTO statement, or VALUES INTO statement is more than one row. SQLSTATE=21000
The error message is pretty self-explanatory. One of your sub-selects is returning more than one row back, and the database doesn't know how to handle that. I'm guessing your database is DB2 on Linux/Unix/Windows, based on the error message, so here's the Info Center article on your error.
Yes in short it's due to you are using "=" but duplicate rows returned from sub-select statement.
Suppose you have a simple table:
create table T1 (ID int not null primary key, FID int);
The following statement may return SQL0811N if FID column references the same ID values multiple times.
db2 "select id from T1 where ID=(select fid from T1)"
ID
SQL0811N The result of a scalar fullselect, SELECT INTO statement, or VALUES
INTO statement is more than one row. SQLSTATE=21000
The following statement will run successfully:
db2 "select id from T1 where ID IN (select fid from T1)"
Maybe one of your tables has duplicate data, make sure you've checked it. I have the same problem too, it is caused by duplicate rows and it makes subquery return more than one row.

Alternative to NOT IN()

I have a table with 14,028 rows from November 2012. I also have a table with 13,959 rows from March 2013. I am using a simple NOT IN() clause to see who has left:
select * from nov_2012 where id not in(select id from mar_2013)
This returned 396 rows and I never thought anything of it, until I went to analyze who left. When I pulled all the ids for the lost members and put them in a temp table (##lost), 32 of them were actually still in the mar_2013 table. I can pull them up when I search for their ids using the following:
select * from mar_2013 where id in(select id from ##lost)
I can't figure out what is going on. I will mention that the id field I created is an IDENTITY column. Could that have any effect on the matching using NOT IN? Is there a better way to check for missing rows between tables? I have also tried:
select a.* from nov_2012 a left join mar_2013 b on b.id = a.id where b.id is NULL
And received the same results.
This is how I created the identity field;
create table id_lookup( dateofcusttable date ,sin int ,sex varchar(12) ,scid int identity(777000,1))
insert into id_lookup (sin, sex) select distinct sin, sex from [Client Raw].dbo.cust20130331 where sin <> 0 order by sin, sex
This is how I added the scid into the march table:
select scid, rowno as custrowno
into scid_20130331
from [Client Raw].dbo.cust20130331 cust
left join id_lookup scid
on scid.sin = cust.sin
and scid.sex = cust.sex
update scid_20130331
set scid = custrowno where scid is NULL --for members who don't have more than one id or sin information is not available
drop table Account_Part2_Current
select a.*, scid
into Account_Part2_Current
from Account_Part1_Current a
left join scid_20130331 b
on b.custrowno = a.rowno_custdmd_cust
I then group all the information by the scid
I would prefer this form (and here's why):
SELECT a.id --, other columns
FROM dbo.nov_2012 AS a
WHERE NOT EXISTS (SELECT 1 FROM dbo.mar_2013 WHERE id = a.id);
However this should still give the same results as what you've tried, so I suspect there is something about the data model that you're not telling us - for example, is mar_2013.id nullable?
this is logically equivalent to not in and is faster than not in.
where yourfield in
(select afield
from somewhere
minus
select
thesamefield
where you want to exclude the record
)
It probably isn't as fast as using where not exists, as per Aaron's answer so you should only use it if not exists does not provide the results you want.

How can I stop joins from adding rows in my match query?

I'm having difficulty translating what I want into functional programming, since I think imperatively. Basically, I have a table of forms, and a table of expectations. In the Expectation view, I want it to look through the forms table and tell me if each one found a match. However, when I try to use joins to accomplish this, the joins are adding rows to the Expectation table when two or more forms match. I do not want this.
In an imperative fashion, I want the equivalent of this:
ForEach (row in Expectation table)
{
if (any form in the Form table matches the criteria)
{
MatchID = form.ID;
SignDate = form.SignDate;
...
}
}
What I have in SQL is this:
SELECT
e.*, match.ID, match.SignDate, ...
FROM
POFDExpectation e LEFT OUTER JOIN
(SELECT MIN(ID) as MatchID, MIN(SignDate) as MatchSignDate,
COUNT(*) as MatchCount, ...
FROM Form f
GROUP BY (matching criteria columns)
) match
ON (form.[match criteria] = expectation.[match criteria])
Which works okay, but very slowly, and every time there are TWO matches, a row is added to the Expectation results. Mathematically I understand that a join is a cross multiply and this is expected, but I'm unsure how to do this without them. Subquery perhaps?
I'm not able to give too many further details about the implementation, but I'll be happy to try any suggestion and respond with the results. I have 880 Expectation rows, and 942 results being returned. If I only allow results that match one form, I get 831 results. Neither are desirable, so if yours gets me to exactly 880, yours is the accepted answer.
Edit: I am using SQL Server 2008 R2, though a generic solution would be best.
Sample code:
--DROP VIEW ExpectationView; DROP TABLE Forms; DROP TABLE Expectations;
--Create Tables and View
CREATE TABLE Forms (ID int IDENTITY(1,1) PRIMARY KEY, ReportYear int, Name varchar(100), Complete bit, SignDate datetime)
GO
CREATE TABLE Expectations (ID int IDENTITY(1,1) PRIMARY KEY, ReportYear int, Name varchar(100))
GO
CREATE VIEW ExpectationView AS select e.*, filed.MatchID, filed.SignDate, ISNULL(filed.FiledCount, 0) as FiledCount, ISNULL(name.NameCount, 0) as NameCount from Expectations e LEFT OUTER JOIN
(select MIN(ID) as MatchID, ReportYear, Name, Complete, Min(SignDate) as SignDate, COUNT(*) as FiledCount from Forms f GROUP BY ReportYear, Name, Complete) filed
on filed.ReportYear = e.ReportYear AND filed.Name like '%'+e.Name+'%' AND filed.Complete = 1 LEFT OUTER JOIN
(select MIN(ID) as MatchID, ReportYear, Name, COUNT(*) as NameCount from Forms f GROUP BY ReportYear, Name) name
on name.ReportYear = e.ReportYear AND name.Name like '%'+e.Name+'%'
GO
--Insert Text Data
INSERT INTO Forms (ReportYear, Name, Complete, SignDate)
SELECT 2011, 'Bob Smith', 1, '2012-03-01' UNION ALL
SELECT 2011, 'Bob Jones', 1, '2012-10-04' UNION ALL
SELECT 2011, 'Bob', 1, '2012-07-20'
GO
INSERT INTO Expectations (ReportYear, Name)
SELECT 2011, 'Bob'
GO
SELECT * FROM ExpectationView --Should only return 1 result, returns 9
The 'filed' shows that they have completed a form, 'name' shows that they may have started one but not finished it. My view has four different 'match criteria' - each a little more strict, and counts each. 'Name Only Matches', 'Loose Matches', 'Matches' (default), 'Tight Matches' (used if there are more than one default match.
This is how I do it when I want to keep to a JOIN-type query format:
SELECT
e.*,
match.ID,
match.SignDate,
...
FROM POFDExpectation e
OUTER APPLY (
SELECT TOP 1
MIN(ID) as MatchID,
MIN(SignDate) as MatchSignDate,
COUNT(*) as MatchCount,
...
FROM Form f
WHERE form.[match criteria] = expectation.[match criteria]
GROUP BY ID (matching criteria columns)
-- Add ORDER BY here to control which row is TOP 1
) match
It usually performs better as well.
Semantically, {CROSS|OUTER} APPLY (table-expression) specifies a table-expression that is called once for each row in the preceding table expressions of the FROM clause and then joined to them. Pragmatically, however, the compiler treats it almost identically to a JOIN.
The practical difference is that unlike a JOIN table-expression, the APPLY table-expression is dynamically re-evaluated for each row. So instead of an ON clause, it relies on its own logic and WHERE clauses to limit/match its rows to the preceding table-expressions. This also allows it to make reference to the column-values of the preceding table-expressions, inside its own internal subquery expression. (This is not possible in a JOIN)
The reason that we want this here, instead of a JOIN, is that we need a TOP 1 in the sub-query to limit its returned rows, however, that means that we need to move the ON clause conditions to the internal WHERE clause so that it will get applied before the TOP 1 is evaluated. And that means that we need an APPLY here, instead of the more usual JOIN.
#RBarryYoung answered the question as I asked it, but there was a second question that I didn't make very clear. What I really wanted was a combination of his answer and this question, so for the record here's what I used:
SELECT
e.*,
...
match.ID,
match.SignDate,
match.MatchCount
FROM
POFDExpectation e
OUTER APPLY (
SELECT TOP 1
ID as MatchID,
ReportYear,
...
SignDate as MatchSignDate,
COUNT(*) as MatchCount OVER ()
FROM
Form f
WHERE
form.[match criteria] = expectation.[match criteria]
-- Add ORDER BY here to control which row is TOP 1
) match