Why does a 'where' statement seem to filter out expected rows in SAS PROC SQL?

I full joined 2 tables first and then full joined a 3rd table, now I got 1000 more rows in the result. But when I added a where statement following the join process I can only get 200 more rows and it seems that some expected rows were filtered. I don't know what I've done wrong.
proc sql;
create table ECG.RECON as
select a.SUBJID as SUBJID_004 ,
a.VISIT as VISIT_004,
input(a.EGDAT, yymmdd10.) as EGDAT_004 ,
...
b.SUBJID as SUBJID_001 ,
...
c.DSDECOD
from
SOURCE.A a full join SOURCE.B b on
(a.SUBJID = b.SUBJID and a.VISIT = b.VISIT )
full join SOURCE.C as c on b.SUBJID = c.SUBJID
where c.EPOCH = "SCR" and c.DSDECOD ne "FAILURE" and a.TEST = "Inter";
quit;

Your where clause is causing empty rows to be filtered. Consider a simplified schema:
TableA
Col1 Col2
----------------
1 A
2 B
TableB
Col1 Col2
----------------
1 X
3 Y
And a simple full join with no filter:
SELECT *
FROM TableA AS A
FULL JOIN TableB AS B
ON A.Col1 = B.Col1
Which will return
A.Col1 A.Col2 B.Col1 B.Col2
---------------------------------------
1 A 1 X
2 B NULL NULL
NULL NULL 3 Y
Now, if you apply a filter to anything from A, e.g. WHERE A.Col1 = 1, you'll get rid of the 2nd row (probably as intended) since 2 <> 1, but you'll also remove the 3rd row, since A.Col1 is NULL there and NULL = 1 does not evaluate to true. As you have removed all rows with no matching record in TableA, you have effectively turned your full join into a left join. If you then apply a further predicate on TableB, your left join becomes an inner join.
With Full joins, I find the easiest solution is to apply your filters before the join by using subqueries, e.g.:
SELECT *
FROM (SELECT * FROM TableA WHERE Col1 = 1) AS A
FULL JOIN TableB AS B
ON A.Col1 = B.Col1;
Which removes the 2nd row, but still retains the 3rd row from the previous results:
A.Col1 A.Col2 B.Col1 B.Col2
---------------------------------------
1 A 1 X
NULL NULL 3 Y
You can also use OR, but the more predicates you have the more convoluted this can get, e.g.
SELECT *
FROM TableA AS A
FULL JOIN TableB AS B
ON A.Col1 = B.Col1
WHERE (A.Col1 = 1 OR A.Col1 IS NULL);
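The effect described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module and, as an assumption made for portability, a LEFT JOIN instead of a FULL JOIN (older SQLite builds do not support FULL JOIN). The tables are the ones from the simplified schema; the filter on B.Col2 is only an illustration of "a predicate on the outer side".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Col1 INTEGER, Col2 TEXT);
    CREATE TABLE TableB (Col1 INTEGER, Col2 TEXT);
    INSERT INTO TableA VALUES (1, 'A'), (2, 'B');
    INSERT INTO TableB VALUES (1, 'X'), (3, 'Y');
""")

# Filter in WHERE: the unmatched row (B side is NULL) fails the predicate,
# so the outer join behaves like an inner join.
where_rows = conn.execute("""
    SELECT A.Col1, A.Col2, B.Col1, B.Col2
    FROM TableA AS A
    LEFT JOIN TableB AS B ON A.Col1 = B.Col1
    WHERE B.Col2 = 'X'
    ORDER BY A.Col1
""").fetchall()

# Filter applied before the join in a subquery: the unmatched row survives.
subquery_rows = conn.execute("""
    SELECT A.Col1, A.Col2, B.Col1, B.Col2
    FROM TableA AS A
    LEFT JOIN (SELECT * FROM TableB WHERE Col2 = 'X') AS B
        ON A.Col1 = B.Col1
    ORDER BY A.Col1
""").fetchall()

print(where_rows)
print(subquery_rows)
```

The first query keeps only the matched row, while the second also keeps the row for Col1 = 2 with NULLs on the B side. The same reasoning should carry over to the c.EPOCH, c.DSDECOD and a.TEST predicates in the question.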

Related

SQL Server: how can I count values in one field different from another, with multiple values for same identifier?

I have two tables with one common column, and an identifying value that can be duplicate (several observations of same document).
An example:
TableA:
A_identifier | Value
-------------+-------
1 | A
1 | B
TableB:
B_identifier | A_identifier | Value
-------------+--------------+-------
1 | 1 | A
2 | 1 | B
3 | 1 | B
4 | 1 | C
The above example illustrates the type of situation I am looking for in my data - we have a case in TableA with multiple values, of which some are the same in TableB and some are not. So TableA.Value and TableB.Value represent the same concept.
I want to know for each TableA.A_identifier, how many rows of TableB have different values than TableA.Value. If there was only one observation per A_identifier, this could be solved with a not, but the multiple possible values prevent this.
What I have thought about doing is something like this (which does not work):
select distinct
b.B_identifier, a.A_identifier
from
TableB b
join
TableA a on b.A_identifier = a.A_identifier and b.Value != a.Value
While the query technically works, it returns the wrong result - it counts all the cases where the values in TableA and TableB are different in a given row. However, I want it to only count the values in TableB which are not present at all in TableA for each A_identifier.
I tried replacing the != with not in which is what I would do for a static parameter. This syntax is not supported.
I hope my question makes sense, and that somebody can help. Thank you in advance.
Try this query if it works for you,
SELECT COUNT(b.A_identifier)
FROM TableB b
LEFT JOIN TableA a
ON b.A_identifier = a.A_identifier
AND b.Value = a.Value
WHERE a.A_identifier IS NULL -- keeps only TableB rows whose value has no match in TableA
AND EXISTS (SELECT 1
FROM TableA c
WHERE b.A_identifier = c.A_identifier) -- shows only A_identifier
-- that is present in TableA
However, if you want to get the count for each Value
SELECT b.A_identifier, b.Value, TOTAL_COUNT = COUNT(b.A_identifier)
FROM TableB b
LEFT JOIN TableA a
ON b.A_identifier = a.A_identifier
AND b.Value = a.Value
WHERE a.A_identifier IS NULL
AND EXISTS (SELECT 1
FROM TableA c
WHERE b.A_identifier = c.A_identifier)
GROUP BY b.A_identifier, b.Value
Use NOT EXISTS
select t1.A_identifier, count(distinct t2.B_identifier)
from TableA t1
left join TableB t2 on t1.A_identifier = t2.A_identifier and
NOT EXISTS (
select 1
from TableA t3
where t3.A_identifier = t2.A_identifier and
t3.Value = t2.Value
)
group by t1.A_identifier
How about the following SQL?
select distinct TableA.A_identifier,
(select count(*) from TableB
where TableB.A_identifier = TableA.A_identifier
and not exists(
select * from TableA where A_identifier = TableB.A_identifier
and Value = TableB.Value)
)
as TableB_Rows
from TableA
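To compare the approaches on the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module (an assumption for the sake of a runnable example; the question targets SQL Server). It runs the asker's join on b.Value != a.Value next to a NOT EXISTS query in the spirit of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (A_identifier INTEGER, Value TEXT);
    CREATE TABLE TableB (B_identifier INTEGER, A_identifier INTEGER, Value TEXT);
    INSERT INTO TableA VALUES (1, 'A'), (1, 'B');
    INSERT INTO TableB VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'B'), (4, 1, 'C');
""")

# The asker's attempt: every TableB row differs from at least one TableA row,
# so all four B_identifiers come back.
naive_rows = conn.execute("""
    SELECT DISTINCT b.B_identifier, a.A_identifier
    FROM TableB b
    JOIN TableA a
      ON b.A_identifier = a.A_identifier AND b.Value != a.Value
    ORDER BY b.B_identifier
""").fetchall()

# NOT EXISTS: count only TableB rows whose value does not occur at all
# in TableA for the same A_identifier.
missing_counts = conn.execute("""
    SELECT b.A_identifier, COUNT(*) AS missing_rows
    FROM TableB b
    WHERE EXISTS (SELECT 1 FROM TableA c
                  WHERE c.A_identifier = b.A_identifier)
      AND NOT EXISTS (SELECT 1 FROM TableA a
                      WHERE a.A_identifier = b.A_identifier
                        AND a.Value = b.Value)
    GROUP BY b.A_identifier
""").fetchall()

print(naive_rows)
print(missing_counts)
```

On this data the naive join reports all four TableB rows, whereas the NOT EXISTS version reports a single missing row (the one with value C) for A_identifier 1.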

SQL: how to check if a value in a column is NOT in another table

Maybe I need another coffee because this seems so simple yet I cannot get my head around it.
Let's say I have a tableA with a col1 where employee IDs are stored.... ALL employee IDs. And the 2nd table, tableB has col2 which lists all employeeID who have a negative evaluation.
I need a query which returns all IDs from col1 of tableA and a new column which shows a '1' for those IDs which do NOT exist in col2 of tableB.
I am doing this in dashDB
One option uses a LEFT JOIN between the two tables:
SELECT a.col1,
CASE WHEN b.col2 IS NULL THEN 1 ELSE 0 END AS new_col
FROM tableA a
LEFT JOIN tableB b
ON a.col1 = b.col2
Alternatively, you can use LEFT JOIN along with the IFNULL function as below. Note that for IDs which do exist in tableB, NewCol then holds the ID itself rather than 0.
SELECT a.col1,
IFNULL(b.col2, 1) NewCol
FROM tableA a
LEFT JOIN tableB b
ON a.col1 = b.col2
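The two LEFT JOIN variants can be checked side by side with a short sketch. It uses Python's built-in sqlite3 module as a stand-in for dashDB (an assumption; only the SQL itself comes from the answers), and the employee IDs 101 to 103 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (col1 INTEGER);  -- all employee IDs (hypothetical)
    CREATE TABLE tableB (col2 INTEGER);  -- IDs with a negative evaluation
    INSERT INTO tableA VALUES (101), (102), (103);
    INSERT INTO tableB VALUES (102);
""")

# CASE variant: 1 when the ID is absent from tableB, otherwise 0.
case_rows = conn.execute("""
    SELECT a.col1,
           CASE WHEN b.col2 IS NULL THEN 1 ELSE 0 END AS new_col
    FROM tableA a
    LEFT JOIN tableB b ON a.col1 = b.col2
    ORDER BY a.col1
""").fetchall()

# IFNULL variant: 1 when the ID is absent, but the ID itself when present.
ifnull_rows = conn.execute("""
    SELECT a.col1,
           IFNULL(b.col2, 1) AS NewCol
    FROM tableA a
    LEFT JOIN tableB b ON a.col1 = b.col2
    ORDER BY a.col1
""").fetchall()

print(case_rows)
print(ifnull_rows)
```

If a clean 0/1 flag is needed, the CASE form is the safer choice; the IFNULL form only works as a flag when the caller treats "anything other than 1" as a match.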

Cancel join condition based on column value

I am stuck in a situation where I need to perform a conditional join. In brief, I have two tables, TableA and TableB.
TableA has columns A1,A2,A3,A4,A5,Condition1,Condition2
Similarly, TableB has columns B1,B2,Condition1,Condition2
I need to join on TableA.Condition1 = TableB.Condition1 and, conditionally, on TableA.Condition2 = TableB.Condition2, the condition being that TableA.Condition2 is not null for the records matched by the first join.
In other words, if a record matches on TableA.Condition1 = TableB.Condition1 and TableA.Condition2 is not null for it, then apply the second join condition; otherwise don't apply it.
Query could be like
SELECT A.* FROM TableA A
INNER JOIN TableB B
ON A.Condition1 = B.Condition1 -- This must be perform
AND WHEN A.Condition2 IS NULL THEN
1 = 1 -- Assuming no join here
ELSE
A.Condition2 = B.Condition2 -- perform join
END
You are only selecting from TableA, so how about using exists instead?
SELECT A.*
FROM TableA A
WHERE (A.Condition2 IS NULL AND
EXISTS (SELECT 1 FROM TableB b WHERE A.Condition1 = B.Condition1)
) OR
(A.Condition2 IS NOT NULL AND
EXISTS (SELECT 1 FROM TableB b WHERE A.Condition1 = B.Condition1 AND A.Condition2 = B.Condition2)
);
Or, if you want a JOIN:
SELECT A.*
FROM TableA A JOIN
TableB B
ON A.Condition1 = B.Condition1 AND
(A.Condition2 IS NULL OR A.Condition2 = B.Condition2);
Try this, it may help you:
SELECT A.*
FROM TableA A
INNER JOIN TableB B ON A.Condition1 = B.Condition1 AND
((A.Condition2 IS NOT NULL AND A.Condition2 = B.Condition2)
OR (A.Condition2 IS NULL) )
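A small sketch of the "skip the second condition when Condition2 is NULL" join, run through Python's built-in sqlite3 module. The engine choice, the reduced column set (only A1 and B1 besides the condition columns) and all row values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (A1 TEXT, Condition1 INTEGER, Condition2 TEXT);
    CREATE TABLE TableB (B1 TEXT, Condition1 INTEGER, Condition2 TEXT);
    INSERT INTO TableA VALUES
        ('r1', 1, NULL),  -- Condition2 is NULL: only Condition1 is checked
        ('r2', 1, 'x'),   -- both conditions match TableB row b1
        ('r3', 2, 'y'),   -- Condition1 matches b2, Condition2 does not
        ('r4', 3, NULL);  -- no TableB row with Condition1 = 3
    INSERT INTO TableB VALUES ('b1', 1, 'x'), ('b2', 2, 'z');
""")

joined_rows = conn.execute("""
    SELECT A.A1
    FROM TableA A
    INNER JOIN TableB B
        ON A.Condition1 = B.Condition1
       AND (A.Condition2 IS NULL OR A.Condition2 = B.Condition2)
    ORDER BY A.A1
""").fetchall()

print(joined_rows)
```

Rows r1 and r2 are returned: r1 because its Condition2 is NULL and the second comparison is skipped, r2 because both conditions match. This interprets the requirement row by row; if the second condition should be switched off for the whole query depending on other rows, a different formulation is needed.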
hard to understand your question and even harder to understand the purpose
SELECT A.*
-- join table1 and table2 on Condition1
FROM TableA A
JOIN TableB B ON A.Condition1 = B.Condition1
-- if max condition2 is null then there is nothing but nulls
JOIN ( SELECT MAX(Condition2) Condition2 FROM TableA A2 ) A2
-- in that case every row resulting from join1 goes
ON A2.Condition2 IS NULL
-- otherwise use condition2 but replace nulls with some placeholder
-- or maybe you have either all null or no nulls
OR COALESCE (A.Condition2,'null') = COALESCE (B.Condition2,'null')
;

How to choose a proper filter for an sql join

If I have table A and table B, each with one column:
A:
col1
1
2
3
1
B:
col1
1
1
4
and I want all rows from A and the matching rows from B, only when the column has non null value in both tables, which one should I use?
select * from A left join B on A.col1 = B.col1 and A.col1 is not null AND B.col1 is not null;
select * from A left join B on A.col1 = B.col1 where A.col1 is not null OR B.col1 is not null;
select * from A left join B on A.col1 = B.col1 and (A.col1 is not null OR B.col1 is not null);
My guess is that the first and the third are the same and will provide the desired output.
If you want to skip null values and you want to link both tables only on existing values you should use an INNER JOIN, the null check is redundant:
SELECT A.*
FROM A INNER JOIN B ON A.col1 = B.col1
NULL will never match any other value (not even NULL itself), unless the join condition explicitly uses the IS NULL or IS NOT NULL predicates.
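The claim that NULL never matches in an equality join can be verified with a short sketch using Python's built-in sqlite3 module (an assumption for runnability). The tables follow the question, with one NULL row added to each side purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col1 INTEGER);
    CREATE TABLE B (col1 INTEGER);
    INSERT INTO A VALUES (1), (2), (3), (1), (NULL);  -- NULL row is hypothetical
    INSERT INTO B VALUES (1), (1), (4), (NULL);       -- NULL row is hypothetical
""")

# NULL = NULL evaluates to NULL (unknown), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Inner join: only the 1s pair up (two rows in A times two rows in B).
inner_rows = conn.execute("""
    SELECT A.col1, B.col1
    FROM A INNER JOIN B ON A.col1 = B.col1
""").fetchall()

# Left join: every row of A is kept; the NULL row of A gets no partner.
left_rows = conn.execute("""
    SELECT A.col1, B.col1
    FROM A LEFT JOIN B ON A.col1 = B.col1
""").fetchall()

print(null_equals_null)
print(inner_rows)
print(left_rows)
```

The NULL row in A never pairs with the NULL row in B, so an explicit IS NOT NULL check in the join condition adds nothing for an equality join.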
In a comment you said you are checking for more than NULLs. In that case I would probably take the derived table or CTE approach. A derived table is shown below because you did not specify which database backend you use, so I don't know whether you can use CTEs.
select *
from
(select * from tablea where test is not null and test <> '' and test <> 'N/A') a
join
(select * from tableb where test is not null and test <> '' and test <> 'N/A') b
on a.col1 = b.col1
You just need
select * from A left join B on A.col1 = B.col1
NULL will never match anything (when not compared with IS NULL and the like), therefore NULL in A won't match anything in B.
Since you want all the rows from A, below query should work:
select * from A left outer join B on A.col1 = B.col1 where A.col1 is not null and A.col1<>'N/A' and A.col1<>''
http://sqlfiddle.com/#!2/98501/14

Compare the data in two tables with same schema

I have been doing a bit of searching for a while now on a particular problem, but I can't quite find this particular question.
I have a rather unusual task to achieve in SQL:
I have two tables, say A and B, which have exactly the same column names, of the following form:
id | column_1 | ... | column_n
Both tables have the same number of rows, with the same id's, but for a given id there is a chance that the rows from tables A and B differ in one or more of the other columns.
I already have a query which returns all rows from table A for which the corresponding row in table B is not identical, but what I need is a query which returns something of the form:
id | differing_column
----------------------
1 | column_1
3 | column_6
meaning that the row with id '1' has different 'column_1' values in tables A and B, and the row with id '3' has different 'column_6' values in tables A and B.
Is this at all achievable? I imagine it might require some sort of pivot in order to get the column names as values, but I might be wrong. Any help/suggestions much appreciated.
Yes you can do that with a query like this:
WITH Diffs (Id, Col) AS (
SELECT
a.Id,
CASE
WHEN a.Col1 <> b.Col1 THEN 'Col1'
WHEN a.Col2 <> b.Col2 THEN 'Col2'
-- ...and so on
ELSE NULL
END as Col
FROM TableOne a
JOIN TableTwo b ON a.Id=b.Id
)
SELECT Id, Col
FROM Diffs
WHERE Col IS NOT NULL
Note that the above query is not going to return all the columns with differences, but only the first one that it is going to find.
You can do this with an unpivot -- assuming that the values in the columns are of the same type.
If your data is not too big, I would just recommend using a bunch of union all statements instead:
select a.id, 'Col1' as differing_column
from a join b on a.id = b.id
where a.col1 <> b.col1 or a.col1 is null and b.col1 is not null or a.col1 is not null and b.col1 is null
union all
select a.id, 'Col2' as differing_column
from a join b on a.id = b.id
where a.col2 <> b.col2 or a.col2 is null and b.col2 is not null or a.col2 is not null and b.col2 is null
. . .
This prevents issues with potential type conversion problems.
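The UNION ALL approach can be tried out with a minimal sketch using Python's built-in sqlite3 module (an assumption for runnability; the question does not name an engine). The two-column tables and their values are made up, and the output column is called differing_column to mirror the result shape asked for in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, col1 TEXT, col2 TEXT);
    CREATE TABLE b (id INTEGER, col1 TEXT, col2 TEXT);
    INSERT INTO a VALUES (1, 'x', 'p'), (2, 'y', 'q'), (3, 'z', NULL);
    INSERT INTO b VALUES (1, 'changed', 'p'), (2, 'y', 'q'), (3, 'z', 'r');
""")

# One SELECT per compared column; the extra IS NULL checks treat
# "NULL on one side only" as a difference.
diff_rows = conn.execute("""
    SELECT a.id, 'col1' AS differing_column
    FROM a JOIN b ON a.id = b.id
    WHERE a.col1 <> b.col1
       OR (a.col1 IS NULL AND b.col1 IS NOT NULL)
       OR (a.col1 IS NOT NULL AND b.col1 IS NULL)
    UNION ALL
    SELECT a.id, 'col2' AS differing_column
    FROM a JOIN b ON a.id = b.id
    WHERE a.col2 <> b.col2
       OR (a.col2 IS NULL AND b.col2 IS NOT NULL)
       OR (a.col2 IS NOT NULL AND b.col2 IS NULL)
    ORDER BY 1, 2
""").fetchall()

print(diff_rows)
```

Row 1 is reported for col1 and row 3 for col2 (NULL on one side only), while row 2 is identical in both tables and does not appear. Unlike the CASE version, a row that differs in several columns would show up once per differing column.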
If you don't mind having the results on one row, you can do:
select a.id,
(case when a.col1 <> b.col1 or a.col1 is null and b.col1 is not null or a.col1 is not null and b.col1 is null
then 'Col1;'
else ''
end) +
(case when a.col2 <> b.col2 or a.col2 is null and b.col2 is not null or a.col2 is not null and b.col2 is null
then 'Col2;'
else ''
end) +
. . .
from a join b on a.id = b.id;
If your columns are of the same type, there is a slick method:
SELECT id,col
FROM (SELECT * FROM A UNION ALL SELECT * FROM B) t1
UNPIVOT (value for col in (column_1,column_2,column_3,column_4)) t2
GROUP BY id,col
HAVING COUNT(DISTINCT value) > 1
If you need to handle NULL as a unique value, then use HAVING COUNT(DISTINCT ISNULL(value,X)) > 1 with X being a value that doesn't occur in your data