Left outer join not matching records consistently - sql

I'm experiencing some very bizarre behaviors with a left outer join query.
Edit: I've provided an Access database with the tables and data needed to reproduce this problem. Note that this is the best way to reproduce the problem, because if the data size is reduced to a small sample, then the problem doesn't appear. This can be downloaded at: https://joek.com/etc/left_join_odd_behavior.accdb
Edit: My core problem with this is that I've been using left outer joins in this manner for years without experiencing a problem like this before. Additionally, I probably have hundreds of other left outer joins like this across a dozen applications and haven't experienced a problem like this; or worse, the problem is occurring in some of those other queries but hasn't been noticed yet.
Edit: The left outer join is providing different results based on factors that should not affect the results (for example, inserting a different number of columns into another table, including a WHERE clause for a specific set of results, or reducing the data to a small sample). My underlying problem isn't with fixing this query (I can do that a few different ways that give the correct results). My problem is understanding why this query that should work doesn't work in certain cases, so that I can understand if it's: a data problem (so that I can fix it in these tables and check that it isn't happening in any other tables), an indexing problem (so that I can apply the correct indexes to these tables and make sure other tables are correct), or an issue inherent in MS Access (so that I can code around it, change all my other similar left outer joins, and use different methods going forward [as I've had to do with some other differences between Access & SQL Server that I've accustomed myself to since having to start working with Access over 6 years ago]).
I've simplified it to the following query:
SELECT
*x*
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
I have indexes on all these columns being used.
Edit: (Note that in my sample database, I've renamed the columns from the production code to match my example query here.)
If x is:
TableA.IntA INTO TempTableX
then the results I get are missing some records which do match the left join criteria.
If x is:
TableA.* INTO TempTableX
then I get all the records expected.
If I simply return all the records without inserting them into a table, then it doesn't matter if I return TableA.IntA or TableA.* or even TableA.IntA, TableA.IntB; every which way excludes some matching records.
Each table has thousands of records (TableA 50K+, TableB 8,000+), but for the records that match on both IntA and IntB, there are only about 200 matches. And of those, only 17 have DateA set to Null.
Here's an example of data which is not being returned correctly:
TableA
-----------
IntA IntB
1 10
2 22
3 33
4 44
TableB
-------
IntA IntB DateA
2 20 1/1/2020
3 31 2/1/2020
4 44 3/1/2020
As you can see, this:
SELECT
TableA.IntA, TableA.IntB
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
should return the following (edit: because this is a left outer join with specific criteria in the ON, and with no WHERE clause to constrain the results, it should return ALL the records from TableA):
TableA.IntA TableA.IntB
-------------------------
1 10
2 22
3 33
4 44
but instead I get the following (edit: it is excluding the one record that matches the left join criteria despite no WHERE clause specifying to exclude records in TableA that do not match to TableB):
TableA.IntA TableA.IntB
-------------------------
1 10
2 22
3 33
But, if I insert TableA.* into a table, then I get the expected 4 records (edit: note that this occurs when using a SELECT TableA.* INTO TempTableX, which causes Access to create the new table based on the data structure of the columns from TableA, and I'm not returning any columns from TableB, so there would not be any problem of it not populating data because of table field requirements).
Additionally, if I add a WHERE criteria to the query to specifically get the expected records, then I get them in the result. Or, if I empty TableA and TableB of all other records except these example records, then I get them correctly.
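For comparison, the small sample above can be run through a different engine. The following is a minimal sketch using Python's built-in sqlite3 module; SQLite is only a stand-in here (an assumption, it is not Access/Jet), and the dates are stored as plain text. Under standard left outer join semantics the ON clause only decides which TableB rows attach, so all four TableA rows should come back.

```python
import sqlite3

# SQLite stands in for Access here (assumption); dates are kept as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (IntA INTEGER, IntB INTEGER);
    CREATE TABLE TableB (IntA INTEGER, IntB INTEGER, DateA TEXT);
    INSERT INTO TableA VALUES (1, 10), (2, 22), (3, 33), (4, 44);
    INSERT INTO TableB VALUES (2, 20, '1/1/2020'),
                              (3, 31, '2/1/2020'),
                              (4, 44, '3/1/2020');
""")

# Same query as in the question, with an ORDER BY for a stable result.
rows = conn.execute("""
    SELECT TableA.IntA, TableA.IntB
    FROM TableA
    LEFT OUTER JOIN TableB
        ON (TableB.IntA = TableA.IntA
        AND TableB.IntB = TableA.IntB
        AND TableB.DateA IS NULL)
    ORDER BY TableA.IntA
""").fetchall()

print(rows)  # [(1, 10), (2, 22), (3, 33), (4, 44)]
```

Row (4, 44) is kept even though its only TableB counterpart has a non-null DateA; it simply joins to no TableB row.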
Edit:
Using the provided database, you can test these situations out in the following manner:
First:
SELECT
TableA.*
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
you'll get 58,160 records, and when you take these results to a spreadsheet and filter on the CUSTID column for "5616", you'll find 25 records (this is an incomplete result).
Second:
SELECT
TableA.*
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
WHERE
TableA.CustId = "5616"
you'll find now the correct 26 records.
If you compare these results, you'll find that the one record which gets excluded in the incomplete results (IntA = 25093 & IntB = 59797) is the only record in TableB for CustId 5616 that matches both IntA and IntB. But while it doesn't match the criteria of DateA IS NULL, it should still be returned from TableA even in the first query because there is no WHERE clause excluding TableA records that do not match the left join criteria.
Alternately, if you change the first query to:
SELECT
TableA.*
INTO TempTableX
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
then you'll get all 58,348 records that exist in TableA, and filtering for CUSTID "5616" you'll find all 26 records.
Alternately, if you make no changes to the first query, but empty TableA of all records except where CustId = "5616", then you'll get the complete 26 records instead of 25.
So again, my problem is why is Access providing different results based on changes which should not typically affect the results (number of records in the database, returning data vs inserting into a table, etc)?
Additionally for more detail on why I'm using a left join like this:
This query is not the end goal, it is simply the smallest reduction of the production code where I found the problem occurring.
The production use is to collect all records from TableA that DO NOT have a matching record in TableB where DateA is null. TableB is a log of the records in TableA that have been reported; DateA is populated when the issue is resolved, at which point the record falls off the report, so all rows with a null DateA are those still open on the report. The production query therefore includes WHERE TableB.LogId IS NULL in order to exclude all records from TableA that are still unresolved in TableB. For CustId "5616", adding WHERE TableB.LogId IS NULL should return all 26 records: 0 records in TableB for CustId "5616" have a null in DateA, so 0 should join, and all 26 TableA records for that CustId will have TableB.LogId = null. But this correct behavior does not occur unless I limit the results with more criteria, insert all columns into a table, reduce the data in the tables, etc.
The left join is on two columns IntA and IntB because TableB is a log of occurrences of the records in TableA. In other words, IntA might have occurred 5 times in TableB (5 different values of IntB) but I only want to match the specific occurrence of IntB per IntA as recorded in TableA.
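The production pattern described above (left join, then keep only the rows where nothing joined) can be sketched as follows. This is a minimal illustration in Python's sqlite3, not the production code: SQLite stands in for Access, and the TableB rows, including the LogId values, are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (IntA INTEGER, IntB INTEGER);
    CREATE TABLE TableB (LogId INTEGER PRIMARY KEY,
                         IntA INTEGER, IntB INTEGER, DateA TEXT);
    INSERT INTO TableA VALUES (1, 10), (2, 22), (3, 33), (4, 44);
    -- Hypothetical log rows: LogId 1 is resolved (DateA set),
    -- LogId 2 is still open (DateA is NULL).
    INSERT INTO TableB VALUES (1, 4, 44, '3/1/2020'),
                              (2, 3, 33, NULL);
""")

# Anti-join: TableA rows with no open (DateA IS NULL) log entry.
not_open = conn.execute("""
    SELECT TableA.IntA, TableA.IntB
    FROM TableA
    LEFT OUTER JOIN TableB
        ON (TableB.IntA = TableA.IntA
        AND TableB.IntB = TableA.IntB
        AND TableB.DateA IS NULL)
    WHERE TableB.LogId IS NULL
    ORDER BY TableA.IntA
""").fetchall()

print(not_open)  # [(1, 10), (2, 22), (4, 44)]
```

Only (3, 33) is dropped, because it is the only TableA row with an open log entry; (4, 44) stays because its log entry is already resolved.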
And, as I mentioned above, these columns are indexed in the tables, and I've even tried compact & repair of the database.
Any thoughts?

It isn't an error in the script. The last table is the correct logical expected outcome of the LEFT JOIN you provided.
By your own query:
SELECT
TableA.IntA, TableA.IntB
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL) -- THIS HERE IS YOUR TRICKY BIT
This means you want
1: all the rows from TableA that have no matching IntA and IntB pairs in TableB
2: and all the rows from TableA that do have matching IntA and IntB but have a null DateA. Note that no row in your sample data matches this.
The expected result is:
TableA.IntA TableA.IntB
-------------------------
1 10
2 22
3 33
-- 4 does not qualify, because it matches IntA and IntB in TableB AND has a non-null DateA value.
If you wanted to see data where IntA and IntB were in both tables, but DateA really was a NULL value, you'd need:
SELECT
TableA.IntA, TableA.IntB
FROM TableA
INNER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB)
WHERE TableB.DateA IS NULL
As to why Access has different behavior depending on the columns selected, I got nothing. I loathe Access for a myriad of reasons. Happy to help you nail down queries if you need a specific set of results, but the quirks of Access are for someone else to answer.

I do not know much about Access.
My guess is that, due to the left join, there is data with null values.
Could it be that the temp tables are defined without allowing nulls in some columns, and the data is then silently swallowed during the insert?
This would explain the difference between selecting just some or all columns into the temp table.
If Access lets you define the temp table yourself instead of creating it implicitly, you could try defining it explicitly with nullable columns. I could not tell you how to do that in Access.

Related

SQL Inner Join, Left Join, Right Join returns same results. Why this is showing result table like below..? [closed]

Here, I have tried to perform an inner join, left join, and right join on table 1 and table 2, but all joins return the same output.
What is the specific reason behind this?
Sample SQL query for reference:
select table1.column1
from table1
left join table2
on table1.column1 = table2.column1
When both tables have the same set of joined values -- (A, B, C, D) in this case -- then there is no difference between an inner and outer join.
Such joins can only give different results when these sets are not the same. For instance, if you deleted the last two records from Table 2, then the inner join would not produce a record with D, while an outer join could still produce it:
SELECT table1.column1
FROM table1
LEFT JOIN table2 ON table1.column1 = table2.column1
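The difference between the two joins can be checked with a small experiment. The sketch below uses Python's sqlite3 with made-up data (an assumption: table1 holds A to D, and table2 has had its D rows removed), and runs the same join condition as an inner join and as a left join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT);
    CREATE TABLE table2 (column1 TEXT);
    INSERT INTO table1 VALUES ('A'), ('B'), ('C'), ('D');
    -- Hypothetical data: table2 no longer contains 'D'.
    INSERT INTO table2 VALUES ('A'), ('B'), ('C');
""")

inner_rows = conn.execute("""
    SELECT table1.column1
    FROM table1
    INNER JOIN table2 ON table1.column1 = table2.column1
    ORDER BY table1.column1
""").fetchall()

left_rows = conn.execute("""
    SELECT table1.column1
    FROM table1
    LEFT JOIN table2 ON table1.column1 = table2.column1
    ORDER BY table1.column1
""").fetchall()

print(inner_rows)  # [('A',), ('B',), ('C',)]
print(left_rows)   # [('A',), ('B',), ('C',), ('D',)]
```

With identical value sets in both tables the two lists would be the same, which matches what the question observed.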
The INNER JOIN selects records that have matching values in both tables.
(In this case, whatever data is present in table 1 is also present in table 2. For example, "A" is present twice in table 2, so it appears twice in the result: for the one "A" in table 1, two rows are returned from table 2.)
The LEFT JOIN returns all records from the left table (table1), and the matching records from the right table (table2). If there is no match, the right-side columns are NULL.
(Again, "A" has two rows in table 2, so for the one "A" in table 1, two rows are returned from table 2.)
The RIGHT JOIN returns all records from the right table (table2), and the matching records from the left table (table1). If there is no match, the left-side columns are NULL.
(Again, table 2 has two "A" rows, and for each of them the one matching row from the left table (table 1) is returned.)
The data sets have the same values, which is why you are getting the same result.
Try modifying the data set and you will observe the difference.

Values after join are incorrect

I have 2 database tables. I need to fetch some records from table A based on a parameter that is passed in; there may or may not be an entry in table B with that key.
What I want to do is:
select a.col1,a.col2,a.col3
FROM a WHERE a.id = 123
This would fetch 20 rows. For one of the rows there is an entry in another table B.
select b.T_level from b where b.id = 123
Only one record appears, with the right value.
What I want is to get this in a single query. Something like:
select a.col1,a.col2,a.col3,b.T_level
from a,b
where a.id = 123
and a.id = b.id
When I do that, I get 20 rows and the column T_level as '50' for all the rows, whereas it should be '50' for one correct row, for rest it should be null.
I further tried:
select a.col1,a.col2,a.col3,nvl(b.T_level,0) from a,b
but that doesn't fetch the way I expect.
Firstly, please learn to use ANSI SQL join syntax. The Oracle join syntax you are using hasn't been considered good practice for decades.
SQL Join syntax
If you want to get all records from a and any matching records from b, then you need to use a LEFT OUTER JOIN.
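A minimal sketch of that LEFT OUTER JOIN, written in ANSI join syntax and run through Python's sqlite3 (an assumption: SQLite stands in for Oracle, and the rows are invented for illustration). Rows of a without a partner in b keep a NULL T_level. Note that every row of a sharing the same id still picks up the same T_level, because id is the only join key; narrowing that down would need an additional join column, which the question does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows; only the column names come from the question.
    CREATE TABLE a (id INTEGER, col1 TEXT);
    CREATE TABLE b (id INTEGER, T_level INTEGER);
    INSERT INTO a VALUES (123, 'x'), (123, 'y'), (124, 'z');
    INSERT INTO b VALUES (123, 50);
""")

rows = conn.execute("""
    SELECT a.id, a.col1, b.T_level
    FROM a
    LEFT OUTER JOIN b ON a.id = b.id
    ORDER BY a.id, a.col1
""").fetchall()

print(rows)  # [(123, 'x', 50), (123, 'y', 50), (124, 'z', None)]
```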

SQL Query returns more

I'm having a bit of a problem with a SQL Query that returns too many results. I'm fairly new to SQL so please bear with me.
Please see the following:
Table Structures
The Query that I use looks like:
SELECT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
I just want the results from Table_B and not what it's giving me.
Please explain this to me as I have spent 3 days on it non-stop.
What am I missing?
You want data from TABLE_B? Then select from it only and have the conditions on the other tables in your where clause.
The inner joins on the other tables serve as existence tests, I assume? Don't do that. You'd only multiply your records, just as you are doing now, only to have to dismiss duplicates later. That can cause bad performance on large tables and errors in more complicated queries. Use EXISTS or IN instead.
select *
from table_b
where item_status <> 'C'
and (common_id, seq_3c) in
(
select common_id, seq_3c
from table_a
where checklist_status = 'I'
and admin_function = 'ADMA'
and checklist_cd = 'APPL'
)
and common_id in
(
select EMPLID
from table_c
where admit_term = '2171'
and institution = 'SOMEWHERE'
);
SELECT DISTINCT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
This should be easy to understand without looking at all your tables and output.
Suppose you join two tables, A and B, on a column id. You only want the columns from table B, and in table B the id column is a unique identifier.
Even so, if in table A an id (the same id) appears five times, the join will have five rows for that id. Then you just select the columns from table B, so it will look like you got the same row five different times.
Perhaps you don't really need a join? What is your underlying problem you are trying to solve?
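The duplication described here is easy to reproduce. The sketch below (Python's sqlite3, invented data) joins a table A holding the same id five times to a table B where id is unique, and compares that with filtering B through an IN subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: id 1 appears five times in A, once in B.
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO A VALUES (1), (1), (1), (1), (1);
    INSERT INTO B VALUES (1, 'only row');
""")

# Join: one output row per matching pair, so B's row shows up five times.
joined = conn.execute(
    "SELECT B.* FROM A JOIN B ON A.id = B.id"
).fetchall()

# Existence test: each B row is returned at most once.
filtered = conn.execute(
    "SELECT * FROM B WHERE id IN (SELECT id FROM A)"
).fetchall()

print(len(joined))  # 5
print(filtered)     # [(1, 'only row')]
```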
It's hard to answer this question without more information about why you're executing these joins. I can explain why you're getting the results you're getting, and hopefully that will allow you to solve the problem yourself.
You start, in your FROM clause, with table A. You join this table with table B on matching COMMON_ID, which, based on the tables you provide, returns three matches for the one record you have in table A. This increases your result set size to three records. Next, you join these three records with table C, on matching ID. Because all IDs are, in fact, identical, this returns nine matches for every record in your current result set: you now have 9 x 3 = 27 records in your result set.
Finally, the WHERE clause comes into effect. This clause excludes 6 out of 9 records in table C, so you have 3 of those records left. Your final result set is therefore 1 (table A) x 3 (table B) x 3 (table C) = 9 records.
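The row-count arithmetic can be verified with a small reconstruction. The tables below are hypothetical (the original table contents are not shown here): one row in TABLE_A, three in TABLE_B and nine in TABLE_C, all sharing one ID, with three of the nine TABLE_C rows in term '2171'. Only the columns needed for the count are included, and the alternative term value is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (COMMON_ID TEXT);
    CREATE TABLE TABLE_B (COMMON_ID TEXT, ITEM TEXT);
    CREATE TABLE TABLE_C (EMPLID TEXT, ADMIT_TERM TEXT);
    INSERT INTO TABLE_A VALUES ('123456789');
    INSERT INTO TABLE_B VALUES ('123456789', 'b1'),
                               ('123456789', 'b2'),
                               ('123456789', 'b3');
""")
# Nine TABLE_C rows: three in term '2171', six in another (made-up) term.
conn.executemany(
    "INSERT INTO TABLE_C VALUES (?, ?)",
    [("123456789", "2171")] * 3 + [("123456789", "2167")] * 6,
)

join_sql = """
    SELECT COUNT(*)
    FROM TABLE_A
    JOIN TABLE_B ON TABLE_A.COMMON_ID = TABLE_B.COMMON_ID
    JOIN TABLE_C ON TABLE_A.COMMON_ID = TABLE_C.EMPLID
"""
before_where = conn.execute(join_sql).fetchone()[0]
after_where = conn.execute(
    join_sql + " WHERE TABLE_C.ADMIT_TERM = '2171'"
).fetchone()[0]

print(before_where)  # 27  (1 x 3 x 9)
print(after_where)   # 9   (1 x 3 x 3)
```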

Inner join between two tables with same count values

I have been working on this issue for 2 days now.
I have two tables created by using SQL SELECT statements:
SELECT (
) Target
INNER JOIN
SELECT (
) Source
ON Join condition 1
AND Join condition 2
AND Join condition 3
AND Join condition 4
AND Join condition 5
The target table has count value of 10,000 records.
The source table has count value of 10,000 records.
but when I do an inner join between the two tables on the 5 join conditions
I get 9573 records.
I am basically trying to find a one to one match between source and target table. I feel every field from target matches every field in source.
Questions:
Why does my inner join return fewer records even though both tables have the same number of records?
If this is expected, how can I make sure I get exactly 10,000 records after the join?
1) An INNER JOIN only outputs the rows from the joining of two tables where their joining columns match. So in your case, the value for join condition 1 may not exist in rows in both tables, and therefore some rows are filtered out.
2) As the other poster mentioned, a left join is one way. You need to decide which table, source or target, you want to use as your master, i.e. the one to start from and return all rows of. You then left join the remaining table based on your conditions to add all the columns where your join conditions match.
It's probably better if you give us the tables you are working on and the query/results you are trying to achieve.
There are some really good articles about the different joins out there. But it looks like you'd be interested in left joins: if a record exists in Target but not in Source, a left join will not drop it.
So, it would be:
SELECT(...) Target
LEFT OUTER JOIN
SELECT(...) Source
ON cond1 and cond2 and cond3 and cond4 and cond5
Give that a shot and let me know how it goes!
Sometimes you need to rely on logical analysis rather than feelings. Use this query to find the fields that do not match and then work out your next steps:
SELECT
Target.Col1,Source.Col1,
Target.Col2,Source.Col2,
Target.Col3,Source.Col3
FROM
(
) Target
FULL OUTER JOIN
(
) Source
ON Target.Col1=Source.Col1
AND Target.Col2=Source.Col2
AND Target.Col3=Source.Col3
WHERE (
Target.Col1 IS NULL
OR Source.Col1 IS NULL
OR Target.Col2 IS NULL
OR Source.Col2 IS NULL
OR Target.Col3 IS NULL
OR Source.Col3 IS NULL
)
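The same diagnostic idea can be tried on a toy data set. The sketch below uses Python's sqlite3 with invented Target and Source rows and two join columns; because older SQLite builds lack FULL OUTER JOIN, it is emulated here with a left join in each direction combined by UNION ALL, which is a substitution for the query above, not the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: rows 2, 3 and 4 have no exact partner.
    CREATE TABLE Target (Col1 INTEGER, Col2 TEXT);
    CREATE TABLE Source (Col1 INTEGER, Col2 TEXT);
    INSERT INTO Target VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Source VALUES (1, 'a'), (2, 'x'), (4, 'd');
""")

# Rows that fail to match, from either side.
unmatched = conn.execute("""
    SELECT 'target only' AS side, Target.Col1, Target.Col2
    FROM Target
    LEFT JOIN Source
        ON Target.Col1 = Source.Col1
       AND Target.Col2 = Source.Col2
    WHERE Source.Col1 IS NULL
    UNION ALL
    SELECT 'source only', Source.Col1, Source.Col2
    FROM Source
    LEFT JOIN Target
        ON Target.Col1 = Source.Col1
       AND Target.Col2 = Source.Col2
    WHERE Target.Col1 IS NULL
    ORDER BY 1, 2
""").fetchall()

for row in unmatched:
    print(row)
```

Each listed row is one that an inner join on the same conditions would drop, which is where a gap such as 10,000 versus 9573 would come from.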

vb.net compare two tables and find differences

I have two tables with the same structure: a.ID (varchar(10)), a.CODE_ASS (varchar(1)) and b.ID (varchar(10)), b.CODE_ASS (varchar(1)).
Table "a" contains 2010 data, and table "b" contains 2013 data.
These two tables don't necessarily have the same number of rows, but rows with common IDs should be identical. I need to compare the tables and find the differences.
As you can see in this example, green rows are ok, and red one should be an error.
ID 2, 4 and 5 are errors because there are some differences. ID 6, in "a" table, is ok even if it has been deleted in table "b".
It appears that you are using SQL tables, so you can write a query that joins the data on ID and selects only the rows where the values are different. Do you really want to assume that 0 = NULL?
SELECT a.ID FROM a INNER JOIN b ON a.ID = b.ID WHERE a.CODE_ASS <> b.CODE_ASS;
If you want 0 = NULL then you need to change NULLs to 0; that would look like the following.
SELECT a.ID FROM a INNER JOIN b ON a.ID = b.ID WHERE ISNULL(a.CODE_ASS,'0') <> ISNULL(b.CODE_ASS,'0');
ISNULL(param, value) will change the NULL to the value, in this case 0.
I am guessing you are using a recordset object so then all you have to do is loop through the results.
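To see how the NULL handling changes the outcome, here is a small sketch in Python's sqlite3 with invented rows. It compares CODE_ASS for IDs present in both tables, and uses COALESCE, the standard SQL counterpart of the two-argument ISNULL mentioned above (SQLite is an assumption here; on SQL Server ISNULL would serve the same purpose).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows; ID 6 exists only in table a.
    CREATE TABLE a (ID TEXT, CODE_ASS TEXT);
    CREATE TABLE b (ID TEXT, CODE_ASS TEXT);
    INSERT INTO a VALUES ('1', 'A'), ('2', 'A'), ('3', NULL),
                         ('4', NULL), ('6', 'B');
    INSERT INTO b VALUES ('1', 'A'), ('2', 'B'), ('3', '0'),
                         ('4', 'X');
""")

# Plain comparison: a NULL on either side makes <> unknown, so the row is skipped.
plain = [r[0] for r in conn.execute("""
    SELECT a.ID FROM a INNER JOIN b ON a.ID = b.ID
    WHERE a.CODE_ASS <> b.CODE_ASS
    ORDER BY a.ID
""")]

# NULL treated as '0': ID 3 counts as equal, ID 4 is reported as different.
null_as_zero = [r[0] for r in conn.execute("""
    SELECT a.ID FROM a INNER JOIN b ON a.ID = b.ID
    WHERE COALESCE(a.CODE_ASS, '0') <> COALESCE(b.CODE_ASS, '0')
    ORDER BY a.ID
""")]

print(plain)         # ['2']
print(null_as_zero)  # ['2', '4']
```

ID 6 never appears in either list, because the inner join only considers IDs that exist in both tables.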