vb.net compare two tables and find differences - vb.net

I have two tables with the same structure: a.ID (varchar(10)), a.CODE_ASS (varchar(1)) and b.ID (varchar(10)), b.CODE_ASS (varchar(1)).
Table "a" contains 2010 data, and table "b" contains 2013 data.
These two tables don't necessarily have the same number of rows, but rows with common IDs should be identical. I need to compare the tables and find the differences.
As you can see in this example, green rows are OK, and red ones should be errors.
ID 2, 4 and 5 are errors because there are some differences. ID 6, in "a" table, is ok even if it has been deleted in table "b".

It appears that you are using SQL tables, so you can write a query that joins the data on ID and selects only the rows where the values are different. Do you really want to assume that 0 = NULL?
SELECT a.ID FROM a INNER JOIN b ON a.ID = b.ID WHERE a.CODE_ASS <> b.CODE_ASS;
If you want 0 = NULL then you need to change NULLs to 0 - that would look like the following.
SELECT a.ID FROM a INNER JOIN b ON a.ID = b.ID WHERE ISNULL(a.CODE_ASS,'0') <> ISNULL(b.CODE_ASS,'0');
ISNULL(param, value) will change the NULL to the value, in this case '0'.
I am guessing you are using a recordset object so then all you have to do is loop through the results.
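As a rough illustration of the comparison described above, here is a minimal sketch using Python's built-in sqlite3 module instead of VB.NET and SQL Server (an assumption made only so the example is self-contained). The sample rows are invented, since the question's screenshot is not available. SQLite's IFNULL stands in for ISNULL, and SQLite's IS NOT operator is used as a NULL-safe "not equal".

```python
import sqlite3

# Hypothetical sample data (the question's screenshot is not available).
# ID 6 exists only in table a and must not be reported.
rows_a = [("1", "A"), ("2", "B"), ("3", None), ("4", "0"), ("5", None), ("6", "C")]
rows_b = [("1", "A"), ("2", "C"), ("3", None), ("4", None), ("5", "D")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (ID VARCHAR(10), CODE_ASS VARCHAR(1))")
conn.execute("CREATE TABLE b (ID VARCHAR(10), CODE_ASS VARCHAR(1))")
conn.executemany("INSERT INTO a VALUES (?, ?)", rows_a)
conn.executemany("INSERT INTO b VALUES (?, ?)", rows_b)

# Strict comparison: NULL equals NULL, but NULL differs from any value.
# IS NOT is SQLite's NULL-safe inequality operator.
strict = [row[0] for row in conn.execute(
    "SELECT a.ID FROM a INNER JOIN b ON a.ID = b.ID "
    "WHERE a.CODE_ASS IS NOT b.CODE_ASS ORDER BY a.ID")]

# Lenient comparison: NULL is treated as '0' on both sides
# (IFNULL is SQLite's counterpart of ISNULL).
lenient = [row[0] for row in conn.execute(
    "SELECT a.ID FROM a INNER JOIN b ON a.ID = b.ID "
    "WHERE IFNULL(a.CODE_ASS, '0') <> IFNULL(b.CODE_ASS, '0') ORDER BY a.ID")]

# Loop through the results, as you would with a recordset.
for mismatched_id in strict:
    print("difference found for ID", mismatched_id)

print(strict)   # ['2', '4', '5']
print(lenient)  # ['2', '5']
```

With this invented data, ID 4 ('0' in one table, NULL in the other) is reported only by the strict comparison, which is exactly the 0 = NULL decision mentioned above.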

Related

Values after join are incorrect

I have 2 database tables. I need to fetch some records from table A based on a parameter that is passed in; there may or may not be an entry in table B with that key.
What I want to do is:
select a.col1,a.col2,a.col3
FROM a WHERE a.id = 123
This would fetch 20 rows. For one of the rows there is an entry in another table B.
select T_level from b where b.id = 123
Only one record appears, with the right value.
What I want is to get this in a single query. Something like:
select a.col1,a.col2,a.col3,b.T_level
from a,b
where a.id = 123
and a.id = b.id
When I do that, I get 20 rows and the column T_level as '50' for all the rows, whereas it should be '50' for one correct row, for rest it should be null.
I further tried:
select a.col1,a.col2,a.col3,nvl(b.T_level,0) from a,b
but that doesn't fetch the way I expect.
Firstly, please learn to use ANSI SQL join syntax. The Oracle join syntax you are using hasn't been considered good practice for decades.
SQL Join syntax
If you want to get all records from a and any matching records from b then you need to use a LEFT OUTER JOIN
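To make the LEFT OUTER JOIN suggestion concrete, here is a small sketch run against SQLite from Python (the question itself is about Oracle). Joining on id alone would still attach the single b row to every a row with that id, so the sketch assumes a second key column, seq, that identifies the one matching row. That column is hypothetical and does not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: "seq" is an assumed extra key, not from the question.
conn.execute("CREATE TABLE a (id INTEGER, seq INTEGER, col1 TEXT)")
conn.execute("CREATE TABLE b (id INTEGER, seq INTEGER, T_level INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?, ?)",
                 [(123, 1, "x"), (123, 2, "y"), (123, 3, "z")])
conn.execute("INSERT INTO b VALUES (123, 2, 50)")

# ANSI LEFT OUTER JOIN on the full key: unmatched rows of a get NULL.
left_rows = conn.execute(
    "SELECT a.col1, b.T_level "
    "FROM a LEFT OUTER JOIN b ON a.id = b.id AND a.seq = b.seq "
    "WHERE a.id = 123 ORDER BY a.seq").fetchall()

# Joining on id only repeats the single b row for every a row.
id_only_rows = conn.execute(
    "SELECT a.col1, b.T_level "
    "FROM a LEFT OUTER JOIN b ON a.id = b.id "
    "WHERE a.id = 123 ORDER BY a.seq").fetchall()

print(left_rows)     # [('x', None), ('y', 50), ('z', None)]
print(id_only_rows)  # [('x', 50), ('y', 50), ('z', 50)]
```

If a NULL is not wanted in the output, the T_level column can be wrapped in NVL (Oracle) or IFNULL (SQLite) once the join itself is correct.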

Left outer join not matching records consistently

I'm experiencing some very bizarre behaviors with a left outer join query.
Edit: I've provided an Access database with the tables and data needed to reproduce this problem. Note that this is the best way to reproduce the problem, because if the data size is reduced to a small sample, then the problem doesn't appear. This can be downloaded at: https://joek.com/etc/left_join_odd_behavior.accdb
Edit: My core problem with this is that I've been using left outer joins in this manner for years without experiencing a problem like this before. Additionally, I probably have hundreds of other left outer joins like this across a dozen applications and haven't experienced a problem like this, or worse, this problem may be occurring in some of those other queries and simply hasn't been noticed yet.
Edit: The left outer join is providing different results based on factors that should not affect the results (example, inserting different number of columns into another table, or including a WHERE clause for a specific set of results, or reducing the data to a small sample). My underlying problem isn't with fixing this query (I can do that a few different ways that give the correct results). My problem is understanding why this query that should work doesn't work in certain cases so that I can understand if it's: a data problem (so that I can fix it in these tables and check that it isn't happening in any other tables), an indexing problem (so that I can apply the correct indexes to these tables and make sure other tables are correct), or an issue inherent in MS Access (so that I can code around it, change all my other similar left outer joins, and use different methods going forward [as I've had to do with some other differences between Access & SQL Server that I've accustomed myself to since having to start working with Access over 6 years ago]).
I've simplified it to the following query:
SELECT
x
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
I have indexes on all these columns being used.
Edit: (Note that in my sample database, I've renamed the columns from the production code to match my example query here.)
If x is:
TableA.IntA INTO TempTableX
then the results I get are missing some records which do match the left join criteria.
If x is:
TableA.* INTO TempTableX
then I get all the records expected.
If I simply return all the records without inserting them into a table, then it doesn't matter if I return TableA.IntA or TableA.* or even TableA.IntA, TableA.IntB; every which way excludes some matching records.
Each table has thousands of records (TableA 50K+, TableB 8,000+), but for the records that match on both IntA and IntB, there are only about 200 matches. And of those, only 17 have DateA set to Null.
Here's an example of data which is not being returned correctly:
TableA
-----------
IntA IntB
1 10
2 22
3 33
4 44
TableB
-------
IntA IntB DateA
2 20 1/1/2020
3 31 2/1/2020
4 44 3/1/2020
As you can see, this:
SELECT
TableA.IntA, TableA.IntB
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
should return the following (edit: because this is a left outer join with specific criteria in the ON, and with no WHERE clause to constrain the results, it should return ALL the records from TableA):
TableA.IntA TableA.IntB
-------------------------
1 10
2 22
3 33
4 44
but instead I get the following (edit: it is excluding the one record that matches the left join criteria despite no WHERE clause specifying to exclude records in TableA that do not match to TableB):
TableA.IntA TableA.IntB
-------------------------
1 10
2 22
3 33
But, if I insert TableA.* into a table, then I get the expected 4 records (edit: note that this occurs when using a SELECT TableA.* INTO TempTableX, which causes Access to create the new table based on the data structure of the columns from TableA, and I'm not returning any columns from TableB, so there would not be any problem of it not populating data because of table field requirements).
Additionally, if I add a WHERE criteria to the query to specifically get the expected records, then I get them in the result. Or, if I empty TableA and TableB of all other records except these example records, then I get them correctly.
Edit:
Using the provided database, you can test these situations out in the following manner:
First:
SELECT
TableA.*
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
you'll get 58,160 records, and when you take these results to a spreadsheet and filter on the CUSTID column for "5616", you'll find 25 records (this is an incomplete result).
Second:
SELECT
TableA.*
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
WHERE
TableA.CustId = "5616"
you'll find now the correct 26 records.
If you compare these results, you'll find that the one record which gets excluded in the incomplete results (IntA = 25093 & IntB = 59797) is the only record in TableB for CustId 5616 that matches both IntA and IntB. But while it doesn't match the criteria of DateA IS NULL, it should still be returned from TableA even in the first query because there is no WHERE clause excluding TableA records that do not match the left join criteria.
Alternately, if you change the first query to:
SELECT
TableA.*
INTO TempTableX
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
then you'll get all 58,348 records that exist in TableA, and filtering for CUSTID "5616" you'll find all 26 records.
Alternately, if you make no changes to the first query, but empty TableA of all records except where CustId = "5616", then you'll get the complete 26 records instead of 25.
So again, my problem is why is Access providing different results based on changes which should not typically affect the results (number of records in the database, returning data vs inserting into a table, etc)?
Additionally for more detail on why I'm using a left join like this:
This query is not the end goal, it is simply the smallest reduction of the production code where I found the problem occurring.
The production use is to collect all records from TableA that DO NOT have a matching record in TableB where DateA is null. TableB is a log of the records in TableA that have been reported; DateA is populated when the issue is resolved, at which point the record falls off the report, so all rows with a null DateA are those still open on the report. The production query therefore includes a WHERE TableB.LogId IS NULL in order to exclude all records from TableA that are still unresolved in TableB. For CustId "5616", adding WHERE TableB.LogId IS NULL should return all 26 records, because 0 records in TableB for CustId "5616" have a null in DateA, so 0 should join and all 26 TableA records for that CustId will have TableB.LogId = null. But this correct behavior does not occur unless I limit the results with more criteria, insert all columns into a table, or reduce the data in the tables, etc.
The left join is on two columns, IntA and IntB, because TableB is a log of occurrences of the records in TableA. In other words, IntA might have occurred 5 times in TableB (5 different values of IntB), but I only want to match the specific occurrence of IntB per IntA as recorded in TableA.
And, as I mentioned above, these columns are indexed in the tables, and I've even tried compact & repair of the database.
Any thoughts?
It isn't an error in the script. The last table is the correct logical expected outcome of the LEFT JOIN you provided.
By your own query:
SELECT
TableA.IntA, TableA.IntB
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL) -- THIS HERE IS YOUR TRICKY BIT
This means you want
1: all the rows from TableA that have no matching IntA and IntB pairs in TableB
2: and all the rows from TableA that do have matching IntA and IntB but have a null DateA. Note that no row in your sample data matches this.
The expected result is:
TableA.IntA TableA.IntB
-------------------------
1 10
2 22
3 33
-- 4 does not qualify, because it matches IntA and IntB in TableB AND has a non-null DateA value.
If you wanted to see data where IntA and IntB were in both tables, but DateA really was a NULL value, you'd need:
SELECT
TableA.IntA, TableA.IntB
FROM TableA
INNER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB)
WHERE TableB.DateA IS NULL
As to why Access has different behavior depending on the columns selected, I got nothing. I loathe Access for a myriad of reasons. Happy to help you nail down queries if you need a specific set of results, but the quirks of Access are for someone else to answer.
I do not know much about Access.
My guess is that, due to the left join, there is data with null values.
Could it be that the temp tables are defined without allowing nulls in some columns, and the data is then silently swallowed during the insert?
This would explain the difference between selecting just some or all columns into the temp table.
Even if Access allows you to skip defining the temp table, you could try defining it anyway, with nullable columns. I could not tell you how to do that in Access.
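As a cross-check outside Access, the question's four-row sample can be loaded into SQLite from Python (an assumption for illustration only, so it says nothing about why Access behaves as it does). In SQLite, a LEFT OUTER JOIN with the DateA test inside the ON clause and no WHERE clause keeps every TableA row. The three-row result appears only when the DateA test is moved into a WHERE clause. The date strings are copied from the question as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (IntA INTEGER, IntB INTEGER)")
conn.execute("CREATE TABLE TableB (IntA INTEGER, IntB INTEGER, DateA TEXT)")
conn.executemany("INSERT INTO TableA VALUES (?, ?)",
                 [(1, 10), (2, 22), (3, 33), (4, 44)])
conn.executemany("INSERT INTO TableB VALUES (?, ?, ?)",
                 [(2, 20, "1/1/2020"), (3, 31, "2/1/2020"), (4, 44, "3/1/2020")])

# DateA test inside ON: it only decides which TableB rows attach;
# no TableA row is dropped.
on_rows = conn.execute("""
    SELECT TableA.IntA, TableA.IntB
    FROM TableA
    LEFT OUTER JOIN TableB
      ON (TableB.IntA = TableA.IntA
          AND TableB.IntB = TableA.IntB
          AND TableB.DateA IS NULL)
    ORDER BY TableA.IntA
""").fetchall()

# DateA test moved to WHERE: rows whose match has a non-null DateA are
# filtered out after the join.
where_rows = conn.execute("""
    SELECT TableA.IntA, TableA.IntB
    FROM TableA
    LEFT OUTER JOIN TableB
      ON (TableB.IntA = TableA.IntA
          AND TableB.IntB = TableA.IntB)
    WHERE TableB.DateA IS NULL
    ORDER BY TableA.IntA
""").fetchall()

print(on_rows)     # [(1, 10), (2, 22), (3, 33), (4, 44)]
print(where_rows)  # [(1, 10), (2, 22), (3, 33)]
```

Running the same two statements against the Access database from the question would show whether Access departs from this behavior on the full data set.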

SQL SELECT query where the IDs were already found

I have 2 tables:
Table A has 3 columns (for example) with opportunity sales header data:
OPP_ID, CLOSE_DTTM, STAGE
Table B has 3 columns with the individual line items for the Opportunities:
OPP_LINE_ID, OPP_ID, AMOUNT_USD
I have a select statement that correctly parses through Table A and returns a list of Opportunities. What I would like to do is, without joining the data, to have a SELECT statement that will get data from Table B but only for the OPP_IDs that were found in my first query.
The result should be 2 views/resultset (one for each select query) and not just 1 combined view where Table B is joined to Table A.
The reason I want to keep them separate is that I will have to perform a few manipulations on the result from table B, and I don't want the result from table A affected.
A subquery is all you need:
SELECT OPP_LINE_ID, OPP_ID, AMOUNT_USD
FROM b
WHERE b.opp_id IN (SELECT a.opp_id FROM a) -- put the conditions of your first query inside this subquery
Presuming you're using this in some client side data access library that represents B's data in some 2 dimensional collection and you want to manipulate it without affecting/ having A's data present in that collection:
Identify the records in A:
SELECT * FROM a WHERE somecolumn = 'somevalue'
Identify the records in B that relate to A, but don't return A's data:
SELECT b.* FROM a JOIN b ON a.opp_id = b.opp_id WHERE a.somecolumn = 'somevalue'
Just because JOIN is used doesn't mean your end-consuming program has to know about A's data. You could also use IN, like the other answer does, but internally the database will rewrite them to be the same thing anyway
I tend to use exists for this type of query:
select b.*
from b
where exists (select 1 from a where a.opp_id = b.opp_id);
If you want two results sets, you need to run two queries. It is unclear what the second query is, perhaps the first query on A.
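The two-query approach can be sketched as follows, using SQLite from Python. The sample rows and the STAGE = 'Won' filter are invented stand-ins for the asker's real data and first query. The first statement returns the header rows from A, and the second returns only B's columns for the same OPP_IDs via EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (OPP_ID INTEGER, CLOSE_DTTM TEXT, STAGE TEXT)")
conn.execute("CREATE TABLE b (OPP_LINE_ID INTEGER, OPP_ID INTEGER, AMOUNT_USD REAL)")
# Hypothetical sample data.
conn.executemany("INSERT INTO a VALUES (?, ?, ?)", [
    (1, "2020-01-05", "Won"),
    (2, "2020-02-10", "Lost"),
    (3, "2020-03-15", "Won"),
])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)", [
    (10, 1, 100.0), (11, 1, 250.0), (12, 2, 75.0), (13, 3, 40.0),
])

# Result set 1: the opportunities (STAGE = 'Won' is an assumed filter).
headers = conn.execute(
    "SELECT OPP_ID, CLOSE_DTTM, STAGE FROM a "
    "WHERE STAGE = 'Won' ORDER BY OPP_ID").fetchall()

# Result set 2: line items for those same opportunities, B columns only.
lines = conn.execute(
    "SELECT b.OPP_LINE_ID, b.OPP_ID, b.AMOUNT_USD FROM b "
    "WHERE EXISTS (SELECT 1 FROM a "
    "              WHERE a.OPP_ID = b.OPP_ID AND a.STAGE = 'Won') "
    "ORDER BY b.OPP_LINE_ID").fetchall()

print(headers)  # [(1, '2020-01-05', 'Won'), (3, '2020-03-15', 'Won')]
print(lines)    # [(10, 1, 100.0), (11, 1, 250.0), (13, 3, 40.0)]
```

Because the two result sets are fetched separately, the line items can be manipulated in the client without touching the header rows.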

SQL Query returns more

I'm having a bit of a problem with a SQL Query that returns too many results. I'm fairly new to SQL so please bear with me.
Please see the following:
Table Structures
The Query that I use looks like:
SELECT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
I just want the results from Table_B and not what it's giving me.
Please explain this to me as I have spent 3 days on it non-stop.
What am I missing?
You want data from TABLE_B? Then select from it only and have the conditions on the other tables in your where clause.
The inner joins on the other tables serve as existence tests, I assume? Don't do that. You'd only multiply your records, just as you are doing now, only to have to dismiss duplicates later. That can cause bad performance on large tables and errors in more complicated queries. Use EXISTS or IN instead.
select *
from table_b
where item_status <> 'C'
and (common_id, seq_3c) in
(
select common_id, seq_3c
from table_a
where checklist_status = 'I'
and admin_function = 'ADMA'
and checklist_cd = 'APPL'
)
and common_id in
(
select EMPLID
from table_c
where admit_term = '2171'
and institution = 'SOMEWHERE'
);
SELECT DISTINCT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
This should be easy to understand without looking at all your tables and output.
Suppose you join two tables, A and B, on a column id. You only want the columns from table B, and in table B the id column is a unique identifier.
Even so, if in table A an id (the same id) appears five times, the join will have five rows for that id. Then you just select the columns from table B, so it will look like you got the same row five different times.
Perhaps you don't really need a join? What is your underlying problem you are trying to solve?
It's hard to answer this question without more information about why you're executing these joins. I can explain why you're getting the results you're getting, and hopefully that will allow you to solve the problem yourself.
You start, in your FROM clause, with table A. You join this table with table B on matching COMMON_ID, which, based on the tables you provide, returns three matches for the one record you have in table A. This increases your result set size to three records. Next, you join these three records with table C, on matching ID. Because all ID's are, in fact, identical, this returns nine matches for every record in your current result set: you now have 9 x 3 = 27 records in your result set.
Finally, the WHERE clause comes into effect. This clause excludes 6 out of 9 records in table C, so you have 3 of those records left. Your final result set is therefore 1 (table A) x 3 (table B) x 3 (table C) = 9 records.
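The 1 x 3 x 3 = 9 arithmetic above can be reproduced with a small sketch in SQLite from Python. The table layouts are simplified and the rows are invented, because the original "Table Structures" image is not available. The sketch uses EXISTS rather than the multi-column IN, purely so that it also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the three tables.
conn.execute("CREATE TABLE table_a (COMMON_ID TEXT, SEQ_3C INTEGER, CHECKLIST_STATUS TEXT)")
conn.execute("CREATE TABLE table_b (COMMON_ID TEXT, SEQ_3C INTEGER, ITEM TEXT, ITEM_STATUS TEXT)")
conn.execute("CREATE TABLE table_c (EMPLID TEXT, ADMIT_TERM TEXT)")

conn.execute("INSERT INTO table_a VALUES ('123456789', 1, 'I')")   # 1 row
conn.executemany("INSERT INTO table_b VALUES ('123456789', 1, ?, 'I')",
                 [("ITEM1",), ("ITEM2",), ("ITEM3",)])              # 3 rows
conn.executemany("INSERT INTO table_c VALUES ('123456789', ?)",
                 [("2161",)] * 3 + [("2171",)] * 3 + [("2181",)] * 3)  # 9 rows

# Joins used as existence tests multiply the table_b rows.
join_rows = conn.execute("""
    SELECT table_b.*
    FROM table_a
    JOIN table_b ON table_a.COMMON_ID = table_b.COMMON_ID
                AND table_a.SEQ_3C = table_b.SEQ_3C
    JOIN table_c ON table_a.COMMON_ID = table_c.EMPLID
    WHERE table_b.ITEM_STATUS <> 'C'
      AND table_a.CHECKLIST_STATUS = 'I'
      AND table_c.ADMIT_TERM = '2171'
""").fetchall()

# EXISTS only checks that a matching row is there; nothing is multiplied.
exists_rows = conn.execute("""
    SELECT *
    FROM table_b
    WHERE ITEM_STATUS <> 'C'
      AND EXISTS (SELECT 1 FROM table_a
                  WHERE table_a.COMMON_ID = table_b.COMMON_ID
                    AND table_a.SEQ_3C = table_b.SEQ_3C
                    AND table_a.CHECKLIST_STATUS = 'I')
      AND EXISTS (SELECT 1 FROM table_c
                  WHERE table_c.EMPLID = table_b.COMMON_ID
                    AND table_c.ADMIT_TERM = '2171')
""").fetchall()

print(len(join_rows))    # 9  (1 x 3 x 3)
print(len(exists_rows))  # 3
```

SELECT DISTINCT would also collapse the nine joined rows back to three, but only after the duplicates have been produced.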

Setting multiple rows in one table equal to multiple rows in another table based on multiple column values being equal

I am trying to run a rather convoluted query. Two tables are out of sync. In one step of the processing, a 16 digit value is copied from one table to the other, and is getting truncated to just 10 digits.
I'm using a few pieces of information to copy the full 16 digit number over. I'm trying to find anywhere where the 10 digit value matches the first 10 digits of the 16 digit value, and three other pieces of information in these two tables match. Combined, they give almost 100% certainty that we have a unique entry. This is the current iteration of my query:
UPDATE DB1.TABLE1
SET ID =
(
SELECT b.ID
FROM DB2.TABLE1 b
INNER JOIN DB1.TABLE1 a
ON left(b.ID, 10) = a.ID
WHERE len(a.ID) = 10
AND a.STORE = b.STORE
AND a.DOCTYPE = b.DOCTYPE
AND a.DOCDATE = b.DOCDATE
)
The problem is, it's telling me the subquery is returning multiple results. But I want multiple results. I tried adding another WHERE statement after the parenthesis, and duplicating the last four lines of the subquery, but that's not working either. I also tried using WHERE EXISTS and duplicating the entire SELECT statement, but that gives me the multiple results error as well. What am I missing here?
Your statement is attempting to update every row in DB1.TABLE1 to what's returned by the subquery. Not only is that not what you want, but the statement fails because the subquery is returning multiple values.
What you need to do is correlate the two tables as part of the update statement, like this:
UPDATE DB1.TABLE1
SET ID = b.ID
FROM DB1.TABLE1 a
INNER JOIN DB2.TABLE1 b
ON left(b.ID, 10) = a.ID
AND a.STORE = b.STORE
AND a.DOCTYPE = b.DOCTYPE
AND a.DOCDATE = b.DOCDATE
WHERE len(a.ID) = 10
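The UPDATE ... FROM form above is SQL Server syntax. To show the same row-by-row correlation in something that can be run anywhere, here is a sketch against SQLite from Python that swaps in a correlated subquery instead. Two plain tables, db1_table1 and db2_table1, stand in for DB1.TABLE1 and DB2.TABLE1, SQLite's substr and length replace left and len, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for DB1.TABLE1 (truncated IDs) and DB2.TABLE1 (full IDs).
for name in ("db1_table1", "db2_table1"):
    conn.execute(f"CREATE TABLE {name} (ID TEXT, STORE TEXT, DOCTYPE TEXT, DOCDATE TEXT)")

conn.executemany("INSERT INTO db1_table1 VALUES (?, ?, ?, ?)", [
    ("1234567890", "S1", "INV", "2020-01-01"),        # truncated
    ("1234567890", "S2", "INV", "2020-01-02"),        # truncated, same prefix
    ("9999999999999999", "S3", "INV", "2020-01-03"),  # already complete
])
conn.executemany("INSERT INTO db2_table1 VALUES (?, ?, ?, ?)", [
    ("1234567890111111", "S1", "INV", "2020-01-01"),
    ("1234567890222222", "S2", "INV", "2020-01-02"),
    ("9999999999999999", "S3", "INV", "2020-01-03"),
])

# Each truncated row looks up its own full ID; the subquery is correlated
# to the row being updated, so it yields one value per row.
conn.execute("""
    UPDATE db1_table1
    SET ID = (SELECT b.ID
              FROM db2_table1 b
              WHERE substr(b.ID, 1, 10) = db1_table1.ID
                AND b.STORE = db1_table1.STORE
                AND b.DOCTYPE = db1_table1.DOCTYPE
                AND b.DOCDATE = db1_table1.DOCDATE)
    WHERE length(ID) = 10
      -- only touch rows that actually have a match, so no ID becomes NULL
      AND EXISTS (SELECT 1
                  FROM db2_table1 b
                  WHERE substr(b.ID, 1, 10) = db1_table1.ID
                    AND b.STORE = db1_table1.STORE
                    AND b.DOCTYPE = db1_table1.DOCTYPE
                    AND b.DOCDATE = db1_table1.DOCDATE)
""")

updated = conn.execute("SELECT ID, STORE FROM db1_table1 ORDER BY STORE").fetchall()
print(updated)
# [('1234567890111111', 'S1'), ('1234567890222222', 'S2'), ('9999999999999999', 'S3')]
```

The two truncated rows share the same 10-digit prefix, and the STORE, DOCTYPE and DOCDATE conditions are what tell them apart, which is the role those columns play in the original query.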