SQL join to find inconsistencies between two data sources


I have a SQL challenge that is racking my brain. I am trying to reconcile two reports for licenses of an application.
The first report is an Access database table. It was created and maintained by hand by my predecessors. Whenever they installed or uninstalled the application they would manually update the table, more or less. It has a variety of columns of inconsistent data, including Name (displayName), Network ID (SAMAccountName), and Computer Name. Each record has a value for at least one of these fields. Most only have one or two of the values, though.
The second report is based on an SMS inventory. It has three columns: NetbiosName for the computer name, SAMAccountName, and displayName. Every record has a NetbiosName, but there are some nulls in SAMAccountName and displayName.
I have imported both of these as tables in an MS SQL Server 2005 database.
What I need to do is get a report of each record in the Access table that is not in the SMS table and vice versa. I think it can be done with a properly formed join and where clause, but I can't see how to do it.
Edit to add more detail:
If the records match on at least one of the three columns, it is a match. So I need the records from the Access table where the Name, NetworkID, and ComputerName are all missing from the SMS table. I can do it for any one column, but I can't see how to combine all three columns.

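Taking Kaboing's answer and the edited question, the solution seems to be:
SELECT *
FROM report_1 r1
FULL OUTER JOIN report_2 r2
ON r1.SAMAccountName = r2.SAMAccountName
OR r1.NetbiosName = r2.NetbiosName
OR r1.DisplayName = r2.DisplayName
-- r2.NetbiosName is never NULL in the SMS data, so NULL here means "no SMS match".
-- report_1 rows may lack any single column, so test all three for the other side.
WHERE r2.NetbiosName IS NULL
OR (r1.SAMAccountName IS NULL AND r1.NetbiosName IS NULL AND r1.DisplayName IS NULL)
Each unmatched row appears only once in a full outer join. Matched rows can show up multiple times, but the WHERE clause filters those out.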
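The same "no match on any of the three columns" rule can also be written as two NOT EXISTS queries, one per direction. Below is a minimal sketch that runs the SQL through Python's built-in sqlite3 module so it can be tried without a SQL Server instance. The sample rows are invented for illustration, and the Access table is assumed to have been imported with the same column names as the SMS table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE report_1 (DisplayName TEXT, SAMAccountName TEXT, NetbiosName TEXT);
CREATE TABLE report_2 (NetbiosName TEXT NOT NULL, SAMAccountName TEXT, DisplayName TEXT);
-- Hypothetical Access rows: most have only one or two values filled in.
INSERT INTO report_1 VALUES
  ('Ann Lee', NULL, NULL),
  (NULL, 'bkim', 'PC-02'),
  ('Cy Dole', 'cdole', NULL);
-- Hypothetical SMS rows: NetbiosName is always present.
INSERT INTO report_2 VALUES
  ('PC-01', 'alee', 'Ann Lee'),
  ('PC-02', 'bkim', NULL),
  ('PC-09', NULL, NULL);
""")

# A row is "only in" the left table when no row on the right
# matches it on at least one of the three columns.
ONLY_IN = """
SELECT a.DisplayName, a.SAMAccountName, a.NetbiosName
FROM {left} a
WHERE NOT EXISTS (
    SELECT 1 FROM {right} b
    WHERE a.SAMAccountName = b.SAMAccountName
       OR a.NetbiosName = b.NetbiosName
       OR a.DisplayName = b.DisplayName
)
"""

only_access = conn.execute(ONLY_IN.format(left="report_1", right="report_2")).fetchall()
only_sms = conn.execute(ONLY_IN.format(left="report_2", right="report_1")).fetchall()

print("Only in Access:", only_access)
print("Only in SMS:", only_sms)
```

Because a comparison against NULL is never true, a missing value on either side simply fails to match, which fits the rule stated in the question.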

You need to look at the EXCEPT operator. It's new in SQL Server 2005 and does the same thing that Oracle's MINUS does.
SQL1
EXCEPT
SQL2
will give you all the rows in SQL1 not found in SQL2.
If
SQL1 = A, B, C, D
SQL2 = B, C, E
the result is A, D
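The A, B, C, D example can be reproduced directly. This is a small sketch using Python's sqlite3 module (SQLite supports EXCEPT too); the table names sql1 and sql2 and the column val are placeholders. Note that EXCEPT compares entire rows, so it suits exact comparisons rather than the "match on any one column" rule from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sql1 (val TEXT);
CREATE TABLE sql2 (val TEXT);
INSERT INTO sql1 VALUES ('A'), ('B'), ('C'), ('D');
INSERT INTO sql2 VALUES ('B'), ('C'), ('E');
""")

# Rows returned by the first SELECT that the second SELECT does not return.
rows = conn.execute(
    "SELECT val FROM sql1 EXCEPT SELECT val FROM sql2 ORDER BY val"
).fetchall()
only_in_sql1 = [r[0] for r in rows]
print(only_in_sql1)  # ['A', 'D']
```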

Building on Gabriel1836's answer, made simpler, but perhaps a bit harder to interpret:
SELECT *
FROM report_1 r1
FULL OUTER JOIN report_2 r2 ON r1.SAMAccountName = r2.SAMAccountName
WHERE r2.SAMAccountName IS NULL OR r1.SAMAccountName IS NULL

Take a look at tablediff.exe, the command-line utility that comes with SQL Server.

Try the following:
-- displayName exists in both tables, so it must be qualified with an alias.
SELECT r1.displayName, 'report_1' AS type
FROM report_1 r1
LEFT OUTER JOIN report_2 r2 ON r1.SAMAccountName = r2.SAMAccountName
WHERE r2.SAMAccountName IS NULL
UNION
SELECT r2.displayName, 'report_2' AS type
FROM report_1 r1
RIGHT OUTER JOIN report_2 r2 ON r1.SAMAccountName = r2.SAMAccountName
WHERE r1.SAMAccountName IS NULL

Related

SQL - query returning rows that presumably do not exist

I am working in Microsoft SQL Server 2012.
I run this query:
select * from tblbill
This returns four rows, with four distinct values of my field of interest, paymentduedate.
I run a second query:
select b.paymentduedate, ledgertypeid, l.Billid
from tblbill as b
join tblledger as l on b.billid = l.billid
This returns twenty rows, with values of b.paymentduedate that are not returned when I run the select *. paymentduedate is not a column in tblledger.
How is this possible? My first guess is that rows in tblbill may somehow be hidden, but I do not know how to check that.

There could be a few reasons:
There are 20 records with a matching billid in tblledger. The apparent duplicates then all come from the same 4 records in tblbill; count the distinct values to check whether that is the case.
The data was changed after you ran the first query.
Either way, there is no such thing as hidden records.
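The first explanation is easy to check by counting. Below is a rough sketch using Python's sqlite3 module with made-up data: four bills with five ledger rows each. The ledgerid column and all values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblbill (billid INTEGER PRIMARY KEY, paymentduedate TEXT)")
conn.execute(
    "CREATE TABLE tblledger (ledgerid INTEGER PRIMARY KEY, billid INTEGER, ledgertypeid INTEGER)"
)

# Four bills, each with five ledger entries.
conn.executemany(
    "INSERT INTO tblbill VALUES (?, ?)",
    [(i, f"2014-0{i}-01") for i in range(1, 5)],
)
conn.executemany(
    "INSERT INTO tblledger (billid, ledgertypeid) VALUES (?, ?)",
    [(bill, ledger_type) for bill in range(1, 5) for ledger_type in range(1, 6)],
)

# Total joined rows versus the number of distinct bills behind them.
total_rows, distinct_bills = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT b.billid)
    FROM tblbill AS b
    JOIN tblledger AS l ON b.billid = l.billid
""").fetchone()
print(total_rows, distinct_bills)  # 20 4
```

If the distinct count on the real data is still 4, the twenty rows are just the four bills repeated once per ledger entry.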

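When you join, you get every matching combination of rows, so a tblbill row appears once for each tblledger row with the same billid. Use INNER, LEFT, or RIGHT JOIN depending on which unmatched rows you want to keep.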

problems while trying to optimize my SQL(inner join and group)

I'm having a problem joining and grouping two tables. I'm using MS SQL Server 2005 Express.
Thank you in advance!

You just need to add date_request to your JOIN criteria:
SELECT otd.userid,otd.task,otd.date_request,ot.approved_by
FROM otd
JOIN ot
ON otd.userid = ot.requested_by
AND otd.date_request = ot.date_request
WHERE otd.userid ='xxx'
AND CONVERT(varchar,otd.date_request,101) BETWEEN '09/10/2013' AND '09/11/2013'
AND ot.status ='A'
ORDER BY otd.date_request,ot.date_request ASC
Demo: SQL Fiddle
Note: Date is changed in Fiddle, but the extra JOIN criteria is the important part. Also, not sure what you're converting your date field for, but if it's a DATE you can just alter the format of your date strings and not cast (as it is in fiddle).

Based on your schema, there's no way to determine which task in otd the record in ot is referring to. Perhaps you meant to include a task column in ot? For example, take a look at your first record in otd. Task 1 by user xxx requested on 9/10/2013. Now look at all the records in ot. You're joining ot on otd.userid = ot.requested_by, and there are two records in ot requested by xxx. So that join matches those two records for task 1 by xxx, and the same two records for task 2 by xxx, and again for tasks 5 and 6.
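To see the effect described in both answers, here is a small sketch using Python's sqlite3 module. The original tables were not posted, so the rows below are invented to mirror the description: four tasks by user xxx in otd and two approvals for xxx in ot. The approver names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE otd (userid TEXT, task INTEGER, date_request TEXT);
CREATE TABLE ot (requested_by TEXT, date_request TEXT, approved_by TEXT, status TEXT);
INSERT INTO otd VALUES
  ('xxx', 1, '2013-09-10'), ('xxx', 2, '2013-09-10'),
  ('xxx', 5, '2013-09-11'), ('xxx', 6, '2013-09-11');
INSERT INTO ot VALUES
  ('xxx', '2013-09-10', 'boss1', 'A'),
  ('xxx', '2013-09-11', 'boss2', 'A');
""")

BASE = """
SELECT otd.task, otd.date_request, ot.approved_by
FROM otd
JOIN ot ON otd.userid = ot.requested_by {extra}
WHERE otd.userid = 'xxx' AND ot.status = 'A'
ORDER BY otd.task, ot.date_request
"""

# Joining on the user alone pairs every task with every approval for that user.
user_only = conn.execute(BASE.format(extra="")).fetchall()

# Adding date_request to the join keeps only the approval from the same day.
user_and_date = conn.execute(
    BASE.format(extra="AND otd.date_request = ot.date_request")
).fetchall()

print(len(user_only), len(user_and_date))  # 8 4
```

If two requests by the same user on the same day can have different approvals, the date alone is still not enough and ot would need a task column, as the second answer suggests.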

SQL- make all rows show a column value if one of the rows has it

I have an SQL statement for a PICK sheet that returns the header/detail records for an order.
One of the fields in the SQL is basically a field to say if there are dangerous goods. If a single product on the order has a code against it, then the report should display that it's hazardous.
The problem I am having is that in the SQL results, because I am putting the code on the report in the header section (and not the detail section), it is looking for the code only on the first row.
Is there a way through SQL to basically say "if one of these rows has this code, make all of these rows have this code"? I'm guessing a subselect would work here... the problem is that I am using a legacy system built on FoxPro and FoxPro SQL is terrible!
EDIT: just checked and I am running VFP8; subqueries in the SELECT statement were added in VFP9 :(

SELECT Header.HeaderId, Header.HeaderDescription,
Detail.DetailId, Detail.DetailDescription, Detail.Dangerous,
Danger.DangerousItems
FROM Header
INNER JOIN Detail ON Header.HeaderId = Detail.HeaderId
LEFT OUTER JOIN
(SELECT HeaderId, COUNT(*) AS DangerousItems FROM Detail WHERE Dangerous = 1 GROUP BY HeaderId) Danger ON Header.HeaderId = Danger.HeaderId
If Danger.DangerousItems > 0 then something is dangerous. If it is Null then nothing is dangerous.
If you can't do nested queries, then you should be able to create a view-like object (called a query in VFP8) for the nested select:
SELECT HeaderId, COUNT(*) AS DangerousItems FROM Detail WHERE Dangerous = 1 GROUP BY HeaderId
and then can you left join on that?
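For anyone who wants to check the shape of the result, the derived-table version can be run as-is in SQLite. This is only a sketch via Python's sqlite3 module, not FoxPro, with invented orders: order 1 has one dangerous item, order 2 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Header (HeaderId INTEGER PRIMARY KEY, HeaderDescription TEXT);
CREATE TABLE Detail (DetailId INTEGER PRIMARY KEY, HeaderId INTEGER,
                     DetailDescription TEXT, Dangerous INTEGER);
INSERT INTO Header VALUES (1, 'Order 1'), (2, 'Order 2');
INSERT INTO Detail VALUES
  (1, 1, 'Paper', 0), (2, 1, 'Acetone', 1), (3, 2, 'Pens', 0);
""")

# Every detail row carries the per-order count of dangerous items,
# so the first row of an order is as good as any other for the header.
rows = conn.execute("""
SELECT Header.HeaderId, Detail.DetailId, Detail.Dangerous, Danger.DangerousItems
FROM Header
INNER JOIN Detail ON Header.HeaderId = Detail.HeaderId
LEFT OUTER JOIN
  (SELECT HeaderId, COUNT(*) AS DangerousItems
   FROM Detail WHERE Dangerous = 1 GROUP BY HeaderId) Danger
  ON Header.HeaderId = Danger.HeaderId
ORDER BY Detail.DetailId
""").fetchall()

for row in rows:
    print(row)
```

Both rows of order 1 come back with DangerousItems = 1, while the row for order 2 has NULL (None in Python).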

In VFP 8 and earlier, your best bet is to use three queries in a row:
* 1. Header and detail rows (a trailing semicolon continues the command)
SELECT Header.HeaderId, Header.HeaderDescription, ;
Detail.DetailId, Detail.DetailDescription, Detail.Dangerous ;
FROM Header ;
INNER JOIN Detail ON Header.HeaderId = Detail.HeaderId ;
INTO CURSOR csrDetail
* 2. Count of dangerous items per header
SELECT HeaderId, COUNT(*) AS DangerousItems ;
FROM Detail ;
WHERE Dangerous ;
GROUP BY HeaderId ;
INTO CURSOR csrDanger
* 3. Attach the count to every detail row
SELECT csrDetail.*, csrDanger.DangerousItems ;
FROM csrDetail ;
LEFT JOIN csrDanger ON csrDetail.HeaderId = csrDanger.HeaderId ;
INTO CURSOR csrResult

Error while posting to a table

Platform used:
SQL Server 2008 and C++ Builder
I am doing an inner join between 2 tables which was giving me an error:
Row cannot be located for updating
Query:
SELECT DISTINCT
b.Acc, b.Region, b.Off, b.Sale, a.OrgDate
FROM
sales b
INNER JOIN
dates a ON (a.Acc = b.Acc and a.Region = b.Region and a.year= b.year)
WHERE
(a.xdate <> a.yDate)
and (b.Sale = a.SaleDate)
and b.year = 2010
Note: Acc, Region, Off are primary keys of table b and are also present in table a.
Table a has an id which is the primary key which does not appear in the query.
It turned out that my inner join was returning duplicate rows.
I changed my inner join query to use 'DISTINCT' so that only distinct rows are returned and not duplicate. The query runs but then I get the error:
Insufficient key column information for updating or refreshing.
It does turn out that the fields which are primary keys in Table A have the same names as the fields in Table B
I found that this is a bug which occurs while updating ADO record-sets.
BUG: Problem Updating ADO Hierarchical Recordset When Join Tables Share Same Column Name
I have the following 2 questions:
Is it not a good idea to use Distinct on an inner join query?
Has anyone found a resolution for that bug associated with TADOQuery?
Thank you,

The way I would solve this is to construct an update query by hand and run it through TADOQuery.ExecSQL. That assumes you actually know what you are doing.
The question is WHY are you working on a recordset that results in multiples of the same row, on all fields? You should be inspecting your query and fixing it. DISTINCT doesn't help, because SQL Server has picked one record but ADO won't know which one it picked, since there isn't enough information to properly identify the source on each side of the JOIN.
This query pulls in a.id to make the source records identifiable:
SELECT Acc,Region,Off,Sale,OrgDate,id
FROM
(
SELECT b.Acc,b.Region,b.Off,b.Sale,a.OrgDate, a.id,
rn=row_number() over (partition by b.Acc,b.Region,b.Off order by a.id asc)
FROM sales b
JOIN dates a ON(a.Acc = b.Acc and a.Region = b.Region and a.year= b.year)
WHERE a.xdate <> a.yDate
and b.Sale = a.SaleDate
and b.year = 2010
) X
WHERE rn=1;
Not tested, but it should work with ADO

Why does my left join in Access have fewer rows than the left table?

I have two tables in an MS Access 2010 database: TBLIndividuals and TblIndividualsUpdates. They have a lot of the same data, but the primary key may not be the same for a given person's record in both tables. So I'm doing a join between the two tables on names and birthdates to see which records correspond. I'm using a left join so that I also get rows for the people who are in TblIndividualsUpdates but not in TBLIndividuals. That way I know which records need to be added to TBLIndividuals to get it up to date.
SELECT TblIndividuals.PersonID AS OldID,
TblIndividualsUpdates.PersonID AS UpdateID
FROM TblIndividualsUpdates LEFT JOIN TblIndividuals
ON ( (TblIndividuals.FirstName = TblIndividualsUpdates.FirstName)
and (TblIndividuals.LastName = TblIndividualsUpdates.LastName)
AND (TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or (TblIndividuals.DateBorn is null
and (TblIndividuals.MidName is null and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName))));
TblIndividualsUpdates has 4149 rows, but the query returns only 4103 rows. There are about 50 new records in TblIndividualsUpdates, but only 4 rows in the query result where OldID is null.
If I export the data from Access to PostgreSQL and run the same query there, I get all 4149 rows.
Is this a bug in Access? Is there a difference between Access's left join semantics and PostgreSQL's? Is my database corrupted (Compact and Repair doesn't help)?

ON (
TblIndividuals.FirstName = TblIndividualsUpdates.FirstName
and
TblIndividuals.LastName = TblIndividualsUpdates.LastName
AND (
TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or
(
TblIndividuals.DateBorn is null
and
(
TblIndividuals.MidName is null
and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName
)
)
)
);
What I would do is systematically remove all the join conditions except the first two until you find the records drop off. Then you will know where your problem is.

This should never happen. Unless rows are being inserted/deleted in the meantime,
the query:
SELECT *
FROM a LEFT JOIN b
ON whatever ;
should never return fewer rows than:
SELECT *
FROM a ;
If it happens, it's a bug. Are you sure the queries are exactly like this (and you haven't omitted some detail, like a WHERE clause)? Are you sure that the first returns 4149 rows and the second one 4103 rows? You could make another check by changing the * above to COUNT(*).
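The COUNT(*) check can be sketched like this, using Python's sqlite3 module and a handful of made-up people in place of the real tables. Only the column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TblIndividuals (PersonID INTEGER, FirstName TEXT, LastName TEXT, DateBorn TEXT);
CREATE TABLE TblIndividualsUpdates (PersonID INTEGER, FirstName TEXT, LastName TEXT, DateBorn TEXT);
INSERT INTO TblIndividuals VALUES
  (1, 'Ann', 'Lee', '1980-01-01'),
  (2, 'Bo', 'Kim', '1990-02-02');
INSERT INTO TblIndividualsUpdates VALUES
  (10, 'Ann', 'Lee', '1980-01-01'),
  (11, 'Bo', 'Kim', NULL),
  (12, 'Cy', 'Dole', '1975-05-05');
""")

left_count = conn.execute("SELECT COUNT(*) FROM TblIndividualsUpdates").fetchone()[0]

# Rows in the left join, and how many of them found no partner.
joined_count, unmatched = conn.execute("""
SELECT COUNT(*), SUM(i.PersonID IS NULL)
FROM TblIndividualsUpdates AS u
LEFT JOIN TblIndividuals AS i
  ON i.FirstName = u.FirstName
 AND i.LastName = u.LastName
 AND i.DateBorn = u.DateBorn
""").fetchone()

print(left_count, joined_count, unmatched)  # 3 3 2
```

Whatever the ON condition is, joined_count should never come out below left_count. If it does on the real data, the engine rather than the query is at fault.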

Drop any indexes from both tables which include those JOIN fields (FirstName, LastName, and DateBorn). Then see whether you get the expected 4,149 rows with this simplified query.
SELECT
i.PersonID AS OldID,
u.PersonID AS UpdateID
FROM
TblIndividualsUpdates AS u
LEFT JOIN TblIndividuals AS i
ON
(
(i.FirstName = u.FirstName)
AND (i.LastName = u.LastName)
AND (i.DateBorn = u.DateBorn)
);

For whatever it is worth, since this seems to be a deceitful bug and any additional information could help resolving it, I have had the same problem.
The query is too big to post here and I don't have the time to reduce it now to something suitable, but I can report what I found. In the below, all joins are left joins.
I was gradually refining and changing my query. It had a derived table in it (D). And the whole thing was made into a derived table (T) and then joined to a last table (L). In any case, at one point in its development, no field in T that originated in D participated in the join to L. It was then the problem occurred, the total number of rows mysteriously became less than the main table, which should be impossible. As soon as I again let a field from D participate (via T) in the join to L, the number increased to normal again.
It was as if the join condition to D was moved to a WHERE clause when no field in it was participating (via T) in the join to L. But I don't really know what the explanation is.
