Retrieving Common rows from two tables having Duplicate records - sql

It's my first question, so I don't know where I might go wrong.
Here is my scenario:
I have two tables, tblA and tblB. A procedure first fills some data into tblA and then the same data into tblB. Due to a glitch in the procedure's query, some values of certain columns were inserted into tblA but not into tblB. I now want to update tblB from the records in tblA, but I can't work out how to find the corresponding records for the update.
To make this easier, here is a ready-to-use sample script:
--DROP TABLE #tblA
--DROP TABLE #tblB
--TRUNCATE TABLE #tblA
--TRUNCATE TABLE #tblB
CREATE TABLE #tblA (col1 char(2), col2 char(2), col3 int, col4 char)
CREATE TABLE #tblB (ID char(2),col1 char(2), col2 char(2), col3 int, col4 char)
INSERT INTO #tblA VALUES ('A','B',1,'C')
INSERT INTO #tblA VALUES ('A1','B',2,'C')
INSERT INTO #tblA VALUES ('A1','B',3,'C')
INSERT INTO #tblB VALUES ('I1','A','B',NULL,'C') -- Here we can see that values could not be inserted in col3
INSERT INTO #tblB VALUES ('I2','A1','B',NULL,'C')
INSERT INTO #tblB VALUES ('I3','A1','B',3,'C')
What I have tried so far (of the queries I tried, this one gives the closest result; I have also tried the commented-out sections):
--UPDATE B
--SET B.col3=A.col3
SELECT DISTINCT A.*, B.*
FROM #tblA A
INNER JOIN #tblB B
ON 1=1
AND A.col1=B.col1
AND A.col2 = B.col2
AND A.col4 = B.col4
--AND (A.col3<>B.col3)
AND (B.col3 IS NULL OR (B.col3 IS NOT NULL AND A.col3=B.col3))
--AND B.col3 IS NULL
The result I get is as follows:
col1 col2 col3 col4 ID col1 col2 col3 col4
A B 1 C I1 A B NULL C
A1 B 2 C I2 A1 B NULL C
A1 B 3 C I2 A1 B NULL C
A1 B 3 C I3 A1 B 3 C
What should I change in my select query (which will become my update query) so that I get the following result, with the third row removed?
col1 col2 col3 col4 ID col1 col2 col3 col4
A B 1 C I1 A B NULL C
A1 B 2 C I2 A1 B NULL C
A1 B 3 C I3 A1 B 3 C

Unfortunately, based on the information you have given, I don't think there is a good generalized solution. I'd need more information about the possible patterns. As asked, the obvious answer is:
SELECT *
FROM #tblA a
inner join #tblb b
on a.col1=b.col1
and a.col2 = b.col2
and a.col4 = b.col4
and a.col3 = right(b.ID,1)
Alternatively, the last line could be and a.col3 = coalesce(b.col3,right(b.ID,1))
This depends on the ID in #tblb being reflected in col3, and I don't think that's what you want. From an algorithmic point of view, if you can't depend on a meaningful ID, I'm not sure there is a good way to do this. If you can describe the general algorithm you want to apply, then we can write the code to match it.
For example, something that might be meaningful is:
Tbla joins to tblb on all columns
When the join fails, I want to pick the first record from tblb that:
matches columns 1, 2, and 4 in table a
has not already been joined to tbla when sorted by ID
In certain situations I could see that algorithm being useful. But in essence, you are joining tables with duplicate keys, which gets you a cross join on the records where the key is duplicated.
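One simplified way to sketch the idea above is to number the rows inside each duplicate-key group and join on that number as well. This is only an illustration, not the asker's actual procedure: it runs against SQLite through Python's built-in sqlite3 module (plain table names instead of temp tables), needs SQLite 3.25 or later for ROW_NUMBER(), and it assumes that the ID order in tblB mirrors the col3 order in tblA, which the question does not guarantee.

```python
import sqlite3


def pair_duplicates_by_rank():
    """Pair duplicate-key rows of tblA and tblB by position within each key group."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblA (col1 TEXT, col2 TEXT, col3 INT, col4 TEXT);
        CREATE TABLE tblB (ID TEXT, col1 TEXT, col2 TEXT, col3 INT, col4 TEXT);
        INSERT INTO tblA VALUES ('A','B',1,'C'), ('A1','B',2,'C'), ('A1','B',3,'C');
        INSERT INTO tblB VALUES ('I1','A','B',NULL,'C'),
                                ('I2','A1','B',NULL,'C'),
                                ('I3','A1','B',3,'C');
    """)
    rows = conn.execute("""
        WITH a AS (
            -- assumption: col3 order in tblA corresponds to ID order in tblB
            SELECT col1, col2, col3, col4,
                   ROW_NUMBER() OVER (PARTITION BY col1, col2, col4 ORDER BY col3) AS rn
            FROM tblA
        ),
        b AS (
            SELECT ID, col1, col2, col3, col4,
                   ROW_NUMBER() OVER (PARTITION BY col1, col2, col4 ORDER BY ID) AS rn
            FROM tblB
        )
        SELECT b.ID, a.col3 AS a_col3, b.col3 AS b_col3
        FROM a
        JOIN b ON a.col1 = b.col1 AND a.col2 = b.col2 AND a.col4 = b.col4
              AND a.rn = b.rn          -- pairs the n-th duplicate with the n-th duplicate
        ORDER BY b.ID
    """).fetchall()
    conn.close()
    return rows


print(pair_duplicates_by_rank())
# [('I1', 1, None), ('I2', 2, None), ('I3', 3, 3)]
```

Each tblB row is matched to exactly one tblA row, so the duplicated key no longer produces the extra I2/3 pair. If the two orderings do not line up in the real data, the pairing will be wrong, so treat this as a starting point only.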
Hope this helps. If you clarify a bit more what you are trying to do, I might be able to help a bit more.
UPDATE:
If, as you say in your comment, you simply want to make sure the two tables match exactly, then you can use the following code:
CREATE TABLE #tblA (col1 char(2), col2 char(2), col3 int, col4 char)
CREATE TABLE #tblB (col1 char(2), col2 char(2), col3 int, col4 char)
INSERT INTO #tblA VALUES ('A','B',1,'C')
INSERT INTO #tblA VALUES ('A1','B',2,'C')
INSERT INTO #tblA VALUES ('A1','B',3,'C')
INSERT INTO #tblB VALUES ('A','B',NULL,'C')
INSERT INTO #tblB VALUES ('A1','B',NULL,'C')
INSERT INTO #tblB VALUES ('A1','B',3,'C')
delete from #tblb
where NOT Exists (Select * from #tbla a
where a.col1 = #tblb.col1
and a.col2 = #tblb.col2
and a.col3 = #tblb.col3
and a.col4 = #tblb.col4)
INSERT #tblb
SELECT * from #tbla
where not exists (select * from #tblb b
where b.col1 = #tbla.col1
and b.col2 = #tbla.col2
and b.col3 = #tbla.col3
and b.col4 = #tbla.col4)
select * from #tbla a
inner join #tblb b
on b.col1 = a.col1
and b.col2 = a.col2
and b.col3 = a.col3
and b.col4 = a.col4
Of course, depending on the size of the table, the size of the row, the indexing of both tables, and how often you run it, it might make more sense to simply drop and reinsert.
It is possible to update only the columns that are incorrect, but I think the coding and maintenance overhead would not be worth it, not to mention that performance would probably be pretty bad.
Does this help?

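You don't want an inner join here, because it returns multiple matching values; you want only the first (TOP 1) matching record's col3 value.
So I changed the update query as below, and it updates the table as you want for the sample data.
update #tblB
SET col3=
( select top 1 a.col3 from #tblA a where A.col1=#tblB.col1
AND A.col2 = #tblB.col2
AND A.col4 = #tblB.col4
order by a.col3) -- without ORDER BY, TOP 1 may return any matching row
where col3 is null -- only fill in the missing values; keep existing ones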
select * from #tblB

Related

How to update table without using update keyword

I have table a and table b with the same columns. I want to replace the values in table b with the values from table a without using the UPDATE keyword.
The question could use a bit more detail on the table structure, what exactly you're trying to accomplish, and what precludes you from using UPDATE, but here goes:
CREATE TABLE #tempTable (col1, col2, col3, ...)
INSERT INTO #tempTable
SELECT
b.col1
, b.col2
, a.col3
, ...
FROM a
INNER JOIN b
ON a.col1 = b.col1
DELETE FROM b
WHERE col1 IN (SELECT col1 FROM a)
INSERT INTO b
SELECT
col1
, col2
, col3
, ...
FROM #TempTable
This of course makes the bold assumption that tables a and b share a primary key, and that table b doesn't have any constraint that would prevent deletion of the matched rows. Please provide some more detail and I'll update my answer accordingly.

SQL how to check if a value in a col is NOT in another table

Maybe I need another coffee because this seems so simple yet I cannot get my head around it.
Let's say I have a tableA with a col1 where employee IDs are stored... ALL employee IDs. The second table, tableB, has col2, which lists the IDs of all employees who have a negative evaluation.
I need a query which returns all IDs from col1 of tableA and a new column which shows a '1' for those IDs which do NOT exist in col2 of tableB.
I am doing this in dashDB
One option uses a LEFT JOIN between the two tables:
SELECT a.col1,
CASE WHEN b.col2 IS NULL THEN 1 ELSE 0 END AS new_col
FROM tableA a
LEFT JOIN tableB b
ON a.col1 = b.col2
Alternatively, you can use a LEFT JOIN together with the IFNULL function, as below. Note that for IDs that do exist in tableB this returns the value of b.col2 itself rather than 0.
SELECT a.col1,
IFNULL(b.col2, 1) NewCol
FROM tableA a
LEFT JOIN tableB b
ON a.col1 = b.col2
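To compare the two variants above side by side, here is a small sketch using Python's built-in sqlite3 module. The sample IDs (1, 2, 3 in tableA and 2 in tableB) are made up for illustration, and SQLite stands in for dashDB, so treat it as a check of the join logic rather than of dashDB itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (col1 INT);   -- all employee IDs
    CREATE TABLE tableB (col2 INT);   -- IDs with a negative evaluation
    INSERT INTO tableA VALUES (1), (2), (3);
    INSERT INTO tableB VALUES (2);
""")

# CASE variant: 1 when the ID is missing from tableB, otherwise 0.
case_rows = conn.execute("""
    SELECT a.col1,
           CASE WHEN b.col2 IS NULL THEN 1 ELSE 0 END AS new_col
    FROM tableA a
    LEFT JOIN tableB b ON a.col1 = b.col2
    ORDER BY a.col1
""").fetchall()

# IFNULL variant: 1 when missing, but the matched ID itself otherwise.
ifnull_rows = conn.execute("""
    SELECT a.col1,
           IFNULL(b.col2, 1) AS NewCol
    FROM tableA a
    LEFT JOIN tableB b ON a.col1 = b.col2
    ORDER BY a.col1
""").fetchall()

print(case_rows)    # [(1, 1), (2, 0), (3, 1)]
print(ifnull_rows)  # [(1, 1), (2, 2), (3, 1)]
```

The CASE form gives a clean 1/0 flag. The IFNULL form only looks the same for the unmatched rows; for ID 2 it returns 2.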

Select and Update are returning/updating different number of records with same join

Pardon my long question, but I am mostly looking for troubleshooting advice. Both A.DR_CR and B.DR_CR take the distinct values 'D' or 'C'. I wanted to audit the number of records in table A whose value will change (D to C or vice versa), hence I wrote the following select before the update statement. The join keys ID and SRC_TRANID are NOT unique in either table.
The select statement shows the count of credit updated to debit (CR_TO_DB) is 588017
and DB_TO_CR is 924119
But when I run the update with the same join condition, I see that number of credits updated to debit are 3257 LESS than the above select result for CR_TO_DB and similarly, DB_TO_CR count is 3257 more than 924119.
I am unable to troubleshoot why the update does NOT update the number of records returned by the select with same join condition. Please help.
SELECT
(CASE WHEN A.DR_CR != B.DR_CR AND A.DR_CR = 'C' AND B.DR_CR='D' THEN 1 END) CR_TO_DB
,(CASE WHEN A.DR_CR != B.DR_CR AND A.DR_CR = 'D' AND B.DR_CR='C' THEN 1 END) DB_TO_CR
INTO MTL_REPORT
FROM #STAGING_TRANSACTIONS A
JOIN #CLEAN_MTL_UPDATE B
ON A.SRC_TRANID = B.SRC_TRANID
AND A.ID = B.ID
UPDATE A
SET
A.DR_CR = B.DR_CR
FROM #STAGING_TRANSACTIONS A
JOIN #CLEAN_MTL_UPDATE B
ON A.SRC_TRANID = B.SRC_TRANID
AND A.ID = B.ID
I'll explain it in simpler terms with smaller data sets. Given these two tables:
CREATE TABLE TableA (Col1 INT, Col2 INT, Col3 VARCHAR(10))
CREATE TABLE TableB (Col1 INT, Col2 INT, Col3 VARCHAR(10))
INSERT INTO TableA (Col1, Col2, Col3) VALUES (1, 1, 'a')
INSERT INTO TableA (Col1, Col2, Col3) VALUES (1, 1, 'b')
INSERT INTO TableA (Col1, Col2, Col3) VALUES (1, 2, 'c')
INSERT INTO TableB (Col1, Col2, Col3) VALUES (1, 1, 'd')
INSERT INTO TableB (Col1, Col2, Col3) VALUES (1, 1, 'e')
INSERT INTO TableB (Col1, Col2, Col3) VALUES (1, 2, 'f')
This query returns 5 rows, because the key Col1=1, Col2=1 appears twice in each table (2 x 2 = 4 pairs), plus one pair for Col1=1, Col2=2.
SELECT *
FROM TableA
JOIN TableB ON TableA.Col1 = TableB.Col1 AND TableA.Col2 = TableB.Col2
You get pairs of Col3 like this:
a, d
b, d
a, e
b, e
c, f
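An UPDATE statement using the same join touches each row of the target table at most once, however many rows of the other table it matches. It therefore reports the number of distinct target rows affected (3 here), not the number of joined pairs (5). After all, you want to know how many records were affected.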
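The difference between the two counts can be checked with a small sketch. It uses Python's built-in sqlite3 module and the same sample rows; because this avoids SQL Server's UPDATE ... FROM syntax, the update is written with EXISTS instead, which is an assumption of this example rather than the original statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Col1 INT, Col2 INT, Col3 VARCHAR(10));
    CREATE TABLE TableB (Col1 INT, Col2 INT, Col3 VARCHAR(10));
    INSERT INTO TableA VALUES (1, 1, 'a'), (1, 1, 'b'), (1, 2, 'c');
    INSERT INTO TableB VALUES (1, 1, 'd'), (1, 1, 'e'), (1, 2, 'f');
""")

JOIN = """
    FROM TableA
    JOIN TableB ON TableA.Col1 = TableB.Col1 AND TableA.Col2 = TableB.Col2
"""

# Number of (A, B) pairs the SELECT produces.
join_rows = conn.execute("SELECT COUNT(*) " + JOIN).fetchone()[0]

# Number of distinct TableA rows taking part in those pairs.
target_rows = conn.execute("SELECT COUNT(DISTINCT TableA.rowid) " + JOIN).fetchone()[0]

# An update over the same match condition modifies each TableA row once.
updated = conn.execute("""
    UPDATE TableA
    SET Col3 = 'x'
    WHERE EXISTS (SELECT 1 FROM TableB
                  WHERE TableB.Col1 = TableA.Col1 AND TableB.Col2 = TableA.Col2)
""").rowcount

print(join_rows, target_rows, updated)  # 5 3 3
```

To audit an update in advance, count distinct target rows rather than joined rows, and decide explicitly which source row should win when a key is duplicated.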

Filtering out null values only if they exist in SQL select

Below is a SQL select that retrieves values from a table. I want to retrieve the values from tableA regardless of whether or not there are matching rows in tableB. The query below gives me both non-null values and null values. How do I filter out the null values if non-null rows exist, but otherwise keep the null values?
SELECT a.* FROM
(
SELECT
a.id,
a.col1,
a.col2
FROM tableA a LEFT OUTER JOIN tableB b ON b.col1=a.col1 and b.col2='value'
WHERE a.id= @id
AND a.col2= @arg
) AS a
ORDER BY col1 ASC
You can do this by counting the number of matches using a window function. Then, either return all rows in A if there are no matching B rows, or only return the rows that do match:
select id, col1, col2
from (SELECT a.id, a.col1, a.col2,
b.id as b_id, -- expose b.id so the outer query can test it
count(b.id) over () as numbs
FROM tableA a LEFT OUTER JOIN tableB b ON b.col1=a.col1 and b.col2='value'
WHERE a.id = @id AND a.col2= @arg
) ab
where numbs = 0 or b_id is not null;
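Here is a runnable sketch of the window-count approach above, using Python's built-in sqlite3 module (SQLite 3.25 or later for window functions). The sample rows and the helper name fetch are invented for illustration, and ? placeholders take the place of the query parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INT, col1 TEXT, col2 TEXT);
    CREATE TABLE tableB (id INT, col1 TEXT, col2 TEXT);
    INSERT INTO tableA VALUES (1, 'x', 'p'), (1, 'y', 'p'), (2, 'z', 'q');
    INSERT INTO tableB VALUES (10, 'x', 'value'), (11, 'z', 'other');
""")

QUERY = """
    SELECT id, col1, col2
    FROM (SELECT a.id, a.col1, a.col2,
                 b.id AS b_id,                    -- NULL when no tableB row matched
                 COUNT(b.id) OVER () AS numbs     -- total matches in the whole result
          FROM tableA a
          LEFT OUTER JOIN tableB b ON b.col1 = a.col1 AND b.col2 = 'value'
          WHERE a.id = ? AND a.col2 = ?) ab
    WHERE numbs = 0 OR b_id IS NOT NULL
    ORDER BY col1
"""


def fetch(id_value, arg):
    """Return matched rows if any exist, otherwise all tableA rows for the filter."""
    return conn.execute(QUERY, (id_value, arg)).fetchall()


print(fetch(1, 'p'))  # one of the two rows matches, so only that row is kept
print(fetch(2, 'q'))  # nothing matches, so the unmatched row is kept
```

For id 1 the unmatched row ('y') is dropped because a matching row exists; for id 2 there is no match at all, so the row comes back anyway.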
Filter them out in the WHERE clause:
SELECT
a.id,
a.col1,
a.col2
FROM tableA a LEFT OUTER JOIN tableB b ON b.col1=a.col1 and b.col2='value'
WHERE a.id= @id
AND a.col2= @arg
AND a.col1 IS NOT NULL -- HERE
ORDER BY a.col1 ASC
For some reason, people write code like Marko's that puts a filter (b.col2 = 'value') in the JOIN clause. While this works, it is not good practice.
Also, you should get in the habit of writing the ON clause in the right sequence. We are joining table A to table B, so why write it as B.col1 = A.col1, which is backwards?
While the above statement works, it could definitely be improved.
I created the following test tables.
-- Just playing
use tempdb;
go
-- Table A
if object_id('A') > 0 drop table A
go
create table A
(
id1 int,
col1 int,
col2 varchar(16)
);
go
-- Add data
insert into A
values
(1, 1, 'Good data'),
(2, 2, 'Good data'),
(3, 3, 'Good data');
-- Table B
if object_id('B') > 0 drop table B
go
create table B
(
id1 int,
col1 int,
col2 varchar(16)
);
-- Add data
insert into B
values
(1, 1, 'Good data'),
(2, 2, 'Good data'),
(3, NULL, 'Null data');
Here is the improved statement. I chose literals instead of variables; you can change them to suit your example.
-- Filter non matching records
SELECT
A.*
FROM A LEFT OUTER JOIN B ON
A.col1 = B.col1
WHERE
B.col1 IS NOT NULL AND
A.id1 in (1, 2) AND
A.col2 = 'Good data'
ORDER BY
A.id1 DESC

Updating and join on multiple rows, which row's value is used?

Let's say I have the following statement and the inner join results in 3 rows where a.Id = b.Id, but each of the 3 rows has a different b.Value. Since only one row from tableA is being updated, which of the 3 values is used in the update?
UPDATE a
SET a.Value = b.Value
FROM tableA AS a
INNER JOIN tableB as b
ON a.Id = b.Id
I don't think there are rules for this case and you cannot depend on a particular outcome.
If you're after a specific row, say the latest one, you can use apply, like:
UPDATE a
SET a.Value = b.Value
FROM tableA AS a
CROSS APPLY
(
select top 1 *
from tableB as b
where b.id = a.id
order by
DateColumn desc
) as b
Usually what you end up with in this scenario is the first row that appears in the order of the physical index on the table. In actual practice, you should treat this as non-deterministic and include something that narrows your result to one row.
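As a sketch of "narrow the result to one row", the example below picks the latest tableB row per Id before updating. It uses Python's built-in sqlite3 module, so a correlated subquery with ORDER BY ... LIMIT 1 stands in for SQL Server's CROSS APPLY with TOP 1; the DateColumn values and the extra unmatched row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (Id INT, Value INT);
    CREATE TABLE tableB (Id INT, Value INT, DateColumn TEXT);
    INSERT INTO tableA VALUES (1, 2), (2, 9);
    INSERT INTO tableB VALUES (1, 5, '2020-01-01'),
                              (1, 3, '2020-03-01'),
                              (1, 6, '2020-02-01');
""")

conn.execute("""
    UPDATE tableA
    SET Value = (SELECT b.Value
                 FROM tableB b
                 WHERE b.Id = tableA.Id
                 ORDER BY b.DateColumn DESC   -- explicit rule: latest row wins
                 LIMIT 1)
    WHERE EXISTS (SELECT 1 FROM tableB b WHERE b.Id = tableA.Id)  -- leave unmatched rows alone
""")

result = conn.execute("SELECT Id, Value FROM tableA ORDER BY Id").fetchall()
print(result)  # [(1, 3), (2, 9)]
```

With the ordering stated, the outcome no longer depends on physical row order: Id 1 always receives the value from the most recent row, and Id 2, which has no match, keeps its value.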
Here is what I came up with using SQL Server 2008
--drop table #b
--drop table #a
select 1 as id, 2 as value
into #a
select 1 as id, 5 as value
into #b
insert into #b
select 1, 3
insert into #b
select 1, 6
select * from #a
select * from #b
UPDATE #a
SET #a.Value = #b.Value
FROM #a
INNER JOIN #b
ON #a.Id = #b.Id
It appears that it uses the top value of a basic select each time (row 1 of select * from #b). So, it possibly depends on indexing. However, I would not rely on the implementation set by SQL, as that has the possibility of changing. Instead, I would suggest using the solution presented by Andomar to make sure you know what value you are going to choose.
In short, do not trust the default implementation, create your own. But, this was an interesting academic question :)
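The best option in my case for updating multiple records is a MERGE query (supported from SQL Server 2008 onwards); with it you have complete control over what you are updating. Note that MERGE raises an error if more than one source row matches the same target row, so the source must be reduced to one row per key first.
You can also use the OUTPUT clause to do further processing.
Example: Without OUTPUT clause (update only)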
;WITH cteB AS
( SELECT Id, Col1, Col2, Col3
FROM B WHERE Id > 10 ---- Select Multiple records
)
MERGE A
USING cteB
ON(A.Id = cteB.Id) -- Update condition
WHEN MATCHED THEN UPDATE
SET
A.Col1 = cteB.Col1, -- Note: the update condition (A.Id = cteB.Id) can't appear here again.
A.Col2 = cteB.Col2,
A.Col3 = cteB.Col3;
Example: With OUTPUT clause
CREATE TABLE #TempOutPutTable
(
PkId INT NOT NULL,
Col1 VARCHAR(50),
Col2 VARCHAR(50)
)
;WITH cteB AS
( SELECT Id, Col1, Col2, Col3
FROM B WHERE Id > 10
)
MERGE A
USING cteB
ON(A.Id = cteB.Id)
WHEN MATCHED THEN UPDATE
SET
A.Col1 = cteB.Col1,
A.Col2 = cteB.Col2,
A.Col3 = cteB.Col3
OUTPUT
INSERTED.Id, cteB.Col1, INSERTED.Col2 INTO #TempOutPutTable; -- target columns must be qualified with INSERTED or DELETED
--Do what ever you want with the data in temporary table
SELECT * FROM #TempOutPutTable; -- you can check here which records are updated.
Yes, I came up with a similar experiment to Justin Pihony:
IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test ;
SELECT
1 AS Name, 0 AS value
INTO #test
IF OBJECT_ID('tempdb..#compare') IS NOT NULL DROP TABLE #compare ;
SELECT 1 AS name, 1 AS value
INTO #compare
INSERT INTO #compare
SELECT 1 AS name, 0 AS value;
SELECT * FROM #test
SELECT * FROM #compare
UPDATE t
SET t.value = c.value
FROM #test t
INNER JOIN #compare c
ON t.Name = c.name
It takes the topmost row in the right-side comparison table. You can reverse the #compare.value values to 0 and 1 and you'll get the reverse. I agree with the posters above... it's very strange that this operation does not throw an error message, as it is completely hidden that this operation IGNORES the secondary values.