I need to find values that appear in the same database/table/column on two linked SQL Servers.
Note that the column may also contain duplicates within the table itself on each individual server.
For example:
server1.tableName.columnName:
john
john
mary
kate
kate
server2.tableName.columnName:
kate
I want the result in this case to be kate, as it is the only entry that exists in both.
I tried this:
select table1.columnName, table2.columnName, count(*)
from [server1].[dbName].[dbo].[tableName] table1
inner join [server2].[dbName].[dbo].[tableName] table2
ON table1.columnName = table2.columnName
group by table1.columnName, table2.columnName having count(table1.columnName) > 1
Which gives a set of results.
My question is: is this correct? Will I get an entry for any value in columnName that exists in dbName.tableName on both server1 and server2?
will I get an entry for any value in columnName that exists in dbName.tableName on both server1 and server2 ?
Not exactly. Your query would do what you want without the HAVING clause; the join alone determines whether anything matches.
If you can leave the count out entirely, an alternative formulation uses EXISTS:
select t1.columnName
from [server1].[dbName].[dbo].[tableName] t1
where exists (select 1
from [server2].[dbName].[dbo].[tableName] t2
where t1.columnName = t2.columnName
);
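A minimal sketch of the first suggestion (the original join with the HAVING clause removed), run against the sample data from the question. The table variables @server1 and @server2 are hypothetical stand-ins for the two linked-server tables, and matchedPairs is an illustrative column name.

```sql
-- Hypothetical stand-ins for [server1].[dbName].[dbo].[tableName]
-- and [server2].[dbName].[dbo].[tableName].
DECLARE @server1 TABLE (columnName VARCHAR(50));
DECLARE @server2 TABLE (columnName VARCHAR(50));

INSERT INTO @server1 (columnName)
VALUES ('john'), ('john'), ('mary'), ('kate'), ('kate');
INSERT INTO @server2 (columnName)
VALUES ('kate');

-- Without HAVING, every value present on both sides forms one group.
SELECT t1.columnName, COUNT(*) AS matchedPairs
FROM @server1 t1
INNER JOIN @server2 t2
    ON t1.columnName = t2.columnName
GROUP BY t1.columnName;
-- Returns a single row: kate, 2
```

The count is the number of joined pairs (2 rows on one side times 1 on the other), not the number of rows on either server, which is why filtering on it with HAVING drops values that occur exactly once on each side.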
No, you'd only get values that are duplicated within at least one of the two servers and are present in both.
The linked servers are irrelevant here; your question works the same way on any two tables. I believe you're asking for one of the two queries below.
DECLARE @table1 TABLE (Id INT)
DECLARE @table2 TABLE (Id INT)
INSERT INTO @table1 (Id)
VALUES (1), (2), (3)
INSERT INTO @table2 (Id)
VALUES (2), (2), (3), (4), (5), (5)
-- original query - 1 result
select table1.Id, table2.Id, count(*)
from @table1 table1
inner join @table2 table2
ON table1.Id = table2.Id
group by table1.Id, table2.Id having count(table1.Id) > 1
-- cross table duplicates - 2 results
select table1.Id
from @table1 table1
where exists (select 1 from @table2 table2 where table1.Id = table2.Id)
-- cross/within table duplicates - 3 results
select unioned.Id
from (
    select table1.Id
    from @table1 table1
    union all
    select table2.Id
    from @table2 table2
) unioned
group by unioned.Id
having count(*) > 1
You can left join the two tables on the column, select the column from each table, and exclude the NULL values. The output will contain repeated rows if duplicates exist; use DISTINCT to remove them.
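A sketch of the left join approach, assuming hypothetical table variables @table1 and @table2 in place of the linked-server tables and reusing the sample data from the question.

```sql
-- Hypothetical stand-ins for the tables on server1 and server2.
DECLARE @table1 TABLE (columnName VARCHAR(50));
DECLARE @table2 TABLE (columnName VARCHAR(50));

INSERT INTO @table1 (columnName)
VALUES ('john'), ('john'), ('mary'), ('kate'), ('kate');
INSERT INTO @table2 (columnName)
VALUES ('kate');

-- Rows from @table1 with no match get NULL in t2.columnName
-- and are filtered out; DISTINCT collapses the repeated matches.
SELECT DISTINCT t1.columnName
FROM @table1 t1
LEFT JOIN @table2 t2
    ON t1.columnName = t2.columnName
WHERE t2.columnName IS NOT NULL;
-- Returns a single row: kate
```

Filtering out the NULLs makes the left join behave like an inner join, so an INNER JOIN with DISTINCT would return the same rows.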
I have two tables.
Table 1
Id
UpdateId
Name
Table 2
Table1ID
UpdateID
Address
Each time a user makes an update, the system inserts a record into table1. For table2, the system only inserts a record when the address changes.
Sample data
Table 1
1,1,name1
1,2,name1
1,3,name1update
1,4,name1update
1,5,name1
1,6,name2
Table 2
1,1,address
1,4,addressupdate
I want to get the following result:
1,1,name1,address
1,2,name1,address
1,3,name1update,address
1,4,name1update,addressupdate
1,5,name1,addressupdate
1,6,name2,addressupdate
How can I use a join condition to achieve the above?
You can use a correlated subquery. Here is standard syntax, but it can be easily adapted to any database:
select t1.*,
       (select t2.address
        from table2 t2
        where t2.table1id = t1.id and
              t2.updateid <= t1.updateid
        order by t2.updateid desc
        fetch first 1 row only
       ) as address
from table1 t1;
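As one possible adaptation, here is a sketch for SQL Server, which uses TOP (1) instead of FETCH FIRST. It assumes the Address column from the question's Table 2 and loads the sample data into hypothetical table variables so it can be run on its own.

```sql
DECLARE @table1 TABLE (id INT, updateid INT, name VARCHAR(50));
DECLARE @table2 TABLE (table1id INT, updateid INT, address VARCHAR(50));

INSERT INTO @table1 (id, updateid, name)
VALUES (1, 1, 'name1'), (1, 2, 'name1'), (1, 3, 'name1update'),
       (1, 4, 'name1update'), (1, 5, 'name1'), (1, 6, 'name2');
INSERT INTO @table2 (table1id, updateid, address)
VALUES (1, 1, 'address'), (1, 4, 'addressupdate');

-- For each table1 row, pick the most recent address whose updateid
-- is not later than the row's own updateid.
SELECT t1.id, t1.updateid, t1.name,
       (SELECT TOP (1) t2.address
        FROM @table2 t2
        WHERE t2.table1id = t1.id
          AND t2.updateid <= t1.updateid
        ORDER BY t2.updateid DESC) AS address
FROM @table1 t1
ORDER BY t1.id, t1.updateid;
-- Returns:
-- 1, 1, name1,       address
-- 1, 2, name1,       address
-- 1, 3, name1update, address
-- 1, 4, name1update, addressupdate
-- 1, 5, name1,       addressupdate
-- 1, 6, name2,       addressupdate
```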
You can use a left join when you want to keep all rows from the left table t1 even if they have no match on updateid in table t2.
select t1.id,t1.updateid,t1.name,t2.address from table1 t1
left join table2 t2
on t2.updateid= t1.updateid
I am looking for an answer to this question:
Is it possible to rewrite every join as an equivalent subquery?
I know that columns of a subquery cannot be selected in the outer query.
I ran these two queries in SQL Server:
select distinct A.*, B.ParentProductCategoryID
from [SalesLT].[Product] as A
inner join [SalesLT].[ProductCategory] as B
on A.ProductCategoryID = B.ProductCategoryID
select A.*
from [SalesLT].[Product] as A
where exists (select B.ParentProductCategoryID
              from [SalesLT].[ProductCategory] as B
              where A.ProductCategoryID = B.ProductCategoryID)
Both of these queries return 293 rows, which is what I expected.
Now the problem is: how do I select the [SalesLT].[ProductCategory] column in the second case?
Do I need to correlate this subquery in the select clause to get this column to show in the output?
Is It possible to rewrite every Join to equivalent Subquery
No, because joins can 1) remove rows or 2) multiply rows
ex 1)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3)
SELECT t1.num AS t1num, t2.num AS t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
The row containing value 1 from t1 was removed. This does not happen with a subquery in the SELECT list.
ex 2)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3), (3), (3), (3)
SELECT t1.num AS t1num, t2.num as t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
3 3
3 3
3 3
A subquery in the SELECT list would not change the number of rows returned from the table being queried.
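To illustrate the contrast with the join in ex 2, here is a sketch using a scalar subquery in the SELECT list on the same data. Table variables stand in for t1 and t2, and the MAX aggregate is an addition of this sketch: it guarantees the subquery yields a single value even though t2 contains 3 four times (without it, SQL Server raises an error when a scalar subquery returns more than one value).

```sql
DECLARE @t1 TABLE (num INT);
DECLARE @t2 TABLE (num INT);

INSERT INTO @t1 VALUES (1), (2), (3);
INSERT INTO @t2 VALUES (2), (3), (3), (3), (3);

-- One output row per @t1 row: nothing is removed, nothing is multiplied.
SELECT t1.num AS t1num,
       (SELECT MAX(t2.num)
        FROM @t2 t2
        WHERE t2.num = t1.num) AS t2num
FROM @t1 t1
ORDER BY t1.num;
-- Returns three rows: (1, NULL), (2, 2), (3, 3)
```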
In your example, you use EXISTS, which is not going to return the value from the second table.
This is how I would subquery:
select A.*
,(SELECT B.ParentProductCategoryID
FROM [SalesLT].[ProductCategory] B
WHERE B.ProductCategoryID = A.ProductCategoryID) AS [2nd table ProductCategoryID]
from [SalesLT].[Product] as A
You might use
select A.*,
(
select B.ParentProductCategoryID
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID
) ParentProductCategoryID
from [SalesLT].[Product] as A
where EXISTS(select 1
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID)
however, I find the JOIN version much more intuitive.
There is no way for you to use any data from the EXISTS subquery in the outer query. The only purpose of the subquery is to evaluate whether the EXISTS is true or false for each product.
I have two tables.
Table 1 columns are
====================
(MAINID, XID, Name)
====================
(A1 1 SAP)
(B2 2 BAPS)
(C3 3 SWAMI)
Table 2 columns are
===================
(ID COL1)
===================
(1 XYZ)
(2 ABC)
Now I want to find which XID values are not in Table 2's ID column. XID is unique in Table 1, and ID is the primary key of Table 2.
select xid
from table1
where xid not in
(select id from table2)
An alternative solution is to use a LEFT JOIN.
SELECT tb1.*
FROM Table1 AS tb1 LEFT JOIN Table2 AS tb2
ON tb1.XID = tb2.ID
WHERE tb2.ID IS NULL
This is a typical case for a set difference. However, the solution provided by Rossana is faster than this one (I'm not sure about Steve Howard's solution):
select XID as ID from Table1
except
select ID from Table2;
This way you are obtaining those ID's from Table1 that are not in Table2.
Note that this solution works in PostgreSQL; some other RDBMSs (Oracle, for example) use MINUS instead of EXCEPT.
The next solution is faster than using IN and EXCEPT clauses:
select XID from Table1 t1
where (not exists (
select ID from Table2 t2 where (t1.XID = t2.ID)
));
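A runnable sketch of the NOT EXISTS form against the sample rows from the question, written for SQL Server with hypothetical table variables @Table1 and @Table2.

```sql
DECLARE @Table1 TABLE (MAINID VARCHAR(10), XID INT, Name VARCHAR(50));
DECLARE @Table2 TABLE (ID INT PRIMARY KEY, COL1 VARCHAR(50));

INSERT INTO @Table1 (MAINID, XID, Name)
VALUES ('A1', 1, 'SAP'), ('B2', 2, 'BAPS'), ('C3', 3, 'SWAMI');
INSERT INTO @Table2 (ID, COL1)
VALUES (1, 'XYZ'), (2, 'ABC');

-- Keep only the XID values that have no matching ID in @Table2.
SELECT t1.XID
FROM @Table1 t1
WHERE NOT EXISTS (SELECT 1
                  FROM @Table2 t2
                  WHERE t2.ID = t1.XID);
-- Returns a single row: 3
```

Unlike NOT IN, NOT EXISTS keeps working as expected when the inner column can contain NULL. That cannot happen here because ID is a primary key, but it is worth keeping in mind when reusing the pattern.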
Without using a while or for loop, is there a way to insert a record two or more times with a single insert?
Thanks
INSERT INTO TABLE2 ((VALUE,VALUE)
SELECT VALUE,VALUE FROM TABLE1 )) * 2
You would need to CROSS JOIN onto a table with 2 rows. The following would work in SQL Server.
INSERT INTO TABLE2 (VALUE,VALUE)
SELECT VALUE,VALUE
FROM TABLE1, (SELECT 1 UNION ALL SELECT 2) T(C)
If you have an auxiliary numbers table you could also do
SELECT VALUE,VALUE
FROM TABLE1 JOIN Numbers ON N <=2
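A self-contained sketch of the cross join approach for SQL Server. The table variables, the column names A and B, and the alias copies(n) are hypothetical placeholders for the TABLE1, TABLE2 and VALUE names used above.

```sql
DECLARE @TABLE1 TABLE (A INT, B INT);
DECLARE @TABLE2 TABLE (A INT, B INT);

INSERT INTO @TABLE1 (A, B)
VALUES (10, 20), (30, 40);

-- Cross joining onto a two-row derived table emits every source row twice.
INSERT INTO @TABLE2 (A, B)
SELECT t.A, t.B
FROM @TABLE1 t
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) AS copies(n);

SELECT A, B
FROM @TABLE2
ORDER BY A;
-- Returns four rows: (10, 20), (10, 20), (30, 40), (30, 40)
```

Adding more SELECT branches to the derived table (or joining to a numbers table with a larger upper bound) raises the number of copies accordingly.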
--first create a dummy table with 2 records
INSERT INTO TABLE2 (VALUE,VALUE)
SELECT VALUE,VALUE FROM TABLE1, dummytable
This is not an elegant way, but could work easily.
If you have a table with a high enough number of records, you can do the cross join with a TOP clause:
INSERT INTO TABLE2
SELECT VALUE,VALUE FROM TABLE1
cross join (select top 2 1 as N from TABLE_DUMMY) as DUMMY
This works in MS SQL Server; to make it work in another DBMS, replace TOP with the equivalent keyword for that DBMS.
I have a question: I'm having a hard time trying to combine two tables, and I can't find the correct query.
I have two tables:
T1: 1 column, has X records
T2: 1 column, has Y records
Note: Y can never be greater than X, but it is often less.
I want to join those tables in order to have a table with two columns:
t3: ColumnFromT1, columnFromT2.
When Y is less than X, the T2 values get repeated and spread over all my other values, but I want to get NULL once all the rows from T2 have been used.
How can I achieve that?
Thanks
You could give each table a row number in a subquery. Then you can left join on that row number. To recycle rows from the second table, take the modulus % of the first table's row number.
Example:
select Sub1.col1
,      Sub2.col1
from   (
       select row_number() over (order by col1) as rn
       ,      *
       from   @t1
       ) Sub1
left join
       (
       select row_number() over (order by col1) as rn
       ,      *
       from   @t2
       ) Sub2
on     (Sub1.rn - 1) % (select count(*) from @t2) + 1 = Sub2.rn
Test data (declare it first, in the same batch as the query):
declare @t1 table (col1 int)
declare @t2 table (col1 datetime)
insert @t1 values (1), (2), (3), (4), (5)
insert @t2 values ('2010-01-01'), ('2012-02-02')
This prints:
1 2010-01-01
2 2012-02-02
3 2010-01-01
4 2012-02-02
5 2010-01-01
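If NULLs are wanted once T2 runs out, as the question describes, a sketch of the same row-number idea without the modulus might look like this. The table variables and the output aliases t1col and t2col are illustrative names.

```sql
DECLARE @t1 TABLE (col1 INT);
DECLARE @t2 TABLE (col1 DATETIME);

INSERT INTO @t1 VALUES (1), (2), (3), (4), (5);
INSERT INTO @t2 VALUES ('2010-01-01'), ('2012-02-02');

-- Pair rows by position; positions beyond the end of @t2 find no match
-- in the left join and therefore come back as NULL.
SELECT Sub1.col1 AS t1col,
       Sub2.col1 AS t2col
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY col1) AS rn, col1
    FROM @t1
) Sub1
LEFT JOIN (
    SELECT ROW_NUMBER() OVER (ORDER BY col1) AS rn, col1
    FROM @t2
) Sub2
    ON Sub1.rn = Sub2.rn
ORDER BY Sub1.rn;
-- Rows 1 and 2 are paired with the two dates;
-- rows 3, 4 and 5 have NULL in t2col.
```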
You are looking for a LEFT JOIN (http://www.w3schools.com/sql/sql_join_left.asp), e.g. T1 LEFT JOIN T2.
Say they both have the column CustomerID in common:
SELECT *
FROM T1
LEFT JOIN
T2 on t1.CustomerId = T2.CustomerId
This will return all records in T1 and those that match in T2 with nulls for the T2 values where they do not match.
Make sure you are joining the tables on a common column (or a common set of columns if more than one column is necessary to perform the join). If not, you are doing a Cartesian join ( http://ezinearticles.com/?What-is-a-Cartesian-Join?&id=3560672 ).