SQL find the same column in different tables - sql

I have two very large tables and I am trying to figure out what they have in common.
They do not have the same number of columns. I could just look at each column name in each table and compare, but they both have hundreds of columns (and I have to do this for many such tables).
I use MS SQL Server.
There are no constraints and no foreign keys on either of them.
How do I go about doing that?
Something like this:
select * AS "RES" from Table1 where RES IN (select * column from Table2)
Thanks in advance.

If you're looking for column names which are the same between two tables, this should work:
select name from syscolumns sc1 where id = object_id('table1') and exists(select 1 from syscolumns sc2 where sc2.name = sc1.name and sc2.id = object_id('table2'))
You could also make sure they're the same type by tossing in and sc1.xtype = sc2.xtype in the subquery.
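On newer versions of SQL Server, a similar check can be sketched against the sys.columns catalog view rather than the older syscolumns compatibility view. This is only a sketch: dbo.Table1 and dbo.Table2 are placeholder names, and treating two columns as the same type when user_type_id and max_length agree is an assumption that may be too strict or too loose for your data.

```sql
-- Sketch: columns that exist in both tables, with a flag for matching types.
-- dbo.Table1 and dbo.Table2 are placeholder names (assumptions).
SELECT c1.name AS column_name,
       t1.name AS table1_type,
       t2.name AS table2_type,
       CASE WHEN c1.user_type_id = c2.user_type_id
             AND c1.max_length   = c2.max_length
            THEN 1 ELSE 0 END AS same_type
FROM sys.columns AS c1
JOIN sys.columns AS c2 ON c2.name = c1.name
JOIN sys.types   AS t1 ON t1.user_type_id = c1.user_type_id
JOIN sys.types   AS t2 ON t2.user_type_id = c2.user_type_id
WHERE c1.object_id = OBJECT_ID('dbo.Table1')
  AND c2.object_id = OBJECT_ID('dbo.Table2')
ORDER BY c1.name;
```

Rows with same_type = 0 are the shared names whose definitions differ, which are usually the ones worth looking at first.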

If I understood correctly, you are trying to compare the data in the two tables and check what the data has in common.
Provided that you have the columns you want to use for comparison (Table1.YourColumn and Table2.OtherColumn, in the example), you can do this:
select YourColumn from Table1 t1
where exists (select OtherColumn
from Table2 t2
where t2.OtherColumn = t1.YourColumn)

DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX), @Table1 AS NVARCHAR(MAX)='Table1' , @Table2 AS NVARCHAR(MAX)='Table2'
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(A.COLUMN_NAME)
from INFORMATION_SCHEMA.COLUMNS A
join INFORMATION_SCHEMA.COLUMNS B
on A.COLUMN_NAME = B.COLUMN_NAME
where A.TABLE_NAME = @Table1
and B.TABLE_NAME = @Table2 and A.COLUMN_NAME not in ('Doc','CreatedBy')
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set @query = 'SELECT ' + @cols + '
from
(select A.COLUMN_NAME
from INFORMATION_SCHEMA.COLUMNS A
join INFORMATION_SCHEMA.COLUMNS B
on A.COLUMN_NAME = B.COLUMN_NAME
where A.TABLE_NAME = '''+@Table1+'''
and B.TABLE_NAME = '''+@Table2+'''
) x
pivot
(
Max(COLUMN_NAME)
for COLUMN_NAME in (' + @cols + ')
) p '
execute sp_executesql @query

Here is a stored procedure that counts the columns two tables have in common (same name and same data type).
It works in SQL Server.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- sysname instead of a bare varchar: a varchar parameter without a length holds a single character
CREATE PROCEDURE GetColumnsData(@T_NAME1 sysname, @T_NAME2 sysname)
AS
BEGIN
DECLARE @Co int;
SET @Co = 0;
CREATE TABLE #TEMP_TABLE(C_NAME NVARCHAR(128),D_TYPE NVARCHAR(128),T_NAME NVARCHAR(128));
INSERT INTO #TEMP_TABLE (C_NAME,D_TYPE,T_NAME)
SELECT COLUMN_NAME,DATA_TYPE,TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @T_NAME1 OR TABLE_NAME = @T_NAME2;
-- pin each side to one table so every matching pair is counted once rather than twice
SELECT @Co = COUNT(*) FROM #TEMP_TABLE t JOIN #TEMP_TABLE t1 ON t1.C_NAME = t.C_NAME
AND t.D_TYPE = t1.D_TYPE WHERE t.T_NAME = @T_NAME1 AND t1.T_NAME = @T_NAME2
PRINT @Co
END

Assuming your RDBMS supports digests, you could calculate the digest of each row and join on the digest. Something like:
SELECT T1.*
FROM
(SELECT *, MD5(col1, col2,...) as digest
FROM Table1) T1,
(SELECT *, MD5(col1, col2,...) as digest
FROM Table2) T2
WHERE T1.digest = T2.digest
I'm assuming that the two tables have the same columns and column types.
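Since the question is about SQL Server, which as far as I know has no MD5() function of that form, a comparable sketch can use HASHBYTES together with CONCAT (both usable this way from SQL Server 2012). The names col1 and col2 are placeholders for the shared columns, and the '|' separator is an assumption added so that adjacent values do not run together.

```sql
-- Sketch: join two tables on a per-row digest of the shared columns.
-- col1, col2 are placeholder column names (assumptions); list the real ones.
-- CONCAT turns NULL into an empty string, so NULL and '' hash identically.
SELECT t1.*
FROM (SELECT *, HASHBYTES('SHA2_256', CONCAT(col1, '|', col2)) AS digest
      FROM Table1) AS t1
JOIN (SELECT *, HASHBYTES('SHA2_256', CONCAT(col1, '|', col2)) AS digest
      FROM Table2) AS t2
  ON t1.digest = t2.digest;
```

Both sides must build the digest from the same columns in the same order, otherwise equal rows will not produce equal hashes.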

Related

INNER JOIN with ON All columns except one Column

I have 2 tables (Table1 and Table2). Both tables have exactly the same schema, and both may contain duplicate sets of records apart from the IDs, since ID is auto-generated.
I would like to get the common set of records, keeping Table1's ID. So I query using an inner join, and it works as I expected.
SELECT Table1.ID, Table1.Param1, Table1.Param2, Table1.Param3
INTO #Common
FROM Table1
INNER JOIN Table2 ON Table1.Param1 = Table2.Param1
AND Table1.Param2 = Table2.Param2
AND Table1.Param3 = Table2.Param3
However, in actual usage, the total number of parameters in both tables will be around 100, so the number of comparisons inside the ON clause will grow to about 100.
How can I do an inner join that excludes one column instead of comparing all columns in the ON clause?
Removing the ID column from both tables and doing an INTERSECT is not possible either, since I still want to extract Table1's ID for other purposes.
I can get the rows common to the 2 tables by removing ID and comparing them.
However, that still does not meet my requirement, since I need Table1's ID for those common rows.
SELECT * INTO #TemporaryTable1 FROM Table1
ALTER TABLE #TemporaryTable1 DROP COLUMN ID
SELECT * INTO #TemporaryTable2 FROM Table2
ALTER TABLE #TemporaryTable2 DROP COLUMN ID
SELECT * INTO #Common FROM (SELECT * FROM #TemporaryTable1 INTERSECT SELECT * FROM #TemporaryTable2) data
SELECT * FROM #Common
If I understood your problem correctly, you could dynamically generate the query you want using the following code:
DECLARE @SQL nvarchar(max) = 'SELECT ',
@TBL1 nvarchar(50) = 'data',
@TBL2 nvarchar(50) = 'data1',
@EXCLUDEDCOLUMNS nvarchar(100)= 'ID,col1'
-- column selection
SELECT @SQL += QUOTENAME(@TBL1) + '.' + QUOTENAME(COLUMN_NAME) + ', '
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TBL1
-- remove the last comma (LEN ignores the trailing space), then add the from clause
SET @SQL = LEFT(@SQL, LEN(@SQL) - 1)
SET @SQL += '
FROM ' + QUOTENAME(@TBL1) + ' INNER JOIN
' + QUOTENAME(@TBL2) + '
ON '
-- define the on clause: one equality per column, joined with AND
SELECT @SQL += QUOTENAME(@TBL1) + '.' + QUOTENAME(COLUMN_NAME) + ' = ' + QUOTENAME(@TBL2) + '.' + QUOTENAME(COLUMN_NAME) + ' AND '
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TBL1
-- NOT IN (@EXCLUDEDCOLUMNS) would compare against the whole string 'ID,col1', so search the list instead
AND CHARINDEX(',' + COLUMN_NAME + ',', ',' + @EXCLUDEDCOLUMNS + ',') = 0
-- remove the last AND
SET @SQL = LEFT(@SQL, LEN(@SQL) - 4)
--SELECT @SQL
EXEC sp_executesql @SQL
Before you execute it, make sure @SQL is properly generated (uncomment the SELECT @SQL line to inspect it). Choose the columns you want to exclude from the ON clause using the @EXCLUDEDCOLUMNS variable.

Pull Table Information from code

I have a table in SQL Server 2012 with 3 columns:
RDDID, SPDESC, SQLTEXT
Column SQLTEXT contains inline queries.
Is it possible to extract all table names from SQLTEXT and put them into a separate column?
INPUT:
RDDID|SPDESC|SQLTEXT
10XH1|DAGASC|SELECT COL1 FROM TABLE1 AS A JOIN TABLE2 B ON A.ID=B.ID JOIN TABLE3 AS C ON A.ID=C.ID
OUTPUT
RDDID|SPDESC|COLX1|COLX2|COLX3|
10XH1|DAGASC|TABLE1|TABLE2|TABLE3
Please share your thoughts.
If you can use a regex or any simple technique to split the text on spaces, you could try the following:
-- dbo.RegExSplit is assumed to be a user-defined (CLR) split function; it is not built into SQL Server
declare @query varchar(1000) = 'SELECT COL1 FROM TABLE1 AS A JOIN TABLE2 B ON A.ID=B.ID JOIN TABLE3 AS C ON A.ID=C.ID'
select b.name, 'ColX'+ cast(row_number() over( order by (select 0)) as varchar) as rn into #temp
from dbo.RegExSplit(' ',@query,1) a join sys.tables b on a.match = b.name
declare @cols varchar(100)
select @cols =
stuff((select ',' +rn from #temp for xml path('')),1,1,'')
exec('select * from #temp pivot(max(name) for rn in ('+@cols+' ))piv')
drop table #temp

One view from two tables with identical column names

We have two tables that we need to merge into a singular view. Normally I'd individually select columns to avoid this issue, however in this case the two tables are a combined 800 columns.
The only identical columns are the identifier columns. Unfortunately these cannot be changed, as they are used by a 3rd party tool to sync the tables.
Table A
GUID
Name
Address
...
Table B
GUID
Cell
Fax
Home2
...
These are good examples; just assume each table has 400-odd columns.
Obviously the traditional
SELECT a.*, b.* from table_a a, table_b b where a.guid = b.guid
fails miserably. Is there any easy way to create the view without having to list out 799 individual column names? I was thinking perhaps a one-off function to create the view, but so far I'm hitting a wall.
You can use dynamic sql as a solution.
CREATE TABLE test1 (id INT, col1 NVARCHAR(50), col2 NVARCHAR(50))
GO
CREATE TABLE test2(id INT, col1 NVARCHAR(50), col2 NVARCHAR(50))
GO
DECLARE @sql NVARCHAR(max) = ''
; WITH cte AS (
SELECT
CASE WHEN TABLE_NAME = 'test1' THEN TABLE_NAME + '.' + COLUMN_NAME + ' AS ' + COLUMN_NAME + 't1' ELSE TABLE_NAME + '.' + COLUMN_NAME + ' AS ' + COLUMN_NAME + 't2' END AS a, 1 AS ID
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME IN ('test1', 'test2')
)
SELECT @sql =
'CREATE VIEW myview as
select ' + (
SELECT
STUFF(
(
SELECT ', '+ [A]
FROM cte
WHERE ID = results.ID
FOR XML PATH(''), TYPE
).value('(./text())[1]','VARCHAR(MAX)')
,1,2,''
) AS NameValues
FROM cte results
GROUP BY ID
) + ' from test1 join test2 on test1.id = test2.id'
PRINT @sql
--EXEC (@sql)
The result is
CREATE VIEW myview
AS
SELECT test1.id AS idt1 ,
test1.col1 AS col1t1 ,
test1.col2 AS col2t1 ,
test2.id AS idt2 ,
test2.col1 AS col1t2 ,
test2.col2 AS col2t2
FROM test1
JOIN test2 ON test1.id = test2.id

how to find "String or binary data would be truncated" error on sql in a big query

I have a huge INSERT INTO TABLE1 (....) SELECT .... FROM TABLE2 statement. It gives me the error
"String or binary data would be truncated".
I know that one of the columns from TABLE2 is bigger than the corresponding column in TABLE1 in the INSERT statement.
I have more than 100 columns in each table. So it is hard to find out the problem. Is there any easier way to figure this out?
You can query Information_Schema.Columns for both tables and check the difference in content length.
Assuming your tables have the same column names, you can use this:
SELECT t1.Table_Name, t1.Column_Name
FROM INFORMATION_SCHEMA.Columns t1
INNER JOIN INFORMATION_SCHEMA.Columns t2 ON (t1.Column_Name = t2.Column_Name)
WHERE t1.Table_Name = 'Table1'
AND t2.Table_Name = 'Table2'
AND ISNULL(t1.Character_maximum_length, 0) < ISNULL(t2.Character_maximum_length, 0)
Assuming your tables have different column names, you can do this and just look for the difference
SELECT Table_Name, Column_Name, Character_maximum_length
FROM INFORMATION_SCHEMA.Columns
WHERE Table_Name IN('Table1', 'Table2')
ORDER BY Column_Name, Character_maximum_length, Table_Name
To figure out which column the data is too long to fit in, I would use the following statement to output the results to a temp table.
SELECT ...
INTO MyTempTable
FROM Table2
Then use the query example from this article to get the max data length of each column. I have attached a copy of the code below.
DECLARE @TableName sysname = 'MyTempTable', @TableSchema sysname = 'dbo'
DECLARE @SQL NVARCHAR(MAX)
SELECT @SQL = STUFF((SELECT
' UNION ALL select ' +
QUOTENAME(Table_Name,'''') + ' AS TableName, ' +
QUOTENAME(Column_Name,'''') + ' AS ColumnName, ' +
CASE WHEN DATA_TYPE IN ('XML','HierarchyID','Geometry','Geography','text','ntext') THEN 'MAX(DATALENGTH('
ELSE 'MAX(LEN('
END + QUOTENAME(Column_Name) + ')) AS MaxLength, ' +
QUOTENAME(C.DATA_TYPE,'''') + ' AS DataType, ' +
CAST(COALESCE(C.CHARACTER_MAXIMUM_LENGTH, C.NUMERIC_SCALE,0) AS VARCHAR(10)) + ' AS DataWidth ' +
'FROM ' + QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(Table_Name)
FROM INFORMATION_SCHEMA.COLUMNS C
WHERE TABLE_NAME = @TableName
AND table_schema = @TableSchema
--AND DATA_TYPE NOT IN ('XML','HierarchyID','Geometry','Geography')
ORDER BY COLUMN_NAME
FOR XML PATH(''),Type).value('.','varchar(max)'),1,11,'')
EXECUTE (@SQL)
@ZoharPeled's answer is great, but for temp tables you have to do something a little different:
SELECT t1.Table_Name
,t1.Column_Name
,t1.Character_maximum_length AS Table1_Character_maximum_length
,t2.Character_maximum_length AS Table2_Character_maximum_length
FROM INFORMATION_SCHEMA.Columns t1
INNER JOIN tempdb.INFORMATION_SCHEMA.COLUMNS t2 ON (t1.Column_Name = t2.Column_Name)
WHERE t1.Table_Name = 'Table1'
AND t2.Table_Name LIKE '#Table2%' -- Don't remove the '%', it's required
AND ISNULL(t1.Character_maximum_length, 0) < ISNULL(t2.Character_maximum_length, 0)
If the column names are the same, you could try something like this:
SELECT
c1.name as ColumnName,
c1.max_length AS Table1MaxLength,
c2.max_length AS Table2MaxLength
FROM
sys.columns c1
inner join sys.columns c2 on c2.name = c1.name
WHERE
c1.object_id = OBJECT_ID('TABLE1')
AND c2.object_id = OBJECT_ID('TABLE2')
You can query for the definitions of the two tables from information_schema.columns and then get the diff using EXCEPT
CREATE TABLE peter(a INT, b BIGINT, c VARCHAR(100));
CREATE TABLE peter2(a INT, b BIGINT, c VARCHAR(800));
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'peter'
EXCEPT
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'peter2'
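EXCEPT only reports rows from the first query, so the differing definition on the other side stays hidden. One possible variation, sketched here against the peter and peter2 tables above, puts both definitions side by side with a FULL OUTER JOIN. The output column aliases are illustrative names, not anything from the original answer.

```sql
-- Sketch: list columns whose presence, type, or length differs between the two tables.
SELECT COALESCE(a.COLUMN_NAME, b.COLUMN_NAME) AS column_name,
       a.DATA_TYPE                AS peter_type,
       a.CHARACTER_MAXIMUM_LENGTH AS peter_length,
       b.DATA_TYPE                AS peter2_type,
       b.CHARACTER_MAXIMUM_LENGTH AS peter2_length
FROM (SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
      FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'peter') AS a
FULL OUTER JOIN
     (SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
      FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'peter2') AS b
  ON a.COLUMN_NAME = b.COLUMN_NAME
WHERE a.COLUMN_NAME IS NULL   -- column exists only in peter2
   OR b.COLUMN_NAME IS NULL   -- column exists only in peter
   OR a.DATA_TYPE <> b.DATA_TYPE
   OR ISNULL(a.CHARACTER_MAXIMUM_LENGTH, 0) <> ISNULL(b.CHARACTER_MAXIMUM_LENGTH, 0);
```

For the two example tables this should list only column c, with lengths 100 and 800.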
Hello Arif,
To make the comparison easier, I suggest listing the relevant table's column definitions from sys.columns and comparing them manually.
SELECT * FROM sys.columns WHERE object_id = object_id('tablename')
Perhaps you can limit the returned list to columns with string data types, or to numeric types with fixed sizes such as int, bigint, etc.
Try:
Select ID from TABLE2 where LEN(YourColumn) > SIZE
Try this one
With Data as (
SELECT
Table_Name, Column_Name, Character_maximum_length, Ordinal_Position,
LEAD(Character_maximum_length,1) Over(Partition by Column_Name Order by Table_Name) as NextValue
FROM INFORMATION_SCHEMA.Columns
WHERE Table_Name IN('Table1', 'Table2')
)
Select * , CHARACTER_MAXIMUM_LENGTH - NextValue as Variance
from Data
Where NextValue is not null and ( CHARACTER_MAXIMUM_LENGTH - NextValue) <> 0
ORDER BY Column_Name, Character_maximum_length, Table_Name
See @aaron-bertrand's researched response on Stack Exchange's DBA site.
Basically:
Turn on the tracing with DBCC TRACEON(460);
Execute your INSERT code
Turn off the tracing with DBCC TRACEOFF(460);
Using Aaron's example code:
DBCC TRACEON(460);
GO
INSERT dbo.x(a) VALUES('foo');
GO
-- Drop if this is a test table with: DROP TABLE dbo.x;
DBCC TRACEOFF(460);
I am a beginner and I ran into this while inserting employee names into fname and lname. I had not specified the number of characters, and a varchar column declared without a length holds only one character.
Instead of writing this (wrong code):
create table Employee(
fname varchar ,
lname varchar
)
write this:
create table Employee(
fname varchar(10) ,
lname varchar(10)
)

Generic code to determine how many rows from a table are in a different table with matching structure?

How can I create a generic function in C# (LINQ-to-SQL) or SQL that takes two tables with matching structure and counts how many rows in TableA are also in TableB?
TableA Structure
Value1, Value2
TableA Data
1,1
TableB Structure
Value1, Value2
TableB Data
1,1
1,2
To get count of matching rows between TableA and TableB:
SELECT COUNT(*) FROM TableA
INNER JOIN TableB ON
TableA.Value1 = TableB.Value1 AND
TableA.Value2 = TableB.Value2
The result in this example
1
So the query above works great, but I don't want to have to write a version of it for every pair of tables I want to do this for since the INNER JOIN is on every field. I feel like there should be a more generic way to do this instead having to manually type out all of the join conditions. Thoughts?
Edit: Actually, I think I'll need a C#/LINQ answer since this will be done across servers. Again, this is annoying because I have to write the code to compare each field manually.
var tableARows = dbA.TableA.ToList();
var tableBRows = dbB.TableB.ToList();
var match = 0;
foreach (var tableARow in tableARows) {
if (tableBRows.Any(b => b.Value1 == tableARow.Value1 && b.Value2 == tableARow.Value2)) {
match++;
}
}
return match;
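Using SQL Server, this will work (INTERSECT keeps the rows present in both tables; EXCEPT would instead return the rows of product that are missing from product1):
var sql = @"select count(0) from (
select * from product intersect select * from product1
) as aa";
// dc is your data context (an ObjectContext here)
var match = dc.ExecuteStoreQuery<int>(sql).Single();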
You could generate the join using syscolumns.
declare @tablenameA varchar(50) = 'tableA'
declare @tablenameB varchar(50) = 'tableB'
declare @sql nvarchar(4000)
select @sql =''
select @sql = @sql + ' and ' + quotename(@tablenameA )+ '.'
+ c.name +' = ' + quotename(@tablenameB )+ '.' + c.name
from syscolumns c
inner join sysobjects o on c.id = o.id
where o.name = @tablenameA
select @sql = 'select count(*) from ' + @tablenameA + ' inner join '+@tablenameB+' on '
+ substring (@sql, 5, len(@sql))
exec sp_executesql @sql
You query the ANSI INFORMATION_SCHEMA views, thus:
select *
from INFORMATION_SCHEMA.COLUMNS col
where col.TABLE_SCHEMA = <your-schema-name-here>
and col.TABLE_NAME = <your-table-name-here>
order by col.ORDINAL_POSITION
against each of the tables involved. The result set will provide everything you need to construct the required query on the fly.
One fairly simple answer:
select ( select count(*) from
(select * from TableA UNION ALL select * from TableB) a ) -
( select count(*) from
(select * from TableA UNION select * from TableB) d ) duplicates
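The UNION-based count above relies on neither table containing duplicate rows of its own. An alternative sketch, under the same assumption that both tables have identical column lists in the same order, is to count the INTERSECT directly. The alias names are illustrative.

```sql
-- Sketch: count the distinct rows that appear in both tables.
-- Assumes TableA and TableB have the same columns in the same order.
-- INTERSECT treats NULLs as equal and removes duplicate rows.
SELECT COUNT(*) AS matching_rows
FROM (SELECT * FROM TableA
      INTERSECT
      SELECT * FROM TableB) AS m;
```

With the sample data from the question (TableA holding 1,1 and TableB holding 1,1 and 1,2) this returns 1.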