MERGE Command in SQL Server - sql

I have been using the statement
insert into target
select * from source
where [set of conditions]
for a while.
I recently found the MERGE command, which looks more effective for my purpose, so I can change the above statement to
MERGE target
USING source ON [my condition]
WHEN NOT MATCHED BY TARGET
THEN INSERT VALUES (source.col1, source.col2, source.col3);
The problem is that if my source table has 20+ columns, I have to list all of them. I need a way to tell it to insert source.*. Is there a way? I'm new to SQL and would appreciate your help.
Thanks in advance :)

Me too; I hate typing column names.
I normally build the MERGE statement in dynamic SQL.
I have a function that takes a table name as a parameter and returns a string containing all column names, properly formatted with a table-name prefix, [] brackets and commas, as in S.Col1, S.Col2, S.Col3
I could also tell you that I build a temp table with the required columns and pass the temp table to my function, because sometimes you don't want a list of all columns. But that would probably be a confusing wobble, obscuring the important bits:
Use dynamic SQL.
Use a function to create a CSV list of columns.
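A minimal sketch of what such a helper function might look like. The function name dbo.fn_ColumnList and its @Alias parameter are hypothetical (the answer does not show the real function), and STRING_AGG assumes SQL Server 2017 or later:

```sql
-- Hypothetical helper: returns 'S.[Col1], S.[Col2], ...' for the given table.
-- Assumes SQL Server 2017+ (STRING_AGG); names are illustrative only.
CREATE FUNCTION dbo.fn_ColumnList (@TableName sysname, @Alias sysname)
RETURNS nvarchar(max)
AS
BEGIN
    RETURN (
        -- CAST to nvarchar(max) so long column lists are not cut off
        SELECT STRING_AGG(CAST(@Alias + N'.' + QUOTENAME(c.name) AS nvarchar(max)), N', ')
                   WITHIN GROUP (ORDER BY c.column_id)
        FROM sys.columns AS c
        WHERE c.object_id = OBJECT_ID(@TableName)
    );
END;
GO  -- batch separator understood by SSMS and sqlcmd

-- Usage: build the VALUES list for a dynamic MERGE
SELECT dbo.fn_ColumnList(N'dbo.source', N'S') AS SourceColumns;
```

The returned string can then be concatenated into the INSERT and VALUES clauses of a dynamically built MERGE.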

Everything that I have read regarding the MERGE statement says that you need to specify the columns for your INSERT clause. If you are looking for a quick way to get the INSERT statement, you can right-click the table in SSMS and select Script Table As->INSERT To->Clipboard. You can then paste this into your query and alter just the VALUES part.
Merge statement

There's simply no advantage of using MERGE in this situation. Why overcomplicate? Stick to the KISS principle, for chrissake.
Anyways, here's the script:
declare
@targetTableName varchar(100) = 'target'
,@targetSchemaName varchar(20) = 'dbo'
,@sourceTableName varchar(100) = 'source'
,@sourceSchemaName varchar(20) = 'dbo2'
,@matchCondition varchar(50) = 't.id = s.id'
,@columns varchar(max)
set @columns = (select ','+quotename(c.name)
from sys.tables t
join sys.columns as c on t.object_id = c.object_id
join sys.schemas s on s.schema_id = t.schema_id
where t.name = @targetTableName and s.name = isnull(@targetSchemaName, s.name)
order by c.column_id
for xml path(''))
--each column name is preceded by a comma
--the values list needs parentheses and MERGE must end with a semicolon
declare @sql varchar(max) = '
merge @target t
using @source s on @matchCondition
when not matched then
insert (@columns)
values (@sourceColumns);'
set @sql =
replace(replace(replace(replace(replace(@sql
, '@matchCondition', @matchCondition)
--replace @columns with column list with the first comma removed
, '@columns', stuff(@columns, 1, 1, ''))
--replace @sourceColumns with column list with the 's.' prefix added and the first comma removed
, '@sourceColumns', stuff(replace(@columns, ',', ',s.'),1,1,''))
, '@target', quotename(@targetSchemaName)+'.'+quotename(@targetTableName))
, '@source', quotename(@sourceSchemaName)+'.'+quotename(@sourceTableName))
print @sql
--exec(@sql)
And we'll get something like this:
merge [dbo].[target] t
using [dbo2].[source] s on t.id = s.id
when not matched then
insert ([column1],[column2],[column3],[column4])
values (s.[column1],s.[column2],s.[column3],s.[column4]);

Related

How do I identify the column(s) responsible for “String or binary data would be truncated.”

I have an INSERT statement which looks like this:
INSERT INTO CLIENT_TABLE
SELECT NAME, SURNAME, AGE FROM CONTACT_TABLE
My example above is a basic one, but is there a way to pass in a SELECT statement and then check the returned column values against what the actual field sizes are?
Checking LEN against every column isn't practical. I am looking for something that is automated.
My approach to debugging this kind of problem is to remove columns from the SELECT one at a time. When the error stops, the column you just removed is the one causing the truncation. Here are some tips on debugging.
Option 1: Start with the columns that hold the most characters, such as VARCHAR columns. In your case, I think NAME and SURNAME are the likely culprits, since AGE is an integer and does not hold many characters.
Option 2: Inspect the columns in your final output. The final SELECT returns all columns and their values, so you can cross-check whether the values match what was entered in the UI, etc.
Example (image in the original post): expected vs. actual output.
My example for Option 2 shows that the truncated string is SURNAME.
NOTE: You can only use Option 2 if the query did not raise an execution error, that is, the truncation happened silently and produced an unexpectedly cut-off string.
If the query returns an error, your best choice is Option 1. It takes more time, but it is worth it because it is the most reliable way to find the exact column that causes the truncation.
Once you have found the columns that cause the problem, you can either increase the column size or limit the user's input with validation. Which one is right depends on your requirements.
My suggestions are based on my experience with this kind of situation.
Hope this answer helps. :)
Check the maximum length of each field. This way you can identify the fields that exceed the character limit specified in your destination table, e.g. CLIENT_TABLE.
SELECT Max(Len(NAME)) MaxNamePossible
, Max(Len(SURNAME)) MaxSurNamePossible
, Max(Len(AGE)) MaxAgePossible
FROM CONTACT_TABLE
Compare the result with the design of CLIENT_TABLE.
For example, if "Name" in CLIENT_TABLE is of type Varchar(50) and the validation query above returns more than 50 characters, then the "Name" field is causing the overflow.
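The manual comparison can also be folded into the query itself. This is only a sketch: it assumes CLIENT_TABLE lives in the dbo schema and that source and destination columns use the same character family (both varchar or both nvarchar), because DATALENGTH and COL_LENGTH both report bytes rather than characters.

```sql
-- Sketch: longest source value next to the declared size of the destination column.
-- COL_LENGTH returns the defined length in bytes (nvarchar(50) reports 100; -1 means MAX).
-- The dbo schema is an assumption; adjust to your own.
SELECT 'NAME' AS ColumnName,
       MAX(DATALENGTH(NAME)) AS MaxSourceBytes,
       COL_LENGTH('dbo.CLIENT_TABLE', 'NAME') AS DestinationBytes
FROM CONTACT_TABLE
UNION ALL
SELECT 'SURNAME',
       MAX(DATALENGTH(SURNAME)),
       COL_LENGTH('dbo.CLIENT_TABLE', 'SURNAME')
FROM CONTACT_TABLE;
```

Any row where MaxSourceBytes is larger than DestinationBytes points at a column that would be truncated.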
There is a great answer by Aaron Bertrand to the question:
Retrieve column definition for stored procedure result set
If you use SQL Server 2012+ you could use sys.dm_exec_describe_first_result_set. Here is a nice article with examples. But even in SQL Server 2008 it is possible to retrieve the column types of a query. Aaron's answer explains it in detail.
In fact, in your case it is easier, since you have a SELECT statement that you can copy-paste, not something that is hidden in a stored procedure. I assume that your SELECT is a complex query returning columns from many tables. If it was just one table you could use sys.columns with that table directly.
So, create an empty #tmp1 table based on your complex SELECT:
SELECT TOP(0)
NAME, SURNAME, AGE
INTO #tmp1
FROM CONTACT_TABLE;
Create a second #tmp2 table based on the destination of your complex SELECT:
SELECT TOP(0)
NAME, SURNAME, AGE
INTO #tmp2
FROM CLIENT_TABLE;
Note that we don't need any rows, only the columns for their metadata, so TOP(0) is handy.
Once those #tmp tables exist, we can query their metadata using sys.columns and compare it:
WITH
CTE1
AS
(
SELECT
c.name AS ColumnName
,t.name AS TypeName
,c.max_length
,c.[precision]
,c.scale
FROM
tempdb.sys.columns AS c
INNER JOIN tempdb.sys.types AS t ON
c.system_type_id = t.system_type_id
AND c.user_type_id = t.user_type_id
WHERE
c.[object_id] = OBJECT_ID('tempdb.dbo.#tmp1')
)
,CTE2
AS
(
SELECT
c.name AS ColumnName
,t.name AS TypeName
,c.max_length
,c.[precision]
,c.scale
FROM
tempdb.sys.columns AS c
INNER JOIN tempdb.sys.types AS t ON
c.system_type_id = t.system_type_id
AND c.user_type_id = t.user_type_id
WHERE
c.[object_id] = OBJECT_ID('tempdb.dbo.#tmp2')
)
SELECT *
FROM
CTE1
FULL JOIN CTE2 ON CTE1.ColumnName = CTE2.ColumnName
WHERE
CTE1.TypeName <> CTE2.TypeName
OR CTE1.max_length <> CTE2.max_length
OR CTE1.[precision] <> CTE2.[precision]
OR CTE1.scale <> CTE2.scale
;
Another possible way to compare:
WITH
... as above ...
SELECT * FROM CTE1
EXCEPT
SELECT * FROM CTE2
;
Finally
DROP TABLE #tmp1;
DROP TABLE #tmp2;
You can tweak the comparison to suit your needs.
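For SQL Server 2012+, the sys.dm_exec_describe_first_result_set route mentioned earlier avoids the temp tables entirely. A minimal sketch, using the SELECT from the question as the statement to describe:

```sql
-- Describe the columns the SELECT would return, without running it.
-- Arguments: the statement text, its parameter declarations (none here),
-- and the browse-information mode (0 = none).
SELECT column_ordinal,
       name,
       system_type_name,
       max_length,
       [precision],
       scale,
       is_nullable
FROM sys.dm_exec_describe_first_result_set(
         N'SELECT NAME, SURNAME, AGE FROM CONTACT_TABLE', NULL, 0)
ORDER BY column_ordinal;
```

The output can be compared against sys.columns for CLIENT_TABLE in the same way as the CTE comparison above.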
A manual solution is very quick if you are using SQL Server Management Studio (SSMS). First capture the table structure of your SELECT statement into a working table:
SELECT COL1, COL2, ... COL99 INTO dbo.zz_CONTACT_TABLE
FROM CONTACT_TABLE WHERE 1=0;
Then in SSMS, right-click your original destination table (CLIENT_TABLE) and script it as create to a new SSMS window. Then right-click your working table (zz_CONTACT_TABLE) and script the creation of this table to a second SSMS window. Arrange both windows side by side and check the columns of zz_CONTACT_TABLE against CLIENT_TABLE. Differences in length and out-of-order columns will be immediately seen, even if there are hundreds of output columns.
Finally drop your working table:
DROP TABLE dbo.zz_CONTACT_TABLE;
Regarding an automated solution, it is difficult to see how this could work. Basically you are comparing a destination table (or a subset of columns in a destination table) against the output of a SELECT statement. I suppose you could write a stored procedure that takes two varchar parameters: the name of the destination table and the SELECT statement that would populate it. But this would not handle the case where only some columns of the destination are populated, and it would be more work than the manual solution above.
Here is some code that compares the columns returned by two row-producing SQL statements. It takes as parameters two row sets, each specified by server name, database name, and T-SQL query. It can compare data in different databases and even on different SQL Servers.
--setup parameters
declare @Server1 as varchar(128)
declare @Database1 as varchar(128)
declare @Query1 as varchar(max)
declare @Server2 as varchar(128)
declare @Database2 as varchar(128)
declare @Query2 as varchar(max)
set @Server1 = '(local)'
set @Database1 = 'MyDatabase'
set @Query1 = 'select * from MyTable' --use a select
set @Server2 = '(local)'
set @Database2 = 'MyDatabase2'
set @Query2 = 'exec MyTestProcedure....' --or use a procedure
--calculate statement column differences
declare @SQLStatement1 as varchar(max)
declare @SQLStatement2 as varchar(max)
set @Server1 = replace(@Server1,'''','''''')
set @Database1 = replace(@Database1,'''','''''')
set @Query1 = replace(@Query1,'''','''''')
set @Server2 = replace(@Server2,'''','''''')
set @Database2 = replace(@Database2,'''','''''')
set @Query2 = replace(@Query2,'''','''''')
CREATE TABLE #Qry1Columns(
[colorder] [smallint] NULL,
[ColumnName] [sysname] COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[TypeName] [sysname] COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[prec] [smallint] NULL,
[scale] [int] NULL,
[isnullable] [int] NULL,
[collation] [sysname] COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
CREATE TABLE #Qry2Columns(
[colorder] [smallint] NULL,
[ColumnName] [sysname] COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[TypeName] [sysname] COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[prec] [smallint] NULL,
[scale] [int] NULL,
[isnullable] [int] NULL,
[collation] [sysname] COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
set @SQLStatement1 =
'SELECT *
INTO #Qry1
FROM OPENROWSET(''SQLNCLI'',
''server=' + @Server1 + ';database=' + @Database1 + ';trusted_connection=yes'',
''select top 0 * from (' + @Query1 + ') qry'')
select colorder, syscolumns.name ColumnName, systypes.name TypeName, syscolumns.prec, syscolumns.scale, syscolumns.isnullable, syscolumns.collation
from tempdb.dbo.syscolumns
join tempdb.dbo.systypes
on syscolumns.xtype = systypes.xtype
where id = OBJECT_ID(''tempdb.dbo.#Qry1'')
order by 1'
insert into #Qry1Columns
exec(@SQLStatement1)
set @SQLStatement2 =
'SELECT *
INTO #Qry1
FROM OPENROWSET(''SQLNCLI'',
''server=' + @Server2 + ';database=' + @Database2 + ';trusted_connection=yes'',
''select top 0 * from (' + @Query2 + ') qry'')
select colorder, syscolumns.name ColumnName, systypes.name TypeName, syscolumns.prec, syscolumns.scale, syscolumns.isnullable, syscolumns.collation
from tempdb.dbo.syscolumns
join tempdb.dbo.systypes
on syscolumns.xtype = systypes.xtype
where id = OBJECT_ID(''tempdb.dbo.#Qry1'')
order by 1'
insert into #Qry2Columns
exec(@SQLStatement2)
select ISNULL( #Qry1Columns.colorder, #Qry2Columns.colorder) ColumnNumber,
#Qry1Columns.ColumnName ColumnName1,
#Qry1Columns.TypeName TypeName1,
#Qry1Columns.prec prec1,
#Qry1Columns.scale scale1,
#Qry1Columns.isnullable isnullable1,
#Qry1Columns.collation collation1,
#Qry2Columns.ColumnName ColumnName2,
#Qry2Columns.TypeName TypeName2,
#Qry2Columns.prec prec2,
#Qry2Columns.scale scale2,
#Qry2Columns.isnullable isnullable2,
#Qry2Columns.collation collation2
from #Qry1Columns
join #Qry2Columns
on #Qry1Columns.colorder=#Qry2Columns.colorder
You can tweak the final select statement to highlight any differences that you wish. You can also wrap this up in a procedure and make a nice little user interface for it if you like, so that quick results are literally a cut and paste away.

Select all values from all tables with specific table name

EDIT original question:
Our UDW is broken out into attribute and attribute list tables.
I would like to write a data dictionary query that dynamically pulls in all column values from all tables that are like %attr_list% without having to write a series of unions and update or add every time a new attribute list is created in our UDW.
All of our existing attribute list tables follow the same format (number of columns, most column names, etc). Below are the first two SELECTs of the union in our existing view, which I want to avoid updating each time a new attribute list table is added to our UDW.
CREATE VIEW [dbo].[V_BI_DATA_DICTIONARY]
( ATTR_TABLE
,ATTR_LIST_ID
,ATTR_NAME
,ATTR_FORMAT
,SHORT_DESCR
,LONG_DESCR
,SOURCE_DATABASE
,SOURCE_TABLE
,SOURCE_COLUMN
,INSERT_DATETIME
,INSERT_OPRID
)
AS
SELECT 'PREAUTH_ATTR_LIST' ATTR_TABLE
,[PREAUTH_ATTR_LIST_ID] ATTR_LIST_ID
,[ATTR_NAME] ATTR_NAME
,[ATTR_FORMAT] ATTR_FORMAT
,[SHORT_DESCR] SHORT_DESCR
,[LONG_DESCR] LONG_DESCR
,[SOURCE_DATABASE] SOURCE_DATABASE
,[SOURCE_TABLE] SOURCE_TABLE
,[SOURCE_COLUMN] SOURCE_COLUMN
,[INSERT_DATETIME] INSERT_DATETIME
,[INSERT_OPRID] INSERT_OPRID
FROM [My_Server].[MY_DB].[dbo].[PREAUTH_ATTR_LIST]
UNION
SELECT 'SAVINGS_ACCOUNT_ATTR_LIST'
,[SAVINGS_ACCOUNT_ATTR_LIST_ID]
,[ATTR_NAME]
,[ATTR_FORMAT]
,[SHORT_DESCR]
,[LONG_DESCR]
,[SOURCE_DATABASE]
,[SOURCE_TABLE]
,[SOURCE_COLUMN]
,[INSERT_DATETIME]
,[INSERT_OPRID]
FROM [My_Server].[MY_DB].[dbo].[SAVINGS_ACCOUNT_ATTR_LIST]
Something like this might work for you if all tables contain the same columns.
Just change the temp table and the selected columns to match your own columns.
CREATE TABLE #results (
ATTR_TABLE SYSNAME,
ATTR_LIST_ID INT,
ATTR_NAME NVARCHAR(50),
ATTR_FORMAT NVARCHAR(50),
SHORT_DESCR NVARCHAR(50),
LONG_DESCR NVARCHAR(255),
SOURCE_DATABASE NVARCHAR(50),
SOURCE_TABLE NVARCHAR(50),
SOURCE_COLUMN NVARCHAR(50),
INSERT_DATETIME DATETIME,
INSERT_OPRID INT
);
INSERT INTO #results
EXEC sp_MSforeachtable @command1 =
'
SELECT ''?''
, *
FROM ?
WHERE ''?'' LIKE ''%ATTR_LIST%''
'
SELECT *
FROM #results
DROP TABLE #results
EDIT: Updated my example with your columns. Because you use different column name for ATTR_LIST_ID in each table I changed the select to SELECT *. Obviously I don't know the data types of your columns so you have to change them.
This won't work in a view but you could create a stored procedure.
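An alternative sketch that builds the union from the catalog views instead of relying on sp_MSforeachtable. It assumes SQL Server 2017+ (for STRING_AGG), that the tables are local rather than on a linked server as in the question, and that every matching table has the same number of compatible columns. It uses UNION ALL where the original view uses UNION:

```sql
-- Build one SELECT per *ATTR_LIST* table and glue them together with UNION ALL.
DECLARE @sql nvarchar(max);

SELECT @sql = STRING_AGG(
           CAST(N'SELECT ' + QUOTENAME(t.name, '''') + N' AS ATTR_TABLE, * FROM '
                + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) AS nvarchar(max)),
           N' UNION ALL ')
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE t.name LIKE N'%ATTR[_]LIST%';   -- [_] matches a literal underscore

-- Inspect the generated statement first
PRINT @sql;
-- Uncomment when ready to run it
-- EXEC sys.sp_executesql @sql;
```

Like the temp-table version, this has to live in a stored procedure rather than a view, but it picks up new attribute list tables without any code change.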
For SQL Server you should be able to use something like this:
SELECT c.name AS ColName, t.name AS TableName
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
WHERE t.name LIKE '%attr_list%'
And this will include views as well as tables
SELECT COLUMN_NAME, TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME LIKE '%attr_list%'
If using MS SQL Server check out the sys catalog views. You can use sys.tables and join to sys.columns to get your tables and columns. sys.extended_properties can get you description information, if entered.

How can I exclude GUIDs from a select distinct without listing all other columns in a table?

So let's say 'table' has two columns that act as a GUID - the ID column and msrepl_tran_version. Our original programmer did not know that our replication created this column and included it in a comparison, which has resulted in almost 20,000 records being put into this table, of which only 1,588 are ACTUALLY unique, and it's causing long load times.
I'm looking for a way to exclude the ID and replication columns from a select distinct, without having to then list every single column in the table, since I'm going to have to select from the record set multiple times to fix this (there are other tables affected and the query is going to be ridiculous) I don't want to have to deal with my code being messy if I can help it.
Is there a way to accomplish this without listing all of the other columns?
Select distinct {* except ID, msrepl_tran_version} from table
Other than (where COL_1 is ID and COL_N is the replication GUID)
Select distinct COL_2, ..., COL_N-1, COL_N+1, ... from table
After more searching, I found the answer:
SELECT * INTO #temp FROM table
ALTER TABLE #temp DROP COLUMN id
ALTER TABLE #temp DROP COLUMN msrepl_tran_version
SELECT DISTINCT * FROM #temp
This works for what I need. Thanks for the answers guys!
Absolutely, 100% not possible, there is no subtract columns instruction.
It can't be done in the spirit of the OP's initial question. However, it can be done with dynamic sql:
--Dynamically build list of column names.
DECLARE @ColNames NVARCHAR(MAX) = ''
SELECT @ColNames = @ColNames + '[' + c.COLUMN_NAME + '],'
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_SCHEMA = 'dbo'
AND c.TABLE_NAME = 'YourTable'
--Exclude these.
AND c.COLUMN_NAME NOT IN ('ID', 'msrepl_tran_version')
--Keep original column order for appearance, convenience.
ORDER BY c.ORDINAL_POSITION
--Remove trailing comma.
SET @ColNames = LEFT(@ColNames, LEN(@ColNames) - 1)
--Verify query
PRINT ('SELECT DISTINCT ' + @ColNames + ' FROM [dbo].[YourTable]')
--Uncomment when ready to proceed.
--EXEC ('SELECT DISTINCT ' + @ColNames + ' FROM [dbo].[YourTable]')
One additional note: since you need to select from the record set multiple times and potentially join to other tables, you can use the above to create a view on the table. This should make your code fairly clean.
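A sketch of that view idea, under some assumptions: SQL Server 2017+ for STRING_AGG, and a hypothetical view name dbo.V_YourTable_Distinct that does not exist yet.

```sql
-- Collect every column except the two GUID-like ones, in table order.
DECLARE @ColNames nvarchar(max);

SELECT @ColNames = STRING_AGG(CAST(QUOTENAME(c.name) AS nvarchar(max)), N', ')
                       WITHIN GROUP (ORDER BY c.column_id)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.YourTable')
  AND c.name NOT IN (N'ID', N'msrepl_tran_version');

-- CREATE VIEW must be the only statement in its batch,
-- so it is executed as its own dynamic batch.
DECLARE @sql nvarchar(max) =
    N'CREATE VIEW dbo.V_YourTable_Distinct AS SELECT DISTINCT ' + @ColNames
    + N' FROM dbo.YourTable;';

-- Inspect the generated statement first
PRINT @sql;
-- Uncomment when ready to create the view
-- EXEC sys.sp_executesql @sql;
```

Later queries can then select from the view without repeating the column list.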

How to select some particular columns from a table if the table has more than 100 columns

I need to select 90 columns out of 107 columns from my table.
Is it possible to write select * except( column1,column2,..) from table, or is there any other way to get specific columns only, or do I need to write all 90 columns in the select statement?
You could generate the column list:
select name + ', '
from sys.columns
where object_id = object_id('YourTable')
and name not in ('column1', 'column2')
It's possible to do this on the fly with dynamic SQL:
declare @columns varchar(max)
select @columns = case when @columns is null then '' else @columns + ', ' end +
quotename(name)
from sys.columns
where object_id = object_id('YourTable')
and name not in ('column1', 'column2')
declare @query varchar(max)
set @query = 'select ' + @columns + ' from YourTable'
exec (@query)
No, there's no way of doing * EXCEPT some columns. SELECT * itself should rarely, if ever, be used outside of EXISTS tests.
If you're using SSMS, you can drag the "columns" folder (under a table) from the Object Explorer into a query window, and it will insert all of the column names (so you can then go through them and remove the 17 you don't want)
There is no way in SQL to do select everything EXCEPT col1, col2 etc.
The only way to do this is to have your application handle this, and generate the sql query dynamically.
You could potentially do some dynamic sql for this, but it seems like overkill. Also it's generally considered poor practice to use SELECT *... much less SELECT * but not col3, col4, col5 since you won't get consistent results in the case of table changes.
Just use SSMS to script out a select statement and delete the columns you don't need. It should be simple.
No - you need to write all the columns you need. You might create a view for that, so your actual statement could use select * (but then you have to list all the columns in the view).
Since you should never be using select *, why is this a problem? Just drag the columns over from the Object Explorer and delete the ones you don't want.

Sybase BCP - include Column header

Sybase BCP exports nicely but only includes the data. Is there a way to include column names in the output?
AFAIK it's very difficult to include column names in the bcp output.
Try sqsh, a free isql replacement (http://www.sqsh.org/) with pipe and redirect features.
For example:
1> select * from sysobjects
2> go 2>/dev/null >/tmp/objects.txt
I suppose you can achieve the necessary result that way.
With bcp you can't get the table columns.
You can get it with a query like this:
select c.name from sysobjects o
inner join syscolumns c on o.id = c.id and o.name = 'tablename'
I solved this problem not too long ago with a proc that loops through the table's columns and concatenates them. I removed all the error checking and the procedure wrapper from this example, but it should give you the idea. I then BCP'd the table below out to header.txt, BCP'd the results to detail.txt, and used DOS copy /b header.txt+detail.txt file.txt to combine the header and detail records. This was all done in a batch script.
The table you will BCP
create table dbo.header_record
(
headers_delimited varchar(5000)
)
Then massage the commands below into a stored proc. Use isql to call this proc before your BCP extracts.
declare
@last_col int,
@curr_col int,
@header_conc varchar(5000),
@table_name varchar(35),
@delim varchar(5),
@delim_size int
select
@header_conc = '',
@table_name = 'dbo.detail_table',
@delim = '~'
set @delim_size = len(@delim)
--
--create column list table to hold our identity() columns so we can work through it
--
create local temporary table col_list
(
col_head int identity
,column_name varchar(50)
) on commit preserve rows
--
-- Delete existing rows in case columns have changed
--
delete from header_record
--
-- insert our column values in the order that they were created
--
insert into col_list (column_name)
select
trim(column_name)
from SYS.SYSCOLUMN --sybase IQ specific, you will need to adjust.
where table_id+100000 = object_id(@table_name) --Sybase IQ 12.7 specific, 15.x will need to be changed.
order by column_id asc
--
--select the biggest identity in the col_list table
--
select @last_col = max(col_head)
from col_list
--
-- Start at column 1
--
set @curr_col = 1
--
-- while our current columns are less than or equal to the column we need to
-- process, continue else end
--
while (@curr_col <= @last_col)
BEGIN
select
@header_conc =
@header_conc + @delim + column_name
from col_list where col_head = @curr_col
set @curr_col = @curr_col + 1
END
--
-- insert our final concatenated value into 1 field, skipping the leading delimiter
--
insert into dbo.header_record
select substring(@header_conc, @delim_size + 1, len(@header_conc) )
--
-- Drop temp table
--
drop table col_list
I created a view with the first row being the column names unioned to the actual table.
create view bcp_view
as select 'name' col1, 'age' col2, ....
union
select name, convert(varchar, age),.... from people
Just remember to convert any non-varchar columns.