MS SQL Store Procedure to Merge Multiple Rows into Single Row based on Variable Table and Column Names - sql

I'm working with MS SQL Server 2008. I'm trying to create a stored procedure to Merge (perhaps) several rows of data (answers) into a single row on target table(s). This uses a 'table_name' field and 'column_name' field from the answers table. The data looks like something like this:
answers table
--------------
id int
table_name varchar
column_name varchar
answer_value varchar
So, the target table (insert/update) would come from the 'table_name'. Each row from the anwsers would fill one column on the target table.
table_name_1 table
--------------
id int
column_name_1 varchar
column_name_2 varchar
column_name_3 varchar
etc...
Note, there can be many target tables (variable from answers table: table_name_1, table_name_2, table_name_3, etc.) that insert into many columns (column_name_1...2...3) on each target table.
I thought about using a WHILE statement to loop through the answers table. This could build a variable which would be the insert/update statement(s) for the target tables. Then executing those statements somehow. I also noticed Merge looks like it might help with this problem (select/update/insert), but my MS SQL Stored Procedure experience is very little. Could someone suggestion a strategy or solution to this problem?
Note 6/23/2014: I'm considering using a single Merge statement, but I'm not sure it is possible.

I'm probably missing something, but the basic idea to solve the problem is to use meta-programming, like a dynamic pivot.
In this particular case there is another layer to make the solution more difficult: the result need to be in different execution instead of beeing grouped.
The backbone for a possible solution is
DECLARE #cols AS NVARCHAR(MAX)
DECLARE #query AS NVARCHAR(MAX)
--using a cursor on SELECT DISTINCT table_name FROM answers iterate:
--*Cursor Begin Here*
--mock variable for the first value of the cursor
DECLARE #table AS NVARCHAR(MAX) = 't1'
-- Column list
SELECT #cols = STUFF((SELECT distinct
',' + QUOTENAME(column_name)
FROM answers with (nolock)
WHERE table_name = #table
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
, 1, 1, '')
--Query definition
SET #query = '
SELECT ' + #cols + '
INTO ' + #table + '
FROM (SELECT column_name, answer_value
FROM answers
WHERE table_name = ''' + #table + ''') b
PIVOT (MAX(answer_value) FOR column_name IN (' + #cols + ' )) p '
--select #query
EXEC sp_executesql #query
--select to verify the execution
--SELECT * FROM t1
--*Cursor End Here*
SQLFiddle Demo
The cursor definition is omitted, because I'm not sure if it'll work on SQLFiddle
In addition to the template for a Dynamic Pivot the columns list is filtered by the new table name, and in the query definition there is a SELECT ... INTO instead of a SELECT.
This script does not account for table already in the database, if that's a possibility the query can be divided in two:
SET #query = '
SELECT TOP 0 ' + #cols + '
INTO ' + #table + '
FROM (SELECT column_name, answer_value
FROM answers
WHERE table_name = ''' + #table + ''') b
PIVOT (MAX(answer_value) FOR column_name IN (' + #cols + ' )) p '
to create the table without data, if needed, and
SET #query = '
INSERT INTO ' + #table + '(' + #cols + ')'
SELECT ' + #cols + '
FROM (SELECT column_name, answer_value
FROM answers
WHERE table_name = ''' + #table + ''') b
PIVOT (MAX(answer_value) FOR column_name IN (' + #cols + ' )) p '
or a MERGE to insert/update the values in the table.
Another possibility will be to DROP and recreate every table.

Approach I took to this complex problem:
Create several temporary tables to work with your data
Select and populate the temporary tables with the data
Use dynamic pivoting to pivot the rows into one row
Use a CURSOR with WHILE loop for multiple table entries
SET #query with the dynamically built MERGE statement
EXECUTE(#query)
Drop temporary tables

Related

Select non-empty columns using SQL Server

I am using SQL Server 2012. i have a table with 90 columns. I am trying to select only columns that contains data. After searching i used the following procedure:
1- Getting all columns count using one select query
2- Pivoting Result Table into a Temp table
3- Creating Select query
4- Executing this query
Here is the query i used:
DECLARE #strTablename varchar(100) = 'dbo.MyTable'
DECLARE #strQuery varchar(max) = ''
DECLARE #strSecondQuery varchar(max) = 'SELECT '
DECLARE #strUnPivot as varchar(max) = ' UNPIVOT ([Count] for [Column] IN ('
CREATE TABLE ##tblTemp([Column] varchar(50), [Count] Int)
SELECT #strQuery = ISNULL(#strQuery,'') + 'Count([' + name + ']) as [' + name + '] ,' from sys.columns where object_id = object_id(#strTablename) and is_nullable = 1
SELECT #strUnPivot = ISNULL(#strUnPivot,'') + '[' + name + '] ,' from sys.columns where object_id = object_id(#strTablename) and is_nullable = 1
SET #strQuery = 'SELECT [Column],[Count] FROM ( SELECT ' + SUBSTRING(#strQuery,1,LEN(#strQuery) - 1) + ' FROM ' + #strTablename + ') AS p ' + SUBSTRING(#strUnPivot,1,LEN(#strUnPivot) - 1) + ')) AS unpvt '
INSERT INTO ##tblTemp EXEC (#strQuery)
SELECT #strSecondQuery = #strSecondQuery + '[' + [Column] + '],' from ##tblTemp WHERE [Count] > 0
DROP TABLE ##tblTemp
SET #strSecondQuery = SUBSTRING(#strSecondQuery,1,LEN(#strSecondQuery) - 1) + ' FROM ' + #strTablename
EXEC (#strSecondQuery)
The problem is that this query is TOO SLOW. Is there a best way to achieve this?
Notes:
Table have only one clustered index on primary key Column ID and does not contains any other indexes.
Table is not editable.
Table contains very large data.
Query is taking about 1 minute to be executed
Thanks in advance.
I do not know if this is faster, but you might use one trick: FOR XML AUTO will ommit columns without content:
DECLARE #tbl TABLE(col1 INT,col2 INT,col3 INT);
INSERT INTO #tbl VALUES (1,2,NULL),(1,NULL,NULL),(NULL,NULL,NULL);
SELECT *
FROM #tbl AS tbl
FOR XML AUTO
This is the result: col3 is missing...
<tbl col1="1" col2="2" />
<tbl col1="1" />
<tbl />
Knowing this, you could find the list of columns, which are not NULL in all rows, like this:
DECLARE #ColList VARCHAR(MAX)=
STUFF
(
(
SELECT DISTINCT ',' + Attr.value('local-name(.)','nvarchar(max)')
FROM
(
SELECT
(
SELECT *
FROM #tbl AS tbl
FOR XML AUTO,TYPE
) AS TheXML
) AS t
CROSS APPLY t.TheXML.nodes('/tbl/#*') AS A(Attr)
FOR XML PATH('')
),1,1,''
);
SELECT #ColList
The content of #ColList is now col1,col2. This string you can place in a dynamically created SELECT.
UPDATE: Hints
It would be very clever, to replace the SELECT * with a column list created from INFORMATION_SCHEMA.COLUMNS excluding all not-nullable. And - if needed and possible - types, wich contain very large data (BLOBs).
UPDATE2: Performance
Don't know what your very large data means actually... Just tried this on a table with about 500.000 rows (with SELECT *) and it returned correctly after less than one minute. Hope, this is fast enough...
Try using this condition:
where #columnname IS NOT NULL AND #columnname <> ' '

change column names in a dynamic pivot table result

I have a tsql dynamic pivot table query that works fine although I´m not happy with the column names in the final output.
To make it work, the user has to select up to 20 fruit names from a list of 200 fruit names. Then I build the pivot table so I end up with different column names everytime the select is run.
For example:
First time the column names are: apple, orange and pear
Second time is: .orange, banana, kiwi and apple
My question is: Is there a posibility to have static names, like for example: name of the first column always "col_1", second column "col_2" etc?
The select statement is as follows:
DECLARE #idList varchar(800)
DECLARE #sql nvarchar(max)
SELECT #idList = coalesce(#idList + ', ', '') + '['+ltrim(rtrim(id_producto)) +']'
from gestor_val_pos
group by id_producto order by id_producto
SELECT #sql = 'select * from #correlaciones pivot (max (correl)
for codigo2 in (' + #IDlist + ')) AS pvt order by codigo1;'
exec sp_executeSQL #sql
sure.. just created a new variable to hold the column aliases and Row_Number to get the column number.
DECLARE #idList varchar(800)
DECLARE #idListAlias varchar(800)
DECLARE #sql nvarchar(max)
SELECT
#idList = coalesce(#idList + ', ', '') + '['+ltrim(rtrim(id_producto)) +']',
#idListAlias = coalesce(#idListAlias + ', ', '') + '['+ltrim(rtrim(id_producto)) +'] as col_' + CONVERT(VARCHAR(10), ROW_NUMBER() OVER(ORDER BY id_producto))
from gestor_val_pos
group by id_producto order by id_producto
SELECT #sql = 'select ' + #idListAlias + ' from #correlaciones pivot (max (correl)
for codigo2 in (' + #IDlist + ')) AS pvt order by codigo1;'
exec sp_executeSQL #sql
Yes, but it'll make your query significantly more complex.
You'll need to return a list of the possible column names generated from #IDList and convert that into a SELECT clause more sophisticated than your current SELECT *.
When you've got that, use some SQL string splitting code to convert the #IDList into a table of items with a position parameter. Append AS <whatever> onto the end of any you want and use the FOR XML PATH trick to flatten it back, and you've got a SELECT clause that'll do what you want. But, as I said, your code is now significantly more complicated.
As an aside - I really hope that #idList is either completely impossible for any user input to ever reach or hugely simplified from your real code for this demonstration. Otherwise you've got a big SQL injection hole right there.

Dynamic SQL Result INTO Temporary Table

I need to store dynamic sql result into a temporary table #Temp.
Dynamic SQL Query result is from a pivot result, so number of columns varies(Not fixed).
SET #Sql = N'SELECT ' + #Cols + ' FROM
(
SELECT ResourceKey, ResourceValue
FROM LocaleStringResources where StateId ='
+ LTRIM(RTRIM(#StateID)) + ' AND FormId =' + LTRIM(RTRIM(#FormID))
+ ' AND CultureCode =''' + LTRIM(RTRIM(#CultureCode)) + '''
) x
pivot
(
max(ResourceValue)
for ResourceKey IN (' + #Cols + ')
) p ;'
--#Cols => Column Names which varies in number
Now I have to insert dynamic sql result to #Temp Table and use this #Temp Table with another existing table to perform joins or something else.
(#Temp table should exist there to perform operations with other existing tables)
How can I Insert dynamic SQL query result To a Temporary table?
Thanks
Can you please try the below query.
SET #Sql = N'SELECT ' + #Cols + '
into ##TempTable
FROM
(
SELECT ResourceKey, ResourceValue
FROM LocaleStringResources where StateId ='
+ LTRIM(RTRIM(#StateID)) + ' AND FormId =' + LTRIM(RTRIM(#FormID))
+ ' AND CultureCode =''' + LTRIM(RTRIM(#CultureCode)) + '''
) x
pivot
(
max(ResourceValue)
for ResourceKey IN (' + #Cols + ')
) p ;'
You can then use the ##TempTable for further operations.
However, do not forget to drop the ##TempTable at the end of your query as it will give you error if you run the query again as it is a Global Temporary Table
As was answered in (https://social.msdn.microsoft.com/Forums/sqlserver/en-US/144f0812-b3a2-4197-91bc-f1515e7de4b9/not-able-to-create-hash-table-inside-stored-proc-through-execute-spexecutesql-strquery?forum=sqldatabaseengine),
you need to create a #Temp table in advance:
CREATE TABLE #Temp(columns definition);
It seems that the task is impossible, if you know nothing about the dynamic list of columns in advance. But, most likely you do know something.
You do know the types of dynamic columns, because they come from PIVOT. Most likely, you know the maximum possible number of dynamic columns. Even if you don't, SQL Server has a limit of 1024 columns per (nonwide) table and there is a limit of 8060 bytes per row (http://msdn.microsoft.com/en-us/library/ms143432.aspx). So, you can create a #Temp table in advance with maximum possible number of columns and use only some of them (make all your columns NULLable).
So, CREATE TABLE will look like this (instead of int use your type):
CREATE TABLE #Temp(c1 int NULL, c2 int NULL, c3 int NULL, ..., c1024 int NULL);
Yes, column names in #Temp will not be the same as in #Cols. It should be OK for your processing.
You have a list of columns in your #Cols variable. You somehow make this list of columns in some external code, so when #Cols is generated you know how many columns there are. At this moment you should be able to generate a second list of columns that matches the definition of #Temp. Something like:
#TempCols = N'c1, c2, c3, c4, c5';
The number of columns in #TempCols should be the same as the number of columns in #Cols. Then your dynamic SQL would look like this (I have added INSERT INTO #Temp (#TempCols) in front of your code):
SET #Sql = N'INSERT INTO #Temp (' + #TempCols + N') SELECT ' + #Cols + N' FROM
(
SELECT ResourceKey, ResourceValue
FROM LocaleStringResources where StateId ='
+ LTRIM(RTRIM(#StateID)) + ' AND FormId =' + LTRIM(RTRIM(#FormID))
+ ' AND CultureCode =''' + LTRIM(RTRIM(#CultureCode)) + '''
) x
pivot
(
max(ResourceValue)
for ResourceKey IN (' + #Cols + ')
) p ;'
Then you execute your dynamic SQL:
EXEC (#Sql) OR sp_executesql #Sql
And then do other processing using the #Temp table and temp column names c1, c2, c3, ...
MSDN says:
A local temporary table created in a stored procedure is dropped
automatically when the stored procedure is finished.
You can also DROP the #Temp table explicitly, like this:
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
DROP TABLE #Temp'
All this T-SQL code (CREATE TABLE, EXEC, ...your custom processing..., DROP TABLE) would naturally be inside the stored procedure.
Alternative to create a temporary table is to use the subquery
select t1.name,t1.lastname from(select * from table)t1.
where "select * from table" is your dyanmic query. which will return result which you can use as temp table t1 as given in example .
IF OBJECT_ID('tempdb..##TmepTable') IS NOT NULL DROP TABLE ##TmepTable
CREATE TABLE ##TmepTable (TmpCol CHAR(1))
DECLARE #SQL NVARCHAR(max) =' IF OBJECT_ID(''tempdb..##TmepTable'') IS NOT
NULL DROP TABLE ##TmepTable
SELECT * INTO ##TmepTable from [MyTableName]'
EXEC sp_executesql #SQL
SELECT Alias.* FROM ##TmepTable as Alias
IF OBJECT_ID('tempdb..##TmepTable') IS NOT NULL DROP TABLE ##TmepTable
Here is step by step solution for your problem.
Check for your temporary tables if they exist, and delete them.
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
DROP TABLE #temp
IF OBJECT_ID('tempdb..##abc') IS NOT NULL
DROP TABLE ##abc
Store your main query result in first temp table (this step is for simplicity and more readability).
SELECT *
INTO #temp
FROM (SELECT ResourceKey, ResourceValue
FROM LocaleStringResources
where StateId ='+ LTRIM(RTRIM(#StateID)) + ' AND FormId =' + LTRIM(RTRIM(#FormID))
+ ' AND CultureCode =' + LTRIM(RTRIM(#CultureCode)) + ') AS S
Write below query to create your pivot and store result in another temp table.
DECLARE #str NVARCHAR(1000)
DECLARE #sql NVARCHAR(1000)
SELECT #str = COALESCE(#str+',', '') + ResourceKey FROM #temp
SET #sql = N'select * into ##abc from (select ' + #str + ' from (SELECT ResourceKey, ResourceValue FROM #temp) as A
Pivot
(
max(ResourceValue)
for ResourceKey in (' + #str + ')
)as pvt) as B'
Execute below query to get the pivot result in your next temp table ##abc.
EXECUTE sp_executesql #sql
And now you can use ##abc as table where-ever you want like
select * from ##abc
Hope this will help you.

unpivot a table using column_name function

I have a table which has been designed to have columns named the same as a value I need to reference it with. I know I need to unpivot the table and have got it working if I manually type all the column names, however this table keeps getting new columns added to it so I want to capture all the columns in the unpivot rather than script them manually.
I can get the column names using the column_name function and was wondering if this can be added to the unpivot at all, ive been playing around with it and its not looking possible at the moment to me so thought id check to see if there were any other suggestions.
Sadly I cant redesign the table with where the column names keep getting added to although that would be the ideal solution.
select Day, Rota, RotaTemplate
from table1 t1
unpivot
(
Rota
for RotaTemplate in (select Column_name
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'table1')
) unpiv;
You are not allowed to do this. You need to build dynamic SQL statement like follows:
DECLARE #DynamicSQL NVARCHAR(MAX)
SET #DynamicSQL = N'select Day, Rota, RotaTemplate' + CHAR(10) +
'from table1 t1'+ CHAR(10) +
'unpivot' + CHAR(10) +
'(' + CHAR(10) + CHAR(9) +
'Rota for RotaTemplate in ('
+
STUFF
(
(
SELECT ',[' + [COLUMN_NAME] + '] '
FROM [INFORMATION_SCHEMA].[COLUMNS]
WHERE [TABLE_NAME] = 'table1'
FOR XML PATH('')
)
,1
,1
,''
)
+')' + CHAR(10) +
') unpiv;'
EXEC sp_executesql #DynamicSQL

How to find out whether a table has some unique columns

I use MS SQL Server.
Ive been handed some large tables with no constrains on them, no keys no nothing.
I know some of the columns have unique values. Is there a smart way for a given table to finde the cols that have unique values ?
Right now I do it manually for each column by counting if there is as many DISTINCT values as there are rows in the table.
SELECT COUNT(DISTINCT col) FROM table
Could prob make a cusor to loop over all the columns but want to hear if someone knows a smarter or build-in function.
Thanks.
Here's an approach that is basically similar to #JNK's but instead of printing the counts it returns a ready answer for every column that tells you whether a column consists of unique values only or not:
DECLARE #table varchar(100), #sql varchar(max);
SET #table = 'some table name';
SELECT
#sql = COALESCE(#sql + ', ', '') + ColumnExpression
FROM (
SELECT
ColumnExpression =
'CASE COUNT(DISTINCT ' + COLUMN_NAME + ') ' +
'WHEN COUNT(*) THEN ''UNIQUE'' ' +
'ELSE '''' ' +
'END AS ' + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = #table
) s
SET #sql = 'SELECT ' + #sql + ' FROM ' + #table;
PRINT #sql; /* in case you want to have a look at the resulting query */
EXEC(#sql);
It simply compares COUNT(DISTINCT column) with COUNT(*) for every column. The result will be a table with a single row, where every column will contain the value UNIQUE for those columns that do not have duplicates, and empty string if duplicates are present.
But the above solution will work correctly only for those columns that do not have NULLs. It should be noted that SQL Server does not ignore NULLs when you want to create a unique constraint/index on a column. If a column contains just one NULL and all other values are unique, you can still create a unique constraint on the column (you cannot make it a primary key, though, which requires both uniquness of values and absence of NULLs).
Therefore you might need a more thorough analysis of the contents, which you could get with the following script:
DECLARE #table varchar(100), #sql varchar(max);
SET #table = 'some table name';
SELECT
#sql = COALESCE(#sql + ', ', '') + ColumnExpression
FROM (
SELECT
ColumnExpression =
'CASE COUNT(DISTINCT ' + COLUMN_NAME + ') ' +
'WHEN COUNT(*) THEN ''UNIQUE'' ' +
'WHEN COUNT(*) - 1 THEN ' +
'CASE COUNT(DISTINCT ' + COLUMN_NAME + ') ' +
'WHEN COUNT(' + COLUMN_NAME + ') THEN ''UNIQUE WITH SINGLE NULL'' ' +
'ELSE '''' ' +
'END ' +
'WHEN COUNT(' + COLUMN_NAME + ') THEN ''UNIQUE with NULLs'' ' +
'ELSE '''' ' +
'END AS ' + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = #table
) s
SET #sql = 'SELECT ' + #sql + ' FROM ' + #table;
PRINT #sql; /* in case you still want to have a look at the resulting query */
EXEC(#sql);
This solution takes NULLs into account by checking three values: COUNT(DISTINCT column), COUNT(column) and COUNT(*). It displays the results similarly to the former solution, but the possible diagnoses for the columns are more diverse:
UNIQUE means no duplicate values and no NULLs (can either be a PK or have a unique constraint/index);
UNIQUE WITH SINGLE NULL – as can be guessed, no duplicates, but there's one NULL (cannot be a PK, but can have a unique constraint/index);
UNIQUE with NULLs – no duplicates, two or more NULLs (in case you are on SQL Server 2008, you could have a conditional unique index for non-NULL values only);
empty string – there are duplicates, possibly NULLs too.
Here is I think probably the cleanest way. Just use dynamic sql and a single select statement to create a query that gives you a total row count and a count of distinct values for each field.
Fill in the DB name and tablename at the top. The DB name part is really important since OBJECT_NAME only works in the current database context.
use DatabaseName
DECLARE #Table varchar(100) = 'TableName'
DECLARE #SQL Varchar(max)
SET #SQL = 'SELECT COUNT(*) as ''Total'''
SELECT #SQL = #SQL + ',COUNT(DISTINCT ' + name + ') as ''' + name + ''''
FROM sys.columns c
WHERE OBJECT_NAME(object_id) = #Table
SET #SQL = #SQL + ' FROM ' + #Table
exec #sql
If you are using 2008, you can use the Data Profiling Task in SSIS to return Candidate Keys for each table.
This blog entry steps through the process, it's fairly simple:
http://consultingblogs.emc.com/jamiethomson/archive/2008/03/04/ssis-data-profiling-task-part-8-candidate-key.aspx
A few words what my code does:
Read's all tables and columns
Creates a temp table to hold table/columns with duplicate keys
For each table/column it runs a query. If it finds a count(*)>1 for at least one value
it makes an insert into the temp table
Select's column and values from the system tables that do not match table/columns that are found to have duplicates
DECLARE #sql VARCHAR(max)
DECLARE #table VARCHAR(100)
DECLARE #column VARCHAR(100)
CREATE TABLE #temp (tname VARCHAR(100),cname VARCHAR(100))
DECLARE mycursor CURSOR FOR
select t.name,c.name
from sys.tables t
join sys.columns c on t.object_id = c.object_id
where system_type_id not in (34,35,99)
OPEN mycursor
FETCH NEXT FROM mycursor INTO #table,#column
WHILE ##FETCH_STATUS = 0
BEGIN
SET #sql = 'INSERT INTO #temp SELECT DISTINCT '''+#table+''','''+#column+ ''' FROM ' + #table + ' GROUP BY ' + #column +' HAVING COUNT(*)>1 '
EXEC (#sql)
FETCH NEXT FROM mycursor INTO #table,#column
END
select t.name,c.name
from sys.tables t
join sys.columns c on t.object_id = c.object_id
left join #temp on t.name = #temp.tname and c.name = #temp.cname
where system_type_id not in (34,35,99) and #temp.tname IS NULL
DROP TABLE #temp
CLOSE mycursor
DEALLOCATE mycursor
What about simple one line of code:
CREATE UNIQUE INDEX index_name ON table_name (column_name);
If the index is created then your column_name has only unique values. If there are dupes in your column_name, you will get an error message.