SSIS Multiple Unknown Column Updates - sql

I wonder if anyone has come across a similar situation before and could point me in the right direction...? I'll add that it's a bit frustrating, as someone has replaced the NULL values with a text string containing the word 'NULL' - which I need to remove.
I have 6 quite large tables, each with over 250 columns and in excess of 1 million records, and I need to find every column where the string 'NULL' appears in a row and replace it with a proper NULL value - the problem is that I have no idea which columns are affected.
As a start, I've got some code that will list every column with a count of its values; for anything that looks to have a lower count than expected, I'll run a SQL query to ascertain whether the column contains the string 'NULL' and, using the following code, replace it with NULL.
declare @tablename sysname
declare @ColName nvarchar(500)
declare @sql nvarchar(1000)
declare @sqlUpdate nvarchar(1000)
declare @ParmDefinition nvarchar(1000)

set @tablename = N'Table_Name'
set @ColName = N'Column_Name'
set @ParmDefinition = N'@ColName nvarchar(500) OUTPUT';

set @sql = 'Select ' + @ColName + ', Count(' + @ColName + ') from ' + @tablename + ' group by ' + @ColName
set @sqlUpdate = 'Update ' + @tablename + ' SET ' + @ColName + ' = NULL WHERE ' + @ColName + ' = ''NULL'''

print @sql
print @sqlUpdate

EXECUTE sp_executesql @sql, @ParmDefinition, @ColName = @ColName OUTPUT;
EXECUTE sp_executesql @sqlUpdate, @ParmDefinition, @ColName = @ColName OUTPUT;
What I'm trying to do with SSIS is to iterate through each column,
Select Column_Name from Table_Name where Column_Name = 'NULL'
run the appropriate query, and perform the update.
So far I can extract the column names from INFORMATION_SCHEMA and get a record count from the appropriate table, but when it comes to running the actual UPDATE statement (sqlUpdate above) there doesn't seem to be a component that's happy with the dynamic phrasing of the query.
I'm using a Conditional Split to determine where to go if there are records (which may be incorrect) and I've tried OLE DB Command for the update.
In short, I'm wondering whether SSIS is the best tool for this job or whether I'm looking in the wrong place!
I'm using SSIS 2005, which may well have limitations that I'm not yet aware of!
Any guidance would be appreciated.
Thanks,
Jon

The principle is basically sound, but I would leave SSIS out, and do it with SSMS directly against the SQL Server and build the looping logic there, probably with a cursor.
I'm not sure whether you need to check the count of potential values first - you might just as well apply the update and accept that sometimes it will update no rows - the filtering will then not be duplicated.
Something like
declare columns cursor local read_only for
    select
        c.TABLE_CATALOG,
        c.TABLE_SCHEMA,
        c.TABLE_NAME,
        c.COLUMN_NAME
    from INFORMATION_SCHEMA.COLUMNS c
    inner join INFORMATION_SCHEMA.TABLES t
        on c.TABLE_CATALOG = t.TABLE_CATALOG
        and c.TABLE_SCHEMA = t.TABLE_SCHEMA
        and c.TABLE_NAME = t.TABLE_NAME
    where c.DATA_TYPE like '%varchar%'

open columns

declare @catalog varchar(100), @schema varchar(100), @table varchar(100), @column varchar(100)

fetch next from columns into @catalog, @schema, @table, @column
while @@FETCH_STATUS = 0
begin
    -- construct update here and execute it.
    select @catalog, @schema, @table, @column
    fetch next from columns into @catalog, @schema, @table, @column
end

close columns
deallocate columns
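For the "construct update here" line, a minimal sketch in the spirit of the question (assuming the same literal 'NULL' match; @sql would be declared as nvarchar(max) alongside the other variables):
-- build and run the update for the current column
set @sql = N'update ' + quotename(@catalog) + N'.' + quotename(@schema) + N'.' + quotename(@table)
         + N' set ' + quotename(@column) + N' = NULL where ' + quotename(@column) + N' = ''NULL'''
exec sp_executesql @sql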
You might also consider applying all the updates to the table in one hit, removing the filter and using NULLIF, depending on the density of the bad data.
eg:
update table
set
col1 = nullif(col1, 'null'),
col2 = nullif(col2, 'null'),
...
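If writing out 250+ NULLIF assignments by hand is unappealing, the statement can also be generated from the catalog - a rough sketch (one table at a time, printed for review rather than executed):
declare @tablename sysname, @set nvarchar(max)
set @tablename = N'Table_Name'   -- placeholder

-- build one 'col = nullif(col, ''NULL'')' line per character column
select @set = isnull(@set + ',' + char(13) + char(10), '')
            + '    ' + quotename(COLUMN_NAME) + ' = nullif(' + quotename(COLUMN_NAME) + ', ''NULL'')'
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = @tablename
  and DATA_TYPE like '%varchar%'

print 'update ' + quotename(@tablename) + char(13) + char(10) + 'set' + char(13) + char(10) + @set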

SSIS won't be the best option for you. Conceptually, you are performing updates - lots of updates. SSIS can do really fast inserts; updates are fired off on a row-by-agonizing-row basis.
In a SQL-based approach, you'd be firing off on the order of 1000 update statements (one per column) to fix everything. In an SSIS-based scenario, using a data flow with an OLE DB Command, you're looking at 1000 * 1000000 (one per column per row).
I would skip the cursor myself. This is an acceptable time to use one, but if your tables are as littered with 'NULL' as it sounds, just assume you're updating every row and fix all the fields in a given record instead of coming back to the same row for each column that needs fixing.

Related

Return multiple columns as single comma separated row in SQL Server 2005

I'm curious to see if this is possible.
I have a table - though this could apply to any old table with data. A simple SELECT will return the columns and rows as a result set. What I'm trying to find out is whether it's possible to return the rows but, rather than separate columns, have the columns concatenated and comma separated - so the expected number of rows is returned, but with only one varchar column holding the comma-separated results of all the columns, just like a CSV file.
Thanks.
[UPDATE]
Here is a bit more detail on why I'm asking. I don't have the option to do this on the client; this is a task I'm trying to do with SSIS.
Scenario: I have a table that is dynamically created in SSIS, but the column names change each time it's built. The original package uses BCP to grab the data and put it into a flat file, but due to permissions when run as a job, BCP can't create the flat file at the required destination. We can't get this changed, either.
The other issue is that with SSIS 2005, using the flat file destination, you have to map the column names from the input source, which I can't do because the column names keep changing.
I've written a script task to grab all the data from the original tables and then use a StreamWriter to write to the CSV, but I have to loop through each row and then through each column to produce the string built up of all the columns. I want to measure the performance of this concatenation of columns on SQL Server against a nasty loop in VB.NET.
If I can get SQL to produce a single column for each row, I can just write a single line to the text file instead of iterating through each column to build the row.
I think you should try this:
SELECT UserName +','+ Password AS ColumnZ
FROM UserTable
Assuming you know what columns the table has, and you don't want to do something dynamic and crazy, you can do this (note that CONCAT requires SQL Server 2012 or later; on 2005, use the + operator as in the other answers):
SELECT CONCAT(ColumnA, ',', ColumnB) AS ColumnZ
FROM Table
There is a fancy way to do this using SQL Server's XML functions, but for starters, could you just cast the contents of the columns you care about as varchar and concatenate them with commas?
SELECT cast(colA as varchar)+', '+cast(colB as varchar)+', '+cast(colC as varchar)
FROM table
Note that this will get tripped up if any of the contents have a comma or double quotes in them, in which case you can also use a REPLACE function on each cast to escape them.
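For the record, a hedged sketch of the XML-based variant mentioned above (it runs on 2005, and the test table from the next answer will do for trying it out). Each row is turned into an XML fragment, and the element values are stitched back together with commas. Caveats: NULL columns are omitted from the fragment, so rows containing NULLs lose alignment, and special characters come back XML-entitized:
select stuff(
         (select ',' + n.c.value('text()[1]', 'nvarchar(max)')
          from x.doc.nodes('/*') n(c)     -- one node per column of this row
          for xml path('')),              -- glue the comma-separated pieces together
       1, 1, '') as csvrow                -- strip the leading comma
from test t
cross apply (select (select t.* for xml path(''), type) as doc) x  -- row -> xml fragment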
This could stand to be cleaned up some, but you can do this by using the metadata stored in sys.objects and sys.columns along with dynamic SQL. Note that I am NOT a fan of dynamic SQL, but for reporting purposes it shouldn't be too much of a problem.
Some SQL to create test data:
if (object_id('test') is not null)
    drop table test;

create table test
(
    id uniqueidentifier not null default newId()
    ,col0 nvarchar(255)
    ,col1 nvarchar(255)
    ,col2 nvarchar(255)
    ,col3 nvarchar(255)
    ,col4 nvarchar(255)
);

insert into test (col0, col1, col2, col3, col4)
select 'alice', 'bob', 'charlie', 'dave', 'emily'
union
select 'abby', 'bill', 'charlotte', 'daniel', 'evan'
A stored proc to build CSV rows:
-- emit the contents of a table as a CSV.
-- @table_name: name of a permanent (in sys.objects) table
-- @debug: set to 1 to print the generated query
create procedure emit_csv(@table_name nvarchar(max), @debug bit = 0)
as
    declare @object_id int;
    set nocount on;
    set @object_id = object_id(@table_name);

    declare @name nvarchar(max);
    declare db_cursor cursor for
        select name
        from sys.columns
        where object_id = @object_id;
    open db_cursor;
    fetch next from db_cursor into @name

    declare @query nvarchar(max);
    set @query = '';
    while @@FETCH_STATUS = 0
    begin
        -- TODO: modify appended clause to escape commas in addition to trimming
        set @query = @query + 'rtrim(cast(' + @name + ' as nvarchar(max)))'
        fetch next from db_cursor into @name;
        -- add concatenation to the end of the query.
        -- TODO: Rearrange @query construction order to make this unnecessary
        if (@@fetch_status = 0)
            set @query = @query + ' + '','' +'
    end;
    close db_cursor;
    deallocate db_cursor;

    set @query = 'select rtrim(' + @query + ') as csvrow from ' + @table_name;

    if @debug != 0
    begin
        declare @newline nvarchar(2);
        set @newline = char(13) + char(10)
        print 'Generated SQL:' + @newline + @query + @newline + @newline;
    end

    exec (@query);
For my test table, this generates the query:
select
rtrim(rtrim(cast(id as nvarchar(max)))
+ ','
+rtrim(cast(col0 as nvarchar(max)))
+ ','
+rtrim(cast(col1 as nvarchar(max)))
+ ','
+rtrim(cast(col2 as nvarchar(max)))
+ ','
+rtrim(cast(col3 as nvarchar(max)))
+ ','
+rtrim(cast(col4 as nvarchar(max))))
as csvrow
from test
and the result set:
csvrow
-------------------------------------------------------------------------------------------
EEE16C3A-036E-4524-A8B8-7CCD2E575519,alice,bob,charlie,dave,emily
F1EE6C84-D6D9-4621-97E6-AA8716C0643B,abby,bill,charlotte,daniel,evan
Suggestions
Modify the cursor loop to escape commas (see the sketch after this list)
Make sure that @table_name refers to a valid table in the sproc (e.g. check whether object_id(@table_name) is null)
Some exception handling would be good
Set permissions on this so that only the account that runs the report can execute it. String concatenation in dynamic SQL can be a big security hole, but I don't see another way to do this.
Some error handling to ensure that the cursor gets closed and deallocated might be nice.
This can be used for any table that is not a #temp table. In that case, you'd have to use sys.objects and sys.columns from tempdb...
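On the first suggestion, one possibility (a sketch, not tested against every edge case) is to swap the appended clause for CSV-style quoting - wrap each value in double quotes and double any embedded quotes:
-- inside the cursor loop, in place of the plain rtrim(...) clause:
set @query = @query
    + '''"'' + replace(rtrim(cast(' + @name + ' as nvarchar(max))), ''"'', ''""'') + ''"'''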
select STUFF((select ',' + convert(varchar, L.Subject)
              from tbl_Student B, tbl_StudentMarks L
              where B.Id = L.Id
              for xml path('')), 1, 1, '') as Subject
from tbl_Student A
where A.Id = 10

Running the same SQL code against a number of tables sequentially

I have a number of tables (around 40) containing snapshot data about 40 million plus vehicles. Each snapshot table is at a specific point in time (the end of the quarter) and is identical in terms of structure.
Whilst most of our analysis is against single snapshots, on occasion we need to run some analysis against all the snapshots at once. For instance, we may need to build a new table containing all the Ford Focus cars from every single snapshot.
To achieve this we currently have two options:
a) write a long, long, long batch file repeating the same code over and over again, just changing the FROM clause
[drawbacks - it takes a long time to write, and changing a single line of code in one of the blocks requires fiddly changes in all the other blocks]
b) use a view to union all the tables together and query that instead
[drawbacks - our tables are stored in separate database instances and cannot be indexed, plus the resulting view is something like 600 million records long by 125 columns wide, so is incredibly slow]
So, what I would like to find out is whether I can either use dynamic sql or put the SQL into a loop to spool through all tables. This would be something like:
for each *table* in TableList
    INSERT INTO output_table
    SELECT *table* as OriginTableName, Make, Model
    FROM *table*
next *table* in TableList
Is this possible? This would mean that updating the original SQL when our client changes what they need (a very regular occurrence!) would be very simple and we would benefit from all the indexes we already have on the original tables.
Any pointers, suggestions or help will be much appreciated.
If you can identify your tables (e.g. a naming pattern), you could simply say:
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = N'';

SELECT @sql = @sql + 'INSERT output_table SELECT ''' + name + ''', Make, Model
    FROM dbo.' + QUOTENAME(name) + ';'
FROM sys.tables
WHERE name LIKE 'pattern%';
-- or WHERE name IN ('t1', 't2', ... , 't40');

EXEC sp_executesql @sql;
This assumes they're all in the dbo schema. If they're not, the adjustment is easy... just replace dbo with ' + QUOTENAME(SCHEMA_NAME([schema_id])) + '...
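Spelled out, the schema-aware version of that line would look something like this (just the substitution described above):
SELECT @sql = @sql + 'INSERT output_table SELECT ''' + name + ''', Make, Model
    FROM ' + QUOTENAME(SCHEMA_NAME([schema_id])) + '.' + QUOTENAME(name) + ';'
FROM sys.tables
WHERE name LIKE 'pattern%';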
In the end I used two methods:
Someone on another forum suggested making use of sp_msforeachtable and a table which contains all the table names. Their suggestion was:
create table dbo.OutputTable (OriginTableName nvarchar(500), RecordCount INT)
create table dbo.TableList (Name nvarchar (500))
insert dbo.TableList
select '[dbo].[swap]'
union select '[dbo].[products]'
union select '[dbo].[structures]'
union select '[dbo].[stagingdata]'
exec sp_msforeachtable @command1 = 'INSERT INTO dbo.OutputTable SELECT ''?'', COUNT(*) from ?'
    ,@whereand = 'and syso.object_id in (select object_id(Name) from dbo.TableList)'
select * from dbo.OutputTable
This works perfectly well for some queries, but seems to suffer from the fact that one cannot use a GROUP BY clause within the query (or, at least, I could not find a way to do this).
The final solution I used was to use Dynamic SQL with a lookup table containing the table names. In a very simple form, this looks like:
DECLARE @TableName varchar(500)
DECLARE @curTable CURSOR
DECLARE @sql NVARCHAR(1000)

SET @curTable = CURSOR FOR
    SELECT [Name] FROM Vehicles_LookupTables.dbo.AllStockTableList

OPEN @curTable
FETCH NEXT FROM @curTable INTO @TableName

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = 'SELECT ''' + @TableName + ''', Make, sum(1) as Total FROM ' + @TableName + ' GROUP BY Make'
    EXEC sp_executesql @sql

    FETCH NEXT FROM @curTable INTO @TableName
END

CLOSE @curTable
DEALLOCATE @curTable

SQL Query Replace All

I want to modify the following query:
UPDATE wp_posts
SET post_content =
REPLACE(post_content, 'http://oldlink.com', 'http://newlink.com');
To be something that goes through all tables, columns and values. Something similar to this (but this doesn't work):
UPDATE * SET *= REPLACE(*, 'http://oldlink.com', 'http://newlink.com');
I want to replace every instance of my old link to my new link in my database. Is there any way to do this?
UPDATE
Sorry, I forgot to mention, it's MySQL. I'm currently going through all the answers and I'll be back and let you know what worked. Thanks everyone!
UPDATE 2
Hi guys (and girls), I decided to just dump the database and do manual search and replace (with TextWrangler). As this isn't (currently) a large DB, it's probably the easiest way.
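For reference, had I needed to do it inside MySQL rather than via a dump, one hedged approach (untested here) would be to generate one UPDATE ... REPLACE() statement per character column from information_schema, review the output, and then run it as a script:
-- generate UPDATE statements for every char/text column in the current database
SELECT CONCAT('UPDATE `', table_name, '` SET `', column_name, '` = REPLACE(`',
              column_name, '`, ''http://oldlink.com'', ''http://newlink.com'');') AS stmt
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND data_type IN ('char', 'varchar', 'text', 'mediumtext', 'longtext');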
Here is one that works on SQL Server - does something like this work for you?
Search and Replace SQL Server data in all columns, of all tables
Here's a stored procedure named SearchAndReplace that searches through all the character columns of all tables in the current database, and replaces the given string with another user-provided string.
This MSSQL script will whine a bit if there is a computed column in a table, but it will still execute:
DECLARE @searchvalue varchar(100)
DECLARE @newvalue varchar(100)

SET NOCOUNT OFF
SET @searchvalue = 'http://oldlink.com'
SET @newvalue = 'http://newlink.com'

SELECT * INTO #t FROM
(
    SELECT 'update [' + a.TABLE_NAME + '] set [' + a.COLUMN_NAME + ']=''' + @newvalue + '''
        where [' + a.COLUMN_NAME + ']=''' + @searchvalue + '''' AS sqlstring
    FROM INFORMATION_SCHEMA.COLUMNS a
    JOIN INFORMATION_SCHEMA.TABLES b
        ON a.TABLE_NAME = b.TABLE_NAME
        AND b.TABLE_TYPE = 'base table'
    WHERE a.DATA_TYPE IN ('varchar', 'char', 'nvarchar')
        AND a.CHARACTER_MAXIMUM_LENGTH >= LEN(@newvalue)
) a

DECLARE @sqlstring AS nvarchar(500)
DECLARE SqlCursor CURSOR FAST_FORWARD FOR
    SELECT sqlstring FROM #t

OPEN SqlCursor
FETCH NEXT FROM SqlCursor INTO @sqlstring

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC(@sqlstring)
    FETCH NEXT FROM SqlCursor INTO @sqlstring
END

CLOSE SqlCursor
DEALLOCATE SqlCursor

DROP TABLE #t
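One caveat: the generated statements above only fix values that exactly equal the search string. To replace the link where it is embedded in longer text (as in the original wp_posts example), a hedged variant of the inner SELECT uses REPLACE with a LIKE filter (bearing in mind that a lengthening replacement can be truncated by the column's maximum length):
SELECT 'update [' + a.TABLE_NAME + '] set [' + a.COLUMN_NAME + '] = replace([' + a.COLUMN_NAME +
       '], ''' + @searchvalue + ''', ''' + @newvalue + ''')
    where [' + a.COLUMN_NAME + '] like ''%' + @searchvalue + '%''' AS sqlstring
FROM INFORMATION_SCHEMA.COLUMNS a
JOIN INFORMATION_SCHEMA.TABLES b
    ON a.TABLE_NAME = b.TABLE_NAME
    AND b.TABLE_TYPE = 'base table'
WHERE a.DATA_TYPE IN ('varchar', 'char', 'nvarchar')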
You didn't mention the database. I currently use Sybase ASA, where you cannot do this literally, but it can be done by checking the column names from a join of systable and syscolumn and then using EXECUTE IMMEDIATE.

In SQL Server 2008, how should I copy data from one database to another?

I'm trying to write a stored procedure to copy a subset of data from one set of tables to an identical set of tables in a different database. The "source" database needs to be a parameter to the stored procedure.
I've struggled with this for two days now, and I thought I had a good solution:
Validate that the schemas are the same.
Create temporary "rmt" synonyms for the source tables using dynamic SQL.
Copy the data using INSERT INTO A SELECT * FROM rmtA WHERE <criteria>
Delete the synonyms.
This works pretty well for most tables, but for tables that contain an identity column, I'm forced not only to SET IDENTITY_INSERT ON & OFF, but even worse, I can't use SELECT *; I have to specify all the columns explicitly. This will be a nightmare if I add or delete columns later.
I've gotta get something out the door, so I'm going with this solution for now, but I'd like to think that there's a better solution out there somewhere.
Help?
It sounds like you're using dynamic SQL in your stored procedure, so you're ready to dynamically create your list of columns in the SELECT clause.
You can select from sys.columns to get the list of columns, and learn if the table has an identity column. Here's a query that shows the information you need to create the list of columns.
SELECT c.name, is_identity
FROM sys.columns c
WHERE object_id = object_id('MyTable')
In short, if is_identity is 1 for at least one column, you'll need to include the SET IDENTITY_INSERT. And, you would exclude any columns from the SELECT clause where is_identity = 1.
And, this approach will adapt to new columns you add to the tables.
Here's an example
DECLARE @TableName varchar(128) = 'MyTableName'
DECLARE @ColumnName varchar(128)
DECLARE @IsIdentity bit
DECLARE @TableHasIdentity bit = 0
DECLARE @sql varchar(2000) = 'SELECT '

-- create cursor to step through list of columns
DECLARE MyCurs CURSOR FOR
    SELECT c.name, is_identity
    FROM sys.columns c
    WHERE object_id = object_id(@TableName)
    ORDER BY column_id

-- open cursor and get first row
OPEN MyCurs
FETCH NEXT FROM MyCurs INTO @ColumnName, @IsIdentity

-- process each column in the table
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @IsIdentity = 0
        -- add column to the SELECT clause
        SET @sql = @sql + @ColumnName + ', '
    ELSE
        -- indicate that table has identity column
        SET @TableHasIdentity = 1

    -- get next column
    FETCH NEXT FROM MyCurs INTO @ColumnName, @IsIdentity
END

-- cursor cleanup
CLOSE MyCurs
DEALLOCATE MyCurs

-- add FROM clause
SET @sql = LEFT(@sql, LEN(@sql)-1) + CHAR(13) + CHAR(10) + 'FROM ' + @TableName

-- add SET IDENTITY_INSERT if necessary
IF @TableHasIdentity = 1
    SET @sql = 'SET IDENTITY_INSERT ' + @TableName + ' ON' + CHAR(13) + CHAR(10)
             + @sql + CHAR(13) + CHAR(10)
             + 'SET IDENTITY_INSERT ' + @TableName + ' OFF'

PRINT @sql
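To turn the printed statement into the actual cross-database copy from the question, one hedged extension is to accumulate the column list into its own variable inside the loop (say @columnList - a name not in the original) and emit an INSERT that names the columns on both sides, using the question's 'rmt' synonym convention:
-- assumes @columnList was built in the loop as 'Col1, Col2, ...' (identity columns
-- included only when the SET IDENTITY_INSERT wrapper is also emitted)
SET @sql = 'INSERT INTO ' + @TableName + ' (' + @columnList + ')' + CHAR(13) + CHAR(10)
         + 'SELECT ' + @columnList + ' FROM rmt' + @TableName + ' WHERE <criteria>'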

Help with TSQL - a way to get the value in the Nth column of a row?

I hope to find a way to get the value in the Nth column of a dataset.
Thus, for N = 6 I want
SELECT (Column6Value) from MyTable where MyTable.RowID = 14
Is there a way to do this in TSQL as implemented in SQL Server 2005? Thanks.
You should be able to join with the system catalog (INFORMATION_SCHEMA.COLUMNS) to get the column number.
This works:
create table test (a int, b int, c int)
insert test values (1, 2, 3)

declare @column_number int
set @column_number = 2

declare @query varchar(8000)
select @query = COLUMN_NAME
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'test' and ORDINAL_POSITION = @column_number

set @query = 'select ' + @query + ' from test'
exec(@query)
But why you would ever do something like this is beyond me - what problem are you trying to solve?
Not sure if you're at liberty to redesign the table, but if the ordinal position of the column is significant, your data is not normalized and you're going to have to jump through lots of hoops for many common tasks.
Instead of having table MyTable with Column1... ColumnN you'd have a child table of those values you formerly stored in Column1...ColumnN each in their own row.
For those times when you really need those values in a single row, you could then do a PIVOT: Link
Edit: My suggestion is somewhat moot. Ash clarified that it's "de-normalization by design, it's a pivot model where each row can contain one of any four data types." Yeah, that kind of design can be cumbersome when you normalize it.
If you know the range of N, you could use a CASE statement:
Select Case when @n = 1 then Column1Value
            when @n = 2 then Column2Value
       end
from MyTable
As far as I know, there is no dynamic way to replace a column (or table) in a select statement without resorting to dynamic SQL (in which case you should probably refactor anyway).
Implementation of @Mike Sharek's answer.
Declare @columnName varchar(255), @tablename varchar(255), @columnNumber int, @SQL nvarchar(4000)

Set @tablename = 'MyTable'
Set @columnNumber = 6

Select @columnName = COLUMN_NAME
from INFORMATION_SCHEMA.COLUMNS
where ORDINAL_POSITION = @columnNumber and TABLE_NAME = @tablename

Set @SQL = 'select ' + @columnName + ' from ' + @tablename + ' where RowID=14'
Exec sp_executesql @SQL
I agree with Sambo - why are you trying to do this? If you are calling the code from C# or VB, it's much easier to grab the 6th column from a resultset.