Running the same SQL code against a number of tables sequentially - sql

I have a number of tables (around 40) containing snapshot data about 40 million plus vehicles. Each snapshot table is at a specific point in time (the end of the quarter) and is identical in terms of structure.
Whilst most of our analysis is against single snapshots, on occasion we need to run some analysis against all the snapshots at once. For instance, we may need to build a new table containing all the Ford Focus cars from every single snapshot.
To achieve this we currently have two options:
a) write a long, long, long batch file repeating the same code over and over again, just changing the FROM clause
[drawbacks - it takes a long time to write, and changing a single line of code in one of the blocks requires fiddly changes in all the other blocks]
b) use a view to union all the tables together and query that instead
[drawbacks - our tables are stored in separate database instances and cannot be indexed, plus the resulting view is something like 600 million records long by 125 columns wide, so is incredibly slow]
So, what I would like to find out is whether I can either use dynamic sql or put the SQL into a loop to spool through all tables. This would be something like:
for each *table* in TableList
INSERT INTO output_table
SELECT *table* as OriginTableName, Make, Model
FROM *table*
next *table* in TableList
Is this possible? This would mean that updating the original SQL when our client changes what they need (a very regular occurrence!) would be very simple and we would benefit from all the indexes we already have on the original tables.
Any pointers, suggestions or help will be much appreciated.

If you can identify your tables (e.g. a naming pattern), you could simply say:
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = N'';
SELECT @sql = @sql + N'INSERT output_table SELECT ''' + name + ''', Make, Model
FROM dbo.' + QUOTENAME(name) + ';'
FROM sys.tables
WHERE name LIKE 'pattern%';
-- or WHERE name IN ('t1', 't2', ... , 't40');
EXEC sp_executesql @sql;
This assumes they're all in the dbo schema. If they're not, the adjustment is easy... just replace dbo with ' + QUOTENAME(SCHEMA_NAME([schema_id])) + '...
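The schema adjustment described above might look like the following sketch. It assumes that output_table lives in dbo and has three columns matching the SELECT list, and that 'pattern%' stands in for whatever naming pattern identifies the snapshot tables; both are placeholders, not details from the question.

```sql
-- Sketch only: builds one INSERT per matching table, whatever its schema.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql
    + N'INSERT dbo.output_table SELECT '
    + QUOTENAME(name, '''')                       -- table name as an escaped string literal
    + N', Make, Model FROM '
    + QUOTENAME(SCHEMA_NAME([schema_id])) + N'.' + QUOTENAME(name)
    + N';' + NCHAR(10)
FROM sys.tables
WHERE name LIKE N'pattern%';                      -- placeholder pattern

PRINT @sql;                                       -- inspect the generated batch first
-- EXEC sys.sp_executesql @sql;                   -- uncomment to run
```

PRINT shows at most 4,000 characters of an NVARCHAR value, so with 40 tables the printed batch may look cut off even though the variable holds the full string.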

In the end I used two methods:
Someone on another forum suggested making use of sp_msforeachtable and a table which contains all the table names. Their suggestion was:
create table dbo.OutputTable (OriginTableName nvarchar(500), RecordCount INT)
create table dbo.TableList (Name nvarchar (500))
insert dbo.TableList
select '[dbo].[swap]'
union select '[dbo].[products]'
union select '[dbo].[structures]'
union select '[dbo].[stagingdata]'
exec sp_msforeachtable @command1 = 'INSERT INTO dbo.OutputTable SELECT ''?'', COUNT(*) from ?'
,@whereand = 'and syso.object_id in (select object_id(Name) from dbo.TableList)'
select * from dbo.OutputTable
This works perfectly well for some queries, but seems to suffer from the fact that one cannot use a GROUP BY clause within the query (or, at least, I could not find a way to do this).
The final solution I used was to use Dynamic SQL with a lookup table containing the table names. In a very simple form, this looks like:
DECLARE @TableName varchar(500)
DECLARE @curTable CURSOR
DECLARE @sql NVARCHAR(1000)
SET @curTable = CURSOR FOR
SELECT [Name] FROM Vehicles_LookupTables.dbo.AllStockTableList
OPEN @curTable
FETCH NEXT
FROM @curTable INTO @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = 'SELECT ''' + @TableName + ''', Make, sum(1) as Total FROM ' + @TableName + ' GROUP BY Make'
EXEC sp_executesql @sql
FETCH NEXT
FROM @curTable INTO @TableName
END
CLOSE @curTable
DEALLOCATE @curTable
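The loop above returns one result set per table. A possible variation, sketched below, collects everything into a single temp table by putting the INSERT inside the dynamic statement (a temp table created by the caller is visible to the sp_executesql scope). The #MakeTotals table and its column types are assumptions, and the names in the lookup table are assumed to be trusted, already-qualified table names, since they are concatenated as-is.

```sql
-- Sketch only: #MakeTotals and its column types are hypothetical.
CREATE TABLE #MakeTotals (OriginTableName NVARCHAR(500), Make NVARCHAR(100), Total INT);

DECLARE @TableName NVARCHAR(500), @sql NVARCHAR(MAX);

DECLARE curTable CURSOR LOCAL FAST_FORWARD FOR
    SELECT [Name] FROM Vehicles_LookupTables.dbo.AllStockTableList;

OPEN curTable;
FETCH NEXT FROM curTable INTO @TableName;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- The table name appears twice: once as an escaped literal, once as the FROM target.
    SET @sql = N'INSERT INTO #MakeTotals (OriginTableName, Make, Total) '
             + N'SELECT N''' + REPLACE(@TableName, N'''', N'''''') + N''', Make, COUNT(*) '
             + N'FROM ' + @TableName + N' GROUP BY Make;';
    EXEC sys.sp_executesql @sql;

    FETCH NEXT FROM curTable INTO @TableName;
END

CLOSE curTable;
DEALLOCATE curTable;

SELECT OriginTableName, Make, Total FROM #MakeTotals ORDER BY OriginTableName, Make;
DROP TABLE #MakeTotals;
```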

Related

How to use a variable in "Select [some calculations] insert into @NameOfTheTableInThisVariable"?

I have a procedure in which calculations are done and the final result is inserted into a permanent table. I want to remove the permanent table, and I cannot use a temp table either. So I want to use a dynamic table name, which is stored in a variable:
Current scenario:
Insert into xyz_table
Select col1,col2,sum(col3)
from BaseTable
(In reality, there are lot of columns and a lot of calculations)
What I want:
Select col1,col2,sum(col3) into @DynamicTableName
from BaseTable
where the name of the table would be dynamic in nature i.e.,
@DynamicTableName = 'xyz ' + cast(convert(date,getdate()) as nvarchar)+' '+convert(nvarchar(5),getdate(),108)
It will have date and time in its name every time the procedure is run.
I want to use this name in the "Select * into statement"
How can I achieve this?
I tried it with some short code, shown below. But since my procedure has a lot of calculations and UNIONs, I cannot use that code for this. Any help would be appreciated.
declare @tablename nvarchar(30)= 'xyz ' + cast(convert(date,getdate()) as nvarchar)+' '+convert(nvarchar(5),getdate(),108)
declare @SQL_Statement nvarchar(100)
declare @SQL_Statement2 nvarchar(100)
declare @dropstatement nvarchar(100)
SET @SQL_Statement = N'SELECT * Into ' +'['+@tablename +'] '+'FROM '+ 'dimBranch'
print @SQL_Statement
EXECUTE sp_executesql @SQL_Statement
SET @SQL_Statement= N'select * from ' + '['+@tablename + '] '
print @SQL_Statement
EXECUTE sp_executesql @SQL_Statement
set @dropstatement = 'DROP TABLE ' + '['+@tablename + '] '
PRINT @dropstatement
exec sp_executesql @dropstatement
Reason why I want this is because I use this procedure in ETL job as well as in SSRS report. And if someone runs the package and the SSRS report at the same time, the incorrect or weird data gets stored in the table. Therefore I need a dynamic name of the table with date and time.
You can't parameterize an identifier in SQL, only a value
--yes
select * from table where column = @value
--no
select * from @tablename where @columnname = @value
The only thing you can do to make these things dynamic is to build an SQL string and execute it dynamically, but your code is already doing this with sp_executesql.
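As a rough illustration of that split, the sketch below concatenates the identifier (the table name) into the statement through QUOTENAME, while a value is passed as a real parameter. BaseTable and col1 to col3 come from the question; the @minimum filter and the dbo schema are made up for the example.

```sql
-- Sketch only: @minimum is a hypothetical value parameter.
DECLARE @tablename SYSNAME =
    N'xyz ' + CONVERT(NCHAR(10), GETDATE(), 120)   -- yyyy-mm-dd
    + N' ' + CONVERT(NCHAR(5), GETDATE(), 108);    -- hh:mi
DECLARE @minimum INT = 10;

-- Identifiers must be concatenated (safely quoted); values can stay parameters.
DECLARE @sql NVARCHAR(MAX) =
      N'SELECT col1, col2, SUM(col3) AS total '
    + N'INTO dbo.' + QUOTENAME(@tablename) + N' '
    + N'FROM dbo.BaseTable '
    + N'WHERE col3 >= @minimum '
    + N'GROUP BY col1, col2;';

PRINT @sql;
EXEC sys.sp_executesql @sql, N'@minimum INT', @minimum = @minimum;
```

The aggregate needs a column alias here because SELECT ... INTO cannot create a column without a name.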
More telling is your complaint at the bottom of your question, that if the procedure is invoked simultaneously it gives problems. Perhaps you should consider using local table variables for temporary data storage that the report is using rather than pushing data back into the db
DECLARE @temp TABLE (id INT, name varchar(100));
INSERT INTO @temp SELECT personid, firstname FROM person;
-- work with temp data
select count(*) from @temp;
--when @temp goes out of scope it is lost,
--no other procedure invoked simultaneously can access this procedure's @temp
Consider a local temp table, which is automatically session scoped without the need for dynamic SQL. For example:
SELECT *
INTO #YourTempTable
FROM dimBranch;
The local temp table will automatically be dropped when the proc completes, so there is no need for an explicit drop in the proc code.

How to find, across multiple databases, a specific table (common to most/all) that is not empty

I am working in an environment where many users have the same (or almost identical) test database set up on a common MSSQL server. We are talking about well over 100 databases for testing purposes. And at the very least, 95+% of them will contain the table I am trying to target.
These test databases are only filled with junk data - I will not be impacting anyone by doing any kind of a search. I am looking at one table, specifically, and I need to determine if any test database has that table actually containing any data at all. It doesn’t matter what the data is, I just need to find a table actually containing any data, so I can determine why that data exists in the first place. (This DB is quite old - almost two decades, so sometimes no-one has a clear answer why something in it exists).
I have been trying to build an SQL statement that iterates through all the databases, and checks that particular table specifically to see if it has any content, to bring back a list of databases that have that table containing data.
So to be specific: I need to find all databases where a specific table has any content at all (COUNT(*) > 0). Right now totally stuck with not much of any clues as to how to proceed.
In both methods, replace <tablename> with the table name.
Using sp_MSforeachdb
You can use sp_MSforeachdb:
CREATE TABLE ##TBLTEMP(dbname varchar(100), rowscount int)
DECLARE @command varchar(4000)
SELECT @command =
'if exists(select 1 from [?].INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME =''<tablename>'') insert into ##TBLTEMP(dbname,rowscount) select ''[?]'',count(*) from [?].dbo.<tablename>'
EXEC sp_MSforeachdb @command
SELECT * FROM ##TBLTEMP WHERE rowscount > 0
DROP TABLE ##TBLTEMP
Using CURSOR
CREATE TABLE ##TBLTEMP(dbname varchar(100), rowscount int)
DECLARE @dbname Varchar(100), @strQuery varchar(4000)
DECLARE csr CURSOR FOR SELECT [name] FROM sys.databases
OPEN csr -- the cursor must be opened before the first FETCH
FETCH NEXT FROM csr INTO @dbname
WHILE @@FETCH_STATUS = 0
BEGIN
SET @strQuery = 'if exists(select 1 from [' + @dbname +'].INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME =''<tablename>'') INSERT INTO ##TBLTEMP(dbname,rowscount) SELECT ''' + @dbname + ''', COUNT(*) FROM [' + @dbname + '].[dbo].<tablename>'
EXEC(@strQuery)
FETCH NEXT FROM csr INTO @dbname
END
CLOSE csr
DEALLOCATE csr
SELECT * FROM ##TBLTEMP where rowscount > 0
DROP TABLE ##TBLTEMP
References
Sp MSforeachDB
Run same command on all SQL Server databases without cursors
DECLARE CURSOR (Transact-SQL)
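Since the question only asks whether the table has any rows at all, a possible variation is to test with EXISTS instead of counting every row. The sketch below builds one batch from sys.databases; the table name YourTable, the dbo schema, and the #NonEmpty temp table are assumptions for illustration. The existence check and the row check are deliberately separate nested IF statements, so the statement that reads the table is only reached in databases where the table exists.

```sql
-- Sketch only: N'YourTable' is a hypothetical table name.
DECLARE @table SYSNAME = N'YourTable';
DECLARE @sql NVARCHAR(MAX) = N'';

CREATE TABLE #NonEmpty (dbname SYSNAME NOT NULL);

SELECT @sql = @sql
    + N'IF OBJECT_ID(N'''
    + REPLACE(QUOTENAME(name) + N'.dbo.' + QUOTENAME(@table), N'''', N'''''')
    + N''', N''U'') IS NOT NULL' + NCHAR(10)
    + N'    IF EXISTS (SELECT 1 FROM ' + QUOTENAME(name) + N'.dbo.' + QUOTENAME(@table) + N')' + NCHAR(10)
    + N'        INSERT INTO #NonEmpty (dbname) VALUES (N'''
    + REPLACE(name, N'''', N'''''') + N''');' + NCHAR(10)
FROM sys.databases
WHERE state_desc = N'ONLINE'
  AND HAS_DBACCESS(name) = 1;       -- skip databases the caller cannot open

EXEC sys.sp_executesql @sql;

SELECT dbname FROM #NonEmpty ORDER BY dbname;
DROP TABLE #NonEmpty;
```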

How to pick a table_name value from one table and delete records from the table_name table based on a condition?

We have a table. Lets call it Table_A.
Table_A holds bunch of table_names and numeric value associated to each table_name. Refer to the picture below
Can someone help me write a query to:
Select table_names from TABLE_A one by one; go to that table, Check the Date_inserted of each record against NO_OF_DAYS in Table_A and if the record is older than NO_OF_DAYS in Table_A, then DELETE THAT RECORD from that specific table.
I'm guessing we have to create dynamic values for this query but I'm having a hard time.
So, in the above picture, the query should:
Select the first table_name (T_Table1) from Table_A
Go to that Table (T_Table1)
Check the date inserted of each record in (T_Table1) against the condition
If the condition is met (IF the record was inserted prior to NO_OF_DAYS, which is 90 in this case, THEN delete the record; ELSE move to the next record)
Move on to the next table (T_Table2) in Table_A
Continue till all the table_names in Table_A have been executed
What you posted as your attempt (in a comment), quite simply isn't going to work. Let's actually format that first, shall we:
SET SQL = '
DELETE [' + dbo + '].[' + TABLE_NAME + ']
where [Date_inserted ] < '
SET SQL = SQL + ' convert(varchar, DATEADD(day, ' + CONVERT(VARCHAR, NO_OF_DAYS) + ',' + '''' + CONVERT(VARCHAR, GETDATE(), 102) + '''' + '))'
PRINT SQL
EXEC (SQL)
Firstly, I actually have no idea what you're even trying to do here. You have things like [' + dbo + '], which means that you're referencing the column dbo; as you're using a SET, then no column dbo can exist. Also, variables are prefixed with an @ in SQL Server; you have none.
Anyway, the solution. Some might not like this one, as I'm using a CURSOR, rather than doing it all in one go. I, however, do have my reasons. A CURSOR isn't actually a "bad" thing, like many believe; the problem is that people constantly use them incorrectly. Using a CURSOR to loop through records and create a hierarchy for example is a terrible idea; there are far better dataset approaches.
So, what are my reasons? Firstly, I can parametrise the dynamic SQL; this would be harder outside a CURSOR, as I'd need to declare a different parameter for every DELETE. Also, with a CURSOR, if the DELETE fails on one table, it won't on the others; one long piece of dynamic SQL would mean that if one of the transactions fails, they would all be rolled back. Also, depending on the size of the deletes, that could be a very big DELETE.
It's important, however, that you understand what I've done here; if you don't, that's a problem unto itself. What happens if you need to troubleshoot it in the future? SO isn't a website for support like that; you need to support your own code. If you can't understand the code you're given, don't use it, or learn what it's doing first (or you're doing the wrong thing).
Note I use my own objects, in the absence of consumable sample data:
CREATE TABLE TableOfTables (TableName sysname,
NoOfDays int);
GO
INSERT INTO TableOfTables
VALUES ('T1',10),
('T2',15),
('T3',5);
GO
DECLARE Deletes CURSOR FOR
SELECT TableName, NoOfDays
FROM TableOfTables;
DECLARE @SQL nvarchar(MAX), @TableName sysname, @Days int;
OPEN Deletes;
FETCH NEXT FROM Deletes
INTO @TableName, @Days;
WHILE @@FETCH_STATUS = 0 BEGIN
SET @SQL = N'DELETE FROM ' + QUOTENAME(@TableName) + NCHAR(10) +
N'WHERE DATEDIFF(DAY, InsertedDate, GETDATE()) >= @dDays;'
PRINT @SQL; --Say hello to your best friend. o/
--EXEC sp_executeSQL @SQL, N'@dDays int', @dDays = @Days; --Uncomment to run
FETCH NEXT FROM Deletes
INTO @TableName, @Days;
END
END
CLOSE Deletes;
DEALLOCATE Deletes;
GO
DROP TABLE TableOfTables;
GO
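To make the "one table failing doesn't stop the others" point explicit, the body of the loop could be wrapped in TRY...CATCH. The sketch below shows a single iteration with hard-coded values; the table T1, the dbo schema, and the InsertedDate column are the sample names used above, not the asker's real objects. It also swaps the DATEDIFF test for a direct comparison on the column, which can use an index on InsertedDate but compares against the exact cutoff time instead of counting day boundaries, so the two predicates are close but not identical.

```sql
-- Sketch only: one iteration of the loop, with hypothetical sample values.
DECLARE @TableName SYSNAME = N'T1',
        @Days      INT     = 10,
        @SQL       NVARCHAR(MAX);

SET @SQL = N'DELETE FROM dbo.' + QUOTENAME(@TableName) + NCHAR(10)
         + N'WHERE InsertedDate < DATEADD(DAY, -@dDays, GETDATE());';

BEGIN TRY
    EXEC sys.sp_executesql @SQL, N'@dDays int', @dDays = @Days;
END TRY
BEGIN CATCH
    -- Report the failure and let the caller carry on with the next table.
    PRINT N'Delete failed for ' + @TableName + N': ' + ERROR_MESSAGE();
END CATCH;
```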

SSIS Multiple Unknown Column Updates

I wonder if anyone has come across a similar situation before that could point me in the right direction..? I'll add that it's a bit frustrating as someone has replaced the NULL value with a text string containing the word 'NULL' - which I need to remove.
I have 6 quite large tables, over 250+ columns and in excess of 1 million records in each and I need to update the columns where the word NULL appears in a row and replace it with a proper NULL value - the problem is that I have no idea in which column this appears.
As a start, I've got some code that will list every column with a count of the values and anything that looks to have a lower count than expected, I'll run a SQL query to ascertain if the column contains the string 'NULL' and using the following code, replace it with NULL.
declare @tablename sysname
declare @ColName nvarchar(500)
declare @sql nvarchar(1000)
declare @sqlUpdate nvarchar(1000)
declare @ParmDefinition nvarchar(1000)
set @tablename = N'Table_Name'
Set @ColName = N'Column_Name'
set @ParmDefinition = N'@ColName nvarchar OUTPUT';
set @sql= 'Select ' + @ColName + ', Count(' + @ColName + ') from ' + @tablename + ' group by ' + @ColName + ''
Set @sqlUpdate = 'Update ' + @tablename + ' SET ' + @ColName + ' = NULL WHERE '+ @ColName + ' = ''NULL'''
print @sql
print @sqlUpdate
EXECUTE sp_executesql @sql, @ParmDefinition, @ColName=@ColName OUTPUT;
EXECUTE sp_executesql @sqlUpdate, @ParmDefinition, @ColName=@ColName OUTPUT;
What I'm trying to do with SSIS is to iterate through each column,
Select Column_Name from Table_Name where Column_Name = 'NULL'
run the appropriate query, and perform the update.
So far I can extract the column names from INFORMATION_SCHEMA and get a record count from the appropriate table, but when it comes to running the actual UPDATE statement (as above, @sqlUpdate) - there doesn't seem to be a component that's happy with the dynamic phrasing of the query.
I'm using a Conditional Split to determine where to go if there are records (which may be incorrect) and I've tried OLE DB Command for the update.
In short, I'm wondering whether SSIS is the best tool for this job or whether I'm looking in the wrong place!
I'm using SSIS 2005, which may well have limitations that I'm not yet aware of!
Any guidance would be appreciated.
Thanks,
Jon
The principle is basically sound, but I would leave SSIS out, and do it with SSMS directly against the SQL Server and build the looping logic there, probably with a cursor.
I'm not sure whether you need to check the count of potential values first - you might just as well apply the update and accept that sometimes it will update no rows - the filtering will then not be duplicated.
Something like
declare columns cursor local read_only for
select
c.TABLE_CATALOG,
c.TABLE_SCHEMA,
c.TABLE_NAME,
c.COLUMN_NAME
from INFORMATION_SCHEMA.COLUMNS c
inner join INFORMATION_SCHEMA.TABLES t
on c.TABLE_CATALOG = t.TABLE_CATALOG
and c.TABLE_SCHEMA = t.TABLE_SCHEMA
and c.TABLE_NAME = t.TABLE_NAME
where c.DATA_TYPE like '%varchar%'
open columns
declare @catalog varchar(100), @schema varchar(100), @table varchar(100), @column varchar(100)
fetch next from columns into @catalog, @schema, @table, @column
while @@FETCH_STATUS = 0
begin
-- construct update here and execute it.
select @catalog, @schema, @table, @column
fetch next from columns into @catalog, @schema, @table, @column
end
close columns
deallocate columns
You might also consider applying all the updates to the table in one hit, removing the filter and using nullif dependent on the density of the bad data.
eg:
update table
set
col1 = nullif(col1, 'null'),
col2 = nullif(col2, 'null'),
...
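Writing that statement by hand for 250+ columns is tedious, so the SET list could be generated from INFORMATION_SCHEMA.COLUMNS. The following is a sketch under assumptions: the schema is dbo, Table_Name is a placeholder, only nullable character columns are touched, and none of them are computed columns. It uses separate SET statements rather than inline initialisation so that it should also run on SQL Server 2005.

```sql
-- Sketch only: N'Table_Name' is a placeholder for one of the six tables.
DECLARE @schema SYSNAME, @table SYSNAME, @set NVARCHAR(MAX), @sql NVARCHAR(MAX);
SET @schema = N'dbo';
SET @table  = N'Table_Name';
SET @set    = N'';

-- One "col = NULLIF(col, 'NULL')" assignment per nullable character column.
SELECT @set = @set + N',' + NCHAR(10) + N'    '
            + QUOTENAME(COLUMN_NAME) + N' = NULLIF(' + QUOTENAME(COLUMN_NAME) + N', ''NULL'')'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @schema
  AND TABLE_NAME   = @table
  AND DATA_TYPE IN (N'char', N'nchar', N'varchar', N'nvarchar')
  AND IS_NULLABLE  = 'YES';

IF @set <> N''
BEGIN
    -- STUFF removes the leading comma and line feed from the first assignment.
    SET @sql = N'UPDATE ' + QUOTENAME(@schema) + N'.' + QUOTENAME(@table) + NCHAR(10)
             + N'SET ' + STUFF(@set, 1, 2, N'') + N';';
    PRINT @sql;                          -- PRINT truncates long strings; the variable is complete
    -- EXEC sys.sp_executesql @sql;      -- uncomment to run
END
```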
SSIS won't be the best option for you. Conceptually, you are performing updates, lots of updates. SSIS can do really fast inserts. Updates are fired off on a row-by-agonizing-row basis.
In a SQL based approach, you'd be firing off 1000 update statements to fix everything. In an SSIS based scenario, using a data flow with OLE DB Command, you're looking at 1000 * 1000000.
I would skip the cursor myself. It is an acceptable time to use a cursor, but if your tables are as littered with 'NULL' as it sounds, just assume you're updating every row and fix all the fields in a given record, instead of coming back to the same row for each thing that needs fixing.

Dynamically search columns for given table

I need to create a search for a Java app I'm building where users can search through a SQL database based on the table they're currently viewing and a search term they provide. At first I was going to do something simple like this:
SELECT * FROM <table name> WHERE CAST((SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '<table name>')
AS VARCHAR) LIKE '%<search term>%'
but that subquery returns more than one result, so then I tried to make a procedure to loop through all the columns in a given table and put any relevant fields in a results table, like this:
CREATE PROC sp_search
@tblname VARCHAR(4000),
@term VARCHAR(4000)
AS
SET nocount on
SELECT COLUMN_NAME
INTO #tempcolumns
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @tblname
ALTER TABLE #tempcolumns
ADD printed BIT,
num SMALLINT IDENTITY
UPDATE #tempcolumns
SET printed = 0
DECLARE @colname VARCHAR(4000),
@num SMALLINT
WHILE EXISTS(SELECT MIN(num) FROM #tempcolumns WHERE printed = 0)
BEGIN
SELECT @num = MIN(num)
FROM #tempcolumns
WHERE printed = 0
SELECT @colname = COLUMN_NAME
FROM #tempcolumns
WHERE num = @num
SELECT * INTO #results FROM @tblname WHERE CAST(@colname AS VARCHAR)
LIKE '%' + @term + '%' --this is where I'm having trouble
UPDATE #tempcolumns
SET printed = 1
WHERE @num = num
END
SELECT * FROM #results
GO
This has two problems: first, it gets stuck in an infinite loop somehow, and second, I can't select anything from @tblname. I tried using dynamic SQL as well, but I don't know how to get results from that or if that's even possible.
This is for an assignment I'm doing at college and I've gotten this far after hours of trying to figure it out. Is there any way to do what I want to do?
You need to only search columns that actually contain strings, not all columns in a table (which may include integers, dates, GUIDs, etc).
You shouldn't need a #temp table (and certainly not a ##temp table) at all.
You need to use dynamic SQL (though I'm not sure if this has been part of your curriculum so far).
I find it beneficial to follow a few simple conventions, all of which you've violated:
use PROCEDURE not PROC - it's not a "prock," it's a "stored procedure."
use dbo. (or alternate schema) prefix when referencing any object.
wrap your procedure body in BEGIN/END.
use vowels liberally. Are you saving that many keystrokes, never mind time, saying @tblname instead of @tablename or @table_name? I'm not fighting for a specific convention but saving characters at the cost of readability lost its charm in the 70s.
don't use the sp_ prefix for stored procedures - this prefix has special meaning in SQL Server. Name the procedure for what it does. It doesn't need a prefix, just like we know they're tables even without a tbl prefix. If you really need a prefix there, use another one like usp_ or proc_ but I personally don't feel that prefix gives you any information you don't already have.
since tables are stored using Unicode (and some of your columns might be too), your parameters should be NVARCHAR, not VARCHAR. And identifiers are capped at 128 characters, so there is no reason to support > 257 characters for @tablename.
terminate statements with semi-colons.
use the catalog views instead of INFORMATION_SCHEMA - though the latter is what your professor may have taught and might expect.
CREATE PROCEDURE dbo.SearchTable
@tablename NVARCHAR(257),
@term NVARCHAR(4000)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT * FROM ' + @tablename + N' WHERE 1 = 0';
-- QUOTENAME protects column names that contain spaces or reserved words
SELECT @sql = @sql + N'
OR ' + QUOTENAME(c.name) + N' LIKE N''%' + REPLACE(@term, N'''', N'''''') + N'%'''
FROM
sys.all_columns AS c
INNER JOIN
sys.types AS t
ON c.system_type_id = t.system_type_id
AND c.user_type_id = t.user_type_id
WHERE
c.[object_id] = OBJECT_ID(@tablename)
AND t.name IN (N'sysname', N'char', N'nchar',
N'varchar', N'nvarchar', N'text', N'ntext');
PRINT @sql;
-- EXEC sp_executesql @sql;
END
GO
When you're happy that it's outputting the SELECT query you're after, comment out the PRINT and uncomment the EXEC.
You get into an infinite loop because EXISTS(SELECT MIN(num) FROM #tempcolumns WHERE printed = 0) will always return a row, even if there are no matches (an aggregate without GROUP BY returns one row containing NULL). You need to use EXISTS (SELECT * ... instead.
To use dynamic SQL, you need to build up a string (varchar) of the SQL statement you want to run, then you call it with EXEC
eg:
declare @s varchar(max)
select @s = 'SELECT * FROM mytable '
Exec (@s)
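Where the dynamic statement includes user input such as the search term, one option is to keep the term out of the string entirely and pass it to sp_executesql as a parameter. The sketch below is an ad hoc batch rather than a finished procedure; dbo.mytable and the search term are placeholder values, and the table name is resolved through OBJECT_ID so that only an existing object ends up in the statement.

```sql
-- Sketch only: N'dbo.mytable' and N'smith' are hypothetical inputs.
DECLARE @tablename NVARCHAR(257)  = N'dbo.mytable';
DECLARE @term      NVARCHAR(4000) = N'smith';
DECLARE @object_id INT            = OBJECT_ID(@tablename);
DECLARE @sql       NVARCHAR(MAX);

IF @object_id IS NULL
    RAISERROR(N'Table not found.', 16, 1);
ELSE
BEGIN
    SET @sql = N'SELECT * FROM '
             + QUOTENAME(OBJECT_SCHEMA_NAME(@object_id)) + N'.'
             + QUOTENAME(OBJECT_NAME(@object_id))
             + N' WHERE 1 = 0';

    -- One OR branch per string column; the term itself stays a parameter.
    SELECT @sql = @sql + NCHAR(10)
                + N'   OR ' + QUOTENAME(c.name) + N' LIKE N''%'' + @term + N''%'''
    FROM sys.columns AS c
    INNER JOIN sys.types AS t
        ON c.user_type_id = t.user_type_id
    WHERE c.[object_id] = @object_id
      AND t.name IN (N'sysname', N'char', N'nchar', N'varchar', N'nvarchar');

    PRINT @sql;
    EXEC sys.sp_executesql @sql, N'@term NVARCHAR(4000)', @term = @term;
END
```

Any % or _ characters inside the term are still treated as LIKE wildcards in this sketch.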