Iterate through databases with similar tables but slightly different column names - sql

In SQL Server, I am trying to consolidate multiple databases with similar relational table structure into one database. Let's say each database contains a table called items and each items table has a column for the name of an item. However, this column name varies depending on the database we are in. Some databases may have Item_Name while others may have ItemName or Name, etc.
In the consolidated database, I am trying to append all of the items tables into one table, and I want to create a CASE expression that checks if a column name exists in the items table from any of the databases, and if it does, then to return the results from that table column.
This is what I have at the moment:
DECLARE @sql varchar(max) = ''
SELECT @sql = @sql + 'UNION ALL
SELECT ''' + name + ''' AS DatabaseName,
CASE WHEN EXISTS (SELECT COLUMN_NAME
FROM ' + QUOTENAME(name) + '.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = ''items'' AND
COLUMN_NAME = ''Item_Name'')
THEN Item_Name
WHEN EXISTS (SELECT COLUMN_NAME
FROM ' + QUOTENAME(name) + '.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = ''items'' AND
COLUMN_NAME = ''ItemName'')
THEN ItemName
...
END AS NameOfItem
FROM ' + QUOTENAME(name) + '.dbo.Items '
FROM master.sys.databases
WHERE name <> 'master' ...
SET @sql = STUFF(@sql, 1, LEN(' UNION ALL'), '')
EXEC(@sql)
I believe this throws an error because of the CASE expression, since in the THEN clauses I am specifying a column that may not exist in that current table, even though it wouldn't satisfy the condition in the WHEN clause. How would I resolve this issue?
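One way around this is to decide which column to use while the string is being built, so the generated SELECT for each database only ever mentions a column that exists there. SQL Server compiles the whole CASE expression, including THEN branches that would never be taken, which is why the nonexistent column fails. The following is a minimal sketch, not a tested solution: the list of candidate names, the dbo schema, the system-database filter and the cursor name db_cursor are all assumptions.

```sql
DECLARE @sql nvarchar(max) = N'';
DECLARE @db sysname, @col sysname, @probe nvarchar(max);

-- Assumed filter: skip system databases and anything that is not online.
DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE name NOT IN (N'master', N'tempdb', N'model', N'msdb')
      AND state_desc = N'ONLINE';

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @col = NULL;

    -- Look up the real column name in this database before generating the SELECT.
    -- If several candidates exist, TOP (1) picks an arbitrary one.
    SET @probe = N'SELECT TOP (1) @col = COLUMN_NAME
        FROM ' + QUOTENAME(@db) + N'.INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_SCHEMA = ''dbo'' AND TABLE_NAME = ''items''
          AND COLUMN_NAME IN (''Item_Name'', ''ItemName'', ''Name'')';

    EXEC sp_executesql @probe, N'@col sysname OUTPUT', @col = @col OUTPUT;

    -- Databases without a matching column (or without an items table) are skipped.
    IF @col IS NOT NULL
        SET @sql += CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
            + N'SELECT ' + QUOTENAME(@db, '''') + N' AS DatabaseName, '
            + QUOTENAME(@col) + N' AS NameOfItem FROM '
            + QUOTENAME(@db) + N'.dbo.items';

    FETCH NEXT FROM db_cursor INTO @db;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;

PRINT @sql;                   -- inspect the generated statement first
-- EXEC sp_executesql @sql;   -- run it once it looks right
```

Because the column is resolved per database up front, no CASE expression is needed in the generated query at all.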

Related

Find all tables with a specific column name + where value is lower than X

I need help in searching our database. So we have this problem that we need to know all tables, with the column_name "sysmodified" and see if there are any entries before a specific date (25-sep-2019).
I tried to find the answer on google and stackoverflow, but I either get an answer how to get the results before 25-sep within 1 table Example1, or results how to get all tables, which has this column_name Example2.
Using the code I have so far (see below), we know that there are 325 tables, which contain the column_name "sysmodified". I could manually use example 1 to get my information, but I was hoping for a way to get the results that I need with just one query.
This is what I have so far:
USE [database2]
GO
SELECT t.name AS table_name,
SCHEMA_NAME(schema_id) AS schema_name,
c.name AS column_name
FROM sys.tables AS t
INNER JOIN sys.columns c ON t.OBJECT_ID = c.OBJECT_ID
WHERE c.name LIKE '%sysmodified%'
ORDER BY schema_name, table_name;
However, if I try to add a condition like sysmodified < '20190925', I get errors:
WHERE c.name LIKE '%sysmodified%'
AND t.sysmodified < '20190925'
I also tried this approach, which results in errors as well:
SELECT t.name AS table_name, sysmodified,
It is based on the following pattern (but I cannot add 325 column names in FROM?):
SELECT
title,
primary_author,
published_date
FROM
books
WHERE
title LIKE 'The%'
Hopefully someone could help me with an approach how to tackle this problem. We use Microsoft SQL Server Management Studio 17 (if that might be relevant).
This is fairly simple dynamic sql to put together. This should produce the results you are looking for as I understand your requirements.
declare @SQL nvarchar(MAX) = ''
select @SQL = @SQL + 'select distinct TableName = ''' + object_name(object_ID) + ''' from ' + quotename(object_name(object_ID)) + ' where ' + quotename(c.name) + ' < ''20190925'' UNION ALL '
from sys.columns c
where name like '%sysmodified%'
set @SQL = left(@SQL, len(@SQL) - 10) --removes the final UNION ALL
select @SQL
--once you are comfortable that the dynamic sql is correct just uncomment the next line
--exec sp_executesql @SQL
The comment from GMB is right. The fastest way I can think of to answer this is dynamic SQL.
I would build a query to loop through or create a union select statement across all tables that have that column. Something like:
(skeleton)
DECLARE @N_SQL NVARCHAR(MAX) = N''
-- Find all tables that have a column like '%sysmodified%'
-- and build a dynamic (union style) query from them
SELECT @N_SQL = @N_SQL + ' UNION SELECT ''' + s.name + '.' + t.name + ''' AS TABLENAME FROM ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ' WHERE ' + QUOTENAME(c.name) + ' < ''20190925'''
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.columns c ON c.object_id = t.object_id
WHERE c.name LIKE '%sysmodified%'
SET @N_SQL = STUFF(@N_SQL, 1, 7, '') --trimming out the leading " UNION "
SELECT @N_SQL --just to see what that string looks like
EXEC SP_EXECUTESQL @N_SQL
So, the above might work. It might need a bit of cleanup, but it's a skeleton idea.

SQL loop for each column in a table

Say I have a table called:
TableA
The following columns exist in the table:
Column1, Column2, Column3
What I am trying to accomplish is to see how many records are not null.
To do this I have the following CASE expression:
sum(Case when Column1 is not null then 1 else 0 end)
What I want is the above CASE expression for every table in a provided list, run for each column that exists in the table.
So for the above example the CASE expression will run for Column1, Column2 and Column3, as there are 3 columns in that particular table.
But I want to specify a list of tables to loop through, executing the logic above.
create procedure tab_cols (@tab nvarchar(255))
as
begin
declare @col_count nvarchar(max) = ''
,@col nvarchar(max) = ''
select @col_count += case ORDINAL_POSITION when 1 then '' else ',' end + 'count(' + QUOTENAME(COLUMN_NAME,']') + ') as ' + QUOTENAME(COLUMN_NAME,']')
,@col += case ORDINAL_POSITION when 1 then '' else ',' end + QUOTENAME(COLUMN_NAME,']')
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = @tab
order by ORDINAL_POSITION
declare @stmt nvarchar(max) = 'select * from (select ' + @col_count + ' from ' + QUOTENAME(@tab) + ') t unpivot (val for col in (' + @col + ')) u'
exec sp_executesql @stmt
end
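To run the procedure for a list of tables, as the question asks, something along these lines might do. It is only a sketch: the table names in the list and the cursor name tab_cursor are hypothetical, and it assumes the tab_cols procedure above has already been created.

```sql
-- Hypothetical list of tables to inspect.
DECLARE @tables TABLE (name sysname);
INSERT INTO @tables (name) VALUES (N'TableA'), (N'TableB');

DECLARE @t sysname;

DECLARE tab_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM @tables;

OPEN tab_cursor;
FETCH NEXT FROM tab_cursor INTO @t;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One result set per table: a row per column with its non-null count.
    EXEC tab_cols @t;
    FETCH NEXT FROM tab_cursor INTO @t;
END

CLOSE tab_cursor;
DEALLOCATE tab_cursor;
```

Each call returns its own result set, so the output is one grid per table rather than a single combined table.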
Wouldn't it be as easy as this?
SELECT AccountID
,SUM(Total) AS SumTotal
,SUM(Profit) AS SumProfit
,SUM(Loss) AS SumLoss
FROM tblAccount
GROUP BY AccountID
If I understand this correctly you want to get the sums, but not for all rows in one go but for each accountID separately. This is what GROUP BY is for...
If ever possible try to avoid loops, cursors and other procedural approaches...
UPDATE: Generic approach for different tables
With different tables you will - probably - need exactly the statement I show above, but you'll have to generate it dynamically and use EXEC to execute it. You can go through INFORMATION_SCHEMA.COLUMNS to get the columns names...
But:
How should this script know generically, which columns should be summed up? You might head for data_type like 'decimal%' or similar...
What about the other columns and their usage in GROUP BY?
How would you want to place aliases?
How do you want to continue with a table of unknown structure?
To be honest: I think, there is no real-generic-one-for-all approach for this...

How to find table names which have the same value in other tables based on one column

I have a database with many tables, and those tables have a common column. How can I retrieve the tables which have the same value in that column?
For example:
I have 25 tables, and all of them have a column named CCODE. Now I want to know which tables have the same value for this column.
The following statement will create a UNION SELECT that brings back all the data you need in one result set. It is best to set the query output to text, and don't forget to set the query option for maximum text length to the highest value (8192). Copy the result of this SELECT into a new SQL window and execute it:
WITH AllTablesWithMyColumn AS
(
SELECT DISTINCT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME='CCODE'
)
SELECT STUFF(
(
SELECT 'UNION SELECT ''' + TABLE_NAME + ''' AS TableName, CCODE FROM ' + TABLE_NAME + CHAR(13) + CHAR(10)
FROM AllTablesWithMyColumn
FOR XML PATH(''),TYPE
).value('.','varchar(max)'),1,6,'')
If you need any further help, just tell me...
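Once the generated UNION is available, it can be wrapped to show only the CCODE values that occur in more than one table. This is a rough sketch with two hypothetical table names (Table1, Table2) standing in for the generated list:

```sql
SELECT u.CCODE,
       COUNT(DISTINCT u.TableName) AS TableCount  -- number of tables containing this value
FROM (
    -- Paste the generated UNION SELECT here; two hypothetical tables shown.
    SELECT 'Table1' AS TableName, CCODE FROM Table1
    UNION
    SELECT 'Table2' AS TableName, CCODE FROM Table2
) AS u
GROUP BY u.CCODE
HAVING COUNT(DISTINCT u.TableName) > 1;
```

Joining this result back to the inner query on CCODE would list the table names for each shared value.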

How to insert into temporary table twice

I've picked up some SQL similar to the following:
IF EXISTS(SELECT name FROM tempdb..sysobjects WHERE name Like N'#tmp%'
and id=object_id('tempdb..#tmp'))
DROP TABLE #tmp
select * into #tmp
from permTable
I need to add more data to #tmp before continuing processing:
insert into #tmp
select * from permTable2
But this gives errors because SQL has assumed sizes and types for the #tmp columns (e.g. if permTable has a column full of ints but permTable2 has a column with the same name but with a NULL in one record, you get "Cannot insert the value NULL into column 'IsPremium', table 'tempdb.dbo.#tmp'").
How do I get #tmp to have the types I want? Is this really bad practise?
Have you considered creating a table variable instead? You can declare the columns like so:
declare @sometable table(
SomeField [nvarchar](15),
SomeOtherField [decimal](15,2));
This is why select into is a poor idea for your problem. Create the table structure specifically with a create table command and then write two insert statements.
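A minimal sketch of that approach follows. IsPremium comes from the error message in the question; the Id column and both data types are assumptions, so adjust them to the real structure of permTable and permTable2.

```sql
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
    DROP TABLE #tmp;

-- Declare the structure explicitly instead of letting SELECT INTO infer it.
CREATE TABLE #tmp (
    Id        int NOT NULL,   -- hypothetical key column
    IsPremium int NULL        -- nullable, so the NULL rows from permTable2 fit
);

INSERT INTO #tmp (Id, IsPremium)
SELECT Id, IsPremium FROM permTable;

INSERT INTO #tmp (Id, IsPremium)
SELECT Id, IsPremium FROM permTable2;
```

Listing the columns in both the INSERT and the SELECT also protects against the two source tables having their columns in a different order.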
It isn't possible.
If you need to generate the table definition typelist now,
create a view from the select statement, and read the columns and their definition from information_schema... (this work of art won't consider decimal and/or datetime2)
Note: this will give you the lowest possible field-length for varchar/varbinary columns you currently selected. You need to adjust them manually...
SELECT
','
+ COLUMN_NAME
+ ' '
+ DATA_TYPE
+ ' '
+ ISNULL
(
'('
+
CASE
WHEN CHARACTER_MAXIMUM_LENGTH = -1
THEN 'MAX'
ELSE CAST(CHARACTER_MAXIMUM_LENGTH AS varchar(36))
END
+ ')'
, ''
)
+ ' '
+ CASE WHEN IS_NULLABLE = 'NO' THEN 'NOT NULL' ELSE '' END
FROM information_schema.columns
WHERE table_name = '________theF'
ORDER BY ORDINAL_POSITION
And the field-list for the insert-statement:
SELECT
',' + COLUMN_NAME
FROM information_schema.columns
WHERE table_name = '________theF'
ORDER BY ORDINAL_POSITION

View error in PostgreSQL

I have a large query in a PostgreSQL database.
The Query is something like this:
SELECT * FROM table1, table2, ... WHERE table1.id = table2.id...
When I run this query as a SQL query, it returns the wanted row.
But when I try to use the same query to create a view, it returns an error:
"error: column "id" specified more than once."
(I use pgAdminIII when executing the queries.)
I guess this happens because the result set will have more than one column named "id". Is there some way to solve this without writing all the column names in the query?
That happens because, with select *, the view would have two columns named id: one from table1 and one from table2.
You need to specify which id you want in the view.
SELECT table1.id, column2, column3, ... FROM table1, table2
WHERE table1.id = table2.id
The query works because it can have equally named columns...
postgres=# select 1 as a, 2 as a;
 a | a
---+---
 1 | 2
(1 row)
postgres=# create view foobar as select 1 as a, 2 as a;
ERROR: column "a" duplicated
postgres=# create view foobar as select 1 as a, 2 as b;
CREATE VIEW
If only join columns are duplicated (i.e. have the same names), then you can get away with changing:
select *
from a, b
where a.id = b.id
to:
select *
from a join b using (id)
If you got here because you are trying to use a function like to_date and getting the "defined more than once" error, note that you need to use a column alias for functions, e.g.:
to_date(o.publication_date, 'DD/MM/YYYY') AS publication_date
There is no built-in way in the language to solve it (and frankly, * is a bad practice in general because it can cause latent defects to arise as the table schemas change; you can do table1.*, table2.acolumn, table2.bcolumn if you want all of one table and selectively from another), but if PostgreSQL supports INFORMATION_SCHEMA, you can do something like this (written here in SQL Server syntax):
DECLARE @sql AS varchar(max)
SELECT @sql = COALESCE(@sql + ', ', '')
+ '[' + TABLE_NAME + '].[' + COLUMN_NAME + ']'
+ CHAR(13) + CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME IN ('table1', 'table2')
ORDER BY TABLE_NAME, ORDINAL_POSITION
PRINT @sql
And paste the results in to save a lot of typing. You will need to manually alias the columns which have the same name, of course. You can also code-gen unique names if you like (but I don't):
SELECT @sql = COALESCE(@sql + ', ', '')
+ '[' + TABLE_NAME + '].[' + COLUMN_NAME + '] '
+ 'AS [' + TABLE_NAME + '_' + COLUMN_NAME + ']'
+ CHAR(13) + CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME IN ('table1', 'table2')
ORDER BY TABLE_NAME, ORDINAL_POSITION
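The snippets above use variable assignment and bracket quoting, which PostgreSQL does not accept. A rough PostgreSQL equivalent of the same code-generation idea could look like the following; it assumes both tables live in the public schema and relies on the built-in string_agg and format functions.

```sql
-- Produces one text value: a column list with unique aliases,
-- ready to paste into the SELECT of the view definition.
SELECT string_agg(
           format('%I.%I AS %I',
                  table_name,
                  column_name,
                  table_name || '_' || column_name),
           E',\n'
           ORDER BY table_name, ordinal_position
       ) AS select_list
FROM information_schema.columns
WHERE table_schema = 'public'   -- assumed schema
  AND table_name IN ('table1', 'table2');
```

The %I placeholder quotes identifiers only when needed, so ordinary lowercase names come out unquoted.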