SQL: Query multiple tables with one query - sql

I would like to query the following in SQL Server:
Select...FROM Database.dbo.[Table 1], [Table 2], [Table 3]
and so on.
Instead of writing a unique select query for each table, I would like to write one, as follows:
FROM Database.dbo.[All Tables]
What is the most effective way to do this? I imagine I would need to define All Tables somehow and point them to the static list of tables I would like to query.
Thank you

Here are a couple of examples of the kind of thing you might want to do.
1) SIMPLEST: to list all tables, this command will generate a listing of SQL commands that you can paste into a query window and execute:
SELECT 'SELECT * FROM ' + TABLE_NAME + '; '
FROM INFORMATION_SCHEMA.TABLES
2) Adding the TABLE_NAME to each listing, and showing just the top 3 rows of each. Also, though I usually just copy/paste them, you could try capturing all in a variable, then executing the variable contents with EXECUTE:
DECLARE @SQL_CMDS NVARCHAR(MAX)
SET @SQL_CMDS = ''
SELECT @SQL_CMDS = @SQL_CMDS + CONVERT(VARCHAR(1000),
'SELECT TOP 3 [' + TABLE_NAME + '] = ''' + TABLE_NAME
+ ''' ,* FROM ' + TABLE_NAME + '; ')
FROM INFORMATION_SCHEMA.TABLES;
EXECUTE sp_executesql @SQL_CMDS
3) Putting in a WHERE clause, for all columns that might contain an order number column (if you're not sure whether it's ORDERNO or OrderNumber or Order_Num):
DECLARE @SQL_CMDS NVARCHAR(MAX)
SET @SQL_CMDS = ''
SELECT @SQL_CMDS = @SQL_CMDS + CONVERT(VARCHAR(1000),
'SELECT [' + TABLE_NAME + '] = ''' + TABLE_NAME + ''' ,* FROM [' + TABLE_NAME
+ '] WHERE [' + COLUMN_NAME + '] = ''thisValueIWant'';')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE UPPER(COLUMN_NAME) LIKE '%ORDER%N%'
GROUP BY TABLE_NAME, COLUMN_NAME
ORDER BY TABLE_NAME, COLUMN_NAME;
EXECUTE sp_executesql @SQL_CMDS
Hope that helps...
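Since the question asks for a static list of tables rather than every table in the database, here is a minimal sketch of the same idea restricted to a fixed list. The three table names are taken from the question; the `source_table` alias is made up for illustration, and QUOTENAME is used so that names containing spaces survive the round trip.

```sql
-- Sketch only: builds one SELECT per listed table, then runs the batch.
-- Assumes the tables [Table 1], [Table 2], [Table 3] from the question exist.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql
    + N'SELECT TOP (3) ' + QUOTENAME(TABLE_NAME, '''') + N' AS source_table, * FROM '
    + QUOTENAME(TABLE_SCHEMA) + N'.' + QUOTENAME(TABLE_NAME) + N';' + CHAR(10)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
  AND TABLE_NAME IN (N'Table 1', N'Table 2', N'Table 3');  -- the static list

PRINT @sql;                      -- inspect the generated statements first
EXEC sys.sp_executesql @sql;
```

Each table comes back as its own result set, so this suits browsing more than feeding another query.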

instead of this:
Select...FROM Database.dbo.[Table 1], [Table 2], [Table 3]
you MUST have at least columns of the same type in each table... then you could do this:
SELECT id FROM table1 -- id, or whatever column the tables have in common
UNION
SELECT id FROM table2
UNION
SELECT id FROM table3
etc.
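To get something close to the `FROM Database.dbo.[All Tables]` the question asks for, the UNION can be wrapped in a view. This is a sketch under assumptions: the view name `dbo.AllTables` and the `source_table` column are hypothetical, and it assumes every table has a compatible `id` column.

```sql
-- Hypothetical view over the three tables from the question.
-- UNION ALL keeps duplicates and avoids the sort/dedupe that UNION performs.
CREATE VIEW dbo.AllTables AS
SELECT 'Table 1' AS source_table, id FROM dbo.[Table 1]
UNION ALL
SELECT 'Table 2', id FROM dbo.[Table 2]
UNION ALL
SELECT 'Table 3', id FROM dbo.[Table 3];
GO

-- One query against all listed tables.
SELECT source_table, id
FROM dbo.AllTables
WHERE id = 42;
```

Adding a table later means altering the view once instead of touching every query.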

Related

Find count of items from all tables in SQL Server database (filter tables based on column name)

I have to find the total number of distinct items from a particular column (named Ticker) from all tables in the database.
How can I achieve this? This is what I want:
Table_name | Column | Total_Tickers
------------+---------+---------------
Table_1 | ticker | 500
Table_2 | ticker | 100
Table_3 | ticker | 5000
.
.
I know I've got to use sp_MSForEachTable but I'm not sure how to filter those tables that do not have a Ticker column at all.
This is what I've tried:
create table #counts
(
table_name varchar(255),
ticker_count int
)
EXEC sp_MSForEachTable @command1='INSERT #counts (table_name, ticker_count)
SELECT ''?'', COUNT(ticker) FROM ? ',
@whereand = 'AND ''?'' IN (Select * from information_schema.columns where
column_name = ''%ticker%'')'
SELECT table_name, ticker_count
FROM #counts
ORDER BY table_name, ticker_count DESC
DROP TABLE #counts
It doesn't recognize the COUNT(ticker) on the 7th line since I'm not able to filter the tables!
I'd appreciate any pointers on this. Thanks
Here is a much easier approach
use your_databasename --replace with your database name
go
DECLARE @sql VARCHAR(max)= '',
@column_name SYSNAME = 'ticker'
SET @sql = Stuff((SELECT ' union all select Table_name = '''+ table_name + ''',[Column] = ''' + column_name
+ ''',Total_Tickers = count(distinct '+ column_name + ') from '
+ Quotename(table_catalog) + '.'+ Quotename(table_schema) + '.'+ Quotename(table_name)
FROM information_schema.columns
WHERE column_name = @column_name
FOR xml path('')), 1, 11, '') -- stuff is used to remove the first union all
--SELECT @sql
EXEC (@sql)
Since the tables have to be filtered by column name, I don't think sp_MSForEachTable would be helpful here.
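On SQL Server 2017 or later, the same UNION ALL string could probably be built with STRING_AGG instead of the FOR XML PATH / STUFF combination. This is only a sketch: it assumes STRING_AGG is available and, like the query above, it also picks up views that expose a matching column.

```sql
-- Sketch assuming SQL Server 2017+ (STRING_AGG).
DECLARE @column_name SYSNAME = N'ticker';
DECLARE @sql NVARCHAR(MAX);

SELECT @sql = STRING_AGG(
        -- CAST to NVARCHAR(MAX) so the aggregate is not capped at 4000 characters
        CAST(N'SELECT ' + QUOTENAME(TABLE_NAME, '''') + N' AS Table_name, '
           + QUOTENAME(COLUMN_NAME, '''') + N' AS [Column], '
           + N'COUNT(DISTINCT ' + QUOTENAME(COLUMN_NAME) + N') AS Total_Tickers FROM '
           + QUOTENAME(TABLE_SCHEMA) + N'.' + QUOTENAME(TABLE_NAME) AS NVARCHAR(MAX)),
        N' UNION ALL ')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = @column_name;

-- @sql stays NULL when no table has the column, so guard the call.
IF @sql IS NOT NULL
    EXEC sys.sp_executesql @sql;
```

The separator argument removes the need to strip a leading `union all` afterwards.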

Is there a way to remove '_' from column name while selecting * in sql statement?

My table has all the column names
(There are more than 80 columns, I can't change the column names now)
in the format of '_'. Like First_Name, Last_Name,...
So I want to use select * from table instead
of using AS.
I want to select them with the '_' removed, in one statement. Is there any way I can do it?
Something like Replace(columnName, '_','') in the select statement?
Thanks
You can simply rename the column in your query. For example:
SELECT FIRST_NAME [First Name],
LAST_NAME [Last Name]
FROM UserTable
You can also use the AS keyword but this is optional. Also note that if you don't want to do this on every query you can use this process to create a view with renamed columns. Then you can use SELECT * the way you want to (although this is considered a bad idea for many reasons).
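As a rough sketch of the view approach just described: the view name `dbo.UserTableFriendly` is hypothetical, and it assumes UserTable lives in the dbo schema. The table and column names come from the example above.

```sql
-- Hypothetical view that does the renaming once.
CREATE VIEW dbo.UserTableFriendly AS
SELECT FIRST_NAME AS [First Name],
       LAST_NAME  AS [Last Name]
FROM dbo.UserTable;
GO

-- Callers now see the renamed columns without writing any aliases.
SELECT * FROM dbo.UserTableFriendly;
```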
Best of luck!
Alternative - Map In The Client Code:
One other alternative is to do the mapping in the client code. This solution is going to depend greatly on your ORM. Most ORM's (such as LINQ or EF) will allow you to remap. If nothing else you could use AutoMapper or similar to rename the columns on the client using convention based naming.
You can't do this in a single statement unless you're using dynamic SQL. If you're just trying to generate code, you can run a query against Information_Schema and get the info you want ...
DECLARE @MaxColumns INT
DECLARE @TableName VARCHAR(20)
SET @TableName = 'Course'
SELECT @MaxColumns = MAX(ORDINAL_POSITION) FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @TableName
SELECT Col
FROM
(
SELECT 0 Num, 'SELECT' Col
UNION
SELECT ROW_NUMBER() OVER (PARTITION BY TABLE_NAME ORDER BY ORDINAL_POSITION) Num, ' [' + COLUMN_NAME + '] AS [' + REPLACE(COLUMN_NAME, '_', '') + ']' + CASE WHEN ORDINAL_POSITION = @MaxColumns THEN '' ELSE ',' END
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TableName
UNION
SELECT @MaxColumns + 1 Num, 'FROM ' + @TableName
) s
ORDER BY num
The question intrigued me and I did find one way. It makes it happen but if you just wanted to give a lot of aliases one time in one query I wouldn't recommend it though.
First I made a stored procedure that extracts all the column names and gives them an alias without '_'.
USE [DataBase]
GO
IF OBJECT_ID('usp_AlterColumnDisplayName', 'P') IS NOT NULL
DROP PROCEDURE usp_AlterColumnDisplayName
GO
CREATE PROCEDURE usp_AlterColumnDisplayName
@TableName VARCHAR(50)
,
@ret nvarchar(MAX) OUTPUT
AS
Select @ret = @ret + [Column name]
From
(
SELECT ([name] + ' AS ' + '[' + REPLACE([name], '_', ' ') + '], ') [Column name]
FROM syscolumns
WHERE id =
(Select id
From sysobjects
Where type = 'U'
And [name] = @TableName
)
) T
GO
Then extract that string and throw it into another string with a query-structure.
Execute that and you are done.
DECLARE @out NVARCHAR(MAX), @DesiredTable VARCHAR(50), @Query NVARCHAR(MAX)
SET @out = ''
SET @DesiredTable = 'YourTable'
EXEC usp_AlterColumnDisplayName
@TableName = @DesiredTable,
@ret = @out OUTPUT
SET @out = LEFT(@out, LEN(@out)-1) --Removing trailing ', '
SET @Query = 'Select ' + @out + ' From ' + @DesiredTable + ' WHERE whatever'
EXEC sp_executesql @Query
If you just wanted to give a lot of aliases at once without sitting and typing it out for 80+ columns I would rather suggest doing that with one simple SELECT statement, like the one in the sp, or in Excel and then copy paste into your code.

Looping through column names with dynamic SQL

I just came up with an idea for a piece of code to show all the distinct values for each column, and count how many records for each. I want the code to loop through all columns.
Here's what I have so far... I'm new to SQL so bear with the noobness :)
Hard code:
select [Sales Manager], count(*)
from [BT].[dbo].[test]
group by [Sales Manager]
order by 2 desc
Attempt at dynamic SQL:
Declare @sql varchar(max),
@column as varchar(255)
set @column = '[Sales Manager]'
set @sql = 'select ' + @column + ',count(*) from [BT].[dbo].[test] group by ' + @column + 'order by 2 desc'
exec (@sql)
Both of these work fine. How can I make it loop through all columns? I don't mind if I have to hard code the column names and it works its way through subbing in each one for @column.
Does this make sense?
Thanks all!
You can use dynamic SQL and get all the column names for a table. Then build up the script:
Declare @sql varchar(max) = ''
declare @tablename as varchar(255) = 'test'
select @sql = @sql + 'select [' + c.name + '],count(*) as ''' + c.name + ''' from [' + t.name + '] group by [' + c.name + '] order by 2 desc; '
from sys.columns c
inner join sys.tables t on c.object_id = t.object_id
where t.name = @tablename
EXEC (@sql)
Change @tablename to the name of your table (without the database or schema name).
This is a bit of an XY answer, but if you don't mind hardcoding the column names, I suggest you do just that, and avoid dynamic SQL - and the loop - entirely. Dynamic SQL is generally considered the last resort, opens you up to security issues (SQL injection attacks) if not careful, and can often be slower if queries and execution plans cannot be cached.
If you have a ton of column names you can write a quick piece of code or mail merge in Word to do the substitution for you.
However, as far as how to get column names, assuming this is SQL Server, you can use the following query:
SELECT c.name
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.test')
Therefore, you can build your dynamic SQL from this query:
SELECT 'select '
+ QUOTENAME(c.name)
+ ',count(*) from [BT].[dbo].[test] group by '
+ QUOTENAME(c.name)
+ 'order by 2 desc'
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.test')
and loop using a cursor.
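A minimal sketch of that cursor loop might look like the following. The cursor and variable names are made up for illustration, and it assumes the current database is the one that contains dbo.test (BT in the question).

```sql
-- Sketch: run one generated GROUP BY query per column of dbo.test.
DECLARE @stmt NVARCHAR(MAX);

DECLARE col_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT N'SELECT ' + QUOTENAME(c.name)
         + N', COUNT(*) AS cnt FROM [BT].[dbo].[test] GROUP BY ' + QUOTENAME(c.name)
         + N' ORDER BY 2 DESC;'
    FROM sys.columns AS c
    WHERE c.object_id = OBJECT_ID(N'dbo.test');

OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @stmt;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sys.sp_executesql @stmt;          -- one result set per column
    FETCH NEXT FROM col_cursor INTO @stmt;
END;

CLOSE col_cursor;
DEALLOCATE col_cursor;
```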
Or compile the whole thing together into one batch and execute. Here we use the FOR XML PATH('') trick:
DECLARE @sql VARCHAR(MAX) = (
SELECT ' select ' --note the extra space at the beginning
+ QUOTENAME(c.name)
+ ',count(*) from [BT].[dbo].[test] group by '
+ QUOTENAME(c.name)
+ 'order by 2 desc'
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.test')
FOR XML PATH('')
)
EXEC(@sql)
Note I am using the built-in QUOTENAME function to escape column names that need escaping.
Do you want to know the distinct column values in all the columns of the table? Just replace the table name Employee with your table name in the following code:
declare @SQL nvarchar(max)
set @SQL = ''
;with cols as (
select Table_Schema, Table_Name, Column_Name, Row_Number() over(partition by Table_Schema, Table_Name
order by ORDINAL_POSITION) as RowNum
from INFORMATION_SCHEMA.COLUMNS
)
select @SQL = @SQL + case when RowNum = 1 then '' else ' union all ' end
+ ' select ''' + Column_Name + ''' as Column_Name, count(distinct ' + quotename (Column_Name) + ' ) As DistinctCountValue,
count( '+ quotename (Column_Name) + ') as CountValue FROM ' + quotename (Table_Schema) + '.' + quotename (Table_Name)
from cols
where Table_Name = 'Employee' --print @SQL
execute (@SQL)

Most recent datetime column and count for each table

I have a DB that has 1000+ tables. 100 of those tables are prefixed with a three letters (let's say 'ABC') Only half of those prefixed tables have MODIFIEDDATETIME column.
I'm trying to do a simple select query to get all the last updated MODIFIEDDATETIME stamp for each Table that actually has a MODIFIEDDATETIME on that table and also begins with the three letter prefix.
I've tried using this function but it doesn't seem to be getting me there. Thoughts?
sp_msforeachtable '
select ''?'', modifieddatetime, count(*)
from ?
where ? like ''%ABC%''
group by modifieddatetime
order by modifieddatetime desc
'
Borrowing from another answer earlier today:
For one, I recommend staying away from undocumented and unsupported
procedures like sp_MSForEachTable. They can be changed or even
removed from SQL Server at any time, and this specific procedure may
have the same symptoms reported by many against sp_MSForEachDb. (See
some background here and here.)
...but also see sp_ineachdb.
Here is how I would do it - most importantly, pull the row count from the metadata which - while not 100% accurate to the millisecond is usually close enough - will not bog down your system performing a scan of every single table:
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = N'';
CREATE TABLE #x
(
[table] NVARCHAR(255),
updated DATETIME,
[rowcount] BIGINT
);
SELECT @sql = @sql + N'INSERT #x SELECT '''
+ QUOTENAME(OBJECT_SCHEMA_NAME([object_id]))
+ '.' + QUOTENAME(OBJECT_NAME([object_id])) + ''',
MAX(MODIFIEDDATETIME), (SELECT SUM(rows) FROM sys.partitions
WHERE [object_id] = ' + CONVERT(VARCHAR(12), [object_id])
+ ' AND index_id IN (0, 1)) FROM ' + QUOTENAME(OBJECT_SCHEMA_NAME([object_id]))
+ '.' + QUOTENAME(OBJECT_NAME([object_id])) + ';'
FROM sys.columns
WHERE UPPER(name) = 'MODIFIEDDATETIME'
AND UPPER(OBJECT_NAME([object_id])) LIKE 'ABC%';
EXEC sp_executesql @sql;
SELECT [table],updated,[rowcount] FROM #x;
DROP TABLE #x;
That said, I don't know if using MAX(MODIFIEDDATETIME) is appropriate for knowing when a table was touched last. What if a transaction failed? What if the last operation was a delete?
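For reference, the metadata row count on its own can be read roughly like this. It is a sketch that assumes the 'ABC' prefix from the question. The index_id filter restricts the sum to the heap or clustered index, since sys.partitions also has rows for every nonclustered index.

```sql
-- Approximate row counts from metadata, without scanning the tables.
SELECT QUOTENAME(OBJECT_SCHEMA_NAME(p.[object_id])) + N'.'
         + QUOTENAME(OBJECT_NAME(p.[object_id])) AS [table],
       SUM(p.rows) AS [rowcount]
FROM sys.partitions AS p
WHERE p.index_id IN (0, 1)                        -- 0 = heap, 1 = clustered index
  AND OBJECT_NAME(p.[object_id]) LIKE N'ABC%'
GROUP BY p.[object_id]
ORDER BY [table];
```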
You could do it with dynamic SQL, but this will probably not be very efficient on 1000 tables!
DECLARE @SQL NVARCHAR(MAX) = ''
SELECT @SQL = @SQL + ' UNION SELECT COUNT(' + QUOTENAME(Column_Name) + ') [Rows], MAX(' + QUOTENAME(Column_Name) + ') [MaxModifiedDate], ''' + QUOTENAME(Table_Schema) + '.' + QUOTENAME(Table_Name) + ''' [TableName] FROM ' + QUOTENAME(Table_Schema) + '.' + QUOTENAME(Table_Name)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE Column_Name = 'ModifiedDateTime'
AND Table_Name LIKE 'ABC%'
SET @SQL = 'SELECT MaxModifiedDate, TableName, Rows FROM (' + STUFF(@SQL, 1, 7, '') + ') t ORDER BY MaxModifiedDate DESC'
PRINT @SQL
EXEC SP_EXECUTESQL @SQL
It basically builds a query like
SELECT MaxModifiedDate, TableName, Rows
FROM ( SELECT 'Table1' [TableName], MAX(ModifiedDate) [MaxModifiedDate], COUNT(ModifiedDate) [Rows]
FROM Table1
UNION
SELECT 'Table2' [TableName], MAX(ModifiedDate) [MaxModifiedDate], COUNT(ModifiedDate) [Rows]
FROM Table2
UNION
SELECT 'Table3' [TableName], MAX(ModifiedDate) [MaxModifiedDate], COUNT(ModifiedDate) [Rows]
FROM Table3
UNION
...
) c
ORDER BY MaxModifiedDate DESC

How to find out whether a table has some unique columns

I use MS SQL Server.
I've been handed some large tables with no constraints on them, no keys, nothing.
I know some of the columns have unique values. Is there a smart way, for a given table, to find the columns that have unique values?
Right now I do it manually for each column by checking whether there are as many DISTINCT values as there are rows in the table.
SELECT COUNT(DISTINCT col) FROM table
I could probably write a cursor to loop over all the columns, but I want to hear if someone knows a smarter way or a built-in function.
Thanks.
Here's an approach that is basically similar to @JNK's but instead of printing the counts it returns a ready answer for every column that tells you whether a column consists of unique values only or not:
DECLARE @table varchar(100), @sql varchar(max);
SET @table = 'some table name';
SELECT
@sql = COALESCE(@sql + ', ', '') + ColumnExpression
FROM (
SELECT
ColumnExpression =
'CASE COUNT(DISTINCT ' + COLUMN_NAME + ') ' +
'WHEN COUNT(*) THEN ''UNIQUE'' ' +
'ELSE '''' ' +
'END AS ' + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table
) s
SET @sql = 'SELECT ' + @sql + ' FROM ' + @table;
PRINT @sql; /* in case you want to have a look at the resulting query */
EXEC(@sql);
It simply compares COUNT(DISTINCT column) with COUNT(*) for every column. The result will be a table with a single row, where every column will contain the value UNIQUE for those columns that do not have duplicates, and empty string if duplicates are present.
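To make the comparison concrete, here is a small self-contained sketch of the kind of statement the script generates, run against a throwaway temp table. The table `#demo` and its columns are invented for the example.

```sql
-- Throwaway data: id is unique, city has a duplicate.
CREATE TABLE #demo (id INT, city VARCHAR(20));
INSERT INTO #demo (id, city) VALUES (1, 'Oslo'), (2, 'Oslo'), (3, 'Bergen');

-- Same shape as the generated query: one CASE expression per column.
SELECT CASE COUNT(DISTINCT id)   WHEN COUNT(*) THEN 'UNIQUE' ELSE '' END AS id,
       CASE COUNT(DISTINCT city) WHEN COUNT(*) THEN 'UNIQUE' ELSE '' END AS city
FROM #demo;
-- Result: id = 'UNIQUE', city = '' (3 distinct ids vs. 2 distinct cities over 3 rows)

DROP TABLE #demo;
```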
But the above solution will work correctly only for those columns that do not have NULLs. It should be noted that SQL Server does not ignore NULLs when you want to create a unique constraint/index on a column. If a column contains just one NULL and all other values are unique, you can still create a unique constraint on the column (you cannot make it a primary key, though, which requires both uniqueness of values and absence of NULLs).
Therefore you might need a more thorough analysis of the contents, which you could get with the following script:
DECLARE @table varchar(100), @sql varchar(max);
SET @table = 'some table name';
SELECT
@sql = COALESCE(@sql + ', ', '') + ColumnExpression
FROM (
SELECT
ColumnExpression =
'CASE COUNT(DISTINCT ' + COLUMN_NAME + ') ' +
'WHEN COUNT(*) THEN ''UNIQUE'' ' +
'WHEN COUNT(*) - 1 THEN ' +
'CASE COUNT(DISTINCT ' + COLUMN_NAME + ') ' +
'WHEN COUNT(' + COLUMN_NAME + ') THEN ''UNIQUE WITH SINGLE NULL'' ' +
'ELSE '''' ' +
'END ' +
'WHEN COUNT(' + COLUMN_NAME + ') THEN ''UNIQUE with NULLs'' ' +
'ELSE '''' ' +
'END AS ' + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table
) s
SET @sql = 'SELECT ' + @sql + ' FROM ' + @table;
PRINT @sql; /* in case you still want to have a look at the resulting query */
EXEC(@sql);
This solution takes NULLs into account by checking three values: COUNT(DISTINCT column), COUNT(column) and COUNT(*). It displays the results similarly to the former solution, but the possible diagnoses for the columns are more diverse:
UNIQUE means no duplicate values and no NULLs (can either be a PK or have a unique constraint/index);
UNIQUE WITH SINGLE NULL – as can be guessed, no duplicates, but there's one NULL (cannot be a PK, but can have a unique constraint/index);
UNIQUE with NULLs – no duplicates, two or more NULLs (in case you are on SQL Server 2008, you could have a conditional unique index for non-NULL values only);
empty string – there are duplicates, possibly NULLs too.
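For the "UNIQUE with NULLs" case, the conditional unique index mentioned above would be a filtered index, which is available from SQL Server 2008 onward. A minimal sketch, with hypothetical table, column and index names:

```sql
-- Hypothetical names: dbo.SomeTable, SomeColumn, UX_SomeTable_SomeColumn.
-- Enforces uniqueness only among the non-NULL values of SomeColumn.
CREATE UNIQUE NONCLUSTERED INDEX UX_SomeTable_SomeColumn
    ON dbo.SomeTable (SomeColumn)
    WHERE SomeColumn IS NOT NULL;
```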
Here is what I think is probably the cleanest way. Just use dynamic SQL and a single select statement to create a query that gives you a total row count and a count of distinct values for each field.
Fill in the DB name and tablename at the top. The DB name part is really important since OBJECT_NAME only works in the current database context.
use DatabaseName
DECLARE @Table varchar(100) = 'TableName'
DECLARE @SQL Varchar(max)
SET @SQL = 'SELECT COUNT(*) as ''Total'''
SELECT @SQL = @SQL + ',COUNT(DISTINCT ' + name + ') as ''' + name + ''''
FROM sys.columns c
WHERE OBJECT_NAME(object_id) = @Table
SET @SQL = @SQL + ' FROM ' + @Table
EXEC (@SQL) -- parentheses are required; EXEC @SQL would treat the string as a procedure name
If you are using 2008, you can use the Data Profiling Task in SSIS to return Candidate Keys for each table.
This blog entry steps through the process, it's fairly simple:
http://consultingblogs.emc.com/jamiethomson/archive/2008/03/04/ssis-data-profiling-task-part-8-candidate-key.aspx
A few words on what my code does:
Reads all tables and columns
Creates a temp table to hold table/columns with duplicate keys
For each table/column it runs a query. If it finds a count(*)>1 for at least one value
it makes an insert into the temp table
Selects the tables and columns from the system tables that do not match table/columns that are found to have duplicates
DECLARE @sql VARCHAR(max)
DECLARE @table VARCHAR(100)
DECLARE @column VARCHAR(100)
CREATE TABLE #temp (tname VARCHAR(100),cname VARCHAR(100))
DECLARE mycursor CURSOR FOR
select t.name,c.name
from sys.tables t
join sys.columns c on t.object_id = c.object_id
where system_type_id not in (34,35,99)
OPEN mycursor
FETCH NEXT FROM mycursor INTO @table,@column
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = 'INSERT INTO #temp SELECT DISTINCT '''+@table+''','''+@column+ ''' FROM ' + @table + ' GROUP BY ' + @column +' HAVING COUNT(*)>1 '
EXEC (@sql)
FETCH NEXT FROM mycursor INTO @table,@column
END
select t.name,c.name
from sys.tables t
join sys.columns c on t.object_id = c.object_id
left join #temp on t.name = #temp.tname and c.name = #temp.cname
where system_type_id not in (34,35,99) and #temp.tname IS NULL
DROP TABLE #temp
CLOSE mycursor
DEALLOCATE mycursor
What about simple one line of code:
CREATE UNIQUE INDEX index_name ON table_name (column_name);
If the index is created then your column_name has only unique values. If there are dupes in your column_name, you will get an error message.
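If you only want the answer and not the index, one possible variation is to attempt the index inside a transaction and roll it back either way. This is a sketch with placeholder names (`UX_probe`, `dbo.table_name`, `column_name`). Note that building an index on a large table is expensive, and that a unique index tolerates at most one NULL.

```sql
-- Probe for uniqueness without keeping the index.
BEGIN TRY
    BEGIN TRANSACTION;
    CREATE UNIQUE INDEX UX_probe ON dbo.table_name (column_name);
    ROLLBACK TRANSACTION;                      -- discard the probe index
    PRINT 'column_name contains only unique values';
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;   -- clean up if the CREATE failed mid-transaction
    PRINT 'Not unique, or the index could not be created: ' + ERROR_MESSAGE();
END CATCH;
```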