Get fill rates in another table - SQL Server

I am trying to create a script to calculate the fill rates for each column in a table Data_table and insert it into a second table Metadata_table.
The Data_table has 30 columns in it, and some columns have 100% data in them, and some have less than 100% (due to nulls).
My code to calculate the fill rate looks like this:
select
cast(sum(case
when employee_id is null
then 0
else 1
end) / cast(count(1) as float ) * 100 as decimal(8,3)) as employee_id_fill,
.....--/*so on for 30 columns..*/
from
[Data_table]
The Metadata_table should look like this:
Table_name | Colmn_name | Fill_rate
[Data_table]| Colomn_a | 100%
[Data_table]| Colomn_b | 89%
[Data_table]| Colomn_c | 100%
and so on...
I think UNPIVOT can work here, but I am unable to get the column names into the [Metadata_table] automatically.
I tried using this to automate the column names:
COL_NAME(OBJECT_ID('DBO.[DATA_TABLE]'),'COLOMN_A')
but this has not worked so far.
Any help is appreciated.

You can use sys.columns for grabbing the column names. You can join it to sys.tables by the object_id if you ever need to associate the two.
For example:
SELECT c.NAME
FROM SYS.TABLES t
INNER JOIN SYS.COLUMNS c ON t.OBJECT_ID = c.OBJECT_ID
WHERE t.OBJECT_ID = OBJECT_ID('DBO.[Data_Table]');
From here you can generate SQL in the format you want: build an expression that queries your table, then unpivot the result.
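A sketch of that idea, assuming SQL Server 2017 or later (for STRING_AGG ... WITHIN GROUP): read the column list from the catalog, build one statement that computes COUNT(column) for every column in a single scan, unpivot the counts with CROSS APPLY (VALUES ...), and insert one row per column into the metadata table. To keep it self-contained it uses hypothetical temp tables #Data_table and #Metadata_table and therefore reads tempdb.sys.columns; for the real tables you would swap in sys.columns, OBJECT_ID(N'dbo.Data_table'), FROM dbo.Data_table and your dbo.Metadata_table.

```sql
-- Hypothetical stand-ins for [Data_table] and [Metadata_table].
IF OBJECT_ID('tempdb..#Data_table') IS NOT NULL DROP TABLE #Data_table;
IF OBJECT_ID('tempdb..#Metadata_table') IS NOT NULL DROP TABLE #Metadata_table;
CREATE TABLE #Data_table (employee_id INT NULL, Colomn_a VARCHAR(10) NULL, Colomn_b DATE NULL);
INSERT INTO #Data_table VALUES (1, 'x', NULL), (2, NULL, '2020-01-01'), (3, 'y', '2020-01-02'), (4, 'z', NULL);
CREATE TABLE #Metadata_table (Table_name SYSNAME, Column_name SYSNAME, Fill_rate DECIMAL(6,2));

DECLARE @sql NVARCHAR(MAX);

-- COUNT(col) ignores NULLs, so each COUNT is the number of filled rows for that column.
-- CROSS APPLY (VALUES ...) then unpivots the single row of counts into one row per column.
SELECT @sql =
    N'INSERT INTO #Metadata_table (Table_name, Column_name, Fill_rate)
SELECT N''Data_table'', v.Column_name, CAST(100.0 * v.Filled / NULLIF(a.Total, 0) AS DECIMAL(6,2))
FROM (SELECT COUNT(*) AS Total, '
    + STRING_AGG(CAST(N'COUNT(' + QUOTENAME(c.name) + N') AS c' + CAST(c.column_id AS NVARCHAR(10)) AS NVARCHAR(MAX)), N', ')
          WITHIN GROUP (ORDER BY c.column_id)
    + N' FROM #Data_table) AS a
CROSS APPLY (VALUES '
    + STRING_AGG(CAST(N'(N' + QUOTENAME(c.name, '''') + N', a.c' + CAST(c.column_id AS NVARCHAR(10)) + N')' AS NVARCHAR(MAX)), N', ')
          WITHIN GROUP (ORDER BY c.column_id)
    + N') AS v (Column_name, Filled);'
FROM tempdb.sys.columns AS c                            -- sys.columns for a permanent table
WHERE c.object_id = OBJECT_ID(N'tempdb..#Data_table');  -- e.g. OBJECT_ID(N'dbo.Data_table')

PRINT @sql;                  -- inspect the generated statement
EXEC sys.sp_executesql @sql;

SELECT Table_name, Column_name, Fill_rate FROM #Metadata_table;
```

The design choice here is one scan of the data table no matter how many columns it has, as opposed to one query per column.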
Another approach is a WHILE loop that inserts into your metadata table one column at a time. Keep in mind that this scans the table once per column, so on a very large table it will be more expensive. I used an example table dbo.Attendance_Records; this script prints the generated SQL rather than executing it. To run it, replace the PRINT with a call to sp_executesql on that text.
DECLARE @Table NVARCHAR(128) = 'DBO.[Attendance_Records]'
,@MetaTable NVARCHAR(128) = 'DBO.[Metadata_Table]'
,@ColumnName NVARCHAR(128)
,@Iterator INT = 1
,@SQL NVARCHAR(MAX)
IF OBJECT_ID('tempdb..#Cols') IS NOT NULL DROP TABLE #Cols; -- allows the script to be re-run
SELECT c.NAME
,c.COLUMN_ID
,ROW_NUMBER() OVER (ORDER BY COLUMN_ID) AS RN
INTO #Cols
FROM SYS.COLUMNS c
WHERE c.OBJECT_ID = OBJECT_ID(@Table);
WHILE @Iterator <= (SELECT ISNULL(MAX(RN),0) FROM #Cols)
BEGIN
    SET @ColumnName = (SELECT NAME FROM #Cols WHERE RN = @Iterator)
    SET @SQL = 'INSERT INTO ' + @MetaTable + ' (Table_Name, Column_Name, Fill_Rate) '
    + 'SELECT ''' + REPLACE(@Table,'DBO.','') + ''', ''' + @ColumnName + ''', 100 * CONVERT(DECIMAL(8,3), SUM(CASE WHEN [' + @ColumnName + '] IS NULL THEN 0 ELSE 1 END)) / COUNT(1) AS [' + @ColumnName + '_fill]' + ' FROM ' + @Table
    PRINT @SQL -- to execute instead: EXEC sys.sp_executesql @SQL;
    SET @Iterator += 1
END

Since you need the column names as values, you would need to do something along these lines:
select ColumnName = 'Colomn_a'
, FillRate = count(Colomn_a) * 100.0 / count(*) --count(col) skips NULLs; multiplying by 100.0 before dividing avoids integer math
from YourTable
UNION ALL
select 'Colomn_b'
, count(Colomn_b) * 100.0 / count(*)
from YourTable
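The UNION ALL version reads the table once per column. A sketch that reads it only once: compute every COUNT(column) in a single SELECT, then unpivot that one row with CROSS APPLY (VALUES ...). The names #YourTable, #FillRates, Colomn_a and Colomn_b are hypothetical stand-ins for your own table and columns.

```sql
-- Hypothetical stand-in for YourTable.
IF OBJECT_ID('tempdb..#YourTable') IS NOT NULL DROP TABLE #YourTable;
IF OBJECT_ID('tempdb..#FillRates') IS NOT NULL DROP TABLE #FillRates;
CREATE TABLE #YourTable (Colomn_a INT NULL, Colomn_b VARCHAR(20) NULL);
INSERT INTO #YourTable VALUES (1, 'a'), (2, NULL), (NULL, NULL), (4, 'd');

-- One scan: all the counts are computed together, then CROSS APPLY (VALUES ...)
-- turns the single row of counts into one row per column (an unpivot).
SELECT v.ColumnName, v.FillRate
INTO #FillRates
FROM (
    SELECT COUNT(*)        AS Total,
           COUNT(Colomn_a) AS Filled_a,   -- COUNT(col) skips NULLs
           COUNT(Colomn_b) AS Filled_b
    FROM #YourTable
) AS t
CROSS APPLY (VALUES
    ('Colomn_a', t.Filled_a * 100.0 / NULLIF(t.Total, 0)),   -- NULLIF guards an empty table
    ('Colomn_b', t.Filled_b * 100.0 / NULLIF(t.Total, 0))
) AS v (ColumnName, FillRate);

SELECT ColumnName, FillRate FROM #FillRates;
```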

Here is an alternative to Mike R's method:
CREATE OR ALTER PROCEDURE [dbo].[GetFillRate_new] -- EXEC [GetFillRate_new] 'TestEmp'
(
@TableName NVARCHAR(128),
@Include_BlankAsNotFilled BIT = 1 -- 0-OFF; 1-ON(Default) (Blank As Not Filled Data)
)
AS
BEGIN
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
IF NOT EXISTS(SELECT 1 FROM SYS.OBJECTS WHERE [TYPE]='U' AND [NAME]=@TableName )
BEGIN
SELECT Result = 1 , Reason ='Table not exists in this Database' ;
RETURN 1;
END;
declare @sql varchar(max)
set @sql=''
select
@sql=@sql+'select
'''+c.column_name+''' as [Column Name],
cast((100*(sum(
case when ' +
case
when @Include_BlankAsNotFilled = 0
then '[' + c.column_name + '] is not null'
when c.collation_name is null
then '[' + c.column_name + '] is not null'
else 'isnull([' + c.column_name + '],'''')<>'''' ' end +
' then 1 else 0 end)*1.0 / count(*)))
as decimal(5,2)) as [Fill Rate (%)]
from '+quotename(c.table_schema)+'.'+quotename(c.table_name)+'
union all '
from
information_schema.columns as c
inner join information_schema.tables as t
on c.table_schema=t.table_schema
and c.table_name=t.table_name
where
t.table_type='BASE TABLE' and
t.table_name =@TableName
set @sql=left(@sql,len(@sql)-10) --drop the trailing 'union all'
--print @sql
exec(@sql)
end
You can find more details in this blog post: https://exploresql.com/2019/12/14/how-to-find-fill-rate-in-a-table/

Challenges:
Schema changes like the following make finding the fill rate harder than it first appears:
Table name changes
Column name changes
Data type changes
Removing existing columns
Adding new columns
Because of these challenges, a static solution for finding the fill rate of a table is not enough. Instead, we need a dynamic approach to avoid future rework.
Prerequisite:
In the sample below, we use a stored procedure named ‘Get_FillRate’ for the demo. If your database already has an object with that name, change the stored procedure name below.
Sample Table Creation with Data Loading Script
--dropping the sample table if it exists
--(the procedures look the table up in the current database's catalog, so a permanent table is used rather than a #temp table)
IF OBJECT_ID('dbo.TestEmp') IS NOT NULL
DROP TABLE dbo.TestEmp;
CREATE TABLE dbo.TestEmp
(
[TestEmp_Key] INT IDENTITY(1,1) NOT NULL,
[EmpName] VARCHAR(100) NOT NULL,
[Age] INT NULL,
[Address] VARCHAR(100) NULL,
[PhoneNo] VARCHAR(11) NULL,
[Inserted_dte] DATETIME NOT NULL,
[Updated_dte] DATETIME NULL,
CONSTRAINT [PK_TestEmp] PRIMARY KEY CLUSTERED
(
TestEmp_Key ASC
)
);
GO
INSERT INTO dbo.TestEmp
(EmpName,Age,[Address],PhoneNo,Inserted_dte)
VALUES
('Arul',24,'xxxyyy','1234567890',GETDATE()),
('Gokul',22,'zzzyyy',NULL,GETDATE()),
('Krishna',24,'aaa','',GETDATE()),
('Adarsh',25,'bbb','1234567890',GETDATE()),
('Mani',21,'',NULL,GETDATE()),
('Alveena',20,'ddd',NULL,GETDATE()),
('Janani',30,'eee','',GETDATE()),
('Vino',26,NULL,'1234567890',GETDATE()),
('Madhi',25,'ggg',NULL,GETDATE()),
('Ronen',25,'ooo',NULL,GETDATE()),
('Visakh',25,'www',NULL,GETDATE()),
('Jayendran',NULL,NULL,NULL,GETDATE());
GO
SELECT [TestEmp_Key],[EmpName],[Age],[Address],[PhoneNo],[Inserted_dte],[Updated_dte] FROM dbo.TestEmp;
GO
Sample Table - dbo.TestEmp
SQL Procedure For Finding Fill Rate in a Table - Dynamic Approach
Input Parameters
@p_TableName is mandatory; @p_Include_BlankAsNotFilled is optional and defaults to 0.
@p_TableName - Data type NVARCHAR(128), nullability NOT NULL.
@p_Include_BlankAsNotFilled - Data type BIT, nullability NOT NULL; either 0 or 1. 0 (the default) means OFF. 1 means ON: blank entries are also considered not filled.
Output Columns
There are two output columns, both non-nullable:
[Column Name] - Data type sysname, nullability NOT NULL. Each column name of the given table is returned as a row value.
[Fill Rate (%)] - Data type DECIMAL(5,2), nullability NOT NULL. Values from 0.00 to 100.00 are returned alongside the respective column names.
Notes on the Stored Procedure
The stored procedure is named 'Get_FillRate'.
SET NOCOUNT ON suppresses the "rows affected" messages.
TRY...CATCH blocks are added for error handling.
The transaction isolation level is set to READ UNCOMMITTED, so uncommitted modifications are read (dirty reads are possible).
The input parameters are copied into local variables, a common workaround for parameter sniffing.
The table name input parameter is cleaned up to accept formats such as '.table_name', '..table_name', '...table_name', 'table_name', '[table_name]', 'dbo.table_name', 'dbo.[table_name]', '[dbo].[table_name]', etc.
Validation is done at the start: if the given name is not a user table in the current database, the procedure returns the message 'Table not exists in this Database' as a result set (with return code 1).
The catalog views sys.objects and sys.columns and the system view INFORMATION_SCHEMA.COLUMNS are used inside the stored procedures.
ORDINAL_POSITION from INFORMATION_SCHEMA.COLUMNS is used to return the result set in the same column order as the table structure.
COLLATION_NAME from INFORMATION_SCHEMA.COLUMNS is used to decide whether blank values can be counted as not filled: only character columns have a collation.
COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS is used to show the final result set with the respective fill rates.
Dynamic SQL is used so that the schema changes listed above do not require rewriting the query.
Method 1 (dynamic SQL with a WHILE loop) and Method 2 (dynamic SQL with UNION ALL) produce the same result sets and have the same functionality, but Method 2 is better on metrics such as CPU time, elapsed time, and logical reads.
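One caveat about the REPLACE chain used to clean up the table name: it deletes every dot, so a table in another schema such as 'sales.Orders' becomes 'salesOrders'. A sketch of an alternative, assuming PARSENAME suits your inputs: split the name into schema and table parts (PARSENAME also removes the square brackets), default the schema to dbo, and rebuild a safely quoted name with QUOTENAME that can then be checked with OBJECT_ID(..., N'U'). The sample inputs and the #Parsed table are hypothetical.

```sql
IF OBJECT_ID('tempdb..#Parsed') IS NOT NULL DROP TABLE #Parsed;

SELECT v.Input,
       COALESCE(PARSENAME(v.Input, 2), N'dbo') AS SchemaName,  -- assumes dbo when no schema is given
       PARSENAME(v.Input, 1) AS TableName,                     -- brackets are removed
       QUOTENAME(COALESCE(PARSENAME(v.Input, 2), N'dbo')) + N'.' + QUOTENAME(PARSENAME(v.Input, 1)) AS SafeName
INTO #Parsed
FROM (VALUES (N'TestEmp'), (N'[TestEmp]'), (N'dbo.TestEmp'), (N'[dbo].[TestEmp]'), (N'sales.Orders')) AS v (Input);

-- OBJECT_ID(name, N'U') returns NULL when no user table with that name exists.
SELECT Input, SchemaName, TableName, SafeName, OBJECT_ID(SafeName, N'U') AS UserTableId
FROM #Parsed;
```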
Method 1 - With the use of WHILE Loop
CREATE OR ALTER PROCEDURE [dbo].[Get_FillRate]
(
@p_TableName NVARCHAR(128),
@p_Include_BlankAsNotFilled BIT = 0 -- 0-OFF(Default); 1-ON(Blank As Not Filled Data)
)
AS
BEGIN
BEGIN TRY
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
--Parameter Sniffing: copy the parameters into local variables
DECLARE @TableName NVARCHAR(128),
@Include_BlankAsNotFilled BIT,
@ColumnName NVARCHAR(128),
@R_NO INT,
@DataType_Field BIT,
@i INT, --Iteration
@RESULT NVARCHAR(MAX);
SELECT @TableName = @p_TableName,
@Include_BlankAsNotFilled = @p_Include_BlankAsNotFilled,
@i = 1;
--To support some of the table name formats that users type.
SELECT @TableName =REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@TableName,'[',''),']',''),'dbo.',''),'...',''),'..',''),'.','');
--validation
IF NOT EXISTS(SELECT 1 FROM SYS.OBJECTS WHERE [TYPE]='U' AND [NAME]=@TableName )
BEGIN
SELECT Result = 1 , Reason ='Table not exists in this Database' ;
RETURN 1;
END;
--dropping temp table if exists - for debugging purpose
IF OBJECT_ID('TempDb..#Temp') IS NOT NULL
DROP TABLE #Temp;
IF OBJECT_ID('TempDb..#Columns') IS NOT NULL
DROP TABLE #Columns;
--temp table creations
CREATE TABLE #Temp
(
[R_NO] INT NOT NULL,
[ColumnName] NVARCHAR(128) NOT NULL,
[FillRate] DECIMAL(5,2) NOT NULL
PRIMARY KEY CLUSTERED (ColumnName)
);
CREATE TABLE #Columns
(
[R_NO] INT NOT NULL,
[Name] [sysname] NOT NULL,
[DataType_Field] BIT NOT NULL
PRIMARY KEY CLUSTERED ([Name])
);
INSERT INTO #Columns ([R_NO],[Name],[DataType_Field])
SELECT
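ROW_NUMBER() OVER (ORDER BY COLUMN_ID), --consecutive numbers even when dropped columns leave gaps in COLUMN_ID
[Name],
IIF(collation_name IS NULL,0,1)
FROM SYS.COLUMNS WHERE OBJECT_ID = OBJECT_ID(@TableName);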
WHILE @i <= ( SELECT MAX(R_NO) FROM #Columns) --Iterate until every column has been processed
BEGIN
SELECT @DataType_Field=DataType_Field,@ColumnName=[Name],@R_NO=[R_NO] FROM #Columns WHERE R_NO = @i;
SET @RESULT =
'INSERT INTO #Temp ([R_NO],[ColumnName], [FillRate]) ' +
'SELECT ' + QUOTENAME(@R_NO,CHAR(39)) + ',
''' + @ColumnName + ''',
CAST((100*(SUM(
CASE WHEN ' +
CASE
WHEN @Include_BlankAsNotFilled = 0
THEN '[' + @ColumnName + '] IS NOT NULL'
WHEN @DataType_Field = 0
THEN '[' + @ColumnName + '] IS NOT NULL'
ELSE 'ISNULL([' + @ColumnName + '],'''')<>'''' ' END +
' THEN 1 ELSE 0 END)*1.0 / COUNT(*)))
AS DECIMAL(5,2))
FROM ' + QUOTENAME(@TableName);
--PRINT(@RESULT); --for debug purpose
EXEC(@RESULT);
SET @i += 1; -- Incrementing Iteration Count
END;
--Final Result Set
SELECT
ColumnName AS [Column Name],
FillRate AS [Fill Rate (%)]
FROM #TEMP
ORDER BY [R_NO];
RETURN 0;
END TRY
BEGIN CATCH --error handling: return the error details as a result set
SELECT
ERROR_NUMBER() AS ErrorNumber
,ERROR_SEVERITY() AS ErrorSeverity
,ERROR_STATE() AS ErrorState
,ERROR_PROCEDURE() AS ErrorProcedure
,ERROR_LINE() AS ErrorLine
,ERROR_MESSAGE() AS ErrorMessage;
RETURN 1;
END CATCH;
END;
Execute the Method 1 stored procedure by passing the table name, as below.
To consider only NULL values as not filled:
EXEC [Get_FillRate] @p_TableName='TestEmp',@p_Include_BlankAsNotFilled=0;
To consider both NULL values and empty/blank values as not filled:
EXEC [Get_FillRate] @p_TableName='TestEmp',@p_Include_BlankAsNotFilled=1;
Method 1 -Output
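The expected result for the sample data can be worked out by hand from its 12 rows: Age has one NULL (11/12 = 91.67), Address has two NULLs and one blank (83.33 with @p_Include_BlankAsNotFilled = 0, 75.00 with 1), PhoneNo has seven NULLs and two blanks (41.67 vs. 25.00), Updated_dte is never set (0.00), and the NOT NULL columns TestEmp_Key, EmpName and Inserted_dte are 100.00 in both modes. The sketch below cross-checks that arithmetic with one set-based query over a hypothetical temp table #Sample holding the nullable columns; it is an independent check, not the procedure itself.

```sql
-- Hypothetical temp table holding the nullable columns of the 12 sample rows.
IF OBJECT_ID('tempdb..#Sample') IS NOT NULL DROP TABLE #Sample;
IF OBJECT_ID('tempdb..#Expected') IS NOT NULL DROP TABLE #Expected;
CREATE TABLE #Sample (Age INT NULL, [Address] VARCHAR(100) NULL, PhoneNo VARCHAR(11) NULL, Updated_dte DATETIME NULL);
INSERT INTO #Sample (Age, [Address], PhoneNo) VALUES
(24,'xxxyyy','1234567890'), (22,'zzzyyy',NULL), (24,'aaa',''), (25,'bbb','1234567890'),
(21,'',NULL), (20,'ddd',NULL), (30,'eee',''), (26,NULL,'1234567890'),
(25,'ggg',NULL), (25,'ooo',NULL), (25,'www',NULL), (NULL,NULL,NULL);

-- COUNT(col) skips NULLs; COUNT(NULLIF(col, '')) also skips blanks.
-- Age and Updated_dte have no collation, so they use the NULL-only count in both modes, as the procedures do.
SELECT v.ColumnName,
       CAST(100.0 * v.FilledNullOnly / t.Total AS DECIMAL(5,2)) AS FillRate_Blank0, -- @p_Include_BlankAsNotFilled = 0
       CAST(100.0 * v.FilledNonBlank / t.Total AS DECIMAL(5,2)) AS FillRate_Blank1  -- @p_Include_BlankAsNotFilled = 1
INTO #Expected
FROM (SELECT COUNT(*) AS Total,
             COUNT(Age) AS Age_nn,
             COUNT([Address]) AS Address_nn, COUNT(NULLIF([Address], '')) AS Address_nb,
             COUNT(PhoneNo) AS Phone_nn, COUNT(NULLIF(PhoneNo, '')) AS Phone_nb,
             COUNT(Updated_dte) AS Updated_nn
      FROM #Sample) AS t
CROSS APPLY (VALUES
    (N'Age',         t.Age_nn,     t.Age_nn),
    (N'Address',     t.Address_nn, t.Address_nb),
    (N'PhoneNo',     t.Phone_nn,   t.Phone_nb),
    (N'Updated_dte', t.Updated_nn, t.Updated_nn)
) AS v (ColumnName, FilledNullOnly, FilledNonBlank);

SELECT ColumnName, FillRate_Blank0, FillRate_Blank1 FROM #Expected;
```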
Method 2 - With the use of UNION ALL
CREATE OR ALTER PROCEDURE [dbo].[Get_FillRate]
(
@p_TableName NVARCHAR(128),
@p_Include_BlankAsNotFilled BIT = 0 -- 0-OFF(Default); 1-ON(Blank As Not Filled Data)
)
AS
BEGIN
BEGIN TRY
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
--Parameter Sniffing: copy the parameters into local variables
DECLARE @TableName NVARCHAR(128),
@Include_BlankAsNotFilled BIT,
@RESULT NVARCHAR(MAX);
SELECT @TableName = @p_TableName,
@Include_BlankAsNotFilled = @p_Include_BlankAsNotFilled,
@RESULT = '';
--To support some of the table name formats that users type.
SELECT @TableName =REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@TableName,'[',''),']',''),'dbo.',''),'...',''),'..',''),'.','');
--validation
IF NOT EXISTS(SELECT 1 FROM SYS.OBJECTS WHERE [TYPE]='U' AND [NAME]=@TableName )
BEGIN
SELECT Result = 1 , Reason ='Table not exists in this Database' ;
RETURN 1;
END;
--dropping temp table if exists - for debugging purpose
IF OBJECT_ID('TempDb..#Columns') IS NOT NULL
DROP TABLE #Columns;
--temp table creations
CREATE TABLE #Columns
(
[ORDINAL_POSITION] INT NOT NULL,
[COLUMN_NAME] [sysname] NOT NULL,
[DataType_Field] BIT NOT NULL,
[TABLE_NAME] [sysname] NOT NULL
PRIMARY KEY CLUSTERED ([ORDINAL_POSITION],[COLUMN_NAME])
);
INSERT INTO #Columns ([ORDINAL_POSITION],[COLUMN_NAME],[DataType_Field],[TABLE_NAME])
SELECT
[ORDINAL_POSITION],
[COLUMN_NAME],
CASE WHEN COLLATION_NAME IS NOT NULL THEN 1 ELSE 0 END,
[TABLE_NAME]
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME =@TableName; --Using System_View
--Final Result Set
SELECT @RESULT = @RESULT+ N'SELECT '''+C.COLUMN_NAME+''' AS [Column Name],
CAST((100*(SUM(
CASE WHEN ' +
CASE
WHEN @Include_BlankAsNotFilled = 0
THEN '[' + C.COLUMN_NAME + '] IS NOT NULL'
WHEN C.[DataType_Field]=0
THEN '[' + C.COLUMN_NAME + '] IS NOT NULL'
ELSE 'ISNULL([' + C.COLUMN_NAME + '],'''')<>'''' ' END +
' THEN 1 ELSE 0 END)*1.0 / COUNT(*)))
AS DECIMAL(5,2)) AS [Fill Rate (%)]
FROM '+QUOTENAME(C.TABLE_NAME)+' UNION ALL '
FROM #Columns C;
SET @RESULT=LEFT(@RESULT,LEN(@RESULT)-10); --To omit the last ' UNION ALL '.
--PRINT(@RESULT); --for debug purpose
EXEC(@RESULT);
RETURN 0;
END TRY
BEGIN CATCH --error handling: return the error details as a result set
SELECT
ERROR_NUMBER() AS ErrorNumber
,ERROR_SEVERITY() AS ErrorSeverity
,ERROR_STATE() AS ErrorState
,ERROR_PROCEDURE() AS ErrorProcedure
,ERROR_LINE() AS ErrorLine
,ERROR_MESSAGE() AS ErrorMessage;
RETURN 1;
END CATCH;
END;
Execute the Method 2 stored procedure by passing the table name, as below.
To consider only NULL values as not filled:
EXEC [Get_FillRate] @p_TableName='TestEmp',@p_Include_BlankAsNotFilled=0;
To consider both NULL values and empty/blank values as not filled:
EXEC [Get_FillRate] @p_TableName='TestEmp',@p_Include_BlankAsNotFilled=1;
Method 2 -Output
Metrics: Method 1 vs. Method 2
The following four metrics were compared between Method 1 and Method 2:
No. of Query Sets in Exec Query Plan
Total CPU Time (in ms)
Total Elapsed Time (in ms)
Total Logical Reads
In conclusion, we have seen how to find the fill rate of a table using T-SQL queries that can run on both Azure SQL and on-premises SQL Server databases. This helps us make business decisions effectively and quickly.

Related

Retrieve Max loaded date across all tables on a DB

Output I'm trying to get:
(Database name = ATT)
Table Name
Column name
MAX loaded date = MAX(loaded_date) for this column only
loaded_date is a column in around 50 tables in a database with the same name and datatype (Datetime)
select * FROM sys.tables
select * FROM syscolumns
I've been exploring the system tables without much luck. From some posts, it looks like this may need dynamic SQL, which I've never used.
You can write SQL that writes SQL:
SELECT REPLACE(
'select ''{tn}'' as table_name, max(loaded_date) as ld from {tn} union all'
,'{tn}',table_name)
FROM
information_schema.columns
WHERE
column_name = 'loaded_date'
Run that, then copy all but the final UNION ALL from the results window into the query window, and run it again.
If you want all of this in a single string for dynamic execution, I guess a procedure would contain something like this (untested):
DECLARE @x NVARCHAR(MAX);
SELECT @x =
STRING_AGG(
CAST(REPLACE(
N'select ''{tn}'' as table_name, max(loaded_date) as ld from {tn}'
,N'{tn}',QUOTENAME(table_schema) + N'.' + QUOTENAME(table_name)) AS NVARCHAR(MAX)) -- NVARCHAR(MAX) so the result isn't capped at 4000 characters
,N' union all ')
FROM
information_schema.columns
WHERE
column_name = 'loaded_date';
EXECUTE sp_executesql @x;
If your SQL Server is older than 2017 and doesn't have STRING_AGG, it's a bit more awkward, but there are many examples of turning rows into CSV in SQL Server that use STUFF..FOR XML PATH: https://duckduckgo.com/?t=ffab&q=rows+to+CSV+SQLS&ia=web
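For versions without STRING_AGG, a sketch of the STUFF ... FOR XML PATH pattern applied to this query: each matching column contributes one ' union all select ...' fragment, FOR XML PATH('') concatenates the fragments, TYPE with .value() undoes the XML escaping of characters such as <, > and &, and STUFF removes the leading ' union all ' (11 characters). The results are captured into a hypothetical temp table #MaxLoaded with INSERT ... EXEC. Like the answer above, it assumes every loaded_date column is a datetime.

```sql
IF OBJECT_ID('tempdb..#MaxLoaded') IS NOT NULL DROP TABLE #MaxLoaded;
CREATE TABLE #MaxLoaded (table_name NVARCHAR(300) NOT NULL, ld DATETIME NULL);

DECLARE @x NVARCHAR(MAX);

SELECT @x = STUFF((
    SELECT N' union all select N'''
         + REPLACE(QUOTENAME(TABLE_SCHEMA) + N'.' + QUOTENAME(TABLE_NAME), N'''', N'''''')  -- name as a string literal
         + N''' as table_name, max(loaded_date) as ld from '
         + QUOTENAME(TABLE_SCHEMA) + N'.' + QUOTENAME(TABLE_NAME)
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE COLUMN_NAME = N'loaded_date'
    ORDER BY TABLE_SCHEMA, TABLE_NAME
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 11, N'');  -- strip the first ' union all '

IF @x IS NOT NULL  -- no matching columns means there is nothing to run
    INSERT INTO #MaxLoaded (table_name, ld)
    EXEC sys.sp_executesql @x;

SELECT table_name, ld FROM #MaxLoaded ORDER BY table_name;
```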
I wrote a more permanent type of script that does this. It returns a result set listing the tables in the current database that have a column named loaded_date, along with the MAX(loaded_date) from each table. The script loops through those tables, queries each one individually, and keeps track of the max value for each table in a table variable. It also has a @Debug variable that lets you see the text of the queries that would be run instead of actually running them, and it raises a custom error message to help troubleshoot any issues.
/*disable row count messages*/
SET NOCOUNT ON;
/*set to 1 to debug (aka just print queries instead of running)*/
DECLARE @Debug bit = 0;
/*get list of tables to query and assign a unique index to row to assist in looping*/
DECLARE @TableList TABLE(
SchemaAndTableName nvarchar(257) NOT NULL
,OrderToQuery bigint NOT NULL
,MaxLoadedDate datetime NULL
,PRIMARY KEY (OrderToQuery)
);
INSERT INTO @TableList (SchemaAndTableName,OrderToQuery)
SELECT
CONCAT(QUOTENAME(s.name),N'.', QUOTENAME(t.name)) AS SchemaAndTableName
,ROW_NUMBER() OVER(ORDER BY s.name, t.name) AS OrderToQuery
FROM
sys.columns AS c
INNER JOIN sys.tables AS t ON c.object_id = t.object_id
INNER JOIN sys.schemas AS s ON t.schema_id = s.schema_id
WHERE
c.name = N'loaded_date';
/*declare and set some variables for loop*/
DECLARE @NumTables int = (SELECT TOP (1) OrderToQuery FROM @TableList ORDER BY OrderToQuery DESC);
DECLARE @I int = 1;
DECLARE @CurMaxDate datetime;
DECLARE @CurTable nvarchar(257);
DECLARE @CurQuery nvarchar(max);
/*start loop*/
WHILE @I <= @NumTables
BEGIN
/*build text of current query*/
SET @CurTable = (SELECT SchemaAndTableName FROM @TableList WHERE OrderToQuery = @I);
SET @CurQuery = CONCAT(N'SELECT @MaxDateOut = MAX(loaded_date) FROM ', @CurTable, N';');
/*check debugging status*/
IF @Debug = 0
BEGIN
BEGIN TRY
EXEC sys.sp_executesql @stmt = @CurQuery
,@params = N'@MaxDateOut datetime OUTPUT'
,@MaxDateOut = @CurMaxDate OUTPUT;
END TRY
BEGIN CATCH
DECLARE @ErrorMessage nvarchar(max) = CONCAT(
N'Error querying table ', @CurTable, N'.', NCHAR(13), NCHAR(10)
,N'Errored query: ', NCHAR(13), NCHAR(10), @CurQuery, NCHAR(13), NCHAR(10)
,N'Error message: ', ERROR_MESSAGE()
);
RAISERROR(@ErrorMessage,16,1) WITH NOWAIT;
/*on error end loop so error can be investigated*/
SET @I = @NumTables + 1;
END CATCH;
END;
ELSE /*currently debugging*/
BEGIN
PRINT(CONCAT(N'Debug output: ', @CurQuery));
END;
/*update value in our table variable*/
UPDATE @TableList
SET MaxLoadedDate = @CurMaxDate
WHERE
OrderToQuery = @I;
/*increment loop*/
SET @I = @I + 1;
END;
SELECT
SchemaAndTableName AS TableName
,MaxLoadedDate AS Max_Loaded_date
FROM
@TableList;
I like this solution better, because querying each table one at a time has much less system impact than one large UNION ALL query. Querying a large set of tables all at once could cause serious resource semaphore or locking contention (depending on how your database is used).
It is fairly well commented, but let me know if something is not clear.
Also, just a note, dynamic SQL should be used as a last resort. I provided this script to answer your question, but you should explore better options than something like this.
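You can use the undocumented stored procedure sp_MSforeachtable, but don't use it in production code, as it might not be available in future versions. Note that it runs the command against every user table, so tables without a loaded_date column will raise an error.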
Read more on sp_MSforeachtable
EXEC sp_MSforeachtable 'SELECT ''?'' as tablename, max(loaded_Date) FROM ?'

Find all unique values of a column name in a SQL database

We are building a large database using SQL. Every table in the database has many columns but one of the columns in the tables tells who added the row of data. That value "Name of person" is tied to a variable in SSIS. Again, the variable tells who added the row. How can I create a query to pull back all the names in that column, no matter where it is used in the database. The value of the column is different depending on the day.
RE: Every table has the same <Column_Name> ... a query to pull back all the values of that column, no matter where it is used in the database.
If you have a large database and every table has the same column <Column_Name>, then you will be pulling a value from every row in every table of a very large database. I'm not sure that is what you want, but it can be done easily. The following will work even if <column_name> is only in a few tables.
Grab a list of every schema.table that contains <column_name>, then loop over it to get the values of <column_name>. The following should work:
DECLARE @colname sysname = '<Column_name>' -- just in case it is not in every table
-- capture name of every table here
CREATE TABLE #tablename ( schema_name sysname, table_name sysname, Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED)
INSERT INTO #tablename (schema_name, table_name)
SELECT s.name schemaname, t.name tablename FROM sys.columns c
INNER JOIN sys.tables t ON t.object_id = c.object_id
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE c.name = @colname
-- capture result of query here
CREATE TABLE #result ( schema_name sysname, table_name sysname, column_value VARCHAR(100) )
DECLARE @i INT = 1, @imax INT
SELECT @imax = MAX(Id) FROM #tablename
-- loop over tablename
DECLARE @query NVARCHAR(MAX)
WHILE @i <= @imax
BEGIN
-- schema and table names go in as string literals; the column itself is read from each table
SELECT @query = N'SELECT DISTINCT ' + QUOTENAME(schema_name, '''') + N', ' + QUOTENAME(table_name, '''') + N', ' + QUOTENAME(@colname) + N' FROM ' + QUOTENAME(schema_name) + N'.' + QUOTENAME(table_name)
FROM #tablename WHERE Id = @i
INSERT INTO #result ( schema_name, table_name, column_value)
EXEC sp_executesql @query
SET @i += 1
END
-- every unique value, no matter which table it came from
SELECT DISTINCT column_value FROM #result

SQL Cursor to use Table and Field Names from Temp Table

I'll preface this by letting you all know that I promised myself a few years ago never to use a cursor in SQL where it's not needed. Unfortunately I think I may have to use one in my current situation but it's been so long that I'm struggling to remember the correct syntax.
Basically, I've got a problem with CONVERT_IMPLICIT happening in queries because I have data types that are different for the same field in different tables so I'd like to eventually convert these to int. But to do this I need to check whether all data can be converted to int or not to see how big the job is.
I've got the query below, which gives me a list of all tables in the database that contain the relevant field:
IF OBJECT_ID('tempdb..#BaseData') IS NOT NULL DROP TABLE #BaseData
GO
CREATE TABLE #BaseData (Table_Name varchar(100), Field_Name varchar(100), Data_Type_Desc varchar(20), Data_Max_Length int, Convertible bit)
DECLARE @FieldName varchar(20); SET @FieldName = 'TestFieldName'
INSERT INTO #BaseData (Table_Name, Field_Name, Data_Type_Desc, Data_Max_Length)
SELECT
o.name ,c.name ,t.name ,t.max_length
FROM sys.columns c
JOIN sys.types t
ON c.user_type_id = t.user_type_id
JOIN sys.objects o
ON c.object_id = o.object_id
WHERE c.name LIKE '%' + #FieldName + '%'
AND o.type_desc = 'USER_TABLE'
Which gives results like this:
Table_Name Field_Name Data_Type_Desc Data_Max_Length Convertible
Table1 TestFieldName varchar 8000 NULL
Table2 TestFieldName nvarchar 8000 NULL
Table3 TestFieldName int 4 NULL
Table4 TestFieldName varchar 8000 NULL
Table5 TestFieldName varchar 8000 NULL
What I'd like to do is check whether all data in the relevant table and field can be converted to an int, and update the 'Convertible' field (1 if there's data that can't be converted, 0 if the data is fine). I've got the following calculation, which works perfectly fine:
'SELECT
CASE
WHEN COUNT(' + @FieldName + ') - SUM(ISNUMERIC(' + @FieldName + ')) > 0
THEN 1
ELSE 0
END
FROM ' + @TableName
It gives the result I'm after, but I'm struggling to get the correct syntax for a cursor that will look at each row in my temp table and run this SQL accordingly. It then needs to update the final column of the temp table with the result of the query (1 or 0).
This will have to be run on a couple of hundred databases which is why I need this list to be dynamic, there may well be custom tables in some databases (in fact, it's pretty likely).
If anybody could give any guidance it would be greatly appreciated.
Thanks
I made a couple of changes to your original query but here is something that should work. I have done similar things in the past :-)
Changes:
Added schema to the source table - my test database had matches in multiple schemas
Changed data types to sysname and smallint to match the catalog definitions; otherwise names could get truncated
IF OBJECT_ID('tempdb..#BaseData') IS NOT NULL DROP TABLE #BaseData;
GO
CREATE TABLE #BaseData (Schema_Name sysname, Table_Name sysname, Field_Name sysname, Data_Type_Desc sysname, Data_Max_Length smallint, Convertible bit);
DECLARE @FieldName varchar(20); SET @FieldName = 'TestFieldName';
INSERT INTO #BaseData (Schema_Name, Table_Name, Field_Name, Data_Type_Desc, Data_Max_Length)
SELECT
s.name, o.name ,c.name ,t.name ,t.max_length
FROM sys.columns c
JOIN sys.types t
ON c.user_type_id = t.user_type_id
JOIN sys.objects o
ON c.object_id = o.object_id
JOIN sys.schemas s ON o.schema_id=s.schema_id
WHERE c.name LIKE '%' + #FieldName + '%'
AND o.type_desc = 'USER_TABLE';
--select * from #BaseData;
DECLARE @sName sysname,
@tName sysname,
@fName sysname,
@sql NVARCHAR(MAX);
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
SELECT Schema_Name,
Table_Name,
Field_Name
FROM #BaseData;
OPEN c;
FETCH NEXT FROM c INTO @sName, @tName, @fName;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = 'UPDATE #BaseData SET Convertible =
(SELECT
CASE
WHEN COUNT(' + QUOTENAME(@fName) + ') - SUM(ISNUMERIC(' + QUOTENAME(@fName) + ')) > 0
THEN 1
ELSE 0
END Convertible
FROM ' + QUOTENAME(@sName) + '.' + QUOTENAME(@tName) + ')
FROM #BaseData WHERE Schema_Name = ''' + @sName + ''' AND Table_Name = ''' + @tName + ''' AND Field_Name = ''' + @fName + '''';
--select @sql;
EXEC(@sql);
FETCH NEXT FROM c INTO @sName, @tName, @fName;
END
CLOSE c;
DEALLOCATE c;
select *
from #BaseData;
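A caveat on the ISNUMERIC check used in the question and in this answer: ISNUMERIC returns 1 for values that still fail to convert to int, such as '1e5' (a valid float), '$' (a currency symbol) and '12.5'. On SQL Server 2012 or later, TRY_CONVERT gives a stricter test. The sketch below compares the two on a few hypothetical values; the final comment shows how the per-column expression from the cursor loop could be written with it.

```sql
IF OBJECT_ID('tempdb..#Check') IS NOT NULL DROP TABLE #Check;

-- Hypothetical sample values; only '123' actually converts to INT.
SELECT v.val,
       ISNUMERIC(v.val) AS IsNumericResult,
       CASE WHEN TRY_CONVERT(INT, v.val) IS NULL THEN 0 ELSE 1 END AS ConvertsToInt
INTO #Check
FROM (VALUES ('123'), ('1e5'), ('$'), ('12.5'), ('abc')) AS v (val);

SELECT val, IsNumericResult, ConvertsToInt FROM #Check;

-- Per-column check with TRY_CONVERT (1 = some non-NULL value will not convert), using the cursor variables above:
-- 'SELECT CASE WHEN COUNT(' + QUOTENAME(@fName) + ') > COUNT(TRY_CONVERT(INT, ' + QUOTENAME(@fName) + ')) THEN 1 ELSE 0 END FROM ' + QUOTENAME(@sName) + '.' + QUOTENAME(@tName)
```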
If I understand your question, I would do something like this to identify those records that do not cast as an [int].
You didn't state which version of SQL Server you're using; TRY_CAST and TRY_CONVERT are 2012 or later.
DECLARE @test AS TABLE ( [field] [sysname] );
INSERT INTO @test
( [field] )
VALUES ( N'1' ),
( N'a' );
SELECT [field]
FROM @test
WHERE TRY_CAST([field] AS [INT]) IS NULL;
-- this is the basic sql syntax for a cursor
-- CURSOR documentation: https://msdn.microsoft.com/en-us/library/ms180169.aspx
DECLARE @parameter [sysname];
BEGIN
DECLARE [field_cursor] CURSOR
FOR
SELECT [value]
FROM [<schema>].[<table>];
OPEN [field_cursor];
FETCH NEXT FROM [field_cursor] INTO @parameter;
WHILE @@FETCH_STATUS = 0
BEGIN
-- do something really interesting here
FETCH NEXT FROM [field_cursor] INTO @parameter;
END;
CLOSE [field_cursor];
DEALLOCATE [field_cursor];
END;
I wasn't able to test this but it should do what you're looking for. Just plop this in after you create your temp table:
DECLARE @tName SYSNAME,
@fName SYSNAME,
@sql NVARCHAR(MAX),
@NotConvertible BIT;
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
SELECT Table_Name,
Field_Name
FROM #BaseData;
OPEN c;
FETCH NEXT FROM c INTO @tName, @fName;
WHILE @@FETCH_STATUS = 0
BEGIN
-- COUNT and SUM are aggregates, so the check has to run as a query against the source table
SET @sql = N'SELECT @flag = CASE WHEN COUNT(' + QUOTENAME(@fName) + N') - SUM(ISNUMERIC(' + QUOTENAME(@fName) + N')) > 0 THEN 1 ELSE 0 END FROM ' + QUOTENAME(@tName) + N';';
EXEC sys.sp_executesql @sql, N'@flag BIT OUTPUT', @flag = @NotConvertible OUTPUT;
-- write the result back to the temp table row for this table/field
UPDATE #BaseData SET Convertible = @NotConvertible WHERE Table_Name = @tName AND Field_Name = @fName;
FETCH NEXT FROM c INTO @tName, @fName;
END
CLOSE c;
DEALLOCATE c;

Search SQL DB's For A Specific Word

I am completely new to SQL and have no experience whatsoever with it, so please bear with me on this question.
I need to know if it is possible to search a SQL database for a specific word and if so how?
We are currently going through a rebranding project and I need to look in our CMS (Content Management System) database for all reference to an email address. All I need to search for is:
.co.uk
Below is a screenshot of the database in question with all the tables it contains. I just can't get my head around SQL, and I have had no joy on Google trying to find the answer.
I need to search everything in this database but I don't know what tables, views, column names etc the content sits in as it's all spread across them all.
There are other tables I need to search but hopefully an answer will be provided which I can modify to search these.
Databases aren't really meant for such vague searches; you should have some definition, model, or requirement spec that describes where values like that could exist.
But of course, you could opt for an insanely slow method of doing it by using dynamic SQL.
I wrote this quickly and only tested it briefly, but it should work:
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#SEARCHTABLE') IS NOT NULL
DROP TABLE #SEARCHTABLE
IF OBJECT_ID('tempdb..#RESULTS') IS NOT NULL
DROP TABLE #RESULTS
CREATE TABLE #SEARCHTABLE (ROWNUM INT IDENTITY(1,1), SEARCHCLAUSE VARCHAR(2000) COLLATE DATABASE_DEFAULT)
INSERT INTO #SEARCHTABLE (SEARCHCLAUSE)
SELECT 'SELECT TOP 1 '''+TAB.name+''', '''+C.name+'''
FROM ['+S.name+'].['+TAB.name+']
WHERE '
+CASE WHEN T.name <> 'xml'
THEN '['+C.name+'] LIKE ''%.co.uk%'' AND ['+C.name+'] LIKE ''%@%'''
ELSE 'CAST(['+C.name+'] AS VARCHAR(MAX)) LIKE ''%.co.uk%'' AND CAST(['+C.name+'] AS VARCHAR(MAX)) LIKE ''%@%'''
END AS SEARCHCLAUSE
FROM sys.tables TAB
JOIN sys.schemas S on S.schema_id = TAB.schema_id
JOIN sys.columns C on C.object_id = TAB.object_id
JOIN sys.types T on T.user_type_id = C.user_type_id
WHERE TAB.type_desc = 'USER_TABLE'
AND (T.name LIKE '%char%' OR
T.name LIKE '%xml%')
AND CASE WHEN C.max_length = -1 THEN 10 ELSE C.max_length END >= 6 -- To only search through sufficiently long column
CREATE TABLE #RESULTS (ROWNUM INT IDENTITY(1,1), TABLENAME VARCHAR(256) COLLATE DATABASE_DEFAULT, COLNAME VARCHAR(256) COLLATE DATABASE_DEFAULT)
DECLARE @ROWNUM_NOW INT, @ROWNUM_MAX INT, @SQLCMD VARCHAR(2000), @STATUSSTRING VARCHAR(256)
SELECT @ROWNUM_NOW = MIN(ROWNUM), @ROWNUM_MAX = MAX(ROWNUM) FROM #SEARCHTABLE
WHILE @ROWNUM_NOW <= @ROWNUM_MAX
BEGIN
SELECT @SQLCMD = SEARCHCLAUSE FROM #SEARCHTABLE WHERE ROWNUM = @ROWNUM_NOW
INSERT INTO #RESULTS
EXEC(@SQLCMD)
SET @STATUSSTRING = CAST(@ROWNUM_NOW AS VARCHAR(25))+'/'+CAST(@ROWNUM_MAX AS VARCHAR(25))+', time: '+CONVERT(VARCHAR, GETDATE(), 120)
RAISERROR(@STATUSSTRING, 10, 1) WITH NOWAIT
SELECT @ROWNUM_NOW = @ROWNUM_NOW + 1
END
SET NOCOUNT ON
SELECT 'This table and column contains strings ".co.uk" and an "@"' INFORMATION, TABLENAME, COLNAME FROM #RESULTS
-- Uncomment to drop the created temp tables
--IF OBJECT_ID('tempdb..#SEARCHTABLE') IS NOT NULL
-- DROP TABLE #SEARCHTABLE
--IF OBJECT_ID('tempdb..#RESULTS') IS NOT NULL
-- DROP TABLE #RESULTS
What it does, it search the DB for all user-created tables with their schemas, which have (n)char/(n)varchar/xml columns of a sufficient length, and search each of them one by one until at least one match is found, then it moves to the next one on the list. Match is defined as any string or XML cast as string, which contains the text ".co.uk" and an "#"-sign somewhere in there.
It will show the progress of the script (how many searchable TABLE.COLUMN combinations are have been found and which one on that list is currently running, as well as the current timestamps down to seconds) on the messages tab. When ready, it will show you all the tables and column names that contained at least one match.
So from that list, you'll have to search through the tables and columns manually to find exactly how many and what kinds of matches there are, and what it is you actually want to do.
Edit: Again, I didn't bother using the sysname type for the system object names, but I'll modify it later if needed.
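The match rule described above (text containing both ".co.uk" and "@", with XML cast to a string first) can be tried on its own. This is a minimal sketch using a hypothetical temp table #Contacts invented for illustration; it is not the SEARCHCLAUSE the script generates (that part is not shown here), just the same kind of predicate applied to one NVARCHAR column and one XML column.

```sql
-- Hypothetical sample data: one NVARCHAR column and one XML column
IF OBJECT_ID('tempdb..#Contacts') IS NOT NULL DROP TABLE #Contacts;
CREATE TABLE #Contacts (Id INT PRIMARY KEY, Email NVARCHAR(200), Notes XML);

INSERT INTO #Contacts (Id, Email, Notes) VALUES
    (1, N'alice@example.co.uk', NULL),
    (2, N'bob@example.com',     N'<n>contact carol@shop.co.uk</n>'),
    (3, N'no address here',     N'<n>nothing</n>');

-- A row matches when the text contains both ".co.uk" and "@".
-- LIKE cannot be applied to XML directly, so it is cast to NVARCHAR(MAX) first.
SELECT 'Email' AS ColName, COUNT(*) AS Matches
FROM #Contacts
WHERE Email LIKE N'%.co.uk%' AND Email LIKE N'%@%'
UNION ALL
SELECT 'Notes', COUNT(*)
FROM #Contacts
WHERE CAST(Notes AS NVARCHAR(MAX)) LIKE N'%.co.uk%'
  AND CAST(Notes AS NVARCHAR(MAX)) LIKE N'%@%';
```

Here only row 1 matches in Email and only row 2 matches in Notes. The script itself only needs to know whether a column has any match, so it can stop at the first one instead of counting.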
I threw together a quick query that seems to work for me:
--Search for a word in the current database
SET NOCOUNT ON;
--First make a hit list of possible tables/ columns
DECLARE @HitList TABLE (
Id INT IDENTITY(1,1) PRIMARY KEY,
TableName VARCHAR(255),
SchemaName VARCHAR(255),
ColumnName VARCHAR(255));
INSERT INTO
@HitList (
TableName,
SchemaName,
ColumnName)
SELECT
t.name,
s.name,
c.name
FROM
sys.tables t
INNER JOIN sys.columns c ON c.object_id = t.object_id
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE
c.system_type_id = 167;
--Construct Dynamic SQL
DECLARE @Id INT = 1;
DECLARE @Count INT;
SELECT @Count = COUNT(*) FROM @HitList;
DECLARE @DynamicSQL VARCHAR(1024);
WHILE @Id <= @Count
BEGIN
DECLARE @TableName VARCHAR(255);
DECLARE @SchemaName VARCHAR(255);
DECLARE @ColumnName VARCHAR(255);
SELECT @TableName = TableName FROM @HitList WHERE Id = @Id;
SELECT @SchemaName = SchemaName FROM @HitList WHERE Id = @Id;
SELECT @ColumnName = ColumnName FROM @HitList WHERE Id = @Id;
--QUOTENAME brackets each identifier (and escapes any "]" inside it)
SELECT @DynamicSQL = 'SELECT * FROM ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + ' WHERE ' + QUOTENAME(@ColumnName) + ' LIKE ''%co.uk%''';
--PRINT @DynamicSQL;
EXECUTE (@DynamicSQL);
IF @@ROWCOUNT != 0
BEGIN
PRINT 'We have a hit in ' + @TableName + '.' + @ColumnName + '!!';
END;
SELECT @Id = @Id + 1;
END;
Basically, it makes a list of all VARCHAR columns and then performs a search on each one. If you have Unicode text columns you might need to include NVARCHARs as well: system type id 167 is VARCHAR and 231 is NVARCHAR, so change the test to c.system_type_id IN (167, 231) (changing it to just 231 would search only the NVARCHAR columns). When you run this from Management Studio, switch to the Messages pane to see the hits and just ignore the results.
It will be slow if your database is of any real size, but that is to be expected.
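If you want VARCHAR and NVARCHAR columns searched in the same pass, and only need to know which columns contain at least one hit, the loop can be tightened. A minimal sketch under those assumptions: the search pattern and the @Hits table are illustrative, QUOTENAME brackets the identifiers, the pattern is passed to sp_executesql as a parameter instead of being pasted into the string, and EXISTS lets each column search stop at its first matching row. Hits are collected into a result set instead of being PRINTed.

```sql
SET NOCOUNT ON;

DECLARE @Search NVARCHAR(100) = N'%co.uk%';  -- illustrative search pattern
DECLARE @HitList TABLE (Id INT IDENTITY(1,1) PRIMARY KEY,
                        SchemaName SYSNAME, TableName SYSNAME, ColumnName SYSNAME);
DECLARE @Hits TABLE (SchemaName SYSNAME, TableName SYSNAME, ColumnName SYSNAME);

-- 167 = varchar, 231 = nvarchar: include both instead of swapping one for the other
INSERT INTO @HitList (SchemaName, TableName, ColumnName)
SELECT s.name, t.name, c.name
FROM sys.tables t
INNER JOIN sys.columns c ON c.object_id = t.object_id
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE c.system_type_id IN (167, 231);

DECLARE @Id INT = 1, @Count INT = (SELECT COUNT(*) FROM @HitList);
DECLARE @SchemaName SYSNAME, @TableName SYSNAME, @ColumnName SYSNAME,
        @Sql NVARCHAR(MAX), @Found BIT;

WHILE @Id <= @Count
BEGIN
    SELECT @SchemaName = SchemaName, @TableName = TableName, @ColumnName = ColumnName
    FROM @HitList WHERE Id = @Id;

    -- EXISTS stops at the first matching row; the pattern is a real parameter
    SET @Sql = N'SET @Found = CASE WHEN EXISTS (SELECT 1 FROM '
             + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName)
             + N' WHERE ' + QUOTENAME(@ColumnName) + N' LIKE @Search) THEN 1 ELSE 0 END;';

    EXEC sys.sp_executesql @Sql,
         N'@Search NVARCHAR(100), @Found BIT OUTPUT',
         @Search = @Search, @Found = @Found OUTPUT;

    IF @Found = 1
        INSERT INTO @Hits (SchemaName, TableName, ColumnName)
        VALUES (@SchemaName, @TableName, @ColumnName);

    SET @Id += 1;
END;

-- One row per column that contains at least one match
SELECT SchemaName, TableName, ColumnName FROM @Hits;
```

It is still one query per column, so it will not be fast on a large database; it just avoids returning every matching row.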

SQL to return first two columns of a table

Is there any SQL lingo to return JUST the first two columns of a table WITHOUT knowing the field names?
Something like
SELECT Column(1), Column(2) FROM Table_Name
Or do I have to go the long way around and find out the column names first? How would I do that?
You have to get the column names first. Most platforms support this:
select column_name,ordinal_position
from information_schema.columns
where table_schema = ...
and table_name = ...
and ordinal_position <= 2
Here it is:
declare @select varchar(max)
set @select = 'select '
select @select=@select+COLUMN_NAME+','
from information_schema.columns
where table_name = 'TABLE' and ordinal_position <= 2
set @select=LEFT(@select,LEN(@select)-1)+' from TABLE'
exec(@select)
A dynamic query using for xml path will also do the job:
declare @sql varchar(max)
set @sql = (SELECT top 2 COLUMN_NAME + ',' from information_schema.columns where table_name = 'YOUR_TABLE_NAME_HERE' order by ordinal_position for xml path(''))
set @sql = (SELECT replace(@sql +' ',', ',''))
exec('SELECT ' + @sql + ' from YOUR_TABLE_NAME_HERE')
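On SQL Server 2017 and later, STRING_AGG can build the column list directly, which removes the trailing-comma cleanup from the FOR XML PATH version. A minimal sketch, assuming a hypothetical demo table dbo.FirstTwoDemo created only so the example runs on its own; point @Schema and @Table at your own table instead. QUOTENAME brackets each name, so a column such as [Full Name] still works.

```sql
-- Hypothetical demo table so the sketch is self-contained (SQL Server 2017+)
DROP TABLE IF EXISTS dbo.FirstTwoDemo;
CREATE TABLE dbo.FirstTwoDemo (Id INT, [Full Name] NVARCHAR(50), Created DATE);
INSERT INTO dbo.FirstTwoDemo (Id, [Full Name], Created) VALUES (1, N'Ann', '2020-01-01');

DECLARE @Schema SYSNAME = N'dbo', @Table SYSNAME = N'FirstTwoDemo';
DECLARE @Cols NVARCHAR(MAX), @Sql NVARCHAR(MAX);

-- Bracketed, comma-separated list in ordinal order; no trailing comma to trim
SELECT @Cols = STRING_AGG(QUOTENAME(COLUMN_NAME), N', ')
               WITHIN GROUP (ORDER BY ORDINAL_POSITION)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @Schema
  AND TABLE_NAME = @Table
  AND ORDINAL_POSITION <= 2;

SET @Sql = N'SELECT ' + @Cols + N' FROM '
         + QUOTENAME(@Schema) + N'.' + QUOTENAME(@Table) + N';';
EXEC sys.sp_executesql @Sql;
```

For the demo table the generated statement is SELECT [Id], [Full Name] FROM [dbo].[FirstTwoDemo];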
I wrote a stored procedure a while back to do this exact job. Even though relational theory has no notion of column order, SQL Server is not completely relational: it stores the order in which the columns were created and assigns each one an ID. SELECT * follows that order, which is why your SELECT statements appear to return the columns in the same order each time. In practice it's never a good idea to use SELECT * for anything, as it doesn't lock down the result order in terms of columns or rows. That said, I think people get so stuck on "you shouldn't do this" that they don't write scripts that actually can do it. The fact is there is predictable system behavior, so why not use it if the task isn't super important?
This SPROC of course has caveats and is written in T-SQL, but if you're looking to just return all of the values with the same behavior as SELECT *, then this should do the job pretty easily for you. Put in your table name and the number of columns, and hit F5. It returns them in order from left to right, the same as you'd expect. I limited it to only 5 columns, but you can edit the logic if you need more. It takes both temp and permanent tables.
EXEC OnlySomeColumns 'MyTable', 3
/*------------------------------------------------------------------------------------------------------------------
Document Title: The Unknown SELECT SPROC.sql
Created By: CR
Date: 4.28.2013
Purpose: Returns all results from temp or permanent table when not knowing the column names
SPROC Input Example: EXEC OnlySomeColumns 'MyTable', 3
--------------------------------------------------------------------------------------------------------------------*/
IF OBJECT_ID ('OnlySomeColumns', 'P') IS NOT NULL
DROP PROCEDURE OnlySomeColumns;
GO
CREATE PROCEDURE OnlySomeColumns
@TableName VARCHAR (1000),
@TotalColumns INT
AS
DECLARE @Column1 VARCHAR (1000),
@Column2 VARCHAR (1000),
@Column3 VARCHAR (1000),
@Column4 VARCHAR (1000),
@Column5 VARCHAR (1000),
@SQL VARCHAR (1000),
@TempTable VARCHAR (1000),
@PermanentTable VARCHAR (1000),
@ColumnNamesAll VARCHAR (1000)
--First determine if this is a temp table or permanent table
IF @TableName LIKE '%#%' BEGIN SET @TempTable = @TableName END --If a temporary table
IF @TableName NOT LIKE '%#%' BEGIN SET @PermanentTable = @TableName END --If a permanent table
SET NOCOUNT ON
--Start with a few simple error checks
IF ( @TempTable IS NULL AND @PermanentTable IS NULL )
BEGIN
RAISERROR ( 'ERROR: Please select a TempTable or Permanent Table.',16,1 )
RETURN
END
IF ( @TempTable IS NOT NULL AND @PermanentTable IS NOT NULL )
BEGIN
RAISERROR ( 'ERROR: Only one table can be selected at a time. Please adjust your table selection.',16,1 )
RETURN
END
IF ( @TotalColumns IS NULL )
BEGIN
RAISERROR ( 'ERROR: Please select a value for @TotalColumns.',16,1 )
RETURN
END
--Temp table to gather the names of the columns
IF Object_id('tempdb..#TempName') IS NOT NULL DROP TABLE #TempName
CREATE TABLE #TempName ( ID INT, Name VARCHAR (1000) )
--Select the column order from a temp table
IF @TempTable IS NOT NULL
BEGIN
--Verify the temp table exists
IF NOT EXISTS ( SELECT 1
FROM tempdb.sys.columns
WHERE object_id = object_id ('tempdb..' + @TempTable +'') )
BEGIN
RAISERROR ( 'ERROR: Your TempTable does not exist - Please select a valid TempTable.',16,1 )
RETURN
END
SET @SQL = 'INSERT INTO #TempName
SELECT column_id AS ID, Name
FROM tempdb.sys.columns
WHERE object_id = object_id (''tempdb..' + @TempTable +''')
ORDER BY column_id'
EXEC (@SQL)
END
--From a permanent table
IF @PermanentTable IS NOT NULL
BEGIN
--Verify the permanent table exists
IF NOT EXISTS ( SELECT 1
FROM syscolumns
WHERE id = ( SELECT id
FROM sysobjects
WHERE Name = '' + @PermanentTable + '' ) )
BEGIN
RAISERROR ( 'ERROR: Your Table does not exist - Please select a valid Table.',16,1 )
RETURN
END
SET @SQL = 'INSERT INTO #TempName
SELECT colorder AS ID, Name
FROM syscolumns
WHERE id = ( SELECT id
FROM sysobjects
WHERE Name = ''' + @PermanentTable + ''' )
ORDER BY colorder'
EXEC (@SQL)
END
--Set the names of the columns
IF @TotalColumns >= 1 BEGIN SET @Column1 = (SELECT Name FROM #TempName WHERE ID = 1) END
IF @TotalColumns >= 2 BEGIN SET @Column2 = (SELECT Name FROM #TempName WHERE ID = 2) END
IF @TotalColumns >= 3 BEGIN SET @Column3 = (SELECT Name FROM #TempName WHERE ID = 3) END
IF @TotalColumns >= 4 BEGIN SET @Column4 = (SELECT Name FROM #TempName WHERE ID = 4) END
IF @TotalColumns >= 5 BEGIN SET @Column5 = (SELECT Name FROM #TempName WHERE ID = 5) END
--Create a select list of only the column names you want
IF Object_id('tempdb..#FinalNames') IS NOT NULL DROP TABLE #FinalNames
CREATE TABLE #FinalNames ( ID INT, Name VARCHAR (1000) )
INSERT #FinalNames
SELECT '1' AS ID, @Column1 AS Name UNION ALL
SELECT '2' AS ID, @Column2 AS Name UNION ALL
SELECT '3' AS ID, @Column3 AS Name UNION ALL
SELECT '4' AS ID, @Column4 AS Name UNION ALL
SELECT '5' AS ID, @Column5 AS Name
--Comma-delimit the names to insert into a select statement. Bracket the names in case there are spaces
SELECT @ColumnNamesAll = COALESCE(@ColumnNamesAll + '], [' ,'[') + Name
FROM #FinalNames
WHERE Name IS NOT NULL
ORDER BY ID
--Add an extra bracket at the end to complete the string
SELECT @ColumnNamesAll = @ColumnNamesAll + ']'
--Tell the user if they selected too many columns
IF ( @TotalColumns > 5 AND EXISTS (SELECT 1 FROM #FinalNames WHERE Name IS NOT NULL) )
BEGIN
SELECT 'This script has been designed for up to 5 columns' AS ERROR
UNION ALL
SELECT 'Only the first 5 columns have been selected' AS ERROR
END
IF Object_id('tempdb..##OutputTable') IS NOT NULL DROP TABLE ##OutputTable
--Select results using only the Columns you wanted
IF @TempTable IS NOT NULL
BEGIN
SET @SQL = 'SELECT ' + @ColumnNamesAll + '
INTO ##OutputTable
FROM ' + @TempTable + '
ORDER BY 1'
EXEC (@SQL)
END
IF @PermanentTable IS NOT NULL
BEGIN
SET @SQL = 'SELECT ' + @ColumnNamesAll + '
INTO ##OutputTable
FROM ' + @PermanentTable + '
ORDER BY 1'
EXEC (@SQL)
END
SELECT *
FROM ##OutputTable
SET NOCOUNT OFF
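The column order the procedure above relies on is visible in the catalog: each column gets a column_id in the order it was declared, and SELECT * expands columns in that order. A small sketch using a hypothetical temp table #Demo; for a permanent table you would query sys.columns in the current database with OBJECT_ID('schema.table') instead of tempdb.

```sql
-- Hypothetical temp table; columns are deliberately declared in non-alphabetical order
IF OBJECT_ID('tempdb..#Demo') IS NOT NULL DROP TABLE #Demo;
CREATE TABLE #Demo (Zeta INT, Alpha VARCHAR(10), Mid DATE);

-- column_id follows declaration order, not name order
SELECT c.name, c.column_id
FROM tempdb.sys.columns AS c
WHERE c.object_id = OBJECT_ID('tempdb..#Demo')
ORDER BY c.column_id;

-- SELECT * returns the columns in the same left-to-right order
SELECT * FROM #Demo;
```

For this freshly created table, the first query lists Zeta, Alpha and Mid with column_id 1, 2 and 3.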
SQL doesn't understand the order of columns. You need to know the column names to get them.
You can look into querying the information_schema to get the column names. For example (LIMIT is MySQL/PostgreSQL syntax; on SQL Server, use SELECT TOP 2 column_name instead):
SELECT column_name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'tbl_name'
ORDER BY ordinal_position
LIMIT 2;
You can query the system catalog for the table (sysobjects/syscolumns, or sys.columns) to find its first two columns, then dynamically generate the SQL statement you need.
If you want a permanent object that you can query over and over again, make a view for each table that returns only the first 2 columns. You can name the columns Column1 and Column2 or use the existing names.
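A minimal sketch of the view idea, assuming a hypothetical table dbo.ViewDemoEmployee created here for illustration. The view hard-codes the first two column names and exposes them as Column1 and Column2; if the base table's columns change, the view has to be recreated. DROP ... IF EXISTS needs SQL Server 2016 or later, and CREATE VIEW must start its own batch, hence the GO separators.

```sql
-- Clean up any earlier run (SQL Server 2016+ syntax)
DROP VIEW IF EXISTS dbo.ViewDemoEmployee_First2;
DROP TABLE IF EXISTS dbo.ViewDemoEmployee;
GO
-- Hypothetical base table
CREATE TABLE dbo.ViewDemoEmployee (EmployeeId INT, FullName NVARCHAR(50), HireDate DATE);
INSERT INTO dbo.ViewDemoEmployee (EmployeeId, FullName, HireDate) VALUES (1, N'Ann', '2020-01-01');
GO
-- The view returns only the first two columns, under generic names
CREATE VIEW dbo.ViewDemoEmployee_First2
AS
SELECT EmployeeId AS Column1, FullName AS Column2
FROM dbo.ViewDemoEmployee;
GO
SELECT Column1, Column2 FROM dbo.ViewDemoEmployee_First2;
```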
If you want to return the first two columns from any table without any preprocessing steps, create a stored procedure that queries the system information and executes a dynamic query that returns the first two columns from the table.
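A minimal sketch of such a procedure, assuming SQL Server 2017 or later (STRING_AGG; CREATE OR ALTER needs 2016 SP1+). The name dbo.SelectFirstColumns and its parameters are hypothetical. It looks the table up with OBJECT_ID, takes the first @ColumnCount columns by column_id from sys.columns, brackets them with QUOTENAME and runs the generated SELECT through sp_executesql. It only handles permanent tables in the current database.

```sql
CREATE OR ALTER PROCEDURE dbo.SelectFirstColumns
    @TableName   NVARCHAR(256),  -- schema-qualified name, e.g. N'dbo.MyTable'
    @ColumnCount INT = 2         -- how many leading columns to return
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @ObjectId INT = OBJECT_ID(@TableName, N'U');  -- U = user table
    IF @ObjectId IS NULL
    BEGIN
        RAISERROR('Table %s was not found.', 16, 1, @TableName);
        RETURN;
    END;

    DECLARE @Cols NVARCHAR(MAX), @Sql NVARCHAR(MAX);

    -- First @ColumnCount columns by column_id, bracketed and comma-separated
    SELECT @Cols = STRING_AGG(QUOTENAME(c.name), N', ')
                   WITHIN GROUP (ORDER BY c.column_id)
    FROM (SELECT TOP (@ColumnCount) name, column_id
          FROM sys.columns
          WHERE object_id = @ObjectId
          ORDER BY column_id) AS c;

    SET @Sql = N'SELECT ' + @Cols + N' FROM '
             + QUOTENAME(OBJECT_SCHEMA_NAME(@ObjectId)) + N'.'
             + QUOTENAME(OBJECT_NAME(@ObjectId)) + N';';
    EXEC sys.sp_executesql @Sql;
END;
GO
```

Usage: EXEC dbo.SelectFirstColumns @TableName = N'dbo.MyTable', @ColumnCount = 2;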
Or do I have to go the long way around and find out the column names first? How would I do that?
It's pretty easy to do manually.
Just run this first
select * from tbl where 1=0
This statement works on all major DBMS without needing any system catalogs.
That gives you all the column names, then all you need to do is type the first two
select colname1, colname2 from tbl