Retrieve tables based on its contents in SQL server - sql

I'd like to retrieve all tables, and the associated column values, where two specific columns (whose names will be passed in) do not have exactly the same content.
Here's a more definite breakdown of the problem. Suppose the columns I need to look at are 'Column_1' and 'Column_2'.
First, identify from INFORMATION_SCHEMA which tables have both of these columns (possibly with a sub-query).
Then identify which of these tables don't have exactly the same content in these two columns, meaning Column_1 != Column_2.
The following query retrieves all the tables that have both 'Column_1' and 'Column_2'.
SELECT
TABLE_NAME
FROM
INFORMATION_SCHEMA.TABLES T
WHERE
T.TABLE_CATALOG = 'myDB' AND
T.TABLE_TYPE = 'BASE TABLE'
AND EXISTS (
SELECT T.TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS C
WHERE
C.TABLE_CATALOG = T.TABLE_CATALOG AND
C.TABLE_SCHEMA = T.TABLE_SCHEMA AND
C.TABLE_NAME = T.TABLE_NAME AND
C.COLUMN_NAME = 'Column_1')
AND EXISTS
(
SELECT T.TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS C
WHERE
C.TABLE_CATALOG = T.TABLE_CATALOG AND
C.TABLE_SCHEMA = T.TABLE_SCHEMA AND
C.TABLE_NAME = T.TABLE_NAME AND
C.COLUMN_NAME = 'Column_2')
As the next step, I tried to use this as a sub-query with the following at the end, but that doesn't work: SQL Server returns 'Cannot call methods on sysname'. What would be the next step? This problem assumes all columns have exactly the same data type.
WHERE SUBQUERY.TABLE_NAME.Column_1 != SUBQUERY.TABLE_NAME.Column_2
This is what's expected:
Table_Name  Column_Name1  Column_Value_1  Column_Name2  Column_Value_2
Table_A     Column_1      abcd            Column_2      abcde
Table_A     Column_1      qwerty          Column_2      qwert
Table_A     Column_1      abcde           Column_2      eabcde
Table_B     Column_1      zxcv            Column_2      zxcde
Table_C     Column_1      asdfgh          Column_2      asdfghy
Table_C     Column_1      aaaa            Column_2      bbbb

If in fact you want to actually compare values (not length) between two columns in tables that contain those two columns, you will need to generate dynamic SQL and then execute it. This could be done semi-automatically with the following:
DECLARE @SqlTemplate VARCHAR(MAX) =
'UNION ALL'
+ ' SELECT Table_Name = <TNAME>'
+ ', Column_Name1 = <C1NAME>, Column_Value_1 = <C1>'
+ ', Column_Name2 = <C2NAME>, Column_Value_2 = <C2>'
+ ' FROM <T>'
+ ' WHERE ISNULL(<C1>, ''(null)'') <> ISNULL(<C2>, ''(null)'')'
SELECT T.TABLE_NAME
, REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
@SqlTemplate
, '<TNAME>', QUOTENAME(T.TABLE_SCHEMA + '.' + T.TABLE_NAME, ''''))
, '<C1NAME>', QUOTENAME(C1.COLUMN_NAME, ''''))
, '<C2NAME>', QUOTENAME(C2.COLUMN_NAME, ''''))
, '<T>', QUOTENAME(T.TABLE_SCHEMA) + '.' + QUOTENAME(T.TABLE_NAME))
, '<C1>', QUOTENAME(C1.COLUMN_NAME))
, '<C2>', QUOTENAME(C2.COLUMN_NAME))
FROM INFORMATION_SCHEMA.TABLES T
JOIN INFORMATION_SCHEMA.COLUMNS C1
ON C1.TABLE_CATALOG = T.TABLE_CATALOG
AND C1.TABLE_SCHEMA = T.TABLE_SCHEMA
AND C1.TABLE_NAME = T.TABLE_NAME
AND C1.COLUMN_NAME = 'Column_1'
JOIN INFORMATION_SCHEMA.COLUMNS C2
ON C2.TABLE_CATALOG = T.TABLE_CATALOG
AND C2.TABLE_SCHEMA = T.TABLE_SCHEMA
AND C2.TABLE_NAME = T.TABLE_NAME
AND C2.COLUMN_NAME = 'Column_2'
WHERE T.TABLE_CATALOG = 'myDB'
AND T.TABLE_TYPE = 'BASE TABLE'
This would generate sql for each qualifying table of the form:
UNION ALL SELECT Table_Name = 'dbo.Z', Column_Name1 = 'X', Column_Value_1 = [X], Column_Name2 = 'Y', Column_Value_2 = [Y] FROM [dbo].[Z] WHERE ISNULL([X], '(null)') <> ISNULL([Y], '(null)')
After running the above, you would then cut & paste the generated SQL into another query window, remove the initial 'UNION ALL', and then execute the remaining SQL to get the final results.
There are ways of combining all the SQL into a single string and executing it automatically, but your problem sounds like a one-off process that doesn't warrant the extra complexity.
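For what it's worth, the combining step can also be scripted outside the database. Below is a minimal sketch in Python, not part of the answer above: the helper names (quote_ident, quote_literal, build_mismatch_query) are made up for illustration, and it assumes you already have the list of qualifying (schema, table) pairs from the metadata query. It only builds the SQL text; running it is left to whatever client you use.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote an identifier, doubling any ']' (mirrors QUOTENAME's default)."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_mismatch_query(tables, col1, col2):
    """Build one UNION ALL statement over all (schema, table) pairs.

    Hypothetical helper: `tables` is assumed to hold only tables that
    contain both columns.
    """
    c1, c2 = quote_ident(col1), quote_ident(col2)
    parts = []
    for schema, table in tables:
        parts.append(
            f"SELECT Table_Name = {quote_literal(schema + '.' + table)}"
            f", Column_Name1 = {quote_literal(col1)}, Column_Value_1 = {c1}"
            f", Column_Name2 = {quote_literal(col2)}, Column_Value_2 = {c2}"
            f" FROM {quote_ident(schema)}.{quote_ident(table)}"
            f" WHERE ISNULL({c1}, '(null)') <> ISNULL({c2}, '(null)')"
        )
    # Joining with UNION ALL avoids having to strip a leading 'UNION ALL' by hand.
    return "\nUNION ALL\n".join(parts)


if __name__ == "__main__":
    print(build_mismatch_query([("dbo", "Z"), ("dbo", "W")], "X", "Y"))
```

Joining the pieces with a separator, rather than prefixing each one, is what removes the manual "delete the first UNION ALL" step.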

I believe you need to compare the CHARACTER_MAXIMUM_LENGTH or CHARACTER_OCTET_LENGTH metadata values in the INFORMATION_SCHEMA.COLUMNS table instead of using LEN(). This can be done using something like:
SELECT T.TABLE_NAME
, C1.COLUMN_NAME, C1.DATA_TYPE, C1.CHARACTER_MAXIMUM_LENGTH
, C2.COLUMN_NAME, C2.DATA_TYPE, C2.CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.TABLES T
JOIN INFORMATION_SCHEMA.COLUMNS C1
ON C1.TABLE_CATALOG = T.TABLE_CATALOG
AND C1.TABLE_SCHEMA = T.TABLE_SCHEMA
AND C1.TABLE_NAME = T.TABLE_NAME
AND C1.COLUMN_NAME = 'Column_1'
JOIN INFORMATION_SCHEMA.COLUMNS C2
ON C2.TABLE_CATALOG = T.TABLE_CATALOG
AND C2.TABLE_SCHEMA = T.TABLE_SCHEMA
AND C2.TABLE_NAME = T.TABLE_NAME
AND C2.COLUMN_NAME = 'Column_2'
WHERE T.TABLE_CATALOG = 'myDB'
AND T.TABLE_TYPE = 'BASE TABLE'
AND C1.CHARACTER_MAXIMUM_LENGTH <> C2.CHARACTER_MAXIMUM_LENGTH
The inner joins both limit results to tables having both columns and retrieve the column metadata. The length compare at the end checks for a mismatch.
This assumes character types. You might also want to check DATA_TYPE consistency ("char" vs "varchar" vs "nvarchar") or some of the other precision and scale values for other non-character data types.

To query the data within the columns you need dynamic SQL. I would advise you not to use INFORMATION_SCHEMA (which is for compatibility only) and instead use sys.tables etc. You don't need to check sys.columns twice, you can use aggregation in the EXISTS subquery to check for multiple columns.
To compare the columns, you can do Column_1 <> Column_2, but that will not deal with nulls correctly. If the columns can be nullable then you should instead use the syntax shown in the code below: NOT EXISTS (SELECT Column_1 INTERSECT SELECT Column_2)
DECLARE @sql nvarchar(max);
SELECT @sql =
STRING_AGG(CAST('
SELECT
Table_Name = ' + QUOTENAME(t.name, '''') + ',
Column_1,
Column_2
FROM ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + '
WHERE NOT EXISTS (SELECT Column_1 INTERSECT SELECT Column_2)
' AS nvarchar(max)), '
UNION ALL
' )
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
AND s.name = 'myDB' -- this filters on the schema name (e.g. 'dbo'), not the database name
WHERE EXISTS (SELECT 1
FROM sys.columns c
WHERE c.object_id = t.object_id
AND c.name IN ('Column_1', 'Column_2')
HAVING COUNT(*) = 2
AND COUNT(DISTINCT c.system_type_id) = 1 -- all same type
);
PRINT @sql; -- your friend
EXEC sp_executesql @sql;
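To see why plain <> is not enough for nullable columns, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption made only so the example is self-contained, and the table and rows are invented. SQLite also treats NULLs as equal inside INTERSECT, so the two predicates can be compared side by side.

```python
import sqlite3

# In-memory SQLite database standing in for a real SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_A (Column_1 TEXT, Column_2 TEXT)")
conn.executemany(
    "INSERT INTO Table_A VALUES (?, ?)",
    [
        ("abcd", "abcd"),   # equal: should never be reported
        ("abcd", "abcde"),  # different: both predicates report it
        (None, "x"),        # NULL vs value: only the NULL-safe predicate reports it
        (None, None),       # NULL vs NULL: treated as equal by INTERSECT
    ],
)

# Plain inequality: any comparison involving NULL is unknown, so those rows drop out.
plain = conn.execute(
    "SELECT Column_1, Column_2 FROM Table_A WHERE Column_1 <> Column_2"
).fetchall()

# NULL-safe inequality using the INTERSECT form from the answer.
null_safe = conn.execute(
    "SELECT Column_1, Column_2 FROM Table_A "
    "WHERE NOT EXISTS (SELECT Column_1 INTERSECT SELECT Column_2)"
).fetchall()

if __name__ == "__main__":
    print("plain    :", plain)
    print("null-safe:", null_safe)
```

The row with a NULL on one side only shows up with the second predicate, which is the case the ISNULL(..., '(null)') sentinel in the other answer is also trying to cover.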

Related

How to find all columns in a database with all NULL values

I have a large Snowflake database with 70+ tables and 3000+ fields. Is there a query I can use across the entire database to find all columns with all NULLs? I have a command I can use to find all the columns
select * from prod_db.information_schema.columns
Is there a way to modify that command to identify which columns are all NULLs? If there is not a way to do it across the entire database. Is there a way to do it across a table? I do not want to type:
select column_name from prod_db.information_schema.table_name
3000+ times. Thanks!
This uses a SQL generator to generate a SQL statement that will locate columns matching two criteria:
The column is in a table with one or more rows
The column has all nulls.
To be efficient, rather than scanning each table entirely, it uses a UNION ALL block that looks for a single non-null row per column in each table. It uses TOP 1 to find a non-null row. That way, as soon as it finds one, it returns that row and stops scanning that table so it can move on to the next one.
This means that the large UNION ALL section will list tables where it finds a not null row, which is the opposite of what we want. To use this information, a CTE wrapped around the UNION ALL will do an anti-join against the column view in the information schema.
with COLS as
(
select 'select top 1 ''' || C.TABLE_CATALOG || ''' as TABLE_CATALOG, ''' || C.TABLE_SCHEMA ||
''' as TABLE_SCHEMA, ''' || C.TABLE_NAME || ''' as TABLE_NAME, ''' || C.COLUMN_NAME ||
''' as COLUMN_NAME from "' ||
C.TABLE_CATALOG || '"."' || C.TABLE_SCHEMA || '"."' || C.TABLE_NAME || '"' ||
' where "' || C.COLUMN_NAME || '" is not null'
as NULL_CHECK
from INFORMATION_SCHEMA.COLUMNS C
left join INFORMATION_SCHEMA.TABLES T on
C.TABLE_CATALOG = T.TABLE_CATALOG and
C.TABLE_SCHEMA = T.TABLE_SCHEMA and
C.TABLE_NAME = T.TABLE_NAME
where C.IS_NULLABLE = 'YES' and T.TABLE_TYPE = 'BASE TABLE'
and T.ROW_COUNT > 0
), UNIONED as
(
select listagg(NULL_CHECK, '\nunion all\n') as UNIONED from COLS
)
select replace($$
with NON_NULL_COLUMNS as (
!~UNIONED~!
)
select C.TABLE_CATALOG, C.TABLE_SCHEMA, C.TABLE_NAME, C.COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS C
left join NON_NULL_COLUMNS NN
on C.TABLE_CATALOG = NN.TABLE_CATALOG
and C.TABLE_SCHEMA = NN.TABLE_SCHEMA
and C.TABLE_NAME = NN.TABLE_NAME
and C.COLUMN_NAME = NN.COLUMN_NAME
left join INFORMATION_SCHEMA.TABLES T
on C.TABLE_CATALOG = T.TABLE_CATALOG
and C.TABLE_SCHEMA = T.TABLE_SCHEMA
and C.TABLE_NAME = T.TABLE_NAME
where NN.COLUMN_NAME is null and T.ROW_COUNT > 0
;$$, '!~UNIONED~!', UNIONED) as SQL_TO_RUN from UNIONED
;
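The probe-then-anti-join idea can be sketched on a small scale as follows. This uses Python's built-in sqlite3 module instead of Snowflake (an assumption, purely so the sketch runs anywhere), and instead of generating one large statement it runs one short probe per column. The function name all_null_columns is made up for illustration.

```python
import sqlite3


def all_null_columns(conn):
    """Return (table, column) pairs where the table has rows but the column is always NULL."""
    result = []
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        quoted_table = '"' + table.replace('"', '""') + '"'
        # Skip empty tables, matching the ROW_COUNT > 0 filter in the query above.
        if conn.execute(f"SELECT 1 FROM {quoted_table} LIMIT 1").fetchone() is None:
            continue
        columns = conn.execute(f"PRAGMA table_info({quoted_table})").fetchall()
        for column in columns:
            name = column[1]  # second field of table_info is the column name
            quoted_column = '"' + name.replace('"', '""') + '"'
            # Stop at the first non-NULL value; finding none means the column is all NULL.
            probe = conn.execute(
                f"SELECT 1 FROM {quoted_table} WHERE {quoted_column} IS NOT NULL LIMIT 1"
            ).fetchone()
            if probe is None:
                result.append((table, name))
    return result


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE t1 (a TEXT, b TEXT)")
    demo.execute("INSERT INTO t1 VALUES ('x', NULL)")
    print(all_null_columns(demo))
```

LIMIT 1 plays the role of TOP 1 here: each probe can stop at the first non-NULL value rather than scanning the whole table.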
You can produce a list of SELECT queries for each column as follows
SELECT 'SELECT ''' || TABLE_NAME || ''', ''' || COLUMN_NAME || ''', COUNT(*) FROM ' || TABLE_NAME || ' WHERE ' || COLUMN_NAME || ' IS NULL OR LEN(' || COLUMN_NAME || ') = 0 UNION '
from information_schema.columns
The result of the above query can then be taken and executed to get the result you need (PS: Remove the UNION on the last row produced from Step 1 before executing)
Hope this helps.

Search database for table with 2 or more specified column names

I have the following query that I use very frequently to find a table in a database that has a specified column name:
SELECT Table_Name, Column_Name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_CATALOG = 'db' AND COLUMN_NAME = 'col_A'
I'm now trying to find a table that has both of the specified columns in the query (ex: both col_A and col_B). I thought it would have been as simple as just further qualifying the WHERE clause, but that was to no avail. Any tips?
Another way that satisfies the "2 or more" requirement without major modifications:
;WITH input(ColumnName) AS
(
SELECT y FROM (VALUES
/* simply list your columns here: */
(N'col_A'),
(N'col_B')
) AS x(y)
)
SELECT t.name FROM input
INNER JOIN sys.columns AS c ON c.name = input.ColumnName
INNER JOIN sys.tables AS t ON c.[object_id] = t.[object_id]
GROUP BY t.name HAVING COUNT(*) = (SELECT COUNT(*) FROM input);
Example db<>fiddle
And FWIW why I don't use INFORMATION_SCHEMA.
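The GROUP BY ... HAVING COUNT(*) = (number of inputs) logic is easy to check outside the database too. Here is a minimal Python sketch of the same idea, assuming you have already fetched (table, column) pairs from the catalog; the function name and the sample rows are invented for illustration.

```python
from collections import defaultdict


def tables_with_all_columns(rows, required):
    """Return the tables that contain every column name in `required`.

    `rows` is an iterable of (table_name, column_name) pairs, e.g. the
    result of a catalog query.
    """
    required = set(required)
    found = defaultdict(set)
    for table, column in rows:
        if column in required:
            found[table].add(column)
    # Same idea as HAVING COUNT(*) = (SELECT COUNT(*) FROM input).
    return sorted(table for table, cols in found.items() if cols == required)


if __name__ == "__main__":
    catalog = [
        ("orders", "col_A"),
        ("orders", "col_B"),
        ("orders", "id"),
        ("customers", "col_A"),
    ]
    print(tables_with_all_columns(catalog, ["col_A", "col_B"]))
```

Adding a third or fourth column name only changes the input list, which is the same property the VALUES-based query has.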
As long as you know the database already, this should work for you:
select t.TABLE_NAME
from INFORMATION_SCHEMA.TABLES t
inner join INFORMATION_SCHEMA.COLUMNS c
on t.TABLE_NAME = c.TABLE_NAME
and c.COLUMN_NAME = 'col_A'
inner join INFORMATION_SCHEMA.COLUMNS c2
on t.TABLE_NAME = c2.TABLE_NAME
and c2.COLUMN_NAME = 'col_B'
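If you want all column names and table names for tables that have both column names, you can correlate the EXISTS checks to the outer table:
SELECT c.Table_Name, c.Column_Name, c.TABLE_CATALOG
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_CATALOG = 'testdb'
AND EXISTS( SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS a
WHERE a.TABLE_CATALOG = c.TABLE_CATALOG AND a.TABLE_SCHEMA = c.TABLE_SCHEMA
AND a.TABLE_NAME = c.TABLE_NAME AND a.column_name = 'col_a')
AND EXISTS( SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS b
WHERE b.TABLE_CATALOG = c.TABLE_CATALOG AND b.TABLE_SCHEMA = c.TABLE_SCHEMA
AND b.TABLE_NAME = c.TABLE_NAME AND b.column_name = 'col_b')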

How to find a particular column in all tables and modify the values of that column in PostgreSQL

I am new to PostgreSQL. I want to change all the email addresses in all the tables to fake email addresses. For example, abc@gmail.com should become abc@1234gmail.com, xyz@hotmail.com should become xyz@1234hotmail.com, and so on.
I found the query that gives the tables that have email column in it. Here is the query
select t.table_schema, t.table_name
from information_schema.tables t
inner join information_schema.columns c on c.table_name = t.table_name and c.table_schema = t.table_schema
where c.column_name like '%email%'
and t.table_schema not in ('information_schema', 'pg_catalog')
and t.table_type = 'BASE TABLE'
order by t.table_schema;
It returns some records. One way is to go to each table and alter the values of the email column by hand. But can I modify the above query to also get the value of each email and change it to a fake address? For example, if the email value is abc@gmail.com, just append 1234 (or any other value) after the @ sign, so that each email value becomes abc@1234gmail.com, xyz@1234hotmail.com, etc.
You can build a set of update queries, one for each column, as below:
select concat('update ', t.table_schema, '.', t.table_name, ' set ', c.column_name, ' = replace(', c.column_name, ', ''@'', ''@1234'') ')
from information_schema.tables t
inner join information_schema.columns c on c.table_name = t.table_name and c.table_schema = t.table_schema
where c.column_name like '%email%'
and t.table_schema not in ('information_schema', 'pg_catalog')
and t.table_type = 'BASE TABLE' order by t.table_schema;
This query builds one update statement per column, using the replace function to replace '@' with '@1234'. I haven't fully tested it, but I hope it gives you an approach to work with.
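If you would rather review the statements in a script before running them, the same generation step can be sketched in Python. This is only an illustration: build_email_updates is a made-up name, the input tuples are assumed to come from the information_schema query above, and identifiers are double-quoted so that mixed-case names survive.

```python
def build_email_updates(columns, marker="1234"):
    """Build one UPDATE per (schema, table, column) tuple.

    Each statement inserts `marker` right after every '@' in the column,
    mirroring replace(col, '@', '@1234') from the generated SQL.
    """
    statements = []
    for schema, table, column in columns:
        statements.append(
            f'UPDATE "{schema}"."{table}" '
            f'SET "{column}" = replace("{column}", \'@\', \'@{marker}\');'
        )
    return statements


if __name__ == "__main__":
    # Hypothetical metadata rows; real ones would come from information_schema.
    for stmt in build_email_updates([("public", "users", "email")]):
        print(stmt)
```

Printing the statements first gives you a chance to check them (or wrap them in a transaction) before anything is changed.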

Selecting column names separated by a comma

I want to write a SQL statement (SQL Server) which selects column names under specific conditions and "returns" all column names separated by a ,.
SELECT COLUMN_NAME
FROM all_tab_columns
WHERE OWNER = 'KOCH' AND TABLE_NAME = 'TABLE1';
This returns several rows of column names:
NAME
ID
STATE
CITY
But I want them to be returned in this format:
NAME, ID, STATE, CITY
(I think I have to use FROM dual)?
For Oracle:
WITH CTE_Columns (column_list, column_id) AS
(
SELECT
CAST(COLUMN_NAME AS VARCHAR(500)) AS column_list,
COLUMN_ID
FROM USER_TAB_COLUMNS C
WHERE
TABLE_NAME = 'TABLE1' AND
COLUMN_ID= 1
UNION ALL
SELECT
CAST(column_list || ', ' || C.COLUMN_NAME AS VARCHAR(500)),
C.COLUMN_ID
FROM
CTE_Columns CL
INNER JOIN USER_TAB_COLUMNS C ON
C.TABLE_NAME = 'TABLE1' AND
C.COLUMN_ID = CL.COLUMN_ID + 1
)
SELECT column_list
FROM CTE_Columns
WHERE
COLUMN_ID =
(
SELECT MAX(COLUMN_ID)
FROM USER_TAB_COLUMNS
WHERE TABLE_NAME = 'TABLE1')

how to get all the tables and structure in a database in a printed format?

how to get all the tables and structure in a database in a table format using an sql query or stored procedure?
Structure is like below:
Sl No FieldName DataType Size Description
1 UserName varchar 50
This should do the trick. There is a bit more information in here but I think you may find it useful.
Select t.Table_Schema,
t.Table_Name,
c.Column_Name,
IsNull(c.Column_Default, '') as 'Column_Default',
c.Is_Nullable,
c.Data_Type,
IsNull(c.Character_Maximum_Length, IsNull(Numeric_Precision,'') + IsNull(Numeric_Scale, IsNull(DateTime_Precision,''))) as 'Size'
From Information_Schema.Tables t
Join Information_Schema.Columns c on t.Table_Catalog = c.Table_Catalog
And t.Table_Schema = c.Table_Schema
And t.Table_Name = c.Table_Name
Where t.Table_Type = 'BASE TABLE'
Order by t.Table_Schema, t.Table_Name, c.Ordinal_Position