SQL sum across columns by name - sql

I have a table with lots of columns, some of them with names beginning with "EQ". For an individual row, I'd like to sum all the values in the columns that start with "EQ", but not the other values. I know I can do it like this:
select EQ_DOMESTIC + EQ_INTL + EQ_OTHER from myTable where id=1
However, I have lots of columns, and I was wondering if I can do it systematically, without typing in the name of each column. Would I have to get the column names from the system tables in another query?
Follow up question: Some of the values are nulls, which makes the sum NULL. Is there any way to avoid writing out ISNULL(column,0) for the sum?

You can do this pretty easily with dynamic SQL:
DECLARE @sql NVARCHAR(MAX) = N'';
-- Append "+ COALESCE([col], 0)" for every column whose name starts with EQ_.
SELECT @sql += N'
+ COALESCE(' + QUOTENAME(name) + ', 0)'
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.MyTable')
AND name LIKE 'EQ[_]%';
-- Append the remaining columns as ordinary select-list items.
SELECT @sql += N',' + QUOTENAME(name)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.MyTable')
AND name NOT LIKE 'EQ[_]%';
SELECT @sql = N'SELECT [EQ_SUM] = 0' + @sql
+ N' FROM dbo.MyTable WHERE id = 1;';
-- Inspect the generated statement first, then uncomment the EXEC.
PRINT @sql;
-- EXEC sp_executesql @sql;
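If you are on SQL Server 2017 or later, the same statement can also be built with STRING_AGG instead of appending to a variable row by row. This is only a sketch under the question's assumptions (a table dbo.MyTable with an id column and EQ_-prefixed numeric columns); the variable names @sum and @sql are just for illustration.

```sql
DECLARE @sum NVARCHAR(MAX),
        @sql NVARCHAR(MAX);

-- Build "COALESCE([EQ_A], 0) + COALESCE([EQ_B], 0) + ..." in column order.
SELECT @sum = STRING_AGG(
           CAST(N'COALESCE(' + QUOTENAME(name) + N', 0)' AS NVARCHAR(MAX)),
           N' + ') WITHIN GROUP (ORDER BY column_id)
FROM sys.columns
WHERE [object_id] = OBJECT_ID(N'dbo.MyTable')
  AND name LIKE N'EQ[_]%';

-- If no column matched, @sum is NULL, so fall back to a literal 0.
SET @sql = N'SELECT id, EQ_SUM = ' + COALESCE(@sum, N'0')
         + N' FROM dbo.MyTable WHERE id = 1;';

PRINT @sql;
-- EXEC sys.sp_executesql @sql;
```

As with the version above, PRINT the statement and check it before uncommenting the EXEC.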

Related

How to get row counts from each tables in a DB based on condition

I have multiple tables in a database (say x_db), and all of them share a common column holding date values (say a column named 'date'). I want to retrieve the row count for each table in x_db where the value of 'date' is greater than a specific value.
You need dynamic SQL for this. It can be error-prone and difficult to debug.
Make sure to pass parameters in via sp_executesql.
Anything that can't be passed in as a parameter (such as object names) should be quoted with QUOTENAME.
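declare @date datetime = '2021-01-01';
declare @sql nvarchar(max) =
(
-- Build one SELECT per table that has a [date] column and glue them together with UNION ALL.
-- The first argument is cast to nvarchar(max) so the aggregated string is not capped at 4000 characters.
select string_agg(cast(
N'SELECT N' + quotename(schema_name(t.schema_id), '''') + N' + N''.'' + N' + quotename(t.name, '''') + N' AS tableName, COUNT(*) AS countRows
FROM ' + quotename(schema_name(t.schema_id)) + N'.' + quotename(t.name) + N'
WHERE [date] > @date
' AS nvarchar(max)), N'
UNION ALL
')
from sys.tables t
join sys.columns c on c.object_id = t.object_id and c.name = 'date'
-- no GROUP BY here: the scalar subquery must return a single aggregated row
);
PRINT @sql;
EXEC sp_executesql @sql, N'@date datetime', @date = @date;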
Change the type to datetime2 or whatever if necessary

Looping through column names with dynamic SQL

I just came up with an idea for a piece of code to show all the distinct values for each column, and count how many records for each. I want the code to loop through all columns.
Here's what I have so far... I'm new to SQL so bear with the noobness :)
Hard code:
select [Sales Manager], count(*)
from [BT].[dbo].[test]
group by [Sales Manager]
order by 2 desc
Attempt at dynamic SQL:
Declare @sql varchar(max),
@column as varchar(255)
set @column = '[Sales Manager]'
set @sql = 'select ' + @column + ',count(*) from [BT].[dbo].[test] group by ' + @column + ' order by 2 desc'
exec (@sql)
Both of these work fine. How can I make it loop through all columns? I don't mind if I have to hard code the column names and it works its way through subbing in each one for @column.
Does this make sense?
Thanks all!
You can use dynamic SQL and get all the column names for a table. Then build up the script:
Declare @sql varchar(max) = ''
declare @tablename as varchar(255) = 'test'
select @sql = @sql + 'select [' + c.name + '],count(*) as ''' + c.name + ''' from [' + t.name + '] group by [' + c.name + '] order by 2 desc; '
from sys.columns c
inner join sys.tables t on c.object_id = t.object_id
where t.name = @tablename
EXEC (@sql)
Change @tablename to the name of your table (without the database or schema name).
This is a bit of an XY answer, but if you don't mind hardcoding the column names, I suggest you do just that, and avoid dynamic SQL - and the loop - entirely. Dynamic SQL is generally considered the last resort, opens you up to security issues (SQL injection attacks) if not careful, and can often be slower if queries and execution plans cannot be cached.
If you have a ton of column names you can write a quick piece of code or mail merge in Word to do the substitution for you.
However, as far as how to get column names, assuming this is SQL Server, you can use the following query:
SELECT c.name
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.test')
Therefore, you can build your dynamic SQL from this query:
SELECT 'select '
+ QUOTENAME(c.name)
+ ',count(*) from [BT].[dbo].[test] group by '
+ QUOTENAME(c.name)
+ ' order by 2 desc'
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.test')
and loop using a cursor.
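A minimal sketch of that cursor loop, assuming the same [BT].[dbo].[test] table and that you run it from the BT database so OBJECT_ID resolves; the cursor name col_cursor and the variable @stmt are made up for illustration:

```sql
DECLARE @stmt NVARCHAR(MAX);

-- One generated statement per column of dbo.test.
DECLARE col_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT N'select ' + QUOTENAME(c.name)
         + N', count(*) from [BT].[dbo].[test] group by ' + QUOTENAME(c.name)
         + N' order by 2 desc'
    FROM sys.columns c
    WHERE c.object_id = OBJECT_ID(N'dbo.test');

OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @stmt;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Each iteration returns its own result set.
    EXEC sys.sp_executesql @stmt;
    FETCH NEXT FROM col_cursor INTO @stmt;
END

CLOSE col_cursor;
DEALLOCATE col_cursor;
```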
Or compile the whole thing together into one batch and execute. Here we use the FOR XML PATH('') trick:
DECLARE @sql VARCHAR(MAX) = (
SELECT ' select ' --note the extra space at the beginning
+ QUOTENAME(c.name)
+ ',count(*) from [BT].[dbo].[test] group by '
+ QUOTENAME(c.name)
+ ' order by 2 desc'
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.test')
FOR XML PATH('')
)
EXEC(@sql)
Note I am using the built-in QUOTENAME function to escape column names that need escaping.
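To make the escaping concrete, here is a small sketch of what QUOTENAME returns for a few inputs. The sample names are invented; the second argument selects the delimiter and defaults to square brackets.

```sql
SELECT QUOTENAME(N'Sales Manager')    AS bracketed,       -- [Sales Manager]
       QUOTENAME(N'odd]name')         AS escaped_bracket, -- [odd]]name]
       QUOTENAME(N'O''Brien', '''')   AS string_literal;  -- 'O''Brien'
```

Because the closing delimiter is doubled, a name containing ] cannot break out of the identifier in the generated statement.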
Do you want to know the distinct column values in all the columns of the table? Just replace the table name Employee with your table name in the following code:
declare @SQL nvarchar(max)
set @SQL = ''
;with cols as (
select Table_Schema, Table_Name, Column_Name, Row_Number() over(partition by Table_Schema, Table_Name
order by ORDINAL_POSITION) as RowNum
from INFORMATION_SCHEMA.COLUMNS
)
select @SQL = @SQL + case when RowNum = 1 then '' else ' union all ' end
+ ' select ''' + Column_Name + ''' as Column_Name, count(distinct ' + quotename (Column_Name) + ' ) As DistinctCountValue,
count( '+ quotename (Column_Name) + ') as CountValue FROM ' + quotename (Table_Schema) + '.' + quotename (Table_Name)
from cols
where Table_Name = 'Employee'
--print @SQL
execute (@SQL)

how do I select records that are like some string for any column in a table?

I know that I can search for a term in one column in a table in t-sql by using like %termToFind%. And I know I can get all columns in a table with this:
SELECT *
FROM MyDataBaseName.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = N'MyTableName'
How can I perform a LIKE comparison on each of the columns of a table? I have a very large table so I can't just spell out LIKE for each column.
As always, I'll suggest XML for this (I'd suggest JSON if SQL Server had native support for it :) ). You can try this query, though it may not perform well on a large number of rows:
;with cte as (
select
*,
(select t.* for xml raw('data'), type) as data
from test as t
)
select *
from cte
where data.exist('data/@*[local-name() != "id" and contains(., sql:variable("@search"))]') = 1
see sql fiddle demo for more detailed example.
Important note by Alexander Fedorenko in comments: it should be understood that contains function is case-sensitive and uses xQuery default Unicode code point collation for the string comparison.
More general way would be to use dynamic SQL solution:
declare @search nvarchar(max)
declare @stmt nvarchar(max)
-- set @search to a LIKE pattern before running, e.g. N'%termToFind%'
select @stmt = isnull(@stmt + ' or ', '') + quotename(name) + ' like @search'
from sys.columns as c
where c.[object_id] = object_id('dbo.test')
--
-- also possible
--
-- select @stmt = isnull(@stmt + ' or ', '') + quotename(column_name) + ' like @search'
-- from INFORMATION_SCHEMA.COLUMNS
-- where TABLE_NAME = 'test'
select @stmt = 'select * from test where ' + @stmt
exec sp_executesql
@stmt = @stmt,
@params = N'@search nvarchar(max)',
@search = @search
I'd use dynamic SQL here.
Full credit - this answer was initially posted by another user, and deleted. I think it's a good answer so I'm re-adding it.
DECLARE @sql NVARCHAR(MAX);
DECLARE @table NVARCHAR(50);
DECLARE @term NVARCHAR(50);
SET @term = '%term to find%';
SET @table = 'TableName';
SET @sql = 'SELECT * FROM ' + @table + ' WHERE '
SELECT @sql = @sql + COALESCE('CAST('+ column_name
+ ' as NVARCHAR(MAX)) like N''' + @term + ''' OR ', '')
FROM INFORMATION_SCHEMA.COLUMNS WHERE [TABLE_NAME] = @table
SET @sql = @sql + ' 1 = 0'
SELECT @sql
EXEC sp_executesql @sql
The XML answer is cleaner (I prefer dynamic SQL only when necessary) but the benefit of this is that it will utilize any index you have on your table, and there is no overhead in constructing the XML CTE for querying.
In case someone is looking for PostgreSQL solution:
SELECT * FROM table_name WHERE position('your_value' IN (table_name.*)::text)>0
will select all records that have 'your_value' in any column. Didn't try this with any other database.
Unfortunately this works by combining all columns into one text string and then searching for the value in that string, so I don't know a way to make it match a "whole cell" only. It will always match if any part of any cell matches 'your_value'.

Aliasing columns based on cross-reference table

I have a table with over 400 columns, and these are named with our vendor's archaic naming system. I need to move this data into a new table which uses our company's naming conventions, so I have to change the names of these 400 columns.
Fortunately, I also have a table that cross-references the current column names with what they should become, like so:
Acronym | Name
----------------
A | ColumnNameA
B | ColumnNameB
C | ColumnNameC
etc...
So my question is this:
If it were only a few rows, I could easily do
SELECT
A AS ColumnNameA,
B AS ColumnNameB
FROM
Table
But there are just too many columns to do this by hand. What's the best way to dynamically change column names in a SELECT statement based off of a cross-ref table?
My effort so far:
I was thinking something along the lines of
SET @sqlCommand = 'SELECT ' + @columns + ' FROM Table'
EXEC (@sqlCommand)
but I have no idea how to set @columns to be a dynamically generated list of all the acronyms as the final column names. Is this even a viable approach?
DECLARE @sql NVARCHAR(MAX) = N'';
SELECT @sql += ',' + QUOTENAME(Acronym) + ' AS ' + QUOTENAME(Name)
FROM dbo.AcronymTable;
SET @sql = 'SELECT ' + STUFF(@sql, 1, 1, N'') + ' FROM dbo.Table;';
PRINT @sql;
--EXEC sp_executesql @sql;
The easiest way is to add an int field that determines the order of columns so it matches from source to target. Then you can:
DECLARE @SQL varchar(max)
SET @SQL = 'INSERT INTO dbo.Target SELECT '
SELECT @SQL = @SQL + QUOTENAME(Acronym) + ','
FROM ConversionTable
ORDER BY OrderColumn
-- strip the trailing comma
SET @SQL = LEFT(@SQL, (LEN(@SQL) - 1))
SET @SQL = @SQL + ' FROM SourceTable'
PRINT @SQL
-- EXEC (@SQL)

How to find out whether a table has some unique columns

I use MS SQL Server.
I've been handed some large tables with no constraints on them: no keys, nothing.
I know some of the columns have unique values. Is there a smart way to find the columns that have unique values for a given table?
Right now I do it manually for each column by checking whether there are as many DISTINCT values as there are rows in the table.
SELECT COUNT(DISTINCT col) FROM table
I could probably write a cursor to loop over all the columns, but I want to hear if someone knows a smarter way or a built-in function.
Thanks.
Here's an approach that is basically similar to @JNK's, but instead of printing the counts it returns a ready answer for every column, telling you whether the column consists of unique values only:
DECLARE @table varchar(100), @sql varchar(max);
SET @table = 'some table name';
SELECT
@sql = COALESCE(@sql + ', ', '') + ColumnExpression
FROM (
SELECT
ColumnExpression =
'CASE COUNT(DISTINCT ' + QUOTENAME(COLUMN_NAME) + ') ' +
'WHEN COUNT(*) THEN ''UNIQUE'' ' +
'ELSE '''' ' +
'END AS ' + QUOTENAME(COLUMN_NAME)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table
) s
SET @sql = 'SELECT ' + @sql + ' FROM ' + QUOTENAME(@table);
PRINT @sql; /* in case you want to have a look at the resulting query */
EXEC(@sql);
It simply compares COUNT(DISTINCT column) with COUNT(*) for every column. The result will be a table with a single row, where every column will contain the value UNIQUE for those columns that do not have duplicates, and empty string if duplicates are present.
But the above solution will work correctly only for those columns that do not have NULLs. It should be noted that SQL Server does not ignore NULLs when you want to create a unique constraint/index on a column. If a column contains just one NULL and all other values are unique, you can still create a unique constraint on the column (you cannot make it a primary key, though, which requires both uniqueness of values and absence of NULLs).
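The NULL behaviour is easy to check for yourself. The following is a small sketch using a throwaway temp table (#demo and val are made-up names): the first NULL is accepted, the second one is rejected as a duplicate.

```sql
CREATE TABLE #demo (val INT NULL UNIQUE);

-- Succeeds: a single NULL is allowed alongside distinct values.
INSERT INTO #demo (val) VALUES (1), (2), (NULL);

BEGIN TRY
    -- Fails: SQL Server treats the second NULL as a duplicate key.
    INSERT INTO #demo (val) VALUES (NULL);
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;

DROP TABLE #demo;
```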
Therefore you might need a more thorough analysis of the contents, which you could get with the following script:
DECLARE @table varchar(100), @sql varchar(max);
SET @table = 'some table name';
SELECT
@sql = COALESCE(@sql + ', ', '') + ColumnExpression
FROM (
SELECT
ColumnExpression =
'CASE COUNT(DISTINCT ' + QUOTENAME(COLUMN_NAME) + ') ' +
'WHEN COUNT(*) THEN ''UNIQUE'' ' +
'WHEN COUNT(*) - 1 THEN ' +
'CASE COUNT(DISTINCT ' + QUOTENAME(COLUMN_NAME) + ') ' +
'WHEN COUNT(' + QUOTENAME(COLUMN_NAME) + ') THEN ''UNIQUE WITH SINGLE NULL'' ' +
'ELSE '''' ' +
'END ' +
'WHEN COUNT(' + QUOTENAME(COLUMN_NAME) + ') THEN ''UNIQUE with NULLs'' ' +
'ELSE '''' ' +
'END AS ' + QUOTENAME(COLUMN_NAME)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @table
) s
SET @sql = 'SELECT ' + @sql + ' FROM ' + QUOTENAME(@table);
PRINT @sql; /* in case you still want to have a look at the resulting query */
EXEC(@sql);
This solution takes NULLs into account by checking three values: COUNT(DISTINCT column), COUNT(column) and COUNT(*). It displays the results similarly to the former solution, but the possible diagnoses for the columns are more diverse:
UNIQUE means no duplicate values and no NULLs (can either be a PK or have a unique constraint/index);
UNIQUE WITH SINGLE NULL – as can be guessed, no duplicates, but there's one NULL (cannot be a PK, but can have a unique constraint/index);
UNIQUE with NULLs – no duplicates, two or more NULLs (on SQL Server 2008 or later, you could create a filtered unique index that covers the non-NULL values only);
empty string – there are duplicates, possibly NULLs too.
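For the "UNIQUE with NULLs" case, a filtered unique index could look roughly like this. dbo.SomeTable, SomeColumn and the index name are placeholders, not names from the question.

```sql
-- Enforces uniqueness only among the rows where SomeColumn is not NULL,
-- so any number of NULLs can coexist with otherwise distinct values.
CREATE UNIQUE NONCLUSTERED INDEX UX_SomeTable_SomeColumn
ON dbo.SomeTable (SomeColumn)
WHERE SomeColumn IS NOT NULL;
```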
Here is I think probably the cleanest way. Just use dynamic sql and a single select statement to create a query that gives you a total row count and a count of distinct values for each field.
Fill in the DB name and tablename at the top. The DB name part is really important since OBJECT_NAME only works in the current database context.
use DatabaseName
DECLARE @Table varchar(100) = 'TableName'
DECLARE @SQL Varchar(max)
SET @SQL = 'SELECT COUNT(*) as ''Total'''
SELECT @SQL = @SQL + ',COUNT(DISTINCT ' + QUOTENAME(name) + ') as ''' + name + ''''
FROM sys.columns c
WHERE OBJECT_NAME(object_id) = @Table
SET @SQL = @SQL + ' FROM ' + QUOTENAME(@Table)
-- the parentheses are required: EXEC @SQL would try to run a stored procedure named by the string
EXEC (@SQL)
If you are using SQL Server 2008, you can use the Data Profiling Task in SSIS to return Candidate Keys for each table.
This blog entry steps through the process, it's fairly simple:
http://consultingblogs.emc.com/jamiethomson/archive/2008/03/04/ssis-data-profiling-task-part-8-candidate-key.aspx
A few words about what my code does:
It reads all tables and columns.
It creates a temp table to hold the table/column pairs that contain duplicate values.
For each table/column pair it runs a query; if at least one value has count(*)>1, it inserts the pair into the temp table.
Finally, it selects from the system tables the tables and columns that do not match any pair found to have duplicates.
DECLARE @sql VARCHAR(max)
DECLARE @table VARCHAR(100)
DECLARE @column VARCHAR(100)
CREATE TABLE #temp (tname VARCHAR(100),cname VARCHAR(100))
DECLARE mycursor CURSOR FOR
select t.name,c.name
from sys.tables t
join sys.columns c on t.object_id = c.object_id
where system_type_id not in (34,35,99)
OPEN mycursor
FETCH NEXT FROM mycursor INTO @table,@column
WHILE @@FETCH_STATUS = 0
BEGIN
-- record the table/column pair if any value occurs more than once
SET @sql = 'INSERT INTO #temp SELECT DISTINCT '''+@table+''','''+@column+ ''' FROM ' + QUOTENAME(@table) + ' GROUP BY ' + QUOTENAME(@column) +' HAVING COUNT(*)>1 '
EXEC (@sql)
FETCH NEXT FROM mycursor INTO @table,@column
END
-- everything not recorded in #temp has no duplicates
select t.name,c.name
from sys.tables t
join sys.columns c on t.object_id = c.object_id
left join #temp on t.name = #temp.tname and c.name = #temp.cname
where system_type_id not in (34,35,99) and #temp.tname IS NULL
DROP TABLE #temp
CLOSE mycursor
DEALLOCATE mycursor
What about a simple one-liner:
CREATE UNIQUE INDEX index_name ON table_name (column_name);
If the index is created then your column_name has only unique values. If there are dupes in your column_name, you will get an error message.
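If you only want the answer and not the index, one option is to wrap the attempt in a transaction and roll it back either way. This is a sketch with the same placeholder names (table_name, column_name) plus a made-up index name UX_probe; keep in mind that building the index still costs a full pass over the table.

```sql
BEGIN TRY
    BEGIN TRANSACTION;
    CREATE UNIQUE INDEX UX_probe ON dbo.table_name (column_name);
    PRINT 'column_name contains only unique values';
    -- Undo the index creation; we only wanted the check.
    ROLLBACK TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    -- Typically reports that a duplicate key was found.
    PRINT ERROR_MESSAGE();
END CATCH;
```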