Display count AND percentage of distinct values in T-SQL

I'm working in MSSQL 2008 on a stored procedure for data profiling. I have a script that returns the distinct values and count of each using the following dynamic SQL:
SELECT @SQL = N'SELECT ' + @FieldName + ', COUNT(*) AS [Frequency] FROM ' + @TableName + ' GROUP BY ' + @FieldName + ' ORDER BY [Frequency] DESC';
I would like to add percentage of each distinct count to that output. I think the technique used here would work for what I'm doing but I can't figure out how to integrate the two.
The desired output would show the distinct values, the count of each, and the percentage of each.
Thanks in advance for any help.

Your query should look like this:
SELECT @SQL =
N'SELECT ' +
@FieldName +
', COUNT(*) AS [Frequency] '+
', (COUNT(*)* 100 / (Select Count(*) From '+@TableName+ ')) AS [percentage] ' +
'FROM ' +
@TableName +
' GROUP BY ' +
@FieldName +
' ORDER BY [Frequency] DESC';
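Here is a sketch of a variant that is not part of the original answer. The scalar subquery can be replaced with a windowed SUM(COUNT(*)) OVER (), which adds up the per-group counts without a second SELECT COUNT(*) against the table, and QUOTENAME can bracket the identifiers. Note that COUNT(*)* 100 / ... in the answer above is integer arithmetic and truncates, so this version multiplies by 100.0 instead. The #ProfileDemo table and its Status column are hypothetical sample data. QUOTENAME(@TableName) assumes a bare table name; a schema-qualified name would need each part quoted separately.

```sql
-- Hypothetical sample data standing in for the profiled table
CREATE TABLE #ProfileDemo (Status varchar(10));
INSERT INTO #ProfileDemo (Status) VALUES ('A'), ('A'), ('B');

DECLARE @TableName sysname = N'#ProfileDemo',  -- bare table name (no schema)
        @FieldName sysname = N'Status',
        @SQL nvarchar(max);

-- SUM(COUNT(*)) OVER () totals the per-group counts, giving the table row count
-- without a separate SELECT COUNT(*) subquery against the table.
SET @SQL = N'SELECT ' + QUOTENAME(@FieldName)
    + N', COUNT(*) AS [Frequency]'
    + N', CAST(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS decimal(5,2)) AS [Percent]'
    + N' FROM ' + QUOTENAME(@TableName)
    + N' GROUP BY ' + QUOTENAME(@FieldName)
    + N' ORDER BY [Frequency] DESC;';

PRINT @SQL;
EXEC sp_executesql @SQL;   -- A: 2, 66.67 / B: 1, 33.33

DROP TABLE #ProfileDemo;
```

Windowed aggregates with an empty OVER () clause are available in SQL Server 2005 and later, so this should also apply to the 2008 instance in the question.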

I wanted to share something I added to the excellent answer from @DhruvJoshi in hopes that it might help someone in the future.
I wanted to have percentages displayed with 2 decimal places and add a percentage sign to the output. Here's what I ended up using:
SELECT @SQL =
N'SELECT ' +
@FieldName +
', COUNT(*) AS [Frequency] '+
', CAST(CONVERT(DECIMAL(10,2),(COUNT(*)* 100.0 / (Select Count(*) From '+@TableName+ '))) AS nvarchar) + ''%'' AS [Percent] ' +
'FROM ' +
@TableName +
' GROUP BY ' +
@FieldName +
' ORDER BY [Frequency] DESC';
EXECUTE sp_executesql @SQL;
Hope that helps someone in the future. Thanks again @DhruvJoshi
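The 100.0 in the query above matters because COUNT(*) returns int, and an expression made only of int values stays in integer arithmetic, which discards the fraction. Here is a quick check with constants. The values 1 and 3 are arbitrary stand-ins for a group count and a table total.

```sql
-- int arithmetic throughout: the fraction is truncated
SELECT 1 * 100 / 3 AS int_pct;        -- 33

-- A decimal literal promotes the expression to decimal, keeping the fraction
SELECT 1 * 100.0 / 3 AS decimal_pct;

-- Round to two places, convert to text, and append the percent sign
SELECT CAST(CAST(1 * 100.0 / 3 AS decimal(5,2)) AS nvarchar(10)) + N'%' AS pct_text;   -- 33.33%
```

This version gives nvarchar an explicit length. Without one, CAST and CONVERT default to 30 characters. That is enough here, but relying on the default is easy to get wrong elsewhere.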

Related

How to display the output and also save it in global temp table in ms-sql

Usually a SELECT statement displays its output, but when INSERT INTO is used, the result is stored in the temp table instead. I want to do both in dynamic SQL: display the result and also store it in a temp table.
IF @DisplayInSelect IS NOT NULL
SET @DisplayInSelect = ','+@DisplayInSelect
SET @SQL = 'IF EXISTS (SELECT DISTINCT a.'+@column_name+' FROM ['+@TableName+'] a where '+@FullCondition+' )'+
'SELECT DISTINCT ''Error at column: '+@Column_name+''' as [Error Records if found any are shown below],'''+ISNULL(@CustomErrorMessage,'ERROR')+''''+ISNULL(@DisplayInSELECT,'')+', a.'+@column_name+',* FROM ['+@TableName+'] a where '+@FullCondition+'
INSERT INTO ##error_check(SELECT DISTINCT ''Error at column: '+@Column_name+''' as [Error Records if found any are shown below],'''+ISNULL(@CustomErrorMessage,'ERROR')+''''+ISNULL(@DisplayInSELECT,'')+', a.'+@column_name+', * FROM ['+@TableName+'] a where '+@FullCondition+');
PRINT('IQR1 sql is'+@SQL)
EXEC(@SQL)
END
You have to use INSERT INTO together with EXEC. Try this:
IF @DisplayInSelect IS NOT NULL
SET @DisplayInSelect = ',' + @DisplayInSelect
-- @SQL holds a single conditional SELECT; the INSERT happens outside the dynamic SQL
SET @SQL = 'IF EXISTS (SELECT DISTINCT a.' + @column_name + ' FROM [' + @TableName + '] a where ' + @FullCondition + ' ) ' + 'SELECT DISTINCT ''Error at column: ' + @Column_name + ''' as [Error Records if found any are shown below],''' + ISNULL(@CustomErrorMessage, 'ERROR') + '''' + ISNULL(@DisplayInSELECT, '') + ', a.' + @column_name + ', * FROM [' + @TableName + '] a where ' + @FullCondition + ';'
PRINT (' IQR1 sql IS ' + @SQL)
--To Save
INSERT INTO ##error_check
EXEC (@SQL)
--To Display
EXEC (@SQL)
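Here is a sketch of a related pattern. It is an alternative, not the answer's code. It runs the dynamic query once with INSERT ... EXEC and then displays the saved rows with a plain SELECT. This avoids executing the query twice and guarantees that the displayed rows are exactly the saved ones. The #source table, its columns, and the ##error_check column layout are hypothetical. The real ##error_check must have columns that match whatever the dynamic SELECT returns.

```sql
-- Hypothetical source table standing in for [@TableName]
CREATE TABLE #source (id int, val varchar(10));
INSERT INTO #source (id, val) VALUES (1, 'ok'), (2, NULL);

-- Hypothetical layout for ##error_check; it must match the dynamic SELECT's columns
IF OBJECT_ID('tempdb..##error_check') IS NOT NULL DROP TABLE ##error_check;
CREATE TABLE ##error_check (error_message varchar(100), id int, val varchar(10));

-- Dynamic SQL can see #source because it runs in a child scope of this batch
DECLARE @SQL nvarchar(max) =
    N'SELECT ''Error at column: val'', id, val FROM #source WHERE val IS NULL;';

-- Save: run the dynamic query once and capture its result set
INSERT INTO ##error_check (error_message, id, val)
EXEC sp_executesql @SQL;

-- Display: read back what was captured instead of executing @SQL again
SELECT error_message, id, val FROM ##error_check;

DROP TABLE #source;
```

The final SELECT returns the single row for id 2. ##error_check is deliberately left in place so that it can be queried later in the session.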

Pivot query replacing column name instead of adding a new one

I have a dynamic pivot query that I use to retrieve data. But instead of adding a new column to the result, it replaces the last column, which causes data binding issues for one of my controls on the front end.
Set @SQL = 'WITH ResultTable AS (' + char(13)
Set @SQL = @SQL + 'Select row_number() OVER (Order By CensusId) AS RowNum ,QuoteId, CensusId, FirstName, LastName, MiddleName, InputAge, Gender, SalaryAmount, SalaryTypeId, IsDeleted,bCensusCount, Vgtl, Vstd ' + @colnames + ' ' + char(13)
Set @SQL = @SQL + 'From (Select F.CensusId, ColumnName, ColumnValue, QuoteId, FirstName, LastName, MiddleName, InputAge, Gender, SalaryAmount, SalaryTypeId, IsDeleted,bCensusCount, Vgtl, Vstd ' + char(13)
Set @SQL = @SQL + 'FROM #TblX F LEFT OUTER JOIN Census.tbl_Census c ON F.CensusId = c.CensusId) P ' + char(13)
Set @SQL = @SQL + 'Pivot (Max(ColumnValue) For ColumnName In (' + @colnames + ')) as Pvt ' + char(13)
Set @SQL = @SQL + ') SELECT * FROM ResultTable ' + char(13)
Set @SQL = @SQL + 'Where RowNum > ' + CAST(@FirstRec as varchar) + ' and RowNum < ' + CAST(@LastRec as varchar)
Exec (@SQL)
Vgtl and Vstd are the columns that I added. Vgtl shows up fine, but Vstd is replaced by different text. I can't figure out what is causing the column name to be replaced.
The problem is in the line that builds the SELECT list. There is no comma between Vstd and @colnames, so when @colnames is appended, the first column name in @colnames becomes an alias for Vstd:
'........, IsDeleted,bCensusCount, Vgtl, Vstd ' + @colnames + ' ' + char(13)
Therefore, add a comma after the Vstd column, and that should fix the issue:
'........, IsDeleted,bCensusCount, Vgtl, Vstd, ' + @colnames + ' ' + char(13)
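Here is a minimal sketch that shows the effect in isolation. The table variable @t and its [Col1] column are made up for the demo. Without the comma, [Col1] is parsed as a column alias for Vstd rather than as a second column.

```sql
-- Hypothetical table standing in for the pivot output
DECLARE @t TABLE (Vstd varchar(10), [Col1] varchar(20));
INSERT INTO @t (Vstd, [Col1]) VALUES ('std value', 'pivot value');

-- Missing comma: one column, named Col1, containing Vstd's data ('std value')
SELECT Vstd [Col1] FROM @t;

-- With the comma: two columns, Vstd and Col1
SELECT Vstd, [Col1] FROM @t;
```

Running PRINT @SQL before Exec (@SQL) is a cheap way to catch this kind of problem, because the missing comma is easy to see in the generated text.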

Dynamically Count Null Values in SQL Server

I'm a little new at SQL, so please bear with me. I am attempting to write a query that will allow me to loop through an entire table and find the number of times null values appear in each column. This is easy to do the hard way by typing the following:
Select
SUM(CASE WHEN COL_1 IS NULL THEN 1 ELSE 0 END) AS COL_1_NULLS
,SUM(CASE WHEN COL_2 IS NULL THEN 1 ELSE 0 END) AS COL_2_NULLS
FROM TABLE1
This is easy but it can become arduous if you want to do this for multiple tables or if a single table has a lot of columns.
I'm looking for a way to write a query that takes a table name and then loops through each column in that table (possibly pulling the column names by ordinal position via a join to a metadata view?) and sums the number of nulls in each column. Before anyone jumps on the nitpick bandwagon, please keep in mind that this basic idea could be used for more than just finding nulls. Any assistance with this issue is greatly appreciated.
You need to use dynamic SQL:
declare @custom_sql varchar(max)
set @custom_sql = 'SELECT null as first_row'
select
@custom_sql = @custom_sql + ', ' + 'SUM(CASE WHEN ' + COLUMN_NAME + ' IS NULL THEN 1 ELSE 0 END) as ' + COLUMN_NAME + '_NULLS'
from
INFORMATION_SCHEMA.COLUMNS where table_name = 'MYTABLE'
set @custom_sql = @custom_sql + ' FROM MYTABLE'
exec(@custom_sql)
You can also use the COALESCE function (just for a slightly different approach):
declare @custom_sql varchar(max)
select
@custom_sql = COALESCE(@custom_sql + ', ', '') + 'SUM(CASE WHEN ' + COLUMN_NAME + ' IS NULL THEN 1 ELSE 0 END) as ' + COLUMN_NAME + '_NULLS'
from
INFORMATION_SCHEMA.COLUMNS where table_name = 'users'
set @custom_sql = 'SELECT ' + @custom_sql
set @custom_sql = @custom_sql + ' FROM Users'
print @custom_sql
exec(@custom_sql)
I don't know how to make a generic query, but you can always generate the script like this:
declare @sql nvarchar(max) = 'select 1 as dummy'
select @sql = @sql + '
, sum(case when [' + c.name + '] is null then 1 else 0 end) as [' + c.name + '_NULLS]'
from sys.columns c
join sys.tables t on t.object_id = c.object_id
where t.name = 'TABLE1'
set @sql = @sql + ' from TABLE1'
select @sql
Then you can execute the result, e.g. with exec sp_executesql @sql.
For a cooler approach, you can use ISNULL to skip the first comma.
declare @sql nvarchar(max)
declare @tablename nvarchar(255) = 'xxxx'
Select @sql = ISNULL(@sql + ',','') + ' ' + COLUMN_NAME + '_count = Sum(case when ' + COLUMN_NAME + ' is null then 1 else 0 end)' + char(13)
From information_schema.columns
where table_name = @tablename
set @sql = 'Select' + @sql + ' From ' + @tablename
print @sql
exec sp_executesql @sql
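This sketch combines pieces of the answers above (sys.columns metadata, bracket quoting, sp_executesql) with a different counting trick. COUNT(*) counts every row, while COUNT(column) skips NULLs, so their difference is the NULL count. The dbo.NullDemo table is hypothetical and is created and dropped by the script. One caveat: COUNT does not accept the legacy text, ntext, or image types, so columns of those types would have to be filtered out.

```sql
-- Hypothetical demo table; created and dropped by this script
CREATE TABLE dbo.NullDemo (a int NULL, b varchar(10) NULL);
INSERT INTO dbo.NullDemo (a, b) VALUES (1, NULL), (NULL, NULL), (3, 'x');

DECLARE @table nvarchar(300) = N'dbo.NullDemo',  -- schema-qualified name to profile
        @sql nvarchar(max);

-- COUNT(*) counts every row while COUNT(col) skips NULLs,
-- so their difference is the number of NULLs in that column.
SELECT @sql = COALESCE(@sql + N', ', N'')
    + N'COUNT(*) - COUNT(' + QUOTENAME(c.name) + N') AS '
    + QUOTENAME(c.name + N'_NULLS')
FROM sys.columns AS c
WHERE c.[object_id] = OBJECT_ID(@table);

-- Quote the schema and table parts separately
SET @sql = N'SELECT ' + @sql + N' FROM '
    + QUOTENAME(OBJECT_SCHEMA_NAME(OBJECT_ID(@table))) + N'.'
    + QUOTENAME(OBJECT_NAME(OBJECT_ID(@table))) + N';';

PRINT @sql;
EXEC sp_executesql @sql;   -- a_NULLS = 1, b_NULLS = 2

DROP TABLE dbo.NullDemo;
```

Looking the table up through OBJECT_ID also covers schemas. Two tables with the same name in different schemas are not mixed together, as they can be when filtering sys.tables or INFORMATION_SCHEMA.COLUMNS by name alone.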

Incorrect syntax near ',' in SQL dynamic query

When I compile there are no issues, but when I execute this stored procedure, I get:
Incorrect syntax near ','
I can't for the life of me figure out where the issue is:
DECLARE @sql nvarchar(4000)
SET @sql = 'SELECT idea, id, posted_by
FROM
(idea,id, posted_by, ROW_NUMBER() OVER(ORDER BY ' + @sortExpression + ') as RowNum
FROM ideas e
INNER JOIN buckets d ON
e.bucket_id = d.id
WHERE e.bucket_id = ' + CONVERT(nvarchar(10), @bucketId) + '
) as EmpInfo
WHERE RowNum BETWEEN ' + CONVERT(nvarchar(10), @startRowIndex) +
' AND (' + CONVERT(nvarchar(10), @startRowIndex) + ' + '
+ CONVERT(nvarchar(10), @maximumRows) + ') - 1'
EXEC sp_executesql @sql
You shouldn't be concatenating all of those things into dynamic SQL. In fact, you should try to get the ORDER BY expression working without introducing dynamic SQL at all, but I understand that this can be problematic when users can pick several columns, different data types, and both directions (I discussed this problem here). So please pass in the other parameters safely:
DECLARE @sql nvarchar(4000); -- always use semi-colons
DECLARE @r INT;
SET @r = @startRowIndex + @maximumRows - 1;
SET @sql = 'SELECT idea, id, posted_by
FROM
(
SELECT -- this was your actual problem
idea,id, posted_by, -- you should prefix these with the alias
ROW_NUMBER() OVER (ORDER BY ' + @sortExpression + ') as RowNum
FROM dbo.ideas e -- always use schema prefix
INNER JOIN dbo.buckets d ON
e.bucket_id = d.id
WHERE e.bucket_id = @bucketId
) as EmpInfo
WHERE RowNum BETWEEN @startRowIndex AND @r;';
EXEC sp_executesql @sql, N'@startRowIndex INT, @bucketId INT, @r INT',
@startRowIndex, @bucketId, @r;
For some of my comments:
Bad Habits to Kick : Using EXEC() instead of sp_executesql
Ladies and gentlemen, start your semi-colons!
Bad habits to kick : avoiding the schema prefix
Also, if you're using SQL Server 2012, you should probably be using OFFSET / FETCH.
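Here is a sketch of the OFFSET / FETCH version for SQL Server 2012 and later. It rests on two assumptions. First, #ideas and #buckets are hypothetical temp-table stand-ins for dbo.ideas and dbo.buckets. Second, the free-form @sortExpression is replaced by a hypothetical @sortColumn that holds a single column name and is checked against a whitelist before it is concatenated. Everything else goes to sp_executesql as a typed parameter.

```sql
-- Hypothetical stand-ins for dbo.ideas and dbo.buckets from the question
CREATE TABLE #buckets (id int PRIMARY KEY);
CREATE TABLE #ideas (id int PRIMARY KEY, idea nvarchar(100), posted_by nvarchar(50), bucket_id int);
INSERT INTO #buckets (id) VALUES (1);
INSERT INTO #ideas (id, idea, posted_by, bucket_id)
VALUES (1, N'first', N'ann', 1), (2, N'second', N'bob', 1), (3, N'third', N'cat', 1);

DECLARE @sortColumn sysname = N'posted_by',   -- would come from the caller
        @bucketId int = 1,
        @startRowIndex int = 2,               -- 1-based, as in the question
        @maximumRows int = 2,
        @sql nvarchar(max);

-- Only known column names may reach the dynamic ORDER BY
IF @sortColumn NOT IN (N'idea', N'id', N'posted_by')
BEGIN
    RAISERROR(N'Invalid sort column.', 16, 1);
    RETURN;
END;

SET @sql = N'SELECT e.idea, e.id, e.posted_by
FROM #ideas AS e
INNER JOIN #buckets AS d ON e.bucket_id = d.id
WHERE e.bucket_id = @bucketId
ORDER BY e.' + QUOTENAME(@sortColumn) + N'
OFFSET @startRowIndex - 1 ROWS
FETCH NEXT @maximumRows ROWS ONLY;';

-- Everything except the vetted column name is passed as a typed parameter
EXEC sp_executesql @sql,
    N'@bucketId int, @startRowIndex int, @maximumRows int',
    @bucketId, @startRowIndex, @maximumRows;   -- rows for bob and cat

DROP TABLE #ideas;
DROP TABLE #buckets;
```

Sort direction could be handled the same way, by whitelisting ASC and DESC. If the sort column is not unique, rows with equal values can move between pages from one call to the next.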
You need a SELECT at the beginning of your inner query:
DECLARE @sql nvarchar(4000)
SET @sql = 'SELECT idea, id, posted_by
FROM
(SELECT idea,id, posted_by, ROW_NUMBER() OVER(ORDER BY ' + @sortExpression + ') as RowNum
FROM ideas e
INNER JOIN buckets d ON
e.bucket_id = d.id
WHERE e.bucket_id = ' + CONVERT(nvarchar(10), @bucketId) + '
) as EmpInfo
WHERE RowNum BETWEEN ' + CONVERT(nvarchar(10), @startRowIndex) +
' AND (' + CONVERT(nvarchar(10), @startRowIndex) + ' + '
+ CONVERT(nvarchar(10), @maximumRows) + ') - 1'
EXEC sp_executesql @sql
The only change from your code is the SELECT keyword at the start of the inner query.
You are missing the SELECT in your inner query:
DECLARE @sql nvarchar(4000)
SET @sql = 'SELECT idea, id, posted_by
FROM
(SELECT idea,id, posted_by, ROW_NUMBER() OVER(ORDER BY ' + @sortExpression + ') as RowNum
FROM ideas e
INNER JOIN buckets d ON
e.bucket_id = d.id
WHERE e.bucket_id = ' + CONVERT(nvarchar(10), @bucketId) + '
) as EmpInfo
WHERE RowNum BETWEEN ' + CONVERT(nvarchar(10), @startRowIndex) +
' AND (' + CONVERT(nvarchar(10), @startRowIndex) + ' + '
+ CONVERT(nvarchar(10), @maximumRows) + ') - 1'
EXEC sp_executesql @sql

Most recent datetime column and count for each table

I have a DB that has 1000+ tables. 100 of those tables are prefixed with three letters (let's say 'ABC'). Only half of those prefixed tables have a MODIFIEDDATETIME column.
I'm trying to write a simple SELECT query that returns the most recent MODIFIEDDATETIME value for each table that begins with the three-letter prefix and actually has a MODIFIEDDATETIME column.
I've tried using this function but it doesn't seem to be getting me there. Thoughts?
sp_msforeachtable '
select ''?'', modifieddatetime, count(*)
from ?
where ? like ''%ABC%''
group by modifieddatetime
order by modifieddatetime desc
'
Borrowing from another answer earlier today:
For one, I recommend staying away from undocumented and unsupported procedures like sp_MSForEachTable. They can be changed or even removed from SQL Server at any time, and this specific procedure may have the same symptoms reported by many against sp_MSForEachDb. (See some background here and here.)
...but also see sp_ineachdb.
Here is how I would do it. Most importantly, pull the row count from the metadata: while not 100% accurate to the millisecond, it is usually close enough, and it will not bog down your system by scanning every single table:
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = N'';
CREATE TABLE #x
(
[table] NVARCHAR(255),
updated DATETIME,
[rowcount] BIGINT
);
SELECT @sql = @sql + N'INSERT #x SELECT '''
+ QUOTENAME(OBJECT_SCHEMA_NAME([object_id]))
+ '.' + QUOTENAME(OBJECT_NAME([object_id])) + ''',
MAX(MODIFIEDDATETIME), (SELECT SUM(rows) FROM sys.partitions
WHERE [object_id] = ' + CONVERT(VARCHAR(12), [object_id])
+ ') FROM ' + QUOTENAME(OBJECT_SCHEMA_NAME([object_id]))
+ '.' + QUOTENAME(OBJECT_NAME([object_id])) + ';'
FROM sys.columns
WHERE UPPER(name) = 'MODIFIEDDATETIME'
AND UPPER(OBJECT_NAME([object_id])) LIKE 'ABC%';
EXEC sp_executesql @sql;
SELECT [table],updated,[rowcount] FROM #x;
DROP TABLE #x;
That said, I don't know if using MAX(MODIFIEDDATETIME) is appropriate for knowing when a table was touched last. What if a transaction failed? What if the last operation was a delete?
You could do it with dynamic SQL, but this will probably not be very efficient on 1000 tables!
DECLARE @SQL NVARCHAR(MAX) = ''
SELECT @SQL = @SQL + ' UNION SELECT COUNT(' + QUOTENAME(Column_Name) + ') [Rows], MAX(' + QUOTENAME(Column_Name) + ') [MaxModifiedDate], ''' + QUOTENAME(Table_Schema) + '.' + QUOTENAME(Table_Name) + ''' [TableName] FROM ' + QUOTENAME(Table_Schema) + '.' + QUOTENAME(Table_Name)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE Column_Name = 'ModifiedDateTime'
AND Table_Name LIKE 'ABC%'
SET @SQL = 'SELECT MaxModifiedDate, TableName, Rows FROM (' + STUFF(@SQL, 1, 7, '') + ') t ORDER BY MaxModifiedDate DESC'
print @SQL
EXEC SP_EXECUTESQL @SQL
It basically builds a query like this:
SELECT MaxModifiedDate, TableName, Rows
FROM ( SELECT 'Table1' [TableName], MAX(ModifiedDateTime) [MaxModifiedDate], COUNT(ModifiedDateTime) [Rows]
FROM Table1
UNION
SELECT 'Table2' [TableName], MAX(ModifiedDateTime) [MaxModifiedDate], COUNT(ModifiedDateTime) [Rows]
FROM Table2
UNION
SELECT 'Table3' [TableName], MAX(ModifiedDateTime) [MaxModifiedDate], COUNT(ModifiedDateTime) [Rows]
FROM Table3
UNION
...
) c
ORDER BY MaxModifiedDate DESC