Finding duplicate rows in table with many columns - sql

I have a table with 122 columns and ~200K rows. There are duplicate rows in this table. What query can I use to return these duplicate rows? Normally I would GROUP BY all columns and COUNT, but with 122 columns, this becomes unwieldy.
Basically, I'm looking for a query that does this:
SELECT *, COUNT(*) AS NoOfOccurrences
FROM TableName
GROUP BY *
HAVING COUNT(*) > 1

If you are using SSMS you can right-click on the table and pick "Select Top 1000 rows..." - SSMS will generate the select query for you - then all you have to do is add GROUP BY then copy the column list and paste it after that. Add the HAVING COUNT(*) > 1 and the COUNT(*) AS NoOfOccurrences and run.
I suggest that you include an ORDER BY clause as well so that the duplicated rows are displayed together in your results.
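As a rough sketch of what the finished query looks like, here it is with only three columns. Col1, Col2 and Col3 are placeholder names, not columns from the question; the real query would list all 122.

```sql
-- Placeholder column names; substitute the full column list generated by SSMS.
SELECT Col1, Col2, Col3, COUNT(*) AS NoOfOccurrences
FROM TableName
GROUP BY Col1, Col2, Col3
HAVING COUNT(*) > 1
-- Keeps the duplicated rows next to each other in the results.
ORDER BY Col1, Col2, Col3;
```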
If you are not using SSMS then you can run this dynamic SQL:
-- Get the list of columns
-- There are loads of ways of doing this
declare @colList nvarchar(MAX) = '';
-- QUOTENAME brackets each name so that spaces or reserved words do not break the query
select @colList = @colList + QUOTENAME(c.[name]) + ','
from sys.columns c
join sys.tables t on c.object_id = t.object_id
-- must be the same table that is used in the FROM clause below
where t.[name] = 'TableName';
-- remove the trailing comma
set @colList = LEFT(@colList, LEN(@colList) - 1);
-- generate dynamic SQL
declare @sql1 nvarchar(max) = 'SELECT *, COUNT(*) AS NoOfOccurrences FROM TableName GROUP BY ';
declare @sql2 nvarchar(max) = ' HAVING COUNT(*) > 1';
declare @sql nvarchar(max) = CONCAT(@sql1, @colList, @sql2);
--print @sql
-- run the SQL
exec sp_executesql @sql;
For other ways of generating comma separated lists see Converting row values in a table to a single concatenated string
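One such alternative, sketched here on the assumption that you are on SQL Server 2017 or later (where STRING_AGG is available) and that the table lives in the dbo schema, builds the column list in a single statement:

```sql
-- Assumes SQL Server 2017+ and a table named dbo.TableName.
DECLARE @colList NVARCHAR(MAX);

SELECT @colList = STRING_AGG(CAST(QUOTENAME(c.[name]) AS NVARCHAR(MAX)), N',')
                  WITHIN GROUP (ORDER BY c.column_id)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.TableName');

-- Inspect the generated list before using it in dynamic SQL.
PRINT @colList;
```

The CAST to NVARCHAR(MAX) is there so that the aggregated string is not limited to 8000 bytes, which a 122-column list could otherwise approach.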

Related

Dynamic SQL causing SQL statement failed error

I am trying to select all the contents of all the columns stored in a variable from a table in SQL Server using dynamic SQL.
Here is my code:
IF #UserChoice# = 'MANGO' BEGIN
DECLARE @columns NVARCHAR(MAX)
SELECT @columns = COALESCE(@columns + ', ', '') + Cols
FROM (
SELECT DISTINCT x.value('local-name(.)', 'SYSNAME') AS Cols
FROM DBO.Fruits AS t
CROSS APPLY (SELECT t.* FOR XML PATH(''), TYPE, ROOT('root')) AS t1(c)
CROSS APPLY c.nodes('/root/*') AS t2(x)
) temp
DECLARE @sql NVARCHAR(MAX) = 'SELECT ' + @columns + ' FROM DBO.Foods;'
EXEC sp_executesql @sql;
END
This code involves two tables: dbo.Fruits, and dbo.Foods, which has every column that exists in dbo.Fruits plus additional columns and rows. However, dbo.Fruits has some null columns, that is, columns with no data in any of their rows. With the help of XML, the null columns are removed and only the non-null column names are stored in the variable @columns.
The dynamic SQL then selects the columns listed in @columns from the table dbo.Foods.
The filtering of null columns works, but when I try to run the dynamic SQL I get an error saying "SQL statement failed".
FYI: The table holds a huge amount of data. I have also tried the timeout feature, but that did not work.
Any help is appreciated.
Thanks in advance.

Selecting columns in a table where table name is selected from another table and concatenate them selecting only specific columns

I have a table:
TableName rn
Tab_1 1
Tab_2 2
Tab_3 3
Tab_1, Tab_2 and Tab_3 are tables stored in the database.
What I want is to read all these tables using a loop, select specific columns (say col1, col2, and col3) and concatenate them.
What I tried was:
'''
DECLARE
@table NVARCHAR(128),
@sql NVARCHAR(MAX);
SET @table = N'select tablename from #db2 where rn=1';
SET @sql = N'SELECT * FROM ' + @table;
EXEC sp_executesql @sql;
'''
This query does not yet concatenate the tables one by one; I am first trying to select a table dynamically before I use it in a loop. This does not seem to work: it returns "incorrect syntax near 'select'".
#db2 is the temp table that has all the table names.
I have looked at various methods but have not been able to find one that suits this specific problem.
How do I go about working this out?
Your SET statement stores the text of a SQL SELECT statement in @table, not its result:
SET @table = N'select tablename from #db2 where rn=1';
As a result, the final @sql is "SELECT * FROM select tablename from #db2 where rn=1", which is not valid.
Instead, assign the value to @table like this:
Select @table = db.TableName From #db2 As db With (Nolock) Where db.rn = 1
Try it this way and it will work. To process every table, you need to use either a WHILE loop or a cursor.
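A minimal sketch of the WHILE loop variant follows. It assumes that "concatenate" means stacking the rows with UNION ALL, that rn runs from 1 upwards in #db2, and that every listed table really has col1, col2 and col3 with compatible types. None of that is confirmed by the question.

```sql
DECLARE @rn INT = 1,
        @maxRn INT,
        @table NVARCHAR(128),
        @sql NVARCHAR(MAX) = N'';

SELECT @maxRn = MAX(rn) FROM #db2;

WHILE @rn <= @maxRn
BEGIN
    -- Reset first, so a gap in rn does not reuse the previous table name.
    SET @table = NULL;
    SELECT @table = TableName FROM #db2 WHERE rn = @rn;

    IF @table IS NOT NULL
        SET @sql = @sql
            + CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END
            + N'SELECT col1, col2, col3 FROM ' + QUOTENAME(@table);

    SET @rn += 1;
END;

-- Run the combined statement only if at least one table was found.
IF @sql <> N''
    EXEC sp_executesql @sql;
```

QUOTENAME is used so that a table name containing spaces or reserved words does not break the generated statement.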

How can I find potential not null columns?

I'm working with a SQL Server database which is very light on constraints and want to apply some not null constraints. Is there any way to scan all nullable columns in the database and select which ones do not contain any nulls or even better count the number of null values?
Perhaps with a little dynamic SQL
Example
Declare @SQL varchar(max) = '>>>'
Select @SQL = @SQL
+ 'Union All Select TableName='''+quotename(Table_Schema)+'.'+quotename(Table_Name)+''''
+',ColumnName='''+quotename(Column_Name)+''''
+',NullValues=count(*)'
+' From '+quotename(Table_Schema)+'.'+quotename(Table_Name)
+' Where '+quotename(Column_Name)+' is null '
From INFORMATION_SCHEMA.COLUMNS
Where Is_Nullable='YES'
Select @SQL='Select * from (' + replace(@SQL,'>>>Union All ','') + ') A Where NullValues>0'
Exec(@SQL)
Returns (for example)
TableName ColumnName NullValues
[dbo].[OD-Map] [Map-Val2] 185
[dbo].[OD-Map] [Map-Val3] 225
[dbo].[OD-Map] [Map-Val4] 225
For all table/columns with counts >= 0
...
Select @SQL=replace(@SQL,'>>>Union All ','')
Exec(@SQL)
Check this query. This was originally written by Linda Lawton
Original Article: https://www.daimto.com/sql-server-finding-columns-with-null-values
Finding columns with null values in your Database - Find Nulls Script
set nocount on
declare @columnName nvarchar(500)
declare @tableName nvarchar(500)
declare @select nvarchar(500)
declare @sql nvarchar(500)
-- check if the Temp table already exists
if OBJECT_ID('tempdb..#LocalTempTable') is null
Begin
CREATE TABLE #LocalTempTable(
TableName varchar(150),
ColumnName varchar(150))
end
else
begin
truncate table #LocalTempTable;
end
-- Build a select for each of the columns in the database that checks for nulls
DECLARE check_cursor CURSOR FOR
select column_name, table_name, concat(' Select ''',column_name,''',''',table_name,''' from ',table_name,' where [',COLUMN_NAME,'] is null')
from INFORMATION_SCHEMA.COLUMNS
OPEN check_cursor
FETCH NEXT FROM check_cursor
INTO @columnName, @tableName,@select
WHILE @@FETCH_STATUS = 0
BEGIN
-- Insert one row per NULL value found in this column.
set @sql = 'insert into #LocalTempTable (ColumnName, TableName)' + @select
print @sql
-- Run the statement
exec( @sql)
FETCH NEXT FROM check_cursor
INTO @columnName, @tableName,@select
end
CLOSE check_cursor;
DEALLOCATE check_cursor;
SELECT TableName, ColumnName, COUNT(TableName) 'Count'
FROM #LocalTempTable
GROUP BY TableName, ColumnName
ORDER BY TableName
The query result would be something like this.
This will tell you which columns in your database are currently NULLABLE.
USE <Your_DB_Name>
GO
SELECT o.name AS Table_Name
, c.name AS Column_Name
FROM sys.objects o
INNER JOIN sys.columns c ON o.object_id = c.object_id
AND c.is_nullable = 1 /* 1 = NULL, 0 = NOT NULL */
WHERE o.type_desc = 'USER_TABLE'
AND o.type NOT IN ('PK','F','D') /* NOT Primary, Foreign or Default Key */
Yes, it is fairly straightforward. Note: if the table contains a lot of records, I suggest using SELECT TOP 1000 * instead of SELECT *.
-- Identify records where a specific column is NOT NULL
SELECT *
FROM TableName
WHERE ColumnName IS NOT NULL
-- Identify the count of records where a specific column contains NULL
SELECT COUNT(1)
FROM TableName
WHERE ColumnName IS NULL
-- Identify all NULLable columns in a database
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
For more information on the INFORMATION_SCHEMA views, see this: https://learn.microsoft.com/en-us/sql/relational-databases/system-information-schema-views/system-information-schema-views-transact-sql
If you want to scan all tables and columns in a given database for NULLs, then it is a two-step process.
1.) Get the list of tables and columns that are NULLABLE.
-- Identify all NULLable columns in a database
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
2.) Use Excel to create a SELECT statement to get the NULL counts for each table/column. To do this, copy and paste the query results from step 1 into EXCEL. Assuming you have copied the header row, then your data starts on row 2. In cell E2, enter the following formula.
="SELECT COUNT(1) FROM "&A2&"."&B2&"."&C2&" WHERE "&D2&" IS NULL"
Copy and paste that down the entire sheet. This will generate the SQL SELECT statement that you require. Copy the results in column E and paste into SQL Server and run it. This may take a while depending on the number of tables/columns to scan.

isnull for dynamically Generated column

I receive a temp table with dynamically generated columns (say columns A, B, C, D, etc.) from another source.
So what I have in hand is a temp table with generated columns, and I have to write a stored procedure that uses it.
My stored procedure looks like this:
create proc someproc()
as
begin
Insert into #searchtable
select isnull(#temp.*,0.00)
End
Here #searchtable is a table I created to store the temp table's columns. The problem arises when I want to apply isnull to the #temp columns: the source may deliver 3 columns one time and 4 columns the next. It changes.
Since the columns are dynamically generated, I cannot name each column and write something like the following:
isnull(column1,0.00)
isnull(column2,0.00)
I have to take all the generated columns and, wherever a value is empty, use 0.00.
I tried the following, but it does not work:
isnull(##temp.*,0.00),
Try dynamic code that fetches the column names for your dynamic table from [database].INFORMATION_SCHEMA.COLUMNS:
--Get the column names for your dynamic table and add the ISNULL check:
DECLARE @COLS VARCHAR(MAX) = ''
SELECT @COLS = @COLS + ', ISNULL(' + COLUMN_NAME + ', 0.00) AS ' + COLUMN_NAME
FROM tempdb.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME LIKE '#temp[_]%' -- Dynamic Table (here, Temporary table)
DECLARE @COLNAMES VARCHAR(MAX) = STUFF(@COLS, 1, 1, '')
--Build your Insert Command:
DECLARE @cmd VARCHAR(MAX) = '
INSERT INTO #temp1
SELECT ' + @COLNAMES + ' FROM #temp'
--Execute:
EXEC (@cmd)
I hope I understood your comment correctly:
CREATE PROCEDURE someproc
AS
-- Temp tables live in tempdb, so OBJECT_ID needs the tempdb.. prefix to find them
IF OBJECT_ID(N'tempdb..#searchtable') IS NOT NULL DROP TABLE #searchtable
IF OBJECT_ID(N'tempdb..#temp') IS NOT NULL
BEGIN
DECLARE @sql nvarchar(max),
@cols nvarchar(max)
SELECT @cols = (
SELECT ',COALESCE('+QUOTENAME([name])+',0.00) as '+QUOTENAME([name])
FROM tempdb.sys.columns
WHERE [object_id] = OBJECT_ID(N'tempdb..#temp')
FOR XML PATH('')
)
SELECT @sql = N'SELECT '+STUFF(@cols,1,1,'')+' INTO #searchtable FROM #temp'
EXEC sp_executesql @sql
END
This SP checks whether the #temp table exists. If it does, it takes all the column names from the sys.columns table and builds a string like ,COALESCE([Column1],0.00) as [Column1], etc. Then it builds a dynamic SQL query like:
SELECT COALESCE([Column1],0.00) as [Column1] INTO #searchtable FROM #temp
And executes it. The query result will be stored in #searchtable.
Notes: Use COALESCE instead of ISNULL, and sp_executesql instead of a direct exec. It is good practice.
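As a rough illustration of those two notes (this is not part of the original answer, and the variable names are made up for the example): ISNULL returns the data type of its first argument, whereas COALESCE picks the type by data type precedence across all of its arguments, and sp_executesql accepts typed parameters where a plain EXEC of a string does not.

```sql
-- ISNULL keeps the type of its first argument, so the replacement is cut to VARCHAR(2).
DECLARE @short VARCHAR(2) = NULL;
SELECT ISNULL(@short, 'abcdef')   AS IsNullResult,    -- 'ab'
       COALESCE(@short, 'abcdef') AS CoalesceResult;  -- 'abcdef'

-- sp_executesql can take values as parameters instead of concatenating them into the string.
DECLARE @stmt NVARCHAR(MAX) = N'SELECT name FROM sys.objects WHERE type = @type;';
EXEC sp_executesql @stmt, N'@type CHAR(2)', @type = 'U';
```

Note that only values can be parameterised this way; column and table names still have to be concatenated into the statement, ideally through QUOTENAME.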

Getting access to a dynamic table from dynamic sql

Good day StackOverflow
The table that I create from my dynamic sql can have any number of columns as it is a pivot table.
-- Pivot the table so we get the UW as columns rather than rows
DECLARE @SQL NVARCHAR(MAX)
SET @SQL = '
SELECT *
FROM #PreProcessed
PIVOT (SUM(Quotes)
FOR [UW] IN (' + @UWColumns + ')
) AS bob'
I run this code to run my dynamic sql.
EXEC sp_executesql @SQL,
N'@UWColumns nvarchar(MAX)',
@UWColumns
My question is, how do I store the resulting table? Especially when I don't know how many columns it will have or even what the columns will be called?
I tried the code below but it doesn't work
INSERT INTO #Temp
EXEC sp_executesql @SQL,
N'@UWColumns nvarchar(MAX)',
@UWColumns
Thanks everyone
SQL Server uses SELECT * INTO ...., as opposed to the CREATE TABLE AS syntax. So you'll need to modify your dynamic sql to:
SELECT * INTO <YOUR TABLE>
FROM #PreProcessed
PIVOT (SUM(Quotes)
FOR [UW] IN (' + @UWColumns + ')
) AS bob'
The only way I could find around this problem was to do all of my calculations in the dynamic SQL, which meant I had to work on two tables.
-- Pivot the table so we get the UW as columns rather than rows
DECLARE @SQL NVARCHAR(MAX)
SET @SQL = '
SELECT * INTO #temp
FROM #PreProcessed
PIVOT (SUM(Quotes)
FOR [UW] IN (' + @UWColumns + ')
) AS bob
SELECT DISTINCT t1.Date, d.Declines AS ''Declines'' , '+@UWColumns+'
FROM #temp AS t1 LEFT OUTER JOIN
#Declines AS d ON t1.DATE = d.DATE
'
PRINT @SQL
EXEC(@SQL)
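A local #temp table created inside dynamic SQL is dropped as soon as that dynamic batch finishes, which is why the follow-up query above has to sit inside the same string. One further option, which neither answer here uses, is a global temporary table (## prefix), which outlives the dynamic batch. The sketch below uses a hypothetical table name, ##PivotResult, and a trivial SELECT as a stand-in for the PIVOT query. Global temp tables are visible to every session, so the name can collide if several users run this at once.

```sql
-- Drop any leftover copy from a previous run.
IF OBJECT_ID(N'tempdb..##PivotResult') IS NOT NULL DROP TABLE ##PivotResult;

-- Stand-in for the PIVOT query; the real statement would keep its FROM ... PIVOT ... part.
DECLARE @SQL NVARCHAR(MAX) = N'SELECT 1 AS ColA, 2 AS ColB INTO ##PivotResult;';
EXEC sp_executesql @SQL;

-- The table is still available after the dynamic batch has finished.
SELECT * FROM ##PivotResult;
DROP TABLE ##PivotResult;
```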