SQL omit columns with all same value

SQL omit columns with all same value - sql

Is there a way to write a SQL query to omit columns that have all of the same values? For example,
row A B
1 9 0
2 7 0
3 5 0
4 2 0
I'd like to return just
row A
1 9
2 7
3 5
4 2

Although it is possible to use SQL to find if all rows in a column have identical values, there is no way to make a fixed SQL statement not return a column based on the content of the query.
Here is how to find out if all items in a column have identical values:
SELECT COUNT(row)=COUNT(DISTINCT B) FROM my_table
You can run a preliminary query to see if a column needs to be displayed, and then form the query dynamically, including the column only when you need it.

The only way to change what columns are returned is by executing separate queries. So you'd have to do something like:
IF EXISTS(SELECT null FROM myTable WHERE B <> 0)
BEGIN
SELECT row, A, B FROM myTable
END
ELSE
BEGIN
SELECT row, A FROM myTable
END
However it's generally bad practice to return different columns based on the data - otherwise you make the client determine if a particular column is in the result set first before trying to access the data.
This sort of requirement is more commonly done when displaying the data, e.g. in a web page, in a report, etc.

There is no way to write a query statement that will only return columns that have disparate values. You can however use some conditonnal statements to execute different queries based on your needs.
You could also insert your query result into a temporary table, loop over the columns, build a new select statement that includes only the columns that have different values and finally execute the statement.
Note: You should probably just include bit columns to indicate wheter columns are all duplicates or not. The application could then just discard any column that has been indicated as all duplicates.
Anyway, here's an example solution for SQL SERVER
-- insert results into a temp table
SELECT *
INTO #data
FROM (
SELECT 1 AS col1, 1 AS col2, 2 AS col3
UNION ALL
SELECT 2, 1, 3
) d;
DECLARE
#column sysname,
#sql nvarchar(max) = '',
#finalSql nvarchar(500) = 'SELECT ',
#allDuplicates bit;
DECLARE colsCur CURSOR
FOR
SELECT name
FROM tempdb.sys.columns
WHERE object_id = OBJECT_ID('tempdb..#data');
OPEN colsCur;
FETCH NEXT FROM colsCur INTO #column;
WHILE ##FETCH_STATUS = 0
BEGIN
SET #sql = N'SELECT #allDuplicates = CASE COUNT(DISTINCT ' + #column + ') WHEN 1 THEN 1 ELSE 0 END FROM #data';
EXEC sp_executesql #sql, N'#allDuplicates int OUT', #allDuplicates = #allDuplicates OUT;
IF #allDuplicates = 0 SET #finalSql = #finalSql + #column + ',';
FETCH NEXT FROM colsCur INTO #column;
END
CLOSE colsCur;
DEALLOCATE colsCur;
SET #finalSql = LEFT(#finalSql, LEN(#finalSql) - 1) + ' FROM #data';
EXEC sp_executesql #finalSql;
DROP TABLE #data;

Related

How to use SQL column name only if the column exists?

I've been trying to get a textual result from CASE(def.OPTION_CATEGORY_ID) in a select, but I'm having a hard time implementing the terms.
What I'm trying to do is to check if OPTION_CATEGORY_ID is an existing column in sys.columns. If it is then I'm trying to make the int to text translation using the When - Then on the bottom but the SQL is not aware of the column name so CASE(def.OPTION_CATEGORY_ID) is failing because the column name is invalid.
Is there any way to call def.OPTION_CATEGORY_ID using an alias name so the SQL won't fail it before hand?
Thanks
SELECT DISTINCT *
FROM
(SELECT def.OPTION_DEF_ID AS 'Def ID',
ass.ASSET_NAME AS 'Asset Name',
CASE(def.OPTION_TYPE_ID)
WHEN 1 THEN 'A'
WHEN 2 THEN 'B'
END AS 'Option Type',
case when exists (SELECT name
FROM sys.columns
WHERE Name = 'OPTION_CATEGORY_ID'
AND Object_ID = Object_ID(N'TFC_OPTION_DEFINITION'))
then
-- Column Exists
CASE(def.OPTION_CATEGORY_ID)
WHEN 1 THEN 'C'
WHEN 2 THEN 'D'
WHEN 3 THEN 'E'
WHEN 4 THEN 'F'
end
End AS 'test' ,
-- the rest of the select

If you want to select a column and you are not sure if it exists, you cannot write it explicitly in your code, and expect it to compile.
You should build the statement dynamically based on the run time information of which the column exists or not:
DECLARE #statement VARCHAR(MAX)
IF((SELECT COUNT(*)
FROM sys.columns
WHERE Name = 'OPTION_CATEGORY_ID'
AND Object_ID = Object_ID(N'TFC_OPTION_DEFINITION')) > 0)
BEGIN
SET #statement = --assign a query which uses OPTION_CATEGORY_ID
END
ELSE
BEGIN
SET #statement = --assign a query which does not use OPTION_CATEGORY_ID
END
EXEC sp_executesql #statement

How to UPDATE all columns of a record without having to list every column

I'm trying to figure out a way to update a record without having to list every column name that needs to be updated.
For instance, it would be nice if I could use something similar to the following:
// the parts inside braces are what I am trying to figure out
UPDATE Employee
SET {all columns, without listing each of them}
WITH {this record with id of '111' from other table}
WHERE employee_id = '100'
If this can be done, what would be the most straightforward/efficient way of writing such a query?

It's not possible.
What you're trying to do is not part of SQL specification and is not supported by any database vendor. See the specifications of SQL UPDATE statements for MySQL, Postgresql, MSSQL, Oracle, Firebird, Teradata. Every one of those supports only below syntax:
UPDATE table_reference
SET column1 = {expression} [, column2 = {expression}] ...
[WHERE ...]

This is not posible, but..
you can doit:
begin tran
delete from table where CONDITION
insert into table select * from EqualDesingTabletoTable where CONDITION
commit tran
be carefoul with identity fields.

Here's a hardcore way to do it with SQL SERVER. Carefully consider security and integrity before you try it, though.
This uses schema to get the names of all the columns and then puts together a big update statement to update all columns except ID column, which it uses to join the tables.
This only works for a single column key, not composites.
usage: EXEC UPDATE_ALL 'source_table','destination_table','id_column'
CREATE PROCEDURE UPDATE_ALL
#SOURCE VARCHAR(100),
#DEST VARCHAR(100),
#ID VARCHAR(100)
AS
DECLARE #SQL VARCHAR(MAX) =
'UPDATE D SET ' +
-- Google 'for xml path stuff' This gets the rows from query results and
-- turns into comma separated list.
STUFF((SELECT ', D.'+ COLUMN_NAME + ' = S.' + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = #DEST
AND COLUMN_NAME <> #ID
FOR XML PATH('')),1,1,'')
+ ' FROM ' + #SOURCE + ' S JOIN ' + #DEST + ' D ON S.' + #ID + ' = D.' + #ID
--SELECT #SQL
EXEC (#SQL)

In Oracle PL/SQL, you can use the following syntax:
DECLARE
r my_table%ROWTYPE;
BEGIN
r.a := 1;
r.b := 2;
...
UPDATE my_table
SET ROW = r
WHERE id = r.id;
END;
Of course that just moves the burden from the UPDATE statement to the record construction, but you might already have fetched the record from somewhere.

How about using Merge?
https://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx
It gives you the ability to run Insert, Update, and Delete. One other piece of advice is if you're going to be updating a large data set with indexes, and the source subset is smaller than your target but both tables are very large, move the changes to a temporary table first. I tried to merge two tables that were nearly two million rows each and 20 records took 22 minutes. Once I moved the deltas over to a temp table, it took seconds.

If you are using Oracle, you can use rowtype
declare
var_x TABLE_A%ROWTYPE;
Begin
select * into var_x
from TABLE_B where rownum = 1;
update TABLE_A set row = var_x
where ID = var_x.ID;
end;
/
given that TABLE_A and TABLE_B are of same schema

It is possible. Like npe said it's not a standard practice. But if you really have to:
1. First a scalar function
CREATE FUNCTION [dte].[getCleanUpdateQuery] (#pTableName varchar(40), #pQueryFirstPart VARCHAR(200) = '', #pQueryLastPart VARCHAR(200) = '', #pIncludeCurVal BIT = 1)
RETURNS VARCHAR(8000) AS
BEGIN
DECLARE #pQuery VARCHAR(8000);
WITH cte_Temp
AS
(
SELECT
C.name
FROM SYS.COLUMNS AS C
INNER JOIN SYS.TABLES AS T ON T.object_id = C.object_id
WHERE T.name = #pTableName
)
SELECT #pQuery = (
CASE #pIncludeCurVal
WHEN 0 THEN
(
STUFF(
(SELECT ', ' + name + ' = ' + #pQueryFirstPart + #pQueryLastPart FROM cte_Temp FOR XML PATH('')), 1, 2, ''
)
)
ELSE
(
STUFF(
(SELECT ', ' + name + ' = ' + #pQueryFirstPart + name + #pQueryLastPart FROM cte_Temp FOR XML PATH('')), 1, 2, ''
)
) END)
RETURN 'UPDATE ' + #pTableName + ' SET ' + #pQuery
END
2. Use it like this
DECLARE #pQuery VARCHAR(8000) = dte.getCleanUpdateQuery(<your table name>, <query part before current value>, <query part after current value>, <1 if current value is used. 0 if updating everything to a static value>);
EXEC (#pQuery)
Example 1: make all employees columns 'Unknown' (you need to make sure column type matches the intended value:
DECLARE #pQuery VARCHAR(8000) = dte.getCleanUpdateQuery('employee', '', 'Unknown', 0);
EXEC (#pQuery)
Example 2: Remove an undesired text qualifier (e.g. #)
DECLARE #pQuery VARCHAR(8000) = dte.getCleanUpdateQuery('employee', 'REPLACE(', ', ''#'', '''')', 1);
EXEC (#pQuery)
This query can be improved. This is just the one I saved and sometime I use. You get the idea.

Similar to an upsert, you could check if the item exists on the table, if so, delete it and insert it with the new values (technically updating it) but you would lose your rowid if that's something sensitive to keep in your case.
Behold, the updelsert
IF NOT EXISTS (SELECT * FROM Employee WHERE ID = #SomeID)
INSERT INTO Employee VALUES(#SomeID, #Your, #Vals, #Here)
ELSE
DELETE FROM Employee WHERE ID = #SomeID
INSERT INTO Employee VALUES(#SomeID, #Your, #Vals, #Here)

you could do it by deleting the column in the table and adding the column back in and adding a default value of whatever you needed it to be. then saving this will require to rebuild the table

Export data from a non-normalized database

I need to export data from a non-normalized database where there are multiple columns to a new normalized database.
One example is the Products table, which has 30 boolean columns (ValidSize1, ValidSize2 ecc...) and every record has a foreign key which points to a Sizes table where there are 30 columns with the size codes (XS, S, M etc...). In order to take the valid sizes for a product I have to scan both tables and take the value SizeCodeX from the Sizes table only if ValidSizeX on the product is true. Something like this:
Products Table
--------------
ProductCode <PK>
Description
SizesTableCode <FK>
ValidSize1
ValidSize2
[...]
ValidSize30
Sizes Table
-----------
SizesTableCode <PK>
SizeCode1
SizeCode2
[...]
SizeCode30
For now I am using a "template" query which I repeat for 30 times:
SELECT
Products.Code,
Sizes.SizesTableCode, -- I need this code because different codes can have same size codes
Sizes.Size_1
FROM Products
INNER JOIN Sizes
ON Sizes.SizesTableCode = Products.SizesTableCode
WHERE Sizes.Size_1 IS NOT NULL
AND Products.ValidSize_1 = 1
I am just putting this query inside a loop and I replace the "_1" with the loop index:
SET #counter = 1;
SET #max = 30;
SET #sql = '';
WHILE (#counter <= #max)
BEGIN
SET #sql = #sql + ('[...]'); -- Here goes my query with dynamic indexes
IF #counter < #max
SET #sql = #sql + ' UNION ';
SET #counter = #counter + 1;
END
INSERT INTO DestDb.ProductsSizes EXEC(#sql); -- Insert statement
GO
Is there a better, cleaner or faster method to do this? I am using SQL Server and I can only use SQL/TSQL.

You can prepare a dynamic query using the SYS.Syscolumns table to get all value in row
DECLARE #SqlStmt Varchar(MAX)
SET #SqlStmt=''
SELECT #SqlStmt = #SqlStmt + 'SELECT '''+ name +''' column , UNION ALL '
FROM SYS.Syscolumns WITH (READUNCOMMITTED)
WHERE Object_Id('dbo.Products')=Id AND ([Name] like 'SizeCode%' OR [Name] like 'ProductCode%')
IF REVERSE(#SqlStmt) LIKE REVERSE('UNION ALL ') + '%'
SET #SqlStmt = LEFT(#SqlStmt, LEN(#SqlStmt) - LEN('UNION ALL '))
print ( #SqlStmt )

Well, it seems that a "clean" (and much faster!) solution is the UNPIVOT function.
I found a very good example here:
http://pratchev.blogspot.it/2009/02/unpivoting-multiple-columns.html

Pass EXEC statement to APPLY as a parameter

I have a need to grab data from multiple databases which has tables with the same schema. For this I created synonyms for this tables in the one of the databases. The number of databases will grow with time. So, the procedure, which will grab the data should be flexible. I wrote the following code snippet to resolve the problem:
WHILE #i < #count
BEGIN
SELECT #synonymName = [Name]
FROM Synonyms
WHERE [ID] = #i
SELECT #sql = 'SELECT TOP (1) *
FROM [dbo].[synonym' + #synonymName + '] as syn
WHERE [syn].[Id] = tr.[Id]
ORDER BY [syn].[System.ChangedDate] DESC'
INSERT INTO #tmp
SELECT col1, col2
FROM
(
SELECT * FROM TableThatHasRelatedDataFromAllTheSynonyms
WHERE [Date] > #dateFrom
) AS tr
OUTER APPLY (EXEC(#sql)) result
SET #i = #i + 1
END
I also appreciate for any ideas on how to simplify the solution.

Actually, it's better to import data from all tables into one table (maybe with additional column for source table name) and use it. Importing can be performed through SP or SSIS package.
Regarding initial question - you can achieve it through TVF wrapper for exec statement (with exec .. into inside it).
UPD: As noticed in the comments exec doesn't work inside TVF. So, if you really don't want to change DB structure and you need to use a lot of tables I suggest to:
OR select all data from synonym*** table into variables (as I see you select only one row) and use them
OR prepare dynamic SQL for complete statement (with insert, etc.) and use temporary table instead of table variable here.

My solution is quite simple. Just to put all the query to the string and exec it. Unfortunately it works 3 times slower than just copy/past the code for all the synonyms.
WHILE #i < #count
BEGIN
SELECT #synonymName = [Name]
FROM Synonyms
WHERE [ID] = #i
SELECT #sql = 'SELECT col1, col2
FROM
(
SELECT * FROM TableThatHasRelatedDataFromAllTheSynonyms
WHERE [Date] > ''' + #dateFrom + '''
) AS tr
OUTER APPLY (SELECT TOP (1) *
FROM [dbo].[synonym' + #synonymName + '] as syn
WHERE [syn].[Id] = tr.[Id]
ORDER BY [syn].[System.ChangedDate] DESC) result'
INSERT INTO #tmp
EXEC(#sql)
SET #i = #i + 1
END

SQL Query to check if 40 columns in table is null

How do I select few columns in a table that only contain NULL values for all the rows?
Suppose if Table has 100 columns, among this 100 columns 60 columns has null values.
How can I write where condition to check if 60 columns are null.

maybe with a COALESCE
SELECT * FROM table WHERE coalesce(col1, col2, col3, ..., colN) IS NULL

where c1 is null and c2 is null ... and c60 is null
shortcut using string concatenation (Oracle syntax):
where c1||c2||c3 ... c59||c60 is null

First of all, if you have a table that has so many nulls and you use SQL Server 2008 - you might want to define the table using sparse columns (http://msdn.microsoft.com/en-us/library/cc280604.aspx).
Secondly I am not sure if coalesce solves the question asks - it seems like Ammu might actually want to find the list of columns that are null for all rows, but I might have misunderstood. Nevertheless - it is an interesting question, so I wrote a procedure to list null columns for any given table:
IF (OBJECT_ID(N'PrintNullColumns') IS NOT NULL)
DROP PROC dbo.PrintNullColumns;
go
CREATE PROC dbo.PrintNullColumns(#tablename sysname)
AS
BEGIN
SET NOCOUNT ON;
DECLARE #query nvarchar(max);
DECLARE #column sysname;
DECLARE columns_cursor CURSOR FOR
SELECT c.name
FROM sys.tables t JOIN sys.columns c ON t.object_id = c.object_id
WHERE t.name = #tablename AND c.is_nullable = 1;
OPEN columns_cursor;
FETCH NEXT FROM columns_cursor INTO #column;
WHILE (##FETCH_STATUS = 0)
BEGIN
SET #query = N'
DECLARE #c int
SELECT #c = COUNT(*) FROM ' + #tablename + ' WHERE ' + #column + N' IS NOT NULL
IF (#c = 0)
PRINT (''' + #column + N''');'
EXEC (#query);
FETCH NEXT FROM columns_cursor INTO #column;
END
CLOSE columns_cursor;
DEALLOCATE columns_cursor;
SET NOCOUNT OFF;
RETURN;
END;
go

If you don't want to write the columns names, Try can do something like this.
This will show you all the rows when all of the columns values are null except for the columns you specified (IgnoreThisColumn1 & IgnoreThisColumn2).
DECLARE #query NVARCHAR(MAX);
SELECT #query = ISNULL(#query+', ','') + [name]
FROM sys.columns
WHERE object_id = OBJECT_ID('yourTableName')
AND [name] != 'IgnoreThisColumn1'
AND [name] != 'IgnoreThisColumn2';
SET #query = N'SELECT * FROM TmpTable WHERE COALESCE('+ #query +') IS NULL';
EXECUTE(#query)
Result
If you don't want rows when all the columns are null except for the columns you specified, you can simply use IS NOT NULL instead of IS NULL
SET #query = N'SELECT * FROM TmpTable WHERE COALESCE('+ #query +') IS NOT NULL';
Result
[

Are you trying to find out if a specific set of 60 columns are null, or do you just want to find out if any 60 out of the 100 columns are null (not necessarily the same 60 for each row?)
If it is the latter, one way to do it in oracle would be to use the nvl2 function, like so:
select ... where (nvl2(col1,0,1)+nvl2(col2,0,1)+...+nvl2(col100,0,1) > 59)
A quick test of this idea:
select 'dummy' from dual where nvl2('somevalue',0,1) + nvl2(null,0,1) > 1
Returns 0 rows while:
select 'dummy' from dual where nvl2(null,0,1) + nvl2(null,0,1) > 1
Returns 1 row as expected since more than one of the columns are null.

It would help to know which db you are using and perhaps which language or db framework if using one.
This should work though on any database.
Something like this would probably be a good stored procedure, since there are no input parameters for it.
select count(*) from table where col1 is null or col2 is null ...

Here is another method that seems to me to be logical as well (use Netezza or TSQL)
SELECT KeyColumn, MAX(NVL2(TEST_COLUMN,1,0) AS TEST_COLUMN
FROM TABLE1
GROUP BY KeyColumn
So every TEST_COLUMN that has MAX value of 0 is a column that contains all nulls for the record set. The function NVL2 is saying if the column data is not null return a 1, but if it is null then return a 0.
Taking the MAX of that column will reveal if any of the rows are not null. A value of 1 means that there is at least 1 row that has data. Zero (0) means that each row is null.

I use the below query when i have to check for multiple columns NULL. I hope this is helpful . If the SUM comes to a value other than Zero , then you have NULL in that column
select SUM (CASE WHEN col1 is null then 1 else 0 end) as null_col1,
SUM (CASE WHEN col2 is null then 1 else 0 end) as null_col2,
SUM (CASE WHEN col3 is null then 1 else 0 end) as null_col3, ....
.
.
.
from tablename

you can use
select NUM_NULLS , COLUMN_NAME from all_tab_cols where table_name = 'ABC' and COLUMN_NAME in ('PQR','XYZ');

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL omit columns with all same value - sql

Is there a way to write a SQL query to omit columns that have all of the same values? For example, row A B 1 9 0 2 7 0 3 5 0 4 2 0 I'd like to return just row A 1 9 2 7 3 5 4 2

Related

How to use SQL column name only if the column exists?

How to UPDATE all columns of a record without having to list every column

Export data from a non-normalized database

Pass EXEC statement to APPLY as a parameter

SQL Query to check if 40 columns in table is null

Categories

Resources