Find the non null columns in SQL Server in a table - sql

I have read many answers, but they are all for PL/SQL or Oracle. I could not find anything for Microsoft SQL Server.
My table :
CREATE TABLE StudentScore
(
Student_ID INT PRIMARY KEY,
Student_Name NVARCHAR (50),
Student_Score INT
)
GO
INSERT INTO StudentScore VALUES (1,'Ali', NULL)
INSERT INTO StudentScore VALUES (2,'Zaid', 770)
INSERT INTO StudentScore VALUES (3,'Mohd', 1140)
INSERT INTO StudentScore VALUES (4,NULL, 770)
INSERT INTO StudentScore VALUES (5,'John', 1240)
INSERT INTO StudentScore VALUES (6,'Mike', 1140)
INSERT INTO StudentScore VALUES (7,'Goerge', NULL)
How can I find the names of all the columns that contain no NULL values?
The result should be a table listing only those non-null columns.
EDIT based on comments:
I am aware of the IS_NULLABLE attribute in INFORMATION_SCHEMA. But just because a column allows null values does not mean it actually contains any. How can I find the columns that actually contain null values?
I am looking for an equivalent of num_nulls for Microsoft SQL Server.

You could achieve it by issuing:
SELECT
FORMATMESSAGE('SELECT col = ''%s.%s.%s'' FROM %s.%s HAVING COUNT(*) != COUNT(%s)',
QUOTENAME(TABLE_SCHEMA),
QUOTENAME(TABLE_NAME),
QUOTENAME(COLUMN_NAME),
QUOTENAME(TABLE_SCHEMA),
QUOTENAME(TABLE_NAME),
QUOTENAME(COLUMN_NAME)
)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES';
This generates one check query for each nullable column.
HAVING COUNT(*) != COUNT(col_name) -- the column contains at least one NULL
HAVING COUNT(col_name) = 0 AND COUNT(*) != 0 -- every value in the column is NULL
This approach could be polished by using STRING_AGG to build a single query per table, and with dynamic SQL you avoid having to copy and run the generated queries by hand.
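The counting trick above relies on COUNT(*) counting every row while COUNT(col) skips NULLs. A minimal sketch of that behaviour, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is runnable; the helper name null_counts is hypothetical):

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; COUNT(*) vs COUNT(col) behaves the same way.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StudentScore ("
    "Student_ID INTEGER PRIMARY KEY, Student_Name TEXT, Student_Score INTEGER)"
)
conn.executemany(
    "INSERT INTO StudentScore VALUES (?, ?, ?)",
    [(1, "Ali", None), (2, "Zaid", 770), (3, "Mohd", 1140), (4, None, 770),
     (5, "John", 1240), (6, "Mike", 1140), (7, "Goerge", None)],
)


def null_counts(conn, table, columns):
    """Return {column: number of NULLs}, computed as COUNT(*) - COUNT(column)."""
    result = {}
    for col in columns:
        # Identifiers cannot be bound as parameters, so they are quoted by hand.
        query = f'SELECT COUNT(*) - COUNT("{col}") FROM "{table}"'
        result[col] = conn.execute(query).fetchone()[0]
    return result


counts = null_counts(conn, "StudentScore",
                     ["Student_ID", "Student_Name", "Student_Score"])
columns_with_nulls = [c for c, n in counts.items() if n > 0]
print(counts)
print(columns_with_nulls)
```

A column whose count is zero has no NULLs in the current data, which is a statement about the rows, not about the column definition.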
EDIT:
Fully baked-solution:
DECLARE @sql NVARCHAR(MAX);
SELECT @sql = STRING_AGG(
FORMATMESSAGE('SELECT table_schema = ''%s''
,table_name = ''%s''
,table_col_name = ''%s''
,row_num = COUNT(*)
,row_num_non_nulls = COUNT(%s)
,row_num_nulls = COUNT(*) - COUNT(%s)
FROM %s.%s',
QUOTENAME(TABLE_SCHEMA),
QUOTENAME(TABLE_NAME),
QUOTENAME(COLUMN_NAME),
QUOTENAME(COLUMN_NAME),
QUOTENAME(COLUMN_NAME),
QUOTENAME(TABLE_SCHEMA),
QUOTENAME(TABLE_NAME),
QUOTENAME(COLUMN_NAME)), ' UNION ALL' + CHAR(13)
) WITHIN GROUP(ORDER BY TABLE_SCHEMA, TABLE_NAME)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
AND TABLE_NAME = ? -- filter by table name
AND TABLE_SCHEMA = ?; -- filter by schema name
SELECT @sql; -- inspect the generated script
EXEC(@sql); -- run it
Output:
+---------------+-----------------+------------------+----------+--------------------+---------------+
| table_schema | table_name | table_col_name | row_num | row_num_non_nulls | row_num_nulls |
+---------------+-----------------+------------------+----------+--------------------+---------------+
| [dbo] | [StudentScore] | [Student_Name] | 7 | 6 | 1 |
| [dbo] | [StudentScore] | [Student_Score] | 7 | 5 | 2 |
+---------------+-----------------+------------------+----------+--------------------+---------------+

Perhaps you want to look at INFORMATION_SCHEMA.COLUMNS. The column IS_NULLABLE tells you whether each column allows NULLs.
Note that the INFORMATION_SCHEMA tables (well, they are really views) are Standard SQL, so this information is available in most databases. Oracle has not (yet?) adopted them.

Related

Update columns in multiple tables by names pulled from a temporary table

I have a temp table where various table names and connected column names are stored. If I were to run a simple SELECT on it the results would look something like this:
----------------
TableName | ColumnName
------------------
Users | RoleId
Tables | OwnerId
Chairs | MakerId
etc...
I'm looking for a way to set mentioned column values in the connected tables to NULL.
I know how to do it via a CURSOR or a WHILE loop by processing each row individually but I'm trying to eliminate these performance hoarders from my stored procedures.
Is there any way to build a join by table names from the TableName column to the actual tables to then set connected ColumnName column values to NULL ?
Check this Script-
IF OBJECT_ID('SampleTable') IS NOT NULL
DROP TABLE SampleTable
CREATE TABLE SampleTable
(
Table_Name VARCHAR(50) NOT NULL,
Column_Name VARCHAR(50) NOT NULL
)
GO
INSERT INTO SampleTable
VALUES
('Users','RoleId'),('Tables','OwnerId'),('Chairs','MakerId') --Give your Combo here
GO
--Check this scripts
SELECT 'UPDATE ' + QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(S1.TABLE_NAME) +
' SET ' + QUOTENAME(S1.COLUMN_NAME) + ' = NULL ; '
AS [Dynamic_Scripts]
FROM SampleTable S JOIN INFORMATION_SCHEMA.COLUMNS S1 ON s.Table_Name=s1.Table_Name and s.Column_Name=s1.Column_Name
--Check this scripts (multiple column single script; 1 table 'n' column - 1 update query)
SELECT 'UPDATE ' + CONCAT('[',TABLE_SCHEMA,'].[',S1.TABLE_NAME,'] SET ') + STRING_AGG(CONCAT('[',S1.COLUMN_NAME,']=NULL'),',') + ' ; ' AS [Dynamic_Scripts]
FROM SampleTable S JOIN INFORMATION_SCHEMA.COLUMNS S1 ON s.Table_Name=s1.Table_Name and s.Column_Name=s1.Column_Name
GROUP BY CONCAT('[',TABLE_SCHEMA,'].[',S1.TABLE_NAME,'] SET ')
Try this,
IF OBJECT_ID('SampleTable') IS NOT NULL
DROP TABLE SampleTable
CREATE TABLE SampleTable
(
Table_Name VARCHAR(50) NOT NULL,
Column_Name VARCHAR(50) NOT NULL
)
GO
INSERT INTO SampleTable
VALUES
('Users','RoleId'),('Tables','OwnerId'),('Chairs','MakerId')
,('Users','Appid'),('Tables','Column') --Give your Combo here
GO
declare @Sql nvarchar(max)=''
;with CTE as
(
select QUOTENAME(a.Table_Name)Table_Name
,stuff((select ','+QUOTENAME(Column_Name)+'=null'
from SampleTable B
where a.Table_Name=b.Table_Name for xml path('') ),1,1,'')UpdateCol
from SampleTable A
group by a.Table_Name
)
select @Sql=coalesce(@Sql+char(13)+char(10)+SingleUpdate,SingleUpdate)
from
(
select CONCAT('Update ',Table_Name,' ','SET ',UpdateCol)SingleUpdate
from cte
)t4
print @Sql
select @Sql
Execute sp_executeSql @Sql
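Both answers boil down to the same idea: group the (table, column) pairs by table and emit one UPDATE per table. A minimal sketch of that string-building step in Python, for illustration only; the helper names quote_name and build_updates are hypothetical, and quote_name only imitates the bracket quoting of QUOTENAME:

```python
def quote_name(identifier):
    """Bracket-quote an identifier, doubling any closing bracket (imitates QUOTENAME)."""
    return "[" + identifier.replace("]", "]]") + "]"


def build_updates(pairs):
    """Group (table, column) pairs by table and build one UPDATE statement per table."""
    by_table = {}
    for table, column in pairs:
        by_table.setdefault(table, []).append(column)
    statements = []
    for table, columns in by_table.items():
        assignments = ", ".join(f"{quote_name(c)} = NULL" for c in columns)
        statements.append(f"UPDATE {quote_name(table)} SET {assignments};")
    return statements


# Sample pairs taken from the answer above.
pairs = [("Users", "RoleId"), ("Tables", "OwnerId"), ("Chairs", "MakerId"),
         ("Users", "Appid"), ("Tables", "Column")]
statements = build_updates(pairs)
print("\n".join(statements))
```

The generated text still has to be executed as dynamic SQL; the sketch only shows how the statements are assembled without a row-by-row cursor.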

Merging records based on condition in bigquery

I have multiple rows for members and want to merge them based on the values of two columns by giving priority to the value 'Yes'.
Name | Status1 | Status2
Jon | Yes | No
Jon | No | Yes
I want the query to return
Name | Status1 | Status2
Jon | Yes | Yes
So, if the column has Yes even once, it has to assign Yes for the person and No otherwise.
Below is for BigQuery Standard SQL
#standardSQL
SELECT Name, MAX(Status1) AS Status1, MAX(Status2) AS Status2
FROM `project.dataset.table`
GROUP BY Name
You can test, play with it using sample data
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Jon' Name, 'Yes' Status1, 'No' Status2 UNION ALL
SELECT 'Jon', 'No', 'Yes'
)
SELECT Name, MAX(Status1) AS Status1, MAX(Status2) AS Status2
FROM `project.dataset.table`
GROUP BY Name
with result
Row Name Status1 Status2
1 Jon Yes Yes
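MAX works here because strings compare lexicographically and 'Yes' sorts after 'No', so any 'Yes' in a group wins. A small Python sketch of the same aggregation (merge_statuses is a hypothetical helper; it assumes the values are spelled exactly 'Yes' and 'No', since mixed casing would change the ordering):

```python
def merge_statuses(rows):
    """Per name, keep the lexicographic maximum of each status column ('Yes' > 'No')."""
    merged = {}
    for name, status1, status2 in rows:
        prev1, prev2 = merged.get(name, (status1, status2))
        merged[name] = (max(prev1, status1), max(prev2, status2))
    return merged


# Sample rows from the question.
rows = [("Jon", "Yes", "No"), ("Jon", "No", "Yes")]
merged = merge_statuses(rows)
print(merged)
```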
In addition to Mikhail's answer, I am adding another solution with MsSQL. Syntax may be different but the logic would be similar:
create table test
(id int , name1 varchar(10), name2 varchar(10))
insert into test values (1,'yes','no')
insert into test values (2,'no','no')
insert into test values (3,'yes','yes')
declare @searchKey varchar(10) = 'yes'
declare @cols varchar(255) = (SELECT STUFF((
SELECT ', ' + c.name
FROM sys.columns c
JOIN sys.types AS t ON c.user_type_id=t.user_type_id
WHERE t.name != 'int' AND t.name != 'bit' AND t.name !='date' AND t.name !='datetime'
AND object_id =(SELECT object_id FROM sys.tables WHERE name='test')
FOR XML PATH('')),1,2,''))
declare @sql nvarchar(max) = 'SELECT * from test where '''+@searchKey+''' in ('+@cols+')'
exec sp_executesql @sql
Edit: Please note that this solution checks every column of the table and returns the rows in which any column contains the given value. If the OP needed to check 100 columns, up to status100, I believe a dynamic solution like this would be handier.

Identify distinct codes present in one column but not present in another column

I have data like the following table. First two columns are list of country codes with pipe separator. There are two group of rows with RANK as 1 and 2.
I am trying to identify the country codes that are present in CountryList1 but not in CountryList2 within a given RANK. For Rank 1 rows, HN, JP, SK and KY are present in CountryList1 but not in CountryList2. Likewise, for Rank 2 rows, JP is present in CountryList1 but not in CountryList2.
I am expecting Output like second table. I do not want to use a function or a procedure but trying to accomplish it using select statement.
Input
CountryList1 || CountryList2 || RANK
================||==============||=======
HN|IN|US || GB|CA|CH|CA || 1
JP|CH || IN|US|LU || 1
HN|SK|KY || GB|CA || 1
FI || IN|MO || 1
HN|IN|US || HN || 2
JP|CH || CH|IN|US || 2
HN || NO || 2
Output
DistinctCountry1 || RANK
====================||========
HN || 1
JP || 1
SK || 1
KY || 1
JP || 2
You have an abominable data structure. You should be storing elements of a list as separate values on rows. But you can do something by splitting the values. SQL Server 2016 has string_split(). For earlier versions you can find one on the web.
with tc as (
select t.*, s.country1
from t cross apply
string_split(t.countrylist1, '|') s(country1)
)
select distinct tc.country1, tc.rnk
from tc
where not exists (select 1
from t t2
where tc.rnk = t2.rnk and
tc.country1 in (select value from string_split(t2.countrylist2, '|'))
);
This will not be efficient. And with the data structure you have, there is little scope for improving performance.
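The split-then-anti-join logic amounts to a set difference per rank: collect every code from the first list, collect every code from the second list, and subtract. A minimal Python sketch of that logic on the question's sample rows (codes_only_in_first is a hypothetical helper, shown only to make the expected result easy to check):

```python
def codes_only_in_first(rows):
    """Per rank, return the sorted codes found in list 1 but in no list 2 of that rank."""
    first, second = {}, {}
    for list1, list2, rank in rows:
        # Split on the pipe separator and trim stray spaces around each code.
        first.setdefault(rank, set()).update(c.strip() for c in list1.split("|"))
        second.setdefault(rank, set()).update(c.strip() for c in list2.split("|"))
    return {rank: sorted(first[rank] - second.get(rank, set()))
            for rank in sorted(first)}


# Sample rows from the question: (CountryList1, CountryList2, RANK).
rows = [
    ("HN|IN|US", "GB|CA|CH|CA", 1),
    ("JP|CH", "IN|US|LU", 1),
    ("HN|SK|KY", "GB|CA", 1),
    ("FI", "IN|MO", 1),
    ("HN|IN|US", "HN", 2),
    ("JP|CH", "CH|IN|US", 2),
    ("HN", "NO", 2),
]
result = codes_only_in_first(rows)
print(result)
```

On this data FI also qualifies for rank 1, because it never appears in CountryList2 for that rank.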
Here's a nice loop you can use for this:
declare @holding table (country1 varchar(max), country2 varchar(max), rank int)
declare @iterator int=1
declare @countrylistoriginal1 varchar(max)
declare @countrylistoriginal2 varchar(max)
declare @countrylist1 varchar(max)
declare @countrylist2 varchar(max)
declare @rank int
-- assumes yourtable has a sequential rowid column
while @iterator<=(select max(rowid) from yourtable)
begin
select @countrylistoriginal1=countrylist1+'|', @rank=[rank]
from yourtable where rowid=@iterator
while @countrylistoriginal1<>''
begin
set @countrylist1=left(@countrylistoriginal1,(charindex('|',@countrylistoriginal1)))
set @countrylistoriginal1=replace(@countrylistoriginal1, @countrylist1,'')
select @countrylistoriginal2=countrylist2+'|'
from yourtable where rowid=@iterator
while @countrylistoriginal2<>''
begin
set @countrylist2=left(@countrylistoriginal2,(charindex('|',@countrylistoriginal2)))
set @countrylistoriginal2=replace(@countrylistoriginal2, @countrylist2,'')
insert @holding
select replace(@countrylist1,'|',''), replace(@countrylist2,'|',''), @rank
end
end
set @iterator=@iterator+1
end
select distinct a.country1, a.rank from @holding a
left join @holding b on a.country1=b.country2 and a.rank=b.rank where b.country2 is null
Try this...
Table Schema and data
CREATE TABLE [tableName](
[CountryList1] [nvarchar](50) NULL,
[CountryList2] [nvarchar](50) NULL,
[RANK] [int] NULL
)
INSERT [tableName] ([CountryList1], [CountryList2], [RANK]) VALUES (N'HN|IN|US', N'GB|CA|CH|CA', 1)
INSERT [tableName] ([CountryList1], [CountryList2], [RANK]) VALUES (N'JP|CH ', N'IN|US|LU', 1)
INSERT [tableName] ([CountryList1], [CountryList2], [RANK]) VALUES (N'HN|SK|KY', N'GB|CA', 1)
INSERT [tableName] ([CountryList1], [CountryList2], [RANK]) VALUES (N'FI', N'IN|MO', 1)
INSERT [tableName] ([CountryList1], [CountryList2], [RANK]) VALUES (N'HN|IN|US', N'HN ', 2)
INSERT [tableName] ([CountryList1], [CountryList2], [RANK]) VALUES (N'JP|CH', N'CH|IN|US', 2)
INSERT [tableName] ([CountryList1], [CountryList2], [RANK]) VALUES (N'HN', N'NO', 2)
SQL Query
;WITH cte AS
( SELECT DISTINCT *
FROM (SELECT [value] AS DistinctCountry1,
[rank],
Rtrim(Ltrim([value])) + Cast([rank] AS NVARCHAR(max)) AS colX
FROM tablename
CROSS apply String_split([countrylist1], '|')) tmp
WHERE colx NOT IN (SELECT Rtrim(Ltrim([value])) + Cast([rank] AS NVARCHAR(max)) AS colX
FROM tablename
CROSS apply String_split([countrylist2], '|'))
)
SELECT [distinctcountry1], [rank]
FROM cte
ORDER BY [rank]
Output
+------------------+------+
| distinctcountry1 | rank |
+------------------+------+
| FI | 1 |
| HN | 1 |
| JP | 1 |
| KY | 1 |
| SK | 1 |
| JP | 2 |
+------------------+------+
Demo: http://www.sqlfiddle.com/#!18/19acb/2/0
Note: As others already suggested, you should really consider fixing your table or you'll have to put extra hours when manipulating data.

Using column value as column name in subquery

I'm working with a legacy DB that has a table that houses field names from other tables.
So I have this structure:
Field_ID | Field_Name
*********************
1 | Col1
2 | Col2
3 | Col3
4 | Col4
and I need to pull a list of this field metadata along with the values of that field for a given user. So I need:
Field_ID | Field_Name | Value
1 | Col1 | ValueOfCol1onADiffTable
2 | Col2 | ValueOfCol2onADiffTable
3 | Col3 | ValueOfCol3onADiffTable
4 | Col4 | ValueOfCol4onADiffTable
I'd like to use the Field_Name in a subquery to pull that value, but can't figure out how to get SQL to evaluate Field_Name as a column in the sub-query.
So something like this:
select
Field_ID
,Field_Name
,(SELECT f.Field_Name from tblUsers u
where u.User_ID = @userId) as value
from
dbo.tblFields f
But that just returns Field_Name in the values column, not the value of it.
Do I need to put the sub-query in a separate function and evaluate that? Or some kind of dynamic SQL?
In SQL server this would require dynamic SQL and UNPIVOT notation.
see working demo
create table tblFields (Field_ID int ,Field_Name varchar(10));
insert into tblFields values
(1,'Col1')
,(2,'Col2')
,(3,'Col3')
,(4,'Col4');
declare @userId int
set @userId=1
create table tblUsers (User_ID int, col1 varchar(10),col2 varchar(10));
insert into tblUsers values
(1, 10,100),
(2,20,200);
declare @collist varchar(max)
declare @sqlquery varchar(max)
-- build the column list from the field names that really exist on tblUsers
select @collist= COALESCE(@collist + ', ', '') + Field_Name
from dbo.tblFields
where exists (
select * from sys.columns c join sys.tables t
on c.object_id=t.object_id and t.name='tblUsers'
and c.name =Field_Name)
select @sqlquery=
' select Field_ID ,Field_Name, value '+
' from dbo.tblFields f Join '+
' ( select * from '+
'( select * '+
' from tblUsers u '+
' where u.User_ID = '+ cast(@userId as varchar(max)) +
' ) src '+
'unpivot ( Value for field in ('+ @collist+')) up )t'+
' on t.field =Field_Name'
exec(@sqlquery)
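Conceptually, UNPIVOT turns one user row into (field, value) pairs, which are then joined to the field metadata by name. A minimal Python sketch of that join, for illustration only; field_values is a hypothetical helper, and the case-insensitive name match is an assumption that mirrors a case-insensitive SQL Server collation:

```python
def field_values(fields, user_row):
    """Pair each (field_id, field_name) with the user's value, skipping unknown columns."""
    # Lower-case the column names so 'Col1' matches 'col1' (assumed case-insensitive collation).
    lookup = {name.lower(): value for name, value in user_row.items()}
    return [(field_id, field_name, lookup[field_name.lower()])
            for field_id, field_name in fields
            if field_name.lower() in lookup]


# Sample data from the answer above: four fields, but the user table only has col1 and col2.
fields = [(1, "Col1"), (2, "Col2"), (3, "Col3"), (4, "Col4")]
user_row = {"User_ID": 1, "col1": "10", "col2": "100"}
pairs = field_values(fields, user_row)
print(pairs)
```

Fields without a matching column (Col3 and Col4 here) simply drop out, just as the EXISTS filter excludes them from the column list in the SQL.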

Append "_Repeat" to Ambiguous column names

I have a query that joins a table back onto itself in order to display orders that generated a repeat within a certain window.
The table returns something like the following:
id | value | note | id | value | note
------------------------------------------------------
01 | abcde | .... | 03 | zyxxx | ....
06 | 12345 | .... | 09 | 54321 | ....
In actuality, the table returns over 150 columns, so when the join occurs, I end up with 300 columns. I end up having to manually rename 150 columns to "id_Repeat","value_Repeat","note_Repeat" etc...
I'm looking for some way of automatically appending "_Repeat" to the ambiguous columns. Is this possible in T-SQL, (Using SQL Server 2008) or will I have to manually map out each column using:
SELECT [value] AS [value_Repeat]
The only way I can see this working is to construct some dynamic SQL (ugh!). I put together a quick example of how this might work:
CREATE TABLE test1 (id INT, note VARCHAR(50));
CREATE TABLE test2 (id INT, note VARCHAR(20));
INSERT INTO test1 SELECT 1, 'hello';
INSERT INTO test2 SELECT 1, 'world';
DECLARE @SQL VARCHAR(4096);
SELECT @SQL = 'SELECT ';
SELECT @SQL = @SQL + t.name + '.' + c.name + CASE WHEN t.name LIKE '%test2%' THEN ' AS ' + c.name + '_repeat' ELSE '' END + ','
FROM sys.columns c INNER JOIN sys.tables t ON t.object_id = c.object_id WHERE t.name IN ('test1', 'test2');
SELECT @SQL = LEFT(@SQL, LEN(@SQL) - 1);
SELECT @SQL = @SQL + ' FROM test1 INNER JOIN test2 ON test1.id = test2.id;';
EXEC(@SQL);
SELECT @SQL;
DROP TABLE test1;
DROP TABLE test2;
Output is:
id note id_repeat note_repeat
1 hello 1 world
This isn't possible in T-SQL. A column will have the name it had in its source table, or any alias name you specify, but there is no way to systematically rename them.
For cases like this, it pays off to take it one level higher: write some code (using sys.columns) that generates the query you're after, including renames. Why do something manually for 150 columns when you have a computer at your disposal?
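As one way to "take it one level higher", the select list can be generated from the column names by a small script. A minimal Python sketch, assuming the column names have already been read from sys.columns and that the self-join uses the aliases a and b (both aliases and the helper names are hypothetical):

```python
def quote_name(identifier):
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def build_repeat_select(columns, left="a", right="b", suffix="_Repeat"):
    """Build a select list where every column of the right-hand alias gets a suffix."""
    left_part = [f"{left}.{quote_name(c)}" for c in columns]
    right_part = [f"{right}.{quote_name(c)} AS {quote_name(c + suffix)}"
                  for c in columns]
    return "SELECT " + ", ".join(left_part + right_part)


# Column names from the question's example output.
select_list = build_repeat_select(["id", "value", "note"])
print(select_list)
```

The FROM and JOIN clauses are left out on purpose; the generated select list is meant to be pasted in front of the existing self-join.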