SQL: apply joins dynamically based on column names stored in another table

I have a table, for example temp:
Id  Col1  Col2  Col3
1   1     2     3
and I have another table joininfo:
Id  SourceKey  Table  TargetKey
1   Col1       A      ColA
2   Col2       B      ColB
3   Col3       C      ColC
I want to generate a query that adds the inner join clauses dynamically and looks like this:
SELECT * FROM temp
INNER JOIN A ON Col1=ColA
INNER JOIN B ON Col2=ColB
INNER JOIN C ON Col3=ColC
Any help?

Can't do it.
SQL needs to know this stuff at query compile time, before looking at any data, so it can validate security and check for possible indexes. The only query element that comes close to treating data as if it were a column after compile time is the PIVOT keyword.
Otherwise, you're down to a CASE expression listing every possible set of column comparisons, or writing dynamic SQL over multiple steps: first execute a query to find which columns/joins you need, use those results to build a new query string, and then execute the string you just built.
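A minimal sketch of that multi-step shape, using the temp and joininfo names from the question (no quoting or validation, just the idea):
-- Step 1 and 2: read the join metadata and build the query string
DECLARE @sql nvarchar(max) = N'SELECT * FROM temp';
SELECT @sql = @sql + N' INNER JOIN ' + [Table] + N' ON ' + SourceKey + N' = ' + TargetKey
FROM joininfo
ORDER BY Id;
-- Step 3: inspect, then execute the string that was just built
PRINT @sql;
EXEC sp_executesql @sql;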

As per the comment from Charlieface, updating the answer to use STRING_AGG:
declare @dsql nvarchar(max)
select @dsql = 'select * from temp ' + string_agg(' join ' + [Table] + ' on ' + c, ' ')
from (
    select [Table], string_agg(SourceKey + ' = ' + TargetKey, ' and ') c
    from table1 -- the join metadata table (joininfo in the question)
    group by [Table]
) t
select @dsql
EXECUTE sp_executesql @dsql

You can build a dynamic SQL query quite neatly by using string aggregation. Try to keep clear about which bits are static and which are dynamic, and test the generated code by using PRINT @sql:
DECLARE @sql nvarchar(max) = N'
SELECT *
FROM temp s
' +
(
SELECT STRING_AGG(CAST(
N'JOIN ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name) + N' AS T' + CAST(t.object_id AS nvarchar(10)) + N'
ON s.' + QUOTENAME(j.SourceKey) + N' = T' + CAST(t.object_id AS nvarchar(10)) + N'.' + QUOTENAME(c.name)
AS nvarchar(max)), N'
') WITHIN GROUP (ORDER BY j.Id)
FROM sys.tables t
JOIN sys.columns c ON c.object_id = t.object_id
JOIN joininfo j ON OBJECT_ID(j.[Table]) = t.object_id
AND j.TargetKey = c.name
);
PRINT @sql; -- for testing
EXECUTE sp_executesql @sql;
If your version of SQL Server does not support STRING_AGG you can use FOR XML PATH('') instead.
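For reference, a rough sketch of the FOR XML PATH('') substitution for the STRING_AGG(...) WITHIN GROUP (...) expression above (same joins as in the query, with the ordering moved into the subquery):
-- Scalar expression that yields the concatenated JOIN clauses on older versions
(
SELECT N' JOIN ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name) + N' AS T' + CAST(t.object_id AS nvarchar(10))
     + N' ON s.' + QUOTENAME(j.SourceKey) + N' = T' + CAST(t.object_id AS nvarchar(10)) + N'.' + QUOTENAME(c.name)
FROM sys.tables t
JOIN sys.columns c ON c.object_id = t.object_id
JOIN joininfo j ON OBJECT_ID(j.[Table]) = t.object_id
    AND j.TargetKey = c.name
ORDER BY j.Id
FOR XML PATH(''), TYPE
).value('.', 'nvarchar(max)')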


SQL Script for creating test sample data from source table

I've created the script below to be able to quickly create a minimal reproducible example for other questions in general.
This script uses an original table and generates the following PRINT statements:
DROP and CREATE a temp table with structure matching the original table
INSERT INTO statement using examples from the actual data
I can just add the original table name into the variable listed, along with the number of sample records required from the table. When I run it, it generates all of the statements needed in the Messages window in SSMS. Then I can just copy and paste those statements into my posted questions, so those answering have something to work with.
I know that you can get similar results in SSMS through Tasks>Generate Scripts, but this gets things down to the minimal amount of code that's useful for posting here without all of the unnecessary info that SSMS generates automatically. It's just a quick way to create a reproduced version of a simple table with actual sample data in it.
Unfortunately the one scenario that doesn't work is if I run it on very wide tables. It seems to fail on the last STRING_AGG() query where it's building the VALUES portion of the INSERT. When it runs on wide tables, it returns NULL.
Any suggestions to correct this?
EDIT: I figured out the issue I was having with UNIQUEIDENTIFIER columns and revised the query below. Also included an initial check to make sure the table actually exists.
/* ---------------------------------------
-- For creating minimal reproducible examples
-- based on original table and data,
-- builds the following statements
-- -- CREATE temp table with structure matching original table
-- -- INSERT statements based on actual data
--
-- Note: May not work for very wide tables due to limitations on
-- PRINT statements
*/ ---------------------------------------
DECLARE @tableName NVARCHAR(MAX) = 'testTable', -- original table name HERE
    @recordCount INT = 5, -- top number of records to insert to temp table
    @buildStmt NVARCHAR(MAX),
    @insertStmt NVARCHAR(MAX),
    @valuesStmt NVARCHAR(MAX),
    @insertCol NVARCHAR(MAX),
    @strAgg NVARCHAR(MAX),
    @insertOutput NVARCHAR(MAX)
IF (EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = @tableName))
BEGIN
    -- build DROP and CREATE statements for temp table from original table
    SET @buildStmt = 'IF OBJECT_ID(''tempdb..#' + @tableName + ''') IS NOT NULL DROP TABLE #' + @tableName + CHAR(10) + CHAR(10) +
        'CREATE TABLE #' + @tableName + ' (' + CHAR(10)
    SELECT @buildStmt = @buildStmt + ' ' + C.[Name] + ' ' +
        T.[Name] +
        CASE WHEN T.[Name] IN ('varchar','nvarchar','char','nchar') THEN '(' + CAST(C.[Length] AS VARCHAR) + ') ' ELSE ' ' END +
        'NULL,' + CHAR(10)
    FROM sysobjects O
    JOIN syscolumns C ON C.id = O.id
    JOIN systypes T ON T.xusertype = C.xusertype
    WHERE O.[name] = @tableName
    ORDER BY C.ColID
    SET @buildStmt = SUBSTRING(@buildStmt,1,LEN(@buildStmt) - 2) + CHAR(10) + ')' + CHAR(10)
    PRINT @buildStmt
    -- build INSERT INTO statement from original table
    SELECT @insertStmt = 'INSERT INTO #' + @tableName + ' (' +
        STUFF ((
            SELECT ', [' + C.[Name] + ']'
            FROM sysobjects O
            JOIN syscolumns C ON C.id = O.id
            WHERE O.[name] = @tableName
            ORDER BY C.ColID
            FOR XML PATH('')), 1, 1, '')
        + ')'
    PRINT @insertStmt
    -- build VALUES portion of INSERT from data in original table
    SELECT @insertCol = STUFF ((
        SELECT '''''''''+CONVERT(NVARCHAR(200),' +
            '[' + C.[Name] + ']' +
            ')+'''''',''+'
        FROM sysobjects O
        JOIN syscolumns C ON C.id = O.id
        JOIN systypes T ON T.xusertype = C.xusertype
        WHERE O.[name] = @tableName
        ORDER BY C.ColID
        FOR XML PATH('')), 1, 1, '')
    SET @insertCol = SUBSTRING(@insertCol,1,LEN(@insertCol) - 1)
    SELECT @strAgg = ';WITH CTE AS (SELECT TOP(' + CONVERT(VARCHAR,@recordCount) + ') * FROM ' + @tableName + ') ' +
        ' SELECT @valuesStmt = STRING_AGG(CAST(''' + @insertCol + ' AS NVARCHAR(MAX)),''), ('') ' +
        ' FROM CTE'
    EXEC sp_executesql @strAgg, N'@valuesStmt NVARCHAR(MAX) OUTPUT', @valuesStmt OUTPUT
    PRINT 'VALUES (' + REPLACE(SUBSTRING(@valuesStmt,1,LEN(@valuesStmt) - 1),',)',')') + ')'
END
ELSE
BEGIN
    PRINT 'Table does NOT exist'
END
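If the PRINT limitation mentioned in the header comment becomes a problem (PRINT only emits roughly the first 4,000 characters of an nvarchar(max) value), one workaround is to print the long string in slices. The helper below is only a sketch and the procedure name is made up:
-- Hypothetical helper: emit an NVARCHAR(MAX) value as successive 4,000-character slices
-- (slice boundaries can fall mid-line, but nothing is lost)
CREATE OR ALTER PROCEDURE dbo.PrintMax @text NVARCHAR(MAX)
AS
BEGIN
    DECLARE @pos INT = 1;
    WHILE @pos <= LEN(@text)
    BEGIN
        PRINT SUBSTRING(@text, @pos, 4000);
        SET @pos = @pos + 4000;
    END;
END;
GO
-- usage: EXEC dbo.PrintMax @buildStmt;  -- instead of PRINT @buildStmt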

Counting the number of rows for all columns in views within SQL database

I am hoping someone could help or provide insight. I am working on automating the way we audit our database views. The goal is to quickly see which columns are empty in which views. I started to work on a script but ran into different issues. All of the examples with this script run, but produce results that I did not intend. Right now some column headers have titles with a space in between.
Since I am fairly new to this, I am unsure what I am doing wrong or even how to obtain the results I am looking for.
Thank you in advance for your help!
Example 1 - Runs but produces NULLs
Example 2 - Runs but produces 0s
Example 3 - Runs but produces counts for the entire view rather than for the column itself (i.e. it gives the total number of rows in the view)
------------------------------------
-- EXAMPLE 1:
-- Did not produce the results expected.
-- Issue is it returns all as NULL
---------------------------------------
DROP TABLE IF EXISTS #tempColumnCount;
CREATE TABLE #tempColumnCount
(
    Name VARCHAR(100),
    Row_Count INT
);
Declare @SQL VARCHAR(MAX)
SET @SQL = ''
SELECT @SQL = @SQL + 'INSERT INTO #tempColumnCount SELECT ''' + c.name + ''' as Name, SUM (CASE WHEN ''' + c.name +
    ''' IS NULL THEN 1 END) FROM ' + schema_name(v.schema_id) + '.' + OBJECT_NAME(c.object_id) +
    CHAR(13)
FROM sys.columns c
JOIN sys.views v
    ON v.object_id = c.object_id AND SCHEMA_NAME(v.schema_id)='analytics'
exec (@SQL)
SELECT Name, Row_Count
FROM #tempColumnCount
---------------------------------------
-- Example 2: Issue is it returns all as 0
---------------------------------------
DROP TABLE IF EXISTS #tempColumnCount;
CREATE TABLE #tempColumnCount
(
    Name VARCHAR(100),
    Row_Count INT
);
Declare @SQL VARCHAR(MAX)
SET @SQL = ''
SELECT @SQL = @SQL + 'INSERT INTO #tempColumnCount SELECT ''' + c.name + ''' as Name, COUNT(1) - COUNT(''' + c.name +
    ''' ) FROM ' + schema_name(v.schema_id) + '.' + OBJECT_NAME(c.object_id) +
    CHAR(13)
FROM sys.columns c
JOIN sys.views v
    ON v.object_id = c.object_id AND SCHEMA_NAME(v.schema_id)='analytics'
exec (@SQL)
SELECT Name, Row_Count
FROM #tempColumnCount
---------------------------------------
-- Example 3: Issue is it returns the max number of rows for each column, i.e. the row count for the whole view rather than for the column itself.
---------------------------------------
DROP TABLE IF EXISTS #tempColumnCount;
CREATE TABLE #tempColumnCount
(
    Name VARCHAR(100),
    Row_Count INT
);
Declare @SQL VARCHAR(MAX)
SET @SQL = ''
SELECT @SQL = @SQL + 'INSERT INTO #tempColumnCount SELECT ''' + c.name + ''' as Name, COUNT(''' + c.name +
    ''' ) FROM ' + schema_name(v.schema_id) + '.' + OBJECT_NAME(c.object_id) +
    CHAR(13)
FROM sys.columns c
JOIN sys.views v
    ON v.object_id = c.object_id AND SCHEMA_NAME(v.schema_id)='analytics'
exec (@SQL)
SELECT Name, Row_Count
FROM #tempColumnCount
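In all three examples the generated SQL appears to wrap the column name in single quotes, so the inner query evaluates a string literal rather than the column: the CASE never matches because a literal is never NULL, so the SUM is over nothing but NULLs (Example 1); COUNT(1) - COUNT('colname') is always 0 (Example 2); and COUNT('colname') is just the total row count (Example 3). A rough sketch of a corrected string build for the middle SELECT, reusing the #tempColumnCount and @SQL setup above:
-- Quote the identifier with QUOTENAME instead of turning it into a literal,
-- and count NULLs per column explicitly
SELECT @SQL = @SQL + 'INSERT INTO #tempColumnCount SELECT ''' + c.name + ''' AS Name, ' +
    'SUM(CASE WHEN ' + QUOTENAME(c.name) + ' IS NULL THEN 1 ELSE 0 END) ' +
    'FROM ' + QUOTENAME(SCHEMA_NAME(v.schema_id)) + '.' + QUOTENAME(v.name) + CHAR(13)
FROM sys.columns c
JOIN sys.views v
    ON v.object_id = c.object_id AND SCHEMA_NAME(v.schema_id) = 'analytics'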

Calculate amount of disk space used by results of a T-SQL query

I am comparing SQL Server with another database technology, and in order to compare disk requirements I would like to calculate the amount of disk used by only the elements returned by a SELECT query that JOINs over a few tables with a simple WHERE clause, rather than for a whole table or database. This is so that I don't have to go to the trouble of loading an entire table or database worth of data into the other database.
I have a query like the following that I'd like to do this for:
SELECT * FROM a
INNER JOIN b ON a.id = b.id
INNER JOIN c ON a.id = c.id
INNER JOIN d ON a.id = d.id1 OR a.id = d.id2
WHERE a.id = 390330
I haven't been able to find any advice online, other than how to find the disk space used by an entire table or database. Is there anything built into SQL Server that can help me, or will I need to calculate this by hand?
An estimate of the storage needed for a subset of rows can be calculated by adding up the sizes of those rows. Although this is infeasible to do generically for an arbitrary query with several joins, for the specific query in the question you can build up a query that calculates the size of all of the rows it touches, like so (based on code from https://www.c-sharpcorner.com/blogs/calculate-row-size-of-a-table-in-sqlserver1):
declare @sql nvarchar(max);
set @sql = '
SELECT COALESCE((
--table a
SELECT SUM(0'
select @sql = @sql + ' + isnull(datalength(' + name + '), 1)'
from sys.columns where object_id = object_id('a')
set @sql = @sql + ')
FROM a
WHERE a.id = 390330
), 0) + COALESCE((
-- table b
SELECT SUM(0'
select @sql = @sql + ' + isnull(datalength(' + name + '), 1)'
from sys.columns where object_id = object_id('b')
set @sql = @sql + ')
FROM b
WHERE b.id = 390330
), 0) + COALESCE((
-- table c
SELECT SUM(0'
select @sql = @sql + ' + isnull(datalength(' + name + '), 1)'
from sys.columns where object_id = object_id('c')
set @sql = @sql + ')
FROM c
WHERE c.id = 390330
), 0) + COALESCE((
-- table d
SELECT SUM(0'
select @sql = @sql + ' + isnull(datalength(' + name + '), 1)'
from sys.columns where object_id = object_id('d')
set @sql = @sql + ')
FROM d
WHERE d.id1 = 390330 OR d.id2 = 390330
), 0)';
exec (@sql);
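To see what the generated string expands to, if table a happened to have columns id, name, and created_at (illustrative names only), the fragment for that table would read something like:
-- Illustrative expansion for table a; the column list is hypothetical
SELECT COALESCE((
--table a
SELECT SUM(0 + isnull(datalength(id), 1) + isnull(datalength(name), 1) + isnull(datalength(created_at), 1))
FROM a
WHERE a.id = 390330
), 0)  -- ... + COALESCE((...), 0) for b, c and d in the same pattern
Note that DATALENGTH measures only the bytes of the column data itself, so the total ignores row headers, NULL bitmaps and index overhead and should be read as a lower-bound estimate.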

How to dynamically calculate the sums of many columns in a GROUP?

In the table below, I have a variable number of columns, and that number is in the 1000s. I need to sum all the values of each of the 1000 columns grouped by the person's name. So, smith's total test_score_1, total test_score_2,...total test_score_1000. And then Jackson's total test_score_1, total test_score_2,...total test_score_1000.
I don't know the number of 'test_score_n' columns beforehand and they are always changing.
So given this table:
name     test_score_1  test_score_2  ...  test_score_1000
smith    2             1                  0
jackson  0             3                  1
jackson  1             1                  2
jackson  3             0                  3
smith    4             5                  1
How can I produce the table below?
name     test_score_1  test_score_2  ...  test_score_1000
smith    6             6                  1
jackson  4             4                  6
SQL to generate the SQL
DECLARE @generatedSQL nvarchar(max);
SET @generatedSQL = (
SELECT
'SELECT ' +
SUBSTRING(X.foo, 2, 2000) +
'FROM ' +
QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name) +
' GROUP BY name' --fix this line , edited
FROM
sys.tables t
CROSS APPLY
(
SELECT
', SUM(' + QUOTENAME(c.name) + ')'
FROM
sys.columns c
WHERE
c.object_id = t.object_id
AND
c.name <> 'Name'
FOR XML PATH('')
) X (foo)
WHERE
t.name = 'MyTable'
);
EXEC (@generatedSQL);
Demo: http://rextester.com/MAFCP19297
SQL
DECLARE @cols varchar(max), @sql varchar(max);
SELECT @cols =
COALESCE(@cols + ', ', '') + 'SUM(' + COLUMN_NAME + ') AS ' + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = '<tbl name>'
AND COLUMN_NAME <> 'name'
-- The AND below may be optional - see "Additional Notes #1"
AND TABLE_CATALOG = '<database schema name>';
SET @sql = 'SELECT name, ' + @cols + ' FROM tbl GROUP BY name;';
EXEC (@sql);
Explanation
The DECLARE creates two variables - one for storing the column summing part of the SQL and the other for storing the whole dynamically created SQL statement to run.
The SELECT queries the INFORMATION_SCHEMA.COLUMNS system view to get the names of all the columns in tbl apart from the name column. (Alternatively the sys catalog views could be used; each approach has its pros and cons.) These row values are then converted into a single comma-separated value using variable concatenation with COALESCE (which is arguably a little simpler than the alternative FOR XML PATH('') method). The comma-separated values are a bit more than just the column names - they SUM over each column name and then alias the result with the same name.
The SET then builds a simple SQL statement that selects the name and all the summed values - e.g. SELECT name, SUM(test_score_1) AS test_score_1, SUM(test_score_2) AS test_score_2, ..., SUM(test_score_1000) AS test_score_1000 FROM tbl GROUP BY name;
The EXEC then runs the above query.
Additional Notes
If there is a possibility that the table name may not be unique across all databases then the following clause is needed in the select: AND TABLE_CATALOG = '<database schema name>'
My initial answer to this question was mistakenly using MySQL rather than SQL Server - this has now been corrected but the previous version is still in the edit history and might be helpful to someone...
Try this dynamic column-generation SQL script:
DECLARE @Sql nvarchar(max)
SET @Sql = ( SELECT DISTINCT 'SELECT' +
    STUFF((SELECT ', ' + ' SUM( ' + COLUMN_NAME + ' ) AS ' + QUOTENAME(COLUMN_NAME)
        FROM INFORMATION_SCHEMA.COLUMNS Where TABLE_NAME = 'Tab1000'
        FOR XML PATH (''), type).value('.','varchar(max)'), 1, 2, '')
    + ' From Tab1000' From INFORMATION_SCHEMA.COLUMNS Where TABLE_NAME = 'Tab1000')
EXEC (@Sql)
Try the below script (set @tableName to [yourTablename] and @nameColumn to the name of the field you want to group by):
Declare @tableName varchar(50) = 'totalscores'
Declare @nameColumn nvarchar(50) = 'name'
Declare @query as nvarchar(MAX);
select @query = 'select ' + nameColumn + cast(sumColumns as nvarchar(max)) + ' from ' + @tableName + ' group by ' + nameColumn from (
    select @nameColumn nameColumn, (SELECT
        ', SUM(' + QUOTENAME(c.name) + ') ' + QUOTENAME(c.name)
    FROM
        sys.columns c
    WHERE
        c.object_id = t.object_id and c.name != @nameColumn
    order by c.name
    FOR
        XML path(''), type
    ) sumColumns
    from sys.tables t where t.name = @tableName
) t
EXECUTE(@query)
Change tablename to your table name.
Declare @query as nvarchar(MAX) = (SELECT
    'SELECT name,' + SUBSTRING(tbl.col, 2, 2000) + ' FROM ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name) + ' Group By name'
FROM
    sys.tables t
CROSS APPLY
(
    SELECT
        ', SUM(' + QUOTENAME(columns.name) + ') as ' + columns.name
    FROM
        sys.columns columns
    WHERE
        columns.object_id = t.object_id and columns.name != 'name'
    FOR XML PATH('')
) tbl (col)
WHERE
    t.name = 'tablename')
select @query
EXECUTE(@query)
GBN's dynamic SQL would be my first choice (+1), and would be more performant. However, if you are interested in breaking this horrible cycle of a 1,000+ columns, consider the following:
Example
Declare @YourTable Table ([col 1] int,[col 2] int,[col 1000] varchar(50))
Insert Into @YourTable Values
 (2,1,0)
,(4,5,1)
Select Item  = replace(C.Item,'_x0020_', ' ')
      ,Value = sum(C.Value)
From @YourTable A
Cross Apply (Select XMLData = cast((Select A.* for XML RAW) as xml)) B
Cross Apply (
    Select Item  = a.value('local-name(.)','varchar(100)')
          ,Value = a.value('.','int')
    From B.XMLData.nodes('/row') as C1(n)
    Cross Apply C1.n.nodes('./@*') as C2(a)
    Where a.value('local-name(.)','varchar(100)') not in ('Fields','ToExclude')
) C
Group By C.Item
Returns
Item      Value
col 1     6
col 2     6
col 1000  1

Looping through column names with dynamic SQL

I just came up with an idea for a piece of code to show all the distinct values for each column, and count how many records for each. I want the code to loop through all columns.
Here's what I have so far... I'm new to SQL so bear with the noobness :)
Hard code:
select [Sales Manager], count(*)
from [BT].[dbo].[test]
group by [Sales Manager]
order by 2 desc
Attempt at dynamic SQL:
Declare @sql varchar(max),
        @column as varchar(255)
set @column = '[Sales Manager]'
set @sql = 'select ' + @column + ',count(*) from [BT].[dbo].[test] group by ' + @column + ' order by 2 desc'
exec (@sql)
Both of these work fine. How can I make it loop through all columns? I don't mind if I have to hard code the column names and it works its way through subbing in each one for @column.
Does this make sense?
Thanks all!
You can use dynamic SQL and get all the column names for a table. Then build up the script:
Declare @sql varchar(max) = ''
declare @tablename as varchar(255) = 'test'
select @sql = @sql + 'select [' + c.name + '],count(*) as ''' + c.name + ''' from [' + t.name + '] group by [' + c.name + '] order by 2 desc; '
from sys.columns c
inner join sys.tables t on c.object_id = t.object_id
where t.name = @tablename
EXEC (@sql)
Change @tablename to the name of your table (without the database or schema name).
This is a bit of an XY answer, but if you don't mind hardcoding the column names, I suggest you do just that, and avoid dynamic SQL - and the loop - entirely. Dynamic SQL is generally considered the last resort, opens you up to security issues (SQL injection attacks) if not careful, and can often be slower if queries and execution plans cannot be cached.
If you have a ton of column names you can write a quick piece of code or mail merge in Word to do the substitution for you.
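For illustration, the hardcoded version is just the original query repeated once per column; only [Sales Manager] comes from the question, the other column names are placeholders:
select [Sales Manager], count(*) from [BT].[dbo].[test] group by [Sales Manager] order by 2 desc;
select [Region],        count(*) from [BT].[dbo].[test] group by [Region]        order by 2 desc;
select [Product],       count(*) from [BT].[dbo].[test] group by [Product]       order by 2 desc;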
However, as far as how to get column names, assuming this is SQL Server, you can use the following query:
SELECT c.name
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.test')
Therefore, you can build your dynamic SQL from this query:
SELECT 'select '
+ QUOTENAME(c.name)
+ ',count(*) from [BT].[dbo].[test] group by '
+ QUOTENAME(c.name)
+ 'order by 2 desc'
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.test')
and loop using a cursor.
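A minimal sketch of that cursor loop (the cursor and variable names are mine) might look like:
DECLARE @stmt varchar(max);
DECLARE col_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT 'select ' + QUOTENAME(c.name)
         + ',count(*) from [BT].[dbo].[test] group by ' + QUOTENAME(c.name)
         + ' order by 2 desc'
    FROM sys.columns c
    WHERE c.object_id = OBJECT_ID('dbo.test');
OPEN col_cur;
FETCH NEXT FROM col_cur INTO @stmt;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC (@stmt);   -- one result set per column
    FETCH NEXT FROM col_cur INTO @stmt;
END;
CLOSE col_cur;
DEALLOCATE col_cur;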
Or compile the whole thing together into one batch and execute. Here we use the FOR XML PATH('') trick:
DECLARE @sql VARCHAR(MAX) = (
SELECT ' select ' --note the extra space at the beginning
+ QUOTENAME(c.name)
+ ',count(*) from [BT].[dbo].[test] group by '
+ QUOTENAME(c.name)
+ 'order by 2 desc'
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.test')
FOR XML PATH('')
)
EXEC(@sql)
Note I am using the built-in QUOTENAME function to escape column names that need escaping.
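As a quick illustration of what QUOTENAME does:
SELECT QUOTENAME('Sales Manager');   -- [Sales Manager]
SELECT QUOTENAME('Weird]Name');      -- [Weird]]Name]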
You want to know the distinct column values in all the columns of the table? Just replace the table name Employee with your table name in the following code:
declare @SQL nvarchar(max)
set @SQL = ''
;with cols as (
    select Table_Schema, Table_Name, Column_Name, Row_Number() over(partition by Table_Schema, Table_Name
        order by ORDINAL_POSITION) as RowNum
    from INFORMATION_SCHEMA.COLUMNS
)
select @SQL = @SQL + case when RowNum = 1 then '' else ' union all ' end
    + ' select ''' + Column_Name + ''' as Column_Name, count(distinct ' + quotename (Column_Name) + ' ) As DistinctCountValue,
count( '+ quotename (Column_Name) + ') as CountValue FROM ' + quotename (Table_Schema) + '.' + quotename (Table_Name)
from cols
where Table_Name = 'Employee' --print @SQL
execute (@SQL)