Dynamic SQL: Grouping by one variable, counting another for column names - sql

I am trying to do a dynamic sql query, similar to some that have appeared on this forum, but for the life of me, I cannot get it to work.
I am using SQL Server 2008. I have a table with a series of order_ref numbers. Each of these numbers has a varying number of advice_refs associated with it. advice_ref numbers are unique (they are a key from another table). There is at least one advice_ref for each order_ref. There are a bunch of columns that describe information for each advice_ref.
What I want to do is create a table with a row for each unique order_ref, with columns for each advice_ref, in ascending order. The columns would be Advice01, Advice02, ....Advice10, Advice11, etc. Not all the Advice# columns would be filled in for every order_ref and the number of advice# columns would depend on the order_ref with the greatest number of advice_refs.
The table would look like:
Order  Advice01  Advice02  Advice03  Advice04  .....
1      1         2         3
2      5         8         9         20
3      25
The code I've tried to use is:
DECLARE @SQL NVARCHAR(MAX)
DECLARE @PVT NVARCHAR(MAX)
SELECT @SQL = @SQL + ', COALESCE(' + QUOTENAME('Advice' + RowNum) + ', '''') AS ' + QUOTENAME('Advice' + RowNum),
@PVT = @PVT + ', ' + QUOTENAME('Advice' + RowNum)
FROM (SELECT case when RowNum2 < 10 then '0'+RowNum2 when RowNum2 >=10 then RowNum2 end [RowNum] From
( SELECT DISTINCT CONVERT(VARCHAR, ROW_NUMBER() OVER(PARTITION BY order_ref ORDER BY advice_ref)) [RowNum2]
FROM [ED_dups].[dbo].[NewEDDupsLongForm]
) rn2 ) rn
SET @SQL = 'SELECT order_ref' + @SQL + '
FROM ( SELECT order_ref,
advice_ref,
case when CONVERT(VARCHAR, ROW_NUMBER() OVER(PARTITION BY order_ref ORDER BY advice_ref)) < 10
then ''Advice0'' + CONVERT(VARCHAR, ROW_NUMBER() OVER(PARTITION BY order_ref ORDER BY advice_ref))
else ''Advice'' + CONVERT(VARCHAR, ROW_NUMBER() OVER(PARTITION BY order_ref ORDER BY advice_ref))
end [AdviceID]
FROM [ED_dups].[dbo].[NewEDDupsLongForm]
) data
PIVOT
( MAX(advice_ref)
FOR AdviceID IN (' + STUFF(@PVT, 1, 2, '') + ')
) pvt'
EXECUTE SP_EXECUTESQL @SQL
SQL server tells me that the query executed successfully, but there is no output. When I run snippets of the code, it seems that the problem either lies in the pivot statement, near
+ STUFF(@PVT, 1, 2, '') + ')
and/or in the select statement, near
''Advice0'' +
Thanks in advance for any help--I've been at this for days!

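I think you have to initialize the variables, like
DECLARE @SQL NVARCHAR(MAX) = ''
DECLARE @PVT NVARCHAR(MAX) = ''
or
DECLARE @SQL NVARCHAR(MAX)
DECLARE @PVT NVARCHAR(MAX)
SELECT @SQL = '', @PVT = ''
Otherwise @SQL starts out NULL, and concatenating anything onto NULL gives NULL, so the string you execute is NULL, which is why you see no output.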
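To see the NULL propagation in isolation, here is a minimal sketch in Python using the standard sqlite3 module. SQLite is only a stand-in for SQL Server here (an assumption for illustration; its concatenation operator is || rather than +), and concat_with is a made-up helper name. The rule it shows, NULL joined with a string gives NULL, is the same one that leaves the SQL variable empty in the question.

```python
import sqlite3


def concat_with(start):
    """Return start || ', [Advice01]' as evaluated by the SQL engine."""
    conn = sqlite3.connect(":memory:")
    try:
        # Python None is bound as SQL NULL.
        (result,) = conn.execute(
            "SELECT ? || ', [Advice01]'", (start,)
        ).fetchone()
        return result
    finally:
        conn.close()


uninitialized = concat_with(None)  # variable never initialized: NULL
initialized = concat_with("")      # variable initialized to ''

print(uninitialized)  # None
print(initialized)    # , [Advice01]
```

Starting from an empty string keeps every later concatenation intact, which is all the fix amounts to.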

The first thing that comes to mind: do you really need SQL to return a dataset with a dynamic number of columns? If you are writing an application, then your user interface, be it a web page or a desktop app form, would be a much nicer place to transform your data into the desired structure.
If you really do need this, you will make your life much easier if you don't try to do everything in one big and rather complicated query, but split it into smaller tasks done step by step. What I would do is use temporary tables to store working results, then use cursors to process the data order by order and advice by advice, inserting into the temporary table or tables, and at the end return the contents of that table. Wrap everything in a stored procedure.
This method is also easier to debug: you can check that every single step has done what it was expected to do.
And a final piece of advice: share the definition of your NewEDDupsLongForm table; someone might then write some code to help you out.
cheers
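If the reshaping is moved into the application, as suggested at the start of this answer, it could look roughly like the sketch below. It is written in Python and assumes the query simply returns (order_ref, advice_ref) pairs; pivot_advice and the sample rows are made up for illustration and are not taken from the real table.

```python
from collections import defaultdict


def pivot_advice(rows):
    """Turn (order_ref, advice_ref) pairs into one wide row per order_ref."""
    grouped = defaultdict(list)
    for order_ref, advice_ref in rows:
        grouped[order_ref].append(advice_ref)

    # The widest order decides how many AdviceNN columns are needed.
    width = max((len(refs) for refs in grouped.values()), default=0)
    header = ["Order"] + ["Advice%02d" % i for i in range(1, width + 1)]

    table = []
    for order_ref in sorted(grouped):
        advice = sorted(grouped[order_ref])  # ascending advice_ref
        padding = [None] * (width - len(advice))
        table.append([order_ref] + advice + padding)
    return header, table


# Sample data matching the layout shown in the question.
rows = [(1, 1), (1, 2), (1, 3), (2, 5), (2, 8), (2, 9), (2, 20), (3, 25)]
header, table = pivot_advice(rows)
```

Orders with fewer advice_refs are padded with None, which plays the role of the empty cells in the desired output.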

Related

SQL Sort / Order By pivoted fields while COALESCE function

I have rates for resources across all countries.
The rows should be Resource IDs.
The columns should be Country Codes.
The challenge is that I cannot sort the Country Codes in ascending order.
I would be grateful for any help with this.
When I query, I get the list of country codes, but not sorted, e.g. USA, BRA, ARG. The expected result is ARG, BRA, USA as the columns of the pivot.
Here is my code:
DECLARE @idList nvarchar(MAX)
SELECT
@idList = COALESCE(@idList + ',', '') + CountryCodeISO3
FROM
(
SELECT
DISTINCT CountryCodeISO3
FROM
Published.RateCardsValues
WHERE
CardID = 55
) AS SRC
DECLARE @sqlToRun nvarchar(MAX)
SET
@sqlToRun = '
SELECT *
FROM (
SELECT
[ResourceCode]
,[TITLES]
,[MostRepresentativeTitle]
,[ABBR_RES_DESC]
,[TypicalJobGrade]
,[BidGridResourceCode]
,[OpUnit]
,[PSResType]
,[JobGradeORResCat]
,[CountryCodeISO3]
--,[CurrencyCode]
,[RateValue]
FROM
[Published].[RateCardsValues] rc
WHERE
CardID = 55) As src
PIVOT (
MAX(RateValue) FOR [CountryCodeISO3] IN (' + @idList + ')
) AS pvt'
EXEC (@sqlToRun)
As you have discovered, PIVOT in T-SQL requires you to know at development time what the values will be that you will be pivoting on.
This is limiting, because if you want something like "retrieve data for all the countries where Condition X is true, then pivot on their IDs!", you have to resort to dynamic SQL to do it.
If Condition X is constant -- I'm guessing that belonging to CardID = 55 doesn't change often -- you can look up the values, and hardcode them in your code.
If the CardID you're looking up is always 55 and you have relatively few countries in that category, I'd actually advise doing that.
But if your conditions for picking countries can change, or the number of columns you want can vary -- something like "all the countries where there were sales of product Y, for month Z!" -- then you can't predict them, which means that the T-SQL PIVOT can't be set up (without dynamic SQL.)
In that case, I'd strongly suggest that you have whatever app you plan to use the data in do the pivoting, not T-SQL. (SSRS and Excel can both do it themselves, and code can be written to do it in .NET languages.) T-SQL, as you have seen, does not lend itself to dynamic pivoting.
What you have will "work" in the sense that it will execute without errors, but there's another downside, in the next stage of your app: not only will the number of columns potentially change over time, the names of the columns will change, as countries move in and out of Card ID 55. That may cause problems for whatever app or destination you have in mind for this data.
So, my two suggestions would be: either hard-code your country codes, or have the next stage in your app (whatever executes the query) do the actual pivoting.
You need to sort the columns while creating the dynamic SQL
Also:
Do not use variable coalescing, use STRING_AGG or FOR XML instead
Use QUOTENAME to escape the column names
sp_executesql allows you to pass parameters to the dynamic query
DECLARE @idList nvarchar(MAX)
SELECT
@idList = STRING_AGG(QUOTENAME(CountryCodeISO3), ',') WITHIN GROUP (ORDER BY CountryCodeISO3)
FROM
(
SELECT
DISTINCT CountryCodeISO3
FROM
Published.RateCardsValues
WHERE
CardID = 55
) AS SRC;
DECLARE @sqlToRun nvarchar(MAX);
SET
@sqlToRun = '
SELECT *
FROM (
SELECT
[ResourceCode]
,[TITLES]
,[MostRepresentativeTitle]
,[ABBR_RES_DESC]
,[TypicalJobGrade]
,[BidGridResourceCode]
,[OpUnit]
,[PSResType]
,[JobGradeORResCat]
,[CountryCodeISO3]
--,[CurrencyCode]
,[RateValue]
FROM
[Published].[RateCardsValues] rc
WHERE
CardID = 55) As src
PIVOT (
MAX(RateValue) FOR [CountryCodeISO3] IN (' + @idList + ')
) AS pvt'
EXEC sp_executesql @sqlToRun;
On earlier versions of SQL Server, you cannot use STRING_AGG. You need to hack it with FOR XML. You need to also use STUFF to strip off the first separator.
DECLARE @idList nvarchar(MAX)
DECLARE @separator nvarchar(20) = ',';
SET @idList =
STUFF(
(
SELECT
@separator + QUOTENAME(CountryCodeISO3)
FROM
Published.RateCardsValues
WHERE
CardID = 55
GROUP BY
CountryCodeISO3
ORDER BY
CountryCodeISO3
FOR XML PATH(''), TYPE
).value('text()[1]','nvarchar(max)'),
1, LEN(@separator), '')
;
DECLARE @sqlToRun nvarchar(MAX);
SET
@sqlToRun = '
SELECT *
FROM (
SELECT
[ResourceCode]
,[TITLES]
,[MostRepresentativeTitle]
,[ABBR_RES_DESC]
,[TypicalJobGrade]
,[BidGridResourceCode]
,[OpUnit]
,[PSResType]
,[JobGradeORResCat]
,[CountryCodeISO3]
--,[CurrencyCode]
,[RateValue]
FROM
[Published].[RateCardsValues] rc
WHERE
CardID = 55) As src
PIVOT (
MAX(RateValue) FOR [CountryCodeISO3] IN (' + @idList + ')
) AS pvt'
EXEC sp_executesql @sqlToRun;
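If the column list is assembled in application code rather than in T-SQL, the same two rules from this answer apply: sort the values before joining them, and escape each one. The following is a rough Python sketch of that idea. quote_name is a hypothetical helper that imitates what QUOTENAME does with its default brackets (wrap in [ ] and double any ]), and the country codes are sample data, not values read from the real table.

```python
def quote_name(identifier):
    """Imitate T-SQL QUOTENAME with default brackets: wrap in [] and double any ]."""
    return "[" + identifier.replace("]", "]]") + "]"


def build_pivot_columns(values):
    """Return a de-duplicated, ascending, comma-separated list of quoted names."""
    return ",".join(quote_name(v) for v in sorted(set(values)))


country_codes = ["USA", "BRA", "ARG", "BRA"]  # sample data only
id_list = build_pivot_columns(country_codes)
print(id_list)  # [ARG],[BRA],[USA]
```

The resulting string would then go inside the IN (...) part of the PIVOT clause, in place of the unsorted list.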

Dynamic Pivot - SQL Server

I have a test SQL database with the following query:
USE DataBase1
Select Data.MonthDate,
Data.AccountID,
Data.MonthID,
Data.Sales,
Data.AccountName
From Test1 as Data with(nolock)
That I need to pivot based off of the sales column. The problem is the months when I run this query will always change (though there will always be 4 of them) and they need to be ordered left-to-right/oldest-newest in the pivoted result based off of the MonthDate column. The initial return when the query is run looks like this:
And the final result needs to look like this:
I'm using Excel here to demonstrate and I highlighted the 0's because those are technically NULL values but I need them to come back as 0.
I'm using SQL Server Management Studio and the actual database I'll be running this against is over 200,000 rows.
Any thoughts?
Thanks,
Joshua
Use Dynamic Query.
DECLARE @col_list VARCHAR(max)='',
@sel_list VARCHAR(max)='',
@sql NVARCHAR(max)
-- DISTINCT is applied in a derived table so the assignment can still be ordered by MonthID
SELECT @col_list += '[' + Isnull(MonthID, '') + '],'
FROM (SELECT DISTINCT MonthID FROM Test1) AS m
ORDER BY MonthID
SELECT @col_list = LEFT(@col_list, Len(@col_list) - 1)
SELECT @sel_list += 'Isnull([' + Isnull(MonthID, '') + '],0) ' + '['+ Isnull(MonthID, '') + '],'
FROM (SELECT DISTINCT MonthID FROM Test1) AS m
ORDER BY MonthID
SELECT @sel_list = LEFT(@sel_list, Len(@sel_list) - 1)
-- the outer select reads from the pivot result, so it uses the alias piv, not Data
SET @sql ='select piv.AccountID,piv.AccountName,'+ @sel_list+ ' from (
Select
Data.AccountID,
Data.MonthID,
Data.Sales,
Data.AccountName
From Test1 as Data ) A
pivot (sum(Sales) for monthid in('+ @col_list + ')) piv'
--PRINT @sql
EXEC Sp_executesql @sql
Basically you need to dynamically build the PIVOT query and use sp_executesql to run it.
SQL Server, out of the box, has no support for dynamic ever-changing columns as the columns need to be defined in the PIVOT query.
Here's an example of how to accomplish this: http://sqlhints.com/tag/dynamic-pivot-column-names/

SQL query to find duplicate rows, in any table

I'm looking for a schema-independent query. That is, if I have a users table or a purchases table, the query should be equally capable of catching duplicate rows in either table without any modification (other than the from clause, of course).
I'm using T-SQL, but I'm guessing there should be a general solution.
I believe that this should work for you. Keep in mind that CHECKSUM() isn't 100% perfect - it's theoretically possible to get a false positive here (I think), but otherwise you can just change the table name and this should work:
;WITH cte AS (
SELECT
*,
CHECKSUM(*) AS chksum,
ROW_NUMBER() OVER(ORDER BY GETDATE()) AS row_num
FROM
My_Table
)
SELECT
*
FROM
CTE T1
INNER JOIN CTE T2 ON
T2.chksum = T1.chksum AND
T2.row_num <> T1.row_num
The ROW_NUMBER() is needed so that you have some way of distinguishing rows. It requires an ORDER BY and that can't be a constant, so GETDATE() was my workaround for that.
Simply change the table name in the CTE and it should work without spelling out the columns.
I'm still confused about what "detecting them might be" but I'll give it a shot.
Excluding them is easy
e.g.
SELECT DISTINCT * FROM USERS
However, if you want to return only the duplicates, and a duplicate means a row where all the fields match, then you have to do
SELECT
[Each and every field]
FROM
USERS
GROUP BY
[Each and every field]
HAVING COUNT(*) > 1
You can't get away with just using (*) because you can't GROUP BY *
so this requirement from your comments is difficult
a schema-independent means I don't want to specify all of the columns
in the query
Unless that is you want to use dynamic SQL and read the columns from sys.columns or information_schema.columns
For example
DECLARE @columns nvarchar(max)
SET @columns = ''
SELECT @columns = @columns + '[' + COLUMN_NAME +'], '
FROM INFORMATION_SCHEMA.columns
WHERE table_name = 'USERS'
-- LEN ignores the trailing space, so this strips the final comma
SET @columns = left(@columns,len(@columns ) - 1)
DECLARE @SQL nvarchar(max)
-- note the spaces around FROM and GROUP BY so the keywords do not run into the names
SET @SQL = 'SELECT ' + @columns
+ ' FROM USERS' + ' GROUP BY '
+ @columns
+ ' Having Count(*) > 1'
exec sp_executesql @SQL
Please note you should read "The Curse and Blessings of Dynamic SQL" if you haven't already.
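The same idea (read the column names from the catalog, then group by all of them) can also be driven from application code. Below is a rough sketch in Python against SQLite, which is only a stand-in for SQL Server: PRAGMA table_info plays the role of INFORMATION_SCHEMA.COLUMNS, and the users table, its columns and the helper names are invented for the demonstration.

```python
import sqlite3


def quote_ident(name):
    """Quote an identifier for SQLite by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def find_duplicates(conn, table):
    """Return each fully duplicated row once, followed by its number of copies."""
    info = conn.execute("PRAGMA table_info(%s)" % quote_ident(table)).fetchall()
    columns = [row[1] for row in info]  # second field is the column name
    if not columns:
        raise ValueError("unknown table: %r" % table)
    col_list = ", ".join(quote_ident(c) for c in columns)
    sql = (
        "SELECT %s, COUNT(*) AS copies FROM %s GROUP BY %s HAVING COUNT(*) > 1"
        % (col_list, quote_ident(table), col_list)
    )
    return conn.execute(sql).fetchall()


# Demonstration with a throwaway in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", "ann@example.com"), ("bob", "bob@example.com"), ("ann", "ann@example.com")],
)
duplicates = find_duplicates(conn, "users")
```

Only the table name changes between calls; the column list is rebuilt each time, which is what makes the query schema-independent.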
I have done this using CTEs in SQL Server.
Here is a sample on how to delete dupes but you should be able to adapt it easily to find dupes:
WITH CTE (COl1, Col2, DuplicateCount)
AS
(
SELECT COl1,Col2,
ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount
FROM DuplicateRcordTable
)
DELETE
FROM CTE
WHERE DuplicateCount > 1
GO
Here is a link to an article where I got the SQL:
http://blog.sqlauthority.com/2009/06/23/sql-server-2005-2008-delete-duplicate-rows/
I recently was looking into the same issue and noticed this question.
I managed to solve it using a stored procedure with some dynamic SQL. This way you only need to specify the table name. And it will get all the other relevant data from sys tables.
/*
This SP returns all duplicate rows (1 line for each duplicate) for any given table.
to use the SP:
exec [database].[dbo].[sp_duplicates]
@table = '[database].[schema].[table]'
*/
create proc dbo.sp_duplicates @table nvarchar(50) as
declare @query nvarchar(max)
declare @groupby nvarchar(max)
-- quotename protects column names that contain spaces or reserved words
set @groupby = stuff((select ',' + quotename([name])
FROM sys.columns
WHERE object_id = OBJECT_ID(@table)
FOR xml path('')), 1, 1, '')
set @query = 'select *, count(*)
from '+@table+'
group by '+@groupby+'
having count(*) > 1'
exec (@query)

Can SQL Server Pivot without knowing the resulting column names?

I have a table that looks like this:
Month    Site       Val
2009-12  Microsoft  10
2009-11  Microsoft  12
2009-10  Microsoft  13
2009-12  Google     20
2009-11  Google     21
2009-10  Google     22
And I want to get a 2-dimension table that gives me the "Val" for each site's month, like:
Month    Microsoft  Google
2009-12  10         20
2009-11  12         21
2009-10  13         22
But the catch is, I don't know all the possible values that can be in "Site". If a new site appears, I want to automatically get a new column in my resulting table.
All the code samples I saw that could do this required me to hardcode "Microsoft and Google" in the query text.
I saw one that didn't, but it was basically faking it by listing the Sites and generating a query on the fly (concatting a string) that had those column names in it.
Isn't there a way to get SQL Server 2008 to do this without a hack like that?
NOTE: I need to be able to run this as a query that I send from ASP.Net, I can't do stored procedures or other stuff like that.
Thanks!
Daniel
The example you linked to uses dynamic SQL. Unfortunately, there is no other built-in method for pivoting in SQL Server when the output columns are not known in advance.
If the data is not too large, it's probably easiest to simply run a normal row query from ASP.NET and perform your pivot in the application code. If the data is very large, then you'll have to generate the SQL dynamically after first querying for the possible column values.
Note that you don't actually need to write a SQL statement that generates dynamic SQL; you can simply generate the SQL in ASP.NET, and that will most likely be much easier. Just don't forget to escape the distinct Site values before chucking them in a generated query, and don't forget to parameterize whatever parts of the SQL statement you normally would without the pivot.
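As a rough illustration of generating the query in application code, here is a Python sketch that first asks for the distinct Site values and then builds the pivot statement from them. Several things in it are assumptions: it runs against SQLite rather than SQL Server, SQLite has no PIVOT so the pivot is written as MAX(CASE ...) per site, and the table name SiteStats is invented. Site values used for comparison are passed as parameters, and only the column aliases are escaped into the SQL text.

```python
import sqlite3


def quote_ident(name):
    """Quote an identifier for SQLite by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pivot_by_site(conn):
    """Return (header, rows) with one column per distinct Site value."""
    sites = [r[0] for r in conn.execute(
        "SELECT DISTINCT Site FROM SiteStats ORDER BY Site")]
    select_parts = ['"Month" AS "Month"']
    params = []
    for site in sites:
        # The value is bound as a parameter; only the alias is spliced in.
        select_parts.append(
            "MAX(CASE WHEN Site = ? THEN Val END) AS " + quote_ident(site))
        params.append(site)
    sql = ("SELECT " + ", ".join(select_parts) +
           ' FROM SiteStats GROUP BY "Month" ORDER BY "Month" DESC')
    cursor = conn.execute(sql, params)
    header = [column[0] for column in cursor.description]
    return header, cursor.fetchall()


# Demonstration using the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE SiteStats ("Month" TEXT, Site TEXT, Val INTEGER)')
conn.executemany("INSERT INTO SiteStats VALUES (?, ?, ?)", [
    ("2009-12", "Microsoft", 10), ("2009-11", "Microsoft", 12),
    ("2009-10", "Microsoft", 13), ("2009-12", "Google", 20),
    ("2009-11", "Google", 21), ("2009-10", "Google", 22),
])
header, rows = pivot_by_site(conn)
```

A new site in the table shows up as a new column on the next call, without any change to the code.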
It's been more than 10 years, and the same problem came to me.
Is there any way to pivot without knowing column names?
Then I searched something and found the below solution. We can achieve this by using dynamic query. I am adding this so it will help someone.
CREATE TABLE TEMP
(
[Month] varchar(50),
[Site] varchar(50),
Val int
)
INSERT INTO TEMP
VALUES ('2009-12', 'Microsoft', 10),
('2009-11', 'Microsoft', 12),
('2009-10', 'Microsoft', 15),
('2009-12', 'Google', 20),
('2009-11', 'Google', 8),
('2009-10', 'Google', 11),
('2009-12', 'Facebook', 13),
('2009-11', 'Facebook', 12),
('2009-10', 'Facebook', 5)
DECLARE @Columns as VARCHAR(MAX)
SELECT @Columns = COALESCE(@Columns + ', ','') + QUOTENAME([Site])
FROM
(SELECT DISTINCT [Site] FROM TEMP) AS B
ORDER BY B.[Site]
DECLARE @SQL as VARCHAR(MAX)
SET @SQL = 'SELECT Month, ' + @Columns + '
FROM
(
select Month,[Site],Val from TEMP
) as PivotData
PIVOT
(
Sum(Val)
FOR [Site] IN (' + @Columns + ')
) AS PivotResult
ORDER BY Month'
EXEC(@SQL);
As you can see I took the column values into a string and then dynamically use that to pivot.
Here is the result:
If we take the answer of marc_s and put it into a procedure, we have this:
create procedure spPivot (
@DataSource varchar(max),
@Column1 varchar(100),
@PivotColumn varchar(100),
@AggregateColumn varchar(100),
@AgregateFunction varchar(20),
@Debug bit = 0) as
declare @SQL varchar(max) =
'DECLARE @Columns as VARCHAR(MAX)
SELECT @Columns = COALESCE(@Columns + '', '','''') + QUOTENAME({PivotColumn})
FROM (SELECT DISTINCT {PivotColumn} FROM {DataSourceA} ds) c
ORDER BY {PivotColumn}
DECLARE @SQL as VARCHAR(MAX)
SET @SQL = ''SELECT {Column1}, '' + @Columns + ''
FROM {DataSourceB} as PivotData
PIVOT (
{AgregateFunction}({AggregateColumn})
FOR {PivotColumn} IN ('' + @Columns + '')
) AS PivotResult
ORDER BY {Column1}''
EXEC(@SQL)'
if @DataSource like 'select %' begin
set @SQL = replace(@SQL, '{DataSourceA}', '(' + @DataSource + ')')
set @SQL = replace(@SQL, '{DataSourceB}', '(' + replace(@DataSource, '''', '''''') + ')')
end else begin
set @SQL = replace(@SQL, '{DataSourceA}', @DataSource)
set @SQL = replace(@SQL, '{DataSourceB}', @DataSource)
end
set @SQL = replace(@SQL, '{Column1}', @Column1)
set @SQL = replace(@SQL, '{PivotColumn}', @PivotColumn)
set @SQL = replace(@SQL, '{AggregateColumn}', @AggregateColumn)
set @SQL = replace(@SQL, '{AgregateFunction}', @AgregateFunction)
if @Debug = 1
print @SQL
else
exec(@SQL)
And an example of its usage:
spPivot
'select ''Bucket'' Category, ''Large'' SubCategory, 1 Amount union all
select ''Bucket'' Category, ''Large'' SubCategory, 2 Amount union all
select ''Shovel'' Category, ''Large'' SubCategory, 4 Amount union all
select ''Shovel'' Category, ''Small'' SubCategory, 8 Amount',
'Category', 'SubCategory', 'Amount', 'sum'
The example works, but note that it's probably more efficient to send the procedure the name of a [temp] table because it's queried twice within. So using marc_s' temp table, the call would be
spPivot 'TEMP', '[Month]', 'Site', 'Val', 'SUM'
Also note you have a @Debug parameter that you can use to figure out why your call is not working as you expect.
select
month,
min(case site when 'microsoft'then val end) microsoft,
min(case site when 'google'then val end) google
from
withoutpivot
group by
month
select
main.month,
m.val as microsoft,
g.val as google
from
withoutpivot main
inner join
withoutpivot m on m.month = main.month
inner join
withoutpivot g on g.month = main.month
where
m.site = 'microsoft'
and g.site = 'google'

Updates on PIVOTs in SQL Server 2008

Is there a way to perform updates on a PIVOTed table in SQL Server 2008 where the changes propagate back to the source table, assuming there is no aggregation?
PIVOTs always require an aggregate function in the pivot clause.
Thus there is always aggregation.
So, no, it cannot be updatable.
You CAN put an INSTEAD OF TRIGGER on a view based on the statement and thus you can make any view updatable.
Example here
This will only really work if the pivoted columns form a unique identifier. So let's take Buggy's example; here is the original table:
TaskID Date Hours
and we want to pivot it into a table that looks like this:
TaskID 11/15/1980 11/16/1980 11/17/1980 ... etc.
In order to create the pivot, you would do something like this:
DECLARE @FieldList NVARCHAR(MAX)
SELECT
@FieldList =
CASE WHEN @FieldList <> '' THEN
@FieldList + ', [' + [Date] + ']'
ELSE
'[' + [Date] + ']'
END
FROM
Tasks
DECLARE @PivotSQL NVARCHAR(MAX)
SET @PivotSQL =
'
SELECT
TaskID
, ' + @FieldList + '
INTO
##Pivoted
FROM
(
SELECT * FROM Tasks
) AS T
PIVOT
(
MAX(Hours) FOR T.[Date] IN (' + @FieldList + ')
) AS PVT
'
EXEC(@PivotSQL)
So then you have your pivoted table in ##Pivoted. Now you perform an update to one of the hours fields:
UPDATE
##Pivoted
SET
[11/16/1980 00:00:00] = 10
WHERE
TaskID = 1234
Now ##Pivoted has an updated version of the hours for a task that took place on 11/16/1980 and we want to save that back to the original table, so we use an UNPIVOT:
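DECLARE @UnPivotSQL NVarChar(MAX)
-- the UNPIVOT value column is named [Hours] so that it matches the select list
SET @UnPivotSQL =
'
SELECT
TaskID
, [Date]
, [Hours]
INTO
##UnPivoted
FROM
##Pivoted
UNPIVOT
(
[Hours] FOR [Date] IN (' + @FieldList + ')
) AS UP
'
EXEC(@UnPivotSQL)
-- match on both TaskID and Date so each cell updates only its own source row
UPDATE
Tasks
SET
[Hours] = UP.[Hours]
FROM
Tasks T
INNER JOIN
##UnPivoted UP
ON
T.TaskID = UP.TaskID
AND T.[Date] = UP.[Date]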
You'll notice that I modified Buggy's example to remove aggregation by day-of-week. That's because there's no going back and updating if you perform any sort of aggregation. If I update the SUNHours field, how do I know which Sunday's hours I'm updating? This will only work if there is no aggregation. I hope this helps!
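The reason this round trip works without aggregation is that every pivoted cell maps back to exactly one (TaskID, Date) pair. A small Python sketch of that mapping is shown below; it is an illustration only, with a made-up unpivot helper and sample values, and it skips empty cells in the same way UNPIVOT leaves out NULLs.

```python
def unpivot(wide_rows, key="TaskID"):
    """Turn wide rows (one column per date) back into (key, date, hours) tuples."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            # Skip the key column itself and empty cells.
            if column != key and value is not None:
                long_rows.append((row[key], column, value))
    return long_rows


# One pivoted row after the hours for 11/16/1980 were edited to 10.
pivoted = [
    {"TaskID": 1234, "11/15/1980": 8, "11/16/1980": 10, "11/17/1980": None},
]
updates = unpivot(pivoted)
```

Each tuple in updates identifies a single source row, so it can be written back with an ordinary UPDATE keyed on TaskID and Date. With an aggregated column such as SUNHours there would be no single row to point at.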
this is just a guess, but can you make the query into a view and then update it?
I don't believe that it is possible, but if you post specifics about the actual problem that you're trying to solve someone might be able to give you some advice on a different approach to handling it.