Looking for an elegant (or any) solution to convert columns to rows.
Here is an example: I have a table with the following schema:
[ID] [EntityID] [Indicator1] [Indicator2] [Indicator3] ... [Indicator150]
Here is what I want to get as the result:
[ID] [EntityId] [IndicatorName] [IndicatorValue]
And the result values will be:
1 1 'Indicator1' 'Value of Indicator 1 for entity 1'
2 1 'Indicator2' 'Value of Indicator 2 for entity 1'
3 1 'Indicator3' 'Value of Indicator 3 for entity 1'
4 2 'Indicator1' 'Value of Indicator 1 for entity 2'
And so on...
Does this make sense? Do you have any suggestions on where to look and how to get it done in T-SQL?
You can use the UNPIVOT function to convert the columns into rows:
select id, entityId,
indicatorname,
indicatorvalue
from yourtable
unpivot
(
indicatorvalue
for indicatorname in (Indicator1, Indicator2, Indicator3)
) unpiv;
Note that the data types of the columns you are unpivoting must be the same, so you might have to convert them prior to applying the UNPIVOT.
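For example, a minimal sketch (assuming nvarchar(500) is wide enough for every Indicator column; adjust to your data) that normalizes the types in a derived table first:

select id, entityId,
  indicatorname,
  indicatorvalue
from
(
  select id, entityId,
    cast(Indicator1 as nvarchar(500)) as Indicator1,
    cast(Indicator2 as nvarchar(500)) as Indicator2,
    cast(Indicator3 as nvarchar(500)) as Indicator3
  from yourtable
) src
unpivot
(
  indicatorvalue
  for indicatorname in (Indicator1, Indicator2, Indicator3)
) unpiv;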
You could also use CROSS APPLY with UNION ALL to convert the columns:
select id, entityid,
indicatorname,
indicatorvalue
from yourtable
cross apply
(
select 'Indicator1', Indicator1 union all
select 'Indicator2', Indicator2 union all
select 'Indicator3', Indicator3 union all
select 'Indicator4', Indicator4
) c (indicatorname, indicatorvalue);
Depending on your version of SQL Server (2008 or later), you could even use CROSS APPLY with the VALUES clause:
select id, entityid,
indicatorname,
indicatorvalue
from yourtable
cross apply
(
values
('Indicator1', Indicator1),
('Indicator2', Indicator2),
('Indicator3', Indicator3),
('Indicator4', Indicator4)
) c (indicatorname, indicatorvalue);
Finally, if you have 150 columns to unpivot and you don't want to hard-code the entire query, then you could generate the sql statement using dynamic SQL:
DECLARE @colsUnpivot AS NVARCHAR(MAX),
        @query AS NVARCHAR(MAX)

select @colsUnpivot
    = stuff((select ','+quotename(C.column_name)
             from information_schema.columns as C
             where C.table_name = 'yourtable' and
                   C.column_name like 'Indicator%'
             for xml path('')), 1, 1, '')

set @query
    = 'select id, entityId,
          indicatorname,
          indicatorvalue
       from yourtable
       unpivot
       (
          indicatorvalue
          for indicatorname in ('+ @colsUnpivot +')
       ) u'

exec sp_executesql @query;
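If the Indicator columns have mixed data types, the type caveat from above applies to the generated statement as well. A hedged sketch (reusing @colsUnpivot and @query from above, and assuming nvarchar(500) as a common type) that injects a CAST list into a derived table:

DECLARE @colsCast AS NVARCHAR(MAX)

-- build ",cast([Indicator1] as nvarchar(500)) as [Indicator1], ..." from the metadata
select @colsCast
    = stuff((select ',cast(' + quotename(C.column_name) + ' as nvarchar(500)) as ' + quotename(C.column_name)
             from information_schema.columns as C
             where C.table_name = 'yourtable' and
                   C.column_name like 'Indicator%'
             for xml path('')), 1, 1, '')

set @query
    = 'select id, entityId,
          indicatorname,
          indicatorvalue
       from (select id, entityId, ' + @colsCast + ' from yourtable) src
       unpivot
       (
          indicatorvalue
          for indicatorname in (' + @colsUnpivot + ')
       ) u'

exec sp_executesql @query;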
Well, if you have 150 columns, then I think that UNPIVOT is not an option, so you could use an XML trick:
;with CTE1 as (
select ID, EntityID, (select t.* for xml raw('row'), type) as Data
from temp1 as t
), CTE2 as (
select
C.id, C.EntityID,
F.C.value('local-name(.)', 'nvarchar(128)') as IndicatorName,
F.C.value('.', 'nvarchar(max)') as IndicatorValue
from CTE1 as c
outer apply c.Data.nodes('row/@*') as F(C)
)
select * from CTE2 where IndicatorName like 'Indicator%'
sql fiddle demo
You could also write dynamic SQL, but I like XML more - for dynamic SQL you have to have permissions to select data directly from the table, and that's not always an option.
UPDATE: As there's a big flame in the comments, I think I'll add some pros and cons of XML vs dynamic SQL. I'll try to be as objective as I can and not mention elegance and ugliness. If you have any other pros and cons, edit the answer or write them in the comments.
cons
it's not as fast as dynamic SQL; rough tests gave me that XML is about 2.5 times slower than dynamic (it was one query on a ~250000 row table, so this estimate is by no means exact). You could compare it yourself if you want; here's a sqlfiddle example, on 100000 rows it was 29s (XML) vs 14s (dynamic);
it may be harder to understand for people not familiar with XPath;
pros
it runs in the same scope as your other queries, and that can be very handy. A few examples come to mind (a sketch of the last one follows the list):
you can query the inserted and deleted tables inside your trigger (not possible with dynamic SQL at all);
the user doesn't have to have permissions for a direct select from the table. What I mean is, if you have a stored procedure layer and the user has permissions to run the sp but not to query the tables directly, you can still use this query inside the stored procedure;
you can query a table variable you have populated in your scope (to pass it into dynamic SQL you would have to either make it a temporary table instead, or create a type and pass it as a parameter into the dynamic SQL);
you can run this query inside a function (scalar or table-valued); it's not possible to use dynamic SQL inside functions.
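For instance, a minimal sketch of that last point, wrapping the XML unpivot in an inline table-valued function (the function name is hypothetical, temp1 is the table from the fiddle above, and the Indicator% filter is omitted for brevity):

create function dbo.fn_UnpivotIndicators() -- hypothetical name
returns table
as
return
with CTE1 as (
    -- serialize each row into a single xml value
    select ID, EntityID, (select t.* for xml raw('row'), type) as Data
    from temp1 as t
)
select c.ID, c.EntityID,
       F.C.value('local-name(.)', 'nvarchar(128)') as IndicatorName,
       F.C.value('.', 'nvarchar(max)') as IndicatorValue
from CTE1 as c
outer apply c.Data.nodes('row/@*') as F(C);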
Just to help new readers, I've created an example to better understand @bluefeet's answer about UNPIVOT.
SELECT id
,entityId
,indicatorname
,indicatorvalue
FROM (VALUES
(1, 1, 'Value of Indicator 1 for entity 1', 'Value of Indicator 2 for entity 1', 'Value of Indicator 3 for entity 1'),
(2, 1, 'Value of Indicator 1 for entity 2', 'Value of Indicator 2 for entity 2', 'Value of Indicator 3 for entity 2'),
(3, 1, 'Value of Indicator 1 for entity 3', 'Value of Indicator 2 for entity 3', 'Value of Indicator 3 for entity 3'),
(4, 2, 'Value of Indicator 1 for entity 4', 'Value of Indicator 2 for entity 4', 'Value of Indicator 3 for entity 4')
) AS Category(ID, EntityId, Indicator1, Indicator2, Indicator3)
UNPIVOT
(
indicatorvalue
FOR indicatorname IN (Indicator1, Indicator2, Indicator3)
) UNPIV;
Just because I did not see it mentioned: if you're on SQL Server 2016+, here is yet another option to dynamically unpivot data without actually using dynamic SQL.
Example
Declare @YourTable Table ([ID] varchar(50),[Col1] varchar(50),[Col2] varchar(50))
Insert Into @YourTable Values
(1,'A','B')
,(2,'R','C')
,(3,'X','D')
Select A.[ID]
,Item = B.[Key]
,Value = B.[Value]
From @YourTable A
Cross Apply ( Select *
From OpenJson((Select A.* For JSON Path,Without_Array_Wrapper ))
Where [Key] not in ('ID','Other','Columns','ToExclude')
) B
Returns
ID Item Value
1 Col1 A
1 Col2 B
2 Col1 R
2 Col2 C
3 Col1 X
3 Col2 D
I needed a solution to convert columns to rows in Microsoft SQL Server without knowing the column names (used in a trigger) and without dynamic SQL (dynamic SQL is too slow for use in a trigger).
I finally found this solution, which works fine:
SELECT
insRowTbl.PK,
insRowTbl.Username,
attr.insRow.value('local-name(.)', 'nvarchar(128)') as FieldName,
attr.insRow.value('.', 'nvarchar(max)') as FieldValue
FROM ( Select
i.ID as PK,
i.LastModifiedBy as Username,
convert(xml, (select i.* for xml raw)) as insRowCol
FROM inserted as i
) as insRowTbl
CROSS APPLY insRowTbl.insRowCol.nodes('/row/@*') as attr(insRow)
As you can see, I convert the row into XML (the subquery select i.* for xml raw converts all columns into one XML column).
Then I CROSS APPLY a function to each XML attribute of this column, so that I get one row per attribute.
Overall, this converts columns into rows, without knowing the column names and without using dynamic sql. It is fast enough for my purpose.
(Edit: I just saw Roman Pekar's answer above, which uses the same approach.
I used the dynamic SQL trigger with cursors first, which was 10 to 100 times slower than this solution, but maybe that was caused by the cursor, not by the dynamic SQL. Anyway, this solution is very simple and universal, so it's definitely an option.)
I am leaving this comment at this place, because I want to reference this explanation in my post about the full audit trigger, that you can find here: https://stackoverflow.com/a/43800286/4160788
DECLARE @TableName varchar(max)=NULL
SELECT @TableName=COALESCE(@TableName+',','')+t.TABLE_CATALOG+'.'+ t.TABLE_SCHEMA+'.'+o.Name
FROM sysindexes AS i
INNER JOIN sysobjects AS o ON i.id = o.id
INNER JOIN INFORMATION_SCHEMA.TABLES T ON T.TABLE_NAME=o.name
WHERE i.indid < 2
AND OBJECTPROPERTY(o.id,'IsMSShipped') = 0
AND i.rowcnt >350
AND o.xtype !='TF'
ORDER BY o.name ASC
print @TableName
This gets the list of tables that have row counts > 350, concatenated into a single row.
The opposite of this is to flatten a column into a CSV, e.g.:
SELECT STRING_AGG ([value],',') FROM STRING_SPLIT('Akio,Hiraku,Kazuo', ',')
Related
Please see the TSQL below:
create table #IDs (id varchar(100))
insert into #IDs values ('123')
insert into #IDs values ('456')
insert into #IDs values ('789')
insert into #IDs values ('1010')
create table #Notes (Note varchar(500))
insert into #Notes values ('Here is a note for 123')
insert into #Notes values ('A note for 789 here')
insert into #Notes values ('456 has a note here')
I want to find all the IDs that are referenced in the #Notes table. This works:
select #IDs.id from #IDs inner join #Notes on #Notes.note like '%' + #IDs.id + '%'
However, there are hundreds of thousands of records in both tables and the query does not complete. I was thinking about FreeText searching, but I don't think it can be applied here. A cursor takes too long to run as well (I think it will take over one month). Is there anything else I can try? I am using SQL Server 2019.
The size of the input is only one aspect of the solution.
By splitting the text into tokens you indeed increase the number of records, but at the same time you enable an equality join, which can be implemented as a hash join.
You should get the query results in a few minutes tops, basically the time it takes your system to do a full scan of both tables, plus some processing time.
No need for temp tables.
No need for indexes.
Select id
from #IDS
where id in (select w.value
from #Notes as n
cross apply string_split(n.Note, ' ') as w
)
Fiddle
Per the OP's request, here is code that handles a more complicated scenario, where an ID can contain various characters (as defined by @token_char) and the separators are potentially all other characters:
declare @token_char varchar(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
;
with cte_notes as
(
select Note
,replace(translate(Note,@token_char,space(len(@token_char))),' ','') as non_token_char
from #Notes
)
select id
from #IDS
where id in
(
select w.value
from cte_notes as n
cross apply string_split(translate(n.Note,n.non_token_char,space(len(n.non_token_char))),' ') as w
where w.value != ''
)
The Fiddle data sample was altered accordingly, to reflect the change
If you are going to do this search often you may want to explore using a wonderful (if underused) feature of SQL Server called 'Full Text Search.' To quote Microsoft:
A LIKE query against millions of rows of text data can take minutes to
return; whereas a full-text query can take only seconds or less
against the same data, depending on the number of rows that are
returned.
I have seen searches go from minutes to seconds using this feature.
You would need to create a Full Text Search Catalog and then create indexes on the tables you want to search. It's not hard, and it will take you only a few minutes to learn how to do this.
This is a good starting point:
https://learn.microsoft.com/en-us/sql/relational-databases/search/get-started-with-full-text-search?view=sql-server-ver15
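A minimal sketch of that setup (the catalog name, the table name dbo.Notes, and the key index PK_Notes are assumptions; full-text requires a unique, single-column, non-nullable key index on the table):

CREATE FULLTEXT CATALOG NotesCatalog;

CREATE FULLTEXT INDEX ON dbo.Notes(Note)
    KEY INDEX PK_Notes
    ON NotesCatalog;

-- then search with CONTAINS instead of LIKE:
SELECT Note
FROM dbo.Notes
WHERE CONTAINS(Note, '"123"');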
I would apply a CTE with string_split to filter out all alphabetic components, and then join the #IDs table with the result of the CTE on the id column. The query was tested on a sample of 1 million rows.
With CTE As (
Select T.value As id
From #Notes Cross Apply String_Split(Note,' ') As T
Where Try_Convert(Int, T.value) Is Not Null
)
Select I.id
From #IDs As I Inner Join CTE On (I.id=CTE.id)
If you just want to extract a numeric value from a string, a join is excessive in this case:
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(Note,' ') As T
Where Try_Convert(Int, T.value) Is Not Null And T.value Like '%[0-9]%'
id    Note
123   Here is a note for 123
789   A note for 789 here
456   456 has a note here
No matter what, under the given circumstances, I would use a join to filter out those numbers that are not present in the #IDs table:
With CTE As (
Select distinct(id) As id
From #IDs
)
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(Note,' ') As T
Inner Join CTE On (T.value=CTE.id)
Where Try_Convert(Int, T.value) Is Not Null
And T.value Like '%[0-9]%'
If the string contains brackets or parentheses instead of spaces, like this:
"456(this is an id number) has a note here" or "456[01/01/2022]"
then as a last resort (since it degrades performance) you can use TRANSLATE to replace those brackets with spaces, as follows:
With CTE As (
Select distinct(id) As id
From #IDs
)
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(TRANSLATE(Note,'[]()',space(4)),' ') As T
Inner Join CTE On (T.value=CTE.id)
Where Try_Convert(Int, T.value) Is Not Null
And T.value Like '%[0-9]%'
db<>fiddle
I'd like to rework a script I've been given.
The way it currently works is via a WITH CTE using a large number of UNIONs.
Current setup
We're taking one record from a source table, inserting it into a destination table once with [Name] A then inserting it again with [Name] B. Essentially creating multiple rows in the destination, albeit with different [Name].
An example of one transaction would be to take this row from [Source]:
ID [123] Name [Red and Green]
The results of my current set up in the [Destination] is:
ID [123] Name [Red]
ID [123] Name [Green]
Current logic
Here's a simplified version of the current logic:
WITH CTE
AS
(SELECT ID,
'Red' AS [Name]
FROM [Source_Table]
WHERE [Name] = 'Red and Green'
UNION ALL
SELECT ID,
'Green' AS [Name]
FROM [Source_Table]
WHERE [Name] = 'Red and Green')
INSERT INTO [Destination_Table]
(ID,
[Name])
SELECT ID,
[Name]
FROM CTE;
The reason I'd like to rework this is that when we get a new [Name], we have to manually add another portion of code to our (ever-increasing) UNION to make sure it gets picked up.
What I've considered
What I was considering was setting up a WHILE loop (or CURSOR) running off a control table where we could store all of the [Name]s. However, I'm not sure this would be the best approach, and I'm not too familiar yet with loops/cursors. Also, I wouldn't be too sure how to stop the loop once all [Name]s had been processed.
Any help much appreciated.
You can use cross apply to duplicate the rows:
insert into [destination_table] (id, name)
select x.*
from source_table s
cross apply (values (s.id, 'Red'), (s.id, 'Green')) x(id, name)
where s.name = 'Red and Green';
Introduce a new table called Color_List which just contains one row for each possible color. Then do this:
with cte as
(
select
st.ID,
c.colorname
from
Source_Table st
inner join
Color_List c
on
CHARINDEX(c.colorname, st.[Name]) > 0
)
insert into Destination_Table
(
ID,
[Name]
)
select
ID,
colorname
from
cte
The benefit of this method is that you aren't hard-coding any color names in the query. All the color names (and presumably there can be many more than two) get maintained in the Color_List table.
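A minimal sketch of that control table (the name and sample rows are illustrative):

create table Color_List
(
    colorname varchar(50) primary key -- one row per possible color
);

insert into Color_List (colorname)
values ('Red'), ('Green'), ('Blue');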
You could use string_split to split the values apart. First replace the ' and ' with a pipe '|'. Then do a string split on the vertical pipe.
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
(1, '[123]', 'Name', '[Red and Green]')) V(ID, testCol, nameCol, stringCol);
select ID, testCol, nameCol,
case when left([value], 1)!='[' then concat('[',[value]) else
case when right([value], 1)!=']' then concat([value], ']') else [value] end end valCol
from #tTEST t
cross apply string_split(replace(t.stringCol, ' and ', '|'), '|');
Results
ID testCol nameCol valCol
1 [123] Name [Red]
1 [123] Name [Green]
I'm using two queries: the 1st separates one column into two columns and inserts the result into a table, and the 2nd (PIVOT) fetches based on the inserted table.
1st Query:
SELECT A.MDDID, A.DeviceNumber,
Split.a.value('.', 'VARCHAR(100)') AS MetReading
FROM (
SELECT MDDID,DeviceNumber,
CAST ('<M>' + REPLACE(Httpstring, ':', '</M><M>') + '</M>' AS XML) AS MetReading
FROM [IOTDBV1].[dbo].[MDASDatas] E
Where E.MDDID = 49101
) AS A CROSS APPLY MetReading.nodes ('/M') AS Split(a);
2nd Query:
SELECT * FROM
(
Select ID,MDDID,DeviceNumber,ReceivedDate
, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY (SELECT 1)) AS ID2
, SPLT.MR.value('.','VARCHAR(MAX)') AS LIST FROM (
Select ID,MDDID,DeviceNumber,ReceivedDate
, CAST( '<M>'+REPLACE(MeterReading,',','</M><M>')+'</M>' AS XML) AS XML_MR
From [dbo].[PARSEMDASDatas] E
Where E.MeterReading is Not Null
)E
CROSS APPLY E.XML_MR.nodes('/M') AS SPLT(MR)
)A
PIVOT
(
MAX(LIST) FOR ID2 IN ([1],[2],[3],[4],[5],[6],[7],[8])
)PV
I want the 2nd query to run based on the first query, without requiring an intermediate table.
Any help would be appreciated.
Your question is not very clear... and it is a very good example of why you should always add an MCVE, including DDL, sample data, your own attempts, wrong output, and expected output. This time I've done it for you; please try to prepare such an MCVE yourself next time...
If I get this correctly, your source table includes a CSV column with up to 8 (max?) values. This can be solved much more easily: no need to break this up into two queries, no need for an intermediate table, and not even for PIVOT.
--create a mockup-table to simulate your situation (slightly shortened for brevity)
DECLARE #YourTable TABLE(ID INT,MDDID INT, DeviceNumber VARCHAR(100),MetReading VARCHAR(2000));
INSERT INTO #YourTable VALUES
(2,49101,'NKLDEVELOPMENT02','DCPL,981115,247484,9409') --the character code and some numbers
,(3,49101,'NKLDEVELOPMENT02','SPPL,,,,,,,,') --eight empty commas
,(4,49101,'NKLDEVELOPMENT02','BLAH,,,999,,'); --A value somewhere in the middle
--The cte will return the table as is. The only difference is a cast to XML (as you did it too)
WITH Splitted AS
(
SELECT ID
,MDDID
,DeviceNumber
,CAST('<x>' + REPLACE(MetReading,',','</x><x>') + '</x>' AS XML) AS Casted
FROM #YourTable t
)
SELECT s.ID
,s.MDDID
,s.DeviceNumber
,s.Casted.value('/x[1]','varchar(100)') AS [1]
,s.Casted.value('/x[2]','varchar(100)') AS [2]
,s.Casted.value('/x[3]','varchar(100)') AS [3]
,s.Casted.value('/x[4]','varchar(100)') AS [4]
,s.Casted.value('/x[5]','varchar(100)') AS [5]
,s.Casted.value('/x[6]','varchar(100)') AS [6]
,s.Casted.value('/x[7]','varchar(100)') AS [7]
,s.Casted.value('/x[8]','varchar(100)') AS [8]
FROM Splitted s;
The result:
ID MDDID DeviceNumber 1 2 3 4 5 6 7 8
2 49101 NKLDEVELOPMENT02 DCPL 981115 247484 9409 NULL NULL NULL NULL
3 49101 NKLDEVELOPMENT02 SPPL
4 49101 NKLDEVELOPMENT02 BLAH 999 NULL NULL
The idea in short:
Each CSV is transformed to XML similar to this:
<x>DCPL</x>
<x>981115</x>
<x>247484</x>
<x>9409</x>
Using a position predicate in the XPath, we can call the first, the second, the third <x> easily.
CTE: WITH common_table_expression is the answer.
You can prepare some data in the first query and use it in the second:
WITH cte_table AS
(
SELECT *
FROM sys.objects
)
SELECT *
FROM cte_table
where name like 'PK%'
I know it is possible to SELECT, from sys.columns and from tempdb.sys.columns the names of the columns of a specific table.
Can the same be done from a CTE?
with SampleCTE as (
Select
'Tom' as Name
,'Bombadill' as Surname
,99999 as Age
,'Withywindle' as Address
)
is there any way to know that the columns of this CTE are Name,Surname,Age and Address, without resorting to dumping the CTE result to a temporary table and reading the columns from there?
Thanks!
Here is a "dynamic" approach without actually using Dynamic SQL.
UNPIVOT (dynamic or not) would be more performant.
Example
with SampleCTE as (
Select
'Tom' as Name
,'Bombadill' as Surname
,99999 as Age
,'Withywindle' as Address
)
Select C.*
From SampleCTE A
Cross Apply ( values (cast((Select A.* for XML RAW) as xml))) B(XMLData)
Cross Apply (
Select Item = a.value('local-name(.)','varchar(100)')
,Value = a.value('.','varchar(max)')
From B.XMLData.nodes('/row') as C1(n)
Cross Apply C1.n.nodes('./@*') as C2(a)
Where a.value('local-name(.)','varchar(100)') not in ('ID','ExcludeOtherCol')
) C
Returns
Item Value
Name Tom
Surname Bombadill
Age 99999
Address Withywindle
Yes, it is possible with sys.dm_exec_describe_first_result_set:
This dynamic management function takes a Transact-SQL statement as a parameter and describes the metadata of the first result set for the statement.
SELECT name
FROM sys.dm_exec_describe_first_result_set(
N'
with SampleCTE as (
Select
''Tom'' as Name
,''Bombadill'' as Surname
,99999 as Age
,''Withywindle'' as Address
)
SELECT * FROM SampleCTE
', NULL, NULL);
db<>fiddle demo
I'm wondering if I can select the value of a column if the column exists and just select null otherwise. In other words I'd like to "lift" the select statement to handle the case when the column doesn't exist.
SELECT uniqueId
, columnTwo
, /*WHEN columnThree exists THEN columnThree ELSE NULL END*/ AS columnThree
FROM (subQuery) s
Note, I'm in the middle of solidifying my data model and design. I hope to remove this logic in the coming weeks, but I'd really like to move beyond this problem right now, because the data model fix is a more time-consuming endeavor than I'd like to tackle at the moment.
Also note, I'd like to be able to do this in one query. So I'm not looking for an answer like
check what columns are on your sub query first. Then modify your
query to appropriately handle the columns on your sub query.
You cannot do this with a simple SQL statement. A SQL query will not compile unless all table and column references in the query exist.
You can do this with dynamic SQL if the "subquery" is a table reference or a view.
In dynamic SQL, you would do something like:
declare @sql nvarchar(max) = '
SELECT uniqueId, columnTwo, '+
    (case when exists (select *
                       from INFORMATION_SCHEMA.COLUMNS
                       where table_name = @TableName and
                             column_name = 'ColumnThree' -- and schema name too, if you like
                      )
          then 'ColumnThree'
          else 'NULL as ColumnThree'
     end) + '
FROM (select * from ' + @SourceName + ') s
';
exec sp_executesql @sql;
For an actual subquery, you could approximate the same thing by checking to see if the subquery returned something with that column name. One method for this is to run the query: select top 0 * into #temp from (<subquery>) s and then check the columns in #temp.
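A sketch of that check (the subquery and the temp table name are illustrative):

select top 0 * into #temp from (select * from yourtable) s;

-- returns a row only if the subquery produced a columnThree column
select 1 as HasColumnThree
where exists (select *
              from tempdb.sys.columns
              where object_id = object_id('tempdb..#temp')
                and name = 'columnThree');

drop table #temp;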
EDIT:
I don't usually update such old questions, but based on the comment below. If you have a unique identifier for each row in the "subquery", you can run the following:
select t.. . ., -- everything but columnthree
       (select columnthree -- not qualified!
        from t t2
        where t2.pk = t.pk
       ) as columnthree
from t cross join
     (values (NULL)) v(columnthree);
The subquery will pick up columnthree from the outer query if it doesn't exist in t. However, this depends critically on having a unique identifier for each row. The question is explicitly about a subquery, and there is no reason to expect that the rows are easily uniquely identified.
As others already suggested, the sane approach is to have queries that meet your table design.
There is a rather exotic approach to achieve what you want in (pure, not dynamic) SQL, though. A similar problem was posted at DBA.SE: How to select specific rows if a column exists or all rows if a column doesn't, but it was simpler, as only one row and one column were wanted as the result. Your problem is more complex, so the query is more convoluted, to say the least. Here is the insane approach:
; WITH s AS
(subquery) -- subquery
SELECT uniqueId
, columnTwo
, columnThree =
( SELECT ( SELECT columnThree
FROM s AS s2
WHERE s2.uniqueId = s.uniqueId
) AS columnThree
FROM (SELECT NULL AS columnThree) AS dummy
)
FROM s ;
It also assumes that the uniqueId is unique in the result set of the subquery.
Tested at SQL-Fiddle
And a simpler method which has the additional advantage that allows more than one column with a single subquery:
SELECT s.*
FROM
( SELECT NULL AS columnTwo,
NULL AS columnThree,
NULL AS columnFour
) AS dummy
CROSS APPLY
( SELECT
uniqueId,
columnTwo,
columnThree,
columnFour
FROM tableX
) AS s ;
The question has also been asked at DBA.SE and has been answered by #Andriy M (using CROSS APPLY too!) and Michael Ericsson (using XML):
Why can't I use a CASE statement to see if a column exists and not SELECT from it?
You can use dynamic SQL.
First you need to check that the column exists, and then create the dynamic query:
DECLARE @query NVARCHAR(MAX) = '
SELECT FirstColumn, SecondColumn, '+
    (CASE WHEN exists (SELECT 1 FROM syscolumns
                       WHERE name = 'ColumnName' AND id = OBJECT_ID('TableName'))
          THEN 'ColumnName'
          ELSE 'NULL as ThreeColumn'
     END) + '
FROM TableName'

EXEC sp_executesql @query;