azure data lake u-sql pivot - azure-data-lake

I am loving Azure Data Lake, but the lack of documentation will probably slow down adoption. I hope somebody out there has more experience with U-SQL than I do.
I have been trying to work it out from what's available under Microsoft.Analytics.Interfaces and from the U-SQL interpreter, without much luck. Dynamic SQL does not seem to be supported for defining the schema of a rowset at run time, and IUpdatableRow's schema is read-only, so the Processor approach is not viable. There is also no out-of-the-box PIVOT capability in U-SQL.
I also thought I could process the whole rowset and write a custom outputter to do the pivot, but couldn't figure it out.
There is probably a really easy way to do this as it is a standard pivot operation. How would you go about reshaping a rowset from I to II for an indeterminate number of ColA and ColB values in a performant way?
I
|ColA |ColB |ColC|
|1 |A |30 |
|1 |B |70 |
|1 |ZA |12 |
|2 |C |22 |
|2 |A |13 |
II
|ID |A |B |C |...... |ZA |.....
|1 |30 |70 |0 | |12 |
|2 |13 |0 |22 |...... |0 |.....

Note PIVOT / UNPIVOT syntax has been added to U-SQL as of March 2017.
Using the above sample data:
@t = SELECT *
FROM(
VALUES
( 1, "A", 30 ),
( 1, "B", 70 ),
( 1, "ZA", 12 ),
( 2, "C", 22 ),
( 2, "A", 13 ),
( 2, "ABC", 42)
) AS T(ColA, ColB, ColC);
@p =
SELECT *
FROM @t
PIVOT (MAX(ColC) FOR ColB IN ("A" AS [A], "B" AS [B], "C" AS [C], "ZA" AS [ZA], "ABC" AS [ABC])
) AS pvt;
OUTPUT @p
TO "/output/pivot3.csv"
USING Outputters.Csv();

You have several options for doing such a PIVOT.
Here is one that uses the U-SQL MAP data type (called SQL.MAP). Instead of 0 it will return null for missing values (use a null-coalescing expression to turn it into 0). This will work under the following conditions:
The generated MAP stays within the row size limit of 4MB. If not, see the next solution.
You know ahead of time what columns you have (if not, just keep the data in the map column and extract as needed).
Solution with map:
@t = SELECT *
FROM(
VALUES
( 1, "A", 30 ),
( 1, "B", 70 ),
( 1, "ZA", 12 ),
( 2, "C", 22 ),
( 2, "A", 13 ),
( 2, "ABC", 42)
) AS T(ColA, ColB, ColC);
@m = SELECT ColA AS [ID],
MAP_AGG(ColB, (int?) ColC) AS m
FROM @t
GROUP BY ColA;
@r =
SELECT [ID],
m["A"] AS A,
m["B"] AS B,
m["C"] AS C,
m["ZA"] AS [ZA],
m["ABC"] AS [ABC]
FROM @m;
OUTPUT @r
TO "/output/pivot1.csv"
USING Outputters.Csv();
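To make the MAP idea concrete outside U-SQL, here is a minimal Python sketch of the same logic. Python is only a stand-in here, and the name `pivot_with_map` and the tuple row shape are assumptions for illustration. It builds one map per ID, roughly as MAP_AGG does, then reads out a known column list and turns missing keys into 0.

```python
from collections import defaultdict

# Sample rows from the question: (ColA, ColB, ColC).
ROWS = [(1, "A", 30), (1, "B", 70), (1, "ZA", 12),
        (2, "C", 22), (2, "A", 13), (2, "ABC", 42)]


def pivot_with_map(rows, columns):
    """Group rows by ID into a map, then read out the known columns in order."""
    maps = defaultdict(dict)
    for col_a, col_b, col_c in rows:
        maps[col_a][col_b] = col_c  # roughly what MAP_AGG(ColB, ColC) collects
    # dict.get(..., 0) plays the role of the null-coalescing step.
    return {key: [m.get(c, 0) for c in columns] for key, m in maps.items()}


if __name__ == "__main__":
    print(pivot_with_map(ROWS, ["A", "B", "C", "ZA", "ABC"]))
```

As in the U-SQL version, the column list has to be supplied by the caller; only the per-ID map is built from the data.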
And here is a solution that does use the standard SQL pivot work-around pattern (Some SQL database implementations actually used to translate the PIVOT expression into such an expression internally, and may still do it). Again, you will have to know all columns ahead of time. If that is not the case, just use the MAP datatype.
@t =
SELECT *
FROM(
VALUES
( 1, "A", 30 ),
( 1, "B", 70 ),
( 1, "ZA", 12 ),
( 2, "C", 22 ),
( 2, "A", 13 ),
( 2, "ABC", 42)
) AS T(ColA, ColB, ColC);
@r =
SELECT ColA AS [ID],
(ColB == "A") ? ColC : 0 AS A,
(ColB == "B") ? ColC : 0 AS B,
(ColB == "C") ? ColC : 0 AS C,
(ColB == "ZA") ? ColC : 0 AS [ZA],
(ColB == "ABC") ? ColC : 0 AS [ABC]
FROM @t;
@r =
SELECT DISTINCT [ID],
LAST_VALUE(A) OVER(PARTITION BY [ID] ORDER BY A) AS A,
LAST_VALUE(B) OVER(PARTITION BY [ID] ORDER BY B) AS B,
LAST_VALUE(C) OVER(PARTITION BY [ID] ORDER BY C) AS C,
LAST_VALUE([ZA]) OVER(PARTITION BY [ID] ORDER BY [ZA]) AS [ZA],
LAST_VALUE([ABC]) OVER(PARTITION BY [ID] ORDER BY [ABC]) AS [ABC]
FROM @r;
OUTPUT @r
TO "/output/pivot2.csv"
USING Outputters.Csv();

Here is one workaround that a team member of mine came up with for a scenario where we don't know the number of columns ahead of time.
@t = SELECT *
FROM(
VALUES
( 1, "A", 30 ),
( 1, "B", 70 ),
( 1, "ZA", 12 ),
( 2, "C", 22 ),
( 2, "A", 13 ),
( 2, "ABC", 42)
) AS T(ColA, ColB, ColC);
@t1 =
SELECT DISTINCT ColB
FROM @t
ORDER BY ColB DESC
OFFSET 0 ROW;
@t1 =
SELECT ARRAY_AGG(ColB) AS ColBArray
FROM @t1;
@result =
SELECT ColA,
MAP_AGG(ColB, (int?) ColC) AS ColCMap
FROM @t
GROUP BY ColA;
@result =
SELECT a.ColA,
DPivotNS.DPivot.FillGapsAndConvert(a.ColCMap, b.ColBArray) AS Values
FROM @result AS a
CROSS JOIN
@t1 AS b;
@result =
SELECT ColA,
ArrayColumn
FROM
(
SELECT 0 AS ColA,
ColBArray AS ArrayColumn,
0 AS Ord
FROM @t1
UNION ALL
SELECT ColA AS ColA,
Values AS ArrayColumn,
1 AS Ord
FROM @result
) AS rs1
ORDER BY rs1.Ord
OFFSET 0 ROWS;
@result =
SELECT ColA,
String.Join(",", ArrayColumn) AS Values
FROM @result;
OUTPUT @result
TO "result.csv"
USING Outputters.Csv(quoting:false);
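Here is the UDF for the above script:
using System.Collections.Generic;
using Microsoft.Analytics.Types.Sql;

namespace DPivotNS
{
    public static class DPivot
    {
        // For each column name in ColDArray, emit the mapped value as a string, or "0" when the key has no value.
        public static SqlArray<string> FillGapsAndConvert(SqlMap<string, int?> ColCMap, SqlArray<string> ColDArray)
        {
            var list = new LinkedList<string>();
            foreach (string colD in ColDArray)
            {
                int? currentCount = ColCMap[colD];
                int newCount = currentCount.HasValue ? currentCount.Value : 0;
                list.AddLast(newCount.ToString());
            }
            return new SqlArray<string>(list);
        }
    }
}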
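For readers who want to see the shape of this workaround without a U-SQL account, here is a rough Python sketch of the same steps. It is an illustrative stand-in, not U-SQL, and `dynamic_pivot_csv` is a hypothetical name. It collects the distinct ColB values in descending order, emits them as a header line tagged with ID 0, then fills the gaps with 0 for each ColA.

```python
# Sample rows from the question: (ColA, ColB, ColC).
ROWS = [(1, "A", 30), (1, "B", 70), (1, "ZA", 12),
        (2, "C", 22), (2, "A", 13), (2, "ABC", 42)]


def dynamic_pivot_csv(rows):
    """Return CSV lines: a header of distinct ColB values, then one line per ColA."""
    # Distinct column names, descending, like the ORDER BY ColB DESC step.
    columns = sorted({col_b for _, col_b, _ in rows}, reverse=True)
    maps = {}
    for col_a, col_b, col_c in rows:
        maps.setdefault(col_a, {})[col_b] = col_c
    lines = ["0," + ",".join(columns)]  # header row, tagged with ID 0 as in the script
    for col_a in sorted(maps):
        values = [str(maps[col_a].get(c, 0)) for c in columns]  # fill gaps with 0
        lines.append(f"{col_a}," + ",".join(values))
    return lines


if __name__ == "__main__":
    print("\n".join(dynamic_pivot_csv(ROWS)))
```

The string ordering here is Python's default, so the column order may differ from what U-SQL produces for other data.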

Related

Count number of repeated character in a given string

How do I count the number of occurrences of repeated $ character in the given strings.
For ex:
String = '$$$$ABC$$$DE$$$' --> Answer is 4,3,3
String = '###$$%%ANE$$$$$' --> Answer is 2,5
I have no idea how to do it so did not do any attempts.
Thanks for your help.
For Reproducing:
DDL and Inserts:
Create table xyz(text varchar(200));
Insert into xyz values('$$$$ABC$$$DE$$$');
Insert into xyz values('###$$%%ANE$$$$$');
What I need to do: Count the repeated number of '$'
Desired output, based on the sample data in #1 above.
text = '$$$$ABC$$$DE$$$' --> Answer is 4,3,3
text = '###$$%%ANE$$$$$' --> Answer is 2,5
SQL Server version: Microsoft SQL Server 2019 (RTM) - 15.0.2000.5
Please try the following solution. It will work starting from SQL Server 2017 onwards.
It is based on use of the TRANSLATE() function, and XML and XQuery.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(30));
INSERT INTO @tbl (tokens) VALUES
('$$$$ABC$$$DE$$$'), --> Answer is 4,3,3
('###$$%%ANE$$$$$'); --> Answer is 2,5
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
;WITH cte AS
(
SELECT *
, REPLACE(TRANSLATE(tokens, '$', SPACE(1)),' ','') AS JunkCharacters
FROM @tbl
)
SELECT *
, REPLACE(TRY_CAST('<root><r><![CDATA[' +
REPLACE(TRANSLATE(tokens, TRIM(JunkCharacters), SPACE(LEN(TRIM(JunkCharacters)))), @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)
.query('
for $x in /root/r[text()]
return data(string-length($x))
').value('.', 'VARCHAR(20)'), SPACE(1), ',') AS CleansedTokensCounter
FROM cte;
Output
+----+-----------------+----------------+-----------------------+
| ID | tokens | JunkCharacters | CleansedTokensCounter |
+----+-----------------+----------------+-----------------------+
| 1 | $$$$ABC$$$DE$$$ | ABCDE | 4,3,3 |
| 2 | ###$$%%ANE$$$$$ | ###%%ANE | 2,5 |
+----+-----------------+----------------+-----------------------+
We can do this with a number of steps:
We use a tally/numbers table to shred the string into individual characters. The tally is calculated on the fly with a couple of cross-joins and ROW_NUMBER
We then calculate a grouping ID for each group of characters, using a standard gaps-and-islands technique: a windowed sum of each starting row
Filter down to the character we want, group it by ID and return a count of rows in each group.
This returns a new row for every group of $ characters
Create table xyz(text varchar(200));
Insert into xyz values('$$$$ABC$$$DE$$$');
Insert into xyz values('###$$%%ANE$$$$$');
WITH
L0 AS ( SELECT 1 AS c
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
-- you can allow for larger strings with more cross-joins
Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM L1 )
SELECT
xyz.[text],
r.numRepetitions
FROM xyz
CROSS APPLY (
SELECT numRepetitions = COUNT(*)
FROM (
SELECT TOP(LEN(xyz.[text]))
thisChar = SUBSTRING(xyz.[text], rownum, 1),
groupId = SUM(CASE WHEN rownum = 1 OR SUBSTRING(xyz.[text], rownum, 1) <> SUBSTRING(xyz.[text], rownum - 1, 1) THEN 1 ELSE 0 END)
OVER (ORDER BY rownum ROWS UNBOUNDED PRECEDING)
FROM Nums
ORDER BY rownum
) AS chars
WHERE thisChar = '$'
GROUP BY groupId
) AS r;
If you want a single comma-separated list of row-counts, you need to subquery again
CROSS APPLY (
SELECT numRepetitions = STRING_AGG(CAST(numRepetitions AS varchar(10)), ',')
FROM (
SELECT numRepetitions = COUNT(*)
FROM (
SELECT TOP(LEN(xyz.[text]))
thisChar = SUBSTRING(xyz.[text], rownum, 1),
groupId = SUM(CASE WHEN rownum = 1 OR SUBSTRING(xyz.[text], rownum, 1) <> SUBSTRING(xyz.[text], rownum - 1, 1) THEN 1 ELSE 0 END)
OVER (ORDER BY rownum ROWS UNBOUNDED PRECEDING)
FROM Nums
ORDER BY rownum
) AS chars
WHERE thisChar = '$'
GROUP BY groupId
) AS groups
) AS r;
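The gaps-and-islands steps in this answer (shred into characters, start a new group whenever the character changes, keep only the `$` groups, count each group) can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for the T-SQL; the function names are made up for the example.

```python
from itertools import groupby


def run_lengths(text, target="$"):
    """Lengths of each consecutive run of `target` in `text`, left to right."""
    # groupby starts a new group whenever the character changes, which is the
    # same island boundary the windowed SUM marks in the SQL.
    return [len(list(group)) for char, group in groupby(text) if char == target]


def run_lengths_csv(text, target="$"):
    """Comma-separated form, like the STRING_AGG variant."""
    return ",".join(str(n) for n in run_lengths(text, target))


if __name__ == "__main__":
    for sample in ("$$$$ABC$$$DE$$$", "###$$%%ANE$$$$$"):
        print(sample, "->", run_lengths_csv(sample))
```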

Can I replace substrings in a formula stored in a string in SQL?

I need to replace values within a formula stored as a string in SQL.
Example formulas stored in a column:
'=AA+BB/DC'
'=-(AA+CC)'
'=AA/BB+DD'
I have values for AA, BB etc. stored in another table.
Can I find and replace 'AA', 'BB' and so forth from within the formulas with numeric values to actually calculate the formula?
I assume I also need to replace the arithmetic operators ('+' , '/') from strings to actual signs, and if so is there a way to do it?
Desired Result
Assuming: AA = 10, BB = 20, DC = 5
I would need
'=AA+BB/DC' converted to 10+20/5 and a final output of 14
Please note that formulas can change in the future so I would need something resilient to that.
Thank you!
Okay, so this is a real hack, but I was intrigued by your question. You could turn my example into a function and then refactor it to your specific needs.
Note: using TRANSLATE requires SQL Server 2017. This could be a deal-breaker for you right there. TRANSLATE simplifies the replacement process greatly.
This example is just that--an example. A hack. Performance issues are unknown. You still need to do your diligence with testing.
-- Create a mock-up of the values table/data.
DECLARE @Values TABLE ( [key] VARCHAR(2), [val] INT );
INSERT INTO @Values ( [key], [val] ) VALUES
( 'AA', 10 ), ( 'BB', 20 ), ( 'CC', 6 ), ( 'DC', 5 );
-- Variable passed in to function.
DECLARE @formula VARCHAR(255) = '=(AA+BB)/DC';
-- Remove unnecessary mathematical characters from the formula values.
DECLARE @vals VARCHAR(255) = REPLACE ( TRANSLATE ( @formula, '=()', '___' ), '_', '' );
-- Remove any leading mathematical operations from @vals.
WHILE PATINDEX ( '[A-Z]', LEFT ( @vals, 1 ) ) = 0
SET @vals = SUBSTRING ( @vals, 2, LEN ( @vals ) );
-- Use SQL hack to replace placeholder values with actual values...
SELECT @formula = REPLACE ( @formula, fx.key_val, v.val )
FROM (
SELECT
[value] AS [key_val],
ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) AS [key_id]
FROM STRING_SPLIT ( TRANSLATE ( @vals, '+/*-', ',,,,' ), ',' )
) AS fx
INNER JOIN @Values v
ON fx.[key_val] = v.[key]
ORDER BY
fx.[key_id];
-- Return updated formula.
SELECT @formula AS RevisedFormula;
-- Return the result (remove the equals sign).
SET @formula = FORMATMESSAGE ( 'SELECT %s AS FormulaResult;', REPLACE ( @formula, '=', '' ) );
EXEC ( @formula );
SELECT @formula AS RevisedFormula; returns:
+----------------+
| RevisedFormula |
+----------------+
| =(10+20)/5 |
+----------------+
The last part of my example uses EXEC to do the math. You cannot use EXEC in a function.
-- Return the result (remove the equals sign).
SET @formula = FORMATMESSAGE ( 'SELECT %s AS FormulaResult;', REPLACE ( @formula, '=', '' ) );
EXEC ( @formula );
Returns
+---------------+
| FormulaResult |
+---------------+
| 6 |
+---------------+
Changing the formula value to =-(AA+CC) returns:
+----------------+
| RevisedFormula |
+----------------+
| =-(10+6) |
+----------------+
+---------------+
| FormulaResult |
+---------------+
| -16 |
+---------------+
It's probably worth noting to pay attention to math order in your formulas. Your original example of =AA+BB/DC returns 14, not the 6 that may have been expected. I updated your formula to =(AA+BB)/DC for my example.
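If the calculation can happen outside the database, the same substitute-then-evaluate idea can be sketched in Python. This is a hedged illustration only: `substitute` and `evaluate` are hypothetical helpers, placeholders are assumed to be runs of capital letters, and `/` here is true division, whereas T-SQL divides integers as integers. Matching whole tokens with a regular expression avoids one key being replaced inside another, and walking the parsed expression avoids handing arbitrary text to an evaluator.

```python
import ast
import operator
import re

VALUES = {"AA": 10, "BB": 20, "CC": 6, "DC": 5}

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}


def substitute(formula, values):
    """Drop the leading '=' and replace each run of capital letters with its value."""
    body = formula.lstrip("=")
    return re.sub(r"[A-Z]+", lambda m: str(values[m.group()]), body)


def evaluate(expression):
    """Evaluate +, -, *, / and parentheses; anything else raises ValueError."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


if __name__ == "__main__":
    for formula in ("=AA+BB/DC", "=(AA+BB)/DC", "=-(AA+CC)"):
        revised = substitute(formula, VALUES)
        print(formula, "->", revised, "->", evaluate(revised))
```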

How to concatenate strings in SQL Server, and sort/ order by a different column?

I've seen many examples of concatenating strings in SQL Server, but if they worry about sorting, it's always by the column being concatenated.
I need to order the values based on data in a different fields.
Sample table:
ClassID | StudentName | SortOrder
-----------------------------
A |James |1
A |Janice |3
A |Leonard |2
B |Luke |2
B |Leia |1
B |Artoo |3
And the results I'd like to get are:
ClassID |StudentName
--------------------------------
A |James, Leonard, Janice
B |Leia, Luke, Artoo
How can this be done in SQL Server 2016?
(I'm looking forward to STRING_AGG in 2017, but we're not there yet...)
Thanks!
Here you go:
SELECT
s1.ClassID
, STUFF((SELECT
',' + s2.StudentName
FROM dbo.Student AS s2
WHERE s1.classID = s2.ClassID
ORDER BY s2.SortOrder
FOR XML PATH('')), 1, 1, '') AS StudentNames
FROM dbo.Student AS s1
GROUP BY s1.ClassID
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE MyTable(ClassID varchar(255),StudentName varchar(255),SortOrder int)
INSERT INTO MyTable(ClassID,StudentName,SortOrder)VALUES('A','James',1),('A','Janice',3),('A','Leonard',2),
('B','Luke',2),('B','Lela',1),('B','Artoo',3)
Query 1:
SELECT
t.ClassID
, STUFF((SELECT
',' + t1.StudentName
FROM MyTable t1
WHERE t.classID = t1.ClassID
ORDER BY t1.SortOrder
FOR XML PATH('')), 1, 1, '') AS StudentNamesConcat
FROM MyTable AS t
GROUP BY t.ClassID
Results:
| ClassID | StudentNamesConcat |
|---------|----------------------|
| A | James,Leonard,Janice |
| B | Lela,Luke,Artoo |
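The key point in the STUFF / FOR XML PATH queries is that the inner ORDER BY uses SortOrder, not the concatenated column. A small Python sketch of the same grouping logic, using the sample data from the question (the function name and the tuple layout are assumptions for illustration):

```python
from collections import defaultdict

# (ClassID, StudentName, SortOrder) rows from the question.
STUDENTS = [("A", "James", 1), ("A", "Janice", 3), ("A", "Leonard", 2),
            ("B", "Luke", 2), ("B", "Leia", 1), ("B", "Artoo", 3)]


def concat_by_class(rows, sep=", "):
    """Join student names per class, ordered by SortOrder rather than by name."""
    groups = defaultdict(list)
    for class_id, name, sort_order in rows:
        groups[class_id].append((sort_order, name))
    # Sorting the (sort_order, name) pairs orders each class by SortOrder first.
    return {class_id: sep.join(name for _, name in sorted(members))
            for class_id, members in sorted(groups.items())}


if __name__ == "__main__":
    for class_id, names in concat_by_class(STUDENTS).items():
        print(class_id, "|", names)
```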
Here the query
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL DROP TABLE #Temp;
CREATE TABLE #Temp(ClassId varchar(10),studName varchar(100),SortOrder int)
INSERT INTO #Temp(ClassId , studName, SortOrder)
SELECT 'A','James',1 UNION ALL
SELECT 'A','Janice',3 UNION ALL
SELECT 'A','Leonard',2 UNION ALL
SELECT 'B','Luke',2 UNION ALL
SELECT 'B','Leia',1 UNION ALL
SELECT 'B','Artoo',3
-- select * from #Temp
select
distinct
stuff((
select ',' + u.studName
from #Temp u
where u.studName = studName and U.ClassId = L.ClassId
order by u.SortOrder
for xml path('')
),1,1,'') as userlist,ClassId
from #Temp L
group by ClassId
You can also try using PIVOT. This works on older versions of SQL Server too.
The only limitation is that you need to know the maximum SortOrder value. The code below works for SortOrder <= 20 for any ClassID.
SELECT ClassID, ISNULL([1],'') +ISNULL(', '+[2],'')+ISNULL(', '+[3],'')+ISNULL(', '+
[4],'')+ISNULL(', '+[5],'')+ISNULL(', '+[6],'')+ISNULL(', '+[7],'')+ISNULL(', '+
[8],'')+ISNULL(', '+[9],'')+ISNULL(', '+[10],'')+ISNULL(', '+[11],'')+ISNULL(', '+
[12],'')+ISNULL(', '+[13],'')+ISNULL(', '+[14],'')+ISNULL(', '+[15],'')+ISNULL(', '+
[16],'')+ISNULL(', '+[17],'')+ISNULL(', '+[18],'')+ISNULL(', '+[19],'')+ISNULL(', '+
[20],'') AS StudentName
FROM
(SELECT SortOrder,ClassID,StudentName
FROM [Table1] A
) AS SourceTable
PIVOT
(
MAX(StudentName)
FOR SortOrder IN ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20])
) AS PivotTable

SQL taking a one-to-many delimited list in a table column and transforming it into a one-to-one relationship table [duplicate]

I have a SQL Table like this:
| SomeID | OtherID | Data
+----------------+-------------+-------------------
| abcdef-..... | cdef123-... | 18,20,22
| abcdef-..... | 4554a24-... | 17,19
| 987654-..... | 12324a2-... | 13,19,20
Is there a query, something like SELECT OtherID, SplitData WHERE SomeID = 'abcdef-.......', that returns individual rows, like this:
| OtherID | SplitData
+-------------+-------------------
| cdef123-... | 18
| cdef123-... | 20
| cdef123-... | 22
| 4554a24-... | 17
| 4554a24-... | 19
Basically split my data at the comma into individual rows?
I am aware that storing a comma-separated string into a relational database sounds dumb, but the normal use case in the consumer application makes that really helpful.
I don't want to do the split in the application as I need paging, so I wanted to explore options before refactoring the whole app.
It's SQL Server 2008 (non-R2).
You can use the wonderful recursive functions from SQL Server:
Sample table:
CREATE TABLE Testdata
(
SomeID INT,
OtherID INT,
String VARCHAR(MAX)
);
INSERT Testdata SELECT 1, 9, '18,20,22';
INSERT Testdata SELECT 2, 8, '17,19';
INSERT Testdata SELECT 3, 7, '13,19,20';
INSERT Testdata SELECT 4, 6, '';
INSERT Testdata SELECT 9, 11, '1,2,3,4';
The query
WITH tmp(SomeID, OtherID, DataItem, String) AS
(
SELECT
SomeID,
OtherID,
LEFT(String, CHARINDEX(',', String + ',') - 1),
STUFF(String, 1, CHARINDEX(',', String + ','), '')
FROM Testdata
UNION all
SELECT
SomeID,
OtherID,
LEFT(String, CHARINDEX(',', String + ',') - 1),
STUFF(String, 1, CHARINDEX(',', String + ','), '')
FROM tmp
WHERE
String > ''
)
SELECT
SomeID,
OtherID,
DataItem
FROM tmp
ORDER BY SomeID;
-- OPTION (maxrecursion 0)
-- normally recursion is limited to 100. If you know you have very long
-- strings, uncomment the option
Output
SomeID | OtherID | DataItem
--------+---------+----------
1 | 9 | 18
1 | 9 | 20
1 | 9 | 22
2 | 8 | 17
2 | 8 | 19
3 | 7 | 13
3 | 7 | 19
3 | 7 | 20
4 | 6 |
9 | 11 | 1
9 | 11 | 2
9 | 11 | 3
9 | 11 | 4
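The recursive CTE works by repeatedly peeling the first item off the front of the string and carrying the remainder forward until nothing is left. A minimal Python sketch of that head/remainder loop, using the same sample rows (the names are illustrative only):

```python
# (SomeID, OtherID, String) rows from the sample table.
TESTDATA = [(1, 9, "18,20,22"), (2, 8, "17,19"), (3, 7, "13,19,20"),
            (4, 6, ""), (9, 11, "1,2,3,4")]


def split_rows(rows, delimiter=","):
    """Emit one (SomeID, OtherID, DataItem) tuple per delimited item."""
    out = []
    for some_id, other_id, text in rows:
        # Anchor member: take the first item and keep the remainder.
        head, _, remaining = text.partition(delimiter)
        out.append((some_id, other_id, head))
        # Recursive member: keep peeling while the remainder is non-empty.
        while remaining:
            head, _, remaining = remaining.partition(delimiter)
            out.append((some_id, other_id, head))
    return out


if __name__ == "__main__":
    for row in split_rows(TESTDATA):
        print(row)
```

Like the CTE, an empty input string still produces one row with an empty item.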
Finally, the wait is over with SQL Server 2016. They have introduced the Split string function, STRING_SPLIT:
select OtherID, cs.Value --SplitData
from yourtable
cross apply STRING_SPLIT (Data, ',') cs
All the other methods to split string like XML, Tally table, while loop, etc.. have been blown away by this STRING_SPLIT function.
Here is an excellent article with performance comparison: Performance Surprises and Assumptions: STRING_SPLIT.
For older versions, here is a split-string function that uses a tally table (the best possible approach):
CREATE FUNCTION [dbo].[DelimitedSplit8K]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
Referred from Tally OH! An Improved SQL 8K “CSV Splitter” Function
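The tally-table splitter is easier to follow once its three stages are seen in isolation: find every position that follows a delimiter (cteStart), find where each element ends (cteLen), and cut the substrings. A rough Python sketch of those stages, assuming a single-character delimiter as in the function above; note that Python positions are 0-based while the T-SQL ones are 1-based, and `delimited_split` is a name made up for this example.

```python
def delimited_split(text, delimiter=","):
    """Return (item_number, item) pairs, numbered from 1."""
    # cteStart: the first position, plus the position right after every delimiter.
    starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == delimiter]
    items = []
    for number, start in enumerate(starts, 1):
        # cteLen: the element ends at the next delimiter, or at the end of the
        # string when no further delimiter is found.
        end = text.find(delimiter, start)
        if end == -1:
            end = len(text)
        items.append((number, text[start:end]))
    return items


if __name__ == "__main__":
    print(delimited_split("18,20,22"))
```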
Check this
SELECT A.OtherID,
Split.a.value('.', 'VARCHAR(100)') AS Data
FROM
(
SELECT OtherID,
CAST ('<M>' + REPLACE(Data, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM Table1
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
Very late but try this out:
SELECT ColumnID, Column1, value --Do not change 'value' name. Leave it as it is.
FROM tbl_Sample
CROSS APPLY STRING_SPLIT(Tags, ','); --'Tags' is the name of column containing comma separated values
So we were having this:
tbl_Sample :
ColumnID| Column1 | Tags
--------|-----------|-------------
1 | ABC | 10,11,12
2 | PQR | 20,21,22
After running this query:
ColumnID| Column1 | value
--------|-----------|-----------
1 | ABC | 10
1 | ABC | 11
1 | ABC | 12
2 | PQR | 20
2 | PQR | 21
2 | PQR | 22
Thanks!
select t.OtherID,x.Code
from testData t
cross apply (select Code from dbo.Split(t.Data,',') ) x
As of Feb 2016 - see the TALLY Table Example - very likely to outperform my TVF below, from Feb 2014. Keeping original post below for posterity:
Too much repeated code for my liking in the above examples. And I dislike the performance of CTEs and XML. Also, an explicit Id so that consumers that are order specific can specify an ORDER BY clause.
CREATE FUNCTION dbo.Split
(
@Line nvarchar(MAX),
@SplitOn nvarchar(5) = ','
)
RETURNS @RtnValue table
(
Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Data nvarchar(100) NOT NULL
)
AS
BEGIN
IF @Line IS NULL RETURN;
DECLARE @split_on_len INT = LEN(@SplitOn);
DECLARE @start_at INT = 1;
DECLARE @end_at INT;
DECLARE @data_len INT;
WHILE 1=1
BEGIN
SET @end_at = CHARINDEX(@SplitOn,@Line,@start_at);
SET @data_len = CASE @end_at WHEN 0 THEN LEN(@Line) ELSE @end_at-@start_at END;
INSERT INTO @RtnValue (data) VALUES( SUBSTRING(@Line,@start_at,@data_len) );
IF @end_at = 0 BREAK;
SET @start_at = @end_at + @split_on_len;
END;
RETURN;
END;
Nice to see that it has been solved in the 2016 version, but for all of those who are not on it, here are two generalized and simplified versions of the methods above.
The XML method is shorter, but of course requires that the string allow the XML trick (no 'bad' characters).
XML-Method:
create function dbo.splitString(@input Varchar(max), @Splitter VarChar(99)) returns table as
Return
SELECT Split.a.value('.', 'VARCHAR(max)') AS Data FROM
( SELECT CAST ('<M>' + REPLACE(@input, @Splitter, '</M><M>') + '</M>' AS XML) AS Data
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
Recursive method:
create function dbo.splitString(@input Varchar(max), @Splitter Varchar(99)) returns table as
Return
with tmp (DataItem, ix) as
( select @input , CHARINDEX('',@Input) --Recu. start, ignored val to get the types right
union all
select Substring(@input, ix+1,ix2-ix-1), ix2
from (Select *, CHARINDEX(@Splitter,@Input+@Splitter,ix+1) ix2 from tmp) x where ix2<>0
) select DataItem from tmp where ix<>0
Function in action
Create table TEST_X (A int, CSV Varchar(100));
Insert into test_x select 1, 'A,B';
Insert into test_x select 2, 'C,D';
Select A,data from TEST_X x cross apply dbo.splitString(x.CSV,',') Y;
Drop table TEST_X
XML-METHOD 2: Unicode Friendly 😀 (Addition courtesy of Max Hodges)
create function dbo.splitString(@input nVarchar(max), @Splitter nVarchar(99)) returns table as
Return
SELECT Split.a.value('.', 'NVARCHAR(max)') AS Data FROM
( SELECT CAST ('<M>' + REPLACE(@input, @Splitter, '</M><M>') + '</M>' AS XML) AS Data
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
Please refer to the T-SQL below. The STRING_SPLIT function is available only under compatibility level 130 and above.
TSQL:
DECLARE @stringValue NVARCHAR(400) = 'red,blue,green,yellow,black';
DECLARE @separator CHAR = ',';
SELECT [value] As Colour
FROM STRING_SPLIT(@stringValue, @separator);
RESULT:
Colour
red
blue
green
yellow
black
I know it has a lot of answers, but I want to write my version of split function like others and like string_split SQL Server 2016 native function.
create function [dbo].[Split]
(
@Value nvarchar(max),
@Delimiter nvarchar(50)
)
returns @tbl table
(
Seq int primary key identity(1, 1),
Value nvarchar(max)
)
as begin
declare @Xml xml = cast('<d>' + replace(@Value, @Delimiter, '</d><d>') + '</d>' as xml);
insert into @tbl
(Value)
select a.split.value('.', 'nvarchar(max)') as Value
from @Xml.nodes('/d') a(split);
return;
end;
Seq column is primary key to support fast join with other real table or Split function returned table.
Used XML function to support large data (looping version will slow down significantly when you have large data)
Here's an answer to the question.
CREATE TABLE Testdata
(
SomeID INT,
OtherID INT,
String VARCHAR(MAX)
);
INSERT Testdata SELECT 1, 9, '18,20,22';
INSERT Testdata SELECT 2, 8, '17,19';
INSERT Testdata SELECT 3, 7, '13,19,20';
INSERT Testdata SELECT 4, 6, '';
INSERT Testdata SELECT 9, 11, '1,2,3,4';
select t.SomeID, t.OtherID, s.Value
from Testdata t
cross apply dbo.Split(t.String, ',') s;
--Output
SomeID OtherID Value
1 9 18
1 9 20
1 9 22
2 8 17
2 8 19
3 7 13
3 7 19
3 7 20
4 6
9 11 1
9 11 2
9 11 3
9 11 4
Joining Split with other split
declare @Names nvarchar(max) = 'a,b,c,d';
declare @Codes nvarchar(max) = '10,20,30,40';
select n.Seq, n.Value Name, c.Value Code
from dbo.Split(@Names, ',') n
inner join dbo.Split(@Codes, ',') c on n.Seq = c.Seq;
--Output
Seq Name Code
1 a 10
2 b 20
3 c 30
4 d 40
Split two times
declare @NationLocSex nvarchar(max) = 'Korea,Seoul,1;Vietnam,Kiengiang,0;China,Xian,0';
with rows as
(
select Value
from dbo.Split(@NationLocSex, ';')
)
select rw.Value r, cl.Value c
from rows rw
cross apply dbo.Split(rw.Value, ',') cl;
--Output
r c
Korea,Seoul,1 Korea
Korea,Seoul,1 Seoul
Korea,Seoul,1 1
Vietnam,Kiengiang,0 Vietnam
Vietnam,Kiengiang,0 Kiengiang
Vietnam,Kiengiang,0 0
China,Xian,0 China
China,Xian,0 Xian
China,Xian,0 0
Split to columns
declare @Numbers nvarchar(50) = 'First,Second,Third';
with t as
(
select case when Seq = 1 then Value end f1,
case when Seq = 2 then Value end f2,
case when Seq = 3 then Value end f3
from dbo.Split(@Numbers, ',')
)
select min(f1) f1, min(f2) f2, min(f3) f3
from t;
--Output
f1 f2 f3
First Second Third
Generate rows by range
declare @Ranges nvarchar(50) = '1-2,4-6';
declare @Numbers table (Num int);
insert into @Numbers values (1),(2),(3),(4),(5),(6),(7),(8);
with t as
(
select r.Seq, r.Value,
min(case when ft.Seq = 1 then ft.Value end) ValueFrom,
min(case when ft.Seq = 2 then ft.Value end) ValueTo
from dbo.Split(@Ranges, ',') r
cross apply dbo.Split(r.Value, '-') ft
group by r.Seq, r.Value
)
select t.Seq, t.Value, t.ValueFrom, t.ValueTo, n.Num
from t
inner join @Numbers n on n.Num between t.ValueFrom and t.ValueTo;
--Output
Seq Value ValueFrom ValueTo Num
1 1-2 1 2 1
1 1-2 1 2 2
2 4-6 4 6 4
2 4-6 4 6 5
2 4-6 4 6 6
DECLARE @id_list VARCHAR(MAX) = '1234,23,56,576,1231,567,122,87876,57553,1216';
DECLARE @table TABLE ( id VARCHAR(50) );
DECLARE @x INT = 0;
DECLARE @firstcomma INT = 0;
DECLARE @nextcomma INT = 0;
SET @x = LEN(@id_list) - LEN(REPLACE(@id_list, ',', '')) + 1; -- number of ids in id_list
WHILE @x > 0
BEGIN
SET @nextcomma = CASE WHEN CHARINDEX(',', @id_list, @firstcomma + 1) = 0
THEN LEN(@id_list) + 1
ELSE CHARINDEX(',', @id_list, @firstcomma + 1)
END;
INSERT INTO @table
VALUES ( SUBSTRING(@id_list, @firstcomma + 1, (@nextcomma - @firstcomma) - 1) );
SET @firstcomma = CHARINDEX(',', @id_list, @firstcomma + 1);
SET @x = @x - 1;
END;
SELECT *
FROM @table;
;WITH tmp(SomeID, OtherID, DataItem, Data) as (
SELECT SomeID, OtherID, LEFT(Data, CHARINDEX(',',Data+',')-1),
STUFF(Data, 1, CHARINDEX(',',Data+','), '')
FROM Testdata
WHERE Data > ''
)
SELECT SomeID, OtherID, Data
FROM tmp
ORDER BY SomeID
with only tiny little modification to above query...
By creating this function ([DelimitedSplit]) which splits a string, you could do an OUTER APPLY to your SELECT.
CREATE FUNCTION [dbo].[DelimitedSplit]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a INNER JOIN E1 b ON b.N = a.N), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a INNER JOIN E2 b ON b.N = a.N), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
TEST
CREATE TABLE #Testdata
(
SomeID INT,
OtherID INT,
String VARCHAR(MAX)
);
INSERT #Testdata SELECT 1, 9, '18,20,22';
INSERT #Testdata SELECT 2, 8, '17,19';
INSERT #Testdata SELECT 3, 7, '13,19,20';
INSERT #Testdata SELECT 4, 6, '';
INSERT #Testdata SELECT 9, 11, '1,2,3,4';
SELECT
*
FROM #Testdata
OUTER APPLY [dbo].[DelimitedSplit](String,',');
DROP TABLE #Testdata;
RESULT
SomeID OtherID String ItemNumber Item
1 9 18,20,22 1 18
1 9 18,20,22 2 20
1 9 18,20,22 3 22
2 8 17,19 1 17
2 8 17,19 2 19
3 7 13,19,20 1 13
3 7 13,19,20 2 19
3 7 13,19,20 3 20
4 6 1
9 11 1,2,3,4 1 1
9 11 1,2,3,4 2 2
9 11 1,2,3,4 3 3
9 11 1,2,3,4 4 4
Function
CREATE FUNCTION dbo.SplitToRows (@column varchar(100), @separator varchar(10))
RETURNS @rtnTable TABLE
(
ID int identity(1,1),
ColumnA varchar(max)
)
AS
BEGIN
DECLARE @position int = 0;
DECLARE @endAt int = 0;
DECLARE @tempString varchar(100);
set @column = ltrim(rtrim(@column));
WHILE @position<=len(@column)
BEGIN
set @endAt = CHARINDEX(@separator,@column,@position);
if(@endAt=0)
begin
Insert into @rtnTable(ColumnA) Select substring(@column,@position,len(@column)-@position);
break;
end;
set @tempString = substring(ltrim(rtrim(@column)),@position,@endAt-@position);
Insert into @rtnTable(ColumnA) select @tempString;
set @position=@endAt+1;
END;
return;
END;
Use case
select * from dbo.SplitToRows('T14; p226.0001; eee; 3554;', ';');
Or just a select with multiple result set
DECLARE @column varchar(max)= '1234; 4748;abcde; 324432';
DECLARE @separator varchar(10) = ';';
DECLARE @position int = 0;
DECLARE @endAt int = 0;
DECLARE @tempString varchar(100);
set @column = ltrim(rtrim(@column));
WHILE @position<=len(@column)
BEGIN
set @endAt = CHARINDEX(@separator,@column,@position);
if(@endAt=0)
begin
Select substring(@column,@position,len(@column)-@position);
break;
end;
set @tempString = substring(ltrim(rtrim(@column)),@position,@endAt-@position);
select @tempString;
set @position=@endAt+1;
END;
When using this approach you have to make sure that none of your values contains something that would be illegal XML – user1151923
I always use the XML method. Make sure you use VALID XML. I have two functions to convert between valid XML and text. (I tend to strip out the carriage returns as I don't usually need them.)
CREATE FUNCTION dbo.udf_ConvertTextToXML (@Text varchar(MAX))
RETURNS varchar(MAX)
AS
BEGIN
SET @Text = REPLACE(@Text,CHAR(10),'');
SET @Text = REPLACE(@Text,CHAR(13),'');
-- Escape & first so the entities added below are not escaped a second time.
SET @Text = REPLACE(@Text,'&','&amp;');
SET @Text = REPLACE(@Text,'<','&lt;');
SET @Text = REPLACE(@Text,'>','&gt;');
SET @Text = REPLACE(@Text,'''','&apos;');
SET @Text = REPLACE(@Text,'"','&quot;');
RETURN @Text;
END;
CREATE FUNCTION dbo.udf_ConvertTextFromXML (@Text VARCHAR(MAX))
RETURNS VARCHAR(max)
AS
BEGIN
SET @Text = REPLACE(@Text,'&lt;','<');
SET @Text = REPLACE(@Text,'&gt;','>');
SET @Text = REPLACE(@Text,'&apos;','''');
SET @Text = REPLACE(@Text,'&quot;','"');
-- Unescape & last so that text such as "&amp;lt;" round-trips to "&lt;".
SET @Text = REPLACE(@Text,'&amp;','&');
RETURN @Text;
END;
Below works on sql server 2008
select *, ROW_NUMBER() OVER(order by items) as row#
from
( select 134 myColumn1, 34 myColumn2, 'd,c,k,e,f,g,h,a' comaSeperatedColumn) myTable
cross apply
SPLIT (rtrim(comaSeperatedColumn), ',') splitedTable -- gives 'items' column
Will get all Cartesian product with the origin table columns plus "items" of split table.
You can use the following function to extract data
CREATE FUNCTION [dbo].[SplitString]
(
@RowData NVARCHAR(MAX),
@Delimeter NVARCHAR(MAX)
)
RETURNS @RtnValue TABLE
(
ID INT IDENTITY(1,1),
Data NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @Iterator INT;
SET @Iterator = 1;
DECLARE @FoundIndex INT;
SET @FoundIndex = CHARINDEX(@Delimeter,@RowData);
WHILE (@FoundIndex>0)
BEGIN
INSERT INTO @RtnValue (data)
SELECT
Data = LTRIM(RTRIM(SUBSTRING(@RowData, 1, @FoundIndex - 1)));
SET @RowData = SUBSTRING(@RowData,
@FoundIndex + DATALENGTH(@Delimeter) / 2,
LEN(@RowData));
SET @Iterator = @Iterator + 1;
SET @FoundIndex = CHARINDEX(@Delimeter, @RowData);
END;
INSERT INTO @RtnValue (Data)
SELECT Data = LTRIM(RTRIM(@RowData));
RETURN;
END;

JOIN three tables and aggregate data from multiple rows for every DISTINCT row in separate column

I have a table where one item is mapped to multiple items.
Key 1 | Key 2
1 2
1 5
1 6
1 4
1 8
I have another table like this
Key 1 | ShortKey1Desc
1 'Desc short'
I have one more table where I have data like this
Key 1 | Description
1 'Desc a'
1 'Desc c'
1 'Desc aa'
1 'Desc tt'
I need to write a SQL query for my view so that the result looks like this
Key 1 | AllKeys2ForKey1 | AllDescriptionsForKey1 | ShortKey1Desc
1 | 2;5;6;4;8 | Desc a; Desc c; Desc aa; Desc tt | Desc short
Key 1 is a string-type field, so I need to join the tables on that string key.
What I'm trying to do is create a view for convenient data access. I need a query that will not take ages. I already tried doing it with functions, but that takes ages to load.
Any help on this would be highly appreciated. Thanks a lot.
Assuming that you are unable to change the data structures to make a more efficient query, this will work:
--Populate sample data
SELECT 1 as key1, 2 as key2 INTO #tbl1
UNION ALL SELECT 1, 5
UNION ALL SELECT 1, 6
UNION ALL SELECT 1, 4
UNION ALL SELECT 1, 8
SELECT 1 as key1, 'Desc short' as shortkeydesc INTO #tbl2
SELECT 1 as key1, 'Desc a' as [description] INTO #tbl3
UNION ALL SELECT 1, 'Desc c'
UNION ALL SELECT 1, 'Desc aa'
UNION ALL SELECT 1, 'Desc tt'
--Combine data into semi-colon separated lists
SELECT
key1
,STUFF(
(
SELECT
';' + CAST(t2.key2 AS VARCHAR(10))
FROM #tbl1 t2
WHERE t2.key1 = tbl1.key1
FOR XML PATH('')
), 1, 1, ''
)
,STUFF(
(
SELECT
';' + tbl2.shortkeydesc
FROM #tbl2 tbl2
WHERE tbl2.key1 = tbl1.key1
FOR XML PATH('')
), 1, 1, ''
)
,STUFF(
(
SELECT
';' + tbl3.[description]
FROM #tbl3 tbl3
WHERE tbl3.key1 = tbl1.key1
FOR XML PATH('')
), 1, 1, ''
)
FROM #tbl1 tbl1
GROUP BY tbl1.key1
To combine rows into a single result you will need to accumulate the values in a variable. Below is sample code just to give you an idea:
Declare @AllKeys2ForKey1 varchar(50)
set @AllKeys2ForKey1 = ''
SELECT @AllKeys2ForKey1 = @AllKeys2ForKey1 + cast([Key 2] as varchar(3)) + ','
FROM [AllKeys2ForKey1Table] where [KEY 1] = 1
Declare @AllDescriptionsForKey1 varchar(100)
set @AllDescriptionsForKey1 = ''
SELECT @AllDescriptionsForKey1 = @AllDescriptionsForKey1 + [Description] + ','
FROM [AllDescriptionsForKey1Table] where [KEY 1] = 1
Declare @ShortKey1Desc varchar(100)
set @ShortKey1Desc = ''
SELECT @ShortKey1Desc = @ShortKey1Desc + [ShortKey1Desc] + ','
FROM [ShortKey1DescTable] where [KEY 1] = 1
Select [KEY 1],
substring(@AllKeys2ForKey1,1,len(@AllKeys2ForKey1) - 1) as 'AllKeys2ForKey1',
substring(@AllDescriptionsForKey1,1,len(@AllDescriptionsForKey1) - 1) as 'AllDescriptionsForKey1',
substring(@ShortKey1Desc,1,len(@ShortKey1Desc) - 1) as 'ShortKey1Desc'
from [Table] where [KEY 1]= 1
One option is to write a CLR aggregate function to solve this.
To write a CLR aggregate function:
1: Run Microsoft Visual Studio
2: Create a new project
3: Select Data Project
4: Choose CLR Aggregate Function
After creating your aggregate function, write your query as below:
Select A.Key1, OwnAggregateFn(B.Description), OwnAggregateFn(C.Key2), ...
From A
inner join B ON B.Key1 = A.Key1
inner join C ON C.Key1 = A.Key1
...
Group By A.Key1