Apply a CTE that consolidates strings to every row in an H2 table

I have a table with long strings in one column which I want to consolidate into an easier to read format ('abc;abc;abc;efg;hij;klm;klm;klm' -> 'abc: 3, efg: 1, hij: 1, klm: 3').
I have written a function that consolidates the string, but now I want to apply it to every entry in a table. Any suggestions on how this can be achieved?
This is the code that splits and consolidates the string in @str:
SET @str = 'abc;abc;abc;efg;hij;klm;klm;klm;';
WITH cte1(token, remainder) AS (
SELECT LEFT(@str, LOCATE(';', @str)-1) AS token,
RIGHT(@str, LENGTH(@str)-LOCATE(';', @str)) as remainder -- anchor member
UNION ALL
SELECT LEFT(remainder, LOCATE(';', remainder)-1) AS token,
RIGHT(remainder, LENGTH(remainder)-LOCATE(';', remainder)) as remainder -- recursive member
FROM cte1
WHERE LENGTH(remainder)>0 -- terminator
), cte2 AS (
SELECT token, count(token) AS c
FROM cte1
GROUP BY token
HAVING LENGTH(token)>0
ORDER BY token
)
SELECT GROUP_CONCAT(CONCAT_WS(': ', token, c) SEPARATOR ', ') FROM cte2
GROUP BY 1
The first cte1 breaks the string into separate tokens, the second cte2 creates a pivot and counts each instance, and the final SELECT statement consolidates the resulting table into one single string.
How would I apply this to each entry in column S1 in the following setup, e.g. by updating the table and adding the result into S2?
CREATE TABLE T1 (
ID INT, S1 VARCHAR, S2 VARCHAR);
INSERT INTO T1
VALUES (1, 'abc;abc;abc;efg;hij;klm;klm;klm;', ''),
(2, '123;123;235;235;235;987;987;123;', '');
Thank you very much for any help!

Related

SQL : extract next character from string where multiple separators exist

Azure MSSQL Database
I have a column that contains values stored per transaction. The string can contain up to 7 values, separated by a '-'.
I need to be able to extract the value that is stored after the 3rd '-'. The issue is that the length of this column (and the characters that come before the 3rd '-') can vary.
For example:
DIM VALUE
1. NHL--WA-S-MOSG-SER-
2. VDS----HAST-SER-
3. ---D---SER
Row 1 needs to return 'S'
Row 2 needs to return '-'
Row 3 needs to return 'D'
This is by no means an optimal solution, but it works in SQL Server. 😊
A temp table is added for testing purposes. Maybe it gives you a hint as to where to start.
Edit: added reference for string_split function (works from SQL Server 2016 up).
CREATE TABLE #tempStrings (
VAL VARCHAR(30)
);
INSERT INTO #tempStrings VALUES ('NHL--WA-S-MOSG-SER-');
INSERT INTO #tempStrings VALUES ('VDS----HAST-SER-');
INSERT INTO #tempStrings VALUES ('---D---SER');
INSERT INTO #tempStrings VALUES ('A-V-D-C--SER');
SELECT
t.VAL,
CASE t.PART WHEN '' THEN '-' ELSE t.PART END AS PART
FROM
(SELECT
t.VAL,
ROW_NUMBER() OVER (PARTITION BY VAL ORDER BY (SELECT NULL)) AS IX,
value AS PART
FROM #tempStrings t
CROSS APPLY string_split(VAL, '-')) t
WHERE t.IX = 4; --DASH COUNT + 1
DROP TABLE #tempStrings;
Output is...
VAL                   PART
---D---SER            D
A-V-D-C--SER          C
NHL--WA-S-MOSG-SER-   S
VDS----HAST-SER-      -
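A caveat on the query above: STRING_SPLIT does not guarantee the order of the rows it returns, so numbering them with ROW_NUMBER() ordered by (SELECT NULL) may not always match the position in the string. Below is a minimal sketch of an alternative, assuming Azure SQL Database or SQL Server 2022 or later, where STRING_SPLIT accepts a third enable_ordinal argument and returns an ordinal column:

```sql
-- Assumes Azure SQL Database or SQL Server 2022+ (enable_ordinal support).
CREATE TABLE #tempStrings (
    VAL VARCHAR(30)
);
INSERT INTO #tempStrings VALUES
    ('NHL--WA-S-MOSG-SER-'),
    ('VDS----HAST-SER-'),
    ('---D---SER'),
    ('A-V-D-C--SER');

SELECT
    t.VAL,
    -- An empty part means the 3rd dash is directly followed by another dash
    -- (or by the end of the string)
    CASE s.value WHEN '' THEN '-' ELSE s.value END AS PART
FROM #tempStrings AS t
CROSS APPLY STRING_SPLIT(t.VAL, '-', 1) AS s   -- 1 = enable_ordinal
WHERE s.ordinal = 4;                           -- dash count + 1

DROP TABLE #tempStrings;
```

Filtering on the ordinal column removes the need for ROW_NUMBER() altogether.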
If you always want the fourth element then using CHARINDEX is relatively straightforward:
DROP TABLE IF EXISTS #tmp;
CREATE TABLE #tmp (
rowId INT IDENTITY PRIMARY KEY,
xval VARCHAR(30) NOT NULL
);
INSERT INTO #tmp
VALUES
( 'NHL--WA-S-MOSG-SER-' ),
( 'VDS----HAST-SER-' ),
( '---D---SER' ),
( 'A-V-D-C--SER' );
;WITH cte AS
( -- Work out the position of the 3rd dash
SELECT
rowId,
xval,
CHARINDEX( '-', xval, CHARINDEX( '-', xval, CHARINDEX( '-', xval ) + 1 ) + 1 ) + 1 xstart
FROM #tmp t
), cte2 AS
( -- Work out the length for the substring function
SELECT rowId, xval, xstart, CHARINDEX( '-', xval, xstart) - (xstart) AS xlen
FROM cte
)
SELECT rowId, ISNULL( NULLIF( SUBSTRING( xval, xstart, xlen ), '' ), '-' ) xpart
FROM cte2
I also did a volume test at 1 million rows, and this was by far the fastest method compared with STRING_SPLIT, OPENJSON and a recursive CTE (the worst at high volume). The downside is that this method is less extensible, for example if you want the second or fifth item instead.
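On the extensibility point, OPENJSON (one of the alternatives in the test) can return an element at any position. Here is a rough sketch, assuming SQL Server 2016 or later; the variable @n is a name chosen for illustration and holds the 1-based position wanted. Each string is turned into a JSON array, and the [key] column that OPENJSON returns for arrays is the zero-based index:

```sql
DECLARE @n INT = 4;  -- hypothetical: 1-based position of the element to return

SELECT
    t.xval,
    ISNULL(NULLIF(j.value, ''), '-') AS xpart
FROM (VALUES
        ('NHL--WA-S-MOSG-SER-'),
        ('VDS----HAST-SER-'),
        ('---D---SER'),
        ('A-V-D-C--SER')
     ) AS t(xval)
CROSS APPLY OPENJSON(
        -- 'a-b-c' becomes '["a","b","c"]'; STRING_ESCAPE guards against quotes and backslashes
        '["' + REPLACE(STRING_ESCAPE(t.xval, 'json'), '-', '","') + '"]'
     ) AS j
WHERE j.[key] = @n - 1;  -- [key] is the zero-based array index
```

Changing @n is all it takes to pick a different element. Going by the volume test above, expect this to be slower than the CHARINDEX version.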

How to split a string in a SQL column and store it in other columns based on string length

I have declared a temporary table inside a stored procedure with four address columns, AddressLine1, AddressLine2, AddressLine3 and AddressLine4, each of which is VARCHAR(50).
I want to insert data into the temp table from an existing table, which stores the whole address in AddressLine1.
If the address is longer than 50 characters (including spaces), I want the remainder to go into AddressLine2, and so on.
In short, I want to divide the address based on the column length of 50 and store the pieces across AddressLine1, AddressLine2, AddressLine3 and AddressLine4 in the temporary table.
select DATALENGTH(ADDRESSLINE1)
from PASSENGER
where DATALENGTH(ADDRESSLINE1) > 50
You can achieve this with a recursive CTE and SUBSTRING. The following quick example can surely be optimized or shortened, but it should make the point clear:
DECLARE @x NVARCHAR(200) = N'This is some test and this still is the test and yet here the test continues and so on until the test is finished';
WITH cte1 AS(
-- evaluate all positions of spaces
SELECT @x as txt, CHARINDEX(' ', @x) as idx
UNION ALL
SELECT txt, CHARINDEX(' ', txt, idx+1) as idx
FROM cte1
WHERE CHARINDEX(' ', txt, idx+1) >0
),
cte2 AS(
-- assign each space position to a group, based on the desired output length of 50
SELECT *, idx/50 - CASE WHEN idx%50 = 0 THEN 1 ELSE 0 END AS dividx
FROM cte1
),
cte3 AS(
-- evaluate max space position per group
SELECT txt, dividx, max(idx) maxIdx
FROM cte2
GROUP BY txt, dividx
),
cte4 AS(
-- evaluate required start and end position for substring operation
SELECT txt, dividx
,ISNULL(LAG(maxIdx) OVER (PARTITION BY txt ORDER BY dividx)+1, 1) AS minIdx
,CASE WHEN LEAD(maxIdx) OVER (PARTITION BY txt ORDER BY dividx) IS NULL THEN len(txt) ELSE maxIdx END AS maxIdx
FROM cte3
)
-- perform substring
SELECT SUBSTRING(txt, minIdx, maxIdx-minIdx+1) AS txt
FROM cte4
OPTION (MAXRECURSION 0)
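If breaking a line in the middle of a word is acceptable, the split can also be done without recursion by taking fixed 50-character slices. This is a minimal sketch, assuming the source table PASSENGER keeps the full address in ADDRESSLINE1 as in the question; the temp table name #PassengerAddress is made up for illustration:

```sql
CREATE TABLE #PassengerAddress (       -- hypothetical temp table
    AddressLine1 VARCHAR(50),
    AddressLine2 VARCHAR(50),
    AddressLine3 VARCHAR(50),
    AddressLine4 VARCHAR(50)
);

INSERT INTO #PassengerAddress (AddressLine1, AddressLine2, AddressLine3, AddressLine4)
SELECT
    SUBSTRING(p.ADDRESSLINE1,   1, 50),
    -- SUBSTRING returns '' past the end of the string; NULLIF turns that into NULL
    NULLIF(SUBSTRING(p.ADDRESSLINE1,  51, 50), ''),
    NULLIF(SUBSTRING(p.ADDRESSLINE1, 101, 50), ''),
    NULLIF(SUBSTRING(p.ADDRESSLINE1, 151, 50), '')
FROM PASSENGER AS p;
```

Anything beyond 200 characters is dropped in this sketch. The recursive version above is the one to use when lines must break on spaces.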

Query everything that comes after the '@'

I am setting up a new query but unfortunately I got stuck on some SQL functions. I have some records with specific emails. All I want is to return everything that comes after the '@'.
For example:
cesarcastillo88@hotmail.com ==> as a result I should get the following: hotmail.com.
This was not complicated at all, because the record contains only one email.
But what if the record includes the following emails:
cesarcastillo88@hotmail.com ; laura23@gmail.com ; test@compliance.com
It works perfectly for cases with only 1 email in a single record.
I used the following formula:
substring(columnName, charindex('@', sfe.columnName), len(sfe.columnName))
However, how am I supposed to do it with 3 emails in a single record?
My desired outcome is the following:
hotmail.com ; gmail.com ; compliance.com
Here is a possible solution based on the assumption that you have some sort of ID column that could help to identify each unique row:
;with smpl as (
select *
from (values
(1, 'cesarcastillo88@hotmail.com ; laura23@gmail.com ; test@compliance.com'),
(2, 'abc@cde.net'),
(3, 'laura23@gmail.com ; test@compliance.com')) x(id, email)
), split(id, A, B) as (
select distinct id, CAST(LEFT(email, CHARINDEX(';',email+';')-1) as varchar(100)),
CAST(STUFF(email, 1, CHARINDEX(';',email+';'), '') as varchar(100))
from smpl
union all
select id, CAST(LEFT(B, CHARINDEX(';',B+';')-1) as varchar(100)),
CAST(STUFF(B, 1, CHARINDEX(';',B+';'), '') as varchar(100))
from split
where B > ''
), clr as (
select ID, substring(LTRIM(RTRIM(A)), charindex('@', LTRIM(RTRIM(A))) + 1, len(LTRIM(RTRIM(A)))) cleanEmail
--into #tempTbl
from split
), ccat as (
SELECT DISTINCT ST2.ID,
SUBSTRING(
(
SELECT ';'+ST1.cleanEmail AS [text()]
FROM clr ST1
WHERE ST1.ID = ST2.ID
ORDER BY ST1.ID
FOR XML PATH ('')
), 2, 1000) Emails
FROM clr ST2
)
select * from ccat
And here is some explanation of how this all works:
The first CTE expression (split) splits the emails into separate rows, using ; as the separator.
The second CTE (clr) is based on your formula; it removes the recipient from each email address and leaves only the domain.
The last one (ccat) concatenates everything back, using the same ; as the separator. Feel free to add extra spaces around it if that's your preferred output.
You don't say what version of SQL Server, but I'll assume 2016 or newer. The key is the STRING_SPLIT function. To join it to your data, you'll want to use CROSS APPLY.
create table #a (
id int identity(1,1),
email varchar(max)
)
insert #a
values ('cesarcastillo88@hotmail.com ; laura23@gmail.com ; test@compliance.com')
, ('dannyboy@irish.com')
select id
, email
, substring(email, CHARINDEX('@', email) + 1, len(email)) as domain
from #a
select a.id
, substring(ltrim(rtrim(b.value)), CHARINDEX('@', ltrim(rtrim(b.value))) + 1, len(ltrim(rtrim(b.value)))) as domain
from #a a
cross apply string_split(email, ';') b
drop table #a
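The split above returns one row per address. To get back a single delimited value per record, as in the desired outcome, the parts can be re-joined. This is a sketch assuming SQL Server 2017 or later, where STRING_AGG is available. STRING_SPLIT does not guarantee row order, so the domains may not come out in their original order:

```sql
CREATE TABLE #a (
    id    INT IDENTITY(1,1),
    email VARCHAR(MAX)
);

INSERT #a
VALUES ('cesarcastillo88@hotmail.com ; laura23@gmail.com ; test@compliance.com'),
       ('dannyboy@irish.com');

SELECT a.id,
       -- keep only the part after '@' of each trimmed address, then re-join with ' ; '
       STRING_AGG(
           SUBSTRING(LTRIM(RTRIM(b.value)),
                     CHARINDEX('@', LTRIM(RTRIM(b.value))) + 1,
                     LEN(b.value)),
           ' ; ') AS domains
FROM #a AS a
CROSS APPLY STRING_SPLIT(a.email, ';') AS b
GROUP BY a.id;

DROP TABLE #a;
```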

Select rows using in with comma-separated string parameter

I'm converting a stored procedure from MySql to SQL Server. The procedure has one input parameter nvarchar/varchar which is a comma-separated string, e.g.
'1,2,5,456,454,343,3464'
I need to write a query that will retrieve the relevant rows. In MySQL I'm using FIND_IN_SET, and I wonder what the equivalent is in SQL Server.
I also need to order the ids as in the string.
The original query is:
SELECT *
FROM table_name t
WHERE FIND_IN_SET(id,p_ids)
ORDER BY FIND_IN_SET(id,p_ids);
The equivalent is LIKE for the WHERE clause and then CHARINDEX() for the ORDER BY:
select *
from table_name t
where ','+p_ids+',' like '%,'+cast(id as varchar(255))+',%'
order by charindex(',' + cast(id as varchar(255)) + ',', ',' + p_ids + ',');
Well, you could use charindex() for both, but the like will work in most databases.
Note that I've added delimiters to the beginning and end of the string, so 464 will not accidentally match 3464.
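To see why the extra delimiters matter, here is a small check that needs no table. It uses the list from the question and looks for 464, which is not in it (the variable name @p_ids is just for this illustration):

```sql
DECLARE @p_ids VARCHAR(100) = '1,2,5,456,454,343,3464';

SELECT
    -- Without delimiters, '464' is found inside '3464': a false positive
    CASE WHEN @p_ids LIKE '%464%' THEN 'match' ELSE 'no match' END AS naive_like,
    -- With delimiters, only a whole element ',464,' can match
    CASE WHEN ',' + @p_ids + ',' LIKE '%,464,%' THEN 'match' ELSE 'no match' END AS delimited_like;
```

The first column comes back as 'match' and the second as 'no match'.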
You would need to write a FIND_IN_SET function, as it does not exist in SQL Server. The closest mechanism I can think of to convert a delimited string into a joinable object would be to create a table-valued function and use the result in a standard IN clause. It would need to be similar to:
DECLARE @MyParam NVARCHAR(3000)
SET @MyParam='1,2,5,456,454,343,3464'
SELECT
*
FROM
MyTable
WHERE
MyTableID IN (SELECT ID FROM dbo.MySplitDelimitedString(@MyParam,','))
And you would need to create a MySplitDelimitedString type table-valued function that would split a string and return a TABLE (ID INT) object.
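One way such a function could look is sketched below. It assumes SQL Server 2016 or later (database compatibility level 130 or higher) so that the built-in STRING_SPLIT can do the work, and a single-character delimiter. The function body is illustrative and not taken from the answer. The ORDER BY reuses the CHARINDEX idea to keep the ids in the order they appear in the string:

```sql
-- Sketch only: assumes SQL Server 2016+ (STRING_SPLIT) and a single-character delimiter.
CREATE FUNCTION dbo.MySplitDelimitedString
(
    @List      NVARCHAR(MAX),
    @Delimiter NCHAR(1)
)
RETURNS TABLE
AS
RETURN
(
    SELECT TRY_CAST(s.value AS INT) AS ID   -- NULL for elements that are not integers
    FROM STRING_SPLIT(@List, @Delimiter) AS s
    WHERE LTRIM(RTRIM(s.value)) <> ''       -- skip empty elements such as a trailing comma
);
GO

DECLARE @MyParam NVARCHAR(3000) = N'1,2,5,456,454,343,3464';

SELECT t.*
FROM MyTable AS t
WHERE t.MyTableID IN (SELECT ID FROM dbo.MySplitDelimitedString(@MyParam, N','))
-- keep the order of the ids as they appear in the string
ORDER BY CHARINDEX(',' + CAST(t.MyTableID AS VARCHAR(20)) + ',', ',' + @MyParam + ',');
```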
A set-based solution that splits the ids into ints and joins them with the base table, which lets the query use the index on the base table's id. I assumed the id is an int; otherwise just remove the cast.
declare @ids nvarchar(100) = N'1,2,5,456,454,343,3464';
with nums as ( -- Generate numbers
select top (len(@ids)) row_number() over (order by (select 0)) n
from sys.messages
)
, pos1 as ( -- Get comma positions
select c.ci
from nums n
cross apply (select charindex(',', @ids, n.n) as ci) c
group by c.ci
)
, pos2 as ( -- Distinct positions plus start and end
select ci
from pos1
union select 0
union select len(@ids) + 1
)
, pos3 as ( -- add row number for join
select ci, row_number() over (order by ci) as r
from pos2
)
, ids as ( -- ids and row id for ordering
select cast(substring(@ids, p1.ci + 1, p2.ci - p1.ci - 1) as int) id, row_number() over (order by p1.ci) r
from pos3 p1
inner join pos3 p2 on p2.r = p1.r + 1
)
select *
from ids i
inner join table_name t on t.id = i.id
order by i.r;
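If you are on Oracle, you can also use a regular expression to extract the values from the comma-separated string (REGEXP_SUBSTR, CONNECT BY and DUAL are Oracle syntax and will not work in SQL Server):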
select * from table_name where id in (
select regexp_substr(p_ids,'[^,]+', 1, level) from dual
connect by regexp_substr(p_ids, '[^,]+', 1, level) is not null );

How to split concatenated field in SQL with SSIS or SQL

My problem is that I have to split a concatenated field into different rows.
The delimiter is a "+" marker.
So in my field I have 3%+2%+1% and what I want is row 1 -> 3%, row 2 -> 2% and so on.
But there is one more big problem: I don't know how many concatenated values there are, so it could be 3, 5 or maybe 10 values.
Can somebody help me solve this issue with SSIS or SQL?
For me @sdrzymala is correct here. I would normalise this data first before loading it to a database. If the client or report needed the data pivoted or denormalised again I would do this in client code.
1) First I would save the following split function and staging table "PercentsNormalised" into the database. I got the split function from this question here.
-- DDL Code:
Create FUNCTION dbo.SplitStrings
(
@List NVARCHAR(MAX),
@Delimiter NVARCHAR(255)
)
RETURNS TABLE
AS
RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number,
CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects) AS n(Number)
WHERE Number <= CONVERT(INT, LEN(@List))
AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
) AS y);
GO
Create Table PercentsNormalised(
RowIndex Int,
-- Other fields here,
PercentValue Varchar(100)
)
GO
2) Either by writing some SQL (like below) or by using the same logic in an SSIS data flow task, transform the data and insert it into the "PercentsNormalised" table created above.
With TestData As (
-- Replace with "real table" containing concatenated rows
Select '3%+2%+1%' As Percents Union All
Select '5%+1%+1%+0%' Union All
Select '10%+8%' Union All
Select '10%+5%+1%+1%+0%'
),
TestDataWithRowIndex As (
-- You might want to order the rows by another field
-- in the "real table"
Select Row_Number() Over (Order By Percents) As RowIndex,
Percents
From TestData
)
-- You could remove this insert and select and have the logic in a
-- SSIS Dataflow task
Insert PercentsNormalised
Select td.RowIndex,
ss.Item As PercentValue
From TestDataWithRowIndex As td
Cross Apply dbo.SplitStrings(td.Percents, '+') ss;
3) Write client code against the "PercentsNormalised" table using, say, the SQL PIVOT operator.
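On SQL Server 2016 or later (database compatibility level 130 and up), the built-in STRING_SPLIT could stand in for the custom split function in step 2, at the cost of losing the Number column. Here is a minimal sketch using some of the same test values; it returns one row per value however many values a row contains:

```sql
WITH TestData AS (
    SELECT '3%+2%+1%' AS Percents UNION ALL
    SELECT '5%+1%+1%+0%' UNION ALL
    SELECT '10%+8%'
),
TestDataWithRowIndex AS (
    SELECT ROW_NUMBER() OVER (ORDER BY Percents) AS RowIndex,
           Percents
    FROM TestData
)
SELECT td.RowIndex,
       ss.value AS PercentValue   -- STRING_SPLIT exposes each part in a column named value
FROM TestDataWithRowIndex AS td
CROSS APPLY STRING_SPLIT(td.Percents, '+') AS ss;
```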