String_Split on delimiter '.' SQL Server - sql

I'm stuck on parsing a particular field of data and hope to get some insight on how to solve it.
I have a field, [ItemCategory], that contains instances like...
Instance: TennisShoes.Laces
Instance: HikingBoot-Dr.Marten.Laces
(I cannot change the delimiter from '.' to '|' as I don't control the source)
The code used to separate the instances is as follows:
SELECT
[Program] = LTRIM(RTRIM(LEFT(c.[ItemCategory], CHARINDEX('.',c.[ItemCategory] + '.') - 1)))
,[Category] = LTRIM(RTRIM(RIGHT(c.[ItemCategory],LEN(c.[ItemCategory]) - CHARINDEX('.',c.[ItemCategory]))))
My issue is that when the HikingBoot-Dr.Marten.Laces instance passes through this code, it becomes:
[Program] = HikingBoot-Dr
[Category] = Marten.Laces
How can I make it ignore the first '.' and split on the second '.', while still handling the first instance correctly?
Thank you for your time. Any advice is helpful.

Give this one a try for grabbing the end.
RIGHT(c.[ItemCategory], CHARINDEX(REVERSE('.'), REVERSE(c.[ItemCategory])) -1)

I would suggest revisiting how you store this data, if you can; the format is flawed and will keep causing problems.
That aside, this solution assumes "Category" never contains a period and the data always ends with .category
With a few tweaks to what you started: REVERSE() locates the last period, which gives the length of "Category" for RIGHT(). For "Program", we subtract that from the total length and use LEFT().
DECLARE @testdata TABLE
(
[sampledata] NVARCHAR(100)
);
INSERT INTO @testdata (
[sampledata]
)
VALUES ( N'TennisShoes.Laces' )
, ( N'HikingBoot-Dr.Marten.Laces' );
SELECT LEFT([sampledata], LEN([sampledata]) - CHARINDEX('.', REVERSE([sampledata]))) AS [Program]
,RIGHT([sampledata], CHARINDEX('.', REVERSE([sampledata])) -1) AS [Category]
FROM @testdata;
You can also use SUBSTRING() along with REVERSE().
For Category, reverse the data, find the first period, take the
text before it and reverse it back.
For Program, reverse the data, take everything from 1 past the first period to the end
and reverse it back.
DECLARE @testdata TABLE
(
[sampledata] NVARCHAR(100)
);
INSERT INTO @testdata (
[sampledata]
)
VALUES ( N'TennisShoes.Laces' )
, ( N'HikingBoot-Dr.Marten.Laces' );
SELECT REVERSE(SUBSTRING(REVERSE([sampledata]), CHARINDEX('.', REVERSE([sampledata])) + 1, LEN([sampledata]))) AS [Program]
, REVERSE(SUBSTRING(REVERSE([sampledata]), 1, CHARINDEX('.', REVERSE([sampledata])) - 1)) AS [Category]
FROM @testdata;
Both give these results:
Program              Category
-------------------- --------
TennisShoes          Laces
HikingBoot-Dr.Marten Laces
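To check the expected rows outside the database, the position arithmetic of these queries can be mirrored in a few lines of Python. This is only a sketch: the name split_on_last_dot is made up for illustration, and returning None for the category when there is no period is an assumption of the sketch (in T-SQL, RIGHT() with a negative length raises an error instead).

```python
def split_on_last_dot(value):
    """Split value at its last '.', mirroring the REVERSE/CHARINDEX arithmetic."""
    # 1-based position of the first '.' in the reversed string, 0 when absent,
    # like CHARINDEX('.', REVERSE(value)).
    pos = value[::-1].find(".") + 1
    if pos == 0:
        # Assumption of this sketch: no period means no category.
        return value, None
    program = value[: len(value) - pos]          # LEFT(value, LEN(value) - pos)
    category = value[len(value) - (pos - 1):]    # RIGHT(value, pos - 1)
    return program, category


for sample in ("TennisShoes.Laces", "HikingBoot-Dr.Marten.Laces"):
    print(split_on_last_dot(sample))
```

Because only the last period matters, any number of earlier periods ends up in the program part.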

If you only need the last part after the '.', you can reverse the string, find the position with charindex, and apply left and right with that position:
with s as (
select 'TennisShoes.Laces' as inst union
select 'HikingBoot-Dr.Marten.Laces' union
select 'Test'
)
, pos as (
select
s.*,
charindex('.', reverse(inst)) as pos
from s
)
select
ltrim(rtrim(left(inst, len(inst) - pos))) as program,
ltrim(rtrim(right(inst, nullif(pos - 1, -1)))) as category
from pos
program | category
:------------------- | :-------
HikingBoot-Dr.Marten | Laces
TennisShoes | Laces
Test | null

Related

SQL : extract next character from string where multiple separators exist

Azure MSSQL Database
I have a column that contains values stored per transaction. The string can contain up to 7 values, separated by a '-'.
I need to be able to extract the value that is stored after the 3rd '-'. The issue is that the length of this column (and the characters that come before the 3rd '-') can vary.
For example:
DIM VALUE
1. NHL--WA-S-MOSG-SER-
2. VDS----HAST-SER-
3. ---D---SER
Row 1 needs to return 'S'
Row 2 needs to return '-'
Row 3 needs to return 'D'
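This is by no means an optimal solution, but it works in SQL Server. 😊
A temp table is added for testing purposes. Maybe it gives you a hint about where to start.
Edit: added reference for the string_split function (works from SQL Server 2016 up). Note that string_split does not guarantee the order of its output rows, so numbering them with ROW_NUMBER() over an arbitrary order is not guaranteed to match their position in the string.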
CREATE TABLE #tempStrings (
VAL VARCHAR(30)
);
INSERT INTO #tempStrings VALUES ('NHL--WA-S-MOSG-SER-');
INSERT INTO #tempStrings VALUES ('VDS----HAST-SER-');
INSERT INTO #tempStrings VALUES ('---D---SER');
INSERT INTO #tempStrings VALUES ('A-V-D-C--SER');
SELECT
t.VAL,
CASE t.PART WHEN '' THEN '-' ELSE t.PART END AS PART
FROM
(SELECT
t.VAL,
ROW_NUMBER() OVER (PARTITION BY VAL ORDER BY (SELECT NULL)) AS IX,
value AS PART
FROM #tempStrings t
CROSS APPLY string_split(VAL, '-')) t
WHERE t.IX = 4; --DASH COUNT + 1
DROP TABLE #tempStrings;
Output is...
VAL                 PART
---D---SER          D
A-V-D-C--SER        C
NHL--WA-S-MOSG-SER- S
VDS----HAST-SER-    -
If you always want the fourth element then using CHARINDEX is relatively straightforward:
DROP TABLE IF EXISTS #tmp;
CREATE TABLE #tmp (
rowId INT IDENTITY PRIMARY KEY,
xval VARCHAR(30) NOT NULL
);
INSERT INTO #tmp
VALUES
( 'NHL--WA-S-MOSG-SER-' ),
( 'VDS----HAST-SER-' ),
( '---D---SER' ),
( 'A-V-D-C--SER' );
;WITH cte AS
( -- Work out the position of the 3rd dash
SELECT
rowId,
xval,
CHARINDEX( '-', xval, CHARINDEX( '-', xval, CHARINDEX( '-', xval ) + 1 ) + 1 ) + 1 xstart
FROM #tmp t
), cte2 AS
( -- Work out the length for the substring function
SELECT rowId, xval, xstart, CHARINDEX( '-', xval, xstart) - (xstart) AS xlen
FROM cte
)
SELECT rowId, ISNULL( NULLIF( SUBSTRING( xval, xstart, xlen ), '' ), '-' ) xpart
FROM cte2
I also ran a volume test at 1 million rows, and this was by far the fastest method compared with STRING_SPLIT, OPENJSON and a recursive CTE (the worst at high volume). The downside is that this method is less extensible, for example if you want the second or fifth item instead.
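The nested CHARINDEX idea (skip a fixed number of separators, then read up to the next one) can be sketched in Python with the element number as a parameter, which is the extensibility the hard-coded nesting lacks. The function name nth_element is hypothetical. Two behaviours are assumptions of this sketch rather than of the T-SQL above: it returns None when the string has too few separators, and it returns the rest of the string when no separator follows the element.

```python
def nth_element(value, n, sep="-"):
    """Return the n-th (1-based) element of value, or sep if that element is empty."""
    start = 0
    # Skip past the first n - 1 separators, like the nested CHARINDEX calls.
    for _ in range(n - 1):
        idx = value.find(sep, start)
        if idx == -1:
            return None  # assumption: not enough separators
        start = idx + 1
    end = value.find(sep, start)
    part = value[start:end] if end != -1 else value[start:]
    # Same substitution as ISNULL(NULLIF(part, ''), '-') in the query.
    return part or sep


for row in ("NHL--WA-S-MOSG-SER-", "VDS----HAST-SER-", "---D---SER", "A-V-D-C--SER"):
    print(row, nth_element(row, 4))
```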

How to SELECT string between second and third instance of ",,"?

I am trying to get string between second and third instance of ",," using SQL SELECT.
Apparently the substring and charindex functions are useful, and I have tried them, but the problem is that I need the string between those specific ",,"s and the length of the strings between them can change.
I can't find a working example anywhere.
Here is an example:
Table: test
Column: Column1
Row1: cat1,,cat2,,cat3,,cat4,,cat5
Row2: dogger1,,dogger2,,dogger3,,dogger4,,dogger5
Result: cat3dogger3
Here is my closest attempt, it works if the strings are same length every time, but they aren't:
SELECT SUBSTRING(column1,LEN(LEFT(column1,CHARINDEX(',,', column1,12)+2)),LEN(column1) - LEN(LEFT(column1,CHARINDEX(',,', column1,20)+2)) - LEN(RIGHT(column1,CHARINDEX(',,', (REVERSE(column1)))))) AS column1
FROM testi
Just apply substring three times, each time moving on to the next ",,", e.g.
select
-- Substring till the third ',,'
substring(z.col1, 1, patindex('%,,%',z.col1)-1)
from (values ('cat1,,cat2,,cat3,,cat4,,cat5'),('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')) x (col1)
-- Substring from the first ',,'
cross apply (values (substring(x.col1,patindex('%,,%',x.col1)+2,len(x.col1)))) y (col1)
-- Substring from the second ',,'
cross apply (values (substring(y.col1,patindex('%,,%',y.col1)+2,len(y.col1)))) z (col1);
And just to reiterate, this is a terrible way to store data, so the best solution is to store it properly.
Here is an alternative solution using charindex. The basic idea is the same as in Dale K's answer, but instead of cutting the string, we specify the start_location for the search using the optional third parameter of charindex. This way, we get the location of each separator and can slice each value out of the main string.
declare @vtest table (column1 varchar(200))
insert into @vtest ( column1 ) values('dogger1,,dogger2,,dogger3,,dogger4,,dogger5')
insert into @vtest ( column1 ) values('cat1,,cat2,,cat3,,cat4,,cat5')
declare @separator char(2) = ',,'
select
t.column1
, FI.FirstInstance
, SI.SecondInstance
, TI.ThirdInstance
, iif(TI.ThirdInstance is not null, substring(t.column1, SI.SecondInstance + 2, TI.ThirdInstance - SI.SecondInstance - 2), null)
from
@vtest t
cross apply (select nullif(charindex(@separator, t.column1), 0) FirstInstance) FI
cross apply (select nullif(charindex(@separator, t.column1, FI.FirstInstance + 2), 0) SecondInstance) SI
cross apply (select nullif(charindex(@separator, t.column1, SI.SecondInstance + 2), 0) ThirdInstance) TI
For transparency, I saved the separator string in a variable.
By default, charindex returns 0 if the search string is not present, so I replace that with null by using nullif.
IMHO, SQL Server 2016 and its JSON support is the best option here.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tokens VARCHAR(500));
INSERT INTO @tbl VALUES
('cat1,,cat2,,cat3,,cat4,,cat5'),
('dogger1,,dogger2,,dogger3,,dogger4,,dogger5');
-- DDL and sample data population, end
WITH rs AS
(
SELECT *
, '["' + REPLACE(Tokens
, ',,', '","')
+ '"]' AS jsondata
FROM @tbl
)
SELECT rs.ID, rs.Tokens
, JSON_VALUE(jsondata, '$[2]') AS ThirdToken
FROM rs;
Output
+----+---------------------------------------------+------------+
| ID | Tokens | ThirdToken |
+----+---------------------------------------------+------------+
| 1 | cat1,,cat2,,cat3,,cat4,,cat5 | cat3 |
| 2 | dogger1,,dogger2,,dogger3,,dogger4,,dogger5 | dogger3 |
+----+---------------------------------------------+------------+
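The string-to-JSON-array construction is easy to try outside the database. The Python sketch below builds the same array text and reads the element at index 2; the function name third_token is made up for illustration, and returning None for a missing element is an assumption chosen to resemble a NULL result.

```python
import json


def third_token(tokens, sep=",,"):
    """Turn 'a,,b,,c' into the JSON text '["a","b","c"]' and return element 2."""
    # Same construction as '["' + REPLACE(Tokens, ',,', '","') + '"]'
    json_text = '["' + tokens.replace(sep, '","') + '"]'
    items = json.loads(json_text)
    # The path '$[2]' is zero-based, so index 2 is the third token.
    return items[2] if len(items) > 2 else None


print(third_token("cat1,,cat2,,cat3,,cat4,,cat5"))
print(third_token("dogger1,,dogger2,,dogger3,,dogger4,,dogger5"))
```

Because the array is built by plain concatenation, a token containing a double quote or a backslash makes the text invalid JSON in this sketch, so data like that would need escaping before it is wrapped.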
It's the same as @Yitzhak Khabinsky's answer, but I think it looks clearer:
WITH CTE_Data
AS(
SELECT 'cat1,,cat2,,cat3,,cat4,,cat5' AS [String]
UNION
SELECT 'dogger1,,dogger2,,dogger3,,dogger4,,dogger5' AS [String]
)
SELECT
A.[String]
,Value3 = JSON_VALUE('["'+ REPLACE(A.[String], ',,', '","') + '"]', '$[2]')
FROM CTE_Data AS A

Query SQL with similar values

I have to query a database using a string like 12345678 for the comparison, but the stored value looks like 12.345.678. The following query does not return anything:
SELECT * FROM TABLA WHERE CAMPO = '12345678'
CAMPO would hold the value 12.345.678. If I replace = with LIKE, it does not return the data either:
SELECT * FROM TABLA WHERE CAMPO like '12345678%'
SELECT * FROM TABLA WHERE CAMPO like '%12345678'
SELECT * FROM TABLA WHERE CAMPO like '%12345678%'
None of the 3 previous queries works for me. How can I write this query?
The value can have 7, 8 or 9 digits, and the '.' separates every group of 3 digits, counting from the end to the beginning.
Use the REPLACE() function to remove all the dots '.', as in:
SELECT *
FROM(
VALUES ('12.345.678'),
('23.456.789')
) T(CAMPO)
WHERE REPLACE(CAMPO, '.', '') = '12345678';
Your query should be
SELECT * FROM TABLA WHERE REPLACE(CAMPO, '.', '') = '12345678';
You can compare the string without the dots to REPLACE(StringWithDots, '.', '').
I recommend converting the value to a numeric type,
so you can use the < and > operators and all functions that require a number.
The best way to achieve this is to remove the unnecessary dots and convert the commas to dots, like this:
CONVERT(NUMERIC(10, 2),
REPLACE(
REPLACE('7.000,45', '.', ''),
',', '.'
)
)
I hope this will help you out.
A SARGABLE solution would be to write a function that takes your target value ('12345678') and inserts the separators ('.') every third character from right to left. The result ('12.345.678') can then be used in a where clause and benefit from an index on CAMPO.
The following code demonstrates an approach without creating a user-defined function (UDF). Instead, a recursive common table expression (CTE) is used to process the input string three characters at a time to build the dotted target string. The result is used in a query against a sample table.
To see the results from the recursive CTE replace the final select statement with the commented select immediately above it.
-- Sample data.
declare @Samples as Table ( SampleId Int Identity, DottedDigits VarChar(20) );
insert into @Samples ( DottedDigits ) values
( '1' ), ( '12' ), ( '123' ), ( '1.234' ), ( '12.345' ),
( '123.456' ), ( '1.234.567' ), ( '12.345.678' ), ( '123.456.789' );
select * from @Samples;
-- Query the data.
declare @Target as VarChar(15) = '12345678';
with
Target as (
-- Get the first group of up to three characters from the tail of the string ...
select
Cast( Right( @Target, 3 ) as VarChar(20) ) as TargetString,
Cast( Left( @Target, case when Len( @Target ) > 3 then Len( @Target ) - 3 else 0 end ) as VarChar(20) ) as Remainder
union all
-- ... and concatenate the next group with a dot in between.
select
Cast( Right( Remainder, 3 ) + '.' + TargetString as VarChar(20) ),
Cast( Left( Remainder, case when Len( Remainder ) > 3 then Len( Remainder ) - 3 else 0 end ) as VarChar(20) )
from Target
where Remainder != ''
)
-- To see the intermediate results replace the final select with the line commented out below:
--select TargetString from Target;
select SampleId, DottedDigits
from @Samples
where DottedDigits = ( select TargetString from Target where Remainder = '' );
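The recursive step (peel three characters off the tail, prepend them with a dot, repeat until nothing remains) can be traced in a small Python loop. This is a sketch of the same idea, not a translation of the query; the name insert_dots is hypothetical.

```python
def insert_dots(digits):
    """Insert '.' every three characters, counting from the right."""
    # Anchor step: the last group of up to three characters.
    target, remainder = digits[-3:], digits[:-3]
    # Recursive step of the CTE, written as a loop.
    while remainder:
        target = remainder[-3:] + "." + target
        remainder = remainder[:-3]
    return target


print(insert_dots("12345678"))
```

The dotted result is computed once from the search value, so the comparison against the column stays a plain equality that an index can serve.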
An alternative approach would be to add an indexed computed column to the table that contains Replace( CAMPO, '.', '' ).
If the table containing IDs like 12.345.678 is big (contains many records), I would add a computed column that removes the dots, persist it, and put an index on it. If the ID never contains any characters other than digits and dots and has no significant leading zeros, the column can also be cast to INT or BIGINT. That way you lose a little time when inserting a record, but queries run at maximum speed, which saves processor power.

SQL Server REPLACE AND CHECK IF EXISTS

I have to check the string with the following scenarios in WHERE condition.
The data ProductId stored in the database can be like
7314-3337 sometimes with - symbol and not prefixed with 19
73143337 sometimes without symbol and not prefixed with 19
1973143337 correct format
197314-3337 sometimes with - symbol
I need to filter records by ProductId, and the input is in the correct format, i.e. 1973143337:
WHERE P.ProductId=@ProductId
How can I filter when the data is stored in the other 3 formats?
How do I remove the '-' and add the 19 prefix if it is missing, in SQL Server?
Please check these 2 approaches.
The first is very simple and the second uses a small trick. (I think you should go with the second option, which covers everything.)
declare @t table (ProductId varchar(100))
insert into @t
values
('7314-3337')
,('73143337')
,('1973143337')
,('197314-3337')
,('73683337')
,('73143338')
declare @valuetosearch varchar(100) = '1973143337'
--this is very simple, but does not work in every scenario. the second approach is fine.
--select CHARINDEX ( '19','1973143337'), SUBSTRING('1973143337',3,len('1973143337'))
--select * from
--@t
--where
--replace(REPLACE(ProductId ,'-','') ,'19','') = replace(REPLACE(@valuetosearch ,'-','') ,'19','')
select * from
@t
where
REPLACE( case when CHARINDEX ( '19',ProductId) = 1
then SUBSTRING( ProductId ,3,LEN(ProductId))
else ProductId
end ,'-','')
=
REPLACE ( case when CHARINDEX ( '19',@valuetosearch) = 1
then SUBSTRING( @valuetosearch ,3,LEN(@valuetosearch))
else @valuetosearch
end ,'-','')
You should first sanitize your data; if it is not consistent, you won't be able to get correct results.
For prefixing with 19:
UPDATE foo
SET ProductId = '19' + ProductId
WHERE Left(ProductID, 2) <> '19'
For removing the '-':
UPDATE foo
SET ProductId = REPLACE(ProductId, '-', '')
Then you should be able to get the results you want.
UPDATE:
You could construct a CTE with the results in a single format, and then, filter that CTE:
WITH cte (
FormattedPID
,ProductId
)
AS (
SELECT CASE
WHEN LEFT(ProductId, 2) = '19'
THEN REPLACE(ProductId, '-', '')
ELSE '19' + REPLACE(ProductId, '-', '')
END
,ProductId
FROM foo
)
SELECT FormattedPID
,ProductId
FROM cte
WHERE FormattedPID = @ProductID
You could make sure the column is in the correct format like this:
Remove the - by replacing it with an empty string (197314-3337 -> 1973143337, 7314-3337 -> 73143337).
Add 19 at the beginning (1973143337 -> 191973143337, 73143337 -> 1973143337).
Take the 10 rightmost characters of the result and compare to the input (191973143337 -> 1973143337, 1973143337 -> 1973143337).
In Transact-SQL:
WHERE RIGHT('19' + REPLACE(P.ProductId, '-', ''), 10) = @ProductId
Of course, this means no index seek for you, because we are applying functions to the column.
An alternative to that would be to produce the three non-standard formats out of the input:
cut off the initial 19 (1973143337 -> 73143337);
insert the - (1973143337 -> 197314-3337);
insert the - and cut off the 19 (1973143337 -> 197314-3337 -> 7314-3337).
In Transact-SQL:
WHERE P.ProductId IN (
@ProductId,
SUBSTRING(@ProductId, 3, 999999999),
STUFF(@ProductId, 7, 0, '-'),
SUBSTRING(STUFF(@ProductId, 7, 0, '-'), 3, 999999999)
)
This way if there is an index on P.ProductId, it will be used efficiently.
Both approaches assume that the length of the correct format is fixed.
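Both directions can be checked against the four stored formats with a short Python sketch. The function names normalize and variants are made up for illustration, and the fixed 10-character length and dash position are the same assumptions the answer states.

```python
def normalize(stored):
    """Column side: RIGHT('19' + REPLACE(ProductId, '-', ''), 10)."""
    return ("19" + stored.replace("-", ""))[-10:]


def variants(product_id):
    """Input side: the four stored formats derived from a correct 10-character id."""
    dashed = product_id[:6] + "-" + product_id[6:]   # STUFF(@ProductId, 7, 0, '-')
    return {
        product_id,
        product_id[2:],   # SUBSTRING(@ProductId, 3, ...): drop the leading 19
        dashed,
        dashed[2:],       # dash inserted and leading 19 dropped
    }


stored_values = ["7314-3337", "73143337", "1973143337", "197314-3337"]
print([normalize(v) for v in stored_values])
print(sorted(variants("1973143337")))
```

The first function has to run on every stored value, while the second runs once on the input, which is why only the second form can use an index on the column.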

Extracting pipe delimited field into rows

I have a table with a field whose values are pipe delimited, and I need them extracted as rows.
Sample data
select distinct [PROV_KEY],
[NTWK_CDS]
FROM [SPOCK].[US\AC39169].[WellPointExtract_ERR]
where [PROV_KEY] = '447358B0A8E1C0F1B7AEB1ED07EC2F25'
--results
PROV_KEY NTWK_CDS
447358B0A8E1C0F1B7AEB1ED07EC2F25 |GA_HMO|GA_OPN|GA_PPO|GA_BD|GA_MCPPO|GA_HDPPO|
And I would like:
PROV_KEY NTWK_CDS
447358B0A8E1C0F1B7AEB1ED07EC2F25 GA_HMO
447358B0A8E1C0F1B7AEB1ED07EC2F25 GA_OPN
447358B0A8E1C0F1B7AEB1ED07EC2F25 GA_PPO
I tried the following but I'm only getting the first set of values:
select distinct [PROV_KEY],
substring([NTWK_CDS], 1,
CHARINDEX('|',[NTWK_CDS], CHARINDEX('|',[NTWK_CDS])+1))
FROM [SPOCK].[US\AC39169].[WellPointExtract_ERR]
where [PROV_KEY] = '447358B0A8E1C0F1B7AEB1ED07EC2F25'
This is a standard string-splitting problem and there are many solutions out there. However, most still feel like a workaround, as SQL Server had no built-in split function before string_split (SQL Server 2016 and up).
You can start your research here: http://www.sommarskog.se/arrays-in-sql.html
The crucial operation you need to perform is a split. There are lots of solutions to this problem (see here for some), and people favor different ones depending on both situation and personal preference. Once you've done the split, though, you can JOIN or APPLY against the results to get the desired output.
I personally prefer using a SQLCLR function for this purpose since the performance is generally much better; but the number of options out there is staggering.
You can use a splitting function:
CREATE FUNCTION dbo.SplitStrings_CTE(@List nvarchar (1000), @Delimiter nvarchar(1 ))
RETURNS @returns TABLE(val nvarchar(100), [level] int, PRIMARY KEY CLUSTERED([level]))
AS
BEGIN
;WITH cte AS
(
SELECT SUBSTRING(@List, 0, CHARINDEX(@Delimiter , @List)) AS val ,
CAST(STUFF(@List + @Delimiter, 1, CHARINDEX(@Delimiter, @List),'') AS nvarchar (1000)) AS stval,
1 AS [level]
UNION ALL
SELECT SUBSTRING(stval, 0, CHARINDEX(@Delimiter, stval)),
CAST(STUFF(stval, 1 , CHARINDEX(@Delimiter ,stval), '') AS nvarchar(1000)),
[level] + 1
FROM cte
WHERE stval != ''
)
INSERT @returns
SELECT REPLACE(val ,' ' ,'') AS val, [level]
FROM cte
RETURN
END
Hence, your SELECT statement will be
SELECT *
FROM dbo.test82 t CROSS APPLY dbo.SplitStrings_CTE(t.NTWK_CDS, '|') o
WHERE o.val != ''
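The shape of the result (one output row per non-empty code, with the key repeated) can be sketched in Python to confirm what the split plus CROSS APPLY should produce. The function name explode_codes is hypothetical; the sample row is the one from the question.

```python
def explode_codes(rows, sep="|"):
    """Turn (key, '|A|B|') pairs into one (key, code) pair per non-empty code."""
    result = []
    for prov_key, ntwk_cds in rows:
        for code in ntwk_cds.split(sep):
            # Leading and trailing separators yield empty strings; skip them,
            # like the WHERE o.val != '' filter.
            if code:
                result.append((prov_key, code))
    return result


sample = [("447358B0A8E1C0F1B7AEB1ED07EC2F25",
           "|GA_HMO|GA_OPN|GA_PPO|GA_BD|GA_MCPPO|GA_HDPPO|")]
for row in explode_codes(sample):
    print(row)
```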