I have a column in a SQL Server table that has strings of varying lengths. I need to find the position of the first occurrence of the string , -- that's not enclosed in single quotes or square brackets.
For example, in the following two strings, I've bolded the portion I would like to get the position of. Notice that in the first string, the first time , -- appears on its own (without being between single quote or square bracket delimiters) is at position 14, and in the second string, it's at position 18.
'a, --'[, --]**, --**[, --]
[a, --b]aaaaaaa_ **, --**', --'
Also I should mention that , -- itself could appear multiple times in the string.
Here's a simple query that shows the strings and my desired output.
SELECT
t.string, t.desired_pos
FROM
(VALUES (N'''a, --''[, --], --[, --]', 14),
(N'[a, —-b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)
Is there any way to accomplish this using a SELECT query (or multiple) without using a function?
Thank you in advance!
I've tried variations of SUBSTRING, CHARINDEX, and even some CROSS APPLYs but I can't seem to get the result I'm looking for.
Before I write down my solution, I must warn you: DON'T USE IT. Use a function, or do this in some other language. This code is probably buggy.
It doesn't handle things like escaped quotes, etc.
The idea is to first remove the stuff inside brackets [] and quotes '' and then just do a "simple" charindex.
To remove the bracketed and quoted parts, I'm using a recursive CTE that loops over every pair of matching delimiters and replaces their content with placeholder strings.
One important point is that quotes and brackets might be nested inside each other, so you have to try both kinds of delimiter and choose the one that appears earliest.
WITH CTE AS (
SELECT *
FROM
(VALUES (N'''a, --''[, --], --[, --]', 14),
(N'[a, —-b]aaaaaaa_ , --'', --''', 18)) t(string, desired_pos)
)
, cte2 AS (
select x.start
, x.finish
, case when x.start > 0 THEN STUFF(string, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1)) ELSE string END AS newString
, 1 as level
, string as orig
, desired_pos
from cte
CROSS APPLY (
SELECT *
, ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
FROM (
SELECT charindex('[', string) AS start
, charindex(']', string) AS finish
UNION ALL
SELECT charindex('''', string) AS startQ
, charindex('''', string, charindex('''', string) + 1) AS finishQ
) x
) x
WHERE x.sortorder = 1
UNION ALL
select x.start
, x.finish
, STUFF(newString, x.start, x.finish - x.start + 1, REPLICATE('a', x.finish - x.start + 1))
, level + 1 -- increment so the outer query can pick the last replacement per string
, orig
, desired_pos
from cte2
CROSS APPLY (
SELECT *
, ROW_NUMBER() OVER(ORDER BY case when start > 0 THEN 0 ELSE 1 END, start) AS sortorder
FROM (
SELECT charindex('[', newString) AS start
, charindex(']', newString) AS finish
UNION ALL
SELECT charindex('''', newString) AS startQ
, charindex('''', newString, charindex('''', newString) + 1) AS finishQ
) x
) x
WHERE x.sortorder = 1
AND x.start > 0
AND cte2.start > 0 -- Must have been a match
)
SELECT PATINDEX('%, --%', newString), *
from (
select *, row_number() over(partition by orig order by level desc) AS sort
from cte2
) x
where x.sort = 1
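The masking idea in this answer can also be sketched outside of SQL. This is a minimal Python sketch, assuming delimiters are never escaped or nested; the function names mask_delimited and first_unenclosed are hypothetical. It scans left to right, so whichever delimiter opens first wins, and it returns a 1-based position (0 when absent) the way CHARINDEX does.

```python
def mask_delimited(s: str, placeholder: str = "a") -> str:
    """Overwrite each '...' or [...] region, delimiters included, with
    placeholder characters so the string keeps its length (assumes no escapes)."""
    out = list(s)
    i = 0
    while i < len(s):
        opener = s[i]
        if opener in "['":
            closer = "]" if opener == "[" else "'"
            end = s.find(closer, i + 1)
            if end == -1:  # unbalanced delimiter: leave the rest of the string alone
                break
            out[i:end + 1] = placeholder * (end + 1 - i)
            i = end + 1
        else:
            i += 1
    return "".join(out)


def first_unenclosed(s: str, token: str = ", --") -> int:
    """1-based position of token outside quotes/brackets; 0 if absent, like CHARINDEX."""
    return mask_delimited(s).find(token) + 1


print(first_unenclosed("'a, --'[, --], --[, --]"))           # 14
print(first_unenclosed("[a, \u2014-b]aaaaaaa_ , --', --'"))  # 18
```

Because the masked regions keep their length, the position found in the masked string is the position in the original string.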
Try this approach. I'm replacing the substrings you don't need with another string of the same length, and then looking for the position of the string you're interested in.
SELECT string, desired_pos,
CHARINDEX(', --', REPLACE(REPLACE(string, ''', --''', '******'), '[, --]', '******')
) start_index
FROM (VALUES (N''', --''[, --], --[, --]', 13),
(N'[, --]aaaaaaa_ , --'', --''', 16)) t(string, desired_pos)
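A minimal Python sketch of the same-length replacement, using the two sample strings from this answer; masked_position is a hypothetical name, and str.replace stands in for T-SQL REPLACE. Since each replacement has the same length as what it replaces, every remaining character stays at its original index.

```python
def masked_position(s: str) -> int:
    # Same-length replacements, so later characters keep their original positions.
    masked = s.replace("', --'", "******").replace("[, --]", "******")
    return masked.find(", --") + 1  # 1-based, 0 when absent, like CHARINDEX


print(masked_position("', --'[, --], --[, --]"))     # 13
print(masked_position("[, --]aaaaaaa_ , --', --'"))  # 16
```

Only the two exact literals ', --' and [, --] are masked, so anything else between the delimiters (such as the 'a, --' in the updated question) is left alone; for that string the sketch returns 3 rather than 14.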
I don't know if a C# solution makes sense for you, but for CSV there is a nice little parser class: TextFieldParser
Then you just set Delimiters etc., and assuming the input is escaped consistently, all is good.
I'm late to the game here, but this kind of thing is simple in SQL Server when leveraging NGrams8k (a user-defined inline table-valued function that you need to create first). You don't need REGEX, a CLR, or C#. Furthermore, NGrams8k will be the fastest by far; in 8 years nobody has produced anything remotely as fast. This code will also be faster and far less complex than a recursive CTE solution (recursive CTEs are almost always slow in SQL Server).
;--==== Sample Data
DECLARE @T TABLE (String VARCHAR(100))
INSERT @T
VALUES (N'''a, --''[, --], --[, --]'),
(N'[a, —-b]aaaaaaa_ , --'', --''');
;--==== Solution
SELECT
t.String, ng.Position
FROM @T AS t
CROSS APPLY (VALUES(REPLACE(t.String,'[',CHAR(1)))) AS f(S)
CROSS APPLY samd.NGrams8k(f.S,4) AS ng
CROSS APPLY (VALUES(SUBSTRING(f.S,ng.Position-2,7))) AS g(String)
WHERE ng.Token = ', --'
AND g.String NOT LIKE '%''%''%'
AND g.String NOT LIKE '%'+CHAR(1)+'%]%';
Results:
String Position
----------------------------- --------------------
'a, --'[, --], --[, --] 14
[a, —-b]aaaaaaa_ , --', --' 18
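To make the filtering logic easier to follow, here is a rough Python model of the query above. It is not NGrams8k itself: the hypothetical ngrams helper just yields every n-character substring with its 1-based position, tsql_substring mimics how SUBSTRING clips a start position below 1, and the two checks mirror the NOT LIKE predicates on the 7-character window (CHAR(1) stands in for [).

```python
TOKEN = ", --"


def ngrams(s: str, n: int):
    """Yield (1-based position, token) for every n-character substring of s."""
    for i in range(len(s) - n + 1):
        yield i + 1, s[i:i + n]


def tsql_substring(s: str, start: int, length: int) -> str:
    """SUBSTRING(s, start, length): a start below 1 clips the window on the left."""
    begin = start - 1
    return s[max(begin, 0):max(begin + length, 0)]


def unenclosed_positions(s: str) -> list:
    marker = "\x01"  # plays the role of CHAR(1)
    f = s.replace("[", marker)
    hits = []
    for pos, tok in ngrams(f, len(TOKEN)):
        if tok != TOKEN:
            continue
        # 7 characters: 2 before the token, the 4-character token, 1 after
        window = tsql_substring(f, pos - 2, 7)
        two_quotes = window.count("'") >= 2  # LIKE '%''%''%'
        opened = window.find(marker)
        bracketed = opened != -1 and "]" in window[opened + 1:]  # LIKE '%'+CHAR(1)+'%]%'
        if not two_quotes and not bracketed:
            hits.append(pos)
    return hits


print(unenclosed_positions("'a, --'[, --], --[, --]"))           # [14]
print(unenclosed_positions("[a, \u2014-b]aaaaaaa_ , --', --'"))  # [18]
```

Because the window only reaches 2 characters back and 1 character ahead, the check is a heuristic: on the hyphen-only variant of the second string from the question's bolded example, [a, --b]aaaaaaa_ , --', --', the sketch also reports position 3.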
I have two columns, A and B, in Oracle, where column A has values like xx-target-xx
(xx can be any data, but target always exists)
A
--------
xx-target-xx
xx-target
I need to return only 'target' from the text.
I tried this:
select TRIM(substr(A, 0, instr(A, '-') - 1)) from mytable
but the result is xx, not target.
Use REGEXP_SUBSTR. You want the second run of characters other than the minus sign:
select a, regexp_substr(a, '[^-]+', 1, 2) from mytable;
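For comparison, the same pattern with Python's re module, as a minimal sketch (second_part is a hypothetical name). re.findall collects every run of non-minus characters, and taking the second one is what REGEXP_SUBSTR(a, '[^-]+', 1, 2) asks for; returning None mirrors Oracle returning NULL when there is no second occurrence.

```python
import re


def second_part(a: str):
    parts = re.findall(r"[^-]+", a)  # every run of characters other than '-'
    return parts[1] if len(parts) > 1 else None  # 2nd occurrence, else None (NULL)


print(second_part("xx-target-xx"))  # target
print(second_part("xx-target"))     # target
```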
Using INSTR and SUBSTR instead is a tad more complicated, but possible of course:
select a, substr(a,
instr(a, '-') + 1,
instr(a || '-', '-', 1, 2) - instr(a, '-') - 1
) as value
from mytable;
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=e75b878bbd6300e9207cd698bb3029ec
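The INSTR/SUBSTR arithmetic is easier to check with the 1-based positions spelled out. This Python sketch uses hypothetical helpers instr and substr that model only the parts of Oracle's INSTR and SUBSTR used above (positive start positions); appending '-' guarantees a second dash even for values like xx-target.

```python
def instr(s: str, sub: str, start: int = 1, nth: int = 1) -> int:
    """1-based position of the nth occurrence of sub at or after start; 0 if none."""
    pos = start - 1  # 0-based index to resume searching from
    for _ in range(nth):
        found = s.find(sub, pos)
        if found == -1:
            return 0
        pos = found + 1  # next search starts just after this match
    return pos  # found + 1, i.e. the 1-based position of the nth match


def substr(s: str, start: int, length: int) -> str:
    """Oracle-style SUBSTR for start >= 1 and length >= 0."""
    return s[start - 1:start - 1 + length]


def middle(a: str) -> str:
    first = instr(a, "-")               # instr(a, '-')
    second = instr(a + "-", "-", 1, 2)  # instr(a || '-', '-', 1, 2)
    return substr(a, first + 1, second - first - 1)


print(middle("xx-target-xx"))  # target
print(middle("xx-target"))     # target
```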
What is the best method to extract the text from the string field below?
I am trying to extract the ProjectID numbers (91, 108, 250) below but am struggling because the ProjectIDs are either 2 or 3 digits long and appear in different parts of the string.
Row Parameter
1 ProjectID=91&GroupID=250&ParentID=1
2 ProjectID=108&GroupID=250&ParentID=35
3 GroupID=1080&ProjectID=250&ParentID=43
4 ProjectID=250
Any help would be much appreciated.
SQL Server is kind of lousy on string functionality. Here is one method:
select left(v1.p, patindex('%[^0-9]%', v1.p + ' ') - 1)
from (values ('ProjectID=91&GroupID=250&ParentID=1'),
('ProjectID=108&GroupID=250&ParentID=35'),
('GroupID=1080&ProjectID=250&ParentID=43'),
('ProjectID=250')
) v(parameter) cross apply
(values (stuff(v.parameter, 1, charindex('ProjectID=', v.parameter) + 9, ''))
) v1(p);
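A minimal Python sketch of the same two steps (project_id is a hypothetical name): drop everything through ProjectID=, then keep the leading digits. Unlike the SQL, it returns None when ProjectID= is missing; that guard is an assumption added for the sketch.

```python
import re


def project_id(parameter: str):
    key = "ProjectID="
    idx = parameter.find(key)
    if idx == -1:
        return None  # assumption: the SQL version has no such guard
    rest = parameter[idx + len(key):]  # like STUFF(parameter, 1, CHARINDEX(...) + 9, '')
    return re.match(r"[0-9]*", rest).group(0)  # leading digits, like LEFT(p, PATINDEX(...) - 1)


rows = ["ProjectID=91&GroupID=250&ParentID=1",
        "ProjectID=108&GroupID=250&ParentID=35",
        "GroupID=1080&ProjectID=250&ParentID=43",
        "ProjectID=250"]
print([project_id(r) for r in rows])  # ['91', '108', '250', '250']
```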
Or split the string and look for a match:
select stuff(s.value, 1, 10, '')
from (values ('ProjectID=91&GroupID=250&ParentID=1'),
('ProjectID=108&GroupID=250&ParentID=35'),
('GroupID=1080&ProjectID=250&ParentID=43'),
('ProjectID=250')
) t(parameter) cross apply
string_split(t.parameter, '&') s
where s.value like 'ProjectID=%';
Here is a db<>fiddle.
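The split-and-match version, sketched in Python (project_id_split is a hypothetical name). Note that startswith is case-sensitive, whereas LIKE usually follows a case-insensitive collation in SQL Server.

```python
def project_id_split(parameter: str):
    for pair in parameter.split("&"):  # like STRING_SPLIT(parameter, '&')
        if pair.startswith("ProjectID="):  # case-sensitive, unlike a typical LIKE
            return pair[len("ProjectID="):]  # like STUFF(s.value, 1, 10, '')
    return None


print(project_id_split("GroupID=1080&ProjectID=250&ParentID=43"))  # 250
```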
SELECT
SUBSTRING(Parameter,
CHARINDEX('ProjectID', Parameter) + 10,
CHARINDEX('&', Parameter + '&', CHARINDEX('ProjectID', Parameter)) - -- appended '&' covers ProjectID being the last pair
(CHARINDEX('ProjectID', Parameter) + 10))
FROM yourTable
I need to extract a certain string from a column in a table as part of an SSIS package.
The contents of the column are formatted like this: "TST_AB1_ABC123456_TEST".
I need to get the string between the 2nd and 3rd "_", e.g. "ABC123456", without changing too much of the package, so I would rather do it in one SQL command if possible.
I've tried a few different methods using SUBSTRING, REVERSE and CHARINDEX but can't figure out how to get just that string.
Using the base string functions:
SELECT
SUBSTRING(col,
CHARINDEX('_', col, CHARINDEX('_', col) + 1) + 1,
CHARINDEX('_', col, CHARINDEX('_', col, CHARINDEX('_', col) + 1) + 1) -
CHARINDEX('_', col, CHARINDEX('_', col) + 1) - 1)
FROM yourTable;
In notes format, the above call to SUBSTRING is saying:
SELECT
SUBSTRING(<your column>,
<starting at one past the second underscore>,
<for a length of the number of characters in between the 2nd and 3rd
underscore>)
FROM yourTable;
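The same "one past the second underscore, up to the third" arithmetic, sketched in Python with 0-based indexes (between_2nd_and_3rd is a hypothetical name). Returning None when there are fewer than 3 underscores is an assumption; in T-SQL the negative length would raise an error instead.

```python
def between_2nd_and_3rd(col: str, sep: str = "_"):
    first = col.find(sep)               # CHARINDEX('_', col)
    second = col.find(sep, first + 1)   # CHARINDEX('_', col, <first> + 1)
    third = col.find(sep, second + 1)   # CHARINDEX('_', col, <second> + 1)
    if -1 in (first, second, third):
        return None
    return col[second + 1:third]        # start one past the 2nd, stop before the 3rd


print(between_2nd_and_3rd("TST_AB1_ABC123456_TEST"))  # ABC123456
```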
On other databases, such as Postgres and Oracle, there are split and regex functions (for example, split_part in Postgres and REGEXP_SUBSTR in Oracle) which can handle the above more gracefully. More recent versions of SQL Server do have a STRING_SPLIT function, which could be used here, but it does not guarantee the order of the resulting parts (SQL Server 2022 adds an optional enable_ordinal argument that returns each part's position).
If your column values always have 4 parts (and contain no periods of their own), you can use the PARSENAME() function like this:
DECLARE @MyString VARCHAR(100)
SET @MyString = 'TST_AB1_ABC123456_TEST';
SELECT PARSENAME(REPLACE(@MyString, '_', '.'), 2)
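PARSENAME counts parts from the right, which is why part 2 is ABC123456. Here is a rough Python model (parsename_like is a hypothetical name) that ignores PARSENAME's bracket and quote handling; it returns None for more than 4 parts, matching PARSENAME's NULL.

```python
def parsename_like(value: str, part: int):
    """Simplified PARSENAME: split on '.', number the parts from the right."""
    parts = value.split(".")
    if len(parts) > 4 or not 1 <= part <= len(parts):
        return None  # PARSENAME gives NULL here
    return parts[-part]


print(parsename_like("TST_AB1_ABC123456_TEST".replace("_", "."), 2))  # ABC123456
```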
You could also do this using CROSS APPLY. I added a WHERE clause to exclude strings with fewer than 3 underscores, which would otherwise produce an error. Note that _ is a wildcard in LIKE, so it has to be written as [_] to match a literal underscore.
with your_table as (select 'TST_AB1_ABC123456_TEST' as txt1)
select txt1, txt2
from your_table t1
cross apply (select charindex( '_', txt1) as i1) t2 -- locate the 1st underscore
cross apply (select charindex( '_', txt1, (i1 + 1)) as i2 ) t3 -- then the 2nd
cross apply (select charindex( '_', txt1, (i2 + 1)) as i3 ) t4 -- then the 3rd
cross apply (select substring( txt1,(i2+1), (i3-i2-1)) as txt2) t5 -- between 2nd & 3rd
where txt1 like '%[_]%[_]%[_]%' -- at least 3 literal underscores
Outputs
+------------------------+-----------+
| txt1 | txt2 |
+------------------------+-----------+
| TST_AB1_ABC123456_TEST | ABC123456 |
+------------------------+-----------+
DEMO
I want to take a substring of values like 11.1.2.3.4.5 or 10.1.2.4.5 (and so on), cutting them off at the 4th dot, so that I get 11.1.2.3 and 10.1.2.4 respectively.
Can someone help to achieve this in SQL?
You could use a recursive CTE as follows:
CREATE TABLE Strings( S VARCHAR(25) );
INSERT Strings VALUES
('1.2.3.4.5.6'),
('11.2.12.5.66'),
('y.888.p.666.2.00');
WITH CTE AS
(
SELECT 1 N, CHARINDEX('.', S) Pos, S
FROM Strings
UNION ALL
SELECT N + 1, CHARINDEX('.', S, Pos + 1), S
FROM CTE
WHERE Pos > 0
)
SELECT S, SUBSTRING(S, 1, Pos - 1) --or use LEFT()
FROM CTE
WHERE N = 4;
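The recursive CTE finds one dot per level, resuming the search just after the previous dot. Here is the same walk as a short Python loop (up_to_nth_dot is a hypothetical name); returning None for strings with fewer than 4 dots is an assumption for the sketch.

```python
def up_to_nth_dot(s: str, n: int = 4):
    pos = -1
    for _ in range(n):
        pos = s.find(".", pos + 1)  # like CHARINDEX('.', S, Pos + 1)
        if pos == -1:
            return None  # fewer than n dots
    return s[:pos]  # like SUBSTRING(S, 1, Pos - 1) or LEFT(S, Pos - 1)


for s in ["11.1.2.3.4.5", "10.1.2.4.5", "y.888.p.666.2.00"]:
    print(up_to_nth_dot(s))  # 11.1.2.3, 10.1.2.4, y.888.p.666
```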
Or using nested CHARINDEX() calls:
SELECT S, LEFT(S, CI-1)
FROM Strings
CROSS APPLY
(
VALUES
(CHARINDEX('.', S, CHARINDEX('.', S, CHARINDEX('.', S, CHARINDEX('.', S)+1)+1)+1))
) T(CI)
Here is a db<>fiddle