Changing the sub-string delimiter in a regex does not return the same result

I have a problem with understanding a regex.
I have the following string:
aaa'dd?'d'xxx'
In this string, ' is the substring delimiter and ? is an escape character for the '.
In Oracle SQL, I have a statement which splits my string into substrings, based on the substring delimiter:
select replace(
regexp_substr(q'[aaa'dd?'d'xxx']', '(.*?[^?])(''|$)', 1, level, null, 1),
'?''',
'''') as result
FROM dual
connect by level <= regexp_count(q'[aaa'dd?'d'xxx']', '(.*?[^?])(''|$)');
In this case, the result is:
aaa
dd'd
xxx
... which is correct.
My problem comes from the fact that I want to change the sub-string delimiter from ' into +.
In this case, the main string becomes
aaa+dd?+d+xxx+
I modified the SQL statement to:
SELECT REPLACE(
regexp_substr(q'[aaa+dd?+d+xxx+]', '(.*?[^?])(+|$)', 1, level, null, 1),
'?''',
'''') as result
FROM dual
connect by level <= regexp_count(q'[aaa+dd?+d+xxx+]', '(.*?[^?])(+|$)');
... and the result is different:
a
a
a
+
d
d
?+
d
+
x
x
x
+
Can you point out what I am doing wrong in my modified script, so that I get the same result, please?

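In a regular expression, + is a quantifier meaning one or more of the preceding pattern, so a bare + inside (+|$) is not treated as a literal plus sign. Try escaping the + with \, making your regexp '(.*?[^?])(\+|$)'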
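Applied to the query from the question, the fix might look like the sketch below. Note that it also changes the REPLACE arguments from '?''' and '''' to '?+' and '+', since the modified statement in the question still un-escapes the old delimiter. That second change is an assumption about the intended result, not part of the answer above.

```sql
-- Sketch: same query as in the question, with the delimiter + escaped as \+
select replace(
         regexp_substr(q'[aaa+dd?+d+xxx+]', '(.*?[^?])(\+|$)', 1, level, null, 1),
         '?+',   -- assumption: the escaped delimiter is now ?+ rather than ?'
         '+') as result
from dual
connect by level <= regexp_count(q'[aaa+dd?+d+xxx+]', '(.*?[^?])(\+|$)');
```

The only change to the pattern is the backslash in front of the +, so the second group again means "a literal delimiter or end of string", as it did with the single quote.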

Related

SQL I need to extract a stored procedure name from a string

I am a bit new to this site. I have looked at many possible answers to my question, but none of them has met my need. I have a feeling it's a good challenge. Here it goes.
In one of our tables we list what is used to run a report. This can mean that we have a short EXEC [svr1].[dbo].[stored_procedure] or "...From svr1.dbo.stored_procedure...".
My goal is to get the stored procedure name out of this string (column). I have tried to get the string between '[' and ']', but that breaks when there are no brackets. I have been at this for a few days and just can't seem to find a solution.
Any assistance you can provide is greatly appreciated.
Thank you in advance for entertaining this question.
almostanexpert

Assuming the procedure name is followed either by a space or by the end of the string, and assuming there are no other dots before the server name, the following is a clean way to do it, using substring(), len(), charindex() and replace() together:
with t(str) as
(
select '[svr1].[dbo].[stored_procedure]' union all
select 'before svr1.dbo.stored_procedure someting more' union all
select 'abc before svr1.dbo.stored_procedure'
), t2(str) as
(
select replace(replace(str,'[',''),']','') from t
), t3(str) as
(
select substring(str,charindex('.',str)+1,len(str)) from t2
)
select
substring(
str,
charindex('.',str)+1,
case
when charindex(' ',str) > 0 then
charindex(' ',str)
else
len(str)
end - charindex('.',str)
) as "Result String"
from t3;
Result String
----------------
stored_procedure
stored_procedure
stored_procedure

With the variability of inputs you seem to have, we will need to plan for a few scenarios. The code below assumes that there will be exactly two '.' characters before the stored_procedure, and that [stored_procedure] will either end the string or be followed by a space if the string continues.
SELECT TRIM('[' FROM TRIM(']' FROM --Trim brackets from final result if they exist
SUBSTR(column || ' ', --substr(string, start_pos, length), Space added in case proc name is end of str
INSTR(column || ' ', '.', 1, 2)+1, --start_pos: find second '.' and start 1 char after
INSTR(column || ' ', ' ', INSTR(column || ' ', '.', 1, 2), 1)-(INSTR(column || ' ', '.', 1, 2)+1))
-- Len: start after 2nd '.' and go until first space (subtract 2nd '.' index to get "Length")
))FROM TABLE;
Working from the middle out we'll start with using the SUBSTR function and concatenating a space to the end of the original string. This allows us to use a space to find the end of the stored_procedure even if it is the last piece of the string.
Next to find our starting position, we use INSTR to search for the second instance of the '.' and start 1 position after.
For the length argument, we find the index of the first space after that second '.' and then subtract that '.' index.
From here we have either [stored_procedure] or stored_procedure. Running the TRIM functions for each bracket will remove them if they exist, and if not will just return the name of the procedure.
Sample inputs based on above description:
'EXEC [svr1].[dbo].[stored_procedure]'
'EXEC [svr1].[dbo].[stored_procedure] FROM TEST'
'svr1.dbo.stored_procedure'
Note: This code is written for Oracle SQL but can be translated to MySQL using similar functions.
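To try the expression without a real table, the sample inputs can be supplied inline. This is only a sketch: the names samples, txt and proc_name are hypothetical stand-ins for the real table and column, and the WITH column list assumes Oracle 11g R2 or later.

```sql
-- Hypothetical inline data standing in for the real table and column.
with samples (txt) as (
  select 'EXEC [svr1].[dbo].[stored_procedure]'           from dual union all
  select 'EXEC [svr1].[dbo].[stored_procedure] FROM TEST' from dual union all
  select 'svr1.dbo.stored_procedure'                      from dual
)
select txt,
       trim('[' from trim(']' from
         substr(txt || ' ',
                instr(txt || ' ', '.', 1, 2) + 1,                     -- start just after the second '.'
                instr(txt || ' ', ' ', instr(txt || ' ', '.', 1, 2))  -- first space after that '.'
                  - instr(txt || ' ', '.', 1, 2) - 1)                 -- minus the '.' position, minus 1
       )) as proc_name
from samples;
```

Under the stated assumptions (exactly two dots before the name, and a space or end of string after it), each of the three samples should yield stored_procedure.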

match something ended with a specific char BUT not those which have something in front of the ending char

I have the following string:
aaa'dd?'d'xxx'
The delimiter is ', but if it has ? in front of it, it should not be considered a delimiter but just a literal (the ? is an escape character for the delimiter).
The result I want to display is:
aaa
dd'd
xxx
At the moment I am using [^']+, which does not take the escape character (?) into consideration.
Can you help me, please?

A simple option is to replace the offending string (the escaped delimiter ?') with something else; for example, I used #. For the final result, replace it back with a single quote, '.
with test (col) as
(select q'[aaa'dd?'d'xxx']' from dual),
inter as
(select replace(col, '?''', '#') icol
from test
)
select replace(regexp_substr(icol, '[^'']+', 1, level), '#', '''') result
from inter
connect by level <= regexp_count(icol, '''');
RESULT
-------------
aaa
dd'd
xxx

If you want to do this without replacing the ?' pattern with a fixed dummy character (whether that's a '#' or anything else you are sure will never actually appear), then you can use a regular expression pattern like this:
-- bind variable for sample value
var str varchar2(20);
exec :str := q'[aaa'dd?'d'xxx']';
select regexp_substr(:str, '((.*?[^?])*?)(''|$)', 1, level, null, 1) as result
from dual
connect by level < regexp_count(:str, '((.*?[^?])*?)(''|$)');
RESULT
--------------------
aaa
dd?'d
xxx
and you can then just apply a simple replace afterwards:
select replace(
regexp_substr(:str, '((.*?[^?])*?)(''|$)', 1, level, null, 1),
'?''',
'''') as result
from dual
connect by level < regexp_count(:str, '((.*?[^?])*?)(''|$)');
RESULT
--------------------
aaa
dd'd
xxx
If you have two adjacent unescaped delimiters you get a null element back from that position (this didn't happen with an earlier version of the regex pattern):
exec :str := q'[aaa''dd?'d'xxx']';
-- just to make them more visible...
set null (null)
select replace(
regexp_substr(:str, '((.*?[^?])*?)(''|$)', 1, level, null, 1),
'?''',
'''') as result
from dual
connect by level < regexp_count(:str, '((.*?[^?])*?)(''|$)');
RESULT
--------------------
aaa
(null)
dd'd
xxx

How to extract the number from a string using Oracle?

I have a string as follows: first, last (123456). The expected result should be 123456. Could someone tell me in which direction I should proceed using Oracle?

It will depend on the actual pattern you care about (I assume "first" and "last" aren't literal hard-coded strings), but you will probably want to use regexp_substr.
For example, this matches anything between two brackets (which will work for your example), but you might need more sophisticated criteria if your actual examples have multiple brackets or something.
SELECT regexp_substr(COLUMN_NAME, '\(([^\)]*)\)', 1, 1, 'i', 1)
FROM TABLE_NAME
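If only digits are wanted, the capture group can be narrowed. The following is a sketch with made-up inline data (the names t, str and digits are hypothetical), and the sixth argument, which selects the sub-expression, assumes Oracle 11g or later:

```sql
-- Capture only a run of digits enclosed in parentheses.
with t (str) as (
  select 'first, last (123456)'       from dual union all
  select 'john, doe (jr) (987654321)' from dual
)
select str,
       regexp_substr(str, '\(([0-9]+)\)', 1, 1, null, 1) as digits
from t;
-- digits: 123456 for the first row, 987654321 for the second
```

Because the group requires digits, a non-numeric bracketed part such as (jr) is skipped rather than returned.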

Your question is ambiguous and needs clarification. Based on your comment it appears you want to select the six digits after the left bracket. You can use the Oracle instr function to find the position of a character in a string, and then feed that into the substr to select your text.
select substr(mycol, instr(mycol, '(') + 1, 6) from mytable
Or if there are a varying number of digits between the brackets:
select substr(mycol, instr(mycol, '(') + 1, instr(mycol, ')') - instr(mycol, '(') - 1) from mytable

Find the last ( and get the sub-string after without the trailing ) and convert that to a number:
Oracle 11g R2 Schema Setup:
CREATE TABLE test ( str ) AS
SELECT 'first, last (123456)' FROM DUAL UNION ALL
SELECT 'john, doe (jr) (987654321)' FROM DUAL;
Query 1:
SELECT TO_NUMBER(
TRIM(
TRAILING ')' FROM
SUBSTR(
str,
INSTR( str, '(', -1 ) + 1
)
)
) AS value
FROM test
Results:
| VALUE |
|-----------|
| 123456 |
| 987654321 |

How to replace more than one character in oracle?

How to replace multiple whole characters, except those in combinations...?
The code below replaces multiple characters, but it also disturbs those in combinations.
SELECT regexp_replace('a,ca,va,ea,r,y,q,b,g','(a|y|q|g)','X') RESULT FROM dual;
Current output:
RESULT
--------------------
X,cX,vX,eX,r,X,X,b,X
Expected output:
RESULT
------------------------
X,ca,va,ea,r,X,X,b,X
I want to replace only the separate whole characters ('a','y','q','g'), but not the ones in combinations ('ca','va','ea')...

Because you are delimiting with a comma, you can include the commas in the pattern, like ',a,', and this will replace only single a's.
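One caveat with matching ',a,' directly is that a comma consumed by one match cannot also start the next one, so adjacent single letters such as y,q would be only partly replaced. A possible workaround, sketched here as an assumption rather than as part of the answer above, is to double every comma and wrap the string in commas first, so that each element gets its own pair of delimiters:

```sql
-- Sketch: double the commas so neighbouring elements do not share a delimiter,
-- replace the whole-element matches, then collapse the commas again.
select substr(padded, 2, length(padded) - 2) as result   -- drop the wrapping commas
from (
  select replace(
           regexp_replace(
             ',' || replace('a,ca,va,ea,r,y,q,b,g', ',', ',,') || ',',
             ',(a|y|q|g),',
             ',X,'),
           ',,', ',') as padded
  from dual
);
-- result: X,ca,va,ea,r,X,X,b,X
```

The alias padded is a made-up name. The design choice is to pay for two extra REPLACE calls in exchange for a single regular expression pass.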

You can try the following:
with t as
(
select 'a,ca,va,ea,r,y,q,b,g' str
from dual
)
select substr(sys_connect_by_path(regexp_replace(regexp_substr(str, '[^,]+', 1, level), '^(a|y|q|g)$', 'X'), ','), 2) as str
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^,]*')) + 1;

Sadly, Oracle doesn't support lookahead and lookbehind, but this is a solution I came up with.
SELECT regexp_replace
(regexp_replace
('a,ca,va,ea,r,y,q,b,g',
'^[ayqg](,)|(,)[ayqg](,)|(,)[ayqg]$',
'\2\4X\1\3'),'(,)[ayqg](,)','\1X\2')
RESULT FROM dual;
Sadly, I had to use the regexp twice, because a single pass does not replace two matching values that directly follow each other: ..,a,y,.. becomes ..,X,y,... The inner call handles the first and last values as well as the middle ones, and the outer call then replaces the [ayqg] values that the first pass missed.
Maybe this could be simplified into one expression, but I am not that familiar with Oracle's regex.
As an explanation: I am grouping the commas and basically replacing every ,[ayqg], with ,X, by backreferencing the commas.

You would look for word boundaries, which is \b, and which is unfortunately not supported by Oracle's regexp_replace.
So let's look for a non-word character \W or the beginning ^ or ending $ of the text.
select
regexp_replace('a,ca,va,ea,r,y,q,b,g','(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3') as result
from dual;
In order not to remove the non-word characters, we must have them in the replace string: \1 for the expression in the first parentheses, \3 for the one in the third. Thus we only change the expression in the second parentheses, which is a, y, q or g, to X.
Unfortunately above gives
X,ca,va,ea,r,X,q,b,X
The q was not replaced, because after matching ',y,' we are positioned at 'q,', whereas we'd need to be positioned at ',q,' to recognize q as a word, too.
So we need to replace in iterations (i.e. recursively):
with results(txt, num) as
(
select 'a,ca,va,ea,r,y,q,b,g' as txt, 0 as num from dual
union all
select regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3'), num + 1 as num
from results
where txt <> regexp_replace(txt, '(^|$|\W)(a|y|q|g)(^|$|\W)','\1X\3')
)
select max(txt) keep (dense_rank last order by num) as result
from results;
EDIT: Kevin Esche is right; of course one has to do it only twice. Hence you can also do:
select
regexp_replace(txt, search_str, replace_str) as result
from
(
select
regexp_replace(txt, search_str, replace_str) as txt, search_str, replace_str
from
(
select
'a,ca,va,ea,r,y,q,y,q,b,g' as txt,
'(^|$|\W)(a|y|q|g)(^|$|\W)' as search_str,
'\1X\3' as replace_str
from dual
)
);

with replaced_values as (
SELECT case when length(val)=1 then regexp_replace(val,'(a|y|q|g)','X') else val end new_val, lvl
from (
SELECT regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+', 1, LEVEL) val, level lvl FROM dual
connect by regexp_substr('a,ca,va,ea,r,y,q,b,g','[^,]+',1, LEVEL) is not null
) all_values
)
select LISTAGG(new_val, ',') WITHIN GROUP (ORDER BY lvl) RESULT
from replaced_values
This statement splits the data into rows and replaces only the rows which contain a single character.
The rows are then aggregated back into one row.

This sql works also with empty entries like 'a,,,b,c' and more complex regular expressions:
with t as
(select ',a,,ca,va,ea,bbb,ba,r,y,q,b,g,,,' as str,
',' as delimiter,
'(a|y|q|g|ea|[b]*)' as regexp_expr,
'X' as replace_expr
from dual)
(select substr (sys_connect_by_path(regexp_replace(substr(str,
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1)) + 1,
decode(instr(str, ',', 1, level),
0,
length(str),
instr(str, ',', 1, level) - 1) -
decode(level - 1, 0, 0, instr(str, ',', 1, level - 1))),
'^' || regexp_expr || '$',
replace_expr), ','), 2)
from t
where connect_by_isleaf = 1
connect by level <= length(regexp_replace(str, '[^'|| delimiter||']')) + 1)
Result
,X,,ca,va,X,X,ba,r,X,X,X,X,,,

I don't know much Oracle, but I would have thought something like this could work, assuming the delimiter is always a comma.
SELECT
regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace('a,ca,va,ea,r,y,q,b,g','(,a,|,y,|,q,|,g,)',',X,') ,'(,a,|,y,|,q,|,g,)',',X,'), '(^a,|^y,|^q,|^g,)','X,'), '(,a$|,y$|,q$|,g$)',',X'), '(^a$|^y$|^q$|^g$)','X')
RESULT FROM test;
The first two parts replace a single character between commas in the middle, the third part gets those at the start of the string, the fourth is for the end of the string, and the fifth is for when the string has just one character.
This answer could probably be simplified with more advanced regexp use.

How can I replace words?
RS & OS ===> D, LS & IS ===>
SECTION_ID    Output required
1-LS-1991     1-P-1991
1-IS-1991     1-P-1991
1-RS-1991     1-D-1991
1-OS-1991     1-D-1991
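For this follow-up, one possible approach is two nested regexp_replace calls, one per group of codes. This is a sketch: the table name sections is hypothetical, and the target P for LS and IS is inferred from the "Output required" column, since the mapping line itself leaves it blank.

```sql
-- Hypothetical inline data; replace with the real table.
with sections (section_id) as (
  select '1-LS-1991' from dual union all
  select '1-IS-1991' from dual union all
  select '1-RS-1991' from dual union all
  select '1-OS-1991' from dual
)
select section_id,
       regexp_replace(
         regexp_replace(section_id, '-(RS|OS)-', '-D-'),  -- RS and OS become D
         '-(LS|IS)-', '-P-') as output_required           -- LS and IS become P (inferred)
from sections;
```

Keeping the hyphens in both the pattern and the replacement limits the change to the middle code and leaves the surrounding parts of the id alone.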

How can I remove leading and trailing quotes in SQL Server?

I have a table in a SQL Server database with an NTEXT column. This column may contain data that is enclosed with double quotes. When I query for this column, I want to remove these leading and trailing quotes.
For example:
"this is a test message"
should become
this is a test message
I know of the LTRIM and RTRIM functions, but these work only for spaces. Any suggestions on which functions I can use to achieve this?

I have just tested this code in MS SQL 2008 and validated it.
Remove left-most quote:
UPDATE MyTable
SET FieldName = SUBSTRING(FieldName, 2, LEN(FieldName))
WHERE LEFT(FieldName, 1) = '"'
Remove right-most quote: (Revised to avoid error from implicit type conversion to int)
UPDATE MyTable
SET FieldName = SUBSTRING(FieldName, 1, LEN(FieldName)-1)
WHERE RIGHT(FieldName, 1) = '"'

I thought this is a simpler script if you want to remove all quotes
UPDATE Table_Name
SET col_name = REPLACE(col_name, '"', '')

You can simply use the "Replace" function in SQL Server.
Like this:
select REPLACE('"this is a test message"','"','')
Note: the second parameter here is a double quote inside two single quotes, and the third parameter is simply two single quotes (an empty string). The idea is to replace the double quotes with nothing.
Very simple and easy to execute!

My solution is to use the difference between the length of the column value and the length of the same value with the double quotes replaced by spaces and trimmed, in order to calculate the start and length values as parameters for a SUBSTRING function.
The advantage of doing it this way is that you can remove any leading or trailing character even if it occurs multiple times whilst leaving any characters that are contained within the text.
Here is my answer with some test data:
SELECT
x AS before
,SUBSTRING(x
,LEN(x) - (LEN(LTRIM(REPLACE(x, '"', ' ')) + '|') - 1) + 1 --start_pos
,LEN(LTRIM(REPLACE(x, '"', ' '))) --length
) AS after
FROM
(
SELECT 'test' AS x UNION ALL
SELECT '"' AS x UNION ALL
SELECT '"test' AS x UNION ALL
SELECT 'test"' AS x UNION ALL
SELECT '"test"' AS x UNION ALL
SELECT '""test' AS x UNION ALL
SELECT 'test""' AS x UNION ALL
SELECT '""test""' AS x UNION ALL
SELECT '"te"st"' AS x UNION ALL
SELECT 'te"st' AS x
) a
Which produces the following results:
before      after
-----------------
test        test
"
"test       test
test"       test
"test"      test
""test      test
test""      test
""test""    test
"te"st"     te"st
te"st       te"st
One thing to note is that when getting the length I only need to use LTRIM, not LTRIM and RTRIM combined, because the LEN function does not count trailing spaces.

I know this is an older question post, but my daughter came to me with the question, and referenced this page as having possible answers. Given that she's hunting an answer for this, it's a safe assumption others might still be as well.
All are great approaches, and as with everything, there are about as many ways to skin a cat as there are cats to skin.
If you're looking for a left trim and a right trim of a character or string, and your trailing character/string is uniform in length, here's my suggestion:
SELECT SUBSTRING(ColName,VAR, LEN(ColName)-VAR)
Or in this question...
SELECT SUBSTRING('"this is a test message"',2, LEN('"this is a test message"')-2)
With this, you simply adjust the SUBSTRING starting point (2), and LEN position (-2) to whatever value you need to remove from your string.
It's non-iterative and doesn't require explicit case testing and above all it's inline all of which make for a cleaner execution plan.

The following script removes quotation marks only from around the column value if table is called [Messages] and the column is called [Description].
-- If the content is in the form of "anything" (LIKE '"%"')
-- Then take the whole text without the first and last characters
-- (starting at the 2nd character, for LEN([Description]) - 2 characters)
UPDATE [Messages]
SET [Description] = SUBSTRING([Description], 2, LEN([Description]) - 2)
WHERE [Description] LIKE '"%"'
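A SELECT-only variant of the same check can be tried without touching a table. This is a sketch using made-up inline values (the derived table t and its column x are hypothetical), and the VALUES row constructor assumes SQL Server 2008 or later:

```sql
-- Strip one leading and one trailing double quote, only when both are present.
SELECT x AS before,
       CASE WHEN x LIKE '"%"'
            THEN SUBSTRING(x, 2, LEN(x) - 2)   -- skip the first char, drop the last
            ELSE x
       END AS after
FROM (VALUES ('"this is a test message"'),
             ('no quotes'),
             ('"te"st"')) AS t(x);
-- after: this is a test message / no quotes / te"st
```

Quotes inside the text are left untouched, because only the first and last characters are removed.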

You can use the following query, which worked for me.
For updating:
UPDATE table SET colName= REPLACE(LTRIM(RTRIM(REPLACE(colName, '"', ''))), '', '"') WHERE...
For selecting:
SELECT REPLACE(LTRIM(RTRIM(REPLACE(colName, '"', ''))), '', '"') FROM TableName

You could replace the quotes with an empty string...
SELECT AllRemoved = REPLACE(CAST(MyColumn AS varchar(max)), '"', ''),
LeadingAndTrailingRemoved = CASE
WHEN MyTest like '"%"' THEN SUBSTRING(Mytest, 2, LEN(CAST(MyTest AS nvarchar(max)))-2)
ELSE MyTest
END
FROM MyTable

Some UDFs for re-usability.
Left Trimming by character (any number)
CREATE FUNCTION [dbo].[LTRIMCHAR] (@Input NVARCHAR(max), @TrimChar CHAR(1) = ',')
RETURNS NVARCHAR(max)
AS
BEGIN
RETURN REPLACE(REPLACE(LTRIM(REPLACE(REPLACE(@Input,' ','¦'), @TrimChar, ' ')), ' ', @TrimChar),'¦',' ')
END
Right Trimming by character (any number)
CREATE FUNCTION [dbo].[RTRIMCHAR] (@Input NVARCHAR(max), @TrimChar CHAR(1) = ',')
RETURNS NVARCHAR(max)
AS
BEGIN
RETURN REPLACE(REPLACE(RTRIM(REPLACE(REPLACE(@Input,' ','¦'), @TrimChar, ' ')), ' ', @TrimChar),'¦',' ')
END
END
Note the dummy character '¦' (Alt+0166) cannot be present in the data (you may wish to test your input string, first, if unsure or use a different character).

To remove both quotes you could do this:
SUBSTRING(fieldName, 2, LEN(fieldName) - 2)
You can either assign or project the resulting value.

On SQL Server 2017 and later you can use TRIM('"' FROM '"this "is" a test"'), which returns: this "is" a test

CREATE FUNCTION dbo.TRIM(@String VARCHAR(MAX), @Char varchar(5))
RETURNS VARCHAR(MAX)
BEGIN
RETURN SUBSTRING(@String,PATINDEX('%[^' + @Char + ' ]%',@String)
,(DATALENGTH(@String)+2 - (PATINDEX('%[^' + @Char + ' ]%'
,REVERSE(@String)) + PATINDEX('%[^' + @Char + ' ]%',@String)
)))
END
GO
Select dbo.TRIM('"this is a test message"','"')
Reference : http://raresql.com/2013/05/20/sql-server-trim-how-to-remove-leading-and-trailing-charactersspaces-from-string/

I use this:
UPDATE DataImport
SET PRIO =
CASE WHEN LEN(PRIO) < 2
THEN
(CASE PRIO WHEN '""' THEN '' ELSE PRIO END)
ELSE REPLACE(PRIO, '"' + SUBSTRING(PRIO, 2, LEN(PRIO) - 2) + '"',
SUBSTRING(PRIO, 2, LEN(PRIO) - 2))
END

Try this:
SELECT left(right(cast(SampleText as nVarchar),LEN(cast(sampleText as nVarchar))-1),LEN(cast(sampleText as nVarchar))-2)
FROM TableName
