Replacing String that have multiple commas - sql

I have data like this in the below column
ColA
a,b,,|c,d,e,f|l,m,,|p,q,r,s
So here I want to remove the complete data seperated by Pipe(|) if it has two consecutive commas ,,.
My output should be,
ColA
c,d,e,f|p,q,r,s
Pls help with the query.

I'll be using regexp_replace with the following example
SELECT
REGEXP_REPLACE('a,b,,|c,d,e,f|l,m,,|p,q,r,s', --source column
'.{1,3}(,,\|)', --find pattern
'', --replace with null
1, --start with position number
0, --occurance
'i') --regex match parameter
FROM dual;

Similar approach, but more scenarios addressed:
SELECT REGEXP_REPLACE('a,b,,|c,d,e,f|l,m,,|p,q,r,s', '^[^|]*,{2,3}[^|]*\|*|\|[^|]*,{2,3}[^|]*' )
FROM dual;
Components of the replacement regular expression (seperated by alternation operator)
^[^|]*,{2,3}[^|]*\|* left end or only one entry
or
\|[^|]*,{2,3}[^|]* center (always take off left pipe)
explanation:
^ is an anchor for the start of the string
[^|] is a non-pipe character
* is the 0, n quantifier
{2,3} explicit quantifier match of 2 or 3 times
| alternation operator (or)

Related

How to return a substring after a colon and before a whitespace in Oracle SQL

I need to get a substring from a table column that is after a colon and before a whitespace. The length of the substring can vary, but the length of the data before the colon and after the whitespace is constant.
So the data in my table column named "Subject" consists of 5 words, immediately followed by a colon, immediately followed by the substring I need (which can vary in length), followed by a whitespace and a date. The substring I need is a course name. Examples:
Payment Due for Upcoming Course:FIN/370T 11/26/2019
Payment Due for Upcoming Course:BUS/475 11/26/2019
Payment Due for Upcoming Course:ADMIN9/475TG 11/26/2019
I have tried using REGEXP function with REGEXP_SUBSTR(COLUMN_NAME,'[^:]+$') to get everything after the colon, and REGEXP_SUBSTR(COLUMN_NAME, '[^ ]+' , 1 , 5 ) to get data before the last whitespace, but I need to combine them.
I have tried the following:
select
REGEXP_SUBSTR(SUBJECT,'[^:]+$') COURSE_ID
from TABLE
Result:
FIN/370T 11/26/2019
and this:
select
REGEXP_SUBSTR (SUBJECT, '[^ ]+' , 1 , 5 ) COURSE_ID2
from TABLE
Result:
Course:FIN/370T
I need the output to return FIN/370T
In short use:
select regexp_replace(str,'(.*:)(.*)( )(.*)$','\2') as short_course_id
from tab
I prefer regexp_replace, because there are more possibilities to extract part of strings.
If you don't want to mess with regex, you can use a combo of substr and instr.
select
substr(part1,1,instr(part1, ' ',-1,1) ) as course,
part1
from (
select
substr(<your column>,instr(<your column>,':',1,1) +1) as part1
from
<your table>
) t
Fiddle
One option would be
select replace(regexp_substr(str,'[^:]+$'),
regexp_substr(str,'[^:][^ ]+$'),'') as course_id
from tab
Demo
where first regexp_substr() extracts the substring starting from the colon to the end, and the second one from the last space to the end.

replace all occurrences of a sub string between 2 charcters using sql

Input string: ["1189-13627273","89-13706681","118-13708388"]
Expected Output: ["14013627273","14013706681","14013708388"]
What I am trying to achieve is to replace any numbers till the '-' for each item with hard coded text like '140'
SELECT replace(value_to_replace, '-', '140')
FROM (
VALUES ('1189-13627273-77'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
check this
I found the right way to achieve that using the below regular expression.
SELECT REGEXP_REPLACE (string_to_change, '\\"[0-9]+\\-', '140')
You don't need a regexp for this, it's as easy as concatenation of 140 and the substring from - (or the second part when you split by -)
select '140'||substring('89-13706681' from position('-' in '89-13706681')+1 for 1000)
select '140'||split_part('89-13706681','-',2)
also, it's important to consider if you might have instances that don't contain - and what would be the output in this case
Use regexp_replace(text,text,text) function to do so giving the pattern to match and replacement string.
First argument is the value to be replaced, second is the POSIX regular expression and third is a replacement text.
Example
SELECT regexp_replace('1189-13627273', '.*-', '140');
Output: 14013627273
Sample data set query
SELECT regexp_replace(value_to_replace, '.*-', '140')
FROM (
VALUES ('1189-13627273'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
Caution! Pattern .*- will replace every character until it finds last occurence of - with text 140.

Oracle SQL - select parts of a string

How can I select abcdef.txt from the following string?
abcdef.123.txt
I only know how to select abcdef by doing select substr('abcdef.123.txt',1,6) from dual;
You can using || for concat and substr -3 for right part
select substr('abcdef.123.txt',1,6) || '.' ||substr('abcdef.123.txt',-3) from dual;
or avoiding a concat (like suggested by Luc M)
select substr('abcdef.123.txt',1,7) || substr('abcdef.123.txt',-3) from dual;
A general solution, assuming the input string has exactly two periods . and you want to extract the first and third tokens, separated by one . The length of the "tokens" in the input string can be arbitrary (including zero!) and they can contain any characters other than .
select regexp_replace('abcde.123.xyz', '([^.]*).([^.]*).([^.]*)', '\1.\3') as result
from dual;
RESULT
---------
abcde.xyz
Explanation:
[ ] means match any of the characters between brackets.
^
means do NOT match the characters in the brackets - so...
[^.]
means match any character OTHER THAN .
* means match zero or
more occurrences, as many as possible ("greedy" match)
( ... ) is called a subexpression... see below
'\1.\3 means replace the original string
with the first subexpression, followed by ., followed by the THIRD
subexpression.
Replace the substring of anything surrounded by dots (inclusive) with a single dot. No dependence on lengths of components of the string:
SQL> select regexp_replace('abcdef.123.txt', '\..*\.', '.') fixed
from dual;
FIXED
----------
abcdef.txt
SQL>

Postgresql : Pattern matching of values starting with "IR"

If I have table contents that looks like this :
id | value
------------
1 |CT 6510
2 |IR 52
3 |IRAB
4 |IR AB
5 |IR52
I need to get only those rows with contents starting with "IR" and then a number, (the spaces ignored). It means I should get the values :
2 |IR 52
5 |IR52
because it starts with "IR" and the next non space character is an integer. unlike IRAB, that also starts with "IR" but "A" is the next character. I've only been able to query all starting with IR. But other IR's are also appearing.
select * from public.record where value ilike 'ir%'
How do I do this? Thanks.
You can use the operator ~, which performs a regular expression matching.
e.g:
SELECT * from public.record where value ~ '^IR ?\d';
Add a asterisk to perform a case insensitive matching.
SELECT * from public.record where value ~* '^ir ?\d';
The symbols mean:
^: begin of the string
?: the character before (here a white space) is optional
\d: all digits, equivalent to [0-9]
See for more info: Regular Expression Match Operators
See also this question, very informative: difference-between-like-and-in-postgres

Remove last two characters from each database value

I run the following query:
select * from my_temp_table
And get this output:
PNRP1-109/RT
PNRP1-200-16
PNRP1-209/PG
013555366-IT
How can I alter my query to strip the last two characters from each value?
Use the SUBSTR() function.
SELECT SUBSTR(my_column, 1, LENGTH(my_column) - 2) FROM my_table;
Another way using a regular expression:
select regexp_replace('PNRP1-109/RT', '^(.*).{2}$', '\1') from dual;
This replaces your string with group 1 from the regular expression, where group 1 (inside of the parens) includes the set of characters after the beginning of the line, not including the 2 characters just before the end of the line.
While not as simple for your example, arguably more powerful.