Regular Expression PL/SQL - sql

I am trying to use some regular expressions in PL/SQL.
I want to check that a value matches the following pattern: LOKATIONS_ID => /LAND/ORT/GEBÄUDE/
I've tried this one:
create or replace function check_lok_id(lok_id in varchar2) return boolean
is
begin
  -- expects exactly three /SEGMENT parts followed by a closing slash
  if regexp_like(lok_id, '^(/[A-Z]+?){3}/$')
  then
    return true;
  else
    return false;
  end if;
end;
But unfortunately this one and several other regular expressions I've tested so far don't work.
Any suggestions?

Your example won't match because the umlauted character Ä is not in the range A-Z.
Try this regex:
^(/\w+?){3}/$
Or if you want to match only uppercase letters, but from all languages:
^(/[[:upper:]]+?){3}/$
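For a quick check outside the database, here is a minimal Python sketch of the same idea. It is only an illustration: Python's re module is not Oracle's regex engine (it has no [[:upper:]] class, so the uppercase-only variant is emulated with str.isupper), and the helper names are made up for this example.

```python
import re

# ASCII-only version, comparable to the original [A-Z] attempt
ASCII_UPPER = re.compile(r'(/[A-Z]+){3}/')
# \w version; note that \w also accepts digits, underscores and lowercase letters
WORD_CHARS = re.compile(r'(/\w+){3}/')


def check_lok_id_ascii(lok_id):
    """True only if all three segments consist of A-Z."""
    return ASCII_UPPER.fullmatch(lok_id) is not None


def check_lok_id_word(lok_id):
    """True if all three segments consist of word characters (Unicode-aware)."""
    return WORD_CHARS.fullmatch(lok_id) is not None


def check_lok_id_upper(lok_id):
    """Rough stand-in for [[:upper:]]: three non-empty, uppercase-only segments."""
    parts = lok_id.split('/')
    if len(parts) != 5 or parts[0] != '' or parts[-1] != '':
        return False
    return all(p != '' and all(ch.isupper() for ch in p) for p in parts[1:-1])
```

With '/LAND/ORT/GEBÄUDE/' the ASCII version rejects the value because of the Ä, while the other two accept it.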

Related

How to replace accented characters in Snowflake?

I'm trying to replace accented characters from a column to "normal" characters.
select 'áááããã'
I'd like some operation which would return 'aaaaaa'.
There is a more general way that uses a built-in JavaScript function to replace them:
Remove Diacritics from string in Snowflake
create or replace function REPLACE_DIACRITICS("str" string)
returns string
language javascript
strict immutable
as
$$
return str.normalize("NFD").replace(/\p{Diacritic}/gu, "");
$$;
select REPLACE_DIACRITICS('ö, é, č => a, o e, c');
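To see what the normalize-then-strip approach does without a Snowflake session, here is a small Python sketch of the same technique. It is an approximation, not a port: it drops every character with a non-zero Unicode combining class after NFD decomposition, which may not cover exactly the same set as JavaScript's \p{Diacritic}. The function name is just borrowed from the UDF above.

```python
import unicodedata


def replace_diacritics(text):
    # NFD splits an accented letter into its base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # keep everything that is not a combining mark
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Letters that have no decomposition (for example ø) are left untouched by this approach.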
Just found a solution with one of my colleagues.
select translate('áááããã','áéíóúãõâêôàç','aeiouaoaeoac')
We can also add a lower() to make it generalized for more cases
select translate(lower('ÁÁÁÃÃÃ'),'áéíóúãõâêôàç','aeiouaoaeoac')
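The translate approach is a plain character-for-character mapping, so it only handles the characters you list. A rough Python equivalent, using the same two strings as the query above (the names ACCENTED, PLAIN and translate_accents are invented for this sketch):

```python
# Same source and target characters as in the translate() call above
ACCENTED = 'áéíóúãõâêôàç'
PLAIN = 'aeiouaoaeoac'
TABLE = str.maketrans(ACCENTED, PLAIN)


def translate_accents(text):
    # lower() first, so uppercase accented letters hit the mapping too
    return text.lower().translate(TABLE)
```

Anything outside the mapping, such as ö, passes through unchanged, and the result is always lowercase.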

Detect specific string including wildcards and isolate wildcards in PLPGSQL

Is it possible in PostgreSQL 9.4 (PL/pgSQL) to detect whether a string contains a certain string including wildcards, and to get the characters matched by the wildcards? For example:
IF NEW.my_string CONTAINS 'patternXYZ' THEN
NEW.my_values := getXYZ(my_string)
END IF;
This would result in NEW.my_values containing XYZ (which can be any characters in the string, but only three of them).
SELECT
CASE
WHEN (NEW.my_string like '%patternXYZ%')
THEN substring(NEW.my_string from '+pattern+')
ELSE '00'
END AS data
FROM my_table;
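Here pattern stands for the regular expression, which should be passed in as a parameter to the query. substring(... from ...) returns the part of the string matched by the first parenthesized group of that expression.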
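The detect-and-capture idea can be sketched in Python as follows. This is only an illustration of the regex technique, not PL/pgSQL; the function name, the default prefix 'pattern' and the fixed width of three characters are assumptions taken from the question's example.

```python
import re


def extract_wildcards(text, prefix='pattern', width=3):
    # Treat the prefix literally and capture the `width` characters that follow it
    regex = re.escape(prefix) + '(.{%d})' % width
    match = re.search(regex, text)
    # None plays the role of "no match"; the query above uses '00' instead
    return match.group(1) if match else None
```

The capture group is what isolates the wildcard part, which is the same role the parenthesized group plays in a substring(... from ...) call.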

SQL Regular expression Function

I'm trying to understand the meaning of this regular expression function and its purpose in the select statement.
create or replace FUNCTION REPS_MTCH(string_orig IN VARCHAR2 , string_new IN VARCHAR2, score IN NUMBER)
RETURN PLS_INTEGER AS
BEGIN
IF string_orig IS NULL AND string_new IS NULL THEN
RETURN 0;
ELSIF utl_match.jaro_winkler_similarity(replace(REGEXP_REPLACE(UPPER(string_orig), '[^a-z|A-Z|0-9]+', ''),' ',''),replace(REGEXP_REPLACE(UPPER(string_new), '[^a-z|A-Z|0-9]+', ''),' ','')) >= score THEN
RETURN 1;
ELSE
RETURN 0;
END IF;
END;
The REPS_MTCH function is called in the select statement below, which matches names in the temp table REPS_MTCH_D_STDNT_TMP against the master table REPS_MTCH_D_STDNT_MSTR:
SELECT
REPS_MTCH(REPS_MTCH_D_STDNT_TMP.FIRST_NAME,REPS_MTCH_D_STDNT_MSTR.FIRST_NAME,85) AS first_match_score,
What is the purpose of the REPS_MTCH function in this select statement?
In the above function, REGEXP_REPLACE removes all occurrences of any character that is not alphanumeric or a pipe (|). Each REGEXP_REPLACE call is also wrapped in a redundant call to the regular REPLACE function, which simply removes spaces that REGEXP_REPLACE has already removed. The test could be rewritten as follows and still behave identically, since the inputs are UPPERcased before the replace operations occur:
ELSIF utl_match.jaro_winkler_similarity(
REGEXP_REPLACE(UPPER(string_orig), '[^A-Z|0-9]+', '')
,REGEXP_REPLACE(UPPER(string_new) , '[^A-Z|0-9]+', '')
) >= score
THEN RETURN 1;
I simply removed the extra replace operation, the unnecessary lowercase a-z range and the duplicate pipe (|) character from the regular expressions' character classes.
The JARO_WINKLER_SIMILARITY function just computes a score from 0 (not similar) to 100 (identical) for the remaining alphanumeric and pipe characters. You can check out the Wikipedia entry on Jaro-Winkler distance if you want to know more.
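To convince yourself that the simplified cleanup step gives the same result as the original one, here is a small Python sketch of just the normalization part. It does not compute a Jaro-Winkler score, and the function names are invented for this comparison.

```python
import re


def normalize_original(value):
    # Mirrors replace(REGEXP_REPLACE(UPPER(x), '[^a-z|A-Z|0-9]+', ''), ' ', '')
    return re.sub(r'[^a-z|A-Z|0-9]+', '', value.upper()).replace(' ', '')


def normalize_simplified(value):
    # Mirrors REGEXP_REPLACE(UPPER(x), '[^A-Z|0-9]+', '')
    return re.sub(r'[^A-Z|0-9]+', '', value.upper())
```

Both functions strip spaces and punctuation and keep letters, digits and the pipe, so names such as "O'Brien, Jr. 2" end up as "OBRIENJR2" either way.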

How to remove part of the string in oracle

Input data:
abcdef_fhj_viji.dvc
Expected output:
fhj_viji.dvc
The part to be trimmed is not constant.
Use the REPLACE function (this only helps when the part to remove is known in advance):
Select REPLACE('abcdef_fhj_viji.dvc','abcdef_','') from dual
If you want this query for your table:
Select REPLACE(column,'abcdef_','') from myTable
For an update:
UPDATE myTable
SET column = REPLACE(column,'abcdef_','')
select substr('abcdef_fhj_viji.dvc',instr('abcdef_fhj_viji.dvc','_')+1) from dual
So it all depends on the INSTR function: tell it which position to start from and which occurrence to look for, and it returns the index, which you then pass to SUBSTR to get your string.
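The same find-the-separator-then-cut logic, sketched in Python for illustration (the function name is made up, and the separator is assumed to be an underscore as in the sample data):

```python
def after_first(text, sep='_'):
    # find() returns -1 when the separator is missing
    pos = text.find(sep)
    # like substr(s, instr(s, '_') + 1), return everything after the first separator;
    # if there is no separator, the whole string comes back unchanged
    return text[pos + 1:] if pos != -1 else text
```

The Oracle expression behaves the same way when there is no underscore, because INSTR returns 0 and SUBSTR then starts at position 1.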
Since you didn't give a lot of information I'm gonna assume some.
Let's assume you want a prefix of some string to be deleted. A good way to do that is by using Regular Expressions. There's a function called regexp_replace, that can find a substring of a string, depending on a pattern, and replace it with a different string. In PL/SQL you could write yourself a function using regexp_replace, like this:
function deletePrefix(stringName in varchar2) return varchar2 is
begin
return regexp_replace(stringName, '^[a-zA-Z]+_', '');
end;
or just use this in plain sql like:
regexp_replace(stringName, '^[a-zA-Z]+_', '');
stringName being the string you want to process, and the ^[a-zA-Z]+_ part depending on what characters the prefix includes. Here I only included upper- and lowercase letters.
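If you want to try the pattern before putting it into a query, here is a minimal Python sketch with the same regex. Python's re module is used here only for a quick check; the name delete_prefix mirrors the function above.

```python
import re


def delete_prefix(string_name):
    # Remove a leading run of letters followed by one underscore, if present
    return re.sub(r'^[a-zA-Z]+_', '', string_name)
```

Because the character class contains only letters, the match stops at the first underscore, so only the first segment is removed. A prefix containing digits would not be removed unless you extend the class.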

simple parameter substitution in regexp_matches postgreSQL function

I have a table with a structure like this...
the_geom data
geom1 data1+3000||data2+1000||data3+222
geom2 data1+500||data2+900||data3+22232
I want to create a function that returns the records by user request.
Example: for data2, retrieve geom1,1000 and geom2, 900
So far I have created this function (see below), which works quite well, but I am facing a parameter substitution problem: I am not able to substitute $1 for 'data2' in the following expression, although I can use $1 later on:
regexp_matches(t::text, E'(data2[\+])([0-9]+)'::text)::text)[2]::integer
MY FUNCTION
create or replace function get_counts(taxa varchar(100))
returns setof record
as $$
SELECT t2.counter,t2.the_geom
FROM (
SELECT (regexp_matches(t.data::text, E'(data2[\+])([0-9]+)'::text)::text)[2]::integer as counter,the_geom
from (select the_geom,data from simple_inpn2 where data ~ $1::text) as t
) t2
$$
language sql;
SELECT get_counts('data2') will work, but we should be able to make this substitution:
regexp_matches(t::text, E'($1... instead of E'(data2....
I think it's more of a syntax issue, as the function executes without error; it just interprets $1 as a string and gives no result.
thanks in advance,
E'$1' is a string literal (using the escape string syntax) containing a dollar sign followed by a one. An unquoted $1 is the first parameter to your function. So this:
(regexp_matches(t, E'($1[\+])([0-9]+)'))[2]::integer
as you've found, won't interpolate the $1 with the function's first parameter.
The regex is just a string, a string with an internal structure but still just a string. If you know that $1 will be a normal word then you could say:
(regexp_matches(t, E'(' || $1 || E'[\+])([0-9]+)'))[2]::integer
to paste your strings together into a suitable regex. However, it is better to be a little paranoid: sooner or later someone is going to call your function with a string like 'ha ha (', so you should be prepared for it. The easiest way that I can think of to add an arbitrary string to a regex is to escape all the non-word characters:
-- Don't forget to escape the escaped escapes! Hence all the backslashes.
str := regexp_replace($1, E'(\\W)', E'\\\\\\1', 'g');
and then paste str into the regex as above:
(regexp_matches(t, E'(' || str || E'[\+])([0-9]+)'))[2]::integer
or better, build the regex outside the regexp_matches to cut down on the nested parentheses:
re := E'(' || str || E'[\+])([0-9]+)';
-- ...
select (regexp_matches(t, re))[2]::integer ...
PostgreSQL doesn't have Perl's \Q...\E and the (?q) metasyntax applies until the end of the regex so I can't think of any better way to paste an arbitrary string into the middle of a regex as a non-regex literal value than to escape everything and let PostgreSQL sort it out.
Using this technique, we can do things like:
=> do $$
declare
m text[];
s text;
r text;
begin
s = E'''{ha)?';
r = regexp_replace(s, E'(\\W)', E'\\\\\\1', 'g');
r = '(ha' || r || ')';
raise notice '%', r;
select regexp_matches(E'ha''{ha)?', r) into m;
raise notice '%', m[1];
end$$;
and get the expected
NOTICE: ha'{ha)?
output. But if you leave out the regexp_replace escaping step, you'll just get an
invalid regular expression: parentheses () not balanced
error.
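The escape-then-concatenate technique is not specific to PostgreSQL. Here is a minimal Python sketch of the same steps, applied to the sample data from the question. It is an illustration only: Python's regex flavour is not PostgreSQL's, and the names escape_non_word and get_count are invented for this example.

```python
import re


def escape_non_word(text):
    # Same idea as regexp_replace($1, E'(\\W)', E'\\\\\\1', 'g'):
    # put a backslash in front of every non-word character
    return re.sub(r'(\W)', r'\\\1', text)


def get_count(data, taxa):
    # Build the regex from the escaped parameter, then read the second group
    pattern = '(' + escape_non_word(taxa) + r'[+])([0-9]+)'
    match = re.search(pattern, data)
    return int(match.group(2)) if match else None
```

With the row 'data1+3000||data2+1000||data3+222' and the parameter 'data2' this yields 1000, and a hostile parameter such as 'ha ha (' simply finds nothing instead of raising a regex syntax error.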
As an aside, I don't think you need all that casting so I removed it. The regexes and escaping are noisy enough, there's no need to throw a bunch of colons into the mix. Also, I don't know what your standard_conforming_strings is set to or which version of PostgreSQL you're using so I've gone with E'' strings everywhere. You'll also want to switch your procedure to PL/pgSQL (language plpgsql) to make the escaping easier.