Extract all object names from PL/SQL file in Notepad++ - sql

I need to get the function, procedure, cursor names and other objects from a PL/SQL package body file (*.spb) in Notepad++, for example from this sql script:
create or replace PACKAGE BODY pac_emp3 AS
PROCEDURE p_buscar_salario_emp3 (p_employee_id IN employees.employee_id%TYPE,
p_employee_name OUT employees.first_name%type,
p_string IN OUT varchar2)
AS
v_salario employees.salary%TYPE;
BEGIN
SELECT salary, first_name INTO v_salario, p_employee_name FROM employees WHERE employees.employee_id = p_employee_id;
p_string := 'Procedimiento terminado';
DBMS_OUTPUT.PUT_LINE('Salario: '|| v_salario);
END p_buscar_salario_emp3;
FUNCTION f_foo RETURN NUMBER IS
SELECT 1+1 FROM DUAL;
RETURN 1;
END;
END pac_emp3;
In this case, I need to extract only:
PROCEDURE p_buscar_salario_emp3
or have the text show only the object type and the object name:
PROCEDURE p_buscar_salario_emp3
FUNCTION f_foo
Same with FUNCTION names, etc.
I understand that this is possible with a regular expression, but which regex should I use?

Ctrl+H
Find what: (?:\A|\G)(?:(?!(?:PROCEDURE|FUNCTION)).)*((?:PROCEDURE|FUNCTION)\s+\w+)(?:(?!(?:PROCEDURE|FUNCTION)).)*
Replace with: $1\n or $1\r\n
check Wrap around
check Regular expression
check . matches newline
Replace all
Explanation:
(?:\A|\G) # non capture group, beginning of string or restart from last match position
(?: # start non capture group
(?! # start negative lookahead
(?:PROCEDURE|FUNCTION) # non capture group PROCEDURE or FUNCTION (you can add other keywords)
) # end lookahead
. # any character
)* # end group, may appear 0 or more times
( # start group 1
(?:PROCEDURE|FUNCTION) # non capture group PROCEDURE or FUNCTION (you can add other keywords)
\s+ # 1 or more spaces
\w+ # 1 or more word character
) # end group 1
(?: # start non capture group
(?! # start negative lookahead
(?:PROCEDURE|FUNCTION) # non capture group PROCEDURE or FUNCTION (you can add other keywords)
) # end lookahead
. # any character
)* # end group, may appear 0 or more times
Replacement:
$1 # content of group 1
\n # linefeed, use \r\n for windows line break
Result for given example:
PROCEDURE p_buscar_salario_emp3
FUNCTION f_foo
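Outside Notepad++, the same capture-group idea can be sketched in a script. The following is a minimal illustration, assuming Python's re module rather than Notepad++'s regex engine; the sample text is shortened from the question and the helper name extract_object_names is hypothetical.

```python
import re

# Sample package body, shortened from the question (assumption: illustrative only).
source = """create or replace PACKAGE BODY pac_emp3 AS
PROCEDURE p_buscar_salario_emp3 (p_employee_id IN employees.employee_id%TYPE)
AS
BEGIN
NULL;
END p_buscar_salario_emp3;
FUNCTION f_foo RETURN NUMBER IS
BEGIN
RETURN 1;
END;
END pac_emp3;
"""

# Same shape as group 1 of the pattern above: keyword, whitespace, identifier.
# Add further keywords to the alternation as needed.
OBJECT_DECL = re.compile(r"\b(PROCEDURE|FUNCTION)\s+(\w+)")


def extract_object_names(text):
    """Return 'KEYWORD name' for every declaration found, in file order."""
    return [f"{kind} {name}" for kind, name in OBJECT_DECL.findall(text)]


names = extract_object_names(source)
print("\n".join(names))
```

Collecting the matches directly avoids having to consume the text between declarations, which is what the two negative-lookahead groups do in the replace-based version.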

This regex should work for you.
((PROCEDURE|FUNCTION) \S+)
If you need to add more terms, type them in like so:
((PROCEDURE|FUNCTION|NEW_TERM) \S+)

Related

Oracle sql REGEXP_REPLACE expression to replace a number in a string matching a pattern

I have a string 'ABC.1.2.3'
I wish to replace the middle number with 1.
Input 'ABC.1.2.3'
Output 'ABC.1.1.3'
Input 'XYZ.2.2.1'
Output 'XYZ.2.1.1'
That is, replace the number after the second occurrence of '.' with 1.
I know my pattern is wrong; the SQL that I have at the moment is:
select REGEXP_REPLACE ('ABC.1.2.8', '(\.)', '.1.') from dual;
You can use capturing groups to refer to surrounding numbers in replacement string later:
select REGEXP_REPLACE ('ABC.1.2.8', '([0-9])\.[0-9]+\.([0-9])', '\1.1.\2') from dual;
You could use
^([^.]*\.[^.]*\.)\d+(.*)
See a demo on regex101.com.
This is:
^ # start of the string
([^.]*\.[^.]*\.) # capture anything including the second dot
\d+ # 1+ digits
(.*) # the rest of the string up to the end
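This is replaced by
$11$2
Note that some engines read $11 as group 11; there, write ${1}1$2 instead. In Oracle's REGEXP_REPLACE the equivalent replacement string is '\11\2', because backreferences only range from \1 to \9.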
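To see how the two groups and the replacement fit together, here is a small sketch in Python. It assumes Python's re module, not Oracle's regex engine, and the function name set_middle_number is hypothetical.

```python
import re

# Group 1: everything up to and including the second dot.
# Group 2: whatever follows the digits that get replaced.
PATTERN = re.compile(r"^([^.]*\.[^.]*\.)\d+(.*)")


def set_middle_number(value):
    """Replace the number after the second dot with 1."""
    # \g<1> keeps the reference to group 1 separate from the literal 1 after it.
    return PATTERN.sub(r"\g<1>1\2", value)


print(set_middle_number("ABC.1.2.3"))
print(set_middle_number("XYZ.2.2.1"))
```

A string with fewer than two dots does not match the pattern and is returned unchanged.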

Get string after '/' character

I want to extract the string after the character '/' in a PostgreSQL SELECT query.
The field name is source_path, table name is movies_history.
Data Examples:
Values for source_path:
184738/file1.mov
194839/file2.mov
183940/file3.mxf
118942/file4.mp4
And so forth. All the values for source_path are in this format
random_number/filename.xxx
I need to get 'file.xxx' string only.
If your case is that simple (exactly one / in the string) use split_part():
SELECT split_part(source_path, '/', 2) ...
If there can be multiple /, and you want the string after the last one, a simple and fast solution would be to process the string backwards with reverse(), take the first part, and reverse() again:
SELECT reverse(split_part(reverse(source_path), '/', 1)) ...
Or you could use the more versatile (and more expensive) substring() with a regular expression:
SELECT substring(source_path, '[^/]*$') ...
Explanation:
[...] .. encloses a list of characters to form a character class.
[^...] .. if the list starts with ^ it's the inversion (all characters not in the list).
* .. quantifier for 0-n times.
$ .. anchor to end of string.
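The difference between the three approaches shows up as soon as a path contains more than one /. The sketch below mirrors them in Python purely for illustration; it assumes Python string methods and the re module, and the function names are hypothetical.

```python
import re


def second_field(path):
    """Like split_part(path, '/', 2): the second '/'-separated field, '' if absent."""
    parts = path.split("/")
    return parts[1] if len(parts) > 1 else ""


def after_last_slash(path):
    """Same result as the reverse() / split_part() / reverse() trick."""
    return path.rsplit("/", 1)[-1]


def after_last_slash_regex(path):
    """[^/]*$ : the run of non-slash characters anchored at the end of the string."""
    return re.search(r"[^/]*$", path).group(0)


for sample in ("184738/file1.mov", "a/b/c.mov"):
    print(sample, second_field(sample), after_last_slash(sample), after_last_slash_regex(sample))
```

With exactly one / all three agree; with several, only the last two return the final component.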
You need to use the substring function:
SELECT substring('1245487/filename.mov' from '%/#"%#"%' for '#');
Explanation:
%/
This means % (any text) followed by a /
#"%#"
Each # is the escape character defined in the last part, for '#', and it needs an additional " after it
So you have <placeholder> % <placeholder>, and the function returns whatever is matched between the two placeholders. In this case that is %, i.e. the rest of the string after the /
FINAL QUERY:
SELECT substring(source_path from '%/#"%#"%' for '#')
FROM movies_history;
You can use the split_part() string function.
Syntax: split_part(string, delimiter, position)
string: the text to split, for example '2022-06-12'
Note: it could also be '#ertl/eitd/record_4', etc.
delimiter: the separator text; for the examples above, '-' or '/'
position: the 1-based index of the part to return
How it works: the string is split into parts at each occurrence of the delimiter,
e.g. for '2022-06-12': position 1 is 2022, position 2 is 06, position 3 is 12,
so the position selects which part is returned.
Based on your example:
split_part('random_number/filename.xxx', '/', 2)
Output: filename.xxx

SQL Regular expression Function

I'm trying to understand the meaning of this regular expression function and its purpose in the select statement.
create or replace FUNCTION REPS_MTCH(string_orig IN VARCHAR2 , string_new IN VARCHAR2, score IN NUMBER)
RETURN PLS_INTEGER AS
BEGIN
IF string_orig IS NULL AND string_new IS NULL THEN
RETURN 0;
ELSIF utl_match.jaro_winkler_similarity(replace(REGEXP_REPLACE(UPPER(string_orig), '[^a-z|A-Z|0-9]+', ''),' ',''),replace(REGEXP_REPLACE(UPPER(string_new), '[^a-z|A-Z|0-9]+', ''),' ','')) >= score THEN
RETURN 1;
ELSE
RETURN 0;
END IF;
END;
The REPS_MTCH function is called in this select statement, which matches names in the temp table REPS_MTCH_D_STDNT_TMP against the master table REPS_MTCH_D_STDNT_MSTR:
SELECT
REPS_MTCH(REPS_MTCH_D_STDNT_TMP.FIRST_NAME,REPS_MTCH_D_STDNT_MSTR.FIRST_NAME,85) AS first_match_score,
what is the purpose of the REPS_MTCH function in this select statement?
In the above function, REGEXP_REPLACE removes every character that is not alphanumeric or a pipe (|). That call is then wrapped in a redundant call to the regular REPLACE function, which only removes spaces that REGEXP_REPLACE has already removed. Because the inputs are UPPERcased before the replace operations occur, the test could be rewritten as follows and still behave identically:
ELSIF utl_match.jaro_winkler_similarity(
REGEXP_REPLACE(UPPER(string_orig), '[^A-Z|0-9]+', '')
,REGEXP_REPLACE(UPPER(string_new) , '[^A-Z|0-9]+', '')
) >= score
THEN RETURN 1;
I simply removed the extra replace operation, the unnecessary lower case a-z and the duplicate pipe (|) character from the regular expression's character classes.
The JARO_WINKLER_SIMILARITY function just computes a score from 0 (not similar) to 100 (identical) for the remaining alphanumeric and pipe characters. You can check out the Wikipedia entry on Jaro-Winkler distance if you want to know more. REPS_MTCH therefore returns 1 when the two normalized names score at least the given threshold (85 in your select statement) and 0 otherwise.
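The claim that the two forms normalize names identically can be checked with a short sketch. It assumes Python's re module and str.upper() as stand-ins for Oracle's REGEXP_REPLACE and UPPER (which may differ for non-ASCII text); the function names and sample names are hypothetical, and the similarity scoring itself is not reproduced.

```python
import re

# Character classes copied from the original function and from the simplified test.
ORIGINAL = re.compile(r"[^a-z|A-Z|0-9]+")
SIMPLIFIED = re.compile(r"[^A-Z|0-9]+")


def normalize_original(name):
    """Uppercase, strip via the original class, then the redundant space removal."""
    return ORIGINAL.sub("", name.upper()).replace(" ", "")


def normalize_simplified(name):
    """Uppercase, then strip everything that is not A-Z, 0-9 or a pipe."""
    return SIMPLIFIED.sub("", name.upper())


samples = ["Mary-Ann O'Neil", "  jean|luc 2 ", "J. R. R. Tolkien"]
for name in samples:
    print(repr(name), "->", normalize_simplified(name))
```

Both functions return the same string for each sample, so only the normalized text reaches the similarity comparison.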

PostgreSQL function to select max values of split record

I have a number of tables 'App_build', 'Server_build' with a column called 'buildid' and it contains a large number of records. I.e.:
buildid
-----------
Application1_BLD_01
Application1_BLD_02
Application1_BLD_03
Application2_BLD_01
Application3_BLD_01
Application3_BLD_02
Application4_1_0_0_1 - old format to be disregarded
Application4_1_0_0_2
Application4_BLD_03
I want to write a function called getmax(tablename) i.e. getmax('App_build')
which will return a recordset which lists the highest values only. I.e:
buildid
--------
Application1_BLD_03
Application2_BLD_01
Application3_BLD_02
Application4_BLD_03
I am new to SQL so am not sure how to start - I guess I can use a split command and then the MAX function but I have no idea where to start.
Any help will be great.
Assuming current version PostgreSQL 9.2 for lack of information.
Plain SQL
The simple query could look like this:
SELECT max(buildid)
FROM app_build
WHERE buildid !~ '\d+_\d+_\d+_\d+$' -- to exclude old format
GROUP BY substring(buildid, '^[^_]+')
ORDER BY substring(buildid, '^[^_]+');
The WHERE condition uses a regular expression:
buildid !~ '\d+_\d+_\d+_\d+$'
Excludes buildid that end in 4 integer numbers divided by _.
\d .. character class shorthand for digits. Only one backslash \ in modern PostgreSQL with standard_conforming_strings = ON.
+ .. 1 or more of preceding atom.
$ .. As last character: anchored to the end of the string.
There may be a cheaper / more accurate way, you did not properly specify the format.
GROUP BY and ORDER BY extract the string before the first occurrence of _ with substring() and use it as the app name to group and order by. The regexp explained:
^ .. As first character: anchor search expression to start of string.
[^_] .. Character class: any character that is not _.
Does the same as split_part(buildid, '_', 1). But split_part() may be faster ..
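The filter-then-group logic of the query can be traced on the sample data with a short sketch. It assumes Python's re module and plain string comparison in place of PostgreSQL's regex engine and collation-aware max(); the function name get_max simply mirrors the name requested in the question.

```python
import re

OLD_FORMAT = re.compile(r"\d+_\d+_\d+_\d+$")  # ends in 4 integers separated by _
APP_NAME = re.compile(r"^[^_]+")              # text before the first _


def get_max(build_ids):
    """Return the highest buildid per app name, skipping old-format ids."""
    best = {}
    for build_id in build_ids:
        if OLD_FORMAT.search(build_id):
            continue  # same effect as the WHERE ... !~ ... condition
        match = APP_NAME.match(build_id)
        if match is None:
            continue  # no app name before the first _ (not covered by the sample data)
        app = match.group(0)
        if app not in best or build_id > best[app]:
            best[app] = build_id
    return [best[app] for app in sorted(best)]


builds = [
    "Application1_BLD_01", "Application1_BLD_02", "Application1_BLD_03",
    "Application2_BLD_01", "Application3_BLD_01", "Application3_BLD_02",
    "Application4_1_0_0_1", "Application4_1_0_0_2", "Application4_BLD_03",
]
print(get_max(builds))
```

Because the comparison is textual, this relies on the zero-padded build numbers shown in the question, just as max(buildid) does.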
Function
If you want to write a function where the table name is variable, you need dynamic SQL. That is a plpgsql function with EXECUTE:
CREATE OR REPLACE FUNCTION getmax(_tbl regclass)
RETURNS SETOF text AS
$func$
BEGIN
RETURN QUERY
EXECUTE format($$
SELECT max(buildid)
FROM %s
WHERE buildid !~ '\d+_\d+_\d+_\d+$'
GROUP BY substring(buildid, '^[^_]+')
ORDER BY substring(buildid, '^[^_]+')$$, _tbl);
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM getmax('app_build');
Or if you are, in fact, using mixed case identifiers:
SELECT * FROM getmax('"App_build"');
More info on the object identifier class regclass in this related question:
Table name as a PostgreSQL function parameter
What you want is a groupwise_max. It can be done with MAX() but the usual way is left join:
SELECT b1.buildid
FROM builds AS b1
LEFT JOIN builds AS b2 ON
split_part(b1.buildid, '_', 1)=split_part(b2.buildid, '_', 1)
AND
split_part(b1.buildid, '_', 3)::int<split_part(b2.buildid, '_', 3)::int
WHERE b2.buildid IS NULL;
But since you're using PG it can be done with DISTINCT ON ()
SELECT DISTINCT ON (split_part(buildid, '_', 1)) buildid
FROM builds
ORDER BY split_part(buildid, '_', 1),split_part(buildid, '_', 3)::int DESC
http://sqlfiddle.com/#!12/308bf/9

simple parameter substitution in regexp_matches postgreSQL function

I have a table with a structure like this...
the_geom data
geom1 data1+3000||data2+1000||data3+222
geom2 data1+500||data2+900||data3+22232
I want to create a function that returns the records by user request.
Example: for data2, retrieve geom1,1000 and geom2, 900
So far I have created this function (see below), which works quite well, but I am facing a parameter substitution problem: as you can see, I am not able to substitute $1 for 'data2' in the expression below, although I can use $1 later in the function.
(regexp_matches(t::text, E'(data2[\+])([0-9]+)'::text)::text)[2]::integer
MY FUNCTION
create or replace function get_counts(taxa varchar(100))
returns setof record
as $$
SELECT t2.counter,t2.the_geom
FROM (
SELECT (regexp_matches(t.data::text, E'(data2[\+])([0-9]+)'::text)::text)[2]::integer as counter,the_geom
from (select the_geom,data from simple_inpn2 where data ~ $1::text) as t
) t2
$$
language sql;
SELECT get_counts('data2') will work **but we should be able to make this substitution**:
regexp_matches(t::text, E'($1... instead of E'(data2....
I think it's more of a syntax issue, as the function execution gives no error; it just interprets $1 as a string and gives no result.
thanks in advance,
A E'$1' is a string literal (using the escape string syntax) containing a dollar sign followed by a one. An unquoted $1 is the first parameter to your function. So this:
(regexp_matches(t, E'($1[\+])([0-9]+)'))[2]::integer
as you've found, won't interpolate the $1 with the function's first parameter.
The regex is just a string, a string with an internal structure but still just a string. If you know that $1 will be a normal word then you could say:
(regexp_matches(t, E'(' || $1 || E'[\+])([0-9]+)'))[2]::integer
to paste your strings together into a suitable regex. However, it is better to be a little paranoid, sooner or later someone is going to call your function with a string like 'ha ha (' so you should be prepared for it. The easiest way that I can think of to add an arbitrary string to a regex is to escape all the non-word characters:
-- Don't forget to escape the escaped escapes! Hence all the backslashes.
str := regexp_replace($1, E'(\\W)', E'\\\\\\1', 'g');
and then paste str into the regex as above:
(regexp_matches(t, E'(' || str || E'[\+])([0-9]+)'))[2]::integer
or better, build the regex outside the regexp_matches to cut down on the nested parentheses:
re := E'(' || str || E'[\+])([0-9]+)';
-- ...
select (regexp_matches(t, re))[2]::integer ...
PostgreSQL doesn't have Perl's \Q...\E and the (?q) metasyntax applies until the end of the regex so I can't think of any better way to paste an arbitrary string into the middle of a regex as a non-regex literal value than to escape everything and let PostgreSQL sort it out.
Using this technique, we can do things like:
=> do $$
declare
m text[];
s text;
r text;
begin
s = E'''{ha)?';
r = regexp_replace(s, E'(\\W)', E'\\\\\\1', 'g');
r = '(ha' || r || ')';
raise notice '%', r;
select regexp_matches(E'ha''{ha)?', r) into m;
raise notice '%', m[1];
end$$;
and get the expected
NOTICE: ha'{ha)?
output. But if you leave out the regexp_replace escaping step, you'll just get an
invalid regular expression: parentheses () not balanced
error.
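The escape-then-concatenate technique is not specific to PostgreSQL. As a rough illustration, here it is in Python, assuming Python's re module; the helper names escape_non_word and get_count are hypothetical, and the sample rows are taken from the question.

```python
import re


def escape_non_word(text):
    """Backslash-escape every non-word character, as the regexp_replace call above does."""
    return re.sub(r"(\W)", r"\\\1", text)


def get_count(data, key):
    """Return the integer after 'key+' in data, or None when key is not present."""
    pattern = "(" + escape_non_word(key) + r"[+])([0-9]+)"
    match = re.search(pattern, data)
    return int(match.group(2)) if match else None


print(get_count("data1+3000||data2+1000||data3+222", "data2"))
print(get_count("data1+500||data2+900||data3+22232", "data2"))
# A hostile key is treated as literal text instead of breaking the regex.
print(get_count("data1+3000||data2+1000||data3+222", "ha ha ("))
```

Python also ships re.escape() for this purpose; the manual version is shown only because it parallels the SQL above.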
As an aside, I don't think you need all that casting so I removed it. The regexes and escaping are noisy enough, there's no need to throw a bunch of colons into the mix. Also, I don't know what your standard_conforming_strings is set to or which version of PostgreSQL you're using so I've gone with E'' strings everywhere. You'll also want to switch your procedure to PL/pgSQL (language plpgsql) to make the escaping easier.