String operations with regexp - sql

I have the following code:
DECLARE
  r_blob_data        VARCHAR2(4000);
  r_length           NUMBER;
  r_payor_identifer  VARCHAR2(100);
  clp07_ctx          NUMBER;
BEGIN
  r_blob_data := 'ISA*00* *00* *ZZ*HCA835EOB *ZZ*BOFAECSUSO *170726*1513*U*00401*835013201*1*P*~\GS*HP*HCA835EOB*BOFAECSUSO*20170726*1513*13201*X*004010X091A1\ST*835*000000547\BPR*C*7.01*C*ACH*CTX*01*061000052*DA*3299975294*3621796494**01*122105744*DA*2500335303*20170728\NTE*ZZZ*SEE OUR WEBSITE #HEALTHCHOICEAZ.COM FOR INFORMATION ON CLAIMS DISPUTERESOLUTION\\DTM*405*20170728\N1*PR*HEALTH CHOICE ARIZONA\N3*410 NORTH 44TH ST, SUITE 900\N4*PHOENIX*AZ*850080000\N1*PE*FLAGSTAFF MEDICAL CT DBA*FI*860110232\N3*PO BOX 29435\N4*PHOENIX*AZ*850389435\LX*1\CLP*10906801*1*22*7.01*0*HM*719281738*22\CLP*10906802*1*23*8.01*0*HM*719281739*22\NM1*QC*100080033001*CLAIRITY*MICHAEL*E***MR*A77240030\NM1*82*2*FLAGSTAFF MEDICAL CT DBA TRAUM*****F*86-0110232\REF*1W*A77240030\AMT*AU*7.01\SVC*HC|93010*22*7.01**1\DTM*472*20170616\CAS*CO*45*14.58\AMT*B6*7.01\SE*22*000000547\GE*1*13201\IEA*1*835013201\ ';
  SELECT regexp_count(r_blob_data, 'CLP')
    INTO r_length
    FROM dual;
  dbms_output.put_line('Number of CLP = ' || r_length);

  FOR i IN 1 .. r_length
  LOOP
    SELECT instr(r_blob_data, 'CLP')
      INTO clp07_ctx
      FROM dual;
    dbms_output.put_line('Clp07_ctx = ' || clp07_ctx);
    r_payor_identifer := substr(r_blob_data,
                                instr(r_blob_data, '*', clp07_ctx, 7) + 1,
                                instr(r_blob_data, '*', clp07_ctx, 8) - instr(r_blob_data, '*', clp07_ctx, 7) - 1);
    r_payor_identifer := to_char(r_payor_identifer) + to_char(r_payor_identifer);
    dbms_output.put_line('CLP07String = ' || r_payor_identifer);
  END LOOP;

  dbms_output.put_line('CLP07String = ' || r_payor_identifer);
END;
What I am trying to do is count the number of occurrences of the CLP segment (a segment is, for example, CLP*10906801*1*22*7.01*0*HM*719281738*22), pull out the value in position 7 of each CLP segment (719281738 here), and concatenate them into the r_payor_identifer variable so the output is "719281738719281738".
Any suggestions?

Try this REGEXP pattern: 'CLP[^\\]+\*(\d+)\*' to extract 719281738 and 719281739 from your string.
It matches CLP followed by anything other than a \, and captures the number between the last *..*:
select REGEXP_SUBSTR(r_blob_data, 'CLP[^\\]+\*(\d+)\*', 1, LEVEL, NULL, 1) as clp
FROM t
CONNECT BY LEVEL <= REGEXP_COUNT(r_blob_data, 'CLP[^\\]+');
Also, to set r_payor_identifer, use || for concatenation, not +.
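To tie that back to your block, here is a minimal PL/SQL sketch of the same approach (my own illustration, using a shortened sample that contains just the two CLP segments):
DECLARE
  -- shortened sample data; in your code this would be the full r_blob_data value
  r_blob_data       VARCHAR2(4000) := 'LX*1\CLP*10906801*1*22*7.01*0*HM*719281738*22\CLP*10906802*1*23*8.01*0*HM*719281739*22\SE*22*000000547';
  r_payor_identifer VARCHAR2(4000);
BEGIN
  FOR i IN 1 .. REGEXP_COUNT(r_blob_data, 'CLP[^\\]+') LOOP
    -- 6th argument = subexpression 1, i.e. the digits between the last *..* (CLP07)
    r_payor_identifer := r_payor_identifer ||
        REGEXP_SUBSTR(r_blob_data, 'CLP[^\\]+\*(\d+)\*', 1, i, NULL, 1);
  END LOOP;
  dbms_output.put_line('CLP07String = ' || r_payor_identifer);  -- 719281738719281739
END;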

Related

Transforming Strings

I need to accept the following strings, put them into a type collection, and pass it to a procedure:
String Cars = Dodge Charger||Ford Mustang||Chevy Camro||Ford GT
String Cost = 35,000||25,000||29,000|55,000
String CarDesc = Power House||Sweet Ride||Too Cool||Blow your door off
How do I transform the records so they will be like the following?
Cars:
Dodge Charger||35,000||Power House
Ford Mustang||25,00||Sweet Ride
Chevy Camro||29,000||Too Cool
Ford GT||55,000||Blow your door off
How do I parse them into an array?
The types:
create or replace TYPE "CAR_OBJ" AS OBJECT (
  CAR_NAME VARCHAR2(50),
  Price    NUMBER,
  CarDesc  VARCHAR2(100)
);
/
create or replace TYPE "CAR_IN_ARR" IS TABLE OF CAR_OBJ;
/
procedure car_values (
  p_in_upd_car      car_in_arr,
  p_out_upd_results out car_out_cur
)
as
I have tried all kinds of for loops and I just can't get it in the right order.
Thank you so much!
The double delimiter || makes this hard, so I cheated by replacing them with ~. Probably there is a neater way to handle this with a single regex, or a more elaborate approach using substr and instr.
Also I've assumed the cost example should be 35,000||25,000||29,000||55,000.
The real code should probably confirm that all the strings contain the same number of delimiters. Also you might want to parse the cost value into a number.
declare
  inCars    long := 'Dodge Charger||Ford Mustang||Chevy Camro||Ford GT';
  inCost    long := '35,000||25,000||29,000||55,000';
  inCarDesc long := 'Power House||Sweet Ride||Too Cool||Blow your door off';

  type varchar2_tt is table of varchar2(50);

  cars     varchar2_tt := varchar2_tt();
  costs    varchar2_tt := varchar2_tt();
  carDescs varchar2_tt := varchar2_tt();
begin
  inCars    := replace(inCars, '||', '~');
  inCost    := replace(inCost, '||', '~');
  inCarDesc := replace(inCarDesc, '||', '~');

  cars.extend(regexp_count(inCars, '~') + 1);
  costs.extend(regexp_count(inCost, '~') + 1);
  carDescs.extend(regexp_count(inCarDesc, '~') + 1);

  for i in 1..cars.count loop
    cars(i)     := regexp_substr(inCars, '[^~]+', 1, i);
    costs(i)    := regexp_substr(inCost, '[^~]+', 1, i);
    carDescs(i) := regexp_substr(inCarDesc, '[^~]+', 1, i);
    dbms_output.put_line(cars(i) || '||' || costs(i) || '||' || carDescs(i));
  end loop;
end;
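From there, loading the parsed pieces into the CAR_IN_ARR collection so they can be passed as p_in_upd_car to car_values could look roughly like this. This is my own sketch; it assumes the CAR_OBJ and CAR_IN_ARR types from the question are already created:
declare
  inCars    long := 'Dodge Charger||Ford Mustang||Chevy Camro||Ford GT';
  inCost    long := '35,000||25,000||29,000||55,000';
  inCarDesc long := 'Power House||Sweet Ride||Too Cool||Blow your door off';
  cars_in   car_in_arr := car_in_arr();
  n         pls_integer := regexp_count(inCars, '\|\|') + 1;
begin
  for i in 1..n loop
    cars_in.extend;
    cars_in(cars_in.count) := car_obj(
      regexp_substr(replace(inCars, '||', '~'), '[^~]+', 1, i),
      -- strip the thousands separator so the cost fits the NUMBER attribute
      to_number(replace(regexp_substr(replace(inCost, '||', '~'), '[^~]+', 1, i), ',')),
      regexp_substr(replace(inCarDesc, '||', '~'), '[^~]+', 1, i)
    );
  end loop;
  dbms_output.put_line('Collection holds ' || cars_in.count || ' cars');
  -- cars_in can now be passed to car_values as the p_in_upd_car argument
end;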

How to generate a random, unique, alphanumeric ID of length N in Postgres 9.6+?

I've seen a bunch of different solutions on StackOverflow that span many years and many Postgres versions, but with some of the newer features like gen_random_bytes I want to ask again to see if there is a simpler solution in newer versions.
Given IDs which contain a-zA-Z0-9, and vary in size depending on where they're used, like...
bTFTxFDPPq
tcgHAdW3BD
IIo11r9J0D
FUW5I8iCiS
uXolWvg49Co5EfCo
LOscuAZu37yV84Sa
YyrbwLTRDb01TmyE
HoQk3a6atGWRMCSA
HwHSZgGRStDMwnNXHk3FmLDEbWAHE1Q9
qgpDcrNSMg87ngwcXTaZ9iImoUmXhSAv
RVZjqdKvtoafLi1O5HlvlpJoKzGeKJYS
3Rls4DjWxJaLfIJyXIEpcjWuh51aHHtK
(Like the IDs that Stripe uses.)
How can you generate them randomly and safely (as far as reducing collisions and reducing predictability goes) with an easy way to specify different lengths for different use cases, in Postgres 9.6+?
I'm thinking that ideally the solution has a signature similar to:
generate_uid(size integer) returns text
Where size is customizable depending on your own tradeoffs for lowering the chance of collisions vs. reducing the string size for usability.
From what I can tell, it must use gen_random_bytes() instead of random() for true randomness, to reduce the chance that they can be guessed.
Thanks!
I know there's gen_random_uuid() for UUIDs, but I don't want to use them in this case. I'm looking for something that gives me IDs similar to what Stripe (or others) use, that look like: "id": "ch_19iRv22eZvKYlo2CAxkjuHxZ" that are as short as possible while still containing only alphanumeric characters.
This requirement is also why encode(gen_random_bytes(), 'hex') isn't quite right for this case, since it reduces the character set and thus forces me to increase the length of the strings to avoid collisions.
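(For a concrete comparison, this is what the hex-only approach looks like; the figures are mine, not from the question: 10 hex characters only cover 16^10 values, versus 62^10 for 10 base62 characters.)
-- pgcrypto: 5 random bytes encoded as hex give a 10-character [0-9a-f] string
SELECT encode(gen_random_bytes(5), 'hex');  -- e.g. '8f3a1c09d2'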
I'm currently doing this in the application layer, but I'm looking to move it into the database layer to reduce interdependencies. Here's what the Node.js code for doing it in the application layer might look like:
var crypto = require('crypto');
var set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

function generate(length) {
  var bytes = crypto.randomBytes(length);
  var chars = [];
  for (var i = 0; i < bytes.length; i++) {
    chars.push(set[bytes[i] % set.length]);
  }
  return chars.join('');
}
Figured this out, here's a function that does it:
CREATE OR REPLACE FUNCTION generate_uid(size INT) RETURNS TEXT AS $$
DECLARE
  characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  bytes BYTEA := gen_random_bytes(size);
  l INT := length(characters);
  i INT := 0;
  output TEXT := '';
BEGIN
  WHILE i < size LOOP
    output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
    i := i + 1;
  END LOOP;
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
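One assumption worth spelling out (my note, not part of the original answer): on 9.6, gen_random_bytes() is provided by the pgcrypto extension, so it has to be installed first:
CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- provides gen_random_bytes()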
And then to run it, simply do:
SELECT generate_uid(10);
-- '3Rls4DjWxJ'
Warning
When doing this you need to be sure that the length of the IDs you are creating is sufficient to avoid collisions over time as the number of objects you've created grows, which can be counter-intuitive because of the Birthday Paradox. So you will likely want a length greater (or much greater) than 10 for any reasonably commonly created object; I used 10 here only as a simple example.
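For a rough sense of scale (my own back-of-the-envelope numbers, not from the original answer): with 10 characters over a 62-character alphabet there are 62^10, roughly 8.4e17, possible IDs, and after n inserts the collision probability is approximately n^2 / (2 * 62^10):
-- approximate birthday-bound collision probability for 10-character base62 IDs
SELECT n,
       n::numeric ^ 2 / (2 * 62::numeric ^ 10) AS approx_collision_probability
FROM (VALUES (1000000), (100000000)) AS t(n);
-- roughly 6e-7 after a million rows, and about 0.006 after a hundred million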
Usage
With the function defined, you can use it in a table definition, like so:
CREATE TABLE users (
id TEXT PRIMARY KEY DEFAULT generate_uid(10),
name TEXT NOT NULL,
...
);
And then when inserting data, like so:
INSERT INTO users (name) VALUES ('ian');
INSERT INTO users (name) VALUES ('victor');
SELECT * FROM users;
It will automatically generate the id values:
id | name | ...
-----------+--------+-----
owmCAx552Q | ian |
ZIofD6l3X9 | victor |
Usage with a Prefix
Or maybe you want to add a prefix for convenience when looking at a single ID in the logs or in your debugger (similar to how Stripe does it), like so:
CREATE TABLE users (
id TEXT PRIMARY KEY DEFAULT ('user_' || generate_uid(10)),
name TEXT NOT NULL,
...
);
INSERT INTO users (name) VALUES ('ian');
INSERT INTO users (name) VALUES ('victor');
SELECT * FROM users;
id | name | ...
---------------+--------+-----
user_wABNZRD5Zk | ian |
user_ISzGcTVj8f | victor |
I'm looking for something that gives me "shortcodes" (similar to what Youtube uses for video IDs) that are as short as possible while still containing only alphanumeric characters.
This is a fundamentally different question from what you first asked. What you want here then is to put a serial type on the table, and to use hashids.org code for PostgreSQL.
This returns 1:1 with the unique number (serial), so it never repeats and has no chance of collision.
It is also base62 [a-zA-Z0-9].
The code looks like this:
SELECT id, hash_encode(foo.id)
FROM foo; -- Result: jNl for 1001
SELECT hash_decode('jNl') -- returns 1001
This module also supports salts.
Review:
26 characters in [a-z]
26 characters in [A-Z]
10 characters in [0-9]
62 characters in [a-zA-Z0-9] (base62)
The function substring(string [from int] [for int]) looks useful.
So it looks something like this. First we demonstrate that we can take the random-range and pull from it.
SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  1, -- 1 is 'a', 62 is '9'
  1
);
Now we need a random number between 1 and 62:
SELECT trunc(random()*62)::int+1
FROM generate_series(1,1e2) AS gs(x);
This gets us there. Now we just have to join the two:
SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  trunc(random()*62)::int+1,
  1
)
FROM generate_series(1,1e2) AS gs(x);
Then we wrap it in an ARRAY constructor (because this is fast)
SELECT ARRAY(
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
trunc(random()*62)::int+1,
1
)
FROM generate_series(1,1e2) AS gs(x)
);
And, we call array_to_string() to get a text.
SELECT array_to_string(
ARRAY(
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
trunc(random()*62)::int+1,
1
)
FROM generate_series(1,1e2) AS gs(x)
)
, ''
);
From here we can even turn it into a function:
CREATE FUNCTION random_string(randomLength int)
RETURNS text AS $$
SELECT array_to_string(
ARRAY(
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
trunc(random()*62)::int+1,
1
)
FROM generate_series(1,randomLength) AS gs(x)
)
, ''
)
$$ LANGUAGE SQL
RETURNS NULL ON NULL INPUT
VOLATILE LEAKPROOF;
and then
SELECT * FROM random_string(10);
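If you need the harder-to-guess variant the question asks for, the same structure can take its randomness from pgcrypto instead of random(). This is my own adaptation, not part of the original answer (note that a small modulo bias remains because 256 is not a multiple of 62):
CREATE FUNCTION random_string_secure(randomLength int)
RETURNS text AS $$
  SELECT array_to_string(
    ARRAY(
      SELECT substring(
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
        get_byte(gen_random_bytes(1), 0) % 62 + 1,  -- one random byte per character
        1
      )
      FROM generate_series(1, randomLength) AS gs(x)
    ),
    ''
  )
$$ LANGUAGE SQL
RETURNS NULL ON NULL INPUT
VOLATILE;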
Thanks to Evan Carroll's answer, I took a look at hashids.org.
For Postgres you have to compile the extension or run some PL/pgSQL functions.
But for my needs, I created something simpler based on the hashids ideas (short, unguessable, unique, custom alphabet, avoids curse words).
Shuffle alphabet:
CREATE OR REPLACE FUNCTION consistent_shuffle(alphabet TEXT, salt TEXT) RETURNS TEXT AS $$
DECLARE
  SALT_LENGTH INT := length(salt);
  integer INT = 0;
  temp TEXT = '';
  j INT = 0;
  v INT := 0;
  p INT := 0;
  i INT := length(alphabet) - 1;
  output TEXT := alphabet;
BEGIN
  IF salt IS NULL OR length(LTRIM(RTRIM(salt))) = 0 THEN
    RETURN alphabet;
  END IF;

  WHILE i > 0 LOOP
    v := v % SALT_LENGTH;
    integer := ASCII(substr(salt, v + 1, 1));
    p := p + integer;
    j := (integer + v + p) % i;

    temp := substr(output, j + 1, 1);
    output := substr(output, 1, j) || substr(output, i + 1, 1) || substr(output, j + 2);
    output := substr(output, 1, i) || temp || substr(output, i + 2);

    i := i - 1;
    v := v + 1;
  END LOOP;

  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
The main function:
CREATE OR REPLACE FUNCTION generate_uid(id INT, min_length INT, salt TEXT) RETURNS TEXT AS $$
DECLARE
  clean_alphabet TEXT := 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
  curse_chars TEXT := 'csfhuit';
  curse TEXT := curse_chars || UPPER(curse_chars);
  alphabet TEXT := regexp_replace(clean_alphabet, '[' || curse || ']', '', 'gi');
  shuffle_alphabet TEXT := consistent_shuffle(alphabet, salt);
  char_length INT := length(alphabet);
  output TEXT := '';
BEGIN
  WHILE id != 0 LOOP
    output := output || substr(shuffle_alphabet, (id % char_length) + 1, 1);
    id := trunc(id / char_length);
  END LOOP;

  curse := consistent_shuffle(curse, output || salt);
  output := RPAD(output, min_length, curse);

  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
How-to use examples:
-- 3: min-length
select generate_uid(123, 3, 'salt'); -- output: "0mH"
-- or as default value in a table
CREATE SEQUENCE IF NOT EXISTS my_id_serial START 1;
CREATE TABLE collections (
id TEXT PRIMARY KEY DEFAULT generate_uid(CAST (nextval('my_id_serial') AS INTEGER), 3, 'salt')
);
insert into collections DEFAULT VALUES ;
This query generates the required string. Just change the second parameter of generate_series to choose the length of the random string.
SELECT
string_agg(c, '')
FROM (
SELECT
chr(r + CASE WHEN r > 25 + 9 THEN 97 - 26 - 9 WHEN r > 9 THEN 64 - 9 ELSE 48 END) AS c
FROM (
SELECT
i,
(random() * 60)::int AS r
FROM
generate_series(0, 62) AS i
) AS a
ORDER BY i
) AS A;
So I had my own use-case for something like this. I am not proposing a solution to the top question, but if you are looking for something similar like I am, then try this out.
My use-case was that I needed to create a random external UUID (as a primary key) with as few characters as possible. Thankfully, the scenario did not have a requirement that a large amount of these would ever be needed (probably in the thousands only). Therefore a simple solution was a combination of using generate_uid() and checking to make sure that the next sequence was not already used.
Here is how I put it together:
CREATE OR REPLACE FUNCTION generate_id (
  in  length     INT
, in  for_table  text
, in  for_column text
, OUT next_id    TEXT
) AS
$$
DECLARE
  id_is_used  BOOLEAN;
  loop_count  INT := 0;
  characters  TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  loop_length INT;
BEGIN
  LOOP
    next_id := '';
    loop_length := 0;

    WHILE loop_length < length LOOP
      next_id := next_id || substr(characters, get_byte(gen_random_bytes(length), loop_length) % length(characters) + 1, 1);
      loop_length := loop_length + 1;
    END LOOP;

    EXECUTE format('SELECT TRUE FROM %s WHERE %s = %s LIMIT 1', for_table, for_column, quote_literal(next_id)) INTO id_is_used;
    EXIT WHEN id_is_used IS NULL;

    loop_count := loop_count + 1;
    IF loop_count > 100 THEN
      RAISE EXCEPTION 'Too many loops. Might be reaching the practical limit for the given length.';
    END IF;
  END LOOP;
END
$$
LANGUAGE plpgsql
STABLE;
Here is an example of table usage:
create table some_table (
id
TEXT
DEFAULT generate_id(6, 'some_table', 'id')
PRIMARY KEY
)
;
and a test to see how it breaks:
DO
$$
DECLARE
  loop_count INT := 0;
BEGIN
  -- WHILE LOOP
  WHILE loop_count < 1000000 LOOP
    INSERT INTO some_table VALUES (DEFAULT);
    loop_count := loop_count + 1;
  END LOOP;
END
$$ LANGUAGE plpgsql;
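After the test it is easy to confirm that nothing collided (a check I added; the function re-rolls on a collision, so the two counts should match):
SELECT count(*) AS total_rows, count(DISTINCT id) AS distinct_ids
FROM some_table;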

Oracle : String Concatenation is too long

I have the SQL below as part of a view. In one of the schemas I am getting a "string concatenation is too long" error and am not able to execute the view.
Hence I tried TO_CLOB(), and now the view is not throwing the error, but it is not returning the result either; it just keeps on running.
Please suggest.
SQL:
SELECT Iav.Item_Id Attr_Item_Id,
LISTAGG(La.Attribute_Name
||'|~|'
|| Lav.Attribute_Value
||' '
|| Lau.Attribute_Uom, '}~}') WITHIN GROUP (
ORDER BY ICA.DISP_SEQ,LA.ATTRIBUTE_NAME) AS ATTR
FROM Item_Attribute_Values Iav,
Loc_Attribute_Values Lav,
Loc_Attribute_Uoms Lau,
Loc_Attributes La,
(SELECT *
FROM Item_Classification Ic,
CATEGORY_ATTRIBUTES CA
WHERE IC.DEFAULT_CATEGORY='Y'
AND IC.TAXONOMY_TREE_ID =CA.TAXONOMY_TREE_ID
) ICA
WHERE IAV.ITEM_ID =ICA.ITEM_ID(+)
AND IAV.ATTRIBUTE_ID =ICA.ATTRIBUTE_ID(+)
AND Iav.Loc_Attribute_Id =La.Loc_Attribute_Id
AND La.Locale_Id =1
AND Iav.Loc_Attribute_Uom_Id =Lau.Loc_Attribute_Uom_Id(+)
AND Iav.Loc_Attribute_Value_Id=Lav.Loc_Attribute_Value_Id
GROUP BY Iav.Item_Id;
Error:
ORA-01489: result of string concatenation is too long
01489. 00000 - "result of string concatenation is too long"
*Cause: String concatenation result is more than the maximum size.
*Action: Make sure that the result is less than the maximum size.
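(For context, a minimal illustration of the limit, not taken from the question: LISTAGG returns VARCHAR2, so as soon as the aggregated result exceeds 4000 bytes in SQL the statement fails with ORA-01489.)
SELECT LISTAGG(RPAD('x', 400, 'x'), ',') WITHIN GROUP (ORDER BY LEVEL) AS too_long
FROM dual
CONNECT BY LEVEL <= 11;  -- 11 x 400 characters plus separators exceed 4000 bytes -> ORA-01489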
You can use the COLLECT() function to aggregate the strings into a collection and then use a User-Defined function to concatenate the strings:
Oracle Setup:
CREATE TYPE stringlist IS TABLE OF VARCHAR2(4000);
/
CREATE FUNCTION concat_List(
strings IN stringlist,
delim IN VARCHAR2 DEFAULT ','
) RETURN CLOB DETERMINISTIC
IS
value CLOB;
i PLS_INTEGER;
BEGIN
IF strings IS NULL THEN
RETURN NULL;
END IF;
value := EMPTY_CLOB();
IF strings IS NOT EMPTY THEN
i := strings.FIRST;
LOOP
IF i > strings.FIRST AND delim IS NOT NULL THEN
value := value || delim;
END IF;
value := value || strings(i);
EXIT WHEN i = strings.LAST;
i := strings.NEXT(i);
END LOOP;
END IF;
RETURN value;
END;
/
Query:
SELECT Iav.Item_Id AS Attr_Item_Id,
CONCAT_LIST(
CAST(
COLLECT(
La.Attribute_Name || '|~|' || Lav.Attribute_Value ||' '|| Lau.Attribute_Uom
ORDER BY ICA.DISP_SEQ,LA.ATTRIBUTE_NAME
)
AS stringlist
),
'}~}'
) AS ATTR
FROM your_table
GROUP BY iav.item_id;
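A quick way to sanity-check the helper on its own (the values here are made up for illustration):
SELECT concat_List(stringlist('a', 'b', 'c'), '}~}') AS demo
FROM dual;
-- returns the CLOB 'a}~}b}~}c'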
LISTAGG is limited to 4000 bytes (the SQL VARCHAR2 limit), unfortunately. So you may want to use another approach to concatenate the values.
Anyway ...
It is strange to see LISTAGG, which is a rather new feature, combined with error-prone old-style comma joins. I'd suggest you re-write this. Are the tables even properly joined? It looks strange that there seems to be no relation between Loc_Attributes and, say, Loc_Attribute_Values. Doesn't Loc_Attribute_Values have a Loc_Attribute_Id, so that an attribute value relates to an attribute? It would be hard to believe that there is no such relation.
Moreover: Is it guaranteed that your classification subquery doesn't return more than one record per attribute?
Here is your query re-written:
select
iav.item_id as attr_item_id,
listagg(la.attribute_name || '|~|' || lav.attribute_value || ' ' || lau.attribute_uom,
'}~}') within group (order by ica.disp_seq, la.attribute_name) as attr
from item_attribute_values iav
join loc_attribute_values lav
on lav.loc_attribute_value_id = iav.loc_attribute_value_id
and lav.loc_attribute_id = iav.loc_attribute_id -- <== maybe?
join loc_attributes la
on la.loc_attribute_id = lav.loc_attribute_id
and la.loc_attribute_id = lav.loc_attribute_id -- <== maybe?
and la.locale_id = 1
left join loc_attribute_uoms lau
on lau.loc_attribute_uom_id = iav.loc_attribute_uom_id
and lau.loc_attribute_id = iav.loc_attribute_id -- <== maybe?
left join
(
-- aggregation needed to get no more than one sortkey per item attribute?
select ic.item_id, ca.attribute_id, min (ca.disp_seq) as disp_seq
from item_classification ic
join category_attributes ca on ca.taxonomy_tree_id = ic.taxonomy_tree_id
where ic.default_category = 'y'
group by ic.item_id, ca.attribute_id
) ica on ica.item_id = iav.item_id and ica.attribute_id = iav.attribute_id
group by iav.item_id;
Well, you get the idea; check your keys and alter your join criteria where necessary. Maybe this gets rid of duplicates, so LISTAGG has to concatenate fewer attributes, and maybe the result even stays within the 4000-byte limit.
XQuery approach.
Creating extra types or functions isn't necessary.
with test_tab
as (select object_name
from all_objects
where rownum < 1000)
, aggregate_to_xml as (select xmlagg(xmlelement(val, object_name)) xmls from test_tab)
select xmlcast(xmlquery('for $row at $idx in ./*/text() return if($idx=1) then $row else concat(",",$row)'
passing aggregate_to_xml.xmls returning content) as Clob) as list_in_lob
from aggregate_to_xml;
I guess you need to write a small function to concatenate the strings into a CLOB, because even when you wrap the LISTAGG in TO_CLOB() at the end, this might not work.
Here's a sample function that takes a SELECT statement (which MUST return only one string column!) and a separator, and returns the collected values as a CLOB:
CREATE OR REPLACE FUNCTION listAggCLob(p_stringSelect VARCHAR2,
                                       p_separator    VARCHAR2)
RETURN CLOB
AS
  cur SYS_REFCURSOR;
  s   VARCHAR2(4000);
  c   CLOB;
  i   INTEGER;
BEGIN
  dbms_lob.createtemporary(c, FALSE);
  IF (p_stringSelect IS NOT NULL) THEN
    OPEN cur FOR p_stringSelect;
    LOOP
      FETCH cur INTO s;
      EXIT WHEN cur%NOTFOUND;
      dbms_lob.append(c, s || p_separator);
    END LOOP;
  END IF;
  i := length(c);
  IF (i > 0) THEN
    RETURN dbms_lob.substr(c, i - length(p_separator));
  ELSE
    RETURN NULL;
  END IF;
END;
This function can be used, for example, like this:
WITH cat AS (
SELECT DISTINCT t1.category
FROM lookup t1
)
SELECT cat.category
, listAggCLob('select t2.name from lookup t2 where t2.category = ''' || cat.category || '''', '|') allcategorynames
FROM cat;

Using pl/sql output in WHERE-statement

I have a little question. I'm using the Levenshtein score to search for matches of more than 85% between street names in two different tables. But when I use my Levenshtein score calculation in my WHERE clause, I get output like this, for example:
street names in one table: BEAU SITE 1ÈRE AVENUE & BEAU SITE 2ÈME AVENUE
street names in the other table: BEAU SITE-1ÈRE AVENUE & BEAU SITE-2ÈRE AVENUE
output: links between all of them; the first street is linked with both the first and the second, and the second street with both the first and the second.
So I have to use the largest score out of all the score calculations, which is done like this:
DECLARE
  L_SCORE     NUMBER;
  L_NEW_SCORE NUMBER;
  L_BEST_MAP  VARCHAR2(255);

  CURSOR C_TO_FIND IS
    SELECT TT_NAME, L_MUNI, R_MUNI
    FROM Y_TT_NOT_LINKED_STREETS;

  CURSOR C_TOMTOM_STREET (L_MUNI VARCHAR2) IS
    SELECT STREET_NAME
    FROM STREET_NAME SN
    JOIN STREET STR ON STR.STREET_ID = SN.STR_STREET_ID
    JOIN ADMINISTRATIVE_AREA_NAME AAN ON AAN.AAR_ADMIN_AREA_ID = STR.AAR_ADMIN_AREA_ID
    WHERE AAN.ADMIN_AREA_NAME = L_MUNI
      AND STR.STREET_ID NOT IN (SELECT ROMA_STREET_ID FROM Y_DS_STREETS_LINK);
BEGIN
  FOR S IN C_TO_FIND LOOP
    L_SCORE := 0;
    L_NEW_SCORE := 0;

    FOR R IN C_TOMTOM_STREET(S.L_MUNI) LOOP
      L_SCORE := PCK$ADDRESSMATCH.GET_LEVENSHTEIN_SCORE(S.TT_NAME, R.STREET_NAME);
      IF L_SCORE > L_NEW_SCORE THEN
        L_NEW_SCORE := L_SCORE;
        L_BEST_MAP := R.STREET_NAME || CHR(9) || TO_CHAR(L_NEW_SCORE);
      END IF;
    END LOOP;

    IF L_NEW_SCORE > 85 THEN
      DBMS_OUTPUT.PUT_LINE(S.L_MUNI || CHR(9) || S.TT_NAME || CHR(9) || L_BEST_MAP);
    END IF;

    L_NEW_SCORE := 0;
  END LOOP;
END;
Now the question is: how can I use this output in a WHERE clause, so that I link only on the largest Levenshtein score and the problem above does not occur?
So that:
SELECT ...
FROM ...
WHERE (largest score from previous block code)
(or in another way; after a whole week of doing SQL I can't see a solution for this)
Thx! =)

COUNT function for files?

Is it possible to use COUNT in some way that will give me the number of tuples that are in a .sql file? I tried using it in a query with the file name like this:
SELECT COUNT(*) FROM #q65b;
It tells me that the table is invalid, which I understand, because it isn't a table; q65b is a file with a query saved in it. I'm trying to compare the number of rows in q65b to a view that I have created. Is this possible, or do I just have to run the query and check the number of rows at the bottom?
Thanks
You can do this in SQL*Plus. For example:
Create the text file, containing the query (note: no semicolon!):
select * from dual
Save it in a file, e.g. myqueryfile.txt, in a folder accessible from your SQL*Plus session.
You can now call this from within another SQL query - but make sure the # is at the start of a line, e.g.:
SQL> select * from (
2 #myqueryfile.txt
3 );
D
-
X
I don't personally use this feature much, however.
Here is one approach. It's a function which reads a file in a directory, wraps the contents in a select count(*) from ( .... ) construct and executes the resultant statement.
create or replace function get_cnt
  ( p_file in varchar2 )
  return number
as
  n      pls_integer;
  stmt   varchar2(32767);
  f_line varchar2(255);
  fh     utl_file.file_type;
begin
  stmt := 'select count(*) from (';
  fh := utl_file.fopen('SQL_SCRIPTS', p_file, 'R');
  loop
    utl_file.get_line(fh, f_line);
    if f_line is null then
      exit;
    elsif f_line = '/' then
      exit;
    else
      stmt := stmt || chr(10) || f_line;
    end if;
  end loop;
  stmt := stmt || ')';
  execute immediate stmt into n;
  return n;
end get_cnt;
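The 'SQL_SCRIPTS' name passed to fopen() has to exist as an Oracle directory object pointing at the folder holding the scripts; something along these lines, where the filesystem path and grantee are only examples:
CREATE OR REPLACE DIRECTORY sql_scripts AS '/u01/app/scripts';  -- run as a privileged user
GRANT READ ON DIRECTORY sql_scripts TO scott;                   -- whoever owns get_cnt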
Here are the contents of the SQL file (scripts/q_emp.sql):
select * from emp
/
And here is how the script runs:
SQL> select get_cnt ('q_emp.sql') from dual
2 /
GET_CNT('Q_EMP.SQL')
--------------------
14
SQL>
So it works. Obviously what I have posted is just a proof of concept. You will need to include lots of error handling for the UTL_FILE aspects - it's a package which can throw lots of exceptions - and probably some safety checking of the script that gets passed.