Split CLOB field into table-like structure - SQL

I have a field of type CLOB in an Oracle database. I would like to split this field into several columns and rows. Here's an example of the content:
4:true5:false24:<p>option sample 1.</p>4:true22:<p>option sample 2.</p>5:false23:<p>option sample 3.</p>5:false22:<p>option sample 4.</p>5:false
The result should look like this:
ID  LEVEL  ANSWER_OPTION    VALUE
1   3      option sample 3  false
1   4      option sample 4  false
2   3      option sample 3  false
4   3      option sample 3  true
3   2      option sample 2  false
1   2      option sample 2  true
2   1      option sample 1  true
2   4      option sample 4  false
4   1      option sample 1  false
2   2      option sample 2  false
4   2      option sample 2  false
1   1      option sample 1  false
3   4      option sample 4  false
4   4      option sample 4  false
3   3      option sample 3  false
3   1      option sample 1  true
We wrote the following statement, which produced the result above.
with guest_string as (
  select qsn.id id
       , dbms_lob.substr( qsn.guest, 2000, 1 ) answer_options
    from mneme_question qsn
   where qsn.id < 10
)
select distinct id
     , level
     , substr(regexp_substr( answer_options, '<p>[^<]+', 1, level, 'i'), 4) ANSWER_OPTION
     , substr(regexp_substr( answer_options, '(true|false)', regexp_instr( answer_options, '</p>', 1, 1), level, 'i'), 1) VALUE
  from guest_string
connect by regexp_substr( answer_options, '<p>[^<]+', 1, level, 'i') is not null
The problem with this code is that it takes way too long to split all the records we have. We had to cut it off at 10 rows (5 rows take 0.25 seconds, 10 rows take 16 seconds, 15 rows take over 2.5 minutes). We currently have 30,000 rows and they will grow. At the moment we cannot change the software to change the data model, so we will have to do this ad hoc.
Our current approach is to create a procedure that is called for each record, but faster parsing would be better. Does anybody have a suggestion for a script that can do this in a reasonable time? We can run it at night, but preferably it shouldn't take longer than 2 minutes.
BTW, in the future we could build some sort of mechanism to determine which records have already been parsed, but that also requires some way of detecting already-parsed fields that have changed in the meantime. We don't have time for that yet, so for now we need crude parsing that is as fast as possible.
Thanks

Here is a different approach.
The code may not be as polished as it could be, and you will probably have to fix some small things...
I checked it on 11g (couldn't find a 10g instance) and used your input as the values in my CLOB column.
For 10 rows (all with the same input as in your example):
original query: 9.8 sec
new query: 0.08 sec
I used a pipelined function; here is the code:
create or replace type t_parse is object(idd number, levell number, answer_option varchar2(128), valuee varchar2(4000));
/
create or replace type tab_parse is table of t_parse;
/
create or replace function split_answers return tab_parse
  pipelined is
  cursor c is
    select * from mneme_question;
  str_t   clob;
  phraseP varchar2(128);
  phraseV varchar2(8);
  i1s     number;
  i1e     number;
  i2s     number;
  levell  number;
begin
  for r in c loop
    str_t  := r.guest;
    levell := 1;
    while str_t is not null loop
      -- locate the next <p>...</p> option text
      i1s := dbms_lob.instr(str_t, '<p>', 1, 1) + 3;
      if i1s = 3 then
        -- no more options: stop
        str_t := '';
      else
        i1e := dbms_lob.instr(str_t, '</p>', 1, 1);
        phraseP := dbms_lob.substr(str_t, i1e - i1s, i1s);
        -- drop everything up to and including the closing </p>
        str_t := dbms_lob.substr(str_t, offset => i1e + 4);
        -- the true/false value follows the option text
        i2s := dbms_lob.instr(str_t, 'true', 1, 1);
        if i2s = 0 then
          i2s := dbms_lob.instr(str_t, 'false', 1, 1);
          if i2s = 0 then
            str_t := '';
          else
            phraseV := dbms_lob.substr(str_t, 5, i2s);
            pipe row(t_parse(r.id, levell, phraseP, phraseV));
            levell := levell + 1;
          end if;
        else
          phraseV := dbms_lob.substr(str_t, 4, i2s);
          pipe row(t_parse(r.id, levell, phraseP, phraseV));
          levell := levell + 1;
        end if;
      end if;
    end loop;
  end loop;
  return;
end split_answers;
/
The new query then looks like:
select * from table(split_answers);
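If you need to persist the result rather than just query it, you could for example load it into a staging table in one go (the table and column names here are only an illustration):
create table answer_option_stage as
  select idd as id, levell as answer_level, answer_option, valuee as answer_value
    from table(split_answers);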

Here is a function that returns a pipelined table:
CREATE OR REPLACE FUNCTION split_clob(p_clob      IN CLOB DEFAULT NULL,
                                      p_varchar   IN VARCHAR2 DEFAULT NULL,
                                      p_separator IN VARCHAR2)
  RETURN varchar_set
  PIPELINED IS
  l_clob CLOB;
BEGIN
  l_clob := nvl(p_clob, to_clob(p_varchar));
  FOR rec IN (
    WITH vals AS
     (SELECT CAST(TRIM(regexp_substr(l_clob,
                                     '[^' || p_separator || ']+',
                                     1,
                                     levels.column_value)) AS VARCHAR2(320)) AS val
        FROM TABLE(CAST(MULTISET
                        (SELECT LEVEL
                           FROM dual
                         CONNECT BY LEVEL <=
                                    length(regexp_replace(l_clob,
                                                          '[^' || p_separator || ']+')) + 1)
                        AS sys.odcinumberlist)) levels)
    SELECT val FROM vals)
  LOOP
    PIPE ROW(rec.val);
  END LOOP;
  RETURN;
END;
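Note that this assumes a collection type named varchar_set already exists; if it doesn't, it could be declared along these lines (the element size is just an assumption matching the CAST above):
CREATE OR REPLACE TYPE varchar_set IS TABLE OF VARCHAR2(320);
/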
You can use it like this:
-- with a table that has a CLOB column
SELECT t2.column_value FROM my_table t, TABLE(split_clob(p_clob => t.clob_column, p_separator => ',')) t2
-- or with a VARCHAR2
SELECT * FROM TABLE(split_clob(p_varchar => '1,2,3,4,5,6, 7, 8, 9', p_separator => ','))

Related

IF condition containing IN operator inside ORACLE Trigger is not working

I have a trigger that uses an IF condition with the IN operator and two variables, v_audit_user and v_evdnse_user, inside the IN. Both variables contain comma-separated ID values. The trigger compiles successfully with no errors. I don't understand why the IF condition with IN is not working. When I select the function that assigns values to the variables independently, I do see the comma-separated values, so nothing is wrong with the function (see screenshot).
create or replace TRIGGER TRG_CHK_HRCQA_CASE_ACTIONS
AFTER INSERT ON KDD_CASE_ACTIONS
FOR EACH ROW
DECLARE
  user_audit    kdd_review_owner.OWNER_ID%TYPE; /* The user that is displayed in audit */
  user_evdnse   kdd_review_owner.OWNER_ID%TYPE; /* The user that took action in evidence tab */
  v_audit_user  NUMBER; /* The HRCO/QA user from audit tab */
  v_evdnse_user NUMBER; /* The HRCO/QA user from evidence tab */
  LV_ERRORCODE  VARCHAR2(1000);
BEGIN
  /* pass the username into the variables */
  SELECT kro.OWNER_ID into user_audit  from kdd_review_owner kro where kro.OWNER_SEQ_ID = :NEW.ACTION_BY_ID;
  SELECT kro.OWNER_ID into user_evdnse from kdd_review_owner kro where kro.OWNER_SEQ_ID = :NEW.ACTION_BY_ID;
  /* fetch the comma separated IDs */
  v_audit_user  := F_GET_HRCQA_ACTIONS(:NEW.CASE_INTRL_ID, user_audit, 'AUDIT');
  v_evdnse_user := F_GET_HRCQA_ACTIONS(:NEW.CASE_INTRL_ID, user_evdnse, 'EVDNSE');
  -- select ENTITY_ID into v_evdnse_user from table(f_get_arg_table(F_GET_HRCQA_ACTIONS(:NEW.CASE_INTRL_ID,user_evdnse,'EVDNSE')));
  /* If the action taken is by QA or HRCO role */
  IF (:NEW.ACTION_SEQ_ID in (v_audit_user, v_evdnse_user))
  THEN
    /* then insert record in the SC_HRCQA_CASE_ACTIONS table with IS_HRCO_QA flag as Y */
    Insert into SC_HRCQA_CASE_ACTIONS (ACTION_SEQ_ID,ACTION_BY_ID,ACTION_TS,STATUS_CD,CASE_INTRL_ID,ACTION_ID,NEW_CASE_OWNR_ASSGN_ID,CASE_DUE_TS,PREV_CASE_OWNR_ASSGN_ID,IS_HRCO_QA)
    values (:NEW.ACTION_SEQ_ID, :NEW.ACTION_BY_ID, :NEW.ACTION_TS, :NEW.STATUS_CD, :NEW.CASE_INTRL_ID, :NEW.ACTION_ID, :NEW.NEW_CASE_OWNR_ASSGN_ID, :NEW.CASE_DUE_TS, :NEW.PREV_CASE_OWNR_ASSGN_ID,'Y');
  -- ELSE
  --   /* else the logged in user is NOT HRCO/QA hence insert record in the SC_HRCQA_CASE_ACTIONS table with IS_HRCO_QA flag as N */
  --   Insert into SC_HRCQA_CASE_ACTIONS (ACTION_SEQ_ID,ACTION_BY_ID,ACTION_TS,STATUS_CD,CASE_INTRL_ID,ACTION_ID,NEW_CASE_OWNR_ASSGN_ID,CASE_DUE_TS,PREV_CASE_OWNR_ASSGN_ID,IS_HRCO_QA)
  --   values (:NEW.ACTION_SEQ_ID, :NEW.ACTION_BY_ID, :NEW.ACTION_TS, :NEW.STATUS_CD, :NEW.CASE_INTRL_ID, :NEW.ACTION_ID, :NEW.NEW_CASE_OWNR_ASSGN_ID, :NEW.CASE_DUE_TS, :NEW.PREV_CASE_OWNR_ASSGN_ID,'N');
  END IF;
EXCEPTION
  WHEN OTHERS THEN
    LV_ERRORCODE := SQLCODE;
    INSERT INTO KDD_LOGS_MSGS (LOG_DT, LOG_INFO_TX, REMARK_TX)
    VALUES (SYSDATE, 'ErrorCode - ' || LV_ERRORCODE, 'TRG_CHK_HRCQA_CASE_ACTIONS');
END;
You cannot pass a comma-delimited string stored in a single variable to an IN condition and expect it to be parsed as multiple values; it is treated as one single value.
If you want to use a single variable containing a delimited list then you will need to use string functions to find a sub-string match:
create or replace TRIGGER TRG_CHK_HRCQA_CASE_ACTIONS
AFTER INSERT ON KDD_CASE_ACTIONS
FOR EACH ROW
DECLARE
  v_owner_id    kdd_review_owner.OWNER_ID%TYPE;
  v_audit_user  VARCHAR2(1000);
  v_evdnse_user VARCHAR2(1000);
  LV_ERRORCODE  VARCHAR2(1000);
BEGIN
  SELECT OWNER_ID
  into   v_owner_id -- You only need one variable here
  from   kdd_review_owner
  where  OWNER_SEQ_ID = :NEW.ACTION_BY_ID;
  v_audit_user  := F_GET_HRCQA_ACTIONS(:NEW.CASE_INTRL_ID, v_owner_id, 'AUDIT');
  v_evdnse_user := F_GET_HRCQA_ACTIONS(:NEW.CASE_INTRL_ID, v_owner_id, 'EVDNSE');
  IF ','||v_audit_user||','||v_evdnse_user||',' LIKE '%,'||:NEW.ACTION_SEQ_ID||',%'
  THEN
    Insert into SC_HRCQA_CASE_ACTIONS (
      ACTION_SEQ_ID, ACTION_BY_ID, ACTION_TS, STATUS_CD, CASE_INTRL_ID,
      ACTION_ID, NEW_CASE_OWNR_ASSGN_ID, CASE_DUE_TS, PREV_CASE_OWNR_ASSGN_ID, IS_HRCO_QA
    ) values (
      :NEW.ACTION_SEQ_ID, :NEW.ACTION_BY_ID, :NEW.ACTION_TS, :NEW.STATUS_CD, :NEW.CASE_INTRL_ID,
      :NEW.ACTION_ID, :NEW.NEW_CASE_OWNR_ASSGN_ID, :NEW.CASE_DUE_TS, :NEW.PREV_CASE_OWNR_ASSGN_ID, 'Y'
    );
  END IF;
EXCEPTION
  WHEN OTHERS THEN
    LV_ERRORCODE := SQLCODE;
    INSERT INTO KDD_LOGS_MSGS (LOG_DT, LOG_INFO_TX, REMARK_TX)
    VALUES (SYSDATE, 'ErrorCode - ' || LV_ERRORCODE, 'TRG_CHK_HRCQA_CASE_ACTIONS');
END;
/
I don't have your tables so I'll try to illustrate it using my own code.
This is what you're doing now: to us (humans), it is obvious that L_NEW_ID (its value is 10) is contained in L_VAR1 (its value is '10,20'). However, Oracle doesn't recognize that and returns "Not OK":
SQL> declare
2 -- does L_NEW_ID exist in L_VAR1 and L_VAR2 (using your code)?
3 l_new_id number := 10;
4 --
5 l_var1 varchar2(20) := '10,20';
6 l_var2 varchar2(20) := '30,40';
7 begin
8 if l_new_id in (l_var1, l_var2) then
9 dbms_output.put_line('OK');
10 else
11 dbms_output.put_line('Not OK');
12 end if;
13 end;
14 /
Not OK
PL/SQL procedure successfully completed.
SQL>
So, what to do? One option is to split variables to rows and then check whether search value (10, right?) exists in such a list of values (rows). The result is - as you can see - "OK":
SQL> declare
2 l_new_id number := 10;
3 --
4 l_var1 varchar2(20) := '10,20';
5 l_var2 varchar2(20) := '30,40';
6 l_cnt number;
7 begin
8 -- count how many times L_NEW_ID exists in L_VAR1 and L_VAR2
9 select count(*)
10 into l_cnt
11 from (-- split L_VAR1 into rows
12 select regexp_substr(l_var1, '[^,]+', 1, level) val
13 from dual
14 connect by level <= regexp_count(l_var1, ',') + 1
15 union all
16 -- split L_VAR2 into rows
17 select regexp_substr(l_var2, '[^,]+', 1, level)
18 from dual
19 connect by level <= regexp_count(l_var2, ',') + 1
20 ) x
21 where x.val = l_new_id;
22
23 if l_cnt > 0 then
24 dbms_output.put_line('OK');
25 else
26 dbms_output.put_line('Not OK');
27 end if;
28 end;
29 /
OK
PL/SQL procedure successfully completed.
SQL>
Just to verify it, let's change L_NEW_ID value to e.g. 99 and see what happens; as expected, "Not OK" as 99 isn't contained in 10, 20 nor 30, 40.
SQL> l2
2* l_new_id number := 10;
SQL> c/10/99
2* l_new_id number := 99;
SQL> /
Not OK
PL/SQL procedure successfully completed.
SQL>

Check if two values are present in a table with PL/SQL in Oracle SQL

I'm trying to create a procedure that checks if two values are present in a table.
The logic is as follows: create a function called get_authority. This function takes two parameters (found in the account_owner table), cust_id and acc_id, and returns 1 (one) if the customer has the right to make withdrawals from the account, or 0 (zero) if the customer doesn't have any authority over the account. I'm writing PL/SQL and using Oracle Live SQL. I can't figure out how to handle the scenario where a customer has two accounts!
account_owner is seen here:
create or replace function get_authority(
  p_cust_id in account_owner.cust_id%type,
  p_acc_id  in account_owner.acc_id%type
)
  return varchar2
as
  v_return  number(1);
  v_acc_id  account_owner.acc_id%type;
  v_cust_id account_owner.cust_id%type;
begin
  for v_ in (select account_owner.cust_id,
                    account_owner.acc_id
               from account_owner
              where p_cust_id = cust_id)
  LOOP
    if p_cust_id = v_cust_id and p_acc_id = v_acc_id then
      v_return := v_return + 1;
    else
      v_return := v_return + 0;
    end if;
    return v_return;
  END LOOP;
end;
/
When I check for the cust_id I get 0 returned - but it should be 1??
select get_authority('650707-1111',123) from dual;
return:
GET_AUTHORITY('650707-1111',123)
0
What am I doing wrong?
You got 0? How come? It should be NULL.
v_return number(1);
so it is initially NULL. Later on, you're adding "something" to it, but adding anything to NULL results in NULL:
SQL> select 25 + null as result from dual;
RESULT
----------
SQL>
Therefore, set its default value to 0 (zero):
v_return number(1) := 0;
Also, you declared two additional variables:
v_acc_id account_owner.acc_id%type;
v_cust_id account_owner.cust_id%type;
Then you compare them to values passed as parameters; as they are NULL, ELSE is executed.
Furthermore, there's a loop, but you don't actually use it. If you thought that this:
for v_ in (select account_owner.cust_id, ...
(rewritten, for v_ in (select cust_id ...)) evaluates to v_cust_id - it does not. Cursor columns are referred to as v_.cust_id (note the dot in between).
Also, if there's only one row per (p_cust_id, p_acc_id) combination, why use a cursor FOR loop at all? To avoid no_data_found or too_many_rows? I wouldn't do that; yes, it avoids such "errors", but it is confusing. You'd rather handle the exceptions properly.
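For illustration, a minimal sketch of that alternative (inside a function with the same parameters and local variables, selecting straight into them and handling the exceptions explicitly; this is not the approach used below):
begin
  select cust_id, acc_id
    into v_cust_id, v_acc_id
    from account_owner
   where cust_id = p_cust_id
     and acc_id  = p_acc_id;
  return 1;             -- a matching row exists: authority granted
exception
  when no_data_found then
    return 0;           -- no such combination: no authority
  when too_many_rows then
    return 1;           -- duplicate rows still prove authority exists
end;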
Here's what you might have done:
Sample data:
SQL> select * From account_owner;
ACCOW_ID CUST_ID ACC_ID
---------- ----------- ----------
1 650707-1111 123
2 560126-1148 123
3 650707-1111 5899
Function: if there are several rows per combination of parameters, the MAX function makes sure that too_many_rows is avoided (since that bothers you). You don't really care what it returns - what matters is that the SELECT returns anything at all, which proves that authority exists for that account.
SQL> create or replace function get_authority
2 (p_cust_id in account_owner.cust_id%type,
3 p_acc_id in account_owner.acc_id%type
4 )
5 return number
6 is
7 l_accow_id account_owner.accow_id%type;
8 begin
9 select max(o.accow_id)
10 into l_accow_id
11 from account_owner o
12 where o.cust_id = p_cust_id
13 and o.acc_id = p_acc_id;
14
15 return case when l_accow_id is not null then 1
16 else 0
17 end;
18 end;
19 /
Function created.
Testing:
SQL> select get_authority('650707-1111', 123) res_1,
2 get_authority('650707-1111', 5899) res_2
3 from dual;
RES_1 RES_2
---------- ----------
1 1
SQL>

How to generate a random, unique, alphanumeric ID of length N in Postgres 9.6+?

I've seen a bunch of different solutions on StackOverflow that span many years and many Postgres versions, but with some of the newer features like gen_random_bytes I want to ask again to see if there is a simpler solution in newer versions.
Given IDs which contain a-zA-Z0-9, and vary in size depending on where they're used, like...
bTFTxFDPPq
tcgHAdW3BD
IIo11r9J0D
FUW5I8iCiS
uXolWvg49Co5EfCo
LOscuAZu37yV84Sa
YyrbwLTRDb01TmyE
HoQk3a6atGWRMCSA
HwHSZgGRStDMwnNXHk3FmLDEbWAHE1Q9
qgpDcrNSMg87ngwcXTaZ9iImoUmXhSAv
RVZjqdKvtoafLi1O5HlvlpJoKzGeKJYS
3Rls4DjWxJaLfIJyXIEpcjWuh51aHHtK
(Like the IDs that Stripe uses.)
How can you generate them randomly and safely (as far as reducing collisions and reducing predictability goes) with an easy way to specify different lengths for different use cases, in Postgres 9.6+?
I'm thinking that ideally the solution has a signature similar to:
generate_uid(size integer) returns text
Where size is customizable depending on your own tradeoffs for lowering the chance of collisions vs. reducing the string size for usability.
From what I can tell, it must use gen_random_bytes() instead of random() for true randomness, to reduce the chance that they can be guessed.
Thanks!
I know there's gen_random_uuid() for UUIDs, but I don't want to use them in this case. I'm looking for something that gives me IDs similar to what Stripe (or others) use, that look like: "id": "ch_19iRv22eZvKYlo2CAxkjuHxZ" that are as short as possible while still containing only alphanumeric characters.
This requirement is also why encode(gen_random_bytes(), 'hex') isn't quite right for this case, since it reduces the character set and thus forces me to increase the length of the strings to avoid collisions.
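For comparison, hex encoding (gen_random_bytes comes from the pgcrypto extension) turns 8 random bytes into 16 characters drawn only from 0-9a-f:
SELECT encode(gen_random_bytes(8), 'hex');
-- e.g. '9b1dc6f0a2e4517c' (the output differs on every call)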
I'm currently doing this in the application layer, but I'm looking to move it into the database layer to reduce interdependencies. Here's what the Node.js code for doing it in the application layer might look like:
var crypto = require('crypto');
var set = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
function generate(length) {
  var bytes = crypto.randomBytes(length);
  var chars = [];
  for (var i = 0; i < bytes.length; i++) {
    chars.push(set[bytes[i] % set.length]);
  }
  return chars.join('');
}
Figured this out, here's a function that does it:
-- requires the pgcrypto extension for gen_random_bytes()
CREATE OR REPLACE FUNCTION generate_uid(size INT) RETURNS TEXT AS $$
DECLARE
  characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  bytes BYTEA := gen_random_bytes(size);
  l INT := length(characters);
  i INT := 0;
  output TEXT := '';
BEGIN
  WHILE i < size LOOP
    output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
    i := i + 1;
  END LOOP;
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
And then to run it simply do:
generate_uid(10)
-- '3Rls4DjWxJ'
Warning
When doing this you need to be sure that the length of the IDs you are creating is sufficient to avoid collisions over time as the number of objects you've created grows, which can be counter-intuitive because of the Birthday Paradox. So you will likely want a length greater (or much greater) than 10 for any reasonably commonly created object; I just used 10 as a simple example.
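As a rough illustration (an estimate only, using the birthday approximation and assuming 10-character IDs over the 62-character set with about one million rows), the collision probability is roughly 1 - exp(-n^2 / (2 * 62^10)), which you can compute directly:
-- birthday-bound estimate for ~1,000,000 ten-character base62 IDs
SELECT 1 - exp(-(1000000::numeric ^ 2) / (2 * 62::numeric ^ 10)) AS approx_collision_probability;
-- roughly 6e-7, i.e. well under one in a million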
Usage
With the function defined, you can use it in a table definition, like so:
CREATE TABLE users (
id TEXT PRIMARY KEY DEFAULT generate_uid(10),
name TEXT NOT NULL,
...
);
And then when inserting data, like so:
INSERT INTO users (name) VALUES ('ian');
INSERT INTO users (name) VALUES ('victor');
SELECT * FROM users;
It will automatically generate the id values:
id | name | ...
-----------+--------+-----
owmCAx552Q | ian |
ZIofD6l3X9 | victor |
Usage with a Prefix
Or maybe you want to add a prefix for convenience when looking at a single ID in the logs or in your debugger (similar to how Stripe does it), like so:
CREATE TABLE users (
id TEXT PRIMARY KEY DEFAULT ('user_' || generate_uid(10)),
name TEXT NOT NULL,
...
);
INSERT INTO users (name) VALUES ('ian');
INSERT INTO users (name) VALUES ('victor');
SELECT * FROM users;
id | name | ...
---------------+--------+-----
user_wABNZRD5Zk | ian |
user_ISzGcTVj8f | victor |
I'm looking for something that gives me "shortcodes" (similar to what Youtube uses for video IDs) that are as short as possible while still containing only alphanumeric characters.
This is a fundamentally different question from what you first asked. What you want here then is to put a serial type on the table, and to use hashids.org code for PostgreSQL.
It maps 1:1 to the unique number (serial).
It never repeats and has no chance of collision.
It is also base62 [a-zA-Z0-9].
The code looks like this:
SELECT id, hash_encode(foo.id)
FROM foo; -- Result: jNl for 1001
SELECT hash_decode('jNl') -- returns 1001
This module also supports salts.
Review:
26 characters in [a-z]
26 characters in [A-Z]
10 characters in [0-9]
62 characters in [a-zA-Z0-9] (base62)
The function substring(string [from int] [for int]) looks useful.
So it looks something like this. First we demonstrate that we can pull a single character from the set at a given position.
SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  1, -- 1 is 'a', 62 is '9'
  1
);
Now we need a random position between 1 and 62:
SELECT trunc(random()*62)::int+1
FROM generate_series(1,1e2) AS gs(x)
This gets us there. Now we just have to join the two.
SELECT substring(
  'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
  trunc(random()*62)::int+1,
  1
)
FROM generate_series(1,1e2) AS gs(x);
Then we wrap it in an ARRAY constructor (because this is fast)
SELECT ARRAY(
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
trunc(random()*62)::int+1,
1
)
FROM generate_series(1,1e2) AS gs(x)
);
And, we call array_to_string() to get a text.
SELECT array_to_string(
ARRAY(
SELECT substring(
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
trunc(random()*62)::int+1,
1
)
FROM generate_series(1,1e2) AS gs(x)
)
, ''
);
From here we can even turn it into a function:
CREATE FUNCTION random_string(randomLength int)
RETURNS text AS $$
  SELECT array_to_string(
    ARRAY(
      SELECT substring(
        'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
        trunc(random()*62)::int+1,
        1
      )
      FROM generate_series(1, randomLength) AS gs(x)
    ),
    ''
  )
$$ LANGUAGE SQL
RETURNS NULL ON NULL INPUT
VOLATILE LEAKPROOF;
and then
SELECT * FROM random_string(10);
Thanks to Evan Carroll's answer, I took a look at hashids.org.
For Postgres you have to compile the extension or run some ported SQL functions.
But for my needs, I created something simpler based on the hashids ideas (short, unguessable, unique, custom alphabet, avoiding curse words).
Shuffle alphabet:
CREATE OR REPLACE FUNCTION consistent_shuffle(alphabet TEXT, salt TEXT) RETURNS TEXT AS $$
DECLARE
  SALT_LENGTH INT := length(salt);
  integer INT = 0;
  temp TEXT = '';
  j INT = 0;
  v INT := 0;
  p INT := 0;
  i INT := length(alphabet) - 1;
  output TEXT := alphabet;
BEGIN
  IF salt IS NULL OR length(LTRIM(RTRIM(salt))) = 0 THEN
    RETURN alphabet;
  END IF;
  WHILE i > 0 LOOP
    v := v % SALT_LENGTH;
    integer := ASCII(substr(salt, v + 1, 1));
    p := p + integer;
    j := (integer + v + p) % i;
    -- swap the characters at (0-based) positions j and i
    temp := substr(output, j + 1, 1);
    output := substr(output, 1, j) || substr(output, i + 1, 1) || substr(output, j + 2);
    output := substr(output, 1, i) || temp || substr(output, i + 2);
    i := i - 1;
    v := v + 1;
  END LOOP;
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
The main function:
CREATE OR REPLACE FUNCTION generate_uid(id INT, min_length INT, salt TEXT) RETURNS TEXT AS $$
DECLARE
  clean_alphabet TEXT := 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
  curse_chars TEXT := 'csfhuit';
  curse TEXT := curse_chars || UPPER(curse_chars);
  alphabet TEXT := regexp_replace(clean_alphabet, '[' || curse || ']', '', 'gi');
  shuffle_alphabet TEXT := consistent_shuffle(alphabet, salt);
  char_length INT := length(alphabet);
  output TEXT := '';
BEGIN
  -- encode the id in the shuffled alphabet (base char_length)
  WHILE id != 0 LOOP
    output := output || substr(shuffle_alphabet, (id % char_length) + 1, 1);
    id := trunc(id / char_length);
  END LOOP;
  -- pad up to the minimum length with shuffled curse characters
  curse := consistent_shuffle(curse, output || salt);
  output := RPAD(output, min_length, curse);
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;
How-to use examples:
-- 3: min-length
select generate_uid(123, 3, 'salt'); -- output: "0mH"
-- or as default value in a table
CREATE SEQUENCE IF NOT EXISTS my_id_serial START 1;
CREATE TABLE collections (
id TEXT PRIMARY KEY DEFAULT generate_uid(CAST (nextval('my_id_serial') AS INTEGER), 3, 'salt')
);
insert into collections DEFAULT VALUES ;
This query generates the required string. Just change the second parameter of generate_series to choose the length of the random string.
SELECT string_agg(c, '')
FROM (
  SELECT chr(r + CASE WHEN r > 25 + 9 THEN 97 - 26 - 9 WHEN r > 9 THEN 64 - 9 ELSE 48 END) AS c
  FROM (
    SELECT i, (random() * 60)::int AS r
    FROM generate_series(0, 62) AS i
  ) AS a
  ORDER BY i
) AS A;
So I had my own use-case for something like this. I am not proposing a solution to the top question, but if you are looking for something similar like I am, then try this out.
My use-case was that I needed to create a random external UUID (as a primary key) with as few characters as possible. Thankfully, the scenario did not have a requirement that a large amount of these would ever be needed (probably in the thousands only). Therefore a simple solution was a combination of using generate_uid() and checking to make sure that the next sequence was not already used.
Here is how I put it together:
CREATE OR REPLACE FUNCTION generate_id (
  in  length     INT,
  in  for_table  text,
  in  for_column text,
  OUT next_id    TEXT
) AS
$$
DECLARE
  id_is_used BOOLEAN;
  loop_count INT := 0;
  characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  loop_length INT;
BEGIN
  LOOP
    next_id := '';
    loop_length := 0;
    WHILE loop_length < length LOOP
      next_id := next_id || substr(characters, get_byte(gen_random_bytes(length), loop_length) % length(characters) + 1, 1);
      loop_length := loop_length + 1;
    END LOOP;
    EXECUTE format('SELECT TRUE FROM %s WHERE %s = %s LIMIT 1', for_table, for_column, quote_literal(next_id)) into id_is_used;
    EXIT WHEN id_is_used IS NULL;
    loop_count := loop_count + 1;
    IF loop_count > 100 THEN
      RAISE EXCEPTION 'Too many loops. Might be reaching the practical limit for the given length.';
    END IF;
  END LOOP;
END
$$
LANGUAGE plpgsql
STABLE
;
Here is an example of using it in a table definition:
create table some_table (
  id TEXT
    DEFAULT generate_id(6, 'some_table', 'id')
    PRIMARY KEY
);
and a test to see how it breaks:
DO
$$
DECLARE
  loop_count INT := 0;
BEGIN
  -- WHILE LOOP
  WHILE loop_count < 1000000
  LOOP
    INSERT INTO some_table VALUES (DEFAULT);
    loop_count := loop_count + 1;
  END LOOP;
END
$$ LANGUAGE plpgsql;

Will the result from a pipelined function always be sorted as "written", or not?

I need to get a renumbered result set, for example:
CREATE OR REPLACE TYPE nums_list IS TABLE OF NUMBER;
CREATE OR REPLACE FUNCTION generate_series(from_n INTEGER, to_n INTEGER, cycle_max INTEGER)
  RETURN nums_list PIPELINED AS
  cycle_iteration INTEGER := from_n;
BEGIN
  FOR i IN from_n..to_n LOOP
    PIPE ROW( cycle_iteration );
    cycle_iteration := cycle_iteration + 1;
    IF cycle_iteration > cycle_max THEN
      cycle_iteration := from_n;
    END IF;
  END LOOP;
  RETURN;
END;
SELECT * FROM TABLE(generate_series(1,10,3));
The question is: is there any guarantee that Oracle will always return the result in this order?
1
2
3
1
2
3
1
2
3
1
Or might the result sometimes come back ordered unexpectedly, like this:
1
1
1
1
2
2
....
?
Pipelining negates the need to build huge collections by piping rows
out of the function as they are created, saving memory and allowing
subsequent processing to start before all the rows are generated
pipelined-table-functions
This means the rows start being processed before they have all been produced, and that's why you may see an unpredictable order; as with any query, the order is only guaranteed if you add an ORDER BY.
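If you do need a guaranteed order, one option (a sketch only, using hypothetical type and function names) is to pipe out an explicit ordering key alongside the value and add an ORDER BY on it:
CREATE OR REPLACE TYPE num_row IS OBJECT (seq NUMBER, val NUMBER);
/
CREATE OR REPLACE TYPE num_rows IS TABLE OF num_row;
/
CREATE OR REPLACE FUNCTION generate_series_ordered(from_n INTEGER, to_n INTEGER, cycle_max INTEGER)
  RETURN num_rows PIPELINED AS
  cycle_iteration INTEGER := from_n;
BEGIN
  FOR i IN from_n..to_n LOOP
    PIPE ROW( num_row(i, cycle_iteration) );  -- i is the explicit ordering key
    cycle_iteration := cycle_iteration + 1;
    IF cycle_iteration > cycle_max THEN
      cycle_iteration := from_n;
    END IF;
  END LOOP;
  RETURN;
END;
/
SELECT val FROM TABLE(generate_series_ordered(1,10,3)) ORDER BY seq;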

COUNT function for files?

Is it possible to use COUNT in some way that will give me the number of tuples that are in a .sql file? I tried using it in a query with the file name like this:
SELECT COUNT(*) FROM #q65b;
It tells me that the table is invalid, which I understand because it isn't a table; q65b is a file with a query saved in it. I'm trying to compare the number of rows in q65b to a view that I have created. Is this possible, or do I just have to run the query and check the number of rows at the bottom?
Thanks
You can do this in SQL*Plus. For example:
Create the text file, containing the query (note: no semicolon!):
select * from dual
Save it in a file, e.g. myqueryfile.txt, to the folder accessible from your SQL*Plus session.
You can now call this from within another SQL query - but make sure the # is at the start of a line, e.g.:
SQL> select * from (
2 #myqueryfile.txt
3 );
D
-
X
I don't personally use this feature much, however.
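If that trick works in your environment, the same approach should cover the original row-count requirement, for example:
SQL> select count(*) from (
  2  #myqueryfile.txt
  3  );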
Here is one approach. It's a function which reads a file in a directory, wraps the contents in a select count(*) from ( .... ) construct and executes the resultant statement.
1 create or replace function get_cnt
2 ( p_file in varchar2 )
3 return number
4 as
5 n pls_integer;
6 stmt varchar2(32767);
7 f_line varchar2(255);
8 fh utl_file.file_type;
9 begin
10 stmt := 'select count(*) from (';
11 fh := utl_file.fopen('SQL_SCRIPTS', p_file, 'R');
12 loop
13 utl_file.get_line(fh, f_line );
14 if f_line is null then exit;
15 elsif f_line = '/' then exit;
16 else stmt := stmt ||chr(10)||f_line;
17 end if;
18 end loop;
19 stmt := stmt || ')';
20 execute immediate stmt into n;
21 return n;
22* end get_cnt;
SQL>
Here are the contents of a sql file, scripts/q_emp.sql:
select * from emp
/
And here is how the script runs:
SQL> select get_cnt ('q_emp.sql') from dual
2 /
GET_CNT('Q_EMP.SQL')
--------------------
14
SQL>
So it works. Obviously what I have posted is just a proof of concept. You will need to include lots of error handling for the UTL_FILE aspects - it's a package which can throw lots of exceptions - and probably some safety checking of the script that gets passed.
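For example, one very crude sanity check (a sketch only, far from exhaustive) could be added just before the execute immediate to reject anything that doesn't look like a plain query:
-- hypothetical guard, placed before "execute immediate stmt into n":
if instr(stmt, ';') > 0
   or regexp_like(stmt, '(insert|update|delete|merge|drop|alter|grant)', 'i')
then
  raise_application_error(-20001, 'script does not look like a plain SELECT');
end if;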