UTL_MATCH-like function to work with CLOB - sql

My question is: Is there a UTL_MATCH-like function which works with a CLOB rather than a VARCHAR2?
My specific problem is: I'm on an Oracle database. I have a bunch of pre-written queries which interface with Domo CenterView. The queries have variables in them defined by ${variableName}. I need to rewrite these queries. I didn't write the original so instead of figuring out what a good value for the variables should be I want to run the queries with the application and get what the query was from V$SQL.
So my solution is: Do a UTL_MATCH on the queries with the variable stuff in it and V$SQL.SQL_FULLTEXT. However, UTL_MATCH is limited to VARCHAR2 and the datatype of V$SQL.SQL_FULLTEXT is CLOB. So, this is why I'm looking for a UTL_MATCH-like function which works with a CLOB datatype.
Any other tips of how to accomplish this are welcome. Thanks!
Edit, about the tips. If you have a better idea of how to do this, let me just tell you some information I've got at my disposal. I have about 100 queries, they're all in an excel spreadsheet (the ones with the ${variableName} in them). So I could pretty easily use excel to write a query for me. I'm hoping to just union all those queries together and copy the output to another sheet. Anyway, maybe that's helpful if you're thinking there's a better way to do this.
An example: Let's say I have the following query from Domo:
select department.dept_name
from department
where department.id = '${selectedDepartmentId}'
;
I want to call something like this:
select v.sql_fulltext
from v$sql v
where utl_match.jaro_winkler_similarity(v.sql_fulltext,
'select department.dept_name
from department
where department.id = ''${selectedDepartmentId}''') > 90
;
And get something like this in return:
SQL_FULLTEXT
------------------------------------------
select department.dept_name
from department
where department.id = '154'
What I've tried:
I tried substringing the clob and casting it to a varchar. I was really hopeful this would work, but it gives me an error. Here's the code:
select v.sql_fulltext
from v$sql v
where utl_match.jaro_winkler_similarity( cast( substr (v.sql_fulltext, 0, 4000) as varchar2 (4000)),
'select department.dept_name
from department
where department.id = ''${selectedDepartmentId}''') > 90
;
And here's the error:
ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 8000, maximum: 4000)
However, if I run this it works fine:
select cast(substr(v.sql_fulltext, 0, 4000) as varchar2 (4000))
from v$sql v
;
So I'm not sure what the problem is with casting the substring...

UTL_MATCH is a package for comparing strings to check how similar they are. Its functions evaluate two strings and return a score. So all you're going to get is a number indicating (say) how many edits you need to turn ${variableName} into "Farmville" or "StackOverflow".
What you won't get is the actual differences: these two strings of text are identical except at offset 123 where it replaces ${variableName} with "Farmville".
Putting it like that suggests an alternative approach: use INSTR() and SUBSTR() to locate instances of ${variableName} in your Domo CenterView queries, then use those offsets to identify the differing text in the v$sql.sql_fulltext equivalents. You can do this with a CLOB in PL/SQL using the DBMS_LOB package.
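As a minimal sketch of that idea, the block below splits a template around its placeholder and uses DBMS_LOB.INSTR and DBMS_LOB.SUBSTR to pull the substituted value out of V$SQL. It assumes each template contains a single ${...} placeholder with text on both sides of it, and that the statement in V$SQL matches the template character for character up to the placeholder (same whitespace and line breaks). The variable names and the hard-coded template are illustrative only, and the text after the value is not verified beyond finding where the suffix starts.

```sql
DECLARE
  -- Illustrative template; in practice it would come from the spreadsheet.
  l_template  VARCHAR2(4000) :=
    'select department.dept_name from department where department.id = ''${selectedDepartmentId}''';
  l_var_start PLS_INTEGER;
  l_var_end   PLS_INTEGER;
  l_prefix    VARCHAR2(4000);
  l_suffix    VARCHAR2(4000);
  l_val_start INTEGER;
  l_val_end   INTEGER;
BEGIN
  -- Split the template into the text before and after the placeholder.
  l_var_start := INSTR(l_template, '${');
  l_var_end   := INSTR(l_template, '}', l_var_start);
  l_prefix    := SUBSTR(l_template, 1, l_var_start - 1);
  l_suffix    := SUBSTR(l_template, l_var_end + 1);

  FOR r IN (SELECT sql_id, sql_fulltext FROM v$sql) LOOP
    -- Only consider statements that start with the template's prefix.
    IF DBMS_LOB.INSTR(r.sql_fulltext, l_prefix, 1, 1) = 1 THEN
      l_val_start := LENGTH(l_prefix) + 1;
      -- The value ends where the template's suffix begins.
      l_val_end := DBMS_LOB.INSTR(r.sql_fulltext, l_suffix, l_val_start, 1);
      IF l_val_end > l_val_start THEN
        DBMS_OUTPUT.PUT_LINE(r.sql_id || ': ' ||
          DBMS_LOB.SUBSTR(r.sql_fulltext, l_val_end - l_val_start, l_val_start));
      END IF;
    END IF;
  END LOOP;
END;
/
```

Run it with SET SERVEROUTPUT ON from a session that can select from V$SQL. Templates with several placeholders would need the same split repeated for each one.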

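If the text you want to search fits in a VARCHAR2, you can just convert the CLOB with DBMS_LOB.SUBSTR. In SQL that means at most 4000 bytes, or 32767 when MAX_STRING_SIZE is EXTENDED; otherwise the 32767 limit applies only inside PL/SQL. Passing the amount explicitly keeps the result within the limit (the amount counts characters, so use a smaller value if the text contains multi-byte characters):
select v.sql_fulltext
from v$sql v
where utl_match.jaro_winkler_similarity(
        dbms_lob.substr(v.sql_fulltext, 4000, 1),
        'select department.dept_name from department where department.id = ''${selectedDepartmentId}''') > 90;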

I ended up creating a custom function for it. Here's the code:
CREATE OR REPLACE function match_clob(clob_1 clob, clob_2 clob) return number as
  similar     number := 0;
  sec_similar number := 0;
  sections    number := 0;
  max_length  number := 3949;
  length_1    number;
  length_2    number;
  vchar_1     varchar2 (3950);
  vchar_2     varchar2 (3950);
begin
  length_1 := length(clob_1);
  length_2 := length(clob_2);
  --dbms_output.put_line('length_1: '||length_1);
  --dbms_output.put_line('length_2: '||length_2);
  IF length_1 > max_length or length_2 > max_length THEN
    -- Compare section by section and average the scores.
    -- Loop over the longer CLOB so that no trailing section is skipped.
    FOR x IN 1 .. ceil(greatest(length_1, length_2) / max_length) LOOP
      --dbms_output.put_line('((x-1)*max_length) + 1'||(x-1)||' * '||max_length||' = '||(((x-1)*max_length) + 1));
      vchar_1 := substr(clob_1, ((x-1)*max_length) + 1, max_length);
      vchar_2 := substr(clob_2, ((x-1)*max_length) + 1, max_length);
      -- dbms_output.put_line('Section '||sections||' vchar_1: '||vchar_1||' ==> vchar_2: '||vchar_2);
      sec_similar := UTL_MATCH.JARO_WINKLER_SIMILARITY(vchar_1, vchar_2);
      --dbms_output.put_line('sec_similar: '||sec_similar);
      similar := similar + sec_similar;
      sections := sections + 1;
    END LOOP;
    --dbms_output.put_line('Similar: '||similar||' ==> Sections: '||sections);
    similar := similar / sections;
  ELSE
    similar := UTL_MATCH.JARO_WINKLER_SIMILARITY(clob_1,clob_2);
  END IF;
  --dbms_output.put_line('Overall Similar: '||similar);
  return(similar);
end;
/
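For illustration, the function could be called against V$SQL in the same shape as the query from the question. This is only a usage sketch: the threshold of 90 is carried over from the question, and the TO_CLOB call is there just to make the conversion of the literal explicit.

```sql
-- Hypothetical usage of match_clob against the cursor cache.
select v.sql_id, v.sql_fulltext
from v$sql v
where match_clob(v.sql_fulltext,
        to_clob('select department.dept_name
from department
where department.id = ''${selectedDepartmentId}''')) > 90;
```

Because the PL/SQL function runs once per row of V$SQL, this can be slow on a busy instance; adding an ordinary filter (for example on parsing_schema_name) before the similarity test may help.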

Related

sql big clob convert rawtohex gives error ORA-06502

Hello guys, I have a table with a CLOB column holding 8000 bytes. I want to convert it to hex using
rawtohex(TO_CHAR(DBMS_LOB.SUBSTR(a.COMMENTS, 8000, 1))) but it throws a
"character string buffer too small" error. Any idea how to solve it?
If your MAX_STRING_SIZE parameter is set to STANDARD, then the maximum RAW size in SQL is 2000 bytes, and your 8000 byte (character?) CLOB is too big for that. You should also consider that your CLOB might contain multibyte characters, so 1 character != 1 byte.
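A quick way to see the character/byte difference is to compare LENGTH and LENGTHB on a string with one accented character. This is a small sketch, not part of the original answer; in an AL32UTF8 database it should report 4 characters and 5 bytes, while a single-byte character set would report 4 and 4.

```sql
-- UNISTR builds the string from code points, so the example does not
-- depend on the client encoding. U+00E9 is "e" with an acute accent.
select length(c) as char_count, lengthb(c) as byte_count
from (select to_char(unistr('caf\00E9')) as c from dual);
```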
If you are absolutely sure your CLOB contains only ASCII characters, you might be able to get away with something like this in PL/SQL:
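-- Assumed declarations: l_clob clob; l_chunk varchar2(2000); l_raw_chunk raw(2000); l_hex clob; i pls_integer;
i := 1;
while i <= length(l_clob) loop
  -- Take the next 2000 characters (2000 bytes for pure ASCII).
  l_chunk := dbms_lob.substr( l_clob, 2000, i );
  l_raw_chunk := utl_raw.cast_to_raw( l_chunk );
  -- l_hex has to be a CLOB: the hex text is twice as long as the input.
  l_hex := l_hex || rawtohex( l_raw_chunk );
  i := i + 2000;
end loop;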
You could also use dbms_lob.converttoblob to convert your CLOB to a BLOB, use dbms_lob.substr to turn that BLOB into chunks of RAWs, then use rawtohex on those. You don't say what you're really trying to do, so I'm not sure I can be more specific.
Edit:
If you are just trying to find instances where 0x2014 or 0xC296 occur, you might be better off with something like this:
select *
from mytable a
where dbms_lob.instr( a.comments, hextoraw( '2014' )) > 0

Extract number from string with Oracle function

I need to create an Oracle DB function that takes a string as parameter. The string contains letters and numbers. I need to extract all the numbers from this string. For example, if I have a string like RO1234, I need to be able to use a function, say extract_number('RO1234'), and the result would be 1234.
To be even more precise, this is the kind of SQL query which this function would be used in.
SELECT DISTINCT column_name, extract_number(column_name)
FROM table_name
WHERE extract_number(column_name) = 1234;
QUESTION: How do I add a function like that to my Oracle database, in order to be able to use it like in the example above, using any of Oracle SQL Developer or SQLTools client applications?
You'd use REGEXP_REPLACE in order to remove all non-digit characters from a string:
select regexp_replace(column_name, '[^0-9]', '')
from mytable;
or
select regexp_replace(column_name, '[^[:digit:]]', '')
from mytable;
Of course you can write a function extract_number. It seems a bit like overkill, though, to write a function that consists of only one function call itself.
create function extract_number(in_number varchar2) return varchar2 is
begin
return regexp_replace(in_number, '[^[:digit:]]', '');
end;
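Since the query in the question compares the result with the number 1234, a variant that returns a NUMBER may be more convenient. The sketch below is an assumption-laden alternative rather than the answerer's code: the name extract_number_n is hypothetical, and DETERMINISTIC is added on the assumption that the function might be used in a function-based index.

```sql
-- Hypothetical variant that returns a NUMBER instead of a VARCHAR2.
create or replace function extract_number_n(in_text varchar2)
  return number deterministic is
begin
  -- Strip every non-digit, then convert what is left; NULL if no digits remain.
  return to_number(regexp_replace(in_text, '[^[:digit:]]', ''));
end;
/

select extract_number_n('RO1234') as n from dual;
```

For 'RO1234' this returns the number 1234, so WHERE extract_number_n(column_name) = 1234 compares numbers without an implicit conversion.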
You can use regular expressions to extract the number from a string. Suppose the string mixing text and numbers is 'stack12345overflow569'. This one should work:
select regexp_replace('stack12345overflow569', '[[:alpha:]]|_') as numbers from dual;
which will return "12345569".
You can also use this one:
select regexp_replace('stack12345overflow569', '[^0-9]', '') as numbers,
regexp_replace('Stack12345OverFlow569', '[^a-zA-Z]', '') as characters
from dual;
which will return "12345569" for numbers and "StackOverFlow" for characters.
This works for me, I only need first numbers in string:
TO_NUMBER(regexp_substr(h.HIST_OBSE, '\.*[[:digit:]]+\.*[[:digit:]]*'))
the field had the following string: "(43 Paginas) REGLAS DE PARTICIPACION".
result field: 43
If you are looking for the first number in the string, including any decimal places, you may try the regexp_substr function like this:
regexp_substr('stack12.345overflow', '\.*[[:digit:]]+\.*[[:digit:]]*')
To extract the alphabetic characters from a string:
SELECT REGEXP_REPLACE(column_name, '[^[:alpha:]]') alpha FROM table_name;
In order to extract the month and year from a string such as 'A0807X', I did the following in PL/SQL:
DECLARE
  lv_promo_code  VARCHAR2(10) := 'A0807X';
  lv_promo_num   VARCHAR2(5);
  lv_promo_month NUMBER(4);
  lv_promo_year  NUMBER(4);
BEGIN
  -- Pick out the first run of four digits, read as MMYY.
  lv_promo_num := REGEXP_SUBSTR(lv_promo_code, '(\d)(\d)(\d)(\d)');
  lv_promo_month := EXTRACT(month from to_date(lv_promo_num, 'MMYY'));
  DBMS_OUTPUT.PUT_LINE(lv_promo_month);
  lv_promo_year := EXTRACT(year from to_date(lv_promo_num, 'MMYY'));
  DBMS_OUTPUT.PUT_LINE(lv_promo_year);
END;
/

Insert character to a number datatype column

Note: I can't change the datatype of the column.
I want to store a character string in a table whose column has the NUMBER datatype.
The workaround I found is to convert the character values to ASCII codes and, when retrieving them from the database, convert them back to characters.
I tried a couple of functions, ASCII and ASCIISTR, but their limitation is that they only convert the first character of the string.
So I used the DUMP function:
select dump('Puneet_kushwah1') from dual;
Result: Typ=96 Len=15: 80,117,110,101,101,116,95,107,117,115,104,119,97,104,49
This function gives the ASCII value of every character. Then I executed the query below:
select replace(substr((DUMP('Puneet_kushwah1')),(instr(DUMP('Puneet_kushwah1'),':')+2 )),',',' ') from dual;
Result: 80 117 110 101 101 116 95 107 117 115 104 119 97 104 49
Then I used a special sequence to fill the spaces, so that I can replace it when retrieving from the database.
select replace(substr((DUMP('Puneet_kushwah1')),(instr(DUMP('Puneet_kushwah1'),':')+2 )),',','040') from dual;
Result: 80040117040110040101040101040116040950401070401170401150401040401190409704010404049
Table definition:
create table test (no number);
Then I inserted it into the table:
INSERT into test SELECT replace(substr((DUMP('Puneet_kushwah1')),(instr(DUMP('Puneet_kushwah1'),':')+2 )),',','040') from dual;
Problem 1:
When I execute
select * from test;
I get
Result: 8.004011704011E82
I want it to return the plain number, exactly as I inserted it.
Problem 2:
When I execute a select, I want it to return the exact character string.
Please help; I have tried many functions.
Thanks in advance.
You can't get the exact string back because Oracle numbers are only stored up to 38 digits of precision.
So if you run this:
select cast(no as varchar2(100))
from test;
You'll get:
80040117040110040101040101040116040950400000000000000000000000000000000000000000000
I advise against proceeding like this, as it is error-prone and a possible maintenance nightmare. That said, I do like a challenge, and I have been forced to do some screwy things myself to make a vendor's bizarre way of doing things work for us, so I sympathize if that is your situation. So, for the fun of it, check this out.
Convert to hex, then to a decimal and insert into the database (x_test has one NUMBER column), then select, converting back:
SQL> insert into x_test
2 select to_number(rawtohex('Puneet_kushwah1'), rpad('X', length(rawtohex('Puneet_kushwah1')), 'X')) from dual;
1 row created.
SQL> select * from x_test;
col1
----------
4.1777E+35
SQL> SELECT utl_raw.cast_to_varchar2(hextoraw(trim(to_char(col1, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))))
2 FROM x_test;
UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW(TRIM(TO_CHAR(col1,'XXXXXXXXXXXXXXXXXXXXXXXXXXXX
--------------------------------------------------------------------------------
Puneet_kushwah1
SQL>
While it's a horrible idea and a horrible data model, you could convert some strings into numbers by converting their raw representation into a number:
create or replace function string_to_number(p_string varchar2)
return number as
l_raw raw(40);
l_number number;
begin
l_raw := utl_i18n.string_to_raw(data => p_string, dst_charset => 'AL32UTF8');
l_number := to_number(rawtohex(l_raw), 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx');
return l_number;
end;
/
And back again:
create or replace function number_to_string(p_number number)
return varchar2 as
l_raw raw(40);
l_string varchar2(20);
begin
l_raw := hextoraw(to_char(p_number, 'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'));
l_string := utl_i18n.raw_to_char(data => l_raw, src_charset => 'AL32UTF8');
return l_string;
end;
/
Which you could use as:
insert into test (no) values (string_to_number('Puneet_kushwah1'));
1 row inserted.
select * from test;
NO
---------------------------------------
417765537084927079232028220523112497
select number_to_string(no) from test;
NUMBER_TO_STRING(NO)
--------------------------------------------------------------------------------
Puneet_kushwah1
You don't really need functions, you could do the conversions in-line; but this makes what's happening a bit clearer.
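But you're restricted by the precision of the number type. NUMBER holds roughly 38 significant decimal digits and each single-byte character takes two hex digits, so I think you're limited to about 15 or 16 characters, but it'll depend a bit on the actual string and its hex representation.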
(I am not endorsing this approach, it's just a mildly interesting problem).

ERROR at line 191: ORA-01489: result of string concatenation is too long [duplicate]

Select TO_CLOB(a)|| TO_CLOB(b)|| TO_CLOB(c) || TO_CLOB(d)
from table1
The above query is not spooling the data properly into a text file,
whereas
Select a||b||c||d
from table1
ends with
ERROR at line 191: ORA-01489: result of string concatenation is too long.
Please help!
In SQL, VARCHAR2 values are limited to 4000 bytes (unless MAX_STRING_SIZE is set to EXTENDED). If you get this error
ERROR at line 191: ORA-01489: result of string concatenation is too long.
then it is pretty clear that the concatenation exceeds 4000 bytes.
Now what to do?
Your first solution, using CLOB instead, is correct.
select TO_CLOB(a)|| TO_CLOB(b)|| TO_CLOB(c) || TO_CLOB(d)
It seems like your real problem is saving to file
Above query is not spooling the data properly into text file.
While you did not post how you save the resulting CLOB to a file, I believe you are not doing it correctly. Saving it the same way you would a VARCHAR2 will not work.
You need to first use dbms_lob.read to read the LOB from the database, then use utl_file.put_raw to write it to the file.
DECLARE
  position    INTEGER := 1;
  byte_length INTEGER;
  len         INTEGER;
  vblob       BLOB;
  rawlob      RAW(32760);
  output      utl_file.file_type;
BEGIN
  -- 'wb' opens the file in byte mode; the last argument is the maximum line size.
  output := utl_file.fopen('DIR', 'filename', 'wb', 32760);

  -- yourLob, somewhere and something are placeholders; yourLob is assumed to be a BLOB
  -- (a CLOB would first have to be converted, e.g. with dbms_lob.converttoblob).
  SELECT yourLob, dbms_lob.getlength(yourLob)
    INTO vblob, len
    FROM somewhere
   WHERE something;

  -- Write part by part; the last part is simply shorter.
  WHILE position <= len LOOP
    byte_length := 32760;
    -- dbms_lob.read sets byte_length to the number of bytes actually read.
    dbms_lob.read(vblob, byte_length, position, rawlob);
    utl_file.put_raw(output, rawlob);
    -- You must admit, you would have forgotten to flush.
    utl_file.fflush(output);
    position := position + byte_length;
  END LOOP;

  utl_file.fclose(output);
END;
/
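The block above works on a BLOB, while the question concatenates into a CLOB. As a hedged sketch of the same idea for a CLOB, the block below reads the text in chunks with dbms_lob.read and writes each chunk as raw bytes in the database character set. The ROWNUM = 1 filter is an assumption to keep the example to one row, and 'DIR' and 'filename' are placeholders for an existing directory object and file name.

```sql
DECLARE
  l_clob   CLOB;
  l_file   utl_file.file_type;
  l_pos    INTEGER := 1;
  l_amount INTEGER;
  l_buf    VARCHAR2(32767);
BEGIN
  -- Assumption: a single row is wanted; table1 and columns a..d come from the question.
  SELECT TO_CLOB(a) || TO_CLOB(b) || TO_CLOB(c) || TO_CLOB(d)
    INTO l_clob
    FROM table1
   WHERE ROWNUM = 1;

  -- Byte mode avoids the line-length limit of text mode.
  l_file := utl_file.fopen('DIR', 'filename', 'wb', 32767);

  WHILE l_pos <= dbms_lob.getlength(l_clob) LOOP
    -- Characters per chunk; 8000 characters are at most 32000 bytes in AL32UTF8.
    l_amount := 8000;
    -- dbms_lob.read sets l_amount to the number of characters actually read.
    dbms_lob.read(l_clob, l_amount, l_pos, l_buf);
    -- The third argument asks put_raw to flush after each write.
    utl_file.put_raw(l_file, utl_raw.cast_to_raw(l_buf), TRUE);
    l_pos := l_pos + l_amount;
  END LOOP;

  utl_file.fclose(l_file);
END;
/
```

Writing several rows would mean wrapping the loop in a cursor and adding a line terminator between rows.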
How about increasing the value of LONG? In SQL*Plus, LONG sets the maximum number of bytes displayed for CLOB and LONG values (80 by default), so you might have to set it to a higher value.
Example:
SET LONG 100000;
SPOOL test_clob.txt
SELECT to_clob(lpad('A',4000,'A'))
||'B'
||to_clob(lpad('C',4000,'C'))
||'D'
||to_clob(lpad('E',4000,'E'))
||'F'
FROM dual;
SPOOL OFF;
Your second query returns an error because the concatenation operator (||) in the query is trying to return a VARCHAR2, which is limited to 4000 bytes, and that limit is exceeded.
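LONG is usually not the only SQL*Plus setting that affects spooled CLOB output. The script below is a sketch of settings that commonly matter, applied to the query from the question; the values are assumptions to adjust for your data, and the maximum LINESIZE is platform dependent.

```sql
-- Maximum number of bytes of a CLOB to display.
SET LONG 2000000
-- Size of the pieces in which the CLOB is retrieved.
SET LONGCHUNKSIZE 32767
-- Widen the line so long values are not wrapped at the default width.
SET LINESIZE 32767
-- Suppress headings and page breaks.
SET PAGESIZE 0
-- Drop trailing blanks from spooled lines.
SET TRIMSPOOL ON
SPOOL test_clob.txt
SELECT TO_CLOB(a) || TO_CLOB(b) || TO_CLOB(c) || TO_CLOB(d)
FROM table1;
SPOOL OFF
```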
SQL*Plus seems to hard-code a line size of 81 when displaying a CLOB, and there seems to be no way around it. Therefore, if you want to generate a CSV file to be loaded into other databases, you will have the problem of parsing these extra newlines.
The ultimate solution is to use PL/SQL. For example, to generate a comma-delimited CSV file from table "xyz", use the following code:
set lin 32766
set serveroutput on size unlimited
DECLARE
  TYPE arraytable IS TABLE OF xyz%ROWTYPE;
  myarray arraytable;
  CURSOR c IS
    select * from xyz;
BEGIN
  OPEN c;
  LOOP
    -- Fetch in batches of 10000 rows to limit memory use.
    FETCH c BULK COLLECT INTO myarray LIMIT 10000;
    FOR i IN 1 .. myarray.COUNT
    LOOP
      DBMS_OUTPUT.PUT_LINE(
        myarray(i).col1||','||
        myarray(i).col2||','||
        myarray(i).col3||','||
        myarray(i).col4);
    END LOOP;
    -- Exit after processing the last (possibly partial) batch.
    EXIT WHEN c%NOTFOUND;
  END LOOP;
  CLOSE c;
END;
/
A bonus of this approach is that this even works with LONG datatype!
Try xmlagg function. That worked well for me when I encountered a similar issue.
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions215.htm

How to get size in bytes of a CLOB column in Oracle?

How do I get the size in bytes of a CLOB column in Oracle?
LENGTH() and DBMS_LOB.getLength() both return the number of characters in the CLOB, but I need to know how many bytes are used (I'm dealing with multibyte character sets).
After some thinking I came up with this solution:
LENGTHB(TO_CHAR(SUBSTR(<CLOB-Column>,1,4000)))
SUBSTR returns only the first 4000 characters (max string size)
TO_CHAR converts from CLOB to VARCHAR2
LENGTHB returns the length in Bytes used by the string.
I'm adding my comment as an answer because it solves the original problem for a wider range of cases than the accepted answer. Note: you must still know the maximum length and the approximate proportion of multi-byte characters that your data will have.
If you have a CLOB greater than 4000 bytes, you need to use DBMS_LOB.SUBSTR rather than SUBSTR. Note that the amount and offset parameters are reversed in DBMS_LOB.SUBSTR.
Next, you may need to substring an amount less than 4000, because this parameter is the number of characters, and if you have multi-byte characters then 4000 characters will be more than 4000 bytes long, and you'll get ORA-06502: PL/SQL: numeric or value error: character string buffer too small because the substring result needs to fit in a VARCHAR2 which has a 4000 byte limit. Exactly how many characters you can retrieve depends on the average number of bytes per character in your data.
So my answer is:
LENGTHB(TO_CHAR(DBMS_LOB.SUBSTR(<CLOB-Column>,3000,1)))
+NVL(LENGTHB(TO_CHAR(DBMS_LOB.SUBSTR(<CLOB-Column>,3000,3001))),0)
+NVL(LENGTHB(TO_CHAR(DBMS_LOB.SUBSTR(<CLOB-Column>,3000,6001))),0)
+...
where you add as many chunks as you need to cover your longest CLOB, and adjust the chunk size according to average bytes-per-character of your data.
Try this one for CLOB sizes bigger than VARCHAR2:
We have to split the CLOB into parts of "VARCHAR2 compatible" size, run lengthb on every part, and sum the results.
declare
  my_sum int;
begin
  -- COLUMN and TABLE are placeholders for your CLOB column and table names.
  for x in ( select COLUMN, ceil(DBMS_LOB.getlength(COLUMN) / 2000) steps from TABLE )
  loop
    my_sum := 0;
    for y in 1 .. x.steps
    loop
      my_sum := my_sum + lengthb(dbms_lob.substr( x.COLUMN, 2000, (y-1)*2000+1 ));
      -- some additional output
      dbms_output.put_line('step:' || y );
      dbms_output.put_line('char length:' || length(dbms_lob.substr( x.COLUMN, 2000, (y-1)*2000+1 )));
      dbms_output.put_line('byte length:' || lengthb(dbms_lob.substr( x.COLUMN, 2000, (y-1)*2000+1 )));
    end loop;
    dbms_output.put_line('char summary:' || DBMS_LOB.getlength(x.COLUMN));
    dbms_output.put_line('byte summary:' || my_sum);
  end loop;
end;
/
The simple solution is to cast the CLOB to a BLOB and then request the length of the BLOB!
The problem is that Oracle doesn't have a function that casts a CLOB to a BLOB, but we can simply define one:
create or replace
FUNCTION clob2blob (p_in clob) RETURN blob IS
v_blob blob;
v_desc_offset PLS_INTEGER := 1;
v_src_offset PLS_INTEGER := 1;
v_lang PLS_INTEGER := 0;
v_warning PLS_INTEGER := 0;
BEGIN
dbms_lob.createtemporary(v_blob,TRUE);
dbms_lob.converttoblob
( v_blob
, p_in
, dbms_lob.getlength(p_in)
, v_desc_offset
, v_src_offset
, dbms_lob.default_csid
, v_lang
, v_warning
);
RETURN v_blob;
END;
The SQL command to use to obtain number of bytes is
SELECT length(clob2blob(fieldname)) as nr_bytes
or
SELECT dbms_lob.getlength(clob2blob(fieldname)) as nr_bytes
I have tested this on Oracle 10g without Unicode (UTF-8).
But I think this solution should also be correct on a Unicode (UTF-8) Oracle instance :-)
Thanks to Nashev, who posted a solution for converting a CLOB to a BLOB in "How convert CLOB to BLOB in Oracle?", and to a post written in German (the code is in PL/SQL) on 13ter.info.blog, which additionally gives a function to convert a BLOB to a CLOB!
Can somebody test the two commands on a Unicode (UTF-8) CLOB, so I can be sure this works with Unicode?
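One way to try this on a Unicode instance is to build a small CLOB with known multi-byte characters and compare the character count with the byte count. The query below is a sketch that assumes the clob2blob function above has been created; in an AL32UTF8 database it should report 3 characters and 7 bytes (two 2-byte characters and one 3-byte character), but it has not been run here.

```sql
-- U+00E9 takes 2 bytes and U+20AC (euro sign) takes 3 bytes in UTF-8.
select length(c)                        as nr_chars,
       dbms_lob.getlength(clob2blob(c)) as nr_bytes
from (select to_clob(to_char(unistr('\00E9\00E9\20AC'))) as c from dual);
```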
NVL(length(clob_col_name),0) works for me.
Check the LOB segment name from dba_lobs using the table name.
select TABLE_NAME,OWNER,COLUMN_NAME,SEGMENT_NAME from dba_lobs where TABLE_NAME='<<TABLE NAME>>';
Now use the segment name to find the bytes used in dba_segments.
select s.segment_name, s.partition_name, bytes/1048576 "Size (MB)"
from dba_segments s, dba_lobs l
where s.segment_name = l.segment_name
and s.owner = '<< OWNER >> ' order by s.segment_name, s.partition_name;
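It only works up to 4000 bytes. If the CLOB is bigger than 4000 bytes, use this:
declare
  v_clob      clob;    -- assumed to be populated elsewhere
  v_clob_size number;
begin
  -- Note: for a CLOB, getlength counts characters; this equals bytes only for single-byte data.
  v_clob_size := DBMS_LOB.getlength(v_clob) / 1024 / 1024;
  DBMS_OUTPUT.put_line('CLOB Size ' || v_clob_size);
end;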
or
select (DBMS_LOB.getlength(your_column_name))/1024/1024 from your_table