I have a query which returns 50 million rows. I want to generate an XML file for each row (max file size is 100k). Of course I know the tags, but I don't know how to write this in the most efficient way. Any help?
Thanks
I wouldn't recommend trying to write 50M files to disk, but here's some code you can play with to demonstrate why it's not a good idea:
1: a function that writes files to disk using a directory
create or replace function WRITETOFILE (dir in VARCHAR2, fn in VARCHAR2, dd IN clob) return clob AS
  ff UTL_FILE.FILE_TYPE;
  l_amt number := 30000;   -- characters written per chunk
  l_offset number := 1;
  l_length number := nvl(dbms_lob.getlength(dd),0);
  buf varchar2(30000);
begin
  ff := UTL_FILE.FOPEN(dir,fn,'W',32760);
  -- "<=" rather than "<", otherwise a final chunk of one character is never written
  while ( l_offset <= l_length ) loop
    buf := dbms_lob.substr(dd,l_amt,l_offset);
    utl_file.put(ff, buf);
    utl_file.fflush(ff);
    utl_file.new_line(ff);   -- note: this adds a line break after every chunk
    l_offset := l_offset+length(buf);
  end loop;
  UTL_FILE.FCLOSE(ff);
  return dd;
END WRITETOFILE;
/
2: a statement that creates a table from a query that calls the function above. Keep the number of rows small at first to see how it behaves
create table tmptbl as
select writetofile('DMP_DIR','xyz-'||level||'.xml', xmlelement("x", level).getClobVal()) tmpcol, systimestamp added_at
from dual CONNECT BY LEVEL <= 10;
3: drop table to repeat create table statement with more rows
drop table tmptbl purge;
I did 10k files in 10 seconds - which would give 1000 seconds for 1M files and 50000 seconds for 50M files (i.e. just under 14 hours).
We have kiosks where customers can check their purchase volume for two different categories of items. They enter their mobile number, the system sends an OTP to that number, and they enter it to authenticate; the system then has to look up the data and display it. The kiosk supplier has provided us, as developers, with a limited development kit with which we can execute a select statement on the database and display the returned values on the kiosk.
I have created an object type as follows:
CREATE OR REPLACE TYPE rebate_values
AS
OBJECT (ASales_total number,
ACurrent_Rebate_Percent number,
ANeeded_Sales number,
ANext_Rebate_Percent number,
BSales_total number,
BCurrent_Rebate_Percent number,
BNeeded_Sales number,
BNext_Rebate_Percent number);
A function to which I will pass customers' mobile to get their sales and rebate information:
CREATE OR REPLACE FUNCTION AA_rebate_function (P_phone IN NUMBER)
RETURN rebate_values
IS
A_P_Sales_total NUMBER;
A_P_Current_Rebate_Percent NUMBER;
A_P_Needed_Sales NUMBER;
A_P_Next_Rebate_Percent NUMBER;
B_P_Sales_total NUMBER;
B_P_Current_Rebate_Percent NUMBER;
B_P_Needed_Sales NUMBER;
B_P_Next_Rebate_Percent NUMBER;
P_CODE VARCHAR (10);
BEGIN
SELECT CC_CODE
INTO P_CODE
FROM CUSTOMERS
WHERE C_MOBILE = P_phone;
FOR OUTDATA
IN (
--My Query to retrieve the data
Select ................
)
LOOP
IF OUTDATA.CLASS = 'X'
THEN
A_P_Sales_total := OUTDATA.SALES_TOTAL;
A_P_Current_Rebate_Percent := OUTDATA.CURRENT_REBATE_PERCENT;
A_P_Needed_Sales := OUTDATA.NEEDED_SALES_FOR_HIGHER_REBATE;
A_P_Next_Rebate_Percent := OUTDATA.NEXT_HIGHER_REBATE_PERCENT;
END IF;
IF OUTDATA.CLASS = 'Y'
THEN
B_P_Sales_total := OUTDATA.SALES_TOTAL;
B_P_Current_Rebate_Percent := OUTDATA.CURRENT_REBATE_PERCENT;
B_P_Needed_Sales := OUTDATA.NEEDED_SALES_FOR_HIGHER_REBATE;
B_P_Next_Rebate_Percent := OUTDATA.NEXT_HIGHER_REBATE_PERCENT;
END IF;
END LOOP;
RETURN rebate_values (A_P_Sales_total,
A_P_Current_Rebate_Percent,
A_P_Needed_Sales,
A_P_Next_Rebate_Percent,
B_P_Sales_total,
B_P_Current_Rebate_Percent,
B_P_Needed_Sales,
B_P_Next_Rebate_Percent);
END;
/
The query takes 27 seconds to retrieve the values for each customer. Each customer will have 2 rows, so that's why I have used LOOP to collect the values.
When I execute the function:
SELECT AA_rebate_function (XXXXXXXXXX) FROM DUAL;
I get data as follows in a single column within 27 seconds:
(XXXX, X, XXXX, X, XXXX, X, XXXX, X)
But when I execute the function to get the values in different columns, it takes 27 x 8 seconds = 216 seconds, i.e., approximately 3.6 minutes which is a big issue as the customer cannot wait for 3.6 minutes on the kiosk to view the data.
SELECT x.c.ASales_total,
x.c.ACurrent_Rebate_Percent,
x.c.ANeeded_Sales,
x.c.ANext_Rebate_Percent,
x.c.BSales_total,
x.c.BCurrent_Rebate_Percent,
x.c.BNeeded_Sales,
x.c.BNext_Rebate_Percent
FROM (SELECT AA_rebate_function (XXXXXXXXXX) c FROM DUAL) x;
I have tried using a stored procedure with OUT parameters, but it doesn't fit my environment: the kiosk development toolkit only supports select statements, so I cannot execute stored procedures from it. I checked with the supplier and they have no plans to add that support in the near future.
I tried converting the single field into multiple columns using REGEXP_SUBSTR but I get a type conversion error as it is an array.
The query is very complex and has to calculate data for the last 10 years and has millions of rows, 27 seconds is actually the optimum time to get the desired results.
Interesting! I didn't realize that when you query a function that returns an object, it runs the function once for each column you reference the object in. That's awkward.
The easiest solution I could find for this is to switch your function to be PIPELINED. You'll need to create a nested table type to do this.
create type rebate_values_t is table of rebate_values;
/
CREATE OR REPLACE FUNCTION AA_rebate_function (P_phone IN NUMBER)
RETURN rebate_values_t PIPELINED
IS
... your code here ...
PIPE ROW (rebate_values (A_P_Sales_total,
A_P_Current_Rebate_Percent,
A_P_Needed_Sales,
A_P_Next_Rebate_Percent,
B_P_Sales_total,
B_P_Current_Rebate_Percent,
B_P_Needed_Sales,
B_P_Next_Rebate_Percent));
RETURN;
END;
/
SELECT x.ASales_total,
x.ACurrent_Rebate_Percent,
x.ANeeded_Sales,
x.ANext_Rebate_Percent,
x.BSales_total,
x.BCurrent_Rebate_Percent,
x.BNeeded_Sales,
x.BNext_Rebate_Percent
FROM TABLE(AA_rebate_function (XXXXXXXXXX)) x;
For some reason, this should only execute the function once, and take 27 seconds.
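One way to check how many times the function really runs is to count the calls in a package variable. The sketch below is not from the original posts: the package call_counter and the stub function counted_rebate are hypothetical names, and the stub returns fixed values instead of running the 27-second query. It assumes the rebate_values type from the question already exists.

```sql
-- Hypothetical helper package holding a per-session call counter
create or replace package call_counter as
  n pls_integer := 0;
end call_counter;
/
-- Hypothetical stub standing in for AA_rebate_function
create or replace function counted_rebate return rebate_values as
begin
  call_counter.n := call_counter.n + 1;   -- record every execution
  return rebate_values(1, 2, 3, 4, 5, 6, 7, 8);
end;
/
-- Reference two attributes of the returned object
select x.c.ASales_total, x.c.BSales_total
from (select counted_rebate() c from dual) x;

-- Show how many times the function was executed by the query above
set serveroutput on
begin
  dbms_output.put_line('calls: ' || call_counter.n);
end;
/
```

Resetting call_counter.n to 0 and repeating the test against a pipelined version queried through TABLE() lets you compare the two counts on your own database version.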
I need to get a renumbered result set, for example:
CREATE OR REPLACE TYPE nums_list IS TABLE OF NUMBER;
/
CREATE OR REPLACE FUNCTION generate_series(from_n INTEGER, to_n INTEGER, cycle_max INTEGER)
RETURN nums_list PIPELINED AS
  cycle_iteration INTEGER := from_n;
BEGIN
  FOR i IN from_n..to_n LOOP
    PIPE ROW( cycle_iteration );
    cycle_iteration := cycle_iteration + 1;
    -- wrap around once the cycle maximum has been passed
    IF cycle_iteration > cycle_max THEN
      cycle_iteration := from_n;
    END IF;
  END LOOP;
  RETURN;
END;
/
SELECT * FROM TABLE(generate_series(1,10,3));
The question is: is there a guarantee that Oracle will always return the result in this order?
1
2
3
1
2
3
1
2
3
1
Or might the result sometimes come back in an unexpected order, like this:
1
1
1
1
2
2
....
?
"Pipelining negates the need to build huge collections by piping rows out of the function as they are created, saving memory and allowing subsequent processing to start before all the rows are generated." (source: pipelined-table-functions)
This means the calling query starts processing rows before the function has produced all of them. Without an ORDER BY, Oracle does not guarantee the order of the result, so you should not rely on it.
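If the order matters, one option is to pipe an explicit position along with each value and sort on it in the query. This is a minimal sketch rather than part of the original answer; the names series_row, series_list and generate_series_ordered are made up for illustration.

```sql
-- Hypothetical row type carrying the position (seq) next to the value (val)
CREATE OR REPLACE TYPE series_row AS OBJECT (seq NUMBER, val NUMBER);
/
CREATE OR REPLACE TYPE series_list IS TABLE OF series_row;
/
CREATE OR REPLACE FUNCTION generate_series_ordered(from_n INTEGER, to_n INTEGER, cycle_max INTEGER)
RETURN series_list PIPELINED AS
  cycle_iteration INTEGER := from_n;
  seq_no          INTEGER := 0;
BEGIN
  FOR i IN from_n..to_n LOOP
    seq_no := seq_no + 1;                            -- position of this row
    PIPE ROW( series_row(seq_no, cycle_iteration) );
    cycle_iteration := cycle_iteration + 1;
    IF cycle_iteration > cycle_max THEN
      cycle_iteration := from_n;
    END IF;
  END LOOP;
  RETURN;
END;
/
-- The ORDER BY makes the order explicit instead of relying on how rows are piped
SELECT val FROM TABLE(generate_series_ordered(1, 10, 3)) ORDER BY seq;
```

The extra column costs a little memory per row, but the ordering no longer depends on how the function happens to be executed.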
I've got a problem for which I couldn't find an answer so far.
Is there a way to read a blob from an Oracle table using SQL or PL/SQL and measure the time it takes to read it? I mean reading the whole of it; I don't need it displayed anywhere. All I found was how to read 4000 bytes of it, but that's not enough.
For importing there is simply the
SET TIMING ON and OFF option in sqlplus, but a select on the table returns only a small portion of the blob, and no matter how big it is, it always takes pretty much the same time.
Any help, anybody?
Not quite sure what you're trying to achieve, but you can get some timings in a PL/SQL block using dbms_utility.get_time as LalitKumarB suggested. The initial select is (almost) instant though, it's reading through or processing the data that's really measurable. This is reading a blob with three different 'chunk' sizes to show the difference it makes:
set serveroutput on
declare
  l_start number;
  l_blob blob;
  l_byte raw(1);
  l_16byte raw(16);
  l_kbyte raw(1024);
begin
  l_start := dbms_utility.get_time;
  select b into l_blob from t42 where rownum = 1; -- your own query here obviously
  dbms_output.put_line('select: '
    || (dbms_utility.get_time - l_start) || ' hsecs');
  -- read the blob one byte at a time
  l_start := dbms_utility.get_time;
  for i in 1..dbms_lob.getlength(l_blob) loop
    l_byte := dbms_lob.substr(l_blob, 1, i);
  end loop;
  dbms_output.put_line('single byte: '
    || (dbms_utility.get_time - l_start) || ' hsecs');
  -- read in 16-byte chunks; the offset is the start of the i-th chunk
  l_start := dbms_utility.get_time;
  for i in 1..ceil(dbms_lob.getlength(l_blob)/16) loop
    l_16byte := dbms_lob.substr(l_blob, 16, (i - 1) * 16 + 1);
  end loop;
  dbms_output.put_line('16 bytes: '
    || (dbms_utility.get_time - l_start) || ' hsecs');
  -- read in 1024-byte chunks
  l_start := dbms_utility.get_time;
  for i in 1..ceil(dbms_lob.getlength(l_blob)/1024) loop
    l_kbyte := dbms_lob.substr(l_blob, 1024, (i - 1) * 1024 + 1);
  end loop;
  dbms_output.put_line('1024 bytes: '
    || (dbms_utility.get_time - l_start) || ' hsecs');
end;
/
For a sample blob that gives something like:
anonymous block completed
select: 0 hsecs
single byte: 950 hsecs
16 bytes: 61 hsecs
1024 bytes: 1 hsecs
So clearly reading the blob in larger chunks is more efficient. So your "measure time of reading it" is a bit flexible...
I guess you already have the solution to access the BLOB data. For getting the time, use DBMS_UTILITY.GET_TIME before and after the step in your PL/SQL code. You could declare two variables, start_time and end_time to capture the respective times, and just subtract them to get the time elapsed/taken for the step.
See this as an example, http://www.oracle-base.com/articles/11g/plsql-new-features-and-enhancements-11gr1.php
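A minimal sketch of the start_time/end_time idea, reading the whole BLOB with DBMS_LOB.READ in chunks of up to 32767 bytes. The table t42 and column b are borrowed from the other answer as placeholders; substitute your own query.

```sql
set serveroutput on
declare
  l_blob     blob;
  l_buffer   raw(32767);
  l_amount   binary_integer;
  l_offset   integer := 1;
  l_length   integer;
  start_time number;
  end_time   number;
begin
  select b into l_blob from t42 where rownum = 1;  -- placeholder query
  l_length := nvl(dbms_lob.getlength(l_blob), 0);
  start_time := dbms_utility.get_time;
  while l_offset <= l_length loop
    l_amount := 32767;                              -- bytes requested
    dbms_lob.read(l_blob, l_amount, l_offset, l_buffer);
    l_offset := l_offset + l_amount;                -- l_amount now holds bytes actually read
  end loop;
  end_time := dbms_utility.get_time;
  dbms_output.put_line('read ' || l_length || ' bytes in '
    || (end_time - start_time) || ' hsecs');
end;
/
```

DBMS_UTILITY.GET_TIME returns hundredths of a second, so divide the difference by 100 if you want seconds.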
Is it possible to use COUNT in some way that will give me the number of tuples that are in a .sql file? I tried using it in a query with the file name like this:
SELECT COUNT(*) FROM @q65b;
It tells me that the table is invalid, which I understand because it isn't a table, q65b is a file with a query saved in it. I'm trying to compare the number of rows in q65b to a view that I have created. Is this possible or do I just have to run the query and check the number of rows at the bottom?
Thanks
You can do this in SQL*Plus. For example:
Create the text file, containing the query (note: no semicolon!):
select * from dual
Save it in a file, e.g. myqueryfile.txt, to the folder accessible from your SQL*Plus session.
You can now call this from within another SQL query, but make sure the @ is at the start of a line, e.g.:
SQL> select * from (
2 @myqueryfile.txt
3 );
D
-
X
I don't personally use this feature much, however.
Here is one approach. It's a function which reads a file in a directory, wraps the contents in a select count(*) from ( .... ) construct and executes the resultant statement.
create or replace function get_cnt
  ( p_file in varchar2 )
  return number
as
  n pls_integer;
  stmt varchar2(32767);
  f_line varchar2(255);
  fh utl_file.file_type;
begin
  stmt := 'select count(*) from (';
  fh := utl_file.fopen('SQL_SCRIPTS', p_file, 'R');
  loop
    begin
      utl_file.get_line(fh, f_line );
    exception
      when no_data_found then exit;  -- end of file reached without a '/' line
    end;
    if f_line is null then exit;
    elsif f_line = '/' then exit;
    else stmt := stmt ||chr(10)||f_line;
    end if;
  end loop;
  utl_file.fclose(fh);  -- release the file handle
  stmt := stmt || ')';
  execute immediate stmt into n;
  return n;
end get_cnt;
/
Here is the contents of a sql file:
select * from emp
/
~
~
~
"scripts/q_emp.sql" 3L, 21C
And here is how the script runs:
SQL> select get_cnt ('q_emp.sql') from dual
2 /
GET_CNT('Q_EMP.SQL')
--------------------
14
SQL>
So it works. Obviously what I have posted is just a proof of concept. You will need to include lots of error handling for the UTL_FILE aspects - it's a package which can throw lots of exceptions - and probably some safety checking of the script that gets passed.
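As a rough illustration of the error handling and safety checking mentioned above, here is a sketch of a separate reader function. The name read_script and the error number -20001 are arbitrary choices for this example, and the check that the text starts with SELECT is only a crude guard, not real protection against a malicious script.

```sql
create or replace function read_script   -- hypothetical helper name
  ( p_file in varchar2 )
  return varchar2
as
  stmt   varchar2(32767);
  f_line varchar2(32767);
  fh     utl_file.file_type;
begin
  fh := utl_file.fopen('SQL_SCRIPTS', p_file, 'R', 32767);
  loop
    begin
      utl_file.get_line(fh, f_line);
    exception
      when no_data_found then exit;      -- end of file
    end;
    exit when f_line = '/';
    stmt := stmt || chr(10) || f_line;
  end loop;
  utl_file.fclose(fh);
  -- crude safety check: accept only text that starts with SELECT
  if stmt is null or not regexp_like(stmt, '^\s*select\s', 'i') then
    raise_application_error(-20001,
      p_file || ' does not contain a SELECT statement');
  end if;
  return stmt;
exception
  when utl_file.invalid_path or utl_file.invalid_operation
    or utl_file.access_denied or utl_file.read_error then
    -- make sure the handle is released before passing the error on
    if utl_file.is_open(fh) then
      utl_file.fclose(fh);
    end if;
    raise;
end read_script;
/
```

The counting function could then build its statement as 'select count(*) from (' || read_script(p_file) || ')' and keep only the EXECUTE IMMEDIATE itself.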
I have a fairly time intensive PL/SQL block that builds fingerprints from molecular structures. I would like to print output to SQL*Plus console to provide feedback on how many structures have been processed. I can do this with dbms_output.put_line
However, every time that is called a new line is written. I want to overwrite the line.
For example, currently I have the below.
Structure x of y processed
Structure x of y processed
Structure x of y processed
Structure x of y processed
Eventually I fill up the buffer as I'm dealing with thousands of structure records.
Is there a method I can use that will just overwrite the last output line?
Using DBMS_OUTPUT means that SQL*Plus will display nothing until the entire PL/SQL block is complete and will then display all the data currently in the buffer. It is not, therefore, an appropriate way to provide an ongoing status.
On the other hand, Oracle does provide a package DBMS_APPLICATION_INFO that is specifically designed to help you monitor your running code. For example, you could do something like
CREATE PROCEDURE process_structures
AS
<<other variable declarations>>
rindex BINARY_INTEGER;
slno BINARY_INTEGER;
totalwork NUMBER := y; -- Total number of structures
worksofar NUMBER := 0; -- Number of structures processed
BEGIN
rindex := dbms_application_info.set_session_longops_nohint;
FOR i IN (<<select structures to process>>)
LOOP
worksofar := worksofar + 1;
dbms_application_info.set_session_longops(
rindex => rindex,
slno => slno,
op_name => 'Processing of Molecular Structures',
sofar => worksofar ,
totalwork => totalwork,
target_desc => 'Some description',
units => 'structures');
<<process your structure with your existing code>>
END LOOP;
END;
From a separate SQL*Plus session, you can then monitor progress by querying the V$SESSION_LONGOPS view
SELECT opname,
target_desc,
sofar,
totalwork,
units,
elapsed_seconds,
time_remaining
FROM v$session_longops
WHERE opname = 'Processing of Molecular Structures';
You may also send messages to a named pipe and have another process read the message from the pipe.
procedure sendmessage(p_pipename varchar2
,p_message varchar2) is
s number(15);
begin
begin
sys.dbms_pipe.pack_message(p_message);
exception
when others then
sys.dbms_pipe.reset_buffer;
end;
s := sys.dbms_pipe.send_message(p_pipename, 0);
if s = 1
then
sys.dbms_pipe.purge(p_pipename);
end if;
end;
function receivemessage(p_pipename varchar2
,p_timeout integer) return varchar2 is
n number(15);
chr varchar2(200);
begin
n := sys.dbms_pipe.receive_message(p_pipename, p_timeout);
if n = 1
then
return null;
end if;
sys.dbms_pipe.unpack_message(chr);
return(chr);
end;
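A possible way to use the two routines, assuming they have been created as standalone units (or placed in a package) and that your user has execute rights on DBMS_PIPE. The pipe name STRUCT_PROGRESS, the total of 1000 and the reporting interval of 100 are made up for the example.

```sql
-- Session 1 (the long-running job): report progress every 100 structures
begin
  for i in 1 .. 1000 loop
    -- ... process structure i here ...
    if mod(i, 100) = 0 then
      sendmessage('STRUCT_PROGRESS',
                  'Structure ' || i || ' of 1000 processed');
    end if;
  end loop;
end;
/

-- Session 2 (the monitor): wait up to 60 seconds for the next message
set serveroutput on
declare
  l_msg varchar2(200);
begin
  l_msg := receivemessage('STRUCT_PROGRESS', 60);
  dbms_output.put_line(nvl(l_msg, 'no message within 60 seconds'));
end;
/
```

Because sendmessage uses a timeout of 0 and purges the pipe when the send times out, the job is never blocked by a monitor that has stopped reading.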
I don't think you can. As far as I understood the dbms_output it just doesn't work that way.
I recommend you use put to echo a single dot and a newline every 1000 or so entries to see that something is happening and write into a table or sequence the current position so you can have a look if you want to know.
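For the table variant, a sketch along these lines could work. The table job_progress and the procedure log_progress are hypothetical names; the autonomous transaction lets the progress row be committed, and so become visible to other sessions, without committing the main work.

```sql
-- Hypothetical progress table: one row per running job
create table job_progress (
  job_name   varchar2(30) primary key,
  processed  number,
  updated_at timestamp
);

create or replace procedure log_progress
  ( p_job in varchar2, p_processed in number )
as
  pragma autonomous_transaction;   -- commit independently of the caller
begin
  merge into job_progress p
  using (select p_job as job_name from dual) s
  on (p.job_name = s.job_name)
  when matched then
    update set p.processed = p_processed, p.updated_at = systimestamp
  when not matched then
    insert (job_name, processed, updated_at)
    values (p_job, p_processed, systimestamp);
  commit;
end log_progress;
/
```

Calling log_progress('FINGERPRINTS', i) every 1000 structures inside the loop and running select * from job_progress from a second session then shows the current position.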