I have an old table and a new table. In the old table, there is a VARCHAR2 column called number, with values like 11-38D402342 and 11-38D402342/43. There is also some crappy data, but it is never null.
In the new table, there are two VARCHAR2 columns, number_left and number_right. These two were filled from the number column of the old table:
number_left = nvl(substr(number,1,instr(number,'/',1)-1),number)
number_right = substr(number,1,instr(rtrim(number,'/'),'/',1)-length(substr(number,instr(number,'/'))))||substr(number,decode(instr(number,'/'),0,null,instr(number,'/')+1))
After some other decisions, we now need only one column again. To make sure the copied numbers get set correctly, I decided to use the number of the old table, together with the conversion that was used, to identify matching rows.
I have a mapping but cannot use it, since number_left and number_right can be copied into new rows by an application and must get the old number too. So a number from the old table can possibly be copied into multiple rows in the new table.
I tried using this code:
declare
    left_num  varchar2(255 char) := '';
    right_num varchar2(255 char) := '';
    pos       number := 0;
    CURSOR c_number is
        select number from old_table;
begin
    for cur in c_number loop
        left_num := nvl(substr(cur.number,1,instr(cur.number,'/',1)-1),cur.number);
        if instr(cur.number,'/') = 0 then
            pos := null;
        else
            pos := instr(cur.number,'/')+1;
        end if;
        right_num := substr(cur.number,1,instr(rtrim(cur.number,'/'),'/',1)-length(substr(cur.number,instr(cur.number,'/'))))||substr(cur.number, pos);
        update new_table n
           set n.number = cur.number
         where n.number_left = left_num
           and (
                 n.number_right = right_num
                 or (
                      n.number_right is null
                      and right_num is null
                    )
               );
    end loop;
end;
In the old table I have 170,000 rows and in the new one 180,000, since there is also some new data. Peanuts.
But here comes the strange part:
Everything seems to work fine, but after the first 14,000 rows it gets terribly slow - perhaps 3 rows per second. And I think it becomes slower and slower.
Any idea?
Instead of asking a bunch of random strangers to take a guess, you should learn how to trace the performance of SQL in the database. There are several different ways you can gain insight into the performance of your statement, but you will need some level of DBA-type access.
This is an area which is covered in the official documentation, but I suggest you start with Tim Hall's excellent overview. Find it here.
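For example, a minimal sketch of session-level tracing with DBMS_MONITOR (assuming you have, or can be granted, execute rights on it):
-- switch on extended SQL trace for the current session, capturing waits and binds
exec DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => true, binds => true);
-- ... run the slow PL/SQL block ...
exec DBMS_MONITOR.SESSION_TRACE_DISABLE;
-- then analyse the resulting trace file from the server's trace directory with tkprof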
Related
I have a system for tracking usage of computers in a lab. Slightly simplified, it works out to:
Machines are associated with a lab.
Machines have a binary logged_in state, which gets updated automatically when users log in and out.
There is a view keyed on the lab which gathers the total number of seats associated with the lab, and the current number in use for that lab.
What I would like to do is add a history or audit table, which would track changes to lab population over time. I had a trigger on the machine table to store the time and the total lab population in my lab history table every time the machine table changed. The problem is that, in order to get the new total for the lab, I have to examine the other values in the machine table. This results in a table mutating error.
Some things I found on here and elsewhere suggested that I should create a package to track the labs being changed. Use a before trigger to clear the list, a row trigger to store each labid being changed, and an after trigger to update the history table with new values for only those labs whose ids are in the list. I've tried that, but can't figure out how to access the values I've stored in the package table (or even if it is storing them properly in the first place). As will no doubt be obvious, I'm unfamiliar with PL/SQL packages and table variables - the whole syntax of referring to table entries like arrays struck me as vaguely heretical, though incredibly useful if it works. So most of the below is just copied and adapted from other solutions I've found, but they didn't stretch as far as how to actually use my table of changed lablocids, assuming it's being created properly in the first place. The following simply tells me that pg_machine_in_use_pkg.changedlablocids does not exist when I try to compile the final trigger.
create or replace package labstats_adm.pg_machine_in_use_pkg
as
    type arr is table of number index by binary_integer;
    changedlablocids arr;
    empty arr;
end;
/
create or replace trigger labstats_adm.pg_machine_in_use_init
before insert or update
on labstats_adm.pg_machine
begin
    -- begin each update with a blank list of changed lablocids
    pg_machine_in_use_pkg.changedlablocids := pg_machine_in_use_pkg.empty;
end;
/
--
create or replace trigger labstats_adm.pg_machine_in_use_update
after insert or update of in_use,lablocid
on labstats_adm.pg_machine
for each row
begin
    -- record lablocids - old and new - of changed machines
    if :new.lablocid is not null then
        pg_machine_in_use_pkg.changedlablocids( pg_machine_in_use_pkg.changedlablocids.count+1 ) := :new.lablocid;
    end if;
    if :old.lablocid is not null and :old.lablocid != :new.lablocid then
        pg_machine_in_use_pkg.changedlablocids( pg_machine_in_use_pkg.changedlablocids.count+1 ) := :old.lablocid;
    end if;
end;
/
create or replace trigger labstats_adm.pg_machine_lab_history
after insert or update of in_use,lablocid
on labstats_adm.pg_machine
begin
    -- for each lablocation we just logged a change to, update that lab's history
    insert into labstats_adm.pg_lab_history (labid, time, total_seats, used_seats)
    select labid, systimestamp, total_seats, used_seats
    from labstats_adm.lab_usage
    where labid in (
        select distinct labid from pg_machine_in_use_pkg.changedlablocids
    );
end;
/
On the other hand, if there is a better overall approach than the package, I'm all ears.
After some reflection I've got to go with @tbone on this one. In my experience a history table should be a copy of the data in the "real" table, with fields added to show when a particular 'version' of the data shown by a row in the history table was in effect. So if the "real" table is something like
CREATE TABLE REAL_TABLE
(ID_REAL_TABLE NUMBER PRIMARY KEY,
COL2 NUMBER,
COL3 VARCHAR2(50));
then I'd create the history table as
CREATE TABLE HIST_TABLE
(ID_HIST_TABLE NUMBER PRIMARY KEY,
ID_REAL_TABLE NUMBER,
COL2 NUMBER,
COL3 VARCHAR2(50),
EFFECTIVE_START_DT TIMESTAMP(9) NOT NULL,
EFFECTIVE_END_DT TIMESTAMP(9));
and I'd have the following triggers to get everything populated:
CREATE TRIGGER REAL_TABLE_BI
BEFORE INSERT ON REAL_TABLE
REFERENCING OLD AS OLD
            NEW AS NEW
FOR EACH ROW
BEGIN
    IF :NEW.ID_REAL_TABLE IS NULL THEN
        :NEW.ID_REAL_TABLE := REAL_TABLE_SEQUENCE.NEXTVAL;
    END IF;
END REAL_TABLE_BI;
CREATE TRIGGER HIST_TABLE_BI
BEFORE INSERT ON HIST_TABLE
FOR EACH ROW
BEGIN
    IF :NEW.ID_HIST_TABLE IS NULL THEN
        :NEW.ID_HIST_TABLE := HIST_TABLE_SEQUENCE.NEXTVAL;
    END IF;
END HIST_TABLE_BI;
CREATE TRIGGER REAL_TABLE_AIUD
AFTER INSERT OR UPDATE OR DELETE ON REAL_TABLE
FOR EACH ROW
DECLARE
    tsEffective_start_date TIMESTAMP(9) := SYSTIMESTAMP;
    tsEffective_end_date TIMESTAMP(9) := tsEffective_start_date - INTERVAL '0.000000001' SECOND;
BEGIN
    IF UPDATING OR DELETING THEN
        -- close off the current history row; :OLD is used so this also works for DELETE
        UPDATE HIST_TABLE
        SET EFFECTIVE_END_DT = tsEffective_end_date
        WHERE ID_REAL_TABLE = :OLD.ID_REAL_TABLE AND
              EFFECTIVE_END_DT IS NULL;
    END IF;
    IF INSERTING OR UPDATING THEN
        INSERT INTO HIST_TABLE (ID_REAL_TABLE, COL2, COL3, EFFECTIVE_START_DT)
        VALUES (:NEW.ID_REAL_TABLE, :NEW.COL2, :NEW.COL3, tsEffective_start_date);
    END IF;
END REAL_TABLE_AIUD;
Using this method the "history" table has all historical versions of the data in the "real" table PLUS a complete copy of the "current" data from the "real" table; this is done to simplify queries which need to report on all versions of the data in the table up to and including present values.
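For example, a point-in-time query against this history table might look like the following sketch (the bind variables are hypothetical):
SELECT *
FROM HIST_TABLE
WHERE ID_REAL_TABLE = :some_id
AND EFFECTIVE_START_DT <= :as_of
AND (EFFECTIVE_END_DT IS NULL OR EFFECTIVE_END_DT >= :as_of);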
The advantage of using triggers to do all this is that the maintenance of the primary keys and the history table becomes automatic and can't be easily circumvented or forgotten.
Share and enjoy.
Sorry to be so slow getting back; it's taken me a bit of fiddling, and I haven't had a lot of time to work on it.
Thanks to Bob Jarvis for pointing me at the compound triggers, which cleaned up the overall structure significantly. After that, I just had to sanitise the way I'm getting values back out of my table variable. On the odd chance that someone else stumbles over this looking for the answer to the same problem, I'll post my final solution here:
create or replace trigger pg_machine_in_use_update
for insert or update or delete of in_use,lablocid
on labstats_adm.pg_machine
compound trigger
    type arr is table of number index by binary_integer;
    changedlabids arr;
    idx binary_integer;
    after each row is
        newlabid labstats_adm.pg_labs.labid%TYPE;
        oldlabid labstats_adm.pg_labs.labid%TYPE;
    begin
        -- store the labids of any changed locations
        -- PL/SQL does not like us testing for the existence of something
        -- that isn't there, so just set it twice if necessary
        if ( :new.lablocid is not null ) then
            select labid into newlabid from labstats_adm.pg_lablocation where lablocid = :new.lablocid;
            changedlabids( newlabid ) := 1;
        end if;
        if ( :old.lablocid is not null ) then
            select labid into oldlabid from labstats_adm.pg_lablocation where lablocid = :old.lablocid;
            changedlabids( oldlabid ) := 1;
        end if;
    end after each row;
    after statement is
    begin
        idx := changedlabids.FIRST;
        while idx is not null loop
            insert into labstats_adm.pg_lab_history (labid, time, total_seats, used_seats)
            select labid, systimestamp, total_seats, used_seats
            from labstats_adm.lab_usage
            where labid = idx;
            idx := changedlabids.NEXT(idx);
        end loop;
    end after statement;
end pg_machine_in_use_update;
/
How can you select the same sequence twice in one query?
I have googled this and just can't get an answer.
Just to be clear, this is what I need, for example:
select seq.nextval as v1, seq.nextval as v2 from dual (I know that doesn't work)
I have tried UNION as well; I just can't get my head around it.
If you always need exactly two values, you could change the sequence to increment by 2:
alter sequence seq increment by 2;
and then select the current value and its successor:
select seq.nextval,
seq.nextval + 1
from dual;
Not pretty, but it should do the job.
UPDATE
Just to clarify: the ALTER SEQUENCE should be issued only once in a lifetime, not once per session!
Frank's answer has a downside:
You cannot use it in a transactional system, because the ALTER SEQUENCE commits any pending DML.
Sean's answer only pulls the sequence once.
As I understand it, you want to pull two values.
Note that, as an alternative to Sean's solution, simply selecting .nextval twice in one row does not help, because Oracle gives you the same value twice.
I'd prefer wrapping the sequence in a function.
This tricks Oracle into pulling the sequence twice.
CREATE or replace FUNCTION GetSeq return number as
    nSeq NUMBER;
begin
    select seq.nextval into nSeq from dual;
    return nSeq;
end;
/
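A possible usage sketch - since each call to the function executes its own NEXTVAL, the two columns receive different values:
select GetSeq as v1, GetSeq as v2 from dual;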
If you need this generically, maybe you'd like:
CREATE or replace FUNCTION GetSeq(spSeq in VARCHAR2) return number as
    nSeq NUMBER;
    v_sql long;
begin
    -- note: the sequence name is concatenated into the SQL, so only pass trusted input
    v_sql := 'select '||upper(spSeq)||'.nextval from dual';
    execute immediate v_sql into nSeq;
    return nSeq;
end;
/
There are some limitations on how NEXTVAL can be used. There's a list in the Oracle docs. More to the point, the link includes a list of conditions where NEXTVAL will be called more than once.
The only scenario named on the page where a straight SELECT will call NEXTVAL more than once is in "A top-level SELECT statement". A crude query that will call NEXTVAL twice is:
SELECT seq.NEXTVAL FROM (
    SELECT * FROM DUAL
    CONNECT BY LEVEL <= 2) -- Change the "2" to get more sequence values
Unfortunately, if you try to push this into a subquery to flatten out the values to one row with columns v1 and v2 (like in your question), you'll get the ORA-02287: sequence number not allowed here.
This is as close as I could get. If the query above won't help you get where you want, then check out the other answers that have been posted here. As I've been typing this, a couple of excellent answers have been posted.
Telling you to just call sequence.nextval multiple times would be boring, so here's what you can try:
create type t_num_tab as table of number;
CREATE SEQUENCE SEQ_TEST
START WITH 1
MAXVALUE 999999999999999999999999999
MINVALUE 1
NOCYCLE
CACHE 100
NOORDER;
create or replace function get_seq_vals(i_num in number) return t_num_tab is
    seq_tab t_num_tab;
begin
    select SEQ_TEST.nextval
    bulk collect into seq_tab
    from all_objects
    where rownum <= i_num;
    return seq_tab;
end;
And you can use it as follows. This example is pulling 7 numbers at once from the sequence:
declare
    numbers t_num_tab;
    idx number;
begin
    -- this grabs x numbers at a time
    numbers := get_seq_vals(7);
    -- this just loops through the returned numbers
    -- to show the values
    idx := numbers.first;
    loop
        dbms_output.put_line(numbers(idx));
        idx := numbers.next(idx);
        exit when idx is null;
    end loop;
end;
Also note that I use "next" instead of first..last, as it's possible you may want to remove numbers from the list before iterating through (or, sometimes a sequence cache can result in numbers incrementing by more than 1).
These are all great answers; unfortunately they were just not enough. I will definitely be able to return to this page when struggling in the future.
I have gone with a different method. Asked a new question and got my answer.
You can go check it out here.
I'm faced with the task of having to look in a database with millions of records to find which codes, out of a set of about 1500, have a corresponding record. For example, I have 1500 IDs in a CSV file. I want to know which of those IDs exist in the database, and are therefore correct, and which don't.
Is there a better way of doing this than "... WHERE id IN (1, 2, 3, ..., 1500);"?
The DB/language in question is Oracle PL/SQL.
Thanks in advance for any help.
Build an external table on your CSV file. These are highly neat things which allow us to query the contents of an OS file in SQL. Find out more.
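A minimal sketch of the DDL, assuming a directory object pointing at the folder which holds the file (all names here are hypothetical):
create or replace directory csv_dir as '/path/to/files';

create table your_external_tab (
    id number
)
organization external (
    type oracle_loader
    default directory csv_dir
    access parameters (
        records delimited by newline
        fields terminated by ','
    )
    location ('your_data.csv')
);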
Then it's a simple matter of issuing a query:
select csv.id
     , case when tgt.id is null then 'invalid' else 'valid' end as valid_id
from your_external_tab csv
left join target_table tgt on (csv.id = tgt.id)
"CSV table is hardly ideal from a performance point of view"
Performance is a matter of context. In this case it depends on how often the data in the CSV changes and how often we need to query it. If the file is produced once a day and we only need to check the values after it has been delivered then an external table is the most efficient solution. But if this data set is a permanent repository which needs to be queried often then the overhead of writing to a heap table is obviously justified.
To me, a CSV file consisting of a bunch of IDs and nothing else sounds like transient data, and so fits the use case for external tables. But the OP may have additional requirements which they haven't mentioned.
Here is an alternative approach which doesn't require creating any permanent database objects. Consequently it is less elegant, and will probably perform worse.
It reads the CSV file laboriously using UTL_FILE and populates a collection based on SYSTEM.NUMBER_TBL_TYPE, a pre-defined collection (nested table of NUMBER) which should be available in your Oracle database.
declare
    ids system.number_tbl_type := system.number_tbl_type();
    fh  utl_file.file_type;
    idx pls_integer := 0;
    buf varchar2(32767);
begin
    fh := utl_file.fopen('your_data_directory', 'your_data.csv', 'r');
    begin
        loop
            utl_file.get_line(fh, buf);  -- raises NO_DATA_FOUND at end of file
            idx := idx + 1;
            ids.extend();
            ids(idx) := to_number(trim(buf));
        end loop;
    exception
        when no_data_found then
            if utl_file.is_open(fh) then
                utl_file.fclose(fh);
            end if;
        when others then
            if utl_file.is_open(fh) then
                utl_file.fclose(fh);
            end if;
            raise;
    end;
    for id_recs in ( select csv.column_value
                          , case when tgt.id is null then 'invalid' else 'valid' end as valid_id
                     from table(ids) csv
                     left join target_table tgt on (csv.column_value = tgt.id)
                   ) loop
        dbms_output.put_line('ID '||id_recs.column_value||' is '||id_recs.valid_id);
    end loop;
end;
Note: I have not tested this code. The principle is sound but the details may need debugging ;)
I have a table with a lot of records (could be more than 500,000 or 1,000,000). I added a new column to this table, and I need to fill in a value for every row, using the corresponding value of another column in the same table.
I tried using separate transactions, selecting each next chunk of 100 records and updating the value for them, but this still takes hours to update all records in Oracle 10g, for example.
What is the most efficient way to do this in SQL, without using dialect-specific features, so it works everywhere (Oracle, MSSQL, MySQL, PostgreSQL, etc.)?
ADDITIONAL INFO: There are no calculated fields. There are indexes. I used generated SQL statements which update the table row by row.
The usual way is to use UPDATE:
UPDATE mytable
SET new_column = <expr containing old_column>
You should be able to do this in a single transaction.
As Marcelo suggests:
UPDATE mytable
SET new_column = <expr containing old_column>;
If this takes too long and fails due to "snapshot too old" errors (e.g. if the expression queries another highly-active table), and if the new value for the column is always NOT NULL, you could update the table in batches:
UPDATE mytable
SET new_column = <expr containing old_column>
WHERE new_column IS NULL
AND ROWNUM <= 100000;
Just run this statement, COMMIT, then run it again; rinse, repeat until it reports "0 rows updated". It'll take longer but each update is less likely to fail.
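If you want to automate the rinse-and-repeat, a minimal sketch might look like this (same placeholder expression as above):
BEGIN
    LOOP
        UPDATE mytable
        SET new_column = <expr containing old_column>
        WHERE new_column IS NULL
        AND ROWNUM <= 100000;
        EXIT WHEN SQL%ROWCOUNT = 0;
        COMMIT;
    END LOOP;
    COMMIT;
END;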
EDIT:
A better alternative that should be more efficient is to use the DBMS_PARALLEL_EXECUTE API.
Sample code (from Oracle docs):
DECLARE
    l_sql_stmt VARCHAR2(1000);
    l_try      NUMBER;
    l_status   NUMBER;
BEGIN
    -- Create the TASK
    DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');
    -- Chunk the table by ROWID
    DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);
    -- Execute the DML in parallel
    l_sql_stmt := 'update EMPLOYEES e
                   SET e.salary = e.salary + 10
                   WHERE rowid BETWEEN :start_id AND :end_id';
    DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
                                   parallel_level => 10);
    -- If there is an error, RESUME it for at most 2 times.
    l_try := 0;
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
    WHILE (l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
    LOOP
        l_try := l_try + 1;
        DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
        l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
    END LOOP;
    -- Done with processing; drop the task
    DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');
END;
/
Oracle Docs: https://docs.oracle.com/database/121/ARPLS/d_parallel_ex.htm#ARPLS67333
You could drop any indexes on the table, then do your update, and then recreate the indexes.
Might not work for you, but a technique I've used a couple of times in the past for similar circumstances:
Create updated_{table_name}, then insert into it from a select in batches. Once finished - and this hinges on Oracle (which I don't know or use) supporting the ability to rename tables in an atomic fashion - updated_{table_name} becomes {table_name}, while {table_name} becomes original_{table_name}.
Last time I had to do this was for a heavily indexed table with several million rows that absolutely positively could not be locked for the duration needed to make some serious changes to it.
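Oracle does support renaming tables, so a rough sketch of the swap might look like this (hypothetical names; you would still need to recreate indexes, constraints and grants afterwards):
create table updated_my_table as
    select t.*, <expr> as new_column
    from my_table t;
alter table my_table rename to original_my_table;
alter table updated_my_table rename to my_table;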
What is the database version? Check out virtual columns in 11g:
Adding Columns with a Default Value
http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-schemamanagement.html
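For instance, in 11g adding a NOT NULL column with a default is a fast metadata-only change; a sketch with hypothetical names:
alter table mytable add (new_column varchar2(50) default 'N' not null);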
update Hotels set Discount=30 where Hotelid >= 1 and Hotelid <= 5504
For PostgreSQL I do something like this (if we are sure no more updates/inserts take place):
create table new_table as table orig_table with data;
update new_table set column = <expr>;
start transaction;
drop table orig_table;
alter table new_table rename to orig_table;
commit;
Update:
One advantage is that if your table is very large, you will not lock the table while the long update runs; that update could take minutes in this case. This only works if you are sure that no inserts and/or updates take place in the meantime.
I'm changing a database (Oracle) with a script containing a few updates looking like:
UPDATE customer
SET status = REPLACE(status, 'X_Y', 'xy')
WHERE status LIKE '%X_Y%'
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
UPDATE customer
SET status = REPLACE(status, 'X_Z', 'xz')
WHERE status LIKE '%X_Z%'
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
-- More updates looking the same...
In this case, how would you enforce DRY (Don't Repeat Yourself)?
I'm particularly interested in solving the two following recurring problems:
Define a function, available from this script only, to extract the subquery SELECT id FROM category WHERE code = 'ABC'
Create a set of replace rules (that could look like {"X_Y": "xy", "X_Z": "xz", ...} in a popular programming language) and then iterate a single update query on it.
Thanks!
I would reduce it to a single query:
UPDATE customer
SET status = REPLACE(REPLACE(status, 'X_Y', 'xy'), 'X_Z', 'xz')
WHERE REGEXP_LIKE(status, 'X_[YZ]')
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
First of all, remember that scripting is not the same thing as programming, and you don't have to adhere to DRY principles. Scripts like this one are usually one-offs, not a program to be maintained over a long time.
But you could use PL/SQL to do this:
declare
    type str_tab is table of varchar2(30) index by binary_integer;
    from_tab str_tab;
    to_tab   str_tab;
begin
    from_tab(1) := 'X_Y';
    from_tab(2) := 'X_Z';
    to_tab(1)   := 'xy';
    to_tab(2)   := 'xz';
    for i in 1..from_tab.count loop
        UPDATE customer
        SET status = REPLACE(status, from_tab(i), to_tab(i))
        WHERE status LIKE '%' || from_tab(i) || '%'
        AND category_id IN
            (SELECT id
             FROM category
             WHERE code = 'ABC');
    end loop;
end;
Pretty straightforward, unless I'm missing something.
UPDATE customer
SET status = REPLACE(REPLACE(status,'X_Y','xy'),'X_Z','xz')
WHERE ( status LIKE '%X_Y%' Or status LIKE '%X_Z%')
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
Write a script that takes parameters and call it multiple times. (I'm assuming you're using SQL*Plus to run the script.)
replace_in_status.sql:
UPDATE customer
SET status = REPLACE(status, UPPER('&1'), '&2')
WHERE status LIKE '%' ||UPPER('&1')|| '%'
AND category_id IN
(SELECT id
FROM category
WHERE code = 'ABC');
Calling script:
@replace_in_status X_Y xy
@replace_in_status X_Z xz
Okay, a shot from the hip here, take it easy on my syntax :-)
Would an approach like this help:
DECLARE
    v_sql1 VARCHAR2(1000);
    v_sql2 VARCHAR2(2000);
    TYPE T_Rules IS RECORD (srch VARCHAR2(100), repl VARCHAR2(100));
    TYPE T_RuleTab IS TABLE OF T_Rules INDEX BY BINARY_INTEGER;
    v_rules T_RuleTab;
    FUNCTION get_subquery RETURN VARCHAR2 IS
    BEGIN
        RETURN '(SELECT id FROM category WHERE code = ''ABC'')';
    END;
BEGIN
    -- bind variables can't be quoted into the literal, so bind them positionally
    v_sql1 := 'UPDATE customer SET status = REPLACE(status, :srch, :repl) '||
              'WHERE status LIKE ''%''||:srch2||''%'' AND category_id IN ';
    v_rules(1).srch := 'X_Y'; v_rules(1).repl := 'xy';
    v_rules(2).srch := 'X_Z'; v_rules(2).repl := 'xz';
    FOR i IN 1..v_rules.COUNT LOOP
        v_sql2 := v_sql1||get_subquery();
        EXECUTE IMMEDIATE v_sql2 USING v_rules(i).srch, v_rules(i).repl, v_rules(i).srch;
    END LOOP;
END;
You could replace the PL/SQL table with a real table and run a cursor over it, but this addresses your second requirement.
Obviously some work is left on get_subquery, your first requirement ;-)
EDIT
Dang! Forgot to mention you need to be careful with that search string in your WHERE clause - an underscore is a single-character wildcard in Oracle's LIKE...
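For example, to match a literal underscore you would need an escape clause, something like:
WHERE status LIKE '%X\_Y%' ESCAPE '\'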
Depending on how important the script is, I would:
Just copy and paste and modify, or
Write a script in another programming language that has better ways to resolve the duplication.
For the replace rules you could create a temporary table and fill it with these replace rules, and then join with this table.
If the subquery is always the same, you have solved the first problem also by using a join.
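A sketch of that idea with hypothetical names, assuming each status matches at most one rule (otherwise the correlated subquery would return more than one row):
create global temporary table replace_rules (
    srch varchar2(100),
    repl varchar2(100)
) on commit preserve rows;

insert into replace_rules values ('X_Y', 'xy');
insert into replace_rules values ('X_Z', 'xz');

update customer c
set c.status = (select replace(c.status, r.srch, r.repl)
                from replace_rules r
                where c.status like '%'||r.srch||'%')
where exists (select 1
              from replace_rules r
              where c.status like '%'||r.srch||'%')
and c.category_id in (select id from category where code = 'ABC');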
I've seen a few approaches to this:
Use string buffers to assemble the sql dynamically using PL/SQL or in your programming language.
Use a framework such as IBATIS which lets you reuse and extend fragments of SQL that are stored in XML files.
Using an ORM framework circumvents this issue by working with objects rather than directly with the SQL.
Depending on your language and problem at hand using a framework may be the best approach and then extending it to do what you want it to do.
The solution suggested by soulmerge is the simplest, and therefore the best one - you just need to nest the calls to "replace". I just want to add that the condition
status like '%tagada%'
is useless. replace() will not change the status if the searched string is not found, therefore you can safely apply it to all rows. And since a condition that searches for a string lost in the middle of another string cannot make any use of whatever index you have, it's useless as a filtering condition.
Your only filtering condition is the one on category_id ...
Which brings up one point that justifies why soulmerge's solution is best: iterating over all the changes is a bad idea. Suppose the filter on category_id is moderately selective; odds are that Oracle will choose to scan the table. Do you really want to scan the table once per change when you can make all the changes in a single pass?