I have around 2,000 tables, most of which are not in use and do not have any data in them.
I know how to list all tables as below
SELECT owner, table_name FROM ALL_TABLES
But I do not know how to list only the ones that have at least one row of data in them.
Is there any way to do that?
There are a few ways you could do this:
Brute-force and count the rows in every table
Check the table stats
Check if there is any storage allocated
Brute force
This loops through the tables, counts the rows, and spits out those that are empty:
declare
  c integer;
begin
  for t in (
    select table_name from user_tables
    where  external = 'NO'
    and    temporary = 'N'
  ) loop
    execute immediate
      'select count(*) from ' || t.table_name
      into c;
    if c = 0 then
      dbms_output.put_line ( t.table_name );
    end if;
  end loop;
end;
/
This is the only way to be sure there are no rows in the table now. The main drawback is that it could take a looooong time if you have many tables with millions of rows or more.
I've excluded:
Temporary tables. You can only see data inserted in your own session; if they're in use in another session, you can't see those rows.
External tables. These point to files on the database server's file system. The files could be temporarily missing/blank/etc.
There may be other table types with issues like these - make sure you double-check any that are reported as empty.
Check the stats
If all the table stats are up-to-date, you can check the num_rows:
select table_name
from user_tables ut
where external = 'NO'
and temporary = 'N'
and num_rows = 0;
The caveat is that these figures may be out of date. You can force a regather now by running:
exec dbms_stats.gather_schema_stats ( user );
Though this is likely to take a while and - if gathering has been disabled/deferred - might result in unwanted plan changes. Avoid doing this on your production database!
Check storage allocation
You can look for tables with no segments allocated with:
select table_name
from user_tables ut
where external = 'NO'
and temporary = 'N'
and segment_created = 'NO';
As there's no space allocated to these, there are definitely no rows in them! But a table could have space allocated yet contain no rows, so this check may omit some of the empty tables - this is particularly likely for tables that did have rows in the past but are empty now.
Final thoughts
It's worth remembering that a table with no rows now could still be in use. Staging tables used for daily/weekly/monthly loads may be purged at the end of the process; removing these will still break your apps!
There could also be code which refers to empty tables and works as-is, but would error if you drop the table.
A better approach would be to enable auditing, then run this for "a while". Any tables with no audited access in that time period are probably safe to remove.
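For example (a sketch only, assuming traditional auditing is available and the audit_trail parameter is set; on 12c and later you would use a unified audit policy instead):
-- Audit all queries and DML against tables, recorded per access.
AUDIT SELECT TABLE, INSERT TABLE, UPDATE TABLE, DELETE TABLE BY ACCESS;

-- After "a while", list tables in your schema with no audited access.
SELECT table_name
FROM   user_tables
WHERE  table_name NOT IN (
         SELECT obj_name
         FROM   dba_audit_object
         WHERE  owner = USER
       );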
Related
I have a scenario where I'm creating a temp table with the data coming in from a select statement. My problem is when the data coming in from the select query is huge, I'm running into insufficient memory problems which results in my query failing to give results.
I was wondering if there was a way to commit a chunk of data (say every 1000 rows) into the temp table before moving on to the next one.
Eg:
CREATE TABLE NEW_TABLE AS
SELECT [ column1, column2...columnN ]
FROM EXISTING_TABLE
[ WHERE ] <ALL CONDITIONS>
Now let's assume that the inner select returns 100 rows. I don't want all 100 rows to be inserted into NEW_TABLE at once. I want to split this up. How do I go about doing this efficiently?
If you're creating a temp table, why not create it as a GLOBAL TEMPORARY TABLE, like this?
CREATE GLOBAL TEMPORARY TABLE NEW_TABLE
ON COMMIT PRESERVE ROWS   -- PRESERVE ROWS so the data survives the implicit commit of the CREATE
AS
SELECT [ column1, column2...columnN ]
FROM EXISTING_TABLE
[ WHERE ] <ALL CONDITIONS>;
It can lead to lower memory usage because of the "Decreased redo generation as, by definition, they are non-logging." (http://psoug.org/reference/gtt.html)
Elaborating on what I wrote:
"Redo records are buffered in a
circular fashion in the redo log
buffer of the SGA (see "How Oracle
Database Writes to the Redo Log" )
and are written to one of the redo
log files by the Log Writer (LGWR)
database background process." - docs.oracle.com/cd/B28359_01/server.111/b28310/onlineredo001.htm
Using a global temporary table lets you avoid that redundant cost. I assume you don't care about the durability of the temporary data. Besides, global temporary table data is written by default to the temporary tablespace, which is optimized for the storage of transient data.
" When Oracle needs to store data in a global temporary table or build a hash table for a hash join, Oracle also starts the operation in memory and completes the task without writing to disk if the amount of data involved is small enough. While populating a global temporary table or building a hash is not a sorting operation, we will lump all of these activities together in this paper because they are handled in a similar way by Oracle.
If an operation uses up a threshold amount of memory, then Oracle breaks the operation into smaller ones that can each be performed in memory. Partial results are written to disk in a temporary tablespace."
Based on the above:
- you use a different memory area than the default process memory (SGA, undo), which appears to be too small in your case
- when memory is insufficient, Oracle will write the data to disk
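If the CTAS form is awkward, the same idea can be split into a one-time definition plus a per-session insert (a sketch; the column names and types are placeholders):
-- One-time DDL: define the temporary table; no data is stored permanently.
CREATE GLOBAL TEMPORARY TABLE NEW_TABLE (
  column1 NUMBER,
  column2 VARCHAR2(100)
) ON COMMIT PRESERVE ROWS;

-- Per session: populate it from the existing table.
INSERT INTO NEW_TABLE (column1, column2)
SELECT column1, column2
FROM EXISTING_TABLE
[ WHERE ] <ALL CONDITIONS>;
COMMIT;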
OK. I think I have a temporary solution/workaround for my problem, something like this:
DECLARE
  counter number(6) := 1;

  cursor c1 is
    <select query>;

  type t__data is table of c1%rowtype index by binary_integer;
  t_data t__data;
begin
  open c1;
  loop
    -- fetch in manageable chunks; without a LIMIT the whole result set
    -- would be loaded into memory at once
    fetch c1 bulk collect into t_data limit 10000;
    exit when t_data.count = 0;

    for idx in t_data.first .. t_data.last loop
      insert into TEMP_ONE1 (x, y)
      values (t_data(idx).x, t_data(idx).y);

      counter := counter + 1;
      if counter = 10000 then
        counter := 0;
        commit;
      end if;
    end loop;
  end loop;

  close c1;
  commit;  -- commit any remaining rows
end;
Do you think this is a good solution to the problem?
(I ran the query for a small sample of data and it does work)
I want to insert rows with a MERGE statement in a specified order to avoid deadlocks. Deadlocks could otherwise happen because multiple transactions will call this statement with overlapping sets of keys. Note that this code is also sensitive to duplicate-value exceptions, but I handle that by retrying, so that is not my question. I was doing the following:
MERGE INTO targetTable t
USING (
    SELECT ...
    FROM sourceCollection
    ORDER BY <desiredUpdateOrder>
) s
ON ( <match condition> )
WHEN MATCHED THEN
    UPDATE ...
WHEN NOT MATCHED THEN
    INSERT ...
Now I'm still getting the deadlock, so I'm becoming unsure whether Oracle maintains the order of the subquery. Does anyone know how to best make sure that Oracle locks the rows in targetTable in the same order in this case? Do I have to do a SELECT FOR UPDATE before the merge? In which order does the SELECT FOR UPDATE lock the rows? The Oracle UPDATE statement has an ORDER BY clause that MERGE seems to be missing. Is there another way to avoid deadlocks other than locking the rows in the same order every time?
[Edit]
This query is used to maintain a count of how often a certain action has taken place. When the action happens the first time a row is inserted, when it happens a second time the "count" column is incremented. There are millions of different actions and they happen very often. A table lock wouldn't work.
Controlling the order in which the target table rows are modified requires that you control the query execution plan of the USING subquery. That's a tricky business, and depends on what sort of execution plans your query is likely to be getting.
If you're getting deadlocks then I'd guess that you're getting a nested loop join from the source collection to the target table, as a hash join would probably be based on hashing the source collection and would modify the target table roughly in target-table rowid order because that would be full scanned -- in any case, the access order would be consistent across all of the query executions.
Likewise, if there was a sort-merge between the two data sets you'd get consistency in the order in which target table rows are accessed.
Ordering of the source collection seems to be desirable, but the optimiser might not be applying it, so check the execution plan. If it is not, then try inserting your data into a global temporary table using APPEND and with an ORDER BY clause, then selecting from there without an order by clause, and explore the use of hints to entrench a nested loop join.
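A sketch of that suggestion (the staging table, key column, and hints are illustrative, not a guaranteed fix - check the plan you actually get):
-- source_gtt is assumed to be a global temporary table declared with
-- ON COMMIT PRESERVE ROWS.
INSERT /*+ APPEND */ INTO source_gtt
SELECT ...
FROM sourceCollection
ORDER BY <desiredUpdateOrder>;
COMMIT;   -- required before reading back a direct-path loaded table

-- the hints ask for a nested loop join driven by the staging table,
-- so target rows are visited in the staged order
MERGE /*+ LEADING(s) USE_NL(t) */ INTO targetTable t
USING (SELECT * FROM source_gtt) s
ON (t.key = s.key)
WHEN MATCHED THEN
    UPDATE ...
WHEN NOT MATCHED THEN
    INSERT ...;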
I don't believe the ORDER BY will affect anything (though I'm more than willing to be proven wrong); I think MERGE will lock everything it needs to.
Assume I'm completely wrong, assume that you get row-by-row locks with MERGE. Your problem still isn't solved as you have no guarantees that your two MERGE statements won't hit the same row simultaneously. In fact, from the information given, you have no guarantees that an ORDER BY improves the situation; it might make it worse.
Despite there being no "skip locked rows" syntax as there is with SELECT ... FOR UPDATE, there is still a simple answer: stop trying to update the same row from within different transactions. If feasible, you can use some form of parallel execution, for instance the DBMS_PARALLEL_EXECUTE subprogram CREATE_CHUNKS_BY_ROWID, and ensure that your transactions only work on a specific subset of the rows in the table.
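A rough sketch of the DBMS_PARALLEL_EXECUTE approach (the task name, chunk size, and DML are illustrative; the point is that each worker gets a disjoint rowid range):
BEGIN
    dbms_parallel_execute.create_task ( task_name => 'update_counts' );

    -- split the target table into non-overlapping rowid ranges
    dbms_parallel_execute.create_chunks_by_rowid (
        task_name   => 'update_counts',
        table_owner => user,
        table_name  => 'TARGETTABLE',
        by_row      => true,
        chunk_size  => 10000 );

    -- each chunk binds its own :start_id/:end_id rowids, so no two
    -- workers ever touch the same target rows
    dbms_parallel_execute.run_task (
        task_name      => 'update_counts',
        sql_stmt       => 'update targetTable set cnt = cnt + 1
                           where rowid between :start_id and :end_id',
        language_flag  => dbms_sql.native,
        parallel_level => 4 );

    dbms_parallel_execute.drop_task ( task_name => 'update_counts' );
END;
/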
As an aside, I'm a little worried by your description of the problem. You say there's some duplicate erroring that you fix by rerunning the MERGE. If the data in these duplicates is different, you need to ensure that the ORDER BY is done not only on the data to be merged but also on the data being merged into. If you don't, then there's no guarantee that you won't overwrite the correct data with older, incorrect data.
First, locks are not really managed at row level but at block level, so you may encounter an ORA-00060 error even without modifying the same row. This can be tricky, and managing it is the developer's job.
One possible workaround is to reorganize your table as an index-organized table (never do that on huge tables or tables with heavy change rates):
https://use-the-index-luke.com/sql/clustering/index-organized-clustered-index
Rather than do a merge, I suggest that you try to lock the row: if successful, update it; if the row doesn't exist, insert a new row. By default the lock will wait if another process has a lock on the same row.
CREATE TABLE brianl.deleteme_table
(
id INTEGER PRIMARY KEY
, cnt INTEGER NOT NULL
);
CREATE OR REPLACE PROCEDURE brianl.deleteme_table_proc (
p_id IN deleteme_table.id%TYPE)
AUTHID DEFINER
AS
l_id deleteme_table.id%TYPE;
-- This isolates this procedure so that it doesn't commit
-- anything outside of the procedure.
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
-- select the row for update
-- this will pause if someone already has the row locked.
SELECT id
INTO l_id
FROM deleteme_table
WHERE id = p_id
FOR UPDATE;
-- Row was locked, update it.
UPDATE deleteme_table
SET cnt = cnt + 1
WHERE id = p_id;
COMMIT;
EXCEPTION
WHEN NO_DATA_FOUND
THEN
-- we were unable to lock the record, insert a new row
INSERT INTO deleteme_table (id, cnt)
VALUES (p_id, 1);
COMMIT;
END deleteme_table_proc;
CREATE OR REPLACE PROCEDURE brianl.deleteme_proc_test
AUTHID CURRENT_USER
AS
BEGIN
-- This resets the table to empty for the test
EXECUTE IMMEDIATE 'TRUNCATE TABLE brianl.deleteme_table';
brianl.deleteme_table_proc (p_id => 1);
brianl.deleteme_table_proc (p_id => 2);
brianl.deleteme_table_proc (p_id => 3);
brianl.deleteme_table_proc (p_id => 2);
FOR eachrec IN ( SELECT id, cnt
FROM brianl.deleteme_table
ORDER BY id)
LOOP
DBMS_OUTPUT.put_line (
a => 'id: ' || eachrec.id || ', cnt:' || eachrec.cnt);
END LOOP;
END;
BEGIN
-- runs the test;
brianl.deleteme_proc_test;
END;
I'm working on Oracle 9i. I have a table with 135,000,000 records, partitioned with each partition having approximately 10,000,000 rows; all indexed and everything.
I need to delete around 70,000,000 rows from it due to a new business requirement.
So I created a backup of the rows to be deleted as a separate table.
Table1 <col1, col2........> -- main table (135,000,000 rows)
Table2 <col1, col2........> -- backup table (70,000,000 rows)
I tried the delete query below.
Delete from table1 t1 where exists (select 1 from table2 t2 where t2.col1 = t1.col1)
but it takes infinite hours.
then tried
declare
  cursor c1 is
    select col1 from table2;
  c2  c1%rowtype;
  cnt number;
begin
  cnt := 0;
  open c1;
  loop
    fetch c1 into c2;
    exit when c1%notfound;

    delete from table1 t1 where t1.col1 = c2.col1;

    if cnt >= 100000 then
      commit;
    end if;
    cnt := cnt + 1;
  end loop;
  close c1;
end;
Even so, it has been running for more than 12 hours and still has not completed.
Please note that there are multiple indexes on table1 and an index on col1 on table2. All the tables and indexes are analysed.
Please advise if there is any way of optimizing for this scenario.
Thanks guys.
Drop all indexes (back up the CREATE statements first - see the sketch below)
Use the select statement that you used to build the backup table to create a DELETE command
Recreate all indexes
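For the first step, you can capture the index DDL before dropping (a sketch; the DBMS_METADATA output may need SQL terminators added before you re-run it):
SELECT dbms_metadata.get_ddl ( 'INDEX', index_name, user )
FROM   user_indexes
WHERE  table_name = 'TABLE1';
Save the output, drop the indexes, run the DELETE, then re-run the captured CREATE INDEX statements.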
I remember facing this issue earlier. In that case, we resorted to doing this since it worked out faster than any other delete operation:
1) Create another table with identical structure
2) Insert into the new table the records you want to keep (use Direct path insert to speed this up)
3) Drop the old table
4) Rename the new table
You say that the table is partitioned. Is your intention to drop all the data in certain partitions? If so, you should be able to simply drop the 7 partitions that have the 70 million rows that you want to drop. I'm assuming, however, that your problem isn't that simple.
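If it really is that simple, dropping the partitions is far cheaper than any delete (the partition name is a placeholder; UPDATE GLOBAL INDEXES keeps any global indexes usable):
ALTER TABLE table1 DROP PARTITION p_old_data UPDATE GLOBAL INDEXES;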
If you can do interim commits (which implies that you don't care about transactional consistency), the most efficient approach is likely something along the lines of:
CREATE TABLE rows_to_save
AS SELECT *
FROM table1
WHERE <<criteria to select the 65 million rows you want to keep>>;
TRUNCATE TABLE table1;
INSERT /*+ append */
INTO table1
SELECT *
FROM rows_to_save;
Barring that, rather than creating the backup table, it would be more efficient to simply issue the DELETE statement
DELETE FROM table1
WHERE <<criteria to select the 70 million rows you want to delete>>;
You may also benefit from dropping or disabling the indexes and constraints before running the DELETE.
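For example (hypothetical constraint and index names; note that unique indexes cannot be skipped this way):
ALTER TABLE table1 DISABLE CONSTRAINT table1_fk1;
ALTER INDEX table1_idx1 UNUSABLE;
ALTER SESSION SET skip_unusable_indexes = TRUE;

DELETE FROM table1
 WHERE <<criteria to select the 70 million rows you want to delete>>;

ALTER INDEX table1_idx1 REBUILD;
ALTER TABLE table1 ENABLE CONSTRAINT table1_fk1;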
I'm going to answer this assuming that it is cheaper to filter against the backup table, but it would probably be cheaper to just use the negation of the criteria you used to populate the backup table.
1) create a new table with the same structure. No indexes, constraints, or triggers.
2)
select 'insert /*+ append nologging */ into new_table partition (' || n.partition_name || ') select * from old_table partition (' || o.partition_name || ') minus select * from bak_table partition (' || b.partition_name || ');'
from all_tab_partitions o, all_tab_partitions n, all_tab_partitions b
where o.partition_no = all( n.partition_no, b.partition_no)
and o.table_name = 'OLD_TABLE' and o.table_owner = 'OWNER'
and n.table_name = 'NEW_TABLE' and n.table_owner = 'OWNER'
and b.table_name = 'BAK_TABLE' and b.table_owner = 'OWNER';
-- note: I haven't run this; it may need minor corrections in addition to the obvious substitutions
3) verify and then run the result of the previous query
4) build the indexes, constraints, and triggers if needed
This avoids massive amounts of redo and undo compared to the delete.
append hint for direct path inserts
no logging to further reduce redo - make sure you back up afterwards
takes advantage of your partitioning to break the work into chunks that can be sorted in fewer passes
You could probably go faster with parallel insert + parallel select, but it is probably not necessary. Just don't do a parallel select without the insert and an "alter session enable parallel dml"
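For reference, the parallel variant would look roughly like this (ignoring the per-partition split for brevity; degree 4 is an arbitrary example):
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(new_table, 4) */ INTO new_table
SELECT /*+ PARALLEL(o, 4) */ *
FROM old_table o
MINUS
SELECT * FROM bak_table;

COMMIT;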
We have a 'merge' script that is used to assign codes to customers. Currently it works by looking at customers in a staging table and assigning them unused codes. Those codes are marked as used and the staged records, with codes, loaded to a production table. The staging table gets cleared and life is peachy.
Unfortunately we are working with a larger data set now (both customers and codes) and the process is taking WAY too long to run. I'm hoping the wonderful community here can look at the code and offer either improvements to it or another way of attacking the problem.
Thanks in advance!
Edit - I forgot to mention that part of the reason for some of the checks is that the staging table is 'living' and can have records feeding into it during the script run.
whenever sqlerror exit 1
-- stagingTable: TAB_000000003134
-- codeTable: TAB_000000003135
-- masterTable: TAB_000000003133
-- dedupe staging table
delete from TAB_000000003134 a
where ROWID > (
select min(rowid)
from TAB_000000003134 b
where a.cust_id = b.cust_id
);
commit;
delete from TAB_000000003134
where cust_id is null;
commit;
-- set row num on staging table
update TAB_000000003134
set row_num = rownum;
commit;
-- reset row nums on code table
update TAB_000000003135
set row_num = NULL;
commit;
-- assign row nums to codes
update TAB_000000003135
set row_num = rownum
where dateassigned is null
and active = 1;
commit;
-- attach codes to staging table
update TAB_000000003134 d
set (CODE1, CODE2) =
(
select CODE1, CODE2
from TAB_000000003135 c
where d.row_num = c.row_num
);
commit;
-- mark used codes compared to template
update TAB_000000003135 c
set dateassigned = sysdate, assignedto = (select cust_id from TAB_000000003134 d where c.CODE1 = d.CODE1)
where exists (select 'x' from TAB_000000003134 d where c.CODE1 = d.CODE1);
commit;
-- clear and copy data to master
truncate table TAB_000000003133;
insert into TAB_000000003133 (
<customer fields>, code1, code2, TIMESTAMP_
)
select <customer fields>, CODE1, CODE2, SYSDATE
from TAB_000000003134;
commit;
-- remove any staging records with code numbers
delete from TAB_000000003134
where CODE1 is not NULL;
commit;
quit
Combine statements as much as possible. For example, combine the first two deletes by simply adding "or cust_id is null" to the first delete. This will definitely reduce the number of reads, and may also significantly decrease the amount of data written. (Oracle writes blocks, not rows, so even if the two statements work with different rows they may be re-writing the same blocks.)
It's probably quicker to insert the entire table into another table than to update every row. Oracle does a lot of extra work for updates and deletes, to maintain concurrency and consistency. And updating values to NULL can be especially expensive, see update x set y = null takes a long time for some more details. You can avoid (almost all) UNDO and REDO with direct-path inserts: make sure the table is in NOLOGGING mode (or the database is in NOARCHIVELOG mode), and insert using the APPEND hint.
Replace the UPDATEs with MERGEs. UPDATEs can only use nested loops; MERGEs can also use hash joins. If you're updating a large amount of data, a MERGE can be significantly faster. And MERGEs don't have to read a table twice if it's used for the SET and for an EXISTS. (Although creating a new table may also be faster.)
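For example, the "attach codes" UPDATE above could become something like this (a sketch, assuming 10g or later where the WHEN NOT MATCHED clause is optional):
MERGE INTO TAB_000000003134 d
USING TAB_000000003135 c
ON (d.row_num = c.row_num)
WHEN MATCHED THEN
    UPDATE SET d.CODE1 = c.CODE1,
               d.CODE2 = c.CODE2;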
Use /*+ APPEND */ with the TAB_000000003133 insert. If you're truncating the table, I assume you don't need point-in-time recovery of the data, so you might as well insert it directly to the datafile and skip all the overhead.
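That part of the script might then look like this (a sketch; NOLOGGING matters only if the database is in ARCHIVELOG mode, and a direct-path insert must be committed before the table is read again):
ALTER TABLE TAB_000000003133 NOLOGGING;

INSERT /*+ APPEND */ INTO TAB_000000003133 (
    <customer fields>, code1, code2, TIMESTAMP_
)
SELECT <customer fields>, CODE1, CODE2, SYSDATE
FROM TAB_000000003134;

COMMIT;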
Use parallelism (if you're not already). There are side-effects and dozens of factors to consider for tuning, but don't let that discourage you. If you're dealing with large amounts of data, sooner or later you'll need to use parallelism if you want to get the most out of your hardware.
Use better names. This advice is more subjective, but in my opinion I think using good names is extremely important. Even though it's all 0s and 1s at some level, and many programmers think that cryptic code is cool, you want people to understand and care about your data. People just won't care as much about TAB_000000003135 as something like TAB_CUSTOMER_CODES. It'll be harder to learn, people are less likely to change it because it looks so complicated, and people are less likely to see errors because the purpose isn't as clear.
Don't commit after every statement. Instead, you should issue one COMMIT at the end of the script. This isn't so much for performance, but because the data is not in a consistent state until the end of the script.
(It turns out there probably are performance benefits to committing less frequently in Oracle, but your primary concern should be about maintaining consistency)
You might look into using global temporary tables. The data in a global temp table is only visible to the current session, so you could skip some of the reset steps in your script.
At work, I have a large table (some 3 million rows, like 40-50 columns). I sometimes need to empty some of the columns and fill them with new data. What I did not expect is that
UPDATE table1 SET y = null
takes much more time than filling the column with data which is generated, for example, in the SQL query from other columns of the same table or queried from other tables in a subquery. It does not matter if I go through all table rows at once (like in the update query above) or if I use a cursor to go through the table row by row (using the pk). It does not matter if I use the large table at work or if I create a small test table and fill it with some hundreds of thousands of test rows. Setting the column to null always takes way longer (throughout the tests, I encountered factors of 2 to 10) than updating the column with some dynamic data (which is different for each row).
What's the reason for this? What does Oracle do when setting a column to null? Or what is my error in reasoning?
Thanks for your help!
P.S.: I am using Oracle 11gR2, and found these results using both PL/SQL Developer and Oracle SQL Developer.
Is column Y indexed? It could be that setting the column to null means Oracle has to delete from the index, rather than just update it. If that's the case, you could drop and rebuild it after updating the data.
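Something like this, if that turns out to be the cause (the index name and definition are hypothetical):
DROP INDEX table1_y_idx;

UPDATE table1 SET y = NULL;
COMMIT;

CREATE INDEX table1_y_idx ON table1 (y);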
EDIT:
Is it just column Y that exhibits the issue, or is it independent of the column being updated? Can you post the table definition, including constraints?
Summary
I think updating to null is slower because Oracle (incorrectly) tries to take advantage of the way it stores nulls, causing it to frequently re-organize the rows in the block ("heap block compress"), creating a lot of extra UNDO and REDO.
What's so special about null?
From the Oracle Database Concepts:
"Nulls are stored in the database if they fall between columns with data values. In these cases they require 1 byte to store the length of the column (zero).
Trailing nulls in a row require no storage because a new row header signals that the remaining columns in the previous row are null. For example, if the last three columns of a table are null, no information is stored for those columns. In tables with many columns,
the columns more likely to contain nulls should be defined last to conserve disk space."
Test
Benchmarking updates is very difficult because the true cost of an update cannot be measured just from the update statement. For example, log switches will not happen with every update, and delayed block cleanout will happen later. To accurately test an update, there should be multiple runs, objects should be recreated for each run, and the high and low values should be discarded.
For simplicity the script below does not throw out high and low results, and only tests a table with a single column. But the problem still occurs regardless of the number of columns, their data, and which column is updated.
I used the RunStats utility from http://www.oracle-developer.net/utilities.php to compare the resource consumption of updating-to-a-value with updating-to-a-null.
create table test1(col1 number);

BEGIN
  dbms_output.enable(1000000);
  runstats_pkg.rs_start;

  for i in 1 .. 10 loop
    execute immediate 'drop table test1 purge';
    execute immediate 'create table test1 (col1 number)';
    execute immediate 'insert /*+ append */ into test1 select 1 col1
      from dual connect by level <= 100000';
    commit;
    execute immediate 'update test1 set col1 = 1';
    commit;
  end loop;

  runstats_pkg.rs_pause;
  runstats_pkg.rs_resume;

  for i in 1 .. 10 loop
    execute immediate 'drop table test1 purge';
    execute immediate 'create table test1 (col1 number)';
    execute immediate 'insert /*+ append */ into test1 select 1 col1
      from dual connect by level <= 100000';
    commit;
    execute immediate 'update test1 set col1 = null';
    commit;
  end loop;

  runstats_pkg.rs_stop();
END;
/
Result
There are dozens of differences, these are the four I think are most relevant:
Type Name Run1 Run2 Diff
----- ---------------------------- ------------ ------------ ------------
TIMER elapsed time (hsecs) 1,269 4,738 3,469
STAT heap block compress 1 2,028 2,027
STAT undo change vector size 55,855,008 181,387,456 125,532,448
STAT redo size 133,260,596 581,641,084 448,380,488
Solutions?
The only possible solution I can think of is to enable table compression. The trailing-null storage trick doesn't happen for compressed tables.
So even though the "heap block compress" number gets even higher for Run2, from 2028 to 23208, I guess it doesn't actually do anything.
The redo, undo, and elapsed time between the two runs is almost identical with table compression enabled.
However, there are lots of potential downsides to table compression. Updating to a null will run much faster, but every other update will run at least slightly slower.
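If you did want to try it, enabling basic compression on an existing table is a rebuild-style operation (a sketch; moving the table marks its indexes UNUSABLE, so they need rebuilding, and the index name is a placeholder):
ALTER TABLE table1 MOVE COMPRESS;
ALTER INDEX table1_pk REBUILD;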
That's because it deletes that data from the blocks.
And delete is the hardest operation. If you can avoid a delete, do it.
I recommend that you create another table with that column null (CREATE TABLE AS SELECT, for example, or INSERT ... SELECT), and fill it (the column) with your procedure. Drop the old table and then rename the new table to the current name.
UPDATE:
Another important thing is that you should update the column in place with the new values. It is pointless to set the values to null and refill them afterwards.
If you do not have values for all rows, you can do the update like this:
update table1
set y = (select new_value from source where source.key = table1.key);
This will set y to null for those rows that do not exist in source.
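If that side effect is unwanted, restricting the update to rows that have a match avoids it (same placeholder names as above):
update table1
set y = (select new_value from source where source.key = table1.key)
where exists (select 1 from source where source.key = table1.key);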
I would try what Tom Kyte suggested on large updates.
When it comes to huge tables, it's best to go like this: take a few rows, update them, take some more, update those, etc. Don't try to issue an update on the whole table. That's a killer move right from the start.
Basically, create a binary_integer-indexed table, fetch 10 rows at a time, and update them.
Here is a piece of code that I have used on large tables with success. Because I'm lazy and it's like 2 AM now, I'll just copy-paste it here and let you figure it out, but let me know if you need help:
DECLARE
TYPE BookingRecord IS RECORD (
bprice number,
bevent_id number,
book_id number
);
TYPE array is TABLE of BookingRecord index by binary_integer;
l_data array;
CURSOR c1 is
SELECT LVC_USD_PRICE_V2(ev.activity_version_id,ev.course_start_date,t.local_update_date,ev.currency,nvl(t.delegate_country,ev.sponsor_org_country),ev.price,ev.currency,t.ota_status,ev.location_type) x,
ev.title,
t.ota_booking_id
FROM ota_gsi_delegate_bookings_t#diseulprod t,
inted_parted_events_t#diseulprod ev
WHERE t.event_id = ev.event_id
and t.ota_booking_id =
BEGIN
open c1;
loop
fetch c1 bulk collect into l_data limit 20;
for i in 1..l_data.count
loop
update ou_inc_int_t_01
set price = l_data(i).bprice,
updated = 'Y'
where booking_id = l_data(i).book_id;
end loop;
exit when c1%notfound;
end loop;
close c1;
END;
What can also help speed up updates is to use ALTER TABLE table1 NOLOGGING so that the update won't generate redo logs. Another possibility is to drop the column and re-add it. Since this is a DDL operation, it will generate neither redo nor undo.
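The drop-and-re-add variant would be something like this (the datatype is a placeholder for whatever column y really is):
ALTER TABLE table1 DROP COLUMN y;
ALTER TABLE table1 ADD (y NUMBER);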