Merge, Partition and Remote Database - Performance Tuning Oracle - sql

I want to tune my MERGE query, which inserts into and updates a table in Oracle based on a source table in SQL Server. The table size is around 120 million rows, and normally around 120k records are inserted/updated daily. The MERGE takes around 1.5 hours to run. It uses a nested loop and the primary key index to perform the insert and update.
There is no record-update date in the source table to use, so all records are compared.
Merge into abc tgt
using
(
select ref_id, a, b, c
from sourcetable@sqlserver_remote) src
on (tgt.ref_id = src.ref_id)
when matched then
update set
.......
where
decode(tgt.a, src.a,1,0) = 0
or ......
when not matched then
insert (....) values (.....);
commit;
Since the table is huge and growing every day, I partitioned the table in DEV based on ref_id (10 groups) and created a local index on ref_id.
Now it uses a hash join and a full table scan, and it runs longer than the existing process.
When I changed from a local to a global index (ref_id), it uses nested loops but still takes longer to run than the existing process.
Is there a way to performance-tune the process?
Thanks...

I'd be wary of joining/merging huge tables over a database link. I'd try to copy over the complete source table (for instance with a non-atomic mview, possibly compressed, possibly sorted, certainly only the columns you'll need). After gathering statistics, I'd merge the target table with the local copy. Afterwards, the local copy can be truncated.
I wouldn't be surprised if partitioning speeds up the merge from the local copy to your target table.
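For illustration, a sketch of that approach using the column names from the question (ref_id, a, b, c) and a hypothetical local copy called src_copy; a materialized view refreshed before each run would work the same way:
-- refresh the local copy (CTAS shown here; an mview refresh is the alternative)
create table src_copy nologging compress as
select ref_id, a, b, c
from sourcetable@sqlserver_remote;
exec dbms_stats.gather_table_stats(user, 'SRC_COPY');
-- merge from the local copy instead of over the db link
merge into abc tgt
using src_copy src
on (tgt.ref_id = src.ref_id)
when matched then
update set tgt.a = src.a, tgt.b = src.b, tgt.c = src.c
where decode(tgt.a, src.a,1,0) = 0
or decode(tgt.b, src.b,1,0) = 0
or decode(tgt.c, src.c,1,0) = 0
when not matched then
insert (ref_id, a, b, c) values (src.ref_id, src.a, src.b, src.c);
commit;
truncate table src_copy;   -- or drop/recreate before the next run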

Related

How to make DELETE faster in fast changing (DELSERT) table in MonetDB?

I am using MonetDB (MDB) for OLAP queries. I am storing source data in PostgreSQL (PGSQL) and syncing it with MonetDB in batches written in Python.
In PGSQL there is a wide table with an ID (non-unique) and a few columns. Every few seconds a Python script takes a batch of 10k records changed in PGSQL and uploads them to MDB.
The process of upload to MDB is as follows (a rough sketch in SQL follows the steps):
1. Create a staging table in MDB.
2. Use the COPY command to upload the 10k records into the staging table.
3. DELETE from the destination table all IDs that are in the staging table.
4. INSERT into the destination table all rows from the staging table.
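A rough sketch of one such batch; destination, id and the file path are placeholders, and the exact COPY options depend on the file format:
-- create an empty staging table with the same structure
CREATE TABLE staging AS SELECT * FROM destination WITH NO DATA;
-- load the changed rows (add a DELIMITERS clause to match the file)
COPY 10000 RECORDS INTO staging FROM '/tmp/batch.csv';
-- remove all current versions of the synced IDs, then re-insert them
DELETE FROM destination WHERE id IN (SELECT id FROM staging);
INSERT INTO destination SELECT * FROM staging;
DROP TABLE staging;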
So, it is basically a DELETE & INSERT. I cannot use a MERGE statement, because I do not have a PK - one ID can have multiple values in the destination. So I need to do a delete and a full insert for all IDs currently being synced.
Now to the problem: the DELETE is slow.
When I do a DELETE on the destination table, deleting 10k records from a table of 25M rows takes about 500ms.
However! If I run a simple SELECT * FROM destination WHERE id = 1 and THEN do a DELETE, it takes 2ms.
I think this has something to do with the automatic creation of auxiliary indices, but this is where my knowledge ends.
I tried to solve this with "pre-heating", doing the lookup myself, and it works - but only for the first DELETE after the pre-heat.
Once I do a DELETE and INSERT, the next DELETE is slow again. And doing the pre-heating before each DELETE does not make sense, because the pre-heat itself takes 500ms.
Is there any way to sync data to MDB without breaking the auxiliary indices already built? Or to make the DELETE faster without pre-heating? Or should I use some different technique to sync data into MDB without a PK (does MERGE have the same problem?).
Thanks!

BigQuery Atomicity

I am trying to do a full load of a table in BigQuery daily, as part of ETL. The target table has a dummy partition column of type integer and is clustered. I want the statement to be atomic, i.e. either it completely overwrites the table with the new data or it rolls back to the old data if anything fails in between, serving user queries with the old data until the new data is completely written.
One way of doing this is delete and insert, but BigQuery does not support multi-statement transactions.
I am thinking of using the statement below. Please let me know if this is atomic.
create or replace table table_1
partition by dummy_int
cluster by dummy_column
as select col1, col2, col3 from stage_table1

Enabling and disabling indexes in plsql having performance issue

I am disabling and enabling the indexes before inserting data into a staging table and before inserting data into the destination table (using a MERGE statement), respectively. While the functionality works fine, my program takes too much time, as long as 10 hours, to complete. This is how I'm doing it in the code:
1. First disable the indexes of the staging table.
2. Load data into the staging table using SQL*Loader.
3. Enable the indexes of the staging table.
4. Insert data into the destination table using MERGE (MERGE into the destination table using the staging table).
5. Update errors, if any, to the staging table.
NOTE: The staging table already has nearly 400 million rows. I was trying to insert 23 rows into the staging and eventually the destination table. The insertion into the staging table is quick (up to step 2), but rebuilding the indexes and everything from step 3 onwards takes 10 hours!
Is my approach correct? How do I improve the performance?
Using the facts you mentioned:
1. The table already has 400M rows;
2. It's a staging table;
3. New inserts can be massive;
4. You didn't specify whether you need to keep the rows in staging, so I will assume you do.
Scenario: I would create 3 tables:
TABLE_STAGING
TABLE_DESTINATION
TABLE_TEMP
1. First disable the indexes of TABLE_TEMP;
2. Load data into TABLE_TEMP using SQL*Loader (read about the APPEND hint and direct-path load);
3. Enable the indexes on TABLE_TEMP;
4. Insert data into TABLE_DESTINATION using MERGE on TABLE_TEMP;
5. Insert data into TABLE_STAGING from TABLE_TEMP - here you correct the errors you found:
INSERT INTO TABLE_STAGING SELECT * FROM TABLE_TEMP;
6. Truncate table TABLE_TEMP;
Rebuilding an index on 400M rows every time is not ideal; it is massive CPU work to check each value in every row to build the index. Staging tables should be empty all the time, or use temporary tables.
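A sketch of steps 1 to 6, with hypothetical names (idx_temp_ref, ref_id, val); the SQL*Loader step is only indicated as a comment:
ALTER INDEX idx_temp_ref UNUSABLE;                    -- 1: "disable" the TABLE_TEMP index
-- 2: sqlldr ... direct=true loads the batch into TABLE_TEMP (direct-path load)
ALTER INDEX idx_temp_ref REBUILD;                     -- 3: re-enable by rebuilding
MERGE INTO TABLE_DESTINATION d                        -- 4: merge only the new batch
USING TABLE_TEMP t
ON (d.ref_id = t.ref_id)
WHEN MATCHED THEN UPDATE SET d.val = t.val
WHEN NOT MATCHED THEN INSERT (ref_id, val) VALUES (t.ref_id, t.val);
INSERT INTO TABLE_STAGING SELECT * FROM TABLE_TEMP;   -- 5: keep the (corrected) rows
COMMIT;
TRUNCATE TABLE TABLE_TEMP;                            -- 6: leave the work table empty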

Add new column without table lock?

In my project the table has 23 million records, and around 6 fields of that table are indexed.
Earlier I tried to add a delta column for Thinking Sphinx search, but it ended up holding a lock on the whole database for an hour. Afterwards, once the column was added and I tried to rebuild the indexes, this is the query that held the database lock for around 4 hours:
"update user_messages set delta = false where delta = true"
To get the server back up, I created a new database from a db dump and promoted it as the database so the server could go live again.
Now what I am looking for is: can I add the delta column to my table without a table lock? And once the delta column is added, why is the above query executed when I run the index rebuild command, and why does it block the server for so long?
PS: I am on Heroku and using Postgres with the ika db model.
Postgres 11 or later
Since Postgres 11, only volatile default values still require a table rewrite. The manual:
Adding a column with a volatile DEFAULT or changing the type of an existing column will require the entire table and its indexes to be rewritten.
Bold emphasis mine. false is immutable. So just add the column with DEFAULT false. Super fast, job done:
ALTER TABLE tbl ADD column delta boolean DEFAULT false;
Postgres 10 or older, or for volatile DEFAULT
Adding a new column without DEFAULT or DEFAULT NULL will not normally force a table rewrite and is very cheap. Only writing actual values to it creates new rows. But, quoting the manual:
Adding a column with a DEFAULT clause or changing the type of an existing column will require the entire table and its indexes to be rewritten.
UPDATE in PostgreSQL writes a new version of the row. Your question does not provide all the information, but that probably means writing millions of new rows.
While doing the UPDATE in place, if a major portion of the table is affected and you are free to lock the table exclusively, remove all indexes before doing the mass UPDATE and recreate them afterwards. It's faster this way. Related advice in the manual.
If your data model and available disk space allow for it, CREATE a new table in the background and then, in one transaction: DROP the old table, and RENAME the new one. Related:
Best way to populate a new column in a large table?
While creating the new table in the background: Apply all changes to the same row at once. Repeated updates create new row versions and leave dead tuples behind.
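A sketch of that swap, with hypothetical names (tbl, tbl_new) and the new delta column:
CREATE TABLE tbl_new (LIKE tbl INCLUDING DEFAULTS);  -- clone structure; add indexes later
ALTER TABLE tbl_new ADD COLUMN delta boolean NOT NULL DEFAULT false;
INSERT INTO tbl_new SELECT *, false FROM tbl;        -- populate in the background
-- create indexes on tbl_new here, then swap in one short transaction:
BEGIN;
DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;
COMMIT;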
If you cannot remove the original table because of constraints, another fast way is to build a temporary table, TRUNCATE the original one and mass INSERT the new rows - sorted, if that helps performance. All in one transaction. Something like this:
BEGIN;
SET temp_buffers = '1000MB'; -- or whatever you can spare temporarily
-- write-lock table here to prevent concurrent writes - if needed
LOCK TABLE tbl IN SHARE MODE;
CREATE TEMP TABLE tmp AS
SELECT *, false AS delta
FROM tbl; -- copy existing rows plus new value
-- ORDER BY ??? -- opportune moment to cluster rows
-- DROP all indexes here
TRUNCATE tbl; -- empty table - truncate is super fast
ALTER TABLE tbl ADD COLUMN delta boolean DEFAULT false; -- NOT NULL?
INSERT INTO tbl
TABLE tmp; -- insert back surviving rows
-- recreate all indexes here
COMMIT;
You could add another table with just that one column; there won't be any such long locks. Of course there should be another column as well, a foreign key referencing the original table.
For the indexes, you could use CREATE INDEX CONCURRENTLY; it doesn't take heavy locks on the table: http://www.postgresql.org/docs/9.1/static/sql-createindex.html
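A sketch of that side-table idea, assuming user_messages has an integer primary key id (the other names are hypothetical):
CREATE TABLE user_message_deltas (
    user_message_id integer PRIMARY KEY REFERENCES user_messages (id),
    delta           boolean NOT NULL DEFAULT false
);
-- built without blocking writes on the table (cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY user_message_deltas_delta_idx
    ON user_message_deltas (delta);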

Time associated with dropping a table

Does the time it takes to drop a table in SQL reflect the quantity of data within the table?
Let's say we drop two otherwise identical tables, one with 100,000 rows and one with 1,000 rows.
Is this MySQL? The ext3 Linux filesystem is known to be slow at dropping large tables. The time to drop has more to do with the on-disk size of the table than with the number of rows in it.
http://www.mysqlperformanceblog.com/2009/06/16/slow-drop-table/
It will definitely depend on the database server, on its specific storage parameters, and on whether the table contains LOBs. For MOST scenarios, it's very fast.
In Oracle, dropping a table is very quick and unlikely to matter compared to other operations you would be performing (creating the table and populating it).
It would not be idiomatic to create and drop tables in Oracle often enough for this to be any sort of a performance factor. You would instead consider using global temporary tables.
From http://download.oracle.com/docs/cd/B28359_01/server.111/b28318/schema.htm
In addition to permanent tables, Oracle Database can create temporary tables to hold session-private data that exists only for the duration of a transaction or session.
The CREATE GLOBAL TEMPORARY TABLE statement creates a temporary table that can be transaction-specific or session-specific. For transaction-specific temporary tables, data exists for the duration of the transaction. For session-specific temporary tables, data exists for the duration of the session. Data in a temporary table is private to the session. Each session can only see and modify its own data. DML locks are not acquired on the data of the temporary tables. The LOCK statement has no effect on a temporary table, because each session has its own private data.
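For illustration, a minimal transaction-specific global temporary table (the name and column are hypothetical):
CREATE GLOBAL TEMPORARY TABLE work_ids (
    ref_id NUMBER PRIMARY KEY
) ON COMMIT DELETE ROWS;  -- rows vanish at commit; use ON COMMIT PRESERVE ROWS for session scope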