relation not found, this can be validly caused by a concurrent delete operation on this object - locking

We have a query running on Greenplum (based on PostgreSQL 9.4) which creates temp tables as follows:
drop table if exists mytemptable;
create temp table mytemptable as
/* select code here */
distributed randomly;

drop table if exists mytemptable;
create temp table mytemptable as
/* another select code here */
distributed randomly;
and it repeats the drop/create 5 times for different purposes, using the same temp table.
It works in the version we currently use (Greenplum on PostgreSQL 8.4), but we are migrating to 9.4 and now it fails with an error about a concurrent delete operation. We suspect the temp table is being dropped by one process ID while the same table is still being accessed by another process ID, as if the five create table statements were running at the same time.
How can this be avoided? We're using the default Postgres optimizer. When we switch on the GPORCA optimizer this query works, but for other reasons not related to this query specifically, we have decided not to use GPORCA.
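One workaround that is sometimes suggested for this kind of drop/recreate pattern (this is not from the original post, and the column list and source selects below are placeholders) is to create the temp table once and then reuse it with TRUNCATE plus INSERT, so the relation is never dropped while another step might still reference it; giving each of the five steps its own temp table name is another option. A minimal sketch:
-- create the temp table once, with an explicit column list (placeholder columns)
create temp table mytemptable (col1 int, col2 text) distributed randomly;
-- step 1: load the first result set
insert into mytemptable
select col1, col2 from some_source_1;  -- placeholder select
-- ... use mytemptable ...
-- later steps: clear and reload instead of drop/create
truncate table mytemptable;
insert into mytemptable
select col1, col2 from some_source_2;  -- placeholder select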


Redshift: Support for concurrent inserts in the same table

I have Lambda code that fires several insert queries against the same table concurrently through the Redshift Data API.
1. Insert into Table ( select <some analytical logic> from someTable_1)
2. Insert into Table ( select <some analytical logic> from someTable_2)
3. Insert into Table ( select <some analytical logic> from someTable_n)
Considering such queries will be fired concurrently, does Redshift apply a lock to the Table for each insert? Or does it allow parallel insert queries in the same table?
I'm asking because postgres allows concurrent inserts.
https://www.postgresql.org/files/developer/concurrency.pdf
Both Redshift and Postgres use MVCC - https://en.wikipedia.org/wiki/Multiversion_concurrency_control - so they will likely behave the same. There are no write locks, just serial progression through the commit queue as the commits are seen. I've seen no functional problems with this in Redshift, so you should be good.
Functionally this is fine, but Redshift is columnar and Postgres is row-based, which leads to differences on the write side. Since these INSERTs are likely adding only a small (by Redshift standards) number of rows, and the minimum write size on Redshift is 1MB per column per slice, there is likely to be a lot of unused space in these blocks. If this is done often, the table will accumulate a lot of wasted space and need frequent vacuuming. If you can, look at this write pattern to see whether the insert data can be batched more.
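As one hedged illustration of what more batching could look like (the table and column names here are placeholders, not from the original question), several small INSERT ... SELECT statements can be combined into a single statement with UNION ALL, so Redshift writes one larger set of blocks instead of many nearly empty ones:
-- one larger insert instead of several small concurrent ones (placeholder names)
insert into target_table (col1, col2)
select col1, col2 from sometable_1
union all
select col1, col2 from sometable_2
union all
select col1, col2 from sometable_n;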
Based on the discussion in the comments, one can conclude that concurrent inserts into the same table in Redshift are blocking, unlike in Postgres.
Refer to the docs:
https://docs.aws.amazon.com/redshift/latest/dg/r_Serializable_isolation_example.html
Edit:
FYI, if you are wondering what exactly to look for in the docs above, the relevant passage is pasted below:
Concurrent COPY operations into the same table
Transaction 1 copies rows into the LISTING table:
begin;
copy listing from ...;
end;
Transaction 2 starts concurrently in a separate session and attempts to copy more rows into the LISTING table. Transaction 2 must wait until transaction 1 releases the write lock on the LISTING table, then it can proceed.
begin;
[waits]
copy listing from ...;
end;
The same behavior would occur if one or both transactions contained an INSERT command instead of a COPY command.

Merge, Partition and Remote Database - Performance Tuning Oracle

I want to tune my merge query, which inserts into and updates a table in Oracle based on a source table in SQL Server. The table size is around 120 million rows, and normally around 120k records are inserted/updated daily. The merge takes around 1.5 hours to run. It uses a nested loop and the primary key index to perform the insert and update.
There is no record-update date in the source table to use, so all records are compared.
merge into abc tgt
using
(
select a, b, c
from sourcetable@sqlserver_remote) src
on (tgt.ref_id = src.ref_id)
when matched then
update set
.......
where
decode(tgt.a, src.a, 1, 0) = 0
or ......
when not matched then
insert (....) values (.....);
commit;
Since the table is huge and growing every day, I partitioned the table in DEV based on ref_id (10 groups) and created a local index on ref_id.
Now it uses a hash join and a full table scan, and it runs longer than the existing process.
When I changed from a local to a global index (ref_id), it uses nested loops but still takes longer to run than the existing process.
Is there a way to tune the performance of this process?
Thanks...
I'd be wary of joining/merging huge tables over a database link. I'd try to copy over the complete source table first (for instance with a non-atomic mview, possibly compressed, possibly sorted, and certainly only the columns you'll need). After gathering statistics, I'd merge the target table with the local copy. Afterwards, the local copy can be truncated.
I wouldn't be surprised if partitioning speeds up the merge from the local copy to your target table.
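A rough sketch of that approach, using a plain staging table instead of the mview for brevity (the staging table name, column list, and database link name are assumptions for illustration, not from the original post):
-- 1. copy only the needed columns from the remote source into a local staging table
create table stg_source compress as
select ref_id, a, b, c
from sourcetable@sqlserver_remote;  -- placeholder db link name
-- 2. gather statistics on the local copy
exec dbms_stats.gather_table_stats(user, 'STG_SOURCE');
-- 3. merge locally instead of over the database link
merge into abc tgt
using stg_source src
on (tgt.ref_id = src.ref_id)
when matched then
  update set tgt.a = src.a, tgt.b = src.b, tgt.c = src.c
  where decode(tgt.a, src.a, 1, 0) = 0
when not matched then
  insert (ref_id, a, b, c) values (src.ref_id, src.a, src.b, src.c);
commit;
-- 4. clean up the local copy
truncate table stg_source;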

BigQuery Atomicity

I am trying to do a full load of a table in BigQuery daily, as part of ETL. The target table has a dummy partition column of type integer and is clustered. I want the statement to be atomic, i.e. either it completely overwrites the table with the new data or it rolls back to the old data if it fails for any reason in between, and it should serve user queries with the old data until the table is completely overwritten.
One way of doing this is delete and insert, but BigQuery does not support multi-statement transactions.
I am thinking of using the statement below. Please let me know if this is atomic.
create or replace table table_1 partition by dummy_int cluster by dummy_column
as select col1, col2, col3 from stage_table1
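For reference, a fully spelled-out version of that statement might look like the sketch below. The dataset name, column list, and the integer range passed to range_bucket are assumptions for illustration (BigQuery needs an explicit range specification to partition on an integer column); none of this is taken from the original question.
-- single-statement replace of the table; placeholder dataset, columns, and range
create or replace table mydataset.table_1
partition by range_bucket(dummy_int, generate_array(0, 100, 10))
cluster by dummy_column
as
select col1, col2, col3, dummy_int, dummy_column
from mydataset.stage_table1;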

oracle creating table from another table created partially; unable to extend temp space

We are trying to create a table from another table using:
create table tab1 as select * from tab2;
But the process failed with the error:
ORA-01652: unable to extend temp segment by 8192 in tablespace
However, the table tab1 was created with partial data only; there is a count mismatch between tab1 and tab2. Neither table is being populated or updated by any other transaction. This happened with a couple of tables.
To my knowledge, a create table statement should either create the table completely or not at all; there should be no possibility of a table being created partially.
Any insight from experts is appreciated.
Putting the cause of the error aside (addressed by @Leo in his answer):
I have not found anything specific on transactions for CREATE TABLE ... AS SELECT. Any CREATE TABLE statement is a DDL operation, and DDL operations are generally non-transactional.
This is just speculation, but I'd say that the table creation did succeed. The statement you issued is basically two operations in one: the first is the actual table creation, which works (and, as it is not transactional, it cannot be affected by the second), and the second is a variant of a bulk insert-from-select (with implicit commits for batches), which breaks at some point.
This is probably not answering your question, but as the operation is apparently two-phase anyway, if you need a more transactional approach, you would benefit from splitting it into two separate statements:
first:
CREATE TABLE tab1 AS SELECT * FROM tab2 WHERE 1 = 2;
second:
INSERT INTO tab1 SELECT * FROM tab2;
This way, if the second part fails, you will not end up with a partial insert. You will still have the (empty) table in place, though.
Execute the following as a DBA to determine the data file name for the existing tablespace:
SELECT * FROM DBA_DATA_FILES;
Then extend the size of the datafile as follows (replace the filename with the one from the previous query):
ALTER DATABASE DATAFILE 'C:\ORACLEXE\ORADATA\XE\SYSTEM.DBF' RESIZE 4096M;
You can also first try the command below, or ask your DBA to grant the privilege:
grant unlimited tablespace to <schema_name>;

What's the benefit of doing temporary tables (#table) instead of persistent tables (table)?

I can think of two main benefits:
Avoiding concurrency problems: if you have many processes creating/dropping tables, you can get into trouble when one process tries to create an already existing table.
Performance: I imagine that creating temporary tables (with #) is more performant than creating regular tables.
Is there any other reason, and is any of my reasons false?
You can't compare temporary and persistent tables:
Persistent tables keep your data and can be used by any process.
Temporary ones are throwaway, and # ones are visible only to that connection.
You'd use a temp table to spool results for further processing and such.
There is little difference in performance (either way) between the two types of table.
You shouldn't be dropping and creating tables all the time... any app that relies on this is doing something wrong, not least because it makes way too many SQL calls.
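To make "spool results for further processing" concrete, here is a minimal T-SQL sketch; the table and column names (dbo.orders, #customer_totals, and so on) are invented for illustration and are not from the original answer. Intermediate results are staged in a #temp table, processed further, and the table disappears when the connection closes (or is dropped explicitly).
-- stage intermediate results in a connection-scoped temp table (placeholder names)
select o.customer_id, sum(o.amount) as total_amount
into #customer_totals
from dbo.orders o
group by o.customer_id;
-- further processing against the spooled results
select c.customer_id, c.total_amount
from #customer_totals c
where c.total_amount > 1000;
-- optional: the table is also dropped automatically when the connection closes
drop table #customer_totals;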
(1) Temp Tables are created in the SQL Server tempdb database and therefore require more IO resources and locking. Table Variables and Derived Tables are often described as being created in memory, although table variables are also backed by tempdb.
(2) Temp Tables will generally perform better for large amounts of data that can be worked on using parallelism, whereas Table Variables are best used for small amounts of data (I use a rule of thumb of 100 rows or fewer) where parallelism would not provide a significant performance improvement.
(3) You cannot use a stored procedure to insert data into a Derived Table, and on older versions of SQL Server the same was true for Table Variables. For example, the following will work: INSERT INTO #MyTempTable EXEC dbo.GetPolicies_sp, whereas on those older versions INSERT INTO @MyTableVariable EXEC dbo.GetPolicies_sp would generate an error (current versions allow it); see the sketch after this list.
(4) Derived Tables can only be created from a SELECT statement, but can be used within an INSERT, UPDATE, or DELETE statement.
(5) In order of scope endurance, Temp Tables extend the furthest in scope, followed by Table Variables, and finally Derived Tables.
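A minimal T-SQL sketch of the difference referenced in (3), assuming a procedure dbo.GetPolicies_sp that returns two columns matching the definitions below (the column definitions themselves are invented for illustration):
-- temp table: created in tempdb, visible for the rest of the connection
create table #MyTempTable (policy_id int, policy_name varchar(100));
insert into #MyTempTable exec dbo.GetPolicies_sp;  -- INSERT ... EXEC into a temp table
-- table variable: declared, scoped to the batch/procedure in which it is declared
declare @MyTableVariable table (policy_id int primary key, policy_name varchar(100));
insert into @MyTableVariable (policy_id, policy_name)
select policy_id, policy_name from #MyTempTable;
drop table #MyTempTable;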
1)
A table variable's lifespan is only the duration of the batch in which it is declared. If we execute the DECLARE statement in one batch and then attempt to insert records into the @temp table variable in another, we receive an error because the table variable has passed out of scope. The results are the same if we declare and insert records into @temp in one batch and then attempt to query the table from another. Notice that, by contrast, we need to execute a DROP TABLE statement against #temp, because a temp table persists until the session ends or until the table is dropped.
2)
Table variables have certain clear limitations:
-Table variables cannot have non-clustered indexes added to them (SQL Server 2014 and later allow indexes to be declared inline)
-Constraints and default values cannot be added to a table variable after it is declared; PRIMARY KEY, UNIQUE, CHECK, and DEFAULT can only be specified inline in the DECLARE statement
-Statistics cannot be created against table variables
Similarities with temporary tables include:
-Instantiated in tempdb
-Clustered indexes can be created on table variables and temporary tables
-Both are logged in the transaction log
-Just as with temp and regular tables, users can perform all Data Manipulation Language (DML) queries against a table variable: SELECT, INSERT, UPDATE, and DELETE.