The "proper" way to atomically replace all contents in a PostgreSQL table? - sql

In the project I have been recently working on, many (PostgreSQL) database tables are just used as big lookup arrays. We have several background worker services, which periodically pull the latest data from a server, then replace all contents of a table with the latest data. The replacing has to be atomic because we don't want a partially completed table to be seen by lookup-ers.
I thought the simplest way to do the replacing is something like this:
BEGIN;
DELETE FROM some_table;
COPY some_table FROM 'source file';
COMMIT;
But I found that a lot of production code uses this method instead:
BEGIN;
CREATE TABLE some_table_tmp (LIKE some_table);
COPY some_table_tmp FROM 'source file';
DROP TABLE some_table;
ALTER TABLE some_table_tmp RENAME TO some_table;
COMMIT;
(I have omitted some logic, such as changing the owner of a sequence.)
I just can't see any advantage to this method, especially after some experiments: SQL statements like ALTER TABLE and DROP TABLE acquire an ACCESS EXCLUSIVE lock, which blocks even a SELECT.
Can anyone explain what problem the latter SQL pattern is trying to solve? Or is it wrong, and should we avoid using it?
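The lock behaviour described in the question can be checked from a psql session. The following is only a minimal sketch, assuming some_table exists: it renames the table inside a transaction, looks at pg_locks for the current backend, and rolls back so that nothing is changed. The name some_table_old is made up for illustration.

```sql
BEGIN;
ALTER TABLE some_table RENAME TO some_table_old;  -- hypothetical new name

-- Locks held by this session on the renamed table; the mode column
-- should include AccessExclusiveLock.
SELECT mode, granted
FROM pg_locks
WHERE relation = 'some_table_old'::regclass
  AND pid = pg_backend_pid();

ROLLBACK;  -- undo the rename
```

While the transaction is open, a SELECT on some_table from a second session waits until the ROLLBACK (or COMMIT).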

Related

PL/SQL Replicating a table with a trigger on Oracle DB

I have never used triggers in PL/SQL before, but I now need to write a trigger that replicates a table from one database into another. I am using AQT (Advanced Query Tool) as my database tool, with two database connections, and I need to copy the table and/or its data from DB1 to DB2. Only one table needs to be replicated. Following tutorialspoint, I have concluded that the trigger should look somewhat like this:
CREATE OR REPLACE TRIGGER db_transfer
AFTER DELETE OR INSERT OR UPDATE ON X
WHEN (NEW.ID > 0)
I don't think I need FOR EACH ROW, since I want a copy of the whole table, and the condition is supposed to trigger the replication of the table. Is this the right approach?
EDIT
For anyone who uses AQT: it has a feature under Create -> Trigger, where you click on the relevant tables to create the trigger.

What is the most efficient way of creating a copy of a table with data and no constraints in Oracle?

What is the most efficient way of creating a copy of a table with data and no constraints (primary key and foreign keys) in Oracle? Something similar to the query below:
CREATE TABLE new_table
AS
SELECT * FROM old_table;
It's fine if we need to drop the constraints manually after copying, but creating the copy should be quick.
Please advise.
Using a CREATE TABLE AS SELECT statement the way you have it now is probably the most efficient way to do it. If not, it's pretty close.
It doesn't create constraints (apart from not null constraints) or indexes, so you have to create them manually after the operation completes.
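Recreating the missing pieces afterwards might look like the sketch below. The column names id and name and the constraint and index names are hypothetical, since the original table definition is not shown.

```sql
-- Hypothetical columns: adjust to the real structure of old_table.
ALTER TABLE new_table
  ADD CONSTRAINT new_table_pk PRIMARY KEY (id);

CREATE INDEX new_table_name_ix ON new_table (name);
```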
You can specify that the operation should be parallelized by using the parallel keyword, though I believe that the feature is only available in the Enterprise Edition. Example:
create table new_table
parallel
as
select * from old_table;
It's even possible to specify the number of threads to use by adding an integer parameter right after the parallel keyword. But, by default, it parallelizes according to the available CPUs on the server.
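For instance, a degree of parallelism of 4 (an arbitrary value chosen only for illustration) would be requested like this:

```sql
create table new_table
parallel 4
as
select * from old_table;
```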
It is also possible to make the operation even faster by avoiding redo log generation. This is done by specifying the nologging keyword:
create table new_table
parallel
nologging
as
select * from old_table;
However, because no redo log is generated, the operation is unrecoverable. So, if you're going to use that, you should consider backing up the database immediately after the operation completes. I would personally not use this option unless that extra performance is critical for some reason.
For more information on how to use the additional options with the create table as select statement, see the documentation: CREATE TABLE.

How do I replace a table in Postgres?

Basically I want to do this:
begin;
lock table a;
alter table a rename to b;
alter table a1 rename to a;
drop table b;
commit;
i.e. gain control and replace my old table while no one has access to it.
Simpler:
BEGIN;
DROP TABLE a;
ALTER TABLE a1 RENAME TO a;
COMMIT;
DROP TABLE acquires an ACCESS EXCLUSIVE lock on the table anyway. An explicit LOCK command is no better. And renaming a dead guy is just a waste of time.
You may want to write-lock the old table while preparing the new, to prevent writes in between. Then you'd issue a lock like this earlier in the process:
LOCK TABLE a IN SHARE MODE;
What happens to concurrent transactions trying to access the table? It's not that simple, read this:
Best way to populate a new column in a large table?
Explains why you may have seen error messages like this:
ERROR: could not open relation with OID 123456
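Putting the pieces of this answer together, the whole replacement could be sketched as follows. This is an illustration under assumptions, not a tested recipe: it assumes a single session performs the swap (two sessions taking SHARE locks and then dropping the table can deadlock), that no views or foreign keys depend on a, and that a has no serial or generated columns. The INSERT stands in for whatever actually loads the new data.

```sql
BEGIN;
-- Block concurrent writes to the old table while the replacement is built.
LOCK TABLE a IN SHARE MODE;

-- Build the replacement; INCLUDING ALL copies defaults, check constraints
-- and indexes (but not foreign keys).
CREATE TABLE a1 (LIKE a INCLUDING ALL);
INSERT INTO a1 SELECT * FROM a;   -- placeholder for the real load

-- Swap: DROP TABLE takes the ACCESS EXCLUSIVE lock by itself.
DROP TABLE a;
ALTER TABLE a1 RENAME TO a;
COMMIT;
```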
Create an SQL backup, make the changes you need directly in the backup.sql file, and restore the database. I used this trick when I added INHERIT to a group of tables (in Postgres), to remove the inherited fields from the subtable.
I would use answer#13, but I agree that it will not carry over the constraints, and the DROP TABLE might fail. So:
line up the relevant constraints first (for example, from pg_dump --schema-only),
drop the constraints,
do the swap per answer#13,
apply the constraints (SQL snippets from the schema dump).
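As a rough sketch of those steps, assume a hypothetical child table orders whose column a_id references a(id) through a constraint named orders_a_id_fkey, and assume a1 already has a primary key or unique constraint on id. None of these names come from the question.

```sql
BEGIN;
-- Drop the constraint that would make DROP TABLE a fail.
ALTER TABLE orders DROP CONSTRAINT orders_a_id_fkey;

-- Do the swap.
DROP TABLE a;
ALTER TABLE a1 RENAME TO a;

-- Re-apply the constraint (definition taken from the schema dump).
ALTER TABLE orders
  ADD CONSTRAINT orders_a_id_fkey
  FOREIGN KEY (a_id) REFERENCES a (id);
COMMIT;
```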

postgresql: \copy method enter valid entries and discard exceptions

When entering the following command:
\copy mmcompany from '<path>/mmcompany.txt' delimiter ',' csv;
I get the following error:
ERROR: duplicate key value violates unique constraint "mmcompany_phonenumber_key"
I understand why it's happening, but how do I execute the command in a way that valid entries will be inserted and ones that create an error will be discarded?
The reason PostgreSQL doesn't do this is related to how it implements constraints and validation. When a constraint fails it causes a transaction abort. The transaction is in an unclean state and cannot be resumed.
It is possible to create a new subtransaction for each row but this is very slow and defeats the purpose of using COPY in the first place, so it isn't supported by PostgreSQL in COPY at this time. You can do it yourself in PL/PgSQL with a BEGIN ... EXCEPTION block inside a LOOP over a select from the data copied into a temporary table. This works fairly well but can be slow.
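A minimal sketch of that row-by-row approach is shown below. It assumes that the data has already been loaded into a table named stagingtable with exactly the same column layout as realtable, and that only unique violations should be skipped; both names are placeholders.

```sql
DO $$
DECLARE
    r stagingtable%ROWTYPE;
BEGIN
    FOR r IN SELECT * FROM stagingtable LOOP
        -- Each inner BEGIN ... EXCEPTION block runs as a subtransaction,
        -- so a failed row does not abort the whole load.
        BEGIN
            INSERT INTO realtable VALUES (r.*);
        EXCEPTION WHEN unique_violation THEN
            RAISE NOTICE 'skipping row: %', SQLERRM;
        END;
    END LOOP;
END
$$;
```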
It's better, if possible, to use SQL to check the constraints before doing any insert that violates them. That way you can just:
CREATE TEMPORARY TABLE stagingtable(...);
\copy stagingtable FROM 'somefile.csv'
INSERT INTO realtable
SELECT * FROM stagingtable
WHERE check_constraints_here;
Do keep concurrency issues in mind though. If you're trying to do a merge/upsert via COPY you must LOCK TABLE realtable; at the start of your transaction or you will still have the potential for errors. It looks like that's what you're trying to do - a copy if not exists. If so, skipping errors is absolutely the wrong approach. See:
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
Insert, on duplicate update in PostgreSQL?
Postgresql - Clean way to insert records if they don't exist, update if they do
Can COPY be used with a function?
Postgresql csv importation that skips rows
... this is a much-discussed issue.
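For the mmcompany case, the staging-table approach might look like the sketch below, run in psql. It assumes that the unique constraint mmcompany_phonenumber_key covers a single column named phonenumber (inferred from the constraint name, not confirmed by the question); mmcompany_staging is a made-up name.

```sql
BEGIN;
-- Prevent concurrent writers from inserting conflicting rows meanwhile.
LOCK TABLE mmcompany IN EXCLUSIVE MODE;

-- LIKE without INCLUDING options does not copy the unique constraint,
-- so duplicate rows can be loaded into the staging table.
CREATE TEMPORARY TABLE mmcompany_staging (LIKE mmcompany) ON COMMIT DROP;
\copy mmcompany_staging from '<path>/mmcompany.txt' delimiter ',' csv

-- Keep one row per phone number and skip numbers that already exist.
INSERT INTO mmcompany
SELECT DISTINCT ON (s.phonenumber) s.*
FROM mmcompany_staging s
WHERE NOT EXISTS (
    SELECT 1 FROM mmcompany m WHERE m.phonenumber = s.phonenumber
);
COMMIT;
```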
One way to handle the constraint violations is to define triggers on the target table to handle the errors. This is not ideal as there can still be race conditions (if concurrently loading), and triggers have pretty high overhead.
Another method: COPY into a staging table and load the data into the target table using SQL with some handling to skip existing entries.
Additionally, another useful tool for this is pgloader.
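On PostgreSQL 9.5 or later, the "skip existing entries" step of the staging-table method can be written with INSERT ... ON CONFLICT. This is a sketch, assuming a staging table named mmcompany_staging (a made-up name) with the same columns as mmcompany:

```sql
CREATE TEMPORARY TABLE mmcompany_staging (LIKE mmcompany);
\copy mmcompany_staging from '<path>/mmcompany.txt' delimiter ',' csv

-- Rows that would violate a unique constraint on mmcompany are skipped.
INSERT INTO mmcompany
SELECT * FROM mmcompany_staging
ON CONFLICT DO NOTHING;
```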

PL/SQL embedded insert into table that may not exist

I much prefer using this 'embedded' style of insert in a PL/SQL block (as opposed to execute immediate style dynamic SQL, where you have to escape quotes, etc.).
-- a contrived example
PROCEDURE CreateReport( customer IN VARCHAR2, reportdate IN DATE )
IS
BEGIN
-- drop table, create table with explicit column list
CreateReportTableForCustomer;
INSERT INTO TEMP_TABLE
VALUES ( customer, reportdate );
END;
/
The problem here is that Oracle checks whether 'temp_table' exists and has the correct number of columns, and throws a compile error if it doesn't.
So I was wondering if there's any way round that. Essentially, I want to use a placeholder for the table name to trick Oracle into not checking whether the table exists.
EDIT:
I should have mentioned that a user is able to execute any 'report' (as above): a mechanism that will execute an arbitrary query but always write to temp_table (in the user's schema). Thus, each time the report proc is run, it drops temp_table and recreates it with, most probably, a different column list.
You could use a dynamic SQL statement to insert into the maybe-existent temp_table, and then catch and handle the exception that occurs when the table doesn't exist.
Example:
execute immediate 'INSERT INTO '||TEMP_TABLE_NAME||' VALUES ( :customer, :reportdate )' using customer, reportdate;
Note that having the table name vary in a dynamic SQL statement is not very good, so if you ensure the table names stay the same, that would be best.
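Catching the "table does not exist" case could be sketched like this, with a fixed table name as recommended above. ORA-00942 is the error Oracle raises for a missing table or view; the exception name table_missing and the sample values are made up for illustration.

```sql
DECLARE
    table_missing EXCEPTION;
    PRAGMA EXCEPTION_INIT(table_missing, -942);  -- ORA-00942
    customer   VARCHAR2(100) := 'ACME';          -- sample value
    reportdate DATE := SYSDATE;                  -- sample value
BEGIN
    EXECUTE IMMEDIATE
        'INSERT INTO temp_table VALUES (:customer, :reportdate)'
        USING customer, reportdate;
EXCEPTION
    WHEN table_missing THEN
        DBMS_OUTPUT.PUT_LINE('temp_table does not exist; nothing inserted');
END;
/
```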
Maybe you should be using a global temporary table (GTT). These are permanent table structures that hold temporary data for an Oracle session. Many different sessions can insert data into the same GTT, and each will only be able to see their own data. The data is automatically deleted either on COMMIT or when the session ends, according to the GTT's definition.
You create the GTT (once only) like this:
create global temporary table my_gtt
(customer number, report_date date)
on commit delete/preserve* rows;
* delete as applicable
Then your programs can just use it like any other table - the only difference being it always begins empty for your session.
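A short session sketch, assuming the ON COMMIT DELETE ROWS variant was chosen and using a made-up customer number:

```sql
CREATE GLOBAL TEMPORARY TABLE my_gtt
(customer NUMBER, report_date DATE)
ON COMMIT DELETE ROWS;

INSERT INTO my_gtt (customer, report_date) VALUES (42, SYSDATE);
SELECT COUNT(*) FROM my_gtt;   -- 1: the row is visible to this session only
COMMIT;
SELECT COUNT(*) FROM my_gtt;   -- 0: the rows were removed at COMMIT
```

With ON COMMIT PRESERVE ROWS, the rows would instead remain until the session ends.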
Using GTTs is much preferable to dropping and recreating tables on the fly. If your application needs a different structure for each report, I strongly suggest you work out all the different structures that the reports need and create a separate GTT for each, instead of creating ordinary tables at runtime.
That said, if this is just not feasible (and I've seen good examples when it's not, e.g. in a system that supports a wide range of ad-hoc requests from users), you'll have to go with the EXECUTE IMMEDIATE approach.