Safely replace table with new data and schema - sql

I am trying to create a stored procedure to recreate a table from scratch, with a possible change of schema (including possible additions/removals of columns), by using a DROP TABLE followed by a SELECT INTO, like this:
BEGIN TRAN
DROP TABLE [MyTable]
SELECT (...) INTO [MyTable] FROM (...)
COMMIT
My concern is that errors could be generated if someone tries to access the table after it has been dropped but before the SELECT INTO has completed. Is there a way to lock [MyTable] in a way that will persist through the DROP?
Instead of DROP/SELECT INTO, I could TRUNCATE/INSERT INTO, but this would not allow the schema to be changed. SELECT INTO is convenient in my situation because it allows the new schema to be automatically determined. Is there a way to make this work safely?
Also, I would like to be sure that the source tables in "FROM (...)" are not locked during this process.

If you make a significant change to a table using SSMS (like adding a column in the middle of the existing columns, not at the end) and look at the script it generates, you'll see that SSMS uses sp_rename.
The general structure of the script SSMS generates:
create a new table with temporary name
populate the new table with data
drop the old table
rename the new table to the correct name.
All this in a transaction.
This should keep the time when tables are locked to a minimum.
BEGIN TRANSACTION
SELECT (...) INTO dbo.Temp_MyTable FROM (...)
DROP TABLE dbo.MyTable
EXECUTE sp_rename N'dbo.Temp_MyTable', N'MyTable', 'OBJECT'
COMMIT
DROP TABLE MyTable acquires a schema modification (Sch-M) lock on it until the end of the transaction, so all other queries using MyTable would wait, even if they use the READ UNCOMMITTED isolation level (or the infamous WITH (NOLOCK) hint).
See also MSDN Lock Modes:
Schema Locks
The Database Engine uses schema modification (Sch-M) locks during a table data definition language (DDL) operation, such as adding a column or dropping a table. During the time that it is held, the Sch-M lock prevents concurrent access to the table. This means the Sch-M lock blocks all outside operations until the lock is released.
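Building on the above, here is a minimal sketch of the swap wrapped in a stored procedure with basic error handling. dbo.SourceTable and the Id/Name column list are hypothetical stand-ins for the question's SELECT (...) FROM (...); only the overall pattern matters.
-- Sketch only: dbo.SourceTable and the column list (Id, Name) are hypothetical placeholders.
CREATE PROCEDURE dbo.RebuildMyTable
AS
BEGIN
    SET XACT_ABORT ON;  -- any runtime error rolls the whole transaction back
    BEGIN TRY
        BEGIN TRANSACTION;
            IF OBJECT_ID(N'dbo.Temp_MyTable', N'U') IS NOT NULL
                DROP TABLE dbo.Temp_MyTable;  -- clean up leftovers from a failed earlier run

            -- Build the replacement first; the old table stays available until the DROP below.
            SELECT Id, Name
            INTO dbo.Temp_MyTable
            FROM dbo.SourceTable;

            DROP TABLE dbo.MyTable;  -- Sch-M lock held until COMMIT
            EXECUTE sp_rename N'dbo.Temp_MyTable', N'MyTable', 'OBJECT';
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;  -- re-raise the error (SQL Server 2012+); use RAISERROR on older versions
    END CATCH;
END;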

Related

Most efficient way of updating ~100 million rows in Postgresql database?

I have a database with a single table. This table will need to be updated every few weeks. We need to ingest third-party data into it and it will contain 100-120 million rows. So the flow is basically:
Get the raw data from the source
Detect inserts, updates & deletes
Make updates and ingest into the database
What's the best way of detecting and performing updates?
Some options are:
Compare incoming data with current database one by one and make single updates. This seems very slow and not feasible.
Ingest incoming data into a new table, then switch out old table with the new table
Bulk updates in-place in the current table. Not sure how to do this.
What do you suggest is the best option, or if there's a different option out there?
Postgres has a helpful guide for improving the performance of bulk loads. From your description, you need to perform a bulk INSERT in addition to a bulk UPDATE and DELETE. Below is a rough step-by-step guide for making this efficient:
Configure Global Database Configuration Variables Before the Operation
ALTER SYSTEM SET max_wal_size = <size>;
You can additionally reduce WAL logging to the minimum:
ALTER SYSTEM SET wal_level = 'minimal';
ALTER SYSTEM SET archive_mode = 'off';
ALTER SYSTEM SET max_wal_senders = 0;
Note that these changes will require a database restart to take effect.
Start a Transaction
You want all the work to be done in a single transaction in case anything goes wrong. Running COPY in parallel across multiple connections does not usually increase performance, since disk is typically the limiting factor.
Optimize Other Configuration Variables at the Transaction level
SET LOCAL maintenance_work_mem = <size>
...
You may need to set other configuration parameters if you are doing any additional special processing of the data inside Postgres (work_mem is usually the most important there, especially if using the PostGIS extension). See this guide for the most important configuration variables for performance.
CREATE a TEMPORARY table with no constraints.
CREATE TEMPORARY TABLE changes(
    id bigint,
    data text
) ON COMMIT DROP; -- ensures this table will be dropped at the end of the transaction
Bulk Insert Into changes using COPY FROM
Use the COPY FROM Command to bulk insert the raw data into the temporary table.
COPY changes(id,data) FROM ..
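For example, a complete command might look like this; the file path and CSV format are made up purely for illustration (from psql you could use \copy instead to read a client-side file).
-- Hypothetical file path and format, for illustration only.
COPY changes(id, data)
FROM '/tmp/changes.csv'
WITH (FORMAT csv, HEADER true);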
DROP Relations That Can Slow Processing
On the target table, DROP all foreign key constraints, indexes and triggers (where possible). Don't drop your PRIMARY KEY, as the upsert's ON CONFLICT clause relies on it.
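For instance, assuming a hypothetical foreign key, index and trigger on the target table (the object names below are invented), the drops might look like:
-- Hypothetical object names; the primary key stays in place for ON CONFLICT.
ALTER TABLE target DROP CONSTRAINT target_other_id_fkey;
DROP INDEX IF EXISTS target_data_idx;
DROP TRIGGER IF EXISTS target_audit_trg ON target;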
Add a Tracking Column to target Table
Add a column to the target table to record whether each row was present in the changes table:
ALTER TABLE target ADD COLUMN seen boolean;
UPSERT from the changes table into the target table:
UPSERTs are performed by adding an ON CONFLICT clause to a standard INSERT statement. This avoids having to run a separate UPDATE and a separate INSERT.
INSERT INTO target(id, data, seen)
SELECT
    id,
    data,
    true
FROM
    changes
ON CONFLICT (id) DO UPDATE SET data = EXCLUDED.data, seen = true;
DELETE Rows Not In changes Table
DELETE FROM target WHERE seen IS NOT TRUE;
DROP Tracking Column and Temporary changes Table
DROP TABLE changes;
ALTER TABLE target DROP COLUMN seen;
Add Back Relations You Dropped For Performance
Add back all constraints, triggers and indexes that were dropped to improve bulk upsert performance.
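Continuing the invented names from the earlier sketch, that would be something like the following (the other_id column and audit_fn() are hypothetical):
-- Recreate the hypothetical objects dropped before the bulk upsert.
ALTER TABLE target
    ADD CONSTRAINT target_other_id_fkey FOREIGN KEY (other_id) REFERENCES other(id);
CREATE INDEX target_data_idx ON target (data);
-- Recreate any triggers as well, e.g.:
-- CREATE TRIGGER target_audit_trg AFTER INSERT OR UPDATE ON target
--     FOR EACH ROW EXECUTE PROCEDURE audit_fn();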
Commit Transaction
The bulk upsert/delete is complete and the following commands should be performed outside of a transaction.
Run VACUUM ANALYZE on the target Table.
This will allow the query planner to make appropriate inferences about the table and reclaim space taken up by dead tuples.
SET maintenance_work_mem = <size>
VACUUM ANALYZE target;
SET maintenance_work_mem = <original size>
Restore Original Values of Database Configuration Variables
ALTER SYSTEM SET max_wal_size = <size>;
...
You may need to restart your database again for these settings to take effect.

How to rename/recreate a table without disrupting service?

I have a table I need to purge without disrupting the service. About 99.99% of the data should be deleted, so I'm trying to recreate the table and move the 0.01% of useful data into the new table as follows (I will truncate the old table later):
BEGIN ISOLATION LEVEL SERIALIZABLE;
LOCK TABLE table1 IN ACCESS EXCLUSIVE MODE;
/* I rename the old table */
ALTER TABLE table1 RENAME TO table1_to_be_deleted;
/* And I recreate the table */
CREATE TABLE table1 (
...
);
/* Restore useful data from the old table to the new one */
INSERT INTO table1 SELECT * FROM table1_to_be_deleted WHERE toBeKept = 1;
COMMIT;
But when I run my transaction, I get some client errors because rows are not found in the new table even though they are present in the old one. These rows are tagged as to-be-kept, so they should have been copied from the old table to the new one inside the transaction and found by the clients' requests...
When other requests are waiting for a lock acquired on a table, do they keep a pointer to the targeted object? That's the only thing I can think of that would explain the old table still being updated after I commit my transaction...
PS : I'm using Postgres 9.1
To do that I'd rather:
create an auxiliary table
create rules that redirect DML aimed at the original table to the auxiliary table
create a rule that makes selects read from both tables, unioned
move the good data from ONLY the original table to the auxiliary one
truncate the original
either move the data back (no need to rebuild references) or rename
drop the obsolete rules and objects
But really, I'd just delete the 99% with a plain DELETE ... WHERE, rather than reinventing the wheel (see the sketch below).
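If you take the simple route, a minimal sketch (reusing the toBeKept flag from the question) could be:
-- Delete the ~99.99% of rows that are not flagged to be kept.
-- IS DISTINCT FROM also catches rows where the flag is NULL.
DELETE FROM table1 WHERE toBeKept IS DISTINCT FROM 1;

-- Reclaim space afterwards; VACUUM FULL would shrink the file but takes an ACCESS EXCLUSIVE lock.
VACUUM ANALYZE table1;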

How do I replace a table in Postgres?

Basically I want to do this:
begin;
lock table a;
alter table a rename to b;
alter table a1 rename to a;
drop table b;
commit;
i.e. gain control and replace my old table while no one has access to it.
Simpler:
BEGIN;
DROP TABLE a;
ALTER TABLE a1 RENAME TO a;
COMMIT;
DROP TABLE acquires an ACCESS EXCLUSIVE lock on the table anyway. An explicit LOCK command is no better. And renaming a dead guy is just a waste of time.
You may want to write-lock the old table while preparing the new, to prevent writes in between. Then you'd issue a lock like this earlier in the process:
LOCK TABLE a IN SHARE MODE;
What happens to concurrent transactions trying to access the table? It's not that simple, read this:
Best way to populate a new column in a large table?
Explains why you may have seen error messages like this:
ERROR: could not open relation with OID 123456
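Putting the pieces together, the whole sequence could look roughly like this; the SELECT that builds the replacement is a made-up placeholder, and note that CREATE TABLE AS does not copy indexes or constraints, so recreate those before the COMMIT if you rely on them.
BEGIN;
-- Block concurrent writes to the old table while the replacement is built;
-- readers are still allowed until the DROP below.
LOCK TABLE a IN SHARE MODE;

-- Build the replacement (placeholder query).
CREATE TABLE a1 AS
SELECT * FROM a WHERE some_condition;

DROP TABLE a;                 -- takes ACCESS EXCLUSIVE; waits for in-flight readers
ALTER TABLE a1 RENAME TO a;
COMMIT;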
Create an SQL backup, make the changes you need directly in the backup.sql file, and restore the database. I used this trick when I added INHERIT for a group of tables (Postgres DBMS) to remove inherited fields from a subtable.
I would use answer#13, but I agree it will not carry over the constraints, and the DROP TABLE might fail, so (rough sketch below):
line up the relevant constraints first (e.g. from pg_dump --schema-only)
drop the constraints
do the swap per answer#13
apply the constraints (SQL snippets from the schema dump)
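In SQL terms the before/after steps would look something like this; the table t, column a_id and constraint name are invented purely for illustration.
-- Hypothetical foreign key in another table t that points at table a
-- (a dependency like this is what makes DROP TABLE a fail).
ALTER TABLE t DROP CONSTRAINT t_a_id_fkey;

-- ... do the swap per answer#13: DROP TABLE a; ALTER TABLE a1 RENAME TO a; ...

-- Re-apply the constraint, e.g. copied from the pg_dump --schema-only output.
ALTER TABLE t ADD CONSTRAINT t_a_id_fkey FOREIGN KEY (a_id) REFERENCES a (id);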

Populating a table from a view in Oracle with "locked" truncate/populate

I would like to populate a table from a (potentially large) view on a scheduled basis.
My process would be:
Disable indexes on table
Truncate table
Copy data from view to table
Enable indexes on table
In SQL Server, I can wrap the process in a transaction such that when I truncate the table a schema modification lock will be held until I commit. This effectively means that no other process can insert/update/whatever until the entire process is complete.
However I am aware that in Oracle the truncate table statement is considered DDL and will thus issue an implicit commit.
So my question is: how can I mimic the behaviour of SQL Server here? I don't want any other process trying to insert/update/whatever whilst I am truncating and (re)populating the table. I would also prefer my other processes to be unaware of any locks.
Thanks in advance.
Make your table a partitioned table with a single partition and local indexes only. Then whenever you need to refresh:
Copy data from view into a new temporary table
CREATE TABLE tmp AS SELECT ... FROM some_view;
Exchange the partition with the temporary table:
ALTER TABLE some_table
EXCHANGE PARTITION part WITH TABLE tmp
WITHOUT VALIDATION;
The table is only locked for the duration of the partition exchange, which, without validation and global index update, should be instant.
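The one-time setup mentioned at the start (a single-partition table with local indexes only) could look roughly like this; the column names and index are placeholders.
-- Hypothetical structure: one partition covering all values, local index only.
CREATE TABLE some_table (
    id   NUMBER,
    data VARCHAR2(4000)
)
PARTITION BY RANGE (id) (
    PARTITION part VALUES LESS THAN (MAXVALUE)
);

CREATE INDEX some_table_id_ix ON some_table (id) LOCAL;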

SQL Server 2008 R2 - Lock table during clear and insert

We have a need to (once per month) clear out the contents of a table with 50,000 records, and repopulate, using a Stored Procedure. The SP has a User Defined Table Type parameter which contains all of the new records to be inserted.
The current thought is as follows
ALTER PROCEDURE [ProcName]
    @TableParm UserTableType READONLY
AS
[Set lock on table?]
BEGIN TRAN
    DELETE FROM [table]
    INSERT INTO [table] (column, column, column)
    SELECT a.column, a.column, a.column FROM @TableParm a
COMMIT TRAN
[Remove lock from table?]
I've read some solutions which suggest setting READ COMMITTED or READ UNCOMMITTED... but figured I'd turn to the pros to steer me in the right direction, based on the situation.
thanks!
I'd use a serializable transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
Both the READ... type levels would allow data of some form to be read from the table, which is probably not what you want.
You may also be able to use TRUNCATE TABLE rather than DELETE, depending on your data structure.
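Put together, a sketch of the procedure under the serializable isolation level could look like this, keeping the question's [table] and column placeholders:
ALTER PROCEDURE [ProcName]
    @TableParm UserTableType READONLY
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRAN;
        DELETE FROM [table];
        -- or: TRUNCATE TABLE [table];  -- faster, but not allowed if foreign keys reference the table

        INSERT INTO [table] (column, column, column)
        SELECT a.column, a.column, a.column
        FROM @TableParm a;
    COMMIT TRAN;
END;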
If reducing the unavailability of this table is an issue, you may be able to reduce it by creating a new table, populating it, then renaming the old and new tables.