Update a VERY LARGE PostgreSQL database table efficiently

I have a very large database table in PostgreSQL with a flag column, "replicated". Every new row starts unreplicated and will later be replicated to another system by a background program. There is a partial index on that table: "btree(ID) WHERE replicated=0". The background program selects at most 2000 entries (LIMIT 2000), works on them, and then commits the changes in one transaction using 2000 prepared SQL commands.
Now the problem is that I want to give the user an option to reset this replicated value, i.e. set it all back to zero.
An
UPDATE table SET replicated = 0;
is not feasible:
It takes far too much time.
It doubles the size of the table because of MVCC.
It is done in one transaction: it either fails or goes through completely.
I actually don't need transaction guarantees for this case: if the system goes down, it is fine if only part of the rows have been processed.
Several other problems:
Doing an
update table set replicated=0 where id > 10000 and id < 20000
is also bad: it does a sequential scan over the whole table, which is too slow.
Even if it didn't, it would still be slow because there would be too many seeks.
What I really need is a way of going through all rows, changing them and not being bound to a giant transaction.
Strangely, an
UPDATE table
SET replicated = 0
WHERE id IN (SELECT id FROM table WHERE replicated <> 0 LIMIT 10000)
is also slow, although it should be a good thing: go through the table in disk order...
(Note that in that case there was also an index that covered this.)
(An UPDATE ... LIMIT like MySQL's is unavailable in PostgreSQL.)
BTW: The real problem is more complicated, and we're talking about an embedded system here that is already deployed, so remote schema changes are difficult, but possible.
It's PostgreSQL 7.4, unfortunately.
The number of rows I'm talking about is e.g. 90,000,000. The size of the database can be several dozen gigabytes.
The database itself only contains 5 tables, one of which is very large.
But that is not bad design, because these embedded boxes only operate with one kind of entity; it's not an ERP system or anything like that!
Any ideas?

How about adding a new table to store this replicated value (plus a primary key to link each record to the main table)? Then you simply add a record for every replicated item, and delete records to clear the replicated flag. (Or maybe the other way around: a record for every non-replicated row, depending on which is the common case.)
That would also simplify the case where you want to set them all back to 0, as you can just truncate the table (which zeroes the table size on disk; you don't even have to vacuum to free up the space).
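A minimal sketch of that layout, assuming a hypothetical main table big_table with primary key id, and tracking the rows that have already been replicated (so a full reset is just a TRUNCATE):

CREATE TABLE replicated_ids (
    id bigint PRIMARY KEY REFERENCES big_table (id)
);

-- the background job records each row once it has replicated it
INSERT INTO replicated_ids (id) VALUES (12345);

-- rows still waiting for replication are those with no entry here
SELECT b.id
FROM   big_table b
LEFT   JOIN replicated_ids r ON r.id = b.id
WHERE  r.id IS NULL
LIMIT  2000;

-- "reset everything to unreplicated" becomes
TRUNCATE replicated_ids;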

If you are trying to reset the whole table, not just a few rows, it is usually faster (on extremely large datasets, not on regular tables) to simply CREATE TABLE bar AS SELECT every column except the flag, plus a constant 0 in its place, FROM foo, and then swap the tables and drop the old one. Obviously you would need to ensure nothing gets inserted into the original table while you are doing that. You'll need to recreate that index, too.
Edit: A simple improvement in order to avoid locking the table while you copy 14 gigabytes:
lock the table;
create a new, empty table bar;
swap tables so that all writes go to bar;
unlock;
create table baz as select * from foo (with the flag reset to 0);
drop foo;
create the index on baz;
lock;
insert into baz select * from bar;
swap tables;
unlock;
drop bar;
(let writes happen while you are doing the copy, and insert them after the fact).
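A rough SQL sketch of those steps, with the hypothetical names used above (column lists abbreviated; the two locked phases stay short because the long copy runs outside them):

BEGIN;
LOCK TABLE foo IN ACCESS EXCLUSIVE MODE;
CREATE TABLE bar AS SELECT * FROM foo WHERE false;  -- empty table that catches new writes
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE bar RENAME TO foo;                      -- "swap": writes now land in the empty table
COMMIT;

-- the long copy, done without holding any lock
CREATE TABLE baz AS
SELECT id, /* ...other columns..., */ 0 AS replicated
FROM   foo_old;
DROP TABLE foo_old;
CREATE INDEX baz_replicated_idx ON baz (id) WHERE replicated = 0;

BEGIN;
LOCK TABLE foo IN ACCESS EXCLUSIVE MODE;
INSERT INTO baz SELECT * FROM foo;                  -- rows written during the copy (same column order assumed)
ALTER TABLE foo RENAME TO bar;
ALTER TABLE baz RENAME TO foo;
COMMIT;
DROP TABLE bar;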

While you likely cannot fix the problem of space usage (it is temporary, just until a vacuum), you can probably really speed up the process in terms of clock time. The fact that PostgreSQL uses MVCC means that you should be able to do this without any issues related to newly inserted rows. CREATE TABLE AS SELECT gets around some of the performance issues, but does not allow continued use of the table, and takes just as much space. Just ditch the index, rebuild it, then do a vacuum.
drop index replication_flag;
update big_table set replicated=0;
create index replication_flag on big_table using btree (id) where replicated = 0;
vacuum full analyze big_table;

This is pseudocode. You'll need a 400 MB (for 4-byte ints) or 800 MB (for bigints) temporary file (you can compress it with zlib if that is a problem). It will need about 100 scans of the table for the vacuums. But it will not bloat the table by more than about 1% (at most 1,000,000 dead rows at any time). You can also trade fewer scans for more table bloat.
// Write all ids that still need resetting to a temporary file, in disk order
// (no WHERE clause, so the scan follows the table's physical order).
$file = tmpfile();
for $id, $replicated in query("select id, replicated from table") {
    if ( $replicated <> 0 ) {
        write($file, &$id, sizeof($id));
    }
}
// Prepare the update query.
query("prepare set_replicated_0(bigint) as
       update table set replicated = 0 where id = $1");
// Re-read the file, run the prepared query, and every 1000000 updates
// commit and vacuum the table.
rewind($file);
$counter = 0;
query("start transaction");
while read($file, &$id, sizeof($id)) {
    query("execute set_replicated_0($id)");
    $counter++;
    if ( $counter % 1000000 == 0 ) {
        query("commit");
        query("vacuum table");
        query("start transaction");
    }
}
query("commit");
query("vacuum table");
close($file);

I guess what you need to do is:
a. Copy the PK values of 2000 records into a temporary table, using the same standard LIMIT, etc.
b. Select the same 2000 records and perform the necessary operations in the cursor as before.
c. If successful, run a single UPDATE query against the records in the temp table, clear the temp table, and run step (a) again.
d. If unsuccessful, clear the temp table without running the UPDATE query.
Simple, efficient and reliable.
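Reading these steps as describing the background worker's batch, a rough SQL sketch with hypothetical names (big_table, primary key id, flag replicated):

-- (a) stash the next batch of PKs (this can use the existing partial index)
CREATE TEMPORARY TABLE batch_ids AS
SELECT id FROM big_table WHERE replicated = 0 LIMIT 2000;

-- (b) process those rows in the application...

-- (c) on success, flag exactly that batch in one statement, then clear the batch
UPDATE big_table SET replicated = 1
WHERE  id IN (SELECT id FROM batch_ids);
TRUNCATE batch_ids;

-- (d) on failure, just TRUNCATE batch_ids and start again at (a)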
Regards,
KT

I think it's better to upgrade your Postgres to version 8.x; the old version is probably part of the problem. Also try the query below. I hope this can help.
UPDATE table1 SET name = table2.value
FROM table2
WHERE table1.id = table2.id;

Related

Postgres extracting data from a huge table based on non-indexed column

We have a table in production which has been there for quite some time, and the volume of that table is huge (close to 3 TB). Since most of the data in this table is stale and unused, we are planning to get rid of the historical data which does not have any references.
There is a column "active" of type boolean which we can use to identify this data; however, this column is not indexed.
Considering the volume of the table, I am not too sure whether creating a new index is going to help. I tried to incrementally delete the inactive rows 100K at a time, but the volume is so huge that this is going to take months to clear up.
The primary key of the table is of type UUID. I thought of creating a new table and inserting only the rows with active = true, as
INSERT INTO mytable_active
SELECT *
FROM   mytable
WHERE  is_active = true;
But as expected, this approach also fails because of the volume and keeps running practically forever.
Any suggestions or approaches would be most welcome.
When you need to delete a lot of rows quickly, partitioning is great... when the table is already partitioned.
If there is no index on the column you need, then at least one full table scan will be required, unless you can use another index like "date" or something to narrow it down.
I mean, you could create an index "WHERE active" but that would also require the full table scan you're trying to avoid, so... meh.
First, DELETE. Just don't, not even in small bits with LIMIT. Not only will it write most of the table (3 TB of writes), it will also write it to the WAL (3 more TB), and it will also update the indexes and write those changes to the WAL too. This will take forever, and the random IO from the index updates will nuke your performance. And if it ever finishes, you will still have a 3 TB file, most of it empty but not returned to the OS. Plus the indexes.
So, no DELETE. Uh, wait.
Scenario with DELETE:
Swap the table with a view "SELECT * FROM humongous WHERE active=true" and add triggers or rules on the view to redirect updates/inserts/delete to the underlying table. Make sure triggers set all new rows with active=true.
Re-create each index (concurrently) except the primary key, adding "WHERE active=true". This will require a full table scan for the first index, even if you create the index on "active", because CREATE INDEX WHERE doesn't seem to be able to use another index to speed up when a WHERE is specified.
Drop the old indices
Note the purpose of the view is only to ensure absolutely all queries have "active=true" in the WHERE, because otherwise, they wouldn't be able to use the conditional indices we just created, so each query would be a full table scan, and that would be undesirable.
And now, you can DELETE, bit by bit, with your delete from mytable where id in ( select id from mytable where active = false limit 100000);
It's a tradeoff, you'll have a large number of table scans to recreate indices, but you'll avoid the random IO from index update due to a huge delete, which is the real reason why you say it will take months.
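A condensed sketch of that setup, with hypothetical names (mytable, primary key id, boolean column active); the triggers are only indicated as comments:

BEGIN;
ALTER TABLE mytable RENAME TO mytable_raw;
CREATE VIEW mytable AS SELECT * FROM mytable_raw WHERE active = true;
-- plus INSTEAD OF triggers (or rules) on the view redirecting
-- INSERT/UPDATE/DELETE to mytable_raw and forcing active = true on new rows
COMMIT;

-- partial indexes covering only the rows you keep
CREATE INDEX CONCURRENTLY mytable_raw_id_active
    ON mytable_raw (id) WHERE active = true;
-- ...repeat for the other needed indexes, then drop the old ones

-- delete in chunks against the underlying table
DELETE FROM mytable_raw
WHERE id IN (SELECT id FROM mytable_raw WHERE active = false LIMIT 100000);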
Scenario with INSERT INTO new_table SELECT...
If you have inserts and updates running on this huge table, then you have a problem, because these will not be transferred to the new table during the operation. So a solution would be to:
turn off all the scripts and services that run long queries
lock everything
create new_table
rename huge_table to huge_old
create a view (named huge_table, so from the application's point of view it replaces huge_table) that is a UNION ALL of new_table and huge_old. It must handle priority, i.e. if a row is present in the new table, a row with the same id present in the old table should be ignored... so it will have to have a JOIN. This step should be tested carefully beforehand.
unlock
Then, let it run for a while, see if the view does not destroy your performance. At this point, if it breaks, you can easily go back by dropping the view and renaming the table back to its old self. I said to turn off all the scripts and services that run long queries because these might fail with the view, and you don't want to take a big lock while one long query is running, because that will halt everything until it's done.
add insert/update/delete triggers on the view to redirect the writes to new_table. Inserts go directly to the new table, updates will have to transfer the row, deletes will have to hit both tables, and UNIQUE constraints will be... interesting. This will be a bit complicated.
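A sketch of the view with the "new rows win" priority, using hypothetical names; the anti-join plays the role of the JOIN mentioned above:

-- inside the locked section
CREATE TABLE new_table (LIKE huge_table INCLUDING ALL);
ALTER TABLE huge_table RENAME TO huge_old;
CREATE VIEW huge_table AS
SELECT n.* FROM new_table n
UNION ALL
SELECT o.*
FROM   huge_old o
WHERE  NOT EXISTS (SELECT 1 FROM new_table n WHERE n.id = o.id);
-- plus the INSTEAD OF triggers described above for INSERT/UPDATE/DELETE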
Now to transfer the data.
Even if it takes a while, who cares? It will finish eventually. I suppose if you have a 3TB table, you must have some decent storage, even if that's these old spinning things that we used to put data on, it shouldn't take more than a few hours if the IO is not random. So the idea is to only use linear IO.
Fingers crossed hoping the table does not have a big text column that is stored in a separate TOAST table, which would require one random access per row. Did you check?
Now, you might actually want it to run for longer so it uses less IO bandwidth, both for reads and writes, and especially WAL writes. It doesn't matter how long the query runs as long as it doesn't degrade performance for the rest of the users.
Postgres will probably go for a parallel table scan to use all the cores and all the IO in the box, so maybe disable that first.
Then I think you should try to avoid the hilarious (for onlookers) scenario where it reads from the table for half a day, not finding any rows that match, so the disks handle the reads just fine, then it finds all the rows that match at the end and proceeds to write 300GB to the WAL and the destination table, causing huge write contention, and you have to Ctrl-C it when you know, you just know it in your gut that it was THIS CLOSE to finishing.
So:
create table bogus_table (like mytable);   -- same columns, but no indexes
insert into bogus_table select * from mytable;
10% of "active" rows is still 300GB so better check the server can handle writing a 300GB table without slowing down. Watch vmstat and check if iowait goes crazy, watch number of transactions per second, query latency, web server responsiveness, the usual database health stuff. If the phone rings, hit Ctrl-C and say "Fixed!"
After it's done a few checkpoints, Ctrl-C. Time to do the real thing.
Now to make this query take much longer (and therefore destroy much less IO bandwidth) you can add this to the columns in your select:
pg_sleep((random()<0.000001)::INTEGER * 0.1)
That will make it sleep for 0.1s every million rows on average. Adjust to taste while looking at vmstat.
You can also monitor query progress using hacks.
It should work fine.
Once the interesting rows have been extracted from the accursed table, you could move the old data to a data warehouse or something, or to cold storage, or have fun loading it into clickhouse if you want to run some analytics.
Maybe partitioning the new table would also be a good idea, before it grows back to 3TB. Or periodically moving old rows.
Now, I wonder how you backup this thing...
-- EDIT
OK, I have another idea, maybe simpler, but you'll need a box.
Get a second server with fast storage and set up logical replication. On this replica server, create an empty UNLOGGED replica of the huge table with only one index, on the primary key. Logical replication will copy the entire table, so it will take a while. A second network card in the original server, or some QoS tuning, would help avoid saturating the Ethernet connection you actually use to serve queries.
Logical replication is row based and identifies rows by primary key, so you absolutely need to manually create that PK index on the slave.
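A sketch of that setup (hypothetical names; the connection string is a placeholder, and the source server needs wal_level = logical):

-- on the source ("master") server
CREATE PUBLICATION big_pub FOR TABLE mytable;

-- on the replica: same columns, UNLOGGED, and only the PK index
CREATE UNLOGGED TABLE mytable (
    id uuid PRIMARY KEY
    -- ...remaining columns exactly as on the source
);
CREATE SUBSCRIPTION big_sub
    CONNECTION 'host=source-host dbname=mydb user=repl'
    PUBLICATION big_pub;   -- the initial data copy starts automatically

-- later, on the replica, delete the stale rows in chunks
DELETE FROM mytable
WHERE id IN (SELECT id FROM mytable WHERE active = false LIMIT 100000);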
I've tested it on my home box right now and it works very well. The initial data transfer was a bit slow, but that may be my network. Pausing then resuming replication transferred rows inserted or updated on the master during the pause. However, renaming the table seems to break it, so you won't be able to do INSERT INTO SELECT, you'll have to DELETE on the replica. With SSDs, only one PK index, the table set to UNLOGGED, it should not take forever. Maybe using btrfs would turn the random index write IO into linear IO due to its copy on write nature. Or, if the PK index fits in shared_buffers, just YOLO it and set checkpoint_timeout to "7 days" so it doesn't actually write anything. You'll probably need to do the delete in chunks so the replicated updates keep up.
When I dropped the PK index to speed up the deletion, then recreated it before re-enabling replication, it didn't catch up on the updates. So you can't drop the index.
But is there a way to only transfer the rows you want to keep instead of transferring everything and deleting, while also having the replica keep up with the master's updates?... It's possible to do it for inserts (just disable the initial data copy) but not for updates unfortunately. You'd need an integer primary key so you could generate bogus rows on the replica that would then be updated during replication... but you can't do that with your UUID PK.
Anyway. Once this is done, set the number of WAL segments to be kept on the master server to a very high value, to resume replication later without missing updates.
And now you can run your big DELETE on the replica. When it's done, vacuum, maybe CLUSTER, re-create all indexes, etc, and set the table to LOGGED.
Then you can fail over to the new server. Or, if you're feeling adventurous, you could replicate the replica's table back to the master; since it will have the same name, it would have to live in another schema.
That should allow for very little downtime since all updates are replicated, the replica will always be up to date.
I would suggest:
Copy the active records to a temporary table
Drop the main table
Rename the temporary table to the main table name
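A minimal sketch of that, with hypothetical names, ignoring indexes, constraints, and concurrent writes (which all need handling):

CREATE TABLE mytable_keep AS
SELECT * FROM mytable WHERE active = true;

DROP TABLE mytable;
ALTER TABLE mytable_keep RENAME TO mytable;
-- then recreate indexes, constraints, grants, etc.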

How to avoid data fragmentation in an "INSERT once, UPDATE once" table?

I have a large number of tables that are "INSERT once" and then read-only after that, i.e. after the initial INSERT of a record, there are never any UPDATE or DELETE statements. Because of this, the on-disk fragmentation of these tables is minimal.
I'm now considering adding a needs_action boolean field to every table. This field will only ever get updated once, and that happens on a slow/periodic basis. As a result of MVCC, when VACUUM comes along (on an even slower schedule) after the UPDATE, the table becomes quite fragmented: it clears out the originally inserted tuples, and their space is subsequently backfilled by new insertions.
In short: the addition of this single "always updated once" field changes the table from being minimally fragmented by design to being highly fragmented by design.
Is there some method of effectively achieving this single needs_action record flagging that can avoid the resulting table fragmentation?
<And now for some background/supplemental info...>
Some options considered so far...
At the risk of making this question massive (and therefore overlooked?), below are some options that have been considered so far:
Just add the column to each table, do the UPDATE and don't worry about resulting fragmentation until it actually proves to be an issue.
I'm conscious of premature optimization here, but with some of the tables getting large (>1M, and even > 1B) I'd rather get the design right up front.
Make an independent tracking table (for every table) only containing A) the PK from the master table, and B) the needs_action flag. Create a record in the tracking table with an AFTER INSERT trigger in the master table
This will preserve "INSERT only" minimal fragmentation levels on the master table... at the expense of adding (significant?) up-front write overhead
putting the tracking tables in a separate schema would also neatly separate the functionality from the core tables
Forcing the needs_action field to be a HOT update to avoid tuple replication
Needing an index on WHERE needs_action = TRUE seems to rule out this option, but maybe there's another way to find them quickly?
Using table fillfactor (at 50?) to leave space for the inevitable UPDATE
eg: set fillfactor to 50 to leave room for the UPDATE, and therefore keep it in the same page
But... with only one UPDATE it seems like this will leave the table packing fraction eternally at 50% and take twice the storage space? I don't 100% understand this option yet... still learning.
Find a special/magical field/bit in the master table records that can be twiddled without MVCC impact.
This does not seem to exist in postgres. Even if it did, it would need to be indexed (or have some other quick finding mechanism akin to a WHERE needs_action = TRUE partial index)
Being able to optionally suppress MVCC operation on specific columns seems like it would be nice here (although surely fraught with peril)
Storing needs_action outside of postgres (eg: as a <table_name>:needs_copying list of PKs in redis) to evade fragmentation due to mvcc.
I have concerns about keeping this atomic, though. Maybe using redis_fdw (or some other fdw?) in an AFTER INSERT trigger can keep it atomic? I need to learn more about fdw capabilities... seems like all fdw's I can find are all read-only, though.
Run a fancy view with background defrag/compacting, as described in this fantastic article
Seems like a bit much to do for all tables.
Simply track ids/PKs that need copying in a postgres table
just stash ids that need action as records to a fast inert table (eg: no PK), and DELETE the records when action is completed
similar to RPUSHing to an offline redis list (but definitely ACID)
This seems like the best option at the moment.
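A sketch of that queue table, with hypothetical names (the worker empties it as it goes):

-- deliberately bare: no PK, no indexes, so inserts are cheap
CREATE TABLE needs_action_queue (
    table_name text   NOT NULL,
    row_id     bigint NOT NULL
);

-- filled by the inserting application or an AFTER INSERT trigger
INSERT INTO needs_action_queue VALUES ('events', 12345);

-- the periodic job processes entries and deletes them when done
DELETE FROM needs_action_queue
WHERE table_name = 'events' AND row_id = 12345;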
Are there other options to consider?
More on the specific use case driving this...
I'm interested in the general case how to avoid this fragmentation, but here's a bit more on the current use case:
Read performance is much more important than write performance on all tables (but avoiding crazy slow writes is clearly desirable)
Some tables will reach millions of rows. A few may hit billions of rows.
SELECT queries will be across wide table ranges (not just recent data) and can range from single result records to 100k+
Table design can be done from scratch... no need to worry about existing data
PostgreSQL 9.6
I would just lower the fillfactor below the default value of 100.
Depending on the size of the row, use a value like 80 or 90, so that a few new rows will still fit into the block. After the update, the old row can be “pruned” and defragmented by the next transaction so that the space can be reused.
A value of 50 seems way too low. True, this would leave room for all rows in a block being updated at the same time, but that is not your use case, right?
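For example (table name and value are placeholders):

-- at creation time
CREATE TABLE events (
    id           bigint PRIMARY KEY,
    payload      text,
    needs_action boolean NOT NULL DEFAULT true
) WITH (fillfactor = 90);

-- or on an existing table (only affects newly written blocks until the table is rewritten)
ALTER TABLE events SET (fillfactor = 90);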
You can try to use inherited tables. This approach does not directly support a PK across the tables, but that might be resolved by triggers.
CREATE TABLE data_parent (a int8, updated bool);
CREATE TABLE data_inserted (CHECK (NOT updated)) INHERITS (data_parent);
CREATE TABLE data_updated  (CHECK (updated))     INHERITS (data_parent);

CREATE FUNCTION d_insert() RETURNS TRIGGER AS $$
BEGIN
    NEW.updated = false;
    INSERT INTO data_inserted VALUES (NEW.*);
    RETURN NULL;
END
$$ LANGUAGE plpgsql SECURITY DEFINER;
CREATE TRIGGER d_insert BEFORE INSERT ON data_parent
    FOR EACH ROW EXECUTE PROCEDURE d_insert();

CREATE FUNCTION d_update() RETURNS TRIGGER AS $$
BEGIN
    NEW.updated = true;
    INSERT INTO data_updated VALUES (NEW.*);
    DELETE FROM data_inserted WHERE (data_inserted.*) IN (OLD);
    RETURN NULL;
END
$$ LANGUAGE plpgsql SECURITY DEFINER;
CREATE TRIGGER d_update BEFORE UPDATE ON data_inserted
    FOR EACH ROW EXECUTE PROCEDURE d_update();

-- GRANT on d_insert to the regular user
-- REVOKE INSERT / UPDATE on data_inserted / data_updated from the regular user

INSERT INTO data_parent (a) VALUES (1);
SELECT * FROM ONLY data_parent;
SELECT * FROM ONLY data_inserted;
SELECT * FROM ONLY data_updated;
INSERT 0 0
a | updated
---+---------
(0 rows)
a | updated
---+---------
1 | f
(1 row)
a | updated
---+---------
(0 rows)
UPDATE data_parent SET updated = true;
SELECT * FROM ONLY data_parent;
SELECT * FROM ONLY data_inserted;
SELECT * FROM ONLY data_updated;
UPDATE 0
a | updated
---+---------
(0 rows)
a | updated
---+---------
(0 rows)
a | updated
---+---------
1 | t
(1 row)

Replace rows from another table

I have two tables with exactly the same columns. The first one is used for production; the web application (Django) retrieves objects from it to show on the web page.
And I am using a Python script to add objects to the second one.
When the script is done, I need to replace all rows in table-1 with the rows from table-2. Right now I am using something like this:
TRUNCATE table-1;
INSERT INTO table-1 (columns) SELECT columns FROM table-2;
TRUNCATE table-2;
VACUUM FULL;
The problem is that it takes too long, and after TRUNCATE table-1 the website is useless until the INSERT is done. What would be the best way to approach this?
You can try to create a view and code Django against that. When it's time to cut over between table1 and table2, just CREATE OR REPLACE VIEW to switch to the other table instead.
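A sketch of that switch; the view name current_data is just a placeholder:

CREATE VIEW current_data AS SELECT * FROM table1;
-- Django reads from current_data instead of table1

-- after the script has finished loading table2, flip atomically:
CREATE OR REPLACE VIEW current_data AS SELECT * FROM table2;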
You don't need the expensive VACUUM FULL at all. Right after your big INSERT there is nothing to clean up. No dead tuples, and all indexes are in pristine condition.
It can make sense to run ANALYZE though to update completely changed table statistics right away.
For small tables DELETE can be faster than TRUNCATE, but for medium to large tables TRUNCATE is typically faster.
And do it all in a single transaction. INSERT after TRUNCATE in the same transaction does not have to write to WAL and is much faster:
BEGIN;
TRUNCATE table1;
INSERT INTO table1 TABLE table2;
TRUNCATE table2; -- do you need this?
ANALYZE table1; -- might be useful
COMMIT;
Details:
What causes large INSERT to slow down and disk usage to explode?
If you have any indexes on table1 it pays to drop them first and recreate them after the INSERT for big tables.
If you don't have depending objects, you might also just drop table1 and rename table2, which would be much faster.
How do I replace a table in postgres?
Either method requires an exclusive lock on the table. But you have been using VACUUM FULL previously which takes an exclusive lock as well:
VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an exclusive lock on each table while it is being processed.
So it's safe to assume an exclusive lock is ok for you.

Delete large portion of huge tables

I have a very large table (more than 300 million records) that will need to be cleaned up. Roughly 80% of it will need to be deleted. The database software is MS SQL 2005. There are several indexes and statistics on the table, but no external relationships.
The best solution I came up with, so far, is to put the database into "simple" recovery mode, copy all the records I want to keep to a temporary table, truncate the original table, set identity insert to on and copy back the data from the temp table.
It works, but it's still taking several hours to complete. Is there a faster way to do this?
As per the comments, my suggestion would be to simply dispense with the copy-back step and promote the table containing the records to be kept to be the new main table by renaming it.
It should be quite straightforward to script out the index/statistics creation to be applied to the new table before it gets swapped in.
The clustered index should be created before the non clustered indexes.
A couple of points I'm not sure about though.
Whether it would be quicker to insert into a heap then create the clustered index afterwards. (I guess no if the insert can be done in clustered index order)
Whether the original table should be truncated before being dropped (I guess yes)
#uriDium -- Chunking using batches of 50,000 will escalate to a table lock, unless you have disabled lock escalation via alter table (sql2k8) or other various locking tricks.
I am not sure what the structure of your data is. When does a row become eligible for deletion? If it is purely an ID-based or date-based thing, then you can create a new table for each day, insert your new data into the new tables, and when it comes to cleaning, simply drop the required tables. Then for any selects, construct a view over all the tables. Just an idea.
EDIT: (In response to comments)
If you are maintaining a view over all the tables then no it won't be complicated at all. The complex part is coding the dropping and recreating of the view.
I am assuming that you don't want your data to be locked down too much during deletes. Why not chunk the delete operations? Create a stored procedure that deletes the data in chunks, 50,000 rows at a time. This should help make sure that SQL Server keeps row locks instead of a table lock. Use a
WAITFOR DELAY 'x'
in your while loop so that you can give other queries a bit of breathing room. Your problem is the age-old computer science trade-off: space vs. time.
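A rough T-SQL sketch of such a chunked delete (table name, predicate, batch size, and delay are placeholders):

DECLARE @rows int;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (50000) FROM dbo.BigTable
    WHERE IsEligibleForDeletion = 1;   -- hypothetical predicate

    SET @rows = @@ROWCOUNT;

    WAITFOR DELAY '00:00:05';          -- breathing room for other queries
END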

Oracle SQL technique to avoid filling trans log

Newish to Oracle programming (from Sybase and MS SQL Server). What is the "Oracle way" to avoid filling the trans log with large updates?
In my specific case, I'm doing an update of potentially a very large number of rows. Here's my approach:
UPDATE my_table
SET a_col = null
WHERE my_table_id IN
(SELECT my_table_id FROM my_table WHERE some_col < some_val and rownum < 1000)
...where I execute this inside a loop until the updated row count is zero.
Is this the best approach?
Thanks,
The amount written to the redo and undo logs will not be reduced at all if you break up the UPDATE into multiple runs of, say, 1000 records. On top of that, the total query time will most likely be higher compared to running a single large SQL statement.
There's no real way to address the UNDO/REDO log issue for UPDATEs. With INSERTs and CREATE TABLEs you can use a direct-path (APPEND) option, but I guess that doesn't easily work for you.
It depends on the percentage of rows almost as much as on the number. It also depends on whether the update makes the rows longer than before, e.g. going from NULL to 200 bytes in every row. This could affect your performance through chained rows.
Either way, you might want to try this.
Build a new table with the column corrected as part of the select instead of an update. You can build that new table via CTAS (Create Table as Select) which can avoid logging.
Drop the original table.
Rename the new table.
Reindex, repoint constraints, rebuild triggers, recompile packages, etc.
You can avoid a lot of logging this way.
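A rough sketch of that CTAS approach, using the names from the question (NOLOGGING reduces redo for the table build itself; check the backup implications with your DBA):

CREATE TABLE my_table_new NOLOGGING AS
SELECT my_table_id,
       CASE WHEN some_col < some_val THEN NULL ELSE a_col END AS a_col,
       some_col
       -- ...remaining columns
FROM   my_table;

DROP TABLE my_table;
RENAME my_table_new TO my_table;
-- then recreate indexes (NOLOGGING), constraints, triggers, grants, etc.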
Any UPDATE is going to generate redo. Realistically, a single UPDATE that updates all the rows is going to generate the smallest total amount of redo and run for the shortest period of time.
Assuming you are updating the vast majority of the rows in the table, if there are any indexes that use A_COL, you may be better off disabling those indexes before the update and then doing a rebuild of those indexes with NOLOGGING specified after the massive UPDATE statement. In addition, if there are any triggers or foreign keys that would need to be fired/ validated as a result of the update, getting rid of those temporarily might be helpful.