I am running PostgreSQL 10.5:
psql -V
psql (PostgreSQL) 10.5
I have a table mytable with about 250 million rows. My objective is to create a new table newtable and copy about half of mytable into it (SELECT * ... WHERE time > '2019-01-01'), then delete the copied records from mytable.
Of course, I want to keep all indices in mytable
What is the most efficient command to do this in psql? TRUNCATE TABLE is efficient but will remove all rows. DELETE would probably take a lot of time and prevent inserts from happening (inserts are scheduled every 10 minutes).
Any suggestions would be helpful
You would need to proceed in two steps.
First, copy the rows to the new table. You can use a CREATE TABLE ... AS SELECT statement (but you will need to recreate indexes and other objects such as constraints manually on the new table afterwards).
CREATE TABLE new_table AS
SELECT * FROM old_table WHERE time > '2019-01-01';
Then, delete the records from the old table. It looks like an efficient way would be to JOIN with the new table, using the DELETE ... USING syntax. Assuming that you have a primary key called id:
DELETE FROM old_table o
USING new_table n
WHERE n.id = o.id
(Be sure to create an index on id in the new table before running this.)
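For example, a minimal sketch (hypothetical index name, adjust to your schema):
CREATE INDEX new_table_id_idx ON new_table (id);
-- or, if you also want to restore the primary key on the new table:
ALTER TABLE new_table ADD PRIMARY KEY (id);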
If you are just trying to delete rows, couldn't you just wrap your delete in a transaction? Unless your insert depends on the existing data in the table, there should be no blocking.
Related
Consider two very large tables: Table A with 20 million rows, and Table B with 10 million rows, which largely overlaps Table A. Both have an identifier column and a bunch of other data. I need to move all items from Table B into Table A, updating where they already exist.
Both tables have the same structure:
- Identifier int
- Date DateTime
- Identifier A
- Identifier B
- General decimal data (maybe 10 columns)
I can get the items in Table B that are new, and the items in Table B that need to be updated in Table A, very quickly, but I can't get an update or a delete/insert to work quickly. What options are available to merge the contents of Table B into Table A (i.e. updating existing records instead of inserting) in the shortest time?
I've tried pulling out existing records in TableB and running a large update on table A to update just those rows (i.e. an update statement per row), and performance is pretty bad, even with a good index on it.
I've also tried doing a one-shot delete from Table A of the values that also exist in Table B, and the performance of the delete is also poor, even with the indexes dropped.
I appreciate that this may be difficult to perform quickly, but I'm looking for other options that are available to achieve this.
Since you are dealing with two large tables, in-place updates/inserts/merges can be time-consuming operations. I would recommend using a minimally logged bulk load to build the desired content in a new table and then perform a table swap:
Example using SELECT INTO:
SELECT *
INTO NewTableA
FROM (
SELECT * FROM dbo.TableB b -- take every row from TableB, so overlapping rows end up with B's updated values
UNION ALL
SELECT * FROM dbo.TableA a WHERE NOT EXISTS (SELECT * FROM dbo.TableB b WHERE b.id = a.id)
) d
exec sp_rename 'TableA', 'BackupTableA'
exec sp_rename 'NewTableA', 'TableA'
Simple or Bulk-Logged recovery is highly recommended for such an approach. Also, I assume this has to be done outside business hours, since plenty of objects will need to be recreated on the new table: indexes, default constraints, the primary key, etc.
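For instance, right after the swap you would put back at least the key and the most-used indexes on the new table; a rough sketch with placeholder names (pick names that do not clash with the objects still attached to BackupTableA):
ALTER TABLE dbo.TableA ADD CONSTRAINT PK_TableA_new PRIMARY KEY CLUSTERED (Identifier);
CREATE NONCLUSTERED INDEX IX_TableA_Date ON dbo.TableA (Date);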
A MERGE is probably your best bet if you need both inserts and updates.
MERGE #TableA AS Tgt
USING (SELECT * FROM #TableB) Src
ON (Tgt.Identifier = Src.Identifier)
WHEN MATCHED THEN
UPDATE SET Date = Src.Date, ...
WHEN NOT MATCHED THEN
INSERT (Identifier, Date, ...)
VALUES (Src.Identifier, Src.Date, ...);
Note that the MERGE statement must be terminated with a semicolon (;).
I have a very small table (about 1 million rows) and I'm going to drop constraints and add a new column. The query below hung for about 5 minutes, so I had to roll it back.
BEGIN WORK;
LOCK TABLE usertable IN SHARE MODE;
ALTER TABLE usertable ALTER COLUMN email DROP NOT NULL;
COMMIT WORK;
Another approach, suggested in similar questions on the internet:
CREATE TABLE new_tbl
(
  field1 int,
  field2 int,
  ...
);
INSERT INTO new_tbl (field1, field2, ...)
SELECT field1, field2, ... FROM ...;  -- use your new logic here to insert the updated data
CREATE INDEX ...;                     -- add your constraints and indexes to new_tbl
DROP TABLE tbl;
ALTER TABLE new_tbl RENAME TO tbl;
Create the new table
Insert records from the old table into the new one (takes less than a second)
Drop the old table - this is the query that hangs for about 5 minutes; I had to roll it back, so this does not work for me
Rename the newly created table to the old name
Dropping the old table simply hangs. However, when I try to drop the newly created table with 1 million rows, it works instantly. Why does dropping the old table take so much time?
This is the query I used to check for blocking locks:
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
I can see a lot of concurrent writes/reads that are waiting for my operation. Since I took the lock on the table, I don't really think that is the reason why I can't drop the old table.
I just ran VACUUM on the old table; it did not help.
Why can't I drop the old table, and why does it take so much time compared to dropping the recently created table?
I don't have a lot of experience with PostgreSQL, but my guess is that it keeps a bit to signify when a NULLable column is NULL (as opposed to empty) and when that column is marked as NOT NULL it no longer needs that bit. So, when you change that attribute on a column the system needs to go through the whole table and rearrange the data, moving lots of bits around so that the rows are all correctly structured.
This is much different from a DROP TABLE, which merely needs to mark the disk space as no longer in use and perhaps update a few metadata values.
In short, they're very different actions, so of course they take different amounts of time.
I was not able to drop/rename the original table because of foreign keys from other tables. Once I dropped those, the approach with renaming the table worked great.
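For anyone hitting the same thing: a quick way to list the foreign keys of other tables that reference the table you are trying to drop (here assumed to be called usertable) is to query pg_constraint:
SELECT conname            AS constraint_name,
       conrelid::regclass AS referencing_table
FROM   pg_constraint
WHERE  contype = 'f'
  AND  confrelid = 'usertable'::regclass;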
I have a very large table people with 60M rows, indexed on id, and I wish to populate a field newid for every record based on a lookup table id_conversion (1M rows) which contains id and newid and is indexed on id.
When I run
update people p set p.newid = (select l.newid from id_conversion l where l.id = p.id)
it runs for an hour or so and then I get an archiver error (ORA-00257).
Any suggestions for either running the update in sections or a better SQL command?
If your update statement hits every single row of the table, then to avoid writing to Oracle's undo log you are likely better off running a CREATE TABLE AS SELECT query, which bypasses undo logging for the rows it writes - that logging across 60 million rows is likely the issue you're running into. You can then drop the old table and rename the new table to the old table's name.
Something like:
create table new_people as
select l.newid,
p.col2,
p.col3,
p.col4,
p.col5
from people p
join id_conversion l
on p.id = l.id;
drop table people;
-- rebuild any constraints and indexes
-- from old people table to new people table
alter table new_people rename to people;
For reference, read some of the tips here: http://www.dba-oracle.com/t_efficient_update_sql_dml_tips.htm
If you are basically creating a new table rather than just updating some of the rows of a table, this will likely prove the faster method.
I doubt you will be able to get this to run in seconds. Your query, as written, needs to update all 60 million rows.
My first advice is to add an index on id_conversion(id, newid), to make the subquery more efficient. If that doesn't help, then doing the update in batches might be the best way to go.
I should add: because you are updating all the rows, it might be faster to take the following approach (sketched below).
Copy the data into a new table with the new values.
Truncate the original table.
Insert the new data into the old table.
Inserts are faster than updates.
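A rough sketch of that approach (a left join is used so that people without a matching id_conversion row are kept; col2, col3 stand in for the real columns):
-- 1. copy the data into a new table, with the new values
create table people_new as
select p.id, l.newid, p.col2, p.col3
from   people p
left join id_conversion l on l.id = p.id;

-- 2. truncate the original table
truncate table people;

-- 3. insert the new data into the old table
insert into people (id, newid, col2, col3)
select id, newid, col2, col3 from people_new;

drop table people_new;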
In addition to the answers above, which will probably work better in this case, you should know about the MERGE statement:
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm
It is used to update one table according to another table and is far faster than an update based on a subselect.
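For the case in the question it could look roughly like this (only the matched branch is needed, since rows are being updated rather than inserted):
merge into people p
using id_conversion l
on (p.id = l.id)
when matched then
  update set p.newid = l.newid;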
I need to periodically update a local cache with new additions to some DB table. The table rows contain an auto-increment sequential number (SN) field. The cache keeps this number too, so basically I just need to fetch all rows with SN larger than the highest I already have.
SELECT * FROM table where SN > <max_cached_SN>
However, the majority of the attempts will bring no data (I just need to make sure that I have an absolutely up-to-date local copy). So I wonder if this will be more efficient:
count = SELECT count(*) from table;
if (count > <cache_size>)
// fetch new rows as above
I suppose that selecting by an indexed numeric field is quite efficient, so I wonder whether using count has any benefit. On the other hand, this test/update will be done quite frequently and by many clients, so there is a motivation to optimize it.
this test/update will be done quite frequently and by many clients
this could lead to unexpected race conditions in cache generation
I would suggest:
upon a new addition to your table, add the newest id into a queue table
use something like crontab to trigger the cache generation by checking the queue table
once the new cache is generated, delete the id from the queue table
As you stress that the majority of the attempts will bring no data, the above will only trigger cache generation when there is a new addition.
The queue-table concept can even be extended to cover updates and deletes.
I believe that
SELECT * FROM table where SN > <max_cached_SN>
will be faster, because SELECT COUNT(*) may require a table scan. Just for clarification: do you never delete rows from this table?
SELECT COUNT(*) may involve a scan (even a full scan), while SELECT ... WHERE SN > constant can make effective use of an index on SN, and looking at very few index nodes may suffice. Don't count items if you don't need the exact total; it's expensive.
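If all you really need is a yes/no answer to "is there anything newer than my cached SN?", a probe that stops at the first matching row is even cheaper than fetching the rows; a sketch (LIMIT syntax assumed - use TOP 1 or FETCH FIRST 1 ROW ONLY depending on your database):
SELECT 1
FROM table
WHERE SN > <max_cached_SN>
LIMIT 1;  -- returns at most one row, so the index scan stops immediately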
You don't need to use SELECT COUNT(*).
There are two solutions.
You can use a small auxiliary table with one field that holds the last row count of your table, and create an AFTER INSERT trigger on your table that increments that field.
Or you can use a small auxiliary table with one field that holds the last SN of your table that has been cached, and create an AFTER INSERT trigger on your table that updates that field.
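A minimal sketch of the second option in MySQL syntax (table and trigger names are made up; adjust to your schema):
CREATE TABLE cache_state (last_sn INT NOT NULL);
INSERT INTO cache_state VALUES (0);

CREATE TRIGGER trg_track_sn AFTER INSERT ON mytable
FOR EACH ROW
  UPDATE cache_state SET last_sn = NEW.SN;

-- clients then compare their cached SN against cache_state.last_sn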
not much to this really
drop table if exists foo;
create table foo
(
foo_id int unsigned not null auto_increment primary key
)
engine=innodb;
insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);
select * from foo order by foo_id desc limit 10;
insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);
select * from foo order by foo_id desc limit 10;
How can I delete 1.5 million rows from SQL Server 2000, and how much time will it take to complete this task?
I don't want to delete all records from the table; I just want to delete all records which fulfil a WHERE condition.
EDIT, moved from a comment on an answer below:
"I fire the same query, i.e. DELETE FROM table_name with a WHERE clause... Is it possible to disable indexing while the query is running, because the query has been going for the past 20 hours? Also help me out with how I can disable indexing."
If (and only if) you want to delete all of the records in a table, you can use DROP TABLE or TRUNCATE TABLE.
DELETE removes one record at a time and records an entry in the transaction log for each deleted row.
TRUNCATE TABLE is much faster because it logs only the page deallocations rather than each individual row. It removes all rows from a table, but the table structure and its columns, constraints, indexes and so on remain; DROP TABLE would remove those too.
Use caution if you decide to TRUNCATE. It's irreversible (unless you have a backup).
Create a second table, inserting all the rows from the first that you don't want to delete.
Drop the first table.
Rename the second table to the first table's name.
(Or a variation on the above.)
This can often be quicker than deleting selected records from a big table.
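A rough T-SQL sketch of that approach (all names and the condition are placeholders):
-- keep only the rows you do NOT want to delete
SELECT *
INTO YourTable_keep
FROM YourTable
WHERE NOT (<YourCondition>)

DROP TABLE YourTable

exec sp_rename 'YourTable_keep', 'YourTable'

-- then recreate indexes, constraints, triggers and permissions on the new table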
You may want to try deleting in batches too. I just tested this on a table I have and the delete operation went from 13 seconds to 3 seconds.
While Exists(Select * From YourTable Where YourCondition = True)
Delete Top (100000)
From YourTable
Where YourCondition = True
I don't think you can use the TOP predicate if you are running SQL2000, but it works with SQL2005 and up. If you are using SQL2000, then you can use this syntax instead:
Set RowCount 100000
While Exists(Select * From YourTable Where YourCondition = True)
Delete
From YourTable
Where YourCondition = True
Set RowCount 0  -- reset so later statements in the session are not limited
DELETE FROM table WHERE a=b;
When deleting that many rows you may want to disable the indexes so they don't get updated on every delete. Rewriting the indexes on every deletion will significantly slow down the whole process.
You'll want to disable these indexes before beginning your deletion or else there may be table locks already in place.
--Disable Index
ALTER INDEX [IX_MyIndex] ON dbo.MyTable DISABLE
--Enable Index
ALTER INDEX [IX_MyIndex] ON dbo.MyTable REBUILD
If you wish to remove all entries in a table you can use TRUNCATE.
Does the table you are deleting from have multiple foreign keys, cascaded deletes or triggers? All of these will impact performance.
Depending on what you want to do and the transactional integrity required, can you delete things in small batches? E.g. if you are trying to delete 1.5 million records that represent 1 year's worth of data, can you do it 1 week at a time?
DELETE FROM table WHERE <condition matching those 1.5 million rows>
The time depends.
On Oracle it is also possible to use
truncate table <table>
I'm not sure whether that is standard SQL, but it is also available in SQL Server. It will, however, clear the whole table - but then it is quicker than DELETE FROM (it also performs an implicit commit).
TRUNCATE will also ignore any referential integrity or triggers on the table. DELETE FROM ... WHERE will respect both. The time will depend on the indexing of your condition columns, your hardware, and any additional system load.
The delete SQL is exactly the same as a normal SQL delete:
delete from table where [your condition]
However, if you're worried about time then I'll assume your question is a little deeper than this. If your table has a significant number of non-clustered indexes, then in some circumstances it may be faster to drop all these indexes first and rebuild them after the delete. This is unusual, but in cases where a straightforward delete is vulnerable to timeout issues it may be helpful.
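An outline of that drop/delete/rebuild sequence, using SQL Server 2000 syntax and placeholder names:
-- drop the non-clustered indexes first
DROP INDEX YourTable.IX_YourTable_SomeColumn

-- run the delete
DELETE FROM YourTable
WHERE <your condition>

-- rebuild the indexes afterwards
CREATE NONCLUSTERED INDEX IX_YourTable_SomeColumn ON YourTable (SomeColumn)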
CREATE TABLE new_table as select <data you want to keep> from old_table;
-- index new_table
-- grant on new_table
-- add constraints on new_table
-- etc. on new_table
drop table old_table;
rename new_table to old_table;