Question: There is a table with over 9000 rows. It must be cleaned without taking any locks, because the table is in active use. I tried pg_advisory_unlock_all(), but it had no effect.
SELECT pg_advisory_unlock_all();
-- session 1:
START TRANSACTION;
DELETE FROM mytable WHERE id = '1';
DELETE 1
-- session 2:
START TRANSACTION;
DELETE FROM mytable WHERE id = '1';
(blocks, waiting for the first transaction to finish)
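There is no way to delete data from a table without locking the rows you want to delete. pg_advisory_unlock_all() only releases the advisory locks held by the current session; it has no effect on the row locks that a DELETE takes.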
That shouldn't be a problem as long as concurrent access doesn't try to modify these rows or insert new ones with id = '1', because writers never block readers and vice versa in PostgreSQL.
If concurrent transactions keep modifying the rows you want to delete, that's a little funny (why would you want to delete data you need?). You'd have to wait for the exclusive locks, and you might well run into deadlocks. In that case, it might be best to lock the whole table with the LOCK statement before you start. Deleting from a table that small should then only take a very short time.
I have a table with 372 million rows. I want to delete old rows, starting from the oldest ones, without blocking the DB. How can I achieve that?
The table has these columns (with a sample row):
id | memberid | type | timeStamp | message
1 | 123 | 10 | 2014-03-26 13:17:02.000 | text
UPDATE:
I deleted about 30 GB of data from the DB, but my disk still shows only 6 GB of free space.
Any suggestions for getting that space back?
Thank you in advance!
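SELECT 1;  -- sets @@ROWCOUNT to 1 so the loop body runs at least once
WHILE (@@ROWCOUNT > 0)
BEGIN
    WAITFOR DELAY '00:00:10';  -- pause between batches
    DELETE TOP (10) FROM tab WHERE <your-condition>;
END
Delete in chunks using the SQL above. In the default autocommit mode each DELETE runs as its own short transaction, so locks are held only briefly.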
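The same chunked-delete loop can be sketched in Python, using the standard-library sqlite3 module as a stand-in engine. SQLite has no DELETE TOP, so each batch deletes the rowids returned by a LIMIT subquery. The helper `delete_in_batches`, the table `tab` and its `old` column are hypothetical. Committing after every batch plays the role of the short autocommit transactions.

```python
import sqlite3
import time


def delete_in_batches(conn, table, condition, batch_size=10, pause_s=0.0):
    """Delete rows matching `condition` from `table` one batch at a time.

    `table` and `condition` are pasted into the SQL text, so they must be
    trusted strings, never user input. Returns the number of rows deleted.
    """
    sql = (f"DELETE FROM {table} WHERE rowid IN "
           f"(SELECT rowid FROM {table} WHERE {condition} LIMIT ?)")
    total = 0
    while True:
        deleted = conn.execute(sql, (batch_size,)).rowcount
        conn.commit()  # end this batch's transaction so its locks are released
        if deleted == 0:
            return total
        total += deleted
        time.sleep(pause_s)  # optional pause, like WAITFOR DELAY


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, old INTEGER)")
    conn.executemany("INSERT INTO tab (old) VALUES (?)", [(i % 2,) for i in range(95)])
    conn.commit()
    print(delete_in_batches(conn, "tab", "old = 1"))  # 47
```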
You may want to consider another approach:
Create an empty table with the same structure as the existing one
Adjust the identity column in the empty table to start from the latest value in the old table (if there is one)
Swap the two tables using sp_rename
Copy the records you want to keep from the old table into the new table in batches
You can then do whatever you want with the old table.
Back up your database before you start deleting records or playing with tables.
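The table-swap approach can be sketched the same way. This is a minimal illustration using Python's sqlite3 module and assumes a hypothetical `events` table with the schema in `SCHEMA`. SQLite uses ALTER TABLE ... RENAME TO where SQL Server uses sp_rename. The identity-reseed step is left out because SQLite has no identity seed, so the sketch also assumes nothing inserts into the new table while old rows are being copied back.

```python
import sqlite3

# Hypothetical schema standing in for the real table.
SCHEMA = "(id INTEGER PRIMARY KEY, ts INTEGER, message TEXT)"


def swap_and_copy(conn, keep_after_ts, batch_size=1000):
    """Swap an empty copy of `events` into place, then copy back the rows with
    ts > keep_after_ts from `events_old` in batches. Returns rows copied.

    Expects a connection opened with isolation_level=None (explicit transactions).
    """
    # Create the empty table and swap the names in one short transaction.
    conn.execute("BEGIN")
    conn.execute(f"CREATE TABLE events_new {SCHEMA}")
    conn.execute("ALTER TABLE events RENAME TO events_old")
    conn.execute("ALTER TABLE events_new RENAME TO events")
    conn.execute("COMMIT")
    # Copy the rows to keep by walking the old table's key, one committed batch at a time.
    copied, last_id = 0, -1  # assumes non-negative ids
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM events_old WHERE id > ? AND ts > ? ORDER BY id LIMIT ?",
            (last_id, keep_after_ts, batch_size))]
        if not ids:
            return copied
        conn.execute("BEGIN")
        conn.execute(
            "INSERT INTO events SELECT * FROM events_old "
            "WHERE id BETWEEN ? AND ? AND ts > ?", (ids[0], ids[-1], keep_after_ts))
        conn.execute("COMMIT")
        copied += len(ids)
        last_id = ids[-1]
```

The old table stays around as `events_old` until you decide to drop it, which also gives you a fallback if something goes wrong during the copy.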
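The best performance comes from filtering on id. Assuming ids increase over time, the old rows are those with the lowest ids, so:
delete from TABLENAME where id<XXXXX
is the lowest-impact statement you can execute.
You can also split the operation into smaller operations, limiting the number of rows deleted by each one with a SET ROWCOUNT statement. For example, to delete at most 5,000,000 rows per call:
SET ROWCOUNT 5000000;
delete from TABLENAME where id<XXXXX;
SET ROWCOUNT 0;
Note that Microsoft's documentation says SET ROWCOUNT will not affect DELETE, INSERT and UPDATE statements in a future release, and recommends TOP instead (delete top (5000000) from TABLENAME where id<XXXXX).
Reference: https://msdn.microsoft.com/it-it/library/ms188774%28v=sql.120%29.aspx?f=255&MSPPError=-2147217396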
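Deleting by id, oldest first, can also be split into fixed key ranges instead of relying on SET ROWCOUNT or TOP. Here is a minimal sketch with Python's sqlite3 module. It assumes an integer `id` column that grows over time. The helper name and the table passed to it are hypothetical.

```python
import sqlite3


def delete_below_id(conn, table, cutoff_id, step=5000):
    """Delete rows with id < cutoff_id, oldest first, one id range per transaction.

    `table` is pasted into the SQL, so it must be a trusted name.
    Returns the number of rows deleted.
    """
    lo = conn.execute(f"SELECT MIN(id) FROM {table}").fetchone()[0]
    total = 0
    while lo is not None and lo < cutoff_id:
        hi = min(lo + step, cutoff_id)
        total += conn.execute(
            f"DELETE FROM {table} WHERE id >= ? AND id < ?", (lo, hi)).rowcount
        conn.commit()  # each range is its own short transaction
        lo = hi
    return total
```

Because every range is bounded on the indexed key, each statement touches only one slice of the table, and the job can be stopped and rerun at any point.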
The answer to the best way to delete rows from an Oracle table is: It depends! In a perfect world where you can take the table offline for maintenance, a complete reorganization is always best because it does the delete and places the table back into a pristine state. We will address the tools for doing large-scale deletes and the appropriate methods for each environment.
Factors and tools for massive deletes
The choice of the delete methods depends on many factors:
Is the target table partitioned? Partitioning greatly improves delete performance. For example, it is common to have a large table partitioned by time, and deleting old rows from such a table can be as simple as dropping the desired partition.
Can you reorganize the table after the delete to remove fragmentation?
What percentage of the table will be deleted? In cases where you are deleting more than 30-50% of the rows in a very large table, it is faster to use CTAS (CREATE TABLE AS SELECT) to copy the rows you keep than to do a vanilla delete followed by a reorganization of the table blocks and a rebuild of the constraints and indexes.
Do you want to release the space consumed by the deleted rows? If you know that the empty space will be reused by subsequent DML, then you will want to leave the empty space within the table. Conversely, if you want to release the space back to the tablespace, then you will need to reorganize the table.
There are many tools that you can use to delete from large tables:
dbms_metadata.get_ddl: This function will punch off the DDL definitions of all of the table's indexes and constraints.
dbms_redefinition: This procedure will reorganize a table while it remains available for updating.
Create Table as Select: You can use CTAS to copy a table while removing rows in bulk.
Rename table: If you copy a table when deleting rows you can rename it back to its original name.
COMMIT: In cases where a delete might run for many hours, even the largest UNDO log will not be able to hold the rollback information, and it becomes necessary to do the delete in a PL/SQL loop, issuing a COMMIT every N rows to free up the undo logs. This approach is restartable: if it stops, rerunning the delete picks up where it left off at your last commit checkpoint.
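CTAS followed by a rename can be sketched as follows. This uses Python's sqlite3 module rather than Oracle and assumes a hypothetical `orders` table with a `created` column. The index-recreation step stands in for replaying the DDL that dbms_metadata.get_ddl would give you.

```python
import sqlite3


def ctas_delete(conn, keep_condition):
    """Replace `orders` with a copy holding only the rows that match keep_condition.

    Expects a connection opened with isolation_level=None so the explicit
    BEGIN/COMMIT below control the transaction. keep_condition is pasted
    into the SQL, so it must be trusted text.
    """
    conn.execute("BEGIN")
    try:
        conn.execute(f"CREATE TABLE orders_keep AS SELECT * FROM orders WHERE {keep_condition}")
        conn.execute("DROP TABLE orders")
        conn.execute("ALTER TABLE orders_keep RENAME TO orders")
        # In this SQLite sketch CTAS copies column names and data only, so keys,
        # constraints and indexes must be recreated by hand.
        conn.execute("CREATE INDEX orders_created_idx ON orders (created)")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```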
So I have a rather large table (150 million rows) that data scrub queries run against nightly. These queries don't update many records, but to find the records needed, they have to query that single table multiple times in subqueries, which takes some time.
So would it be better for me to run a normal UPDATE statement, or to put the few rows I need into a temp table and then update just those rows, which would greatly reduce locking during the update?
I'm unsure how UPDATE statement locks work when most of the time is spent querying. If it is going to update only 5 records and runs for half an hour, will it release a record it updated in the first minute, or does it hold the lock until the end of the query?
Thanks
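The temp-table idea from the question can be sketched as a two-phase job. First, run the slow search once into a private temp table. Then update only the staged keys in a short write transaction. This uses Python's sqlite3 module as a stand-in. The `records` table, its `parent` and `status` columns and the EXISTS search are hypothetical placeholders for the real scrub logic.

```python
import sqlite3


def staged_update(conn):
    """Two-phase scrub: stage the keys with the slow search, then update only them.
    Returns the number of rows updated."""
    # Phase 1: the expensive read, run once, writing only to a private temp table.
    conn.execute("DROP TABLE IF EXISTS temp.to_fix")
    conn.execute(
        "CREATE TEMP TABLE to_fix AS "
        "SELECT r.id AS id FROM records AS r WHERE r.status = 'dirty' "
        "AND EXISTS (SELECT 1 FROM records AS c WHERE c.parent = r.id)")
    conn.commit()
    # Phase 2: a short write transaction touching only the staged rows. The status
    # check is repeated because rows may have changed since phase 1.
    cur = conn.execute(
        "UPDATE records SET status = 'clean' "
        "WHERE id IN (SELECT id FROM temp.to_fix) AND status = 'dirty'")
    conn.commit()
    return cur.rowcount
```

Repeating the filter in phase 2 is the price of splitting the work: the search no longer runs inside the writing transaction, so the write has to confirm each staged row still qualifies.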
You should look into the ROWLOCK table hint. You can use it with the UPDATE statement while updating in batches of fewer than 5000 rows. This will attempt to place row locks on the target table (or on index keys, if a covering index is present). If the statement still acquires too many locks, they will be escalated to a table lock.
From MSDN, on why lock escalation might occur:
When the Database Engine checks for possible escalations at every 1250 newly acquired locks, a lock escalation will occur if and only if a Transact-SQL statement has acquired at least 5000 locks on a single reference of a table. Lock escalation is triggered when a Transact-SQL statement acquires at least 5,000 locks on a single reference of a table. For example, lock escalation is not triggered if a statement acquires 3,000 locks in one index and 3,000 locks in another index of the same table. Similarly, lock escalation is not triggered if a statement has a self join on a table, and each reference to the table only acquires 3,000 locks in the table.
There's more worth reading in that article; have a look at the section on mixed lock type escalation.
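Updating in batches that stay under the 5,000-lock escalation threshold can be sketched as a keyset loop. SQLite (used here through Python's sqlite3 module) has no row locks or lock escalation, so the threshold only caps the batch size in this sketch. The `items` table, its `processed` flag and the helper are hypothetical.

```python
import sqlite3

# SQL Server's documented escalation trigger is 5,000 locks on one table reference.
ESCALATION_THRESHOLD = 5000


def update_in_batches(conn, batch_size=4000):
    """Set processed = 1 on pending `items` rows, walking the primary key in
    batches smaller than ESCALATION_THRESHOLD and committing after each one.
    Returns the number of rows updated."""
    if batch_size >= ESCALATION_THRESHOLD:
        raise ValueError("batch_size must stay below the escalation threshold")
    total, last_id = 0, 0  # assumes positive ids
    while True:
        # Upper key of the next batch of at most batch_size pending rows.
        upper = conn.execute(
            "SELECT MAX(id) FROM (SELECT id FROM items WHERE id > ? AND processed = 0 "
            "ORDER BY id LIMIT ?)", (last_id, batch_size)).fetchone()[0]
        if upper is None:
            return total
        total += conn.execute(
            "UPDATE items SET processed = 1 WHERE id > ? AND id <= ? AND processed = 0",
            (last_id, upper)).rowcount
        conn.commit()  # release this batch's locks before starting the next
        last_id = upper
```

Note that the threshold counts locks, not rows, so index key locks also count; a batch size comfortably below 5,000 leaves some headroom.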
I need to get the last modified time of a table, and I came across:
select relfilenode from pg_class where relname = 'test';
which gives me the relfilenode ID; this seems to be a directory name in
L:\Databases\PostgresSQL\data\base\inodenumber
from which I then extract the latest modified time.
Is this the right way to do this, or are there better methods?
Testing the mtime of the table's relfilenode won't work well. As Eelke noted, VACUUM, among other operations, will modify the timestamp. Hint-bit setting will modify the table too, causing it to appear to be "modified" by a SELECT. Additionally, a table's on-disk relation can span several files (it is split into 1 GB segments and has separate fork files such as the free space map), and you'd have to check all of them to find the most recent.
If you want to keep a last modified time for a table, add an AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ... FOR EACH STATEMENT trigger that updates a timestamp row in a table you use for tracking modification times.
The downside of the trigger is that it'll contend for a single row lock in the tracking table, so it'll serialize all your transactions. It'll also greatly increase the chance of deadlocks. What you really want is probably something nontransactional that doesn't roll back when the transaction does, where, if multiple transactions update a counter, the highest value wins. There's nothing like that built in, though it might not be too hard to write as a C extension.
A slightly more sophisticated option would be to create a trigger that uses dblink to do an update of the last-updated counter. That'll avoid most of the contention problems but it'll actually make deadlocking worse because PostgreSQL's deadlock detection won't be able to "see" the fact that the two sessions are deadlocked via an intermediary. You'd need a way to SELECT ... FOR UPDATE with a timeout to make it reliable without aborting transactions too often.
In any case, a trigger won't catch DDL. DDL triggers ("event triggers") were introduced in PostgreSQL 9.3.
See also:
How do I find the last time that a PostgreSQL database has been updated?
How to get 'last modified time' of the table in postgres?
I don't think that would be completely reliable, as a VACUUM would also modify the file(s) containing the table even though the logical content of the table does not change during a vacuum.
You could create triggers for INSERT, UPDATE and DELETE that maintain the last modified timestamp for each table in another table. This method has a slight performance impact but will provide accurate information.
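The trigger-based approach can be sketched in SQLite through Python's sqlite3 module. Two differences from PostgreSQL matter here. SQLite triggers are always FOR EACH ROW (there is no FOR EACH STATEMENT), and there is no TRUNCATE event. So this is an approximation of the idea, not the PostgreSQL trigger itself. The `table_mtime` tracking table and the helper name are hypothetical.

```python
import sqlite3


def add_last_modified_tracking(conn, table):
    """Create a tracking table plus AFTER INSERT/UPDATE/DELETE triggers that
    stamp the last modification time of `table` (a trusted table name)."""
    conn.execute("CREATE TABLE IF NOT EXISTS table_mtime "
                 "(name TEXT PRIMARY KEY, modified TEXT NOT NULL)")
    for event in ("INSERT", "UPDATE", "DELETE"):
        # SQLite only supports row-level triggers, so this fires once per changed row.
        conn.execute(
            f"CREATE TRIGGER {table}_{event.lower()}_mtime AFTER {event} ON {table} "
            f"BEGIN INSERT OR REPLACE INTO table_mtime (name, modified) "
            f"VALUES ('{table}', strftime('%Y-%m-%d %H:%M:%f', 'now')); END")
    conn.commit()
```

Every writer now also writes the same tracking row, which is exactly the single-row contention described earlier; a SELECT, by contrast, leaves the timestamp alone.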
I have a table that can grow to millions of records (50 million, for example). Every 20 minutes, records older than 20 minutes are deleted.
The problem is that if the table has that many records, the deletion can take a long time, and I want to make it faster.
I cannot use TRUNCATE TABLE because I want to remove only records that are older than 20 minutes. I suppose that when running the DELETE and filtering the rows to delete, the server writes a log file or something similar, and that takes a long time?
Am I right? Is there a flag or option I can turn off to speed up the delete, and then turn back on afterwards?
To expand on the batch delete suggestion, I'd suggest you do this far more often (every 20 seconds, perhaps). Batch deletions are easy:
WHILE 1 = 1
BEGIN
    DELETE TOP (4000)
    FROM YOURTABLE
    WHERE YourIndexedDateColumn < DATEADD(MINUTE, -20, GETDATE())
    IF @@ROWCOUNT = 0
        BREAK
END
Your inserts may lag slightly whilst they wait for the locks to release but they should insert rather than error.
As for your table, though: with this much traffic I'd expect to see it on a very fast RAID 10 array, perhaps even partitioned. Are your disks up to it? Are your transaction logs on different disks from your data files? They should be.
EDIT 1 - Response to your comment
To put a database into SIMPLE recovery:
ALTER DATABASE [DatabaseName] SET RECOVERY SIMPLE;
This does not actually turn off transaction logging. Every change is still logged while the transaction runs (that is what allows it to be rolled back), but the log space is reused after each checkpoint instead of being kept for log backups. It means that in the event of data loss you would lose all data since your last full backup. If you're OK with that, this may save time when running large transactions.
If there are tables within your database where you can't afford to lose data, you'll need to leave the database in FULL recovery mode (i.e. every transaction is logged and, hopefully, backed up to *.trn files by your server's maintenance plans). That said, nothing stops you from having two databases, one in FULL and one in SIMPLE. The FULL database would be for tables where you can't afford to lose any data (i.e. you could apply the transaction logs to restore data to a specific point in time), and the SIMPLE database would be for these massive high-traffic tables where you can accept data loss in the event of a failure.
All of this assumes you're creating full backups (*.bak files) every night and backing up your logs to *.trn files every half hour or so.
In regards to your index question, it's imperative your date column is indexed, if you check your execution plan and see any "TABLE SCAN" - that would be an indicator of a missing index.
I presume your date column is DATETIME with a DEFAULT constraint of GETDATE()?
You may get better performance by replacing it with a BIGINT in YYYYMMDDHHMMSS form and applying a CLUSTERED index to that column. Note, however, that you can have only one clustered index per table, so if the table already has one you'll need to use a non-clustered index. (In case you didn't know, a clustered index basically tells SQL Server to store the rows in that order, meaning that when you delete rows older than 20 minutes, SQL Server can delete them sequentially rather than hopping from page to page.)
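If you go the BIGINT route, the key is just the timestamp's digits, most significant field first. Because every field is fixed-width, integer order matches chronological order. Here is a small Python sketch; the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def to_yyyymmddhhmmss(dt: datetime) -> int:
    """Encode a datetime as a sortable integer such as 20140326131702."""
    return int(dt.strftime("%Y%m%d%H%M%S"))


def cutoff_key(now: datetime, minutes: int = 20) -> int:
    """Key below which rows are older than `minutes` (DELETE ... WHERE key < cutoff)."""
    # Do the time arithmetic on datetimes, then encode the result.
    return to_yyyymmddhhmmss(now - timedelta(minutes=minutes))
```

Subtracting two encoded keys does not give a meaningful duration (for example, across an hour boundary), which is why the 20-minute window is computed before encoding.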
The log problem is probably due to the number of records deleted in the transaction. To make things worse, the engine may be requesting a lock per record (or per page, which is not so bad).
The one big thing here is how you determine the records to be deleted. I'm assuming you use a datetime field; if so, make sure you have an index on that column, otherwise it's a sequential scan of the table that will really penalize your process.
There are two things you can do, depending on user concurrency and the timing of the delete:
If you can guarantee that no one is going to read or write while you delete, you can lock the table in exclusive mode, delete (this takes only one lock from the engine) and release the lock
You can use batch deletes: write a script with a cursor that provides the rows you want to delete, then begin a transaction and commit every X records (ideally 5000), so you keep the transactions short and don't take that many locks
Take a look at the query plan for the delete process and see what it shows; a sequential scan of a big table is never good.
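The first option above (take one exclusive lock, delete, release) can be sketched with SQLite through Python's sqlite3 module. SQLite's BEGIN EXCLUSIVE locks the whole database file rather than a single table, so this only approximates a table-level exclusive lock. The `log` table, its `ts` column and the helper are hypothetical.

```python
import sqlite3


def delete_with_exclusive_lock(conn, cutoff):
    """Delete old `log` rows while holding one exclusive lock for the whole job.

    Expects a connection opened with isolation_level=None. Returns rows deleted.
    """
    conn.execute("BEGIN EXCLUSIVE")  # SQLite: locks the whole database, not just `log`
    try:
        deleted = conn.execute("DELETE FROM log WHERE ts < ?", (cutoff,)).rowcount
        conn.execute("COMMIT")  # releases the lock
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return deleted
```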
Unfortunately for the purpose of this question and fortunately for the sake of consistency and recoverability of the databases in SQL server, putting a database into Simple recovery mode DOES NOT disable logging.
Every transaction still gets logged before committing it to the data file(s), the only difference would be that the space in the log would get released (in most cases) right after the transaction is either rolled back or committed in the Simple recovery mode, but this is not going to affect the performance of the DELETE statement in one way or another.
I had a similar problem when I needed to delete more than 70% of the rows from a big table with 3 indexes and a lot of foreign keys.
For this scenario, I saved the rows I wanted in a temp table, truncated the original table and reinserted the rows, something like:
SELECT * INTO #tempuser FROM [User] WHERE [Status] >= 600;
TRUNCATE TABLE [User];
INSERT [User] SELECT * FROM #tempuser;
I learned this technique from an article that explains:
DELETE is a fully logged operation, and can be rolled back if something goes wrong
TRUNCATE Removes all rows from a table without logging the individual row deletions
In the article you can explore other strategies for dealing with slow deletes of many records; this one worked for me.
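The save, empty and reinsert technique can be sketched in SQLite through Python's sqlite3 module. SQLite has no TRUNCATE statement, so an unqualified DELETE (which SQLite may optimize into a truncation) stands in for it. Running all steps in one transaction means a failure leaves the original rows in place. The `[User]`/`[Status]` names follow the example above; everything else is an assumption.

```python
import sqlite3


def keep_truncate_reinsert(conn, min_status=600):
    """Keep only [User] rows with [Status] >= min_status by saving them to a temp
    table, emptying [User] and reinserting them, all in one transaction.
    Expects a connection opened with isolation_level=None."""
    conn.execute("BEGIN")
    try:
        conn.execute("CREATE TEMP TABLE tempuser AS SELECT * FROM [User] WHERE 0")
        conn.execute("INSERT INTO tempuser SELECT * FROM [User] WHERE [Status] >= ?",
                     (min_status,))
        # SQLite has no TRUNCATE; an unqualified DELETE is the closest equivalent.
        conn.execute("DELETE FROM [User]")
        conn.execute("INSERT INTO [User] SELECT * FROM tempuser")
        conn.execute("DROP TABLE temp.tempuser")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Unlike the CTAS-and-rename route, the table itself (with its indexes and constraints) is never dropped, which is what makes this attractive for a table with many foreign keys.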
As I raised in an earlier question here ("Please recommend the best bulk-delete option"), the CASCADE constraint is what prevents me from deleting the records in all the tables once they have been loaded with bulk data.
Is there a reason why CASCADE takes so long when I run DELETE FROM table1; or TRUNCATE table1 CASCADE;?
FYI, I'm using PostgreSQL 8.1.4, which is outdated. When I remove the CASCADE constraint from my tables (listed in the question linked above), both the DELETE and TRUNCATE queries work fine.
However, CASCADE is what I needed! I can't just remove the constraint. Please help me on this.
A common mistake is a missing index on the foreign key column. When deleting one row from the referenced table, all referring rows have to be found. Without an index, each deleted row leads to a SLOW sequential scan; with an index, the lookup is easy and fast.
Perhaps this is your problem.
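One way to check for the missing index is to look at the plan for the lookup the foreign-key check has to run on the child table. Here is a sketch with SQLite through Python's sqlite3 module (table and index names are hypothetical). In PostgreSQL you would look at EXPLAIN output instead.

```python
import sqlite3


def child_lookup_plan(conn):
    """Plan for the lookup a parent delete needs: child rows with a given parent_id."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM child WHERE parent_id = ?", (1,)).fetchall()
    return " ".join(str(r[-1]) for r in rows)  # last column holds the plan text


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
                 "parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE)")
    print(child_lookup_plan(conn))  # a SCAN of child: every child row is read
    conn.execute("CREATE INDEX child_parent_id_idx ON child (parent_id)")
    print(child_lookup_plan(conn))  # a SEARCH of child using child_parent_id_idx
```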
Using cascade delete is a very poor idea! You have now discovered why: it simply takes too long if large numbers of records are deleted. The correct approach is to delete the child records first. If you are deleting a large number of records, you may need to write a script that deletes in batches, to avoid locking and to keep any one command from running too long.
Let me explain why it gets slower. Suppose you want to delete 1000 records from the parent table, called TableA. There are three child tables involved. TableB averages 10 records per parent record. TableC averages 5 records per parent record. TableD averages 100 records per parent record. So your delete of 1000 records in TableA actually involves deleting 115000 records. Now suppose you were deleting 10,000 records from TableA; now your cascade delete will delete 1,150,000 records. In most databases, a parent table could have considerably more than three related tables (we have one with over 100 FKs). If we were to allow a cascade delete on our databases and someone tried to delete 1000 records, they would end up deleting hundreds of millions of records.
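The arithmetic above is the number of parent rows times the sum of the average fan-outs of the child tables. A tiny Python check of the figures (one level of child tables only, parent rows not counted):

```python
def cascade_child_rows(parent_rows, avg_children_per_parent):
    """Child rows removed when parent_rows parents are deleted with a cascade,
    counting one level of child tables (the parent rows themselves excluded)."""
    return parent_rows * sum(avg_children_per_parent.values())


# Average fan-out per parent row, as in the example above.
fanout = {"TableB": 10, "TableC": 5, "TableD": 100}
print(cascade_child_rows(1_000, fanout))   # 115000
print(cascade_child_rows(10_000, fanout))  # 1150000
```

If the child tables have children of their own, each level multiplies again, which is how a 1000-row delete can reach hundreds of millions of rows.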
ON DELETE CASCADE works fine on small operations, but it performs poorly on large ones. To understand why, we have to look at what is going on behind the scenes: in PostgreSQL, foreign key actions are implemented with triggers.
So if we delete from the parent table, then for every row we delete, the trigger goes and deletes from the child table as well. This happens for each row deleted. Note that sequential scans are relatively cheap in PostgreSQL, so you may be forcing a large number of index scans when a single sequential scan would be a lot faster.
So suppose on table 1 we are deleting 1000 records, and this means on table 2 we are deleting 10000 records. If we do this right, we delete from table 2 first, performing a single scan to do it. That might take a few seconds on good hardware. Then we delete from the parent table, and this is fast. Good, right?
Now suppose we rely on triggers to do the deletion.....
Scan through table 1, for each of 1000 rows we delete, scan through table 2's index, delete 10 rows, go to next. We entirely lose any help we could get from the OS's prefetch routines, and we substitute a lot of redundant, random page reads for a much smaller number of sequential reads. Now we are spending a lot of time waiting for disk platters to turn and heads to move. Ouch......
ON DELETE CASCADE triggers have their place. They work fine if we are just deleting a few records. But they fall apart very quickly on bulk deletions. Wrap all your deletions in a transaction, delete from the child tables first, and it will be far faster.
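Deleting children first inside one transaction can be sketched with SQLite through Python's sqlite3 module. The `parent`/`child` tables and the `created` cutoff are hypothetical. The cascade stays defined, but by the time the parent rows are deleted there are no children left for it to find.

```python
import sqlite3


def delete_parents_children_first(conn, cutoff):
    """Delete parents with created < cutoff and their children in one transaction,
    children first, with one set-based DELETE per table.
    Expects a connection opened with isolation_level=None. Returns parents deleted."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "DELETE FROM child WHERE parent_id IN "
            "(SELECT id FROM parent WHERE created < ?)", (cutoff,))
        deleted = conn.execute(
            "DELETE FROM parent WHERE created < ?", (cutoff,)).rowcount
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return deleted
```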