Make a delete statement delete rows as soon as it finds them, rather than keeping the table static until the delete is finished - sql

I'm wondering if there is a way to get a delete statement to remove rows as it is traversing a table. So whereas a delete statement currently finds all the appropriate rows to delete and then deletes them all once it has found them, I want it to find a row that meets the criteria for deletion, remove it immediately, and then continue, comparing the next rows against the new table that has those entries removed.
I think this could be accomplished in a loop...maybe? But I feel like it would be horribly inefficient. Possibly something like, it will look for a row to delete, then once it finds a single row, it will delete, stop, and then go through for deletion again on the new table.
Any ideas?

A set-oriented environment like SQL usually requires this kind of thing to happen "all at once".
You might be able to use a SQL DELETE statement within a transaction to delete a single row, with that transaction wrapped in a stored procedure to handle the logic, but that would be kind of like kicking dead whales down the beach.
You need the transaction (a committed transaction, maybe a serializable transaction) to reliably "free up" values, and to reliably handle concurrency and race conditions.
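In sketch form, that might look something like the following (SQL Server syntax; the table name and deletion criteria are made up, since the question doesn't give them). Each iteration deletes one qualifying row in its own committed transaction, so every subsequent search sees the table with the previous deletions already applied:

WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;
    -- delete a single row that currently meets the (hypothetical) criteria
    DELETE TOP (1) FROM SomeTable
    WHERE SomeColumn = 'expired';
    IF @@ROWCOUNT = 0
    BEGIN
        ROLLBACK TRANSACTION;  -- nothing matched, so stop
        BREAK;
    END
    COMMIT TRANSACTION;
END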

Related

postgresql: delete all locks

Question: there is a table with over 9000 rows. It must be cleaned, but without any locks (the table is in active use). I tried to use pg_advisory_unlock_all, but no result.
-- session 1
select pg_advisory_unlock_all();
start transaction;
delete from table where id='1';
DELETE 1
-- session 2
start transaction;
delete from table where id='1';
(waiting for the first transaction to finish)
There is no way to delete data from a table without locking the rows you want to delete.
That shouldn't be a problem as long as concurrent access doesn't try to modify these rows or insert new ones with id = '1', because writers never block readers and vice versa in PostgreSQL.
If concurrent transactions keep modifying the rows you want to delete, that's a little funny (why would you want to delete data you need?). You'd have to wait for the exclusive locks, and you might well run into deadlocks. In that case, it might be best to lock the whole table with the LOCK statement before you start. Deleting from a table that small should then only take a very short time.
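In sketch form (PostgreSQL; "my_table" stands in for the real table name, since "table" itself is a reserved word):

BEGIN;
-- takes the strongest table lock; waits for concurrent writers to finish first
LOCK TABLE my_table IN ACCESS EXCLUSIVE MODE;
DELETE FROM my_table WHERE id = '1';
COMMIT;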

SQL unique field: concurrency bugs? [duplicate]

This question already has answers here:
Only inserting a row if it's not already there
I have a DB table with a field that must be unique. Let's say the table is called "Table1" and the unique field is called "Field1".
I plan on implementing this by performing a SELECT to see if any Table1 records exist where Field1 = @valueForField1, and only updating or inserting if no such records exist.
The problem is, how do I know there isn't a race condition here? If two users both click Save on the form that writes to Table1 (at almost the exact same time), and they have identical values for Field1, isn't it possible that the following would happen?
User1 makes a SQL call, which performs the select operation and determines there are no existing records where Field1 = @valueForField1. User1's process is preempted by User2's process, which also finds no records where Field1 = @valueForField1, and performs an insert. User1's process is allowed to run again, and inserts a second record where Field1 = @valueForField1, violating the requirement that Field1 be unique.
How can I prevent this? I'm told that transactions are atomic, but then why do we need table locks too? I've never used a lock before and I don't know whether or not I need one in this case. What happens if a process tries to write to a locked table? Will it block and try again?
I'm using MS SQL 2008R2.
Add a unique constraint on the field. That way you won't have to SELECT; you will only have to insert. The first user will succeed, the second will fail.
On top of that, you may make the field auto-incremented so you won't have to worry about filling it, or you may add a default value, again without worrying about filling it.
Some options would be an auto-incremented INT field, or a unique identifier.
You can add a unique constraint. Example from http://www.w3schools.com/sql/sql_unique.asp:
CREATE TABLE Persons
(
P_Id int NOT NULL UNIQUE
)
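Since Table1 already exists in the question's scenario, the constraint can also be added after the fact; a minimal sketch using the names from the question (the constraint name is made up):

ALTER TABLE Table1
ADD CONSTRAINT UQ_Table1_Field1 UNIQUE (Field1);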
EDIT: Please also read Martin Smith's comment below.
jyparask has a good answer on how you can tackle this specific problem. However, I would like to elaborate on your confusion over locks, transactions, blocking, and retries. For the sake of simplicity, I'm going to assume transaction isolation level serializable.
Transactions are atomic. The database guarantees that if you have two transactions, the end result is as if all operations in one transaction occurred completely before the other started, no matter what kind of race conditions there are. Even if two users access the same row at the same time (multiple cores), there is no chance of a race condition, because the database will ensure that one of them blocks or fails.
How does the database do this? With locks. When you select a row, SQL Server will lock the row, so that all other clients will block when requesting that row. Block means that their query is paused until that row is unlocked.
The database actually has a couple of things it can lock. It can lock the row, or the table, or somewhere in between. The database decides what it thinks is best, and it's usually pretty good at it.
There is never any retrying. The database will never retry a query for you. You need to explicitly tell it to retry a query. The reason is that the correct behavior is hard to define. Should a query retry with the exact same parameters? Or should something be modified? Is it still safe to retry the query? It's much safer for the database to simply throw an exception and let you handle it.
Let's address your example. Assuming you use transactions correctly and do the right query (Martin Smith linked to a few good solutions), then the database will create the right locks so that the race condition disappears. One user will succeed, and the other will fail. In this case, there is no blocking, and no retrying.
In the general case with transactions, however, there will be blocking, and you get to implement the retrying.
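For reference, one commonly cited pattern for the insert-if-not-exists case looks roughly like this; a sketch only, using the Table1/Field1 names from the question. The UPDLOCK, HOLDLOCK hints make SQL Server hold a key-range lock from the existence check through the insert, so two concurrent callers cannot both pass the check:

BEGIN TRANSACTION;
IF NOT EXISTS (SELECT 1 FROM Table1 WITH (UPDLOCK, HOLDLOCK)
               WHERE Field1 = @valueForField1)
BEGIN
    INSERT INTO Table1 (Field1) VALUES (@valueForField1);
END
COMMIT TRANSACTION;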

How to attempt to delete records without terminating on error

I have a table that is used in several other tables as a foreign key. If the table is referenced in only one specific table, I want the delete to be allowed and to cascade on delete. However, if there exists references in other tables, the delete should fail.
I want to test the referential integrity of this with my data set by attempting to delete every record. No records should be deleted except for the last one. However, when I attempt to delete every record, it errors (as expected) and terminates the rest of the statement.
How can I write a script that attempts to delete every record in a table and not terminate the statement on the first error?
Kind Regards,
ribald
EDIT:
The reason I would want to do something like this is because the business users have added a lot of duplicate data (i.e., search for someone and click "Add As New" instead of "Select"). Now we may have 10 people out there that only have a name and no relation to the other tables. I hope this clarifies any confusion.
I played around with different ideas. Here is the most straightforward way. However, it is pretty costly. Again, this is to attempt to delete unused duplicate data; 1,000 records took 8 minutes. Can anyone think of a more efficient way to do this?
DECLARE @DeletedID Int
DECLARE ItemsToDelete SCROLL CURSOR FOR
SELECT ID FROM ParentTable
OPEN ItemsToDelete
FETCH NEXT FROM ItemsToDelete INTO @DeletedID
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        --ATTEMPT TO DELETE
        DELETE FROM ParentTable WHERE ID = @DeletedID;
    END TRY
    BEGIN CATCH
        --DO NOTHING: THE ROW IS STILL REFERENCED, SO LEAVE IT
    END CATCH
    --FETCH NEXT ROW
    FETCH NEXT FROM ItemsToDelete INTO @DeletedID
END
CLOSE ItemsToDelete
DEALLOCATE ItemsToDelete
Take this with a grain of salt: I'm not actually a DBA, and have never worked with SQL Server. However:
You're actually up against two different rules here:
Referential constraints
Business-specific (I'm assuming) 'delete-allowed' rules.
It sounds like when the referential constraints (in the child tables) were set up, they were created with the option RESTRICT or NO ACTION. This means that attempting to delete from the parent table will, when child rows are present, cause the failure. The catch is that you want, on one specific table, to allow deletes and to propagate them (option CASCADE). So, for only those tables where the delete should be propagated, alter the referential constraint to use CASCADE. Otherwise, prevent the delete with the (already present) error.
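A minimal sketch of that alteration in SQL Server, with made-up child table and constraint names (an existing foreign key has to be dropped and re-created to change its delete rule):

ALTER TABLE ChildTable DROP CONSTRAINT FK_ChildTable_ParentTable;
ALTER TABLE ChildTable
ADD CONSTRAINT FK_ChildTable_ParentTable
    FOREIGN KEY (ParentID) REFERENCES ParentTable (ID)
    ON DELETE CASCADE;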
As for dealing with the 'exceptions' that crop up... here are some ways to deal with them:
Predict them. Write your delete in such a fashion as to not delete something if the key is referenced in any of the child tables (see the sketch after this list). This is obviously not maintainable in the long term, or for a heavily-referenced table.
Catch the exception. I doubt these can be caught 'in-script' (especially if running a single .txt-type file), so you're probably going to have to at least write a stored procedure, or run from a higher-level language. The second problem here is that you can't 'skip' the error and 'continue' the delete (that I'm aware of) - SQL is going to keep failing you every time (...on what amounts to a random row). You'd have to loop through the entire file, by line (or possibly set of lines, dynamically), to be able to ensure the record delete. For a small file (relatively), this may work.
Write a (set) of dynamic statements that uses the information schema tables and lookups to exclude ids included in the 'non-deletable' tables. Interesting in concept, but potentially difficult/expensive.
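If the set of referencing tables is known up front, the first option can be written as a single set-based delete instead of a cursor; a sketch with hypothetical child table and column names:

DELETE p
FROM ParentTable AS p
WHERE NOT EXISTS (SELECT 1 FROM ChildTable1 AS c1 WHERE c1.ParentID = p.ID)
  AND NOT EXISTS (SELECT 1 FROM ChildTable2 AS c2 WHERE c2.ParentID = p.ID);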
Well, each statement in SQL is considered a transaction, so if you try to delete several records and hit an error against the integrity rules, each and every change you made so far will be rolled back. What you might do is write a query which deletes data from the referencing table (the table which holds the foreign key value, the T_child table in your case) first, and then delete the data from the T_parent table.
In short: first remove the T_child records that reference the rows you want to delete, and then delete the records from the T_parent table, to avoid the transaction failing.
Hope this helps.
(Correct me if I'm wrong)
There are a few options here. There is a bit of ambiguity in your question, so I want to re-iterate your use case first (and I'll answer your question as I understand it).
Use Case
Database consists of several tables T_Parent, T_Child1, T_Child2, T_Child3. A complete set of data would have records in all 4 tables. Given business requirements, we often end up with partial data that needs to be removed at a later time. For example, T_parent and T_Child2 may get data, but not TC1 and TC3.
I need to be able to check for partial data and if found remove all the partial data (T_Parent and T_Child 2 in my example, but it could be TP and TC3, or other combinations).
@ribald - Is my understanding correct?
Comment on my understanding and I'll write out an answer. If the comments aren't long or clear enough, just edit your question.
you said "cascade" in terms of delete (which means a very specific thing in SQL server), but in your later description it sounds more like you want to delete all the partial data.
"Cascading" is something that is available, but not really something you turn on and off based on conditions of some data.
when you said "dataset" you didn't mean an ADO.NET Dataset, you just meant test data.
I assume you aren't looking for a good way to test, you just want to be sure you have data integrity.

Are Transactions Always Atomic?

I'm trying to better understand a nuance of SQL Server transactions.
Say I have a query that updates 1,000 existing rows, updating one of the columns to have the values 1 through 1,000. It's possible to execute this query and, when completed, those rows would not be numbered sequentially. This is because it's possible for another query to modify one of those rows before my query finishes.
On the other hand, if I wrap those updates in a transaction, that guarantees that if any one update fails, I can fail all updates. But does it also mean that those rows would be guaranteed to be sequential when I'm done?
In other words, are transactions always atomic?
But does it also mean that those rows would be guaranteed to be sequential when I'm done?
No. This has nothing to do with transactions, because what you're asking for simply doesn't exist: relational tables have no order, and asking for 'sequential rows' is the wrong question to ask. You can rephrase the question as 'will the 1000 updated rows contain the entire sequence from 1 to 1000, without gaps?' Most likely yes, but the truth of the matter is that there could be gaps, depending on the way you do the updates. Those gaps would not appear because updated rows are modified after the update and before the commit, but because the update turns out to be a no-op (it does not update any row), which is a common problem with read-modify-write-back style updates (the row 'vanishes' between the read and the write-back due to concurrent operations).
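To make that concrete, here is a hypothetical interleaving (table and column names made up) where the write-back silently becomes a no-op and a gap appears:

-- Session A reads a row, intending to write a computed value back
SELECT Id, Val FROM T WHERE Id = 7;
-- Session B deletes that row before A writes back
DELETE FROM T WHERE Id = 7;
-- Session A's write-back now matches zero rows: a silent no-op, so the value 7 is never written
UPDATE T SET Val = 7 WHERE Id = 7;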
To answer your question more precisely whether your code is correct or not you have to post the exact code you're doing the update with, as well as the exact table structure, including all indexes.
Atomic means the operation(s) within the transaction will either all occur, or they won't.
If one of the 1,000 statements fails, none of the operations within the transaction will commit. If you instead break the work into smaller transactions -- say blocks of 100 statements -- then the blocks before the one containing the error (say at the 501st statement) will have committed, the block containing the failure will roll back, and the blocks after it can still commit.
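A sketch of that batching idea, with a hypothetical table and predicate; each batch is its own transaction, so a failure rolls back only its own block:

BEGIN TRANSACTION;
UPDATE T SET Seq = Id WHERE Id BETWEEN 1 AND 100;
COMMIT TRANSACTION;

BEGIN TRANSACTION;
UPDATE T SET Seq = Id WHERE Id BETWEEN 101 AND 200;
COMMIT TRANSACTION;
-- ...and so on for the remaining blocks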
But does it also mean that those rows would be guaranteed to be sequential when I'm done?
You'll have to provide more context about what you're doing in a transaction to be "sequential".
The 2 points are unrelated
Sequential
If you insert values 1 to 1000, they will be sequential when read back with a WHERE and an ORDER BY on that column to limit you to these 1000 rows, unless there are duplicates, so you'd need a unique constraint.
If you rely on an IDENTITY, it isn't guaranteed: Do Inserted Records Always Receive Contiguous Identity Values.
Atomicity
All transactions are atomic:
Is neccessary to encapsulate a single merge statement (with insert, delete and update) in a transaction?
SQL Server and connection loss in the middle of a transaction
Does it delete partially if execute a delete statement without transaction?
SQL transactions, like transactions on all database platforms, put the data in isolation to cover the entire ACID acronym (atomic, consistent, isolated and durable). So the answer is yes.
A transaction guarantees atomicity. That is the point.
Your problem is that after you do the insert, they are only "sequential" until the next thing comes along and touches one of the new records.
If another step in your process requires them to still be sequential, then that step, too, needs to be within your original transaction.
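In sketch form (hypothetical table and statements; the point is only that the dependent step runs before the commit releases the locks):

BEGIN TRANSACTION;
UPDATE T SET Seq = Id WHERE Id BETWEEN 1 AND 1000;   -- the renumbering
-- any step that relies on the rows still being sequential goes here,
-- inside the same transaction, before other sessions can modify them
SELECT Seq FROM T WHERE Id BETWEEN 1 AND 1000 ORDER BY Seq;
COMMIT TRANSACTION;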

Postgresql Concurrency

In a project that I'm working on, there's a table with an "on update" trigger that monitors whether a boolean column has changed (e.g., false -> true = do some action). But this action can only be done once per row.
There will be multiple clients accessing the database, so I can assume that eventually multiple clients will try to update the same row's column in parallel.
Does the "update" trigger handle the concurrency itself, or do I need to do it in a transaction and manually lock the table?
Triggers don't handle concurrency, and PostgreSQL should do the right thing whether or not you use explicit transactions.
PostgreSQL uses row-level locking for writes: the first person to actually update the row gets a lock on that row. If a second person tries to update the same row, their update statement waits to see whether the first commits their change or rolls back.
If the first person commits, the second person gets an error rather than their change going through and obliterating a change that might have been interesting to them (that is the behavior under REPEATABLE READ or SERIALIZABLE isolation; under the default READ COMMITTED, the blocked update simply proceeds against the newly committed row version).
If the first person rolls back, the second person's update un-blocks, and goes through normally, because now it's not going to overwrite anything.
The second person can also take the row lock up front with SELECT ... FOR UPDATE NOWAIT, which makes the error happen immediately instead of blocking if the row is tied up in an unresolved change.
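In PostgreSQL that looks something like this sketch (the table, column, and id are assumptions based on the question's description of a boolean flag):

BEGIN;
-- errors out immediately if another transaction already holds a lock on this row
SELECT * FROM my_table WHERE id = 42 FOR UPDATE NOWAIT;
UPDATE my_table SET flag = true WHERE id = 42;
COMMIT;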