SQL multithreaded application: select and delete from a table

I am developing a multi-threaded application (it could be considered client-server) which processes data. Below is a high-level description of the application.
There is a table (with no key or ID field) with many rows in our database server. I have several systems (threads) which read (select) a fixed number of rows from the table, process them, and then remove (delete) those rows from the table.
I am looking for a solution for removing (deleting) the data without using a temp table, but any ideas involving temporary storage are welcome.
P.S.: Using locks and a temp table, I have solved the reading part, but I need help with the deleting part.
P.S.2: One possible solution, suggested by Jean, is to not remove rows physically from the table. The idea is great, but I forgot to mention that this table must be empty after a specific period of time, and with that solution I would need a system that deletes all the marked rows at the end (which is not possible).
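One possible pattern (a sketch assuming SQL Server and a hypothetical WorkQueue table, not a confirmed fit for your setup) is a single DELETE with an OUTPUT clause, which claims and removes a batch atomically so no temp table is needed:
-- Hypothetical sketch (SQL Server): each worker atomically claims and deletes
-- one batch; READPAST skips rows already locked by other workers.
DELETE TOP (100) FROM dbo.WorkQueue WITH (ROWLOCK, READPAST)
OUTPUT DELETED.*;
Each thread processes the rows returned by the OUTPUT clause; because the read and the delete are one statement, no two threads can pick up the same rows.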

Related

Avoiding write conflicts while re-sorting a table

I have a large table that I need to re-sort periodically. I am partly basing this on a suggestion I was given to stay away from cluster keys, since I am inserting data ordered differently (by time) from how I need it clustered (by ID), and that can cause re-clustering to get a little out of control.
Since I am writing to the table on an hourly basis, I am wary of these two processes conflicting: if I CTAS to a newly sorted temp table and then swap the table name, it seems like I am opening the door to a write on the source table not making it into the temp table.
I figure I could trigger a flag while re-sorting that causes the ETL to pause writing, but that seems a bit hacky and maybe fragile.
I was considering leveraging locking and transactions, but this doesn't seem to be the right use case, since I don't think I'd be locking the table I am copying from while I write to a new table. Any advice on how to approach this?
I've asked some clarifying questions in the comments regarding the clustering that you are avoiding, but with regard to your sort, have you considered creating a nice 4XL warehouse and leveraging the INSERT OVERWRITE option back into the same table? It'd look something like:
INSERT OVERWRITE INTO table SELECT * FROM table ORDER BY id;
Assuming that your table isn't hundreds of TB in size, this will complete rather quickly (inside an hour, I would guess), and any inserts into the table during that period will queue up and wait for it to finish.
There are some reasons to avoid automatic reclustering, but they're basically all the same reasons why you shouldn't set up a job to re-cluster frequently: you're making the database do all the same work, but without the built-in management of it.
If your table is big enough that you are seeing performance issues with the clustering by time, and you know that the ID column is the main way this table is filtered (in JOINs and WHERE clauses), then it is probably a good candidate for automatic clustering.
So I would recommend at least testing out a cluster key on the ID and then monitoring/comparing performance.
To give a brief answer to the question about resorting without conflicts as written:
I might recommend using a time column to re-sort records older than a given time (probably into a separate table). While it's sorting, you may get some new records, but you will be able to use that time column to marry up those new records with the now-sorted older records.
You might consider revoking INSERT, UPDATE, and DELETE privileges on the original table within the same script or procedure that performs the CTAS creating the newly sorted copy of the table. After a successful swap, you can re-grant the privileges to the roles that perform the updates.
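A minimal sketch of that sequence, assuming Snowflake and hypothetical names (my_table, etl_role):
-- Hypothetical sketch (Snowflake): block writers, build the sorted copy,
-- swap it in, then restore write access.
REVOKE INSERT, UPDATE, DELETE ON TABLE my_table FROM ROLE etl_role;
CREATE OR REPLACE TABLE my_table_sorted AS SELECT * FROM my_table ORDER BY id;
ALTER TABLE my_table SWAP WITH my_table_sorted;
GRANT INSERT, UPDATE, DELETE ON TABLE my_table TO ROLE etl_role;
Writes attempted while the privileges are revoked will fail rather than queue, so the ETL job would need retry logic around this window.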

MS Access: move deleted records automatically to "recycle bin" table

I need to implement a copy of records right after they are deleted from the table, so they can be recovered in case of accidental deletion.
I am using MS Access. Is there any built in way to do it or will I have to INSERT INTO SELECT before every DELETE?
Doing it for just one table is not a concern. I want something ready to use with any table regardless of its structure, so I don't need to create and configure another recycle-bin table for every table I have in the database, which is what a plain move operation would require.
Besides SQL, I can run VBA to accomplish this task.
EDIT
There are recommendations to add a boolean column that indicates whether the record is displayed or archived (archived meaning "deleted" for my purposes), but this involves changing every table and every query I have written, so it won't work for me except as a last resort.
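For a single table, the INSERT INTO ... SELECT pattern the question refers to would look like this (a sketch with hypothetical names, assuming a pre-created Orders_Recycle table with the same structure, run e.g. from VBA via CurrentDb.Execute):
INSERT INTO Orders_Recycle SELECT * FROM Orders WHERE OrderID = 123;
DELETE FROM Orders WHERE OrderID = 123;
The question is precisely how to avoid hand-writing this pair, and the matching recycle table, for every table in the database.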
What happens when you have cascading deletions, as in all well-designed databases? Also, your INSERT into a backup table before the DELETE will not solve all the issues you will face. And copying tables can result in a lot of copies that increase your database size, so sooner or later you will have to clean that data up as well.
Might journaling be a better solution?

Best approach to optimize a record history database

I have a database that keeps record history. For each update to a record, the system "deactivates" the previous record (along with all its children) by setting the "Status" column to "0".
It's not a problem yet, but eventually this system is going to have a lot of records, and history is more important than speed right now. Still, the more records are inserted, the slower searches become.
What is the best approach to archiving the records? I've had suggestions to create a cloned archive database to hold the data. I've also had the idea of storing all previous records in an XML file that can be read/loaded later if we need to dig up archived records.
You could create a separate partition containing only the active records, if your DBMS supports it. You can also add an index on Status so that the select ... from tbl where status = 1 isn't incredibly slow.
http://msdn.microsoft.com/en-us/library/ms187802.aspx
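A sketch of the index suggestion above, assuming SQL Server and the hypothetical table name tbl:
-- Plain index on the status column:
CREATE INDEX IX_tbl_Status ON tbl (Status);
-- Or, on SQL Server 2008 and later, a filtered index that covers only the
-- active rows, so it stays small as history accumulates:
CREATE NONCLUSTERED INDEX IX_tbl_Active ON tbl (Status) WHERE Status = 1;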

Delete large portion of huge tables

I have a very large table (more than 300 million records) that needs to be cleaned up. Roughly 80% of it will have to be deleted. The database software is MS SQL 2005. There are several indexes and statistics on the table, but no external relationships.
The best solution I have come up with so far is to put the database into "simple" recovery mode, copy all the records I want to keep into a temporary table, truncate the original table, set identity insert on, and copy the data back from the temp table.
It works, but it still takes several hours to complete. Is there a faster way to do this?
As per the comments, my suggestion would be to simply dispense with the copy-back step and promote the table containing the records to be kept to become the new main table, by renaming it.
It should be quite straightforward to script out the index/statistics creation to be applied to the new table before it gets swapped in.
The clustered index should be created before the non-clustered indexes.
A couple of points I'm not sure about, though:
Whether it would be quicker to insert into a heap and then create the clustered index afterwards (I guess not, if the insert can be done in clustered-index order).
Whether the original table should be truncated before being dropped (I guess yes).
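A sketch of the rename approach, with hypothetical names (BigTable, and a KeepFlag column standing in for the real keep criteria):
-- Hypothetical sketch (SQL Server 2005): copy the rows to keep, then swap by renaming.
SELECT * INTO dbo.BigTable_Keep FROM dbo.BigTable WHERE KeepFlag = 1;
-- (recreate the clustered index, then the non-clustered indexes, on BigTable_Keep here)
EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
EXEC sp_rename 'dbo.BigTable_Keep', 'BigTable';
TRUNCATE TABLE dbo.BigTable_Old;  -- per the point above: truncate before dropping
DROP TABLE dbo.BigTable_Old;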
@uriDium: chunking in batches of 50,000 will escalate to a table lock, unless you have disabled lock escalation via ALTER TABLE (SQL 2008) or other various locking tricks.
I am not sure what the structure of your data is. When does a row become eligible for deletion? If eligibility is a purely ID-based or date-based thing, then you can create a new table for each day, insert your new data into the new tables, and when it comes to cleaning, simply drop the expired tables. Then, for any selects, construct a view over all the tables (sketched below). Just an idea.
EDIT (in response to comments):
If you are maintaining a view over all the tables, then no, it won't be complicated at all. The complex part is coding the dropping and recreating of the view.
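A minimal sketch of the per-day tables behind a view, with hypothetical names:
-- Hypothetical sketch: one physical table per day, unioned behind a view.
CREATE VIEW dbo.Events AS
SELECT * FROM dbo.Events_20240101
UNION ALL
SELECT * FROM dbo.Events_20240102;
-- Cleanup: recreate the view without the expired table, then drop that table.
ALTER VIEW dbo.Events AS SELECT * FROM dbo.Events_20240102;
DROP TABLE dbo.Events_20240101;
Dropping a whole table is a metadata operation, so the "delete" costs almost nothing regardless of row count.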
I am assuming that you don't want your data to be locked down too much during the deletes. Why not chunk the delete operations? Create a stored procedure that deletes the data in chunks, 50,000 rows at a time. This should make sure that SQL Server keeps row locks instead of a table lock. Use
WAITFOR DELAY 'x'
in your while loop so that you can give other queries a bit of breathing room. Your problem is the age-old computer-science trade-off: space vs. time.
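A sketch of that loop, with a hypothetical table and cutoff (note that DELETE TOP requires parentheses, and per the comment above, batches this size may still trigger lock escalation):
-- Hypothetical sketch (SQL Server 2005): delete in 50,000-row chunks,
-- pausing between batches.
DECLARE @rows INT;
SET @rows = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (50000) FROM dbo.BigTable WHERE CreatedDate < '20080101';
    SET @rows = @@ROWCOUNT;
    WAITFOR DELAY '00:00:05';  -- give other queries breathing room
END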

SQL Server 2005 efficient delete

My issue at hand is that I need to remove about 60M records from a table without causing deadlocks with other processes that use the table. At this point I'm almost done removing the records, using a while loop that only processes about 1M records at a time, but it has taken all day.
Q1: What is the optimal way to remove large quantities of data from a table in MS SQL Server 2005, keeping the table online and minimizing the impact on other resources that need to use it?
Q2: Is there a way to implement individual row locking (rather than table locking) in SQL Server like they have in Oracle? (Note: answering this may answer Q1.)
A2: As @Remus Rusanu informed me, there is a way to do row-level locking with a delete.
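For illustration, a row-lock hint on a delete looks like this (a sketch with hypothetical names; note that the engine may still escalate locks on large batches):
DELETE TOP (1000000) FROM dbo.BigTable WITH (ROWLOCK)
WHERE CreatedDate <= '20080630';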
See this thread, where the original poster actually did some tests and posted the most efficient method. An MVP did originally chime in with the option of inserting the data you want to retain into a temp table, truncating the original table, and reinserting.
I just did something like that recently. I simply created a SQL Server job that ran every 10 minutes, deleting a million rows at a time. The code follows:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
DELETE TOP (1000000) FROM BIG_TABLE WHERE CreatedDate <= '20080630';
The table in question had about 900 million rows to start with. I didn't notice any significant performance issues.
The most efficient way is to use partition switching; see Transferring Data Efficiently by Using Partition Switching. The drawback is that it requires planning ahead in how the partitions are deployed.
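A sketch of the idea, assuming the table is already partitioned by date (hypothetical names):
-- Hypothetical sketch: switch the expired partition out to an empty staging
-- table with an identical structure; this is a metadata-only operation.
ALTER TABLE dbo.BigTable SWITCH PARTITION 1 TO dbo.BigTable_Staging;
TRUNCATE TABLE dbo.BigTable_Staging;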
If partition switching is not available, then the answer depends on the actual table schema. You had better post the actual schema (including all indexes and, most importantly, the clustered key definition) and the criteria that qualify the delete candidates.
As for Q2: SQL Server has had row-level locking since the mid-90s, so I'm not sure what you're actually asking.