MS Access: move deleted records automatically to "recycle bin" table - vba

I need to implement a copy of records right after they are deleted from the table, so they can be recovered in case of accidental deletion.
I am using MS Access. Is there any built in way to do it or will I have to INSERT INTO SELECT before every DELETE?
Doing it for just one table is not a concern. I want to use something ready for any table regardless of its structure, so I don't need to create and configure another recycle-bin-table for every table I have in the database, which would be necessary if I want successful move operations.
Besides SQL, I can run VBA to accomplish this task.
EDIT
There are recommendations of adding a boolean column that indicates if the record is to be displayed or is archived (has the meaning of "deleted" for my purposes), but this involves changing every table and every query I have done, so it won't fit for me, only as a last resort.

What happens when you have cascading deletions, as in all good designed databases? Also your INSERT in a backup table before DELETE will not solve all the issues you will face. Also copying table can result in a lot of copies that will increase your database size and you will have soon or later to clean your data.
Journaling can be better solutions?

Related

avoiding write conflicts while re-sorting a table

I have a large table that I need to re-sort periodically. I am partly basing this on a suggestion I was given to stay away from using cluster keys since I am inserting data ordered differently (by time) from how I need it clustered (by ID), and that can cause re-clustering to get a little out of control.
Since I am writing to the table on a hourly I am wary of causing problems with these two processes conflicting: If I CTAS to a newly sorted temp table and then swap the table name it seems like I am opening the door to have a write on the source table not make it to the temp table.
I figure I can trigger a flag when I am re-sorting that causes the ETL to pause writing, but that seems a bit hacky and maybe fragile.
I was considering leveraging locking and transactions, but this doesn't seem to be the right use case for this since I don't think I'd be locking the table I am copying from while I write to a new table. Any advice on how to approach this?
I've asked some clarifying questions in the comments regarding the clustering that you are avoiding, but in regards to your sort, have you considered creating a nice 4XL warehouse and leveraging the INSERT OVERWRITE option back into itself? It'd look something like:
INSERT OVERWRITE INTO table SELECT * FROM table ORDER BY id;
Assuming that your table isn't hundreds of TB in size, this will complete rather quickly (inside an hour, I would guess), and any inserts into the table during that period will queue up and wait for it to finish.
There are some reasons to avoid the automatic reclustering, but they're basically all the same reasons why you shouldn't set up a job to re-cluster frequently. You're making the database do all the same work, but without the built in management of it.
If your table is big enough that you are seeing performance issues with the clustering by time, and you know that the ID column is the main way that this table is filtered (in JOINs and WHERE clauses) then this is probably a good candidate for automatic clustering.
So I would recommend at least testing out a cluster key on the ID and then monitoring/comparing performance.
To give a brief answer to the question about resorting without conflicts as written:
I might recommend using a time column to re-sort records older than a given time (probably in a separate table). While it's sorting, you may get some new records. But you will be able to use that time column to marry up those new records with the, now sorted, older records.
You might consider revoking INSERT, UPDATE, DELETE privileges on the original table within the same script or procedure that performs the CTAS creating the newly sorted copy of the table. After a successful swap you can re-enable the privileges for the roles that are used to perform updates.

Truncate and insert new content into table with the least amount of interruption

Twice a day, I run a heavy query and save the results (40MBs worth of rows) to a table.
I truncate this results table before inserting the new results such that it only ever has the latest query's results in it.
The problem, is that while the update to the table is written, there is technically no data and/or a lock. When that is the case, anyone interacting with the site could experience an interruption. I haven't experienced this yet, but I am looking to mitigate this in the future.
What is the best way to remedy this? Is it proper to write the new results to a table named results_pending, then drop the results table and rename results_pending to results?
Two methods come to mind. One is to swap partitions for the table. To be honest, I haven't done this in SQL Server, but it should work at a low level.
I would normally have all access go through a view. Then, I would create the new day's data in a separate table -- and change the view to point to the new table. The view change is close to "atomic". Well, actually, there is a small period of time when the view might not be available.
Then, at your leisure you can drop the old version of the table.
TRUNCATE is a DDL operation which causes problems like this. If you are using snapshot isolation with row versioning and want users to either see the old or new data then use a single transaction to DELETE the old records and INSERT the new data.
Another option if a lot of the data doesn't actually change is to UPDATE / INSERT / DELETE only those records that need it and leave unchanged records alone.

How to perform data archive in SQL Server with many tables?

Let's say I have a database with many tables in it. I want to perform data archiving on certain tables, that is create a same table with same structures (same constraint, indexes, columns, triggers, etc) as a new table and insert specific data into the new table from the old table.
Example, current table has data from 2008-2017 and I want to move only data from 2010-2017 into the new table. Then after that, I can delete the old table and rename the new table with naming conventions similar to old table.
How should I approach this?
For the sort of clone-rename-drop logic you're talking about, the basics are pretty straight forward. Really the only time this is a good idea is if you have a table with a large amount of data, which you can't afford down time or blocking on, and you only plan to do this one. The process looks something like this:
Insert all the data from your original table into the clone table
In a single transaction, sp_rename the original table from (for example) myTable to myTable_OLD (just something to distinguish it from the real table). Then sp_rename the clone table from (for example) myTable_CLONE to myTable
Drop myTable_OLD when you're happy everything has worked how you want. If it didn't work how you want, just sp_rename the objects back.
Couple considerations to think about if you go that route
Identity columns: If your table has any identities on it, you'll have to use identity_insert on then reseed the identity to pick up at where the old identity left off
Do you have the luxury of blocking the table while you do this? Generally if you need to do this sort of thing, the answer is no. What I find works well is to insert all the rows I need using (nolock), or however you need to do it so the impact of the select from the original table is mitigated. Then, after I've moved 99% of the data, I will then open a transaction, block the original table, insert just the new data that's come in since the bulk of the data movement, then do the sp_rename stuff
That way you don't lock anything for the bulk of the data movement, and you only block the table for the very last bit of data that came into the original table between your original insert and your sp_rename
How you determine what's come in "since you started" will depend on how your table is structured. If you have an identity or a datestamp column, you can probably just pick rows which came in after the max of those fields you moved over. If your table does NOT have something you can easily hook into, you might need to get creative.
Alternatives
A couple other alternatives that came to mind:
Table Partitioning:
This shards a single table across multiple partitions (which can be managed sort of like individual tables). You can, say, partition you data by year, then when you want to purge the trailing year of data, you "switch out" that partition to a special table which you can then truncate. All those operations are meta-data only, so they're super fast. This also works really well for huge amounts of data where deletes and all their pesky transaction logging aren't feasible
The downside to table partitioning is it's kind of a pain to set up and manage.
Batched Deletes:
If you're data isn't too big, you could just do batched deletes on the trailing end of your data. If you can find a way to get clustered index seeks for your deletes, they should be reasonably lightweight. As long as you're not accumulating data faster than you can get rid of it, the benefit of this kind of thing is you just run it semi-continuously and it just nibbles away at the trailing end of your data
Snapshot Isolation:
If deletes cause too much blocking, you can also set up something like snapshot isolation, which basically stores historical versions of rows in tempdb. Any query which sets isolation level read committed snapshot will then read those pre-change rows instead of contend for locks on the "real" table. You can then do batched deletes to your hearts content and know that any queries that hit the table will never get blocked by a delete (or any other DML operation) because they'll either read the pre-delete snapshot, or they'll read the post-delete snapshot. They won't wait for an in-process delete to figure out whether it's going to commit or rollback. This is not without its drawbacks as well unfortunately. For large data sets, it can put a big burden on tempdb and it too can be a little bit of a black box. It's also going to require buy-in from your DBAs.

Is there a more elegant way to detect changes in a large SQL table without altering it? [duplicate]

This question already has answers here:
How can I get a hash of an entire table in postgresql?
(7 answers)
Closed 9 years ago.
Suppose you have a reasonably large (for local definitions of “large”), but relatively stable table.
Right now, I want to take a checksum of some kind (any kind) of the contents of the entire table.
The naïve approach might be to walk the entire table, taking the checksum (say, MD5) of the concatenation of every column on each row, and then perhaps concatenate them and take its MD5sum.
From the client side, that might be optimized a little by progressively appending columns' values into the MD5 sum routine, progressively mutating the value.
The reason for this, is that at some point in future, we want to re-connect to the database, and ensure that no other users may have mutated the table: that includes INSERT, UPDATE, and DELETE.
Is there a nicer way to determine if any change/s have occurred to a particular table? Or a more efficient/faster way?
Update/clarification:
We are not able/permitted to make any alterations to the table itself (e.g. adding a “last-updated-at” column or triggers or so forth)
(This is for Postgres, if it helps. I'd prefer to avoid poking transaction journals or anything like that, but if there's a way to do so, I'm not against the idea.)
Adding columns and triggers is really quite safe
While I realise you've said it's a large table in a production DB so you say you can't modify it, I want to explain how you can make a very low impact change.
In PostgreSQL, an ALTER TABLE ... ADD COLUMN of a nullable column takes only moments and doesn't require a table re-write. It does require an exclusive lock, but the main consequence of that is that it can take a long time before the ALTER TABLE can actually proceed, it won't hold anything else up while it waits for a chance to get the lock.
The same is true of creating a trigger on the table.
This means that it's quite safe to add a modified_at or created_at column and an associated trigger function to maintain them to a live table that's in intensive real-world use. Rows added before the column was created will be null, which makes perfect sense since you don't know when they were added/modified. Your trigger will set the modified_at field whenever a row changes, so they'll get progressively filled in.
For your purposes it's probably more useful to have a trigger-maintained side-table that tracks the timestamp of the last change (insert/update/delete) anywhere in the table. That'll save you from storing a whole bunch of timestamps on disk and will let you discover when deletes have happened. A single-row side-table with a row you update on each change using a FOR EACH STATEMENT trigger will be quite low-cost. It's not a good idea for most tables because of contention - it essentially serializes all transactions that attempt to write to the table on the row update lock. In your case that might well be fine, since the table is large and rarely updated.
A third alternative is to have the side table accumulate a running log of the timestamps of insert/update/delete statements or even the individual rows. This allows your client read the change-log table instead of the main table and make small changes to its cached data rather than invalidating and re-reading the whole cache. The downside is that you have to have a way to periodically purge old and unwanted change log records.
So... there's really no operational reason why you can't change the table. There may well be business policy reasons that prevent you from doing so even though you know it's quite safe, though.
... but if you really, really, really can't:
Another option is to use the existing "md5agg" extension: http://llg.cubic.org/pg-mdagg/ . Or to apply the patch currently circulating pgsql-hackers to add an "md5_agg" to the next release to your PostgreSQL install if you built from source.
Logical replication
The bi-directional replication for PostgreSQL project has produced functionality that allows you to listen for and replay logical changes (row inserts/updates/deletes) without requiring triggers on tables. The pg_receivellog tool would likely suit your purposes well when wrapped with a little scripting.
The downside is that you'd have to run a patched PostgreSQL 9.3, so I'm guessing if you can't change a table, running a bunch of experimental code that's likely to change incompatibly in future isn't going to be high on your priority list ;-) . It's included in the stock release of 9.4 though, see "changeset extraction".
Testing the relfilenode timestamp won't work
You might think you could look at the modified timestamp(s) of the file(s) that back the table on disk. This won't be very useful:
The table is split into extents, individual files that by default are 1GB each. So you'd have to find the most recent timestamp across them all.
Autovacuum activity will cause these timestamps to change, possibly quite a while after corresponding writes happened.
Autovacuum must periodically do an automatic 'freeze' of table contents to prevent transaction ID wrap-around. This involves progressively rewriting the table and will naturally change the timestamp. This happens even if nothing's been added for potentially quite a long time.
Hint-bit setting results in small writes during SELECT. These writes will also affect the file timestamps.
Examine the transaction logs
In theory you could attempt to decode the transaction logs with pg_xlogreader and find records that affect the table of interest. You'd have to try to exclude activity caused by vacuum, full page writes after hint bit setting, and of course the huge amount of activity from every other table in the entire database cluster.
The performance impact of this is likely to be huge, since every change to every database on the entire system must be examined.
All in all, adding a trigger on a table is trivial in comparison.
What about creating a trigger on insert/update/delete events on the table? The trigger could call a function that inserts a timestamp into another table which would mark the time for any table-changing event.
The only concern would be an update event updated using the same data currently in the table. The trigger would fire, even though the table didn't really change. If you're concerned about this case, you could make the trigger call a function that generates a checksum against just the updated rows and compares against a previously generated checksum, which would usually be more efficient than scanning and checksumming the whole table.
Postgres documentation on triggers here: http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html
If you simply just want to know when a table has last changed without doing anything to it, you can look at the actual file(s) timestamp(s) on your database server.
SELECT relfilenode FROM pg_class WHERE relname = 'your_table_name';
If you need more detail on exactly where it's located, you can use:
select t.relname,
t.relfilenode,
current_setting('data_directory')||'/'||pg_relation_filepath(t.oid)
from pg_class t
join pg_namespace ns on ns.oid = t.relnamespace
where relname = 'your_table_name';
Since you did mention that it's quite a big table, it will definitely be broken into segments, and toasts, but you can utilize the relfilenode as your base point, and do a ls -ltr relfilenode.* or relfilnode_* where relfilenode is the actual relfilenode from above.
These files gets updated at every checkpoint if something occured on that table, so depending on how often your checkpoints occur, that's when you'll see the timestamps update, which if you haven't changed the default checkpoint interval, it's within a few minutes.
Another trivial, but imperfect way to check if INSERTS or DELETES have occurred is to check the table size:
SELECT pg_total_relation_size('your_table_name');
I'm not entirely sure why a trigger is out of the question though, since you don't have to make it retroactive. If your goal is to ensure nothing changes in it, a trivial trigger that just catches an insert, update, or delete event could be routed to another table just to timestamp an attempt but not cause any activity on the actual table. It seems like you're not ensuring anything changes though just by knowing that something changed.
Anyway, hope this helps you in this whacky problem you have...
A common practice would be to add a modified column. If it were MySQL, I'd use timestamp as datatype for the field (updates to current date on each updade). Postgre must have something similar.

Making structural changes to very large tables in an online environment

So here's what I'm facing.
The Problem
A large table, with ~230,000,000
rows.
We want to change the
clustering index and primary key of
this table to a simple bigint
identity field. There is one other
empty field being added to the table,
for future use.
The existing table
has a composite key. For the sake of
argument, let's say it's 2 bigint's.
The first one may have 1 or 10,000
'children' in the 2nd part of the
key.
Requirements
Minimal downtime, like preferably the
length of time it takes to run
SP_Rename.
Existing rows may change
while we're copying data. The updates
must be reflected in the new table.
Ideas
Put a trigger on existing table,
to update row in new table if it
already exists there.
Iterate through original table, copying data
into new table ~10,000 at a time.
Maybe 2,000 of the first part of the
old key.
When the copy is
complete, rename the old table to
"ExistingTableOld" and the new one
from "NewTable" to "ExistingTable".
This should allow stored procs to
continue to run without intervention
Are there any glaring omissions in the plan, or best practices I'm ignoring?
Difficult problem. Your plan sounds good, but I'm not totally sure you really need to batch the query as long as you run it in a transaction isolation level of READ UNCOMMITTED to stop locks being generated.
My experience making big schema changes is big changes are best done during a maintenance window—at night/over a weekend—when users are booted off the system. Just like running dbcc checkdb with the repair option. Then, when things go south, you have the option to roll back to the full backup that you providentially made right before starting the upgrade.
Item #3 on your list: Renaming the old/new tables. You'll probably want to recompile the stored procedures/views. My experience is that execution plans are bound against the object ids rather than object names.
Consider table dbo.foo: if it is renamed to dbo.foo_old, any stored procedures or user-defined functions won't necessarily error out until the dependent object is recompiled and its execution plan rebound. Cached execution plans continue to work perfectly fine.
sp_recompile is your friend.