Best approach to optimize a record history database

I have a database that keeps record history. For each update to a record, the system "deactivates" the previous record (along with all its children) by setting the "Status" column to "0".
It's not a problem yet, but eventually this system is going to have a lot of records. History is more important than speed right now, but the more records are inserted, the slower searches become.
What is the best approach to archiving the records? It has been suggested that I create a cloned archive database to hold the data. I've also had the idea of storing all previous records in an XML file that can be read and loaded later if we need to dig up archived records.

You could create a separate partition containing only the active records, if your DBMS supports it. You can also add an index on Status so that select ... from tbl where status=1 isn't incredibly slow.
http://msdn.microsoft.com/en-us/library/ms187802.aspx
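
As a rough illustration of indexing only the active rows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no table partitioning, so a partial index (an index with a WHERE clause) stands in for the "active partition" idea; this is a substitute technique, not what the linked page describes. The table name records, the columns record_key and payload, and the index name are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (
        id         INTEGER PRIMARY KEY,
        record_key TEXT    NOT NULL,  -- hypothetical business key
        payload    TEXT,
        status     INTEGER NOT NULL   -- 1 = active, 0 = deactivated history
    );
    -- Partial index: only rows with status = 1 are indexed, so the index
    -- stays small no matter how much history accumulates.
    CREATE INDEX idx_records_active ON records(record_key) WHERE status = 1;
""")

# 1000 historical versions and one active version per key.
rows = []
for key in ("a", "b", "c"):
    rows += [(key, f"v{i}", 0) for i in range(1000)]
    rows.append((key, "current", 1))
conn.executemany(
    "INSERT INTO records (record_key, payload, status) VALUES (?, ?, ?)", rows
)

# The literal "status = 1" must appear in the query text for SQLite to
# match it against the partial index's WHERE clause.
query = "SELECT payload FROM records WHERE status = 1 AND record_key = 'b'"
active_payload = conn.execute(query).fetchone()[0]

# Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
print(active_payload)
print(plan)
```

On SQL Server the closest equivalent would probably be a filtered index or a partitioned table; the syntax differs, but the idea of keeping the hot, active rows in a small structure is the same.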

Related

Bigquery snapshots when base table gets overwritten

Context
I have an ETL process that keeps overwriting all rows of a table in bq by first deleting everything and then inserting new rows. I'm looking for a data backup design that can be triggered regularly on that table.
Issue
I'm concerned about the cost implications of using snapshots for this kind of table.
What exactly am I worried about?
On each drop and recreation of the base table, the new data has many rows that are identical to previous rows, some new rows and some updated rows. However, the data gets inserted in a different sort order each time.
So when bq creates a snapshot by looking for rows that have changed, will it recognize that some previous rows are still in the base table and have only changed position, so that storage costs for the snapshot don't increase?

Have you thought about using merge statements?
These can deal with inserts, updates and even deletes in one query.
An example here https://querystash.com/query/62cf51097d57d7579954c0d418afc063

Truncate and insert new content into table with the least amount of interruption

Twice a day, I run a heavy query and save the results (40MBs worth of rows) to a table.
I truncate this results table before inserting the new results such that it only ever has the latest query's results in it.
The problem is that while the update to the table is being written, there is technically no data and/or a lock. When that is the case, anyone interacting with the site could experience an interruption. I haven't experienced this yet, but I am looking to mitigate it in the future.
What is the best way to remedy this? Is it proper to write the new results to a table named results_pending, then drop the results table and rename results_pending to results?

Two methods come to mind. One is to swap partitions for the table. To be honest, I haven't done this in SQL Server, but it should work at a low level.
I would normally have all access go through a view. Then, I would create the new day's data in a separate table -- and change the view to point to the new table. The view change is close to "atomic". Well, actually, there is a small period of time when the view might not be available.
Then, at your leisure you can drop the old version of the table.
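
The view-based swap described above might look roughly like the following sketch, written with Python's built-in sqlite3 module rather than SQL Server. SQLite has no ALTER VIEW, so the view is dropped and recreated inside one transaction. The table names results_v1 and results_v2 and the helper swap_in are hypothetical names chosen for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT below controls the transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE results_v1 (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO results_v1 (value) VALUES ('old')")
# All readers query the view, never the underlying table.
conn.execute("CREATE VIEW results AS SELECT * FROM results_v1")


def swap_in(conn, new_table, new_rows):
    """Load new_rows into new_table, then repoint the results view at it."""
    # new_table is a trusted, hard-coded name here; never build DDL from
    # user input.
    conn.execute(f"CREATE TABLE {new_table} (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany(f"INSERT INTO {new_table} (value) VALUES (?)", new_rows)
    # The slow load is finished; the swap itself is two quick DDL statements.
    conn.execute("BEGIN")
    conn.execute("DROP VIEW results")
    conn.execute(f"CREATE VIEW results AS SELECT * FROM {new_table}")
    conn.execute("COMMIT")


before = [r[0] for r in conn.execute("SELECT value FROM results")]
swap_in(conn, "results_v2", [("new-1",), ("new-2",)])
after = [r[0] for r in conn.execute("SELECT value FROM results ORDER BY id")]

# The old version can be dropped whenever convenient.
conn.execute("DROP TABLE results_v1")
print(before, after)
```

The heavy query writes into a table nobody reads yet, so readers are only affected during the short view swap.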

TRUNCATE is a DDL operation which causes problems like this. If you are using snapshot isolation with row versioning and want users to either see the old or new data then use a single transaction to DELETE the old records and INSERT the new data.
Another option if a lot of the data doesn't actually change is to UPDATE / INSERT / DELETE only those records that need it and leave unchanged records alone.
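
A minimal sketch of the second option, touching only the rows that need it and doing so in a single transaction, using Python's built-in sqlite3 module. It assumes the fresh query output has been loaded into a hypothetical staging table and that both tables share an id key; neither detail comes from the question.

```python
import sqlite3

# Autocommit mode; the transaction is opened and closed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE results (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO results VALUES (1, 'same'), (2, 'old'), (3, 'gone');
    INSERT INTO staging VALUES (1, 'same'), (2, 'new'), (4, 'added');
""")

conn.execute("BEGIN")
# DELETE rows that no longer appear in the fresh result set.
deleted = conn.execute(
    "DELETE FROM results WHERE id NOT IN (SELECT id FROM staging)"
).rowcount
# UPDATE only rows whose value actually differs (IS NOT is NULL-safe).
updated = conn.execute("""
    UPDATE results
    SET value = (SELECT s.value FROM staging s WHERE s.id = results.id)
    WHERE EXISTS (
        SELECT 1 FROM staging s
        WHERE s.id = results.id AND s.value IS NOT results.value
    )
""").rowcount
# INSERT rows that are new in the fresh result set.
inserted = conn.execute("""
    INSERT INTO results (id, value)
    SELECT id, value FROM staging
    WHERE id NOT IN (SELECT id FROM results)
""").rowcount
conn.execute("COMMIT")

final_rows = conn.execute("SELECT id, value FROM results ORDER BY id").fetchall()
print(deleted, updated, inserted, final_rows)
```

Whether readers see the old or the new data while the transaction is open depends on the isolation level of the DBMS, as noted above for snapshot isolation.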

MS Access: move deleted records automatically to "recycle bin" table

I need to implement a copy of records right after they are deleted from the table, so they can be recovered in case of accidental deletion.
I am using MS Access. Is there any built in way to do it or will I have to INSERT INTO SELECT before every DELETE?
Doing it for just one table is not a concern. I want to use something ready for any table regardless of its structure, so I don't need to create and configure another recycle-bin-table for every table I have in the database, which would be necessary if I want successful move operations.
Besides SQL, I can run VBA to accomplish this task.
EDIT
There are recommendations to add a boolean column that indicates whether the record is to be displayed or is archived (meaning "deleted" for my purposes), but this involves changing every table and every query I have written, so it would only suit me as a last resort.

What happens when you have cascading deletions, as in all well-designed databases? Also, an INSERT into a backup table before each DELETE will not solve all the issues you will face. Copying records can also produce a lot of copies that increase your database size, and sooner or later you will have to clean up your data.
Could journaling be a better solution?

sql multithreading application select and delete from a table

I am developing a multi-thread application (could be considered as a client-server) which processes data. The below is the high level description of the application.
There is a table (with no key and no Id field) with many rows in our database server. I have several systems (threads) which read (select) a fixed number of rows from the table, process them, and then remove (delete) those rows from the table.
I am looking for a solution for removing (deleting) data without using a temp table; but any ideas with temp storage are welcome.
P.S: By using locks and a temp table, I solved the reading process, but I need help with the deleting part.
P.S2: One possible solution, suggested by Jean, is to not remove rows physically from the table. This idea is great, but I forgot to mention that this table must be empty after a specific period of time, and with that solution I would need a system which deletes all the marked rows at the end (which is not possible).

Finding changed records in a database table

I have a problem that I haven't been able to come up with a solution for yet. I have a database (actually thousands of them at customer sites) that I want to extract data from periodically. I'd like to do a full data extract one time (select * from table) then after that only get rows that have changed.
The challenge is that there aren't any updated date columns in most of the tables that could be used to constrain the SQL query. I can't use a trigger based approach nor change the application that writes to the database since it's another group that develops the app and they are way backed up already.
I may be able to write to the database tables when doing the data extract, but would prefer not to do that. Does anyone have any ideas for how we might be able to do this?

You will have to programmatically mark the records. I see suggestions of an auto-incrementing field, but that will only get newly inserted records. How will you track updated or deleted records?
If you only want newly inserted records, then an auto-incrementing field will do the job; in subsequent data dumps, grab everything since the last value of the auto-increment field and then record the current value.
If you want updates, the minimum I can see is to have a last_update field and probably a trigger to populate it. If last_update is later than the last data dump, grab that record. This will get inserts and updates but not deletes.
You could try something like an 'instead of delete' trigger, if your RDBMS supports it, and NULL the last_update field. On subsequent data dumps, grab all records where this field is NULL and then delete them. But there would be problems with this (e.g. how to stop the app seeing them between the logical and physical delete).
The most foolproof method I can see is a set of history (audit) tables, with each change written to them. Then you select your data dump from there.
By the way, do you only care about knowing that updates have happened? What if 2 (or more) updates have happened? The history table is the only way that I can see of capturing this scenario.
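
To make the history (audit) table idea concrete, here is a small sketch using Python's built-in sqlite3 module. The customer table, the customer_audit layout, the trigger names and the changes_since helper are all hypothetical; the point is only that every insert, update and delete leaves a row the extract can read, including several updates to the same record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

    -- One audit row per change; change_id acts as a high-water mark.
    CREATE TABLE customer_audit (
        change_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        action      TEXT    NOT NULL,   -- 'I', 'U' or 'D'
        customer_id INTEGER NOT NULL,
        name        TEXT
    );

    CREATE TRIGGER customer_ai AFTER INSERT ON customer BEGIN
        INSERT INTO customer_audit (action, customer_id, name)
        VALUES ('I', NEW.id, NEW.name);
    END;
    CREATE TRIGGER customer_au AFTER UPDATE ON customer BEGIN
        INSERT INTO customer_audit (action, customer_id, name)
        VALUES ('U', NEW.id, NEW.name);
    END;
    CREATE TRIGGER customer_ad AFTER DELETE ON customer BEGIN
        INSERT INTO customer_audit (action, customer_id, name)
        VALUES ('D', OLD.id, OLD.name);
    END;
""")


def changes_since(conn, last_change_id):
    """Return audit rows recorded after the previous extract's high-water mark."""
    return conn.execute(
        "SELECT change_id, action, customer_id, name FROM customer_audit"
        " WHERE change_id > ? ORDER BY change_id",
        (last_change_id,),
    ).fetchall()


conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann')")
first_extract = changes_since(conn, 0)
last_seen = first_extract[-1][0]  # remember where this extract stopped

# Two updates and a delete between extracts are all captured.
conn.execute("UPDATE customer SET name = 'Anne' WHERE id = 1")
conn.execute("UPDATE customer SET name = 'Annie' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 1")
second_extract = changes_since(conn, last_seen)
print(first_extract)
print(second_extract)
```

This relies on triggers, so it only applies where adding them to the source database is allowed.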

This should isolate rows that have changed since your last backup. It assumes DestinationTable is a copy of SourceTable, including the key fields; if not, you could list out the important fields instead.
SELECT * FROM SourceTable
EXCEPT
SELECT * FROM DestinationTable
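
A runnable sketch of the EXCEPT query, using Python's built-in sqlite3 module with made-up sample rows. The id and name columns are assumptions. Note that EXCEPT in this direction finds new and modified rows only; the second query, which is an addition to the answer, reverses the direction on the key column to find deleted rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceTable      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE DestinationTable (id INTEGER PRIMARY KEY, name TEXT);
    -- DestinationTable holds the previous extract.
    INSERT INTO DestinationTable VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Since then: row 2 was modified, row 3 deleted, row 4 inserted.
    INSERT INTO SourceTable      VALUES (1, 'Ann'), (2, 'Bobby'), (4, 'Dee');
""")

# New or modified rows: present in the source but not in the last extract.
new_or_changed = conn.execute(
    "SELECT * FROM SourceTable EXCEPT SELECT * FROM DestinationTable ORDER BY id"
).fetchall()

# Deleted rows: keys still in the last extract but gone from the source.
deleted_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM DestinationTable EXCEPT SELECT id FROM SourceTable"
        " ORDER BY id"
    )
]
print(new_or_changed, deleted_ids)
```

Because the comparison covers every column of every row, it can be expensive on large tables, but it needs no triggers and no schema changes in the source database.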
