I have a join request and the data can change during the day and from date to date (deleted rows), so I want to keep certain data by picking them and save them elsewhere the next day from 3 months.
Usually, I would do a materialized view (for performances / do not touch the production tables) and refresh it every night /or on logs, but the issue here is that I want to be able to ADD the new data from yesterday and do not update the whole mview (data will be deleted from the mview then) and say: what is older than 3 months can be deleted.
How can I do this? Maybe I'm totally wrong thinking about mview and the only way is with dbms_scheduler?
Use your own table, then. Schedule a job (using dbms_scheduler you mentioned) which will
insert new rows (dated yesterday)
delete rows older than 3 months
Properly index it so that you'd be able to fetch "archive" data faster than without an index. Don't forget to regularly gather statistics on both table and index(es).
Related
I have 3 reports based on 3 different tables, which ideally should match each other in audit.
They are updated sequentially once in a day.
The problem here is when one of the table is updated and second one is in progress, the customer sees data discrepancy between the reports for some time.
We tried the solution where in we commit after all 3 tables are updated but we started having issue on undo tbsp. The application have many other things running on.
I am looking for a solution where in we can restrict the user to show data to a specific point, and he must see updated data only after all 3 tables are refreshed/updated.
I think you can use select * for update for all 3 tables befor start updating procedure.
In that case users can select data and will see only not changed data till update session will not finish and make commit.
You can use a flashback query to show data as-of a point in time:
select * from table1 as of timestamp timestamp '2021-12-10 12:00:00';
The application would need to determine the latest time when the tables were synchronized - perhaps with a log table that records when the update process last started. However, the flashback query also uses the UNDO tablespace. But the query should at least use less UNDO since some of the committed transactions will now free up some space.
I have a table tblcalldatastore which produce around 4000000 records daily. I want to create a daily job to delete any record order than 24 hours. What is the most efficient and less time taking way? Below query is my requirement.
delete from [tblcalldatastore]
where istestcase=0
and datediff(hour,receiveddate,GETDATE())>24
The better approach is to avoid delete entirely by using partitions on your table. Instead of deleting records, drop partitions.
For example, you can create a partition for each hour. Then you can drop the entire partition for the 25th hour in the past. Or you can basically have two partitions by day and drop the older one after 24 hours.
This approach has a big performance advantage, because partition drops are not logged at the record level, saving lots of time. They also do not invoke triggers or other checks, saving more effort.
The documentation on partitioning is here.
You might not want to go down the Partitions route.
It looks like you will typically be deleting approx half the data in your table every day.
Deletes are very expensive...
A much faster way to do this is to
Select INTO a New Table (the data you want to keep)
rename (or Drop) your old Table
Then Rename your new table to the old table name.
This should work out quicker - Unless you have heaps of Indexes & FKs...
Twice a day, I run a heavy query and save the results (40MBs worth of rows) to a table.
I truncate this results table before inserting the new results such that it only ever has the latest query's results in it.
The problem, is that while the update to the table is written, there is technically no data and/or a lock. When that is the case, anyone interacting with the site could experience an interruption. I haven't experienced this yet, but I am looking to mitigate this in the future.
What is the best way to remedy this? Is it proper to write the new results to a table named results_pending, then drop the results table and rename results_pending to results?
Two methods come to mind. One is to swap partitions for the table. To be honest, I haven't done this in SQL Server, but it should work at a low level.
I would normally have all access go through a view. Then, I would create the new day's data in a separate table -- and change the view to point to the new table. The view change is close to "atomic". Well, actually, there is a small period of time when the view might not be available.
Then, at your leisure you can drop the old version of the table.
TRUNCATE is a DDL operation which causes problems like this. If you are using snapshot isolation with row versioning and want users to either see the old or new data then use a single transaction to DELETE the old records and INSERT the new data.
Another option if a lot of the data doesn't actually change is to UPDATE / INSERT / DELETE only those records that need it and leave unchanged records alone.
I have an audit table setup which essentially mirrors one of my tables along with a date, user and command type. Here's how it might look like:
AuditID UserID Individual modtype user audit_performed
1 1239 Day Meff INSERT dbo 2010-11-04 14:50:56.357
2 2334 Dasdf fdlla INSERT dbo 2010-11-04 14:51:07.980
3 3324 Dasdf fdla DELETE dbo 2010-11-04 14:51:11.130
4 5009 Day Meffasdf UPDATE dbo 2010-11-04 14:51:12.777
Since these types of tables can get big pretty quick - I was thinking of putting in some sort of automatic delete of the older rows. So for example if I have 3 months of history - if I could delete the first month while retaining the last two. And again all of this must be automatic - I imagine once a certain date is hit, a query activates and deletes the oldest month with audit data. What is the best way to do this?
I'm using SQL Server 2005 by the way.
A SQL agent job should be fine here. You definitely don't need to do this on every single insert with a trigger. I doubt you even need to do it every day. You could schedule a job that runs once a month and clears out anything older than 2 months (so at most you'd have 3 months of data minus 1 day at any given time).
You could use SQL Server agent..you can schedule a repeating job like deleting entries from the current audit table after certain period. Here is how you would do it.
I would recommend storing the data in another table audit_archive table and deleting it from the current audit table. So, that in case you want some history you still have it and your table also doesn't get too big.
You could try a trigger every time a row is added it will clear anything older than 3 months.
You could also try SQL Agent to run a script every day that will do that.
Have you looked at using triggers? You could define a trigger to run when you add a row (on INSERT) that deletes any rows that are more than three months old.
I am designing a database to store product informations, and I want to store several months of historical (price) data for future reference. However, I would like to, after a set period, start overwriting initial entries with minimal effort to find the initial entries. Does anyone have a good idea of how to approach this problem? My initial design is to have a table named historical data, and everyday, it pulls the active data and stores it into the historical database with a time stamp. Does anyone have a better idea? Or can see what is wrong with mine?
First, I'd like to comment on your proposed solution. The weak part of course is that, there can, actually, be more than one change between your intervals. That means, the record was changed three times during the day, but you only archive the last change.
It's possible to have the better solution, but it must be event-driven. If you have the database server that supports events or triggers (like MS SQL), you should write a trigger code that creates entry in history table. If your server does not support triggers, you can add the archiving code to your application (during Save operation).
You could place a trigger on your price table. That way you can archive the old price in an other table at each update or delete event.
It's a much broader topic than it initially seems. Martin Fowler has a nice narrative about "things that change with time".
IMO your approach seems sound if your required history data is a snapshot of the end of the day's data - in the past I have used a similar approach with overnight jobs (SP's) that pick up the day's new data, timestamp it and then use a "delete all data that has a timestamp < today - x" where x is the time period of data I want to keep.
If you need to track all history changes, then you need to look at triggers.
I would like to, after a set period, start overwriting initial entries with minimal effort to find the initial entries
We store data in Archive tables, using a Trigger, as others have suggested. Our archive table has additional column for AuditDate, and stores the "Deleted" data - i.e. the previous version of the data. The current data is only stored in the actual table.
We prune the Archive table with a business rule along the lines of "Delete all Archive data more than 3 months old where there exists at least one archive record younger than 3 months old; delete all archive data more than 6 months old"
So if there has been no price change in the last 3 months you would still have a price change record from the 3-6 months ago period.
(Ask if you need an example of the self-referencing-join to do the delete, or the Trigger to store changes in the Archive table)