I have two tables in my database, and I am trying to get the last DML (insert, update or delete) timestamp using "SCN_TO_TIMESTAMP(MAX(ora_rowscn))" and "dba_tab_modifications" in an Oracle 12c database.
Following is the information for the two tables:
Table Name | Create Date | Last DML (as given from user) | SCN_TO_TIMESTAMP(MAX(ora_rowscn))
-----------+-------------+-------------------------------+----------------------------------
Table1     | 25 SEP 2017 | 13 OCT 2020                   | ORA-08181: specified number is not a valid system change number
           |             |                               | ORA-06512: at "SYS.SCN_TO_TIMESTAMP"
Table2     | 30 JAN 2017 | 29 OCT 2020                   |
Following is the result:
Table Name | SCN_TO_TIMESTAMP(MAX(ora_rowscn))                                | dba/all_tab_modifications
-----------+------------------------------------------------------------------+--------------------------
Table1     | ORA-08181: specified number is not a valid system change number  | NULL (0 rows returned)
           | ORA-06512: at "SYS.SCN_TO_TIMESTAMP"                              |
Table2     | 29/OCT/20 03:40:15.000000000 AM                                   | 29/OCT/20 03:50:52
Earliest date from dba/all_tab_modifications:
02/OCT/18 22:00:02
Can anyone shed some light on why I am not able to get the last DML for Table1, but am able to get it for Table2?
I was thinking of executing "DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO" as advised on other blogs. However, my question is: if the DML for the second table has been monitored, it should have already been flushed.
Both tables are updated inside different stored procedures under the same user ID.
Can anyone share an idea on how I can get the last DML for the first table? Thanks in advance!
Realistically, if you need this information, you need to store it in the table, use auditing, or do something else to capture changes (i.e. triggers that populate a table of modifications).
max(ORA_ROWSCN) will work to give you the last SCN of a modification (note that by default, this is stored at the block level not at the row level, so rows with the max(ora_rowscn) aren't necessarily the most recently modified). But Oracle only maintains the mapping of SCN to timestamp for a limited period of time. In the documentation, Oracle guarantees it will maintain the mapping for 120 hours (5 days). If the last modification was more than a few days ago, scn_to_timestamp will no longer work. If your system has a relatively constant rate of SCN generation, you could try to build your own function to generate approximate timestamps but that could produce significant inaccuracies.
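As a quick illustration (the table name is just the one from the question), a query like the following returns the SCN of the last change and, while the mapping is still retained, its approximate timestamp; once the mapping has aged out you get exactly the ORA-08181 error shown above:

SELECT MAX(ora_rowscn) AS last_scn,
       SCN_TO_TIMESTAMP(MAX(ora_rowscn)) AS approx_last_dml
FROM   table1;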
dba_tab_modifications is used by the optimizer to identify tables that need fresh statistics gathered, so that data is even more transient. If you have statistics gathering enabled every night, you'd expect that information about some tables would get removed every night, depending on which tables had fresh statistics gathered. Plus, the timestamp isn't intended to accurately identify the time the underlying table was modified, but rather the time that Oracle wrote the monitoring information.
If this is something you need going forward, you could:
Add a timestamp column to the table that gets populated when a row is modified.
Add some logging to the stored procedures that lets you identify when tables were modified.
Put a trigger on the table that logs modifications in whatever form is useful to you (see the sketch after this list).
Use Oracle's built-in auditing to capture DML affecting the table.
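As a minimal sketch of the trigger option, assuming a hypothetical log table (all names here are illustrative, not from your system):

-- CREATE TABLE table1_dml_log (modified_at TIMESTAMP, dml_type VARCHAR2(6));

CREATE OR REPLACE TRIGGER trg_table1_dml
AFTER INSERT OR UPDATE OR DELETE ON table1
BEGIN
  INSERT INTO table1_dml_log (modified_at, dml_type)
  VALUES (SYSTIMESTAMP,
          CASE
            WHEN INSERTING THEN 'INSERT'
            WHEN UPDATING  THEN 'UPDATE'
            ELSE                'DELETE'
          END);
END;
/

The last DML time is then simply SELECT MAX(modified_at) FROM table1_dml_log.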
If you're really determined, assuming that the database is in archivelog mode and that you have all the archived log files since each table was last modified, you could use LogMiner to read through each archived log and find the timestamp of the last modification. But that will be exceedingly slow and depends on your backup strategy allowing you to recover old log files back to the last change.
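A rough outline of the LogMiner route, assuming the online catalog can be used as the dictionary (the log file name and schema are illustrative):

EXEC DBMS_LOGMNR.ADD_LOGFILE(LOGFILENAME => '/arch/arch_0001.arc', OPTIONS => DBMS_LOGMNR.NEW);
EXEC DBMS_LOGMNR.START_LOGMNR(OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);

SELECT MAX(timestamp)
FROM   v$logmnr_contents
WHERE  seg_owner  = 'YOUR_SCHEMA'
AND    table_name = 'TABLE1'
AND    operation IN ('INSERT', 'UPDATE', 'DELETE');

EXEC DBMS_LOGMNR.END_LOGMNR;

You would repeat the ADD_LOGFILE call (with DBMS_LOGMNR.ADDFILE) for each archived log you want to scan, which is why this gets slow quickly.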
I have 3 reports based on 3 different tables, which ideally should match each other in audit.
They are updated sequentially once in a day.
The problem here is that when one of the tables has been updated and the second one is still in progress, the customer sees a data discrepancy between the reports for some time.
We tried the solution where we commit only after all 3 tables are updated, but we started having issues with the UNDO tablespace. The application has many other things running on it as well.
I am looking for a solution where we can restrict the data shown to the user to a specific point in time, so that they see updated data only after all 3 tables are refreshed/updated.
I think you can use SELECT ... FOR UPDATE on all 3 tables before starting the update procedure.
In that case users can still select data, but they will only see the unchanged data until the update session finishes and commits.
You can use a flashback query to show data as-of a point in time:
select * from table1 as of timestamp timestamp '2021-12-10 12:00:00';
The application would need to determine the latest time when the tables were synchronized - perhaps with a log table that records when the update process last started. However, the flashback query also uses the UNDO tablespace. But the query should at least use less UNDO since some of the committed transactions will now free up some space.
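A minimal sketch of how the application side could tie this together, assuming a hypothetical refresh_log table that the update process writes to when it completes:

-- find the last time all 3 tables were refreshed successfully
SELECT MAX(completed_at) AS last_sync FROM refresh_log WHERE status = 'COMPLETE';

-- then run each report as of that time (bind :last_sync to the value found above)
SELECT * FROM table1 AS OF TIMESTAMP :last_sync;
SELECT * FROM table2 AS OF TIMESTAMP :last_sync;
SELECT * FROM table3 AS OF TIMESTAMP :last_sync;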
I'd like some input on designing the SQL data layer for a service that should store and provide the latest N entries for a specific user. The idea is to track each user (id), the time of an event and then the event id.
The service should only respond with the last X events for each user, and should also only contain events that occurred during the last Y days.
The service also needs to scale to large amounts of updates and reads.
I'm considering just a simple table with the fields:
ID | USERID | EVENT | TIMESTAMP
============================================
1 | 1 | created file Z | 2014-03-20
2 | 2 | deleted dir Y | 2014-03-20
3 | 1 | created dir Y | 2014-03-20
But how would you consider solving the temporal requirements? I see two alternatives here:
1) On inserts and/or reads for a user, also remove outdated events and all but the last X events for that user. This affects latency, as you need to perform a select, delete and insert on each request, but it keeps the disk size to a minimum.
2) Let the service filter on query and do pruning as a separate batch job with some SQL that (see the sketch after this list):
First removes all obsolete events, irrespective of users, based on the timestamp.
Then does some join that removes all but the last X events for each user.
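A sketch of that batch job, assuming the table is called events with the columns shown above, and a database that supports window functions (date arithmetic syntax varies by vendor):

-- step 1: remove everything older than Y days (here 10)
DELETE FROM events
WHERE timestamp < CURRENT_DATE - INTERVAL '10' DAY;

-- step 2: remove all but the last X events (here 10) per user
DELETE FROM events
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY userid
                                  ORDER BY timestamp DESC) AS rn
        FROM events
    ) ranked
    WHERE rn > 10
);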
I have looked for design principles regarding these requirements, which seem like fairly common ones, but I haven't yet found a perfect match.
It is at the moment NOT a requirement to query for all users that have performed a specific type of event.
Thanks in advance!
Edit:
The service is meant to scale to millions of requests / hour so I've been playing around with the idea of denormalizing this for performance reasons. Given that the requirements are set in stone:
10 last events
No events older than 10 days
I'm actually considering a pivoted table like this:
USERID | EV_1 | TS_1 | EV_2 | TS_2 | EV_3 | TS_3 | etc up to 10...
======================================================================
1 | Create | 2014.. | Del x | 2013.. | etc.. | 2013.. |
This way I can probably shift the events with a MERGE with SELECT and I get eviction for "free". Then I only have to purge all records where TS_1 is older than 10 days. I can also filter in my application logic to only show the events that are newer than 10 days after doing the trivial selects.
The caveat is if events come in "out of order". The idea above works if I can always guarantee that the events are ordered from "left to right". Probably have to think a bit on that one.
Aside from the fact that it is basically a big break with the relational data model, do you think I'm on the right track here when it comes to prioritizing performance above all?
Your table design is good. Consider also the indexes you want to use. In practice, you will need a multi-column index on (userid, timestamp) to quickly respond to queries that query the last N events having a certain userid. Then you need a single-column index on (timestamp) to efficiently delete old events.
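For example (index and table names are illustrative):

CREATE INDEX ix_events_user_ts ON events (userid, timestamp);
CREATE INDEX ix_events_ts ON events (timestamp);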
How many events are you planning to store, and how many are you planning to retrieve per query? I.e. does the size of the table exceed the available RAM? Are you using traditional spinning hard disks or solid-state disks? If the table exceeds the available RAM and you are using traditional HDDs, note that each row returned by the query takes about 5-15 milliseconds due to slow seek time.
If your system supports batch jobs, I would use a batch job to delete old events instead of deleting old events at each query. The reason is that batch jobs do not slow down the interactive code path, and can perform more work at once provided that you execute the batch job rarely enough.
If your system doesn't support batch jobs, you could use a probabilistic approach to delete old events, i.e. on each query, delete old events only with 1% probability. Alternatively, you could have a helper table into which you store the timestamp of the last deletion of old events, then check that timestamp and, if it's old enough, perform a new delete job and update the timestamp. The helper table should be so small that it will always stay in the cache.
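A sketch of the helper-table variant (table and column names are illustrative):

-- helper table holding a single row with the time of the last purge
-- CREATE TABLE purge_log (last_purge TIMESTAMP);

-- on each request, check whether a purge is due
SELECT last_purge FROM purge_log;

-- if last_purge is old enough (say, more than an hour ago), purge and record it
DELETE FROM events WHERE timestamp < CURRENT_DATE - INTERVAL '10' DAY;
UPDATE purge_log SET last_purge = CURRENT_TIMESTAMP;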
My inclination is not to delete data. I would just store the data in your structure and have an interface (perhaps a view or table functions) that runs a query such as;
select s.*
from simple s
where s.timestamp >= CURRENT_DATE - interval 'n days' and
s.UserId = $userid
order by s.timestamp desc
fetch first 10 row only;
(Note: this uses standard syntax because you haven't specified the database, but there is similar functionality in any database.)
For performance, you want an index on simple(UserId, timestamp). This will do most of the work.
If you really want, you can periodically delete older rows. However, keeping all the rows is advantageous for responding to changing requirements ("Oh, we now want 60 days instead of 30 days") or other purposes, such as investigations into user behaviors and changes in events over time.
There are situations that are out-of-the-ordinary where you might want a different approach. For instance, there could be legal restrictions on the amount of time you could hold the data. In that case, use a job that deletes old data and run it every day. Or, if your database technology were an in-memory database, you might want to restrict the size of the table so old data doesn't occupy much memory. Or, if you had really high transaction volumes and lots of users (like millions of users with thousands of events), you might be more concerned with data volume affecting performance.
I've been asked to do snapshots of certain tables from the database, so that in the future we can have a clear view of the situation for any given day in the past. Let's say that one of those tables looks like this:
GKEY | Time_in          | Time_out         | Category | Commodity
-----+------------------+------------------+----------+----------
1001 | 2014-05-01 10:50 | NULL             | EXPORT   | Apples
1002 | 2014-05-02 11:23 | 2014-05-20 12:05 | IMPORT   | Bananas
1003 | 2014-05-05 11:23 | NULL             | STORAGE  | NULL
The simplest way to do a snapshot would be to create a copy of the table with an extra column SNAPSHOT_TAKEN (datetime) and populate it with an INSERT statement:
INSERT INTO UNITS_snapshot (SNAPSHOT_TAKEN, GKEY,Time_in, Time_out, Category, Commodity)
SELECT getdate() as SNAPSHOT_TAKEN, * FROM UNITS
OK, it works fine, but it would make the destination table quite big pretty soon, especially if I'd like to run this query often. A better solution would be to check for changes between the current live table and the latest snapshot and write only those down, omitting everything that hasn't changed.
Is there a simple way to write such a query?
EDIT: Possible solution for the "Forward delta" (assuming no deletes from original table)
INSERT INTO UNITS_snapshot
SELECT getdate() AS snap_date,
       r.*, -- here goes all data from the original table
       CASE WHEN b.gkey IS NULL THEN 'I' ELSE 'U' END AS change_type
FROM UNITS r
LEFT OUTER JOIN UNITS_snapshot b
  ON  r.gkey = b.gkey
  AND b.snap_date = (SELECT MAX(snap_date) FROM UNITS_snapshot)
WHERE b.gkey IS NULL                 -- new row: 'I'
   OR r.time_in   <> b.time_in      -- changed row: 'U'
   OR r.time_out  <> b.time_out
   OR r.category  <> b.category
   OR r.commodity <> b.commodity
Assumptions: no rows are deleted from the original table. Probably every field in the WHERE clause should also be wrapped in COALESCE(xxx, '') to avoid comparing NULL values with set ones.
Both Dan Bracuk and ITroubs have made very good comments.
Solution 1 - Daily snapshot
The first solution you proposed is very simple. You can build the snapshot with a simple query and you can also consult it and rebuild any day's snapshot with a very simple query, by just filtering on the SNAPSHOT_TAKEN column.
If you have just some thousands of records, I'd go with this one, without worrying too much about its growing size.
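For example, to rebuild the snapshot for a particular day you just filter on that day (the dates are illustrative; the range predicate accounts for the time portion of getdate()):

SELECT *
FROM UNITS_snapshot
WHERE SNAPSHOT_TAKEN >= '2014-05-20'
  AND SNAPSHOT_TAKEN <  '2014-05-21'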
Solution 2 - Daily snapshot with rolling history
This is basically the same as solution 1, but you keep only some of the snapshots over time... to avoid having the snapshot DB growing indefinitely over time.
The simplest approach is just to save the snapshots of the last N days... maybe a month or two of data. A more sophisticated approach is to keep snapshots with a density that depends on age... so, for example, you could have every day of the last month, plus every Sunday of the last 3 months, plus every end-of-month of the last year, etc...
This solution requires you to develop a procedure to handle deletion of the snapshots that are not required any more. It's not as simple as using getdate() within a query, but you obtain a good balance between space and historic information. You just need to work out a snapshot retention strategy that suits your needs.
Solution 3 - Forward row delta
Building any type of delta is a much more complex procedure.
A forward delta is built by storing the initial snapshot (as if all rows had been inserted on that day) and then, on the following snapshots, just storing information about the difference between snapshot(N) and snapshot(N-1). This is done by analyzing each row and just storing the data if the row is new or updated or deleted. If the main table does not change much over time, you can save quite a lot of space, as no info is stored for unchanged rows.
Obviously, to handle deltas, you now need 2 extra columns, not just one:
delta id (your snapshot_taken is good, if you only want 1 delta per day)
row change type (could be D=deleted, I=inserted, U=updated... or something similar)
The main complexity derives from the necessity to identify rows (usually by primary key) so as to calculate if between 2 snapshots any individual row has been inserted, updated, deleted... or none of the above.
The other complexity comes from reading the snapshot DB and building the latest (or any other) snapshot. This is necessary because, having only row differences in the table, you cannot simply select a day's snapshot by filtering on snapshot_taken.
This is not easy in SQL. For each row you must take into account just the final version... the one with MAX snapshot_taken that is <= the date of the snapshot you want to build. If it is an insert or update, then keep the data for that row, else (if it is a delete) then ignore it.
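As a sketch in SQL Server syntax, rebuilding the snapshot as of a given date from a forward-delta table could look like this (assuming gkey identifies rows and change_type holds the I/U/D flag):

SELECT *
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY s.gkey
                              ORDER BY s.snapshot_taken DESC) AS rn
    FROM UNITS_snapshot s
    WHERE s.snapshot_taken <= @as_of_date
) latest
WHERE latest.rn = 1
  AND latest.change_type <> 'D'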
To build a delta of snapshot(N), you must first build the latest snapshot (N-1) from the snapshot DB. Then you must compare the two snapshots by primary key or row identity and calculate the change type (I/U/D) and insert the changes in the snapshot DB.
Beware that you cannot delete old snapshot data without consolidating it first. That is because all snapshots are calculated from the oldest initial one plus all the subsequent difference data. If you want to remove a year's worth of old snapshots, you'll have to consolidate the old initial snapshot and all that year's variations into a new initial snapshot.
Solution 4 - Backward row delta
This is very similar to solution 3, but a bit more complex.
A backward delta is built by storing the final snapshot and then, on the following snapshots, just storing information about the difference between snapshot(N-1) and snapshot(N).
The advantage is that the latest snapshot is always readily available through a simple select on the snapshot DB. You only need to merge the difference data when you want to retrieve an older snapshot. Compare this to the forward delta, where you always need to rebuild the snapshot from the difference data unless you are actually interested in the very first snapshot.
Another advantage (compared to solution 3) is that you can remove older snapshots by just deleting the difference data older than a particular snapshot. You can do this easily because snapshots are calculated from the final snapshot and not from the initial one.
The disadvantage is the more obscure logic. Difference data is calculated backwards. Values must be stored for the (U)pdate and (D)elete variations, but are unnecessary for the (I)nsert variations. Going backwards, rows must be ignored if the first variation you find is an (I)nsert. Doable, but a bit trickier.
Solution 5 - Forward and backward column delta
If the main table has many columns, or many long text or varchar columns, and only a bunch of these are updated, then it could make sense to store only column variations instead of row variations.
This is done by using a table with this structure:
delta id (your snapshot_taken is good, if you only want 1 delta per day)
change type (could be D=deleted, I=inserted, U=updated... or something similar)
column name
value
The difference can be calculated forward or backward, as per row deltas.
I've seen this done, but I really advise against it. There are just too many disadvantages and added complexity.
Value is a text or varchar, and there are typecasting issues to handle if you have numeric, boolean or date/time values... and, if you have a lot of these, it could very well be you won't be saving as much space as you think you are.
Rebuilding any snapshot is hell. Altogether... any operation on this type of table really requires a lot of knowledge of the main table's structure.
I have a relatively large table (~100m records) which is basically an XML store. There can be multiple XML documents with different timestamps (with the logic that the latest timestamp = the most recent version). We're expecting monthly batches of updated data, probably with new versions of ~70% of the data.
We're planning on only keeping the most recent 2-3 versions in the store, so I'm guessing our current b-tree index on (record ID, timestamp) is not necessarily the fastest? A straight-forward "select * from table where timestamp >= yyyy-mm-dd order by record id, timestamp" query took 15 hours to complete last night - pretty high-spec kit and I don't think anyone else was using the DB at the time.
(re: the query itself, ideally I only want to select the most recent document with timestamp >= yyyy-mm-dd, but that's less of an issue for now).
Is there any way I can create an auto-decrement column, as follows:
Record ID Timestamp Version XML
1 2011-10-18 1 <...>
1 2011-10-11 2 <...>
1 2011-10-04 3 <...>
2 2011-10-18 1 <...>
2 2011-10-11 2 <...>
etc etc - i.e. as a new version comes along, the most recent timestamp = version 1, and all the older records get version = version + 1. This way my house-keeping scripts can be a simple "delete where version > 3" (or whatever we decide to keep), and I can have a b-tree index on record ID, and a binary index on version?
Hope I'm not barking completely up the wrong tree - have been "creatively Googling" all morning and this is the theory I've come up with...
I'm not sure decrementing the version would be a good idea. The only way to do it would be with triggers looking up matching record ids and updating them accordingly. This wouldn't be great for performance.
This is how I do something similar in our database environment (which is of a similar size). Hopefully it's useful:
Create a separate archive table that will hold all versions of your records. This will be populated by a trigger on insert to your main table. The trigger will insert the current version of the record into your archive, and update the record on the master table, incrementing the version number and updating the timestamp and data.
Then, when you only need to select the latest version of all records, you simply do:
SELECT * FROM TABLE;
If you need the ability to view 'snapshots' of how the data looked at a given point in time, you will also need valid_from and valid_to columns on the table to record the times at which each version of the record was the latest version. You can populate these using the triggers when you write to the archive table.
Valid_to on the latest version of a record can be set to the maximum date available. When a newer version of a record is inserted, you'd update the valid_to of the previous version to be just before the valid_from of the new record (it's not the same, to avoid dupes).
Then, when you want to see how your data looked at a given time, you query the archive table using SQL like:
SELECT *
FROM ARCHIVE_TABLE a
WHERE <time you're interested in> BETWEEN a.valid_from AND a.valid_to
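A rough sketch of such a trigger, assuming Oracle syntax (the other answer mentions NOLOGGING, so Oracle seems likely) and illustrative table/column names:

CREATE OR REPLACE TRIGGER trg_master_archive
BEFORE UPDATE ON master_table
FOR EACH ROW
BEGIN
  -- copy the outgoing version into the archive, closing its validity window
  INSERT INTO archive_table (record_id, xml_doc, version, valid_from, valid_to)
  VALUES (:OLD.record_id, :OLD.xml_doc, :OLD.version,
          :OLD.valid_from, SYSTIMESTAMP - INTERVAL '1' SECOND);

  -- the updated row on the master table becomes the latest version
  :NEW.version    := :OLD.version + 1;
  :NEW.valid_from := SYSTIMESTAMP;
  :NEW.valid_to   := DATE '9999-12-31';
END;
/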
Batch work is definitely different from the typical insert/update approach (especially if triggers or many indexes are involved). Even with decent disks/hardware, you'll find the traditional DML approach is very slow at this volume. For 100mm+ row tables where you're updating 70mm rows in a batch each month, I would suggest looking into an approach similar to this:
Load the new batch file (70mm rows) into a separate table (NEW_XML) with the same format as the existing table (EXISTING_XML). Use a direct-path (NOLOGGING) load to minimize redo and undo.
Append (nologging) records from EXISTING_XML that don't exist in NEW_XML (30mm recs, based on whatever key(s) you already use).
Rename EXISTING_XML to HISTORY_XML and NEW_XML to EXISTING_XML. Here you'll need some downtime, off hours over a weekend perhaps. This won't take any time really, but you'll need time for next step (and due to object invalidations). If you already have a HISTORY_XML from previous month, truncate and drop it first (keep 1 month of old data).
Build indexes, stats, constraints, etc on EXISTING_XML (which now contains the new data as well). Recompile any invalidated objects, use logging, etc.
So in a nutshell, you'll have a table (EXISTING_XML) that not only has the new data, but was built relatively quickly (many times faster than DML/trigger approach). Also, you may try using parallel for step 2 if needed.
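A condensed sketch of those steps (the key column record_id is illustrative; the direct-path/parallel hints mentioned above are omitted for simplicity and depend on your environment):

-- 1) stage the monthly batch in NEW_XML (same structure as EXISTING_XML)
CREATE TABLE new_xml NOLOGGING AS
SELECT * FROM existing_xml WHERE 1 = 0;
-- ... load the 70mm batch rows into NEW_XML (SQL*Loader, external table, etc.)

-- 2) carry over the ~30mm rows that were not replaced by the batch
INSERT INTO new_xml
SELECT e.*
FROM   existing_xml e
WHERE  NOT EXISTS (SELECT 1 FROM new_xml n WHERE n.record_id = e.record_id);
COMMIT;

-- 3) swap the tables during the maintenance window
ALTER TABLE existing_xml RENAME TO history_xml;
ALTER TABLE new_xml RENAME TO existing_xml;

-- 4) rebuild indexes, constraints and statistics on the new EXISTING_XML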
Hope that helps.
I have an audit table set up which essentially mirrors one of my tables, along with a date, user and command type. Here's what it might look like:
AuditID | UserID | Individual   | modtype | user | audit_performed
--------+--------+--------------+---------+------+------------------------
1       | 1239   | Day Meff     | INSERT  | dbo  | 2010-11-04 14:50:56.357
2       | 2334   | Dasdf fdlla  | INSERT  | dbo  | 2010-11-04 14:51:07.980
3       | 3324   | Dasdf fdla   | DELETE  | dbo  | 2010-11-04 14:51:11.130
4       | 5009   | Day Meffasdf | UPDATE  | dbo  | 2010-11-04 14:51:12.777
Since these types of tables can get big pretty quickly, I was thinking of putting in some sort of automatic delete of the older rows. So, for example, if I have 3 months of history, I could delete the first month while retaining the last two. And again, all of this must be automatic - I imagine once a certain date is hit, a query activates and deletes the oldest month of audit data. What is the best way to do this?
I'm using SQL Server 2005 by the way.
A SQL agent job should be fine here. You definitely don't need to do this on every single insert with a trigger. I doubt you even need to do it every day. You could schedule a job that runs once a month and clears out anything older than 2 months (so at most you'd have 3 months of data minus 1 day at any given time).
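For example, the scheduled job could just run a delete like this (the audit table name is illustrative; audit_performed is the timestamp column shown above):

DELETE FROM audit_table
WHERE audit_performed < DATEADD(month, -2, GETDATE());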
You could use SQL Server Agent. You can schedule a repeating job, such as deleting entries from the current audit table after a certain period. Here is how you would do it.
I would recommend storing the old data in another table, say audit_archive, and deleting it from the current audit table. That way, in case you want some history you still have it, and your table also doesn't get too big.
You could try a trigger: every time a row is added, it will clear anything older than 3 months.
You could also try SQL Agent to run a script every day that will do that.
Have you looked at using triggers? You could define a trigger to run when you add a row (on INSERT) that deletes any rows that are more than three months old.