I need to know when the data in a table was last modified (data inserted, created, deleted). The common answer is to extract the data from sys.dm_db_index_usage_stats, but I don't have access to that table and as I understand it, access to that is a server-level permission, and since this is on a corporate shared server, the odds of the IT department granting me access is up there with pigs flying.
Is there a way to get this information, that does not require greatly elevated privileges?
Update #1
Some additional information: What I am doing, is caching the contents of this table locally, and need to know when to update my local cache. This means:
I don't need an actual timestamp, just any value that I can compare against a locally-cached value that tells me "things have changed, update the cache"
If this value changes too often (e.g. gets reset every time they restart the server which is extremely rarely) it's OK, it just means I do an extra cache update that I didn't actually need to do
Update #2
I should have done this early on, but I just assumed I would be able to create tables as needed ... but I'm not. Permission not granted. So neither of the proposed methods will work :-(
Here's my new thought : I can call
select checksum_agg(binary_checksum(*))
and get a checksum on the entire remote table. Then, if the checksum changes, I know the table has changed and I can update my local cache. I've seen the problems with checksum, but using HASHBYTES sounds like it would be much more complicated and much slower.
This is working, the problem is that when the checksum changes, I have to reload the entire cache. My table has so many rows that returning the checksum per row takes an unacceptably long time, is there a way to use the "OVER" clause and get maybe 10 checksums, the first checksum for the first tenth of the rows, etc.?
If you can modify their schema, then you should be able to add a trigger.
Once you add a lastmodified column, you can use this trigger to get the time updated any time the record changes:
CREATE trigger [dbo].[TR_TheirTable_Timestamp]
on [dbo].[TheirTable] for update
as
begin
update
dbo.TheirTable
set
lastmodified=getdate()
from
Inserted
where
dbo.TheirTable.UniqueID=Inserted.UniqueKey
end
The reason I do it only for update, and not insert is because I can see a new record, and I don't need a timestamp, but a modified record I can compare the time I last updated the record. if you want an insert update, then
on [dbo].[TheirTable] for insert,update
would work
If you just wanted to know when the table was updated, then the trigger could write to another table with the tablename and date, and you wouldn't have to modify their schema
Related
I want to find when the last INSERT, UPDATE or DELETE statement was performed on a table (for now, in the future I want to do this in multiple tables) in an Oracle database.
I created a table and then I updated one of its rows. Now I've the following query:
SELECT SCN_TO_TIMESTAMP(ora_rowscn) from test_table;
This query returns the timestamps of each row, and for each of them it gives the time when they were first created.
But the row that I've updated have the same timestamp as the others. Why? Shouldn't the timestamp be updated?
ORA_ROWSCN is not the right solution for this. It is not necessarily reliable at the row level. Moreover, it's not going to be useful at all for deleted rows.
If you have a real need to know when DML changes were made to a table, you should look at Oracle's auditing feature.
An alternative is to use triggers to record when changes are made to the table. Since you say you only care about the time of the most recent change, you can just create a single-column table to record the time, and write a trigger that fires on any DML statement to maintain it. If you're doing this in a production environment or even just in one where more than one session might be modifying the table, you'd want to think about how it should work when concurrent changes are made. You could force the table to have at most one row, but that would serialize every change to the table. You could allow each session to insert a separate row and take the max value when querying it, but then you probably want to think about clearing out old rows from time to time.
This question already has answers here:
How can I get a hash of an entire table in postgresql?
(7 answers)
Closed 9 years ago.
Suppose you have a reasonably large (for local definitions of “large”), but relatively stable table.
Right now, I want to take a checksum of some kind (any kind) of the contents of the entire table.
The naïve approach might be to walk the entire table, taking the checksum (say, MD5) of the concatenation of every column on each row, and then perhaps concatenate them and take its MD5sum.
From the client side, that might be optimized a little by progressively appending columns' values into the MD5 sum routine, progressively mutating the value.
The reason for this, is that at some point in future, we want to re-connect to the database, and ensure that no other users may have mutated the table: that includes INSERT, UPDATE, and DELETE.
Is there a nicer way to determine if any change/s have occurred to a particular table? Or a more efficient/faster way?
Update/clarification:
We are not able/permitted to make any alterations to the table itself (e.g. adding a “last-updated-at” column or triggers or so forth)
(This is for Postgres, if it helps. I'd prefer to avoid poking transaction journals or anything like that, but if there's a way to do so, I'm not against the idea.)
Adding columns and triggers is really quite safe
While I realise you've said it's a large table in a production DB so you say you can't modify it, I want to explain how you can make a very low impact change.
In PostgreSQL, an ALTER TABLE ... ADD COLUMN of a nullable column takes only moments and doesn't require a table re-write. It does require an exclusive lock, but the main consequence of that is that it can take a long time before the ALTER TABLE can actually proceed, it won't hold anything else up while it waits for a chance to get the lock.
The same is true of creating a trigger on the table.
This means that it's quite safe to add a modified_at or created_at column and an associated trigger function to maintain them to a live table that's in intensive real-world use. Rows added before the column was created will be null, which makes perfect sense since you don't know when they were added/modified. Your trigger will set the modified_at field whenever a row changes, so they'll get progressively filled in.
For your purposes it's probably more useful to have a trigger-maintained side-table that tracks the timestamp of the last change (insert/update/delete) anywhere in the table. That'll save you from storing a whole bunch of timestamps on disk and will let you discover when deletes have happened. A single-row side-table with a row you update on each change using a FOR EACH STATEMENT trigger will be quite low-cost. It's not a good idea for most tables because of contention - it essentially serializes all transactions that attempt to write to the table on the row update lock. In your case that might well be fine, since the table is large and rarely updated.
A third alternative is to have the side table accumulate a running log of the timestamps of insert/update/delete statements or even the individual rows. This allows your client read the change-log table instead of the main table and make small changes to its cached data rather than invalidating and re-reading the whole cache. The downside is that you have to have a way to periodically purge old and unwanted change log records.
So... there's really no operational reason why you can't change the table. There may well be business policy reasons that prevent you from doing so even though you know it's quite safe, though.
... but if you really, really, really can't:
Another option is to use the existing "md5agg" extension: http://llg.cubic.org/pg-mdagg/ . Or to apply the patch currently circulating pgsql-hackers to add an "md5_agg" to the next release to your PostgreSQL install if you built from source.
Logical replication
The bi-directional replication for PostgreSQL project has produced functionality that allows you to listen for and replay logical changes (row inserts/updates/deletes) without requiring triggers on tables. The pg_receivellog tool would likely suit your purposes well when wrapped with a little scripting.
The downside is that you'd have to run a patched PostgreSQL 9.3, so I'm guessing if you can't change a table, running a bunch of experimental code that's likely to change incompatibly in future isn't going to be high on your priority list ;-) . It's included in the stock release of 9.4 though, see "changeset extraction".
Testing the relfilenode timestamp won't work
You might think you could look at the modified timestamp(s) of the file(s) that back the table on disk. This won't be very useful:
The table is split into extents, individual files that by default are 1GB each. So you'd have to find the most recent timestamp across them all.
Autovacuum activity will cause these timestamps to change, possibly quite a while after corresponding writes happened.
Autovacuum must periodically do an automatic 'freeze' of table contents to prevent transaction ID wrap-around. This involves progressively rewriting the table and will naturally change the timestamp. This happens even if nothing's been added for potentially quite a long time.
Hint-bit setting results in small writes during SELECT. These writes will also affect the file timestamps.
Examine the transaction logs
In theory you could attempt to decode the transaction logs with pg_xlogreader and find records that affect the table of interest. You'd have to try to exclude activity caused by vacuum, full page writes after hint bit setting, and of course the huge amount of activity from every other table in the entire database cluster.
The performance impact of this is likely to be huge, since every change to every database on the entire system must be examined.
All in all, adding a trigger on a table is trivial in comparison.
What about creating a trigger on insert/update/delete events on the table? The trigger could call a function that inserts a timestamp into another table which would mark the time for any table-changing event.
The only concern would be an update event updated using the same data currently in the table. The trigger would fire, even though the table didn't really change. If you're concerned about this case, you could make the trigger call a function that generates a checksum against just the updated rows and compares against a previously generated checksum, which would usually be more efficient than scanning and checksumming the whole table.
Postgres documentation on triggers here: http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html
If you simply just want to know when a table has last changed without doing anything to it, you can look at the actual file(s) timestamp(s) on your database server.
SELECT relfilenode FROM pg_class WHERE relname = 'your_table_name';
If you need more detail on exactly where it's located, you can use:
select t.relname,
t.relfilenode,
current_setting('data_directory')||'/'||pg_relation_filepath(t.oid)
from pg_class t
join pg_namespace ns on ns.oid = t.relnamespace
where relname = 'your_table_name';
Since you did mention that it's quite a big table, it will definitely be broken into segments, and toasts, but you can utilize the relfilenode as your base point, and do a ls -ltr relfilenode.* or relfilnode_* where relfilenode is the actual relfilenode from above.
These files gets updated at every checkpoint if something occured on that table, so depending on how often your checkpoints occur, that's when you'll see the timestamps update, which if you haven't changed the default checkpoint interval, it's within a few minutes.
Another trivial, but imperfect way to check if INSERTS or DELETES have occurred is to check the table size:
SELECT pg_total_relation_size('your_table_name');
I'm not entirely sure why a trigger is out of the question though, since you don't have to make it retroactive. If your goal is to ensure nothing changes in it, a trivial trigger that just catches an insert, update, or delete event could be routed to another table just to timestamp an attempt but not cause any activity on the actual table. It seems like you're not ensuring anything changes though just by knowing that something changed.
Anyway, hope this helps you in this whacky problem you have...
A common practice would be to add a modified column. If it were MySQL, I'd use timestamp as datatype for the field (updates to current date on each updade). Postgre must have something similar.
I am working on a program that is supposed to insert hundreds of rows to the database per run.
The problem is that once the inserted data is wrong, how can we recover from that run? Currently I only have a log file (I created the format), which records the raw data get inserted (no metadata nor primary keys). Is there a way we can create a log that database can understand it, and once we want to undo the insertion we feed the database with that log file.
Or, if there is alternative mechanism of undoing an operation from a program, kindly let me know, thanks.
The fact, that this is only hundreds of rows, makes it succeptible to the great-grandmother of all undo mechanisms:
have a table importruns with a row for each run you do. I assume it has an integer auto-increment PK
add a field to your data table, that identifies carries the PK of the import run
for insert-only runs, you just need to DELETE FROM sometable WHERE importid=$whatever
If you also have replace/update imports, go one step further
for each data table have a corresponding table, that has one field more: superseededby
for each row you update/replace, place an original copy of the row in this table plus the import id in superseededby
to revert, you now have to add INSERT INTO originaltable SELECT * FROM superseededtable WHERE superseededby=$whatever
You can clean up superseededtable for known-good imports, to make sure, storage doesn't grow unlimited.
You have several options. Depending on when you notice the error.
If you know there is an error with the data, the you can use the transactions API to rollback to changes of the current transaction.
In case you know there was an error only later, then you can create your own log. Make an index identifying the transaction, and add a field to the relevant table where that id would be inserted. This would allow you to identify exactly which transaction it came from. You can also create a stored procedure that deletes rows according to the given transaction id.
I need to get the latest modified time of a table, so came across
select relfilenode from pg_class where relname = 'test';
which gives me the relfilenode id, this seems to be a directory name in
L:\Databases\PostgresSQL\data\base\inodenumber
For which I later extract the latest modified time.
Is this the right way to do this or are there any better methods to do the same
Testing the mtime of the table's relfilenode won't work well. As Eelke noted VACUUM among other operations will modify the timestamp. Hint bit setting will modify the table too, causing it to appear to be "modified" by a SELECT. Additionally, sometimes a table has more than one fork to its on-disk relation (1GB chunks), and you'd have to check all of them to find the most recent.
If you want to keep a last modified time for a table, add an AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ... FOR EACH STATEMENT trigger that updates a timestamp row in a table you use for tracking modification times.
The downside of the trigger is that it'll contest a single row lock on the table, so it'll serialize all your transactions. It'll also greatly increase the chance of getting deadlocks. What you really want is probably something nontransactional that doesn't have to roll back when the transaction does, where if multiple transactions update a counter the highest value wins. There's nothing like that built in, though it might not be too hard as a C extension.
A slightly more sophisticated option would be to create a trigger that uses dblink to do an update of the last-updated counter. That'll avoid most of the contention problems but it'll actually make deadlocking worse because PostgreSQL's deadlock detection won't be able to "see" the fact that the two sessions are deadlocked via an intermediary. You'd need a way to SELECT ... FOR UPDATE with a timeout to make it reliable without aborting transactions too often.
In any case, a trigger won't catch DDL, though. DDL triggers ("Event triggers") are coming in Pg 9.3.
See also:
How do I find the last time that a PostgreSQL database has been updated?
How to get 'last modified time' of the table in postgres?
I don't think that would be completly reliable as a vacuum would also modify the file(s) containing the table but the logical content of the table does not change during a vacuum.
You could create triggers for INSERT, UPDATE and DELETE that maintain the last modified timestamp for each table in another table. This method has a slight performance impact but will provide accurate information.
I am using sql server 2000. I need to get only updated records from remote server and need to insert that record in my local server on daily basis. But that table did not have created date or modified date field.
Use Transactional Replication.
Update
If you cannot do administrative operations on the source then you'll going to have to read all the data every day. Since you cannot detect changes (and keep in mind that even if you'd have a timestamp you still wouldn't be able to detect changes because there is no way to detect deletes with a timestamp) then you have to read every row every time you sync. And if you read every row, then the simplest solution is to just replace all the data you have with the new snapshot.
You need one of the following
a column in the table which flag new or updated records in a fashion or other (lastupdate_timestamp, incremental update counter...)
some trigger on Insert and Update, on the table, which produces some side-effect such as adding the corresponding row id into a separate table
You can also compare row-by-row the data from the remote server against that of the production server to get the list of new or updated rows... Such a differential update can also be produced by comparing some hash value, one per row, computed from the values of all columns for the row.
Barring one the above, and barring some MS-SQL built-in replication setup, the only other possibility I can think of is [not pretty]:
parsing the SQL Log to identify updates and addition to the table. This requires specialized software; I'm not even sure if the Log file format is published/documented, though I have seen this types of tools. Frankly this approach is more one for forensic-type situations...
If you can't change the remote server's database, your best option may be to come up with some sort of hash function on the values of a given row, compare the old and new tables, and pull only the ones where function(oldrow) != function(newrow).
You can also just do a direct comparison of the columns in question, and copy that record over when not all the columns in question are the same between old and new.
This means that you cannot modify values in the new table, or they'll get overwritten daily from the old. If this is an issue, you'll need another table in which to cache the old table's values from the day before; then you'll be able to tell whether old, new, or both were modified in the interim.
I solved this by using tablediff utility which will compare the data in two tables for non-convergence, and is particularly useful for troubleshooting non-convergence in a replication topology.
See the link.
tablediff utility
TO sum up:
You have an older remote db server that you can't modify anything in (such as tables, triggers, etc).
You can't use replication.
The data itself has no indication of date/time it was last modified.
You don't want to pull the entire table down each time.
That leaves us with an impossible situation.
You're only option if the first 3 items above are true is to pull the entire table. Even if they did have a modified date/time column, you wouldn't detect deletes. Which leaves us back at square one.
Go talk to your boss and ask for better requirements. Maybe something that can be done this time.