Oracle: Data consistency across multiple tables to be displayed - sql

I have 3 reports based on 3 different tables, which ideally should match each other in audit.
They are updated sequentially once in a day.
The problem here is when one of the table is updated and second one is in progress, the customer sees data discrepancy between the reports for some time.
We tried the solution where in we commit after all 3 tables are updated but we started having issue on undo tbsp. The application have many other things running on.
I am looking for a solution where in we can restrict the user to show data to a specific point, and he must see updated data only after all 3 tables are refreshed/updated.

I think you can use select * for update for all 3 tables befor start updating procedure.
In that case users can select data and will see only not changed data till update session will not finish and make commit.

You can use a flashback query to show data as-of a point in time:
select * from table1 as of timestamp timestamp '2021-12-10 12:00:00';
The application would need to determine the latest time when the tables were synchronized - perhaps with a log table that records when the update process last started. However, the flashback query also uses the UNDO tablespace. But the query should at least use less UNDO since some of the committed transactions will now free up some space.

Related

How to consistently track all new rows in a SQL database table

What I am trying to do
I am developing a web service, which runs in multiple server instances, all accessing the same RDBMS (PostgreSQL). While the database is needed for persistence, it contains very little data, which is why every server instance has a cache of all the data. Further the application is really simple in that it only ever inserts new rows in rather simple tables and selects that data in a scheduled fashion from all server instances (no updates or changes... only inserts and reads).
The way it is currently implemented
basically I have a table which roughly looks like this:
id BIGSERIAL,
creation_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-- further data columns...
The server is doing something like this every couple of seconds (pseudocode):
get all rows with creation_timestamp > lastMaxTimestamp
lastMaxTimestamp = max timestamp for all data just retrieved
insert new rows into application cache
The issue I am running into
The application skips certain rows when updating the caches. I analyzed the issue and figured out, that the problem is caused in the following way:
one server instance is creating a new row in the context of a transaction. An id for the new row is retrieved from the associated sequence (id=n) and the creation_timestamp (with value ts_1) is set.
another server does the same in the context of a different transaction. The new row in this transaction gets id=n+1 and a creation_timestamp ts_2 (where ts_1 < ts_2).
transaction 2 finishes before transaction 1
one of the servers executes a "select all rows with creation_timestamp > lastMaxTimestamp". It gets row n+1, but not n1. It sets lastMaxTimestamp to ts_2.
transaction 1 completes
some time later the server from step 4 executes "select all rows with creation_timestamp > lastMaxTimestamp" again. But since lastMaxTimestamp=ts_2 and ts_2>ts_1 the row n will never be read on that server.
Note: CURRENT_TIMESTAMP has the same value during a transaction, which is the transaction start time.
So the application gets inconsistent data into its cache and can't get new rows based on the insertion timestamp OR based on the sequence id. Transaction isolation levels don't really change anything about the situation, since the problem is created in essence by transaction 2 finishing before transaction 1.
My question
Am I missing something? I am thinking there must be a straightforward way to get all new rows of a RDBMS, but I can't come up with a simple solution... at least with a simple solution that is consistent. Extensive locking (e.g. of tables) wouldn't be acceptable because of performance reasons. Simply trying to ensure to get all ids from that sequence seems like a) a complicated solution and b) can't be done easily, since rollbacks during transactions can happen (which would lead to sequence ids not being used).
Anyone has the solution?
After a lot of searching, I found the right keywords to google for... "transaction commit timestamp" to leads to all sorts of transaction timestamp tracking and system columns like xmin:
https://dba.stackexchange.com/questions/232273/is-there-way-to-get-transaction-commit-timestamp-in-postgres
This post has some more detailed information:
Questions about Postgres track_commit_timestamp (pg_xact_commit_timestamp)
In short:
you can turn on a postgresql option to track timestamps of commits and compare those instead of the current_timestamps/clock_timestamps inside the transaction
it seems though, that it is only tracked when a transaction is completed - not when it is commited, which makes the solution not bullet proof. There are also further issue to consider like transaction id (xmin) rollover for example
logical decoding / replication is something to look into for a proper solution
Thanks to everyone trying to help me find an answer. I hope this summary is useful to someone in the future.

How to find list of rows being affected in an update?

Yesterday I made a data correction in my Oracle table in production environment. But later I found out that however my selection command fetched 62 rows, apparently with the same conditions 64 rows got updated. Since I do not have the list of rows affected in that update, I am unable to compare the list of rows selected and later the list of rows updated. So is there a way to find the list of rows that were updated on that particular time on that table, say from 16:20 to 16:21? Does Oracle keep track on which time which rows of a given table were updated?
If you have flashback enabled on your database you can check the data at a particular time in the database.
SELECT * FROM table
AS OF TIMESTAMP
TO_TIMESTAMP('2015-01-14 13:33:00', 'YYYY-MM-DD HH:MI:SS')
WHERE column = 'your value';
Using this you can check the data before and after for the records you suspect.
If you cant find the records, then you can use
SELECT SCN_TO_TIMESTAMP(ORA_ROWSCN) FROM table where <<your condition>>
However SCN_TO_TIMESTAMP(ORA_ROWSCN) can be obtained only for few records which were updated recently. You can only convert to and from SCNs that are in the redo/flashback window maintained by the system. Once changes age out then the mapping is lost.
Try and use oracle FLASHBACK functionality (just in case your database is configured as per flashback utility)
if enabled, it will show you your table's data at any given point of time in past.
for more details on this, please follow http://docs.oracle.com/cd/E11882_01/appdev.112/e41502/adfns_flashback.htm#ADFNS01001

No waiting while Truncate Table

I have an SSIS package that runs repeatedly after 1 hour. This package first truncates a table and then populate that table with new data. And this process takes 15-20 minutes. When this package runs, data is not available to the users. So they have to wait until package runs completely. Is there any way to handle this situation so users don't have to wait?
Do not truncate the table. Instead, add a audit column with date data type, partition the table with hourly partitions on this audit column, drop the old partition once the new partition is loaded with new data.
Make sure the users query are directed to the proper partition with the help of the audit column
You can do an 'A-B flip'.
Instead of truncating the client-facing table and reloading it, you could use two tables to do the job.
For example, if the table in question is called ACCOUNT:
Load the data to a table called STG_ACCOUNT
Rename ACCOUNT to ACCOUNT_OLD
Rename STG_ACCOUNT to ACCOUNT
Rename ACCOUNT_OLD to STG_ACCOUNT
By doing this, you minimize the amount of time the users have an empty table.
It's very dangerous practice but you can change isolation levels of your transactions (I mean users queries) from ReadCommitted/Serializable to ReadUncommitted. But the behavior of this queries is very hard to predict. If your table is under using of SSIS package (insert/delete/update) and end users do some uncommitted reads (like SELECT * FROM Table1 WITH (NOLOCK) ), some rows can be counted several times or missed.
If users want to read only 'new-hour-data' you can try to change isolation levels to 'dirty read', but be careful!
If they can work with data from previous hour, the best solution is described by Arnab, but partitions are available only in Enterprise edition. Use rename in another SQL Server editions as Zak said.
[Updated] If the main lag (tens of minutes, as you said) is caused by complex calculations (and NOT because of amount of loaded rows!), you can use another table like a buffer. Store there several rows (hundreds, thousands etc.) and then reload them to the main table. So new data will be available in portions without 'dirty read' tricks.

Finding changed records in a database table

I have a problem that I haven't been able to come up with a solution for yet. I have a database (actually thousands of them at customer sites) that I want to extract data from periodically. I'd like to do a full data extract one time (select * from table) then after that only get rows that have changed.
The challenge is that there aren't any updated date columns in most of the tables that could be used to constrain the SQL query. I can't use a trigger based approach nor change the application that writes to the database since it's another group that develops the app and they are way backed up already.
I may be able to write to the database tables when doing the data extract, but would prefer not to do that. Does anyone have any ideas for how we might be able to do this?
You will have to programatically mark the records. I see suggestions of an auto-incrementing field but that will only get newly inserted records. How will you track updated or deleted records?
If you only want newly inserted that an autoincrementing field will do the job; in subsequent data dumps grab every thing since the last value of the autoincrment field and then recrod the current value.
If you want updates the minimum I can see is to have a last_update field and probably a trigger to populare it. If the last_update is later the the last data dump grab that record. This will get inserts and updates but not deletes.
You could try something like a 'instead of delete' trigger if your RDBMS supports it and NULL the last_update field. On subsequent data dumps grap all recoirds where this field is NULL and then delete them. But there would be problems with this (e.g. how to stop the app seeing them between the logical and physical delete)
The most fool proof method I can see is aset of history (audit) tables and ech change gets written to them. Then you select your data dump from there.
By the way do you only care about know the updates have happened? What about if 2 (or more) updates have happened. The history table is the only way that I can see you capturing this scenario.
This should isolate rows that have changed since your last backup. Assuming DestinationTable is a copy of SourceTable even on the key fields; if not you could list out the important fields.
SELECT * FROM SourceTable
EXCEPT
SELECT * FROM DestinationTable

Periodically Deleting Rows in TSQL

I have an audit table setup which essentially mirrors one of my tables along with a date, user and command type. Here's how it might look like:
AuditID UserID Individual modtype user audit_performed
1 1239 Day Meff INSERT dbo 2010-11-04 14:50:56.357
2 2334 Dasdf fdlla INSERT dbo 2010-11-04 14:51:07.980
3 3324 Dasdf fdla DELETE dbo 2010-11-04 14:51:11.130
4 5009 Day Meffasdf UPDATE dbo 2010-11-04 14:51:12.777
Since these types of tables can get big pretty quick - I was thinking of putting in some sort of automatic delete of the older rows. So for example if I have 3 months of history - if I could delete the first month while retaining the last two. And again all of this must be automatic - I imagine once a certain date is hit, a query activates and deletes the oldest month with audit data. What is the best way to do this?
I'm using SQL Server 2005 by the way.
A SQL agent job should be fine here. You definitely don't need to do this on every single insert with a trigger. I doubt you even need to do it every day. You could schedule a job that runs once a month and clears out anything older than 2 months (so at most you'd have 3 months of data minus 1 day at any given time).
You could use SQL Server agent..you can schedule a repeating job like deleting entries from the current audit table after certain period. Here is how you would do it.
I would recommend storing the data in another table audit_archive table and deleting it from the current audit table. So, that in case you want some history you still have it and your table also doesn't get too big.
You could try a trigger every time a row is added it will clear anything older than 3 months.
You could also try SQL Agent to run a script every day that will do that.
Have you looked at using triggers? You could define a trigger to run when you add a row (on INSERT) that deletes any rows that are more than three months old.