What I am trying to do
I am developing a web service, which runs in multiple server instances, all accessing the same RDBMS (PostgreSQL). While the database is needed for persistence, it contains very little data, which is why every server instance has a cache of all the data. Further the application is really simple in that it only ever inserts new rows in rather simple tables and selects that data in a scheduled fashion from all server instances (no updates or changes... only inserts and reads).
The way it is currently implemented
basically I have a table which roughly looks like this:
id BIGSERIAL,
creation_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-- further data columns...
The server is doing something like this every couple of seconds (pseudocode):
get all rows with creation_timestamp > lastMaxTimestamp
lastMaxTimestamp = max timestamp for all data just retrieved
insert new rows into application cache
The issue I am running into
The application skips certain rows when updating the caches. I analyzed the issue and figured out, that the problem is caused in the following way:
one server instance is creating a new row in the context of a transaction. An id for the new row is retrieved from the associated sequence (id=n) and the creation_timestamp (with value ts_1) is set.
another server does the same in the context of a different transaction. The new row in this transaction gets id=n+1 and a creation_timestamp ts_2 (where ts_1 < ts_2).
transaction 2 finishes before transaction 1
one of the servers executes a "select all rows with creation_timestamp > lastMaxTimestamp". It gets row n+1, but not n1. It sets lastMaxTimestamp to ts_2.
transaction 1 completes
some time later the server from step 4 executes "select all rows with creation_timestamp > lastMaxTimestamp" again. But since lastMaxTimestamp=ts_2 and ts_2>ts_1 the row n will never be read on that server.
Note: CURRENT_TIMESTAMP has the same value during a transaction, which is the transaction start time.
So the application gets inconsistent data into its cache and can't get new rows based on the insertion timestamp OR based on the sequence id. Transaction isolation levels don't really change anything about the situation, since the problem is created in essence by transaction 2 finishing before transaction 1.
My question
Am I missing something? I am thinking there must be a straightforward way to get all new rows of a RDBMS, but I can't come up with a simple solution... at least with a simple solution that is consistent. Extensive locking (e.g. of tables) wouldn't be acceptable because of performance reasons. Simply trying to ensure to get all ids from that sequence seems like a) a complicated solution and b) can't be done easily, since rollbacks during transactions can happen (which would lead to sequence ids not being used).
Anyone has the solution?
After a lot of searching, I found the right keywords to google for... "transaction commit timestamp" to leads to all sorts of transaction timestamp tracking and system columns like xmin:
https://dba.stackexchange.com/questions/232273/is-there-way-to-get-transaction-commit-timestamp-in-postgres
This post has some more detailed information:
Questions about Postgres track_commit_timestamp (pg_xact_commit_timestamp)
In short:
you can turn on a postgresql option to track timestamps of commits and compare those instead of the current_timestamps/clock_timestamps inside the transaction
it seems though, that it is only tracked when a transaction is completed - not when it is commited, which makes the solution not bullet proof. There are also further issue to consider like transaction id (xmin) rollover for example
logical decoding / replication is something to look into for a proper solution
Thanks to everyone trying to help me find an answer. I hope this summary is useful to someone in the future.
Related
Context : I expose an API that provides only the new stuff presents in a PostgreSQL table. "New" means added since the last call.
To do that, I have switched on the track_commit_timestamp option defined into my PostgreSQL server and ordered table content by this commit date. It works well.
To provide the new stuff, I keep the most recent commit timestamp presents on my SQL select response and uses it to build the next request like a starting point (date >= to this value).
But, I have a question about this mechanism (track_commit_timestamp) : Is it possible for 2 simultaneous commits to have the same timestamp and are not "visibles" at the same moment ? And so, if my "SQL select" request is executed during this small interval I lost the second transaction content ?
I don't think "simultaneous" makes sense here.
Imagine two transactions writing to your database (A,B) and one reading from it (C).
time------------>
A->|
B------------>|
C-->|
Now, (C) will of course see A with a timestamp of T1. B also has a timestamp of T1 but hasn't committed yet. As a result, C can't see it since it may be rolled back.
So, you can only reliably access data for transactions that completed before C started. The latest viable timestamp is just before the oldest concurrent transaction.
There are two simplifications that might help you and one alternative approach.
Firstly, if all your transactions are short (say under 1 second) then you simply subtract 1 second from your transaction's start time and read changes older than that. This is actually quite commonly workable.
Secondly, if concurrency is not a concern you can enforce serialisation perhaps with transaction levels or through an explicit lock.
The alternative is to stop worrying about timestamps and instead consider your situation as a stream of changes. This would be the approach taken by various trigger-based replication systems or the logical replication supported as an add-on in 9.5(?) onwards and natively in version 10.
I am implementing a CQRS pattern where one or more processes are inserting records into the database and one or more processes are pulling them at a difference pace.
I'd like consumer processes to poll the database for new records that were inserted since last check, but I'm not sure how to (safely) implement this.
You can assume that rows will not change once they are inserted. It seems it isn't enough for each row to have a unique id, and a timestamp indicating when it was inserted.
If I query for records with a timestamp greater than the last row I saw then I run into problems if multiple records were inserted at the same time (having the same timestamp).
If I query for records with an id greater than the last row I saw then I run into problems where concurrent transactions may commit IDs in non-increasing order (e.g. postgreSQL sessions allocate and cache sequence IDs ahead of time to improve performance).
Ideally, I am looking for a DBMS-agnostic solution and be able to consume data as close to real-time as possible. Any ideas?
Clarification: Each row should be consumed multiple times, once per consumer. Meaning, just because one consumer processes a row should not prevent other consumers from doing so. Each consumer will do something different with the same data.
Since you have a lot of data coming in and might have multiple records for the last time stamp, you need a way to keep track of the data read. Here are a few different approaches with their pro and cons:
You can wait for the data to come in for a time stamp. You would do this by not reading the MAX(timestamp) so you would get all the data from the table except the last one for which the data might still be coming in.
Pro: Simple design
Con: Not real time processing
You can store the id's you have read each time for the last time stamp. When getting the data, you can use a query like (timestamp = lasttimestamp and id not in (set of ids)) or timestamp > lasttimestamp)
Pro: Almost real time
Con: Additional storage required
If you don't use sharding or similar:
You can use optimistic locking.
For this you can create an order column, with an unique index on the records table (the Log). Before each insertion, the producer query the Log for the greatest order, it increments it and insert the next record with this order.
If a concurrency exception occurs (i.e. Duplicate entry '12345' for key order) then you retry the entire process (query, increment, insert).
If you use sharding or similar:
Then you will need an additional service/table that will generate a new, unique, always-increasing order integer every time it is asked to do so.
This has the disadvantage that there is another piece that must be managed, a single point of failure that must be highly-available.
P.S.
"sharding or similar" means that you can't have unique indexes on the entire table because you use sharding or you write to multiple tables.
you can't rely on the timestamps or anything that relates to physical time because the system time may be adjusted, by an automated service (NTP) or by an human operator.
A lot of events of various type is triggered in my application - I want to count them over time to keep track.
I'm trying to figure out the best way to do this. I will have multiple servers and threads saving events, so it has to work during concurrency.
I cannot have one row per event, since the number of events is very large, it has to be aggregated somehow.
So, I'm trying to have a table row per event type and "time interval" like
COLUMN
------
ID
EVENTTYPE
COUNT
FIRSTTIMESTAMP
LASTTIMESTAMP
I first tried to make a solution where a new row was created by the logger:
UDPATE EVENTCOUNTER SET COUNT = COUNT + 1 WHERE LASTTIMESTAMP > CURRENT TIMESTAMP and EVENTTYPE = ?;
If num rows updated = 0 then insert a new row with new timestamps.
However, to make this work, I would have to lock the entire table so that there will no be a race condition creating new rows from multiple threads.
i.e.
LOCK TABLE EVENTCOUNTER ..
UPDATE ..
if numRows = 0 then INSERT ..
COMMIT
Will this table lock impact performance by a great deal? Is there a better way to solve my problem without table locks?
Using a DB2 database and Java Client - actually Hibernate if that matters.
I would insert a row for each event and run a script every 24h or so that aggregates the information and puts the aggregated in a separate table. This is the classical way OLAP (analisys services etc.) works.
Since you have many inputs, consider using a less restrictive lock (here are the transaction locks for db2). After all, if your aggregated result misses a few entries out of hundreds it's not that bad.
I do not know much about DB2 but in Oracle there are Insert/Update row or statement level triggers. So, every time user inserts/updates a row trigger fires and does smth. (row level trigger) like update some log or other table as in your case. Not sure if this feature is available in DB2. Just FYI...
The Requirements
I have a following table (pseudo DDL):
CREATE TABLE MESSAGE (
MESSAGE_GUID GUID PRIMARY KEY,
INSERT_TIME DATETIME
)
CREATE INDEX MESSAGE_IE1 ON MESSAGE (INSERT_TIME);
Several clients concurrently insert rows in that table, possibly many times per second. I need to design a "Monitor" application that will:
Initially, fetch all the rows currently in the table.
After that, periodically check if there are any new rows inserted and then fetch
these rows only.
There may be multiple Monitors concurrently running. All the Monitors need to see all the rows (i.e. when a row is inserted, it must be "detected" by all the currently running Monitors).
This application will be developed for Oracle initially, but we need to keep it portable to every major RDBMS and would like to avoid as much database-specific stuff as possible.
The Problem
The naive solution would be to simply find the maximal INSERT_TIME in rows selected in step 1 and then...
SELECT * FROM MESSAGE WHERE INSERT_TIME >= :max_insert_time_from_previous_select
...in step 2.
However, I'm worried this might lead to race conditions. Consider the following scenario:
Transaction A inserts a new row but does not yet commit.
Transaction B inserts a new row and commits.
The Monitor selects rows and sees that the maximal INSERT_TIME
is the one inserted by B.
Transaction A commits. At this point, A's INSERT_TIME is actually
earlier than the B's (A's INSERT was actually executed before
B's, before we even knew who is going to commit first).
The Monitor selects rows newer than B's INSERT_TIME (as a consequence of step 3). Since A's INSERT_TIME is earlier than B's insert time, A's row is skipped.
So, the row inserted by transaction A is never fetched.
Any ideas how to design the client SQL or even change the database schema (as long as it is mildly portable), so these kinds of concurrency problems are avoided, while still keeping a decent performance?
Thanks.
Without using any of the platform-specific change data capture (CDC) technologies, there are a couple of approaches.
Option 1
Each Monitor registers a sort of subscription to the MESSAGE table. The code that writes messages then writes each MESSAGE once per Monitor, i.e.
CREATE TABLE message_subscription (
message_subscription_id NUMBER PRIMARY KEY,
message_id RAW(32) NOT NULLL,
monitor_id NUMBER NOT NULL,
CONSTRAINT uk_message_sub UNIQUE (message_id, monitor_id)
);
INSERT INTO message_subscription
SELECT message_subscription_seq.nextval,
sys_guid,
monitor_id
FROM monitor_subscribers;
Each Monitor then deletes the message from its subscription once that is processed.
Option 2
Each Monitor maintains a cache of the recent messages it has processed that is at least as long as the longest-running transaction could be. If the Monitor maintained a cache of the messages it has processed for the last 5 minutes, for example, it would query your MESSAGE table for all messages later than its LAST_MONITOR_TIME. The Monitor would then be responsible for noting that some of the rows it had selected had already been processed. The Monitor would only process MESSAGE_ID values that were not in its cache.
Option 3
Just like Option 1, you set up subscriptions for each Monitor but you use some queuing technology to deliver the messages to the Monitor. This is less portable than the other two options but most databases can deliver messages to applications via queues of some sort (i.e. JMS queues if your Monitor is a Java application). This saves you from reinventing the wheel by building your own queue table and gives you a standard interface in the application tier to code against.
You need to be able to identify all rows added since the last time you checked (i.e. the monitor checks). You have a "time of insert" column. However, as you spell it out, that time of insert column cannot be used with "greater than [last check]" logic to reliably identify subsequently inserted new items. Commits do not occur in the same order as (initial) inserts. I am not aware of anything that works on all major RDBMSs that would clearly and safely apply such an "as of" tag at the actual time of commit. [This is not to say I would know it if such a thing existed, but it seems a pretty safe guess to me.] Thus, you will have to use something other than a "greater than last check" algorithm.
That leads to filtering. Upon insert, an item (row) is flagged as "not yet checked"; when a montior logs in, it reads all not yet checked items, returns that set, and flips the flag to "checked" (and if there are multiple monitors, this must all be done within its own transaction). The monitors' queries will have to read all the data and pick out which have not yet been checked. The implication is, however, that this will be a fairly small set of data, at least relative to the entire set of data. From here, I see two likely options:
Add a column, perhaps "Checked". Store a binary 1/0 value for is/isnot checked. The cardinality of this value will be extreme -- 99.9s Checked, 00,0s Unchecked, so it should be rather efficient. (Some RDBMSs provide filtered queries, such that the Checked rows won't even be in the index; once flipped to checked, a row will presumably never be flipped back, so the overhead to support this shouldn't be too great.)
Add a separate table identify those rows in the "primary" table that have not yet been checked. When a montior logs in, it reads and deletes the items from that table. This doesn't seem efficient... but again, if the data set involved is small, the overall performance pain might be acceptable.
You should use Oracle AQ with a multi-subscriber queue.
This is Oracle specific, but you can create an abstraction layer of stored procedures (or abstract in Java if you like) so that you have a common API to enqueue the new messages and have each subscriber (monitor) dequeue any pending messages. Behind that API, for Oracle you use AQ.
I am not sure if there is a queuing solution for other databases.
I don't think you will be able to come up with a totally database agnostic approach that meets your requirements. You could extend the example above that included the 'checked' column, to have a second table called monitor_checked - that would contain one row per message per monitor. That is basically what AQ does behind the scenes, so it is sort of reinventing the wheel.
With PostgreSQL, use PgQ. It has all those little details worked out for you.
I doubt you will find a robust and manageable database-agnostic solution for this.
I am trying to select 100s of rows at a DB that contains 100000s of row and update those rows afters.
the problem is I don't want to go to DB twice for this purpose since update only marks those rows as "read".
is there any way I can do this in java using simple jdbc libraries? (hopefully without using stored procedures)
update: ok here is some clarification.
there are a few instance of same application running on different servers, they all need to select 100s of "UNREAD" rows sorted according to creation_date column, read blob data within it, write it to file and ftp that file to some server. (I know prehistoric but requirements are requirements)
The read and update part is for to ensure each instance getting diffent set of data. (in order, tricks like odds and evens wont work :/)
We select data for update. the data transfers through the wire (we wait and wait) and then we update them as "READ". then release lock for reading. this entire thing takes too long. By reading and updating at the same time, I would like to reduce lock time (from time we use select for update to actual update) so that using multiple instances would increase read rows per second.
Still have ideas?
It seems to me there might be more than one way to interpret the question here.
You are selecting the rows for the
sole purpose of updating them and
not reading them.
You are selecting the rows to show
to somebody, and marking them as
read either one at a time or all as a group.
You want to select the rows and mark
them as read at the time you select
them.
Let's take Option 1 first, as that seems to be the easiest. You don't need to select the rows in order to update them, just issue an update with a WHERE clause:
update table_x
set read = 'T'
where date > sysdate-1;
Looking at option 2, you want to mark them as read when a user has read them (or a down stream system has received it, or whatever). For this, you'll probably have to do another update. If you query for the primary key, in addition to the other columns you'll need in the first select, you will probably have an easier time of updating, as the DB won't have to do table or index scans to find the rows.
In JDBC (Java) there is a facility to do a batch update, where you execute a set of updates all at once. That's worked out well when I need to perform a lot of updates that are of the exact same form.
Option 3, where you want to select and update all in one shot. I don't find much use for this, personally, but that doesn't mean others don't. I suppose some kind of stored procedure would reduce the round trips. I'm not sure what db you are working with here and can't really offer specifics.
Going to the DB isn't so bad. If you aren't returning anything 'across the wire' then an update shouldn't do you too much damage and its only a few hundred thousand rows. What is your worry?
If you're doing a SELECT in JDBC and iterating over the ResultSet to UPDATE each row, you're doing it wrong. That's an (n+1) query problem that will never perform well.
Just do an UPDATE with a WHERE clause that determines which of those rows needs to be updated. It's a single network round trip that way.
Don't be too code-centric. Let the database do the job it was designed for.
Can't you just use the same connection without closing it?