Tracking appends to a table in PostgreSQL - sql

Consider this table:
create sequence entries_sequence_number_seq;
create table entries (
  sequence_number integer
    default nextval('entries_sequence_number_seq')
    primary key,
  time timestamp default now()
);
This table is used as an append-only stream of changes. There may be other tables involved in writes, but as the last SQL statement in each transaction, we insert a row into this table. In other words, our transactions may be large and time-consuming, but eventually we write this row and commit immediately.
Now we want one or more consumers that can track changes as they are appended to this table:
Each consumer needs to loop at regular intervals to fetch the next batch of changes in roughly chronological order — in other words, the delta of new rows appended to entries since the last time the consumer polled.
The consumer always goes forward in time, never backwards.
Each consumer gets all the data. There's no need for selective distribution.
Order of consumption is not important. However, the consumer eventually must see all committed entries: If an in-flight transaction commits a new entry to the table, it must be picked up.
We’d like to minimize the possibility of ever seeing the same row twice, but we can tolerate it if it happens.
Conceptually:
select * from entries where sequence_number > :high_watermark
…where the high_watermark is the highest number seen by the consumer.
However, since nextval() is computed before commit time, you can get into a situation where there are gaps caused by in-flight transactions that haven’t committed yet. You may have a race condition happen like so:
Assume the world starts at sequence number 0.
Writer A txn: Inserts, gets sequence number 1.
Writer B txn: Inserts, gets sequence number 2.
Writer B txn commits.
Newest sequence number is now 2.
The consumer does its select on > 0, finds the entry with sequence number 2, and sets it as high_watermark.
Writer A txn commits.
The consumer does its select on > 2, and thus never sees the entry with sequence number 1.
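The interleaving above can be replayed as a toy model in Python. This is not PostgreSQL: a dict stands in for the set of committed rows, and a row appears in it only when its writer commits. It only illustrates why a single high watermark loses sequence number 1.

```python
# Toy model of the interleaving above (not PostgreSQL): `committed` holds only
# the rows whose writer transaction has committed, i.e. what a consumer can see.
committed = {}

def poll(high_watermark):
    """The naive consumer query: sequence_number > :high_watermark."""
    return sorted(n for n in committed if n > high_watermark)

seen = []
high_watermark = 0

a_seq, b_seq = 1, 2               # A and B each call nextval() inside open transactions

committed[b_seq] = "row from B"   # B commits first
batch = poll(high_watermark)      # select ... > 0 finds only 2
seen += batch
high_watermark = max(batch, default=high_watermark)  # watermark jumps to 2

committed[a_seq] = "row from A"   # A commits afterwards
batch = poll(high_watermark)      # select ... > 2 finds nothing
seen += batch
high_watermark = max(batch, default=high_watermark)

print(seen)  # [2]: sequence number 1 is never delivered
```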
The race condition is probably very small in the general case, but it’s still a possibility, and the probability of it occurring increases with the load of the system.
So far the best, but certainly not elegant, solution that comes to mind is to always select with time:
select * from entries
where sequence_number > :newest_sequence_number
or time >= :newest_timestamp
This should theoretically (modulo leap seconds and drifting clocks) guarantee that older entries are seen, at the expense of getting rows that appeared in the last batch. The consumer would need to maintain a hopefully small set of already-seen entries that it can ignore. Leap seconds and drifting clocks could be accounted for by padding the timestamp with some unscientific number of seconds. The downside is that it will be constantly reading a bunch of redundant rows. And it just feels a bit clunky and hand-wavy.
A slightly blunter, but more deterministic approach would be to maintain an unlogged table of pending events, and always delete from it as we read from it. This has two downsides: One is performance, obviously. The other is that since there may be any number of consumers, we would have to produce one event per consumer, which in turn means we have to identify the consumers by some sort of unique ID at event-emitting time, and of course garbage-collect unused events when a consumer no longer exists.
It strikes me that a better approach than an unlogged table would be to use LISTEN/NOTIFY, with the ID of the entry as a payload. This has the advantage of avoiding polling in the first place, although that's not a huge win, since the object of the consumer in this application is to wake up only now and then and reduce work on the system. On the other hand, the only major downside I can see is that there is a limit (albeit a large one) to the number of messages that can be in flight, and that transactions will begin to fail if a notification cannot happen. This might be a reasonable compromise, however.
At the same time, something in the back of my mind is telling me that there must be a mathematically more elegant way of doing this with even less work.

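Your improved idea with WHERE time >= :newest_timestamp is subject to the same race condition, because there is no guarantee that the timestamps are in commit order. Processes go to sleep occasionally. (In PostgreSQL, now() even returns the start time of the current transaction, so a long transaction stores a timestamp that can be much older than its commit.)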
Add a boolean field consumed_n for each consumer which is initialized to FALSE. Consumer n then uses:
UPDATE entries
SET consumed_n = TRUE
WHERE NOT consumed_n
RETURNING sequence_number, time;
It helps to have a partial index ON entries ((1)) WHERE NOT consumed_n for each consumer.
If that takes up too much storage for your taste, use one bit(n) field with a bit for each consumer.
The consumers will lock each other out as long as the transaction that issues these statements remains open. So keep it short for good concurrency.

Related

Get rows inserted since last check?

I am implementing a CQRS pattern where one or more processes are inserting records into the database and one or more processes are pulling them at a different pace.
I'd like consumer processes to poll the database for new records that were inserted since last check, but I'm not sure how to (safely) implement this.
You can assume that rows will not change once they are inserted. It seems it isn't enough for each row to have a unique id, and a timestamp indicating when it was inserted.
If I query for records with a timestamp greater than the last row I saw then I run into problems if multiple records were inserted at the same time (having the same timestamp).
If I query for records with an id greater than the last row I saw, then I run into problems where concurrent transactions may commit IDs in non-increasing order (e.g. PostgreSQL sessions allocate and cache sequence IDs ahead of time to improve performance).
Ideally, I am looking for a DBMS-agnostic solution that lets me consume data as close to real time as possible. Any ideas?
Clarification: Each row should be consumed multiple times, once per consumer. Meaning, just because one consumer processes a row should not prevent other consumers from doing so. Each consumer will do something different with the same data.
Since you have a lot of data coming in and might have multiple records for the last timestamp, you need a way to keep track of the data read. Here are a few different approaches with their pros and cons:
You can wait until all the data for a timestamp has come in. You would do this by not reading the rows with MAX(timestamp), so you get all the data from the table except the rows with the latest timestamp, for which data might still be coming in.
Pro: Simple design
Con: Not real time processing
You can store the ids you have read for the last timestamp each time. When getting the data, you can use a condition like (timestamp = lasttimestamp and id not in (set of ids)) or timestamp > lasttimestamp
Pro: Almost real time
Con: Additional storage required
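Both approaches can be sketched with Python's sqlite3 module. The records table, its integer ts column (standing in for a real timestamp) and the function names are hypothetical. read_settled follows the "skip the latest timestamp" approach and inherits its assumption that no more rows arrive for a timestamp once a newer one exists; read_with_id_set keeps the (last timestamp, ids read at it) cursor of the second approach.

```python
import sqlite3

def read_settled(conn, last_ts):
    """Approach 1: rows newer than last_ts, skipping the newest timestamp,
    which may still be receiving rows. Returns (rows, new_last_ts)."""
    rows = conn.execute(
        "SELECT id, ts FROM records "
        "WHERE ts > ? AND ts < (SELECT MAX(ts) FROM records) "
        "ORDER BY ts, id", (last_ts,)).fetchall()
    return rows, (rows[-1][1] if rows else last_ts)

def read_with_id_set(conn, last_ts, seen_ids):
    """Approach 2: cursor = (last timestamp read, ids already read at it).
    Returns (rows, (new_last_ts, new_seen_ids))."""
    if seen_ids:
        marks = ",".join("?" * len(seen_ids))
        where = f"(ts = ? AND id NOT IN ({marks})) OR ts > ?"
        params = (last_ts, *sorted(seen_ids), last_ts)
    else:  # nothing read at last_ts yet, so take every row at or after it
        where, params = "ts >= ?", (last_ts,)
    rows = conn.execute(
        f"SELECT id, ts FROM records WHERE {where} ORDER BY ts, id",
        params).fetchall()
    if not rows:
        return rows, (last_ts, seen_ids)
    new_ts = rows[-1][1]
    new_seen = {i for i, ts in rows if ts == new_ts}
    if new_ts == last_ts:
        new_seen |= seen_ids  # still at the same timestamp: keep the old ids
    return rows, (new_ts, new_seen)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, ts INTEGER NOT NULL)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [(1, 10), (2, 10), (3, 11)])

rows, cursor = read_with_id_set(conn, 0, set())
print(rows)  # [(1, 10), (2, 10), (3, 11)]
conn.executemany("INSERT INTO records VALUES (?, ?)", [(4, 11), (5, 12)])
rows, cursor = read_with_id_set(conn, *cursor)
print(rows)  # [(4, 11), (5, 12)]
```

The id set only ever holds the ids at one timestamp, which is the "additional storage" the answer mentions.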
If you don't use sharding or similar:
You can use optimistic locking.
For this you can create an order column with a unique index on the records table (the Log). Before each insertion, the producer queries the Log for the greatest order, increments it and inserts the next record with this order.
If a concurrency exception occurs (e.g. Duplicate entry '12345' for key order), then you retry the entire process (query, increment, insert).
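A sketch of the optimistic append loop, again with Python's sqlite3 module and a hypothetical log table whose ord column carries the unique index. A single connection never collides, so the retry branch only matters when several producers write concurrently; the sketch shows the shape of the loop, not a tuned implementation.

```python
import sqlite3

def append_to_log(conn, payload, max_attempts=5):
    """Optimistic append: read MAX(ord) without locking, try to insert ord + 1,
    and start over if the unique index reports that another producer won."""
    for _ in range(max_attempts):
        (last,) = conn.execute("SELECT COALESCE(MAX(ord), 0) FROM log").fetchone()
        try:
            with conn:  # commit on success, roll back on error
                conn.execute("INSERT INTO log (ord, payload) VALUES (?, ?)",
                             (last + 1, payload))
            return last + 1
        except sqlite3.IntegrityError:
            continue  # duplicate ord: re-query, increment, insert again
    raise RuntimeError(f"gave up after {max_attempts} attempts")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ord INTEGER NOT NULL UNIQUE, payload TEXT)")
print([append_to_log(conn, p) for p in ("a", "b", "c")])  # [1, 2, 3]
```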
If you use sharding or similar:
Then you will need an additional service/table that will generate a new, unique, always-increasing order integer every time it is asked to do so.
This has the disadvantage that there is another piece that must be managed, a single point of failure that must be highly-available.
P.S.
"Sharding or similar" means that you can't have unique indexes on the entire table because you use sharding or you write to multiple tables.
You can't rely on timestamps or anything that relates to physical time, because the system time may be adjusted by an automated service (NTP) or by a human operator.

How to assign a sequential number (gapless) to a record on INSERT in a transactional sql database?

Let's say we have to store orders in the database and the requirement is that the orders should be numbered as YEAR/NUM where NUM is a number like 1, 2, 3,... without any gaps starting with 1 each year.
How to implement that the right way?
The first thought is:
last_num = get_int('select max(num) from orders where year = :current_year:')
next_num = last_num + 1
execute('insert into orders (year, num) values (:current_year:, :next_num:)');
That will do it in the most cases for most systems. But if you have very high load there is a possibility of a race condition that 2 threads ask for last_num simultaneously and obtain the same number. How to solve that? Do you need to do something with the transaction? Or something with locking a database table?
The solution should be database vendor independent. Just a theoretical transactional sql database.
UPDATE 1. Actually you can have a similar situation in a banking database where you have a field with how much money the guy has on his account. Now you need to add some money to his account (last_state + more_money). You can have the same race condition here at reading the last_state.
You can do this work in the inserting transaction, but that requires a trigger. Most databases support the ANSI standard function ROW_NUMBER(), which allows you to do this on output:
select t.*,
row_number() over (partition by year order by id) as year_seqnum
from orders t;
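The ROW_NUMBER() query can be tried with Python's sqlite3 module (SQLite supports window functions from version 3.25). The table contents are made up; the ids deliberately have gaps, yet each year's numbering comes out as 1, 2, 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, year INTEGER NOT NULL)")
# ids have gaps (as a sequence or identity column may leave them) and years interleave
conn.executemany("INSERT INTO orders (id, year) VALUES (?, ?)",
                 [(3, 2023), (7, 2023), (8, 2024), (12, 2023), (15, 2024)])
rows = conn.execute("""
    SELECT id, year,
           ROW_NUMBER() OVER (PARTITION BY year ORDER BY id) AS year_seqnum
    FROM orders
    ORDER BY year, year_seqnum
""").fetchall()
for row in rows:
    print(row)
# (3, 2023, 1)
# (7, 2023, 2)
# (12, 2023, 3)
# (8, 2024, 1)
# (15, 2024, 2)
```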
I would recommend having an auto_increment/identity/sequence id in the table or a creation date to capture the sequential ordering of rows (there can be gaps in such a column). The database can automatically update this field on input and you can use it to assign the sequence number per year later.
There are important reasons why having the database implement its own sequence number is a much, much better idea than trying to get a sequence number without holes.
Basically, to really do what you want, you have to lock the database for each insert transaction in order to get exactly the correct number, with no gaps or duplicates. This is called a "serializable transaction". And, although it is supported by databases, it sets a very high bar performance-wise. In addition, deletions and perhaps updates to transactions become a nightmare. If you delete the first transaction of the year, you basically have to lock the entire year's worth of data in order to adjust the sequence numbers. And there can be no inserts during this time.
It can be done, but consider that if you can delete rows later and do not renumber, you will end up with gaps anyway.
You can put a unique constraint on (year, num), and your first thought would work well if the chance of a race condition is low. Every collision would just mean one failed transaction, which you can retry automatically.
A theoretical answer to theoretical question.
Gaps in a sequence can appear when:
transaction A begins
A allocates (reserves) a new number in the sequence (say, 1)
transaction B begins
B allocates (reserves) next number in the sequence (say, 2)
A fails and rolls back
B commits with sequence value 2
The sequence value 1 is not used in the final committed data - it is a gap
In theory, to prevent these kinds of gaps you need to make sure that no two transactions run in parallel. This is relatively easy to implement in practice: just lock the whole table for the duration of the transaction and make all concurrent transactions wait in a queue. Or set the transaction isolation level to serializable.
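One way to "lock and queue" in practice, sketched with Python's sqlite3 module: BEGIN IMMEDIATE acquires SQLite's database-wide write lock before the MAX() read, so a second writer waits instead of reading the same maximum. The insert_order function is hypothetical, and other databases would use their own locking statement (for example LOCK TABLE) instead.

```python
import sqlite3

def insert_order(conn, year):
    """Append an order with the next gapless number for `year`.
    BEGIN IMMEDIATE takes SQLite's write lock before MAX() is read, so a
    concurrent writer waits in line instead of reading the same maximum."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        (num,) = conn.execute(
            "SELECT COALESCE(MAX(num), 0) + 1 FROM orders WHERE year = ?",
            (year,)).fetchone()
        conn.execute("INSERT INTO orders (year, num) VALUES (?, ?)", (year, num))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # a failed insert releases its number: no gap
        raise
    return num

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions by hand
conn.execute("CREATE TABLE orders (year INTEGER NOT NULL, num INTEGER NOT NULL, "
             "UNIQUE (year, num))")
print([insert_order(conn, 2024) for _ in range(3)])  # [1, 2, 3]
print(insert_order(conn, 2025))                       # 1
```

The price is the one the answer names: every writer waits for the lock, so throughput is bounded by one insert transaction at a time.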
In practice, it usually reduces the throughput of the system and people don't do it.
In your second example of banking database, I would not do it like you described. I store a simple list of all actions (deposits and withdrawals) that happened with the account (just a date and a positive or negative amount of the single transaction). I do not have a permanent field that contains the balance of the account.
When the statement is printed, I sum up all actions on this account up to the needed date to get the balance of the account at that date.
So there is no room for a race condition at all with this approach.
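A minimal sketch of the ledger idea with Python's sqlite3 module, assuming a hypothetical account_actions table with ISO-8601 date strings (so text comparison matches date order) and amounts in cents. The balance is never stored; it is summed on demand, so writers only ever insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_actions (
        account_id  INTEGER NOT NULL,
        action_date TEXT    NOT NULL,  -- 'YYYY-MM-DD', so text order is date order
        amount      INTEGER NOT NULL   -- cents; deposits > 0, withdrawals < 0
    )""")
conn.executemany("INSERT INTO account_actions VALUES (?, ?, ?)", [
    (1, "2024-01-05", 10000),
    (1, "2024-02-10", -2500),
    (1, "2024-03-01", 4000),
    (2, "2024-01-20", 999),
])

def balance_as_of(conn, account_id, as_of):
    """Sum every action on the account up to and including `as_of`."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM account_actions "
        "WHERE account_id = ? AND action_date <= ?", (account_id, as_of)).fetchone()
    return total

print(balance_as_of(conn, 1, "2024-02-28"))  # 7500
print(balance_as_of(conn, 1, "2024-12-31"))  # 11500
```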

How is concurrency control implemented in any ORDBMS

I have a weird question about concurrency control in ORDBMS. This is completely theoretical.
I have two transactions T1 and T2 trying to update a particular row on a table.
Now both transactions T1 and T2 hit the database simultaneously.
By simultaneously, I mean both hit at exactly the same time, measured to the nanosecond.
So if both transactions have exactly the same timestamp, how does a DBMS (be it Oracle, DB2, or SQL Server) identify which transaction to process first and which to process later?
I understand that one transaction will acquire a row-level lock and the other will wait until the lock is released. But how does it decide whether T1 or T2 acquires the lock? Is there some other parameter taken into account besides the timestamp?
Thanks
Nirmalya
This question seems to be related more to concurrency control in DBMSs in general, rather than to ORDBMSs.
Anyway, as far as I know, even if two requests are issued at exactly the same time, they will be processed sequentially by the scheduler, which is responsible for acquiring locks and assigning timestamps. Only the scheduler is sequential: after scheduling, the queries can be processed in parallel, if the locks and the timestamp ordering allow it.
Wikipedia has an accurate explanation of timestamp-based concurrency control: https://en.wikipedia.org/wiki/Timestamp-based_concurrency_control. There you can see that there are few assumptions to be made. Look at the first two:
Every timestamp value is unique and accurately represents an instant in time.
No two timestamps can be the same.
These assumptions can be guaranteed only by a thread-safe scheduler, which assigns timestamps to transactions sequentially. In lock-based concurrency control, too, the scheduler must be thread-safe; otherwise, when it locks a record, it cannot be sure that no other transaction has acquired a lock on the same record.
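A toy model of such a scheduler in Python, assuming a logical counter rather than a wall clock. The Scheduler class is hypothetical and not how any particular DBMS is built: the lock admits one caller at a time, so even threads that arrive "at the same nanosecond" receive distinct, increasing timestamps, and whichever thread acquires the lock first simply gets the smaller one.

```python
import itertools
import threading

class Scheduler:
    """Hypothetical scheduler handing out logical transaction timestamps.
    The lock lets one caller in at a time, so simultaneous callers still get
    unique, increasing values; the first to acquire the lock gets the smaller one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = itertools.count(1)

    def begin(self):
        with self._lock:
            return next(self._next)

sched = Scheduler()
stamps, stamps_lock = [], threading.Lock()

def client():
    mine = [sched.begin() for _ in range(1000)]
    with stamps_lock:
        stamps.extend(mine)

threads = [threading.Thread(target=client) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(stamps), len(set(stamps)))  # 8000 8000
```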

How to efficiently keep count by reading, incrementing it & updating a column in the database

I have a column in the database which keeps counts of incoming requests, but updated from different sources and systems.
And the incoming requests are in thousands per minute.
What is the best way to update this column with the new request count?
The two ways off the top of my head are:
Read current value from column, increment it by one, and then update it back(All part of a sproc).
The problem I see with this is that every source/system that updates needs to lock this column and this might increase the wait time of read and updating of the column. And will slow down the DB.
Put requests in a queue, and a job reads the queue and updates the column, one at a time. This method looks safer, at least to me, but is it too much work just to get a count of incoming requests?
What is the approach you would typically take in such a read & update in a column in huge amounts scenario?
Thanks
1000s per minute is not "huge". Let's say it's 10k per minute. That leaves 6 ms per update (60,000 ms / 10,000 updates). For an in-memory row with a simple integer increment and not too many indexes, expect <1 ms per update. Works out fine.
So just use
UPDATE T SET Count = Count + 1 WHERE ID = 1234
Put an index on the table and just do:
update table t
set request_count = request_count + 1
where <whatever conditions are appropriate>;
Be sure that the conditions in the where clause all refer to indexes, so finding the row is likely to be as fast as possible.
Without strenuous effort, I would expect the update to be fast enough. You should test this to see if it is true. You could also insert a row into a requests table and do the counting when you query that table. Inserts are faster than updates, because the engine doesn't have to find the row first.
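Both suggestions can be sketched side by side with Python's sqlite3 module. The counters and requests tables are hypothetical, and nothing here measures performance; it only shows that the increment happens inside one UPDATE statement (no separate read step to race on), and that the insert-only variant arrives at the same count at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, request_count INTEGER NOT NULL)")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, counter_id INTEGER NOT NULL)")
conn.execute("CREATE INDEX requests_by_counter ON requests (counter_id)")
conn.execute("INSERT INTO counters VALUES (1234, 0)")
conn.commit()

def count_by_update(conn, counter_id):
    # Read and increment happen inside one UPDATE, with no separate SELECT step.
    with conn:
        conn.execute("UPDATE counters SET request_count = request_count + 1 "
                     "WHERE id = ?", (counter_id,))

def count_by_insert(conn, counter_id):
    # Append-only alternative: no existing row to find; count when querying.
    with conn:
        conn.execute("INSERT INTO requests (counter_id) VALUES (?)", (counter_id,))

for _ in range(5):
    count_by_update(conn, 1234)
    count_by_insert(conn, 1234)

print(conn.execute("SELECT request_count FROM counters WHERE id = 1234").fetchone()[0])     # 5
print(conn.execute("SELECT COUNT(*) FROM requests WHERE counter_id = 1234").fetchone()[0])  # 5
```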
If this doesn't meet performance goals, then some sort of distributed mechanism may prove successful. I don't see that batching the requests using sequences would be a simple solution. Although the queue is likely to be distributed, you then have the problem that the request counts are out-of-sync with the actual updates.

Efficiently detecting concurrent insertions using standard SQL

The Requirements
I have a following table (pseudo DDL):
CREATE TABLE MESSAGE (
MESSAGE_GUID GUID PRIMARY KEY,
INSERT_TIME DATETIME
)
CREATE INDEX MESSAGE_IE1 ON MESSAGE (INSERT_TIME);
Several clients concurrently insert rows in that table, possibly many times per second. I need to design a "Monitor" application that will:
Initially, fetch all the rows currently in the table.
After that, periodically check whether any new rows have been inserted, and then fetch these rows only.
There may be multiple Monitors concurrently running. All the Monitors need to see all the rows (i.e. when a row is inserted, it must be "detected" by all the currently running Monitors).
This application will be developed for Oracle initially, but we need to keep it portable to every major RDBMS and would like to avoid as much database-specific stuff as possible.
The Problem
The naive solution would be to simply find the maximal INSERT_TIME in rows selected in step 1 and then...
SELECT * FROM MESSAGE WHERE INSERT_TIME >= :max_insert_time_from_previous_select
...in step 2.
However, I'm worried this might lead to race conditions. Consider the following scenario:
Transaction A inserts a new row but does not yet commit.
Transaction B inserts a new row and commits.
The Monitor selects rows and sees that the maximal INSERT_TIME is the one inserted by B.
Transaction A commits. At this point, A's INSERT_TIME is actually earlier than B's (A's INSERT was actually executed before B's, before we even knew who was going to commit first).
The Monitor selects rows newer than B's INSERT_TIME (as a consequence of step 3). Since A's INSERT_TIME is earlier than B's insert time, A's row is skipped.
So, the row inserted by transaction A is never fetched.
Any ideas how to design the client SQL or even change the database schema (as long as it is mildly portable), so these kinds of concurrency problems are avoided, while still keeping a decent performance?
Thanks.
Without using any of the platform-specific change data capture (CDC) technologies, there are a couple of approaches.
Option 1
Each Monitor registers a sort of subscription to the MESSAGE table. The code that writes messages then writes each MESSAGE once per Monitor, i.e.
CREATE TABLE message_subscription (
message_subscription_id NUMBER PRIMARY KEY,
message_id RAW(32) NOT NULL,
monitor_id NUMBER NOT NULL,
CONSTRAINT uk_message_sub UNIQUE (message_id, monitor_id)
);
INSERT INTO message_subscription
SELECT message_subscription_seq.nextval,
       :message_guid,  -- GUID of the MESSAGE row just written, the same for every Monitor
       monitor_id
FROM monitor_subscribers;
Each Monitor then deletes the message from its subscription once that is processed.
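Option 1 can be sketched with Python's sqlite3 module. Table names follow the answer, but SQLite types replace the Oracle ones, an INTEGER PRIMARY KEY replaces message_subscription_seq, uuid4 strings replace SYS_GUID, and the function names are hypothetical. The writer inserts the message and its fan-out rows in one transaction; each Monitor deletes exactly the rows it read.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monitor_subscribers (monitor_id INTEGER PRIMARY KEY);
    CREATE TABLE message (
        message_guid TEXT PRIMARY KEY,
        insert_time TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE message_subscription (
        message_subscription_id INTEGER PRIMARY KEY,
        message_id TEXT NOT NULL,
        monitor_id INTEGER NOT NULL,
        UNIQUE (message_id, monitor_id)
    );
""")

def write_message(conn):
    """Insert a MESSAGE and one subscription row per registered Monitor."""
    guid = str(uuid.uuid4())
    with conn:  # the message and its fan-out rows commit together
        conn.execute("INSERT INTO message (message_guid) VALUES (?)", (guid,))
        conn.execute(
            "INSERT INTO message_subscription (message_id, monitor_id) "
            "SELECT ?, monitor_id FROM monitor_subscribers", (guid,))
    return guid

def drain(conn, monitor_id):
    """Return this Monitor's pending message ids and delete exactly those rows,
    so anything fanned out in between stays queued for the next call."""
    with conn:
        rows = conn.execute(
            "SELECT message_subscription_id, message_id FROM message_subscription "
            "WHERE monitor_id = ? ORDER BY message_subscription_id",
            (monitor_id,)).fetchall()
        conn.executemany(
            "DELETE FROM message_subscription WHERE message_subscription_id = ?",
            [(sub_id,) for sub_id, _ in rows])
    return [message_id for _, message_id in rows]

conn.executemany("INSERT INTO monitor_subscribers VALUES (?)", [(1,), (2,)])
conn.commit()
sent = [write_message(conn), write_message(conn)]
print(drain(conn, 1) == sent)  # True
print(drain(conn, 1))          # []
print(drain(conn, 2) == sent)  # True
```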
Option 2
Each Monitor maintains a cache of the recent messages it has processed that is at least as long as the longest-running transaction could be. If the Monitor maintained a cache of the messages it has processed for the last 5 minutes, for example, it would query your MESSAGE table for all messages later than its LAST_MONITOR_TIME minus 5 minutes. The Monitor would then be responsible for noting that some of the rows it had selected had already been processed. The Monitor would only process MESSAGE_ID values that were not in its cache.
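Option 2 reduced to its bookkeeping, as a hypothetical Monitor class in plain Python. fetch_since stands in for the MESSAGE query, and window is the lookback that must cover the longest-running writer transaction; both are assumptions of this sketch. Rows are re-read within the window, and the cache filters out the ones already processed.

```python
from dataclasses import dataclass, field

@dataclass
class Monitor:
    """Hypothetical Option 2 monitor: re-read a lookback window on every poll and
    skip message ids already in the cache. `window` must be at least as long as
    the longest-running writer transaction."""
    window: float
    last_time: float = float("-inf")
    cache: dict = field(default_factory=dict)  # message_id -> insert_time

    def poll(self, fetch_since):
        """fetch_since(t) stands in for: SELECT ... WHERE insert_time > t."""
        new_ids = []
        for message_id, insert_time in fetch_since(self.last_time - self.window):
            if message_id not in self.cache:
                self.cache[message_id] = insert_time
                new_ids.append(message_id)
                self.last_time = max(self.last_time, insert_time)
        # Ids older than the next query's lower bound will not come back, so drop them.
        horizon = self.last_time - self.window
        self.cache = {m: t for m, t in self.cache.items() if t > horizon}
        return new_ids

# Usage: B's row is visible first; A's row commits later with an earlier time.
table = [("B", 100.0)]

def fetch(t):
    return [(m, ts) for m, ts in table if ts > t]

mon = Monitor(window=5.0)
print(mon.poll(fetch))   # ['B']
table.append(("A", 98.0))
print(mon.poll(fetch))   # ['A']
print(mon.poll(fetch))   # []
```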
Option 3
Just like Option 1, you set up subscriptions for each Monitor, but you use some queuing technology to deliver the messages to the Monitor. This is less portable than the other two options, but most databases can deliver messages to applications via queues of some sort (e.g. JMS queues if your Monitor is a Java application). This saves you from reinventing the wheel by building your own queue table and gives you a standard interface in the application tier to code against.
You need to be able to identify all rows added since the last time you checked (i.e. the monitor checks). You have a "time of insert" column. However, as you spell it out, that time of insert column cannot be used with "greater than [last check]" logic to reliably identify subsequently inserted new items. Commits do not occur in the same order as (initial) inserts. I am not aware of anything that works on all major RDBMSs that would clearly and safely apply such an "as of" tag at the actual time of commit. [This is not to say I would know it if such a thing existed, but it seems a pretty safe guess to me.] Thus, you will have to use something other than a "greater than last check" algorithm.
That leads to filtering. Upon insert, an item (row) is flagged as "not yet checked"; when a monitor logs in, it reads all not-yet-checked items, returns that set, and flips the flag to "checked" (and if there are multiple monitors, this must all be done within its own transaction). The monitors' queries will have to read all the data and pick out which items have not yet been checked. The implication is, however, that this will be a fairly small set of data, at least relative to the entire set of data. From here, I see two likely options:
Add a column, perhaps "Checked". Store a binary 1/0 value for is/is not checked. The distribution of this value will be extremely skewed (99.9% Checked, 0.1% Unchecked), so it should be rather efficient. (Some RDBMSs provide filtered or partial indexes, such that the Checked rows won't even be in the index; once flipped to checked, a row will presumably never be flipped back, so the overhead to support this shouldn't be too great.)
Add a separate table to identify those rows in the "primary" table that have not yet been checked. When a monitor logs in, it reads and deletes the items from that table. This doesn't seem efficient... but again, if the data set involved is small, the overall performance pain might be acceptable.
You should use Oracle AQ with a multi-subscriber queue.
This is Oracle specific, but you can create an abstraction layer of stored procedures (or abstract in Java if you like) so that you have a common API to enqueue the new messages and have each subscriber (monitor) dequeue any pending messages. Behind that API, for Oracle you use AQ.
I am not sure if there is a queuing solution for other databases.
I don't think you will be able to come up with a totally database agnostic approach that meets your requirements. You could extend the example above that included the 'checked' column, to have a second table called monitor_checked - that would contain one row per message per monitor. That is basically what AQ does behind the scenes, so it is sort of reinventing the wheel.
With PostgreSQL, use PgQ. It has all those little details worked out for you.
I doubt you will find a robust and manageable database-agnostic solution for this.