Conflict resolution?

If two or more users are offline and editing the same data, who wins? Or, better yet, is there conflict/merge resolution?

The answer depends on how they're modifying data.
set() (and remove(), push(), setWithPriority(), etc.) is last-write-wins. So if client A and client B are both offline and later reconnect to Firebase, and client A connects first, client A's set() is written to Firebase; but when client B eventually connects, its set() overwrites client A's, so client B ultimately wins.
transaction() has built-in conflict resolution. So if client A connects to Firebase first, its transaction succeeds on the first try (since there's no conflict). When client B then connects, its transaction fails on the first try, so its transaction update function is automatically run a second time (now against the new data that client A wrote), and the result is written to Firebase (assuming no further conflicts).
So if you don't care who wins, use set(). If you need to ensure some sort of consistency through conflict/merge resolution, use transaction().
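For concreteness, here's a minimal sketch of the two approaches using the classic Firebase JS client (the database URL is hypothetical; the modular SDK's runTransaction() behaves the same way):

    var ref = new Firebase('https://example-app.firebaseio.com/counter'); // hypothetical URL

    // Last-write-wins: whichever client's set() reaches the server last is kept.
    ref.set(42);

    // Conflict-aware: the update function re-runs against the latest server value
    // until it commits, so concurrent increments are never lost.
    ref.transaction(function (current) {
      return (current || 0) + 1;
    }, function (error, committed, snapshot) {
      if (error) console.log('transaction failed:', error);
      else if (committed) console.log('new value:', snapshot.val());
    });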

Related

CoTurn Data Usage Stats on Multi User System

We want to track each user's TURN usage separately. I inspected the TURN REST API; AFAIU it is just used to authorize a user that already exists in the coturn DB. This is the point I don't quite understand. I can use an ICE server list that includes different username/credential pairs, but I must have initialized these username/credential pairs in the coturn DB beforehand. Am I right? If I am right, I have no idea how to do this. My planned flow is:
- detect the user's request to use TURN from the frontend
- generate credentials as in "CoTURN: How to use TURN REST API?" (which I have already achieved)
- if this is a new user, my backend should somehow reach my EC2 instance and run the "turnadmin create user" command
- then let the WebRTC connection happen
- then track the usage of that specific user and send it back to my backend somehow
Is this the right scenario? If not, what should it be? Is there another way to manage multiple users and their data usage? Any help would be appreciated.
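For reference, the TURN REST API credential scheme mentioned above is typically implemented along these lines; in use-auth-secret mode coturn derives the password itself from the shared secret, so no per-user turnadmin entry is needed. A sketch in Node (names are ours; the shared secret must match static-auth-secret in turnserver.conf):

    // Ephemeral TURN credentials per coturn's REST API scheme (use-auth-secret).
    var crypto = require('crypto');

    function turnCredentials(userId, sharedSecret, ttlSeconds) {
      // username is "<expiry-unix-timestamp>:<userid>"
      var username = (Math.floor(Date.now() / 1000) + (ttlSeconds || 3600)) + ':' + userId;
      // password is base64(HMAC-SHA1(shared secret, username))
      var credential = crypto.createHmac('sha1', sharedSecret)
                             .update(username)
                             .digest('base64');
      return { username: username, credential: credential };
    }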
AFAIU, to get the stats data we must use the Redis database. I tried to use it and displayed the traffic data (with psubscribe turn/realm/*/user/*/allocation/*/traffic), but the other subscribe events never fired (psubscribe turn/realm/*/user/*/allocation/*/traffic/peer or psubscribe turn/realm/*/user/*/allocation/*/total_traffic, even when the allocation is deleted). So I tried to get past traffic data from the Redis DB, but I couldn't find how. In Redis, the KEYS * command gives just "status" events.
Even if I get this traffic data, I can't see how to use it with multiple users. Currently in our project we have one user (in terms of coturn) and the other users use TURN through this one user.
BTW, we tried to track the usage on the peer connection object we create from the RTCPeerConnection interface. I noticed that the incoming byte counts are lower than the Redis output, so I think some traffic is missed and I should calculate usage on the TURN side.

Prevent Duplicates in Rest API

We have an API endpoint which accepts people.
During the call we check to ensure that the person's PIN has not already been used; if it has, we reject the request with a 422 input error.
Recently a client complained about a duplicate PIN, and we found that the API endpoint had been triggered by two different REST calls with duplicate PINs but different bodies.
i.e.
{First Name: John, Last Name: Doe, PIN: 722}
{First Name: Jane, Last Name: Doe, PIN: 722}
Both requests arrive within milliseconds of each other, so when the duplicate-PIN check runs for the second record it returns false (the first record has not yet been inserted into the DB) and processing of the second record continues.
We have looked into a few options, such as unique constraints in the DB, which do work but would require huge amounts of rework to bubble the error up to the REST API response. A huge risk on a thriving production app.
There are around 5-6 different API calls that can modify the PIN collection in one guise or another, and around 20-30 different APIs where this sort of problem exists (unique emails, unique item names, etc.), so I am not sure we can maintain lists for quick access.
Aside from the DB constraints are there any other options available to us that would be simpler to implement? Perhaps some obscure attribute on the .NET APIController class.
In my head, at least, I would like one request to be processed while subsequent requests are queued. I am aware we could simply store the payload for later processing, but since the APIs are already being consumed that doesn't seem to be an option; the queue would have to block the response.
Trying to Google this has been far from successful, as everything assumes I am trying to reject complete duplicate bodies.
If you're basing the check on whether the PIN exists in the DB, in theory you could have tens or hundreds of potential duplicates coming in through the REST API, all assuming the PIN is OK because the database is slow to say whether a PIN exists.
If you think of it as a pipeline:
client -> API -> PIN check from db -> ok -> insert in db
the PIN check from the db is a bottleneck, as it will most likely be slower than the incoming REST calls. As you say, the solution is to enforce uniqueness at the database level, but if that's not practical you could use a 'gateway' check instead: basically a PIN cache that's fast to update/query, to which PINs are added once a request has been accepted for processing and before they're written to the database. So the pipeline becomes:
client -> API -> PIN check from cache -> ok -> write PIN to cache -> insert in db
and the second request would end up as:
client -> API -> PIN check from cache -> exists -> fail
so the cache keeps up with the requests, allowing the more leisurely database operations to continue with known-clean data. If it's a memory cache, you'd have to rebuild it (from the PINs in the database) on every service restart and only allow API calls once the cache is ready; or you could use a disk cache. The downside is that it duplicates all the PINs in your database, but it's a compromise that lets you keep up with checking PINs in API calls.
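As a concrete illustration of the gateway idea, here is a minimal in-memory sketch (in Node for brevity; the same shape applies in .NET, and insertPerson is a hypothetical database call). Because the check and the reservation happen synchronously, two near-simultaneous requests cannot both pass:

    // Seed `reserved` from the PINs in the database at startup.
    const reserved = new Set();

    function tryReservePin(pin) {
      if (reserved.has(pin)) return false; // duplicate caught immediately
      reserved.add(pin);                   // reserve before any async DB work
      return true;
    }

    async function createPerson(req, res) {
      if (!tryReservePin(req.body.pin)) {
        return res.status(422).json({ error: 'PIN already in use' });
      }
      try {
        await insertPerson(req.body);      // hypothetical database insert
        res.status(201).end();
      } catch (err) {
        reserved.delete(req.body.pin);     // release the reservation on failure
        res.status(500).end();
      }
    }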

In what scenarios is a reliable session recommended?

In a few words, if I am not wrong, a session is used when I want to ensure that messages are sent in order, and a reliable connection is needed to be able to use sessions.
But my doubt is: what kind of applications need that? In my case it is a simple application in which a client requests data from a service, and the service gets the data from the database and sends the results back to the client. The client can also request to add, modify, or delete data in the database. In this case, do I need a reliable connection and sessions or not?
Thanks.
A session presumes that you want to retain some data for some period of time. That period, as far as a session is concerned, is the client's lifecycle: when the client opens a proxy, the service instance and the session are created; when the client closes the proxy, the service and the session end. There is an exception where closing the proxy does not take effect right away, which occurs when you invoke a one-way operation: the service keeps working as long as the operation is running, despite having already been told to dispose of the instance.
Based on the information provided, I assume the best choice would be PerCall. You do not store any data between calls, and every single call can be handled separately. Additionally, set ConcurrencyMode to Multiple so that calls can be serviced simultaneously.
Personally, I find sessions useful with MSMQ, whenever I want a specific number of messages to be wrapped into a single queue message. If an error occurs, regardless of which message caused it, the whole queue message is rolled back.

How to keep an API idempotent while receiving multiple requests with the same id at the same time?

From a lot of articles and commercial APIs I have seen, most people make their APIs idempotent by asking the client to provide a requestId or idempotency key (e.g. https://www.masteringmodernpayments.com/blog/idempotent-stripe-requests) and basically storing a requestId <-> response map in storage. So if a request comes in whose id is already in this map, the application just returns the stored response.
This all sounds good to me, but my problem is how to handle the case where a second call comes in while the first call is still in progress.
So here are my questions:
1. I guess the ideal behaviour would be for the second call to keep waiting until the first call finishes, then return the first call's response? Is this how people do it?
2. If yes, how long should the second call wait for the first call to finish?
3. If the second call has a wait-time limit and the first call still hasn't finished, what should it tell the client? Should it just return no response, so the client times out and retries?
For Wunderlist we use database constraints to make sure that no request id (which is a column in every one of our tables) is ever used twice. Since our database technology (Postgres) guarantees that it is impossible to insert two records that violate this constraint, we only need to react properly to the potential insertion error. Basically, we outsource this detail to our datastore.
I would recommend, no matter how you go about this, trying not to need coordination in your application. If you try to know whether two things are happening at once, there is a high likelihood of bugs. Instead, there might be a system you already use that can make the guarantees you need.
Now, to specifically address your three questions:
1. For us, since we use database constraints, the database handles making things queue up and wait. This is why I personally prefer the old SQL databases: not for the SQL or relations, but because they are really good at locking and queuing. We use SQL databases as dumb disconnected tables.
2. This depends a lot on your system. We try to tune all of our timeouts to around 1s in each system and subsystem. We'd rather fail fast than queue up. If you don't know ahead of time, you can measure your 99th-percentile timing and set that as your timeout.
3. We would return a 504 HTTP status (and an appropriate response body) to the client. The reason for having an idempotency key is so the client can retry a request, so we are never worried about timing out and letting them do just that. Again, we'd rather time out fast and fix the problem than let things queue up; if things queue up, then even after something is fixed you have to wait a while for things to get better.
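A sketch of the constraint approach in (1), using node-postgres against a table with a unique index on request_id (table and column names are illustrative):

    const { Pool } = require('pg');
    const pool = new Pool();

    async function insertIdempotently(requestId, payload) {
      try {
        await pool.query(
          'INSERT INTO items (request_id, body) VALUES ($1, $2)',
          [requestId, payload]
        );
        return { created: true };
      } catch (err) {
        if (err.code === '23505') {   // Postgres unique_violation: a duplicate won
          return { created: false };  // look up and return the stored response
        }
        throw err;
      }
    }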
It's a bit hard to tell whether the second call is from the same client with the same request token, or from a different client.
Normally, in the case of concurrent requests from different clients operating on the same resource, you would also want to implement a versioning strategy alongside a request token for idempotency.
A typical versioning strategy in a relational database might be a version column with a trigger that auto-increments the number each time a record is updated.
With this in place, all clients must specify their request token as well as the version they are updating (typically the If-Match header is used for this, with the version number as the value of the ETag).
On the server side, when it comes time to update the state of the resource, you first check that the version number in the database matches the version supplied in the ETag. If it does, you write the changes and the version increments. Assuming the second request was operating on the same version number as the first, it would then fail with a 412 (or 409, depending on how you interpret the HTTP specifications) and the client should not retry.
If you really want to stop the second request immediately while the first request is in progress, you are going down the route of pessimistic locking, which doesn't suit REST APIs that well.
In the case where the client is actually retrying with the same request token because it received a transient network error, it's almost the same situation.
Both requests will be running at the same time: the second request starts because the first has not finished and has not yet recorded its request token in the database, but whichever one finishes first will succeed and record the token.
The other request will receive a version conflict (since the first request incremented the version), at which point it should recheck the request-token table, find its own token there, assume it was a concurrent request that finished first, and return 200.
It seems like a lot, but if you want to cover all the weird and wonderful failure modes when you're dealing with REST, idempotency, and concurrency, this is the way to do it.
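Sketched as an Express-style handler (the db helper and table names are assumptions), the version check described above looks roughly like this:

    app.put('/resources/:id', async (req, res) => {
      const clientVersion = parseInt(req.get('If-Match'), 10); // version sent as ETag
      const result = await db.query(
        'UPDATE resources SET body = $1, version = version + 1 ' +
        'WHERE id = $2 AND version = $3',
        [req.body, req.params.id, clientVersion]
      );
      if (result.rowCount === 0) {
        // Version moved on: recheck the request-token table before reporting
        // a real conflict (it may have been our own retried request).
        return res.status(412).end(); // or 409, per your reading of the spec
      }
      res.set('ETag', String(clientVersion + 1));
      res.status(200).end();
    });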

How is Redis used in Trello?

I understand that, roughly speaking, Trello uses Redis for a transient data store.
Is anyone able to elaborate further on the part it plays in the application?
We use Redis on Trello for ephemeral data that we would be okay with losing. We do not persist the data in Redis to disk, and we run it with allkeys-lru, so we only store things there that can be evicted at any time with only very minor inconvenience to users (e.g. momentarily seeing an incorrect user status). That being said, we give it more than 5x the space it needs to store its actual working set, and it samples 10 keys when choosing what to evict, so we really never see anything we're actually using get kicked out.
It's our pubsub server. When a user does something to a board or a card, we want to send a message with that delta to all websocket-connected clients subscribed to the object that changed. So all of our Node processes are subscribed to a pubsub channel that propagates those messages, and they relay them out to the appropriately permissioned and subscribed websockets.
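The fan-out looks roughly like this (a sketch using the classic node_redis subscribe API; the channel name and message shape are ours, not Trello's):

    const redis = require('redis');
    const sub = redis.createClient();
    const pub = redis.createClient();

    sub.subscribe('object-deltas');
    sub.on('message', (channel, message) => {
      const delta = JSON.parse(message);
      // Forward to every websocket subscribed to delta.objectId
      // (the permission check is omitted in this sketch).
    });

    // When a user changes a board or card, any process can publish the delta:
    pub.publish('object-deltas', JSON.stringify({ objectId: 'b1', change: '...' }));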
We SORT OF use it to back socket.io, but since we only use the websockets, and since socket.io is too chatty to scale like we need it to at the moment, we have a patch that disables all but the one channel that is necessary to us.
For our users who don't have websockets, we have to keep a list of the actions that have happened on each object channel since the user's last poll request. For that we use a list, which we cap at the most recent 100 elements, and an auxiliary counter of how many elements have been added to the list since it was created. So when we're answering a poll request from such a browser, we can check the last element it reports having seen and send down only the messages that have been added to the queue since then. That gets a poll request down to just a permissions check and a single Redis key check in most cases, which is very fast.
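In Redis terms, the capped list plus counter can be kept with LPUSH/LTRIM/INCR, roughly like this (key names are ours):

    // Record an action on an object channel: cap the list at 100 elements and
    // count everything ever added so pollers can detect gaps.
    function recordAction(client, channelId, action, done) {
      const key = 'actions:' + channelId;
      client.lpush(key, JSON.stringify(action), () => {
        client.ltrim(key, 0, 99);        // keep only the 100 most recent
        client.incr(key + ':count', done);
      });
    }

On a poll, comparing the client's last-seen count with the stored counter tells you whether the missing actions are still in the list (fetch them with LRANGE) or the client has fallen more than 100 behind and needs a full refresh.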
We store some ephemeral data about the active status of connected users in Redis, because that data changes frequently and it is not necessary to persist it to disk.
We store short-lived keys to support OAuth logins in Redis.
We love Redis; once you have an instance of it up and running, you want to use it for all kinds of things. The only real trouble we have had with it is with slow-consuming clients eating up the available space.
We use MongoDB for our more traditional database needs.
Trello uses Redis with Socket.IO (RedisStore) for scaling, with the following two features:
key-value store, to set and get values for a connected client
as a pub-sub service
Resources:
Look at the code for RedisStore in Socket.IO here: https://github.com/LearnBoost/socket.io/blob/master/lib/stores/redis.js
Example of Socket.IO with RedisStore: http://www.ranu.com.ar/2011/11/redisstore-and-rooms-with-socketio.html
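For reference, wiring RedisStore into the 0.x socket.io referenced above looked roughly like this (API as of socket.io 0.9; current versions use the separate redis adapter instead):

    var io = require('socket.io').listen(8080);
    var RedisStore = require('socket.io/lib/stores/redis');
    var redis = require('redis');

    io.set('store', new RedisStore({
      redisPub: redis.createClient(),    // publishes events to other processes
      redisSub: redis.createClient(),    // receives events from other processes
      redisClient: redis.createClient()  // general key-value storage
    }));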