In my app I'm displaying Race objects that essentially have three states: pending, inProgress and completed. I want to display all Races that are currently pending or inProgress, but not the ones that are completed. To do this, I want to create a RESTful API for getting these resources from my server, but I'm not sure what the best (i.e. most RESTful) approach would be.
The issue is that when someone opens or refreshes the app, I need to do two things:
Perform a GET on all the Races that are currently displayed in the client to update their status.
GET all of the new pending or inProgress Races that have been created since the client last updated.
I've come up with a few different solutions, though I don't know which, if any, would be best:
Simply delete the old Race records on the client and always GET all new records
Perform 2 separate GET operations, the first of which updates all the old records, and the second of which GETs all the new pending / inProgress Races
Perform a single GET operation where I specify the created date of the last client record, and GET all records that are newer.
To me, this seems like a pretty common scenario but I haven't been able to find a specific answer to this type of problem. I'd like to see what SO thinks :)
Thanks in advance for your help!
Simply delete the old Race records on the client and always GET all new records
This is probably the easiest solution. However, you shouldn't do that if you need a very smooth update on your client (for games, data visualization, etc.).
Perform 2 separate GET operations (...) / Perform a single GET operation where I specify the created date of the last client record, and GET all records that are newer.
I would definitely do it with a single operation. Rather than an update timestamp (timestamp operations are costly, and several operations could happen at the same time), I would use a sequence number. This is the way CouchDB handles "changes".
Moreover, as you will see in the documentation, this solution can then be upgraded to asynchronous notifications (if you need them).
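For illustration, here is a minimal sketch of what the sequence-number approach could look like on the server side; the races table, the column names, and the single query are assumptions, not something taken from the question:

-- Hypothetical: a global change sequence; every INSERT or UPDATE bumps "seq",
-- so one query returns both new races and races whose status has changed.
CREATE SEQUENCE race_changes_seq;
ALTER TABLE races ADD COLUMN seq BIGINT NOT NULL DEFAULT nextval('race_changes_seq');

-- Bump the sequence on every status change:
UPDATE races SET status = 'completed', seq = nextval('race_changes_seq') WHERE id = :race_id;

-- One GET maps to one query: the client sends the highest seq it has seen so far.
SELECT id, status, seq FROM races WHERE seq > :last_seen_seq ORDER BY seq;

The client then stores the largest seq it received and sends it on the next refresh, which covers both of the original requirements in a single call.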
Related
I am currently developing a backend system that has two endpoints of concern that interact with a common relational database table. The main purpose of this system is an after-registration email verification system that has a time limit.
Let's suppose there are three tables that contain the users that are pending verification, already verified, and out of time for verification. These tables will contain similar attributes of the users. One user (represented by a unique ID) should exist in only one of these tables.
The first endpoint is the verification endpoint, which will be triggered by the user through a verification link (e.g., www.hello.com/verify?token=XXXX). The to-be-verified user will be searched for in the pending table. If not found, it means that the token is expired and nothing will be done after that. Otherwise, the user will be moved to the verified table. Moving, in this case, means that the selected row will be removed from the first table and then inserted into the second table. Therefore, at least 3 queries will be executed as below, with the last two possibly grouped in a single transaction.
SELECT * FROM pending WHERE pending.id = :id;
DELETE FROM pending WHERE pending.id = :id;
INSERT INTO verified VALUES (/* the row returned by the SELECT */);
The second endpoint is the expired-users cleaning endpoint, which will be triggered by some kind of scheduler. Let's assume it will be triggered exactly when the user's verification token expires. The overall task will be similar to the first endpoint, but the data row will be moved into the out-of-time table instead, and we assume that the user is already verified if the SELECT cannot find them.
SELECT * FROM pending WHERE pending.id = :id;
DELETE FROM pending WHERE pending.id = :id;
INSERT INTO outoftime VALUES (/* the row returned by the SELECT */);
I believe the problem may arise if these two endpoints are unfortunately triggered at the same time (i.e., the user verifies themselves right at the expiration time) by two concurrent processes. Both processes might manage to find the user with the SELECT before either runs the DELETE. Therefore, both will also run the INSERT, causing the user's data to be inserted into two tables and violating our rule (one user should exist in only one of these tables).
An ideal solution for me would be to find a way to detect and "fail" one of the two processes, which will produce a similar result to the more common situation where that process starts after another process has already done its job (i.e., the second process will terminate when it fails to retrieve a user from SELECT). The choice of the process to be failed is not significant in this case; either of the two would work.
I am aware that using locks is one of the possible solutions in theory, by covering each critical section with a lock acquisition and release. However, I am not sure whether that is good practice for this problem.
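For concreteness, and purely as an illustrative PostgreSQL sketch (no stack has been chosen yet; the tables are the ones from the examples above), the "move" can be written as one atomic statement so that whichever process reaches the row first wins:

BEGIN;

-- Atomic "move": the DELETE either returns the pending row or nothing.
-- A concurrent transaction doing the same thing blocks on the row lock and,
-- once the winner commits, sees zero rows, so it inserts nothing and can stop.
-- Assumes verified has the same column layout as pending.
WITH moved AS (
    DELETE FROM pending WHERE id = :id RETURNING *
)
INSERT INTO verified SELECT * FROM moved;

COMMIT;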
Are there any common design patterns or ideas that could solve this problem?
Please note that no specific technology/database stacks have been chosen yet.
Thanks!
Edit: There are multiple tables in this case because I found that the frequency of access for each type of user may not be equal, so we could use different system specifications for each table. For example, the out-of-time table is more like an archive: just a big pile of data with minimal access, while the active table will be accessed every time there are changes to the user, so it might require better hardware, etc. Using a status column seems to be one solution though. However, is there a similar situation in system design where this kind of problem is inevitable? How is it dealt with?
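As an illustration of the status-column idea mentioned above (the users table and column names are assumptions), the state transition becomes a single conditional UPDATE, so the race cannot put a user into two states at once:

-- Single-table alternative: the conditional UPDATE is the atomic transition.
-- Whichever process runs first wins; the other sees 0 affected rows and stops.
UPDATE users SET status = 'verified' WHERE id = :id AND status = 'pending';

-- The scheduler's version of the same transition:
UPDATE users SET status = 'outoftime' WHERE id = :id AND status = 'pending';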
I use the podio API to create an item. In the form I have a few calculations. When I retrieve the item immediately after its creation, using the api, the fields are not calculated yet. The calculation is asynchronous, so that makes sense.
When I use a create hook and fetch the item based on the hook, the calculated fields are there.
Does anybody know if I can depend on this, meaning is the create hook fired after the fields are calculated?
Yes, the JavaScript calculations are asynchronous.
Also related: MongoDB (which Podio uses on the back end) is "eventually consistent".
I faced this same problem, and ended up making a queueing system for my incoming webhooks, where I waited 30 seconds before actioning any record retrieval from Podio, to get updated values for our local reporting database to cache.
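This is not Podio's own mechanism, just a sketch of one way such a delay queue could be kept in a SQL table (all names are made up):

-- The webhook handler only records the event, with a "not before" time.
CREATE TABLE webhook_queue (
    id            BIGSERIAL PRIMARY KEY,
    item_id       BIGINT NOT NULL,
    received_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    process_after TIMESTAMPTZ NOT NULL DEFAULT now() + interval '30 seconds'
);

-- A background worker polls for events whose delay has elapsed and only then
-- calls the Podio API to fetch the (by now calculated) item.
SELECT id, item_id FROM webhook_queue WHERE process_after <= now() ORDER BY process_after LIMIT 100;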
Also, related more to the MongoDB asynchronicity: if you are using Globiflow to trigger updates in related tables using the JavaScript calculated field from the parent table, I found there were occasionally incorrect values.
I solved it by adding a 30 second delay in the Globiflow script before updating the related app/table with calculated fields from the parent app/table. This gave enough time for the JavaScript to calculate and for MongoDB to save the calculated value.
https://www.globiflow.com/help/wait-delay.php
Given an SQL table with timestamped records. Every once in a while, an application App0 does something like foreach record in since(certainTimestamp) do process(record); commitOffset(record.timestamp), i.e. periodically it consumes a batch of "fresh" data, processes it sequentially, commits success after each record, and then just sleeps for a reasonable time (to accumulate yet another batch). That works perfectly with a single instance... however, how do I load-balance multiple ones?
In exactly the same environment, App0 and App1 concurrently compete for the fresh data. The idea is that the read query executed by App0 must not overlap with the same read query executed by App1, such that they never try to process the same item. In other words, I need SQL-based guarantees that concurrent read queries return different data. Is that even possible?
P.S. Postgres is preferred option.
The problem description is rather vague on what App1 should do while App0 is processing the previously selected records.
In this answer, I make the following assumptions:
all Apps somehow know what the last certainTimestamp is and it is the same for all Apps whenever they start a DB query.
while App0 is processing, say, the 10 records it found when it started working, new records come in. That means the pile of new records with respect to certainTimestamp grows.
when App1 (or any further App) starts, it should process only those new records with respect to certainTimestamp that are not yet being handled by other Apps.
yet, if an App fails/crashes, the unfinished records should be picked up the next time another App runs.
This can be achieved by locking records in many SQL databases.
One way to go about this is to use
SELECT ... FOR UPDATE SKIP LOCKED
This statement, in combination with the range selection since(certainTimestamp), selects and locks all records that match the condition and are not currently locked.
Whenever a new App instance runs this query, it only gets "what's left" to do and can work on that.
This solves the problem of "overlay" or working on the same data.
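A hedged sketch of what that could look like in Postgres (the table name, ts column, and batch size are placeholders for whatever since(certainTimestamp) maps to):

BEGIN;

-- Each App instance grabs a batch of fresh, currently unclaimed records.
-- Rows locked by another App's open transaction are silently skipped.
SELECT *
FROM records
WHERE ts > :certain_timestamp
ORDER BY ts
LIMIT 100
FOR UPDATE SKIP LOCKED;

-- ... process the returned rows and record progress ...

COMMIT;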
What's left is then the definition and update of the certainTimestamp.
In order to keep this answer short, I don't go into that here and just leave the pointer to the OP that this needs to be thought through properly to avoid situations where e.g. a single record that cannot be processed for some reason keeps the certainTimestamp at a permanent minimum.
How do I know when to invalidate the cache, if a table change is made from an outside source?
I have an api call that returns an employee table. The first time this call is made, I will cache the results so that on subsequent calls it will pull the data from the cache instead of the database. This makes sense, however, what happens if someone adds a new record to the employee table from outside of the api, how does the cache know that it is now invalid?
If the user made the change to the employee table through the API I can capture that, but we have a separate desktop app that doesn't use the API, and that app can directly make changes to the employee table. Is there any accepted standards for handling this?
The only possible solution I can think of is to add a trigger to the employee table, and somehow use that to know when a table has changed. But, we have over a thousand tables, and we are making an api call for each table - So, I do not think that adding a thousand triggers to our database is an acceptable solution.
Yes, you could add a trigger as suggested. Or you could use a caching system that supports expiry time / sliding expiry. You would then be serving up stale data some of the time, but not always.
As the other answer suggests, your trigger idea is OK; however, as you've stated, that would be a lot of triggers.
If your cache is not local to the API (which I assume it isn't, if triggers would be able to reach it), could you not access it from your desktop application? You could invalidate your cache by removing the employee record from it with the desktop application whenever it makes a successful change to the employee table.
It boils down to this:
You have a cache (which is essentially a read store).
You have two options to update it
- Either it times out and re-fetches (which is OK if you don't need up-to-the-minute, real-time data)
- Or it has to be told its data is no longer valid.
Two ways to solve this
Push model
Pull model
Push Model: use a CLR trigger that pushes the updates to an API. Whenever DML happens, the CLR trigger calls the API, which in turn can update the cache!
Pull Model: use a database trigger on the SQL Server table to populate an intermediate audit table, and poll that audit table with a background task.
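A rough sketch of the pull-model plumbing (T-SQL; the audit table, the employee table's id column, and the trigger name are assumptions):

-- Intermediate audit table that a background task polls to invalidate cache entries.
CREATE TABLE employee_audit (
    audit_id    INT IDENTITY(1,1) PRIMARY KEY,
    employee_id INT NOT NULL,
    changed_at  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

-- Record every change to the employee table, whatever made it (API or desktop app).
CREATE TRIGGER trg_employee_changed
ON employee
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO employee_audit (employee_id)
    SELECT id FROM inserted
    UNION
    SELECT id FROM deleted;
END;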
Hope this helps!
I recently ran into this problem:
For each user, I need to do the following on server side:
First
(SQL) Insert user's record with a Unique constraint on ID
Then Parallel
(Http) Subscribe user to Service A, get subscription_id_A
(Http) Subscribe user to Service B, get subscription_id_B
Finally
(SQL) Update user's record with both subscription ids
Ideally I want this entire operation to be transactional, e.g. if any of the HTTP requests or the SQL fails, it would be as if nothing happened. Added: if request A fails but B succeeds, I would be stuck: do I cancel the transaction and end up with an untracked subscription, or do I commit it and end up with a user missing a subscription?
Given that this is likely impossible to achieve, what would be the next best thing I can do?
Services A and B do provide APIs to check for the existence of subscriptions and to modify or delete a subscription, but I want to avoid the check-then-act style. The SQL server uses the highest isolation level.
This is indeed a standard problem. (Often, developers are not aware of this problem and only find out in production.) There is no standard solution. It is impossible to solve in general (see the Two Generals' Problem, http://en.wikipedia.org/wiki/Two_Generals%27_Problem: two systems can never agree with 100% certainty on whether they should commit or abort).
Maybe you can perform all the SQL work first: insert the user, but without subscription IDs. You then try to add the subscriptions one by one and record their IDs in separate transactions once you get them.
Install a background job that periodically checks for users that were created a long time ago but still do not have subscriptions. If you find any discrepancies, fix them and log this fact.
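A minimal sketch of the query such a background job might run (the users table, the timestamp threshold, and the column names are assumptions):

-- Users created a while ago that still miss at least one subscription id.
SELECT id
FROM users
WHERE created_at < now() - interval '15 minutes'
  AND (subscription_id_a IS NULL OR subscription_id_b IS NULL);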
This periodic cleanup ensures that temporary failures (which will occur due to network glitches, timeouts, redeployments, bugs, ...) are temporary. It also ensures that they are being detected and reported to developers if you like.
This would be an eventually consistent system. The idea is to first transactionally record the target state (the user and the goal to create two subscriptions) and then have a background job try to converge the data to the target state.