Hitting the same API to get a different number at the same time using concurrency? - asp.net-core

I have created a web API that generates a sequence number every time it is hit. What I need now is to make it concurrent for multiple users, so that when multiple users hit the API at the same time, it generates a different number for each of them.

This depends a lot on what you are trying to do and for what reason.
Generating unique sequence numbers is very difficult in an environment where multiple users can hit the endpoint at the same time.
If you are trying to give them an ID to use for some sort of data insert, then I suggest you don't offer integers. Instead, offer GUIDs.
The issue with inserting data based on this kind of mechanism is that sometimes no data is actually inserted, for various reasons: users change their mind, end up requesting another ID, or the subsequent call simply fails, so you end up with holes in your data.
Instead, hand back GUIDs, and when the follow-up call finally comes in, use the GUID it carries.
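The GUID approach needs no coordination between concurrent callers, since every request can mint its own identifier. A minimal sketch, shown in Python purely for illustration (in ASP.NET Core the equivalent would be Guid.NewGuid()):

```python
import uuid

def new_resource_id() -> str:
    """Return a globally unique identifier for the caller to use later.

    Unlike an incrementing sequence, uuid4 values need no locking or
    database round-trip, so concurrent requests cannot collide in practice.
    """
    return str(uuid.uuid4())

if __name__ == "__main__":
    # Two "simultaneous" callers simply get two different IDs.
    print(new_resource_id())
    print(new_resource_id())
```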

Related

Salesforce integration with other systems - no duplicates

Sorry, new to the Salesforce platform. I'm trying to integrate the two via API for one environment. We use TrackWise, a platform built on top of the Salesforce development stack. Goal: migrate records abc from the ERP to TrackWise custom objects.
The initial load should seem simple: read all records from SF, compare the key from the ERP - if it exists skip, if not add.
That's where the fun begins - the limit on objects returned per call is 1,000. I only have 50,000 objects in total, but I can't retrieve them all at the beginning.
The logic would just say: check for the key first, if it exists skip - except I hit limit 2: only 200 queries per minute.
Should I just add a timer to my inserts? Is there a way to effectively make sure I do not insert a duplicate via an API call?
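The question already sketches the dedup logic (read the existing keys, skip duplicates, throttle inserts). A minimal, hypothetical Python sketch of that pattern; fetch_page and insert_record are stand-ins for whatever Salesforce/ERP client calls are actually available, not real API methods:

```python
import time

def load_existing_keys(fetch_page, page_size=1000):
    """Build an in-memory set of keys already present in Salesforce.

    fetch_page(offset, limit) is a hypothetical helper that wraps whatever
    query mechanism is available; paging keeps each call under the
    1,000-object return limit.
    """
    keys, offset = set(), 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        keys.update(record["key"] for record in page)
        offset += page_size
    return keys

def sync(erp_records, existing_keys, insert_record, max_per_minute=200):
    """Insert only ERP records whose key is not already present.

    A simple sleep keeps the insert rate under the per-minute limit;
    insert_record is again a stand-in for the actual API call.
    """
    for record in erp_records:
        if record["key"] in existing_keys:
            continue
        insert_record(record)
        existing_keys.add(record["key"])
        time.sleep(60.0 / max_per_minute)
```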

Record insertion with primary key while load testing with Locust

I have to load test a POST API with multiple users. The API relies on primary key constraints while inserting the records into tables. Locust will be facing issues with this primary key constraint approach.
I'm not 100% sure I understand exactly what you need, but it sounds like you need each Locust user to use different data in the POST requests you're making. The easiest way would just be to generate the data randomly. It could be based on some sort of pattern, or if it has to be absolutely unique you could generate a UUID (or do a combination of the two, like locust-user-{time stamp}-{UUID}, so you can tell in your system back end or elsewhere that it's test data).
But Locust just runs whatever Python code you give it and automates running it concurrently. In most cases, if you write a simple Python script to do what you want successfully, you can just drop it into a Locust task and it should work. You can do whatever you need to in order to get unique or otherwise different data for your POST requests and have Locust users do that in your tests.
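A minimal sketch of that idea as a Locust user class; the /records endpoint and the payload fields are placeholders for whatever your API actually expects:

```python
import time
import uuid

from locust import HttpUser, task, between

class RecordPoster(HttpUser):
    wait_time = between(1, 3)

    @task
    def create_record(self):
        # Combine a recognisable prefix, a timestamp and a UUID so every
        # request carries a unique primary key and test data is easy to spot.
        unique_id = f"locust-user-{int(time.time())}-{uuid.uuid4()}"
        self.client.post("/records", json={"id": unique_id, "value": "load-test"})
```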

Handling paging with changing sort orders

I'm creating a RESTful web service (in Golang) which pulls a set of rows from the database and returns it to a client (smartphone app or web application). The service needs to be able to provide paging. The only problem is this data is sorted on a regularly changing "computed" column (for example, the number of "thumbs up" or "thumbs down" a piece of content on a website has), so rows can jump around page numbers in between a client's request.
I've looked at a few PostgreSQL features that I could potentially use to help me solve this problem, but nothing really seems to be a very good solution.
Materialized Views: to hold "stale" data which is only updated every once in a while. This doesn't really solve the problem, as the data would still jump around if the user happens to be paging through the data when the Materialized View is updated.
Cursors: created for each client session and held between requests. This seems like it would be a nightmare if there are a lot of concurrent sessions at once (which there will be).
Does anybody have any suggestions on how to handle this, either on the client side or database side? Is there anything I can really do, or is an issue such as this normally just remedied by the clients consuming the data?
Edit: I should mention that the smartphone app allows users to view more pieces of data through "infinite scrolling", so it keeps track of its own list of data client-side.
This is a problem without a perfectly satisfactory solution because you're trying to combine essentially incompatible requirements:
Send only the required amount of data to the client on-demand, i.e. you can't download the whole dataset then paginate it client-side.
Minimise amount of per-client state that the server must keep track of, for scalability with large numbers of clients.
Maintain different state for each client.
This is a "pick any two" kind of situation. You have to compromise; accept that you can't keep each client's pagination state exactly right, accept that you have to download a big data set to the client, or accept that you have to use a huge amount of server resources to maintain client state.
There are variations within those that mix the various compromises, but that's what it all boils down to.
For example, some people will send the client some extra data, enough to satisfy most client requirements. If the client exceeds that, then it gets broken pagination.
Some systems will cache client state for a short period (with short-lived unlogged tables, tempfiles, or whatever), but expire it quickly, so if the client isn't constantly asking for fresh data it gets broken pagination.
Etc.
See also:
How to provide an API client with 1,000,000 database results?
Using "Cursors" for paging in PostgreSQL
Iterate over large external postgres db, manipulate rows, write output to rails postgres db
offset/limit performance optimization
If PostgreSQL count(*) is always slow how to paginate complex queries?
How to return sample row from database one by one
I'd probably implement a hybrid solution of some form, like:
Using a cursor, read and immediately send the first part of the data to the client.
Immediately fetch enough extra data from the cursor to satisfy 99% of clients' requirements. Store it to a fast, unsafe cache like memcached, Redis, BigMemory, EHCache, whatever under a key that'll let me retrieve it for later requests by the same client. Then close the cursor to free the DB resources.
Expire the cache on a least-recently-used basis, so if the client doesn't keep reading fast enough they have to go get a fresh set of data from the DB, and the pagination changes.
If the client wants more results than the vast majority of its peers, pagination will change at some point, as you either switch to reading directly from the DB rather than the cache or generate a new, bigger cached dataset.
That way most clients won't notice pagination issues and you don't have to send vast amounts of data to most clients, but you won't melt your DB server either. However, you need a big boofy cache to get away with this. Whether it's practical depends on whether your clients can cope with pagination breaking - if it's simply not acceptable to break pagination, then you're stuck with doing it DB-side with cursors, temp tables, copying the whole result set at the first request, etc. It also depends on the data set size and how much data each client usually requires.
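A rough Python sketch of that hybrid approach, assuming psycopg2 for a PostgreSQL server-side cursor and Redis as the fast cache; the table, page size, prefetch depth and expiry below are illustrative only:

```python
import json

import psycopg2
import redis

PAGE_SIZE = 50
PREFETCH_PAGES = 20          # enough to satisfy ~99% of clients
CACHE_TTL_SECONDS = 300      # expire quickly so stale data gets dropped

cache = redis.Redis()

def first_page(conn, client_id):
    """Send the first page immediately and prefetch the rest into the cache."""
    # A named cursor is a server-side cursor in PostgreSQL, so rows stream
    # instead of being materialised all at once.
    cur = conn.cursor(name=f"page_{client_id}")
    cur.execute("SELECT id, title, score FROM content ORDER BY score DESC")

    first = cur.fetchmany(PAGE_SIZE)
    extra = cur.fetchmany(PAGE_SIZE * PREFETCH_PAGES)
    cur.close()  # free the DB resources as soon as possible

    # Cache everything fetched so far, keyed by client, so later pages can
    # be served without touching the database again.
    cache.setex(f"pages:{client_id}", CACHE_TTL_SECONDS, json.dumps(first + extra))
    return first

def next_page(conn, client_id, page_number):
    """Serve later pages from the cache; fall back to the DB if it expired."""
    cached = cache.get(f"pages:{client_id}")
    if cached is not None:
        rows = json.loads(cached)
        start = (page_number - 1) * PAGE_SIZE
        chunk = rows[start:start + PAGE_SIZE]
        if chunk:
            return chunk
    # Cache expired or exhausted: re-query the live table. The ordering may
    # have changed by now, so this is the point where pagination "breaks".
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, title, score FROM content "
            "ORDER BY score DESC LIMIT %s OFFSET %s",
            (PAGE_SIZE, (page_number - 1) * PAGE_SIZE),
        )
        return cur.fetchall()
```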
I am not aware of a perfect solution to this problem, but if you want the user to have a stale view of the data, then a cursor is the way to go. The only tuning you can do is to store only the data for the first two pages in the cursor; beyond that, you fetch it again.

REST philosophy for updating and getting records

In my app I'm displaying Race objects that essentially have three states: pending, inProgress and completed. I want to display all Races that are currently pending or inProgress, but not the ones that are completed. To do this, I want to create a RESTful API for getting these resources from my server, but I'm not sure what the best (i.e. most RESTful) approach would be.
The issue is that when someone opens or refreshes the app, I need to do two things:
Perform a GET on all the Races that are currently displayed in the client to update their status.
GET all of the new pending or inProgress Races that have been created since the client last updated
I've come up with a few different solutions, though I don't know which, if any, would be best:
Simply delete the old Race records on the client and always GET all new records
Perform 2 separate GET operations, the first which updates all the old records, and the second where I GET all the new pending / inProgress Races
Perform a single GET operation where I specify the created date of the last client record, and GET all records that are newer.
To me, this seems like a pretty common scenario but I haven't been able to find a specific answer to this type of problem. I'd like to see what SO thinks :)
Thanks in advance for your help!
Simply delete the old Race records on the client and always GET all new records
This is probably the easiest solution. However, you shouldn't do that if you need very smooth updates on your client (for games, data visualization, etc.).
Perform 2 separate GET operations (...) / Perform a single GET operation where I specify the created date of the last client record, and GET all records that are newer.
I would definitely do it with a single operation. Better than an update timestamp (timestamp operations are costly, and several operations could happen at the same time), I would use a sequence number. This is the way CouchDB handles "changes".
Moreover, as you will see in the documentation, this solution can then be upgraded to asynchronous notifications (if you need them).
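As a small illustration of that single-call approach from the client side (the /races endpoint, the since parameter, and the response shape are assumptions, not an existing API):

```python
import requests

API_BASE = "https://example.com/api"

def fetch_changes(last_seq: int):
    """Fetch every race created or updated after the given sequence number.

    The server is assumed to stamp each change with a monotonically
    increasing sequence number (CouchDB-style) and to return the highest
    one it sent, so the client can resume from there next time.
    """
    resp = requests.get(f"{API_BASE}/races", params={"since": last_seq})
    resp.raise_for_status()
    body = resp.json()
    return body["races"], body["last_seq"]

# On app open / refresh: one request both updates old records and picks up new ones.
races, last_seq = fetch_changes(last_seq=0)
```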

Data structure for efficient access of random slices of data from an API call

We are writing a library for an API which pulls down an ordered stream of data. Through this API you can request data in slices. For instance, if I want items 15-25 I can make an API call for that.
The library we are writing will allow the client to call for any slice of data as well, but we want the library to be as efficient with these API calls as possible. So if I've already asked for items 21-30, I don't ever want to request those individual data items again. If someone asks the library for 15-25, we want to call the API for 15-20 only. We will need to search for what data we already have and avoid requesting that data again.
What is the most efficient data structure for storing the results of these api calls? The data sets will not be huge so search time in local memory isn't that big of a deal. We are looking for simplicity and cleanliness of code. There are several obvious answers to this problem but I'm curious if any data structure nerds out there have an elegant solution that isn't coming to mind.
For reference we are coding in Python but are really just looking for a data structure that solves this problem elegantly.
I'd use a balanced binary tree (e.g. http://pypi.python.org/pypi/bintrees/0.4.0) to map begin -> (end, data). When a new request comes in for a [b, e) range, do a search for b (then move to the previous record if b != key), another search for e (also stepping back), scan all entries between the resulting keys, pull down the missing ranges, and merge all from-cache intervals and the new data into one interval. For N intervals in the cache, you'll get an amortized O(log N) cost for each cache update.
You can also simply keep a list of (begin, end, data) tuples, ordered by begin, and use bisect_right to search. Cost: O(N) (N = number of cached intervals) for every update in the worst case, but if the clients tend to request data in increasing order, the cache update will be O(1).
Cache search itself is O(log N) in either case.
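A minimal sketch of the simpler list-plus-bisect variant, using end-exclusive [begin, end) ranges; fetch_slice is a stand-in for the real API call:

```python
import bisect

class SliceCache:
    """Cache of contiguous [begin, end) slices already fetched from the API.

    Intervals are kept sorted by `begin` and non-overlapping; bisect gives
    O(log N) lookup of the interval covering a position.
    """

    def __init__(self, fetch_slice):
        self.fetch_slice = fetch_slice      # callable(begin, end) -> list of items
        self.begins = []                    # sorted interval start points
        self.intervals = []                 # parallel list of (begin, end, data)

    def get(self, begin, end):
        result = []
        pos = begin
        while pos < end:
            i = bisect.bisect_right(self.begins, pos) - 1
            if i >= 0 and self.intervals[i][1] > pos:
                # A cached interval covers `pos`: copy what we can from it.
                b, e, data = self.intervals[i]
                take_until = min(e, end)
                result.extend(data[pos - b:take_until - b])
                pos = take_until
            else:
                # Gap: fetch only up to the next cached interval (or `end`).
                j = bisect.bisect_right(self.begins, pos)
                gap_end = min(end, self.begins[j]) if j < len(self.begins) else end
                data = self.fetch_slice(pos, gap_end)
                self._insert(pos, gap_end, data)
                result.extend(data)
                pos = gap_end
        return result

    def _insert(self, begin, end, data):
        i = bisect.bisect_right(self.begins, begin)
        self.begins.insert(i, begin)
        self.intervals.insert(i, (begin, end, data))
        # NOTE: a fuller version would also merge adjacent intervals here to
        # keep the list compact; omitted to keep the sketch short.
```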
The canonical data structure often used to solve this problem is an interval tree. (See this Wikipedia article.) Your problem can be thought of as needing to know which things you've sent (which intervals) overlap with what you're trying to send -- then cut out the intervals that intersect with what you're trying to send (which is linear in the number of overlapping intervals you find) and you're there. The "augmented" tree halfway down the Wikipedia article looks simpler to implement, though, so I'd stick with that. It should be "log N" time complexity, amortization or not.