I want to develop a Web API using .NET Core that needs to be able to handle a large number of concurrent requests. Furthermore, the Web API needs to connect to a database. The Web API will be internal and won't be exposed on the Internet.
I'm considering two possibilities:
Hosting an ASP.NET Core application in Kestrel in one or more containers / EC2 instances, with or without IIS.
A serverless solution using AWS Lambda
I'm trying to understand what considerations I need to be aware of that will impact the performance and cost of each of these solutions.
I understand that a Kestrel server with an application that uses async / await patterns can be expected to handle large numbers of concurrent requests. Database connection pooling means database connections can be efficiently shared between requests.
In this forum post on AWS Lambda I read that:
I understand this question more in terms of “if AWS Lambda calls:
response = Function(request) are thread-safe”.
The answer is yes.
It is because of one very simple reason. AWS Lambda does not allow for
another thread to invoke the same lambda instance before the previous
thread exits. And since multiple lambda instances of the same function
do not share resources, this is not an issue also.
I understood and interpreted this as meaning:
Each lambda instance can only handle one request at a time.
This means a large number of instances are needed to handle a large number of concurrent requests.
Each lambda instance will have its own database connection pool (probably with only a single connection).
Therefore the total number of database connections is likely to be higher, bounded by the Lambda function level concurrency limit.
Also as traffic increases and new lambda instances are created to respond to the demand, there will be significant latency for some requests due to the cold start time of each lambda instance.
Is this understanding correct?
Yes, each Lambda instance is completely isolated and handles only one request at a time. So in your case there will be a lot of database connections.
One problem with your architecture is that you are trying to mix a scalable resource (in this case Lambda) with a non-scalable resource (in this case a relational database). I've seen such setups explode in very spectacular ways.
So in your case I'd either go with a number of static servers running Kestrel or another high-performance web server, or replace the relational database with something that can scale smoothly, like DynamoDB or maybe Amazon Aurora.
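To make the connection-count argument concrete, here is a rough back-of-the-envelope sketch. It assumes one connection per Lambda instance and a fixed-size pool per server; the function names and all the figures are made up for illustration and are not measurements.

```python
import math


def lambda_connections(concurrent_requests: int, concurrency_limit: int) -> int:
    """Upper bound for Lambda: one request per instance, one connection per
    instance (assumed), capped by the function-level concurrency limit."""
    return min(concurrent_requests, concurrency_limit)


def server_connections(concurrent_requests: int, servers: int, pool_size: int) -> int:
    """Upper bound for pooled servers: each server opens at most pool_size
    connections, no matter how many requests it is handling."""
    requests_per_server = math.ceil(concurrent_requests / servers)
    return servers * min(requests_per_server, pool_size)


if __name__ == "__main__":
    # Illustrative figures only: 2000 concurrent requests.
    print(lambda_connections(2000, concurrency_limit=1000))       # 1000
    print(server_connections(2000, servers=4, pool_size=100))     # 400
```

With these assumed numbers the database sees up to 1000 connections from Lambda but at most 400 from four pooled servers, which is the gap the answer above is warning about.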
I have multiple web applications (.NET Core 3.1) that need to use one resource (another third-party legacy application) via its API. The servers are really terrible, and I can make only a few concurrent requests at the same time (assume I can't change that).
I'm looking for some scheduler that can be used across all applications to manage requests made to that legacy application.
For now, each application uses its own Quartz.NET instance for different maintenance purposes. From what I know, there is no way to "chain" those instances so that only a certain number of requests will be made to that API at the same time across multiple Quartz schedulers (correct me if I'm wrong).
I was looking at Hangfire for answers, but I don't think it is possible there either. According to this GitHub extension, I can create one Hangfire server that is used by multiple applications, each one using a different queue, but from what I can gather I cannot limit the number of workers per queue to control the total number of requests made.
Any ideas?
I'm interested in understanding how adapter invocations work from the Worklight Server point of view in the following situation:
Basically, we are defining an architecture for several (n) adapters that must use a common function.
We are planning to create a dedicated adapter to do this, so in this way each adapter should be able to call this "common" procedure using WL.Server.invokeProcedure API.
The doubt is whether a large number (y) of requests from these n adapters calling the "common" one may cause performance issues on the Worklight Server, which receives many invocations of a single procedure.
What I would like to understand (or at least have officially confirmed) is: if the Worklight Server receives many invocations of a single procedure of an adapter (in particular, an HTTP adapter), how does it manage them (e.g. does it handle each invocation on a separate thread in parallel, or put them in a queue)? Also, is sharing a procedure between different adapters through another adapter a common practice, and is there an alternative that avoids an additional invocation to the Worklight Server?
I've read the Performance and Scalability documents, but I haven't noticed information on this specific point.
One aspect that may be of interest to you in regards to performance settings of adapters is
maxConcurrentConnectionsPerNode.
maxConcurrentConnectionsPerNode – The maximum number of concurrent requests that can be
performed on the back-end application from the Worklight server node. This
maxConcurrentConnectionsPerNode parameter is set in the adapter.xml in the connectivity
entry.
There are two considerations when setting this parameter:
If there is no limitation in the back-end on incoming connections, then
a rough rule of thumb is to set the number of connection threads per adapter
to the number of HTTP threads in the application server. A more precise rule
of thumb is to understand the percentage of requests going to each back-end and
set the number accordingly.
The back-end may have a limitation on incoming connection threads. In that case
we can have only BACKEND_MAX_CONNECTIONS/NUM_OF_CLUSTER_NODES connection threads, where
BACKEND_MAX_CONNECTIONS is the maximum number of incoming connections defined in the back-end
server and NUM_OF_CLUSTER_NODES is the number of Worklight server nodes in the cluster.
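As a worked example of the two rules above, the following sketch computes a candidate value for maxConcurrentConnectionsPerNode. The function names, the back-end names and the figures are hypothetical; only the BACKEND_MAX_CONNECTIONS/NUM_OF_CLUSTER_NODES formula and the "share of requests" idea come from the quoted text.

```python
from typing import Dict


def max_connections_per_node(backend_max_connections: int, num_cluster_nodes: int) -> int:
    """BACKEND_MAX_CONNECTIONS / NUM_OF_CLUSTER_NODES, rounded down so the
    cluster as a whole never exceeds the back-end limit."""
    if num_cluster_nodes <= 0:
        raise ValueError("num_cluster_nodes must be positive")
    return backend_max_connections // num_cluster_nodes


def threads_by_traffic_share(http_threads: int, share_percent: Dict[str, int]) -> Dict[str, int]:
    """Split the application server's HTTP threads between back-ends in
    proportion to the percentage of requests each one receives."""
    return {name: http_threads * pct // 100 for name, pct in share_percent.items()}


if __name__ == "__main__":
    # Hypothetical: the back-end accepts 100 connections, the cluster has 4 nodes.
    print(max_connections_per_node(100, 4))  # 25
    # Hypothetical: 50 HTTP threads, 60% of traffic to "crm", 40% to "billing".
    print(threads_by_traffic_share(50, {"crm": 60, "billing": 40}))
```

When both rules apply, the smaller of the two results would presumably be the safer value to put in adapter.xml.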
You can also look into the Tuning Worklight Server documentation that covers some of the aspects you mentioned above:
https://www.ibm.com/developerworks/community/blogs/worklight/entry/tuning_worklight_server?lang=en
I'm currently looking to use Microsoft Sync Framework (2.1) to sync clients (running SQL Server Express) with a cloud based central data store, using WCF for all communications.
The central data store is a SQL database, with a scalable number of processing nodes connected to it, each with an instance of my WCF service to process sync calls.
There could be a large amount of data transferred from the server to the clients when syncing, so I think batching is necessary to avoid out of memory issues, better handle unreliable connections, etc. My problem is that the N-tier examples I've seen seem to require an instancing mode of PerSession on the WCF service end, and batch files are stored to a location on disk, which isn't an option as there is no guarantee subsequent calls will go to the same processing node, so my WCF services are all set to PerCall instancing.
What is the best way for me to tackle this batching problem? Is there a way to store the batches on a central data store (say my server database) or is there an alternative to batching to reduce the size of the dataset to 'bite sized' transfers that will be more robust?
The batching in Sync Framework is just for the transmission of the changes, not their application. So if you have a sync session whose changes are split into 10 batches, a single batch is not applied individually; rather, all 10 batches are applied as one. Internally, the batches are byte arrays that are reconstructed into a dataset, so you can't have part of the batches on one node and the rest on other nodes.
Not sure if it helps, but the Windows Azure Sync Service sample may offer you some patterns for storing the batch files, writing a similar service, and handling the batching.
Have a look at Walkthrough of Windows Azure Sync Service Sample.
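The "all batches or nothing" behaviour described in this answer can be illustrated with a small, generic sketch. This is not Sync Framework code; the function names are hypothetical, and it only shows why every batch has to be reachable from one place (for example a shared store) before the changes can be applied.

```python
from typing import Dict, List


def split_into_batches(payload: bytes, batch_size: int) -> List[bytes]:
    """Cut a serialized change set into fixed-size byte-array batches."""
    return [payload[i:i + batch_size] for i in range(0, len(payload), batch_size)]


def reassemble(batches: Dict[int, bytes], expected_count: int) -> bytes:
    """Rebuild the change set; refuse to continue if any batch is missing."""
    missing = [i for i in range(expected_count) if i not in batches]
    if missing:
        raise ValueError(f"cannot apply changes, missing batches: {missing}")
    return b"".join(batches[i] for i in range(expected_count))


if __name__ == "__main__":
    payload = bytes(range(25))
    batches = split_into_batches(payload, batch_size=10)
    # All batches available in one place: the change set can be rebuilt.
    assert reassemble(dict(enumerate(batches)), len(batches)) == payload
```

If a node only holds some of the batches, `reassemble` fails, which mirrors the problem of PerCall services on different nodes each writing batch files to their own local disk.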
I'm working on a real-time application and building it on Azure.
The idea is that every user reports something about himself and all the other users should see it immediately (they poll the service every second or so for new info).
My approach for now was using a Web Role for a WCF REST Service where I'm doing all the writing to the DB (SQL Azure) without a Worker Role so that it will be written immediately.
I've come to think that using a Worker Role and a Queue to do the writing might be much more scalable, but it might interfere with the real-time side of the service (the Worker Role might not take the job from the queue immediately).
Is it true? How should I go about this issue?
Thanks
While it's true that the queue will add a bit of latency, you'll be able to scale out the number of Worker Role instances to handle the sheer volume of messages.
You can also optimize queue-reading by getting more than one message at a time. Since a single queue has a scalability target of 500 TPS, this lets you go well beyond 500 messages per second on reads.
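The arithmetic behind that claim, as a minimal sketch: if each read transaction can return several messages, read throughput is the transaction rate multiplied by the batch size. The 500 TPS figure is the one quoted above; the cap of 32 messages per read is my understanding of the Azure queue API limit and should be treated as an assumption to verify, and the function name is hypothetical.

```python
MAX_MESSAGES_PER_GET = 32  # assumed per-call limit; check the current documentation


def read_throughput(transactions_per_second: int, messages_per_get: int) -> int:
    """Messages read per second when every transaction fetches a full batch."""
    if not 1 <= messages_per_get <= MAX_MESSAGES_PER_GET:
        raise ValueError("messages_per_get must be between 1 and 32")
    return transactions_per_second * messages_per_get


if __name__ == "__main__":
    print(read_throughput(500, 1))   # 500
    print(read_throughput(500, 32))  # 16000
```

This is a best case: it assumes every read actually finds a full batch waiting in the queue.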
You might look into a Cache for buffering the latest user updates, so when polling occurs, your service reads from cache instead of SQL Azure. That might help as the volume of information increases.
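A minimal in-process sketch of that caching idea follows. It keeps only the latest update per user and lets pollers ask for "everything since the last sequence number I saw". The class and method names are hypothetical, and a real Azure deployment would need a shared cache across instances rather than this single-process dictionary.

```python
import threading
from typing import Any, Dict, Tuple


class LatestUpdateCache:
    """Holds the most recent update per user, tagged with a sequence number."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seq = 0
        self._latest: Dict[str, Tuple[int, Any]] = {}

    def report(self, user_id: str, payload: Any) -> int:
        """Store a user's newest update and return its sequence number."""
        with self._lock:
            self._seq += 1
            self._latest[user_id] = (self._seq, payload)
            return self._seq

    def poll(self, since_seq: int = 0) -> Tuple[int, Dict[str, Any]]:
        """Return the current sequence number and all updates newer than since_seq."""
        with self._lock:
            changes = {u: p for u, (s, p) in self._latest.items() if s > since_seq}
            return self._seq, changes


if __name__ == "__main__":
    cache = LatestUpdateCache()
    cache.report("alice", "online")
    seq, changes = cache.poll()           # first poll sees everything
    cache.report("bob", "away")
    print(cache.poll(since_seq=seq)[1])   # only bob's update
```

The write to SQL Azure can then happen behind the cache (directly or via the queue) without the pollers ever touching the database.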
You could have a look at SignalR. It does not support farm scenarios out of the box, but it should be able to work using internal endpoint calls to update every instance, the Azure Service Bus, or the AppFabric Cache. This way you get a push scenario rather than a pull scenario, so you don't have to poll your endpoints for potential updates.
I am using the Lucene API in my web portal, which is going to have thousands of concurrent users.
Our web server will call the Lucene API, which will sit on an app server. We plan to use 2 app servers for load balancing.
Given this, what should our strategy be for replicating Lucene indexes to the second app server? Any tips, please?
You could use Solr, which has built-in replication. This is possibly the best and easiest solution, since it would probably take quite a lot of work to implement your own replication scheme.
That said, I'm about to do exactly that myself, for a project I'm working on. The difference is that since we're using PHP for the frontend, we've implemented Lucene in a socket server that accepts queries and returns a list of DB primary keys. My plan is to push changes to the server and store them in a queue, where I'll first store them in the memory index, and then flush the memory index to disk when the load is low enough.
Still, it's a complex thing to do and I'm set on doing quite a lot of work before we have a stable final solution that's reliable enough.
From experience, Lucene should have no problem scaling to thousands of users. That said, if you're only using your second app server for load balancing and not for failover, you should be fine hosting Lucene on only one of those servers and accessing it via NFS (if you have a Unix environment) or a shared directory (in a Windows environment) from the second server.
Again, this depends on your specific situation. If you're talking about having millions (5 or more) of documents in your index and needing your Lucene index to support failover, you may want to look into Solr or Katta.
We are working on a similar implementation to what you are describing as a proof of concept. What we see as an end-product for us consists of three separate servers to accomplish this.
There is a "publication" server, that is responsible for generating the indices that will be used. There is a service implementation that handles the workflows used to build these indices, as well as being able to signal completion (a custom management API exposed via WCF web services).
There are two "site-facing" Lucene.NET servers. Access to the API is provided via WCF Services to the site. They sit behind a physical load balancer and will periodically "ping" the publication server to see if there is a more current set of indices than what is currently running. If there is, the server requests a lock from the publication server and updates its local indices by initiating a transfer to a local "incoming" folder. Once the transfer is complete, it is just a matter of suspending the searcher while the new index is attached. The server then releases its lock, and the other server is free to do the same.
Like I said, we are only approaching the proof of concept stage with this, as a replacement for our current solution, which is a load balanced Endeca cluster. The size of the indices and the amount of time it will take to actually complete the tasks required are the larger questions that have yet to be proved out.
Just some random things that we are considering:
The downtime of a given server could be reduced if two local folders are used on each machine receiving data to achieve a "round-robin" approach.
We are looking to see if the load balancer allows programmatic access so that a node can remove itself from the cluster and add itself back. This would lessen the chance that a user experiences a hang if he/she accesses the site during an update.
We are looking at "request forwarding" in the event that cluster manipulation is not possible.
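The two-folder "round-robin" idea from the list above could look roughly like the sketch below. It is not Lucene.NET code: the class name, folder names and copy-based transfer are assumptions chosen to show the pattern, where a new index is copied into the idle folder and only the final switch has to interrupt searching.

```python
import shutil
import tempfile
from pathlib import Path


class DoubleBufferedIndex:
    """Keeps two local index folders; searches use one while the other is refreshed."""

    def __init__(self, root: Path) -> None:
        self._slots = [root / "index_a", root / "index_b"]
        for slot in self._slots:
            slot.mkdir(parents=True, exist_ok=True)
        self._active = 0

    @property
    def active_dir(self) -> Path:
        """Folder the searcher should currently read from."""
        return self._slots[self._active]

    def update_from(self, incoming: Path) -> Path:
        """Copy a freshly published index into the idle folder, then switch to it."""
        standby = 1 - self._active
        target = self._slots[standby]
        shutil.rmtree(target)              # discard the stale copy
        shutil.copytree(incoming, target)  # slow part; searches keep using the active folder
        self._active = standby             # quick switch; reopen the searcher here
        return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        incoming = Path(tmp) / "incoming"
        incoming.mkdir()
        (incoming / "segments").write_text("v1")
        index = DoubleBufferedIndex(Path(tmp) / "local")
        index.update_from(incoming)
        print(index.active_dir.name)  # index_b
```

In a real service the switch would also need to coordinate with in-flight searches, which is where the load-balancer removal or request forwarding mentioned above would come in.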
We looked at Solr, too. While a lot of it just works out of the box, we have some bench time to explore this path as a learning exercise: learning things like Lucene.NET, improving our WF and WCF skills, and implementing ASP.NET MVC for a management front-end. Worst case scenario, we go with something like Solr, but have gained experience in some skills we are looking to improve on.
I create the indices in the filesystem on the publishing backend machines and replicate them over to the marketing nodes.
That way every single load- and fail-balanced node has its own index without network latency.
The only drawback is that you shouldn't try to recreate the index within the replicated folder, as you'll have the lock file lying around on every node, blocking the index reader until your reindex has finished.