Ways in which one can develop a modern API web service that is enterprise-level and scalable

This is more of a general question. There are multiple ways in which one can develop APIs to exchange data, for example:
Python Flask, Apache Camel, Node.js, etc.
And using testing tools like:
SoapUI, Postman, Swagger Editor.
My question is: what are the pros and cons of the different approaches to developing an enterprise-level API that must handle heavy load, scale well, and remain future-proof?
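For concreteness, by "Python Flask" I mean the kind of minimal JSON endpoint sketched below (the /api/orders route and its payload are just invented examples):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/orders", methods=["POST"])
def create_order():
    payload = request.get_json()  # parse the JSON request body
    # ... validation and persistence would go here ...
    return jsonify({"status": "created", "order": payload}), 201

if __name__ == "__main__":
    app.run()  # development server only; deploy behind gunicorn/nginx in production
```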

It really depends on what you are doing.
Is the system growing in the number of transactions, the size of transactions, the number of clients, the complexity of requests, or something else?
Can the requests be done in parallel or are there shared resources?
Are requests coming from a variety of locations or networks?
How much are you willing to spend?
All are important considerations.

Related

Performance benchmark between Boost Graph and TigerGraph, Amazon Neptune, etc.

This might be a controversial topic, but I am concerned about the performance of Boost Graph versus commercial software such as TigerGraph, since we need to choose one.
I am inclined to choose Boost, but I am concerned about whether, performance-wise, Boost is good enough.
Disregarding anything around persistence and management, I am concerned with Boost Graph's core algorithm performance.
If it is good enough, we can build our application logic on top of it without worry.
Also, I found the results below from the LDBC Social Network Benchmark:
LDBC benchmark
It seems that TuGraph is the fastest...
Is LDBC's benchmark authoritative in the realm of graph analysis software?
Thank you
I would say that any benchmark request is a controversial topic, as benchmarks tend to represent a single workload, which may or may not be representative of yours. Additionally, performance is only one of the aspects you should look at, since each option is built to target different workloads and offers different features:
Boost is a library, not a database, so anything around persistence and management would fall on the application to manage.
TigerGraph is an analytics platform that is focused on running real-time graph analytics, such as deep link analysis.
Amazon Neptune is a fully managed service focused on highly concurrent transactional graph workloads.
All three have strong capabilities and will perform well when used in the manner intended. To make the choice more straightforward, I'd suggest you figure out which option best matches the type of workload you are looking to run, the type of support you need, and the amount of operational work you are willing to take on.
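To make the "library, not a database" distinction concrete, here is a rough sketch. It uses Python's networkx as a stand-in for Boost Graph (which is C++) purely to keep this document's examples in one language; the point is that persisting the graph is entirely the application's job:

```python
import pickle
import networkx as nx

# Build the graph entirely in application memory.
G = nx.DiGraph()
G.add_edges_from([("alice", "bob"), ("bob", "carol"), ("carol", "alice")])

# Run an analytics algorithm in-process.
ranks = nx.pagerank(G)
print(ranks)

# Persistence falls on the application; a graph database would handle
# durability, transactions, and concurrent access for you.
with open("graph.pickle", "wb") as f:
    pickle.dump(G, f)
```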

Choice decision between different types of distributed frameworks available

How should one decide between RabbitMQ, Kafka, Akka, and Vert.x, or choose a combination of a few of these?
I have a use case where I want to ingest a huge amount of market data (half a TB each day) using a Java client API provided by an upstream system.
We have currently implemented a distributed ETL using Akka, but want to know what improvements, better choices, or combinations of choices (like Akka + Kafka) could be considered.
Regarding the choice between Akka and Vert.x, the following Devoxx talk is to the point:
https://www.youtube.com/watch?v=EMv_8dxSqdE
It compares concurrency models, among them the event bus (with Vert.x as the example) and actor systems (with Akka as the example).
In the summary slide (1h00m40s into the talk), the difference is summarised as Akka providing hierarchical supervision for error handling, which is presented as an advantage over Vert.x.
akka-stream-kafka (formerly reactive-kafka) feels like a natural fit to bridge the two, and we are happy users of it, but I cannot comment on how it compares to RabbitMQ.
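For reference, the plain consume-and-process loop that a streaming bridge like akka-stream-kafka wraps looks roughly like the sketch below. It uses Python's confluent-kafka client (not the Scala library mentioned above) and assumes a local broker plus a hypothetical market-data topic:

```python
from confluent_kafka import Consumer

def process(payload: bytes) -> None:
    # Stand-in for the ETL stage that would handle one market-data record.
    print(f"received {len(payload)} bytes")

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed local broker
    "group.id": "market-data-etl",          # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["market-data"])  # hypothetical topic name

try:
    while True:
        msg = consumer.poll(1.0)  # wait up to 1s for the next record
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        process(msg.value())
finally:
    consumer.close()
```

What akka-stream-kafka adds on top of such a loop is backpressure, supervision, and composition with the rest of an Akka Streams pipeline.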

CNTK: Python vs C# API for model consumption

We have trained a model using CNTK. We are building a service that is going to load this model and respond to requests to classify sentences. What is the best API to use regarding performance? We would prefer to build a C# service as in https://github.com/Microsoft/CNTK/tree/master/Examples/Evaluation/CSEvalClient but alternatively we are considering building a Python service that is going to load the model in python.
Do you have any recommendations towards one or the other approach (regarding which API is faster, more actively maintained, or any other parameters you can think of)? The next step would be to set up an experiment measuring the performance of both API calls, but I was wondering if there is some prior knowledge here that could help us decide.
Thank you
Both APIs are well developed/maintained. For text data I would go with the C# API.
In C#, the main focus is fast and easy evaluation, and loading text data is straightforward.
The Python API is geared towards development and training of models; at this time, not much attention has been paid to evaluation. On the other hand, because of the wealth of packages, loading data in exotic formats is easier in Python than in C#.
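For what it's worth, a minimal sketch of Python-side evaluation looks like the following; the file name model.dnn is an assumed placeholder, and a real service would encode the incoming sentence instead of the zero vector used here:

```python
import numpy as np
import cntk as C

# Load the trained model from disk ("model.dnn" is a placeholder name).
model = C.load_model("model.dnn")

# Fabricate one feature vector matching the model's first input variable.
input_var = model.arguments[0]
features = np.zeros(input_var.shape, dtype=np.float32)

# Run a single forward pass; eval maps input variables to data batches.
scores = model.eval({input_var: [features]})
print(scores)
```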
The new C# Eval API based on CNTKLibrary will be available very soon (the first beta is probably next week). This API has functional parity with the C++ and Python APIs regarding evaluation.
This API supports using multiple threads to serve multiple evaluation requests in parallel. Even better, the parameters of a loaded model are shared between these threads, which will significantly reduce memory usage in a service environment.
We also have a tutorial about how to use the Eval API in an ASP.NET environment. It still refers to EvalDLL evaluation, but it applies to the new C# API too. The document will be updated after the new C# API is released.

Building GIS apps from scratch?

I am a complete beginner in software, and I am asking for a direction to proceed in researching technologies to build my app. I just have an idea for the app: I am trying to build something like Zomato, but with different services. The idea of a location-based system is similar. I searched online and came to know about GIS systems. But on researching further, it seems I would have to create a map altogether. This feels redundant to build, since we have the Google Maps API.
But can I use this API to build a system "on" it?
Any tutorials or some direction here would be helpful.
Also, what is the difference between GIS-based and GPS-based apps?
As you can see, I am not very clear on the fundamentals of GIS- and GPS-based apps.
Thanks for the help.
Regarding Android, you have almost all you need by combining the platform API and the comprehensive Google Maps Android API. Regarding the latter, it's actually a matter of opting for convenience and possibly paying a licence fee to Google, versus developing your own solution by aggregating free or cheaper services from elsewhere.
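To make "building a system on the Maps API" concrete: a restaurant-finder query of the kind a Zomato-like app needs is a single call against Google's Places service. The sketch below uses the googlemaps Python client rather than the Android SDK, and YOUR_API_KEY and the coordinates are placeholders:

```python
import googlemaps

gmaps = googlemaps.Client(key="YOUR_API_KEY")  # placeholder key

# Find restaurants within 1 km of a point (example coordinates).
response = gmaps.places_nearby(
    location=(12.9716, 77.5946),
    radius=1000,
    type="restaurant",
)

for place in response.get("results", []):
    print(place["name"], "-", place.get("vicinity", ""))
```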
Most problems solved by apps are not the same problems solved by classical GIS software: the former are more consumer-oriented (using public transportation, navigating a route, planning a trip, finding a nearby restaurant), while the latter are more specialist-oriented, typically solving larger-scale and more technical issues (detecting regions with flood risk, monitoring deforestation, calculating volumes of terrain to be bulldozed, etc.).
You should not, IMO, be discouraged by the seemingly hard technical concepts of geography and map-making. Your best bet is to have a clear vision of what actual problems your app should be solving, and to study the geography topics gradually, as the need arises.
A bit of consideration on your question about GIS:
If it were coined today, the GIS acronym would mean any software dealing with geographic data, be it a mobile app or a workstation software suite destined for specialised professional use.
But when it was coined, the term meant almost exclusively the latter, and so it carries a lot of tradition and cultural legacy - which is of course not always a good thing. Specifically (at least in my experience), the jargon and concepts used by the classic GIS community are a bit impenetrable to the newcomer, especially if she comes from the software development field rather than the geosciences.
But geographic information availability has gone from scarcity to overwhelming abundance, and so have its enabling technologies: GPS satellites, mobile computing and mobile connectivity.

Concurrent page request comparisons

I have been hoping to find out what different server setups equate to, in theory, for concurrent page requests, and the answer always seems to be soaked in voodoo and sorcery. What is the approximate maximum number of concurrent page requests for the following setups?
apache+php+mysql(1 server)
apache+php+mysql+caching (like memcached or similar; still one server)
apache+php+mysql+caching+dedicated Database Server (2 servers)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/single dbserver)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/multi dbserver)
+distributed (Amazon elastic cloud) -- I know this one is "as much as you can afford", but it would be nice to know when to move to it.
I appreciate any constructive criticism. I am just trying to figure out when it's time to move from one implementation to the next, because each comes with its own implementation effort, either programming-wise or setup-wise.
In your question you talk about caching, and this is probably one of the most important factors in a web architecture regarding performance and capacity.
Memcache is useful, but before that you should ensure proper HTTP cache directives on your server responses. This does two things: it reduces the number of requests and speeds up server response times (if you have Apache configured correctly). It can be improved further with an HTTP accelerator like Varnish and a CDN.
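As a minimal illustration of such cache directives (shown in Flask to keep one language across this document's examples, though the same header applies to an Apache+PHP stack):

```python
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/products")
def products():
    resp = make_response("<html>...product listing...</html>")
    # Let browsers and shared caches (Varnish, a CDN) reuse this response
    # for five minutes, so repeat requests never reach the app at all.
    resp.headers["Cache-Control"] = "public, max-age=300"
    return resp
```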
Another factor to consider is whether your system is stateless. Stateless usually means that the server does not store sessions and reference them with every request. A good systems architecture relies on state as little as possible: the less state, the more horizontally scalable the system. Most people introduce state when confronted with personalisation, i.e. serving up different content to different users. In such cases you should first investigate using HTML5 session storage (i.e. storing the complete user data in JavaScript on the client, obviously over HTTPS) or, if the data set is smaller, secure JavaScript cookies. That way you can still serve up cached resources and then personalise with JavaScript on the client.
Finally, your stack includes a database tier, another potential bottleneck for performance and capacity. If you are only reading data from the system, it should again be quite easy to scale horizontally. If there are both reads and writes, it is typically better to separate the two workloads: send writes to one database and serve reads from another (such as a replica). You can then scale each with the methods most relevant to it.
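A rough sketch of that read/write split, using psycopg2 against hypothetical primary and replica connection strings (replication between them is assumed to be handled by the database itself):

```python
import psycopg2

class ReadWriteRouter:
    # Route writes to the primary and reads to a replica (sketch only).

    def __init__(self, primary_dsn: str, replica_dsn: str):
        self.primary = psycopg2.connect(primary_dsn)  # hypothetical DSNs
        self.replica = psycopg2.connect(replica_dsn)

    def read(self, sql: str, params=None):
        with self.replica.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()

    def write(self, sql: str, params=None):
        with self.primary.cursor() as cur:
            cur.execute(sql, params)
        self.primary.commit()
```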
These setups do not spit out a single number that you can compare across options. The answer will vary based on far more factors than you have listed.
Even if they did spit out a single number, it would be just one metric out of dozens. What makes this the most important metric?
Even worse, none of these alternatives is free. There is engineering effort and maintenance overhead in each, and that cannot be analysed without understanding your organisation, your app, and your cost/revenue structures.
Options like AWS not only involve development effort but may "lock you in" to a solution, so you also need to be aware of that.
I know this response is not complete, but I am pointing out that this question touches on a large complicated area that cannot be reduced to a single metric.
I suspect you are approaching this from exactly the wrong end. Do not go looking for technologies and then figure out how to use them. Instead, profile your app (measure, measure, measure), figure out the actual problem you are having, and then solve that problem and that problem only.
If you understand the problem and you understand the technology options then you should have an answer.
If you have already done this and the problem is concurrent page requests then I apologise in advance, but I suspect not.