API calls in Microservices Architecture - api

I'm not sure whether my question is clear enough, but I will try to explain it as well as I can. I am currently exploring and experimenting with a microservices architecture to see how it works and to learn more. For the most part I understand how things work, what the role of the API Gateway is in this architecture, and so on.
So I have one more theoretical question. For example, imagine there are two services: an event service (which manages possible events) and a ticket service (which manages tickets related to a specific event; there could be many tickets). These two services really depend on each other, but they have separate databases and are completely isolated and loosely coupled, just as it should be in an "ideal" microservices environment.
Now imagine I want to fetch an event and all tickets related to that event and display them in a mobile application, a web SPA, or whatever. Is calling multiple services / URLs to fetch the data and output it to the UI completely okay in this scenario? Or is there a better way to fetch and aggregate this data?
From my reading of different sources, calling one service from another service adds latency, makes services depend on each other, means future changes in one service will break another one, and so on, so it's not a great idea at all.
I'm sorry if I am repeating a question that was already asked somewhere (although I could not find it), but I need an opinion from someone who has met this question before and can explain the flow here in a better way.

Is calling multiple services / URLs to fetch data and output to UI
completely okay in this scenario? Or is there a better way to fetch
and aggregate this data?
Yes, it is OK to call multiple services from your UI and aggregate the data in your frontend code for your needs. In this case you would call two REST APIs to get the data from the ticket micro-service and the event micro-service.
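As a minimal sketch of option 1, assuming a C# client (for example a mobile app) and hypothetical service URLs and DTO shapes, the client fires both REST calls and merges the results itself:

    // Minimal sketch of client-side aggregation (option 1).
    // The URLs and DTO shapes below are hypothetical.
    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Net.Http.Json;
    using System.Threading.Tasks;

    public record EventDto(Guid Id, string Name);
    public record TicketDto(Guid Id, Guid EventId, decimal Price);

    public class EventDetailsClient
    {
        private readonly HttpClient _http = new HttpClient();

        public async Task<(EventDto Event, List<TicketDto> Tickets)> LoadAsync(Guid eventId)
        {
            // Two independent calls, issued in parallel to keep latency low.
            var eventTask = _http.GetFromJsonAsync<EventDto>(
                $"https://events.example.com/api/events/{eventId}");
            var ticketsTask = _http.GetFromJsonAsync<List<TicketDto>>(
                $"https://tickets.example.com/api/events/{eventId}/tickets");

            await Task.WhenAll(eventTask, ticketsTask);

            // Aggregation happens here, in the consuming client.
            return (await eventTask, await ticketsTask);
        }
    }

Issuing the two requests in parallel keeps the extra round trip from hurting the perceived latency too much.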
Another option is to have a views/read-optimized micro-service which aggregates data from both micro-services and serves as a read-only micro-service. This of course involves some latency considerations and other trade-offs. This approach can be used, for example, if you have a requirement like a view composed of multiple models (something like a denormalized view) which will be accessed a lot and/or needs complex filter options. In this approach you would have a third micro-service whose data is aggregated from your two micro-services (tickets and events). This micro-service would be optimized for reading and could, if needed, use a different storage type (a document DB or similar). If you decided to do this in your case, you could make only one API call to this micro-service and it would provide all your data.
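A rough sketch of option 2, assuming a hypothetical third read-model service (the names and endpoint are made up, and an in-memory dictionary stands in for the document store) that serves the denormalized event-with-tickets view in a single call:

    // Hypothetical read-model service: one call returns the denormalized view.
    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Http;

    // Denormalized view, kept up to date from the two source services.
    public record EventWithTicketsView(Guid EventId, string Name, List<TicketLine> Tickets);
    public record TicketLine(Guid TicketId, decimal Price);

    public static class Program
    {
        // Stand-in for a document store (a document database in a real system).
        private static readonly ConcurrentDictionary<Guid, EventWithTicketsView> Views = new();

        public static void Main(string[] args)
        {
            var app = WebApplication.CreateBuilder(args).Build();

            // Single read-optimized endpoint the UI calls instead of two services.
            app.MapGet("/api/event-details/{id:guid}", (Guid id) =>
                Views.TryGetValue(id, out var view) ? Results.Ok(view) : Results.NotFound());

            app.Run();
        }
    }

How the view gets populated (events, change feeds, scheduled sync) is a separate design decision and is where the latency considerations mentioned above come in.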
Calling one micro-service from another: in some cases you cannot really avoid this. Even though some sources online will tell you not to do it, sometimes it is inevitable. For your example I would not recommend this approach, as it would introduce coupling and unnecessary latency which can be avoided with the other approaches.
Background info:
You can read this answer, where the topic is whether it is OK to call one micro-service from another micro-service. For your particular case it is not a good option, but for some cases it might be, so read it for some clarification.
Summary:
I have worked with systems where we were doing all three of these things. It really depends on your business scenario and the needs of your application. Which approach to pick will depend on criteria such as: usability from the UI, scaling (if you have high demand on the micro-services you could consider adding a third micro-service which aggregates data from the tickets and events micro-services), and domain coupling.
For your case I would suggest option 1, or option 2 if you have a highly demanding UI, from the user's perspective. For some cases option 1 is enough and having a third micro-service would be overkill, but sometimes it is an option as well.

In my experience with cloud-based services, primarily Microsoft Azure, the latency of one service calling another does indeed exist, but it can be relied upon to be minimal. This is especially true when compared to the unknown latency involved in the user's device making the call over whichever internet plan they happen to have.
There will always be a consuming client that is dependent on a service and its defined interface, whether it is the SPA app or another service. So in the scenario you described, something has to aggregate the payloads from both services.
Based on this, I have seen improved performance from using a service which handles client requests, aggregates results from n services, and responds accordingly. This does indeed result in dependencies, but as your services evolve it is possible to have multiple versions of your services active simultaneously, allowing you to deprecate older versions at a time that is appropriate.
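For illustration, versioning the aggregating endpoint might look roughly like this (the routes and payload shapes are assumptions, not a prescription); old clients stay on v1 while new clients move to v2:

    // Hypothetical aggregating endpoint kept alongside an older version,
    // so existing clients keep working while new ones move to v2.
    using System;
    using Microsoft.AspNetCore.Mvc;

    [ApiController]
    public class EventDetailsController : ControllerBase
    {
        [HttpGet("api/v1/event-details/{id:guid}")]
        public IActionResult GetV1(Guid id) =>
            Ok(new { id, name = "Sample event" });                 // original shape

        [HttpGet("api/v2/event-details/{id:guid}")]
        public IActionResult GetV2(Guid id) =>
            Ok(new { id, name = "Sample event", tickets = Array.Empty<object>() }); // extended shape
    }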
I hope that helps.

Optional Advice
You can maintain the read table (denormalized) inside whichever of the services suits best. Why? Because CQRS should be applied only where needed; it is best suited to big, complex applications. Otherwise it introduces complexity into your system, and you gain little benefit for a lot of headache.

Related

Rest instead of direct sql for internal application data access

We have a layer-based application architecture. It is written in C# and uses the SQL objects available in .NET for data access. Some of it is a home-built ORM, some uses stored procedures. We have a number of Windows services that use this architecture to process data. Scaling and performance have always been issues. A new person on our team is pushing to convert our data access to use REST-based data services. This would replace our current data access layer.
I don't think REST is meant for our architecture. I also have concerns about performance; I have to think it will be significantly slower. I don't see how going out of process to what is effectively a web service, and then on to the database for CRUD operations, is not going to make our performance issues worse. I know REST can lead to performance improvements with caching and further scale-out abilities, but that is not being addressed now. It's just a data access replacement with no bells and whistles for now. On top of this, the initial implementation will not allow us to use stored procedures. All processing will be table-based CRUD operations and any data massaging will be done in the C# code, with no set-based operations.
I could easily be wrong, but I can see a disaster coming and I don't know if I'm right or if I'm a Chicken Little. I'm looking for any guidance, advice, or case study references on this; anything that can either help my case or resolve my dread. Thanks.
If all the clients, such as Windows services etc are using stored procedures or direct SQL access, that's tight coupling. Any change in the database schema will mean all clients need to be updated to handle the change. If the schema never changes, it's not an issue. If the schema does change and the people who developed one of the services have left, it's a big issue.
If the database is abstracted away from the clients behind a REST layer, that's loose coupling. Any change in the database schema is irrelevant to the clients. The only thing that needs to change is the data access layer of the REST layer. The client facing endpoints won't change. How those endpoints interact with the database will change.
Essentially, moving from direct SQL to REST is taking a step back from your system design. To paraphrase:
SELECT * FROM ORDERS WHERE NOT PAID
becomes
https://api.com/orders/unpaid
and the returned object is a domain object representing a list of unpaid order objects.
So the clients move away from the implementation (select *) and move towards a domain solution. There is no more tight coupling between clients and database but a loose coupling between the clients and the domain.
Rather than speaking to a database in its own language, the clients now talk the domain language, "get all unpaid orders". They don't care how those orders are stored, or where. They just ask the REST endpoint.
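A minimal sketch of what such an endpoint could look like in C# (the controller, repository and DTO names are hypothetical); the SQL or stored procedure stays hidden behind the repository:

    // The client asks the domain question; the SQL stays behind the endpoint.
    using System.Collections.Generic;
    using Microsoft.AspNetCore.Mvc;

    public record UnpaidOrder(int OrderId, string Customer, decimal Amount);

    public interface IOrderRepository
    {
        // Internally this can still be SELECT * FROM ORDERS WHERE NOT PAID,
        // a stored procedure, or an ORM query - clients never see it.
        IReadOnlyList<UnpaidOrder> GetUnpaid();
    }

    [ApiController]
    [Route("orders")]
    public class OrdersController : ControllerBase
    {
        private readonly IOrderRepository _orders;
        public OrdersController(IOrderRepository orders) => _orders = orders;

        [HttpGet("unpaid")]   // GET https://api.com/orders/unpaid
        public IReadOnlyList<UnpaidOrder> GetUnpaid() => _orders.GetUnpaid();
    }

The schema can now change behind IOrderRepository without any client noticing, which is exactly the loose coupling described above.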
This is all just implementation so far, but your understanding of the business will increase because you'll be talking in Domain Driven Design (DDD) terms: the REST endpoints will accept and return domain objects rather than raw SQL.
Is this good for the business? Is it good for developers to understand the business at a domain level? If these answers are "yes", is the cost/benefit ratio of rewriting tons of client code to talk REST positive enough to make the change? Will having a REST/domain interface with the data open up new ways of looking at the data? Will that touch on profits?
The real question is along the lines of, will changing the architecture to be a loosely coupled REST integration that improves understanding of business objects and opens the door to a wider talent pool (potentially more REST coders than SQL gurus?) be worth it in terms of future proofing the business without hitting profits in the short term?
Will thinking of the business in DDD terms be worth the initial hit of moving from SQL to REST? Will that new DDD experience open up new doors in future design?
These are the real questions to ask. Caching, scaling etc are REST implementation issues, only relevant once you've answered the philosophical questions posed above.
Good luck, sounds like an exciting time for you!

What's the best way to monitor your REST API? [closed]

I've created an API based on the RESTful pattern and I was wondering what's the best way to monitor it? Can I somehow gather statistics on each request and how deep could I monitor the requests?
Also, could it be done by using open source software (maybe building my own monitoring service) or do I need to buy third party software?
If it could be achieved by using open source software where do I start?
Start with identifying the core needs that you think monitoring will solve. Try to answer the two questions "What do I want to know?" and "How do I want to act on that information?".
Examples of "What do I want to know?"
Performance over time
Largest API users
Most commonly used API features
Error occurrence in the API
Examples of "How do I want to act on that information?"
Review a dashboard of known measurements
Be alerted when something changes beyond expected bounds
Trace execution that led to that state
Review measurements for the entire lifetime of the system
If you can answer those questions, you can either find the right third-party solution that captures the metrics you're interested in, or inject monitoring probes into the right sections of your API that will tell you what you need to know. I noticed that you're primarily a Laravel user, so it's likely that many of the metrics you want can be captured by adding before (Registering Before Filters On a Controller) and after (Registering an After Application Filter) filters to your application, to measure time to respond and successful completion of the response. This is where the answers to the first set of questions are most important ("What do I want to know?"), as they will guide where and what you measure in your app.
Once you know where you can capture the data, selecting the right tool becomes a matter of choosing between (roughly) two classes of monitoring applications: highly specialized monitoring apps that are tightly bound to the operation of your application, and generalized monitoring software that is more akin to a time series database.
There are no popular (to my knowledge) examples of the highly specialized case that are open source. Many commercial solutions do exist however: NewRelic, Ruxit, DynaTrace, etc. etc. etc. Their function could easily be described to be similar to a remote profiler, with many other functions besides. (Also, don't forget that a more traditional profiler may be useful for collecting some of the information you need - while it definitely will not supplant monitoring your application, there's a lot of valuable information that can be gleaned from profiling even before you go to production.)
On the general side of things, there are many more open source options that I'm personally aware of. The longest lived is Graphite (a great intro to which may be read here: Measure Anything, Measure Everything), which is in pretty common use. Graphite is far from the only option, however, and you can find many other options, like Kibana and InfluxDB, should you wish to host them yourself.
Many of these open source options also have hosted options available from several providers. Additionally, you'll find that there are many entirely commercial options available in this camp (I'm founder of one, in fact :) - Instrumental ).
Most of these commercial options exist because application owners have found it pretty onerous to run their own monitoring infrastructure on top of running their actual application; maintaining availability of yet another distributed system is not high on many ops personnel's wishlists. :)
(I'm clearly biased for answering this since I co-founded Runscope which I believe is the leader in API Monitoring, so you can take this all with a grain of salt or trust my years of experience working with 1000s of customers specifically on this problem :)
I don't know of any OSS tools specific to REST(ful) API monitoring. General purpose OSS metrics monitoring tools (like Graphite) can definitely help keep tabs on pieces of your API stack, but don't have any API-specific features.
Commercial metrics monitoring tools (like Datadog) or Application Performance Monitoring (APM) tools like (New Relic or AppDynamics) have a few more features specific to API use cases, but none are centered on it. These are a useful part of what we call a "layered monitoring approach": start with high-level API monitoring, and use these other tools (exception trackers, APM, raw logs) to dive into issues when they arise.
So, what API-specific features should you be looking for in an API monitoring tool? We categorize them based on the three factors that you're generally monitoring for: uptime/availability, performance/speed and correctness/data validation.
Uptime Monitoring
At a base level you'll want to know if your APIs are even available to the clients that need to reach them. For "public" APIs (meaning available on the public internet, not necessarily publicized; a mobile backend API is public but not necessarily publicized) you'll want to simulate the clients that are calling them as much as possible. If you have a mobile app, it's likely the API needs to be available around the world. So at a bare minimum, your API monitoring tool should allow you to run tests from multiple locations. If your API can't be reached from a location, you'll want notifications via email, Slack, etc.
If your API is on a private network (corporate firewall, staging environment, local machine, etc.) you'll want to be able to "see" it as well. There are a variety of approaches for this (agents, VPNs, etc.); just make sure you use one your IT department signs off on.
Global distribution of testing agents is an expensive setup if you're self-hosting, building in-house or using an OSS tool. You need to make sure each remote location you set up (preferably outside your main cluster) is highly-available and fully-monitored as well. This can get expensive and time-consuming very quickly.
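As a rough illustration of the bare-minimum availability check described above (the URL, timeout and alert channel are assumptions):

    // Minimal availability probe: request the endpoint, alert if it fails or times out.
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    public class UptimeProbe
    {
        private static readonly HttpClient Http = new HttpClient
        {
            Timeout = TimeSpan.FromSeconds(10)
        };

        public static async Task CheckAsync(string url)
        {
            try
            {
                var response = await Http.GetAsync(url);
                if (!response.IsSuccessStatusCode)
                    Alert($"{url} returned {(int)response.StatusCode}");
            }
            catch (Exception ex)   // DNS failure, timeout, connection refused, ...
            {
                Alert($"{url} unreachable: {ex.Message}");
            }
        }

        // Stand-in for an email/Slack notification.
        private static void Alert(string message) => Console.WriteLine($"ALERT: {message}");
    }

Running a probe like this from several geographically separate locations is the part that gets expensive when you build it yourself.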
Performance Monitoring
Once you've verified your APIs are accessible, you'll want to start measuring how fast they are performing to make sure they're not slowing down the apps that consume them. Raw response time is the bare minimum metric you should be tracking, but it is not always the most useful. Consider cases where multiple API calls are aggregated into a view for the user, or where actions by the user generate dynamic or rarely requested data that may not be present in a caching layer yet. These multi-step tasks or workflows can be difficult to monitor with APM or metrics-based tools, as those don't have the capability to understand the content of the API calls, only their existence.
Externally monitoring for speed is also important in order to get the most accurate representation of performance. If the monitoring agent sits inside your code or on the same server, it's unlikely to take into account all the factors that an actual client experiences when making a call: things like DNS resolution, SSL negotiation, load balancing, caching, etc.
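A minimal sketch of external timing from the client's side, so DNS, TLS, load balancing and the rest are included in what you measure (the URL and output format are illustrative only):

    // Time the whole round trip as a real client would experience it.
    using System;
    using System.Diagnostics;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class ResponseTimer
    {
        private static readonly HttpClient Http = new HttpClient();

        public static async Task<TimeSpan> MeasureAsync(string url)
        {
            var stopwatch = Stopwatch.StartNew();
            using var response = await Http.GetAsync(url);
            await response.Content.ReadAsStringAsync();  // include body download time
            stopwatch.Stop();

            Console.WriteLine(
                $"{url} answered {(int)response.StatusCode} in {stopwatch.ElapsedMilliseconds} ms");
            return stopwatch.Elapsed;
        }
    }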
Correctness and Data Validation
What good is an API that's up and fast if it's returning the wrong data? This scenario is very common and is ultimately a far worse user experience. People understand "down"; they don't understand why an app is showing them the wrong data. A good API monitoring tool will allow you to do deep inspection of the message payloads going back and forth. JSON and XML parsing, complex assertions, schema validation, data extraction, dynamic variables, multi-step monitors and more are required to fully validate that the data being sent back and forth is correct.
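As a small example of that kind of payload inspection, here is a sketch that parses a JSON response and asserts on its content rather than only the status code (the field names are hypothetical):

    // Parse the response and assert on its content, not just on "200 OK".
    using System.Net.Http;
    using System.Text.Json;
    using System.Threading.Tasks;

    public static class PayloadCheck
    {
        public static async Task<bool> EventHasTicketsAsync(HttpClient http, string url)
        {
            var json = await http.GetStringAsync(url);
            using var doc = JsonDocument.Parse(json);
            var root = doc.RootElement;

            // Simple assertions against the expected (hypothetical) schema.
            bool hasName = root.TryGetProperty("name", out var name)
                           && name.ValueKind == JsonValueKind.String;
            bool hasTickets = root.TryGetProperty("tickets", out var tickets)
                              && tickets.ValueKind == JsonValueKind.Array;

            return hasName && hasTickets;
        }
    }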
It's also important to validate how clients authenticate with your API. Good API-specific monitoring tools will understand OAuth, mutual authentication with client certificates, token authentication, etc.
Hopefully this gives you a sense of why API monitoring is different from "traditional" metrics, APM and logging tools, and how they can all play together to give a complete picture of how your application is performing.
I am using runscope.com at my company. If you want something free, apicombo.com can also do the job.
Basically you can create a test for your API endpoint to validate the payload, response time, status code, etc. Then you can schedule the test to run. They also provide some basic statistics.
I've tried several applications and methods to do that, and the best (for my company and our related projects) is to log key=value pairs (atomic entries with all the information associated with the operation, such as source IP, operation result, elapsed time, etc., in specific log files for each node/server) and then monitor them with Splunk. With your REST and JSON data your approach may be different, but it is also well supported.
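A tiny sketch of what such a key=value log entry might look like in C# (the field names and log file path are arbitrary):

    // One atomic log entry per operation, as key=value pairs Splunk can index.
    using System;
    using System.IO;

    public static class KeyValueLog
    {
        public static void LogRequest(string operation, string sourceIp,
                                      int statusCode, long elapsedMs)
        {
            var line = $"ts={DateTime.UtcNow:o} op={operation} src_ip={sourceIp} " +
                       $"result={statusCode} elapsed_ms={elapsedMs}";
            File.AppendAllText("api.log", line + Environment.NewLine);
        }
    }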
Splunk is pretty easy to install and set up. You can monitor (almost) real-time data (response times, operation results), send notifications on events, and do some DWH work (and many other things; there are lots of plugins).
It's not open source, but you can try it for free if you use less than 50 MB of logs per day (that's how it worked some time ago; now that I'm on an enterprise licence I'm not 100% sure).
Here is a little tutorial explaining how to achieve what you are looking for: http://blogs.splunk.com/2013/06/18/getting-data-from-your-rest-apis-into-splunk/

Web apps architecture: 1 or n API

Background:
I'm thinking about web application organisation. I will separate the front end (the web site for the browser) from the back end (the API): two apps, two repositories, two hosting environments. The front end will call the API for almost everything.
So, if I have two separate domain services behind my API (for example, a learning context and a booking context) with no direct link between them, should I build two APIs (with two repositories, two build processes, etc.)? Is it good practice to build n APIs for n needs, or one "big" API? I'm talking about a substantial web app with traffic.
(I hope this question will not be closed as not constructive... I think it's a real question for a concrete case; sorry if not. This question and some others about architecture were not closed, so there is hope for mine.)
It all depends on the application you are working on, its business needs, priorities you have and so on. Generally you have several options:
Stay with one monolithic application
Stay with one monolithic application but decouple domain model across separate modules/bundles/libraries
Create distributed architecture (like Service Oriented Architecture (SOA) or Event Driven Architecture (EDA))
One monolithic application
It's the easiest and cheapest way to develop an application in its beginning stage. You don't have to worry about complex architecture or complex deployment and development processes. It also works better when there are not many developers around.
Once the application grows, this model starts to be problematic. You can't deploy modules separately, and the app is more exposed to anti-patterns and spaghetti code/design (especially when a lot of people are working on it). The QA process takes more and more time, which may make it unusable on a CI basis. Introducing approaches like Continuous Integration/Delivery/Deployment also becomes much, much harder.
Within this approach you have one repository and one build process for all your APIs.
One monolithic application but decouple domain model
Within this approach you still have one big platform, but you connect logically separate modules as third-party style dependencies. For example, you may extract one module and create a library from it.
Thanks to that you are able to introduce separate processes (QA, dev) for different libraries, but you still have to deploy the whole application at once. It also helps you avoid anti-patterns, but it may be hard to keep backward compatibility across libraries over the application's lifespan.
Regarding your question, in this approach you have a separate API, dev process, and repository for each "type of action", as long as you move its domain logic into a separate library.
Distributed architecture (SOA / EDA)
SOA has a lot of benefits. You can introduce completely different processes for each service: dev, QA, and deployment. You can deploy just one service at a time. You can also use different technologies for different purposes. The QA process gets more reliable as it involves smaller projects. You can version the communication (API) between services, which makes them even more independent. Moreover, you have a better ability to scale horizontally.
On the other hand, the complexity of the high-level architecture grows. You have many more components to take care of: authentication/authorisation between services, security, service discovery, distributed transactions, etc. If your application is data driven (a separate front end which uses APIs to consume data) and particular services don't need to communicate with each other, it may not be that complicated (but such an assumption is IMO quite risky; sooner or later you will need them to communicate).
In this approach you have a separate API, with separate repositories and separate processes, for each "type of action" (which I understand as separate domain models/services).
As I wrote at the beginning, the way you choose depends on the application and its needs. Anyway, back to your original question: my suggestion is to keep APIs as separate as you can. Even if you have one monolithic application, you should be able to version APIs separately and keep their domain logic separate. Separating repositories and/or processes depends on the approach you choose (e.g. among those I mentioned above).
If I missed your point, please describe in more detail what answer you expect.
Best!

Best practice for commitment control using RESTful APIs

We are currently designing an application that uses RESTful APIs to communicate with the DB. Our DB is a normalized structure, with upwards of 7 tables representing a single application data point, and an API for each DB entity.
What I am struggling with is how to institute commitment control over these tables. Ideally, I would like to call each API from my API controller, but that would scope each commit to a single table and make the application responsible for rollbacks. This is not ideal, as it would mean that we are in essence doing dirty writes.
What is the best practice for using RESTful APIs and still having the DB perform commitment control?
The model that you expose as a group of RESTful resources need not be the same as the model that the database uses. For example, you can use RESTful manipulation to build up a "change description" resource (local to the user's session) that is then applied to the database in one commit. The change description is complex, but there are no problems with dirty writes, because all the user is changing is a private world until they choose to commit to it.
If you think of a web-based model (useful with REST!) then this is like filling out a complicated order form in multiple stages. The company from which you are buying happily lets you fill out the form, storing values as necessary, but doesn't commit to actually fulfilling the order and charging your credit card until you say that it is all ready to go. I'm sure you can apply the same principle to other complex modifications.
One key thing though; if the commitment is not idempotent (i.e., if you commit it twice, different things happen) it must be a POST. That's probably a good idea in your scenario anyway, since I'd guess you want to remove the “building up an action description” resource on successful POSTing. (Yes, we'd still be following the “web form” model.)
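A rough sketch of that idea with hypothetical endpoints: the client builds up the draft with idempotent PUTs, and a single POST applies the accumulated changes in one database transaction.

    // Hypothetical endpoints: build up a draft resource, then commit it atomically.
    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using Microsoft.AspNetCore.Mvc;

    public record DraftChange(string Table, string Action, Dictionary<string, object> Values);

    [ApiController]
    [Route("changes")]
    public class ChangeSetController : ControllerBase
    {
        // Stand-in for per-session draft storage.
        private static readonly ConcurrentDictionary<Guid, List<DraftChange>> Drafts = new();

        // PUT replaces the whole draft, so repeating it is harmless (idempotent).
        [HttpPut("{draftId:guid}")]
        public IActionResult SaveDraft(Guid draftId, List<DraftChange> changes)
        {
            Drafts[draftId] = changes;   // nothing touches the database yet
            return NoContent();
        }

        // Not idempotent (it applies the draft and then removes it), hence POST.
        [HttpPost("{draftId:guid}/commit")]
        public IActionResult Commit(Guid draftId)
        {
            if (!Drafts.TryRemove(draftId, out var changes))
                return NotFound();

            // Apply all accumulated changes in a single database transaction here.
            return Ok(new { applied = changes.Count });
        }
    }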
I do think you want to carefully consider the complexity of your models though. It's a useful exercise to make things as simple as possible (no simpler though) where “simple” involves keeping the number of concepts down. If you have lots of things, but they all work exactly the same way, they're actually pretty simple. (Increasing the number of address lines in a customer record doesn't really increase the complexity very much.) The good thing about REST is that it uses very few concepts, and they're concepts that lots of people are familiar with from the web.
Implement the controller you want for your RESTful services. The controller does little more than call through to a service layer where your transactions are managed. The service layer coordinates access to the various tables by whatever DAOs need to cooperate together--the DAOs do not need to be concerned with the transaction boundaries. If you happen to be a Spring user, this thread may be of help.
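In C# terms that separation might look roughly like this (the DAO and service names are made up, and TransactionScope stands in for whatever transaction management you actually use):

    // Controller delegates to a service layer that owns the transaction boundary;
    // the individual DAOs stay unaware of it.
    using System.Transactions;

    public record ApplicationData(object Header, object Details);
    public interface IHeaderDao { void Save(object header); }
    public interface IDetailDao { void SaveAll(object details); }

    public class ApplicationService
    {
        private readonly IHeaderDao _headers;    // hypothetical DAOs, one per table
        private readonly IDetailDao _details;

        public ApplicationService(IHeaderDao headers, IDetailDao details)
        {
            _headers = headers;
            _details = details;
        }

        public void SaveApplication(ApplicationData data)
        {
            // One commit scope around all participating tables.
            using var scope = new TransactionScope();
            _headers.Save(data.Header);
            _details.SaveAll(data.Details);
            scope.Complete();   // nothing is committed unless we reach this line
        }
    }

The REST controller then only validates the request and calls SaveApplication, so the commit scope covers all seven tables rather than one table per API call.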

Does a CQRS project need a messaging framework like NServiceBus?

The last six months' learning curve has been challenging, with CQRS and DDD the main culprits.
It has been fun, and we are halfway through our project; the area I have not had time to delve into is a messaging framework.
Currently I don't use DTC, so there is a very good likelihood that if my read model is not updated I will have inconsistency between the read and write databases. Also, my read and write databases will be on the same machine; I doubt we will ever put them on separate machines.
I don't have a large volume of messages in my system so my concern is more to do with consistency and reliability of the system.
So, do I have to put in a messaging framework like NServiceBus (even though both read and write databases are on the same machine), or do I have other options? Yes, there is a learning curve, but I suppose there would be a hell of a lot to learn if I don't use it.
Also, I don't want to put in a layer if it is not necessary.
Thoughts?
Currently I don't use DTC so there is a very good likelihood that if
my read model is not updated then I will have inconsistency between
the read and write databases.
Personally, I dislike the DTC and try to avoid it. Instead, it is often possible to implement a compensation mechanism, especially for something like a read model where eventual consistency is already acceptable and updates are idempotent. For example, you could implement a version on entities and have a background task which ensures versions are in sync. Having a DTC will provide transactional retry functionality, but it still won't solve cases where failure occurs after retries; you still have to watch the error log and have procedures in place to deal with errors.
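A sketch of that compensation idea, assuming each entity carries a version in both stores and a hypothetical background job re-projects anything that has fallen behind:

    // Background reconciliation: re-project any read-model rows that fell behind.
    using System.Collections.Generic;

    public interface IWriteStore { IEnumerable<(int Id, int Version)> GetAll(); }
    public interface IReadStore
    {
        int GetVersion(int id);
        void Project((int Id, int Version) entity);
    }

    public class ReadModelReconciler
    {
        private readonly IWriteStore _writeStore;   // hypothetical abstractions
        private readonly IReadStore _readStore;

        public ReadModelReconciler(IWriteStore writeStore, IReadStore readStore)
        {
            _writeStore = writeStore;
            _readStore = readStore;
        }

        public void Run()
        {
            foreach (var entity in _writeStore.GetAll())
            {
                if (_readStore.GetVersion(entity.Id) < entity.Version)
                {
                    // Idempotent update: applying it twice leaves the same result.
                    _readStore.Project(entity);
                }
            }
        }
    }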
So, do I have to put in a messaging framework like NServiceBus (even
though both read and write databases are on the same machine) or do I
have other options?
It depends on a few things. What you often encounter in a CQRS system is a need for pub/sub, where several sub-systems publish events to which the query/caching system subscribes. If you see a need for pub/sub beyond basic point-to-point messaging, then go with something like NServiceBus. Also, I wouldn't immediately shy away from using NServiceBus even if you don't need it for scalability purposes, because I think the logical partitioning is beneficial on its own. On the other hand, as you point out, adding layers of complexity is costly, so first try to see if the simplest possible thing will work.
Another question to ask is whether you need a separate query store at all. If all you have is a single machine, why bother? You could use something simpler like the read-model pattern and still reap a lot of the benefits of CQRS.
Does a CQRS project need a messaging framework like NServiceBus?
The short answer: no.
It is the first time I have heard of the 'read-model pattern' mentioned by eulerfx. It is a nice enough name, but there is a bit more to it:
The general idea behind the 'query' part is to query a denormalized view of your data. In the 'read-model pattern' link you will notice that the query used to populate the read model is doing some lifting. In the mentioned example the required data manipulation is not that complex, but what if it does become more complex? This is where denormalization comes in. When you perform your 'command' part, the next action is to denormalize the data and store the results for easy reading. All the heavy lifting should be done by your domain.
This is why you are asking about the messaging. There are several techniques here:
denormalized data in same database, same table, different columns
denormalized data in same database, different table
denormalized data in different database
That's the storage. How about consistency?
immediately consistent
eventually consistent
The simplest solution (a quick win) is to denormalize your data in your domain and then, after saving your domain objects through the repository, immediately save the denormalized data to the same data store, same table(s), different columns. It is 100% consistent and you can start reading the denormalized data immediately.
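A minimal sketch of that quick win (the type and member names are made up): the aggregate and its denormalized read row are written in the same transaction, so reads are immediately consistent.

    // Save the domain object and its denormalized read row in one transaction.
    using System.Transactions;

    public record Ticket(int Id, string EventName, decimal Price);
    public interface ITicketRepository { void Save(Ticket ticket); }
    public interface ITicketReadModelWriter { void Upsert(int id, string eventName, decimal price); }

    public class TicketApplicationService
    {
        private readonly ITicketRepository _tickets;          // hypothetical write model
        private readonly ITicketReadModelWriter _readModel;   // same database, different table/columns

        public TicketApplicationService(ITicketRepository tickets, ITicketReadModelWriter readModel)
        {
            _tickets = tickets;
            _readModel = readModel;
        }

        public void Purchase(Ticket ticket)
        {
            using var scope = new TransactionScope();
            _tickets.Save(ticket);                                         // write model
            _readModel.Upsert(ticket.Id, ticket.EventName, ticket.Price);  // denormalized view
            scope.Complete();   // both commit together, so the read side is never behind
        }
    }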
If you really want to, you can create a separate bunch of objects to transport that data, but it is simpler to just write a simple query layer that returns some data-carrying object provided by your data-access framework (in the case of .NET that would be a DataRow/DataTable). There is absolutely no reason to get fancy. There will always be exceptions, but then you can go ahead and write a data container.
For eventual consistency you will need some form of queuing and related processing. You can roll your own solution or you may opt for a service bus. That is up to you and your time / technical constraints :)
BTW: I have a free open-source service bus here:
Shuttle.Esb
documentation
Any feedback would be welcomed. But any old service bus will do (MassTransit / NServiceBus / etc.).
Hope that helps.