Is it possible to query the ORM of a microservice through its API?

Is it possible to query the ORM of a microservice through its API, and use it as the ORM of another microservice?
E.g. Let's say I have the microservice A with its API (let's call it API_A), its DB (DB_A) and its internal Object-Relational Mapper instances (ORM_A) defining the correspondence between the classes belonging to the microservice and the structure of the relational DB, and managing access to it.
Now imagine I want to have a microservice B with different functionality from A, although with the same ORM as A (and so a DB with the same structure as DB_A, although not necessarily with the same data, as the different functionalities may produce different data).
How do I query/copy/mirror ORM_A into microservice B in a smart way, so that there is no code duplication and, when A changes, ORM_B changes accordingly with no manual intervention?
Is there a way to query ORM_A from B via its API and recreate it inside microservice B?

The idea that code changes inside API_A could yield code changes inside API_B creates a coupling between the services and their data that would suggest they shouldn't be two different services.
If API_B does in fact perform wildly different functions than API_A and only needs a few pieces of data from structures surfaced by API_A, you should consider a couple different options to ensure the relevant data is accessible to API_B from API_A:
Surface the data from API_A in an endpoint that is accessible to API_B. This creates an API contract that is easier to enforce and test. This solution is relatively easy to implement, but creates some dependency relationships between the two APIs.
Set up an event topic that gets notified whenever API_A writes data that API_B (or other services) might want to consume. By reading these events, API_B can write the relevant data to its own DB in its own format, avoiding coupling with A either by data structures or contracts. This solution requires the creation of event queues, but would be the best solution for the performance of API_B.
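A minimal sketch of the second option, assuming a hypothetical CustomerUpdated event and a made-up table in API_B's own database (none of these names come from the question):

```typescript
// Hypothetical event published by API_A whenever it writes data other services may care about.
interface CustomerUpdatedEvent {
  customerId: string;
  name: string;
  updatedAt: string;
}

// Minimal stand-in for B's own data access layer.
interface Database {
  query(sql: string, params: unknown[]): Promise<void>;
}

// API_B's consumer: it stores only the fields it needs, in its own schema,
// and never reaches into A's database or ORM.
async function handleCustomerUpdated(event: CustomerUpdatedEvent, db: Database): Promise<void> {
  await db.query(
    `INSERT INTO customer_snapshot (customer_id, name, updated_at)
     VALUES ($1, $2, $3)
     ON CONFLICT (customer_id) DO UPDATE SET name = $2, updated_at = $3`,
    [event.customerId, event.name, event.updatedAt]
  );
}
```

The point is that B materialises only what it needs, in its own shape, so a change to A's ORM never forces a change in B.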
One thing that I've seen people struggle with when adopting microservices (I struggled with it myself) was the idea that data duplication is ok. Try not to get stuck thinking of data as relational across multiple services because that's how you'll naturally create the kind of coupling that you'll want to avoid in microservices. Good luck!

Related

Are 2 different, non-shared databases for the same microservice a good approach?

Context:
Microservices architecture, DDD, CQRS, Event driven.
SQL database.
I have a use case where I have to store a record whenever an entity's state is updated. I'm afraid the quantity of records could be huge, and I was thinking that maybe an SQL table is not the right place to store them. Also, these records are only used every now and then, and probably not by the service domain.
Could it be good practice to store them in another database (Firestore, Mongo, Cassandra...) so it doesn't affect the performance and scope of this service?
Thanks!
Could it be good practice to store them in another database (Firestore, Mongo, Cassandra...) so it doesn't affect the performance and scope of this service?
Part of the benefit of using microservices is that you are hiding implementation details. As such, you are able to store/process data by whatever means is required or available without the need to broadcast that implementation to external services.
That said, from a technical standpoint, it is worth considering transaction boundaries. When writing to a single database, it is possible to commit transactions easily. Once you are writing to different databases within the same transaction, you can run into situations where one write might succeed while another one might fail.
My recommendation is to write to only one of those databases within a given transaction; if both must be updated, use a two-phase commit to ensure that the second database is also written to. In this way, you can avoid lost data and still get the benefit of using a more efficient data store.
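A rough sketch of keeping those transaction boundaries separate; this uses a simple write-then-retry shape rather than a full two-phase commit, and every interface and name here is invented for illustration:

```typescript
// Stand-ins for the two stores.
interface SqlDb {
  transaction<T>(work: () => Promise<T>): Promise<T>;
  updateEntityState(id: string, state: string): Promise<void>;
}

interface DocumentStore {
  append(collection: string, doc: object): Promise<void>;
}

async function changeState(sql: SqlDb, docs: DocumentStore, id: string, newState: string): Promise<void> {
  // One transaction boundary per database: the entity update commits on its own.
  await sql.transaction(async () => {
    await sql.updateEntityState(id, newState);
  });

  try {
    // The history record goes to the secondary store afterwards.
    await docs.append("entity_state_changes", {
      entityId: id,
      state: newState,
      at: new Date().toISOString(),
    });
  } catch (err) {
    // A failure here doesn't undo the committed entity update;
    // retry or queue the record instead of trying to span both stores in one commit.
    console.error("state-change record not written; schedule a retry", err);
  }
}
```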

Best practice around GraphQL nesting depth

Is there an optimum maximum depth to nesting?
We are often presented with the option to try to represent complex hierarchical data models with the nesting they demonstrate in real life. In my work this is genetics, modelling protein / transcript / homology relationships where it is possible to have very deep nesting of up to maybe 7/8 levels. We use dataloader to make nested batching more efficient, and resolver-level caching with directives. Is it good practice to model a schema on a real-life data model, or should you focus on making your resolvers reasonable to query and keep nesting to a maximum ideal depth of, say, 4 levels?
When designing a schema, is it better to create a different parent resolver for a type or to use arguments to direct a conditional response?
If I have two sets of, for example, 'cars' (let's say cars produced by Volvo and cars produced by Tesla), and the underlying data, while having similarities, is originally pulled from different APIs with different characteristics: is it best practice to have a tesla_cars and a volvo_cars resolver, or one cars resolver which uses, for example, a manufacturer argument to act differently on the data it returns and homogenise the response, especially where there may then be a sub-resolver that expects certain fields which may not be similar in the original data?
Or is it better to say that these two things are both cars, but the shape of the data we have for them is significantly different, so it's better to create separate resolvers with totally or notably different fields?
Should my resolvers and GraphQL APIs try to model the data they describe, or should I allow duplication in order to create efficient, application-focused queries and responses?
We often find ourselves wondering: should we have a separate API for application x and application y that may use the underlying data, and possibly even multiple sources (different databases or even API calls), inside their resolvers very differently, or should we try to make one resolver work for any application, even if that means using type-like arguments to allow custom filtering and conditional behaviour?
Is there an optimum maximum depth to nesting?
In general I'd say: don't restrict your schema. Your resolvers / data fetchers will only get called when the client requests the corresponding fields.
Look at it from this point of view: if your client needs the data from 8 levels of the hierarchy to work, then it will ask for it no matter what. With a restricted schema the client will have to execute multiple requests; with an unrestricted schema it can get all it needs in a single request. The amount of processing on your server side and the amount of data will still be the same, just split across multiple network requests.
The unrestricted schema has several benefits:
The client can decide whether it wants all the data at once or to use multiple requests
The server may be able to optimize the data fetching process (e.g. avoid fetching duplicate data) when it knows everything the client wants to receive
The restricted schema on the other hand has only downsides.
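For example, with an unrestricted schema a client that genuinely needs the deep hierarchy can fetch it in one round trip; the field names below are invented to mirror the genetics example:

```typescript
// One request instead of one request per level of the hierarchy
// (field names are illustrative only).
const deepQuery = /* GraphQL */ `
  query {
    gene(id: "BRCA1") {
      transcripts {
        proteins {
          domains {
            homologs {
              species {
                name
              }
            }
          }
        }
      }
    }
  }
`;
```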
When designing a schema, is it better to create a different parent resolver for a type or to use arguments to direct a conditional response?
That's a matter of taste and what you want to achieve. But if you expect your application to grow and incorporate more car manufacturers, your API may become messy if there are lots of abc_cars and xyz_cars queries.
Another thing to keep in mind: Even if the shape of data is different, all cars have something in common: They are some kind of type Car. And all of them have for example a construction year. If you now want to be able to query "all cars sorted by construction year" you will need a single query endpoint.
You can have a single cars query endpoint in your API and then use interfaces to query different kinds of cars, just like GraphQL Relay's node endpoint works: a single endpoint that can query all types that implement the Node interface.
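A sketch of that shape, using SDL in a string plus a resolver map in the graphql-js / Apollo style; all type and field names are invented for the example:

```typescript
// Stand-ins for the two differing upstream sources.
async function fetchVolvoCars(): Promise<object[]> { return []; }
async function fetchTeslaCars(): Promise<object[]> { return []; }

// One `cars` query; manufacturer-specific shapes live behind a shared Car interface.
const typeDefs = /* GraphQL */ `
  interface Car {
    id: ID!
    constructionYear: Int!
  }

  type VolvoCar implements Car {
    id: ID!
    constructionYear: Int!
    safetyRating: Float
  }

  type TeslaCar implements Car {
    id: ID!
    constructionYear: Int!
    batteryRangeKm: Int
  }

  type Query {
    cars(manufacturer: String): [Car!]!
  }
`;

// Resolver map: homogenise both sources into the Car shape.
const resolvers = {
  Query: {
    cars: async (_parent: unknown, args: { manufacturer?: string }) => {
      if (args.manufacturer === "volvo") return fetchVolvoCars();
      if (args.manufacturer === "tesla") return fetchTeslaCars();
      return [...(await fetchVolvoCars()), ...(await fetchTeslaCars())];
    },
  },
  Car: {
    // Tell GraphQL which concrete type each item is.
    __resolveType: (car: { batteryRangeKm?: number }) =>
      car.batteryRangeKm !== undefined ? "TeslaCar" : "VolvoCar",
  },
};
```

The manufacturer argument stays optional, so a query like "all cars sorted by construction year" remains a single endpoint while manufacturer-specific fields live on the concrete types.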
On the other hand, if you've got a very specialized application, where your type is not extensible (like for example white and black chess pieces), then I think it's totally valid to have a white_pieces and black_pieces endpoint in your API.
Another thing to keep in mind: With a single endpoint some queries become extremely hard (or even impossible), like "sort white_pieces by value ascending, and black_pieces by value descending". This is much easier if there are separate endpoints for each color.
But even this is solvable if you have a single endpoint for all pieces, and simply call it twice.
Should my resolvers and GraphQL APIs try to model the data they describe, or should I allow duplication in order to create efficient, application-focused queries and responses?
That's a question of use case and scalability. If you have exactly two types of clients that use the API in different ways, just build two separate APIs. But if you expect your application to grow and get more, different clients, then of course it will become an unmaintainable mess to have 20 APIs.
In this case have a look at schema directives. You can for example decorate your types and fields to make them behave differently for each client or even show/hide parts of your API depending on the client.
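For instance, a hypothetical @visibleTo directive could be declared and applied like this; the directive's name and behaviour are assumptions here, and the server still has to implement the actual hiding logic:

```typescript
const typeDefs = /* GraphQL */ `
  # Hypothetical directive: which clients may see a field.
  directive @visibleTo(clients: [String!]!) on FIELD_DEFINITION

  type Protein {
    id: ID!
    sequence: String!
    # Only the internal research client gets this expensive field.
    homologySummary: String @visibleTo(clients: ["research-app"])
  }
`;
```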
Summary:
Build your API with your clients in mind.
Keep things object oriented, make use of interfaces for similar types.
Don't provide endpoints your clients don't need; you can still extend your schema later if necessary.
Think of your data as a huge graph ;) that's what GraphQL is all about.

Is it okay to have more than one repository for an aggregate in DDD?

I've read this question about something similar but it didn't quite solve my problem.
I have an application where I'm required to use data from an API. Problem is there are performance and technical limitations to doing this. The performance limitations are obvious. The technical limitations lie in the fact that the API does not support some of the more granular queries I need to make.
I decided to use MySQL as a queryable cache.
Since the data I needed to retrieve from the API did not change very often, I settled on refreshing the cache once a day, so I didn't need any complicated mapper that checked if we had the data in the cache and if not fell back to the API. That was my first design, but I realized that wasn't very practical when the API couldn't support most of the queries I needed to make anyway.
Now I have a set of two mappers for every aggregate. One for MySQL and one for the API.
My problem is now how I hide the complexities of persistence from the domain, and the fact that it seems that I need multiple repositories.
Ideally I would have an interface that both mappers adhered to, but as previously disclosed that's not possible.
Is it okay to have multiple repositories, one for each mapper?
Is it okay to have more than one repository for an aggregate in DDD?
Short answer: yes.
Longer answer: you won't find any suggestion of multiple repositories in the original book by Evans. As he described things, the domain model would have one representation of the aggregate, and the repository abstraction provided consumers with the illusion that the aggregate was stored in an in-memory collection.
Largely, this makes sense -- you are trying to ensure that writes to data within the aggregate boundary are consistent, so you need a single authority for change.
But... there's no particular reason that reads need to travel through the same code path as writes. Welcome to the world of CQRS. What that gives you immediately is the idea that the in-memory representation for reads might need to be optimized differently from the in-memory representation used for writes.
In its more general form, you get the idea that the concept that you are modeling might have different representations for each use case.
For your case, where it is sometimes appropriate to read from the RDBMS, sometimes from the API, sometimes both, this isn't quite an exact match -- the repository interface hides the implementation details from the consumer, but you still have to bother with the implementation.
One thing you might look at is your requirements; how fresh does the data need to be in each use case? A constraint that is often relaxed in the CQRS pattern is the idea that the effects of writes are immediately available for reading. The important question to ask would be, if the data hasn't been cached yet, can you simply report "data not available" without hitting the API?
If so, then use cases that access the cached data need only a single repository implementation.
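A small sketch of that idea, with invented names: the read side is its own interface, and the cached implementation reports a miss rather than falling back to the API:

```typescript
interface Product {
  id: string;
  name: string;
}

// Read-side abstraction the rest of the code depends on.
interface ProductReadRepository {
  findById(id: string): Promise<Product | "not-available">;
}

// Implementation backed only by the MySQL cache; it never calls the remote API.
class CachedProductRepository implements ProductReadRepository {
  constructor(
    private readonly db: { queryOne(sql: string, params: unknown[]): Promise<Product | null> }
  ) {}

  async findById(id: string): Promise<Product | "not-available"> {
    const row = await this.db.queryOne("SELECT id, name FROM product_cache WHERE id = ?", [id]);
    // If the daily refresh hasn't pulled this product yet, report that instead of hitting the API.
    return row ?? "not-available";
  }
}
```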
If you are using an external API to read and modify data, you can cache the data locally for faster reads, but I would avoid having a domain repository for it.
From the domain perspective it seems that you need a service to query for some data (or just a Query in a CQRS implementation); that service can internally call some remote API or read from a local cache (MySQL, whatever).
When you read your local cache you can develop a repository to decouple your logic from the DB implementation, but this is a different concept from a domain repository; it is just a detail of your technical implementation that has nothing to do with your domain.
If the remote service starts offering the query you need, you will change how your query is executed, calling the remote API instead of the DB, but your domain model should not change.
A domain repository is used to load and persist your aggregates, whereas if you are working with external aggregates (in a different context or subdomain) you need to interact with them using services.
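In code that separation might look roughly like this (all names invented): the domain depends on a query interface, and whether it is served from the local cache or the remote API stays an implementation detail:

```typescript
interface OrderSummary {
  orderId: string;
  total: number;
}

// What the domain (or application layer) sees: just a query.
interface CustomerOrdersQuery {
  ordersFor(customerId: string): Promise<OrderSummary[]>;
}

// Today: read from the local MySQL cache. If the remote API later supports the query,
// only this class changes; the domain keeps depending on CustomerOrdersQuery.
class CachedCustomerOrdersQuery implements CustomerOrdersQuery {
  constructor(
    private readonly db: { query(sql: string, params: unknown[]): Promise<OrderSummary[]> }
  ) {}

  ordersFor(customerId: string): Promise<OrderSummary[]> {
    return this.db.query(
      "SELECT order_id AS orderId, total FROM order_cache WHERE customer_id = ?",
      [customerId]
    );
  }
}
```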

Is it a good practice to have one RESTful resource for each database table?

What is the best way to map RESTful resources to database tables? When defining the architecture of a RESTful API, which criteria decides which resources to have and what is contained in each resource? Should each database table be mapped to a separate resource, or is this not best practice?
Don't.
The API layer should not be tied to the data layer. That's an undesirable instance of strong coupling. The purpose of the database is to store data in a way that makes retrieval convenient. The purpose of the API is to get clients the information they need. It's highly unlikely that they will have the same structure. Furthermore, if you tightly couple them, you can't make changes to your database structure (such as renormalizing) without making a breaking API change.
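As an illustration, an order resource is typically assembled from several tables and keeps its client-facing shape even if those tables are later renormalised; the table and field names below are made up:

```typescript
// What the API returns for GET /orders/{id}: shaped for the client...
interface OrderResource {
  id: string;
  customerName: string;                         // joined in from the customers table
  lines: { sku: string; quantity: number }[];   // from the order_items table
  total: number;                                // computed, not stored anywhere
}

// ...which is not the same as the rows it is assembled from.
interface OrderRow { id: string; customer_id: string }
interface OrderItemRow { order_id: string; sku: string; quantity: number; unit_price: number }
interface CustomerRow { id: string; name: string }

function toResource(order: OrderRow, items: OrderItemRow[], customer: CustomerRow): OrderResource {
  return {
    id: order.id,
    customerName: customer.name,
    lines: items.map(i => ({ sku: i.sku, quantity: i.quantity })),
    total: items.reduce((sum, i) => sum + i.quantity * i.unit_price, 0),
  };
}
```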

How to use multiple databases (MSSQL) in one service or DAL?

I have been looking through a few different threads all over the interwebs and either I can't see that the proposed solution will work for me, or their particular situations are not the same as the one I am in.
Currently we have about 8 or so different self-contained databases, each sitting behind yet another self-contained website (ASP.NET WebForms). All of the databases are very small, and serve a very particular purpose. That said, none of the schemas and designs really match up in a reasonable way. There are various GUIDs identifying a user that will all EVENTUALLY map to a single GUID for that user, but the mappings between them are different, and sometimes you have to hop between a couple of databases to get what you need from a third. It's just a bit messy.
I would like to refactor completely to put them all in one database to reduce confusion and duplicate data, but due to push-back the solution will have to use what we have.
What I want to do, then, is create a layer where all these databases will be accessible in ONE application (maybe fire up a new site and put it there). A few questions regarding this:
Is there a simple solution where I can leverage Entity Framework on each database and have them all sit in the same application?
Would something like a WCF service where I map every CRUD operation that needs to be done for each database be an efficient solution?
Each DbContext in Entity Framework uses a single database, so your solution here must use one for each database. What I usually do is create an IContext around the DbContext that implements the CRUD operations, then build a Repository that performs the "business logic" (like RegisterUser) by manipulating multiple contexts. The IContext should represent a set of related "database objects" (such as IProductContext for the product database). For now, you would probably just have one context per database. As databases are consolidated, just change the connection string of the affected context(s) to the new database and you can get back up and running without much (if any) change to the code.
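Sketched below with TypeScript-style interfaces rather than actual Entity Framework code, purely to show the shape of the idea; the interface and method names are invented:

```typescript
// One context per database, each exposing only the CRUD it needs.
interface IUserContext {
  getUserBySiteAGuid(guid: string): Promise<{ globalId: string } | null>;
  createUser(globalId: string, name: string): Promise<void>;
}

interface IProductContext {
  getProductsForUser(globalId: string): Promise<{ sku: string }[]>;
}

// The "business logic" layer coordinates several contexts; callers never see
// which database (or how many) sit behind it. Pointing a context at a
// consolidated database later shouldn't change this code.
class UserCatalogRepository {
  constructor(
    private readonly users: IUserContext,
    private readonly products: IProductContext
  ) {}

  async productsForSiteAUser(siteAGuid: string): Promise<{ sku: string }[]> {
    const user = await this.users.getUserBySiteAGuid(siteAGuid);
    if (!user) return [];
    return this.products.getProductsForUser(user.globalId);
  }
}
```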
Here's a good intro I found on what I'm talking about, though I think it calls my "Contexts" "Repositories" and my "Repositories" "Unit of Work".
http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/implementing-the-repository-and-unit-of-work-patterns-in-an-asp-net-mvc-application
That said, you could probably write one WCF service that abstracts away multiple EF DBContexts, then you can change things on the backend while continuing to use the same webservice.