About container scalability in a microservice architecture - sql

A simple question about scalability. I have been studying scalability and I think I understand the basic concept behind it. You use an orchestrator like Kubernetes to manage the automatic scaling of a system, so as a particular microservice gets an increased volume of calls, the orchestrator creates new instances of it to deal with the demand. Now, in our case, we are building a microservice structure similar to the example in Microsoft's "eShopOnContainers":
Now, here each microservice has its own database to manage, just like in our application. My question is: when scaling this system up by creating new instances of a certain microservice, let's say the "Ordering microservice" in the example above, wouldn't that create a new set of databases? In the case of our application, we are using SQLite, so each microservice has its own copy of the database. I would assume that being able to scale out such a system would require each microservice to connect to an external SQL Server. But if that were the case, wouldn't that be a bottleneck? I mean, having multiple instances of a microservice to handle more demand for a particular service BUT with all those instances still accessing a single database server?

In the case of our application, we are using SQLite, so each microservice has its own copy of the database.
One of the most important aspects of services that scale out is that they are stateless - services on Kubernetes should be designed according to the 12-factor principles. This means that service instances cannot each have their own copy of the database, unless it is a cache.
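As a rough sketch of what "stateless" means in practice here (names and the environment-variable key are my own assumptions, not from the post): the service ships no local SQLite file and instead gets its connection string from the environment, so every scaled-out instance points at the same external SQL Server.

```csharp
// Minimal 12-factor-style sketch: configuration comes from the environment
// (e.g. a Kubernetes Secret mapped to an env var), not from a bundled database file.
using System;
using System.Data.SqlClient;

public static class OrderingDb
{
    public static SqlConnection Open()
    {
        // ORDERING_DB_CONNECTION is a hypothetical variable name for illustration.
        var connectionString = Environment.GetEnvironmentVariable("ORDERING_DB_CONNECTION")
            ?? throw new InvalidOperationException("ORDERING_DB_CONNECTION is not set.");

        var connection = new SqlConnection(connectionString);
        connection.Open();
        return connection;
    }
}
```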
I would assume that being able to scale out such a system would require each microservice to connect to an external SQL Server.
Yes, if you want to be able to scale out, you need to use a database that is outside the instances and shared between them.
But if that were the case, wouldn't that be a bottleneck?
This depends very much on how you design your system. Comparing microservices to monoliths: with a monolith, the whole thing typically uses one big database, but with microservices it is easier to use multiple different databases, so it should be much easier to scale out the database this way.
I mean, having multiple instances of a microservice to handle more demand for a particular service BUT with all those instances still accessing a single database server?
There are many ways to scale the database system as well, e.g. caching read operations (but be careful). This is a large topic in itself, though, and depends very much on what you do and how you do it.
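To make the "cache read operations (but be careful)" point concrete, here is an illustrative sketch only; the repository interface and the one-minute expiry are assumptions, and the caution about staleness is exactly why the answer says to be careful.

```csharp
// Sketch: cache a read-heavy lookup in front of the shared database.
using System;
using System.Runtime.Caching;

public interface IOrderRepository { Order Load(int orderId); }   // hypothetical data access
public class Order { public int Id { get; set; } }

public class CachedOrderReader
{
    private static readonly MemoryCache Cache = MemoryCache.Default;
    private readonly IOrderRepository _repository;

    public CachedOrderReader(IOrderRepository repository) => _repository = repository;

    public Order GetOrder(int orderId)
    {
        string key = "order:" + orderId;
        if (Cache.Get(key) is Order cached)
            return cached;

        // Cache miss: hit the shared database, then keep the result briefly.
        // Careful: cached reads can be stale if another instance updates the row.
        var order = _repository.Load(orderId);
        Cache.Set(key, order, DateTimeOffset.UtcNow.AddMinutes(1));
        return order;
    }
}
```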

Related

Are microservices compatible with an existing SQL database?

I'm creating a microservice architecture with Core, RabbitMQ, the strangler pattern ... but I have to use an existing SQL database (transaction requirement).
Doing some research, I haven't found a lot of information about how to implement the SQL database, but I think it's impossible to do a transactional operation across different services at the same time.
1- Must every service have access to the entire database?
2- Is it a good idea to have a service dedicated exclusively to transactional operations?
3- Is SQL with microservices maybe too slow?
I don't know if a standard exists for this.
Thanks.
The whole point of microservices is about having small, independent services that are decoupled as much as possible.
Sharing a common database introduces very strong coupling, and is not recommended.
If two services need the same data, you could either (a) have a different database for each, and replicate the data, or (b) introduce a third service that is responsible for access to the database.
If you're looking for a bigger-scale distributed transaction across microservices, then you should look into things like sagas. Typically you'll have a coordinator ("process manager" in some literature) that tracks the various operations, and can compensate or cancel actions that have been performed if the transaction as a whole is bound to fail.
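A very rough sketch of the process-manager idea follows; the type names are illustrative and not from any particular saga framework. Each step knows how to execute and how to compensate, and the coordinator undoes completed steps if a later one fails.

```csharp
// Minimal saga coordinator sketch: run the steps in order; if one fails,
// compensate the steps that already succeeded, in reverse order.
using System;
using System.Collections.Generic;

public class SagaStep
{
    public Action Execute { get; set; }
    public Action Compensate { get; set; }
}

public class SagaCoordinator
{
    public void Run(IEnumerable<SagaStep> steps)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                step.Execute();
                completed.Push(step);
            }
            catch
            {
                // The transaction as a whole is bound to fail: undo what was done.
                while (completed.Count > 0)
                    completed.Pop().Compensate();
                throw;
            }
        }
    }
}
```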
3- Is SQL with microservices maybe too slow?
What makes you think so?
There is nothing about SQL that makes it inadequate for microservices. Microservices may vary wildly in terms of what they do and what they require. SQL will be perfectly suitable for some microservices, and possibly not so suitable for others. It depends on the service.
It looks like you need distributed transactions in your system:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681205(v=vs.85).aspx
Also, there is a nice book devoted to microservices. It covers distributed transactions and other patterns used in microservice-based apps.
http://shop.oreilly.com/product/0636920033158.do
1- Must every service have access to the entire database?
No. A microservice has its own schema related to the Aggregate Root / Service that it offers. If a service needs data for another entity, it invokes the APIs provided by another microservice.
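For illustration, "invokes the APIs provided by another microservice" might look like the sketch below. The URL, the DTO shape, and the use of Newtonsoft.Json are assumptions, not part of the answer.

```csharp
// Sketch: ask the owning service for its data instead of querying its tables.
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

public class CustomerDto { public int Id { get; set; } public string Name { get; set; } }

public class CustomerClient
{
    private static readonly HttpClient Http = new HttpClient();

    public async Task<CustomerDto> GetCustomerAsync(int customerId)
    {
        // Hypothetical endpoint exposed by the Customer microservice.
        var json = await Http.GetStringAsync(
            $"http://customer-service/api/customers/{customerId}");
        return JsonConvert.DeserializeObject<CustomerDto>(json);
    }
}
```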
2- Is it a good idea to have a service dedicated exclusively to transactional operations?
No. Each microservice is a transaction boundary in its own right. Distributed transactions, particularly using 2PC, do not perform particularly well.
3- Is SQL with microservices maybe too slow?
I am not totally clear as to why you make such a statement.

WCF SOA: CRUD Data Access Service...why bother (or is our design wrong)?

We have a Data Access Service (DAS) in our SOA WCF system. This service is responsible for doing CRUD (create, read, update, delete) operations on "system wide" database tables, and is also the source of this data for queries. Any other service in the system wanting to access the tables under the control of the DAS has to go to the DAS to get or modify it. We use Entity Framework and built our own POCO state tracking system for this DAS.
We have other tables in our database that belong to single services and store data only for their own use, i.e. state information they can access if they crash and resume, or records of business information. We have a rule that any one table cannot be accessed by more than one service, so data needed by multiple services ends up in the DAS.
Truth is, I have never really understood why a Data Access Service is a good idea as opposed to just accessing tables directly. It seems to me to be slower, our DAS is not transactional as it cannot send back a POCO graph for database update (only single POCOs at a time), and we also have issues where the DAS is actually a client to another service which needs data from it... a circular dependency.
Why bother with a DAS? Why is a DAS so important when it comes to SOA? What am I missing here? Single point of control?
Is it also an SOA design flaw that not all tables are part of a DAS and that some services have their own "private" tables?
Any discussion about this welcome.
You're correct in thinking that this is the proper way to do things, and you're also correct that it slows things down and can occasionally be cumbersome. SOA necessarily trades off some efficiency in exchange for ensuring single points of control for all data associated with a service. In fact, even the idea of having a "common DAS" service is slightly smelly in some SOA circles.
By centralizing all CRUD operations to one service in an SOA application, you can ensure data integrity and that business rules are being acted upon properly. To give an example, think of an entity you'd like to store that has some business rules associated with it that are difficult to approach from a pure SQL perspective - for example, let's say a table that stores file references, and create / update services that ensure that these files exist.
With SOA and a single access point to those tables, you can code the logic into the create / update methods and be reasonably assured that the data you're receiving from the service is valid - i.e. the files referenced exist. If anyone were capable of writing to these tables or retrieving data from them, no such assurance would exist - even if you're calling the service yourself, you don't know which other programmers, through malice or just plain forgetfulness, neglected to implement that critical business rule. This leads to defensive programming where every bit of client code ensures business logic independently, and ultimately a tangled mess of business logic scattered throughout your application.
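A small sketch of that file-reference example, under the assumption of a repository-backed service (the types here are illustrative, not the poster's actual design): the rule lives in exactly one create method rather than in every client.

```csharp
// Sketch: the "referenced file must exist" rule enforced at the single access point.
using System;
using System.IO;

public class FileReference { public string Path { get; set; } }
public interface IFileReferenceRepository { void Add(FileReference reference); }   // hypothetical EF-backed repository

public class FileReferenceService
{
    private readonly IFileReferenceRepository _repository;

    public FileReferenceService(IFileReferenceRepository repository) => _repository = repository;

    public void Create(string path)
    {
        // The business rule is enforced here, not in every piece of client code.
        if (!File.Exists(path))
            throw new ArgumentException($"File '{path}' does not exist.", nameof(path));

        _repository.Add(new FileReference { Path = path });
    }
}
```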
Another benefit is scalability and maintainability. Let's say one of your services is accessing a huge chunk of data. With SOA, everything is "black-boxed" so that your client code doesn't have much knowledge of how the data is ultimately obtained. You could change your RDBMS, partition tables, or implement caching, and make all of that invisible to the client code calling it - ensuring your painful updates only need to be made in one place. With database code scattered throughout your app, this sort of upgrade becomes extremely painful.

WCF: sharing cached data across multiple services

We are developing a project that involves about 10 different WCF services with several endpoints each. One of the services keeps a few big tables of data cached in memory.
We have found we need access to that data from another service. Rather than keeping 2 copies of the cache, I'd like to be able to share those tables across all services.
I have done some research and found some articles about using an IExtension attached to the servicehosts to store the shared data.
Provided that all the services are running under the same web site, will that work? And is it the right approach? Or should I be looking elsewhere?
If the data that you're caching is required by more than one service, it sounds like - from a Service Oriented Architecture perspective, anyway - it doesn't belong in either of the services you have calling it.
If the data being cached isn't really related to either service, but is something that both services need, then perhaps it belongs in its own separate service. Have you considered encapsulating your cache in a third service, and performing a service-to-service call to retrieve the data you need? Benefits include...
It solves your original dilemma, avoiding the need to read the whole cache from the database several times;
It encapsulates the cache in one place for easy maintenance/change later.
It allows you to abstract the implementation of the cache away from the other services by putting another service interface in the way.
All in all, I'd suggest that's the best approach. The only downside is the extra overhead of making the service-to-service call, but that surely outperforms having to read the whole cache from the database.
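For what the third "cache service" might look like over WCF, here is a contract-shaped sketch; the interface and data contract are assumptions made for illustration, not the poster's design.

```csharp
// Sketch: a WCF contract for a standalone cache service that other services call.
using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract]
public interface ICacheService
{
    [OperationContract]
    LookupTable GetTable(string tableName);
}

[DataContract]
public class LookupTable
{
    [DataMember] public string Name { get; set; }
    [DataMember] public string[] Rows { get; set; }
}
```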
Alternatively, if the data in your cache is very closely related to BOTH of the services that are calling the cache, i.e. both services add/change the data in the cache, etc. then perhaps the two existing services should be combined into a single service.
If what I'm saying makes some sense, then the principle of SOA I'm drawing on is Service Autonomy.
Provided all your services are part of the same application there doesn't seem to be any reason why you can't share the cache directly via a shared object reference. The simplest way of doing this is via a static field.
If you choose this approach, one thing to be very careful about is thread safety. If your cache is concurrently accessed by two WCF sessions, you must ensure that the two sessions are not going to interfere with each other by both changing the cache at the same time. If the cache is read-only, the need for this is lessened, but you still might need to synchronise initialisation of the cache.
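A minimal sketch of the shared static field with basic thread safety, assuming the cache items can be keyed by string (the names here are illustrative):

```csharp
// Sketch: a process-wide cache shared by all services in the same host,
// safe for concurrent access from multiple WCF sessions.
using System;
using System.Collections.Concurrent;

public static class SharedCache
{
    private static readonly ConcurrentDictionary<string, object> Items =
        new ConcurrentDictionary<string, object>();

    public static object GetOrLoad(string key, Func<object> load)
    {
        // GetOrAdd may call 'load' more than once under contention,
        // but only one result is ever stored for a given key.
        return Items.GetOrAdd(key, _ => load());
    }
}
```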

Multi-tenancy with SQL/WCF/Silverlight

We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi tenancy before, I'm looking for opinions on how to achieve
Easy upgrades
Data security
Scalability
Three different models to consider are listed on MSDN:
Separate databases. This is not easy to maintain as all schema changes will have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!)
Shared Database, Shared Schema. A TenantID column is added to each table. Ensuring that each customer sees only their own data is the potentially dangerous part. Easy to maintain and scales well (?).
Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables in the database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?
Depends on the type of application and scale of data. Each one has downfalls.
1a) Separate databases + single instance of WCF/client. Keeping everything in sync will be a challenge. How do you upgrade X number of DB servers at the same time, what if one fails and is now out of sync and not compatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically same issues as 1a/b except for licensing.
2) Best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is that if your application is data intensive you have to worry about data scalability. For example, if every customer is expected to have tens or hundreds of millions of rows of data, you will start to run into query-performance issues for individual customers due to the total customer-base volume. Clients are more forgiving of slowdowns caused by their own data volume; being told it's slow because the other 99 clients' data is large is generally a no-go.
Unless you know for a fact you will be dealing with huge data volumes from the start I would probably go with #2 for now, and begin looking at clustering or moving to 1a/b setup if needed in the future.
We also have a SaaS product and we use solution #2 (Shared DB / Shared Schema with TenantId). Some things to consider for shared DB / same schema for all:
As mentioned above, a high volume of data for one tenant may affect performance for the other tenants if you're not careful; for starters, index your tables properly/carefully and never ever run queries that force a table scan. Monitor query performance and at least plan/design to be able to partition your DB later on based on some criteria that makes sense for your domain.
Data separation is very, very important; you don't want to end up showing a piece of data to some tenant that belongs to another tenant. Every query must have a WHERE TenantId = ... in it, and you should be able to verify/enforce this during dev (see the sketch after this list).
Extensibility of the schema is something that solutions 1 and 3 may give you, but you can work around it by designing a way to extend the fields associated with the documents/tables in your domain where it makes sense (i.e. metadata for tables, as the MSDN article mentions).
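One common way to make the tenant filter hard to forget is to funnel all reads through a repository that always applies it; the table and column names below are illustrative, not from the question.

```csharp
// Sketch: every query goes through this method, so the TenantId predicate
// cannot be accidentally omitted by calling code.
using System.Collections.Generic;
using System.Data.SqlClient;

public class InvoiceRepository
{
    private readonly string _connectionString;
    public InvoiceRepository(string connectionString) => _connectionString = connectionString;

    public IEnumerable<int> GetInvoiceIds(int tenantId)
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "SELECT Id FROM Invoices WHERE TenantId = @TenantId", connection))
        {
            command.Parameters.AddWithValue("@TenantId", tenantId);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    yield return reader.GetInt32(0);
            }
        }
    }
}
```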
What about solutions that provide an out of the box architecture like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time and not at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.
I have a similar case, but my solution takes advantage of both.
Where the data is and how it is placed is the tenant's question. Being a tenant, of course I don't want my data to be shared; I want my data isolated and secure, and I want to be able to get it at any time I want.
Certain data can possibly be shared, e.g. a company list. So there should be a global database plus tenant databases; just make sure the tenant database schema is locked down in operation, and have a procedure to update all tenant databases at once.
Anyway, in the SaaS model everything is delivered as a server / web service, so no matter where the database is, the data should come to the client as a service and only be rendered by the client GUI.
Thanks
Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out to be easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.

Application Level Replication Technologies

I am building out a solution that will be deployed in multiple data centers in multiple regions around the world, with each data center having a replicated copy of data actively updated in each region. I will have a combination of multiple databases and file systems in each data center, the state of which must be kept consistent (within a data center). These multiple repositories will be fronted by a SOA service tier.
I can tolerate some latency in the replication, and need to allow for regions to be off-line, and then catch up later.
Given the multiple back end repositories of data, I can't easily rely on independent replication solutions for each one to maintain a consistent state. I am thus lead to implementing replication at the application layer -- by replicating the SOA requests in some manner. I'll need to make sure that replication loops don't occur, and that last writer conditions are sorted out correctly.
In your experience, what is the best pattern for solving this problem, and are there good products (free or otherwise) that should be investigated?
Lotus / Domino is your answer. I've been working with it for ten years and it's exactly what you need. It may not be trendy (a perception that I would challenge) but it's powerful, adaptable and very secure. The latest version, R8, is the best yet.
You should definitely consider IBM Lotus Domino. A Lotus Notes database can replicate between sites on a predefined schedule. Replication in Notes/Domino is a very powerful feature and enables full replication of data between sites. Even if a server is unavailable, the next time it connects it will simply replicate and get back in sync.
As far as the SOA service tier goes, you could then use Domino Designer to write a web service. Since Notes/Domino 7.5.x (I believe), Domino has been able to provision and consume web services.
As others have advised, I will also recommend Lotus Notes/Domino. 8.5 is a really powerful application development platform.
You don't give enough specifics to be certain of your needs, but I think you should check out SQL Server merge replication. It allows asynchronous replication of multiple databases with full conflict resolution. You will need to designate a global master and all the other databases will replicate to that one, but all the database instances are fully functional (read/write), so you can schedule replication at whatever intervals suit you. If any region goes offline it can catch up later with no issues - if the master goes offline everyone will work independently until replication can resume.
I would be interested to know of other solutions this flexible (apart from Lotus Notes/Domino of course which is not very trendy these days).
I think that your answer is going to have to be based on a pub/sub architecture. I am assuming that you have reliable messaging between your data centers so that you can rely on published updates being received eventually. If all of your access to the data repositories is via service you can add an event notification to the orchestration of each of your update services that notifies all interested data centers of the event. Ideally the master database is the only one that sends out these updates. If the master database is the only one sending the updates you can exclude routing the notifications to the node that generated them in the first place thus avoiding update loops.
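To illustrate the loop-avoidance idea only (the event bus interface, event shape, and data-center names are assumptions, not from the question): tag each update event with the data center that originated it and skip re-publishing to that origin.

```csharp
// Sketch: publish replicated updates to every data center except the originator,
// so the update never loops back to where it started.
using System.Collections.Generic;
using System.Linq;

public class UpdateEvent
{
    public string OriginDataCenter { get; set; }
    public string Payload { get; set; }
}

public interface IEventBus { void Send(string dataCenter, UpdateEvent evt); }   // hypothetical messaging abstraction

public class ReplicationPublisher
{
    private readonly IEventBus _bus;
    private readonly IReadOnlyList<string> _dataCenters;

    public ReplicationPublisher(IEventBus bus, IReadOnlyList<string> dataCenters)
    {
        _bus = bus;
        _dataCenters = dataCenters;
    }

    public void Publish(UpdateEvent evt)
    {
        // Route to every data center except the one that generated the update.
        foreach (var dc in _dataCenters.Where(dc => dc != evt.OriginDataCenter))
            _bus.Send(dc, evt);
    }
}
```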