I am new to microservices and have been struggling to wrap my brain around them. On the surface they sound like a good idea, but from a practical standpoint I can't break away from my centralized database background. As an example, I have a real-world Marketplace site, and I cannot figure out whether microservices would help or hurt it. This site was working well until the PO asked for "Private Products." Now it is fragile and slow, so I need to do a major refactor, which seems like a good time to implement microservices. I feel like many systems have this type of coupling, so deconstructing this example would be very instructive.
Current State
This is a B2B marketplace where users belong to companies that buy products from each other. Currently there is a monolithic database with the tables User, Company, Catalog, Product, and Order. (This is a simplification; the actual scenario is much more complicated: users have roles, orders have header/detail, products have inventories, etc.)
Users belong to Companies
Companies have a Catalog of their Products
Companies have Orders for Products from other Companies
So far so good. I could see breaking the app into microservices on the major entity boundaries.
New Requirement
Unfortunately for my architectural aspirations, the product owner wants more features. In this case: Private Products.
Products are Public or Private
Companies send time-bound Invitations to Products or Catalogs to Users of other Companies
This seemingly simple request all of a sudden complicated everything.
Use Case - User displays a list of products
For example, listing or searching products was once just a simple case of asking the Products to list/search themselves. It is one of the top run queries on the system. Unfortunately, now what was a simple use case just got messy.
A User should be able to see all public Products (easy)
A User should be able to see all their own Company's private Products (not horrible)
A User can see any Product that their Company has Ordered in the past regardless of privacy (Uh oh, now the product needs to know about the User Company's Order history)
A User can see any private Product for which they have an active Invitation (Uh oh, now the product needs to know about the User's Product or Catalog Invitations which are time dependent)
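The four rules above can be written down as a single predicate, which makes it easier to see exactly which data a product listing depends on. This is only a sketch: the `Product` and `Invitation` field names and the `can_see` signature are assumptions made for illustration, not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Product:
    id: int
    owner_company_id: int
    catalog_id: int
    is_private: bool


@dataclass(frozen=True)
class Invitation:
    user_id: int
    starts_at: datetime
    ends_at: datetime
    product_id: Optional[int] = None  # set when the invitation targets one product
    catalog_id: Optional[int] = None  # set when it targets a whole catalog


def can_see(product: Product, user_id: int, user_company_id: int,
            ordered_product_ids: Set[int], invitations: Iterable[Invitation],
            now: datetime) -> bool:
    """Apply the four visibility rules, cheapest check first."""
    if not product.is_private:                       # rule 1: public product
        return True
    if product.owner_company_id == user_company_id:  # rule 2: own company's product
        return True
    if product.id in ordered_product_ids:            # rule 3: company ordered it before
        return True
    for inv in invitations:                          # rule 4: active invitation
        if inv.user_id != user_id:
            continue
        if not (inv.starts_at <= now <= inv.ends_at):
            continue
        if inv.product_id == product.id or inv.catalog_id == product.catalog_id:
            return True
    return False
```

Written this way, the inputs are explicit: besides the product itself, the check needs the user's company, that company's order history, and the user's invitations with their time windows.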
Initial Monolithic Approach
This can be solved at the database level, but the SQL joins basically ALL of the tables together (and not just master data tables, all the transactions as well.) While it is a lot slower than before, since a DBMS is designed for big joins it seems like the appropriate tool. So I can start working on optimizing the query. However, as I said, for this and other reasons the system is feeling fragile.
Initial Design Thoughts... and ultimately my questions
So, considering a Microservices architecture as a potential new direction, I began to think about how to start. Data redundancy seems necessary: if I translate my major entities into services, getting a list of products without data redundancy would just have all of the services calling each other, which would be a big slow mess.
I started with the idea of carving out "Product and Catalog" as its own microservice. Since Catalogs are just collections of Products, they seem to belong together, so I'll just call the whole thing the "Product Service". This service would have an API for managing both products and catalogs and, most importantly, for searching and listing them.
As a separate service, it would need a lot of redundant data to perform a Product search, because it would have to subscribe to any event that affected product privacy, such as:
Listen for Orders and keep at least a summary of the relationship between purchased Products and Purchasing Companies
Listen to Invitations and maintain a list of active User/Product/Time relationships
Listen to User and Company events to maintain a User to Company relationship
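A minimal sketch of what those three subscriptions could look like inside the Product Service, assuming events arrive as plain dictionaries. The event type names (`UserJoinedCompany`, `OrderPlaced`, `InvitationSent`) and their fields are hypothetical, and how the events are delivered (message bus, outbox, polling) is left out entirely.

```python
from datetime import datetime


class ProductVisibilityProjection:
    """Local, redundant copy of just the facts the product search needs."""

    def __init__(self):
        self.company_of_user = {}  # user_id -> company_id
        self.purchased = set()     # (company_id, product_id) pairs
        self.invitations = {}      # (user_id, product_id) -> latest ends_at

    def handle(self, event):
        kind = event["type"]
        if kind == "UserJoinedCompany":
            self.company_of_user[event["user_id"]] = event["company_id"]
        elif kind == "OrderPlaced":
            # A set makes replayed or duplicated order events harmless.
            self.purchased.add((event["company_id"], event["product_id"]))
        elif kind == "InvitationSent":
            key = (event["user_id"], event["product_id"])
            previous = self.invitations.get(key)
            # Keep the latest expiry so redelivery does not shorten access.
            if previous is None or event["ends_at"] > previous:
                self.invitations[key] = event["ends_at"]
        # Any other event type is ignored by this projection.

    def company_has_ordered(self, user_id, product_id):
        company_id = self.company_of_user.get(user_id)
        return company_id is not None and (company_id, product_id) in self.purchased

    def has_active_invitation(self, user_id, product_id, now):
        ends_at = self.invitations.get((user_id, product_id))
        return ends_at is not None and now <= ends_at
```

Each handler is written so that processing the same event twice leaves the state unchanged. That property is one common way to reduce the synchronization worry, because a missed or doubtful event can simply be replayed.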
I begin to worry about keeping it all synchronized.
In the end, a Product Service would have a large part of the current schema replicated. So I begin to think that maybe Microservices won't work for this situation. Or am I being melodramatic, and the schema will be simple enough to be more manageable and faster, so that it is worth it?
Overall, am I thinking about this whole approach properly? Is this how microservice based designs are intended to be thought through? If not, can somebody give me a push in a different direction?
Try splitting your system into services over and over until it makes sense. Use your gut feeling. Read more books, articles, and forums where other people describe how they did it.
You've mentioned that there is no point in splitting ProductService into Product and ProductSearch. Fair enough, try to implement it like that. If you end up with a pretty complicated schema for some reason, or with performance bottlenecks, that is a good sign to continue splitting further. If not, it is fine like that for your specific domain.
Not all product services are made equal. In some situations, you have to be able to create millions or even billions of products per day. In that situation, you should most likely consider separating the product catalogue from the product search. The reason is performance: to make search faster (indexing) you have to slow down inserts. These are two conflicting goals that are hard to reach without separating the data into different microservices (which will lead to data duplication as well).
I am working at a company that merged with another company a while ago.
We have several business units that are basically equivalent, each with one site in Europe and one in China. We already had an in-house MariaDB database, which we want to start sharing.
The problem is that there are different GDPR regulations and contracts that prohibit sharing certain data across sites. So what I can't do is replicate data across sites and then just hide it from the user in the frontend. The private data has to stay at the facility it belongs to.
So my idea was to split each existing table that may contain sensitive information into two tables.
For example, table_contracts_private and table_contracts_public.
This would still seem pretty doable with basic database replication, replicating only the public tables across sites. But how would you go about publishing private data? Also, how would I best combine private and public data? Just by using a view?
I just could not find any good mechanisms for this, especially because we would also like to avoid data duplication, so the private entries would need to be removed and replaced by the public ones, which would also entail changing all referencing IDs.
Is this a possible application of sharding?
I'd be really grateful, if someone could point me in the right direction, or if someone has a demo project with similar requirements that I could check out.
Cheers
Is this a possible application of sharding?
I wouldn't think so. Sharding is a performance optimization method. What you need is to support legal constraints. Those are two very different problems.
I think you are on the right track. I call this a "walled garden" approach. You create a database with all non-PII information, using ids only. Nothing that remotely directly identifies people, their addresses, phones, credit cards, and so on. This can be tricky. In some jurisdictions combinations of demographics can be PII.
Some of those ids then refer to another database where you store all the sensitive information; this is the "walled garden". I would recommend that this second database be on a separate server. It has a very restricted access list. And this is where you implement requirements for things like "forgetting" a customer.
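A small sketch of the idea, using two in-memory SQLite databases to stand in for the shared database and the restricted one. The table and column names, the `may_see_pii` flag, and both helper functions are assumptions for illustration only; a real setup would put the second database on its own server with its own access list.

```python
import sqlite3

# Shared database: ids and non-sensitive facts only (safe to replicate).
public_db = sqlite3.connect(":memory:")
public_db.execute(
    "CREATE TABLE contracts (contract_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL, amount_cents INTEGER NOT NULL)")

# "Walled garden": everything that identifies a person, kept at one site.
private_db = sqlite3.connect(":memory:")
private_db.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, email TEXT NOT NULL)")

private_db.execute("INSERT INTO customers VALUES (1, 'Ada Example', 'ada@example.com')")
public_db.execute("INSERT INTO contracts VALUES (100, 1, 250000)")


def contract_view(contract_id, may_see_pii):
    """Combine both stores in the application, not in the shared database."""
    row = public_db.execute(
        "SELECT contract_id, customer_id, amount_cents FROM contracts "
        "WHERE contract_id = ?", (contract_id,)).fetchone()
    if row is None:
        return None
    result = {"contract_id": row[0], "customer_id": row[1], "amount_cents": row[2]}
    if may_see_pii:
        pii = private_db.execute(
            "SELECT name, email FROM customers WHERE customer_id = ?",
            (row[1],)).fetchone()
        if pii is not None:
            result["name"], result["email"] = pii
    return result


def forget_customer(customer_id):
    """Erasing a person only touches the walled garden; shared ids stay valid."""
    with private_db:
        private_db.execute("DELETE FROM customers WHERE customer_id = ?", (customer_id,))
```

Because the shared rows carry only an id, nothing in them has to be rewritten or renumbered when a customer is forgotten or when a site is not allowed to see the private half.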
In any case, the point is that sharding is not the right approach. You want an application redesign with privacy and security as the top priorities. Happily, this is not actually that hard to implement, although if the databases are changing, you may need periodic auditing. For instance, in one database I worked with, we discovered that "coupon codes" sometimes contained unencrypted email addresses. Arrgggh!
I want to design a DB for an e-commerce site. The web application will let users register as business owners, after which they will be able to perform CRUD operations on their items. Buyers (clients) will be able to search for a particular product or business name, and results will be shown to them based on the search criteria. When a buyer clicks a product in the filtered results, the buyer will be taken to that business's page.
Based on this description I designed a DB, but I'm not sure whether it will work.
Kindly go through and suggest changes as well as relations
[Image: database design diagram]
First of all, we cannot see all the data items because we cannot scroll in the image. The design looks okay to me and is normalized to some extent, though I cannot be sure unless I see the whole thing.
Database design these days isn't as simple as it used to be, and with new technologies on the market it is getting more complicated day by day. I know people who don't go for normalization and are okay with repeating the data in different collections or tables so that the actual machine work is reduced. With NoSQL databases it has become easier to have big chunks of data with very good turnaround time.
Why do you think it won't work, BTW?
What if you had one large database to serve all your apps? Your website that needs to store customer orders could use the same database that your game uses to store registered users. Different applications could have tables only for them to use. Some may say that this could be a security issue, because if someone cracks your database, they could attack all your applications. But in a lot of databases you could use a line like the following to restrict access:
deny select on aTable to aUser;
I am wondering if this central database would be considered a poor practice, and if so why?
The way I look at it, a web application is nothing more than a collection of web pages. Because of this, it really doesn't matter if one page is about, say, cooking, while the other page is about computer programming.
If you think about it, this is very similar to OpenID, which I use to log into my SO account!
If you have your fundamental security implemented correctly, it doesn't matter how the user is interacting with your website. Where I would make this distinction is in two cases:
Don't mix http with https. On a shared host, this isn't going to be an issue anyway; if you buy the certificate for https, make everything that way (excluding the rare case where this might affect performance).
E-commerce or financial data should be handled in a fundamentally different way. If you look at your typical bank, they have multiple log-in protocols, picture verification and short log-in times. This builds users' confidence in the site's security. It would be a pain in the butt for a game site, or for most other non-mission-critical applications.
Regarding structure, if you do mix applications into one large database, you should consider the other maintenance issues, such as:
Keep tables separate; consider a prefix for every table unique to each application. Following my example above, you would then start the cooking DB table names with 'ck', and the computer programming DB table names with 'pg'. This would allow you to easily separate the applications if you need to in the future.
Use a matching table to identify which ID goes to which web application.
Consider what you would do and how to handle it if a user decided to register for both applications. Do you want to offer transparency that they can share the same username?
Keep an eye on both your data storage limit AND your bandwidth limit.
If you are counting on these applications to drive revenue, you are putting "all your eggs in one basket". Make sure if it goes down, you have options to restore or move to another host.
These are just a few of the things to consider. But fundamentally, outside of huge (big data) applications there is nothing wrong with sharing resources/databases/hardware between applications.
Conceptually, it could be done.
Implementation-wise, to make the various parts distinct from one another, you could use naming conventions (as per #Sable Foste) and/or separate database schemas (tables Finance.Users, GameApp.Users, etc.).
Management-wise, things could get tricky. Repeating some points, adding others:
One application could use a disproportionally large share of resources (disk space, I/O, CPU)
Tracking versions could be tricky (App is v4, finance is v7) -- depends on how many application instances you have to support.
Disaster recovery-wise, everything is lumped together. It all gets backed up as one set, it all gets restored as one set. Finance corrupt? Restore from backup... and lose your more recent game data.
Single point of failure. One database goes down, all your applications are down.
These (and other similar issues) are trade-offs you'll want to consider. Plan ahead, to lessen the chance that what's reasonable and economic today becomes a major headache tomorrow.
Can you please give me a database design suggestion?
I want to sell tickets for events, but the problem is that the database can become a bottleneck when many users want to buy tickets for the same event simultaneously.
If I keep a counter of tickets left for each event, there will be many updates on this field (locking), but I can easily find out how many tickets are left.
If I generate the tickets for each event in advance, it will be hard to know how many tickets are left.
Maybe it would be better if each event used a separate database (if the requests for this event are expected to be high)?
Maybe reservation also has to be an asynchronous operation?
Do I have to use a relational database (MySQL, Postgres) or a non-relational database (MongoDB)?
I'm planning to use AWS EC2 servers so I can run more servers if I need them.
I have heard that "relational databases don't scale", but I think I need one because they have the transactions and data consistency that I will need when working with a finite number of tickets. Am I right or not?
Do you know of any resources on the internet about this kind of topic?
If you sell 100,000 tickets in 5 minutes (300 seconds), you need a database that can handle at least 333 transactions per second. Almost any RDBMS on recent hardware can handle this amount of traffic.
That is, unless you have a suboptimal database schema and/or SQL, but that's another problem.
First things first: when it comes to selling stuff (ecommerce), you really do need transactional support. This rules out any datastore that lacks multi-statement transactions, which historically included many NoSQL systems such as MongoDB (before version 4.0) and Cassandra.
So you must use a database that supports transactions. MySQL does, but not in every storage engine. Make sure to use InnoDB and not MyISAM.
Of course, many popular databases support transactions, so it's up to you which one to choose.
Why transactions? Because you need to complete a bunch of database updates and you must be sure that they all succeed as one atomic operation. For example:
1) Make sure a ticket is available.
2) Reduce the number of available tickets by one.
3) Process the credit card and get approval.
4) Record the purchase details in the database.
If any of the operations fails, you must roll back the previous updates. For example, if the credit card is declined, you should roll back the decrease in available tickets.
Within a transaction the database can also lock the affected rows or tables for you (for example with SELECT ... FOR UPDATE in InnoDB), so there is no chance that between steps 1 and 2 someone else tries to purchase a ticket while the count of available tickets has not yet been decreased. Without that lock, a situation would be possible where only 1 ticket is left but it is sold to 2 people, because the second purchase started between step 1 and step 2 of the first transaction.
It's essential that you understand this before you start programming an ecommerce project.
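The four steps can be sketched with Python's built-in sqlite3 module. Everything here is an assumption made for the example: the table layout, the `buy_ticket` function, and the `charge_card` callback that stands in for a payment gateway. Instead of a separate check followed by a decrement, the sketch merges steps 1 and 2 into one conditional UPDATE, which is a different technique from an explicit table lock but closes the same gap. Holding the transaction open while the card is charged mirrors the steps above; it is a simplification rather than a recommendation.

```python
import sqlite3


def make_demo_db(tickets_left):
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, "
                 "tickets_left INTEGER NOT NULL)")
    conn.execute("CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY AUTOINCREMENT, "
                 "event_id INTEGER NOT NULL, buyer TEXT NOT NULL)")
    conn.execute("INSERT INTO events VALUES (1, ?)", (tickets_left,))
    return conn


def tickets_left(conn, event_id):
    return conn.execute("SELECT tickets_left FROM events WHERE event_id = ?",
                        (event_id,)).fetchone()[0]


def buy_ticket(conn, event_id, buyer, charge_card):
    """Return 'ok', 'sold_out' or 'declined'; all changes commit together or not at all."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        # Steps 1 and 2 in one statement: decrement only if a ticket is left.
        cur = conn.execute(
            "UPDATE events SET tickets_left = tickets_left - 1 "
            "WHERE event_id = ? AND tickets_left > 0", (event_id,))
        if cur.rowcount == 0:
            outcome = "sold_out"
        elif not charge_card(buyer):           # step 3: hypothetical payment call
            outcome = "declined"
        else:                                  # step 4: record the purchase
            conn.execute("INSERT INTO purchases (event_id, buyer) VALUES (?, ?)",
                         (event_id, buyer))
            outcome = "ok"
    except Exception:
        conn.execute("ROLLBACK")
        raise
    # Anything other than a completed purchase undoes the decrement.
    conn.execute("COMMIT" if outcome == "ok" else "ROLLBACK")
    return outcome
```

For example, with one ticket left, a declined card leaves the counter at 1, the next successful buyer gets the ticket, and a third buyer is told the event is sold out.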
Check out this question regarding releasing inventory.
I don't think you'll run into the limits of a relational database system. You need one that handles transactions, however. As I recommended to the poster in the referenced question, you should be able to handle reserved tickets that affect inventory vs tickets on orders where the purchaser bails before the transaction is completed.
Your question seems broader than database design.
First of all, a relational database will scale perfectly well for this. You may need to consider a web services layer which will provide the actual ticket brokering to the end users. There you will be able to manage things in a cached manner, independent of the actual database design. However, you need to think through the appropriate steps for data insertion and update, as well as select, in order to optimize your performance.
The first step would be to go ahead and construct a well-normalized relational model to hold your information.
Second, build a web service interface to interact with the data model.
Then put that into a user interface and stress test it with many simultaneous transactions.
My bet is that you will then need to rework your web services layer iteratively until you are happy, but your (well-normalized) database will not be causing you any bottleneck issues.
We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi tenancy before, I'm looking for opinions on how to achieve
Easy upgrades
Data security
Scalability
Three different models to consider are listed on msdn
Separate databases. This is not easy to maintain as all schema changes will have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!)
Shared Database, Shared Schema. A TenantID column is added to each table. Ensuring that each customer gets the correct data is potentially dangerous. Easy to maintain and scales well (?).
Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables in the database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?
It depends on the type of application and the scale of the data. Each one has drawbacks.
1a) Separate databases + single instance of WCF/client. Keeping everything in sync will be a challenge. How do you upgrade X number of DB servers at the same time, what if one fails and is now out of sync and not compatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically same issues as 1a/b except for licensing.
2) Best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is data scalability if your application is data intensive, for example if every customer is expected to have tens or hundreds of millions of rows. Then you will start to run into query performance issues for individual customers because of the volume of the total customer base. Clients are more forgiving of slowdowns caused by their own data volume. Being told it's slow because the other 99 clients' data is large is generally a no-go.
Unless you know for a fact you will be dealing with huge data volumes from the start I would probably go with #2 for now, and begin looking at clustering or moving to 1a/b setup if needed in the future.
We also have a SaaS product and we use solution #2 (Shared DB / Shared Schema with TenantId). Some things to consider for a shared DB with the same schema for all tenants:
As mentioned above, a high volume of data for one tenant may affect the performance of the other tenants if you're not careful; for starters, index your tables properly and carefully, and never ever run queries that force a table scan. Monitor query performance, and at least plan/design to be able to partition your DB later on based on some criteria that make sense for your domain.
Data separation is very, very important; you don't want to end up showing one tenant a piece of data that belongs to another tenant. Every query must have a WHERE TenantId = ... in it, and you should be able to verify/enforce this during dev.
Extensibility of the schema is something that solutions 1 and 3 may give you, but you can work around it by designing a way to extend the fields associated with the documents/tables in your domain where that makes sense (i.e. metadata for tables, as the MSDN article mentions).
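One way to enforce the "every query must have a WHERE TenantId = ..." rule is to make the unscoped query impossible to write, by routing all reads through a small wrapper that adds the filter itself. The sketch below uses Python's sqlite3 module; `TenantScopedDb`, `TENANT_TABLES`, and the `documents` table are hypothetical names invented for the example, not part of any product mentioned here.

```python
import sqlite3

# Hypothetical whitelist: tenant-owned tables and the columns callers may read.
TENANT_TABLES = {"documents": {"id", "title"}}


class TenantScopedDb:
    """Read access for one tenant; the TenantId filter cannot be left out."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id

    def select(self, table, columns, where="1 = 1", params=()):
        allowed = TENANT_TABLES.get(table)
        if allowed is None or not columns or not set(columns) <= allowed:
            raise ValueError(f"not a whitelisted tenant query: {table} {columns}")
        # `where` is a developer-written fragment; values always go in `params`.
        sql = (f"SELECT {', '.join(columns)} FROM {table} "
               f"WHERE TenantId = ? AND ({where})")
        return self._conn.execute(sql, (self._tenant_id, *params)).fetchall()


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, "
                 "TenantId INTEGER NOT NULL, title TEXT NOT NULL)")
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?)",
                     [(1, 1, "Tenant 1 invoice"), (2, 2, "Tenant 2 invoice")])
    return conn
```

With the demo data, `TenantScopedDb(conn, 1).select("documents", ["id", "title"])` only ever returns tenant 1's row. Application code never receives the raw connection, so a forgotten filter shows up during development as a missing capability rather than as a data leak in production.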
What about solutions that provide an out of the box architecture like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time and not at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.
I have a similar case, but my solution takes advantage of both approaches.
Where and how the data is stored is what tenants ask about. As a tenant, of course I don't want my data to be shared; I want it isolated and secure, and I want to be able to get it any time I want.
Certain data can be shared, e.g. a company list. So there should be a global database plus per-tenant databases; just make sure to lock down the tenant database schema in operation, and to have a procedure for updating all tenant databases at once.
Anyway, in the SaaS model everything is delivered as a server / web service, so no matter where the database is, the data should reach the client as a service and only be rendered by the client GUI.
Thanks
Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.