How to design a database for an e-commerce site - sql

I want to design a DB for an e-commerce site. The web application will let users register as business owners and then perform CRUD operations on their items. Buyers or clients will be able to search for a particular product or business name, and results will be shown based on the search criteria. When a buyer clicks a product in the filtered results, they will be taken to that business's page.
Based on this description I designed a DB, but I'm not sure whether it will work or not.
Kindly go through it and suggest changes, as well as relations.
[Image: the proposed database schema diagram]

First of all, we cannot see all the data items there, because we cannot actually scroll in the image. The design looks okay to me and is normalized to some extent, though I cannot be sure unless I see the whole thing.
Database design these days isn't as simple as it used to be, and with new technologies on the market it is getting more complicated day by day. I know people who don't go for normalization and are okay with repeating data in different collections or tables so that the actual machine operations are reduced. With NoSQL databases it has become easier to handle big chunks of data with very good turnaround time.
Why do you think it won't work, BTW?
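In the meantime, for what it's worth, a minimal generic starting point for the requirements you describe (business owners, their items, and buyers searching by product or business name) could look like the sketch below; the table and column names are purely illustrative and not taken from your diagram:

    -- Purely illustrative; not the asker's actual design.
    CREATE TABLE Business (
        BusinessId   INT IDENTITY(1,1) PRIMARY KEY,
        OwnerUserId  INT           NOT NULL,   -- the registered business owner
        BusinessName NVARCHAR(200) NOT NULL
    );

    CREATE TABLE Product (
        ProductId   INT IDENTITY(1,1) PRIMARY KEY,
        BusinessId  INT           NOT NULL REFERENCES Business(BusinessId),
        ProductName NVARCHAR(200) NOT NULL,
        Price       DECIMAL(10,2) NOT NULL
    );

    -- A buyer searches by product or business name and lands on the business page:
    DECLARE @search NVARCHAR(200) = N'coffee';
    SELECT p.ProductId, p.ProductName, b.BusinessId, b.BusinessName
    FROM Product p
    JOIN Business b ON b.BusinessId = p.BusinessId
    WHERE p.ProductName LIKE '%' + @search + '%'
       OR b.BusinessName LIKE '%' + @search + '%';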

Related

New to Microservices - refactoring a monolith "Marketplace" database

I am new to microservices and have been struggling to wrap my brain around them. On the surface they sound like a good idea, but from a practical standpoint, I can't break away from my centralized-database background. As an example, I have this real-world Marketplace scenario where I cannot figure out whether microservices would help or hurt. This site was working well until the PO asked for "Private Products." Now it is fragile and slow, so I need to do a major refactor. A good time to implement microservices. I feel like many systems have this type of coupling, so deconstructing this example should be very instructive.
Current State
This is a b2b marketplace where users belong to companies that are buying products from each other. Currently, there exists a monolithic database: User, Company, Catalog, Product, and Order. (This is a simplification, the actual scenario is much more complicated, users have roles, orders have header/detail, products have inventories, etc.)
Users belong to Companies
Companies have a Catalog of their Products
Companies have Orders for Products from other Companies
So far so good. I could see breaking the app into microservices on the major entity boundaries.
New Requirement
Unfortunately for my architectural aspirations, the product owner wants more features. In this case: Private Products.
Products are Public or Private
Companies send time-bound Invitations to Products or Catalogs to Users of other Companies
This seemingly simple request suddenly complicated everything.
Use Case - User displays a list of products
For example, listing or searching products was once just a simple case of asking the Products to list/search themselves. It is one of the top run queries on the system. Unfortunately, now what was a simple use case just got messy.
A User should be able to see all public Products (easy)
A User should be able to see all their own Company's private Products (not horrible)
A User can see any Product that their Company has Ordered in the past regardless of privacy (Uh oh, now the product needs to know about the User Company's Order history)
A User can see any private Product for which they have an active Invitation (Uh oh, now the product needs to know about the User's Product or Catalog Invitations which are time dependent)
Initial Monolithic Approach
This can be solved at the database level, but the SQL joins basically ALL of the tables together (and not just master data tables, all the transactions as well.) While it is a lot slower than before, since a DBMS is designed for big joins it seems like the appropriate tool. So I can start working on optimizing the query. However, as I said, for this and other reasons the system is feeling fragile.
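Roughly, that monolithic visibility query ends up shaped like the sketch below; the table and column names here are stand-ins for illustration, not my actual schema:

    -- Sketch only: Product, AppUser, OrderHeader, OrderDetail and Invitation
    -- are hypothetical names standing in for the real tables.
    DECLARE @UserId INT = 1;

    SELECT p.ProductId, p.Name
    FROM Product p
    JOIN AppUser u ON u.UserId = @UserId
    WHERE p.IsPublic = 1                                   -- public products
       OR p.CompanyId = u.CompanyId                        -- own company's private products
       OR EXISTS (SELECT 1                                 -- products the user's company has ordered
                  FROM OrderHeader oh
                  JOIN OrderDetail od ON od.OrderId = oh.OrderId
                  WHERE oh.BuyerCompanyId = u.CompanyId
                    AND od.ProductId = p.ProductId)
       OR EXISTS (SELECT 1                                 -- products under an active invitation
                  FROM Invitation i
                  WHERE i.UserId = u.UserId
                    AND (i.ProductId = p.ProductId OR i.CatalogId = p.CatalogId)
                    AND GETDATE() BETWEEN i.ValidFrom AND i.ValidThru);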
Initial Design Thoughts... and ultimately my questions
So considering a Microservices architecture as a potential new direction, I began to think about how to start. Data redundancy seems necessary, since if I translate my major entities into services, asking for a list of products without data redundancy would just have all of the services calling each other, producing a big, slow mess.
Starting with the idea of carving out "Product and Catalog" as its own microservice. Since Catalogs are just collections of Products, they seem to belong together - I'll just call the whole thing the "Product Service". This service would have an API for managing both products and catalogs and, most importantly, to search and list them.
As a separate service, performing a Product search would require a lot of redundant data, since the service would have to subscribe to any event that affects product privacy, such as the following (a rough sketch of the resulting read model comes after this list):
Listen for Orders and keep at least a summary of the relationship between purchased Products and Purchasing Companies
Listen to Invitations and maintain a list of active User/Product/Time relationships
Listen to User and Company events to maintain a User to Company relationship
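Concretely, the Product Service would end up maintaining something like these read-model tables, kept up to date by the event handlers; every name here is invented for illustration:

    -- Hypothetical read-model tables inside the Product Service,
    -- populated by event handlers (illustrative names only).
    CREATE TABLE PurchasedProduct (      -- from Order events
        BuyerCompanyId INT       NOT NULL,
        ProductId      INT       NOT NULL,
        LastOrderedAt  DATETIME2 NOT NULL,
        PRIMARY KEY (BuyerCompanyId, ProductId)
    );

    CREATE TABLE ActiveInvitation (      -- from Invitation events
        UserId    INT       NOT NULL,
        ProductId INT       NULL,        -- either a single product...
        CatalogId INT       NULL,        -- ...or a whole catalog
        ValidFrom DATETIME2 NOT NULL,
        ValidThru DATETIME2 NOT NULL
    );

    CREATE TABLE UserCompany (           -- from User/Company events
        UserId    INT NOT NULL PRIMARY KEY,
        CompanyId INT NOT NULL
    );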
I begin to worry about keeping it all synchronized.
In the end, a Product Service would have a large part of the current schema replicated. So I begin to think that maybe microservices won't work for this situation. Or am I being melodramatic, and the schema will be simple enough to be more manageable and faster, so that it is worth it?
Overall, am I thinking about this whole approach properly? Is this how microservice based designs are intended to be thought through? If not, can somebody give me a push in a different direction?
Try splitting your system into services over and over until it makes sense. Use your gut feeling. Read more books, articles, and forums where other people describe how they did it.
You've mentioned that there is no point in splitting ProductService into Product and ProductSearch - fair enough, try to implement it like that. If you end up with a pretty complicated schema for some reason, or with performance bottlenecks, that's a good sign to continue splitting further. If not, it is fine like that for your specific domain.
Not all product services are made equal. In some situations you have to be able to create millions or even billions of products per day. In that case it is likely that you should consider separating the product catalogue and product search. The reason is performance: to make search faster (indexing), you have to slow down inserts. These are two competing goals that are hard to reach without separating the data into different microservices (which will lead to data duplication as well).

Data modeling Issue for Moqui custom application

We are working on a custom project management application on top of the Moqui framework. Our requirement is that we need to inform the developers associated with a project, through email, about any changes to a ticket.
Currently we are using the WorkEffortParty entity to store all parties associated with the project, and the PartyContactMech entity to store their email addresses. With this we need to iterate through WorkEffortParty and PartyContactMech every time to fetch all the email addresses to which we need to send notifications for ticket changes.
To avoid these iterations, we are now thinking of providing a feature to add comma-separated email addresses at the project level. A project admin could add the email addresses of associated parties, or a mailing-list address, to which ticket-change notifications should be sent.
For this requirement we studied the data model, but we did not find the right place to store this information. Do we need to extend an entity for this, or is there a best practice? This requirement is useful in any project management application. We appreciate any help on this data modeling problem.
The best practice is to use existing data model elements as they are available. Having a normalized data model involves more work in querying data, but also more flexibility in addressing a wide variety of requirements without changes to the data structures.
In this case with a joined query you can get a list of email addresses in a single query based on the project's workEffortId. If you are dealing with massive data and message volumes there are better solutions than denormalizing source data, but I doubt that's the case... unless you're dealing with more than thousands of projects and millions of messages per day the basic query and iterate approach will work just fine.
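As a rough illustration of that joined query: the entity/table names below follow the usual Moqui/Mantle conventions (WorkEffortParty, PartyContactMech, ContactMech), but treat the exact column and enum names as assumptions rather than exact definitions:

    -- Sketch: all email addresses of parties associated with a project.
    -- Column and enum names are assumptions based on standard Moqui naming.
    SELECT DISTINCT cm.INFO_STRING AS EMAIL_ADDRESS
    FROM WORK_EFFORT_PARTY wep
    JOIN PARTY_CONTACT_MECH pcm ON pcm.PARTY_ID = wep.PARTY_ID
    JOIN CONTACT_MECH cm        ON cm.CONTACT_MECH_ID = pcm.CONTACT_MECH_ID
    WHERE wep.WORK_EFFORT_ID = 'MY_PROJECT_ID'
      AND cm.CONTACT_MECH_TYPE_ENUM_ID = 'CmtEmailAddress'
      AND (pcm.THRU_DATE IS NULL OR pcm.THRU_DATE > CURRENT_TIMESTAMP);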
If you need to go beyond that, the easiest approach with Moqui is to use a DataDocument and DataFeed to send updates on the fly to ElasticSearch, and then use that for your high-volume queries and filtering (with arbitrarily complex filtering requirements, etc.).
Your question is way too open to answer directly, data modeling is a complex topic and without good understanding of context and intended usage there are no good answers. In general it's best to start with a data model based on decades of experience and used in a large number of production systems. The Mantle UDM is one such model.

How can SQL Server be used to efficiently store website page views?

I am currently recording basic page views on a website using a single column, incrementing by one on each page load.
This gives a limited, very general view of the most visited pages, without taking into account pages being repeatedly loaded by a visitor, or being visited by search bots, etc.
Without worrying about these, I would like to efficiently track webpage visits, to allow querying for more detail, such as the most popular page today, or most popular this week.
Storing each view as an individual record would surely become inefficient quickly, and the data required doesn't need that level of detail.
In order to answer your question, you would have to provide your storage requirements and limitations, as well as the information that you want to store to identify page views.
In terms of pure storage efficiency, your existing logging is, I'd say, the most efficient way of storing page views. Realistically, though, this data is not very useful without other pieces of information that give you a better picture; as you mentioned, tracking the user, IP address, and other non-sensitive information will give you a better view of the activity on your site.
I would suggest an approach that gives you both meaningful information and analytics capability in the following form:
Keep a log for all page views in a table that will store information such as:
IP
Page (either the address, or if you're using MVC, the Controller and Action)
User Agent
IsMobileRequest? (Optional, in MVC you can access it through the Request.Browser.IsMobileDevice property)
TimeStamp
Additionally, you can have another table that stores a summary of the visits to all pages for a given period (for example, by month) and that is updated by a SQL Server job every month: the job retrieves records from the previous table, filters them, creates a summary record in the monthly summary table, and deletes them from the PageViews log table. This table would look similar to the one you already have, with maybe a few additional columns containing figures such as distinct IP count, most popular browser, number of mobile visits, and maybe an average visit hour range (all of them calculated by the job from the log table).
This way, you always have detailed information about your page visits for the last month and statistical summaries of the monthly activity of your site, making effective use of your available storage and giving you a richer source of analysis about your site's users.
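As a rough sketch of the two tables plus a "most popular this week" query (all names and types here are illustrative only):

    -- Illustrative schema only; names and types are assumptions.
    CREATE TABLE PageViewLog (
        PageViewLogId BIGINT IDENTITY(1,1) PRIMARY KEY,
        Page          NVARCHAR(400) NOT NULL,   -- URL, or Controller/Action in MVC
        IpAddress     VARCHAR(45)   NOT NULL,   -- wide enough for IPv6
        UserAgent     NVARCHAR(512) NULL,
        IsMobile      BIT           NOT NULL DEFAULT 0,
        ViewedAt      DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );

    CREATE TABLE PageViewMonthlySummary (
        Page              NVARCHAR(400) NOT NULL,
        MonthStart        DATE          NOT NULL,
        ViewCount         INT           NOT NULL,
        DistinctIpCount   INT           NOT NULL,
        MobileViewCount   INT           NOT NULL,
        MostCommonBrowser NVARCHAR(256) NULL,
        PRIMARY KEY (Page, MonthStart)
    );

    -- "Most popular pages this week", straight from the log table:
    SELECT TOP (10) Page, COUNT(*) AS Views
    FROM PageViewLog
    WHERE ViewedAt >= DATEADD(DAY, -7, SYSUTCDATETIME())
    GROUP BY Page
    ORDER BY Views DESC;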

Optimising a query for Top 5% of users

On my website, there exists a group of 'power users' who are fantastic and adding lots of content on to my site.
However, their prolific activity has led to their profile pages slowing down a lot. For the other 95% of users, the SPROC that returns the data is very quick. It's only for this group of power users that the very same SPROC is slow.
How does one go about optimising the query for this group of users?
You can assume that the right indexes have already been constructed.
EDIT: Ok, I think I have been a bit too vague. To rephrase the question: how can I optimise my site to enhance the performance for this 5% of users? Given that this SPROC is the same one in use for every user and that it is already well optimised, I am guessing the next steps are to explore caching possibilities on the data and application layers?
EDIT2: The only difference between my power users and the rest of the users is the amount of stuff they have added. So I guess the bottleneck is just the sheer number of records that is being fetched. An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).
I think you summed it up here:
An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).
Implement paging so that it only fetches 100 at a time or something?
Well, you can't optimize a query for a specific result set and leave the query for the rest unchanged, if you know what I mean. I'm guessing there's only one query to change, so you will optimize it for every type of user. Therefore this optimization scenario is no different from any other. Figure out what the problem is: is too much data being returned? Are calculations taking too long because of the amount of data? Where exactly is the cause of the slowdown? Those are the questions you need to ask yourself.
However, I see you talking about profile pages being slow. If you think the query that returns that information is already optimized (because it works for 95% of users), you might consider some form of caching of the profile page content. In general, profile pages do not have to supply real-time information.
Caching can be done in a lot of ways, far too many to cover in this answer. But to give you one small example; you could work with a temp table. Your 'profile query' returns information from that temp table, information that is already calculated. Because that query will be simple, it won't take that much time to execute. Meanwhile, you make sure that the temp table periodically gets refreshed.
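As a minimal sketch of that idea, here using a permanent summary table refreshed by a scheduled job rather than a true #temp table; all table and column names are assumptions:

    -- The fast 'profile query' reads from this pre-calculated cache table.
    CREATE TABLE ProfileSummaryCache (
        UserId       INT       NOT NULL PRIMARY KEY,
        ItemCount    INT       NOT NULL,
        LastItemDate DATETIME2 NULL,
        RefreshedAt  DATETIME2 NOT NULL
    );

    -- Refreshed periodically (e.g. by a SQL Agent job); Item is a hypothetical table.
    MERGE ProfileSummaryCache AS target
    USING (SELECT UserId, COUNT(*) AS ItemCount, MAX(CreatedAt) AS LastItemDate
           FROM Item GROUP BY UserId) AS src
          ON target.UserId = src.UserId
    WHEN MATCHED THEN
        UPDATE SET ItemCount    = src.ItemCount,
                   LastItemDate = src.LastItemDate,
                   RefreshedAt  = SYSUTCDATETIME()
    WHEN NOT MATCHED THEN
        INSERT (UserId, ItemCount, LastItemDate, RefreshedAt)
        VALUES (src.UserId, src.ItemCount, src.LastItemDate, SYSUTCDATETIME());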
Just a couple of ideas. I hope they're useful to you.
Edit:
An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).
An obvious help here would be to limit the number of results inside the query, or apply a form of pagination (in the DAL, not the UI/BLL!).
You could limit the profile display so that it only shows the most recent 200 items. If your power users want to see more, they can click a button and get the rest of their items. At that point, they would expect a slower response.
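For example, paging could be pushed into the query itself using SQL Server 2012+ OFFSET/FETCH; the Item table and its columns below are assumptions:

    -- Sketch of paging the profile query instead of returning everything.
    DECLARE @UserId INT = 42, @PageNumber INT = 0, @PageSize INT = 200;
    DECLARE @Offset INT = @PageNumber * @PageSize;

    SELECT ItemId, Title, CreatedAt
    FROM Item
    WHERE UserId = @UserId
    ORDER BY CreatedAt DESC
    OFFSET @Offset ROWS
    FETCH NEXT @PageSize ROWS ONLY;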
Partition / separate the data for those users; then the tables in question will be used only by them.
In a clustered environment I believe SQL Server recognises this and spreads the load to compensate; however, in a single-server environment I'm not entirely sure how it does the optimisation.
So essentially (greatly simplified of course) ...
If you have a table called "Articles", have 2 tables ... "Articles" and "Top5PercentArticles".
Because the data is now separated out into 2 smaller subsets, the indexes are smaller, and the read and write requests on any single table in the database will drop.
It's not ideal from a business-layer point of view, as you would then need some way to know which data is stored in which table, but that's a completely separate problem altogether.
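A rough sketch of that split (all names are assumptions); if you want to soften that business-layer problem, a UNION ALL view is one common way to keep presenting a single logical table to the layers above:

    -- Two physical tables, one logical view (illustrative columns only).
    CREATE TABLE Articles            (ArticleId INT PRIMARY KEY, UserId INT NOT NULL, Title NVARCHAR(200));
    CREATE TABLE Top5PercentArticles (ArticleId INT PRIMARY KEY, UserId INT NOT NULL, Title NVARCHAR(200));
    GO
    CREATE VIEW AllArticles AS
        SELECT ArticleId, UserId, Title FROM Articles
        UNION ALL
        SELECT ArticleId, UserId, Title FROM Top5PercentArticles;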
Failing that, your only option beyond tuning execution plans is to scale up your server platform.

Multi-tenancy with SQL/WCF/Silverlight

We're building a Silverlight application which will be offered as SaaS. The end product is a Silverlight client that connects to a WCF service. As the number of clients is potentially large, updating needs to be easy, preferably so that all instances can be updated in one go.
Not having implemented multi tenancy before, I'm looking for opinions on how to achieve
Easy upgrades
Data security
Scalability
Three different models to consider are listed on msdn
Separate databases. This is not easy to maintain as all schema changes will have to be applied to each customer's database individually. Are there other drawbacks? A pro is data separation and security. This also allows for slight modifications per customer (which might be more hassle than it's worth!)
Shared Database, Shared Schema. A TenantID column is added to each table. Ensuring that each customer gets the correct data is potentially dangerous. Easy to maintain and scales well (?).
Shared Database, Separate Schemas. Similar to the first model, but each customer has its own set of tables in the database. Hard to restore backups for a single customer. Maintainability otherwise similar to model 1 (?).
Any recommendations on articles on the subject? Has anybody explored something similar with a Silverlight SaaS app? What do I need to consider on the client side?
It depends on the type of application and the scale of data. Each one has drawbacks.
1a) Separate databases + a single instance of WCF/client. Keeping everything in sync will be a challenge. How do you upgrade X number of DB servers at the same time? What if one fails and is now out of sync and not compatible with the client/WCF layer?
1b) "Silos", separate DB/WCF/Client for each customer. You don't have the sync issue but you do have the overhead of managing many different instances of each layer. Also you will have to look at SQL licensing, I can't remember if separate instances of SQL are licensed separately ($$$). Even if you can install as many instances as you want, the overhead of multiple instances will not be trivial after a certain point.
3) Basically same issues as 1a/b except for licensing.
2) Best upgrade/management scenario. You are right that maintaining data isolation is a huge concern (1a technically shares this issue at a higher level). The other issue is that if your application is data intensive you have to worry about data scalability, for example if every customer is expected to have tens or hundreds of millions of rows of data. Then you will start to run into query performance issues for individual customers due to the total customer-base volume. Clients are more forgiving of slowdowns caused by their own data volume; being told it's slow because the other 99 clients' data is large is generally a no-go.
Unless you know for a fact that you will be dealing with huge data volumes from the start, I would probably go with #2 for now, and begin looking at clustering or moving to a 1a/b setup if needed in the future.
We also have a SaaS product and we use solution #2 (shared DB / shared schema with TenantId). Some things to consider for shared DB / same schema for all:
As mentioned above, a high volume of data for one tenant may affect performance for the other tenants if you're not careful; for starters, index your tables properly/carefully and never ever run queries that force a table scan. Monitor query performance and at least plan/design to be able to partition your DB later on, based on some criteria that makes sense for your domain.
Data separation is very, very important; you don't want to end up showing one tenant a piece of data that belongs to another tenant. Every query must have a WHERE TenantId = ... in it, and you should be able to verify/enforce this during development (one way to enforce it in the database itself is sketched after this list).
Extensibility of the schema is something that solutions 1 and 3 may give you, but you can work around it by designing a way to extend the fields associated with the documents/tables in your domain where that makes sense (i.e. metadata tables, as the MSDN article mentions).
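For what it's worth, on newer SQL Server versions (2016+) the per-query WHERE TenantId filter can also be enforced by the database itself with row-level security. A sketch, assuming hypothetical dbo.Orders and dbo.Invoices tables that each carry a TenantId column:

    -- Row-level security sketch; object names are illustrative assumptions.
    CREATE FUNCTION dbo.fn_TenantFilter(@TenantId INT)
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN SELECT 1 AS Allowed
           WHERE @TenantId = CAST(SESSION_CONTEXT(N'TenantId') AS INT);
    GO
    CREATE SECURITY POLICY dbo.TenantIsolationPolicy
        ADD FILTER PREDICATE dbo.fn_TenantFilter(TenantId) ON dbo.Orders,
        ADD FILTER PREDICATE dbo.fn_TenantFilter(TenantId) ON dbo.Invoices
        WITH (STATE = ON);
    GO
    -- The application sets the tenant once per connection/request:
    EXEC sys.sp_set_session_context @key = N'TenantId', @value = 17;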
What about solutions that provide an out of the box architecture like Apprenda's SaaSGrid? They let you make database decisions at deploy and maintenance time and not at design time. It seems they actively transform and manage the data layer, as well as provide an upgrade engine.
I had a similar case, but my solution takes advantage of both approaches.
Where the data lives and how it is placed is the tenant's question. As a tenant, of course, I don't want my data to be shared; I want my data isolated and secure, and I want to be able to get it at any time.
Certain data can possibly be shared, e.g. a company list. So there should be a global database plus per-tenant databases; just make sure the tenant database schema is locked down in operation, and have a procedure to update all tenant databases at once.
Anyway, in the SaaS model everything is delivered as a server / web service, so no matter where the database is, it comes to the client as a service and is only rendered by the client GUI.
Thanks
Existing answers are good. You should look deeply into the issue of upgrading and managing multiple databases. Without knowing the specific app, it might turn out easier to have multiple databases and not have to pay the extra cost of tracking the TenantID. This might not end up being the right decision, but you should certainly be wary of the dev cost of data sharing.