Representing ecommerce products and variations cleanly in the database - sql

I have an ecommerce store that I am building. I am using Rails/ActiveRecord, but that really isn't necessary to answer this question (however, if you are familiar with those things, please feel free to answer in terms of Rails/AR).
One of the store's requirements is that it needs to represent two types of products:
Simple products - these are products that just have one option, such as a band's CD. It has a basic price, and quantity.
Products with variation - these are products that have multiple options, such as a t-shirt that has 3 sizes and 3 colors. Each combination of size and color would have its own price and quantity.
I have done this kind of thing in the past, and done the following:
Have a products table, which has the main information for the product (title, etc).
Have a variants table, which holds the price and quantity information for each type of variant. Products have_many Variants.
For simple products, they would just have one associated Variant.
Are there better ways I could be doing this?

I worked on an e-commerce product a few years ago, and we did it the way you described. But we added one more layer to handle multiple attributes on the same product (size and color, like you said). We tracked each attribute separately, and we had a "SKUs" table that listed each attribute combination that was allowed for each product. Something like this:
attr_id attr_name
1 Size
2 Color
sku_id prod_id attr_id attr_val
1 1 1 Small
1 1 2 Blue
2 1 1 Small
2 1 2 Red
3 1 1 Large
3 1 2 Red
Later, we added inventory tracking and other features, and we tied them to the sku IDs so that we could track each one separately.

Your way seems pretty flexible. It would be similar to my first cut.

Related

Pivoting DB2 data on multiple conditions

I'm trying to get a query together that selects products based on a combination of data (body type, materials and colors,etc.) and I've gotten close but I'm still missing a step to get exactly what I want. I've been looking more at pivoting data and I've tinkered with this query but I'm still left with this issue.
Basically, I have multiple products that only have one material and color attributed to them and they each have a unique stock keeping ID, but some of the body types will have 2 materials and colors per stock keeping unit.
My current results in this db fiddle https://www.db-fiddle.com/f/u4zKAdw3H4hFLbfnzEeZS2/1
are as shown:
For the most part the results are there but BodB has the same color for both materials and should be one single row like BoDB | Fabric | Black | Leather | Black
So if one body code is attached to 2 different materials but the same color with sequence 1 and 2, it should be a single row effectively grouped by the body and the color (the materials wouldn't be the same, only the color. And there will only ever be up to 2 materials and colors on a given stock keeping unit
The fiddle is ready to go with these results, any help is much appreciated

How to use normalization to set levels of confidence between a rating and the number of ratings in Python or SQL?

I have a list of about 800 sales items that have a rating (from 1 to 5), and the number of ratings. I'd like to list the items that are most probable of having a "good" rating in an unbiased way, meaning that 1 person voting 5.0 isn't nearly as good as 50 people having voted and the rating of the item being a 4.5.
Initially I thought about getting the smallest amount of votes (which will be zero 99% of the time), and the highest amount of votes for an item on the list and factor that into the ratings, giving me a confidence level of 0 to 100%, however I'm thinking that this approach would be too simplistic.
I've heard about Bayesian probability but I have no idea on how to implement it. My list of items, ratings and number of ratings is on a MySQL view, but I'm parsing the code using Python, so I can make the calculations on either side (but preferably at the SQL view).
Is there any practical way that I can normalize this voting with SQL, considering the rating and number of votes as parameters?
|----------|--------|--------------|
| itemCode | rating | numOfRatings |
|----------|--------|--------------|
| 12330 | 5.00 | 2 |
| 85763 | 4.65 | 36 |
| 85333 | 3.11 | 9 |
|----------|--------|--------------|
I've started off trying to assign percentiles to the rating and numOfRatings, this way I'd be able to do normalization (sum them with an initial 50/50 weight). Here's the code I've attempted:
SELECT p.itemCode AS itemCode, (p.rating - min(p.rating)) / (max(p.rating) - min(p.rating)) AS percentil_rating,
(p.numOfRatings - min(p.numOfRatings)) / (max(p.numOfRatings) - min(p.numOfRatings)) AS percentil_qtd_ratings
FROM products p
WHERE p.available = 1
GROUP BY p.itemCode
However that's only bringing me a result for the first itemCode on the list, not all of them.
Clearly the issue here is the low number of observations your data has. Implementing Bayesian's method is the way to go because it provides great probability distribution for applications involving ratings especially if there is limited observations, and it easily decides the future likelihood ratio based on given parameters (this article provides an excellent explanation about Bayesian probability for beginners).
I would suggest storing your data in CSV files so it becomes easier to manipulate in python. Denormalizing the data via joins is the first task to do before analyzing your ratings.
This is Bayesian's simplified formula to use in your python code:
R – Confidence level aka number of observations
v – number of votes for a single product
C – avg vote for all products
m - tuneable parameter aka cutoff number required for votes to be considered (How many votes do you want displayed)
Since this is the simplified formula, this article explains how its been derived from its original formula. This article is helpful too in explaining the parameters.
Knowing the formula pretty much gets 50% of your work done, the rest is just importing your data and working with it. I provided below examples similar to your problem in case you need full demonstration:
Github example 1
Github example 2

Choosing a value based on a ranking of another column

I have decent Google-Fu skills, but they've utterly failed me on this.
I'm working in PowerPivot and I'm trying to match up a product with a price point in another table. Sounds easy, right? Well each product has several price points, based on the source of the price, with a hierarchy of importance.
For Example:
Product 1 has three prices living in a Pricing Ledger:
Price 1 has an account # of A22011
Price 2 has an account # of B22011
Price 3 has an account # of C22011
Price A overrides Price B which overrides Price C
What I want to do is be able to pull the most relevant price (i.e, that with the highest rank in the hierarchy) when not all price points are being used.
I'd originally used a series of IF statements, but that's when there were only four points. We now have ten points, and that might grow, so the IF statements are an untenable solution.
I'd appreciate any help.
-Thanks-

Optimal selection for ordering multiple items (parts) from multiple suppliers (vendors)

The task here is to define the optimal (as detailed below) way of ordering items (parts) from suppliers.
The relevant parts of the table schema (with some sample data) are
Items
ID NUMBER
1 Item0001
2 Item0002
3 Item0003
Suppliers
ID NAME DELIVERY DISCOUNT
1 Supplier0001 0 0
2 Supplier0002 0 0.025
3 Supplier0003 20 0
DELIVERY is the delivery charge (in dollars) levied by that supplier on each delivery. DISCOUNT is the settlement discount (as a percentage i.e. 2.5% for ID=2 above) allowed by that supplier for on time payment.
SupplierItems
SUPPLIER_ID ITEM_ID PRICE
1 2 21.67
1 5 45.54
1 7 32.97
This is the many-to-many join between suppliers and items with the price that supplier charges for that item (in dollars). Every item has at least 1 supplier but some have more than one. A supplier may have no items.
PartsRequests
ID ITEM_ID QUANTITY LOCATION_ID ORDER_ID
1 59 4 2 (null)
2 89 5 2 (null)
3 42 4 2 (null)
This table is a request from a field site for parts to be ordered and delivered by the supplier to that site. A delivery of any number of items to a site attracts a delivery charge. When the parts are ordered, the ORDER_ID is inserted into the table so we are only concerned with those where ORDER_ID IS NULL
The question is, what is the optimal way to order these parts for each `LOCATION' where there are 3 optimal solutions that need to be presented to the user for selection.
The combination of orders with the least number of suppliers
The combination of orders with the lowest total cost i.e. The sum of QUANTITY*PRICE for each item plus the DELIVERY for each order summed over all orders ignoring DISCOUNT
As item 2 but accounting for DISCOUNT
Clearly I need to determine the combinations of orders that are available and then determining the optimal ones becomes trivial but I am a bit stuck on an efficient way to deal with building the combinations.
I have built some SQL fiddles in SQL Server 2008 with random data. This one has 100 items, 10 suppliers and 100 requests. This one has 1000 items, 50 suppliers and 250 requests. The table schema is the same.
Update
I reasoned that the solution had to be recursive and I built a nice table valued function to get but I ran into the 32 hard limit on recursion in SQL Server. I was uncomfortable with it anyway because it hinted more of a procedural language solution than a RDMS.
So I am now playing with CTE recursion.
The root query is:
SELECT DISTINCT
'' SOLUTION_ID
,LOCATION_ID
,SUPPLIER_ID
,(subquery I haven't quite worked out) SOLE_SUPPLIER
FROM PartsRequests pr
INNER JOIN
SupplierItems si ON pr.ITEM_ID=si.ITEM_ID
WHERE pr.ORDER_ID IS NULL
This gets all the suppliers that can supply the required items and is certainly a solution, probably not optimal. The subquery sets a flag if the supplier is the sole supplier of any product required for that location; if so they must be part of any solution.
The recursive part is to remove suppliers one by one by means of CTE.SUPPLIER_ID<>CTE.SUPPLIER_ID and add them if they still cover all the items. The SOLUTION_ID will be a CSV list of the suppliers removed, partly to uniquely identify each solution and partly to check against so I get combinations instead of permutations.
Still working on the details, the purpose of this update was to allow the Community to say "Yay, looks like that will work" or, alternatively "You moron, that won't work because ..."
Thanks
This is a more general answer (as in, not sql) as I think solving this problem will require something more powerful. Your first scenario is to select a minimum number of suppliers. This problem can be seen as a set cover problem as you are trying to cover all demands per site with the suppliers. This problem is already NP-complete.
Your third scenario seems to be basically the same as the second. You just have to take the discount into account in the prices, assuming you pay on time for every order.
The second scenario is at least NP-hard as I see a lot of resemblance with the facility location problem. You are trying to decide which suppliers (facilities) to use (open) to cover your orders (demands) based on their prices and delivery costs (opening costs).
Enumerating your possible solutions seems infeasible as with 10 suppliers, you have 2^10 possibilities of using them, further complicated by the distribution of demands internally.
I would suggest some dynamic programming to first select the suppliers that you have to use (=they are the only ones that deliver a specific thing), eliminating some possibilities (if the cost for supplier A +delivery cost A< cost for supplier B) and then trying to expand your set of possible solutions. Linear programming is also a valid train of thought.

Database design - alternatives for Entity Attribute Value (EAV)

see How to design a product table for many kinds of product where each product has many parameters for similar topic.
My question: i want to design a database, that will be used for a production facility of different types of products where each product has its own (number of) parameters.
because i want the serial numbers to be in one tabel for overview purposes i have a problem with these different paraeters .
One solution could be EAV, but it has its downsides, certainly because we have +- 5 products with every product +- 20.000 serial numbers (records). it looks a bit overkill to me...
I just don't know how one could design a database so that you have an attribute in a mastertable that says: 'hey, you could find details of this record in THAT detail-table".
'in a way that you qould easely query the results)
currenty i am using Visual Basic & Acces 2007. but i'm going to Visual Basic & MySQL.
thanks for your help.
Bob
I would go with something like this:
product [productid, title, price, datecreated, datemodified, etc]
attribute [attributeid, title]
productattribute [productid, attributeid, value, unit]
Example:
[product]
productid title price datecreated datemodified
1 LCD TV 99.95 2010-01-01 2010-01-01
2 Car 12356 2010-01-01 2010-01-02
3 B/W TV 12.95 1960-01-01 1960-01-01
[attribute]
attributeid title
10 Colors
11 Dimensions
12 Passengers
[productattribute]
productid attributeid value unit
1 10 16 million
1 11 32 inch
2 12 4 adults
3 10 2 colors
3 11 6 inch
It seems you probably need to learn more about the available design patterns when dealing with this sort of problem as there isn't a one-size-fits-all solution.
I recommend picking up a copy of Patterns of Enterprise Application Delvelopment to help you on your way. Sorry that I'm not able to answer your question directly (hopefully someone else here on SO can) but I think the answer given in the question you linked to is about as good as it gets.