Price comparison database - put price data in main table, in one separate table or in many product tables?

I'm trying to build a price comparison database with n products and a finite but changing number of vendors that sell these products.
For my price comparison database, I need to store both the current prices for a product across different vendors and historical prices (one lowest price per date).
As I see it, I have 2 options to design the database tables:
1. Put all vendor prices into the main table.
I know how many vendors there will be and if I add or remove a vendor I can add or remove a column.
Historical prices (the lowest price on a certain date across all vendors) go into a separate table with a product name, a price and a date.
2. Have one table for products and one table for prices.
I will have only the static attribute data in the main table, such as categories, attributes etc. Prices go into a separate table where I store price, vendor and date. I can store the lowest price for each date as a pseudo-vendor in that table, or I can store it in a separate table as well.
Which method would you suggest and am I missing something?

You should store the base data in a normalized format that contains all the history. This means that you have tables for:
products, with one row per product and the static information about the products.
vendors, with one row per vendor and the static information about the vendor.
prices, with one row per price along with the date and product and vendor.
You can get the current and lowest prices using a query, such as:
select pr.*
from (select pr.*,
             min(price) over (partition by product) as min_price,
             row_number() over (partition by product, vendor order by price_datetime desc) as seqnum
      from prices pr
      where pr.product_id = XXX
     ) pr
where seqnum = 1;
For performance, you want an index on prices(product, vendor, price_datetime desc).
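As a rough illustration of the query and index above, here is a minimal sketch that runs them against an in-memory SQLite database from Python. The column names (product_id, vendor, price, price_datetime), the index name and the sample rows are assumptions made for the example, and it needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout of the prices table: one row per observed price
    CREATE TABLE prices (
        product_id     INTEGER NOT NULL,
        vendor         TEXT    NOT NULL,
        price          REAL    NOT NULL,
        price_datetime TEXT    NOT NULL
    );
    CREATE INDEX idx_prices
        ON prices (product_id, vendor, price_datetime DESC);
    INSERT INTO prices VALUES
        (1, 'A', 10.0, '2024-01-01'),
        (1, 'A', 12.0, '2024-02-01'),
        (1, 'B',  9.0, '2024-01-15'),
        (1, 'B', 11.0, '2024-02-10');
""")

# Latest price per vendor, plus the lowest price ever seen for the product
rows = conn.execute("""
    SELECT vendor, price, min_price
    FROM (SELECT pr.*,
                 MIN(price) OVER (PARTITION BY product_id) AS min_price,
                 ROW_NUMBER() OVER (PARTITION BY product_id, vendor
                                    ORDER BY price_datetime DESC) AS seqnum
          FROM prices pr
          WHERE pr.product_id = ?) AS t
    WHERE seqnum = 1
    ORDER BY vendor
""", (1,)).fetchall()

print(rows)
```

Each vendor contributes exactly one row (its most recent price), while min_price is computed over the whole history of the product.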
Eventually, you may find that this query runs too slowly. In that case, you can consider optimizations. One would be to store the latest price date for each product/vendor combination, along with the minimum price in the products table, presumably maintained using triggers.
Another would be maintaining a summary table for each product and vendor using triggers. However, that is probably not how you should start the endeavor.
You might be surprised at how well the above query can perform on your data.
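If it ever comes to that, the summary-table idea could look roughly like the sketch below, again using SQLite from Python. The latest_prices table, the trigger name and the column names are all hypothetical; a different database would need its own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (
        product_id INTEGER, vendor TEXT, price REAL, price_datetime TEXT
    );
    -- Hypothetical summary table: one row per product and vendor
    CREATE TABLE latest_prices (
        product_id INTEGER, vendor TEXT, price REAL, price_datetime TEXT,
        PRIMARY KEY (product_id, vendor)
    );
    -- Refresh the summary row unless a newer price is already stored
    CREATE TRIGGER prices_after_insert AFTER INSERT ON prices
    WHEN NOT EXISTS (SELECT 1 FROM latest_prices l
                     WHERE l.product_id = NEW.product_id
                       AND l.vendor = NEW.vendor
                       AND l.price_datetime > NEW.price_datetime)
    BEGIN
        INSERT OR REPLACE INTO latest_prices
        VALUES (NEW.product_id, NEW.vendor, NEW.price, NEW.price_datetime);
    END;
""")

conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        (1, 'A', 10.0, '2024-01-01'),
        (1, 'A', 12.0, '2024-02-01'),
        (1, 'A', 11.0, '2024-01-15'),  # late-arriving older price
    ],
)

summary = conn.execute("SELECT * FROM latest_prices").fetchall()
print(summary)
```

The full history stays in prices; latest_prices only answers the "current price" question without scanning it.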

Related

Aggregate my quantity sum in a way that doesn't lead to the storeID repeating?

I am writing a SQL query that needs to show the total number of orders from each store. The issue I am running into is that, while I can figure out how to sum the orders by product (each product is only sold by one store), I can't figure out how to total the orders by store alone.
This is the code I currently have:
SELECT storeID AS [STORE], Product_ID
, SUM(quantity) AS [ORDERS BY STORE]
FROM Fulfillment, Store
GROUP BY storeID, Product_ID;
This line of code leads to a repeat of storeID in the results, where ideally, I would only want storeID to be included in the results once with the total quantity of all of Product_ID being included. I tried to remove Product_ID from the GROUP BY statement, but this resulted in the following error
Column 'Fulfillment.Product_ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I'm new to SQL and am trying to learn, so any help and advice is greatly appreciated.
@ZLK is correct that if your goal is a total number of units ordered ("quantity") of any product, simply remove the [product_id] column from the SELECT and GROUP BY.
However, it appears that you're referencing two tables ("FROM Fulfillment, Store") and not specifying how those tables are joined, creating a cartesian join - all rows in one table will be joined to all rows in the other table. If the [storeID] and [quantity] fields are available in the Fulfillment table, I recommend removing the Store table reference from the FROM clause (so "FROM Fulfillment" alone).
One last note: You mention that you want to count "orders". In some circumstances, an order may have multiple products and a quantity > 1. If your goal is the total number of "orders" regardless of the number of products or quantity of products on an order, you'll want to use "COUNT(DISTINCT orderID) as [Orders]" (where "orderID" is the reference to the unique order number).
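To make the difference between the two totals concrete, here is a small sketch using SQLite from Python. It assumes storeID, quantity and a hypothetical orderID column all live in the Fulfillment table; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: every column the query needs is in Fulfillment
    CREATE TABLE Fulfillment (
        orderID INTEGER, storeID INTEGER, Product_ID INTEGER, quantity INTEGER
    );
    INSERT INTO Fulfillment VALUES
        (100, 1, 10, 2),
        (100, 1, 11, 1),
        (101, 1, 10, 4),
        (102, 2, 20, 3);
""")

# One row per store: total units ordered and number of distinct orders
per_store = conn.execute("""
    SELECT storeID,
           SUM(quantity)           AS units,
           COUNT(DISTINCT orderID) AS orders
    FROM Fulfillment
    GROUP BY storeID
    ORDER BY storeID
""").fetchall()

print(per_store)
```

Store 1 has seven units spread over two orders, which is why the two columns differ.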

BigQuery SQL Query for top products from the Merchant Center

I am running into an issue with writing an SQL query with Google Big Query. Basically looking to transfer the top products, per country, per category which are also in stock into a table.
So far I have pulled in the top products, per country, per category but the issue is with getting the 'in-stock' part added to the table. I can't find any similar keys in the schema to match them up.
Ideally the table would include:
Rank, Product Title, Country, Category, In-Stock
I would really appreciate any help on this! Thanks.
I have tried to add in a separate table that includes the 'availability' key for each product, but I could not match it.
Your top_products table holds the rank, and you can join it to product_inventory using rank_id.
That join gives you the product_id, which you can then use as the key to join to the products table.
After that, you get the availability information of the product and then you have all the information you require.
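The join chain described above might be sketched as follows. This uses SQLite from Python with heavily simplified, hypothetical tables (top_products, product_inventory, products) and column names; the real Merchant Center export tables in BigQuery have different names and more columns, so treat this only as the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the export tables (names are assumptions)
    CREATE TABLE top_products (
        rank_id TEXT, ranking INTEGER, country TEXT, category TEXT
    );
    CREATE TABLE product_inventory (rank_id TEXT, product_id TEXT);
    CREATE TABLE products (product_id TEXT, title TEXT, availability TEXT);

    INSERT INTO top_products VALUES
        ('r1', 1, 'US', 'Shoes'),
        ('r2', 2, 'US', 'Shoes');
    INSERT INTO product_inventory VALUES ('r1', 'p1'), ('r2', 'p2');
    INSERT INTO products VALUES
        ('p1', 'Runner', 'in stock'),
        ('p2', 'Boot',   'out of stock');
""")

# rank_id links rank to inventory; product_id links inventory to products
in_stock = conn.execute("""
    SELECT tp.ranking, p.title, tp.country, tp.category, p.availability
    FROM top_products tp
    JOIN product_inventory pi ON pi.rank_id = tp.rank_id
    JOIN products p           ON p.product_id = pi.product_id
    WHERE p.availability = 'in stock'
    ORDER BY tp.country, tp.category, tp.ranking
""").fetchall()

print(in_stock)
```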

Compare the sum of rows with shared reference in one table with a single value in another

I have a list of my stock in one table, stock, and I reserve portions of that stock to an order in another table, orders.
I have a feeling my data is no longer in a good state, and I need to compare the amount I have reserved from a chunk of stock in the orders table to the reserved total recorded for it in the stock table.
Each stockref has a unique entry in stock but can occur multiple times in orders if it represents a quantity that is split across multiple orders.
The ideal query would show me stockrefs that are over-reserved. Once I have these I can then check my code to see what caused the issue and fix it before I go bust selling items I no longer have stock of.
I think this should resolve your problem
SELECT s.reservedqty, s.stockref, o.Total_reservedqty
FROM stock s
JOIN (
    SELECT SUM(reservedqty) AS Total_reservedqty, stockref
    FROM orders
    GROUP BY stockref
) o ON s.stockref = o.stockref
WHERE o.Total_reservedqty != s.reservedqty
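Here is a small sketch of that comparison on made-up data, using SQLite from Python. The table layouts and the orderid column are assumptions. Note that != reports every mismatch; if only over-reserved stockrefs matter, a > comparison narrows the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock  (stockref TEXT PRIMARY KEY, reservedqty INTEGER);
    CREATE TABLE orders (orderid INTEGER, stockref TEXT, reservedqty INTEGER);

    INSERT INTO stock VALUES ('S1', 10), ('S2', 5), ('S3', 4);
    INSERT INTO orders VALUES
        (1, 'S1', 6), (2, 'S1', 4),   -- sums to 10, matches stock
        (3, 'S2', 4), (4, 'S2', 3),   -- sums to 7, more than the 5 in stock
        (5, 'S3', 1);                 -- sums to 1, less than the 4 in stock
""")

query = """
    SELECT s.stockref, s.reservedqty, o.total_reservedqty
    FROM stock s
    JOIN (SELECT stockref, SUM(reservedqty) AS total_reservedqty
          FROM orders
          GROUP BY stockref) o ON s.stockref = o.stockref
    WHERE o.total_reservedqty {op} s.reservedqty
    ORDER BY s.stockref
"""

mismatched = conn.execute(query.format(op="!=")).fetchall()
over_reserved = conn.execute(query.format(op=">")).fetchall()

print(mismatched)
print(over_reserved)
```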

How would I structure a database for taking sales orders?

I am helping someone build a grocery delivery service. Very simple site and order process. The problem I am having is knowing how to come up with a schema for the orders. The order would have contact information, but would also need to have all of the products they ordered. Since it can/will be different for each person's order, how would I go about designing this? Would I just have a products ordered field with a string list of all the items? Or would I need multiple linked tables?
Thanks!
You need three tables:
a) Products - Contains the product details
b) Orders - Contains the order header, contact information etc
c) Product_Orders - Binds the orders to the products through two key columns, product_id and order_id, and also holds a quantity field and the unit price.
You could have a customers table, but ideally information like the delivery address, etc, would be attached to the order so that if the user changes his address it will not affect your order history.
For the same reason unit price must be in Product_Orders, so that if a product price changes it does not affect previous orders.
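A minimal sketch of those three tables, using SQLite from Python, might look like this. The column names and sample data are assumptions; the point is that the order total is computed from the unit price stored with the order line, so a later price change leaves it untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        price      REAL NOT NULL          -- current list price
    );
    CREATE TABLE orders (
        order_id         INTEGER PRIMARY KEY,
        contact_name     TEXT NOT NULL,
        delivery_address TEXT NOT NULL    -- kept on the order itself
    );
    CREATE TABLE product_orders (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity   INTEGER NOT NULL,
        unit_price REAL    NOT NULL,      -- price at the time of ordering
        PRIMARY KEY (order_id, product_id)
    );

    INSERT INTO products VALUES (1, 'Milk', 1.50), (2, 'Bread', 2.00);
    INSERT INTO orders VALUES (1, 'Ann', '1 Main St');
    INSERT INTO product_orders VALUES (1, 1, 2, 1.50), (1, 2, 1, 2.00);

    -- A later price change does not touch the stored order lines
    UPDATE products SET price = 1.75 WHERE product_id = 1;
""")

total = conn.execute("""
    SELECT SUM(quantity * unit_price)
    FROM product_orders
    WHERE order_id = 1
""").fetchone()[0]

print(total)
```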
Product table: name and price of all products.
Customer table: name, address, phone, etc.
Order table: one entry per order, with order id, customer id, date, etc.
Order Items table: all ordered items, each linked to an id in the order table and the product table, with quantity, price, etc.

SQL "GROUP BY" issue

I'm designing a shopping cart. To circumvent the problem of old invoices showing inaccurate pricing after a product's price gets changed, I moved the price field from the Product table into a ProductPrice table that consists of 3 fields, pid, date and price. pid and date form the primary key for the table. Here's an example of what the table looks like:
pid date price
1 1/1/09 50
1 2/1/09 55
1 3/1/09 54
Using SELECT and GROUP BY to find the latest price of each product, I came up with:
SELECT pid, price, max(date) FROM ProductPrice GROUP BY pid
The date and pid returned were accurate. I received exactly 1 entry for every unique pid and the date that accompanied it was the latest date for that pid. However, what came as a surprise was the price returned. It returned the price of the first row matching the pid, which in this case was 50.
After reworking my statement, I came up with this:
SELECT pp.pid, pp.price, pp.date FROM ProductPrice AS pp
INNER JOIN (
SELECT pid AS lastPid, max(date) AS lastDate FROM ProductPrice GROUP BY pid
) AS m
ON pp.pid = lastPid AND pp.date = lastDate
While the reworked statement now yields the correct price (54), it seems incredible that such a simple-sounding query would require an inner join to execute. My question is, is my second statement the easiest way to accomplish what I need to do? Or am I missing something here? Thanks in advance!
James
The reason you get an arbitrary price is that MySQL cannot know which row's values to return for columns that are neither grouped nor aggregated. It knows it needs a price and a date per pid and can fetch the latest date as you requested with max(date), but it returns whichever price is most efficient for it to retrieve, because you didn't provide an aggregate function for that column. (Your first query is not valid standard SQL, actually.)
Your second query looks OK, but here is a shorter alternative:
SELECT pid, price, date
FROM ProductPrice p
WHERE date = (SELECT MAX(date) FROM ProductPrice tmp WHERE tmp.pid = p.pid)
But if you access the latest price a lot (which I think you do), I would recommend adding the old column back to your original table to hold the newest value, if you have the option of altering the database structure again.
I think you broke your database schema.
To circumvent the problem of old invoices showing inaccurate pricing after a product's price gets changed, I moved the price field from the Product table into a ProductPrice table that consists of 3 fields, pid, date and price. pid and date form the primary key for the table.
As you have pointed out you need to keep a change history of prices. But you can still keep the current price in the products table in addition to that new table. That would make your life much easier (and your queries faster).
You cannot solve your problem with the GROUP BY clause, because for each group of pid MySQL will simply fetch the first pid, the maximum date and the first price found (which is not what you need).
You may either use a subquery (which can be inefficient):
SELECT pid, date, price
FROM ProductPrice p1
WHERE date = ( SELECT MAX(p2.date)
FROM ProductPrice p2
WHERE p1.pid = p2.pid)
or you can simply join the table with itself:
SELECT p1.pid, p1.date, p1.price
FROM ProductPrice p1
LEFT JOIN ProductPrice p2 ON p1.pid = p2.pid
AND p1.date < p2.date
WHERE p2.pid IS NULL
Take a look at this section of MySQL docs.
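For comparison, here is a minimal sketch that runs both variants (the correlated subquery and the self-join) on the sample data from the question, using SQLite from Python. Dates are stored as ISO strings so that MAX and < compare them chronologically; the second product is made-up extra data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductPrice (
        pid   INTEGER NOT NULL,
        date  TEXT    NOT NULL,   -- ISO format so string order = date order
        price INTEGER NOT NULL,
        PRIMARY KEY (pid, date)
    );
    INSERT INTO ProductPrice VALUES
        (1, '2009-01-01', 50),
        (1, '2009-02-01', 55),
        (1, '2009-03-01', 54),
        (2, '2009-01-15', 20);
""")

# Variant 1: correlated subquery picks the row carrying the latest date
by_subquery = conn.execute("""
    SELECT pid, date, price
    FROM ProductPrice p1
    WHERE date = (SELECT MAX(p2.date)
                  FROM ProductPrice p2
                  WHERE p1.pid = p2.pid)
    ORDER BY pid
""").fetchall()

# Variant 2: self-join keeps rows for which no later row exists
by_self_join = conn.execute("""
    SELECT p1.pid, p1.date, p1.price
    FROM ProductPrice p1
    LEFT JOIN ProductPrice p2
           ON p1.pid = p2.pid AND p1.date < p2.date
    WHERE p2.pid IS NULL
    ORDER BY p1.pid
""").fetchall()

print(by_subquery)
print(by_self_join)
```

Both return the price from the row with the latest date (54 for pid 1), not the first price stored for the product.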
You might wanna try this:
SELECT pid, price, date FROM ProductPrice GROUP BY pid ORDER BY date DESC
GROUP BY has some obscure behavior here, and I'm never quite sure it picks the right row, but the latest one should be the first in the result set.
Here is another -possibly inefficient- one:
SELECT pid, substring_index( group_concat( price order by date desc ), ',', 1 ) , max(date)
FROM ProductPrice
GROUP BY pid
I think that the key here is "simple sounding query": you can see what you want, but computers ain't human, so to produce the desired result from set-based operations you have to be explicit, as in the second query.
The inner query identifies the last price for each product, then the outer query lets you get the value for the last price - that's about as simple as it can get.
As an aside, if you have an invoicing system, you really ought to store the price for the product (and the tax rates as well as the "codes") with the invoice, i.e. the invoice tables should contain all the necessary financial information to reproduce the invoice. In general, you do not want to rely on being able to look up a price (or a tax rate) in a mutable table, even allowing for the system introduced above. Regardless of this, having the pricing history has its own merits.
I faced the same problem in one of my projects. I used a subquery to fetch the date and then compared against it, but that makes the system slow as the data grows. So it's better to store the latest price in your Products table in addition to the new table you have created to keep the history of price changes.
You can always use any of the queries people suggested to get the latest price of a product on a particular date. But you can also add one field to the same table that says whether the row is the latest. For each product you set the flag to true on only one row, and then you can always find the product's latest price with one simple query.
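One way the flag idea could be sketched, using SQLite from Python: the is_latest column and the add_price helper are hypothetical, and the sketch assumes prices are always added in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ProductPrice (
        pid       INTEGER NOT NULL,
        date      TEXT    NOT NULL,
        price     INTEGER NOT NULL,
        is_latest INTEGER NOT NULL DEFAULT 0,   -- hypothetical flag column
        PRIMARY KEY (pid, date)
    )
""")


def add_price(conn, pid, date, price):
    """Insert a new price and move the is_latest flag to it (hypothetical helper)."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE ProductPrice SET is_latest = 0 "
            "WHERE pid = ? AND is_latest = 1",
            (pid,),
        )
        conn.execute(
            "INSERT INTO ProductPrice (pid, date, price, is_latest) "
            "VALUES (?, ?, ?, 1)",
            (pid, date, price),
        )


add_price(conn, 1, '2009-01-01', 50)
add_price(conn, 1, '2009-02-01', 55)
add_price(conn, 1, '2009-03-01', 54)

# The latest price is now a plain filtered lookup, with no subquery or join
latest = conn.execute(
    "SELECT pid, price FROM ProductPrice WHERE is_latest = 1"
).fetchall()

print(latest)
```

The trade-off is that every insert now also performs an update, and the flag is only as reliable as the code that maintains it.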