SQL query for joining two tables and pulling repeated entries - sql

I'm trying to join two tables that handle customer purchases, or purchase-intents, in order to figure out how many customers made a repeat purchase-intent.
Posting a stripped down version of the DB schema for the problem:
Bid
    id
    user_id
    created_at
Order
    id
    bid_id (optional)
    user_id
    created_at
User
    id
    created_at
Bids signify purchase intent, while Orders signify purchases. A user could be associated with many Bids or Orders.
So a user can start with
- creating a Bid, which could then become fulfilled and generate an Order, or
- creating an Order without an associated Bid.
Bids aren't directly associated with Orders because a Bid could generate many Orders (one to many).
I'm trying to write a SQL query that pulls users who first created a Bid which became a purchase (i.e. have an Order record with a bid_id), and then created another Bid after that.
In plain English: customers who created a Bid, had the Bid fulfilled, and then placed a new Bid sometime after.
For now I'm not particular about whether they made a direct purchase (Order without Bid) or had an unfulfilled Bid in between the fulfilled Bid and the new Bid.
The main metric I'm trying to confirm is that they placed a new Bid sometime after one of their prior Bids got fulfilled.
So far I've only been able to get a list of users who placed more than one Bid and had at least one fulfilled. I've been unable to get the number of repeat users who had a prior Bid fulfilled before placing a new Bid.

EXISTS comes to mind for this type of query:
select o.*
from orders o
where o.bid_id is not null and
      exists (select 1
              from bids b
              where b.user_id = o.user_id and b.created_at > o.created_at
             );
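Since the question asks for the number of repeat users rather than the matching orders, the same EXISTS condition can be wrapped in a COUNT(DISTINCT ...). The following is only a sketch using Python's built-in sqlite3 module: the sample rows, the plural table names, and the ISO-formatted date strings are assumptions made for illustration, not part of the original schema.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the stripped-down schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE bids   (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, bid_id INTEGER, user_id INTEGER, created_at TEXT);

INSERT INTO users VALUES (1, '2020-01-01'), (2, '2020-01-01'), (3, '2020-01-01');

-- User 1: bid fulfilled on Jan 5, new bid on Jan 10 (repeat user).
-- User 2: bid fulfilled, but no later bid (not a repeat user).
-- User 3: two bids, but the only order has no bid_id (not a repeat user).
INSERT INTO bids VALUES
  (1, 1, '2020-01-02'), (2, 1, '2020-01-10'),
  (3, 2, '2020-01-02'),
  (4, 3, '2020-01-02'), (5, 3, '2020-01-10');
INSERT INTO orders VALUES
  (1, 1,    1, '2020-01-05'),
  (2, 3,    2, '2020-01-05'),
  (3, NULL, 3, '2020-01-05');
""")

# Count each user once, even if several fulfilled orders qualify.
repeat_users = conn.execute("""
    SELECT COUNT(DISTINCT o.user_id)
    FROM orders o
    WHERE o.bid_id IS NOT NULL
      AND EXISTS (SELECT 1
                  FROM bids b
                  WHERE b.user_id = o.user_id
                    AND b.created_at > o.created_at)
""").fetchone()[0]

print(repeat_users)
```

With this sample data only user 1 qualifies. The comparison relies on the timestamps sorting correctly as stored, which holds for ISO-formatted strings or native timestamp columns.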

Related

Price comparison database - put price data in main table, in one separate table or in many product tables?

I'm trying to build a price comparison database with n products and a definitive but changing number of vendors that sell these products.
For my price comparison database, I need to store both current prices for a product across different vendors and historical prices (one lowest price).
As I see it, I have 2 options to design the database tables:
1. Put all vendor prices into the main table.
I know how many vendors there will be and if I add or remove a vendor I can add or remove a column.
Historical prices (lowest price on certain date across all vendors), goes into a separate table with a product name, a price and a date.
2. Have one table for products and one table for prices
The main table will hold only the static data, such as categories and attributes. Prices go into a separate table that stores price, vendor, and date. The lowest price for each date can be stored in that table as a pseudo-vendor, or in a separate table as well.
Which method would you suggest and am I missing something?
You should store the base data in a normalized format that contains all the history. This means that you have tables for:
products, with one row per product and the static information about the products.
vendors, with one row per vendor and the static information about the vendor.
prices, with one row per price along with the date and product and vendor.
You can get the current and lowest prices using a query, such as:
select pr.*
from (select pr.*, min(price) over (partition by product_id) as min_price,
             row_number() over (partition by product_id, vendor order by price_datetime desc) as seqnum
      from prices pr
      where pr.product_id = XXX
     ) pr
where seqnum = 1;
For performance, you want an index on prices(product_id, vendor, price_datetime desc).
Eventually, you may find that this query runs too slowly. In that case, you can then consider optimizations. One optimization would be to store the latest date for each product/vendor combination using a trigger, along with the minimum price in the products table, presumably also maintained by triggers.
Another would be maintaining a summary table for each product and vendor using triggers. That is probably not where you should start, though.
You might be surprised at how well the above query can perform on your data.
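To make the window-function query concrete, here is a small sketch that runs it against an in-memory SQLite database from Python. The column names (product_id, vendor, price, price_datetime), the index name, and the sample rows are assumptions for illustration, and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical prices table: one row per observed price.
CREATE TABLE prices (
    product_id     INTEGER,
    vendor         INTEGER,
    price          REAL,
    price_datetime TEXT
);
CREATE INDEX idx_prices ON prices (product_id, vendor, price_datetime DESC);

INSERT INTO prices VALUES
  (1, 10, 9.99, '2021-01-01'),
  (1, 10, 8.49, '2021-02-01'),
  (1, 20, 7.99, '2021-01-15'),
  (1, 20, 8.99, '2021-02-10');
""")

# seqnum = 1 keeps the most recent row per vendor;
# min_price is the lowest price ever seen for the product across all vendors.
rows = conn.execute("""
    SELECT vendor, price, min_price
    FROM (SELECT pr.*,
                 MIN(price) OVER (PARTITION BY product_id) AS min_price,
                 ROW_NUMBER() OVER (PARTITION BY product_id, vendor
                                    ORDER BY price_datetime DESC) AS seqnum
          FROM prices pr
          WHERE pr.product_id = ?) AS latest
    WHERE seqnum = 1
    ORDER BY vendor
""", (1,)).fetchall()

for row in rows:
    print(row)
```

Each result row carries the vendor's current price next to the product's historical minimum, so no separate lowest-price table is needed at this stage.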

Average Number on entries where ID is the same as another table

I am making a database with Microsoft Access 2013 that has two tables. The first table has productID as the primary key; the second table has a unique reviewID as well as the productID of the product that the review refers to. In the first table, where the product information is kept, I want a field that averages the ratings given in its reviews (kept in the second table).
How do I average its rating using only the reviews about that specific product, rather than averaging the ratings of all reviews?
Based on your descriptions I've created a table called tblProducts with the following data:
I've then created a table called tblReview with the following data (here I've assumed you have a field to store a value for each review's rating that I've called ReviewRating.. and I've assumed that reviews are rated from 0-10):
I then created this query:
SELECT tblProducts.ProductName, Avg(tblReview.ReviewRating) AS AvgOfReviewRating
FROM tblReview INNER JOIN tblProducts ON tblReview.productID = tblProducts.productID
GROUP BY tblProducts.ProductName;
...which results in:
Note that this is a SELECT query, so it won't put the average review rating into the original tblProducts table; for that you would need an UPDATE query. I wouldn't recommend that, though, as you'll have to remember to run the update before using tblProducts for anything that needs up-to-date averages.
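One way to see why a query is preferable to a stored average is to run the same GROUP BY against changing data. The sketch below uses SQLite through Python as a stand-in for Access (an assumption; in Access you would save the SELECT as a query object instead of creating a view). The view name qryProductRatings and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblProducts (productID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE tblReview   (reviewID INTEGER PRIMARY KEY, productID INTEGER, ReviewRating INTEGER);

INSERT INTO tblProducts VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO tblReview   VALUES (1, 1, 8), (2, 1, 6), (3, 2, 10);

-- Saved query: the average is computed per product every time it is read.
CREATE VIEW qryProductRatings AS
SELECT p.productID, p.ProductName, AVG(r.ReviewRating) AS AvgOfReviewRating
FROM tblProducts p
INNER JOIN tblReview r ON r.productID = p.productID
GROUP BY p.productID, p.ProductName;
""")

select_sql = "SELECT ProductName, AvgOfReviewRating FROM qryProductRatings ORDER BY productID"
before = conn.execute(select_sql).fetchall()

# A new review is reflected immediately, with no UPDATE step to remember.
conn.execute("INSERT INTO tblReview VALUES (4, 2, 6)")
after = conn.execute(select_sql).fetchall()

print(before)
print(after)
```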

Simple database for product order

I want to make a simple database to order products, like chips/drinks (as full product, without any specific info about product just name and price for unit)
I designed this but I'm not sure if it's good:
**Person:**
id
username
name
password
phone
email
street
zip
**order:**
id
person_id
product_id
date
quantity (necessary?)
status (done or not)
**product:**
id
name
price
relations:
[person] 1 --- 1 [order] 1 --- many [product]
(I'm not sure about relations and fields)
It seems that with your design you will end up with orders that each contain a single product (even if you use the quantity).
I would modify the Order table:
**order:**
id
person_id
date
status (done or not)
And I would add a new table:
**OrderDetails**
id
order_id
product_id
quantity
You may want to read up on database normalization. A table should only contain columns that relate directly to that table. For instance, date in the order is valid, because it refers to when the order was made. On the other hand, it wouldn't be valid in the person table (unless it referred to the person's join date). Similarly, the quantity refers to the product in the order (thus it belongs in OrderDetails), not to the Order or the Product.
You will probably need an intermediate table between order and product, so that the same order can be linked to many different products.
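As a rough sketch of the order / OrderDetails split described above, the snippet below creates the tables in SQLite via Python and computes an order total. The sample rows are hypothetical, the person table is trimmed to two columns, and "order" is quoted because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE "order" (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(id),
    date      TEXT,
    status    TEXT
);
-- One row per product within an order; quantity lives here.
CREATE TABLE OrderDetails (
    id         INTEGER PRIMARY KEY,
    order_id   INTEGER REFERENCES "order"(id),
    product_id INTEGER REFERENCES product(id),
    quantity   INTEGER
);

INSERT INTO person  VALUES (1, 'Alice');
INSERT INTO product VALUES (1, 'Chips', 1.5), (2, 'Cola', 2.0);
INSERT INTO "order" VALUES (1, 1, '2020-01-01', 'done');
INSERT INTO OrderDetails VALUES (1, 1, 1, 2), (2, 1, 2, 3);
""")

# A single order now spans several products.
totals = conn.execute("""
    SELECT o.id, SUM(d.quantity * p.price) AS total
    FROM "order" o
    JOIN OrderDetails d ON d.order_id = o.id
    JOIN product p      ON p.id = d.product_id
    GROUP BY o.id
""").fetchall()

print(totals)
```

The relations then read: one person has many orders, one order has many OrderDetails rows, and each OrderDetails row points to one product.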

Help understand booking system

I'm creating a theatre booking system.
I am quite confused on how I can get multiple tickets to one booking, and able to query those tickets separately. (fields that are of no relevance to the question have not been included)
I have a ticket table:
ticketId, ticketName
Booking Table:
bookingId, bookingReference, ticketId
When I connect the tables this way I can create many tickets, but the bookingId changes every time. I need to be able to find all the tickets associated with a booking and then query an individual ticket, so it can be used for single ticket printing etc.
Can anyone help me understand what I need to do?
Thanks.
The relationship between Tickets and Bookings is many to one. It would make more sense to have a field bookingid in the ticket Table rather than having a ticketId field in Booking table:
Ticket table:
ticketId, ticketName, bookingId
Booking Table:
bookingId, bookingReference
SELECT * FROM Ticket WHERE bookingid = foo
SELECT * FROM Ticket AS T INNER JOIN Booking AS B on T.bookingid = B.bookingid
etc
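A minimal sketch of the suggested layout, run against SQLite from Python, might look like the following. The booking references and ticket names are invented sample data; only the column names come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Booking (bookingId INTEGER PRIMARY KEY, bookingReference TEXT);
-- Each ticket points at its booking, so one booking can own many tickets.
CREATE TABLE Ticket (
    ticketId   INTEGER PRIMARY KEY,
    ticketName TEXT,
    bookingId  INTEGER REFERENCES Booking(bookingId)
);

INSERT INTO Booking VALUES (1, 'REF-A'), (2, 'REF-B');
INSERT INTO Ticket  VALUES (1, 'Stalls A1', 1), (2, 'Stalls A2', 1), (3, 'Circle C7', 2);
""")

# All tickets that belong to one booking, looked up by its reference.
tickets_for_booking = conn.execute("""
    SELECT T.ticketId, T.ticketName
    FROM Ticket AS T
    INNER JOIN Booking AS B ON T.bookingId = B.bookingId
    WHERE B.bookingReference = ?
    ORDER BY T.ticketId
""", ("REF-A",)).fetchall()

# A single ticket, e.g. for printing, fetched by its own id.
single_ticket = conn.execute(
    "SELECT ticketId, ticketName, bookingId FROM Ticket WHERE ticketId = ?", (2,)
).fetchone()

print(tickets_for_booking)
print(single_ticket)
```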

SQL "GROUP BY" issue

I'm designing a shopping cart. To circumvent the problem of old invoices showing inaccurate pricing after a product's price gets changed, I moved the price field from the Product table into a ProductPrice table that consists of 3 fields, pid, date and price. pid and date form the primary key for the table. Here's an example of what the table looks like:
pid date price
1 1/1/09 50
1 2/1/09 55
1 3/1/09 54
Using SELECT and GROUP BY to find the latest price of each product, I came up with:
SELECT pid, price, max(date) FROM ProductPrice GROUP BY pid
The date and pid returned were accurate. I received exactly 1 entry for every unique pid and the date that accompanied it was the latest date for that pid. However, what came as a surprise was the price returned. It returned the price of the first row matching the pid, which in this case was 50.
After reworking my statement, I came up with this:
SELECT pp.pid, pp.price, pp.date FROM ProductPrice AS pp
INNER JOIN (
SELECT pid AS lastPid, max(date) AS lastDate FROM ProductPrice GROUP BY pid
) AS m
ON pp.pid = lastPid AND pp.date = lastDate
While the reworked statement now yields the correct price (54), it seems incredible that such a simple-sounding query would require an inner join to execute. My question is, is my second statement the easiest way to accomplish what I need to do? Or am I missing something here? Thanks in advance!
James
The reason you get an arbitrary price is that MySQL cannot know which row's values to return for columns that are neither grouped nor aggregated. It knows it needs a price and a date per pid, and it can fetch the latest date as you requested with max(date), but it returns whichever price is most efficient for it to retrieve, because you didn't provide an aggregate function for that column (your first query is not valid SQL, actually).
Your second query looks OK, but here is a shorter alternative:
SELECT pid, price, date
FROM ProductPrice p
WHERE date = (SELECT MAX(date) FROM ProductPrice tmp WHERE tmp.pid = p.pid)
But if you access the latest price a lot (which I think you do), I would recommend adding the old column back to your original table to hold the newest value, if you have the option of altering the database structure again.
I think you broke your database schema.
To circumvent the problem of old invoices showing inaccurate pricing after a product's price gets changed, I moved the price field from the Product table into a ProductPrice table that consists of 3 fields, pid, date and price. pid and date form the primary key for the table.
As you have pointed out you need to keep a change history of prices. But you can still keep the current price in the products table in addition to that new table. That would make your life much easier (and your queries faster).
You cannot solve your problem with the GROUP BY clause, because for each group of pid MySQL will simply fetch the first pid, the maximum date and the first price found (which is not what you need).
You may either use a subquery (which can be inefficient):
SELECT pid, date, price
FROM ProductPrice p1
WHERE date = ( SELECT MAX(p2.date)
FROM ProductPrice p2
WHERE p1.pid = p2.pid)
or you can simply join the table with itself:
SELECT p1.pid, p1.date, p1.price
FROM ProductPrice p1
LEFT JOIN ProductPrice p2 ON p1.pid = p2.pid
AND p1.date < p2.date
WHERE p2.pid IS NULL
Take a look at this section of MySQL docs.
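The derived-table join from the question, the correlated subquery, and the self-join can be checked against each other on a small data set. The sketch below does that with SQLite from Python; the second product and the ISO-formatted dates are assumptions added for illustration (strings such as 1/1/09 would not sort chronologically).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductPrice (pid INTEGER, date TEXT, price INTEGER, PRIMARY KEY (pid, date));
INSERT INTO ProductPrice VALUES
  (1, '2009-01-01', 50), (1, '2009-02-01', 55), (1, '2009-03-01', 54),
  (2, '2009-01-15', 20), (2, '2009-02-15', 18);
""")

queries = {
    # Join against the latest date per pid.
    "derived table join": """
        SELECT pp.pid, pp.date, pp.price
        FROM ProductPrice AS pp
        INNER JOIN (SELECT pid AS lastPid, MAX(date) AS lastDate
                    FROM ProductPrice
                    GROUP BY pid) AS m
          ON pp.pid = m.lastPid AND pp.date = m.lastDate
        ORDER BY pp.pid""",
    # Keep rows whose date equals the maximum date for the same pid.
    "correlated subquery": """
        SELECT p1.pid, p1.date, p1.price
        FROM ProductPrice p1
        WHERE p1.date = (SELECT MAX(p2.date)
                         FROM ProductPrice p2
                         WHERE p2.pid = p1.pid)
        ORDER BY p1.pid""",
    # Keep rows for which no later row with the same pid exists.
    "self left join": """
        SELECT p1.pid, p1.date, p1.price
        FROM ProductPrice p1
        LEFT JOIN ProductPrice p2 ON p1.pid = p2.pid AND p1.date < p2.date
        WHERE p2.pid IS NULL
        ORDER BY p1.pid""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return one row per pid with the price recorded on the latest date, so the choice between them is mostly a matter of readability and how the particular database engine optimizes each form.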
You might wanna try this:
SELECT pid, price, date FROM ProductPrice GROUP BY pid ORDER BY date DESC
GROUP BY has some obscure behavior here, and I'm never sure it picks the right row. Note that ORDER BY is applied after the grouping, so this is not guaranteed to return the latest price for each pid.
Here is another -possibly inefficient- one:
SELECT pid, substring_index( group_concat( price order by date desc ), ',', 1 ) , max(date)
FROM ProductPrice
GROUP BY pid
I think the key here is the phrase "simple sounding query": you can see what you want, but computers aren't human, so to produce the desired result from set-based operations you have to be explicit, as in the second query.
The inner query identifies the latest date for each product, then the outer query retrieves the price recorded on that date. That's about as simple as it can get.
As an aside, if you have an invoicing system, you really ought to store the price for the product (and the tax rates as well as the "codes") with the invoice, i.e. the invoice tables should contain all the necessary financial information to reproduce the invoice. In general, you do not want to rely on being able to look up a price (or a tax rate) in a mutable table, even allowing for the system introduced above. Regardless, having the pricing history has its own merits.
I faced the same problem in one of my projects. I used a subquery to fetch the date and then compare it, but it made the system slow as the data grew. So it's better to store the latest price in your Products table in addition to the new table you have created to keep the history of price changes.
You can always use any of the queries people suggested to get the latest price of a product on a particular date. You can also add one field to the same table that marks whether a row is the latest, so that only one row has the flag set to true at a time. Then you can always find a product's latest price with one simple query.