SQL First Product and Second Product Combinations - sql

I'm trying to find what the preferred combination of item purchases for customers who have shopped here. Currently, this code tells me the number of combinations, but this a) does not tell me which product was purchased first and second and b) tells me all the possible combinations of customers' varying frequency.
The data I currently have looks something like this:
CustomerKey CalendarDate PnLCategory ChannelName
8 2014-06-27 Laptop Online
8 2015-07-01 Mouse Retail
8 2015-12-13 Earphones Online
10 2014-01-10 Headphones Retail
14 2016-01-25 Laptop Online
14 2017-02-18 Mouse Retail
Based on this data, you can find that customers typically purchase a laptop then mouse. Additionally, you can tell that customers typically purchase online than retail.
I only care about the first two transactions a customer makes. Also, how would you include which channel the product was purchased from? Ideally, would like to be able to know what second product a customer would likely purchase given first product and in which channel.
SELECT A.PnLCategory, B.PnLCategory, COUNT (*) CountForCombination
FROM MyTable3 A
INNER JOIN MyTable3 B
ON A.CustomerKey = B.CustomerKey
AND A.PnLCategory < B.PnLCategory
GROUP BY A.PnLCategory, B.PnLCategory
ORDER BY CountForCombination desc
Successful result would look something like:
FirstProduct ChannelName1 SecondProduct ChannelName2 #Occurences
Laptop Online Mouse Retail 100
Mouse Retail Headphones Online 50

Related

Only Show unique Customers per date cohort for repeat purchase rate

Scenario:
I have a table that has all of the customer purchases by Month and each month has a period. Within that table I am showing the customers that have made purchases in each Month/Period. What I am trying to figure out is how to exclude any customer that made a purchase in the previous month so that the repeat purchases are only for unique customers. The data looks like the following:
customer_email
cohortMonth
month_number
orders_for_period
abc#gmail.com
10/2019
0
2
def#gmail.com
10/2019
0
1
ghi#gmail.com
10/2019
0
1
def#gmail.com
10/2019
1
1
abc#gmail.com
10/2019
1
1
def#gmail.com
10/2019
2
1
In the Table above for Month_number=0 we have 3 total customers and within this period customer abc#gmail.com was the only repeat customer because they have 2 orders. This would show as a 33% repeat purchase rate for month_number 0. For Month_number=1 we have 2 customers that have purchased again in the period but only def#gmail.com is unique as abc#gmail.com already made the purchase. This would then bring the repeat_rate to 66% as now 2 customers have comeback and purchased out of the 3 that originally purchased.
cohortMonth
month_number
repeat_purchase_rate
10/2019
0
33%
10/2019
1
66%
10/2019
2
66%
With every unique customer that purchases in the subsequent periods we want to add that to the total to understand the repeat rate at a cumulative level.
I have tried a ton of different ways to figure this out but backing out the customers that made purchases in the previous period and only showing the unique customers is where I am struggling at. Any help is greatly appreciated!
Side Note: Whenever I format a table it looks like how I want it to look in the preview but then when I review I get the error :"Your post appears to contain code that is not properly formatted as code. Please indent all code by 4 spaces using the code toolbar button or the CTRL+K keyboard shortcut. For more editing help, click the [?] toolbar icon."
I then indent and it breaks the way the table looks. Any help on that would be great as well. Thank you

How to get the value of the least and most expensive book in sql?

How to get the value of the least and the most expensive book for each publisher that is based on the retail price?
I have already queried the tables below. However I just don't know how to get, which I thought would be the last 5 rows, the result for this one. Also, if I understood the question right.
select name, title, retail, sum(quantity)
from books full outer join orderitems using(isbn)
join publisher using(pubid)
group by name, title, retail
order by retail;
NAME TITLE RETAIL SUM(QUANTITY)
----------------------- ------------------------------ ---------- -------------
REED-N-RITE BIG BEAR AND LITTLE DOVE 8.95 4
READING MATERIALS INC. COOKING WITH MUSHROOMS 19.95 8
PRINTING IS US REVENGE OF MICKEY 22 5
AMERICAN PUBLISHING HANDCRANKED COMPUTERS 25 2
READING MATERIALS INC. THE WOK WAY TO COOK 28.75
READING MATERIALS INC. HOW TO GET FASTER PIZZA 29.95
READING MATERIALS INC. BODYBUILD IN 10 MINUTES A DAY 30.95 1
PRINTING IS US HOW TO MANAGE THE MANAGER 31.95 1
REED-N-RITE SHORTEST POEMS 39.95 1
PUBLISH OUR WAY E-BUSINESS THE EASY WAY 54.5 2
AMERICAN PUBLISHING DATABASE IMPLEMENTATION 55.95 7
PUBLISH OUR WAY BUILDING A CAR WITH TOOTHPICKS 59.95
AMERICAN PUBLISHING HOLY GRAIL OF ORACLE 75.95 3
REED-N-RITE PAINLESS CHILD-REARING 89.95 6
A bit hard to tell as you haven't shown us the other tables, but how about:
select publisher, min(retail), max(retail)
from books join orderitems using(isbn)
join publisher using(pubid)
group by publisher;
Key points:
You want a min and a max per publisher so group by the publisher
min(retail) gives you the minimum value of retail for all books in the group (the publisher) so that tells you the cheapest book each publisher publishes.
Similarly, max tells you the most expensive

SQL-sum over dynamic period

I have 2 tables: Customers and Actions, where each customer has uniqe ID (which can be found in each table).
Part of the customers became club members at a specific date (change between the customers). I'm trying to summarize their purchases until that date, and to get those who purchase more than (for example) 200 until they become club members.
For example, I can have the following customer:
custID purchDate purchAmount
1 2015-05-12 100
1 2015-07-12 150
1 2015-12-29 320
Now, assume that custID=1 became a club member at 2015-12-25; in that case, I'd like to get SUM(purchAmount)=250 (pay attention that I'd like to get this customer because 250>200).
I tried the following:
SELECT cust.custID, SUM(purchAmount)totAmount
FROM customers cust
JOIN actions act
ON cust.custID=act.custID
WHERE act.clubMember=1
AND cust.purchDate<act.clubMemberDate
GROUP BY cust.custID
HAVING totAmount>200;
Is it the right way to "attack" this question, or should I use something like while loop over the clubMemberDate (which telling the truth-I don't know how to do)?
I'm working with Teradata.
Your help will be appreciated.

Challenging Excel VBA/Macro for inventory management

I work for an eCommerce company and we use Microsoft Excel for our inventory database. We currently just keep adding items to this database as we purchase them, without ever removing them. What I would like to do is start removing items as they sell. I am not sure how to attach the file, so if you e-mail me at drenollet#supplykick.com I can send it to you. Below are the following steps:
The Sales tab includes the sales data for the items. I would like to take this data and be able to copy and paste it in a sheet in our Inventory Managment file in excel (a separate file, but I included a sample in the "Database" sheet).
I then need to just use a VLOOKUP formula and the Catalog data to get the Product ID instead of the SKU. (I can do this.)
Then use the copied data in the Sales Tab that is in the Inventory Management file and move the corresponding rows out of the Database file/sheet to the Sold Items sheet.
A few thoughts on specifics:
I want to make sure all the quantities are right. (e.g.1 if we purchased two of an item and only one sold - reducing the quantity in the Database sheet from two down to one.) (e.g.2 If we purchased an item two different times at two different prices and both were purchased in one sale, I would want to make sure both of the rows are moved out of the database).
If you have any thoughts on making sure the quantities are right, let me know. Maybe we need to set all the purchase quantities to one and copy the purchase of a multiple quantity of items X number of times for each one that was purchased.
Would love your input on how to cross this bridge! Let me know if you would like to see the sample file and I can directly e-mail it to you!
Best Regards,
Don Renollet
The best way to do this is to have a sheet called Movements
then you have just rows of entries like
A B C D
----------------------------------------
prodID Movement type Qty Date
123 Purchase 5 08/01/15
789 Sale 2 07/01/15
123 Return 1 06/01/15
456 Sale 1 05/01/15
789 Purchase 10 04/01/15
456 Purchase 5 03/01/15
123 Sale 2 03/01/15
123 Return 1 02/01/15
123 Sale 1 02/01/15
123 Purchase 10 01/01/15
Then at anytime excel can calculate whats in stock using sumifs or similar
=SUMIFS(C:C,A:A,"123",B:B,"Purchase") - Sumif(C:C,A:A,"123",B:B,"Sale")) + Sumif(C:C,A:A,"123",B:B,"Return"))
You should never remove rows from a database like this, you can always do a stock take every so often and restart the database with 1 entry for each item, but aways store the old data elsewhere.
Try not to mix price with quantity if possible, if you need to manage price , consider using a moving average price (MAP)

Optimal selection for ordering multiple items (parts) from multiple suppliers (vendors)

The task here is to define the optimal (as detailed below) way of ordering items (parts) from suppliers.
The relevant parts of the table schema (with some sample data) are
Items
ID NUMBER
1 Item0001
2 Item0002
3 Item0003
Suppliers
ID NAME DELIVERY DISCOUNT
1 Supplier0001 0 0
2 Supplier0002 0 0.025
3 Supplier0003 20 0
DELIVERY is the delivery charge (in dollars) levied by that supplier on each delivery. DISCOUNT is the settlement discount (as a percentage i.e. 2.5% for ID=2 above) allowed by that supplier for on time payment.
SupplierItems
SUPPLIER_ID ITEM_ID PRICE
1 2 21.67
1 5 45.54
1 7 32.97
This is the many-to-many join between suppliers and items with the price that supplier charges for that item (in dollars). Every item has at least 1 supplier but some have more than one. A supplier may have no items.
PartsRequests
ID ITEM_ID QUANTITY LOCATION_ID ORDER_ID
1 59 4 2 (null)
2 89 5 2 (null)
3 42 4 2 (null)
This table is a request from a field site for parts to be ordered and delivered by the supplier to that site. A delivery of any number of items to a site attracts a delivery charge. When the parts are ordered, the ORDER_ID is inserted into the table so we are only concerned with those where ORDER_ID IS NULL
The question is, what is the optimal way to order these parts for each `LOCATION' where there are 3 optimal solutions that need to be presented to the user for selection.
The combination of orders with the least number of suppliers
The combination of orders with the lowest total cost i.e. The sum of QUANTITY*PRICE for each item plus the DELIVERY for each order summed over all orders ignoring DISCOUNT
As item 2 but accounting for DISCOUNT
Clearly I need to determine the combinations of orders that are available and then determining the optimal ones becomes trivial but I am a bit stuck on an efficient way to deal with building the combinations.
I have built some SQL fiddles in SQL Server 2008 with random data. This one has 100 items, 10 suppliers and 100 requests. This one has 1000 items, 50 suppliers and 250 requests. The table schema is the same.
Update
I reasoned that the solution had to be recursive and I built a nice table valued function to get but I ran into the 32 hard limit on recursion in SQL Server. I was uncomfortable with it anyway because it hinted more of a procedural language solution than a RDMS.
So I am now playing with CTE recursion.
The root query is:
SELECT DISTINCT
'' SOLUTION_ID
,LOCATION_ID
,SUPPLIER_ID
,(subquery I haven't quite worked out) SOLE_SUPPLIER
FROM PartsRequests pr
INNER JOIN
SupplierItems si ON pr.ITEM_ID=si.ITEM_ID
WHERE pr.ORDER_ID IS NULL
This gets all the suppliers that can supply the required items and is certainly a solution, probably not optimal. The subquery sets a flag if the supplier is the sole supplier of any product required for that location; if so they must be part of any solution.
The recursive part is to remove suppliers one by one by means of CTE.SUPPLIER_ID<>CTE.SUPPLIER_ID and add them if they still cover all the items. The SOLUTION_ID will be a CSV list of the suppliers removed, partly to uniquely identify each solution and partly to check against so I get combinations instead of permutations.
Still working on the details, the purpose of this update was to allow the Community to say "Yay, looks like that will work" or, alternatively "You moron, that won't work because ..."
Thanks
This is a more general answer (as in, not sql) as I think solving this problem will require something more powerful. Your first scenario is to select a minimum number of suppliers. This problem can be seen as a set cover problem as you are trying to cover all demands per site with the suppliers. This problem is already NP-complete.
Your third scenario seems to be basically the same as the second. You just have to take the discount into account in the prices, assuming you pay on time for every order.
The second scenario is at least NP-hard as I see a lot of resemblance with the facility location problem. You are trying to decide which suppliers (facilities) to use (open) to cover your orders (demands) based on their prices and delivery costs (opening costs).
Enumerating your possible solutions seems infeasible as with 10 suppliers, you have 2^10 possibilities of using them, further complicated by the distribution of demands internally.
I would suggest some dynamic programming to first select the suppliers that you have to use (=they are the only ones that deliver a specific thing), eliminating some possibilities (if the cost for supplier A +delivery cost A< cost for supplier B) and then trying to expand your set of possible solutions. Linear programming is also a valid train of thought.