Finding progression of category purchases - sql

If I had a table as such:
How do I structure a SQL query (MySQL or Redshift) to show the progression from what customers purchased to their subsequent purchases in a new category? Something like:
Where I use the first_time_customer flag to define if a parent_category is NEW (or first time buyer of this category) and progression_category defines the next categories being purchased (purchased is defined as customers buying the same progression of categories from new until their last purchase). Notice that customer A bought 2 different categories, and that drives the subsequent 4 rows at the end. The actual table is 90 million rows.

Related

SQL Databricks - Selecting the number of customers that have not bought anything from certain product categories

Do you know how I can select the number of customers that have not bought anything from specific product categories, for a 6 months period of time?
The result should look something like that:
table_example
I tried a full outer join of the sales table, the product categories table and the customer ids table, then counted the null values from the customer ids table. It returned no results.

Price comparison database - put price data in main table, in one separate table or in many product tables?

I'm trying to build a price comparison database with n products and a definite but changing number of vendors that sell these products.
For my price comparison database, I need to store both current prices for a product across different vendors and historical prices (one lowest price).
As I see it, I have 2 options to design the database tables:
1. Put all vendor prices into the main table.
I know how many vendors there will be and if I add or remove a vendor I can add or remove a column.
Historical prices (lowest price on certain date across all vendors), goes into a separate table with a product name, a price and a date.
2. Have one table for products and one table for prices
I will have only the static data in the main table, such as categories, attributes, etc. Prices go into a separate table where I store price, vendor and date. I can store the lowest price for each date as a pseudo-vendor in that table, or I can store it in a separate table as well.
Which method would you suggest and am I missing something?
You should store the base data in a normalized format that contains all the history. This means that you have tables for:
products, with one row per product and the static information about the products.
vendors, with one row per vendor and the static information about the vendor.
prices, with one row per price along with the date and product and vendor.
You can get the current and lowest prices using a query, such as:
select pr.*
from (select pr.*,
             -- lowest price ever recorded for the product, across all vendors
             min(price) over (partition by product_id) as min_price,
             -- seqnum = 1 marks the most recent price for each vendor
             row_number() over (partition by product_id, vendor order by price_datetime desc) as seqnum
      from prices pr
      where pr.product_id = XXX
     ) pr
where seqnum = 1;
For performance, you want an index on prices(product_id, vendor, price_datetime desc).
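As a rough way to try the query out, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The column names, the sample rows and the `current_prices` helper are assumptions made for illustration, and it needs SQLite 3.25 or later for window functions.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table prices (
        product_id     integer,
        vendor         text,
        price          real,
        price_datetime text
    );
    insert into prices values
        (1, 'A', 10.0, '2020-01-01'),
        (1, 'A', 12.0, '2020-02-01'),
        (1, 'B',  9.0, '2020-01-15'),
        (1, 'B', 11.0, '2020-02-15'),
        (2, 'A',  5.0, '2020-01-01');
""")

def current_prices(conn, product_id):
    """Latest price per vendor for one product, plus the product's lowest price ever."""
    sql = """
        select pr.vendor, pr.price, pr.min_price
        from (select p.*,
                     min(price) over (partition by product_id) as min_price,
                     row_number() over (partition by product_id, vendor
                                        order by price_datetime desc) as seqnum
              from prices p
              where p.product_id = ?
             ) pr
        where pr.seqnum = 1
        order by pr.vendor
    """
    return conn.execute(sql, (product_id,)).fetchall()

rows = current_prices(conn, 1)
print(rows)  # [('A', 12.0, 9.0), ('B', 11.0, 9.0)]
```

Each row carries the vendor's current price and the historical minimum across all vendors, so both requirements come out of one pass over the prices table.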
Eventually, you may find that this query runs too slowly. In that case, consider optimizations. One would be to use triggers to store the latest price date for each product/vendor combination, along with the minimum price in the products table.
Another would be maintaining a summary table for each product and vendor using triggers. However, that is probably not how you should start the endeavor.
You might be surprised at how well the above query can perform on your data.
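If it does come to that, the trigger-maintained summary table could look roughly like the sketch below. It uses SQLite syntax through Python; the `latest_prices` table, the trigger name and the sample rows are hypothetical, and MySQL trigger syntax differs (Redshift has no triggers, so it would need a scheduled job instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table prices (
        product_id integer, vendor text, price real, price_datetime text
    );

    -- Hypothetical summary table: one row per product and vendor.
    create table latest_prices (
        product_id     integer,
        vendor         text,
        price          real,
        price_datetime text,
        primary key (product_id, vendor)
    );

    create trigger prices_after_insert after insert on prices
    begin
        -- Create the summary row the first time a product/vendor pair appears.
        insert or ignore into latest_prices
        values (new.product_id, new.vendor, new.price, new.price_datetime);

        -- Overwrite it only when the new price is more recent.
        update latest_prices
           set price = new.price,
               price_datetime = new.price_datetime
         where product_id = new.product_id
           and vendor = new.vendor
           and price_datetime < new.price_datetime;
    end;
""")

conn.executemany(
    "insert into prices values (?, ?, ?, ?)",
    [
        (1, "A", 10.0, "2020-01-01"),
        (1, "A", 12.0, "2020-02-01"),
        (1, "B", 9.0, "2020-01-15"),
        (1, "A", 8.0, "2020-01-10"),  # late-arriving older price, must not win
    ],
)

latest = conn.execute(
    "select * from latest_prices order by product_id, vendor"
).fetchall()
print(latest)  # [(1, 'A', 12.0, '2020-02-01'), (1, 'B', 9.0, '2020-01-15')]
```

The full history stays in prices; the summary table is only a cache that can be rebuilt from it at any time.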

Crystal 2016 pull from 2 unrelated tables in one report

I have searched and the only answers I found were for cross joining.
I have 3 tables that are related by 1 field only. I'm trying to pull data from 2 tables that are linked to the other table.
The first table contains salesman data IDnumber, name, address, phone number, hire date, wage, etc.
There is a sales table that contains salesmanIDnumber, date of sale, object sold, and price.
There is a purchases table that contains salesmanIDnumber, date of purchase, object purchased, and price.
The date fields in sales and purchases are unrelated. I know the easiest solution would be to have the sales and purchases tables combined with a column for buy/sell, but I didn't create the database and I'm working with what I've got. Basically, I want to pull all purchases and sales by salesmanID in one report.
I have linked the salesman table to the sales table and the purchases table with left outer joins on the salesman ID. What I'm getting is a cross join, with each row from the purchases table displayed once for each row in the sales table, which gives me multiplied results instead of added ones. For example, 4 sales and 6 purchases should be 10 entries, but I'm getting 24 results.
I tried entering an example but the site stripped the spacing and pushed everything together, basically making it unreadable.
How can I get it to show data from both tables independently?
I do have access to create views in the database if that's the best solution, but I'm not proficient at it.
Create 2 views (one for sales, the other for purchases), each Grouped By SalesMan.
Since each SalesMan would have only one row in each view, you can join them without record inflation.
Or use a UNION to append Purchase records to Sales records, taking care to include a 'Type' column ('Sales' as Type, or 'Purchases' as Type) and/or reverse the sign on quantities so that things can be summarized in a logical way.
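A minimal sketch of the UNION approach as a view, run here against an in-memory SQLite database from Python so it can be tried quickly. The table names, column names and sample rows are assumptions based on the description in the question, not the real schema. UNION ALL is used rather than plain UNION so that identical rows are not collapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables modelled on the question's description.
    create table sales (salesman_id integer, sale_date text, item text, price real);
    create table purchases (salesman_id integer, purchase_date text, item text, price real);

    -- One row per sale or purchase, tagged with where it came from.
    create view salesman_transactions as
        select salesman_id, 'Sales' as txn_type, sale_date as txn_date, item, price
        from sales
        union all
        select salesman_id, 'Purchases' as txn_type, purchase_date as txn_date, item, price
        from purchases;
""")

# 4 sales and 6 purchases for salesman 1, as in the question's example.
conn.executemany(
    "insert into sales values (1, ?, ?, ?)",
    [("2020-01-0%d" % d, "widget", 10.0) for d in range(1, 5)],
)
conn.executemany(
    "insert into purchases values (1, ?, ?, ?)",
    [("2020-02-0%d" % d, "gadget", 7.5) for d in range(1, 7)],
)

total = conn.execute(
    "select count(*) from salesman_transactions where salesman_id = 1"
).fetchone()[0]
by_type = conn.execute(
    "select txn_type, count(*) from salesman_transactions "
    "group by txn_type order by txn_type"
).fetchall()
print(total)    # 10
print(by_type)  # [('Purchases', 6), ('Sales', 4)]
```

The report can then link the salesman table to this single view, which gives 10 rows instead of 24 because sales and purchases are stacked rather than joined to each other.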

Duplicate in n-to-n relationship in ssas cube

I have a cube with the structure of n-to-n relationship to categories.
Each product can belong to many categories. In this scenario my product table looks like this:
There is one product in 3 different categories:
Fashion > Men
Fashion > Women
Electronic
The Bridge_ProductCategories has 3 rows for mapping each category to this product. So I have 3 records in FactSales table showing the sales amount of this product.
In cube browsing when I filter the "Electronic" category, everything works fine.
But when I filter on the "Fashion" category, the sales amount is counted twice because the product belongs to 2 of its subcategories.
Does anyone have any solution for this situation?

How to count unique records and get number of these uniques in table using SQL?

Imagine I have table like this:
id:Product:shop_id
1:Basketball:41
2:Football:41
3:Rocket:45
4:Car:86
5:Plane:86
Now, this is an example of large internet mall, where there are shops which sell to one customer, so customer can choose more products from each shop and buy it in one basket.
However, I am not sure if there is any SQL syntax which allows me to simply get unique shop_ids and total number of those shops' products in customer basket. So I'd get something like:
Shop 41 has 2 products
Shop 45 has 1 product
Shop 86 has 2 products
I could write SQL queries that loop through the table and build some kind of ['shop_id']['number_of_products'] array holding every product's shop_id, then de-duplicate it and count how many times each shop_id had to be removed, but that seems like a lot of useless scripting.
If you got some nice and neat idea, please, let me know.
This is exactly the sort of thing that aggregate functions are for. You make one row of output for each group of rows in the table. Group them by shop_id and count how many rows are in each group.
select shop_id, count(1) from TABLE_NAME
group by shop_id
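To see it on the sample data from the question, here is a small sketch using an in-memory SQLite database from Python. The table name `basket` and the `number_of_products` alias are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table holding the rows shown in the question.
    create table basket (id integer primary key, product text, shop_id integer);
    insert into basket values
        (1, 'Basketball', 41),
        (2, 'Football',   41),
        (3, 'Rocket',     45),
        (4, 'Car',        86),
        (5, 'Plane',      86);
""")

# One output row per shop, with the number of basket rows in that group.
counts = conn.execute("""
    select shop_id, count(*) as number_of_products
    from basket
    group by shop_id
    order by shop_id
""").fetchall()

for shop_id, n in counts:
    print(f"Shop {shop_id} has {n} product{'s' if n != 1 else ''}")
# Shop 41 has 2 products
# Shop 45 has 1 product
# Shop 86 has 2 products
```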