Redshift: Incremental products ordered across marketplaces

I have a table, 'Customer_Orders', that lists the products purchased by customers across marketplaces (UK, DE, US, etc.). Here's a short sample of the table:
Cust_id marketplace product
1 UK A
1 UK B
1 DE A
1 US A
1 US C
From the above table, I want to extract the incremental number of products ordered (1) in total and (2) by marketplace. For example, if we use UK as the base: Total: 3; DE: 0; US: 1.

Are you trying to get the differences like this?
select cust_id, marketplace,
(count(*) - max(case when marketplace = 'UK' then count(*) end) over (partition by cust_id)) as diff_from_uk
from Customer_Orders co
group by cust_id, marketplace;
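The query above compares row counts with UK. "Incremental" may instead mean products a customer ordered in a marketplace but never ordered in the base marketplace. Under that reading, a LEFT JOIN against the base marketplace's products gives the per-marketplace numbers.

Below is a minimal sketch run on SQLite through Python's sqlite3 module, not on Redshift. It assumes that interpretation and treats "Total" as the customer's number of distinct products. The function name incremental_products is hypothetical. The SQL is plain enough that it should carry over, but it has not been tried on Redshift.

```python
import sqlite3

def incremental_products(rows, base="UK"):
    """Hypothetical helper: per-marketplace products not already ordered in `base`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Customer_Orders (Cust_id INTEGER, marketplace TEXT, product TEXT)")
    con.executemany("INSERT INTO Customer_Orders VALUES (?, ?, ?)", rows)
    # b holds the products each customer bought in the base marketplace;
    # a row with no match in b (b.product IS NULL) is an incremental product.
    per_marketplace = con.execute("""
        SELECT o.Cust_id, o.marketplace,
               COUNT(DISTINCT CASE WHEN b.product IS NULL THEN o.product END) AS incremental
        FROM Customer_Orders o
        LEFT JOIN (SELECT DISTINCT Cust_id, product
                   FROM Customer_Orders WHERE marketplace = ?) b
          ON b.Cust_id = o.Cust_id AND b.product = o.product
        WHERE o.marketplace <> ?
        GROUP BY o.Cust_id, o.marketplace
        ORDER BY o.Cust_id, o.marketplace
    """, (base, base)).fetchall()
    # Assumption: "Total" = number of distinct products per customer.
    total = con.execute("""
        SELECT Cust_id, COUNT(DISTINCT product)
        FROM Customer_Orders GROUP BY Cust_id ORDER BY Cust_id
    """).fetchall()
    con.close()
    return per_marketplace, total

rows = [(1, "UK", "A"), (1, "UK", "B"), (1, "DE", "A"), (1, "US", "A"), (1, "US", "C")]
print(incremental_products(rows))  # ([(1, 'DE', 0), (1, 'US', 1)], [(1, 3)])
```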

How to fill missing values in a table using SQL window functions

order_id  Products      Country
MB-123    Bread         US
MB-123    Milk          NULL
MB-1256   Cheese        UK
MB-1256   Tomato Sauce  NULL
MB-1256   Milk          NULL
The missing values in the above table need to be filled with the country name of the same order_id.
I tried COALESCE with a window function, but I am not able to fill the null value for each order_id. I want the null values in the Country column to be filled with the country name for the same order_id.
The desired output is the table shown below:
order_id  Products      Country
MB-123    Bread         US
MB-123    Milk          US
MB-1256   Cheese        UK
MB-1256   Tomato Sauce  UK
MB-1256   Milk          UK
In PostgreSQL, you can use the MAX window function. Because MAX ignores NULLs, it returns the non-null country for each "order_id" partition.
SELECT order_id, Products, MAX(Country) OVER(PARTITION BY order_id) AS Country
FROM tab;
If you need to update the existing table, you can compute the maximum country for each order_id, then apply a JOIN inside the UPDATE statement:
WITH cte AS (
SELECT order_id, Products, MAX(Country) OVER(PARTITION BY order_id) AS Country
FROM tab
)
UPDATE tab
SET Country = cte.Country
FROM cte
WHERE cte.order_id = tab.order_id AND cte.Products = tab.products
AND tab.Country IS NULL
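Here is a small sketch of both the SELECT and the UPDATE variants, run with Python's sqlite3 module rather than PostgreSQL. It assumes a SQLite build with window-function support (3.25+). SQLite only added UPDATE ... FROM in 3.33, so the update below swaps in a correlated subquery instead of the answer's JOIN. The function name fill_country is hypothetical.

```python
import sqlite3

def fill_country(rows):
    """Hypothetical helper returning (filled SELECT result, table contents after UPDATE)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (order_id TEXT, Products TEXT, Country TEXT)")
    con.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
    # Read-only fill: MAX ignores NULLs, so each order_id partition yields its one country.
    selected = con.execute(
        "SELECT order_id, Products, MAX(Country) OVER (PARTITION BY order_id) AS Country "
        "FROM tab ORDER BY order_id, Products"
    ).fetchall()
    # In-place fill: only NULL rows are updated; the correlated subquery replaces UPDATE ... FROM.
    con.execute(
        "UPDATE tab SET Country = (SELECT MAX(t2.Country) FROM tab AS t2 "
        "WHERE t2.order_id = tab.order_id) WHERE Country IS NULL"
    )
    updated = con.execute(
        "SELECT order_id, Products, Country FROM tab ORDER BY order_id, Products"
    ).fetchall()
    con.close()
    return selected, updated

rows = [("MB-123", "Bread", "US"), ("MB-123", "Milk", None),
        ("MB-1256", "Cheese", "UK"), ("MB-1256", "Tomato Sauce", None),
        ("MB-1256", "Milk", None)]
selected, updated = fill_country(rows)
```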

SQL - How to solve this challenging problem?

I have two tables.
First table - ticket history:
customer_id ticket_price transportation company_id
1 $342.21 Plane D7573
1 $79.00 Car G2943
1 $91.30 Car M3223
2 $64.00 Car K2329
3 $351.00 Plane H2312
3 $354.27 Plane P3857
4 $80.00 Car N2938
4 $229.67 Plane J2938
5 $77.00 Car L2938
Second table - companies and corresponding vehicles:
company_id vehicle
D7573 Boeing
G2943 Coach
M3223 Shuttle
K2329 Shuttle
H2312 Airbus
P3857 Boeing
N2938 Minibus
J2938 Airbus
L2938 Minibus
Z3849 Airbus
A3848 Minibus
If a customer took both plane and car, then they are "mixed". Otherwise they are "plane" or "car" customers. How can I get the result below?
customer type  # shuttle took  Avg ticket price per customer  # of customers
mixed          ?               ?                              ?
plane          ?               ?                              ?
car            ?               ?                              ?
Your title is misleading; you need to specify which part you are having a problem with.
This may not be the best answer. Tested in MySQL (SQL Fiddle):
select transportation,
sum(no_of_shuttle) as no_of_shuttle_took,
round(avg(ticket_price), 2) as avg_price_per_customer,
count(customer_id) as no_of_customer
from (
select
customer_id,
'mixed' as transportation,
count(transportation) as no_of_shuttle,
sum(ticket_price) as ticket_price
from tickets
group by customer_id
having count(distinct transportation) > 1
union all
select
customer_id,
max(transportation) as transportation, -- safe: each customer here has exactly one type
count(transportation) as no_of_shuttle,
sum(ticket_price) as ticket_price
from tickets
group by customer_id
having count(distinct transportation) = 1
) t
group by transportation;
I am using subqueries to aggregate:
customers with multiple distinct transportation types
customers with a single distinct transportation type
Then I union these two results into one result set to calculate the number of customers, the number of trips taken (no_of_shuttle), and the average of each customer's total ticket spend. Note that I am rounding the price to 2 decimal places.
In SQL Server, using a common table expression:
;WITH cte1 AS (
SELECT customer_id,
CASE WHEN COUNT(DISTINCT transportation) > 1 THEN 'Mixed' ELSE MAX(transportation) END AS transportation,
AVG(ticket_price) AS avg_ticket_price,
SUM(CASE WHEN vehicle = 'Shuttle' THEN 1 ELSE 0 END) AS shuttle
FROM history AS a
JOIN vehicle AS b ON a.company_id = b.company_id
GROUP BY customer_id)
SELECT transportation, COUNT(DISTINCT customer_id) AS num_cust, AVG(avg_ticket_price) AS avg_ticket_price, SUM(shuttle) AS shuttle
FROM cte1
GROUP BY transportation
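The single-pass CTE approach above can be checked against the question's sample data. This is a minimal sketch on SQLite through Python's sqlite3 module, with a few assumptions:

- ticket_price is stored as a plain number (the "$" signs stripped).
- The tables are named tickets and companies (hypothetical names).
- "# shuttle took" means tickets whose vehicle is 'Shuttle', following the SQL Server answer's reading.

The sketch averages each customer's mean ticket price. The function name summarize is hypothetical.

```python
import sqlite3

TICKETS = [  # (customer_id, ticket_price, transportation, company_id), "$" removed
    (1, 342.21, "Plane", "D7573"), (1, 79.00, "Car", "G2943"), (1, 91.30, "Car", "M3223"),
    (2, 64.00, "Car", "K2329"), (3, 351.00, "Plane", "H2312"), (3, 354.27, "Plane", "P3857"),
    (4, 80.00, "Car", "N2938"), (4, 229.67, "Plane", "J2938"), (5, 77.00, "Car", "L2938"),
]
COMPANIES = [
    ("D7573", "Boeing"), ("G2943", "Coach"), ("M3223", "Shuttle"), ("K2329", "Shuttle"),
    ("H2312", "Airbus"), ("P3857", "Boeing"), ("N2938", "Minibus"), ("J2938", "Airbus"),
    ("L2938", "Minibus"), ("Z3849", "Airbus"), ("A3848", "Minibus"),
]

def summarize(tickets, companies):
    """Hypothetical helper: (type, shuttles taken, avg ticket price per customer, customers)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tickets (customer_id INTEGER, ticket_price REAL, "
                "transportation TEXT, company_id TEXT)")
    con.execute("CREATE TABLE companies (company_id TEXT, vehicle TEXT)")
    con.executemany("INSERT INTO tickets VALUES (?, ?, ?, ?)", tickets)
    con.executemany("INSERT INTO companies VALUES (?, ?)", companies)
    rows = con.execute("""
        WITH per_customer AS (
            -- classify each customer once, then aggregate per class
            SELECT t.customer_id,
                   CASE WHEN COUNT(DISTINCT t.transportation) > 1 THEN 'mixed'
                        ELSE LOWER(MAX(t.transportation)) END AS customer_type,
                   SUM(CASE WHEN c.vehicle = 'Shuttle' THEN 1 ELSE 0 END) AS shuttles,
                   AVG(t.ticket_price) AS avg_price
            FROM tickets t
            JOIN companies c ON c.company_id = t.company_id
            GROUP BY t.customer_id
        )
        SELECT customer_type, SUM(shuttles), AVG(avg_price), COUNT(*)
        FROM per_customer
        GROUP BY customer_type
        ORDER BY customer_type
    """).fetchall()
    con.close()
    return rows

for row in summarize(TICKETS, COMPANIES):
    print(row)
```

Classifying in the CTE first means each customer lands in exactly one bucket. That avoids the two-branch UNION ALL.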

SQL - Query the same column but with 2 different conditions

I have a table called Products which contains the entire catalog. Each product has a unique Product_ID, and each row also holds the Category it belongs to and an Available field showing a country (US, UK, DE, ...) where the product can be sold. If a product can be sold in multiple countries, the Product_ID/Available combinations look like:
23523 DE
23523 UK
23523 US
...
I need a query that produces three columns:
Category Total_Number_Products DE_Number_Products
I can do this with two separate queries, one for Total_Number_Products and the other for DE_Number_Products, each with a COUNT: the first without any condition and the second filtering on Available = 'DE'.
How can I (or should I) count Product_ID twice in the same query, once for all products and once for the DE-specific products?
Please consider this:
select category,
count(distinct product_id) as total_number_products, -- one row per product/country, so count distinct IDs
sum(case available when 'DE' then 1 else 0 end) as de_number_products
from products
group by category;
You can use conditional aggregation here:
select category,
count(distinct product_id) as total_number_products,
count(case when available = 'DE' then 1 end) as de_number_products
from products
group by category;
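Each product appears once per country it can be sold in, so COUNT(*) counts product/country rows rather than products. The quick check below uses SQLite through Python's sqlite3 module to show the difference. The sample rows are hypothetical (only 23523 comes from the question), and so is the function name category_counts.

```python
import sqlite3

def category_counts(rows):
    """Hypothetical helper: (Category, row count, distinct products, distinct DE products)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Products (Product_ID INTEGER, Category TEXT, Available TEXT)")
    con.executemany("INSERT INTO Products VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT Category,
               COUNT(*) AS row_count,                          -- product/country rows
               COUNT(DISTINCT Product_ID) AS Total_Number_Products,
               COUNT(DISTINCT CASE WHEN Available = 'DE' THEN Product_ID END) AS DE_Number_Products
        FROM Products
        GROUP BY Category
        ORDER BY Category
    """).fetchall()
    con.close()
    return result

sample = [(23523, "Toys", "DE"), (23523, "Toys", "UK"), (23523, "Toys", "US"),
          (11111, "Toys", "UK"), (22222, "Books", "DE")]
print(category_counts(sample))  # [('Books', 1, 1, 1), ('Toys', 4, 2, 1)]
```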

Fill Users table with data using percentages from another table

I have a table Users (it has millions of rows):
Id Name Country Product
+----+---------------+---------------+--------------+
1 John Canada
2 Kate Argentina
3 Mark China
4 Max Canada
5 Sam Argentina
6 Stacy China
...
1000 Ken Canada
I want to fill the Product column with A, B or C based on percentages.
I have another table called CountriesStats like the following:
Id Country A B C
+-----+---------------+--------------+-------------+----------+
1 Canada 60 20 20
2 Argentina 35 45 20
3 China 40 10 50
This table holds the percentage of people with each product. For example in Canada 60% of people have product A, 20% have product B and 20% have product C.
I would like to fill the Users table based on the percentages in the second table. For example, if there are 1 million users in Canada, I would like to set the Product column to A for 600,000 of them, to B for 200,000, and to C for 200,000.
Thanks for any help on how to do that. I do not mind doing it in multiple steps; I just need hints on how to achieve it in SQL.
The logic behind this is not too difficult. Assign a sequential counter to each person within each country, then assign the product based on that counter. For instance, in your example, 'A' is assigned when the number is less than or equal to 600,000, 'B' for 600,001 to 800,000, and 'C' for the rest.
The following SQL accomplishes this:
with toupdate as (
select u.*,
row_number() over (partition by country order by newid()) as seqnum,
count(*) over (partition by country) as tot
from users u
)
update u
set product = (case when seqnum <= tot * A / 100 then 'A'
when seqnum <= tot * (A + B) / 100 then 'B'
else 'C'
end)
from toupdate u join
CountriesStats cs
on u.country = cs.country;
The WITH clause defines an updatable subquery (CTE) that carries the sequence number and the total for each country on each row. Updating through a CTE like this is a nice feature of SQL Server, but it is not supported in all databases.
The FROM clause joins back to the CountriesStats table to get the needed percentages for each country, and the CASE expression does the necessary logic.
Note that the sequential number is assigned randomly, using newid(), so the products are distributed randomly across the users in each country.
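The counter-then-threshold logic can also be tried outside SQL Server. Below is a minimal sketch on SQLite through Python's sqlite3 module, assuming SQLite 3.25+ for window functions. It differs from the answer in two places:

- SQLite has no newid(), so random() stands in for it.
- SQLite cannot update through a CTE, so the assignments are selected first and written back by Id in a second step.

The sample sizes and the function name assign_products are hypothetical.

```python
import sqlite3
from collections import Counter

def assign_products(users, stats):
    """Hypothetical helper: fill Users.Product from CountriesStats, return (Country, Product) counts."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Users (Id INTEGER PRIMARY KEY, Name TEXT, Country TEXT, Product TEXT)")
    con.execute("CREATE TABLE CountriesStats (Id INTEGER PRIMARY KEY, Country TEXT, "
                "A INTEGER, B INTEGER, C INTEGER)")
    con.executemany("INSERT INTO Users (Id, Name, Country) VALUES (?, ?, ?)", users)
    con.executemany("INSERT INTO CountriesStats VALUES (?, ?, ?, ?, ?)", stats)
    # Random sequence number per country, then integer-division thresholds as in the answer.
    assignments = con.execute("""
        WITH toupdate AS (
            SELECT u.Id, u.Country,
                   ROW_NUMBER() OVER (PARTITION BY u.Country ORDER BY random()) AS seqnum,
                   COUNT(*) OVER (PARTITION BY u.Country) AS tot
            FROM Users u
        )
        SELECT t.Id,
               CASE WHEN t.seqnum <= t.tot * cs.A / 100 THEN 'A'
                    WHEN t.seqnum <= t.tot * (cs.A + cs.B) / 100 THEN 'B'
                    ELSE 'C' END
        FROM toupdate t
        JOIN CountriesStats cs ON cs.Country = t.Country
    """).fetchall()
    # Second step: write the computed product back to each user by Id.
    con.executemany("UPDATE Users SET Product = ? WHERE Id = ?",
                    [(product, user_id) for user_id, product in assignments])
    con.commit()
    counts = Counter(con.execute("SELECT Country, Product FROM Users").fetchall())
    con.close()
    return counts

countries = ["Canada"] * 10 + ["Argentina"] * 20 + ["China"] * 10
users = [(i, f"user{i}", c) for i, c in enumerate(countries, start=1)]
stats = [(1, "Canada", 60, 20, 20), (2, "Argentina", 35, 45, 20), (3, "China", 40, 10, 50)]
counts = assign_products(users, stats)
```

Which users get which product changes from run to run. The per-country totals do not, because they depend only on the thresholds.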

Trying to count # distinct values of a field based on value of a different field

Looking for some help with my SQL query. I am trying to find cases in which a customer purchases at two different stores or sells at two different stores; if they purchase at one store and sell at another, I don't want that to count.
I have tried this -
Select count(distinct store) OVER(Partition BY Customer)
but the database rejects DISTINCT inside the windowed COUNT and raises an error. When I don't specify DISTINCT, it gives me the count of all of that customer's rows instead of the number of stores they purchased from or sold to.
Based on the data below, customer D is the kind of customer I'm looking to filter for.
My Raw Data:
Customer Type Qty Store
A Purchase 1 2
A Purchase 2 2
A Sale 3 1
B Sale 24 1
B Sale 12 1
C Purchase 4 2
D Sale 12 2
D Purchase 4 2
D Purchase 2 1
D Purchase 2 1
Any ideas?
select distinct customer -- distinct: a customer could qualify for both types
from your_table
group by customer, type
having count(distinct store) > 1
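Below is a runnable check of the GROUP BY answer on the question's sample data, using SQLite through Python's sqlite3 module. It also shows a common workaround for the missing COUNT(DISTINCT ...) OVER (...). Add an ascending and a descending DENSE_RANK over the same partition and subtract 1. The result is the number of distinct stores per customer and type, repeated on every row.

This assumes Store is never NULL, since a NULL would be ranked as one more value. The table name your_table follows the answer, and the function name multi_store_customers is hypothetical.

```python
import sqlite3

RAW = [("A", "Purchase", 1, 2), ("A", "Purchase", 2, 2), ("A", "Sale", 3, 1),
       ("B", "Sale", 24, 1), ("B", "Sale", 12, 1), ("C", "Purchase", 4, 2),
       ("D", "Sale", 12, 2), ("D", "Purchase", 4, 2), ("D", "Purchase", 2, 1),
       ("D", "Purchase", 2, 1)]

def multi_store_customers(rows):
    """Hypothetical helper: customers who buy (or sell) at more than one store."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (Customer TEXT, Type TEXT, Qty INTEGER, Store INTEGER)")
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)
    # 1) Plain aggregation, as in the answer.
    grouped = [r[0] for r in con.execute(
        "SELECT DISTINCT Customer FROM your_table GROUP BY Customer, Type "
        "HAVING COUNT(DISTINCT Store) > 1 ORDER BY Customer")]
    # 2) Windowed distinct count via ascending + descending DENSE_RANK - 1.
    windowed = [r[0] for r in con.execute("""
        SELECT DISTINCT Customer FROM (
            SELECT Customer,
                   DENSE_RANK() OVER (PARTITION BY Customer, Type ORDER BY Store)
                 + DENSE_RANK() OVER (PARTITION BY Customer, Type ORDER BY Store DESC) - 1
                   AS stores_per_type
            FROM your_table
        ) WHERE stores_per_type > 1
        ORDER BY Customer""")]
    con.close()
    return grouped, windowed

print(multi_store_customers(RAW))  # (['D'], ['D'])
```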
Don't you just want to GROUP?
Select Count(store) FROM Blah GROUP BY Customer, Store
Edit: Ah, I see what you want: a count of stores per customer. Sorry, I misread it!