PostgreSQL - Find the most expensive and cheapest wine per region - sql

Most Expensive And Cheapest Wine
I'm trying to solve this question from Stratascratch, following the hint given on the platform.
Find the cheapest and the most expensive variety in each region. Output the region along with the corresponding most expensive and the cheapest variety.
Please review my answer. I would love to know a better way to solve this.
SELECT EX.region_1, EX.expensive_variety, CH.cheap_variety
FROM
(SELECT A.region_1, A.expensive_variety
FROM
(SELECT distinct region_1, variety AS expensive_variety, price,
ROW_NUMBER() OVER (PARTITION BY region_1 ORDER BY price desc) as
most_expensive
FROM winemag_p1
ORDER BY region_1 asc) A
WHERE A.most_expensive = 1) EX
INNER JOIN
(SELECT B.region_1, B.cheap_variety
FROM
(SELECT distinct region_1, variety as cheap_variety, price,
ROW_NUMBER() OVER (PARTITION BY region_1 ORDER BY price ASC) as cheapest
FROM winemag_p1
ORDER BY region_1 asc) B
WHERE B.cheapest = 1) CH
ON EX.region_1 = CH.region_1

Something like this, the MIN and MAX per region:
SELECT region
, MIN(price) AS cheapest
, MAX(price) AS most_expensive
FROM table_name
GROUP BY region;

You can find both in the same sub-query.
SELECT
B.region_1,
MAX(CASE WHEN cheapest = 1 then variety else '' end) cheapest_variety,
MAX(CASE WHEN cheapest = 1 then price else 0 end) cheapest_price,
MAX(CASE WHEN expensive = 1 then variety else '' end) expensive_variety,
MAX(CASE WHEN expensive = 1 then price else 0 end) expensive_price
FROM
(SELECT region_1, variety, price, -- keep the column name "variety": the outer query refers to it
ROW_NUMBER() OVER (PARTITION BY region_1 ORDER BY price ASC) as cheapest,
ROW_NUMBER() OVER (PARTITION BY region_1 ORDER BY price DESC) as expensive
FROM winemag_p1
) B
WHERE cheapest = 1 OR expensive = 1
GROUP BY region_1
ORDER BY region_1;

You can use window functions or subqueries to get the highest and lowest prices per region. Then get all rows with these prices and aggregate per region.
For instance:
select
region_1,
min(price) as low_price,
string_agg(variety, ', ') filter (where price = min_price) as low_price_varieties,
max(price) as high_price,
string_agg(variety, ', ') filter (where price = max_price) as high_price_varieties
from
(
select
region_1, variety, price,
min(price) over (partition by region_1) as min_price,
max(price) over (partition by region_1) as max_price
from winemag_p1
) with_min_and_max
where price in (min_price, max_price)
group by region_1
order by region_1;
As to your own query: This is an okay query. Here are my remarks:
ORDER BY in a subquery only makes sense when limiting the rows (with FETCH FIRST ROWS), because a query result is an unordered data set by definition.
Why DISTINCT? There are no duplicates to remove.
You don't handle ties. If there are two top wines with the same price in a region, for instance, you pick one arbitrarily.
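
To make the remark about ties concrete, here is a minimal sketch that runs the same "top row per region" query once with ROW_NUMBER and once with RANK. It uses Python's built-in sqlite3 module rather than PostgreSQL, and the sample rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
rows = [
    ("Napa", "Cabernet", 90),
    ("Napa", "Merlot", 90),       # ties with Cabernet for most expensive
    ("Napa", "Zinfandel", 20),
    ("Sonoma", "Pinot Noir", 60),
    ("Sonoma", "Chardonnay", 15),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE winemag_p1 (region_1 TEXT, variety TEXT, price INTEGER)")
conn.executemany("INSERT INTO winemag_p1 VALUES (?, ?, ?)", rows)


def top_varieties(ranking_function):
    """Return {region: sorted varieties placed first by price} for a window function.

    ranking_function is interpolated into the SQL text, so it is only ever
    called with the fixed names below, never with user input.
    """
    sql = f"""
        SELECT region_1, variety
        FROM (
            SELECT region_1, variety,
                   {ranking_function}() OVER (PARTITION BY region_1 ORDER BY price DESC) AS pos
            FROM winemag_p1
        ) ranked
        WHERE pos = 1
    """
    result = {}
    for region, variety in conn.execute(sql):
        result.setdefault(region, []).append(variety)
    return {region: sorted(varieties) for region, varieties in result.items()}


with_row_number = top_varieties("ROW_NUMBER")  # one arbitrary variety per region
with_rank = top_varieties("RANK")              # every variety tied for first place
```

With this data, ROW_NUMBER keeps exactly one of the two tied Napa varieties (which one is not defined), while RANK keeps both.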

Related

Find the top seller

I am trying to find the top seller here. I was trying to use the query below, but I don't think it is right. I can see other ids that have bigger totals, and I think I should be using a SUM function instead of MAX.
select selleruserid
from sales_fact
where price = (select max(price) from sales_fact)
As I understand it you want to get the total sales of each seller and choose the seller who has the most sales.
The seller who has the most sales:
SELECT totalPrice, selleruserid
FROM (SELECT sum(price) AS totalPrice, selleruserid FROM sales_fact GROUP BY selleruserid) t
ORDER BY totalPrice DESC
FETCH FIRST 1 ROW ONLY
Total sales of each seller:
SELECT totalPrice, selleruserid
FROM (SELECT sum(price) AS totalPrice, selleruserid FROM sales_fact GROUP BY selleruserid) t
You can use the FETCH clause with the WITH TIES option as follows:
select selleruserid, sum(price) as total
from sales_fact
group by selleruserid
order by sum(price) desc
FETCH FIRST 1 ROW WITH TIES
Or you can use the analytic function RANK as follows:
select selleruserid, total from
(select selleruserid, sum(price) as total,
RANK() OVER (ORDER BY sum(price) DESC) AS rn
from sales_fact
group by selleruserid) t
where rn = 1
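
As a quick check of the difference between the two readings of "top seller", here is a small sketch using Python's built-in sqlite3 module. The sample rows are invented. SQLite has no FETCH FIRST ... WITH TIES, so only the RANK variant is shown, and the totals are computed in a derived table first, which is a slight restructuring of the query above rather than a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (selleruserid INTEGER, price INTEGER)")
# Hypothetical data: seller 1 has the single priciest sale,
# sellers 2 and 3 tie for the highest total.
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)",
    [(1, 500), (2, 300), (2, 300), (3, 200), (3, 400)],
)

# The original attempt: finds whoever made the single most expensive sale.
by_max_price = [
    row[0]
    for row in conn.execute(
        "SELECT selleruserid FROM sales_fact "
        "WHERE price = (SELECT MAX(price) FROM sales_fact)"
    )
]

# Sum per seller first, then rank the totals; RANK keeps ties.
top_sellers = conn.execute(
    """
    SELECT selleruserid, total
    FROM (
        SELECT selleruserid, total,
               RANK() OVER (ORDER BY total DESC) AS rn
        FROM (
            SELECT selleruserid, SUM(price) AS total
            FROM sales_fact
            GROUP BY selleruserid
        ) totals
    ) ranked
    WHERE rn = 1
    ORDER BY selleruserid
    """
).fetchall()
```

Here the MAX(price) filter returns seller 1, while ranking the summed totals returns sellers 2 and 3, each with a total of 600.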

SQL: How to select the highest priced used item for each day

I need to produce a query that would give me the highest priced used product for each day where the total price of products sold that day exceeds 200.
SELECT *, max(price)
FROM products
WHERE products.`condition` = 'used' and products.price > 200
GROUP BY date_sold
Here is my products table http://prntscr.com/of3hjd
You could try joining to a subquery that keeps only the dates whose sum of price exceeds 200, grouped by date_sold:
select m.date_sold, max(m.price)
from my_table m
inner join (
select date_sold, sum(price)
from my_table
group by date_sold
having sum(price)>200
) t on t.date_sold = m.date_sold
group by m.date_sold
You can use window functions for this:
select p.*
from (select p.*,
sum(price) over (partition by date_sold) as sum_price,
row_number() over (partition by date_sold, condition order by price desc) as seqnum
from products p
) p
where sum_price > 200 and
condition = 'used' and
seqnum = 1;
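
The window-function approach can be tried out with a small sketch in Python's built-in sqlite3 module. The column list (name, price, condition, date_sold) is an assumption, since the real table is only shown in a screenshot, and the rows are invented. As in the query above, the daily total is taken over all products sold that day, not only the used ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; "condition" is quoted because it is a keyword in some dialects.
conn.execute(
    'CREATE TABLE products (date_sold TEXT, name TEXT, "condition" TEXT, price INTEGER)'
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("2019-07-01", "phone", "used", 150),
        ("2019-07-01", "tablet", "new", 100),
        ("2019-07-01", "case", "used", 20),
        ("2019-07-02", "laptop", "used", 180),  # day total is 180, so this day is dropped
        ("2019-07-03", "tv", "new", 300),
        ("2019-07-03", "radio", "used", 40),
    ],
)

best_used_per_day = conn.execute(
    """
    SELECT date_sold, name, price
    FROM (
        SELECT p.*,
               SUM(price) OVER (PARTITION BY date_sold) AS sum_price,
               ROW_NUMBER() OVER (
                   PARTITION BY date_sold, "condition" ORDER BY price DESC
               ) AS seqnum
        FROM products p
    ) ranked
    WHERE sum_price > 200
      AND "condition" = 'used'
      AND seqnum = 1
    ORDER BY date_sold
    """
).fetchall()
```

Note that the filter on the day's total is applied after the window sum is computed, which is why it cannot simply go into a WHERE clause on the base table.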
SELECT *, max(price) FROM products
where products.`condition` = 'used'
GROUP BY day(date_sold)
having sum(products.price) > 200 -- aggregate conditions go in HAVING, not WHERE

How to split 10% of data set to “control” and 90% to “test” for each group in MS SQL server

Context:
I have a table with RetailerCode, CustomerID (CID), and Segment columns, like below:
RetailerCode  CID      Segment
A6005         13SVC15  High
A6005         19VDE1F  Low
A6005         1B3BD1F  Medium
A6005         1B3HB48  Medium
A6005         1B3HB49  Low
A9006         1B3HB40  High
A9006         1B3HB41  High
A9006         1B3HB43  Low
A9006         1B3HB46  Medium
Here, I would like to divide the data set into control and test as below.
For each RetailerCode, I have set of customers with each customer tagged to a segment. I need to divide in such a way that
For each retailer
10% of their High customers to control and remaining 90% of their high customers to test.
10% of their Medium customers to control and remaining 90% of their Medium customers to test.
10% of their Low customers to control and remaining 90% of their Low customers to test.
I tried the code below and I know it's wrong.
select RetailerCode, CID,Segment
(case when row_number() over (order by newid()) <= (select 0.1* count(*) from Table)
then 'control'
else 'test'
end) as group
from Table
group by RetailerCode, CID,Segment
Order by RetailerCode
Can someone please help me with it? Thanks in advance
You seem pretty close:
select RetailerCode, CID, Segment,
(case when row_number() over (partition by RetailerCode, Segment order by newid()) <=
0.1 * count(*) over (partition by RetailerCode, Segment)
then 'control'
else 'test'
end) as [group]
from Table
Order by RetailerCode;
I don't see why a group by is needed.
percent_rank is based on rank & count:
select RetailerCode, CID, Segment,
(case when percent_rank() over (partition by RetailerCode, Segment order by newid()) <= 0.1
then 'control'
else 'test'
end) as [group]
from Table
Order by RetailerCode
And ntile is based on row_number and count:
select RetailerCode, CID, Segment,
(case when ntile(10) over (partition by RetailerCode, Segment order by newid()) = 1
then 'control'
else 'test'
end) as [group]
from Table
Order by RetailerCode
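
To see the row_number / count approach in action, here is a sketch in Python's built-in sqlite3 module. It is not SQL Server: random() stands in for newid(), the table name customers and the generated customer ids are made up, and the partition is taken per retailer and segment, as the question asks.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (RetailerCode TEXT, CID TEXT, Segment TEXT)")

# Hypothetical data: 20 customers for every retailer/segment combination.
retailers = ("A6005", "A9006")
segments = ("High", "Medium", "Low")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(r, f"{r}-{s}-{i}", s) for r in retailers for s in segments for i in range(20)],
)

# Shuffle within each retailer/segment and label the first 10% as control.
split = conn.execute(
    """
    SELECT RetailerCode, CID, Segment,
           CASE WHEN ROW_NUMBER() OVER (
                         PARTITION BY RetailerCode, Segment ORDER BY random()
                     ) <= 0.1 * COUNT(*) OVER (PARTITION BY RetailerCode, Segment)
                THEN 'control'
                ELSE 'test'
           END AS grp
    FROM customers
    """
).fetchall()

# Count how many customers landed in each label per retailer/segment.
counts = Counter((retailer, segment, grp) for retailer, _cid, segment, grp in split)
```

Which customers end up in control changes from run to run, but each retailer/segment group of 20 always gets 2 control rows and 18 test rows.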

SQL Insert Statement that pulls top n from each set of categories that could have duplicates

I am trying to write an INSERT statement that goes through sales numbers for a group of people, with each sale marked as an R or C type of sale. I want to find the TOP 100 salespersons in ALL (both R and C), R, and C. I don't only have sales data, though: I have Sales, Margin, Count, and Sales/Count data that I want to do the same thing for. So far I need 12 SQL statements to accomplish this (4 categories x 3 sales types), each one a slight variation of the following, which covers one of my 4 categories.
INSERT INTO ztbl_AllTopSalesPerson (SalesPerson)
SELECT TOP 100 tbl_Master.SalesPerson
FROM tbl_Master
WHERE tbl_Master.SaleType="C"
GROUP BY tbl_Master.SalesPerson
ORDER BY Sum(tbl_Master.Margin) DESC;
INSERT INTO ztbl_AllTopSalesPerson (SalesPerson)
SELECT TOP 100 tbl_Master.SalesPerson
FROM tbl_Master
WHERE tbl_Master.SaleType="R"
GROUP BY tbl_Master.SalesPerson
ORDER BY Sum(tbl_Master.Margin) DESC;
INSERT INTO ztbl_AllTopSalesPerson (SalesPerson)
SELECT TOP 100 tbl_Master.SalesPerson
FROM tbl_Master
GROUP BY tbl_Master.SalesPerson
ORDER BY Sum(tbl_Master.Margin) DESC;
Ideally I would like a way to make this all one statement. And (if it is not impossible) I would like to filter each one by date so I can do it by monthly data too, not just overall.
Just a few notes: I can't have duplicate names, so if a salesperson is top in all three sales types, they still only appear once. I'm using Access with a SQL Server back-end for only the main data table. I can't take the top 300 results, because there is so much overlap between the sales types, and I need the top from each (I do a separate query after this list is made that lines up the salespersons alphabetically with their 4 categories as fields). And lastly, I generally end up with a final list that has around 260-290 records.
THANKS!
p.s. thanks for your replies, stack exchange has saved my bacon 100s of times. I would post my attempts at this, but I think it would hurt more than it would help.
You might have to tweak it a little depending on what sort of output you want. You also might have to do a subquery for the COUNT(*) part of it, as this is untested. But I think this is the general idea of what you are looking for.
To get aggregated information, you can break it up into two CTE's:
WITH CTE1 AS (
SELECT SalesPerson,
SaleType,
SUM(Margin) OVER (PARTITION BY SalesPerson,SaleType) as Margin,
SUM(Sales) OVER (PARTITION BY SalesPerson,SaleType) as Sales,
SUM(Sales) OVER (PARTITION BY SalesPerson,SaleType) / COUNT(*) OVER (PARTITION BY SalesPerson,SaleType) as Sales_pct,
COUNT(*) OVER (PARTITION BY SalesPerson,SaleType) as Total,
SUM(Margin) OVER (PARTITION BY SalesPerson) as Margin_all,
SUM(Sales) OVER (PARTITION BY SalesPerson) as Sales_all,
SUM(Sales) OVER (PARTITION BY SalesPerson) / COUNT(*) OVER (PARTITION BY SalesPerson) as Sales_pct_all,
COUNT(*) OVER (PARTITION BY SalesPerson) as Total_all
FROM tbl_Master
)
,CTE2 AS (
-- DENSE_RANK rather than RANK: CTE1 keeps one row per sale, so each
-- salesperson's totals repeat and RANK would skip positions.
SELECT SalesPerson
,DENSE_RANK() OVER (PARTITION BY SaleType ORDER BY Margin desc) as Margin
,DENSE_RANK() OVER (PARTITION BY SaleType ORDER BY Sales desc) as Sales
,DENSE_RANK() OVER (PARTITION BY SaleType ORDER BY Sales_pct desc) as Sales_pct
,DENSE_RANK() OVER (PARTITION BY SaleType ORDER BY Total desc) as Total
,DENSE_RANK() OVER (ORDER BY Margin_all desc) as Margin_all
,DENSE_RANK() OVER (ORDER BY Sales_all desc) as Sales_all
,DENSE_RANK() OVER (ORDER BY Sales_pct_all desc) as Sales_pct_all
,DENSE_RANK() OVER (ORDER BY Total_all desc) as Total_all
FROM CTE1 )
Select distinct SalesPerson from CTE2
Where Margin <= 100 Or Sales <= 100 Or Total <= 100 or Sales_pct <= 100
Or Margin_all <= 100 Or Sales_all <= 100 Or Total_all <= 100 or Sales_pct_all <= 100
I understand this is not perfect, but it should get you started. To filter by date, add DATEPART(month,[your date field]) to your PARTITION BY clauses (and the first CTE)
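
One thing to watch when ranking on windowed sums: a query shaped like CTE1 keeps one row per sale, so every salesperson appears once per sale with the same total. The sketch below, using Python's built-in sqlite3 module and invented rows, shows how RANK and DENSE_RANK behave on such repeated totals. Only the table and column names are taken from the question; the data and the reduced column set are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_Master (SalesPerson TEXT, Margin INTEGER)")
# Hypothetical data: Ann has three sales, Bob two, Cy one.
conn.executemany(
    "INSERT INTO tbl_Master VALUES (?, ?)",
    [("Ann", 10), ("Ann", 10), ("Ann", 10), ("Bob", 10), ("Bob", 10), ("Cy", 5)],
)

ranks = conn.execute(
    """
    WITH totals AS (
        -- One output row per sale, each carrying the salesperson's total margin.
        SELECT SalesPerson,
               SUM(Margin) OVER (PARTITION BY SalesPerson) AS Margin
        FROM tbl_Master
    )
    SELECT DISTINCT SalesPerson,
           RANK()       OVER (ORDER BY Margin DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY Margin DESC) AS dense_rnk
    FROM totals
    ORDER BY dense_rnk
    """
).fetchall()
```

With this data, RANK gives the three salespersons positions 1, 4 and 6, because the repeated rows count as ties that consume positions, while DENSE_RANK gives 1, 2 and 3. A "rank <= 100" filter therefore returns fewer than 100 people with RANK whenever salespersons have more than one sale.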

SQL query the largest and smallest amount

I have a table Sales with the following fields: code, amount, index, name.
I need to get the smallest and the largest amount for a given name, and the code for which the amount is the smallest and largest.
Can somebody help me in constructing a query?
If CTEs and row_number() are available to you:
with S as
(
select Amount,
Code,
row_number() over(order by Amount asc) as rn1,
row_number() over(order by Amount desc) as rn2
from Sales
where Name = 'SomeName'
)
select SMin.Amount as MinAmount,
SMin.Code as MinCode,
SMax.Amount as MaxAmount,
SMax.Code as MaxCode
from S as SMin
cross join S as SMax
where SMin.rn1 = 1 and
SMax.rn2 = 1
To find min and max amount per name:
select
name,
min(amount), max(amount)
from
sales
group by name
And to get both (min and max) together with the code in a single query:
select *
from sales s
where
(amount = (select
max(s1.amount)
from sales s1
where s1.name = s.name)
or
amount = (select
min(s2.amount)
from sales s2
where s2.name = s.name)
)
Assuming this is Postgres, try the following:
select name,
amount,
code,
case when min_rank=max_rank then 'Minimum and Maximum'
when min_rank=1 then 'Minimum'
else 'Maximum'
end as min_or_max
from
(select s.*,
rank() over (partition by name order by amount) min_rank,
rank() over (partition by name order by amount desc) max_rank
from sales s) v
where 1 in (min_rank, max_rank)
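
The rank-based query can be tried without a PostgreSQL server using Python's built-in sqlite3 module, which also supports rank(). This is only a sketch: the rows are invented, and the index column from the question is left out because it plays no part in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (code TEXT, amount INTEGER, name TEXT)")
# Hypothetical data: name 'x' has a tie for the largest amount,
# name 'y' has a single row that is both its minimum and maximum.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("c1", 10, "x"), ("c2", 50, "x"), ("c3", 50, "x"), ("c4", 7, "y")],
)

extremes = conn.execute(
    """
    SELECT name, amount, code,
           CASE WHEN min_rank = max_rank THEN 'Minimum and Maximum'
                WHEN min_rank = 1 THEN 'Minimum'
                ELSE 'Maximum'
           END AS min_or_max
    FROM (
        SELECT s.*,
               rank() OVER (PARTITION BY name ORDER BY amount) AS min_rank,
               rank() OVER (PARTITION BY name ORDER BY amount DESC) AS max_rank
        FROM sales s
    ) v
    WHERE 1 IN (min_rank, max_rank)
    ORDER BY name, amount, code
    """
).fetchall()
```

Because rank() assigns the same position to equal amounts, both codes tied for the maximum of 'x' are returned, unlike the row_number() version, which returns exactly one code per extreme.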