Find rows based on ordering of previous row - sql

There is a table purchase (id, title, purchase-date, quantity).
Find purchases such that the previous purchase was of a larger quantity.
My thoughts:
It feels like this would be doable if I could loop over the rows and,
for each (id, purchase-date), find the row just before it.
But what if the previous entry, or several entries, in the purchase table
have the same purchase-date? When I execute a query like
select * from purchase order by purchase-date
is the order well-defined when purchase-date is the same for two rows,
or could they appear in any order?

Use lag():
select p.*
from (select p.*,
lag(quantity) over (order by purchase_date) as prev_quantity
from purchases p
) p
where prev_quantity > quantity;
Note: This is for all purchases. Often such a question would be asked about a customer -- but there doesn't appear to be one in the table. If there were, you would use a partition by:
select p.*
from (select p.*,
lag(quantity) over (partition by customer_id order by purchase_date) as prev_quantity
from purchases p
) p
where prev_quantity > quantity;
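On the tie question: ORDER BY purchase_date alone does not define an order among rows that share a date, so lag() could pick either of them as the "previous" row. A minimal sketch of one way around that, assuming id is unique and is an acceptable tiebreaker (an assumption, not something the question states). Since no DBMS is named, the query is run here through Python's built-in sqlite3 module, which needs SQLite 3.25 or later for window functions; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "id INTEGER PRIMARY KEY, title TEXT, purchase_date TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, "a", "2020-01-01", 5),
        (2, "b", "2020-01-02", 3),  # previous row (id 1) had 5, so this one matches
        (3, "c", "2020-01-02", 7),  # same date as id 2; the id tiebreaker puts it second
        (4, "d", "2020-01-03", 6),  # previous row (id 3) had 7, so this one matches
    ],
)

# Ordering by (purchase_date, id) makes "previous row" deterministic.
query = """
SELECT id, quantity, prev_quantity
FROM (SELECT p.*,
             LAG(quantity) OVER (ORDER BY purchase_date, id) AS prev_quantity
      FROM purchases p) AS p
WHERE prev_quantity > quantity
ORDER BY purchase_date, id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The first row has no predecessor, so its prev_quantity is NULL and the comparison filters it out.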

Related

SQL Query - second ID of a list ordered by date and ID

I have a SQL database with a list of customer IDs (CustomerID) and invoices, the specific product purchased in each invoice (ProductID), and the Date and Income of each invoice. I need to write a query that retrieves, for each product, the second customer who made a purchase.
How do I do that?
EDIT:
I have come up with the following query:
SELECT *,
LEAD(CustomerID) OVER (ORDER BY ProductID, Date) AS 'Second Customer Who Made A Purchase'
FROM a
ORDER BY ProductID, Date ASC
However, this query presents multiple results for products that have more than two purchases. Can you advise?
SELECT a2.ProductID,
(
SELECT a1.CustomerID
FROM a a1
WHERE a1.ProductID = a2.ProductID
ORDER BY Date asc
LIMIT 1,1
) as SecondCustomer
FROM a a2
GROUP BY a2.ProductID
I need to write a query that will retrieve for each product, which was the second customer who made a purchase
This sounds like a window function:
select a.*
from (select a.*,
row_number() over (partition by productid order by date asc) as seqnum
from a
) a
where seqnum = 2;
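A small sketch of the row_number() approach on made-up data, run through Python's sqlite3 module (SQLite 3.25+ is assumed for window function support). The table and column names follow the question; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (CustomerID INTEGER, ProductID INTEGER, Date TEXT, Income REAL)"
)
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?)",
    [
        (10, 1, "2020-01-01", 100.0),
        (11, 1, "2020-01-05", 50.0),   # second purchase of product 1
        (12, 1, "2020-01-09", 70.0),
        (20, 2, "2020-02-01", 30.0),
        (21, 2, "2020-02-03", 40.0),   # second purchase of product 2
        (30, 3, "2020-03-01", 10.0),   # product 3 was bought only once
    ],
)

# Number the purchases of each product by date and keep the second one.
query = """
SELECT ProductID, CustomerID
FROM (SELECT a.*,
             ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Date ASC) AS seqnum
      FROM a) AS t
WHERE seqnum = 2
ORDER BY ProductID
"""
second_customers = conn.execute(query).fetchall()
print(second_customers)
```

Products with a single purchase simply do not appear. If the same customer can buy a product twice, the second row may belong to the first customer again, so that case would need one row per customer before numbering.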

Select top 10 products sold in each year

I have two tables :
Sales
columns: (Sales_id, Date , Customer_id, Product_id, Purchase_amount):
Product
columns: ( Product_id, Product_Name, Brand_id,Brand_name)
I have to write a query to find the top 10 products sold every year. The query I have right now is :
WITH PH AS
(SELECT P.Product_Name, LEFT(S.Date,4) "SYEAR", COUNT(S.Product_id) "Product_Count"
FROM Sales S LEFT JOIN Product P
ON S.Product_Id=P.Product_Id
GROUP BY P.Product_Name, LEFT(S.Date,4))
SELECT Product_Name, "SYEAR", "Product_Count"
FROM (SELECT Product_Name, "SYEAR", "Product_Count",
RANK() OVER (PARTITION BY "SYEAR" ORDER BY "Product_Count" DESC) "TEMP"
FROM PH
) T
WHERE "TEMP"<=10
This doesn't seem like the most optimized query. Can you please help me with that? Can there be an alternate version to obtain the required result?
Notes
The main reason for the repetition of the code is to enable grouping by the year. There's no field for the year in the given table.
The date format is: YYYYMMDD (example: 20200630)
Any help will be appreciated. Thanks in advance
You can combine the window functions with the aggregation:
SELECT PY.*
FROM (SELECT P.Product_Name, LEFT(S.Date,4) AS YEAR, COUNT(*) AS CNT,
RANK() OVER (PARTITION BY LEFT(S.Date, 4) ORDER BY COUNT(*) DESC) AS SEQNUM
FROM Sales S LEFT JOIN
Product P
ON S.Product_Id = P.Product_Id
GROUP BY P.Product_Name, LEFT(S.Date, 4)
) PY
WHERE SEQNUM <= 10;
From a performance perspective, this probably generates an execution plan very similar to your query. It is however simpler to follow.
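To see the combined aggregation and ranking at work, here is a hedged sketch on a few made-up rows, run through Python's sqlite3 module (SQLite 3.25+ assumed). SQLite has no LEFT(), so substr(Date, 1, 4) stands in for it, and the cutoff is 2 instead of 10 only to keep the sample small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Product_id INTEGER, Product_Name TEXT)")
conn.execute(
    "CREATE TABLE Sales (Sales_id INTEGER, Date TEXT, Customer_id INTEGER, "
    "Product_id INTEGER, Purchase_amount REAL)"
)
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
    [
        # 2019: A sold 3 times, B twice, C once
        (1, "20190105", 1, 1, 10.0), (2, "20190210", 1, 1, 10.0),
        (3, "20190311", 2, 1, 10.0), (4, "20190412", 2, 2, 5.0),
        (5, "20190513", 3, 2, 5.0), (6, "20190614", 3, 3, 7.0),
        # 2020: C sold twice, A and B once each (a tie for second place)
        (7, "20200105", 1, 3, 7.0), (8, "20200630", 2, 3, 7.0),
        (9, "20200701", 1, 1, 10.0), (10, "20200801", 2, 2, 5.0),
    ],
)

# Aggregate per product and year, and rank within each year in the same SELECT.
query = """
SELECT yr, Product_Name, cnt, seqnum
FROM (SELECT P.Product_Name,
             substr(S.Date, 1, 4) AS yr,
             COUNT(*) AS cnt,
             RANK() OVER (PARTITION BY substr(S.Date, 1, 4)
                          ORDER BY COUNT(*) DESC) AS seqnum
      FROM Sales S LEFT JOIN Product P ON S.Product_id = P.Product_id
      GROUP BY P.Product_Name, substr(S.Date, 1, 4)) AS PY
WHERE seqnum <= 2
ORDER BY yr, seqnum, Product_Name
"""
top_products = conn.execute(query).fetchall()
print(top_products)
```

Because RANK() gives tied products the same rank, a year can return more rows than the cutoff, as 2020 does here.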

SQL query for table with multiple keys?

I am sorry if this seems too easy but I was asked this question and I couldn't answer even after preparing SQL thoroughly :(. Can someone answer this?
There's a table with seller id, product id, warehouse id, and quantity: the quantity of each product that each seller holds at each warehouse.
We have to list the product ids along with the seller id that has the highest number of products for that product, and the total number of units that seller has for that product.
I think I got confused because there were 3 keys in the table.
It's not clear which DBMS you are using. The following should work if your DBMS supports window functions.
You can count the rows for each product and seller, rank the sellers within each product using the window function rank, and then filter to keep only the top-ranked seller(s) for each product along with their counts.
select
product_id,
seller_id,
no_of_products
from (
select
product_id,
seller_id,
count(*) no_of_products,
rank() over (partition by product_id order by count(*) desc) rnk
from your_table
group by
product_id,
seller_id
) t where rnk = 1;
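If "highest number of products" is meant as total units rather than number of rows, the same pattern works with sum(quantity) in place of count(*). A sketch under that assumption, run through Python's sqlite3 module (SQLite 3.25+ assumed); the table name stock, the column names and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (seller_id INTEGER, product_id INTEGER, "
    "warehouse_id INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?, ?)",
    [
        (1, 100, 1, 5), (1, 100, 2, 5),  # seller 1 holds 10 units of product 100
        (2, 100, 1, 8),                  # seller 2 holds 8
        (1, 200, 1, 2),                  # seller 1 holds 2 units of product 200
        (2, 200, 1, 1), (2, 200, 2, 3),  # seller 2 holds 4
        (3, 200, 1, 4),                  # seller 3 also holds 4 (a tie)
    ],
)

# Total units per (product, seller), ranked within each product.
query = """
SELECT product_id, seller_id, total_units
FROM (SELECT product_id,
             seller_id,
             SUM(quantity) AS total_units,
             RANK() OVER (PARTITION BY product_id
                          ORDER BY SUM(quantity) DESC) AS rnk
      FROM stock
      GROUP BY product_id, seller_id) AS t
WHERE rnk = 1
ORDER BY product_id, seller_id
"""
top_sellers = conn.execute(query).fetchall()
print(top_sellers)
```

Tied sellers both come back with rank 1, which is why product 200 yields two rows.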
If window functions are not supported, you can use a correlated subquery to achieve the same effect:
select
product_id,
seller_id,
count(*) no_of_products
from your_table a
group by
product_id,
seller_id
having count(*) = (
select max(cnt)
from (
select count(*) cnt
from your_table b
where b.product_id = a.product_id
group by seller_id
) t
);
Don't know why having id columns would mess you up... group by the right columns, sum up the totals and just return the first row:
select *
from (
select sellerid, productid, sum(quantity) as total_sold
from theres_a_table
group by sellerid, productid
) x
order by total_sold desc
fetch first 1 row only
Without thinking about optimization, the straightforward answer looks like this:
select *
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqo
inner join
(
select product_id, max(seller_prod_qty) as max_prod_qty
from
(
select seller_id, product_id, sum(product_qty) as seller_prod_qty
from your_table
group by seller_id, product_id
) spqi
group by product_id
) pmaxq
on spqo.product_id = pmaxq.product_id
and spqo.seller_prod_qty = pmaxq.max_prod_qty
Both spqi (inner) and spqo (outer) give you the seller, the product, and the sum of quantity across warehouses. pmaxq gives you the maximum for each product, again across warehouses, and the final inner join keeps a seller's total only if that seller has the highest (max) quantity of the product (there could be multiple sellers with the same quantity). I think this is the answer you are looking for. However, I'm sure the query can be improved, since what I'm posting is the "conceptual" one :)

SQL Insert Statement that pulls top n from each set of categories that could have duplicates

I am trying to write an INSERT statement that goes through sales numbers for a group of people, with each sale marked as an R or C type of sale. I want to find the top 100 salespersons in ALL (both R and C), in R, and in C. I don't just have sales data, though: I have Sales, Margin, Count, and Sales/Count, and I want to do the same thing for each of them. So far I need 12 SQL statements to accomplish this (4 categories x 3 sale types); each one is a slight variation of the following, which covers one of my 4 categories.
INSERT INTO ztbl_AllTopSalesPerson (SalesPerson)
SELECT TOP 100 tbl_Master.SalesPerson
FROM tbl_Master
WHERE tbl_Master.SaleType="C"
GROUP BY tbl_Master.SalesPerson
ORDER BY Sum(tbl_Master.Margin) DESC;
INSERT INTO ztbl_AllTopSalesPerson (SalesPerson)
SELECT TOP 100 tbl_Master.SalesPerson
FROM tbl_Master
WHERE tbl_Master.SaleType="R"
GROUP BY tbl_Master.SalesPerson
ORDER BY Sum(tbl_Master.Margin) DESC;
INSERT INTO ztbl_AllTopSalesPerson (SalesPerson)
SELECT TOP 100 tbl_Master.SalesPerson
FROM tbl_Master
GROUP BY tbl_Master.SalesPerson
ORDER BY Sum(tbl_Master.Margin) DESC;
Ideally I would like a way to make this all one statement. And (if it is not impossible) I would like to filter each one by date so I can do it by monthly data too, not just overall.
Just a few notes: I can't have duplicate names, so if a salesperson is top in all three sales types, they still only appear once. I'm using Access with a SQL Server back-end for only the main data table. I can't take the top 300 results, because there is so much overlap between the sales types, and I need the top from each (I do a separate query after this list is made that lines up the salespersons alphabetically with their 4 categories as fields). And lastly, I generally end up with a final list that has around 260-290 records.
THANKS!
p.s. thanks for your replies, stack exchange has saved my bacon 100s of times. I would post my attempts at this, but I think it would hurt more than it would help.
You might have to tweak it a little depending on what sort of output you want. You also might have to do a subquery for the COUNT(*) part of it, as this is untested. But I think this is the general idea of what you are looking for.
To get aggregated information, you can break it up into two CTE's:
WITH CTE1 AS (
SELECT SalesPerson,
SaleType,
SUM(Margin) OVER (PARTITION BY SalesPerson,SaleType) as Margin,
SUM(Sales) OVER (PARTITION BY SalesPerson,SaleType) as Sales,
SUM(Sales) OVER (PARTITION BY SalesPerson,SaleType) / COUNT(*) OVER (PARTITION BY SalesPerson,SaleType) as Sales_pct,
COUNT(*) OVER (PARTITION BY SalesPerson,SaleType) as Total,
SUM(Margin) OVER (PARTITION BY SalesPerson) as Margin_all,
SUM(Sales) OVER (PARTITION BY SalesPerson) as Sales_all,
SUM(Sales) OVER (PARTITION BY SalesPerson) / COUNT(*) OVER (PARTITION BY SalesPerson) as Sales_pct_all,
COUNT(*) OVER (PARTITION BY SalesPerson) as Total_all
FROM tbl_Master
)
,CTE2 AS (
SELECT SalesPerson
,RANK() OVER (PARTITION BY SaleType ORDER BY Margin desc) as Margin
,RANK() OVER (PARTITION BY SaleType ORDER BY Sales desc) as Sales
,RANK() OVER (PARTITION BY SaleType ORDER BY Sales_pct desc) as Sales_pct
,RANK() OVER (PARTITION BY SaleType ORDER BY Total desc) as Total
,RANK() OVER (ORDER BY Margin_all desc) as Margin_all
,RANK() OVER (ORDER BY Sales_all desc) as Sales_all
,RANK() OVER (ORDER BY Sales_pct_all desc) as Sales_pct_all
,RANK() OVER (ORDER BY Total_all desc) as Total_all
FROM CTE1 )
Select distinct SalesPerson from CTE2
Where Margin <= 100 Or Sales <= 100 Or Total <= 100 or Sales_pct <= 100
Or Margin_all <= 100 Or Sales_all <= 100 Or Total_all <= 100 or Sales_pct_all <= 100
I understand this is not perfect, but it should get you started. To filter by date, add DATEPART(month,[your date field]) to your PARTITION BY clauses (and the first CTE)

SQL question about GROUP BY

I've been using SQL for a few years, and this type of problem comes up here and there, and I haven't found an answer. But perhaps I've been looking in the wrong places - I'm not really sure what to call it.
For the sake of brevity, let's say I have a table with 3 columns: Customer, Order_Amount, Order_Date. Each customer may have multiple orders, with one row for each order with the amount and date.
My Question: Is there a simple way in SQL to get the DATE of the maximum order per customer?
I can get the amount of the maximum order for each customer (and which customer made it) by doing something like:
SELECT Customer, MAX(Order_Amount) FROM orders GROUP BY Customer;
But I also want to get the date of the max order, which I haven't figured out a way to easily get. I would have thought that this would be a common type of question for a database, and would therefore be easy to do in SQL, but I haven't found an easy way to do it yet. Once I add Order_Date to the list of columns to select, I need to add it to the Group By clause, which I don't think will give me what I want.
Apart from self-join you can do:
SELECT o1.Customer, o1.Order_Amount, o1.Order_Date
FROM orders o1 JOIN orders o2 ON o1.Customer = o2.Customer
GROUP BY o1.Customer, o1.Order_Amount, o1.Order_Date
HAVING o1.Order_Amount = MAX(o2.Order_Amount);
There's a good article reviewing various approaches.
And in Oracle, DB2, Sybase, and SQL Server 2005+ you would use RANK() OVER:
SELECT * FROM (
SELECT o.*,
RANK() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) r
FROM orders o) ranked
WHERE r = 1;
Note: If Customer has more than one order with maximum Order_Amount (i.e. ties), using RANK() function would get you all such orders; to get only first one, replace RANK() with ROW_NUMBER().
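The difference between the two functions on ties can be seen in a small sketch, run here through Python's sqlite3 module (SQLite 3.25+ assumed) on made-up rows. The extra Order_Date tiebreaker in the ROW_NUMBER() variant is an assumption added so that "first one" is deterministic (the earliest order wins).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (Customer TEXT, Order_Amount INTEGER, Order_Date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 50, "2021-01-01"),
        ("alice", 90, "2021-02-01"),  # alice has two orders tied at the maximum
        ("alice", 90, "2021-03-01"),
        ("bob", 20, "2021-01-15"),
    ],
)

# Same outer query for both variants; only the window expression changes.
template = """
SELECT Customer, Order_Amount, Order_Date
FROM (SELECT o.*, {window} AS r FROM orders o) AS ranked
WHERE r = 1
ORDER BY Customer, Order_Date
"""

rank_window = "RANK() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC)"
rownum_window = (
    "ROW_NUMBER() OVER (PARTITION BY Customer "
    "ORDER BY Order_Amount DESC, Order_Date ASC)"
)

rank_rows = conn.execute(template.format(window=rank_window)).fetchall()
rownum_rows = conn.execute(template.format(window=rownum_window)).fetchall()
print(rank_rows)    # keeps every order tied at the maximum
print(rownum_rows)  # keeps exactly one order per customer
```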
There's no short-cut... the easiest way is probably to join to a sub-query:
SELECT
*
FROM
orders JOIN
(
SELECT Customer, MAX(Order_Amount) AS Max_Order_Amount
FROM orders
GROUP BY Customer
) maxOrder
ON maxOrder.Customer = orders.Customer
AND maxOrder.Max_Order_Amount = orders.Order_Amount
You will want to join to the same table:
SELECT o.Customer, o.Order_Date, o2.amt
FROM orders o,
( SELECT Customer, MAX(Order_Amount) amt FROM orders GROUP BY Customer ) o2
WHERE o.Customer = o2.Customer
AND o.Order_Amount = o2.amt
;
Another approach for the collection:
WITH tempquery AS
(
SELECT
Customer
,Order_Amount
,Order_Date
,row_number() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) AS rn
FROM
orders
)
SELECT
Customer
,Order_Amount
,Order_Date
FROM
tempquery
WHERE
rn = 1
If your DB supports CROSS APPLY, you can do this as well, but it doesn't handle ties correctly:
SELECT [....]
FROM Customer c
CROSS APPLY
(SELECT TOP 1 [...]
FROM Orders o
WHERE c.customerID = o.CustomerID
ORDER BY o.Order_Amount DESC) o
See this data.SE query
You could try something like this:
SELECT Customer, MAX(Order_Amount), Order_Date
FROM orders O
WHERE ORDER_AMOUNT = (SELECT MAX(ORDER_AMOUNT) FROM orders WHERE CUSTOMER = O.CUSTOMER)
GROUP BY CUSTOMER, Order_Date
with t as
(
select Customer, Order_Date, Order_Amount,
max(Order_Amount) over (partition by Customer) as max_amount
from orders
)
select * from t where t.Order_Amount=max_amount