SQL question about GROUP BY - sql

I've been using SQL for a few years, and this type of problem comes up here and there, and I haven't found an answer. But perhaps I've been looking in the wrong places - I'm not really sure what to call it.
For the sake of brevity, let's say I have a table with 3 columns: Customer, Order_Amount, Order_Date. Each customer may have multiple orders, with one row for each order with the amount and date.
My Question: Is there a simple way in SQL to get the DATE of the maximum order per customer?
I can get the amount of the maximum order for each customer (and which customer made it) by doing something like:
SELECT Customer, MAX(Order_Amount) FROM orders GROUP BY Customer;
But I also want to get the date of the max order, which I haven't figured out a way to easily get. I would have thought that this would be a common type of question for a database, and would therefore be easy to do in SQL, but I haven't found an easy way to do it yet. Once I add Order_Date to the list of columns to select, I need to add it to the Group By clause, which I don't think will give me what I want.

One option is a self-join combined with GROUP BY and HAVING:
SELECT o1.Customer, o1.Order_Amount, o1.Order_Date
FROM orders o1 JOIN orders o2 ON o1.Customer = o2.Customer
GROUP BY o1.Customer, o1.Order_Amount, o1.Order_Date
HAVING o1.Order_Amount = MAX(o2.Order_Amount);
There's a good article reviewing various approaches.
And in Oracle, DB2, Sybase, and SQL Server 2005+ you would use RANK() OVER:
SELECT * FROM (
SELECT t.*,
RANK() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) r
FROM orders t) o
WHERE r = 1;
Note: If a customer has more than one order with the maximum Order_Amount (i.e. ties), RANK() returns all such orders; to get only one row per customer, replace RANK() with ROW_NUMBER().
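To see the tie behaviour concretely, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module. SQLite is only a stand-in here (the answer targets Oracle, DB2, Sybase, and SQL Server), it needs SQLite 3.25 or newer for window functions, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: alice has two orders tied at the maximum amount.
SAMPLE_ORDERS = [
    ("alice", 50, "2020-01-01"),
    ("alice", 90, "2020-02-01"),
    ("alice", 90, "2020-03-01"),
    ("bob", 20, "2020-01-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (Customer TEXT, Order_Amount INTEGER, Order_Date TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", SAMPLE_ORDERS)

# RANK() gives tied rows the same rank, so both of alice's 90 orders come back.
rank_rows = conn.execute("""
    SELECT Customer, Order_Amount, Order_Date
    FROM (
        SELECT o.*,
               RANK() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) AS r
        FROM orders o
    ) ranked
    WHERE r = 1
    ORDER BY Customer, Order_Date
""").fetchall()

# ROW_NUMBER() numbers rows uniquely. Order_Date is added as a tiebreaker
# (an assumption about which tied row is wanted) so the earliest one is kept.
row_number_rows = conn.execute("""
    SELECT Customer, Order_Amount, Order_Date
    FROM (
        SELECT o.*,
               ROW_NUMBER() OVER (
                   PARTITION BY Customer
                   ORDER BY Order_Amount DESC, Order_Date ASC
               ) AS r
        FROM orders o
    ) ranked
    WHERE r = 1
    ORDER BY Customer
""").fetchall()
conn.close()

print(rank_rows)
print(row_number_rows)
```

Without a tiebreaker in the ORDER BY, which of the tied rows ROW_NUMBER() picks is not defined, so it is usually worth adding one.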

There's no short-cut... the easiest way is probably to join to a sub-query:
SELECT
*
FROM
orders JOIN
(
SELECT Customer, MAX(Order_Amount) AS Max_Order_Amount
FROM orders
GROUP BY Customer
) maxOrder
ON maxOrder.Customer = orders.Customer
AND maxOrder.Max_Order_Amount = orders.Order_Amount
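For anyone who wants to try the join-to-subquery approach quickly, this is a rough sketch using Python's sqlite3 module with an in-memory table. The rows are invented for the example, and the column list is spelled out instead of SELECT * only to keep the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (Customer TEXT, Order_Amount INTEGER, Order_Date TEXT)"
)
# Hypothetical sample rows, two orders per customer.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 50, "2020-01-01"),
        ("alice", 90, "2020-02-01"),
        ("bob", 20, "2020-01-15"),
        ("bob", 10, "2020-03-01"),
    ],
)

# The derived table finds each customer's maximum amount; joining back to
# orders on (Customer, amount) recovers the date of that order.
max_order_rows = conn.execute("""
    SELECT orders.Customer, orders.Order_Amount, orders.Order_Date
    FROM orders
    JOIN (
        SELECT Customer, MAX(Order_Amount) AS Max_Order_Amount
        FROM orders
        GROUP BY Customer
    ) maxOrder
      ON maxOrder.Customer = orders.Customer
     AND maxOrder.Max_Order_Amount = orders.Order_Amount
    ORDER BY orders.Customer
""").fetchall()
conn.close()

print(max_order_rows)
```

This form needs no window functions, so it should also work on older database versions. If a customer has several orders tied at the maximum, all of them are returned.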

You will want to join on the same table:
SELECT o.Customer, o.order_date, o2.amt
FROM orders o,
( SELECT Customer, MAX(Order_Amount) amt FROM orders GROUP BY Customer ) o2
WHERE o.customer = o2.customer
AND o.order_amount = o2.amt
;

Another approach for the collection:
WITH tempquery AS
(
SELECT
Customer
,Order_Amount
,Order_Date
,row_number() OVER (PARTITION BY Customer ORDER BY Order_Amount DESC) AS rn
FROM
orders
)
SELECT
Customer
,Order_Amount
,Order_Date
FROM
tempquery
WHERE
rn = 1

If your database supports CROSS APPLY you can do this as well, but it returns only one row per customer, so tied orders are dropped:
SELECT [....]
FROM Customer c
CROSS APPLY
(SELECT TOP 1 [...]
FROM Orders o
WHERE c.customerID = o.CustomerID
ORDER BY o.Order_Amount DESC) o
See this data.SE query

You could try something like this:
SELECT Customer, MAX(Order_Amount), Order_Date
FROM orders O
WHERE ORDER_AMOUNT = (SELECT MAX(ORDER_AMOUNT) FROM orders WHERE CUSTOMER = O.CUSTOMER)
GROUP BY CUSTOMER, Order_Date

with t as
(
select CUSTOMER, Order_Date, Order_Amount,
max(Order_Amount) over (partition by Customer) as max_amount
from orders
)
select * from t where t.Order_Amount=max_amount

Related

SQL Query - second ID of a list ordered by date and ID

I have a SQL database with a list of customer IDs (CustomerID) and invoices, the specific product purchased in each invoice (ProductID), and the Date and the Income of each invoice. I need to write a query that will retrieve, for each product, the second customer who made a purchase.
How do I do that?
EDIT:
I have come up with the following query:
SELECT *,
LEAD(CustomerID) OVER (ORDER BY ProductID, Date) AS 'Second Customer Who Made A Purchase'
FROM a
ORDER BY ProductID, Date ASC
However, this query presents multiple results for products that have more than two purchases. Can you advise?
SELECT a2.ProductID,
(
SELECT a1.CustomerID
FROM a a1
WHERE a1.ProductID = a2.ProductID
ORDER BY Date asc
LIMIT 1,1
) as SecondCustomer
FROM a a2
GROUP BY a2.ProductID
I need to write a query that will retrieve for each product, which was the second customer who made a purchase
This sounds like a window function:
select a.*
from (select a.*,
row_number() over (partition by productid order by date asc) as seqnum
from a
) a
where seqnum = 2;
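A quick way to check the window-function version is to run it against a few made-up rows. The sketch below uses Python's sqlite3 module (SQLite 3.25+ assumed for row_number()); the table name a and its columns come from the question, while the data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (CustomerID INTEGER, ProductID TEXT, Date TEXT, Income INTEGER)"
)
# Hypothetical invoices: P1 has three buyers, P2 has two, P3 has only one.
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?)",
    [
        (1, "P1", "2021-01-01", 10),
        (2, "P1", "2021-01-05", 20),
        (3, "P1", "2021-01-09", 30),
        (4, "P2", "2021-02-01", 15),
        (1, "P2", "2021-02-03", 25),
        (5, "P3", "2021-03-01", 5),
    ],
)

# Number the purchases of each product by date and keep the second one.
second_customers = conn.execute("""
    SELECT ProductID, CustomerID
    FROM (
        SELECT a.*,
               row_number() OVER (PARTITION BY ProductID ORDER BY Date ASC) AS seqnum
        FROM a
    ) numbered
    WHERE seqnum = 2
    ORDER BY ProductID
""").fetchall()
conn.close()

print(second_customers)
```

Products with a single purchase (P3 here) simply do not appear, because no row gets seqnum = 2 in their partition.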

SQLite query with LIMIT per column

I am trying to compose a query with a where condition to get multiple unique sorted columns without having to do it in multiple queries. That is confusing so here is an example...
Price Table
id | item_id | date | price
I want to query to find the most recent price of multiple items given a date. I was previously iterating through items in my application code and getting the most recent price like this...
SELECT * FROM prices WHERE item_id = ? AND date(date) < date(?) ORDER BY date(date) DESC LIMIT 1
Iterating through each item and doing a query is too slow so I am wondering if there is a way I can accomplish this same query for multiple items in one go. I have tried UNION but I cannot get it to work with the ORDER BY and LIMIT commands like this thread says (https://stackoverflow.com/a/1415380/4400804) for MySQL
Any ideas on how I can accomplish this?
Try this (based on adapting the answer):
SELECT * FROM prices a WHERE a.RowId IN (
SELECT b.RowId
FROM prices b
WHERE a.item_id = b.item_id AND b.date < ?
ORDER BY b.date DESC LIMIT 1
) ORDER BY date DESC;
Window functions (Available with sqlite 3.25 and newer) will likely help:
WITH ranked AS
(SELECT id, item_id, date, price
, row_number() OVER (PARTITION BY item_id ORDER BY date DESC) AS rn
FROM prices
WHERE date < ?)
SELECT id, item_id, date, price
FROM ranked
WHERE rn = 1
ORDER BY item_id;
This returns the most recent record of each item_id from all records older than a given date.
I would simply use a correlated subquery in the `where` clause:
SELECT p.*
FROM prices p
WHERE p.DATE = (SELECT MAX(p2.date)
FROM prices p2
WHERE p2.item_id = p.item_id
);
This is phrased so that it works on all items. You can, of course, add filtering conditions (in the outer query) for a given set of items.
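Note that the question also asks for prices before a given date. One way to handle that with the correlated subquery, sketched here as an assumption about what is wanted, is to put the date cutoff inside the subquery so that MAX(date) is taken only over the eligible rows. The helper name latest_prices_before and the sample rows are hypothetical; the example uses Python's sqlite3 module and ISO-formatted date strings, which compare correctly as text.

```python
import sqlite3


def latest_prices_before(conn, cutoff):
    """Return the most recent price row per item_id dated strictly before cutoff."""
    return conn.execute(
        """
        SELECT p.item_id, p.date, p.price
        FROM prices p
        WHERE p.date = (SELECT MAX(p2.date)
                        FROM prices p2
                        WHERE p2.item_id = p.item_id
                          AND p2.date < ?)
        ORDER BY p.item_id
        """,
        (cutoff,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, item_id INTEGER, date TEXT, price REAL)"
)
# Hypothetical price history; item 3 has no price before the cutoff used below.
conn.executemany(
    "INSERT INTO prices (item_id, date, price) VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 1.0),
        (1, "2020-02-01", 1.5),
        (1, "2020-03-01", 2.0),
        (2, "2020-01-15", 9.0),
        (2, "2020-02-20", 8.0),
        (3, "2020-03-05", 4.0),
    ],
)

result = latest_prices_before(conn, "2020-02-15")
print(result)
```

Filtering on the date only in the outer query would not be enough: the subquery would still pick the overall latest date, and items whose latest price is after the cutoff would drop out entirely.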
With NOT EXISTS:
SELECT p.* FROM prices p
WHERE NOT EXISTS (
SELECT 1 FROM prices
WHERE item_id = p.item_id AND date > p.date
)
or with a join of the table to a query that returns the last date for each item_id:
SELECT p.*
FROM prices p INNER JOIN (
SELECT item_id, MAX(date) date
FROM prices
GROUP BY item_id
) t ON t.item_id = p.item_id AND t.date = p.date

Max function on date giving multiple records

I have got 3 tables: order, customer and invoice. I need to get the latest invoice number for each customer.
I am using the max function on order date and then grouping by customer number and invoice number, where order status was confirmed or shipped.
select max(o.order_date), c.customer_number, i.invoice_number
from orders o , invoices i , customer c
where o.order_oid = i.order_oid
and c.customer_oid = i.customer_oid
and o.status_oid in ( 4,6)
group by c.customer_number, i.invoice_number;
I am getting duplicate records like:
Date cust_num invc#
1/22/2018 479 I128
4/23/2018 479 I287
5/18/2018 479 I433
It should have returned me only last record. What am I doing wrong?
From your description and comments you seem to want:
select max(o.order_date), c.customer_number,
max(i.invoice_number) keep (dense_rank last order by o.order_date) as invoice_number
from orders o , invoices i , customer c
where o.order_oid = i.order_oid
and c.customer_oid = i.customer_oid
and o.status_oid in ( 4,6)
group by c.customer_number;
The group by no longer includes the invoice number; instead the last invoice number, based on date, is found using last.
If the invoice numbers are strictly in date order, and fixed length, you could potentially just do:
select max(o.order_date), c.customer_number, max(i.invoice_number) as invoice_number
but if it's possible to go from, say, invoice I999 to I1000, then it isn't safe to sort those as strings, since as a string 'I1000' sorts before 'I999'.
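The string-ordering pitfall is easy to reproduce outside the database. A small Python illustration, using made-up invoice numbers in the same "I" plus digits format:

```python
# Hypothetical invoice numbers in the format shown in the question.
invoice_numbers = ["I128", "I999", "I1000"]

# Plain string comparison goes character by character, so '9' beats '1'
# in the second position and 'I999' is treated as the largest value.
string_max = max(invoice_numbers)

# Comparing on the numeric part after the leading 'I' gives the expected result.
numeric_max = max(invoice_numbers, key=lambda number: int(number[1:]))

print(string_max)
print(numeric_max)
```

The same character-by-character comparison is what a plain max() on a string column does in SQL, which is why the keep (dense_rank last ...) form ordered by date is the safer choice.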
Not related, but you might want to consider moving to modern join syntax:
select max(o.order_date), c.customer_number,
max(i.invoice_number) keep (dense_rank last order by o.order_date) as invoice_number
from orders o
join invoices i on i.order_oid = o.order_oid
join customer c on c.customer_oid = i.customer_oid
where o.status_oid in (4, 6)
group by c.customer_number;
You need max(invoice_number) to avoid getting a record per invoice
You can use row_number() window analytic function
select order_date, customer_number, invoice_number
from
(
select o.order_date, c.customer_number, i.invoice_number,
row_number() over (partition by c.customer_number order by o.order_date desc) as rn
from orders o
join invoices i on o.order_oid = i.order_oid
join customer c on c.customer_oid = i.customer_oid
where o.status_oid in (4,6)
)
where rn = 1;
P.S.: It is better to give up the old-style comma-separated join syntax in these queries.

SQL Server : select only last record per customer from a join query

Assume I have these 3 tables :
The first two tables define customers of different types, i.e. the second table has other columns that are not included in table 1; I just left them the same to reduce complexity.
The third table defines orders for both types of customers. Each customer has more than one order.
I want to select the last order for every customer, i.e. the order with order_id 4 for customer 1, which was created on 23/12/2016, and the order with order_id 5 for customer 2, which was created on 26/12/2016.
I tried something like this :
select *
from customertype1
left join order on order.customer_id = customertype1.customer_id
order by order_id desc;
But this gives me multiple records for every customer, as I have stated above I want only the last order for every customertype1.
If you want the last order for each customer, then you only need the orders table:
select o.*
from (select o.*,
row_number() over (partition by customer_id order by datecreated desc) as seqnum
from orders o
) o
where seqnum = 1;
If you want to include all customers, then you need to combine the two tables. Assuming they are mutually exclusive:
with c as (
select customer_id from customers1 union all
select customer_id from customers2
)
select c.customer_id, o.*
from c left join
(select o.*,
row_number() over (partition by customer_id order by datecreated desc) as seqnum
from orders o
) o
on c.customer_id = o.customer_id and seqnum = 1;
A note about your data structure: You should have one table for all customers. You can then define a foreign key constraint between orders and customers. For the additional columns, you can have additional tables for the different types of customers.
Use ROW_NUMBER() and PARTITION BY.
ROW_NUMBER(): assigns a sequence number to each row.
PARTITION BY: groups your data by the given column.
When you use ROW_NUMBER() and PARTITION BY together, PARTITION BY first groups your records and ROW_NUMBER() then numbers the rows within each group, so the sequence starts from 1 in every group.
Help Link: Example of ROW_NUMBER() and PARTITION BY
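To make the "sequence restarts at 1 for each group" behaviour visible, here is a small sketch using Python's sqlite3 module (SQLite 3.25+ assumed). Order 4 for customer 1 and order 5 for customer 2 use the dates from the question; the other rows and the exact column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, datecreated TEXT)"
)
# Orders 4 and 5 follow the question; the remaining rows are hypothetical.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 1, "2016-12-20"),
        (2, 2, "2016-12-21"),
        (3, 1, "2016-12-22"),
        (4, 1, "2016-12-23"),
        (5, 2, "2016-12-26"),
    ],
)

# Each customer's orders are numbered from 1, newest first.
numbered = conn.execute("""
    SELECT customer_id,
           order_id,
           row_number() OVER (
               PARTITION BY customer_id
               ORDER BY datecreated DESC
           ) AS seqnum
    FROM orders
    ORDER BY customer_id, seqnum
""").fetchall()
conn.close()

# Keeping only seqnum = 1 leaves the last order per customer.
last_orders = [(customer, order) for customer, order, seqnum in numbered if seqnum == 1]

print(numbered)
print(last_orders)
```

In a real query the seqnum = 1 filter would normally go in an outer WHERE clause, since window function results cannot be filtered in the same SELECT that computes them.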
This is the general idea. You can work out the details.
with customers as
(select customer_id, customer_name
from table1
union
select customer_id, customer_name
from table2)
, lastOrder as
(select customer_id, max(order_id) maxOrderId
from orders
group by customer_id)
select *
from lastOrder join customers on lastOrder.Customer_id = customers.customer_id
join orders on order_id = maxOrderId

ORACLE SQL Return only duplicated values (not the original)

I have a database with the following info
Customer_id, plan_id, plan_start_dte
Since some customers switch plans, some customer_ids appear in several rows, each with a different plan_start_dte. I'm trying to count how many times a day members switch to the premium plan from any other plan (plan_id = 'premium').
That is, I'm trying to do roughly this: return all rows with duplicate customer_id, except for the original plan (min(plan_start_dte)), where plan_id = 'premium', and group them by plan_start_dte.
I'm able to get all duplicate records with their count:
with plan_counts as (
select c.*, count(*) over (partition by CUSTOMER_ID) ct
from CUSTOMERS c
)
select *
from plan_counts
where ct > 1
The other steps have me stuck. First I tried to select everything except the original plan:
SELECT CUSTOMERS c
where START_DTE not in (
select min(PLAN_START_DTE)
from CUSTOMERS i
where c.CUSTOMER_ID = i.CUSTOMER_ID
)
But this failed. If I can solve this I believe all I have to add is an additional condition where c.PLAN_ID = 'premium' and then group by date and do a count. Anyone have any ideas?
I think you want lag():
select c.*
from (select c.*,
lag(plan_id) over (partition by customer_id order by plan_start_dte) as prev_plan_id
from customers c
) c
where prev_plan_id <> 'premium' and plan_id = 'premium';
I'm not sure what output you want. For the number of times this occurs per day:
select plan_start_dte, count(*)
from (select c.*, lag(plan_id) over (partition by customer_id order by plan_start_dte) as prev_plan_id
from customers c
) c
where prev_plan_id <> 'premium' and plan_id = 'premium'
group by plan_start_dte
order by plan_start_dte;
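As a sanity check of the lag() approach, the sketch below runs the per-day count against a few invented rows using Python's sqlite3 module (SQLite 3.25+ assumed as a stand-in for Oracle). Column names follow the question (plan_start_dte); the plan names other than 'premium' and all dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, plan_id TEXT, plan_start_dte TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "basic", "2022-01-01"),
        (1, "premium", "2022-01-10"),   # switch
        (2, "premium", "2022-01-05"),   # original plan is premium: not a switch
        (3, "basic", "2022-01-02"),
        (3, "plus", "2022-01-06"),
        (3, "premium", "2022-01-10"),   # switch
        (4, "basic", "2022-01-03"),
        (4, "premium", "2022-01-12"),   # switch
        (4, "basic", "2022-01-20"),
        (4, "premium", "2022-01-25"),   # switch back to premium
    ],
)

# lag() exposes each row's previous plan for the same customer. For a
# customer's first plan it is NULL, and NULL <> 'premium' is not true,
# so original plans are filtered out automatically.
switches_per_day = conn.execute("""
    SELECT plan_start_dte, COUNT(*)
    FROM (
        SELECT c.*,
               lag(plan_id) OVER (
                   PARTITION BY customer_id
                   ORDER BY plan_start_dte
               ) AS prev_plan_id
        FROM customers c
    ) c
    WHERE prev_plan_id <> 'premium' AND plan_id = 'premium'
    GROUP BY plan_start_dte
    ORDER BY plan_start_dte
""").fetchall()
conn.close()

print(switches_per_day)
```

Customer 2 never shows up in the counts because their only row is the original plan, which matches the requirement to exclude the first plan per customer.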