One to many join with group by - sql

I have two tables. One table is named Shopper and it looks like this:
SHOPPER_ID | SHOPPER_NAME |
-------------------------
1 | Marianna |
2 | Jason |
and another table named Order has information like Date on the order
ORDER_ID | SHOPPER_ID | DATE
----------------------------------
1 | 1 | 08/09/2012
2 | 1 | 08/08/2012
Now I want a query that joins the two tables and groups by SHOPPER_ID. Because one shopper can have multiple orders, I want to pick the latest order based on the DATE value.
My query looks like:
Select * from Shopper as s join Order as o
on s.SHOPPER_ID = o.SHOPPER_ID
group by s.SHOPPER_ID
The query is wrong right now because I don't know how to apply the filter to only get the latest order. Thanks in advance!

I suggest using a sub-select:
Select s.SHOPPER_ID, s.SHOPPER_NAME, o.MAX_DATE
from Shopper s
INNER join (SELECT SHOPPER_ID, MAX(DATE) AS MAX_DATE
FROM [Order] -- ORDER is a reserved word, so the table name must be quoted
GROUP BY SHOPPER_ID) o
on s.SHOPPER_ID = o.SHOPPER_ID
Best of luck.

An easy way is to use row_number to find the latest order:
SELECT *
FROM
(SELECT S.*,
O.[ORDER_ID], O.[DATE],
ROW_NUMBER() OVER ( PARTITION BY S.SHOPPER_ID
ORDER BY [DATE] DESC) as rn
FROM Shopper S
JOIN Orders O
ON S.SHOPPER_ID = O.SHOPPER_ID
) T
WHERE rn = 1

SELECT *
FROM
Shopper s
CROSS APPLY
(
SELECT TOP 1 *
FROM
[Order] o
WHERE
s.SHOPPER_ID = o.SHOPPER_ID
ORDER BY
o.DATE DESC
) o;

You need a subquery to get the date of the last order per shopper, and then join that with the shopper and order tables to get the name of the shopper and the order id:
SELECT ss.SHOPPER_ID, ss.SHOPPER_NAME, oo.ORDER_ID LAST_ORDER
FROM (SELECT o.SHOPPER_ID, MAX(o.DATE) [DATE]
FROM Shopper s
INNER JOIN [Order] o
ON s.SHOPPER_ID = o.SHOPPER_ID
GROUP BY o.SHOPPER_ID) mo
INNER JOIN Shopper ss
ON mo.SHOPPER_ID = ss.SHOPPER_ID
INNER JOIN [Order] oo
ON mo.SHOPPER_ID = oo.SHOPPER_ID AND mo.DATE = oo.DATE

Select s.*, o1.*
From [Order] as o1
left join [Order] as o2
on (o1.SHOPPER_ID = o2.SHOPPER_ID and o1.DATE < o2.DATE)
join Shopper as s
on (s.SHOPPER_ID = o1.SHOPPER_ID )
where o2.DATE is NULL;
Join Order table to itself, looking for newer Orders to join it to. The "left" join means that every row in the Order table will be kept in the results even if it cannot be joined to a newer order for that customer.
The "where" discards all of the rows where a newer order was found. This leaves you with only the most recent order for each shopper.
Join those results to the Shopper table to include the shopper data.
Edit: I suggested this answer because JOINs are often faster for a database than sub-selects, although the actual difference depends on the engine and its query planner.
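To see the self-join explanation above in action, here is a minimal sketch using Python's built-in sqlite3 module. It is not the original poster's environment: the table is called Orders (ORDER is a reserved word), the dates are stored as ISO text assuming the question's dates are MM/DD/YYYY, and order 3 for Jason is invented for illustration.

```python
import sqlite3

# Hypothetical in-memory copy of the question's tables.
# Assumptions: table renamed to Orders, dates stored as ISO text so that
# string comparison matches chronological order, order 3 is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shopper (SHOPPER_ID INTEGER PRIMARY KEY, SHOPPER_NAME TEXT);
    CREATE TABLE Orders (ORDER_ID INTEGER PRIMARY KEY, SHOPPER_ID INTEGER, DATE TEXT);
    INSERT INTO Shopper VALUES (1, 'Marianna'), (2, 'Jason');
    INSERT INTO Orders VALUES
        (1, 1, '2012-08-09'),
        (2, 1, '2012-08-08'),
        (3, 2, '2012-07-01');
""")

# o2 matches any newer order of the same shopper; rows where no such
# order exists (o2 columns are NULL) are the latest ones.
latest_orders = conn.execute("""
    SELECT s.SHOPPER_NAME, o1.ORDER_ID, o1.DATE
    FROM Orders AS o1
    LEFT JOIN Orders AS o2
           ON o1.SHOPPER_ID = o2.SHOPPER_ID AND o1.DATE < o2.DATE
    JOIN Shopper AS s ON s.SHOPPER_ID = o1.SHOPPER_ID
    WHERE o2.ORDER_ID IS NULL
    ORDER BY s.SHOPPER_ID
""").fetchall()

print(latest_orders)
```

Order 2 is dropped because it can be joined to the newer order 1. Note that if a shopper has two orders with the same latest date, both survive the filter.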

Related

Get max value from another query

I have problems with some query. I need to get max value and product_name from that query:
select
products.product_name,
sum(product_invoice.product_amount) as total_amount
from
product_invoice
inner join
products on product_invoice.product_id = products.product_id
inner join
invoices on product_invoice.invoice_id = invoices.invoice_id
where
month(invoices.invoice_date) = 2
group by
products.product_name
This query returns a result like this:
product_name | total_amount
--------------+--------------
chairs | 70
ladders | 500
tables | 150
How to get from this: ladders 500?
Select product_name,max(total_amount) from(
select
products.product_name,
sum(product_invoice.product_amount) as total_amount
from product_invoice
inner join products
on product_invoice.product_id = products.product_id
inner join invoices
on product_invoice.invoice_id = invoices.invoice_id
where month(invoices.invoice_date) = 2
group by products.product_name
) outputTable
You can use order by and fetch first 1 row only:
select p.product_name,
sum(pi.product_amount) as total_amount
from product_invoice pi inner join
products p
on pi.product_id = p.product_id inner join
invoices i
on pi.invoice_id = i.invoice_id
where month(i.invoice_date) = 2 -- shouldn't you have the year here too?
group by p.product_name
order by total_amount desc
fetch first 1 row only;
Not all databases support the ANSI-standard fetch first clause. You may need to use limit, select top, or some other construct.
Note that I have also introduced table aliases -- they make the query easier to write and to read. Also, if you are selecting the month, shouldn't you also be selecting the year?
In older versions of SQL Server, you would use select top 1:
select top (1) p.product_name,
sum(pi.product_amount) as total_amount
from product_invoice pi inner join
products p
on pi.product_id = p.product_id inner join
invoices i
on pi.invoice_id = i.invoice_id
where month(i.invoice_date) = 2 -- shouldn't you have the year here too?
group by p.product_name
order by total_amount desc;
To get all rows with the top amount, use SELECT TOP (1) WITH TIES ...
If you are using SQL Server, then TOP can offer a solution:
SELECT TOP 1
p.product_name,
SUM(pi.product_amount) AS total_amount
FROM product_invoice pi
INNER JOIN products p
ON pi.product_id = p.product_id
INNER JOIN invoices i
ON pi.invoice_id = i.invoice_id
WHERE
MONTH(i.invoice_date) = 2
GROUP BY
p.product_name
ORDER BY
SUM(pi.product_amount) DESC;
Note: If there could be more than one product tied for the top amount, and you want all ties, then use TOP 1 WITH TIES, e.g.
SELECT TOP 1 WITH TIES
... (the same query I have above)
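For engines without TOP ... WITH TIES, one portable way to keep all tied products is to compare each total against the maximum total. The sketch below uses Python's sqlite3 module to show this; it is a swapped-in technique, not the TOP syntax itself. The product "shelves", the invoice dates and the individual amounts are invented, and strftime replaces month(), which SQLite does not have.

```python
import sqlite3

# Hypothetical data shaped like the question's tables. February totals:
# chairs 70, ladders 500, tables 150, shelves 500 (a deliberate tie).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT);
    CREATE TABLE product_invoice (product_id INTEGER, invoice_id INTEGER,
                                  product_amount INTEGER);
    INSERT INTO products VALUES (1, 'chairs'), (2, 'ladders'),
                                (3, 'tables'), (4, 'shelves');
    INSERT INTO invoices VALUES (1, '2020-02-03'), (2, '2020-02-17'),
                                (3, '2020-03-01');
    INSERT INTO product_invoice VALUES
        (1, 1, 70), (2, 1, 300), (2, 2, 200),
        (3, 2, 150), (4, 1, 500), (4, 3, 999);
""")

# Aggregate once in a CTE, then keep every row whose total equals the maximum.
top_products = conn.execute("""
    WITH totals AS (
        SELECT p.product_name, SUM(pi.product_amount) AS total_amount
        FROM product_invoice pi
        JOIN products p ON pi.product_id = p.product_id
        JOIN invoices i ON pi.invoice_id = i.invoice_id
        WHERE strftime('%m', i.invoice_date) = '02'
        GROUP BY p.product_name
    )
    SELECT product_name, total_amount
    FROM totals
    WHERE total_amount = (SELECT MAX(total_amount) FROM totals)
    ORDER BY product_name
""").fetchall()

print(top_products)
```

The March invoice is excluded by the month filter, so both ladders and shelves come back with 500.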

Fetch most recent records as part of Joins

I am joining two tables, customer and profile, on the column cust_id. The profile table can hold more than one entry per customer, and I want to select the most recent entry by the start_ts column when joining. The result should be one row per customer: the row from customer plus the most recent row from profile. Is there a way to do this in Oracle SQL?
I would use window functions:
select . . .
from customer c join
(select p.*,
row_number() over (partition by cust_id order by start_ts desc) as seqnum
from profile p
) p
on c.cust_id = p.cust_id and p.seqnum = 1;
You can use a left join if you like to get customers that don't have profiles as well.
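As a rough illustration of the left join variant, here is a sketch run against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The name and tier columns and all of the data are assumptions, since the question does not show the table definitions.

```python
import sqlite3

# Hypothetical tables: customer 2 has no profile rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profile (cust_id INTEGER, start_ts TEXT, tier TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO profile VALUES
        (1, '2020-01-01', 'basic'),
        (1, '2021-06-01', 'premium');
""")

# seqnum = 1 marks the newest profile per customer. The condition stays in
# the ON clause so that customers without any profile are still returned.
rows = conn.execute("""
    SELECT c.cust_id, c.name, p.start_ts, p.tier
    FROM customer c
    LEFT JOIN (
        SELECT pr.*,
               ROW_NUMBER() OVER (PARTITION BY pr.cust_id
                                  ORDER BY pr.start_ts DESC) AS seqnum
        FROM profile pr
    ) p ON c.cust_id = p.cust_id AND p.seqnum = 1
    ORDER BY c.cust_id
""").fetchall()

print(rows)
```

Moving p.seqnum = 1 into a WHERE clause would turn the left join back into an inner join, because the NULL seqnum of customers without profiles would fail the test.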
One way (which works for all DB engines) is to join the tables you want to select data from and then join against the specific max-record of profile to filter out the data
select c.*, p.*
from customer c
join profile p on c.cust_id = p.cust_id
join
(
select cust_id, max(start_ts) as maxts
from profile
group by cust_id
) p2 on p.cust_id = p2.cust_id and p.start_ts = p2.maxts
Here is another way (if there exists no newer entry then it's the newest):
select
c.*,
p.*
from
customer c inner join
profile p on p.cust_id = c.cust_id and not exists(
select *
from profile
where cust_id = c.cust_id and start_ts > p.start_ts
)

SQL strategy to fetch maximum

Suppose I have these three tables:
I want to get, for every product, its product_id and the client that bought it the most times (the biggest client of the product).
I solved it like this:
SELECT
product_id AS product,
(SELECT TOP 1 client_id FROM Bill_Item, Bill
WHERE Bill_Item.product_id = p.product_id
and Bill_Item.bill_id = Bill.bill_id
GROUP BY
client_id
ORDER BY
COUNT(*) DESC
) AS client
FROM Product p
Do you know a better way?
The inner query gives you the ranking. The outer query gives you the client that purchased each product the most times.
SELECT *
FROM
(
SELECT i.product_id, b.client_id,
r = row_number() over (partition by i.product_id
order by count(*) desc)
FROM Bill b
INNER JOIN Bill_Item i ON b.bill_id = i.bill_id
GROUP BY i.product_id, b.client_id
) d
WHERE r = 1
I was going to submit pretty much the same thing as @Squirrel, only with a Common Table Expression (CTE) rather than a derived table, so I won't duplicate that. There are, however, some learning points concerning your query. First, implicit joins such as FROM Bill_Item, Bill can easily have unintended consequences (one of many questions on this: Queries that implicit SQL joins can't do?). Next, for the calculated column you can use OUTER APPLY or CROSS APPLY, which is a very useful technique.
So you could re-write your method as follows:
SELECT *
FROM
Product p
OUTER APPLY (SELECT TOP 1 b.client_id
FROM
Bill_Item bi
INNER JOIN Bill b
ON bi.bill_id = b.bill_id
WHERE
bi.product_id = p.product_id
GROUP BY
b.client_id
ORDER BY
COUNT(*) DESC) c
And to show you how Squirrel's answer can still include products that have never been sold, all you need to do is start from Product and LEFT JOIN to the other tables:
;WITH cte AS (
SELECT
p.product_id
,b.client_id
,ROW_NUMBER() OVER (PARTITION BY p.product_id ORDER BY COUNT(*) DESC) as RowNumber
FROM
Product p
LEFT JOIN Bill_Item bi
ON p.product_id = bi.product_id
LEFT JOIN Bill b
ON bi.bill_id = b.bill_id
GROUP BY
p.product_id
,b.client_id
)
SELECT *
FROM
cte
WHERE
RowNumber = 1
Useful techniques used in some of these queries:
CTE
APPLY (Outer & Cross)
Window Functions
Squirrel's answer doesn't return products that have never been sold. If you want to include those, then your approach is ok, although I would write the query as:
SELECT product_id as product,
(SELECT TOP 1 b.client_id
FROM Bill_Item bi JOIN
Bill b
ON bi.bill_id = b.bill_id
WHERE bi.product_id = p.product_id
GROUP BY b.client_id
ORDER BY COUNT(*) DESC
) as client
FROM Product p;
You can also express this using APPLY, but a correlated subquery is also fine.
Note the correct use of the explicit JOIN syntax.
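The correlated subquery approach can be tried out with a small sketch in Python's sqlite3 module. SQLite has no TOP, so LIMIT 1 stands in for TOP 1 here. The table layouts are inferred from the queries above (the original diagram is not available) and the data is invented.

```python
import sqlite3

# Hypothetical data: product 1 was bought twice by client 10 and once by
# client 20; product 2 has never been sold.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (product_id INTEGER PRIMARY KEY);
    CREATE TABLE Bill (bill_id INTEGER PRIMARY KEY, client_id INTEGER);
    CREATE TABLE Bill_Item (bill_id INTEGER, product_id INTEGER);
    INSERT INTO Product VALUES (1), (2);
    INSERT INTO Bill VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO Bill_Item VALUES (1, 1), (2, 1), (3, 1);
""")

# The subquery runs once per product and returns its most frequent client,
# or NULL when the product has no bill items.
top_clients = conn.execute("""
    SELECT p.product_id AS product,
           (SELECT b.client_id
            FROM Bill_Item bi
            JOIN Bill b ON bi.bill_id = b.bill_id
            WHERE bi.product_id = p.product_id
            GROUP BY b.client_id
            ORDER BY COUNT(*) DESC
            LIMIT 1) AS client
    FROM Product p
    ORDER BY p.product_id
""").fetchall()

print(top_clients)
```

Because the subquery sits in the select list, the unsold product 2 is kept with a NULL client, which matches the behaviour of OUTER APPLY rather than CROSS APPLY.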

postgres join max date

I need to construct a join that will give me the most recent price for each product. I vastly simplified the table structures for the purpose of the example, and each table's row count will be in the millions. My previous stabs at this have not been very efficient.
In PostgreSQL, you could try DISTINCT ON to only get the first row per product id in descending create_date order:
SELECT DISTINCT ON (products.id) products.*, prices.*
FROM products
JOIN prices
ON products.id = prices.product_id
ORDER BY products.id, create_date DESC
(Of course, except for illustrative purposes, you should select only the exact columns you need.)
The simplest way to do it is using the row_number function.
SELECT
p.name,
t.amount AS latest_price
FROM (
SELECT
p.*,
row_number() OVER (PARTITION BY product_id ORDER BY create_date DESC) AS rn
FROM
prices p) t
JOIN products p ON p.id = t.product_id
WHERE
rn = 1
While the DISTINCT ON answer worked for my instance, I found there's a faster way for me to get what I need.
SELECT
DISTINCT ON(u.id) u.id,
(CAST(data AS JSON) ->> 'Finished') AS Finished,
ee.post_value
FROM
users_user u
JOIN events_event ee on u.id = ee.actor_id
WHERE
u.id > 20000
ORDER BY
u.id DESC,
ee.time DESC;
takes ~25s on my DB, while
SELECT
u.id,
(CAST(data AS JSON) ->> 'Finished') AS Finished,
e.post_value
FROM
users_user u
JOIN events_event e on u.id = e.actor_id
LEFT JOIN events_event ee on ee.actor_id = e.actor_id
AND ee.time > e.time
WHERE
u.id > 20000
AND ee.id IS NULL
ORDER BY
u.id DESC;
takes ~15s.

Join two tables but only get most recent associated record

I am having a hard time constructing an SQL query that fetches the data associated with another table but keeps only the rows that are considered the latest (most recent).
The image below describes my two tables (Inventory and Sales): the Inventory table contains all the items and the Sales table contains all the transaction records. Inventory.Id is related to Sales.Inventory_Id. The "Wanted result" is the output that I am trying to produce.
My objective is to associate the sales records with the inventory, but only get the most recent transaction for each item.
A plain join (left, right or inner) doesn't produce the result I am looking for, because I don't know how to add a condition that keeps only the most recent row to join to. Is this doable, or should I change my table schema?
Thanks.
You can use APPLY
Select Item,Sales.Price
From Inventory I
Cross Apply(Select top 1 Price
From Sales S
Where I.id = S.Inventory_Id
Order By Date Desc) as Sales
WITH Sales_Latest AS (
SELECT *,
MAX(Date) OVER(PARTITION BY Inventory_Id) Latest_Date
FROM Sales
)
SELECT i.Item, s.Price
FROM Inventory i
INNER JOIN Sales_Latest s ON (i.Id = s.Inventory_Id)
WHERE s.Date = s.Latest_Date
Think carefully about what results you expect if there are two prices in Sales for the same date.
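To make the warning about duplicate dates concrete, the following sketch (Python's sqlite3 module, window functions need SQLite 3.25 or later) compares a max-date filter with a ROW_NUMBER filter that uses a tie-breaker. The Sales.Id column and all of the data are assumptions made for illustration.

```python
import sqlite3

# Hypothetical data: the item 'Pen' has two sales on its latest date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Inventory (Id INTEGER PRIMARY KEY, Item TEXT);
    CREATE TABLE Sales (Id INTEGER PRIMARY KEY, Inventory_Id INTEGER,
                        Price REAL, Date TEXT);
    INSERT INTO Inventory VALUES (1, 'Pen'), (2, 'Ink');
    INSERT INTO Sales VALUES
        (1, 1, 1.50, '2016-01-01'),
        (2, 1, 1.75, '2016-02-01'),
        (3, 1, 2.00, '2016-02-01'),
        (4, 2, 9.00, '2016-01-15');
""")

# Filtering on the maximum date keeps every sale made on that date.
max_date_rows = conn.execute("""
    SELECT i.Item, s.Price
    FROM Inventory i
    JOIN Sales s ON i.Id = s.Inventory_Id
    WHERE s.Date = (SELECT MAX(Date) FROM Sales WHERE Inventory_Id = i.Id)
    ORDER BY i.Id, s.Id
""").fetchall()

# ROW_NUMBER with Sales.Id as a tie-breaker keeps exactly one row per item.
row_number_rows = conn.execute("""
    SELECT Item, Price
    FROM (
        SELECT i.Id AS item_id, i.Item, s.Price,
               ROW_NUMBER() OVER (PARTITION BY i.Id
                                  ORDER BY s.Date DESC, s.Id DESC) AS rn
        FROM Inventory i
        JOIN Sales s ON i.Id = s.Inventory_Id
    ) t
    WHERE rn = 1
    ORDER BY item_id
""").fetchall()

print(max_date_rows)
print(row_number_rows)
```

The first query returns 'Pen' twice, the second returns it once. Which of the two behaviours is right depends on what a "latest" sale should mean when dates collide.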
I would just use a correlated subquery:
select Item, Price
from Inventory i
inner join Sales s
on i.id = s.Inventory_Id
and s.Date = (select max(Date) from Sales where Inventory_Id = i.id)
select * from
(
select i.name,
row_number() over (partition by i.id order by s.date desc) as rownum,
s.price,
s.date
from inventory i
left join sales s on i.id = s.inventory_id
) tmp
where rownum = 1