SQL - group in one table for a max in another

I'm trying to write a Microsoft SQL Server query that gets the most recent order from one table for each product in another. That is to say, the GROUP BY is on table B, while the MAX is on table A. I tried a number of things, but this is my latest attempt:
WITH max_date(maxid, maxdate) as (
SELECT inmost.PROD_ID as maxid
,MAX(inmost.ORDER_DATE) as maxdate
FROM(SELECT od2.PROD_ID
,o2.ORDER_DATE
FROM ORDERS o2
INNER JOIN ORDPRODS od2 ON o2.ORDER_ID = od2.ORDER_ID
) as inmost
GROUP BY inmost.PROD_ID
)
SELECT o.ORDER_DATE
,od.ORDER_QUANT
FROM ORDERS o
INNER JOIN ORDPRODS od ON o.ORDER_ID = od.ORDER_ID
LEFT JOIN max_date mxord ON od.PROD_ID = mxord.maxid
AND o.ORDER_DATE = mxord.maxdate
WHERE o.ORDERS_Canceled = '0'
In the end, it still returns multiple rows for each product, with lots and lots of dates for those products. For instance:
PROD_ID ORDER_DATE
111 1/1/2015
111 1/2/2015
112 1/2/2015
112 1/3/2015
112 1/4/2015
112 1/5/2015
What I WANT is:
PROD_ID ORDER_DATE
111 1/2/2015
112 1/5/2015

The join clause below looks for a matching max_date (mxord) row on both the ID and the date. Because it is a LEFT JOIN, every row from od is returned, whether or not a match is found:
LEFT JOIN max_date mxord ON od.PROD_ID = mxord.maxid
AND o.ORDER_DATE = mxord.maxdate
To visualize this, add mxord.maxdate to your SELECT clause and you will likely see many NULLs.
However, you want to exclude the records in od that don't match maxdate. To do that, you want an INNER JOIN.
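To see the difference between the two join types, here is a minimal sketch that uses Python's built-in sqlite3 module instead of SQL Server. The table and column names follow the question, but the sample rows are made up for illustration and the cancelled-order filter is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: product 111 ordered on two dates, product 112 on two dates.
conn.executescript("""
CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, ORDER_DATE TEXT);
CREATE TABLE ORDPRODS (ORDER_ID INTEGER, PROD_ID INTEGER, ORDER_QUANT INTEGER);
INSERT INTO ORDERS VALUES (1, '2015-01-01'), (2, '2015-01-02'), (3, '2015-01-05');
INSERT INTO ORDPRODS VALUES (1, 111, 5), (2, 111, 7), (2, 112, 1), (3, 112, 4);
""")

# Same shape as the query in the question; only the join type is swapped.
QUERY = """
WITH max_date(maxid, maxdate) AS (
    SELECT od2.PROD_ID, MAX(o2.ORDER_DATE)
    FROM ORDERS o2
    INNER JOIN ORDPRODS od2 ON o2.ORDER_ID = od2.ORDER_ID
    GROUP BY od2.PROD_ID
)
SELECT od.PROD_ID, o.ORDER_DATE, mxord.maxdate
FROM ORDERS o
INNER JOIN ORDPRODS od ON o.ORDER_ID = od.ORDER_ID
{join_type} JOIN max_date mxord ON od.PROD_ID = mxord.maxid
                               AND o.ORDER_DATE = mxord.maxdate
ORDER BY od.PROD_ID, o.ORDER_DATE
"""

# LEFT JOIN keeps every order row; unmatched rows get NULL (None) for maxdate.
left_rows = conn.execute(QUERY.format(join_type="LEFT")).fetchall()
# INNER JOIN keeps only the rows whose date equals the product's latest date.
inner_rows = conn.execute(QUERY.format(join_type="INNER")).fetchall()

for row in left_rows:
    print("LEFT ", row)
for row in inner_rows:
    print("INNER", row)
```

With the LEFT JOIN, the older orders are still listed, just with an empty maxdate; with the INNER JOIN only one row per product remains.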

This is labeled as #mysql, but to my knowledge MySQL didn't support the WITH clause before version 8.0, so I'm assuming this is SQL Server.
If that's the case, you could use CROSS APPLY, which would look something like this:
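-- One row per product; the APPLY picks that product's latest non-cancelled order
SELECT prd.PROD_ID, apl.ORDER_DATE
FROM (SELECT DISTINCT PROD_ID FROM ORDPRODS) prd
CROSS APPLY (
SELECT TOP 1 ord.ORDER_DATE
FROM ORDERS ord
INNER JOIN ORDPRODS op ON op.ORDER_ID = ord.ORDER_ID
WHERE op.PROD_ID = prd.PROD_ID
AND ord.ORDERS_Canceled = '0'
ORDER BY ord.ORDER_DATE DESC
) apl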

Related

Returning complex query on update sql

I want to return the result of a query with multiple joins and a WITH clause after updating something.
For example my query is:
WITH orders AS (
SELECT product_id, SUM(amount) AS orders
FROM orders_summary
GROUP BY product_id
)
SELECT p.id, p.name,
p.date_of_creation,
o.orders, s.id AS store_id,
s.name AS store_name
FROM products AS p
LEFT JOIN orders AS o
ON p.id = o.product_id
LEFT JOIN stores AS s
ON s.id = p.store_id
WHERE p.id = '1'
id name date orders store_id store_name
1 pen 11/16/2022 10 1 jj
2 pencil 11/10/2022 30 2 ff
I want my UPDATE to return the result of that exact query, reflecting the updated data:
UPDATE products
SET name = 'ABC'
WHERE id = '1'
RETURNING up_query
Desired result on update:
id name date orders store_id store_name
1 ABC 11/16/2022 10 1 jj
You can try UPDATE products ... RETURNING *. That should give you the contents of the row you just updated.
As for UPDATE ... RETURNING someQuery, You Can't Do That™: it would mean doing both the UPDATE and a separate SELECT operation in one go, and RETURNING doesn't work that way.
If you must be sure your SELECT works on precisely the same data you just UPDATEd, wrap the two queries in a BEGIN; / COMMIT; transaction. That prevents concurrent users from making changes between your UPDATE and SELECT.
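The transaction approach can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for whatever database you are on. The table layout is inferred from the question's query, and the individual order amounts are invented so that they add up to the totals shown there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema guessed from the question; amounts are hypothetical.
conn.executescript("""
CREATE TABLE stores (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                       date_of_creation TEXT, store_id INTEGER);
CREATE TABLE orders_summary (product_id INTEGER, amount INTEGER);
INSERT INTO stores VALUES (1, 'jj'), (2, 'ff');
INSERT INTO products VALUES (1, 'pen', '2022-11-16', 1),
                            (2, 'pencil', '2022-11-10', 2);
INSERT INTO orders_summary VALUES (1, 4), (1, 6), (2, 30);
""")

# The reporting query from the question, parameterized on the product id.
REPORT = """
WITH orders AS (
    SELECT product_id, SUM(amount) AS orders
    FROM orders_summary
    GROUP BY product_id
)
SELECT p.id, p.name, p.date_of_creation, o.orders,
       s.id AS store_id, s.name AS store_name
FROM products AS p
LEFT JOIN orders AS o ON p.id = o.product_id
LEFT JOIN stores AS s ON s.id = p.store_id
WHERE p.id = ?
"""

# `with conn:` commits when the block succeeds and rolls back on an exception,
# so the UPDATE and the follow-up SELECT run inside one transaction.
with conn:
    conn.execute("UPDATE products SET name = ? WHERE id = ?", ("ABC", 1))
    row = conn.execute(REPORT, (1,)).fetchone()

print(row)
```

The SELECT sees the new name because it runs after the UPDATE inside the same transaction.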

SQL - Tricky Merging

I'm facing a tricky query to do. I hope your expertise will help me to sort it out.
There are 2 tables:
Table1 : Orders
Index ProductName OrderDate
0 a 03/03/1903
1 a 10/03/2014
2 b 01/01/2017
3 c 01/01/2019
Table2 : Product Specs
--> This table shows every change made in the Color of our products
Index ProductName Color ColorUpdatedOn
0 a Blue 01/01/1900
1 a Red 01/01/2014
2 a Yellow 01/01/2017
3 b Pink 01/01/2017
4 c Black 01/01/2018
5 c Black 31/12/2018
I would like to retrieve all the data from Table1 together with the Color and ColorUpdatedOn columns:
Index ProductName OrderDate Color ColorUpdatedOn
0 a 03/03/1903 Blue 01/01/1900
1 a 10/03/2014 Red 01/01/2014
2 a 01/01/2019 Yellow 01/01/2017
3 c 01/01/2019 Black 31/12/2018
Do you have any idea how I could do this ?
Thank you in advance for your help
Largo
Get the max() date from the Product Specs table for each color,
then join it using the year() function. This works on MySQL and MSSQL; I'm not sure about other databases.
select o.Index, o.ProductName, o.OrderDate, t1.color, t1.ColorUpdatedOn
from Orders o
inner join
(select color, max(colorupdatedon) as ColorUpdatedOn
from productspecs
group by color) t1 on year(o.OrderDate) = year(t1.ColorUpdatedOn)
But I would prefer the right() function, since the year is at the end of your dates (this assumes the dates are stored as text in dd/mm/yyyy form):
select o.Index, o.ProductName, o.OrderDate, t1.color, t1.ColorUpdatedOn
from Orders o
inner join
(select color, max(colorupdatedon) as ColorUpdatedOn
from productspecs
group by color) t1 on right(o.OrderDate, 4) = right(t1.ColorUpdatedOn, 4)
In a database that supports lateral joins (which is quite a few of them now), this is pretty easy:
select o.*, s.* -- select the columns you want
from orders o left join lateral
(select s.*
from specs s
where s.ProductName = o.ProductName and
s.ColorUpdatedOn <= o.OrderDate
order by s.ColorUpdatedOn desc
fetch first 1 row only
) s
on 1=1;
In SQL Server, this would use outer apply rather than left join lateral.
In other databases, I would use lead():
select o.*, s.* -- select the columns you want
from orders o left join
(select s.*,
lead(ColorUpdatedOn) over (partition by ProductName order by ColorUpdatedOn) as next_ColorUpdatedOn
from specs s
) s
on s.ProductName = o.ProductName and
o.OrderDate >= s.ColorUpdatedOn and
(o.OrderDate < s.next_ColorUpdatedOn or s.next_ColorUpdatedOn is null)
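As a quick check of the lead() version, here is a minimal sketch run against the question's sample data with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The dates are rewritten in ISO format so they compare correctly as text, and the Index column is renamed idx; both are assumptions of this sketch, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (idx INTEGER, ProductName TEXT, OrderDate TEXT);
CREATE TABLE specs (idx INTEGER, ProductName TEXT, Color TEXT, ColorUpdatedOn TEXT);
INSERT INTO orders VALUES
    (0, 'a', '1903-03-03'), (1, 'a', '2014-03-10'),
    (2, 'b', '2017-01-01'), (3, 'c', '2019-01-01');
INSERT INTO specs VALUES
    (0, 'a', 'Blue',   '1900-01-01'), (1, 'a', 'Red',   '2014-01-01'),
    (2, 'a', 'Yellow', '2017-01-01'), (3, 'b', 'Pink',  '2017-01-01'),
    (4, 'c', 'Black',  '2018-01-01'), (5, 'c', 'Black', '2018-12-31');
""")

# Each spec row is valid from ColorUpdatedOn until the next change for the
# same product; an order matches the spec row whose interval contains it.
rows = conn.execute("""
SELECT o.idx, o.ProductName, o.OrderDate, s.Color, s.ColorUpdatedOn
FROM orders o
LEFT JOIN (
    SELECT specs.*,
           LEAD(ColorUpdatedOn) OVER (PARTITION BY ProductName
                                      ORDER BY ColorUpdatedOn) AS next_ColorUpdatedOn
    FROM specs
) s
  ON s.ProductName = o.ProductName
 AND o.OrderDate >= s.ColorUpdatedOn
 AND (o.OrderDate < s.next_ColorUpdatedOn OR s.next_ColorUpdatedOn IS NULL)
ORDER BY o.idx
""").fetchall()

for row in rows:
    print(row)
```

Each order is paired with the color that was current on its order date, one row per order.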
Assuming the datatype of both OrderDate and ColorUpdatedOn is date, we can find the color that was current at the time of each order.
For this I have used an analytical/windowing function. The Hive query would look like this:
SELECT
y.ProductName, y.OrderDate, y.Color, y.ColorUpdatedOn
FROM (
SELECT
x.*,
DENSE_RANK() OVER(PARTITION BY x.ProductName, x.OrderDate ORDER BY x.recency ASC) AS relevance
FROM (
SELECT
a.*, b.color, b.ColorUpdatedOn, DATEDIFF(a.OrderDate, b.ColorUpdatedOn) AS recency
FROM
Order a
INNER JOIN
Product b
ON (
a.ProductName = b.ProductName
AND a.OrderDate >= b.ColorUpdatedOn
)
) x
) y
WHERE
y.relevance = 1;
The query could be made specific if you let me know the database you are using.
Let me know if it helps.

SQL joins show same record 3 times, instead of 3 records

I am working on a join exercise from Database Processing by Kroenke and Auer.
There is a question which asks to find all the items shipped from Singapore, displaying information from 3 different tables.
In the table there are 3 results which match these criteria.
I have tried a where join and an inner join, but each time, instead of giving 3 results, it gives 1 result 3 times, which makes me convinced I'm messing something up with my syntax.
Here's the where join:
select shippername, shipment.shipmentId, departuredate
FROM shipment, item, SHIPMENT_ITEM
WHERE shipment_item.shipmentID = shipment.shipmentID
AND item.itemId = shipment_Item.itemID
AND item.city = 'Singapore';
And the inner join:
select shippername, shipment.shipmentId, departuredate
FROM shipment
INNER JOIN shipment_item ON shipment_item.shipmentID = shipment.shipmentID
INNER JOIN item ON item.itemId = shipment_Item.itemID
WHERE item.city = 'Singapore'
order by shippername asc,
departuredate desc;
The result of both queries:
shippername shipmentId departuredate
----------------------------------- ----------- -------------
International 4 2013-06-02
International 4 2013-06-02
International 4 2013-06-02
You can also add DISTINCT after the SELECT, but there is probably something different in each row. Try selecting all columns, as Gordon suggested, to see what differs between them.
It looks like there is more than one item in a single shipment.
So if you want shipment-level detail, try GROUP_CONCAT (MySQL) with GROUP BY:
SELECT shippername,
shipment.shipmentId,
departuredate ,
GROUP_CONCAT(item.itemId) AS items_shipped
FROM shipment
INNER JOIN shipment_item ON shipment_item.shipmentID = shipment.shipmentID
INNER JOIN item ON item.itemId = shipment_Item.itemID
WHERE item.city = 'Singapore'
GROUP BY shipment.shipmentId
order by shippername asc,
departuredate desc;
Hope this helps.
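The suggestions in these answers can be tried out with a small sketch using Python's built-in sqlite3 module. The schema and the three Singapore items are assumptions chosen to reproduce the symptom; SQLite happens to provide GROUP_CONCAT as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one shipment containing three items from Singapore.
conn.executescript("""
CREATE TABLE shipment (shipmentId INTEGER PRIMARY KEY, shippername TEXT,
                       departuredate TEXT);
CREATE TABLE item (itemId INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE shipment_item (shipmentId INTEGER, itemId INTEGER);
INSERT INTO shipment VALUES (4, 'International', '2013-06-02');
INSERT INTO item VALUES (10, 'Singapore'), (11, 'Singapore'), (12, 'Singapore');
INSERT INTO shipment_item VALUES (4, 10), (4, 11), (4, 12);
""")

# Shared FROM/WHERE part of every query below.
JOINS = """
FROM shipment
INNER JOIN shipment_item ON shipment_item.shipmentId = shipment.shipmentId
INNER JOIN item ON item.itemId = shipment_item.itemId
WHERE item.city = 'Singapore'
"""

# The original column list: three identical-looking rows.
plain = conn.execute(
    "SELECT shippername, shipment.shipmentId, departuredate" + JOINS).fetchall()
# Adding the item id shows that the rows are actually different items.
with_item = conn.execute(
    "SELECT shipment.shipmentId, item.itemId" + JOINS
    + "ORDER BY item.itemId").fetchall()
# DISTINCT collapses the repeated shipment row.
distinct = conn.execute(
    "SELECT DISTINCT shippername, shipment.shipmentId, departuredate"
    + JOINS).fetchall()
# GROUP_CONCAT keeps one row per shipment and lists its items.
grouped = conn.execute(
    "SELECT shipment.shipmentId, GROUP_CONCAT(item.itemId)" + JOINS
    + "GROUP BY shipment.shipmentId").fetchall()

print(plain, with_item, distinct, grouped, sep="\n")
```

So the join itself is fine: the query returns one row per matching item, and the selected columns simply hide what distinguishes the rows.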

Cross Join Query

Rewording my original post for further clarification.
I currently have the tables below:
Product_Ref
product_id
product_name
Products
product_id
so_date (date)
total_sales
Calendar
dt (date field, each row representing a single day for the past/next 10 years)
I am looking to produce a daily report of the number of products sold in the past 6 months (based on SYSDATE). The report should contain every combination of day in the last 6 months and every possible product_id, in the format:
Product id | date | total sales
Even if there were 0 entries in the products table (i.e. no sales), I would still expect a complete report, showing 6 months of zeroed data, i.e.:
1 | 2012-01-01 | 0
2 | 2012-01-01 | 0
3 | 2012-01-01 | 0
1 | 2012-01-02 | 0
2 | 2012-01-02 | 0
3 | 2012-01-02 | 0
…
This would assume there were 3 products in the product_reference table - my original query (noted below) was my starter for 10, but not sure where to go from here.
SELECT products.product_id, calendar.dt, products.total_sales
FROM products RIGHT JOIN calendar ON (products.so_date = calendar.dt)
WHERE calendar.dt < SYSDATE AND calendar.dt >= ADD_MONTHS(SYSDATE, -7)+1
ORDER BY calendar.dt ASC, products.product_id DESC;
The clue is in the question - you are looking for a CROSS JOIN.
SELECT Product_Ref.product_id, calendar.dt, COALESCE(products.total_sales, 0) AS total_sales
FROM Product_Ref
CROSS JOIN calendar
LEFT JOIN products ON products.so_date = calendar.dt
AND products.product_id = Product_Ref.product_id
WHERE calendar.dt < SYSDATE AND calendar.dt >= ADD_MONTHS(SYSDATE, -7)+1
ORDER BY calendar.dt ASC, Product_Ref.product_id DESC;
I was confused at first by your table names where "Product" in fact means "sale" and "Product_Ref" is a product!
This is very similar to an example of the use of CROSS JOIN I once posted here.
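To make the shape of the result concrete, here is a minimal sketch using Python's built-in sqlite3 module. The three products, the two calendar days and the single sale are invented sample data, and the SYSDATE / ADD_MONTHS filter is left out because those functions are Oracle-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: three products, two calendar days, one sale.
conn.executescript("""
CREATE TABLE Product_Ref (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE Products (product_id INTEGER, so_date TEXT, total_sales INTEGER);
CREATE TABLE Calendar (dt TEXT PRIMARY KEY);
INSERT INTO Product_Ref VALUES (1, 'one'), (2, 'two'), (3, 'three');
INSERT INTO Calendar VALUES ('2012-01-01'), ('2012-01-02');
INSERT INTO Products VALUES (2, '2012-01-02', 5);
""")

# CROSS JOIN builds every (day, product) pair; the LEFT JOIN then attaches
# sales where they exist, and COALESCE turns the missing ones into 0.
rows = conn.execute("""
SELECT pr.product_id, c.dt, COALESCE(p.total_sales, 0) AS total_sales
FROM Calendar c
CROSS JOIN Product_Ref pr
LEFT JOIN Products p ON p.product_id = pr.product_id
                    AND p.so_date = c.dt
ORDER BY c.dt, pr.product_id
""").fetchall()

for row in rows:
    print(row)
```

Note that the product id is taken from Product_Ref, not from the sales table, so it is still filled in on days without sales.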
As far as I understand, you want no result when there were no sales, right?
If so, I think you just need to change the RIGHT JOIN to an INNER JOIN.
With a RIGHT JOIN, a row that exists in the JOIN table but has no match in the FROM table is still returned, with NULL values in the columns that come from the FROM table.
With an INNER JOIN, you only get results where there is matching data in both tables.
I hope I understood correctly and that this helps.
Assuming your desired output is to match only the products date with those in the calendar table, you should use an INNER JOIN:
SELECT c.dt, p.product_id, p.total_sales
FROM calendar c
INNER JOIN products p on c.dt = p.so_date
WHERE c.dt < SYSDATE and c.dt >= ADD_MONTHS(SYSDATE,-7)+1
ORDER BY c.dt ASC, p.product_id DESC;
A CROSS JOIN would produce results with every combination from your products table and your calendar table and thus not require the use of ON.
--EDIT
See edits below (UNTESTED):
SELECT PR.Product_ID, C.dt, P.total_sales
FROM Calendar C
CROSS JOIN Product_Ref PR
LEFT JOIN Products P ON P.Product_Id = PR.Product_Id and p.so_date = c.dt
WHERE c.dt < SYSDATE and c.dt >= ADD_MONTHS(SYSDATE,-7)+1
ORDER BY c.dt ASC, PR.Product_ID DESC;
Good luck.

Poorly performing Mysql subquery -- can I turn it into a Join?

I have a subquery problem that is causing poor performance... I was thinking that the subquery could be re-written using a join, but I'm having a hard time wrapping my head around it.
The gist of the query is this:
For a given combination of EmailAddress and Product, I need to get a list of the IDs that are NOT the latest. These orders are going to be marked as 'obsolete' in the table, which would leave only the latest order for a given combination of EmailAddress and Product (does that make sense?).
Table Definition
CREATE TABLE `sandbox`.`OrderHistoryTable` (
`id` INT( 11 ) NOT NULL AUTO_INCREMENT ,
`EmailAddress` VARCHAR( 100 ) NOT NULL ,
`Product` VARCHAR( 100 ) NOT NULL ,
`OrderDate` DATE NOT NULL ,
`rowlastupdated` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ,
PRIMARY KEY ( `id` ) ,
KEY `EmailAddress` ( `EmailAddress` ) ,
KEY `Product` ( `Product` ) ,
KEY `OrderDate` ( `OrderDate` )
) ENGINE = MYISAM DEFAULT CHARSET = latin1;
Query
SELECT id
FROM
OrderHistoryTable AS EMP1
WHERE
OrderDate not in
(
Select max(OrderDate)
FROM OrderHistoryTable AS EMP2
WHERE
EMP1.EmailAddress = EMP2.EmailAddress
AND EMP1.Product IN ('ProductA','ProductB','ProductC','ProductD')
AND EMP2.Product IN ('ProductA','ProductB','ProductC','ProductD')
)
Explanation of duplicate 'IN' statements
13 bob#aol.com ProductA 2010-10-01
15 bob#aol.com ProductB 2010-20-02
46 bob#aol.com ProductD 2010-20-03
57 bob#aol.com ProductC 2010-20-04
158 bob#aol.com ProductE 2010-20-05
206 bob#aol.com ProductB 2010-20-06
501 bob#aol.com ProductZ 2010-20-07
The results of my query should be
| 13 |
| 15 |
| 46 |
| 57 |
This is because, in the orders listed, those 4 have been 'superseded' by a newer order for a product in the same category. This 'category' contains products A, B, C & D.
Order ids 158 and 501 show no other orders in their respective categories based on the query.
Final Query based off of accepted answer below:
I ended up using the following query with no subquery and got about 3X performance (30 sec down from 90 sec). I also now have a separate 'groups' table where I can enumerate the group members instead of spelling them out in the query itself...
SELECT DISTINCT id, EmailAddress FROM (
SELECT a.id, a.EmailAddress, a.OrderDate
FROM OrderHistoryTable a
INNER JOIN OrderHistoryTable b ON a.EmailAddress = b.EmailAddress
INNER JOIN groups g1 ON a.Product = g1.Product
INNER JOIN groups g2 ON b.Product = g2.Product
WHERE
g1.family = 'ProductGroupX'
AND g2.family = 'ProductGroupX'
GROUP BY a.id, a.OrderDate, b.OrderDate
HAVING a.OrderDate < MAX(b.OrderDate)
) dtX
Use:
SELECT a.id
FROM ORDERHISTORYTABLE AS a
LEFT JOIN (SELECT e.EmailAddress,
MAX(OrderDate) AS max_date
FROM OrderHistoryTable AS e
WHERE e.Product IN ('ProductA','ProductB','ProductC','ProductD')
GROUP BY e.EmailAddress) b ON b.emailaddress = a.emailaddress
AND b.max_date = a.orderdate
WHERE b.emailaddress IS NULL
AND a.Product IN ('ProductA','ProductB','ProductC','ProductD')
Rant:
OMG Ponies' answer gives what you asked for: a rewrite with a join. But I would not be too excited about it. Your performance killer is the join on email address, which, I assume, is not particularly selective, so your database needs to sift through all of those rows looking for the MAX of order date.
In reality, for MySQL this will mean doing a filesort (can you post EXPLAIN SELECT ....?).
Now, if MySQL had access to an index that included emailaddress, product and orderdate, it might, especially on MyISAM, be much more efficient in determining MAX(orderdate) (and no, having an index on each of the columns is not the same as having a composite index on all of the columns). If I were trying to optimize that query, I would bet on that.
Other than this rant, here's my version of "not the latest from a category" (I don't expect it to be better, but it is different and you should test the performance; it just might be faster due to the lack of subqueries).
My attempt (not tested)
SELECT DISTINCT
notlatest.id,
notlatest.emailaddress,
notlatest.product,
notlatest.orderdate
FROM
OrderHistoryTable AS notlatest
LEFT JOIN OrderHistoryTable AS latest ON
notlatest.emailaddress = latest.emailaddress AND
notlatest.orderdate < latest.orderdate
WHERE
notlatest.product IN ('ProductA','ProductB','ProductC','ProductD') AND
latest.product IN ('ProductA','ProductB','ProductC','ProductD') AND
latest.id IS NOT NULL
Comments:
- If there is only one record in the category it will not be displayed
- Again indexes should speed the above very much
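A self-join of this kind can be checked with a minimal sketch using Python's built-in sqlite3 module instead of MySQL. The ids and products follow the question's sample rows, but the email address and the dates are placeholders invented for this sketch, and the composite index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ids and products from the question; email and dates are made-up placeholders.
conn.executescript("""
CREATE TABLE OrderHistoryTable (
    id INTEGER PRIMARY KEY, EmailAddress TEXT, Product TEXT, OrderDate TEXT);
INSERT INTO OrderHistoryTable VALUES
    (13,  'bob@example.com', 'ProductA', '2010-01-01'),
    (15,  'bob@example.com', 'ProductB', '2010-02-01'),
    (46,  'bob@example.com', 'ProductD', '2010-03-01'),
    (57,  'bob@example.com', 'ProductC', '2010-04-01'),
    (158, 'bob@example.com', 'ProductE', '2010-05-01'),
    (206, 'bob@example.com', 'ProductB', '2010-06-01'),
    (501, 'bob@example.com', 'ProductZ', '2010-07-01');
-- One composite index rather than three single-column ones.
CREATE INDEX idx_email_product_date
    ON OrderHistoryTable (EmailAddress, Product, OrderDate);
""")

GROUP = ("ProductA", "ProductB", "ProductC", "ProductD")
marks = ",".join("?" for _ in GROUP)

# A row is "not the latest" if the same address has a later order
# for any product in the same group.
sql = f"""
SELECT DISTINCT notlatest.id
FROM OrderHistoryTable AS notlatest
INNER JOIN OrderHistoryTable AS latest
        ON notlatest.EmailAddress = latest.EmailAddress
       AND notlatest.OrderDate < latest.OrderDate
WHERE notlatest.Product IN ({marks})
  AND latest.Product IN ({marks})
ORDER BY notlatest.id
"""
obsolete_ids = [r[0] for r in conn.execute(sql, GROUP + GROUP)]
print(obsolete_ids)
```

Order 206 is left out because nothing in the group is newer, and 158 and 501 are left out because their products are not in the group.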
Actually, this is (or might be) a good example of how normalizing data would improve performance: your product implies a product category, but the product category is not stored anywhere, and the IN test will not be maintainable in the long run.
Furthermore, by creating a product category you would be able to index on it directly.
If the table were indexed on the category, joins on the category should perform better than tests on the Product value.
(Actually, MyISAM's composite index on emailaddress, category, orderdate should then already contain max, min and count per category, and that should be cheap.)
My MySQL is a bit rusty (I'm used to MSSQL), but here's my best guess. It might need a bit of tweaking in the GROUP BY and HAVING clauses. Also, I assumed from your duplicate IN statements that you want the Products to match in both tables. If this isn't the case, I'll adjust the query.
SELECT a.id
FROM OrderHistoryTable a
INNER JOIN OrderHistoryTable b
ON a.Product = b.Product AND
a.EmailAddress = b.EmailAddress
WHERE a.Product IN ('ProductA','ProductB','ProductC','ProductD')
GROUP BY a.id, a.OrderDate, b.OrderDate
HAVING a.OrderDate < MAX(b.OrderDate)
Edit: removed extraneous AND.
SELECT *
FROM (
SELECT product, MAX(OrderDate) AS md
FROM OrderHistoryTable
WHERE product IN ('ProductA','ProductB','ProductC','ProductD')
GROUP BY
product
) ohti
JOIN orderhistorytable oht
ON oht.product = ohti.product
AND oht.orderdate <> ohti.md
Create an index on OrderHistoryTable (product, orderdate) for this to work fast.
Also note that it will return duplicates of the MAX(orderdate) within a product, if any.