Finding max of transaction_date for corresponding code - sql

I'm trying to find the max Transaction_Date per EAN_Code.
My table looks like:
Transaction_Date EAN_Code
09/04/2018 3029440000286
09/04/2018 3029440000286
08/04/2018 5000128221139
14/04/2018 5000128221139
08/04/2018 5000128221139
10/04/2018 5000128221108
Essentially, for each item in the list we want to pull out the latest date on which it was sold, i.e. one row per product with its last sale date.
Both columns contain non-distinct values.

Simply do a GROUP BY. Use MAX() to get the latest date for each product.
select EAN_Code, max(Transaction_Date)
from tablename
group by EAN_Code

You could use ROW_NUMBER/RANK:
SELECT *
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY Ean_Code
ORDER BY Transaction_Date DESC) AS rn
FROM table_name) s
WHERE s.rn = 1;
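As a quick check that the two answers above agree, here is a minimal sketch that runs both against the sample rows from the question, using Python's built-in sqlite3 module. The table name `sales` and the `Last_Sold` alias are assumptions. The dates are stored as ISO `YYYY-MM-DD` text (also an assumption) so that MAX() and ORDER BY compare them chronologically; `DD/MM/YYYY` strings would be compared as text and could sort incorrectly. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Transaction_Date TEXT, EAN_Code TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        # Sample rows from the question, with dates rewritten as ISO text
        ("2018-04-09", "3029440000286"),
        ("2018-04-09", "3029440000286"),
        ("2018-04-08", "5000128221139"),
        ("2018-04-14", "5000128221139"),
        ("2018-04-08", "5000128221139"),
        ("2018-04-10", "5000128221108"),
    ],
)

# Approach 1: aggregate with GROUP BY and MAX()
group_by_rows = conn.execute(
    """
    SELECT EAN_Code, MAX(Transaction_Date) AS Last_Sold
    FROM sales
    GROUP BY EAN_Code
    ORDER BY EAN_Code
    """
).fetchall()

# Approach 2: number the rows per product, newest first, and keep row 1
row_number_rows = conn.execute(
    """
    SELECT EAN_Code, Transaction_Date
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY EAN_Code
                                       ORDER BY Transaction_Date DESC) AS rn
          FROM sales) s
    WHERE s.rn = 1
    ORDER BY EAN_Code
    """
).fetchall()

for ean_code, last_sold in group_by_rows:
    print(ean_code, last_sold)
```

The GROUP BY form is the simpler choice when only the date is needed; the ROW_NUMBER form becomes useful when other columns from the latest row have to come along.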

Related

Find the nth greatest value per group in SQL

I'm trying to find the nth greatest value in each group in a table; is there an efficient way to do this in SQL? (specifically Google BigQuery, if that's relevant)
For example, suppose we had a table sales with two fields, customer_id and amount, where each record corresponds to the sale of an item to a customer for a given amount. If I wanted the top sale to each customer, I could do
SELECT customer_id, MAX(amount) top_amount
FROM sales
GROUP BY customer_id;
If I instead wanted the 5th greatest value for each customer, is there an efficient/idiomatic way to do that in SQL?
Consider the approach below:
SELECT customer_id, array_agg(amount order by amount desc limit 5)[safe_offset(4)] top_5th_amount
FROM sales
GROUP BY customer_id;
Yet another option uses the nth_value() function:
SELECT distinct customer_id,
nth_value(amount, 5) over win top_5th_amount
FROM sales
window win as (partition by customer_id order by amount desc rows between unbounded preceding and unbounded following )
You can use qualify:
select s.*
from sales s
where 1=1
qualify row_number() over (partition by customer_id order by amount desc) = 5;
Note: Your question is unclear on how to handle tied amounts. This treats them as separate amounts (so the 5th could be the same as the 1st). If you want the 5th largest distinct value, use dense_rank() instead.
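The difference between row_number() and dense_rank() on tied amounts can be seen in a small sketch. It uses Python's built-in sqlite3 module rather than BigQuery, so the QUALIFY clause is replaced by an equivalent subquery filter; the sample amounts and the helper name `fifth_amount` are made up for illustration. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    # Customer 1 has a tie at the top; customer 2 has fewer than five sales
    [(1, a) for a in (50, 50, 40, 30, 20, 10, 5)] + [(2, 9), (2, 8)],
)


def fifth_amount(ranking_function):
    """Return (customer_id, amount) rows at position 5 for the given ranking.

    ranking_function is only ever one of the two literals used below,
    so interpolating it into the SQL text is safe in this sketch.
    """
    query = f"""
        SELECT DISTINCT customer_id, amount
        FROM (SELECT customer_id, amount,
                     {ranking_function}() OVER (PARTITION BY customer_id
                                                ORDER BY amount DESC) AS pos
              FROM sales) ranked
        WHERE pos = 5
        ORDER BY customer_id
    """
    return conn.execute(query).fetchall()


by_row_number = fifth_amount("ROW_NUMBER")  # ties occupy separate positions
by_dense_rank = fifth_amount("DENSE_RANK")  # ties share a position

print(by_row_number)
print(by_dense_rank)
```

Customers with fewer than five sales (or fewer than five distinct amounts, for dense_rank) simply do not appear in the result.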

SQL: Take 1 value per grouping

I have a very simplified table / view like below to illustrate the issue:
The stock column represents the current stock quantity of the style at the retailer. The reason the stock column is included is to avoid joins for reporting. (the table is created for reporting only)
I want to query the table to get what is currently in stock, grouped by stylenumber (across retailers). Like:
select stylenumber,sum(sold) as sold,Max(stock) as stockcount
from MGTest
group by stylenumber
I expect to get Stylenumber, Total Sold, Most Recent Stock Total:
A, 6, 15
B, 1, 6
But using MAX(Stock) I get 10, and with SUM(Stock) I get 25.
I have also tried OVER (PARTITION BY ...), without any luck.
How do I solve this?
I would answer this using window functions:
SELECT Stylenumber, Date, TotalStock
FROM (SELECT M.Stylenumber, M.Date, SUM(M.Stock) as TotalStock,
ROW_NUMBER() OVER (PARTITION BY M.Stylenumber ORDER BY M.Date DESC) as seqnum
FROM MGTest M
GROUP BY M.Stylenumber, M.Date
) m
WHERE seqnum = 1;
The query is a bit tricky since you want a cumulative total of the Sold column, but only the total of the Stock column for the most recent date. I didn't actually try running this, but something like the query below should work. However, because of the shape of your schema this isn't the most performant query in the world since it is scanning your table multiple times to join all of the data together:
SELECT MDate.Stylenumber, MDate.TotalSold, MStock.TotalStock
FROM (SELECT M.Stylenumber, MAX(M.Date) MostRecentDate, SUM(M.Sold) TotalSold
FROM [MGTest] M
GROUP BY M.Stylenumber) MDate
INNER JOIN (SELECT M.Stylenumber, M.Date, SUM(M.Stock) TotalStock
FROM [MGTest] M
GROUP BY M.Stylenumber, M.Date) MStock ON MDate.Stylenumber = MStock.Stylenumber AND MDate.MostRecentDate = MStock.Date
You can do something like this
SELECT B.Stylenumber,SUM(B.Sold),SUM(B.Stock) FROM
(SELECT Stylenumber AS 'Stylenumber',SUM(Sold) AS 'Sold',MAX(Stock) AS 'Stock'
FROM MGTest A
GROUP BY RetailerId,Stylenumber) B
GROUP BY B.Stylenumber
if you don't want to use joins
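To see how the two-level grouping behaves, here is a minimal sketch using Python's built-in sqlite3 module. The original table was not included in the post, so the rows below are a hypothetical reconstruction, chosen only because they reproduce the numbers quoted in the question (10 for MAX, 25 for SUM, 15 expected for style A). It assumes each retailer's current stock is repeated unchanged on every one of its rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE MGTest (RetailerId INTEGER, Stylenumber TEXT, '
    '"Date" TEXT, Sold INTEGER, Stock INTEGER)'
)
conn.executemany(
    "INSERT INTO MGTest VALUES (?, ?, ?, ?, ?)",
    [
        # Hypothetical data: the stock value is repeated per retailer and style
        (1, "A", "2018-01-01", 2, 10),
        (1, "A", "2018-01-02", 1, 10),
        (2, "A", "2018-01-02", 3, 5),
        (1, "B", "2018-01-01", 1, 6),
    ],
)

# Inner query: one row per retailer and style (MAX collapses the repeated stock).
# Outer query: add the per-retailer figures together for each style.
rows = conn.execute(
    """
    SELECT B.Stylenumber, SUM(B.Sold) AS Sold, SUM(B.Stock) AS Stock
    FROM (SELECT Stylenumber, SUM(Sold) AS Sold, MAX(Stock) AS Stock
          FROM MGTest
          GROUP BY RetailerId, Stylenumber) B
    GROUP BY B.Stylenumber
    ORDER BY B.Stylenumber
    """
).fetchall()

# The naive aggregates from the question, for comparison
naive_max, naive_sum = conn.execute(
    "SELECT MAX(Stock), SUM(Stock) FROM MGTest WHERE Stylenumber = 'A'"
).fetchone()

print(rows, naive_max, naive_sum)
```

If a retailer's stock can differ between its rows, MAX(Stock) would no longer mean "current stock", and picking the retailer's most recent row would be the safer choice.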
My solution, like that of Gordon Linoff, uses window functions. In my case, though, everything revolves around the RANK window function.
SELECT stylenumber, sold, SUM(stock) totalstock
FROM (
SELECT
stylenumber,
SUM(sold) OVER(PARTITION BY stylenumber) sold,
RANK() OVER(PARTITION BY stylenumber ORDER BY [Date] DESC) r,
stock
FROM MGTest
) T
WHERE r = 1
GROUP BY stylenumber, sold

PostgreSQL backward intersection & join

I have a survey form with certain questions for a certain facility.
The facility can be monitored (data entry) more than once in a month.
Now I need the latest data (values) for each question,
but if there is no latest data for a question, I want to fall back to prior records (earlier dates) from the same month.
I can get the latest record, but I don't know how to get the previous record of the same month if there is no latest data.
I am using PostgreSQL 10.
Table Structure is
Desired output is
You can try using the ROW_NUMBER window function:
SELECT to_char(date, 'MON') month,
facility,
idquestion,
value
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY facility,idquestion ORDER BY DATE DESC) rn
FROM T
) t1
where rn = 1
demo:db<>fiddle
SELECT DISTINCT
to_char(qdate, 'MON'),
facility,
idquestion,
first_value(value) OVER (PARTITION BY facility, idquestion ORDER BY qdate DESC) as value
FROM questions
ORDER BY facility, idquestion
Using window functions:
first_value(value) OVER ... gives you the first value of a window. Each window partition is one combination of facility and idquestion. Within that partition the rows are ordered by date descending, so the most recent value comes first, whatever its date is.
DISTINCT removes the duplicate rows this produces (e.g. there are two values for facility == 1 and idquestion == 7).
Please note:
"date" is a reserved word in standard SQL and the name of a data type in Postgres. I strongly recommend renaming your column to avoid trouble. Furthermore, lower-case identifiers are the norm in Postgres and are recommended.
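The first_value idea can be tried out with Python's built-in sqlite3 module. This is only a sketch under several assumptions: the table and desired output were not included in the post, so the rows are invented; "no latest data" is modelled as a NULL value that gets filtered out; and the month is added to the partition so that the fallback stays within the same month, as the question asks. SQLite's strftime stands in for Postgres's to_char, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions (qdate TEXT, facility INTEGER, "
    "idquestion INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO questions VALUES (?, ?, ?, ?)",
    [
        # Hypothetical survey entries
        ("2018-09-05", 1, 7, 10),
        ("2018-09-20", 1, 7, 12),    # latest answer for question 7 in September
        ("2018-09-05", 1, 8, 3),
        ("2018-09-20", 1, 8, None),  # no data on the latest date: fall back to 09-05
        ("2018-10-02", 1, 7, 15),    # a different month gets its own result row
    ],
)

rows = conn.execute(
    """
    SELECT DISTINCT
        strftime('%Y-%m', qdate) AS month,
        facility,
        idquestion,
        first_value(value) OVER (
            PARTITION BY facility, idquestion, strftime('%Y-%m', qdate)
            ORDER BY qdate DESC
        ) AS latest_value
    FROM questions
    WHERE value IS NOT NULL
    ORDER BY month, facility, idquestion
    """
).fetchall()

for row in rows:
    print(row)
```

The WHERE clause runs before the window function is evaluated, which is what lets the earlier September entry for question 8 take over when the later one is empty.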

MAX() for 2 Dates Separately

I am trying to find a way to get the max date from one field and then, to remove duplicates, get the max of a second date field among those rows.
So far I have managed to get the max of the effective dates, but I still need the max timestamp among those rows to remove the duplicates.
Here is what I have so far:
SELECT
a2.CUST_ID
, Address
, Effective_Date --DATE variable
, Timestamp_Entry --DATETIME variable
FROM
(SELECT
CUST_ID
, MAX (Effective_Date) as Most_Effective_Date
FROM Address_Table
GROUP BY CUST_ID) a1
JOIN Address_Table a2
ON a1.CUST_ID = a2.CUST_ID and a1.Most_Effective_Date = a2.Effective_Date
(Some timestamp entries may be newer entries with an older effective date, which is why the Effective Date takes priority; the timestamp should then remove the duplicates.)
I think this is what you want:
select a.*
from (select a.*,
row_number() over (partition by cust_id order by effective_date desc, timestamp_entry desc) as seqnum
from address_table a
) a
where seqnum = 1;
This returns the "most recent" address for each customer based on the two columns.
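A small sketch of that two-column ordering, run with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The addresses and dates are invented for illustration; they include a newer entry with an older effective date, which should not win, and two entries sharing the latest effective date, where the timestamp breaks the tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address_table (cust_id INTEGER, address TEXT, "
    "effective_date TEXT, timestamp_entry TEXT)"
)
conn.executemany(
    "INSERT INTO address_table VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows
        (1, "Old St", "2020-01-01", "2020-01-01 09:00:00"),
        (1, "New Rd", "2020-06-01", "2020-05-20 10:00:00"),
        (1, "New Road", "2020-06-01", "2020-05-21 08:00:00"),    # same date, later entry
        (1, "Late entry", "2019-01-01", "2020-07-01 12:00:00"),  # newest entry, oldest date
        (2, "High St", "2020-03-01", "2020-03-01 09:00:00"),
    ],
)

rows = conn.execute(
    """
    SELECT cust_id, address
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (PARTITION BY cust_id
                                    ORDER BY effective_date DESC,
                                             timestamp_entry DESC) AS seqnum
          FROM address_table a) ranked
    WHERE seqnum = 1
    ORDER BY cust_id
    """
).fetchall()

print(rows)
```

Because effective_date is the first sort key, timestamp_entry only matters among rows that share the latest effective date.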

Select one row per index value with max column value

With a table setup with the following fields:
SKU, EVENTSTARTDATE, EVENTENDDATE, PRICE, (...etc...)
and housing thousands of rows, here is some example data (dates are YYMMDD, century code excluded):
1111111, 101224, 101231, 10.99
1111111, 110208, 110220, 9.99
1111111, 110301, 110331, 8.99
2222222, 101112, 101128, 15.99
2222222, 101201, 110102, 14.99
etc
I'd like to have a SELECT statement return one row per SKU with the maximum EVENTSTARTDATE without having a WHERE clause isolating a specific SKU or incomplete subset of SKUs (desired SELECT statement should return one row per SKU for all SKUs). I'd eventually like to add the criteria that start date is less than or equal to current date, and end date is greater than or equal to current date, but I have to start somewhere first.
Example results desired (for now just max date):
1111111, 110301, 110331, 8.99
2222222, 101201, 110102, 14.99
etc.
In recent versions of DB2, you can use the analytic function ROW_NUMBER():
SELECT *
FROM (
SELECT
tablename.*,
ROW_NUMBER() OVER (PARTITION BY sku
ORDER BY eventstartdate DESC) As RowNum
FROM tablename) X
WHERE X.RowNum=1
For each Partition (group of SKU), the data is row numbered following the order by eventstartdate desc, so 1,2,3,...starting from 1 for the latest EventStartDate. The WHERE clause then picks up only the latest per SKU.
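The ROW_NUMBER query can be checked against the sample rows from the question with Python's built-in sqlite3 module (SQLite 3.25 or newer, standing in for DB2). The table name `events`, the helper `latest_per_sku`, and the sample value for "today" are assumptions. The optional filter sketches the criterion the question mentions for later: start date on or before the current date and end date on or after it. Dates are kept as YYMMDD integers, which only sort correctly within a single century.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (sku INTEGER, eventstartdate INTEGER, "
    "eventenddate INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        # Sample rows from the question (dates are YYMMDD)
        (1111111, 101224, 101231, 10.99),
        (1111111, 110208, 110220, 9.99),
        (1111111, 110301, 110331, 8.99),
        (2222222, 101112, 101128, 15.99),
        (2222222, 101201, 110102, 14.99),
    ],
)


def latest_per_sku(today=None):
    """Return the row with the greatest start date per SKU.

    If today (a YYMMDD integer) is given, only events active on that
    day are considered before the rows are numbered.
    """
    where = ""
    params = {}
    if today is not None:
        where = "WHERE eventstartdate <= :today AND eventenddate >= :today"
        params = {"today": today}
    query = f"""
        SELECT sku, eventstartdate, eventenddate, price
        FROM (SELECT e.*,
                     ROW_NUMBER() OVER (PARTITION BY sku
                                        ORDER BY eventstartdate DESC) AS rownum
              FROM events e
              {where}) x
        WHERE x.rownum = 1
        ORDER BY sku
    """
    return conn.execute(query, params).fetchall()


all_latest = latest_per_sku()
active_on_day = latest_per_sku(today=110215)

print(all_latest)
print(active_on_day)
```

The date filter has to sit inside the subquery: applied after the numbering, it would discard a SKU whose latest event is not active instead of falling back to an earlier one.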
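Have a look at the GROUP BY clause. To restrict the result to events that have already started, filter in WHERE before grouping; HAVING can only reference grouped or aggregated columns, so it cannot test eventstartdate directly.
select sku, max(eventstartdate)
from tablename
where eventstartdate <= sysdate -- assumes eventstartdate is stored as (or converted to) a date
group by sku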
Another solution, using a lateral join:
select distinct f3.*
from yourtable f1
inner join lateral
(
select * from yourtable f2
where f1.SKU=f2.SKU
order by EVENTSTARTDATE desc, EVENTENDDATE desc
fetch first 1 row only
) f3 on 1=1