Suppose we have a table of currencies. The task is to find the last price update for each currency (name).
My attempt:
SELECT name, date, price
FROM (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY name
ORDER BY date DESC
) as RN
FROM currency
) X where RN = 1;
My solution covers virtually everything except two of the USD rows:
| USD | 2006-03-04 | 8 | and | USD | 2007-03-04 | 8 |
Technically, I understand why this happens. I filter on RN = 1, which selects the first row for each currency when its dates are sorted in descending order. In the case above, the last actual price update occurred on 2006-03-04, which is the second row in that order.
However, I have no idea how to write a query that picks the MIN of the dates in the last subgroup.
If you have an idea how to do that, I'll be very thankful!
I think you want the rows where the currency changes. If that interpretation is correct, use lag():
select c.*
from (select c.*, lag(name) over (order by date) as prev_name
from currency c
) c
where prev_name is null or prev_name <> name;
Note that standard SQL has a way to simplify the where clause using a NULL safe comparison:
where prev_name is distinct from name
However, not all databases support this (or similar) syntax.
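To see why the `prev_name is null` branch matters, here is a minimal sketch that runs the query through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The sample rows are made up for illustration and are not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table currency (name text, date text, price integer);
    insert into currency values
        ('USD', '2005-03-04', 7),
        ('USD', '2006-03-04', 8),
        ('EUR', '2007-03-04', 9);
""")

base = """
    select name, date
    from (select c.*, lag(name) over (order by date) as prev_name
          from currency c) c
    where {condition}
    order by date
"""

# Comparing NULL with <> yields NULL, so the very first row is filtered out.
without_null_check = conn.execute(
    base.format(condition="prev_name <> name")).fetchall()

# The explicit NULL test keeps the first row as well.
with_null_check = conn.execute(
    base.format(condition="prev_name is null or prev_name <> name")).fetchall()

print(without_null_check)  # [('EUR', '2007-03-04')]
print(with_null_check)     # [('USD', '2005-03-04'), ('EUR', '2007-03-04')]
```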
EDIT:
I think my above interpretation is incorrect. You want the second to last row because the price does not change. So:
select c.*
from (select c.*, lag(price) over (partition by name order by date) as prev_price
from currency c
) c
where prev_price is null or prev_price <> price;
Then to get one row per currency:
select c.*
from (select c.*,
row_number() over (partition by name order by date desc) as seqnum
from (select c.*, lag(price) over (partition by name order by date) as prev_price
from currency c
) c
where prev_price is null or prev_price <> price
) c
where seqnum = 1
EDIT II:
In Postgres, the last query is more simply written as:
select distinct on (name) c.*
from (select c.*, lag(price) over (partition by name order by date) as prev_price
from currency c
) c
where prev_price is null or prev_price <> price
order by name, date desc
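The "one row per currency" query can be tried end to end with a small sketch like the one below. It uses Python's sqlite3 module (SQLite 3.25 or later for window functions) and invented sample prices, so treat the data as an assumption rather than the asker's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table currency (name text, date text, price integer);
    insert into currency values
        ('USD', '2005-03-04', 7),
        ('USD', '2006-03-04', 8),
        ('USD', '2007-03-04', 8),   -- same price as before: not a real update
        ('EUR', '2006-01-01', 10),
        ('EUR', '2007-01-01', 11);
""")

# Inner query: flag price changes with lag(). Middle query: rank the
# surviving rows per currency, newest first. Outer query: keep rank 1.
last_updates = conn.execute("""
    select name, date, price
    from (select c.*,
                 row_number() over (partition by name order by date desc) as seqnum
          from (select c.*,
                       lag(price) over (partition by name order by date) as prev_price
                from currency c) c
          where prev_price is null or prev_price <> price) c
    where seqnum = 1
    order by name
""").fetchall()

print(last_updates)  # [('EUR', '2007-01-01', 11), ('USD', '2006-03-04', 8)]
```

The USD row dated 2007-03-04 is dropped before ranking because its price equals the previous one, so 2006-03-04 becomes the latest real update.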
I have a product audits table that looks like this:
id  product_id  column_updated  value            timestamp
--  ----------  --------------  ---------------  ----------------------------------------
1   product_1   name            Big Shoes.       "18 September 2022 6:42:50 PM GMT+05:30"
2   product_1   name            Green Shoes      "18 September 2022 6:42:43 PM GMT+05:30"
3   product_1   name            Big Green Shoes  "18 September 2022 6:43:43 PM GMT+05:30"
I want to show a report of the latest change to a column, in a form like the one below:
product_id  column_updated  latest_value  previous_value
----------  --------------  ------------  ---------------
product_1   name            Green Shoes   Big Green Shoes
I have prepared a query to fetch the last 2 records, but I am not sure how to merge them into a view like this.
My query is:
select product_id, column_updated, value
from audits
where product_id = 'product_1'
and column_updated = 'name'
order by timestamp desc
limit 2;
Please suggest an approach for this. Thanks in advance!
You need to use LEAD, a window function, to get the latest and previous values:
WITH CTE as
(select product_id,
column_updated,
value as latest_value,
lead(value) over (
partition by product_id,column_updated order by timestamp desc
) as previous_value,
ROW_NUMBER() over (
partition by product_id,column_updated order by timestamp desc
) rn
from audits
where product_id = 'product_1'
and column_updated = 'name')
SELECT product_id, column_updated,latest_value,previous_value FROM CTE WHERE rn = 1
product_id  column_updated  latest_value     previous_value
----------  --------------  ---------------  --------------
product_1   name            Big Green Shoes  Big Shoes.
This problem can be solved by combination of window functions and CTE:
with data as (
select
product_id,
column_updated,
value,
lag(value) over (partition by product_id, column_updated order by updated_at asc) prev_value,
row_number() over (partition by product_id, column_updated order by updated_at desc) rn,
updated_at
from log
) select
product_id,
column_updated,
value,
prev_value,
updated_at
from data
where rn = 1;
Here lag gives us the previous value, and row_number lets us keep only the latest change.
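As a quick check of the lag plus row_number approach, the sketch below loads the three audit rows from the question into an in-memory SQLite database (3.25 or later) via Python. Storing the timestamps as ISO-8601 text in an `updated_at` column is an assumption made here so that they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table log (id integer, product_id text, column_updated text,
                      value text, updated_at text);
    insert into log values
        (1, 'product_1', 'name', 'Big Shoes.',      '2022-09-18 18:42:50'),
        (2, 'product_1', 'name', 'Green Shoes',     '2022-09-18 18:42:43'),
        (3, 'product_1', 'name', 'Big Green Shoes', '2022-09-18 18:43:43');
""")

# lag() looks one row back in ascending time order; row_number() ranks
# newest first so that rn = 1 is the latest change.
latest_change = conn.execute("""
    with data as (
        select product_id, column_updated, value,
               lag(value) over (partition by product_id, column_updated
                                order by updated_at) as prev_value,
               row_number() over (partition by product_id, column_updated
                                  order by updated_at desc) as rn
        from log
    )
    select product_id, column_updated, value, prev_value
    from data
    where rn = 1
""").fetchall()

print(latest_change)
# [('product_1', 'name', 'Big Green Shoes', 'Big Shoes.')]
```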
You don't need both row_number and lag like the other answers do. You can do it with just row_number: number the rows, then join back to the prior value, which has row number 2.
WITH rownumbered AS (
SELECT product_id, column_updated, value, updated_at,
ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY updated_at DESC) rn
FROM log
)
SELECT d.product_id, d.column_updated, d.value, p.value as previous_value
FROM rownumbered d
JOIN rownumbered p ON d.product_id = p.product_id and p.rn = 2
WHERE d.rn = 1;
You may want to partition by both product_id and column_updated. If that is the case, it looks like this:
WITH rownumbered AS (
SELECT product_id, column_updated, value, updated_at,
ROW_NUMBER() OVER (PARTITION BY product_id, column_updated ORDER BY updated_at DESC) rn
FROM log
)
SELECT d.product_id, d.column_updated, d.value, p.value as previous_value
FROM rownumbered d
JOIN rownumbered p ON d.product_id = p.product_id
and d.column_updated = p.column_updated
and p.rn = 2
WHERE d.rn = 1;
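One thing worth checking with the self-join variant is what happens when a product has only a single audit row. The sketch below (Python's sqlite3, SQLite 3.25 or later) adds a hypothetical `product_2` with one row and compares an inner join with a left join; the second product and the ISO-style timestamps are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table log (product_id text, column_updated text, value text, updated_at text);
    insert into log values
        ('product_1', 'name', 'Big Shoes.',      '2022-09-18 18:42:50'),
        ('product_1', 'name', 'Big Green Shoes', '2022-09-18 18:43:43'),
        ('product_2', 'name', 'Red Hat',         '2022-09-18 18:00:00');  -- hypothetical
""")

query = """
    with rownumbered as (
        select product_id, column_updated, value,
               row_number() over (partition by product_id, column_updated
                                  order by updated_at desc) as rn
        from log
    )
    select d.product_id, d.column_updated, d.value, p.value as previous_value
    from rownumbered d
    {join_type} join rownumbered p
      on d.product_id = p.product_id
     and d.column_updated = p.column_updated
     and p.rn = 2
    where d.rn = 1
    order by d.product_id
"""

inner_rows = conn.execute(query.format(join_type="inner")).fetchall()
left_rows = conn.execute(query.format(join_type="left")).fetchall()

print(inner_rows)  # [('product_1', 'name', 'Big Green Shoes', 'Big Shoes.')]
print(left_rows)   # the same row plus ('product_2', 'name', 'Red Hat', None)
```

With the inner join, a product that has never been changed a second time disappears from the report; a left join keeps it with a NULL previous value.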
Can you use lag to get the previous value? https://www.postgresqltutorial.com/postgresql-window-function/postgresql-lag-function/
select product_id,
column_updated,
value as latest_value,
lag(value) over (
partition by product_id, column_updated order by timestamp
) as previous_value
from ...
I have a Forecast and an Actuals table with Table structures as such:
YearNb, WeekNb, Country, Product, Volume
Now I am working on a third Table with the same structure that combines the two.
I already have a query that is simply importing all the actuals. Now I need to import all the Forecasts that are relevant. This leads to my problem. I only need the Forecasts that have a more recent Date than the actuals. The Forecasts table includes all historic forecasts, most of which are not relevant. I need to make this check on a country level, since we receive this data on a country level and different countries can have more or less recent actuals.
What I already did:
WITH cte AS
(
SELECT Country, YearNb, WeekNb, (YearNb*100 + WeekNb) AS Date,
ROW_NUMBER() OVER (PARTITION BY Country ORDER BY (YearNb*100 + WeekNb) DESC) AS rn
FROM Actuals
)
SELECT *
FROM cte
WHERE rn = 1
This gives me a grouped list per country with the latest date of actual data.
But now I am stuck on how to use this to select the data from the forecast table that has a more recent date.
Country YearNb WeekNb Date
A 2018 29 201829
B 2019 5 201905
C 2018 34 201834
One important thing, I need this data on the product level, so to be in the same structure as the original two tables.
So as final output I need all the Forecast per product for country A after the date 201829, all the data from Country B after the Date 201905 etc.
Try a JOIN on the year field with a condition to get the earlier dates:
SELECT
*
FROM Actuals act
INNER JOIN
(
SELECT Country, YearNb, WeekNb, Date
FROM
(
SELECT
Country, YearNb, WeekNb, (YearNb*100 + WeekNb) AS Date,
ROW_NUMBER() OVER (PARTITION BY Country ORDER BY (YearNb*100 + WeekNb) DESC) AS rn
FROM Actuals
) r
-- a window function cannot appear in WHERE, so filter on the alias instead
WHERE rn = 1
)q ON act.YearNb = q.YearNb and (act.YearNb*100 + act.WeekNb) < q.Date
I would use a dependent query with NOT EXISTS
select YearNb, WeekNb, Country, Product, Volume
from Forecast f
where not exists (
select 1
from Actuals a
where a.country = f.country and
a.YearNb * 100 + a.WeekNb >= f.YearNb * 100 + f.WeekNb
)
This selects the relevant data from your Forecast table. If you are concerned about performance, EXISTS can perform better if you have an index on the country column.
EDIT
If you want to omit forecasts for countries that are not in the actuals, then add a semi-join:
select f.*
from Forecast f
where not exists (
select 1
from Actuals a
where a.country = f.country and
a.YearNb * 100 + a.WeekNb >= f.YearNb * 100 + f.WeekNb
) and
exists(
select 1
from Actuals a
where a.country = f.country
)
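The difference between the two NOT EXISTS variants shows up only for countries without any actuals. Here is a small sketch using Python's sqlite3 with invented rows (products, volumes, and country D are assumptions for illustration) that runs both versions side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Actuals  (YearNb int, WeekNb int, Country text, Product text, Volume int);
    create table Forecast (YearNb int, WeekNb int, Country text, Product text, Volume int);
    insert into Actuals values
        (2018, 28, 'A', 'P1', 10), (2018, 29, 'A', 'P1', 12),
        (2019,  5, 'B', 'P1',  7);
    insert into Forecast values
        (2018, 29, 'A', 'P1', 11),   -- not newer than A's latest actual
        (2018, 30, 'A', 'P1', 13),   -- newer
        (2019,  4, 'B', 'P1',  6),   -- older
        (2019,  6, 'B', 'P2',  9),   -- newer
        (2019,  1, 'D', 'P1',  5);   -- country D has no actuals at all
""")

not_exists = """
    select f.Country, f.YearNb, f.WeekNb, f.Product, f.Volume
    from Forecast f
    where not exists (
        select 1 from Actuals a
        where a.Country = f.Country
          and a.YearNb * 100 + a.WeekNb >= f.YearNb * 100 + f.WeekNb)
"""
guard = " and exists (select 1 from Actuals a where a.Country = f.Country)"
order = " order by f.Country, f.YearNb, f.WeekNb"

all_newer = conn.execute(not_exists + order).fetchall()
only_known_countries = conn.execute(not_exists + guard + order).fetchall()

print(all_newer)             # includes the row for country D
print(only_known_countries)  # country D is filtered out
```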
Using your own CTE you can get it
WITH cte AS
(
SELECT Country, YearNb, WeekNb, (YearNb*100 + WeekNb) AS Date,
ROW_NUMBER() OVER (PARTITION BY Country ORDER BY (YearNb*100 + WeekNb) DESC) AS rn
FROM Actuals
)
SELECT f.*
FROM forecast f
JOIN cte ON f.Country = cte.Country AND cte.date < (f.YearNb*100 + f.WeekNb)
WHERE cte.rn = 1
I would use cross apply:
select f.*, a.*
from (select a.*,
row_number() over (partition by country order by yearnb desc, weeknb desc) as seqnum
from actuals a
) a cross apply
(select f.*
from forecast f
where f.country = a.country and
(f.yearnb > a.yearnb or
f.yearnb = a.yearnb and f.weeknb > a.weeknb
)
) f
where a.seqnum = 1;
This makes it easy to choose columns from both tables.
I'm trying to query some data from SQL such that it sums some columns, gets the max of another column and the corresponding row for a third column. For example,
|dataset|
|shares| |date| |price|
100 05/13/16 20.4
200 05/15/16 21.2
300 06/12/16 19.3
400 02/22/16 20.0
I want my output to be:
|shares| |date| |price|
1000 06/12/16 19.3
The shares have been summed up, the date is max(date), and the price is the price at max(date).
So far, I have:
select sum(shares), max(date), max(price)
but that gives me an incorrect price.
EDIT:
I realize I was unclear in my original post: all the other relevant data is in one table, and the price is in another. My full code is:
select id, stock, side, exchange, max(startdate), max(enddate),
sum(shares), sum(execution_price*shares)/sum(shares), max(limitprice), max(price)
from table1 t1
INNER JOIN table2 t2 on t2.id = t1.id
where location = 'CHICAGO' and startdate > '1/1/2016' and order_type = 'limit'
group by id, stock, side, exchange
You can do this with window functions and aggregation. Here is an example:
select sum(shares), max(date), max(case when seqnum = 1 then price end) as price
from (select t.*, row_number() over (order by date desc) as seqnum
from t
) t;
EDIT:
If the results that you are looking at are in fact the result of a query, you can do:
with t as (<your query here>)
select sum(shares), max(date), max(case when seqnum = 1 then price end) as price
from (select t.*, row_number() over (order by date desc) as seqnum
from t
) t;
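The conditional-aggregation trick can be verified against the sample rows from the question. The sketch below uses Python's sqlite3 (SQLite 3.25 or later); rewriting the dates in ISO format so that they sort chronologically is an assumption made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (shares integer, date text, price real);
    insert into t values
        (100, '2016-05-13', 20.4),
        (200, '2016-05-15', 21.2),
        (300, '2016-06-12', 19.3),
        (400, '2016-02-22', 20.0);
""")

# seqnum = 1 marks the newest row, so the CASE expression is non-NULL only
# there and max() simply picks that single price.
summary = conn.execute("""
    select sum(shares), max(date), max(case when seqnum = 1 then price end) as price
    from (select t.*, row_number() over (order by date desc) as seqnum
          from t) t
""").fetchone()

print(summary)  # (1000, '2016-06-12', 19.3)
```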
Here's one way to do it. The join would obviously also include the ticker symbol for the share:
select
a.sum_share,
a.max_date,
b.price
FROM
(
select ticker , sum(shares) sum_share, max(date) max_date from table where ticker = 'MSFT' group by ticker
) a
inner join table b on a.max_date = b.date and a.ticker = b.ticker
In the SQL space (specifically T-SQL, SQL Server 2008), given this list of values:
Status Date
------ -----------------------
ACT 2012-01-07 11:51:06.060
ACT 2012-01-07 11:51:07.920
ACT 2012-01-08 04:13:29.140
NOS 2012-01-09 04:29:16.873
ACT 2012-01-21 12:39:37.607 <-- THIS
ACT 2012-01-21 12:40:03.840
ACT 2012-05-02 16:27:17.370
GRAD 2012-05-19 13:30:02.503
GRAD 2013-09-03 22:58:48.750
Generated from this query:
SELECT Status, Date
FROM Account_History
WHERE AccountNumber = '1234'
ORDER BY Date
The status for this particular object started at ACT, then changed to NOS, then back to ACT, then to GRAD.
What is the best way to get the minimum date from the latest "group" of records where Status = 'ACT'?
Here is a query that does this, by identifying the groups where the student statuses are the same and then using simple aggregation:
select top 1 StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged,
(row_number() over (order by WhenLastChanged) -
row_number() over (partition by StudentStatus order by WhenLastChanged)
) as grp
FROM Account_History
WHERE AccountNumber = '1234'
) t
where StudentStatus = 'ACT'
group by StudentStatus, grp
order by WhenLastChanged desc;
The row_number() function assigns sequential numbers within groups of rows based on the date. For your data, the two row_numbers() and their difference is:
Status  Date                     seq_all  seq_status  diff
------  -----------------------  -------  ----------  ----
ACT     2012-01-07 11:51:06.060  1        1           0
ACT     2012-01-07 11:51:07.920  2        2           0
ACT     2012-01-08 04:13:29.140  3        3           0
NOS     2012-01-09 04:29:16.873  4        1           3
ACT     2012-01-21 12:39:37.607  5        4           1
ACT     2012-01-21 12:40:03.840  6        5           1
ACT     2012-05-02 16:27:17.370  7        6           1
GRAD    2012-05-19 13:30:02.503  8        1           7
GRAD    2013-09-03 22:58:48.750  9        2           7
Notice that the last column is constant for adjacent rows that have the same status.
The aggregation brings these together and chooses the latest (top 1 . . . order by date desc) of the first dates (min(date)).
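The row-number difference can be reproduced on the sample data with the sketch below. It runs in SQLite (3.25 or later) through Python rather than in SQL Server, so `limit 1` stands in for `top 1`, and it uses the `Status` and `Date` column names from the question; both substitutions are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Account_History (AccountNumber text, Status text, Date text)")
rows = [
    ("ACT", "2012-01-07 11:51:06.060"), ("ACT", "2012-01-07 11:51:07.920"),
    ("ACT", "2012-01-08 04:13:29.140"), ("NOS", "2012-01-09 04:29:16.873"),
    ("ACT", "2012-01-21 12:39:37.607"), ("ACT", "2012-01-21 12:40:03.840"),
    ("ACT", "2012-05-02 16:27:17.370"), ("GRAD", "2012-05-19 13:30:02.503"),
    ("GRAD", "2013-09-03 22:58:48.750"),
]
conn.executemany("insert into Account_History values ('1234', ?, ?)", rows)

# Overall sequence minus per-status sequence: constant within each run.
islands = """
    select Status, Date,
           row_number() over (order by Date)
         - row_number() over (partition by Status order by Date) as grp
    from Account_History
    where AccountNumber = '1234'
"""

grp_values = [r[2] for r in conn.execute(islands + " order by Date")]
print(grp_values)  # [0, 0, 0, 3, 1, 1, 1, 7, 7]

# First date of each ACT run, newest run first, keep one row.
latest_act_start = conn.execute(f"""
    select Status, min(Date) as first_date
    from ({islands}) t
    where Status = 'ACT'
    group by Status, grp
    order by first_date desc
    limit 1
""").fetchone()
print(latest_act_start)  # ('ACT', '2012-01-21 12:39:37.607')
```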
EDIT:
The query is easy to tweak for multiple account numbers. I probably should have written it that way to begin with, except the final selection is trickier. The results from this have the date for each status and account:
select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged, AccountNumber,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
order by WhenLastChanged desc;
But you can't get the last one per account quite so easily. Another level of subqueries:
select AccountNumber, StudentStatus, WhenLastChanged
from (select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged,
row_number() over (partition by AccountNumber, StudentStatus order by min(WhenLastChanged) desc
) as seqnum
from (SELECT AccountNumber, StudentStatus, WhenLastChanged,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
) t
where seqnum = 1;
This uses aggregation along with the window function row_number(). This is assigning sequential numbers to the groups (after aggregation), with the last date for each account getting a value of 1 (order by min(WhenLastChanged) desc). The outermost select then just chooses that row for each account.
SELECT [Status], MIN([Date])
FROM Table_Name
WHERE [Status] = (SELECT [Status]
FROM Table_Name
WHERE [Date] = (SELECT MAX([Date])
FROM Table_Name)
)
GROUP BY [Status]
Hogan: basically, yes. I just want to know the date/time when the
account was last changed to ACT. The records after the point above
marked THIS are just extra.
Instead of just looking for ACT, we can look for the rows where the status changes and select ACT (and the max date) from those.
So, every time a status changes:
with rownumb as
(
select *, row_number() OVER (order by date asc) as rn
from Account_History
where AccountNumber = '1234'
)
select B.status, B.date
from rownumb A
join rownumb B on A.rn = B.rn-1
where A.status != B.status
Now find the max of the ACT items:
with rownumb as
(
select *, row_number() OVER (order by date asc) as rn
from Account_History
where AccountNumber = '1234'
), statuschange as
(
select B.status, B.date
from rownumb A
join rownumb B on A.rn = B.rn-1
where A.status != B.status
)
select max(date)
from statuschange
where status = 'ACT'
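The status-change idea can be sketched on the sample data as follows, using Python's sqlite3 (SQLite 3.25 or later). The example takes the status and date from the later row of each adjacent pair and reads from `Account_History` as in the question; those details are assumptions about what the answer intends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Account_History (AccountNumber text, Status text, Date text)")
rows = [
    ("ACT", "2012-01-07 11:51:06.060"), ("ACT", "2012-01-07 11:51:07.920"),
    ("ACT", "2012-01-08 04:13:29.140"), ("NOS", "2012-01-09 04:29:16.873"),
    ("ACT", "2012-01-21 12:39:37.607"), ("ACT", "2012-01-21 12:40:03.840"),
    ("ACT", "2012-05-02 16:27:17.370"), ("GRAD", "2012-05-19 13:30:02.503"),
    ("GRAD", "2013-09-03 22:58:48.750"),
]
conn.executemany("insert into Account_History values ('1234', ?, ?)", rows)

# Pair each row (a) with the row right after it (b); keep b when the
# status differs, i.e. b is the first row of a new status.
cte = """
    with rownumb as (
        select Status, Date, row_number() over (order by Date) as rn
        from Account_History
        where AccountNumber = '1234'
    ), statuschange as (
        select b.Status, b.Date
        from rownumb a
        join rownumb b on a.rn = b.rn - 1
        where a.Status <> b.Status
    )
"""

changes = conn.execute(cte + " select Status, Date from statuschange order by Date").fetchall()
last_act = conn.execute(
    cte + " select max(Date) from statuschange where Status = 'ACT'").fetchone()[0]

print(changes)   # NOS, ACT and GRAD transitions, in date order
print(last_act)  # 2012-01-21 12:39:37.607
```

Note that the very first row has no predecessor, so the account's initial status is not reported as a change; if an account never left ACT, this query would return NULL.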
Consider the below:
ProductID Supplier
--------- --------
111 Microsoft
112 Microsoft
222 Apple Mac
222 Apple
223 Apple
In this example product 222 is repeated because the supplier is known as two names in the data supplied.
I have data like this for thousands of products. How can I delete the duplicate products or select individual results - something like a self join with SELECT TOP 1 or something like that?
Thanks!
I think you want to do the following:
select t.*
from (select t.*,
row_number() over (partition by product_id order by (select NULL)) as seqnum
from t
) t
where seqnum = 1
This selects an arbitrary row for each product.
To delete all rows but one, you can use the same idea:
with todelete as (
select t.*,
row_number() over (partition by product_id order by (select NULL)) as seqnum
from t
)
delete from todelete where seqnum > 1
DELETE a
FROM tableName a
LEFT JOIN
(
SELECT Supplier, MIN(ProductID) min_ID
FROM tableName
GROUP BY Supplier
) b ON a.supplier = b.supplier AND
a.ProductID = b.min_ID
WHERE b.Supplier IS NULL
Or, if you want to delete the extra rows for any ProductID that appears more than once:
WITH cte
AS
(
SELECT ProductID, Supplier,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Supplier) rn
FROM tableName
)
DELETE FROM cte WHERE rn > 1
;WITH Products_CTE AS
(
SELECT ProductID, Supplier,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY <some value>) as rn
FROM PRODUCTS
)
SELECT *
FROM Products_CTE
WHERE rn = 1
Here <some value> is the key that determines which version of Supplier you keep. If you want the first instance of the supplier, you could order by a DateAdded column, if one exists.
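To illustrate how the ORDER BY key decides which duplicate survives, here is a sketch on the sample products using Python's sqlite3 (SQLite 3.25 or later). Ordering by Supplier is just one possible key, and because SQLite cannot delete through a CTE the way SQL Server can, the delete goes through the implicit `rowid`; both choices are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Products (ProductID integer, Supplier text);
    insert into Products values
        (111, 'Microsoft'), (112, 'Microsoft'),
        (222, 'Apple Mac'), (222, 'Apple'), (223, 'Apple');
""")

# Rank the rows of each ProductID; rn = 1 is the row to keep.
ranked = """
    select rowid as rid, ProductID, Supplier,
           row_number() over (partition by ProductID order by Supplier) as rn
    from Products
"""

kept = conn.execute(
    f"select ProductID, Supplier from ({ranked}) where rn = 1 order by ProductID"
).fetchall()
print(kept)  # one row per ProductID; 222 keeps 'Apple', which sorts first

# Remove every row ranked after the first within its ProductID.
conn.execute(
    f"delete from Products where rowid in (select rid from ({ranked}) where rn > 1)")
remaining = conn.execute(
    "select ProductID, Supplier from Products order by ProductID").fetchall()
print(remaining)
```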