Get most recent record from Right table with sub query - sql

When I join to the right table I am getting way too many duplicates. I am trying to grab the most recent record from the right table however, it does not matter what I try it does not work.
So Far I have tried:
PROC SQL;
CREATE TABLE fs1.sample AS
SELECT A.*,
B.xx1,
max(B.time_s)
FROM lx1.results a left join (Select Distinct C.id, c.per FROM lx2.results c
Where c.id = a.id
and COMPGED(a.txt1, c.txt1,'i') < 100
and c.dt > a.dt
and c.ksv = 37
and datepart(c.lsg) >= '12DEC2020'd ) b
ON a.id = b.id
group by a.id, a.txt1
QUIT;
Unfortunately, I get an error. I also tried using case when exists, but that takes way too long. Essentially I am trying to grab the most recent record from the right table based on time_s. I also want to make sure the record I grab from the right table somewhat matches a.txt1.
Cheers

When you perform a join, you attach all records from the table that match your join conditions.
If the table is indexed appropriately, a subquery could achieve the goal of obtaining the most recent value, however, if the query uses the wrong index, TOP or equivalent functions may return the wrong result.
There are a number of ways to accomplish the task of retrieving the most recent record but they are contingent on a couple of things.
Firstly, you need to be able to identify what the most recent row is, usually by a column called CreatedDate or something similar against the IDs. (You should know what that business logic is, it may be that the table is chronologically entered [as most tables are] and therefore, SubID might be a thing. We're going to assume it is CreatedDate.)
Secondly, you need to rank the rows in terms of the CreatedDate in a descending order so that the newest matching ID is ranked 1.
Finally, you filter your results by 1 to return the newest result, but you could also filter by <= x if you are interested in the top x newest return results per ID.
To use more mathematical language: We are deriving a value from the CreatedDate and ID values and then using that derivative value to sort and filter the data. In this case we are deriving the RowNumber from the CreatedDate in descending order for each ID.
In order to accomplish this, you can use the Windowed Function ROW_NUMBER(),
ROW_NUMBER() OVER (PARTITION BY id ORDER BY CreatedDate DESC) as RankID
This windowed function will return a row value for each ID relative to the CreatedDate in descending order, where the newest created date is equal to 1.
You can then put brackets around the whole query to make it into a table so you will be able to filter the results of that Windowed Function.
SELECT id, txt
(SELECT id, txt
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY CreatedDate DESC) as RankID
FROM SourceTable) A
WHERE RankID = 1
This should achieve your goal of returning the "newest result".
What ever your column is that determines the age of the data relative to the ID, it can be multiple, should be placed within the ORDER BY.
In order to make this query perform faster, you should index your data appropriately, whereby ID is the the first column, and CreatedDate Desc is your next column. This means your system will not have to perform a costly sort every time this runs, but that depends on whether you plan on using this query often and how much overhead it is grabbing.

Related

SQL Server : verify that two columns are in same sort order

I have a table with an ID and a date column. It's possible (likely) that when a new record is created, it gets the next larger ID and the current datetime. So if I were to sort by date or I were to sort by ID, the resulting data set would be in the same order.
How do I write a SQL query to verify this?
It's also possible that an older record is modified and the date is updated. In that case, the records would not be in the same sort order. I don't think this happens.
I'm trying to move the data to another location, and if I know that there are no modified records, that makes it a lot simpler.
I'm pretty sure I only need to query those two columns: ID, RecordDate. Other links indicate I should be able to use LAG, but I'm getting an error that it isn't a built-in function name.
In other words, both https://dba.stackexchange.com/questions/42985/running-total-to-the-previous-row and Is there a way to access the "previous row" value in a SELECT statement? should help, but I'm still not able to make that work for what I want.
If you cannot use window functions, you can use a correlated subquery and EXISTS.
SELECT *
FROM elbat t1
WHERE EXISTS (SELECT *
FROM elbat t2
WHERE t2.id < t1.id
AND t2.recorddate > t1.recorddate);
It'll select all records where another record with a lower ID and a greater timestamp exists. If the result is empty you know that no such record exists and the data is like you want it to be.
Maybe you want to restrict it a bit more by using t2.recorddate >= t1.recorddate instead of t2.recorddate > t1.recorddate. I'm not sure how you want it.
Use this:
SELECT ID, RecordDate FROM tablename t
WHERE
(SELECT COUNT(*) FROM tablename WHERE tablename.ID < t.ID)
<>
(SELECT COUNT(*) FROM tablename WHERE tablename.RecordDate < t.RecordDate);
It counts for each row, how many rows have id less than the row's id and
how many rows have RecordDate less than the row's RecordDate.
If these counters are not equal then it outputs this row.
The result is all the rows that would not be in the same position after sorting by ID and RecordDate
One method uses window functions:
select count(*)
from (select t.*,
row_number() over (order by id) as seqnum_id,
row_number() over (order by date, id) as seqnum_date
from t
) t
where seqnum_id <> seqnum_date;
When the count is zero, then the two columns have the same ordering. Note that the second order by includes id. Two rows could have the same date. This makes the sort stable, so the comparison is valid even when date has duplicates.
the above solutions are all good but if both dates and ids are in increment then this should also work
select modifiedid=t2.id from
yourtable t1 join yourtable t2
on t1.id=t2.id+1 and t1.recordDate<t2.recordDate

SQL Server, get most recent record within a specified date range

I'm trying to figure out how to do this via a view if possible (definitely aware this can be done in-line, via a function, and/or a proc.
There's a view that needs to dedupe a dataset and pick the most recent record. So I'm trying to use wither row_number(), or even a top 1 order by via a cross apply, but the problem is that the query can filter on a date, so for eg.
select x
from view
where date < somedate
and the most recent record needs to be calculated for that filtered dataset. Is there a way to do this in a view? a correlated subquery comes to mind but try as I may, either I get the full duped dataset, or it picks the most recent on the table without a date filter, then applies the date filter after the fact, which isn't the same thing.
Some background per Yogesh:
The table in question contains history of an employee table, where each employee_id can exist multiple times with different date values. There is a primary key on this table which is an employeehistory_id (identity). The goal is to get the most recent record for all employees (1 unique record per employee) where the date < some date. The problem with the windowing is that it needs to almost have the date filter in a subquery within the view (from what I'm seeing). Hopefully that helps clarify the answer.
Currently the view would be something like
SELECT a.*
FROM employeehistory a
join (select employee_id, employeehistory_ID, row_number()OVER(PARTITION
BY employee_id ORDER BY Date DESC) as Ranked
FROM employeehistory) b
on a.employee_id = b.employee_id
and a.employeehistory_ID = b.employeehistory_ID
where b.Ranked = 1
so as you can see, filtering the view with a date, doesn't propagate necessarily to the inner. So asking to see if there is a way to still keep this functional in a view. Once again, I'm aware this can be done either as a function or a proc. Thanks!
We're using SQL Server 2016 Enterprise edition.
how about
select top 1 x
from view
where date < somedate
order by date desc
SELECT TOP 1 x
FROM [View]
WHERE
date < somedate
ORDER BY
date DESC
select *
from MyView
where myDate < somedate
order by myDate desc
limit 1;
limit 1 is the same as top 1 but for MySQL. I have a fiddle that shows what I think your expected output should look like for it as well.
http://sqlfiddle.com/#!9/209226/7

Retrieving last record in each group from database with additional max() condition in MSSQL

This is a follow-up question to Retrieving last record in each group from database - SQL Server 2005/2008
In the answers, this example was provided to retrieve last record for a group of parameters (example below retrieves last updates for each value in computername):
select t.*
from t
where t.lastupdate = (select max(t2.lastupdate)
from t t2
where t2.computername = t.computername
);
In my case, however, "lastupdate" is not unique (some updates come in batches and have same lastupdate value, and if two updates of "computername" come in the same batch, you will get non-unique output for "computername + lastupdate").
Suppose I also have field "rowId" that is just auto-incremental. The mitigation would be to include in the query another criterion for a max('rowId') field.
NB: while the example employs time-specific name "lastupdate", the actual selection criteria may not be related to the time at all.
I, therefore, like to ask, what would be the most performant query that selects the last record in each group based both on "group-defining parameter" (in the case above, "computername") and on maximal rowId?
If you don't have uniqueness, then row_number() is simpler:
select t.*
from (select t.*,
row_number() over (partition by computername order by lastupdate, rowid desc) as seqnum
from t
) t
where seqnum = 1;
With the right indexes, the correlated subquery is usually faster. However, the performance difference is not that great.

Get 5 most recent records for each Number

Data I have a table in Access that has a Part Number and PriceYr and Price associated to each Part Number.There are over 10,000 records and the PartNumber are repeated and has different PriceYr and Price associated to it. However, I need a query to just find the 5 most recent price and date associated with it.
I tried using MAX(PriceYr) however, it only returns 1 most recent record for each PartNumber.
I also tried the following query but it doesn't seem to work.
SELECT Catalogs.PartNumber,Catalogs.PriceYr, Catalogs.Price FROM Catalogs
WHERE Catalogs.PriceYr in
(SELECT TOP 5 Catalogs.PriceYr
FROM Catalogs as Temp
WHERE Temp.PartNumber = Catalogs.PartNumber
ORDER By Catalogs.PriceYr DESC)
Any help will be greatly appreciated. Thank you.
Desired Result that i am trying to get.
Consider a correlated count subquery to filter by a rank variable. Right now, you pull top 5 overall on matching PartNumber not per PartNumber.
SELECT main.*
FROM
(SELECT c.PartNumber, c.PriceYr, c.Price,
(SELECT Count(*)
FROM Catalogs AS Temp
WHERE Temp.PartNumber = c.PartNumber
AND Temp.PriceYr >= c.PriceYr) As rank
FROM Catalogs c
) As main
WHERE main.rank <= 5
MAX() is an aggregating function, meaning that it groups all the data and takes the maximal value in the specified column. You need to use a GROUP BY statement to prevent the query from grouping the whole dataset in a single row.
On the other hand, your query seems to needlessly use a subquery. The following query should work quite fine :
SELECT TOP 5 c.PartNumber, c.PriceYr, c.Price
FROM Catalogs c
ORDER BY c.PriceYr DESC
WHERE c.PartNumber = #partNumber -- if you want the query to
-- work on a specific part number
(please post a table creation query to make sure this example works)

Oracle Select Max Date on Multiple records

I've got the following SELECT statement, and based on what I've seen here: SQL Select Max Date with Multiple records I've got my example set up the same way. I'm on Oracle 11g. Instead of returning one record for each asset_tag, it's returning multiples. Not as many records as in the source table, but more than (I think) it should be. If I run the inner SELECT statement, it also returns the correct set of records (1 per asset_tag), which really has me stumped.
SELECT
outside.asset_tag,
outside.description,
outside.asset_type,
outside.asset_group,
outside.status_code,
outside.license_no,
outside.rentable_yn,
outside.manufacture_code,
outside.model,
outside.manufacture_vin,
outside.vehicle_yr,
outside.meter_id,
outside.mtr_uom,
outside.mtr_reading,
outside.last_read_date
FROM mp_vehicle_asset_profile outside
RIGHT OUTER JOIN
(
SELECT asset_tag, max(last_read_date) as last_read_date
FROM mp_vehicle_asset_profile
group by asset_tag
) inside
ON outside.last_read_date=inside.last_read_date
Any suggestions?
Try with analytical functions:
SELECT outside.asset_tag,
outside.description,
outside.asset_type,
outside.asset_group,
outside.status_code,
outside.license_no,
outside.rentable_yn,
outside.manufacture_code,
outside.model,
outside.manufacture_vin,
outside.vehicle_yr,
outside.meter_id,
outside.mtr_uom,
outside.mtr_reading,
outside.last_read_date
FROM ( SELECT *, ROW_NUMBER() OVER(PARTITION BY asset_tag ORDER BY last_read_date DESC) Corr
FROM mp_vehicle_asset_profile) outside
WHERE Corr = 1
I think you need to add...
AND outside.asset_tag=inside.asset_tag
...to the criteria in your ON list.
Also a RIGHT OUTER JOIN is not needed. An INNER JOIN will give the same results (and may be more efficicient), since there will be cannot be be combinations of asset_tag and last_read_date in the subquery that do not exist in mp_vehicle_asset_profile.
Even then, the query may return more than one row per asset tag if there are "ties" -- that is, multiple rows with the same last_read_date. In contrast, #Lamak's analytic-based answer will arbitrarily pick exactly one row this situation.
Your comment suggests that you want to break ties by picking the row with highest mtr_reading for the last_read_date.
You could modify #Lamak's analyic-based answer to do this by changing the ORDER BY in the OVER clause to:
ORDER BY last_read_date DESC, mtr_reading DESC
If there are still ties (that is, multiple rows with the same asset_tag, last_read_date, and mtr_reading), the query will again abritrarily pick exactly one row.
You could modify my aggregate-based answer to break ties using highest mtr_reading as follows:
SELECT
outside.asset_tag,
outside.description,
outside.asset_type,
outside.asset_group,
outside.status_code,
outside.license_no,
outside.rentable_yn,
outside.manufacture_code,
outside.model,
outside.manufacture_vin,
outside.vehicle_yr,
outside.meter_id,
outside.mtr_uom,
outside.mtr_reading,
outside.last_read_date
FROM
mp_vehicle_asset_profile outside
INNER JOIN
(
SELECT
asset_tag,
MAX(last_read_date) AS last_read_date,
MAX(mtr_reading) KEEP (DENSE_RANK FIRST ORDER BY last_read_date DESC) AS mtr_reading
FROM
mp_vehicle_asset_profile
GROUP BY
asset_tag
) inside
ON
outside.asset_tag = inside.asset_tag
AND
outside.last_read_date = inside.last_read_date
AND
outside.mtr_reading = inside.mtr_reading
If there are still ties (that is, multiple rows with the same asset_tag, last_read_date, and mtr_reading), the query may again return more than one row.
One other way that the analytic- and aggregate-based answers differ is in their treatment of nulls. If any of asset_tag, last_read_date, or mtr_reading are null, the analytic-based answer will return related rows, but the aggregate-based one will not (because the equality conditions in the join do not evaluate to TRUE when a null is involved.