SQL Server: get most recent record within a specified date range

I'm trying to figure out how to do this via a view if possible (I'm definitely aware this can be done in-line, via a function, and/or via a proc).
There's a view that needs to dedupe a dataset and pick the most recent record. So I'm trying to use either ROW_NUMBER() or a TOP 1 ... ORDER BY via a CROSS APPLY, but the problem is that the query can filter on a date, for example:
select x
from view
where date < somedate
and the most recent record needs to be calculated for that filtered dataset. Is there a way to do this in a view? A correlated subquery comes to mind, but try as I might, either I get the full duplicated dataset, or it picks the most recent record in the table without the date filter and then applies the date filter after the fact, which isn't the same thing.
Some background per Yogesh:
The table in question contains the history of an employee table, where each employee_id can exist multiple times with different date values. There is a primary key on this table, employeehistory_id (an identity column). The goal is to get the most recent record for every employee (one unique record per employee) where the date < some date. The problem with the windowing approach is that it essentially needs the date filter inside a subquery within the view (from what I'm seeing). Hopefully that helps clarify the question.
Currently the view would be something like
SELECT a.*
FROM employeehistory a
join (select employee_id, employeehistory_ID,
row_number() OVER (PARTITION BY employee_id ORDER BY Date DESC) as Ranked
FROM employeehistory) b
on a.employee_id = b.employee_id
and a.employeehistory_ID = b.employeehistory_ID
where b.Ranked = 1
So as you can see, filtering the view by date doesn't necessarily propagate to the inner query. I'm asking whether there is a way to keep this working as a view. Once again, I'm aware this can be done as either a function or a proc. Thanks!
We're using SQL Server 2016 Enterprise edition.
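To make the difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later); the sample rows are hypothetical. It compares the view as written (rank first, filter afterwards) with ranking after the date filter, and adds one possible view-friendly alternative that is not from the thread: expose each row's validity interval with LEAD (SQL Server 2012 and later), so that "most recent as of a cutoff" becomes an ordinary predicate applied outside the view.

```python
import sqlite3

# Hypothetical history rows: (employeehistory_id, employee_id, Date)
ROWS = [
    (1, 1, "2020-01-01"),
    (2, 1, "2020-06-01"),
    (3, 1, "2021-01-01"),
    (4, 2, "2020-03-01"),
]


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employeehistory ("
        "employeehistory_id INTEGER PRIMARY KEY, employee_id INTEGER, Date TEXT)"
    )
    conn.executemany("INSERT INTO employeehistory VALUES (?, ?, ?)", ROWS)
    # The view from the question: ROW_NUMBER is computed over the whole table,
    # so an outer date filter can only remove rows that were already chosen.
    conn.execute("""
        CREATE VIEW ranked_view AS
        SELECT a.*
        FROM employeehistory a
        JOIN (SELECT employee_id, employeehistory_id,
                     ROW_NUMBER() OVER (PARTITION BY employee_id
                                        ORDER BY Date DESC) AS Ranked
              FROM employeehistory) b
          ON a.employee_id = b.employee_id
         AND a.employeehistory_id = b.employeehistory_id
        WHERE b.Ranked = 1""")
    # Alternative (not from the thread): each row carries the Date of the next
    # row for the same employee, i.e. the end of its validity interval.
    conn.execute("""
        CREATE VIEW interval_view AS
        SELECT employeehistory_id, employee_id, Date,
               LEAD(Date) OVER (PARTITION BY employee_id
                                ORDER BY Date, employeehistory_id) AS NextDate
        FROM employeehistory""")
    return conn


def rank_then_filter(conn, cutoff):
    # What filtering the question's view does: rank first, filter afterwards.
    sql = "SELECT employeehistory_id FROM ranked_view WHERE Date < ? ORDER BY employee_id"
    return [r[0] for r in conn.execute(sql, (cutoff,))]


def filter_then_rank(conn, cutoff):
    # The desired semantics: apply the date filter, then pick the newest row.
    sql = """
        SELECT employeehistory_id FROM (
            SELECT employeehistory_id, employee_id,
                   ROW_NUMBER() OVER (PARTITION BY employee_id
                                      ORDER BY Date DESC, employeehistory_id DESC) AS rn
            FROM employeehistory
            WHERE Date < ?) AS filtered
        WHERE rn = 1
        ORDER BY employee_id"""
    return [r[0] for r in conn.execute(sql, (cutoff,))]


def interval_as_of(conn, cutoff):
    # Newest row before the cutoff = a row before the cutoff whose successor
    # (if any) is not before the cutoff. The filter stays outside the view.
    sql = """
        SELECT employeehistory_id FROM interval_view
        WHERE Date < ? AND (NextDate IS NULL OR NextDate >= ?)
        ORDER BY employee_id"""
    return [r[0] for r in conn.execute(sql, (cutoff, cutoff))]


conn = connect()
print(rank_then_filter(conn, "2020-12-31"))  # → [4]  (employee 1 disappears)
print(filter_then_rank(conn, "2020-12-31"))  # → [2, 4]
print(interval_as_of(conn, "2020-12-31"))    # → [2, 4]
```

The trade-off in this sketch is that callers must write `Date < @d AND (NextDate IS NULL OR NextDate >= @d)` rather than just `Date < @d`; whether SQL Server uses an index on (employee_id, Date) efficiently for this is something to confirm in the actual execution plan.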

How about:
select top 1 x
from view
where date < somedate
order by date desc

SELECT TOP 1 x
FROM [View]
WHERE
date < somedate
ORDER BY
date DESC

select *
from MyView
where myDate < somedate
order by myDate desc
limit 1;
LIMIT 1 is the MySQL equivalent of TOP 1. I have a fiddle that shows what I think your expected output should look like as well.
http://sqlfiddle.com/#!9/209226/7

Related

Get most recent record from Right table with sub query

When I join to the right table I am getting way too many duplicates. I am trying to grab the most recent record from the right table; however, no matter what I try, it does not work.
So far I have tried:
PROC SQL;
CREATE TABLE fs1.sample AS
SELECT A.*,
B.xx1,
max(B.time_s)
FROM lx1.results a left join (Select Distinct C.id, c.per FROM lx2.results c
Where c.id = a.id
and COMPGED(a.txt1, c.txt1,'i') < 100
and c.dt > a.dt
and c.ksv = 37
and datepart(c.lsg) >= '12DEC2020'd ) b
ON a.id = b.id
group by a.id, a.txt1
QUIT;
Unfortunately, I get an error. I also tried using case when exists, but that takes way too long. Essentially I am trying to grab the most recent record from the right table based on time_s. I also want to make sure the record I grab from the right table somewhat matches a.txt1.
Cheers
When you perform a join, you attach all records from the table that match your join conditions.
If the table is indexed appropriately, a subquery can achieve the goal of obtaining the most recent value; however, TOP (or an equivalent) without an explicit ORDER BY returns whichever rows the chosen index happens to produce, so it may return the wrong result.
There are a number of ways to accomplish the task of retrieving the most recent record but they are contingent on a couple of things.
Firstly, you need to be able to identify which row is the most recent, usually by a column called CreatedDate or something similar alongside the IDs. (You should know what that business logic is; if the table is populated chronologically [as most tables are], an increasing SubID might serve the purpose. We're going to assume it is CreatedDate.)
Secondly, you need to rank the rows in terms of the CreatedDate in a descending order so that the newest matching ID is ranked 1.
Finally, you filter your results for rank 1 to return the newest result, but you could also filter by <= x if you are interested in the x newest results per ID.
To use more mathematical language: we are deriving a value from the CreatedDate and ID values and then using that derived value to sort and filter the data. In this case we are deriving the row number from the CreatedDate in descending order for each ID.
To accomplish this, you can use the window function ROW_NUMBER():
ROW_NUMBER() OVER (PARTITION BY id ORDER BY CreatedDate DESC) as RankID
This window function returns a row number for each row within its ID, ordered by CreatedDate descending, so the newest created date gets 1.
You can then wrap the whole query in parentheses to make it a derived table, so that you can filter on the result of the window function:
SELECT id, txt
FROM (SELECT id, txt
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY CreatedDate DESC) as RankID
FROM SourceTable) A
WHERE RankID = 1
This should achieve your goal of returning the "newest result".
Whatever column (or columns) determines the age of the data relative to the ID should be placed in the ORDER BY.
To make this query faster, you should index your data appropriately, with ID as the first column and CreatedDate DESC as the next. This means your system will not have to perform a costly sort every time the query runs, though whether that is worth it depends on how often you plan to run this query and how much overhead it carries.
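The ranking steps above can be sketched as follows, again using Python's sqlite3 as a stand-in database (SQLite 3.25+ for ROW_NUMBER). SourceTable and CreatedDate are the answer's illustrative names and the sample rows are hypothetical; the x parameter shows the "x newest per ID" variant.

```python
import sqlite3

# Hypothetical SourceTable rows: (id, txt, CreatedDate)
SAMPLE = [
    (1, "a-old", "2021-01-01"),
    (1, "a-mid", "2021-02-01"),
    (1, "a-new", "2021-03-01"),
    (2, "b-only", "2021-01-15"),
]


def newest_per_id(rows, x=1):
    """Return (id, txt) for the x newest rows of each id, newest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SourceTable (id INTEGER, txt TEXT, CreatedDate TEXT)")
    conn.executemany("INSERT INTO SourceTable VALUES (?, ?, ?)", rows)
    # Index shape suggested in the answer: id first, then CreatedDate descending.
    conn.execute("CREATE INDEX ix_source_id_created ON SourceTable (id, CreatedDate DESC)")
    sql = """
        SELECT id, txt
        FROM (SELECT id, txt, CreatedDate,
                     ROW_NUMBER() OVER (PARTITION BY id
                                        ORDER BY CreatedDate DESC) AS RankID
              FROM SourceTable) A
        WHERE RankID <= ?
        ORDER BY id, CreatedDate DESC"""
    result = conn.execute(sql, (x,)).fetchall()
    conn.close()
    return result


print(newest_per_id(SAMPLE))       # → [(1, 'a-new'), (2, 'b-only')]
print(newest_per_id(SAMPLE, x=2))  # → [(1, 'a-new'), (1, 'a-mid'), (2, 'b-only')]
```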

Get 5 most recent records for each Number

I have a table in Access that has a PartNumber, and a PriceYr and Price associated with each PartNumber. There are over 10,000 records; the PartNumbers are repeated, each with different PriceYr and Price values. However, I need a query that finds just the 5 most recent prices and the dates associated with them.
I tried using MAX(PriceYr); however, it only returns the single most recent record for each PartNumber.
I also tried the following query but it doesn't seem to work.
SELECT Catalogs.PartNumber,Catalogs.PriceYr, Catalogs.Price FROM Catalogs
WHERE Catalogs.PriceYr in
(SELECT TOP 5 Catalogs.PriceYr
FROM Catalogs as Temp
WHERE Temp.PartNumber = Catalogs.PartNumber
ORDER By Catalogs.PriceYr DESC)
Any help will be greatly appreciated. Thank you.
Desired result that I am trying to get.
Consider a correlated count subquery to compute a rank and filter on it. Right now, your subquery pulls the top 5 overall for the matching PartNumber rather than per PartNumber.
SELECT main.*
FROM
(SELECT c.PartNumber, c.PriceYr, c.Price,
(SELECT Count(*)
FROM Catalogs AS Temp
WHERE Temp.PartNumber = c.PartNumber
AND Temp.PriceYr >= c.PriceYr) As rank
FROM Catalogs c
) As main
WHERE main.rank <= 5
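As a rough check of the correlated-count approach, the sketch below runs the same query shape in SQLite through Python's sqlite3, standing in for Access. The data is hypothetical, and the alias is renamed from rank to rnk to avoid any clash with the RANK function name. Because it uses no window functions, the same SQL shape should also work in engines that lack them.

```python
import sqlite3

# Hypothetical catalog: part "A" has 8 price years, part "B" has 2.
ROWS = [("A", yr, 10.0 + (yr - 2015)) for yr in range(2015, 2023)] + [
    ("B", 2020, 5.0),
    ("B", 2021, 5.5),
]


def most_recent_prices(rows, n=5):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Catalogs (PartNumber TEXT, PriceYr INTEGER, Price REAL)")
    conn.executemany("INSERT INTO Catalogs VALUES (?, ?, ?)", rows)
    # rnk = number of rows of the same part with PriceYr >= this row's PriceYr.
    # With unique years per part the newest gets 1, the next 2, and so on;
    # tied years share the same (higher) number.
    sql = """
        SELECT main.PartNumber, main.PriceYr, main.Price
        FROM (SELECT c.PartNumber, c.PriceYr, c.Price,
                     (SELECT COUNT(*)
                      FROM Catalogs AS Temp
                      WHERE Temp.PartNumber = c.PartNumber
                        AND Temp.PriceYr >= c.PriceYr) AS rnk
              FROM Catalogs c) AS main
        WHERE main.rnk <= ?
        ORDER BY main.PartNumber, main.PriceYr DESC"""
    result = conn.execute(sql, (n,)).fetchall()
    conn.close()
    return result


for row in most_recent_prices(ROWS):
    print(row)
```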
MAX() is an aggregate function, meaning that it groups all the data and takes the maximum value in the specified column. You need a GROUP BY clause to prevent the query from collapsing the whole dataset into a single row.
On the other hand, your query seems to use a subquery needlessly. The following query should work fine:
SELECT TOP 5 c.PartNumber, c.PriceYr, c.Price
FROM Catalogs c
WHERE c.PartNumber = [partNumber] -- if you want the query to
-- work on a specific part number
ORDER BY c.PriceYr DESC
(please post a table creation query to make sure this example works)

SQL Querying Column on Max Date

I apologize if my code is not properly typed. I am trying to query a table that will return the latest bgcheckdate and status report. The table contains additional bgcheckdates and statuses for each record but in my report I only need to see the latest bgcheckdate with its status.
SELECT BG.PEOPLE_ID, MAX(BG.DATE_RUN) AS DATERUN, BG.STATUS
FROM PKS_BGCHECK BG
GROUP BY BG.PEOPLE_ID, BG.status;
When I run the above query, I still see rows with multiple background check dates and statuses.
Whereas when I run without the status, it works fine:
SELECT BG.PEOPLE_ID, MAX(BG.DATE_RUN)
FROM PKS_BGCHECK BG
GROUP BY BG.PEOPLE_ID;
So I'm wondering if anyone can help me query the date run and status, with both reflecting the latest date.
The best solution depends on which RDBMS you are using.
Here is one with basic, standard SQL:
SELECT bg.PEOPLE_ID, bg.DATE_RUN, bg.STATUS
FROM (
SELECT PEOPLE_ID, MAX(DATE_RUN) AS MAX_DATERUN
FROM PKS_BGCHECK
GROUP BY PEOPLE_ID
) sub
JOIN PKS_BGCHECK bg ON bg.PEOPLE_ID = sub.PEOPLE_ID
AND bg.DATE_RUN = sub.MAX_DATERUN;
But you can get multiple rows per PEOPLE_ID if there are ties.
In Oracle, Postgres, SQL Server and others (including MySQL 8.0 and later, but not older MySQL versions) you can also use the window function row_number():
WITH cte AS (
SELECT PEOPLE_ID, DATE_RUN, STATUS
, ROW_NUMBER() OVER(PARTITION BY PEOPLE_ID ORDER BY DATE_RUN DESC) AS rn
FROM PKS_BGCHECK
)
SELECT PEOPLE_ID, DATE_RUN, STATUS
FROM cte
WHERE rn = 1;
This guarantees 1 row per PEOPLE_ID. Ties are resolved arbitrarily. Add more expressions to ORDER BY to break ties deterministically.
In Postgres, the simplest solution would be with DISTINCT ON.
Details for both in this related answer:
Select first row in each GROUP BY group?
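To illustrate the tie behaviour described above, here is a small sketch using Python's sqlite3 as a stand-in database, with hypothetical rows in which person 20 has two checks on the same date. It runs the question's GROUP BY PEOPLE_ID, STATUS query, the join-to-maximum query and the ROW_NUMBER query; using STATUS as the tie-breaker is an arbitrary choice for illustration.

```python
import sqlite3

# Hypothetical rows: person 20 has two checks on the same date (a tie).
ROWS = [
    (10, "2021-01-05", "CLEAR"),
    (10, "2021-06-01", "PENDING"),
    (20, "2021-03-01", "CLEAR"),
    (20, "2021-03-01", "FLAGGED"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PKS_BGCHECK (PEOPLE_ID INTEGER, DATE_RUN TEXT, STATUS TEXT)")
conn.executemany("INSERT INTO PKS_BGCHECK VALUES (?, ?, ?)", ROWS)

# The question's query: grouping by STATUS yields one row per (person, status).
grouped = conn.execute("""
    SELECT PEOPLE_ID, MAX(DATE_RUN), STATUS
    FROM PKS_BGCHECK
    GROUP BY PEOPLE_ID, STATUS
    ORDER BY PEOPLE_ID, STATUS""").fetchall()

# Standard SQL: join back to the per-person maximum; ties produce extra rows.
via_join = conn.execute("""
    SELECT bg.PEOPLE_ID, bg.DATE_RUN, bg.STATUS
    FROM (SELECT PEOPLE_ID, MAX(DATE_RUN) AS MAX_DATERUN
          FROM PKS_BGCHECK
          GROUP BY PEOPLE_ID) sub
    JOIN PKS_BGCHECK bg ON bg.PEOPLE_ID = sub.PEOPLE_ID
                       AND bg.DATE_RUN = sub.MAX_DATERUN
    ORDER BY bg.PEOPLE_ID, bg.STATUS""").fetchall()

# ROW_NUMBER: exactly one row per person; STATUS breaks the tie deterministically.
via_row_number = conn.execute("""
    WITH cte AS (
        SELECT PEOPLE_ID, DATE_RUN, STATUS,
               ROW_NUMBER() OVER (PARTITION BY PEOPLE_ID
                                  ORDER BY DATE_RUN DESC, STATUS) AS rn
        FROM PKS_BGCHECK)
    SELECT PEOPLE_ID, DATE_RUN, STATUS
    FROM cte
    WHERE rn = 1
    ORDER BY PEOPLE_ID""").fetchall()

print(len(grouped), len(via_join), len(via_row_number))  # → 4 3 2
```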
Selecting the latest row in a time-sensitive set is fairly easy and largely platform independent:
SELECT BG.PEOPLE_ID, BG.DATE_RUN, BG.STATUS
FROM PKS_BGCHECK BG
WHERE BG.DATE_RUN =(
SELECT MAX( DATE_RUN )
FROM PKS_BGCHECK
WHERE PEOPLE_ID = BG.PEOPLE_ID
AND DATE_RUN < SYSDATE );
If the PK is (PEOPLE_ID, DATE_RUN), the query will execute about as quickly as any other method. If they don't form the PK (why not???) then use them to form a unique index. But I'm sure you're already doing one or the other.
Btw, you don't really need the AND part of the subquery if you don't allow future dates to be entered. Some temporal implementations allow for future dates (planned or scheduled events), so I'm used to adding it.
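The correlated-subquery form also accepts an arbitrary cutoff in place of SYSDATE, which is handy for "as of" reports. Here is a minimal sketch in Python's sqlite3, assuming hypothetical rows (including one future-dated, scheduled check) and a (PEOPLE_ID, DATE_RUN) primary key as the answer recommends; the bound parameter stands in for SYSDATE.

```python
import sqlite3

# Hypothetical rows, including a scheduled (future-dated) check for person 10.
ROWS = [
    (10, "2021-01-05", "CLEAR"),
    (10, "2021-06-01", "PENDING"),
    (10, "2099-01-01", "SCHEDULED"),
    (20, "2021-03-01", "CLEAR"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PKS_BGCHECK (
        PEOPLE_ID INTEGER, DATE_RUN TEXT, STATUS TEXT,
        PRIMARY KEY (PEOPLE_ID, DATE_RUN))""")
conn.executemany("INSERT INTO PKS_BGCHECK VALUES (?, ?, ?)", ROWS)


def latest_before(cutoff):
    # For each row, find the person's newest DATE_RUN before the cutoff and
    # keep the row only if it has exactly that date.
    sql = """
        SELECT bg.PEOPLE_ID, bg.DATE_RUN, bg.STATUS
        FROM PKS_BGCHECK bg
        WHERE bg.DATE_RUN = (SELECT MAX(DATE_RUN)
                             FROM PKS_BGCHECK
                             WHERE PEOPLE_ID = bg.PEOPLE_ID
                               AND DATE_RUN < ?)
        ORDER BY bg.PEOPLE_ID"""
    return conn.execute(sql, (cutoff,)).fetchall()


print(latest_before("2030-01-01"))  # → [(10, '2021-06-01', 'PENDING'), (20, '2021-03-01', 'CLEAR')]
print(latest_before("2021-02-01"))  # → [(10, '2021-01-05', 'CLEAR')]
```

A person with no check before the cutoff simply drops out, because the subquery's MAX is NULL and the equality comparison is never true.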

How to update a row based on min/max values in related rows

I tried to make the title as generic as possible, but I do have a very specific example of this in mind:
I have a table Table in which my rows have a StartDate and an EndDate. Each row will also be associated with an ID. For the sake of simplicity, let's assume to start that every EndDate is currently NULL.
I want to populate the EndDates with the following logic:
The EndDate for row X should correspond to the minimum of the StartDates of all other rows having the same ID as row X and having StartDates greater than the StartDate of row X.
So far the only solution I've come up with involves looping row by row and doing update statements row by row, which has terrible performance. I'm a little lost on this one. I loop over something like the following, using a temporary table that holds the rows I'm interested in (the ones with null end dates):
UPDATE BaseTable
SET EffectiveEndDate=Minimum.Date
from (
select min(BaseTable.StartDate) as date, TempTable as RowId
FROM TempTable INNER JOIN BaseTable
on BaseTable.ID=TempTable.ID
where TempTable.row=#row
and BaseTable.StartDate > TempTable.StartDate
group by TempTable) Minimum
where BaseTable.Id=Minimum.RowId
It would help if you specified your RDBMS, but with SQL Server 2012 you can use the analytic functions LEAD and LAG. Based on your description, I think the following would work:
SELECT
id,
startdate,
LAG(startdate) OVER (PARTITION BY id ORDER BY startdate DESC) AS enddate
FROM [Table]
ORDER BY id, startdate DESC;
SQL Fiddle
EDIT:
The same should be possible with older versions of SQL Server, but you have to write some code to simulate the LAG function as it is new in SQL Server 2012.
Here is an example of how to do this:
update t
set EndDate = (select min(t2.StartDate)
from t t2
where t2.id = t.id and
t2.StartDate > t.StartDate
)
where EndDate is null;
This is standard SQL so it should run in most databases. In MySQL, you might need an additional level of subqueries.
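A quick way to see that the single set-based UPDATE gives the same end dates as the LAG query is to run both against the same data. This sketch uses Python's sqlite3 with hypothetical rows and assumes StartDate values are unique per id; with duplicate StartDates the two forms can disagree.

```python
import sqlite3

# Hypothetical rows: (id, StartDate); StartDates are assumed unique per id.
ROWS = [
    (1, "2021-01-01"),
    (1, "2021-03-01"),
    (1, "2021-02-01"),
    (2, "2021-05-01"),
]


def fill_end_dates(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, StartDate TEXT, EndDate TEXT)")
    conn.executemany("INSERT INTO t (id, StartDate) VALUES (?, ?)", rows)
    # One set-based UPDATE instead of a row-by-row loop: each row's EndDate is
    # the smallest later StartDate for the same id (NULL for the latest row).
    conn.execute("""
        UPDATE t
        SET EndDate = (SELECT MIN(t2.StartDate)
                       FROM t t2
                       WHERE t2.id = t.id
                         AND t2.StartDate > t.StartDate)
        WHERE EndDate IS NULL""")
    updated = conn.execute(
        "SELECT id, StartDate, EndDate FROM t ORDER BY id, StartDate").fetchall()
    # The LAG-over-descending-order form from the other answer, for comparison.
    via_lag = conn.execute("""
        SELECT id, StartDate,
               LAG(StartDate) OVER (PARTITION BY id ORDER BY StartDate DESC) AS EndDate
        FROM t
        ORDER BY id, StartDate""").fetchall()
    conn.close()
    return updated, via_lag


updated, via_lag = fill_end_dates(ROWS)
for row in updated:
    print(row)
```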

Query to get the duration and details from a table

I have a scenario and am not quite sure how to query it. As a sample, I have the following table structure and want to get the history of the actions for each bus:
ID-----TIME---------BUSID----OPID----MOVING----STOPPED----PARKED----COUNT
1------10:10:10-----101------1101-----1---------0----------0---------15
2------10:10:11-----102------1102-----0---------1----------0---------5
3------10:11:10-----101------1101-----1---------0----------0---------15
4------10:12:10-----101------1101-----0---------1----------0---------15
5------10:13:10-----101------1101-----1---------0----------0---------19
6------10:14:10-----101------1101-----1---------0----------0---------19
7------10:15:10-----101------1101-----0---------1----------0---------19
8------10:16:10-----101------1101-----0---------0----------1---------0
9------10:17:10-----101------1101-----0---------0----------1---------0
I want to write a query to get the status of a bus like:
BUSID----OPID----STATUS-----TIME---------DURATION---COUNT
101------1101----MOVING-----10:10:10-----2-----------15
101------1101----STOPPED----10:12:10-----1-----------15
101------1101----MOVING-----10:13:10-----2-----------19
101------1101----STOPPED----10:15:10-----1-----------19
101------1101----PARKED-----10:16:10-----2-----------0
I am using SQL Server 2008.
Thanks for your help.
You can use Common Table Expressions to calculate the duration between the different rows.
WITH cte_log AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY time DESC) AS row_id,
time, busid, opid, moving, stopped, parked, count
FROM
log_table
WHERE
busid = 101
)
SELECT
current_rows.busid,
current_rows.opid,
current_rows.time,
DATEDIFF(second, previous_rows.time, current_rows.time) AS duration,
current_rows.count
FROM
cte_log AS current_rows
LEFT OUTER JOIN
cte_log AS previous_rows ON (current_rows.row_id + 1) = previous_rows.row_id
WHERE
current_rows.busid = 101
ORDER BY
current_rows.time DESC;
The WITH clause creates a temporary named result set that is defined within the execution scope of this query. We are using it to fetch the previous record for each row and to calculate the time difference between the current and the previous record.
This example was not tested, and it may not work perfectly, but I hope it gets you going in the correct direction. Feel free to leave feedback.
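Here is a runnable version of the same self-join idea, sketched with Python's sqlite3 as a stand-in for SQL Server. SQLite has no DATEDIFF, so the difference in seconds is computed with strftime('%s', ...). It uses the sample rows from the question and returns, for each reading of bus 101, the seconds since the previous reading.

```python
import sqlite3

# Sample rows from the question: (ID, TIME, BUSID, OPID, MOVING, STOPPED, PARKED, COUNT)
BUS_ROWS = [
    (1, "10:10:10", 101, 1101, 1, 0, 0, 15),
    (2, "10:10:11", 102, 1102, 0, 1, 0, 5),
    (3, "10:11:10", 101, 1101, 1, 0, 0, 15),
    (4, "10:12:10", 101, 1101, 0, 1, 0, 15),
    (5, "10:13:10", 101, 1101, 1, 0, 0, 19),
    (6, "10:14:10", 101, 1101, 1, 0, 0, 19),
    (7, "10:15:10", 101, 1101, 0, 1, 0, 19),
    (8, "10:16:10", 101, 1101, 0, 0, 1, 0),
    (9, "10:17:10", 101, 1101, 0, 0, 1, 0),
]


def durations_since_previous(rows, busid=101):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE log_table (id INTEGER, "time" TEXT, busid INTEGER, '
                 'opid INTEGER, moving INTEGER, stopped INTEGER, parked INTEGER, "count" INTEGER)')
    conn.executemany("INSERT INTO log_table VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    # Number the bus's readings newest first; row_id + 1 is then the previous reading.
    sql = """
        WITH cte_log AS (
            SELECT ROW_NUMBER() OVER (ORDER BY "time" DESC) AS row_id,
                   "time", busid, opid, "count"
            FROM log_table
            WHERE busid = ?)
        SELECT cur."time",
               CAST(strftime('%s', cur."time") AS INTEGER)
               - CAST(strftime('%s', prev."time") AS INTEGER) AS duration_s
        FROM cte_log AS cur
        LEFT JOIN cte_log AS prev ON prev.row_id = cur.row_id + 1
        ORDER BY cur."time" DESC"""
    result = conn.execute(sql, (busid,)).fetchall()
    conn.close()
    return result


for row in durations_since_previous(BUS_ROWS):
    print(row)
```

The oldest reading has no predecessor, so the LEFT JOIN leaves its duration NULL (None in Python).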
You may also want to check the following external links on how to use Common Table Expressions:
SQL Select Next Row and SQL Select Previous Row with Current Row using T-SQL CTE
Calculate Difference between current and previous rows... CTE and Row_Number() rocks!
4 Guys From Rolla: Common Table Expressions (CTE) in SQL Server 2005
MSDN: Using Common Table Expressions
Personally, I would denormalize the data so that you have start_time and end_time in the same row. This would make the query much more efficient.
I don't have access to SQL Server at the moment, so there may be syntax errors in the following:
SELECT
BUSID,
OPID,
CASE WHEN MOVING = 1 THEN 'MOVING' WHEN STOPPED = 1 THEN 'STOPPED' ELSE 'PARKED' END AS STATUS,
TIME,
COUNT
FROM BUS_DATA_TABLE
ORDER BY BUSID, TIME
You'll note that this does not include duration. Until you order your data, you don't know which is the previous entry. Once the data is ordered, you can calculate the duration as the difference between the times in consecutive records. You could do this by SELECTing into a new table and then running a second query.
Ordering by BUSID first should give you your report for all buses.
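To get from per-row readings to the run-based output in the question, one common technique (not from the answers above) is "gaps and islands": the difference between a row number over all readings and a row number within each status stays constant across an unbroken run of the same status. The sketch below uses Python's sqlite3 with the question's sample rows; the SQL needs only ROW_NUMBER and a CTE, both available since SQL Server 2005. It assumes one reading per minute, so DURATION is taken as the number of readings in the run, and it reports the MIN of COUNT per run; both are assumptions.

```python
import sqlite3

# Sample rows from the question: (ID, TIME, BUSID, OPID, MOVING, STOPPED, PARKED, COUNT)
BUS_ROWS = [
    (1, "10:10:10", 101, 1101, 1, 0, 0, 15),
    (2, "10:10:11", 102, 1102, 0, 1, 0, 5),
    (3, "10:11:10", 101, 1101, 1, 0, 0, 15),
    (4, "10:12:10", 101, 1101, 0, 1, 0, 15),
    (5, "10:13:10", 101, 1101, 1, 0, 0, 19),
    (6, "10:14:10", 101, 1101, 1, 0, 0, 19),
    (7, "10:15:10", 101, 1101, 0, 1, 0, 19),
    (8, "10:16:10", 101, 1101, 0, 0, 1, 0),
    (9, "10:17:10", 101, 1101, 0, 0, 1, 0),
]


def status_runs(rows, busid=101):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE bus_data (id INTEGER, "time" TEXT, busid INTEGER, '
                 'opid INTEGER, moving INTEGER, stopped INTEGER, parked INTEGER, "count" INTEGER)')
    conn.executemany("INSERT INTO bus_data VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    sql = """
        WITH s AS (
            SELECT "time", busid, opid, "count",
                   CASE WHEN moving = 1 THEN 'MOVING'
                        WHEN stopped = 1 THEN 'STOPPED'
                        ELSE 'PARKED' END AS status
            FROM bus_data
            WHERE busid = ?),
        r AS (
            -- Constant within each unbroken run of the same status.
            SELECT s.*,
                   ROW_NUMBER() OVER (ORDER BY "time")
                   - ROW_NUMBER() OVER (PARTITION BY status ORDER BY "time") AS grp
            FROM s)
        SELECT busid, opid, status, MIN("time") AS start_time,
               COUNT(*) AS duration, MIN("count") AS cnt
        FROM r
        GROUP BY busid, opid, status, grp
        ORDER BY start_time"""
    result = conn.execute(sql, (busid,)).fetchall()
    conn.close()
    return result


for row in status_runs(BUS_ROWS):
    print(row)
```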
Making certain assumptions about column type, etc:
SELECT
BUSID,
OPID,
STATUS,
TIME,
DURATION,
COUNT
FROM
TABLENAME
WHERE
BUSID = 101
ORDER BY
TIME
;