Filtering for Max Date in SQL when Multiple Records have the Same Date

I have a dataset that includes the following fields: building_id, lease_id, date_signed
I am trying to get the most recent lease signed for each building_id. The issue I'm running into is that several of the leases have the same date signed in the building and I only want to include one of them. I have been using SnowSQL and was looking into using the Rank window function, but that gives the same rank value to records that have the same date in the building.
Is there a way to pull only one value for the most recent lease date for each building even if there are multiple leases with that same date? I will also need to know which lease it is associated with. Thanks!

Use row_number() instead of rank(). Something like:
select t.*
from t
qualify row_number() over (partition by building_id order by date_signed desc) = 1;
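To see how row_number() behaves on ties, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module on made-up lease rows. SQLite has no QUALIFY clause, so the filter is written as a subquery instead. The lease_id tie-breaker in the ORDER BY is an assumption added here so that the chosen lease is deterministic, and window functions need SQLite 3.25 or newer.

```python
import sqlite3


def latest_lease_per_building(rows):
    """Return one (building_id, lease_id, date_signed) row per building."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (building_id int, lease_id int, date_signed text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(
        """
        select building_id, lease_id, date_signed
        from (select t.*,
                     row_number() over (partition by building_id
                                        -- lease_id desc is an assumed tie-breaker
                                        order by date_signed desc, lease_id desc) as seqnum
              from t) s
        where seqnum = 1
        order by building_id
        """
    ).fetchall()
    conn.close()
    return result


# Made-up data: building 1 has two leases signed on the same latest date.
sample = [
    (1, 101, "2021-03-01"),
    (1, 102, "2021-05-10"),
    (1, 103, "2021-05-10"),
    (2, 201, "2020-12-31"),
]
print(latest_lease_per_building(sample))
```

Without a tie-breaker the database is free to return either of the tied leases, so it is worth adding one if the result has to be repeatable.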

Related

How to select 1 row per id?

I'm working with a table that has multiple rows for each order id (e.g. variations in spelling for addresses and different last_updated dates), that in theory shouldn't be there (not my doing). I want to select just 1 row for each id and so far I figured I can do that using partitioning like so:
SELECT dp.order_id,
MAX(cr.updated_at) OVER(PARTITION BY dp.order_id) AS updated_at
but I have seen other queries which only use MAX and list every other column like so
SELECT dp.order_id,
MAX(dp.ship_address) as address,
MAX(cr.updated_at) as updated_at
etc...
This solution looks neater, but I can't get it to work (it still returns multiple rows per single order_id). What am I doing wrong?
If you want one row per order_id, then window functions are not sufficient. They don't filter the data. You seem to want the most recent row. A typical method uses row_number():
select t.*
from (select t.*,
row_number() over (partition by order_id order by created_at desc) as seqnum
from t
) t
where seqnum = 1;
You can also use aggregation:
select order_id, max(ship_address), max(created_at)
from t
group by order_id;
However, the ship_address may not be from the most recent row and that is usually not desirable. You can tweak this using keep syntax:
select order_id,
max(ship_address) keep (dense_rank first order by created_at desc),
max(created_at)
from t
group by order_id;
However, this gets cumbersome for a lot of columns.
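The difference between per-column MAX and picking a single row can be seen on a small made-up table. This is only a sketch using Python's sqlite3 module (SQLite 3.25 or newer for window functions); the table contents are invented, and the KEEP syntax is not shown because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (order_id int, ship_address text, created_at text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "Zebra Street 9", "2020-01-01"),  # older row, but the address sorts last
        (1, "Apple Road 1", "2020-06-01"),    # most recent row
    ],
)

# Per-column MAX: each column is maximised independently of the others.
aggregated = conn.execute(
    "select order_id, max(ship_address), max(created_at) from t group by order_id"
).fetchall()

# row_number(): every column comes from the single most recent row.
latest = conn.execute(
    """
    select order_id, ship_address, created_at
    from (select t.*,
                 row_number() over (partition by order_id order by created_at desc) as seqnum
          from t) s
    where seqnum = 1
    """
).fetchall()

print(aggregated)
print(latest)
```

The aggregated row pairs the old address with the new timestamp, while the row_number() version returns the address that actually belongs to the latest row.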
The 2nd "solution" doesn't care about values in other columns - it selects their MAX values. It means that you'd get ORDER_ID and - possibly - "mixed" values for other columns, i.e. those ADDRESS and UPDATED_AT might belong to different rows.
If that's OK with you, then go for it. Otherwise, you'll have to select one MAX row (using e.g. row_number analytic function), and fetch data that is related only to it (i.e. doesn't "mix" values from different rows).
Also, saying that you
can't get it to work (still returns multiple rows per single order_id)
is kind of difficult to believe. The way you put it, it can't be true.

SQL to find best row in group based on multiple columns?

Let's say I have an Oracle table with measurements in different categories:
CREATE TABLE measurements (
category CHAR(8),
value NUMBER,
error NUMBER,
created DATE
)
Now I want to find the "best" row in each category, where "best" is defined like this:
It has the lowest error.
If there are multiple measurements with the same error, the one that was created most recently is considered to be the best.
This is a variation of the greatest N per group problem, but including two columns instead of one. How can I express this in SQL?
Use ROW_NUMBER:
WITH cte AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY category ORDER BY error, created DESC) rn
FROM measurements m
)
SELECT category, value, error, created
FROM cte
WHERE rn = 1;
For a brief explanation, the PARTITION BY clause instructs the DB to generate a separate row number for each group of records in the same category. The ORDER BY clause places those records with the smallest error first. Should two or more records in the same category be tied with the lowest error, then the next sorting level would place the record with the most recent creation date first.
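The two-level ordering can be checked on a few invented measurements. The sketch below runs the same query through Python's sqlite3 module rather than Oracle, so the column types are SQLite approximations (dates are stored as ISO strings), and an ORDER BY category is added only to make the output order stable.

```python
import sqlite3


def best_measurements(rows):
    """Return the lowest-error row per category, newest first on ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table measurements (category text, value real, error real, created text)"
    )
    conn.executemany("insert into measurements values (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH cte AS (
            SELECT m.*,
                   ROW_NUMBER() OVER (PARTITION BY category
                                      ORDER BY error, created DESC) AS rn
            FROM measurements m
        )
        SELECT category, value, error, created
        FROM cte
        WHERE rn = 1
        ORDER BY category
        """
    ).fetchall()
    conn.close()
    return result


# Made-up data: category "a" has two rows tied on the lowest error.
sample = [
    ("a", 1.0, 0.5, "2020-01-01"),
    ("a", 2.0, 0.1, "2020-01-02"),
    ("a", 3.0, 0.1, "2020-03-01"),
    ("b", 4.0, 0.2, "2020-02-01"),
]
print(best_measurements(sample))
```

For category "a" the two rows with error 0.1 tie on the first sort key, so the later creation date decides.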

SQL select the first match in an ordered list

Using Microsoft SQL Server Management Studio version 14.0.17213.0
I have a list of events that go in order. I want to select the highest precedent acct_no, complete_date and event.
My problem is if I use
select
account_number, event, max(complete_date) as mx_comp
from
mytable
where
event in ('event1','event2'....)
then I get all my acct_numbers, all the events in the list and the max complete date for that event. But I want acct_no listed with the maximum completed date for any item in the list and the associated event.
Furthermore, it's wholly possible that two events occurred on the same date, so I cannot do
select *
from mytable mt
join
(select acct_number, max(complete_date)
from mytable) t on mt.acct_number = t.account_number
and mt.complete_date = t.complete_date
because if two events occurred on the same day then I still get duplicate results.
I have tried to do a similar thing with
row_number() over (order by account_number) as RowNum
but it did not work, because I still get matches to all the events, not just my highest precedence event
It really boils down to needing to return the acct_number, event and complete date associated with the highest-importance match from items in an ordered list.
I am sure it is easy, but despite all my Google and Stack searching I simply cannot figure it out.
I have recently been thinking that it might be possible with something like coalesce(mylist), because I would be able to put my list in order, but I cannot figure out how to use coalesce in a meaningful way for this problem.
The real solution would be to create a table with precedence numbers or have a most-recent indicator, but I don't have unlimited access to create any tables I want.
Any help or ideas on how to match to an ordered list would be appreciated.
You seem to want:
select t.*
from (select t.*,
row_number() over (partition by account_number order by complete_date desc) as seqnum
from mytable t
where event in ('event1', 'event2', ....)
) t
where seqnum = 1;
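The question also mentions two events on the same date and an ordered list of events. One possible way to break such ties without creating a precedence table is a CASE expression as a second sort key. The sketch below is an assumption layered on the query above, not part of the original answer: it supposes that 'event1' outranks 'event2', uses invented rows, and runs through Python's sqlite3 module instead of SQL Server.

```python
import sqlite3


def top_event_per_account(rows):
    """Return the latest listed event per account, using precedence on date ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (account_number int, event text, complete_date text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?)", rows)
    result = conn.execute(
        """
        select account_number, event, complete_date
        from (select t.*,
                     row_number() over (
                         partition by account_number
                         order by complete_date desc,
                                  -- assumed precedence: event1 before event2
                                  case event when 'event1' then 1
                                             when 'event2' then 2 end
                     ) as seqnum
              from mytable t
              where event in ('event1', 'event2')
             ) s
        where seqnum = 1
        order by account_number
        """
    ).fetchall()
    conn.close()
    return result


# Made-up data: account 1 has two listed events completed on the same day.
sample = [
    (1, "event2", "2021-01-05"),
    (1, "event1", "2021-01-05"),
    (1, "event1", "2021-01-01"),
    (2, "event2", "2021-02-01"),
    (2, "event3", "2021-03-01"),  # not in the list, so it is filtered out
]
print(top_event_per_account(sample))
```

If precedence should win over recency, the two sort keys can simply be swapped.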

Return All Historical Account Records for Accounts with Change in Corresponding Value

I am trying to select all records in a time-variant Account table for each account with a change in an associated value (e.g. the maturity date). A change in the value will result in the most recent record for an account being end-dated and a new record (containing a new effective date of the following day) being created. The most recent records for accounts in this table have an end-date of 12/31/9000.
For instance, in the below illustration, account 44444444 would not be included in my query result set since it hasn't had a change in the value (and thus also has no additional records aside from the original); however, the other accounts have multiple changes in values (and multiple records), so I would want to see those returned.
I've tried using the row_num function, as well as a reflexive join, but for some reason I'm not getting the expected results. What are some ways to obtain the results I need?
Note: The primary key for this table includes the acct_id and eff_dt. Also, I'm using PostgreSQL in a Greenplum environment.
Here are two types of queries I tried to use but which produced problematic results:
Query 1
Query 2
If you want only the accounts, use aggregation:
select acct_id
from t
group by acct_id
having min(value) <> max(value);
Based on your description, you could also use having count(*) > 1.
If you want the original records, you can use window functions:
select t.*
from (select t.*, count(*) over (partition by acct_id) as cnt
from t
) t
where cnt > 1;
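Both variants can be tried on a tiny invented version of the account table. The sketch uses Python's sqlite3 module rather than Greenplum; the rows and the reduced column list (acct_id, eff_dt, value) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (acct_id int, eff_dt text, value text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (11111111, "2019-01-01", "2025-01-01"),  # original record
        (11111111, "2019-06-02", "2026-01-01"),  # value changed, new record
        (44444444, "2019-01-01", "2030-01-01"),  # never changed
    ],
)

# Accounts only: the value changed if its min and max differ.
changed_accounts = conn.execute(
    "select acct_id from t group by acct_id having min(value) <> max(value)"
).fetchall()

# Full history: keep every row of an account that has more than one record.
history = conn.execute(
    """
    select acct_id, eff_dt, value
    from (select t.*, count(*) over (partition by acct_id) as cnt
          from t) s
    where cnt > 1
    order by acct_id, eff_dt
    """
).fetchall()

print(changed_accounts)
print(history)
```

Account 44444444 has a single record, so it drops out of both results.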

SQL Server: I have multiple records per day and I want to return only the first of the day

I have some records that track inquiries by DATETIME. There is a glitch in the system and sometimes a record will be entered multiple times on the same day. I have a query with a bunch of correlated subqueries attached to these, but the numbers are off because when those glitches occurred the leads show up multiple times. I need the first entry of the day. I tried fooling around with MIN but I couldn't quite get it to work.
I currently have this, I am not sure if I am on the right track though.
SELECT SL.UserID, MIN(SL.Added) OVER (PARTITION BY SL.UserID)
FROM SourceLog AS SL
Here's one approach using row_number():
select *
from (
select *,
row_number() over (partition by userid, cast(added as date) order by added) rn
from sourcelog
) t
where rn = 1
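A minimal sketch of the same partitioning on invented rows, run through Python's sqlite3 module. SQLite stores the timestamps as text here, so date(added) stands in for SQL Server's cast(added as date); that substitution and the sample data are assumptions.

```python
import sqlite3


def first_entry_per_day(rows):
    """Return the earliest (userid, added) row for each user and calendar day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sourcelog (userid int, added text)")
    conn.executemany("insert into sourcelog values (?, ?)", rows)
    result = conn.execute(
        """
        select userid, added
        from (select s.*,
                     row_number() over (partition by userid, date(added)
                                        order by added) as rn
              from sourcelog s) t
        where rn = 1
        order by userid, added
        """
    ).fetchall()
    conn.close()
    return result


# Made-up data: user 1 was logged twice within seconds on the same day.
sample = [
    (1, "2021-04-01 09:00:00"),
    (1, "2021-04-01 09:00:05"),
    (1, "2021-04-02 10:00:00"),
    (2, "2021-04-01 12:00:00"),
]
print(first_entry_per_day(sample))
```

Partitioning by both the user and the date part is what limits the numbering to one user's entries on one day.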
You could use group by along with min to accomplish this.
How you do this depends on how your data is structured. If you assign a unique sequential number to each record as it is created, you can just return the lowest number created per day. Otherwise you would need to return the ID of the record with the earliest DATETIME value per day.
--Assumes sequential IDs
select
min(Id)
from
[YourTable]
group by
--the conversion is used to strip the time value out of the date/time
convert(date, [YourDateTime])