MAX() for 2 Dates Separately - sql

I am trying to get the max date from one field and then, to remove duplicates, the max of a second date field among those rows.
So far I have managed to get the max of the Effective Dates, but I still need the max timestamp among those rows to remove the duplicates.
Here is what I have so far:
SELECT
a2.CUST_ID
, Address
, Effective_Date --DATE variable
, Timestamp_Entry --DATETIME variable
FROM
(SELECT
CUST_ID
, MAX (Effective_Date) as Most_Effective_Date
FROM Address_Table
GROUP BY CUST_ID) a1
JOIN Address_Table a2
ON a1.CUST_ID = a2.CUST_ID and a1.Most_Effective_Date = a2.Effective_Date
(Some timestamp entries may be newer entries with an older effective date, which is why the Effective Date takes priority; the timestamp should then remove the duplicates.)

I think this is what you want:
select a.*
from (select a.*,
row_number() over (partition by cust_id order by effective_date desc, timestamp_entry desc) as seqnum
from address_table a
) a
where seqnum = 1;
This returns the "most recent" address for each customer based on the two columns.
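A minimal sketch of the same idea that can be run locally, assuming Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in for the real database. The sample addresses are made up; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: customer 1 has two rows sharing the latest
# Effective_Date, so Timestamp_Entry has to break the tie. Customer 2 has a
# newer timestamp on an OLDER effective date, which must not win.
rows = [
    (1, "Old Street 1", "2020-01-01", "2020-01-05 09:00:00"),
    (1, "New Street 2", "2020-06-01", "2020-06-02 10:00:00"),
    (1, "New Street 2b", "2020-06-01", "2020-06-03 11:00:00"),
    (2, "Main Road 7", "2019-03-01", "2021-01-01 08:00:00"),
    (2, "Side Road 9", "2019-09-01", "2019-09-01 08:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Address_Table ("
    "CUST_ID INTEGER, Address TEXT, Effective_Date TEXT, Timestamp_Entry TEXT)"
)
conn.executemany("INSERT INTO Address_Table VALUES (?, ?, ?, ?)", rows)

# Number the rows per customer: latest effective date first, and within the
# same effective date the latest timestamp first. Keep only row number 1.
latest = conn.execute(
    """
    SELECT CUST_ID, Address, Effective_Date, Timestamp_Entry
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (PARTITION BY CUST_ID
                                    ORDER BY Effective_Date DESC,
                                             Timestamp_Entry DESC) AS seqnum
          FROM Address_Table a) AS ranked
    WHERE seqnum = 1
    ORDER BY CUST_ID
    """
).fetchall()

for row in latest:
    print(row)
```

With this data, customer 1 resolves to the row with the later timestamp, and customer 2 resolves to the row with the later effective date even though its timestamp is older.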

Related

Hive joining columns with milliseconds

I have a table having columns id,create_time,code.
create_time column is of type string having timestamp value in the format yyyy-MM-dd HH:mm:ss.SSSSSS
Now my requirement is to find the latest code (most recent create_time) for each id. If the create_time column had no milliseconds part, I could do
select id,create_time,code from(
select id,max(unix_timestamp(create_time,"yyyy-MM-dd HH:mm:ss")) over (partition by id) as latest_time from table)a
join table b on a.latest_time=b.create_time
As the unix time functions only consider seconds, not milliseconds, I am not able to proceed with them.
Please help.
Why would you try to convert at all? Since you are only looking for the latest timestamp, I would just do:
select b.id, b.create_time, b.code from(
select distinct id, max(create_time) over (partition by id) as latest_time from table)a
join table b on a.id=b.id and a.latest_time=b.create_time
The values without milliseconds compare as if they ended in "000000", because a shorter string sorts before any longer string that it is a prefix of.
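To illustrate why no conversion is needed, here is a small sketch assuming Python's sqlite3 module as a stand-in for Hive; both compare plain strings byte by byte. The table name t and the sample values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, create_time TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        # Same second, once without and once with a fractional part.
        (1, "2019-05-01 10:00:00", "A"),
        (1, "2019-05-01 10:00:00.000001", "B"),
        (1, "2019-04-30 23:59:59.999999", "C"),
        # Two values that differ only in the fractional part.
        (2, "2019-05-02 08:30:00.500000", "D"),
        (2, "2019-05-02 08:30:00.499999", "E"),
    ],
)

# MAX() on the raw strings: the fixed-width yyyy-MM-dd HH:mm:ss layout makes
# lexicographic order agree with chronological order.
latest_times = conn.execute(
    "SELECT id, MAX(create_time) FROM t GROUP BY id ORDER BY id"
).fetchall()

for row in latest_times:
    print(row)
```

This only holds as long as every value uses the same zero-padded layout; mixed formats would need a real conversion.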
You do not need join for this.
If you need all records with max(create_time), use rank() or dense_rank(). Rank will assign 1 to all records with the latest create_time if there are many records with the same time.
If you need only one record per id even if there are many records with create_time = max(create_time), then use row_number() instead of rank():
select id,create_time,code
from
(
select id,create_time,code,
rank() over(partition by id order by create_time desc) rn
from table
)s
where rn=1;
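The difference between the two functions on tied timestamps can be sketched as follows, assuming Python's sqlite3 module as a stand-in for Hive. The table name events and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, create_time TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2019-05-01 10:00:00.123456", "X"),
        (1, "2019-05-01 10:00:00.123456", "Y"),  # tie on the latest time
        (1, "2019-04-30 09:00:00.000000", "W"),
        (2, "2019-05-02 08:30:00.500000", "Z"),
    ],
)


def latest(func):
    """Return the rows numbered 1 per id by the given window function."""
    sql = f"""
        SELECT id, create_time, code
        FROM (SELECT id, create_time, code,
                     {func}() OVER (PARTITION BY id
                                    ORDER BY create_time DESC) AS rn
              FROM events) AS s
        WHERE rn = 1
        ORDER BY id, code
    """
    return conn.execute(sql).fetchall()


with_rank = latest("RANK")              # keeps both tied rows for id 1
with_row_number = latest("ROW_NUMBER")  # keeps exactly one row per id

print(with_rank)
print(with_row_number)
```

Which of the tied rows row_number() keeps is arbitrary unless a further tie-breaker column is added to the ORDER BY.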

Test whether MIN would work over ROW_NUMBER

Situation:
I have three columns:
id
date
tx_id
The primary id column is tx_id and is unique in the table. Each tx_id is tied to an id and it has a record date. I would like to test whether or not the tx_id is incremental.
Objective:
I need to extract the first tx_id per id, but I want to avoid using ROW_NUMBER,
i.e.
select id, date, tx_id, row_number() over(partition by id order by date asc) as First_transaction_id from table
and simply use
select id, MIN(tx_id) as First_transaction_id from table group by id
Since I have more than 50 million ids, how can I make sure that MIN(tx_id) yields the earliest transaction for each id?
How can I add a flag column to mark the ids that don't satisfy the condition?
"How can I make sure, since I have more than 50 million ids, that MIN(tx_id) yields the earliest transaction for each id?"
Simply do the comparison.
You can get the exceptions with logic like this:
select t.*
from (select t.*,
min(tx_id) over (partition by id) as min_tx_id,
rank() over (partition by id order by date) as seqnum
from t
) t
where tx_id = min_tx_id and seqnum > 1;
Note: this uses rank(). It seems possible that there could be two transactions for an id on the same date.
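A runnable sketch of that check, assuming Python's sqlite3 module as a stand-in engine. The table name tx, the column name tx_date (used here instead of date) and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (id INTEGER, tx_date TEXT, tx_id INTEGER)")
conn.executemany(
    "INSERT INTO tx VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", 10),  # id 1: tx_id grows with the date
        (1, "2021-01-02", 11),
        (2, "2021-01-01", 21),  # id 2: the smallest tx_id is NOT the earliest
        (2, "2021-01-03", 20),
    ],
)

# A row is an exception when it carries the minimum tx_id of its id
# but is not among the earliest-dated rows of that id.
exceptions = conn.execute(
    """
    SELECT id, tx_date, tx_id
    FROM (SELECT t.*,
                 MIN(tx_id) OVER (PARTITION BY id) AS min_tx_id,
                 RANK() OVER (PARTITION BY id ORDER BY tx_date) AS seqnum
          FROM tx t) AS x
    WHERE tx_id = min_tx_id AND seqnum > 1
    ORDER BY id
    """
).fetchall()

print(exceptions)
```

An empty result would mean MIN(tx_id) can safely be used in place of the row_number() query; any returned id is one to flag.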
Use a correlated subquery:
select t.* from table_name t
where t.date= ( select min(date) from table_name
t1 where t1.id=t.id)

SQL: select next available date for multiple records

I have an oracle DB.
My table has ID and DATE columns (and more).
I would like to select for every ID the next available record after a certain date. For only one ID the query would be:
SELECT * FROM my_table
WHERE id = 1 AND date >= '01.01.2018'
(just ignoring the to_date() function)
What would that look like for multiple IDs? And I do want to SELECT *.
Thanks!
We can use ROW_NUMBER here:
SELECT ID, date -- and maybe other columns
FROM
(
SELECT m.*, -- Oracle needs the alias-qualified m.* when other expressions follow
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY date) rn
FROM my_table m
WHERE date >= date '2018-01-01'
) t
WHERE rn = 1
The idea here is to assign a row number to each ID partition, starting with the earliest date which occurs after the cutoff you specify. The first record from each partition would then be the immediate next date, assuming it exists.
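The same pattern as a small runnable sketch, assuming Python's sqlite3 module instead of Oracle. The column names rec_date and val and the sample rows are hypothetical; dates are stored as ISO strings so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, rec_date TEXT, val TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "2017-12-31", "a"),  # before the cutoff, ignored
        (1, "2018-01-05", "b"),  # first record on or after the cutoff
        (1, "2018-02-01", "c"),
        (2, "2018-03-10", "d"),
        (3, "2017-06-01", "e"),  # id 3 has nothing after the cutoff
    ],
)

cutoff = "2018-01-01"

# Filter first, then number the remaining rows per id by date and keep
# the first one.
next_records = conn.execute(
    """
    SELECT id, rec_date, val
    FROM (SELECT m.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY rec_date) AS rn
          FROM my_table m
          WHERE rec_date >= ?) AS t
    WHERE rn = 1
    ORDER BY id
    """,
    (cutoff,),
).fetchall()

print(next_records)
```

Ids with no record on or after the cutoff simply do not appear in the result.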

Nested SQL Server Query Max Date

Ladies and Gents,
I need to write a query that grabs data from a view, but I'm not sure how to go about this. The issue is there is really no key and there are two fields I'm concerned with that will control what rows I need to retrieve.
The view looks something like this:
Category  columna  columnb  uploaddate
-----------------------------------------------------
a         value    value    1/30/2013 04:04:04:000
a         value    value    1/29/2013 04:04:04:000
b         value    value    1/28/2013 01:23:04:000
b         value    value    1/30/2013 04:04:04:000
b         value    value    1/30/2013 04:04:04:000
c         value    value    1/30/2013 01:01:01:000
c         value    value    1/30/2013 01:01:01:000
What I need to retrieve is, for each unique category, all rows that have the newest uploaddate. So in the example above I would get 1 row for category a, which would have the newest uploaddate. Category b would have 2 rows, which have the 1/30/2013 date. Category c would also have two rows.
I also need to compare only the date of the upload, not the time, as the loading can take a couple of seconds. I was trying to use max date, but it still compared the time down to the second.
Any guidance/thoughts would be great.
Thanks!
EDIT:
Here is what I threw together so far and I think it's close but it's not working yet and I doubt this is the most efficient way to do this.
select
*
from
VIEW c
INNER JOIN
(
SELECT
Category,
MAX(CONVERT(DateTime, Convert(VarChar, UploadDate, 101))) as maxuploaddate
FROM
View
GROUP BY
Category,
UploadDate
) temp ON temp.Category = c.Category AND CONVERT(VarChar, UploadDate, 101) = temp.maxuploaddate
The problem lies in the nested select statement, as it's still grabbing all combinations of Category and UploadDate. Is there a way to do a distinct on the Category and UploadDate, just getting the newest combination?
Thanks Again
Your query is close, but you have a mistake in the group by: grouping by UploadDate as well returns every combination of Category and UploadDate. I'd also get rid of the date conversions; date comparisons work fine.
select
*
from
VIEW c
INNER JOIN
(
SELECT
Category,
MAX(UploadDate) as maxuploaddate
FROM
View
GROUP BY
Category
) temp ON temp.Category = c.Category AND UploadDate = temp.maxuploaddate
If you want to compare by calendar date only, ignoring the time, you need to convert to a date first. In SQL Server syntax:
select *
from (select category, columna, columnb, uploaddate,
rank() over ( partition by category order by cast(uploaddate as date) desc) as seqnum
from view
) v
where seqnum = 1
In Oracle syntax:
select *
from (select category, columna, columnb, uploaddate,
rank() over ( partition by category order by to_char(uploaddate, 'YYYY-MM-DD') desc) as seqnum
from view
) v
where seqnum = 1
Because you want ties, these use rank() instead of row_number().
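A sketch of the date-only ranking, assuming Python's sqlite3 module as a stand-in for SQL Server or Oracle. SQLite's date() function plays the role of cast(... as date) or trunc(); the table name upload_view is hypothetical, and the rows are adapted from the question with ISO timestamps and slightly different seconds to mimic a load that takes a moment.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upload_view (category TEXT, uploaddate TEXT)")
conn.executemany(
    "INSERT INTO upload_view VALUES (?, ?)",
    [
        ("a", "2013-01-30 04:04:04.000"),
        ("a", "2013-01-29 04:04:04.000"),
        ("b", "2013-01-28 01:23:04.000"),
        ("b", "2013-01-30 04:04:04.000"),
        ("b", "2013-01-30 04:04:06.000"),  # same day, two seconds later
        ("c", "2013-01-30 01:01:01.000"),
        ("c", "2013-01-30 01:01:03.000"),
    ],
)

# Rank by the date part only, so rows loaded seconds apart on the same day
# tie for rank 1 and are all returned.
newest = conn.execute(
    """
    SELECT category, uploaddate
    FROM (SELECT category, uploaddate,
                 RANK() OVER (PARTITION BY category
                              ORDER BY date(uploaddate) DESC) AS seqnum
          FROM upload_view) AS v
    WHERE seqnum = 1
    ORDER BY category, uploaddate
    """
).fetchall()

per_category = Counter(category for category, _ in newest)
print(dict(per_category))
```

Ranking on the full timestamp instead of date(uploaddate) would return only one row each for categories b and c here.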
In Oracle you can use Rank() to achieve this. Rank() creates a duplicate number if the same criteria are met.
Edit: And you can use Trunc() to "trim" the time from the uploaddate.
select *
from (select category, columna, columnb, uploaddate,
rank() over ( partition by category order by trunc(uploaddate) desc) rank
from view)
where rank = 1
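Dense_Rank() also exists. It gives tied rows the same number as well; the difference is that it leaves no gaps in the numbering after a tie. When filtering on rank 1, as here, either function returns the same rows. See this question for more info on the differences.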
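How the three numbering functions treat ties can be compared side by side with a small sketch, assuming Python's sqlite3 module as a stand-in engine. The table name uploads and its four rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uploads (uploaddate TEXT)")
conn.executemany(
    "INSERT INTO uploads VALUES (?)",
    [("2013-01-30",), ("2013-01-30",), ("2013-01-29",), ("2013-01-28",)],
)

# Same ordering for all three functions; only the handling of ties differs.
numbered = conn.execute(
    """
    SELECT uploaddate,
           RANK()       OVER (ORDER BY uploaddate DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY uploaddate DESC) AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY uploaddate DESC) AS row_num
    FROM uploads
    ORDER BY row_num
    """
).fetchall()

for row in numbered:
    print(row)
```

RANK() skips to 3 after the two tied rows, DENSE_RANK() continues with 2, and ROW_NUMBER() never repeats a number, so only the first two can be used to keep ties with a filter on 1.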

Oracle query needs to return the highest date from result

I have a really big query which is giving me trouble because one join can return several rows. I only want the latest row (identified by a date field) in this result set, but I can't seem to put together the correct query to make it work.
The query I need MAX date from is:
SELECT custid,reason,date FROM OPT opt WHERE opt.custid = 167043;
The custid is really found through a join, but for simplicity I've added it to the where clause here. This query produces the following result:
custid  reason    date
167043  "Test 1"  19.10.2005 12:33:18
167043  "Test 2"  28.11.2005 16:23:35
167043  "Test 3"  14.06.2010 15:43:16
How can I retrieve only one record from this result set, namely the one with the highest date? Ultimately I'm putting this into a big query which does a lot of joins, so hopefully I can carry this example over into my bigger query.
You can do this:
SELECT * FROM
( SELECT custid,reason,date FROM OPT opt WHERE opt.custid = 167043
ORDER BY date DESC
)
WHERE ROWNUM = 1;
You can solve it by using analytic functions. Try something like this:
select custid
,reason
,date
from (select custid
,reason
,date
,row_number() over(partition by custid order by date desc) as rn
from opt)
where rn = 1;
This is how it works: the result set is divided into groups of custid (partition by). In each group, the rows are sorted by the date column in descending order (order by). Each row within the group is assigned a sequence number (row_number) from 1 to N.
This way the row with the highest value for date is assigned 1, the second latest 2, the third latest 3, etc.
Finally, I just pick the rows with rn = 1, which filters out the other rows.
Or another way using the LAST function in its aggregate form.
with my_source_data as (
select 167043 as custid, 'Test 1' as reason, date '2010-10-01' as the_date from dual union all
select 167043 as custid, 'Test 2' as reason, date '2010-10-02' as the_date from dual union all
select 167043 as custid, 'Test 3' as reason, date '2010-10-03' as the_date from dual union all
select 167044 as custid, 'Test 1' as reason, date '2010-10-01' as the_date from dual
)
select
custid,
max(reason) keep (dense_rank last order by the_date) as reason,
max(the_date)
from my_source_data
group by custid
I find this quite useful as it rolls the process of finding the last row and the value all into one. MAX (or another aggregate function such as MIN) is needed in case the combination of the grouping and the order by is not deterministic, i.e. several rows in a group tie for the last the_date; the aggregate then decides which of the tied values is returned.
This function will basically take the contents of the column based on the grouping, order it by the ordering given then take the last value.
Rather than using row_number(), I think it's better to select what you actually want to select (e.g. the last date):
SELECT custid
, reason
, date
from
(
SELECT custid
, reason
, date
-- no ORDER BY in the window, so the max covers the whole partition
, max(opt.date) over (partition by opt.custid) last_date
FROM OPT opt
WHERE opt.custid = 167043
)
where date = last_date
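A sketch of how MAX() as a window function behaves with and without an ORDER BY inside the OVER clause, assuming Python's sqlite3 module as a stand-in for Oracle (both default to a frame that ends at the current row once an ORDER BY is present). The column name opt_date replaces date for the example; the rows follow the question's sample output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE opt (custid INTEGER, reason TEXT, opt_date TEXT)")
conn.executemany(
    "INSERT INTO opt VALUES (?, ?, ?)",
    [
        (167043, "Test 1", "2005-10-19 12:33:18"),
        (167043, "Test 2", "2005-11-28 16:23:35"),
        (167043, "Test 3", "2010-06-14 15:43:16"),
    ],
)


def rows_matching_last_date(window):
    """Rows whose opt_date equals MAX(opt_date) over the given window."""
    sql = f"""
        SELECT custid, reason, opt_date
        FROM (SELECT custid, reason, opt_date,
                     MAX(opt_date) OVER ({window}) AS last_date
              FROM opt) AS x
        WHERE opt_date = last_date
        ORDER BY opt_date
    """
    return conn.execute(sql).fetchall()


# With ORDER BY the max is a running max up to the current row, so every
# row equals its own "last_date" and nothing is filtered out.
with_order_by = rows_matching_last_date(
    "PARTITION BY custid ORDER BY opt_date"
)

# Without ORDER BY the max is taken over the whole partition.
without_order_by = rows_matching_last_date("PARTITION BY custid")

print(len(with_order_by), without_order_by)
```

Only the window without an ORDER BY narrows the result down to the latest row per customer.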
Both solutions, with ROW_NUMBER and with KEEP, are good. I would tend to prefer ROW_NUMBER when retrieving a large number of columns and reserve KEEP for one or two columns; otherwise you will have to deal with duplicates and the statement will get pretty unreadable.
For a small number of columns, however, KEEP should perform better.