Retrieve max date for distinct IDs in a table [duplicate] - sql

I have the table ABC with the following data
Id Name Date Execution id
-- ---- --------- -------------
1 AA 09SEP2019 11
1 AA 08SEP2019 22
1 AA 07SEP2019 33
2 BB 09SEP2019 44
2 BB 08SEP2019 55
2 BB 07SEP2019 66
I want to get, for every distinct ID in the table, the row with its max date. So the result set must be as follows:
Id Name Date Execution id
-- ---- --------- -------------
1 AA 09SEP2019 11
2 BB 09SEP2019 44
The query that returns the result I need:
WITH MaxDate as (
SELECT Id,Name,Max(Date) as Date from ABC group by Id,Name
)
SELECT view1.*, view2.execution_id
from
MaxDate view1,
ABC view2
WHERE
view1.date=view2.date and
view1.name=view2.name;
I don't like getting the max date for each distinct ID this way. Is there another, simpler way?

One way is to use RANK:
WITH cte AS (
SELECT ABC.*, RANK() OVER(PARTITION BY Id,Name ORDER BY Date DESC) rnk
FROM ABC
)
SELECT *
FROM cte
WHERE rnk = 1
ORDER BY id;
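For anyone who wants to try the RANK approach without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions), stores the dates as ISO strings, and renames the date column to date_; those names and the in-memory database are assumptions for illustration, not part of the question.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abc (id INTEGER, name TEXT, date_ TEXT, execution_id INTEGER)"
)
conn.executemany(
    "INSERT INTO abc VALUES (?, ?, ?, ?)",
    [
        (1, "AA", "2019-09-09", 11),
        (1, "AA", "2019-09-08", 22),
        (1, "AA", "2019-09-07", 33),
        (2, "BB", "2019-09-09", 44),
        (2, "BB", "2019-09-08", 55),
        (2, "BB", "2019-09-07", 66),
    ],
)

# Rank rows inside each (id, name) group, newest date first, and keep rank 1.
latest_rows = conn.execute(
    """
    WITH cte AS (
        SELECT abc.*,
               RANK() OVER (PARTITION BY id, name ORDER BY date_ DESC) AS rnk
        FROM abc
    )
    SELECT id, name, date_, execution_id
    FROM cte
    WHERE rnk = 1
    ORDER BY id
    """
).fetchall()
print(latest_rows)
```

Because RANK gives tied rows the same rank, two rows sharing the latest date for an ID would both be returned; ROW_NUMBER would keep only one of them.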

You can use keep dense_rank last to do this in one level of query, as long as you only want one or a small number of columns retained:
select id,
name,
max(date_) as date_,
max(execution_id) keep (dense_rank last order by date_) as execution_id
from abc
group by id, name
order by id;
ID NAME DATE_ EXECUTION_ID
---------- ---- ---------- ------------
1 AA 2019-09-09 11
2 BB 2019-09-09 44
If ID and name are not always the same, and you want the name from the latest date too, then use the same pattern:
select id,
max(name) keep (dense_rank last order by date_) as name,
max(date_) as date_,
max(execution_id) keep (dense_rank last order by date_) as execution_id
from abc
group by id
order by id;
which gets the same result with your sample data.
With lots of columns it's probably simpler to use a subquery (CTE or inline view) with a ranking function and a filter (as @Lukasz shows).

With NOT EXISTS:
select t.* from ABC t
where not exists (
select 1 from ABC
where "Id" = t."Id" and "Name" = t."Name" and "Date" > t."Date"
)
I used and name = t.name only because you have it in your code.
If it is not needed you can remove it.
Results:
Id | Name | Date | Execution id
-: | :--- | :---------| -----------:
1 | AA | 09-SEP-19 | 11
2 | BB | 09-SEP-19 | 44

Related

Get certain rows, plus rows before and after

Let's say I have the following data set:
ID  Identifier Admission_Date Release_Date
--- ---------- -------------- ------------
234 2          5/1/22         5/5/22
234 1          4/25/22        4/30/22
234 2          4/20/22        4/24/22
234 2          4/15/22        4/18/22
789 1          7/15/22        7/19/22
789 2          7/8/22         7/14/22
789 2          7/1/22         7/5/22
321 2          6/1/21         6/3/21
321 2          5/27/21        5/31/21
321 1          5/20/21        5/26/21
321 2          5/15/21        5/19/21
321 2          5/6/21         5/10/21
I want all rows with identifier=1. I also want rows that are either directly below or above rows with Identifier=1 - sorted by most recent to least recent.
There is always a row below rows with identifier=1. There may or may not be a row above. If there is no row with identifier=1 for an ID, then it will not be brought in with a prior step.
The resulting data set should be as follows:
ID  Identifier Admission_Date Release_Date
--- ---------- -------------- ------------
234 2          5/1/22         5/5/22
234 1          4/25/22        4/30/22
234 2          4/20/22        4/24/22
789 1          7/15/22        7/19/22
789 2          7/8/22         7/14/22
321 2          5/27/21        5/31/21
321 1          5/20/21        5/26/21
321 2          5/15/21        5/19/21
I am using DBeaver, which runs PostgreSQL.
I admittedly don't know Postgres well, so the following could possibly be optimised. However, using a combination of lag and lead to obtain the previous and next dates (assuming Admission_Date is the column to order by), you could try:
with d as (
select *,
case when identifier = 1 then Lag(admission_date) over(partition by id order by Admission_Date desc) end pd,
case when identifier = 1 then Lead(admission_date) over(partition by id order by Admission_Date desc) end nd
from t
)
select id, Identifier, Admission_Date, Release_Date
from d
where identifier = 1
or exists (
select * from d d2
where d2.id = d.id
and (d.Admission_Date = d2.pd or d.admission_date = d2.nd)
)
order by Id, Admission_Date desc;
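A variation on the same lag/lead idea, sketched with Python's sqlite3 so it can be run without a Postgres server: instead of carrying the neighbouring dates, it looks at the identifier of the previous and next row and keeps a row when either neighbour (or the row itself) has identifier = 1. The table name t, the ISO date strings and the in-memory database are assumptions; SQLite 3.25 or later is needed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, identifier INTEGER,"
    " admission_date TEXT, release_date TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (234, 2, "2022-05-01", "2022-05-05"),
        (234, 1, "2022-04-25", "2022-04-30"),
        (234, 2, "2022-04-20", "2022-04-24"),
        (234, 2, "2022-04-15", "2022-04-18"),
        (789, 1, "2022-07-15", "2022-07-19"),
        (789, 2, "2022-07-08", "2022-07-14"),
        (789, 2, "2022-07-01", "2022-07-05"),
        (321, 2, "2021-06-01", "2021-06-03"),
        (321, 2, "2021-05-27", "2021-05-31"),
        (321, 1, "2021-05-20", "2021-05-26"),
        (321, 2, "2021-05-15", "2021-05-19"),
        (321, 2, "2021-05-06", "2021-05-10"),
    ],
)

# For every row, fetch the identifier of the row just above and just below it
# (per id, most recent first); a NULL neighbour simply fails the "= 1" test.
neighbour_rows = conn.execute(
    """
    WITH d AS (
        SELECT t.*,
               LAG(identifier)  OVER (PARTITION BY id ORDER BY admission_date DESC)
                   AS prev_identifier,
               LEAD(identifier) OVER (PARTITION BY id ORDER BY admission_date DESC)
                   AS next_identifier
        FROM t
    )
    SELECT id, identifier, admission_date, release_date
    FROM d
    WHERE identifier = 1 OR prev_identifier = 1 OR next_identifier = 1
    ORDER BY id, admission_date DESC
    """
).fetchall()
for row in neighbour_rows:
    print(row)
```

This makes a single pass over the table and avoids the EXISTS subquery, at the cost of computing two window columns for every row.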
One way:
SELECT (x.my_row).* -- decompose fields from row type
FROM (
SELECT identifier
, lag(t) OVER w AS t0 -- take whole row
, t AS t1
, lead(t) OVER w AS t2
FROM tbl t
WINDOW w AS (PARTITION BY id ORDER BY admission_date)
) sub
CROSS JOIN LATERAL (
VALUES (t0), (t1), (t2) -- pivot
) x(my_row)
WHERE sub.identifier = 1
AND (x.my_row).id IS NOT NULL; -- exclude rows with NULL ( = missing row)
The query is designed to only make a single pass over the table.
Uses some advanced SQL / Postgres features.
About LATERAL:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
About the VALUES expression:
Postgres: convert single row to multiple rows (unpivot)
The manual about extracting fields from a composite type.
If there are many rows per id, other solutions will be (much) faster - with proper index support. You did not specify ...

How to count rows in Hive?

this is my table:
ID/Number/Date
1/111/2021-01-01
2/111/2021-01-02
6/333/2921-01-01
I need a table which numbers the rows within each Number, ordered by Date ascending.
This should be my final table:
ID/Number/Date/Row_No_Count
1/111/2021-01-01/1
2/111/2021-01-02/2
6/333/2921-01-01/1
How can I achieve this with Hive? Is there a function for it?
ROW_NUMBER() is a window function for this type of work; it is available in Hive as well as in SQL Server.
You can solve your problem with the query below.
Query : Select *, row_number() Over (partition by Number order by Date) as Row_Number_Count From t ;
Output :
id Number Date Row_Number_Count
----------- ----------- ---------- --------------------
1 111 2021-01-01 1
2 111 2021-01-02 2
6 333 2921-01-01 1
(3 rows affected)
Try the row_number() window function, as below:
select t.*,
row_number() over(partition by Number order by Number,Date asc ) as Row_No_Count
from your_table t
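To check the numbering on the sample data without a Hive cluster, here is a small sketch using Python's sqlite3 (SQLite 3.25 or later). The column names num and dt are stand-ins for Number and Date, and the table name t is an assumption; the window expression itself is the same as in the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, num INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 111, "2021-01-01"),
        (2, 111, "2021-01-02"),
        (6, 333, "2921-01-01"),
    ],
)

# Restart the counter for every num and count upwards by date.
numbered_rows = conn.execute(
    """
    SELECT id, num, dt,
           ROW_NUMBER() OVER (PARTITION BY num ORDER BY dt ASC) AS row_no_count
    FROM t
    ORDER BY id
    """
).fetchall()
print(numbered_rows)
```

Ordering by the date inside the window is what makes the numbering deterministic; ordering only by the partition column leaves the order within each group undefined.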

How to join two tables without performing Cartesian product in SQL

I have index_date information for IDs and I want to extract the baseline (information between index_date and index_date minus 6 months). I want to do this without using a Cartesian product.
Total Table
ID index_date detail
1 01Jan2012 xyz
1 01Dec2011 pqr
1 01Nov2010 pqr
2 26Feb2013 abc
3 02Mar2013 abc
3 02Feb2013 ert
3 02Jan2013 tyu
4 07May2015 rts
I have a table A extracted from Total which has the index_dates:
ID index_date index_detail
1 01Jan2012 xyz
2 26Feb2013 abc
3 02Mar2013 abc
4 07May2015 rts
I want to extract the baseline period data for the IDs in A from the Total table.
Table want :
ID date index_date detail index_detail
1 01Jan2012 01Jan2012 xyz xyz
1 01Dec2011 01Jan2012 pqr xyz
2 26Feb2013 26Feb2013 abc abc
3 02Mar2013 02Mar2013 abc abc
3 02Feb2013 02Mar2013 ert abc
3 02Jan2013 02Mar2013 tyu abc
4 07May2015 07May2015 rts rts
code used :
create table want as
select a.* , b.date,b.detail
from table_a as a
right join
Total as b
on a.id = b.id where
a.index_date > b.date
AND b.date >= add_months( a.index_date, -6)
;
But this requires a Cartesian product. Is there a way to do this without one?
DBMS - Hive
Sorry, I don't know it.
I'll give a solution in pure SQL for MySQL 8+; maybe you'll find a way to convert it to Hive syntax.
SELECT id,
index_date date,
FIRST_VALUE(index_date) OVER (PARTITION BY ID ORDER BY STR_TO_DATE(index_date, '%d%b%Y') DESC) index_date,
detail,
FIRST_VALUE(detail) OVER (PARTITION BY ID ORDER BY STR_TO_DATE(index_date, '%d%b%Y') DESC) index_detail
FROM test
ORDER BY 1 ASC, 2 DESC
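The query above carries the index date onto every row but does not apply the six-month cut-off. A possible way to add it, still without any join, is sketched below with Python's sqlite3 (SQLite 3.25 or later). It assumes, as the query above does, that the index date of an ID is its latest date in the table; the ISO date strings, the table name total and the alias names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE total (id INTEGER, index_date TEXT, detail TEXT)")
conn.executemany(
    "INSERT INTO total VALUES (?, ?, ?)",
    [
        (1, "2012-01-01", "xyz"),
        (1, "2011-12-01", "pqr"),
        (1, "2010-11-01", "pqr"),
        (2, "2013-02-26", "abc"),
        (3, "2013-03-02", "abc"),
        (3, "2013-02-02", "ert"),
        (3, "2013-01-02", "tyu"),
        (4, "2015-05-07", "rts"),
    ],
)

# Step 1: copy the latest date and its detail onto every row of the same id.
# Step 2: keep only rows dated within six months before that latest date.
baseline_rows = conn.execute(
    """
    SELECT id, dt, latest_date, detail, index_detail
    FROM (
        SELECT id,
               index_date AS dt,
               detail,
               FIRST_VALUE(index_date)
                   OVER (PARTITION BY id ORDER BY index_date DESC) AS latest_date,
               FIRST_VALUE(detail)
                   OVER (PARTITION BY id ORDER BY index_date DESC) AS index_detail
        FROM total
    ) AS windowed
    WHERE dt >= date(latest_date, '-6 months')
    ORDER BY id, dt DESC
    """
).fetchall()
for row in baseline_rows:
    print(row)
```

In Hive the date arithmetic would use add_months, as in the question, rather than SQLite's date modifier.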
I would recommend three steps:
Convert the date to a number.
Find the minimum date in a six month period.
Get the first value in that group.
This looks like:
select t.*, t2.index_date, t2.detail
from (select t.*,
min(index_date) over (partition by id
order by months
range between 6 preceding and current row
) as sixmonth_date
from (select t.*,
year(index_date) * 12 + month(index_date) as months
from total t
) t
) t left join
total t2
on t2.id = t.id and t2.index_date = t.sixmonth_date;
This is marginally simpler if first_value() accepts range window frames -- but I'm not sure if it does. It is worth trying, though:
select t.*,
min(index_date) over (partition by id
order by months
range between 6 preceding and current row
) as sixmonth_date,
first_value(detail) over (partition by id
order by months
range between 6 preceding and current row
) as sixmonth_value
from (select t.*,
year(index_date) * 12 + month(index_date) as months
from total t
) t

How to get MAX Hike in Min month?

Below is the table:
Name | Hike% | Month
------------------------
A 7 1
A 6 2
A 8 3
b 4 1
b 7 2
b 7 3
Result should be:
Name | Hike% | Month
------------------------
A 8 3
b 7 2
Here is one way of doing this:
SELECT Name, [Hike%], Month
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY [Hike%] DESC, Month) rn
FROM yourTable
) t
WHERE rn = 1
ORDER BY Name;
If you instead want to return multiple records per name, in the case where two or more records might be tied for having the greatest hike%, then replace ROW_NUMBER with RANK.
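As a quick check of the ROW_NUMBER approach on the sample data, here is a sketch using Python's sqlite3 (SQLite 3.25 or later). The column is called hike instead of Hike% and the table name your_table is a placeholder; both are assumptions made so the example runs in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, hike INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("A", 7, 1), ("A", 6, 2), ("A", 8, 3),
        ("b", 4, 1), ("b", 7, 2), ("b", 7, 3),
    ],
)

# Highest hike first; the earliest month breaks ties between equal hikes.
top_rows = conn.execute(
    """
    SELECT name, hike, month
    FROM (
        SELECT yt.*,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY hike DESC, month) AS rn
        FROM your_table AS yt
    ) AS ranked
    WHERE rn = 1
    ORDER BY name
    """
).fetchall()
print(top_rows)
```

For name b the hikes in months 2 and 3 are tied at 7, so the secondary sort on month is what selects month 2.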
Use a correlated subquery:
select Name,min(Hike) as Hike,min(Month) as Month
from
(
select * from tablename a
where Hike in (select max(Hike) from tablename b where a.name=b.name)
)A group by Name
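You can use something similar to the below, which keeps the earliest month among the rows with the highest hike for each name:
SELECT t.Name, t.Hike, MIN(t.Month) AS Month
FROM tablename t
WHERE t.Hike = (SELECT MAX(Hike) FROM tablename WHERE Name = t.Name)
GROUP BY t.Name, t.Hike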
Hope this helps :)

Oracle SQL: retrieve sum and value from row in group based on value of another column

I am trying to create a summary query that returns the sum of the quantity for each group along with the description from the row with the largest quantity in that group.
For example, if the table looks like this:
GROUP QTY DESC
----- --- ----
1 23 CCC
1 42 AAA
1 61 BBB
2 11 ZZZ
2 53 XXX
2 32 YYY
The query would return:
1 126 BBB (desc from row with largest qty for group 1)
2 96 XXX (desc from row with largest qty for group 2)
Thanks!
The window function row_number() is your friend for this type of query. It assigns a sequential number to the rows within each partition, following the order by. You can then use this information in an aggregation:
select group, sum(qty), max(case when seqnum = 1 then desc end)
from (select t.*,
row_number() over (partition by group order by qty desc) as seqnum
from t
) t
group by group
By the way, group and desc are lousy names for columns because they conflict with reserved words. You should rename them or enclose them in double quotes in the query.
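Here is a runnable sketch of the same query using Python's sqlite3 (SQLite 3.25 or later), with the columns renamed to grp and descr to sidestep the reserved words. Those names, the table name t and the in-memory database are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp INTEGER, qty INTEGER, descr TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 23, "CCC"), (1, 42, "AAA"), (1, 61, "BBB"),
        (2, 11, "ZZZ"), (2, 53, "XXX"), (2, 32, "YYY"),
    ],
)

# seqnum = 1 marks the row with the largest qty in each group; the CASE
# expression is NULL for every other row, so MAX picks that one description.
summary_rows = conn.execute(
    """
    SELECT grp,
           SUM(qty) AS total_qty,
           MAX(CASE WHEN seqnum = 1 THEN descr END) AS top_descr
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY grp ORDER BY qty DESC) AS seqnum
        FROM t
    ) AS ranked
    GROUP BY grp
    ORDER BY grp
    """
).fetchall()
print(summary_rows)
```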