Choosing last biggest row by date [duplicate] - sql

I have sorted the data by accNumber and Date (3rd column) and grouped it by accNumber and Rest. Each account number can have many dates (3rd column, sorted from smallest to largest). I want to select the row with the highest date (3rd column) for each accNumber. Here is the code for sorting and grouping (from this result I want to choose the row with the largest date in the 3rd column for each accNumber):
select a.accNumber, a.Rest, min(a.Date), max(b.Date)
from t1 a, t1 b
where a.Rest=b.Rest
and a.accnumber=b.accnumber
group by a.accNumber, a.Rest
order by a.accNumber, 3
I am using MS SQL. Thanks in advance

It looks like you don't need a join here; you can do it with window functions:
with cte as (
select
a.accNumber, a.Rest,
min(a.Date) over(partition by a.accNumber, a.Rest) as min_Date,
max(a.Date) over(partition by a.accNumber, a.Rest) as max_Date,
row_number() over(partition by a.accNumber order by a.Date desc) as rn
from t1 as a
)
select accNumber, Rest, min_Date, max_Date
from cte
where rn = 1
A bit hard to write code without test data, but this should do the trick

Please try:
SELECT
accNumber,
Rest,
Date,
MaxDate
FROM(
select
accNumber, Rest, Date,
MAX(Date) OVER (Partition by accNumber, Rest) MaxDate,
ROW_NUMBER() OVER(Partition by accNumber order by Date desc) RNum
from
t1
)x
WHERE RNum=1
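Both answers rely on the same pattern: number the rows of each account from newest to oldest, then keep row 1. Below is a minimal sketch of that pattern, run against SQLite through Python's sqlite3 module instead of MS SQL. The sample rows are invented for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample rows; table and column names follow the question.
rows = [
    (1, 100, "2013-01-01"),
    (1, 100, "2013-01-05"),
    (1, 200, "2013-02-01"),
    (2, 50, "2013-01-10"),
    (2, 50, "2013-03-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (accNumber INTEGER, Rest INTEGER, Date TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT accNumber, Rest,
           MIN(Date) OVER (PARTITION BY accNumber, Rest) AS min_Date,
           MAX(Date) OVER (PARTITION BY accNumber, Rest) AS max_Date,
           -- rn = 1 marks the newest row of each account
           ROW_NUMBER() OVER (PARTITION BY accNumber ORDER BY Date DESC) AS rn
    FROM t1
)
SELECT accNumber, Rest, min_Date, max_Date
FROM cte
WHERE rn = 1
ORDER BY accNumber
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

With this data, each account yields exactly one row: the one carrying its most recent date, together with the min and max date of that row's (accNumber, Rest) group.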

Related

Hive - max (rather than last) date in quarter

I'm querying a table and only want to select the end of quarter dates, I've done so like this:
select
yyyy_mm_dd,
id
from
t1
where
yyyy_mm_dd = cast(date_add(trunc(add_months(yyyy_mm_dd,3-pmod(month(yyyy_mm_dd)-1,3)),'MM'),-1) as date) --last day of q
With daily rows, from 2020-01-01 until 2020-12-31, the above works fine. However, 2021 rows end up being omitted as the quarter is incomplete. How could I modify the where clause so I select the last day of each quarter and the max date in the current quarter?
You can assign a row number within each quarter in descending order of date, and keep the rows whose row number equals 1 (the last date in each quarter):
select yyyy_mm_dd, id
from
(select
yyyy_mm_dd,
id,
row_number() over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd) order by yyyy_mm_dd desc) as rn
from
t1
) t2
where rn = 1
It is not clear if you have multiple rows on the end-of-quarter dates. It might be safer to take the max and use that:
select t1.*
from (select t1.*,
max(yyyy_mm_dd) over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd)) as max_yyyy_mm_dd
from t1
) t1
where yyyy_mm_dd = max_yyyy_mm_dd;
Note that this uses t1.* for the select. If you only wanted the maximum date, you can aggregate:
select id, max(yyyy_mm_dd)
from t1
group by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd);
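To see why partitioning by year and quarter also handles the incomplete current quarter, here is a small sketch using SQLite through Python's sqlite3 module rather than Hive. SQLite has no quarter() function, so the quarter is derived from the month with integer division; that expression and the sample date range are assumptions made for this illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (yyyy_mm_dd TEXT, id INTEGER)")

# Hypothetical data: one row per day for id 1, ending in the middle of a quarter.
start, end = date(2020, 10, 1), date(2021, 2, 15)
days = [(start + timedelta(days=i)).isoformat() for i in range((end - start).days + 1)]
conn.executemany("INSERT INTO t1 VALUES (?, 1)", [(d,) for d in days])

query = """
SELECT yyyy_mm_dd, id
FROM (
    SELECT yyyy_mm_dd, id,
           ROW_NUMBER() OVER (
               PARTITION BY id,
                            strftime('%Y', yyyy_mm_dd),
                            -- months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
                            (CAST(strftime('%m', yyyy_mm_dd) AS INTEGER) + 2) / 3
               ORDER BY yyyy_mm_dd DESC
           ) AS rn
    FROM t1
) t2
WHERE rn = 1
ORDER BY yyyy_mm_dd
"""
quarter_ends = conn.execute(query).fetchall()
print(quarter_ends)
```

The completed quarter contributes its true last day (2020-12-31), while the unfinished one contributes the latest date loaded so far (2021-02-15), which is what the question asks for.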

Filter SQL Server Records by Latest Date on Every Year

How would I filter this SQL Server table so that only the green records are left, i.e. the last recorded date in every year for each Customer ID?
If you want to get the rows, not only the date values, using ROW_NUMBER() is an option (you only need to use the appropriate PARTITION BY and ORDER BY clauses):
SELECT *
FROM (
SELECT
CustomerId,
[Date],
ROW_NUMBER() OVER (PARTITION BY CustomerId, YEAR([Date]) ORDER BY [Date] DESC) AS Rn
FROM YourTable
) t
WHERE Rn = 1
To get the maximum date in each year, you can select every row for which no later row exists for the same customer in the same year, as follows:
SELECT *
FROM yourtable t1
WHERE NOT EXISTS
(SELECT 1
FROM yourtable t2
WHERE t1.customerID = t2.customerID
AND t2.date > t1.date
AND DATEPART(YEAR, t1.date) = DATEPART(YEAR, t2.date))
If you have only two columns, then you can just use aggregation:
select customer_id, max(date)
from t
group by customer_id, year(date);
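As a rough check that the NOT EXISTS formulation (keep a row only when no later row exists for the same customer and year) and the ROW_NUMBER() formulation return the same rows, here is a sketch using SQLite via Python's sqlite3 module. The amount column and the sample rows are invented, and strftime('%Y', ...) stands in for SQL Server's YEAR()/DATEPART().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (customerID INTEGER, date TEXT, amount INTEGER)")
# Hypothetical sample rows: two customers, two years each.
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", [
    (1, "2019-03-01", 10),
    (1, "2019-11-20", 20),
    (1, "2020-06-15", 30),
    (2, "2019-07-04", 40),
    (2, "2020-01-09", 50),
    (2, "2020-12-30", 60),
])

# Keep a row only if the same customer has no later row in the same year.
not_exists_sql = """
SELECT t1.customerID, t1.date, t1.amount
FROM yourtable t1
WHERE NOT EXISTS (
    SELECT 1
    FROM yourtable t2
    WHERE t2.customerID = t1.customerID
      AND strftime('%Y', t2.date) = strftime('%Y', t1.date)
      AND t2.date > t1.date
)
ORDER BY t1.customerID, t1.date
"""

# Same result via ROW_NUMBER(): newest row per (customer, year) gets rn = 1.
row_number_sql = """
SELECT customerID, date, amount
FROM (
    SELECT customerID, date, amount,
           ROW_NUMBER() OVER (
               PARTITION BY customerID, strftime('%Y', date)
               ORDER BY date DESC
           ) AS rn
    FROM yourtable
) t
WHERE rn = 1
ORDER BY customerID, date
"""

via_not_exists = conn.execute(not_exists_sql).fetchall()
via_row_number = conn.execute(row_number_sql).fetchall()
print(via_not_exists)
```

One difference to keep in mind: if two rows share the latest date within a year, NOT EXISTS keeps both, whereas ROW_NUMBER() keeps only one of them.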

How to aggregate the data into these form using SQL or Redshift?

For example: A ATM Machine fault data
s_id: Bank Branch
atm_id: Multiple atm on each branch
start_time: Ticket is created for fault occur
end_time: Ticket is closed
aggregate the overlapping data with group by s_id,atm_id
Raw Data
Output Required
This looks like a gaps-and-islands problem, because you want to identify "islands" of s_id, atm_id, and status that are on adjacent rows.
That suggests the difference of row numbers for this incarnation:
select s_id, atm_id, status_code,
min(start_time), max(end_time)
from (select t.*,
row_number() over (partition by s_id, atm_id order by start_time) as seqnum,
row_number() over (partition by s_id, atm_id, status_code order by start_time) as seqnum_s
from t
) t
group by s_id, atm_id, status_code, (seqnum - seqnum_s);
Why this finds adjacent rows with the same status is a little tricky to explain. However, if you look at the results of the subquery, I think you will see how the difference identifies the adjacent rows that you want to combine together.
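To make the difference-of-row-numbers trick concrete, the sketch below prints the subquery's output for a tiny invented dataset, using SQLite through Python's sqlite3 module. The status_code values and times are hypothetical; only the query shape comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (s_id INTEGER, atm_id INTEGER, status_code TEXT, "
    "start_time TEXT, end_time TEXT)"
)
# Hypothetical tickets for one ATM: status A, A, B, A, A in time order.
conn.executemany("INSERT INTO t VALUES (1, 1, ?, ?, ?)", [
    ("A", "08:00", "09:00"),
    ("A", "09:00", "10:00"),
    ("B", "10:00", "11:00"),
    ("A", "11:00", "12:00"),
    ("A", "12:00", "13:00"),
])

numbered_sql = """
SELECT t.*,
       ROW_NUMBER() OVER (PARTITION BY s_id, atm_id ORDER BY start_time) AS seqnum,
       ROW_NUMBER() OVER (PARTITION BY s_id, atm_id, status_code ORDER BY start_time) AS seqnum_s
FROM t
"""

# Step 1: look at the two row numbers and their difference.
steps = conn.execute(f"""
    SELECT start_time, status_code, seqnum, seqnum_s, seqnum - seqnum_s AS diff
    FROM ({numbered_sql}) AS x
    ORDER BY start_time
""").fetchall()
for row in steps:
    print(row)

# Step 2: rows with the same status and the same difference form one island.
islands = conn.execute(f"""
    SELECT s_id, atm_id, status_code,
           MIN(start_time) AS island_start, MAX(end_time) AS island_end
    FROM ({numbered_sql}) AS x
    GROUP BY s_id, atm_id, status_code, (seqnum - seqnum_s)
    ORDER BY island_start
""").fetchall()
print(islands)
```

Within a run of adjacent rows sharing a status, both row numbers advance by one per row, so their difference stays constant. When another status interrupts the run, seqnum keeps advancing while seqnum_s does not, so the next run with that status gets a different difference.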
Perhaps a self-join with some aggregation:
SELECT t1.s_id, t1.atm_id,
MIN(t2.start_time) as start_time,
MAX(t2.end_time) as end_time
FROM YourTable t1
LEFT JOIN YourTable t2
ON t2.s_id = t1.s_id
AND t2.atm_id = t1.atm_id
AND t2.start_time <= t1.end_time
AND t2.end_time >= t1.start_time
GROUP BY t1.s_id, t1.atm_id

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. The email 1@gmail.com was created twice, and I would like to count it only once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
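Here is a small sketch of the row_number() idea, run on SQLite through Python's sqlite3 module. It is a variant rather than the exact query above: the per-day aggregation is moved into its own subquery so that no aggregate is nested inside a window function. The table name tbl, the column dt, and the sample emails are assumptions, and the data deliberately includes one email that appears twice on the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (email TEXT, dt TEXT)")
# Hypothetical sign-ups; b@example.com appears twice on 2021-01-03.
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [
    ("a@example.com", "2021-01-01"),
    ("b@example.com", "2021-01-01"),
    ("a@example.com", "2021-01-02"),
    ("c@example.com", "2021-01-02"),
    ("b@example.com", "2021-01-03"),
    ("b@example.com", "2021-01-03"),
    ("d@example.com", "2021-01-03"),
])

query = """
SELECT dt,
       day_total,
       SUM(day_total)  OVER (ORDER BY dt) AS cumsum,
       SUM(new_emails) OVER (ORDER BY dt) AS cumdist
FROM (
    -- one row per day: total rows and rows that are an email's first appearance
    SELECT dt,
           COUNT(*) AS day_total,
           SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS new_emails
    FROM (
        SELECT dt, email,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY dt) AS seqnum
        FROM tbl
    ) AS numbered
    GROUP BY dt
) AS per_day
ORDER BY dt
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

Because exactly one row per email receives seqnum = 1, the running sum of those rows equals the number of distinct emails seen up to each date, even when an email repeats within a single day.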
Getting the total count and cumulative count is straightforward. To get the cumulative distinct count, use lag to check whether the email has a row with an earlier date; if it does, set the flag to 0 so the row is ignored in the running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
Here is a solution that uses neither sum() over nor lag(), and it still produces the correct results.
It may therefore be simpler to read and maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1

To find the last updated record of each month for each policy(another field)

I have a table named a with the fields eff_date and policy_no, among others.
Now, for each policy, consider all the records and take the last updated one (by eff_date) from each month.
So I need the last updated record for each month for each policy. How would I write a query for this?
I'm not 100 percent on Teradata syntax, but I believe you're after this:
SELECT policy_no,eff_date
FROM (SELECT policy_no,eff_date, ROW_NUMBER() OVER (PARTITION BY policy_no, EXTRACT(YEAR FROM eff_date),EXTRACT(MONTH FROM eff_date) ORDER BY eff_date DESC) as RowRank
FROM a) as sub
WHERE RowRank = 1
I'm assuming when you say by month you also want to differentiate by year, but if not, just remove the EXTRACT(YEAR FROM eff_date) from the PARTITION BY section.
Edit: Update for Teradata syntax.
SELECT * from a
qualify ROW_NUMBER() OVER (PARTITION BY policy_no, EXTRACT(YEAR FROM eff_date),
EXTRACT(MONTH FROM eff_date) ORDER BY eff_date DESC) = 1
The main difficulty is that the GROUP BY must use the combination of policy_no and the month (extracted from the date). For example:
In MySQL:
SELECT policy_no,
month(eff_date),
year(eff_date),
max(eff_date)
FROM myTable
GROUP BY policy_no,
month(eff_date),
year(eff_date);
Update
I saw that derived tables are allowed in Teradata. Using a join to a derived table, here is how to access the full rows:
select * from a,
(SELECT policy_no,
month(eff_date),
year(eff_date),
max(eff_date) as MaxMonthDate
FROM a
GROUP BY policy_no,
month(eff_date),
year(eff_date)
) as b
where a.policy_no = b.policy_no and
a.eff_date = b.MaxMonthDate;
http://www.sqlfiddle.com/#!2/1f728/5
Update (Using Extract)
select * from a,
(SELECT a2.policy_no,
EXTRACT(MONTH FROM a2.eff_date),
EXTRACT(YEAR FROM a2.eff_date),
max(a2.eff_date) as MaxMonthDate
FROM a as a2
GROUP BY a2.policy_no,
EXTRACT(MONTH FROM a2.eff_date),
EXTRACT(YEAR FROM a2.eff_date)
) as b
where a.policy_no = b.policy_no and
a.eff_date = b.MaxMonthDate;
I'd suggest looking into window aggregate functions and the QUALIFY clause. I believe the following SQL will work.
SELECT Policy_No
, EXTRACT(MONTH FROM Eff_Date) AS Eff_Month_
, Eff_Date
FROM TableA
QUALIFY ROW_NUMBER() OVER (PARTITION BY Policy_No, EXTRACT(MONTH FROM Eff_Date)
ORDER BY Eff_Date DESC) = 1;
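Whether the partition includes the year changes the result as soon as the data spans more than one year. The sketch below illustrates this with SQLite through Python's sqlite3 module. SQLite has no QUALIFY, so the row number is filtered in an outer query, and strftime() replaces EXTRACT(); the helper last_per_group and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (policy_no TEXT, eff_date TEXT)")
# Hypothetical rows: policy P1 has records in January of two different years.
conn.executemany("INSERT INTO a VALUES (?, ?)", [
    ("P1", "2014-01-05"),
    ("P1", "2014-01-20"),
    ("P1", "2014-02-03"),
    ("P1", "2015-01-10"),
    ("P2", "2014-01-15"),
    ("P2", "2014-01-31"),
])

def last_per_group(partition_expr):
    """Return the newest row per (policy_no, partition_expr) group."""
    query = f"""
    SELECT policy_no, eff_date
    FROM (
        SELECT policy_no, eff_date,
               ROW_NUMBER() OVER (
                   PARTITION BY policy_no, {partition_expr}
                   ORDER BY eff_date DESC
               ) AS row_rank
        FROM a
    ) AS sub
    WHERE row_rank = 1
    ORDER BY policy_no, eff_date
    """
    return conn.execute(query).fetchall()

# Year and month together: one row per calendar month.
by_year_month = last_per_group("strftime('%Y-%m', eff_date)")
# Month only: January 2014 and January 2015 fall into the same group.
by_month_only = last_per_group("strftime('%m', eff_date)")

print(by_year_month)
print(by_month_only)
```

With the month-only partition, P1's January 2014 record disappears because January 2015 wins within the same group, so the year should be part of the partition unless that merging is intended.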