Filter SQL Server Records by Latest Date on Every Year - sql

How would I filter this SQL server database so only the green records are left aka the last recorded date every year for each Customer ID field.

If you want to get the rows, not only the date values, using ROW_NUMBER() is an option (you only need to use the appropriate PARTITON BY and ORDER BY clauses):
SELECT *
FROM (
SELECT
CustomerId,
[Date],
ROW_NUMBER() OVER (PARTITION BY CustomerId, YEAR[Date] ORDER BY [Date] DESC) AS Rn
FROM YourTable
) t
WHERE Rn = 1

To check the maximum date in the year, you can write a query to get for each year the date where not exists another (in the same year), as follow:
SELECT *
FROM yourtable t1
WHERE NOT EXISTS
(SELECT 1
FROM yourtable t2
WHERE t1.customerID = t2.customerID
AND t1.date > t2.date
AND DATEPART(YEAR, t1) = DATEPART(YEAR, t2))

If you have only two columns, then you can just use aggregation:
select customer_id, max(date)
from t
group by customer_id, year(date);

Related

Hive - max (rather than last) date in quarter

I'm querying a table and only want to select the end of quarter dates, I've done so like this:
select
yyyy_mm_dd,
id
from
t1
where
yyyy_mm_dd = cast(date_add(trunc(add_months(yyyy_mm_dd,3-pmod(month(yyyy_mm_dd)-1,3)),'MM'),-1) as date) --last day of q
With daily rows, from 2020-01-01 until 2020-12-31, the above works fine. However, 2021 rows end up being omitted as the quarter is incomplete. How could I modify the where clause so I select the last day of each quarter and the max date in the current quarter?
You can assign a row number for each quarter in descending order of date, and filter the rows with row number equals 1 (last date in each quarter):
select yyyy_mm_dd, id
from
(select
yyyy_mm_dd,
id,
row_number() over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd) order by yyyy_mm_dd desc) as rn
from
t1
) t2
where rn = 1
It is not clear if you have multiple rows on the end-of-quarter dates. It might be safer to take the max and use that:
select t1.*
from (select t1.*,
max(yyyy_mm_dd) over (partition by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd)) as max_yyyy_mm_dd
from t1
) t1
where yyyy_mm_dd = max_yyyy_mm_dd;
Note that this uses t1.* for the select. If you only wanted the maximum date, you can aggregate:
select id, max(yyyy_mm_dd)
from t1
group by id, year(yyyy_mm_dd), quarter(yyyy_mm_dd);

Find the max date to last one year transaction for each group

I have to query in sql server where I have to find for each id it's volume such that we have last 1 year date for each id with it's volume.
for example below is my data ,
for each id I need to query the last 1 year transaction from when we have the entry for that id as you can see from the snippet for id 1 we have the latest date as 7/31/2020 so I need the last 1 year entry from that date for that id, The highlighted one is exclude because that date is more than 1 year from the latest date for that id
Similarly for Id 3 we have all the date range in one year from the latest date for that particular id
I tried using the below query and I can get the latest date for each id but I am not sure how to extract all the dates for each id from the latest date to one year, I would appreciate if some one could help me.
I am using Microsoft sql server would need the query which executes in sql server, Table name is emp and have millions of id
Select *
From emp as t
inner join (
Select tm.id, max(tm.date_tran) as MaxDate
From emp tm
Group by tm.id
) tm on t.id = tm.id and t.date_tran = tm.MaxDate
To exclude transactions where the date difference between the tran_date and the maximum tran_date for each id is greater than 1 year, something like this:
;with max_cte(id, max_date) as (
Select id, max(date_tran)
From emp tm
Group by id )
Select *
From emp e
join max_cte mc on e.id=mc.id
and datediff(d, e.date_tran, mc.max_date)<=365;
Update: per comments, added volume. Thnx GMB :)
;with max_cte(id, date_tran, volume, max_date) as (
Select *, dateadd(year, -1, max(date_tran) over(partition by id)) max_date
From #emp tm)
Select id, sum(volume) sum_volume
From max_cte mc
where mc.date_tran>max_date
group by id;
You can do this with window functions:
select id, sum(volume) total_volume
from (
select t.*, max(date_tran) over(partition by id) max_date_tran
from mytable t
) t
where date_tran > dateadd(year, -1, max_date_tran)
group by id
Alternatively, you can use a correlated subquery for filtering:
select id, sum(volume) total_volume
from mytable t
where t.date_tran > (
select dateadd(year, -1, max(t1.date_tran))
from mytable t1
where t1.id = t.id
)
The second query would take advantage of an index on (id, date_tran).
this should do the trick for you:
SELECT
*
FROM
emp
JOIN
(
SELECT
MAX(date_tran) max_date_tran
, Id
FROM
emp
GROUP BY
id
) emp2
ON emp2.Id = emp.Id
AND DATEADD(YEAR, -1, emp2.max_date_tran) <= emp.date_tran;
Your code is good. Just add the date difference function to get the particular time in between the transaction, like the following:
Select *
From emp as t
inner join ( Select id as id, max(date_tran) as maxdate
From emp tm
Group by id
) tm on t.id = tm.id and datediff(d, e.date_tran, mc.maxdate)<=365;

Running count distinct

I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. Email 1#gmail.com was created twice and I would like to count it once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
I would usually do this using row_number():
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
This is similar to the version using lag(). However, I get nervous using lag if the same email appears multiple times on the same date.
Getting the total count and cumulative count is straight forward. To get the cumulative distinct count, use lag to check if the email had a row with a previous date, and set the flag to 0 so it would be ignored during a running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
DEMO HERE
Here is a solution that does not uses sum over, neither lag... And does produces the correct results.
Hence it could appear as simpler to read and to maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1

Date Difference between consecutive rows adding additional columns

Say I added a Cost Difference column to the second table from Rishal (see the below link for this previous post), how would I also calculate and display that?
Using just the 1001 Account Number and adding the following amounts of ID1=$10, ID4=$33 and ID6=$50 to the first table, how would I display in Rishal's second table a result of $23 and $17 in addition to the other 3 columns that are already there?
I've used this code (from GarethD) and would like to insert my Cost Difference column within this...Thanks in advance,
SELECT ID,
AccountNumber,
Date,
NextDate,
DATEDIFF("D", Date, NextDate)
FROM ( SELECT ID,
AccountNumber,
Date,
( SELECT MIN(Date)
FROM YourTable T2
WHERE T2.Accountnumber = T1.AccountNumber
AND T2.Date > T1.Date
) AS NextDate
FROM YourTable T1
) AS T
Date Difference between consecutive rows
I would recommend using JOIN to bring in the entire next record:
SELECT T.*, DATEDIFF("D", t.Date, t.NextDate) as datediff,
TNext.Amount, (Tnext.Amount - T.Amount) as amountdiff
FROM (SELECT T1.*,
(SELECT MIN(Date)
FROM YourTable T2
WHERE T2.Accountnumber = T1.AccountNumber AND
T2.Date > T1.Date
) AS NextDate
FROM YourTable as T1
) AS T LEFT JOIN
YourTable as Tnext
ON t.Accountnumber = tnext.Accountnumber AND t.Date = tnext.Accountnumber;

Choosing last biggest row by date [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 8 years ago.
I have sorted data by accNumber and Date (3rd coulmn) and grouped it by accNumber and Rest. For each account number there could be many dates (3rd column, sorted from smallest to largest). I want to select rows with highest date (3rd column) for each accNumber. Here is the code for sorting and grouping (from here I want to choose rows with largest date in 3rd columnfor each accNumber):
select a.accNumber, a.Rest, min(a.Date), max(b.Date)
from t1 a, t1 b
where a.Rest=b.Rest
and a.accnumber=b.accnumber
group by a.accNumber, a.Rest
order by a.accNumber, 3
I am using MS SQL. Thanks in advance
looks like you don't need join here, you can do it by window functions:
with cte as (
select
a.accNumber, a.Rest,
min(a.Date) over(partition by a.accNumber, a.Rest) as min_Date,
max(a.Date) over(partition by a.accNumber, a.Rest) as max_Date,
row_number() over(partition by a.accNumber order by a.Date desc) as rn
from t1 as a
)
select accNumber, Rest, min_Date, max_Date
from cte
where rn = 1
A bit hard to write code without test data, but this should do the trick
Please try:
SELECT
accNumber,
Rest,
Date,
MaxDate
FROM(
select
accNumber, Rest, Date,
MAX(Date) OVER (Partition by accNumber, Rest) MaxDate,
ROW_NUMBER() OVER(Partition by accNumber order by Date desc) RNum
from
t1
)x
WHERE RNum=1