Find the transactions within one year of the latest date for each group - sql

I have to write a query in SQL Server that returns, for each id, its volume for the last year, counting back from the latest date for that id.
For example, below is my data.
For each id I need the transactions from the last year, counting back from the latest entry for that id. As you can see from the snippet, the latest date for id 1 is 7/31/2020, so I need the entries from the year before that date for that id. The highlighted row is excluded because its date is more than 1 year before the latest date for that id.
Similarly, for id 3 all of the dates fall within one year of the latest date for that id.
I tried the query below, and I can get the latest date for each id, but I am not sure how to extract all the dates for each id from one year before the latest date up to the latest date. I would appreciate it if someone could help me.
I am using Microsoft SQL Server, so I need a query that runs there. The table name is emp, and it has millions of ids.
Select *
From emp as t
inner join (
Select tm.id, max(tm.date_tran) as MaxDate
From emp tm
Group by tm.id
) tm on t.id = tm.id and t.date_tran = tm.MaxDate

To exclude transactions where the difference between date_tran and the maximum date_tran for each id is greater than 1 year, something like this:
;with max_cte(id, max_date) as (
Select id, max(date_tran)
From emp tm
Group by id )
Select *
From emp e
join max_cte mc on e.id=mc.id
and datediff(d, e.date_tran, mc.max_date)<=365;
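A minimal sketch of the same join-against-the-max filter, run through Python's built-in sqlite3 module as a stand-in for SQL Server. This is an assumption worth stating: SQLite has no DATEDIFF, so a julianday() difference supplies the day count instead. The sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, date_tran TEXT, volume INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (1, "2019-06-30", 10),  # 397 days before id 1's latest date: excluded
        (1, "2019-08-15", 20),  # 351 days before: kept
        (1, "2020-07-31", 30),  # latest date for id 1
        (3, "2020-01-10", 5),
        (3, "2020-03-01", 7),
    ],
)

# julianday(a) - julianday(b) gives the day count, like DATEDIFF(d, b, a).
query = """
WITH max_cte(id, max_date) AS (
    SELECT id, MAX(date_tran) FROM emp GROUP BY id
)
SELECT e.id, e.date_tran, e.volume
FROM emp e
JOIN max_cte mc
  ON e.id = mc.id
 AND julianday(mc.max_date) - julianday(e.date_tran) <= 365
ORDER BY e.id, e.date_tran
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the cutoff is a fixed 365 days, a window that spans February 29 is one day shorter than a calendar year; DATEADD(year, -1, ...) avoids that.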
Update: per the comments, I added volume. Thanks, GMB :)
;with max_cte(id, date_tran, volume, cutoff_date) as (
Select id, date_tran, volume, dateadd(year, -1, max(date_tran) over(partition by id))
From emp tm)
Select id, sum(volume) sum_volume
From max_cte mc
where mc.date_tran>cutoff_date
group by id;

You can do this with window functions:
select id, sum(volume) total_volume
from (
select t.*, max(date_tran) over(partition by id) max_date_tran
from mytable t
) t
where date_tran > dateadd(year, -1, max_date_tran)
group by id
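A sketch of the window-function version, again using SQLite through Python's sqlite3 module as a stand-in (an assumption; window functions need SQLite 3.25 or later). SQLite's date(x, '-1 year') takes the place of DATEADD(year, -1, x), and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date_tran TEXT, volume INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2019-06-30", 10),  # before 2019-07-31, so filtered out
        (1, "2019-08-15", 20),
        (1, "2020-07-31", 30),
        (3, "2020-01-10", 5),
        (3, "2020-03-01", 7),
    ],
)

# MAX(...) OVER (PARTITION BY id) attaches each id's latest date to every row.
query = """
SELECT id, SUM(volume) AS total_volume
FROM (
    SELECT t.*, MAX(date_tran) OVER (PARTITION BY id) AS max_date_tran
    FROM mytable t
) t
WHERE date_tran > date(max_date_tran, '-1 year')
GROUP BY id
ORDER BY id
"""
totals = conn.execute(query).fetchall()
print(totals)
```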
Alternatively, you can use a correlated subquery for filtering:
select id, sum(volume) total_volume
from mytable t
where t.date_tran > (
select dateadd(year, -1, max(t1.date_tran))
from mytable t1
where t1.id = t.id
)
group by id
The second query can take advantage of an index on (id, date_tran).
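A sketch of the correlated-subquery version with a GROUP BY id so the SUM is valid, run against SQLite via Python's sqlite3 module (a stand-in for SQL Server). The index name ix_mytable_id_date and the rows are hypothetical; the query plan is printed only for inspection, since its text format differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date_tran TEXT, volume INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "2019-06-30", 10),
        (1, "2019-08-15", 20),
        (1, "2020-07-31", 30),
        (3, "2020-01-10", 5),
        (3, "2020-03-01", 7),
    ],
)
# Composite index on (id, date_tran); the name is made up.
conn.execute("CREATE INDEX ix_mytable_id_date ON mytable (id, date_tran)")

# For each row, the subquery looks up that id's cutoff date (latest date - 1 year).
query = """
SELECT id, SUM(volume) AS total_volume
FROM mytable t
WHERE t.date_tran > (
    SELECT date(MAX(t1.date_tran), '-1 year')
    FROM mytable t1
    WHERE t1.id = t.id
)
GROUP BY id
ORDER BY id
"""
totals = conn.execute(query).fetchall()
print(totals)

# Show how SQLite plans the correlated lookup.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

It returns the same totals as the window-function query; which one is faster on millions of ids depends on the engine and the data.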

This should do the trick for you:
SELECT
*
FROM
emp
JOIN
(
SELECT
MAX(date_tran) max_date_tran
, Id
FROM
emp
GROUP BY
id
) emp2
ON emp2.Id = emp.Id
AND DATEADD(YEAR, -1, emp2.max_date_tran) <= emp.date_tran;

Your code is good. Just replace the equality on the max date with a date-difference check, so that you get every transaction within a year of the latest one, like the following:
Select *
From emp as t
inner join ( Select id as id, max(date_tran) as maxdate
From emp tm
Group by id
) tm on t.id = tm.id and datediff(d, t.date_tran, tm.maxdate)<=365;

Related

Filter SQL Server Records by Latest Date on Every Year

How can I filter this SQL Server table so that only the green records are left, i.e. the last recorded date in every year for each CustomerId?
If you want to get the rows, not only the date values, using ROW_NUMBER() is an option (you only need the appropriate PARTITION BY and ORDER BY clauses):
SELECT *
FROM (
SELECT
CustomerId,
[Date],
ROW_NUMBER() OVER (PARTITION BY CustomerId, YEAR([Date]) ORDER BY [Date] DESC) AS Rn
FROM YourTable
) t
WHERE Rn = 1
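A sketch of the ROW_NUMBER() approach run in SQLite through Python's sqlite3 module (a stand-in for SQL Server; needs SQLite 3.25+). SQLite has no YEAR(), so strftime('%Y', ...) is used instead; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (CustomerId INTEGER, [Date] TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [
        (1, "2019-02-01"), (1, "2019-11-20"), (1, "2020-05-05"),
        (2, "2019-07-07"), (2, "2019-07-08"),
    ],
)

# strftime('%Y', ...) plays the role of SQL Server's YEAR([Date]).
query = """
SELECT CustomerId, [Date]
FROM (
    SELECT CustomerId, [Date],
           ROW_NUMBER() OVER (
               PARTITION BY CustomerId, strftime('%Y', [Date])
               ORDER BY [Date] DESC
           ) AS Rn
    FROM YourTable
) t
WHERE Rn = 1
ORDER BY CustomerId, [Date]
"""
latest_per_year = conn.execute(query).fetchall()
print(latest_per_year)
```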
To get the maximum date in each year, you can select each row for which no other row of the same customer in the same year has a later date, as follows:
SELECT *
FROM yourtable t1
WHERE NOT EXISTS
(SELECT 1
FROM yourtable t2
WHERE t1.customerID = t2.customerID
AND t2.[date] > t1.[date]
AND DATEPART(YEAR, t1.[date]) = DATEPART(YEAR, t2.[date]))
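A sketch of the NOT EXISTS filter in SQLite via Python's sqlite3 module (a stand-in for SQL Server). The inner query must look for a *later* date in the same year; strftime('%Y', ...) replaces DATEPART(YEAR, ...). The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (customerID INTEGER, [date] TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [
        (1, "2019-02-01"), (1, "2019-11-20"), (1, "2020-05-05"),
        (2, "2019-07-07"), (2, "2019-07-08"),
    ],
)

# Keep a row only if no row of the same customer in the same year is later.
query = """
SELECT t1.customerID, t1.[date]
FROM yourtable t1
WHERE NOT EXISTS (
    SELECT 1
    FROM yourtable t2
    WHERE t1.customerID = t2.customerID
      AND t2.[date] > t1.[date]
      AND strftime('%Y', t1.[date]) = strftime('%Y', t2.[date])
)
ORDER BY t1.customerID, t1.[date]
"""
last_dates = conn.execute(query).fetchall()
print(last_dates)
```

Unlike ROW_NUMBER() = 1, this keeps every row that ties for the last date in a year.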
If you have only two columns, then you can just use aggregation:
select customer_id, max(date)
from t
group by customer_id, year(date);

How to modify my T-SQL query so that it outputs all records which appear twice or more based on 2 different periods?

I am using SQL Server 2014 and I have the following T-SQL query:
Use MyDatabase
;WITH Query_CTE AS
(
SELECT
ResID, Name,
ArrivalDate, Status,
ProfileID,
ROW_NUMBER() OVER(PARTITION BY [ResID] ORDER BY [StayDate]) AS xy
FROM
(SELECT *
FROM View1) xx
)
SELECT *
FROM Query_CTE
WHERE Query_CTE.[xy] = 1
I need to modify the above query so that it outputs all the records whose ArrivalDate is between '2018-04-01' and '2018-12-31' and whose ProfileID also appears among the records with an ArrivalDate before '2018-04-01'.
How can I do this?
First, you need to add a WHERE clause to your CTE to get only the records whose arrival date is between 2018-04-01 and 2018-12-31.
Then you need to add EXISTS to check whether the same ProfileID also exists in records before 2018-04-01:
;WITH Query_CTE AS
(
SELECT
ResID
,Name
,ArrivalDate
,Status
,ProfileID
,ROW_NUMBER() OVER(PARTITION BY [ResID] ORDER BY [StayDate]) AS xy
FROM View1 v1
WHERE ArrivalDate >= '2018-04-01'
AND ArrivalDate <= '2018-12-31'
AND EXISTS
(
SELECT 1
FROM View1 v2
WHERE v2.ProfileID = v1.ProfileID
AND v2.ArrivalDate < '2018-04-01'
)
)
SELECT * FROM Query_CTE
WHERE Query_CTE.[xy] = 1
Side note: the derived table in the CTE is completely redundant, so I've removed it.
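A sketch of the EXISTS filter run in SQLite via Python's sqlite3 module, with a plain table named View1 standing in for the view (an assumption; the columns come from the question, the rows are made up). Profile 1 has a stay before April 2018 and is returned; profile 2 does not and is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A plain table named View1 stands in for the view; the rows are hypothetical.
conn.execute(
    "CREATE TABLE View1 (ResID INTEGER, Name TEXT, ArrivalDate TEXT, "
    "Status TEXT, ProfileID INTEGER, StayDate TEXT)"
)
conn.executemany(
    "INSERT INTO View1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (90, "A", "2018-02-01", "OK", 1, "2018-02-01"),   # earlier stay, profile 1
        (101, "A", "2018-05-10", "OK", 1, "2018-05-10"),  # in range, profile seen before
        (101, "A", "2018-05-10", "OK", 1, "2018-05-11"),  # second night, same ResID
        (102, "B", "2018-06-01", "OK", 2, "2018-06-01"),  # in range, new profile
    ],
)

# The date-range WHERE and the EXISTS check run before ROW_NUMBER() numbers the rows.
query = """
WITH Query_CTE AS (
    SELECT ResID, Name, ArrivalDate, Status, ProfileID,
           ROW_NUMBER() OVER (PARTITION BY ResID ORDER BY StayDate) AS xy
    FROM View1 v1
    WHERE ArrivalDate >= '2018-04-01'
      AND ArrivalDate <= '2018-12-31'
      AND EXISTS (
          SELECT 1
          FROM View1 v2
          WHERE v2.ProfileID = v1.ProfileID
            AND v2.ArrivalDate < '2018-04-01'
      )
)
SELECT ResID, ProfileID, ArrivalDate
FROM Query_CTE
WHERE xy = 1
"""
repeat_guests = conn.execute(query).fetchall()
print(repeat_guests)
```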

Calculating per day in SQL

I have an SQL table like this:
Id Date Price
1 21.09.09 25
2 31.08.09 16
1 23.09.09 21
2 03.09.09 12
What I need is the min and max date for each id and the difference in days between them. That part is fairly easy; using SQLite syntax:
SELECT id,
min(date),
max(date),
julianday(max(date)) - julianday(min(date)) as dif
from table group by id
Then the tricky part: how can I get the price for each day in that period? I mean something like this:
ID Date PricePerDay
1 21.09.09 25
1 22.09.09 0
1 23.09.09 21
2 31.08.09 16
2 01.09.09 0
2 02.09.09 0
2 03.09.09 12
I created a calendar CTE as you suggested, but I don't know how to get the desired result:
WITH RECURSIVE
cnt(x) AS (
SELECT 0
UNION ALL
SELECT x+1 FROM cnt
LIMIT (SELECT ((julianday('2015-12-31') - julianday('2015-01-01')) + 1)))
SELECT date(julianday('2015-01-01'), '+' || x || ' days') as date FROM cnt
P.S. An answer in SQLite syntax would be awesome!
You can use a recursive CTE to generate all the days between the min date and the max date. The rest is just a left join and some logic:
with recursive cte as (
select t.id, min(date) as thedate, max(date) as maxdate
from t
group by id
union all
select cte.id, date(thedate, '+1 day') as thedate, cte.maxdate
from cte
where cte.thedate < cte.maxdate
)
select cte.id, cte.thedate,
coalesce(t.price, 0) as PricePerDay
from cte left join
t
on cte.id = t.id and cte.thedate = t.date;
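A sketch of the recursive calendar plus left join, run in SQLite via Python's sqlite3 module. It assumes the dates are already stored as YYYY-MM-DD strings (the question's DD.MM.YY values would need converting first), and the final select refers to the generated column as cte.thedate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates stored as ISO strings (an assumption) so date() and < work on them.
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2009-09-21", 25), (1, "2009-09-23", 21),
        (2, "2009-08-31", 16), (2, "2009-09-03", 12),
    ],
)

query = """
WITH RECURSIVE cte AS (
    -- anchor: one row per id, starting at its first date
    SELECT id, MIN(date) AS thedate, MAX(date) AS maxdate
    FROM t
    GROUP BY id
    UNION ALL
    -- step: advance one day until the last date is reached
    SELECT id, date(thedate, '+1 day'), maxdate
    FROM cte
    WHERE thedate < maxdate
)
SELECT cte.id, cte.thedate, COALESCE(t.price, 0) AS PricePerDay
FROM cte
LEFT JOIN t ON cte.id = t.id AND cte.thedate = t.date
ORDER BY cte.id, cte.thedate
"""
per_day = conn.execute(query).fetchall()
for row in per_day:
    print(row)
```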
One method is to use a tally table to build a list of dates and join that with the table.
The date stamps in DD.MM.YY format are first converted to YYYY-MM-DD so that SQLite can actually treat them as dates.
In the final select they are formatted back to DD.MM.YY.
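A small check of the two substr() conversions used below, evaluated by SQLite through Python's sqlite3 module. Like the answer, it assumes every two-digit year belongs to the 2000s (the hard-coded '20' prefix).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The same substr() expressions as the answer, applied to a bound parameter.
to_iso = "SELECT '20' || substr(:d, 7, 2) || '-' || substr(:d, 4, 2) || '-' || substr(:d, 1, 2)"
to_ddmmyy = "SELECT substr(:d, 9, 2) || '.' || substr(:d, 6, 2) || '.' || substr(:d, 3, 2)"

iso = conn.execute(to_iso, {"d": "21.09.09"}).fetchone()[0]
# Once in ISO form, SQLite's date arithmetic works on the value.
next_day = conn.execute("SELECT date(:d, '+1 day')", {"d": iso}).fetchone()[0]
back = conn.execute(to_ddmmyy, {"d": next_day}).fetchone()[0]
print(iso, next_day, back)
```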
First some test data:
create table testtable (Id int, [Date] varchar(8), Price int);
insert into testtable (Id,[Date],Price) values (1,'21.09.09',25);
insert into testtable (Id,[Date],Price) values (1,'23.09.09',21);
insert into testtable (Id,[Date],Price) values (2,'31.08.09',16);
insert into testtable (Id,[Date],Price) values (2,'03.09.09',12);
The SQL:
with Digits as (
select 0 as n
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
),
t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
Dates as (
select Id, date(MinDate,'+'||(d2.n*10+d1.n)||' days') as [Date]
from (
select Id, min([Date]) as MinDate, max([Date]) as MaxDate
from t
group by Id
) q
join Digits d1
join Digits d2
where date(MinDate,'+'||(d2.n*10+d1.n)||' days') <= MaxDate
)
select d.Id,
(substr(d.[Date],9,2)||'.'||substr(d.[Date],6,2)||'.'||substr(d.[Date],3,2)) as [Date],
coalesce(t.Price,0) as Price
from Dates d
left join t on (d.Id = t.Id and d.[Date] = t.[Date])
order by d.Id, d.[Date];
The recursive SQL below was inspired by the excellent answer from Gordon Linoff (he should get the 15 points for the accepted answer).
A recursive query is probably more efficient for this anyway, and unlike the two-digit tally table above, which only generates offsets 0 to 99, it is not limited to 100 days per id.
The difference in this version is that the date stamps are first converted to YYYY-MM-DD.
with t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
cte as (
select Id, min([Date]) as [Date], max([Date]) as MaxDate from t
group by Id
union all
select Id, date([Date], '+1 day'), MaxDate from cte
where [Date] < MaxDate
)
select cte.Id,
(substr(cte.[Date],9,2)||'.'||substr(cte.[Date],6,2)||'.'||substr(cte.[Date],3,2)) as [Date],
coalesce(t.Price, 0) as PricePerDay
from cte
left join t
on (cte.Id = t.Id and cte.[Date] = t.[Date])
order by cte.Id, cte.[Date];

Efficiently group by column aggregate

SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
GROUP BY date, id
HAVING sum(revenue)>1000
Returns rows that have revenue>1000.
SELECT date, id, sum(revenue)
FROM table
WHERE date between '2013-01-01' and '2013-01-08'
AND id IN (SELECT id FROM table where date between '2013-01-01' and '2013-01-08' GROUP BY id HAVING sum(revenue)>1000)
GROUP BY date, id
This returns rows for the ids whose total revenue over the date period is > 1000, as desired, but the query is much slower. Is there a quicker way to do this?
Make sure you have indexes on the date and id columns, and try this variation:
select t.date, t.id, sum(t.revenue)
from table t
inner join (
select id
from table
where date between '2013-01-01' and '2013-01-08'
group by id
having sum(revenue) > 1000
) ts on t.id = ts.id
where t.date between '2013-01-01' and '2013-01-08'
group by t.date, t.id
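A sketch checking that the IN version and the join version return the same rows, using SQLite via Python's sqlite3 module. The table name sales and its rows are hypothetical (the question's table name is a reserved word); this shows equivalence only, not which is faster on Vertica.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "sales" is a hypothetical name; the rows are made up.
conn.execute("CREATE TABLE sales (date TEXT, id INTEGER, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2013-01-01", 1, 600), ("2013-01-02", 1, 500),  # id 1: 1100 in range
        ("2013-01-01", 2, 300), ("2013-01-03", 2, 200),  # id 2: 500 in range
        ("2013-01-10", 2, 5000),                         # outside the range
    ],
)

in_query = """
SELECT date, id, SUM(revenue) FROM sales
WHERE date BETWEEN '2013-01-01' AND '2013-01-08'
  AND id IN (SELECT id FROM sales
             WHERE date BETWEEN '2013-01-01' AND '2013-01-08'
             GROUP BY id HAVING SUM(revenue) > 1000)
GROUP BY date, id
ORDER BY date, id
"""
join_query = """
SELECT t.date, t.id, SUM(t.revenue) FROM sales t
INNER JOIN (
    SELECT id FROM sales
    WHERE date BETWEEN '2013-01-01' AND '2013-01-08'
    GROUP BY id HAVING SUM(revenue) > 1000
) ts ON t.id = ts.id
WHERE t.date BETWEEN '2013-01-01' AND '2013-01-08'
GROUP BY t.date, t.id
ORDER BY t.date, t.id
"""
in_rows = conn.execute(in_query).fetchall()
join_rows = conn.execute(join_query).fetchall()
print(in_rows == join_rows, join_rows)
```

The join is safe because the derived table has at most one row per id (it is grouped by id), so it cannot duplicate the outer rows.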
It's not MySQL, it's Vertica ;)
Cris, what projection and ORDER BY are you using in CREATE TABLE?
Have you tried using the Database Designer?
See http://my.vertica.com/docs/6.1.x/HTML/index.htm#14415.htm

Total Count of Active Employees by Date

I have in the past written queries that give me counts by date (hires, terminations, etc...) as follows:
SELECT per.date_start AS "Date",
COUNT(peo.EMPLOYEE_NUMBER) AS "Hires"
FROM hr.per_all_people_f peo,
hr.per_periods_of_service per
WHERE per.date_start BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
AND per.date_start BETWEEN :PerStart AND :PerEnd
AND per.person_id = peo.person_id
GROUP BY per.date_start
Now I want a count of active employees by date, but I am not sure how to attach a date to the query, since I use a date range to determine whether someone is active:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.current_employee_flag = 'Y'
and TRUNC(sysdate) BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
Here is a simple way to get started. This works for all the effective start and end dates in your data:
select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
It works by adding one person for each start and subtracting one for each end (via num) and taking a cumulative sum. This might produce duplicate dates, so you might also aggregate to eliminate them:
select thedate, max(numActives)
from (select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
) t
group by thedate;
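A sketch of the +1/-1 running-sum technique in SQLite via Python's sqlite3 module (a stand-in for Oracle; window functions need SQLite 3.25+). The three service periods are hypothetical. SQLite does not accept parenthesized SELECTs around UNION ALL, so the two branches are written without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE per_periods_of_service "
    "(person_id INTEGER, effective_start_date TEXT, effective_end_date TEXT)"
)
conn.executemany(
    "INSERT INTO per_periods_of_service VALUES (?, ?, ?)",
    [
        (1, "2012-04-01", "2012-04-10"),
        (2, "2012-04-05", "2012-04-20"),
        (3, "2012-04-05", "2012-04-08"),
    ],
)

# +1 at each start, -1 at each end, then a running total ordered by date;
# MAX(...) per date collapses the duplicate dates.
query = """
SELECT thedate, MAX(numActives) AS numActives
FROM (
    SELECT thedate, SUM(num) OVER (ORDER BY thedate) AS numActives
    FROM (
        SELECT effective_start_date AS thedate, 1 AS num FROM per_periods_of_service
        UNION ALL
        SELECT effective_end_date, -1 FROM per_periods_of_service
    ) dates
) t
GROUP BY thedate
ORDER BY thedate
"""
active_counts = conn.execute(query).fetchall()
for row in active_counts:
    print(row)
```

With this scheme an employee stops being counted on the end date itself, whereas the BETWEEN in the question treats the end date as still active; shifting the -1 events one day later would align the two.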
If you really want all dates, then it is best to start with a calendar table, and use a simple variation on your original query:
select c.thedate, count(pos.person_id) as NumActives
from calendar c left outer join
hr.per_periods_of_service pos
on c.thedate between pos.effective_start_date and pos.effective_end_date
group by c.thedate;
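A sketch of the calendar-table approach in SQLite via Python's sqlite3 module, with a recursive CTE standing in for a real calendar table (an assumption). It also shows why the count should be over a column of the outer-joined table: COUNT(*) counts the unmatched row itself, so a day with nobody active would show 1 instead of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE per_periods_of_service "
    "(person_id INTEGER, effective_start_date TEXT, effective_end_date TEXT)"
)
conn.executemany(
    "INSERT INTO per_periods_of_service VALUES (?, ?, ?)",
    [
        (1, "2012-04-01", "2012-04-10"),
        (2, "2012-04-05", "2012-04-20"),
        (3, "2012-04-05", "2012-04-08"),
    ],
)

# A recursive CTE generates the days 2012-03-31 .. 2012-04-06.
calendar = """
WITH RECURSIVE calendar(thedate) AS (
    SELECT '2012-03-31'
    UNION ALL
    SELECT date(thedate, '+1 day') FROM calendar WHERE thedate < '2012-04-06'
)
"""
# COUNT(pos.person_id) ignores the NULLs produced by unmatched calendar days.
counted = conn.execute(calendar + """
SELECT c.thedate, COUNT(pos.person_id) AS NumActives
FROM calendar c
LEFT OUTER JOIN per_periods_of_service pos
  ON c.thedate BETWEEN pos.effective_start_date AND pos.effective_end_date
GROUP BY c.thedate
ORDER BY c.thedate
""").fetchall()

# COUNT(*) counts the unmatched row itself, so an empty day shows 1.
starred = conn.execute(calendar + """
SELECT c.thedate, COUNT(*)
FROM calendar c
LEFT OUTER JOIN per_periods_of_service pos
  ON c.thedate BETWEEN pos.effective_start_date AND pos.effective_end_date
GROUP BY c.thedate
ORDER BY c.thedate
""").fetchall()
print(counted[0], starred[0])
```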
If you want to count all employees who were active during the entire input date range:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.EFFECTIVE_START_DATE <= :StartDate
AND (peo.EFFECTIVE_END_DATE IS NULL OR peo.EFFECTIVE_END_DATE >= :EndDate)
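A sketch of this "active for the whole range" condition in SQLite via Python's sqlite3 module, using named bind parameters like the Oracle :StartDate/:EndDate. The table is a simplified, hypothetical stand-in for hr.per_all_people_f, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for hr.per_all_people_f; the rows are hypothetical.
conn.execute(
    "CREATE TABLE per_all_people_f (EMPLOYEE_NUMBER INTEGER, "
    "EFFECTIVE_START_DATE TEXT, EFFECTIVE_END_DATE TEXT)"
)
conn.executemany(
    "INSERT INTO per_all_people_f VALUES (?, ?, ?)",
    [
        (1, "2012-04-01", "2012-04-10"),  # covers the whole range
        (2, "2012-04-05", "2012-04-20"),  # starts on the first day: still counted
        (3, "2012-04-05", "2012-04-08"),  # ends before :EndDate, not counted
        (4, "2012-03-01", None),          # open-ended record (NULL end date)
    ],
)

query = """
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS CT
FROM per_all_people_f peo
WHERE peo.EFFECTIVE_START_DATE <= :StartDate
  AND (peo.EFFECTIVE_END_DATE IS NULL OR peo.EFFECTIVE_END_DATE >= :EndDate)
"""
ct = conn.execute(query, {"StartDate": "2012-04-05", "EndDate": "2012-04-09"}).fetchone()[0]
print(ct)
```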
Here is my example based on Gordon Linoff's answer, with a small modification: in the subtraction branch every record appeared with -1 in num, even when EndDate was NULL, so those rows are filtered out.
use AdventureWorksDW2012 -- selects the database in SSMS (SQL Server); may not work on other platforms
select
t.thedate
,max(t.numActives) AS "Total Active Employees"
from (
select
dates.thedate
,SUM(dates.num) over (order by dates.thedate) as numActives
from
(
(
select
StartDate as thedate
,1 as num
from DimEmployee
)
union all
(
select
EndDate as thedate
,-1 as num
from DimEmployee
where EndDate IS NOT NULL
)
) AS dates
) AS t
group by thedate
ORDER BY thedate
This worked for me; I hope it helps somebody.
I was able to get the results I was looking for with the following:
--Active Team Members by Date
SELECT "a_date",
COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo,
(SELECT DATE '2012-04-01'-1 + LEVEL AS "a_date"
FROM dual
CONNECT BY LEVEL <= DATE '2012-04-30'+2 - DATE '2012-04-01'-1
)
WHERE peo.current_employee_flag = 'Y'
AND "a_date" BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
GROUP BY "a_date"
ORDER BY "a_date"