In general, I need to associate (group) records that are created in similar time periods. If it helps, think of the example below as clickstream data where there is no session ID and I need to build those sessions.
I have the following dataset:
UserId INT,
EventId INT,
DateCreated DATETIME,
BlockId INT
Assume the following data:
{123, 111, '2009-12-01 9:15am', NULL}
{123, 222, '2009-12-01 9:20am', NULL}
{123, 333, '2009-12-01 9:25am', NULL}
{123, 444, '2009-12-03 2:30pm', NULL}
{123, 555, '2009-12-03 2:32pm', NULL}
What I need to do is divide these events up, by user, into temporal buckets. There is a business rule that says a gap of more than 30 minutes between consecutive events starts a new bucket. In the above example, events 111-333 represent a block, i.e. not more than 30 minutes separates them. Likewise, events 444-555 represent a second block.
My current solution uses a cursor and is extremely slow (therefore, unsustainable for the amount of data I need to process). I can post the code but it is pretty simple.
Any ideas?
Hopefully this will get you going in the right direction. If you're in a stored procedure, then using table variables for the StartTimes and EndTimes should make the query much easier to read and understand. The query gives you the start and end times for your batches; join those back to your table and you should have it.
;WITH StartTimes AS
(
    -- A row starts a block when the same user has no earlier event in the preceding 30 minutes
    SELECT DISTINCT
        T1.UserID,
        T1.DateCreated AS StartTime
    FROM
        My_Table T1
    LEFT OUTER JOIN My_Table T2 ON
        T2.UserID = T1.UserID AND
        T2.DateCreated >= DATEADD(mi, -30, T1.DateCreated) AND
        T2.DateCreated < T1.DateCreated
    WHERE
        T2.UserID IS NULL
)
SELECT
    ST.UserID,
    ST.StartTime,
    ET.EndTime
FROM
(
    -- A row ends a block when the same user has no later event in the following 30 minutes
    SELECT DISTINCT
        T3.UserID,
        T3.DateCreated AS EndTime
    FROM
        My_Table T3
    LEFT OUTER JOIN My_Table T4 ON
        T4.UserID = T3.UserID AND
        T4.DateCreated <= DATEADD(mi, 30, T3.DateCreated) AND
        T4.DateCreated > T3.DateCreated
    WHERE
        T4.UserID IS NULL
) AS ET
-- Pair each end time with the latest start time that is not after it
INNER JOIN StartTimes ST ON
    ST.UserID = ET.UserID AND
    ST.StartTime <= ET.EndTime
LEFT OUTER JOIN StartTimes ST2 ON
    ST2.UserID = ET.UserID AND
    ST2.StartTime <= ET.EndTime AND
    ST2.StartTime > ST.StartTime
WHERE
    ST2.StartTime IS NULL
Based on comment thread,
A. Buckets are defined by the first record in the bucket, and the first record in each Bucket is defined as any row where the DateCreated is more than 30 minutes after the latest earlier DateCreated. (immediately previous record)
B. The rest of the rows in the bucket are all rows with DateCreated on or after the First Row whose DateCreated is less than 30 minutes after the immediately previous row, and there does not exist a non-qualifying, (or new bucket-defining), row since the specified Bucket-defining row.
In English:
Select the DateCreated of those records where the DateCreated is more than 30 minutes after the previous DateCreated, and apply the aggregate function of your choice to all the other records in the table whose DateCreated is after that bucket-defining DateCreated, less than 30 minutes after its immediately previous DateCreated, and for which there are no records between the bucket-defining DateCreated and this one that follow a gap greater than 30 minutes.
In SQL:
Select Z.BucketDefinitionDate, Count(*) RowsInBucket
From (Select Distinct Ti.DateCreated BucketDefinitionDate
      From Table Ti
      -- IsNull lets the very first row (no earlier row) define a bucket
      Where Ti.DateCreated > DateAdd(minute, 30,
              IsNull((Select Max(DateCreated) From Table
                      Where DateCreated < Ti.DateCreated), '19000101'))) Z
  Join Table B
    On B.DateCreated >= Z.BucketDefinitionDate  -- count the defining row too
   And Not Exists
       -- a later bucket-defining row between Z and B closes Z's bucket
       (Select * From Table T
        Where T.DateCreated > Z.BucketDefinitionDate
          And T.DateCreated <= B.DateCreated
          And T.DateCreated > DateAdd(minute, 30,
                (Select Max(DateCreated) From Table
                 Where DateCreated < T.DateCreated)))
Group By Z.BucketDefinitionDate
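The gap rule described above (a row starts a new bucket when it is more than 30 minutes after the previous row) can also be sketched with window functions. This is a minimal sketch, assuming SQL Server 2012 or later (for LAG and a framed running SUM); the table variable @Events and the column names IsBlockStart and BlockId are made up for illustration, and the sample rows are the ones from the question.

```sql
-- Hypothetical sample table holding the rows from the question
DECLARE @Events TABLE (UserId INT, EventId INT, DateCreated DATETIME);
INSERT INTO @Events VALUES
    (123, 111, '2009-12-01T09:15:00'),
    (123, 222, '2009-12-01T09:20:00'),
    (123, 333, '2009-12-01T09:25:00'),
    (123, 444, '2009-12-03T14:30:00'),
    (123, 555, '2009-12-03T14:32:00');

WITH Flagged AS (
    SELECT UserId, EventId, DateCreated,
           -- 0 when the previous event of the same user is at most 30 minutes earlier;
           -- 1 otherwise (including the first event, where LAG returns NULL)
           CASE WHEN DateCreated <= DATEADD(minute, 30,
                         LAG(DateCreated) OVER (PARTITION BY UserId
                                                ORDER BY DateCreated, EventId))
                THEN 0 ELSE 1 END AS IsBlockStart
    FROM @Events
)
SELECT UserId, EventId, DateCreated,
       -- the running count of block starts numbers the blocks per user
       SUM(IsBlockStart) OVER (PARTITION BY UserId
                               ORDER BY DateCreated, EventId
                               ROWS UNBOUNDED PRECEDING) AS BlockId
FROM Flagged
ORDER BY UserId, DateCreated, EventId;
```

With the sample rows, events 111 to 333 get BlockId 1 and events 444 and 555 get BlockId 2. EventId is included in the ORDER BY only as a tiebreaker for events with identical timestamps.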
What you can try is
DECLARE @TABLE TABLE(
    ID INT,
    EventID INT,
    DateCreated DATETIME
)
INSERT INTO @TABLE SELECT 123, 111, '2009-12-01 9:15am'
INSERT INTO @TABLE SELECT 123, 222, '2009-12-01 9:20am'
INSERT INTO @TABLE SELECT 123, 333, '2009-12-01 9:25am'
INSERT INTO @TABLE SELECT 123, 444, '2009-12-03 2:30pm'
INSERT INTO @TABLE SELECT 123, 555, '2009-12-03 2:32pm'
SELECT ID,
    DATEADD(dd, DATEDIFF(dd,0,DateCreated), 0) DayVal,  -- date with the time part stripped
    DATEPART(hh, DateCreated) HourPart,
    FLOOR(DATEPART(mi, DateCreated) / 30.) MinBucket     -- 0 for minutes 0-29, 1 for minutes 30-59
FROM @TABLE
Now you can group by DayVal, HourPart and MinBucket.
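As a rough illustration of that grouping step, the query below wraps the DayVal, HourPart and MinBucket expressions in a derived table and aggregates over them. It is only a sketch: the sample rows are the ones from the question, and the output names EventCount, FirstEvent and LastEvent are invented here.

```sql
-- Hypothetical sample data, same shape as the table variable above
DECLARE @TABLE TABLE (ID INT, EventID INT, DateCreated DATETIME);
INSERT INTO @TABLE VALUES
    (123, 111, '2009-12-01T09:15:00'),
    (123, 222, '2009-12-01T09:20:00'),
    (123, 333, '2009-12-01T09:25:00'),
    (123, 444, '2009-12-03T14:30:00'),
    (123, 555, '2009-12-03T14:32:00');

SELECT ID, DayVal, HourPart, MinBucket,
       COUNT(*)         AS EventCount,
       MIN(DateCreated) AS FirstEvent,
       MAX(DateCreated) AS LastEvent
FROM (
    SELECT ID, DateCreated,
           DATEADD(dd, DATEDIFF(dd, 0, DateCreated), 0) AS DayVal,
           DATEPART(hh, DateCreated)                    AS HourPart,
           FLOOR(DATEPART(mi, DateCreated) / 30.)       AS MinBucket
    FROM @TABLE
) AS b
GROUP BY ID, DayVal, HourPart, MinBucket
ORDER BY ID, DayVal, HourPart, MinBucket;
```

Note that these are fixed half-hour slots rather than gap-based blocks: events at 9:25 and 9:35 would land in different buckets even though they are only 10 minutes apart, so this approximates the 30-minute rule rather than implementing it.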
I think I have something for you. It is not a cool single query like the one Tom H posted, but it seems to work. It uses a table variable as a working table.
declare @table table(
    id int identity(1,1),
    userId int,
    eventId int,
    dateCreated datetime,
    bucket int
)
insert into @table select 123, 111, '2009-12-01 9:15am', 0
-- etc... insert more rows - note that the 'bucket' field is set to 0
declare @next_bucket int
set @next_bucket = 1
-- number every row that follows a gap of more than 30 minutes
update @table
set bucket = @next_bucket, @next_bucket = @next_bucket + 1
from @table as [current]
where datecreated > dateadd(mi, 30, (select datecreated from @table as previous where [current].id = previous.id + 1))
-- give the remaining rows the bucket of the nearest earlier numbered row
update @table
set bucket =
    coalesce(( select max(bucket)
               from @table as previous
               where previous.id < [current].id
               and bucket <> 0
             ), 1)
from @table as [current]
where bucket = 0
-- return the results
select * from @table
Related
I'm using Microsoft SQL Server Management Studio, and I want a new column that calculates the following:
If it has an ‘Exec’ value for category, it takes the ‘enddate’.
If it has an ‘Scop’ value for category, it takes the ‘start date’.
This new column calculates the number of months between these two.
I want SQL to do the calculation for a given id, so each id will have different values calculated.
At the moment it takes the minimum enddate and minimum 'startdate' for the entire table.
SELECT
id, category, startdate, enddate,
CASE
WHEN id = id
THEN DATEDIFF(month,
(SELECT MIN(enddate) from [A].[PP] where category = 'Exec'),
(SELECT MIN(startdate) from [A].[PP] where category = 'Scop')) --AS datemodify
ELSE NULL
END
FROM
[A].[PP]
WHERE
startdate IS NOT NULL
AND (category = 'Exec' OR category = 'Scop')
ORDER BY
id ASC
Results it produces at the moment:
id | category | startdate | enddate | NewCOlumn
1 | Scop | 2022-11-1 | 2022-10-1 | 11
1 | Exec | 2023-11-1 | 2023-10-1 | 11
2 | Scop | 2022-11-1 | 2022-10-1 | 11
2 | Exec | 2023-11-1 | 2023-09-1 | 11
The results I want:
id | category | startdate | enddate | NewCOlumn
1 | Scop | 2021-11-1 | 2022-10-1 | 24
1 | Exec | 2023-11-1 | 2023-11-1 | 24
2 | Scop | 2022-11-1 | 2022-10-1 | 11
2 | Exec | 2023-11-1 | 2023-09-1 | 11
Based on the comments I'm not sure you know what you want as your output yet, so I've come up with two different versions.
Here's how I created a version of your data set:
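-- Column types are assumed; adjust them to match your real table
CREATE TABLE #TempTable (ID INT, Category VARCHAR(10), StartDate DATE, EndDate DATE);
INSERT INTO #TempTable (ID, Category, StartDate, EndDate)
VALUES (1, 'Scop', '2021-11-01', '2022-10-01'),
(1, 'Exec', '2023-11-01', '2023-10-01'),
(2, 'Scop', '2022-11-01', '2022-10-01'),
(2, 'Exec', '2023-11-01', '2023-10-01');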
This is the first version. It keeps your two lines per ID but takes the StartDate and EndDate from different rows. It selects all of the data straight out of the temp table. If the row has Category = Scop, it does a DateDiff between that row's StartDate and the EndDate fetched by a subquery where the IDs match and the Category = Exec. The same logic is applied the other way around when the row's Category = Exec:
SELECT TT.ID,
TT.Category,
TT.StartDate,
TT.EndDate,
CASE
WHEN TT.Category = 'Scop' THEN DATEDIFF(M, TT.StartDate, (SELECT EndDate FROM #TempTable WHERE Category = 'Exec' AND ID = TT.ID))
ELSE CASE
WHEN TT.Category = 'Exec' THEN DATEDIFF(M, (SELECT StartDate FROM #TempTable WHERE Category = 'Scop' AND ID = TT.ID), TT.EndDate)
END
END AS DateDiffCalc
FROM #TempTable AS TT;
This version compresses each ID to a single row. It initially fetches only the Scop data, then joins back to the same table on ID, restricted to the Exec data. Now you can DateDiff between the Scop StartDate and the Exec EndDate:
SELECT DISTINCT t1.ID,
t1.Category,
t1.StartDate,
T2.Category,
T2.EndDate,
DATEDIFF(M, t1.StartDate, T2.EndDate) AS DateDiffCalc
FROM #TempTable AS t1
INNER JOIN #TempTable AS T2 ON T2.ID = t1.ID AND T2.Category = 'Exec'
WHERE t1.Category = 'Scop'
ORDER BY t1.ID;
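If the goal is to keep both rows per ID and show the same month difference on each, windowed conditional aggregates are another option. The following is only a sketch under assumptions: the table variable @PP stands in for [A].[PP], the column types are guessed, and the sample rows are taken from the "results I want" table in the question.

```sql
-- Hypothetical stand-in for [A].[PP]; types are assumptions
DECLARE @PP TABLE (id INT, category VARCHAR(10), startdate DATE, enddate DATE);
INSERT INTO @PP VALUES
    (1, 'Scop', '2021-11-01', '2022-10-01'),
    (1, 'Exec', '2023-11-01', '2023-11-01'),
    (2, 'Scop', '2022-11-01', '2022-10-01'),
    (2, 'Exec', '2023-11-01', '2023-09-01');

SELECT id, category, startdate, enddate,
       DATEDIFF(month,
                -- earliest Scop start date within this id
                MIN(CASE WHEN category = 'Scop' THEN startdate END)
                    OVER (PARTITION BY id),
                -- earliest Exec end date within this id
                MIN(CASE WHEN category = 'Exec' THEN enddate END)
                    OVER (PARTITION BY id)) AS NewColumn
FROM @PP
WHERE startdate IS NOT NULL
  AND category IN ('Scop', 'Exec')
ORDER BY id;
```

PARTITION BY id is what restricts each MIN to the current id instead of the whole table. With these rows, id 1 gets 24 on both of its lines and id 2 gets 10 (DATEDIFF counts the month boundaries between 2022-11-01 and 2023-09-01).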
I have one problem identifying and fixing some records having overlapping time intervals, for one scd type 2 dimension.
What I have is:
Bkey Uid startDate endDate
'John' 1 1990-01-01 (some time stamp) 2017-01-10 (some time stamp)
'John' 2 2016-11-03 (some time stamp) 2016-11-14 (some time stamp)
'John' 3 2016-11-14 (some time stamp) 2016-12-29 (some time stamp)
'John' 4 2016-12-29 (some time stamp) 2017-01-10 (some time stamp)
'John' 5 2017-01-10 (some time stamp) 2017-04-22 (some time stamp)
......
I want to find (first) all the Johns that have overlapping time periods, in a table with lots and lots of Johns, and then to figure out a way to correct those overlapping time periods. For the latter I know there are functions such as LAG and LEAD which can handle that, but it eludes me how to find those overlaps.
Any hints?
Regards,
[ 1 ] The following query returns overlapping time ranges:
SELECT *,
    (
        SELECT *
        FROM @Dimension1 y
        WHERE x.Bkey = y.Bkey
        AND x.Uid <> y.Uid
        AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
        FOR XML RAW, ROOT, TYPE
    ) OverlappingTimeRanges
FROM @Dimension1 x
Full script:
DECLARE @Dimension1 TABLE (
    Bkey VARCHAR(50) NOT NULL,
    Uid INT NOT NULL,
    startDate DATE NOT NULL,
    endDate DATE NOT NULL,
    CHECK(startDate < endDate)
);
INSERT @Dimension1
SELECT 'John', 1, '1990-01-01', '2017-01-10' UNION ALL
SELECT 'John', 2, '2016-11-03', '2016-11-14' UNION ALL
SELECT 'John', 3, '2016-11-14', '2016-12-29' UNION ALL
SELECT 'John', 4, '2016-12-29', '2017-01-10' UNION ALL
SELECT 'John', 5, '2017-01-11', '2017-04-22';
SELECT *,
    (
        SELECT *
        FROM @Dimension1 y
        WHERE x.Bkey = y.Bkey
        AND x.Uid <> y.Uid
        AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
        FOR XML RAW, ROOT, TYPE
    ) OverlappingTimeRanges
FROM @Dimension1 x
[ 2 ] In order to find distinct groups of time ranges with overlapping original rows, I would use the following approach:
-- Edit 1
DECLARE @Groups TABLE (
    Bkey VARCHAR(50) NOT NULL,
    Uid INT NOT NULL,
    startDateNew DATE NOT NULL,
    endDateNew DATE NOT NULL,
    CHECK(startDateNew < endDateNew)
);
INSERT @Groups
SELECT x.Bkey, x.Uid, z.startDateNew, z.endDateNew
FROM @Dimension1 x
OUTER APPLY (
    SELECT MIN(y.startDate) AS startDateNew, MAX(y.endDate) AS endDateNew
    FROM @Dimension1 y
    WHERE x.Bkey = y.Bkey
    AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
) z
-- End of Edit 1
-- This returns distinct groups identified by DistinctGroupId together with all overlapping Uid(s) from current group
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM @Groups a
) b
) c
OUTER APPLY (
SELECT d.Uid AS Overlapping_Uid
FROM @Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
) e
-- This returns distinct groups identified by DistinctGroupId together with an XML (XmlCol) which includes overlapping Uid(s)
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM @Groups a
) b
) c
OUTER APPLY (
SELECT (
SELECT d.Uid AS Overlapping_Uid
FROM @Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
FOR XML RAW, TYPE
) AS XmlCol
) e
Note: Last range used in my example is 'John', 5, '2017-01-11', '2017-04-22'; and not 'John', 5, '2017-01-10', '2017-04-22';. Also, data type used is DATE and not DATETIME[2][OFFSET].
I think the tricky part of your query is being able to articulate the logic for overlapping ranges. We can self join on the condition that a row on the left overlaps with any row on the right. All matching rows are those which overlap.
We can think of four possible overlap scenarios:
|---------|   |---------|      no overlap
|---------|
      |---------|              1st end and 2nd start overlap
      |---------|
|---------|                    1st start and 2nd end overlap
|---------|
   |---|                       2nd completely contained inside 1st
                               (could be 1st inside 2nd also)
SELECT DISTINCT
    t1.Uid
FROM yourTable t1
INNER JOIN yourTable t2
    ON t1.Uid <> t2.Uid AND            -- do not match a row with itself
       t1.startDate <= t2.endDate AND
       t2.startDate <= t1.endDate
WHERE
    t1.Bkey = 'John' AND t2.Bkey = 'John'
This will at least let you identify overlapping records. Updating and separating them in a meaningful way will probably end up being an ugly gaps and islands problem, perhaps meriting another question.
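Since the question mentions window functions, here is one way to flag overlaps per key without a self join: compare each row's startDate with the largest endDate seen so far among rows that start earlier. This is a minimal sketch, assuming SQL Server 2012 or later (for the ROWS frame); the table variable @Dimension1 and the column name maxPrevEnd are illustrative, and the sample rows are the ones from the question reduced to dates.

```sql
-- Hypothetical sample table with the question's rows (dates only)
DECLARE @Dimension1 TABLE (Bkey VARCHAR(50), Uid INT, startDate DATE, endDate DATE);
INSERT INTO @Dimension1 VALUES
    ('John', 1, '1990-01-01', '2017-01-10'),
    ('John', 2, '2016-11-03', '2016-11-14'),
    ('John', 3, '2016-11-14', '2016-12-29'),
    ('John', 4, '2016-12-29', '2017-01-10'),
    ('John', 5, '2017-01-10', '2017-04-22');

WITH Ordered AS (
    SELECT Bkey, Uid, startDate, endDate,
           -- latest endDate among all earlier-starting rows of the same Bkey
           MAX(endDate) OVER (PARTITION BY Bkey
                              ORDER BY startDate, Uid
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS maxPrevEnd
    FROM @Dimension1
)
SELECT Bkey, Uid, startDate, endDate, maxPrevEnd
FROM Ordered
WHERE startDate < maxPrevEnd;   -- starts before an earlier range has ended
```

With this data, Uid 2, 3 and 4 are returned, because each starts before the range of Uid 1 ends. The strict < treats ranges that merely touch (one ends on the day the next starts) as non-overlapping, which is a design choice suited to type 2 dimensions; use <= if touching ranges should count. The query flags the later-starting rows, so the long-running row that causes the overlap (Uid 1 here) is identified through maxPrevEnd rather than listed itself.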
We can achieve this by doing a self join of the emp table.
a.emp_id != b.emp_id ensures that a row is not joined with itself.
The remaining comparison clauses check whether any row's start date or end date falls in another row's date range.
create table emp(name varchar(20), emp_id numeric(10), start_date date, end_date date);
insert into emp values('John', 1, '1990-01-01', '2017-01-10');
insert into emp values( 'John', 2, '2016-11-03', '2016-11-14');
insert into emp values( 'John', 3, '2016-11-14', '2016-12-29');
insert into emp values( 'John', 4, '2016-12-29', '2017-01-10');
insert into emp values( 'John', 5, '2017-01-11', '2017-04-22');
commit;
with A as (select * from EMP),
B as (select * from EMP)
select A.* from A,B where A.EMP_ID != B.EMP_ID
and A.START_DATE < B.END_DATE and B.START_DATE < A.END_DATE
and (A.START_DATE between B.START_DATE and B.END_DATE
or A.END_DATE between B.START_DATE and B.END_DATE);
Let me explain the process:
We've got scanned questionnaires.
The OCR system processes these questionnaires to get data.
Then all recognized data (form_id, question_number, answer etc.) goes into the database.
For each form there are about 120-150 rows in the database:
53453, 1, A, 2016-10-30 23:54:18.590
53453, 2, B, 2016-10-30 23:54:18.690
53453, 3, C, 2016-10-30 23:54:18.790 so on
As you can see, it is quite difficult to find a duplicate of a questionnaire form in the database. SQL is not my strong point, so I need your help) I need to select IDs according to this condition: rows of the same ID inserted within 1 minute of each other are not duplicates, but if the ID appears again at some other time, it is a duplicate.
P.S. I did my best trying to explain my issue. Excuse me for my english)
Make sure your last column's data type is DATETIME, then do:
SELECT tA.*
FROM MyTable tA INNER JOIN MyTable tB ON (tA.ID = tB.ID AND tA.question_number = tB.question_number AND tA.answer = tB.answer)
WHERE DATEDIFF(minute,tA.DateColumn,tB.DateColumn) < 2 -- DATEDIFF returns INT
Do you check only the ID, or also the question and answer? I wrote my query for ID and date only, because you said that if an ID exists in another row with a different time (a difference of more than a minute), it is a duplicate; you don't say anything about checking the answer or question. In the last row I modified the time.
DECLARE @TMP TABLE (
    ID INT,
    VALUE INT,
    VALUE2 VARCHAR(5),
    DATES DATETIME
)
INSERT INTO @TMP
SELECT 53453, 1, 'A', '2016-10-30 23:54:18.590'
INSERT INTO @TMP
SELECT 53453, 2, 'B', '2016-10-30 23:54:18.690'
INSERT INTO @TMP
SELECT 53453, 3, 'C', '2016-10-30 23:56:20.590'
-- earliest insertion time per ID
SELECT ID, MIN(DATES) DATES
INTO #TMP_ID
FROM @TMP
GROUP BY ID
-- MORE THAN MINUTE
SELECT *
FROM @TMP T
WHERE EXISTS (
    SELECT NULL
    FROM #TMP_ID X
    WHERE DATEDIFF(second, x.dates, t.DATES) > 60
    and x.id = t.id
)
-- LESS THAN MINUTE
SELECT *
FROM @TMP T
WHERE NOT EXISTS (
    SELECT NULL
    FROM #TMP_ID X
    WHERE DATEDIFF(second, x.dates, t.DATES) > 60
    and x.id = t.id
)
DROP TABLE #TMP_ID
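The same comparison against the earliest insertion time per ID can be sketched without the intermediate temp table by using a windowed MIN. This is only an illustration of the idea above: the table variable mirrors the test data, and the column name IsDuplicate is invented here.

```sql
-- Hypothetical sample data, same shape as the test table above
DECLARE @TMP TABLE (ID INT, VALUE INT, VALUE2 VARCHAR(5), DATES DATETIME);
INSERT INTO @TMP VALUES
    (53453, 1, 'A', '2016-10-30T23:54:18.590'),
    (53453, 2, 'B', '2016-10-30T23:54:18.690'),
    (53453, 3, 'C', '2016-10-30T23:56:20.590');

SELECT ID, VALUE, VALUE2, DATES,
       -- 1 when the row was inserted more than 60 seconds after the first row of its ID
       CASE WHEN DATEDIFF(second, MIN(DATES) OVER (PARTITION BY ID), DATES) > 60
            THEN 1 ELSE 0 END AS IsDuplicate
FROM @TMP;
```

With these rows the first two get IsDuplicate = 0 and the third gets 1. As in the query above, every row is measured against the first insertion for its ID, so a form that legitimately takes longer than a minute to insert would also be flagged.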
I have a dataset of hospitalisations ('spells') - 1 row per spell. I want to drop any spells recorded within a week after another (there could be multiple) - the rationale being that they're likely symptomatic of the same underlying cause. Here is some play data:
create table hif_user.rzb_recurse_src (
patid integer not null,
eventdate integer not null,
type smallint not null
);
insert into hif_user.rzb_recurse_src values (1,1,1);
insert into hif_user.rzb_recurse_src values (1,3,2);
insert into hif_user.rzb_recurse_src values (1,5,2);
insert into hif_user.rzb_recurse_src values (1,9,2);
insert into hif_user.rzb_recurse_src values (1,14,2);
insert into hif_user.rzb_recurse_src values (2,1,1);
insert into hif_user.rzb_recurse_src values (2,5,1);
insert into hif_user.rzb_recurse_src values (2,19,2);
Only spells of type 2 - within a week after any other - are to be dropped. Type 1 spells are to remain.
For patient 1, dates 1 & 9 should be kept. For patient 2, all rows should remain.
The issue is with patient 1. Spell date 9 is identified for dropping as it is close to spell date 5; however, as spell date 5 is close to spell date 1, it should be dropped, therefore allowing spell date 9 to live...
So, it seems a recursive problem. However, I've not used recursive programming in SQL before and I'm struggling to really picture how to do it. Can anyone help? I should add that I'm using Teradata which has more restrictions than most with recursive SQL (only UNION ALL sets allowed I believe).
It's a cursor logic, check one row after the other if it fits your rules, so recursion is the easiest (maybe the only) way to solve your problem.
To get a decent performance you need a Volatile Table to facilitate this row-by-row processing:
CREATE VOLATILE TABLE vt (patid, eventdate, exac_type, rn) AS
(
SELECT r.*
,ROW_NUMBER() -- needed to facilitate the join
OVER (PARTITION BY patid ORDER BY eventdate) AS rn
FROM hif_user.rzb_recurse_src AS r
) WITH DATA ON COMMIT PRESERVE ROWS;
WITH RECURSIVE cte (patid, eventdate, exac_type, rn, startdate) AS
(
SELECT vt.*
,eventdate AS startdate
FROM vt
WHERE rn = 1 -- start with the first row
UNION ALL
SELECT vt.*
-- check if type = 1 or more than 7 days from the last eventdate
,CASE WHEN vt.eventdate > cte.startdate + 7
OR vt.exac_type = 1
THEN vt.eventdate -- new start date
ELSE cte.startdate -- keep old date
END
FROM vt JOIN cte
ON vt.patid = cte.patid
AND vt.rn = cte.rn + 1 -- proceed to next row
)
SELECT *
FROM cte
WHERE eventdate - startdate = 0 -- only new start days
order by patid, eventdate
I think the key to solving this is getting the first date more than 7 days from the current date and then doing a recursive subquery:
with rrs as (
select rrs.*,
(select min(rrs2.eventdate)
from hif_user.rzb_recurse_src rrs2
where rrs2.patid = rrs.patid and
rrs2.eventdate > rrs.eventdate + 7
) as eventdate7
from hif_user.rzb_recurse_src rrs
),
recursive cte as (
select patid, min(eventdate) as eventdate, min(eventdate7) as eventdate7
from rrs
group by patid
union all
select cte.patid, cte.eventdate7, rrs.eventdate7
from cte join
rrs
on rrs.patid = cte.patid and
rrs.eventdate = cte.eventdate7
)
select cte.patid, cte.eventdate
from cte;
If you want additional columns, then join in the original table at the last step.
I have a table that has 3 cols namely points, project_id and creation_date. every time points are assigned a new record has been made, for example.
points = 20 project_id = 441 creation_date = 04/02/2011 -> Is one record
points = 10 project_id = 600 creation_date = 04/02/2011 -> Is another record
points = 5 project_id = 441 creation_date = 06/02/2011 -> Is final record
(creation_date is the date on which record is entered and it is achieved by setting the default value to GETDATE())
Now the problem: I want to get the MAX points grouped by project_id, but I also want creation_date to appear with it so I can use it for another purpose. It's OK if creation_date repeats. I cannot group by creation_date, because that would skip the points of the project with id 600, which is wrong: id 600 is a different project whose only (and therefore max) points are 10, so it should be listed. That is only possible if I group by project_id, but then how do I also list creation_date?
So far I am using this query to get MAX points of each project
SELECT MAX(points) AS points, project_id
FROM LogiCpsLogs AS LCL
WHERE (writer_id = @writer_id) AND (DATENAME(mm, GETDATE()) = DATENAME(mm, creation_date)) AND (points <> 0)
GROUP BY project_id
writer_id is the ID of writer whose points I want to see, like writer_id = 1, 2 or 3.
This query brings the result of current month only but I would like to list creation_date as well. Please help.
The subquery way
SELECT P.Project_ID, P.Creation_Date, T.Max_Points
FROM Projects P INNER JOIN
(
SELECT Project_ID, MAX(Points) AS Max_Points
FROM Projects
GROUP BY Project_ID
) T
ON P.Project_ID = T.Project_ID
AND P.Points = T.Max_Points
Please see comment: this will give you ALL days where max-points was achieved. If you only just want one, the query will be more complex.
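If only one row per project is wanted, ranking the rows per project is one way to do it. The sketch below assumes SQL Server 2008 or later; the table variable @Projects stands in for the real table, the rows are the three records from the question, and breaking ties by the earliest Creation_Date is an arbitrary choice made for illustration.

```sql
-- Hypothetical stand-in for the Projects table, with the question's three records
DECLARE @Projects TABLE (Points INT, Project_ID INT, Creation_Date DATE);
INSERT INTO @Projects VALUES
    (20, 441, '2011-02-04'),
    (10, 600, '2011-02-04'),
    (5,  441, '2011-02-06');

WITH Ranked AS (
    SELECT Project_ID, Points, Creation_Date,
           -- rn = 1 marks the highest-points row of each project;
           -- ties go to the earliest Creation_Date
           ROW_NUMBER() OVER (PARTITION BY Project_ID
                              ORDER BY Points DESC, Creation_Date ASC) AS rn
    FROM @Projects
)
SELECT Project_ID, Points AS Max_Points, Creation_Date
FROM Ranked
WHERE rn = 1
ORDER BY Project_ID;
```

With these rows the result is project 441 with 20 points and project 600 with 10 points, each with the date of the winning row.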
I'll give you sample..
SELECT MAX(POINTS),
PROJECT_ID,
CREATION_DATE
FROM yourtable
GROUP by CREATION_DATE,PROJECT_ID;
This should be what you want; you don't even need a GROUP BY or aggregate functions:
SELECT points, project_id, created_date
FROM @T AS LCL
WHERE writer_id = @writer_id AND points <> 0
AND NOT EXISTS (
    -- no row for the same writer and project has more points
    SELECT TOP 1 1
    FROM @T AS T2
    WHERE T2.writer_id = @writer_id
    AND T2.project_id = LCL.project_id
    AND T2.points > LCL.points)
Where @T is your table. Also, if you want to show only the records that hold the overall maximum for a project, rather than the maximum for just the given @writer_id, remove the restriction T2.writer_id = @writer_id from the inner query.
And my code that I used to test:
DECLARE @T TABLE
(
    writer_id int,
    points int,
    project_id int,
    created_date datetime
)
INSERT INTO @T VALUES(1, 20, 441, CAST('20110204' AS DATETIME))
INSERT INTO @T VALUES(1, 10, 600, CAST('20110204' AS DATETIME))
INSERT INTO @T VALUES(1, 5, 441, CAST('20110202' AS DATETIME))
INSERT INTO @T VALUES(1, 15, 241, GETDATE())
INSERT INTO @T VALUES(1, 12, 241, GETDATE())
INSERT INTO @T VALUES(2, 12, 241, GETDATE())
SELECT * FROM @T
DECLARE @writer_id int = 1
My results:
Result Set (3 items)
points | project_id | created_date
20 | 441 | 04/02/2011 00:00:00
10 | 600 | 04/02/2011 00:00:00
15 | 241 | 21/09/2011 18:59:31
My solution uses CROSS APPLY subqueries.
For optimal performance I have created an index on project_id (ASC) & points (DESC sorting order) fields.
If you want to see all creation_date values that have maximum points then you can use WITH TIES:
CREATE TABLE dbo.Project
(
project_id INT PRIMARY KEY
,name NVARCHAR(100) NOT NULL
);
CREATE TABLE dbo.ProjectActivity
(
project_activity INT IDENTITY(1,1) PRIMARY KEY
,project_id INT NOT NULL REFERENCES dbo.Project(project_id)
,points INT NOT NULL
,creation_date DATE NOT NULL
);
CREATE INDEX IX_ProjectActivity_project_id_points_creation_date
ON dbo.ProjectActivity(project_id ASC, points DESC)
INCLUDE (creation_date);
GO
INSERT dbo.Project
VALUES (1, 'A'), (2, 'BB'), (3, 'CCC');
INSERT dbo.ProjectActivity (project_id, points, creation_date)
VALUES (1,100,'2011-01-01'), (1,110,'2011-02-02'), (1, 111, '2011-03-03'), (1, 111, '2011-04-04')
,(2, 20, '2011-02-02'), (2, 22, '2011-03-03')
,(3, 2, '2011-03-03');
SELECT p.*, ca.*
FROM dbo.Project p
CROSS APPLY
(
SELECT TOP(1) WITH TIES
pa.points, pa.creation_date
FROM dbo.ProjectActivity pa
WHERE pa.project_id = p.project_id
ORDER BY pa.points DESC
) ca;
DROP TABLE dbo.ProjectActivity;
DROP TABLE dbo.Project;