How to filter out records grouped by date with a large date difference - sql

I have some records, grouped by name and date.
I would like to find any records in a table that have a date difference between them larger than a week, from the most recent record.
Would this be possible to do with a cte?
I am thinking something along these lines (it is difficult to explain)
; with mycte as (
select *
from #GroupedRecords)
select *
from mycte a
join (select *
from #GroupedRecords) b on a.Name = b.Name
where datediff(day, a.DateCreated, b.DateCreated) > 7
For example:
Id Name Date
1 Foo 02/03/2010
2 Bar 23/02/2010
3 Ram 21/01/2010
4 Foo 29/02/2010
5 Foo 22/02/2010
6 Foo 05/12/2009
The results should be:
Id Name Date
1 Foo 02/03/2010
5 Foo 22/02/2010
6 Foo 05/12/2009

You can try:
SELECT id,
name,
DATE
FROM groupedrecords AS gr1
WHERE ( (SELECT MAX(DATE) AS md
FROM groupedrecords gr2
WHERE gr1.name = gr2.name) - gr1.DATE ) > 7;
Or probably better yet:
SELECT gr1.id,
gr1.name,
gr1.DATE
FROM groupedrecords AS gr1
INNER JOIN (SELECT name,
MAX(DATE) AS md
FROM groupedrecords AS gr2
GROUP BY name) AS q1
ON gr1.name = q1.name
WHERE ( q1.md - gr1.DATE ) > 7;
UPDATE: As suggested in the comments, here is a version that uses UNION ALL to get the id with the max date per group and the ids of the records that are more than 7 days older than the max date. I used a CTE for fun; it was not necessary. Note that if more than one ID shares the max date in a group, this query will need to be modified:
WITH CTE
AS (SELECT name,
Max(date) AS MD
FROM Records
GROUP BY name)
SELECT R.ID,
R.name,
R.date
FROM CTE
INNER JOIN Records AS R
ON CTE.Name = R.Name
AND CTE.MD = R.date
UNION ALL
SELECT r1.id,
r1.name,
r1.DATE
FROM Records AS R1
INNER JOIN CTE
ON CTE.name = R1.name
WHERE ( CTE.md - R1.DATE ) > 7
ORDER BY name ASC,
date DESC
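As a rough, runnable illustration of the idea above (the latest row per name plus every row more than 7 days older than it), here is a minimal sketch using Python's built-in sqlite3 module. The table name `records`, the column name `created`, the ISO date strings, and the substitution of 2010-02-28 for the invalid 29/02/2010 are all assumptions made for the demo. SQLite's `julianday()` stands in for the date subtraction, so the syntax will differ on other databases.

```python
import sqlite3

# In-memory database with the sample rows from the question.
# Assumption: dates are stored as ISO text; 29/02/2010 is not a real date,
# so 2010-02-28 is used in its place.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, name TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "Foo", "2010-03-02"),
        (2, "Bar", "2010-02-23"),
        (3, "Ram", "2010-01-21"),
        (4, "Foo", "2010-02-28"),
        (5, "Foo", "2010-02-22"),
        (6, "Foo", "2009-12-05"),
    ],
)

query = """
WITH latest AS (
    -- most recent date per name
    SELECT name, MAX(created) AS md
    FROM records
    GROUP BY name
)
SELECT r.id, r.name, r.created
FROM records AS r
JOIN latest AS l ON l.name = r.name
WHERE r.created = l.md                                -- the latest row itself
   OR julianday(l.md) - julianday(r.created) > 7      -- or more than 7 days older
ORDER BY r.name, r.created DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Like the UNION ALL version, this also returns the latest row of names that have no older records (Bar and Ram here); filtering those out would need an extra condition.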

I wonder if this gets close to a solution:
; with tableWithRow as (
select *, row_number() over (order by name, date) as rowNum
from t
)
select t1.*, t2.id t2id, t2.name t2name, t2.date t2date, t2.rowNum t2rowNum
from tableWithRow t1
join tableWithRow t2
on t1.rowNum = t2.rowNum + 1 and t1.name = t2.name

Related

SQL Case depending on previous status of record

I have a table containing status of a records. Something like this:
ID STATUS TIMESTAMP
1 I 01-01-2016
1 A 01-03-2016
1 P 01-04-2016
2 I 01-01-2016
2 P 01-02-2016
3 P 01-01-2016
I want to write a CASE that takes the newest row for each ID; for every P whose ID has at some point had status I, the status should be shown as 'G' instead of P.
When I try to do something like
Select case when ID in (select ID from TABLE where ID = 'I') else ID END as status)
From TABLE
where ID in (select max(ID) from TABLE)
I get an error that this isn't possible using IN when casing.
So my question is, how do I do it then?
Want to end up with:
ID STATUS TIMESTAMP
1 G 01-04-2016
2 G 01-02-2016
3 P 01-01-2016
DBMS is IBM DB2
Have a derived table which returns each id with its newest timestamp. Join with that result:
select t1.ID, t1.STATUS, t1.TIMESTAMP
from tablename t1
join (select id, max(timestamp) as max_timestamp
from tablename
group by id) t2
ON t1.id = t2.id and t1.TIMESTAMP = t2.max_timestamp
Will return both rows in case of a tie (two rows with same newest timestamp.)
Note that ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP".
You can do this by using a common table expression to find all IDs that have had a status of 'I', and then using an outer join with your table to determine which IDs have had a status of 'I' at some point.
To get the final result (with only the newest record) you can use the row_number() OLAP function and select only the "newest" record (this is shown in the ranked common table expression below):
with irecs (ID) as (
select distinct
ID
from
TABLE
where
status = 'I'
),
ranked as (
select
rownumber() over (partition by t.ID order by t.timestamp desc) as rn,
t.id,
case when i.id is null then t.status else 'G' end as status,
t.timestamp
from
TABLE t
left outer join irecs i
on t.id = i.id
)
select
id,
status,
timestamp
from
ranked
where
rn = 1;
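For readers who want to try the ranking approach without a DB2 instance, the following is a small sketch in Python's sqlite3 (SQLite 3.25 or later is assumed for window functions). The table name `status_log`, the column name `ts`, and the ISO-formatted dates are assumptions for the demo. Unlike the query above, it only relabels rows whose newest status is P, which is how the question states the requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_log (id INTEGER, status TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO status_log VALUES (?, ?, ?)",
    [
        (1, "I", "2016-01-01"),
        (1, "A", "2016-01-03"),
        (1, "P", "2016-01-04"),
        (2, "I", "2016-01-01"),
        (2, "P", "2016-01-02"),
        (3, "P", "2016-01-01"),
    ],
)

query = """
WITH ranked AS (
    -- rn = 1 marks the newest row of each id
    SELECT id, status, ts,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM status_log
)
SELECT r.id,
       CASE
           WHEN r.status = 'P'
                AND EXISTS (SELECT 1 FROM status_log AS s
                            WHERE s.id = r.id AND s.status = 'I')
           THEN 'G'            -- a P whose id was once an I
           ELSE r.status
       END AS status,
       r.ts
FROM ranked AS r
WHERE r.rn = 1
ORDER BY r.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```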
Another solution:
with youtableranked as (
select f1.id,
case when (select count(*) from yourtable f2 where f2.ID=f1.ID and f2."TIMESTAMP"<f1."TIMESTAMP" and f2.STATUS='I')>0 then 'G' else f1.STATUS end as STATUS,
rownumber() over(partition by f1.id order by f1.TIMESTAMP desc, rrn(f1) desc) rang,
f1."TIMESTAMP"
from yourtable f1
)
select * from youtableranked f0
where f0.rang=1
ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP"
try this
select distinct f1.id, f4.*
from yourtable f1
inner join lateral
(
select
case when (select count(*) from yourtable f3 where f3.ID=f2.ID and f3."TIMESTAMP"<f2."TIMESTAMP" and f3.STATUS='I')>0 then 'G' else f2.STATUS end as STATUS,
f2."TIMESTAMP"
from yourtable f2 where f2.ID=f1.ID
order by f2."TIMESTAMP" desc, rrn(f2) desc
fetch first rows only
) f4 on 1=1
The rrn(f2) ordering breaks ties when several rows share the same latest timestamp.
ANSI SQL has TIMESTAMP as reserved word, so you may need to delimit it as "TIMESTAMP"

Consider a single record, per id, in a group by

Background
I have an SQL table with 4 columns:
id - varchar(50)
g1 - varchar(50)
g2 - varchar(50)
datetime - timestamp
I have this query:
SELECT g1,
COUNT(DISTINCT id),
SUM(COUNT(DISTINCT id)) OVER () AS total,
(CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM my_table
WHERE g2 = 'start'
GROUP BY 1
order by share desc
This query was built to answer: what is the distribution of g1 values across users?
Problem
Each id may have multiple records in the table. I want to consider only the earliest one, that is, the record with the minimum datetime value.
Example
Table
id g1 g2 datetime
x1 a start 2016-01-19 21:01:22
x1 c start 2016-01-19 21:01:21
x2 b start 2016-01-19 09:03:42
x1 a start 2016-01-18 13:56:45
Actual query results
g1 count total share
a 2 4 0.5
b 1 4 0.25
c 1 4 0.25
We have 4 records, but I only want to consider two of them:
x2 b start 2016-01-19 09:03:42
x1 a start 2016-01-18 13:56:45
which are the earliest records per id.
Expected query results
g1 count total share
a 1 2 0.5
b 1 2 0.5
Question
How do I consider only the earliest record, per id, in the group by?
Here is a solution which should work in SQL Server, and any database which supports CTE:
WITH cte AS
(
SELECT t1.g1,
COUNT(*) AS count
FROM yourTable t1
INNER JOIN
(
SELECT id, MIN(datetime) AS datetime
FROM yourTable
GROUP BY id
) t2
ON t1.id = t2.id AND
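t1.datetime = t2.datetime
GROUP BY t1.g1 -- one row per g1 value
)
SELECT t.g1,
t.count,
(SELECT SUM(count) FROM cte) AS total, -- total records counted, not number of groups
CAST(t.count AS float) / (SELECT SUM(count) FROM cte) AS share -- cast avoids integer division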
FROM cte t
I don't know which DBMS you use, so here is a standard ANSI way to do this:
SELECT T1.g1,
COUNT(DISTINCT T1.id),
SUM(COUNT(DISTINCT T1.id)) OVER () AS total,
(CAST(COUNT(DISTINCT T1.id) AS float) / SUM(COUNT(DISTINCT T1.id)) OVER ()) AS share
FROM my_table T1
INNER JOIN
(SELECT id, MIN(datetime) AS mindt
FROM my_table
GROUP BY id
) T2 ON T1.datetime=T2.mindt AND T1.id=T2.id
and T1.g2 = 'start'
GROUP BY 1
order by share desc
It might be slow if you have a large table and datetime is not indexed.
Try with the below query.
;WITH cte_1
as (SELECT id, MIN(datetime) AS [Date]
FROM YourTable
GROUP BY id
)
SELECT yt.g1,
COUNT(DISTINCT yt.id) [Count],
SUM(COUNT(DISTINCT yt.id)) OVER () AS total,
(CAST(COUNT(DISTINCT yt.id) AS float) / SUM(COUNT(DISTINCT yt.id)) OVER ()) AS share
FROM cte_1 c
JOIN YourTable yt
ON yt.[datetime]=c.[Date] AND yt.id=c.id
and yt.g2 = 'start'
GROUP BY yt.g1
ORDER BY share DESC
You are querying from my_table all the data although you only want to have the earliest date for an id. I assume id is the primary key in the table.
I suggest you define a view (or inline view) which queries only the earliest dates for the id's and you use your query on that view instead of on my_table.
The view could be defined as so and would contain only id's of earliest date:
select * from my_table a
where a.datetime = (select min(z.datetime) from my_table z where a.id = z.id) and a.g2 = 'start'
You can define that as a view or use it directly inline as in:
SELECT g1,
COUNT(DISTINCT id),
SUM(COUNT(DISTINCT id)) OVER () AS total,
(CAST(COUNT(DISTINCT id) AS float) / SUM(COUNT(DISTINCT id)) OVER ()) AS share
FROM (select a.id, a.g1, a.g2, a.datetime from my_table a where a.datetime = (select min(z.datetime) from my_table z where a.id = z.id) and a.g2 = 'start') AS earliest
GROUP BY 1
order by share desc
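The "earliest record per id, then compute shares" idea from the answers above can be checked with a small sketch in Python's sqlite3. The table and column names follow the question; the CTE names `earliest` and `first_rows` are made up for this example, and scalar subqueries replace the `SUM(...) OVER ()` window so that it runs on older SQLite builds too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, g1 TEXT, g2 TEXT, datetime TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("x1", "a", "start", "2016-01-19 21:01:22"),
        ("x1", "c", "start", "2016-01-19 21:01:21"),
        ("x2", "b", "start", "2016-01-19 09:03:42"),
        ("x1", "a", "start", "2016-01-18 13:56:45"),
    ],
)

query = """
WITH earliest AS (
    -- earliest 'start' timestamp per id
    SELECT id, MIN(datetime) AS first_seen
    FROM my_table
    WHERE g2 = 'start'
    GROUP BY id
),
first_rows AS (
    -- keep only the row that carries that earliest timestamp
    SELECT t.id, t.g1
    FROM my_table AS t
    JOIN earliest AS e
      ON e.id = t.id AND e.first_seen = t.datetime
    WHERE t.g2 = 'start'
)
SELECT g1,
       COUNT(DISTINCT id) AS cnt,
       (SELECT COUNT(DISTINCT id) FROM first_rows) AS total,
       CAST(COUNT(DISTINCT id) AS REAL)
           / (SELECT COUNT(DISTINCT id) FROM first_rows) AS share
FROM first_rows
GROUP BY g1
ORDER BY share DESC, g1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this yields one row each for `a` and `b`, matching the expected results in the question.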

Get average time between record creation

So I have data like this:
UserID CreateDate
1 10/20/2013 4:05
1 10/20/2013 4:10
1 10/21/2013 5:10
2 10/20/2012 4:03
I need to group by each user and get the average time between CreateDates. My desired results would be like this:
UserID AvgTime(minutes)
1 753.5
2 0
How can I find the difference between CreateDates for all records returned for a User grouping?
EDIT:
Using SQL Server 2012
Try this:
SELECT A.UserID,
AVG(CAST(DATEDIFF(MINUTE,B.CreateDate,A.CreateDate) AS FLOAT)) AvgTime
FROM #YourTable A
OUTER APPLY (SELECT TOP 1 *
FROM #YourTable
WHERE UserID = A.UserID
AND CreateDate < A.CreateDate
ORDER BY CreateDate DESC) B
GROUP BY A.UserID
This approach should also work:
;WITH CTE AS (
Select userId, createDate,
row_number() over (partition by userid order by createdate) rn
from Table1
)
select t1.userid,
isnull(avg(datediff(second, t1.createdate, t2.createdate)*1.0/60),0) AvgTime
from CTE t1 left join CTE t2 on t1.UserID = t2.UserID and t1.rn +1 = t2.rn
group by t1.UserID;
Updated: Thanks to #Lemark for pointing out number of diff = recordCount - 1
Since you're using SQL Server 2012, you can use lead() to do this:
with cte as
(select
userid,
(datediff(second, createdate,
lead(CreateDate) over (Partition by userid order by createdate)
)/60.0) datdiff -- 60.0 avoids integer division
From table1
)
select
userid,
avg(datdiff)
from cte
group by userid
Something like this:
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY CreateDate) RN,
UserID,
CreateDate
FROM Tbl
)
SELECT
T1.UserID,
ISNULL(AVG(CAST(DATEDIFF(mi, T1.CreateDate, T2.CreateDate) AS FLOAT)), 0) AvgTime -- T2 is the next row, so T1 comes first
FROM CTE T1
LEFT JOIN CTE T2
ON T1.UserID = T2.UserID
AND T1.RN = T2.RN - 1
GROUP BY T1.UserID
With SQL 2012 you can use the ROW_NUMBER function and self-join to find the "previous" row in each group:
WITH Base AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY CreateDate) RowNum,
UserId,
CreateDate
FROM Users
)
SELECT
B1.UserID,
ISNULL(
AVG(
DATEDIFF(mi,B2.CreateDate,B1.CreateDate) * 1.0
)
,0) [Average]
FROM Base B1
LEFT JOIN Base B2
ON B1.UserID = B2.UserID
AND B1.RowNum = B2.RowNum + 1
GROUP BY B1.UserId
Although I get a different answer for UserID 1: an average of (5 + 1500) / 2 = 752.5.
This only works in SQL Server 2012 and later. You can use the LEAD analytic function:
CREATE TABLE dates (
id integer,
created datetime not null
);
INSERT INTO dates (id, created)
SELECT 1 AS id, '10/20/2013 4:05' AS created
UNION ALL SELECT 1, '10/20/2013 4:10'
UNION ALL SELECT 1, '10/21/2013 5:10'
UNION ALL SELECT 2, '10/20/2012 4:03';
SELECT id, isnull(avg(diff * 1.0), 0) -- * 1.0 keeps the fractional part of the average
FROM (
SELECT id,
datediff(MINUTE,
created,
LEAD(created, 1, NULL) OVER(partition BY id ORDER BY created)
) AS diff
FROM dates
) as diffs
GROUP BY id;
http://sqlfiddle.com/#!6/4ce89/22
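To double-check the arithmetic of the LEAD approach outside SQL Server, here is a minimal sketch with Python's sqlite3 (SQLite 3.25 or later is assumed for window functions). The table name `events` and column names `user_id` and `created` are assumptions for the demo, and `strftime('%s', ...)` replaces `DATEDIFF`, since SQLite has no such function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2013-10-20 04:05"),
        (1, "2013-10-20 04:10"),
        (1, "2013-10-21 05:10"),
        (2, "2012-10-20 04:03"),
    ],
)

query = """
WITH gaps AS (
    -- minutes between each row and the next row of the same user;
    -- NULL for the last row of each user, which AVG then ignores
    SELECT user_id,
           (CAST(strftime('%s', LEAD(created) OVER (
                     PARTITION BY user_id ORDER BY created)) AS INTEGER)
            - CAST(strftime('%s', created) AS INTEGER)) / 60.0 AS gap_minutes
    FROM events
)
SELECT user_id, COALESCE(AVG(gap_minutes), 0) AS avg_minutes
FROM gaps
GROUP BY user_id
ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A user with a single record has no gaps at all, so the average is NULL and `COALESCE` turns it into 0.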

Select rows with the same field values

How can I query only the records that show up twice in my table?
Currently my table looks something like this:
Number Date RecordT ReadLoc
123 08/13/13 1:00pm N Gone
123 08/13/13 2:00pm P Home
123 08/13/13 3:00pm N Away
123 08/13/13 4:00pm N Away
I need a query that will select the records that have the same 'Value' in the RecordT field and the same 'Value' in the ReadLoc field.
So my result for the above would show with the query:
Number Date RecordT ReadLoc
123 08/13/13 3:00pm N Away
123 08/13/13 4:00pm N Away
I was trying to do a subselect like this:
SELECT t.Number, t.Date, n.RecordT, n.ReadLoc
FROM Table1 t join Table2 n ON t.Number = n.Number
WHERE t.Number IN (SELECT t.Number FROM Table1 GROUP BY t.Number HAVING COUNT(t.Number) > 1 )
AND n.ReadLoc IN (SELECT n.ReadLoc FROM Table2 GROUP n.ReadLoc HAVING COUNT(n.ReadLoc) > 1 )
SELECT a.*
FROM Table1 a
JOIN (SELECT RecordT, ReadLoc
FROM Table1
GROUP BY RecordT, ReadLoc
HAVING COUNT(*) > 1
)b
ON a.RecordT = b.RecordT
AND a.ReadLoc = b.ReadLoc
Shouldn't this work:
select *
from table1
where (RecordT, ReadLoc) in
(select RecordT, ReadLoc
from table1
group by RecordT, ReadLoc
having count(*) > 1)
The following can be taken as a base:
;with cte as (
select *, cnt = count(1) over (partition by RecordT, ReadLoc)
from TableName
)
select *
from cte
where cnt > 1
If your TableName is actually a view of two joined tables, try:
;with TableName as (
SELECT t.Number, t.Date, n.RecordT, n.ReadLoc
FROM Table1 t
join Table2 n ON t.Number = n.Number
),
cte as (
select Number, Date, RecordT, ReadLoc,
cnt = count(1) over (partition by RecordT, ReadLoc)
from TableName
)
select Number, Date, RecordT, ReadLoc
from cte
where cnt > 1 /* and RecordT='N' and ReadLoc='AWAY' */
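The GROUP BY / HAVING join from the first answer can be tried on the sample data with this short sketch in Python's sqlite3. The table name `readings`, the snake_case column names, and the 24-hour ISO timestamps are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (number INTEGER, read_at TEXT, record_t TEXT, read_loc TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (123, "2013-08-13 13:00", "N", "Gone"),
        (123, "2013-08-13 14:00", "P", "Home"),
        (123, "2013-08-13 15:00", "N", "Away"),
        (123, "2013-08-13 16:00", "N", "Away"),
    ],
)

query = """
SELECT a.number, a.read_at, a.record_t, a.read_loc
FROM readings AS a
JOIN (
    -- (record_t, read_loc) pairs that occur more than once
    SELECT record_t, read_loc
    FROM readings
    GROUP BY record_t, read_loc
    HAVING COUNT(*) > 1
) AS b
  ON a.record_t = b.record_t
 AND a.read_loc = b.read_loc
ORDER BY a.read_at
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```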

SQL Server query to select local maximums

I have this data. I need to get the lowest $ full rows for each person.
Amount Date Name
$123 Jun 1 Peter
$120 Jun 5 Peter
$123 Jun 5 Paul
$100 Jun 1 Paul
$220 Jun 3 Paul
The result of the SQL Server query should be:
$120 Jun 5 Peter
$100 Jun 1 Paul
SQL Server 2005+ Version
;WITH CTE AS
(
SELECT
Amount, [Date], Name,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY [Amount]) AS RowNum
FROM Table
)
SELECT *
FROM CTE
WHERE RowNum = 1
Alternative Version
SELECT t.Amount, t.[Date], t.Name
FROM
(
SELECT Name, MIN(Amount) AS MinAmount
FROM Table
GROUP BY Name
) m
INNER JOIN Table t
ON t.Name = m.Name
AND t.Amount = m.MinAmount
One way which works on SQL Server 7 and up
select t1.*
from(select min(amount) Minamount,name
from Yourtable
group by name) t2
join Yourtable t1 on t1.name = t2.name
and t1.amount = t2.Minamount
There are a couple of ways to solve this, see here: Including an Aggregated Column's Related Values
SELECT * FROM TableName T1 WHERE NOT EXISTS
(SELECT * FROM TableName T2
WHERE T2.Name = T1.Name AND T2.Amount < T1.Amount)
In the event of ties, both rows will be shown in this scenario.
Group on the person to get the lowest amount for each person, then join the table to get the date for each row:
select y.Amount, y.Date, y.Name
from (
select min(Amount) as Amount, Name
from TheTable
group by Name
) x
inner join TheTable y on x.Name = y.Name and x.Amount = y.Amount
If the amount can exist on more than one date for a person, pick one of the dates, for example the first:
select y.Amount, min(y.Date), y.Name
from (
select min(Amount) as Amount, Name
from TheTable
group by Name
) x
inner join TheTable y on x.Name = y.Name and x.Amount = y.Amount
group by y.Amount, y.Name
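Not the most efficient option, but simpler to read. Note that this returns one row per distinct name and date, each carrying that person's minimum amount, rather than only the row that holds the minimum: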
SELECT DISTINCT [Name], [Date], MIN([Amount]) OVER(PARTITION BY [Name])
FROM #Table
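As a quick way to compare these variants, the ROW_NUMBER version can be run on the sample data with Python's sqlite3 (SQLite 3.25 or later is assumed for window functions). The table name `payments` and column name `day` are assumptions for the demo, and amounts are stored as plain integers without the `$` sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (amount INTEGER, day TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (123, "Jun 1", "Peter"),
        (120, "Jun 5", "Peter"),
        (123, "Jun 5", "Paul"),
        (100, "Jun 1", "Paul"),
        (220, "Jun 3", "Paul"),
    ],
)

query = """
WITH ranked AS (
    -- row_num = 1 marks the lowest amount for each person
    SELECT amount, day, name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY amount) AS row_num
    FROM payments
)
SELECT amount, day, name
FROM ranked
WHERE row_num = 1
ORDER BY name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike the join and NOT EXISTS variants, ROW_NUMBER keeps exactly one row per person even when two rows tie on the lowest amount; which of the tied rows survives is arbitrary unless the ORDER BY adds a tie-breaker.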