Find uncovered periods without exploding each combination - sql

I have the following two tables
People
+--------+---------------+-------------+
| Name | ContractStart | ContractEnd |
+--------+---------------+-------------+
| Kate | 20180101 | 20181231 |
| Sawyer | 20180101 | 20181231 |
| Ben | 20170601 | 20181231 |
+--------+---------------+-------------+
Shifts
+---------+--------+------------+----------+
| Station | Name | ShiftStart | ShiftEnd |
+---------+--------+------------+----------+
| Swan | Kate | 20180101 | 20180131 |
| Arrow | Kate | 20180301 | 20180331 |
| Arrow | Kate | 20180401 | 20181231 |
| Flame | Sawyer | 20180101 | 20181231 |
| Swan | Ben | 20180101 | 20181231 |
+---------+--------+------------+----------+
It means that, for example, Kate will be available from 20180101 to 20181231. In this period of time she will work at station Swan from 20180101 to 20180131, at station Arrow from 20180301 to 20180331 and from 20180401 to 20181231.
My goal is to come to the following table
+------+---------------+-------------+
| | VacationStart | VacationEnd |
+------+---------------+-------------+
| Kate | 20180201 | 20180228 |
| Ben | 20170601 | 20171231 |
+------+---------------+-------------+
that means that Kate will be free from 20180201 to 20180228.
My first idea was to create a table with every day of 2017 and 2018 (call it CalTable), then JOIN it with People to get every day on which each person should be available. I would then JOIN the result with Shifts to find the days that are NOT BETWEEN ShiftStart AND ShiftEnd.
These steps give correct results but are very slow, considering that I have almost 1,000,000 people and there are usually 10-20 years between ContractStart and ContractEnd.
What could be a correct approach to get the results in a more clever and fast way?
Thanks.
This is the data of the example on db<>Fiddle
For @A_Name_Does_Not_Matter, this is my attempt:
CREATE TABLE #CalTable([ID] VARCHAR(8) NOT NULL)
DECLARE @num int
SET @num = 20170101
WHILE (@num <= 20181231)
BEGIN
INSERT INTO #CalTable([ID])
SELECT @num AS [ID]
SET @num = @num + 1
END
SELECT X.[Name], X.[TIMEID]
FROM (
-- All day availables
SELECT DISTINCT A.[Name],B.[ID] AS [TIMEID]
FROM #People A INNER JOIN #CalTable B
ON B.[ID] BETWEEN A.[ContractStart] AND A.[ContractEnd]
) X
LEFT JOIN (
-- Working day
SELECT DISTINCT A.[Name],B.[ID] AS [TIMEID]
FROM #People A INNER JOIN #CalTable B
ON B.[ID] BETWEEN A.[ContractStart] AND A.[ContractEnd]
INNER JOIN #Shifts C ON A.[Name]=C.[Name] AND B.[ID] BETWEEN C.[ShiftStart] AND C.[ShiftEnd]
) Z
ON X.[Name]=Z.[Name] AND X.[TIMEID]=Z.[TIMEID]
WHERE Z.[Name] IS NULL
ORDER BY X.[Name],X.[TIMEID]
and then aggregate the dates with this query.

A person's contract start date could be the start of a vacation, and you can find the end of that vacation by taking the date of their first shift (minus 1 day), using CROSS APPLY to get the TOP 1 shift ordered by ShiftStart.
In the unusual situation that they have no shifts, their vacation ends on their contract end date.
Later vacations start the day after a shift ends and end the day before the next shift starts (found with OUTER APPLY), defaulting to the contract end date if there is no further shift.
SELECT p.name, p.contractStart vacationstart, p.ContractEnd vacationend from people p WHERE not exists(select 1 from shifts s where p.name = s.name)
UNION
SELECT p2.name,
p2.contractStart vacationstart,
dateadd(day,-1,DQ.ShiftStart) as vacationend
from PEOPLE P2
CROSS APPLY
(SELECT TOP 1 s2.ShiftStart FROM shifts s2 WHERE p2.name = s2.name order by s2.ShiftStart) DQ
WHERE DQ.ShiftStart > p2.contractstart
UNION
select P3.NAME,
dateadd(day,1,s3.ShiftEnd) vacationstart,
COALESCE(dateadd(day,-1, DQ2.shiftStart),P3.ContractEnd) --you might have to add handling yourself for removing a case where they work on their contract end date
FROM people p3 JOIN shifts s3 on p3.name = s3.name
OUTER APPLY (SELECT TOP 1 s4.shiftStart
from shifts s4
where s4.name = p3.name
and
s4.shiftstart > s3.shiftstart
order by s4.shiftstart) DQ2
It's hard for me to verify this without test data.
For an employee, what I am looking for is:
Contract Start, Shift1Start - 1
Shift1End + 1, Shift2Start - 1
Shift2End + 1, Shift3Start - 1
Shift3End + 1, ContractEnd
Then add the 'no shifts' case.
Finally, shifts may be contiguous, leading to vacations of zero or negative duration. You could filter these out by making the query a subquery and keeping only the rows where vacationstart <= vacationend.
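The gap-finding idea above can also be checked outside the database. The following is only a sketch in Python, not the answer's SQL: the function name `uncovered_periods` is made up for illustration, dates are `datetime.date` values rather than the integer-style dates in the question, and it assumes one person's shifts fit in memory.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def uncovered_periods(contract_start, contract_end, shifts):
    """Return (start, end) periods inside the contract not covered by any shift.

    `shifts` is a list of (shift_start, shift_end) pairs, inclusive on both ends.
    Hypothetical helper, not part of the original answer.
    """
    gaps = []
    cursor = contract_start  # first day not yet known to be covered
    for shift_start, shift_end in sorted(shifts):
        if cursor > contract_end:
            break
        if shift_start > cursor:
            # Days from cursor up to the day before this shift are free.
            gaps.append((cursor, min(shift_start - ONE_DAY, contract_end)))
        # Overlapping or contiguous shifts simply push the cursor forward.
        cursor = max(cursor, shift_end + ONE_DAY)
    if cursor <= contract_end:
        gaps.append((cursor, contract_end))
    return gaps

kate = uncovered_periods(
    date(2018, 1, 1), date(2018, 12, 31),
    [(date(2018, 1, 1), date(2018, 1, 31)),
     (date(2018, 3, 1), date(2018, 3, 31)),
     (date(2018, 4, 1), date(2018, 12, 31))],
)
ben = uncovered_periods(
    date(2017, 6, 1), date(2018, 12, 31),
    [(date(2018, 1, 1), date(2018, 12, 31))],
)
sawyer = uncovered_periods(
    date(2018, 1, 1), date(2018, 12, 31),
    [(date(2018, 1, 1), date(2018, 12, 31))],
)
```

Because the cursor only moves forward, contiguous shifts never produce zero-length or negative periods, which is the case the SQL version has to filter out afterwards.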

Related

Creating user time report that includes zero hour weeks

I'm having a heck of a time putting together a query that I thought would be quite simple. I have a table that records total hours spent on a task and the user that reported those hours. I need to put together a query that returns how many hours a given user charged to each week of the year (including weeks where no hours were charged).
Expected Output:
|USER_ID | START_DATE | END_DATE | HOURS |
-------------------------------------------
|'JIM' | 4/28/2019 | 5/4/2019 | 6 |
|'JIM' | 5/5/2019 | 5/11/2019 | 0 |
|'JIM' | 5/12/2019 | 5/18/2019 | 16 |
I have a function that returns the start and end date of the week for each day, so I used that and joined it to the task table by date and summed up the hours. This gets me very close, but since I'm joining on date I obviously end up with NULL for the USER_ID on all zero hour rows.
Current Output:
|USER_ID | START_DATE | END_DATE | HOURS |
-------------------------------------------
|'JIM' | 4/28/2019 | 5/4/2019 | 6 |
| NULL | 5/5/2019 | 5/11/2019 | 0 |
|'JIM' | 5/12/2019 | 5/18/2019 | 16 |
I've tried a few other approaches, but each time I end up hitting the same problem. Any ideas?
Schema:
---------------------------------
| TASK_LOG |
---------------------------------
|USER_ID | DATE_ENTERED | HOURS |
-------------------------------
|'JIM' | 4/28/2019 | 6 |
|'JIM' | 5/12/2019 | 6 |
|'JIM' | 5/13/2019 | 10 |
------------------------------------
| DATE_HELPER_TABLE |
|(This is actually a function, but I|
| put it in a table to simplify) |
-------------------------------------
|DATE | START_OF_WEEK | END_OF_WEEK |
-------------------------------------
|5/3/2019 | 4/28/2019 | 5/4/2019 |
|5/4/2019 | 4/28/2019 | 5/4/2019 |
|5/5/2019 | 5/5/2019 | 5/11/2019 |
| ETC ... |
Query:
SELECT HRS.USER_ID
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
,SUM(HOURS)
FROM DATE_HELPER_TABLE DHT
LEFT JOIN (
SELECT TL.USER_ID
,TL.HOURS
,DHT2.START_OF_WEEK
,DHT2.END_OF_WEEK
FROM TASK_LOG TL
JOIN DATE_HELPER_TABLE DHT2 ON DHT2.DATE_VALUE = TL.DATE_ENTERED
WHERE TL.USER_ID = 'JIM1'
) HRS ON HRS.START_OF_WEEK = DHT.START_OF_WEEK
GROUP BY USER_ID
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
ORDER BY DHT.START_OF_WEEK
http://sqlfiddle.com/#!18/02d43/3 (note: for this sql fiddle, I converted my date helper function into a table to simplify)
Cross join the users (in question) and include them in the join condition. Use coalesce() to get 0 instead of NULL for the hours of weeks where no work was done.
SELECT u.user_id,
dht.start_of_week,
dht.end_of_week,
coalesce(sum(hrs.hours), 0)
FROM date_helper_table dht
CROSS JOIN (VALUES ('JIM1')) u (user_id)
LEFT JOIN (SELECT tl.user_id,
dht2.start_of_week,
tl.hours
FROM task_log tl
INNER JOIN date_helper_table dht2
ON dht2.date_value = tl.date_entered) hrs
ON hrs.user_id = u.user_id
AND hrs.start_of_week = dht.start_of_week
GROUP BY u.user_id,
dht.start_of_week,
dht.end_of_week
ORDER BY dht.start_of_week;
I used a VALUES clause here to list the users. If you only want to get the times for particular users you can do so too (or use any other subquery, or ...). Otherwise you can use your user table (which you didn't post, so I had to use that substitute).
However, the figures produced by this (and by your original query) look strange to me. In the fiddle your user has worked a total of 23 hours in the task_log table, yet the sums in the result are 24 and 80. That is way too much on its own, and even worse considering that 1 hour in task_log isn't even on a date listed in date_helper_table.
I suspect you get more accurate figures if you just join task_log, not that weird derived table.
SELECT u.user_id,
dht.start_of_week,
dht.end_of_week,
coalesce(sum(tl.hours), 0)
FROM date_helper_table dht
CROSS JOIN (VALUES ('JIM1')) u (user_id)
LEFT JOIN task_log tl
ON tl.user_id = u.user_id
AND tl.date_entered = dht.date_value
GROUP BY u.user_id,
dht.start_of_week,
dht.end_of_week
ORDER BY dht.start_of_week;
But maybe that's just me.
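To see the cross-join idea run end to end, here is a small sketch that uses Python's built-in sqlite3 module in place of SQL Server. The table name `date_helper`, the ISO date strings, and the three-week calendar are assumptions made for the demo; the task rows are the ones from the question's schema.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE date_helper (date_value TEXT, start_of_week TEXT, end_of_week TEXT)")
conn.execute("CREATE TABLE task_log (user_id TEXT, date_entered TEXT, hours INTEGER)")

# Three Sunday-to-Saturday weeks starting 2019-04-28 (demo calendar, an assumption).
first_sunday = date(2019, 4, 28)
for offset in range(21):
    day = first_sunday + timedelta(days=offset)
    week_start = first_sunday + timedelta(days=7 * (offset // 7))
    week_end = week_start + timedelta(days=6)
    conn.execute(
        "INSERT INTO date_helper VALUES (?, ?, ?)",
        (day.isoformat(), week_start.isoformat(), week_end.isoformat()),
    )

conn.executemany(
    "INSERT INTO task_log VALUES (?, ?, ?)",
    [("JIM", "2019-04-28", 6), ("JIM", "2019-05-12", 6), ("JIM", "2019-05-13", 10)],
)

# Every (user, day) pair exists before the LEFT JOIN, so empty weeks keep the user id.
weekly = conn.execute("""
    SELECT u.user_id, d.start_of_week, d.end_of_week, COALESCE(SUM(t.hours), 0)
    FROM date_helper d
    CROSS JOIN (SELECT 'JIM' AS user_id) u
    LEFT JOIN task_log t
           ON t.user_id = u.user_id
          AND t.date_entered = d.date_value
    GROUP BY u.user_id, d.start_of_week, d.end_of_week
    ORDER BY d.start_of_week
""").fetchall()
```

The middle week comes back with the user id filled in and 0 hours, which is the row that showed NULL in the original output.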
SQL Fiddle
http://sqlfiddle.com/#!18/02d43/65
Using your SQL fiddle, I simply updated the select statement to account for and convert null values. As far as I can tell, there is nothing in your post that makes this option not viable. Please let me know if this is not the case and I will update. (This is not intended to detract from sticky bit's answer, but to offer an alternative)
SELECT ISNULL(HRS.USER_ID, '') as [USER_ID]
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
,SUM(ISNULL(HOURS,0)) as [SUM]
FROM DATE_HELPER_TABLE DHT
LEFT JOIN (
SELECT TL.USER_ID
,TL.HOURS
,DHT2.START_OF_WEEK
,DHT2.END_OF_WEEK
FROM TASK_LOG TL
JOIN DATE_HELPER_TABLE DHT2 ON DHT2.DATE_VALUE = TL.DATE_ENTERED
WHERE TL.USER_ID = 'JIM1'
) HRS ON HRS.START_OF_WEEK = DHT.START_OF_WEEK
GROUP BY USER_ID
,DHT.START_OF_WEEK
,DHT.END_OF_WEEK
ORDER BY DHT.START_OF_WEEK
Create a dates table that lists every date for the next 100 years in the first column, with the week of the year, day of the month, etc. in the following columns.
Then select from that dates table and left join everything else. Use the ISNULL function to replace nulls with zeros.

SQL union / join / intersect multiple select statements

I have two select statements. One gets a list (if any) of logged voltage data in the past 60 seconds and related chamber names, and one gets a list (if any) of logged arc event data in the past 5 minutes. I am trying to append the arc count data as new columns to the voltage data table. I cannot figure out how to do this.
Note that there may or may not be arc count rows for a given chamber name that is in the voltage data table. If there are no rows, I want to set the arc count column value to zero.
Any ideas on how to accomplish this?
Voltage Data:
SELECT DISTINCT dbo.CoatingChambers.Name,
AVG(dbo.CoatingGridVoltage_Data.ChanA_DCVolts) AS ChanADC,
AVG(dbo.CoatingGridVoltage_Data.ChanB_DCVolts) AS ChanBDC,
AVG(dbo.CoatingGridVoltage_Data.ChanA_RFVolts) AS ChanARF,
AVG(dbo.CoatingGridVoltage_Data.ChanB_RFVolts) AS ChanBRF FROM
dbo.CoatingGridVoltage_Data LEFT OUTER JOIN dbo.CoatingChambers ON
dbo.CoatingGridVoltage_Data.CoatingChambersID =
dbo.CoatingChambers.CoatingChambersID WHERE
(dbo.CoatingGridVoltage_Data.DT > DATEADD(second, - 60,
SYSUTCDATETIME())) GROUP BY dbo.CoatingChambers.Name
Returns
Name | ChanADC | ChanBDC | ChanARF | ChanBRF
-----+-------------------+--------------------+---------------------+------------------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848
Arc count table:
SELECT CoatingChambers.Name,
SUM(ArcCount) as ArcCount
FROM CoatingChambers
LEFT JOIN CoatingArc_Data
ON dbo.[CoatingArc_Data].CoatingChambersID = dbo.CoatingChambers.CoatingChambersID
where EventDT > DATEADD(mi,-5, GETDATE())
Group by Name
Returns
Name | ArcCount
-----+---------
L1 | 283
L4 | 0
L6 | 1
S2 | 55
To be clear, I want this table (with added arc count column), given the two tables above:
Name | ChanADC | ChanBDC | ChanARF | ChanBRF | ArcCount
-----+-------------------+--------------------+---------------------+-------------------+---------
OX2 | 2.9099999666214 | -0.485000004371007 | 0.344801843166351 | 0.49748428662618 | 0
S2 | 0.100000001490116 | -0.800000016887983 | 0.00690172302226226 | 0.700591623783112 | 55
S3 | 4.25666658083598 | 0.5 | 0.96554297208786 | 0.134956782062848 | 0
You can treat the select statements as virtual tables and just join them together:
select
x.Name,
x.ChanADC,
x.ChanBDC,
x.ChanARF,
x.ChanBRF,
isnull( y.ArcCount, 0 ) ArcCount
from
(
select distinct
cc.Name,
AVG(cgv.ChanA_DCVolts) AS ChanADC,
AVG(cgv.ChanB_DCVolts) AS ChanBDC,
AVG(cgv.ChanA_RFVolts) AS ChanARF,
AVG(cgv.ChanB_RFVolts) AS ChanBRF
from
dbo.CoatingGridVoltage_Data cgv
left outer join
dbo.CoatingChambers cc
on
cgv.CoatingChambersID = cc.CoatingChambersID
where
cgv.DT > dateadd(second, - 60, sysutcdatetime())
group by
cc.Name
) as x
left outer join
(
select
cc.Name,
sum(ac.ArcCount) as ArcCount
from
dbo.CoatingChambers cc
left outer join
dbo.CoatingArc_Data ac
on
ac.CoatingChambersID = cc.CoatingChambersID
where
EventDT > dateadd(mi,-5, getdate())
group by
Name
) as y
on
x.Name = y.Name
Also, it's worthwhile to simplify your names with aliases and format the queries for readability...which I shamelessly took a stab at.
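The shape of that join (keep every voltage row, look up the arc count by name, default to zero) can be sketched in a few lines of Python. This is an illustration only: the dictionaries stand in for the two aggregated result sets, the voltage figures are rounded from the sample output, and `with_arc_count` is a made-up helper name.

```python
# Stand-ins for the two aggregated result sets, keyed by chamber name.
voltage = {"OX2": 2.91, "S2": 0.10, "S3": 4.26}   # rounded ChanADC values only
arc_counts = {"L1": 283, "L4": 0, "L6": 1, "S2": 55}

def with_arc_count(voltage_rows, arc_rows):
    """Left-join style lookup: every voltage row survives, missing counts become 0."""
    return [
        (name, chan_adc, arc_rows.get(name, 0))
        for name, chan_adc in voltage_rows.items()
    ]

combined = with_arc_count(voltage, arc_counts)
```

Chambers that appear only in the arc data (L1, L4, L6) drop out, just as they do on the right side of a LEFT OUTER JOIN.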

SQL Server Find the date in joining order

I am using MS SQL Server. There are two tables:
membership
+---+-----------------+---------------------+----------------
| | membershipName | createddate | price |
+---+-----------------+---------------------+----------------
| 1 | Swimming | 2010-01-01 | 30 |
| 2 | Swimming | 2010-05-01 | 32 |
| 3 | Swimming | 2011-01-01 | 35 |
| 4 | Swimming | 2012-01-01 | 40 |
+---+-----------------+---------------------+----------------
member
+---+-----------------+---------------------+-----------------
| | memberName | membership | joiningDate |
+---+-----------------+---------------------+-----------------
| 0 | Andy | Swimming | 2008-02-02 |
| 1 | John | Swimming | 2010-02-02 |
| 2 | Andy | Swimming | 2011-02-02 |
| 3 | Alice | Swimming | 2015-02-02 |
+---+-----------------+---------------------+----------------
I want to find each member's membership price for the right period of time
e.g
Andy return NULL
John return 30
Alice return 40
The best logic is to check:
if the joiningDate is in between two start dates
if yes, choose the earlier date
if not
if the joining date is before the earliest date, then use the earliest date
if the joining date is after the latest date, then use the latest date
I am a Java programmer, so doing this in SQL is quite tricky for me. Any hint would be nice!
edit 1: sorry I forgot to consider month
edit 2: added desirable result
I hope I understood you correctly. try this out:
SELECT TOP 1 ms.Price
FROM membership ms
LEFT JOIN member m
ON m.joiningdate > ms.createddate
WHERE m.id = 3
ORDER BY price DESC
I hope I got this correctly. You might try it like this:
Declared table variable to mock-up a test scenario:
DECLARE @membership TABLE(id INT, membershipName VARCHAR(100),createddate DATETIME,price DECIMAL(10,4));
INSERT INTO @membership VALUES
(1,'Swimming',{d'2010-01-01'},30)
,(2,'Swimming',{d'2010-05-01'},32)
,(3,'Swimming',{d'2011-01-01'},35)
,(4,'Swimming',{d'2012-01-01'},40);
DECLARE @member TABLE(id INT,memberName VARCHAR(100),membership VARCHAR(100),joiningDate DATETIME);
INSERT INTO @member VALUES
(0,'Andy','Swimming',{d'2008-02-02'})
,(1,'John','Swimming',{d'2010-02-02'})
,(2,'Andy','Swimming',{d'2011-02-02'})
,(3,'Alice','Swimming',{d'2015-02-02'});
As you are on SQL Server 2012, you are lucky: you can use LEAD.
The CTE "Intervalls" returns the membership table as is and adds one column holding the moment one second before the next row's createddate. LEAD lets you read a value from a later row. First I subtract one second, then I substitute a very high date in case of NULL:
WITH Intervalls AS
(
SELECT *
,ISNULL(DATEADD(SECOND ,-1,LEAD(createddate) OVER(ORDER BY createddate)),{d'2100-01-01'}) AS EndOfIntervall
FROM @membership AS ms
)
--The SELECT reads all members and joins them to the membership row whose range (according to "Intervalls") contains their date. Only the case earlier than the first createddate must be treated specially:
SELECT m.*
,ISNULL(i.price, CASE WHEN m.joiningDate<(SELECT MIN(x.createddate) FROM @membership AS x)
THEN (SELECT TOP 1 x.price FROM @membership AS x ORDER BY x.createddate ASC) END)
FROM @member AS m
LEFT JOIN Intervalls AS i ON m.joiningDate BETWEEN i.createddate AND i.EndOfIntervall
UPDATE Better approach (thx to Paparis)
SELECT m.*
,ISNULL(Corresponding.price, (SELECT TOP 1 x.price FROM @membership AS x ORDER BY x.createddate ASC)) AS price
FROM @member AS m
OUTER APPLY
(
SELECT TOP 1 ms.price
FROM @membership AS ms
WHERE ms.createddate<=m.joiningDate
ORDER BY ms.createddate DESC
) AS Corresponding
UPDATE 2: Even simpler!
SELECT m.*
,ISNULL
(
(
SELECT TOP 1 ms.price
FROM @membership AS ms
WHERE ms.createddate<=m.joiningDate
ORDER BY ms.createddate DESC
),
(
SELECT TOP 1 x.price FROM @membership AS x ORDER BY x.createddate ASC
)
) AS price
FROM @member AS m
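The lookup rule used in the last two queries (take the newest price whose createddate is on or before the joining date, otherwise fall back to the earliest price) can be sketched in Python with a binary search. The function name `price_on` is hypothetical. Note that the fallback follows the answer's queries, not the "Andy return NULL" line in the question.

```python
from bisect import bisect_right
from datetime import date

# Price history from the question, as (createddate, price) pairs.
price_history = [
    (date(2010, 1, 1), 30),
    (date(2010, 5, 1), 32),
    (date(2011, 1, 1), 35),
    (date(2012, 1, 1), 40),
]

def price_on(joining_date, history):
    """Price in force on joining_date; earliest price if the date precedes all rows."""
    ordered = sorted(history)
    starts = [created for created, _ in ordered]
    # Index of the last createddate <= joining_date, or -1 if there is none.
    idx = bisect_right(starts, joining_date) - 1
    if idx < 0:
        idx = 0  # same fallback as the ISNULL(..., earliest price) in the queries
    return ordered[idx][1]

john = price_on(date(2010, 2, 2), price_history)
alice = price_on(date(2015, 2, 2), price_history)
andy_2008 = price_on(date(2008, 2, 2), price_history)
andy_2011 = price_on(date(2011, 2, 2), price_history)
```

If a NULL result is wanted for dates before the first price, the `idx < 0` branch would return `None` instead of falling back.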

How to subtract two columns in different table

I have a table called ward:
ward_number | class | capacity
________________________________________
1 | A1 | 1
2 | A1 | 2
3 | B1 | 3
4 | C | 4
5 | B2 | 5
capacity = how many beds there are in the ward
I also have a table called ward_stay:
ward_number | from_date | to_date
_____________________________________________
2 | 2015-01-01 | 2015-03-08
3 | 2015-01-16 | 2015-02-18
6 | 2015-03-05 | 2015-03-18
3 | 2015-04-15 | 2015-04-20
1 | 2015-05-19 | 2015-05-30
I want to count the number of beds available in ward with class 'B1' on date '2015-04-15':
ward_number | count
_____________________
3 | 2
The count is basically capacity minus the number of times ward_number 3 appears.
I managed to get the number of times ward_number 3 appears, but I don't know how to subtract it from capacity.
Here's my code:
select count(ward_number) AS 'result'
from ward_stay
where ward_number = (select ward_number
from ward
where class = 'B1');
How do I subtract capacity from this result?
SQL Fiddle Demo
Using 2015-01-17 instead, I calculate the total number of occupied beds on that day, then join back to subtract it from the original capacity. If all beds are free, the LEFT JOIN returns NULL, so COALESCE substitutes 0.
SELECT w."ward_number", "capacity" - COALESCE(occupied, 0) as "count"
FROM wards w
LEFT JOIN (
SELECT "ward_number", COUNT(*) occupied
FROM ward_stay
WHERE to_date('2015-01-17', 'yyyy-mm-dd') BETWEEN "from_date" and "to_date"
GROUP BY "ward_number"
) o
ON w."ward_number" = o."ward_number"
WHERE w."class" = 'B1'
OUTPUT
| ward_number | count |
|-------------|-------|
| 3 | 2 |
select w.ward_number,
w.capacity - count(ws.ward_number) AS "result"
from ward as w left join ward_stay as ws
on ws.ward_number = w.ward_number
and date '2015-05-19' between ws.from_date and ws.to_date -- stays covering that date
where w.class = 'B1' -- which class
group by w.ward_number, w.capacity
having w.capacity - count(ws.ward_number) > 0 -- only wards with a free bed
See fiddle
You need to aggregate both tables before joining them, because you have multiple rows for the same ward class in both. So:
select c.class, (c.capacity - coalesce(wc.occupied, 0)) as available
from (select class, sum(capacity) as capacity
from ward
group by class
) c left join
(select w.class, count(*) as occupied
from ward_stay ws join
ward w
on ws.ward_number = w.ward_number and
'2015-05-19' between ws.from_date and ws.to_date
group by w.class
) wc
on c.class = wc.class;
Note: this is standard SQL except for the date constant. This works in most databases; some might have other formats (or it might depend on internationalization settings).
Strictly speaking the aggregation on ward is not necessary for "B1". But it is clearly necessary for "A1".
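As a cross-check of the arithmetic (capacity minus the stays that cover the chosen date), here is a small Python sketch over the sample rows from the question. The function name `free_beds` and the in-memory data structures are assumptions for illustration, not part of any of the answers.

```python
from collections import Counter
from datetime import date

# ward_number -> (class, capacity), from the question's ward table.
wards = {1: ("A1", 1), 2: ("A1", 2), 3: ("B1", 3), 4: ("C", 4), 5: ("B2", 5)}

# (ward_number, from_date, to_date), from the question's ward_stay table.
stays = [
    (2, date(2015, 1, 1), date(2015, 3, 8)),
    (3, date(2015, 1, 16), date(2015, 2, 18)),
    (6, date(2015, 3, 5), date(2015, 3, 18)),
    (3, date(2015, 4, 15), date(2015, 4, 20)),
    (1, date(2015, 5, 19), date(2015, 5, 30)),
]

def free_beds(ward_class, on_date):
    """Map ward_number -> free beds for wards of ward_class on on_date."""
    # Count only the stays whose date range covers on_date (inclusive).
    occupied = Counter(
        ward for ward, start, end in stays if start <= on_date <= end
    )
    return {
        number: capacity - occupied[number]
        for number, (cls, capacity) in wards.items()
        if cls == ward_class
    }

b1_april = free_beds("B1", date(2015, 4, 15))
b1_march = free_beds("B1", date(2015, 3, 1))
```

A `Counter` returns 0 for wards with no matching stay, which plays the same role as COALESCE(occupied, 0) in the SQL.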

query that returns rows where time difference past threshold

this is an odd question. i dunno if it is quite doable.
let's say i have the following table:
person | product | trans | purchase_date
-------+----------+--------+---------------
jim | square | aaaa | 2013-03-04 00:01:00
sarah | circle | aaab | 2013-03-04 00:02:00
john | square | aac1 | 2013-03-04 00:03:00
john | circle | aac2 | 2013-03-04 00:03:10
jim | triangle | aad1 | 2013-03-04 00:04:00
jim | square | abcd | 2013-03-04 00:05:00
sarah | square | efgh | 2013-03-04 00:07:00
jim | circle | ijkl | 2013-03-04 00:22:00
sarah | circle | mnop | 2013-03-04 00:24:00
sarah | square | qrst | 2013-03-04 00:26:00
sarah | circle | uvwx | 2013-03-04 00:44:00
i need to know when the difference between any person's purchases between a square and a circle (or a circle and a square) have exceeded 10 minutes. ideally, i'd like to know that difference as well, but that isn't required.
so as a result, here is what i need:
person | product | trans | purchase_date
-------+----------+--------+---------------
jim | square | abcd | 2013-03-04 00:05:00
jim | circle | ijkl | 2013-03-04 00:22:00
sarah | square | efgh | 2013-03-04 00:07:00
sarah | circle | mnop | 2013-03-04 00:24:00
sarah | square | qrst | 2013-03-04 00:26:00
sarah | circle | uvwx | 2013-03-04 00:44:00
this will run daily, so i will add a "where" clause to ensure the query doesn't get out of hand. also, i am aware that multiple transactions could show up (say there were 20 minutes between the purchase of a circle, then 20 minutes for a square, then 20 minutes for a circle again, which would mean there were 2 instances where the time difference was over 10 minutes).
any advice? i am on postgres 8.1.23
Modern day solution
With modern day Postgres (8.4 or later) you can use the window function row_number() to get a continuous numbering per group. Then you can left join to the previous and next row and see if either of them matches the criteria. Voilá.
WITH x AS (
SELECT *
,row_number() OVER (PARTITION BY person ORDER BY purchase_date) AS rn
FROM tbl
WHERE product IN ('circle', 'square')
)
SELECT x.person, x.product, x.trans, x.purchase_date
FROM x
LEFT JOIN x y ON y.person = x.person AND y.rn = x.rn + 1
LEFT JOIN x z ON z.person = x.person AND z.rn = x.rn - 1
WHERE (y.product <> x.product
AND y.purchase_date > x.purchase_date + interval '10 min')
OR (z.product <> x.product
AND z.purchase_date < x.purchase_date - interval '10 min')
ORDER BY x.person, x.purchase_date;
SQLfiddle.
Solution for Postgres 8.1
I can't test this on Postgres 8.1, as no surviving instance is available. It is tested and works on v8.4 and should work for you, too. Temporary sequences, temporary tables and CREATE TABLE AS were already available.
Temporary sequence and table are only visible to you, so you can get continuous numbers even with concurrent queries.
CREATE TEMP SEQUENCE s;
CREATE TEMP TABLE x AS
SELECT *, nextval('s') AS rn -- get row-numbers from sequence
FROM (
SELECT *
FROM tbl
WHERE product IN ('circle', 'square')
ORDER BY person, purchase_date -- need to order in a subquery first!
) a;
Then the same SELECT as above should work:
SELECT x.person, x.product, x.trans, x.purchase_date
FROM x
LEFT JOIN x y ON y.person = x.person AND y.rn = x.rn + 1
LEFT JOIN x z ON z.person = x.person AND z.rn = x.rn - 1
WHERE (y.product <> x.product
AND y.purchase_date > x.purchase_date + interval '10 min')
OR (z.product <> x.product
AND z.purchase_date < x.purchase_date - interval '10 min')
ORDER BY x.person, x.purchase_date;
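The neighbour comparison that both queries perform can be traced by hand with a short Python sketch over the sample rows from the question. It is only an illustration of the logic: `flagged` is a made-up name, and the rows are held in memory rather than read from `tbl`.

```python
from datetime import datetime, timedelta
from itertools import groupby

def at(hour, minute, second=0):
    return datetime(2013, 3, 4, hour, minute, second)

# (person, product, trans, purchase_date), the sample rows from the question.
rows = [
    ("jim", "square", "aaaa", at(0, 1)), ("sarah", "circle", "aaab", at(0, 2)),
    ("john", "square", "aac1", at(0, 3)), ("john", "circle", "aac2", at(0, 3, 10)),
    ("jim", "triangle", "aad1", at(0, 4)), ("jim", "square", "abcd", at(0, 5)),
    ("sarah", "square", "efgh", at(0, 7)), ("jim", "circle", "ijkl", at(0, 22)),
    ("sarah", "circle", "mnop", at(0, 24)), ("sarah", "square", "qrst", at(0, 26)),
    ("sarah", "circle", "uvwx", at(0, 44)),
]

def flagged(purchases, threshold=timedelta(minutes=10)):
    """Rows whose previous or next circle/square purchase by the same person
    is the other product and more than `threshold` away."""
    relevant = sorted(
        (r for r in purchases if r[1] in ("circle", "square")),
        key=lambda r: (r[0], r[3]),
    )
    hits = set()
    for _, group in groupby(relevant, key=lambda r: r[0]):
        ordered = list(group)
        # Compare each row with its immediate successor (rn and rn + 1).
        for prev, nxt in zip(ordered, ordered[1:]):
            if prev[1] != nxt[1] and nxt[3] - prev[3] > threshold:
                hits.update((prev[2], nxt[2]))
    return [r for r in relevant if r[2] in hits]

result_trans = [r[2] for r in flagged(rows)]
```

On the sample data this yields the six transactions listed in the question's expected result.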
You could try joining the table to itself with an 'ON' clause like this:
SELECT a.Person, CAST((DATEDIFF(mi, a.purchaseDate, b.purchaseDate)/60.0) AS Decimal) AS TimeDiff, a.Product, b.Product FROM <TABLE> a
JOIN <TABLE> b
ON a.Person = b.Person AND b.purchaseDate > a.purchaseDate
WHERE
(a.Product = 'Circle' AND b.Product = 'Square')
OR
(a.Product = 'Square' AND b.Product = 'Circle')
By joining the table to itself you get rows which combine two purchases by the same person. By limiting it to 'b.purchaseDate > a.purchaseDate' you prevent rows matching themselves. Then you can simply check for different products purchased.
The time difference is the last tricky part. What I included above is based on an answer I found here. It looks like it should work, and there's a couple of variations there you can use if what this outputs doesn't work for you.
You'll need to add a clause on the WHERE statement which uses the same DATEDIFF function to test for time > 10 minutes, but that should pose no great challenge.
Please note that this won't return exactly what you have in your question - this will include a row for Jim's first transaction as well as one for Jim's 2nd square purchase. Both will match to the same circle, and you will get both times (ijkl-abcd AND ijkl-aaaa). Thanks for xQbert's comment for pointing this out.
Assumes:
You want to know differences in minutes for purchases on the same day. If dates don't matter, eliminate the WHERE clause.
You only want to consider circle-to-square pairs following the purchase_date, not preceding.
SELECT A.person, A.product, A.trans, A.purchase_date, B.purchase_date,
DATE_PART('hour', B.purchase_date - A.purchase_date) * 60 + DATE_PART('minute', B.purchase_date - A.purchase_date) as minuteDifference
FROM yourTable A
LEFT JOIN yourTable B
on A.person = B.Person
and ((A.product = 'square' and b.product = 'circle')
OR (A.Product = 'circle' and b.product = 'square'))
and A.purchase_date <= B.Purchase_date
WHERE (A.purchase_Date::date = B.purchase_date::date OR B.purchase_date is null)
Null B.purchase_dates will tell you when you don't have a circle/square or square circle combo.