Finding the Time Since items interacted in SQL database - sql

I have a table such that I have
Item1 Item2 Timestamp
A B 2012-06-5 06:14:12
B A 2012-06-6 06:20:12
C A 2012-06-5 06:23:45
A B 2012-06-7 08:35:35
C A 2012-06-8 13:12:42
B C 2012-06-8 15:14:57
I want to append another column, which we can call time_since, that shows how long it has been since Item1 and Item2 last interacted.
For example, line 2 would have the value 6 in the extra column, since the two items interacted 6 minutes earlier.

I think in the second row of your example data you meant to put June fifth (not the 6th) in the timestamp field? Only in that scenario would there be a difference of 6 minutes.
If that is true and you really do want the difference in minutes, you should be able to use the following:
Fiddle Test: http://sqlfiddle.com/#!2/0aa157/4/0
select t1.item1,
t1.item2,
t1.timestamp,
timestampdiff(minute, max(t2.timestamp), t1.timestamp) as time_diff
from tbl t1
left join tbl t2
on t2.timestamp < t1.timestamp
and ((t1.item1 = t2.item1 and t1.item2 = t2.item2) or
(t1.item1 = t2.item2 and t1.item2 = t2.item1))
group by t1.item1, t1.item2, t1.timestamp
order by 3
(I changed the second row of your example data to be June 5th)
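As a rough, runnable sketch of the same self-join idea, the snippet below runs it on SQLite from Python rather than on MySQL. The column name ts, the ISO-formatted timestamps and the helper function name are assumptions made for illustration. SQLite has no timestampdiff, so the minutes are computed from epoch seconds instead.

```python
import sqlite3

# Sample rows from the question, with the second row moved to June 5th.
ROWS = [
    ("A", "B", "2012-06-05 06:14:12"),
    ("B", "A", "2012-06-05 06:20:12"),
    ("C", "A", "2012-06-05 06:23:45"),
    ("A", "B", "2012-06-07 08:35:35"),
    ("C", "A", "2012-06-08 13:12:42"),
    ("B", "C", "2012-06-08 15:14:57"),
]

# Same shape as the MySQL query: join each row to all earlier rows for the
# same unordered pair, take the latest one, and convert the gap to minutes.
QUERY = """
SELECT t1.item1,
       t1.item2,
       t1.ts,
       (CAST(strftime('%s', t1.ts) AS INTEGER)
        - CAST(strftime('%s', MAX(t2.ts)) AS INTEGER)) / 60 AS time_diff
FROM tbl AS t1
LEFT JOIN tbl AS t2
  ON t2.ts < t1.ts
 AND ((t1.item1 = t2.item1 AND t1.item2 = t2.item2) OR
      (t1.item1 = t2.item2 AND t1.item2 = t2.item1))
GROUP BY t1.item1, t1.item2, t1.ts
ORDER BY t1.ts
"""


def minutes_since_last_interaction(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (item1 TEXT, item2 TEXT, ts TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


results = minutes_since_last_interaction(ROWS)
for row in results:
    print(row)
```

Rows with no earlier interaction for the pair come back with NULL (None in Python) in time_diff.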

Related

Having a hard time building an aggregate SQL query

I am new at SQL and have a pretty good knowledge of basic stuff but I am stuck with my request.
My request gets me the following table (except for the last column on the right-hand side):
Team  Variable  Date        Value  Column_I_need_to_add
A     aa        2022/05/01  100    0
A     aa        2022/06/01  25     0
A     aa        2022/07/01  580    0
A     ad        2022/08/01  50     605
B     aa        2021/05/01  75     0
B     aa        2021/06/01  110    0
B     aa        2021/07/01  514    0
B     ad        2021/08/01  213    624
What I cannot wrap my head around is how to code the last column, which fills the rows for the ad variable by summing the values of the aa variable for the same team, but only for the two months prior to the date of the ad row.
Here is the script I have so far, that gets me the first four columns:
SELECT
team.Team,
Var.Variable,
TO_DATE(Var.Year||'-'||LPAD(Var.Month,2,'00')||'-'||'01','YYYY-MM-DD')AS Date ,
Var.value
FROM table1 as Var
join table2 as team
on Var.code=team.code
---This last join with table3 is only there to add other columns that are not relevant to this problem.
---join table3 as detail_var on Var.variable=detail_var.code_var
After some further reading, I was not content with my previous answer, which used OUTER APPLY. So I did a bit more grinding and this is what I came up with (now for Postgres 13).
It is cleaner and does the job more concisely. I've also added a FIDDLE LINK. If you want to see the previous answer, please look at the edit history.
SELECT
team.Team
,var.Variable
,var.Date
,var.value
,CASE
WHEN var.Variable='ad' THEN
(SELECT sum(value) FROM table1
WHERE
(TO_DATE(Year||'-'||LPAD(Month::varchar(2),2,'0')||'-'||'01','YYYY-MM-DD')
BETWEEN (var.Date - INTERVAL '2 month') AND var.Date)
AND Variable = 'aa'
AND code = var.code)
ELSE null
END as past2monthsValue
FROM (
-- this sub query to change Year & Month to Date Type Value
-- this Date Type Value (Date) will be used to compare dates
-- (var.Date) in the above sub-query
SELECT
code,
Variable,
TO_DATE(Year||'-'||LPAD(Month::varchar(2),2,'0')||'-'||'01','YYYY-MM-DD') AS Date,
value
FROM table1
) var
JOIN table2 AS team ON var.code=team.code
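To check the idea against the sample numbers in the question, here is a small sketch run on SQLite from Python instead of Postgres. The single table readings, its column names and the ISO date strings are assumptions made for illustration; the real schema splits the data across table1 and table2. The window is written as a half-open range covering the two months before the ad row.

```python
import sqlite3

# Sample data from the question, flattened into one hypothetical table.
ROWS = [
    ("A", "aa", "2022-05-01", 100),
    ("A", "aa", "2022-06-01", 25),
    ("A", "aa", "2022-07-01", 580),
    ("A", "ad", "2022-08-01", 50),
    ("B", "aa", "2021-05-01", 75),
    ("B", "aa", "2021-06-01", 110),
    ("B", "aa", "2021-07-01", 514),
    ("B", "ad", "2021-08-01", 213),
]

# For each 'ad' row, sum the 'aa' values of the same team whose date falls
# in the two months before the 'ad' date; other rows get NULL.
QUERY = """
SELECT r.team,
       r.variable,
       r.d,
       r.value,
       CASE WHEN r.variable = 'ad' THEN
            (SELECT SUM(p.value)
               FROM readings AS p
              WHERE p.team = r.team
                AND p.variable = 'aa'
                AND p.d >= date(r.d, '-2 months')
                AND p.d < r.d)
       END AS past2months_value
FROM readings AS r
ORDER BY r.team, r.d
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (team TEXT, variable TEXT, d TEXT, value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", ROWS)
results = conn.execute(QUERY).fetchall()
conn.close()

for row in results:
    print(row)
```

The two ad rows come out as 605 and 624, matching the column in the question; the aa rows are NULL here rather than 0, as in the answer's ELSE null branch.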

How do I stop my query from pulling duplicates?

Yes, I know this seems simple:
SELECT DISTINCT(...)
Except, it apparently isn't
Here is my actual Query:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS,
IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune,
IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical,
IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther,
IIf([DecReason]=7,1,0) AS YesAlready
FROM
EmployeeInformation
INNER JOIN (CompletedTrainings
LEFT JOIN DeclinationReasons ON CompletedTrainings.DecReason = DeclinationReasons.ReasonID)
ON EmployeeInformation.ID = CompletedTrainings.Employee
GROUP BY
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
CompletedTrainings.DecShotDate,
CompletedTrainings.DecShotLocation,
CompletedTrainings.DecReason,
CompletedTrainings.DecExplanation,
IIf([DecShotLocation]="MCS","Yes","No"),
IIf([DecReason]=1,1,0),
IIf([DecReason]=2,1,0),
IIf([DecReason]=3,1,0),
IIf([DecReason]=4,1,0),
IIf([DecReason]=5,1,0),
IIf([DecReason]=6,1,0),
IIf([DecReason]=7,1,0)
HAVING
((((EmployeeInformation.Active) Like -1)
AND ((CompletedTrainings.DecShotDate + 365 >= DATE())
OR (CompletedTrainings.DecShotDate IS NULL))));
This joins a few tables (obviously) in order to get a number of records. The problem is that if someone appears more than once in the table, with a NULL in the date field of one record and a date in another, the query pulls both the NULL and the date, or pulls multiple NULLs. It might also pull multiple dates, but no such records are present at the moment.
I need the NULLs; they are actual data in this particular case. But if someone has a date and a NULL, I need to pull only the newest record. I thought I could add MAX(RecordID) from the table, but that didn't change the results of the query either.
That code:
SELECT
DeclinationReasons.Reason,
EmployeeInformation.ID,
EmployeeInformation.Employee,
EmployeeInformation.Active,
MAX(CompletedTrainings.RecordID),
CompletedTrainings.DecShotDate
...
It had the same issue: duplicated EmployeeInformation.ID values with different DecShotDate values.
Currently it returns:
ID  Active  DecShotDate  etc. x a bunch
1   -1      date date    whatever goes
2   -1                   in these
2   -1      date date    columns
These are being used in a report, that is to determine the total number of employees who fit the criteria of the report. The NULLs in DecShotDate are needed as they show people who did not refuse to get a flu vaccine in the current year, while the dates are people who did refuse.
Now I have come up with one simple solution, I could add a column to the CompletedTrainings Table that contains a date or other value, and add that to the HAVING statement. This might be the right solution as this is a yearly training questionnaire that employees have to fill out. But I am asking for advice before doing this.
Am I right in thinking I need to add a column to filter by so that older data isn't being pulled, or should I be able to do this by pulling recordID, and did I just bork that part of the query up?
Edited to add raw table views:
EmployeeInformation Table:
ID  Last  First   empID  Active  Termdate  DoH   Title  PT/FT/PD  PI
1   Doe   Jane    982    -1                date  Sr     PD        X
2   Roe   John    278    0       date      date  Jr     PD        X
3   Moe   Larry   1232   -1                date  Sr     FT        X
4   Zoe   Debbie  1424   -1                date  Sr     PT        X
DeclinationReasons Table:
ReasonID  Reason
1         Allergy
2         Already got it
3         Illness
CompletedTrainings Table:
RecordID  Employee  Training  ...  DecShotdate  DecShotLocation  DecShotReason  DecExp
1         1         4              date         location         2              text
2         1         4
3         2         4
4         3         4              date         location         3              text
5         3         4              date         location         1              text
6         4         4
After some serious soul searching, I decided to use another column and filter by that.
In the end my query looks like this:
SELECT *
FROM (
(
SELECT RecordID, DecShotDate, DecShotLocation, DecReason, DecExplanation, Employee,
IIf([DecShotLocation]="MCS","Yes","No") AS YesMCS, IIf([DecReason]=1,1,0) AS YesAllergy,
IIf([DecReason]=2,1,0) AS YesImmune, IIf([DecReason]=3,1,0) AS YesAdverse,
IIf([DecReason]=4,1,0) AS YesMedical, IIf([DecReason]=5,1,0) AS YesSpiritual,
IIf([DecReason]=6,1,0) AS YesOther, IIf([DecReason]=7,1,0) AS YesAlready
FROM CompletedTrainings WHERE (CompletedDate > DATE() - 365 ) AND (Training = 69)) AS T1
LEFT JOIN
(
SELECT ID, Active FROM EmployeeInformation) AS T2 ON T1.Employee = T2.ID)
LEFT JOIN
(
SELECT Reason, ReasonID FROM DeclinationReasons) AS T3 ON T1.DecReason = T3.ReasonID;
This may not have been the best solution, but it did exactly what I needed, which is to get the information from the latest entry in the database.
Previously I had tried to use MAX(), DISTINCT(), etc. but always had the problem of multiple records being retrieved. In this case, I intentionally SELECT the most recent records first, then join them to the results of the next query, and so on, until I have all the required data for my report.
I write this in hopes someone else finds it useful. Or even better if someone tells me why this is wrong, so as to improve my own skills.
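For comparison, here is a sketch of the MAX(RecordID) route that the question asked about, which is a different technique from the date-column filter used above. One likely reason the earlier attempt still returned duplicates is that DecShotDate stayed in the GROUP BY, so each distinct date formed its own group. The sketch runs on SQLite from Python rather than Access, assumes a higher RecordID means a newer record, and uses made-up dates in place of the "date" placeholders.

```python
import sqlite3

# (RecordID, Employee, DecShotDate). The dates are invented placeholders;
# the NULL / non-NULL pattern follows the CompletedTrainings sample.
ROWS = [
    (1, 1, "2022-10-01"),
    (2, 1, None),
    (3, 2, None),
    (4, 3, "2021-10-05"),
    (5, 3, "2022-10-03"),
    (6, 4, None),
]

# Group by employee only, find the highest RecordID per employee, then join
# back to fetch the full row for that record.
QUERY = """
SELECT ct.Employee, ct.RecordID, ct.DecShotDate
FROM CompletedTrainings AS ct
JOIN (SELECT Employee, MAX(RecordID) AS LatestID
        FROM CompletedTrainings
       GROUP BY Employee) AS latest
  ON ct.RecordID = latest.LatestID
ORDER BY ct.Employee
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CompletedTrainings "
    "(RecordID INTEGER PRIMARY KEY, Employee INTEGER, DecShotDate TEXT)"
)
conn.executemany("INSERT INTO CompletedTrainings VALUES (?, ?, ?)", ROWS)
latest_rows = conn.execute(QUERY).fetchall()
conn.close()

for row in latest_rows:
    print(row)
```

Each employee appears once, with the NULL date preserved whenever the newest record has no DecShotDate.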

Conditional removing duplicate records

I'm storing some realtime data in SQLite. Now I want to use SQL commands to remove duplicate records, reducing the data and widening its timeframe to 20 seconds.
Sample data:
id t col1 col2
-----------------------------
23 9:19:18 15 16
24 9:19:20 10 11
25 9:19:20 10 11
26 9:19:35 10 11
27 9:19:45 10 11
28 9:19:53 10 11
29 9:19:58 14 13
Logic: In the above sample, records 25-28 have the same values in the col1 and col2 fields, so they are duplicates. But because keeping one (for example, record 25) and removing the others would cause the timeframe (= time difference between subsequent data) to exceed 20s, I don't want to remove all of records 26-28. So, in the above sample, row 25 will be kept because it's not a duplicate of its previous row. Row 26 will be kept because, although it's a duplicate of its previous row, removing it would make the timeframe more than 20s (19:45 - 19:20). Row 27 will be removed, since it meets both conditions, and row 28 will be kept.
I can load the data into a C# datatable and apply this logic in code in a loop over the records, but that is slow compared to running SQL in the database. I'm not sure whether this can be implemented in SQL. Any help would be greatly appreciated.
Edit: I've added another row before row =25 to show rows with the same time. Fiddle is here: Link
OK so here's an alternate answer that handles the duplicate record scenario you've described, uses LAG and LEAD and also ends up considerably simpler as it turns out!
delete from t1 where id in
(
with cte as (
select id,
lag(t, 1) over(partition by col1, col2 order by t) as prev_t,
lead(t, 1) over(partition by col1, col2 order by t) as next_t
from t1
)
select id
from cte
where strftime('%H:%M:%S',next_t,'-20 seconds') < strftime('%H:%M:%S',prev_t)
)
Online demo here
I believe this accomplishes what you are after:
delete from t1 where id in
(
select ta.id
from t1 as ta
join t1 as tb
on tb.t = (select max(t) from t1 where t < ta.t
and col1 = ta.col1 and col2 = ta.col2)
and tb.col1 = ta.col1 and tb.col2 = ta.col2
join t1 as tc
on tc.t = (select min(t) from t1 where t > ta.t
and col1 = ta.col1 and col2 = ta.col2)
and tc.col1 = ta.col1 and tc.col2 = ta.col2
where strftime('%H:%M:%S',tc.t,'-20 seconds') < strftime('%H:%M:%S',tb.t)
)
Online demo is here where I've gone through a couple of iterations to simplify it to the above. Basically you need to look at both the previous row and the next row to determine whether you can delete the current row, which happens only when there's a difference of less than 20 seconds between the previous and next row times, as I understand your requirement.
Note: You could probably achieve the same using LAG and LEAD but I'll leave that as an exercise to anyone else who's interested!!
EDIT: In case the time values are not unique, I've included additional conditions to the ta/tb and ta/tc joins to include col1 and col2 and updated the fiddle.
I think you can do the following:
1. Create a result set in SQL that adds the previous row, ordered by id. For this, use the LAG function (https://www.sqlitetutorial.net/sqlite-window-functions/sqlite-lag/).
2. Calculate a new column using the CASE construct (https://www.sqlitetutorial.net/sqlite-case/). This column could be a boolean called "keep", calculated as follows:
   - if the previous row's col1 and col2 values are not the same => true
   - if the previous row's col1 and col2 values are the same but the time difference > 20 sec => true
   - in other cases => false
3. Filter on this query to select only the rows to keep (keep = true).
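The steps above can be sketched roughly as follows, run from Python against an in-memory SQLite database. This is only an illustration: it assumes SQLite 3.25 or later (needed for window functions) and zero-padded HH:MM:SS times so that strftime can parse them, and it treats the first row, which has no predecessor, as a row to keep.

```python
import sqlite3

# Sample rows from the question, with the hour zero-padded (an assumption).
ROWS = [
    (23, "09:19:18", 15, 16),
    (24, "09:19:20", 10, 11),
    (25, "09:19:20", 10, 11),
    (26, "09:19:35", 10, 11),
    (27, "09:19:45", 10, 11),
    (28, "09:19:53", 10, 11),
    (29, "09:19:58", 14, 13),
]

# Step 1: attach the previous row's values with LAG.
# Step 2: derive the keep flag with CASE.
QUERY = """
WITH w AS (
    SELECT id, t, col1, col2,
           LAG(t)    OVER (ORDER BY id) AS prev_t,
           LAG(col1) OVER (ORDER BY id) AS prev_col1,
           LAG(col2) OVER (ORDER BY id) AS prev_col2
    FROM t1
)
SELECT id,
       CASE
           WHEN prev_t IS NULL THEN 1
           WHEN prev_col1 <> col1 OR prev_col2 <> col2 THEN 1
           WHEN strftime('%s', t) - strftime('%s', prev_t) > 20 THEN 1
           ELSE 0
       END AS keep
FROM w
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, t TEXT, col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", ROWS)
flags = conn.execute(QUERY).fetchall()
conn.close()

# Step 3: filter on the flag.
kept_ids = [row_id for row_id, keep in flags if keep]
print(kept_ids)
```

Because each row is compared only with its immediate predecessor, and not with the last row that was actually kept, this sketch keeps fewer rows on the sample data than the question's description asks for; the previous/next comparison in the other answers addresses that.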

Extract only variables which is greater than other table in influxDB

I am using InfluxDB and I would like to extract the values that are greater than a certain threshold stored in another table.
For example, I have two tables as shown below.
Table A
Time value
1 15
2 25
3 9
4 22
Table B
Time threshold
1 16
2 12
3 13
4 15
Given the above two tables, I would like to extract the values that are greater than the threshold in the first row of Table B. Therefore, what I want to have is as below.
Time value
2 25
4 22
I tried the SQL query below, but it didn't give the correct result.
select * from data1 where value > (select spec from spec1 limit1);
Look forward to your feedback.
Thanks.
Integrate the condition in an inner join:
select * from tableA as a
inner join tableB as b on a.time = b.time and a.value > b.threshold
When your time column doesn't only include integer values, you have to format the time and join on a time range. Here is an example:
SQL join on time range
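As a sanity check on the two readings of the requirement, the sketch below runs both on SQLite from Python, purely as a relational stand-in; whether InfluxDB accepts either form is not something the thread confirms. The table names table_a and table_b and the column name t are assumptions made for illustration.

```python
import sqlite3

TABLE_A = [(1, 15), (2, 25), (3, 9), (4, 22)]   # (t, value)
TABLE_B = [(1, 16), (2, 12), (3, 13), (4, 15)]  # (t, threshold)

# Reading 1: compare every value with the threshold in the first row of B.
FIRST_ROW_QUERY = """
SELECT a.t, a.value
FROM table_a AS a
WHERE a.value > (SELECT threshold FROM table_b ORDER BY t LIMIT 1)
ORDER BY a.t
"""

# Reading 2: compare each value with the threshold at the same time,
# with the condition folded into the inner join.
JOIN_QUERY = """
SELECT a.t, a.value
FROM table_a AS a
INNER JOIN table_b AS b
   ON a.t = b.t AND a.value > b.threshold
ORDER BY a.t
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (t INTEGER, value INTEGER)")
conn.execute("CREATE TABLE table_b (t INTEGER, threshold INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)", TABLE_A)
conn.executemany("INSERT INTO table_b VALUES (?, ?)", TABLE_B)
first_row_result = conn.execute(FIRST_ROW_QUERY).fetchall()
join_result = conn.execute(JOIN_QUERY).fetchall()
conn.close()

print(first_row_result)
print(join_result)
```

On this sample data both readings happen to return the same two rows, so the example alone does not show which one was intended.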

Finding contiguous regions in a sorted MS Access query

I am a long time fan of Stack Overflow but I've come across a problem that I haven't found addressed yet and need some expert help.
I have a query that is sorted chronologically, with a date-time compound key (unique, never deleted) and several pieces of data. What I want to know is whether there is a way to find the start (or end) of a region where a value changes, i.e.
DateTime someVal1 someVal2 someVal3 target
1 3 4 A
1 2 4 A
1 3 4 A
1 2 4 B
1 2 5 B
1 2 5 A
and I want my query to return rows 1, 4 and 6, finding the change in column 5 from A to B and then from B back to A. I have tried the find duplicates method and using min and max in the totals property; however, that gives me the first and last overall instead of the local max and min. Has anyone had similar problems?
I didn't see any purpose for the someVal1, someVal2, and someVal3 fields, so I left them out. I used an autonumber as the primary key instead of your date/time field; but this approach should also work with your date/time primary key. This is the data in my version of your table.
pkey_field target
1 A
2 A
3 A
4 B
5 B
6 A
I used a correlated subquery to find the previous pkey_field value for each row.
SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m;
Then put that in a subquery which I joined to another copy of the base table.
SELECT
sub.pkey_field,
sub.target,
sub.prev_pkey_field,
prev.target AS prev_target
FROM
(SELECT
m.pkey_field,
m.target,
(SELECT Max(pkey_field)
FROM YourTable
WHERE pkey_field < m.pkey_field)
AS prev_pkey_field
FROM YourTable AS m) AS sub
LEFT JOIN YourTable AS prev
ON sub.prev_pkey_field = prev.pkey_field
WHERE
sub.prev_pkey_field Is Null
OR prev.target <> sub.target;
This is the output from that final query.
pkey_field target prev_pkey_field prev_target
1 A
4 B 3 A
6 A 5 B
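The same previous-row comparison can be tried outside Access. The sketch below runs it on SQLite from Python as a stand-in, using the answer's table and column names and the six sample rows; only the Python scaffolding around the query is new.

```python
import sqlite3

ROWS = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "B"), (6, "A")]

# For each row, find the previous key with a correlated subquery, join back
# to read the previous target, and keep rows where the target changed
# (or where there is no previous row at all).
QUERY = """
SELECT sub.pkey_field, sub.target
FROM (SELECT m.pkey_field,
             m.target,
             (SELECT MAX(pkey_field)
                FROM YourTable
               WHERE pkey_field < m.pkey_field) AS prev_pkey_field
        FROM YourTable AS m) AS sub
LEFT JOIN YourTable AS prev
       ON sub.prev_pkey_field = prev.pkey_field
WHERE sub.prev_pkey_field IS NULL
   OR prev.target <> sub.target
ORDER BY sub.pkey_field
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (pkey_field INTEGER PRIMARY KEY, target TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", ROWS)
region_starts = conn.execute(QUERY).fetchall()
conn.close()

print(region_starts)
```

Each returned row marks the start of a contiguous region; since the subquery only needs an ordered key, a date/time primary key should work the same way.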
Here is a first attempt,
SELECT t1.Row, t1.target
FROM t1 WHERE (((t1.target)<>NZ((SELECT TOP 1 t2.target FROM t1 AS t2 WHERE t2.DateTimeId<t1.DateTimeId ORDER BY t2.DateTimeId DESC),"X")));