SQL Self join with closest time stamp

I have a table on a SQL Server with millions of lab results from different people (AA,BB etc.). Some substances, for instance sodium and potassium, may be measured by two different methods (A and B).
I would now like to do some sort of "self join" based on the following criteria:
I would like to compare the results of the same substance obtained 1) from the same person, 2) with the two different methods and 3) with the closest date/time stamp
For instance, each SampleID of Sodium_A analysis should be joined with the Sodium_B analysis from the same person with the closest date/time stamp:
Source table:
select T1.DateTime, T1.PersonID, T1.SampleID, A1.Analysis, T1.Result
from [DataTable] T1 LEFT JOIN
[AnalysisTable] A1
ON A1.DW_SK_Analyse = T1.DW_SK_Analyse
DateTime          PersonID  SampleID  Analysis     Result
01-01-2021 10:30  AA        1         Sodium_A     10
01-01-2021 10:30  AA        1         Potassium_A  5
10-01-2021 11:30  AA        2         Sodium_A     15
10-01-2021 11:30  AA        2         Potassium_A  15
11-02-2021 12:30  AA        3         Sodium_A     20
16-03-2021 13:30  AA        4         Sodium_A     9
18-04-2021 14:30  AA        5         Sodium_A     1
02-01-2021 10:30  AA        6         Sodium_B     9
20-03-2021 13:30  AA        9         Sodium_B     11
20-04-2021 14:30  AA        10        Sodium_B     2
20-04-2021 14:30  AA        10        Potassium_B  6
23-05-2021 12:50  BB        13        Sodium_B     58
26-05-2021 11:20  BB        14        Potassium_A  11
29-05-2021 12:20  BB        15        Sodium_A     15
30-06-2021 11:20  BB        16        Sodium_B     24
30-06-2021 11:20  BB        16        Potassium_B  21
Desired result:
DateTime1         DateTime2         PersonID  SampleID1  SampleID2  Analysis1  Analysis2  Result1  Result2
01-01-2021 10:30  02-01-2021 10:30  A         1          6          Sodium_A   Sodium_B   10       9
10-01-2021 10:30  02-01-2021 10:30  A         2          6          Sodium_A   Sodium_B   15       9
29-05-2021 12:20  23-05-2021 12:50  B         15         13         Sodium_A   Sodium_B   15       58
I hope it makes sense... :-) Any idea on how to do this?
I've started with something like this, but it does not consider the time difference:
select
T1.DateTime, T1.PersonID, T1.SampleID, A1.Analysis, T1.Result,
T2.DateTime, T2.PersonID, T2.SampleID, A2.Analysis, T2.Result
from [DataTable] T1
LEFT JOIN [AnalysisTable] A1 ON A1.DW_SK_Analyse=T1.DW_SK_Analyse
INNER JOIN [DataTable] T2 ON T1.PersonID=T2.PersonID
LEFT JOIN [AnalysisTable] A2 ON A2.DW_SK_Analyse=T2.DW_SK_Analyse
WHERE (select AnaType1 =
CASE
WHEN A1.Analysis='Sodium_A' THEN 'Sodium'
WHEN A1.Analysis='Potassium_A' THEN 'Potassium'
else 'NN' end) = (select AnaType2 =
CASE
WHEN A2.Analysis='Sodium_B' THEN 'Sodium'
WHEN A2.Analysis='Potassium_B' THEN 'Potassium'
else 'OO'
end )

You can use apply:
with t as (
select T1.DateTime, T1.PersonID, T1.SampleID, A1.Analysis, T1.Result,
left(A1.Analysis, len(A1.Analysis) - 2) as substance,
right(A1.Analysis, 1) as which
from [DataTable] T1 left join
[AnalysisTable] A1
on A1.DW_SK_Analyse = T1.DW_SK_Analyse
)
select t.*, t_other.*
from t outer apply
(select top (1) t_other.*
from t t_other
where t_other.personId = t.personId and
t_other.substance = t.substance and
t_other.which <> t.which
order by abs(datediff(second, t.datetime, t_other.datetime))
) t_other;
Note that I split the "analysis" into two parts, one for the substance and one for the particular test.

Related

SQL Query for getting data from different columns of different rows

I have a dataset as following:
VN FNAME SEG STARTTIME ENDTIME F1 DIS F2
5 try 1 09-DEC-21 10.00.00 PM 09-DEC-21 11.05.00 PM 0 2 1
1 eat 1 09-DEC-21 11.00.00 PM 09-DEC-21 11.59.59 PM 1 15 1
5 sit 1 09-DEC-21 11.30.00 PM 09-DEC-21 11.59.59 PM 0 21 1
1 eat 2 10-DEC-21 12.00.00 AM 10-DEC-21 02.00.00 AM 1 15 1
5 sit 2 10-DEC-21 12.00.00 AM 10-DEC-21 04.00.00 AM 0 21 1
9 fly 1 10-DEC-21 01.00.00 AM 10-DEC-21 04.30.00 AM 1 50 1
4 say 1 10-DEC-21 05.00.00 AM 10-DEC-21 06.30.00 AM 0 25 1
With the above dataset I want to fetch records with unique FNAME where F1 and F2 both are 1 but STARTTIME displays the STARTTIME where SEG = 1 and ENDTIME displays the ENDTIME where SEG = max(SEG) for the FNAME. So basically the result which I am looking at is:
VN FNAME STARTTIME ENDTIME DIS
1 eat 09-DEC-21 11.00.00 PM 10-DEC-21 02.00.00 AM 15
9 fly 10-DEC-21 01.00.00 AM 10-DEC-21 04.30.00 AM 50
How can I achieve this using a SQL query? The database I am working with is Oracle.
Any help appreciated. Many thanks in advance.

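Something like this, using grouping and a subquery. The WHERE clause keeps only rows where F1 and F2 are both 1; grouping by FNAME then yields one row per name, with the start time taken from segment 1 and the end time from the last segment. VN and DIS are added to the GROUP BY on the assumption that they are constant per FNAME, as in the sample data.
select t.vn, t.fname,
(select t2.starttime from table1 t2 where t2.fname=t.fname and t2.seg=1) as start_time,
max(t.endtime) as end_time,
t.dis
from table1 t
where t.f1=1 and t.f2=1
group by t.vn, t.fname, t.dis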

How to join two tables based on condition in sql?

I need to join two tables on two columns:
Stu_id (Table 1) - Stu_id (Table 2)
Perf_yr(Table 1) - yr_month (Table 2)
Perf_yr runs from September to August each year.
Perf_yr should match yr_month according to the start and end months of Perf_yr.
Table 1
Stu_id Roll_No Avg_marks Perf_yr
1 100244 72 2017
2 200255 62 2018
3 100246 68 2019
Table 2
Stu_id Subject Marks yr_month
1 Maths 70 201609
1 Science 69 201701
1 Social 74 201712
2 Maths 60 201709
2 Science 61 201801
2 Social 62 201808
3 Maths 65 201810
3 Science 64 201912
3 Social 72 201902
Output
Stu_id Roll_No Avg_marks Perf_yr Subject Marks yr_month
1 100244 72 2017 Maths 70 201609
1 100244 72 2017 Science 70 201701
2 200255 62 2018 Maths 60 201709
2 200255 62 2018 Science 61 201801
2 200255 62 2018 Social 62 201808
3 100246 68 2019 Maths 65 201810
3 100246 68 2019 Science 64 201912
3 100246 68 2019 Social 72 201902
I tried:
SELECT A.*, B.* FROM
(SELECT * FROM TABLE1) A
LEFT JOIN
(SELECT * FROM TABLE2) B
ON
A.Stu_id = B.Stu_id
AND
A.Perf_yr = B.Yr_Month
But it won't give the desired result, because the condition does not take the Perf_yr start and end months into account.

You need to parse a value such as 201609 as a date, add 4 months to it, then match the result to the year from the other table. Adding 4 months shifts the range 201609-201708 to 201701-201712, and we only care about the year part:
SELECT * FROM
t1
INNER JOIN t2
ON
t1.stu_id = t2.stu_id AND
t1.Perf_yr = EXTRACT(year FROM ADD_MONTHS(TO_DATE(t2.yr_month, 'YYYYMM'), 4))
This is Oracle. The same logic will work for SQL Server; you'll just need to adjust the functions used:
CONVERT(date, t2.yr_month + '01', 112) -- convert a yyyymmdd string to a date (assumes yr_month is a character column)
DATEADD(month, 4, x) -- add 4 months to x
YEAR(x) -- extract year from date x
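The "add 4 months, then take the year" rule can be checked on its own with a few lines of Python. This is only a sketch of the arithmetic; the function name perf_year is hypothetical and is not part of either database.

```python
def perf_year(yr_month: str) -> int:
    """Map a 'YYYYMM' string to its performance year (September to August).

    Mirrors the SQL logic: add 4 months, then keep only the year.
    """
    year, month = int(yr_month[:4]), int(yr_month[4:6])
    months_since_year_zero = year * 12 + (month - 1) + 4  # shift by 4 months
    return months_since_year_zero // 12


# September 2016 through August 2017 all fall in performance year 2017.
for ym in ("201609", "201701", "201708", "201712"):
    print(ym, perf_year(ym))
```

September is the first month that rolls over into the next year, which is why the shift is 4 months.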

How to get a row number or ID of where the MAX() value was found

Good day community.
I'm having a hard time figuring out how to get the results I'm after. As I'm not very skilled with SQL queries, I'm starting to lose my mind. What I'm trying to do is find the highest and lowest grade on a particular test, but I also want to get the ID or the row number (they match) of the rows where the MAX() and MIN() were found.
The table "Results" looks like this:
ResultID|Test_UK|Test_US|TestUK_Scr|TestUS_Scr|TestTakenOn
1 1 3 85 14 2018-11-22 00:00:00.000
2 3 1 41 94 2018-11-23 00:00:00.000
3 2 4 71 54 2018-11-24 00:00:00.000
4 4 2 51 52 2018-12-25 00:00:00.000
5 6 3 74 69 2018-12-01 00:00:00.000
6 3 6 83 57 2018-12-02 00:00:00.000
7 7 4 91 98 2018-12-03 00:00:00.000
8 4 7 88 22 2018-12-04 00:00:00.000
9 5 8 41 76 2018-12-08 00:00:00.000
10 8 5 37 64 2018-12-09 00:00:00.000
The results I get when I run my query...
TestID|TopScore|LowScore|LastDateTestTaken
1 94 85 2018-11-23 00:00:00.000
2 71 52 2018-11-25 00:00:00.000
3 83 14 2018-12-02 00:00:00.000
4 98 51 2018-12-04 00:00:00.000
5 64 41 2018-12-09 00:00:00.000
6 74 57 2018-12-02 00:00:00.000
7 91 22 2018-12-04 00:00:00.000
8 76 37 2018-12-09 00:00:00.000
These are the queries I'm working on.
This query returns the results mentioned above:
WITH
-- Combine the results of UK and US tests
Combined_Results_Both_Tests AS(
select ResultID as resultID, Test_UK as TestID, Test_UK_Scr as TestScore, TestTakenOn as TestDate from Results
union all
select ResultID as resultID, Test_US as TestID, Test_US_Scr as TestScore, TestTakenOn as TestDate from Results),
--Gets TOP and WORST results of the tests, LastDateTaken (Needs to add ResultID!)
Get_Best_and_Worst_Results_And_LastTestDate AS(
SELECT TestID ,max(TestScore) AS TopScore ,min(TestScore) AS LowScore ,max(TestDate) AS LastDateTestTaken
FROM Combined_Results_Both_Tests
GROUP BY TestID)
--Final query execution
SELECT * FROM Get_Best_and_Worst_Results_And_LastTestDate
I've tried to achieve my desired results with something like this, which doesn't work and is also very inefficient. By "doesn't work" I mean that the output is filled with duplicates whenever a match is found on both the US and UK tests.
--Gets ResultID of Min and Max values
Get_ResultID_Of_Results AS(
SELECT * FROM Get_Best_and_Worst_Results_And_LastTestDate A
CROSS APPLY
(SELECT ResultID FROM Results res
WHERE (A.TestID = res.Test_UK AND A.TopScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_US AND A.TopScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_UK AND A.LowScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_US AND A.LowScore = res.Test_UK_Scr) OR
(A.TestID = res.Test_UK AND A.TopScore = res.Test_US_Scr) OR
(A.TestID = res.Test_US AND A.TopScore = res.Test_US_Scr) OR
(A.TestID = res.Test_UK AND A.LowScore = res.Test_US_Scr) OR
(A.TestID = res.Test_US AND A.LowScore = res.Test_US_Scr)) D)
SELECT * FROM Get_ResultID_Of_Results
These are the results I'm trying to achieve: extra columns stating the ResultID from the Results table where the Max and Min values were found. The row numbers match the ResultIDs in the table.
TestID|TopScore|LowScore|LastDateTestTaken |MaxValueLocID|MinValueLocID|
1 94 85 2018-11-23 00:00:00.000 2 1
2 71 52 2018-11-25 00:00:00.000 3 4
3 83 14 2018-12-02 00:00:00.000 6 1
4 98 51 2018-12-04 00:00:00.000 7 4
5 64 41 2018-12-09 00:00:00.000 10 9
6 74 57 2018-12-02 00:00:00.000 5 6
7 91 22 2018-12-04 00:00:00.000 7 8
8 76 37 2018-12-09 00:00:00.000 9 10
Asking for any help with the solution, theoretical or even practical. Thank you!

If I follow correctly, you want to unpivot the data and aggregate:
select v.testid, max(v.score), min(v.score), max(v.TestTakenOn)
from results r cross apply
(values (Test_UK, TestUK_Scr, TestTakenOn),
(Test_US, TestUS_Scr, TestTakenOn)
) v(testid, score, TestTakenOn)
group by v.testid;
Then you can modify this using window functions:
select v.testid, max(v.score), min(v.score), max(v.TestTakenOn),
max(case when seqnum_desc = 1 then resultid end) as resultid_max,
max(case when seqnum_asc = 1 then resultid end) as resultid_min
from (select r.resultid, v.*,
row_number() over (partition by v.testid order by v.score asc) as seqnum_asc,
row_number() over (partition by v.testid order by v.score desc) as seqnum_desc
from results r cross apply
(values (Test_UK, TestUK_Scr, TestTakenOn),
(Test_US, TestUS_Scr, TestTakenOn)
) v(testid, score, TestTakenOn)
) v
group by v.testid;
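To see what the unpivot-and-aggregate step produces, here is a rough Python sketch of the same idea on the sample rows from the question. The variable names are illustrative only, and ties are broken by the larger ResultID, which is an arbitrary choice of this sketch.

```python
from collections import defaultdict

# (ResultID, Test_UK, Test_US, TestUK_Scr, TestUS_Scr) from the question.
rows = [
    (1, 1, 3, 85, 14), (2, 3, 1, 41, 94), (3, 2, 4, 71, 54),
    (4, 4, 2, 51, 52), (5, 6, 3, 74, 69), (6, 3, 6, 83, 57),
    (7, 7, 4, 91, 98), (8, 4, 7, 88, 22), (9, 5, 8, 41, 76),
    (10, 8, 5, 37, 64),
]

# Unpivot: each source row contributes one (score, ResultID) pair per test.
by_test = defaultdict(list)
for result_id, test_uk, test_us, uk_score, us_score in rows:
    by_test[test_uk].append((uk_score, result_id))
    by_test[test_us].append((us_score, result_id))

# Aggregate: keep the top and low score together with the row they came from.
summary = {}
for test_id, scored in by_test.items():
    top_score, top_id = max(scored)
    low_score, low_id = min(scored)
    summary[test_id] = (top_score, low_score, top_id, low_id)

for test_id in sorted(summary):
    print(test_id, summary[test_id])
```

Each value is (TopScore, LowScore, MaxValueLocID, MinValueLocID) and lines up with the desired output table in the question.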

with allScores (TestId, Score, TestTakenOn, valueLoc) as
(
select [Test_UK], [TestUK_Scr],[TestTakenOn], ResultId from scores
union all
select [Test_US], [TestUS_Scr],[TestTakenOn], ResultId from scores
),
maxMin (TestId, MaxScore, MinScore, LastTestDate) as (
select TestId, Max(score), Min(score), Max(TestTakenOn)
from allScores
group by TestId
)
select mm.*, a1.valueLoc as MaxValueLoc, a2.ValueLoc as MinValueLoc
from maxMin mm
inner join allScores a1
on mm.TestId = a1.TestId and mm.MaxScore = a1.score
inner join allScores a2
on mm.TestId = a2.TestId and mm.MinScore = a2.score;

Calculating Average based on Month and ID in SQL

I have a sample table like the one below and would like to compute an average based on the MONTH of the DATE and the ID. Is there any way I could do this in SQL?
Table:Input
DATE ID VOLUME
20080630 A 45
20080628 A 23
20080629 A 34
20080627 A 33
20080730 A 45
20080728 A 12
20080730 A 34
20080724 A 56
20080430 A 34
20080428 A 23
20080630 B 12
20080628 B 45
20080629 B 67
20080627 B 78
20080730 B 45
20080728 B 12
20080730 B 34
20080724 B 56
20080430 B 2
20080428 B 34
Table:Output
DATE ID VOLUME AVERAGE
20080630 A 45 33.75
20080628 A 23 33.75
20080629 A 34 33.75
20080627 A 33 33.75
20080730 A 45 36.75
20080728 A 12 36.75
20080730 A 34 36.75
20080724 A 56 36.75
20080430 A 34 28.5
20080428 A 23 28.5
20080630 B 12 50.5
20080628 B 45 50.5
20080629 B 67 50.5
20080627 B 78 50.5
20080730 B 45 36.75
20080728 B 12 36.75
20080730 B 34 36.75
20080724 B 56 36.75
20080430 B 2 18
20080428 B 34 18

You could try this Query:
select DATE_FORMAT(date,'%Y%m'), id, avg(volume) from xxx1
group by DATE_FORMAT(date,'%Y%m'), ID;
or
select DATE_FORMAT(date,'%m'), id, avg(volume) from xxx1
group by DATE_FORMAT(date,'%m'), ID
Explanation:
DATE_FORMAT(date,'%Y%m') extracts the month and year from the date. Whereas DATE_FORMAT(date,'%m') extracts only the month. (I was not quite sure, if you want the month without the year).
Basically you extract the month and then group by it together with the id and calculate the average of the volume for these groups.

This is a Standard SQL answer, as you didn't tag your DBMS:
AVG(VOLUME)
OVER (PARTITION BY EXTRACT(YEAR FROM datecolumn)
,EXTRACT(MONTH FROM datecolumn))
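A windowed average like this attaches the group average to every input row instead of collapsing the rows. A small Python sketch of the same computation, using a subset of the sample data and assuming the dates are stored as 'YYYYMMDD' strings:

```python
from collections import defaultdict

# (DATE, ID, VOLUME): a subset of the rows from the question.
rows = [
    ("20080630", "A", 45), ("20080628", "A", 23),
    ("20080629", "A", 34), ("20080627", "A", 33),
    ("20080430", "A", 34), ("20080428", "A", 23),
    ("20080430", "B", 2), ("20080428", "B", 34),
]

# Collect volumes per (year-month, ID) partition.
groups = defaultdict(list)
for date, id_, volume in rows:
    groups[(date[:6], id_)].append(volume)

averages = {key: sum(vols) / len(vols) for key, vols in groups.items()}

# Attach the partition average to each original row.
with_average = [(d, i, v, averages[(d[:6], i)]) for d, i, v in rows]
for row in with_average:
    print(row)
```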

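If you are using MS SQL Server, you could do it like this:
select t1.*, t2.avg_volume
from TABLE1 t1
join
(
SELECT ID, MONTH(date) as month_no, avg(volume) as avg_volume
FROM TABLE1
GROUP BY MONTH(date), ID
) t2 on MONTH(t1.DATE) = t2.month_no and t1.ID = t2.ID
The output should match what is required. The query selects all columns from the source table and adds one column with the average for each month and ID. If VOLUME is an integer column, SQL Server's AVG returns an integer, so cast it (for example avg(volume * 1.0)) to keep the fractional part.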

SQL Server 2000 query

I have a created a few tables containing multiple records from several users so I can simulate circumstances.
I created the following query:
SELECT
a.celid, a.callid, a.active, a.messagetext,
b.jactive, a.cel_time, c.username, a.muserid
FROM level2 a, calls b , login c
WHERE a.callid = b.jid
AND a.muserid = c.loginid
AND b.jid = 92
AND a.win = 0
AND b.userid = 12
ORDER BY
cel_time ASC
and got the following as result
545 92 2 hello1 2 2011-09-18 16:32:17.000 phil01 21
546 92 1 hello2 2 2011-09-18 16:42:38.000 phil01 21
547 92 2 hello3 2 2011-09-18 16:59:08.000 danny 16
548 92 1 hello4 2 2011-09-18 20:46:21.000 phil01 21
549 92 1 hello5 2 2011-09-18 20:47:16.000 phil01 21
550 92 1 hello6 2 2011-09-19 19:32:15.000 phil01 21
551 92 1 hello7 2 2011-09-19 19:34:14.000 phil01 21
but I actually want this result to be distinct on muserid and return only two rows.
I have read up on DISTINCT but cannot seem to get this accomplished.
How would I accomplish this?

Use this SQL:
SELECT
a.celid, a.callid, a.active, a.messagetext,
b.jactive, a.cel_time, c.username, a.muserid
FROM level2 a
JOIN calls b ON a.callid = b.jid
JOIN login c ON a.muserid = c.loginid
JOIN
(SELECT l2.muserid, MAX(l2.cel_time) as max_time
FROM level2 l2
GROUP BY l2.muserid) d ON (d.muserid = a.muserid AND a.cel_time = d.max_time)
WHERE b.jid = 92
AND a.win = 0
AND b.userid = 12
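The core of this query is the join against a per-user MAX(cel_time) subquery, which keeps only the latest row for each muserid. A minimal sketch of just that part, run against an in-memory SQLite database from Python; the table is reduced to four columns and a few of the rows shown in the question, which is an assumption made for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE level2 (celid INTEGER, muserid INTEGER, "
    "cel_time TEXT, messagetext TEXT)"
)
conn.executemany(
    "INSERT INTO level2 VALUES (?, ?, ?, ?)",
    [
        (545, 21, "2011-09-18 16:32:17", "hello1"),
        (546, 21, "2011-09-18 16:42:38", "hello2"),
        (547, 16, "2011-09-18 16:59:08", "hello3"),
        (548, 21, "2011-09-18 20:46:21", "hello4"),
        (551, 21, "2011-09-19 19:34:14", "hello7"),
    ],
)

# Join each row to the latest cel_time of its user; only matching rows survive.
latest = conn.execute(
    """
    SELECT a.celid, a.muserid, a.messagetext
    FROM level2 a
    JOIN (SELECT muserid, MAX(cel_time) AS max_time
          FROM level2
          GROUP BY muserid) d
      ON d.muserid = a.muserid AND a.cel_time = d.max_time
    ORDER BY a.cel_time
    """
).fetchall()
print(latest)
```

If two rows of the same user share the maximum cel_time, both are returned, so this pattern gives one row per user only when the latest timestamp is unique.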
