Compare and get missing values from two columns of two different tables - sql

I have two tables named StationUtilization and Process. They have the columns TestStart and TestDateTime respectively, which should contain the same records.
However, some records are missing from the TestStart column of the StationUtilization table and need to be added. How can I compare these two columns to get the missing values?
Example:
StationUtilization Table
ID TestStart .....
1 2021-01-01 22:42:23.000
2 2021-01-02 22:42:23.000
3 2021-01-05 22:42:23.000
Process Table:
ID TestDateTime .....
1 2021-01-01 22:42:23.000
2 2021-01-02 22:42:23.000
3 2021-01-03 22:42:23.000
4 2021-01-04 22:42:23.000
5 2021-01-05 22:42:23.000
Expected output after comparison:
ID TestDateTime .....
3 2021-01-03 22:42:23.000
4 2021-01-04 22:42:23.000

-- Start from Process so that its unmatched rows survive the LEFT JOIN
SELECT Process.*
FROM Process
LEFT JOIN StationUtilization
ON Process.TestDateTime = StationUtilization.TestStart
WHERE StationUtilization.TestStart IS NULL

NOT EXISTS is one approach:
select p.*
from Process p
where not exists (select 1
from StationUtilization su
where p.TestDateTime = su.TestStart
);
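As a quick check of the NOT EXISTS query, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. SQLite and the text-typed timestamp columns are assumptions made for illustration only; the real tables presumably live in another database and use a datetime type.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# Assumption: timestamps are stored as text, which compares fine for equality.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StationUtilization (ID INTEGER, TestStart TEXT);
    CREATE TABLE Process (ID INTEGER, TestDateTime TEXT);
    INSERT INTO StationUtilization VALUES
        (1, '2021-01-01 22:42:23.000'),
        (2, '2021-01-02 22:42:23.000'),
        (3, '2021-01-05 22:42:23.000');
    INSERT INTO Process VALUES
        (1, '2021-01-01 22:42:23.000'),
        (2, '2021-01-02 22:42:23.000'),
        (3, '2021-01-03 22:42:23.000'),
        (4, '2021-01-04 22:42:23.000'),
        (5, '2021-01-05 22:42:23.000');
""")

# Process rows whose TestDateTime has no matching TestStart.
missing = conn.execute("""
    SELECT p.ID, p.TestDateTime
    FROM Process p
    WHERE NOT EXISTS (SELECT 1
                      FROM StationUtilization su
                      WHERE su.TestStart = p.TestDateTime)
    ORDER BY p.ID
""").fetchall()

for row in missing:
    print(row)
```

With the sample data this yields the Process rows with ID 3 and 4, which is the expected output in the question.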

Related

How to index match with conditions in sql

I have tables like this:
regist table
userID registDate
1 2022-01-22
2 2022-01-23
session table
userID date_key traffic
null 2022-01-02 facebook
1 2021-01-03 facebook
1 2021-01-04 google
1 2021-01-05 linkedin
2 2021-01-15 facebook
2 2021-01-25 facebook
3 2021-01-20 facebook
Output
userID date_key traffic registDate
1 2021-01-03 facebook 2022-01-22
1 2021-01-04 google 2022-01-22
1 2021-01-05 linkedin 2022-01-22
2 2021-01-15 facebook 2022-01-23
How do I merge the tables so that I can return the regist date? Do I need a right join?
Is this correct?
select *
from sessiontables st
left join registtable rt on st.userID = rt.userID
where st.userID is not null
How do I keep only the rows whose userID exists in the regist table?
If I understand correctly, you can join registtable to a subquery that aggregates sessiontables.
select rt.userID,
st.date_key,
st.traffic,
rt.registDate
from (
SELECT userID,min(date_key) date_key,traffic
FROM sessiontables
GROUP BY traffic,userID
) st
JOIN registtable rt
ON st.userID=rt.userID
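To see what the aggregate-and-join query returns, the sketch below runs it against the sample rows in an in-memory SQLite database via Python's sqlite3. The choice of SQLite and the ORDER BY clause are assumptions added for a deterministic check; the table and column names come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registtable (userID INTEGER, registDate TEXT);
    CREATE TABLE sessiontables (userID INTEGER, date_key TEXT, traffic TEXT);
    INSERT INTO registtable VALUES (1, '2022-01-22'), (2, '2022-01-23');
    INSERT INTO sessiontables VALUES
        (NULL, '2022-01-02', 'facebook'),
        (1, '2021-01-03', 'facebook'),
        (1, '2021-01-04', 'google'),
        (1, '2021-01-05', 'linkedin'),
        (2, '2021-01-15', 'facebook'),
        (2, '2021-01-25', 'facebook'),
        (3, '2021-01-20', 'facebook');
""")

# Earliest session per (userID, traffic), kept only for registered users.
# The inner join drops userID 3 (not registered) and the NULL userID row.
rows = conn.execute("""
    SELECT rt.userID, st.date_key, st.traffic, rt.registDate
    FROM (SELECT userID, MIN(date_key) AS date_key, traffic
          FROM sessiontables
          GROUP BY traffic, userID) st
    JOIN registtable rt ON st.userID = rt.userID
    ORDER BY rt.userID, st.date_key
""").fetchall()

for row in rows:
    print(row)
```

The inner join is what answers the "exists in regist table" part: sessions without a matching registtable row never reach the result, so no extra filter on null userID is needed.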

Need help joining incremental data to a fact table in an incremental manner

TableA
ID Counter Value
1 1 10
1 2 28
1 3 34
1 4 22
1 5 80
2 1 15
2 2 50
2 3 39
2 4 33
2 5 99
TableB
StartDate EndDate
2020-01-01 2020-01-11
2020-01-02 2020-01-12
2020-01-03 2020-01-13
2020-01-04 2020-01-14
2020-01-05 2020-01-15
2020-01-06 2020-01-16
TableC (output)
ID Counter StartDate EndDate Val
1 1 2020-01-01 2020-01-11 10
2 1 2020-01-01 2020-01-11 15
1 2 2020-01-02 2020-01-12 28
2 2 2020-01-02 2020-01-12 50
1 3 2020-01-03 2020-01-13 34
2 3 2020-01-03 2020-01-13 39
1 4 2020-01-04 2020-01-14 22
2 4 2020-01-04 2020-01-14 33
1 5 2020-01-05 2020-01-15 80
2 5 2020-01-05 2020-01-15 99
1 1 2020-01-06 2020-01-16 10
2 1 2020-01-06 2020-01-16 15
I am attempting to come up with some SQL to create TableC. TableC takes the rows of TableB in chronological order and, for each ID in TableA, assigns the next Counter in the sequence to that StartDate/EndDate combination. When the counter reaches its end, it starts back at 1.
Is something like this even possible with SQL?
Yes, this is possible. Try the following:
Calculate the maximum value of Counter in TableA into max_counter using SELECT MAX(Counter) ....
Add a row number to each row in TableB using SELECT ROW_NUMBER() OVER (ORDER BY StartDate) ..., so that each row can be matched to a Counter value.
Relate the row number in TableB to Counter in TableA like this: ... FROM TableB JOIN TableA ON COALESCE(NULLIF(TableB.row_number % max_counter, 0), max_counter) = TableA.Counter.
Then combine these queries into one query using CTEs (Common Table Expressions), as the official documentation shows.
Consider the approach below:
select id, counter, StartDate, EndDate, value
from tableA
join (
select *, mod(row_number() over(order by StartDate) - 1, 5) + 1 as counter
from tableB
)
using (counter)
If applied to the sample data in your question, the output is the set of rows shown in TableC (row order aside).
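The cycling-counter idea can be tried out with the sketch below, which runs a variant of the query in an in-memory SQLite database through Python's sqlite3. Several things here are assumptions rather than part of the answers above: SQLite 3.25 or newer (for window functions), the `%` operator in place of mod(), and a scalar subquery for the maximum counter instead of the hard-coded 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER, Counter INTEGER, Value INTEGER);
    CREATE TABLE TableB (StartDate TEXT, EndDate TEXT);
    INSERT INTO TableA VALUES
        (1, 1, 10), (1, 2, 28), (1, 3, 34), (1, 4, 22), (1, 5, 80),
        (2, 1, 15), (2, 2, 50), (2, 3, 39), (2, 4, 33), (2, 5, 99);
    INSERT INTO TableB VALUES
        ('2020-01-01', '2020-01-11'), ('2020-01-02', '2020-01-12'),
        ('2020-01-03', '2020-01-13'), ('2020-01-04', '2020-01-14'),
        ('2020-01-05', '2020-01-15'), ('2020-01-06', '2020-01-16');
""")

# Number TableB rows chronologically, then wrap the number into 1..max(Counter):
# row 1 -> 1, ..., row 5 -> 5, row 6 -> 1, and so on.
rows = conn.execute("""
    WITH numbered AS (
        SELECT StartDate, EndDate,
               (ROW_NUMBER() OVER (ORDER BY StartDate) - 1)
                   % (SELECT MAX(Counter) FROM TableA) + 1 AS Counter
        FROM TableB
    )
    SELECT a.ID, a.Counter, n.StartDate, n.EndDate, a.Value
    FROM numbered n
    JOIN TableA a ON a.Counter = n.Counter
    ORDER BY n.StartDate, a.ID
""").fetchall()

for row in rows:
    print(row)
```

Subtracting 1 before the modulo and adding 1 afterwards keeps the result in the range 1..max, which avoids the COALESCE/NULLIF handling that a plain `row_number % max_counter` would need.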

How to insert into SQL table with previous data check

I'm creating a table in which I will store bookmakers' odds changes for sport events over time (it will have hundreds of thousands of rows).
I want to create an update function in PHP that inserts data into the table only if current_odd_value differs from the most recent odd_value stored in the table.
Using a simple INSERT I created this table for 1 match (8483075) from two companies (66 and 22) for the same market (1), which has 3 selections (1001, 1002, 1003), with the values I got today at 17:00:
internal_id match_id company_id market_id selection_id odd_value update_date
1 8483075 66 1 1001 9,60 2021-01-04 17:00:00
2 8483075 66 1 1002 18,00 2021-01-04 17:00:00
3 8483075 66 1 1003 1,09 2021-01-04 17:00:00
4 8483075 22 1 1001 8,40 2021-01-04 17:00:00
5 8483075 22 1 1002 16,00 2021-01-04 17:00:00
6 8483075 22 1 1003 1,08 2021-01-04 17:00:00
At 17:05 I checked the odds again and noticed 2 changes (for internal_id 2 and 6):
2 / 8483075 / 66 / 1 / 1002 / 18,00 ==> 15,00
6 / 8483075 / 22 / 1 / 1003 / 1,08 ==> 1,18
These should be added to the table as new rows, which should look like this:
internal_id match_id company_id market_id selection_id odd_value update_date
7 8483075 66 1 1002 15,00 2021-01-04 17:05:00
8 8483075 22 1 1003 1,18 2021-01-04 17:05:00
My idea was to:
1. Get a table of the most recent odd values for each match_id + company_id + market_id + selection_id.
2. Compare it with the current odd value and, only if it differs from the value from point 1, insert a new record into the table with the proper data.
MY QUESTIONS:
What would the SELECT query be to get what I need for point 1? I think I can use internal_id (higher means more recent) or update_date to get it, but I don't know how. I know how to do it for a specific match_id + company_id + market_id + selection_id, but I need the whole table in one select, not one by one.
Is my approach correct, or should I try a different one? (I think that retrieving the whole table of most recent odds at the beginning of the update should be faster than comparing each value one by one.)
Additional info:
All data that I have are coming from XML/JSON files that I'm receiving from different sources (so different formats etc. that I'm unifying under my db).
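One possible shape for points 1 and 2 is sketched below, in Python with an in-memory SQLite database rather than PHP. Everything beyond the column names is an assumption made for illustration: the table name `odds`, the helper names `insert_rows` and `store_changes`, treating the highest internal_id in a group as the most recent row, and writing the decimal-comma odds as floats.

```python
import sqlite3

COLUMNS = "match_id, company_id, market_id, selection_id, odd_value, update_date"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE odds (
        internal_id INTEGER PRIMARY KEY AUTOINCREMENT,
        match_id INTEGER, company_id INTEGER, market_id INTEGER,
        selection_id INTEGER, odd_value REAL, update_date TEXT)
""")

def insert_rows(rows):
    """Hypothetical helper: append rows to the odds table."""
    conn.executemany(
        f"INSERT INTO odds ({COLUMNS}) VALUES (?, ?, ?, ?, ?, ?)", rows)

# The 17:00 snapshot from the question (decimal commas written as points).
insert_rows([
    (8483075, 66, 1, 1001, 9.60, "2021-01-04 17:00:00"),
    (8483075, 66, 1, 1002, 18.00, "2021-01-04 17:00:00"),
    (8483075, 66, 1, 1003, 1.09, "2021-01-04 17:00:00"),
    (8483075, 22, 1, 1001, 8.40, "2021-01-04 17:00:00"),
    (8483075, 22, 1, 1002, 16.00, "2021-01-04 17:00:00"),
    (8483075, 22, 1, 1003, 1.08, "2021-01-04 17:00:00"),
])

# Point 1: the most recent row per match/company/market/selection,
# assuming the highest internal_id in each group is the most recent one.
LATEST_SQL = """
    SELECT o.match_id, o.company_id, o.market_id, o.selection_id, o.odd_value
    FROM odds o
    JOIN (SELECT MAX(internal_id) AS internal_id
          FROM odds
          GROUP BY match_id, company_id, market_id, selection_id) last
      ON last.internal_id = o.internal_id
"""

def store_changes(current, update_date):
    """Point 2: insert only the odds that differ from the latest stored value."""
    latest = {row[:4]: row[4] for row in conn.execute(LATEST_SQL)}
    changed = [key + (value, update_date)
               for key, value in current.items()
               if latest.get(key) != value]
    insert_rows(changed)
    return len(changed)

# The 17:05 snapshot: only two of the six odds have changed.
current_odds = {
    (8483075, 66, 1, 1001): 9.60,
    (8483075, 66, 1, 1002): 15.00,
    (8483075, 66, 1, 1003): 1.09,
    (8483075, 22, 1, 1001): 8.40,
    (8483075, 22, 1, 1002): 16.00,
    (8483075, 22, 1, 1003): 1.18,
}
inserted = store_changes(current_odds, "2021-01-04 17:05:00")
print(inserted)
```

Fetching the latest values once per update and comparing in application code is the approach the question proposes. In a real table, an index on (match_id, company_id, market_id, selection_id, internal_id) would likely matter at the stated row counts, and a fixed-point type may be safer than floats for comparing odds.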

Getting date difference between consecutive rows in the same group

I have a database with the following data:
Group ID Time
1 1 16:00:00
1 2 16:02:00
1 3 16:03:00
2 4 16:09:00
2 5 16:10:00
2 6 16:14:00
I am trying to find the difference in times between the consecutive rows within each group. Using LAG() and DATEDIFF() (ie. https://stackoverflow.com/a/43055820), right now I have the following result set:
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 00:06:00
2 5 00:01:00
2 6 00:04:00
However I need the difference to reset when a new group is reached, as in below. Can anyone advise?
Group ID Difference
1 1 NULL
1 2 00:02:00
1 3 00:01:00
2 4 NULL
2 5 00:01:00
2 6 00:04:00
The code would look something like:
select t.*,
datediff(second, lag(time) over (partition by group order by id), time)
from t;
This returns the difference as a number of seconds, but you seem to know how to convert that to a time representation. You also seem to know that group is not acceptable as a column name, because it is a SQL keyword.
Based on the question, you have put group in the order by clause of the lag(), not the partition by.
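To illustrate the effect of PARTITION BY, here is a small sketch run in an in-memory SQLite database via Python's sqlite3. The assumptions: SQLite 3.25 or newer (for LAG), the column names `grp` and `time_of_day` chosen to avoid the keyword problem mentioned above, and strftime('%s', ...) arithmetic standing in for DATEDIFF, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (grp INTEGER, id INTEGER, time_of_day TEXT);
    INSERT INTO t VALUES
        (1, 1, '16:00:00'), (1, 2, '16:02:00'), (1, 3, '16:03:00'),
        (2, 4, '16:09:00'), (2, 5, '16:10:00'), (2, 6, '16:14:00');
""")

# LAG looks back one row within the same grp only, so the first row
# of every group has no predecessor and its difference is NULL.
rows = conn.execute("""
    SELECT grp, id,
           CAST(strftime('%s', time_of_day) AS INTEGER)
         - CAST(strftime('%s', LAG(time_of_day)
                                OVER (PARTITION BY grp ORDER BY id)) AS INTEGER)
           AS diff_seconds
    FROM t
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

The row with id 4 now gets NULL (None in Python) instead of the 6-minute gap to the last row of group 1, which is the reset the question asks for.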

Join two data frames SQL where and between overlaps [duplicate]

This question already has an answer here:
Finding Overlaps between interval sets / Efficient Overlap Joins
(1 answer)
Closed 7 years ago.
I am trying to join two data frames which in SQL would utilise a where and a between statement for dates.
In SQL, the code would be:
select a.Date,(a.Value1-b.Test1) as Ans1,(a.Value2-b.Test2) as Ans2,a.ID
from Data a
inner join Test b on a.ID=b.ID and a.Date between b.DateStart and b.DateEnd
This is Data
Date Value1 Value2 ID
01/01/16 19:30:00 10 30 A
01/01/16 19:50:20 20 40 B
01/01/16 19:55:30 30 50 C
This is Test
RowNumber DateStart DateEnd Test1 Test2 ID
1 01/01/16 17:00:00 01/01/16 22:00:05 2 4 A
2 01/01/16 22:00:06 01/01/16 01:50:00 3 6 A
3 01/01/16 17:00:00 01/01/16 22:00:05 4 8 B
4 01/01/16 22:00:06 01/01/16 01:50:00 5 2 B
5 01/01/16 17:00:00 01/01/16 22:00:05 6 4 C
6 01/01/16 22:00:06 01/01/16 01:50:00 7 5 C
The results I am trying to create
Date Ans1 Ans2 ID
01/01/16 19:30:00 8 26 A
01/01/16 19:50:12 16 32 B
01/01/16 19:55:24 24 46 C
Any help and pointers would be great.
Following advice from @zx8754, I tried data.table::foverlaps().
In Data, rename the Date field to DateStart and create a second date field where DateEnd=Date. Then add the following code:
setkey(Data,ID,DateStart,DateEnd)
setkey(Test,ID,DateStart,DateEnd)
CompleteDataset <- foverlaps(Data, Test, type="any")
This gives me exactly what I want.
Finding Overlaps between interval sets / Efficient Overlap Joins
Simply merge the two datasets on ID, then conditionally filter rows afterwards which corresponds to SQL's JOIN and WHERE clauses. Finally, run calculations and select columns afterwards.
mergedf <- merge(Data, Test, by="ID")
mergedf <- mergedf[(mergedf$Date >= mergedf$DateStart &
                    mergedf$Date <= mergedf$DateEnd),]
mergedf$Ans1 <- mergedf$Value1 - mergedf$Test1
mergedf$Ans2 <- mergedf$Value2 - mergedf$Test2
mergedf <- mergedf[c('Date', 'Ans1', 'Ans2', 'ID')]
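For comparison with the SQL in the question, the sketch below runs that join (with `b.DateEnd` as the upper bound) in an in-memory SQLite database through Python's sqlite3. Two assumptions are made for the sake of a runnable check: SQLite as the engine, and the sample timestamps rewritten in ISO format so that BETWEEN compares them correctly as text. The Test rows are copied as given, including the second interval of each ID, whose end precedes its start and therefore never matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (Date TEXT, Value1 INTEGER, Value2 INTEGER, ID TEXT);
    CREATE TABLE Test (RowNumber INTEGER, DateStart TEXT, DateEnd TEXT,
                       Test1 INTEGER, Test2 INTEGER, ID TEXT);
    INSERT INTO Data VALUES
        ('2016-01-01 19:30:00', 10, 30, 'A'),
        ('2016-01-01 19:50:20', 20, 40, 'B'),
        ('2016-01-01 19:55:30', 30, 50, 'C');
    INSERT INTO Test VALUES
        (1, '2016-01-01 17:00:00', '2016-01-01 22:00:05', 2, 4, 'A'),
        (2, '2016-01-01 22:00:06', '2016-01-01 01:50:00', 3, 6, 'A'),
        (3, '2016-01-01 17:00:00', '2016-01-01 22:00:05', 4, 8, 'B'),
        (4, '2016-01-01 22:00:06', '2016-01-01 01:50:00', 5, 2, 'B'),
        (5, '2016-01-01 17:00:00', '2016-01-01 22:00:05', 6, 4, 'C'),
        (6, '2016-01-01 22:00:06', '2016-01-01 01:50:00', 7, 5, 'C');
""")

# Match each Data row to the Test interval of the same ID that contains its Date.
rows = conn.execute("""
    SELECT a.Date, (a.Value1 - b.Test1) AS Ans1, (a.Value2 - b.Test2) AS Ans2, a.ID
    FROM Data a
    INNER JOIN Test b
            ON a.ID = b.ID
           AND a.Date BETWEEN b.DateStart AND b.DateEnd
    ORDER BY a.ID
""").fetchall()

for row in rows:
    print(row)
```

The Ans1 and Ans2 values agree with the results table in the question, which gives a reference point when checking the merge-and-filter or foverlaps() output in R.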