Join two views and detect missing entries where the matching condition is in the next row of the other view/table (SQLite)

I am running a science test and logging my data in two SQLite tables.
I have selected the data needed into two separate and independent views (the RX and TX views).
Now I need to analyze the measurements and create a third view with the results, with the following points in mind:
1. For each test on the TX side (Table 1) there might be a corresponding entry on the RX side (Table 2).
2. If the timestamp on the RX side is less than the timestamp of the next row of the TX view,
we consider them to be associated with one record in the third view and calculate the time difference; otherwise it counts as a miss.
Question: How should I write the SQL query in SQLite to produce the analysis and test result given in Table 3?
Thanks a lot in advance.
TX View - Table (1)
id | time | measurement
------------------------
1 | 09:40:10.221 | 100
2 | 09:40:15.340 | 60
3 | 09:40:21.100 | 80
4 | 09:40:25.123 | 90
5 | 09:40:29.221 | 45
RX View - Table (2)
time | measurement
------------------------
09:40:15.7 | 65
09:40:21.560 | 80
09:40:30.414 | 50
Test Result View - Table (3)
id | TxTime       | RxTime       | delta_time(s) | delta_value
--------------------------------------------------------------
1  | 09:40:10.221 | NULL         | NULL          | NULL (i.e. missed)
2  | 09:40:15.340 | 09:40:15.7   | 0.360         | 5
3  | 09:40:21.100 | 09:40:21.560 | 0.460         | 0
4  | 09:40:25.123 | NULL         | NULL          | NULL (i.e. missed)
5  | 09:40:29.221 | 09:40:30.414 | 1.193         | 5
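To make the answer below easy to try out, here is a minimal SQLite setup sketch reproducing the sample data. It assumes TX and RX are plain tables with the columns shown above (in the question they are views, but the query works the same either way):
-- Minimal setup matching the sample data above (assumed schema).
CREATE TABLE TX (id INTEGER PRIMARY KEY, time TEXT, measurement INTEGER);
CREATE TABLE RX (time TEXT, measurement INTEGER);

INSERT INTO TX (id, time, measurement) VALUES
  (1, '09:40:10.221', 100),
  (2, '09:40:15.340', 60),
  (3, '09:40:21.100', 80),
  (4, '09:40:25.123', 90),
  (5, '09:40:29.221', 45);

INSERT INTO RX (time, measurement) VALUES
  ('09:40:15.7', 65),
  ('09:40:21.560', 80),
  ('09:40:30.414', 50);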

Use the window function LEAD() to get, for each TX row, the time of the next row, then join the views on your conditions:
SELECT t.id, t.time TxTime, r.time RxTime,
       ROUND((julianday(r.time) - julianday(t.time)) * 24 * 60 * 60, 3) [delta_time(s)],
       r.measurement - t.measurement delta_value
FROM (
  SELECT *, LEAD(time) OVER (ORDER BY time) next
  FROM TX
) t
LEFT JOIN RX r ON r.time >= t.time AND (r.time < t.next OR t.next IS NULL)
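The julianday() arithmetic works because julianday() returns a (fractional) day number, so the difference of two values times 24 * 60 * 60 is a duration in seconds. A quick sanity check with two values from the sample data:
-- julianday() treats a bare time as a time on 2000-01-01, so the
-- difference of two times is a fraction of a day:
SELECT ROUND((julianday('09:40:15.7') - julianday('09:40:15.340')) * 24 * 60 * 60, 3);
-- 0.36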
Results:
id | TxTime       | RxTime       | delta_time(s) | delta_value
--------------------------------------------------------------
1  | 09:40:10.221 | null         | null          | null
2  | 09:40:15.340 | 09:40:15.7   | 0.36          | 5
3  | 09:40:21.100 | 09:40:21.560 | 0.46          | 0
4  | 09:40:25.123 | null         | null          | null
5  | 09:40:29.221 | 09:40:30.414 | 1.193         | 5

Related

how to join tables on cases where none of function(a) in b

Say in MonetDB (specifically, the embedded version from the "MonetDBLite" R package) I have a table "events" containing entity ID codes and event start and end dates, of the format:
| id | start_date | end_date |
| 1 | 2010-01-01 | 2010-03-30 |
| 1 | 2010-04-01 | 2010-06-30 |
| 2 | 2018-04-01 | 2018-06-30 |
| ... | ... | ... |
The table is approximately 80 million rows of events, attributable to approximately 2.5 million unique entities (ID values). The dates appear to align nicely with calendar quarters, but I haven't thoroughly checked them, so assume they can be arbitrary. However, I have at least sense-checked that end_date > start_date.
I want to produce a table "nonevent_qtrs" listing calendar quarters where an ID has no event recorded, e.g.:
| id | last_doq |
| 1 | 2010-09-30 |
| 1 | 2010-12-31 |
| ... | ... |
| 1 | 2018-06-30 |
| 2 | 2010-03-30 |
| ... | ... |
(doq = day of quarter)
If the extent of an event spans any days of the quarter (including the first and last dates), then I wish for it to count as having occurred in that quarter.
To help with this, I have produced a "calendar table"; a table of quarters "qtrs", covering the entire span of dates present in "events", and of the format:
| first_doq | last_doq |
| 2010-01-01 | 2010-03-30 |
| 2010-04-01 | 2010-06-30 |
| ... | ... |
I then tried using a non-equi merge, like so:
create table nonevents as
  select id, last_doq
  from events
  full outer join qtrs
    on start_date > last_doq
    or end_date < first_doq
  group by id, last_doq
But this is a) terribly inefficient and b) certainly wrong, since most IDs are listed as being non-eventful for all quarters.
How can I produce the table "nonevent_qtrs" I described, which contains a list of quarters for which each ID had no events?
If it's relevant, the ultimate use case is to calculate runs of non-events for time-to-event analysis and prediction. It feels like run-length encoding will be required. If there's a more direct approach than what I've described above, then I'm all ears. The only reason I'm focusing on non-event runs to begin with is to try to limit the size of the cross product. I've also considered producing something like:
| id | last_doq | event |
| 1 | 2010-01-31 | 1 |
| ... | ... | ... |
| 1 | 2018-06-30 | 0 |
| ... | ... | ... |
But although more useful this may not be feasible due to the size of the data involved. A wide format:
| id | 2010-01-31 | ... | 2018-06-30 |
| 1 | 1 | ... | 0 |
| 2 | 0 | ... | 1 |
| ... | ... | ... | ... |
would also be handy, but since MonetDB is column-store I'm not sure whether this is more or less efficient.
Let me assume that you have a table of quarters, with the start date of a quarter and the end date. You really need this if you want the quarters that don't exist. After all, how far back in time or forward in time do you want to go?
Then, you can generate all id/quarter combinations and filter out the ones that exist:
select i.id, q.*
from (select distinct id from events) i cross join
     quarters q left join
     events e
     on e.id = i.id and
        e.start_date <= q.quarter_end and
        e.end_date >= q.quarter_start
where e.id is null;
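An equivalent way to phrase the same anti-join, which some engines optimize better, is NOT EXISTS. A sketch using the same assumed quarters table with quarter_start and quarter_end columns:
select i.id, q.*
from (select distinct id from events) i
cross join quarters q
where not exists (
    select 1
    from events e
    where e.id = i.id
      and e.start_date <= q.quarter_end  -- event starts on or before the quarter ends
      and e.end_date >= q.quarter_start  -- and ends on or after the quarter starts
);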

SQL query to compare table entries and detect missing data (exceeding a threshold)

I am running a science test and logging my data in two SQLite tables.
I need to analyze the measurements and create a third view with the results, with the following points in mind:
1. The entries in the two tables are completely independent of each other.
The id on the TX side is the test number.
2. For each test on the TX side (Table 1) there might be a corresponding entry on the RX side (Table 2).
2.1. How, then, are these two tables related?
For each test I make (i.e. sending a number from the TX side) I expect to receive a number on the RX side with a maximum delay of 1.5 seconds.
Any entry with a delay greater than that threshold should be considered a miss (NULL).
Question:
How should I write the SQL query in SQLite to produce the analysis and test result given in Table 3?
Thanks a lot in advance.
Table (1) TX
id | time | measurement
------------------------
1 | 09:40:10.221 | 100
2 | 09:40:15.340 | 60
3 | 09:40:21.100 | 80
4 | 09:40:25.123 | 90
5 | 09:40:29.221 | 45
Table (2) RX
time | measurement
------------------------
09:40:10.521 | 100
09:40:21.560 | 80
09:40:26.210 | 90
09:40:30.414 | 50
Table (3) Test Result
id | delta_time(s) | delta_value
----------------------------------------
1  | 0.300         | 0
2  | NULL          | NULL (i.e. missed)
3  | 0.460         | 0
4  | 1.087         | 0
5  | 1.193         | 5
Join the tables on your conditions and use the window functions MIN() and FIRST_VALUE():
select distinct
       t.id,
       round(min((julianday(r.time) - julianday(t.time)) * 24 * 60 * 60)
             over (partition by id), 3) [delta_time(s)],
       first_value(r.measurement - t.measurement)
             over (partition by id
                   order by julianday(r.time) - julianday(t.time)) [delta_value]
from TX t
left join RX r
  on (julianday(r.time) - julianday(t.time)) * 24 * 60 * 60 between 0 and 1.5;
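Note the SELECT DISTINCT: the window functions compute the minimum delay and its matching measurement difference across all RX rows joined to a TX id, but they don't collapse those joined rows into one, so DISTINCT is what reduces the result to one row per test.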
Results:
id | delta_time(s) | delta_value
--------------------------------
1  | 0.3           | 0
2  | null          | null
3  | 0.46          | 0
4  | 1.087         | 0
5  | 1.193         | 5
You can use a correlated subquery:
select tx.*,
       (select rx.measurement - tx.measurement
        from rx
        -- compare via julianday(); tx.time and rx.time are bare times, so a
        -- string comparison against datetime(tx.time, '+1.5 seconds') would
        -- mix the 'HH:MM:SS' and 'YYYY-MM-DD HH:MM:SS' formats and misbehave
        where (julianday(rx.time) - julianday(tx.time)) * 24 * 60 * 60
              between 0 and 1.5
        order by rx.time -- take the earliest match
        limit 1          -- in case there is more than one
       ) delta_value
from tx;
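If you also want delta_time(s) from this approach, one possibility (a sketch in the same spirit, not part of the original answer) is to join each TX row to its earliest in-window RX row:
-- A sketch: join each TX row to the earliest RX row within the 1.5 s window.
SELECT t.id,
       ROUND((julianday(r.time) - julianday(t.time)) * 24 * 60 * 60, 3) [delta_time(s)],
       r.measurement - t.measurement delta_value
FROM TX t
LEFT JOIN RX r
  ON r.time = (SELECT MIN(r2.time)
               FROM RX r2
               WHERE (julianday(r2.time) - julianday(t.time)) * 24 * 60 * 60
                     BETWEEN 0 AND 1.5);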

Returning a single row/value from a joined table based on the closest date

I have a Production table and a Standing Data table. The relationship of Production to Standing Data is actually many-to-many, which is different from how this relationship is usually represented (many-to-one).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates, allowing the score to change at different points in time. What I am trying to do is query the Production table so that each TaskID is looked up in the standing data, using the date the production record was logged to determine which score should be returned.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the code below, but unfortunately, because multiple standing data records meet the criteria, the production rows are duplicated with two different scores per record:
SELECT p.[RecordID],
       p.[Date],
       p.[EmpID],
       p.[Reference],
       p.[TaskID],
       s.[Score]
FROM ProductionTable AS p
LEFT JOIN StandingDataTable AS s
  ON s.[TaskID] = p.[TaskID]
 AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return a single, scalar Score value for each record based on the date?
You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable AS p
OUTER APPLY (
    SELECT TOP (1) s.[Score]
    FROM StandingDataTable AS s
    WHERE s.[TaskID] = p.[TaskID]
      AND s.[DateActiveFrom] <= p.[Date]
    ORDER BY s.DateActiveFrom DESC
) s;
If you want the score on a different basis (per record, say), change the WHERE clause inside the APPLY accordingly.
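If you prefer a join over APPLY, an equivalent sketch using ROW_NUMBER() (same table and column names as above) would be:
-- A sketch: rank the standing data rows per production record and keep
-- only the most recent DateActiveFrom that is on or before the log date.
SELECT RecordID, [Date], EmpID, Reference, TaskID, Score
FROM (
    SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score],
           ROW_NUMBER() OVER (PARTITION BY p.[RecordID]
                              ORDER BY s.[DateActiveFrom] DESC) AS rn
    FROM ProductionTable AS p
    LEFT JOIN StandingDataTable AS s
      ON s.[TaskID] = p.[TaskID]
     AND s.[DateActiveFrom] <= p.[Date]
) AS x
WHERE rn = 1;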

Pick a record based on a given value in Postgres

I have a table in Postgres like the one below:
alg_campaignid | alg_score | cp | sum
----------------+-----------+---------+----------
9829 | 30.44056 | 12.4000 | 12.4000
9880 | 29.59280 | 12.0600 | 24.4600
9882 | 29.59280 | 12.0600 | 36.5200
9827 | 29.27504 | 11.9300 | 48.4500
9821 | 29.14840 | 11.8800 | 60.3300
9881 | 29.14840 | 11.8800 | 72.2100
9883 | 29.14840 | 11.8800 | 84.0900
10026 | 28.79280 | 11.7300 | 95.8200
10680 | 10.31504 | 4.1800 | 100.0000
From this I have to select a record based on a randomly generated number from 0 to 100, i.e. the first record should be returned if the random number picked is between 0 and 12.4000, the second if it is between 12.4000 and 24.4600, and likewise the last if it is between 95.8200 and 100.0000.
For Example
if the random number picked is 8 then the first record should be returned
or
if the random number picked is 48 then the fourth record should be returned
Is it possible to do this in Postgres? If so, kindly recommend a solution.
Yes, you can do this in Postgres. If you want to generate the number in the database:
with r as (
      select random() * 100 as r
     )
select t.*
from the_table t cross join r  -- the_table stands in for your table name
where t.sum >= r.r             -- first row whose running total reaches r
order by t.sum asc
limit 1;
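If the sum column were not precomputed, you could derive the running total with a window function. A sketch, assuming cp is the per-row weight and the display order is alg_score descending (the_table again stands in for your table name):
with c as (
     select alg_campaignid, alg_score, cp,
            -- running total of cp in display order; alg_campaignid breaks ties
            sum(cp) over (order by alg_score desc, alg_campaignid) as cum
     from the_table
), r as (
     select random() * 100 as r  -- draw the random number exactly once
)
select c.*
from c cross join r
where c.cum >= r.r
order by c.cum asc
limit 1;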

Transaction management and temporary tables in SQL Server

Sorry for the title, perhaps it's not very clear.
I have some SQL queries in a script that depend on each other.
The script uses a temporary table in which the data is inserted (the #temp_data table).
This is the expected output:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | 10 |
| 3 | NULL | 40 |
| 5 | NULL | 90 |
| NULL | 1 | 10 |
| NULL | 3 | 40 |
| NULL | 5 | 90 |
Here is the query structure (I didn't include the actual query since it's too big):
-- First group
queryForSpeed1
queryToUpdateDistanceBasedOnSpeed1
-- Second group
queryForSpeed2
queryToUpdateDistanceBasedOnSpeed2
If I run the first group of queries (queryForSpeed1 and queryToUpdateDistanceBasedOnSpeed1) separately from the second group then I get the expected output: only the speed1 and distance columns contain data:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | 10 |
| 3 | NULL | 40 |
| 5 | NULL | 90 |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
The same happens when I run the second group:
___________________________________
| speed1 | speed2 | distance |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | 1 | 10 |
| NULL | 2 | 40 |
| NULL | 3 | 90 |
But when I run both groups, all the distances are NULL:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | NULL |
| 3 | NULL | NULL |
| 5 | NULL | NULL |
| NULL | 1 | NULL |
| NULL | 2 | NULL |
| NULL | 3 | NULL |
I believe this is somehow related to transaction management and temporary tables, although I wasn't able to find anything relevant to solve the problem on Google.
From what I've read, SQL Server keeps a transaction log where it stores every update, insert and so on, and when it arrives at the end of the script it actually performs all those insertions and updates.
So the update I did for the distance column sees all the speeds as NULL because the data from the previous updates hasn't yet been inserted into the temporary table; but at the end of the script the speeds are inserted into the table, which is why they are visible.
I played a bit with the GO statement to execute my script in batches, but no luck so far...
What am I doing wrong? Can someone point me in the right direction, please?
EDIT
Here is the actual query.
The problem is not related to transactions, but rather to the way you conduct the updates to #temp_speed_profile. The second pass through #temp_speed_profile retrieves all six records. Speed_new is NULL in the first record of each Voyage_Id, so @distance becomes NULL; and since you retain the value of @distance into the next iteration, it stays NULL.
The problem goes away when you use different temporary tables, because the second pass then works on the second set of data only.
A note on cursors: when defining one, make sure to add LOCAL and FAST_FORWARD. LOCAL limits the cursor's scope, and FAST_FORWARD optimizes fetches.
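For illustration, a minimal sketch of such a cursor declaration (the variable and column names here are hypothetical, echoing the ones in the question):
DECLARE @speed FLOAT, @distance FLOAT;

-- LOCAL limits the cursor's scope to this batch/procedure; FAST_FORWARD
-- makes it read-only and forward-only so SQL Server can optimize fetches.
DECLARE speed_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT speed1, distance
    FROM #temp_data;

OPEN speed_cursor;
FETCH NEXT FROM speed_cursor INTO @speed, @distance;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- ... per-row processing goes here ...
    FETCH NEXT FROM speed_cursor INTO @speed, @distance;
END;
CLOSE speed_cursor;
DEALLOCATE speed_cursor;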
It is almost certainly caused by the way you have written your queries.
To confirm, just rewrite your queries using #temp_data1 and #temp_data2, rather than a single table #temp_data.