SQL table join based on aggregated/equal values - sql

I have a very horrible dataset to hand that I need to create a join query for. It's best to show an example:
+------+---------+-----------+
| Time | Sent | Received |
+------+---------+-----------+
| 1 | 100 | NULL |
| 2 | NULL | 100 |
| 3 | 50 | NULL |
| 4 | NULL | 40 |
| 5 | NULL | 10 |
| 6 | 400 | 200 |
| 7 | 100 | 200 |
| 8 | NULL | 100 |
| 9 | 500 | 500 |
+------+---------+-----------+
Assuming 'Time' above is in hours, 'Sent' shows the number of items sent in that hour and 'Received' shows the number received. The problem is that items will likely not arrive in the same hour they were sent (though they can).
I need to match each received batch against the appropriate send to find the time the received items were sent.
Using the above:
Received 100 at time 2 is obviously the items sent in hour 1, so that would be assigned to hour 1.
The 50 sent at time 3 arrived in two batches (40 and 10 at times 4 and 5 respectively), so received 40 and 10 should be lumped into the time 3 category.
Received at times 6 and 7 (200 each) correspond to the 400 order at time 6 (note that half the order was received in the same hour; this can happen).
Also at time 7 a new order was sent, which corresponds to the received at time 8.
Also at time 9 an order of 500 was sent and received in the same hour.
Below is an example of what the output would look like (Note that there are other values associated with each 'Received' row but they are orthogonal to the task and will just be summed to provide meaning)
+------+----------+
| Time | Received |
+------+----------+
| 1 | 100 |
| 3 | 50 |
| 6 | 400 |
| 7 | 100 |
| 8 | 100 |
| 9 | 500 |
+------+----------+
I have been trying to wrap my head around this for a while. If I could do this outside of SQL, I would write a function that loops through each 'Sent' value in time order and, for each one, loops through 'Received' until the values match, then assigns those Received values to the Time index and deletes both the sent and received entries from the array (or notes where the loop got to and continues from there).
Unfortunately the project doesn't allow that scope. This must be done in SQL as much as possible. I am really at a loss and hoping there is some SQL functionality I have overlooked. Any help is much appreciated.

If this is in SQL Server, you can use a WHILE loop. Look at the documentation. So, your project might look something like this:
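CREATE TABLE #temp ([Time] int, [Received] int);
DECLARE @i int = 1;
DECLARE @value int = 0;
WHILE @i <= 9
BEGIN
    -- Read the value for the current hour into the variable
    SELECT @value = [Received] FROM [table] WHERE [Time] = @i;
    --Your logic here
    INSERT INTO #temp ...
    -- Advance the counter; without this the loop never terminates
    SET @i = @i + 1;
END
SELECT * FROM #temp;
DROP TABLE #temp;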
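
A set-based alternative to the loop may also work if two assumptions hold (neither is stated in the question): items arrive in the order they were sent, and no single received batch straddles two sends. Under those assumptions the running totals line up, and each received row belongs to the first send whose cumulative sent total reaches that row's cumulative received total. The sketch below runs the idea in SQLite through Python's sqlite3 module so it can be tried on the sample data; the table name traffic is hypothetical, and SUM(...) OVER (ORDER BY ...) is also available in SQL Server 2012 and later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (time INTEGER, sent INTEGER, received INTEGER)")
conn.executemany(
    "INSERT INTO traffic VALUES (?, ?, ?)",
    [(1, 100, None), (2, None, 100), (3, 50, None), (4, None, 40), (5, None, 10),
     (6, 400, 200), (7, 100, 200), (8, None, 100), (9, 500, 500)],
)

QUERY = """
WITH s AS (   -- running total of items sent, one row per send
    SELECT time, SUM(sent) OVER (ORDER BY time) AS cum_sent
    FROM traffic WHERE sent IS NOT NULL
),
r AS (        -- running total of items received, one row per receipt
    SELECT time, received, SUM(received) OVER (ORDER BY time) AS cum_recv
    FROM traffic WHERE received IS NOT NULL
),
m AS (        -- earliest send whose running total covers this receipt
    SELECT r.received,
           (SELECT MIN(s.time) FROM s WHERE s.cum_sent >= r.cum_recv) AS sent_time
    FROM r
)
SELECT sent_time, SUM(received) AS received
FROM m
GROUP BY sent_time
ORDER BY sent_time
"""

matched = conn.execute(QUERY).fetchall()
for sent_time, received in matched:
    print(sent_time, received)
```

On the sample data this gives one row per sending hour (1, 3, 6, 7 and 9) with 100, 50, 400, 100 and 500 received; the 100 received at time 8 is credited to the send at time 7, as the question's narrative describes.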

Related

join two views and detect missing entries where the matching condition is in the next row of the other view/table (using SQLITE)

I am running a science test and logging my data inside two sqlite tables.
I have selected the data needed into two separate and independent views (RX and TX views).
Now I need to analyze the measurements and create a third table view with the results, with the following points in mind:
1- For each test at the TX side (Table 1) there might be a corresponding entry at the RX side (Table 2).
2- If the time stamp at the RX side is less than the time stamp in the next row of the TX table view,
we consider them to be associated with one record in the third view/table and calculate the time difference; OTHERWISE it is a miss.
Question: How should I write the SQL query in SQLite to produce the analysis and test result given in Table 3?
Thanks a lot in advance.
TX View - Table (1)
id | time | measurement
------------------------
1 | 09:40:10.221 | 100
2 | 09:40:15.340 | 60
3 | 09:40:21.100 | 80
4 | 09:40:25.123 | 90
5 | 09:40:29.221 | 45
RX View -Table (2)
time | measurement
------------------------
09:40:15.7 | 65
09:40:21.560 | 80
09:40:30.414 | 50
Test Result View - Table (3)
id |TxTime |RxTime | delta_time(s)| delta_value
------------------------------------------------------------------------
1 | 09:40:10.221 | NULL |NULL | NULL (i.e. missed)
2 | 09:40:15.340 | 09:40:15.7 |0.360 | 5
3 | 09:40:21.100 | 09:40:21.560 |0.460 | 0
4 | 09:40:25.123 | NULL |NULL | NULL (i.e. missed)
5 | 09:40:29.221 | 09:40:30.414 |1.193 | 5
Use window function LEAD() to get the next time of each row in TX and join the views on your conditions:
SELECT t.id, t.time TxTime, r.time RxTime,
ROUND((julianday(r.time) - julianday(t.time)) * 24 * 60 *60, 3) [delta_time(s)],
r.measurement - t.measurement delta_value
FROM (
SELECT *, LEAD(time) OVER (ORDER BY Time) next
FROM TX
) t
LEFT JOIN RX r ON r.time >= t.time AND (r.time < t.next OR t.next IS NULL)
See the demo.
Results:
> id | TxTime | RxTime | delta_time(s) | delta_value
> -: | :----------- | :----------- | :------------ | :----------
> 1 | 09:40:10.221 | null | null | null
> 2 | 09:40:15.340 | 09:40:15.7 | 0.36 | 5
> 3 | 09:40:21.100 | 09:40:21.560 | 0.46 | 0
> 4 | 09:40:25.123 | null | null | null
> 5 | 09:40:29.221 | 09:40:30.414 | 1.193 | 5

Select Value based on Multiple Value Range in SQL

I have multiple criteria for giving incentives to my employees, for example as shown in the image below.
The grid table is dynamic in nature; it keeps changing based on business conditions.
I have a table of employee IDs for which I have calculated the Resolution % and the Normalization %. Now I need to assign % incentives based on the above grid using an SQL query.
Output table in which I need to update the incentives
I assume the grid table is also stored as a database table (so you can update it):
+-----------------+---------------+--------------------+------------------+-----------+
| INCENTIVES |
+-----------------+---------------+--------------------+------------------+-----------+
| from_resolution | to_resolution | from_normalization | to_normalization | incentive |
+-----------------+---------------+--------------------+------------------+-----------+
| 0 | 70 | 0 | 5 | 9 |
| 0 | 70 | 5 | 10 | 11 |
| 0 | 70 | 10 | 100 | 13 |
| 71 | 75 | 0 | 5 | 10 |
... I hope you get the idea
+-----------------+---------------+--------------------+------------------+-----------+
And the update query can be:
update employee E
set E.incentive = (select I.incentive
from incentives I
where e.resolution >= I.from_resolution
and e.resolution < I.to_resolution
and e.normalization >= I.from_normalization
and e.normalization < I.to_normalization)
UPDATE: the TO values are excluded from each range. By making each TO value equal to the FROM value of the next range, we ensure that all values are covered (including floating-point values). Thanks to Gordon
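
To see the half-open ranges at work, here is a minimal sketch run in SQLite through Python's sqlite3 module. The employee rows and the emp_id column are made up for illustration, and the fourth grid row is assumed to start at 70 (not 71) so that its FROM value equals the previous TO value. SQLite does not accept an alias-qualified column in SET, so the UPDATE refers to the table by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incentives (
    from_resolution REAL, to_resolution REAL,
    from_normalization REAL, to_normalization REAL,
    incentive REAL
);
-- Each TO value equals the next range's FROM value (half-open ranges).
INSERT INTO incentives VALUES
    (0, 70, 0, 5, 9),
    (0, 70, 5, 10, 11),
    (0, 70, 10, 100, 13),
    (70, 75, 0, 5, 10);

CREATE TABLE employee (emp_id INTEGER, resolution REAL, normalization REAL, incentive REAL);
INSERT INTO employee (emp_id, resolution, normalization) VALUES
    (1, 69.9, 4.99),   -- just inside the first cell
    (2, 50.0, 5.0),    -- exactly on a normalization boundary
    (3, 70.5, 3.0);    -- fractional value between 70 and 71

UPDATE employee
SET incentive = (SELECT i.incentive
                 FROM incentives i
                 WHERE employee.resolution >= i.from_resolution
                   AND employee.resolution <  i.to_resolution
                   AND employee.normalization >= i.from_normalization
                   AND employee.normalization <  i.to_normalization);
""")

incentives = conn.execute(
    "SELECT emp_id, incentive FROM employee ORDER BY emp_id"
).fetchall()
print(incentives)
```

Employee 2 sits exactly on the boundary value 5 and lands in the 5 to 10 cell only, and employee 3 (70.5) is still matched, which a grid with ranges 0 to 70 and 71 to 75 would miss.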

Returning singular row/value from joined table date based on closest date

I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates for changing the score at different points in time. What I am trying to do is query the Production Table so that the TaskID is looked up in the table and uses the date it was logged to check what score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the below code but unfortunately due to multiple records meeting the criteria, the production data duplicates with two different scores per record:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return the correct and singular/scalar Score value for this record based on the date?
You can use APPLY:
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY S.DateActiveFrom DESC
) s;
If you want the score matched at record level instead, change the WHERE clause inside the APPLY accordingly.
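
For engines without APPLY, the same "latest row on or before the date" lookup can be sketched as a correlated subquery with ORDER BY ... DESC LIMIT 1. The example below runs it in SQLite through Python's sqlite3 module. It assumes the dates are stored as ISO-8601 text (so that string comparison matches date order), and the sample dates are illustrative replacements for the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StandingDataTable (RecordID INTEGER, TaskID INTEGER, DateActiveFrom TEXT, Score REAL);
INSERT INTO StandingDataTable VALUES
    (1, 1, '2020-02-01', 1.5),
    (2, 1, '2020-02-28', 2.0);

CREATE TABLE ProductionTable (RecordID INTEGER, Date TEXT, EmpID INTEGER, Reference INTEGER, TaskID INTEGER);
INSERT INTO ProductionTable VALUES
    (1, '2020-02-27', 1, 123, 1),
    (2, '2020-02-27', 1, 123, 1),
    (3, '2020-02-28', 1, 123, 1),
    (4, '2020-02-29', 1, 123, 1);
""")

QUERY = """
SELECT p.RecordID, p.Date,
       (SELECT s.Score
        FROM StandingDataTable s
        WHERE s.TaskID = p.TaskID
          AND s.DateActiveFrom <= p.Date
        ORDER BY s.DateActiveFrom DESC   -- most recent applicable row first
        LIMIT 1) AS Score
FROM ProductionTable p
ORDER BY p.RecordID
"""

scores = conn.execute(QUERY).fetchall()
for row in scores:
    print(row)
```

Each production record comes back exactly once: records dated before 2020-02-28 get 1.5 and records on or after it get 2.0.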

Creating a view that joins multiple tables on an ID and a timestamp that needs to be rounded

I have a web application that sends data to my SQLite database, into different tables depending on the information. I would like to make a view that merges multiple tables together based on cownumber and ts (timestamp). There are no updates to my tables, so a change to the same cownumber sends the full record as a new entry with a new timestamp. The AJAX calls are made table by table, so the timestamps do not sync up exactly; they can generally be 5-20 seconds off depending on the connection.
Here is a sample of the three tables
+----master_animal-----+
+----------------------------------------------------+
| cownumber | height | weight | ts |
+-----------+----------+--------+--------------------+
| 1 | 150 | ... | 2017-12-01 12:28:00|
| 2 | 170 | ... | 2017-12-03 17:16:00|
| 3 | 60 | ... | 2017-12-03 08:09:00|
| 4 | 109 | ... | 2017-12-04 23:23:00|
+----animal_inventory-----+
+-------------------------------------------------------------+
| cownumber | brandlocation| dateacquired| ts |
+-----------+--------------+-------------+--------------------+
| 1 | ... | ... | 2017-12-01 12:28:50|
| 2 | ... | ... | 2017-12-03 17:16:30|
| 3 | ... | ... | 2017-12-03 08:09:12|
| 4 | ... | ... | 2017-12-04 23:23:23|
+----experiment-----+
+-------------------------------------------------------------+
| cownumber | ageatwean | birthweight | ts |
+-----------+--------------+-------------+--------------------+
| 1 | ... | ... | 2017-12-01 12:28:20|
| 2 | ... | ... | 2017-12-03 17:16:41|
| 3 | ... | ... | 2017-12-03 08:09:24|
| 4 | ... | ... | 2017-12-04 23:23:11|
The View I wrote
CREATE VIEW testing
AS SELECT a.height,a.weight,a.cownumber,
b.brandlocation,b.dateacquired,
c.ageatwean,c.birthweight
FROM master_animal a, animal_inventory b, experiment c
WHERE a.cownumber=b.cownumber
AND ROUND(a.ts/10000) = ROUND(b.ts/10000)
AND a.cownumber=c.cownumber
AND ROUND(a.ts/10000) = ROUND(c.ts/10000);
The query I wrote
Select * from testing where cownumber = 1;
What I was hoping to get back was
+----testing-----+
+----------------------------------------------------+
| cownumber | height | weight | brandlocation| dateacquired | ageatwean |birthweight |
+-----------+--------+--------+--------------+--------------+-----------+------------+
| 941 | 0 | ... | ... | ... | ... | .. |
Where there will be one row for cownumber 941 as long as all the correlated records were within a few seconds of each other. I am not exactly sure if I need to divide by 10000 or smaller. The same record should be no more than 50 seconds apart from each other. Anything more than 50 seconds apart should be considered a new record.
When I test this where there is only one record for that cownumber it works fine. But let's say I change some information in each table: I provide a new height and a new brandlocation.
Instead of getting two rows (the first row being the initial data entry and the second row showing the same cownumber with the changed values), I get back 8 rows with partial changes.
height|weight|cownumber|brandlocation|dateacquired|ageatwean|birthweight|
0.0|0.0|941|0|0|0.0|0
0.0|0.0|941|0|0|0.0|0
0.0|0.0|941|Left Hip|0|0.0|0
0.0|0.0|941|Left Hip|0|0.0|0
50.0|0.0|941|0|0|0.0|0
50.0|0.0|941|0|0|0.0|0
50.0|0.0|941|Left Hip|0|0.0|0
50.0|0.0|941|Left Hip|0|0.0|0
I assume the issue is in my where clause but I am not sure exactly how to fix it
The timestamps are stored as strings. When you try to divide it, the database tries to convert it to a number, which results in 2017. So all timestamps end up being the same.
Dividing cannot determine the distance; the values 9999 and 10000 will end up different although they are right near each other. (And an integer division results in an integer result, so the ROUND() has no effect.)
To compute the distance, convert the timestamp into a number of seconds first, and then use abs():
SELECT ...
FROM master_animal m
JOIN animal_inventory i ON m.cownumber = i.cownumber
AND abs(strftime('%s', m.ts) - strftime('%s', i.ts)) <= 50
JOIN experiment e ON m.cownumber = e.cownumber
AND abs(strftime('%s', m.ts) - strftime('%s', e.ts)) <= 50;
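
Both points can be checked on a small sample. The sketch below runs in SQLite through Python's sqlite3 module; it uses cut-down versions of the three tables (one payload column each) and made-up rows for cow 941, submitted twice about half an hour apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master_animal    (cownumber INTEGER, height REAL, ts TEXT);
CREATE TABLE animal_inventory (cownumber INTEGER, brandlocation TEXT, ts TEXT);
CREATE TABLE experiment       (cownumber INTEGER, ageatwean REAL, ts TEXT);
-- Two submissions for cow 941, roughly half an hour apart (made-up values).
INSERT INTO master_animal    VALUES (941, 0.0, '2017-12-01 12:28:00'), (941, 50.0, '2017-12-01 13:00:00');
INSERT INTO animal_inventory VALUES (941, '0', '2017-12-01 12:28:20'), (941, 'Left Hip', '2017-12-01 13:00:15');
INSERT INTO experiment       VALUES (941, 0.0, '2017-12-01 12:28:40'), (941, 0.0, '2017-12-01 13:00:30');
""")

# Only the leading digits of the string are used, so every ts becomes 2017.
as_number = conn.execute("SELECT '2017-12-01 12:28:00' + 0").fetchone()[0]

# Original condition: every timestamp collapses to the same value, so all rows pair up.
old_rows = conn.execute("""
    SELECT a.height, b.brandlocation, c.ageatwean
    FROM master_animal a, animal_inventory b, experiment c
    WHERE a.cownumber = b.cownumber AND ROUND(a.ts/10000) = ROUND(b.ts/10000)
      AND a.cownumber = c.cownumber AND ROUND(a.ts/10000) = ROUND(c.ts/10000)
""").fetchall()

# Corrected condition: compare the distance in seconds.
new_rows = conn.execute("""
    SELECT m.height, i.brandlocation, e.ageatwean
    FROM master_animal m
    JOIN animal_inventory i ON m.cownumber = i.cownumber
        AND abs(strftime('%s', m.ts) - strftime('%s', i.ts)) <= 50
    JOIN experiment e ON m.cownumber = e.cownumber
        AND abs(strftime('%s', m.ts) - strftime('%s', e.ts)) <= 50
    ORDER BY m.ts
""").fetchall()

print(as_number, len(old_rows), len(new_rows))
```

The original condition returns all 2 x 2 x 2 = 8 combinations, matching the eight rows in the question, while the distance-based join returns one row per submission.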

Transaction management and temporary tables in SQL Server

Sorry for the title, perhaps it's not very clear.
I have some SQL queries in a script that depend on each other.
The script uses a temporary table in which the data is inserted (the #temp_data table).
This is the expected output:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | 10 |
| 3 | NULL | 40 |
| 5 | NULL | 90 |
| NULL | 1 | 10 |
| NULL | 3 | 40 |
| NULL | 5 | 90 |
Here is the query structure (I didn't include the actual query since it's too big):
-- First group
queryForSpeed1
queryToUpdateDistanceBasedOnSpeed1
-- Second group
queryForSpeed2
queryToUpdateDistanceBasedOnSpeed2
If I run the first group of queries (queryForSpeed1 and queryToUpdateDistanceBasedOnSpeed1) separately from the second group then I get the expected output: only the speed1 and distance columns contain data:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | 10 |
| 3 | NULL | 40 |
| 5 | NULL | 90 |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
The same happens when I run the second group:
___________________________________
| speed1 | speed2 | distance |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | 1 | 10 |
| NULL | 2 | 40 |
| NULL | 3 | 90 |
BUT, when I run both groups: all the distances are NULL:
___________________________________
| speed1 | speed2 | distance |
| 1 | NULL | NULL |
| 3 | NULL | NULL |
| 5 | NULL | NULL |
| NULL | 1 | NULL |
| NULL | 2 | NULL |
| NULL | 3 | NULL |
I believe this is somehow related to transaction management and temporary tables, although I wasn't able to find anything relevant to solve the problem on Google.
From what I've read, SQL Server keeps a transaction log where it stores every update, insert and whatever... when it arrives at the end of the script it actually does all those insertions and updates.
So the update I did for the distance column finds all the speeds as being NULL because the data wasn't yet inserted in the temporary table from the previous updates, but at the end of the query the speeds are inserted in the table so that's why they are visible.
I played a bit with the GO statement to execute my script in batches, but no luck so far...
What am I doing wrong? Can someone point me in the right direction, please?
EDIT
Here is the actual query.
The problem is not related to transactions, but rather to the way you update #temp_speed_profile. The second pass through #temp_speed_profile retrieves all six records. Speed_new is NULL in the first record of Voyage_Id, so @distance becomes NULL. Because you retain the value of @distance for the next iteration, it remains NULL.
The problem goes away when using different temporary tables, because the second pass then works on the second set of data only.
A note on cursors: when defining one, make sure to add local and fast_forward. Local limits the cursor's scope, and fast_forward optimizes fetches.
It is almost certainly caused by the way you have written your queries.
To confirm, just rewrite your queries using #temp_data1 and #temp_data2, rather than a single table #temp_data.
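
The NULL carry-over described in the first answer can be illustrated in isolation. Since the actual query is not shown here, the sketch below is only a stand-in: the speed values, the formula distance + speed * 10 and the loop (playing the role of the cursor with a retained variable) are all hypothetical. The arithmetic is evaluated by SQLite through Python's sqlite3 module so that SQL NULL semantics apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical speed values in cursor order; the fourth row has no speed yet (NULL).
speeds = [1, 3, 5, None, 1, 3]

distance = 0
trace = []
for speed in speeds:
    # Let SQLite evaluate the expression so SQL NULL semantics apply:
    # any arithmetic involving NULL yields NULL.
    distance = conn.execute("SELECT ? + ? * 10", (distance, speed)).fetchone()[0]
    trace.append(distance)

print(trace)
```

Once a single NULL enters the accumulated value, every later iteration stays NULL, even for rows whose speed is known. That is why a pass that only sees its own set of rows, as with separate temporary tables, avoids the problem.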