Calculate time difference between rows - SQL

I currently have a database in the following format
ID | DateTime | PID | TIU
1 | 2013-11-18 00:15:00 | 1551 | 1005
2 | 2013-11-18 00:16:03 | 1551 | 1885
3 | 2013-11-18 00:16:30 | 9110 | 75527
4 | 2013-11-18 00:22:01 | 1022 | 75
5 | 2013-11-18 00:22:09 | 1019 | 1311
6 | 2013-11-18 00:23:52 | 1022 | 89
7 | 2013-11-18 00:24:19 | 1300 | 44433
8 | 2013-11-18 00:38:57 | 9445 | 2010
I have a scenario where I need to identify gaps in processing of more than 5 minutes, using the DateTime column.
An example of what I am trying to achieve is:
ID | DateTime | PID | TIU
3 | 2013-11-18 00:16:30 | 9110 | 75527
4 | 2013-11-18 00:22:01 | 1022 | 75
7 | 2013-11-18 00:24:19 | 1300 | 44433
8 | 2013-11-18 00:38:57 | 9445 | 2010
ID 3 is the last row before a 5 minute 31 second gap; ID 4 is the next row after it.
ID 7 is the last row before a 14 minute 38 second gap; ID 8 is the next record available.
I am trying to do this in SQL; however, if need be, I can process this in C# instead.
I have tried a number of inner joins, but the table has over 3 million rows, so performance suffers greatly.

This is a CTE solution but, as has been indicated, this may not always perform well - because we're having to compute functions against the DateTime column, most indexes will be useless:
declare @t table (
    ID int not null,
    [DateTime] datetime not null,
    PID int not null,
    TIU int not null
)
insert into @t (ID, [DateTime], PID, TIU) values
(1, '2013-11-18 00:15:00', 1551, 1005),
(2, '2013-11-18 00:16:03', 1551, 1885),
(3, '2013-11-18 00:16:30', 9110, 75527),
(4, '2013-11-18 00:22:01', 1022, 75),
(5, '2013-11-18 00:22:09', 1019, 1311),
(6, '2013-11-18 00:23:52', 1022, 89),
(7, '2013-11-18 00:24:19', 1300, 44433),
(8, '2013-11-18 00:38:57', 9445, 2010)
;With Islands as (
    select ID as MinID, [DateTime], ID as RecID
    from @t t1
    where not exists (
        select * from @t t2
        where t2.ID < t1.ID and --Or by date, if needed
            --Use 300 seconds to avoid most transition issues
            DATEDIFF(second, t2.[DateTime], t1.[DateTime]) < 300
    )
    union all
    select i.MinID, t2.[DateTime], t2.ID
    from Islands i
    inner join @t t2
        on i.RecID < t2.ID and
           DATEDIFF(second, i.[DateTime], t2.[DateTime]) < 300
), Ends as (
    select MinID, MAX(RecID) as MaxID
    from Islands
    group by MinID
)
select * from @t t
where exists (select * from Ends e where e.MinID = t.ID or e.MaxID = t.ID)
This also returns a row for ID 1, since that row has no preceding row within 5 minutes of it - but that should be easy enough to exclude in the final select, if needed.
I've assumed we can use ID as a proxy for increasing dates - that if for two rows, the ID is higher in the second row, then the DateTime will also be later.
Islands is a recursive CTE. The top half (the anchor) just selects rows which do not have any preceding row within 5 minutes of themselves. We select the ID twice for those rows and also keep the DateTime around.
In the recursive portion, we try to find a new row from the table that can be "added on" to an existing Islands row - based on this new row being no more than 5 minutes later than the current end-point of the island.
Once the recursion is complete, we then exclude the intermediate rows that the CTE produces. E.g. for the "4" island, it generated the following rows:
4,00:22:01,4
4,00:22:09,5
4,00:23:52,6
4,00:24:19,7
And all that we care about is that final row where we've identified an "island" of time from ID 4 to ID 7 - that's what the second CTE (Ends) is finding for us.
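On SQL Server 2012 or later, a non-recursive alternative using LAG/LEAD may scale better on 3 million rows. This is a sketch of mine, not part of the answer above; it reuses the same @t table and the same assumption that ID order tracks DateTime order:
;With Gaps as (
    select ID, [DateTime], PID, TIU,
        -- seconds elapsed since the previous row
        DATEDIFF(second, LAG([DateTime]) OVER (ORDER BY ID), [DateTime]) as SecsSincePrev,
        -- seconds until the next row
        DATEDIFF(second, [DateTime], LEAD([DateTime]) OVER (ORDER BY ID)) as SecsToNext
    from @t
)
select ID, [DateTime], PID, TIU
from Gaps
where SecsSincePrev > 300 -- first row after a gap
   or SecsToNext > 300    -- last row before a gap
order by ID
Against the sample data this returns IDs 3, 4, 7 and 8 in a single pass, and unlike the CTE above it drops ID 1 automatically, since its NULL LAG result never satisfies the predicate.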


How to write a CTE to aggregate hierarchical values

I want to write expressions in sqlite to process a tree of items, starting with the leaf nodes (the bottom) and proceeding back to their parents all the way to the root node (the top), such that each parent node is updated based on the content of its children. I've been able to write a CTE that does something similar, but isn't yet totally correct.
I have a simple table "test1" containing some nested values:
id | parent | value | total
---+--------+-------+------
1 | NULL | NULL | NULL
2 | 1 | NULL | NULL
3 | 2 | NULL | NULL
4 | 3 | 50 | NULL
5 | 3 | 50 | NULL
6 | 2 | 60 | NULL
7 | 6 | 90 | NULL
8 | 6 | 60 | NULL
Rows may have children that reference their parent via their parent field. Rows may have a value of their own as well as child rows, or they may simply be parents without values (i.e. "wrappers"). The leaves are the rows without any children.
For each row I'd like to calculate the total as the average of the row's value (if not null) AND its children's totals. This should start with the leaf nodes and proceed up the tree to their parents, all the way to the root node at the top of the hierarchy.
I've tried a number of variations of CTEs but am having difficulty writing one that will recursively calculate these totals from the bottom up.
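For reference, a minimal sketch of the schema and sample data, reconstructed by me from the table above (assuming SQLite):
CREATE TABLE test1 (
    id     INTEGER PRIMARY KEY,
    parent INTEGER REFERENCES test1(id), -- NULL for the root
    value  INTEGER,                      -- NULL for "wrapper" rows
    total  INTEGER                       -- to be calculated
);
INSERT INTO test1 (id, parent, value, total) VALUES
    (1, NULL, NULL, NULL),
    (2, 1, NULL, NULL),
    (3, 2, NULL, NULL),
    (4, 3, 50, NULL),
    (5, 3, 50, NULL),
    (6, 2, 60, NULL),
    (7, 6, 90, NULL),
    (8, 6, 60, NULL);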
Currently, I have:
UPDATE test1 SET total = (
    WITH RECURSIVE cte(cte_id, cte_parent, cte_value, cte_total) AS (
        SELECT test1.id, test1.parent, test1.value, test1.total
        UNION ALL
        SELECT t.id, t.parent, t.value, t.total FROM test1 t, cte
        WHERE cte.cte_id = t.parent
    ) SELECT AVG(cte_value) FROM cte
);
which produces:
id | parent | value | total
---+--------+-------+------
1 | NULL | NULL | 62
2 | 1 | NULL | 62
3 | 2 | NULL | 50
4 | 3 | 50 | 50
5 | 3 | 50 | 50
6 | 2 | 60 | 70
7 | 6 | 90 | 90
8 | 6 | 60 | 60
Looking at the top-most rows, this is not quite right, since it's taking an average over not only the row's immediate children but all of the row's descendants. This causes row 2, for example, to have a total of 62 instead of 60. The expected result would set row 2's total to 60, as the average of its immediate child rows 3 and 6. Row 1's total would be 60 as well.
How can I calculate a "total" value for each row, based on an average of the row's value and the values of its immediate children only, while ensuring the upper levels of the hierarchy are correctly populated based on the calculated totals of their children?
It turns out that a very similar question and solution was posted here:
How can I traverse a tree bottom-up to calculate a (weighted) average of node values in PostgreSQL?
Since sqlite3 doesn't let you create functions, the example using a recursive CTE applies:
with recursive cte(id, parent, value, level, total) as (
    select
        t.id, t.parent, t.value,
        0,
        t.value as total
    from test1 t
    where not exists (
        select id
        from test1
        where parent = t.id)
    union all
    select
        t.id, t.parent, t.value,
        c.level + 1,
        case when t.value is null then c.total else t.value end
    from test1 t
    join cte c on t.id = c.parent
)
select id, parent, value, avg(total) total from (
    select
        id, parent, value, level, avg(total) total
    from cte
    group by id, parent, level
)
group by id, parent
order by id

Duplicate elimination with time

For my smart home setup I created a database, but I made a mistake during programming: the application posts data into the database twice. I want to delete all rows that are duplicates. By duplicate I mean a tuple whose data is identical to the previous row of the same type. I mark the duplicates in this example with "<<"; please also pay attention to the last 3 rows. I want to keep the first new data row, so I want to delete all the duplicates after it. I hope you can help me solve my problem.
SmartHome=# select * from sensordata order by time desc Limit 21;
type | data | time
------+------+----------------------------
8 | 2459 | 2019-08-09 23:10:39.530087 <<
8 | 2459 | 2019-08-09 23:10:39.356908
8 | 2445 | 2019-08-09 23:05:39.933269 <<
8 | 2445 | 2019-08-09 23:05:39.789173
10 | 6105 | 2019-08-09 22:50:50.40792 <<
10 | 6105 | 2019-08-09 22:50:50.096132
8 | 2459 | 2019-08-09 22:50:41.429681 <<
8 | 2459 | 2019-08-09 22:50:41.357483
8 | 2474 | 2019-08-09 22:45:42.13396 <<
8 | 2474 | 2019-08-09 22:45:41.813046
10 | 6221 | 2019-08-09 22:40:51.107709 <<
10 | 6221 | 2019-08-09 22:40:51.076903
10 | 6105 | 2019-08-09 22:35:51.737255 <<
10 | 6105 | 2019-08-09 22:35:51.544886
10 | 6221 | 2019-08-09 22:30:52.493895 <<
10 | 6221 | 2019-08-09 22:30:51.795203
8 | 2459 | 2019-08-09 22:30:43.193447 <<
8 | 2459 | 2019-08-09 22:30:43.045599
10 | 6105 | 2019-08-09 22:25:52.571793 << Duplicate like them above
10 | 6105 | 2019-08-09 22:25:52.442844 << Also a Duplicate with much more
10 | 6105 | 2019-08-09 22:20:51.356846 time between the rows
(21 rows)
SmartHome=# \d sensordata
Table "public.sensordata"
Column | Type | Modifiers
--------+-----------------------------+------------------------
type | integer | not null
data | character varying(20) | not null
time | timestamp without time zone | not null default now()
Indexes:
"smarthome_idx" UNIQUE, btree (type, "time")
Foreign-key constraints:
"sensordata_type_fkey" FOREIGN KEY (type) REFERENCES sensortype(id)
If I run
with a as (
    select *, row_number() over (partition by type, data order by time) from sensordata
)
select * from a where row_number = 1 order by time desc;
the output is:
10 | 17316 | 2019-08-09 09:43:46.938507 | 1
10 | 18276 | 2019-08-09 09:38:47.129788 | 1
10 | 18176 | 2019-08-09 09:33:47.889064 | 1
10 | 17107 | 2019-08-08 10:36:11.383106 | 1
10 | 17921 | 2019-08-08 09:56:15.889191 | 1
10 | 17533 | 2019-08-03 09:30:11.047639 | 1
That's not what I mean :/ (sorry, I don't know how to mark things as a code block in a comment, therefore this way)
There are many possible ways to do this. The fastest is often a correlated subquery but I can never remember the syntax so I normally go for window functions, specifically row_number().
If you run
Select *, row_number() over (partition by type, data order by time) from sensordata
That should give a version of your table where all the rows you want to keep have a number 1 and the duplicates are numbered 2,3,4... Use that same field in a delete query and you'll be sorted.
EDIT: I understand now you only want to remove duplicates that occur sequentially within the same type. This can also be achieved using row_number and a join. This query should give you only the data you want.
WITH s as (
    SELECT *, row_number() over (partition by type order by time) as rnum
    FROM sensordata
)
SELECT a.*
FROM s a
JOIN s b
    ON a.rnum = b.rnum + 1 AND a.type = b.type
WHERE NOT a.data = b.data
This might need a slight tweak to avoid missing the very first entry if that's important.
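One way to apply that tweak (a sketch of mine, not part of the original answer): switch to a LEFT JOIN so the first row of each type, which has no rnum predecessor, is kept as well:
WITH s as (
    SELECT *, row_number() over (partition by type order by time) as rnum
    FROM sensordata
)
SELECT a.type, a.data, a.time
FROM s a
LEFT JOIN s b
    ON a.rnum = b.rnum + 1 AND a.type = b.type
WHERE b.type IS NULL       -- the first entry of each type has no predecessor
   OR NOT a.data = b.data  -- or the data actually changed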
You can identify the duplicates using lag():
select t.*
from (select t.*,
lag(time) over (partition by type, data order by time) as prev_time
from sensordata t
) t
where prev_time > time - interval '10 minute';
This assumes an interval of 10 minutes for a duplicate. You don't specify in the question.
Then, you can delete these using a join. It would be better if you had a serial primary key, so you knew that data was unique. But it seems reasonable to assume that type/data/time will be unique.
So:
delete from sensordata t
using (select t2.*,
              lag(time) over (partition by type, data order by time) as prev_time
       from sensordata t2
      ) tt
where tt.prev_time > t.time - interval '10 minute' and
      tt.type = t.type and
      tt.data = t.data and
      tt.time = t.time;
Note that if you do have exact duplicates in the table, then you might end up duplicating all the rows for a given combination, including the original one.
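In Postgres specifically, the system column ctid can stand in for the missing serial primary key and sidesteps that exact-duplicate problem entirely. A sketch of mine, not part of the answer above, using the same 10-minute rule:
DELETE FROM sensordata
WHERE ctid IN (
    SELECT ctid
    FROM (SELECT ctid, time,
                 lag(time) over (partition by type, data order by time) as prev_time
          FROM sensordata) s
    WHERE prev_time > time - interval '10 minute'
);
Because ctid identifies the physical row, even two rows with identical type, data and time are deleted independently, so the original row survives.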
If you want to select or delete records based on the existence of other records, EXISTS() is your friend.
First:
define exactly what you consider a set of duplicates
decide which one you want to keep, and which ones to delete
Let's assume that you want to delete the more recent records, but only if the data did not change between the one to keep and the one to delete.
Select the ones to delete:
SELECT *
FROM sensordata d
WHERE EXISTS(                   -- if there exists a record with...
    SELECT * FROM sensordata x
    WHERE x.type = d.type       -- the same type
      AND x.data = d.data       -- the same data
      AND x.time < d.time       -- but: older
      AND NOT EXISTS(           -- and NO record exists with...
          SELECT * FROM sensordata nx
          WHERE nx.type = d.type    -- the same type
            AND nx.data <> d.data   -- but DIFFERENT data
            AND nx.time < d.time    -- between the one to delete
            AND nx.time > x.time    -- and the oldest one
      )
);
If that looks OK, you can use exactly the same condition in a DELETE statement.
DELETE FROM sensordata d
WHERE EXISTS(                   -- if there exists a record with...
    SELECT * FROM sensordata x
    WHERE x.type = d.type       -- the same type
      AND x.data = d.data       -- the same data
      AND x.time < d.time       -- but: older
      AND NOT EXISTS(           -- and NO record exists with...
          SELECT * FROM sensordata nx
          WHERE nx.type = d.type    -- the same type
            AND nx.data <> d.data   -- but DIFFERENT data
            AND nx.time < d.time    -- between the one to delete
            AND nx.time > x.time    -- and the oldest one
      )
);

Select last record from data table for each device in devices table [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 6 years ago.
I have a problem with the execution speed of my SQL query against a Postgres database.
I have 2 tables:
table 1: DEVICES
ID | NAME
------------------
1 | first device
2 | second device
table 2: DATA
ID | DEVICE_ID | TIME | DATA
--------------------------------------------
1 | 1 | 2016-07-14 2:00:00 | data1
2 | 1 | 2016-07-14 1:00:00 | data2
3 | 2 | 2016-07-14 4:00:00 | data3
4 | 1 | 2016-07-14 3:00:00 | data4
5 | 2 | 2016-07-14 6:00:00 | data5
6 | 2 | 2016-07-14 5:00:00 | data6
I need to get this select's result:
ID | DEVICE_ID | TIME | DATA
-------------------------------------------
4 | 1 | 2016-07-14 3:00:00 | data4
5 | 2 | 2016-07-14 6:00:00 | data5
i.e. for each device in the devices table I need to get only the one data record with the latest TIME value.
This is my sql query:
SELECT * FROM db.data d
WHERE d.time = (
SELECT MAX(d2.time) FROM db.data d2
WHERE d2.device_id = d.device_id);
This is the HQL equivalent:
SELECT d FROM Data d
WHERE d.time = (
    SELECT MAX(d2.time) FROM Data d2
    WHERE d2.device.id = d.device.id)
Yes, I use Hibernate ORM in my project - maybe this info will be useful for someone.
I get the correct result from both queries, BUT they are too slow - about 5-10 seconds with 10k records in the data table and only 2 devices in the devices table. That's terrible.
At first I thought the problem was Hibernate, but a native SQL query run from psql in a Linux terminal executes in the same time as it does through Hibernate.
How can I optimize my query? Its complexity is roughly:
O(device_count * data_count^2)
Since you're using Postgres, you could use window functions to achieve this, like so:
select
sq.id,
sq.device_id,
sq.time,
sq.data
from (
select
data.*,
row_number() over (partition by data.device_id order by data.time desc) as rnk
from
data
) sq
where
sq.rnk = 1
The row_number() window function first ranks the rows in the data table on the basis of the device_id and time columns, and the outer query then picks the highest-ranked rows.
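As a Postgres-specific alternative (my addition, not part of the answer above), DISTINCT ON expresses "latest row per device" directly; the index name below is hypothetical:
-- Keep only the first row per device_id once sorted by time descending,
-- i.e. the latest record for each device:
SELECT DISTINCT ON (device_id) id, device_id, time, data
FROM data
ORDER BY device_id, time DESC;

-- A composite index helps either approach avoid a full sort:
CREATE INDEX idx_data_device_time ON data (device_id, time DESC);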

Joining series of dates and counting continuous days

Let's say I have a table as below
date add_days
2015-01-01 5
2015-01-04 2
2015-01-11 7
2015-01-20 10
2015-01-30 1
what I want to do is to check the days_balance, i.e. whether each date is greater or smaller than the previous date + N days (add_days), and take the cumulative sum of day counts if they form a continuous series.
So the algorithm should work like
for i in 2:N_rows {
    days_balance[i] := date[i-1] + add_days[i-1] - date[i]
    if days_balance[i] >= 0 then
        date[i] := date[i] + days_balance[i]
}
The expected result should be as follows
date days_balance
2015-01-01 0
2015-01-04 2
2015-01-11 -3
2015-01-20 -2
2015-01-30 0
Is it possible in pure SQL? I imagine it should be with some conditional joins, but cannot see how it could be implemented.
I'm posting another answer since it may be nice to compare them, as they use different methods (this one just does an n^2-style join; the other one used a recursive CTE). This one takes advantage of the fact that you don't have to calculate the days_balance for each previous row before calculating it for a particular row - you just need to sum values from previous days....
drop table junk
create table junk(date DATETIME, add_days int)
insert into junk values
('2015-01-01',5 ),
('2015-01-04',2 ),
('2015-01-11',7 ),
('2015-01-20',10 ),
('2015-01-30',1 )
;WITH cte as
(
select ROW_NUMBER() OVER (ORDER BY date) i, date, add_days, ISNULL(DATEDIFF(DAY, LAG(date) OVER (ORDER BY date), date), 0) days_since_prev
FROM Junk
)
, combinedWithAllPreviousDaysCte as
(
select i [curr_i], date [curr_date], add_days [curr_add_days], days_since_prev [curr_days_since_prev], 0 [prev_add_days], 0 [prev_days_since_prev] from cte where i = 1 --get first row explicitly since it has no preceding rows
UNION ALL
select curr.i [curr_i], curr.date [curr_date], curr.add_days [curr_add_days], curr.days_since_prev [curr_days_since_prev], prev.add_days [prev_add_days], prev.days_since_prev [prev_days_since_prev]
from cte curr
join cte prev on curr.i > prev.i --join to all previous days
)
select curr_i, curr_date, SUM(prev_add_days) - curr_days_since_prev - SUM(prev_days_since_prev) [days_balance]
from combinedWithAllPreviousDaysCte
group by curr_i, curr_date, curr_days_since_prev
order by curr_i
outputs:
+--------+-------------------------+--------------+
| curr_i | curr_date | days_balance |
+--------+-------------------------+--------------+
| 1 | 2015-01-01 00:00:00.000 | 0 |
| 2 | 2015-01-04 00:00:00.000 | 2 |
| 3 | 2015-01-11 00:00:00.000 | -3 |
| 4 | 2015-01-20 00:00:00.000 | -5 |
| 5 | 2015-01-30 00:00:00.000 | -5 |
+--------+-------------------------+--------------+
Well, I think I have it with a recursive CTE (sorry, I only have Microsoft SQL Server available to me at the moment, so it may not comply with PostgreSQL).
Also I think the expected results you had were off (see comment above). If not, this can probably be modified to conform to your math.
drop table junk
create table junk(date DATETIME, add_days int)
insert into junk values
('2015-01-01',5 ),
('2015-01-04',2 ),
('2015-01-11',7 ),
('2015-01-20',10 ),
('2015-01-30',1 )
;WITH cte as
(
select ROW_NUMBER() OVER (ORDER BY date) i, date, add_days, ISNULL(DATEDIFF(DAY, LAG(date) OVER (ORDER BY date), date), 0) days_since_prev
FROM Junk
)
,recursiveCte (i, date, add_days, days_since_prev, days_balance, math) as
(
select top 1
i,
date,
add_days,
days_since_prev,
0 [days_balance],
CAST('no math for initial one, just has zero balance' as varchar(max)) [math]
from cte where i = 1
UNION ALL --recursive step now
select
curr.i,
curr.date,
curr.add_days,
curr.days_since_prev,
prev.days_balance - curr.days_since_prev + prev.add_days [days_balance],
CAST(prev.days_balance as varchar(max)) + ' - ' + CAST(curr.days_since_prev as varchar(max)) + ' + ' + CAST(prev.add_days as varchar(max)) [math]
from cte curr
JOIN recursiveCte prev ON curr.i = prev.i + 1
)
select i, DATEPART(day,date) [day], add_days, days_since_prev, days_balance, math
from recursiveCTE
order by date
And the results are like so:
+---+-----+----------+-----------------+--------------+------------------------------------------------+
| i | day | add_days | days_since_prev | days_balance | math |
+---+-----+----------+-----------------+--------------+------------------------------------------------+
| 1 | 1 | 5 | 0 | 0 | no math for initial one, just has zero balance |
| 2 | 4 | 2 | 3 | 2 | 0 - 3 + 5 |
| 3 | 11 | 7 | 7 | -3 | 2 - 7 + 2 |
| 4 | 20 | 10 | 9 | -5 | -3 - 9 + 7 |
| 5 | 30 | 1 | 10 | -5 | -5 - 10 + 10 |
+---+-----+----------+-----------------+--------------+------------------------------------------------+
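Since the recurrence telescopes - each row's balance is the running sum of earlier add_days minus the days elapsed since the first date - a non-recursive window-function version is also possible. A sketch of mine in PostgreSQL syntax (assuming date is a date column, so subtraction yields whole days; it reproduces the -5 values above, not the question's expected output):
SELECT date,
       COALESCE(SUM(add_days) OVER (ORDER BY date
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0)
       - (date - MIN(date) OVER ()) AS days_balance
FROM junk;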
I don't quite get how your algorithm returns your expected results, but let me share a technique I came up with that might help.
This will only work if the end result of your data is to be exported to Excel, and even then it won't work in all scenarios depending on what format you export your dataset in, but here it is....
If you're familiar with Excel formulas, what I discovered is that if you write an Excel formula into your SQL as another field, Excel will execute that formula for you as soon as you export (the method that works best for me is copying and pasting into Excel, so that it doesn't format it as text).
So for your example, here's what you could do (noting again that I don't understand your algorithm, so this is probably wrong, but it's just to give you the concept):
SELECT
date
, add_days
, '=INDEX($1:$65536,ROW()-1,COLUMN()-2)'
||'+INDEX($1:$65536,ROW()-1,COLUMN()-1)'
||'-INDEX($1:$65536,ROW(),COLUMN()-2)'
AS "days_balance[i]"
,'=IF(INDEX($1:$65536,ROW(),COLUMN()-1)>=0'
||',INDEX($1:$65536,ROW(),COLUMN()-3)'
||'+INDEX($1:$65536,ROW(),COLUMN()-1))'
AS "date[i]"
FROM
myTable
ORDER BY /*Ensure to order by whatever you need for your formula to work*/
The key to making this work is using the INDEX formula function to select a cell based on the position of the current cell. ROW()-1 tells it to take the result from the previous row, and COLUMN()-2 means take the value from two columns to the left of the current one. You can't use plain cell references like A2+B2-A3 because the row numbers won't change on export, and that would assume fixed column positions.
I used SQL string concatenation with || just so it's easier to read on screen.
I tried this in Excel; it didn't match your expected results. But if this technique works for you, then just correct the Excel formula to suit.

Is it possible to temporarily duplicate and modify rows on the fly in an SQL SELECT query?

I've just received a new data source for my application which inserts data into a Derby database only when it changes. Normally, missing data is fine - I'm drawing a line chart with the data (value over time), and I'd just draw a line between the two points, extrapolating the expected value at any given point. The problem is that as missing data in this case means "draw a straight line," the graph would be incorrect if I did this.
There are two ways I could fix this: I could create a new class that handles missing data differently (which could be difficult due to the way prefuse, the drawing library I'm using, handles drawing), or I could duplicate the rows, leaving the y value the same while changing the x value in each row. I could do this in the Java that bridges the database and the renderer, or I could modify the SQL.
My question is, given a result set like the one below:
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 10 | 2000-01-01 08:00:05 |
| 11 | 2000-01-01 08:00:07 |
| 2 | 2000-01-01 08:00:13 |
| 4 | 2000-01-01 08:00:16 |
+-------+---------------------+
Assuming I query it at 8:00:20, how can I make it look like the following using SQL? Basically, I'm duplicating each row for every second up to the next second that's already taken. received is, for all intents and purposes, unique (it's not, but it will be due to the WHERE clause in the query).
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 7 | 2000-01-01 08:00:01 |
| 7 | 2000-01-01 08:00:02 |
| 7 | 2000-01-01 08:00:03 |
| 7 | 2000-01-01 08:00:04 |
| 10 | 2000-01-01 08:00:05 |
| 10 | 2000-01-01 08:00:06 |
| 11 | 2000-01-01 08:00:07 |
| 11 | 2000-01-01 08:00:08 |
| 11 | 2000-01-01 08:00:09 |
| 11 | 2000-01-01 08:00:10 |
| 11 | 2000-01-01 08:00:11 |
| 11 | 2000-01-01 08:00:12 |
| 2 | 2000-01-01 08:00:13 |
| 2 | 2000-01-01 08:00:14 |
| 2 | 2000-01-01 08:00:15 |
| 4 | 2000-01-01 08:00:16 |
| 4 | 2000-01-01 08:00:17 |
| 4 | 2000-01-01 08:00:18 |
| 4 | 2000-01-01 08:00:19 |
| 4 | 2000-01-01 08:00:20 |
+-------+---------------------+
Thanks for your help.
Due to the set-based nature of SQL, there's no simple way to do this. I have used two solution strategies:
a) use a loop to go from the initial to the end datetime, get the value for each step, and insert that into a temp table
b) generate a table (normal or temporary) with the 1-minute increments; adding the base datetime to its values generates the steps.
Example of approach b) (SQL Server version)
Let's assume we will never query more than 24 hours of data. We create a table intervals that has a dttm field with the minute count for each step. That table must be populated beforehand (a sketch follows the query).
select dateadd(minute, dttm, '2000-01-01 08:00') received,
       (select top 1 value
        from data
        where received <= dateadd(minute, dttm, '2000-01-01 08:00')
        order by received desc) value
from intervals
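A hypothetical one-off population of that intervals helper table (my sketch, not the answer's code; 1,440 rows cover 24 hours of 1-minute steps):
create table intervals (dttm int not null)
go
declare @i int
set @i = 0
while @i < 1440
begin
    insert into intervals (dttm) values (@i)
    set @i = @i + 1
end
go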
It seems like in this case you really don't need to generate all of these data points. Would it be correct to generate the following instead? If it's drawing a straight line, you don't need to generate a data point for each second, just two for each reading: one at the current time, one right before the next time. This example subtracts 5 ms from the next time, but you could make it a full second if you need to.
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 7 | 2000-01-01 08:00:04 |
| 10 | 2000-01-01 08:00:05 |
| 10 | 2000-01-01 08:00:06 |
| 11 | 2000-01-01 08:00:07 |
| 11 | 2000-01-01 08:00:12 |
| 2 | 2000-01-01 08:00:13 |
| 2 | 2000-01-01 08:00:15 |
| 4 | 2000-01-01 08:00:16 |
| 4 | 2000-01-01 08:00:20 |
+-------+---------------------+
If that's the case, then you can do the following:
SELECT * FROM
(SELECT * from TimeTable as t1
UNION
SELECT t2.value, dateadd(ms, -5, t2.received)
from ( Select t3.value, (select top 1 t4.received
from TimeTable t4
where t4.received > t3.received
order by t4.received asc) as received
from TimeTable t3) as t2
UNION
SELECT top 1 t6.value, GETDATE()
from TimeTable t6
order by t6.received desc
) as t5
where received IS NOT NULL
order by t5.received
The big advantage of this is that it is a set based solution and will be much faster than any iterative approach.
You could just walk a cursor, keep vars for the last value and time returned, and if the current row is more than a second ahead, loop one second at a time using the previous value and the new time until you reach the current row's time.
Trying to do this in SQL would be painful, and if you went and created the missing data, you would possibly have to add a column to track real vs. interpolated data points.
Better would be to have a table for each axial value you want to have on the graph, and then either join to it or even just put the data field there and update that record when/if values arrive.
The "missing values" problem is quite extensive, so I suggest you have a solid policy.
One thing that will happen is that you will have multiple adjacent slots with missing values.
This would be much easier if you could transform it into OLAP data.
Create a simple table that has all the minutes (warning, will run for a while):
Create Table Minutes(Value DateTime Not Null)
Go
Declare @D DateTime
Set @D = '1/1/2000'
While (Year(@D) < 2002)
Begin
    Insert Into Minutes(Value) Values(@D)
    Set @D = DateAdd(Minute, 1, @D)
End
Go
Create Clustered Index IX_Minutes On Minutes(Value)
Go
You can then use it somewhat like this:
Select
    Received = Minutes.Value,
    Value = (Select Top 1 Data.Value
             From Data
             Where Data.Received <= Minutes.Value
             Order By Data.Received Desc)
From
    Minutes
Where
    Minutes.Value Between @Start And @End
I would recommend against solving this in SQL/the database due to the set-based nature of it.
Also, you are dealing with seconds here, so I guess you could end up with a lot of rows with the same repeated data that would have to be transferred from the database to your application.
One way to handle this is to left join your data against a table that contains all of the received values. Then, when there is no value for that row, you calculate what the projected value should be based on the previous and next actual values you have.
You didn't say what database platform you are using. In SQL Server, I would create a User Defined Function that accepts a start datetime and end datetime value. It would return a table value with all of the received values you need.
I have simulated it below, which runs in SQL Server. The subselect aliased r is what would actually get returned by the user defined function.
select r.received,
isnull(d.value,(select top 1 data.value from data where data.received < r.received order by data.received desc)) as x
from (
select cast('2000-01-01 08:00:00' as datetime) received
union all
select cast('2000-01-01 08:00:01' as datetime)
union all
select cast('2000-01-01 08:00:02' as datetime)
union all
select cast('2000-01-01 08:00:03' as datetime)
union all
select cast('2000-01-01 08:00:04' as datetime)
union all
select cast('2000-01-01 08:00:05' as datetime)
union all
select cast('2000-01-01 08:00:06' as datetime)
union all
select cast('2000-01-01 08:00:07' as datetime)
union all
select cast('2000-01-01 08:00:08' as datetime)
union all
select cast('2000-01-01 08:00:09' as datetime)
union all
select cast('2000-01-01 08:00:10' as datetime)
union all
select cast('2000-01-01 08:00:11' as datetime)
union all
select cast('2000-01-01 08:00:12' as datetime)
union all
select cast('2000-01-01 08:00:13' as datetime)
union all
select cast('2000-01-01 08:00:14' as datetime)
union all
select cast('2000-01-01 08:00:15' as datetime)
union all
select cast('2000-01-01 08:00:16' as datetime)
union all
select cast('2000-01-01 08:00:17' as datetime)
union all
select cast('2000-01-01 08:00:18' as datetime)
union all
select cast('2000-01-01 08:00:19' as datetime)
union all
select cast('2000-01-01 08:00:20' as datetime)
) r
left outer join Data d on r.received = d.received
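Assuming SQL Server 2005 or later, a recursive CTE can generate the same range of seconds without spelling each one out. This is a sketch of mine, not the answer's code, against the same Data table:
;with r (received) as (
    select cast('2000-01-01 08:00:00' as datetime)
    union all
    select dateadd(second, 1, received)
    from r
    where received < '2000-01-01 08:00:20'
)
select r.received,
       isnull(d.value, (select top 1 d2.value
                        from Data d2
                        where d2.received < r.received
                        order by d2.received desc)) as value
from r
left outer join Data d on r.received = d.received
option (maxrecursion 0) -- lift the default 100-level recursion cap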
If you were in SQL Server, then this would be a good start. I am not sure how close Apache's Derby is to SQL Server.
Usage: EXEC ElaboratedData '2000-01-01 08:00:00','2000-01-01 08:00:20'
CREATE PROCEDURE [dbo].[ElaboratedData]
    @StartDate DATETIME,
    @EndDate DATETIME
AS
--if not a valid interval, just quit
IF @EndDate <= @StartDate BEGIN
    SELECT 0;
    RETURN;
END;
/*
Store the value of 1 second locally, for readability
--*/
DECLARE @OneSecond FLOAT;
SET @OneSecond = (1.00000000/86400.00000000);
/*
create a temp table w/the same structure as the real table.
--*/
CREATE TABLE #SecondIntervals(TSTAMP DATETIME, DATAPT INT);
/*
For each second in the interval, check to see if we have a known value.
If we do, then use that. If not, make one up.
--*/
DECLARE @CurrentSecond DATETIME;
SET @CurrentSecond = @StartDate;
WHILE @CurrentSecond <= @EndDate BEGIN
    DECLARE @KnownValue INT;
    SET @KnownValue = NULL; --reset each pass, or a stale value carries over
    SELECT @KnownValue = DATAPT
    FROM TESTME
    WHERE TSTAMP = @CurrentSecond;
    IF (@KnownValue IS NULL) BEGIN
        --ok, we have to make up a fake value
        DECLARE @MadeUpValue INT;
        /*
        *******Put whatever logic you want to make up a fake value here
        --*/
        SET @MadeUpValue = 99;
        INSERT INTO #SecondIntervals(
            TSTAMP
            ,DATAPT
        )
        VALUES(
            @CurrentSecond
            ,@MadeUpValue
        );
    END; --if we had to make up a value
    SET @CurrentSecond = @CurrentSecond + @OneSecond;
END; --while looking thru our values
--finally, return our generated values + real values
SELECT TSTAMP, DATAPT FROM #SecondIntervals
UNION ALL
SELECT TSTAMP, DATAPT FROM TESTME
ORDER BY TSTAMP;
GO
As just an idea, you might want to check out Anthony Molinaro's SQL Cookbook, chapter 9. He has a recipe, "Filling in Missing Dates" (pages 278-281), that discusses primarily what you are trying to do. It requires some sort of sequential handling, either via a helper table or by running the query recursively. While he doesn't have examples for Derby directly, I suspect you could adapt them to your problem (particularly the PostgreSQL or MySQL one; it seems somewhat platform agnostic).