How to get each set of 10 rows as a single row through a SQL query over a large data set?

I have a table containing multiple rows for the same ID, each with a different timestamp (at 6-minute intervals) and the temperatures recorded at that timestamp.
ID   Time_Stamp           Temperature_1  Temperature_2
101  18-09-2020 17:05:40  98.50          87.63
101  18-09-2020 17:11:40  96.60          46.3
101  18-09-2020 17:17:40  80.50          65.30
101  18-09-2020 17:23:40  65.30          77.21
101  18-09-2020 17:29:40  36.20          63.30
101  18-09-2020 17:35:40  69.30          54.70
..... up to 614 rows
Output should be:
ID   Time_Stamp           Avg_Temperature_1  Avg_Temperature_2
101  18-09-2020 17:29:40  98.50              87.63
101  18-09-2020 18:29:40  96.60              46.3
101  18-09-2020 19:29:40  80.50              65.30
..... up to 61 rows
Elaboration:
Let's assume the table has 614 rows.
First I fetch all the data for the ID in ascending order by timestamp (e.g. select * from table where id = 101 order by time_stamp asc).
Now I have 614 rows, but I only want complete sets of 10, so I round down to the nearest multiple of 10: with 614 rows I consider only the first 610 (likewise 219 rows -> 210, 220 -> 220, 155 -> 150, 314 -> 310, and so on).
After taking those 610 rows I divide them into sets of 10, so the final output has only 61 rows (one per set of 10).
Note that each output row then shows the average for one hour: the data arrives every 6 minutes, so 10 consecutive rows span one hour (6 * 10 = 60 minutes).
So finally I have to take each set of 10 rows, find the average of each temperature column, and represent the set as a single row.
For the Time_Stamp column we can take any middle value of the set of 10: either the 4th, 5th or 6th one.
Each temperature column should be the average over the 10 rows.
In short, I have to show the average temperature per one-hour interval, i.e. per set of 10 rows.
How do I write a SQL query for this?
What I have tried so far (I thought of writing a stored procedure for this):
Step 1: fetch all the data, and the FLOOR(COUNT(id) / 10) value:
SELECT * FROM table WHERE id = 1 ORDER BY Time_Stamp ASC
and then
SELECT FLOOR(COUNT(id) / 10) FROM table_name WHERE id = 1 (to decide how many times the loop should execute)
This returns 61.
Step 2: loop up to n times (here 61 times).
Within each iteration I limit the fetch to 10 rows and take the average of each temperature column with respect to the ID (but I am unable to include the timestamp).
I find the averages for the first 10 rows of an ID with:
select id, avg(Temperature_1) as TempAVG1, avg(Temperature_2) as TempAVG2
from table_name
where Time_Stamp >= TO_DATE('18-09-2020 17:05:40', 'DD-MM-YYYY HH24:MI:SS')
  and Time_Stamp <= TO_DATE('18-09-2020 18:05:40', 'DD-MM-YYYY HH24:MI:SS')
  and id = 101
group by id
Here I am unable to include the timestamp (the 4th, 5th or 6th one of each set of 10).
So I tried to write another query that returns only the timestamp, planning to UNION it with the first query, but I am unable to UNION the two because the average columns and the timestamp column have different data types (and the column lists are not the same).
I also cannot work out how to leave off the last odd rows (e.g. when only 1 to 9 rows remain at the end).
Please suggest a more efficient way to write a query for this, or help me finish this stored procedure.
A mix of SQL and C# code (e.g. using a DataTable) would also be welcome.
Technologies I am using: C# and an Oracle database
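A set-based query can do all of this in one pass, with no loop: number the rows per ID in timestamp order, put every 10 consecutive rows into a bucket, keep only the complete buckets, and take the 5th row's timestamp as the representative one. A sketch of the idea for Oracle (the table name readings is an assumption; adjust names to your schema):
SELECT id,
       MAX(CASE WHEN MOD(rn, 10) = 5 THEN time_stamp END) AS time_stamp, -- 5th row of each set
       AVG(temperature_1) AS avg_temperature_1,
       AVG(temperature_2) AS avg_temperature_2
FROM (SELECT r.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_stamp) AS rn
      FROM readings r
      WHERE id = 101)
GROUP BY id, CEIL(rn / 10)
HAVING COUNT(*) = 10 -- drops the trailing incomplete set (e.g. rows 611-614)
ORDER BY MIN(time_stamp);
CEIL(rn / 10) assigns rows 1-10 to bucket 1, rows 11-20 to bucket 2, and so on, and the HAVING clause discards the leftover 1 to 9 rows at the end, so 614 input rows yield exactly 61 output rows.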

Related

Postgresql query to retrieve last inserted row for each id

Suppose I have a PostgreSQL database table with data as follows (some random values used here for lat and long).
id  date        lat  long
1   2014-02-01  10   20.12
1   2014-02-01  20   30
1   2014-02-01  12   14
2   2014-02-02  12   16
2   2014-02-02  18   22
3   2014-06-12  23   10
3   2014-06-12  15   12
3   2014-06-12  85   72
The date column has the same date for each id (e.g. id 1 has 2014-02-01 for all rows). How do I retrieve the last inserted lat and long (coordinates) for each id in this case?
You cannot. SQL tables represent unordered (multi)sets. That is, there is no ordering and no "last insert", unless a column stores that information.
Two typical methods of storing this information:
An identity column (or equivalently a serial column).
A created timestamp (which isn't perfect because there can be duplicates).
If you had such a column, then Postgres offers distinct on, which is a very convenient way to do this:
select distinct on (id) t.*
from t
order by id, <ordering col> desc;
You may want to recreate your data using an auto-generated identity column so you can capture the insertion order.
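A minimal sketch of that idea (the table name t is an assumption; the seq column is the addition, and identity columns need Postgres 10 or later):
create table t (
    seq  bigint generated always as identity, -- captures insertion order
    id   int,
    date date,
    lat  numeric,
    long numeric
);

select distinct on (id) id, lat, long
from t
order by id, seq desc;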
SELECT COL1,MAX(COL2)
FROM TABLEA
GROUP BY COL1
Maybe you need this?
SELECT DISTINCT ON (id) id, lat, long
FROM tableName
ORDER BY id;
This query gives you one lat and long per id (though, as explained above, without an ordering column there is no guarantee it is the last inserted one).

Convert single row into multiple rows Bigquery SQL

Convert a row into multiple rows in BigQuery SQL.
The number of rows depends on a particular column value (in this case, the value of delta_unit / 60):
Source table:
ID time delta_unit
101 2019-06-18 01:00:00 60
102 2019-06-18 01:01:00 60
103 2019-06-18 01:03:00 120
ID 102 recorded a time at 01:01:00 and the next record was at 01:03:00.
So we are missing a record that should have been at 01:02:00 with delta_unit = 60.
Expected table:
ID time delta_unit
101 2019-06-18 01:00:00 60
102 2019-06-18 01:01:00 60
104 2019-06-18 01:02:00 60
103 2019-06-18 01:03:00 60
A new row is created based on the delta_unit. The number of rows that need to be created will depend on the value delta_unit/60 (in this case, 120/60 = 2)
I have found a solution to your problem. First run
SELECT max(delta/60) as max_a FROM `<projectid>.<dataset>.<table>`
to compute the maximum number of steps, then run the following loop:
DECLARE a INT64 DEFAULT 1;
WHILE a <= 2 DO -- 2 = max_a (change accordingly)
  INSERT INTO `<projectid>.<dataset>.<table>` (id, time, delta)
  SELECT id + 1, TIMESTAMP_ADD(time, INTERVAL a MINUTE), delta - 60 * a
  FROM `<projectid>.<dataset>.<table>`
  WHERE delta > 60 * a;
  SET a = a + 1;
END WHILE;
Of course this is not efficient, but it gets the job done. The IDs and deltas do not end up at exactly the right values, but they should not be needed: the deltas would all end up at 60 (so the column can be deleted) and the IDs can be recreated from the timestamps to keep them ordered.
You might try using a conditional expression here to avoid the loop, going through the table only once.
I have tried
INSERT INTO `<projectid>.<dataset>.<table>` (id,time,delta)
SELECT id+1, CASE
WHEN delta>80 THEN TIMESTAMP_ADD(time, INTERVAL 1 MINUTE)
WHEN delta>150 THEN TIMESTAMP_ADD(time, INTERVAL 2 MINUTE)
END
,60
FROM
(SELECT id,time,delta
FROM `<projectid>.<dataset>.<table>`
)
WHERE delta > 60;
but it fails because CASE only returns the first condition whose WHEN is true. So I am not sure whether it is possible to do it all at once. If you have small tables, I would stick with the first approach, which works fine.
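For what it's worth, BigQuery can also do this in a single pass with no loop, by cross joining each row against a generated array of steps. A sketch (same table and column names as above; unlike the loop it produces the rows as a query result rather than INSERTing them, and it leaves the IDs to be recreated from the timestamps as described above):
SELECT id,
       TIMESTAMP_SUB(time, INTERVAL step MINUTE) AS time,
       60 AS delta
FROM `<projectid>.<dataset>.<table>`,
     UNNEST(GENERATE_ARRAY(0, DIV(delta, 60) - 1)) AS step;
A row with delta = 60 generates only step 0 (the row itself), while a row with delta = 120 generates steps 0 and 1, i.e. the original row plus the missing one a minute earlier.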

SQL Query to Count IDs

I'm trying to write a SQL Query in MS Access to count the number of times each ID appears in a data set. The data set is formatted as follows:
ID Time
1 12345
1 12346
1 12350
2 99999
2 99999
If the Time for one ID is within 3 seconds of another Time for that same ID, I only want it to be counted once. So the results should look like this:
ID Count
1 2
2 1
The time column is not formatted as a datetime, so I can't use the datediff function. Any help would be appreciated.
This:
SELECT ID, COUNT(newtime)
FROM (SELECT DISTINCT ID, Time\3 AS newtime FROM times)
GROUP BY ID
groups the Time field values into buckets of three, using Access's integer division in Time\3.
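Checking this against the sample data: 12345\3 = 4115 and 12346\3 = 4115 (so those two rows collapse into one bucket), 12350\3 = 4116, and both 99999 rows give 33333, which yields the expected counts of 2 for ID 1 and 1 for ID 2.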
The comment provided by @Andy G worked for my purposes:
"You first need a function to round up (or down) to the nearest multiple of 3. See here (allenbrowne) for example."
I rounded the time values to the nearest multiple of 3, and counted based on that criteria.
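A sketch of what that could look like (Int() truncates, so Int([Time]/3)*3 rounds each value down to the nearest multiple of 3; the table name times is taken from the answer above):
SELECT ID, COUNT(*) AS CountOfTimes
FROM (SELECT DISTINCT ID, Int([Time] / 3) * 3 AS RoundedTime FROM times) AS T
GROUP BY ID;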

Postgres sql select result based on a ranking derived from a text column

Scratching my head here. I have a very simple Postgres table from which I need to select a unique row per day, based solely on a text column that updates as follows:
first update = 'AA1', second update = 'AB', third update = 'D4'
id item date run value
---------------------------------------
23 apple 01/01/16 AA1 232
25 apple 01/01/16 AB 254
26 apple 01/01/16 D4 212
Depending on the time of day, running a query based on the date ('01/01/2016') would return 1, 2 or 3 rows. However, I only need the latest row, e.g. run = 'D4' above.
How can I write a simple select query that always returns just the latest row based on a text column? I presume I need to create a ranking based on the 'run' column, but I'm not sure how to do this.
regards
Using the handy distinct on:
select distinct on (date) *
from t
order by date, run desc
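If you want the explicit ranking you mentioned, a window function gives the same result (a sketch against the same table t; note that in either version ordering by run desc only works as long as the run codes happen to sort alphabetically in update order):
select *
from (select t.*,
             row_number() over (partition by date order by run desc) as rn
      from t) ranked
where rn = 1;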

Query to count records within time range SQL or Access Query

I have a table that looks like this:
Row,TimeStamp,ID
1,2014-01-01 06:01:01,5
2,2014-01-01 06:00:03,5
3,2014-01-01 06:02:00,5
4,2014-01-01 06:02:39,5
What I want to do is count the number of records for each ID, however I don't want to count records if a subsequent TimeStamp is within 30 seconds.
So in my above example the total count for ID 5 would be 3, because it wouldn't count Row 2 because it is within 30 seconds of the last timestamp.
I am building a Microsoft Access application, and currently using a Query, so this query can either be an Access query or a SQL query. Thank you for your help.
I think the query below does what you want, however I don't understand your expected output. It returns a count of 4 (all the rows in your example), which I believe is correct because all of your records are at least 30 seconds apart: no timestamp has a subsequent timestamp within 30 seconds of it.
Row 2, with a timestamp of '2014-01-01 06:00:03', is not within 30 seconds of any timestamp coming after it. The closest is row #1, which is 58 seconds later, and 58 is greater than 30, so I don't see why it should be excluded given your explanation of what you want.
Rows 1/3/4 of your example data are also not within 30 seconds of each other.
Here is a test of the SQL below; like I said, it returns all 4 rows (change it to a count if you want the count, I brought back the rows to illustrate):
http://sqlfiddle.com/#!3/0d727/20/0
Now check this example with some added data: (I added a fifth row)
http://sqlfiddle.com/#!3/aee67/1/0
insert into tbl values ('2014-01-01 06:01:01',5);
insert into tbl values ('2014-01-01 06:00:03',5);
insert into tbl values ('2014-01-01 06:02:00',5);
insert into tbl values ('2014-01-01 06:02:39',5);
insert into tbl values ('2014-01-01 06:02:30',5);
Note how the query result shows only 3 rows. That is because the row I added (#5) is within 30 seconds of row #3, so #3 is excluded. Row #5 also gets excluded because row #4 is 9 seconds (<=30) later than it. Row #4 does come back because no subsequent timestamp is within 30 seconds (there are no subsequent timestamps at all).
Query to get the detail:
select *
from tbl t
where not exists
(select 1
from tbl x
where x.id = t.id
and x.timestamp > t.timestamp
and datediff(second, t.timestamp, x.timestamp) <= 30)
Query to get the count by ID:
select id, count(*)
from tbl t
where not exists
(select 1
from tbl x
where x.id = t.id
and x.timestamp > t.timestamp
and datediff(second, t.timestamp, x.timestamp) <= 30)
group by id
To the best of my knowledge it is impossible to do with just a SQL statement as presented.
I use two approaches:
For small result sets, remove the surplus records inside your time windows in code, then calculate the relevant statistics. The main advantage to this approach is you do not have to alter the database structure.
Add a field to flag each record relative to the time window, then use code to preprocess your data & fill the indicator. You can now use SQL to aggregate / filter based on the new flag column. If you need to track multiple time windows, you can use multiple flags / multiple columns (e.g. 30 second window, 600 second window, etc)
For this case I'd recommend the second approach: it lets the database (SQL) do more of the work once the preprocessing step is done.
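With the second approach, the final count then collapses to a plain query once the flag is filled in (a sketch; InWindow30 is a hypothetical yes/no column set by the preprocessing code for records that have a subsequent timestamp within 30 seconds):
SELECT ID, COUNT(*) AS CountOfRows
FROM tbl
WHERE InWindow30 = False
GROUP BY ID;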