Convert a single row into multiple rows in BigQuery SQL

Convert a row into multiple rows in BigQuery SQL.
The number of rows depends on a particular column value (in this case, the value of delta_unit/60):
Source table:
ID time delta_unit
101 2019-06-18 01:00:00 60
102 2019-06-18 01:01:00 60
103 2019-06-18 01:03:00 120
ID 102 recorded a time at 01:01:00 and the next record was at 01:03:00.
So we are missing a record that should have been at 01:02:00 with delta_unit = 60.
Expected table:
ID time delta_unit
101 2019-06-18 01:00:00 60
102 2019-06-18 01:01:00 60
104 2019-06-18 01:02:00 60
103 2019-06-18 01:03:00 60
A new row is created based on the delta_unit. The number of rows that need to be created depends on the value of delta_unit/60 (in this case, 120/60 = 2, i.e. one extra row).

I have found a solution to your problem. First, run
SELECT max(delta/60) as max_a FROM `<projectid>.<dataset>.<table>`
to compute the maximum number of steps. Then run the following loop:
DECLARE a INT64 DEFAULT 1;
WHILE a <= 2 DO  -- 2 = max_a (change accordingly)
  INSERT INTO `<projectid>.<dataset>.<table>` (id, time, delta)
  SELECT id + 1, TIMESTAMP_ADD(time, INTERVAL a MINUTE), delta - 60 * a
  FROM `<projectid>.<dataset>.<table>`
  WHERE delta > 60 * a;
  SET a = a + 1;
END WHILE;
Of course this is not the most efficient approach, but it gets the job done. The IDs and deltas do not end up at exactly the right values, but they should not be needed: the deltas should all be 60 in the final table (so the column can simply be dropped), and the IDs can be recreated from the timestamps to keep them ordered.
You might try using a conditional expression here to avoid the loop and go through the table only once.
I have tried
INSERT INTO `<projectid>.<dataset>.<table>` (id, time, delta)
SELECT id + 1,
       CASE
         WHEN delta > 80 THEN TIMESTAMP_ADD(time, INTERVAL 1 MINUTE)
         WHEN delta > 150 THEN TIMESTAMP_ADD(time, INTERVAL 2 MINUTE)
       END,
       60
FROM `<projectid>.<dataset>.<table>`
WHERE delta > 60;
but it fails because CASE only returns the result of the first WHEN condition that is true, so each source row still produces at most one new row. I am not sure it is possible to do it all at once this way. If you have small tables I would stick to the first approach, which works fine.
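That said, a fully set-based alternative would be to expand each row with GENERATE_ARRAY and UNNEST, avoiding both the loop and the repeated INSERTs. A minimal, untested sketch under the same schema (note it rewrites the whole table in one SELECT rather than inserting the missing rows):
-- Each source row emits one output row per minute it covers:
-- a row with delta = 120 produces offsets 0 and 1.
SELECT
  ROW_NUMBER() OVER (ORDER BY TIMESTAMP_ADD(time, INTERVAL step MINUTE)) AS id,
  TIMESTAMP_ADD(time, INTERVAL step MINUTE) AS time,
  60 AS delta
FROM `<projectid>.<dataset>.<table>`,
     UNNEST(GENERATE_ARRAY(0, CAST(delta / 60 AS INT64) - 1)) AS step;
This also regenerates the IDs from the timestamps, matching the cleanup step described above.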

Related

Select unique IDs and divide result into X minute intervals based on given timespan

I'm trying to knock some dust off my good old SQL skills, but I'm afraid I need a push in the right direction to turn them into something useful when it comes to BigQuery statements.
I'm currently working with a single table schema looking like this:
In the query I would like to be able to supply the following in my WHERE clause:
1. The date from which I would like the results to stem.
2. A time range - in the above example this range would be from 20:00 to 21:00. If 1. and 2. in this list should be merged together, that's also fine.
3. The eventId I would like to find records for.
4. Optionally, the interval frequency - whether it should be divided into e.g. 5, 10 or 15 minute intervals.
Also I would like to count the unique userIds for each interval. If one user is present during the entire session, he/she should be included in the count of every interval.
So think of it as the following:
How many unique users did we have every 5 minutes at X event, between 20:00 and 21:00 at Y day?
How should my query look if I want a result looking (something) like the following pseudo result:
time_interval number_of_unique_userIds
1 2022-03-16 20:00:00 10
2 2022-03-16 20:05:00 12
3 2022-03-16 20:10:00 15
4 2022-03-16 20:15:00 20
5 2022-03-16 20:20:00 30
6 ... etc.
If the time of the query is before the provided end time of the timespan, it should fill out the rest of the interval rows with 0 unique userIds.
In the following result we've executed the mentioned query earlier than the provided end time - let's say it's executed at 20:49:
time_interval number_of_unique_userIds
X 2022-03-16 20:50:00 0
X 2022-03-16 20:55:00 0
X 2022-03-16 21:00:00 0
Here's what I have so far, but it gives me several of the same interval records with what looks like each userId:
SELECT
TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(creationTime), 5*60)) time_interval,
COUNT(DISTINCT(userId)) number_of_unique_userIds
FROM `bigquery.table`
WHERE eventId = 'xyz'
AND creationTime > '2022-03-16 20:00:00' AND creationTime < '2022-03-16 21:00:00'
GROUP BY time_interval
ORDER BY time_interval DESC
This gives me somewhat what I expect, but the number_of_unique_userIds seems too low, so I'm a little worried that I'm not getting unique userIds for each interval. What I'm thinking is that userIds that were counted in the first 5-minute interval are not counted in the next, so I'm not sure this query is sufficient for my needs. Also it's not filling the blanks with 0 number_of_unique_userIds.
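For the gap filling, I imagine it needs something like generating the full list of intervals first and LEFT JOINing the counts onto it. A rough, untested sketch of what I mean (GENERATE_TIMESTAMP_ARRAY is the BigQuery function I would try for this):
WITH intervals AS (
  SELECT interval_start
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY(
         TIMESTAMP '2022-03-16 20:00:00',
         TIMESTAMP '2022-03-16 21:00:00',
         INTERVAL 5 MINUTE)) AS interval_start
),
counts AS (
  SELECT
    TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(creationTime), 5*60)) AS interval_start,
    COUNT(DISTINCT userId) AS number_of_unique_userIds
  FROM `bigquery.table`
  WHERE eventId = 'xyz'
  GROUP BY interval_start
)
SELECT i.interval_start AS time_interval,
       IFNULL(c.number_of_unique_userIds, 0) AS number_of_unique_userIds
FROM intervals i
LEFT JOIN counts c USING (interval_start)
ORDER BY time_interval;
But I'm not sure this addresses the distinct-count worry above.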
I hope you can help me out here.
Thanks!

How to get the data of every 10 rows as a single row through a SQL query among a large set of data?

I have a table that consists of multiple rows for the same ID, each with a different timestamp (at intervals of 6 minutes) in one column, plus the temperatures recorded at each timestamp.
ID Time_Stamp Temperature_1 Temperature_2
101 18-09-2020 17:05:40 98.50 87.63
101 18-09-2020 17:11:40 96.60 46.3
101 18-09-2020 17:17:40 80.50 65.30
101 18-09-2020 17:23:40 65.30 77.21
101 18-09-2020 17:29:40 36.20 63.30
101 18-09-2020 17:35:40 69.30 54.70
..... up to 614 rows
Output should be:
ID Time_Stamp Avg_Temperature_1 Avg_Temperature_2
101 18-09-2020 17:29:40 98.50 87.63
101 18-09-2020 18:29:40 96.60 46.3
101 18-09-2020 19:29:40 80.50 65.30
..... up to 61 rows
Elaboration:
Let's assume the table has 614 rows.
I first have to fetch all the data in ascending order by time_stamp (e.g. SELECT * FROM table WHERE id = 101 ORDER BY time_stamp ASC).
Now I have 614 rows of data, but I have to consider only the nearest lower multiple of 10. E.g. with 614 rows I consider only 610; similarly, with 219 rows I consider only 210 (with 220, all 220), with 155 only 150, with 314 only 310, and so on.
After taking those 610 rows I divide them into sets of 10, so that my final output has only 61 rows (one per set of 10).
Also note that each set of 10 rows represents the average of one hour: the data arrives every 6 minutes, so 10 consecutive rows cover 6 * 10 = 60 minutes.
So finally I have to take each set of 10 rows, find the average of each temperature column, and represent the set as a single row.
Note that for the time_stamp column we can take any middle value of the set of 10: either the 4th, 5th or 6th one.
The temperature columns should be the average over the 10 rows.
I have to show the average temperature data for each 1-hour interval, i.e. for each set of 10 rows.
How to write a SQL query for this?
What I tried so far is as below - for this I thought to write a stored procedure:
Step 1: fetch all the data and the FLOOR(COUNT(id) / 10) value:
SELECT * FROM table_name WHERE id = 101 ORDER BY Time_Stamp ASC
and then
SELECT FLOOR(COUNT(id) / 10) FROM table_name WHERE id = 101
(for deciding how many times the loop should execute). This will return a value of 61.
Step 2: loop up to n times (here 61 times).
Within each iteration I limit to 10 rows and take the average of the temperatures.
In each loop I find the averages of the columns with respect to id (but I am unable to include the timestamp).
I used the below for finding the averages of the first 10 rows:
select id, avg(Temperature_1) as TempAVG1, avg(Temperature_2) as TempAVG2
from table_name
where Time_Stamp >= TO_DATE('18-09-2020 17:05:40', 'DD-MM-YYYY HH24:MI:SS')
and Time_Stamp <= TO_DATE('18-09-2020 18:05:40', 'DD-MM-YYYY HH24:MI:SS')
and id = 101
group by id
Here I'm unable to include the timestamp (the 4th, 5th or 6th one of the set of 10).
So I tried to write another query for finding only the timestamp, intending to UNION it with the first query, but I am unable to UNION both queries because the average columns and the time column have different data types (and the column lists are not the same).
Also I cannot work out how to leave off the last odd rows (e.g. if only 1 to 9 rows are left at the end).
Please suggest a more efficient way to write a query for this, or help me write this stored procedure.
A mix of query and C# code (e.g., using a DataTable) is also welcome.
Technologies I am using: C# and an Oracle database
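For reference, a set-based alternative that avoids the loop and the stored procedure entirely could use ROW_NUMBER to number the rows per ID and then group them in tens. A minimal, untested sketch, assuming the table is named readings (a hypothetical name) and Time_Stamp is a DATE column:
WITH numbered AS (
  SELECT id,
         Time_Stamp,
         Temperature_1,
         Temperature_2,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY Time_Stamp) AS rn,
         COUNT(*)     OVER (PARTITION BY id)                     AS total_rows
  FROM readings
  WHERE id = 101
)
SELECT id,
       -- take the 5th timestamp of each set of 10 as the representative time
       MAX(CASE WHEN MOD(rn - 1, 10) = 4 THEN Time_Stamp END) AS Time_Stamp,
       AVG(Temperature_1) AS Avg_Temperature_1,
       AVG(Temperature_2) AS Avg_Temperature_2
FROM numbered
WHERE rn <= FLOOR(total_rows / 10) * 10  -- drop the leftover 1-9 trailing rows
GROUP BY id, FLOOR((rn - 1) / 10)
ORDER BY MIN(rn);
Each group key FLOOR((rn - 1) / 10) collects exactly 10 consecutive rows, so the averages and the 5th timestamp line up with the sets described above.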

Find list of dates in a table closest to specific date from different table

I have a list of unique IDs in one table that has a date column. Example:
TABLE1
ID Date
0 2018-01-01
1 2018-01-05
2 2018-01-15
3 2018-01-06
4 2018-01-09
5 2018-01-12
6 2018-01-15
7 2018-01-02
8 2018-01-04
9 2018-02-25
Then in another table I have a list of different values that appear multiple times for each ID with various dates.
TABLE 2
ID Value Date
0 18 2017-11-28
0 24 2017-12-29
0 28 2018-01-06
1 455 2018-01-03
1 468 2018-01-16
2 55 2018-01-03
3 100 2017-12-27
3 110 2018-01-04
3 119 2018-01-10
3 128 2018-01-30
4 223 2018-01-01
4 250 2018-01-09
4 258 2018-01-11
etc
I want to find the value in table 2 that is closest to the unique date in table 1.
Sometimes table 2 does contain a value that matches the date exactly and I have had no problem in pulling through those values. But I can't work out the code to pull through the value closest to the date requested from table 1.
My desired result based on the examples above would be
ID Value Date
0 24 2017-12-29
1 455 2018-01-03
2 55 2018-01-03
3 110 2018-01-04
4 250 2018-01-09
Since I can easily find the IDs with an exact match, one thing I have tried is taking the IDs that don't have an exact date match and placing them, with their corresponding values, into a temporary table, then trying to find the closest possible match from there. But it's here that I'm not sure where to begin on the coding.
Apologies if I'm missing a basic function or clause for this, I'm still learning!
The below would be one method:
WITH Table1 AS(
SELECT ID, CONVERT(date, datecolumn) DateColumn
FROM (VALUES (0,'20180101'),
(1,'20180105'),
(2,'20180115'),
(3,'20180106'),
(4,'20180109'),
(5,'20180112'),
(6,'20180115'),
(7,'20180102'),
(8,'20180104'),
(9,'20180225')) V(ID, DateColumn)),
Table2 AS(
SELECT ID, [value], CONVERT(date, datecolumn) DateColumn
FROM (VALUES (0,18 ,'2017-11-28'),
(0,24 ,'2017-12-29'),
(0,28 ,'2018-01-06'),
(1,455,'2018-01-03'),
(1,468,'2018-01-16'),
(2,55 ,'2018-01-03'),
(3,100,'2017-12-27'),
(3,110,'2018-01-04'),
(3,119,'2018-01-10'),
(3,128,'2018-01-30'),
(4,223,'2018-01-01'),
(4,250,'2018-01-09'),
(4,258,'2018-01-11')) V(ID, [Value],DateColumn))
SELECT T1.ID,
T2.[Value],
T2.DateColumn
FROM Table1 T1
CROSS APPLY (SELECT TOP 1 *
FROM Table2 ca
WHERE T1.ID = ca.ID
ORDER BY ABS(DATEDIFF(DAY, ca.DateColumn, T1.DateColumn))) T2;
Note that if the difference in days is the same, the row returned is arbitrary (and could differ each time the query is run). For example, if Table1 had the date 20180804 and Table2 had the dates 20180803 and 20180805, they would both have the value 1 for ABS(DATEDIFF(DAY, ca.DateColumn, T1.DateColumn)). You therefore might need to include additional logic in your ORDER BY to ensure consistent results.
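For instance, one possible deterministic tiebreaker (this example prefers the later date; adjust to taste) would be:
ORDER BY ABS(DATEDIFF(DAY, ca.DateColumn, T1.DateColumn)), ca.DateColumn DESC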
Dude, I'll say a couple of things here for you to consider, since SQL Server is not my comfort zone, while SQL itself is.
First of all, I'd join TABLE1 with TABLE2 on ID. That way, I can specify the following tuple in my SELECT clause:
SELECT ID, Value, DateDiff(d, T1.Date, T2.Date) qt_diff_days
Obviously, depending on the precision of the dates kept there (whether they have times or not), you can change the date part used in the DateDiff function.
Going forward, I'd also make this date difference an absolute number (to resolve positive/negative differences and consider only the elapsed time).
After that (and here's where it gets tricky, because I don't know which SQL Server version you're using), I'd basically use the ROW_NUMBER window function to rank all my rows by that difference. Something like the following:
SELECT
ID, Value, Abs(DateDiff(d, T1.Date, T2.Date)) qt_diff_days,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Abs(DateDiff(d, T1.Date, T2.Date)) ASC) nu_row
ROW_NUMBER (Transact-SQL)
Numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
If you run ROW_NUMBER properly, you should notice the query ranks its data per ID, starting at 1 and increasing the rank with the difference between the two dates, resetting the rank to 1 when the ID changes.
After that, all you need to do is select only those rows where nu_row equals 1. I'd use a CTE for that.
WITH common_table_expression (Transact-SQL)
Specifies a temporary named result set, known as a common table expression (CTE).
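Putting those pieces together, an untested sketch of the full query (assuming the table and column names shown above) might be:
WITH ranked AS (
    SELECT T1.ID,
           T2.Value,
           T2.Date,
           ROW_NUMBER() OVER (PARTITION BY T1.ID
                              ORDER BY Abs(DateDiff(d, T1.Date, T2.Date)) ASC) AS nu_row
    FROM TABLE1 T1
    JOIN TABLE2 T2 ON T2.ID = T1.ID
)
SELECT ID, Value, Date
FROM ranked
WHERE nu_row = 1;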

Working smarter not harder - setting all columns between two values SQL

I currently have a table that contains a personID, a date, and a set of times throughout the day in 15-minute intervals; it looks like this:
Table
PersonID | Date | [09:00 - 09:15] | [09:15 - 09:30] | .... | [17:45 - 18:00]
Each time column contains an integer (0 as default).
I'm updating the table to include information provided from another table that contains event information. E.g. a person may be in an event from 09:00 - 17:45, and I would want to increment the integer value stored in the respective time columns by 1. Rather than write a LOT of statements to incorporate the various permutations of possible events throughout the day, it seems that I should be able to update the columns between the start and end time; I'm just unsure how to do this.
I would want to do something like the following:
UPDATE Table1
SET
(SELECT Column_names FROM Table1 WHERE ColumnNameStartTime >=
Table2.StartTime AND ColumnNameEndTime <= Table2.EndTime)
=ColumnName + 1
WHERE Table1.PersonID = Table2.PersonID and Table1.Date = Table2.Date
Is this even possible?
A more practical table design might be:
PersonID Date StartTime End Time Value
1 2017-11-07 09:00:01 09:15:00 0
1 2017-11-07 09:15:01 09:30:00 0
1 2017-11-07 09:30:01 09:45:00 0
2 2017-11-07 09:00:01 09:15:00 0
2 2017-11-07 09:15:01 09:30:00 0
But choose your data types carefully and be wary of boundary issues when matching your times:
Column Data type
Date date
StartTime time
EndTime time
Now if you have a source event table you can update like this:
UPDATE Target
SET Value = 1
FROM Source
WHERE Source.EventTime BETWEEN Target.StartTime AND Target.EndTime
  AND Source.PersonID = Target.PersonID
  AND Source.Date = Target.Date
If you need to count events, it's a bit more complicated: you need a calendar table defining the time windows. Then you can join to it, put events into buckets, and update the table with the resulting counts.
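As a rough, untested sketch of that counting variant (assuming the slot table above is named Target and the event table Source, with an EventTime column; all hypothetical names):
-- Count the events falling into each person/date/time slot,
-- then add those counts onto the existing values.
UPDATE t
SET t.Value = t.Value + c.EventCount
FROM Target t
JOIN (
    SELECT tg.PersonID, tg.Date, tg.StartTime, COUNT(*) AS EventCount
    FROM Target tg
    JOIN Source s
      ON s.PersonID = tg.PersonID
     AND s.Date = tg.Date
     AND s.EventTime BETWEEN tg.StartTime AND tg.EndTime
    GROUP BY tg.PersonID, tg.Date, tg.StartTime
) c ON c.PersonID = t.PersonID
   AND c.Date = t.Date
   AND c.StartTime = t.StartTime;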

Find median of a list of values in Access 2010 - SQL or VBA

I have a list of time-intervals (in seconds) between consecutive datetime-stamped records in a dataset in Access 2010. I want to find the median time interval for each Animal on each Date.
Please can someone tell me how to go about this - either in SQL or VBA?
Example data:
Animal Date Time_interval
1 18/07/14 1
1 18/07/14 18
1 18/07/14 100
1 18/07/14 121
1 18/07/14 156
2 18/07/14 14
2 18/07/14 35
(I also have a field for Time, not included here to keep things simple)
Thanks very much!!
You could run a query to compare the two date/time entries using the DateDiff function.
Here is the setup for DateDiff:
DateDiff(interval, date1, date2, [firstdayofweek], [firstweekofyear])
From what I understand, create a new query and add a field like this:
median_time_interval: DateDiff("s", Date, Time)
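Since Access has no built-in Median aggregate, the median itself still has to be computed on top of those intervals. One possible approach in pure Access SQL, shown for a single Animal and Date (an untested sketch assuming the data sits in a table named tblIntervals, a hypothetical name; for an even number of rows this returns the lower of the two middle values):
SELECT Max(Time_interval) AS MedianInterval
FROM (
    SELECT TOP 50 PERCENT Time_interval
    FROM tblIntervals
    WHERE Animal = 1 AND [Date] = #07/18/2014#
    ORDER BY Time_interval
) AS LowerHalf;
To get the median for every Animal/Date combination at once, a small VBA function that walks a sorted recordset per group is usually the easier route.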