Format change when transferring data from Excel to BigQuery - SQL

In Excel, I have a column labeled ride_length which is computed from two other columns containing timestamps (end time - start time).
Example values: 0:06:40, 1:48:08, 34:56:57
I formatted these cells as TIME 37:30:55
After uploading the data to BigQuery, the column is typed as STRING and not TIME.
What am I doing wrong?

To convert the duration in Excel to a value in seconds, use this formula (Excel stores times as fractions of a day, so multiplying by 24*3600 = 86400 yields seconds):
=(A2-A3)*24*3600
BigQuery can parse a string and transform it into a value. However, a TIME value can be at most 24 hours, which is why a duration like 34:56:57 cannot be stored as TIME. Therefore, I would transform the duration into a value in seconds.
select A,
  # commented out because %H cannot parse an hours part above 23:
  # time(parse_timestamp("%H:%M:%S", A)) as time_h_less_24,
  # the %Y trick parses the hours part as a "year" so values above 23 do not break the parse;
  # TIME_DIFF then measures the remaining minutes and seconds against midnight
  3600*cast(split(A,":")[offset(0)] as int64) + TIME_DIFF(time(parse_timestamp("%Y:%M:%S", A)), "0:0:0", SECOND) as duration_in_s,
  TIMESTAMP_MILLIS(1000*3600*cast(split(A,":")[offset(0)] as int64) + TIME_DIFF(time(parse_timestamp("%Y:%M:%S", A)), "0:0:0", MILLISECOND)) as duration_as_timestamp
from
  (select "23:56:57" as A)

Related

Timeseries data query - optimizing query performance

Quick question on optimizing a query pattern we use a lot when working with time-series data provided by a data logging system.
Database is SQL Server 2019 (v15) and for simplification assume the table is made up of just:
ID (bigint) - unique ID for the row
Timestamp (bigint) - Unix timestamp value.
Sample (float) - Value of sample taken (e.g. temperature measurement).
There is no regular interval or spacing with respect to timestamp, as the data logger only logs data on a change to the data point being monitored (i.e. there is no reliable way to determine when a previous sample was taken).
Anyway, our queries often involve selecting a range of data between two timestamps, but, as expected, the timestamps selected as the bounds for the range rarely line up exactly with a timestamp in the data set. Because of this, what we really need to select is all the data in the range plus one record immediately before the range (so we know what the data value is leading into the selected range).
Historically we have done this one of two ways:
Select the rows between the timestamps (inclusive) and union this with a top(1) select of the first row with a timestamp <= the range start.
OR
Select the top(1) timestamp <= the range start into a variable and then do a select statement with this new timestamp as the lower bound for the range. (Both approaches are sketched below.)
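For concreteness, the two approaches look roughly like this (a sketch only; the table dbo.Samples and the variables @t1/@t2 are hypothetical stand-ins for the schema above):

-- Option 1: range select unioned with the single row just before the range
select ID, [Timestamp], Sample
from dbo.Samples
where [Timestamp] between @t1 and @t2
union all
select s.ID, s.[Timestamp], s.Sample
from (
    select top (1) ID, [Timestamp], Sample
    from dbo.Samples
    where [Timestamp] < @t1          -- strictly before, so a boundary row is not duplicated
    order by [Timestamp] desc
) as s;

-- Option 2: find the preceding timestamp first, then widen the lower bound
declare @lower bigint = (
    select top (1) [Timestamp]
    from dbo.Samples
    where [Timestamp] <= @t1
    order by [Timestamp] desc);

select ID, [Timestamp], Sample
from dbo.Samples
where [Timestamp] between isnull(@lower, @t1) and @t2;  -- fall back to @t1 if no earlier row exists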
Since I am not an expert, I'm wondering if either one of these methods performs better than the other, or if there is maybe some better third option we haven't encountered.
Thanks!

Wanting a DATEDIFF formula to get the difference in seconds between each transtype

Was hoping to get some insight into what I should include in a DATEDIFF formula to calculate the difference in time between columns and rows, using the "timestamp" column as the date and the "transtype" column as the modifier. So something along the lines of "timestamp (TR) - timestamp (FIN)". I've attached a screenshot of what the data looks like.
I'm essentially trying to find a formula that calculates the time between each "transaction" or "transtype", per "incident" or "inci_id".
Thank you for your time and assistance.
Data example
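A general pattern for this kind of row-to-row difference (a sketch only; SQL Server syntax is assumed, and dbo.Transactions is a hypothetical stand-in for the table in the screenshot) is LAG combined with DATEDIFF, partitioned per incident:

select inci_id,
       transtype,
       [timestamp],
       datediff(second,
                lag([timestamp]) over (partition by inci_id order by [timestamp]),
                [timestamp]) as seconds_since_prev  -- NULL for the first transaction of each incident
from dbo.Transactions;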

Google Sheets Query - fill zeros for dates when no data was recorded

How do I change my query in this Google Sheet so that it generates a table that has a column for every week, even if there is no data for that week? (display 0 in the values fields)
The problem I'm running into is that if there is a week or date with no data, it's not listed. I would like to generate a row/column with 0 for dates without data.
This is how the query is currently written:
=QUERY(A3:D9,"Select A, sum(C) group by A pivot D")
Here's the sheet (hyperlinked so you can see it).
The basic problem you need to solve is to know which data pieces are missing. Do you need the entries for every single day in a given date range? Only weekdays? Only weekdays, except public holidays? etc.
Once you know that, you can insert the missing data in the query itself, by concatenating the source table with literal data as below (where I'm manually adding a combination of Red with Nov 5), or with the result set of another query/formula that gives you the missing data:
=QUERY({A3:D9; {"Red", date(2018,11,5), 0, weeknum(date(2018,11,5))}},
"Select Col1, sum(Col3) group by Col1 pivot Col4")

Difference in minutes from datetime

I'm trying to obtain the total time difference from two timestamp columns (datetime).
I currently have a Table 1 set up like the following:
Time_Line_Down => datetime
Time_Line_Ran => datetime
Total_Downtime => Computed column with formula:
case
    when [Time_Line_Down] IS NULL then NULL
    else CONVERT([varchar], case when [Time_Line_Ran] IS NULL then NULL else [Time_Line_Ran] - [Time_Line_Down] end, (108))
end
Every time certain conditions occur, I am copying those three columns (I have more columns, but the problem is with these ones) into another Table 2, originally set up like the following:
Time_Line_Down => datetime
Time_Line_Ran => datetime
Total_Downtime => datetime
I then use an Excel spreadsheet to "Get External Data" from SQL Server and use a pivot table to work with the data.
Example
Time_Line_Down = 2015-02-20 12:32:40.000
Time_Line_Ran = 2015-02-20 12:34:40.000
Total_Downtime = 1900-01-01 00:02:00.000
Desired Output
I want the pivot table to be able to give me a Grand Total of downtime from all rows in that table
Let's say it was forty-five hours, fifty minutes, and thirty seconds of accumulated downtime; it should read like (45:50:30).
The problem:
Even if I format the Total_Downtime column in the Excel pivot table as h:mm:ss to read like this:
Total_Downtime = 0:02:00
As rows accumulate and the Grand Total is calculated, the "Date" part of the timestamp skews the result once the total exceeds 24 hours.
What I have tried
I changed the data type of column Total_Downtime in Table 2 to time(0) so that it won't store the "Date" part, only the "Time" part of the timestamp. That works and reads out as 00:02:00.
But now all the values in my pivot table in Excel for that column are 0:00:00, no matter what value is actually in the SQL table.
Any suggestions?
You can use the Excel time format [h]:mm:ss which can go beyond 24 hours.
Alternatively, you can use the SQL function DATEDIFF to get the total downtime in seconds, and then convert that to however you need to display it in Excel, e.g.
case when [Time_Line_Down] IS NULL then NULL else case when [Time_Line_Ran] IS NULL then NULL else datediff(ss, Time_Line_Down, Time_Line_Ran) end end
I don't think you need the CASE statements here, you can just use
datediff(ss, Time_Line_Down, Time_Line_Ran)
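On the Excel side, a seconds total can then be displayed as a duration again with something like =A2/86400 (86400 being the number of seconds in a day; the cell reference is just an example) combined with the [h]:mm:ss format mentioned above.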
Thank you all for your help,
I went ahead and tried the DATEDIFF function as suggested: I changed the Table 1 computed column formula, and the Table 2 Total_Downtime column data type to int. Once imported into Excel, this numeric value needed some extra calculations.
In principle this is the best answer, and it should work for anyone trying to calculate the difference between two timestamps; as mentioned before, it is pretty straightforward.
But in my situation I needed to maintain two things:
1) The format 00:00:00 for the column Total_Downtime in Table 1, which changed to an integer value when using DATEDIFF
2) The pivot table Total_Downtime column format [h]:mm:ss (suggested by TobyLL) in excel, which required several calculations to convert from seconds
Solution
After learning that every time I copied from Table 1 to Table 2 the computed value (e.g. 00:02:00) changed to 1900-01-01 00:02:00.000, and that when imported to Excel it equaled 1.001388889, I decided to force the "Date" part of the timestamp to be 1899-12-31 so that Excel would calculate the Grand Total in the pivot table using only the "Time" (decimal) part.
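For reference, one way to express that date shift directly in the computed column (a sketch only, reusing the column names above; datetime subtraction and DATEADD are SQL Server specific):

case
    when [Time_Line_Down] IS NULL or [Time_Line_Ran] IS NULL then NULL
    else dateadd(day, -1, [Time_Line_Ran] - [Time_Line_Down])
end

Subtracting two datetime values yields 1900-01-01 plus the elapsed time, so stepping back one day lands on 1899-12-31, which Excel imports as a pure time fraction (0.001388889 for two minutes).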

How to convert a number to a date in Oracle SQL Developer

I have an Excel-format dataset that needs to be imported into a table. One column is a date, but the data is stored in number format, such as 41275. When importing the data, I tried to choose the date format yyyy-mm-dd, which gives an error: not a valid month. I also tried MM/DD/YYYY, which gives another error: day of month must be between 1 and last day of month. Does anyone know what this number is and how I can convert it to a date format when importing it into the database? Thanks!
The expression you are looking for (with respect to Excel's leap year bug that AmmoQ mentions: Excel incorrectly treats 1900 as a leap year, so serial number 60 corresponds to the nonexistent 29-FEB-1900 and all later serials are shifted by one day) is:
case
    when yourNumberToBeImported <= 59 then date '1899-12-31' + yourNumberToBeImported
    else date '1899-12-30' + yourNumberToBeImported
end
Then, you may either
Create a (global) temporary table in your Oracle DB, load your data from Excel into that table, and then reload the data from the temporary table into your target table with the above calculation included,
or you may
Load the data from Excel into a persistent table in your Oracle DB and create a view over the persistent table which contains the above calculation (a sketch of this option follows below).
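A minimal sketch of the view option, assuming a hypothetical staging table stg_import holding the raw Excel serial in a column date_num:

create or replace view import_v as
select t.*,
       case
           when t.date_num <= 59 then date '1899-12-31' + t.date_num
           else date '1899-12-30' + t.date_num
       end as converted_date
from stg_import t;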
The number you got is the Excel representation of a certain date ...
Excel stores a date as the number of days, starting to count at a certain date ... to be precise:
1 = 1-JAN-1900
2 = 2-JAN-1900
...
30 = 30-JAN-1900
so, to get your Excel number into an Oracle date, you might want to try something like this (using the 1899-12-30 base, since a modern serial such as 41275 sits above the leap-year-bug boundary described in the other answer):
to_date('1899-12-30','yyyy-mm-dd') + 41275