Assume I don't have access to the underlying code that's producing the table. And that I'm relatively new to this.
The timestamp column has the format "2021-08-26T11:14:08.251Z", which looks like a Spark timestamp format, but I'm not sure whether it's actually that datatype or a string.
I need to calculate the difference between two timestamp columns that are in the above format. How do I turn what you see there into something I can run a difference calculation on in a SQL query (and not in any underlying PySpark code)?
Would love if you could give me two answers: one if it's the native timestamp datatype, and one if it's actually just a string!
Spark's native timestamp type uses the format yyyy-MM-dd HH:mm:ss.SSS.
To get the difference between two timestamps, see the example below.
Example:
df.show()
//+------------------------+-----------------------+
//|ts |ts1 |
//+------------------------+-----------------------+
//|2021-08-26T11:14:08.251Z|2021-08-25 11:14:08.251|
//+------------------------+-----------------------+
df.select(
    unix_timestamp(to_timestamp(col("ts"))).as("ts"),
    unix_timestamp(to_timestamp(col("ts1"))).as("ts1"))
  .withColumn("diff", col("ts1") - col("ts"))
  .show(false)
//+----------+----------+------+
//|ts |ts1 |diff |
//+----------+----------+------+
//|1629976448|1629915248|-61200|
//+----------+----------+------+
Update:
df.createOrReplaceTempView("tmp")
spark.sql("""
  select ts, ts1, ts - ts1 as diff
  from (select unix_timestamp(to_timestamp(ts)) as ts,
               unix_timestamp(to_timestamp(ts1)) as ts1
        from tmp) e
""").show(10, false)
//+----------+----------+-----+
//|ts |ts1 |diff |
//+----------+----------+-----+
//|1629976448|1629915248|61200|
//+----------+----------+-----+
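To cover both cases the question asks about: a minimal sketch in Spark SQL, assuming the same tmp view as above. If the columns are already TimestampType, unix_timestamp applies to them directly; if they are strings in the ISO form shown, wrap them in to_timestamp first.
-- columns already TimestampType:
select unix_timestamp(ts1) - unix_timestamp(ts) as diff_seconds from tmp
-- columns are strings like "2021-08-26T11:14:08.251Z":
select unix_timestamp(to_timestamp(ts1)) - unix_timestamp(to_timestamp(ts)) as diff_seconds from tmp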
I am doing a simple conversion of a timestamp column value to a specific timezone and then extracting the date from it, to create analytical charts based on the output of the query.
I have a column of type TIMESTAMP in BigQuery whose values are in UTC. I need to convert them to PST (which is GMT-8:00). The conversion looked straightforward, but in the output I am seeing some dates shift up and down.
From the output I was getting, I took one abnormal result and wrote a query out of it, as below:
select "2021-05-27 18:10:10" as timestampvalue ,
Date(Timestamp("2021-05-27 18:10:10" ,"-8:00")) as completed_date1,
Date(Timestamp("2021-05-27 18:10:10","America/Los_Angeles")) as completed_date2,
Date(TIMESTAMP_SUB("2021-05-27 18:10:10", INTERVAL 8 hour)) as completed_date3,
Date(Timestamp("2021-05-27 18:10:10","America/Tijuana")) as completed_date4
The output I get (screenshot omitted) shows 2021-05-28 for every column except completed_date3, which shows 2021-05-27.
Based on my understanding, I need to subtract 8 hours from the time to get the timestamp value for the timezone I want; by that logic, the completed_date3 column shows the correct value. But if I use the other timezone conversions suggested in the Google documentation, the output changes to 2021-05-28, and I cannot understand how that can happen.
Can anyone tell me what I am doing wrong?
I was actually using it the wrong way. It needs to be used as below:
select "2021-05-27 18:10:10" as timestampvalue ,
Date(Timestamp("2021-05-27 18:10:10") ,"-8:00") as completed_date1,
Date(Timestamp("2021-05-27 18:10:10"),"America/Los_Angeles") as completed_date2,
Date(TIMESTAMP_SUB("2021-05-27 18:10:10", INTERVAL 8 hour)) as completed_date3,
Date(Timestamp("2021-05-27 18:10:10"),"America/Tijuana") as completed_date4
Initially I was converting the string to a timestamp interpreted in a specific timezone, which is not what I wanted.
If I instead convert the string to a timestamp without the time zone parameter, and only apply the time zone parameter when extracting the date from it, I get the correct date.
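A minimal side-by-side sketch of the difference in BigQuery Standard SQL (the expected values in the comments follow from the rules above):
-- TIMESTAMP(string, tz) interprets the string as local time IN that zone,
-- while DATE(timestamp, tz) converts a UTC timestamp INTO that zone.
select
  Timestamp("2021-05-27 18:10:10", "America/Los_Angeles") as parsed_as_la_time, -- 2021-05-28 01:10:10 UTC
  Date(Timestamp("2021-05-27 18:10:10"), "America/Los_Angeles") as date_in_la   -- 2021-05-27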
I have this table which stores millions of rows of data. The data has a date that indicates when it was entered. I store that date in a NUMERIC column as a Unix epoch. However, I want to convert it to a human-readable date (yyyy-mm-dd hh:mm:ss) and later sort by that date rather than the date queried.
It took me a long time to find a suitable way. Here's my attempt.
I used SELECT CAST(DATE(timestamp) AS DATE) AS CURR_DT FROM dataset.table but it gave me this error:
No matching signature for function DATE for argument types: NUMERIC. Supported signatures: DATE(TIMESTAMP, [STRING]); DATE(DATETIME); DATE(INT64, INT64, INT64) at [1:13]
I tried the method from "BigQuery: convert epoch to TIMESTAMP" but still didn't fully understand it.
I'm a novice in coding so I hope you guys understand the situation. Thanks!
If I am understanding your question correctly, you would like to take a numeric epoch time that is stored as an integer and convert it to a timestamp?
If so, you can use the following in BigQuery Standard SQL:
select TIMESTAMP_SECONDS(1606048220)
It gives the output of:
2020-11-22 12:30:20 UTC
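Applied to the question's own table, a sketch (the column name timestamp and dataset.table come from the question; TIMESTAMP_SECONDS takes an INT64, so the NUMERIC column needs a cast):
select TIMESTAMP_SECONDS(CAST(`timestamp` AS INT64)) as curr_dt
from dataset.table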
If you only want the date component, convert to a date after converting to a timestamp. Presumably you have seconds, so you would use TIMESTAMP_SECONDS(); there are analogous functions for milliseconds and microseconds.
For just the date:
select date(timestamp_seconds(col))
Note that this removes the time component.
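A fuller sketch covering the sorting requirement from the question (col is the answer's placeholder for the epoch column; TIMESTAMP_MILLIS and TIMESTAMP_MICROS are the millisecond and microsecond variants):
select date(timestamp_seconds(CAST(col AS INT64))) as curr_dt
from dataset.table
order by col  -- ordering by the raw epoch sorts by event time, not query time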
I'm trying to figure out how to convert a timestamp into a datetime object in SQL (I'm using Google Big Query).
Here's what the timestamp column looks like: each row contains a 10-digit integer.
Any help would be appreciated!
You want timestamp_seconds():
select timestamp_seconds(time_stamp) as utc_timestamp
Your column looks like a Unix timestamp, which is the number of seconds since 1970-01-01.
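If an actual DATETIME (civil time, no zone) is wanted rather than a TIMESTAMP, one more wrapper does it. A sketch, with time_stamp as the column name from the answer above:
select datetime(timestamp_seconds(time_stamp)) as utc_datetime
-- DATETIME(timestamp_expr[, time_zone]) defaults to UTC when no zone is given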
I'm new to Spark SQL and am trying to convert a string to a timestamp in a Spark data frame. I have a string that looks like '2017-08-01T02:26:59.000Z' in a column called time_string.
My code to convert this string to timestamp is
CAST (time_string AS Timestamp)
But this gives me a timestamp of 2017-07-31 19:26:59
Why is it changing the time? Is there a way to do this without changing the time?
Thanks for any help!
You can use the unix_timestamp function to convert the UTC-formatted date to a timestamp. (CAST parses the trailing Z as UTC and Spark then displays the timestamp in your session time zone, which is why the time appeared to shift; parsing with a pattern that treats the Z as a literal keeps the clock time as written.)
import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.TimestampType
val df2 = Seq(("a3fac", "2017-08-01T02:26:59.000Z")).toDF("id", "eventTime")
df2.withColumn("eventTime1", unix_timestamp($"eventTime", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").cast(TimestampType)).show(false)
Output:
+-----+------------------------+---------------------+
|id   |eventTime               |eventTime1           |
+-----+------------------------+---------------------+
|a3fac|2017-08-01T02:26:59.000Z|2017-08-01 02:26:59.0|
+-----+------------------------+---------------------+
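The same conversion in plain Spark SQL, as a sketch (events is a hypothetical temp view over the data frame):
select to_timestamp(eventTime, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") as eventTime1 from events
-- Alternatively, set spark.sql.session.timeZone to UTC so that a plain CAST
-- displays the same clock time instead of shifting to the local zone.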
Hope this helps!
Solution in Java
There are some Spark SQL functions which let you play with the date format.
Conversion example : 20181224091530 -> 2018-12-24 09:15:30
Solution (Spark SQL statement) :
SELECT
...
to_timestamp(cast(DECIMAL_DATE as string),'yyyyMMddHHmmss') as `TIME STAMP DATE`,
...
FROM some_table
You can run SQL statements through an instance of org.apache.spark.sql.SparkSession. For example, to execute an SQL statement, Spark provides the following:
...
// You have to create an instance of SparkSession
sparkSession.sql(sqlStatement);
...
Notes:
You have to convert the decimal to a string first; then you can parse it to a timestamp.
You can adjust the format pattern to get whatever output format you want.
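A minimal end-to-end sketch of the conversion named above (the literal stands in for a DECIMAL_DATE value):
select to_timestamp(cast(20181224091530 as string), 'yyyyMMddHHmmss') as `TIME STAMP DATE`
-- returns 2018-12-24 09:15:30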
In Spark SQL you can use to_timestamp and then format the result with date_format as required.
select
  date_format(to_timestamp(<column>, 'yyyy/MM/dd HH:mm:ss'), "yyyy-MM-dd HH:mm:ss") as <alias>
from <table>
Here the column 'timestamp' has the value 2019/02/23 12:00:00 and is a StringType column in the 'event' table.
To convert it to TimestampType, apply to_timestamp(timestamp, 'yyyy/MM/dd HH:mm:ss'). Make sure the format pattern matches your column's values. Then apply date_format to convert it to whatever format you require.
> select date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp from event
I have rows in the following format, and I would like to transform them into valid Hive timestamps. Format in my data:
28/04/2017 00:00:00|20550|22/05/2017 00:00:00|
I'm only interested in the first and third columns, separated by |; in my case the format is, then:
dd/MM/yy HH:mm:ss
I've discovered this can't be used as a timestamp in Hive.
I find myself unable to transform the first and third columns into the proper format using queries similar to:
select from_unixtime(unix_timestamp('28/04/2017','dd/MM/yy HH:mm:ss'),'yyyy-MM-dd') from `20170428_f_pers_pers`
I've tried different variants of that query, but since I can't access the documentation (internet is capped here at work), I can't see how to properly use these two functions, from_unixtime and unix_timestamp.
I've made the following assumptions:
I can reorder the days and years. If this isn't true, I have no idea how to transform my original data into the proper Hive format.
When I run this select, it affects the whole column. Furthermore, after doing this successfully I should be able to change the type of the whole column from string to timestamp (maybe I have to create a new column for that, I'm not sure).
I don't care about doing both columns at once, but right now the query shown above returns as many nulls as my table has rows, and I'm unsure my assumptions are even partially right, since every example I come across is simpler (none of them reorder days and years, for instance).
I would also like to know how to apply the query to a specific column, since I haven't understood how to do that from the examples studied so far. I don't see them using any kind of column identifier, which is weird to me, since they use data from the column to change the column itself.
Thanks in advance.
edit: I am now trying something like
select from_unixtime(unix_timestamp(f_Date, 'dd/MM/yyyy HH:mm:ss')) from `myTable`
But HUE gives me the following error:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
The format should be completely covered by the input string.
In other words:
the format can be equal in length to the input string or shorter, but not longer.
28/04/2017 00:00:00
|||||||||||||||||||
dd/MM/yyyy HH:mm:ss
select from_unixtime(to_unix_timestamp('28/04/2017 00:00:00', 'dd/MM/yyyy HH:mm:ss'))
2017-04-28 00:00:00
28/04/2017 00:00:00
||||||||||
dd/MM/yyyy
select from_unixtime(to_unix_timestamp('28/04/2017 00:00:00', 'dd/MM/yyyy'))
2017-04-28 00:00:00
The result can be converted from string to timestamp using cast
select cast (from_unixtime(to_unix_timestamp('28/04/2017 00:00:00', 'dd/MM/yyyy HH:mm:ss')) as timestamp)
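Applied to the asker's table, a hedged sketch (start_date and end_date are hypothetical names for the first and third pipe-separated columns):
select
  cast(from_unixtime(to_unix_timestamp(start_date, 'dd/MM/yyyy HH:mm:ss')) as timestamp) as start_ts,
  cast(from_unixtime(to_unix_timestamp(end_date, 'dd/MM/yyyy HH:mm:ss')) as timestamp) as end_ts
from `20170428_f_pers_pers`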