I'm new to Spark SQL and am trying to convert a string to a timestamp in a Spark DataFrame. I have a string that looks like '2017-08-01T02:26:59.000Z' in a column called time_string.
My code to convert this string to timestamp is
CAST (time_string AS Timestamp)
But this gives me a timestamp of 2017-07-31 19:26:59
Why is it changing the time? Is there a way to do this without changing the time?
Thanks for any help!
You could use the unix_timestamp function to convert the UTC-formatted date string to a timestamp. The cast in the question changed the time because Spark parses the 'Z' suffix as UTC and then renders the result in your session time zone; parsing with a pattern that treats 'Z' as a literal keeps the clock time unchanged:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.TimestampType
import spark.implicits._

val df2 = Seq(("a3fac", "2017-08-01T02:26:59.000Z")).toDF("id", "eventTime")
df2.withColumn("eventTime1", unix_timestamp($"eventTime", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").cast(TimestampType)).show(false)
Output:
+-----+------------------------+---------------------+
|id   |eventTime               |eventTime1           |
+-----+------------------------+---------------------+
|a3fac|2017-08-01T02:26:59.000Z|2017-08-01 02:26:59.0|
+-----+------------------------+---------------------+
Hope this helps!
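Outside Spark, the same shift can be reproduced in plain Python. This is an illustrative standalone sketch, not Spark code; the UTC-7 offset is an assumption matching a Pacific-time session, which would explain the exact 7-hour difference the asker saw:

```python
from datetime import datetime, timezone, timedelta

# Parse the ISO-8601 string; "+00:00" replaces "Z" because older
# Pythons' fromisoformat does not accept the "Z" suffix.
ts = datetime.fromisoformat("2017-08-01T02:26:59.000Z".replace("Z", "+00:00"))

# Rendering the UTC instant in a UTC-7 zone (e.g. PDT) reproduces
# the shifted value from the question.
local = ts.astimezone(timezone(timedelta(hours=-7)))
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-07-31 19:26:59
```

So nothing was "wrong" with the cast; the instant was preserved, only its display zone changed.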
Solution in Java
There are some Spark SQL functions that let you work with the date format.
Conversion example: 20181224091530 -> 2018-12-24 09:15:30
Solution (Spark SQL statement):
SELECT
...
to_timestamp(cast(DECIMAL_DATE as string),'yyyyMMddHHmmss') as `TIME STAMP DATE`,
...
FROM some_table
You can run SQL statements through an instance of org.apache.spark.sql.SparkSession. For example, to execute a SQL statement, Spark provides the following:
...
// You have to create an instance of SparkSession
sparkSession.sql(sqlStatement);
...
Notes:
You have to convert the decimal to a string first; then you can parse it to a timestamp.
You can adjust the format pattern to get whatever output format you need.
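The same yyyyMMddHHmmss round-trip can be sketched in plain Python (illustrative only, not Spark; Python uses strptime/strftime codes where Spark uses a SimpleDateFormat-style pattern):

```python
from datetime import datetime

# "20181224091530" follows yyyyMMddHHmmss; %Y%m%d%H%M%S is the
# equivalent Python pattern.
raw = "20181224091530"
parsed = datetime.strptime(raw, "%Y%m%d%H%M%S")
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-12-24 09:15:30
```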
In Spark SQL you can use to_timestamp and then format the result with date_format as required.
select
date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp
from event
Here the 'timestamp' column has the value 2019/02/23 12:00:00 and is a StringType column in the 'event' table.
To convert it to TimestampType, apply to_timestamp(timestamp, 'yyyy/MM/dd HH:mm:ss'). Make sure the pattern matches the format of your column's values. Then apply date_format to convert it as required.
> select date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp from event
Related
I'm trying to cast an entire column in BigQuery to MM/DD/YYYY format.
Time_Column
2022-05-26T17:32:41.000Z
2022-05-28T06:34:23.000Z
Results:
You can use the FORMAT_DATE function; see Supported Format Elements For DATE in the BigQuery documentation for the elements available for your expected output format. Since Time_Column holds datetime values, wrap it with DATE() first (FORMAT_DATE itself takes a DATE):
SELECT FORMAT_DATE("%m/%d/%Y", DATE(Time_Column))
FROM T
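The %m/%d/%Y format elements behave the same way in plain Python's strftime, which can serve as a quick standalone check (illustrative only, not BigQuery):

```python
from datetime import datetime

# One of the sample values from the question, parsed as UTC.
ts = datetime.fromisoformat("2022-05-26T17:32:41.000+00:00")

# %m/%d/%Y mirrors the BigQuery format elements used above.
print(ts.strftime("%m/%d/%Y"))  # 05/26/2022
```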
Assume I don't have access to the underlying code that's producing the table. And that I'm relatively new to this.
The timestamp column has the format of "2021-08-26T11:14:08.251Z" which looks to be a spark timestamp format - but I'm not sure if it's in that datatype or a string.
I need to calculate the difference between two timestamp columns that are in the above format - how do I turn what you see there, into something that I can run a difference calculation on in a SQL query? And not in any underlying pyspark code?
Would love if you could give me two answers one if it's in the native timestamp datatype and one if it's actually just a string!
Spark's timestamp type uses the format yyyy-MM-dd HH:mm:ss.SSS.
To get the difference between two timestamps, see the example below.
Example:
df.show()
//+------------------------+-----------------------+
//|ts |ts1 |
//+------------------------+-----------------------+
//|2021-08-26T11:14:08.251Z|2021-08-25 11:14:08.251|
//+------------------------+-----------------------+
df.select(unix_timestamp(to_timestamp(col("ts"))).as("ts"),unix_timestamp(to_timestamp(col("ts1"))).as("ts1")).
withColumn("diff",col("ts1")-col("ts")).show(false)
//+----------+----------+------+
//|ts |ts1 |diff |
//+----------+----------+------+
//|1629976448|1629915248|-61200|
//+----------+----------+------+
Update:
df.createOrReplaceTempView("tmp")
spark.sql("select ts,ts1,ts-ts1 as diff from (select unix_timestamp(to_timestamp(ts)) as ts,unix_timestamp(to_timestamp(ts1)) as ts1 from tmp)e").show(10,false)
//+----------+----------+-----+
//|ts |ts1 |diff |
//+----------+----------+-----+
//|1629976448|1629915248|61200|
//+----------+----------+-----+
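The arithmetic can be checked in plain Python (illustrative, not Spark). Note that here both values are treated as the same instant type, so the difference is exactly one day; Spark's -61200 above differs because ts1 carries no zone marker and was parsed in the session time zone, which in that run was apparently UTC-7 (-86400 + 7*3600 = -61200):

```python
from datetime import datetime

# Both values read as UTC instants.
ts = datetime.fromisoformat("2021-08-26T11:14:08.251+00:00")
ts1 = datetime.fromisoformat("2021-08-25T11:14:08.251+00:00")

# Difference in whole seconds, mirroring unix_timestamp subtraction.
diff = int((ts1 - ts).total_seconds())
print(diff)  # -86400
```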
In which format should I convert a timestamp to receive a value like 15.08.2017 22:17:41.860000? Thanks.
You're close, you just need to Cast it to a string after adding the format:
Cast(Cast(tscol AS FORMAT 'dd.mm.yyyyBhh:mi:ss.s(6)') AS CHAR(26))
Or, shorter, using To_Char:
To_Char(tscol,'dd.mm.yyyy hh:mi:ssff')
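For a quick standalone check of what that dd.mm.yyyy target layout looks like, here is the equivalent in plain Python's strftime (illustrative only; the Teradata format codes above differ from Python's):

```python
from datetime import datetime

# A sample value matching the question's desired output.
ts = datetime(2017, 8, 15, 22, 17, 41, 860000)

# %d.%m.%Y with a space separator and %f for 6-digit fractions.
formatted = ts.strftime("%d.%m.%Y %H:%M:%S.%f")
print(formatted)  # 15.08.2017 22:17:41.860000
```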
I have timestamp strings that look something like the example here:
2017-07-12T01:51:12.732-0600. Is there any function, or combination of functions, I can use to convert this to UTC accurately?
The output should be 2017-07-12 07:51:12.732000. I've tried using to_timestamp and convert_timezone. Obviously, the latter failed, but so did the former and I'm at my wit's end. Help?
You can convert the string directly to a timestamp and then set the source time zone in the convert_timezone function like this (note that the offset sign is the opposite of the time zone):
select convert_timezone('UTC+06','utc','2017-07-12T01:51:12.732-0600'::timestamp)
If the -0600 part varies, you can construct the 'UTC+06' part dynamically like this:
with times as (
select '2017-07-12T01:51:12.732-0600'::varchar(28) as ts_col
)
select convert_timezone('utc'+(substring(ts_col from 24 for 3)::integer*(-1))::varchar(3),'utc',ts_col::timestamp)
from times
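For reference, plain Python handles the embedded -0600 offset directly via %z, producing exactly the expected output from the question (illustrative standalone sketch, not Redshift SQL):

```python
from datetime import datetime, timezone

# %z parses the trailing "-0600" offset; %f the ".732" fraction.
ts = datetime.strptime("2017-07-12T01:51:12.732-0600",
                       "%Y-%m-%dT%H:%M:%S.%f%z")

# Convert the offset-aware value to UTC.
utc = ts.astimezone(timezone.utc)
print(utc.strftime("%Y-%m-%d %H:%M:%S.%f"))  # 2017-07-12 07:51:12.732000
```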
My table in Hive has a date field in the format '2016/06/01', but I find that it is not consistent with the format '2016-06-01'.
They cannot be compared, for instance.
Both of them are strings.
So I want to know how to make them consistent so that they can be compared. Or, put another way, how to change '2016/06/01' to '2016-06-01' so that they can be compared.
Many thanks.
To convert a date string from one format to another you have to use two date functions of Hive:
unix_timestamp(string date, string pattern) converts a time string with the given pattern to a Unix timestamp (in seconds), returning 0 on failure.
from_unixtime(bigint unixtime[, string format]) converts the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone.
Using above two function you can achieve your desired result.
The final query is
select from_unixtime(unix_timestamp('2016/06/01','yyyy/MM/dd'),'yyyy-MM-dd') from table1;
where table1 is the table name present in my hive database.
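The same round-trip can be sketched in plain Python (illustrative only, not Hive; Python's strptime/strftime codes stand in for Hive's yyyy/MM/dd pattern):

```python
from datetime import datetime

# Parse '2016/06/01' with the slash pattern, then re-emit with dashes,
# mirroring the unix_timestamp/from_unixtime pair above.
d = datetime.strptime("2016/06/01", "%Y/%m/%d")
converted = d.strftime("%Y-%m-%d")
print(converted)  # 2016-06-01
```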
I hope this helps!
Let's say you have a column 'birth_day' in your table that is in your format;
you should use the following query to convert birth_day into the required format:
date_format(birth_day, 'yyyy-MM-dd')
You can use it in a query in the following way
select * from yourtable
where
date_format(birth_day, 'yyyy-MM-dd') = '2019-04-16';
Use :
unix_timestamp(DATE_COLUMN, string pattern)
The above command converts the date to a Unix timestamp, which you can then format as you want using Hive's date functions (e.g. from_unixtime).
cast(to_date(from_unixtime(unix_timestamp(yourdate , 'MM-dd-yyyy'))) as date)
Here is my solution (for string to a real Date type):
select to_date(replace('2000/01/01', '/', '-')) as dt;
P.S.: to_date() returns the Date type; this feature needs Hive 2.1+. Before 2.1, it returns String.
P.S. 2: Hive's to_date() function, date_format() function, and even cast() cannot recognise the 'yyyy/MM/dd' or 'yyyymmdd' formats, which I think is a pity and drove me a little crazy.