How can I modify the data in a DataFrame - apache-spark-sql

I want to modify a date, e.g. 2018-02-02 00:00:00, to 20180202. What should I do?
I have used the to_date and regexp_replace functions to modify the data, but it doesn't work.

What type of data is 2018-02-02 00:00:00: string or timestamp? Maybe you should cast it to string before using to_date.
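To make the target conversion concrete, here is a minimal Python sketch (standard library only, not Spark) that parses the string form of the timestamp and re-emits it as yyyyMMdd. It assumes the value arrives as a string in exactly the format shown in the question; the helper name to_compact_date is hypothetical. In Spark SQL the analogous step would presumably be a date_format call with the pattern 'yyyyMMdd'.

```python
from datetime import datetime


def to_compact_date(value: str) -> str:
    """Turn a 'yyyy-MM-dd HH:mm:ss' string into 'yyyyMMdd'."""
    # Parse first so that malformed input raises instead of passing through.
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y%m%d")


print(to_compact_date("2018-02-02 00:00:00"))
```

Parsing and re-formatting, rather than stripping characters with a regular expression, rejects values that are not valid dates.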

Related

How to convert date column value which is in CST timezone

For example, the column Start_date holds values like the one below in Hive, and its datatype is string:
02-JUN-22 11.13.22 AM CST
I want to convert the value as below:
2022-06-02
I tried the to_date function but I am getting null values.
You can try the query below.
If 'AM CST' is included in your input string, you can trim it or take a substring so that the value matches the pattern in the query.
select from_unixtime(unix_timestamp('02-JUN-22 11.13.22' ,'dd-MMM-yy'), 'yyyy-MM-dd');
Use the built-in DateTime/TimeZone functionality
<?php
$mysqlDate = '2009-04-01 15:36:13';
$dateTime = new DateTime ($mysqlDate);
$dateTime->setTimezone(new DateTimeZone('America/Los_Angeles'));
?>
Have you tried to_char in your query?
For example:
select to_char(start_date,'yyyy-mm-dd') from your_table;
Convert to timestamp in Hive format taking into account the timezone (use from_unixtime(unix_timestamp(col, pattern))), see patterns, then use to_date.
Demo:
select to_date(from_unixtime(unix_timestamp('02-JUN-22 11.13.22 AM CST','dd-MMM-yy hh.mm.ss a z')))
Result:
2022-06-02
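The trimming approach suggested earlier (drop the zone suffix, then parse) can be sketched in plain Python. This is only an illustration of the pattern logic, not Hive code: it assumes the zone abbreviation is always the last space-separated token, assumes English month abbreviations, and performs no time zone conversion. The helper name parse_start_date is hypothetical.

```python
from datetime import datetime


def parse_start_date(raw: str) -> str:
    """Convert '02-JUN-22 11.13.22 AM CST' to '2022-06-02'."""
    # Assumption: the zone abbreviation is the final token, so cut it off.
    without_zone = raw.rsplit(" ", 1)[0]
    # %d-%b-%y mirrors dd-MMM-yy; %I.%M.%S %p mirrors hh.mm.ss a.
    parsed = datetime.strptime(without_zone, "%d-%b-%y %I.%M.%S %p")
    return parsed.date().isoformat()


print(parse_start_date("02-JUN-22 11.13.22 AM CST"))
```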

Unable to Parse a date string to timestamp in BigQuery Standard SQL

I have a date string column in my dataset, and it holds date values in the two formats shown below.
Some dates are in the format: 2020-04-22
and some dates are in the format: 04/22/2020
Kindly suggest how to parse these date values to a timestamp in the format: 2020-04-22 00:00:00 UTC
Thanks!
You need to normalise both dates.
Try using:
SELECT
COALESCE(SAFE.PARSE_DATE('%F', your_date_field), SAFE.PARSE_DATE('%m/%d/%Y', your_date_field)) AS your_new_date_field
FROM ...
If all strings strictly conform to one format or the other, you can do conditional logic:
PARSE_TIMESTAMP(
CASE
WHEN REGEXP_CONTAINS(mydate, r"^\d{4}-\d{2}-\d{2}$") THEN "%F"
WHEN REGEXP_CONTAINS(mydate, r"^\d{2}/\d{2}/\d{4}$") THEN "%m/%d/%Y"
END,
mydate
)
You might want to use SAFE.PARSE_TIMESTAMP() instead of PARSE_TIMESTAMP() to prevent conversion failures on unmatched formats.
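The "try one format, fall back to the next" idea behind the COALESCE of SAFE.PARSE_DATE calls can be sketched outside BigQuery as follows. This is a minimal Python illustration, assuming only the two formats from the question; the names FORMATS and parse_mixed_date are hypothetical, and returning None plays the role that NULL plays for the SAFE. prefix.

```python
from datetime import datetime, timezone
from typing import Optional

# The two formats from the question: 2020-04-22 and 04/22/2020.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_mixed_date(value: str) -> Optional[datetime]:
    """Return a UTC midnight timestamp, or None if no format matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next candidate format
    return None


print(parse_mixed_date("2020-04-22"))
print(parse_mixed_date("04/22/2020"))
```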

Spark SQL converting string to timestamp

I'm new to Spark SQL and am trying to convert a string to a timestamp in a spark data frame. I have a string that looks like '2017-08-01T02:26:59.000Z' in a column called time_string
My code to convert this string to timestamp is
CAST (time_string AS Timestamp)
But this gives me a timestamp of 2017-07-31 19:26:59
Why is it changing the time? Is there a way to do this without changing the time?
Thanks for any help!
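You could use the unix_timestamp function to convert the UTC-formatted date string to a timestamp. Note that quoting 'Z' in the pattern treats it as a literal character, so the string is parsed in the session time zone rather than as UTC; that is why the displayed time is unchanged.
import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.TimestampType
val df2 = Seq(("a3fac", "2017-08-01T02:26:59.000Z")).toDF("id", "eventTime")
df2.withColumn("eventTime1", unix_timestamp($"eventTime", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").cast(TimestampType))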
Output:
+-------------+---------------------+
|userid |eventTime |
+-------------+---------------------+
|a3fac |2017-08-01 02:26:59.0|
+-------------+---------------------+
Hope this helps!
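The shift reported in the question is consistent with the same instant being displayed in a zone seven hours behind UTC. The Python sketch below shows this under the assumption that the session time zone is UTC-7; the question does not state the actual zone, so the offset is a guess used for illustration.

```python
from datetime import datetime, timedelta, timezone

# Parse the ISO string; the trailing Z means the time is expressed in UTC.
utc_time = datetime.strptime(
    "2017-08-01T02:26:59.000Z", "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)

# Assumption: the session displays timestamps at UTC-7.
session_zone = timezone(timedelta(hours=-7))
local_time = utc_time.astimezone(session_zone)

print(local_time.strftime("%Y-%m-%d %H:%M:%S"))
```

Both values denote the same instant; only the rendering differs. In Spark, setting spark.sql.session.timeZone to UTC is one way to have timestamps displayed in UTC.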
Solution in Java
There are some Spark SQL functions that let you work with the date format.
Conversion example : 20181224091530 -> 2018-12-24 09:15:30
Solution (Spark SQL statement) :
SELECT
...
to_timestamp(cast(DECIMAL_DATE as string),'yyyyMMddHHmmss') as `TIME STAMP DATE`,
...
FROM some_table
You can run SQL statements through an instance of org.apache.spark.sql.SparkSession. For example, if you want to execute an SQL statement, Spark provides the following solution:
...
// You have to create an instance of SparkSession
sparkSession.sql(sqlStatement);
...
Notes:
You have to convert the decimal to a string first; after that you can parse it to a timestamp.
You can adjust the format pattern to get whatever format you want.
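The two steps in the notes above (cast the decimal to a string, then parse it with a pattern) can be illustrated with a small Python sketch. It is not Spark code, and the helper name decimal_to_timestamp is hypothetical; the pattern %Y%m%d%H%M%S corresponds to yyyyMMddHHmmss.

```python
from datetime import datetime
from decimal import Decimal


def decimal_to_timestamp(value: Decimal) -> datetime:
    """Parse a decimal such as 20181224091530 into a datetime."""
    # Step 1: cast to string. Step 2: parse with the matching pattern.
    return datetime.strptime(str(value), "%Y%m%d%H%M%S")


print(decimal_to_timestamp(Decimal("20181224091530")))
```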
In spark sql you can use to_timestamp and then format it as your requirement.
select
date_format(to_timestamp(,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as
from
Here 'timestamp', with the value 2019/02/23 12:00:00, is a StringType column in the 'event' table.
To convert it into TimestampType, apply to_timestamp(timestamp, 'yyyy/MM/dd HH:mm:ss'). Make sure the format pattern matches your column value. Then apply date_format to convert it as per your requirement.
> select date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp from event

Converting Date to day-mon-year format in Oracle

I need to convert dates like this:
3/2/2016 12:00:00 AM
to this:
2-MAR-2016
For Oracle, you can use to_char(your_date, format):
SELECT TO_CHAR(your_Date ,'DD-MON-YYYY')
FROM DUAL;
For MySQL, use DATE_FORMAT instead (MySQL has no TO_CHAR):
SELECT DATE_FORMAT(your_Date ,'%d-%b-%Y')
FROM DUAL;
Oracle's default date display format depends on the NLS_DATE_FORMAT setting (commonly DD-MON-RR). We can use the TO_CHAR function to convert to a specific format.
TO_CHAR(date, 'FMDD-MON-YYYY')
Breakdown
FMDD- Plain DD pads the day with a leading 0 (02). The FM (fill mode) prefix suppresses that padding, so use FMDD to get 2.
MON- Abbreviated month name
YYYY- Four-digit year
Reference: https://docs.oracle.com/cd/B28359_01/server.111/b28286/sql_elements004.htm (scroll down to the Datetime Format Elements)
In MySQL, the same can be accomplished with the DATE_FORMAT function, which has slightly different formatter options:
DATE_FORMAT(date, '%e-%b-%Y')
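For comparison, the same day-month-year formatting can be sketched in Python. This is an illustration only, assuming the input is a string in month/day/year order as in the question and assuming English month abbreviations; the helper name to_day_mon_year is hypothetical.

```python
from datetime import datetime


def to_day_mon_year(value: str) -> str:
    """Convert '3/2/2016 12:00:00 AM' to '2-MAR-2016'."""
    parsed = datetime.strptime(value, "%m/%d/%Y %I:%M:%S %p")
    # parsed.day is an int, so the day has no leading zero;
    # the month abbreviation is upper-cased to match MON output.
    return f"{parsed.day}-{parsed.strftime('%b').upper()}-{parsed.year}"


print(to_day_mon_year("3/2/2016 12:00:00 AM"))
```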

UNIX date converted incorrectly in HIVE

When I execute this statement in HIVE
select FROM_UNIXTIME(unix_timestamp(to_date('2016-03-28 00:00:00'),'YYYY-MM-DD'));
I get
OK
2015-12-27 00:00:00
Shouldn't it return 2016-03-28 00:00:00 instead?
The upper-case date pattern you used is wrong: in these patterns YYYY means week year and DD means day of year, not calendar year and day of month. It should be "yyyy-MM-dd".
Please use the following, which should fix your error:
select FROM_UNIXTIME(unix_timestamp(to_date('2016-03-28 00:00:00'),'yyyy-MM-dd'));
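One plausible explanation for the odd result 2015-12-27 is that, with the week-year pattern, the parser falls back to the first day of week year 2016. The Python sketch below computes that day under two assumptions that are not confirmed by the question: weeks start on Sunday, and week 1 is the week containing January 1. The helper name first_day_of_week_year is hypothetical.

```python
from datetime import date, timedelta


def first_day_of_week_year(year: int) -> date:
    """Sunday that starts the week containing January 1 of `year`."""
    jan1 = date(year, 1, 1)
    # weekday() is Monday=0 .. Sunday=6; convert to days elapsed since Sunday.
    days_since_sunday = (jan1.weekday() + 1) % 7
    return jan1 - timedelta(days=days_since_sunday)


print(first_day_of_week_year(2016))
```

Under those assumptions the computed day matches the date Hive returned, which supports reading YYYY as week year rather than calendar year.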