Sample data. I want to filter out the rows where date_time_begin is less than the start date. I have tried this but I am not getting any output. I am using Spark version 1.6.2.
filterddata = joindedf.filter(joindedf("date_time_begin").gt(lit("str_date")))
filterddata.show()
If both columns are of the same datatype, this should work:
val filterddata = joindedf.filter(joindedf("date_time_begin").gt(joindedf("str_date")))
lit is used to convert a literal value into a column. In your attempt, lit("str_date") wraps the literal string "str_date" itself rather than referencing the str_date column, which is why no rows match.
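If you instead want to compare against a fixed date, here is a minimal sketch (the 2017-01-01 cutoff is a made-up example, and it assumes date_time_begin is a yyyy-MM-dd string so that lexicographic order matches chronological order):

import org.apache.spark.sql.functions.lit

// The actual value goes inside lit(), not a quoted column name.
val filterddata = joindedf.filter(joindedf("date_time_begin").gt(lit("2017-01-01")))
filterddata.show()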
I hope this helped!
I am reading a file from an ADLS location; in it, one column, Period_Ending_Date, has the data type STRING.
Period_Ending_Date contains many dates in random order, and I need to apply a filter to get the latest date.
I'm trying this code:
select * from final_table
WHERE Period_Ending_Date = (SELECT MAX(Period_Ending_Date) FROM final_table)
But the problem is I'm getting the maximum string value, not the latest date. I understand this is happening because of the STRING data type. Please guide me on how I can change this column to the DATE data type, or suggest any other alternative to solve this.
I'm working with Scala and SQL on Azure Databricks.
What about changing SELECT MAX(Period_Ending_Date) FROM final_table to SELECT MAX(cast(Period_Ending_Date as date)) FROM final_table, performing an explicit cast to date if the date format is ISO 8601 (yyyy-MM-dd), or using the to_date function (doc) to convert non-standard dates?
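Since you're working with Scala, here is a minimal sketch of the to_date route (the MM/dd/yyyy pattern and the period_ending_dt column name are assumptions; adjust the pattern to match your actual data):

import org.apache.spark.sql.functions.{col, to_date}

// Parse the string column into a real date once, then filter on the typed column.
val typedDf = spark.table("final_table")
  .withColumn("period_ending_dt", to_date(col("Period_Ending_Date"), "MM/dd/yyyy"))
typedDf.createOrReplaceTempView("final_typed")

spark.sql("""
  SELECT * FROM final_typed
  WHERE period_ending_dt = (SELECT MAX(period_ending_dt) FROM final_typed)
""").show(false)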
Assume I don't have access to the underlying code that's producing the table, and that I'm relatively new to this.
The timestamp column has the format "2021-08-26T11:14:08.251Z", which looks like a Spark timestamp format, but I'm not sure whether it's actually that datatype or just a string.
I need to calculate the difference between two timestamp columns that are in the above format. How do I turn what you see there into something I can run a difference calculation on in a SQL query, and not in any underlying PySpark code?
I would love it if you could give me two answers: one if it's the native timestamp datatype, and one if it's actually just a string!
The Spark timestamp type accepts only the yyyy-MM-dd HH:mm:ss.SSS format, so the ISO string needs to be converted first.
To get the difference between two timestamps, see the example below.
Example:
df.show()
//+------------------------+-----------------------+
//|ts |ts1 |
//+------------------------+-----------------------+
//|2021-08-26T11:14:08.251Z|2021-08-25 11:14:08.251|
//+------------------------+-----------------------+
df.select(
    unix_timestamp(to_timestamp(col("ts"))).as("ts"),
    unix_timestamp(to_timestamp(col("ts1"))).as("ts1"))
  .withColumn("diff", col("ts1") - col("ts"))
  .show(false)
//+----------+----------+------+
//|ts |ts1 |diff |
//+----------+----------+------+
//|1629976448|1629915248|-61200|
//+----------+----------+------+
Update:
df.createOrReplaceTempView("tmp")
spark.sql("select ts, ts1, ts - ts1 as diff from (select unix_timestamp(to_timestamp(ts)) as ts, unix_timestamp(to_timestamp(ts1)) as ts1 from tmp) e").show(10, false)
//+----------+----------+-----+
//|ts |ts1 |diff |
//+----------+----------+-----+
//|1629976448|1629915248|61200|
//+----------+----------+-----+
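For your first case: if ts and ts1 are already the native timestamp datatype rather than strings, the to_timestamp call is unnecessary, since casting a timestamp to long gives epoch seconds directly. A minimal sketch under that assumption:

spark.sql("select ts, ts1, cast(ts as long) - cast(ts1 as long) as diff from tmp").show(false)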
One of the columns is in a date datatype. I need to convert that date column to a character datatype so that it can be concatenated with another character column.
Right now my date column is in the following format: 09-JUN-2020.
Please help me convert this column to a character column. This needs to be done in SAS Enterprise Guide.
Thank you so much in advance.
You can use PUT() to convert from numeric to character. You need to find the format you want the output to look like and use that as the second parameter. Assuming you want your date to look like 2020-06-02 as a character value, this works:
*puts date as 2020-06-02;
newVar1 = put(dateVar, yymmddd10.);
*puts date as 02JUN2020;
newVar2 = put(dateVar, date9.);
FYI - You can find the list of formats available here
In a Hive table, the value of one column is like 01/12/17, but I need the value in the format 12-2017 (month-year). How do I convert it?
Convert the string to a unix_timestamp and output the required format using from_unixtime.
select from_unixtime(unix_timestamp(col_name,'MM/dd/yy'),'MM-yyyy')
For example: I am bringing a Hive table column (datetime data type) value into Pig and want to extract only the DATE portion. I have tried using the ToDate function; the error information is below. Please help me in this critical situation.
The original value in this column is "2014-07-29T06:01:33.705-04:00", and I need the output as "2014-07-29".
ToDate(eff_end_ts,'YYYY-MM-DD') AS Delta_Column;
2016-07-28 07:07:25,298 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.ToDate as multiple or none of them fit. Please use an explicit cast.
Assuming your column name is f1 and it holds timestamps with values like 2014-07-29T06:01:33.705-04:00, you will have to use GetYear(), GetMonth(), and GetDay() and CONCAT the pieces into the required format.
B = FOREACH A GENERATE
        CONCAT(CONCAT(CONCAT((chararray)GetYear(f1), '-'),
                      CONCAT((chararray)GetMonth(f1), '-')),
               (chararray)GetDay(f1)) AS Day;
I did a workaround to figure this out, and it works this way:
ToDate(ToString(eff_end_ts,'yyyy-MM-dd'),'yyyy-MM-dd') AS (Delta_Column:datetime)