I have a dataframe with a date column and an integer column, and I'd like to add the number of months given by the integer column to the date column. I tried the following, but I'm getting an error:
from pyspark.sql import functions as f
df.withColumn('future', f.add_months('cohort', f.col('period')))
Where 'cohort' is my date column and 'period' is an integer column. I'm getting the following error:
TypeError: Column is not iterable
Use expr to pass a column as the second parameter to the add_months function:
df.withColumn('future', f.expr("add_months(cohort, period)"))
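For reference, a minimal runnable sketch of the whole flow, with made-up sample values:

from pyspark.sql import SparkSession, functions as f

spark = SparkSession.builder.getOrCreate()

# toy rows; schema given as a DDL string so 'period' is a plain int
df = spark.createDataFrame(
    [('2020-01-15', 2), ('2020-03-01', 5)],
    'cohort string, period int',
).withColumn('cohort', f.to_date('cohort'))

# expr() lets add_months read the month count from the 'period' column
df = df.withColumn('future', f.expr('add_months(cohort, period)'))
df.show()

(Recent PySpark versions may also accept a column directly in f.add_months, but expr works across versions.)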
I have a DB2 table where the NUM column is defined as INTEGER.
The NUM column holds numeric values that need to be converted to dates: each value is the number of days elapsed since 01.01.1850. Example: 01.01.1850 + 57677 days = 01.12.2007.
So is it possible to convert or cast the numeric value into a date field in DB2, so that a SELECT from the table returns dates instead of day counts?
You may use the scalar ADD_DAYS function:
SELECT EMP_ID, ADD_DAYS('1850-01-01', NUM) AS NUM
FROM yourTable;
Not all Db2 products & versions have the ADD_DAYS function.
The following expression works on all of them; you may optionally append DAY or DAYS at the end.
DATE('1850-01-01') + 57677
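As a quick sanity check of the day arithmetic (outside DB2), Python's datetime reproduces the example from the question:

from datetime import date, timedelta

# 01.01.1850 + 57677 days should land on 01.12.2007
print(date(1850, 1, 1) + timedelta(days=57677))  # 2007-12-01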
I have a pandas DataFrame with a datetime column, and I want to filter the DataFrame between the current hour and 10 hours ago. I have tried different approaches but cannot get it to work: with plain pandas the column is a Series, so I can't compare it against a timedelta, and looping over the column to compare each value as a string to my time interval is not efficient.
I want to filter the 'dateTime' column between the current time and 10 hours ago, and then filter rows where 'weeks' > 80.
I have tried the following code as well, but it did not work:
filter_criteria = main_table['dateTime'].sub(today).abs().apply(lambda x: x.hours <= 10)
main_table.loc[filter_criteria]
This returns an error:
TypeError: unsupported operand type(s) for -: 'str' and 'datetime.datetime'
Similarly this code has the same problem:
main_table.loc[main_table['dateTime'] >= (datetime.datetime.today() - pd.DateOffset(hours=10))]
And:
main_table[(pd.to_datetime('today') - main_table['dateTime'] ).dt.hours.le(10)]
In all of the code above, main_table is the name of my DataFrame.
How can I filter it?
First, you need to make sure that the dtype of your datetime column is correct. You can check it with:
main_table.info()
If it is not a datetime dtype (i.e. it shows as object), convert it:
# pass an explicit format= argument if this line fails to parse your dates
main_table['dateTime'] = pd.to_datetime(main_table['dateTime'])
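If automatic parsing fails, you can spell the format out; for example, assuming (hypothetically) strings like '2022-01-31 13:45:00':

# the format string here is an assumption; adjust it to match your actual data
main_table['dateTime'] = pd.to_datetime(main_table['dateTime'],
                                        format='%Y-%m-%d %H:%M:%S')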
Then you need to build the datetime object for ten hours before the current time:
from datetime import datetime, timedelta
date_time_ten_before = datetime.now() - timedelta(hours=10)
All that remains is to filter the column:
main_table_10 = main_table[main_table['dateTime'] >= date_time_ten_before]
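To also apply the question's second condition, filter the result again on 'weeks' > 80:

main_table_filtered = main_table_10[main_table_10['weeks'] > 80]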
In a Hive table, the value of one column looks like 01/12/17, but I need it in the format 12-2017 (month-year). How can I convert it?
Convert the string to a unix_timestamp and output the required format using from_unixtime.
select from_unixtime(unix_timestamp(col_name,'dd/MM/yy'),'MM-yyyy')
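As a cross-check of the pattern (outside Hive), Python's strptime/strftime gives the same mapping:

from datetime import datetime

# '01/12/17' read as day/month/year, printed as month-year
print(datetime.strptime('01/12/17', '%d/%m/%y').strftime('%m-%Y'))  # 12-2017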
I want to import some data from a text file into a PostgreSQL table. When I create the table, I give one column the data type DATE. When I try to import the data, I get this error:
ERROR: invalid input syntax for type date: "2009/11"
Some of my data for that column has the format YYYY/MM/DD and some of it has the format YYYY/MM. Now my question is: how can I import this data?
to_date with a full 'yyyy/mm/dd' pattern handles all of these inputs; missing parts default to the first month/day:
select to_date(dateColumn, 'yyyy/mm/dd')
select to_date('2009/11/07', 'yyyy/mm/dd')  -- 2009-11-07
select to_date('2009/11', 'yyyy/mm/dd')     -- 2009-11-01
select to_date('2009', 'yyyy/mm/dd')        -- 2009-01-01
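For comparison (outside PostgreSQL), Python's strptime defaults missing fields the same way:

from datetime import datetime

print(datetime.strptime('2009/11/07', '%Y/%m/%d').date())  # 2009-11-07
print(datetime.strptime('2009/11', '%Y/%m').date())        # 2009-11-01
print(datetime.strptime('2009', '%Y').date())              # 2009-01-01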
For the record: I don't use PostgreSQL myself, so I don't know if there is a native tool or way of doing this.
Since it's a one-time operation, I would change the data type of the column from DATE to TEXT, or create a new temporary TEXT column.
Import the text file into that table, then run a query to add the missing day (it isn't meaningful, but the DATE format requires it):
UPDATE yourTable SET dateColumn = dateColumn || '/01' WHERE length(dateColumn) = 7;
This appends '/01' to the column value whenever its length is 7 (i.e. YYYY/MM); in PostgreSQL, || is the string concatenation operator and length() returns the string length.
Afterwards, copy the values from the temp column into the DATE column, or change the data type back to DATE, depending on which path you chose.
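Alternatively, the same normalization can be done before the import; a minimal sketch in Python, assuming a hypothetical file with one date per line (real files with multiple columns would need field splitting):

# pad 'YYYY/MM' values to 'YYYY/MM/01' before importing
# (file names and one-date-per-line layout are assumptions)
with open('data.txt') as src, open('data_fixed.txt', 'w') as dst:
    for line in src:
        value = line.strip()
        if len(value) == 7:  # 'YYYY/MM'
            value += '/01'
        dst.write(value + '\n')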
Merry Xmas!
For example: I am bringing a Hive table column (datetime data type) into Pig and want to extract only the DATE portion. I have tried the ToDate function; the error information is below. Please help me in this critical situation.
The original value in this column is "2014-07-29T06:01:33.705-04:00", and I need the output to be "2014-07-29".
ToDate(eff_end_ts,'YYYY-MM-DD') AS Delta_Column;
2016-07-28 07:07:25,298 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.ToDate as multiple or none of them
fit. Please use an explicit cast.
Assuming your column name is f1 and it holds timestamps with values like 2014-07-29T06:01:33.705-04:00, you will have to use GetYear(), GetMonth(), and GetDay() and CONCAT the parts into the required format:
B = FOREACH A GENERATE
    CONCAT(
        CONCAT(
            CONCAT((chararray)GetYear(f1), '-'),
            CONCAT((chararray)GetMonth(f1), '-')
        ),
        (chararray)GetDay(f1)
    ) AS Day;
I found a workaround, and it is working this way:
ToDate(ToString(eff_end_ts, 'yyyy-MM-dd'), 'yyyy-MM-dd') AS Delta_Column:datetime
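Outside Pig, the same extraction can be sanity-checked in Python (offsets with colons need Python 3.7+ for %z):

from datetime import datetime

ts = '2014-07-29T06:01:33.705-04:00'
print(datetime.strptime(ts, '%Y-%m-%dT%H:%M:%S.%f%z').date())  # 2014-07-29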