Calculate number of days left from nearest date in Pandas - pandas

I have a list like this:
dates = ["2020-05-08","2019-02-22", "2014-08-16"...........]
And a DF like this:
date .....
2020-12-19 .....
2015-06-01 .....
2018-03-06 ....
......
I want to create another column named "daysLeft", which counts the days from the nearest date in the list.
For example.
If today is 24th Dec, then "1" day is left for Christmas. But if today is 26th Dec, "-1" day is left for Christmas. (Subtract the date from the nearest date.)

I am not sure if this answers your question, but this might be a step in the right direction:
#This function converts a string to a datetime object for date manipulation
from datetime import datetime
def make_date(any_value):
    return datetime.strptime(any_value, '%Y-%m-%d')
#We apply the function to the list
dates = ["2020-05-08","2019-02-22", "2014-08-16"]
dt_obj = list(map(make_date, dates))
#We apply the function to the DataFrame
date_df_list=["2020-12-19", "2015-06-01", "2018-03-06" ]
import pandas
date_df=pandas.DataFrame(date_df_list, columns=["date"])
date_df['date'] = date_df['date'].astype(str) #Each object needs to be converted to String for Function
date_df_yyyymmdd = pandas.DataFrame(columns=['date']) #Initialise Empty DataFrame
date_df_yyyymmdd['date'] = date_df['date'].apply(make_date)
#In this example we find the difference in days from the second date in the list (dt_obj[1])
#Similarly we can find it for all the other dates in the list
date_df['daysLeft'] = date_df_yyyymmdd['date'].apply(lambda x: (x-dt_obj[1]).days)
print(date_df)
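The snippet above subtracts a single fixed date. For the "nearest date" part of the question, one possible sketch is to compute, for each row, the signed day differences to every date in the list and keep the one with the smallest absolute value. The helper name days_left and the sign convention (nearest date minus row date, so 24th Dec gives 1 and 26th Dec gives -1 for Christmas) are assumptions based on the example in the question, not part of the original answer.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2020-05-08", "2019-02-22", "2014-08-16"])
df = pd.DataFrame({"date": pd.to_datetime(["2020-12-19", "2015-06-01", "2018-03-06"])})

def days_left(day, candidates=dates):
    # Signed difference in whole days: positive if the candidate is still ahead
    diffs = (candidates - day).days.to_numpy()
    # Keep the difference with the smallest absolute value (the nearest date)
    return int(diffs[np.abs(diffs).argmin()])

df["daysLeft"] = df["date"].apply(days_left)
print(df)
```

For a long list of dates, a sorted lookup would probably scale better than comparing against every candidate, but the row-wise version is easier to follow.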

Related

Outputting pandas timestamp to tuple with just month and day

I have a pandas dataframe with a timestamp field which I have successfully converted to datetime format, and now I want to output just the month and day as a tuple for the first date value in the data frame. It is for a test and the output must not have leading zeros. I have tried a number of things but I cannot find an answer without converting the timestamp to a string, which does not work.
This is the format
2021-05-04 14:20:00.426577
df_cleaned['trans_timestamp']=pd.to_datetime(df_cleaned['trans_timestamp']) is as far as I have got with the code.
I have been working on this for days and cannot get output the checker will accept.
Update
If you want to extract the month and day from the first record (solution proposed by @FObersteiner):
>>> df['trans_timestamp'].iloc[0].timetuple()[1:3]
(5, 4)
If you want to extract the month and day for every row of your dataframe, use:
# Setup
df = pd.DataFrame({'trans_timestamp': ['2021-05-04 14:20:00.426577']})
df['trans_timestamp'] = pd.to_datetime(df['trans_timestamp'])
# Extract tuple
df['month_day'] = df['trans_timestamp'].apply(lambda x: (x.month, x.day))
print(df)
# Output
             trans_timestamp month_day
0 2021-05-04 14:20:00.426577    (5, 4)

How can I convert a string that contains only a year number to a date?

Create a dataframe whose first column is text.
import pandas as pd
values = {'dates': ['2019','2020','2021'],
          'price': [11,12,13]
          }
df = pd.DataFrame(values, columns = ['dates','price'])
Check the dtypes:
df.dtypes
dates object
price int64
dtype: object
Convert the dates column to datetime type.
df['dates'] = pd.to_datetime(df['dates'], format='%Y')
df
       dates  price
0 2019-01-01     11
1 2020-01-01     12
2 2021-01-01     13
I want the dates column to have a date type, but with the values displayed in the following format, containing only the year number:
   dates  price
0   2019     11
1   2020     12
2   2021     13
How can I achieve this?
If you choose to keep the datetime format for your column, you will likely benefit from it. What you see in the column ("2019-01-01") is a representation of the datetime object. The real question here is: why do you need a datetime object?
Actually, I don't care about datetime type:
Use a string ('2019'), or preferably an integer (2019), which will enable you to perform sorting, calculations, etc.
I need the datetime type but I really want to see only the year:
Use style to format your column while retaining the underlying type:
df.style.format({'dates': lambda t: t.strftime('%Y')})
This will allow you to keep the type while having a clean visual format.
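If the datetime type is not needed after all, the string and integer alternatives mentioned above can be sketched as follows. The column names year and year_str are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"dates": ["2019", "2020", "2021"], "price": [11, 12, 13]})
df["dates"] = pd.to_datetime(df["dates"], format="%Y")

# Integer year: sortable and usable in arithmetic
df["year"] = df["dates"].dt.year
# String year: mostly useful for display
df["year_str"] = df["dates"].dt.strftime("%Y")
print(df)
```

Both derived columns lose the datetime type, so the original dates column is kept alongside them here.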

How do I convert a non zero padded day string to a useful date in pandas

I'm trying to import a date string with non-zero padded day, zero padded month, and year without century to create a datetime e.g. (11219 to 01/12/19). However, pandas cannot distinguish between the day and the month (e.g. 11219 could be 11th February, 2019 or 1st December, 2019).
I've tried using 'dayfirst' and the '#' flag in the day directive (e.g. %#d), but nothing works. Code below, any advice?
Code:
df_import['newDate'] = pd.to_datetime(df_import['Date'], format='%d/%m/%Y', dayfirst = True)
Error:
time data '11219' does not match format '%d/%m/%Y' (match)
Since only the day is not zero-padded, the dates are unambiguous. They can simply be parsed by Pandas if we add the pad:
pd.to_datetime(df_import['Date'].str.zfill(6), format='%d%m%y')
Use zfill().
A custom function can also be used if you want to handle more cases.
def getDate(value):
    return  # logic to parse
df_import['newDate'] = df_import['Date'].apply(getDate)
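As a hedged illustration of what such a custom function could look like, the sketch below pads the value to six characters and parses it with the standard library. The name parse_padded_date and the sample values are hypothetical; it also assumes the column may have been read as integers, hence the str() call.

```python
from datetime import datetime

import pandas as pd

def parse_padded_date(value):
    # Left-pad to 6 characters so that '11219' becomes '011219' (ddmmyy)
    return datetime.strptime(str(value).zfill(6), "%d%m%y")

df_import = pd.DataFrame({"Date": [11219, 251219]})
df_import["newDate"] = df_import["Date"].apply(parse_padded_date)
print(df_import)
```

A function like this is the place to add extra branches if some rows turn out to use a different layout.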

to change any form of date string using pandas

My date-time format in Excel is 01-12-2010 08:26 (day = 01, month = 12). When I import that into pandas and change the dtype to datetime, the month and day get swapped. I am new to this, please help.
Output of pandas is
x.day
12
x.month
1
Excel
Invoice date = 01/12/2010 08:26
PANDAS
When importing using sales = pd.read_csv()
sales["InvoiceDate"] = sales["InvoiceDate"].astype("datetime64[ns]")
[In] y["InvoiceDate"].loc[0]
[Out] Timestamp('2010-01-12 08:26:00')
[In] y["InvoiceDate"].loc[0].day
[Out] 12
The output of this should be 1 instead of 12.
Where am I getting it wrong?
Please help.
You can use pd.to_datetime with the dayfirst parameter, like below:
pd.to_datetime("01/12/2010 08:26", dayfirst=True)
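Applied to a whole column, the same idea might look like the sketch below. The one-row frame stands in for the imported CSV, and passing an explicit format string is shown as an alternative to dayfirst; it assumes every value follows the day/month/year hour:minute layout.

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_csv()
sales = pd.DataFrame({"InvoiceDate": ["01/12/2010 08:26"]})

# Option 1: tell pandas that the day comes first
by_dayfirst = pd.to_datetime(sales["InvoiceDate"], dayfirst=True)
# Option 2: spell out the layout explicitly
by_format = pd.to_datetime(sales["InvoiceDate"], format="%d/%m/%Y %H:%M")

sales["InvoiceDate"] = by_format
print(sales["InvoiceDate"].loc[0].day, sales["InvoiceDate"].loc[0].month)
```

An explicit format tends to be the stricter choice, since values that do not match the layout raise an error instead of being guessed.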

Select Data frame between two dates of a date column

I would like to subset a data frame based on a date column, which originally has this format:
3/22/13
After I transform it to a date:
df['date']=pd.to_datetime(df['date'], format='%m/%d/%y')
I get this:
2013-03-22 00:00:00
Now I would like to subset it with something like this:
df.loc[(df['date']>'2014-06-22')]
But that either gives me an empty data frame or full data frame, that is no filtering.
Any suggestions how I can get this to work?
Remark: I am well aware that similar questions have been asked in other forums, but I could not figure out a solution since my date column looks different.
First you have to convert your start date and end date into a datetime format. Then you can apply multiple conditions inside df.loc. Do not forget to assign the result back to your df:
import pandas as pd
from datetime import datetime
df['date']=pd.to_datetime(df['date'], format='%m/%d/%y')
date1 = datetime.strptime('2013-03-23', '%Y-%m-%d')
date2 = datetime.strptime('2013-03-25', '%Y-%m-%d')
df = df.loc[(df['date']>date1) & (df['date']<date2)]
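To see the filter on concrete data, here is a minimal self-contained sketch. The sample rows and the value column are made up for illustration, and pd.Timestamp is used in place of datetime.strptime purely as a shorter way to build the bounds.

```python
import pandas as pd

# Hypothetical sample data in the original m/d/yy layout
df = pd.DataFrame({"date": ["3/22/13", "3/24/13", "3/25/13", "6/23/14"],
                   "value": [1, 2, 3, 4]})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")

date1 = pd.Timestamp("2013-03-23")
date2 = pd.Timestamp("2013-03-25")

# Strict inequalities: both boundary dates are excluded
subset = df.loc[(df["date"] > date1) & (df["date"] < date2)]
print(subset)
```

Switching to >= and <= would include the boundary dates as well.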