Subtracting dates in Pandas

I am trying to create a new column in a dataframe that shows the number of days between now and a past date. So far I have the code below, but it returns the number of days plus a time component. How can I get just the number of days?
import datetime
import pytz
now = datetime.datetime.now(pytz.utc)
excel1['days_old'] = now - excel1['Start Time']
Returns:
92 days 08:08:06.667518

excel1['days_old'] will hold timedeltas. To reduce them to the day difference, use the ".dt.days" accessor on the column (a plain ".days" only exists on a single Timedelta, not on a Series), like this:
import datetime
import pytz
now = datetime.datetime.now(pytz.utc)
excel1['days_timedelta'] = now - excel1['Start Time']
excel1['days_old'] = excel1['days_timedelta'].dt.days
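For reference, a minimal self-contained sketch of the day-difference calculation. The sample rows and the fixed value of now are assumptions made only so the result is reproducible; in real use now would be the current time.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's excel1 frame.
excel1 = pd.DataFrame({
    "Start Time": pd.to_datetime(
        ["2024-01-01 08:00", "2024-03-15 12:30"], utc=True
    )
})

# A fixed, timezone-aware "now" keeps the example reproducible;
# in real use this would be pd.Timestamp.now(tz="UTC").
now = pd.Timestamp("2024-04-01 00:00", tz="UTC")

# Subtraction yields a timedelta Series; .dt.days keeps only whole days.
excel1["days_old"] = (now - excel1["Start Time"]).dt.days

print(excel1["days_old"].tolist())  # [90, 16]
```

Both sides of the subtraction must agree on time zone handling: either both timezone-aware or both naive.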

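Assuming that the Start Time column is of datetime type, run:
(pd.Timestamp.now() - df['Start Time']).dt.days
If the column is timezone-aware, pass a time zone as well, for example pd.Timestamp.now(tz='UTC'), because pandas cannot subtract naive and aware datetimes from each other.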

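This also worked for me:
import datetime
import pytz
now = datetime.datetime.now(pytz.utc)
excel1['days_old'] = (now - excel1['Start Time']).astype('timedelta64[D]')
Note that pandas 2.0 and later no longer accept the timedelta64[D] cast (only the 's', 'ms', 'us' and 'ns' resolutions are supported), so on recent versions use .dt.days instead.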
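If fractional days are wanted rather than whole days, dividing the timedelta column by a one-day Timedelta is one option. This is a sketch with made-up durations, not something taken from the answers above.

```python
import pandas as pd

# Hypothetical durations, similar in shape to the output in the question.
delta = pd.Series(pd.to_timedelta(["92 days 08:08:06", "1 days 18:00:00"]))

# Whole days only (the time-of-day part is discarded).
whole_days = delta.dt.days

# Days as floats, keeping the fractional part.
fractional_days = delta / pd.Timedelta(days=1)

print(whole_days.tolist())      # [92, 1]
print(fractional_days.iloc[1])  # 1.75
```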

Related

Convert DateTime to TimeStamp Pandas

The objective of this post is to convert the columns ['Open Date', 'Close date'] to timestamp format.
I have tried the functions / examples from these links without any results:
Convert datetime to timestamp in Neo4j
Convert datetime pandas
Pandas to_dict() converts datetime to Timestamp
Really appreciate any ideas / comments / examples on how to do so.
Column Characteristics:
Open Date datetime64[ns] and pandas.core.series.Series
Close date datetime64[ns] and pandas.core.series.Series
Finally, I have been using these libraries:
import pandas as pd
import numpy as np
from datetime import datetime, date, time, timedelta

First convert the column to a NumPy array with .values and cast it to int64. The output is in nanoseconds, so integer-divide by 10 ** 9 to get seconds:
df['open_ts'] = df['Open_Date'].values.astype(np.int64) // 10 ** 9
df['close_ts'] = df['Close_Date'].values.astype(np.int64) // 10 ** 9
OR
If you want to avoid using NumPy, you can subtract the Unix epoch and take the total seconds:
df['open_ts'] = (df['Open_Date'] - pd.Timestamp('1970-01-01')).dt.total_seconds().astype(int)
df['close_ts'] = (df['Close_Date'] - pd.Timestamp('1970-01-01')).dt.total_seconds().astype(int)
Try them and report back here.
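A small runnable sketch of the epoch-seconds conversion, assuming timezone-naive datetime64[ns] values that represent UTC. The sample dates are invented for illustration, and the floor division by a one-second Timedelta is a variation on the approaches above.

```python
import pandas as pd

# Hypothetical sample column; the real frame has 'Open Date' / 'Close date'.
df = pd.DataFrame({
    "Open_Date": pd.to_datetime(["1970-01-02 00:00:00", "2020-01-01 00:00:00"])
})

epoch = pd.Timestamp("1970-01-01")

# Timedelta since the epoch, floor-divided by one second, gives integer seconds.
df["open_ts"] = (df["Open_Date"] - epoch) // pd.Timedelta(seconds=1)

print(df["open_ts"].tolist())  # [86400, 1577836800]
```

Rows containing NaT have no integer representation, so they would need to be handled (dropped or filled) before relying on an integer result.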

Extract the first 10 values of a column and create a new one [duplicate]

I am looking to convert datetime to date for a pandas datetime series.
I have listed the code below:
df = pd.DataFrame()
df = pandas.io.parsers.read_csv("TestData.csv", low_memory=False)
df['PUDATE'] = pd.Series([pd.to_datetime(date) for date in df['DATE_TIME']])
df['PUDATE2'] = datetime.datetime.date(df['PUDATE']) #Does not work
Can anyone guide me in the right direction?

You can access the datetime methods of a pandas Series through the .dt accessor (in a similar way to how you would access string methods using .str). For your case, you can extract the date of your datetime column as:
df['PUDATE'].dt.date
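A short sketch of what .dt.date returns, next to .dt.normalize() as an alternative that keeps the datetime64 dtype. The sample values are made up; only the PUDATE column name comes from the question.

```python
import datetime
import pandas as pd

# Hypothetical sample values for the PUDATE column.
df = pd.DataFrame({
    "PUDATE": pd.to_datetime(["2014-12-14 08:26:00", "2015-01-02 23:59:59"])
})

# .dt.date gives Python datetime.date objects (the column becomes object dtype).
df["as_date"] = df["PUDATE"].dt.date

# .dt.normalize() resets the time to midnight but stays datetime64,
# so further .dt operations and date arithmetic keep working.
df["midnight"] = df["PUDATE"].dt.normalize()

print(df["as_date"].iloc[0])   # 2014-12-14
print(df["midnight"].iloc[1])  # 2015-01-02 00:00:00
```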

This is a simple way to get the day of the month (and the month and year) from a pandas datetime column:
import pandas as pd
# Create a dataframe with dates stored as strings
test_df = pd.DataFrame({'dob': ['2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04']})
# Convert the column to datetime type
test_df['dob'] = pd.to_datetime(test_df['dob'])
# Extract day, month and year using the dt accessor
test_df['DayOfMonth'] = test_df['dob'].dt.day
test_df['Month'] = test_df['dob'].dt.month
test_df['Year'] = test_df['dob'].dt.year

I think you need to specify the format when parsing, for example:
df['PUDATE2'] = pd.to_datetime(df['DATE_TIME'], format='%Y%m%d%H%M%S').dt.date
So you just need to know what format your data uses.

Using Datetime to convert a pandas dataframe column (string)

I am trying to convert the following string to datetime format:
"14DEC2014"
Does anyone have advice on how to do this? I have been stuck on this one for a day or two now.

import pandas as pd
test = '14DEC2014'
test = pd.to_datetime(test)
print(test)
output:
2014-12-14 00:00:00
If you would like to only have the date:
test = pd.to_datetime(test).date()
output:
2014-12-14
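For a whole column of such strings, passing an explicit format avoids relying on format inference. This is a sketch with invented sample values; it assumes English month abbreviations, since %b depends on the locale.

```python
import pandas as pd

# Hypothetical column of strings in the same DDMONYYYY layout.
raw = pd.Series(["14DEC2014", "03JAN2015"])

# %d = day, %b = abbreviated month name (matched case-insensitively), %Y = year.
parsed = pd.to_datetime(raw, format="%d%b%Y")

print(parsed.dt.date.tolist())
```

With an explicit format, a value that does not match raises an error instead of being silently misread.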

to change any form of date string using pandas

My date-time format in Excel is 01-12-2010 08:26 (day = 01, month = 12). When I import it into pandas and change the dtype to datetime, the month and day get swapped. I am new to this, please help.
Output of pandas:
x.day
12
x.month
1
Excel:
Invoice date = 01/12/2010 08:26
pandas, after importing with sales = pd.read_csv():
sales["InvoiceDate"] = sales["InvoiceDate"].astype("datetime64[ns]")
[In] y["InvoiceDate"].loc[0]
[Out] Timestamp('2010-01-12 08:26:00')
[In] y["InvoiceDate"].loc[0].day
[Out] 12
The output of this should be 1 instead of 12.
Where am I getting it wrong?

You can use pd.to_datetime with the dayfirst parameter, as below:
pd.to_datetime("01/12/2010 08:26", dayfirst=True)
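Applied to a whole column, an explicit format is an alternative to dayfirst that leaves no room for ambiguity. The sketch below assumes the values use the slash-separated layout shown in the question; the sample rows are invented.

```python
import pandas as pd

# Hypothetical rows in day/month/year order, as exported from Excel.
sales = pd.DataFrame({"InvoiceDate": ["01/12/2010 08:26", "13/12/2010 09:00"]})

# Spell out the layout so the day and month cannot be swapped.
sales["InvoiceDate"] = pd.to_datetime(
    sales["InvoiceDate"], format="%d/%m/%Y %H:%M"
)

first = sales["InvoiceDate"].loc[0]
print(first.day, first.month)  # 1 12
```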

Select Data frame between two dates of a date column

I would like to subset a data frame based on a date column, which originally has this format:
3/22/13
After I transform it to a date:
df['date']=pd.to_datetime(df['date'], format='%m/%d/%y')
I get this:
2013-03-22 00:00:00
Now I would like to subset it with something like this:
df.loc[(df['date']>'2014-06-22')]
But that gives me either an empty data frame or the full data frame, that is, no filtering.
Any suggestions how I can get this to work?
remark: I am well aware that similar questions have been asked in other forums but I could not figure out a solution since my date column looks different.

First convert your start and end dates to datetime objects. Then you can combine multiple conditions inside df.loc. Do not forget to assign the result back to df:
import pandas as pd
from datetime import datetime
df['date']=pd.to_datetime(df['date'], format='%m/%d/%y')
date1 = datetime.strptime('2013-03-23', '%Y-%m-%d')
date2 = datetime.strptime('2013-03-25', '%Y-%m-%d')
df = df.loc[(df['date']>date1) & (df['date']<date2)]
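The same kind of filter can be sketched end to end with Series.between, using made-up rows in the question's m/d/yy format. Note that between includes both endpoints by default, unlike the strict comparisons above.

```python
import pandas as pd

# Hypothetical data in the original m/d/yy string format.
df = pd.DataFrame({
    "date": ["3/22/13", "3/24/13", "3/26/13"],
    "value": [1, 2, 3],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")

# Boolean mask: True where the date lies within the (inclusive) range.
mask = df["date"].between(pd.Timestamp("2013-03-23"), pd.Timestamp("2013-03-25"))
subset = df.loc[mask]

print(subset["value"].tolist())  # [2]
```

If the filter returns everything or nothing, checking df['date'].dtype is a quick way to confirm the column was really converted to datetime and is not still a string column.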
