Converting a datetime in pandas

I have datetimes in the following format (Mai 2, 2018 6:35:52 AM), which I want to convert to the %Y-%m-%d %H:%M:%S.%f format, for example: 2017-05-13 21:11:01.436757
I have tried every approach I can think of, but I keep getting errors:
customer_file_path['RECEIVED_DATE2'] = customer_file_path['RECEIVED_DATE'].str.replace(',','')
print(customer_file_path[['RECEIVED_DATE2']].head(10))
customer_file_path['RECEIVED_DATE2'] = pd.to_datetime(customer_file_path['RECEIVED_DATE2'], format='%b %e %Y %-I:%M:%S %p', errors='coerce')
print(customer_file_path[['RECEIVED_DATE2']].head(10))
customer_file_path['RECEIVED_DATE3'] = pd.to_datetime(customer_file_path['RECEIVED_DATE2']).dt.strftime('%Y-%m-%d %H:%M:%S.%f')#, errors='coerce')
customer_file_path['RECEIVED_DATE2'] = pd.to_datetime(customer_file_path['RECEIVED_DATE2'], format='%b %e %Y %-I:%M:%S %p').dt.strftime('%Y-%m-%d %H:%M:%S.%f')#, errors='coerce')
customer_file_path['RECEIVED_DATE3'] = pd.to_datetime(customer_file_path['RECEIVED_DATE2'], format='%b %e %Y %-I:%M:%S %p').dt.strftime('%Y-%m-%d %H:%M:%S.%f')#, errors='coerce')
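One likely culprit in the attempts above is the format string: %e and %-I are platform strftime extensions, not directives understood by the strptime-style parser that pandas uses, and %d and %I already accept values without a leading zero ("2", "6"). A minimal sketch of a working conversion, assuming English month abbreviations such as "May" (under an English locale, %b will not match the German "Mai"); the sample rows are hypothetical. If you parse the comma-stripped RECEIVED_DATE2 column instead, drop the comma from the format as well ('%b %d %Y %I:%M:%S %p').

```python
import pandas as pd

# Hypothetical sample data in the question's layout; assumes English month names.
df = pd.DataFrame({'RECEIVED_DATE': ['May 2, 2018 6:35:52 AM',
                                     'May 13, 2017 9:11:01 PM']})

# %d and %I match non-zero-padded day and hour values, so %e / %-I are not needed.
# The comma is kept in the format because it is still present in the strings.
parsed = pd.to_datetime(df['RECEIVED_DATE'], format='%b %d, %Y %I:%M:%S %p')

# Render as strings; %f is the six-digit microsecond field (000000 here).
df['RECEIVED_DATE2'] = parsed.dt.strftime('%Y-%m-%d %H:%M:%S.%f')
print(df['RECEIVED_DATE2'].tolist())
```

Note that strftime produces strings; if you only need real datetimes for further calculations, keep the parsed column and skip the strftime step.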

Related

Converting pandas._libs.tslibs.timestamps.Timestamp to seconds since midnight?

I have a pandas._libs.tslibs.timestamps.Timestamp object, e.g., 2016-01-01 07:00:04.85+00:00, and I want to compute a number that stores the number of seconds since the previous midnight.
In the above example, it would return 7 * 3600 + 0 * 60 + 4.85 = 25204.85
Is there a quick way to do this in pandas?
You can use normalize() to subtract the date part:
# ts = pd.to_datetime('2016-01-01 07:00:04.85+00:00')
>>> (ts - ts.normalize()).total_seconds()
25204.85
It also works on a DataFrame column through the dt accessor:
# df = pd.DataFrame({'date': [ts]})
>>> (df['date'] - df['date'].dt.normalize()).dt.total_seconds()
0 25204.85
Name: date, dtype: float64
Not sure if this is what you are looking for, but here is an implementation:
import pandas as pd
def seconds_from_midnight(date):
    return date.hour * 3600 + date.minute * 60 + date.second + date.microsecond / 1000000
date = pd.Timestamp.now()
print(date)
print(seconds_from_midnight(date))

How to sort object data type index into datetime in pandas?

Index(['Apr-20', 'Apr-21', 'Apr-22', 'Aug-20', 'Aug-21', 'Aug-22', 'Dec-20',
'Dec-21', 'Dec-22', 'Feb-21', 'Feb-22', 'Jan-21', 'Jan-22', 'Jan-23',
'Jul-20', 'Jul-21', 'Jul-22', 'Jun-20', 'Jun-21', 'Jun-22', 'Mar-20',
'Mar-21', 'Mar-22', 'May-20', 'May-21', 'May-22', 'Nov-20', 'Nov-21',
'Nov-22', 'Oct-20', 'Oct-21', 'Oct-22', 'Sep-20', 'Sep-21', 'Sep-22'],
dtype='object', name='months')
How can I sort this object-dtype index of month-year strings chronologically, treating them as dates in 'MMM-YY' format, in pandas? Thanks in advance!
If you only need to sort the index values as datetimes, use DataFrame.sort_index with the key parameter:
df = df.sort_index(key=lambda x: pd.to_datetime(x, format='%b-%y'))
If you need a DatetimeIndex, convert the index first and then sort:
df.index = pd.to_datetime(df.index, format='%b-%y')
df = df.sort_index()
Another idea is to create a PeriodIndex:
df.index = pd.to_datetime(df.index, format='%b-%y').to_period('M')
df = df.sort_index()
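A small self-contained sketch of the first and third options, using a hypothetical four-row frame in place of the question's data; %y maps "20" to 2020.

```python
import pandas as pd

# Hypothetical frame with a 'MMM-YY' string index, like the one in the question.
df = pd.DataFrame({'value': [1, 2, 3, 4]},
                  index=pd.Index(['Jan-21', 'Apr-20', 'Dec-20', 'Aug-20'], name='months'))

# Option 1: keep the strings, but order them by their parsed dates.
by_key = df.sort_index(key=lambda idx: pd.to_datetime(idx, format='%b-%y'))
print(by_key.index.tolist())  # ['Apr-20', 'Aug-20', 'Dec-20', 'Jan-21']

# Option 3: convert to a monthly PeriodIndex, then sort.
as_period = df.copy()
as_period.index = pd.to_datetime(as_period.index, format='%b-%y').to_period('M')
as_period = as_period.sort_index()
print(as_period.index.astype(str).tolist())  # ['2020-04', '2020-08', '2020-12', '2021-01']
```

The key option leaves the original labels untouched, which is convenient if they are shown in a report; the PeriodIndex option is more useful when you want to resample or do date arithmetic later.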

Selecting data between two dates in Dataframe 'ValueError: Lengths must match to compare'

I want to select all the rows of my large df_data that fall between two dates. This works when I do it outside of a loop for a single day's worth of data:
df_data['datetime'] = pd.to_datetime(df_data['TimeStamp'] )
twelveearlier = datetime.datetime(2017, 12,23, 00,00, 00)
twelvelater = datetime.datetime(2017, 12, 24, 00, 00, 00)
df = df_data[(df_data['datetime']>= twelveearlier) &
(df_data['datetime']< twelvelater)]
But when I try to do this by looping through the list of dates below, I get ValueError: Lengths must match to compare.
event_name_list = ['noEvent_20161208174900', 'NoEvent_20161209174200', 'NoEvent20161211_061400']
for event in event_name_list:
    event_time = re.findall(r'\d+', event)
    event_timestamp = pd.to_datetime(event_time)
    twelvelater = event_timestamp + datetime.timedelta(hours=12)
    twelveearlier = event_timestamp - datetime.timedelta(hours=12)
    df = df_data[(df_data['datetime']>= twelveearlier.values) &
                 (df_data['datetime']< twelvelater.values)]
I think this is because twelveearlier and twelvelater are different types in the loop version, due to using event_timestamp - datetime.timedelta(hours=12), but converting them with to_datetime, to_pydatetime, etc. doesn't help. How do I get twelveearlier and twelvelater into the same format as df_data['datetime'] so that I can build df from only the dates between twelveearlier and twelvelater?
df_data['datetime']
3250592 2017-12-31 23:40:00
3250593 2017-12-31 23:50:00
Name: datetime, dtype: datetime64[ns]
print event_timestamp
DatetimeIndex(['2016-12-16 06:22:29'], dtype='datetime64[ns]', freq=None)
print twelveearlier
DatetimeIndex(['2016-12-08 05:49:00'], dtype='datetime64[ns]', freq=None)
print twelvelater
DatetimeIndex(['2016-12-09 05:49:00'], dtype='datetime64[ns]', freq=None)
You are trying to compare against a list of datetimes: twelvelater.values gives you a single-element array.
This means you are trying to compare the datetime column against 'multiple' elements in the conditional. Taking only the first element of each of these datetime arrays, twelvelater.values[0], should fix the problem with minimal code changes.
event_name_list = ['noEvent_20161208174900', 'NoEvent_20161209174200', 'NoEvent20161211_061400']
for event in event_name_list:
    event_time = re.findall(r'\d+', event)
    event_timestamp = pd.to_datetime(event_time)
    twelvelater = event_timestamp + datetime.timedelta(hours=12)
    twelveearlier = event_timestamp - datetime.timedelta(hours=12)
    df = df_data[(df_data['datetime']>= twelveearlier.values[0]) &
                 (df_data['datetime']< twelvelater.values[0])]
You are trying to compare a datetime column to a DatetimeIndex of length one. This is because re.findall returns a list of all the matches it finds. Try this:
event_name_list = pd.to_datetime([re.findall(r'\d+', x)[0] for x in event_name_list])
for event_timestamp in event_name_list:
    twelvelater = event_timestamp + datetime.timedelta(hours=12)
    twelveearlier = event_timestamp - datetime.timedelta(hours=12)
    df = df_data[(df_data['datetime']>= twelveearlier) &
                 (df_data['datetime']< twelvelater)]
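A self-contained sketch of the same idea: take the first digit run as a string so that pd.to_datetime returns a scalar Timestamp, which then broadcasts cleanly against the column. The hourly df_data, the two event names, and the windows dict are hypothetical stand-ins; the explicit format '%Y%m%d%H%M%S' is an assumption that only fits names whose first digit run has 14 digits.

```python
import re
import datetime
import pandas as pd

# Hypothetical hourly data standing in for the question's df_data.
df_data = pd.DataFrame({'datetime': pd.date_range('2016-12-08', '2016-12-11',
                                                  freq=pd.Timedelta(hours=1))})

event_name_list = ['noEvent_20161208174900', 'NoEvent_20161209174200']

windows = {}  # hypothetical container: event name -> rows within +/- 12 hours
for event in event_name_list:
    # [0] picks the first match as a plain string, so to_datetime returns a
    # scalar Timestamp rather than a length-one DatetimeIndex.
    digits = re.findall(r'\d+', event)[0]
    event_timestamp = pd.to_datetime(digits, format='%Y%m%d%H%M%S')
    twelveearlier = event_timestamp - datetime.timedelta(hours=12)
    twelvelater = event_timestamp + datetime.timedelta(hours=12)
    # Comparing a datetime64 Series with a scalar broadcasts, so lengths never clash.
    mask = (df_data['datetime'] >= twelveearlier) & (df_data['datetime'] < twelvelater)
    windows[event] = df_data[mask]

for event, df in windows.items():
    print(event, len(df), df['datetime'].min(), df['datetime'].max())
```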

How to set frequency with pd.to_datetime()?

When fitting a statsmodels model, I'm receiving a warning about the date frequency.
First, I import a dataset:
import pandas as pd
import statsmodels.api as sm
df = sm.datasets.get_rdataset(package='datasets', dataname='airquality').data
df['Year'] = 1973
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']])
df.drop(columns=['Year', 'Month', 'Day'], inplace=True)
df.set_index('Date', inplace=True, drop=True)
Next I try to fit a SES model:
fit = sm.tsa.SimpleExpSmoothing(df['Wind']).fit()
Which returns this warning:
/anaconda3/lib/python3.6/site-packages/statsmodels/tsa/base/tsa_model.py:171: ValueWarning: No frequency information was provided, so inferred frequency D will be used.
% freq, ValueWarning)
My dataset is daily so inferred 'D' is ok, but I was wondering how I can manually set the frequency.
Note that the DatetimeIndex has no freq set (see the last line):
DatetimeIndex(['1973-05-01', '1973-05-02', '1973-05-03', '1973-05-04',
'1973-05-05', '1973-05-06', '1973-05-07', '1973-05-08',
'1973-05-09', '1973-05-10',
...
'1973-09-21', '1973-09-22', '1973-09-23', '1973-09-24',
'1973-09-25', '1973-09-26', '1973-09-27', '1973-09-28',
'1973-09-29', '1973-09-30'],
dtype='datetime64[ns]', name='Date', length=153, freq=None)
As per this answer, I've checked for missing dates, but there don't appear to be any:
pd.date_range(start = '1973-05-01', end = '1973-09-30').difference(df.index)
DatetimeIndex([], dtype='datetime64[ns]', freq='D')
How should I set the frequency for the index?
I think pd.to_datetime does not set a default frequency, so you need DataFrame.asfreq:
df = df.set_index('Date').asfreq('d')
print (df.index)
DatetimeIndex(['1973-05-01', '1973-05-02', '1973-05-03', '1973-05-04',
'1973-05-05', '1973-05-06', '1973-05-07', '1973-05-08',
'1973-05-09', '1973-05-10',
...
'1973-09-21', '1973-09-22', '1973-09-23', '1973-09-24',
'1973-09-25', '1973-09-26', '1973-09-27', '1973-09-28',
'1973-09-29', '1973-09-30'],
dtype='datetime64[ns]', name='Date', length=153, freq='D')
But if the index contains duplicate values, you get an error:
df = pd.concat([df, df])
df = df.set_index('Date')
print (df.asfreq('d').index)
ValueError: cannot reindex from a duplicate axis
The solution is to use resample with an aggregation function:
print (df.resample('2D').mean().index)
DatetimeIndex(['1973-05-01', '1973-05-03', '1973-05-05', '1973-05-07',
'1973-05-09', '1973-05-11', '1973-05-13', '1973-05-15',
'1973-05-17', '1973-05-19', '1973-05-21', '1973-05-23',
'1973-05-25', '1973-05-27', '1973-05-29', '1973-05-31',
'1973-06-02', '1973-06-04', '1973-06-06', '1973-06-08',
'1973-06-10', '1973-06-12', '1973-06-14', '1973-06-16',
'1973-06-18', '1973-06-20', '1973-06-22', '1973-06-24',
'1973-06-26', '1973-06-28', '1973-06-30', '1973-07-02',
'1973-07-04', '1973-07-06', '1973-07-08', '1973-07-10',
'1973-07-12', '1973-07-14', '1973-07-16', '1973-07-18',
'1973-07-20', '1973-07-22', '1973-07-24', '1973-07-26',
'1973-07-28', '1973-07-30', '1973-08-01', '1973-08-03',
'1973-08-05', '1973-08-07', '1973-08-09', '1973-08-11',
'1973-08-13', '1973-08-15', '1973-08-17', '1973-08-19',
'1973-08-21', '1973-08-23', '1973-08-25', '1973-08-27',
'1973-08-29', '1973-08-31', '1973-09-02', '1973-09-04',
'1973-09-06', '1973-09-08', '1973-09-10', '1973-09-12',
'1973-09-14', '1973-09-16', '1973-09-18', '1973-09-20',
'1973-09-22', '1973-09-24', '1973-09-26', '1973-09-28',
'1973-09-30'],
dtype='datetime64[ns]', name='Date', freq='2D')
The problem is caused by the frequency not being set explicitly. In most cases you can't be sure that your data has no gaps, so generate a date range with
rng = pd.date_range(start = '1973-05-01', end = '1973-09-30', freq='D')
then reindex your DataFrame with this rng and fill the resulting NaN values with the method or value of your choice.
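A minimal sketch of that reindex-and-fill approach on a shortened, hypothetical five-day span with two missing days; linear interpolation is only one possible fill, and ffill(), bfill() or fillna(value) would work the same way.

```python
import pandas as pd

# Hypothetical daily Wind readings with gaps (2 and 4 May are missing);
# an index built by pd.to_datetime carries no frequency.
idx = pd.to_datetime(['1973-05-01', '1973-05-03', '1973-05-05'])
df = pd.DataFrame({'Wind': [7.4, 12.6, 14.3]}, index=idx)

# Complete daily range over the same span.
rng = pd.date_range(start='1973-05-01', end='1973-05-05', freq='D')

# Reindex onto the complete range; the missing days appear as NaN rows.
df = df.reindex(rng)

# Fill the gaps; linear interpolation is just one choice among several.
df['Wind'] = df['Wind'].interpolate()
print(df)
```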

Trying to get object id with date in pymongo

import pymongo
import datetime
from pymongo import MongoClient
from bson.objectid import ObjectId
from time import gmtime, strftime
gen_time = strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())
dummy_id = ObjectId.from_datetime(gen_time)
result = db["config"].find({"_id":{"$lt": dummy_id}})
print(result)
and it raises the error AttributeError: 'str' object has no attribute 'utcoffset'
You're passing a string to ObjectId.from_datetime():
gen_time = strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())
dummy_id = ObjectId.from_datetime(gen_time)
whereas the docs say
Pass either a naive datetime instance containing UTC, or an aware instance that has been converted to UTC.
You probably want
dummy_id = ObjectId.from_datetime(datetime.utcnow())
(or rather datetime.datetime.utcnow(), since you only did import datetime)