How to set time zone of values in a Pandas DataFrame?

I'd like to set the time zone of the values of a column in a Pandas DataFrame. I am reading the DataFrame with pandas.read_csv().

You can read dates as UTC directly from read_csv by setting the date_parser function manually, for example:
from dateutil.tz import tzutc
from dateutil.parser import parse
import pandas as pd
def date_utc(s):
    return parse(s, tzinfos=tzutc)
df = pd.read_csv('my.csv', parse_dates=[0], date_parser=date_utc)
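As an alternative sketch: recent pandas releases (2.0 and later) deprecate the date_parser argument, so it may be simpler to read the column as text and convert it afterwards with pd.to_datetime(..., utc=True). The inline CSV text and the column names "when" and "value" below are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'my.csv'
csv_text = "when,value\n2012-01-01 01:30:00,1\n2012-01-01 01:31:00,2\n"

df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings and mark them as UTC in one step
df["when"] = pd.to_datetime(df["when"], utc=True)

print(df.dtypes)
```

With utc=True, naive strings are treated as UTC and strings that carry an offset are converted to UTC.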
If you are creating a timeseries, you can use the tz argument of date_range:
dd = pd.date_range('2012-1-1 1:30', periods=3, freq='min', tz='UTC')
In [2]: dd
Out[2]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-01 01:30:00, ..., 2012-01-01 01:32:00]
Length: 3, Freq: T, Timezone: UTC
If your DataFrame/Series is already indexed by a time series, you can use the tz_localize method to set a timezone:
df.tz_localize('UTC')
or if it already has a timezone, use tz_convert:
df.tz_convert('UTC')
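A small sketch of the difference between the two calls, assuming a made-up DataFrame with a naive DatetimeIndex: tz_localize attaches a timezone to naive timestamps without shifting the wall-clock time, while tz_convert re-expresses already aware timestamps in another zone without changing the instant.

```python
import pandas as pd

# Hypothetical frame indexed by naive timestamps
idx = pd.date_range("2012-01-01 01:30", periods=3, freq="min")
df = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# Naive -> aware: the clock reading stays the same, a zone is attached
utc_df = df.tz_localize("UTC")

# Aware -> aware: same instant, shown in a different zone
berlin_df = utc_df.tz_convert("Europe/Berlin")

print(utc_df.index[0])
print(berlin_df.index[0])
```

Calling tz_convert on a naive index, or tz_localize on an index that is already aware, raises a TypeError, so it helps to check df.index.tz first when the state of the data is unknown.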

# core modules
from datetime import timezone, datetime
# 3rd party modules
import pandas as pd
import pytz
# create a dummy dataframe
df = pd.DataFrame({'date': [datetime(2018, 12, 30, 20 + i, 56)
                            for i in range(2)]})
print(df)
# Convert the time to a timezone-aware datetime object
df['date'] = df['date'].dt.tz_localize(timezone.utc)
print(df)
# Convert the time to another timezone
# The point in time does not change, only the associated timezone
my_timezone = pytz.timezone('Europe/Berlin')
df['date'] = df['date'].dt.tz_convert(my_timezone)
print(df)
gives
                 date
0 2018-12-30 20:56:00
1 2018-12-30 21:56:00
                       date
0 2018-12-30 20:56:00+00:00
1 2018-12-30 21:56:00+00:00
                       date
0 2018-12-30 21:56:00+01:00
1 2018-12-30 22:56:00+01:00

df['date'] = df['date'].dt.tz_localize('UTC')
This seems to work when the column starts out timezone-naive.

Related

Having problem converting object/string type date formats to datetime type

Please help me convert all elements of a column to datetime format in a pandas DataFrame. Below is one such element.
1-July-2020 7.30 PM
DateTime
Use pd.to_datetime and the format argument:
>>> df
DateTime
0 1-July-2020 7.30 PM
>>> df.dtypes
DateTime object
dtype: object
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%d-%B-%Y %I.%M %p')
Output result:
>>> df
DateTime
0 2020-07-01 19:30:00
>>> df.dtypes
DateTime datetime64[ns]
dtype: object

Trying to convert aware local datetime to naive local datetime in Pandas DataFrame

I am trying to do what the title says. I have a pandas DataFrame with a datetime column and a timezone column.
I am trying to convert the column startDate to the naive local datetime.
This is the code I use for that:
df['startDate'] = df['startDate'].apply(lambda x: timezone('UTC').localize(x))
df["startDate"] = df.apply(lambda x: x.startDate.astimezone(timezone(x.timezone)), axis=1)
df["startDate"] = df["startDate"].dt.tz_localize(None)
I get this error message:
Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
If I specify utc=True, I get my initial datetime values back, which is not what I am trying to achieve.
I want to get from this
2020-07-20 20:30:00-07:00
2020-07-21 16:00:00-04:00
2020-07-20 20:30:00-07:00
To this
2020-07-20 20:30:00
2020-07-21 16:00:00
2020-07-20 20:30:00
Otherwise, I am thinking of converting to a string, removing the trailing UTC offset (the last 6 characters, e.g. -07:00), and converting back to a datetime object. However, I am looking for a better solution.
Thank you
If you read datetime strings with differing UTC offsets like "2020-07-20 20:30:00-07:00", this will give you a Series of datetime.datetime objects (object dtype, not the pandas datetime64[ns]). So if I get this right, what you want to do is remove the tzinfo. You can do that like this:
import pandas as pd
df = pd.DataFrame({'startDate': pd.to_datetime(['2020-07-20 20:30:00-07:00',
                                                '2020-07-21 16:00:00-04:00',
                                                '2020-07-20 20:30:00-07:00'])})
# df['startDate'].iloc[0]
# datetime.datetime(2020, 7, 20, 20, 30, tzinfo=tzoffset(None, -25200))
df['startDate_naive'] = df['startDate'].apply(lambda t: t.replace(tzinfo=None))
# df['startDate_naive']
# 0 2020-07-20 20:30:00
# 1 2020-07-21 16:00:00
# 2 2020-07-20 20:30:00
# Name: startDate_naive, dtype: datetime64[ns]
If you work with a timezone-aware pandas datetime column (a single timezone for the whole column), you can remove the timezone awareness with .dt.tz_localize(None), which keeps the local wall-clock time.
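For the situation in the question, where each row carries its own timezone name, one possible sketch is to do the conversion row by row on pandas Timestamps. The sample values and the helper name to_local_naive are made up for illustration; the column names startDate and timezone follow the question, and startDate is assumed to hold naive UTC times.

```python
import pandas as pd

# Hypothetical data: naive UTC timestamps plus an IANA zone name per row
df = pd.DataFrame({
    "startDate": pd.to_datetime(["2020-07-21 03:30:00", "2020-07-21 20:00:00"]),
    "timezone": ["America/Los_Angeles", "America/New_York"],
})


def to_local_naive(row):
    """Interpret startDate as UTC, shift to the row's zone, drop the tzinfo."""
    local = row["startDate"].tz_localize("UTC").tz_convert(row["timezone"])
    return local.tz_localize(None)


df["startDate_local"] = df.apply(to_local_naive, axis=1)
print(df)
```

Because the timezone is removed before the values are collected back into a column, pandas never has to store mixed offsets in one Series, which is what triggers the error above.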

ValueError: time data '25-08-2012 00:00' does not match format '%m-%d-%Y %H:%M' (match

import pandas as pd
import numpy as np  # For mathematical calculations
import matplotlib.pyplot as plt  # For plotting graphs
import datetime as dt
from datetime import datetime  # To access datetime
from pandas import Series  # To work on series
import warnings  # To ignore the warnings
warnings.filterwarnings("ignore")
train = pd.read_csv("train.csv")
train.head()
Data:
ID Datetime Count
0 0 25-08-2012 00:00 8
1 1 25-08-2012 01:00 2
2 2 25-08-2012 02:00 6
3 3 25-08-2012 03:00 2
4 4 25-08-2012 04:00 2
I am trying to convert the date format above:
train['New_date'] = pd.to_datetime(train.Datetime, format='%m-%d-%Y %H:%M')
But I get:
ValueError: time data '25-08-2012 00:00' does not match format '%m-%d-%Y %H:%M' (match)
I read many similar questions in the forum but I am still stuck.
Swap %d and %m, because the format of the datetimes is DD-MM-YYYY HH:MM:
train['New_date'] = pd.to_datetime(train.Datetime, format='%d-%m-%Y %H:%M')
Indeed, there is no month 25.
Change either your data or your format string so that they agree. Right now one is DMY and the other is MDY.
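To make the mismatch concrete, here is a minimal sketch using two of the values from the question in a made-up Series: the month-first format raises a ValueError, while the day-first format parses.

```python
import pandas as pd

# Two sample values copied from the question's Datetime column
raw = pd.Series(["25-08-2012 00:00", "25-08-2012 01:00"])

try:
    # Month-first format: 25 is read as a month, which cannot match
    pd.to_datetime(raw, format="%m-%d-%Y %H:%M")
    failed = False
except ValueError as exc:
    failed = True
    print("Format mismatch:", exc)

# Day-first format agrees with the data
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M")
print(parsed)
```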

How to cast pandas datetime object to string without index

I'm trying to cast a pandas datetime64 object to a string without printing the index.
I have a csv file with the following
Dates
2019-06-01
2019-06-02
2019-06-03
When I import the csv file via pandas, I have a normal pandas object in the column.
df['Dates'] = pd.to_datetime(df['Dates'], format='%Y-%m-%d')
This provides a datetime64[ns] object. I tried printing this object with the following output.
>>> What is the date 0 2019-06-01
Name: Dates, dtype: datetime64[ns]
So I have to cast this object to a string. The documentation suggests I use dt.strftime().
s=df["Dates"].dt.strftime("%Y-%m-%d")
print(f"What is the date {s}")
The output for the above is:
>>> What is the date 0 2019-06-01
How do I remove the index from the output?
file = r'test.csv'
df = pd.read_csv(file)
df['Dates'] = pd.to_datetime(df['Dates'], format='%Y-%m-%d')
s = df[df["Dates"] < "2019-06-02"]
print(f"What is the date {s['Dates']}")
print(s["Dates"])
The expected output is the following:
>>> What is the date 2019-06-01
However I am getting the following
>>> What is the date 0 2019-06-01
You can try:
for x in s['Dates'].astype('str'):
    print(f"What is the date {x}")
gives:
What is the date 2019-06-01
What is the date 2019-06-02
What is the date 2019-06-03
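The index shows up because a whole Series is being formatted inside the f-string. If only one value is wanted, a sketch like the following picks the scalar out first; the DataFrame is rebuilt inline here instead of being read from test.csv.

```python
import pandas as pd

# Inline stand-in for the CSV from the question
df = pd.DataFrame({"Dates": pd.to_datetime(["2019-06-01", "2019-06-02", "2019-06-03"])})
s = df[df["Dates"] < "2019-06-02"]

# .iloc[0] returns a single Timestamp, so no index is printed
first = s["Dates"].iloc[0].strftime("%Y-%m-%d")
message = f"What is the date {first}"
print(message)

# For several rows, join the formatted strings instead of printing the Series
all_dates = ", ".join(df["Dates"].dt.strftime("%Y-%m-%d"))
print(all_dates)
```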

Pandas - Changing date format from yyyy-dd-mm to yyyy-mm-dd

I am trying to convert the timestamp 2018-06-11T00:00:00.000000000 to 2018-11-06T00:00:00.000000000
I have tried df['date'] = pd.to_datetime(df['date']) but it didn't help
Have also tried pd.to_datetime(df['date']).dt.strftime("%Y-%d-%m")
You can use format:
In [11]: s = pd.Series(['2018-06-11T00:00:00.000000000'])
In [12]: pd.to_datetime(s, format='%Y-%d-%mT%H:%M:%S')
Out[12]:
0 2018-11-06
dtype: datetime64[ns]
i.e.
df['date'] = pd.to_datetime(df['date'], format='%Y-%d-%mT%H:%M:%S')
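Depending on the pandas version, the trailing fractional seconds (.000000000) may need to be matched explicitly, since newer releases are stricter about unparsed trailing text. The sketch below is a variant that adds .%f to the format, which is an assumption on my part and not part of the answer above; pandas accepts up to nine fractional digits for %f.

```python
import pandas as pd

s = pd.Series(["2018-06-11T00:00:00.000000000"])

# Read the string as year-day-month, including the fractional seconds
parsed = pd.to_datetime(s, format="%Y-%d-%mT%H:%M:%S.%f")
print(parsed)

# Render it back as text in year-month-day order if a string is needed
as_text = parsed.dt.strftime("%Y-%m-%dT%H:%M:%S")
print(as_text.iloc[0])
```

Note that this only applies while the column still holds strings. Once a column is already datetime64, the day and month have been fixed by the earlier parse.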