How to cast pandas datetime object to string without index - pandas

I'm trying to cast a datetime64 pandas column to a string without printing the index.
I have a CSV file with the following contents:
Dates
2019-06-01
2019-06-02
2019-06-03
When I import the CSV file via pandas, the column comes in as a plain object dtype, so I convert it:
df['Dates'] = pd.to_datetime(df['Dates'], format='%Y-%m-%d')
This gives a datetime64[ns] column. Printing it produces the following output.
>>> What is the date 0 2019-06-01
Name: Dates, dtype: datetime64[ns]
So I have to cast this column to a string. The documentation suggests I use dt.strftime().
s = df["Dates"].dt.strftime("%Y-%m-%d")
print(f"What is the date {s}")
The output for the above is:
>>> What is the date 0 2019-06-01
How do I remove the index from the output?
file = r'test.csv'
df = pd.read_csv(file)
df['Dates'] = pd.to_datetime(df['Dates'], format='%Y-%m-%d')
s = df[df["Dates"] < "2019-06-02"]
print(f"What is the date {s['Dates']}")
print(s["Dates"])
The expected output is the following:
>>> What is the date 2019-06-01
However, I am getting the following:
>>> What is the date 0 2019-06-01

You can try iterating over the column's values, which leaves the index behind:
for x in s['Dates'].astype(str):
    print(f"What is the date {x}")
gives:
What is the date 2019-06-01
What is the date 2019-06-02
What is the date 2019-06-03
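If you only need a single value rather than a loop, pulling the scalar out with .iloc[0] also drops the index. A minimal sketch, reusing the column name Dates from the question:

```python
import pandas as pd

# rebuild the question's data
df = pd.DataFrame({"Dates": ["2019-06-01", "2019-06-02", "2019-06-03"]})
df["Dates"] = pd.to_datetime(df["Dates"], format="%Y-%m-%d")

# filter as in the question, then pull out the single value as a plain string
s = df[df["Dates"] < "2019-06-02"]["Dates"].dt.strftime("%Y-%m-%d").iloc[0]
print(f"What is the date {s}")  # -> What is the date 2019-06-01
```

Since s is now a plain str rather than a Series, the f-string prints it without an index.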

Related

Converting dtype('O') to a date DD/MM/YYYY

I'm a beginner at coding and started a project for my small business.
I imported an xlsx file using pandas and got a table with dtype('O') in every column.
(screenshot of my sheet)
But I can't find anywhere a way to get only the date in the format DD/MM/YYYY.
Any tips?
I have tried this code:
tabela['dt_nasc'] = pd.to_datetime(tabela['dt_nasc'], format='%m/%d/%Y')
but the result was:
ValueError: time data '1988-10-24 00:00:00' does not match format '%m/%d/%Y' (match)
I also tried another approach:
import datetime

def convert_to_date(x):
    return datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

tabela.loc[:, 'dt_nasc'] = tabela['dt_nasc'].apply(convert_to_date)
# equivalently, with a lambda
tabela.loc[:, 'dt_nasc'] = tabela['dt_nasc'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))

but I couldn't find a way to print in the format DD/MM/YYYY.
Example
s = pd.Series({0: '2022-11-29 13:00:00', 1: '2022-11-30 13:48:00'})
s
0 2022-11-29 13:00:00
1 2022-11-30 13:48:00
dtype: object <-- chk dtype
Code
object to datetime
pd.to_datetime(s)
result
0 2022-11-29 13:00:00
1 2022-11-30 13:48:00
dtype: datetime64[ns] <--chk dtype
convert result to DD-MM-YYYY HH:MM:SS
pd.to_datetime(s).dt.strftime('%d-%m-%Y %H:%M:%S')
0    29-11-2022 13:00:00
1    30-11-2022 13:48:00
dtype: object <--chk dtype
convert result to DD/MM/YYYY
pd.to_datetime(s).dt.strftime('%d/%m/%Y')
0 29/11/2022
1 30/11/2022
dtype: object <-- chk dtype
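Applying the same two steps to the question's column (the name dt_nasc and the sample value are taken from the question; the dummy second row is made up for illustration):

```python
import pandas as pd

# dummy data standing in for the question's xlsx column
tabela = pd.DataFrame({'dt_nasc': ['1988-10-24 00:00:00', '1990-03-05 00:00:00']})

# step 1: object -> datetime; the format describes the INPUT, not the desired output
tabela['dt_nasc'] = pd.to_datetime(tabela['dt_nasc'], format='%Y-%m-%d %H:%M:%S')

# step 2: datetime -> DD/MM/YYYY strings
tabela['dt_nasc'] = tabela['dt_nasc'].dt.strftime('%d/%m/%Y')
print(tabela['dt_nasc'].tolist())  # -> ['24/10/1988', '05/03/1990']
```

This also explains the original ValueError: the format argument must match what is in the file ('%Y-%m-%d %H:%M:%S'), while the desired display format belongs in strftime.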

Having problem converting object/string type date formats to datetime type

Please advise how to convert all elements of a column to datetime format in a pandas dataframe. Below is one such element:
1-July-2020 7.30 PM
DateTime
Use pd.to_datetime and the format argument:
>>> df
DateTime
0 1-July-2020 7.30 PM
>>> df.dtypes
DateTime object
dtype: object
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%d-%B-%Y %I.%M %p')
Output result:
>>> df
DateTime
0 2020-07-01 19:30:00
>>> df.dtypes
DateTime datetime64[ns]
dtype: object
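Each directive in the format string maps onto one piece of the input; a self-contained check of the answer's conversion:

```python
import pandas as pd

df = pd.DataFrame({'DateTime': ['1-July-2020 7.30 PM']})

# %d = day, %B = full month name, %Y = 4-digit year,
# %I = 12-hour clock, %M = minute, %p = AM/PM marker
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%d-%B-%Y %I.%M %p')
print(df['DateTime'].iloc[0])  # -> 2020-07-01 19:30:00
```

Note the literal separators ('-' between the date parts, '.' between hour and minute) must appear in the format string exactly as they do in the data.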

Trying to convert aware local datetime to naive local datetime in a Pandas DataFrame

I am trying to do as my title says. I have a pandas DataFrame with a datetime column and a timezone column.
I am trying to convert the column startDate to the local naive datetime as follow:
This is the piece of code to do that:
from pytz import timezone

df['startDate'] = df['startDate'].apply(lambda x: timezone('UTC').localize(x))
df["startDate"] = df.apply(lambda x: x.startDate.astimezone(timezone(x.timezone)), axis=1)
df["startDate"] = df["startDate"].dt.tz_localize(None)
I get this error message:
Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
If I actually specify utc=True, I get back my initial datetime values, and that's not what I am trying to achieve.
I want to get from this
2020-07-20 20:30:00-07:00
2020-07-21 16:00:00-04:00
2020-07-20 20:30:00-07:00
To this
2020-07-20 20:30:00
2020-07-21 16:00:00
2020-07-20 20:30:00
Otherwise I am thinking of converting to a string, stripping the trailing UTC offset, and reconverting to a datetime object. However, I am looking for a better solution.
Thank you
If you read a datetime string with a UTC offset like "2020-07-20 20:30:00-07:00", this gives you a Series of datetime.datetime objects (not the pandas datetime64[ns] type). So if I understand correctly, what you want to do is remove the tzinfo. This is basically described here, and you can do it like:
import pandas as pd

df = pd.DataFrame({'startDate': pd.to_datetime(['2020-07-20 20:30:00-07:00',
                                                '2020-07-21 16:00:00-04:00',
                                                '2020-07-20 20:30:00-07:00'])})
# df['startDate'].iloc[0]
# datetime.datetime(2020, 7, 20, 20, 30, tzinfo=tzoffset(None, -25200))

df['startDate_naive'] = df['startDate'].apply(lambda t: t.replace(tzinfo=None))
# df['startDate_naive']
# 0   2020-07-20 20:30:00
# 1   2020-07-21 16:00:00
# 2   2020-07-20 20:30:00
# Name: startDate_naive, dtype: datetime64[ns]
If you work with timezone aware pandas datetime column, see my answer here on how you can remove the timezone awareness.
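For a column that is a genuinely tz-aware pandas dtype (i.e. datetime64[ns, tz] rather than object), the awareness can be stripped directly with dt.tz_localize(None). A minimal sketch, using US/Pacific purely for illustration:

```python
import pandas as pd

# a tz-aware datetime64 column (a single timezone keeps the dtype datetime64[ns, tz])
s = pd.Series(pd.to_datetime(['2020-07-20 20:30:00', '2020-07-21 16:00:00']))
aware = s.dt.tz_localize('US/Pacific')

# drop the timezone but keep the local wall-clock time
naive = aware.dt.tz_localize(None)
print(naive.iloc[0])  # -> 2020-07-20 20:30:00
```

tz_localize(None) keeps the local wall-clock time; to drop the timezone while converting to UTC instead, use dt.tz_convert(None).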

Failed to cast variant value to DATE (parquet/snowflake)

I have a Pandas dataframe which includes two datetime64[ns] columns, one with just a date and the other with a timestamp.
>>> df
date ts
0 2020-01-06 2020-01-06 03:12:45
1 2020-01-07 2020-01-07 12:56:52
2 2020-01-08 2020-01-08 15:09:59
>>> df.info()
 #   Column  Dtype
---  ------  --------------
 0   date    datetime64[ns]
 1   ts      datetime64[ns]
The idea is to save this dataframe into a Parquet file hosted on S3:
df.to_parquet('s3://my-bucket-name/df.parquet', engine='fastparquet', compression='gzip')
... and using this file to COPY INTO a Snowflake table with two columns:
CREATE TABLE MY_TABLE (
date DATE,
ts TIMESTAMP
)
The command used to COPY is as follows, based on Snowflake's documentation:
copy into {schema}.{table}
from s3://my-bucket-name
credentials=(aws_key_id='{aws_key_id}' aws_secret_key='{aws_secret_key}')
match_by_column_name=case_insensitive
file_format=(type=parquet);
When executing the above command with a dataframe/file/table with only timestamp fields, everything runs fine. The problem comes when using it with a dataframe/file/table with a date field. In this case, an error shows up:
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 100071 (22000):
Failed to cast variant value "2020-06-16 00:00:00.000" to DATE
Is there a way to solve this issue?
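One workaround worth trying (not from this thread, and whether it helps depends on the parquet engine) is to store the column as real datetime.date objects before writing, so that a writer such as pyarrow emits a parquet DATE logical type instead of a midnight timestamp:

```python
import datetime
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2020-01-06', '2020-01-07', '2020-01-08'])})

# midnight timestamps -> plain datetime.date objects (column dtype becomes object)
df['date'] = df['date'].dt.date
print(type(df['date'].iloc[0]))  # -> <class 'datetime.date'>

# then write with pyarrow, e.g.:
# df.to_parquet('df.parquet', engine='pyarrow', compression='gzip')
```

With the column stored as a parquet DATE, Snowflake's MATCH_BY_COLUMN_NAME copy should no longer need to cast a timestamp-shaped variant value to DATE.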

How to set time zone of values in a Pandas DataFrame?

I'd like to set the time zone of the values of a column in a Pandas DataFrame. I am reading the DataFrame with pandas.read_csv().
You can read dates as UTC directly from read_csv by setting the date_parser function manually, for example:
from dateutil.tz import tzutc
from dateutil.parser import parse

def date_utc(s):
    return parse(s).replace(tzinfo=tzutc())

df = pd.read_csv('my.csv', parse_dates=[0], date_parser=date_utc)
If you are creating a timeseries, you can use the tz argument of date_range:
dd = pd.date_range('2012-1-1 1:30', periods=3, freq='min', tz='UTC')
In [2]: dd
Out[2]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-01-01 01:30:00, ..., 2012-01-01 01:32:00]
Length: 3, Freq: T, Timezone: UTC
If your DataFrame/Series is already index by a timeseries, you can use the tz_localize method to set a timezone:
df.tz_localize('UTC')
or if it already has a timezone, use tz_convert:
df.tz_convert('UTC')
# core modules
from datetime import timezone, datetime

# 3rd party modules
import pandas as pd
import pytz

# create a dummy dataframe
df = pd.DataFrame({'date': [datetime(2018, 12, 30, 20 + i, 56)
                            for i in range(2)]})
print(df)

# make the naive timestamps timezone-aware (UTC)
df['date'] = df['date'].dt.tz_localize(timezone.utc)
print(df)

# convert to another timezone;
# the point in time does not change, only the associated timezone
my_timezone = pytz.timezone('Europe/Berlin')
df['date'] = df['date'].dt.tz_convert(my_timezone)
print(df)
gives
date
0 2018-12-30 20:56:00
1 2018-12-30 21:56:00
date
0 2018-12-30 20:56:00+00:00
1 2018-12-30 21:56:00+00:00
date
0 2018-12-30 21:56:00+01:00
1 2018-12-30 22:56:00+01:00
df['date'] = df['date'].dt.tz_localize('UTC')
This seems to work when starting from my 'naive' (timezone-unaware) timestamps.
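A minimal check of that one-liner on dummy data, showing the dtype before and after:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2018-12-30 20:56:00', '2018-12-30 21:56:00'])})
print(df['date'].dtype)  # -> datetime64[ns]  (naive)

df['date'] = df['date'].dt.tz_localize('UTC')
print(df['date'].dtype)  # -> datetime64[ns, UTC]  (aware)
```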