I have a Pandas dataframe with a field that is datetime datatype. Most of the values in the field are valid datetime values, but some are NaT.
I need to drop the time part of the datetime values for each value in the field, keeping the field as date datatype (not str). I tried the following:
df['mydate'] = df['mydate'].dt.date
It works fine if there are no NaT values in the column. However, if there are NaT values, it throws this error:
{AttributeError}Can only use .dt accessor with datetimelike values
I tried this alternative to skip over the NaT values:
df['mydate'] = [d.date if not pd.isnull(d) else None for d in df['mydate']]
but this converted the values in the column to:
<built-in method date of Timestamp object at 0x000002A06F6501C8>
Please advise how to ignore or skip the NaT values in the field when converting. I've had no luck googling for an answer, and I am trying to avoid looping over the entire dataframe with iterrows().
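The error means the column is not actually stored with a datetime dtype (it is most likely object). First convert the values to datetimes with to_datetime; after that, dt.date works as expected:
import numpy as np
import pandas as pd
df = pd.DataFrame({'mydate':['2015-04-04','2018-09-10', np.nan]})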
df['new'] = pd.to_datetime(df['mydate'], errors='coerce').dt.date
print (df)
mydate new
0 2015-04-04 2015-04-04
1 2018-09-10 2018-09-10
2 NaN NaT
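If the column should stay a true datetime64 dtype rather than hold Python date objects, one option is to set every time to midnight with dt.normalize(). The sketch below also repeats the list comprehension from the question with the missing call parentheses added, which is probably why it returned bound methods. The sample values and the column names mydate_norm and mydate_date are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value
df = pd.DataFrame({'mydate': ['2015-04-04 10:30:00', '2018-09-10 23:59:59', np.nan]})
parsed = pd.to_datetime(df['mydate'], errors='coerce')

# Option 1: stay in datetime64, with every time set to midnight; NaT is preserved.
df['mydate_norm'] = parsed.dt.normalize()

# Option 2: the comprehension from the question, calling d.date() instead of
# referencing the method d.date. The resulting column has object dtype.
df['mydate_date'] = [d.date() if not pd.isnull(d) else None for d in parsed]

print(df)
```

Option 1 keeps vectorized datetime operations available on the column; option 2 gives plain datetime.date objects at the cost of an object column.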
I have a column of years from the sunspots dataset.
I want to convert the integer column 'year' (e.g. 1992) to datetime format, then compute the time delta, and eventually the cumulative total seconds, to use as the time index column of a time series.
I am trying to use the following code but I get the error
TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]
sunspots_df['year'] = pd.to_timedelta(pd.to_datetime(sunspots_df['year'], format='%Y') ).dt.total_seconds()
pandas.Timedelta "[r]epresents a duration, the difference between two dates or times." So you're trying to get Python to tell you the difference between a particular datetime and...nothing. That's why it's failing.
If it's important that you store your index this way (and there may be better ways), then you need to pick a start datetime and compute the difference to get a timedelta.
For example, this code...
import pandas as pd
df = pd.DataFrame({'year': [1990,1991,1992]})
diff = (pd.to_datetime(df['year'], format='%Y') - pd.to_datetime('1990', format='%Y'))\
.dt.total_seconds()
...returns a Series whose values are the seconds elapsed since January 1st, 1990. Note that it doesn't invoke pd.to_timedelta(), because it doesn't need to: subtracting one datetime from another already yields a timedelta64[ns] Series.
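As a variation on the same idea, the reference point can be the first row of the column instead of a hard-coded year. This is only a sketch: the seconds column name is hypothetical, and the years are cast to strings before parsing as a precaution.

```python
import pandas as pd

df = pd.DataFrame({'year': [1990, 1991, 1992]})

# Parse the year as January 1st of that year
start_of_year = pd.to_datetime(df['year'].astype(str), format='%Y')

# Seconds elapsed since the first row; the subtraction yields timedeltas
df['seconds'] = (start_of_year - start_of_year.iloc[0]).dt.total_seconds()

print(df)
```

Because 1990 and 1991 are both 365-day years, the values step by 31,536,000 seconds here; a leap year in the range would add one extra day.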
Create a dataframe whose first column holds text.
import pandas as pd
values = {'dates': ['2019','2020','2021'],
'price': [11,12,13]
}
df = pd.DataFrame(values, columns = ['dates','price'])
Check the dtypes:
df.dtypes
dates object
price int64
dtype: object
Convert the dates column to datetime type.
df['dates'] = pd.to_datetime(df['dates'], format='%Y')
df
dates price
0 2019-01-01 11
1 2020-01-01 12
2 2021-01-01 13
I want the dates column to keep a date type but show only the year, in the following format:
dates price
0 2019 11
1 2020 12
2 2021 13
How can I achieve this?
Keeping the column as a datetime type is likely to be beneficial. What you see in the column ("2019-01-01") is just a representation of the datetime object. The real question here is: why do you need a datetime object?
Actually, I don't care about datetime type:
Use a string ('2019'), or preferably an integer (2019), which will enable you to perform sorting, calculations, etc.
I need the datetime type but I really want to see only the year:
Use style to format your column while retaining the underlying type:
df.style.format({'dates': lambda t: t.strftime('%Y')})
This will allow you to keep the type while having a clean visual format.
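For the first case (the datetime type is not needed), a minimal sketch of pulling the year out of the converted column could look like this. The year and year_str column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'dates': ['2019', '2020', '2021'], 'price': [11, 12, 13]})
df['dates'] = pd.to_datetime(df['dates'], format='%Y')

# Integer year: sortable and usable in arithmetic
df['year'] = df['dates'].dt.year

# String year: suitable for display only
df['year_str'] = df['dates'].dt.strftime('%Y')

print(df)
```

The original dates column keeps its datetime type, so nothing is lost by adding the extra columns.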
I am confused by the number of data type conversions and seemingly very different solutions to this, none of which I can get to work.
What is the best way to convert a pandas datetime column (datetime64[ns], e.g. 2017-01-01 03:15:00) to another column in the same pandas dataframe, converted to Julian day, e.g. 2458971.8234259?
Many thanks
Create a DatetimeIndex and convert it to Julian dates:
import pandas as pd
df = pd.DataFrame({'dates':['2017-01-01 03:15:00','2017-01-01 03:15:00']})
df['dates'] = pd.to_datetime(df['dates'])
df['jul1'] = pd.DatetimeIndex(df['dates']).to_julian_date()
# if you need to drop the time part first
df['jul2'] = pd.DatetimeIndex(df['dates']).floor('d').to_julian_date()
print (df)
dates jul1 jul2
0 2017-01-01 03:15:00 2.457755e+06 2457754.5
1 2017-01-01 03:15:00 2.457755e+06 2457754.5
The DatetimeIndex is needed because the .dt accessor has no to_julian_date method:
df['jul'] = df['dates'].dt.to_julian_date()
AttributeError: 'DatetimeProperties' object has no attribute 'to_julian_date'
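If you prefer to stay with Series arithmetic, the Julian date can also be computed by hand: it is the number of days since the Unix epoch plus 2440587.5, the Julian date of 1970-01-01 00:00. The sketch below assumes timezone-naive timestamps; the column names jul_manual and jul_index and the second sample value are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'dates': pd.to_datetime(['2017-01-01 03:15:00',
                                            '2017-01-02 12:00:00'])})

# Julian date of the Unix epoch (1970-01-01 00:00)
UNIX_EPOCH_JD = 2440587.5

# Fractional days since the epoch, computed on the Series itself
days_since_epoch = (df['dates'] - pd.Timestamp('1970-01-01')) / pd.Timedelta(days=1)
df['jul_manual'] = days_since_epoch + UNIX_EPOCH_JD

# Same result via DatetimeIndex, for comparison
df['jul_index'] = pd.DatetimeIndex(df['dates']).to_julian_date()

print(df)
```

Both columns should agree up to floating-point rounding.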
How can I convert an MM:SS column to an HH:MM:SS column in Pandas? I tried every way I could think of, such as changing the datatype and using to_datetime and to_timedelta, but I couldn't convert the series. Can somebody please help? I am getting errors like:
(here chiptime is in MM:SS format, which I want to change in HH:MM:SS)
df2["ChipTime"]=pd.to_datetime(df2.ChipTime, unit="hour").dt.strftime('%H:%M:%S')
ValueError: cannot cast unit hour
df2["ChipTime"]=pd.to_timedelta(df2["ChipTime"])
ValueError: expected hh:mm:ss format
df2["ChipTime"]=df2["ChipTime"].astype(int)
ValueError: invalid literal for int() with base 10: '16:48'
I have tried more methods; the above are just some of them. I am a beginner in Pandas, so please excuse me if I have made a blunder. Thanks
If you convert the values to datetimes with the format parameter of to_datetime, a default year, month and day are added. If necessary, you can then convert the values to times with Series.dt.time:
import pandas as pd
df2 = pd.DataFrame({'ChipTime':['16:48','10:48']})
df2["ChipTime1"]=pd.to_datetime(df2.ChipTime, format="%M:%S")
df2["ChipTime11"]=pd.to_datetime(df2.ChipTime, format="%M:%S").dt.time
Or, for timedeltas, prepend 00: as a default hour before calling to_timedelta:
df2["ChipTime2"]=pd.to_timedelta('00:' + df2["ChipTime"])
print (df2)
ChipTime ChipTime1 ChipTime11 ChipTime2
0 16:48 1900-01-01 00:16:48 00:16:48 00:16:48
1 10:48 1900-01-01 00:10:48 00:10:48 00:10:48
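If the column might mix MM:SS and HH:MM:SS values, one way to handle it is to prepend 00: only to the entries that contain a single colon. This is a sketch under that assumption: the third sample value and the names ChipTime_td and total_seconds are hypothetical, not part of the question.

```python
import pandas as pd

# Hypothetical data: two MM:SS values and one that already has hours
df2 = pd.DataFrame({'ChipTime': ['16:48', '10:48', '01:02:03']})

# Entries with exactly one colon are MM:SS and need a default hour
needs_hours = df2['ChipTime'].str.count(':') == 1
padded = df2['ChipTime'].mask(needs_hours, '00:' + df2['ChipTime'])

df2['ChipTime_td'] = pd.to_timedelta(padded)
df2['total_seconds'] = df2['ChipTime_td'].dt.total_seconds()

print(df2)
```

Keeping the values as timedeltas makes it straightforward to sort, average, or compare finishing times afterwards.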
How do I convert a string in this format to a Pandas timestamp?
00:55:02:285
hours:minutes:seconds:milliseconds
I have a dataframe already with several columns in this format.
Pandas doesn't seem to recognize this format as a timestamp when I use any of the conversion functions, e.g. to_datetime()
Many Thanks.
I think you need the format parameter of to_datetime:
import pandas as pd
df = pd.DataFrame({'times':['00:55:02:285','00:55:02:285']})
print (df)
times
0 00:55:02:285
1 00:55:02:285
print (pd.to_datetime(df.times, format='%H:%M:%S:%f'))
0 1900-01-01 00:55:02.285
1 1900-01-01 00:55:02.285
Name: times, dtype: datetime64[ns]
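If the values are really durations rather than points in time, the default 1900-01-01 date can be avoided by converting to timedeltas instead. A minimal sketch, assuming every value ends in a colon followed by exactly three millisecond digits; the second sample value and the names elapsed and elapsed_s are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'times': ['00:55:02:285', '01:00:00:000']})

# Replace the last colon with a dot so the value reads hh:mm:ss.fff
as_text = df['times'].str.replace(r':(\d{3})$', r'.\1', regex=True)

df['elapsed'] = pd.to_timedelta(as_text)
df['elapsed_s'] = df['elapsed'].dt.total_seconds()

print(df)
```

The to_datetime approach above is still the better fit when the column has to be a timestamp.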