TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns] - pandas

I have a column of years from the sunspots dataset.
I want to convert the integer column 'year' (e.g. 1992) to datetime format, then compute the time delta and finally the cumulative total seconds, to use as the time index column of a time series.
I am trying to use the following code but I get the error
TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]
sunspots_df['year'] = pd.to_timedelta(pd.to_datetime(sunspots_df['year'], format='%Y') ).dt.total_seconds()

pandas.Timedelta "[r]epresents a duration, the difference between two dates or times." So you're trying to get Python to tell you the difference between a particular datetime and...nothing. That's why it's failing.
If it's important that you store your index this way (and there may be better ways), then you need to pick a start datetime and compute the difference to get a timedelta.
For example, this code...
import pandas as pd
df = pd.DataFrame({'year': [1990,1991,1992]})
diff = (pd.to_datetime(df['year'], format='%Y') - pd.to_datetime('1990', format='%Y'))\
.dt.total_seconds()
...returns a Series whose values are the seconds elapsed since January 1st, 1990. Note that it doesn't invoke pd.to_timedelta(), because it doesn't need to: subtracting one datetime from another automatically yields a timedelta64[ns] Series.
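Since the goal is a cumulative time index, one option is to measure from the first year in the column instead of a hard-coded start date. A minimal sketch, assuming a small stand-in frame named df (the real sunspots_df is not shown in the question) and a new column name, elapsed_seconds, chosen here for illustration:

```python
import pandas as pd

# Stand-in data; the real sunspots_df is assumed to have an integer 'year' column.
df = pd.DataFrame({"year": [1990, 1991, 1992]})

# Parse the integer years as January 1st of each year.
years = pd.to_datetime(df["year"].astype(str), format="%Y")

# Datetime minus datetime gives a timedelta64 Series; no pd.to_timedelta needed.
elapsed = years - years.iloc[0]

# Seconds since the first observation, usable as a numeric time index.
df["elapsed_seconds"] = elapsed.dt.total_seconds()
print(df)
```

Because 1990 and 1991 are both 365-day years, the three values are 0, 31536000 and 63072000 seconds.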

Related

Darts/Pandas date parsing issue

I have a date column, which is a pandas datetime column on a df. When I tried converting it into a series I get:
series = TimeSeries.from_dataframe(df,time_col='Date')
ValueError: The time index of the provided DataArray is missing the freq attribute, and the frequency could not be directly inferred. This probably comes from inconsistent date frequencies with missing dates. If you know the actual frequency, try setting fill_missing_dates=True, freq=actual_frequency. If not, try setting fill_missing_dates=True, freq=None to see if a frequency can be inferred.
These are company-specific dates, so the freq is business day. When I tried to force it to business days, it complains. Can someone help please?
Try:
TimeSeries.from_dataframe(df, fill_missing_dates=True, freq="D", time_col="Date")
If there are other issues with the time column, you can also set the date as the index of the dataframe, which should resolve the date column issue.
df = df.set_index("Date")
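Before handing the frame to Darts, it can help to see which business days are actually missing, since gaps (holidays, for instance) are what stop the frequency from being inferred. A minimal pandas-only sketch, assuming made-up data with a 'Date' column and a hypothetical 'Close' column:

```python
import pandas as pd

# Made-up data: Mon, Tue, Thu, Fri of one week; Wednesday is missing.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-05", "2023-01-06"]),
    "Close": [10.0, 10.5, 11.0, 10.8],  # hypothetical value column
})

# Every business day between the first and last observation.
full_range = pd.bdate_range(df["Date"].min(), df["Date"].max())

# Business days that are absent from the data.
missing = full_range.difference(pd.DatetimeIndex(df["Date"]))
print(missing)

# Reindex onto the complete business-day range; missing days become NaN rows.
regular = df.set_index("Date").reindex(full_range)
regular.index.name = "Date"
print(regular)
```

The reindexed frame has a regular business-day index, with NaN where a date was absent, so the remaining decision is how to fill or drop those rows.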

Outputting pandas timestamp to tuple with just month and day

I have a pandas dataframe with a timestamp field which I have successfully converted to datetime format, and now I want to output just the month and day as a tuple for the first date value in the data frame. It is for a test and the output must not have leading zeros. I have tried a number of things but I cannot find an answer without converting the timestamp to a string, which does not work.
This is the format
2021-05-04 14:20:00.426577
df_cleaned['trans_timestamp']=pd.to_datetime(df_cleaned['trans_timestamp']) is as far as I have got with the code.
I have been working on this for days and cannot get output the checker will accept.
Update
If you want to extract the month and day from the first record (solution proposed by @FObersteiner):
>>> df['trans_timestamp'].iloc[0].timetuple()[1:3]
(5, 4)
If you want to extract the month and day for every row in your dataframe, use:
# Setup
df = pd.DataFrame({'trans_timestamp': ['2021-05-04 14:20:00.426577']})
df['trans_timestamp'] = pd.to_datetime(df['trans_timestamp'])
# Extract tuple
df['month_day'] = df['trans_timestamp'].apply(lambda x: (x.month, x.day))
print(df)
# Output
             trans_timestamp month_day
0 2021-05-04 14:20:00.426577    (5, 4)

How can I filter a pandas data frame based on a datetime column between current time and 10 hours ago?

I have a pandas DataFrame which includes a datetime column and I want to filter the data frame between the current hour and 10 hours ago. I have tried different ways to do it but still I cannot handle it. Because when I want to use pandas, the column type is Series and I can't use timedelta to compare them. If I use a for loop to compare the column as a string to my time interval, it is not efficient.
The table is like this:
And I want to filter the 'dateTime' column between current time and 10 hours ago, then filter based on 'weeks' > 80.
I have tried the following code as well, but it has not worked:
filter_criteria = main_table['dateTime'].sub(today).abs().apply(lambda x: x.hours <= 10)
main_table.loc[filter_criteria]
This returns an error:
TypeError: unsupported operand type(s) for -: 'str' and 'datetime.datetime'
Similarly this code has the same problem:
main_table.loc[main_table['dateTime'] >= (datetime.datetime.today() - pd.DateOffset(hours=10))]
And:
main_table[(pd.to_datetime('today') - main_table['dateTime'] ).dt.hours.le(10)]
In all of the code above main_table is the name of my data frame.
How can I filter them?
First, you need to make sure that the datatype of your datetime column is correct. You can check it by using:
main_table.info()
If it is not datetime (e.g. object), convert it:
# use proper formatting if this line does not work
main_table['dateTime'] = pd.to_datetime(main_table['dateTime'])
Then you need to find the datetime object for ten hours before the current time:
from datetime import datetime, timedelta
date_time_ten_before = datetime.now() - timedelta(hours = 10)
All that remains is to filter the column:
main_table_10 = main_table[main_table['dateTime'] >= date_time_ten_before]
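The question also asks for an upper bound at the current time and a second condition on 'weeks'. A minimal sketch combining both, assuming made-up rows and a fixed "now" so the result is reproducible (in practice pd.Timestamp.now() would replace it):

```python
import pandas as pd

# Fixed reference time for reproducibility; use pd.Timestamp.now() in real code.
now = pd.Timestamp("2024-01-01 12:00:00")

# Made-up rows; 'dateTime' starts out as strings, as the TypeError in the question suggests.
main_table = pd.DataFrame({
    "dateTime": ["2024-01-01 11:00:00", "2024-01-01 05:00:00",
                 "2023-12-31 20:00:00", "2024-01-01 09:30:00"],
    "weeks": [90, 85, 95, 60],
})
main_table["dateTime"] = pd.to_datetime(main_table["dateTime"])

# Lower bound of the window: ten hours before the reference time.
cutoff = now - pd.Timedelta(hours=10)

# Boolean masks combine with &; between() includes both endpoints by default.
in_window = main_table["dateTime"].between(cutoff, now)
recent = main_table[in_window & (main_table["weeks"] > 80)]
print(recent)
```

Only the first two rows survive: the third is older than ten hours and the fourth has weeks below the threshold. As a side note, a timedelta Series has no .dt.hours attribute, which is why the last attempt in the question fails; .dt.total_seconds() / 3600 is one way to get hours.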

Specifying datetime64 resolution in Ibis when converting to Pandas DataFrame

I have a MySQL database with datetime values shifted by arbitrary amounts for de-identification purposes. So, for example, I have a date value of datetime.datetime(2644, 1, 17, 0, 0). If I query these values with pymysql or Pandas I get a fine datetime object. However, if I use Ibis to construct the query I get a failure, because the dates fall outside the date range that can be represented with the datetime64[ns] data type that Ibis uses in the conversion to the DataFrame.
So
---------------------------------------------------------------------------
OutOfBoundsDatetime Traceback (most recent call last)
~/opt/anaconda3/envs/clinicalnlp/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2084 try:
-> 2085 values, tz_parsed = conversion.datetime_to_datetime64(data)
2086 # If tzaware, these values represent unix timestamps, so we
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2644-01-17 06:48:00
Browsing through the Ibis source code, it seems like it should be possible to configure the datetime64 time unit from nanoseconds to microseconds, so that the dates fall within the allowable range, but I cannot figure out how to do this configuration within Ibis.
Any suggestions would be greatly appreciated.
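For context on why the nanosecond unit fails here while microseconds would not, the limits can be checked directly in NumPy. This is only a sketch of the range issue; it does not show any Ibis setting, which the question leaves open. The object-dtype Series at the end is an assumption about one possible fallback, not something Ibis is known to do:

```python
import datetime

import numpy as np
import pandas as pd

shifted = datetime.datetime(2644, 1, 17, 0, 0)  # the example value from the question

# datetime64[ns] stores an int64 count of nanoseconds since 1970,
# so the largest representable instant falls in the year 2262.
ns_max = np.datetime64(np.iinfo(np.int64).max, "ns")
print(ns_max)

# With microsecond resolution the same int64 covers a far wider span,
# so the shifted date is representable.
as_us = np.datetime64(shifted, "us")
print(as_us)

# Fallback assumption: an object-dtype Series keeps plain Python datetimes
# and avoids the datetime64[ns] conversion altogether.
as_objects = pd.Series([shifted], dtype=object)
print(as_objects.dtype, as_objects.iloc[0].year)
```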

Convert MM:SS column to HH:MM:SS column in Pandas?

Convert MM:SS column to HH:MM:SS column in Pandas. I tried every possible way, like changing the datatype and using to_datetime and to_timedelta, but I couldn't convert the series. Please help. I am getting errors like:
(here ChipTime is in MM:SS format, which I want to change to HH:MM:SS)
df2["ChipTime"]=pd.to_datetime(df2.ChipTime, unit="hour").dt.strftime('%H:%M:%S')
ValueError: cannot cast unit hour
df2["ChipTime"]=pd.to_timedelta(df2["ChipTime"])
ValueError: expected hh:mm:ss format
df2["ChipTime"]=df2["ChipTime"].astype(int)
ValueError: invalid literal for int() with base 10: '16:48'
I have tried more methods; the above are just some of them. I am a beginner in Pandas, so please excuse me if I have made any blunders. Thanks
If you convert the values to datetimes with the format parameter of to_datetime, a default year, month and day are added. If necessary, you can then convert the values to times with Series.dt.time:
df2 = pd.DataFrame({'ChipTime':['16:48','10:48']})
df2["ChipTime1"]=pd.to_datetime(df2.ChipTime, format="%M:%S")
df2["ChipTime11"]=pd.to_datetime(df2.ChipTime, format="%M:%S").dt.time
Or, for timedeltas, prepend 00: as a default hour before calling to_timedelta:
df2["ChipTime2"]=pd.to_timedelta('00:' + df2["ChipTime"])
print (df2)
ChipTime ChipTime1 ChipTime11 ChipTime2
0 16:48 1900-01-01 00:16:48 00:16:48 00:16:48
1 10:48 1900-01-01 00:10:48 00:10:48 00:10:48
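If the end goal is an HH:MM:SS string column, and some minute values may exceed 59, the parts can be split and recombined by hand. A minimal sketch under those assumptions; the '75:30' row and the ChipTimeHMS column name are hypothetical additions for illustration:

```python
import pandas as pd

# '75:30' is a hypothetical value showing minutes rolling over into hours.
df2 = pd.DataFrame({"ChipTime": ["16:48", "10:48", "75:30"]})

# Split 'MM:SS' into two integer columns: 0 holds minutes, 1 holds seconds.
parts = df2["ChipTime"].str.split(":", expand=True).astype(int)
total_seconds = parts[0] * 60 + parts[1]

# Break the total back down into hours, minutes and seconds.
hours = total_seconds // 3600
minutes = (total_seconds % 3600) // 60
seconds = total_seconds % 60

# Zero-pad each part and join them as 'HH:MM:SS' strings.
df2["ChipTimeHMS"] = (
    hours.astype(str).str.zfill(2)
    + ":" + minutes.astype(str).str.zfill(2)
    + ":" + seconds.astype(str).str.zfill(2)
)
print(df2)
```

The result is a plain string column; pd.to_timedelta(total_seconds, unit="s") gives a timedelta column instead if arithmetic on the times is needed.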