Darts/Pandas date parsing issue - pandas

I have a date column in a df, and it is a pandas datetime. When I try converting the df into a series I get:
series = TimeSeries.from_dataframe(df,time_col='Date')
ValueError: The time index of the provided DataArray is missing the freq attribute, and the frequency could not be directly inferred. This probably comes from inconsistent date frequencies with missing dates. If you know the actual frequency, try setting fill_missing_dates=True, freq=actual_frequency. If not, try setting fill_missing_dates=True, freq=None to see if a frequency can be inferred.
These are company-specific dates, so the freq is business day. When I tried to force it to business days, it complains. Can someone help please?

Try:
TimeSeries.from_dataframe(df, fill_missing_dates=True, freq="D", time_col="Date")
If there are other issues with the time column, you can also set the date as the index of the dataframe, which should resolve the date column issue:
df = df.set_index("Date")
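The error usually means some expected dates are absent from the column. A minimal pandas-only sketch for finding the gaps, assuming a business-day calendar; the `Close` column and the sample dates are hypothetical:

```python
import pandas as pd

# Hypothetical data: two weeks of business days with one date (e.g. a holiday) removed.
dates = pd.bdate_range("2023-01-02", "2023-01-13").delete(3)
df = pd.DataFrame({"Date": dates, "Close": range(len(dates))})

# Build the full business-day range covered by the data.
expected = pd.bdate_range(df["Date"].min(), df["Date"].max())

# Business days that are absent from the column.
missing = expected.difference(pd.DatetimeIndex(df["Date"]))
print(missing)

# Reindexing onto the full range inserts NaN rows for the missing dates,
# so the index ends up with a regular business-day spacing.
filled = df.set_index("Date").reindex(expected)
filled.index.name = "Date"
```

If `missing` is not empty, the dates are not a clean business-day sequence. In that case passing `fill_missing_dates=True` together with `freq="B"` to `TimeSeries.from_dataframe`, as the error message suggests, may be worth trying.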

Related

How to change date datetype of df column entries

Within a df, the entries of a column "Date" (n entries) are of type datetime.datetime and I want to convert every entry to type datetime.date. Can anyone help here? THX!
Use to_datetime to make sure the column is datetime, then take the date part with .dt.date:
df['Date'] = pd.to_datetime(df['Date']).dt.date
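As a small self-contained sketch of that conversion (the sample values and the `as_date` and `midnight` column names are made up for illustration):

```python
import datetime
import pandas as pd

# Hypothetical column of datetime.datetime entries; pandas stores it as datetime64.
df = pd.DataFrame({
    "Date": [
        datetime.datetime(2021, 5, 1, 13, 30),
        datetime.datetime(2021, 5, 2, 8, 0),
    ]
})

# .dt.date returns plain datetime.date objects (the column dtype becomes object).
df["as_date"] = pd.to_datetime(df["Date"]).dt.date

# .dt.normalize() keeps the datetime64 dtype and only resets the time to midnight.
df["midnight"] = pd.to_datetime(df["Date"]).dt.normalize()

print(df.dtypes)
```

Keeping the datetime64 dtype with `normalize()` may be preferable if the column is still needed for date arithmetic or resampling.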

TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]

I have a column of years from the sunspots dataset.
I want to convert the column 'year', which holds integers (e.g. 1992), to datetime format, then find the time delta and eventually compute the total seconds (cumulative) to use as the time index column of a time series.
I am trying to use the following code but I get the error
TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]
sunspots_df['year'] = pd.to_timedelta(pd.to_datetime(sunspots_df['year'], format='%Y') ).dt.total_seconds()
pandas.Timedelta "[r]epresents a duration, the difference between two dates or times." So you're trying to get Python to tell you the difference between a particular datetime and...nothing. That's why it's failing.
If it's important that you store your index this way (and there may be better ways), then you need to pick a start datetime and compute the difference to get a timedelta.
For example, this code...
import pandas as pd
df = pd.DataFrame({'year': [1990,1991,1992]})
diff = (pd.to_datetime(df['year'], format='%Y') - pd.to_datetime('1990', format='%Y'))\
.dt.total_seconds()
...returns a series whose values are seconds from January 1st, 1990. You'll note that it doesn't invoke pd.to_timedelta(), because it doesn't need to: the result of the subtraction is automatically a timedelta64[ns] series.
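A variant of the same idea that takes the earliest year in the data as the reference point instead of hard-coding it. This is only a sketch: `sunspots_df` here is a three-row stand-in for the real dataset, and the `seconds` column name is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the sunspots data: integer years only.
sunspots_df = pd.DataFrame({"year": [1990, 1991, 1992]})

# Parse the integers as January 1st of each year.
as_dates = pd.to_datetime(sunspots_df["year"].astype(str), format="%Y")

# Subtracting the earliest date yields a timedelta64 series ...
elapsed = as_dates - as_dates.min()

# ... which exposes total_seconds() through the .dt accessor.
sunspots_df["seconds"] = elapsed.dt.total_seconds()
print(sunspots_df["seconds"].tolist())  # [0.0, 31536000.0, 63072000.0]
```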

How can I filter a pandas data frame based on a datetime column between current time and 10 hours ago?

I have a pandas DataFrame which includes a datetime column and I want to filter the data frame between the current hour and 10 hours ago. I have tried different ways to do it but still I cannot handle it. Because when I want to use pandas, the column type is Series and I can't use timedelta to compare them. If I use a for loop to compare the column as a string to my time interval, it is not efficient.
The table is like this:
And I want to filter the 'dateTime' column between current time and 10 hours ago, then filter based on 'weeks' > 80.
I have tried the following code as well, but it has not worked:
filter_criteria = main_table['dateTime'].sub(today).abs().apply(lambda x: x.hours <= 10)
main_table.loc[filter_criteria]
This returns an error:
TypeError: unsupported operand type(s) for -: 'str' and 'datetime.datetime'
Similarly this code has the same problem:
main_table.loc[main_table['dateTime'] >= (datetime.datetime.today() - pd.DateOffset(hours=10))]
And:
main_table[(pd.to_datetime('today') - main_table['dateTime'] ).dt.hours.le(10)]
In all of the code above main_table is the name of my data frame.
How can I filter them?
First you need to make sure that the datatype of your datetime column is correct. You can check it by using:
main_table.info()
If it is not datetime (i.e., it shows as object), convert it:
# use proper formatting if this line does not work
main_table['dateTime'] = pd.to_datetime(main_table['dateTime'])
Then you need to find the datetime object for ten hours before the current time:
from datetime import datetime, timedelta
date_time_ten_before = datetime.now() - timedelta(hours = 10)
All that remains is to filter the column:
main_table_10 = main_table[main_table['dateTime'] >= date_time_ten_before]
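Putting those steps together with the extra `weeks > 80` condition from the question, a minimal sketch might look like this. The rows are synthetic; only the `dateTime` and `weeks` column names come from the question.

```python
import pandas as pd

now = pd.Timestamp.now()

# Synthetic rows: timestamps stored as strings, 1, 5, 12 and 30 hours old.
main_table = pd.DataFrame({
    "dateTime": [
        (now - pd.Timedelta(hours=h)).strftime("%Y-%m-%d %H:%M:%S")
        for h in (1, 5, 12, 30)
    ],
    "weeks": [90, 70, 95, 85],
})

# Strings cannot be compared with datetimes, so convert the column first.
main_table["dateTime"] = pd.to_datetime(main_table["dateTime"])

# Keep rows from the last 10 hours whose 'weeks' value exceeds 80.
cutoff = now - pd.Timedelta(hours=10)
mask = main_table["dateTime"].between(cutoff, now) & (main_table["weeks"] > 80)
recent = main_table.loc[mask]
print(recent)
```

Combining both conditions in one boolean mask avoids any Python-level loop over the rows.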

Extract the first 10 values of a column and create a new one [duplicate]

I am looking to convert datetime to date for a pandas datetime series.
I have listed the code below:
df = pd.DataFrame()
df = pandas.io.parsers.read_csv("TestData.csv", low_memory=False)
df['PUDATE'] = pd.Series([pd.to_datetime(date) for date in df['DATE_TIME']])
df['PUDATE2'] = datetime.datetime.date(df['PUDATE']) #Does not work
Can anyone guide me in right direction?
You can access the datetime methods of a pandas Series through the .dt accessor (in a similar way to how you would access string methods using .str). For your case, you can extract the date of your datetime column as:
df['PUDATE'].dt.date
This is a simple way to get the day of the month from a pandas datetime column:
import pandas as pd
# Create a DataFrame with dates stored as strings
test_df = pd.DataFrame({'dob': ['2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04']})
# Convert the column to datetime dtype
test_df['dob'] = pd.to_datetime(test_df['dob'])
# Extract day, month and year using the dt accessor
test_df['DayOfMonth'] = test_df['dob'].dt.day
test_df['Month'] = test_df['dob'].dt.month
test_df['Year'] = test_df['dob'].dt.year
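I think you need to specify the format. Note that datetime.datetime.date() accepts neither a Series nor a format argument, so pass the format to pd.to_datetime and take the date afterwards, for example:
df['PUDATE2'] = pd.to_datetime(df['DATE_TIME'], format='%Y%m%d%H%M%S').dt.date
So you just need to know what format your data uses.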
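Parsing with an explicit format and then taking the date part can be sketched as follows. The CSV text is a hypothetical stand-in for TestData.csv, and the `%Y%m%d%H%M%S` layout is an assumption about how the timestamps are stored.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for TestData.csv.
csv_text = "DATE_TIME,VALUE\n20230105143000,1\n20230106091500,2\n"

# Read the timestamp column as text so the digits are not parsed as integers.
df = pd.read_csv(io.StringIO(csv_text), dtype={"DATE_TIME": str})

# Vectorized parsing with an explicit format, instead of a Python-level loop.
df["PUDATE"] = pd.to_datetime(df["DATE_TIME"], format="%Y%m%d%H%M%S")

# The .dt accessor then gives the date part for every row.
df["PUDATE2"] = df["PUDATE"].dt.date
print(df)
```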

column with dates into datetime index in dask

I have a dask dataframe for which I want to convert a column with dates into a datetime index. However, I get a not implemented error with the following call. Is there a workaround?
pd.DatetimeIndex(df_dask_dataframe['name_col'])
I think you need dask.dataframe.DataFrame.set_index, provided the dtype of the column is datetime64:
df_dask_dataframe = df_dask_dataframe.set_index('name_col')