pd.DatetimeIndex(df_dask_dataframe['name_col'])
I have a dask dataframe and want to convert a column of dates into a datetime index. However, I get a NotImplementedError. Is there a workaround?
I think you need dask.dataframe.DataFrame.set_index, provided the dtype of the column is datetime64:
df_dask_dataframe = df_dask_dataframe.set_index('name_col')
I have a date column in a DataFrame that is already a pandas datetime. When I tried converting the DataFrame into a TimeSeries, I got:
series = TimeSeries.from_dataframe(df,time_col='Date')
ValueError: The time index of the provided DataArray is missing the freq attribute, and the frequency could not be directly inferred. This probably comes from inconsistent date frequencies with missing dates. If you know the actual frequency, try setting fill_missing_dates=True, freq=actual_frequency. If not, try setting fill_missing_dates=True, freq=None to see if a frequency can be inferred.
These are company-specific dates, so the frequency is business days. When I tried to force it to business days, it complained. Can someone help, please?
Try:
TimeSeries.from_dataframe(df, fill_missing_dates=True, freq="D", time_col="Date")
If there are other issues with the time column, you can also set the date as the index of the DataFrame, which should resolve the date column issue:
df = df.set_index("Date")
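The error quoted above usually means the dates are not evenly spaced. A minimal pandas-only sketch, using made-up data (the `value` column and the dates are assumptions for illustration), shows one way to check whether a frequency can be inferred and to reindex onto a business-day grid before building the series:

```python
import pandas as pd

# Hypothetical data: ten business days with one day (e.g. a holiday) removed.
dates = pd.bdate_range("2023-01-02", periods=10).delete(2)
df = pd.DataFrame({"Date": dates, "value": range(len(dates))})

# infer_freq returns None when the spacing is irregular.
inferred = pd.infer_freq(pd.DatetimeIndex(df["Date"]))
print(inferred)

# Reindex onto a regular business-day grid; the missing day becomes a NaN row.
filled = df.set_index("Date").asfreq("B")
print(filled.index.freqstr)
print(filled["value"].isna().sum())
```

Whether NaN rows are acceptable depends on the downstream model, so filling or dropping them afterwards is a separate decision.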
Within a df, the entries of a column "Date" (n entries) are of type datetime.datetime and I want to convert every entry to type datetime.date. Can anyone help here? THX!
Use to_datetime to convert the column to datetime, then take the date part with the .dt accessor:
df['Date'] = pd.to_datetime(df['Date']).dt.date
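As a quick check of what this conversion produces, here is a small sketch with a hypothetical two-row frame (the sample values are assumptions, not data from the question):

```python
import datetime
import pandas as pd

# Hypothetical frame whose "Date" column holds datetime.datetime objects.
df = pd.DataFrame({"Date": [datetime.datetime(2021, 5, 1, 13, 30),
                            datetime.datetime(2021, 5, 2, 8, 15)]})

# .dt.date drops the time part; the column becomes object dtype
# holding plain datetime.date values.
df["Date"] = pd.to_datetime(df["Date"]).dt.date
print(type(df["Date"].iloc[0]))
```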
I have a pandas DataFrame which includes a datetime column, and I want to filter the data frame to rows between the current hour and 10 hours ago. I have tried different ways to do it, but none has worked. The column is a Series, and I can't compare it against a timedelta directly. Looping over the column and comparing each value as a string to my time interval works but is not efficient.
The table is like this:
And I want to filter the 'dateTime' column between current time and 10 hours ago, then filter based on 'weeks' > 80.
I have tried the following code as well, but it did not work:
filter_criteria = main_table['dateTime'].sub(today).abs().apply(lambda x: x.hours <= 10)
main_table.loc[filter_criteria]
This returns an error:
TypeError: unsupported operand type(s) for -: 'str' and 'datetime.datetime'
Similarly this code has the same problem:
main_table.loc[main_table['dateTime'] >= (datetime.datetime.today() - pd.DateOffset(hours=10))]
And:
main_table[(pd.to_datetime('today') - main_table['dateTime'] ).dt.hours.le(10)]
In all of the code above main_table is the name of my data frame.
How can I filter them?
First, make sure that the dtype of your datetime column is correct. You can check it with:
main_table.info()
If it is not datetime (i.e., it shows as object), convert it:
# pass an explicit format= argument if this line fails to parse your strings
main_table['dateTime'] = pd.to_datetime(main_table['dateTime'])
Then compute the datetime ten hours before the current time:
from datetime import datetime, timedelta
date_time_ten_before = datetime.now() - timedelta(hours = 10)
All that remains is to filter the column:
main_table_10 = main_table[main_table['dateTime'] >= date_time_ten_before]
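Putting the steps together, including the 'weeks' > 80 condition from the question, a minimal end-to-end sketch might look like this. The sample rows are invented for illustration, and `cutoff`, `recent`, and `result` are hypothetical names:

```python
import pandas as pd

now = pd.Timestamp.now()

# Hypothetical table; dateTime starts out as strings, as the TypeError
# in the question suggests.
main_table = pd.DataFrame({
    "dateTime": [(now - pd.Timedelta(hours=h)).strftime("%Y-%m-%d %H:%M:%S")
                 for h in (1, 5, 12, 2)],
    "weeks": [90, 70, 95, 85],
})

# Convert the strings to datetime64 so they can be compared with timestamps.
main_table["dateTime"] = pd.to_datetime(main_table["dateTime"])

# Keep rows from the last 10 hours, then apply the weeks condition.
cutoff = now - pd.Timedelta(hours=10)
recent = main_table["dateTime"].between(cutoff, now)
result = main_table[recent & (main_table["weeks"] > 80)]
print(result)
```

Combining both conditions in one boolean mask avoids building an intermediate frame.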
I am looking to convert datetime to date for a pandas datetime series.
I have listed the code below:
df = pd.DataFrame()
df = pandas.io.parsers.read_csv("TestData.csv", low_memory=False)
df['PUDATE'] = pd.Series([pd.to_datetime(date) for date in df['DATE_TIME']])
df['PUDATE2'] = datetime.datetime.date(df['PUDATE']) #Does not work
Can anyone guide me in the right direction?
You can access the datetime methods of a pandas Series through the .dt accessor (in a similar way to how you would access string methods using .str). For your case, you can extract the date of your datetime column as:
df['PUDATE'].dt.date
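A short sketch of the difference between two common options, assuming made-up sample strings in place of TestData.csv (`PUDATE3` is a hypothetical extra column added for comparison):

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents.
df = pd.DataFrame({"DATE_TIME": ["2014-01-05 10:30:00", "2014-02-17 23:45:10"]})

# Vectorized parse; no list comprehension is needed.
df["PUDATE"] = pd.to_datetime(df["DATE_TIME"])

# Option 1: Python date objects (the column becomes object dtype).
df["PUDATE2"] = df["PUDATE"].dt.date

# Option 2: keep the datetime64 dtype but reset the time to midnight.
df["PUDATE3"] = df["PUDATE"].dt.normalize()
print(df.dtypes)
```

Option 2 is often more convenient if the column will later be used for resampling or date arithmetic, since it stays a datetime64 column.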
This is a simple way to get the day of the month from a pandas datetime column:
#create a dataframe with dates as a string
test_df = pd.DataFrame({'dob':['2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04']})
#convert column to type datetime
test_df['dob']= pd.to_datetime(test_df['dob'])
# Extract day, month , year using dt accessor
test_df['DayOfMonth']=test_df['dob'].dt.day
test_df['Month']=test_df['dob'].dt.month
test_df['Year']=test_df['dob'].dt.year
I think you need to specify the format when parsing, for example:
df['PUDATE2'] = pd.to_datetime(df['DATE_TIME'], format='%Y%m%d%H%M%S').dt.date
So you just need to know what format your date strings are in.
My data consists of Year and Quarter. I would like to convert it into a DatetimeIndex, such as YYYY-MM-DD.
thanks
Data
df=pd.DataFrame({'q':['2020-Q1', '2030-Q3']})
df
Coerce to datetime
df['q']=pd.to_datetime(df['q'])
df
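If direct parsing of quarter strings does not behave as expected in your pandas version, an alternative sketch is to go through quarterly periods first. The column name `q_start` is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({"q": ["2020-Q1", "2030-Q3"]})

# Parse the strings as quarterly periods, then convert each period
# to the timestamp of its first day.
quarters = pd.PeriodIndex(df["q"], freq="Q")
df["q_start"] = quarters.to_timestamp()

# Use the result as a DatetimeIndex.
df = df.set_index("q_start")
print(df)
```

Passing how="end" to to_timestamp would give the end of each quarter instead of the start.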