'Timestamp' object has no attribute 'dt' in pandas

I have a dataset of 100,000 rows and 15 columns in a 10 MB CSV.
The column I am working on is a Date/Time column in string format.
Source code:
import pandas as pd
import datetime as dt
trupl = pd.DataFrame({'Time/Date' : ['12/2/2021 2:09','22/4/2021 21:09','22/6/2021 9:09']})
trupl['Time/Date'] = pd.to_datetime(trupl['Time/Date'])
print(trupl)
Output
Time/Date
0 2021-12-02 02:09:00
1 2021-04-22 21:09:00
2 2021-06-22 09:09:00
What I need to do is a bit confusing but I'll try to make it simple:
If the time of day is between 12 am and 8 am, subtract one day from the Time/Date and put the new timestamp in a new column.
If not, keep it as it is.
Expected output
Time/Date Date_adjusted
0 12/2/2021 2:09 12/1/2021 2:09
1 22/4/2021 21:09 22/4/2021 21:09
2 22/6/2021 9:09 22/6/2021 9:09
I tried the code below:
trupl['Date_adjusted'] = trupl['Time/Date'].map(lambda x: x - dt.timedelta(days=1) if x >= dt.time(0, 0, 0) and x < dt.time(8, 0, 0) else x)
I get a TypeError: '>=' not supported between 'Timestamp' and 'datetime.time',
and when applying .dt.time to x, I get the error "'Timestamp' object has no attribute 'dt'".
So how can I convert x to a time in order to compare it? Or is there a better workaround?
I searched a lot for a fix but I couldn't find a similar case.

Try:
trupl['Date_adjusted'] = trupl['Time/Date'].map(lambda x: x - dt.timedelta(days=1) if (x.hour >= 0 and x.hour < 8) else x)
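A vectorized alternative (just a sketch, not from the original answer) uses the .dt.hour accessor on the whole column together with Series.mask; it assumes Time/Date has already been converted with pd.to_datetime as above:
import pandas as pd
# Example frame from the question, with the dates already parsed to Timestamps
trupl = pd.DataFrame({'Time/Date': [pd.Timestamp('2021-12-02 02:09'),
                                    pd.Timestamp('2021-04-22 21:09'),
                                    pd.Timestamp('2021-06-22 09:09')]})
# Rows whose time falls between midnight (hour 0) and 8 am, exclusive
mask = trupl['Time/Date'].dt.hour < 8
# Subtract one day only where the mask is True; keep the original value elsewhere
trupl['Date_adjusted'] = trupl['Time/Date'].mask(mask, trupl['Time/Date'] - pd.Timedelta(days=1))
print(trupl)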

Related

convert to datetime based on condition

I want to convert my datetime object into seconds
0 49:36.5
1 50:13.7
2 50:35.8
3 50:37.4
4 50:39.3
...
92 1:00:47.8
93 1:01:07.7
94 1:02:15.3
95 1:05:03.0
96 1:05:29.6
Name: Finish, Length: 97, dtype: object
The problem is that the format changes at index 92, which results in an error: ValueError: expected hh:mm:ss format before .
This error is raised when I try to convert the column to seconds:
filt_data["F"] = pd.to_timedelta('00:'+filt_data["Finish"]).dt.total_seconds()
When I do the conversion in two steps it works, but it results in two different columns which I don't know how to merge, nor does it seem very efficient:
filt_data["F1"] = pd.to_timedelta('00:'+filt_data["Finish"].loc[0:89]).dt.total_seconds()
filt_data["F2"] = pd.to_timedelta('0'+filt_data["Finish"].loc[90:97]).dt.total_seconds()
The above code does not cause any error and gets the job done, but it results in two different columns. Any idea how to do this?
Ideally I would like to loop through the column and, based on the format (i.e. "50:39.3" or "1:00:47.8"), add "00:" or "0" to the object.
I would use str.replace:
pd.to_timedelta(df['Finish'].str.replace(r'^(\d+:\d+\.\d+)', r'0:\1', regex=True))
Or str.count and map:
pd.to_timedelta(df['Finish'].str.count(':').map({1: '0:', 2: ''}).add(df['Finish']))
Output:
0 0 days 00:49:36.500000
1 0 days 00:50:13.700000
2 0 days 00:50:35.800000
3 0 days 00:50:37.400000
4 0 days 00:50:39.300000
92 0 days 01:00:47.800000
93 0 days 01:01:07.700000
94 0 days 01:02:15.300000
95 0 days 01:05:03
96 0 days 01:05:29.600000
Name: Finish, dtype: timedelta64[ns]
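Since the question ultimately asks for seconds, the normalized column can be passed straight through .dt.total_seconds() (a sketch building on the regex variant above; 'Finish' is the column name from the question):
import pandas as pd
df = pd.DataFrame({'Finish': ['49:36.5', '50:13.7', '1:00:47.8']})
# Normalize to h:mm:ss.s, then convert the whole column to seconds in one go
df['F'] = pd.to_timedelta(
    df['Finish'].str.replace(r'^(\d+:\d+\.\d+)', r'0:\1', regex=True)
).dt.total_seconds()
print(df)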
Given your data:
import pandas as pd
times = [
    "49:36.5",
    "50:13.7",
    "50:35.8",
    "50:37.4",
    "50:39.3",
    "1:00:47.8",
    "1:01:07.7",
    "1:02:15.3",
    "1:05:03.0",
    "1:05:29.6",
]
df = pd.DataFrame({'time': times})
df
You can write a function that you apply on each separate entry in the time column:
def format_time(time):
    time = time.split('.')[0]      # drop the fractional seconds
    time = time.split(':')
    if len(time) < 3:
        time.insert(0, "0")        # pad the missing hours field
    return ":".join(time)
df["formatted_time"] = df.time.apply(format_time)
df
Then you could undertake two steps:
Convert column to datetime
Convert column to UNIX timestamp (number of seconds since 1970-01-01)
df["time_datetime"] = pd.to_datetime(df.formatted_time, infer_datetime_format=True)
df["time_seconds"] = (df.time_datetime - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
df
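If the end goal is a duration in seconds rather than a point in time, a shorter route (not part of the original answer, just a sketch) is to feed the zero-padded strings into pd.to_timedelta:
df["time_seconds"] = pd.to_timedelta(df.formatted_time).dt.total_seconds()
This reuses the formatted_time column created above and avoids anchoring the values to a calendar date.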

TypeError: '<' not supported between instances of 'int' and 'Timestamp'

I am trying to change the product name's color when the period between the expiry date and today is less than 6 months. When I try to add the color, the following error appears:
TypeError: '<' not supported between instances of 'int' and 'Timestamp'.
Validade is the column that holds the products' expiry dates. How do I solve it?
import numpy as np
import pandas as pd

epi1 = pd.read_excel('/content/timadatepandasepi.xlsx')
epi2 = epi1.dropna(subset=['Validade'])
pd.DatetimeIndex(epi2['Validade'])
today = pd.to_datetime('today').normalize()
epi2['ate_vencer'] = (epi2['Validade'] - today) / np.timedelta64(1, 'M')

def add_color(x):
    if 0 < x < epi2['ate_vencer']:
        color = 'red'
        return f'background = {color}'

epi2.style.applymap(add_color, subset=['Validade'])
Looking at your data, it seems that you're subtracting two dates and using the result inside your comparison. The problem is likely occurring because epi2['Validade'] - today returns a pandas.Series with values of type pandas._libs.tslibs.timedeltas.Timedelta, and this type of object does not allow you to make comparisons with integers. Here's a possible solution:
epi2['ate_vencer'] = (epi2['Validade'] - today).dt.days

# Now you can compare values from "ate_vencer" with integers. For example:
def f(x):  # Dummy function for demonstration purposes
    return 0 < x < 10

epi2['ate_vencer'].apply(f)  # This works
Example 1: a similar error to yours, from subtracting dates and calling the function without .dt.days (screenshot omitted).
Example 2: the same code, but using .dt.days (screenshot omitted).
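Putting the pieces together, here is a minimal sketch of the styling step, with made-up dates standing in for the Validade column and 183 days standing in for "six months":
import pandas as pd
# Hypothetical stand-in for the spreadsheet data
epi2 = pd.DataFrame({'Validade': pd.to_datetime(['2024-01-10', '2025-06-01', '2023-11-20'])})
today = pd.to_datetime('today').normalize()
epi2['ate_vencer'] = (epi2['Validade'] - today).dt.days  # whole days until expiry
def add_color(days):
    # Highlight products that expire within roughly six months (valid CSS declaration)
    return 'background-color: red' if 0 < days < 183 else ''
styled = epi2.style.applymap(add_color, subset=['ate_vencer'])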

Trying to apply formula to a date column in pandas

I have this df with thousands of rows, one of whose columns is date:
The df.head() shows:
id_code texto date
0 ZZZZZZZZZZZZ ha tenido su corrección 2019-03-31
0 WWWWWWWWWWWW cierra la venta de sus plans 2019-03-29
0 XXXXXXXXXXXX se han reunido en ferraz 2019-03-26
0 AAAAAAAAAAAA marca es buen periodico 2019-03-12
I would like to apply the following formula to the date column:
initial_date=(pd.to_datetime("today")- pd.DateOffset(years=1)).strftime('%Y-%m-%d')
final_date=pd.to_datetime("today").strftime('%Y-%m-%d')
df["ponderacion"]=1-(final_date-pd.to_datetime(df.date))/(final_date-initial_date)
However, when the df is returned, it outputs:
ValueError: format number 1 of "b'2019-04-15'" is not recognized
Should I .decode('UTF-8') the date.values to turn them into str and then to datetime?
If that were the case: when I tried to decode date.values, it output:
AttributeError: 'numpy.ndarray' object has no attribute 'decode'
Could anyone shed some light on how I could overcome this issue and apply the desired formula to df.date?
The source of your problem is that you keep date values as strings.
After creation of your DataFrame, you should first convert the date column from string to datetime:
df.date = pd.to_datetime(df.date)
Then you can compute initial and final dates:
final_date = pd.to_datetime('today')
initial_date = final_date - pd.DateOffset(years=1)
Note the sequence:
First compute final_date, without conversion to string.
Then compute initial_date as one year before final_date.
Otherwise there would be a small difference in the fractional seconds.
And the final step is to compute your column:
df['ponderacion'] = 1 - (final_date - df.date)/(final_date - initial_date)
also without conversion to string.
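A minimal end-to-end sketch of those steps, using made-up dates in place of the real data:
import pandas as pd
df = pd.DataFrame({'date': ['2019-03-31', '2019-03-29', '2019-03-26', '2019-03-12']})
# 1. Convert the string column to datetime
df.date = pd.to_datetime(df.date)
# 2. Compute the reference dates as Timestamps, not strings
final_date = pd.to_datetime('today')
initial_date = final_date - pd.DateOffset(years=1)
# 3. ponderacion is 1 for articles dated today and falls linearly to 0
#    for articles one year old (older articles go negative)
df['ponderacion'] = 1 - (final_date - df.date) / (final_date - initial_date)
print(df)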
Use apply to convert bytes to strings:
pd.to_datetime(df.date.apply(str, encoding='ascii'))
It applies the function specified (str in this case) to each element of the Series, and it is possible to specify arguments to the function (encoding='ascii' here).
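If the column genuinely contains bytes objects, the .str.decode accessor is another option (just a sketch, not from the original answers):
import pandas as pd
# Hypothetical bytes column, decoded before parsing
df = pd.DataFrame({'date': [b'2019-04-15', b'2019-03-31']})
pd.to_datetime(df['date'].str.decode('ascii'))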

transform data frame into a time series for date type POSIXct

I have a data frame with the following two variables:
amount: num 1213.5 34.5 ...
txn_date: POSIXct, format "2017-05-01 12:13:30" ...
I want to transform it into a time series using ts().
I started using this code:
Z <- zoo(data$amount, order.by=as.Date(as.character(data$txn_date), format="%Y/%m/%d %H:%M:%S"))
But the problem is that in Z I lose the dates. In fact, all the dates are reported as NA.
How can I solve it?
For my analysis it is important to have the date in the format %Y/%m/%d %H:%M:%S,
for example 2017-05-01 12:13:30. I don't want to remove the time component from the variable txn_date.
Thank you for your help,
Andrea
I think your problem comes from the way you're manipulating your data frame; could you post more details about it, please?
I think I have a fix for you.
Data frame I used :
> df1
$data
value
1 1.9150
2 3.1025
3 6.7400
4 8.5025
5 11.0025
6 9.8025
7 9.0775
8 7.0900
9 6.8525
10 7.4900
$date
%Y-%m-%d
1 1974-01-01
2 1974-01-02
3 1974-01-03
4 1974-01-04
5 1974-01-05
6 1974-01-06
7 1974-01-07
8 1974-01-08
9 1974-01-09
10 1974-01-10
> class(df1$data$value)
[1] "numeric"
> class(df1$date$`%Y-%m-%d`)
[1] "POSIXct" "POSIXt"
Then I can create a time series by calling zoo like this:
> Z<-zoo(df1$data,order.by=(as.POSIXct(df1$date$`%Y-%m-%d`)))
> Z
value
1974-01-01 1.9150
1974-01-02 3.1025
1974-01-03 6.7400
1974-01-04 8.5025
1974-01-05 11.0025
1974-01-06 9.8025
1974-01-07 9.0775
1974-01-08 7.0900
1974-01-09 6.8525
1974-01-10 7.4900
The important thing here is that I use df1$date$%Y-%m-%d instead of just
df1$date
In fact, if I try it the way you did, I get NA values too:
> Z<-zoo(df1$data,order.by=as.POSIXct(as.Date(as.character(df1$date),format("%Y-%m-%d"))))
> Z
value
<NA> 1.915
To get the name of data$txn_date you can use the following command: names(data$txn_date). Then try my solution with your data frame and name.
> names(df1$date)
[1] "%Y-%m-%d"

pd.datetime is failing to convert to date

I have a data frame with a column 'Date' of string type. As I want to use the column 'Date' as the index, I first want to convert it to datetime, so I did:
data['Date'] = pd.to_datetime(data['Date'])
then I did,
data = data.set_index('Date')
but when I tried to do
data = data.loc['01/06/2006':'09/06/2006',]
the slicing is not accomplished; there is no error, but the slicing doesn't occur. I tried with iloc:
data = data.iloc['01/06/2006':'09/06/2006',]
and the error message is the following:
TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [01/06/2006] of <type 'str'>
So I came to the conclusion that pd.to_datetime didn't work, even though no error was raised?
Can anybody clarify what is going on? Thanks in advance
It seems you need to change the order of the datetime strings to YYYY-MM-DD:
data = data.loc['2006-06-01':'2006-06-09']
Sample:
data = pd.DataFrame({'col':range(15)}, index=pd.date_range('2006-06-01','2006-06-15'))
print (data)
col
2006-06-01 0
2006-06-02 1
2006-06-03 2
2006-06-04 3
2006-06-05 4
2006-06-06 5
2006-06-07 6
2006-06-08 7
2006-06-09 8
2006-06-10 9
2006-06-11 10
2006-06-12 11
2006-06-13 12
2006-06-14 13
2006-06-15 14
data = data.loc['2006-06-01':'2006-06-09']
print (data)
col
2006-06-01 0
2006-06-02 1
2006-06-03 2
2006-06-04 3
2006-06-05 4
2006-06-06 5
2006-06-07 6
2006-06-08 7
2006-06-09 8
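For completeness, a small sketch that combines the conversion (with dayfirst=True, since '01/06/2006' appears to mean 1 June) and ISO-format slicing, on hypothetical data:
import pandas as pd
data = pd.DataFrame({'Date': ['01/06/2006', '05/06/2006', '09/06/2006', '12/06/2006'],
                     'col': [1, 2, 3, 4]})
# Parse the day-first strings and make the result the index
data['Date'] = pd.to_datetime(data['Date'], dayfirst=True)
data = data.set_index('Date')
# Slice with ISO-format strings (or pd.Timestamp objects) on the DatetimeIndex
print(data.loc['2006-06-01':'2006-06-09'])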
As what I want is to create a new DataFrame with specific dates from the original DataFrame, I set the column 'Date' as the index:
data = data.set_index(data['Date'])
And then I just create the new DataFrame using loc:
data1 = data.loc['01/06/2006':'09/06/2006']
I am quite new to Python and I thought that I needed to convert the 'Date' column, which is a string, to datetime, but apparently that is not necessary. Thanks for your help @jezrael