Trying to apply a formula to a date column in pandas

I have a df with thousands of rows, and one of its columns is a date.
df.head() shows:
   id_code       texto                         date
0  ZZZZZZZZZZZZ  ha tenido su corrección       2019-03-31
0  WWWWWWWWWWWW  cierra la venta de sus plans  2019-03-29
0  XXXXXXXXXXXX  se han reunido en ferraz      2019-03-26
0  AAAAAAAAAAAA  marca es buen periodico       2019-03-12
I would like to apply the following formula to the date column:
initial_date=(pd.to_datetime("today")- pd.DateOffset(years=1)).strftime('%Y-%m-%d')
final_date=pd.to_datetime("today").strftime('%Y-%m-%d')
df["ponderacion"]=1-(final_date-pd.to_datetime(df.date))/(final_date-initial_date)
However, when I display the df, I get:
ValueError: format number 1 of "b'2019-04-15'" is not recognized
Should I .decode('UTF-8') the date.values to turn them into str and then into datetime?
When I tried to decode date.values, I got:
AttributeError: 'numpy.ndarray' object has no attribute 'decode'
Could anyone shed some light on how to overcome this issue and apply the desired formula to df.date?

The source of your problem is that you keep the date values as strings.
After creating your DataFrame, you should first convert the date
column from string to datetime:
df.date = pd.to_datetime(df.date)
Then you can compute initial and final dates:
final_date = pd.to_datetime('today')
initial_date = final_date - pd.DateOffset(years=1)
Note the order:
First compute final_date, without converting it to a string.
Then compute initial_date as one year before final_date.
If you call pd.to_datetime('today') twice instead, the two results differ by a fraction of a second.
And the final step is to compute your column:
df['ponderacion'] = 1 - (final_date - df.date)/(final_date - initial_date)
also without conversion to string.
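Put together, the steps above might look like the following minimal sketch. The sample dates come from the question; the fixed final_date is an assumption made only so that the result is reproducible (in practice it would be pd.to_datetime('today')).

```python
import pandas as pd

# Sample dates taken from the question, stored as strings
df = pd.DataFrame({"date": ["2019-03-31", "2019-03-29", "2019-03-26", "2019-03-12"]})

# Step 1: convert the column to datetime
df["date"] = pd.to_datetime(df["date"])

# Step 2: reference dates, kept as Timestamps (never converted to strings).
# Assumption: a fixed date stands in for pd.to_datetime("today").
final_date = pd.Timestamp("2019-04-15")
initial_date = final_date - pd.DateOffset(years=1)

# Step 3: Timedelta / Timedelta gives a plain float, so the weight is a float column
df["ponderacion"] = 1 - (final_date - df["date"]) / (final_date - initial_date)
print(df)
```

The most recent rows get a weight close to 1, and a row exactly one year old would get 0.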

Use apply to convert bytes to strings:
pd.to_datetime(df.date.apply(str, encoding='ascii'))
It applies the function specified (str in this case) to each element of the Series, and it is possible to specify arguments to the function (encoding='ascii' here).
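As an alternative sketch, assuming the column really holds bytes objects (as the b'2019-04-15' in the error message suggests), the .str.decode accessor does the same conversion without apply. The sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical column of bytes values, similar to what the error message shows
raw = pd.Series([b"2019-03-31", b"2019-03-29"])

# Decode every element from bytes to str, then parse as datetime
decoded = raw.str.decode("utf-8")
dates = pd.to_datetime(decoded)
print(dates)
```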

Related

TypeError: '<' not supported between instances of 'int' and 'Timestamp'

I am trying to change the product name's color when the period between the expiry date and today is less than 6 months. When I try to add the color, the following error appears:
TypeError: '<' not supported between instances of 'int' and 'Timestamp'.
Validade is the column that holds the products' expiry dates. How do I solve it?
epi1 = pd.read_excel('/content/timadatepandasepi.xlsx')
epi2 = epi1.dropna(subset=['Validade'])
pd.DatetimeIndex(epi2['Validade'])
today = pd.to_datetime('today').normalize()
epi2['ate_vencer'] = (epi2['Validade'] - today) /np.timedelta64(1, 'M')
def add_color(x):
    if 0 <x< epi2['ate_vencer']:
        color='red'
        return f'background = {color}'
epi2.style.applymap(add_color, subset=['Validade'])
Looking at your data, it seems that you're subtracting two dates and using the result inside your comparison. The problem is likely that subtracting today from a date column (epi2['Validade'] - today in your code) returns a pandas.Series whose values are of type pandas._libs.tslibs.timedeltas.Timedelta, and this type cannot be compared with integers. Here's a possible solution:
epi2['ate_vencer'] = (epi2['Validade'] - today).dt.days
# Now you can compare values from `"ate_vencer"` with integers. For example:
def f(x):  # Dummy function for demonstration purposes
    return 0 < x < 10
epi2['ate_vencer'].apply(f) # This works
Example 1
Here's a similar error to yours, when subtracting dates and calling function f without .dt.days:
Example 2
Here's the same code but instead using .dt.days:
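The two examples can be reproduced with a minimal sketch along these lines. The dates are hypothetical, and a fixed today is assumed so that the result is reproducible.

```python
import pandas as pd

# Assumption: fixed reference date instead of pd.to_datetime('today').normalize()
today = pd.Timestamp("2024-01-01")
validade = pd.Series(pd.to_datetime(["2024-01-05", "2024-03-01"]))

def f(x):  # Dummy function for demonstration purposes
    return 0 < x < 10

delta = validade - today  # Series of Timedelta values

# Example 1: comparing Timedelta values with integers fails
try:
    delta.apply(f)
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Example 2: .dt.days turns the differences into integers, so the comparison works
days = delta.dt.days
result = days.apply(f)
print(result)
```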

'Timestamp' object has no attribute 'dt' pandas

I have a dataset of 100,000 rows and 15 columns in a 10 MB CSV.
The column I am working on is a Date/Time column in string format.
Source code:
import pandas as pd
import datetime as dt
trupl = pd.DataFrame({'Time/Date' : ['12/1/2021 2:09','22/4/2021 21:09','22/6/2021 9:09']})
trupl['Time/Date'] = pd.to_datetime(trupl['Time/Date'])
print(trupl)
Output
Time/Date
0 2021-12-02 02:09:00
1 2021-04-22 21:09:00
2 2021-06-22 09:09:00
What I need to do is a bit confusing, but I'll try to make it simple:
If the time of day is between 12 am and 8 am, subtract one day from Time/Date and put the new timestamp in a new column.
If not, keep it as it is.
Expected output
Time/Date Date_adjusted
0 12/2/2021 2:09 12/1/2021 2:09
1 22/4/2021 21:09 22/4/2021 21:09
2 22/6/2021 9:09 22/6/2021 9:09
I tried the code below:
trupl['Date_adjusted'] = trupl['Time/Date'].map(lambda x:x- dt.timedelta(days=1) if x >= dt.time(0,0,0) and x < dt.time(8,0,0) else x)
I get a TypeError: '>=' not supported between 'Timestamp' and 'datetime.time',
and when applying dt.time to x, I get the error 'Timestamp' object has no attribute 'dt'.
So how can I convert x to a time in order to compare it? Or is there a better workaround?
I searched a lot for a fix but couldn't find a similar case.
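Inside map, x is a single Timestamp, which has no .dt accessor (that exists only on a Series), so use its hour attribute directly. Try: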
trupl['Date_adjusted'] = trupl['Time/Date'].map(lambda x: x - dt.timedelta(days=1) if (x.hour >= 0 and x.hour < 8) else x)
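For a column of 100,000 rows, a vectorized variant may be preferable to map. The sketch below is one possible way to do it, assuming the column has already been parsed to datetime; it uses the Series-level .dt.hour accessor and Series.mask, and the sample values are the ones from the question's output.

```python
import pandas as pd

trupl = pd.DataFrame({"Time/Date": pd.to_datetime(
    ["2021-12-02 02:09", "2021-04-22 21:09", "2021-06-22 09:09"])})

# True for rows whose time of day falls between midnight and 8 am
early = trupl["Time/Date"].dt.hour < 8

# Replace only the early rows with the timestamp shifted back by one day
trupl["Date_adjusted"] = trupl["Time/Date"].mask(
    early, trupl["Time/Date"] - pd.Timedelta(days=1))
print(trupl)
```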

How to change from character to Date/Time when Year is in '20 format?

I have a column in my data frame that has class character, but I want to convert it to date/time. The format of the dates is as follows: month-day-year hour:minute:second AM/PM. I've tried things like as.Date and as.POSIXct, but I think I'm having a problem with the year, because it is written as 20 instead of 2020, so the values look like 06-25-20 08:00:00 AM. Here is an example data frame:
# Create a, b, c, d variables
a <- c("06-25-20 08:00:00 AM","06-25-20 08:15:00 AM","06-25-20 08:30:00 AM","06-25-20 08:45:00 AM")
b <- c('book', 'pen', 'textbook', 'pencil_case')
c <- c(2.5, 8, 10, 7)
# Join the variables to create a data frame
df <- data.frame(a,b,c)
Using the code below, the entire column turns to NA:
df$a = as.Date(df$a,'%d-%m-%y %H:%M:%S')
Using the code below I get an error:
df$a = as.POSIXct(df$a, "%m-%d-%y %H:%M:%S")
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
In addition: There were 12 warnings (use warnings() to see them)
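I needed to specify the format explicitly. The second positional argument of as.POSIXct is tz, so the format string has to be passed by name, and %I with %p is needed so that AM/PM is taken into account (%H assumes a 24-hour clock):
df$a = as.POSIXlt(df$a, format = "%m-%d-%y %I:%M:%S %p", tz = 'EST')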

groupby based on one column and get the sum values in another column

I have a data frame such as:
mode travel time
transit_walk 284.0
transit_walk 284.0
pt 270.0
transit_walk 346.0
walk 455.0
I want to group by "mode" and get the sum of all travel times,
so my desired result looks like:
mode total travel time
transit_ walk 1200000000
pt 30000000
walk 88888888
I have written this code:
df.groupby('mode')['travel time'].sum()
However, the result is:
mode
pt 270.01488.01518.01788.01300.01589.01021.01684....
transit_walk 284.0284.0346.0142.0142.01882.0154.0154.0336.0...
walk 455.018.0281.0554.0256.0256.0244.0244.0244.045...
Name: travel time, dtype: object
which just puts all the times side by side instead of summing them.
The travel time column contains strings, so convert it with Series.astype:
df['travel time'] = df['travel time'].astype(float)
If that fails because some values are not numeric, use to_numeric with errors='coerce':
df['travel time'] = pd.to_numeric(df['travel time'], errors='coerce')
Finally, aggregate:
df1 = df.groupby('mode', as_index=False)['travel time'].sum()
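A minimal sketch reproducing both the problem and the fix, using the five sample rows from the question with travel time stored as strings (an assumption about how the real data is stored):

```python
import pandas as pd

df = pd.DataFrame({
    "mode": ["transit_walk", "transit_walk", "pt", "transit_walk", "walk"],
    "travel time": ["284.0", "284.0", "270.0", "346.0", "455.0"],  # strings, not floats
})

# Summing strings concatenates them, which is the behaviour seen in the question
concatenated = df.groupby("mode")["travel time"].sum()

# Convert to numbers first; non-numeric values would become NaN
df["travel time"] = pd.to_numeric(df["travel time"], errors="coerce")
df1 = df.groupby("mode", as_index=False)["travel time"].sum()
print(df1)
```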

Converting Negative Number in String Format to Numeric when the Sign is at the End

Some values in a column of my dataframe are negative numbers stored as strings like this: "500.00-". I need to convert every negative number within the column to numeric format. I'm sure there's an easy way to do this, but I have struggled to find one specific to pandas dataframes. Any help would be greatly appreciated.
I have tried the basic to_numeric function as shown below, but it doesn't read the values in correctly. Also, only some of the numbers within the column are negative, so I can't simply remove all the negative signs and multiply the column by -1.
Q1['Credit'] = pd.to_numeric(Q1['Credit'])
Sample data:
df:
num
0 50.00
1 60.00-
2 70.00+
3 -80.00
Use the Series str accessor to check the last character. If it is '-' or '+', move it to the front. Use mask so that this applies only to rows with a -/+ suffix. Finally, cast the column to float with astype:
df.num.mask(df.num.str[-1].isin(['-','+']), df.num.str[-1].str.cat(df.num.str[:-1])).astype('float')
Out[1941]:
0 50.0
1 -60.0
2 70.0
3 -80.0
Name: num, dtype: float64
Possibly a bit explicit, but this would work:
# build a mask of negative numbers
m_neg = Q1["Credit"].str.endswith("-")
# remove - signs
Q1["Credit"] = Q1["Credit"].str.rstrip("-")
# convert to number
Q1["Credit"] = pd.to_numeric(Q1["Credit"])
# Apply the mask to create the negatives
Q1.loc[m_neg, "Credit"] *= -1
Let us consider the following example dataframe:
Q1 = pd.DataFrame({'Credit':['500.00-', '100.00', '300.00-']})
Credit
0 500.00-
1 100.00
2 300.00-
We can use str.endswith to create a mask which indicates the negative numbers. Then we use np.where to conditionally convert the numbers to negative:
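import numpy as np
m1 = Q1['Credit'].str.endswith('-')
# strip only a trailing '-'; str[:-1] would also cut the last digit of positive values
m2 = Q1['Credit'].str.rstrip('-').astype(float)
Q1['Credit'] = np.where(m1, -m2, m2)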
Output
Credit
0 -500.0
1 100.0
2 -300.0
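The approaches above can be combined into a small helper that handles a trailing '-', a trailing '+', and an ordinary leading sign in one pass. This is only a sketch: the function name trailing_sign_to_numeric is made up for illustration, and the input is the sample data from the question.

```python
import pandas as pd

def trailing_sign_to_numeric(s: pd.Series) -> pd.Series:
    """Convert strings such as '60.00-' or '70.00+' to floats (hypothetical helper)."""
    s = s.astype(str).str.strip()
    negative = s.str.endswith("-")                 # rows with a trailing minus sign
    magnitude = pd.to_numeric(s.str.rstrip("+-"))  # drop trailing sign, keep leading one
    return magnitude.mask(negative, -magnitude)    # negate only the flagged rows

df = pd.DataFrame({"num": ["50.00", "60.00-", "70.00+", "-80.00"]})
df["num"] = trailing_sign_to_numeric(df["num"])
print(df)
```

Values that cannot be parsed still raise an error here; passing errors='coerce' to pd.to_numeric would turn them into NaN instead.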