How to solve the error "module 'numpy' has no attribute 'float'"? - numpy

Environment
WSL2
Docker
Virtualenv
Python 3.8.16
jupyterlab 3.5.2
numpy 1.24.1
prophet 1.1.1
fbprophet 0.7.1
Cython 0.29.33
ipython 8.8.0
pmdarima 2.0.2
plotly 5.11.0
pip 22.3.1
pystan 2.19.1.1
scikit-learn 1.2.0
konlpy 0.6.0 (just in case)
nodejs 0.1.1 (just in case)
pandas 1.5.2 (just in case)
Error
main error message
AttributeError: module 'numpy' has no attribute 'float'
entire error message
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[33], line 4
1 # Load the Prophet() model
2 # and train it with fit.
3 model_revenue = Prophet()
----> 4 model_revenue.fit(revenue_serial)
File /home/.venv/lib/python3.8/site-packages/fbprophet/forecaster.py:1115, in Prophet.fit(self, df, **kwargs)
1112 self.history = history
1113 self.set_auto_seasonalities()
1114 seasonal_features, prior_scales, component_cols, modes = (
-> 1115 self.make_all_seasonality_features(history))
1116 self.train_component_cols = component_cols
1117 self.component_modes = modes
File /home/.venv/lib/python3.8/site-packages/fbprophet/forecaster.py:765, in Prophet.make_all_seasonality_features(self, df)
763 # Seasonality features
764 for name, props in self.seasonalities.items():
--> 765 features = self.make_seasonality_features(
766 df['ds'],
767 props['period'],
768 props['fourier_order'],
769 name,
770 )
771 if props['condition_name'] is not None:
772 features[~df[props['condition_name']]] = 0
File /home/.venv/lib/python3.8/site-packages/fbprophet/forecaster.py:458, in Prophet.make_seasonality_features(cls, dates, period, series_order, prefix)
442 @classmethod
443 def make_seasonality_features(cls, dates, period, series_order, prefix):
444 """Data frame with seasonality features.
445
446 Parameters
(...)
456 pd.DataFrame with seasonality features.
457 """
--> 458 features = cls.fourier_series(dates, period, series_order)
459 columns = [
460 '{}_delim_{}'.format(prefix, i + 1)
461 for i in range(features.shape[1])
462 ]
463 return pd.DataFrame(features, columns=columns)
File /home/.venv/lib/python3.8/site-packages/fbprophet/forecaster.py:434, in Prophet.fourier_series(dates, period, series_order)
417 """Provides Fourier series components with the specified frequency
418 and order.
419
(...)
428 Matrix with seasonality features.
429 """
430 # convert to days since epoch
431 t = np.array(
432 (dates - datetime(1970, 1, 1))
433 .dt.total_seconds()
--> 434 .astype(np.float)
435 ) / (3600 * 24.)
436 return np.column_stack([
437 fun((2.0 * (i + 1) * np.pi * t / period))
438 for i in range(series_order)
439 for fun in (np.sin, np.cos)
440 ])
File /home/.venv/lib/python3.8/site-packages/numpy/__init__.py:284, in __getattr__(attr)
281 from .testing import Tester
282 return Tester
--> 284 raise AttributeError("module {!r} has no attribute "
285 "{!r}".format(__name__, attr))
AttributeError: module 'numpy' has no attribute 'float'
Example of dataset
ds y
0 2022-09-01 13:00:00 762
1 2022-09-01 15:00:00 746
2 2022-09-01 17:00:00 848
3 2022-09-01 19:00:00 866
4 2022-09-01 21:00:00 632
... ... ...
1881 2022-10-31 13:00:00 684
1882 2022-10-31 15:00:00 749
1883 2022-10-31 17:00:00 779
1884 2022-10-31 19:00:00 573
1885 2022-10-31 21:00:00 510
Type of variable
visitors_serial
ds datetime64[ns]
y int64
dtype: object
Short code
...
revenue_serial = pd.DataFrame(pd.to_datetime(df_active_time['START_DATE'], format="%Y%m%d %H:%M:%S"))
revenue_serial['객단가(원)']=df_active_time['객단가(원)']
revenue_serial = revenue_serial.reset_index(drop= True)
revenue_serial = revenue_serial.rename(columns={'START_DATE':'ds', '객단가(원)':'y'})
model_revenue = Prophet()
model_revenue.fit(revenue_serial)
I expected that upgrading the numpy module would solve it, but it didn't.

As you can see in the traceback, numpy actually has no attribute float anymore. The failing code is t = np.array((dates - datetime(1970, 1, 1)).dt.total_seconds().astype(np.float)); it should be
t = np.array(
    (dates - datetime(1970, 1, 1))
    .dt.total_seconds()
    .astype(np.float32)
) / (3600 * 24.)

The alias numpy.float was deprecated in NumPy 1.20 and was removed in NumPy 1.24.
You can change it to numpy.float_, numpy.float64, or numpy.double. They all mean the same thing.
For your dependency prophet, the actual issue was already fixed in #1850 (March 2021), and it does appear to be fixed in v1.1.1, so it looks like you're not running the version you think you are: your traceback imports from fbprophet 0.7.1, not prophet 1.1.1.
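If you do want to patch the call site yourself, here is a minimal sketch of the rename (the sample dates below are made up for illustration; np.float was only an alias for the builtin float, so np.float64 gives the same result here):

import numpy as np
import pandas as pd
from datetime import datetime

# made-up stand-in for the df['ds'] column from the question
dates = pd.Series(pd.to_datetime(["2022-09-01 13:00:00", "2022-09-01 15:00:00"]))

# same expression as in fbprophet's fourier_series, with the removed alias replaced
t = np.array(
    (dates - datetime(1970, 1, 1))
    .dt.total_seconds()
    .astype(np.float64)
) / (3600 * 24.)
print(t)  # days since the Unix epoch as float64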

Related

Removing duplicates based on matching column values with boolean indexing

After merging two DF's I have the following dataset:
DB_ID  x_val  y_val
x01    405    407
x01    405    405
x02    308    306
x02    308    308
x03    658    658
x03    658    660
x04    None   658
x04    None   660
x05    658    660
x06    660    660
The y table contains multiple values for the left join variable (not included in table), resulting in multiple rows per unique DB_ID (string variable, not in df index).
The issue is that only one row is correct, where x_val and y_val match. I tried removing the duplicates with the following code:
df= df[~df['DB_ID'].duplicated() | combined['x_val'] != combined['y_val']]
This, however, doesn't work. I am looking for a solution to achieve the following result:
DB_ID  x_val  y_val
x01    405    405
x02    308    308
x03    658    658
x04    None   658
x05    658    660
x06    660    660
The idea is to compare both columns for inequality, then sort and remove duplicates by DB_ID:
df = (df.assign(new = df['x_val'].ne(df['y_val']))
        .sort_values(['DB_ID','new'])
        .drop_duplicates('DB_ID')
        .drop('new', axis=1))
print (df)
DB_ID x_val y_val
1 x01 405 405
3 x02 308 308
4 x03 658 658
6 x04 None 658
8 x05 658 660
9 x06 660 660
If NaNs or Nones should compare as equal, use:
df = (df.assign(new = df['x_val'].fillna('same').ne(df['y_val'].fillna('same')))
        .sort_values(['DB_ID','new'])
        .drop_duplicates('DB_ID')
        .drop('new', axis=1))
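For reference, a self-contained sketch of that approach on the sample data from the question (the DataFrame construction below is an assumption rebuilt from the table shown):

import pandas as pd

df = pd.DataFrame({
    'DB_ID': ['x01','x01','x02','x02','x03','x03','x04','x04','x05','x06'],
    'x_val': [405, 405, 308, 308, 658, 658, None, None, 658, 660],
    'y_val': [407, 405, 306, 308, 658, 660, 658, 660, 660, 660],
})

# rows where x_val != y_val get new=True; sorting puts the matching (False) row
# first within each DB_ID, so drop_duplicates keeps it
out = (df.assign(new=df['x_val'].ne(df['y_val']))
         .sort_values(['DB_ID', 'new'])
         .drop_duplicates('DB_ID')
         .drop('new', axis=1))
print(out)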
Maybe, you can simply use:
df = df[df['x_val'] == df['y_val']]
print(df)
# Output
DB_ID x_val y_val
1 x01 405 405
3 x02 308 308
4 x03 658 658
I think you don't need drop_duplicates or duplicated here, but if you want to ensure only one instance of each DB_ID remains, you can append .drop_duplicates('DB_ID'):
df = df[df['x_val'] == df['y_val']].drop_duplicates('DB_ID')
print(df)
# Output
DB_ID x_val y_val
1 x01 405 405
3 x02 308 308
4 x03 658 658

Add column for percentages

I have a df that looks like this:
Total Initial Follow Sched Supp Any
0 5525 3663 968 296 65 533
I transpose the df because I have to add a column with the percentages based on the 'Total' column.
Now my df looks like this:
0
Total 5525
Initial 3663
Follow 968
Sched 296
Supp 65
Any 533
So, how can I add this percentage column?
The expected output looks like this
0 Percentage
Total 5525 100
Initial 3663 66.3
Follow 968 17.5
Sched 296 5.4
Supp 65 1.2
Any 533 9.6
Do you know how I can add this new column?
I'm working in jupyterlab with pandas and numpy
Divide column 0 by the Total scalar with Series.div, then multiply by 100 with Series.mul, and finally round with Series.round:
df['Percentage'] = df[0].div(df.loc['Total', 0]).mul(100).round(1)
print (df)
0 Percentage
Total 5525 100.0
Initial 3663 66.3
Follow 968 17.5
Sched 296 5.4
Supp 65 1.2
Any 533 9.6
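Putting the whole flow together from the original one-row frame, here is a small sketch (the wide DataFrame below is rebuilt from the numbers in the question):

import pandas as pd

wide = pd.DataFrame([{'Total': 5525, 'Initial': 3663, 'Follow': 968,
                      'Sched': 296, 'Supp': 65, 'Any': 533}])

df = wide.T  # transpose so the categories become the index and the values sit in column 0
df['Percentage'] = df[0].div(df.loc['Total', 0]).mul(100).round(1)
print(df)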
Consider the df below:
In [1328]: df
Out[1328]:
b
a
Total 5525
Initial 3663
Follow 968
Sched 296
Supp 65
Any 533
In [1327]: df['Perc'] = round(df.b.div(df.loc['Total', 'b']) * 100, 1)
In [1330]: df
Out[1330]:
b Perc
a
Total 5525 100.0
Initial 3663 66.3
Follow 968 17.5
Sched 296 5.4
Supp 65 1.2
Any 533 9.6

Loss nan problem when using TFBertForSequenceClassification

I have a problem when training a model for multi-label text classification.
I'm working in Colab as follows:
def create_sentiment_bert():
    config = BertConfig.from_pretrained("monologg/kobert", num_labels=52)
    model = TFBertForSequenceClassification.from_pretrained("monologg/kobert", config=config, from_pt=True)
    opt = tf.keras.optimizers.Adam(learning_rate=4.0e-6)
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
    metric = tf.keras.metrics.SparseCategoricalAccuracy("accuracy")
    model.compile(optimizer=opt, loss=loss, metrics=[metric])
    return model
sentiment_model = create_sentiment_bert()
sentiment_model.fit(train_x, train_y, epochs=2, shuffle=True, batch_size=250, validation_data=(test_x, test_y))
The result is as follows:
Epoch 1/2
739/14065 [>.............................] - ETA: 35:31 - loss: nan - accuracy: 0.0000e+00
I have checked my data: there are no NaN, null, or invalid values.
I tried different optimizers, numbers of epochs, and learning rates, but had the same problem.
The number of labels is 52 and the distribution is as follows:
[Label] [Count]
501 694624
601 651306
401 257665
210 250352
307 170665
301 153318
306 147948
201 141382
302 113917
402 102040
606 101434
506 73492
305 69876
604 62056
403 57956
104 56800
107 55503
607 40293
503 36272
505 34757
303 26884
308 24539
304 22135
205 20744
509 19465
206 16665
508 15334
208 13335
603 13240
504 12299
602 10684
202 10366
209 8267
106 6564
502 5880
211 5804
207 2794
507 1967
108 1860
204 1633
105 1545
109 682
605 426
102 276
101 274
405 268
212 204
213 153
103 103
203 90
404 65
608 37
I'm a beginner in this area. Please help me. Thanks in advance!
Why do you have from_logits = False? The classifier head returns logits so unless you put a softmax activation within your model, you need to calculate the loss from logits.
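A minimal sketch of that change (everything else in create_sentiment_bert stays as in the question; only the from_logits flag differs):

import tensorflow as tf

# tell the loss that the model outputs raw logits, not softmax probabilities
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy("accuracy")
# model.compile(optimizer=opt, loss=loss, metrics=[metric])  # as in the question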

How to plot using timestamp and coordinates?

I have logs of mouse movement (coordinates and timestamps). I want to plot the mouse movement from this log, but I have no idea what API or library could be used to do it. I'd like to know how to start, if a way exists.
My log is as follows
Date hr:min:sec ms x y
13/6/2020 13:13:33 521 291 283
13/6/2020 13:13:33 638 273 234
13/6/2020 13:13:33 647 272 233
13/6/2020 13:13:33 657 271 231
13/6/2020 13:13:33 667 269 230
13/6/2020 13:13:33 677 268 229
13/6/2020 13:13:33 687 267 228
13/6/2020 13:13:33 697 264 226
You're looking for geom_path() from ggplot2. The geom will connect a line between all your observations based on the order they appear in the dataframe. So, here's some x,y data that's expanded a bit:
df <- data.frame(
  x=c(291,273,272,271,269,268,267,264,262,261,261,265,268,280,290),
  y=c(283,234,233,231,230,229,228,226,230,235,237,248,252,246,235)
)
And some code to make a simple plot using geom_path():
library(ggplot2)
p <- ggplot(df, aes(x=x,y=y)) + theme_classic() +
  geom_path(color='blue') + geom_point()
p
If you want, you can even save that as an animation based on your time points. See the code below using the gganimate package:
library(gganimate)
df$time <- 1:15
a <- p + transition_reveal(time)
animate(a, fps=20)

Pandas adding row to categorical index

I have a scenario where I would like to group my dataset by personally defined week indexes, average each group, and then aggregate those averages into a "Total" row. I am able to achieve the first half, but when I try to append/insert a new "Total" row that sums these rows, I receive error messages.
I attempted to create this row via two different methods:
Method 1:
week_index_avg_unit.loc['Total'] = week_index_avg_unit.sum()
TypeError: cannot append a non-category item to a CategoricalIndex
Method 2:
week_index_avg_unit.index.insert(['Total'], week_index_avg_unit.sum())
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I have used the first approach in this scenario multiple times, but this is the first time I'm cutting the data into multiple categories, and I can clearly see that the CategoricalIndex type is the problem.
Here is the format of my data:
date organic ppc oa other content_partnership total \
0 2018-01-01 379 251 197 51 0 878
1 2018-01-02 880 527 405 217 0 2029
2 2018-01-03 859 589 403 323 0 2174
3 2018-01-04 835 533 409 335 0 2112
4 2018-01-05 760 449 355 272 0 1836
year_month day weekday weekday_name week_index
0 2018-01 1 0 Monday Week 1
1 2018-01 2 1 Tuesday Week 1
2 2018-01 3 2 Wednesday Week 1
3 2018-01 4 3 Thursday Week 1
4 2018-01 5 4 Friday Week 1
Here is the code:
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
historicals = pd.read_csv("2018-2019_plants.csv")
# Capture dates for additional date columns
date_col = pd.to_datetime(historicals['date'])
historicals['year_month'] = date_col.dt.strftime("%Y-%m")
historicals['day'] = date_col.dt.day
historicals['weekday'] = date_col.dt.dayofweek
historicals['weekday_name'] = date_col.dt.day_name()
# create week ranges segment (7 day range)
historicals['week_index'] = pd.cut(historicals['day'],[0,7,14,21,28,32], labels=['Week 1','Week 2','Week 3','Week 4','Week 5'])
# Week Index Average (Units)
week_index_avg_unit = historicals[df_monthly_average].groupby(['week_index']).mean().astype(int)
type(week_index_avg_unit.index)
pandas.core.indexes.category.CategoricalIndex
Here is the week_index_avg_unit table:
organic ppc oa other content_partnership total day weekday
week_index
Week 1 755 361 505 405 22 2027 4 3
Week 2 787 360 473 337 19 1959 11 3
Week 3 781 382 490 352 18 2006 18 3
...
pd.CategoricalIndex is a special animal. It is immutable, so to make this work you may need to use something like pd.CategoricalIndex.set_categories to add a new category first.
See pandas docs: https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.CategoricalIndex.html
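For example, a minimal sketch of that idea on a toy frame (the numbers below are taken from the question's table; add_categories is a closely related helper that registers the new category, after which the Method 1 assignment works):

import pandas as pd

week_index_avg_unit = pd.DataFrame(
    {'organic': [755, 787, 781], 'ppc': [361, 360, 382]},
    index=pd.CategoricalIndex(['Week 1', 'Week 2', 'Week 3'], name='week_index'),
)

# make 'Total' a valid category first, then the .loc assignment from Method 1 succeeds
week_index_avg_unit.index = week_index_avg_unit.index.add_categories(['Total'])
week_index_avg_unit.loc['Total'] = week_index_avg_unit.sum()
print(week_index_avg_unit)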