numpy typing works locally but not in CircleCI

I have numpy 1.22.4 installed locally, and I changed the return type annotation of the affected methods from
def _predict(self, X: pd.DataFrame) -> np.array:
to
def _predict(self, X: pd.DataFrame) -> np.typing.ndarray:
When I run mypy feature_engine locally, the checks pass. But when I commit the code to the repo and the tests run in CircleCI, they do not pass, because of the following error:
AttributeError: module 'numpy' has no attribute 'typing'
When I check the CircleCI environment, it seems to be running numpy 1.22.4 as well.
Any idea why this could be happening?
For reference, this is the PR in question. These are the versions I have locally:
Python version: 3.10.3
Numpy version: 1.22.4
Pandas version: 1.4.2
Scikit-learn version: 1.1.1
Scipy version: 1.8.1
Statsmodels version: 0.13.2
Mypy version: 0.961
And the error thrown locally when I do not update from np.array to np.typing.ndarray can be seen here:

I found that I have to do something like:
import numpy.typing as npt
in addition to my standard numpy import. I can then use the npt alias for all my type hints. Before I made that change, I would get the AttributeError above whenever I tried to run my code.
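For example, the annotation from the question could then be written as follows (a minimal sketch; NDArray is the alias that numpy.typing provides in recent numpy versions, including 1.22):

import numpy as np
import numpy.typing as npt  # import the submodule explicitly; a bare `import numpy` may not expose np.typing
import pandas as pd

def _predict(self, X: pd.DataFrame) -> npt.NDArray:
    ...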

Related

Issue with 'pandas on spark' used with conda: "No module named 'pyspark.pandas'" even though both pyspark and pandas are installed

I have installed both Spark 3.1.3 and Anaconda 4.12.0 on Ubuntu 20.04.
I have set PYSPARK_PYTHON to be the python bin of a conda environment called my_env
export PYSPARK_PYTHON=~/anaconda3/envs/my_env/bin/python
I installed several packages on conda environment my_env using pip. Here is a portion of the output of pip freeze command:
numpy==1.22.3
pandas==1.4.1
py4j==0.10.9.3
pyarrow==7.0.0
N.B.: the package pyspark is not installed in the conda environment my_env. I would like to be able to launch a pyspark shell in different conda environments without having to reinstall pyspark in every environment (I would like to only modify PYSPARK_PYTHON). This would also avoid having different versions of Spark in different conda environments (which is sometimes desirable, but not always).
When I launch a pyspark shell using the pyspark command, I can indeed import pandas and numpy, which confirms that PYSPARK_PYTHON is properly set (my_env is the only conda env with pandas and numpy installed; moreover, pandas and numpy are not installed on any other Python installation, even outside conda; and finally, if I change PYSPARK_PYTHON I am no longer able to import pandas or numpy).
Inside the pyspark shell, the following code works fine (creating and showing a toy Spark dataframe):
sc.parallelize([(1,2),(2,4),(3,5)]).toDF(["a", "b"]).show()
However, if I try to convert the above dataframe into a pandas on spark dataframe it does not work. The command
sc.parallelize([(1,2),(2,4),(3,5)]).toDF(["t", "a"]).to_pandas_on_spark()
returns:
AttributeError: 'DataFrame' object has no attribute 'to_pandas_on_spark'
I tried to first import pandas (which works fine) and then pyspark.pandas before running the above command, but when I run
import pyspark.pandas as ps
I obtain the following error:
ModuleNotFoundError: No module named 'pyspark.pandas'
Any idea why this happens?
Thanks in advance
From here, it seems that you need Apache Spark >= 3.2, not 3.1.3; the pandas-on-Spark API was added in 3.2. Update and you will have the desired API.
pip install pyspark  # needs Spark >= 3.2 for pyspark.pandas
import pyspark.pandas as ps
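For illustration, once you are on Spark >= 3.2 the conversion from the question should work (a sketch; it assumes an active SparkSession named spark, as provided by the pyspark shell):

import pyspark.pandas as ps  # importable from Spark 3.2 onwards

sdf = spark.createDataFrame([(1, 2), (2, 4), (3, 5)], ["a", "b"])
psdf = sdf.to_pandas_on_spark()  # the method that was missing on Spark 3.1.3
print(type(psdf))  # <class 'pyspark.pandas.frame.DataFrame'>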

Julia can't find module even though it knows it is installed

I am trying to run a simulator called FLOWUnsteady. At one point, Julia complains:
ERROR: LoadError: InitError: PyError ($(Expr(:escape, :(ccall(#= C:\Users\dsfjk\.julia\packages\PyCall\L0fLP\src\pyfncall.jl:43 =# #pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'ModuleNotFoundError'>
ModuleNotFoundError("No module named 'scipy'")
but at the same time:
julia> Pkg.add("SciPy")
Resolving package versions...
No Changes to C:\Users\dsfjk\.julia\environments\v1.7\Project.toml
No Changes to C:\Users\dsfjk\.julia\environments\v1.7\Manifest.toml
How does it not see the package it itself installed?
Looking at the error, Julia tries to load scipy via PyCall and does not see it.
The easiest way to replicate it would be:
using PyCall
pyimport("scipy")
Assuming you will see the same error, the problem is that the module SciPy.jl does not install Python scipy until it is used for the first time. This can be solved easily by loading the module:
julia> using SciPy
[ Info: Installing scipy via the Conda scipy package...
[ Info: Running `conda install -y scipy` in root environment
...
Another option is to add Python scipy manually to your Julia installation:
using Conda
Conda.add("scipy")

Failed to install Numpy 1.20.2 with Poetry on Python 3.9

When I try to install the Numpy 1.20.2 module with the Poetry 1.1.4 package manager (poetry add numpy) in a Python 3.9.0 virtual environment, I get:
ERROR: Failed building wheel for numpy
Failed to build numpy
ERROR: Could not build wheels for numpy which use PEP 517 and cannot be installed directly
I read a few threads like this one, but since then it seems the latest Numpy versions are supposed to build on Python 3.9 (see this official Numpy release doc, and this answer).
Did I miss something?
EDIT: using pip 21.0.1 (latest)
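One thing worth checking (an assumption on my part, not something confirmed by the question): whether pip inside the Poetry-managed virtual environment is recent enough to pick up the prebuilt cp39 wheel instead of trying to build numpy from source:

# hypothetical workaround: refresh the build tooling inside the Poetry venv
poetry run pip install --upgrade pip setuptools wheel
poetry add numpy@1.20.2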

ModuleNotFoundError for pandas_datareader: Jupyter Notebook using different packages from conda environment

I am using Anaconda v5.3 on Windows.
I am getting the error:
ModuleNotFoundError: No module named 'pandas_datareader'
When I tried to print out the packages used by Jupyter Notebook, I realized that pandas_datareader is not among them, and that a different version of pandas (0.23.0) is used:
import pkg_resources

for i in pkg_resources.working_set:
    print(i)
Output
...
pandocfilters 1.4.2
pandas 0.23.0
packaging 17.1
openpyxl 2.5.3
...
This differs from the library installed in the pyfinance environment:
>conda list
# Name Version Build
pandas 0.20.3 py36_0
pandas-datareader 0.4.0 py36_0
Hence, pandas_datareader seems to work in the Python shell in the command prompt, but not in Jupyter Notebook. Is there any way to sync the Jupyter Notebook environment with the conda environment?
I realized that to sync Jupyter Notebook with the conda environment, you just have to install Jupyter inside that environment:
conda install jupyter
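To confirm that the notebook now sees the right environment, a quick sanity check in a notebook cell (uses only the standard library plus the package in question):

import sys
print(sys.executable)  # should point inside the pyfinance environment

import pandas_datareader
print(pandas_datareader.__version__)  # should match `conda list` in pyfinance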

mxnet installation: How to choose python version?

I installed mxnet on Linux Mint. I use Anaconda with Python 3.5. I followed the instructions and it was successfully installed. Both mxnet and Anaconda are the latest versions. However, when I tried the code:
import mxnet as mx
res = mx.nd.array([1,2,3])
I got the error:
AttributeError: module 'mxnet' has no attribute 'nd'
If I typed mx, I got: <module 'mxnet' (namespace)>
After repeating the installation and checking the scripts, I saw that mxnet was installed under Python 2.7, and graphviz is also under Python 2.7. How can I change them to Python 3.5?
Python 3 support for MXNet is still a work in progress; some functions are not fully tested yet.
At this time I suggest using Python 2.7.
It should work in Python 3 environments.
I've installed MXNet in one easy step with pip3 in a Python environment.
Everything works well.
Some MXNet Python APIs advertised in the documentation are missing: they are absent from the distribution and appear absent from the current head of the repository as well.
So I would not currently depend on the tutorial or example documentation; it seems to be either outdated or ahead of the repository. It cannot always guide you properly, although reading the actual API documentation might help rescue you from particular situations.
Anaconda Python 3.5 works fine for MXNet. See evidence below.
$ which python
/Users/username/anaconda3/bin/python
$ python --version
Python 3.5.2 :: Anaconda 4.2.0 (x86_64)
$ python
Python 3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul 2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> res = mx.nd.array([1,2,3])
>>> print(res)
<NDArray 3 @cpu(0)>
>>> print(res.asnumpy())
[ 1. 2. 3.]
>>> mx
<module 'mxnet' from '/Users/username/anaconda3/lib/python3.5/site-packages/mxnet-0.9.5-py3.5.egg/mxnet/__init__.py'>
The Python API documentation has been updated in newer releases. See: https://github.com/dmlc/mxnet/releases
When you use Anaconda3 with Python 3 and MXNet, the installation process might get a bit cumbersome.
In my case, after following the installation steps and executing python setup.py install, I had to manually copy the python/mxnet files into ~/Anaconda3/Lib/site-packages/mxnet*../
Before I copied the files, I saw the same error: module 'mxnet' has no attribute 'nd'.
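A quick way to diagnose this situation from Python itself (a sketch using only the standard library): a namespace module like the <module 'mxnet' (namespace)> seen above has no __init__.py, which usually means the interpreter only sees an empty directory rather than a real install.

import importlib.util

spec = importlib.util.find_spec("mxnet")
# a healthy install resolves to .../site-packages/mxnet/__init__.py;
# a namespace package reports no such file, indicating a broken install
print(spec.origin if spec else "mxnet not found")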