Can't build Spark py-files with pandas included

I am trying to package the dependencies for a Spark program I am writing. My requirements.txt file contains:
pandas
I then run
pip3 install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .
pyspark --py-files dependencies.zip
and then run this line:
import pandas
which gives the error:
Traceback (most recent call last):
File "/mnt/tmp/spark-REDACTED/userFiles-REDACTED/dependencies.zip/pandas/__init__.py", line 31, in <module>
File "/mnt/tmp/spark-REDACTED/userFiles-REDACTED/dependencies.zip/pandas/_libs/__init__.py", line 3, in <module>
File "/mnt/tmp/spark-REDACTED/userFiles-REDACTED/dependencies.zip/pandas/_libs/tslibs/__init__.py", line 3, in <module>
ModuleNotFoundError: No module named 'pandas._libs.tslibs.conversion'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/tmp/spark-REDACTED/userFiles-REDACTED/dependencies.zip/pandas/__init__.py", line 36, in <module>
ImportError: C extension: No module named 'pandas._libs.tslibs.conversion' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
Any ideas on how to fix this?

There are two ways to ship dependencies to the workers. One is exactly what you did: zip the packages (or pass a single .py file) and use --py-files. The problem you hit is that packages like NumPy and pandas contain compiled C extensions, and Python cannot import compiled extension modules from inside a zip archive, so pandas._libs.tslibs.conversion is never found on the worker side.
To solve this, create a virtualenv, zip the whole virtualenv including the Python executable, and point Spark at it:
export PYSPARK_DRIVER_PYTHON=<path to current working python>
export PYSPARK_PYTHON=./venv/<path to python executable>
pyspark --archives <path to zip file>#venv
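To see why the zip approach fails for pandas, it can help to list the compiled extension modules in the directory that pip populated. This is a minimal sketch and not part of the original answer: the helper name find_compiled_extensions is hypothetical, and the script assumes the dependencies directory from the question. Any file it reports is a binary module that Python cannot import from inside a zip archive.

```python
import importlib.machinery
import os


def find_compiled_extensions(root):
    """Return relative paths of compiled extension modules found under root."""
    # Suffixes the running interpreter treats as extension modules, plus
    # generic ones so binaries built for other interpreters are also caught.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES) + (".so", ".pyd")
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root))
    return sorted(found)


# "dependencies" is the directory created by `pip3 install -t dependencies ...`
# in the question; nothing is printed if it does not exist.
for path in find_compiled_extensions("dependencies"):
    print(path)
```

If the list is empty, the package is pure Python and --py-files should be enough; otherwise the virtualenv archive route is the safer choice.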


Conda with Python3.9 using numpy in Python3.10

I'm trying to install statsmodels in a Conda environment in Oracle Machine Learning.
My conda version is:
%conda
info
active environment : None
shell level : 0
user config file : /u01/.condarc
populated config files : /usr/share/conda/condarc.d/defaults.yaml
/u01/.condarc
conda version : 4.6.14
conda-build version : not installed
python version : 3.6.8.final.0
base environment : /usr (read only)
channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/free/linux-64
https://repo.anaconda.com/pkgs/free/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /u01/.conda/pkgs
/var/cache/conda/pkgs
envs directories : /u01/.conda/envs
/usr/envs
platform : linux-64
user-agent : conda/4.6.14 requests/2.27.1 CPython/3.6.8 Linux/5.4.17-2136.314.6.3.el7uek.x86_64 oracle/7.9 glibc/2.17
UID:GID : 65000:65000
netrc file : None
offline mode : False
I created the conda environment with the following command:
%conda
create -n arima_enviroment python=3.9 xz sqlite libuuid statsmodels numpy
I activated the environment with:
%conda
activate arima_enviroment
I tested the environment with:
%python
import sys
import platform
print("sys.version:", sys.version)
print("sys.version_info:", sys.version_info)
print("platform.python_version:", platform.python_version())
sys.version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0]
sys.version_info: sys.version_info(major=3, minor=9, micro=12, releaselevel='final', serial=0)
platform.python_version: 3.9.12
Then I ran the following command to import the ARIMA model:
%python
from statsmodels.tsa.arima_model import arima
But it gives me the following error:
Fail to execute line 2: from statsmodels.tsa.arima_model import arima
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/numpy/core/__init__.py", line 23, in <module>
from . import multiarray
File "/usr/local/lib/python3.10/site-packages/numpy/core/multiarray.py", line 10, in <module>
from . import overrides
File "/usr/local/lib/python3.10/site-packages/numpy/core/overrides.py", line 6, in <module>
from numpy.core._multiarray_umath import (
ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/1675189382222-0/zeppelin_python.py", line 206, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 2, in <module>
File "/u01/.conda/active_env/lib/python3.9/site-packages/statsmodels/tsa/__init__.py", line 1, in <module>
from statsmodels.tools._testing import PytestTester
File "/u01/.conda/active_env/lib/python3.9/site-packages/statsmodels/tools/__init__.py", line 1, in <module>
from .tools import add_constant, categorical
File "/u01/.conda/active_env/lib/python3.9/site-packages/statsmodels/tools/tools.py", line 4, in <module>
import numpy as np
File "/usr/local/lib/python3.10/site-packages/numpy/__init__.py", line 144, in <module>
from . import core
File "/usr/local/lib/python3.10/site-packages/numpy/core/__init__.py", line 49, in <module>
raise ImportError(msg)
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.9 from "/u01/.conda/active_env/bin/python3"
* The NumPy version is: "1.22.1"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: No module named 'numpy.core._multiarray_umath'
Why is conda using the NumPy in the Python 3.10 folder and not the NumPy installed for Python 3.9? How can I fix it?
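The traceback shows statsmodels loading from the environment's python3.9 site-packages while numpy resolves from /usr/local/lib/python3.10/site-packages. That suggests (an assumption, not confirmed in the post) that a Python 3.10 directory is on the module search path, for example through PYTHONPATH. A minimal diagnostic sketch, with the hypothetical helper name foreign_version_entries, lists search-path entries that name a different Python version than the one running:

```python
import re
import sys


def foreign_version_entries(paths, version=None):
    """Return path entries that name a different pythonX.Y than `version`."""
    if version is None:
        version = sys.version_info[:2]
    own = "{}.{}".format(version[0], version[1])
    foreign = []
    for entry in paths:
        # Look for a "pythonX.Y" component such as ".../lib/python3.10/...".
        match = re.search(r"python(\d+\.\d+)", str(entry))
        if match and match.group(1) != own:
            foreign.append(entry)
    return foreign


# Report search-path entries that belong to another Python version.
for entry in foreign_version_entries(sys.path):
    print("unexpected:", entry)
```

Running this in a %python paragraph of the notebook would show whether the 3.10 site-packages directory comes before the environment's own.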

E: Package 'python3-distutils' has no installation candidate

I am currently using the Google Coral Dev Board and entering commands through PuTTY.
While running the code, I got the error below.
Traceback (most recent call last):
File "tools/generate_detections.py", line 7, in <module>
import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
So I ran sudo pip install tensorflow,
but I got another error, shown below.
Traceback (most recent call last):
File "/home/mendel/.local/bin/pip", line 6, in <module>
from pip._internal.cli.main import main
File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/main.py", line 9, in <module>
from pip._internal.cli.autocompletion import autocomplete
File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/autocompletion.py", line 10, in <module>
from pip._internal.cli.main_parser import create_main_parser
File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/main_parser.py", line 8, in <module>
from pip._internal.cli import cmdoptions
File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/cmdoptions.py", line 23, in <module>
from pip._internal.cli.parser import ConfigOptionParser
File "/usr/local/lib/python3.7/dist-packages/pip/_internal/cli/parser.py", line 12, in <module>
from pip._internal.configuration import Configuration, ConfigurationError
File "/usr/local/lib/python3.7/dist-packages/pip/_internal/configuration.py", line 27, in <module>
from pip._internal.utils.misc import ensure_dir, enum
File "/usr/local/lib/python3.7/dist-packages/pip/_internal/utils/misc.py", line 42, in <module>
from pip._internal.locations import get_major_minor_version, site_packages, user_site
File "/usr/local/lib/python3.7/dist-packages/pip/_internal/locations/__init__.py", line 14, in <module>
from . import _distutils, _sysconfig
File "/usr/local/lib/python3.7/dist-packages/pip/_internal/locations/_distutils.py", line 9, in <module>
from distutils.cmd import Command as DistutilsCommand
ModuleNotFoundError: No module named 'distutils.cmd'
So I ran sudo apt install python3-distutils,
but I got yet another error, shown below.
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package python3-distutils is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
libpython3.7-stdlib
E: Package 'python3-distutils' has no installation candidate
I can no longer find a solution.
Is there anyone who can help me?
sudo apt install python3-distutils
Just run this in your Ubuntu terminal; after that you can install other packages!
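The pip traceback fails on distutils.cmd specifically, so it can be useful to check which parts of distutils the interpreter can actually locate before and after installing anything. This is a minimal sketch; the helper name module_available is hypothetical and it only uses the standard library:

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate the dotted module name."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "distutils" itself) is missing.
        return False


# A base package can be present while one of its submodules is not.
for name in ("distutils", "distutils.cmd"):
    print(name, "available:", module_available(name))
```

Run it with the same interpreter that pip uses (python3.7 in the traceback above).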

ImportError: liblapack.so.3gf: cannot open shared object file: No such file or directory

I have both Python 2.7.16 and Python 3.5.2 installed on Ubuntu 16.04 LTS. NumPy used to work well on both of them, but recently something went wrong with NumPy on Python 2, while it still runs smoothly on Python 3.
I have tried uninstalling and reinstalling the numpy package, but this did not fix anything.
The exact output I am getting is as follows:
python -c "import numpy as np"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ahmed/.local/lib/python2.7/site-packages/numpy/__init__.py", line 153, in <module>
from . import add_newdocs
File "/home/ahmed/.local/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
File "/home/ahmed/.local/lib/python2.7/site-packages/numpy/lib/__init__.py", line 18, in <module>
from .polynomial import *
File "/home/ahmed/.local/lib/python2.7/site-packages/numpy/lib/polynomial.py", line 19, in <module>
from numpy.linalg import eigvals, lstsq, inv
File "/home/ahmed/.local/lib/python2.7/site-packages/numpy/linalg/__init__.py", line 50, in <module>
from .linalg import *
File "/home/ahmed/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 29, in <module>
from numpy.linalg import lapack_lite, _umath_linalg
ImportError: liblapack.so.3gf: cannot open shared object file: No such file or directory
Indeed, the question "Linux error while loading shared libraries: cannot open shared object file: No such file or directory" resolved the problem. I only had to do the following:
sudo find / -iname liblapack.so.3gf
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<directory where the library was found>
sudo ldconfig -v
Thanks to Abdur Rehman for the hint
Cheers.
Does your Ubuntu have the latest lapack library? Try running sudo apt install liblapack3gf liblapack-dev. Sometimes you need to run sudo ldconfig after the install.
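Before and after applying either fix, one way to confirm whether the dynamic loader can open the library is to try loading it directly with ctypes. This is a minimal sketch under that assumption; the helper name can_load_shared_library is hypothetical:

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Same failure class as "cannot open shared object file".
        return False
    return True


print("liblapack.so.3gf loadable:", can_load_shared_library("liblapack.so.3gf"))
```

The check uses the same loader search rules as the failing import, so it should be run from a shell where LD_LIBRARY_PATH is already set the way you intend.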

aiy.led and aiy.board modules do not exist

I am trying to import a module to control the LED in the button of the voice AIY. I have version 2 of the kit. Using both the v2 instructions and the v1 instructions, I get an ImportError when trying to import the packages. The v2 instructions say to run from aiy.leds import Leds, Color, and the v1 instructions say to use from aiy.board import Board, Led
Am I incorrectly trying to import the module, or missing it entirely? If it is missing, can I download the necessary module, or do I have to re-flash the image entirely?
pi@raspberrypi:/opt/aiy/projects-python/src $ python3 main.py
Importing packages...
Importing LED...
Traceback (most recent call last):
File "main.py", line 454, in <module>
from aiy.board import Board, Led
ImportError: No module named 'aiy.board'
pi@raspberrypi:/opt/aiy/projects-python/src $ sudo nano main.py
pi@raspberrypi:/opt/aiy/projects-python/src $ python3 main.py
Importing packages...
Importing LED...
Traceback (most recent call last):
File "main.py", line 458, in <module>
from aiy.leds import Leds, Color
ImportError: No module named 'aiy.leds'
Since you are in the "src" directory, check whether the "aiy" directory is present there.
If not, that's the problem. If it is, check that board.py, leds.py and __init__.py are present within "aiy".
If they are not, you need to install them with apt-get install commands. See
https://github.com/google/aiyprojects-raspbian/blob/v20181116/HACKING.md
for additional details. The Google repo must be present for apt-get to find these.
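The directory checks described above can be sketched as a short script. It is only an illustration: the helper name missing_package_files is hypothetical, and the path is the src directory shown in the question's shell prompt.

```python
import os


def missing_package_files(src_dir, package="aiy",
                          expected=("__init__.py", "board.py", "leds.py")):
    """Return the expected files that are absent from src_dir/package."""
    package_dir = os.path.join(src_dir, package)
    if not os.path.isdir(package_dir):
        # The whole package directory is missing.
        return list(expected)
    return [name for name in expected
            if not os.path.isfile(os.path.join(package_dir, name))]


# An empty list means every expected file was found.
print(missing_package_files("/opt/aiy/projects-python/src"))
```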

Trouble importing Pandas

I am using the Anaconda distribution on Win7. When I run Python through PowerShell, I can import pandas and numpy without issue. However, when I run it through Sublime Text (as I do on all my other machines, using a build system that targets the Python executable), I get ImportError: No module named builtins
Here is the full output:
No module named builtins
Traceback (most recent call last):
File "C:\Users\jarjarbinks\Sublime Text Build 3065\test.py", line 3, in <module>
import pandas as pd
File "C:\Users\jarjarbinks\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\__init__.py", line 6, in <module>
from . import hashtable, tslib, lib
File "tslib.pyx", line 40, in init pandas.tslib (pandas\tslib.c:63148)
File "C:\Users\jarjarbinks\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\compat\__init__.py", line 51, in <module>
import builtins
ImportError: No module named builtins
[Finished in 0.3s with exit code 1]
[cmd: ['C:/Users/jarjarbinks/AppData/Local/Continuum/Anaconda/python.exe', 'C:\\Users\\jarjarbinks\\Sublime Text Build 3065\\test.py']]
[dir: C:\Users\jarjarbinks\Sublime Text Build 3065]
[path: C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\QuickTime\QTSystem\;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Hyland\Web ActiveX\;C:\Users\jarjarbinks\AppData\Local\Continuum\Anaconda;C:\Users\jarjarbinks\AppData\Local\Continuum\Anaconda\Scripts]
No clue on this one, any help would be greatly appreciated.