NameError: name 'dbutils' is not defined in pyspark - apache-spark-sql

I am running a PySpark job in Databricks cloud. As part of this job I need to write some CSV files to the Databricks filesystem (DBFS), and I also need to use some of the native dbutils commands, like:
# mount Azure blob to DBFS location
dbutils.fs.mount(source="...", mount_point="/mnt/...", extra_configs="{key:value}")
I am also trying to unmount once the files have been written to the mount directory. But when I use dbutils directly in the PySpark job, it fails with
NameError: name 'dbutils' is not defined
Should I import a package to use dbutils in PySpark code? Thanks in advance.

Try to use this:
def get_dbutils(spark):
    try:
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ImportError:
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils
dbutils = get_dbutils(spark)

To access the DBUtils module in a way that works both locally and in Azure Databricks clusters, on Python, use the following get_dbutils():
def get_dbutils(spark):
    try:
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ImportError:
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils
See: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect
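For the mount-then-unmount part of the question, one possible pattern is to wrap the write in try/finally so the mount point is released even if the write fails. This is only a sketch: with_mount and write_fn are hypothetical names, not part of dbutils, and it assumes dbutils.fs.mount and dbutils.fs.unmount behave as they do in a Databricks notebook, with extra_configs passed as a dict.

```python
def with_mount(dbutils, source, mount_point, extra_configs, write_fn):
    """Mount `source`, run `write_fn(mount_point)`, then always unmount.

    `with_mount` and `write_fn` are hypothetical helpers for illustration;
    only dbutils.fs.mount / dbutils.fs.unmount come from Databricks.
    """
    dbutils.fs.mount(
        source=source,
        mount_point=mount_point,
        extra_configs=extra_configs,  # assumed: a dict of config key/value pairs
    )
    try:
        return write_fn(mount_point)
    finally:
        # Runs whether write_fn succeeded or raised.
        dbutils.fs.unmount(mount_point)
```

Inside a job, it could be called with the object returned by get_dbutils(spark) and a small function that writes the CSV files under the mount point.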

yes!
You could use this:
pip install DBUtils
import DBUtils

Related

"no module named numpy.core" error after installing numpy in a layer in AWS Lambda

When I deployed my serverless FastAPI with AWS Lambda and API Gateway I got the following error:
No module named numpy.core. Unable to import module 'path/to/handler': Unable to import required dependencies: numpy
And I also got this:
The python version is: Python3.7 from "var/lang/bin/python3.7" , the numpy version is: "1.21.6"
I have installed the numpy module in a layer and included the path to the directory where the module is installed in the serverless.yml file and I still experience the "no module named numpy.core" error.
My serverless.yml file looks like this:
layers:
  something:
    compatibleRuntimes:
      - python3.8
      - python3.7
      - python3.6
    path: 'path/to/layer'
I also tried without the compatibleRuntimes block, and I still got the same error.
I think that since the numpy module is installed and the correct version is being used, it is likely that the issue is not related to the numpy module itself.
And, actually, in my code, what I import is the pandas module, but when I install pandas, it also installs numpy.
import pandas as pd
# functions
Any advice or tips? I have been stuck on this for a long time. Thank you in advance!

ImportError: cannot import name 'IEX_CLOUD_API_TOKEN' from 'secrets'

I'm facing the above error when I try to execute the following code:
from secrets import IEX_CLOUD_API_TOKEN
I tried using '.secrets' instead of 'secrets' but that gives the ModuleNotFoundError.
This is not a problem with the secrets library or anything of that kind.
After uploading secrets.py to the working directory, you just need to restart the kernel in your Jupyter notebook.
The code should then run.
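Because Python also ships a standard-library module named secrets, it can help to check which file the name actually resolves to, before and after restarting the kernel. A minimal sketch using only the standard library; describe_module is a hypothetical helper name, not something from the original answer.

```python
import importlib.util
import sys


def describe_module(name):
    """Report where `name` resolves to and whether it is already cached."""
    spec = importlib.util.find_spec(name)
    return {
        "origin": spec.origin if spec else None,  # file path, or None if not found
        "already_imported": name in sys.modules,  # cached modules are reused as-is
    }


info = describe_module("secrets")
print(info)
```

If the reported origin is the standard-library file rather than your own secrets.py, the import of IEX_CLOUD_API_TOKEN will fail as in the question.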

ModuleNotFoundError: No module named 'pandas.io' for json_normalize

Please read carefully. In my Python script I have the following:
import json
import pandas
from pandas.io.json import json_normalize
and it returns the following error:
from pandas.io.json import json_normalize
ModuleNotFoundError: No module named 'pandas.io'; 'pandas' is not a package
My steps:
I have uninstalled and installed Pandas
I have upgraded pip and pandas
I have installed io (pip install -U pandas.io)
I have installed data_reader and replaced the pandas.io.json part with that: from pandas_datareader import json_normalize
I have tried every solution I saw on Stack Overflow and GitHub and nothing worked. The only one I have not tried is installing Anaconda, but it should work with what I tried before. Do you think there is a Windows setting I must change?
PS: My Python version is 3.7.4
Try:
Go to ...\Lib\site-packages\pytrends on your local disk and open file request.py
Change
from pandas.io.json._normalize import nested_to_record
to
from pandas.io.json.normalize import nested_to_record
I had the same error, and this fixed it for me.
Depending on your installed pandas version, you may instead need the opposite change, from
from pandas.io.json.normalize
to
from pandas.io.json._normalize
The cause of the problem was the fact that the python file had the name pandas. The filename was pandas.py. After renaming it, the code worked normally without errors.
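The message "'pandas' is not a package" is a typical symptom of a local file shadowing the installed library. A rough way to spot this, assuming the script is run from its own directory, is to list local .py files whose names match the modules you import. The helper name find_shadowing_files is hypothetical.

```python
from pathlib import Path


def find_shadowing_files(directory, module_names):
    """Return local .py files in `directory` named like one of `module_names`."""
    wanted = set(module_names)
    return sorted(
        p.name for p in Path(directory).glob("*.py") if p.stem in wanted
    )


# Check the current directory for files that would shadow these imports.
print(find_shadowing_files(".", ["pandas", "json", "numpy"]))
```

Any file name printed here is a candidate for renaming, as was done with pandas.py in the answer above.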
I had the same problem and solved it by uninstalling the extra Python versions installed on my Windows machine. Now I have only one Python, installed by Anaconda, and everything works perfectly.

No module named numpy when spark-submitting

I’m spark-submitting a python file that imports numpy but I’m getting a no module named numpy error.
$ spark-submit --py-files projects/other_requirements.egg projects/jobs/my_numpy_als.py
Traceback (most recent call last):
  File "/usr/local/www/my_numpy_als.py", line 13, in <module>
    from pyspark.mllib.recommendation import ALS
  File "/usr/lib/spark/python/pyspark/mllib/__init__.py", line 24, in <module>
    import numpy
ImportError: No module named numpy
I was thinking I would pull in an egg for numpy via --py-files, but I'm having trouble figuring out how to build that egg. But then it occurred to me that pyspark itself uses numpy. It would be silly to pull in my own version of numpy.
Any idea on the appropriate thing to do here?
It looks like Spark is using a version of Python that does not have numpy installed. It could be because you are working inside a virtual environment.
Try this:
import os
import sys
# The following is for specifying a Python version for PySpark. Here we
# use the currently calling Python version.
# This is handy for when we are using a virtualenv, for example, because
# otherwise Spark would choose the default system Python version.
os.environ['PYSPARK_PYTHON'] = sys.executable
I got this to work by installing numpy on all the emr-nodes by configuring a small bootstrapping script that contains the following (among other things).
#!/bin/bash -xe
sudo yum install python-numpy python-scipy -y
Then configure the bootstrap script to be executed when you start your cluster by adding the following option to the aws emr command (the following example gives an argument to the bootstrap script)
--bootstrap-actions Path=s3://some-bucket/keylocation/bootstrap.sh,Name=setup_dependencies,Args=[s3://some-bucket]
This can be used when setting up a cluster automatically from DataPipeline as well.
Sometimes, when you import certain libraries, your namespace is polluted with numpy functions. Functions such as min, max and sum are especially prone to this pollution. When in doubt, locate calls to these functions and replace them with __builtin__.sum etc. (builtins.sum in Python 3). Doing so will sometimes be faster than locating the pollution source.
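To illustrate the namespace-pollution point: a star import or a local definition can rebind names such as sum, while the original stays reachable through the builtins module (named __builtin__ in Python 2). In this sketch the shadowing function is a made-up stand-in for whatever a star import might introduce.

```python
import builtins


def sum(values, axis=None):
    """Hypothetical stand-in for a library function that shadows the builtin."""
    raise TypeError("shadowed sum called")


# The module-level name `sum` now refers to the function above...
shadowed = sum is not builtins.sum

# ...but the original is still available explicitly.
total = builtins.sum([1, 2, 3])
print(shadowed, total)
```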
Make sure your spark-env.sh has PYSPARK_PYTHON pointing to the correct Python executable. Add export PYSPARK_PYTHON=/your_python_exe_path to the conf/spark-env.sh file.

pyttsx: No module named 'engine'

I'm trying to install TTS package by using this. Everything was okay until I tried to execute the following command:
import pyttsx
I got back this error:
  File "/usr/local/lib/python3.4/dist-packages/pyttsx/__init__.py", line 18, in <module>
    from engine import Engine
ImportError: No module named 'engine'
Any help would be appreciated. Thank you!
Guys there is an updated package compatible with Python3 :
pyttsx3
Works offline with no delay in the sound produced.
Installation:
pip install pyttsx3
Visit https://pyttsx3.readthedocs.io for the full usage docs.
Thanks!
Combining the advice from Jacob Tsui and Jokhongir Mamarasulov worked for me. To summarize:
In site-packages/pyttsx/__init__.py, modify "from engine import Engine" to
from .engine import Engine
Then, in site-packages/pyttsx/engine.py,
modify import driver to
from . import driver
and modify except Exception, e to
except Exception as e
And finally, in site-packages/pyttsx/driver.py, modify except Exception, e to
except Exception as e
See the responses from the aforementioned authors for the rationale behind these changes.
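The except change is needed because Python 3 no longer accepts the "except Exception, e" form. A small standalone illustration of the Python 3 syntax (this is not pyttsx code, and safe_divide is a made-up example function):

```python
def safe_divide(a, b):
    """Return a / b, or an error message if the division fails."""
    try:
        return a / b
    except Exception as e:  # Python 2's `except Exception, e` is a SyntaxError in Python 3
        return "error: {}".format(e)


print(safe_divide(1, 0))
```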
I just had the same problem, try using pyttsx3 instead of pyttsx
First install pyttsx3
pip install pyttsx3
and change the
import pyttsx
for
import pyttsx3
After that, note that you do not need a separate engine import in your main .py file: pyttsx3.init() returns the engine object.
The python-engineio package is an unrelated library, so there is no need to install it.
In the example below the engine object is simply stored in a variable called engineio.
Here's an example
import pyttsx3
# import engineio  # the engineio module is not needed
engineio = pyttsx3.init()
voices = engineio.getProperty('voices')
engineio.setProperty('rate', 130)  # here you can set the speech rate
engineio.setProperty('voice', voices[0].id)
def speak(text):
    engineio.say(text)
    engineio.runAndWait()
speak("What do you want me to say?")
while True:
    phrase = input("--> ")
    if phrase == "exit":
        break
    speak(phrase)
print(voices)
Hope this helps someone
For Python3, please install the latest version via pip3 install pyttsx3 and call import pyttsx3.
I found the solution. The library was written for Python 2, and although there are not many differences between the two versions, this is one case where they matter.
Go to the pyttsx folder inside dist-packages and, in engine.py, change "except Exception, e" to "except Exception as e" (line 67). Do the same in driver.py, line 105.
Because the files are write-protected, open them with elevated permissions, e.g.
sudo nano engine.py (or driver.py)
I hope this helps everyone with this kind of problem. :)
In site-packages/pyttsx/__init__.py, change "from engine import Engine" to
from .engine import Engine
In site-packages/pyttsx/engine.py, change "import driver" to
from . import driver
Reason: in Python 3, "from engine import Engine" is an absolute import, so Python searches sys.path for a top-level module named engine and does not look inside the pyttsx package. The engine.py file lives inside the package, so we need to tell Python to import it from the current package (".").
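The effect of the leading dot can be reproduced with a throwaway package. The sketch below builds a temporary package with the same layout (an __init__.py next to an engine.py) and imports it; the package name demo_pkg is hypothetical and has nothing to do with pyttsx itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: demo_pkg/__init__.py plus demo_pkg/engine.py
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"  # hypothetical package name
pkg.mkdir()
(pkg / "engine.py").write_text("class Engine:\n    pass\n")
# Explicit relative import: look for engine.py inside this package.
(pkg / "__init__.py").write_text("from .engine import Engine\n")

# Make the temporary directory importable and load the package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.Engine.__module__)  # the class lives in demo_pkg.engine
```

Writing "from engine import Engine" in the generated __init__.py instead would make Python 3 look for a top-level engine module on sys.path, which is the failure shown in the question.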
I used this code after
pip install pywin32 pypiwin32 pyttsx3
and it worked perfectly for me
import os
import sys
import pyttsx3
engine = pyttsx3.init()
engine.say('hello world ')
engine.runAndWait()
I had the same issue.
First Try this command:
pip install pyttsx3
and then don't use
import pyttsx
use this
import pyttsx3
It will work.
pyttsx: No module named 'engine'
  File "/usr/local/lib/python3.4/dist-packages/pyttsx/__init__.py", line 18, in <module>
    from engine import Engine
ImportError: No module named 'engine'
If that is your error, try installing pyttsx3 instead of pyttsx.
Before installing, check your Python version, then download the release that is compatible with it.
Refer to this link to get the previous versions of pyttsx3
REASON:
The error occurs because the installed package version is not supported by your Python version.
If you still get the error with pyttsx:
Modify the __init__.py file located at
C:\Users\YOUR USER\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\pyttsx\__init__.py
Change
from engine import Engine
to
from .engine import Engine
Then modify the engine.py file located at C:\Users\YOUR USER\AppData\Local\Programs\Python\Python38-32\Lib\site-packages\pyttsx\engine.py
Change
import driver
to
from . import driver
These are the two main solutions for the above error