automatically run %matplotlib inline in jupyter qtconsole - matplotlib

Is there a way to change the config file to make jupyter qtconsole run the following command on startup?:
%matplotlib inline

Add this line to the ipython_config.py file (not the ipython_qtconsole_config.py file):
c.InteractiveShellApp.matplotlib = 'inline'

In your ipython_config.py file you can specify commands to run at startup (including magic % commands) by setting c.InteractiveShellApp.exec_lines. For example,
c.InteractiveShellApp.exec_lines = """
%matplotlib inline
%autoreload 2
import your_favorite_module
""".split('\n')

Open the file ~/.ipython/profile_default/ipython_config.py, and
c.InteractiveShellApp.code_to_run = ''
==>
c.InteractiveShellApp.code_to_run = '%pylab inline'

Related

Testing a Jupyter Notebook

I am trying to come up with a method to test a number of Jupyter notebooks. A test should run when a new notebook is implemented in a Github branch and submitted for a pull request. The tests are not that complicated, they are mostly just testing if the notebook runs end-to-end and without any errors, and maybe a few asserts. However:
There are certain calls in some cells that need to be mocked, e.g. a call to download the data from a database.
There may be some magic cells in the notebooks which run a pip command or something else.
I am open to use any testing library, such as 'pytest' or unittest, although pytest is preferred.
I looked at a few libraries for testing notebooks such as nbmake, treon, and testbook, but I was unable to make them work. I also tried to convert the notebook to a python file, but the magic cells were converted to a get_ipython().run_cell_magic(...) call which became an issue, since pytest uses python and not ipython, and get_ipython() is only available in ipython.
So, I am wondering what is a good way to test jupyter notebooks with all of that in mind. Any help is appreciated.
One straightforward approach I've already used is to execute the entire notebook with nbconvert.
A notebook failed.ipynb raising an exception will result in a failed run thanks to the --execute option that tells nbconvert to execute the notebook prior to its conversion.
jupyter nbconvert --to notebook --execute failed.ipynb
# ...
# Exception: FAILED
echo $?
# 1
Another correct notebook passed.ipynb will result in a successful export.
jupyter nbconvert --to notebook --execute passed.ipynb
# [NbConvertApp] Converting notebook passed.ipynb to notebook
# [NbConvertApp] Writing 1172 bytes to passed.nbconvert.ipynb
echo $?
# 0
Cherry on the cake, you can do the same through the API and so wrap it in Pytest!
import nbformat
import pytest
from nbconvert.preprocessors import ExecutePreprocessor
#pytest.mark.parametrize("notebook", ["passed.ipynb", "failed.ipynb"])
def test_notebook_exec(notebook):
with open(notebook) as f:
nb = nbformat.read(f, as_version=4)
ep = ExecutePreprocessor(timeout=600, kernel_name='python3')
try:
assert ep.preprocess(nb) is not None, f"Got empty notebook for {notebook}"
except Exception:
assert False, f"Failed executing {notebook}"
Running the test gives.
pytest test_nbconv.py
# FAILED test_nbconv.py::test_notebook_exec[failed.ipynb] - AssertionError: Failed executing failed.ipynb
# PASSED test_nbconv.py::test_notebook_exec[passed.ipynb]
Notes
There is several output formats, I've used here notebook.
This doesn’t convert a notebook to a different format per se, instead it allows the running of nbconvert preprocessors on a notebook, and/or conversion to other notebook formats.
The python code example is just a quick draft it can be largely improved.
Here is my own solution using testbook. Let's say I have a notebook called my_notebook.ipynb with the following content:
The trick is to inject a cell before my call to bigquery.Client and mock it:
from testbook import testbook
#testbook('./my_notebook.ipynb')
def test_get_details(tb):
tb.inject(
"""
import mock
mock_client = mock.MagicMock()
mock_df = pd.DataFrame()
mock_df['week'] = range(10)
mock_df['count'] = 5
p1 = mock.patch.object(bigquery, 'Client', return_value=mock_client)
mock_client.query().result().to_dataframe.return_value = mock_df
p1.start()
""",
before=2,
run=False
)
tb.execute()
dataframe = tb.get('dataframe')
assert dataframe.shape == (10, 2)
x = tb.get('x')
assert x == 7

Spark: How to debug pandas-UDF in VS Code

I'm looking for a way to debug spark pandas UDF in vscode and Pycharm Community version (place breakpoint and stop inside UDF). At the moment when breakpoint is placed inside UDF debugger doesn't stop.
In the reference below there is described Local mode and Distributed mode.
I'm trying at least to debug in Local mode. Pycharm/VS Code there should be a way to debug local enc by "Attach to Local Process". Just I can not figure out how.
At the moment I can not find any answer how to attach pyspark debugger to local process inside UDF in VS Code(my dev ide).
I found only examples below in Pycharm.
Attache to local process How can PySpark be called in debug mode?
When I try to attach to process I'm getting message below in Pycharm. In VS Code I'm getting msg that process can not be attached.
Attaching to a process with PID=33,692
/home/usr_name/anaconda3/envs/yf/bin/python3.8 /snap/pycharm-community/223/plugins/python-ce/helpers/pydev/pydevd_attach_to_process/attach_pydevd.py --port 40717 --pid 33692
WARNING: The 'kernel.yama.ptrace_scope' parameter value is not 0, attach to process may not work correctly.
Please run 'sudo sysctl kernel.yama.ptrace_scope=0' to change the value temporary
or add the 'kernel.yama.ptrace_scope = 0' line to /etc/sysctl.d/10-ptrace.conf to set it permanently.
Process finished with exit code 0
Server stopped.
pyspark_xray https://github.com/bradyjiang/pyspark_xray
With this package, it is possible to debug rdds running on worker, but I was not able to adjust package to debug UDFs
Example code, breakpoint doesn't stop inside UDF pandas_function(url_json):
import pandas as pd
import pyspark
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, IntegerType,StringType
spark = pyspark.sql.SparkSession.builder.appName("test") \
.master('local[*]') \
.getOrCreate()
sc = spark.sparkContext
# Create initial dataframe respond_sdf
d_list = [('api_1',"{'api': ['api_1', 'api_1', 'api_1'],'A': [1,2,3], 'B': [4,5,6] }"),
(' api_2', "{'api': ['api_2', 'api_2', 'api_2'],'A': [7,8,9], 'B': [10,11,12] }")]
schema = StructType([
StructField('url', StringType(), True),
StructField('content', StringType(), True)
])
jsons = sc.parallelize(rdd_list)
respond_sdf = spark.createDataFrame(jsons, schema)
# Pandas UDF
def pandas_function(url_json):
# Here I want to place breakpoint
df = pd.DataFrame(eval(url_json['content'][0]))
return df
# Pnadas UDF transformation applied to respond_sdf
respond_sdf.groupby(F.monotonically_increasing_id()).applyInPandas(pandas_function, schema=schema).show()
This example demonstrates how to use excellent pyspark_exray library to step into UDF functions passed into Dataframe.mapInPandas function
https://github.com/bradyjiang/pyspark_xray/blob/master/demo_app02/driver.py

Passing commandline argument in google colab

How to pass commandline argument when running a python code in google colab?
I have written a code which takes a file as input via sys.argv[]. How do I do this?
As far as I know, there is no special way to pass command line arguments to python code. This is a working code sample I use to when creating tfrecords.
!python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record --image_dir=images/
I don't see any difference between the regular command line python argument passing and the colab. Please add more code to your question to get better help.
I tried this in a google colab notebook
import sys
sys.argv[0] = "first_arg" # this is to assign the first command line argument
sys.argv[1] = "second_arg" # This line to assign the second arg for example
And it worked for me.
So if you want to run a python code that works like this:
!python test.py --image_folder '/content/image' --workers 2 --Prediction CTC --rgb True
You have to open test.py or your file with editor then you will find line inside the file similer like this:
parser = argparse.ArgumentParser()
parser.add_argument('--image_folder', required=True, help='path to image_folder')
parser.add_argument('--workers', type=int, default=1, help='number of workers')
parser.add_argument('--Prediction', type=str, default='CTC', help='Prediction stage.')
parser.add_argument('--rgb', action='store_true', help='use rgb input')
args = parser.parse_args()
But this will give you " Error SystemExit: 2 "
Then you have to change like this:
parser = argparse.ArgumentParser()
parser.add_argument('--image_folder', required=False, default='/content/image', help='path to image_folder')
parser.add_argument('--workers', type=int, default=2, help='number of workers')
parser.add_argument('--Prediction', type=str, default='CTC', help='Prediction stage.')
parser.add_argument('--rgb', action='store_false', help='use rgb input')
parser.add_argument("-f", "--file", required=False)
args = parser.parse_args()
You must add in the end of " parser.add_argument " line:
parser.add_argument("-f", "--file", required=False)
Then you can call commandline argument like this:
image = args.image_path
Or
img = Image.open(args.image_path)
workers = args.workers
But if your last line like this:
args = vars(ap.parse_args())
Then you have to call it like this:
image = args["image_path"]
Or
img = Image.open(args["image_path"])
workers = args["workers"]
#Note ( action='store_false' ) will default to ( False )
Likewise, ( action='store_false' ) will default to ( True )
Tested with Google colab
I made a bioinformatic tool locally in my machine to parse Uniprot big data files of proteins.
The tool I made needs the passing of different parameters using command line arguments. After the tool was working locally, I upload data files and python source files to my google drive.
I did not make any changes to my files. I just run directly the following command in google colab:
!python3 drive/MyDrive/uniprot/uniprot_select.py FIELDS "ID,OS,SQ" FROM drive/MyDrive/data/uniprot.dat WHERE "SQ#EYDRRR" FASTA
It works perfectly!
No need of special parsing, no need to additional imports. All the work you normally do locally in your machine, can be executed without changes.

Having problems declaring SUMO_HOME

I'm trying to run a test python code to use the traci library and it is returning "please declare environment SUMO_HOME".
I'm on Ubuntu 18.4.2 and Sumo 0.32.0.I solved this problem before by running
export SUMO_HOME=/home/gustavo/Downloads/sumo-0.32.0/tools/
,but this time it couldn't solve the problem. So I tried implementing a line inside the python file using the os library giving the same command but from the code itself:
os.system("export SUMO_HOME=/home/gustavo/Downloads/sumo-0.32.0/tool/")
And it also didn't work, so came here to ask for help. May any of you help me, please?
import os
import sys
import optparse
os.system("export SUMO_HOME=/home/gustavo/Downloads/sumo-0.32.0/tool/")
# we need to import some python modules from the $SUMO_HOME/tools directory
if 'SUMO_HOME' in os.environ:
tools = os.path.join(os.environ['SUMO_HOME=/home/gustavo/Downloads/sumo-0.32.0/tools/'], 'tools')
sys.path.append(tools)
else:
sys.exit("please declare environment variable 'SUMO_HOME'")
from sumolib import checkBinary # Checks for the binary in environ vars
import traci
def get_options():
opt_parser = optparse.OptionParser()
opt_parser.add_option("--nogui", action="store_true",
default=False, help="run the commandline version of sumo")
options, args = opt_parser.parse_args()
return options
# contains TraCI control loop
def run():
step = 0
while traci.simulation.getMinExpectedNumber() > 0:
traci.simulationStep()
print(step)
step += 1
traci.close()
sys.stdout.flush()
# main entry point
if __name__ == "__main__":
options = get_options()
# check binary
if options.nogui:
sumoBinary = checkBinary('sumo')
else:
sumoBinary = checkBinary('sumo-gui')
# traci starts sumo as a subprocess and then this script connects and runs
traci.start([sumoBinary, "-c", "demo.sumocfg",
"--tripinfo-output", "tripinfo.xml"])
run()
I expected for the steps to appear on the terminal.
The correct location is probably
export SUMO_HOME=/home/gustavo/Downloads/sumo-0.32.0
without the tools or tool suffix. It will not work from inside the python script with os.system but you could modify os.environ directly.
Furthermore you mixed up the call to os.environ in the script. It should read:
tools = os.path.join(os.environ['SUMO_HOME'], 'tools')
I swapped the if else part for another code :
try:
sys.path.append("/home/gustavo/Downloads/sumo-0.32.0/tools")
from sumolib import checkBinary
except ImportError:
sys.exit("please declare environment variable 'SUMO_HOME' as the root directory of your sumo installation (it should contain folders 'bin', 'tools' and 'docs')")
It solved the problem

matplotlib configuration for inline backend in jupyter notebook

I'd like to learn how to configure the defaults for matplotlib using the inline backend in jupyter notebook. Specifically, I'd like to set default 'figure.figsize’ to [7.5, 5.0] instead of the default [6.0, 4.0]. I’m using jupyter notebook 1.1 on a Mac with matplotlib 1.4.3.
In the notebook, using the macosx backend, my matplotlibrc file is shown to be in the standard location, and figsize is set as specified in matplotlibrc:
In [1]: %matplotlib
Using matplotlib backend: MacOSX
In [2]: mpl.matplotlib_fname()
Out[2]: u'/Users/scott/.matplotlib/matplotlibrc'
In [3]: matplotlib.rcParams['figure.figsize']
Out[3]:[7.5, 5.0]
However, when I use the inline backend, figsize is set differently:
In [1]: %matplotlib inline
In [2]: mpl.matplotlib_fname()
Out[2]: u'/Users/scott/.matplotlib/matplotlibrc'
In [3]: matplotlib.rcParams['figure.figsize']
Out[3]:[6.0, 4.0]
In my notebook config file, ~/.jupyter/jupyter_notebook_config.py, I also added the line
c.InlineBackend.rc = {'figure.figsize': (7.5, 5.0) }
but this had no effect either. For now I’m stuck adding this line in every notebook:
matplotlib.rcParams['figure.figsize']=[7.5, 5.0]
Is there any way to set the default for the inline backend?
The Jupyter/IPython split is confusing. Jupyter is the front end to kernels, of which IPython is the defacto Python kernel. You are trying to change something related to matplotlib and this only makes sense within the scope of the IPython kernel. Making a change to matplotlib in ~/.jupyter/jupyter_notebook_config.py would apply to all kernels which may not make sense (in the case of running a Ruby/R/Bash/etc. kernel which doesn't use matplotlib). Therefore, your c.InlineBackend.rc setting needs to go in the settings for the IPython kernel.
Edit the file ~/.ipython/profile_default/ipython_kernel_config.py and add to the bottom: c.InlineBackend.rc = { }.
Since c.InlineBackend.rc specifies matplotlib config overrides, the blank dict tells the IPython kernel not to override any of your .matplotlibrc settings.
If the file doesn't exist, run ipython profile create to create it.
Using Jupyter on windows at least, I was able to do it using something very much like venkat's answer, i.e.:
%matplotlib inline
import matplotlib
matplotlib.rcParams['figure.figsize'] = (8, 8)
I did this to square the circle, which had been rather eliptical up to that point. See, squaring the circle is not that hard. :)
Note that the path of ipython_kernel_config.py differs if you run ipython from a virtual environment. In that case, dig in the path where the environment is stored.
Use figsize(width,height) in the top cell and it changes width of following plots
For jupyter 5.x and above with IPython kernels, you can just override particular keys and leave the rest by putting things like this, with your desired figsize in your ~/.ipython/profile_default/ipython_kernel_config.py:
c = get_config()
c.InlineBackend.rc.update({"figure.figsize": (12, 10)})