SQL cells with R kernel in Jupyter

In a Jupyter notebook you can use an R kernel through IRkernel. It works fine, but with a Python kernel you can mix in other languages like SQL using ipython-sql. The same is possible for R in Rmarkdown, where you can have SQL chunks and hand the result off to an R variable for further analysis.
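For context, this is roughly what the ipython-sql workflow looks like with a Python kernel (a minimal sketch; the SQLite connection string and the measurements table are made up for illustration):

%load_ext sql
%sql sqlite:///example.db

# the query result lands in an ordinary Python variable
result = %sql SELECT name, value FROM measurements
df = result.DataFrame()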
Question:
Is it possible to have SQL cells in Jupyter with an R kernel, like it is for a Python kernel in Jupyter and for R in Rmarkdown?

Related

Testing a Jupyter Notebook

I am trying to come up with a method to test a number of Jupyter notebooks. A test should run when a new notebook is implemented in a GitHub branch and submitted for a pull request. The tests are not that complicated; they mostly just check that the notebook runs end-to-end without any errors, plus maybe a few asserts. However:
There are certain calls in some cells that need to be mocked, e.g. a call to download the data from a database.
There may be some magic cells in the notebooks which run a pip command or something else.
I am open to using any testing library, such as pytest or unittest, although pytest is preferred.
I looked at a few libraries for testing notebooks, such as nbmake, treon, and testbook, but I was unable to make them work. I also tried converting the notebook to a Python file, but the magic cells were converted to get_ipython().run_cell_magic(...) calls, which became an issue since pytest uses Python rather than IPython, and get_ipython() is only available in IPython.
So I am wondering what a good way is to test Jupyter notebooks with all of that in mind. Any help is appreciated.
One straightforward approach I've already used is to execute the entire notebook with nbconvert.
A notebook failed.ipynb that raises an exception will result in a failed run, thanks to the --execute option, which tells nbconvert to execute the notebook prior to its conversion.
jupyter nbconvert --to notebook --execute failed.ipynb
# ...
# Exception: FAILED
echo $?
# 1
Another correct notebook passed.ipynb will result in a successful export.
jupyter nbconvert --to notebook --execute passed.ipynb
# [NbConvertApp] Converting notebook passed.ipynb to notebook
# [NbConvertApp] Writing 1172 bytes to passed.nbconvert.ipynb
echo $?
# 0
Cherry on the cake: you can do the same through the API and so wrap it in pytest!
import nbformat
import pytest
from nbconvert.preprocessors import ExecutePreprocessor

@pytest.mark.parametrize("notebook", ["passed.ipynb", "failed.ipynb"])
def test_notebook_exec(notebook):
    with open(notebook) as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=600, kernel_name='python3')
    try:
        assert ep.preprocess(nb) is not None, f"Got empty notebook for {notebook}"
    except Exception:
        assert False, f"Failed executing {notebook}"
Running the test gives:
pytest test_nbconv.py
# FAILED test_nbconv.py::test_notebook_exec[failed.ipynb] - AssertionError: Failed executing failed.ipynb
# PASSED test_nbconv.py::test_notebook_exec[passed.ipynb]
Notes
There are several output formats; I've used notebook here.
This doesn’t convert a notebook to a different format per se, instead it allows the running of nbconvert preprocessors on a notebook, and/or conversion to other notebook formats.
The Python code example is just a quick draft; it can be improved considerably.
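Since the question also mentions running the tests on every pull request, here is one possible extension of the snippet above (a sketch, assuming the notebooks live under a notebooks/ directory) that discovers and parametrizes them automatically:

from pathlib import Path

import nbformat
import pytest
from nbconvert.preprocessors import ExecutePreprocessor

# discover every notebook in the repository (assumed layout: notebooks/**/*.ipynb)
NOTEBOOKS = sorted(str(p) for p in Path("notebooks").rglob("*.ipynb"))

@pytest.mark.parametrize("notebook", NOTEBOOKS)
def test_notebook_exec(notebook):
    with open(notebook) as f:
        nb = nbformat.read(f, as_version=4)
    ep = ExecutePreprocessor(timeout=600, kernel_name="python3")
    # execute relative to the notebook's own directory so its relative paths resolve
    ep.preprocess(nb, {"metadata": {"path": str(Path(notebook).parent)}})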
Here is my own solution using testbook. Let's say I have a notebook called my_notebook.ipynb that creates a bigquery.Client, loads query results into a dataframe, and computes a value x.
The trick is to inject a cell before my call to bigquery.Client and mock it:
from testbook import testbook

@testbook('./my_notebook.ipynb')
def test_get_details(tb):
    tb.inject(
        """
        import mock
        mock_client = mock.MagicMock()
        mock_df = pd.DataFrame()
        mock_df['week'] = range(10)
        mock_df['count'] = 5
        p1 = mock.patch.object(bigquery, 'Client', return_value=mock_client)
        mock_client.query().result().to_dataframe.return_value = mock_df
        p1.start()
        """,
        before=2,
        run=False
    )
    tb.execute()

    dataframe = tb.get('dataframe')
    assert dataframe.shape == (10, 2)

    x = tb.get('x')
    assert x == 7
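Run it with pytest as usual: the testbook decorator starts a kernel on my_notebook.ipynb, the injected cell is placed before cell 2 (before=2) so the patch is active when the notebook executes, and tb.get pulls variables out of the kernel for the assertions.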

Problem evaluating iterated integral in SymPy

I'm teaching a course in Multivariate Calculus and decided to convert my notes from Sage to Jupyter using SymPy. I have rewritten nearly all my notes as Jupyter notebooks and am very impressed by how I can use multiple cells, as in Mathematica, and Markdown cells with LaTeX, as well as all the great features of matplotlib, NumPy and SymPy.
I'm nearly done converting my sagelets to Python scripts on Colab and found a discrepancy.
This Sage code resolves as pi:
integral(integral(integral(1, z, x^2+y^2, 2-x^2-y^2),
y, -sqrt(1-x^2), sqrt(1-x^2)),
x, -1, 1)
but this SymPy code resolves as -pi/2:
from sympy import Integral, sqrt, symbols

x, y, z = symbols('x y z')  # plain symbols, as in the linked notebook
Integral(1,
         (z, x**2+y**2, 2-x**2-y**2),
         (y, -sqrt(1-x**2), sqrt(1-x**2)),
         (x, -1, 1)
         ).doit()
See #3 in this Jupyter Notebook:
https://colab.research.google.com/drive/1OlT9nfPG8TzoR_WpDavx-SAa07HLg3hV?usp=sharing
Shouldn't these be equal? What am I missing? Any help would be greatly appreciated as I've done A LOT of work on this course using SymPy and would like to use it in class this summer session!
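For what it's worth, a quick hand check suggests the value should indeed be pi: the region lies between the paraboloids z = x^2+y^2 and z = 2-x^2-y^2 over the unit disk, so in cylindrical coordinates

$$\int_0^{2\pi}\!\int_0^1 \left[(2-r^2)-r^2\right] r\,dr\,d\theta = 2\pi \int_0^1 \left(2r-2r^3\right) dr = 2\pi\left(1-\tfrac{1}{2}\right) = \pi,$$

which matches the Sage result.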
PS: Here's the SageCell version,
https://sagecell.sagemath.org/?z=eJzFk8FuwjAMhu9IvIMFB9qRTk1A07TrtE07cNuuSBkEGlGS4qRA-_RLQ1uQuDCGtlMsx_ns_5fTn-AbsJg-3scPoxg-J89sDB8os1TAu7JiiTw13U6fhvCy5WnOrQCu5v4OMxT2CWRdFpwHlJQkJiwkhTtGIdm7Yxx2OybRu6B3U2jPYbccg0FBykHT4seUdrYK52SzX0xIo-KArZvQo_DbYnsXyz1_e5A5unqe_ZQNiykjLHJR5KIKHJkN2oBWqZCcxFXviJ4a8deNL7XqOvrBzHEIr9JpsYmANTcG9MLHuZIWZvmXcNJ8YiHRWNAzy5WFXSIUYKJBGshQZxqt1IqnYLUvNpuco2hYc2ncq5ljoF77jEa5lKo19j-HaL_i6pKPuLoLareHZWWs39NmPf2yOmO_Adu6eJ4=&lang=sage

PyCharm ignores the integ() command from numpy.polynomial's Polynomial / on Jupyter it works

When using PyCharm, the integ() function is ignored:
from numpy.polynomial import Polynomial as P
p = P([1, 2, 3])
p.integ()
print(p)
outcome: 1.0 + 2.0 x**1 + 3.0 x**2 (no errors)
on Jupyter it gives me the correct result: x ↦ 0.0 + 1.0·x + 1.0·x² + 1.0·x³
but I really prefer writing code in PyCharm. Can anyone tell me why this happens or how I could change it?
First, note that p.integ() doesn't change p. It returns a new polynomial object. When you execute print(p) after this expression, you are printing the original p that was created earlier.
In an interactive shell, with a line such as p.integ() that contains an expression (with no assignment), the shell (i.e. Jupyter) prints the value of the expression in the terminal. This is a feature of Jupyter, not of the Python interpreter. When such an expression is encountered in a program, the Python interpreter evaluates the expression, but does not print it. If you want to print the integral of p, you can do something like
q = p.integ()
print(q)

How to get python to generate the tweedie deviance for xgboost?

Using statsmodels' GLM, the Tweedie deviance is included in the summary function, but I don't know how to do this for xgboost. Reading the API didn't help either.
In Python this is how you do it. Suppose predictions is the result of your gradient boosted tree and real are the actual numbers. Then using statsmodels you would run this:
import statsmodels.api as sm

# deviance(endog, mu): observed values first, predictions second
dev = sm.families.Tweedie(var_power=1.5).deviance(real, predictions)
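Alternatively, scikit-learn ships a Tweedie deviance metric; note that it returns the mean deviance per observation rather than the total:

from sklearn.metrics import mean_tweedie_deviance

# power=1.5 corresponds to the Tweedie variance power used above
mean_dev = mean_tweedie_deviance(real, predictions, power=1.5)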

matplotlib configuration for inline backend in jupyter notebook

I'd like to learn how to configure the defaults for matplotlib using the inline backend in Jupyter Notebook. Specifically, I'd like to set the default 'figure.figsize' to [7.5, 5.0] instead of the default [6.0, 4.0]. I'm using Jupyter Notebook 1.1 on a Mac with matplotlib 1.4.3.
In the notebook, using the macosx backend, my matplotlibrc file is shown to be in the standard location, and figsize is set as specified in matplotlibrc:
In [1]: %matplotlib
Using matplotlib backend: MacOSX
In [2]: mpl.matplotlib_fname()
Out[2]: u'/Users/scott/.matplotlib/matplotlibrc'
In [3]: matplotlib.rcParams['figure.figsize']
Out[3]: [7.5, 5.0]
However, when I use the inline backend, figsize is set differently:
In [1]: %matplotlib inline
In [2]: mpl.matplotlib_fname()
Out[2]: u'/Users/scott/.matplotlib/matplotlibrc'
In [3]: matplotlib.rcParams['figure.figsize']
Out[3]: [6.0, 4.0]
In my notebook config file, ~/.jupyter/jupyter_notebook_config.py, I also added the line
c.InlineBackend.rc = {'figure.figsize': (7.5, 5.0) }
but this had no effect either. For now I'm stuck adding this line in every notebook:
matplotlib.rcParams['figure.figsize']=[7.5, 5.0]
Is there any way to set the default for the inline backend?
The Jupyter/IPython split is confusing. Jupyter is the front end to kernels, of which IPython is the de facto Python kernel. You are trying to change something related to matplotlib, and this only makes sense within the scope of the IPython kernel. Making a change to matplotlib in ~/.jupyter/jupyter_notebook_config.py would apply to all kernels, which may not make sense (in the case of running a Ruby/R/Bash/etc. kernel that doesn't use matplotlib). Therefore, your c.InlineBackend.rc setting needs to go in the settings for the IPython kernel.
Edit the file ~/.ipython/profile_default/ipython_kernel_config.py and add to the bottom: c.InlineBackend.rc = { }.
Since c.InlineBackend.rc specifies matplotlib config overrides, the blank dict tells the IPython kernel not to override any of your .matplotlibrc settings.
If the file doesn't exist, run ipython profile create to create it.
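Putting it together for the figure size asked about here, the kernel config file can either defer to matplotlibrc or pin the size explicitly; for example:

# defer to ~/.matplotlib/matplotlibrc for all inline-backend settings
c.InlineBackend.rc = {}

# ...or override just the figure size for the inline backend
# c.InlineBackend.rc = {'figure.figsize': (7.5, 5.0)}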
Using Jupyter on Windows at least, I was able to do it with something very much like venkat's answer, i.e.:
%matplotlib inline
import matplotlib
matplotlib.rcParams['figure.figsize'] = (8, 8)
I did this to square the circle, which had been rather elliptical up to that point. See, squaring the circle is not that hard. :)
Note that the path of ipython_kernel_config.py differs if you run IPython from a virtual environment. In that case, look in the directory where the environment is stored.
Use figsize(width, height) in the top cell; it changes the size of the plots that follow.
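Presumably this refers to the figsize helper that %pylab brings into the namespace; you can also import it yourself, since it simply sets matplotlib's figure.figsize rcParam:

from IPython.core.pylabtools import figsize

# equivalent to matplotlib.rcParams['figure.figsize'] = [7.5, 5.0]
figsize(7.5, 5.0)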
For Jupyter 5.x and above with IPython kernels, you can override particular keys and leave the rest intact by putting something like this, with your desired figsize, in your ~/.ipython/profile_default/ipython_kernel_config.py:
c = get_config()
c.InlineBackend.rc.update({"figure.figsize": (12, 10)})