Hide SQL statements when running IPython notebook with Django's django-extensions - sql

I'm using Django's django-extensions to run an IPython notebook with access to the Django models (as detailed here http://andrewbrookins.com/python/using-ipython-notebook-with-django/). When I make model queries, the notebook shows the underlying SQL queries executed by Django, like so:
Can I hide this SQL? It's so voluminous it makes the display unusable at times.

With recent versions of IPython, you can use the cell magic
%%capture variable
at the top of a cell to capture its stdout and stderr into a variable. Adding the --no-stdout flag,
%%capture --no-stdout variable
won't capture stdout, thus still displaying it (only stderr is captured).
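If you want the same silencing inside ordinary Python code (rather than per cell), a minimal sketch with the standard library's contextlib achieves a similar effect; this is an illustration of the idea, not IPython's actual %%capture implementation, and the printed "query" is a stand-in for the SQL echo:

```python
import io
from contextlib import redirect_stdout, redirect_stderr

out_buf, err_buf = io.StringIO(), io.StringIO()
with redirect_stdout(out_buf), redirect_stderr(err_buf):
    # Stand-in for the noisy SQL output Django would print here.
    print("SELECT * FROM app_model WHERE id = 1;")

# Nothing was displayed; the output is held in the buffers instead.
captured = out_buf.getvalue()
```

Anything printed inside the `with` block ends up in `captured` rather than on screen, which is essentially what `%%capture` does for a whole cell.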
Also, please write IPython (upper-case I) preferably; ipython is accepted, but try to avoid iPython.

Related

Why do some notes in Spark run very slowly? And why do multiple executions of the same note have different execution times?

My question is about the execution time of PySpark code in Zeppelin.
I have some notes in which I work with some SQL queries. In one of my notes, I convert my DataFrame to pandas with the .toPandas() function. The size of my data is about 600 MB.
My problem is that it takes a long time.
If I use sampling, for example like this:
df.sample(False, 0.7).toPandas()
it works correctly and in an acceptable time.
The other strange point is that when I run this note several times, it is sometimes fast and sometimes slow. For example, the first run after restarting the PySpark interpreter is faster.
How can I work with Zeppelin in a stable state?
And which parameters are effective for running Spark code in an acceptable time?
The problem here is not Zeppelin, but your usage as a programmer. Spark is a distributed (cluster computing) data analysis engine written in Scala, which therefore runs in a JVM. PySpark is the Python API for Spark, which uses the Py4J library to provide an interface to JVM objects.
Methods like .toPandas() or .collect() return a Python object which is not just an interface to JVM objects (i.e. it actually contains your data). They are costly because they require transferring your (distributed) data from the JVM to the Python interpreter inside the Spark driver. Therefore you should only use them when the resulting data is small, and work as long as possible with PySpark DataFrames.
Your other issue regarding different execution times needs to be discussed with your cluster admin. Network spikes and jobs submitted by other users can heavily influence your execution time. I am also surprised that your first run after a restart of the Spark interpreter is faster, because during the first run the SparkContext is created and cluster resources are allocated, which adds some overhead.

Tensorflow: The graph couldn't be sorted in topological order only when running in terminal

I encountered a problem while running a Python script on a Google Cloud compute instance, with Python 3.6 and TensorFlow 1.13.1. I see several people on Stack Overflow encountering similar problems with loops in the computational graph, but none of them really found the culprit. I observed something interesting, so maybe someone experienced can figure it out.
The error message is like this:
2019-05-28 22:28:57.747339: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-28 22:28:57.754195: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
My script for train.py will look like this:
import A, B, C
...
def main():
    ...
if __name__ == '__main__':
    main()
So I will show my two ways to run this script:
VERSION 1:
In terminal,
python3 train.py
This gives me the error I stated above. When I only use the CPU, I notice it throws something like failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected. So I added a GPU to my instance, but the loop in the computational graph is still there.
VERSION 2 (this is where the weird thing happens):
I simply copy the code in main, with nothing changed, into a Jupyter notebook and run it there. Then suddenly, no error occurs anymore.
I don't really know what's going on under the hood. I just notice that the message printed at the start of the run is not the same between the two ways of running the code.
If you encounter the same problem, copying the code into a Jupyter notebook might help directly. I would really like to share more info if someone has any idea what might cause this. Thank you!
Well, it turns out that, no matter what, I chose a wrong way to build the graph at the beginning, one that I did not expect to create a loop. The loop error gave me a hint that I was doing something wrong. The interesting question I mentioned above is still not answered! However, I'd like to share my mistake so that anyone who sees the loop error can check whether they are doing the same thing as me.
In the input_fn, I used tensor.eval() to get the corresponding numpy.array mid-pipeline, in order to interact with data outside of that function. I chose not to use tf.data.Dataset because the whole process is complicated and I couldn't compress it into a Dataset directly. But it turns out this approach sabotages the static computational graph design of TensorFlow, so during training it trains on the same batch again and again. My two cents of advice: if you want to achieve something very complex in your input_fn, you will likely be better off, or may even only get correct behavior, by using the old-fashioned modelling approach: tf.placeholder.
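As a minimal sketch of the tf.placeholder pattern suggested above (written against the TF1-style API via tf.compat.v1; the names and shapes are invented for illustration): the key point is that fresh numpy data is supplied through feed_dict on every sess.run call, rather than being baked into the graph by calling .eval() during graph construction.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # build a static graph, TF1-style

# A placeholder is a graph input whose value is supplied at run time.
x = tf1.placeholder(tf.float32, shape=[None, 3], name="batch_in")
row_sums = tf.reduce_sum(x, axis=1)

with tf1.Session() as sess:
    # Each sess.run feeds a *fresh* batch; nothing is evaluated while the
    # graph is being built, so the graph stays static and the data changes.
    out = sess.run(row_sums, feed_dict={x: [[1.0, 2.0, 3.0]]})
```

In a real training loop, the `feed_dict` would be filled from whatever external data source the input_fn was trying to reach via .eval().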

Run the same IPython notebook code on two different data files, and compare

Is there a good way to modularize and re-use code in IPython Notebook (Jupyter) when doing the same analysis on two different sets of data?
For example, I have a notebook with a lot of cells doing analysis on a data file. I have another data file of the same format, and I'd like to run the same analysis and compare the output. None of these options looks particularly appealing for this:
Copy and paste the cells to a second notebook. The analysis code is now duplicated and harder to update.
Move the analysis code into a module and run it for both files. This would lose the cell-by-cell format of the figures that are currently generated and simply jumble them all together in one massive cell.
Load both files in one notebook and run the analyses side by side. This also involves a lot of copy-and-pasting, and doesn't generalize well to 3 or 4 different data files.
Is there a better way to do this?
You could lace demo directives into the standalone module, as per the IPython Demo Mode example.
Then when actually executing it in the notebook, you make a call to the demo object wrapper each time you want to step to the next important part. So your cells would mostly consist of calls to that demo wrapper object.
Option 2 is clearly the best for code re-use; it is arguably the de facto standard in all of software engineering.
I argue that the notebook concept itself doesn't scale well to 3, 4, 5, ... different data files. Notebook presentations are not meant to be batch processing receptacles. If you find yourself needing to do parameter sweeps across different data sets, and wanting to re-run analyses on top of the different data loaded for each parameter group (even when the 'parameters' might be as simple as different file names), it raises a bad code smell.

It likely means the level of analysis being performed in an 'interactive' way is wrong. Witnessing analysis 'interactively' and at the same time performing batch processing are two pretty much incompatible goals.

A much better idea would be to batch process all of the parameter sets separately, 'offline' from the point of view of any presentation, and then build a set of stand-alone functions that can produce visual results from the computed and stored batch results. Then the notebook will just be a series of function calls, each of which produces summary data (some of which could be examples from a selection of parameter sets during batch processing) across all of the parameter sets at once, to invite the necessary comparisons and meaningfully present the result data side-by-side.
'Witnessing' an entire interactive presentation that performs analysis on one parameter set, then changing some global variable / switching to a new notebook / running more cells in the same notebook in order to 'witness' the same presentation on a different parameter set sounds borderline useless to me, in the sense that I cannot imagine a situation where that mode of consuming the presentation is not strictly worse than consuming a targeted summary presentation that first computed results for all parameter sets of interest and assembled important results into a comparison.
Perhaps the only case I can think of would be toy pedagogical demos, like some toy frequency data and a series of notebooks that do some simple Fourier analysis or something. But that's exactly the kind of case that begs for the analysis functions to be made into a helper module, and the notebook itself just lets you selectively declare which toy input file you want to run the notebook on top of.
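As a sketch of the helper-module approach described above, the per-cell analysis can collapse into importable functions, so the notebook reduces to one short call per data file (the module, file, and column names here are hypothetical):

```python
# analysis_helpers.py -- hypothetical shared module imported by every notebook
import csv
import statistics

def load_values(path, column):
    """Load one numeric column from a CSV file."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)]

def summarize(values):
    """The former cell-by-cell analysis, collapsed into one reusable function."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
    }

# In the notebook, each cell is then just a call per data file, e.g.:
# results = {name: summarize(load_values(name, "value"))
#            for name in ("run_a.csv", "run_b.csv")}
```

Each figure-producing helper can stay a separate function called from its own cell, preserving the cell-by-cell layout while the logic lives in one place.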

PyPlot in Julia only showing plot when code ends

I have recently begun learning to use Julia, converting over from Matlab/Octave. I decided that the best way to get some experience was to convert some code I was already working on in Octave - a Newton solver for a complicated multidimensional problem. I have been able to convert the code successfully (and with noticeable speedup relative to Octave, without devectorisation or other performance-based changes), with only one issue arising.
I have chosen to use PyPlot for plotting, due to its similarity to Matlab/Octave's plotting functionality. However, there is some behaviour from PyPlot that is undesired. I use the plotting function to display the current state of the vector I am trying to get to zero (using the Newton solver part of the code), so that I can see what it is doing, and adjust the code to try to improve this behaviour. I input the number of Newton steps to take before the code stops, and then I can make adjustments or re-issue the command to continue attempting to converge.
I have the code set up to plot the current state every few steps, so that I can, for instance, have the code take 200 steps, but show me the status after every 10 steps. In Octave, this works perfectly, providing me with up-to-date information - should the behaviour of the code not be desirable, I can quickly cancel the code with Ctrl-C (this part works in Julia, too).
However, Julia does not produce or update the plots when the plot() command is used; instead, it produces the plot, or updates it if the plot window is already open, only when the code finishes. This entirely defeats the purpose of the intermittent plotting within the code. Once the code has completed, the plot is correctly generated, so I know that the plot() command itself is being used correctly.
I have tried adding either draw() or show() immediately after the plot command. I have also tried display(gcf()). None of these have modified the result. I have confirmed that isinteractive() outputs "true". I have also tried turning interactivity off (ioff()) and switching whether to use the python or julia backend (pygui(true) and pygui(false)), with no effect on this behaviour.
Have I missed something? Is there another package or option that needs to be set in order to force PyPlot to generate the current plot immediately, rather than waiting until Julia finishes its current code run to generate the plot?
Or is it perhaps possible that scope is causing a problem, here, as the intermittent plotting happens inside a while loop?
I am using xubuntu 12.10 with Julia 0.2.1.
PyPlot defaults to this behavior in the REPL. To make it show plots as they are drawn, type ion(). To turn it off again, type ioff().
ion() is only effective for the current session, so if you want it to stay on across sessions, just add it to your .juliarc file.
If you're using IPython, ion() will plot to a new window, but ioff() will plot inline.

Programmatically determine if a user is calling code from the notebook [duplicate]

This question already has answers here:
How can I check if code is executed in the IPython notebook?
(16 answers)
Closed 6 years ago.
I'm writing some software that creates matplotlib plots of simulation data. Since these plotting routines often run in a headless environment, I've chosen to use the matplotlib object-oriented interface and explicitly assign canvases to figures only just before they are saved. This means I cannot use pylab- or pyplot-based solutions for this issue.
I've added some special sauce so the plots show up inline either by invoking a display method on the plot object or by invoking __repr__. However, the check I'm doing to determine if a user is running under IPython (checking for "__IPYTHON__" in dir(__builtin__)) cannot discriminate whether the user is in a notebook or just a regular terminal session where inline figures won't work.
Is there some way to programmatically check whether a code snippet has been executed in a notebook, qt console, or terminal IPython session? Am I doing something silly here? I haven't looked too closely at the display semantics, so perhaps I'm ignorant of some portion of the IPython internal API that would take care of this for me.
Answered many times: no, you can't.
How can I check if code is executed in the IPython notebook?
The same kernel can be connected to a notebook, qtconsole, and terminal at the same time, even to many at once.
Your question is like a TV presenter asking, "How can I know whether the person watching me on TV is male or female?" It does not make sense.
Don't invoke _repr_*_ yourself. Try to import display, and make it a no-op if the import fails; that should be sufficient to work both in plain Python and in IPython.
Better yet, return the object instead of displaying it. The display hook will work by itself in IPython if the object has a _repr_png_ or equivalent.
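A minimal sketch of both suggestions together: import display with a no-op-style fallback, and give the result object a rich-repr method that IPython's display machinery picks up automatically (the Result class and its HTML payload are made up for illustration):

```python
# Fall back gracefully when IPython is not installed (e.g. headless runs).
try:
    from IPython.display import display  # available inside IPython/Jupyter
except ImportError:
    def display(obj):
        print(obj)  # plain-Python stand-in: just use the ordinary repr

class Result:
    """IPython's display hook looks for _repr_*_ methods automatically."""
    def __init__(self, html):
        self.html = html

    def _repr_html_(self):
        # Rendered richly in a notebook; ignored in a terminal session.
        return self.html

    def __repr__(self):
        # Plain fallback for terminals and non-IPython use.
        return "Result(...)"

display(Result("<b>done</b>"))  # rich in a notebook, plain repr elsewhere
```

Returning a `Result` from the last expression of a cell triggers the same machinery without calling display at all, which is why no notebook-vs-terminal check is needed.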