Run a Colab notebook from the last output rather than from scratch - google-colaboratory

I ran a notebook in Google Colab yesterday, say up to cell 42, and there were some functions that took a lot of time. I want to run it tomorrow continuing after cell 42 rather than from cell 0.
How can we do it?
I tried to Google for a solution but didn't find much about it.

I don't think you can. Colab sessions time out after a while, and then you must reload all your elements to continue.
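That said, a common workaround is to save expensive intermediate results to Google Drive before the session ends and reload them in the next session, instead of re-running the slow cells. Here is a rough sketch (not from the original answer; the path and variable names are just placeholders), assuming the objects you need can be pickled:

import pickle
from google.colab import drive

# Mount Google Drive so files survive the session reset.
drive.mount('/content/drive')
checkpoint_path = '/content/drive/MyDrive/checkpoint_cell42.pkl'

# Stand-in for the slow computation from cells 0-42.
expensive_result = sum(range(10**7))

# End of today's session: persist the result.
with open(checkpoint_path, 'wb') as f:
    pickle.dump({'expensive_result': expensive_result}, f)

# Tomorrow, in a fresh session: reload instead of recomputing.
with open(checkpoint_path, 'rb') as f:
    expensive_result = pickle.load(f)['expensive_result']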

Related

How to revert Intellij's Jupyter notebook to traditional style?

I use IntelliJ premium. I updated the whole app and saw a very annoying new output style in its Jupyter notebooks. I then reinstalled an older version, but the Jupyter output is still in the new format. I guess this is because of a Jupyter package update.
How can I get the old-style table format back?
The new style shows only 10 rows, and on every run you have to change 10 to a higher number, which becomes annoying after a couple of minutes. It has also become really slow.
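(One guess, not a confirmed fix for the IDE's own table widget: if what gets truncated is pandas' plain-text output rather than the interactive viewer, raising pandas' display limits may help. The numbers below are just illustrative.)

import pandas as pd

# Show more (or all) rows and columns in pandas' plain-text output.
pd.set_option('display.max_rows', 200)      # or None for no limit
pd.set_option('display.max_columns', None)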

Grib2 data extraction with xarray and cfgrib very slow, how to improve the code?

The code takes about 20 minutes to load a month of data for each variable, with 168 time steps for the 00 and 12 UTC cycles of each day. Saving to CSV takes even longer; it has been running for almost a day and still hasn't saved data for any station. How can I improve the code below?
Reading .grib files using xr.open_mfdataset() and cfgrib:
I can speak to the slowness of reading grib files using xr.open_mfdataset(). I had a similar task where I was reading in many grib files with xarray and it was taking forever. Other people have experienced similar issues as well (see here).
According to the issue raised here, "cfgrib is not optimized to handle files with a huge number of fields even if they are small."
One thing that worked for me was converting as many of the individual grib files as I could to one (or several) netCDF files and then reading the newly created netCDF file(s) into xarray instead. Here is a link showing several different methods for doing this. I went with the grib_to_netcdf command from the ecCodes tools.
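As a rough sketch of that conversion step (the file paths below are placeholders, not from the original question), the conversion plus reload might look something like this:

import glob
import subprocess
import xarray as xr

# Convert each grib file to netCDF with ecCodes' grib_to_netcdf tool.
for grib_path in glob.glob('data/*.grib'):
    nc_path = grib_path.rsplit('.', 1)[0] + '.nc'
    subprocess.run(['grib_to_netcdf', '-o', nc_path, grib_path], check=True)

# Read the converted files into a single dataset; this avoids cfgrib entirely.
ds = xr.open_mfdataset('data/*.nc', combine='by_coords')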
In summary, I would start by converting your grib files to netCDF, as that should let xarray read in the data in a more performant manner. Then you can focus on other optimizations further down in your code.
I hope this helps!

Running GitHub projects on Google Colaboratory

I tried to run a project (an .ipynb file) from GitHub using Google Colab.
I managed to run the program, but when compared with the author's output, mine is a little different.
For example, train_df.describe() does not print some of the columns (the 'target' column in particular), which matters because that column is used to plot a graph.
Why do I get different results when I run the same program?
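(As an aside, and only a guess at one possible cause: pandas' describe() summarizes only numeric columns by default, so a non-numeric 'target' column would be missing from the output unless all dtypes are requested, as in the sketch below.)

import pandas as pd

df = pd.DataFrame({'target': ['a', 'b', 'b'], 'value': [1, 2, 3]})
print(df.describe())                # only the numeric 'value' column
print(df.describe(include='all'))   # also includes the non-numeric 'target' column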

Tensorflow: The graph couldn't be sorted in topological order only when running in terminal

I encountered a problem while running a Python script on a Google Cloud compute instance, with Python 3.6 and TensorFlow 1.13.1. I see that several people have run into similar problems with loops in the computational graph on Stack Overflow, but none of the answers really found the culprit. I also observed something interesting, so maybe someone experienced can figure it out.
The error message is like this:
2019-05-28 22:28:57.747339: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-05-28 22:28:57.754195: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
My train.py script looks like this:
import A, B, C
...

def main():
    ...

if __name__ == '__main__':
    main()
Here are the two ways I run this script:
VERSION 1:
In the terminal:
python3 train.py
This gives me the error I stated above. When I only used a CPU, I noticed it also printed something like failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected, so I added a GPU to my instance, but the loop in the computational graph was still there.
VERSION 2 (this is where the weird thing happens):
I simply copied the code in main, with nothing changed, into a Jupyter notebook and ran it there. Suddenly, no error occurred anymore.
I don't really know what's going on under the hood. I just noticed that the messages at the start of the run are not the same between the two ways of running the code.
If you encounter the same problem, copying the code into a Jupyter notebook might help directly. I would really like to share more info if anyone has ideas about what might possibly cause this. Thank you!
Well, it turns out that I chose a wrong way to build the graph in the first place, one that I did not expect to create a loop. The loop error was a hint that I was doing something wrong. The interesting question I mentioned above is still not answered, though! However, I'd like to share my mistake, so anyone who sees the loop error can check whether they are doing the same thing as me.
In my input_fn, I used tensor.eval() to get the corresponding numpy.array partway through, so I could interact with data outside of that function. I chose not to use tf.data.Dataset because the whole process is complicated and I couldn't compress everything into a Dataset directly. But it turns out this approach sabotages TensorFlow's static computational graph design, so during training it trains on the same batch again and again. My two cents of advice: if you want to do something very complex in your input_fn, you will likely be better off (or it may even be the only correct option) using the old-fashioned modelling approach with tf.placeholder.
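For reference, here is a minimal sketch of that placeholder/feed_dict pattern (TF 1.x style; the tiny model and the random batches are stand-ins for your own code): the heavy preprocessing runs as ordinary Python/NumPy outside the graph, and each batch is fed in explicitly, so there is no tensor.eval() inside an input_fn.

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 10], name='x')
y = tf.placeholder(tf.float32, shape=[None, 1], name='y')

# A single dense layer just to have a runnable graph; replace with your model.
pred = tf.layers.dense(x, 1)
loss = tf.reduce_mean(tf.square(pred - y))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        # Any complex Python/NumPy preprocessing can happen here, per batch.
        batch_x = np.random.rand(32, 10).astype(np.float32)
        batch_y = np.random.rand(32, 1).astype(np.float32)
        sess.run(train_op, feed_dict={x: batch_x, y: batch_y})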

Tensorflow: How to Optimize the Prediction on GCloud-ML?

I have a model published on GCloud-ML and it is working fine. I can run online predictions and get the right results. My issue is performance: each prediction (inference) takes around 3.5 seconds, which is not good for my case. I'm using automatic scaling and my bucket is in US-Central. My images are around 100 KB and I'm in Brazil (in the GCloud Console I can see the latency is about 1.5 seconds).
I already tried optimize_for_inference.py but it didn't work (I can't generate the saved_model from the optimized graph; is that possible?).
I need to get the result in at most 2 seconds. My question is: is it possible to do that, or is it normal to get results in 3-4 seconds using gcloud-ml predict?
Thanks! Any ideas are welcome! If you need more information to help me, please add a comment!
Thanks again!