Many of my partners at work are comfortable mainly with spreadsheets, not Python, Scala, Java, SQL, etc. These people are non-technical, but they need data, and it's my job to get it into their hands. Reading about Google Colaboratory and its example notebook on I/O, I discovered gspread and the apparent ease of authentication:
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
We use a local deployment of Jupyter with several customizations that simplify access to Presto, Hive, Spark, etc., so I tried installing Colab in our environment, to no avail (it's not open source). The next best approach I found was OAuth2, but that seems to require a GCP account for access to the Google Developers Console. This seems overly complicated for something that boils down to authenticating with a remote service.
So, group mind, how do you most easily authenticate with gspread in a Jupyter notebook?
I can explain the ingredients that make this work in Colab:
the call to authenticate_user invokes gcloud auth login --enable-gdrive-access (with other args).
We then set the GOOGLE_APPLICATION_CREDENTIALS environment variable to an exported copy of those credentials.
Any code using the Application Default Credentials can now discover those credentials; in particular, oauth2client finds them (as in the sample above).
As always, you should be careful about where you put credentials; in the case of Colab, we have a fairly controlled system, and are OK with this setup (though as with all things, it could change in the future).
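The environment-variable step above can be sketched outside Colab too. A minimal sketch using only the standard library, with placeholder credential contents (a real file would come from `gcloud auth application-default login` or from Colab's export):

```python
import json
import os
import tempfile

# Hypothetical placeholder credentials; real values would be produced by
# `gcloud auth application-default login`, not typed by hand.
fake_creds = {
    "type": "authorized_user",
    "client_id": "placeholder",
    "client_secret": "placeholder",
    "refresh_token": "placeholder",
}

# Write the credentials to a file and point the well-known environment
# variable at it. Libraries that support Application Default Credentials
# (oauth2client, google-auth) check this variable when discovering
# credentials, which is why the gspread sample above just works in Colab.
creds_path = os.path.join(tempfile.mkdtemp(), "adc.json")
with open(creds_path, "w") as f:
    json.dump(fake_creds, f)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = creds_path
```

With a genuine credentials file in place of the placeholder, `GoogleCredentials.get_application_default()` would pick it up from that path.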
Related
I have a big doubt... I see a lot of blog posts saying that you can use the Colab front-end to edit a local Jupyter notebook.
However, I don't see the point... the actual advantage would be the reverse: using something like DataSpell or another local IDE on a remote notebook in Colab, so the Colab resources do the computation and you get:
IDE-level suggestions (Colab is pretty slow compared to a local IDE)
cloud-computing performance and advantages
However, I don't see any blog talking about this... is there any way to do it?
I would like to understand how feasible it would be to spin up my own instance of a Colaboratory server that I could run within a closed network. Using the public version is unfortunately not yet an option in my company. I would really like to have something equivalent that I could use internally, which has all of the nice features such as collaborative editing.
Has anyone tried doing this? Is it even possible?
There's no way to spin up a full instance of the Colab service; i.e., the bits that integrate with GSuite / Docs / GCP / TPUs.
But, you can run local backends using the instructions here:
http://research.google.com/colaboratory/local-runtimes.html
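For reference, the procedure at that link boils down to a few commands. This is a sketch of the documented setup; flag names may change between releases, so check the linked page for the current version:

```shell
# Install and enable the extension that lets the Colab front-end
# talk to a local Jupyter server over WebSockets
pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws

# Start a local server that accepts connections from colab.research.google.com
jupyter notebook \
  --NotebookApp.allow_origin='https://colab.research.google.com' \
  --port=8888 \
  --NotebookApp.port_retries=0
```

Then choose "Connect to local runtime" in Colab and paste the URL (including the token) that the server prints.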
I found this great question here:
https://stackoverflow.com/questions/48376580/google-colab-how-to-read-data-from-my-google-drive
which helped me connect Colab to my Drive.
Here it is as well:
from google.colab import drive
drive.mount('/content/gdrive')
My question: is there any way to do this Google authentication process only once? Colab disconnects from time to time when not in use, and then I need to restart the authentication process.
Thanks
Authentication is done per machine; keys are exchanged to access Drive. Since you always get a new machine on re-connect, you need to re-authenticate.
However, another option is to use an API key for your Google Drive access. This can be done via the Google API Console for the Drive platform. Essentially you would have one API token you could use over and over again, possibly leading you to store it inside the notebook... which is where the bad part starts.
If you opt in to using a token to "manually" mount the Drive folder, then as soon as someone gets hold of that token (e.g., through you sharing your notebook, a man in the middle, or forgetting to delete the key), your Drive folder is compromised. That is the reason why my formal answer to this question is: no, you can't.
But since Colab provides a whole machine with a Unix environment where you can execute arbitrary bash commands, you are in control, which leaves you with additional resources for further investigation:
https://stackoverflow.com/a/50888878/2763239
https://medium.com/@uditsaini/access-google-drive-and-mount-google-drive-to-colab-notebook-google-ccbca1691d31
https://github.com/googlecolab/colabtools/issues/121#issuecomment-423326300
A recently released feature makes this much simpler. The details are described in this answer:
https://stackoverflow.com/a/60103029/8841057
The short version is that for notebooks in Drive that aren't shared, there's now a GUI option to mount Drive files automatically for a given notebook.
Is it possible to have an IPython notebook open in your own local browser while it is running on a remote machine?
How does one actually access an IPython notebook running remotely using ssh?
Quoth the extensive Jupyter Documentation for Running a Notebook Server:
The Jupyter notebook web application is based on a server-client structure. The notebook server uses a two-process kernel architecture based on ZeroMQ, as well as Tornado for serving HTTP requests.
This document describes how you can secure a notebook server and how to run it on a public interface.
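A common way to do the ssh part is a port forward. A minimal sketch, assuming the remote machine is reachable as `remote-host` (the hostnames and ports here are placeholders):

```shell
# On the remote machine: start a notebook server without opening a browser
jupyter notebook --no-browser --port=8889

# On the local machine: forward local port 8888 to port 8889 on the remote
ssh -N -L 8888:localhost:8889 user@remote-host

# Then browse to http://localhost:8888 locally, authenticating with the
# token the remote server printed at startup
```

The `-N` flag keeps the ssh session open for forwarding only, without running a remote command.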
If you store your IPython notebook on GitHub, in a GitHub Gist, or on any file service (e.g., Dropbox), then you can point http://nbviewer.jupyter.org/ at your file and view it online.
Or you can export your notebook to HTML https://ipython.org/ipython-doc/1/interactive/nbconvert.html
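The export mentioned above is a one-liner; a sketch, assuming your notebook is named notebook.ipynb:

```shell
# Convert the notebook to a standalone HTML file (notebook.html)
jupyter nbconvert --to html notebook.ipynb
```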
This may be less necessary now that GitHub displays IPython notebooks directly (try https://github.com/jakevdp/sklearn_pycon2015/blob/master/notebooks/02.1-Machine-Learning-Intro.ipynb)
The code for nbViewer is also on GitHub https://github.com/jupyter/nbviewer
Let me know if you need to modify notebooks remotely or just view them.
Is there any good way of creating and managing S3 policies and users from the command line of my Raspberry Pi?
The AWS Universal Command Line Tools are newer and better supported. They rely on Python, so if you can get Python for Raspberry Pi, you should be set.
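With the CLI installed, basic IAM management is a couple of commands. A sketch with a hypothetical user name (`pi-reader` is made up; `AmazonS3ReadOnlyAccess` is one of AWS's managed policies):

```shell
# Create an IAM user and attach a read-only S3 policy to it
aws iam create-user --user-name pi-reader
aws iam attach-user-policy \
  --user-name pi-reader \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Issue access keys for that user to use from the Pi
aws iam create-access-key --user-name pi-reader
```

These commands require the CLI to be configured with credentials that are allowed to manage IAM.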
I have no experience of using it myself, but I found a tool for interacting with Amazon IAM, the access control service for AWS, in a manner that might work for you:
IAM Command Line Toolkit (note: last updated September 2010)
There may be more usable stuff under the IAM Resources section.
If you are unfamiliar with IAM, the documentation is one place to start. Although, knowing the general style of AWS documentation, there may be better resources and tutorials to be found elsewhere.