How to prevent fastai fastbook from requesting access to Google Drive when running in Google Colab?

When setting up Fastbook in Google Colab, it requests permissions in order to access my Google Drive. This is the prompt I get:
Permit this notebook to access your Google Drive files?
This notebook is requesting access to your Google Drive files. Granting access to Google Drive will permit code executed in the notebook to modify files in your Google Drive. Make sure to review notebook code prior to allowing this access.
[No thanks] [Connect to Google Drive]
Since I'm running foreign (and potentially unsafe) code with my Google account, I don't feel comfortable granting it access to Google Drive.
It looks like the call asking for permission is: fastbook.setup_book()
How can I prevent fastbook (from fastai) from requesting access to Google Drive? If I don't grant the permission, the following error occurs and I'm unsure whether setup has completed or not:
---------------------------------------------------------------------------
MessageError Traceback (most recent call last)
<ipython-input-8-fce0e354ba4c> in <module>
----> 1 fastbook.setup_book()
5 frames
/usr/local/lib/python3.7/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
100 reply.get('colab_msg_id') == message_id):
101 if 'error' in reply:
--> 102 raise MessageError(reply['error'])
103 return reply.get('data', None)
104
MessageError: Error: credential propagation was unsuccessful

After looking at the fastbook module's source code and initialization, I found three ways to prevent fastai's fastbook from asking for Google Drive permissions when running in Google Colaboratory. As of this writing, all three work; you can safely use any of them.
1. Create /content/gdrive/My Drive directory
The setup_colab function in fastbook/__init__.py checks whether Google Drive has already been mounted. If you make it believe it has, it won't try to mount it again.
To do so, just add these two lines at the beginning of your notebook:
import os
os.makedirs('/content/gdrive/My Drive', exist_ok=True)
Run those lines first; after that you can import fastbook and run its setup without any errors.
2. Do not execute fastbook.setup_book() (or comment that line)
It turns out setup_book only checks whether it is running inside Colab and, if so, mounts your Google Drive into the folder /content/gdrive/ and creates the global variable "gdrive", which points to /content/gdrive/My Drive as a convenient place to save things and get persistence.
As of this writing, it is totally fine not to execute fastbook.setup_book(), or to comment out that line; the rest of the notebook will run just fine. Again, the only thing setup_book does is call setup_colab() to set up your Google Drive so the notebooks can persist data (which some notebooks don't even use).
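The behavior described above can be sketched roughly like this (a paraphrase of the logic, not the actual fastbook source; the real code lives in fastbook/__init__.py):

```python
# Rough sketch of fastbook's Colab setup logic: mount Google Drive
# unless the mount point already exists (which approach 1 exploits).
from pathlib import Path

def setup_colab_sketch(root='/content/gdrive/My Drive'):
    gdrive = Path(root)
    if not gdrive.exists():
        from google.colab import drive  # only importable inside Colab
        drive.mount('/content/gdrive')
    return gdrive  # the convenience "gdrive" global points here
```

This makes it clear why pre-creating the directory (approach 1) short-circuits the mount.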
You can just change the initialization to:
! [ -e /content ] && pip install -Uqq fastbook
import fastbook
# fastbook.setup_book()
3. try/except fastbook.setup_book()
If you embed this call in a try/except, it won't raise that error. This is what the initialization will look like:
! [ -e /content ] && pip install -Uqq fastbook
import fastbook
try:
    fastbook.setup_book()
except:
    pass
Final thoughts
As of this writing (2022), setup_book only initializes Google Drive in Colab, but this might change in the future (e.g. to initialize other things). Probably the best solution is the first approach I described: create the folder so fastbook believes Drive is already mounted. That way, if setup_book later grows other initialization steps, we won't be preventing them from happening.
Regardless, it is always good to check out the source code and see what is going on under the hood.
As far as I've seen in the code, there should be no harm in granting permissions: the only thing it does is mount Google Drive so notebooks can save data permanently and have it available across executions. A word of caution, though: that doesn't mean another library imported by one of those scripts couldn't exploit the already-granted permissions to copy your private documents somewhere else, or even ransom them. I'd guess that if something like that happened it would be picked up and addressed very quickly by the fast.ai community. Honestly, I might be a little "paranoid" about this and it might be totally fine to just grant permissions, but just in case I prefer to err on the safe/paranoid side.
Another alternative would be to just create another Google Account with an empty drive and run the notebooks from there without any fear of granting permissions.

Related

How do I access files generated during Cloud Run function execution?

I'm running a very simple program getting screenshots of a page using Selenium in Cloud Run. I know that Cloud Run is stateless and I cannot access the screenshot that is generated after the program finishes executing, but I want to know where/how I can access these files right after the screenshot is taken, read them, and store a reference to them in my Cloud Storage bucket too.
You have several solutions:
Store the screenshots locally, then upload them to Cloud Storage (you can write a script for that using the client libraries, ...). A good improvement is to bundle them into a tar archive (optionally gzipped) so you upload only one file; it's faster.
Use the Cloud Run second-generation execution environment and mount a bucket into your Cloud Run instance with GCSFuse. That way, a file written in the mounted directory is written directly to Cloud Storage. Despite the good tutorial, this solution requires solid container skills.
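The first option can be sketched like this (a sketch, not a definitive implementation: the bucket name and paths are placeholders, and the upload step assumes the google-cloud-storage client library is installed):

```python
import tarfile
from pathlib import Path

def archive_screenshots(files, archive='/tmp/screenshots.tar.gz'):
    """Bundle the screenshots into one gzipped tar: one upload is faster."""
    with tarfile.open(archive, 'w:gz') as tar:
        for f in files:
            tar.add(f, arcname=Path(f).name)
    return archive

def upload_to_gcs(archive, bucket_name, dest_blob):
    """Upload the archive to Cloud Storage before the instance shuts down."""
    from google.cloud import storage  # pip install google-cloud-storage
    storage.Client().bucket(bucket_name).blob(dest_blob).upload_from_filename(archive)
```

For example: upload_to_gcs(archive_screenshots(['/tmp/shot.png']), 'my-bucket', 'shots/shots.tar.gz').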

How do you import a custom python library onto an apache spark pool with Azure Synapse Analytics?

According to Microsoft's documentation it is possible to upload a python wheel file so that you can use custom libraries in Synapse Analytics.
Here is that documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries
I have created a simple library with just a hello world function that I was able to install with pip on my own computer. So I know my wheel file works.
I uploaded my wheel file to the location Microsoft's documentation say to upload the file.
I also found a youtube video of a person doing exactly what I am trying to do.
Here is the video: https://www.youtube.com/watch?v=t4-2i1sPD4U
Microsoft's documentation mentions this, "Custom packages can be added or modified between sessions. However, you will need to wait for the pool and session to restart to see the updated package."
As far as I can tell there is no way to restart a pool, and I also do not know how to tell if the pool is down or has restarted.
When I try to use the library in a notebook I get a module not found error.
Scaling the pool up or down will force the cluster to restart.
Making changes to the Spark pool's scale settings does restart the pool, as HimanshuSinha-msft suggested. That was not my problem, though.
The actual problem was that I needed the Storage Blob Data Contributor role on the data lake storage account the files were stored in. I assumed that because I already had owner permissions, and because I could create a folder and upload there, I had all the permissions I needed. Once I was granted the Storage Blob Data Contributor role, everything worked.

How to access google drive data when using google-colab in local runtime?

I have tried using PyDrive to authenticate and get access of Google Drive.
I followed every step in here, https://pythonhosted.org/PyDrive/quickstart.html.
After downloading and renaming “client_secrets.json”, where should I put or use this file in Jupyter notebook environment, in order to access google-drive in google-colab for local runtime?
I am getting error, InvalidConfigError: Invalid client secrets file while saving PyDrive credentials.
In order to use Google Colab with a 'local runtime', you have to install Jupyter and related packages, which I have been able to do, as described here: https://research.google.com/colaboratory/local-runtimes.html.
But how do I access Google Drive now?
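For reference, the PyDrive quickstart flow the question follows boils down to something like this (a sketch, not a definitive answer; PyDrive's GoogleAuth looks for client_secrets.json in the directory the notebook server was started from, and LoadClientConfigFile lets you point it at the file explicitly, which avoids the InvalidConfigError):

```python
def authenticate_drive(secrets_path='client_secrets.json'):
    # Lazy imports so this only requires PyDrive when actually called.
    from pydrive.auth import GoogleAuth  # pip install pydrive
    from pydrive.drive import GoogleDrive

    gauth = GoogleAuth()
    gauth.LoadClientConfigFile(secrets_path)  # explicit path to the secrets file
    gauth.LocalWebserverAuth()  # opens a browser for the OAuth consent screen
    return GoogleDrive(gauth)
```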

Is there a way to access data from one drive using google colab?

I have started using google colab to train neural networks, however the data I have is quite large (4GB and 18GB). I have all this data currently stored in one drive and I don't have enough space on my google drive to transfer these files over.
Is there a way for me to directly access the data from one drive in google colab?
I have tried directly loading the data from my own machine, however I feel this process is too time-consuming and my machine really doesn't have enough space to store these files. I have also tried adding download=1 after the ? in the file's hyperlink, however this does not download the file and only displays the hyperlink, while using wget produces an 'ERROR 403: Forbidden.' message.
I would like the Google Colab notebook to download this zipped file and unzip the data from it in order to perform training.
OK, here is a method to download to Colab: in OneDrive, choose the file, right-click the download button, but pause the download immediately.
Then go to the browser's downloads interface, right-click the paused item, and copy the link address.
!wget --no-check-certificate \
https://public.sn.files.1drv.com/xxx\
-O /content/filename.zip
Note: the link becomes invalid after a few minutes.
You can use OneDriveSDK, which is available on the PyPI index.
First, we will install it in Google Colab using :
!pip install onedrivesdk
The process is too long to cover in full here. You need to authenticate yourself first, and then you can upload/download files easily.
You can authenticate using this code:
import onedrivesdk
redirect_uri = 'http://localhost:8080/'
client_secret = 'your_client_secret'
client_id = 'your_client_id'
api_base_url = 'https://api.onedrive.com/v1.0/'
scopes = ['wl.signin', 'wl.offline_access', 'onedrive.readwrite']
http_provider = onedrivesdk.HttpProvider()
auth_provider = onedrivesdk.AuthProvider(
    http_provider=http_provider, client_id=client_id, scopes=scopes)
client = onedrivesdk.OneDriveClient(api_base_url, auth_provider, http_provider)
auth_url = client.auth_provider.get_auth_url(redirect_uri)
# Ask for the code
print('Paste this URL into your browser, approve the app\'s access.')
print('Copy everything in the address bar after "code=", and paste it below.')
print(auth_url)
code = input('Paste code here: ')
client.auth_provider.authenticate(code, redirect_uri, client_secret)
This prints a URL to open in your browser; after you approve access, copy the resulting code and paste it back into the console to authenticate yourself.
You can download a file using:
root_folder = client.item(drive='me', id='root').children.get()
id_of_file = root_folder[0].id
client.item(drive='me', id=id_of_file).download('./path_to_file')
For download only (including folders):
cliget in Firefox (wget didn't work for me, but curl is fine)
curlwget in Chrome (sorry, I haven't tried it; I don't use Chrome)
With cliget, you just install the add-on in Firefox, then start a download of the folder (you don't have to let it finish). Then click the cliget icon among the add-on icons, choose curl, and copy-paste the generated command.
Note: these are not 'safe' methods and probably shouldn't be used with sensitive content.
(Other OneDrive folders probably stay safe, but I'm not sure; please confirm.)
To unzip, you can use the unzip command.
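Equivalently, the unzipping can be done from Python with the standard library's zipfile module (the paths in the usage example are placeholders):

```python
import zipfile

def unzip(archive, dest):
    """Extract every member of the zip archive into dest."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
```

For example: unzip('/content/filename.zip', '/content/data').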
A year passed since the question, but I leave this here, for others. :)
Edit:
For many small files it seems to be really slow, for some reason (I'm not sure why). Also, with OneDrive it seems reliable only up to a few (2-3) GB.

How do I backup to google drive using duplicity?

I have been trying to get duplicity to backup to google drive. But it looks like it is still using the old client API.
I found some thread saying that the new API should be supported but not much details on how to get it to work.
I got as far as compiling and using duplicity 7.0.3 but then I got this error:
BackendException: GOOGLE_DRIVE_ACCOUNT_KEY environment variable not set. Please read the manpage to fix.
Has anyone set up duplicity to work with Google Drive and know how to do this?
Now that Google has begun forcing clients to use OAuth, using Google Drive as a backup target has actually gotten very confusing. I found an excellent blog post that walked me through it. The salient steps are:
Install PyDrive
PyDrive is the library that lets Duplicity use OAuth to access Drive.
pip install pydrive
should be sufficient, or you can go through your distribution's package manager.
Create an API token
Navigate to the Google Developer Console and log in. Create a project and select it from the drop-down on the top toolbar.
Now select the "Enable APIs and Services" button in the Dashboard, which should already be pulled up, but if not, is in the hamburger menu on the left.
Search for and enable the Drive API. After it's enabled, you can actually create the token. Choose "Credentials" from the left navigation bar, and click "Add Credential" > "OAuth 2.0 Client ID." Set the application type to "Other."
After the credential is created, click on it to view the details. Your Client ID and secret will be displayed. Take note of them.
Configure Duplicity
Whew. Time to actually configure the program. Paste the following into a file, replacing your client ID and secret with the ones from the Console above.
client_config_backend: settings
client_config:
  client_id: <your client ID>.apps.googleusercontent.com
  client_secret: <your client secret>

save_credentials: True
save_credentials_backend: file
save_credentials_file: gdrive.cache

get_refresh_token: True
(I'm using the excellent Duply frontend, so I saved this as ~/.duply/<server name>/gdrive).
Duplicity needs to be given the name of this file in the GOOGLE_DRIVE_SETTINGS environment variable. So you could invoke duplicity like this:
GOOGLE_DRIVE_SETTINGS=gdrive duplicity <...>
Or if you're using Duply, you can export this variable in the Duply configuration file:
export GOOGLE_DRIVE_SETTINGS=gdrive
Running Duplicity for the first time will begin the OAuth process; you'll be given a link to visit, which will ask permission for the app you created earlier in the Console to access your Drive account. Accept, and it will give you another authentication token to paste back into the terminal. The authorization info will be saved in a .cache file alongside the gdrive settings file.
At this point you should be good to go, and Duplicity should behave normally. Good luck!