Is there a way to access data from OneDrive using Google Colab?

I have started using Google Colab to train neural networks, but the data I have is quite large (4 GB and 18 GB). All of this data is currently stored in OneDrive, and I don't have enough space on my Google Drive to transfer these files over.
Is there a way for me to directly access the data from OneDrive in Google Colab?
I have tried loading the data directly from my own machine, but that process is too time consuming and my machine doesn't have enough space to store these files. I have also tried adding download=1 after the ? in the file's hyperlink, but this does not download the file and only displays the hyperlink, while using wget produces an 'ERROR 403: Forbidden' message.
I would like the Google Colab notebook to download this zipped file and unzip the data from it in order to perform training.

OK, here is a method to download straight to Colab: select the file in OneDrive and click the download button, but pause the download immediately.
Then go to the browser's download manager, right-click the paused item, and copy the link address.
!wget --no-check-certificate \
https://public.sn.files.1drv.com/xxx \
-O /content/filename.zip
Note: the link becomes invalid after a few minutes.

You can use OneDriveSDK, which is available on PyPI.
First, we will install it in Google Colab using:
!pip install onedrivesdk
The full process is too long to cover here: you need to authenticate yourself first, and then you can upload/download files easily.
You can authenticate using this code:
import onedrivesdk

redirect_uri = 'http://localhost:8080/'
client_secret = 'your_client_secret'
client_id = 'your_client_id'
api_base_url = 'https://api.onedrive.com/v1.0/'
scopes = ['wl.signin', 'wl.offline_access', 'onedrive.readwrite']

http_provider = onedrivesdk.HttpProvider()
auth_provider = onedrivesdk.AuthProvider(
    http_provider=http_provider,
    client_id=client_id,
    scopes=scopes)
client = onedrivesdk.OneDriveClient(api_base_url, auth_provider, http_provider)

auth_url = client.auth_provider.get_auth_url(redirect_uri)

# Ask for the code
print('Paste this URL into your browser, approve the app\'s access.')
print('Copy everything in the address bar after "code=", and paste it below.')
print(auth_url)
code = input('Paste code here: ')
client.auth_provider.authenticate(code, redirect_uri, client_secret)
Opening that URL in your browser and approving access yields a code, which you paste back into the console to authenticate yourself.
You can download a file using:
root_folder = client.item(drive='me', id='root').children.get()
id_of_file = root_folder[0].id
client.item(drive='me', id=id_of_file).download('./path_to_file')

For downloads only (this also works for folders), a browser extension can build the command for you:
cliget in Firefox (wget didn't work for me, but curl is fine)
curlwget in Chrome (sorry, I haven't tried it; I don't use Chrome)
With cliget, you just install the add-on in Firefox, then start a download of the folder (you don't have to let it finish). Then click the cliget icon among the add-on icons, choose curl, and copy-paste the generated command.
Note: these are not 'safe' methods and probably shouldn't be used with sensitive content.
(Other OneDrive folders probably stay safe, but I'm not sure. Please confirm.)
To unzip, one can use the unzip command.
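For example, assuming the archive was saved as /content/filename.zip (the placeholder name used in the wget answer above), a minimal Colab cell would be:
!unzip -q /content/filename.zip -d /content/data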
A year has passed since the question, but I'll leave this here for others. :)
Edit:
For many small files it seems to be really slow, for some reason (I'm not sure why). Also, with OneDrive it seems to be reliable only up to a few (2-3) GB... :(

Related

How to prevent fastai fastbook from requesting access to Google Drive when running in Google Colab?

When setting up Fastbook in Google Colab, it requests permissions in order to access my Google Drive. This is the prompt I get:
Permit this notebook to access your Google Drive files?
This notebook is requesting access to your Google Drive files. Granting access to Google Drive will permit code executed in the notebook to modify files in your Google Drive. Make sure to review notebook code prior to allowing this access.
[No thanks] [Connect to Google Drive]
Since I'm running foreign (and potentially unsafe) code on my Google account, I don't feel comfortable granting permissions to Google Drive.
It looks like the call asking for permission is: fastbook.setup_book()
How can I prevent fastbook from fastai from requesting access to Google Drive? If I don't grant the permission, the following error occurs and I'm unsure whether it has been initialized or not:
---------------------------------------------------------------------------
MessageError Traceback (most recent call last)
<ipython-input-8-fce0e354ba4c> in <module>
----> 1 fastbook.setup_book()
5 frames
/usr/local/lib/python3.7/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
100 reply.get('colab_msg_id') == message_id):
101 if 'error' in reply:
--> 102 raise MessageError(reply['error'])
103 return reply.get('data', None)
104
MessageError: Error: credential propagation was unsuccessful
After looking at the fastbook module's source code and initialization, I found three ways of preventing fastai's fastbook from asking for Google Drive permissions when running in Google Colaboratory. As of this writing, all three work; you can use any of the three approaches safely.
1. Create /content/gdrive/My Drive directory
The setup_colab function found in fastbook/__init__.py checks whether Google Drive has been mounted already. If you make it believe it has, it won't try to mount it again.
To do so, just add these two lines at the beginning of your notebook:
import os
os.makedirs('/content/gdrive/My Drive', exist_ok=True)
Run that first; afterwards you can run the fastbook import and its setup without any errors (a sketch of the check this bypasses appears below).
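For reference, the Colab setup behaves roughly like the following sketch (a simplification based on the behavior described above, not the exact fastbook source):
from pathlib import Path

def setup_colab_sketch():
    # if the Drive folder already exists, assume Drive is mounted and skip the prompt
    gdrive = Path('/content/gdrive/My Drive')
    if not gdrive.exists():
        from google.colab import drive
        drive.mount('/content/gdrive')
    return gdrive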
2. Do not execute fastbook.setup_book() (or comment out that line)
It turns out the setup_book code only checks whether it is running inside Colab and, if so, mounts your Google Drive into the folder /content/gdrive/ and creates the global variable gdrive, which points to /content/gdrive/My Drive as a convenient way to save stuff there and have persistence.
As of this writing, it is totally fine if you don't execute fastbook.setup_book(), or comment out that line; the rest of the notebook will run just fine. Again, the only thing that setup does is call setup_colab() in order to set up your Google Drive so the notebooks can have some persistence (which some notebooks might not even use).
You can just change the initialization to:
! [ -e /content ] && pip install -Uqq fastbook
import fastbook
# fastbook.setup_book()
3. try/except fastbook.setup_book()
If you embed this call into a try/except, it won't return that error. This is what initialization will look like:
! [ -e /content ] && pip install -Uqq fastbook
import fastbook
try:
fastbook.setup_book()
except:
pass
Final thoughts
As of this writing (2022), the setup_book function only initializes Google Drive in Colab, but this might change in the future (e.g. to initialize other things). Probably the best solution is the first approach I described: create the folder so fastbook believes Drive is already mounted, so that if the setup_book call changes in the future to include other sorts of initialization, we won't be preventing it from happening.
Regardless, it is always good to check out the source code and see what is going on under the hood.
As far as I've seen in the code, there should be no harm in granting permissions, since the only thing it does is mount Google Drive so that notebooks can save data permanently and have it available across executions. However, a word of caution: that does not mean that another library imported by any of those scripts couldn't exploit the fact that the permissions have already been granted to copy your private documents somewhere else, or even ransom them. I'm guessing that if something like that happened, it would likely be picked up and addressed very quickly by the fast.ai community. To be honest, I might be a little bit "paranoid" about this stuff, and it might be totally fine to just grant permissions, but I prefer to err on the safe/paranoid side.
Another alternative would be to just create another Google Account with an empty drive and run the notebooks from there without any fear of granting permissions.

Can I transfer images between Shopify sites?

I'm doing some work for a client who has an existing Shopify website. They want to make some big changes to the site, so I have set up a new development site in Shopify, exported all of the products/pages/blog posts to it, and am now working on getting all the new functionality/design working on the dev site.
Once the new build is finished, though, I want to transfer everything back over to their current site. Products/pages/blog posts will be fine (I've written a custom export/import tool using their API), but what about images?
I am uploading lots of images to the dev site and I am worried they will be deleted when development is finished and I shut down the dev site. Is it possible to transfer images from one site to another?
Ideally I'd keep the same URLs on Shopify's CDN when doing so, although if I have to change the URLs, I can probably do an automated replace on the CSV files that will get uploaded.
There are going to be hundreds of images involved, and they will be used in various places throughout the site, including in the rich text areas of pages/blogs, so it's not going to be practical to do this manually in any way; it must be something I can automate.
Thanks for any help.
When you export products as a CSV, you get links to your images. You could write a script to download each of the images in the CSV (see the sketch below); just redirect the output of curl to save each image.
curl link_url > imagename
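A minimal sketch of such a script, assuming the export CSV has an 'Image Src' column as Shopify product exports typically do (adjust the column name to match your file):
import csv
import urllib.request

with open('products_export.csv', newline='') as f:
    for row in csv.DictReader(f):
        url = (row.get('Image Src') or '').strip()
        if url:
            # derive a local filename from the last path segment of the URL
            filename = url.split('/')[-1].split('?')[0]
            urllib.request.urlretrieve(url, filename)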
Have you tried transferring between the two sites using FTP? If you have SSH access (an example session follows this list):
log in to the server via SSH
change to the directory containing the files (or the desired location)
FTP into the other server using ftp <name_or_IP_address_of_other_server> and your login details
use cd to move to the source location / desired destination
issue the binary command to switch to binary transfer mode
issue hash if you want a progress indicator
if sending a file from the server you SSHed into, issue the put <filename> command; if you want to pull a file from the other server to the one you are logged into, use get <filename> instead
Wait for the transfer to complete; it might take a while.
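A sketch of such a session (hostnames, paths, and the filename are placeholders):
ssh user@source-server
cd /path/to/images
ftp destination-server
cd /desired/destination
binary
hash
put images.tar.gz
bye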

How do I back up to Google Drive using duplicity?

I have been trying to get duplicity to back up to Google Drive, but it looks like it is still using the old client API.
I found some threads saying that the new API should be supported, but not much detail on how to get it to work.
I got as far as compiling and using duplicity 7.0.3, but then I got this error:
BackendException: GOOGLE_DRIVE_ACCOUNT_KEY environment variable not set. Please read the manpage to fix.
Has anyone set up duplicity to work with Google Drive and can explain how to do this?
Now that Google has begun forcing clients to use OAuth, using Google Drive as a backup target has actually gotten very confusing. I found an excellent blog post that walked me through it. The salient steps are:
Install PyDrive
PyDrive is the library that lets Duplicity use OAuth to access Drive.
pip install pydrive
should be sufficient, or you can go through your distribution's package manager.
Create an API token
Navigate to the Google Developer Console and log in. Create a project and select it from the drop-down on the top toolbar.
Now select the "Enable APIs and Services" button in the Dashboard, which should already be pulled up, but if not, is in the hamburger menu on the left.
Search for and enable the Drive API. After it's enabled, you can actually create the token. Choose "Credentials" from the left navigation bar, and click "Add Credential" > "OAuth 2.0 Client ID." Set the application type to "Other."
After the credential is created, click on it to view the details. Your Client ID and secret will be displayed. Take note of them.
Configure Duplicity
Whew. Time to actually configure the program. Paste the following into a file, replacing your client ID and secret with the ones from the Console above.
client_config_backend: settings
client_config:
  client_id: <your client ID>.apps.googleusercontent.com
  client_secret: <your client secret>
save_credentials: True
save_credentials_backend: file
save_credentials_file: gdrive.cache
get_refresh_token: True
(I'm using the excellent Duply frontend, so I saved this as ~/.duply/<server name>/gdrive).
Duplicity needs to be given the name of this file in the GOOGLE_DRIVE_SETTINGS environment variable. So you could invoke duplicity like this:
GOOGLE_DRIVE_SETTINGS=gdrive duplicity <...>
Or if you're using Duply, you can export this variable in the Duply configuration file:
export GOOGLE_DRIVE_SETTINGS=gdrive
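As a concrete (hypothetical) example, a full backup invocation might look like the following; the local path and the pydrive:// target here are placeholders, and the exact target URL format is described in the duplicity manpage:
GOOGLE_DRIVE_SETTINGS=gdrive duplicity full /home/user/docs pydrive://me@gmail.com/backups/docs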
Running Duplicity for the first time will begin the OAuth process; you'll be given a link to visit, which will ask permission for the app you created earlier in the Console to access your Drive account. Accept, and it will give you another authentication token to paste back into the terminal. The authorization info will be saved in a .cache file alongside the gdrive settings file.
At this point you should be good to go, and Duplicity should behave normally. Good luck!

Replacing an empty (freshly installed) Dropbox folder with a previous (up-to-date) Dropbox folder. Is it possible?

I am about to install Mavericks, and before I do that I am going to reformat my MacBook Air. I use Dropbox and have about 15 GB of (small) files on it (mainly documents/ebooks).
My question is: is it possible to back up my Dropbox folder now, reformat my SSD, and install Dropbox again, after which I replace the Dropbox folder with my backup, without getting Dropbox confused? (It might think they are new files, so Dropbox could upload them or/and download the same files again.)
Does anyone have any experience with this?
It's fine to do this - I have done it myself, but not on OSX.
The Dropbox client will index the files that it finds on your computer and compare them to the ones which are already in your account (on the server). I believe that it uses some kind of hash function to do this - the client creates a small hash value for each file and then this value is compared to the value on the server. If the value is the same then the client assumes that the file is the same and it does not need to be re-uploaded. However, if you have thousands of files, this can take some time.
Source: https://www.dropbox.com/help/1941/en - "The application will index the files and see that they are the same files in your account."
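For illustration, Dropbox's publicly documented content_hash algorithm hashes a file in 4 MB blocks and then hashes the concatenated block digests; whether the desktop client used exactly this scheme at the time is an assumption. A minimal sketch:
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # Dropbox hashes files in 4 MB blocks

def content_hash(path):
    # SHA-256 each 4 MB block, then SHA-256 the concatenated per-block digests
    digests = b''
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(BLOCK_SIZE)
            if not chunk:
                break
            digests += hashlib.sha256(chunk).digest()
    return hashlib.sha256(digests).hexdigest()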
If you want to do it, when you install Dropbox again you should sign in to your account, let it create the Dropbox folder, and then click "Pause Syncing" so that it doesn't start downloading everything. Then copy the backed-up Dropbox files into the new Dropbox folder and resume syncing.

Display a list of files and folders using the MediaFire API

I tried to use the MediaFire API, but when I call folder/get_info it doesn't return the file & folder arrays like the example shows.
The full URL I used: http://www.mediafire.com/api/folder/get_info.php?folder_key=l461cm2d8hfxd
What's wrong with my attempt? Thank you.
You can try the MediaFire API PHP Library. This class currently implements all the functions in the MediaFire API.
OK, I just took a look at their API documentation. They've updated the get_info function for folders and taken out the file tree.
So if you are uploading via the dropbox feature (which doesn't return the quickkey associated with the file), you can NEVER do a dropbox upload and then use the API to find the file and download it. This renders their API as useless as tits on a boar hog.
The point of a dropbox is to allow remote uploads to a folder; since you know the folder key, you could then query the API and get back the quickkeys of the documents in that folder, which would let you manage those files remotely (move, delete, download, etc.). Now you cannot do this. FAIL.
Despite the get_info functionality not working, folder search can resolve at least some issues with retrieving quickkeys. In my case I searched for ".mp3" and was handed all the MP3s in my folder.
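For example, a search call would look something like this (the endpoint and parameter names here are an assumption based on the get_info URL pattern above; check the MediaFire API docs for the exact form):
http://www.mediafire.com/api/folder/search.php?folder_key=l461cm2d8hfxd&search_text=.mp3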