How to transfer a file between cloud storage and a download/upload host (and vice versa)? - selenium

Let's say there is a file at this link, example.com/file.bin, and I want to transfer it to cloud storage (Mega or Google Drive, for example), or I have a download/upload host and want to transfer files from cloud storage to that host. How can I do that? (Besides the good old download-and-reupload approach and transfer sites like Multcloud.)
At first I thought I could use Python + the Selenium framework to handle the cloud storage side, but that only works if I have the file on my own system. Can I deploy the code on a host and then use it to transfer the files? (Some cloud storage services don't have an API that can be used for downloading, so I think it's necessary to use Selenium.)
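One way to sketch the idea, assuming you rent a server (a VPS or similar) where you can run headless Chrome: the script below pulls the file from the source URL onto that server with requests, then drives the storage site's own web upload form with Selenium by sending the local path to its file input. The upload URL and selector are hypothetical placeholders, and any login steps the real site requires are omitted.

```python
# Minimal sketch: run this on the intermediary host, not your own machine.
# Assumes Chrome + chromedriver are installed and that the storage site
# exposes a plain <input type="file"> on its upload page (placeholder URL).
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

SOURCE_URL = "https://example.com/file.bin"
LOCAL_PATH = "/tmp/file.bin"

# 1) Stream the remote file onto the host's disk (no round trip to your PC).
with requests.get(SOURCE_URL, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    with open(LOCAL_PATH, "wb") as fh:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            fh.write(chunk)

# 2) Drive the cloud storage web UI and hand it the local file.
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")  # no display on a server
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://cloud-storage.example/upload")  # placeholder upload page
    # Login steps omitted; send_keys on a file input uploads without a native dialog.
    driver.find_element(By.CSS_SELECTOR, "input[type=file]").send_keys(LOCAL_PATH)
finally:
    driver.quit()
```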

Related

BigQuery: retrieve data from SFTP

I have an internet-facing SFTP server that receives regularly updated CSV files. Is there any command to have BigQuery retrieve data from this SFTP server and put it into tables? Alternatively, is there any API or Python library that supports this?
As for BigQuery - there's no integration I know of with SFTP.
You'll need to either:
Create or find a script that reads from the SFTP server and pushes the files to GCS (a sketch follows below).
Add an HTTPS service to the SFTP server, so your data can be read with the GCS Storage Transfer Service (https://cloud.google.com/storage-transfer/docs/)
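A minimal sketch of the first option, assuming the paramiko and google-cloud-storage packages are installed; the host, credentials, paths, and bucket name are placeholders you would replace:

```python
# Pull a CSV from an SFTP server and push it to a GCS bucket.
# All connection details and names below are hypothetical placeholders.
import paramiko
from google.cloud import storage

SFTP_HOST = "sftp.example.com"
SFTP_USER = "user"
SFTP_PASSWORD = "secret"
REMOTE_PATH = "/exports/data.csv"
LOCAL_PATH = "/tmp/data.csv"
BUCKET_NAME = "my-bucket"

# Download the file from the SFTP server to local disk.
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(SFTP_HOST, username=SFTP_USER, password=SFTP_PASSWORD)
sftp = ssh.open_sftp()
sftp.get(REMOTE_PATH, LOCAL_PATH)
sftp.close()
ssh.close()

# Upload it to Google Cloud Storage; from there BigQuery can load it
# (e.g. with a load job or an external table pointing at the bucket).
client = storage.Client()
bucket = client.bucket(BUCKET_NAME)
bucket.blob("imports/data.csv").upload_from_filename(LOCAL_PATH)
```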
Yet another third-party tool supporting (S)FTP in and out of GCP is Magnus - Workflow Automator, which is part of the Potens.io suite. It supports BigQuery, Cloud Storage, and most Google APIs, as well as many simple utility-type tasks (BigQuery Task, Export to Storage Task, Loop Task, and many more), along with advanced scheduling, triggering, etc. It is also available on the Marketplace.
The FTP-to-GCS Task accepts a source FTP URI and can transfer a single file or multiple files, based on its input, to a destination location in Google Cloud Storage. The resulting list of objects uploaded to Google Cloud Storage is saved to a parameter for later use within the workflow. The source FTP can be of type SFTP, FTP, or FTPS.
See here for more
Disclosure: I am a GDE for Google Cloud, the creator of those tools, and a leader on the Potens team

How to store google auth and data copied from cloud storage across sessions in Colab?

I recently started using Google Colab and found that I need to re-authenticate and re-copy my data, which is stored on Google Cloud Storage, every time I start a new session.
Given that I'm using all Google services, is there a way to ensure that my settings are maintained in the environment?
There's no way to persist local files on the VM across sessions. You'll want to store persistent data externally in something like Drive, GCS, or your local filesystem.
Some recipes are available in the I/O example notebook.
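As a minimal sketch of that per-session pattern (not a way around it), assuming the google-cloud-storage client available in Colab and placeholder project, bucket, and object names, you can re-run a cell like this at the start of each session:

```python
# Re-run at the start of every Colab session: authenticate, then re-copy data.
# Project, bucket, and object names are hypothetical placeholders.
from google.colab import auth
from google.cloud import storage

auth.authenticate_user()  # prompts for Google auth once per session

client = storage.Client(project="my-project")
bucket = client.bucket("my-bucket")
bucket.blob("datasets/train.csv").download_to_filename("/content/train.csv")
```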

send file to google drive or other online store from my server

I am using a shared host with no root access. I want to transfer files between the host and any online storage. The goal is to take online backups and to restore files from storage to the host directly, without downloading and re-uploading them through my own machine. Kindly suggest a way to do this.
Mover.io is an online service that helps you easily transfer files and folders from one cloud storage service to another. The service works on a freemium model – you can transfer up to 10 GB of data for free and then pay $1 per extra GB of transfer.
Mover has connectors for all popular cloud storage providers. You may copy files from your Google Drive to Dropbox, from SkyDrive to Box or even from your old Google account to the new one. They also support FTP allowing you to directly transfer files from Google Drive or Dropbox to your FTP server, over the cloud.

Amazon S3 WebDAV access

I would like to access my Amazon S3 buckets without third-party software, but simply through the WebDAV functionality available in most operating systems. Is there a way to do that? It is important to me that no third-party software is required.
There are a number of ways to do this. I'm not sure about your situation, so here they are:
Option 1: Easiest: You can use a 3rd party "cloud gateway" provider, like http://storagemadeeasy.com/CloudDav/
Option 2: Set up your own "cloud gateway" server
Set up a dedicated server or virtual server to act as a gateway. Using Amazon's own EC2 would be a good choice.
Set up software that mounts S3 as a drive. Two I know of on Windows: (1) CloudBerry Drive http://www.cloudberrylab.com/ and (2) WebDrive (http://webdrive.com). For Linux, I have never done it, but you can try: https://github.com/s3fs-fuse/s3fs-fuse
Set up a WebDAV server like CrushFTP. (It comes to mind because it's stable, cheap, and works on any OS.) Another option is IIS, but I personally find it harder to set up securely for WebDAV.
Set up a user in your WebDAV server (i.e. CrushFTP or IIS) with access to the mapped S3 drive.
Possible snag: Assuming you're using Windows, to start your services automatically and have this work, you may need to set up both services to use the same Windows user account (Services->(Your Service)->[right-click]Properties->Log On tab). This is because the S3 mapping software might not map the S3 drive for all Windows users. Alternatively, you can use FireDaemon if you get stuck on this step to start the programs as a service all under the same username.
Other notes: I have experience using WebDrive under pretty heavy loads, and it seems to work well. Under tons of pounding (I'm talking thousands of files per hour being added to a 5 TB WebDrive) it started to crash Windows. But I'm not sure if you are going that far with it. Also, if you're using EC2, you may not have that issue since it was likely caused by a huge transfer queue in memory and EC2 will have faster transit to S3 and keep the queue smaller.
I finally gave up on this idea and today I use Rclone (https://rclone.org) to synchronize my files between AWS S3 and different computers. Rclone has the ability to mount remote storage on a local computer, but I don't use this feature. I simply use the copy and sync commands.
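For illustration, here is a minimal sketch of those copy and sync commands, wrapped in Python so it can run from a script. It assumes rclone is installed and that a remote named s3remote has already been set up with rclone config; the remote name, bucket, and paths are placeholders.

```python
# Sketch of rclone's copy/sync usage, invoked from Python.
# Assumes rclone is installed and a remote "s3remote" is already configured.
import subprocess

# One-way copy: upload new/changed files, never delete anything remotely.
subprocess.run(
    ["rclone", "copy", "/home/me/documents", "s3remote:my-bucket/documents"],
    check=True,
)

# Sync: make the remote path an exact mirror of the local one (including deletions).
subprocess.run(
    ["rclone", "sync", "/home/me/documents", "s3remote:my-bucket/documents"],
    check=True,
)
```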
S3 does not support WebDAV, so you're out of luck!
Also, S3 does not support hierarchical namespaces, so you can't directly map a filesystem onto it.
There is an example Java project for putting a WebDAV server over Amazon S3 here: https://github.com/miltonio/milton-aws

Can I run a website on Amazon S3? Say, by using the Amazon S3 PHP SDK?

What exactly can the SDK be used for? Only for storage, like Google Drive, Box, or Dropbox? Or can I use the stored scripts to run a complete website?
What exactly can the SDK be used for?
The Software Development Kit (SDK) can be used to programmatically control nearly every single aspect of the 40-odd AWS services.
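As a hedged illustration of that (using the AWS SDK for Python, boto3, rather than the PHP SDK from the question; credentials are assumed to be configured and the bucket/key names are placeholders):

```python
# List buckets and upload a file with the AWS SDK for Python (boto3).
# Assumes AWS credentials are already configured (env vars or ~/.aws/credentials).
import boto3

s3 = boto3.client("s3")

# Enumerate existing buckets in the account.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])

# Upload a local file to a (placeholder) bucket and key.
s3.upload_file("index.html", "my-bucket", "site/index.html")
```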
Only for storage, like Google Drive, Box, or Dropbox?
Amazon S3 is a storage-only service. It complements the plethora of other AWS services.
Or can I use the stored scripts to run a complete website?
For that, you'd need something with a server. I recommend taking a look at AWS Elastic Beanstalk first because that's arguably the quickest way to get something running. If you're looking for something with more control, you can check out AWS OpsWorks.
If you want a raw virtual server, take a look at Amazon EC2. If you want to build a template that can automate and configure nearly your entire cloud infrastructure (storage, compute, databases, etc.), take a look at Amazon CloudFormation.