Loading data into Google Colab from a central data repository? - google-colaboratory

What's the best place to upload data for use in Google Colaboratory notebooks? I'm planning to make some notebooks that load netCDF data using python, and I'd like to be able to send the notebooks to other people and have them load the same data without difficulty.
I know I can load data from my own Google Drive, but if I sent other people the notebooks, then I'd have to send them the data files too, right?
Is it possible to have a central data repository that multiple people can load data from? The files that I'd like to use are ~10-100 MB. I only need to read data, not write it. Thanks!
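A minimal sketch of one possible setup: host the files at a publicly readable URL and download them at the top of the notebook. The URL and filename below are hypothetical, and it assumes xarray with a netCDF backend is available in the Colab runtime:

    import urllib.request

    import xarray as xr  # needs a netCDF backend such as netcdf4 or scipy

    # Hypothetical public URL -- e.g. an object in a world-readable GCS bucket.
    DATA_URL = "https://storage.googleapis.com/my-public-bucket/sample.nc"

    urllib.request.urlretrieve(DATA_URL, "sample.nc")  # download once into the Colab VM
    ds = xr.open_dataset("sample.nc")                  # read-only access is all that's needed
    print(ds)

Anyone you send the notebook to can then run it without receiving the data files separately.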

Related

How to permanently upload files onto Google Colab such that they can be directly accessed by multiple people?

My friend and I are working on a project together on Google Colab, for which we need a dataset, but we keep running into the same problem while uploading it.
What we're doing right now is uploading the dataset to Drive, giving each other access, and then mounting Google Drive each time. This gets time-consuming and irritating, since we need to authorize and mount on every session.
Is there a better way, so that we can upload the dataset to the home directory and access it directly each time? Or is that not possible because we're assigned a different machine each time?
If you create a new notebook, you can set Drive to mount automatically, so there's no need to authenticate every time.
See this demo.
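For reference, the explicit mount call looks like this; a minimal sketch, where the shared folder path is a hypothetical example:

    from google.colab import drive

    # Prompts for authorization only if the runtime doesn't already have Drive access.
    drive.mount('/content/drive')

    # Hypothetical path to the folder both collaborators can see after sharing it.
    DATA_DIR = '/content/drive/MyDrive/shared_dataset'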

Is it possible to read a Google Drive folder (all files) as a BigQuery external data source?

I am using Google Drive as an external data source in BigQuery. I am able to access a single file, but unable to read a folder with multiple files.
Note:
I picked up the shareable link for the folder from Google Drive and used the "bq mk.." command referencing the link ID. It creates the table, but I'm unable to pull any data from it.
I've not tried it with Drive, so I have no sense of how performant it is, but when defining an external table (or load job), you can specify the source data as a list of URIs. My suspicion is that it's not particularly scalable and may run into limits in Drive, as that's not a typical access pattern. Google Cloud Storage is a much more suitable data source for this kind of thing.
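A rough sketch of that list-of-URIs approach with the google-cloud-bigquery Python client, where the Drive links and table name are hypothetical and the credentials are assumed to carry a Drive scope:

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials with a Drive scope

    # Hypothetical shareable links to the individual files (not the folder).
    uris = [
        "https://drive.google.com/open?id=FILE_ID_1",
        "https://drive.google.com/open?id=FILE_ID_2",
    ]

    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = uris
    external_config.options.skip_leading_rows = 1  # skip the header row in each file

    table = bigquery.Table("my_project.my_dataset.drive_external")  # hypothetical IDs
    table.external_data_configuration = external_config
    client.create_table(table)

Queries against the table then read all of the listed files.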

How to access data from machine using google colab

I want to use Google Colab, but my data is pretty large, so I want to access it directly from my machine in Google Colab. I'd also like to save output files directly to a directory on my machine. Is there a way I can do that? I can't seem to find one.
Look at how to use a local runtime here:
https://research.google.com/colaboratory/local-runtimes.html
Otherwise, you can store your data on Google Drive, GCS, or S3 and mount it, so there's no need to upload it every time.
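For the GCS route in Colab, a minimal sketch, with a hypothetical project, bucket, and object name:

    from google.colab import auth
    from google.cloud import storage

    auth.authenticate_user()  # authorize this runtime against your Google account

    client = storage.Client(project="my-project")  # hypothetical project ID
    blob = client.bucket("my-data-bucket").blob("data/sample.csv")  # hypothetical bucket/object
    blob.download_to_filename("/content/sample.csv")  # copy onto the Colab VM's disk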

Inserting realtime data into Bigquery with a file on compute engine?

I'm downloading real-time data into a CSV file on a Google Compute Engine instance and want to load this file into BigQuery for real-time analysis.
Is there a way for me to do this without first uploading the file to Cloud Storage?
I tried this: https://cloud.google.com/bigquery/streaming-data-into-bigquery but since my file isn't in JSON, this fails.
Have you tried the bq command-line tool? You can upload CSVs with it.
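As a Python alternative to the CLI, something along these lines can load a local CSV straight into a table without staging it in Cloud Storage first; the table and file names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials on the Compute Engine instance

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # infer the schema from the file
    )

    with open("realtime_data.csv", "rb") as f:  # hypothetical local file
        job = client.load_table_from_file(
            f, "my_project.my_dataset.my_table", job_config=job_config
        )

    job.result()  # wait for the load job to finish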

Upload multiple CSV files from Google Cloud Storage to BigQuery

I need to upload multiple CSV files from my Google Cloud Storage bucket. I tried pointing to the bucket when creating the dataset, but I received an error. I also tried
gsutil load <projectID:dataset.table> gs://mybucket
It didn't work.
I need to upload multiple files at a time, as my total data is 2-3 TB and there are a large number of files.
You're close. Google Cloud Storage uses gsutil, but BigQuery's command-line utility is "bq". The command you're looking for is bq load <table> gs://mybucket/file.csv. To load many files in one job, you can use a wildcard (gs://mybucket/*.csv) or a comma-separated list of URIs.
bq's documentation is over here: https://developers.google.com/bigquery/bq-command-line-tool
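If you'd rather do it from Python, a sketch with the BigQuery client and a wildcard URI, where the bucket, dataset, and table names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )

    # A wildcard URI loads every matching object in the bucket in a single load job.
    job = client.load_table_from_uri(
        "gs://mybucket/*.csv",
        "my_project.my_dataset.my_table",
        job_config=job_config,
    )
    job.result()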