Is it possible to read a Google Drive folder (all files) as a BigQuery external data source? - google-bigquery

I am using Google Drive as an external data source in BigQuery. I can access a single file, but I am unable to read a folder with multiple files.
Note:
I picked up the shareable link for the folder from Google Drive and used the "bq mk.." command referencing the link ID. Although it creates the table, I am unable to pull any data from it.

I've not tried it with Drive, so I have no sense of how performant it is, but when defining an external table (or load job) you can specify the source data as a list of URIs. My suspicion is that it's not particularly scalable and may run into limits in Drive, as that's not a typical access pattern. Google Cloud Storage is a much more suitable data source for this kind of thing.
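A minimal sketch of that list-of-URIs approach with the google-cloud-bigquery Python client (the project, dataset, table, and file IDs below are placeholders, and the client's credentials need a Google Drive OAuth scope for Drive-backed tables):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical fully qualified table name -- replace with your own.
table_id = "my_project.my_dataset.drive_files"

# One URI per file: a Drive folder link cannot be used directly.
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = [
    "https://drive.google.com/open?id=FILE_ID_1",
    "https://drive.google.com/open?id=FILE_ID_2",
]
external_config.autodetect = True

table = bigquery.Table(table_id)
table.external_data_configuration = external_config
client.create_table(table)
```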

Related

Loading data into Google Colab from a central data repository?

What's the best place to upload data for use in Google Colaboratory notebooks? I'm planning to make some notebooks that load netCDF data using python, and I'd like to be able to send the notebooks to other people and have them load the same data without difficulty.
I know I can load data from my own Google Drive, but if I sent other people the notebooks, then I'd have to send them the data files too, right?
Is it possible to have a central data repository that multiple people can load data from? The files that I'd like to use are ~10-100 MB. I only need to read data, not write it. Thanks!

How to access data from my machine using Google Colab

I want to use Google Colab, but my data is pretty huge, so I want to access my data directly from my machine in Google Colab. I also want to save files directly to a directory on my machine. Is there a way to do that? I can't seem to find any.
Look at how to use a local runtime here:
https://research.google.com/colaboratory/local-runtimes.html
Otherwise, you can store your data on GDrive, GCS, or S3. Then you can just mount it, with no need to upload it every time.
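For the Google Drive option, a minimal sketch of what mounting looks like inside a Colab notebook (the file path below is a placeholder):

```python
# Runs inside a Colab notebook; prompts for authorization the first time.
from google.colab import drive

drive.mount("/content/drive")

# Files in your Drive then appear under the mount point, e.g.:
with open("/content/drive/MyDrive/data/example.csv") as f:
    print(f.readline())
```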

Have multiple names for same blob content

Let's say I have a file called foo.txt in my Azure Storage as a blob. Is it possible to create a link of sorts or a redirect URL so that I can access foo.txt's content even when I visit bar.txt?
Ideally I do not want to upload the same file content again for bar.txt, to avoid wasting space.
No, you can't. Azure Blob Storage is just simple object storage, not a full file system with soft links or hard links.
BTW, you may consider simulating the link feature by following the answers here: Is there a way to do symbolic links to the blob data when using Azure Storage to avoid duplicate blobs?
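If you do want to simulate it, one client-side sketch using the azure-storage-blob Python package is to store a tiny "link" blob whose metadata names the real blob, and resolve it before downloading. The connection string, container, and blob names below are hypothetical, and the convention only works for clients that know about it:

```python
from azure.storage.blob import BlobServiceClient

# Hypothetical connection string and container name.
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("mycontainer")

# Create "bar.txt" as an empty blob that merely records its target in metadata.
container.get_blob_client("bar.txt").upload_blob(
    b"", overwrite=True, metadata={"link_target": "foo.txt"}
)

def download(name: str) -> bytes:
    """Download a blob, following one level of client-side 'links'."""
    blob = container.get_blob_client(name)
    target = blob.get_blob_properties().metadata.get("link_target")
    if target:
        blob = container.get_blob_client(target)
    return blob.download_blob().readall()

print(download("bar.txt"))  # prints foo.txt's content
```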

Accessing Dropbox Datastore database

I use an outdated iOS app called Loggr and would now like to extract the data stored in it. It syncs with Dropbox Datastore, which I can see on my Dropbox account.
But I can't find any corresponding files among my Dropbox files. My question is: how do I extract the information from the Datastore?
Dropbox Datastores are a structured data storage system, separate from files, so they won't appear as files in your account. They should be available under "Apps you use" here though:
https://www.dropbox.com/developers/apps/datastores

How to upload multiple files to a Google Cloud Storage bucket as a transaction

Use Case:
Upload multiple files into a Cloud Storage bucket, and then use that data as the source for a BigQuery import. Use the name of the bucket as the metadata that drives which sharded table the data should go into.
Question:
In order to prevent a partial import into the BigQuery table, ideally I would like to do the following:
1. Upload the files into a staging bucket
2. Verify all files have been uploaded correctly
3. Rename the staging bucket to its final name (for example, gs://20130112)
4. Trigger the BigQuery import to load the bucket into a sharded table
Since gsutil does not seem to support bucket rename, what are the alternative ways to accomplish this?
Google Cloud Storage does not support renaming buckets, or more generally an atomic way to operate on more than one object at a time.
If your main concern is that all objects were uploaded correctly (as opposed to needing to ensure the bucket content is only visible once all objects are uploaded), gsutil cp supports that -- if any object fails to upload, it will report the number that failed to upload and exit with a non-zero status.
So, a possible implementation would be a script that runs gsutil cp to upload all your files, and then checks the gsutil exit status before creating the BigQuery table load job.
Mike Schwartz, Google Cloud Storage team
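A minimal sketch of that approach (the bucket, file, and table names are placeholders; it shells out to gsutil and the bq tool and only starts the load job if the copy exited cleanly):

```python
import subprocess
import sys

# Hypothetical bucket, source files, and destination table.
BUCKET = "gs://my-staging-bucket-20130112"
FILES = ["data/part-0001.csv", "data/part-0002.csv"]
TABLE = "my_dataset.events20130112"

# Step 1: upload everything; gsutil cp exits non-zero if any object fails.
copy = subprocess.run(["gsutil", "-m", "cp", *FILES, BUCKET + "/"])
if copy.returncode != 0:
    sys.exit("Upload failed; not starting the BigQuery load.")

# Step 2: only now trigger the load of the whole bucket into the table.
load = subprocess.run(
    ["bq", "load", "--autodetect", "--source_format=CSV", TABLE, BUCKET + "/*"]
)
sys.exit(load.returncode)
```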
Object names are actually flat in Google Cloud Storage; from the service's perspective, '/' is just another character in the name. The folder abstraction is provided by clients, like gsutil and various GUI tools. Renaming a folder requires clients to request a sequence of copy and delete operations on each object in the folder. There is no atomic way to rename a folder.
Mike Schwartz, Google Cloud Storage team
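For completeness, here is a sketch of what such a client-side "rename" amounts to, using the google-cloud-storage Python package (the bucket name and prefixes are hypothetical; each object is copied and then deleted, and nothing about the sequence is atomic):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # hypothetical bucket name

def rename_folder(old_prefix: str, new_prefix: str) -> None:
    """'Rename' a folder by copying, then deleting, every object under the prefix."""
    for blob in client.list_blobs(bucket, prefix=old_prefix):
        new_name = new_prefix + blob.name[len(old_prefix):]
        bucket.copy_blob(blob, bucket, new_name)  # copy to the new name first...
        blob.delete()                             # ...then delete the original

rename_folder("staging/", "20130112/")
```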