I am using R in Google Cloud Datalab and I want to save my output, a table of strings created in the code itself, to BigQuery. I know there is a way to do it with Python by using bqr_create_table, so I am looking for the equivalent in R.
I found this blog post by Gus Class on Google Cloud Platform which uses the following code to write to BigQuery:
# Install bigrquery if you haven't already...
# install.packages("devtools")
# devtools::install_github("rstats-db/bigrquery")
library(bigrquery)

# Upload the data frame `stash` to the table "stash" in dataset "test_dataset"
insert_upload_job("your-project-id", "test_dataset", "stash", stash)
Where "test_dataset" is the dataset in BigQuery, "stash" is the table inside the dataset and stash is any dataframe you have define with your data.
There is more information on how to authorize with bigrquery
We can share access to an individual dataset by following these steps:
https://cloud.google.com/bigquery/docs/dataset-access-controls
Select the dataset
Share the dataset
We can do this one by one, and we can even share individual BigQuery tables this way.
But how can I find out who has been given shared access to all of these datasets/tables in my GCP project, instead of going to each dataset and each table and checking the "Share" link manually?
Thanks!
I personally use a short Python script that shows that information for all datasets:
from google.cloud import bigquery
import os
client = bigquery.Client()
datasets = list(client.list_datasets())
project = client.project
print("Datasets in project {}:".format(project))
for dataset in datasets:
    os.system('bq show --project_id {} {}'.format(project, dataset.dataset_id))
You can do:
bq show --project_id PROJECT DATASET
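If you prefer to stay inside Python instead of shelling out to bq, the client library also exposes each dataset's access entries directly. A minimal sketch, assuming google-cloud-bigquery is installed and default credentials are configured:

from google.cloud import bigquery

# Print every access entry (role, member type, member) for every dataset in the project.
client = bigquery.Client()
for item in client.list_datasets():
    dataset = client.get_dataset(item.reference)
    print("Dataset: {}".format(dataset.dataset_id))
    for entry in dataset.access_entries:
        print("  role={} {}={}".format(entry.role, entry.entity_type, entry.entity_id))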
I have some datasets in BigQuery, and I wonder if there is a way to use the same datasets in Datalab. Since the datasets are big, I can't download them and reload them in Datalab.
Thank you very much.
The BigQuery Python client library supports querying data stored in BigQuery. To load the commands from the client library, paste the following code into the first cell of the notebook:
%load_ext google.cloud.bigquery
%load_ext is one of the many Jupyter built-in magic commands.
The BigQuery client library provides a %%bigquery cell magic, which runs a SQL query and returns the results as a pandas DataFrame.
You can query data from a public dataset or from the datasets in your project:
%%bigquery
SELECT *
FROM `MY_PROJECT.MY_DATASET.MY_TABLE`
LIMIT 50
I was able to successfully get data from the dataset without any issues.
You can follow this tutorial. I hope it helps.
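If you would rather not use the cell magic, the same query can be run with the client library directly and pulled into a DataFrame. A minimal sketch, using the same placeholder project/dataset/table names as above:

from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT *
    FROM `MY_PROJECT.MY_DATASET.MY_TABLE`
    LIMIT 50
"""
# Run the query and download the result as a pandas DataFrame.
df = client.query(sql).to_dataframe()
df.head()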
One of our BQ datasets is no longer accessible via BQ Web UI and Cloud Shell.
It shows the message "Not found: Dataset project:dataset" immediately upon opening the UI.
We tried a couple of bq shell commands as well:
bq ls: successfully lists the "missing" dataset
bq ls dataset: returns "BigQuery error in ls operation: Not found: Dataset project:dataset"
But we were able to query the views inside and access the contents of the dataset via PowerBI.
IAM Permission: Owner
Anyone encountering similar issue?
I had a similar issue - the BigQuery client library would list the dataset when I called ListDatasets(), but attempting to call UploadCsv() with the same dataset ID would return 404 Dataset not found.
Turns out it was because I had selected 'asia-northeast1' as the Data Location when creating the dataset - it doesn't tell you when you create the dataset that this region is treated differently, but a line in the BigQuery docs says:
If your data is in a location other than the US or EU multi-regional location, you must specify the location when you perform actions such as loading data, querying data, and exporting data.
Re-creating the dataset in the US region fixed my issue. Or you could use the options in the docs above to specify the 'asia-northeast1' location every time instead.
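For what it's worth, if you keep the dataset in 'asia-northeast1', the Python client library lets you pass the location explicitly on each call. A minimal sketch; the project/dataset/table names here are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Jobs against a dataset outside the US/EU multi-regions must name its location.
query_job = client.query(
    "SELECT COUNT(*) FROM `my_project.my_dataset.my_table`",
    location="asia-northeast1",
)
print(list(query_job.result()))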
At that time, there was a caching issue that caused this to happen; it has since been resolved.
I have a software product with a Ruby API that generates table-like output when queried, and I would like to dynamically connect that output to Google Cloud BigQuery.
Having read the documentation, I see there is a dynamic connector for Google Sheets and static ETL connectors to PostgreSQL and others (https://cloud.google.com/blog/big-data/2016/05/bigquery-integrates-with-google-drive).
If I have a ruby query that looks like the one below:
ruby productX-api/ruby/query_table.rb param1 param2
and this produces a table from the query:
field1,field2,field3
foo,bar,bar
xyz,abc,def
What options do I have to connect this to BigQuery?
There's no built-in connector as you'd like, but you can pretty easily load the resulting CSV file programmatically by using the Google Cloud client library for Ruby. For example:
require "google/cloud/bigquery"
bigquery = Google::Cloud::Bigquery.new
dataset = bigquery.dataset "my_dataset"
table = dataset.table "my_table"
file = File.open "my_data.csv"
load_job = table.load_job file
There is more information here on the particular load_job method.
A very basic question, but I'm not able to figure it out. Please help me out.
Q1: When we create a BigQuery table using the command below, does the data still reside in Cloud Storage?
bq load --source_format=CSV 'market.cust$20170101' \
gs://sp2040/raw/cards/cust/20170101/20170101_cust.csv
Q2: Let's say my data directory for customer files is gs://sp2040/raw/cards/cust/. The table structure is defined as:
bq mk --time_partitioning_type=DAY market.cust \
custid:string,grp:integer,odate:string
Every day I create a new directory in the bucket, such as 20170101, 20170102, ... to load a new dataset. So after the data is loaded into this bucket, do I need to run the commands below?
D1:
bq load --source_format=CSV 'market.cust$20170101' \
gs://sp2040/raw/cards/cust/20170101/20170101_cust.csv
D2:
bq load --source_format=CSV 'market.cust$20170102' \
gs://sp2040/raw/cards/cust/20170102/20170102_cust.csv
When we create a BigQuery table using the command below, does the data still reside in Cloud Storage?
Nope! BigQuery does not use Cloud Storage for storing data (unless it is a federated table linked to Cloud Storage).
Check out "BigQuery Under the Hood" with Tino Tereshko and Jordan Tigani; you will like it.
Do I need to run the commands below?
Yes, you need to load those files into BigQuery so you can query the data.
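If you want to script that daily load in Python rather than calling bq by hand, here is a minimal sketch using the google-cloud-bigquery client library; the bucket path and table name are taken from the question, and the partition decorator is passed as part of the destination table ID just like in the bq command:

from google.cloud import bigquery

client = bigquery.Client()

# Load one day's CSV file into the matching partition of market.cust.
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV

load_job = client.load_table_from_uri(
    "gs://sp2040/raw/cards/cust/20170102/20170102_cust.csv",
    "market.cust$20170102",  # partition decorator, as in the bq load command
    job_config=job_config,
)
load_job.result()  # wait for the load job to complete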
Yes, you would need to load the data into BigQuery using those commands.
However, there are a couple of alternatives:
Pub/Sub and Dataflow: You could configure Cloud Storage to publish a Pub/Sub notification when files are added, as described here. You could then have a Dataflow job that imports each file into BigQuery. Dataflow documentation
BigQuery external tables: BigQuery can query CSV files that are stored in Cloud Storage without importing the data, as described here. There is wildcard support for filenames, so it can be configured once. Performance might not be as good as storing the data directly in BigQuery.
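For the external-table route, here is a minimal Python sketch; the table name cust_external and the project ID are placeholders, while the schema and wildcard URI follow the question:

from google.cloud import bigquery

client = bigquery.Client()

# Define an external table over the CSV files already sitting in the bucket.
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://sp2040/raw/cards/cust/*"]
external_config.schema = [
    bigquery.SchemaField("custid", "STRING"),
    bigquery.SchemaField("grp", "INTEGER"),
    bigquery.SchemaField("odate", "STRING"),
]

table = bigquery.Table("my-project.market.cust_external")
table.external_data_configuration = external_config
client.create_table(table)  # data stays in Cloud Storage; only table metadata is created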