BigQuery API on a GCP Compute Engine instance

New to BigQuery on GCP
I'm trying to query tables in a public dataset on GCP.
I'd like to run the queries from my Compute Engine instance (Debian).
Is there a step-by-step guide out there?
Thanks
MS

Please find the steps below.
Create your Compute Engine instance with custom access scopes and enable BigQuery access.
If you have already created the instance, stop it and then click Edit; this will let you change the service account's access scopes.
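If you prefer the command line, here is a rough sketch of those same two steps using gcloud (the instance name, zone, and image below are placeholders, not from the original question):
# Create a Debian instance with the BigQuery scope enabled (name/zone/image are placeholders).
gcloud compute instances create bq-client \
  --zone=us-central1-a \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --scopes=https://www.googleapis.com/auth/bigquery
# For an existing instance, stop it first, then update the service account scopes.
gcloud compute instances set-service-account bq-client \
  --zone=us-central1-a \
  --scopes=https://www.googleapis.com/auth/bigquery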
Now run your query with bq as below:
bq query --nouse_legacy_sql \
'SELECT
  *
FROM
  `bigquery-public-data.samples.shakespeare`
LIMIT 2'
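To double-check from inside the instance that the BigQuery scope is actually active, you can ask the metadata server (a quick sanity check, not part of the original answer):
# Lists the OAuth scopes granted to the instance's default service account.
curl -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"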

BigQuery scheduled query failing

I have a BigQuery scheduled query that is failing with the following error:
Not found: Dataset bunny25256:dataset1 was not found in location US at [5:15]; JobID: 431285762868:scheduled_query_635d3a29-0000-22f2-888e-14223bc47b46
I scheduled the query via the SQL Workspace. When I run the query in the workspace, it works fine. The dataset and everything else that I have created is in the same region: us-central1.
Any ideas on what the problem could be, and how I could fix it or work around it?
There's nothing special about the query, it computes some statistics on a table in dataset1 and puts it in dataset2.
When you submit a query, you submit it to BigQuery at a given location. The dataset you created lives in us-central1, but your scheduled query was submitted to the US multi-region; US and us-central1 are not the same location. Change your scheduled query to run in us-central1. See the docs on dataset locations for more info.
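If you want to confirm where the dataset actually lives before changing the scheduled query, something like the following should work (dataset name taken from the error message above); look for the "location" field in the output:
# Prints dataset metadata, including its location.
bq show --format=prettyjson bunny25256:dataset1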
The dataset is not referenced correctly; it should be in the format project.dataset.table.
Instead of running something like this in BigQuery:
select * from bunny25256:dataset1
you should provide the full table reference, e.g. bunny25256:dataset1.table.
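For example, with a placeholder table name (your_table is hypothetical; substitute a table that actually exists in the dataset), a fully qualified standard SQL query would look like:
# `your_table` is a placeholder for a real table in dataset1.
bq query --nouse_legacy_sql \
'SELECT * FROM `bunny25256.dataset1.your_table` LIMIT 10'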

How to save a view using federated queries across two projects?

I'm looking to save a view which uses federated queries (via a MySQL Cloud SQL connection) across two projects. I'm receiving two different errors, depending on which project I try to save the view in.
If I try to save in the project containing the dataset I get error:
Not found: Connection my-connection-name
If I try to save in the project that contains the connection I get error:
Not found: Dataset my-project:my_dataset
My example query that crosses projects looks like:
SELECT
bq.uuid,
sql.item_id,
sql.title
FROM
`project_1.my_dataset.psa_v2_202005` AS bq
LEFT OUTER JOIN
EXTERNAL_QUERY( 'project_2.us-east1.my-connection-name',
'''SELECT item_id, title
FROM items''') AS sql
ON
bq.looks_info.query_item.item_id = sql.item_id
The documentation at https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries#known_issues_and_limitations doesn't mention any limitations here.
Is there a way around this so I can save a view using an external connection from one project and dataset from another?
Your BigQuery table is located in the US multi-region and your MySQL data source is located in us-east1. BigQuery automatically chooses to run the query in the location of your BigQuery table (i.e. in US); however, your Cloud SQL instance is in us-east1, and that's why your query fails. The BigQuery dataset and the Cloud SQL instance must be in the same location for this query to succeed.
The solution for this kind of case is to move your BigQuery dataset to the same location as your Cloud SQL instance by following the steps explained in detail in this documentation. However, us-east1 is not currently supported for copying datasets, so I would recommend creating a new connection in one of the locations mentioned in the documentation.
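As a rough sketch of that last suggestion, a Cloud SQL connection can be created in a chosen location with bq mk --connection; the location, instance, database, and credential values below are placeholders, so adjust them to your own setup:
# All values below are placeholders; pick a location compatible with your BigQuery dataset.
bq mk --connection \
  --connection_type=CLOUD_SQL \
  --location=us-central1 \
  --project_id=project_2 \
  --properties='{"instanceId":"project_2:us-central1:my-sql-instance","database":"mydb","type":"MYSQL"}' \
  --connection_credential='{"username":"db_user","password":"db_password"}' \
  my-new-connection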
I hope you find the above pieces of information useful.

BigQuery - scheduled query through CLI

Simple question regarding the bq CLI tool. I am fairly confident the answer is, as of the writing of this question, no, but I may be wrong.
Is it possible to create a scheduled query (similar to the ones that can be set up in the BigQuery console) using the bq CLI tool?
Yes, scheduled queries can now be created with bq mk --transfer_config. Please see the examples below:
To create a scheduled query with the query SELECT 1:
bq mk --transfer_config \
  --target_dataset=mydataset \
  --display_name='My Scheduled Query' \
  --schedule='every 24 hours' \
  --params='{"query":"SELECT 1","destination_table_name_template":"mytable","write_disposition":"WRITE_TRUNCATE"}' \
  --data_source=scheduled_query
Note:
--target_dataset is required.
--display_name is required.
In the --params field, query is required, and only standard SQL queries are supported.
In the --params field, destination_table_name_template is optional for DML and DDL statements but required for regular SELECT queries.
In the --params field, write_disposition behaves the same way as destination_table_name_template: required for regular SELECT queries but optional for DML and DDL (a minimal DML sketch follows these notes).
--data_source always needs to be set to scheduled_query to create a scheduled query.
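For instance, a DML statement can be scheduled without destination_table_name_template and write_disposition at all; here is a minimal sketch (the dataset, table, and column names are placeholders):
# Placeholder dataset/table/column names; the DELETE runs on the schedule with no destination table.
bq mk --transfer_config \
  --target_dataset=mydataset \
  --display_name='Nightly cleanup' \
  --schedule='every 24 hours' \
  --params='{"query":"DELETE FROM mydataset.mytable WHERE created_at < CURRENT_DATE()"}' \
  --data_source=scheduled_query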
After a scheduled query is created successfully, you should see its full resource name, for example:
Transfer configuration 'projects/<p>/locations/<l>/transferConfigs/5d1bec8c-0000-2e6a-a4eb-089e08248b78' successfully created.
To schedule a backfill for this scheduled query, run for example:
bq mk --transfer_run --start_time 2017-05-25T00:00:00Z --end_time 2017-05-25T00:00:00Z projects/<p>/locations/<l>/transferConfigs/5d1bec8c-0000-2e6a-a4eb-089e08248b78
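To confirm that the backfill runs were created, you can list the runs for the transfer config (same placeholder resource name as above):
# Lists the latest run attempts for this transfer config.
bq ls --transfer_run --run_attempt='LATEST' \
  projects/<p>/locations/<l>/transferConfigs/5d1bec8c-0000-2e6a-a4eb-089e08248b78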
Hope this helps! Thank you for using scheduled queries!

How to save Google Cloud Datalab output into BigQuery using R

I am using R in Google Cloud Datalab and I want to save my output, a table of strings created in the code itself, to BigQuery. I know there is a way to do it with Python by using bqr_create_table, so I am looking for the equivalent in R.
I have found this blog post from Gus Class on Google Cloud Platform which uses this code to write to BigQuery:
# Install bigrquery if you haven't already...
# install.packages("devtools")
# devtools::install_github("rstats-db/bigrquery")
library(bigrquery)
# Upload the data frame `stash` to the table "stash" in dataset "test_dataset".
insert_upload_job("your-project-id", "test_dataset", "stash", stash)
Where "test_dataset" is the dataset in BigQuery, "stash" is the table inside the dataset and stash is any dataframe you have define with your data.
There is more information on how to authorize with bigrquery

Google Bigquery query execution using google cloud dataflow

Is it possible to execute a BigQuery query directly from Google Cloud Dataflow and fetch the resulting data, rather than reading a whole table and then applying conditions?
For example: PCollection<TableRow> res = p.apply(BigqueryIO.execute("SELECT col1, col2 FROM publicdata:samples.shakespeare WHERE ...."))
Instead of reimplementing iteratively what BigQuery queries already provide, we could use them directly.
Thanks and Regards
Ajay K N
BigQueryIO currently only supports reading from a Table and not a Query or View (FAQ).
One way to work around this is, in your main program, to create a permanent BigQuery table by issuing a query before you run your Dataflow job. After your job runs, you can delete the table.
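A rough sketch of that workaround from the command line (the project, dataset, and table names below are placeholders): materialize the query into a permanent table before launching the Dataflow job, point BigQueryIO at that table, then drop it afterwards.
# Materialize the query result into a permanent table the pipeline can read (placeholder names).
bq query --use_legacy_sql=false \
  --destination_table=myproject:mydataset.shakespeare_subset \
  'SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` WHERE word_count > 100'
# ...run the Dataflow job, reading mydataset.shakespeare_subset with BigQueryIO...
# Remove the intermediate table once the job has finished.
bq rm -f -t myproject:mydataset.shakespeare_subset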