BigQuery Cannot read in location: asia

I am trying to query a federated table where the bucket is in the asia multi-region (multiple regions in Asia).

BigQuery dataset info:
Data location: asia-south1

When I run a simple select * from ... I get:
Cannot read in location: asia

You encounter this error because your bucket is multi-region while your BigQuery dataset is regional. The general rule for location considerations is that the external data and the dataset must be in the same location.
As of now, the only multi-region locations available for BigQuery datasets are US and EU, hence the error when the external table points to the asia multi-region. To fix this you can either:
Create a new bucket in asia-south1, transfer your files from the old bucket to the new one using Storage Transfer Service, and create the dataset in asia-south1 as well. You should then be able to query without errors.
If you really want your data in a multi-region setup, create a new bucket and BigQuery dataset that are both in US or EU. Transfer your files to the new bucket and you will be able to run your queries.
NOTE: It is not possible to edit the location of an existing bucket. When you create a bucket, you permanently define its name, its geographic location, and the project it is part of, hence the suggested fixes above.
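A rough command-line sketch of the first option (my-old-bucket, my-new-bucket, my_project, my_dataset and my_external_table are hypothetical names; gsutil cp is used here instead of Storage Transfer Service for brevity):

# Create a regional bucket in asia-south1 and copy the files over
gsutil mb -l asia-south1 gs://my-new-bucket
gsutil -m cp -r "gs://my-old-bucket/*" gs://my-new-bucket/

# Create the dataset in the same location
bq mk --location=asia-south1 --dataset my_project:my_dataset

# Recreate the federated (external) table over the new bucket
bq mkdef --source_format=CSV --autodetect "gs://my-new-bucket/*.csv" > table_def.json
bq mk --table --external_table_definition=table_def.json my_dataset.my_external_table

The source format in bq mkdef is assumed to be CSV here; adjust it to whatever your federated table actually uses.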

Related

Dataset not found error while loading data from cloud storage bucket

I am new to GCP and I have my CSV files kept in a Cloud Storage bucket. When I try to load them into my BigQuery table from the console, I receive the error below:
Error:
Not found: Dataset Myproject1:Dataset1
Can anyone help me with this?
This error generally occurs when your Cloud Storage bucket and BigQuery dataset locations are different.
For example:
If your bucket's location is asia and your BigQuery dataset's location is the United States (US), then you will end up getting this error. Your storage bucket location and your BigQuery dataset location have to be the same.
You can refer to this documentation for more details:
[1] https://cloud.google.com/bigquery/docs/locations#data-locations
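A quick way to check both locations from the command line (a sketch; my-bucket is a hypothetical bucket name, and the dataset name is taken from the error message):

# Show the bucket's location constraint
gsutil ls -L -b gs://my-bucket | grep -i "location constraint"

# Show the dataset's location
bq show --format=prettyjson Myproject1:Dataset1 | grep '"location"'

If the two values differ, recreate the dataset (or the bucket) so that both report the same location.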

Automatic ETL data before loading to Bigquery

I have CSV files added to a GCS bucket daily or weekly, and each file name contains (date + specific parameter).
The files have the schema (id + name), and we need to automatically load/ingest these files into a BigQuery table so that the final table has 4 columns (id, name, date, specific parameter).
We have tried Dataflow templates, but we couldn't get the date and the specific parameter from the file name into Dataflow.
We also tried a Cloud Function (we can get the date and specific parameter values from the file name) but couldn't add them as columns during ingestion.
Any suggestions?
Disclaimer: I have authored an article about this kind of problem using Cloud Workflows, for when you want to extract parts of the filename to use in the table definition later.
We will create a Cloud Workflow to load data from Google Storage into BigQuery. The linked article is a complete guide on how to work with workflows, connect any Google Cloud APIs, work with subworkflows and arrays, extract segments, and call BigQuery load jobs.
Let's assume we have all our source files in Google Storage. Files are organized in buckets and folders, and could be versioned.
Our workflow definition will have multiple steps.
(1) We will start by using the GCS API to list files in a bucket, using a folder as a filter.
(2) For each file, we will then use parts of the filename in the generated BigQuery table name.
(3) The workflow's last step will be to load the GCS file into the indicated BigQuery table.
We are going to use BigQuery query syntax to parse and extract the segments from the URL and return them as a single-row result. This way we also get an intermediate lesson on how to query BigQuery and process the results.
The full article with lots of code samples is here: Using Cloud Workflows to load Cloud Storage files into BigQuery
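Outside of Cloud Workflows, the same three steps can be roughly sketched with gsutil and bq commands (this is not the workflow from the article, just an illustration; the bucket name my-bucket, folder incoming/, dataset etl, the pre-existing tables etl.staging and etl.final, and the <YYYYMMDD>_<parameter>.csv filename pattern are all assumptions):

# (1) List the CSV files in the bucket folder
for uri in $(gsutil ls "gs://my-bucket/incoming/*.csv"); do
  # (2) Extract the date and parameter segments from the filename,
  #     e.g. gs://my-bucket/incoming/20240101_paramA.csv
  fname=$(basename "$uri" .csv)
  file_date="${fname%%_*}"
  param="${fname#*_}"

  # (3) Load the file into a staging table, then append it to the final
  #     table (id, name, date, parameter) while adding the two extra columns
  bq load --replace --source_format=CSV --skip_leading_rows=1 --autodetect etl.staging "$uri"
  bq query --use_legacy_sql=false \
    "INSERT INTO etl.final (id, name, date, parameter)
     SELECT id, name, PARSE_DATE('%Y%m%d', '${file_date}'), '${param}'
     FROM etl.staging"
done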

BigQuery Table creation options

When we create a table under a particular dataset, we have 5 options, such as Empty table, Google Cloud Storage, Upload, etc. My question is: if the source is Cloud Storage, where does the table get created, in BigQuery or in Cloud Storage? My intention is to dump the data into Cloud Storage and then load it into BigQuery. The same goes for the empty table option: since we explicitly define the schema, I understand that table will reside in BQ.
I have loaded the data with the script below:
bq load --source_format=CSV --skip_leading_rows=1 --autodetect --ignore_unknown_values \
commerce.balltoball gs://balltoballbucket/head_usa_names.csv
I suppose balltoballbucket refers to the storage bucket, whereas commerce.balltoball is the BigQuery reference.
Apologies for the newbie question. Thanks for your help.
If your bq load works, then the UI should work for you too. The documentation is here:
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#loading_csv_data_into_a_table (then pick the Console tab)
Select file from GCS bucket: gs://balltoballbucket/head_usa_names.csv
File Format: CSV
Dataset Name: commerce
Table Name: balltoball
For the other options, see the same page:
(Optional) Click Advanced options.
As to where the table is stored: if you pick Native table as the Table type, the data is stored inside BigQuery storage; pick External table to let the data stay on GCS and be read from it only when a query hits the table.
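For comparison with the console flow, both table types can be sketched on the command line (reusing the bucket and dataset names from the question; the mkdef-based external table and the name balltoball_ext are assumptions, not something the question asked for):

# Native table: the data is copied into BigQuery storage (same as the bq load above)
bq load --source_format=CSV --skip_leading_rows=1 --autodetect \
commerce.balltoball gs://balltoballbucket/head_usa_names.csv

# External table: the data stays in GCS and is read at query time
bq mkdef --source_format=CSV --autodetect \
gs://balltoballbucket/head_usa_names.csv > balltoball_def.json
bq mk --table --external_table_definition=balltoball_def.json commerce.balltoball_ext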

Is it possible to set the default region in the BigQuery web console?

Is it possible to set the BigQuery default region for uploads and queries in the web console? Currently, when I go to create a table, it defaults to the US region.
The location of BigQuery data is dataset-based. That means that all the tables in the same dataset will be in the same region.
When you create a new dataset via the console, you are prompted to choose one of the available locations, but you cannot change the dataset location later.
However, if it is essential to have the dataset in another region, you can follow these steps to transfer the data to a new dataset in another location.
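Since the location is fixed per dataset, one way to control it is to create each dataset with an explicit location up front. A small sketch with the bq tool (my_project and my_dataset are hypothetical names; europe-west2 is just an example region):

# Create a dataset pinned to a specific location; this cannot be changed later
bq mk --location=europe-west2 --dataset my_project:my_dataset

# Verify the location of an existing dataset
bq show --format=prettyjson my_project:my_dataset | grep '"location"'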

Google Dataflow job and BigQuery failing on different regions

I have a Google Dataflow job that is failing on:
BigQuery job ... finished with error(s): errorResult:
Cannot read and write in different locations: source: EU, destination: US, error: Cannot read and write in different locations: source: EU, destination: US
I'm starting the job with
--zone=europe-west1-b
And this is the only part of the pipeline that does anything with BigQuery:
Pipeline p = Pipeline.create(options);
p.apply(BigQueryIO.Read.fromQuery(query));
The BigQuery table I'm reading from has this in the details: Data Location EU
When I run the job locally, I get:
SEVERE: Error opening BigQuery table dataflow_temporary_table_339775 of dataset _dataflow_temporary_dataset_744662 : 404 Not Found
I don't understand why it is trying to write to a different location if I'm only reading data. And even if it needs to create a temporary table, why is it being created in a different region?
Any ideas?
I would suggest verifying:
whether the staging location for the Dataflow job is in the same region, and
whether the Cloud Storage location used by Dataflow (the temp/staging buckets) is also in the same region as the BigQuery data.
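As an illustration (not part of the original answer), the pipeline options that usually matter here are the staging and temp locations. A sketch of a launch command, assuming a hypothetical EU bucket my-eu-bucket and placeholder jar/class names; the exact runner name depends on your Dataflow SDK version:

# Keep the worker zone and the staging/temp buckets in the EU,
# matching the location of the BigQuery source table
java -cp my-pipeline.jar com.example.MyPipeline \
  --runner=DataflowPipelineRunner \
  --project=my-project \
  --zone=europe-west1-b \
  --stagingLocation=gs://my-eu-bucket/staging \
  --tempLocation=gs://my-eu-bucket/temp

The bucket itself has to be created in the EU as well, for example with gsutil mb -l EU gs://my-eu-bucket.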