Google Dataflow job and BigQuery failing on different regions

I have a Google Dataflow job that is failing on:
BigQuery job ... finished with error(s): errorResult:
Cannot read and write in different locations: source: EU, destination: US, error: Cannot read and write in different locations: source: EU, destination: US
I'm starting the job with
--zone=europe-west1-b
And this is the only part of the pipeline that does anything with BigQuery:
Pipeline p = Pipeline.create(options);
p.apply(BigQueryIO.Read.fromQuery(query));
The BigQuery table I'm reading from has this in the details: Data Location EU
When I run the job locally, I get:
SEVERE: Error opening BigQuery table dataflow_temporary_table_339775 of dataset _dataflow_temporary_dataset_744662 : 404 Not Found
I don't understand why it is trying to write to a different location if I'm only reading data. And even if it needs to create a temporary table, why is it being created in a different region?
Any ideas?

I would suggest verifying:
Whether the staging location for the Google Dataflow job is in the same zone.
Whether the Google Cloud Storage location used in Dataflow is also in the same zone.
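For example, here is a minimal sketch of a pipeline whose staging and temp locations are pinned to an EU bucket, assuming the 1.x Dataflow SDK used in the question; the bucket name and query are placeholders:

import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;

public class EuPipeline {
  public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);
    options.setZone("europe-west1-b");
    // Hypothetical EU bucket: the staging and temp data written while the job runs
    // must live in the same location as the EU dataset being read. These can also
    // be passed on the command line as --stagingLocation / --tempLocation.
    options.setStagingLocation("gs://my-eu-bucket/staging");
    options.setTempLocation("gs://my-eu-bucket/temp");

    String query = "SELECT ...";  // placeholder for the original query
    Pipeline p = Pipeline.create(options);
    p.apply(BigQueryIO.Read.fromQuery(query));
    p.run();
  }
}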

Related

Dataset not found error while loading data from cloud storage bucket

I am new to GCP. I have my CSV files in a Cloud Storage bucket. When I try to load them into my BigQuery table from the console, I receive the error below:
Error:
Not found: Dataset Myproject1:Dataset1
Can anyone help me with this?
This error generally occurs when your Cloud Storage bucket and BigQuery dataset are in different locations.
For example:
If your storage bucket location is Asia and your BigQuery dataset location is the United States (US), you will end up getting this error. Your storage bucket location and your BigQuery dataset location have to be the same.
You can refer to this documentation for more information:
[1] https://cloud.google.com/bigquery/docs/locations#data-locations
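As a quick check, a sketch like the one below (assuming the google-cloud Java client libraries; the bucket and dataset names are placeholders) prints both locations so you can confirm they match before loading:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.storage.Bucket;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class CheckLocations {
  public static void main(String[] args) {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Placeholder names; substitute your own bucket and dataset.
    Bucket bucket = storage.get("my-source-bucket");
    Dataset dataset = bigquery.getDataset("Dataset1");

    // The two locations must match (e.g. both "US", or both "asia-south1").
    System.out.println("Bucket location:  " + bucket.getLocation());
    System.out.println("Dataset location: " + dataset.getLocation());
  }
}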

BigQuery Cannot read in location: asia

I am trying to query a federated table where the bucket is in the asia multi-region (multiple regions in Asia), while the BigQuery dataset's data location is asia-south1.
When I run a simple select * from ... I get:
Cannot read in location: asia
You encounter this error because your bucket is multi-region and your BigQuery dataset is regional. The general rule for location consideration is that the external data and the dataset must be in the same location.
As of now, the only multi-region BigQuery dataset locations available are US and EU, hence the error when using the Asia multi-region for the external table. To fix this you can either:
Create a new bucket in asia-south1, transfer your files from the old bucket to the new one using the Storage Transfer Service, and then create the dataset in asia-south1 as well; you should then be able to query without errors.
Or, if you really want your data in a multi-region setup, create a new bucket and BigQuery dataset that are both in US or EU. Just transfer your files to the new bucket and you will be able to execute queries.
NOTE: It is not possible to edit the location of the bucket. When you create a bucket, you permanently define its name, its geographic location, and the project it is part of. Thus the suggested fix mentioned above.
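As a rough sketch of the first option, assuming the google-cloud Java client libraries (the bucket and dataset names are placeholders), the new bucket and dataset can both be created in asia-south1 like this; the files themselves would still be copied over with the Storage Transfer Service:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.DatasetInfo;
import com.google.cloud.storage.BucketInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class RecreateInAsiaSouth1 {
  public static void main(String[] args) {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Placeholder names; both resources are pinned to the same region.
    storage.create(BucketInfo.newBuilder("my-new-asia-bucket")
        .setLocation("asia-south1")
        .build());
    bigquery.create(DatasetInfo.newBuilder("my_dataset")
        .setLocation("asia-south1")
        .build());
  }
}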

Bigquery - Error creating scheduled query: Cannot create a transfer in JURISDICTION_US when destination dataset is located in JURISDICTION_EU

BigQuery has started throwing this error:
"Error creating scheduled query: Cannot create a transfer in JURISDICTION_US when destination dataset is located in JURISDICTION_EU".
My datasets are all in the EU but I don't understand why it is trying to create a transfer in the US.
Has anyone had a similar issue and been able to resolve it?
I ran into this issue and did the following:
"In the UI, under More -> Query Settings, then look for Processing Location down the bottom before Advanced."
Then I had to reload the page for it to work.
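For ad-hoc (non-scheduled) queries the same idea can be expressed in code by pinning the job's location explicitly. This is only a sketch assuming the google-cloud-bigquery Java client, with a placeholder job name and query; it is not the scheduled-query fix itself:

import java.util.UUID;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class RunQueryInEu {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Placeholder query; in practice this would reference the EU dataset.
    QueryJobConfiguration config = QueryJobConfiguration
        .newBuilder("SELECT 1")
        .build();

    // Pin the job to the EU location so it is not created in the US jurisdiction.
    JobId jobId = JobId.newBuilder()
        .setJob("eu_query_" + UUID.randomUUID())  // hypothetical job name
        .setLocation("EU")
        .build();

    bigquery.create(JobInfo.of(jobId, config)).waitFor();
  }
}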

BigQuery "Backend Error, Job aborted" when exporting data

The export job for one of my tables fails in BigQuery with no error message. I checked the job ID hoping to get more info, but it just says "Backend Error, Job aborted". I used the command-line tool with this command:
bq extract --project_id=my-proj-id --destination_format=NEWLINE_DELIMITED_JSON 'test.table_1' gs://mybucket/export
I checked this question, but I know it is not a problem with my destination bucket in GCS, because exporting other tables to the same bucket works fine.
The only difference here is that this table has a repeated record field and each JSON row can get pretty large, but I did not find any limit for this in the BigQuery docs.
Any ideas on what the problem could be?
Job Id from one of my tries: bqjob_r51435e780aefb826_0000015691dda235_1

Issues creating table from bucket file

I have a big table (about 10 million rows) that I'm trying to pull into BigQuery. I had to upload the CSV into a bucket due to the size constraints when creating the table. When I try to create the table using the Datastore Backup option, the job fails with the error:
Error Reason:invalid. Get more information about this error at Troubleshooting Errors: invalid.
Errors:
gs://es_main/provider.csv does not contain valid backup metadata.
Job ID: liquid-cumulus:job_KXxmLZI0Ulch5WmkIthqZ4boGgM
Start Time: Dec 16, 2015, 3:00:51 PM
End Time: Dec 16, 2015, 3:00:51 PM
Destination Table: liquid-cumulus:ES_Main.providercloudtest
Source URI: gs://es_main/provider.csv
Source Format: Datastore Backup
I've troubleshot by taking a small sample file of rows from the same table and uploading it with the CSV option in the table creation flow; that works without any errors and I can view the data just fine.
I'm just wondering what the metadata should be set to with the "Edit metadata" option within the bucket, or if there is some other workaround I'm missing. Thanks
The error message for the job that you posted is telling you that the file you're providing is not a Datastore Backup file. Note that "Datastore" here means Google Cloud Datastore, which is another storage solution that it sounds like you aren't using. A Cloud Datastore Backup is a specific file type from that storage product which is different from CSV or JSON.
Setting the file metadata within the Google Cloud Storage browser, which is where the "Edit metadata" option you're talking about lives, should have no impact on how BigQuery imports your file. It might be important if you were doing something more involved with your file from Cloud Storage, but it isn't important to BigQuery as far as I know.
To load a CSV file from Google Cloud Storage into BigQuery, make sure to select CSV as the source format and Google Cloud Storage as the load source.
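As a rough programmatic equivalent, here is a sketch using the google-cloud-bigquery Java client, reusing the dataset, table, and source URI from the job details above; schema autodetection is an assumption, and you could pass an explicit schema instead:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class LoadCsvFromGcs {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    TableId table = TableId.of("ES_Main", "providercloudtest");
    // FormatOptions.csv() is the equivalent of picking "CSV" in the console
    // instead of "Datastore Backup".
    LoadJobConfiguration load = LoadJobConfiguration
        .newBuilder(table, "gs://es_main/provider.csv", FormatOptions.csv())
        .setAutodetect(true)  // assumption: let BigQuery infer the schema
        .build();

    bigquery.create(JobInfo.of(load)).waitFor();
  }
}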