Federated table/query not working - "Cannot read in location: us-west1"

I have a GCS bucket in US-WEST1:
That bucket has two files:
wiki_1b_000000000000.csv.gz
wiki_1b_000000000001.csv.gz
I've created an external table definition to read those files like so:
The dataset where this external table definition exists is also in the US.
When I query it with:
SELECT
  *
FROM
  `grey-sort-challenge.bigtable.federated`
LIMIT
  100
...I get the following error:
Error: Cannot read in location: us-west1
I tested with asia-northeast1 and it works fine.
Why isn't this working for the US region?

I faced the same issue earlier. See Google's answer - you must use us-central1 for now: https://issuetracker.google.com/issues/76127552#comment11

For people from Europe:
If you get the error Cannot read in location: EU while trying to read from an external source (a regional GCS bucket), you have to place your data in the europe-west1 region, as per the same comment. Unfortunately, this is not reflected in the documentation yet.

I wanted to create a federated (external) table to continually load data from a new CSV file that was imported each day.
In attempting to do so I kept getting "Error: Cannot read in location: xxxx".
I solved the problem as follows:
1. I recreated a NEW bucket, this time selecting US (multiple regions).
2. I then went back to BigQuery and created a NEW dataset with the data location set to United States (US).
Presto! I am now able to query a (constantly updating) external table.
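For reference, a minimal sketch of the same setup using the google-cloud-bigquery Python client could look like the following; the bucket name is a placeholder, and it assumes the bucket was recreated as US multi-region so its location matches the dataset's (the project, dataset and table names are taken from the question above):

from google.cloud import bigquery

client = bigquery.Client(project="grey-sort-challenge")

# Dataset pinned to the US multi-region location.
dataset = bigquery.Dataset("grey-sort-challenge.bigtable")
dataset.location = "US"
client.create_dataset(dataset, exists_ok=True)

# External (federated) table over the gzipped CSVs in the US multi-region bucket.
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-us-multiregion-bucket/wiki_1b_*.csv.gz"]  # placeholder bucket name
external_config.compression = "GZIP"
external_config.autodetect = True

table = bigquery.Table("grey-sort-challenge.bigtable.federated")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)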

Related

BQ client.load_table_from_uri behaves differently in Airflow than in standalone Python

I am trying to load a CSV into BQ using a custom operator in Airflow.
My custom operator is using
load_job_config = bigquery.LoadJobConfig(
    schema=self.schema_fields,
    skip_leading_rows=self.skip_leading_rows,
    source_format=bigquery.SourceFormat.CSV,
)
load_job = client.load_table_from_uri(
    'gs://' + self.source_bucket + "/" + self.source_object,
    self.dsp_tmp_dataset_table,
    job_config=load_job_config,
)
The issue I am facing is that I always get this error:
google.api_core.exceptions.BadRequest: 400 Provided Schema does not match Table nonprod-cloud-composer:dsp_data_transformation.tremorvideo_daily_datafeed. Field Date has changed type from TIMESTAMP to DATE
The exact same code, when run outside of Airflow as a standalone Python script, works fine.
I am using exactly the same schema object and the same source CSV file; only the environment is different.
Below are the high-level steps I followed:
1. Created the table in BQ.
2. Loaded it using
LOAD DATA OVERWRITE XXXX
FROM FILES (
  format = 'CSV',
  uris = ['gs://xxx.csv']);
This worked fine and the data was loaded into the table.
3. Truncated the table and tried to run the custom operator that has the code listed above. Then I faced the errors.
4. Created a simple Python program to test the bq load job, and that works fine too.
It's just that whenever the same load job is triggered through Airflow, the schema detection fails and leads to all sorts of errors.
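For comparison, a self-contained version of the standalone test script from step 4 might look like the sketch below; the URI, project name and schema fields are placeholders (the real operator passes self.schema_fields and self.skip_leading_rows), and the key point is that the schema types must match the existing table exactly:

from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("Date", "DATE"),  # must match the existing table's column type exactly
    bigquery.SchemaField("Impressions", "INTEGER"),  # illustrative second column
]

job_config = bigquery.LoadJobConfig(
    schema=schema,
    skip_leading_rows=1,
    source_format=bigquery.SourceFormat.CSV,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/tremorvideo/daily.csv",  # placeholder URI
    "my-project.dsp_data_transformation.tremorvideo_daily_datafeed",  # placeholder project
    job_config=job_config,
)
load_job.result()  # wait for completion so any schema errors surface here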

DBT - [WARNING]: Did not find matching node for patch

I keep getting the warning below when I run dbt run; I can't find anything in the dbt documentation about why it occurs or how to fix it.
[WARNING]: Did not find matching node for patch with name 'vGenericView' in the 'models' section of file 'models\generic_schema\schema.sql'
Did you by chance recently upgrade to dbt 1.0.0? If so, this means that you have a model, vGenericView, defined in a schema.yml, but you don't have a vGenericView.sql model file to which it corresponds.
If all views and tables defined in the schema are 1-to-1 with model files, then try running dbt clean, and dbt test or dbt run afterward.
Not sure what happened in my project, but I got frustrated looking for missing and/or misspelled files when the warning was just caused by leftovers from previously compiled files that had not been cleaned out. I had previously moved views around to different schemas and renamed others.
So the mistake here is in the naming:
The model name in the models.yml file should, for example, be: employees
And the SQL file should be named: employees.sql
So your models.yml will look like:
version: 2

models:
  - name: employees
    description: "View of employees"
And there must be a model file named employees.sql.
One case where this will happen is if you have the same data source defined in two different schema.yml files (or whatever you call them).

pyspark.sql.utils.IllegalArgumentException: requirement failed: Temporary GCS path has not been set

On Google Cloud Platform, I am trying to submit a pyspark job that writes a dataframe to BigQuery.
The code that executes the writing is as the following:
finalDF.write.format("bigquery") \
    .mode('overwrite') \
    .option("table", "[PROJECT_ID].dataset.table") \
    .save()
And I get the mentioned error in the title. How can I set the GCS temporary path?
As the GitHub repository of the spark-bigquery-connector states, one can specify it when writing:
df.write \
    .format("bigquery") \
    .option("temporaryGcsBucket", "some-bucket") \
    .save("dataset.table")
Or in a global manner:
spark.conf.set("temporaryGcsBucket","some-bucket")
Property "temporaryGcsBucket" needs to be set either at the time of writing dataframe or while creating sparkSession.
.option("temporaryGcsBucket","some-bucket")
or like .option("temporaryGcsBucket","some-bucket/optional_path")
1. finalDF.write.format("bigquery") .mode('overwrite').option("temporaryGcsBucket","some-bucket").option("table","[PROJECT_ID].dataset.table") .save()
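For completeness, here is a minimal end-to-end sketch of the session-level (global) approach; it assumes a cluster (e.g. Dataproc) that already has the spark-bigquery connector available, and the bucket and table names are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write-to-bigquery").getOrCreate()
spark.conf.set("temporaryGcsBucket", "some-bucket")  # staging bucket for the indirect write

# Stand-in for the real dataframe produced by the job.
finalDF = spark.createDataFrame([(1, "a")], ["id", "value"])

finalDF.write.format("bigquery") \
    .mode("overwrite") \
    .option("table", "my-project.dataset.table") \
    .save()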

gcloud nodejs createDataset: How to set data location?

When I create a dataset using the expression below, it commits, but it doesn't set any location, and when I try to copy data from another dataset I get an error because they are in different locations: source: EU, destination: US.
bigquery.createDataset('my-dataset', function(err, dataset, apiResponse) {});
I didn't find anything in the docs https://googlecloudplatform.github.io/gcloud-node/#/docs/v0.24.1/bigquery?method=createDataset
Is it possible to do it?
This is an oops on our part from when we wrote the createDataset code. Sorry about that; I'll try to get a fix out in the next release.
(Issue opened at https://github.com/GoogleCloudPlatform/gcloud-node/issues/941)

google big query: export table to own bucket results in unexpected error

I'm stuck trying to export a table to my Google Cloud Storage bucket.
Example job id: job_0463426872a645bea8157604780d060d
I tried the Cloud Storage target with a lot of different variations; all reveal the same error. If I try to copy the natality report, it works.
What am I doing wrong?
Thanks!
Daniel
It looks like the error says:
"Table too large to be exported to a single file. Specify a uri including a * to shard export." Try switching the destination URI to something like gs://foo/bar/baz*
Specify the file extension along with the pattern. For example:
gs://foo/bar/baz*.gz in the case of GZIP (compressed)
gs://foo/bar/baz*.csv in the case of CSV (uncompressed)
Here foo is the bucket name, and the bar directory can be, for example, a date in string format generated on the fly.
I was able to do it with:
bq extract --destination_format=NEWLINE_DELIMITED_JSON myproject:mydataset.mypartition gs://mybucket/mydataset/mypartition/{*}.json
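For reference, a rough Python-client equivalent of the sharded export above might look like this (project, dataset, table and bucket names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON,
)

# The wildcard (*) in the destination URI lets BigQuery shard the large table
# across multiple output files.
extract_job = client.extract_table(
    "myproject.mydataset.mypartition",
    "gs://mybucket/mydataset/mypartition/part-*.json",
    job_config=job_config,
)
extract_job.result()  # wait for the export to finish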