Weird issue: Bigquery dataset not found - google-bigquery

One of our BQ datasets is no longer accessible via BQ Web UI and Cloud Shell.
It shows message "Not found: Dataset project:dataset" immediately upon opening the UI.
We tried a couple of bq shell commands as well:
bq ls: successfully lists the "missing" dataset
bq ls dataset: returns "BigQuery error in ls operation: Not found: Dataset project:dataset"
But we were able to query the views inside and access the contents of the dataset via PowerBI.
IAM Permission: Owner
Has anyone encountered a similar issue?

I had a similar issue - the BigQuery client library would list the dataset when I called ListDatasets(), but attempting to call UploadCsv() with the same dataset ID would return 404 Dataset not found.
Turns out it was because I had selected 'asia-northeast1' as the Data Location when creating the dataset - it doesn't tell you when you create the dataset that this region is treated differently, but a line in the BigQuery docs says:
If your data is in a location other than the US or EU multi-regional
location, you must specify the location when you perform actions such
as loading data, querying data, and exporting data.
Re-creating the dataset in the US region fixed my issue. Alternatively, you could use the options in the docs above to specify the 'asia-northeast1' location every time.
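As a rough illustration of the second option, here is a small sketch that builds a `bq` command line with the location pinned through the tool's global `--location` flag. The helper name `bq_command` and the placeholder query are made up for this example; the same idea applies to load and export jobs.

```python
import shlex


def bq_command(location, subcommand, *args):
    """Return a bq argument list that pins the job to one location."""
    # --location is a global bq flag, so it goes before the subcommand.
    return ["bq", f"--location={location}", subcommand, *args]


cmd = bq_command(
    "asia-northeast1",         # the dataset's Data Location
    "query",
    "--use_legacy_sql=false",
    "SELECT 1",                # placeholder query, not from the thread
)

# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the location in one helper means no call site can forget it, which is the failure mode described above.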

This was caused by a caching issue at the time, which has since been resolved.

Related

Bigquery - Error creating scheduled query: Cannot create a transfer in JURISDICTION_US when destination dataset is located in JURISDICTION_EU

BigQuery has started throwing up this error
"Error creating scheduled query: Cannot create a transfer in
JURISDICTION_US when destination dataset is located in
JURISDICTION_EU".
My datasets are all in the EU but I don't understand why it is trying to create a transfer in the US.
Has anyone had a similar issue and been able to resolve it?
I ran into this issue and did the following:
"In the UI, under More -> Query Settings, then look for Processing Location down the bottom before Advanced."
Then I had to reload the page for it to work.

When I run snowflake stage query I get aws error

I've created an s3 linked stage on snowflake called csv_stage with my aws credentials, and the creation was successful.
Now I'm trying to query the stage like below
select t.$1, t.$2 from @sandbox_ra.public.csv_stage/my_file.csv t
However the error I'm getting is
Failure using stage area. Cause: [The AWS Access Key Id you provided is not valid.]
Any idea why? Do I have to pass something in the query itself?
Thanks for your help!
Ultimately let's say my s3 location has 3 different csv files. I would like to load each one of them individually to different snowflake tables. What's the best way to go about doing this?
Regarding the last part of your question: you can load multiple files with one COPY INTO command by listing the file names or using a regex pattern. But since you have 3 different files for 3 different tables, you need three separate COPY INTO commands.
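A minimal sketch of the "one COPY INTO per table" idea, generating the three statements from a file-to-table mapping. Only the stage name comes from the question; the file names, table names, and the `copy_statements` helper are hypothetical, and the CSV file format options would need adjusting to the real files.

```python
def copy_statements(stage, file_to_table):
    """Build one COPY INTO statement per (file, table) pair."""
    statements = []
    for file_name, table in file_to_table.items():
        # Names are interpolated directly, so only use trusted identifiers here.
        statements.append(
            f"COPY INTO {table} FROM @{stage} "
            f"FILES = ('{file_name}') "
            "FILE_FORMAT = (TYPE = CSV)"
        )
    return statements


# Hypothetical file and table names for illustration.
mapping = {
    "customers.csv": "customers",
    "orders.csv": "orders",
    "products.csv": "products",
}

statements = copy_statements("sandbox_ra.public.csv_stage", mapping)
for statement in statements:
    print(statement + ";")
```

Each generated statement can then be run through whichever Snowflake client you already use.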
Regarding querying your stage you can find some more hints in these questions:
Missing List-permissions on AWS - Snowflake - Failure using stage area. Cause: [The AWS Access Key Id you provided is not valid.] and
https://community.snowflake.com/s/question/0D50Z00008EKjkpSAD/failure-using-stage-area-cause-access-denied-status-code-403-error-code-accessdeniedhow-to-resolve-this-error
https://aws.amazon.com/de/premiumsupport/knowledge-center/access-key-does-not-exist/
I found out that the AWS credentials I provided were wrong. After fixing them, the query worked.
This approach works for importing data from a public S3 bucket into a Snowflake table:
COPY INTO SNOW_SCHEMA.table_name FROM 's3://test-public/new/solution/file.csv'

How to save a view using federated queries across two projects?

I'm looking to save a view which uses federated queries (from a MySQL Cloud SQL connection) between two projects. I'm receiving two different errors (depending on which project I try to save in).
If I try to save in the project containing the dataset I get error:
Not found: Connection my-connection-name
If I try to save in the project that contains the connection I get error:
Not found: Dataset my-project:my_dataset
My example query that crosses projects looks like:
SELECT
bq.uuid,
sql.item_id,
sql.title
FROM
`project_1.my_dataset.psa_v2_202005` AS bq
LEFT OUTER JOIN
EXTERNAL_QUERY( 'project_2.us-east1.my-connection-name',
'''SELECT item_id, title
FROM items''') AS sql
ON
bq.looks_info.query_item.item_id = sql.item_id
The documentation at https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries#known_issues_and_limitations doesn't mention any limitations here.
Is there a way around this so I can save a view using an external connection from one project and dataset from another?
Your BigQuery table is located in US and your MySQL data source is located in us-east1. BigQuery automatically runs the query in the location of your BigQuery table (i.e. US), but your Cloud SQL MySQL instance is in us-east1, which is why your query fails. The BigQuery table and the Cloud SQL instance must be in the same location for this query to succeed.
The solution in cases like this is to manually move your BigQuery dataset to the same location as your Cloud SQL instance by following the steps explained in detail in this documentation. However, us-east1 is not currently supported for copying datasets, so I recommend creating a new connection in one of the locations mentioned in the documentation.
I hope you find the above pieces of information useful.

Cannot delete phantom table using `bq` or BigQuery UI

I'm trying to remove a table from a dataset using bq without success:
BigQuery error in rm operation: Not found: Table carbon-web-...:AS_....Orders_01Jun2014_31May2015_3704438_01
The table is listed whenever I run bq ls AS_....
I'm seeing similar behavior when I try to access the table from the BigQuery UI. When I click on the link to the table, I receive an error message:
Unable to find table: carbon-web-...:AS_....Orders_01May2017_31May2017
Is there a way to force a refresh on the metadata for this dataset?
These are tables in a transient state that shouldn't have been exposed. We found a bug in a table-listing feature we were rolling out: in some rare scenarios, tables in a transient state would show up in the list. We have now reverted it.

loading avro files with different schemas into one bigquery table

I have a set of avro files with slightly varying schemas which I'd like to load into one bq table.
Is there a way to do that with one line? Any automatic way of handling the schema differences would be fine for me.
Here is what I tried so far.
0) If I try to do it in a straightforward way, bq fails with error:
bq load --source_format=AVRO myproject:mydataset.logs gs://mybucket/logs/*
Waiting on bqjob_r4e484dc546c68744_0000015bcaa30f59_1 ... (4s) Current status: DONE
BigQuery error in load operation: Error processing job 'iow-rnd:bqjob_r4e484dc546c68744_0000015bcaa30f59_1': The Apache Avro library failed to read data with the follwing error: EOF reached
1) A quick search shows that there is a --schema_update_option=ALLOW_FIELD_ADDITION option, but adding it to the bq load job changes nothing. ALLOW_FIELD_RELAXATION does not change anything either.
2) Actually, schema id is mentioned in the file name, so files look like:
gs://mybucket/logs/*_schemaA_*
gs://mybucket/logs/*_schemaB_*
Unfortunately, bq load does not allow more than one asterisk (as the bq manual also states):
bq load --source_format=AVRO myproject:mydataset.logs gs://mybucket/logs/*_schemaA_*
BigQuery error in load operation: Error processing job 'iow-rnd:bqjob_r5e14bb6f3c7b6ec3_0000015bcaa641f3_1': Not found: Uris gs://otishutin-eu/imp/2016-06-27/*_schemaA_*
3) When I try to list the files explicitly, the list happens to be too long, so bq load does not work either:
bq load --source_format=AVRO myproject:mydataset.logs $(gsutil ls gs://mybucket/logs/*_schemaA_* | xargs | tr ' ' ',')
Too many positional args, still have ['gs://mybucket/logs/log_schemaA_2658.avro,gs://mybucket/logs/log_schemaA_2659.avro,gs://mybucket/logs/log_schemaA_2660.avro,...
4) When I try to use files as external table and list the files explicitly in external table definition, I also get "too many files" error:
BigQuery error in query operation: Table definition may not have more than 500 source_uris
I understand that I could first copy the files to different folders and then process them folder by folder, and this is what I'm doing now as a last resort. But this is only a small part of a data processing pipeline, and copying is not acceptable as a production solution.
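One possible direction, sketched under assumptions: list the objects once, group the URIs by the schema tag in the file name, and split each group into batches small enough for whatever per-request limit applies (the 500 source_uris limit above, for example). The function names, the regex, and the schemaB sample URI are illustrative. Whether submitting each batch as its own load job or table definition fits the pipeline is untested here.

```python
import re
from collections import defaultdict

# Assumes the schema id appears in the object name as "_schema<ID>_".
SCHEMA_TAG = re.compile(r"_schema([A-Za-z0-9]+)_")


def group_by_schema(uris):
    """Map each schema id to the list of URIs whose name carries that id."""
    groups = defaultdict(list)
    for uri in uris:
        match = SCHEMA_TAG.search(uri)
        if match:  # URIs without a schema tag are skipped
            groups[match.group(1)].append(uri)
    return dict(groups)


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


uris = [
    "gs://mybucket/logs/log_schemaA_2658.avro",
    "gs://mybucket/logs/log_schemaA_2659.avro",
    "gs://mybucket/logs/log_schemaA_2660.avro",
    "gs://mybucket/logs/log_schemaB_0001.avro",  # hypothetical file name
]

groups = group_by_schema(uris)
for schema_id, schema_uris in sorted(groups.items()):
    # Batch size 2 is only for the demo; use the real limit in practice.
    for batch in chunked(schema_uris, 2):
        print(schema_id, len(batch), ",".join(batch))
```

Grouping in memory avoids both the second wildcard and the physical copy into per-schema folders; each batch is just a list of URIs to hand to the loading step.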