Error In Query Operation: Cannot start a job without a project id - google-bigquery

I keep getting an error using the bq command line tool. For example, I can easily run this command and it returns the table that I want:
head -n 10 xxxx-bq:name_name.Report2
Note that xxxx-bq is the project id, and name_name is the dataset id. When I try to run a query against this table, say the following:
query "SELECT count(*) FROM xxxx-bq:name_name.Report2
I get an error that says that I cannot start a job without a project id. What am I doing wrong here? How can I specify the project ID in the query? I know people have asked similar questions, but I have been following along and my approach is not working.

Do you have a project id? If not, this page can help you set one up: https://developers.google.com/bigquery/bq-command-line-tool-quickstart
All BigQuery jobs (which include queries) require a project id: the project that gets billed for any damage done by the job (by damage, I mean work).
You should either set your default project id (you can do this by running bq init) or set the project id that you're running the job under via --project_id=.
So if you're running bq shell, you would use bq shell --project_id=myprojectid instead.
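For example, a minimal sketch using the names from the question (the query body itself is illustrative):

# One-off: pass the project id on each invocation.
bq query --project_id=xxxx-bq --use_legacy_sql=false 'SELECT COUNT(*) FROM `xxxx-bq.name_name.Report2`'

# Persistent: write a default project id into ~/.bigqueryrc.
bq init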

Strange... I just started working with bq and got the same error, but it didn't like me passing --project_id=[myprojectid]. Although I was already authed with gcloud auth login, I had to run bq init (which seemingly didn't do anything) -- after that, my queries worked just fine.

Related

BigQuery scheduled query failing

I have a BigQuery scheduled query that is failing with the following error:
Not found: Dataset bunny25256:dataset1 was not found in location US at [5:15]; JobID: 431285762868:scheduled_query_635d3a29-0000-22f2-888e-14223bc47b46
I scheduled the query via the SQL Workspace. When I run the query in the workspace, it works fine. The dataset and everything else that I have created is in the same region: us-central1.
Any ideas on what the problem could be, and how I could fix it or work around it?
There's nothing special about the query; it computes some statistics on a table in dataset1 and puts them in dataset2.
When you submit a query, you submit it to BigQuery at a given location. The dataset you created lives in us-central1, but your query was submitted to the US multi-region. The locations US and us-central1 are not the same. Change your scheduled query to run in us-central1. See the docs on locations for more info.
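For an ad-hoc run you can also pin the location from the CLI. A minimal sketch using the dataset from the question (the table name some_table is hypothetical):

# Route the query to the dataset's region instead of the US multi-region.
bq query --use_legacy_sql=false --location=us-central1 'SELECT COUNT(*) FROM `bunny25256.dataset1.some_table`'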
The dataset reference is not provided correctly: it should be in the format project.dataset.table.
If you run the below in BigQuery:
select * from bunny25256:dataset1
it will fail; you should provide bunny25256:dataset1.table, i.e. include the table name.
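In standard SQL the fully qualified form uses dots and backticks; a sketch with a hypothetical table name my_table:

SELECT * FROM `bunny25256.dataset1.my_table` LIMIT 10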

BigQuery Scheduled Query won't run

I've got data buckets set up in GCS and I'm using BigQuery to run all my .csv files from that bucket to build a table. That works flawlessly. I made a simple deduplication query that, when manually run, selects only distinct rows and creates a new table with "DeDupe" appended (code below). That runs flawlessly.
CREATE OR REPLACE TABLE
`project-name-123456.dataset_2022.dataset 2022 DeDuped` AS
SELECT
DISTINCT *
FROM
`project-name-123456.dataset_2022.dataset 2022`
The issue I am having is with scheduling that query. Every time it tries to run I get the error "Error status: Not found: Dataset project-name-123456:dataset_2022 was not found in location US; JobID: project-name-123456:628d7766-0000-2d36-a82f-94eb2c0a664a"
The only thing I can figure is that I have the data location for the dataset set to "us-central1", as it has a free tier. And when I go to my scheduled query, whether I select the same data location or "Default", it always changes to "US Multiple".
Is there a way to fix this?
Or do I need to create my dataset in "US Multiple"?
I'm trying to cut down on costs as much as possible by keeping it in us-central1.
EDIT: Seems like I just needed to delete and recreate the scheduled query again. Chatted with Google Support and they sorted it. Sorry all!
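For anyone who prefers to recreate a scheduled query from the command line instead of the UI, a hedged sketch (the display name and schedule are illustrative, and pinning the region with the --location flag is an assumption; the query is the one from the question):

# Recreate the scheduled query pinned to the dataset's region.
bq mk --transfer_config \
  --data_source=scheduled_query \
  --location=us-central1 \
  --display_name='DeDupe dataset 2022' \
  --schedule='every 24 hours' \
  --params='{"query":"CREATE OR REPLACE TABLE `project-name-123456.dataset_2022.dataset 2022 DeDuped` AS SELECT DISTINCT * FROM `project-name-123456.dataset_2022.dataset 2022`"}'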

materialized view manual refresh

I was experimenting with BigQuery materialized view manual refresh, but when I try to call the procedure as per the documentation
bq query --use_legacy_sql=false "CALL BQ.REFRESH_MATERIALIZED_VIEW('my-project-id.my-dataset-name.mv_name')"
I get back :
Error in query string: Error processing job 'project-id:bqjob_r508c7d13330d5fcd_0000017801fd2694_1': Not found: Dataset my-project-id.my-dataset-name was not found in location US at [1:6]
my-project-id is my GCP project; my-dataset-name is the dataset, which is not in the US.
Note I get the same error when using the Query window from the web console.
I haven't been able to find any system documentation on the stored procedure itself, though it seems to take only one argument; I had hoped it would be able to figure out the location to run in from the project id and dataset supplied.
We don't support routing for materialized view refresh right now. If this would be valuable to you, please submit a feature request in our public issue tracker. In the meantime you can refresh your views by explicitly picking a location in the UI or using the --location flag on the command line.
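For example, the documented call with an explicit location (the region here is illustrative; use your dataset's actual region):

# Route the refresh job to the dataset's region.
bq query --use_legacy_sql=false --location=europe-west2 "CALL BQ.REFRESH_MATERIALIZED_VIEW('my-project-id.my-dataset-name.mv_name')"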
I think you might want to use "`" instead of "'" around the dataset.table name.
Try the below (note the single quotes around the SQL, so the shell doesn't interpret the backticks):
bq query --use_legacy_sql=false 'CALL BQ.REFRESH_MATERIALIZED_VIEW(`my-project-id.my-dataset-name.mv_name`)'
Or in the BQ console:
CALL BQ.REFRESH_MATERIALIZED_VIEW(`my-project-id.my-dataset-name.mv_name`)

BigQuery scheduled query: Cannot create a transfer in JURISDICTION_US when destination dataset is located in REGION_ASIA_SOUTHEAST_1

I am getting this error quite frequently while trying to create a scheduled query:
Error creating scheduled query: Cannot create a transfer in JURISDICTION_US when destination dataset is located in REGION_ASIA_SOUTHEAST_1
I just need a scheduled query to overwrite data in a table.
I had the same problem while trying to create a scheduled query with python:
400 Cannot create a transfer in REGION_EUROPE_WEST_1 when destination dataset is located in JURISDICTION_EU
I figured out that even though my project is located in europe-west1, my destination dataset was located in the multinational location: Europe. I had to update my parent path from project_path to '{project_path}/locations/eu' for it to work.
I hope that it helps someone.
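For reference, a minimal sketch of that fix with the google-cloud-bigquery-datatransfer Python client (project id, dataset, query, and schedule are illustrative):

from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

# Pin the parent path to the dataset's multi-region ("eu") instead of
# letting the transfer default to a different jurisdiction.
parent = "projects/my-project/locations/eu"

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="my_dataset",
    display_name="my scheduled query",
    data_source_id="scheduled_query",
    params={
        "query": "SELECT 1 AS x",
        "destination_table_name_template": "my_table",
        "write_disposition": "WRITE_TRUNCATE",
    },
    schedule="every 24 hours",
)

config = client.create_transfer_config(parent=parent, transfer_config=transfer_config)
print("Created scheduled query:", config.name)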
It looks like a bug in BQ.
I got the same problem, with source and destination datasets both located in the EU.
Just for testing purposes, I changed the destination to another EU dataset, and it worked.
I then updated the scheduled query back to my first destination choice and now it works. I can't explain why, but it seems to be a workaround.
You can try starting from the Scheduled Queries page in the BigQuery UI and clicking the "+ Create scheduled query" button; that way I don't get the error. If I start directly in the BigQuery editor I get the same error.
From what I tried, it may happen because a table with the same id as the destination table already exists. This happens even if that table is the result of manually running the query and saving it.
I faced the same issue recently. I tried two things and they worked:
try setting the query location to the destination dataset/table location, then try scheduling the query.
If that does not work, try running the query and saving the results to the intended table first, i.e. create the destination table by storing the results of the query you are trying to schedule (see the sketch below), then try scheduling the query.
Both approaches worked for me, in different cases.
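For the second approach, a hedged sketch: materialize the destination table once, in the dataset's location, before scheduling (the names and region are illustrative):

# Create the destination table by saving the query results once.
bq query --use_legacy_sql=false \
  --location=asia-southeast1 \
  --destination_table=my_dataset.my_table \
  --replace \
  'SELECT * FROM `my_dataset.source_table`'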
I had this error and tried many of the solutions in this thread. I tried a new session in an incognito window and it worked so I believe this is a transient issue as suggested.
I just scheduled the query select 1 and then edited it to the one I needed – it worked.
I think the trouble is the start time of the schedule. If it is in the past relative to local time, then BQ tries to run the request on another server.
I had the same issue. The way I solved it was to disable the editor tabs (there is a button at the top), then open the query settings and set the processing location to EU manually.
I was using the bq command when I came across this issue and was able to resolve it by adding the parameter --location='europe-west1'.
So my final query looked like this (the SQL is wrapped in double quotes so the shell passes the inner single quotes through):
bq query \
--use_legacy_sql=false \
--display_name='my_table' \
--location='europe-west1' \
"create or replace table my_dataset.my_table as (select * from external_query('projects/my_mysql_connection/locations/europe-west1/connections/bi', '(select * from my_table)'))"

Get the Last Modified date for all BigQuery tables in a BigQuery Project

I have several datasets within a BigQuery project which are populated by various job engines and applications. I would like to maintain a dashboard of the Last Modified dates for every table within our project to monitor job failures.
Are there any command line or SQL commands which could provide this list of Last Modified dates?
For a SQL command you could try this one:
#standardSQL
SELECT *, TIMESTAMP_MILLIS(last_modified_time) AS last_modified
FROM `dataset.__TABLES__`
WHERE table_id = 'table_id'
I recommend, though, that you see if you can log these errors at the application level. By doing so you can also understand why something didn't work as expected.
If you are already using GCP you can make use of Stackdriver (it works on AWS as well). We started using it in our projects and I recommend giving it a try (we tested with Python applications, though; I'm not sure how the tool performs with other clients, but it should be quite similar).
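To cover every dataset in the project rather than just one, a hedged sketch that loops over datasets with the bq CLI (assumes a default project is configured and jq is installed):

# List the project's datasets, then query each one's __TABLES__ metadata.
for ds in $(bq ls --format=json | jq -r '.[].datasetReference.datasetId'); do
  bq query --use_legacy_sql=false \
    "SELECT '${ds}' AS dataset, table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified FROM \`${ds}.__TABLES__\` ORDER BY last_modified DESC"
done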
I've just queried stacked GA4 data using the following code:
SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS last_modified
FROM `analytics_#########.__TABLES__`
WHERE table_id LIKE 'events_2%'
I have kept the 2 in the events filter to ensure my intraday tables do not pull through as well.