Query error: Resources exceeded during query execution: The query could not be executed in the allotted memory - google-bigquery

I'm getting an error when I try to execute the following query:
select r.*
from dataset.table1 r
where id NOT IN (select id from staging_data.table1);
It's basically a query to load incremental data into a table. dataset.table1 has 360k rows and the incremental table in staging_data has 40k. But when I run this from my script to load the results into another table, I get the error:
Resources exceeded during query execution: The query could not be executed in the allotted memory
This started happening last week; before that it was working fine.
I've looked for solutions online, but none of them work in my case.
Does anyone know how to solve it?
I changed the cron job time and it worked. Thank you!
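As an aside, a NOT IN filter like this can often be rewritten as an anti-join, which BigQuery can sometimes execute with less memory. A minimal sketch using the Python client (the rewrite assumes id is never NULL in either table; with NULLs it is not equivalent to NOT IN):

from google.cloud import bigquery

client = bigquery.Client()

# Anti-join formulation of the incremental filter: keep only rows from
# dataset.table1 whose id has no match in staging_data.table1.
sql = """
SELECT r.*
FROM dataset.table1 r
LEFT JOIN staging_data.table1 s ON r.id = s.id
WHERE s.id IS NULL
"""
rows = list(client.query(sql).result())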

You can try writing the results to a destination table, since BigQuery has a limit on the maximum response size it can return. You can do that whether you are using Legacy or Standard SQL, and you can follow the steps to do it in the documentation.
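For instance, with the Python client you can point the query job at a destination table so the results are stored instead of returned in the response. A minimal sketch (the project and destination table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder destination table; query results are written here instead
# of being returned as a size-limited query response. For Legacy SQL you
# would additionally set allow_large_results=True.
job_config = bigquery.QueryJobConfig(
    destination="my-project.staging_data.table1_increment"
)

sql = """
SELECT r.*
FROM dataset.table1 r
WHERE r.id NOT IN (SELECT id FROM staging_data.table1)
"""
job = client.query(sql, job_config=job_config)
job.result()  # wait for the job to finish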

Related

Out of Memory error while creating table but SELECT works

I'm trying to CREATE a table using CREATE TABLE AS, which gives the following error:
[Amazon](500310) Invalid operation: Out Of Memory:
Details:
-----------------------------------------------
error: Out Of Memory:
code: 1004
context: alloc(524288,MtPool)
query: 6041453
location: alloc.cpp:405
process: query2_98_6041453 [pid=8414]
-----------------------------------------------;
I'm getting this error every time I execute the query, but executing just the SELECT part of the query (without the CREATE TABLE AS) works fine. The result has around 38k rows. However, I see a drastic difference in the bytes returned by the sequential scan on one table between the two cases.
(Query plan screenshots omitted: one for the plain SELECT, one for CREATE TABLE AS SELECT.)
I fail to understand why there's so much difference between these two scenarios and what can be done to mitigate it. I also tried creating a TEMP TABLE, but that also results in a memory error.
I'm not great at understanding query plans (I've never found a detailed guide to them for Redshift, so if you could link to some resource, that'd be a bonus).
Update: Also tried creating the table first and then INSERTing the data using SELECT; that also gives the same error.
Update 2: Tried set wlm_query_slot_count to 40; and even 50, but still the same error.
We ran into a similar issue after our clusters got updated to the latest release (1.0.10694).
Two things that helped:
1. Changing your WLM to allocate more memory to your query (in our case, we switched to WLM Auto).
2. Allocating a higher query_slot_count to your query: set wlm_query_slot_count to 2; to allocate 2 query slots, for example.
We suspect that AWS may have changed something about memory management in the most recent updates. I'll update once we hear back.
As a workaround, you could try inserting the records in batches.
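For example, a rough sketch of batching with psycopg2, splitting the insert on a modulus of a numeric key (the connection details, table names, and id column are assumptions, not from the question):

import psycopg2

# Placeholder connection parameters; substitute your cluster's.
conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="secret",
)
conn.autocommit = True

# Insert in 10 smaller batches, partitioned on a numeric id column, so
# each statement materializes a smaller intermediate result.
with conn.cursor() as cur:
    for i in range(10):
        cur.execute(
            "INSERT INTO target_table "
            "SELECT * FROM source_table WHERE id %% 10 = %s",
            (i,),
        )
conn.close()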
Solved this using a manual WLM implementation.

Statement object has been closed in querying from Amazon Redshift

On attempting to execute a simple query on a table (dimensions: 1,131,714,069 rows by 22 columns), I am running into the error:
[Amazon][JDBC](12080) Statement object has been closed.
Research online has unfortunately not provided much insight into this error.
I do not encounter this error every time I execute a query; so far its occurrence seems unpredictable. The query that most recently caused this error was a very simple SELECT ... FROM ... WHERE with no subqueries and only one condition in the WHERE clause.
The query ran for about 22 minutes before failing; however, after waiting a few minutes and running it again, it completed successfully in a matter of seconds. That said, this kind of unpredictability and unreliability is exactly what I'm trying to prevent.
If it helps, the IDE that I am using to connect to my Redshift database is TeamSQL.
What could be causing this error, and what steps could I take to prevent it?

BQ PY Client Libraries :: client.run_async_query() vs client.run_sync_query()

I'm looking at BQ PY Client Libraries:
There used to be two different operations to query a table:
client.run_async_query()
client.run_sync_query()
But in the latest version (v1.3) it seems there's only one operation to execute a query, Client.query(). Did I understand that correctly?
And looking at the GitHub code, it looks like Client.query() just returns the query job, not the actual query results/data, making me conclude it works in a similar way to client.run_async_query(). Is there no replacement for the client.run_sync_query() operation anymore, which returned query results (data) synchronously/immediately?
Thanks for the clarification!
Cheers!
Although .run_sync_query() has been removed, the Query reference says that short jobs may return results right away if they don't take long to finish:
query POST /projects/projectId/queries
Runs a BigQuery SQL query and returns results if the query completes within a specified timeout.
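In practice, Client.query() starts the job and QueryJob.result() blocks until it finishes, so a synchronous call pattern is still available. A minimal sketch:

from google.cloud import bigquery

client = bigquery.Client()

# Client.query() returns a QueryJob immediately (asynchronous start)...
job = client.query("SELECT 1 AS x")

# ...and .result() blocks until the job finishes, then returns an
# iterator over the result rows, much like the old run_sync_query().
for row in job.result():
    print(row.x)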

Query Failed Error: Unexpected. Please try again. Google BigQuery

I am unable to execute a simple select query (select * from [<DatasetName>.<TableName>] LIMIT 1000) in Google BigQuery. It gives me the error below:
Query Failed
Error: Unexpected. Please try again.
Job ID: job_51MSM2uo1vE5QomZA8Sv_RUE7M8
The table contains around 10 records. I am able to execute queries on other tables.
It looks like there was a temporary issue where we were seeing timeouts when querying tables that had recently been inserted into via streaming insert (tabledata.insertAll()). We're currently addressing the underlying issue, but it should be working now. Please ping this thread if you see it again.
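For reference, a streaming insert of the kind mentioned above looks roughly like this with the current Python client (the table name and row are made up for illustration):

from google.cloud import bigquery

client = bigquery.Client()

# insert_rows_json() calls the tabledata.insertAll API under the hood.
errors = client.insert_rows_json(
    "my-project.my_dataset.my_table",  # hypothetical table
    [{"id": 1, "region": "us"}],
)
if errors:
    print("insert errors:", errors)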

Strange result when running query over a table created from a result of another query

Since yesterday, 1-09-2012, I can't run any queries over a table that has been created from the result of another query.
example query:
SELECT region FROM [project.table] LIMIT 1000
result:
Query Failed
Error: Field 'region' is incompatible with the table schema.
49077933619
These kinds of queries have passed successfully every day for the last couple of weeks. Has anybody else encountered a similar problem?
We added some additional schema checking on Friday. I was unable to reproduce the problem, but I'll look into your examples (I was able to find your failed job in the logs). In the meantime, I'm in the process of turning off the additional schema checking. Please try again and let us know if the problem continues.