BQ PY Client Libraries :: client.run_async_query() vs client.run_sync_query() - google-bigquery

I'm looking at BQ PY Client Libraries:
There used to be two different operations to query a table
client.run_async_query()
client.run_sync_query()
But in the latest version (v1.3) it seems there's only one operation to execute a query, Client.query(). Did I understand that correctly?
And looking at the GitHub code, it looks like Client.query() just returns the query job, not the actual query results/data... which makes me conclude it works in a similar way to client.run_async_query(). Is there no replacement anymore for the client.run_sync_query() operation, which returned query results (data) synchronously/immediately?
Thanks for the clarification!
Cheers!

Although .run_sync_query() has been removed, the Query reference says that jobs may return results right away if they don't take long to finish:
query POST /projects/projectId/queries
Runs a BigQuery SQL query and returns results if the query completes within a specified timeout.
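In the current client, the usual pattern that replaces run_sync_query() is to call Client.query(), which submits the job, and then QueryJob.result(), which blocks until the job finishes and iterates over the rows. A minimal sketch, assuming google-cloud-bigquery >= 1.3 with default credentials (the fetch_rows helper name is mine):

```python
def fetch_rows(sql):
    """Start a query and block until its rows are available.

    Client.query() only submits the job; QueryJob.result() waits for
    it to finish and returns an iterator of Row objects, which gives
    you the synchronous behaviour of the old run_sync_query().
    """
    from google.cloud import bigquery  # assumes google-cloud-bigquery >= 1.3

    client = bigquery.Client()
    query_job = client.query(sql)    # returns a QueryJob, not rows
    return list(query_job.result())  # blocks until the job completes
```

For short queries the job often completes within the service's default timeout, so .result() returns almost immediately, much like the old synchronous call.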

Related

Databricks SQL - How to get all the rows (more than 1000) in the first run?

Currently, in Databricks, if we run a query it always returns 1000 rows in the first run. If we need all the rows, we need to execute the query again.
In situations where we know we need the full data set (1000+ rows), is there a workaround to get all the rows in the first run without re-executing the query?
There should be a down arrow next to the download button which allows you to Download full results. Are you seeing that in your interface? (Source)

How to calculate accumulated sum of query timings?

I have a SQL file running many queries. I want to see the accumulated sum of all query timings. I know that if I turn on timing, i.e. call
\timing
query 1;
query 2;
query 3;
...
query n;
at the beginning of the script, it will show the time each query takes to run. However, I need the accumulated result across all queries, without having to add them up manually.
Is there a systematic way to do this? If not, how can I fetch the interim times so I can store them in a variable?
pg_stat_statements is a good module that provides a means for tracking execution statistics of the queries run on a server.
First, add pg_stat_statements to shared_preload_libraries in the
postgresql.conf file. To find where this .conf file lives on your
filesystem, run show config_file;
shared_preload_libraries = 'pg_stat_statements'
Restart the Postgres database, then create the extension:
CREATE EXTENSION pg_stat_statements;
Now, the module provides a View, pg_stat_statements, which helps you to analyze various query execution metrics.
Reset the statistics collected so far, before running your queries:
SELECT pg_stat_statements_reset();
Now, execute your script file containing queries.
\i script_file.sql
You now have the timing statistics of all the queries executed. To get the total time taken, simply run
select sum(total_time) from pg_stat_statements
where query !~* 'pg_stat_statements';
The time you get is in milliseconds; it can be converted to the desired format using various timestamp-related Postgres functions.
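The final sum query above can also be run from a script; a minimal sketch using any DB-API connection such as psycopg2 (the total_query_time_ms helper name is mine, and it assumes pg_stat_statements is already set up as described above):

```python
def total_query_time_ms(conn):
    """Return the summed server-side execution time, in milliseconds,
    recorded by pg_stat_statements (excluding the module's own queries)."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT COALESCE(sum(total_time), 0) "
            "FROM pg_stat_statements "
            "WHERE query !~* 'pg_stat_statements'"
        )
        return cur.fetchone()[0]
```

Call pg_stat_statements_reset() before your script and this helper after it to get the accumulated total in one number.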
If you want to time the whole script, on linux or mac you can use the time utility to launch the script.
The measurement in this case is a bit more than the sum of the raw query times, because it includes some overhead of starting and running the psql command. On my system this overhead is around 20ms.
$ time psql < script.sql
…
real 0m0.117s
user 0m0.008s
sys 0m0.007s
The real value is the time it took to execute the whole script, including the aforementioned overhead.
The approach in this answer is a crude, simple client-side way to measure the runtime of the overall script. It is not suitable for measuring millisecond-precision server-side execution times, but it may still be sufficient for many use cases.
Kaushik Nayak's solution is a far more precise method for timing executions directly on the server. It also provides much more insight into the execution (e.g. per-query times).

Query error: Resources exceeded during query execution: The query could not be executed in the allotted memory

I'm getting an error when I try to execute the following query:
select r.*
from dataset.table1 r
where id NOT IN (select id from staging_data.table1);
It's basically a query to load incremental data into a table. dataset.table1 has 360k rows and the incremental data in staging_data has 40k. But when I try to run this in my script to load into another table, I get the error:
Resources exceeded during query execution: The query could not be executed in the allotted memory
This started happening in the last week; before that it was working fine.
I've looked for solutions online, but none of them work in my case.
Does anyone know how to solve it?
I changed the cronjob time and it worked. Thank you!
You can try writing the results to another table, as BigQuery has a limit on the maximum response size that can be processed. You can do that with either Legacy or Standard SQL, and you can follow the steps in the documentation.
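One way to sketch that with the Python client is to set a destination table on the job config, so results are written to a table instead of being returned in the response. This assumes google-cloud-bigquery >= 1.3; the helper name and the dataset/table arguments are hypothetical placeholders:

```python
def run_to_table(sql, dataset_id, table_id):
    """Run a query and write its results to a destination table,
    sidestepping the maximum-response-size limit for large results."""
    from google.cloud import bigquery  # assumes google-cloud-bigquery >= 1.3

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig()
    job_config.destination = client.dataset(dataset_id).table(table_id)
    # allow_large_results matters for legacy SQL; standard SQL always
    # allows large results when a destination table is set.
    job_config.allow_large_results = True
    job = client.query(sql, job_config=job_config)
    job.result()  # wait for the job to finish
    return job.destination
```

You can then query or export the destination table in manageable chunks.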

DB visualizer query running for a long time without producing any output

Hi, I connected to Hive using DbVisualizer and fired a simple join query to fetch two columns according to the filter applied. But the query ran for more than an hour with the status "Executing". I fired the same query in Hive, logged in through PuTTY, and got the result in less than 20 seconds.
Can anyone help me understand why the query in DbVisualizer ran for a long time without producing any output?
Query used:
SELECT
A.ORDER,
B.ORDER1
FROM
ORDER A
INNER JOIN DUORDER B ON A.ORDER=B.ORDER1 AND A.TYPE ='50'
(The result set contains only 400 records)
To analyze why, we need more info. Please open Tools->Debug Window in DbVisualizer and enable debugging (just for DbVisualizer, not JDBC). Execute the query again, stopping it after some time (say a few minutes). Then submit a support request using Help->Contact Support, and make sure Attach Logs is enabled. This will give us the info we need to see what may be wrong.
Best Regards,
Hans (DbVisualizer team)

BigQuery executing only one query

I'm trying to run a query in the UI, but I get the error:
Error: 6.1 - 0.0: Only one query can be executed at a time.
I don't think there are any other queries running, and this has lasted for a while now. Surely it can handle more than one query at a time?? How long will this be stuck? How can I turn BigQuery off and on again :p
This error means that the query string has been parsed as containing multiple queries. Check that you do not have multiple top-level SELECT statements in the query text box.
From the specific message, I would guess your second query begins on or around line 6.