Are LIMIT 0 queries billed in BigQuery? - google-bigquery

I was thinking of adding a LIMIT 0 to BigQuery queries and using this together with DBT to check that the whole DAG, its dependencies and so on are correct without incurring costs.
I can't find any official documentation stating whether such queries are free.
Are those queries not billed?

Correct, a LIMIT 0 query is not billed for any data. You can run a dry run to verify:
dzagales@cloudshell:~ (elzagales)$ bq query --use_legacy_sql=false --dry_run 'SELECT * FROM `bigquery-public-data.austin_311.311_service_requests` LIMIT 0'
Query successfully validated. Assuming the tables are not modified, running this query will process 0 bytes of data.
dzagales@cloudshell:~ (elzagales)$ bq query --use_legacy_sql=false --dry_run 'SELECT * FROM `bigquery-public-data.austin_311.311_service_requests` LIMIT 1'
Query successfully validated. Assuming the tables are not modified, running this query will process 254787 bytes of data.
Above you can see that a LIMIT 0 query bills 0 bytes, while a LIMIT 1 query still scans the whole table, because LIMIT does not reduce the amount of data read.
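For the DBT use case in the question, a minimal sketch of the idea (the project, dataset, table and column names below are hypothetical, not from the question) is to append LIMIT 0 to each query and dry-run it: BigQuery still parses the statement and resolves every referenced table and column, so broken dependencies surface as errors, while the validator should report 0 bytes processed.
-- Hypothetical query wrapped with LIMIT 0: references are still validated,
-- but the dry run / query validator should report 0 bytes processed.
SELECT customer_id, order_date, amount
FROM `my_project.my_dataset.orders`
LIMIT 0;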

Related

How to generate incredibly large arrays in SQL

When trying to generate a large array using the following command
GENERATE_ARRAY(1467331200, 1530403201, 15)
I'm getting the following error:
google.api_core.exceptions.BadRequest: 400 GENERATE_ARRAY(1467331200, 1530403201, 15) produced too many elements
Is there a way to generate an array of said size?
There is a limit on the number of result elements: at most 1,048,575.
Test: bq query --dry_run --nouse_legacy_sql "[replace query below]"
Query: select GENERATE_ARRAY(1, 1048575) as test_array;
Output: Query successfully validated. Assuming the tables are not modified, running this query will process 0 bytes of data.
Query: select GENERATE_ARRAY(1, 1048576) as test_arr;
Output: GENERATE_ARRAY(1, 1048576, 1) produced too many elements
There's no mention of this limit in the documentation, so I suggest that you either send documentation feedback on that page or file a feature request to increase the limit or, if possible, remove it.
A possible workaround is to concatenate several smaller arrays with ARRAY_CONCAT.
Example: SELECT ARRAY_CONCAT(GENERATE_ARRAY(1,1048575), GENERATE_ARRAY(1,1048575))...
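Extending that workaround, here is a sketch (not from the answer above) of one way to produce the full range from the question in chunks that each stay under the 1,048,575-element limit; the chunk size of 1,000,000 is arbitrary, and the resulting row must still fit within BigQuery's row size limits.
-- Generate the range in sub-arrays of at most 1,000,000 elements each,
-- then flatten them back into a single array in order.
SELECT
  ARRAY_CONCAT_AGG(
    GENERATE_ARRAY(chunk_start, LEAST(chunk_start + 15 * 999999, 1530403201), 15)
    ORDER BY chunk_start
  ) AS big_array
FROM UNNEST(GENERATE_ARRAY(1467331200, 1530403201, 15 * 1000000)) AS chunk_start;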

Optimize Hive query requesting data from two partitions

Currently, I am using Hive with S3 storage.
I have 1,000,000 partitions in total right now. I am facing a problem where, if I do:
select sum(metric) from foo where pt_partition_number = 'bar1'
select sum(metric) from foo where pt_partition_number = 'bar2'
the query execution time is less than 1 second for each.
But if I do
select sum(metric) from foo where pt_partition_number IN ('bar1','bar2')
the query takes nearly 30 seconds. I suspect Hive is doing a directory scan in the case of the second query.
Is there a way to optimize the query?
My request pattern always accesses two partitions' data.
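No answer was posted in the thread; purely as an illustration (not a confirmed fix), one common rewrite is to replace the IN predicate with a UNION ALL of the two single-partition queries, so each branch is pruned to exactly one partition:
-- Hypothetical rewrite of the IN query; whether it helps depends on how the
-- planner and metastore handle partition pruning for each branch.
select sum(metric) from (
  select metric from foo where pt_partition_number = 'bar1'
  union all
  select metric from foo where pt_partition_number = 'bar2'
) t;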

Does BigQuery charge for querying only the stream buffer?

I have a day partitioned table with approx 300k rows in the streaming buffer. When running an interactive, non-cached, standard SQL query using
SELECT .. FROM .. WHERE _PARTITIONTIME IS NULL
The query validator says:
Valid: This query will process 0 B when run.
And after executing, the job information tab says:
Bytes Processed 0 B
Bytes Billed 0 B
The query is certainly returning real-time results each time I run it. Is this actually a free operation?
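For reference, a fleshed-out version of the pattern in the question (the table and column names are hypothetical): on an ingestion-time partitioned table, rows still sitting in the streaming buffer have a NULL _PARTITIONTIME, so this selects only buffered rows.
-- Hypothetical names; returns only rows that are still in the streaming buffer.
SELECT event_id, payload
FROM `my_project.my_dataset.events`
WHERE _PARTITIONTIME IS NULL;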

BigQuery returning "query too large" on short query

I've been trying to run this query:
SELECT
created
FROM
TABLE_DATE_RANGE(
program1_insights.insights_,
TIMESTAMP('2016-01-01'),
TIMESTAMP('2016-02-09')
)
LIMIT
10
And BigQuery complains that the query is too large.
I've experimented with writing the table names out manually:
SELECT
created
FROM program1_insights.insights_20160101,
program1_insights.insights_20160102,
program1_insights.insights_20160103,
program1_insights.insights_20160104,
program1_insights.insights_20160105,
program1_insights.insights_20160106,
program1_insights.insights_20160107,
program1_insights.insights_20160108,
program1_insights.insights_20160109,
program1_insights.insights_20160110,
program1_insights.insights_20160111,
program1_insights.insights_20160112,
program1_insights.insights_20160113,
program1_insights.insights_20160114,
program1_insights.insights_20160115,
program1_insights.insights_20160116,
program1_insights.insights_20160117,
program1_insights.insights_20160118,
program1_insights.insights_20160119,
program1_insights.insights_20160120,
program1_insights.insights_20160121,
program1_insights.insights_20160122,
program1_insights.insights_20160123,
program1_insights.insights_20160124,
program1_insights.insights_20160125,
program1_insights.insights_20160126,
program1_insights.insights_20160127,
program1_insights.insights_20160128,
program1_insights.insights_20160129,
program1_insights.insights_20160130,
program1_insights.insights_20160131,
program1_insights.insights_20160201,
program1_insights.insights_20160202,
program1_insights.insights_20160203,
program1_insights.insights_20160204,
program1_insights.insights_20160205,
program1_insights.insights_20160206,
program1_insights.insights_20160207,
program1_insights.insights_20160208,
program1_insights.insights_20160209
LIMIT
10
And not surprisingly, BigQuery returns the same error.
This Q&A says that "query too large" means that BigQuery is generating an internal query that's too large to be processed. But in the past, I've run queries over way more than 40 tables with no problem.
My question is: what is it about this query in particular that's causing this error, when other, larger-seeming queries run fine? Is it that doing a single union over this number of tables is not supported?
Answering the question: what is it about this query in particular that's causing this error?
The problem is not in the query itself; the query looks fine.
I just ran a similar query against ~400 daily tables with a total of 5.8B (billion) rows and 5.7 TB of data, and it completed with:
Query complete (150.0s elapsed, 21.7 GB processed)
SELECT
Timestamp
FROM
TABLE_DATE_RANGE(
MyEvents.Events_,
TIMESTAMP('2015-01-01'),
TIMESTAMP('2016-02-12')
)
LIMIT
10
You should look elsewhere for the cause. By the way, are you sure you are not over-simplifying the query in your question?
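As a side note (not part of the original answer), the same date range can also be expressed in standard SQL with a wildcard table and _TABLE_SUFFIX instead of the legacy TABLE_DATE_RANGE function:
-- Standard SQL sketch using the sharded table names from the question.
SELECT created
FROM `program1_insights.insights_*`
WHERE _TABLE_SUFFIX BETWEEN '20160101' AND '20160209'
LIMIT 10;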

BigQuery resources exceeded during query execution, quota?

With Google BigQuery, I'm running a query with a group by and receive the error, "resources exceeded during query execution".
Would an increased quota allow the query to run?
Any other suggestions?
SELECT
ProductId,
StoreId,
ProductSizeId,
InventoryDate as InventoryDate,
avg(InventoryQuantity) as InventoryQuantity
FROM BigDataTest.denorm
GROUP EACH BY
ProductSizeId,
InventoryDate,
ProductId,
StoreId;
The table is around 250GB, project # is 883604934239.
Thanks to a combination of reducing the data involved and recent updates to BigQuery, this query now runs.
The filter
where ABS(HASH(ProductId) % 4) = 0
was used to reduce the 1.3 billion rows in the table (% 3 still failed).
With the test data set it gives "Error: Response too large to return in big query", which can be handled by writing the results out to a destination table: click Enable Options, use 'Select Table' (and enter a table name), then check 'Allow Large Results'.
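Put together, the reduced query would look roughly like this (legacy SQL, simply combining the original statement with the hash filter described above; enable 'Allow Large Results' with a destination table if the response is too large):
-- The original aggregation restricted to roughly a quarter of the ProductIds.
SELECT
  ProductId,
  StoreId,
  ProductSizeId,
  InventoryDate AS InventoryDate,
  AVG(InventoryQuantity) AS InventoryQuantity
FROM BigDataTest.denorm
WHERE ABS(HASH(ProductId) % 4) = 0
GROUP EACH BY
  ProductSizeId,
  InventoryDate,
  ProductId,
  StoreId;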