Google BigQuery unable to process a larger result set, getting "Response too large to return" or "Resources exceeded during query execution" - google-bigquery

I am currently working with a large table (~105M records) in a C# application.
When I query the table with an 'Order By' or 'Order Each By' clause, I get a "Resources exceeded during query execution" error.
If I remove the 'Order By' or 'Order Each By' clause, I get a "Response too large to return" error.
Here are sample queries for the two scenarios (I am using the Wikipedia public table):
SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title ORDER BY Id, Title DESC
SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title
Here are my questions:
What is the maximum size of a BigQuery response?
How do we select all the records in a query request rather than via the 'Export Method'?

1. What is the maximum size of a BigQuery response?
As mentioned in the quota policy, the maximum response size for queries is 10 GB compressed (unlimited when returning large query results).
2. How do we select all the records in a query request rather than via the 'Export Method'?
If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration (a configuration sketch follows the limitations list below).
Queries that return large results will take longer to execute, even if the result set is small, and are subject to additional limitations:
You must specify a destination table.
You can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
Window functions can return large query results only if used in conjunction with a PARTITION BY clause.
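For reference, here is a minimal sketch of such a job configuration with the google-cloud-bigquery Python client; the project, dataset, and table names are placeholders, not from the original question:

from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials and project

# allowLargeResults requires an explicit destination table (placeholder name)
destination = bigquery.TableReference.from_string("my-project.my_dataset.wikipedia_grouped")

job_config = bigquery.QueryJobConfig(
    allow_large_results=True,  # lifts the compressed response-size cap
    destination=destination,
    use_legacy_sql=True,       # the sample queries above are legacy SQL
)

# Note: no top-level ORDER BY, per the limitations above.
query = "SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title"
client.query(query, job_config=job_config).result()  # blocks until the job completes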
Read more about how to paginate to get the results here, and also see the BigQuery Analytics book, starting around page 200, where it is explained how Jobs.getQueryResults works together with the maxResults parameter and its blocking mode.
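As a hedged illustration of that pagination with the Python client: the maxResults value maps to the page_size argument, and the row iterator issues successive getQueryResults calls behind the scenes (the per-row handling here is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
job = client.query(
    "SELECT Id, Title FROM [publicdata:samples.wikipedia] LIMIT 100000",
    job_config=bigquery.QueryJobConfig(use_legacy_sql=True),
)

rows = job.result(page_size=10000)  # page_size maps to the maxResults parameter

for page in rows.pages:  # each page corresponds to one getQueryResults response
    for row in page:
        print(row["Id"], row["Title"])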
Update:
Query Result Size Limitations - When you run a normal query in BigQuery, the response size is limited to 10 GB of compressed data. Sometimes, it is hard to know what 10 GB of compressed data means. Does it get compressed 2x? 10x? The results are compressed within their respective columns, which means the compression ratio tends to be very good. For example, if you have one column that is the name of a country, there will likely be only a few different values. When you have only a few distinct values, there isn't a lot of unique information, and the column will generally compress well. If you return encrypted blobs of data, they will likely not compress well because they will be mostly random. (This is explained in the book linked above, on page 220.)

Related

Grafana displaying incomplete data from query results from BigQuery

I'm connected to BigQuery from Grafana, and I'm getting wildly different results from queries compared to the BigQuery console and other query tools I've connected to BigQuery. Simple things like SELECT * FROM table yield very different results: Grafana returns 1,400 records from a SELECT * on a table with 4 million records. Has anyone seen this before, or does anyone have an idea what is going on?
It seems that in the Query options, the default value for Max data points is limiting the number of results to 1369. Set it to a higher number and the query will return that many rows (the value is used as the LIMIT in the query sent to the BigQuery API).

How to calculate the IO of a query in Google BigQuery for write operations

I am going through all the Java docs available on Google's side to find the IO of a query. There is a field called totalBytesProcessed, but it tells you how much data is processed (read) by your query. For write queries, that field is not populated at all. In that case, how can I see the IO of a query? There are fields related to queryOutput and DmlStats, but those only report the number of rows affected, and we can't calculate bytes from a row count, since the bytes can vary from row to row. To conclude my question: is there any way to know the number of bytes processed for write-operation queries?
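No direct solution is given here, but to make the fields concrete, this is roughly where they surface in the Python client; the table name is hypothetical, and as the question notes, only row counts (not bytes written) are reported for DML:

from google.cloud import bigquery

client = bigquery.Client()
job = client.query("UPDATE `my_dataset.events` SET flag = TRUE WHERE id = 1")  # hypothetical table
job.result()

print(job.total_bytes_processed)  # bytes read by the query; not populated for pure write work
print(job.num_dml_affected_rows)  # DML reports affected rows only, not bytes written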

How to interpret query processed GB in BigQuery?

I am using a free trial of Google BigQuery. This is the query that I am using:
SELECT * FROM `test`.events WHERE subject_id = 124 AND id = 256064 AND time >= '2166-01-15T14:00:00' AND time <= '2166-01-15T14:15:00' AND id_1 IN (3655, 223762, 223761, 678, 211, 220045, 8368, 8441, 225310, 8555, 8440)
This query is expected to return at most 300 records, not more. However, I see a message reporting how many GB the query will process. The table this query operates on is really huge; does that figure indicate the table size? I also ran this query multiple times a day, and as a result I got the error below:
Quota exceeded: Your project exceeded quota for free query bytes scanned. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
How long do I have to wait for this error to go away? Is the daily limit 1 TB? If yes, then I didn't use even close to 400 GB.
How do I view my daily usage?
If I can edit the quota, can you let me know which option I should be editing?
Can you help me with the above questions?
According to the official documentation, "BigQuery charges for queries by using one metric: the number of bytes processed (also referred to as bytes read)", regardless of how large the output is. What this means is that if you do a count(*) on a 1 TB table, you will supposedly be charged $5, even though the final output is very minimal.
Note that due to storage optimizations that BigQuery performs internally, the bytes processed might not equal the actual raw size of the table when you created it.
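To see the billed metric for a query before it counts against your quota, you can do a dry run; here is a minimal sketch with the Python client, reusing the query shape from the question:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

# A dry run validates the query and reports the bytes it would process
# without actually scanning the table or consuming quota.
job = client.query("SELECT * FROM `test`.events WHERE subject_id = 124", job_config=job_config)
print(f"This query would process {job.total_bytes_processed} bytes")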
For the error you're seeing, browse the Google Console to "IAM & admin", then "Quotas", where you can search for quotas specific to the BigQuery service.
Hope this helps!
Flavien

Google BigQuery results

I am getting only part of the result from the BigQuery API.
Earlier, I solved the issue of 100,000 records per result using iterators.
However, now I'm stuck at another obstacle: if I select more than 6-7 columns in a result, I do not get the complete result set, whereas if I select a single column, I get the complete result.
Can there be a size limit as well for results in the BigQuery API?
There are some limits for query jobs.
In particular:
Maximum response size: 128 MB compressed.
Of course, it is unlimited when writing large query results to a destination table (and then reading from there); a sketch of that workaround follows.
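A sketch of that workaround with the Python client: write the query output to a destination table, then page through it with table reads, which are not subject to the 128 MB response cap (dataset and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
destination = bigquery.TableReference.from_string("my-project.my_dataset.wide_results")

job_config = bigquery.QueryJobConfig(destination=destination, allow_large_results=True, use_legacy_sql=True)
client.query("SELECT Id, Title FROM [publicdata:samples.wikipedia]", job_config=job_config).result()

table = client.get_table(destination)  # fetch the schema for row decoding
for row in client.list_rows(table, page_size=10000):  # tabledata reads, paged
    print(row)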

Why does DataStax DevCenter restrict results to 1000 rows?

There is a limit of 1000 rows displayed for tables in DataStax DevCenter. Is there any reason for having this option? When querying with SELECT count(*) FROM tablename; the performance from Cassandra is going to be the same whether it displays 1000 records or the complete record set.
DevCenter version 1.6.0 introduces result set paging, which allows you to browse all the rows in your result set.
In DevCenter 1.6.0, the "with limit" value sets the paging size, i.e. the number of records to view per page, and is still capped at 1000. However, you can now page forward (and back) through all of the query results.
A related new feature allows you to export all results to a file, either as CSV or INSERT statements. Right-click in the results view area and select "Export all results to File as [CSV|Insert]".
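The same paging mechanism is exposed programmatically; for illustration, a minimal sketch with the DataStax Python driver, assuming a local contact point and a hypothetical keyspace:

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])          # assumed contact point
session = cluster.connect("my_keyspace")  # hypothetical keyspace

# fetch_size is the driver-level page size, analogous to DevCenter's paging size
statement = SimpleStatement("SELECT * FROM users", fetch_size=1000)

for row in session.execute(statement):  # further pages are fetched transparently
    print(row)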
This is by design; consider it a safeguard that prevents you from accidentally fetching thousands or millions of rows, which, among other problems, could have a serious impact on your network's bandwidth usage.
When you run the query in DataStax DevCenter 1.6, it displays 1000 records in the result view as the selected limit, but if you export the same result to CSV, it will give you all the records you are looking for.
I run DataStax DevCenter 1.4.
I run the query with a LIMIT and it gives me the actual count.
But LIMIT is capped at the maximum value of a signed integer (2147483647):
SELECT count(*) FROM users LIMIT 2147483647; -- ALLOW FILTERING;