I'm connected to BigQuery from Grafana, and I'm getting wildly different results from queries compared to the BigQuery console and other query tools I've connected to BigQuery. Simple things like select * from table yield very different results. Grafana is returning 1400 records from a select * on a table with 4 million records. Anyone seen this before or have any idea what is going on?
It seems that in the Query options, the default value for Max data points is limiting the number of results to 1369. Set it to a higher number and the query will return up to that number of rows (it is used as the LIMIT value in the query sent to the BigQuery API).
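If you want to double-check that the cap comes from Grafana rather than from BigQuery itself, a minimal sketch with the BigQuery Python client can compare the full row count with what the panel shows; the project, dataset, and table names below are placeholders:

# Sketch: query BigQuery directly to verify that the 1369-row cap is
# Grafana's "Max data points" setting, not a BigQuery-side limit.
from google.cloud import bigquery

client = bigquery.Client()

# Full row count, with no Grafana-imposed LIMIT.
rows = client.query(
    "SELECT COUNT(*) AS n FROM `my-project.my_dataset.my_table`"
).result()
print(next(iter(rows))["n"])  # should print ~4 million, not 1369

# Per the answer above, what Grafana effectively sends when Max data points
# is 1369 is roughly:
#   SELECT * FROM `my-project.my_dataset.my_table` LIMIT 1369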
Related
I am getting only part of the result from the BigQuery API.
Earlier, I solved the issue of 100,000 records per result using iterators.
However, now I'm stuck at some other obstacle.
If I take more than 6-7 columns in a result, I do not get the complete result set.
However, if I take a single column, I get the complete result.
Can there be a size limit as well for results in the BigQuery API?
There are some limits for query jobs.
In particular:
Maximum response size: 128 MB compressed.
Of course, it is unlimited when writing large query results to a destination table (and then reading from there).
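If you go the destination-table route programmatically, a minimal sketch with the google-cloud-bigquery Python client could look like the following; the project, dataset, table, and column names are placeholders, and allow_large_results only matters for legacy SQL:

# Sketch: write large query results to a destination table, then read
# the full result set back from that table page by page.
from google.cloud import bigquery

client = bigquery.Client()

destination = bigquery.TableReference.from_string("my-project.my_dataset.big_results")

job_config = bigquery.QueryJobConfig(
    destination=destination,
    allow_large_results=True,   # lifts the compressed-response-size limit (legacy SQL)
    use_legacy_sql=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

query = "SELECT col_a, col_b, col_c FROM [my-project:my_dataset.my_table]"
client.query(query, job_config=job_config).result()  # wait for the job to finish

# Read everything back from the destination table in pages of 10,000 rows.
table = client.get_table(destination)
for row in client.list_rows(table, page_size=10000):
    pass  # process each row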
I have a calculation view which is based on other calculation views and joins that bring material accounts data from different vendors (all joins have a 1:1 mapping with the target). In the final view I have a calculated column, Formatted_MATERIAL (material numbers without any leading zeros; I used LTRIM() to remove the leading zeros).
Now, when I search for Formatted_MATERIAL equal to a specific number, it shows a read error. If I search for a range of materials, it returns results.
For example, material 5000098 is present in the results of the following query:
select "Formatted_MATERIAL"
FROM "_SYS_BIC"."CA_REPORTS_001_VK"
where "Formatted_MATERIAL" between 5000000 and 6000000
order by "Formatted_MATERIAL"
but no results for
select "Formatted_MATERIAL"
FROM "_SYS_BIC"."CA_REPORTS_001_VK"
where "Formatted_MATERIAL" = 5000098
The cause of the error is that during some processing step in one of the views you're using, the intermediate result set exceeds 2 billion records.
Based on my experience with typical HANA use cases (mostly use cases related to SAP products), I am pretty sure that the way these underlying views have been modelled is not really right. Whenever you try to join or aggregate an intermediate result set of two billion records at once, chances are that important operations like filtering, projection, and aggregation should have been done much earlier in the model.
Of course, without seeing the model(s) and the execution details (use PlanViz for this) and knowing which HANA version you're using, there is nothing we can say about how to solve this issue.
I would like to know if it is possible to limit the BigQuery query size when running a query through the web user interface.
My idea is just to test the query, but instead of querying all my tables, I would like to query only part of them, for instance a limited number of rows.
LIMIT is not reducing my query cost, so the idea is to find a function similar to "row_number" or "fetch".
Sorry, I'm a marketer and not a developer, so thank you in advance for your kind help.
How to limit BigQuery query size for testing ... ?
1 - Try to minimize the number of tables involved in your testing.
In your query there are 60+ tables involved, covering dates from 2016-12-11 to the present:
SELECT <fields_list> FROM
TABLE_DATE_RANGE([XXX:85801771.ga_sessions_],
TIMESTAMP('20161211'),
TIMESTAMP('20170315'))
Instead, you can use the same day as both the start and the end of the time range, drastically reducing the number of involved tables (down to just one) and the overall scan size. For example:
SELECT <fields_list> FROM
TABLE_DATE_RANGE([XXX:85801771.ga_sessions_],
TIMESTAMP('20161211'),
TIMESTAMP('20161211'))
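If you want to see how much a test query would scan before actually running it, a small sketch with the BigQuery Python client's dry-run option reports the bytes that would be processed at no cost; the project, dataset, and field names below are placeholders:

# Sketch: a dry run validates the query and estimates bytes scanned
# without executing it or incurring any query cost.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(
    dry_run=True,
    use_query_cache=False,
    use_legacy_sql=True,   # the TABLE_DATE_RANGE examples above are legacy SQL
)

query = """
SELECT field_a, field_b FROM
  TABLE_DATE_RANGE([my-project:12345678.ga_sessions_],
                   TIMESTAMP('20161211'),
                   TIMESTAMP('20161211'))
"""
job = client.query(query, job_config=job_config)
print("This query would process {} bytes".format(job.total_bytes_processed))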
2 - Minimize the number of rows. Whether you can do so really depends on how your table is being loaded with data. If the table is loaded incrementally, you can use so-called table decorators.
Note: this technique only works within the last 7 days of the table's history.
For example, the query below will scan only the data that was in the table one hour ago (a so-called snapshot decorator):
SELECT <fields_list> FROM [XXX:85801771.ga_sessions_20170212#-3600000]
This works well with the most recent day's table, especially at the start of the day when the table is not yet large.
To limit further, you can use the version below (a so-called range decorator), which gives you only the data added between one hour and half an hour ago:
SELECT <fields_list> FROM [XXX:85801771.ga_sessions_20170212#-3600000--1800000]
Finally, #0 is a special case that references the oldest possible snapshot of the table: either 7 days in the past, or the table's creation time if the table is less than 7 days old. For example
SELECT <fields_list> FROM [XXX:85801771.ga_sessions_20170210#0]
3 - Test against a sampled table. If you expect to experiment with your query again and again, you can first prepare a downsized version of your table with just as many rows as you need, applying whatever sampling logic fits your business case. To limit the number of rows you can use a LIMIT clause; to get random rows you can use the RAND function, for example.
After the sampled table is prepared, run all your queries against it until you have the final version; after that, you can run it against your original table(s).
And by the way, to create the sampled table you need to set a destination table under options in the Web UI.
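For completeness, here is a rough sketch of preparing such a sampled table with the Python client instead of the Web UI; the project, dataset, and table names, the sampling fraction, and the row limit are all placeholders:

# Sketch: materialize a small random sample of a large table into a
# destination table, then iterate on queries against that sample.
from google.cloud import bigquery

client = bigquery.Client()

sample_table = bigquery.TableReference.from_string("my-project.my_dataset.ga_sessions_sample")

job_config = bigquery.QueryJobConfig(
    destination=sample_table,
    use_legacy_sql=True,   # matches the legacy-SQL style of the examples above
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# RAND() keeps roughly 1% of rows at random; LIMIT caps the sample size.
sampling_query = """
SELECT * FROM [my-project:12345678.ga_sessions_20170212]
WHERE RAND() < 0.01
LIMIT 10000
"""
client.query(sampling_query, job_config=job_config).result()

# From here on, point your test queries at my_dataset.ga_sessions_sample.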
I don't really consider this a duplicate of this
or this. It might be slightly related anyway. But I wrote a simple query on my data, SELECT * FROM [opendata.openQueryData] LIMIT 1000, against a table that is not even 100,000 rows in total, and it is taking forever. Meanwhile, a similar query on sample data, SELECT * FROM [publicdata:samples.shakespeare] LIMIT 1000, took just 1.2 s. Is there anything I have to do to achieve this speed on my own data?
Edit
And I just noticed that the number of rows for my data is not showing, unlike the samples dataset and others I have used before now. Could this be the reason for the slow query response?
I am currently working with a large table (~105M records) in a C# application.
When I query the table with an 'Order by' or 'Order Each by' clause, I get a "Resources exceeded during query execution" error.
If I remove the 'Order by' or 'Order Each by' clause, I get a "Response too large to return" error.
Here are the sample queries for the two scenarios (I am using the Wikipedia public table):
SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title Order by Id, Title Desc
SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title
Here are the questions I have:
What is the maximum size of Big Query Response?
How do we select all the records in Query Request not in 'Export Method'?
1. What is the maximum size of Big Query Response?
As mentioned in the Quota policy, the maximum query response size is 10 GB compressed (unlimited when returning large query results).
2. How do we select all the records in Query Request not in 'Export Method'?
If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration.
Queries that return large results will take longer to execute, even if the result set is small, and are subject to additional limitations:
You must specify a destination table.
You can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
Window functions can return large query results only if used in conjunction with a PARTITION BY clause.
Read more about how to paginate to get the results here, and also read the BigQuery Analytics book, the pages starting from page 200, where it is explained how Jobs::getQueryResults works together with the maxResults parameter and its blocking mode.
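As a reference point, a minimal pagination sketch with the Python client library (the page size is an arbitrary placeholder; the C# client exposes an equivalent page-size/page-token mechanism) could look like this:

# Sketch: page through a query's results instead of pulling them in one
# response. page_size maps to the maxResults parameter of the underlying
# jobs.getQueryResults calls that the client issues for you.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.QueryJobConfig(use_legacy_sql=True)
job = client.query(
    "SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title",
    job_config=job_config,
)

# result() blocks until the job completes, then yields rows lazily,
# fetching at most 50,000 rows per API call.
rows = job.result(page_size=50000)
for row in rows:
    pass  # process each row

# If the job itself fails with "Response too large to return", you still
# need allowLargeResults plus a destination table (see the sketch earlier
# in this thread), and can then page through the destination table the
# same way.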
Update:
Query Result Size Limitations (this is explained in the book linked above, on page 220):
When you run a normal query in BigQuery, the response size is limited to 10 GB of compressed data. Sometimes, it is hard to know what 10 GB of compressed data means. Does it get compressed 2x? 10x? The results are compressed within their respective columns, which means the compression ratio tends to be very good. For example, if you have one column that is the name of a country, there will likely be only a few different values. When you have only a few distinct values, this means that there isn't a lot of unique information, and the column will generally compress well. If you return encrypted blobs of data, they will likely not compress well because they will be mostly random.