GCP BigQuery run issue - google-bigquery

I am running a query in the GCP console. The script is correct, but it shows the message below when I run the query:
"Results are not displayed because the table has a large amount of cells and may cause the BigQuery console to become unresponsive. Consider modifying your query to improve browser performance."
What is the solution?

This is just BigQuery's mechanism to avoid overwhelming your browser.
You can still see the results by clicking on View results.
If you think this message is confusing or something should be changed, I encourage you to create a Public Issue sharing your thoughts. You can do that through this link

You may have caused a humongous CROSS JOIN in your query, so take a look at the query first. BigQuery will compute the results if you force it, but that will use a lot of slots and can rack up a high bill for you.

Related

Measuring the averaged elapsed time for SQL code running in google BigQuery

As BigQuery is a shared resource, it is possible to get different elapsed times when running the same code on BigQuery. One option that I always use is to turn off caching in Query Settings, Cache preference, so that query results are not served from the cache. The problem with this setting is that if you refresh the browser or leave it idle, the Cache preference box is ticked again.
Anyhow, I had a discussion with some developers who are optimizing the code. In a nutshell, they take the slow-running code, run it 5 times and take the average, then after optimization run the code another 5 times to get an average for the optimized SQL. The details are not clear to me. However, my preference would be (all in the BQ console):
create a user session
turn off sql caching
On BQ console paste the slow running code;
On the same session paste the optimized code
Run the codes (separated by ";")
This will ensure that any systematic effects (BQ busy or overloaded, a slow connection, etc.) affect BOTH pieces of SQL equally, so that they cancel out. In my opinion one only needs to run it once, as caching is turned off as well. Isn't running 5 times to get an average excessive and superfluous?
Appreciate any suggestions/feedback
Thanks
Measuring the time is one way; the other way to see whether a query has been optimized is to understand the query plan and how effectively slots are used.
I've been using BigQuery for more than 6 years, and I have never used what you describe. What actually matters in BigQuery is reducing costs, and that can be done by iteratively rewriting the query and by using partitioning, clustering, materialized views, caching, and temporary tables.
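For anyone who still wants to compare elapsed times, here is a minimal sketch of the interleaved approach discussed in the question: each round runs every variant once, so slow periods on the shared service tend to hit both variants, and repeating the rounds shows how much the timings scatter. The names `time_interleaved` and `summarize` are made up for this example, and the callables are placeholders for whatever actually submits the query with caching turned off.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple


def time_interleaved(
    variants: Dict[str, Callable[[], object]],
    rounds: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, List[float]]:
    """Run every variant once per round and record the elapsed time of each run."""
    samples: Dict[str, List[float]] = {name: [] for name in variants}
    for _ in range(rounds):
        # Alternate the variants inside each round so that drift in the
        # environment affects all of them in a similar way.
        for name, run in variants.items():
            start = clock()
            run()
            samples[name].append(clock() - start)
    return samples


def summarize(samples: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample standard deviation) per variant."""
    return {
        name: (
            statistics.mean(values),
            statistics.stdev(values) if len(values) > 1 else 0.0,
        )
        for name, values in samples.items()
    }


if __name__ == "__main__":
    # Placeholder workloads; in practice each callable would run one query.
    demo = {
        "slow": lambda: sum(range(200_000)),
        "optimized": lambda: sum(range(20_000)),
    }
    for name, (mean, spread) in summarize(time_interleaved(demo)).items():
        print(f"{name}: mean={mean:.6f}s stdev={spread:.6f}s")
```

A single run gives no idea of the spread, which is the usual argument for repeating the measurement: if the standard deviation is of the same order as the difference between the two means, the comparison is inconclusive.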

List all the queries made to BigQuery with their processed bytes

I would like to know if there is a method in the BigQuery API, or any other way, to list all the queries made and their processed bytes. Something like what is listed in the Activity Page but with the processedBytes field:
https://console.cloud.google.com/home/activity?project=coherent-server-125913
We are having a problem with billing. Suddenly our BigQuery Analysis Costs have increased a lot, and we think we are being charged about 20 times more than expected (we check all the responses from the BigQuery API and save the processedBytes field, taking into account that the minimum charge is 10MB).
The only way we can resolve this difference is to list all the requests and compare them with our numbers, to see whether we aren't measuring something or are doing something wrong. We have opened a billing support ticket, and they redirected me to Stack Overflow to ask the question, as they think it is a technical issue.
Thanks in advance!
Instead of checking totalBytesProcessed, you should try checking totalBytesBilled and billingTier (see here).
You might have jumped to a higher billing tier, but that is just a guess.
The best place to check would be the BigQuery logs.
This is going to tell you what queries were run, who ran them, what date/time they were run, the total bytes billed etc.
Logs can be a bit tedious to look through but BigQuery allows you to stream BigQuery logs into a BigQuery table and you can then query said table to identify expensive queries.
I've done this and it works really well to give you visibility on your BQ charges. The process of how to do this is outlined in more detail here: https://www.reportsimple.com.au/post/google-bigquery
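Once the job records are exported, the comparison the question asks for can be sketched roughly as below. This is only an illustration: the record layout (flat dicts with `job_id`, `user` and `totalBytesBilled` keys) and the sample values are assumptions made for the example, not the actual log schema, so the field access would need adapting to whatever the export really contains.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical, flattened job records. The API reports int64 values as
# strings, which is why the byte counts are strings here.
SAMPLE_JOBS: List[Dict[str, str]] = [
    {"job_id": "job_a", "user": "etl@example.com", "totalBytesBilled": "10485760"},
    {"job_id": "job_b", "user": "analyst@example.com", "totalBytesBilled": "524288000"},
    {"job_id": "job_c", "user": "etl@example.com", "totalBytesBilled": "20971520"},
]


def billed_bytes_by_user(jobs: Iterable[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Sum billed bytes per user, largest total first."""
    totals: Dict[str, int] = defaultdict(int)
    for job in jobs:
        totals[job["user"]] += int(job["totalBytesBilled"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def untracked_jobs(jobs: Iterable[Dict[str, str]], tracked_job_ids: Set[str]) -> List[str]:
    """Return the ids of jobs that are in the logs but not in our own records."""
    return [job["job_id"] for job in jobs if job["job_id"] not in tracked_job_ids]


if __name__ == "__main__":
    print(billed_bytes_by_user(SAMPLE_JOBS))
    print(untracked_jobs(SAMPLE_JOBS, {"job_a", "job_c"}))
```

Jobs that show up in the logs but not in your own bookkeeping are the first place to look for the unexplained charges.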

BigQuery web UI is unresponsive & eventually crashes

When I click on "Details" to see a preview of the data in the table, the web UI locks up and I see the following errors. I've tried refreshing and restarting my browser, but it doesn't help.
Yes, we (the BQ team) are aware of performance issues when viewing very repeated rows in the BigQuery web UI. The public genomics tables are known to tickle these performance issues since individual rows of their table are highly repeated.
We're considering a few methods of fixing this, but the simplest would probably be to default to the JSON display of rows for problematic tables, and allow switching to the tabular view with a "View it at your own risk!"-style warning message.
It took a little time for me too, but it eventually (1min 40sec) loaded in the UI.
I think it is because of how table data is presented in the native BQ UI in Preview mode.
As you may have noticed, it is shown in a sort of hierarchical way.
I have noticed this slowness for heavy tables (in terms of row size and/or hierarchical structure) since this was introduced. And by the way, only one row is shown for this particular table because of this.
Of course this is just my guess - would be great to hear from Google Team!
Meanwhile, when I use an internal application that uses the same APIs to preview table data, I don't see any slowness at all (10 rows in 3 sec), which supports my guess above.

Google BigQuery: Stop running query

I ran a query on Google BigQuery several hours ago, and the query is still running. I clicked "abandon", but it appears there is no way to stop a query. What can I do? Can I contact Google somehow, so they stop the query?
I've been working on a project for a company which analyzes Google Analytics data with BigQuery, so I don't want to run up a big bill for them or something.
(Maybe StackOverflow is not the right place to ask this question, but I've tried to find another place, and I couldn't. On the BigQuery support page, it is said that questions should be asked here, with the google-bigquery tag, so I'm doing that).
I've written a query (which I don't want to paste or describe here, as someone might abuse it to block BigQuery or something, I don't know). Let's just say it includes inner joins. After I've written it, and before running it, the console message was something like "This will analyze 674KB of data", which looked OK, given the fact that the table only has 10,000 rows. I've got the same message after clicking on "abandon" query, something like "You can abandon this, but you will still be billed for 674KB of data".
I try very hard to make sure what I do doesn't cause problems to someone, so I've actually run that query on a local PostgreSQL database (with the exact same data - 10,000 rows) as in BigQuery, and the query there finishes in a second or two.
How can I cancel this query, and can I (the company I've worked for) be billed for something more than 674KB of data?
For the time being, there is no way to stop a BigQuery job once it's started, either via the web interface or via API calls.
According to this, this feature may be added in the future.
As BigQuery will shard the query across multiple machines, even a large query (terabyte level) will not have a large impact on an individual machine, let alone a query of 674KB. However, according to this, this is the amount that you will be charged.
Here are some tips to save money in BigQuery.
The first thing to know is that, unlike a traditional RDBMS, BigQuery is column-based, and you are charged by the amount of data in the columns you read rather than by rows.
That means you shouldn't include columns in the query that you do not need. This may sound trivial, but people coming from an RDBMS sometimes write queries like this:
SELECT
COUNT(*), user_id
FROM
[Dataset.Table]
GROUP BY
user_id
The query is valid, but note what gets billed: BigQuery charges for the columns a query reads (here only user_id), whereas a query written as SELECT * is billed for every column in the table. Therefore it's a good idea to explicitly specify the column names.
Break the tables into smaller chunks. Instead of having a single table that contains all the data, it's a good idea to split the table by date and use table wildcard functions to stitch the tables together at query time. That way, you won't be billed for rows that you don't need.
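To make the column-based billing tip concrete, here is a toy model of it. It is not how BigQuery computes the bill; it only assumes that the scanned amount is the sum of the sizes of the columns a query references. The function name and the per-column byte counts are invented for the illustration.

```python
from typing import Dict, Iterable, Optional

# Invented per-column storage sizes (in bytes) for a small example table.
TABLE_COLUMN_BYTES: Dict[str, int] = {
    "user_id": 80_000,
    "url": 900_000,
    "payload": 4_000_000,
}


def estimate_scanned_bytes(
    column_bytes: Dict[str, int],
    selected: Optional[Iterable[str]] = None,
) -> int:
    """Toy estimate: sum the sizes of the referenced columns.

    Passing selected=None models SELECT *, which touches every column.
    """
    if selected is None:
        return sum(column_bytes.values())
    return sum(column_bytes[name] for name in selected)


if __name__ == "__main__":
    print(estimate_scanned_bytes(TABLE_COLUMN_BYTES))               # all columns
    print(estimate_scanned_bytes(TABLE_COLUMN_BYTES, ["user_id"]))  # one column
```

In this model, naming only user_id scans a small fraction of what SELECT * would, which is the whole point of listing columns explicitly.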
BigQuery supports canceling query jobs.
You can do this via the bq command line utility:
bq cancel <job_id>
or from the API via the jobs.cancel method (documented here)
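From Python, the same request can be sketched as below. It assumes `client` is a `google.cloud.bigquery.Client` from the google-cloud-bigquery package, whose `cancel_job` method wraps jobs.cancel; the helper name `request_cancel` and the job id in the usage comment are made up. The helper takes the client as a parameter, so the block itself needs no third-party import.

```python
from typing import Any, Optional


def request_cancel(client: Any, job_id: str, location: Optional[str] = None) -> str:
    """Ask BigQuery to cancel a job and return the state reported for it.

    `client` is assumed to behave like google.cloud.bigquery.Client, i.e. to
    expose cancel_job(job_id, location=...) returning a job object with a
    `state` attribute.
    """
    job = client.cancel_job(job_id, location=location)
    return job.state


# Usage (assumes google-cloud-bigquery is installed and credentials are set up):
#
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   print(request_cancel(client, "your_job_id", location="US"))
```

It is worth checking the job state again afterwards rather than assuming the job stopped the moment the call returned.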

Why does my SELECT query take so much longer to run on the web server than on the database itself?

I'm running the following setup:
Physical Server
Windows 2003 Standard Edition R2 SP2
IIS 6
ColdFusion 8
JDBC connection to iSeries AS400 using JT400 driver
I am running a simple SQL query against a file in the database:
SELECT
column1,
column2,
column3,
....
FROM LIB/MYFILE
No conditions.
The file has 81 columns - alphanumeric and numeric - and about 16,000 records.
When I run the query in the emulator using the STRSQL command, the query comes back immediately.
When I run the query on my Web Server, it takes about 30 seconds.
Why is this happening, and is there any way to reduce this time?
While I cannot address whatever overhead might be involved in your web server, I can say there are several other factors to consider:
This most likely has to do primarily with the differences between the way the two system interfaces work.
Your interactive STRSQL session will start displaying results as quickly as it receives the first few pages of data. You are able to page down through that initial data, but generally at some point you will see a status message at the bottom of the screen indicating that it is now getting more data.
I assume your web server is waiting until it receives the entire result set. It wants to get all the data as it is building the HTML page, before it sends the page. Thus you will naturally wait longer.
If this is not how your web server application works, then it is likely to be a JT400 JDBC Properties issue.
If you have overridden any default settings, make sure that those are appropriate.
In some situations the OPTIMIZATION_GOAL settings might be a factor. But if you are reading the table (aka physical file or PF) directly, in its physical sequence, without any index or key, then that might not apply here.
Your interactive STRSQL session will default to a setting of *FIRSTIO, meaning that the query is optimized for returning the first pages of data quickly, which corresponds to the way it works.
Your JDBC connection will default to a "query optimize goal" of "0", which will translate to an OPTIMIZATION_GOAL setting of *ALLIO, unless you are using extended dynamic packages. *ALLIO means the optimizer will try to minimize the time needed to return the entire result set, not just the first pages.
Or, perhaps first try simply adding FOR READ ONLY onto the end of your SELECT statement.
Update: a more advanced solution
You may be able to bypass the delay caused by waiting for the entire result set as part of constructing the web page to be sent.
Send a web page out to the browser without any records, or limited records, but use AJAX code to load the remainder of the data behind the scenes.
Use large block fetches whenever feasible, to grab plenty of rows in one clip.
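As a rough illustration of the JDBC property suggestions above, the sketch below assembles a JT400 connection URL with the "query optimize goal" and "block size" properties set explicitly. The helper name, host and library are placeholders, and the values shown (1 for *FIRSTIO, a 512 KB block size) are examples to check against the JT400 documentation for your driver version, not recommendations.

```python
from typing import Mapping


def build_jt400_url(host: str, library: str, properties: Mapping[str, object]) -> str:
    """Build a JT400 JDBC URL of the form
    jdbc:as400://host/library;name1=value1;name2=value2
    """
    parts = [f"jdbc:as400://{host}/{library}"]
    parts.extend(f"{name}={value}" for name, value in properties.items())
    return ";".join(parts)


if __name__ == "__main__":
    url = build_jt400_url(
        "myhost",   # placeholder system name
        "LIB",      # default library, as in the query in the question
        {
            "query optimize goal": 1,  # 1 requests *FIRSTIO instead of the default
            "block size": 512,         # larger block fetches, in kilobytes
        },
    )
    print(url)
```

The resulting string would go wherever the data source's JDBC URL is configured; whether *FIRSTIO or *ALLIO is the better goal depends on whether the page needs the first rows quickly or the whole result set.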
One thing you need to remember: the i saves the access paths it creates in the job in case they are needed again. That means that if you log out, log back in, and then run your query, the first run should take longer, and the second run will be faster. When running queries in a web application, you may or may not be reusing a job, so the access paths may have to be rebuilt.
If speed is important, I would:
Look into optimizing the query. I know there are better sources, but I can't find them right now.
Create a stored procedure. A stored procedure saves the access paths created.
With only 16000 rows and no WHERE or ORDER BY this thing should scream. Break the problem down to help diagnose where the bottleneck is. Go back to the IBM i, run your query in the SQL command line and then use the B, BOT or BOTTOM command to tell the database to show the last row. THAT will force the database to cough up the entire 16k result set, and give you a better idea of the raw performance on the IBM side. If that's poor, have the IBM administrators run Navigator and monitor the performance for you. It might be something unexpected, like the 'table' is really a view and the columns you are selecting might be user defined functions.
If the performance on the IBM side is OK, then look at what ColdFusion is doing with the result set. Not being a CF programmer, I'm no help there. But generally, when I am tasked with solving multi-platform performance issues, the client side tends to consume the entire result set and then use program logic to choose which rows to display or work with. The server is MUCH faster than the client, and given the right hints, the database optimiser can make some very good decisions about how to get at those rows.