Support for Google BigQuery JDBC Driver using KNIME

I get an error when using the following JDBC driver to retrieve BigQuery data in KNIME.
The error message appears in the Database Connection Table Reader node as follows:
Execute failed: "[Simba][BigQueryJDBCDriver](100033) Error getting job status."
However, this only occurs after consecutively running a couple of similar data flows that use the BigQuery driver in KNIME.
Google searches turned up no extra information, and I have already updated the driver and KNIME to the latest versions. I also tried rerunning the flow on a different system, with no success.
Is there a quota or limit attached to using this specific driver?
Hope someone is able to help!

I found this issue tracker; it seems that you opened it and there is already interaction with BigQuery's engineering team there. I suggest following that thread and subscribing to it to stay updated, as you'll receive e-mails about its progress.
Regarding your question about limits for the driver: the quotas and limits that usually apply in BigQuery apply to the Simba driver too (e.g. the concurrent queries limit, execution time limit, maximum response size, etc.).
Hope it helps.

Just discovered that a new query limit was set at the company's group level; some internal miscommunication. Sorry for the bother, and thanks for the feedback!

Related

Spotfire - BigQuery - Simba JDBC

I could not find an answer to the following question. I have been testing Spotfire with the BigQuery connector over the last week, using the Simba JDBC driver. It works well: roughly 1 minute per GB of data imported to memory. This is fast thanks to the BigQuery Storage API implementation ⚡
What I cannot understand are the 2 following points:
Which versions of the Simba driver and Spotfire have this high-throughput feature enabled?
Can I configure this feature? Adding EnableHighThroughputAPI=1 to the connection string toggles use of the BigQuery Storage API (see the sketch below). In Spotfire, changing this parameter from 1 to 0 does not make any difference: the Storage API is always used. Is it internally overwritten by Spotfire?
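For reference, here is a minimal, hypothetical sketch of how that toggle is usually passed in a Simba JDBC connection URL from plain Java; the project, service-account, and key-path values are placeholders, and the URL format should be confirmed against the Simba JDBC driver's install guide:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class BigQueryHighThroughputTest {
        public static void main(String[] args) throws Exception {
            // Placeholder project/credentials. EnableHighThroughputAPI=1 asks the
            // driver to read results via the BigQuery Storage API; 0 falls back
            // to the plain REST API.
            String url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;"
                    + "ProjectId=my-project;"
                    + "OAuthType=0;"
                    + "OAuthServiceAcctEmail=svc@my-project.iam.gserviceaccount.com;"
                    + "OAuthPvtKeyPath=/path/to/key.json;"
                    + "EnableHighThroughputAPI=1;";

            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }

Comparing wall-clock time for a large SELECT with the flag set to 1 versus 0 in a standalone program like this would at least tell you whether the driver honors the flag outside Spotfire.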
An expert insight would be appreciated; I could not find any documentation on this (I am using Spotfire 11.2.0 HF-002 and the Simba JDBC 4.2 driver).
thanks!

Can't update dataset in Power BI Service

I see this message 30 minutes after clicking the Refresh now button (on the Dataset tab):
Something went wrong
There was an error when processing the data in the dataset.
Please try again later or contact support. If you contact support, please provide these details.
Data source error: {"error":{"code":"ModelRefresh_ShortMessage_ProcessingError","pbi.error":{"code":"ModelRefresh_ShortMessage_ProcessingError","parameters":{},"details":[{"code":"Message","detail":{"type":1,"value":"Timeout expired. The timeout period elapsed prior to completion of the operation."}}],"exceptionCulprit":1}}}
Cluster URI: WABI-WEST-EUROPE-redirect.analysis.windows.net
Activity ID: 6465a7a0-8ee3-4f9b-bfae-d26800ff83b4
Request ID: 2a3851e0-5a38-3b96-783c-d0f5e1b464cb
Time: 2020-03-19 11:16:05Z
ODBC: ERROR [HY000] [Microsoft][BigQuery] (100) Error interacting with REST API: Operation timed out after 6.0 hours. Consider reducing the amount of work performed by your operation so that it can complete within this limit
Power BI tries to refresh the dataset for 30 minutes, then shows this error.
My dataset uses only a Google BigQuery connection.
Refreshing the data directly in BigQuery works fine and takes about 3-4 minutes.
I contacted Power BI support. The support team told me the problem is with the Google driver and that they can't help me until Google updates the driver.
Could someone help me?
There are a few steps you should check. First, try refreshing the BigQuery credentials in Power BI and reducing the size of the data to under 1 GB.
I recommend using the Simba ODBC BigQuery connector, which should eliminate the issue; please follow this documentation.
There is already a bug report about this on the issue tracker. I will let you know if there are any updates.
I hope it helps.

Connection::SQLGetInfoW: [Simba][ODBC] (11180) SQLGetInfo property not found: 1750

This is a setup where Microsoft's Power BI is the frontend for presenting data to end users. Behind it there is an on-premises PBI gateway that connects to BigQuery via the Magnitude Simba ODBC driver for BigQuery. Two days ago, after always working flawlessly, the PBI data refresh started failing due to timeouts.
The BigQuery ODBC driver's debug log shows the two errors below in hundreds of rows per refresh:
SimbaODBCDriverforGoogleBigQuery_connection_9.log:Aug 29 15:21:54.154 ERROR 544 Connection::SQLGetInfoW: [Simba][ODBC] (11180) SQLGetInfo property not found: 180
SimbaODBCDriverforGoogleBigQuery_connection_9.log:Aug 29 15:22:49.427 ERROR 8176 Connection::SQLGetInfoW: [Simba][ODBC] (11180) SQLGetInfo property not found: 1750
And only one occurrence per refresh of this:
SimbaODBCDriverforGoogleBigQuery_connection_6.log:Aug 29 16:56:15.102 ERROR 6704 BigQueryAPIClient::GetResponseCheckErrors: HTTP error: Error encountered during execution. Retrying may solve the problem.
After some intensive web searching, it looks like this might be related to 'wrong' coding (either wrong data types or strings that are too large), but nothing conclusive.
Other, smaller refreshes to the same destination work without issues.
Is there any knowledge base or reference for such cryptic error messages? Any advice on how to troubleshoot this?
Already tried:
Searching Google;
Updating the Magnitude Simba ODBC driver for BigQuery to the latest version;
Updating the PBI Gateway to the latest version;
Rebooting the gateway server.
This issue occurs when the ODBC driver tries to pull the data in streams, which goes over port 444. You either need to open port 444 for optimal performance, or disable streams so that the data is pulled using pagination (not recommended for large data volumes). A sketch of the relevant driver setting is below.
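If you want to test with streams disabled, the switch is normally a driver option in the DSN or connection string, in the same style as the EnableHighThroughputAPI parameter mentioned above for the JDBC driver. The snippet below is a rough, hypothetical example only: the exact key name differs between driver versions (and between the ODBC and JDBC drivers), so confirm it in the Simba ODBC driver's Install and Configuration Guide.

    ; Hypothetical DSN entry; verify option names for your driver version.
    [BigQueryDSN]
    Driver=Simba ODBC Driver for Google BigQuery
    Catalog=my-project
    ; 1 = read results via the Storage API (needs outbound port 444)
    ; 0 = fall back to paginated REST calls (slower for large results)
    EnableHighThroughputAPI=0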

BigQuery Stream Benchmark

BigQuery has officially become our device-log data repository and our base for live monitoring, analysis, and diagnostics. As a next step, we need to measure and monitor data streaming performance. Are there any relevant benchmarks you use for BigQuery live streaming? Which relevant ones can I refer to?
Since streaming has a limited payload size (see the Quota policy), it's easier to talk about timings and other side effects.
We measure between 1,200 and 2,500 ms for each streaming request, and this has been consistent over the last month, as you can see in the chart.
We have seen several side effects, though:
the request randomly fails with type 'Backend error'
the request randomly fails with type 'Connection error'
the request randomly fails with type 'timeout'
some other error messages are non-descriptive and so vague that they don't help you; you just retry.
we see hundreds of such failures each day, so they are pretty much constant and not related to Cloud health.
For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended approach is exponential backoff with retry (a sketch is below); even support told us to do so. Personally, that doesn't make me happy.
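As an illustration of that pattern, here is a minimal sketch using the google-cloud-bigquery Java client; the dataset, table, row contents, and backoff parameters are placeholders, not our production values:

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryException;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.InsertAllRequest;
    import com.google.cloud.bigquery.InsertAllResponse;
    import com.google.cloud.bigquery.TableId;
    import java.util.Map;

    public class StreamWithBackoff {
        public static void main(String[] args) throws InterruptedException {
            BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
            TableId table = TableId.of("my_dataset", "device_logs"); // placeholder
            Map<String, Object> row = Map.of("device_id", "d-42", "level", "INFO");

            long backoffMs = 500;             // initial wait
            final long maxBackoffMs = 32_000; // cap on the wait
            for (int attempt = 1; attempt <= 6; attempt++) {
                try {
                    InsertAllResponse resp = bigquery.insertAll(
                            InsertAllRequest.newBuilder(table).addRow(row).build());
                    if (!resp.hasErrors()) {
                        return; // success
                    }
                    // Per-row errors (e.g. schema mismatch) are usually not retryable.
                    System.err.println("Row errors: " + resp.getInsertErrors());
                } catch (BigQueryException e) {
                    // Backend/connection/timeout errors: wait and retry.
                    System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs); // exponential backoff
            }
            throw new RuntimeException("Streaming insert failed after retries");
        }
    }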
UPDATE
Someone in the comments requested new stats, so I posted 2017 numbers. It's still the same: there was some heavy data reorganization on our side (you can see the spike in the chart), but essentially it's still around 2 seconds per request if you use the maximum streaming-insert payload.

BigQuery Job Failing with 500 Internal Error

This is really a question for Jordan Tigani and Google's BigQuery support, which recommends we use Stack Overflow:
I have a BigQuery job that has been executing daily for the past several months but has now started erroring out with an internal 500 error. One example is job ID job_4J9LL4vp3xtM30WgqduvQqFFUN4. Would it be possible to know why this job is causing an internal BigQuery error, and whether there's something I can do to fix it?
This is the same issue as "bigquery error: Unexpected. Please try again".
We've got a fix that we're currently testing; it should go out in this week's release. Unfortunately, there is no workaround other than to use a table decorator that doesn't include the latest streaming data (a sketch is below).
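For anyone looking at the decorator workaround: here is a minimal, hypothetical sketch using the google-cloud-bigquery Java client. The project, dataset, and table names are placeholders, and the 5-minute snapshot offset is just an example; table decorators require legacy SQL.

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.QueryJobConfiguration;
    import com.google.cloud.bigquery.TableResult;

    public class DecoratorQuery {
        public static void main(String[] args) throws InterruptedException {
            BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
            // Legacy SQL snapshot decorator: @-300000 reads the table as it was
            // 5 minutes ago, which leaves out the most recent streaming data.
            QueryJobConfiguration config = QueryJobConfiguration
                    .newBuilder("SELECT COUNT(*) FROM [my-project:my_dataset.my_table@-300000]")
                    .setUseLegacySql(true)
                    .build();
            TableResult result = bigquery.query(config);
            result.iterateAll().forEach(r -> System.out.println(r.get(0).getLongValue()));
        }
    }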