Streaming insert API throws 500 errors when used with google-api-java-client batch request - google-bigquery

We are using the streaming insert API together with the google-api-java-client batch request facility.
Initially everything was fine, but after some time it started throwing many 500 errors:
{"code":500,"errors":[{"domain":"global","message":"Unexpected. Please try again.","reason":"internalError"}],"message":"Unexpected. Please try again."}
Code snippet is below:
// Build a batch request and queue the streaming insertAll call on it;
// MyCallback receives the per-call success or failure.
val batch = client.batch()
val request = new TableDataInsertAllRequest()
request.setRows(rows)
val insertAll = client.tabledata().insertAll(ProjectId, datasetId, tableId, request)
insertAll.queue(batch, new MyCallback(datasetId, tableId, rows, retryAttempt))
Sometimes we get a backendError as well:
{"code":500,"errors":[{"domain":"global","message":"Backend Error","reason":"backendError"}],"message":"Backend Error"}
NOTE: Prior to these errors, we got the error below:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_06]
at java.net.SocketInputStream.read(SocketInputStream.java:150) ~[na:1.7.0_06]
at java.net.SocketInputStream.read(SocketInputStream.java:121) ~[na:1.7.0_06]
at sun.security.ssl.InputRecord.readFully(InputRecord.java:312) ~[na:1.7.0_06]
at sun.security.ssl.InputRecord.read(InputRecord.java:350) ~[na:1.7.0_06]
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927) ~[na:1.7.0_06]
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884) ~[na:1.7.0_06]
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102) ~[na:1.7.0_06]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) ~[na:1.7.0_06]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) ~[na:1.7.0_06]
at java.io.BufferedInputStream.read(BufferedInputStream.java:334) ~[na:1.7.0_06]
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:633) ~[na:1.7.0_06]
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:579) ~[na:1.7.0_06]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1322) ~[na:1.7.0_06]
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468) ~[na:1.7.0_06]
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) ~[na:1.7.0_06]
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:36) ~[google-http-client-1.18.0-rc.jar:1.18.0-rc]
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94) ~[google-http-client-1.18.0-rc.jar:1.18.0-rc]
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965) ~[google-http-client-1.18.0-rc.jar:1.18.0-rc]
at com.google.api.client.googleapis.batch.BatchRequest.execute(BatchRequest.java:241) ~[google-api-client-1.18.0-rc.jar:1.18.0-rc]
Questions
What is the cause of this?
What should we do to fix this?
EDIT
- Project ID is deft-virtue-628
- We are using the streaming insert API and do not have a job ID.

We experienced much the same issues before. Here are some of our findings; hopefully they help a bit.
Regarding java.net.SocketTimeoutException: Read timed out: there is a good chance it wasn't due to the BigQuery side, but to your own system resources (JVM, network sockets, and the like) getting exhausted. We originally ran our program on a virtual machine on Windows Server 2008 R2 and saw a great many socket timeouts. Since we moved the program to a native machine on a new server, we have barely seen this exception. The timeout would also sometimes cause other exceptions, such as the SSL connection being closed during handshake.
As for the BigQuery 500 errors: we were not able to find a way to avoid them, since they show no pattern, and BigQuery doesn't fail all of your subsequent requests. Simple back-off alone won't avoid this error, especially if you are using multiple threads, where it is hard to control the back-off time accurately. So what we did was put the data back on a queue whenever a 500 happened and retry; on average it succeeded after one or two retries. We are still waiting for advice on an optimal way to deal with this error, but for now we just keep retrying, as sketched below. 500 errors still happen, but we still get all of our data streamed into BigQuery.
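To make the requeue-on-500 idea concrete, below is a minimal sketch in the shape of the batch callback from the question, written against the google-api-services-bigquery classes. The shared retry queue and the row-list type are our assumptions, not the asker's actual MyCallback:

import com.google.api.client.googleapis.batch.json.JsonBatchCallback;
import com.google.api.client.googleapis.json.GoogleJsonError;
import com.google.api.client.http.HttpHeaders;
import com.google.api.services.bigquery.model.TableDataInsertAllRequest;
import com.google.api.services.bigquery.model.TableDataInsertAllResponse;
import java.util.List;
import java.util.Queue;

// Sketch only: requeue the batch on transient 5xx errors so a worker can retry it.
class RetryingCallback extends JsonBatchCallback<TableDataInsertAllResponse> {
    private final Queue<List<TableDataInsertAllRequest.Rows>> retryQueue; // assumed thread-safe
    private final List<TableDataInsertAllRequest.Rows> rows;

    RetryingCallback(Queue<List<TableDataInsertAllRequest.Rows>> retryQueue,
                     List<TableDataInsertAllRequest.Rows> rows) {
        this.retryQueue = retryQueue;
        this.rows = rows;
    }

    @Override
    public void onSuccess(TableDataInsertAllResponse response, HttpHeaders headers) {
        // The HTTP call can succeed while individual rows fail; retry those batches too.
        if (response.getInsertErrors() != null && !response.getInsertErrors().isEmpty()) {
            retryQueue.add(rows);
        }
    }

    @Override
    public void onFailure(GoogleJsonError error, HttpHeaders headers) {
        if (error.getCode() == 500 || error.getCode() == 503) {
            retryQueue.add(rows); // internalError / backendError: put back and retry
        }
        // Other codes (e.g. 400 for invalid rows) should be logged, not retried.
    }
}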

Related

Load from GCS to GBQ causes an internal BigQuery error

My application creates thousands of "load jobs" daily to load data from Google Cloud Storage URIs to BigQuery, and only a few of them fail with the error:
"Finished with errors. Detail: An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 7916072"
The application is written in Python and uses these libraries:
google-cloud-storage==1.42.0
google-cloud-bigquery==2.24.1
google-api-python-client==2.37.0
Load job is done by calling
load_job = self._client.load_table_from_uri(
    source_uris=source_uri,
    destination=destination,
    job_config=job_config,
)
This method has a default parameter retry: retries.Retry = DEFAULT_RETRY, so the job should automatically retry on such errors.
ID of a specific job that finished with the error:
"load_job_id": "6005ab89-9edf-4767-aaf1-6383af5e04b6"
"load_job_location": "US"
After getting the error, the application recreates the job, but it doesn't help.
Subsequent failed job IDs:
5f43a466-14aa-48cc-a103-0cfb4e0188a2
43dc3943-4caa-4352-aa40-190a2f97d48d
43084fcd-9642-4516-8718-29b844e226b1
f25ba358-7b9d-455b-b5e5-9a498ab204f7
...
As mentioned in the error message, wait according to the back-off requirements described in the BigQuery Service Level Agreement, then try the operation again.
If the error continues to occur and you have a support plan, please create a new GCP support case. Otherwise, you can open a new issue on the issue tracker describing your issue. You can also try to reduce the frequency of this error by using Reservations.
For more information about the error messages, you can refer to this document.
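For illustration, here is a minimal sketch of exponential back-off with jitter around a retryable operation. It is written in Java (most of this page is Java), but the Python client's DEFAULT_RETRY implements the same idea; the transient-error type below is a hypothetical stand-in:

import java.util.concurrent.ThreadLocalRandom;

// Sketch only: retry an operation with exponential back-off plus random jitter.
class BackoffRetry {
    // Hypothetical marker for errors worth retrying (internal/backend 5xx errors).
    static class TransientError extends RuntimeException {}

    static void runWithBackoff(Runnable op, int maxAttempts) throws InterruptedException {
        long delayMs = 1_000; // initial delay
        for (int attempt = 1; ; attempt++) {
            try {
                op.run();
                return; // success, stop retrying
            } catch (TransientError e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up after maxAttempts
                }
                // Sleep for the current delay plus jitter, then double the delay.
                Thread.sleep(delayMs + ThreadLocalRandom.current().nextLong(delayMs));
                delayMs *= 2;
            }
        }
    }
}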

Issues while running with BigQueryIO.Write.Method.STORAGE_WRITE_API

We are testing STORAGE_WRITE_API for inserting data into BigQuery, and we've seen several errors/warnings in our Dataflow pipeline (written in Java). It may work well in the beginning, but eventually the system lag keeps increasing, the pipeline stops processing any data from Pub/Sub, and the unacked messages pile up.
One common warning is:
Operation ongoing in step insertTableRowsToBigQuery/StorageApiLoads/StorageApiWriteSharded/Write Records for at least 03h35m00s without outputting or completing in state process
at java.base#11.0.9/jdk.internal.misc.Unsafe.park(Native Method)
at java.base#11.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
at java.base#11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
at java.base#11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
at java.base#11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
at java.base#11.0.9/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager$Callback.await(RetryManager.java:153)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager$Operation.await(RetryManager.java:136)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager.await(RetryManager.java:256)
at app//org.apache.beam.sdk.io.gcp.bigquery.RetryManager.run(RetryManager.java:248)
at app//org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$WriteRecordsDoFn.process(StorageApiWritesShardedRecords.java:453)
at app//org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$WriteRecordsDoFn$DoFnInvoker.invokeProcessElement(Unknown Source)
Other exceptions we've seen:
Got error io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Stream is closed
Got error io.grpc.StatusRuntimeException: ALREADY_EXIST
PodSandboxStatus of sandbox "..." for pod "df-...-pipeline-...-harness-qw4j_default(...)" error: rpc error: code = Unknown desc = Error: No such container
Code sample:
toBq.apply("insertTableRowsToBigQuery",
BigQueryIO
.writeTableRows()
.to(String.format("%s:%s.%s", PROJECT_ID, DATASET, table))
.withTriggeringFrequency(Duration.standardSeconds(options.getTriggeringFrequency()))
.withNumStorageWriteApiStreams(options.getNumStorageWriteApiStreams())
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
There was a production issue related to the connection getting stuck after streaming 10 MB, which has since been fixed. If you try again, it should work.

Setting a timeout on webservice consumer built with org.apache.axis.client.Call and running on Domino

I'm maintaining an antediluvian Notes application which connects to an SAP back-end via a hand-built 'Webservice'.
The server is running Domino Release 7.0.4FP2 HF97.
The web service is not the more recent Web Service Consumer, but a large Java agent using Apache's soap.jar (org.apache.soap). Below is an example of the calling code:
private Call setupSOAPCall() {
    Call call = new Call();
    SOAPHTTPConnection conn = new SOAPHTTPConnection();
    call.setSOAPTransport(conn);
    call.setEncodingStyleURI(Constants.NS_URI_SOAP_ENC);
There has been a change in the SAP system, and the call now takes 8 minutes to complete (verified by the SAP team).
I'm getting an error message as follows:
[SOAPException: faultCode=SOAP-ENV:Client; msg=For input string: "906 "; targetException=java.lang.NumberFormatException: For input string: "906 "]
I found a blog article describing the error message quite closely:
https://thejavablog.wordpress.com/category/jmeter/
and I've come to the hypothesis that it is a timeout message that is being returned to my Call object, and that this timeout message is being incorrectly parsed, hence the NumberFormatException.
Looking at my logs I can see that there is a time difference of 62 seconds between my call and the response.
I recommended that the server setting in the server document (tab Internet Protocols/HTTP/Timeouts/Request timeouts) be changed from 60 seconds to 600 seconds, and the HTTP task restarted with
tell http restart
I've re-run the tests and I am getting the same error, and the time difference is still slightly more than 60 seconds, which is not what I was expecting.
I read Michael Ruhnau's blog entry
http://www.mruhnau.net/2014/06/how-to-overcome-domino-webservice.html
which points to this APR
http://www-01.ibm.com/support/docview.wss?uid=swg1LO48272
but I'm not convinced that this applies in this case, since there is no way IBM would know that my Java agent is in fact making a SOAP call.
My current hypothesis is that I have to use either the setTimeout() method on
org.apache.axis.client.Call
https://axis.apache.org/axis/java/apiDocs/org/apache/axis/client/Call.html
or on the org.apache.soap.transport.http.SOAPHTTPConnection
https://docs.oracle.com/cd/B13789_01/appdev.101/b12024/org/apache/soap/transport/http/SOAPHTTPConnection.html
and that the timeout value is an Apache default, not something that is controlled by the Domino server.
I'd be grateful for any help.
I understand your approach, and I hope this is the correct one to solve your problem.
Add a debug statement (a console write would be fine) that displays the default timeout, then try increasing it to 10 minutes:
SOAPHTTPConnection conn = new SOAPHTTPConnection();
System.out.println("time out is: " + conn.getTimeout());
conn.setTimeout(600000); // 10 min in ms
System.out.println("after setting it, time out is: " + conn.getTimeout());
call.setSOAPTransport(conn);
Now keep in mind that Domino also has a maximum LotusScript/Java execution time; check this value and (at least as a test) change it: http://www.ibm.com/support/knowledgecenter/SSKTMJ_9.0.1/admin/othr_servertasksagentmanagertab_r.html (it's the version 9 help, but this part should be identical).
I've since discovered that it wasn't my code generating the error; the default timeout for the Apache SOAPHTTPConnection is 0, i.e. no timeout.
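For reference, the org.apache.axis.client.Call route mentioned in the question looks roughly like this; the endpoint is hypothetical, and Axis likewise defaults to no timeout:

import javax.xml.rpc.ServiceException;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

// Sketch only: Axis interprets the Integer timeout as milliseconds.
class AxisTimeoutExample {
    static Call createCallWithTimeout(String endpoint) throws ServiceException {
        Call call = (Call) new Service().createCall();
        call.setTargetEndpointAddress(endpoint);   // e.g. the SAP endpoint URL (hypothetical)
        call.setTimeout(Integer.valueOf(600000));  // 10 minutes in ms
        return call;
    }
}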

Read Timed Out: synchronous query via BigQuery Java API

We are using the BigQuery Java API to retrieve results for our analytics reporting frontend, and we are trying to retrieve the results synchronously. A lot of the time we get a Read timed out error, even before the query timeout specified in the parameters. Here's the stack trace for a sample failure:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:830)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:787)
at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:697)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:36)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:410)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:460)
I am not able to retrieve the job ID of the resulting job, as the error occurs before I can retrieve a JobReference object. The timeout specified in this case was 300 seconds, and the query failed well before it. The query contains three JOINs and several GROUP EACH BY clauses. Can you suggest a possible way to debug this?
Adding the code snippet:
QueryRequest queryInfo = new QueryRequest().setQuery(sql)
        .setTimeoutMs(timeOutInSec * 1000);
// get project id
BQGameConnectionDetails details = Config.getBQConnectionDetails(gameId);
String projectId = details.getProjectId();
Bigquery.Jobs.Query queryRequest = getInstance(gameId).jobs()
        .query(projectId, queryInfo);
QueryResponse response = queryRequest.execute();
There are two timeouts involved. The first is the timeout on the HTTP request you've sent to BigQuery; the second is the BigQuery request timeout. It sounds like you've set the latter to a large value, but the former is likely the timeout you're hitting. If the HTTP request times out before the BigQuery timeout, the connection will be closed and BigQuery won't have a chance to respond.
There are two options. The first is to increase the HTTP request timeout (which depends on the libraries you're using, but this page here may be helpful). The second is to decrease the BigQuery timeout. This means you'll have to use jobs.getQueryResults() to read the actual results, but this is a more robust method because it doesn't matter how long the query takes: you can just call getQueryResults() in a loop. I would post a link to a good Java sample that does this, but I don't know that one exists, unfortunately.
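In lieu of such a sample, here is a rough, untested sketch of that loop against the same generated client the question uses; the bigquery handle and projectId are assumed to be set up elsewhere:

import java.io.IOException;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.GetQueryResultsResponse;
import com.google.api.services.bigquery.model.QueryRequest;
import com.google.api.services.bigquery.model.QueryResponse;

class QueryPollExample {
    // Issue the query with a short server-side timeout, then poll getQueryResults()
    // until the job completes. Each HTTP call returns within ~10s, staying well
    // under the client's HTTP read timeout no matter how long the query runs.
    static GetQueryResultsResponse runQuery(Bigquery bigquery, String projectId, String sql)
            throws IOException {
        QueryRequest queryInfo = new QueryRequest()
                .setQuery(sql)
                .setTimeoutMs(10000L); // return quickly instead of holding the connection

        QueryResponse response = bigquery.jobs().query(projectId, queryInfo).execute();
        String jobId = response.getJobReference().getJobId();

        GetQueryResultsResponse results;
        do {
            results = bigquery.jobs()
                    .getQueryResults(projectId, jobId)
                    .setTimeoutMs(10000L)
                    .execute();
        } while (!Boolean.TRUE.equals(results.getJobComplete()));
        return results;
    }
}

If you instead want to raise the HTTP-side timeout, the google-http-client HttpRequest exposes setConnectTimeout() and setReadTimeout(), which you can set from a custom HttpRequestInitializer when building the client.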

How to set read timeout for ftp control connection

I am using Apache Commons Net (FTP) version 3.1.
The FTP connection intermittently gets into a hung state during the listing operation.
The reason seems to be that the FTP client is kept waiting indefinitely for the server's response to the FTP command PASV while trying to open the data connection for the listing operation.
How do I set a read timeout on the control connection to avoid this situation?
I have set a read timeout on the data connection using setDataTimeout().
For more, refer to:
http://commons.apache.org/proper/commons-net/apidocs/org/apache/commons/net/ftp/FTPClient.html#setDataTimeout(int)
1) Does setting setSoTimeout() after doing the FTP connect() operation help avoid this situation on the control connection?
For more, refer to:
http://commons.apache.org/proper/commons-net/apidocs/org/apache/commons/net/SocketClient.html#setSoTimeout(int)
2) If so, what is the optimum timeout value I need to set for setSoTimeout()?
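For reference, a minimal sketch of where these commons-net timeouts are set; the host, credentials, and 30-second values are placeholders:

import java.io.IOException;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

class FtpTimeoutExample {
    static FTPFile[] listWithTimeouts(String host, String user, String pass) throws IOException {
        FTPClient ftp = new FTPClient();
        ftp.setDefaultTimeout(30000);   // timeout applied to sockets created by connect()
        ftp.connect(host);
        ftp.setSoTimeout(30000);        // read timeout on the control-connection socket;
                                        // must be called after connect()
        ftp.setDataTimeout(30000);      // read timeout on data connections (listings, transfers)
        ftp.login(user, pass);
        ftp.enterLocalPassiveMode();
        try {
            return ftp.listFiles();     // the PASV reply wait is now bounded by the control timeout
        } finally {
            ftp.disconnect();
        }
    }
}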
Please find stack trace below:
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:140)
at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:464)
at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:506)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:234)
at java.io.InputStreamReader.read(InputStreamReader.java:188)
at java.io.BufferedReader.fill(BufferedReader.java:147)
at java.io.BufferedReader.read(BufferedReader.java:168)
at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
at org.apache.commons.net.ftp.FTPClient.openDataConnection(FTPClient.java:769)
at org.apache.commons.net.ftp.FTPClient.openDataConnection(FTPClient.java:657)
at org.apache.commons.net.ftp.FTPClient.initiateListParsing(FTPClient.java:3097)
at org.apache.commons.net.ftp.FTPClient.initiateListParsing(FTPClient.java:3072)
at org.apache.commons.net.ftp.FTPClient.initiateListParsing(FTPClient.java:2972)
Any help on this will be appreciated:)
Thanks.