Request timed out error on copying data in azure data factory - azure-data-factory-2

I am receiving the error below when running a copy activity in my ADF pipeline. My source and sink are Cosmos DB containers in different subscriptions. The ADF pipeline is created in the subscription that contains the target (sink) Cosmos DB container.
Error:
Error code: 2200
Failure type: User configuration issue

Type=Microsoft.Azure.Documents.RequestTimeoutException,Message=Request timed out. ActivityId: 0d2b8ebb-090d-43eb-8494-f82e53b3134b, Request URI: /dbs/ZLQDAA==/colls/ZLQDAIez1wo=/docs, RequestStats: , SDK: documentdb-dotnet-sdk/2.5.1 Host/64-bit MicrosoftWindowsNT/6.2.9200.0,Source=Microsoft.Azure.Documents.Client,''Type=System.Threading.Tasks.TaskCanceledException,Message=A task was canceled.,Source=mscorlib,'

(the same RequestTimeoutException / TaskCanceledException pair is repeated three more times in the error detail, with the same ActivityId and request URI)

As per the official documentation:
Cosmos DB limits a single request's size to 2 MB. The formula is Request Size = Single Document Size * Write Batch Size. If you hit an error saying "Request size is too large.", reduce the writeBatchSize value in the copy sink configuration.
Page size: The number of documents per page of the query result. The default is "-1", which uses the service-side dynamic page size of up to 1000.
Throughput: Set an optional value for the number of RUs you'd like to apply to your Cosmos DB collection for each execution of this data flow during the read operation. The minimum is 400.
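As a rough illustration of that sizing formula (the average document size below is an assumption, not something taken from the question):

# Request Size = Single Document Size * Write Batch Size, and Cosmos DB caps a single request at 2 MB.
MAX_REQUEST_BYTES = 2 * 1024 * 1024    # 2 MB per bulk write request
avg_doc_bytes = 20 * 1024              # assumed average document size of ~20 KB

safe_write_batch_size = MAX_REQUEST_BYTES // avg_doc_bytes
print(safe_write_batch_size)           # 102 -> set writeBatchSize to about 100 or lower in the copy sink

If your documents are larger on average, writeBatchSize has to drop accordingly to stay under the 2 MB limit.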

Related

Load from GCS to GBQ causes an internal BigQuery error

My application creates thousands of "load jobs" daily to load data from Google Cloud Storage URIs to BigQuery, and only a few cases cause the error:
"Finished with errors. Detail: An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 7916072"
The application is written in Python and uses these libraries:
google-cloud-storage==1.42.0
google-cloud-bigquery==2.24.1
google-api-python-client==2.37.0
The load job is created by calling:
load_job = self._client.load_table_from_uri(
    source_uris=source_uri,
    destination=destination,
    job_config=job_config,
)
this method has a default param:
retry: retries.Retry = DEFAULT_RETRY,
so the job should automatically retry on such errors.
ID of a specific job that finished with the error:
"load_job_id": "6005ab89-9edf-4767-aaf1-6383af5e04b6"
"load_job_location": "US"
After getting the error, the application recreates the job, but it doesn't help.
Subsequent failed job ids:
5f43a466-14aa-48cc-a103-0cfb4e0188a2
43dc3943-4caa-4352-aa40-190a2f97d48d
43084fcd-9642-4516-8718-29b844e226b1
f25ba358-7b9d-455b-b5e5-9a498ab204f7
...
As mentioned in the error message, wait according to the back-off requirements described in the BigQuery Service Level Agreement, then try the operation again.
If the error continues to occur and you have a support plan, please create a new GCP support case. Otherwise, you can open a new issue on the issue tracker describing the problem. You can also try to reduce the frequency of this error by using Reservations.
For more information about the error messages, you can refer to this document.
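A rough sketch of that retry-with-back-off pattern around the load job (the exception type, attempt count, and sleep values are assumptions, not taken from the question):

import random
import time

from google.cloud import bigquery
from google.cloud.exceptions import GoogleCloudError

client = bigquery.Client()

def load_with_backoff(source_uri, destination, job_config, max_attempts=5):
    """Resubmit the load job with exponential back-off if it finishes with an error."""
    for attempt in range(max_attempts):
        load_job = client.load_table_from_uri(
            source_uris=source_uri,
            destination=destination,
            job_config=job_config,
        )
        try:
            return load_job.result()   # blocks until the job finishes; raises if it finished with errors
        except GoogleCloudError:
            if attempt == max_attempts - 1:
                raise
            time.sleep((2 ** attempt) + random.random())   # 1s, 2s, 4s, ... plus jitter

Note that the retry parameter on load_table_from_uri only covers the API call that creates the job; a job that itself finishes with an internal error has to be resubmitted, as in the sketch above.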

Rancher Cluster Flapping - Increase API Read Body Timeout?

We are using Rancher 2.2.13 and Kubernetes 1.13.12 in GKE. Our cluster's connection in Rancher keeps flapping. The agent logs show:
E0804 00:50:08.384154 6 request.go:853] Unexpected error when reading response body: context.deadlineExceededError{}
E0804 00:50:08.384223 6 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Secret: Unexpected error context.deadlineExceededError{} when reading response body. Please retry.
E0804 00:50:08.385380 6 request.go:853] Unexpected error when reading response body: &http.httpError{err:"context deadline exceeded (Client.Timeout exceeded while reading body)", timeout:true}
E0804 00:50:08.385431 6 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ConfigMap: Unexpected error &http.httpError{err:"context deadline exceeded (Client.Timeout exceeded while reading body)", timeout:true} when reading response body. Please retry.
The underlying issue seems to be that there are roughly 24K ConfigMaps and 17K Secrets in this particular cluster, so the response for both list calls is obviously going to be immense.
Is there any way to increase the read timeout for the body? Is there a paging feature, or is there any way to implement one if there isn't?
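For reference, the Kubernetes list API does support chunked reads via the limit and continue parameters; below is a minimal sketch using the Kubernetes Python client (the chunk size is illustrative, and whether Rancher's controllers can take advantage of this is a separate question).

from kubernetes import client, config

config.load_kube_config()          # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

config_maps = []
continue_token = None
while True:
    # Ask the API server for ConfigMaps in chunks of 500 instead of one huge list response.
    kwargs = {"limit": 500}
    if continue_token:
        kwargs["_continue"] = continue_token
    resp = v1.list_config_map_for_all_namespaces(**kwargs)
    config_maps.extend(resp.items)
    continue_token = resp.metadata._continue
    if not continue_token:
        break

print(f"fetched {len(config_maps)} ConfigMaps")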

LeaseAlreadyPresent Error in Azure Data Factory V2

I am getting the following error in a pipeline that has a Copy activity with a REST API source and Azure Data Lake Storage Gen2 as the sink.
"message": "Failure happened on 'Sink' side. ErrorCode=AdlsGen2OperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'Conflict'. Account: '{Storage Account Name}'. FileSystem: '{Container Name}'. Path: 'foodics_v2/Burgerizzr/transactional/_567a2g7a/2018-02-09/raw/inventory-transactions.json'. ErrorCode: 'LeaseAlreadyPresent'. Message: 'There is already a lease present.'. RequestId: 'd27f1a3d-d01f-0003-28fb-400303000000'..,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.Azure.Storage.Data.Models.ErrorSchemaException,Message=Operation returned an invalid status code 'Conflict',Source=Microsoft.DataTransfer.ClientLibrary,'",
The pipeline runs in a ForEach loop with batch size = 5. When I make it sequential, the error goes away, but I need to run it in parallel.
This is a known issue caused by an ADF limitation: pipeline variables are not safe to set from parallel ForEach iterations. You are probably building the file name with a variable inside the loop, so parallel iterations overwrite each other's value and end up writing to the same sink path, which produces the lease conflict.
Your options are to move the variable logic into a child pipeline and call it with an Execute Pipeline activity from each iteration (i.e. variable -> Execute Pipeline), or to remove the variables and hard-code those variable expressions directly in the activity settings.
Hope this helps.

GoogleApiException: Google.Apis.Requests.RequestError Backend Error [500] when streaming to BigQuery

I've been streaming data to BigQuery for the past year or so from a service in Azure written in C#, and recently started to get an increasing number of the following errors (most of the requests succeed):
Message: [GoogleApiException: Google.Apis.Requests.RequestError
An internal error occurred and the request could not be completed. [500]
Errors [
  Message[An internal error occurred and the request could not be completed.] Location[ - ] Reason[internalError] Domain[global]
]
]
This is the code I'm using in my service:
public async Task<TableDataInsertAllResponse> Update(List<TableDataInsertAllRequest.RowsData> rows, string tableSuffix)
{
    var request = new TableDataInsertAllRequest { Rows = rows, TemplateSuffix = tableSuffix };
    var insertRequest = mBigqueryService.Tabledata.InsertAll(request, ProjectId, mDatasetId, mTableId);
    return await insertRequest.ExecuteAsync();
}
Just like any other cloud service, BigQuery doesn't offer a 100% uptime SLA (it's actually 99.9%), so it's not uncommon to encounter transient errors like these. We also receive them frequently in our applications.
You need to build exponential backoff-and-retry logic into your application(s) to handle such errors. A good way of doing this is to use a queue to stream your data to BigQuery. This is what we do and it works very well for us.
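The back-off pattern itself is language-agnostic; here is a minimal sketch of it around a streaming insert, written in Python for brevity (the table ID, rows, attempt count, and sleep values are all illustrative, not taken from the question).

import random
import time

from google.api_core.exceptions import InternalServerError, ServiceUnavailable
from google.cloud import bigquery

client = bigquery.Client()

def stream_rows(table_id, rows, max_attempts=5):
    """Streaming insert with exponential back-off on transient 5xx errors."""
    for attempt in range(max_attempts):
        try:
            return client.insert_rows_json(table_id, rows)   # empty list means every row was accepted
        except (InternalServerError, ServiceUnavailable):
            if attempt == max_attempts - 1:
                raise
            time.sleep((2 ** attempt) + random.random())     # 1s, 2s, 4s, ... plus jitter

In practice the queue mentioned above sits in front of a function like this, so failed batches simply stay queued until a retry succeeds.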
Some more info:
https://cloud.google.com/bigquery/troubleshooting-errors
https://cloud.google.com/bigquery/loading-data-post-request#exp-backoff
https://cloud.google.com/bigquery/streaming-data-into-bigquery
https://cloud.google.com/bigquery/sla

Read Timed Out: synchronous query via BigQuery Java API

We are using the BigQuery Java API to retrieve results for our analytics reporting frontend, and we are trying to retrieve the results synchronously. A lot of the time we get a "Read timed out" error, even before the query timeout specified in the parameters is reached. Here's the stack trace for a sample failure:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:830)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:787)
at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:697)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:36)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:410)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:460)
I am not able to retrieve the job ID of the resulting job, as the error occurs before I can retrieve a JobReference object. The timeout specified in this case was 300 seconds, and the query failed well before that. The query contains three JOINs and several GROUP EACH BY clauses. Can you suggest a possible way to debug this?
Adding the code snippet:
QueryRequest queryInfo = new QueryRequest().setQuery(sql)
        .setTimeoutMs(timeOutInSec * 1000);
// get project id
BQGameConnectionDetails details = Config.getBQConnectionDetails(gameId);
String projectId = details.getProjectId();
Bigquery.Jobs.Query queryRequest = getInstance(gameId).jobs().query(projectId, queryInfo);
QueryResponse response = queryRequest.execute();
There are two timeouts involved. The first is the timeout on the HTTP request you've sent to BigQuery; the second is the BigQuery request timeout. It sounds like you've set the latter to a large value, but the former is likely the timeout that you're hitting. If the HTTP request times out before the BigQuery timeout, the connection will be closed and BigQuery won't have a chance to respond.
There are two options: the first is to increase the HTTP request timeout (which depends on the libraries you're using, but this page here may be helpful). The second is to decrease the BigQuery timeout. This means you'll have to use jobs.getQueryResults() to read the actual results, but this is a more robust method because it doesn't matter how long the query takes; you can just call getQueryResults() in a loop. I would post a link to a good Java sample that does this, but I don't know that one exists, unfortunately.
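As an illustration of that submit-then-poll pattern, here is a rough sketch using the Python discovery client, which calls the same jobs.query / jobs.getQueryResults REST methods as the Java client (the query, project ID, and timeout values are placeholders):

from googleapiclient.discovery import build

bigquery = build("bigquery", "v2")    # assumes application default credentials are configured
project_id = "example-project"        # placeholder
sql = "SELECT 1"                      # placeholder query

# Submit the query with a short server-side timeout so the HTTP call returns quickly.
response = bigquery.jobs().query(
    projectId=project_id,
    body={"query": sql, "timeoutMs": 10000},
).execute()
job_id = response["jobReference"]["jobId"]

# Poll until the job is done; each call waits at most timeoutMs on the server side,
# so the client's HTTP read timeout is never the limiting factor.
while not response.get("jobComplete"):
    response = bigquery.jobs().getQueryResults(
        projectId=project_id,
        jobId=job_id,
        timeoutMs=10000,
    ).execute()

rows = response.get("rows", [])       # page through with pageToken for large result sets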