Why is the Elasticsearch request failing after 10,000 documents? (NEST)

We are using ElasticSearch.NET / NEST to query an Elasticsearch instance. The plan is to fetch documents in batches of 1,000 and process them before fetching the next 1,000.
However, it always fails after processing 10 batches:
ELK Search failed Invalid NEST response built from a unsuccessful (500)
If we change the batch size to 10,000 it fails after 1 batch; with a batch size of 100 it fails after 100 batches.
The failure always comes after 10,000 documents.
The code looks something like this:
private void ProcessRequest(SearchRequest request)
{
    request.Size = 1000;
    for (request.From = 0; request.From < 1_000_000; request.From += request.Size)
    {
        Console.WriteLine(request.From);
        var response = _client.Search<GroupStaticElkDocument>(request);
        foreach (var document in response.Documents)
            _requestCounter.Add(document.ToRequest());
    }
}

Maybe you should try scrolling instead of from/size paging. By default Elasticsearch rejects any search where from + size goes past the index.max_result_window setting, which defaults to 10,000, so the request always fails once you have paged through 10,000 documents regardless of the batch size. The scroll API (or search_after) does not have that limit.
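A minimal sketch of what scrolling could look like with NEST, reusing the GroupStaticElkDocument type and the _client/_requestCounter fields from the question; the "2m" keep-alive is an arbitrary example value:

var response = _client.Search<GroupStaticElkDocument>(s => s
    .Size(1000)
    .Scroll("2m")
    .Query(q => q.MatchAll()));

while (response.Documents.Any())
{
    foreach (var document in response.Documents)
        _requestCounter.Add(document.ToRequest());

    // Ask Elasticsearch for the next batch of the same scroll.
    response = _client.Scroll<GroupStaticElkDocument>("2m", response.ScrollId);
}

// Free the scroll context on the server when done.
_client.ClearScroll(c => c.ScrollId(response.ScrollId));

Each Scroll call returns the next 1,000 hits until the result set is exhausted, so there is no from offset that can run into the 10,000 limit.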

Database timeout in Azure SQL

We have a .NET Core API accessing Azure SQL (Gen5, 4 vCores).
For quite some time now, the API has been throwing the exception below for a specific READ operation:
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The READ operation has code to read rows of data and convert an XML column into a specific output format.
Most of the read operations extract hardly 4-5 rows at a time.
The tables involved in the query have ~500,000 rows.
We are clueless about the root cause of this issue.
Any hints on where to start looking for the root cause? Any pointer would be highly appreciated.
NOTE: The connection string has the following settings, apart from others:
MultipleActiveResultSets=True;Connection Timeout=60
The overall code looks something like this.
HINT: The above timeout exception comes at ConvertHistory, when the 2nd table is being read.
[HttpGet]
public async Task<IEnumerable<SalesOrder>> GetNewSalesOrders()
{
    var salesOrders = await _db.SalesOrders.Where(o => o.IsImported == false).OrderBy(o => o.ID).ToListAsync();
    var orders = new List<SalesOrder>();
    foreach (var so in salesOrders)
    {
        var order = ConvertSalesOrder(so);
        orders.Add(order);
    }
    return orders;
}
private SalesOrder ConvertSalesOrder(SalesOrder o)
{
    var newOrder = new SalesOrder();
    var oXml = o.XMLContent.LoadFromXMLString<SalesOrder>();
    ...
    newOrder.BusinessUnit = oXml.BusinessUnit;
    var history = ConvertHistory(o.ID);
    newOrder.history = history;
    return newOrder;
}
private SalesOrderHistory[] ConvertHistory(string id)
{
    var history = _db.OrderHistory.Where(o => o.ID == id);
    ...
}
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
From the Microsoft documentation:
You can get this error for either a connection timeout or a query/command timeout; first identify which one it is from the call stack of the error message.
If it turns out to be a connection issue, you can increase the Connection Timeout parameter; if you are still getting the same error after that, it is likely caused by a network issue.
From the information you provided, this is a query or command timeout. To work around it you can raise the CommandTimeout for the query or command, for example:
command.CommandTimeout = 120;
The default timeout value is 30 seconds; if the timeout is set to 0 there is no time limit and the query will continue to run until it finishes.
For more information, refer to the Troubleshoot query time-out errors article provided by Microsoft.
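The code in the question appears to use Entity Framework Core (DbSet plus ToListAsync), in which case the command timeout can be raised on the context rather than on a raw SqlCommand. A minimal sketch, assuming a hypothetical OrdersContext; the 120-second value is only an example:

using Microsoft.EntityFrameworkCore;

public class OrdersContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder options)
    {
        // Applies to every command this context issues; 120 s is an arbitrary example value.
        options.UseSqlServer(
            "<connection string>",
            sql => sql.CommandTimeout(120));
    }
}

// Or raise it for a single operation on an existing context instance:
// _db.Database.SetCommandTimeout(120);

Note that the Connection Timeout=60 already present in the connection string only affects how long establishing the connection may take; it has no effect on how long a command is allowed to run.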

How can I get a list of more than 250 tests from a run using the TestRail API?

Returns a list of tests for a test run
https://www.gurock.com/testrail/docs/api/reference/tests#gettests
There is an API from TestRail that returns the list of test cases for one test run (by id).
The limitation is that it only returns up to 250 entities at a time.
How can I get more than that, e.g. 400 or 500 cases, from a run?
The maximum (and default) value of the limit parameter is 250, so you can't get more than 250 with one request. But there is an offset parameter for that, so you can set the start position of the next chunk.
You can also use the "next" link from the response; here is an example:
"_links": { "next": "/api/v2/get_tests/1&limit=250&offset=250", "prev": null }
Here is some Python code I wrote that keeps querying until it has fetched all of the results.
def _execute_get_until_finished(self, resp, extract):
    results = []
    finished = False
    while not finished:
        results.extend(resp[extract])
        finished = resp["_links"]["next"] is None
        if not finished:
            resp = self.send_get(
                resp["_links"]["next"].replace("/api/v2/", "").replace("&limit=250", ""))
    return results
Example:
cases = self._execute_get_until_finished(
    self.send_get(f"get_cases/{self.project_id}/{self.suite_id}"),
    "cases")
Note: the reason for removing limit is a bug I found where next would be null even though there were still results to retrieve; removing the limit fixed that issue.

BigQuery streaming lost data but no errors are reported

$insertResponse = $bqTable->insertRows($insertRows);
if ($insertResponse->isSuccessful()) {
    return true;
} else {
    foreach ($insertResponse->failedRows() as $row) {
        foreach ($row['errors'] as $error) {
            Log::error('Streaming to BigQuery Error: ' . $error['reason'] . ' ' . $error['message']);
        }
    }
    return false;
}
I used the above code (copied from the PHP client sample code). Basically, what it does is: if the streaming succeeds it returns true, and if it fails it returns false.
I have 524,845 rows to insert. To avoid an oversize error, I call the stream statement above once for every 1,000 rows, and then once more for the last 845 rows.
If the streaming is successful (returns true), I continue streaming the next 1,000 rows. If it fails, I stop the whole streaming process.
I found that BigQuery streaming is not stable. In my tests, most of the time all 524,845 rows were streamed into the table, but once in a while I lost some rows. For example, one time only 522,845 rows ended up in the table, with no error reported or logged.
Since I stream 1,000 rows at a time, it looks like two of my stream calls failed and I lost 2,000 rows. But there was no error report, and if an error had been reported my code would have stopped.
Please advise what I should do next to debug this BigQuery streaming issue.
Is an insertId being provided while inserting the rows? If so, is it possible the insertIds are duplicated? That could cause BigQuery to discard rows it believes to be duplicates.
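To illustrate the idea (sketched here with the .NET BigQuery client, since the other examples in this document are C#; the PHP client accepts an equivalent per-row insertId option), each logical row should carry its own stable, unique insertId. The project, dataset, table, and field names below are placeholders:

using Google.Cloud.BigQuery.V2;

var client = BigQueryClient.Create("my-project");
var rows = new[]
{
    // Reusing an insertId across batches makes BigQuery silently drop the row
    // as a presumed duplicate, which would look exactly like "lost" rows.
    new BigQueryInsertRow(insertId: "order-1001") { { "id", 1001 }, { "amount", 12.5 } },
    new BigQueryInsertRow(insertId: "order-1002") { { "id", 1002 }, { "amount", 7.0 } },
};
client.InsertRows("my_dataset", "my_table", rows);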

RavenDB Query Statistics server execution time in milliseconds

I am trying to print the query statistics after executing a given query. In particular, I am interested in the execution time on the server in milliseconds. Below is my code for reference:
void Main()
{
    var documentStore = DocumentStoreHolder.Store;
    Load_Stats(documentStore);
}

// Define other methods and classes here
public static void Load_Stats(IDocumentStore documentStore)
{
    using (var session = documentStore.OpenSession())
    {
        RavenQueryStatistics stats;
        IRavenQueryable<Order> recentOrdersQuery =
            from order in session.Query<Order>().Statistics(out stats)
            where order.Company == "companies/1"
            select order;
        List<Order> recentOrders = recentOrdersQuery.Take(3).ToList();
        Console.WriteLine("Index used was: " + stats.IndexName);
        Console.WriteLine($"Other stats : 1. Execution time on the server : {stats.DurationMilliseconds} " +
                          $"2. Total number of results : {stats.TotalResults} " +
                          $"3. The last document ETag : {stats.ResultEtag} " +
                          $"4. The timestamp of the last document indexed by the index : {stats.IndexTimestamp}");
    }
}
But upon repeated execution of this query, the time taken to run the query on the server comes back as -1 milliseconds. I am failing to understand why this happens. Should I assign the result to a long variable first, or is it fine to print stats.DurationMilliseconds directly? TIA
The most likely reason is that RavenDB was able to serve the request from the client-side cache instead of going to the server.
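One way to confirm that is to bypass the cache for this one query and see whether the duration becomes non-negative. A minimal sketch, assuming the NoCaching() query customization exposed by the RavenDB client:

RavenQueryStatistics stats;
List<Order> recentOrders = session.Query<Order>()
    .Statistics(out stats)
    .Customize(x => x.NoCaching())      // skip the client-side cache for this query only
    .Where(order => order.Company == "companies/1")
    .Take(3)
    .ToList();

// With the cache bypassed, this should report the actual server-side time.
Console.WriteLine(stats.DurationMilliseconds);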

Why is the first SaveChanges slower than following calls?

I'm investigating some performance problems in an experimental scheduling application I'm working on. I found that calls to session.SaveChanges() were pretty slow, so I wrote a simple test.
Can you explain why the first iteration of the loop takes 200 ms and subsequent loops 1-2 ms? How can I leverage this in my application (I don't mind the first call being this slow as long as all subsequent calls are quick)?
private void StoreDtos()
{
    for (int i = 0; i < 3; i++)
    {
        StoreNewSchedule();
    }
}

private void StoreNewSchedule()
{
    var sw = Stopwatch.StartNew();
    using (var session = DocumentStore.OpenSession())
    {
        session.Store(NewSchedule());
        session.SaveChanges();
    }
    Console.WriteLine("Persisting schedule took {0} ms.", sw.ElapsedMilliseconds);
}
Output is:
Persisting schedule took 189 ms. // first time
Persisting schedule took 2 ms. // second time
Persisting schedule took 1 ms. // ... etc
Above is for an in-memory database. Using a http connection to a Raven DB instance (on the same machine), I get similar results. The first call takes noticeably more time:
Persisting schedule took 1116 ms.
Persisting schedule took 37 ms.
Persisting schedule took 14 ms.
On GitHub: RavenDB 2.0 test code and RavenDB 2.5 test code.
The very first time that you call RavenDB, there are several things that have to happen:
We need to prepare the serializers for your entities, which takes time.
We need to create the TCP connection to the server.
On the next calls, we can reuse the connection that is already open and the serializers that were already created.
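If that one-time cost matters, it can be paid at application startup rather than on the first real request. A minimal sketch; the WarmUp helper below is illustrative, not a RavenDB API, and simply performs one throwaway round-trip using the same DocumentStore and NewSchedule() factory from the question:

// Illustrative warm-up (not part of RavenDB): store and immediately delete one
// entity of the type the application actually persists, so serializer setup and
// the TCP connection are in place before the first real SaveChanges.
private void WarmUp()
{
    using (var session = DocumentStore.OpenSession())
    {
        var schedule = NewSchedule();   // same factory the test already uses
        session.Store(schedule);
        session.SaveChanges();          // pays the serializer + connection cost here
        session.Delete(schedule);       // remove the throwaway document again
        session.SaveChanges();
    }
}

Calling this once during startup moves the ~200 ms (or ~1100 ms over HTTP) first-call penalty out of the measured StoreNewSchedule() calls.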