Check if Elasticsearch has finished indexing - testing

Is there a way to check if Elasticsearch has finished processing my request?
I want to perform integration tests for my application checking if a record can be found after insertion.
For example, if I make the following request:
POST /_all/_bulk
{"update":{"_id":419,"_index":"popc","_type":"offers"}}
{"doc":{"id":"419","author":"foo bar","number":"642-00419"},"doc_as_upsert":true}
If I then check immediately, the test fails, because it takes some time for Elasticsearch to complete the request. If I sleep for 1 second before the assertion, it works most of the time, but not always. I could extend the sleep to, say, 3 seconds, but that makes the tests very slow, hence my question.
I have tried the cat pending tasks and pending cluster tasks endpoints, but the responses are always empty.
If any of this is relevant, I'm using Elasticsearch 5.4, Laravel Scout 3.0.5 and tamayo/laravel-scout-elastic 3.0.3

I found this PR: https://github.com/elastic/elasticsearch/pull/17986
You can use refresh: wait_for and Elasticsearch will only respond once your data is available for search.
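For example, the same bulk upsert from the question can be sent with refresh set to wait_for. A minimal sketch in TypeScript (Node 18+ fetch; the node address localhost:9200 is an assumption, and the index, type and document are taken from the question above):

// NDJSON body for the bulk request: one line for the action, one for the document.
const body =
  JSON.stringify({ update: { _id: 419, _index: "popc", _type: "offers" } }) + "\n" +
  JSON.stringify({ doc: { id: "419", author: "foo bar", number: "642-00419" }, doc_as_upsert: true }) + "\n";

// With ?refresh=wait_for, Elasticsearch responds only once the document is searchable,
// so a test can query immediately after this call returns.
const response = await fetch("http://localhost:9200/_all/_bulk?refresh=wait_for", {
  method: "POST",
  headers: { "Content-Type": "application/x-ndjson" },
  body,
});
console.log(await response.json());

Note that wait_for holds the request open until the next refresh, which is fine for tests but adds latency you would not want on a hot production write path.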

You can wait for the response: when you receive the response to the update request, the update itself is done (which is why you won't see it in pending or current tasks). I think the problem you're having is with the refresh interval (see the dynamic index settings). Indexed documents are not available for search right away, and the refresh interval is the (maximum) amount of time before they become available. You can change this setting to whatever makes sense for your use case, or use it to work out how long you need to sleep before searching in the integration tests.
If you want to see in-progress tasks, you can use the tasks API.
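If you go the refresh interval route instead, it is a dynamic index setting, so an integration-test suite can lower it just for the index under test. A sketch under the same assumptions (local node, index name popc taken from the question):

// Lower the refresh interval so newly indexed documents become searchable sooner.
// 100ms is an illustrative value; the default is 1s.
await fetch("http://localhost:9200/popc/_settings", {
  method: "PUT",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ index: { refresh_interval: "100ms" } }),
});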

Related

Sonar Api: After scan is finished on new pull request it’s not possible to get /api/measures/component?metricKeys=coverage

SonarQube: Enterprise Edition Version 9.2.4 (build 50792)
Sonar client: 4.7.0.2747
A scan is launched for a merge request in GitLab, and I am requesting coverage for the pull request.
Immediately after the scan (using the scanner client) finishes, I try to get coverage with the following call:
http:///api/measures/component?metricKeys=coverage&component=&pullRequest=
I am getting:
404 : "{"errors":[{"msg":"Component \u0027\u0027 of pull request \u0027\u0027 not found"}]}"
Interestingly, if I add some sleep (1 second) after the scan finishes and before I call for coverage, everything is fine.
It seems to have something to do with the fact that it is a new pull request: even though the scan is finished and a link to the results is generated, it still takes some time before the API call I mentioned can return coverage. Also, if I repeat the operation (scan and get results) on an already existing pull request, there are no issues like this.
Could you please elaborate on this issue: is such behavior expected, or are there other ways I can get coverage right away after the scan finishes without adding any sleeps?
As a side observation, under the same circumstances, if I scan a new pull request and call another API (/issues/search?) to get the list of detected issues, it works without any additional sleeps.
Thank you.
After the call from the scanner client completes, SonarQube executes a "background task" in the project that finalizes the computations of measures. When the background task is complete, your measures will be available. This is why adding a "sleep" appears to work for you. In reality, it's just luck that you're sleeping long enough. The proper way to do this is to either manually check the status of the background task, or use tools that check for the background task completion under the covers.
If you're using Jenkins pipelines, and you have the "webhook" properly configured in SonarQube to notify completion of the background task, then the "waitForQualityGate" pipeline step does this, first checking to see if the task is already complete, and if not, going into a polling loop waiting for it to complete.
The machinery uses the "report-task.txt" file that should be written by the scanner. This is in the form of a Java properties file, but there's only one property in the file that you care about, which is the "ceTaskId" property. That is the id of the background task. You can then make an api call to "/api/ce/task?id=", which returns a block that tells you whether the background task is complete or not.
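A rough sketch of that check in TypeScript on Node (the SonarQube host, token and report-task.txt location are placeholders, and the response is assumed to contain a task object with a status field):

import { readFileSync } from "node:fs";

// report-task.txt is written by the scanner, typically under .scannerwork/.
const props = readFileSync(".scannerwork/report-task.txt", "utf8");
const taskId = /^ceTaskId=(.+)$/m.exec(props)?.[1];

// Poll /api/ce/task until the background task leaves the PENDING/IN_PROGRESS states.
async function waitForBackgroundTask(): Promise<void> {
  for (let attempt = 0; attempt < 60; attempt++) {
    const res = await fetch(`https://sonarqube.example.com/api/ce/task?id=${taskId}`, {
      // Token-as-username Basic auth; replace with whatever auth your instance uses.
      headers: { Authorization: "Basic " + Buffer.from("<token>:").toString("base64") },
    });
    const { task } = await res.json();
    if (task.status === "SUCCESS") return; // measures should now be available
    if (task.status === "FAILED" || task.status === "CANCELED") {
      throw new Error(`Background task ended with status ${task.status}`);
    }
    await new Promise((r) => setTimeout(r, 5000)); // still pending or in progress
  }
  throw new Error("Timed out waiting for the background task");
}

await waitForBackgroundTask();

Once this returns, the /api/measures/component call from the question should no longer 404 for the new pull request.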

How to implement a busy/progress page on Express.js

I have a web app created with Node.js/Express.js/Pug that runs a bash script (mostly an Nmap scan) and displays the results. I'd like to implement some sort of page in between the start and the results to show that the system is working on the task.
I tried to just add another res.render(...) at the beginning of the route that starts the scan, but I ran into the problem that HTTP cannot send headers twice. Effectively, I can't send two HTTP responses for one request; please let me know if I'm wrong here.
I'm still not very familiar with this stuff; I'm working with a group and this job fell to me, any help is appreciated.
Typically the route handler would:
trigger the long running script asynchronously
return an "in progress" page
Then the "in progress" page would ask the server if it was done yet via:
Websocket
Ajax polling
Meta refresh polling
You'd need to have the callback of the original asynchronous process keep track of where the result should go (possibly using a GUID that is passed to it and also returned as data in the "in progress" page).
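A rough sketch of that flow with Express and AJAX polling (the routes, the in-progress view name and the Nmap command are made up for illustration):

import express from "express";
import { exec } from "node:child_process";
import { randomUUID } from "node:crypto";

const app = express();
app.set("view engine", "pug");

// In-memory job store keyed by a GUID; fine for a single process, otherwise use a DB.
const jobs = new Map<string, { done: boolean; output?: string }>();

app.post("/scan", (req, res) => {
  const id = randomUUID();
  jobs.set(id, { done: false });
  // Kick off the long-running script asynchronously; do not block the response.
  exec("nmap -sV scanme.nmap.org", (err, stdout) => {
    jobs.set(id, { done: true, output: err ? String(err) : stdout });
  });
  // The one and only response to this request: the "in progress" page,
  // which carries the job id so it can poll for completion.
  res.render("in-progress", { jobId: id });
});

app.get("/scan/:id/status", (req, res) => {
  const job = jobs.get(req.params.id);
  if (!job) {
    res.sendStatus(404);
    return;
  }
  res.json(job); // the in-progress page polls this until done === true, then shows the results
});

app.listen(3000);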

How to continue a test when the page has not completely loaded in Selenium

I am creating automated tests for an e-commerce website. The website uses lazy loading (or something similar), and I am testing it on a UAT server, so pages load slowly because of the server's specification. It takes 60 seconds or more to load all the resources of a page. When I run my Selenium automation, it always waits more than 60 seconds before continuing to the next step (because it waits for the page to load fully). Can someone give me tips on how to continue with the next test step after waiting only 10 seconds for the page to load, without throwing an exception?
Not possible.
If you find an element and try to execute an action while the page is still loading, you will get stale element errors, and because of the loading issues you will have a lot of failed tests and spend much more time debugging.
Automation is meant to execute fast and give reliable results.
It seems that this environment is not built for automation; you should request more resources.
As an alternative, maybe you can use a headless driver, or see if you can put the same build on a VM.
Why this is an issue: Selenium needs to wait for each request to complete. For example, when you request a page, if the page has not been received entirely and the server is still sending data, the request is not done; it is logical that you need a complete request in order to continue.
You should raise this with your Project Manager/QA Lead and ask for advice/options on how to handle it.
Please note that these costs should be included in the automation price. You can put it to them in simple terms:
good server -> automation runs smoothly and fast, and the testing is done faster
bad server -> automation cannot be run reliably and each test has a high failure rate => the alternative is X day(s) of manual testing for each build
If this were a coding issue, like a delayed AJAX request, there would be solutions and the devs could help; but if it is an infrastructure/resources issue, it does not depend on you and you cannot solve it.
You could try any type of wait, implicit or explicit (an explicit wait will eventually throw an exception on timeout), but this is not a solution for poor resources.
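For completeness, this is the kind of explicit wait referred to above, sketched with the Node selenium-webdriver package (the URL and selector are invented); note that it only helps once navigation itself has returned, so it does not fix a slow server:

import { Builder, By, until } from "selenium-webdriver";

async function run() {
  const driver = await new Builder().forBrowser("chrome").build();
  try {
    // get() still blocks until the browser considers the page loaded.
    await driver.get("https://uat.example.com/catalog");
    // Wait up to 10 seconds for the one element the step actually needs,
    // instead of sleeping a fixed amount of time.
    const list = await driver.wait(until.elementLocated(By.css(".product-list")), 10000);
    await list.click();
  } finally {
    await driver.quit();
  }
}

run();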

How to update file upload messages using backbone?

I am uploading multiple files using JavaScript.
After I upload the files, I need to run several processing functions.
Because of the processing time required, I need a UI on the frontend telling the user the estimated time left for the entire process.
Basically I have 3 functions:
/upload - this is an endpoint for uploading the files
/generate/metadata - this is the next endpoint that should be triggered after /upload
/process - this is the last endpoint. Should be triggered after /generate/metadata
This is basically how I expect the screen to look.
Information such as percentage remaining and time left should be displayed.
However, I am unsure whether to have the server supply this information or to make a hackish estimate solely in JavaScript.
I would also need to update the screen to show the user messages such as:
"Currently uploading" if I am at function 1.
"Generating metadata" if I am at function 2.
"Processing..." if I am at function 3.
Function 2 only occurs after the successful completion of 1.
Function 3 only occurs after the successful completion of 2.
I am already using q.js promises to handle some parts of this, but the code has gotten scarily messy.
I recently came across Backbone, and it allows structured ways to handle single-page app behavior, which is what I wanted.
I have no problems with the server-side returning back json responses for success or failure of the endpoints.
I was wondering what would be a good way to implement this function using Backbone.js
You can use a "progress" file or DB entry which stores the state of the backend process. Have your backend process periodically update this file. For example, write this to the file:
{"status": "Generating metadata", "time": "3 mins left"}
After the user submits the files, have the frontend start pinging a backend progress endpoint using a simple AJAX call and setTimeout. The progress handler simply opens this file, grabs the JSON-formatted status info, and returns it so the frontend can update its progress bar.
You'll probably want the ajax call to be attached to your model(s). Have your frontend view watch for changes to the status and update accordingly (e.g. a progress bar).
Long Polling request:
Polling request for updating Backbone Models/Views
Basically, when you upload a file you assign a "FileModel" to each given file. The FileModel starts a long-polling request every N seconds, until it gets the status "complete".
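A minimal sketch of that polling with Backbone (the /progress endpoint, attribute names and 2-second interval are illustrative; jQuery and Underscore are assumed to be loaded as usual for Backbone):

import Backbone from "backbone";

class Progress extends Backbone.Model {
  url() {
    return "/progress"; // backend returns e.g. {"status": "Generating metadata", "time": "3 mins left"}
  }
}

class ProgressView extends Backbone.View<Progress> {
  initialize() {
    // Re-render whenever a poll brings back a changed status.
    this.listenTo(this.model, "change", this.render);
  }
  render() {
    this.$el.text(`${this.model.get("status")} (${this.model.get("time")})`);
    return this;
  }
}

const progress = new Progress();
new ProgressView({ model: progress, el: "#progress" });

// Poll the backend until it reports completion; each fetch() triggers "change"
// events on the model, which the view reacts to.
(function poll() {
  progress.fetch({
    success: () => {
      if (progress.get("status") !== "complete") setTimeout(poll, 2000);
    },
  });
})();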

In the new ASP.NET Web API, how do I design for "Batch" requests?

I'm creating a web API based on the new ASP.NET Web API. I'm trying to understand the best way to handle people submitting multiple data-sets at the same time. If they have 100,000 requests it would be nice to let them submit 1,000 at a time.
Let's say I have a create new Contact method in my Contacts Controller:
public string Put(Contact _contact)
{
    // add new _contact to repository
    repository.Add(_contact);

    // return success
    return "Contact added";
}
What's the proper way to allow users to "Batch" submit new contacts? I'm thinking:
public string BatchPut(IEnumerable<Contact> _contacts)
{
    foreach (var contact in _contacts)
    {
        repository.Add(contact);
    }

    return "Contacts added";
}
Is this a good practice? Will this parse a GET request with a JSON array of Contacts (assuming they are correctly formatted)?
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
Thanks a million!
When you PUT a collection, you are either inserting the whole collection or replacing an existing collection as if it were a single resource. It is very similar to GET, DELETE or POST on a collection. It is an atomic operation. Using it as a substitute for individual calls to PUT a contact may not be very RESTful (but that is really open for debate).
You may want to look at HTTP pipelining and send multiple PutContact requests over the same socket. With each request you can return a standard HTTP status for that single request.
I implemented batch updates in the past with SOAP and we encountered a number of unforeseen issues when the system was under load. I suspect you will run into the same issues if you don't pay attention.
For example, the database might time out in the middle of the batch update, and then all hell broke loose in terms of failures, reliability, transactions, etc., and the poor client had to figure out what was actually updated and try again.
When there were too many records to update, the HTTP request would time out because we took too long. That opened another can of worms.
Another concern was how much data we would accept during the update. Was 10MB of contacts enough? Perhaps 1MB? Larger buffers have numerous implications in terms of memory usage and security.
Hence my suggestion to look at HTTP pipelining.
Update
My suggestion would be to handle batch creation of contacts as an async process. Just assume that a "job" is the same as a "batch create" process. So the service might look as follows:
public class JobService
{
    // Post
    public void Create(CreateJobRequest job)
    {
        // 1. Create job in the database with status "pending"
        // 2. Save job details to disk (or S3)
        // 3. Submit the job to MSMQ (or SQS)
        // 4. For 20 seconds, poll the database to see if the job completed
        // 5. If the job completed, return 201 with a URI to the "Get" method below
        // 6. If not, return 202 (i.e. the request was accepted for processing, but has not completed)
    }

    // Get
    public Job Get(string id)
    {
        // 1. Fetch the job from the database
        // 2. Return the job if it exists, or 404
    }
}
The background process that consumes stuff from the queue can update the database or alternatively perform a PUT to the service to update the status of Job to running and completed.
You'll need another service to navigate through the data that was just processed, address errors and so forth.
Your background process may need to be tolerant of validation errors. If it is not, or if your service does the validation (assuming you are not doing database calls, etc., for which response times cannot be guaranteed), you can return a structure like CreateJobResponse that contains enough information for your client to fix the issue and resubmit the request. If you have to do validation that is time-consuming, do it in the background process, mark the job as failed, and update the job with the information that will allow a client to fix the errors and resubmit the request. This assumes that the client can do something with the fact that the job failed.
If the Create method breaks the job request into many smaller "jobs" you'll have to deal with the fact that it may not be atomic and pose numerous challenges to monitor whether jobs completed successfully.
A PUT operation is supposed to replace a resource. Normally you do this against a single resource, but when doing it against a collection, that would mean you replace the original collection with the set of data passed. Not sure if you mean to do that, but I am assuming you are just updating a subset of the collection, in which case a PATCH method would be more appropriate.
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
That is really up to you. There is only a single response so you can send a 200 OK or a 400 Bad Request and put the details in the body.
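For the 4-out-of-300 case, "the details in the body" usually means an overall summary plus a per-item result list, so the client knows exactly which contacts to fix and resubmit. One possible shape, expressed here as a TypeScript type purely for illustration (the field names are not a Web API convention):

// Hypothetical response body for a batch create where a few items failed.
interface BatchResult {
  succeeded: number;
  failed: number;
  errors: { index: number; message: string }[];
}

const example: BatchResult = {
  succeeded: 296,
  failed: 4,
  errors: [
    { index: 17, message: "missing contact name" },
    { index: 42, message: "duplicate contact number" },
  ],
};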