Force Lucene.NET to show results in N seconds

I'd like to get a response from Lucene.NET after N seconds, even if there are no results yet. How can I do that?
I am currently facing a problem: the entire Lucene.NET index is located in a central place, and each instance, after a reboot, has to copy the index locally before any search can happen.
The copy is initiated by the first Lucene.NET request and takes a few minutes to complete. Currently every Lucene.NET instance just hangs and waits, so I would like to FORCE them to respond no matter what.
Please help.
[EDIT]
So the path forward is TimeLimitingCollector. This gives me another question: how do I use multiple collectors together?
My original code is:
TopFieldCollector collector = TopFieldCollector.create(Sort.RELEVANCE, resultAmount,
    false,
    true /* trackDocScores */,
    true /* trackMaxScore */,
    false /* docsInOrder */);
searcher.Search(query, new PositiveScoresOnlyCollector(collector));
Where should I put TimeLimitingCollector?

You can use a TimeLimitingCollector.
[EDIT]
I am not familiar with Lucene.NET, but with Lucene Java you just need to wrap your collector inside a TimeLimitingCollector, and it will throw a time-out exception whenever it tries to collect a document after the time limit.
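To answer the follow-up about combining collectors: the TimeLimitingCollector wraps the outermost collector in the chain. Below is a minimal sketch in Lucene Java style (the older 2.9/3.x two-argument constructor; the Lucene.NET port mirrors these names, so adjust casing and the exact constructor to your version). The time budget and variable names are illustrative assumptions.
long timeAllowedMs = 5000; // the N-second budget, expressed in milliseconds

TopFieldCollector collector = TopFieldCollector.create(Sort.RELEVANCE, resultAmount,
    false, true /* trackDocScores */, true /* trackMaxScore */, false /* docsInOrder */);

// Wrap the whole collector chain so the time limit applies to everything inside it.
Collector limited = new TimeLimitingCollector(
    new PositiveScoresOnlyCollector(collector), timeAllowedMs);

try {
    searcher.search(query, limited);
} catch (TimeLimitingCollector.TimeExceededException e) {
    // The budget ran out; 'collector' still holds whatever was collected so far.
}

TopDocs partialResults = collector.topDocs(); // partial results after a time-out
Note that this only limits the collection phase of a search; it will not interrupt the initial index copy you describe, so the very first request can still block until the copy finishes.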

Related

runtimeservice.getVariables does not work because it can't find process instance id

I'm new to Flowable and I'm trying to start a process instance with variables. params here is the Map<String, Object> that I'm using to start the process. It all goes well, but if I try to get my variables back it tells me
"execution 22f42f67-5f88-11e9-9df0-d46d6dbfea92 doesn't exist"
But if I search for it in my process instance list, it is there. This is what I do:
pi = runtimeService.startProcessInstanceById(processDefinitionId, params);
runtimeService.getVariables(pi.getId());
I'm stuck with this problem and I do not understand why it keeps doing this. What am I missing?
Flowable has the concept of a RuntimeService and a HistoryService. The first one contains only the runtime data (what is currently active), while the second one has all the data; the runtime data is a subset of the history data.
The reason you can't find the variables via the RuntimeService is that the process instance has already completed.
If you use the HistoryService instead, it will work as expected.
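A minimal sketch of that, assuming the standard Flowable engine services are already injected (the query and getter names are the stock Flowable API; the printing is just for illustration):
// Start the process as before.
ProcessInstance pi = runtimeService.startProcessInstanceById(processDefinitionId, params);

// When the instance has completed, its variables live in the history tables,
// so read them through the HistoryService instead of the RuntimeService.
List<HistoricVariableInstance> variables = historyService
    .createHistoricVariableInstanceQuery()
    .processInstanceId(pi.getId())
    .list();

for (HistoricVariableInstance variable : variables) {
    System.out.println(variable.getVariableName() + " = " + variable.getValue());
}
Since the runtime data is a subset of the history data (as noted above), this query also works while the instance is still running.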

Set variables in Javascript job entry at root level

I need to set variables at the root scope in one job so they can be used in a different job. The first job has a JavaScript job entry with the statements:
parent_job.setVariable("customers_full_path", "C:\\customers22.csv", "r");
true;
But the compilation fails with:
Couldn't compile javascript:
org.mozilla.javascript.EvaluatorException: Can't find method
org.pentaho.di.job.Job.setVariable(string,string,string). (#2)
How to set a variable at root level in a Javascript job entry?
Sorry for the passive-aggressive tone, but:
I don't know if you are new to Pentaho, but the most common mistake for new users with previous programming knowledge is to be sort of 'addicted' to familiar methods; here you are using JavaScript for functionality that is built into the tool. Both Transformations (KTR) and Jobs (KJB) have a similar step, and you can manipulate this more easily in a KTR.
JavaScript steps slow down the flow considerably, so try to stay away from them as much as possible.
EDIT:
Reading this article, it seems the only thing you're doing wrong is the syntax of the command.
Correct usage:
parent_job.setVariable("name_of_variable", "desired value");
The command you described has three parameters, when it should have two: the variable name and its value. If you have more than one variable to set, call the command once per variable. Try it out and see if it works.

Wait.on(signals) use in Apache Beam

Is it possible to write to a 2nd BigQuery table after writing to the 1st has finished, in a batch pipeline, using the Wait.on() method (a new feature in Apache Beam 2.4)? The example given in the Apache Beam documentation is:
PCollection<Void> firstWriteResults = data.apply(ParDo.of(...write to first database...));
data.apply(Wait.on(firstWriteResults))
    // Windows of this intermediate PCollection will be processed no earlier than when
    // the respective window of firstWriteResults closes.
    .apply(ParDo.of(...write to second database...));
But why would I write to a database from within a ParDo? Can we not do the same by using the I/O transforms provided in Dataflow?
Thanks.
Yes, this is possible, although there are some known limitations, and there is currently some work being done to support this further.
In order to make this work you can do something like the following:
WriteResult writeResult = data.apply(BigQueryIO.write()
    ...
    .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
);

data.apply(Wait.on(writeResult.getFailedInserts()))
    .apply(...some transform which writes to second database...);
It should be noted that this only works with streaming inserts and won't work with file loads. At the same time, there is some work currently being done to better support this use case, which you can follow here.
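To tie this back to the original question (a second BigQuery table rather than a generic second database), here is a minimal sketch under the same streaming-inserts caveat. The table names are placeholders, data is assumed to be a PCollection<TableRow>, and both tables are assumed to exist already (hence CREATE_NEVER):
// 'data' is assumed to be a PCollection<TableRow> built earlier in the pipeline.
WriteResult firstWrite = data.apply("WriteFirstTable",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.first_table")       // placeholder table name
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

// Hold the second write back until the corresponding windows of the first write's
// signal (its failed-inserts collection) have closed.
data.apply(Wait.on(firstWrite.getFailedInserts()))
    .apply("WriteSecondTable",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.second_table")  // placeholder table name
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
Wait.on only sequences these two writes relative to each other; it does not otherwise change how the rest of the pipeline executes.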
Helpful references:
http://moi.vonos.net/cloud/beam-send-pubsub/
http://osdir.com/apache-beam-users/msg02120.html

Can't replace mongo document

I am attempting to save documents to a MongoDB cluster (sharded replica sets) and am having a strange issue. I am using PyMongo 2.7.2 and TokuMX 1.5 (MongoDB 2.4.10).
When I attempt to save (overwrite) existing documents I am getting an exception that looks like the document I am saving is too large:
doc = db.collection.find_one()
db.collection.save(doc)
pymongo.errors.OperationFailure: BSONObj size: 18798961 (0x71D91E01) is invalid. Size must be between 0 and 16793600(16MB) First element: op: "u"
However this works fine:
doc = db.collection.find_one()
db.collection.remove({'_id': doc['_id']})
db.collection.save(doc)
The document in question is about 9 MB, so it looks like when I attempt to replace the document, something is doubling its size and exceeding the 16 MB limit.
Any ideas as to what could cause this behavior?
Apparently this is a known issue with TokuMX. Oplog entries are twice the size of the document, so replacing a 9 MB document results in an 18 MB oplog entry, which raises the exception.
The solution would be to limit document writes to less than 8 MB so that oplog entries never exceed 16 MB.
I think this is a side effect of how save is implemented in PyMongo.
Under the hood, if the document has an _id then save(doc) is turned into an update(doc, doc). That is where the doubling comes into play, since the query + update together are 18 MB.
When you removed the _id you changed the save(doc) into an insert(doc) of a new document with a new _id. I don't think that is what you wanted.
Rather than use save, I would recommend constructing a query with just the _id field from the original document and doing the update call manually. I would even go so far as to suggest you file a Jira ticket to get PyMongo to do this for you.
HTH,
Rob.

SMJobRemove succeeds, but plist and helper tool not deleted

I'm trying to remove a privileged helper tool installed via SMJobBless. I'm getting a positive return value and no errors, yet the files at /Library/PrivilegedTools and /Library/LaunchDaemons are not deleted. Do I have to delete these files myself?
From the documentation I read:
Return Value: true if the job was removed successfully, otherwise false.
I'm calling the following to remove the job:
result = SMJobRemove(kSMDomainSystemLaunchd, (__bridge CFStringRef)label, _authRef, YES, &errorCF);
Thanks jatoben, that thread had the answer I was looking for.
As suspected, you do have to remove the files yourself or use the approach below (taken from the Apple dev forums):
SMJobRemove is the equivalent of "launchctl remove". That is, it removes the job from launchd but has no effect on the disk at all. Thus the job will get reloaded the next time you start up. To get around that you have to either remove the plist yourself or fork/exec "launchctl unload -w".
Have you seen https://github.com/brenwell/SMJobBless-Demo/blob/master/Uninstall.sh? It was very helpful for me.