Migrating Lucene HitCollector (2.x) to Collector (3.x) - lucene

In one of our projects, we use an old Lucene version (2.3.2). I'm now looking at current Lucene versions (3.5.0) and trying to re-write the old code. In the old project, we extended TopFieldDocCollector to do some extra filtering in the collect() method. I'm having a bit of trouble understanding the new Collector class however, and I couldn't find a good example.
1) The method setScorer(). How/where do I get a Scorer object from?
2) The method collect(). I guess I need to create my own Collection and store the docIds I'm interested in, correct?
3) When extending TopDocsCollector instead, I'd need to implement a PriorityQueue to use in the constructor, correct? There seems to be no standard implementation for it. But I still need my own Collection to store docIds (or rather, ScoreDocs), and call populateResults after the search is finished?
Overall, it seems like extending Collector is (a lot) easier than extending TopDocsCollector, but maybe I'm missing something.

setScorer() is a hook: IndexSearcher passes in the Scorer when it actually runs the search. So you only need to do anything in this method if you care about scores at all (e.g. saving the passed-in Scorer away so you can use it later). From its javadocs:
Called before successive calls to {@link #collect(int)}. Implementations
that need the score of the current document (passed-in to
{@link #collect(int)}), should save the passed-in Scorer and call
scorer.score() when needed.
collect() is called for each matching document, passing in the per-segment docid. Note if you need the 'rebased docid' (relative to the entire reader across all of its segments) then you must override setNextReader, saving the docBase, and compute docBase + docid. From Collector javadocs:
NOTE: The doc that is passed to the collect
method is relative to the current reader. If your
collector needs to resolve this to the docID space of the
Multi*Reader, you must re-base it by recording the
docBase from the most recent setNextReader call.
TopDocsCollector is a base class for TopFieldCollector (sort by field) and TopScoreDocCollector (sort by score). If you are writing a custom collector that sorts by score, then it's probably easier to just extend TopScoreDocCollector.
Also: the simplest Collector example is TotalHitCountCollector!
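To make the callback order concrete, here is a small stand-alone sketch. It deliberately does not use Lucene's classes: ScoreSource, SketchCollector, ThresholdCollector and CollectorSketchDemo are hypothetical stand-ins that only mimic the three callbacks discussed above (the real setNextReader also receives an IndexReader). It shows a collector saving the scorer it is handed, recording the docBase of each segment, and re-basing per-segment doc IDs inside collect().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Scorer that the searcher hands to the collector. */
interface ScoreSource {
    float score();
}

/** Hypothetical, simplified stand-in for the three callbacks discussed above. */
abstract class SketchCollector {
    abstract void setScorer(ScoreSource scorer);
    abstract void setNextReader(int docBase);
    abstract void collect(int doc);
}

/** Keeps the index-wide IDs of all hits whose score reaches a threshold. */
final class ThresholdCollector extends SketchCollector {
    private final float minScore;
    private final List<Integer> globalDocIds = new ArrayList<>();
    private ScoreSource scorer;
    private int docBase;

    ThresholdCollector(float minScore) {
        this.minScore = minScore;
    }

    @Override
    void setScorer(ScoreSource scorer) {
        this.scorer = scorer; // saved so collect() can ask for the current score
    }

    @Override
    void setNextReader(int docBase) {
        this.docBase = docBase; // offset of the segment that is about to be collected
    }

    @Override
    void collect(int doc) {
        // 'doc' is relative to the current segment, so re-base it before storing.
        if (scorer.score() >= minScore) {
            globalDocIds.add(docBase + doc);
        }
    }

    List<Integer> globalDocIds() {
        return globalDocIds;
    }
}

final class CollectorSketchDemo {
    /** Plays the searcher's role for two made-up segments. */
    static List<Integer> run() {
        ThresholdCollector collector = new ThresholdCollector(0.5f);
        final float[] currentScore = new float[1];
        collector.setScorer(() -> currentScore[0]);

        collector.setNextReader(0);            // first segment starts at 0
        currentScore[0] = 0.9f; collector.collect(3);
        currentScore[0] = 0.2f; collector.collect(7);

        collector.setNextReader(10);           // second segment starts at 10
        currentScore[0] = 0.7f; collector.collect(1);
        currentScore[0] = 0.6f; collector.collect(4);

        return collector.globalDocIds();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [3, 11, 14]
    }
}
```

In a real Lucene 3.x collector the same three pieces of state (scorer, docBase, your own result collection) would live in a subclass of Collector, and the searcher would drive the callbacks instead of the demo method.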


From a ByteBuddy-generated method, how do I set a (public) instance field in an object received as an argument to the return value of a MethodCall?

I am generating a class in ByteBuddy.
As part of one method implementation, I would like to set a (let's just say) public instance field in another object to the return value of a MethodCall invocation. (Keeping the example public means that access checks etc. are irrelevant.)
I thought I could use MethodCall#setsField(FieldDescription) to do this.
But from my prior question related to this I learned that MethodCall#setsField(FieldDescription) is intended to work only on fields of the instrumented type, and, looking at it now, I'm not entirely sure why or how I thought it was ever going to work.
So: is there a way for a ByteBuddy-generated method implementation to set an instance field of another object to the return value of a method invocation?
If it matters, the "instrumented method" (in ByteBuddy's terminology) accepts the object whose field I want to set as an argument. Naïvely I'd expect to be able to do something like:
MethodCall.invoke(someMethod).setsField(somePublicField).onArgument(2);
There may be problems here that I am not seeing but I was slightly surprised not to see this DSL option. (It may not exist for perfectly good reasons; I just don't know what they would be.)
This is not possible as of Byte Buddy 1.10.18. The mechanism was originally created to support getters and setters when defining beans, for example. That said, it would not be difficult to add; I think it would even be easiest to allow any custom byte code to be dispatched as a consumer of the method call.
I will look into how this can be done, but as a new feature it will take some time before I find a free slot to work on it. The change is tracked on GitHub.

In which cases Kotlin core Data Structures (Map, List, Set) are not really immutable?

It seems Kotlin's core data structures (i.e. Map, List, Set) are really just interfaces and not really immutable.
If I have:
fun <K,V> foo(map: Map<K, V>) {
...
}
Can map change after I received it - from outside?
In which cases is it possible?
No, they are not guaranteed to be immutable.
The Map interface does not provide any mutator methods, so an implementation could be completely immutable.
But other implementations aren't — in particular, MutableMap is a subinterface, so anything implementing that is a mutable Map.  This means that code with a reference to a Map could potentially see the data changing, even though it couldn't make those changes itself.
Similarly, MutableList is a subinterface of List, and MutableSet is a subinterface of Set.
There are immutable implementations of those top-level interfaces (such as the kotlinx.collections.immutable and Guava libraries) — and you could write your own.  But the Kotlin language and type system don't yet provide strong support for deep immutability, only for read-only interfaces to data that may or may not be immutable.
(That's not to say that such support couldn't be added in future.  There is a lot of interest in it, and JetBrains have been considering it.)
Let's run an experiment:
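import kotlin.concurrent.thread
import org.junit.Test // assumes JUnit 4; with JUnit 5 use org.junit.jupiter.api.Test

class Foo {
    @Test
    fun foo() {
        val items = mutableListOf("A")
        run(items)
        Thread.sleep(1000)
        items.add("B")
        println("Foo")
        Thread.sleep(2000)
    }

    // Takes the read-only List interface, yet still observes the caller's later add().
    fun run(items: List<String>) {
        thread(start = true) {
            println("Run ${items.count()}")
            Thread.sleep(2000)
            println("Run ${items.count()}")
        }
    }
}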
This test case creates a mutable list containing one item, then passes a reference to that list into a method whose parameter type is the read-only List.
This method, called run, displays the length of the list.
Outside of the run method, a new item is appended to the list.
The sleeps ensure that the addition to the list happens after run's first print statement but before its second.
Let's examine the output:
Run 1
Foo
Run 2
As we can see, the list contents did indeed change, even though run took in a read-only List.
This is because MutableList and List are merely interfaces and all MutableList implementations also implement List.
When Kotlin refers to mutable and immutable it simply references whether the methods to modify the collection are present, not whether the contents can be changed.
So if a method takes its parameter as a List then yes, the contents can vary if they are altered by another thread. If that is a concern, make a copy of the list as the first thing your method does.
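The difference between a read-only view and a defensive copy can be sketched on the Java side, which is what Kotlin's List maps to on the JVM. SnapshotDemo is a made-up name for illustration; in Kotlin the copy would typically be taken with items.toList().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SnapshotDemo {
    /** Returns { size seen through a read-only view, size seen through a copy } after the owner adds an item. */
    static int[] sizesAfterOwnerAdds() {
        List<String> owner = new ArrayList<>();
        owner.add("A");

        // A read-only view: we cannot add through it, but it still reflects the owner's changes.
        List<String> view = Collections.unmodifiableList(owner);
        // A defensive copy: independent of anything the owner does later.
        List<String> snapshot = new ArrayList<>(owner);

        owner.add("B");
        return new int[] { view.size(), snapshot.size() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizesAfterOwnerAdds())); // prints [2, 1]
    }
}
```

The copy costs time and memory proportional to the list size, so it is only worth taking when the caller may really mutate the list while you are using it.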
As others have indicated, the map could be modified while you're using it in another thread... however that would already be broken unless your access to the map was @Synchronized, which would indicate that you knew it would change, so this possibility is not really a problem. Even if your method took a MutableMap parameter it would be wrong if it was changed while your method was in progress.
I think you're misinterpreting the purpose of the read-only collection interfaces.
When your method accepts a Map as a parameter, you are indicating that the method will not change the map. The purpose of the read-only Map interface is to allow you to say such things. You could do (map as? MutableMap)?.put(...), but that would be wrong since you promised not to do that. You could also crash the process in various ways or run an infinite loop, but that would also be wrong. Just don't do it. The language does not provide protection against malicious programmers.
Similarly, if your method returns a Map, that indicates that the receiver must not change it. Usually in such cases, you also promise (hopefully in a comment) that the returned map will not change. You can't keep this promise if anyone who receives the map can change it themselves, and that is why you return the Map instead of the underlying MutableMap.

Is there a way in IntelliJ to make a usage search of a method and filter this by specific arguments passed to the method?

I have a method in my Service class which executes an hibernate update for any domain object:
update(Object obj)
It's called from lots of classes in my project for different kinds of objects. I would like to find all usages of this method where it's called for a specific domain object, i.e. the calls which execute an update of my Title object:
serviceClass.update(Title title)
I'm using IntelliJ as my IDE and I'm wondering if there is a way to find all those usages.
Does anyone have an IDEA how to do this?
Thanks a lot in advance,
Ronny
I've tried it with a small sample project and was able to achieve the desired behavior using Structural Search and Replace feature with the modified method calls template:
$MethodCall$: under Text constraints, Text/regexp should be set to update so that methods with other names are ignored.
$Parameter$: under Occurrences count, Minimum count should be set to 1 to ignore method calls with no or more parameters.
If you're interested in the call chains that are providing a specific input into a given method, try the Analyze->Data Flow to Here command.
This allows you to see which values are passed in, through which call chains. And, for example, where null values might be coming from.
Quite a powerful feature, really.

"Fluent interfaces" that maintain order in the invocation chain

Is there an elegant/convenient way (without creating many "empty" classes, or at least they should not be annoying) to have fluent interfaces that maintain order at the compilation level?
Fluent interfaces:
http://en.wikipedia.org/wiki/Fluent_interface
with an idea to permit this compilation
var fluentConfig = new ConfigurationFluent().SetColor("blue")
.SetHeight(1)
.SetLength(2)
.SetDepth(3);
and decline this
var fluentConfig = new ConfigurationFluent().SetLength(2)
.SetColor("blue")
.SetHeight(1)
.SetDepth(3);
Each step in the chain needs to return an interface or class that only includes the methods that are valid to use after the current step. In other words, if SetColor must come first, ConfigurationFluent should only have a SetColor method. SetColor would then return an object that only has a SetHeight method, and so forth.
In reality, the return values could all be the same instance of ConfigurationFluent but cast to different interfaces explicitly implemented by that class.
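A minimal Java sketch of that idea, using the setter names from the question. The step interfaces, OrderedConfiguration, start() and describe() are names made up for this illustration. Java has no explicit interface implementation, so the sketch hides the constructor behind a static factory that returns only the first step.

```java
/** Each step interface exposes only the call that is valid next. */
interface ColorStep { HeightStep setColor(String color); }
interface HeightStep { LengthStep setHeight(int height); }
interface LengthStep { DepthStep setLength(int length); }
interface DepthStep { OrderedConfiguration setDepth(int depth); }

final class OrderedConfiguration implements ColorStep, HeightStep, LengthStep, DepthStep {
    private String color;
    private int height;
    private int length;
    private int depth;

    private OrderedConfiguration() { }

    /** The only entry point; callers see just the first step. */
    static ColorStep start() {
        return new OrderedConfiguration();
    }

    // Every setter returns the same instance, typed as the next step.
    @Override public HeightStep setColor(String color) { this.color = color; return this; }
    @Override public LengthStep setHeight(int height) { this.height = height; return this; }
    @Override public DepthStep setLength(int length) { this.length = length; return this; }
    @Override public OrderedConfiguration setDepth(int depth) { this.depth = depth; return this; }

    String describe() {
        return color + " " + height + "x" + length + "x" + depth;
    }

    public static void main(String[] args) {
        OrderedConfiguration config =
                OrderedConfiguration.start().setColor("blue").setHeight(1).setLength(2).setDepth(3);
        System.out.println(config.describe()); // prints blue 1x2x3

        // Does not compile, because ColorStep has no setLength method:
        // OrderedConfiguration.start().setLength(2);
    }
}
```

The cost is one small interface per step, which is exactly the "many empty classes" overhead the question hoped to avoid; only one object is ever allocated, though.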
I've got a set of three ways of doing this in C++ using essentially a compile time FSM to validate the actions. You can find the code on github.
The short answer is no, there is no elegant or convenient way to enforce an order of constructing a class that properly implements the "Fluent Interface" as you've linked.
The longer answer starts with playing devil's advocate. If I had dependent properties (i.e. properties that required other properties to be set first), then I could implement them something like this:
method SetLength(int millimeters)
if color is null throw new ValidationException
length = millimeters
return this
end
(NOTE: the above does not map to any real language, it is just pseudocode)
So now I have exceptions to worry about. If I don't obey the rules, the fluent object will throw an exception. Now let's say I have a declaration like yours:
var config = new Fluent().SetLength(2).SetHeight(1).SetDepth(3).SetColor("blue");
When I catch the ValidationException because length depends on the color being set first, how am I as the user supposed to know what the correct order is? Even if I had each SetX method on a different line, the stacktrace will just give me the line where the config variable was declared in most languages. Furthermore, how am I supposed to keep the rules of this object straight in my head compared to other objects? It is a cacophony of conflicting ideals.
Such precedence checks violate the spirit of the "Fluent Interface" approach. That approach was designed to make configuring complex objects convenient. You take the convenience out when you attempt to enforce order.
To properly and elegantly implement the fluent interface there are a couple of guidelines that are best observed to make consumers of your class thank you:
Provide meaningful default values: minimizes need to change values, and minimizes chances of creating an invalid object.
Do not perform configuration validation until explicitly asked to do so. That event can be when we use the configuration to create a new fully configured object, or when the consumer explicitly calls a Validate() method.
In any exceptions thrown, make sure the error message is clear and points out any inconsistencies.
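Those three guidelines could look roughly like this in Java. LenientConfiguration, its default values and its rules are all invented for the sketch: setters accept calls in any order, nothing is checked until validate() is called, and the exception lists every problem at once.

```java
import java.util.ArrayList;
import java.util.List;

final class LenientConfiguration {
    // Meaningful defaults, so an untouched configuration is already valid.
    private String color = "black";
    private int height = 1;
    private int length = 1;
    private int depth = 1;

    // Setters never validate, so they can be chained in any order.
    LenientConfiguration setColor(String color) { this.color = color; return this; }
    LenientConfiguration setHeight(int height) { this.height = height; return this; }
    LenientConfiguration setLength(int length) { this.length = length; return this; }
    LenientConfiguration setDepth(int depth) { this.depth = depth; return this; }

    /** Checks the whole configuration on request and reports every problem in one message. */
    LenientConfiguration validate() {
        List<String> problems = new ArrayList<>();
        if (color == null || color.isEmpty()) problems.add("color must not be empty");
        if (height <= 0) problems.add("height must be positive, was " + height);
        if (length <= 0) problems.add("length must be positive, was " + length);
        if (depth <= 0) problems.add("depth must be positive, was " + depth);
        if (!problems.isEmpty()) {
            throw new IllegalStateException("Invalid configuration: " + String.join("; ", problems));
        }
        return this;
    }

    public static void main(String[] args) {
        new LenientConfiguration().setLength(2).setColor("blue").validate(); // any order is fine
        try {
            new LenientConfiguration().setHeight(0).setDepth(-2).validate();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```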
Maybe the compiler could check that methods are called in the same order as they are defined.
This could be a new feature for compilers.
Or maybe by means of annotations, something like:
class ConfigurationFluent {
    @Called-before SetHeight
    SetColor(..) {}
    @Called-After SetColor
    SetHeight(..) {}
    @Called-After SetHeight
    SetLength(..) {}
    @Called-After SetLength
    SetDepth(..) {}
}
You can implement a state machine describing the valid sequences of operations; each method then consults the state machine and either proceeds, if the operation is allowed at that point, or throws an exception.
I would not suggest this approach for configurations though: it can get very messy and hard to read.
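For completeness, a runtime check of this kind might be sketched as follows. CheckedConfiguration and its Step enum are hypothetical names; the order enforced is the one from the question, and violations only surface when the code runs.

```java
final class CheckedConfiguration {
    /** The calls in the only order that is allowed; DONE marks a finished chain. */
    private enum Step { COLOR, HEIGHT, LENGTH, DEPTH, DONE }

    private Step expected = Step.COLOR;
    private String color;
    private int height;
    private int length;
    private int depth;

    /** Fails if 'called' is not the step expected now, otherwise moves to the next step. */
    private void advance(Step called) {
        if (called != expected) {
            throw new IllegalStateException("Expected " + expected + " but got " + called);
        }
        expected = Step.values()[called.ordinal() + 1];
    }

    CheckedConfiguration setColor(String color) { advance(Step.COLOR); this.color = color; return this; }
    CheckedConfiguration setHeight(int height) { advance(Step.HEIGHT); this.height = height; return this; }
    CheckedConfiguration setLength(int length) { advance(Step.LENGTH); this.length = length; return this; }
    CheckedConfiguration setDepth(int depth) { advance(Step.DEPTH); this.depth = depth; return this; }

    boolean isComplete() {
        return expected == Step.DONE;
    }

    public static void main(String[] args) {
        System.out.println(new CheckedConfiguration()
                .setColor("blue").setHeight(1).setLength(2).setDepth(3).isComplete()); // prints true
        try {
            new CheckedConfiguration().setLength(2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Expected COLOR but got LENGTH
        }
    }
}
```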

What is the use of reflection in Java/C# etc [duplicate]

This question already has answers here:
What is reflection and why is it useful?
(23 answers)
Closed 6 years ago.
I was just curious, why should we use reflection in the first place?
// Without reflection
Foo foo = new Foo();
foo.hello();
// With reflection
Class<?> cls = Class.forName("Foo");
Object foo = cls.getDeclaredConstructor().newInstance();
Method method = cls.getMethod("hello");
method.invoke(foo);
We can simply create an object and call the class's method, but why do the same using the forName, newInstance and getMethod functions?
To make everything dynamic?
Simply put: because sometimes you don't know either the "Foo" or "hello" parts at compile time.
The vast majority of the time you do know this, so it's not worth using reflection. Just occasionally, however, you don't - and at that point, reflection is all you can turn to.
As an example, protocol buffers allows you to generate code which either contains full statically-typed code for reading and writing messages, or it generates just enough so that the rest can be done by reflection: in the reflection case, the load/save code has to get and set properties via reflection - it knows the names of the properties involved due to the message descriptor. This is much (much) slower but results in considerably less code being generated.
Another example would be dependency injection, where the names of the types used for the dependencies are often provided in configuration files: the DI framework then has to use reflection to construct all the components involved, finding constructors and/or properties along the way.
It is used whenever you (= your method/your class) don't know at compile time which type to instantiate or which method to invoke.
Also, many frameworks use reflection to analyze and use your objects. For example:
hibernate/nhibernate (and any object-relational mapper) use reflection to inspect all the properties of your classes so that they are able to update them or use them when executing database operations
you may want to make it configurable which method of a user-defined class is executed by default by your application. The configured value is a String, and you can get the target class, get the method that has the configured name, and invoke it, without knowing it at compile time.
parsing annotations is done by reflection
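The "configured method name" case from the list above might look like this in plain Java. Greeter and ConfiguredInvoker are hypothetical names, and the method name would normally be read from a configuration file rather than hard-coded.

```java
import java.lang.reflect.Method;

/** Hypothetical user-defined class whose public methods can be selected by name. */
final class Greeter {
    public String hello() { return "hello"; }
    public String goodbye() { return "goodbye"; }
}

final class ConfiguredInvoker {
    /** Looks up a public no-argument method by its configured name and invokes it. */
    static Object invokeByName(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot invoke '" + methodName + "'", e);
        }
    }

    public static void main(String[] args) {
        String configured = "goodbye"; // stands in for a value read from configuration
        System.out.println(invokeByName(new Greeter(), configured)); // prints goodbye
    }
}
```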
A typical usage is a plug-in mechanism, which supports classes (usually implementations of interfaces) that are unknown at compile time.
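A bare-bones version of such a plug-in loader, as a sketch: TextPlugin, UpperCasePlugin and PluginLoader are invented for the example, and the host code only ever sees the interface plus a class name in a string.

```java
import java.util.Locale;

/** The contract the host application knows at compile time. */
interface TextPlugin {
    String apply(String input);
}

/** An implementation the host would normally not know about when it is compiled. */
final class UpperCasePlugin implements TextPlugin {
    public UpperCasePlugin() { }

    @Override
    public String apply(String input) {
        return input.toUpperCase(Locale.ROOT);
    }
}

final class PluginLoader {
    /** Instantiates the named class through its no-argument constructor. */
    static TextPlugin load(String className) {
        try {
            Class<? extends TextPlugin> type = Class.forName(className).asSubclass(TextPlugin.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load plug-in " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real application this string would come from a config file or a plug-in manifest.
        String configuredName = UpperCasePlugin.class.getName();
        System.out.println(load(configuredName).apply("abc")); // prints ABC
    }
}
```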
You can use reflection for automating any process that could usefully use a list of the object's methods and/or properties. If you've ever spent time writing code that does roughly the same thing on each of an object's fields in turn -- the obvious way of saving and loading data often works like that -- then that's something reflection could do for you automatically.
The most common applications are probably these three:
Serialization (see, e.g., .NET's XmlSerializer)
Generation of widgets for editing objects' properties (e.g., Xcode's Interface Builder, .NET's dialog designer)
Factories that create objects with arbitrary dependencies by examining the classes for constructors and supplying suitable objects on creation (e.g., any dependency injection framework)
Using reflection, you can very easily write configurations that detail methods/fields in text, and the framework using these can read a text description of the field and find the real corresponding field.
e.g. JXPath allows you to navigate objects like this:
//company[@name='Sun']/address
so JXPath will look for a method getCompany() (corresponding to company), a field in that called name etc.
You'll find this in lots of frameworks in Java e.g. JavaBeans, Spring etc.
It's useful for things like serialization and object-relational mapping. You can write a generic function to serialize an object by using reflection to get all of an object's properties. In C++, you'd have to write a separate function for every class.
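As a rough illustration of such a generic function, the sketch below dumps any object's instance fields without knowing its class in advance. Address and FieldDumper are made-up names and the output format is arbitrary; a real serializer would also handle nested objects, inheritance and escaping.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical data class; the dumper below knows nothing about it. */
final class Address {
    private final String city;
    private final int zip;

    Address(String city, int zip) {
        this.city = city;
        this.zip = zip;
    }
}

final class FieldDumper {
    /** Renders the instance fields declared directly on the object's class, sorted by name. */
    static String dump(Object obj) {
        Map<String, Object> values = new TreeMap<>();
        for (Field field : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // only real instance state is of interest
            }
            field.setAccessible(true); // the fields are private
            try {
                values.put(field.getName(), field.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return obj.getClass().getSimpleName() + values;
    }

    public static void main(String[] args) {
        System.out.println(dump(new Address("Oslo", 123))); // prints Address{city=Oslo, zip=123}
    }
}
```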
I have used it in some validation classes before, where I passed a large, complex data structure in the constructor and then ran a zillion (couple hundred really) methods to check the validity of the data. All of my validation methods were private and returned booleans so I made one "validate" method you could call which used reflection to invoke all the private methods in the class that returned booleans.
This made the validate method more concise (didn't need to enumerate each little method) and guaranteed all the methods were being run (e.g. someone writes a new validation rule and forgets to call it in the main method).
After changing to use reflection I didn't notice any meaningful loss in performance, and the code was easier to maintain.
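A cut-down sketch of that pattern is below. OrderValidator and its two rules are invented, and the "check" name prefix is an extra filter added here so that unrelated private boolean methods are not picked up by accident; the original code may have had no such filter.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class OrderValidator {
    private final int quantity;
    private final String customer;

    OrderValidator(int quantity, String customer) {
        this.quantity = quantity;
        this.customer = customer;
    }

    // Rules: private, no arguments, boolean result. A newly added rule is found automatically.
    private boolean checkQuantityPositive() { return quantity > 0; }
    private boolean checkCustomerPresent() { return customer != null && !customer.isEmpty(); }

    /** Runs every rule via reflection and returns the names of the failed ones, sorted. */
    List<String> failedRules() {
        List<String> failed = new ArrayList<>();
        for (Method method : OrderValidator.class.getDeclaredMethods()) {
            boolean isRule = Modifier.isPrivate(method.getModifiers())
                    && method.getReturnType() == boolean.class
                    && method.getParameterCount() == 0
                    && method.getName().startsWith("check");
            if (!isRule) {
                continue;
            }
            try {
                if (!Boolean.TRUE.equals(method.invoke(this))) {
                    failed.add(method.getName());
                }
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Rule " + method.getName() + " could not be run", e);
            }
        }
        Collections.sort(failed); // getDeclaredMethods() returns methods in no particular order
        return failed;
    }

    boolean validate() {
        return failedRules().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(new OrderValidator(2, "Ann").validate());  // prints true
        System.out.println(new OrderValidator(0, "").failedRules()); // prints [checkCustomerPresent, checkQuantityPositive]
    }
}
```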
In addition to Jon's answer, another use is to "dip your toe in the water" to test whether a given facility is present in the JVM.
Under OS X a Java application looks nicer if some Apple-provided classes are called. The easiest way to test whether these classes are present is to probe for them with reflection first.
Sometimes you need to create an object of a class on the fly, or from some place other than Java code (e.g. a JSP). That is when reflection is useful.
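Such a probe can be as small as the sketch below. FeatureProbe is a made-up helper and com.example.OptionalFeature is a placeholder class name; substitute the class whose presence you actually care about.

```java
final class FeatureProbe {
    /** Returns true if the named class can be found, without running its static initializers. */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, FeatureProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isClassPresent("java.lang.String"));            // prints true
        System.out.println(isClassPresent("com.example.OptionalFeature")); // prints false unless you add such a class
    }
}
```

Only when the probe succeeds would the code go on to call the optional classes, typically through reflection as well, so that the application still loads on JVMs that lack them.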