Efficiency: reuse Term Lucene 6 - lucene

I want to reuse the Term object instead of creating a new one every time I call this method:
public long getDF(String term) throws Exception {
return indexReader.docFreq(new Term("content", term));
}
I read in the documentation that, I can use this constructor of Term to reuse it:
public Term(String fld)
Constructs a Term with the given field and empty text. This serves two purposes: 1) reuse of a Term with the same field. 2) pattern for a query.
However, I don't know what is the next step as there is no setters in the Term documentation nor reset() method.
Any hint on how to achieve this?

Constructing Terms is cheap. You probably shouldn't worry too much about trying to reuse them. If you are seeing real performance issues, you should run a profiler. I'd guess constructing terms is not the real problem, and attempting to reuse clauses like this will just complicate things for no discernible benefit.
That said, you could reuse it by getting the Term's BytesRef from Term.bytes(), and directly modifying the underlying byte array.
String text = "text";
Term term = new Term("field");
BytesRef bytes = term.bytes();
bytes.bytes = new byte[UnicodeUtil.maxUTF8Length(text.length())];
bytes.length = UnicodeUtil.UTF16toUTF8(text, 0, text.length(), bytes.bytes);
Be careful that you aren't changing the value of a Term that is still in use. For instance, attempting to add two clauses to a BooleanQuery like this will, of course, not work.

Related

How to create a new list of Strings from a list of Longs in Kotlin? (inline if possible)

I have a list of Longs in Kotlin and I want to make them strings for UI purposes with maybe some prefix or altered in some way. For example, adding "$" in the front or the word "dollars" at the end.
I know I can simply iterate over them all like:
val myNewStrings = ArrayList<String>()
longValues.forEach { myNewStrings.add("$it dollars") }
I guess I'm just getting nitpicky, but I feel like there is a way to inline this or change the original long list without creating a new string list?
EDIT/UPDATE: Sorry for the initial confusion of my terms. I meant writing the code in one line and not inlining a function. I knew it was possible, but couldn't remember kotlin's map function feature at the time of writing. Thank you all for the useful information though. I learned a lot, thanks.
You are looking for a map, a map takes a lambda, and creates a list based on the result of the lambda
val myNewStrings = longValues.map { "$it dollars" }
map is an extension that has 2 generic types, the first is for knowing what type is iterating and the second what type is going to return. The lambda we pass as argument is actually transform: (T) -> R so you can see it has to be a function that receives a T which is the source type and then returns an R which is the lambda result. Lambdas doesn't need to specify return because the last line is the return by default.
You can use the map-function on List. It creates a new list where every element has been applied a function.
Like this:
val myNewStrings = longValues.map { "$it dollars" }
In Kotlin inline is a keyword that refers to the compiler substituting a function call with the contents of the function directly. I don't think that's what you're asking about here. Maybe you meant you want to write the code on one line.
You might want to read over the Collections documentation, specifically the Mapping section.
The mapping transformation creates a collection from the results of a
function on the elements of another collection. The basic mapping
function is
map().
It applies the given lambda function to each subsequent element and
returns the list of the lambda results. The order of results is the
same as the original order of elements.
val numbers = setOf(1, 2, 3)
println(numbers.map { it * 3 })
For your example, this would look as the others said:
val myNewStrings = longValues.map { "$it dollars" }
I feel like there is a way to inline this or change the original long list without creating a new string list?
No. You have Longs, and you want Strings. The only way is to create new Strings. You could avoid creating a new List by changing the type of the original list from List<Long> to List<Any> and editing it in place, but that would be overkill and make the code overly complex, harder to follow, and more error-prone.
Like people have said, unless there's a performance issue here (like a billion strings where you're only using a handful) just creating the list you want is probably the way to go. You have a few options though!
Sequences are lazily evaluated, when there's a long chain of operations they complete the chain on each item in turn, instead of creating an intermediate full list for every operation in the chain. So that can mean less memory use, and more efficiency if you only need certain items, or you want to stop early. They have overhead though so you need to be sure it's worth it, and for your use-case (turning a list into another list) there are no intermediate lists to avoid, and I'm guessing you're using the whole thing. Probably better to just make the String list, once, and then use it?
Your other option is to make a function that takes a Long and makes a String - whatever function you're passing to map, basically, except use it when you need it. If you have a very large number of Longs and you really don't want every possible String version in memory, just generate them whenever you display them. You could make it an extension function or property if you like, so you can just go
fun Long.display() = "$this dollars"
val Long.dollaridoos: String get() = "$this.dollars"
print(number.display())
print(number.dollaridoos)
or make a wrapper object holding your list and giving access to a stringified version of the values. Whatever's your jam
Also the map approach is more efficient than creating an ArrayList and adding to it, because it can allocate a list with the correct capacity from the get-go - arbitrarily adding to an unsized list will keep growing it when it gets too big, then it has to copy to another (larger) array... until that one fills up, then it happens again...

Kotlin most efficient way sequence to set

Hi I've got list of 1330 objects and would like to apply method and obtain set as result.
val result = listOf1330
.asSequence()
.map {
someMethod(it)
}
val resultSet = result.toSet()
It works fine without toSet but if then execution time is about 10 times longer.
I've used sequence to make it work faster and it is but as a result I need list without duplicates (set).
Simply: What is most effective way to convert sequence to set?
val result = listOf1330.mapTo(HashSet()) { someMethod(it) }
It makes less sense to use streams or sequences to implement the transformation - you will need all elements from the collection, not several. The mapTo (and map) functions are inline in Kotlin. It means the code will be substituted into the call site, it will not have lambda created and executed many times. We use mapTo to avoid the second copy of the collection done by the toSet() function.
The .parallelStream() may add more performance, if you like to run the computation in several threads. It is still a good idea to measure how good the load is balanced between threads. The performance may depend on the collection implementation class, on which you call it
If your someObject has a slow implementation of equals() or hashCode(), or gives the same hash code for many objects, then that could account for the delay, and you may be able to improve it.
Otherwise, if the objects are big, the delay may be mostly due to the amount of memory that must be accessed to store them all; if so, that's the price you'll have to pay if you want a set with all those objects in memory.
Sequence.toSet() uses a LinkedHashSet.  You could try providing another Set instance, using e.g. toCollection(HashSet()), to see if that's any faster.  (You wouldn't get the same iteration order, though.)
I agree with gidds answer on HashSet and LinkedHashSet performance.
LinkedHashSet is more expensive for insertions than HashSet;
However, in the above use case, I think we can leverage parallelStream to improve the performance. Under the hood, Kotlin uses the Java parallelStream.
val result: Set<String> = listOf("sdgds", "fdgdfsg", "dsfgsdfg")
.parallelStream()
.map {
someMethod(it)
}.collect(Collectors.toSet())
The Collectors.toSet() uses HashSet. So, we should be ok in insertion performance perspective.
Use distict or distictBy.
val result = sequenceOf("a", "b", "a", "c").distinct()
// -> "a", "b", "c"
// for more complex cases use custom comparator function
val result = getMyObjectsSequence().distinctBy { it.name }
This approach lets keep using sequence without involving explicit Iterables (List, Set, etc.).
Nevertheless, there is no magic, and "distinct" still uses HashSet under the hood and in case of really huge sequence it may cause sufficient memory usage and it must be kept in mind while applying this function.

Minecraft bukkit scheduler and procedural instance naming

This question is probably pretty obvious to any person who knows how to use Bukkit properly, and I'm sorry if I missed a solution in the others, but this is really kicking my ass and I don't know what else to do, the tutorials have been utterly useless. There are really 2 things that I need help doing:
I need to learn how to create an indefinite number of instances of an object. I figure it'd be like this:
int num = 0;
public void create(){
String name = chocolate + num;
Thingy name = new Thingy();
}
So you see what I'm saying? I need to basically change the name that is given to each new instance so that it doesn't overwrite the last one when created. I swear I've looked everywhere, I've asked my Java professor and I can't get any answers.
2: I need to learn how to use the stupid scheduler, and I can't understand anything so far. Basically, when an event is detected, 2 things are called: one method which activates instantly, and one which needs to be given a 5 second delay, then called. The code is like this:
public onEvent(event e){
Thingy thing = new Thingy();
thing.method1();
thing.doOnDelay(method2(), 100 ticks);
}
Once again, I apologize if I am not giving too many specifics, but I cannot FOR THE LIFE OF ME find anything about the Bukkit event scheduler that I can understand.
DO NOT leave me links to the Bukkit official tutorials, I cannot understand them at all and it'll be a waste of an answer. I need somebody who can help me, I am a starting plugin writer.
I've had Programming I and II with focus in Java, so many basic things I know, I just need Bukkit-specific help for the second one.
The first one has had me confused since I started programming.
Ok, so for the first question I think you want to use a data structure. Depending on what you're doing, there are different data structures to use. A data structure is simply a container that you can use to store many instances of a type of object. The data structures that are available to you are:
HashMap
HashSet
TreeMap
List
ArrayList
Vector
There are more, but these are the big ones. HashMap, HashSet, and TreeMap are all part of the Map class, which is notable for it's speedy operations. To use the hashmap, you instantiate it with HashMap<KeyThing, ValueThingy> thing = new HashMap<KeyThing, ValueThing>(); then you add elements to it with thing.put(key, value). Thn when you want to get a value out of it, you just use thing.get(key) HashMaps use an algorithm that's super fast to get the values, but a consequence of this is that the HashMap doesn't store it's entries in any particular order. Therefore when you want to loop though it with a for loop, it randomly returns it's entries (Not truly random because memory and stuff). Also, it's important to note that you can only have one of each individual key. If you try to put in a key that already exists in the map, it will over-right the value for that key.
The HashSet is like a HashMap but without storing values to go with it. It's a pretty good container if all you need to use it for is to determine if an object is inside it.
The TreeMap is one of the only maps that store it's values in a particular order. You have to provide a Comparator (something that tells if an object is less than another object) so that it knows the order to put the values if it wants them to be in ascending order.
List and ArrayList are not maps. Their elements are put in with a index address. With the List, you have to specify the number of elements you're going to be putting into it. Lists do not change size. ArrayLists are like lists in that each element can be retrieved with arrayListThing.get(index) but the ArrayList can change size. You add elements to an ArrayList by arrayListThing.add(Thing).
The Vector is a lot like an ArrayList. It actually functions about the same and I'm not quite sure what the difference between them is.
At any rate, you can use these data structures to store a lot of objects by making a loop. Here's an example with a Vector.
Vector<Thing> thing = new Vector<Thing>();
int numberofthings = 100;
for(int i = 0; i < numberofthings; i++) {
thing.add(new Thing());
}
That will give you a vector full of things which you can then iterate through with
for(Thing elem:thing) {
thing.dostuff
}
Ok, now for the second problem. You are correct that you need to use the Bukkit Scheduler. Here is how:
Make a class that extends BukkitRunnable
public class RunnableThing extends BukkitRunnable {
public void run() {
//what you want to do. You have to make this method.
}
}
Then what you want to do when you want to execute that thing is you make a new BukkitTask object using your RunnableThing
BukkitTask example = new RunnableThing().runTaskLater(plugin, ticks)
You have to do some math to figure out how many ticks you want. 20 ticks = 1 second. Other than that I think that covers all your questions.

Is there any disadvantage of writing a long constructor?

Does it affect the time in loading the application?
or any other issues in doing so?
The question is vague on what "long" means. Here are some possible interpretations:
Interpretation #1: The constructor has many parameters
Constructors with many parameters can lead to poor readability, and better alternatives exist.
Here's a quote from Effective Java 2nd Edition, Item 2: Consider a builder pattern when faced with many constructor parameters:
Traditionally, programmers have used the telescoping constructor pattern, in which you provide a constructor with only the required parameters, another with a single optional parameters, a third with two optional parameters, and so on...
The telescoping constructor pattern is essentially something like this:
public class Telescope {
final String name;
final int levels;
final boolean isAdjustable;
public Telescope(String name) {
this(name, 5);
}
public Telescope(String name, int levels) {
this(name, levels, false);
}
public Telescope(String name, int levels, boolean isAdjustable) {
this.name = name;
this.levels = levels;
this.isAdjustable = isAdjustable;
}
}
And now you can do any of the following:
new Telescope("X/1999");
new Telescope("X/1999", 13);
new Telescope("X/1999", 13, true);
You can't, however, currently set only the name and isAdjustable, and leaving levels at default. You can provide more constructor overloads, but obviously the number would explode as the number of parameters grow, and you may even have multiple boolean and int arguments, which would really make a mess out of things.
As you can see, this isn't a pleasant pattern to write, and even less pleasant to use (What does "true" mean here? What's 13?).
Bloch recommends using a builder pattern, which would allow you to write something like this instead:
Telescope telly = new Telescope.Builder("X/1999").setAdjustable(true).build();
Note that now the parameters are named, and you can set them in any order you want, and you can skip the ones that you want to keep at default values. This is certainly much better than telescoping constructors, especially when there's a huge number of parameters that belong to many of the same types.
See also
Wikipedia/Builder pattern
Effective Java 2nd Edition, Item 2: Consider a builder pattern when faced with many constructor parameters (excerpt online)
Related questions
When would you use the Builder Pattern?
Is this a well known design pattern? What is its name?
Interpretation #2: The constructor does a lot of work that costs time
If the work must be done at construction time, then doing it in the constructor or in a helper method doesn't really make too much of a difference. When a constructor delegates work to a helper method, however, make sure that it's not overridable, because that could lead to a lot of problems.
Here's some quote from Effective Java 2nd Edition, Item 17: Design and document for inheritance, or else prohibit it:
There are a few more restrictions that a class must obey to allow inheritance. Constructors must not invoke overridable methods, directly or indirectly. If you violate this rule, program failure will result. The superclass constructor runs before the subclass constructor, so the overriding method in the subclass will be invoked before the subclass constructor has run. If the overriding method depends on any initialization performed by the subclass constructor, the method will not behave as expected.
Here's an example to illustrate:
public class ConstructorCallsOverride {
public static void main(String[] args) {
abstract class Base {
Base() { overrideMe(); }
abstract void overrideMe();
}
class Child extends Base {
final int x;
Child(int x) { this.x = x; }
#Override void overrideMe() {
System.out.println(x);
}
}
new Child(42); // prints "0"
}
}
Here, when Base constructor calls overrideMe, Child has not finished initializing the final int x, and the method gets the wrong value. This will almost certainly lead to bugs and errors.
Interpretation #3: The constructor does a lot of work that can be deferred
The construction of an object can be made faster when some work is deferred to when it's actually needed; this is called lazy initialization. As an example, when a String is constructed, it does not actually compute its hash code. It only does it when the hash code is first required, and then it will cache it (since strings are immutable, this value will not change).
However, consider Effective Java 2nd Edition, Item 71: Use lazy initialization judiciously. Lazy initialization can lead to subtle bugs, and don't always yield improved performance that justifies the added complexity. Do not prematurely optimize.
Constructors are a little special in that an unhandled exception in a constructor may have weird side effects. Without seeing your code I would assume that a long constructor increases the risk of exceptions. I would make the constructor as simple as needed and utilize other methods to do the rest in order to provide better error handling.
The biggest disadvantage is probably the same as writing any other long function -- that it can get complex and difficult to understand.
The rest is going to vary. First of all, length and execution time don't necessarily correlate -- you could have a single line (e.g., function call) that took several seconds to complete (e.g., connect to a server) or lots of code that executed entirely within the CPU and finished quickly.
Startup time would (obviously) only be affected by constructors that were/are invoked during startup. I haven't had an issue with this in any code I've written (at all recently anyway), but I've seen code that did. On some types of embedded systems (for one example) you really want to avoid creating and destroying objects during normal use, so you create almost everything statically during bootup. Once it's running, you can devote all the processor time to getting the real work done.
Constructor is yet another function. You need very long functions called many times to make the program work slow. So if it's only called once it usually won't matter how much code is inside.
It affects the time it takes to construct that object, naturally, but no more than having an empty constructor and calling methods to do that work instead. It has no effect on the application load time
In case of copy constructor if we use donot use reference in that case
it will create an object and call the copy constructor and passing the
value to the copy constructor and each time a new object is created and
each time it will call the copy constructor it goes to infinite and
fill the memory then it display the error message .
if we pass the reference it will not create the new object for storing
the value. and no recursion will take place
I would avoid doing anything in your constructor that isn't absolutely necessary. Initialize your variables in there, and try not to do much else. Additional functionality should reside in separate functions that you call only if you need to.

Dealing with multiple input parameters using wrapper class

As a sort of continuation of this, I have the following newbie question:
What difference is there in building a wrapper class that expects lots of inputs parameters and inputting those parameters directly into the final constructor?
Don't get me wrong, I think the multiple input parameter thing is pretty ugly, and I'm trying to get around it since just like the poster of that question, I need to deal with a Calculator-like class that requires a lot of parameters. But what I don't understand is what would a wrapper class for input parameters solve, since I also need to build the input class--and that's just as ugly as the other alternative.
To summarize, I don't think this:
MyClass::MyClass(int param1, int param2, int param3... int paramN)
{
this->param1 = param1;
this->param2 = param2;
this->param3 = param3;
...
this->paramN = paramN;
}
...is much different from this:
Result MyClass::myInterface(MyInputClass input)
{
//perform calculations
}
MyInputClass::MyInputClass(int param1, int param2, int param3... int paramN)
{
this->param1 = param1;
this->param2 = param2;
this->param3 = param3;
...
this->paramN = paramN;
}
And, of course, I am trying to avoid setters as much as possible.
Am I missing something here? I'd love to have some insight on this, since I'm still a rather newbie programmer.
Here is some of the reasons, one would like to use a parameter class:
Reusability: You can save the parameters in a variable and reuse them, maybe in another method.
Separation of concerns: Parameter handling, say to verify which parameters are active, which are in their correct ranges, etc., is performed in the parameter class; your calculation method only knows how to calculate, so in the future you know where is each and no logic is mixed.
You can add a new value and impact minimally in your calcultor method.
Maybe your calculator method can be applied to two different set of parameters, say on an integer and on a double. It is more readable/mantainable/faster to write the calculation logic just once and change the parameter object.
Some classes do not need to initialize every single field at the constructor. Sometimes setters are the way to go.
The biggest benefits are:
Insulation from changes. You can add
new properties to the parameter class
and not have to change the class that
uses it or any of its callers.
Code reduction when you're chaining
methods. If MyClass needs to pass
its parameters around, MyInputClass
eliminates a bunch of code.
All the other points are completely valid and sound. Here's a little reinforcement from some authoritative text:
Code Complete suggests that you limit the number of parameters of any routine to seven, as "seven is the magic number for people's comprehension." It then goes on to suggest that passing more than seven parameters increases coupling with the calling scope and that you should use a structured variable (ala MyInputClass) instead.