Does Parallel.ForEach require AsParallel() - .net-4.0

ParallelEnumerable has a static member AsParallel. If I have an IEnumerable<T> and want to use Parallel.ForEach does that imply that I should always be using AsParallel?
e.g.
Are both of these correct (everything else being equal)?
without AsParallel:
List<string> list = new List<string>();
Parallel.ForEach<string>(GetFileList().Where(file => reader.Match(file)), f => list.Add(f));
or with AsParallel?
List<string> list = new List<string>();
Parallel.ForEach<string>(GetFileList().Where(file => reader.Match(file)).AsParallel(), f => list.Add(f));

It depends on what's being called, they are separate issues.
.AsParallel() Parallelizes the enumeration not the delegation of tasks.
Parallel.ForEach Parallelized the loop, assigning tasks to worker threads for each element.
So unless your source enumeration gains from becoming parallel (e.g. reader.Match(file) is expensive), they are equal. To your last question, yes, both are also correct.
Also, there's one other construct you may want to look at that shortens it a bit, still getting maximum benefit of PLINQ:
GetFileList().Where(file => reader.Match(file)).ForAll(f => list.Add(f));

Related

Efficiency: reuse Term Lucene 6

I want to reuse the Term object instead of creating a new one every time I call this method:
public long getDF(String term) throws Exception {
return indexReader.docFreq(new Term("content", term));
}
I read in the documentation that, I can use this constructor of Term to reuse it:
public Term(String fld)
Constructs a Term with the given field and empty text. This serves two purposes: 1) reuse of a Term with the same field. 2) pattern for a query.
However, I don't know what is the next step as there is no setters in the Term documentation nor reset() method.
Any hint on how to achieve this?
Constructing Terms is cheap. You probably shouldn't worry too much about trying to reuse them. If you are seeing real performance issues, you should run a profiler. I'd guess constructing terms is not the real problem, and attempting to reuse clauses like this will just complicate things for no discernible benefit.
That said, you could reuse it by getting the Term's BytesRef from Term.bytes(), and directly modifying the underlying byte array.
String text = "text";
Term term = new Term("field");
BytesRef bytes = term.bytes();
bytes.bytes = new byte[UnicodeUtil.maxUTF8Length(text.length())];
bytes.length = UnicodeUtil.UTF16toUTF8(text, 0, text.length(), bytes.bytes);
Be careful that you aren't changing the value of a Term that is still in use. For instance, attempting to add two clauses to a BooleanQuery like this will, of course, not work.

Another ConcurrentModificationException question

I've searched StackOverflow and there are many ConcurrentModificationException questions. After reading them, I'm still confused. I'm getting a lot of these exceptions. I'm using a "Registry" setup to keep track of Objects:
public class Registry {
public static ArrayList<Messages> messages = new ArrayList<Messages>();
public static ArrayList<Effect> effects = new ArrayList<Effect>();
public static ArrayList<Projectile> proj = new ArrayList<Projectile>();
/** Clears all arrays */
public static void recycle(){
messages.clear();
effects.clear();
proj.clear();
}
}
I'm adding and removing objects to these lists by accessing the ArrayLists like this: Registry.effects.add(obj) and Registry.effects.remove(obj)
I managed to get around some errors by using a retry loop:
//somewhere in my game..
boolean retry = true;
while (retry){
try {
removeEffectsWithSource("CHARGE");
retry = false;
}
catch (ConcurrentModificationException c){}
}
private void removeEffectsWithSource(String src) throws ConcurrentModificationException {
ListIterator<Effect> it = Registry.effects.listIterator();
while ( it.hasNext() ){
Effect f = it.next();
if ( f.Source.equals(src) ) {
f.unapplyEffects();
Registry.effects.remove(f);
}
}
}
But in other cases this is not practical. I keep getting ConcurrentModificationExceptions in my drawProjectiles() method, even though it doesn't modify anything. I suppose the culprit is if I touched the screen, which creates a new Projectile object and adds it to Registry.proj while the draw method is still iterating.
I can't very well do a retry loop with the draw method, or it will re-draw some of the objects. So now I'm forced to find a new solution.. Is there a more stable way of accomplishing what I'm doing?
Oh and part 2 of my question: Many people suggest using ListIterators (as I have been using), but I don't understand.. if I call ListIterator.remove() does it remove that object from the ArrayList it's iterating through, or just remove it from the Iterator itself?
Top line, three recommendations:
Don't do the "wrap an exception in a loop" thing. Exceptions are for exceptional conditions, not control flow. (Effective Java #57 or Exceptions and Control Flow or Example of "using exceptions for control flow")
If you're going to use a Registry object, expose thread-safe behavioral, not accessor methods on that object and contain the concurrency reasoning within that single class. Your life will get better. No exposing collections in public fields. (ew, and why are those fields static?)
To solve the actual concurrency issues, do one of the following:
Use synchronized collections (potential performance hit)
Use concurrent collections (sometimes complicated logic, but probably efficient)
Use snapshots (probably with synchronized or a ReadWriteLock under the covers)
Part 1 of your question
You should use a concurrent data structure for the multi-threaded scenario, or use a synchronizer and make a defensive copy. Probably directly exposing the collections as public fields is wrong: your registry should expose thread-safe behavioral accessors to those collections. For instance, maybe you want a Registry.safeRemoveEffectBySource(String src) method. Keep the threading specifics internal to the registry, which seems to be the "owner" of this aggregate information in your design.
Since you probably don't really need List semantics, I suggest replacing these with ConcurrentHashMaps wrapped into Set using Collections.newSetFromMap().
Your draw() method could either a) use a Registry.getEffectsSnapshot() method that returns a snapshot of the set; or b) use an Iterable<Effect> Registry.getEffects() method that returns a safe iterable version (maybe just backed by the ConcurrentHashMap, which won't throw CME under any circumstances). I think (b) is preferable here, as long as the draw loop doesn't need to modify the collection. This provides a very weak synchronization guarantee between the mutator thread(s) and the draw() thread, but assuming the draw() thread runs often enough, missing an update or something probably isn't a big deal.
Part 2 of your question
As another answer notes, in the single-thread case, you should just make sure you use the Iterator.remove() to remove the item, but again, you should wrap this logic inside the Registry class if at all possible. In some cases, you'll need to lock a collection, iterate over it collecting some aggregate information, and make structural modifications after the iteration completes. You ask if the remove() method just removes it from the Iterator or from the backing collection... see the API contract for Iterator.remove() which tells you it removes the object from the underlying collection. Also see this SO question.
You cannot directly remove an item from a collection while you are still iterating over it, otherwise you will get a ConcurrentModificationException.
The solution is, as you hint, to call the remove method on the Iterator instead. This will remove it from the underlying collection as well, but it will do it in such a way that the Iterator knows what's going on and so doesn't throw an exception when it finds the collection has been modified.

wcf method in a loop?

Taking the follwing scenario in consuming a WCF method:
List<TList> MyList ; // contains 1000 rows
MyWCFclient svc = new MyWCFclient;
foreach ( var g in MyList){
g.field1inMyList = svc.getCalc( g.field2inMyList )
}
svc.Close();
Could be a good implementation to reuse a wcf method in this way, performance etc?
Each invocation of getMyInt would be its own separate conversation, with quite a bit of overhead on each call. So no, this would not be a good implementation if you actually had a loop quickly invoking getMyInt 1000 times in a row. However, since your example is a bit contrived (it doesn't accomplish/return anything) it's difficult to suggest the best way to improve it.
Edit: I can see from your changes that you have a list of objects containing two properties. You are using the web service to derive one of the values from the other. Therefore, the most optimal way of solving this problem, would be to pass all the values to the web service at once, like so:
List<TList> MyList ; // contains 1000 rows
MyWCFclient svc = new MyWCFclient;
var field2Values = MyList.Select(x => x.field2inMyList).ToArray();
var field1Values = svc.getCalc(field2Values);
for (int i = 0; i < field1Values.Length; i++)
{
MyList[i].field1inMyList = field1Values[i];
}
svc.Close();
Both field2Values and field1Values are arrays and there is only one call to the WCF service.
You might want to create a "bulk" version of this service operation to allow it to take a list as a parameter (MyList in that case). This new operation would then call the same business logic for each element and return the results as a whole.
This should reduce overhead. But do not forget to adapt the Max*MessageSize configuration parameters to cope with the potentially big messages involved in that case.

C# 4.0 'dynamic' and foreach statement

Not long time before I've discovered, that new dynamic keyword doesn't work well with the C#'s foreach statement:
using System;
sealed class Foo {
public struct FooEnumerator {
int value;
public bool MoveNext() { return true; }
public int Current { get { return value++; } }
}
public FooEnumerator GetEnumerator() {
return new FooEnumerator();
}
static void Main() {
foreach (int x in new Foo()) {
Console.WriteLine(x);
if (x >= 100) break;
}
foreach (int x in (dynamic)new Foo()) { // :)
Console.WriteLine(x);
if (x >= 100) break;
}
}
}
I've expected that iterating over the dynamic variable should work completely as if the type of collection variable is known at compile time. I've discovered that the second loop actually is looked like this when is compiled:
foreach (object x in (IEnumerable) /* dynamic cast */ (object) new Foo()) {
...
}
and every access to the x variable results with the dynamic lookup/cast so C# ignores that I've specify the correct x's type in the foreach statement - that was a bit surprising for me... And also, C# compiler completely ignores that collection from dynamically typed variable may implements IEnumerable<T> interface!
The full foreach statement behavior is described in the C# 4.0 specification 8.8.4 The foreach statement article.
But... It's perfectly possible to implement the same behavior at runtime! It's possible to add an extra CSharpBinderFlags.ForEachCast flag, correct the emmited code to looks like:
foreach (int x in (IEnumerable<int>) /* dynamic cast with the CSharpBinderFlags.ForEachCast flag */ (object) new Foo()) {
...
}
And add some extra logic to CSharpConvertBinder:
Wrap IEnumerable collections and IEnumerator's to IEnumerable<T>/IEnumerator<T>.
Wrap collections doesn't implementing Ienumerable<T>/IEnumerator<T> to implement this interfaces.
So today foreach statement iterates over dynamic completely different from iterating over statically known collection variable and completely ignores the type information, specified by user. All that results with the different iteration behavior (IEnumarble<T>-implementing collections is being iterated as only IEnumerable-implementing) and more than 150x slowdown when iterating over dynamic. Simple fix will results a much better performance:
foreach (int x in (IEnumerable<int>) dynamicVariable) {
But why I should write code like this?
It's very nicely to see that sometimes C# 4.0 dynamic works completely the same if the type will be known at compile-time, but it's very sadly to see that dynamic works completely different where IT CAN works the same as statically typed code.
So my question is: why foreach over dynamic works different from foreach over anything else?
First off, to explain some background to readers who are confused by the question: the C# language actually does not require that the collection of a "foreach" implement IEnumerable. Rather, it requires either that it implement IEnumerable, or that it implement IEnumerable<T>, or simply that it have a GetEnumerator method (and that the GetEnumerator method returns something with a Current and MoveNext that matches the pattern expected, and so on.)
That might seem like an odd feature for a statically typed language like C# to have. Why should we "match the pattern"? Why not require that collections implement IEnumerable?
Think about the world before generics. If you wanted to make a collection of ints, you'd have to use IEnumerable. And therefore, every call to Current would box an int, and then of course the caller would immediately unbox it back to int. Which is slow and creates pressure on the GC. By going with a pattern-based approach you can make strongly typed collections in C# 1.0!
Nowadays of course no one implements that pattern; if you want a strongly typed collection, you implement IEnumerable<T> and you're done. Had a generic type system been available to C# 1.0, it is unlikely that the "match the pattern" feature would have been implemented in the first place.
As you've noted, instead of looking for the pattern, the code generated for a dynamic collection in a foreach looks for a dynamic conversion to IEnumerable (and then does a conversion from the object returned by Current to the type of the loop variable of course.) So your question basically is "why does the code generated by use of the dynamic type as a collection type of foreach fail to look for the pattern at runtime?"
Because it isn't 1999 anymore, and even when it was back in the C# 1.0 days, collections that used the pattern also almost always implemented IEnumerable too. The probability that a real user is going to be writing production-quality C# 4.0 code which does a foreach over a collection that implements the pattern but not IEnumerable is extremely low. Now, if you're in that situation, well, that's unexpected, and I'm sorry that our design failed to anticipate your needs. If you feel that your scenario is in fact common, and that we've misjudged how rare it is, please post more details about your scenario and we'll consider changing this for hypothetical future versions.
Note that the conversion we generate to IEnumerable is a dynamic conversion, not simply a type test. That way, the dynamic object may participate; if it does not implement IEnumerable but wishes to proffer up a proxy object which does, it is free to do so.
In short, the design of "dynamic foreach" is "dynamically ask the object for an IEnumerable sequence", rather than "dynamically do every type-testing operation we would have done at compile time". This does in theory subtly violate the design principle that dynamic analysis gives the same result as static analysis would have, but in practice it's how we expect the vast majority of dynamically accessed collections to work.
But why I should write code like this?
Indeed. And why would the compiler write code like that? You've removed any chance it might have had to guess that the loop could be optimized. Btw, you seem to interpret the IL incorrectly, it is rebinding to obtain IEnumerable.Current, the MoveNext() call is direct and GetEnumerator() is called only once. Which I think is appropriate, the next element might or might not cast to an int without problems. It could be a collection of various types, each with their own binder.

Should I use LINQ for this query?

I've been reading a fair bit about the performance of using LINQ rather than using a for each loop and from what I understand using a LINQ query would be a little bit slower but generally worth it for convenience and expressiveness. However I am a bit confused about how much slower it is if you were to use the results of the query in a for loop.
Let's say that I have a set called 'Locations' and a set of objects called 'Items'. Each 'item' can only belong to one 'location'. I want to link items that are under the same location to each other. If I were to do this using a normal 'For Each' loop it would be something like this:
For Each it as Item in Items
If it.Location.equals(Me.Location)
Me.LinkedItems.Add(it)
End If
Next
However if i was to use LINQ it would instead be this:
For Each it as Item in Items.Where(Function(i) i.Location.equals(Me.Location))
Me.LinkedItems.Add(it)
Next
Now my question is, is the second (LINQ) option going to loop once through the entire 'Items' set to complete the query, then loop through the results to add them to the list, resulting in essentially two loops, or will it do the one loop like the first (For Each) option? If the answer is the former, I assume then that it would be silly to use LINQ in this situation.
It will do one loop - it's lazily evaluated.
However, you may be able to do better than this. What's the type of LinkedItems? If it has an appropriate AddRange method, you should be able to do:
Me.LinkedItems.AddRange(Items.Where(Function(i) i.Location.equals(Me.Location)))
More on lazy evaluation
Basically Where maintains an iterator, and only finds the next matching item when you ask for it. In C#, the implementation would be something like:
// Error handling omitted
public static IEnumerable<T> Where(this IEnumerable<T> source,
Func<T, bool> predicate)
{
foreach (T element in source)
{
if (predicate(element))
{
yield return element;
}
}
}
It's the use of yield return here which would make it lazily evaluated. If you're not familiar with C# iterator blocks, you might want to look at these articles which explain them in more detail.
Of course Where could have been implemented "manually" instead of using an iterator block, but the above implementation is sufficient to show the lazy evaluation.
It will do the query once, since you are foreaching over a list of Items.Where. In your case that's a pre-filtered list of the condition you want and you should really go with the LINQ.