Fail Safe Iterator giving unpredictable output

Fail Safe Iterator giving unpredictable output - iterator

I have an example (see code below) of a fail safe iterator using a ConcurrentHashMap in which two entries are added to the map while iterating.
While one of the values appears in the output, the other one does not appear in the output.
Since fail-safe works on copy of the map and not the original map should the iterator print the value added to the map while iterating?
public static void main(String[] args) {
ConcurrentHashMap<String, Integer> mp = new ConcurrentHashMap<String, Integer>();
mp.put("one",1);
mp.put("two",2);
Iterator<String> itr=mp.keySet().iterator();
while (itr.hasNext()) {
String key=(String)itr.next();
System.out.println(mp.get(key));
mp.put("three",3);
mp.put("FIVE",5);
System.out.println(mp);
}
}

The keySet iterator is not fail-safe.
The behaviour you see is consistent with the published contract.
The javadoc for ConcurrentHashMap#keySet() says:
The view's iterators and spliterators are weakly consistent.
A visit to package-level javadoc of the java.util.concurrent package says:
Most concurrent Collection implementations (including most Queues)
also differ from the usual java.util conventions in that their
Iterators and Spliterators provide weakly consistent rather than
fast-fail traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect any modifications subsequent to construction.
(emphasis mine)

Related

Should my object be responsible for randomizing its own content?

I'm building an app that generates random sequences of musical notes and displays them to the user as musical notation. These sequences can be generated according to several parameters, including density and maximum consecutive notes of the same pitch.
Musical sequences are captured by a sequence object whose notes property is a simple string of notes such as "abcdaba".
My early attempts to generate random sequences involved a SequenceGenerator class that compiled random sequences using several private methods. This looks like a service to me. But I'm trying to honour the principle expressed in Domain-Driven Design (Evans 2003) to only use services where necessary and to prefer associating behaviour with domain objects.
So my question is:
Should the job of producing random sequences be taken care of by a public method on sequence itself (such as generateRandom()) or should it be kept separate?
I considered the possibility that my original design is more along the lines of a builder or factory pattern than a service, but the the code is very different for creating a random sequence than for creating one with a supplied string of notes.
One concern I have with the method route is that generateRandom() as a method on sequence changes the content of sequence but isn't actually generating a new sequence object. This just feels wrong, but I can't express why.
I'm still getting my head around some the core OO design principles, so any help is greatly appreciated.

Should the job of producing random sequences be taken care of by a public method on sequence itself (such as generateRandom()) or should it be kept separate?
I usually find that I get cleaner designs if I treat "random" the same way that I treat "time", or "I/O" -- as an input to the model, rather than as an aspect of the model itself.
If you don't consider time an input value, think about it until you do -- it is an important concept (John Carmack, 1998).
Within the constraints of DDD, that could either mean passing a "domain service" as an argument to your method, allowing your aggregate to invoke the service as needed, or it could mean having a method on the aggregate, so that the application can pass in random numbers when needed.
So any creation of a sequence would involve passing in some pattern or seed, but whether that is random or not is decided outside of the sequence itself?
Yes, exactly.

The creation of an object is not usually considered part of the logic for the object.
How you do that technically is a different matter. You could potentially use delegation. For example:
public interface NoteSequence {
void play();
}
public final class LettersNoteSequence implements NoteSequence {
public LettersNoteSequence(String letters) {
...
}
...
}
public final class RandomNoteSequence implements NoteSequence {
...
#Override
public void play() {
new LetterNoteSequence(generateRandomLetters()).play();
}
}
This way you don't have to have a "service" or a "factory", but this is only one alternative, may or may not fit your use-case.

Reference Semantics in Google Protocol Buffers

I have slightly peculiar program which deals with cases very similar to this
(in C#-like pseudo code):
class CDataSet
{
int m_nID;
string m_sTag;
float m_fValue;
void PrintData()
{
//Blah Blah
}
};
class CDataItem
{
int m_nID;
string m_sTag;
CDataSet m_refData;
CDataSet m_refParent;
void Print()
{
if(null == m_refData)
{
m_refParent.PrintData();
}
else
{
m_refData.PrintData();
}
}
};
Members m_refData and m_refParent are initialized to null and used as follows:
m_refData -> Used when a new data set is added
m_refParent -> Used to point to an existing data set.
A new data set is added only if the field m_nID doesn't match an existing one.
Currently this code is managing around 500 objects with around 21 fields per object and the format of choice as of now is XML, which at 100k+ lines and 5MB+ is very unwieldy.
I am planning to modify the whole shebang to use ProtoBuf, but currently I'm not sure as to how I can handle the reference semantics. Any thoughts would be much appreciated

Out of the box, protocol buffers does not have any reference semantics. You would need to cross-reference them manually, typically using an artificial key. Essentially on the DTO layer you would a key to CDataSet (that you simply invent, perhaps just an increasing integer), storing the key instead of the item in m_refData/m_refParent, and running fixup manually during serialization/deserialization. You can also just store the index into the set of CDataSet, but that may make insertion etc more difficult. Up to you; since this is serialization you could argue that you won't insert (etc) outside of initial population and hence the raw index is fine and reliable.
This is, however, a very common scenario - so as an implementation-specific feature I've added optional (opt-in) reference tracking to my implementation (protobuf-net), which essentially automates the above under the covers (so you don't need to change your objects or expose the key outside of the binary stream).

Another ConcurrentModificationException question

I've searched StackOverflow and there are many ConcurrentModificationException questions. After reading them, I'm still confused. I'm getting a lot of these exceptions. I'm using a "Registry" setup to keep track of Objects:
public class Registry {
public static ArrayList<Messages> messages = new ArrayList<Messages>();
public static ArrayList<Effect> effects = new ArrayList<Effect>();
public static ArrayList<Projectile> proj = new ArrayList<Projectile>();
/** Clears all arrays */
public static void recycle(){
messages.clear();
effects.clear();
proj.clear();
}
}
I'm adding and removing objects to these lists by accessing the ArrayLists like this: Registry.effects.add(obj) and Registry.effects.remove(obj)
I managed to get around some errors by using a retry loop:
//somewhere in my game..
boolean retry = true;
while (retry){
try {
removeEffectsWithSource("CHARGE");
retry = false;
}
catch (ConcurrentModificationException c){}
}
private void removeEffectsWithSource(String src) throws ConcurrentModificationException {
ListIterator<Effect> it = Registry.effects.listIterator();
while ( it.hasNext() ){
Effect f = it.next();
if ( f.Source.equals(src) ) {
f.unapplyEffects();
Registry.effects.remove(f);
}
}
}
But in other cases this is not practical. I keep getting ConcurrentModificationExceptions in my drawProjectiles() method, even though it doesn't modify anything. I suppose the culprit is if I touched the screen, which creates a new Projectile object and adds it to Registry.proj while the draw method is still iterating.
I can't very well do a retry loop with the draw method, or it will re-draw some of the objects. So now I'm forced to find a new solution.. Is there a more stable way of accomplishing what I'm doing?
Oh and part 2 of my question: Many people suggest using ListIterators (as I have been using), but I don't understand.. if I call ListIterator.remove() does it remove that object from the ArrayList it's iterating through, or just remove it from the Iterator itself?

Top line, three recommendations:
Don't do the "wrap an exception in a loop" thing. Exceptions are for exceptional conditions, not control flow. (Effective Java #57 or Exceptions and Control Flow or Example of "using exceptions for control flow")
If you're going to use a Registry object, expose thread-safe behavioral, not accessor methods on that object and contain the concurrency reasoning within that single class. Your life will get better. No exposing collections in public fields. (ew, and why are those fields static?)
To solve the actual concurrency issues, do one of the following:
Use synchronized collections (potential performance hit)
Use concurrent collections (sometimes complicated logic, but probably efficient)
Use snapshots (probably with synchronized or a ReadWriteLock under the covers)
Part 1 of your question
You should use a concurrent data structure for the multi-threaded scenario, or use a synchronizer and make a defensive copy. Probably directly exposing the collections as public fields is wrong: your registry should expose thread-safe behavioral accessors to those collections. For instance, maybe you want a Registry.safeRemoveEffectBySource(String src) method. Keep the threading specifics internal to the registry, which seems to be the "owner" of this aggregate information in your design.
Since you probably don't really need List semantics, I suggest replacing these with ConcurrentHashMaps wrapped into Set using Collections.newSetFromMap().
Your draw() method could either a) use a Registry.getEffectsSnapshot() method that returns a snapshot of the set; or b) use an Iterable<Effect> Registry.getEffects() method that returns a safe iterable version (maybe just backed by the ConcurrentHashMap, which won't throw CME under any circumstances). I think (b) is preferable here, as long as the draw loop doesn't need to modify the collection. This provides a very weak synchronization guarantee between the mutator thread(s) and the draw() thread, but assuming the draw() thread runs often enough, missing an update or something probably isn't a big deal.
Part 2 of your question
As another answer notes, in the single-thread case, you should just make sure you use the Iterator.remove() to remove the item, but again, you should wrap this logic inside the Registry class if at all possible. In some cases, you'll need to lock a collection, iterate over it collecting some aggregate information, and make structural modifications after the iteration completes. You ask if the remove() method just removes it from the Iterator or from the backing collection... see the API contract for Iterator.remove() which tells you it removes the object from the underlying collection. Also see this SO question.

You cannot directly remove an item from a collection while you are still iterating over it, otherwise you will get a ConcurrentModificationException.
The solution is, as you hint, to call the remove method on the Iterator instead. This will remove it from the underlying collection as well, but it will do it in such a way that the Iterator knows what's going on and so doesn't throw an exception when it finds the collection has been modified.

Is there any disadvantage of writing a long constructor?

Does it affect the time in loading the application?
or any other issues in doing so?

The question is vague on what "long" means. Here are some possible interpretations:
Interpretation #1: The constructor has many parameters
Constructors with many parameters can lead to poor readability, and better alternatives exist.
Here's a quote from Effective Java 2nd Edition, Item 2: Consider a builder pattern when faced with many constructor parameters:
Traditionally, programmers have used the telescoping constructor pattern, in which you provide a constructor with only the required parameters, another with a single optional parameters, a third with two optional parameters, and so on...
The telescoping constructor pattern is essentially something like this:
public class Telescope {
final String name;
final int levels;
final boolean isAdjustable;
public Telescope(String name) {
this(name, 5);
}
public Telescope(String name, int levels) {
this(name, levels, false);
}
public Telescope(String name, int levels, boolean isAdjustable) {
this.name = name;
this.levels = levels;
this.isAdjustable = isAdjustable;
}
}
And now you can do any of the following:
new Telescope("X/1999");
new Telescope("X/1999", 13);
new Telescope("X/1999", 13, true);
You can't, however, currently set only the name and isAdjustable, and leaving levels at default. You can provide more constructor overloads, but obviously the number would explode as the number of parameters grow, and you may even have multiple boolean and int arguments, which would really make a mess out of things.
As you can see, this isn't a pleasant pattern to write, and even less pleasant to use (What does "true" mean here? What's 13?).
Bloch recommends using a builder pattern, which would allow you to write something like this instead:
Telescope telly = new Telescope.Builder("X/1999").setAdjustable(true).build();
Note that now the parameters are named, and you can set them in any order you want, and you can skip the ones that you want to keep at default values. This is certainly much better than telescoping constructors, especially when there's a huge number of parameters that belong to many of the same types.
See also
Wikipedia/Builder pattern
Effective Java 2nd Edition, Item 2: Consider a builder pattern when faced with many constructor parameters (excerpt online)
Related questions
When would you use the Builder Pattern?
Is this a well known design pattern? What is its name?
Interpretation #2: The constructor does a lot of work that costs time
If the work must be done at construction time, then doing it in the constructor or in a helper method doesn't really make too much of a difference. When a constructor delegates work to a helper method, however, make sure that it's not overridable, because that could lead to a lot of problems.
Here's some quote from Effective Java 2nd Edition, Item 17: Design and document for inheritance, or else prohibit it:
There are a few more restrictions that a class must obey to allow inheritance. Constructors must not invoke overridable methods, directly or indirectly. If you violate this rule, program failure will result. The superclass constructor runs before the subclass constructor, so the overriding method in the subclass will be invoked before the subclass constructor has run. If the overriding method depends on any initialization performed by the subclass constructor, the method will not behave as expected.
Here's an example to illustrate:
public class ConstructorCallsOverride {
public static void main(String[] args) {
abstract class Base {
Base() { overrideMe(); }
abstract void overrideMe();
}
class Child extends Base {
final int x;
Child(int x) { this.x = x; }
#Override void overrideMe() {
System.out.println(x);
}
}
new Child(42); // prints "0"
}
}
Here, when Base constructor calls overrideMe, Child has not finished initializing the final int x, and the method gets the wrong value. This will almost certainly lead to bugs and errors.
Interpretation #3: The constructor does a lot of work that can be deferred
The construction of an object can be made faster when some work is deferred to when it's actually needed; this is called lazy initialization. As an example, when a String is constructed, it does not actually compute its hash code. It only does it when the hash code is first required, and then it will cache it (since strings are immutable, this value will not change).
However, consider Effective Java 2nd Edition, Item 71: Use lazy initialization judiciously. Lazy initialization can lead to subtle bugs, and don't always yield improved performance that justifies the added complexity. Do not prematurely optimize.

Constructors are a little special in that an unhandled exception in a constructor may have weird side effects. Without seeing your code I would assume that a long constructor increases the risk of exceptions. I would make the constructor as simple as needed and utilize other methods to do the rest in order to provide better error handling.

The biggest disadvantage is probably the same as writing any other long function -- that it can get complex and difficult to understand.
The rest is going to vary. First of all, length and execution time don't necessarily correlate -- you could have a single line (e.g., function call) that took several seconds to complete (e.g., connect to a server) or lots of code that executed entirely within the CPU and finished quickly.
Startup time would (obviously) only be affected by constructors that were/are invoked during startup. I haven't had an issue with this in any code I've written (at all recently anyway), but I've seen code that did. On some types of embedded systems (for one example) you really want to avoid creating and destroying objects during normal use, so you create almost everything statically during bootup. Once it's running, you can devote all the processor time to getting the real work done.

Constructor is yet another function. You need very long functions called many times to make the program work slow. So if it's only called once it usually won't matter how much code is inside.

It affects the time it takes to construct that object, naturally, but no more than having an empty constructor and calling methods to do that work instead. It has no effect on the application load time

In case of copy constructor if we use donot use reference in that case
it will create an object and call the copy constructor and passing the
value to the copy constructor and each time a new object is created and
each time it will call the copy constructor it goes to infinite and
fill the memory then it display the error message .
if we pass the reference it will not create the new object for storing
the value. and no recursion will take place

I would avoid doing anything in your constructor that isn't absolutely necessary. Initialize your variables in there, and try not to do much else. Additional functionality should reside in separate functions that you call only if you need to.

C# 4.0 'dynamic' and foreach statement

Not long time before I've discovered, that new dynamic keyword doesn't work well with the C#'s foreach statement:
using System;
sealed class Foo {
public struct FooEnumerator {
int value;
public bool MoveNext() { return true; }
public int Current { get { return value++; } }
}
public FooEnumerator GetEnumerator() {
return new FooEnumerator();
}
static void Main() {
foreach (int x in new Foo()) {
Console.WriteLine(x);
if (x >= 100) break;
}
foreach (int x in (dynamic)new Foo()) { // :)
Console.WriteLine(x);
if (x >= 100) break;
}
}
}
I've expected that iterating over the dynamic variable should work completely as if the type of collection variable is known at compile time. I've discovered that the second loop actually is looked like this when is compiled:
foreach (object x in (IEnumerable) /* dynamic cast */ (object) new Foo()) {
...
}
and every access to the x variable results with the dynamic lookup/cast so C# ignores that I've specify the correct x's type in the foreach statement - that was a bit surprising for me... And also, C# compiler completely ignores that collection from dynamically typed variable may implements IEnumerable<T> interface!
The full foreach statement behavior is described in the C# 4.0 specification 8.8.4 The foreach statement article.
But... It's perfectly possible to implement the same behavior at runtime! It's possible to add an extra CSharpBinderFlags.ForEachCast flag, correct the emmited code to looks like:
foreach (int x in (IEnumerable<int>) /* dynamic cast with the CSharpBinderFlags.ForEachCast flag */ (object) new Foo()) {
...
}
And add some extra logic to CSharpConvertBinder:
Wrap IEnumerable collections and IEnumerator's to IEnumerable<T>/IEnumerator<T>.
Wrap collections doesn't implementing Ienumerable<T>/IEnumerator<T> to implement this interfaces.
So today foreach statement iterates over dynamic completely different from iterating over statically known collection variable and completely ignores the type information, specified by user. All that results with the different iteration behavior (IEnumarble<T>-implementing collections is being iterated as only IEnumerable-implementing) and more than 150x slowdown when iterating over dynamic. Simple fix will results a much better performance:
foreach (int x in (IEnumerable<int>) dynamicVariable) {
But why I should write code like this?
It's very nicely to see that sometimes C# 4.0 dynamic works completely the same if the type will be known at compile-time, but it's very sadly to see that dynamic works completely different where IT CAN works the same as statically typed code.
So my question is: why foreach over dynamic works different from foreach over anything else?

First off, to explain some background to readers who are confused by the question: the C# language actually does not require that the collection of a "foreach" implement IEnumerable. Rather, it requires either that it implement IEnumerable, or that it implement IEnumerable<T>, or simply that it have a GetEnumerator method (and that the GetEnumerator method returns something with a Current and MoveNext that matches the pattern expected, and so on.)
That might seem like an odd feature for a statically typed language like C# to have. Why should we "match the pattern"? Why not require that collections implement IEnumerable?
Think about the world before generics. If you wanted to make a collection of ints, you'd have to use IEnumerable. And therefore, every call to Current would box an int, and then of course the caller would immediately unbox it back to int. Which is slow and creates pressure on the GC. By going with a pattern-based approach you can make strongly typed collections in C# 1.0!
Nowadays of course no one implements that pattern; if you want a strongly typed collection, you implement IEnumerable<T> and you're done. Had a generic type system been available to C# 1.0, it is unlikely that the "match the pattern" feature would have been implemented in the first place.
As you've noted, instead of looking for the pattern, the code generated for a dynamic collection in a foreach looks for a dynamic conversion to IEnumerable (and then does a conversion from the object returned by Current to the type of the loop variable of course.) So your question basically is "why does the code generated by use of the dynamic type as a collection type of foreach fail to look for the pattern at runtime?"
Because it isn't 1999 anymore, and even when it was back in the C# 1.0 days, collections that used the pattern also almost always implemented IEnumerable too. The probability that a real user is going to be writing production-quality C# 4.0 code which does a foreach over a collection that implements the pattern but not IEnumerable is extremely low. Now, if you're in that situation, well, that's unexpected, and I'm sorry that our design failed to anticipate your needs. If you feel that your scenario is in fact common, and that we've misjudged how rare it is, please post more details about your scenario and we'll consider changing this for hypothetical future versions.
Note that the conversion we generate to IEnumerable is a dynamic conversion, not simply a type test. That way, the dynamic object may participate; if it does not implement IEnumerable but wishes to proffer up a proxy object which does, it is free to do so.
In short, the design of "dynamic foreach" is "dynamically ask the object for an IEnumerable sequence", rather than "dynamically do every type-testing operation we would have done at compile time". This does in theory subtly violate the design principle that dynamic analysis gives the same result as static analysis would have, but in practice it's how we expect the vast majority of dynamically accessed collections to work.

But why I should write code like this?
Indeed. And why would the compiler write code like that? You've removed any chance it might have had to guess that the loop could be optimized. Btw, you seem to interpret the IL incorrectly, it is rebinding to obtain IEnumerable.Current, the MoveNext() call is direct and GetEnumerator() is called only once. Which I think is appropriate, the next element might or might not cast to an int without problems. It could be a collection of various types, each with their own binder.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas