QueryCursorImpl.getAll possibly loses data. Should I prefer IgniteCacheProxy.getAll?

I am trying to figure out which kind of getAll I should use. Currently, the project I work on implements it through QueryCursorImpl:
try (QueryCursor<Cache.Entry<IgnitePrimaryKey, V>> query = cache.query(new ScanQuery<>(filter))) {
    return query.getAll()
        .stream()
        ...
} catch (Exception e) {
    ...
}
However, most of the time my application cannot obtain the data (the result is empty). On the other hand, the cache variable is an IgniteCacheProxy, and it has a getAll method itself. There is some problem with the set of keys, because the actual signature of IgniteCacheProxy.getAll is the following:
public Map<K, V> getAll(Set<? extends K> keys)
However, if I solve the issue with the keys, maybe I should prefer IgniteCacheProxy.getAll, because I can see asynchronous code inside it, rather than QueryCursorImpl.getAll?
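For reference, a key-based lookup with that signature would look something like this (key1 and key2 are hypothetical keys of type IgnitePrimaryKey):

Set<IgnitePrimaryKey> keys = new HashSet<>(Arrays.asList(key1, key2));
Map<IgnitePrimaryKey, V> values = cache.getAll(keys); // fetches exactly these keys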

Sounds like you simply want to iterate over all the data you have in the cache. If that's the case, the easiest way would be to simply get an Iterator (IgniteCache implements Iterable) and then loop through it. Using getAll would mean fetching all the data from a distributed cache to a client, which is typically a very bad idea, especially for a large data set. The Ignite iterator, however, will fetch data in pages and therefore will never blow the client's memory.
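A minimal sketch of that approach (process(...) stands in for whatever you do with each value): iterating the cache, or the query cursor if you need the filter, fetches entries lazily, page by page, instead of materializing the whole result on the client.

for (Cache.Entry<IgnitePrimaryKey, V> entry : cache) {
    process(entry.getValue()); // entries arrive in pages, not all at once
}

// The same holds for a cursor with a filter: iterate it instead of calling getAll()
try (QueryCursor<Cache.Entry<IgnitePrimaryKey, V>> cursor =
        cache.query(new ScanQuery<>(filter))) {
    for (Cache.Entry<IgnitePrimaryKey, V> entry : cursor) {
        process(entry.getValue());
    }
}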

Related

Entity Framework Core verify SaveChanges count

I have been assigned a task to verify the count of changes done using SaveChanges().
It is expected that the developer should know beforehand how many records will be changed when SaveChanges() is called.
To implement it, I have created an extension method for DbContext called SaveChangesAndVerify(int expectedChangeCount) where I am using transaction and equating this parameter with the return value of SaveChanges().
If the values match, the transaction is committed and if it doesn't match, the transaction is rolled back.
Please check the code below and let me know if it would work and if there are any considerations that I need to make. Also, is there a better way to do this?
public static class DbContextExtensions
{
    public static int SaveChangesAndVerify(this DbContext context, int expectedChangeCount)
    {
        context.Database.BeginTransaction();
        var actualChangeCount = context.SaveChanges();
        if (actualChangeCount == expectedChangeCount)
        {
            context.Database.CommitTransaction();
            return actualChangeCount;
        }
        else
        {
            context.Database.RollbackTransaction();
            throw new DbUpdateException($"Expected count {expectedChangeCount} did not match actual count {actualChangeCount} while saving the changes.");
        }
    }

    public static async Task<int> SaveChangesAndVerifyAsync(this DbContext context, int expectedChangeCount, CancellationToken cancellationToken = default)
    {
        await context.Database.BeginTransactionAsync(cancellationToken);
        var actualChangeCount = await context.SaveChangesAsync(cancellationToken);
        if (actualChangeCount == expectedChangeCount)
        {
            context.Database.CommitTransaction();
            return actualChangeCount;
        }
        else
        {
            context.Database.RollbackTransaction();
            throw new DbUpdateException($"Expected count {expectedChangeCount} did not match actual count {actualChangeCount} while saving the changes.");
        }
    }
}
A sample usage would be like context.SaveChangesAndVerify(1) where a developer is expecting only 1 record to update.
OK, so some points.
Unless you've disabled it, SaveChanges already works as a transaction: nothing will be changed if anything fails.
Furthermore, you can use context.ChangeTracker.Entries() and from there get the count of changed entities, so this does not require you to handle transactions at all. Also, SaveChanges() simply returns the number of rows affected, so it may not tell the full story.
Generally I dislike the idea of having this kind of check, from a project-architecture standpoint: it increases the complexity of the code for dynamic changes and simply adds complexity without bringing any kind of security or safety. Data integrity and proper behavior should be validated with unit tests, not with methods like these. For example, you could add unit tests that validate that the rows that got changed are the ones you expected. But that should be test code, not code that will be shipped to production.
But if you need to do it, don't use a transaction; count the entities before changing anything, as that is much cheaper. You can even use a "cheap" for loop so you can log which entities failed, and so on. Furthermore, since we are policing the developers: you use extension methods, which means a developer can still freely call the plain SaveChanges() as far as I can tell. You should create a custom DbContext class and expose only those two methods for saving changes.

Kotlin most efficient way sequence to set

Hi, I've got a list of 1330 objects and would like to apply a method to each one and obtain a set as the result.
val result = listOf1330
    .asSequence()
    .map {
        someMethod(it)
    }
val resultSet = result.toSet()
It works fine without toSet, but with it the execution time is about 10 times longer.
I used a sequence to make it faster, and it is faster, but as a result I need a list without duplicates (a set).
Simply: what is the most efficient way to convert a sequence to a set?
val result = listOf1330.mapTo(HashSet()) { someMethod(it) }
It makes little sense to use streams or sequences to implement this transformation: you need all the elements of the collection, not just a few. The mapTo (and map) functions are inline in Kotlin, meaning the code is substituted into the call site, so no lambda object is created and invoked over and over. We use mapTo to avoid the second copy of the collection that the toSet() function would otherwise make.
The .parallelStream() may add more performance if you want to run the computation in several threads. It is still a good idea to measure how well the load is balanced between threads. Performance may also depend on the collection implementation class on which you call it.
If your someObject has a slow implementation of equals() or hashCode(), or gives the same hash code for many objects, then that could account for the delay, and you may be able to improve it.
Otherwise, if the objects are big, the delay may be mostly due to the amount of memory that must be accessed to store them all; if so, that's the price you'll have to pay if you want a set with all those objects in memory.
Sequence.toSet() uses a LinkedHashSet.  You could try providing another Set instance, using e.g. toCollection(HashSet()), to see if that's any faster.  (You wouldn't get the same iteration order, though.)
I agree with gidds' answer on HashSet and LinkedHashSet performance:
LinkedHashSet is more expensive for insertions than HashSet.
However, in the above use case, I think we can leverage parallelStream to improve the performance. Under the hood, this is simply Java's parallelStream invoked from Kotlin.
val result: Set<String> = listOf("sdgds", "fdgdfsg", "dsfgsdfg")
    .parallelStream()
    .map {
        someMethod(it)
    }
    .collect(Collectors.toSet())
Collectors.toSet() uses a HashSet, so we should be fine from an insertion-performance perspective.
Use distinct or distinctBy.
val result = sequenceOf("a", "b", "a", "c").distinct()
// -> "a", "b", "c"
// for more complex cases use custom comparator function
val result = getMyObjectsSequence().distinctBy { it.name }
This approach lets you keep using a sequence without involving explicit Iterables (List, Set, etc.).
Nevertheless, there is no magic: distinct still uses a HashSet under the hood, so for a really huge sequence it may cause significant memory usage, which must be kept in mind when applying this function.

Spring WebClient: Parse + Stream very large JSON

This question is similar to Spring reactive streaming data from regular WebClient request, with the difference that I'm not getting a JSON array immediately from my WebClient, but something like this:
{
    "result-set": {
        "docs": [
            {
                "id": "auhcsasb1005_100000"
            },
            {
                "id": "auhcsasb1005_1000000"
            },
            {
                "id": "auhcsasb1005_1000001"
            },
            {
                "id": "auhcsasb1005_1000002"
            },
            ...
            {
                "EOF": true
            }
        ]
    }
}
This JSON object can be very large (~100MB), and thus needs to be worked on and streamed to the client, instead of parsed. This here is the only way I seem to be able to get the semantics correct:
WebClient.create()
    .get()
    .retrieve()
    .bodyToMono(DontKnowWhatClass.class)
    .flatMapMany(resultSet -> Flux.fromIterable(resultSet.getDocs()))
BUT that means I'm deserializing 100MB or more in memory, only to then create a Flux from it. What I'm wondering is: am I missing something crucial? Can I somehow just create a Flux from an object like that? Sadly, I have no way to influence how the result-set object is rendered.
I cannot imagine a way to reliably parse such a huge chunk of JSON in smaller parts. You could try to somehow split the big chunk into smaller tokens and process them step by step.
But I would assume there is no guarantee at all that this ends with the expected result, i.e. more memory-efficient parsing.
But, there are other ways to approach this problem, especially when you work reactive.
If you work in a reactive WebFlux context, you could try to use backpressure or rate limiting so that your application only parses a limited number of JSON objects at a time. This would preserve your application's limited resources (RAM, CPU, JVM threads, etc.).
And if the size of the objects really passes the 100MB mark, you should seriously consider questioning the current data model. Is this data structure and its size really suitable?
Maybe this problem cannot be solved by technical / implementation means; maybe the current application design needs to be changed.
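As a rough illustration of the rate-limiting idea, a minimal Reactor sketch in Java; fetchIds() and handle() are hypothetical (a streaming source of document ids, and a processing step that returns a Mono):

Flux<String> ids = fetchIds();        // hypothetical source that streams document ids
ids.limitRate(100)                    // request upstream in batches of at most 100
    .flatMap(id -> handle(id), 4)     // handle(id) returns a Mono; at most 4 in flight
    .subscribe();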
You can accept a ServerWebExchange in your controller; its response has a writeWith() method that takes a Publisher, i.e. exchange.response.writeWith().
If you have a way to parse the payload in chunks, you can just create a Flux that emits the parts.
For example, if you don't care about the payload at all and just want to ship it as-is:
#GetMapping("/api/foo/{myId}")
fun foo(exchange: ServerWebExchange, #PathVariable myId: Long): Mono<Void> {
val content: Flux<DataBuffer> = webClient
.get()
.uri("/api/up-stream/bar/$myId")
.exchange()
.flatMapMany { it.bodyToFlux<DataBuffer>() }
return exchange.response.writeWith(content)
}
Make sure you check the content-negotiation settings to avoid any buffering you didn't expect.

Another ConcurrentModificationException question

I've searched StackOverflow and there are many ConcurrentModificationException questions. After reading them, I'm still confused. I'm getting a lot of these exceptions. I'm using a "Registry" setup to keep track of Objects:
public class Registry {
    public static ArrayList<Messages> messages = new ArrayList<Messages>();
    public static ArrayList<Effect> effects = new ArrayList<Effect>();
    public static ArrayList<Projectile> proj = new ArrayList<Projectile>();

    /** Clears all arrays */
    public static void recycle(){
        messages.clear();
        effects.clear();
        proj.clear();
    }
}
I'm adding and removing objects to these lists by accessing the ArrayLists like this: Registry.effects.add(obj) and Registry.effects.remove(obj)
I managed to get around some errors by using a retry loop:
//somewhere in my game..
boolean retry = true;
while (retry){
    try {
        removeEffectsWithSource("CHARGE");
        retry = false;
    }
    catch (ConcurrentModificationException c){}
}
private void removeEffectsWithSource(String src) throws ConcurrentModificationException {
    ListIterator<Effect> it = Registry.effects.listIterator();
    while ( it.hasNext() ){
        Effect f = it.next();
        if ( f.Source.equals(src) ) {
            f.unapplyEffects();
            Registry.effects.remove(f);
        }
    }
}
But in other cases this is not practical. I keep getting ConcurrentModificationExceptions in my drawProjectiles() method, even though it doesn't modify anything. I suppose the culprit is touching the screen, which creates a new Projectile object and adds it to Registry.proj while the draw method is still iterating.
I can't very well do a retry loop with the draw method, or it will re-draw some of the objects. So now I'm forced to find a new solution. Is there a more stable way of accomplishing what I'm doing?
Oh, and part 2 of my question: many people suggest using ListIterators (as I have been), but I don't understand: if I call ListIterator.remove(), does it remove that object from the ArrayList it's iterating through, or just from the Iterator itself?
Top line, three recommendations:
Don't do the "wrap an exception in a loop" thing. Exceptions are for exceptional conditions, not control flow. (Effective Java #57 or Exceptions and Control Flow or Example of "using exceptions for control flow")
If you're going to use a Registry object, expose thread-safe behavioral, not accessor methods on that object and contain the concurrency reasoning within that single class. Your life will get better. No exposing collections in public fields. (ew, and why are those fields static?)
To solve the actual concurrency issues, do one of the following:
Use synchronized collections (potential performance hit; see the sketch just after this list)
Use concurrent collections (sometimes complicated logic, but probably efficient)
Use snapshots (probably with synchronized or a ReadWriteLock under the covers)
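For the first option, note that a synchronized collection only makes individual calls atomic; iteration still needs manual locking. A minimal sketch (draw(...) is a hypothetical consumer):

List<Effect> effects = Collections.synchronizedList(new ArrayList<Effect>());
// Individual add/remove calls are thread-safe, but iteration is not,
// so the iteration must hold the list's own lock:
synchronized (effects) {
    for (Effect f : effects) {
        draw(f);
    }
}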
Part 1 of your question
You should use a concurrent data structure for the multi-threaded scenario, or use a synchronizer and make a defensive copy. Probably directly exposing the collections as public fields is wrong: your registry should expose thread-safe behavioral accessors to those collections. For instance, maybe you want a Registry.safeRemoveEffectBySource(String src) method. Keep the threading specifics internal to the registry, which seems to be the "owner" of this aggregate information in your design.
Since you probably don't really need List semantics, I suggest replacing these with ConcurrentHashMaps wrapped into Set using Collections.newSetFromMap().
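For illustration, a minimal sketch of that replacement, reusing the asker's field names:

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class Registry {
    // A concurrent Set backed by a ConcurrentHashMap: iteration is weakly
    // consistent and never throws ConcurrentModificationException.
    public static final Set<Effect> effects =
            Collections.newSetFromMap(new ConcurrentHashMap<Effect, Boolean>());
}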
Your draw() method could either a) use a Registry.getEffectsSnapshot() method that returns a snapshot of the set; or b) use an Iterable<Effect> Registry.getEffects() method that returns a safe iterable version (maybe just backed by the ConcurrentHashMap, which won't throw CME under any circumstances). I think (b) is preferable here, as long as the draw loop doesn't need to modify the collection. This provides a very weak synchronization guarantee between the mutator thread(s) and the draw() thread, but assuming the draw() thread runs often enough, missing an update or something probably isn't a big deal.
Part 2 of your question
As another answer notes, in the single-thread case, you should just make sure you use the Iterator.remove() to remove the item, but again, you should wrap this logic inside the Registry class if at all possible. In some cases, you'll need to lock a collection, iterate over it collecting some aggregate information, and make structural modifications after the iteration completes. You ask if the remove() method just removes it from the Iterator or from the backing collection... see the API contract for Iterator.remove() which tells you it removes the object from the underlying collection. Also see this SO question.
You cannot directly remove an item from a collection while you are still iterating over it, otherwise you will get a ConcurrentModificationException.
The solution is, as you hint, to call the remove method on the Iterator instead. This will remove it from the underlying collection as well, but it will do it in such a way that the Iterator knows what's going on and so doesn't throw an exception when it finds the collection has been modified.
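Concretely, the asker's removal method rewritten with that fix; only the removal line changes. (This addresses the single-threaded case; cross-thread modification still needs one of the approaches above.)

private void removeEffectsWithSource(String src) {
    Iterator<Effect> it = Registry.effects.iterator();
    while (it.hasNext()) {
        Effect f = it.next();
        if (f.Source.equals(src)) {
            f.unapplyEffects();
            it.remove(); // removes f from the underlying list without a CME
        }
    }
}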

Is there any disadvantage of writing a long constructor?

Does it affect the time in loading the application?
or any other issues in doing so?
The question is vague on what "long" means. Here are some possible interpretations:
Interpretation #1: The constructor has many parameters
Constructors with many parameters can lead to poor readability, and better alternatives exist.
Here's a quote from Effective Java 2nd Edition, Item 2: Consider a builder pattern when faced with many constructor parameters:
Traditionally, programmers have used the telescoping constructor pattern, in which you provide a constructor with only the required parameters, another with a single optional parameter, a third with two optional parameters, and so on...
The telescoping constructor pattern is essentially something like this:
public class Telescope {
    final String name;
    final int levels;
    final boolean isAdjustable;

    public Telescope(String name) {
        this(name, 5);
    }
    public Telescope(String name, int levels) {
        this(name, levels, false);
    }
    public Telescope(String name, int levels, boolean isAdjustable) {
        this.name = name;
        this.levels = levels;
        this.isAdjustable = isAdjustable;
    }
}
And now you can do any of the following:
new Telescope("X/1999");
new Telescope("X/1999", 13);
new Telescope("X/1999", 13, true);
You can't, however, currently set only the name and isAdjustable while leaving levels at its default. You can provide more constructor overloads, but obviously their number would explode as the number of parameters grows, and you may even have multiple boolean and int arguments, which would really make a mess of things.
As you can see, this isn't a pleasant pattern to write, and even less pleasant to use (What does "true" mean here? What's 13?).
Bloch recommends using a builder pattern, which would allow you to write something like this instead:
Telescope telly = new Telescope.Builder("X/1999").setAdjustable(true).build();
Note that now the parameters are named, and you can set them in any order you want, and you can skip the ones that you want to keep at default values. This is certainly much better than telescoping constructors, especially when there's a huge number of parameters that belong to many of the same types.
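For completeness, a minimal sketch of what such a builder could look like (this particular implementation is illustrative, not Bloch's; the fields follow the telescoping example above):

public class Telescope {
    private final String name;
    private final int levels;
    private final boolean isAdjustable;

    private Telescope(Builder builder) {
        this.name = builder.name;
        this.levels = builder.levels;
        this.isAdjustable = builder.isAdjustable;
    }

    public static class Builder {
        private final String name;        // required
        private int levels = 5;           // optional, with a default
        private boolean isAdjustable = false;

        public Builder(String name) {
            this.name = name;
        }
        public Builder setLevels(int levels) {
            this.levels = levels;
            return this;
        }
        public Builder setAdjustable(boolean isAdjustable) {
            this.isAdjustable = isAdjustable;
            return this;
        }
        public Telescope build() {
            return new Telescope(this);
        }
    }
}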
See also
Wikipedia/Builder pattern
Effective Java 2nd Edition, Item 2: Consider a builder pattern when faced with many constructor parameters (excerpt online)
Related questions
When would you use the Builder Pattern?
Is this a well known design pattern? What is its name?
Interpretation #2: The constructor does a lot of work that costs time
If the work must be done at construction time, then doing it in the constructor or in a helper method doesn't really make too much of a difference. When a constructor delegates work to a helper method, however, make sure that it's not overridable, because that could lead to a lot of problems.
Here's a quote from Effective Java 2nd Edition, Item 17: Design and document for inheritance, or else prohibit it:
There are a few more restrictions that a class must obey to allow inheritance. Constructors must not invoke overridable methods, directly or indirectly. If you violate this rule, program failure will result. The superclass constructor runs before the subclass constructor, so the overriding method in the subclass will be invoked before the subclass constructor has run. If the overriding method depends on any initialization performed by the subclass constructor, the method will not behave as expected.
Here's an example to illustrate:
public class ConstructorCallsOverride {
    public static void main(String[] args) {

        abstract class Base {
            Base() {
                overrideMe();
            }
            abstract void overrideMe();
        }

        class Child extends Base {
            final int x;
            Child(int x) {
                this.x = x;
            }
            @Override
            void overrideMe() {
                System.out.println(x);
            }
        }

        new Child(42); // prints "0"
    }
}
Here, when the Base constructor calls overrideMe, Child has not yet finished initializing the final int x, and the method sees the field's default value of 0 instead of 42. This will almost certainly lead to bugs and errors.
Interpretation #3: The constructor does a lot of work that can be deferred
The construction of an object can be made faster when some work is deferred to when it's actually needed; this is called lazy initialization. As an example, when a String is constructed, it does not actually compute its hash code. It only does it when the hash code is first required, and then it will cache it (since strings are immutable, this value will not change).
However, consider Effective Java 2nd Edition, Item 71: Use lazy initialization judiciously. Lazy initialization can lead to subtle bugs, and it doesn't always yield improved performance that justifies the added complexity. Do not prematurely optimize.
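To make the idea concrete, a minimal single-threaded sketch of lazy initialization; Report and computeSummary() are made-up names for illustration:

public class Report {
    private String summary; // null until first requested

    public String getSummary() {
        if (summary == null) {
            summary = computeSummary(); // computed on first use, then cached
        }
        return summary;
    }

    private String computeSummary() {
        return "expensive result"; // stands in for the costly work
    }
    // Note: not thread-safe as written; Item 71 discusses safe variants.
}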
Constructors are a little special in that an unhandled exception in a constructor may have weird side effects. Without seeing your code, I would assume that a long constructor increases the risk of exceptions. I would keep the constructor as simple as needed and use other methods to do the rest, in order to provide better error handling.
The biggest disadvantage is probably the same as writing any other long function -- that it can get complex and difficult to understand.
The rest is going to vary. First of all, length and execution time don't necessarily correlate: you could have a single line (e.g., a function call) that takes several seconds to complete (e.g., connecting to a server), or lots of code that executes entirely within the CPU and finishes quickly.
Startup time would (obviously) only be affected by constructors that were/are invoked during startup. I haven't had an issue with this in any code I've written (at all recently anyway), but I've seen code that did. On some types of embedded systems (for one example) you really want to avoid creating and destroying objects during normal use, so you create almost everything statically during bootup. Once it's running, you can devote all the processor time to getting the real work done.
A constructor is just another function. It takes very long functions called many times to make a program slow, so if a constructor is only called once, it usually won't matter how much code is inside.
It affects the time it takes to construct that object, naturally, but no more than having an empty constructor and calling methods to do that work instead. It has no effect on the application load time.
In the case of a copy constructor (in C++), if we do not pass the argument by reference, then the call itself has to copy the argument, which invokes the copy constructor again, and that call copies its argument too, and so on: the recursion never ends, memory fills up, and the program fails with an error. If we pass by reference, no new object is created to hold the argument, and no recursion takes place. This is why a copy constructor's parameter must be a reference.
I would avoid doing anything in your constructor that isn't absolutely necessary. Initialize your variables in there, and try not to do much else. Additional functionality should reside in separate functions that you call only if you need to.