Difference between stream.max(Comparator) and stream.collect(Collectors.maxBy(Comparator)) in Java - api

In Java Streams, what is the difference between stream.max(Comparator) and stream.collect(Collectors.maxBy(Comparator)) in terms of performance? Both fetch the max based on the comparator being passed. If so, why do we need the additional step of collecting with the collect method? When should we choose the former over the latter? Which use cases suit each of them?

They do the same thing, and share the same code.
why do we need the additional step of collecting using the collect method?
You don't. Use max() if that's what you want to do. But there are cases where a Collector can be handy. For example:
Optional<Foo> result = stream.collect(createCollector());
where createCollector() would return a collector based on some condition, which could be maxBy, minBy, or something else.
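A minimal sketch of that idea, assuming a hypothetical createCollector(boolean) factory and Integer elements (neither is from the original answer):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ExtremeCollectorDemo {

    // Hypothetical factory: the caller decides at runtime which collector it gets.
    static Collector<Integer, ?, Optional<Integer>> createCollector(boolean wantLargest) {
        Comparator<Integer> order = Comparator.naturalOrder();
        return wantLargest ? Collectors.maxBy(order) : Collectors.minBy(order);
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(3, 9, 4);
        // The pipeline itself stays the same; only the collector varies.
        Optional<Integer> largest = values.stream().collect(createCollector(true));
        Optional<Integer> smallest = values.stream().collect(createCollector(false));
        System.out.println(largest + " " + smallest);
    }
}
```

With Stream.max() and Stream.min() the choice would have to be made by branching around the terminal call instead of passing a value.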
In general, you shouldn't care too much about the small performance differences that might exist between two methods that do the same thing, and have a huge chance of being implemented the same way. Instead, you should make your code as clear and readable as possible.

There is a relevant quote in Effective Java 3rd Edition, page 214:
The collectors returned by the counting method are intended only for use as downstream collectors. The same functionality is available directly on Stream, via the count method, so there is never a reason to say collect(counting()). There are fifteen more Collectors with this property.
Given that maxBy is duplicated by Stream.max, it is presumably one of these sixteen methods.
Shortly after, same page, it goes on to justify the dual existence:
From a design perspective, these collectors represent an attempt to partially duplicate the functionality of streams in collectors so that downstream collectors can act as "ministreams".
Personally, I find this edict and explanation a bit unsatisfying: it says that it wasn't the intent for these 16 collectors to be used like this, but not why they shouldn't.
I suppose that the methods directly on stream are able to be implemented in specialized ways which could be more efficient than the general collectors.

According to the Java documentation, the Collectors class defines maxBy and minBy as follows:
static <T> Collector<T,?,Optional<T>> maxBy(Comparator<? super T> comparator)
Returns a Collector that produces the maximal element according to a given Comparator, described as an Optional<T>.
static <T> Collector<T,?,Optional<T>> minBy(Comparator<? super T> comparator)
Returns a Collector that produces the minimal element according to a given Comparator, described as an Optional<T>.
whereas max() and min() in Stream return the Optional<T> directly.
Every stream pipeline operation is either an intermediate (non-terminal) operation or a terminal operation.
So from the Javadoc it is clear that Stream's max() and min() are terminal operations that return an Optional<T>,
while maxBy() and minBy() produce a Collector, so they can be passed to collect() or combined with other collectors.

They both use BinaryOperator.maxBy(comparator) and do a reducing operation to the elements (even though the implementation of how it is reduced is slightly different). Hence there are no changes in the output.
If you need to find the max among all the stream elements, I suggest using Stream.max because the code would look neat and also you do not really need to create a collector in this case.
But there are scenarios where Collectors.maxBy needs to be used. Assume that you need to group your elements and find the max in each group. In such scenarios you cannot use Stream.max. Here you need to use Collectors.groupingBy(mapper, Collectors.maxBy(...)). Similarly, you could use it with partitioningBy and other methods that take a downstream collector.
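The two cases can be put side by side. This is only a sketch; the word list and the grouping key (first letter) are made up for illustration:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class LongestWordDemo {

    // Whole-stream maximum: Stream.max expresses it directly.
    static Optional<String> longest(List<String> words) {
        return words.stream().max(Comparator.comparingInt(String::length));
    }

    // Per-group maximum: groupingBy needs a downstream Collector, so maxBy fits here.
    static Map<Character, Optional<String>> longestByInitial(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
                w -> w.charAt(0),
                Collectors.maxBy(Comparator.comparingInt(String::length))));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "fig", "banana");
        System.out.println(longest(words));
        System.out.println(longestByInitial(words));
    }
}
```

Note that each group's value is an Optional<String>, because maxBy always wraps its result, even though a group produced by groupingBy is never empty.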

Related

What are the benefits of using higher-order functions in Kotlin?

I am learning about higher-order functions (HOF) and lambdas in Kotlin.
I checked the Kotlin docs but didn't understand them. I found one benefit of HOF:
You can perform any operations on functions that are possible for other non-function values.
So, what are 'non-function values'?
And what are those 'operations'?
In a higher-order function, if a lambda takes two parameters and returns a value, can't we just use a function for it?
And what is a real scenario in which we have to return a function?
I have seen real programs in Kotlin, but I haven't seen any use of lambdas or HOF in them.
I want to understand why; otherwise many of these features would just go unused.
It's just a part of the Kotlin syntax that makes it more concise and understandable.
For example, try to imagine this code without using lambdas and HOF like map, filter etc:
val sum = listOfInts.filter{it % 2 == 0}.map{it*it}.sumOf{it % 10}
Type-safe builders are a cool thing too.
These functions are widely used in many libraries and frameworks, like Compose by Google. The first example that comes to mind is the state hoisting pattern.

Convert IVectorView to std::span

winrt::hstring is convertible to std::basic_string_view which comes in handy quite often. However, I am unable to do the same for IVectorView.
Looking at the interface of IVector, I imagine you would have to convert it back to the underlying implementation type so I tried
using impl_type = winrt::impl::vector_impl<float, std::vector<float>, winrt::impl::single_threaded_collection_base>;
winrt::Windows::Foundation::Collections::IVectorView vector_view = GetIVectorView();
auto& impl = *winrt::get_self<impl_type>(vector_view);
auto& container = impl.get_container();
which compiles but container.size() is 0 which is incorrect.
Edit:
vector_view was the result of the TensorFloat.GetAsVectorView method, so I can solve my problem by using the TensorFloat.CreateReference method to get an IMemoryBufferReference instead of an IVectorView.
However, I'd still like to know whether IVectorView can be converted to a std::span and, if not, why this is not allowed.
The IVector and IVectorView interfaces are specifically designed not to expose the underlying contiguous memory, probably to support cases where there is no underlying contiguous memory or the implementation language doesn't expose it as such (javascript??).
You could probably get back the implementation type when you know that cppwinrt provides the implementation; however, in my case there is no way of knowing the implementation type. In any case, it's inadvisable to do this.
In my case it would have been better if TensorFloat.GetAsVectorView did not exist, so that I would have found TensorFloat.CreateReference.
It would also be nice if cppwinrt made its collections range-v3 compatible. Until then, the most advisable thing to do is just copy to a std::vector.

Mono.Defer() vs Mono.create() vs Mono.just()?

Could someone help me to understand the difference between:
Mono.defer()
Mono.create()
Mono.just()
How to use it properly?
Mono.just(value) is the most primitive - once you have a value you can wrap it into a Mono and subscribers down the line will get it.
Mono.defer(monoSupplier) lets you provide the whole expression that supplies the resulting Mono instance. The evaluation of this expression is deferred until somebody subscribes. Inside of this expression you can additionally use control structures like Mono.error(throwable) to signal an error condition (you cannot do this with Mono.just).
Mono.create(monoSinkConsumer) is the most advanced method that gives you the full control over the emitted values. Instead of the need to return Mono instance from the callback (as in Mono.defer), you get control over the MonoSink<T> that lets you emit values through MonoSink.success(), MonoSink.success(value), MonoSink.error(throwable) methods.
Reactor documentation contains a few good examples of possible Mono.create use cases: link to doc.
The general advice is to use the least powerful abstraction to do the job: Mono.just -> Mono.defer -> Mono.create.
Although in general I agree with (and praise) @IlyaZinkovich's answer, I would be careful with the advice
The general advice is to use the least powerful abstraction to do the job: Mono.just -> Mono.defer -> Mono.create.
In the reactive approach, especially if we are beginners, it's very easy to overlook which the "least powerful abstraction" actually is. I am not saying anything different from @IlyaZinkovich, just highlighting one detailed aspect.
Here is one specific use case where the more powerful abstraction Mono.defer() is preferable to Mono.just(), which might not be visible at first glance.
See also:
https://stackoverflow.com/a/54412779/2886891
https://stackoverflow.com/a/57877616/2886891
We use switchIfEmpty() as a subscription-time branching:
// First ask provider1
provider1.provide1(someData)
// If provider1 did not provide the result, ask the fallback provider provider2
.switchIfEmpty(provider2.provide2(someData))
public Mono<MyResponse> provide2(MyRequest someData) {
    // The Mono assembly is needed only in some corner cases
    // but in fact it is always happening
    return Mono.just(someData)
        // expensive data processing which might even fail at assembly time
        .map(...)
        .map(...)
        ...
}
provider2.provide2() should only receive someData when provider1.provide1() does not return any result, and/or assembling the Mono returned by provider2.provide2() is expensive and may even fail when called on the wrong data.
In this case defer() is preferable, even if it might not be obvious at first glance:
provider1.provide1(someData)
// ONLY IF provider1 did not provide the result, assemble another Mono with provider2.provide()
.switchIfEmpty(Mono.defer(() -> provider2.provide2(someData)))
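The underlying issue is ordinary Java argument evaluation, so it can be reproduced without Reactor. The sketch below is only an analogy: it uses the JDK's Optional as a stand-in for Mono, with orElse playing the role of switchIfEmpty(provider2.provide2(...)) and orElseGet playing the role of switchIfEmpty(Mono.defer(...)). The counter and method names are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class EagerVsDeferredDemo {

    // Counts how often the (hypothetical) expensive fallback is actually built.
    static final AtomicInteger fallbackBuilds = new AtomicInteger();

    static String expensiveFallback() {
        fallbackBuilds.incrementAndGet();
        return "from provider2";
    }

    // Eager: the argument is evaluated before orElse is even called,
    // whether or not the primary value is present.
    static String eager(Optional<String> primary) {
        return primary.orElse(expensiveFallback());
    }

    // Deferred: the supplier is invoked only if the primary value is absent.
    static String deferred(Optional<String> primary) {
        return primary.orElseGet(EagerVsDeferredDemo::expensiveFallback);
    }

    public static void main(String[] args) {
        Optional<String> primary = Optional.of("from provider1");
        System.out.println(eager(primary) + ", builds=" + fallbackBuilds.get());
        System.out.println(deferred(primary) + ", builds=" + fallbackBuilds.get());
    }
}
```

Both calls return the primary value, but only the eager variant pays for building the fallback it never uses.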

needs for synchronous programming

EDIT: This question was poorly worded. What I really wanted to ask was:
Is there anything that can't be written in OO languages (with support for closures) using continuation-passing style?
You can google what CPS means, or just stick with the definition of a function/method that never returns anything and always pushes data somewhere using a passed callback.
And years after the original question, I can even answer it myself: there's nothing like that. Moreover, it's actually a very good OO principle called Tell, Don't Ask.
function getName(){
return this.name;
}
console.log(xyz.getName())
vs.
function pushNameTo(callback){
callback(this.name);
}
xyz.pushNameTo(console.log)
Good, but this time it was named after how it does the thing. Let's name it after what it does and make it even more OO:
function renderOn(responseBuilder){
var b = responseBuilder;
//or just string, whatever, depending on your builder implementation
b.field("Name: ", this.name);
b.field("Age: ", this.age);
b.image("Profile photo", this.imageData);
}
person.renderOn(htmlBuilder);
The point here is that the object encapsulates not only its data but also its behavior, its spirit, its personality. Who else should be responsible for expressing a person's representation, if not the person itself?
Of course, this does not necessarily mean you should have HTML in your code; the builder serves this purpose. It can even generate XML or another data format for the actual UI-rendering layer. But it's always push instead of pull.
Nothing, of course. Consider: if you have a program that is completely sequential, you could simply insert it into some kind of wrapper, like document.onload(). Then the sequential program would be started asynchronously.
Going the other way around, if all you have is a synchronous language, you can always write the asynchronous case by having a table of pieces to be executed, and an inner loop that looks to see what's been enabled and takes it from the table to execute. In fact, this would look very much like the underlying runtime in which your JavaScript runs.
There are two types of programs -- imperative and functional.
Imperative programs are sequential -- one step after another. C++, Java, etc. are examples.
Functional programs may not be sequential. Most async patterns use "continuation-style" programming, which is a type of functional programming with imperative overtones.
JavaScript is an imperative language which has first-class functions, i.e. it also enables certain functional programming paradigms.
What you described in your question is "continuation-style" async programming. Notice that the meaning of a "continuation" is "the rest of the program after this line". Therefore, theoretically, every imperative program can be rewritten in "continuation" style (i.e. the first line with a continuation of the rest of the program starting from the second line, and so on and so forth). For example:
Statement #1
Statement #2
Statement #3
can be rewritten as:
do(Statement #1, function() {
    do(Statement #2, function() {
        Statement #3
    })
})
where the second parameter to do is the continuation of the statement.
Loops are trickier, but they can also be rewritten similarly -- essentially passing the loop body itself as the continuation.
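To make the loop case concrete, here is one possible sketch in Java (an OO language with closures, as in the question). The names squareCps, loopCps and Step are invented for this example; nothing here returns a value, every result is pushed to a callback.

```java
import java.util.function.Consumer;

class CpsDemo {

    // One loop iteration in CPS: it receives the index and "the rest of the loop".
    interface Step {
        void run(int i, Runnable next);
    }

    // CPS version of `return x * x`: the result is handed to the continuation.
    static void squareCps(int x, Consumer<Integer> next) {
        next.accept(x * x);
    }

    // CPS version of `for (i = start; i < end; i++) body(i)`, followed by `done`.
    static void loopCps(int i, int end, Step body, Runnable done) {
        if (i >= end) {
            done.run(); // loop finished: continue with whatever comes after it
            return;
        }
        // The continuation of this iteration is the remainder of the loop.
        body.run(i, () -> loopCps(i + 1, end, body, done));
    }

    // Sums the squares of 0..n-1 without ever returning a value.
    static void sumOfSquaresCps(int n, Consumer<Integer> done) {
        int[] total = {0}; // mutable cell so the lambdas can update it
        loopCps(0, n,
                (i, next) -> squareCps(i, sq -> { total[0] += sq; next.run(); }),
                () -> done.accept(total[0]));
    }

    public static void main(String[] args) {
        sumOfSquaresCps(4, System.out::println); // prints 14
    }
}
```

Because Java does not eliminate tail calls, each iteration deepens the call stack, so this form only suits small loops unless the continuations are scheduled through an event loop or trampoline.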

Most appropriate data structure for dynamic languages field access

I'm implementing a dynamic language that will compile to C#, and it's implementing its own reflection API (.NET's is too slow, and the DLR is limited only to more recent and resourceful implementations).
For this, I've implemented a simple .GetField(string f) and .SetField(string f, object val) interface. Until recently, the implementation just switches over all possible field string values and makes the corresponding action.
Also, this dynamic language has the possibility to define anonymous objects. For those anonymous objects, at first, I had implemented a simple hash algorithm.
By now, I am looking for ways to optimize the dynamic parts of the language, and I have come across the fact that a hash algorithm for anonymous objects would be overkill. This is because the objects are usually small. I'd say the objects contain 2 or 3 fields, normally. Very rarely, they would contain more than 15 fields. It would take more time to actually hash the string and perform the lookup than if I would test for equality between them all. (This is not tested, just theoretical).
The first thing I did was to -- at compile-time -- create a red-black tree for each anonymous object declaration and have it laid onto an array so that the object can look for it in a very optimized way.
I am still undecided, though, whether that's the best way to do this. I could go for a perfect hashing function. Even more radically, I'm thinking about dropping the need for strings and actually working with a struct of 2 longs.
Those two longs will each be encoded to support 10 chars (A-Za-z0-9_), which is a good estimate of the length of most field names. For longer field names, a special (slower) function that receives a string will also be provided.
The result will be that strings will be inlined (not references), and their comparisons will be as cheap as a long comparison.
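One way the encoding could work, shown for a single long: the alphabet has 63 symbols, so with one code reserved for "no character" each symbol fits in 6 bits, and 10 symbols take 60 of the 64 available bits. The alphabet order, the 6-bit layout and the names pack/unpack are assumptions made for this sketch, not details from the question; it is written in Java, but the same bit arithmetic applies to a C# long.

```java
class PackedNameDemo {

    // 63 symbols; code 0 is reserved, so codes run from 1 to 63 (6 bits each).
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_";

    static final int MAX_CHARS = 10; // 10 * 6 = 60 bits, fits in one long

    // Packs a short field name into a long. Since no character uses code 0,
    // names of different lengths can never produce the same value.
    static long pack(String name) {
        if (name.length() > MAX_CHARS) {
            throw new IllegalArgumentException("name too long: " + name);
        }
        long packed = 0L;
        for (int i = 0; i < name.length(); i++) {
            int code = ALPHABET.indexOf(name.charAt(i)) + 1;
            if (code == 0) {
                throw new IllegalArgumentException("unsupported character in: " + name);
            }
            packed = (packed << 6) | code;
        }
        return packed;
    }

    // Inverse of pack, e.g. for reflection or error messages.
    static String unpack(long packed) {
        StringBuilder sb = new StringBuilder();
        while (packed != 0) {
            int code = (int) (packed & 0x3F);
            sb.append(ALPHABET.charAt(code - 1));
            packed >>>= 6;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        long foo = pack("foo");
        System.out.println(foo == pack("foo"));   // field lookup is a long comparison
        System.out.println(unpack(foo));
    }
}
```

A second long would extend the same scheme to 20 characters, and names outside the alphabet or length limit would take the slower string-based path mentioned above.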
Anyway, it's a little hard to find good information about this kind of optimization, since it is normally considered at the VM level, not when compiling to a static language.
Does anyone have any thoughts or tips about the best data structure to handle dynamic calls?
Edit:
For now, I'm really going with the string as long representation and a linear binary tree lookup.
I don't know if this is helpful, but I'll chuck it out in case;
If this is compiling to C#, do you know the complete list of fields at compile time? So as an idea, if your code reads
// dynamic
myObject.foo = "some value";
myObject.bar = 32;
then during the parse, your symbol table can build an int for each field name;
// parsing code
symbols[0] == "foo"
symbols[1] == "bar"
then generate code using arrays or lists;
// generated c#
runtimeObject[0] = "some value"; // assign myobject.foo
runtimeObject[1] = 32; // assign myobject.bar
and build up reflection as a separate array;
runtimeObject.FieldNames[0] == "foo"; // Dictionary<int, string>
runtimeObject.FieldIds["foo"] == 0; // Dictionary<string, int>
As I say, thrown out in the hope it'll be useful. No idea if it will!
Since you are likely to be using the same field and method names repeatedly, something like string interning would work well to quickly generate keys for your hash tables. It would also make string equality comparisons constant-time.
For such a small data set (expected upper bound of 15) I think almost any hashing will be more expensive than a tree or even a list lookup, but that really depends on your hashing algorithm.
If you want to use a dictionary/hash then you'll need to make sure the objects you use for the key return a hash code quickly (perhaps a single constant hash code that's built once). If you can prevent collisions inside of an object (sounds pretty doable) then you'll gain the speed and scalability (well for any realistic object/class size) of a hash table.
Something that comes to mind is Ruby's symbols and message passing. I believe Ruby's symbols act as constants that are just memory references. So comparison is constant-time, they are very lightweight, and you can use symbols like variables (I'm a little hazy on this and don't have a Ruby interpreter on this machine). Ruby's method "calling" really turns into message passing. Something like obj.func(arg) turns into obj.send(:func, arg) (":func" is the symbol). I would imagine that the symbol makes looking up the message handler (as I'll call it) inside the object pretty efficient, since its hash code most likely doesn't need to be calculated as it would be for most objects.
Perhaps something similar could be done in .NET.