IEnumerable yield return combined with .AsParallel() - .net-4.0

I've written some code to try and describe my concern:
static void Main(string[] args)
{
    IEnumerable<decimal> marks = GetClassMarks();
    IEnumerable<Person> students = GetStudents();
    students.AsParallel().ForAll(p => GenerateClassReport(p, marks));
    Console.ReadKey();
}
GetClassMarks uses yield return to produce its values from my weird data source. Assume that GenerateClassReport basically computes marks.Sum()/marks.Count() to get the class average.
From what I understand, students.AsParallel().ForAll is a parallel foreach.
My worry is what is going to happen inside the GetClassMarks method.
Is it going to be enumerated once or many times?
What order is the enumeration going to happen in?
Do I need to do a .ToList() on marks to make sure it is only hit once?

Is it going to be enumerated once or many times?
Assuming that GenerateClassReport() enumerates marks once, then marks will be enumerated once for each element in students.
What order is the enumeration going to happen in?
Each thread will enumerate the collection in its default order, but several threads will do so concurrently. The concurrent enumeration order is generally unpredictable. Also, you should note that the number of threads is limited and variable, so most likely not all of the enumerations will occur concurrently.
Do I need to do a .ToList() on marks to make sure it is only hit once?
If GetClassMarks() is an iterator (i.e. it uses the yield construct), then its execution will be deferred and it will be called once for each time marks is enumerated (i.e. once for each element in students). If you use IEnumerable<decimal> marks = GetClassMarks().ToList() or if GetClassMarks() internally returns a concrete list or array, then GetClassMarks() will be executed immediately and the results will be stored and enumerated in each of the parallel threads without calling GetClassMarks() again.

1. If GetClassMarks is an iterator -- that is, if it uses yield internally -- then it is effectively a query that will be re-executed whenever you call marks.Sum(), marks.Count() etc.
2. It's almost impossible to predict the order of execution in a parallel query.
3. Yes. The following will ensure that GetClassMarks is only executed once. Subsequent calls to marks.Sum(), marks.Count() etc will use the concrete list rather than re-executing the GetClassMarks query.
List<decimal> marks = GetClassMarks().ToList();
Note that points 1 and 3 apply whether or not you're using AsParallel. The GetClassMarks query will be executed exactly the same number of times in either case (assuming that the rest of the code, except for the parallel aspects, is the same).

Is it going to be enumerated once or many times?
Just once.
What order is the enumeration going to happen in?
The iterator (function using yield) determines the order.
Do I need to do a .ToList() on marks to make sure it is only hit once?
No.
AsParallel only iterates through its input once, partitioning the input into blocks which are dispatched to worker threads.


Difference between stream.max(Comparator) and stream.collect(Collectors.maxBy(Comparator)) in Java

In Java Streams, what is the difference between stream.max(Comparator) and stream.collect(Collectors.maxBy(Comparator)) in terms of performance? Both will fetch the max based on the comparator being passed. If this is the case, why do we need the additional step of collecting using the collect method? When should we choose the former vs the latter? What use cases is each one suited for?
They do the same thing, and share the same code.
why do we need the additional step of collecting using the collect method?
You don't. Use max() if that's what you want to do. But there are cases where a Collector can be handy. For example:
Optional<Foo> result = stream.collect(createCollector());
where createCollector() would return a collector based on some condition, which could be maxBy, minBy, or something else.
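As a rough sketch of that idea: the createCollector below is a hypothetical helper (its boolean parameter and the Integer element type are assumptions made for illustration, not something from the question) that picks between maxBy and minBy at runtime, which is something Stream.max alone cannot express.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CollectorChoice {
    // Hypothetical helper: chooses the collector at runtime based on a flag.
    public static Collector<Integer, ?, Optional<Integer>> createCollector(boolean wantMax) {
        Comparator<Integer> order = Comparator.naturalOrder();
        if (wantMax) {
            return Collectors.maxBy(order);
        }
        return Collectors.minBy(order);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 9, 4);
        // The calling code stays the same whichever collector is returned.
        Optional<Integer> largest = values.stream().collect(createCollector(true));
        Optional<Integer> smallest = values.stream().collect(createCollector(false));
        System.out.println(largest.get() + " " + smallest.get());
    }
}
```

The point is only that the reduction becomes a value you can pass around; for a fixed "give me the max" call, stream.max(comparator) stays the simpler choice.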
In general, you shouldn't care too much about the small performance differences that might exist between two methods that do the same thing, and have a huge chance of being implemented the same way. Instead, you should make your code as clear and readable as possible.
There is a relevant quote in Effective Java 3rd Edition, page 214:
The collectors returned by the counting method are intended only for use as downstream collectors. The same functionality is available directly on Stream, via the count method, so there is never a reason to say collect(counting()). There are fifteen more Collectors with this property.
Given that maxBy is duplicated by Stream.max, it is presumably one of these sixteen methods.
Shortly after, same page, it goes on to justify the dual existence:
From a design perspective, these collectors represent an attempt to partially duplicate the functionality of streams in collectors so that downstream collectors can act as "ministreams".
Personally, I find this edict and explanation a bit unsatisfying: it says that it wasn't the intent for these 16 collectors to be used like this, but not why they shouldn't.
I suppose that the methods directly on stream are able to be implemented in specialized ways which could be more efficient than the general collectors.
According to the Java documentation, maxBy and minBy in the Collectors class are defined as follows:
static <T> Collector<T,?,Optional<T>> maxBy(Comparator<? super T> comparator)
Returns a Collector that produces the maximal element according to a given Comparator, described as an Optional<T>.
static <T> Collector<T,?,Optional<T>> minBy(Comparator<? super T> comparator)
Returns a Collector that produces the minimal element according to a given Comparator, described as an Optional<T>.
whereas max() and min() in Stream return the Optional<T> directly.
Every stream pipeline operation is either a terminal or a non-terminal (intermediate) operation.
So one thing is clear from the documentation: the max() and min() provided by Stream are terminal operations that return Optional<T>,
but maxBy() and minBy() produce a Collector, so they can be passed to collect() or combined with other collectors.
They both use BinaryOperator.maxBy(comparator) and do a reducing operation to the elements (even though the implementation of how it is reduced is slightly different). Hence there are no changes in the output.
If you need to find the max among all the stream elements, I suggest using Stream.max because the code would look neat and also you do not really need to create a collector in this case.
But there are scenarios where Collectors.maxBy needs to be used. Assume that you need to group your elements and find the max in each group. In such scenarios you cannot use Stream.max. Here you need to use Collectors.groupingBy(mapper, Collectors.maxBy(...)). Similarly, you could use it with partitioningBy and other similar methods where you need a collector.
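A minimal sketch of the two cases, assuming a list of strings grouped by length (the data and the maxWordPerLength name are made up for illustration): Stream.max handles the whole-stream maximum, while the per-group maximum needs maxBy as a downstream collector.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class MaxPerGroup {
    // Groups words by length and keeps the alphabetically last word of each group.
    public static Map<Integer, Optional<String>> maxWordPerLength(List<String> words) {
        return words.stream().collect(
            Collectors.groupingBy(
                String::length,
                Collectors.maxBy(Comparator.<String>naturalOrder())));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "plum", "kiwi", "yam");

        // Whole-stream maximum: Stream.max is enough, no collector needed.
        Optional<String> overall = words.stream().max(Comparator.<String>naturalOrder());
        System.out.println(overall.get());

        // Per-group maximum: maxBy is used as the downstream collector.
        System.out.println(maxWordPerLength(words));
    }
}
```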

In VB.Net, how can I create a method that waits for a variable number of asynchronous calls to complete, and then returns a result?

How can I code a method in VB.Net 2012 that waits for a variable number of asynchronous calls to complete, and only when all calls finish will then return a result?
I'm writing an app that retrieves a value from various web pages, and then returns the sum of those values. The number of values to retrieve will be determined by the user at runtime. As web retrieval is asynchronous by nature, I'm trying to make the app more efficient by coding it as such. I've just read about the keywords Async and Await, which seem perfect for the job. I also found this example of how to do it in C#: Run two async tasks in parallel and collect results in .NET 4.5.
But there are two issues with this example: 1) At first glance, I don't know how to make the same thing happen in VB.Net, and 2) I don't know how it could be redesigned to handle a variable number of called tasks.
Here's a pseudo-translation from the example, of what I hope to achieve:
Function GetSumOfValues(n as Integer)
    For i = 1 To n
        GetValueAsync<i>.Start()
    Next i
    Dim result = Await Task.WhenAll(GetValueAsync<?*>)
    Return result.Sum()
End Function
Note the question mark, as I'm not sure if it's possible to give WhenAll a "wildcarded" group of tasks. Perhaps with an object collection?
You can use this example of using tasks with Task.WaitAll.
Then, to collect the data from the asynchronous calls, you can use a shared method guarded by SyncLock, or one of the synchronized collections.

What is the difference between an Idempotent and a Deterministic function?

Are idempotent and deterministic functions both just functions that return the same result given the same inputs?
Or is there a distinction that I'm missing?
(And if there is a distinction, could you please help me understand what it is)
In more simple terms:
Pure deterministic function: The output is based entirely, and only, on the input values and nothing else: there is no other (hidden) input or state that it relies on to generate its output. There are no side-effects or other output.
Impure deterministic function: As with a pure deterministic function, the output is based entirely, and only, on the input values and nothing else: there is no other (hidden) input or state that it relies on to generate its output. However, there is other output (side-effects).
Idempotency: The practical definition is that you can safely call the same function multiple times without fear of negative side-effects. More formally: there are no changes of state between subsequent identical calls.
Idempotency does not imply determinacy (as a function can alter state on the first call while being idempotent on subsequent calls), but all pure deterministic functions are inherently idempotent (as there is no internal state to persist between calls). Impure deterministic functions are not necessarily idempotent.
| | Pure deterministic | Impure deterministic | Pure nondeterministic | Impure nondeterministic | Idempotent |
|---|---|---|---|---|---|
| Input | Only parameter arguments (incl. this) | Only parameter arguments (incl. this) | Parameter arguments and hidden state | Parameter arguments and hidden state | Any |
| Output | Only return value | Return value or side-effects | Only return value | Return value or side-effects | Any |
| Side-effects | None | Yes | None | Yes | After 1st call: Maybe. After 2nd call: None |
| SQL Example | UCASE | CREATE TABLE | GETDATE | DROP TABLE | |
| C# Example | String.IndexOf | | DateTime.Now | | Directory.CreateDirectory(String) (Footnote 1) |

Footnote 1 - Directory.CreateDirectory(String) is idempotent because if the directory already exists it doesn't raise an error; instead it returns a new DirectoryInfo instance pointing to the specified extant filesystem directory (instead of creating the filesystem directory first and then returning a new DirectoryInfo instance pointing to it). This is just like how Win32's CreateFile can be used to open an existing file.
A temporary note on non-scalar parameters, this, and mutating input arguments:
(I'm currently unsure how instance methods in OOP languages (with their hidden this parameter) can be categorized as pure/impure or deterministic or not - especially when it comes to mutating the target of this - so I've asked the experts in CS.SE to help me come to an answer - once I've got a satisfactory answer there I'll update this answer).
A note on Exceptions
Many (most?) programming languages today treat thrown exceptions as either a separate "kind" of return (i.e. "return to nearest catch") or as an explicit side-effect (often due to how that language's runtime works). However, as far as this answer is concerned, a given function's ability to throw an exception does not alter its pure/impure/deterministic/non-deterministic label - ditto idempotency (in fact: throwing is often how idempotency is implemented in the first place e.g. a function can avoid causing any side-effects simply by throwing right-before it makes those state changes - but alternatively it could simply return too.).
So, for our CS-theoretical purposes, if a given function can throw an exception then you can consider the exception as simply part of that function's output. What does matter is whether the exception is thrown deterministically or not (e.g. List<T>.get(int index) deterministically throws if index < 0).
Note that things are very different for functions that catch exceptions, however.
Determinacy of Pure Functions
For example, UCASE(val) in SQL and String.IndexOf in C#/.NET are both deterministic because the output depends only on the input. Note that in instance methods (such as IndexOf) the instance object (i.e. the hidden this parameter) counts as input, even though it's "hidden":
"foo".IndexOf("o") == 1 // first call
"foo".IndexOf("o") == 1 // second call
// the third call will also be == 1
Whereas NOW() in SQL and DateTime.UtcNow in C#/.NET are not deterministic because the output changes even though the input remains the same (note that property getters in .NET are equivalent to a method that accepts no parameters besides the implicit this parameter):
DateTime.UtcNow == 2016-10-27 18:10:01 // first call
DateTime.UtcNow == 2016-10-27 18:10:02 // second call
Idempotency
A good example in .NET is the Dispose() method: See Should IDisposable.Dispose() implementations be idempotent?
a Dispose method should be callable multiple times without throwing an exception.
So if a parent component X makes an initial call to foo.Dispose(), it will invoke the disposal operation and X can now consider foo to be disposed. Execution/control then passes to another component Y, which also tries to dispose of foo. After Y calls foo.Dispose(), it too can expect foo to be disposed (which it is), even though X already disposed it. This means Y does not need to check whether foo is already disposed, which saves the developer time and also eliminates bugs where calling Dispose a second time might throw an exception.
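The same pattern can be sketched in Java, where AutoCloseable.close() plays a role similar to Dispose(). The Resource class below is hypothetical and stands in for foo; the releaseCount field exists only so the example can show that the real cleanup runs once.

```java
public class IdempotentClose {
    // Hypothetical resource standing in for the `foo` object described above.
    public static final class Resource implements AutoCloseable {
        private boolean closed = false;
        private int releaseCount = 0; // how many times the real cleanup ran

        @Override
        public void close() {
            if (closed) {
                return; // already released: repeated calls do nothing and do not throw
            }
            closed = true;
            releaseCount++;
        }

        public boolean isClosed() {
            return closed;
        }

        public int releaseCount() {
            return releaseCount;
        }
    }

    public static void main(String[] args) {
        Resource foo = new Resource();
        foo.close(); // component X disposes foo
        foo.close(); // component Y disposes foo again, without checking first
        System.out.println(foo.isClosed() + " " + foo.releaseCount());
    }
}
```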
Another (general) example is in REST: the RFC for HTTP/1.1 states that GET, HEAD, PUT, and DELETE are idempotent, but POST is not ( https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html ):
Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property. Also, the methods OPTIONS and TRACE SHOULD NOT have side effects, and so are inherently idempotent.
So if you use DELETE then:
Client->Server: DELETE /foo/bar
// `foo/bar` is now deleted
Server->Client: 200 OK
Client->Server: DELETE /foo/bar
// `foo/bar` is already deleted, so there's nothing to do, but inform the client that foo/bar doesn't exist
Server->Client: 404 Not Found
// the client asks again:
Client->Server: DELETE /foo/bar
// `foo/bar` is already deleted, so there's nothing to do, but inform the client that foo/bar doesn't exist
Server->Client: 404 Not Found
So you see in the above example that DELETE is idempotent in that the state of the server did not change between the last two DELETE requests, but it is not deterministic because the server returned 200 for the first request but 404 for the second request.
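To make that concrete, here is a small in-memory sketch in Java. The IdempotentDelete class and its use of plain integers as status codes are assumptions made for illustration; it is not a real HTTP server.

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotentDelete {
    // In-memory stand-in for the server's set of existing resources.
    private final Set<String> resources = new HashSet<>();

    public void put(String path) {
        resources.add(path);
    }

    // Returns an HTTP-like status: 200 if the resource was removed by this call, 404 if it was absent.
    public int delete(String path) {
        return resources.remove(path) ? 200 : 404;
    }

    public boolean exists(String path) {
        return resources.contains(path);
    }

    public static void main(String[] args) {
        IdempotentDelete server = new IdempotentDelete();
        server.put("/foo/bar");
        int first = server.delete("/foo/bar");
        int second = server.delete("/foo/bar");
        int third = server.delete("/foo/bar");
        // The responses differ, but the server state after each call is the same.
        System.out.println(first + " " + second + " " + third + " " + server.exists("/foo/bar"));
    }
}
```

The return value differs between the first and later calls (so the call is not deterministic), while the stored state ends up identical after every call (so it is idempotent).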
A deterministic function is just a function in the mathematical sense. Given the same input, you always get the same output. On the other hand, an idempotent function is a function which satisfies the identity
f(f(x)) = f(x)
As a simple example. If UCase() is a function that converts a string to an upper case string, then clearly UCase(Ucase(s)) = UCase(s).
Idempotent functions are a subset of all functions.
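The identity f(f(x)) = f(x) can be checked directly for a given input. In this Java sketch, isIdempotentAt is a hypothetical helper name, and it only tests the identity at one sample value rather than proving it for all inputs.

```java
import java.util.function.UnaryOperator;

public class IdempotenceCheck {
    // True when applying f twice to x gives the same value as applying it once.
    public static <T> boolean isIdempotentAt(UnaryOperator<T> f, T x) {
        T once = f.apply(x);
        return f.apply(once).equals(once);
    }

    public static void main(String[] args) {
        UnaryOperator<String> upper = s -> s.toUpperCase();
        UnaryOperator<Integer> increment = n -> n + 1;

        // Upper-casing an already upper-cased string changes nothing.
        System.out.println(isIdempotentAt(upper, "Hello"));
        // Incrementing is deterministic, yet f(f(1)) = 3 differs from f(1) = 2.
        System.out.println(isIdempotentAt(increment, 1));
    }
}
```

The increment case shows that a function can be deterministic without being idempotent in this mathematical sense.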
A deterministic function will return the same result for the same inputs, regardless of how many times you call it.
An idempotent function may NOT return the same result (it will return the result in the same form but the value could be different, see http example below). It only guarantees that it will have no side effects. In other words it will not change anything.
For example, the GET verb is meant to be idempotent in HTTP protocol. If you call "~/employees/1" it will return the info for employee with ID of 1 in a specific format. It should never change anything but simply return the employee information. If you call it 10, 100 or so times, the returned format will always be the same. However, by no means can it be deterministic. Maybe if you call it the second time, the employee info has changed or perhaps the employee no longer even exists. But never should it have side effects or return the result in a different format.
My Opinion
Idempotent is a weird word but knowing the origin can be very helpful, idem meaning same and potent meaning power. In other words it means having the same power which clearly doesn't mean no side effects so not sure where that comes from. A classic example of There are only two hard things in computer science, cache invalidation and naming things. Why couldn't they just use read-only? Oh wait, they wanted to sound extra smart, perhaps? Perhaps like cyclomatic complexity?

Is it bad practice to use a function in an if statement?

I have a function in another class file that gets information about the battery. In a form I have the following code:
If BatteryClass.getStatus_Battery_Charging = True Then
It appears Visual Studio accepts this. However, would it be better if I used the following code, which also works:
Dim val As Boolean = BatteryClass.getStatus_Battery_Charging
If val = True Then
Is there a difference between these two methods?
What you're asking in general is which approach is idiomatic.
The technical rule is not to invoke a method multiple times - unless you're specifically checking a volatile value for change - when its result can be preserved in a locally scoped variable. That's not what you're asking, but it's important to understand that a result needed more than once should typically be bound to a variable.
That being said, it's better to write fewer lines of code from a maintenance perspective, as long as doing so improves the readability of your code. If you do have to declare a locally scoped variable to hold the return value of a method, make sure to give the variable a meaningful name.
Prefer this [idiomatic VB.NET] one liner:
If BatteryClass.getStatus_Battery_Charging Then
over this:
Dim isBatteryCharging As Boolean = BatteryClass.getStatus_Battery_Charging
If isBatteryCharging Then
Another point you should concern yourself with is methods which, when invoked, create a side effect that affects the state of your program. In most circumstances it is undesirable to have a side-effecting method invoked multiple times; when possible, rewrite such methods so that they do not cause any side effects. To limit the number of times a side effect occurs, use the same local variable scoping rule instead of performing multiple invocations of the method.
No real difference.
The second is better if you need the value again of course. It's also marginally easier to debug when you have a value stored in a variable.
Personally I tend to use the first because I'm an old school C programmer at heart!

Void-returning functions in UML sequence diagrams

I have a problem with the sequence model seen in the diagram below, specifically where the System object is creating a new Number. In this case, there is no need for a return message since the function SaveInput(n), both in System and Number, is the end of the line for that portion of the program, but unless I include one, the modeller reshaped my diagram into the other one I've uploaded here, and I can't see how to arrange the messages so that my program will work the way I intend without including the return message (the one without a name) from Number to System, since the functions SaveInput() both return a void.
How should void-returning functions be handled in sequence diagrams so that they behave correctly? I have opened the message properties and explicitly defined it as returning a void, but that hasn't helped.
When A calls operation b in B, the "return" arrow from B to A indicates that operation b has finished its execution. This doesn't mean that you have to return a value as part of the return message; it only means that the execution is done and you can continue with the next messages. Visually, most tools also use these return messages to manage the life bar of the object.