Which components of VTD-XML are thread-safe? - vtd-xml

Using VTD 2.11
Can VTDGen be initialized once and used by multiple threads?
For instance, I want to use it in a servlet, so VTDGen gets initialized once when the servlet is initialized, and then each incoming request parses whatever document is received.
Same for AutoPilot: I figure I can set my XPath once and then keep rebinding it for each new navigation?

Yes, VTDGen can be instantiated once and used many, many times. But because VTDGen's initialization cost is very low, instantiating multiple copies of it also incurs little cost.
AutoPilot is also designed to be reused; it is intimately linked to an XPath expression.
However, there are many circumstances in which it makes sense to assign a separate AutoPilot instance to each thread, with each of those instances referring to the same XPath.
As an example:
AutoPilot ap1 = new AutoPilot();
AutoPilot ap2 = new AutoPilot();
ap1.selectXPath("/a/b/c"); // assigned to thread 1
ap2.selectXPath("/a/b/c"); // assigned to thread 2
Although ap1 and ap2 select the same XPath, they are two distinct XPath objects and can be evaluated independently by two threads. That is better than trying to share a single AutoPilot between two threads, which leads to undesirable thread contention.
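VTD-XML is not part of the JDK, so the following is only a stdlib stand-in for the same "one evaluator per thread, same expression" idea, written against javax.xml.xpath rather than VTD-XML's AutoPilot. It assumes a ThreadLocal is an acceptable way to hand each thread its own instance; the class and method names (PerThreadXPath, selectC) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PerThreadXPath {
    // Each thread lazily compiles its own copy of the same expression, in the
    // spirit of "one AutoPilot per thread, all selecting /a/b/c".
    // (JDK XPathExpression objects are not thread-safe, so they are not shared.)
    private static final ThreadLocal<XPathExpression> EXPR = ThreadLocal.withInitial(() -> {
        try {
            return XPathFactory.newInstance().newXPath().compile("/a/b/c");
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    });

    // Evaluates /a/b/c against the given document using the calling thread's expression.
    static String selectC(String xml) {
        try {
            return EXPR.get().evaluate(new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String xml = "<a><b><c>hello</c></b></a>";
        Runnable task = () ->
                System.out.println(Thread.currentThread().getName() + ": " + selectC(xml));
        Thread t1 = new Thread(task, "thread1");
        Thread t2 = new Thread(task, "thread2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

The design choice mirrors ap1/ap2 above: the expression text is shared, the evaluator objects are not, so no locking is needed between threads.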

Related

Reactor - Stop source when first empty

I have a requirement like this.
Flux<Integer> s1 = .....;
s1.flatMap(value -> anotherSource.find(value));
I need a way to stop s1 when anotherSource.find gives me its first empty result. How can I do that?
Note:
One possible solution is to throw an error and then catch it to stop the stream:
anotherSource.find(value).switchIfEmpty(Mono.error(..))
I am looking for better solution than this.
You won't find a specific operator for this; you'll have to combine operators to achieve it. (Note that this doesn't make it a "hack" per se; reactive frameworks are generally intended to be used by combining basic operators to achieve your use case.)
I would agree that using an error to achieve this is far from ideal, though, as it potentially disrupts the flow of real errors in the reactive chain, so that should really be a last resort.
The approach I've generally taken in cases where I want the stream to stop based on an inner publisher is to materialise the inner stream, filter out the onComplete() signals and then re-add the onComplete() wherever appropriate (in this case, if it's empty.) You can then dematerialise the outer stream and it'll respond to the completed signal wherever you've injected it, stopping the stream:
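s1.flatMap(value ->
        anotherSource
            .find(value)
            .materialize()                       // turn every inner signal into a Signal object
            .filter(s -> !s.isOnComplete())      // drop the inner completion signals
            .defaultIfEmpty(Signal.complete()))  // an empty lookup becomes a single "complete" signal
    .dematerialize();                            // the first injected "complete" ends the outer stream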
This has the advantage of preserving any error signals, while also not requiring another object or special value.
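Reactor is a third-party library, so as a hedged, synchronous, JDK-only illustration of the intended semantics (not of Reactor's operators), here is the same "stop at the first empty lookup without raising an error" behavior using Stream.takeWhile (Java 9+). The lookup map and the names StopAtFirstEmpty and lookupUntilEmpty are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

class StopAtFirstEmpty {
    // Sequential analog of s1.flatMap(value -> anotherSource.find(value)) that
    // stops at the first empty lookup instead of signalling an error.
    static <T, R> List<R> lookupUntilEmpty(List<T> source, Function<T, Optional<R>> find) {
        return source.stream()
                .map(find)                      // one lookup per source value, evaluated lazily
                .takeWhile(Optional::isPresent) // the first empty result ends the stream
                .map(Optional::get)             // unwrap the values that were found
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for anotherSource: 3 has no entry.
        Map<Integer, String> anotherSource = Map.of(1, "one", 2, "two", 4, "four");
        List<String> found = lookupUntilEmpty(List.of(1, 2, 3, 4),
                v -> Optional.ofNullable(anotherSource.get(v)));
        System.out.println(found); // [one, two]
    }
}
```

Unlike this sequential sketch, Reactor's flatMap subscribes to inner publishers concurrently, so with asynchronous lookups the cut-off follows completion order rather than source order; concatMap is the ordered counterpart if that matters.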

Elm: avoiding a Maybe check each time

I am building a work-logging app which starts by showing a list of projects that I can select, and then when one is selected you get a collection of other buttons, to log data related to that selected project.
I decided to have a selected_project : Maybe Int in my model (projects are keyed off an integer id), which gets filled with Just 2 if you select project 2, for example.
The buttons that appear when a project is selected send messages like AddMinutes 10 (i.e. log 10 minutes of work to the selected project).
Obviously, the update function will receive one of these messages only if a project has been selected, but I still have to keep checking that selected_project is a Just p.
Is there any way to avoid this?
One idea I had was to have the buttons send a message that contains the project id, such as AddMinutes 2 10 (i.e. log 10 minutes of work to project 2). To some extent this works, but now I get duplication: the Just 2 in model.selected_project and the AddMinutes 2 ... message that the button emits.
Update
As Simon notes, the repeated check that model.selected_project is a Just p has its upside: the model stays relatively more decoupled from the UI. For example, there might be other UI ways to update the projects and you might not need to have first selected a project.
To avoid having to check the Maybe each time you need a function which puts you into a context wherein the value "wrapped" by the Maybe is available. That function is Maybe.map.
In your case, to handle the AddMinutes Int message you can simply call: Maybe.map (functionWhichAddsMinutes minutes) model.selected_project.
Clearly, there's a little bit more to it since you have to produce a model, but the point is you can use Maybe.map to perform an operation if the value is available in the Maybe. And to handle the Maybe.Nothing case, you can use Maybe.withDefault.
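The examples here are in Java rather than Elm, so the following is only a loose analog (an assumption, not Elm semantics): Optional.map plays the role of Maybe.map, and Optional.orElse the role of Maybe.withDefault. Model, addMinutes, and update are hypothetical names mirroring the question's model and functionWhichAddsMinutes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class MaybeMapSketch {
    // Hypothetical model: the selected project (if any) and minutes logged per project id.
    static final class Model {
        final Optional<Integer> selectedProject; // plays the role of Elm's Maybe Int
        final Map<Integer, Integer> minutes;

        Model(Optional<Integer> selectedProject, Map<Integer, Integer> minutes) {
            this.selectedProject = selectedProject;
            this.minutes = minutes;
        }
    }

    // Hypothetical stand-in for functionWhichAddsMinutes: returns a new model.
    static Model addMinutes(Model model, int projectId, int extra) {
        Map<Integer, Integer> updated = new HashMap<>(model.minutes);
        updated.merge(projectId, extra, Integer::sum);
        return new Model(model.selectedProject, updated);
    }

    // Handling "AddMinutes extra": map runs only when a project is selected,
    // and orElse (like Maybe.withDefault) returns the unchanged model otherwise.
    static Model update(Model model, int extra) {
        return model.selectedProject
                .map(id -> addMinutes(model, id, extra))
                .orElse(model);
    }

    public static void main(String[] args) {
        Model start = new Model(Optional.of(2), new HashMap<>());
        System.out.println(update(start, 10).minutes); // {2=10}
    }
}
```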
At the end of the day, is this any better than using a case expression? Maybe, maybe not (pun intended).
Personally, I have used the technique of providing the ID along with the message and I was satisfied with the result.

Erlang. Registering processes assignment

I am currently reading Programming Erlang, Second Edition: Software for a Concurrent World by Joe Armstrong, and I have the following assignment:
Write a function start(AnAtom, Fun) to register AnAtom as spawn(Fun). Make sure your program works correctly in the case when two parallel processes simultaneously evaluate start/2. In this case you must guarantee that one succeeds and the other fails.
I understand the first bit: I need to register the process running Fun under the name AnAtom. However, what does the second part want me to do?
If two processes call start/2 at the same time, then one of them must fail? Why? Given that AnAtom is different from any others (which will be done inside the body of start/2), why would I want one of the processes to fail?
From what I can understand so far we have:
A = spawn(Process1).
B = spawn(Process2).
A ! {self(), registerProcess}. % which should call start/2
B ! {self(), registerProcess}. % which should call start/2
What is the problem here? Two processes will evaluate start/2. Why fail one of them? I'm probably missing the logic here or what I understood so far is completely wrong. Can anybody explain this in easier terms so I can get my head around it?
I believe the exercise is asking you to think about what happens when two parallel processes evaluate start/2 using the SAME atom as the first parameter. When start(a, MyFunction) completes, there should be a spawned process (running MyFunction) registered under the name (atom) a. What happens if
start(cool, MyFun1) and
start(cool, MyFun2)
are both executed simultaneously? How do you guarantee that one succeeds and the other fails? Does this help?
EDIT: I think you are misunderstanding the process-registration part of the assignment. When start(name, MyFun) returns, calling whereis(name) from the shell should return the process identifier of the process that was created.
This is not about sending the process a message to give it a name; it is about registering the process you created under the name passed in as the first parameter to start/2.
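To make the "one succeeds, the other fails" requirement concrete, here is a hedged Java stand-in (not Erlang): a ConcurrentHashMap plays the process registry, and its atomic putIfAbsent plays the role that register/2 plays in Erlang, where registering an already-taken name fails with badarg. A naive "check with whereis, then register" has a window between the check and the registration in which both callers can pass the check. Registry, start, whereis, and raceForName are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class Registry {
    // Name -> worker, standing in for Erlang's process registry.
    private static final ConcurrentMap<String, Thread> NAMES = new ConcurrentHashMap<>();

    // Hypothetical analog of start(AnAtom, Fun): true if this caller got the name.
    static boolean start(String name, Runnable fun) {
        Thread worker = new Thread(fun, name);
        // putIfAbsent is atomic: among concurrent callers using the same name,
        // exactly one sees null (success); the rest see the winner's thread.
        if (NAMES.putIfAbsent(name, worker) != null) {
            return false; // lost the race: the unstarted worker is simply discarded
        }
        worker.start();
        return true;
    }

    // Analog of whereis/1: the registered worker, or null if none.
    static Thread whereis(String name) {
        return NAMES.get(name);
    }

    // Releases `callers` threads at once, all calling start(name, ...); returns the number of winners.
    static int raceForName(String name, int callers) {
        CountDownLatch go = new CountDownLatch(1);
        AtomicInteger wins = new AtomicInteger();
        Thread[] threads = new Thread[callers];
        for (int i = 0; i < callers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    go.await(); // line every caller up before releasing them together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (start(name, () -> { })) {
                    wins.incrementAndGet();
                }
            });
            threads[i].start();
        }
        go.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return wins.get();
    }

    public static void main(String[] args) {
        System.out.println(raceForName("cool", 2)); // prints 1
    }
}
```

Unlike Erlang, where a registered name is released automatically when its process exits, this sketch never unregisters a name.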

How to convert Greensock's CustomEase functions to be usable in CreateJS's Tween system?

I'm currently working on a project that does not include GSAP (Greensock's JS tweening library), but since it's super easy to create your own custom easing functions with its visual editor, I was wondering if there is a way to break down the desired ease function so that it can be reused in a CreateJS Tween?
Example:
var myEase = CustomEase.create("myCustomEase", [
{s:0,cp:0.413,e:0.672},{s:0.672,cp:0.931,e:1.036},
{s:1.036,cp:1.141,e:1.036},{s:1.036,cp:0.931,e:0.984},
{s:0.984,cp:1.03699,e:1.004},{s:1.004,cp:0.971,e:0.988},
{s:0.988,cp:1.00499,e:1}
]);
So that it turns it into something like:
var myEase = function(t, b, c, d) {
//Some magic algorithm performed on the 7 bezier/control points above...
}
(Here is what the graph would look like for this particular easing method.)
I took the time to port and optimize the original GSAP-based CustomEase class, but due to license restrictions / legal matters (basically a grizzly bear that I do not want to poke with a stick), posting the ported code would violate the license.
However, it's fair for my own use. Therefore, I believe it's only fair that I guide you and point you to the resources that made it possible.
The original code (not directly compatible with CreateJS) can be found here:
https://github.com/art0rz/gsap-customease/blob/master/CustomEase.js (looks like the author was also asked to take down the repo on github - sorry if the rest of this post makes no sense at all!)
Note that CreateJS's easing methods only take a "time ratio" value (not the time, start, change, and duration arguments that GSAP's easing method takes). That time ratio is really all you need, since it goes from 0.0 (your start value) to 1.0 (your end value).
With a little bit of effort, you can discard those parameters from the ease() method and trim down the final returned expression.
Optimizations:
I took a few extra steps to optimize the above code.
1) In the constructor, you can store segments.length once as this.length on the CustomEase instance, which cuts down on property lookups in the ease() method (where qty is set).
2) There are a few redundant calculations per segment that can be eliminated from the ease() method. For instance, the s.cp - s.s and s.e - s.s operations can be precalculated and stored in a couple of properties on each Segment (in its constructor).
3) Finally, I'm not sure why it was designed this way, but you can unwrap the function() {...}(); wrappers that return the constructor for each class. Perhaps they were used to trap the scope of some variables, but I don't see why a single wrapper couldn't have enclosed the entire thing instead of encapsulating each class separately.
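Since the ported code can't be shared, here is a hedged sketch of the math only, in Java, not GSAP's code. It assumes each {s, cp, e} triple describes the eased value as a quadratic Bezier (start, control point, end) and that the segments split the time axis into equal slices; both are assumptions about the format. It takes only a time ratio, as CreateJS expects, and applies optimizations 1 and 2 above. SegmentEase and fromQuestion are hypothetical names.

```java
class SegmentEase {
    // One quadratic Bezier segment of the eased value: start s, control point cp, end e.
    static final class Segment {
        final double s;
        final double cpMinusS; // precomputed once (optimization 2)
        final double eMinusS;  // precomputed once (optimization 2)

        Segment(double s, double cp, double e) {
            this.s = s;
            this.cpMinusS = cp - s;
            this.eMinusS = e - s;
        }
    }

    private final Segment[] segments;
    private final int length; // cached segment count (optimization 1)

    SegmentEase(Segment... segments) {
        this.segments = segments.clone();
        this.length = segments.length;
    }

    // Ratio-only ease: ratio in [0, 1] -> eased ratio (may overshoot 1 for bouncy curves).
    double ease(double ratio) {
        // Assumption: each segment covers an equal slice of the time axis.
        int i = (int) Math.floor(length * ratio);
        i = Math.max(0, Math.min(i, length - 1)); // keep ratio == 1.0 in the last segment
        double t = length * ratio - i;            // local Bezier parameter within segment i
        Segment seg = segments[i];
        // Quadratic Bezier: s + 2t(1 - t)(cp - s) + t^2(e - s)
        return seg.s + t * (2 * (1 - t) * seg.cpMinusS + t * seg.eMinusS);
    }

    // The seven segments from the CustomEase.create(...) call in the question.
    static SegmentEase fromQuestion() {
        return new SegmentEase(
                new Segment(0, 0.413, 0.672), new Segment(0.672, 0.931, 1.036),
                new Segment(1.036, 1.141, 1.036), new Segment(1.036, 0.931, 0.984),
                new Segment(0.984, 1.03699, 1.004), new Segment(1.004, 0.971, 0.988),
                new Segment(0.988, 1.00499, 1));
    }

    public static void main(String[] args) {
        SegmentEase ease = fromQuestion();
        for (double r = 0; r <= 1.0; r += 0.25) {
            System.out.printf("%.2f -> %.4f%n", r, ease.ease(r));
        }
    }
}
```

Ported back to JavaScript, ease(ratio) is the shape of function a CreateJS tween's ease option can call, since it needs only the ratio.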
Need more info? Leave a comment!

Is foreach the only way to consume a BlockingCollection<T> in C#?

I'm starting to work with TPL right now. I have seen a simple version of the producer/consumer model utilizing TPL in this video.
Here is the problem:
The following code:
BlockingCollection<Double> bc = new BlockingCollection<Double>(100);
IEnumerable<Double> d = bc.GetConsumingEnumerable();
returns an IEnumerable<Double> which can be iterated (and automatically consumed) using a foreach:
foreach (var item in d)
{
// do anything with item
// at the end of this foreach,
// will there be any items left in d or bc? Why?
}
My questions are:
If I get the IEnumerator<Double> dEnum = d.GetEnumerator() from d (to iterate over d with a while loop, for instance), would dEnum.MoveNext() consume the list as well? (My answer: I don't think so, because dEnum is not linked with d, if you know what I mean. So it would consume dEnum, but not d, nor even bc.)
May I loop through bc (or d) in a way other than the foreach loop, consuming the items? (The while loop runs much faster than the foreach loop, and I'm worried about performance issues for scientific computation problems.)
What exactly does consume mean in the BlockingCollection<T> type?
E.g., code:
IEnumerator<Double> dEnum = d.GetEnumerator();
while (dEnum.MoveNext())
{
// do the same with dEnum.Current as
// I would with item in the foreach above...
}
Thank you all in advance!
If I get the IEnumerator<Double> dEnum = d.GetEnumerator() from d (to iterate over d with a while loop, for instance), would dEnum.MoveNext() consume the list as well?
Absolutely. That's all that the foreach loop will do anyway.
May I loop through bc (or d) in a way other than the foreach loop, consuming the items? (The while loop runs much faster than the foreach loop, and I'm worried about performance issues for scientific computation problems.)
If your while loop is faster, that suggests you're doing something wrong. They should be exactly the same, except that the foreach loop also disposes of the iterator, which you should do too.
If you can post a short but complete program demonstrating this discrepancy, we can look at it in more detail.
An alternative is to use Take (and similar methods such as TryTake).
What exactly does consume mean in the BlockingCollection type?
"Remove the next item from the collection" effectively.
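As a hedged Java analog of "consume" (Java's BlockingQueue, not .NET's BlockingCollection<T>): take() blocks until an item is available and then removes it, much like BlockingCollection<T>.Take. BlockingQueue has no equivalent of CompleteAdding(), so this sketch assumes a NaN end-of-data marker that never appears as real data; ConsumeSketch and sumAll are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ConsumeSketch {
    // Producer fills a bounded queue; the consumer removes ("consumes") each item with take().
    // Assumption: NaN is a hypothetical end-of-data marker, never a real value.
    static double sumAll(int count) {
        BlockingQueue<Double> bc = new ArrayBlockingQueue<>(100); // bounded, like capacity 100 in the question
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    bc.put((double) i); // blocks while the queue is full
                }
                bc.put(Double.NaN); // end-of-data marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        double sum = 0;
        try {
            while (true) {
                double item = bc.take(); // blocks until an item exists, then removes it from the queue
                if (Double.isNaN(item)) {
                    break;
                }
                sum += item;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum; // every item was taken, so bc is empty at this point
    }

    public static void main(String[] args) {
        System.out.println(sumAll(1000)); // 500500.0
    }
}
```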
There is no "performance issue" with foreach. Using the enumerator directly is not likely to give you any measurable performance improvement over a plain foreach loop.
That being said, GetConsumingEnumerable() returns a standard IEnumerable<T>, so you can enumerate it any way you choose. Getting the IEnumerator<T> and enumerating through it directly will still work the same way.
Note that, if you don't want to use GetConsumingEnumerable(), you could just use ConcurrentQueue<T> directly. By default, BlockingCollection<T> wraps a ConcurrentQueue<T> and mainly adds blocking, optional bounding (such as the capacity of 100 in the question), and a simpler API (GetConsumingEnumerable()) that makes producer/consumer scenarios easier to write. Using a ConcurrentQueue<T> directly would be closer to using BlockingCollection<T> without the enumerable.