multiple writes on a writestream in node.js - file-io

I've been looking at the code of node-dirty and noticed that when writing a lot of data to a file, the original programmer has chosen to bunch the writes into several groups and issue writes of the groups one at a time, but they are all issued simultaneously as part of one loop, without waiting for any callbacks. I have three questions about this. I have a similar problem to solve.
Is this more efficient in some way? Should I be bundling writes too?
How should I choose the optimum bundle size? Why not just write one group?
If I sign up to the on('drain') event on the writestream, will it be emitted only once after all the simultaneously issued writes have completed? Or after each? (my guess is the former)
If the on('error') is emitted, will the ('drain') event also be emitted? Or are they mutually exclusive?
thanks

Is this more efficient in some way?
Should I be bundling writes too?
It's inefficient to make many small writes. sending a write command has overhead attached to it. So writing just 5 bytes instead of a 1000 is more expensive.
How should I choose the optimum bundle
size? Why not just write one group?
Optimum size sounds like a black art to me. I presume there are good reasons for not making it one big write. Probably to start writing earlier then later. It's slightly more efficient to start a bit earlier.
If I sign up to the on('drain') event
on the writestream, will it be emitted
only once after all the simultaneously
issued writes have completed? Or after
each? (my guess is the former)
Drain triggers when everything in the write queue has finished writing. So as long as you append to the write queue faster then it writes it, it should only trigger once. You'd need one hell of a system to pull of an edge-case like that.
If the on('error') is emitted, will
the ('drain') event also be emitted?
Or are they mutually exclusive?
Even if it is emitted it doesn't make sense to do error handling in 'drain'. If an error has occurred I would always assume that the entire writing operation has failed and not try to recover mid-write.

For 4. If the on('error') is emitted, will the ('drain') event also be emitted? Or are they mutually exclusive?
You are worried about it since you don't want to maintain state in your application right. So, maybe you could use a convenience function:
function not_if(proc, veto_inner) {
var vetoed = false;
return {
proc: function() {
if (!vetoed) { return proc.apply(null, arguments); }
},
vetoer: function() {
if (!vetoed) {
vetoed = true;
veto_inner.apply(null, arguments);
}
};
}
Now, you can set the 'error' handler to vetoer and the 'drain' handler to 'proc' and not wirry about 'drain' being called after 'error' is called.

Related

Proper logging in reactive application - WebFlux

last time I am thinking about proper using logger in our applications.
For example, I have a controller which returns a stream of users but in the log, I see the "Fetch Users" log is being logged by another thread than the thread on the processing pipeline but is it a good approach?
#Slf4j
class AwesomeController {
#GetMapping(path = "/users")
public Flux<User> getUsers() {
log.info("Fetch users..");
return Flux.just(...)..subscribeOn(Schedulers.newParallel("my-custom"));
}
}
In this case, two threads are used and from my perspective, not a good option, but I can't find good practices with loggers in reactive applications. I think below approach is better because allocation memory is from processing thread but not from spring webflux thread which potential can be blocking but logger.
#GetMapping(path = "/users")
public Flux<User> getUsers() {
return Flux.defer(() -> {
return Mono.fromCallable(() -> {
log.info("Fetch users..");
.....
})
}).subscribeOn(Schedulers.newParallel("my-custom"))
}
The normal thing to do would be to configure the logger as asynchronous (this usually has to be explicit as per the comments, but all modern logging frameworks support it) and then just include it "normally" (either as a separate line as you have there, or in a side-effect method such as doOnNext() if you want it half way through the reactive chain.)
If you want to be sure that the logger's call isn't blocking, then use BlockHound to make sure (this is never a bad idea anyway.) But in any case, I can't see a use case for your second example there - that makes the code rather difficult to follow with no real advantage.
One final thing to watch out for - remember that if you include the logging statement separately as you have above, rather than as part of the reactive chain, then it'll execute when the method at calltime rather than subscription time. That may not matter in scenarios like this where the two happen near simultaneously, but would be rather confusing if (for example) you're returning a publisher which may be subscribed to multiple times - in that case, you'd only ever see the "Fetch users..." statement once, which isn't obvious when glancing through the code.

Difference between FireAndForget and Async behavior for publishing

Currently, we are using StackExchange.Redis and, as it does not provides "blocking pops", we are doing as suggested on the documentation:
db.ListLeftPush(key, newWork, flags: CommandFlags.FireAndForget);
sub.Publish(channel, "");
What is the difference from this to the following?
db.ListLeftPushAsync(key, newWork);
sub.Publish(channel, "");
We know the purpose of the commands, what we would like to know is if it has any difference internally or any risk of behaving differently? (Execution order etc.)
There's a main difference comparing fire and forget vs calling an async operation and not awaiting it.
Fire and forget means that not only you're not waiting for the result but you don't care if it works or not, while an async operation may throw an exception once it has ended if something goes wrong.
In the other hand, when you issue a fire and forget command, StackExchange.Redis doesn't try to retrieve the command result internally, which is better if you just want the so-called fire and forget behavior when issuing commands.
You may check this difference if you open ConnectionMultiplexer source code and you see how ExecuteAsyncImpl / ExecuteSyncImpl methods are implemented:
// For example, ExecuteAsyncImpl...
if (message.IsFireAndForget)
{
TryPushMessageToBridge(message, processor, null, ref server);
return CompletedTask<T>.Default(null); // F+F explicitly does not get async-state
}
else
{
var tcs = TaskSource.CreateDenyExecSync<T>(state);
var source = ResultBox<T>.Get(tcs);
if (!TryPushMessageToBridge(message, processor, source, ref server))
{
ThrowFailed(tcs, ExceptionFactory.NoConnectionAvailable(IncludeDetailInExceptions, message.Command, message, server));
}
return tcs.Task;
}
Answer to some OP comment
Hi. Thanks for your answer. We know the purpose of the commands, what
we would like to know is if it has any differrence internally or any
risk of behaving differently (execution order etc.)
Since the async operation won't be finished when you publish the message on the Redis channel, it can happen that you publish the message and the operation gets executed never. You lose a lot of control.
When you send a fire and forget command, it mightn't be executed too, but you know that the try was done before you publish the channel's message. Therefore, you shouldn't use async operations to implement fire and forget pattern when using StackExchange.Redis.
You may check this other related Q&A: Stackexchange.redis does fire and forget guarantees delivery?

what is the best use of BusyIndicator?

Just want to understand the usage of busy indicator does it alternate to timeout/putting wait etc.
for example have following line of code in mainfunct()
1. busy.show();
2. callcustom(); --asynch function without callback this is calling xmlhttpRequest etc.
3. busy.hide();
4. callanothercustom(); -- asynch function without callback
now question is
does line 4 will be executed only when busy.hide() completes and
line 3 only when line 2 is completed while without busy all (2,4)
will be called inside mainfunct() without waiting line 2 to
complete...
when busy.hide() is being called is there any timer setup which
holds until line 2 finishes and then hide and call line 4.
A busyIndicator's show and hide functions only control when to display the indicator and when to hide the indicator. They have no effect what-so-ever on anything else going on in your code.
In other words your code is basically:
callcustom()
callanothercustom()
In your customcode you can still make sure that callanothercustom will be called only when it's finished by adding your own callback... I assume this is AJAX inside of it, so: jQuery ajax success callback function definition
function callcustom() {
$.ajax({
url : 'example.com',
type: 'GET',
success : callanothercustom
})
}
And then in callanothercustom you can busy.hide...
Or any other combination of business logic - it really depends on what's going on in your code.
In my opinion, the only main use case for using a busy indicator is a Long running synchronous task that's blocking UI. Let's say greater than 2 seconds long. Hopefully, these are few are far between.
If you have asynch tasks, then the UI is not blocked and user can interact. If you are relying on the results for next steps as you imply above, then you must have a callback/promise to trigger the next steps. If you want the user to be blocked until the async task is complete, then treat it as a synch task and show the Busy.
Be aware, use of Busy Indicator is now seen mostly as an anti-pattern. Its basically yelling at your user "See how slow this app is!". Sometimes you cannot avoid dead time in your app, such as fetching a large block of data to generate a view, but there are many ways to mitigate this. An example may be to -- get something on the view as fast as possible (< 1 sec), and then backfill with larger data. Just always ask yourself WHY you need this Busy, and can I work out a way to avoid it, but not leave the user wondering what the app is doing.

What to do when users generate the same action several time waiting for download?

I am designing an IPhone application. User search something. We grab data from the net. Then we update the table.
THe pseudocode would be
[DoThisAtbackground ^{
LoadData ();
[DoThisAtForeground ^{
UpdateTableAndView();
}];
}];
What about if before the first search is done the user search something else.
What's the industry standard way to solve the issue?
Keep track which thread is still running and only update the table
when ALL threads have finished?
Update the view every time a thread finish?
How exactly we do this?
I suggest you take a look at the iOS Human Interface Guidelines. Apple thinks it's pretty important all application behave in about the same way, so they've written an extensive document about these kind of issues.
In the guidelines there are two things that are relevant to your question:
Make Search Quick and Rewarding: "When possible, also filter remote data while users type. Although filtering users' typing can result in a better search experience, be sure to inform them and give them an opportunity to opt out if the response time is likely to delay the results by more than a second or two."
Feedback: "Feedback acknowledges people’s actions and assures them that processing is occurring. People expect immediate feedback when they operate a control, and they appreciate status updates during lengthy operations."
Although there is of course a lot of nonsense in these guidelines, I think the above points are actually a good idea to follow. As a user, I expect something to happen when searching, and when you update the view every time a thread is finished, the user will see the fastest response. Yes, it might be results the user doesn't want, but something is happening! For example, take the Safari web browser in iOS: Google autocomplete displays results even when you're typing, and not just when you've finished entering your search query.
So I think it's best to go with your second option.
If you're performing the REST request for data to your remote server you can always cancel the request and start the new one without updating the table, which is a way to go. Requests that have the time to finish will update UI and the others won't. For example use ASIHTTPRequest
- (void)serverPerformDataRequestWithQuery:(NSString *)query andDelegate:(__weak id <ServerDelegate)delegate {
[currentRequest setFailedBlock:nil];
[currentRequest cancel];
currentRequest = [[ASIHTTPRequest alloc] initWithURL:kHOST];
[currentRequest startAsynchronous];
}
Let me know if you need an answer for the local SQLite databases too as it is much more complicated.
You could use NSOperationQueue to cancel all pending operations, but it still would not cancel the existing operation. You would still have to implement something to cancel the existing operation... which also works to early-abort the operations in the queue.
I usually prefer straight GCD, unless there are other benefits in my use cases that are a better fit for NSOperationQueue.
Also, if your loading has an external cancel mechanism, you want to cancel any pending I/O operations.
If the operations are independent, consider a concurrent queue, as it will allow the newer request to execute simultaneously as the other(s) are being canceled.
Also, if they are all I/O, consider if you can use dispatch_io instead of blocking a thread. As Monk would say, "You'll thank me later."
Consider something like this:
- (void)userRequestedNewSearch:(SearchInfo*)searchInfo {
// Assign this operation a new token, that uniquely identifies this operation.
uint32_t token = [self nextOperationToken];
// If your "loading" API has an external abort mechanism, you want to keep
// track of the in-flight I/O so any existing I/O operations can be canceled
// before dispatching new work.
dispatch_async(myQueue, ^{
// Try to load your data in small pieces, so you can exit as early as
// possible. If you have to do a monolithic load, that's OK, but this
// block will not exit until that stops.
while (! loadIsComplete) {
if ([self currentToken] != token) return;
// Load some data, set loadIsComplete when loading completes
}
dispatch_async(dispatch_get_main_queue(), ^{
// One last check before updating the UI...
if ([self currentToken] != token) return;
// Do your UI update operations
});
});
}
It will early-abort any operation that is not the last one submitted. If you used NSOperationQueue you could call cancelAllOperations but you would still need a similar mechanism to early-abort the one that is currently executing.

Ncqrs recreate the complete ReadModel

Using Ncqrs, is there a way to replay every single event ever happened (all aggregate types) and feed these through my denormalizers in order to recreate the whole read model from scratch?
Edit:
I though it's be good to provide a more specific use case. I'm building this inside a ASP.NET MVC application and using Entity Framework (Code first) for working with the read models. In order to speed up development (and because I'm lazy), I want to use a database initializer that recreates the database schemas once any read model changes. Then using the initializer's seed method to repopulate them.
There is unfortunately nothing built in to do this for you (though I haven't updated the version of ncqrs I use in quite a while so perhaps that's changed). It is also somewhat non-trivial to do it since it depends on exactly what you want to do.
The way I would do it (up to this point I have not had a need) would be to:
Call to the event store to get all relevant events
Depending on what you are doing this could be all events or just the events for one aggregate root, or a subset of events for one or more aggregate roots.
Re-create the read-model in memory from scratch (to save slow and unnecessary writing)
Store the re-created read-model in place of the existing one
Call to the event store one more time to get any events that may have been missed
Repeat until there are no new events being returned
One thing to note, if you are recreating the entire read-model database from scratch I would off-line the service temporarily or queue up new events until you finish.
Again there are different ways you could approach this problem, your architecture and scenarios will probably dictate how best to do it.
We use a MsSqlServerEventStore, to replay all the events I implemented the following code:
var myEventBus = NcqrsEnvironment.Get<IEventBus>();
if (myEventBus == null) throw new Exception("EventBus is not found in NcqesEnvironment");
var myEventStore = NcqrsEnvironment.Get<IEventStore>() as MsSqlServerEventStore;
if (myEventStore == null) throw new Exception("MsSqlServerEventStore is not found in NcqesEnvironment");
var myEvents = myEventStore.GetEventsAfter(GetFirstEventIdFromEventStore(), int.MaxValue);
myEventBus.Publish(myEvents);
This will push all the events on the eventbus and the denormalizers will process all the events. The function GetFirstEventIdFromEventStore just queries the eventstore and returns the first Id from the eventstore (where SequentialId = 1)
What I ended up doing is the following. At the service startup, before any commands are being processed, if the read model has changed, I throw it away and recreate it from scratch by processing all past events in my denormalizers. This is done in the database initializer's seed method.
This was a trivial task using the MS SQL event storage as there was a method for retrieving all events. However, I'm not sure about other event storages.