Cleanup files when using NServiceBus FileShareDataBus

My question is similar to this question, but I don't think the response there answers it at all.
To elaborate, I have the following code:
Configuration:
BusConfiguration busConfiguration = new BusConfiguration();
busConfiguration.EndpointName("Samples.DataBus.Sender");
busConfiguration.UseSerialization<JsonSerializer>();
busConfiguration.UseDataBus<FileShareDataBus>().BasePath(BasePath);
busConfiguration.UsePersistence<InMemoryPersistence>();
busConfiguration.EnableInstallers();
using (IBus bus = Bus.Create(busConfiguration).Start())
....
Message:
[TimeToBeReceived("00:01:00")]
public class MessageWithLargePayload : ICommand
{
    public string SomeProperty { get; set; }
    public DataBusProperty<byte[]> LargeBlob { get; set; }
}
This works fine: it creates the queues, sends messages to the queue, creates a file for the LargeBlob property and stores it in the base path, and the receiver takes the message and handles it.
My question is: is there any way to remove the created files (LargeBlob) after the message has been handled, taken out of the queue, or landed in the error queue?
The documentation clearly states that the files are not cleaned up, but I think this is messy behaviour. Can anyone help?

Is there any way to remove the files after the message has been handled or taken out of the queue or after it lands in the error queue?
- After the message has been handled
- After it has been taken out of the queue
- After it is in the error queue
I'm not really sure what you're after. You want to remove the files, but you're not sure when?
NServiceBus has no way to figure out when a file should be deleted. Perhaps you're deferring a message so the file can be processed later, or handing the task to another handler. If the file were removed, that other handler could no longer process it. So when to remove a file depends on your functional needs.
When the message is in the error queue, it is most likely that you want to try to process it again. Why else put the message in an error queue instead of just removing it altogether?
Besides that, the file system isn't transactional, so there's no way for any software to tell whether a message was processed correctly and its file can be deleted. And when the outbox is enabled in NServiceBus, the message is removed from the queuing storage before it has actually been processed; if the file were removed at that point, the message could no longer be processed either.
As you can tell, there are many scenarios where removing the file can pose a problem. The only one who actually knows when each file can be removed is you, the developer. You'll have to come up with a strategy for removing the files.
The sample has a class Program with a static field BasePath. Make it public so your handler can access it. Then in the handler you can obtain the file location like this:
public void Handle(MessageWithLargePayload message)
{
    var filename = Path.Combine(Program.BasePath, message.LargeBlob.Key);
    Console.WriteLine(filename);
}
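If your functional design allows it, the simplest strategy is to delete the file as the last step of the handler, once you know no other handler or deferred message still needs the blob. A minimal sketch building on the snippet above (my own suggestion, not an NServiceBus feature; ProcessBlob is a placeholder for whatever your handler actually does):
public void Handle(MessageWithLargePayload message)
{
    var filename = Path.Combine(Program.BasePath, message.LargeBlob.Key);

    // Do the actual work with the blob first; if the handler throws and the
    // message is retried, the file must still exist on the share.
    ProcessBlob(message.LargeBlob.Value);

    // Delete only as the very last step, and only if nothing else needs it.
    File.Delete(filename);
}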
UPDATE
Added some documentation about a possible cleanup strategy. We have plans for a really good solution, but it'll take time, so for now perhaps this can help.

I solved the file cleanup challenge by creating a service that removes the files after a configurable number of hours. I will say that with a large number of bus files, you will be more successful doing a rolling delete than trying to do it all once a day.
There are two ready-made options out on GitHub:
PowerShell script: https://gist.github.com/smartindale/50674db76bd2e36cc94f
Windows service: https://github.com/bradjolicoeur/VapoDataBus
The PowerShell script worked at low volumes, but I had issues running it reliably against larger volumes of files. The Windows service is reliable at large volumes and is more efficient.
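For reference, here is a minimal sketch of such a rolling delete in C# (my own sketch, not taken from either project; it assumes every file under the databus base path older than the cutoff is safe to remove):
using System;
using System.IO;

static class DataBusCleaner
{
    public static void CleanUp(string basePath, TimeSpan maxAge)
    {
        var cutoff = DateTime.UtcNow - maxAge;
        foreach (var file in Directory.EnumerateFiles(basePath, "*", SearchOption.AllDirectories))
        {
            if (File.GetLastWriteTimeUtc(file) >= cutoff)
                continue;
            try
            {
                File.Delete(file);
            }
            catch (IOException)
            {
                // The file may belong to an in-flight message; skip it and
                // let the next pass pick it up.
            }
        }
    }
}
Run it from a timer every few minutes rather than once a day, so each pass only has a small batch of files to delete.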

Related

Apache Camel Choice(): Do Nothing

I have made an application using Apache Camel that integrates well with AWS S3. Here is the code:
Predicate newFilePred = header(NEW_FILE_RECEIVED_HEADER).isEqualTo(SUCCESS);
from(incomingEndpoint)
    .process((exchange) -> {
        logger.info("Checking S3 bucket.");
        Date newFileUploadDate = (Date) exchange.getIn().getHeaders().get("CamelAwsS3LastModified");
        if (fileIsNew(newFileUploadDate)) {
            exchange.getIn().setHeader(NEW_FILE_RECEIVED_HEADER, SUCCESS);
        } else {
            exchange.getIn().setHeader(NEW_FILE_RECEIVED_HEADER, FAILURE);
        }
    })
    .choice()
        .when(newFilePred)
            .to(outgoingEndpoint)
        .endChoice()
    .end();
The application works well, but it throws a really annoying warning seen below:
[Camel Thread #1 - aws-s3://mybucket] | c.a.s.s.i.S3AbortableInputStream Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use. request_id=
I know the problem is that I don't direct the output of the file anywhere if the NEW_FILE_RECEIVED_HEADER header is set to failure. This is by design, because I do not want to download the file in this case. Is there any way to tell Camel to abort the connection properly after I have identified the file as "unwanted"? I could create another Camel route directly to a "trash" directory, but these would be useless cycles.
Thank you for your help!
For lack of a better solution, I managed to mute these messages by implementing a garbage/failure route that finishes the HTTP GET / input stream from S3.
If someone comes up with a better method or a different approach entirely, please feel free to "beat" this answer. The underlying rule that was discovered is that a finished route (A -> B) is required, or a warning will be thrown. The warnings are harmless, but they pollute the log file.
Finished route conditionals look like this:
.choice()
    .when(newFilePred)
        .to(successfulOutgoingEndpoint)
    .otherwise()
        .to(failureOutgoingEndpoint);

Posting a Task to the Web Consoles Execution(Management) Context

In the Apache Brooklyn web interface we would like to display some content for the system managers. The content is too long to be served as a simple sensor value.
Our idea was to create a task and write the content into the output stream of the task, and then offer the REST based URL to the managers like this:
/v1/activities/{task}/stream/stdout (with the link masked by some nice text, of course)
The stream and task are created like this:
LOG.info("{} Creating Activity for ClusterReport Feed", this);
activity = Tasks.builder()
    .displayName("clusterReportFeed")
    .description("Output for the Cluster Report Feed")
    .body(new Runnable() {
        @Override
        public void run() {
            // DO NOTHING
        }
    })
    .parallel(true)
    .build();
LOG.info("{} Task Created with Id: " + activity.getId(), this);
Entities.submit(server, activity).getUnchecked();
The task seems to be created and the interaction works perfectly fine.
However, when I want to access the task's output stream from my browser using the prepared URL, I get an error saying the task does not exist.
Our idea is that we are not in the right management/execution context: the web page is running in a different context than the entities and their sensors. How can we submit a task so that it is also visible to the web console's context?
Alternatively, is it possible to write the content into a file and then offer it for download via Jetty (Brooklyn's web server)? That would be a much simpler way.
Many tasks in Brooklyn default to being transient, i.e. they are deleted shortly after they complete (things like effector invocations are by default non-transient).
You can mark your task as non-transient using the code below in your use of the task builder:
.tag(BrooklynTaskTags.NON_TRANSIENT_TASK_TAG)
However, note that (as of Brooklyn version 0.9.0) tasks are kept in-memory using soft references. This means the stdout of the task will likely be lost at some point in the future, when that memory is needed for other in-memory objects.
For your use case, would it make sense to have this as an effector result, perhaps?
Or could you write to an object store such as S3 instead? The S3 approach would seem best to me.
If you write it to a file, care must be taken with Brooklyn high availability: would you write to a shared volume?
You would also need to provide a web extension so that people can access the contents of that file. As of Brooklyn 0.9.0, you can add your own WARs in code when calling BrooklynLauncher (which calls BrooklynWebServer).

Which ServiceControl.Contracts messages should IHandle?

I'm trying to make sure that third-party dependencies are running, and built a service to do this based on the Monitoring 3rd Party sample application, which emits ServiceControl CheckResult messages.
This works fine; ServicePulse alerts me when I stop/start my local and remote Windows services, databases, flux capacitors, etc.
I now want to build a Windows service / NServiceBus endpoint, like ServicePulse, but with logic that can attempt recovery, send emails, etc. I don't really want to put this code into the third-party monitor.
I followed the servicecontrol/external-integrations and servicecontrol/contracts tutorials and created my MendStuffOrEmail endpoint, but it doesn't work; it doesn't receive any messages.
I was going to ask "what am I doing wrong?", but I think I know: I'm using IHandleMessages<ServiceControl.Contracts.MessageFailed>, which is for failed messages.
I need to listen for the "CheckResult" type messages, but what are they? I have looked through the ServiceControl and ServicePulse code but cannot work out what is being sent/received. How can I find this out? Or has anyone else actually done this and already knows?
UPDATE
After more extensive rummaging, I also subscribed to the CustomCheckFailed and CustomCheckSucceeded messages. I implemented IHandle interfaces for them, but I'm still not getting any messages. The log shows the auto-subscriber has taken out subscriptions to them. What should I check next?
I compared my code to Sean's posted example and found the mistake: I had implemented two of the interfaces, IConfigureThisEndpoint and AsA_Server, in the wrong class (a 2am cut 'n' paste error).
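For anyone comparing their own code: those two interfaces belong on the endpoint configuration class used by the NServiceBus Host, not on the message handler. Something like this (EndpointConfig is the conventional class name, not anything special):
public class EndpointConfig : IConfigureThisEndpoint, AsA_Server
{
}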
The example listens for failed messages, but for anyone else trying to do this: you do need to subscribe to the CustomCheckFailed and CustomCheckSucceeded messages (NuGet package ServiceControl.Contracts).
public partial class MessageHandler : IHandleMessages<CustomCheckFailed>,
    IHandleMessages<CustomCheckSucceeded>
{
    public void Handle(CustomCheckFailed message)
    {
        this.HandleImplementation(message);
    }

    partial void HandleImplementation(CustomCheckFailed message);

    public void Handle(CustomCheckSucceeded message)
    {
        this.HandleImplementation(message);
    }

    partial void HandleImplementation(CustomCheckSucceeded message);

    public IBus Bus { get; set; }
}
Then add the logic to do something with the messages. (I left in my original test, sending email, but our system has a library with all sorts of recovery and notification methods. You'll need something similar to prevent an email flood.)
public partial class MessageHandler
{
    partial void HandleImplementation(CustomCheckFailed message)
    {
        var messageBody = string.Format("Message with id {0} failed with reason {1}", message.CustomCheckId, message.FailureReason);
        MailMessageFactory.sendEmail("Failure Notification", messageBody);
        Console.Out.WriteLine(messageBody);
    }
}
And a similar file holds the logic for the recovery messages (CustomCheckSucceeded). You probably want a check in there to detect that it is actually recovering from a failure, not just passing the test.
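For completeness, here is a sketch of that recovery handler, with the guard expressed as a hypothetical FailureRegistry lookup (FailureRegistry is not a real library type; substitute whatever state store you use):
public partial class MessageHandler
{
    partial void HandleImplementation(CustomCheckSucceeded message)
    {
        // Hypothetical guard: only notify when this check was previously failing.
        if (!FailureRegistry.IsKnownFailure(message.CustomCheckId))
            return;

        var messageBody = string.Format("Check {0} has recovered", message.CustomCheckId);
        MailMessageFactory.sendEmail("Recovery Notification", messageBody);
        Console.Out.WriteLine(messageBody);
    }
}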
So anyway, fixed, on my dev PC.
The next problem was making it work on the server, which took a support call. It turns out ServiceControl ALSO needs a licence, available as part of the "Advanced", "Enterprise", and "Ultimate" editions; it is not part of the platform on the standard licence.

Why configure with Sagas()?

Why is it necessary to configure with Sagas()? I ask because I had been running a saga with Raven persistence for the last few months before I noticed that Sagas() was not in my Configure.With setup; in fact, I realized I was missing a bit of the RavenPersistence configuration as well. Yet, as far as I know, sagas have been working 98% of the time and persisting to Raven. So I wonder what the Sagas() configuration does differently from not configuring it at all.
The reason I say 98% of the time is that I do notice random messages falling out of a method without sending the next message the saga is supposed to send. I am curious whether the missing configuration is the cause of this.
_logger.InfoFormat("1.1 - Preparing Saga for; File: {0}", message.FileNumber);
//Creates Saga information
SetupSaga(uploads,
message.Documents,
message.ProcedureID.GetValueOrDefault(0),
file.Client.Id,
message.FileNumber,
message.Stage,
user);
_logger.InfoFormat("1.2 - Upload Saga Unique ID; File: {0}, UniqueID: {1}", message.FileNumber, Data.UniqueID);
Bus.SendLocal(new GetLoanInformation {
UniqueID = Data.UniqueID
});
The NServiceBus Host does a lot of configuration automatically, based on roles and profiles. Both the Sagas configuration and the Raven persistence are handled for you automatically. You would only need to do this manually if you were going to run a saga when self-hosting, which would be somewhat rare.
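For reference, on the 3.x-era fluent configuration API that this question appears to be using, the manual self-hosting wiring would look roughly like this (a sketch only; the exact extension methods vary by version):
var bus = Configure.With()
    .DefaultBuilder()
    .RavenPersistence()
    .RavenSagaPersister()
    .Sagas()
    .UnicastBus()
    .CreateBus()
    .Start();
The NServiceBus Host performs the equivalent of this for you based on the endpoint's roles and profiles.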
For a better idea of what happens as a result of all the different roles and profiles, check out All About NServiceBus Host Profiles and Roles. (Disclaimer: This is my blog post.)
The problem you're mentioning is due to something else, but a lot more information would be required to diagnose it.

Why is Mage_Persistent breaking /api/wsdl?soap

I get the following error within Magento CE 1.6.1.0
Warning: session_start() [<a href='function.session-start'>function.session-start</a>]: Cannot send session cookie - headers already sent by (output started at /home/dev/env/var/www/user/dev/wdcastaging/lib/Zend/Controller/Response/Abstract.php:586) in /home/dev/env/var/www/user/dev/wdcastaging/app/code/core/Mage/Core/Model/Session/Abstract/Varien.php on line 119
when accessing /api/soap/?wsdl
Apparently, a session_start() is being attempted after the entire contents of the WSDL file have already been output, resulting in the error.
Why is Magento attempting to start a session after outputting all the datums? I'm glad you asked. It looks like controller_front_send_response_after is being hooked by Mage_Persistent in order to call synchronizePersistentInfo(), which in turn ends up causing that session_start() to fire.
The interesting thing is that this wasn't always happening; initially the WSDL loaded just fine for me. I racked my brains trying to see what customization might have been made to our install to cause this, but the tracing I've done seems to indicate that this is all happening entirely inside of core.
We have also experienced a tiny bit of (completely unrelated) strangeness with Mage_Persistent which makes me a little more willing to throw my hands up at this point and SO it.
I've done a bit of searching on SO and have found some questions related to the whole "headers already sent" thing in general, but not this specific case.
Any thoughts?
Oh, and the temporary workaround I have in place is simply disabling Mage_Persistent via the persistent/options/enable config data. I also did a little digging as to whether it might be possible to observe an event in order to disable this module only for the WSDL controller (since that seems to be the only one having problems), but it looks like that module relies exclusively on this config flag to determine its enabled status.
UPDATE: Bug has been reported: http://www.magentocommerce.com/bug-tracking/issue?issue=13370
I'd report this as a bug to the Magento team. The Magento API controllers all route through standard Magento action controller objects, and all these objects inherit from the Mage_Api_Controller_Action class. This class has a preDispatch method:
class Mage_Api_Controller_Action extends Mage_Core_Controller_Front_Action
{
    public function preDispatch()
    {
        $this->getLayout()->setArea('adminhtml');
        Mage::app()->setCurrentStore('admin');
        $this->setFlag('', self::FLAG_NO_START_SESSION, 1); // Do not start standart session
        parent::preDispatch();
        return $this;
    }

    //...
}
which includes setting a flag to ensure normal session handling doesn't start for API methods.
$this->setFlag('', self::FLAG_NO_START_SESSION, 1);
So, it sounds like there's code in synchronizePersistentInfo that assumes the existence of a session object, and when it uses it, the session gets initialized, resulting in the error you've seen. Normally this isn't a problem, as every other controller has initialized a session at this point, but the API controllers explicitly turn it off.
As far as fixes go, your best bet (and probably the quick answer you'll get from Magento support) will be to disable the persistent cart feature for the default configuration setting, but then enable it for the specific stores that need it. This will let carts persist for the stores that need the feature while keeping it disabled for the API.
Coming up with a fix on your own is going to be uncharted territory, and I can't think of a way to do it that isn't terribly hacky/unstable. The most straightforward way would be a class rewrite of synchronizePersistentInfo that calls its parent method unless you've detected that this is an API request.
This answer is not meant to replace the existing answer. But I wanted to drop some code in here in case someone runs into this issue, and comments don't really allow for code formatting.
I went with a simple local code pool override of Mage_Persistent_Model_Observer_Session to exit out of the function for any URL routes within /api/*.
I'm not expecting this fix to need to be very long-lived or upgrade-friendly, because I'm expecting them to fix this in the next release or so.
public function synchronizePersistentInfo(Varien_Event_Observer $observer)
{
    ...
    if ($request->getRouteName() == 'api') {
        return;
    }
    ...
}