troubleshooting qmail delivery failure - sysadmin

I've got a puzzling qmail situation (puzzling to me at least; I am not a qmail expert, and this is a work-related issue that I'm looking at by default...)
There are about a dozen emails sent to a particular address that have been in the qmail queue on our mailserver for 2 days. In the logs I see each delivery attempt ends with
delivery NNNN: deferral: Connected_to_XXX.XXX.X.XX_but_connection_died._Possible_duplicate!_(#4.4.2)/
All this time, literally thousands of other emails to the exact same address are being sent successfully. There's nothing I can see about the dozen failed messages that is different, although that is the only logical explanation I can think of.
The destination host (XXX.XXX.X.XX) is a machine on our own internal network.
Any suggestions on what I could check? Any way to get additional details on what happened to cause 'connection died'? E.g., perhaps the email server at the destination rejected the messages due to some particular content?
Any suggestions will be much appreciated.

It is possible that the mail is looping.
If the mail header already contains a "Delivered-To:" line for the recipient, qmail-local usually rejects the message as looping.
Also check the logs on the destination server to see whether multiple copies of the same mail are arriving there for some reason.
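To check the looping theory on one of the stuck messages, something like the following might help. It is only a sketch: it assumes you have already copied the raw message text out of the queue, and the addresses in SAMPLE are made up for illustration.

```python
from email import message_from_string


def delivered_to_addresses(raw_message: str) -> list[str]:
    """Return every Delivered-To header value, in order of appearance."""
    msg = message_from_string(raw_message)
    return [value.strip() for value in msg.get_all("Delivered-To", [])]


def looks_like_loop(raw_message: str, recipient: str) -> bool:
    """True if the recipient already appears in a Delivered-To header."""
    wanted = recipient.lower()
    return any(addr.lower() == wanted
               for addr in delivered_to_addresses(raw_message))


# Made-up message used only to demonstrate the two helpers.
SAMPLE = (
    "Delivered-To: alerts@example.internal\n"
    "From: sender@example.internal\n"
    "To: alerts@example.internal\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(delivered_to_addresses(SAMPLE))
    print(looks_like_loop(SAMPLE, "alerts@example.internal"))
```

If the stuck messages carry a Delivered-To line for the destination address and the successful ones do not, that would be the difference between them.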

Firebase Error Message - The document has moved https://fcm.googleapis.com/batch?google_abuse=GOOGLE_ABUSE_EXEMPTION

We are sending lots and lots of FCM messages to our millions of users. As the messages are triggered by an external event (kick-off in a football match), we send many messages at the same time.
Sometimes the sending of an FCM message fails and we get an error message like this:
<H1>302 Moved</H1> The document has moved
<A HREF="https://fcm.googleapis.com/batch?
google_abuse=GOOGLE_ABUSE_EXEMPTION%3DID%....3B+expires%3DTue,+22-Nov-
2022+19:04:17+GMT">here</A>. </BODY></HTML>
(I removed some text for privacy reasons.)
For sending the messages we use
implementation 'com.google.firebase:firebase-admin:9.1.0'
We got thousands of error messages like this in one minute. In the next minute everything worked fine again.
I have searched the internet for information about it, but I couldn't find any abuse rules for FCM. Does anybody have information about this kind of error?
firebaser here
At first glance, your project may be getting throttled, but it may also be another problem in the API calls or the FCM backend. It’ll be challenging to pinpoint or even narrow down what is the specific cause of the error on a public forum without going into project-specific details. I would recommend reaching out to Firebase support as they can offer personalized help. Please provide the latest request and response (with timestamp) you have so they can check what happened to the message delivery.
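If throttling does turn out to be the cause (the answer above only raises it as a possibility), one common client-side mitigation is to retry failed sends with exponential backoff and jitter rather than resending immediately. The sketch below is generic Python, not Firebase Admin SDK code: `send` is a hypothetical callable wrapping whatever call your code makes, and the delay values are assumptions to tune.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def send_with_backoff(send: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 0.5,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send() until it succeeds or max_attempts is reached.

    After each failure, wait a random time between 0 and an upper bound
    that doubles per attempt (capped at max_delay), so that many
    clients failing at once do not all retry at the same instant.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            upper = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, upper))
```

In real code you would probably catch only the exception types that indicate a transient failure rather than every `Exception`.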

Dealing with dead letters in RabbitMQ

TL;DR: I need to "replay" dead letter messages back into their original queues once I've fixed the consumer code that was originally causing the messages to be rejected.
I have configured the Dead Letter Exchange (DLX) for RabbitMQ and am successfully routing rejected messages to a dead letter queue. But now I want to look at the messages in the dead letter queue and try to decide what to do with each of them. Some (many?) of these messages should be replayed (requeued) to their original queues (available in the "x-death" headers) once the offending consumer code has been fixed. But how do I actually go about doing this? Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to? And what about searching the dead letter queue? What if I know that a message (let's say one which is encoded in JSON) has a certain attribute that I want to search for and replay? For example, I fix a defect which I know will allow the message with PacketId: 1234 to process successfully now. I could also write a one-off program for this I suppose.
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them. It seems like there should be some sort of Swiss Army Knife for this sort of thing. I did a pretty extensive search on Google and Stack Overflow but didn't really come up with much. The closest thing I could find were shovels but that doesn't really seem like the right tool for the job.
Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to?
generally speaking, yes.
you could set up a delayed re-try to resend the message back to the original queue, using the delayed message exchange plugin.
but this would only automate the retries on an interval, and you may not have fixed the problem before the retries happen.
in some circumstances this is ok - like when the error is caused by an external resource being temporarily unavailable.
in your case, though, i believe your thoughts on creating an app to handle the dead letters is the best way to go, for several reasons:
you need to search through the messages, which isn't possible in RMQ
this means you'll need a database to store the messages from the DLX/queue
because you're pulling the messages out of the DLX/queue, you'll need to ensure you get all the header info from the message so that you can re-publish to the correct queue when the time comes.
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them.
and you're not!
there are many solutions to this problem that all come down to the solution you've suggested.
some larger "service bus" implementations have this type of feature built in to them. i believe NServiceBus (or the SaaS version of it) has this built in, for example - though I'm not 100% sure of it.
if you want to look into this further, search for the term "poison message" - this is generally the term used for this situation. I've found a few things on google with a quick search that may help you down the path:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-January/025019.html
https://web.archive.org/web/20170809194056/http://tafakari.co.ke/2014/07/rabbitmq-poison-messages/
https://web.archive.org/web/20170809170555/http://kjnilsson.github.io/blog/2014/01/30/spread-the-poison/
hope that helps!
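As a rough idea of what the core of such a one-off replay tool might look like, here is a sketch of the two decisions it has to make: where a dead-lettered message originally came from, and whether it matches a search such as PacketId 1234. The function names are made up. It assumes the message headers have already been decoded into a plain dict by your client library, that the "x-death" list is ordered most recent first, and that an "x-first-death-queue" header may be present; check both against your RabbitMQ version. Consuming and re-publishing are left out.

```python
import json
from typing import Any, Mapping, Optional


def original_queue(headers: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Best guess at the queue a dead-lettered message first died in."""
    if not headers:
        return None
    first = headers.get("x-first-death-queue")
    if first:
        return first
    deaths = headers.get("x-death") or []
    if deaths:
        # Assumption: entries are ordered most recent first,
        # so the last entry records the earliest death.
        return deaths[-1].get("queue")
    return None


def matches(body: bytes, field: str, value: Any) -> bool:
    """True if body is a JSON object whose `field` equals `value`."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not valid JSON (or not valid UTF-8): treat as no match.
        return False
    return isinstance(payload, dict) and payload.get(field) == value


def plan_replay(messages, field: str, value: Any):
    """Return (target_queue, body) for each matching (headers, body) pair."""
    plan = []
    for headers, body in messages:
        target = original_queue(headers)
        if target is not None and matches(body, field, value):
            plan.append((target, body))
    return plan
```

Each (target_queue, body) pair could then be re-published, for example through the default exchange with the queue name as the routing key, and the dead letter acknowledged only after the publish has succeeded.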

Can a CAN message have two reliable recipients?

In my situation multiple modules report their state over a CAN bus to a central processor, which replies and drives them. There's also a supervising processor, which listens in on the CAN bus and analyzes incoming messages from the modules for critically dangerous situations (two different modules reporting activating outputs which are absolutely forbidden from being activated simultaneously).
This all works okay as long as the CAN bus is noise-free.
CAN bus guarantees the recipient to receive a message; the message will be resent if no recipient confirms receiving it. The problem begins if there's more than one recipient and all of them absolutely must receive the message.
If the line is clean, both receive it, confirm it, and everything is okay.
If the message is badly damaged, neither will receive it, and it will be resent. That's okay.
But if the noise on the line is "just on the brink", one of them will receive it, and confirm, and the other will fail to receive it (noise on its end of the bus just minimally worse), and since the sender got the confirmation, the message won't be resent.
Is there a reliable way to assure two different recipients of a message both receive it? ...other than sending two messages with two addresses, specifically? (it's essential that the supervising CPU hears the same messages as the main CPU, not just similar)
There is no way at the CAN layer to detect receipt by more than one module. You would need to add messages to your communication protocol to confirm receipt if this is absolutely critical. As mentioned, you could have each module receive the same message and send a unique reply.
Some general thoughts:
1) Are the important messages broadcast periodically? If so, the recipient could test that the periodicity of the message is correct and fail safely if the period is violated.
2) CAN is a very robust network. In my many years, I have not seen noise affecting a single node like you described other than when the node was at the end of an exceedingly (and irrationally) long wire. You are correct to worry about this scenario and design your message format and system to be robust to all CAN failures. Generally, when safety or reliability was paramount, we would have more than one CAN bus communicating the information along with a number of crosscheck messages to verify that not only the path was intact but the device on the other end was operating intelligently. Our general assumption was that if crosscheck messages were making the trip, then our operational messages were making the trip successfully as well.
Obviously not.
It fails even in the simple case where one receiver is shut down.
There is no way for the master to detect this (for this single packet).
You would need an advanced CAN with more acknowledge slots, one slot per recipient.
But you could require that each recipient confirms the message with a unique response message.
Then your master can detect by a timeout that not all recipients received the message.
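The bookkeeping for the "unique response per recipient, plus a timeout" approach could look roughly like this on the sending side. It is a sketch of the application-level logic only, with no CAN driver calls; the class name, node IDs and timestamps are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PendingMessage:
    """Tracks which recipients have confirmed one sent message."""
    message_id: int
    expected: frozenset   # IDs of nodes that must all confirm
    deadline: float       # time after which missing replies count as lost
    confirmed: set = field(default_factory=set)

    def confirm(self, node_id) -> None:
        # Ignore confirmations from nodes we were not waiting for.
        if node_id in self.expected:
            self.confirmed.add(node_id)

    def missing(self) -> set:
        return set(self.expected) - self.confirmed

    def needs_resend(self, now: float) -> bool:
        # Resend (or fail safe) only once the deadline has passed
        # and at least one expected node is still silent.
        return now >= self.deadline and bool(self.missing())


if __name__ == "__main__":
    pending = PendingMessage(message_id=7,
                             expected=frozenset({"main", "supervisor"}),
                             deadline=10.0)
    pending.confirm("main")
    print(pending.missing(), pending.needs_resend(now=10.5))
```

Here the sender would call confirm() whenever a response frame for message_id arrives, and poll needs_resend() from its timer loop.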

Re-getting POP3 messages

I am using Peter Huber's POP3 client to connect to gmail and download messages.
The inboxes being accessed are transactional inboxes used only for code-access. That is, a message comes in with an order file attached, code will process it and then delete the message. One stipulation of the code though was a DEBUG flag, which if set would prevent the code from deleting the message so that you can run the program again later without the debug flag and reprocess the message. So, in my code I have
If Not Arguments.Debug Then pop.DeleteEmail(eid)
This works fine. Problem is, even when not deleting the message, running the program a second time will not re-retrieve the message, even though if I login to gmail and look at the inbox, it is still there. The only way I can get the program to see the message again is to forward the message back to the same inbox. But in Peter's code I do not see anywhere where he is keeping track of seen messages between sessions.
Is this something that is done on gmail's end? Refusing to deliver a message to the same client a second time? If so, is there any way I can change my gmail account so that it will always show all messages in the inbox to a client when retrieving the list of messages, even ones already "seen"? I don't see anything in the gmail settings screen.
UPDATE: I tried adding a method to send a RSET command to the server, as per this comment on the codeproject page. I then call my new Reset() method after retrieving my messages but before disconnecting, ... but I still have the same problem.
Okay... found a "sort of" answer after reading through pages of the comments on the codeproject project.
According to this comment, the RSET command does not actually do anything when you are dealing with gmail's servers.
The "answer" is to prepend your username with the string "recent:", so instead of logging in with [myaccount@gmail.com] you log in with [recent:myaccount@gmail.com]. Rather hackish, ... but it works.
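For anyone wanting to try the "recent:" login outside of that client, here is a minimal sketch using Python's standard poplib instead of the POP3 client from the question. The function names are made up, and the host and port are the usual Gmail POP3 settings, included here as assumptions; nothing is deleted, so the messages stay on the server.

```python
import poplib


def recent_username(address: str) -> str:
    """Prefix the login name with 'recent:' unless it is already there."""
    if address.startswith("recent:"):
        return address
    return "recent:" + address


def fetch_messages(address: str, password: str,
                   host: str = "pop.gmail.com", port: int = 995) -> list:
    """Download every listed message as raw bytes without deleting any."""
    conn = poplib.POP3_SSL(host, port)
    try:
        conn.user(recent_username(address))
        conn.pass_(password)
        _, listing, _ = conn.list()
        messages = []
        for entry in listing:
            # Each listing entry looks like b"<number> <size>".
            number = int(entry.split()[0])
            _, lines, _ = conn.retr(number)
            messages.append(b"\r\n".join(lines))
        return messages
    finally:
        conn.quit()
```

Calling fetch_messages twice in a row with the same account should then return the same messages both times, which is the behaviour the DEBUG flag needs.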

Can I Redirect mis-addressed message on outgoing MSMQ?

I have a bunch of messages on an outgoing MSMQ queue that have been addressed incorrectly due to a typo. They are stuck on the outgoing queue since the box they have been directed at does not exist.
The name of the queue is of the form
DIRECT=OS:foo\private$\MyQueue and is in state Waiting to connect
and I want to send to
DIRECT=OS:bar\private$\MyQueue
Will an entry in the hosts file suffice, or if not, is there any other way that I can do this? If at all possible, I don't want to have to create a server with the mistyped name and a process that receives the messages and forwards them on to the correct server.
I tried the idea I suggested above of adding the incorrect entry into the hosts file with the correct IP address. This caused the messages to get dispatched to the remote computer, but I think WCF dropped them as they didn't have the correct endpoint address.
Not recommended if you don't want to lose the messages! Fortunately for me these were not critical, so it was worth a try.
Oh well
Modan,
I experienced something similar years ago, and the only way I could find was to set up the box and queue (in a VM) and forward the messages to the correct queue. Maybe you can delete and resend the messages with the fix? Is that possible? Unfortunately I don't think there's a good way around this.
-Bryan