How should I avoid sending duplicate emails using mailgun, taskqueue and ndb? - app-engine-ndb

I am using the taskqueue API to send multiple emails in small groups with mailgun. My code looks more or less like this:
class CpMsg(ndb.Model):
    group = ndb.KeyProperty()
    sent = ndb.BooleanProperty()
    # Other properties

def send_mail(messages):
    """Sends a request to mailgun's API"""
    # Some code
    pass

class MailTask(TaskHandler):
    def post(self):
        p_key = utils.key_from_string(self.request.get('p'))
        msgs = CpMsg.query(
            CpMsg.group == p_key,
            CpMsg.sent == False).fetch(BATCH_SIZE)
        if msgs:
            send_mail(msgs)
            for msg in msgs:
                msg.sent = True
            ndb.put_multi(msgs)
            # Call the task again in COOLDOWN seconds
The code above has been working fine, but according to the docs, the taskqueue API guarantees that a task is delivered at least once, so tasks should be idempotent. Most of the time this would be the case with the above code, since it only gets messages that have the 'sent' property equal to False. The problem is that non-ancestor ndb queries are only eventually consistent, which means that if the task is executed twice in quick succession, the query may return stale results and include the messages that were just sent.
I thought of including an ancestor for the messages, but since the sent emails will be in the thousands I'm worried that may mean having large entity groups, which have a limited write throughput.
Should I use an ancestor to make the queries? Or maybe there is a way to configure mailgun to avoid sending the same email twice? Should I just accept the risk that in some rare cases a few emails may be sent more than once?

One possible approach to avoid the eventual consistency hurdle is to make the query a keys_only one, then iterate through the message keys to get the actual messages by key lookup (strong consistency), check if msg.sent is True and skip sending those messages in such case. Something along these lines:
msg_keys = CpMsg.query(
    CpMsg.group == p_key,
    CpMsg.sent == False).fetch(BATCH_SIZE, keys_only=True)
if not msg_keys:
    return
msgs = ndb.get_multi(msg_keys)
msgs_to_send = []
for msg in msgs:
    if not msg.sent:
        msgs_to_send.append(msg)
if msgs_to_send:
    send_mail(msgs_to_send)
    for msg in msgs_to_send:
        msg.sent = True
    ndb.put_multi(msgs_to_send)
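You'd also have to make the get, check and put part of your post call transactional (with the @ndb.transactional() decorator). The keys_only query itself has to stay outside the transaction, since only ancestor queries are allowed inside one.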
This should address the duplicates caused by the query eventual consistency. However there still is room for duplicates caused by transaction retries due to datastore contention (or any other reason) - as the send_mail() call isn't idempotent. Sending one message at a time (maybe using the task queue) could reduce the chance of that happening. See also GAE/P: Transaction safety with API calls
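The filtering step above can be tried without the datastore. What follows is a minimal sketch, assuming an in-memory dict as a stand-in for strongly consistent lookups by key and a plain list as a stand-in for a possibly stale query result. The names `select_unsent`, `store` and `stale_keys` are hypothetical and not part of ndb.

```python
def select_unsent(stale_keys, store):
    """Return the keys whose authoritative record is still unsent.

    stale_keys: keys returned by an eventually consistent query; the list
                may still include messages that were already sent.
    store:      dict acting as the strongly consistent lookup by key
                (a stand-in for ndb.get_multi, an assumption of this sketch).
    """
    unsent = []
    for key in stale_keys:
        record = store.get(key)
        # Skip missing records and records already marked as sent.
        if record is not None and not record["sent"]:
            unsent.append(key)
    return unsent


# Hypothetical data: the stale index still lists "m1", although it was sent.
store = {
    "m1": {"sent": True},
    "m2": {"sent": False},
    "m3": {"sent": False},
}
stale_keys = ["m1", "m2", "m3"]

to_send = select_unsent(stale_keys, store)
for key in to_send:
    store[key]["sent"] = True  # mark as sent after the (simulated) send
```

Running the same selection a second time against the updated store returns nothing, which is the behaviour a re-delivered task needs. It does not cover the case where the send succeeds and the write that marks the message as sent fails.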

Related

Persist detailed information about failed Item processing

I've got a Job that runs a TaskletStep, then a chunk-based step and then another TaskletStep.
In each of these steps, errors (in the form of Exceptions) can occur.
The chunk-based step looks like this:
stepBuilderFactory
    .get("step2")
    .chunk<SomeItem, SomeItem>(1)
    .reader(flatFileItemReader)
    .processor(itemProcessor)
    .writer {}
    .faultTolerant()
    .skipPolicy { _, _ -> true } // skip all Exceptions and continue
    .taskExecutor(taskExecutor)
    .throttleLimit(taskExecutor.corePoolSize)
    .build()
The whole job definition:
jobBuilderFactory.get("job1")
    .validator(validator())
    .preventRestart()
    .start(taskletStep1)
    .next(step2)
    .next(taskletStep2)
    .build()
I expected that Spring Batch somehow picks up the Exceptions that occur along the way, so I can then create a Report including them after the Job has finished processing. Looking at the different contexts, there are also fields that should contain failureExceptions. However, it seems there's no such information (especially for the chunked step).
What would be a good approach if I need information about:
what Exceptions did occur in which Job execution
which Item was the one that triggered it
The JobExecution provides a method to get all failure exceptions that happened during the job. You can use that in a JobExecutionListener#afterJob(JobExecution jobExecution) to generate your report.
With regard to which items caused the issue, this will depend on where the exception happens (during the read, process or write operation). For this requirement, you can use one of the ItemReadListener, ItemProcessListener or ItemWriteListener to keep a record of those items (for example, by adding them to the job execution context so that you can access them in the JobExecutionListener#afterJob method for your report).

Can a telegram bot block a specific user?

I have a telegram bot that, for any received message, runs a program on the server and sends its result back. But there is a problem! If a user sends too many messages to my bot (spamming), it will make the server very busy!
Is there any way to block people who send more than 5 messages in a second and stop receiving their messages? (using telegram api!!)
First, the Telegram Bot API does not have such a capability itself, so you will need to implement it on your own. All you need to do is:
Count the number of messages that a user sends within a second, which won't be easy without a database. If you have a database with a table called Black_List and save all the messages with their sent-time in another table, you can count the number of messages sent from one specific ChatID in a pre-defined time period (in your case, 1 second) and check whether the count is bigger than 5. If it is, insert that ChatID into the Black_List table.
Every time the bot receives a message, it must run a database query to see whether the sender's ChatID exists in the Black_List table. If it exists, the bot should carry on with its own job and ignore the message (or it can even send an alert to the user saying: "You're blocked.", which I think can be time consuming).
Note that, as far as I know, the current Telegram Bot API doesn't have a feature to stop receiving messages, but as mentioned above you can ignore the messages from spammers.
In order to save time, you should avoid making a database connection every time the bot receives an update (message). Instead, you can load the ChatIDs that exist in the Black_List into a DataSet and update the DataSet right after inserting a new spammer ChatID into the Black_List table. This way the number of queries will drop noticeably.
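The counting and in-memory blacklist described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Telegram API: `should_process`, `blacklist` and `recent` are hypothetical names, the limits come from the question (more than 5 messages in 1 second), and persisting the blacklist to the Black_List table is left as a comment.

```python
import time
from collections import defaultdict, deque

MAX_MESSAGES = 5      # allowed messages per window (from the question)
WINDOW_SECONDS = 1.0  # window length in seconds

blacklist = set()            # in-memory copy of the Black_List table
recent = defaultdict(deque)  # chat_id -> timestamps of recent messages


def should_process(chat_id, now=None):
    """Return True if the message should be handled, False if ignored."""
    if now is None:
        now = time.monotonic()
    if chat_id in blacklist:
        return False
    stamps = recent[chat_id]
    stamps.append(now)
    # Drop timestamps that have fallen out of the window.
    while stamps and now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()
    if len(stamps) > MAX_MESSAGES:
        blacklist.add(chat_id)  # a real bot would also insert into Black_List
        del recent[chat_id]
        return False
    return True


# Seven messages from the same chat, 0.1 seconds apart.
results = [should_process(42, now=100.0 + i * 0.1) for i in range(7)]
```

The `now` parameter exists only so the behaviour can be checked with fixed timestamps; a handler would call `should_process(chat_id)` and return early when it yields False.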
I achieved it this way:
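import cachetools
from datetime import datetime, timedelta

# Using the ttlcache to set a time-limited dict. you can adjust the ttl.
ttl_cache = cachetools.TTLCache(maxsize=128, ttl=60)

def check_user_msg_frequency(message):
    # `bot` is the pyTelegramBotAPI bot instance created elsewhere.
    print(ttl_cache)  # debug output
    # Default to 0 so an unseen or expired user does not raise KeyError.
    msg_cnt = ttl_cache.get(message.from_user.id, 0)
    if msg_cnt > 3:
        now = datetime.now()
        until = now + timedelta(seconds=60*10)
        bot.restrict_chat_member(message.chat.id, message.from_user.id, until_date=until)

def set_user_msg_frequency(message):
    # Count one more message for this user within the ttl period.
    if not ttl_cache.get(message.from_user.id):
        ttl_cache[message.from_user.id] = 1
    else:
        ttl_cache[message.from_user.id] += 1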
With the two functions above, you can record how many messages each user has sent within the period. If a user sends more messages than expected, they are restricted.
Then every handler you define should call these two functions:
@bot.message_handler(commands=['start', 'help'])
def handle_start_help(message):
    set_user_msg_frequency(message)
    check_user_msg_frequency(message)
I'm using the pyTelegramBotAPI module for this.
I know I'm late to the party, but here is another simple solution that doesn't use a Db:
Create a ConversationState class to attach to each telegram Id when they start to chat with the bot
Then add a LastMessage DateTime variable to the ConversationState class
Now every time you receive a message, check whether enough time has passed since the LastMessage DateTime. If not enough time has passed, answer with a warning message.
You can also implement a timer that deletes the conversation state class if you are worried about performance.
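The steps above can be sketched in a few lines. This is a minimal illustration, assuming a minimum interval of one second and a ten minute idle limit; `MIN_INTERVAL`, `STATE_TTL`, `accept_message` and `purge_idle` are hypothetical names chosen for the example.

```python
import time
from dataclasses import dataclass

MIN_INTERVAL = 1.0  # assumed minimum seconds between two messages
STATE_TTL = 600.0   # assumed idle time after which a state is dropped


@dataclass
class ConversationState:
    last_message: float  # time of the user's most recent message


states = {}  # telegram user id -> ConversationState


def accept_message(user_id, now=None):
    """Return True if enough time has passed since the last message."""
    if now is None:
        now = time.monotonic()
    state = states.get(user_id)
    if state is None:
        states[user_id] = ConversationState(last_message=now)
        return True
    allowed = now - state.last_message >= MIN_INTERVAL
    # Update even on rejection, so continuous spamming stays throttled.
    state.last_message = now
    return allowed


def purge_idle(now=None):
    """Delete conversation states that have been idle beyond STATE_TTL."""
    if now is None:
        now = time.monotonic()
    idle = [u for u, s in states.items() if now - s.last_message > STATE_TTL]
    for user_id in idle:
        del states[user_id]
```

When `accept_message` returns False, the handler would answer with the warning message instead of running the program. `purge_idle` plays the role of the timer and could be called periodically.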

NServiceBus - How to control message handler ordering when Bus.Send() occurs on different threads / processes?

Scenario:
I have a scenario where audit messages are sent via NServiceBus. The handlers insert and update a row on a preexisting database table, which we have no remit to change. The requirement is that we have control over the order that messages are handled, so that the Audit data reflects the correct system state. Messages processed out of order may cause the audit data to reflect an incorrect state.
Some of the Audit data is expected in a specific order, however some can be received at any time after the initial message, such as a status update which will be sent several times during the process.
In my test project I have been testing using a server, (specifically the ISpecifyMessageHandlerOrdering functionality) with the end point configured as follows:
public class MyServer : IConfigureThisEndpoint, AsA_Server, ISpecifyMessageHandlerOrdering
{
    public void SpecifyOrder(Order order)
    {
        order.Specify(First<PrimaryCommand>.Then<SecondaryCommand>());
    }
}
Because the explicit order of messages is not known, one message, InitialAuditMessage is the initial message, and inherits from PrimaryCommand.
Other messages which are allowed to be received at a later stage inherit from SecondaryCommand.
public class StartAuditMessage : PrimaryCommand
public class UpdateAudit1Message : SecondaryCommand
public class UpdateAudit2Message : SecondaryCommand
public class ProcessUpdateMessage : SecondaryCommand
This works in controlling the handling order of messages where they are sent from the same thread.
This breaks down however, if the messages are sent from separate threads or processes, which makes sense as there is nothing to link the messages as related.
How can I link the messages, say through an ID of some sort so that they are not processed out of order when sent from separate threads? Is this a use case for Sagas?
Also, with regard to status update messages, how can I ensure that messages of the same type are processed in the order in which they were sent?
Whenever you have a requirement for ordered processing you cannot avoid the conclusion that at some point in your processing you need to restrict everything down to a single thread. The single thread guarantees the order in which things are processed.
In some cases you can "scale out" the single thread into multiple threads by splitting the processing by a correlating identifier. The correlation ID allows you to define a logical grouping of messages within which order must be maintained. This allows you to have concurrent threads each performing ordered processing which is more efficient.
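The idea of splitting by a correlating identifier can be illustrated outside NServiceBus. The following is a minimal sketch in Python, assuming four single-threaded "lanes" and string correlation IDs; `lane_for`, `dispatch` and the `audit-N` IDs are hypothetical, and the sketch does not use any NServiceBus API.

```python
import queue
import threading
import zlib

NUM_WORKERS = 4  # assumed number of ordered lanes

lanes = [queue.Queue() for _ in range(NUM_WORKERS)]
processed = []  # (correlation_id, payload) in handling order
processed_lock = threading.Lock()


def lane_for(correlation_id):
    """Map a correlation ID to a fixed lane, stable across processes."""
    return zlib.crc32(correlation_id.encode("utf-8")) % NUM_WORKERS


def worker(lane):
    """Handle one lane's messages strictly in arrival order."""
    while True:
        item = lane.get()
        if item is None:  # sentinel: stop this worker
            break
        with processed_lock:
            processed.append(item)


def dispatch(correlation_id, payload):
    """Route a message to the single lane that owns its correlation ID."""
    lanes[lane_for(correlation_id)].put((correlation_id, payload))


threads = [threading.Thread(target=worker, args=(lane,)) for lane in lanes]
for t in threads:
    t.start()

# Three audit trails, each with steps 0, 1, 2 interleaved with the others.
for step in range(3):
    for audit_id in ("audit-1", "audit-2", "audit-3"):
        dispatch(audit_id, step)

for lane in lanes:
    lane.put(None)
for t in threads:
    t.join()
```

Each correlation ID always lands on the same lane, and each lane has exactly one consumer, so order is kept within an ID while different IDs run concurrently. Note that this preserves arrival order at the dispatcher only. Messages sent from separate processes would still need something like a sequence number to establish the order in which they were sent.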

Suppressing NServicebus Transaction to write errors to database

I'm using NServiceBus to handle some calculation messages. I have a new requirement to handle calculation errors by writing them to the same database. I'm using NHibernate as my DAL, which auto-enlists in the NServiceBus transaction and provides rollback in case of exceptions, which is working really well. However, if I write this particular error to the database, it is also rolled back, which is a problem.
I knew this would be a problem, but I thought I could just wrap the call in a new transaction with TransactionScopeOption = Suppress. However, the error data is still rolled back. I believe that's because it was using the existing session, which has already enlisted in the NServiceBus transaction.
Next I tried opening a new session from the existing SessionFactory within the suppression transaction scope. However the first call to the database to retrieve or save data using this new session blocks and then times out.
InnerException: System.Data.SqlClient.SqlException
Message=Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Finally I tried creating a new SessionFactory and using it to open a new session within the suppression transaction scope. However, again it blocks and times out.
I feel like I'm missing something obvious here, and would greatly appreciate any suggestions on this probably common task.
As Adam suggests in the comments, in most cases it is preferred to let the entire message fail processing, giving the built-in Retry mechanism a chance to get it right, and eventually going to the error queue. Then another process can monitor the error queue and do any required notification, including logging to a database.
However, there are some use cases where the entire message is not a failure, i.e. on the whole, it "succeeds" (whatever the business-dependent definition of that is) but there is some small part that is in error. For example, a financial calculation in which the processing "succeeds" but some human element of the data is "in error". In this case I would suggest catching that exception and sending a new message which, when processed by another endpoint, will log the information to your database.
I could see another case where you want the entire message to fail, but you want the fact that it was attempted noted somehow. This may be closest to what you are describing. In this case, create a new TransactionScope with TransactionScopeOption = Suppress, and then (again) send a new message inside that scope. That message will be sent whether or not your full message transaction rolls back.
You are correct that your transaction is rolling back because the NHibernate session is opened while the transaction is in force. Trying to open a new session inside the suppressed transaction can cause a problem with locking. That's why, most of the time, sending a new message asynchronously is part of the solution in these cases, but how you do it is dependent upon your specific business requirements.
I know I'm late to the party, but as an alternative suggestion, you could simply raise another separate log message, which NSB handles independently, for example:
public void Handle(DebitAccountMessage message)
{
    var account = this.dbcontext.GetById(message.Id);
    if (account.Balance <= 0)
    {
        // log request - new handler
        this.Bus.Send(new DebitAccountLogMessage
        {
            originalMessage = message,
            account = account,
            timeStamp = DateTime.UtcNow
        });
        // throw error - NSB will handle
        throw new DebitException("Not enough funds");
    }
}

public void Handle(DebitAccountLogMessage message)
{
    var messageString = message.originalMessage.Dump();
    var accountString = message.account.Dump(DumpOptions.SuppressSecurityTokens);
    this.Logger.Log(message.UniqueId, string.Format("{0}, {1}", messageString, accountString));
}

How to write a transactional, multi-threaded WCF service consuming MSMQ

I have a WCF service that posts messages to a private, non-transactional MSMQ queue. I have another WCF service (multi-threaded) that processes the MSMQ messages and inserts them in the database.
My issue is with sequencing. I want the messages to be in a certain order. For example, MSG-A needs to go to the database before MSG-B is inserted. My current solution for that is very crude and expensive from a database perspective.
I read the message, and if it's MSG-B and there is no MSG-A in the database, I throw it back on the message queue and keep doing that until MSG-A is inserted in the database. But this is a very expensive operation as it involves a table scan (SELECT stmt).
The messages are always posted to the queue in sequence.
Short of making my WCF Queue Processing service Single threaded (By setting the service behavior attribute InstanceContextMode to Single), can someone suggest a better solution?
Thanks
Dan
Instead of immediately pushing messages to the DB after taking them out of the queue, keep a list of pending messages in memory. When you get an A or B, check to see if the matching one is in the list. If so, submit them both (in the right order) to the database, and remove the matching one from the list. Otherwise, just add the new message to that list.
If checking for a match is too expensive a task to serialize - I assume you are multithreading for a reason - then you could have another thread process the list. The existing multiple threads read, immediately submit most messages to the DB, but put the As and Bs aside in the (threadsafe) list. The background thread scavenges through that list finding matching As and Bs, and when it finds them it submits them in the right order (and removes them from the list).
The bottom line is - since you're removing items from the queue with multiple threads, you're going to have to serialize somewhere in order to ensure ordering. The trick is to minimize the number of times and length of time you spend locked up in serial code.
There might also be something you could do at the database level, with triggers or something, to reorder the entries when it detects this situation. I'm afraid I don't know enough about DB programming to help there.
UPDATE: Assuming the messages contain some id that lets you associate a message 'A' with the correct associated message 'B', the following code will make sure A goes in the database before B. Note that it does not make sure they are adjacent records in the database - there could be other messages between A and B. Also, if for some reason you get an A or B without ever receiving the matching message of the other type, this code will leak memory since it hangs onto the unmatched message forever.
(You could extract those two 'lock'ed blocks into a single subroutine, but I'm leaving it like this for clarity with respect to A and B.)
static private object dictionaryLock = new object();
static private Dictionary<int, MyMessage> receivedA =
    new Dictionary<int, MyMessage>();
static private Dictionary<int, MyMessage> receivedB =
    new Dictionary<int, MyMessage>();

public void MessageHandler(MyMessage message)
{
    MyMessage matchingMessage = null;
    if (IsA(message))
    {
        InsertIntoDB(message);
        lock (dictionaryLock)
        {
            if (receivedB.TryGetValue(message.id, out matchingMessage))
            {
                receivedB.Remove(message.id);
            }
            else
            {
                receivedA.Add(message.id, message);
            }
        }
        if (matchingMessage != null)
        {
            InsertIntoDB(matchingMessage);
        }
    }
    else if (IsB(message))
    {
        lock (dictionaryLock)
        {
            if (receivedA.TryGetValue(message.id, out matchingMessage))
            {
                receivedA.Remove(message.id);
            }
            else
            {
                receivedB.Add(message.id, message);
            }
        }
        if (matchingMessage != null)
        {
            InsertIntoDB(message);
        }
    }
    else
    {
        // not A or B, do whatever
    }
}
If you're the only client of those queues, you could very easily add a timestamp as a message header (see the IDesign sample) and save the Sent On field (kind of like an Outlook message) in the database as well. You could then process them in the order they were sent (basically you move the sorting logic to the time of consumption).
Hope this helps,
Adrian
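The timestamp idea in the last answer can be sketched as a small reorder buffer. This is a minimal illustration in Python, assuming each message carries a sent-on timestamp and that waiting a short grace period is acceptable; `ReorderBuffer` and `grace_seconds` are hypothetical names, and nothing here is part of WCF or MSMQ.

```python
import heapq
import itertools


class ReorderBuffer:
    """Release messages in sent-timestamp order after a grace period."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal timestamps

    def add(self, sent_on, message):
        """Buffer a message together with the time it was sent."""
        heapq.heappush(self._heap, (sent_on, next(self._counter), message))

    def release(self, now):
        """Pop every message sent at least grace_seconds before now."""
        ready = []
        while self._heap and self._heap[0][0] <= now - self.grace_seconds:
            _, _, message = heapq.heappop(self._heap)
            ready.append(message)
        return ready


buf = ReorderBuffer(grace_seconds=2.0)
buf.add(10.5, "MSG-B")  # arrives first, but was sent second
buf.add(10.0, "MSG-A")
early = buf.release(now=11.0)    # nothing is old enough yet
ordered = buf.release(now=13.0)  # both are released, oldest first
```

The grace period is a trade-off: a longer wait tolerates more reordering but delays every insert, and a message that arrives later than the grace period allows would still be written out of order.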