Telegram bot: ignore text-only messages

I want to reduce the number of requests from Telegram to my server and make the bot receive only messages with images, URLs or documents. I use webhook.
If this is achievable, then how?

It is possible to some extent through the allowed_updates parameter of the https://core.telegram.org/bots/api#setwebhook method:
List the types of updates you want your bot to receive. For example, specify [“message”, “edited_channel_post”, “callback_query”] to only receive updates of these types. See Update for a complete list of available update types. Specify an empty list to receive all updates regardless of type (default). If not specified, the previous setting will be used.
Sadly, though, it is not possible to receive only updates containing URLs, since a URL is just text as far as the API is concerned.
If you want to reduce the number of requests made, then you could attempt to use the getUpdates method instead. Sorry :/
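For illustration, here is a minimal sketch of setting allowed_updates with Python's requests library; the token and webhook URL are placeholders, not real values:

```python
# Minimal sketch: restrict a webhook to "message" updates only.
# BOT_TOKEN and WEBHOOK_URL are placeholders -- substitute your own values.
import json
import requests

BOT_TOKEN = "123456:ABC-your-bot-token"       # hypothetical token
WEBHOOK_URL = "https://example.com/telegram"  # hypothetical endpoint

resp = requests.post(
    f"https://api.telegram.org/bot{BOT_TOKEN}/setWebhook",
    data={
        "url": WEBHOOK_URL,
        # Only deliver "message" updates; filtering by content type
        # (image/URL/document) is NOT possible at this level.
        "allowed_updates": json.dumps(["message"]),
    },
)
print(resp.json())
```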

Related

Retrieving messages in chat application

I am working on an application for chatting. I am wondering how to design the API for getting messages from a server.
When the application loads the chat window, I display the last 20 messages from the server using the endpoint:
/messages?user1={user1Id}&user2={user2Id}&page=0
Secondly, I allow the user to load and display previous messages when they scroll to the top, using the same endpoint but with a different page (1):
/messages?user1={user1Id}&user2={user2Id}&page=1
But this design doesn't work correctly when users start to send messages to each other. The reason is that the endpoint returns the messages in descending order, so invoking it before and after sending/receiving a message gives different result sets.
My goal is to always get the 20 previous messages when a user scrolls through the conversation history.
How would you implement this in a clean way (including the REST API design)? I can imagine some solutions, but they seem dirty to me.
REST does not scale well for real-time applications; it takes too many resources to open an HTTP connection again and again. It is better to use WebSockets for this.
As for the problem above: if you start pagination from the latest message instead of the first message, it is no surprise that the result set changes. To keep the pages stable you need to send the id of the message you started with. For example: GET ...&count=20&latest=1 returns a list of 20 messages; you take the id of the latest one and paginate with GET ...&count=20&basePoint={messageId}&page=0, which always returns exactly the same page no matter how many new messages arrive. After that you continue with GET ...&count=20&basePoint={messageId}&page=1 and so on. You will have negative pages, e.g. GET ...&count=20&basePoint={messageId}&page=-1, if there are newer messages. Though I don't know any chat application which does pagination this way.
What I would do instead is GET ...&count=20&latest=1, take the 20th message and do GET ...&count=20&before={messageId_20th}; after that, take the last message of that list, which is the 40th, and do GET ...&count=20&before={messageId_40th}, and so on. To get the new messages you can do GET ...&count=20&latest=1 again, or GET ...&count=20&after={messageId_1st}.
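A minimal sketch of this before-cursor approach, assuming a Flask app and an in-memory list standing in for the database (both are illustrative, not from the question):

```python
# Sketch: cursor ("before") pagination over an in-memory message list.
# Flask and the MESSAGES list are illustrative assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Messages in ascending id order; in practice this would be a database table.
MESSAGES = [{"id": i, "text": f"message {i}"} for i in range(1, 1001)]

@app.get("/messages")
def get_messages():
    count = int(request.args.get("count", 20))
    before = request.args.get("before", type=int)

    if before is None:
        # latest=1 case: the newest `count` messages.
        page = MESSAGES[-count:]
    else:
        # Everything strictly older than the cursor, newest `count` of those.
        older = [m for m in MESSAGES if m["id"] < before]
        page = older[-count:]

    # Newest first, matching the descending order from the question.
    return jsonify(list(reversed(page)))
```

Because the cursor pins the page to a fixed message id, new messages arriving at the head of the conversation can never shift the older pages.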
None of the above is really good if you are using a caching mechanism on the client. I would add something like the key frames in videos: key messages. For example, every 20th message could be a key message. I would do caching based on the key message ids so the responses stay cacheable; GET ...&count=20&latest=1 would return the messages, and one in the middle of the list would have the property keyMessage=true. I would start the next request from the last key message, something like GET ...&count=20&before={messageId_lastKey}, so the response will be cacheable.
Another way of doing this is starting pagination from the very first message. So when you do GET ...&count=20&latest=1 it will return the index of the message too, something like "the 1234th message". You can do caching based on every 20th message just like in the key-message solution, but instead of message ids you can do GET ...&count=20&to=1220 and merge the pages on the client. Or if you really want to save on data, GET ...&from=1201&to=1220&before=1215, or GET ...&from=1201&to=1220&last=1214, or GET ...&from=1201&to=1214, etc. And continue normal pagination with GET ...&count=20&to=1200 or GET ...&from=1181&to=1200.
What is good about the fixed-pages approach above is that you can use range headers too. For example, GET .../latest would return the last 20 messages with the header Content-Range: messages 1215-1234/1234. After that you can do GET .../ with Range: messages=1201-1214, then GET .../ with Range: messages=1181-1200, and so on. When the total message count is updated in a response, e.g. Content-Range: messages 1181-1200/1245, you can automagically download the new messages with GET .../ with Range: messages=1235-1245, or with Range: messages=1235- if you expect the total of 1245 to change meanwhile. You can do the same thing with the URI too, but if there is a standard solution like range headers, it is better to use it instead of reinventing the wheel.
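For illustration, here is how a client might consume such range responses; the endpoint and the custom "messages" range unit follow the convention sketched above and are assumptions, not a registered standard:

```python
# Sketch: fetching message ranges with HTTP Range headers, using the
# custom "messages" unit described above (an assumption, not a standard unit).
import requests

BASE = "https://chat.example.com/messages"  # hypothetical endpoint

# First request: the latest page; the server answers with Content-Range.
resp = requests.get(f"{BASE}/latest")
content_range = resp.headers["Content-Range"]  # e.g. "messages 1215-1234/1234"
unit, rest = content_range.split(" ", 1)
span, total = rest.split("/")
first, last = (int(x) for x in span.split("-"))

# Scroll back: request the previous 20 messages by explicit range.
prev = requests.get(BASE, headers={"Range": f"messages={first - 20}-{first - 1}"})

# Poll for new messages: everything after the last index we have seen.
new = requests.get(BASE, headers={"Range": f"messages={last + 1}-"})
```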
As for the &user1={user1Id}&user2={user2Id} part, I would order the two ids alphabetically, so user1 always comes earlier in the alphabet than user2, e.g. &user1=B&user2=A -> &user1=A&user2=B. That way the URI is properly cacheable too; not necessarily on the client, but you can add a server-side cache, so if the two participating users request the same conversation in quick succession it won't be queried from the database again. Another way of doing this is adding a conversation id, which can be randomly unique or generated from the two user ids, but it must always be the same for the two users, e.g. /conversations/A+B or /conversations/x23542asda. I would do the latter, because if you start to support conversations with multiple users it is better to have a dedicated conversation identifier. Another thing you can support is having multiple topic-related conversations between the two users; then you won't need /conversations/A+B/{topic}, just use a unique id and search for conversations before creating a new one, so /conversations?participants[]=A&participants[]=B&topic={topic} will give you either an empty list or a list with a single conversation. This could also be used to list conversations of a single user, /conversations?participants[]=A, or conversations of a single user in a certain topic, /conversations?participants[]=A&topic={topic}.
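A tiny sketch of that canonicalization idea (the helper name is illustrative):

```python
# Sketch: derive one stable conversation id from a pair of user ids,
# regardless of argument order. The helper name is illustrative.
def conversation_id(user_a: str, user_b: str) -> str:
    first, second = sorted([user_a, user_b])
    return f"{first}+{second}"

assert conversation_id("B", "A") == conversation_id("A", "B") == "A+B"
```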
Generally speaking, for "system design" questions, there isn't any one correct answer. Multiple approaches can be considered, based on their pros and cons. You've mentioned you want a clean way to retrieve messages, so your expectation might be different from mine.
Here is a simple approach/idea to get you started:
Have a field in your database that can be used to order messages. It could be a timestamp, a message_id, or something else that you want. Let's call this m_id for now.
On the client, have a field that tracks the m_id of the earliest message currently available locally. This key (let's call it earliest_m_id) will be used to query the database, because you can simply fetch x messages before earliest_m_id.
This would solve your issue of incorrect messages being fetched. Using paging will not work, since pages will continuously change with the exchange of messages.
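A sketch of the corresponding query, assuming a SQLite table called messages with columns m_id and body (all names are illustrative):

```python
# Sketch: fetch the 20 messages before the client's earliest_m_id
# (keyset pagination). Table and column names are assumptions.
import sqlite3

def fetch_before(conn: sqlite3.Connection, earliest_m_id: int, limit: int = 20):
    rows = conn.execute(
        """
        SELECT m_id, body
        FROM messages
        WHERE m_id < ?
        ORDER BY m_id DESC      -- newest of the older messages first
        LIMIT ?
        """,
        (earliest_m_id, limit),
    ).fetchall()
    return rows
```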

WhatsApp messages order

We are working on our chatbot, which is connected to UIB. Some of our messages have a somewhat complex structure, and we need to split them up in order to send them in the proper order. Consider a single message with the content structure <text><image><text>. In order to send it to a WhatsApp user, we need to split the content into three messages (#1 <text>; #2 <image>; #3 <text>). If we send these messages one by one, the WhatsApp client might receive them in the order <text><text><image>, because posting images takes longer than posting text messages. We have a workaround (adding a delay between requests), but images can be big, so sending them takes very long. We could keep increasing the delay, but that is not a good way to do such things.
So, my question is the following:
Is it possible to make a request to check the message status, i.e. whether it has been delivered to the WhatsApp servers or not? It doesn't actually matter whether the message was delivered to the end user, because users might be offline. We just need to know that the messages have reached the WhatsApp server in the proper order.
You'll get the read/delivery status in the webhook URL.
Please contact support#uib.ai for further clarifications.
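If the webhook does report delivery status, one way to enforce ordering is to queue the remaining parts and send the next one only after the previous is confirmed. Here is a sketch under those assumptions; the Flask app, send_message placeholder, and payload shape are all illustrative, not the UIB API:

```python
# Sketch: send split message parts strictly in order by waiting for the
# provider's delivery-status webhook between parts. Flask, send_message,
# and the webhook payload shape are illustrative assumptions.
from collections import deque
from flask import Flask, request

app = Flask(__name__)

# Remaining parts of one logical message, in the order they must arrive.
outbox = deque(["<text part 1>", "<image>", "<text part 2>"])

def send_message(part: str) -> None:
    """Placeholder for the real provider call."""
    print(f"sending: {part}")

@app.post("/status-webhook")
def status_webhook():
    event = request.get_json()
    # Assumed payload: {"status": "delivered", ...}; only after the previous
    # part is confirmed do we release the next one.
    if event.get("status") == "delivered" and outbox:
        send_message(outbox.popleft())
    return "", 204

# Kick off the sequence with the first part.
send_message(outbox.popleft())
```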

Message versioning in RabbitMQ / AMQP?

What is the recommended way to deal with message versioning? The main schools of thought appear to be:
Always create a new message class as the message structure changes
Never use (pure) serialized objects as a message. Always use some kind of version header field and a byte stream body field. In this way, the receiver can always accept the message and check the version number before attempting to read the message body.
Never use binary serialized objects as a message. Instead, use a textual form such as JSON. In this way, the receiver can always accept the message, check the version number, and then (when possible) understand the message body.
As I want to keep my messages compact, I am considering using Google Protocol Buffers, which would allow me to satisfy both 2 & 3.
However, I am interested in real-world experiences and advice on how to handle the versioning of messages as their structure changes.
In this case "version" will be basically some metadata about the message, And these metadata are some instruction/hints to the processing algorithm. So I willsuggest to add such metadata in the header (outside of the payload), so that consumer can read the metadata first before trying to read/understand and process the message payload. For example, if you keep the version info in the payload and due to some reason your (message payload is corrupted) then algorithm will fail parse the message, then it can not event reach the metadata you have put there.
You may consider to have both version and type info of the payload in one header.
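For illustration, a sketch using pika (a Python RabbitMQ client); the queue name, header keys, and payload are assumptions:

```python
# Sketch: publish a message with version/type metadata carried in AMQP
# headers instead of the payload. Queue and header names are assumptions.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="orders")

body = json.dumps({"order_id": 42, "total": "19.99"}).encode()

channel.basic_publish(
    exchange="",
    routing_key="orders",
    body=body,
    properties=pika.BasicProperties(
        content_type="application/json",
        # Consumers inspect these headers before touching the payload.
        headers={"message_type": "OrderCreated", "version": "2"},
    ),
)
connection.close()
```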

Event sourcing combined with deletable data

I want to develop an Emailer microservice which provides a SendEmail command. Inside the microservice I have an aggregate in mind which represents the whole email process with the following events:
Aggregate Email:
(EmailCreated)
EmailDeliveryStarted
EmailDeliveryFailed
EmailRecipientDelivered when one of the recipients received the email
EmailRecipientDeliveryFailed when one of the recipients could not receive the email
etc.
In the background the email delivery service SendGrid is used; my microservice works like a facade for that with my own events. The incoming webhooks from SendGrid are translated to proper domain events.
The process would look like this:
Command SendEmail ==> EmailCreated
EmailCreatedHandler ==> Email.Send (to SendGrid)
Incoming webhook ==> EmailDeliveryStarted
Further webhooks ==> EmailRecipientDelivered, EmailRecipientDeliveryFailed, etc.
Of course, if I wanted to replace the external web service and it applied other messaging strategies, I would adapt to that but keep my domain model with its events. I don't want the client to worry about the concrete email delivery strategy.
Now the crucial problem I face: I want to accept SendEmail commands even if SendGrid is not available at that very moment, which entails storing the whole email data (with attachments) and then, with an event handler, starting the sending process. On the other hand, I don't want to bloat my initial EmailCreated event with this BLOB data. And I want to be able to clean up this data after SendGrid has accepted my send request.
I could also try both sending the email to SendGrid and storing an initial EmailDeliveryStarted event in the SendEmail command handler. But this feels like a two-phase commit: if SendGrid accepted my call but somehow my repository was unable to store the EmailDeliveryStarted event, the client would be informed that something went wrong and would try again, which would be a disaster.
So I don't know how to design my aggregate and, more importantly, my EmailCreated event, since it should not contain BLOB data like attachments.
I found this question interesting, and it took me a little while to reflect on it.
First things first: I do not see an obligation to store the email attachments in the event. You can simply store the fully qualified names of the attached files. That keeps the event log smaller and perhaps rules out the need for "deleting" the event (and you know that, in an event-sourced model, you should not do that).
Secondly, assuming that the project is not building an e-mail client, I don't see a need to model an e-mail as an aggregate root. Aggregate roots represent business-relevant domain concepts, not utility tasks like sending an e-mail. You could model this far more easily with a database table / document that keeps track of what has been sent and what hasn't been yet. Sending e-mails through SendGrid is a reaction to a business event, certainly something to be tracked, but not an aggregate root in its own right.
Lastly, if you want to accept SendEmail commands even when SendGrid is offline, have the aggregate emit an EmailQueued event. The EmailQueuedHandler adds a row to the read model of a sender process that takes all emails in the queued state and batches them for sending (see the sketch after this list). If the communication with SendGrid fails, you can either:
Do nothing; the sender process will pick up the email at the next attempt
Emit an EmailSendFailed event, intercepted by a handler that increases the retry count (if you want to stop after a number of retries).
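A minimal sketch of the queued-email idea, with attachment references instead of blobs; all class, field, and method names here are illustrative, not from the original design:

```python
# Sketch: an EmailQueued event carries references to stored attachments,
# never the blobs themselves. Names and storage are illustrative.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EmailQueued:
    email_id: str
    recipients: list
    subject: str
    body_ref: str                                  # pointer into blob storage
    attachment_refs: list = field(default_factory=list)

class EmailQueuedHandler:
    def __init__(self, read_model, blob_store):
        self.read_model = read_model               # e.g. a "pending emails" table
        self.blob_store = blob_store

    def handle(self, event: EmailQueued) -> None:
        # Record the email as pending; a separate sender process batches
        # pending rows, calls SendGrid, and deletes the blobs on acceptance.
        self.read_model.insert_pending(event.email_id, event)
```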
Hope that is sufficiently clear and best of luck with your project.

Building REST API - separate requests

I am building an API and am a little unsure whether it would be better to have one request that brings back all information relating to a resource, or to bring information back separately according to the tasks that need carrying out. For example, I have a messages resource and am struggling to decide whether to bring back all message information in one go, OR to have a separate request for unread messages, a separate request for a list of messages, and another request for a single message.
What is the proper way? I am tempted to keep them all separate, but then I worry about having to make too many requests.
Stop worrying and just do.
I like to keep things separate in the beginning, and at some point I realise that request x is always followed by request y, so I'll just merge those two. You won't know what you'll need until you're working on it...
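As an illustration of the "keep them separate" starting point, here is a sketch with Flask; the routes and in-memory data are assumptions:

```python
# Sketch: separate endpoints for the message list, unread messages,
# and a single message. Flask and the data store are assumptions.
from flask import Flask, jsonify, abort

app = Flask(__name__)

MESSAGES = {
    1: {"id": 1, "text": "hello", "read": True},
    2: {"id": 2, "text": "are you there?", "read": False},
}

@app.get("/messages")
def list_messages():
    return jsonify(list(MESSAGES.values()))

@app.get("/messages/unread")
def unread_messages():
    return jsonify([m for m in MESSAGES.values() if not m["read"]])

@app.get("/messages/<int:message_id>")
def single_message(message_id: int):
    message = MESSAGES.get(message_id)
    if message is None:
        abort(404)
    return jsonify(message)
```

If two of these endpoints always end up being called together, merging them later is a small change on the server and the client.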