channel message statistics with get_stats - telethon

I'm trying to get message statistics using Telethon's get_stats:
channel = '#test'

async with client:
    stats = await client.get_stats(entity=channel, message=92)
    print(stats.stringify())
But I keep getting ChatAdminRequiredError.
Can it be used only for channels where I am an admin?

From the docs:
Note that some restrictions apply before being able to fetch statistics, in particular the channel must have enough members (for megagroups, this requires at least 500 members).
and
If there are not enough members, (poorly named) errors such as telethon.errors.ChatAdminRequiredError will appear.
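A minimal sketch of handling that error around the same call (the channel username, api_id and api_hash below are placeholders, and the client is assumed to be authorized):
from telethon import TelegramClient, errors

api_id, api_hash = 12345, 'your-api-hash'  # placeholders
client = TelegramClient('session', api_id, api_hash)

async def fetch_stats():
    async with client:
        try:
            stats = await client.get_stats('your_channel_username', message=92)
            print(stats.stringify())
        except errors.ChatAdminRequiredError:
            # Raised when you lack admin rights in the channel, and also (per the
            # docs quoted above) when the channel doesn't meet the member threshold.
            print('Stats unavailable: admin rights or member threshold not met')

client.loop.run_until_complete(fetch_stats())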

Related

How to make sense of the People API's responses?

When calling People API's endpoints, especially in Batch requests, we're getting many different types of error responses.
Some have useful explanation in the error message, like:
Quota exceeded for quota metric 'Daily Contact Writes (Batch requests
cost 200 quota)' and limit 'Daily Contact Writes (Batch requests cost
200 quota) per day per user' of service 'people.googleapis.com' for
consumer 'project_number:XXX'.
That one you can detect and handle properly (e.g. wait 24 hours before retrying the request), but some are more cryptic, such as:
Resource has been exhausted (e.g. check quota).
This does mention rate-limiting, but for which quota? Is it per-user or per GCP project? When can we retry this?
Note that we're getting this for the first batch call when syncing a user account, so I'm guessing this is not per-user quota, but there's no mention of such rate-limits in the docs.
Specifically, having issues handling:
"Sync quota exceeded"
"Resource has been exhausted (e.g. check quota)"
"MY_CONTACTS_OVERFLOW_COUNT"
Here's what I have so far, feel free to edit this answer to add more insights:
Authentication or Google backend issues:
"invalid_grant": bad access token
"Insufficient Permission": access token doesn't contain required scope
"The service is currently unavailable.": Google issue
"Internal error encountered.": Google issue
"Authentication backend unavailable.": Google issue
Quota and rate-limiting:
"Sync quota exceeded": ???
"Quota exceeded for quota metric X": A specific quota had been exceeded (per min / daily will be part of the message)
"Resource has been exhausted (e.g. check quota)": ???
"MY_CONTACTS_OVERFLOW_COUNT": ???
Bad requests:
"Request contains an invalid argument": something is wrong in the request, usually a Person object with some illegal info item
"Request contains a person.etag that is different than the current person.etag": An attempt to update a person that was recently updated on Google's side, need to fetch again
"Request person.etag is different than the current person.etag": same as above
"Requested entity was not found": An attempt to update a no-longer existing person
"Contact person resources are not found": same as above
"Contact group name is empty, expected to be non empty": An attempt to create/update a group with an empty name.
"Contact group name already exists": An attempt to create a group with the same name
"MY_CONTACTS_OVERFLOW_COUNT" happens when you try to insert contacts to a google account, but they already have the maximum number of contacts.
I am not 100% sure, but this limit seems to be ~20,000 for "normal"/"free" google accounts.
edit- The limit is 25,000, since 2011: https://workspaceupdates.googleblog.com/2011/05/need-more-contacts-in-gmail-contacts.html
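Building on the categories above, here is a rough, unofficial sketch of one way to bucket these errors when using google-api-python-client: retry the quota/backend ones with exponential backoff and surface the rest. The helper names are made up, and the message snippets simply mirror the errors collected above:
import time
from googleapiclient.errors import HttpError

RETRYABLE_SNIPPETS = (
    "Quota exceeded for quota metric",
    "Resource has been exhausted",
    "Sync quota exceeded",
    "The service is currently unavailable",
    "Internal error encountered",
    "Authentication backend unavailable",
)

def execute_with_backoff(request, max_tries=5):
    for attempt in range(max_tries):
        try:
            return request.execute()
        except HttpError as err:
            if any(snippet in str(err) for snippet in RETRYABLE_SNIPPETS):
                time.sleep(2 ** attempt)  # exponential backoff before retrying
                continue
            raise  # auth errors and bad requests should not be blindly retried
    raise RuntimeError("People API request kept failing after retries")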

How should I avoid sending duplicate emails using mailgun, taskqueue and ndb?

I am using the taskqueue API to send multiple emails in small groups with Mailgun. My code looks more or less like this:
class CpMsg(ndb.Model):
    group = ndb.KeyProperty()
    sent = ndb.BooleanProperty()
    # Other properties


def send_mail(messages):
    """Sends a request to mailgun's API"""
    # Some code
    pass


class MailTask(TaskHandler):
    def post(self):
        p_key = utils.key_from_string(self.request.get('p'))

        msgs = CpMsg.query(
            CpMsg.group == p_key,
            CpMsg.sent == False).fetch(BATCH_SIZE)

        if msgs:
            send_mail(msgs)

            for msg in msgs:
                msg.sent = True
            ndb.put_multi(msgs)

        # Call the task again in COOLDOWN seconds
The code above has been working fine, but according to the docs, the taskqueue API guarantees that a task is delivered at least once, so tasks should be idempotent. Most of the time that holds for the code above, since it only fetches messages whose 'sent' property is False. The problem is that non-ancestor ndb queries are only eventually consistent, which means that if the task is executed twice in quick succession, the query may return stale results and include messages that were just sent.
I thought of including an ancestor for the messages, but since the sent emails will be in the thousands I'm worried that may mean having large entity groups, which have a limited write throughput.
Should I use an ancestor to make the queries? Or maybe there is a way to configure mailgun to avoid sending the same email twice? Should I just accept the risk that in some rare cases a few emails may be sent more than once?
One possible approach to avoid the eventual-consistency hurdle is to make the query a keys_only one, then iterate through the message keys and fetch the actual messages by key lookup (which is strongly consistent), skipping any message whose sent flag is already True. Something along these lines:
msg_keys = CpMsg.query(
    CpMsg.group == p_key,
    CpMsg.sent == False).fetch(BATCH_SIZE, keys_only=True)
if not msg_keys:
    return

msgs = ndb.get_multi(msg_keys)

msgs_to_send = []
for msg in msgs:
    if not msg.sent:
        msgs_to_send.append(msg)

if msgs_to_send:
    send_mail(msgs_to_send)

    for msg in msgs_to_send:
        msg.sent = True
    ndb.put_multi(msgs_to_send)
You'd also have to make your post call transactional (with the @ndb.transactional() decorator).
This should address the duplicates caused by the query's eventual consistency. However, there is still room for duplicates caused by transaction retries due to datastore contention (or any other reason), since the send_mail() call isn't idempotent. Sending one message at a time (maybe using the task queue) could reduce the chance of that happening. See also GAE/P: Transaction safety with API calls.
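One possible way to structure that (a sketch, not the answer's exact code): keep the eventually consistent query outside the transaction, and do the strongly consistent get/check/send/put inside a transactional helper. This assumes BATCH_SIZE stays within the cross-group transaction limit on entity groups (historically 25 in the legacy Datastore):
class MailTask(TaskHandler):
    def post(self):
        p_key = utils.key_from_string(self.request.get('p'))

        # Eventually consistent, but only used to find candidate keys.
        msg_keys = CpMsg.query(
            CpMsg.group == p_key,
            CpMsg.sent == False).fetch(BATCH_SIZE, keys_only=True)

        if msg_keys:
            self._send_batch(msg_keys)

    @ndb.transactional(xg=True)
    def _send_batch(self, msg_keys):
        # Key lookups are strongly consistent, so stale 'sent' flags are filtered out.
        msgs = ndb.get_multi(msg_keys)
        msgs_to_send = [m for m in msgs if m and not m.sent]
        if msgs_to_send:
            send_mail(msgs_to_send)
            for msg in msgs_to_send:
                msg.sent = True
            ndb.put_multi(msgs_to_send)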

How to interpret the RabbitMQ Message stats?

I want to get and historize queue metrics for "Enqueued, Dequeued and Size" (terminology I previously used with ActiveMQ).
The moving charts provided in the management plugin are not enough for the monitoring that I need to do.
So with RabbitMQ, I'm getting data from https://rabbitmq-server:15672/api/queues/myvhost
This returns JSON; for a queue, I can obtain real-life production data like:
"messages":0, // for "Size"
"message_stats":{
"deliver_get":171528, // for "Dequeued"
"ack":162348,
"redeliver":9513,
"deliver_no_ack":0,
"deliver":171528,
"get":0,
"publish":51293 // for "Enqueued"
(...)
I'm particularly surprised by the publish counter:
its value can even decrease between two measurements taken a couple of minutes apart (see the sample chart around 17:00).
As you can see in my data, deliver_get is significantly larger than publish.
https://my-rabbitmq:15672/doc/stats.html doesn't give a lot of details that could explain what I actually notice.
Also, the message_stats object that I obtain is missing some counters, like confirm and return, which could be related to enqueuing.
Are there relationships between these metrics? (Something like deliver_get + messages = redeliver + publish, but that one doesn't work with my figures.)
Is there another more detailed documentation about these metrics ?
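For the "historize" part, a simple sketch of what sampling those counters could look like (the host, credentials, vhost and queue name are placeholders; publish and deliver_get are cumulative counters, so deltas between polls give approximate enqueue/dequeue rates):
import time
import requests

URL = "https://rabbitmq-server:15672/api/queues/myvhost/myqueue"  # placeholder
AUTH = ("guest", "guest")  # placeholder credentials

def sample():
    q = requests.get(URL, auth=AUTH).json()
    stats = q.get("message_stats", {})
    return {
        "size": q.get("messages", 0),              # backlog ("Size")
        "published": stats.get("publish", 0),      # cumulative ("Enqueued")
        "delivered": stats.get("deliver_get", 0),  # cumulative ("Dequeued")
    }

prev = sample()
time.sleep(60)
curr = sample()
print("enqueued/min:", curr["published"] - prev["published"])
print("dequeued/min:", curr["delivered"] - prev["delivered"])
print("backlog:", curr["size"])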

Can a telegram bot block a specific user?

I have a Telegram bot that, for any received message, runs a program on the server and sends its result back. But there is a problem! If a user sends too many messages to my bot (spamming), it will make the server very busy!
Is there any way to block people who send more than 5 messages in a second, so their messages are no longer received? (using the Telegram API!)
Firstly, I have to say that the Telegram Bot API does not have such a capability itself, so you will need to implement it on your own. All you need to do is:
Count the number of messages that a user sends within a second, which won't be so easy without a database. But if you have a database with a table called Black_List and save all the messages with their sent time in another table, you'll be able to count the number of messages sent by one specific ChatID within a pre-defined time period (in your case, 1 second), check whether the count is bigger than 5, and if it is, insert that ChatID into the Black_List table. (A sketch of this approach follows after the note below.)
Every time the bot receives a message, it must run a database query to check whether the sender's ChatID exists in the Black_List table. If it does, the bot should ignore the message and continue with its own job (or even send an alert to the user saying "You're blocked", which I think can be time consuming).
Note that, as far as I know, the current Telegram Bot API doesn't have a feature to stop receiving messages, but as mentioned above you can ignore messages from spammers.
In order to save time, you should avoid making a database connection every time the bot receives an update (message). Instead, you can load the ChatIDs that exist in the Black_List into a DataSet and update that DataSet right after inserting a new spammer ChatID into the Black_List table. This way the number of queries is reduced noticeably.
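A compact sketch of that idea (the table and helper names are made up, and sqlite3 stands in for whatever database you use): count recent messages per ChatID in memory, persist spammers to a Black_List table, and keep the blacklist cached so most updates need no query.
import sqlite3
import time
from collections import defaultdict, deque

db = sqlite3.connect("bot.db")
db.execute("CREATE TABLE IF NOT EXISTS black_list (chat_id INTEGER PRIMARY KEY)")
blacklist = {row[0] for row in db.execute("SELECT chat_id FROM black_list")}
recent = defaultdict(deque)  # chat_id -> timestamps of recent messages

def should_ignore(chat_id):
    if chat_id in blacklist:
        return True
    now = time.time()
    times = recent[chat_id]
    times.append(now)
    while times and now - times[0] > 1.0:  # keep only the last second
        times.popleft()
    if len(times) > 5:                     # more than 5 messages in a second
        db.execute("INSERT OR IGNORE INTO black_list VALUES (?)", (chat_id,))
        db.commit()
        blacklist.add(chat_id)
        return True
    return False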
I have achieved it this way:
import cachetools
from datetime import datetime, timedelta

# Using TTLCache as a time-limited dict; you can adjust the ttl.
# `bot` is the pyTelegramBotAPI TeleBot instance created elsewhere.
ttl_cache = cachetools.TTLCache(maxsize=128, ttl=60)

def check_user_msg_frequency(message):
    print(ttl_cache)
    msg_cnt = ttl_cache[message.from_user.id]
    if msg_cnt > 3:
        now = datetime.now()
        until = now + timedelta(seconds=60 * 10)
        bot.restrict_chat_member(message.chat.id, message.from_user.id, until_date=until)

def set_user_msg_frequency(message):
    if not ttl_cache.get(message.from_user.id):
        ttl_cache[message.from_user.id] = 1
    else:
        ttl_cache[message.from_user.id] += 1
With these two functions, you can record how many messages each user sends within the period. If a user sends more messages than expected, they get restricted.
Then, every handler you define should call these two functions:
@bot.message_handler(commands=['start', 'help'])
def handle_start_help(message):
    set_user_msg_frequency(message)
    check_user_msg_frequency(message)
I'm using the pyTelegramBotAPI module to handle updates.
I know I'm late to the party, but here is another simple solution that doesn't use a DB:
Create a ConversationState class to attach to each Telegram id when they start to chat with the bot.
Then add a LastMessage DateTime variable to the ConversationState class.
Now, every time you receive a message, check whether enough time has passed since the LastMessage DateTime; if not enough time has passed, answer with a warning message.
You can also implement a timer that deletes the conversation state object if you are worried about performance.
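A rough sketch of that idea (ConversationState and MIN_INTERVAL are illustrative names, not from any library): keep the time of the last message per chat id and refuse to run the job when messages arrive too quickly.
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(seconds=1)  # adjust to taste

class ConversationState:
    def __init__(self):
        self.last_message = datetime.min

states = {}  # chat_id -> ConversationState

def allowed(chat_id):
    state = states.setdefault(chat_id, ConversationState())
    now = datetime.now()
    if now - state.last_message < MIN_INTERVAL:
        return False  # too soon: reply with a warning instead of running the program
    state.last_message = now
    return True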

Read timed out: synchronous query via BigQuery Java API

We are using the BigQuery Java API to retrieve results for our analytics reporting frontend. We are trying to retrieve the results synchronously. A lot of the time we get a Read timed out error, even before the query timeout specified in the parameters is reached. Here's the stack trace for a sample failure:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:830)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:787)
at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:697)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:36)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:410)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:460)
I am not able to retrieve the job id of the resulting job, as the error occurs before I can retrieve a JobReference object. The timeout specified in this case was 300 seconds, and the query failed well before it. The query contains three JOINs and several GROUP EACH BY clauses. Can you suggest a possible way to debug this?
Adding the code snippet:
QueryRequest queryInfo = new QueryRequest().setQuery(sql)
        .setTimeoutMs(timeOutInSec * 1000);

// get project id
BQGameConnectionDetails details = Config
        .getBQConnectionDetails(gameId);
String projectId = details.getProjectId();

Bigquery.Jobs.Query queryRequest = getInstance(gameId).jobs()
        .query(projectId, queryInfo);
QueryResponse response = queryRequest.execute();
There are two timeouts involved. The first is the timeout on the HTTP request you've sent to BigQuery; the second is the BigQuery request timeout. It sounds like you've set the latter to a large value, but the former is likely the timeout that you're hitting. If the HTTP request times out before the BigQuery timeout, the connection will be closed and BigQuery won't have a chance to respond.
There are two options. The first is to increase the HTTP request timeout (which depends on the libraries you're using, but this page here may be helpful). The second is to decrease the BigQuery timeout. This means you'll have to use jobs.getQueryResults() to read the actual results, but this is a more robust method because it doesn't matter how long the query takes: you can just call getQueryResults() in a loop. I would post a link to a good Java sample that does this, but I don't know that one exists, unfortunately.
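The question is about the Java client, but the second option boils down to the same two REST calls in any client. For illustration only, a rough Python sketch with the discovery-based client (service is assumed to be an authorized Bigquery v2 service object, and project_id/sql are yours):
import time

def run_query(service, project_id, sql):
    # Start the query with a short server-side wait so the HTTP call returns quickly.
    response = service.jobs().query(
        projectId=project_id,
        body={'query': sql, 'timeoutMs': 10 * 1000}).execute()
    job_id = response['jobReference']['jobId']

    # Poll getQueryResults until the job completes; each call waits at most
    # timeoutMs, so no single HTTP request runs long enough to hit a read timeout.
    while not response.get('jobComplete', False):
        time.sleep(1)
        response = service.jobs().getQueryResults(
            projectId=project_id, jobId=job_id, timeoutMs=10 * 1000).execute()

    return response.get('rows', [])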