JobQueue.run_repeating to run a function without command handler in Telegram - telegram-bot

I need to start sending notifications to a TG group. Before that, I want to run a function continuously which would query an API and store data in a DB. While this function is running, I want to be able to send notifications if they are available in the DB.
That's my code:
import telegram
from telegram.ext import Updater, CommandHandler, JobQueue

token = "tokenno:token"
bot = telegram.Bot(token=token)

def start(update, context):
    context.bot.send_message(chat_id=update.message.chat_id,
                             text="You will now receive msgs!")

def callback_minute(context):
    chat_id = context.job.context
    # Check in DB and send if new msgs exist
    send_msgs_tg(context, chat_id)

def callback_msgs():
    fetch_msgs()

def main():
    JobQueue.run_repeating(callback_msgs, interval=5, first=1, context=None)
    updater = Updater(token, use_context=True)
    dp = updater.dispatcher
    dp.add_handler(CommandHandler("start", start, pass_job_queue=True))
    updater.start_polling()
    updater.idle()

if __name__ == '__main__':
    main()
This code gives me the following error:
TypeError: run_repeating() missing 1 required positional argument: 'callback'
Any help would be greatly appreciated.

There are a few issues with your code; let me try to point them out:
1.
def callback_msgs(): fetch_msgs()
You use callback_msgs as the callback for your job. But job callbacks take exactly one argument of type telegram.ext.CallbackContext.
2.
JobQueue.run_repeating(callback_msgs, interval=5, first=1, context=None)
JobQueue is a class. To use run_repeating, which is an instance method, you'll need an instance of that class. In fact, the Updater already builds one for you; it's available as updater.job_queue in your case. So the call should look like this:
updater.job_queue.run_repeating(callback_msgs, interval=5, first=1, context=None)
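To see why calling the instance method on the class itself raises that exact TypeError, here is a minimal, library-free sketch; FakeJobQueue is a hypothetical stand-in, not the real telegram.ext.JobQueue:

```python
# Hypothetical stand-in mimicking an instance method like JobQueue.run_repeating
class FakeJobQueue:
    def run_repeating(self, callback, interval, first=None, context=None):
        return (callback.__name__, interval)

def callback_msgs(context):
    pass

# Calling on the class: callback_msgs is bound to `self`,
# so the `callback` parameter is left unfilled.
try:
    FakeJobQueue.run_repeating(callback_msgs, interval=5, first=1, context=None)
    err_msg = ""
except TypeError as e:
    err_msg = str(e)
print(err_msg)  # the same "missing ... 'callback'" complaint as in the question

# Calling on an instance works:
jq = FakeJobQueue()
result = jq.run_repeating(callback_msgs, interval=5)
print(result)  # ('callback_msgs', 5)
```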
3.
CommandHandler("start", start, pass_job_queue=True)
This is not, strictly speaking, an issue, but pass_job_queue=True has no effect at all, because you use use_context=True.
Please note that there is a nice tutorial on JobQueue over at the ptb-wiki. There is also an example on how to use it.
Disclaimer: I'm currently the maintainer of python-telegram-bot


How to call Python code when a new message is delivered from the Telethon API

How can I call some Python code when a new message is delivered from the Telethon API? I need to run the code all day so that I can do my processing from Python.
How do I use this? @client.on(events.NewMessage(chats=channel, incoming=True))
Do I need to run a scheduler to check this?
I am using the history = client(GetHistoryRequest) method.
First Steps - Updates in the documentation greets you with the following code:
import logging
logging.basicConfig(format='[%(levelname) 5s/%(asctime)s] %(name)s: %(message)s',
                    level=logging.WARNING)

from telethon import TelegramClient, events

client = TelegramClient('anon', api_id, api_hash)

@client.on(events.NewMessage)
async def my_event_handler(event):
    if 'hello' in event.raw_text:
        await event.reply('hi!')

client.start()
client.run_until_disconnected()
Note that you can "call" any Python code inside my_event_handler. It also shows how @client.on() is meant to be used. Note there is no need for a scheduler.
I am using history = client(GetHistoryRequest) method.
As a side note, this is raw API, which is discouraged when a friendlier alternative, like client.get_messages, exists.

Adding custom headers to all boto3 requests

I need to add some custom headers to every boto3 request that is sent out. Is there a way to manage the connection itself to add these headers?
For boto2, connection.AWSAuthConnection has a method build_base_http_request which has been helpful. I've yet to find an analogous function in the boto3 documentation, though.
This is pretty dated but we encountered the same issue, so I'm posting our solution.
I wanted to add custom headers to boto3 for specific requests.
I found this: https://github.com/boto/boto3/issues/2251 and used the event system to add the header:
def _add_header(request, **kwargs):
    request.headers.add_header('x-trace-id', 'trace-trace')
    print(request.headers)  # for debug

some_client = boto3.client(service_name=SERVICE_NAME)
event_system = some_client.meta.events
event_system.register_first('before-sign.EVENT_NAME.*', _add_header)
You can try using a wildcard for all requests:
event_system.register_first('before-sign.*.*', _add_header)
SERVICE_NAME: you can find all available services here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/index.html
For more information about registering a function for a specific event, see: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/events.html
The answer from @May Yaari is pretty awesome. To address the concern raised by @arainchi:
This works, there is no way to pass custom data to event handlers, currently we have to do it in a non-pythonic way using global variables/queues :( I have opened issue ticket with Boto3 developers for this exact case
Actually, we can leverage a functional-programming property of Python, returning a function from inside another function, to get around this.
In the case where we want to add a custom value custom_variable to the header, we could do:
def _register_callback(custom_variable):
    def _add_header(request, **kwargs):
        request.headers.add_header('header_name_you_want', custom_variable)
    return _add_header

some_client = boto3.client(service_name=SERVICE_NAME)
event_system = some_client.meta.events
event_system.register_first('before-sign.EVENT_NAME.*', _register_callback(custom_variable))
Or, in a more Pythonic way, using a lambda:
def _add_header(request, custom_variable):
    request.headers.add_header('header_name_you_want', custom_variable)

some_client = boto3.client(service_name=SERVICE_NAME)
event_system = some_client.meta.events
event_system.register_first('before-sign.EVENT_NAME.*',
                            lambda request, **kwargs: _add_header(request, custom_variable))
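The closure mechanics above can be demonstrated without boto3 at all. This sketch fakes the request object and the callback registry; FakeRequest and the header values are made up for illustration:

```python
# Stand-in for the request object boto3 passes to before-sign handlers
class FakeRequest:
    def __init__(self):
        self.headers = {}

def _register_callback(header_name, custom_variable):
    # The inner function captures header_name and custom_variable at
    # registration time and carries them along until it is called.
    def _add_header(request, **kwargs):
        request.headers[header_name] = custom_variable
    return _add_header

# Register a callback the way event_system.register_first would store it
callbacks = [_register_callback('x-trace-id', 'trace-123')]

req = FakeRequest()
for cb in callbacks:
    cb(req)
print(req.headers)  # {'x-trace-id': 'trace-123'}
```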

gRPC + Thread local issue

I'm building a gRPC server with Python and trying to have some thread-local storage handled with werkzeug's Local and LocalProxy, similar to what Flask does.
The problem I'm facing is that when I store some data in the local from a server interceptor and then try to retrieve it from the servicer, the local is empty. The real problem is that, for some reason, the interceptor runs in a different greenlet than the servicer, so it's impossible to share data across a request, since werkzeug.local's storage ends up with different keys for data that is supposed to belong to the same request.
The same happens with the Python threading library; it looks like the interceptors run on the main thread or a different thread from the servicers. Is there a workaround for this? I would have expected interceptors to run in the same thread, thus allowing for this sort of thing.
# Define a global somewhere
from werkzeug.local import Local
local = Local()

# From an interceptor, save something
local.message = "test msg"

# From the servicer, access it
local.service_var = "test"
print(local.message)  # this throws an AttributeError

# Print the content of local
print(local.__storage__)  # 2 entries in the storage, 2 different greenlets, but we are in the same request
The interceptor is indeed run on the serving thread, which is different from the handling thread. The serving thread is in charge of serving servicers and intercepting servicer handlers. After the servicer method handler is returned by the interceptors, the serving thread will submit it to the thread_pool at _server.py#L525:
# Take a unary-unary call as an example.
# The method_handler is the object returned by the interceptor.
def _handle_unary_unary(rpc_event, state, method_handler, thread_pool):
    unary_request = _unary_request(rpc_event, state,
                                   method_handler.request_deserializer)
    return thread_pool.submit(_unary_response_in_pool, rpc_event, state,
                              method_handler.unary_unary, unary_request,
                              method_handler.request_deserializer,
                              method_handler.response_serializer)
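The core symptom, data set on one thread being invisible from a thread-pool worker, can be reproduced with just the standard library. This sketch stands in for the interceptor/servicer split; it is not gRPC code:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

local = threading.local()
local.message = "set on the serving thread"  # what the interceptor does

def handler():
    # What the servicer sees: this runs on a pool thread,
    # whose thread-local storage is empty.
    return getattr(local, "message", None)

with ThreadPoolExecutor(max_workers=1) as pool:
    seen = pool.submit(handler).result()

print(seen)  # None: the pool thread has its own, empty storage
```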
As for a workaround, I can only imagine passing a storage instance to both the interceptor and the servicer during initialization. After that, the storage can be used as a member variable.
class StorageServerInterceptor(grpc.ServerInterceptor):
    def __init__(self, storage):
        self._storage = storage

    def intercept_service(self, continuation, handler_call_details):
        key = ...
        value = ...
        self._storage.set(key, value)
        ...
        return continuation(handler_call_details)

class Storage(...StorageServicer):
    def __init__(self, storage):
        self._storage = storage

    ...Servicer Handlers...
You can also wrap all the functions that will be called, set the thread-local value there, and return a new handler with the wrapped functions.
# A single module-level thread-local; creating threading.local() anew on
# every call would give each call a fresh, empty instance, so it must be
# created once and shared.
my_local = threading.local()

class MyInterceptor(grpc.ServerInterceptor):
    def wrap_handler(self, original_handler: grpc.RpcMethodHandler):
        if original_handler.unary_unary is not None:
            unary_unary = original_handler.unary_unary

            def wrapped_unary_unary(*args, **kwargs):
                my_local.my_var = "hello"
                return unary_unary(*args, **kwargs)

            new_unary_unary = wrapped_unary_unary
        else:
            new_unary_unary = None
        ...
        # Do this for all the combinations to make new_unary_stream,
        # new_stream_unary and new_stream_stream.
        new_handler = grpc.RpcMethodHandler()
        new_handler.request_streaming = original_handler.request_streaming
        new_handler.response_streaming = original_handler.response_streaming
        new_handler.request_deserializer = original_handler.request_deserializer
        new_handler.response_serializer = original_handler.response_serializer
        new_handler.unary_unary = new_unary_unary
        new_handler.unary_stream = new_unary_stream
        new_handler.stream_unary = new_stream_unary
        new_handler.stream_stream = new_stream_stream
        return new_handler

    def intercept_service(self, continuation, handler_call_details):
        return self.wrap_handler(continuation(handler_call_details))
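The wrapping trick works because the wrapper runs on the same handling thread as the original handler, so a module-level threading.local set in the wrapper is visible to the handler. A minimal, gRPC-free sketch:

```python
import threading

my_local = threading.local()

def wrap(handler):
    # Set the thread-local value just before the handler runs,
    # on the same thread the handler will run on.
    def wrapped(*args, **kwargs):
        my_local.my_var = "hello"
        return handler(*args, **kwargs)
    return wrapped

def handler():
    # Runs on the same thread as the wrapper, so the value is visible
    return my_local.my_var

wrapped_handler = wrap(handler)
print(wrapped_handler())  # hello
```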

How to access `request_seen()` inside Spider?

I have a Spider, and I have a situation where I want to check whether a request I am going to schedule has already been seen by request_seen() or not.
I don't want to check inside a downloader/spider middleware; I just want to check inside my Spider.
Is there any way to call that method?
You should be able to access the dupe filter itself from the spider like this:
self.dupefilter = self.crawler.engine.slot.scheduler.df
Then you could use that in other places to check:
req = scrapy.Request('whatever')
if self.dupefilter.request_seen(req):
    # it's already been seen
    pass
else:
    # never saw this one coming
    pass
I did something similar to yours with a pipeline. The following is the code that I use.
You should specify an identifier and then use it to check whether the item has been seen or not.
class SeenPipeline(object):
    def __init__(self):
        self.isbns_seen = set()

    def process_item(self, item, spider):
        if item['isbn'] in self.isbns_seen:
            raise DropItem("Duplicate item found: %s" % item)
        else:
            self.isbns_seen.add(item['isbn'])
            return item
Note: you can use this code within your spider, too.
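Outside Scrapy, the same seen-set logic can be exercised directly; here DropItem is a stand-in exception (and the ISBN a made-up value) so the sketch runs without Scrapy installed:

```python
# Stand-in for scrapy.exceptions.DropItem so this runs without Scrapy
class DropItem(Exception):
    pass

class SeenPipeline:
    def __init__(self):
        self.isbns_seen = set()

    def process_item(self, item, spider=None):
        if item['isbn'] in self.isbns_seen:
            raise DropItem("Duplicate item found: %s" % item)
        self.isbns_seen.add(item['isbn'])
        return item

pipe = SeenPipeline()
first = pipe.process_item({'isbn': '978-3-16-148410-0'})  # passes through
try:
    pipe.process_item({'isbn': '978-3-16-148410-0'})
    dropped = False
except DropItem:
    dropped = True
print(dropped)  # True: the second identical item is rejected
```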

Persistent connection in twisted

I'm new to Twisted and have one question: how can I organize a persistent connection in Twisted? I have a queue and check it every second; if it has something new, I send it to the client. I can't find anything better than calling dataReceived every second.
Here is the code of Protocol implementation:
class SyncProtocol(protocol.Protocol):
    # ... some code here

    def dataReceived(self, data):
        if self.orders_queue.has_new_orders():
            for order in self.orders_queue:
                self.transport.write(str(order))
        reactor.callLater(1, self.dataReceived, data)  # 1 second delay
It works how I need it to, but I'm sure it is a very bad solution. How can I do this in a different (flexible and correct) way? Thanks.
P.S. The main idea and algorithm:
1. Client connects to the server and waits
2. Server checks for updates and pushes data to the client if anything changes
3. Client does some operations and then waits for more data
Without knowing how the snippet you provided links into your internet.XXXServer or reactor.listenXXX (or XXXXEndpoint) calls, it's hard to make heads or tails of it, but...
First off, in normal use, a Twisted protocol.Protocol's dataReceived would only be called by the framework itself. It would be linked to a client or server connection, directly or via a factory, and it would be called automatically as data comes in on the given connection. (The vast majority of Twisted protocols and interfaces, if not all, are interrupt-based, not polling/callLater; that's part of what makes Twisted so CPU-efficient.)
So if your shown code is actually linked into Twisted via a Server or listen or Endpoint to your clients, then I think you will find very bad things will happen if your clients ever send data (because Twisted will call dataReceived for that, which, among other problems, would add extra reactor.callLater callbacks, and all sorts of chaos would ensue...)
If instead the code isn't linked into Twisted's connection framework, then you're attempting to reuse Twisted classes in a space they aren't designed for (I guess this seems unlikely, because I don't know how non-connection code would learn of a transport, unless you're setting it manually...)
The way I've been building models like this is to make a completely separate class for the polling-based I/O; after I instantiate it, I push my client-list (server) factory into the polling instance (something like mypollingthing.servfact = myserverfactory), thereby giving my polling logic a way to call into each client's .write (or, more likely, a def I built to abstract things to the correct level for my polling logic).
I tend to take the examples in Krondo's Twisted Introduction as one of the canonical examples of how to do Twisted (other than Twisted Matrix itself), and in the example in part 6, under "Client 3.0", PoetryClientFactory has an __init__ that sets a callback in the factory.
If I try to blend that with the twistedmatrix chat example and a few other things, I get:
(You'll want to change sendToAll to whatever your self.orders_queue.has_new_orders() is about)
#!/usr/bin/python
from twisted.internet import task
from twisted.internet import reactor
from twisted.internet.protocol import Protocol, ServerFactory

class PollingIOThingy(object):
    def __init__(self):
        self.sendingcallback = None  # Note I'm pushing sendToAll into here in main
        self.iotries = 0

    def pollingtry(self):
        self.iotries += 1
        print("Polling runs: " + str(self.iotries))
        if self.sendingcallback:
            self.sendingcallback("Polling runs: " + str(self.iotries) + "\n")

class MyClientConnections(Protocol):
    def connectionMade(self):
        print("Got new client!")
        self.factory.clients.append(self)

    def connectionLost(self, reason):
        print("Lost a client!")
        self.factory.clients.remove(self)

class MyServerFactory(ServerFactory):
    protocol = MyClientConnections

    def __init__(self):
        self.clients = []

    def sendToAll(self, message):
        for c in self.clients:
            c.transport.write(message)

def main():
    client_connection_factory = MyServerFactory()
    polling_stuff = PollingIOThingy()

    # The following line is what this example is all about:
    polling_stuff.sendingcallback = client_connection_factory.sendToAll
    # push the client connections' send def into my polling class

    # If you want to run something every second (instead of 1 second after
    # the end of your last code run, which could vary) do:
    l = task.LoopingCall(polling_stuff.pollingtry)
    l.start(1.0)
    # from: https://twistedmatrix.com/documents/12.3.0/core/howto/time.html

    reactor.listenTCP(5000, client_connection_factory)
    reactor.run()

if __name__ == '__main__':
    main()
To be fair, it might be better to inform PollingIOThingy of the callback by passing it as an arg to its __init__ (that is what is shown in Krondo's docs). For some reason, I tend to miss connections like this when I read code and find class-cheating easier to see, but that may just be my personal brain-damage.