I'm currently repeating a task in a for loop inside a callback using Twisted, but would like the reactor to break the loop in the callback (one) if the user issues a KeyboardInterrupt via Ctrl-C. From what I have tested, the reactor only stops or processes interrupts at the end of the callback.
Is there any way of sending a KeyboardInterrupt to the callback or the error handler in the middle of the callback run?
#!/usr/bin/env python
from twisted.internet import reactor, defer
def one(result):
    print "Start one()"
    for i in xrange(10000):
        print i
    print "End one()"
def oneErrorHandler(failure):
    print failure
    print "INTERRUPTING one()"
if __name__ == '__main__':
    d = defer.Deferred()
    d.addCallback(one)
    d.addErrback(oneErrorHandler)
    reactor.callLater(1, d.callback, 'result')
    try:
        reactor.run()
    except KeyboardInterrupt:
        print "Interrupted by keyboard. Exiting."
I got this working nicely. The SIGINT handler clears a running flag that every task in my code checks, and additionally calls reactor.callFromThread(reactor.stop) to stop any Twisted code that is running:
#!/usr/bin/env python
import sys
import twisted
import re
from twisted.internet import reactor, defer, task
import signal
def one(result, token):
    print "Start one()"
    for i in xrange(1000):
        print i
        if token.running is False:
            raise KeyboardInterrupt()
            #reactor.callFromThread(reactor.stop) # this doesn't work
    print "End one()"
def oneErrorHandler(failure):
    print "INTERRUPTING one(): Unknown Exception"
    import traceback
    print traceback.format_exc()
def oneKeyboardInterruptHandler(failure):
    print "INTERRUPTING one(): KeyboardInterrupt"
def repeatingTask(token):
    d = defer.Deferred()
    d.addCallback(one, token)
class Token(object):
    def __init__(self):
        self.running = True
def sayBye():
    print "bye bye."
if __name__ == '__main__':
    token = Token()
    def customHandler(signum, stackframe):
        print "Got signal: %s" % signum
        token.running = False # to stop my code
        reactor.callFromThread(reactor.stop) # to stop twisted code when in the reactor loop
    signal.signal(signal.SIGINT, customHandler)
    t2 = task.LoopingCall(reactor.callLater, 0, repeatingTask, token)
    reactor.addSystemEventTrigger('during', 'shutdown', sayBye)
This is intentional to avoid (semi-)preemption, since Twisted is a cooperative multitasking system. Ctrl-C is handled in Python with a SIGINT handler installed by the interpreter at startup. The handler sets a flag when it is invoked. After each byte code is executed, the interpreter checks the flag. If it is set, KeyboardInterrupt is raised at that point.
The reactor installs its own SIGINT handler. This replaces the behavior of the interpreter's handler. The reactor's handler initiates reactor shutdown. Since it doesn't raise an exception, it doesn't interrupt whatever code is running. The loop (or whatever) gets to finish, and when control is returned to the reactor, shutdown proceeds.
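The effect of a handler that does not raise can be seen without Twisted. The following is a minimal Python 3 sketch, not the reactor's actual handler: run_loop and flag_only_handler are hypothetical names, and signal.raise_signal (Python 3.8+) stands in for the user pressing Ctrl-C. It has to run in the main thread, since that is the only place Python allows signal handlers to be installed.

```python
import signal


def run_loop(iterations, interrupt_at):
    """Run a loop and send SIGINT to this process part-way through.

    Returns (iterations completed, whether the handler was invoked).
    """
    state = {"shutdown_requested": False}

    def flag_only_handler(signum, frame):
        # Only record the request; no exception is raised here,
        # so the code that was running simply continues.
        state["shutdown_requested"] = True

    previous = signal.signal(signal.SIGINT, flag_only_handler)
    completed = 0
    try:
        for i in range(iterations):
            if i == interrupt_at:
                signal.raise_signal(signal.SIGINT)  # simulated Ctrl-C
            completed += 1
    finally:
        # Put back whatever handler was installed before (if it was a
        # Python-level handler; None means it was installed from C).
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
    return completed, state["shutdown_requested"]


if __name__ == "__main__":
    print(run_loop(1000, interrupt_at=10))
```

The loop runs to completion even though the signal arrived early; the handler only leaves a note that shutdown was requested, which matches the behaviour seen inside a callback.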
If you'd rather have Ctrl-C (i.e. SIGINT) raise KeyboardInterrupt, you can restore Python's SIGINT handler using the signal module:
import signal
signal.signal(signal.SIGINT, signal.default_int_handler)
Note, however, that if you send a SIGINT while code from Twisted is running, rather than your own application code, the behavior is undefined, as Twisted does not expect to be interrupted by KeyboardInterrupt.
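To see what restoring the default handler changes, here is a small Python 3 sketch that does not involve Twisted at all. The no-op lambda is only a stand-in for a framework that has replaced the interpreter's handler, interrupted_at is a hypothetical helper name, and signal.raise_signal (Python 3.8+) simulates Ctrl-C. Treat it as an illustration under those assumptions rather than a description of the reactor's internals.

```python
import signal


def interrupted_at(iterations, interrupt_at):
    """Return the loop index at which KeyboardInterrupt arrived, or None."""
    # Stand-in for a framework that replaced Python's SIGINT handler
    # with one that does not raise.
    signal.signal(signal.SIGINT, lambda signum, frame: None)
    # Restore the interpreter's own handler, which raises KeyboardInterrupt.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    i = None
    try:
        for i in range(iterations):
            if i == interrupt_at:
                signal.raise_signal(signal.SIGINT)  # simulated Ctrl-C
    except KeyboardInterrupt:
        # The loop was cut short instead of running to the end.
        return i
    return None


if __name__ == "__main__":
    print("loop interrupted at iteration", interrupted_at(1000, 10))
```

With the default handler back in place, the loop stops near the iteration where the signal was sent instead of finishing all 1000 iterations.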
I'm a beginner in Python. Below is my Python code for a Telegram bot. It works on XAMPP, but I would like to host the bot in the cloud so that I don't need to start XAMPP's Apache and MySQL every time I want to use the bot. However, it stops working once it has been uploaded to Heroku. How can I fix this? Thank you in advance.
Modified for uploading to Heroku
import logging
from telegram.ext import Updater, CommandHandler, MessageHandler, Filters
import os
import mysql.connector
from typing import Dict
from telegram import ReplyKeyboardMarkup, Update, ReplyKeyboardRemove
from telegram.ext import (
PORT = int(os.environ.get('PORT', 5000))
# Enable logging
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
mydb = mysql.connector.connect(
query = mydb.cursor()
logger = logging.getLogger(__name__)
TOKEN = '5333685233:AAFr4-1nB6_I8ZMdt25Y4zBotHRA9I_qtMI'
# Define a few command handlers. These usually take the two arguments update and
# context. Error handlers also receive the raised TelegramError object in error.
def start(update, context):
    """Send a message when the command /start is issued."""
    update.message.reply_text('Hi! This is start')
def help(update, context):
    """Send a message when the command /help is issued."""
def sql(update, context):
    sql = "SELECT nama_item, jumlah_dalam_kg FROM data_penjualan_harian WHERE nama_item = 'Lemon'"
    query.execute(sql)
    sql_result = query.fetchall()
    pesan_balasan = ''
    for x in sql_result:
        pesan_balasan = pesan_balasan + str(x) + '\n'
    # tidy up the bot's reply
    # remove quotation marks
    pesan_balasan = pesan_balasan.replace("'","")
    # remove parentheses
    pesan_balasan = pesan_balasan.replace("(","")
    pesan_balasan = pesan_balasan.replace(")","")
    # remove commas
    pesan_balasan = pesan_balasan.replace(",","")
    update.message.reply_text(pesan_balasan)
def main():
    updater = Updater(TOKEN, use_context=True)
    # Get the dispatcher to register handlers
    dp = updater.dispatcher
    # on different commands - answer in Telegram
    dp.add_handler(CommandHandler("start", start))
    dp.add_handler(CommandHandler("help", help))
    # on noncommand i.e message - echo the message on Telegram
    dp.add_handler(MessageHandler(Filters.text, echo))
    # log all errors
    # Start the Bot
    updater.start_webhook(listen="0.0.0.0", port=PORT, url_path=TOKEN,
                          webhook_url='https://powerful-lowlands-14039.herokuapp.com/' + TOKEN)
    # Run the bot until you press Ctrl-C or the process receives SIGINT,
    # SIGTERM or SIGABRT. This should be used most of the time, since
    # start_polling() is non-blocking and will stop the bot gracefully.
    updater.idle()
if __name__ == '__main__':
    main()
from dotenv import load_dotenv
import os
from panda import *
from telegram.ext import *
from telegram.update import *
Token =os.getenv('TOKEN')
print("The bot connected .....")
# commands handler
# start message
def start_command(update,context):
    update.message.reply_text("Hello am mr panda am here to help you: ")
# help command
def help_command(update,context):
    res = panda.help()
# message handler
**def message_handle(update,context):
    message = str(update.message.text).lower()
    respose = panda.hello(message)
# error handler
def error(update,context):
    print(f"Update the context error : {context.error}")
# main function
def main():
    global message
    updater =Updater(Token,use_context=True)
    dp = updater.dispatcher
    # command handlers
    **# message handlers
    # error handlers
This is the code. I am getting this error:
Update the context error: local variable 'message' referenced before assignment
I think the error is in the highlighted portions. I searched a little and checked the documentation too, but I can't find the cause.
If anyone has a solution, that would be great :)
Using Scrapy, I'm implementing a CrawlSpider that will scrape all kinds of websites, including very slow ones that eventually time out.
My problem is that when such a twisted.internet.error.TimeoutError occurs, I want to trigger the errback of my spider. I don't want to raise this exception, and I also don't want to return a dummy Response object, which might suggest that scraping was successful.
Note that I have already made all of this work, but only with a "dirty" workaround:
myspider.py (excerpt)
class MySpider(CrawlSpider):
name = 'my-spider'
rules = (
callback='_my_callback', follow=True
def parse_start_url(self, response):
# (...)
def errback(self, failure):
log.warning('Failed scraping following link: {}'
middlewares.py (excerpt)
from twisted.internet.error import DNSLookupError, TimeoutError
# (...)
class MyDownloaderMiddleware(object):
    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s
    def process_request(self, request, spider):
        return None
    def process_response(self, request, response, spider):
        return response
    def process_exception(self, request, exception, spider):
        if (isinstance(exception, TimeoutError)
                or (isinstance(exception, DNSLookupError))):
            # just 2 examples of errors I want to catch
            # set status=500 to enforce errback() call
            return Response(request.url, status=500)
The settings should be fine, with my custom middleware already enabled.
Now, as you can see, by using return Response(request.url, status=500) I can trigger my errback() function in MySpider as desired. However, the status code 500 is very misleading: it is not only incorrect, but technically I never receive any response at all.
So my question is: how can I trigger my errback() function through DownloaderMiddleware.process_exception() in a clean way?
EDIT: I quickly figured out that I want the same behaviour for similar exceptions like DNSLookupError. I've updated the code snippets accordingly.
I didn't find it in the docs, but looking at the source I find DownloaderMiddleware.process_exception() can return twisted.python.failure.Failure objects as well as Request or Response objects.
This means you can return a Failure object to be handled by the errback by wrapping the exception in the Failure object.
This is cleaner than creating a fake Response object, see an example Middleware implementation that does this here: https://github.com/miguelsimon/site2graph/blob/master/site2graph/middlewares.py
The core idea:
from twisted.python.failure import Failure
class MyDownloaderMiddleware:
    def process_exception(self, request, exception, spider):
        return Failure(exception)
The __init__ method of the Rule class accepts a process_request parameter that you can use to attach an errback to a request:
class MySpider(CrawlSpider):
name = 'my-spider'
rules = (
# …
def process_request(self, request, response):
return request.replace(errback=self.errback)
def errback(self, failure):
I am able to call a scrapy spider from another Python script using either CrawlerRunner or CrawlerProcess. But, when I try to call the same spider calling class from a pywikibot robot, I get a ReactorNotRestartable error. Why is this and how can I fix it?
Here is the error:
File ".\scripts\userscripts\ReplicationWiki\RWLoad.py", line 161, in format_new_page
aea = AEAMetadata(url=DOI_url)
File ".\scripts\userscripts\ReplicationWiki\GetAEAMetadata.py", line 39, in __init__
reactor.run() # the script will block here until all crawling jobs are finished
File "C:\Users\lextr\.conda\envs\py37\lib\site-packages\twisted\internet\base.py", line 1282, in run
File "C:\Users\lextr\.conda\envs\py37\lib\site-packages\twisted\internet\base.py", line 1262, in startRunning
File "C:\Users\lextr\.conda\envs\py37\lib\site-packages\twisted\internet\base.py", line 765, in startRunning
raise error.ReactorNotRestartable()
CRITICAL: Exiting due to uncaught exception <class 'twisted.internet.error.ReactorNotRestartable'>
Here is the script which calls my scrapy spider. It runs fine if I just call the class from main.
from twisted.internet import reactor, defer
from scrapy import signals
from scrapy.crawler import Crawler, CrawlerProcess, CrawlerRunner
from scrapy.settings import Settings
from scrapy.utils.project import get_project_settings
from Scrapers.spiders.ScrapeAEA import ScrapeaeaSpider
class AEAMetadata:
    """
    Helper to run ScrapeAEA spider and return JEL codes and data links
    for a given AEA article link.
    """
    def __init__(self, *args, **kwargs):
        url = kwargs.get('url')
        if not url:
            raise ValueError('No article url given')
        self.items = []
        def collect_items(item, response, spider):
            self.items.append(item)
        settings = get_project_settings()
        crawler = Crawler(ScrapeaeaSpider, settings)
        crawler.signals.connect(collect_items, signals.item_scraped)
        runner = CrawlerRunner(settings)
        d = runner.crawl(crawler, url=url)
        d.addBoth(lambda _: reactor.stop())
        reactor.run() # the script will block here until all crawling jobs are finished
        #process = CrawlerProcess(settings)
        #process.crawl(crawler, url=url)
        #process.start() # the script will block here until the crawling is finished
    def get_jelcodes(self):
        jelcodes = self.items[0]['jelcodes']
        return jelcodes
def main():
    aea = AEAMetadata(url='https://doi.org/10.1257/app.20180286')
    jelcodes = aea.get_jelcodes()
if __name__ == '__main__':
    main()
Updated simple Test that instantiates the AEAMetadata class twice.
Here is the calling code in my pywikibot bot which fails:
from GetAEAMetadata import AEAMetadata
def main(*args):
    for _ in [1,2]:
        url = 'https://doi.org/10.1257/app.20170442'
        aea = AEAMetadata(url=url)
        print('After AEAMetadata')
        jelcodes = aea.get_jelcodes()
if __name__ == '__main__':
    main()
My call to AEAMetadata was embedded in a larger script which fooled me into thinking the AEAMetadata class was only instantiated once before failure.
In fact, AEAMetadata was called twice.
And, I also thought that the script would block after the reactor.run() because the comment in all the scrapy examples stated that was the case.
However, the second deferred callback is reactor.stop() which unblocks the reactor.run().
A more basic incorrect assumption was that the reactor was deleted and recreated on each iteration. In fact, the reactor is instantiated and initialized when it is first imported. And, it is a global object which lives as long as the underlying process and was not designed to be restarted. The extremes actually needed to delete and restart a reactor are described here:
So, I guess I've answered my own question.
And, I'm rewriting my script so it doesn't try to use the reactor in a way it was never intended to be used.
And, thanks Gallaecio for getting me thinking in the right direction.
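The failure mode described above can be modelled without Twisted or Scrapy. The sketch below is a toy stand-in and not Twisted's implementation: OneShotLoop, NotRestartable, and run_job are hypothetical names chosen for illustration. It shows why the second instantiation fails when every call shares one process-wide object that may only be run once.

```python
class NotRestartable(Exception):
    """Toy equivalent of an error raised when a finished loop is run again."""


class OneShotLoop:
    """A loop object that, like the reactor, can be run only once."""

    def __init__(self):
        self._started = False

    def run(self, job):
        if self._started:
            raise NotRestartable("loop was already run in this process")
        self._started = True
        return job()


# Created once at import time and shared by every caller, in the same
# way that `from twisted.internet import reactor` hands out one object.
loop = OneShotLoop()


def run_job(url):
    """Hypothetical helper that wrongly assumes a fresh loop per call."""
    return loop.run(lambda: "scraped " + url)


if __name__ == "__main__":
    print(run_job("https://doi.org/10.1257/app.20170442"))
    try:
        run_job("https://doi.org/10.1257/app.20170442")
    except NotRestartable as exc:
        print("second call failed:", exc)
```

The first call succeeds and the second one raises, which mirrors what happens when AEAMetadata is instantiated twice in the same process.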
I'm new to Scrapy, and my script ran well on Scrapy 0.24. After switching to the newly released 1.0, I ran into a logging problem. I want to set both the file and the console log level to INFO, but no matter how I set LOG_LEVEL or call configure_logging() (using Python's built-in logging package instead of scrapy.log), Scrapy always logs DEBUG-level information to the console, including the whole item object as a dict. In fact, the LOG_LEVEL option only takes effect for the external file. I suspect it has something to do with Python logging, but I have no idea how to configure it. Could anyone help me out?
This is how I configure my logging in run_my_spider.py:
from crawler.settings import LOG_FILE, LOG_FORMAT
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.utils.log import configure_logging
from crawler.spiders.MySpiders import MySpider
import logging
def run_spider(spider):
    settings = get_project_settings()
    # configure file logging
    # It ONLY works for the file
    configure_logging({'LOG_FORMAT': LOG_FORMAT,
                       'LOG_STDOUT' : True})
    # instantiate spider
    process = CrawlerProcess(settings)
    process.crawl(spider)
    logging.info('Running Crawler: ' + spider.name)
    process.start() # the script will block here until the spider_closed signal was sent
    logging.info('Crawler ' + spider.name + ' stopped.\n')
This is the console output:
DEBUG:scrapy.core.engine:Crawled (200) <GET http://mil.news.sina.com.cn/2014-10-09/0450804543.html>(referer: http://rss.sina.com.cn/rollnews/jczs/20141009.js)
{'item_name': 'item_sina_news_reply',
'news_id': u'jc:27-1-804530',
'reply_id': u'jc:27-1-804530:1',
'reply_lastcrawl': '1438605374.41',
'reply_table': 'news_reply_20141009'}
Many Thanks!
What you are seeing in the console may be Twisted's own log output.
Twisted prints DEBUG-level messages to the console.
You can redirect them to your log files using:
from twisted.python import log
observer = log.PythonLoggingObserver(loggerName='logname')
observer.start()
(As given in How to make Twisted use Python logging?)
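Once messages arrive through Python's logging module, the usual level controls apply to them. Below is a minimal stdlib-only sketch in Python 3, assuming the records are logged under the name 'logname' used in the snippet above; ListHandler and capture_at_info are hypothetical helpers that exist only to make the filtering visible, and the log messages are placeholders.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted records in a list so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def capture_at_info():
    """Send one DEBUG and one INFO record through 'logname' at INFO level."""
    logger = logging.getLogger("logname")
    handler = ListHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)  # DEBUG records from this logger are dropped
    logger.propagate = False       # keep records away from root/console handlers
    try:
        logger.debug("Crawled (200) <GET ...>")
        logger.info("Spider opened")
    finally:
        logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    print(capture_at_info())
```

Only the INFO record reaches the handler here. Whether this alone silences the console in a given Scrapy setup depends on which handlers are attached to the root logger, so treat it as a starting point.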