I have a small web server written using Twisted. One of the things I want to do is have it return a result from another web server as the response to loading a page. That is, the response to render_GET() at server A (via http://A.com/resource) should be the content of a different URL at server B (via http://B.com/resource2). The content returned by server B is dynamic, so I can't just cache it.
Right now, server A can render pages just fine, it just can't render this remote resource. I've tried with Agent(), but I can't seem to get the response from B let alone forward it to A. I know that somewhere I have to take that request from the render_GET and later write() and finish() it. That's done in the cbBody callback, which get called but can't get to the original request to populate it.
Here's a piece of the code for server A's resource handler:
def render_GET(self,request):
# try with canned content just to test the whole thing
bmpServer = BMPServer(ServerBURL,
"xyzzy",
"plugh")
d= bmpServer.postNotification({"a":123},request)
print "Deferred", d
return NOT_DONE_YET
And this is the other code at server A:
theRequest = None
def cbRequest(response,args):
print "response called"
print response
print args
print 'Response version:', response.version
print 'Response code:', response.code
print 'Response phrase:', response.phrase
print 'Response headers:'
print pformat(list(response.headers.getAllRawHeaders()))
d = readBody(response)
d.addCallback(cbBody)
return d
def cbBody(body):
print "Response body:"
print body
theRequest.write(body)
theRequest.finish()
theRequest = None
def cbError(failure):
print type(failure.value), failure # catch error here
print failure.value.reasons[0].printTraceback()
class BMPServer(object):
def __init__(self,url,arg1,arg2):
self.url = url
self.arg1 = arg1
self.arg2 = arg2
def postNotification(self,message,request):
theRequest = request
bmpMessage = {'arg1':self.token,
'arg2':self.appID,
'message':message}
print "Sending request to %s"%self.url
print "Create agent"
agent = Agent(reactor)
print "create request deferred"
print "url = %s" % self.url
d = agent.request('POST', self.url,
Headers({'User-Agent': ['Twisted Web Client Example']}),
MessageProducer(bmpMessage))
print "adding callback"
d.addCallbacks(cbRequest,cbError)
print "returning deferred"
return d
When I run this as a standalone code (outside of the resource, using react() for example), it works fine. However, when I try to include it as shown above it just never seems to receive the data. I've got WireShark running so I can see the response is being returned from Server B, but the data never shows up in cbRequest().
For example, here's the output I see:
Sending request to http://localhost:8888/postMGCMNotificationService
Create agent
create request deferred
url = http://serverB:8888/postService
Message producer: body = {"arg2": "plugh", "arg1": "xyzzy", "message": {"a": 1}}
adding callback
returning deferred
testAgent: returning deferred
<Deferred at 0x10b54d290>
Writing body now
response called
<twisted.web._newclient.Response object at 0x1080753d0>
Response version: ('HTTP', 1, 1)
Response code: 200
Response phrase: OK
Response headers:
Response body:
{"result": false}
^CUnhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/Twisted-13.1.0_r39314-py2.7-macosx-10.8-intel.egg/twisted/web/_newclient.py", line 1151, in _bodyDataFinished_CONNECTED
self._bodyProtocol.connectionLost(reason)
File "/Library/Python/2.7/site-packages/Twisted-13.1.0_r39314-py2.7-macosx-10.8-intel.egg/twisted/web/client.py", line 1793, in connectionLost
self.deferred.callback(b''.join(self.dataBuffer))
File "/Library/Python/2.7/site-packages/Twisted-13.1.0_r39314-py2.7-macosx-10.8-intel.egg/twisted/internet/defer.py", line 382, in callback
self._startRunCallbacks(result)
File "/Library/Python/2.7/site-packages/Twisted-13.1.0_r39314-py2.7-macosx-10.8-intel.egg/twisted/internet/defer.py", line 490, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/Library/Python/2.7/site-packages/Twisted-13.1.0_r39314-py2.7-macosx-10.8-intel.egg/twisted/internet/defer.py", line 577, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "AServer.py", line 85, in cbBody
print theRequest
exceptions.UnboundLocalError: local variable 'theRequest' referenced before assignment
Looking at this a little more, it seems that if I could figure out a way to get the request over to cbBody() this would all work just fine.
You can pass extra arguments to callbacks on a Deferred:
d.addCallback(f, x)
When d fires, the result is f(result of d, x). You can pass as many positional or keyword arguments as you like this way. See the API documentation for Deferred for more details.
Related
I have an API which gets the success or error message on console.I am new to python and trying to read the response. Google throws so many examples to use subprocess but I dont want to run,call any command or sub process. I just want to read the output after below API call.
This is the response in console when success
17:50:52 | Logged in!!
This is the github link for the sdk and documentation
https://github.com/5paisa/py5paisa
This is the code
from py5paisa import FivePaisaClient
email = "myemailid#gmail.com"
pw = "mypassword"
dob = "mydateofbirth"
cred={
"APP_NAME":"app-name",
"APP_SOURCE":"app-src",
"USER_ID":"user-id",
"PASSWORD":"pw",
"USER_KEY":"user-key",
"ENCRYPTION_KEY":"enc-key"
}
client = FivePaisaClient(email=email, passwd=pw, dob=dob,cred=cred)
client.login()
In general it is bad practice to get a value from STDOUT. There are some ways but it's pretty tricky (it's not made for it). And the problem doesn't come from you but from the API which is wrongly designed, it should return a value e.g. True or False (at least) to tell you if you logged in, and they don't do it.
So, according to their documentation it is not possible to know if you're logged in, but you may be able to see if you're logged in by checking the attribute client_code in the client object.
If client.client_code is equal to something then it should be logged in and if it is equal to something else then not. You can try comparing it's value when you successfully login or when it fails (wrong credential for instance). Then you can put a condition : if it is None or False or 0 (you will have to see this by yourself) then it is failed.
Can you try doing the following with a successful and failed login:
client.login()
print(client.client_code)
Source of the API:
# Login function :
# (...)
message = res["body"]["Message"]
if message == "":
log_response("Logged in!!")
else:
log_response(message)
self._set_client_code(res["body"]["ClientCode"])
# (...)
# _set_client_code function :
def _set_client_code(self, client_code):
try:
self.client_code = client_code # <<<< That's what we want
except Exception as e:
log_response(e)
Since this questions asks how to capture "stdout" one way you can accomplish this is to intercept the log message before it hits stdout.
The minimum code to capture a log message within a Python script looks this:
#!/usr/bin/env python3
import logging
logger = logging.getLogger(__name__)
class RequestHandler(logging.Handler):
def emit(self, record):
if record.getMessage().startswith("Hello"):
print("hello detected")
handler = RequestHandler()
logger.addHandler(handler)
logger.warning("Hello world")
Putting it all together you may be able to do something like this:
import logging
from py5paisa import FivePaisaClient
email = "myemailid#gmail.com"
pw = "mypassword"
dob = "mydateofbirth"
cred={
"APP_NAME":"app-name",
"APP_SOURCE":"app-src",
"USER_ID":"user-id",
"PASSWORD":"pw",
"USER_KEY":"user-key",
"ENCRYPTION_KEY":"enc-key"
}
client = FivePaisaClient(email=email, passwd=pw, dob=dob,cred=cred)
class PaisaClient(logging.Handler):
def __init__():
self.loggedin = False # this is the variable we can use to see if we are "logged in"
def emit(self, record):
if record.getMessage().startswith("Logged in!!")
self.loggedin = True
def login():
client.login()
logging.getLogger(py5paisa) # get the logger for the py5paisa library
# tutorial here: https://betterstack.com/community/questions/how-to-disable-logging-from-python-request-library/
logging.basicConfig(handlers=[PaisaClient()], level=0, force=True)
c = PaisaClient()
c.login()
I'm trying to run the sample program from this RedisLabs page.
I chose Option A - which was to set up the free Redis cloud server.
(Seems like if you install manually, then you have to add the JSON as a plugin.)
I'm able to connect and use other "set" commands, but getting error on JSON:
File "C:\Users\nwalt\.virtualenvs\TDAmeritradeGetQuotes\lib\site-packages\redis\client.py", line 901, in execute_command
return self.parse_response(conn, command_name, **options)
File "C:\Users\nwalt\.virtualenvs\TDAmeritradeGetQuotes\lib\site-packages\redis\client.py", line 915, in parse_response
response = connection.read_response()
File "C:\Users\nwalt\.virtualenvs\TDAmeritradeGetQuotes\lib\site-packages\redis\connection.py", line 756, in read_response
raise response
redis.exceptions.ResponseError: unknown command 'JSON.SET'
My Python test program (except put in the sample endpoint before posting):
import redis
import json
import pprint
host_info = "redis.us-east-1-1.ec2.cloud.redislabs.com"
redisObj = redis.Redis(host=host_info, port=18274, password='xxx')
print ("Normal call to Redis")
redisObj.set('foo', 'bar')
value = redisObj.get('foo')
print(value)
capitals = {
"Lebanon": "Beirut",
"Norway": "Oslo",
"France": "Paris"
}
print ("capitals - before call to Redis")
pprint.pprint(capitals)
print("JSON call to Redis")
redisObj.execute_command('JSON.SET', 'doc', '.', json.dumps(capitals))
print("Data Saved, now fetch data back from redis")
reply = json.loads(redisObj.execute_command('JSON.GET', 'doc'))
print("reply from Redis get")
pprint.pprint(reply)
This is the screen shot from their website where I created the database. I didn't see any option to enable JSON or add any modules.
Not sure this was available when I created the REDIS database, but it is now. When you create it on redislabs.com, you can turn on the modules, and pick one from the list.
Then use this library: "rejson" from https://pypi.org/project/rejson/ to get the method "jsonset" method, using such code such as this:
rj = Client(host=config_dict['REDIS_CONFIG_HOST'], port=config_dict['REDIS_CONFIG_PORT'], password=config_dict['REDIS_CONFIG_PASSWORD'], decode_responses=True)
out_doc = {}
out_doc['firstname'] = "John"
out_doc['lastname'] = "Doe"
rj.jsonset('config', Path.rootPath(), out_doc)
get_doc = rj.jsonget('config', Path.rootPath())
pprint.pprint(get_doc)
I'm not used the cloud redis, in my local the Python don't load the JSON.SET
I just so make done, in this sample https://onelinerhub.com/python-redis/save-json-to-redis
first.feature
Given ur ''
def payload = read('')
request payload
soap action ''
value = /Envelope/Body/Response/Result/Num
print value # prints value correctly as expected
second.feature
Background:
*def fetch = read('first.feature')
*def data = call fetch
Scenario:
print data.response # prints the soap response in json format.
def res = data.response
print res["s:Envelope"][""]["s:Body"]["Response"][""]["Result"]["_"]["a:num']
first.feature works as expected ( response is in soap )
When I try to call this feature in another feature then the response is in json format.
I want to use a value from this response to pass it on to another request.
I had to use res["s:Envelope"]["_"]["s:Body"][][].. to get to that.
Is there a way to easily fetch a value from this response as we do in first.feature?
Please could anyone let me know how to achieve this.
Make this change:
* xml res = data.response
We will be improving this in the next version, it would be good if you can test the develop branch and confirm: https://github.com/intuit/karate/wiki/Developer-Guide
I have a question about using apollo-upload-client and graphene-django. Here I've discovered that apollo-upload-client adding operations to formData. But here graphene-django is only trying to get query parameter. And the question is, where and how it should be fixed?
If you're referring to the data that has a header like (when viewing the HTTP from Chrome tools):
Content-Disposition: form-data; name="operations"
and data like
{"operationName":"MyMutation","variables":{"myData"....}, "query":"mutation MyMutation"...},
the graphene-python library interprets this and assembles it into a query for you, inserting the variables and removing the file data from the query. If you are using Django, you can find all of the uploaded files in info.context.FILES when writing a mutation.
Here's my solution to support the latest apollo-upload-client (8.1). I recently had to revisit my Django code when I upgraded from apollo-upload-client 5.x to 8.x. Hope this helps.
Sorry I'm using an older graphene-django but hopefully you can update the mutation syntax to the latest.
Upload scalar type (passthrough, basically):
class Upload(Scalar):
'''A file upload'''
#staticmethod
def serialize(value):
raise Exception('File upload cannot be serialized')
#staticmethod
def parse_literal(node):
raise Exception('No such thing as a file upload literal')
#staticmethod
def parse_value(value):
return value
My upload mutation:
class UploadImage(relay.ClientIDMutation):
class Input:
image = graphene.Field(Upload, required=True)
success = graphene.Field(graphene.Boolean)
#classmethod
def mutate_and_get_payload(cls, input, context, info):
with NamedTemporaryFile(delete=False) as tmp:
for chunk in input['image'].chunks():
tmp.write(chunk)
image_file = tmp.name
# do something with image_file
return UploadImage(success=True)
The heavy lifting happens in a custom GraphQL view. Basically it injects the file object into the appropriate places in the variables map.
def maybe_int(s):
try:
return int(s)
except ValueError:
return s
class CustomGraphqlView(GraphQLView):
def parse_request_json(self, json_string):
try:
request_json = json.loads(json_string)
if self.batch:
assert isinstance(request_json,
list), ('Batch requests should receive a list, but received {}.').format(
repr(request_json))
assert len(request_json) > 0, ('Received an empty list in the batch request.')
else:
assert isinstance(request_json, dict), ('The received data is not a valid JSON query.')
return request_json
except AssertionError as e:
raise HttpError(HttpResponseBadRequest(str(e)))
except BaseException:
logger.exception('Invalid JSON')
raise HttpError(HttpResponseBadRequest('POST body sent invalid JSON.'))
def parse_body(self, request):
content_type = self.get_content_type(request)
if content_type == 'application/graphql':
return {'query': request.body.decode()}
elif content_type == 'application/json':
return self.parse_request_json(request.body.decode('utf-8'))
elif content_type in ['application/x-www-form-urlencoded', 'multipart/form-data']:
operations_json = request.POST.get('operations')
map_json = request.POST.get('map')
if operations_json and map_json:
operations = self.parse_request_json(operations_json)
map = self.parse_request_json(map_json)
for file_id, f in request.FILES.items():
for name in map[file_id]:
segments = [maybe_int(s) for s in name.split('.')]
cur = operations
while len(segments) > 1:
cur = cur[segments.pop(0)]
cur[segments.pop(0)] = f
logger.info('parse_body %s', operations)
return operations
else:
return request.POST
return {}
I save crawled urls in Mysql database. When scrapy crawls sites again, the schedule or the downloader should only hit/crawl/download the page if its url is not in database.
#settings.py
DOWNLOADER_MIDDLEWARES = {
'myproject.middlewares.RandomUserAgentMiddleware': 400,
'myproject.middlewares.ProxyMiddleware': 410,
'myproject.middlewares.DupFilterMiddleware': 390,
'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None
# Disable compression middleware, so the actual HTML pages are cached
}
#middlewares.py
class DupFilterMiddleware(object):
def process_response(self, request, response, spider):
conn = MySQLdb.connect(user='dbuser',passwd='dbpass',db='dbname',host='localhost', charset='utf8', use_unicode=True)
cursor = conn.cursor()
log.msg("Make mysql connection", level=log.INFO)
cursor.execute("""SELECT id FROM scrapy WHERE url = %s""", (response.url))
if cursor.fetchone() is None:
return None
else:
raise IgnoreRequest("Duplicate --db-- item found: %s" % response.url)
#spider.py
class TestSpider(CrawlSpider):
name = "test_spider"
allowed_domains = ["test.com"]
start_urls = ["http://test.com/company/JV-Driver-Jobs-dHJhZGVzODkydGVhbA%3D%3D"]
rules = [
Rule(SgmlLinkExtractor(allow=("http://example.com/job/(.*)",)),callback="parse_items"),
Rule(SgmlLinkExtractor(allow=("http://example.com/company/",)), follow=True),
]
def parse_items(self, response):
l = XPathItemLoader(testItem(), response = response)
l.default_output_processor = MapCompose(lambda v: v.strip(), replace_escape_chars)
l.add_xpath('job_title', '//h1/text()')
l.add_value('url',response.url)
l.add_xpath('job_description', '//tr[2]/td[2]')
l.add_value('job_code', '99')
return l.load_item()
It works but I got ERROR: Error downloading from raise IgnoreRequest() . Is it intended ?
2013-10-15 17:54:16-0600 [test_spider] ERROR: Error downloading <GET http://example.com/job/aaa>: Duplicate --db-- item found: http://example.com/job/aaa
Another problem with my approach is I have to query for each url I am going to crawl. Says, I have 10k urls to crawl which means I hit mysql server 10k times. How can i do it in 1 mysql query? (e.g. get all crawled urls and store them somewhere, then check the request url against them)
Update:
Follow audiodude suggestion, here is my latest code. However, DupFilterMiddleware stops working. It runs the init but never call process_request anymore. Removing _init_ will make the process_request works again. What did I do wrong ?
class DupFilterMiddleware(object):
def __init__(self):
self.conn = MySQLdb.connect(user='myuser',passwd='mypw',db='mydb',host='localhost', charset='utf8', use_unicode=True)
self.cursor = self.conn.cursor()
self.url_set = set()
self.cursor.execute('SELECT url FROM scrapy')
for url in self.cursor.fetchall():
self.url_set.add(url)
print self.url_set
log.msg("DupFilterMiddleware Initialize mysql connection", level=log.INFO)
def process_request(self, request, spider):
log.msg("Process Request URL:{%s}" % request.url, level=log.WARNING)
if request.url in url_set:
log.msg("IgnoreRequest Exception {%s}" % request.url, level=log.WARNING)
raise IgnoreRequest()
else:
return None
A few things I can think of:
First, you should use process_request in your DupFilterMiddleware. That way, you filter the request before it ever even gets downloaded. Your current solution is wasting alot of time and resources downloading pages that eventually get thrown out.
Secondly, you should not connect to your database inside of process_response/process_request. That means you are creating a new connection for every item (and throwing away the old one). This is very inefficient. Try the following:
class DupFilterMiddleware(object):
def __init__(self):
self.conn = MySQLdb.connect(...
self.cursor = conn.cursor()
Then replace cursor.execute(... in your process_response method with self.cursor.execute(...
Finally, I would agree that it can be suboptimal to hit the MySQL server 10k times. For such a low volume of data, why not load it all into a set() in memory. Put this in the __init__ method of your downloader middleware:
self.url_set = set()
cursor.execute('SELECT url FROM scrapy')
for url in cursor.fetchall():
self.url_set.add(url)
Then instead of executing a query and checking results, simply do:
if response.url in url_set:
raise IgnoreRequest(...