Apache with mod_wsgi daemon mode not yielding small strings - mod-wsgi

While testing a few things using Django + Apache2 + mod_wsgi 3.3, I get two different results when periodically yielding response data, depending on whether I run in embedded or daemon mode.
In embedded mode, i.e. without the WSGIDaemonProcess and WSGIProcessGroup directives, the functions below yield results one after the other, with each digit appearing in the browser after the 2-second sleep.
import time
from django.http import HttpResponse

def yielder(request):
    gen = testYielding()
    return HttpResponse(gen)

def testYielding():
    yield "3"
    time.sleep(2)
    yield "4"
    time.sleep(2)
    yield "5"
    time.sleep(2)
    yield "6"
    time.sleep(2)
    yield "7"
With daemon mode on, however, the view returns the data only after collating the complete response, about 8 seconds later, with all the digits printed together rather than yielded one after the other.
Is this behavior correct? And is there a way to make sure that in daemon mode responses are yielded just as in embedded mode?

The flush which occurs in the daemon process isn't transferred across to the Apache child worker process that is doing the proxying. Whether the output is then passed back to the client immediately depends in part on which Apache output filters you have registered. If you have output filters that try to buffer up response data before flushing, you will see this issue.
You should therefore look closely at which Apache output filters are in place. If you cannot change them, you have no choice but to use embedded mode.
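For comparison, here is a minimal sketch of the same view written with Django's StreamingHttpResponse (assuming a Django version that provides it); it makes the streaming intent explicit, but whether each chunk reaches the browser immediately under daemon mode still depends on the Apache output filters described above.
# Sketch only: StreamingHttpResponse is real Django API, but it does not by
# itself defeat buffering done by Apache output filters in daemon mode.
import time
from django.http import StreamingHttpResponse

def digits():
    # One chunk per yield; in embedded mode each chunk appeared in the
    # browser as it was produced.
    for digit in "34567":
        yield digit
        time.sleep(2)

def yielder(request):
    return StreamingHttpResponse(digits(), content_type="text/plain")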

Related

Two consecutive yields, only the first works

I have this piece of code that only executes the first yield's callback and not the next one. I have tried reordering them and it gives the same result:
Only the first yield callback gets executed.
for j in range(totalOrderPages):  # the code gets in the loop
    productURI = feedUrl % (productId, j + 1)
    print "Got in the loop"  # this gets printed
    yield response.follow(productURI, self.parse_orders, meta={'pid': productId, 'categories': categories})
    yield response.follow(first_page, self.parse_product, meta={'pid': productId, 'categories': categories})
Is there anything in Python or Scrapy that prevents two consecutive yields?
Second question:
I'm trying to debug this using pdb.set_trace(), but when I try to execute yield from the debugging console, it gives the "yield outside function" error.
Does anyone know how we can debug yields?
Thank you.
Without knowing more details, like the redirection behaviour of the specific site or the contents of the variables (feedUrl, productURI, first_page, etc.), I would say that some requests are being dropped by the Dupefilter (https://doc.scrapy.org/en/latest/topics/settings.html#dupefilter-class).
I'd recommend enabling the DEBUG logging level, setting DUPEFILTER_DEBUG=True, and checking the logs to see if that's the case.
You can force requests to bypass the Dupefilter by adding dont_filter=True when calling response.follow.
If this doesn't solve your issue, please share your crawl logs so we can have more information to debug the issue. Happy scraping!
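A minimal sketch of what that could look like, assuming a standard Scrapy project; the spider name, start URL, and CSS selector are illustrative, while LOG_LEVEL, DUPEFILTER_DEBUG, dont_filter, and response.follow are real Scrapy names:
import scrapy

class OrdersSpider(scrapy.Spider):
    name = 'orders'
    start_urls = ['https://example.com/products']

    # Equivalent to putting these in settings.py; makes the dupefilter
    # log every request it drops.
    custom_settings = {
        'LOG_LEVEL': 'DEBUG',
        'DUPEFILTER_DEBUG': True,
    }

    def parse(self, response):
        for href in response.css('a.product::attr(href)').getall():
            # dont_filter=True is forwarded to the generated Request, so the
            # dupefilter will not silently drop repeated URLs.
            yield response.follow(href, self.parse_product, dont_filter=True)

    def parse_product(self, response):
        yield {'url': response.url}
If the DEBUG log then reports filtered duplicate requests, the dupefilter was the one dropping your second yield.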

Twisted deferreds block when URI is the same (multiple calls from the same browser)

I have the following code
# -*- coding: utf-8 -*-
# 好
##########################################
import time
from twisted.internet import reactor, threads
from twisted.web.server import Site, NOT_DONE_YET
from twisted.web.resource import Resource
##########################################
class Website(Resource):
    def getChild(self, name, request):
        return self
    def render(self, request):
        if request.path == "/sleep":
            duration = 3
            if 'duration' in request.args:
                duration = int(request.args['duration'][0])
            message = 'no message'
            if 'message' in request.args:
                message = request.args['message'][0]
            #-------------------------------------
            def deferred_activity():
                print 'starting to wait', message
                time.sleep(duration)
                request.setHeader('Content-Type', 'text/plain; charset=UTF-8')
                request.write(message)
                print 'finished', message
                request.finish()
            #-------------------------------------
            def responseFailed(err, deferred):
                pass; print err.getErrorMessage()
                deferred.cancel()
            #-------------------------------------
            def deferredFailed(err, deferred):
                pass; # print err.getErrorMessage()
            #-------------------------------------
            deferred = threads.deferToThread(deferred_activity)
            deferred.addErrback(deferredFailed, deferred) # will get called indirectly by responseFailed
            request.notifyFinish().addErrback(responseFailed, deferred) # to handle client disconnects
            #-------------------------------------
            return NOT_DONE_YET
        else:
            return 'nothing at', request.path
##########################################
reactor.listenTCP(321, Site(Website()))
print 'starting to serve'
reactor.run()
##########################################
# http://localhost:321/sleep?duration=3&message=test1
# http://localhost:321/sleep?duration=3&message=test2
##########################################
My issue is the following:
When I open two tabs in the browser, point one at http://localhost:321/sleep?duration=3&message=test1 and the other at http://localhost:321/sleep?duration=3&message=test2 (the messages differ), and reload the first tab and then, as quickly as possible, the second one, they finish almost at the same time. The first tab finishes about 3 seconds after hitting F5, the second about half a second after the first.
This is expected, as each request got deferred into a thread, and they are sleeping in parallel.
But when I now change the URL of the second tab to be the same as the one of the first tab, that is to http://localhost:321/sleep?duration=3&message=test1, then all this becomes blocking. If I press F5 on the first tab and as quickly as possible F5 on the second one, the second tab finishes about 3 seconds after the first one. They don't get executed in parallel.
As long as the entire URI is the same in both tabs, the server starts to block. The behaviour is the same in Firefox and in Chrome. But when I start one request in Chrome and the other in Firefox at the same time, it is non-blocking again.
So it may not necessarily be related to Twisted, but perhaps to some connection reuse or something like that.
Anyone knows what is happening here and how I can solve this issue?
Coincidentally, someone asked a related question over at the Tornado section. As you suspected, this is not an "issue" in Twisted but rather a "feature" of web browsers :). Tornado's FAQ page has a small section dedicated to this issue. The proposed solution is appending an arbitrary query string.
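A tiny sketch of that cache-busting idea, assuming you just want each tab (or scripted client) to hit a technically different URI so the browser stops serializing the requests onto one connection; the nocache parameter name is made up:
import uuid

def cache_busted(url):
    # Append a throwaway query parameter so otherwise identical URLs differ.
    sep = '&' if '?' in url else '?'
    return '%s%snocache=%s' % (url, sep, uuid.uuid4().hex)

print(cache_busted('http://localhost:321/sleep?duration=3&message=test1'))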
Quote of the day:
One dev's bug is another dev's undocumented feature!

Why are my flows not connecting?

Just starting with noflo, I'm baffled as to why I can't get a simple flow working. I started today, installing noflo and the core components following the example pages, and the canonical "Hello World" example
Read(filesystem/ReadFile) OUT -> IN Display(core/Output)
'package.json' -> IN Read
works fine so far. Then I wanted to change it slightly, adding "noflo-rss" to the mix, and changed the example to
Read(rss/FetchFeed) OUT -> IN Display(core/Output)
'http://xkcd.com/rss.xml' -> IN Read
Running like this
$ /node_modules/.bin/noflo-nodejs --graph graphs/rss.fbp --batch --register=false --debug
... but no cigar -- it just sits there and produces no output at all.
If I stick a console.log into the sourcecode of FetchFeed.coffee
parser.on 'readable', ->
  while item = @read()
    console.log 'ITEM', item ## Hack the code here
    out.send item
then I do see the output and the content of the RSS feed.
Question: Why does out.send in rss/FetchFeed not feed the data to the core/Output for it to print? What dark magic makes the first example work, but not the second?
When running with --batch, the process exits once the network has stopped; the network counts as running only while there is a non-zero number of open connections between nodes.
The problem is that rss/FetchFeed does not open a connection on its outport, so the connection count drops to zero and the process exits.
One workaround is to run without --batch. Another one I just submitted as a pull request (needs review).

JMeter stop current iteration

I am facing the following problem:
I have multiple HTTP Requests in my test plan.
I want every request to be repeated 4 times if it fails.
I realized that with a BeanShell Assertion, and it's already working fine.
My problem is that I don't want subsequent requests to be executed if a previous request has failed 5 times,
BUT I also don't want the thread to end.
I just want the current thread iteration to end,
so that the next iteration of the thread can start again with the 1st request (if the thread is meant to be repeated).
How do I realize that within the BeanShell Assertion?
Here is just a short extract of my code where I want the solution to go.
badResponseCounter is increased for every failed try of the request; this seems to work so far. Afterwards, the variable gets reset.
if (badResponseCounter == 5) {
    badResponseCounter = 0;
    // Stop current iteration
}
I already checked the API; methods like setStopTest() or setStopThread() are available, but nothing for quitting the current iteration. I also need the "continue" setting in the thread group, as otherwise the entire test will stop after a single failed request.
Any ideas of how to do this?
In my opinion the easiest way is to use the following combination:
If Controller to check ${JMeterThread.last_sample_ok} and badResponseCounter variables
Test Action Sampler as a child of If Controller configured to "Go to next loop iteration"
Try this.
ctx.setRestartNextLoop(true);
For example, when the thread number is 2 I skip the rest of the iteration. I get the result I expected (it does not call b-2), and it does not kill the thread either.

Rails 3.2.2 log files unordered, requests intertwined

I recollect getting log files that were nicely ordered, so that you could follow one request, then the next, and so on.
Now, the log files are, as my 4 year old says "all scroggled up", meaning that they are no longer separate, distinct chunks of text. Loggings from two requests get intertwined/mixed up.
For instance:
Started GET /foobar
...
Completed 200 OK in 2ms (Views: 0.4ms | ActiveRecord: 0.8ms)
Patient Load (wait, that's from another request that has nothing to do with foobar!)
[ blank space ]
Something else
This is maddening, because I can't tell what's happening within one single request.
This is running on Passenger.
I tried to search for an answer to the same problem but couldn't find any good info. I'm not sure whether you should fix the server or the Rails code.
If you want more info about the issue here is the commit that removed old way of logging https://github.com/rails/rails/commit/04ef93dae6d9cec616973c1110a33894ad4ba6ed
If you value production log readability over everything else you can use the
PassengerMaxInstancesPerApp 1
configuration. It might cause some scaling issues. Alternatively you could stuff something like this in application.rb:
process_log_filename = Rails.root + "log/#{Rails.env}-#{Process.pid}.log"
log_file = File.open(process_log_filename, 'a')
Rails.logger = ActiveSupport::BufferedLogger.new(log_file)
Yep! They have made some changes to ActiveSupport::BufferedLogger so it no longer waits until the request has ended to flush the logs:
http://news.ycombinator.com/item?id=4483390
https://github.com/rails/rails/commit/04ef93dae6d9cec616973c1110a33894ad4ba6ed
But they have added ActiveSupport::TaggedLogging, which is very handy: you can stamp every log line with any kind of mark you want.
In your case it could be useful to stamp the logs with the request UUID, like this:
# config/application.rb
config.log_tags = [:uuid]
Then even if the logs are interleaved, you can still tell which lines correspond to the request you are following.
You can do more clever things with this feature to help you analyse your logs:
How to log user_name in Rails?
http://zogovic.com/post/21138929607/running-time-in-rails-logs
Well, for me the TaggedLogging solution is a no-go; I can live with some logs getting lost if the server crashes badly, but I want my logs to be perfectly ordered. So, following advice from the issue comments, I'm applying this to my app:
# lib/sequential_logs.rb
module ActiveSupport
  class BufferedLogger
    def flush
      @log_dest.flush
    end
    def respond_to?(method, include_private = false)
      super
    end
  end
end
# config/initializers/sequential_logs.rb
require 'sequential_logs.rb'
Rails.logger.instance_variable_get(:@logger).instance_variable_get(:@log_dest).sync = false
As far as I can tell this hasn't affected my app; it is still running, and now my logs make sense again.
They should add some quasi-random request ID and write it on every line belonging to a single request. That way you won't get confused.
I haven't used it, but I believe Lumberjack's unit_of_work method may be what you're looking for. You call:
Lumberjack.unit_of_work do
  yield
end
And all logging done either in that block or in the yielded block is tagged with a unique ID.