Understand Scrapy Debug information - scrapy

I would like to understand what the word "referer" means in the following line, which appears during a Scrapy run:
2021-01-05 19:08:08 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.eaton.com/us/en-us/catalog/services/eaton-ups-and-battery-training/ups-first-responder-training/eaton-9315-training.html> (referer: https://www.eaton.com/us/en-us/sitemap.html)

In Scrapy, if you first yield a Request to, say, example.com, and then in the parse callback of that request you yield another request to, say, google.com, Scrapy automatically adds a Referer header containing the URL of the page you came from. This simulates how a browser works: it just tells the server which page you came from.
You can disable this in settings.py with REFERER_ENABLED = False
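
To make the parts of that log line concrete, here is a minimal sketch that splits a "Crawled" line into its status code, HTTP method, URL and referer. The regular expression and the helper name parse_crawled_line are illustrative assumptions based on the line shown in the question, not part of Scrapy's API.

```python
import re

# Pattern modelled on the "Crawled" line from the question (an assumption,
# not an official Scrapy format specification).
CRAWLED_RE = re.compile(
    r"Crawled \((?P<status>\d{3})\) "
    r"<(?P<method>[A-Z]+) (?P<url>\S+)> "
    r"\(referer: (?P<referer>\S+)\)"
)


def parse_crawled_line(line):
    """Return status, method, url and referer from a 'Crawled' log line."""
    match = CRAWLED_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Scrapy logs "None" when the request had no Referer header,
    # which is the case for the start requests of a spider.
    if fields["referer"] == "None":
        fields["referer"] = None
    return fields


line = (
    "2021-01-05 19:08:08 [scrapy.core.engine] DEBUG: Crawled (200) "
    "<GET https://www.eaton.com/us/en-us/catalog/services/"
    "eaton-ups-and-battery-training/ups-first-responder-training/"
    "eaton-9315-training.html> "
    "(referer: https://www.eaton.com/us/en-us/sitemap.html)"
)
info = parse_crawled_line(line)
print(info["status"], info["referer"])
```

Read this way, the line says that the training page was fetched successfully (200) and that the request for it was generated while parsing sitemap.html.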

Related

Apache: Response code not 2xx (302) response in bench marking

What is the meaning of
WARNING: Response code not 2xx (302)
LOG: header received: HTTP/1.1 302
I found this in the output while benchmarking Apache.
What does it mean?

visualCaptcha error on loading

I can't figure out what else I need to do to initialize visualCaptcha. I'm running nginx on a local server and getting this error in my error log:
2014/11/20 09:00:16 [error] 3567#0: *13 open() "/home/jeff/public/project.com/public/start/6" failed (2: No such file or directory), client: 127.0.0.1, server: localhost, request: "GET /start/6?r=57nri4gpbu HTTP/1.1", host: "localhost", referrer: "http://localhost/"
I'm using the PHP backend with the jQuery frontend. I've installed everything in the recommended way.
Jeff.
Looking at the error, it seems you don't have the PHP server running in the right place, or it isn't accepting connections properly.
It seems your static server (which looks for files in /public/*) is trying to show start/6, when that should be hitting app.php or index.php (not sure if you are running the demo or coding something yourself).
I'll need more details as to your current file structure and virtualhost/server config in order to help you spot where the error is.

Why is 404 working when comment disabled

In my httpd.conf file, every mention of the ErrorDocument has a hash before it on the same line - meaning that it's commented out.
So why do I get a 404 error page on the browser? How does the browser know what message to display?
I must be getting a 404 because this is displayed in the error_log:
[Wed Jun 25 12:21:17 2014] [error] [client **********] File does not exist: /var/www/html/surveys/blahblah
Is there a default setting somewhere?
My environment is Linux, Apache and PHP.
What you see is Apache httpd's simple hardcoded message.
You need to configure an empty document as the ErrorDocument if you want an empty page to be displayed.
See the Apache documentation or this question for further information.

Wkhtmltoimage: how to prevent to create pdf/image from localhost/127.0.0.1?

I have a list of URLs and I want to create screenshots with wkhtmltoimage. Some of the URLs are redirected to localhost/127.0.0.1, and then I get a screenshot of my localhost (a list of directories). How can I prevent this?
You can do any of the following:
Configure your webserver (running on localhost) to show a pretty page with a message you like, so that you get that screenshot instead of a list of directories
Configure your webserver (running on localhost) to return an HTTP 404 error code
Clean up your list so that it does not include any URL that resolves to 127.0.0.1, before feeding it to wkhtmltoimage
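
The last option can be sketched as follows. This is a minimal example under stated assumptions: the helper names, the example hosts and the stub resolver are hypothetical, and the check looks only at what a host name resolves to, so a URL that reaches localhost through an HTTP redirect would still need to be caught by following the redirect first.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolves_to_loopback(url, resolve=socket.gethostbyname):
    """Return True if the URL's host is, or resolves to, a loopback address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        # The host may already be an IP literal such as 127.0.0.1 or ::1.
        address = ipaddress.ip_address(host)
    except ValueError:
        try:
            address = ipaddress.ip_address(resolve(host))
        except (OSError, ValueError):
            # Unresolvable hosts are not treated as loopback here.
            return False
    return address.is_loopback


def drop_loopback_urls(urls, resolve=socket.gethostbyname):
    """Keep only the URLs whose host does not resolve to loopback."""
    return [url for url in urls if not resolves_to_loopback(url, resolve)]


# Demo with a stub resolver (hypothetical hosts) so it runs offline.
fake_dns = {"good.example": "93.184.216.34", "bad.example": "127.0.0.1"}
urls = [
    "http://good.example/a",
    "http://bad.example/b",
    "http://127.0.0.1/c",
]
print(drop_loopback_urls(urls, resolve=fake_dns.__getitem__))
```

With the default resolver, the same function uses the system's DNS lookup; the filtered list can then be passed to wkhtmltoimage.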

Production Rails App - Strange redirect to external sites

I've just launched my first Rails 3.2.6 application to a production server. When someone goes to the home page, the request is handled by my IndexController, which may redirect to a different URL depending on the type of user logged in.
Slightly simplified code example of what I have is this:
def index
  path = new_user_session_url # default path
  if current_user
    path = users_admin_index_path # admin path
  end
  redirect_to path, :notice => flash[:notice], :alert => flash[:alert]
end
What confuses me is that, while monitoring the logs for issues, I noticed that requests from two IP addresses appear to be redirected to seemingly random sites in Brazil. Is this something that I should be worried about? Any information that helps me understand what's going on here would be very much appreciated.
See the log extract below: in the "Redirected to" URL, the domain changes from my site's to www.bradesco.com.br, www.bb.com.br or www.itau.com.br.
No one has reported any issues on the site, but I just wanted to try and understand this a little better.
Log Extract
Started GET "/" for 65.111.177.188 at 2012-08-10 00:20:10 -0400
Processing by Home::IndexController#index as HTML
Redirected to http://www.itau.com.br/home
Completed 302 Found in 2ms (ActiveRecord: 0.0ms)
Started GET "/" for 65.111.177.188 at 2012-08-10 00:20:10 -0400
Processing by Home::IndexController#index as HTML
Redirected to http://www.bradesco.com.br/home
Completed 302 Found in 1ms (ActiveRecord: 0.0ms)
Started GET "/" for 65.111.177.188 at 2012-08-10 00:20:10 -0400
Processing by Home::IndexController#index as HTML
Redirected to http://www.bb.com.br/home
Completed 302 Found in 1ms (ActiveRecord: 0.0ms)
Started GET "/" for 64.251.28.71 at 2012-08-09 22:00:20 -0400
Processing by Home::IndexController#index as HTML
Redirected to http://www.bradesco.com.br/home
Completed 302 Found in 1ms (ActiveRecord: 0.0ms)
I'm seeing the same thing with one of my Rails staging servers. I think the issue is that you need to reject all traffic that isn't for the expected domains.
Something like this in your nginx setup (if you're using nginx); see http://nginx.org/en/docs/http/server_names.html:
server {
    listen 80 default_server;
    server_name _;
    return 444;
}
Not sure what the point of this traffic is. Perhaps some sort of roundabout new way of using someone else's Rails app as a phishing site, while sniffing network traffic? There seem to be too many variables for that to be an effective technique.
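
One likely reading of the log is that those requests carried a Host header naming the bank domains, since Rails builds absolute redirect URLs from the host of the incoming request. The idea of rejecting traffic for unexpected domains can be sketched as a simple allowlist check. The domain names and the function name below are hypothetical, and this only illustrates the logic; it is not a replacement for the server-level configuration above.

```python
# Hypothetical set of domains the application is meant to serve.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_expected_host(host_header, allowed=ALLOWED_HOSTS):
    """Return True if the Host header names one of the expected domains."""
    if not host_header:
        return False
    # Drop an optional ":port" suffix and compare case-insensitively.
    # IPv6 literals are not handled and are simply rejected.
    hostname = host_header.partition(":")[0].strip().lower()
    return hostname in allowed


for host in ["www.example.com:80", "www.itau.com.br", ""]:
    print(repr(host), is_expected_host(host))
```

A request that fails this check would be dropped before it reaches the controller, so no redirect URL is ever built from the unexpected host.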