When crawling a slow website, I always get the error: TCP connection timed out: 10060. I suspect this happens when the crawler tries to establish a TCP connection to the server and the default connect timeout is too low.
I know the download timeout can be set in Scrapy, but I found no way to set the connect timeout. Does anyone have any ideas? Thanks!
DOWNLOAD_TIMEOUT can be set in settings.py in your Scrapy project:
http://doc.scrapy.org/en/latest/topics/settings.html#download-timeout
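For example, a minimal sketch (the value is an assumption; Scrapy's default is 180 seconds, and Scrapy should also use this value as the connect timeout for the underlying Twisted agent, so raising it covers the 10060 case too):

# settings.py
DOWNLOAD_TIMEOUT = 600  # seconds; illustrative value, raise as needed

It can also be set per request via the download_timeout meta key, e.g. Request(url, meta={'download_timeout': 600}).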
I set up my FTP server on CentOS 7.
From the client side (not localhost), I can connect and transfer files to the server over an unencrypted connection, but I can't connect with TLS.
Here is my vsftpd.conf:
listen=YES
listen_ipv6=NO
pam_service_name=vsftpd
userlist_enable=YES
tcp_wrappers=YES
rsa_cert_file=/home/user/server/sync.crt
rsa_private_key_file=/home/user/server/sync.key
ssl_enable=YES
allow_anon_ssl=NO
force_local_data_ssl=YES
force_local_logins_ssl=YES
ssl_tlsv1=YES
ssl_sslv2=NO
ssl_sslv3=NO
require_ssl_reuse=NO
ssl_ciphers=HIGH
pasv_enable=YES
pasv_min_port=50000
pasv_max_port=60000
pasv_address=1.1.1.1
And FileZilla's error:
Connection attempt failed with "ETIMEDOUT - Connection attempt timed out".
425 Failed to establish connection.
How do I solve this problem?
This kind of error typically happens when a data connection cannot be created to transfer files or directory listings. Such data connections use dynamic ports: in the case of PASV, the port to use is announced by the server in its response to the PASV command.
Firewalls often employ helpers that scan the traffic for such responses and then temporarily allow access to the announced port. With plain, unencrypted FTP the firewall can see the response and determine which port to open, so it works. With FTPS, however, the control connection is encrypted, so the firewall sees only encrypted traffic, cannot determine the port to open, and the data connection fails.
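The usual fix is therefore to open the passive range from vsftpd.conf statically on the server's firewall, since the helper cannot do it for encrypted sessions. A sketch assuming firewalld on CentOS 7:

# Open the passive port range (50000-60000, per pasv_min_port/pasv_max_port)
# and the FTP control port; with FTPS the firewall's FTP helper cannot
# read the PASV response, so the range must be opened statically.
firewall-cmd --permanent --add-port=50000-60000/tcp
firewall-cmd --permanent --add-service=ftp
firewall-cmd --reload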
Can someone please let me know on what basis Monit decides that it's time to restart an application? For instance, if I want Monit to monitor my web application, what information should I provide based on which it will restart it?
Thanks!
Update:
I was able to more or less make it work using the following Monit config:
check host altamides with address web.dev1.ams
if failed port 80 with protocol http
then alert
However, I was wondering if I can use an absolute URL of my application, something like http://foo:5453/test/url/1.html/
Can someone help me with that, please?
Monit by itself will not restart any service, but you can give it the rules you want it to act on. You can do something like:
check process couchdb with pidfile /usr/local/var/run/couchdb/couchdb.pid
    start program = "/etc/init.d/couchdb start"
    stop program = "/etc/init.d/couchdb stop"
    if cpu > 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if memory usage > 70% for 5 cycles then restart

check host mmonit.com with address mmonit.com
    if failed port 80 protocol http then alert
    if failed port 443 protocol https then alert
I figured out the answer from the Monit help page:
if failed
port 80
protocol http
request "/data/show?a=b&c=d"
then restart
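Putting it together with the host and URL from the question, a sketch (the start/stop commands are hypothetical; Monit needs them defined before a restart action can do anything):

check host altamides with address web.dev1.ams
    start program = "/etc/init.d/myapp start"   # hypothetical start command
    stop program = "/etc/init.d/myapp stop"     # hypothetical stop command
    if failed
        port 5453
        protocol http
        request "/test/url/1.html"
    then restart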
I have HAProxy in front of TCP servers such as Postgres, Mongo, and Logstash. I am able to establish a TCP connection, but the connection times out after several minutes. The errors I'm getting look like:
Mongo::Error::SocketTimeoutError, Socket request timed out
and
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
What can I do to keep the TCP connection alive? Will this help?
option srvtcpka
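For context, option srvtcpka enables TCP keep-alives towards the server; its client-side counterpart is option clitcpka, and the idle timeouts matter at least as much. A minimal sketch of where these would go in haproxy.cfg (values and the server address are assumptions):

defaults
    mode tcp
    option clitcpka             # TCP keep-alives on the client side
    option srvtcpka             # TCP keep-alives on the server side
    timeout connect 5s
    timeout client 1h           # idle timeouts; illustrative values
    timeout server 1h

backend postgres
    server pg1 10.0.0.1:5432    # hypothetical address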
libvirt.libvirtError: unable to connect to server at 'ccrfox112:49152': Connection timed out
When migrating QEMU guests without tunnelling via libvirtd, QEMU will listen on a port in the range 49152->49216 for a connection from the source host. This error message shows that the source host was unable to connect to the target host. You've not provided any useful information about your setup, so I'd have to guess that you probably have firewall rules on the target host blocking the source host's access to the TCP port in question.
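If that guess is right, opening the migration port range on the target host should fix it. A sketch assuming firewalld:

# On the migration target host, allow the QEMU migration port range
firewall-cmd --permanent --add-port=49152-49216/tcp
firewall-cmd --reload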
We are getting lots of 408 status codes in the Apache access log, and they started appearing after migrating from HTTP to HTTPS.
Our web server is behind a load balancer and we are using KeepAlive On with a KeepAliveTimeout of 15 seconds.
Can someone please help resolve this?
Same problem here, after migrating from HTTP to HTTPS. Do not panic, it is not a bug but a client feature ;)
I suspect that you find these log entries only in the log of the default (or alphabetically first) Apache SSL conf, and that you have a low Timeout (<20).
From my tests, these are clients establishing pre-connected/speculative sockets to your web server for a fast next page/resource load.
Since they only perform the initial socket connection or handshake (150 bytes or a few thousand), they connect to the IP and do not specify a vhost name, so they get logged in the default/first Apache conf's log.
A few seconds after the initial connection, they drop the socket if it is not needed, or use it for a faster subsequent request.
If your Timeout is lower than these few seconds you get the 408; if it is higher, Apache doesn't bother.
So either ignore them / add a separate default conf for Apache (see the sketch below), or raise the Timeout, at the cost of more Apache processes sitting busy waiting for the client to drop or use the socket.
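For illustration, a hypothetical catch-all SSL vhost that sorts first and absorbs these speculative connections, so they stop polluting the logs of your real sites (all names and paths are assumptions):

# 000-default-ssl.conf — loads before the real vhosts, so connections
# that never send a request land here instead of a production site.
<VirtualHost _default_:443>
    ServerName default.invalid
    SSLEngine on
    SSLCertificateFile /etc/pki/tls/certs/localhost.crt      # assumed path
    SSLCertificateKeyFile /etc/pki/tls/private/localhost.key # assumed path
    CustomLog /var/log/httpd/speculative_access.log combined
</VirtualHost>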
See https://bugs.chromium.org/p/chromium/issues/detail?id=85229 for some related discussion.