How to retry a different request in a specific pipeline when the connection fails - scrapy

[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://img14.360buyimg.com/n1/s800x800_jfs/t21448/27/2565333063/465767/c06c0af6/5b5c83e6Nb83e3a19.png> (failed 1 times): User timeout caused connection failure: Getting http://img14.360buyimg.com/n1/s800x800_jfs/t21448/27/2565333063/465767/c06c0af6/5b5c83e6Nb83e3a19.png took longer than 180.0 seconds..
Like this. I have several pipelines, and I want to request a different image URL when a request fails with a timeout.
I looked at Scrapy's RetryMiddleware, but it seems to apply to all requests. I want this behaviour only for my image pipelines.

Look at the documentation for the RetryMiddleware.
The RETRY_ENABLED setting is True by default; you can change it to False to disable the middleware.
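With the global retries disabled, the fallback can live in the media pipeline itself. A minimal sketch (not the asker's code), assuming the item carries a hypothetical backup_image_urls field next to the usual image_urls:
from scrapy import Request
from scrapy.pipelines.images import ImagesPipeline

class FallbackImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Request the primary URL(s) plus any backup mirrors; whichever succeed are kept.
        urls = list(item.get("image_urls", [])) + list(item.get("backup_image_urls", []))
        return [Request(url) for url in urls]

    def item_completed(self, results, item, info):
        # results is a list of (success, result_or_failure) tuples.
        item["images"] = [res for ok, res in results if ok]
        return item
Register it in ITEM_PIPELINES in place of the stock ImagesPipeline, so only image downloads get the fallback behaviour and the rest of the crawl is unaffected.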

Related

Jibri recording issues behind reverse proxy

I'm trying to run Jibri as part of a Jitsi-Meet installation (all on one server) behind a reverse SSL proxy. Jitsi works out of the box, but as soon as Jibri tries to log in to the session to record it, the corresponding Chrome session times out. Here's an excerpt from the Jibri log:
2021-04-04 09:09:42.546 FINE: [890] org.jitsi.jibri.selenium.pageobjects.CallPage.visit() Visiting url https://example.com/room#config.iAmRecorder=true&config.externalConnectUrl=null&config.startWithAudioMuted=true&config.startWithVideoMuted=true&interfaceConfig.APP_NAME="Jibri"&config.analytics.disabled=true&config.p2p.enabled=false&config.prejoinPageEnabled=false&config.requireDisplayName=false
2021-04-04 09:09:42.633 FINE: [890] org.jitsi.jibri.selenium.pageobjects.CallPage.apply() Not joined yet: APP is not defined
...
2021-04-04 09:10:12.945 INFO: [890] org.jitsi.jibri.selenium.JibriSelenium.onSeleniumStateChange() Transitioning from state Starting up to Error: FailedToJoinCall SESSION Failed to join the call
2021-04-04 09:10:12.947 INFO: [890] org.jitsi.jibri.service.impl.FileRecordingJibriService.onServiceStateChange() File recording service transitioning from state Starting up to Error: FailedToJoinCall SESSION Failed to join the call
The reverse proxy is configured to watch for this login string on port 443 (normal SSL traffic per the URL above) and forward it to the Jitsi instance. Prosody accepts the request on its http-bind interface, but then the invocation times out.
The web server logs are inconclusive: where / in which logs can I check what happens next? I can see Jicofo picking up the invocation (Jicofo 2021-04-04 09:09:42.130 INFO: [461] org.jitsi.jicofo.recording.jibri.JibriSession.log() Updating status from JIBRI: <iq to='focus@auth.example.com/focus647288887711795' from='jibribrewery@internal.auth.example.com/jibri-nickname' id='5iurC-49012' type='result'><jibri xmlns='http://jitsi.org/protocol/jibri' status='pending'/></iq> for room@conference.example.com), but I don't know what happens after that.
More than happy to provide more info as required.

How to make scrapy continue downloading after losing connection for a while

I am trying to scrape a website using scrapy, but the network in the office is unstable. If we lose the network connection for even a few seconds, scrapy gets stuck and stops downloading. We can see that the last log is:
2018-08-27 11:50:05 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): *.*.org
2018-08-27 11:50:07 [urllib3.connectionpool] DEBUG: https://**.**.org:443 "GET /01313_**0.jpg HTTP/1.1" 200 135790
I have tried changing the timeout settings, but nothing happened.
Thank you!
You can try setting RETRY_TIMES (in settings.py):
RETRY_TIMES = 5
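RETRY_TIMES only helps once a request actually fails, so it is usually paired with a shorter download timeout, so that a hung connection errors out and gets retried sooner. A minimal settings.py sketch (the exact values are assumptions, not part of the original answer):
RETRY_ENABLED = True
RETRY_TIMES = 5                # retry each failed request up to 5 extra times
DOWNLOAD_TIMEOUT = 60          # fail a hung download after 60 s instead of the 180 s default
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]
Connection errors and timeouts are retried regardless of RETRY_HTTP_CODES; that list only controls which HTTP response codes trigger a retry.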

OpenShift Online v3 - Timeout when reading response headers from daemon process

I created a Python API on OpenShift Online using the Python image. If you request all the data, it takes more than 30 seconds to respond, and the server returns a 504 Gateway Timeout HTTP response. How do you configure how long a response may take? I created an annotation on the route, which seems to set the proxy timeout:
haproxy.router.openshift.io/timeout: 600s
The problem remains, but I now have logging. It looks like the message comes from mod_wsgi.
I want to try changing the httpd (mod_wsgi-express) configuration from request-timeout 60 to request-timeout 600. Where do you configure this? I am using the base image https://github.com/sclorg/s2i-python-container/tree/master/2.7
Logging:
Timeout when reading response headers from daemon process 'localhost:8080':/tmp/mod_wsgi-localhost:8080:1000430000/htdocs
Does someone know how to fix this error on OpenShift Online?
In addition to raising the haproxy timeout on my app's route:
haproxy.router.openshift.io/timeout: 600s
I also altered the request-timeout and socket-timeout in the app.sh of my Python application, so the mod_wsgi-express server is configured with a higher timeout:
ARGS="$ARGS --request-timeout 600"
ARGS="$ARGS --socket-timeout 600"
My application now waits 10 minutes before cancelling a request.

Tomcat Connection Pool Exhausted

I'm using Apache Tomcat JDBC connection pooling in my project. I'm confused because under heavy load I keep seeing the following error:
12:26:36,410 ERROR [] (http-/XX.XXX.XXX.X:XXXXX-X) org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-/XX.XXX.XXX.X:XXXXX-X] Timeout: Pool empty. Unable to fetch a connection in 10 seconds, none available[size:4; busy:4; idle:0; lastwait:10000].
12:26:36,411 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/APP].[AppConf]] (http-/XX.XXX.XXX.X:XXXXX-X) JBWEB000236: Servlet.service() for servlet AppConf threw exception: org.jboss.resteasy.spi.UnhandledException: java.lang.NullPointerException
My expectation was that with pooling, requests for new connections would be held in a queue until a connection became available. Instead it seems that requests are rejected when the pool has reached capacity. Can this behaviour be changed?
Thanks,
Dal
This is my pool configuration:
PoolProperties p = new PoolProperties();
p.setUrl("jdbc:oracle:thin:#" + server + ":" + port + ":" + SID_SVC);
p.setDriverClassName("oracle.jdbc.driver.OracleDriver");
p.setUsername(username);
p.setPassword(password);
p.setMaxActive(4);
p.setInitialSize(1);
p.setMaxWait(10000);
p.setRemoveAbandonedTimeout(300);
p.setMinEvictableIdleTimeMillis(150000);
p.setTestOnBorrow(true);
p.setValidationQuery("SELECT 1 from dual");
p.setMinIdle(1);
p.setMaxIdle(2);
p.setRemoveAbandoned(true);
p.setJdbcInterceptors(
"org.apache.tomcat.jdbc.pool.interceptor.ConnectionState;"
+ "org.apache.tomcat.jdbc.pool.interceptor.StatementFinalizer;"
+ "org.apache.tomcat.jdbc.pool.interceptor.ResetAbandonedTimer");
This is working as designed: the log says Timeout: Pool empty. Unable to fetch a connection in 10 seconds, and your configuration is p.setMaxWait(10000);. The requesting thread waits 10 seconds (10000 milliseconds, maxWait) before giving up on getting a connection.
Now you have two options: increase maxActive (the number of connections in the pool), or check for connection leaks / long-running queries (which you do not expect).
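For the first option, the change is a couple of lines in the pool configuration shown above; the numbers below are illustrative assumptions and should stay within what your Oracle instance allows:
p.setMaxActive(20);   // was 4: more connections available under load
p.setMaxIdle(5);      // keep a few idle connections around for bursts
p.setMaxWait(30000);  // wait up to 30 s for a free connection instead of 10 s
With only 4 connections and a 10-second maxWait, four slow queries are enough to exhaust the pool and make every further request fail with PoolExhaustedException.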

G-WAN report.c statistics

I am testing G-WAN server performance and it's amazing! Here is the output from report.c:
Requests
All: 5,725 (6.06% of Cache misses)
HTTP: 66 (1.15% of all requests)
Errors: 70 (1.22% of all requests)
CSP: 5,650 (98.69% of all requests) Exceptions: 1
Connections
Accepted: 4,717 (1.21 requests per connection)
Closed: 4,372
Timeouts: 682 (14.46%) Accept:682 Read:0 Slow:0 Build:0 Send:0 Close:0
Busy: 345 (Waiting: 334 Reading: 9 Replying: 2 Sending: 0 Pushing: 0 Relaying: 0 Closing: 0)
I found that the Errors rate seems to be quite high, and an exception occurred on CSP too. Could anyone tell me what "Errors" means and how to avoid them? Thanks!
the "Errors" rate seem to be quite high
Those are HTTP errors (malformed requests coming from a client, resources not found, etc. - look at the error.log file for a trace).
The only way to avoid HTTP errors is to prevent clients from connecting to the server.
If you can't live with this "high rate of HTTP errors" of 1.22% of all requests then use a G-WAN connection handler (with the HTTP_ERROR notification) to make G-WAN ignore HTTP errors and close the connection without sending an HTTP error message (just return 0; in the handler) - but that's probably not what most users want.
an exception occurred on CSP too
An exception means a 'graceful crash report' was issued for a servlet bug. As you have only 1 crash in 5,650 dynamic requests, it probably happened during servlet development. Look at your error.log and trace files to check what happened.
Note that the "cache misses" statistics are for static contents only (1.15% of all your HTTP requests).
Apparently, not all your clients are responding in the timely fashion: you have timeouts and pending requests.