Splash freezes with "Timing out client: IPv4Address" - scrapy-splash

I am running scrapy-splash to scrape data from a website.
Periodically (and at random), Splash freezes with the following logs:
splash-service_1 | 2020-07-16 08:49:35.119333 [-] "172.31.0.4" - - [16/Jul/2020:08:49:34 +0000] "POST /execute HTTP/1.1" 200 266018 "-" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"
splash-service_1 | 2020-07-16 08:50:10.012973 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51970)
splash-service_1 | 2020-07-16 08:50:10.858080 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51978)
splash-service_1 | 2020-07-16 08:50:16.873014 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51974)
splash-service_1 | 2020-07-16 08:50:17.547947 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51966)
splash-service_1 | 2020-07-16 08:50:18.037436 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51976)
splash-service_1 | 2020-07-16 08:50:29.064655 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51932)
splash-service_1 | 2020-07-16 08:50:35.119997 [-] Timing out client: IPv4Address(type='TCP', host='172.31.0.4', port=51968)
How can I find out the cause? Why might it get stuck?
P.S. I run it with args={"lua_source": self.lua_script_navigate, "timeout": 60000}

Refer to the Splash HTTP API documentation for the timeout argument:
timeout : float : optional
A timeout (in seconds) for the render (defaults to 30).
By default, maximum allowed value for the timeout is 90 seconds. To override it start Splash with --max-timeout command line option.
For example, here Splash is configured to allow timeouts up to 5 minutes:
$ docker run -it -p 8050:8050 scrapinghub/splash --max-timeout 300
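If you have not started Splash with --max-timeout, your Lua script is aborted after 30 s even when you set a higher timeout in args. Note also that timeout is given in seconds, so "timeout": 60000 asks for roughly 16.7 hours, far above the 90-second default maximum.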
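To make the units and the server-side limit explicit on the client, the args can be built through a small helper like the one below. This is a minimal sketch: splash_execute_args and DEFAULT_MAX_TIMEOUT are hypothetical names (not part of scrapy-splash), and max_timeout is assumed to be kept in sync by hand with whatever --max-timeout the Splash container was started with.

```python
# Splash's documented default upper bound for the "timeout" argument, in seconds.
# Raise it on the server side with --max-timeout (e.g. --max-timeout 300).
DEFAULT_MAX_TIMEOUT = 90


def splash_execute_args(lua_source, timeout, max_timeout=DEFAULT_MAX_TIMEOUT):
    """Build the args dict for Splash's /execute endpoint (hypothetical helper).

    `timeout` is in seconds. `max_timeout` should match the value the Splash
    server was started with; it is not queried from the server.
    """
    if not 0 < timeout <= max_timeout:
        raise ValueError(
            f"timeout={timeout}s is outside (0, {max_timeout}]; "
            "lower it or start Splash with a larger --max-timeout"
        )
    return {"lua_source": lua_source, "timeout": timeout}
```

In a spider, the result could then be passed as SplashRequest(url, self.parse, endpoint="execute", args=splash_execute_args(self.lua_script_navigate, 60)), so that an out-of-range value fails early in the spider rather than on the Splash side.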


Spring Boot WebFlux access log: what are the two numbers at the end?

A small question about how to interpret a Spring Boot WebFlux app's access log.
In my access logs, I currently see:
2021-07-31 13:46:19.913 INFO [service,,] 10 --- [or-http-epoll-1] reactor.netty.http.server.AccessLog : ip - - [31/Jul/2021:13:46:19 +0000] "GET /health HTTP/1.1" 200 3349 6
2021-07-31 13:47:18.531 INFO [service,,] 10 --- [or-http-epoll-2] reactor.netty.http.server.AccessLog : ip - - [31/Jul/2021:13:47:18 +0000] "GET /health/liveness HTTP/2.0" 200 3312 8
2021-07-31 13:47:33.347 INFO [service,,] 10 --- [or-http-epoll-2] reactor.netty.http.server.AccessLog : ip - - [31/Jul/2021:13:47:33 +0000] "GET /health HTTP/1.1" 200 3349 11
I understand that 200 is probably my HTTP response status, since I return HTTP 200.
But I am having a hard time understanding what the last two numbers are:
3349 6
3312 8
3349 11
Any help?
Thank you
It depends on the log format definition, but it looks like the larger number is the response size in bytes and the smaller one is the request processing time in ms.
I'll look at the documentation to see where I'd expect to find the log format definition for a Spring WebFlux app. I'd expect the format to be defined in a similar way to httpd access logs (documentation for those is at https://httpd.apache.org/docs/2.4/logs.html).
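Under that reading (larger number = response size in bytes, smaller = duration in ms), the lines can be split into named fields with a regular expression. This is a sketch fitted only to the sample lines in the question; the field layout and the "-" placeholder for a missing size are assumptions, not a documented reactor-netty contract.

```python
import re

# Assumed layout of the reactor-netty AccessLog lines shown in the question:
# client - - [timestamp] "METHOD URI PROTOCOL" status bytes duration_ms
ACCESS_LOG = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<duration_ms>\d+)$'
)


def parse_access_log(line):
    """Return uri, status, response size and duration from one log line, or None."""
    m = ACCESS_LOG.search(line.rstrip())
    if m is None:
        return None
    return {
        "uri": m["uri"],
        "status": int(m["status"]),
        # Assumed: "-" appears when no response size is logged
        "bytes": None if m["bytes"] == "-" else int(m["bytes"]),
        "duration_ms": int(m["duration_ms"]),
    }
```

Applied to the first line in the question, this yields status 200, bytes 3349 and duration_ms 6.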

TSocket read 0 bytes (code THRIFTTRANSPORT): TTransportException('TSocket read 0 bytes',)

When I integrated Hive into my Hue, I got an error.
I tried for many days, but I couldn't solve it. Can anyone help me?
I searched on Google, but with no success.
TSocket read 0 bytes
TSocket read 0 bytes (code THRIFTTRANSPORT): TTransportException('TSocket read 0 bytes',)
Update:
My Hue version is 4.5.0 and the Hive version is 3.1.0.
Hive is one of the components of HDP 3.1.4.0,
and Kerberos is not configured.
The Hive-related Hue configuration is below:
[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=jd-xxx-001.ABC.XYZ
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10016
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/etc/hive/conf
# Timeout in seconds for thrift calls to Hive service
server_conn_timeout=120
The Hue log is below:
[03/Nov/2019 19:12:43 -0800] access INFO 192.168.16.13 admin - "GET /static/desktop/js/queryBuilder.4597d86a7a3f.js HTTP/1.1" returned in 3ms
[03/Nov/2019 19:12:43 -0800] access INFO 192.168.16.13 admin - "GET /static/desktop/js/bundles/hue/notebook-chunk-8a9143f5572b79c918e5.aefcf25c309b.js HTTP/1.1" returned in 1ms
[03/Nov/2019 19:12:43 -0800] access INFO 192.168.16.13 admin - "GET /static/desktop/js/bundles/hue/vendors~notebook-chunk-8a9143f5572b79c918e5.8b3cae4709a3.js HTTP/1.1" returned in 3ms
[03/Nov/2019 19:12:44 -0800] hive_server2_lib INFO Opening beeswax thrift session for user admin
[03/Nov/2019 19:12:44 -0800] thrift_util INFO Thrift exception; retrying: TSocket read 0 bytes
[03/Nov/2019 19:12:44 -0800] thrift_util INFO Thrift exception; retrying: TSocket read 0 bytes
[03/Nov/2019 19:12:44 -0800] thrift_util WARNING Out of retries for thrift call: OpenSession
[03/Nov/2019 19:12:44 -0800] thrift_util INFO Thrift saw a transport exception: TSocket read 0 bytes
[03/Nov/2019 19:12:44 -0800] access INFO 192.168.16.13 admin - "POST /notebook/api/autocomplete/ HTTP/1.1" returned in 33ms
[03/Nov/2019 19:12:44 -0800] access INFO 192.168.16.13 admin - "POST /notebook/api/create_notebook HTTP/1.1" returned in 5ms
[03/Nov/2019 19:12:44 -0800] access INFO 192.168.16.13 admin - "GET /desktop/workers/aceSqlSyntaxWorker.js HTTP/1.1" returned in 3ms
[03/Nov/2019 19:12:44 -0800] access INFO 192.168.16.13 admin - "GET /desktop/workers/aceSqlLocationWorker.js HTTP/1.1" returned in 1ms
[03/Nov/2019 19:12:44 -0800] access INFO 192.168.16.13 admin - "GET /desktop/api2/user_preferences/default_app HTTP/1.1" returned in 2ms
[03/Nov/2019 19:12:44 -0800] access INFO 192.168.16.13 admin - "GET /static/desktop/js/ace/theme-hue.js HTTP/1.1" returned in 0ms
[03/Nov/2019 19:12:45 -0800] access WARNING 192.168.16.13 admin - "POST /jobbrowser/jobs/ HTTP/1.1"-- 404 not found
[03/Nov/2019 19:12:45 -0800] base WARNING Not Found: /jobbrowser/jobs/
[03/Nov/2019 19:12:45 -0800] access INFO 192.168.16.13 admin - "GET /notebook/api/get_history HTTP/1.1" returned in 5ms
[03/Nov/2019 19:12:45 -0800] access INFO 192.168.16.13 admin - "GET /static/desktop/js/ace/snippets/hive.js HTTP/1.1" returned in 0ms
[03/Nov/2019 19:12:45 -0800] access INFO 192.168.16.13 admin - "GET /desktop/api2/context/computes/hive HTTP/1.1" returned in 19ms
[03/Nov/2019 19:12:46 -0800] hive_server2_lib INFO Opening beeswax thrift session for user admin
[03/Nov/2019 19:12:46 -0800] thrift_util INFO Thrift exception; retrying: TSocket read 0 bytes
[03/Nov/2019 19:12:46 -0800] thrift_util INFO Thrift exception; retrying: TSocket read 0 bytes
[03/Nov/2019 19:12:46 -0800] thrift_util WARNING Out of retries for thrift call: OpenSession
[03/Nov/2019 19:12:46 -0800] thrift_util INFO Thrift saw a transport exception: TSocket read 0 bytes
[03/Nov/2019 19:12:46 -0800] decorators ERROR Error running create_session
Traceback (most recent call last):
  File "/home/hue/hue450_install/hue/desktop/libs/notebook/src/notebook/decorators.py", line 105, in decorator
    return func(*args, **kwargs)
  File "/home/hue/hue450_install/hue/desktop/libs/notebook/src/notebook/api.py", line 97, in create_session
    response['session'] = get_api(request, session).create_session(lang=session['type'], properties=properties)
  File "/home/hue/hue450_install/hue/desktop/libs/notebook/src/notebook/connectors/hiveserver2.py", line 90, in decorator
    raise QueryError(message)
QueryError: TSocket read 0 bytes (code THRIFTTRANSPORT): TTransportException('TSocket read 0 bytes',)
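This helped me:
1) Install thrift-sasl into Hue's Python environment: hue/build/env/bin/pip install thrift-sasl==0.2.1
2) Use the SASL framework to establish the connection to the host: use_sasl=true
3) Switch Hue's database from engine=sqlite3 to engine=postgresql_psycopg2 (or another database).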
This sounds like the same issue as this one, which explains why: https://github.com/cloudera/hue/issues/849
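For context on the message itself: in Thrift's Python library, TSocket.read raises TTransportException('TSocket read 0 bytes') when recv() returns an empty result, i.e. the server closed the connection without answering. One plausible cause, consistent with the use_sasl fix, is a transport mismatch where HiveServer2 expects a SASL handshake and drops a plain Thrift client. The stdlib-only sketch below reproduces that end-of-stream condition with a local socket pair; read_or_eof and demo_peer_hangup are hypothetical names, and EOFError stands in for Thrift's exception type.

```python
import socket


def read_or_eof(sock, n=1024):
    """Mimic the check in Thrift's TSocket.read: zero bytes from recv() means EOF."""
    data = sock.recv(n)
    if len(data) == 0:
        # The peer closed the connection without sending a reply
        raise EOFError("TSocket read 0 bytes")
    return data


def demo_peer_hangup():
    """Simulate a server that closes the socket instead of answering."""
    client, server = socket.socketpair()
    try:
        server.close()
        try:
            read_or_eof(client)
        except EOFError as exc:
            return str(exc)
        return None
    finally:
        client.close()


if __name__ == "__main__":
    print(demo_peer_hangup())
```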

Can't access AppRTC from another machine on the same network

I am new to WebRTC. I need to run the AppRTC application on my local machine. I got the codebase from GitHub and successfully ran it on my local Ubuntu machine.
These are the logs when I try with a browser on my PC:
INFO 2017-11-30 09:00:18,966 api_server.py:205] Starting API server at: http://localhost:51242
WARNING 2017-11-30 09:00:18,976 inotify_file_watcher.py:196] There are too many directories in your application for changes in all of them to be monitored. You may have to restart the development server to see some changes to your files.
INFO 2017-11-30 09:00:18,976 dispatcher.py:197] Starting module "default" running at: http://localhost:8080
INFO 2017-11-30 09:00:18,978 admin_server.py:116] Starting admin server at: http://localhost:8000
INFO 2017-11-30 09:00:43,290 apprtc.py:93] Applying media constraints: {'video': {'mandatory': {}, 'optional': [{'minWidth': '1280'}, {'minHeight': '720'}]}, 'audio': True}
WARNING 2017-11-30 09:00:43,299 apprtc.py:137] Invalid or no value returned from memcache, using fallback: null
INFO 2017-11-30 09:00:43,370 module.py:788] default: "GET / HTTP/1.1" 200 8616
INFO 2017-11-30 09:00:43,430 module.py:788] default: "GET /callstats/callstats.min.js HTTP/1.1" 304 -
INFO 2017-11-30 09:00:43,433 module.py:788] default: "GET /css/main.css HTTP/1.1" 304 -
INFO 2017-11-30 09:00:43,491 module.py:788] default: "GET /js/apprtc.debug.js HTTP/1.1" 304 -
INFO 2017-11-30 09:00:47,736 apprtc.py:417] Added client 02588444 in room 490664053, retries = 0
INFO 2017-11-30 09:00:47,738 apprtc.py:93] Applying media constraints: {'video': {'mandatory': {}, 'optional': [{'minWidth': '1280'}, {'minHeight': '720'}]}, 'audio': True}
WARNING 2017-11-30 09:00:47,746 apprtc.py:137] Invalid or no value returned from memcache, using fallback: null
INFO 2017-11-30 09:00:47,748 apprtc.py:560] User 02588444 joined room 490664053
INFO 2017-11-30 09:00:47,751 apprtc.py:561] Room 490664053 has state ['02588444']
INFO 2017-11-30 09:00:47,902 module.py:788] default: "POST /join/490664053 HTTP/1.1" 200 1199
INFO 2017-11-30 09:00:49,820 apprtc.py:485] Saved message for client 02588444:{True, 1} in room 490664053, retries=0
INFO 2017-11-30 09:00:49,833 module.py:788] default: "POST /message/490664053/02588444 HTTP/1.1" 200 21
INFO 2017-11-30 09:00:49,880 apprtc.py:485] Saved message for client 02588444:{True, 2} in room 490664053, retries=0
INFO 2017-11-30 09:00:49,894 module.py:788] default: "POST /message/490664053/02588444 HTTP/1.1" 200 21
INFO 2017-11-30 09:00:49,930 apprtc.py:485] Saved message for client 02588444:{True, 3} in room 490664053, retries=0
INFO 2017-11-30 09:00:49,958 module.py:788] default: "POST /message/490664053/02588444 HTTP/1.1" 200 21
INFO 2017-11-30 09:00:49,970 apprtc.py:485] Saved message for client 02588444:{True, 4} in room 490664053, retries=0
INFO 2017-11-30 09:00:49,982 module.py:788] default: "POST /message/490664053/02588444 HTTP/1.1" 200 21
INFO 2017-11-30 09:00:50,074 apprtc.py:485] Saved message for client 02588444:{True, 5} in room 490664053, retries=0
INFO 2017-11-30 09:00:50,084 module.py:788] default: "POST /message/490664053/02588444 HTTP/1.1" 200 21
I can open the AppRTC application in my browser, but I cannot access it from any other machine on the same network.
These are the logs when I try with a browser on another PC on the same network:
INFO 2017-11-30 08:55:06,749 api_server.py:205] Starting API server at: http://localhost:55410
WARNING 2017-11-30 08:55:06,758 inotify_file_watcher.py:196] There are too many directories in your application for changes in all of them to be monitored. You may have to restart the development server to see some changes to your files.
INFO 2017-11-30 08:55:06,759 dispatcher.py:197] Starting module "default" running at: http://localhost:8080
INFO 2017-11-30 08:55:06,760 admin_server.py:116] Starting admin server at: http://localhost:8000
INFO 2017-11-30 08:55:10,582 apprtc.py:93] Applying media constraints: {'audio': True, 'video': {'optional': [{'minWidth': '1280'}, {'minHeight': '720'}], 'mandatory': {}}}
WARNING 2017-11-30 08:55:10,589 apprtc.py:137] Invalid or no value returned from memcache, using fallback: null
INFO 2017-11-30 08:55:10,636 module.py:788] default: "GET / HTTP/1.1" 200 8616
INFO 2017-11-30 08:55:10,685 module.py:788] default: "GET /css/main.css HTTP/1.1" 200 6402
INFO 2017-11-30 08:55:10,689 module.py:788] default: "GET /callstats/callstats.min.js HTTP/1.1" 200 245432
INFO 2017-11-30 08:55:10,696 module.py:788] default: "GET /js/apprtc.debug.js HTTP/1.1" 200 101567
INFO 2017-11-30 08:55:11,086 module.py:788] default: "GET /images/webrtc-icon-192x192.png HTTP/1.1" 200 31806
INFO 2017-11-30 08:55:18,872 apprtc.py:417] Added client 26553344 in room 419183955, retries = 0
INFO 2017-11-30 08:55:18,874 apprtc.py:93] Applying media constraints: {'audio': True, 'video': {'optional': [{'minWidth': '1280'}, {'minHeight': '720'}], 'mandatory': {}}}
WARNING 2017-11-30 08:55:18,879 apprtc.py:137] Invalid or no value returned from memcache, using fallback: null
INFO 2017-11-30 08:55:18,880 apprtc.py:560] User 26553344 joined room 419183955
INFO 2017-11-30 08:55:18,881 apprtc.py:561] Room 419183955 has state ['26553344']
INFO 2017-11-30 08:55:18,896 module.py:788] default: "POST /join/419183955 HTTP/1.1" 200 1203
INFO 2017-11-30 08:55:24,331 module.py:788] default: "GET /images/webrtc-icon-192x192.png HTTP/1.1" 304 -
INFO 2017-11-30 08:55:24,363 apprtc.py:485] Saved message for client 26553344:{True, 1} in room 419183955, retries=0
INFO 2017-11-30 08:55:24,383 module.py:788] default: "POST /message/419183955/26553344 HTTP/1.1" 200 21
INFO 2017-11-30 08:55:24,396 apprtc.py:485] Saved message for client 26553344:{True, 2} in room 419183955, retries=1
INFO 2017-11-30 08:55:24,405 module.py:788] default: "POST /message/419183955/26553344 HTTP/1.1" 200 21
INFO 2017-11-30 08:55:58,211 apprtc.py:455] Removed client 26553344 from room 419183955, retries=0
INFO 2017-11-30 08:55:58,212 apprtc.py:494] Room 419183955 has state None
INFO 2017-11-30 08:55:58,221 module.py:788] default: "POST /leave/419183955/26553344 HTTP/1.1" 200 -
INFO 2017-11-30 08:55:59,071 module.py:788] default: "GET /r/ HTTP/1.1" 404 154
INFO 2017-11-30 08:56:00,542 module.py:788] default: "GET /favicon.ico HTTP/1.1" 404 154
The browser shows an error message like this:
Failed to get access to local media. Error name was NotSupportedError. Continuing without sending a stream.
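Why does this happen? I need to access it from another PC. Is this possible?
Enable SSL. Access from other machines only works over HTTPS, because browsers only allow camera and microphone access (getUserMedia) from secure contexts, which means HTTPS or localhost.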
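This also explains why the same page works on the local machine but not from another PC: http://localhost counts as a secure context, while http://<LAN IP> does not. The sketch below is a simplified Python model of that rule, loosely following the W3C Secure Contexts notion of a "potentially trustworthy" URL; is_potentially_trustworthy is a hypothetical helper, the LAN addresses in the tests are made-up examples, and real browsers have additional cases and overrides.

```python
import ipaddress
from urllib.parse import urlsplit


def is_potentially_trustworthy(url):
    """Simplified secure-context check: https/wss, or a loopback host over http."""
    parts = urlsplit(url)
    if parts.scheme in ("https", "wss"):
        return True
    host = parts.hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # Loopback IP literals such as 127.0.0.1 or ::1
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a LAN hostname): not trustworthy over plain http
        return False
```

For example, is_potentially_trustworthy("http://192.168.1.20:8080/") returns False, while the same address with https returns True, which matches the advice to serve AppRTC over SSL.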

Build protocol mechanism in twisted Site

I am trying to understand how and when protocols are created for HTTP requests in the Site factory. Although I have a basic understanding of how factories and protocols work, I am confused because I see multiple protocols being created for a single request from the browser. Below are sample code and its output.
import sys

from twisted.web.server import Site
from twisted.web.static import File
from twisted.internet import reactor
from twisted.python import log

log.startLogging(sys.stdout)


class GFactory(Site):
    # Class-level counter of how many protocol instances have been built
    protmade = 0

    def __init__(self, resource):
        Site.__init__(self, resource)

    def buildProtocol(self, addr):
        # Called once per accepted TCP connection, not once per HTTP request
        GFactory.protmade += 1
        print("Building Protocol in factory" + str(GFactory.protmade))
        print(addr)
        p = Site.buildProtocol(self, addr)
        return p


# Serve the files in ./temp on port 8888
resource = File('./temp')
factory = GFactory(resource)
reactor.listenTCP(8888, factory)
reactor.run()
Log from a single request to http://localhost:8888/ from Chrome:
2016-05-02 01:38:07+0530 [-] Log opened.
2016-05-02 01:38:07+0530 [-] GFactory starting on 8888
2016-05-02 01:38:07+0530 [-] Starting factory <__main__.GFactory instance at 0x11bac75f0>
2016-05-02 01:38:19+0530 [-] Building Protocol in factory1
2016-05-02 01:38:19+0530 [-] IPv4Address(TCP, '127.0.0.1', 64913)
2016-05-02 01:38:19+0530 [-] Building Protocol in factory2
2016-05-02 01:38:19+0530 [-] IPv4Address(TCP, '127.0.0.1', 64914)
2016-05-02 01:38:19+0530 [-] Building Protocol in factory3
2016-05-02 01:38:19+0530 [-] IPv4Address(TCP, '127.0.0.1', 64915)
2016-05-02 01:38:19+0530 [-] "127.0.0.1" - - [01/May/2016:20:08:19 +0000] "GET / HTTP/1.1" 200 800 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
My questions are:
1) As shown above, three different protocols are created, one for each of three different client ports. Is this expected behaviour? Shouldn't there be one protocol per request?
2) If I need to take some action when a request expires or a protocol is closed, which protocol should I look at?

Installing a Joomla extension makes Apache dump core

I installed Joomla 3.2 on my Ubuntu 12 server and tried to install some extensions on it.
But every time, Chrome fails with "No data received".
I checked the Apache log and found the following:
[Thu Mar 06 01:06:08 2014] [notice] child pid 4278 exit signal Segmentation fault (11), possible coredump in /etc/apache2
192.168.2.119 - - [06/Mar/2014:01:05:40 +0100] "GET /acece/administrator/index.php?option=com_installer&view=update&task=update.ajax&eid=0&skip=700 HTTP/1.1" 200 293 "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36"
Does anyone have an idea?
The same server ran Joomla 2.5 before without any problems.
I tried updating the Ubuntu packages, but nothing changed.