First-time Kannel user here. I'm trying to set up a Kannel SMS gateway on our office network but can't establish a connection with the SMSC. Please note that the SMSC hosts can be reached with telnet on the given port from our network. Below is the bearerbox log.
2018-02-06 00:03:02 [9708] [0] INFO: Debug_lvl = -1, log_file = <none>, log_lvl = 0
2018-02-06 00:03:02 [9708] [0] INFO: REDIS: Connected to server at 10.4.163.221:6666.
2018-02-06 00:03:02 [9708] [0] INFO: REDIS: Selected database 0
2018-02-06 00:03:02 [9708] [0] INFO: REDIS: server version 2.8.20.
2018-02-06 00:03:02 [9708] [0] INFO: DLR using storage type: redis
2018-02-06 00:03:02 [9708] [0] DEBUG: Kannel bearerbox version `svn-r5111M'.
Build `Nov 11 2014 15:51:10', compiler `4.4.7 20120313 (Red Hat 4.4.7-11)'.
System Linux, release 2.6.32-642.el6.x86_64, version #1 SMP Wed Apr 13 00:51:26 EDT 2016, machine x86_64.
Hostname kannel64-001.dev1.whispir.net, IP 10.4.163.216.
Libxml version 2.7.6.
Using OpenSSL 1.0.1e-fips 11 Feb 2013.
Using hiredis API 0.10.1
Using native malloc.
2018-02-06 00:03:02 [9708] [0] INFO: Added logfile `/app/kannel-telcow/log/bearerbox.log' with level `0'.
2018-02-06 00:03:02 [9708] [0] INFO: Started access logfile `/app/kannel-telcow/log/access/access.log'.
2018-02-06 00:03:02 [9708] [0] INFO: HTTP: Opening server at port 13176.
2018-02-06 00:03:02 [9708] [0] DEBUG: Started thread 1 (gwlib/fdset.c:poller)
2018-02-06 00:03:02 [9708] [1] DEBUG: Thread 1 (gwlib/fdset.c:poller) maps to pid 9708.
2018-02-06 00:03:02 [9708] [0] DEBUG: Started thread 2 (gwlib/http.c:server_thread)
2018-02-06 00:03:02 [9708] [2] DEBUG: Thread 2 (gwlib/http.c:server_thread) maps to pid 9708.
2018-02-06 00:03:02 [9708] [2] DEBUG: HTTP: Including port 13176, fd 11 for polling in server thread
2018-02-06 00:03:02 [9708] [0] DEBUG: Started thread 3 (gw/bb_http.c:httpadmin_run)
2018-02-06 00:03:02 [9708] [3] DEBUG: Thread 3 (gw/bb_http.c:httpadmin_run) maps to pid 9708.
2018-02-06 00:03:02 [9708] [0] DEBUG: starting smsbox connection module
2018-02-06 00:03:02 [9708] [0] INFO: BOXC: 'smsbox-max-pending' not set, using default (100).
2018-02-06 00:03:02 [9708] [0] DEBUG: Started thread 4 (gw/bb_boxc.c:sms_to_smsboxes)
2018-02-06 00:03:02 [9708] [4] DEBUG: Thread 4 (gw/bb_boxc.c:sms_to_smsboxes) maps to pid 9708.
2018-02-06 00:03:02 [9708] [0] DEBUG: Started thread 5 (gw/bb_boxc.c:smsboxc_run)
2018-02-06 00:03:02 [9708] [5] DEBUG: Thread 5 (gw/bb_boxc.c:smsboxc_run) maps to pid 9708.
2018-02-06 00:03:02 [9708] [0] INFO: Set SMS resend frequency to 60 seconds.
2018-02-06 00:03:02 [9708] [0] INFO: SMS resend retry set to unlimited.
2018-02-06 00:03:02 [9708] [0] DEBUG: MO concatenated message handling enabled
2018-02-06 00:03:02 [9708] [0] INFO: Set throughput to 15.000 for smsc id <smsc-au-telcow>
2018-02-06 00:03:02 [9708] [0] INFO: DLR rerouting for smsc id <smsc-au-telcow> disabled.
2018-02-06 00:03:02 [9708] [0] DEBUG: Started thread 6 (gw/smsc/smsc_smpp.c:io_thread)
2018-02-06 00:03:02 [9708] [6] DEBUG: Thread 6 (gw/smsc/smsc_smpp.c:io_thread) maps to pid 9708.
2018-02-06 00:03:02 [9708] [0] INFO: Set throughput to 15.000 for smsc id <smsc-au-telcow>
2018-02-06 00:03:02 [9708] [6] DEBUG: Connecting to <120.240.136.6>
2018-02-06 00:03:02 [9708] [0] INFO: DLR rerouting for smsc id <smsc-au-telcow> disabled.
2018-02-06 00:03:02 [9708] [0] DEBUG: Started thread 7 (gw/smsc/smsc_smpp.c:io_thread)
2018-02-06 00:03:02 [9708] [7] DEBUG: Thread 7 (gw/smsc/smsc_smpp.c:io_thread) maps to pid 9708.
2018-02-06 00:03:02 [9708] [0] DEBUG: Started thread 8 (gw/bb_smscconn.c:sms_router)
2018-02-06 00:03:02 [9708] [0] INFO: ----------------------------------------
2018-02-06 00:03:02 [9708] [0] INFO: Kannel bearerbox II version svn-r5111M starting
2018-02-06 00:03:02 [9708] [8] DEBUG: Thread 8 (gw/bb_smscconn.c:sms_router) maps to pid 9708.
2018-02-06 00:03:02 [9708] [7] DEBUG: Connecting to <120.240.136.7>
2018-02-06 00:03:02 [9708] [0] INFO: MAIN: Start-up done, entering mainloop
2018-02-06 00:03:02 [9708] [2] DEBUG: HTTP: Creating HTTPClient for `10.4.163.219'.
2018-02-06 00:03:02 [9708] [2] DEBUG: HTTP: Created HTTPClient area 0x7f999c000ad0.
2018-02-06 00:03:02 [9708] [3] DEBUG: HTTP: Destroying HTTPClient area 0x7f999c000ad0.
2018-02-06 00:03:02 [9708] [3] DEBUG: HTTP: Destroying HTTPClient for `10.4.163.219'.
2018-02-06 00:03:02 [9708] [7] DEBUG: SMPP[smsc-au-telcow]: Sending PDU:
2018-02-06 00:03:02 [9708] [7] DEBUG: SMPP PDU 0x7f9990001710 dump:
2018-02-06 00:03:02 [9708] [7] DEBUG: type_name: bind_transceiver
2018-02-06 00:03:02 [9708] [7] DEBUG: command_id: 9 = 0x00000009
2018-02-06 00:03:02 [9708] [7] DEBUG: command_status: 0 = 0x00000000
2018-02-06 00:03:02 [9708] [7] DEBUG: sequence_number: 1 = 0x00000001
2018-02-06 00:03:02 [9708] [7] DEBUG: system_id: "someid"
2018-02-06 00:03:02 [9708] [7] DEBUG: password: "somepasswd"
2018-02-06 00:03:02 [9708] [7] DEBUG: system_type: ""
2018-02-06 00:03:02 [9708] [7] DEBUG: interface_version: 52 = 0x00000034
2018-02-06 00:03:02 [9708] [7] DEBUG: addr_ton: 0 = 0x00000000
2018-02-06 00:03:02 [9708] [7] DEBUG: addr_npi: 0 = 0x00000000
2018-02-06 00:03:02 [9708] [7] DEBUG: address_range: ""
2018-02-06 00:03:02 [9708] [7] DEBUG: SMPP PDU dump ends.
2018-02-06 00:03:02 [9708] [6] DEBUG: SMPP[smsc-au-telcow]: Sending PDU:
2018-02-06 00:03:02 [9708] [6] DEBUG: SMPP PDU 0x7f9994001670 dump:
2018-02-06 00:03:02 [9708] [6] DEBUG: type_name: bind_transceiver
2018-02-06 00:03:02 [9708] [6] DEBUG: command_id: 9 = 0x00000009
2018-02-06 00:03:02 [9708] [6] DEBUG: command_status: 0 = 0x00000000
2018-02-06 00:03:02 [9708] [6] DEBUG: sequence_number: 1 = 0x00000001
2018-02-06 00:03:02 [9708] [6] DEBUG: system_id: someid
2018-02-06 00:03:02 [9708] [6] DEBUG: password: "somepasswd"
2018-02-06 00:03:02 [9708] [6] DEBUG: system_type: ""
2018-02-06 00:03:02 [9708] [6] DEBUG: interface_version: 52 = 0x00000034
2018-02-06 00:03:02 [9708] [6] DEBUG: addr_ton: 0 = 0x00000000
2018-02-06 00:03:02 [9708] [6] DEBUG: addr_npi: 0 = 0x00000000
2018-02-06 00:03:02 [9708] [6] DEBUG: address_range: ""
2018-02-06 00:03:02 [9708] [6] DEBUG: SMPP PDU dump ends.
2018-02-06 00:03:02 [9708] [2] DEBUG: HTTP: Creating HTTPClient for `10.4.163.220'.
2018-02-06 00:03:02 [9708] [2] DEBUG: HTTP: Created HTTPClient area 0x7f999c000ad0.
2018-02-06 00:03:02 [9708] [3] DEBUG: HTTP: Destroying HTTPClient area 0x7f999c000ad0.
2018-02-06 00:03:02 [9708] [3] DEBUG: HTTP: Destroying HTTPClient for `10.4.163.220'.
2018-02-06 00:03:03 [9708] [2] DEBUG: HTTP: Creating HTTPClient for `10.4.163.219'.
2018-02-06 00:03:03 [9708] [2] DEBUG: HTTP: Created HTTPClient area 0x7f999c000ad0.
2018-02-06 00:03:03 [9708] [3] DEBUG: HTTP: Destroying HTTPClient area 0x7f999c000ad0.
2018-02-06 00:03:03 [9708] [3] DEBUG: HTTP: Destroying HTTPClient for `10.4.163.219'.
2018-02-06 00:03:03 [9708] [2] DEBUG: HTTP: Creating HTTPClient for `10.4.163.220'.
2018-02-06 00:03:03 [9708] [2] DEBUG: HTTP: Created HTTPClient area 0x7f999c000ad0.
2018-02-06 00:03:03 [9708] [3] DEBUG: HTTP: Destroying HTTPClient area 0x7f999c000ad0.
2018-02-06 00:03:03 [9708] [3] DEBUG: HTTP: Destroying HTTPClient for `10.4.163.220'.
2018-02-06 00:03:04 [9708] [2] DEBUG: HTTP: Creating HTTPClient for `10.4.163.219'.
2018-02-06 00:03:04 [9708] [2] DEBUG: HTTP: Created HTTPClient area 0x7f999c000ad0.
2018-02-06 00:03:04 [9708] [3] DEBUG: HTTP: Destroying HTTPClient area 0x7f999c000ad0.
Below is the kannel.conf file.
# Group Config
group = smsc
smsc=smpp
transceiver-mode = true
smsc-id=smsc-au-telcow
port=18766
host=120.240.136.6
system-type=
address-range=""
smsc-username=someid
smsc-password=somepasswd
source-addr-ton=1
source-addr-npi=1
dest-addr-ton=1
dest-addr-npi=1
bind-addr-ton=0
bind-addr-npi=0
msg-id-type=0x01
alt-charset="ASCII"
keepalive=100
idle-timeout=100
max-pending-submits=10
use-ssl=true
throughput=15
interface-version=
group = smsc
smsc=smpp
transceiver-mode = true
smsc-id=smsc-au-telcow
port=18766
host=120.240.136.7
system-type=
address-range=""
smsc-username=someuid
smsc-password=somepasswd
source-addr-ton=1
source-addr-npi=1
dest-addr-ton=1
dest-addr-npi=1
bind-addr-ton=0
bind-addr-npi=0
msg-id-type=0x01
alt-charset="ASCII"
keepalive=100
idle-timeout=100
max-pending-submits=10
use-ssl=true
throughput=15
interface-version=
# CORE
group = core
admin-port=13176
smsbox-port=10176
admin-password=k4nn3l
log-file="/app/kannel-telcow/log/bearerbox.log"
log-level=0
access-log-format="%l [SMSC:%i] [SVC:%n] [ACT:%A] [BINF:%B] [FID:%F] [META:%D] [from:%p] [to:%P] [flags:%m:%c:%M:%C:%d] [msg:%L:%b] [udh:%U:%u]"
box-deny-ip="*.*.*.*"
box-allow-ip="127.0.0.1"
#unified-prefix = "00358,0"
access-log="/app/kannel-telcow/log/access/access.log"
dlr-storage = redis
# SMSBOX Setup
group = smsbox
bearerbox-host=localhost
sendsms-port=11176
log-file="/app/kannel-telcow/log/error-smsbox.log"
log-level=0
access-log="/app/kannel-telcow/log/smsaccess.log"
reply-couldnotfetch=""
reply-emptymessage=""
mo-recode=true
group = sendsms-user
username = someuser
password = somepwd
default-sender = 6148993003
default-smsc =
omit-empty = true
max-messages = 10
concatenation = true
group = sms-service
keyword = default
accept-x-kannel-headers = true
get-url = "http://10.4.163.74/gateway_kannel/KannelEntrance?udh=%u&Command=%k&Sender=%p&SMSbody=%r&receiver=%P&fromSMSC=%i"
omit-empty = true
max-messages = 10
group = redis-connection
id = redisdlr
host = 10.4.163.221
port = 6666
database = 0
max-connections = 1
group = dlr-db
id = redisdlr
table = dlr
#ttl = 1
field-smsc = smsc
field-timestamp = ts
field-destination = destination
field-source = source
field-service = service
field-url = url
field-mask = mask
field-status = status
field-boxc-id = boxc
Below is the kannel.sh status output. Please note the "connecting" state; it should be "online" if everything is well.
[DEV.]root#kannel64-001t:/app/kannel-telcow/etc $ kannel.sh status telcow
=== telcow (13176) ===
Kannel bearerbox version `1.4.4'.
Status: running, uptime 0d 0h 0m 4s
smsbox:(none), IP 127.0.0.1 (0 queued), (on-line 0d 0h 0m 3s)
smsc-au-telcow[smsc-au-telcow] SMPP:120.240.136.6:18766/18766:someid: (connecting, rcvd: sms 0 (0.00,0.00,0.00) / dlr 0 (0.00,0.00,0.00), sent: sms 0 (0.00,0.00,0.00) / dlr 0 (0.00,0.00,0.00), failed 0, queued 0 msgs)
smsc-au-telcow[smsc-au-telcow] SMPP:120.240.136.7:18766/18766:someid: (connecting, rcvd: sms 0 (0.00,0.00,0.00) / dlr 0 (0.00,0.00,0.00), sent: sms 0 (0.00,0.00,0.00) / dlr 0 (0.00,0.00,0.00), failed 0, queued 0 msgs)
Note: the SMSC username/password and IP addresses have been changed in this post for security reasons. Can someone please advise? I'm really at a loss here.
Many thanks in advance.
/B
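One thing worth checking up front: both smsc groups set use-ssl=true, while the telnet test described above succeeds over plain TCP, and the bind_transceiver PDUs in the log never get a response. If the SMSC does not actually speak TLS on that port, the handshake stalls and the bind sits in "connecting" indefinitely. A minimal probe, assuming Python 3 and the host/port from the config above:

import socket
import ssl

# Host and port are taken from the first smsc group in the config above.
HOST, PORT = "120.240.136.6", 18766

ctx = ssl.create_default_context()
ctx.check_hostname = False      # diagnostic probe only
ctx.verify_mode = ssl.CERT_NONE

with socket.create_connection((HOST, PORT), timeout=10) as sock:
    # If this hangs or raises ssl.SSLError, the SMSC is likely expecting
    # plain TCP and use-ssl=true is the wrong setting for this bind.
    with ctx.wrap_socket(sock) as tls:
        print("TLS handshake OK:", tls.version())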
Hello, is this your whole kannel.conf file? Avoid attributes without a value in your conf, e.g.:
system-type=
I don't see an SMSBOX-ROUTE configuration. You need one to forward the flow to the smsbox. This configuration works for me.
Please update your kannel.conf file that way and try:
#-------------CORE CONFIGURATION ------------------------
group = core
admin-port=13176
smsbox-port=10176
admin-password=k4nn3l
log-file="/app/kannel-telcow/log/bearerbox.log"
log-level=0
access-log-format="%l [SMSC:%i] [SVC:%n] [ACT:%A] [BINF:%B] [FID:%F] [META:%D] [from:%p] [to:%P] [flags:%m:%c:%M:%C:%d] [msg:%L:%b] [udh:%U:%u]"
admin-allow-ip = "*.*"
box-deny-ip="*.*.*.*"
box-allow-ip="127.0.0.1"
#unified-prefix = "00358,0"
access-log="/app/kannel-telcow/log/access/access.log"
#----------GROUP CONFIGURATION --------------------------
group = smsc
smsc=smpp
transceiver-mode = true
smsc-id=smsc-au-telcow
port=18766
host=120.240.136.6
address-range=""
smsc-username=someid
smsc-password=somepasswd
source-addr-ton=1
source-addr-npi=1
dest-addr-ton=1
dest-addr-npi=1
bind-addr-ton=1
bind-addr-npi=1
msg-id-type=0x01
alt-charset="ASCII"
keepalive=100
idle-timeout=100
max-pending-submits=10
use-ssl=true
wait-ack=600
throughput=60
#---------SMSBOX CONFIGURATION -----------------------
group = smsbox
smsbox-id =smsbox
bearerbox-host="127.0.0.1"
sendsms-port=11176
log-file="/app/kannel-telcow/log/error-smsbox.log"
log-level=0
access-log="/app/kannel-telcow/log/smsaccess.log"
#-------------- SMSBOX-ROUTE CONFIGURATION ----------------
group = smsbox-route
smsbox-id =smsbox
smsc-id =smsc-au-telcow
#-------SMS-SERVICE CONFIGURATION --------------------------
group = sms-service
keyword = default
catch-all = true
accept-x-kannel-headers = true
get-url = "http://10.4.163.74/gateway_kannel/KannelEntrance?udh=%u&Command=%k&Sender=%p&SMSbody=%r&receiver=%P&fromSMSC=%i"
omit-empty = true
max-messages = 10
assume-plain-text = true
#-------SENDSMS-USER CONFIGURATION ----------------------
group = sendsms-user
username = someuser
password = somepwd
default-sender = 6148993003
forced-smsc = smpp
omit-empty = true
max-messages = 10
concatenation = true
#---end.
After this, make sure the user running Kannel has permission to read and write the log files.
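Once both boxes are up and the bind shows "online" in kannel.sh status, a quick way to smoke-test the whole path is the smsbox sendsms HTTP interface. A sketch assuming the sendsms-port (11176) and the sendsms-user credentials from the config above; the destination number is a placeholder:

import urllib.parse
import urllib.request

# Username, password and port come from the sendsms-user and smsbox groups
# above; "to" is a placeholder destination number.
params = urllib.parse.urlencode({
    "username": "someuser",
    "password": "somepwd",
    "to": "61400000000",
    "text": "kannel smoke test",
})
url = "http://127.0.0.1:11176/cgi-bin/sendsms?" + params
with urllib.request.urlopen(url) as resp:
    print(resp.status, resp.read().decode())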
Related
I tried to scrape this website using Scrapy and scrapy-selenium for practice. I am trying to get names, prices, etc. My XPath expression looks fine in the Chrome dev tools, but it isn't working in my script, and I don't know what I am doing wrong. Can you please explain why my XPath expression is not working?
import scrapy
from scrapy_selenium import SeleniumRequest
from scrapy.selector import Selector
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


class ComputerdealsSpider(scrapy.Spider):
    name = 'computerdeals'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://slickdeals.net/computer-deals',
            wait_time=3,
            callback=self.parse
        )

    def parse(self, response):
        products = response.xpath("//ul[@class='bp-p-filterGrid_items']/li")
        for product in products:
            yield {
                'price': product.xpath(".//div/span[@class='bp-c-card_subtitle']/text()").get(),
            }
OUTPUT
2022-11-20 13:59:59 [scrapy.utils.log] INFO: Scrapy 2.7.0 started (bot: silkdeals)
2022-11-20 13:59:59 [scrapy.utils.log] INFO: Versions: lxml 4.9.1.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 2.0.1, Twisted 22.8.0, Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)], pyOpenSSL 22.1.0 (OpenSSL 3.0.5 5 Jul 2022), cryptography 38.0.1, Platform Windows-10-10.0.19044-SP0
2022-11-20 13:59:59 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'silkdeals',
'NEWSPIDER_MODULE': 'silkdeals.spiders',
'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['silkdeals.spiders'],
'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2022-11-20 13:59:59 [asyncio] DEBUG: Using selector: SelectSelector
2022-11-20 13:59:59 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2022-11-20 13:59:59 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events._WindowsSelectorEventLoop
2022-11-20 13:59:59 [scrapy.extensions.telnet] INFO: Telnet Password: d3adcd8a4caad669
2022-11-20 13:59:59 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2022-11-20 13:59:59 [scrapy.middleware] WARNING: Disabled SeleniumMiddleware: SELENIUM_DRIVER_NAME and SELENIUM_DRIVER_EXECUTABLE_PATH must be set
2022-11-20 13:59:59 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-11-20 13:59:59 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-11-20 13:59:59 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-11-20 13:59:59 [scrapy.core.engine] INFO: Spider opened
2022-11-20 13:59:59 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-11-20 13:59:59 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-11-20 14:00:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://slickdeals.net/robots.txt> (referer: None)
2022-11-20 14:00:00 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://slickdeals.net/computer-deals/> from <GET https://slickdeals.net/computer-deals>
2022-11-20 14:00:01 [filelock] DEBUG: Attempting to acquire lock 2668401413376 on C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\tldextract\.suffix_cache/publicsuffix.org-tlds\de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-11-20 14:00:01 [filelock] DEBUG: Lock 2668401413376 acquired on C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\tldextract\.suffix_cache/publicsuffix.org-tlds\de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-11-20 14:00:01 [filelock] DEBUG: Attempting to release lock 2668401413376 on C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\tldextract\.suffix_cache/publicsuffix.org-tlds\de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-11-20 14:00:01 [filelock] DEBUG: Lock 2668401413376 released on C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\tldextract\.suffix_cache/publicsuffix.org-tlds\de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2022-11-20 14:00:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://slickdeals.net/computer-deals/> (referer: None)
2022-11-20 14:00:01 [scrapy.core.engine] INFO: Closing spider (finished)
2022-11-20 14:00:01 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 681,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'downloader/response_bytes': 96185,
'downloader/response_count': 3,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/301': 1,
'elapsed_time_seconds': 2.098319,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 11, 20, 11, 0, 1, 689826),
'httpcompression/response_bytes': 617590,
'httpcompression/response_count': 2,
'log_count/DEBUG': 10,
'log_count/INFO': 10,
'log_count/WARNING': 1,
'response_received_count': 2,
'robotstxt/request_count': 1,
'robotstxt/response_count': 1,
'robotstxt/response_status_count/200': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2022, 11, 20, 10, 59, 59, 591507)}
2022-11-20 14:00:01 [scrapy.core.engine] INFO: Spider closed (finished)
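Note the line "Disabled SeleniumMiddleware: SELENIUM_DRIVER_NAME and SELENIUM_DRIVER_EXECUTABLE_PATH must be set" in the log above: with the middleware disabled, the SeleniumRequest is downloaded as a plain request, so the JavaScript-rendered listing your XPath targets may never be in the response at all. A sketch of the missing settings.py entries; the driver name and path are assumptions for a local Chrome install:

# settings.py
SELENIUM_DRIVER_NAME = "chrome"
SELENIUM_DRIVER_EXECUTABLE_PATH = r"C:\webdrivers\chromedriver.exe"  # assumed local path
SELENIUM_DRIVER_ARGUMENTS = ["--headless"]

DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}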
I am currently trying to crawl a website (around 500 subpages).
The script is working quite smoothly. However, after 3 to 4 hours of running, I sometimes get the error message which you can find below. I think it is not the script that causes the problem; it is the website's server, which is quite slow.
My question is: Is it somehow possible to allow more than 3 "failed requests" before the script automatically stops/closes the spider?
2019-09-27 10:53:46 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 1 pages/min), scraped 4480 items (at 10 items/min)
2019-09-27 10:54:00 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://blogabet.com/tipsters/?f[language]=all&f[pickType]=all&f[sport]=all&f[sportPercent]=&f[leagues]=all&f[picksOver]=0&f[lastActive]=12&f[bookiesUsed]=null&f[bookiePercent]=&f[order]=followers&f[start]=4480> (failed 1 times): 504 Gateway Time-out
2019-09-27 10:54:46 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 0 pages/min), scraped 4480 items (at 0 items/min)
2019-09-27 10:55:00 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://blogabet.com/tipsters/?f[language]=all&f[pickType]=all&f[sport]=all&f[sportPercent]=&f[leagues]=all&f[picksOver]=0&f[lastActive]=12&f[bookiesUsed]=null&f[bookiePercent]=&f[order]=followers&f[start]=4480> (failed 2 times): 504 Gateway Time-out
2019-09-27 10:55:46 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 0 pages/min), scraped 4480 items (at 0 items/min)
2019-09-27 10:56:00 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://blogabet.com/tipsters/?f[language]=all&f[pickType]=all&f[sport]=all&f[sportPercent]=&f[leagues]=all&f[picksOver]=0&f[lastActive]=12&f[bookiesUsed]=null&f[bookiePercent]=&f[order]=followers&f[start]=4480> (failed 3 times): 504 Gateway Time-out
2019-09-27 10:56:00 [scrapy.core.engine] DEBUG: Crawled (504) <GET https://blogabet.com/tipsters/?f[language]=all&f[pickType]=all&f[sport]=all&f[sportPercent]=&f[leagues]=all&f[picksOver]=0&f[lastActive]=12&f[bookiesUsed]=null&f[bookiePercent]=&f[order]=followers&f[start]=4480> (referer: https://blogabet.com/tipsters) ['partial']
2019-09-27 10:56:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <504 https://blogabet.com/tipsters/?f[language]=all&f[pickType]=all&f[sport]=all&f[sportPercent]=&f[leagues]=all&f[picksOver]=0&f[lastActive]=12&f[bookiesUsed]=null&f[bookiePercent]=&f[order]=followers&f[start]=4480>: HTTP status code is not handled or not allowed
2019-09-27 10:56:00 [scrapy.core.engine] INFO: Closing spider (finished)
UPDATED CODE ADDED
from time import sleep

from scrapy import Spider, Selector
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException


class AlltipsSpider(Spider):
    name = 'alltips'
    allowed_domains = ['blogabet.com']
    start_urls = ('https://blogabet.com',)

    def parse(self, response):
        self.driver = webdriver.Chrome(r'C:\webdrivers\chromedriver.exe')
        with open("urls.txt", "rt") as f:
            start_urls = [url.strip() for url in f.readlines()]
        for url in start_urls:
            self.driver.get(url)
            self.driver.find_element_by_id('currentTab').click()
            sleep(3)
            self.logger.info('Sleeping for 3 sec.')
            self.driver.find_element_by_xpath('//*[@id="_blog-menu"]/div[2]/div/div[2]/a[3]').click()
            sleep(7)
            self.logger.info('Sleeping for 7 sec.')
            while True:
                try:
                    element = self.driver.find_element_by_id('last_item')
                    self.driver.execute_script("arguments[0].scrollIntoView(0, document.documentElement.scrollHeight-5);", element)
                    sleep(3)
                    self.driver.find_element_by_id('last_item').click()
                    sleep(7)
                except NoSuchElementException:
                    self.logger.info('No more tipps')
                    sel = Selector(text=self.driver.page_source)
                    allposts = sel.xpath('//*[@class="block media _feedPick feed-pick"]')
                    for post in allposts:
                        username = post.xpath('.//div[@class="col-sm-7 col-lg-6 no-padding"]/a/@title').extract()
                        publish_date = post.xpath('.//*[@class="bet-age text-muted"]/text()').extract()
                        yield {'Username': username,
                               'Publish date': publish_date}
                    self.driver.quit()
                    break
You can do this by simply changing the RETRY_TIMES setting to a higher number.
You can read about your retry-related options in the RetryMiddleware docs: https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#std:setting-RETRY_TIMES
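For example, a sketch of the relevant settings.py entries; RETRY_TIMES is the per-request retry budget (the default is 2, i.e. three attempts in total), and 504 needs to stay in RETRY_HTTP_CODES for these gateway timeouts to keep being retried:

# settings.py
RETRY_ENABLED = True
RETRY_TIMES = 10  # retry each failing request up to 10 times instead of 2
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408]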
I'm attempting to stream a PersistenceQuery result with akka-http as SSE, but it seems that when the HTTP connection is closed by the client, the PersistenceQuery stream keeps hitting the event backend periodically.
// Http part
complete {
  source(id)
    .map(e => e.event) // other transformations
    .map(e => ServerSentEvent(e.toString))
    .keepAlive(4 seconds, () => ServerSentEvent.heartbeat)
}

// source
def source(id: UUID)(implicit system: ActorSystem, materializer: ActorMaterializer) = {
  import system.dispatcher
  val journalQuery = PersistenceQuery(system).readJournalFor[CassandraReadJournal](CassandraReadJournal.Identifier)
  val futureSrcGraph: RunnableGraph[Future[Source[EventEnvelope, NotUsed]]] =
    journalQuery.currentEventsByPersistenceId(id.toString, 0, Long.MaxValue)
      .map(_.sequenceNr)
      .toMat(Sink.last)(Keep.right)
      .mapMaterializedValue(fs => fs.recoverWith {
        case _ => Future { 0L } // assume we start at 1
      }.map(s => journalQuery.eventsByPersistenceId(id.toString, s + 1, Long.MaxValue)))

  Source.fromFutureSource(futureSrcGraph.run())
}
So this basically works; the only problem is that the stream never finishes, or so it seems. I have tried it with both CassandraReadJournal and LevelDB.
Example of the log output:
[DEBUG] [06/18/2018 10:52:16.774] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query from seqNr [6] in partition [0]
[DEBUG] [06/18/2018 10:52:16.790] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query took [15] ms (empty)
[DEBUG] [06/18/2018 10:52:16.790] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query from seqNr [6] in partition [1]
[DEBUG] [06/18/2018 10:52:16.796] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query took [5] ms (empty)
[DEBUG] [06/18/2018 10:52:19.768] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query from seqNr [6] in partition [0]
[DEBUG] [06/18/2018 10:52:19.784] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query took [15] ms (empty)
[DEBUG] [06/18/2018 10:52:19.784] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query from seqNr [6] in partition [1]
[DEBUG] [06/18/2018 10:52:19.790] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query took [6] ms (empty)
[DEBUG] [06/18/2018 10:52:22.765] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query from seqNr [6] in partition [0]
[DEBUG] [06/18/2018 10:52:22.772] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query took [6] ms (empty)
[DEBUG] [06/18/2018 10:52:22.772] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query from seqNr [6] in partition [1]
[DEBUG] [06/18/2018 10:52:22.790] [sys-cassandra-plugin-default-dispatcher-17] [EventsByPersistenceIdStage(akka://sys)] EventsByPersistenceId [c6031a8a-db71-4dcb-9d4f-f140faa2f4c4] Query took [17] ms (empty)
And it keeps going forever.
I have also tried omitting the Source.fromFutureSource and just running journalQuery.eventsByPersistenceId with the same results.
What am I doing wrong?
The issue here was that my corporate proxy never drops the connection to the server, even when the client closes it, so the server side never sees a cancellation and the live eventsByPersistenceId query keeps polling.
I followed the basic Scrapy login flow. It has always worked, but in this case I had some problems. FormRequest.from_response didn't request https://www.crowdfunder.com/user/validateLogin; instead it always sent the payload to https://www.crowdfunder.com/user/signup. I tried requesting validateLogin directly with the payload, but it responded with a 404 error. Any ideas to help me solve this problem? Thanks in advance!
import scrapy
from scrapy.spiders.init import InitSpider


class CrowdfunderSpider(InitSpider):
    name = "crowdfunder"
    allowed_domains = ["crowdfunder.com"]
    start_urls = [
        'http://www.crowdfunder.com/',
    ]
    login_page = 'https://www.crowdfunder.com/user/login/'
    payload = {}

    def init_request(self):
        """This function is called before crawling starts."""
        return scrapy.Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        self.payload = {'email': 'my_email',
                        'password': 'my_password'}
        # scrapy login
        return scrapy.FormRequest.from_response(response, formdata=self.payload, callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are
        successfully logged in.
        """
        if 'https://www.crowdfunder.com/user/settings' == response.url:
            self.log("Successfully logged in. :) :) :)")
            # start the crawling
            return self.initialized()
        else:
            # login fail
            self.log("login failed :( :( :(")
Here is the payload and request URL sent by clicking login in the browser:
(screenshot: payload and request URL sent by clicking the login button)
Here is the log info:
2016-10-21 21:56:21 [scrapy] INFO: Scrapy 1.1.0 started (bot: crowdfunder_crawl)
2016-10-21 21:56:21 [scrapy] INFO: Overridden settings: {'AJAXCRAWL_ENABLED': True, 'NEWSPIDER_MODULE': 'crowdfunder_crawl.spiders', 'SPIDER_MODULES': ['crowdfunder_crawl.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'crowdfunder_crawl'}
2016-10-21 21:56:21 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2016-10-21 21:56:21 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.ajaxcrawl.AjaxCrawlMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-10-21 21:56:21 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-10-21 21:56:21 [scrapy] INFO: Enabled item pipelines:
[]
2016-10-21 21:56:21 [scrapy] INFO: Spider opened
2016-10-21 21:56:21 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-10-21 21:56:21 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6024
2016-10-21 21:56:21 [scrapy] DEBUG: Crawled (200) <GET https://www.crowdfunder.com/robots.txt> (referer: None)
2016-10-21 21:56:21 [scrapy] DEBUG: Redirecting (301) to <GET http://www.crowdfunder.com/user/login> from <GET https://www.crowdfunder.com/user/login/>
2016-10-21 21:56:22 [scrapy] DEBUG: Redirecting (301) to <GET https://www.crowdfunder.com/user/login> from <GET http://www.crowdfunder.com/user/login>
2016-10-21 21:56:22 [scrapy] DEBUG: Crawled (200) <GET https://www.crowdfunder.com/user/login> (referer: None)
2016-10-21 21:56:23 [scrapy] DEBUG: Crawled (200) <POST https://www.crowdfunder.com/user/signup> (referer: https://www.crowdfunder.com/user/login)
2016-10-21 21:56:23 [crowdfunder] DEBUG: login failed :( :( :(
2016-10-21 21:56:23 [scrapy] INFO: Closing spider (finished)
2016-10-21 21:56:23 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1569,
'downloader/request_count': 5,
'downloader/request_method_count/GET': 4,
'downloader/request_method_count/POST': 1,
'downloader/response_bytes': 16313,
'downloader/response_count': 5,
'downloader/response_status_count/200': 3,
'downloader/response_status_count/301': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 10, 22, 4, 56, 23, 232493),
'log_count/DEBUG': 7,
'log_count/INFO': 7,
'request_depth_max': 1,
'response_received_count': 3,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'start_time': datetime.datetime(2016, 10, 22, 4, 56, 21, 180030)}
2016-10-21 21:56:23 [scrapy] INFO: Spider closed (finished)
FormRequest.from_response(response) by default uses the first form it finds. If you check what forms the page has, you'd see:
In : response.xpath("//form")
Out:
[<Selector xpath='//form' data='<form action="/user/signup" method="post'>,
<Selector xpath='//form' data='<form action="/user/login" method="POST"'>,
<Selector xpath='//form' data='<form action="/user/login" method="post"'>]
So the form you are looking for is not the first one. The way to fix it is to use one of the many from_response method parameters to specify which form to use.
Using formxpath is the most flexible, and my personal favorite:
In : FormRequest.from_response(response, formxpath='//*[contains(@action,"login")]')
Out: <POST https://www.crowdfunder.com/user/login>
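Applied to the spider from the question, only the login callback needs to change; same payload as before, with from_response pinned to the login form:

def login(self, response):
    """Generate a login request against the login form, not the signup form."""
    self.payload = {'email': 'my_email', 'password': 'my_password'}
    return scrapy.FormRequest.from_response(
        response,
        formxpath='//*[contains(@action, "login")]',
        formdata=self.payload,
        callback=self.check_login_response,
    )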
We are not able to receive MO messages on SMPP Kannel.
Below is the configured smsbox configuration:
group = smsbox
bearerbox-host = 127.0.0.1
sendsms-port = 13013
global-sender = 13013
sms-length = 500
smsbox-id = mysmsc
group = sendsms-user
username = tester
password = foobar
#max-messages=3
# http://kannel.machine:13013/cgi-bin/sendsms?username=tester&password=foobar
group = sms-service
accepted-smsc = smsc2
keyword = default
catch-all = yes
max-messages = 3
get-url = "http://some.com/rcv.php?sender=%p&text=%a"
group = smsbox-route
smsbox-id = mysmsc
smsc-id = smsc2
shortcode = 4867
Below is the debug log:
2016-08-29 22:22:47 [9639] [7] DEBUG: validity_period: NULL
2016-08-29 22:22:47 [9639] [7] DEBUG: registered_delivery: 0 = 0x00000000
2016-08-29 22:22:47 [9639] [7] DEBUG: replace_if_present_flag: 0 = 0x00000000
2016-08-29 22:22:47 [9639] [7] DEBUG: data_coding: 0 = 0x00000000
2016-08-29 22:22:47 [9639] [7] DEBUG: sm_default_msg_id: 0 = 0x00000000
2016-08-29 22:22:47 [9639] [7] DEBUG: sm_length: 4 = 0x00000004
2016-08-29 22:22:47 [9639] [7] DEBUG: short_message: "Test"
2016-08-29 22:22:47 [9639] [7] DEBUG: SMPP PDU dump ends.
2016-08-29 22:22:47 [9639] [7] WARNING: smsbox_list empty!
2016-08-29 22:22:47 [9639] [7] DEBUG: SMPP[smsc1]: Sending PDU:
2016-08-29 22:22:47 [9639] [7] DEBUG: SMPP PDU 0x7f6310005730 dump:
2016-08-29 22:22:47 [9639] [7] DEBUG: type_name: deliver_sm_resp
2016-08-29 22:22:47 [9639] [7] DEBUG: command_id: 2147483653 = 0x80000005
2016-08-29 22:22:47 [9639] [7] DEBUG: command_status: 0 = 0x00000000
2016-08-29 22:22:47 [9639] [4] WARNING: smsbox_list empty!
2016-08-29 22:22:47 [9639] [7] DEBUG: sequence_number: 118 = 0x00000076
2016-08-29 22:22:47 [9639] [7] DEBUG: message_id: NULL
2016-08-29 22:22:47 [9639] [7] DEBUG: SMPP PDU dump ends.
You don't have an smsc group there (smsc2 is not defined).
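For completeness, a minimal sketch of the missing group; host, port and credentials are placeholders, the point being that smsc-id must be smsc2 so that the accepted-smsc and smsbox-route entries above can match it:

# Placeholder host/port/credentials; only smsc-id = smsc2 matters here.
group = smsc
smsc = smpp
smsc-id = smsc2
host = smsc.example.net
port = 2775
transceiver-mode = true
smsc-username = someid
smsc-password = somepasswd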