Connect Scrapy crawler with S3

My crawler downloads a Request.body from a URL, which I save to a local file. Now I would like to connect to my AWS S3 bucket.
I read the documentation but face two issues:
1. The config and credentials files are apparently not of a dict type? They are unmodified aws-config and aws-credentials files, yet I get:
The s3 config key is not a dictionary type, ignoring its value of: None
2. The response is of type 'bytes' and cannot be processed by the feed exporter as such. I tried response.text and the same error was raised, but with 'str'.
Any help is highly appreciated. Thank you.
Additional information:
config file (path ~/.aws/config):
[default]
Region=eu-west-2
output=csv
and
credentials file (path ~/.aws/credentials):
[default]
aws_access_key_id=aws_access_key_id=foo
aws_secret_access_key=aws_secret_access_key=bar
the link to Scrapy documentation:
https://docs.scrapy.org/en/latest/topics/settings.html?highlight=s3
MacBook-Pro:aircraftPositions frederic$ scrapy crawl aircraftData_2018
2019-03-13 15:35:04 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: aircraftPositions)
2019-03-13 15:35:04 [scrapy.utils.log] INFO: Versions: lxml 4.3.1.0, libxml2 2.9.9, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.9.0, Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28) - [Clang 6.0 (clang-600.0.57)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1a 20 Nov 2018), cryptography 2.5, Platform Darwin-18.2.0-x86_64-i386-64bit
2019-03-13 15:35:04 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'aircraftPositions', 'CONCURRENT_REQUESTS': 32, 'CONCURRENT_REQUESTS_PER_DOMAIN': 32, 'DOWNLOAD_DELAY': 10, 'FEED_FORMAT': 'json', 'FEED_STORE_EMPTY': True, 'FEED_URI': 's3://flightlists/lists_v1/%(name)s/%(time)s.json', 'NEWSPIDER_MODULE': 'aircraftPositions.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['aircraftPositions.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
2019-03-13 15:35:04 [scrapy.extensions.telnet] INFO: Telnet Password: 2f2c11f3300481ed
2019-03-13 15:35:04 [py.warnings] WARNING: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scrapy/utils/misc.py:144: ScrapyDeprecationWarning: Initialising `scrapy.extensions.feedexport.S3FeedStorage` without AWS keys is deprecated. Please supply credentials or use the `from_crawler()` constructor.
return objcls(*args, **kwargs)
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from before-call.apigateway to before-call.api-gateway
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
2019-03-13 15:35:04 [botocore.loaders] DEBUG: Loading JSON file: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botocore/data/endpoints.json
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Event choose-service-name: calling handler <function handle_service_name_alias at 0x103ca4b70>
2019-03-13 15:35:04 [botocore.loaders] DEBUG: Loading JSON file: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botocore/data/s3/2006-03-01/service-2.json
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x103c688c8>
2019-03-13 15:35:04 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x103c686a8>
2019-03-13 15:35:04 [botocore.args] DEBUG: The s3 config key is not a dictionary type, ignoring its value of: None
2019-03-13 15:35:04 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2019-03-13 15:35:04 [botocore.loaders] DEBUG: Loading JSON file: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/botocore/data/_retry.json
2019-03-13 15:35:04 [botocore.client] DEBUG: Registering retry handlers for service: s3
2019-03-13 15:35:04 [botocore.client] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2019-03-13 15:35:04 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2019-03-13 15:35:04 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
... after that it enables the remaining downloader and spider middlewares.
This is the spider:
class QuotesSpider(scrapy.Spider):
    name = "aircraftData_2018"

    def url_values(self):
        time = list(range(1538140980, 1538140780, -60))
        return time

    def start_requests(self):
        allowed_domains = ["https://domaine.net"]
        list_urls = []
        for n in self.url_values():
            list_urls.append("https://domaine.net/.../.../all/{}".format(n))
        for url in list_urls:
            yield scrapy.Request(url=url, callback=self.parse, dont_filter=True)

    def parse(self, response):
        i = AircraftpositionsItem()  # was `AircraftpositionsItem` without parentheses, i.e. never instantiated
        i['url'] = response.url
        i['body'] = response.body    # bytes -- this is what the JSON feed exporter cannot serialize
        yield i                      # was `yield I`, an undefined name
This is the pipeline.py:
class AircraftpositionsPipeline(object):

    def process_item(self, item, spider):
        return item

    def return_body(self, response):
        page = response.url.split("/")[-1]
        filename = 'aircraftList-{}.csv'.format(page)
        with open(filename, 'wb') as f:
            f.write(response.body)
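Two things may be worth trying here, sketched below rather than offered as a verified fix. First, the deprecation warning in the log ("Initialising S3FeedStorage without AWS keys is deprecated. Please supply credentials") suggests passing the keys through Scrapy's own settings instead of relying on the ~/.aws files; the values below are placeholders:

    # settings.py -- placeholder credentials and bucket path
    AWS_ACCESS_KEY_ID = 'foo'
    AWS_SECRET_ACCESS_KEY = 'bar'
    FEED_URI = 's3://flightlists/lists_v1/%(name)s/%(time)s.json'
    FEED_FORMAT = 'json'

Second, if response.text (a str) really raised the same exporter error as response.body (bytes), double-check that the spider yields an item instance at all: as posted, i = AircraftpositionsItem binds the class rather than an instance and yield I references an undefined name, so the traceback may not be about serialization in the first place.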

While executing the script it launches the URL in the emulator but the rest of the lines do not execute (Karate framework)

When I execute the script it launches the URL "www.google.com" in the emulator, but the rest of the lines are not executed.
Please suggest a solution. Is this the right way to write the element locators?
Environment details:
JDK 1.8
Appium v1.17
Node.js v12.18
Android Studio v4
Feature file:
Feature: Testing Mobile

Scenario: launch chrome in appium
    * configure driver =
    """
    {
      type: 'android',
      webDriverPath: "/wd/hub",
      start: true,
      httpConfig: { readTimeout: 120000 }
    }
    """
    * def desiredConfig =
    """
    {
      "newCommandTimeout": 300,
      "platformVersion": "10.0",
      "platformName": "Android",
      "connectHardwareKeyboard": true,
      "deviceName": "emulator-5554",
      "avd": "Pixel_2_API_29",
      "automationName": "UiAutomator2",
      "browserName": "Chrome",
      "chromedriverExecutable": "C:/Users/abc/Downloads/chromedriver_win32_2/chromedriver.exe"
    }
    """
    * driver { webDriverSession: { desiredCapabilities: "#(desiredConfig)" } }
    * driver 'http://google.com'
    And delay(4000)
    * driver.click("//a[text()='Images']")
    # driver.input("//input[@name='q']", 'karate dsl')
Logs in console:
10:31:16.654 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [HTTP] --> POST /wd/hub
/session/3b12fd81-db3b-421e-8218-b94c1ed331b5/element
10:31:16.695 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [HTTP] {"using":"xpath","value":"//a[text()
='Images']"}
10:31:16.698 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [MJSONWP (3b12fd81)] Calling AppiumDrive
r.findElement() with args: ["xpath","//a[text()='Images']","3b12fd81-db3b-421e-8218-b94c1ed331b5"]
10:31:16.699 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [BaseDriver] Valid locator strategies fo
r this request: xpath, id, class name, accessibility id, -android uiautomator
10:31:16.702 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [BaseDriver] Waiting up to 0 ms for cond
ition
10:31:16.703 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [WD Proxy] Matched '/element' to command
name 'findElement'
10:31:16.706 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [WD Proxy] Proxying [POST /element] to [
POST http://127.0.0.1:8203/wd/hub/session/600582cc-a05b-422e-b886-7daeff02de45/element] with body: {"strategy":"xpath","selector":"//a[text()='Images']","context":"","multiple":false}
10:31:17.466 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [WD Proxy] Got response with status 404: {"sessi
onId":"600582cc-a05b-422e-b886-7daeff02de45","value":{"error":"no such element","message":"An element could not be located on the page using the given search parameters","stacktrace":"io.appium.uiautomator2.common.exceptions.ElementNotFoundException: An element could not be located on the page using the given search parameters\n\tat io.appium.uiautomator2.handler.FindElement.findElement(FindElement.java:102)\n\tat io.appium.uiautomator2.handler.FindElement.safeHandle(FindElement.java:72)\n\tat io.appium.uiautomator2.handler.request.SafeRequestHandler.handle(SafeRequestHandler.java:38)\n\tat io.appium.uiautomator2.server.AppiumServlet.handleRequest(AppiumServlet.java:252)\n\tat io.appium.uiautomator2.server.AppiumServlet.handleHttpRequest(AppiumServlet.java:242)\n\tat io.appium.uiautomator2.http.ServerHandler.channelRead(ServerHandler.java:51)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)\n\tat io.netty.channel.AbstractChannelHandlerCon...
10:31:17.467 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [W3C] Matched W3C error code 'no such el
ement' to NoSuchElementError
10:31:17.474 [ForkJoinPool-1-worker-1] DEBUG com.intuit.karate - response time in milliseconds: 823.75
5 < 500
5 < Connection: keep-alive
5 < Content-Length: 164
5 < Content-Type: application/json; charset=utf-8
5 < Date: Wed, 01 Jul 2020 05:01:17 GMT
5 < ETag: W/"a4-/qNMwkKiq6QWZf9aZdImFcg10wM"
5 < Vary: X-HTTP-Method-Override
5 < X-Powered-By: Express
{"status":7,"value":{"message":"An element could not be located on the page using the given search parameters."},"sessionId":"3b12fd81-db3b-421e-8218-b94c1ed331b5"}
10:31:17.478 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [MJSONWP (3b12fd81)] Encountered interna
l error running command: NoSuchElementError: An element could not be located on the page using the given search parameters.
10:31:17.484 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [MJSONWP (3b12fd81)] at AndroidUiaut
omator2Driver.findElOrEls (C:\Users\M1058955\AppData\Roaming\npm\node_modules\appium\node_modules\appium-android-driver\lib\commands\find.js:75:11)
10:31:17.484 [ForkJoinPool-1-worker-1] WARN com.intuit.karate - http response code: 500, response: {"sessionId":"3b12fd81-db3b-421e-8218-b94c1ed331b5","value":{"message":"An element could not be located on the page using the given search parameters."},"status":7}, request: [method: POST, responseTime: 823.7536, body: {"using":"xpath","value":"//a[text()='Images']"}]
10:31:17.484 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [HTTP] <-- POST /wd/hub/session/3b12fd81-db
3b-421e-8218-b94c1ed331b5/element 500 819 ms - 164
10:31:17.484 [ForkJoinPool-1-worker-1] WARN c.i.k.driver.android_1593579654225 - locator failed, will retry once: {"sessionId":"3b12fd81-db3b-421e-8218-b94c1ed331b5","value":{"message":"An element could not be located on the page using the given search parameters."},"status":7}
10:31:17.485 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [HTTP]
10:31:20.490 [ForkJoinPool-1-worker-1] DEBUG com.intuit.karate - request:
6 > POST http://localhost:50636/wd/hub/session/3b12fd81-db3b-421e-8218-b94c1ed331b5/element
6 > Accept-Encoding: gzip,deflate
6 > Connection: Keep-Alive
6 > Content-Length: 48
6 > Content-Type: application/json; charset=UTF-8
6 > Host: localhost:50636
6 > User-Agent: Apache-HttpClient/4.5.12 (Java/1.8.0_181)
{"using":"xpath","value":"//a[text()='Images']"}
10:31:20.531 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [HTTP] --> POST /wd/hub
/session/3b12fd81-db3b-421e-8218-b94c1ed331b5/element
10:31:20.532 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [HTTP] {"using":"xpath","value":"//a[text()
='Images']"}
10:31:20.533 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [MJSONWP (3b12fd81)] Calling AppiumDrive
r.findElement() with args: ["xpath","//a[text()='Images']","3b12fd81-db3b-421e-8218-b94c1ed331b5"]
10:31:20.534 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [BaseDriver] Valid locator strategies fo
r this request: xpath, id, class name, accessibility id, -android uiautomator
10:31:20.535 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [BaseDriver] Waiting up to 0 ms for cond
ition
10:31:20.536 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [WD Proxy] Matched '/element' to command
name 'findElement'
10:31:20.536 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [WD Proxy] Proxying [POST /element] to [
POST http://127.0.0.1:8203/wd/hub/session/600582cc-a05b-422e-b886-7daeff02de45/element] with body: {"strategy":"xpath","selector":"//a[text()='Images']","context":"","multiple":false}
10:31:21.026 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [WD Proxy] Got response with status 404: {"sessi
onId":"600582cc-a05b-422e-b886-7daeff02de45","value":{"error":"no such element","message":"An element could not be located on the page using the given search parameters","stacktrace":"io.appium.uiautomator2.common.exceptions.ElementNotFoundException: An element could not be located on the page using the given search parameters\n\tat io.appium.uiautomator2.handler.FindElement.findElement(FindElement.java:102)\n\tat io.appium.uiautomator2.handler.FindElement.safeHandle(FindElement.java:72)\n\tat io.appium.uiautomator2.handler.request.SafeRequestHandler.handle(SafeRequestHandler.java:38)\n\tat io.appium.uiautomator2.server.AppiumServlet.handleRequest(AppiumServlet.java:252)\n\tat io.appium.uiautomator2.server.AppiumServlet.handleHttpRequest(AppiumServlet.java:242)\n\tat io.appium.uiautomator2.http.ServerHandler.channelRead(ServerHandler.java:51)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)\n\tat io.netty.channel.AbstractChannelHandlerCon...
10:31:21.027 [android_1593579654225] DEBUG c.i.k.driver.android_1593579654225 - [debug] [W3C] Matched W3C error code 'no such el
ement' to NoSuchElementError
10:31:21.028 [ForkJoinPool-1-worker-1] DEBUG com.intuit.karate - response time in milliseconds: 498.60
6 < 500
Can you please try Karate version 0.9.6.RC3? Maybe the Images locator changes based on your location. The following XPath-based steps should work with this version of Karate and Appium 1.17.1:

    Given driver 'https://www.google.com'
    Then waitForUrl('https://www.google.com')
    And click("//a[text()='Images']")
    And input("//input[@name='q']", 'karate dsl')
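If the element simply has not rendered by the time the click fires, replacing the fixed delay with an explicit wait may also help; a minimal sketch using Karate's waitFor(), with the same XPath locator as above (which is itself an assumption about the page's markup):

    * driver 'https://www.google.com'
    # waitFor() retries until the locator matches, then returns the element for chaining
    * waitFor("//a[text()='Images']").click()
    * input("//input[@name='q']", 'karate dsl')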

NightwatchJS / geckodriver / selenoid: Error retrieving a new session from the selenium server while executing a test on Firefox

Setup Information:
OS - MacOS HighSierra - 10.13.3
DockerCE for Mac - 17.12.0-ce-mac49
Selenoid, Selenoid-ui - Latest
Firefox - 58, geckodriver - 0.19.1 (also tried 0.19.0 and 0.17.0)
Chrome - 63, ChromeDriver 2.34.522932
Selenium standalone server - 3.9.1
On the local setup the test runs successfully with Firefox and Chrome,
using the same geckodriver.
On the remote setup the test fails with Firefox and succeeds with Chrome.
Verbose log from NightwatchJS
Prasannas-MacBook-Air:sim1 prvenkat$ ./node_modules/nightwatch/bin/nightwatch ./src/test/login.js --verbose
[Test / Login] Test Suite
=============================
Running: Validate Login page
INFO Request: POST /wd/hub/session
- data: {"desiredCapabilities":{"browserName":"firefox","javascriptEnabled":true,"acceptSslCerts":true,"platform":"ANY","moz:firefoxOptions":{"log":{"level":"trace"}},"version":"57.0","name":"Test / Login"}}
- headers: {"Content-Type":"application/json; charset=utf-8","Content-Length":199}
INFO Response 200 POST /wd/hub/session (3439ms) { value:
{ sessionId: '43ba7a2a-8fac-4ebb-8a8f-e046e5534944',
capabilities:
{ acceptInsecureCerts: false,
browserName: 'firefox',
browserVersion: '57.0',
'moz:accessibilityChecks': false,
'moz:headless': false,
'moz:processID': 37,
'moz:profile': '/tmp/rust_mozprofile.i3MIoxBzkGAM',
'moz:webdriverClick': false,
pageLoadStrategy: 'normal',
platformName: 'linux',
platformVersion: '4.9.60-linuxkit-aufs',
rotatable: false,
timeouts: { implicit: 0, pageLoad: 300000, script: 30000 } } } }
Error retrieving a new session from the selenium server
Connection refused! Is selenium server started?
{ value:
{ sessionId: '43ba7a2a-8fac-4ebb-8a8f-e046e5534944',
capabilities:
{ acceptInsecureCerts: false,
browserName: 'firefox',
browserVersion: '57.0',
'moz:accessibilityChecks': false,
'moz:headless': false,
'moz:processID': 37,
'moz:profile': '/tmp/rust_mozprofile.i3MIoxBzkGAM',
'moz:webdriverClick': false,
pageLoadStrategy: 'normal',
platformName: 'linux',
platformVersion: '4.9.60-linuxkit-aufs',
rotatable: false,
timeouts: [Object] } } }
Trace log from the Docker container
Initialize...
Connecting to ws://localhost:8080/ws/logs/43ba7a2a-8fac-4ebb-8a8f-e046e5534944...
Connected!
--- x11vnc loop: 1 ---
2018/02/22 05:20:47 Loading configuration files...
2018/02/22 05:20:47 Loaded configuration from [/etc/selenoid/browsers.json]
2018/02/22 05:20:47 Using default containers log configuration because of:read error: open config/container-logs.json: no such file or directory
2018/02/22 05:20:47 Timezone: UTC
2018/02/22 05:20:47 Listening on :4444
2018/02/22 05:20:47 [NEW_REQUEST]
2018/02/22 05:20:47 [NEW_REQUEST_ACCEPTED]
2018/02/22 05:20:47 [0] [LOCATING_SERVICE] [firefox-57.0]
2018/02/22 05:20:47 [0] [USING_DRIVER] [firefox-57.0]
2018/02/22 05:20:47 [0] [ALLOCATING_PORT]
2018/02/22 05:20:47 [0] [ALLOCATED_PORT] [41809]
2018/02/22 05:20:47 [0] [STARTING_PROCESS] [[/usr/bin/geckodriver --host :: --log debug --port=41809]]
1519276847981 geckodriver INFO geckodriver 0.19.1
1519276847981 webdriver::httpapi DEBUG Creating routes
1519276847986 geckodriver INFO Listening on [::]:41809
1519276848032 webdriver::server DEBUG -> HEAD /
1519276848160 webdriver::server DEBUG <- 404 Not Found {"value":{"error":"unknown command","message":"HEAD / did not match a known command","stacktrace":"stack backtrace:\n 0: 0x4edb3c - backtrace::backtrace::trace::hc4bd56a2f176de7e\n 1: 0x4edb72 - backtrace::capture::Backtrace::new::he3b2a15d39027c46\n 2: 0x440ac8 - webdriver::error::WebDriverError::new::ha0fbd6d1a1131b43\n 3: 0x43a665 - <webdriver::server::HttpHandler<U> as hyper::server::Handler>::handle::h343049f2e1aa3f13\n 4: 0x404b4d - std::sys_common::backtrace::__rust_begin_short_backtrace::he840f14c79c8e321\n 5: 0x40bee6 - std::panicking::try::do_call::hdd1d6b985699ef9d\n 6: 0x5e6a6c - panic_unwind::__rust_maybe_catch_panic\n at /checkout/src/libpanic_unwind/lib.rs:99\n 7: 0x41ef02 - <F as alloc::boxed::FnBox<A>>::call_box::hae8ac6ade91dedb6\n 8: 0x5df13b - alloc::boxed::{{impl}}::call_once<(),()>\n at /checkout/src/liballoc/boxed.rs:692\n - std::sys_common::thread::start_thread\n at /checkout/src/libstd/sys_common/thread.rs:21\n - std::sys::imp::thread::{{impl}}::new::thread_start\n at /checkout/src/libstd/sys/unix/thread.rs:84"}}
2018/02/22 05:20:48 [0] [PROCESS_STARTED] [32] [181.491408ms]
2018/02/22 05:20:48 [0] [PROXYING_REQUESTS] [http://127.0.0.1:41809]
2018/02/22 05:20:48 [0] [SESSION_ATTEMPTED] [unknown] [http://127.0.0.1:41809] [1]
1519276848161 webdriver::server DEBUG -> POST /session {"desiredCapabilities":{"browserName":"firefox","javascriptEnabled":true,"acceptSslCerts":true,"platform":"ANY","moz:firefoxOptions":{"log":{"level":"trace"}},"version":"57.0","name":"Test / Login"}}
1519276848167 mozrunner::runner INFO Running command: "/usr/bin/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.i3MIoxBzkGAM"
1519276848171 geckodriver::marionette TRACE connection attempt 0/600
1519276848271 geckodriver::marionette TRACE connection attempt 1/600
--- x11vnc loop: waiting for: 61
1519276848372 geckodriver::marionette TRACE connection attempt 2/600
PORT=5900
1519276848474 geckodriver::marionette TRACE connection attempt 3/600
1519276848575 geckodriver::marionette TRACE connection attempt 4/600
1519276848677 geckodriver::marionette TRACE connection attempt 5/600
1519276848694 Marionette DEBUG Received observer notification "profile-after-change"
1519276848755 Marionette DEBUG Received observer notification "command-line-startup"
1519276848756 Marionette INFO Enabled via --marionette
1519276848778 geckodriver::marionette TRACE connection attempt 6/600
1519276848880 geckodriver::marionette TRACE connection attempt 7/600
1519276848980 geckodriver::marionette TRACE connection attempt 8/600
1519276849081 geckodriver::marionette TRACE connection attempt 9/600
1519276849183 geckodriver::marionette TRACE connection attempt 10/600
1519276849285 geckodriver::marionette TRACE connection attempt 11/600
1519276849386 geckodriver::marionette TRACE connection attempt 12/600
1519276849487 geckodriver::marionette TRACE connection attempt 13/600
1519276849589 geckodriver::marionette TRACE connection attempt 14/600
1519276849690 geckodriver::marionette TRACE connection attempt 15/600
1519276849792 geckodriver::marionette TRACE connection attempt 16/600
1519276849894 geckodriver::marionette TRACE connection attempt 17/600
1519276849930 Marionette DEBUG Received observer notification "sessionstore-windows-restored"
1519276849996 geckodriver::marionette TRACE connection attempt 18/600
1519276850097 geckodriver::marionette TRACE connection attempt 19/600
1519276850197 geckodriver::marionette TRACE connection attempt 20/600
1519276850298 geckodriver::marionette TRACE connection attempt 21/600
1519276850399 geckodriver::marionette TRACE connection attempt 22/600
1519276850499 geckodriver::marionette TRACE connection attempt 23/600
1519276850600 geckodriver::marionette TRACE connection attempt 24/600
1519276850679 Marionette DEBUG Setting recommended pref toolkit.cosmeticAnimations.enabled to false
1519276850681 Marionette DEBUG Setting recommended pref datareporting.policy.dataSubmissionPolicyAccepted to false
1519276850681 Marionette DEBUG Setting recommended pref extensions.e10sBlocksEnabling to false
1519276850682 Marionette DEBUG New connections are accepted
1519276850683 Marionette INFO Listening on port 37569
1519276850702 geckodriver::marionette DEBUG Connected to Marionette onlocalhost:37569
1519276850712 Marionette DEBUG Accepted connection 0 from 127.0.0.1:42836
1519276850714 geckodriver::marionette TRACE <- {"applicationType":"gecko","marionetteProtocol":3}
1519276850714 geckodriver::marionette TRACE -> 315:[0,1,"newSession",{"acceptSslCerts":true,"browserName":"firefox","capabilities":{"desiredCapabilities":{"acceptSslCerts":true,"browserName":"firefox","javascriptEnabled":true,"name":"Test / Login","platform":"ANY","version":"57.0"}},"javascriptEnabled":true,"name":"Test / Login","platform":"ANY","version":"57.0"}]
1519276850717 Marionette TRACE 0 -> [0,1,"newSession",{"acceptSslCerts":true,"browserName":"firefox","capabilities":{"desiredCapabilities":{"acceptSslCerts":true,"browserName":"firefox","javascriptEnabled":true,"name":"Test / Login","platform":"ANY","version":"57.0"}},"javascriptEnabled":true,"name":"Test / Login","platform":"ANY","version":"57.0"}]
1519276850783 Marionette DEBUG Register listener.js for window 2147483649
1519276850806 Marionette TRACE 0 <- [1,1,null,{"sessionId":"43ba7a2a-8fac-4ebb-8a8f-e046e5534944","capabilities":{"browserName":"firefox","browserVersion":"57.0","platformName":"linux","platformVersion":"4.9.60-linuxkit-aufs","pageLoadStrategy":"normal","acceptInsecureCerts":false,"timeouts":{"implicit":0,"pageLoad":300000,"script":30000},"rotatable":false,"moz:accessibilityChecks":false,"moz:headless":false,"moz:processID":37,"moz:profile":"/tmp/rust_mozprofile.i3MIoxBzkGAM","moz:webdriverClick":false}}]
1519276850809 geckodriver::marionette TRACE <- [1,1,null,{"sessionId":"43ba7a2a-8fac-4ebb-8a8f-e046e5534944","capabilities":{"browserName":"firefox","browserVersion":"57.0","platformName":"linux","platformVersion":"4.9.60-linuxkit-aufs","pageLoadStrategy":"normal","acceptInsecureCerts":false,"timeouts":{"implicit":0,"pageLoad":300000,"script":30000},"rotatable":false,"moz:accessibilityChecks":false,"moz:headless":false,"moz:processID":37,"moz:profile":"/tmp/rust_mozprofile.i3MIoxBzkGAM","moz:webdriverClick":false}}]
1519276850809 webdriver::server DEBUG <- 200 OK {"value": {"sessionId":"43ba7a2a-8fac-4ebb-8a8f-e046e5534944","capabilities":{"acceptInsecureCerts":false,"browserName":"firefox","browserVersion":"57.0","moz:accessibilityChecks":false,"moz:headless":false,"moz:processID":37,"moz:profile":"/tmp/rust_mozprofile.i3MIoxBzkGAM","moz:webdriverClick":false,"pageLoadStrategy":"normal","platformName":"linux","platformVersion":"4.9.60-linuxkit-aufs","rotatable":false,"timeouts":{"implicit":0,"pageLoad":300000,"script":30000}}}}
2018/02/22 05:20:50 [0] [SESSION_CREATED] [unknown] [43ba7a2a-8fac-4ebb-8a8f-e046e5534944] [http://127.0.0.1:41809] [1] [2.832759113s]
2018/02/22 05:21:50 [SESSION_DELETED] [43ba7a2a-8fac-4ebb-8a8f-e046e5534944]
1519276910817 webdriver::server DEBUG -> DELETE /session/43ba7a2a-8fac-4ebb-8a8f-e046e5534944
1519276910819 geckodriver::marionette TRACE -> 37:[0,2,"quit",{"flags":["eForceQuit"]}]
1519276910827 Marionette TRACE 0 -> [0,2,"quit",{"flags":["eForceQuit"]}]
1519276910830 Marionette DEBUG New connections will no longer be accepted
1519276910973 Marionette TRACE 0 <- [1,2,null,{"cause":"shutdown"}]
1519276911015 geckodriver::marionette TRACE <- [1,2,null,{"cause":"shutdown"}]
1519276911015 webdriver::server DEBUG Deleting session
1519276911015 geckodriver::marionette DEBUG Stopping browser process
1519276911077 webdriver::server DEBUG <- 200 OK {"value": {}}
2018/02/22 05:21:51 [0] [TERMINATING_PROCESS] [32]
2018/02/22 05:21:51 [0] [TERMINATED_PROCESS] [32]
Disconnected
I have tested the scenario with three different geckodriver versions but the result was the same.
Please help.
The error says it all:
1519276847986 geckodriver INFO Listening on [::]:41809
1519276848032 webdriver::server DEBUG -> HEAD /
1519276848160 webdriver::server DEBUG <- 404 Not Found {"value":{"error":"unknown command","message":"HEAD / did not match a known command","stacktrace":"stack backtrace:\n 0: 0x4edb3c - backtrace::backtrace::trace::hc4bd56a2f176de7e\n 1: 0x4edb72 - backtrace::capture::Backtrace::new::he3b2a15d39027c46\n 2: 0x440ac8 - webdriver::error::WebDriverError::new::ha0fbd6d1a1131b43\n 3: 0x43a665 - <webdriver::server::HttpHandler<U> as hyper::server::Handler>::handle::h343049f2e1aa3f13\n 4: 0x404b4d - std::sys_common::backtrace::__rust_begin_short_backtrace::he840f14c79c8e321\n 5: 0x40bee6 - std::panicking::try::do_call::hdd1d6b985699ef9d\n 6: 0x5e6a6c - panic_unwind::__rust_maybe_catch_panic\n at /checkout/src/libpanic_unwind/lib.rs:99\n 7: 0x41ef02 - <F as alloc::boxed::FnBox<A>>::call_box::hae8ac6ade91dedb6\n 8: 0x5df13b - alloc::boxed::{{impl}}::call_once<(),()>\n at /checkout/src/liballoc/boxed.rs:692\n - std::sys_common::thread::start_thread\n at /checkout/src/libstd/sys_common/thread.rs:21\n - std::sys::imp::thread::{{impl}}::new::thread_start\n at /checkout/src/libstd/sys/unix/thread.rs:84"}}
Due to this issue the SESSION gets DELETED, as follows:
2018/02/22 05:20:50 [0] [SESSION_CREATED] [unknown] [43ba7a2a-8fac-4ebb-8a8f-e046e5534944] [http://127.0.0.1:41809] [1] [2.832759113s]
2018/02/22 05:21:50 [SESSION_DELETED] [43ba7a2a-8fac-4ebb-8a8f-e046e5534944]
The error stack trace clearly shows that the Selenium language binding is attempting to connect on the IPv6 loopback address ([::]:41809), which won't work at all with 3.9.x, as per @JimEvans' comment in the discussion thread Can't launch Selenium IE Driver after upgrading to version 3.9.
Solution
A possible solution would be to disable the IPv6 loopback address and enable IPv4 on the remote setup, then run your test.
Have a look here:
https://github.com/nightwatchjs/nightwatch/issues/1628#issuecomment-357746215
That means you will have to fall back to 3.8 and a Selenium Grid setup.
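A related workaround reported for this class of failure (an assumption here, not verified against this exact Selenoid setup) is to pin Nightwatch to the IPv4 address of the hub, so that nothing resolves localhost to ::1. A minimal nightwatch.json sketch, assuming Selenoid is listening on port 4444 behind the standard /wd/hub prefix:

    {
      "selenium": { "start_process": false },
      "test_settings": {
        "default": {
          "selenium_host": "127.0.0.1",
          "selenium_port": 4444,
          "default_path_prefix": "/wd/hub",
          "desiredCapabilities": { "browserName": "firefox" }
        }
      }
    }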

s3cmd not excluding a chosen prefix when restoring from Glacier

What am I doing?
I am trying to restore from Glacier all objects under the kevinturino.com prefix in my S3 bucket named simplewebsite. I want the prefix kevinturino.com/plugins to be excluded.
What is happening?
The kevinturino.com/plugins prefix is not being excluded.
My command is:
s3cmd -d --force restore --recursive -D999 --exclude=*plugins* --exclude=plugins/ --exclude=*/plugins* --exclude=kevinturino.com/plugins/ --exclude=kevinturino.com/plugins/* s3://simplewebsite/kevinturino.com/
Debug shows:
DEBUG: Canonical Request:
GET
/
marker=kevinturino.com%2Fplugins%2Fnew%2Fsym%2Froot%2Fetc%2Falternatives%2Fri&prefix=kevinturino.com%2F
host:simplewebsite.s3.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20170925T030908Z
host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
----------------------
DEBUG: signature-v4 headers: {'x-amz-content-sha256': u'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': u'AWS4-HMAC-SHA256 Credential=obfuscated/20170925/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=obfuscated', 'x-amz-date': '20170925T030908Z'}
DEBUG: Processing request, please wait...
DEBUG: get_hostname(simplewebsite): simplewebsite.s3.amazonaws.com
DEBUG: ConnMan.get(): re-using connection: https://simplewebsite.s3.amazonaws.com#6
DEBUG: format_uri(): /?marker=kevinturino.com%2Fplugins%2Fnew%2Fsym%2Froot%2Fetc%2Falternatives%2Fri&prefix=kevinturino.com%2F
DEBUG: Sending request method_string='GET', uri=u'/?marker=kevinturino.com%2Fplugins%2Fnew%2Fsym%2Froot%2Fetc%2Falternatives%2Fri&prefix=kevinturino.com%2F', headers={'x-amz-content-sha256': u'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': u'AWS4-HMAC-SHA256 Credential=obfuscated/20170925/us-east-1/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=obfuscated', 'x-amz-date': '20170925T030908Z'}, body=(0 bytes)
DEBUG: Response:
{'headers': {'content-type': 'application/xml',
'date': 'Mon, 25 Sep 2017 03:09:09 GMT',
'server': 'AmazonS3',
'transfer-encoding': 'chunked',
'x-amz-bucket-region': 'us-east-1',
'x-amz-id-2': 'obfuscated',
'x-amz-request-id': '500238C860B78674'},
'reason': 'OK',
'status': 200}
What do I expect?
All objects to be processed for restoration, and the kevinturino.com/plugins prefix to be excluded.
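No answer is recorded here, but two things are worth ruling out (assumptions, not confirmed fixes). Unquoted wildcards can be expanded by the shell before s3cmd ever sees them, so the patterns should be quoted; and since exclusion is applied client-side after the bucket listing, the listing request in the debug output above (whose marker happens to sit inside plugins/) does not by itself prove the filter failed. A quoted form of the command:

    s3cmd -d --force restore --recursive -D999 \
      --exclude 'plugins/*' \
      --exclude '*/plugins/*' \
      s3://simplewebsite/kevinturino.com/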

Riak CS returns "s3.amazonaws.com" no matter what is in cs_root_host

cs_root_host is set up correctly:
grep root_host /var/lib/riak-cs/generated.configs/app.2015.07.09.13.59.07.config
{cs_root_host,"s3.example.com"},
But when I upload a file:
s3cmd put test.jpg s3://images --acl-public
I get in return:
Public URL of the object is: http://images.s3.amazonaws.com/test.jpg
Where is the issue?
Added:
Here is the output - everything looks fine except the last line
(example.com is just a replacement for the real domain, which I don't want to make public):
s3cmd -d -c .s3cfg put newfile.jpg s3://images --acl-public
DEBUG: ConfigParser: Reading file '.s3cfg'
DEBUG: ConfigParser: access_key->YD...17_chars...U
DEBUG: ConfigParser: bucket_location->RU
DEBUG: ConfigParser: cloudfront_host->cloudfront.amazonaws.com
DEBUG: ConfigParser: cloudfront_resource->/2010-07-15/distribution
DEBUG: ConfigParser: default_mime_type->binary/octet-stream
DEBUG: ConfigParser: delete_removed->False
DEBUG: ConfigParser: dry_run->False
DEBUG: ConfigParser: encoding->UTF-8
DEBUG: ConfigParser: encrypt->False
DEBUG: ConfigParser: follow_symlinks->False
DEBUG: ConfigParser: force->False
DEBUG: ConfigParser: get_continue->False
DEBUG: ConfigParser: gpg_command->/usr/bin/gpg
DEBUG: ConfigParser: gpg_decrypt->%(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
DEBUG: ConfigParser: gpg_encrypt->%(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
DEBUG: ConfigParser: gpg_passphrase->...-3_chars...
DEBUG: ConfigParser: guess_mime_type->True
DEBUG: ConfigParser: host_base->s3.example.com
DEBUG: ConfigParser: host_bucket->%(bucket)s.s3.example.com
DEBUG: ConfigParser: human_readable_sizes->False
DEBUG: ConfigParser: list_md5->False
DEBUG: ConfigParser: log_target_prefix->
DEBUG: ConfigParser: preserve_attrs->True
DEBUG: ConfigParser: progress_meter->True
DEBUG: ConfigParser: proxy_host->127.0.0.1
DEBUG: ConfigParser: proxy_port->80
DEBUG: ConfigParser: recursive->False
DEBUG: ConfigParser: recv_chunk->4096
DEBUG: ConfigParser: reduced_redundancy->False
DEBUG: ConfigParser: secret_key->kG...37_chars...=
DEBUG: ConfigParser: send_chunk->4096
DEBUG: ConfigParser: simpledb_host->sdb.amazonaws.com
DEBUG: ConfigParser: skip_existing->False
DEBUG: ConfigParser: socket_timeout->10
DEBUG: ConfigParser: urlencoding_mode->normal
DEBUG: ConfigParser: use_https->False
DEBUG: ConfigParser: verbosity->WARNING
DEBUG: Updating Config.Config encoding -> UTF-8
DEBUG: Updating Config.Config follow_symlinks -> False
DEBUG: Updating Config.Config verbosity -> 10
DEBUG: Unicodising 'put' using UTF-8
DEBUG: Unicodising 'newfile.jpg' using UTF-8
DEBUG: Unicodising 's3://images' using UTF-8
DEBUG: Command: put
INFO: Compiling list of local files...
DEBUG: DeUnicodising u'' using UTF-8
DEBUG: DeUnicodising u'newfile.jpg' using UTF-8
DEBUG: Unicodising 'newfile.jpg' using UTF-8
DEBUG: Unicodising 'newfile.jpg' using UTF-8
INFO: Applying --exclude/--include
DEBUG: CHECK: newfile.jpg
DEBUG: PASS: newfile.jpg
INFO: Summary: 1 local files to upload
DEBUG: Content-Type set to 'image/jpeg'
DEBUG: String 'newfile.jpg' encoded to 'newfile.jpg'
DEBUG: SignHeaders: 'PUT\n\nimage/jpeg\n\nx-amz-acl:public-read\nx-amz-date:Fri, 10 Jul 2015 09:55:37 +0000\n/images/newfile.jpg'
DEBUG: CreateRequest: resource[uri]=/newfile.jpg
DEBUG: Unicodising 'newfile.jpg' using UTF-8
DEBUG: SignHeaders: 'PUT\n\nimage/jpeg\n\nx-amz-acl:public-read\nx-amz-date:Fri, 10 Jul 2015 09:55:37 +0000\n/images/newfile.jpg'
newfile.jpg -> s3://images/newfile.jpg [1 of 1]
DEBUG: get_hostname(images): images.s3.example.com
DEBUG: format_uri(): http://images.s3.example.com/newfile.jpg
32600 of 32600 100% in 0s 14.49 MB/sDEBUG: Response: {'status': 200, 'headers': {'content-length': '0', 'server': 'nginx', 'connection': 'keep-alive', 'etag': '"89e39f454c69a1ce1fadec3a222fc292"', 'date': 'Fri, 10 Jul 2015 09:55:37 GMT', 'content-type': 'text/plain'}, 'reason': 'OK', 'data': '', 'size': 32600}
32600 of 32600 100% in 0s 391.54 kB/s done
DEBUG: MD5 sums: computed=89e39f454c69a1ce1fadec3a222fc292, received="89e39f454c69a1ce1fadec3a222fc292"
Public URL of the object is: http://images.s3.amazonaws.com/newfile.jpg
This is not a Riak CS issue: s3cmd itself produces the public URL string
and prints it.
For my environment, with s3cmd built from the master branch at commit
7bdefc81823699069706ea3680bfa65ec8ad3db5 (just fetched today, 2015-07-14),
it shows the (seemingly) correct URL:
% ~/g/s3cmd/build/scripts-2.7/s3cmd -c .s3cfg.15018.alice put rebar.config -P s3://test/a
rebar.config -> s3://test/a [1 of 1]
2791 of 2791 100% in 0s 196.88 kB/s done
Public URL of the object is: http://test.s3.example.com/a
From the source code of s3cmd, it seems it uses the host_bucket or host_base
configuration depending on the bucket name (or maybe other configuration values).
Some other details of my environment:
s3cmd configuration : host_base = s3.example.com and host_bucket = %(bucket)s.s3.example.com
Server is Riak CS of develop branch (commit 1f954aaae45429923f65fdad40c7916a55ab79f3)
Riak CS configuration : cs_root_host = s3.example.com
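For anyone wanting to reproduce the check above, running s3cmd from a fresh checkout looks roughly like this (a sketch, assuming git and Python 2.7 are available, matching the build/scripts-2.7 path shown earlier):

    git clone https://github.com/s3tools/s3cmd.git
    cd s3cmd
    python setup.py build
    ./build/scripts-2.7/s3cmd -c .s3cfg put test.jpg s3://images --acl-public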

Enabling HttpProxyMiddleware in scrapyd

After reading the Scrapy documentation, I thought that HttpProxyMiddleware was enabled by default. But when I start a spider via scrapyd's webservice interface, HttpProxyMiddleware is not enabled. I get the following output:
2013-02-18 23:51:01+1300 [scrapy] INFO: Scrapy 0.17.0-120-gf293d08 started (bot: pde)
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled extensions: FeedExporter, LogStats, CloseSpider, WebService, CoreStats, SpiderState
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled item pipelines: PdePipeline
2013-02-18 23:51:02+1300 [shotgunsupplements] INFO: Spider opened
Note that HttpProxyMiddleware is not enabled. How can I enable it for scrapyd? Any help would be greatly appreciated.
My scrapy.cfg
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/topics/scrapyd.html
[settings]
default = pd.settings
[deploy]
url = http://localhost:6800/
project = pd
I have the following settings.py
BOT_NAME = 'pd' #this gets replaced with a function
BOT_VERSION = '1.0'
SPIDER_MODULES = ['pd.spiders']
NEWSPIDER_MODULE = 'pd.spiders'
DEFAULT_ITEM_CLASS = 'pd.items.Product'
ITEM_PIPELINES = 'pd.pipelines.PdPipeline'
USER_AGENT = '%s/%s' % (BOT_NAME, BOT_VERSION)
TELNETCONSOLE_HOST = '127.0.0.1' # defaults to 0.0.0.0 set so
TELNETCONSOLE_PORT = '6073' # only we can see it.
TELNETCONSOLE_ENABLED = False
WEBSERVICE_ENABLED = True
LOG_ENABLED = True
ROBOTSTXT_OBEY = False
ITEM_PIPELINES = [
    'pd.pipelines.PdPipeline',
]
DATA_DIR = '/home/pd/scraped_data' #directory to store export files to.
DOWNLOAD_DELAY = 2.0
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 750,
}
Regards,
Pranshu
After spending forever trying to debug, it turns out that HttpProxyMiddleware expects the http_proxy environment variable to be set. The middleware will not be loaded if http_proxy is not set. So I set http_proxy and Bob's your uncle! Everything works!
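For reference, a minimal sketch of that fix (the proxy URL is a placeholder, not a real endpoint):

    export http_proxy=http://proxy.example.com:8080
    scrapyd

Alternatively, HttpProxyMiddleware also honours a per-request proxy set in the request meta, which avoids the environment variable entirely:

    yield scrapy.Request(url, meta={'proxy': 'http://proxy.example.com:8080'})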