pyspider phantomjs is not enabled; 501 Server Error - phantomjs

I used pyspider to crawl a website, and when using PhantomJS I got the error shown in the title.
I searched for solutions in https://github.com/binux/pyspider/issues/215; the author seemed to have solved it, so I tried the suggested fix, but it still didn't work. How can I solve this?

You need to check whether phantomjs is in $PATH.
Try the following:
phantomjs -v
Or run the following and check the output:
pyspider phantomjs
It should print:
phantomjs fetcher running on port 25555
Otherwise, you will need to install phantomjs on your system.
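If you prefer to do the same check from Python, here is a minimal sketch using only the standard library (Python 3.7+ assumed); it just confirms whether a phantomjs binary is reachable on $PATH before you start pyspider:
# Hedged sketch: check whether phantomjs is on $PATH and print its version.
import shutil
import subprocess

path = shutil.which("phantomjs")
if path is None:
    print("phantomjs was not found on $PATH; install it or add its directory to PATH")
else:
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    print("phantomjs found at %s, version %s" % (path, result.stdout.strip()))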

Related

Python cron job with Chrome not running in AWS EC2

I've been using an EC2 instance to run a Python script with cron every day for about a month. The script uses Selenium.
Everything was working correctly until today, when my script did not run.
I have tried to run it manually, but it is not working either. The error message says:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"#ctl00_ctl00_moteurRapideOffre_ctl01_EngineCriteriaCollection_Contract > option:nth-child(5)"}
(Session info: headless chrome=90.0.4430.85)
However, the same script runs fine on my computer (i.e. on my MacBook, not on AWS EC2).
As the problem seems to come from Chrome, I uninstalled it on AWS EC2 using:
sudo yum remove google-chrome-stable
Then I reinstalled it using:
curl https://intoli.com/install-google-chrome.sh | bash
sudo mv /usr/bin/google-chrome-stable /usr/bin/google-chrome
google-chrome --version && which google-chrome
If I try to run Chrome on the EC2 instance using /usr/bin/google-chrome, it does not work and displays the following error message:
ERROR:browser_main_loop.cc(1386)] Unable to open X display.
I don't know if it was working before, as I have never used it this way, but it seems to be a problem.
I have seen on the web that this might be because there is no screen attached and that I should use a package named Xvfb. I tried to install it with the following command:
sudo yum install xorg-x11-server-Xvfb
I think the package was installed correctly, but it did not help.
To sum up, I think the problem in my Python code is linked to the fact that Google Chrome is not working correctly, and this might be linked to Xvfb. But I am not sure at all; it is just what I have tried so far.
Could you please help me? Thanks!
You can simply set up your cron job like this, so it runs every 30 minutes:
*/30 * * * * export DISPLAY=:0 && <do whatever you want>
If this does not work and google-chrome or firefox is not found, run the command below in your shell (bash, fish, zsh, etc.) to get your PATH:
echo $PATH
Take whatever that command outputs and add it above your cron job, like this:
*/30 * * * * export DISPLAY=:0 && <your selenium script>
You can remove the export DISPLAY=:0 part if you want to run this in the background or make your driver headless.
The reason for doing this is that you might have installed the browser from snapd or another separate source, which is why it is not on the PATH that cron uses.
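If you go the headless route instead of exporting DISPLAY, a minimal Selenium sketch along these lines (the options and the example URL are assumptions, not the asker's actual script) needs no X display at all, which suits cron on a server:
# Hedged sketch: run Chrome headless so no X display (and no Xvfb) is needed.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")               # no GUI, so DISPLAY is not required
options.add_argument("--no-sandbox")             # often needed when running under cron/root
options.add_argument("--disable-dev-shm-usage")  # avoids /dev/shm issues on small instances

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com")  # placeholder URL
    print(driver.title)
finally:
    driver.quit()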

chromium-browser Error: [670] Failed to put Xlib into threaded mode

My original aim was to run a headless Selenium WebDriver on a Raspberry Pi 3 (Raspbian). After hours and hours of failing, I took a step back and now I'm only trying to run chromium-browser, which is needed for the WebDriver.
There I noticed some errors after executing:
sudo ./chromium-browser --headless --no-sandbox --disable-gpu --disable-extensions
Error stack:
--disable-quic --enable-tcp-fast-open --disable-gpu-compositing --ppapi-flash-path=/usr/lib/chromium-browser/libpepflashplayer.so --ppapi-flash-args=enable_stagevideo_auto=0 --ppapi-flash-version=
[1015/183516.617458:ERROR:browser_main_loop.cc(670)] Failed to put Xlib into threaded mode.
[1015/183516.625190:ERROR:gpu_process_transport_factory.cc(1029)] Lost UI shared context.
I searched for a solution on the internet but found no results.
I should add the following:
- everything works fine if I run the WebDriver on my Windows system
- I reproduced the error on two completely different Raspberry Pis
- I also tried to run it on a Raspbian virtual machine
- I tried to run the WebDriver with Iceweasel and geckodriver, with the result "Error: connection refused"
So I am out of ideas and thankful for any response.
For others who are still struggling to find a solution:
It seems that the issue is with the display resource when running the program on a remote server that has multiple displays.
You may try setting the DISPLAY environment variable:
export DISPLAY=:1.0;
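If you drive chromium through Selenium from Python, a hedged sketch like the one below sets DISPLAY from the script and passes the same headless flags used in the question; the chromium-browser path is an assumption for Raspbian, and it assumes a matching chromedriver is on PATH:
# Hedged sketch for Raspbian: point Selenium at chromium-browser and run headless.
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

os.environ["DISPLAY"] = ":1.0"  # only needed if you are not running headless

options = Options()
options.binary_location = "/usr/bin/chromium-browser"  # assumed path; verify on your Pi
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com")  # placeholder URL
    print(driver.title)
finally:
    driver.quit()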

Testing a website in Nightwatch without a browser

Is there any chance to test a website by URL without opening a browser, so that any output appears in the command line?
Thanks in advance!
You could use PhantomJS if you just don't want to run a full browser, but it sounds like you may just want to see the output; have you tried running with the --verbose flag?

Can't update chromedriver and selenium release

I'm trying to work with Protractor, so I followed a small tutorial. The first thing I did:
npm install -g protractor
This will install two command line tools, protractor and webdriver-manager.
But now I have to update my webdriver-manager:
webdriver-manager update
So my cmd tries to connect to https://chromedriver.storage.googleapis.com/2.14/chromedriver_win32.zip and https://selenium-release.storage.googleapis.com/2.45/selenium-server-standalone-2.45.0.jar.
But it gives this error:
C:\Program Files (x86)\Jenkins\workspace\testnew>webdriver-manager update
Updating selenium standalone
downloading https://selenium-release.storage.googleapis.com/2.45/selenium-server-standalone-2.45.0.jar...
Updating chromedriver
downloading https://chromedriver.storage.googleapis.com/2.14/chromedriver_win32.zip...
Error: Got error Error: getaddrinfo EAI_AGAIN from https://selenium-release.storage.googleapis.com/2.45/selenium-server-standalone-2.45.0.jar
Error: Got error Error: getaddrinfo EAI_AGAIN from https://chromedriver.storage.googleapis.com/2.14/chromedriver_win32.zip
Sometimes it is the EAI_AGAIN error and sometimes ENOTFOUND.
But what I don't understand is that I can download the zip and the jar manually in my browser. When I browse to the URLs, everything works fine, but not in cmd. Can someone help me?
PS: pinging the URLs isn't possible.
Update: after configuring proxy settings, I get this error:
Error: Got error Error: tunneling socket could not be established, cause=socket hang up from https://chromedriver.storage.googleapis.com/2.14/chromedriver_win32.zip
The same happened to me. The problem was due to a proxy we use inside our company.
webdriver-manager has a --proxy parameter, where you can specify the proxy the webdriver command should use.
The proxy configuration you might have in npm (the .npmrc file in your user directory) won't work for webdriver-manager.
Here is the example that worked for me:
webdriver-manager --proxy http://yourproxy:8080 update
If setting your proxy does not work, as happened to me, you can download the files manually from the URLs shown in the console and put them into the selenium folder.
The path in Windows is:
users\username\AppData\Roaming\npm\node_modules\protractor\selenium
That worked for me.
I hope that helps.
Read on if your webdriver-manager update doesn't update chromedriver to the latest version.
I lost a few weeks pulling my hair out over an issue with "Unable to discover open pages": every time I updated chromedriver, it would only update to version 2.22, and I believe the Selenium server to v2.53.
My problem wasn't really with the Selenium server, so v2.53 was fine.
The issue was with chromedriver v2.22.
Even though the chromedriver downloads page showed a latest version of 2.24, webdriver-manager update would NOT pick up that latest version; it would only grab version 2.22 of chromedriver.
How did I get around this?
Simply run the command below after you check the downloads page for which version of chromedriver you want to update to; for instance, I wanted v2.24, so I ran:
webdriver-manager update --versions.chrome 2.24
If you check this location: C:\Users\<USER>\AppData\Roaming\npm\node_modules\webdriver-manager\selenium\
you should see that the desired chromedriver was downloaded there; if it's not there, read the command prompt logs and they'll tell you where your chromedriver files were downloaded.
Hope that helps someone!
Your web browser is probably using a proxy, or some other indirect access to the wider internet that the webdriver-manager script isn't configured to use. (The webdriver-manager supports a --proxy parameter if you know what to pass to it.)
If you can download the files manually, just put them in the selenium directory yourself. The script also unzips chromedriver_win32.zip in place to get the chromedriver binary contained in it.
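If you want to script that manual workaround, here is a hedged Python sketch that downloads both files through a proxy and unzips chromedriver into the selenium folder; the proxy address and destination path are placeholders you would replace with your own values:
# Hedged sketch: fetch the files webdriver-manager could not download and
# drop them into protractor's selenium folder. Proxy URL and paths are placeholders.
import io
import os
import urllib.request
import zipfile

PROXY = "http://yourproxy:8080"  # placeholder, same value you would pass to --proxy
DEST = r"C:\Users\<USER>\AppData\Roaming\npm\node_modules\protractor\selenium"  # placeholder path

opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
)

jar_url = "https://selenium-release.storage.googleapis.com/2.45/selenium-server-standalone-2.45.0.jar"
zip_url = "https://chromedriver.storage.googleapis.com/2.14/chromedriver_win32.zip"

os.makedirs(DEST, exist_ok=True)

# Save the selenium server jar as-is.
with opener.open(jar_url) as resp, open(os.path.join(DEST, "selenium-server-standalone-2.45.0.jar"), "wb") as f:
    f.write(resp.read())

# Unzip chromedriver_win32.zip in place, mirroring what the script does.
with opener.open(zip_url) as resp:
    zipfile.ZipFile(io.BytesIO(resp.read())).extractall(DEST)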

Why is selenium hanging on INFO - Checking Resource aliases, and how do I even debug this?

I'm trying to follow a tutorial to set up a headless Selenium test run with Jenkins. I'm running CentOS 5.6, and I've followed the instructions. Now, when I run this:
export DISPLAY=":99" && java -jar /var/lib/selenium/selenium-server.jar -browserSessionReuse -htmlSuite *firefox http://www.google.com ./test/selenium/html/TestSuite.html ./target/selenium/html/TestSuiteResults.html
Selenium hangs on INFO - Checking Resource Aliases. I can run the TestSuite.html file manually, and the path is correct.
How can I even begin to try and figure out what's going on? Is there a way I could connect to the display to see what's happening? I am behind a corporate proxy, but with or without -Dhttp.proxyHost arguments, I get the same hung result.
Well, after pointing at an internal server, I get right on past the INFO - Checking Resource Aliases step, so clearly the proxy was the issue.
By trying to hit a site that required the proxy, I was doing too much at once. Confounding variables confounded me.
Selenium is not hanging on INFO - Checking Resource Aliases. It's waiting for a command to execute. You need to trigger your tests using Ant or some other build tool in Jenkins. That should get you going.