python - HTTP Error 503 Service Unavailable - error-handling

I am trying to scrape data from google and linkedin. Somehow it gave me this error:
*** httperror_seek_wrapper: HTTP Error 503: Service Unavailable
Can someone help advice how I solve this?

Google is simply detecting your query as automated. You would need a captcha solver to get unlimited results. The following link might be helpful.
https://support.google.com/websearch/answer/86640?hl=en
Bypassing Captcha using an OCR Engine:
http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html
Simple Approach:
An even simpler approach is to simply use sleep() a few times and to generate random queries. This way google will not spot that you are using an automated system. But the system is far slower ...
Error Handling:
To simply get remove the error message use try and except

I encountered the same situation and tried using the sleep() function before every request to spread the requests a little. It looked like it was working fine but failed soon enough even with a delay of 2 seconds. What solved it finally was using:
with contextlib.closing(urllib.urlopen(urlToOpen)) as x:
#do stuff with x.
This I did because I thought opening too many requests keeps it open and had to closed. Nevertheless, it worked quite consistently with as less as 0.5s delay time.

Related

How to continue test when the page still not completely loaded in selenium

Actually, I am creating automation testing for an e-commerce website. Actually, the website have function lazy load or something. I am testing it on UAT server. So, it will load the page slowly because the specification of the server. It takes more than 60 sec or more to load all the resources from the webpage. So, when I am trying to create selenium automation, it always waiting more than 60 sec to continue the next step (because waiting the page fully loaded). Please, someone give me tips how to continue run the test step after 10 seconds wait the page to load. It won't throw an exception, just continue the test step.
Not possible.
If you find some element and try execute some action while loading you will get stale element error + due loading issue you will have a lot of failed tests and it will take a lot more time to debug.
Automation means to execute fast and have reliable results.
It seems that this environment is not built for automation, you should request more resources.
As an alternative maybe you can use a headless driver or see if you can put the same build on a VM.
Why this is an issue: Selenium needs to wait for each request to be complete.For example when you request a page, if the page is not received entirely and the server still sending info then the request is not done, it is logical that you need a complete request in order to continue.
You should address this to your Project Manager/QA Lead and ask for advice/option on how to handle this.
Please note that these costs should be included/added in the automation price.You need to address this in a simple way:
good server -> automation runs smoothly and fast and the testing is
done faster
bad server -> unable to run automation since is not reliable and each
test has a high rate of failure => alternative X day(s) of
manual testing for each build
If this would be a coding issue like some delayed ajax request then you would have some solutions, devs could help, but if is an infrastructure/resources issue then if not depending on you, and you cannot solve it.
You could use try any type of wait implicit/explicit, explicit would throw some exception, but this is not a solution for poor resources.

Synchronous XMLHttpRequest broke website today

I got into work today and got a telling off from my boss because the company website wasn't loading properly on mobile. I looked in the console and saw the following error... Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help, check https://xhr.spec.whatwg.org/.. I clicked the link https://xhr.spec.whatwg.org/#xmlhttprequest and the update was made yesterday (7th June 2016). How do I fix this error?
Usually the message you provided should not break a page from being rendered.
It just states that you should not use Synchronous XMLHttpRequest.
Synchronous requests will slow down rendering since the site has to wait for each request to complete in an iterative manner.
To "fix" that all your AJAX requests must be set to be async.

http://dbpedia.org/sparql endpoint not reliable?

Sometimes a query works sometimes it doesn't. I get sometimes "Virtuoso S1T00 Error SR171: Transaction timed out" (no timeout is set or a big timeout is set - so this is not the problem there is another problem behind that i am not aware of) or simply a browser HTTP 500 error page.
Sometimes it works from a new browser window in IE sometimes it doesn't work from FF.
What is going on with dbpedia sparql endpoint? Is there some caching or something that I am not aware of?
The DBPedia query service is kindly provided for free, and does tend to get (ab)used by many users. If you need something that you can rely on I'd suggest setting up your own instance (IIRC there are EC2 instances for that purpose).
It's a shame that the error messages tend to be so random.
Due to Large set of data DBpedia is working very harse.It won't produce proper result.If you need better result try to setup ARQ for Sparql query on your localmachine.It will give better outcome.

Is there a browser-agnostic way to detect client-side script errors with Watin?

We're using WatiN to test our web portals. During the course of an E2E test, we'll occasionally see client-side script errors on the IE status bar. I'd like to chain a handler onto the script error event and record the error for later analysis and bug filing.
Problem is, I don't know that there's a global script error event or how to chain into it. And if there's not a browser-agnostic way to accomplish this, I can create MyIE and MyFF subclasses but then this becomes two browser-specific questions.
In essence, I'm thinking of something like this entirely made-up call:
browser.ScriptEngine.SetCustomErrorHandler(LogScriptingError);
... where LogScriptErrors is my code that does the obvious.
Many of our client-side scripting errors don't necessarily prevent the test from continuing (a pretty UI element didn't animate, for example, but the underlying form is still submittable), so I'd like to log the error and forge ahead in most cases.
You probably looking for this:
window.onerror=function(message, url, line){logError();};
You can add this code to your pages to handle errors in logError(). but this may not work in all browser(works in IE), check this for browser compatibility:
http://www.quirksmode.org/dom/events/error.html
Or you may try this commercial product:
exceptionhub.com/
You could maybe co-opt the ability to inject eval code (described under "Added Eval functionality") to add a script that caught all errors, not just errors from the eval'ed script. I'm not sure if this would work, but it's an area to explore. Another resource might be this blog post, which discusses how to evaluate Javascript in WatiN.

Selenium RC drops error when it tries open the popup

When selenium tries to open popup window I'm getting JS error permission denied in file
file:///C:/DOCUME~1//LOCALS~1/Temp/customProfileDir8708f7f69e14482ba857f4b2e74775c1/core/RemoteRunner.hta
So this break script execution, could you assist? I saw a related topic at MSDN and openqa but didn't find resolution that could help me.
I've just encountered this error. In the end it was because I was running IE in 'Offline' mode. Open the File menu and make sure that "Work Offline" does not have a tick next to it.
I've just updated a section about that in the Selenium docs. The website build is not working right now, so if you go to the site you will find the old version.
I'll paste the raw text here, I think your case is the second: JS trying to access sections that are still not loaded, so your solution would be a waitForPopUp command:
Why am I getting a permission denied
error?
The most common reason for this error
is that your session is attempting to
violate the same-origin policy by
crossing domain boundaries (e.g.,
accesses a page from http://domain1
and then accesses a page from
http://domain2) or switching protocols
(moving from http://domainX to
https://domainX). For this to be
solved, try using the Heightened
Privileges Browsers if you're working
with the Proxy Injection browsers.
This is covered in some detail in the
tutorial. Make sure you read the
sections about The Same Origin Policy
and Proxy Injection carefully.
If the previous situation was not your
case, it can also occur when
JavaScript attempts to look at
objects which are not yet available
(before the page has completely
loaded), or tries to look at objects
which are no longer available (after
the page has started to be unloaded).
This is most typically encountered
with AJAX pages which are working with
sections of a page or subframes that
load and/or reload independently of
the larger page. For this type of
problem, it is common that the error
is intermittent. Often it is
impossible to reproduce the problem
with a debugger because the trouble
stems from race conditions which are
not reproducible when the debugger's
overhead is added to the system. Try
first adding a static pause to make
sure this is the situation and then
moving on to the waitFor kind of
commands.