I am building a website in Django that scrapes data from another site, so that people can visit my site, set custom data filters and view the scraped data in a friendly format.
The problem is that the requests and Beautiful Soup modules will not be enough for the scraping, since I also need some browser automation (running JavaScript or clicking buttons).
Since Selenium requires a webdriver to be downloaded and put on the PATH, is it possible to use it from within a web app? For example, by hosting the webdriver somewhere?
I am also open to solutions other than Selenium, if there are any.
I think what you want is a Selenium Grid server.
https://www.seleniumhq.org/docs/07_selenium_grid.jsp
Basically you host it on some remote server and then you can connect to it and spin up web drivers remotely and use them in code as needed. It also comes with a handy interface for checking on current browser instances and even taking screenshots or executing scripts from the web ui.
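A minimal Python sketch of connecting to such a remote server, assuming a Grid hub is already running and reachable. The host name, port 4444 and the /wd/hub path are placeholders for illustration (newer Grid versions may also accept the bare base URL), fetch_title is a hypothetical helper, and the selenium package (version 4 assumed) is only needed when that helper is actually called.

```python
def grid_url(host, port=4444):
    """Build the hub endpoint URL (host and port are placeholders)."""
    return f"http://{host}:{port}/wd/hub"


def fetch_title(page_url, hub_url):
    """Open page_url in a Chrome session started on the remote Grid."""
    # Imported lazily so this module loads even where selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(command_executor=hub_url, options=options)
    try:
        driver.get(page_url)
        return driver.title
    finally:
        # Always release the remote browser session.
        driver.quit()
```

From a Django view or a background task this could be called as fetch_title("https://example.com", grid_url("grid.example.internal")), so the web app itself never needs a local webdriver binary.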
I am trying to scrape a website, but it is not loading in selenium. When I browse that website in my "real" chrome browser, everything works fine. Is there any way I can use my real browser with python to automate stuff, instead of using selenium??
Thanks
With Selenium we can automate real browsers.
If the website does not load via Selenium, check whether setting desired capabilities (browser options) helps.
With these you can set a proxy, disable extensions and so on; many options are available.
https://chromedriver.chromium.org/capabilities
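As a rough illustration of setting such options from Python, the sketch below builds a list of Chrome command-line switches and passes them to the driver. The helper names chrome_arguments and start_chrome are hypothetical, the proxy address in the usage note is a placeholder, and Selenium 4 with a local Chrome installation is assumed. Whether any of these switches fixes a particular loading problem depends on the site.

```python
def chrome_arguments(proxy=None, disable_extensions=True):
    """Return Chrome command-line switches for the chosen settings."""
    arguments = []
    if proxy:
        # Route browser traffic through the given host:port proxy.
        arguments.append(f"--proxy-server={proxy}")
    if disable_extensions:
        arguments.append("--disable-extensions")
    return arguments


def start_chrome(arguments):
    """Start a local Chrome session with the given switches."""
    # Imported lazily so this module loads even where selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in arguments:
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

For example, start_chrome(chrome_arguments(proxy="127.0.0.1:8080")) would launch Chrome behind a local proxy with extensions disabled.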
Also, it would help if you could share what kind of error is displayed.
There are many Selenium WebDriver binding packages for Go.
However, I don't want to control the browser through a server.
How can I control a browser with Go and Selenium without a Selenium server?
You can try github.com/fedesog/webdriver which says in its documentation:
This is a pure go library and doesn't require a running Selenium driver.
I would characterize the Selenium webdriver as a client rather than a server. Caveat: I have used the Selenium webdriver (Chrome version) from .Net and I am assuming it is similar for Go.
The way Selenium works is that you launch an instance of it from within code, it starts a live instance of the selected browser (e.g. Chrome), and your program retains control over it. Then you write code to tell the browser to navigate to a page, inspect the response, and interact with the page by filling out form data, clicking on buttons, etc. You can see what is happening in the browser as the code runs, so it is easy to troubleshoot when the interaction doesn't go as planned.
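The navigate, fill in and click cycle described above might look roughly like this, sketched in Python rather than Go or .NET. submit_record, the form URL and the field names are hypothetical; driver is assumed to be any object with Selenium's WebDriver interface (for example webdriver.Chrome() from the selenium package).

```python
def submit_record(driver, form_url, fields, submit_name):
    """Open a form page, fill in the named fields and click submit.

    fields maps each input's name attribute to the text to type into it.
    Returns the page title the browser reports right after the click.
    """
    driver.get(form_url)
    for name, value in fields.items():
        # "name" is the locator strategy string behind Selenium's By.NAME.
        box = driver.find_element("name", name)
        box.clear()
        box.send_keys(value)
    driver.find_element("name", submit_name).click()
    return driver.title
```

Uploading many records then becomes a loop that calls submit_record once per record with the same driver.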
I have used Selenium to upload tens of thousands of records to a website that has no API and only a graphical user interface. Give it a chance.
I am using HtmlUnit for web scraping: logging in to a website on behalf of users, setting something in their profile and then coming back.
I am using plain HtmlUnit, without the Selenium framework.
Now my question:
WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_11);
Does this statement create a browser instance on the machine where I am executing the code, or what does it do?
I am using BrowserVersion.INTERNET_EXPLORER_11 because that is a browser the website accepts.
How is Selenium different from HtmlUnit? I know we can use HtmlUnit as a webdriver in Selenium. Does Selenium need a native browser instance on the machine where the code is executed? Does Selenium create browser instances?
My use case: multiple users will be accessing this application. I know WebClient in HtmlUnit is not thread safe (so I have to define it as a Spring prototype-scoped bean).
Are there any suggestions regarding this?
Any help is greatly appreciated.
HTMLUnit is a headless browser. So no window will be created if used with Selenium either. Setting the BrowserVersion will just tell HTMLUnit to present itself to the server as if it were a given browser (AFAIK, it will just change the User-Agent but might perform additional internal processing depending on the version). I guess this answers most of the questions but the last one.
As for suggestions on how to implement this: I would try to avoid logging in to a website that way. If the website does not provide an API for this, it is likely against its Terms of Service. Assuming it is not, you will have to create a new WebClient instance for each user each time data needs to be extracted from the other site.
I'm new to Selenium.
Selenium IDE is a user-friendly Firefox plugin, and I have no problem using it. However, I found that the documentation for other Selenium tools such as Selenium RC and Selenium Core is quite confusing for beginners. It seems that the authors assume readers already have deep knowledge of these tools.
For example, when I tried to figure out how to set up Selenium RC to test a webserver, the only diagram I could find on the Selenium website was this:
http://www.sparksupport.com/blog/wp-content/uploads/2010/11/selenium-rc.png
From this diagram, I can't even see which component is the webserver under test, or where I should install the Selenium components.
At first I thought this diagram was a bit weird and that I should be able to find a better one on other websites. I was surprised to find that almost all Selenium RC setup diagrams on the internet are clones of this one. No one seems to have attempted to draw a different diagram or to describe the Selenium RC setup in more detail.
I would appreciate it if anyone could give me guidance on how to set up Selenium RC. The things I want to know are:
1. Can I use Selenium RC to test any website on the Internet?
2. How do I set up Selenium RC?
3. Is my current setup correct? My current setup is like this: in a LAN which has access to the Internet, I have 3 servers. Server-1 has IE8, Server-2 has Firefox 3.6, and Server-3 will be used as the Selenium RC server. So Selenium RC on Server-3 will remotely control Server-1 and Server-2 to start up IE and Firefox. Server-1 and Server-2 will use Server-3 as the HTTP proxy to connect to any webserver on the Internet. If I want to test a website such as yahoo.com, I can write a Selenium script and run it on Server-3 to control IE and Firefox on Server-1 and Server-2.
This information relates to Selenium 1.
The Selenium system consists of 3 parts:
Selenium Core - the JavaScript library that is used to simulate user actions
Selenium RC - this is selenium-server.jar, an intermediary Jetty server that receives requests from the Selenium client. The Selenium RC (Remote Control) server should be on the same machine as the browser
Selenium client - a Java/Ruby/... library that you use in your tests to communicate with Selenium RC.
It will be helpful if you provide language that you use for your tests and other technical details.
About your questions:
1. Yes, you can.
2. Type on the command line -> java -jar selenium-server.jar
or use the SeleniumServer class in your program.
3. Please format your text when asking questions. For your setup:
server-1 will have IE8 and a Selenium Server
server-2 will have Firefox and another Selenium Server
server-3 will have your client tests
FYI - you can also run everything together on one PC
The diagram below shows a web application test system that I've implemented on numerous occasions. It does not show specific details of installing Selenium RC, but it does show, at a high level, all of the necessary system components and how they interface.
We hope you can use it to get ideas on how to implement your own systems using open source solutions like Selenium, MySQL and Perl.
Our team understands that not all web sites are created equal and that, for any automation initiative to be successful, a thorough analysis must be performed of not only the web application but the business as well. Since our client's QA team, while technically savvy, were not programmers, we decided to implement a page object design pattern in which all of the "magical selenium commands" were abstracted in a class and exposed to the test developers as methods they would call from their test scripts.
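To make the page object idea concrete, here is a small sketch in Python (the system described above used Perl, so this is an illustration, not the authors' code). The LoginPage class, its locators and the /login path are all hypothetical; driver is assumed to be any object with Selenium's WebDriver interface.

```python
class LoginPage:
    """Page object: hides locators and Selenium calls behind named methods."""

    # Hypothetical locators as (strategy, value) pairs; "id" is the
    # locator strategy string behind Selenium's By.ID.
    USERNAME = ("id", "username")
    PASSWORD = ("id", "password")
    SUBMIT = ("id", "login")

    def __init__(self, driver, base_url):
        self.driver = driver
        self.base_url = base_url

    def open(self):
        """Navigate to the login page and return self for chaining."""
        self.driver.get(self.base_url + "/login")
        return self

    def log_in(self, username, password):
        """Type the credentials, submit, and return the resulting page title."""
        self.driver.find_element(*self.USERNAME).send_keys(username)
        self.driver.find_element(*self.PASSWORD).send_keys(password)
        self.driver.find_element(*self.SUBMIT).click()
        return self.driver.title
```

A test script then reads as LoginPage(driver, base_url).open().log_in("alice", "secret"), with no Selenium commands visible to the test developer. If a locator changes, only the class needs updating.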
The resulting implementation, as seen in the diagram below, is currently deployed and keeping management and interested parties up to date on the status of key functional areas of the web site.
[System diagram]
In the coming weeks, we are going to be covering each implementation step in more detail. We look forward to any feedback!
I want to test an Ajax-based web application. I want to write the test scripts in Java and simulate the web browser.
Simulating a web browser is very important, since I am using an advanced Ajax library (jQuery) in the web application.
Any ideas on how I should proceed?
I think you might want to give Selenium a look.
Kindness,
Dan
Simulating a browser will probably not work that well if your application relies heavily on JavaScript: there are some crawlers that you can use to test your application, but they don't handle JS well.
The best solution in your case might be to use a real browser to do your testing.
The Selenium tool suite is quite nice for that: it lets your testing program drive a real browser (Firefox, Internet Explorer, ...), which means your JS code is executed exactly as it would be for a real user.
For instance, your testing program can tell a browser to open a page, click on a link, check some content in the page, and so on. If a JS event was attached to the link, it will have been executed: there will have been a real click on the link.
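The open, click and check sequence can be sketched as follows, in Python for brevity; the Java bindings expose equivalent calls. link_leads_to is a hypothetical helper, and driver is assumed to be any object with Selenium's WebDriver interface.

```python
def link_leads_to(driver, start_url, link_text, expected_text):
    """Open start_url, click the link with the given text, and report
    whether expected_text appears in the resulting page source."""
    driver.get(start_url)
    # "link text" is the locator strategy string behind Selenium's By.LINK_TEXT.
    driver.find_element("link text", link_text).click()
    return expected_text in driver.page_source
```

With Ajax-driven content the expected text may only appear some time after the click, so a real test would probably wait for it before checking.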
Using a tool like Selenium has some drawbacks, though; some of them are:
you need a machine with a graphical environment to launch the browsers (a command line is not enough)
tests with Selenium take time: browsing and using the application means loading all the CSS/JS/images/ads/whatever for each page, like in a real browser, because you are using a real browser
But these tests are quite nice, and useful for testing the application as a whole, i.e. they are more "functional tests" than "unit tests".