I am trying to build a scraping app using Python and Selenium and run it on a server. It creates and schedules posts on CreatorStudio to share them on Instagram. I can't use Chrome or Edge Chromium because I can't send emojis with send_keys on those browsers; Firefox, on the other hand, can send emojis. And I can't use the copy-paste workaround because the app runs on a server. send_keys works fine on Firefox when I send keys to an input web element, e.g. the Google search bar. But when I try to send keys to this element, driver.find_element_by_xpath("//div[@class = '_1mf _1mj']"), it doesn't work on Firefox, though it does on Chrome.
Is there any way to solve this issue? Or is there some workaround or some JavaScript I can run?
Thank you
If possible, try building the XPath against an input tag. Otherwise you can use the JavaScript executor or the Actions class to send keys.
JavascriptExecutor jse = (JavascriptExecutor)driver;
jse.executeScript("document.getElementById('elementID').setAttribute('value', 'new value for element')");
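The target element in the question is a contenteditable div rather than an input, so setting a value attribute won't help there. Here is a minimal Python sketch of the same JavaScript-executor idea, assuming the class names from the question and that the page reacts to a synthetic input event:
from selenium import webdriver

driver = webdriver.Firefox()
# ... log in and navigate to the CreatorStudio composer first ...
editor = driver.find_element_by_xpath("//div[@class='_1mf _1mj']")

# Set the text directly and fire an 'input' event so the page's own
# JavaScript notices the change; send_keys is bypassed entirely, which
# also sidesteps ChromeDriver's BMP-only limitation on characters.
driver.execute_script("""
    var el = arguments[0];
    el.focus();
    el.textContent = arguments[1];
    el.dispatchEvent(new InputEvent('input', {bubbles: true}));
""", editor, u"Post text with emojis \U0001F389")
Whether the page's framework accepts the synthetic event is something you'd have to verify; the class names come from the question and may change.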
I am trying to scrape a website, but it is not loading in Selenium. When I browse the website in my "real" Chrome browser, everything works fine. Is there any way I can use my real browser with Python to automate things, instead of using Selenium?
Thanks
Selenium does automate real browsers.
If the website is not loading via Selenium, you can check whether adding desired capabilities helps.
With them you can set a proxy, disable extensions, and so on; there are many options available.
https://chromedriver.chromium.org/capabilities
Also, if you can share what error is displayed, that would be helpful.
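As a minimal sketch in Python, with placeholder values; which options actually help depends on the site:
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--disable-extensions")
options.add_argument("--proxy-server=http://127.0.0.1:8080")  # hypothetical proxy
# Hide the "Chrome is being controlled by automated software" banner,
# which some sites key their bot detection on.
options.add_experimental_option("excludeSwitches", ["enable-automation"])

driver = webdriver.Chrome(options=options)
driver.get("https://example.com")  # placeholder URL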
There are many Selenium WebDriver binding packages for Golang.
However, I don't want to control the browser through a Selenium server.
How can I control a browser with Golang and Selenium, without a Selenium server?
You can try github.com/fedesog/webdriver, which says in its documentation:
This is a pure go library and doesn't require a running Selenium driver.
I would characterize the Selenium webdriver as a client rather than a server. Caveat: I have used the Selenium webdriver (Chrome version) from .Net and I am assuming it is similar for Go.
The way Selenium works is that you launch an instance of it from within code, and it creates a live version of the selected browser (e.g. Chrome) over which your program retains control. Then you write code to tell the browser to navigate to a page, inspect the response, and interact with the page by filling out form data, clicking buttons, and so on. You can watch what is happening in the browser as the code runs, so it is easy to troubleshoot when the interaction doesn't go as planned.
I have used Selenium to upload tens of thousands of records to a website that has no API and only a graphical user interface. Give it a chance.
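For illustration, the launch/navigate/interact loop described above looks roughly like this (shown in Python, since the flow is the same across the bindings; the page and field names are made up):
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/login")  # hypothetical form page

# Fill out form data and click a button, as described above.
driver.find_element_by_name("username").send_keys("alice")
driver.find_element_by_name("password").send_keys("secret")
driver.find_element_by_css_selector("button[type=submit]").click()

print(driver.title)  # inspect the response
driver.quit()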
I need to build a Python scraper to scrape data from a website where content is only displayed after a user clicks a link bound to a JavaScript onclick function, and the page is not reloaded. I've looked into Selenium for this and played around with it a bit, and it seems Selenium opens a new Firefox browser every time I instantiate a driver:
>>> driver = webdriver.Firefox()
Is this open browser required, or is there a way to get rid of it? I'm asking because the scraper may become part of a web app, and I'm afraid that if multiple users start using it, I will have a bunch of browser windows open on my server.
Yes, the open browser is required: Selenium works by automating real web browsers.
You can add this at the bottom of your Python code to make sure the browser is closed at the end:
driver.quit()
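To make sure the browser closes even if the scraping code raises an exception partway through, here is a try/finally sketch:
from selenium import webdriver

driver = webdriver.Firefox()
try:
    driver.get("https://example.com")  # placeholder URL
    # ... click the onclick-bound link and scrape the revealed content ...
finally:
    driver.quit()  # closes the browser even on error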
I am trying to scrape a website that contains images using a headless Selenium.
Initially, the website populates 50 images. If you scroll down, more and more images are loaded.
Windows 7 x64
Python 2.7
recent install of Selenium
[1] Non-Headless
I navigate to the website with Selenium as follows:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get(url)
browser.execute_script('window.scrollBy(0, 10000)')
browser.page_source
This works (if anyone has a better suggestion, please let me know).
I can continue to scrollBy() until I reach the end and then pull the page source.
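One way to implement "scroll until I reach the end" is to stop when the page height stops growing; the 2-second pause is a guess you may need to tune:
import time
from selenium import webdriver

browser = webdriver.Firefox()
browser.get(url)

last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    browser.execute_script("window.scrollBy(0, 10000)")
    time.sleep(2)  # give the lazy loader time to fetch the next batch
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # no new images were loaded
    last_height = new_height

html = browser.page_source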
[2] Headless with HTMLUNIT
from selenium import webdriver
driver = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNIT)
driver.get(url)
I cannot use scrollBy() in this headless environment.
Any suggestions on how to scrape this kind of page?
Thanks
One option is to study the JavaScript to see how it calculates what to load next. Then implement that logic in your scraping client instead. Once you have done that, you can use faster scraping tools like Perl's WWW::Mechanize.
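For example, if the browser's network tab shows the page's JavaScript fetching batches from a JSON endpoint, you can call it directly. Everything below (the URL, parameter names, and field names) is a hypothetical sketch in Python; substitute what you find in the real requests:
import requests

offset = 0
while True:
    # Assumed endpoint and parameters; find the real ones in the network tab.
    resp = requests.get("https://example.com/api/images",
                        params={"offset": offset, "limit": 50})
    batch = resp.json()
    if not batch:
        break  # no more images to load
    for item in batch:
        print(item["src"])  # assumed field name
    offset += 50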
You need to enable JavaScript explicitly when using the HtmlUnit Driver:
driver.setJavascriptEnabled(true);
According to the docs (http://code.google.com/p/selenium/wiki/HtmlUnitDriver), it should emulate IE's JavaScript handling by default.
When I tried the same method, I got error messages saying Selenium had crashed while connecting to the Java side to simulate JavaScript.
When I put the script into the execute_script method instead, the code worked well.
I guess the communication between Selenium and the Java server part was not configured properly.
Enabling JavaScript with the HTMLUNITWITHJS capability is possible and quick ;)
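In the Python bindings that looks like this (HTMLUNITWITHJS is the JavaScript-enabled HtmlUnit capability in older Selenium releases):
from selenium import webdriver

# Same Remote setup as above, but with the JavaScript-enabled capability.
driver = webdriver.Remote(
    desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
driver.get(url)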
I am using ChromeDriver for my Selenium test case, and it works fine. There is a performance issue in my project, so I want to migrate the test case from ChromeDriver to HtmlUnitDriver. But when I simply swap in HtmlUnitDriver as the driver, the Selenium test case stops working.
After experimenting with this driver, I suspect HtmlUnitDriver is not loading the entire page.
I say this because HtmlUnitDriver can find some div IDs that appear near the beginning of the page.
Other divs are not found by this driver; I get NoSuchElementException for their IDs.
So please help me resolve this problem in my project.
Aren't the elements you are looking for created by JavaScript/AJAX calls? You might need to enable JavaScript support in HtmlUnitDriver first.
But beware: even if it works, it may behave differently from what you see in real browsers.
Otherwise, are you using Implicit/Explicit Waits for your searches? Even with JS enabled, sometimes it takes a while before all asynchronous requests are handled.
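For example, here is an explicit wait in Python; the element ID is a placeholder for one of the divs that currently raises NoSuchElementException:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Remote(
    desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
driver.get("https://example.com")  # placeholder URL

# Poll for up to 10 seconds instead of failing immediately.
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "late-loading-div")))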