Select elements using Selenium Webdriver PHP? - selenium

I have a number of page elements that I want to store in a variable and loop through using Selenium Webdriver PHP.
For example:
< cite > Name 1 < /cite >
< cite > Name 2 < /cite >
<cite > Name 3< /cite >
I am using the following code, but it doesn't give me the results shown above (i.e. Name 1, etc.). How do I grab the text from each element using Selenium WebDriver?
$users = $driver->findElements(
WebDriverBy::xpath('//cite')
)->getText();
foreach($users as $u)
{
echo $u;
}
I am using the Facebook Selenium WebDriver wrapper.

I don't really know PHP, but in Java, you'd do something similar to:
List<WebElement> elements = driver.findElements(By.xpath("//cite"));
for (WebElement element: elements) {
System.out.println(element.getText());
}
Given that, I'd assume that the PHP equivalent would be something like this:
$users = $driver->findElements(WebDriverBy::xpath('//cite'));
foreach($users as $u)
{
echo $u->getText();
}
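The underlying point is that a "find all" call returns a collection, so the text has to be read from each element inside the loop rather than from the collection itself. A minimal sketch of that pattern, using Python's standard-library ElementTree as a stand-in for a browser (the wrapping div and the markup string are assumptions made for illustration, not part of the question's page):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the question's three <cite> elements.
markup = "<div><cite>Name 1</cite><cite>Name 2</cite><cite>Name 3</cite></div>"
root = ET.fromstring(markup)

# findall() returns a list of elements, much like findElements() does.
cites = root.findall(".//cite")

# The text is a property of each element, not of the list.
names = [cite.text for cite in cites]
for name in names:
    print(name)
```

Calling a text accessor on the list itself fails in the same way the original PHP attempt did.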

JimEvan's answer is correct. If you're getting a "non-object" type error, you should check to make sure the findElements() call is actually returning something:
$users = $driver->findElements(WebDriverBy::xpath('//cite'));
fwrite(STDOUT, "Number of users: " . count($users));
Perhaps it has something to do with the space characters in your element tags (just a guess)?

The correct code is:
$users = $driver->findElements(WebDriverBy::xpath('//cite'));
foreach($users as $u)
{
echo $u->getText();
}

Related

Perl: Scrape a website and download PDF files from it using Perl Selenium::Chrome

I'm learning to scrape websites with Selenium::Chrome in Perl. How can I download all the PDF files from 2017 to 2021 and store them in a folder, from this website: https://www.fda.gov/drugs/warning-letters-and-notice-violation-letters-pharmaceutical-companies/untitled-letters-2021 ? So far this is what I've done:
use strict;
use warnings;
use Time::Piece;
use POSIX qw(strftime);
use Selenium::Chrome;
use File::Slurp;
use File::Copy qw(copy);
use File::Path;
use File::Path qw(make_path remove_tree);
use LWP::Simple;
my $collection_name = "mre_zen_test3";
make_path("$collection_name");
#DECLARE SELENIUM DRIVER
my $driver = Selenium::Chrome->new;
#NAVIGATE TO SITE
print "trying to get toc_url\n";
$driver->navigate('https://www.fda.gov/drugs/warning-letters-and-notice-violation-letters-pharmaceutical-companies/untitled-letters-2021');
sleep(8);
#GET PAGE SOURCE
my $toc_content = $driver->get_page_source();
$toc_content =~ s/[^\x00-\x7f]//g;
write_file("toc.html", $toc_content);
print "writing toc.html\n";
sleep(5);
$toc_content = read_file("toc.html");
This script only downloads the page content of the website. I hope someone here can help and teach me. Thank you very much.
Here is some working code, hopefully enough to get you going:
use warnings;
use strict;
use feature 'say';
use Path::Tiny; # only convenience
use Selenium::Chrome;
my $base_url = q(https://www.fda.gov/drugs/)
. q(warning-letters-and-notice-violation-letters-pharmaceutical-companies/);
my $show = 1; # to see navigation. set to false for headless operation
# A little demo of how to set some browser options
my %chrome_capab = do {
my @cfg = ($show)
? ('window-position=960,10', 'window-size=950,1180')
: 'headless';
'extra_capabilities' => { 'goog:chromeOptions' => { args => [ @cfg ] } }
};
my $drv = Selenium::Chrome->new( %chrome_capab );
my @years = 2017..2021;
foreach my $year (@years) {
my $url = $base_url . "untitled-letters-$year";
$drv->get($url);
say "\nPage title: ", $drv->get_title;
sleep 1 if $show;
my $elem = $drv->find_element(
q{//li[contains(text(), 'PDF')]/a[contains(text(), 'Untitled Letter')]}
);
sleep 1 if $show;
# Downloading the file is surprisingly not simple with Selenium (see text)
# But as we found the link we can get its url and then use Selenium-provided
# user-agent (it's LWP::UserAgent)
my $href = $elem->get_attribute('href');
say "pdf's url: $href";
my $response = $drv->ua->get($href);
die $response->status_line if not $response->is_success;
say "Downloading 'Content-Type': ", $response->header('Content-Type');
my $filename = "download_$year.pdf";
say "Save as $filename";
path($filename)->spew( $response->decoded_content );
}
This takes shortcuts, switches approaches, and sidesteps some issues (which one needs to resolve to make fuller use of this useful tool). It downloads one PDF from each page; to download all of them, we need to change the XPath expression used to locate them:
my @hrefs =
map { $_->get_attribute('href') }
$drv->find_elements(
# There's no ends-with(...) in XPath 1.0 (nor matches() with regex)
q{//li[contains(text(), '(PDF)')]}
. q{/a[starts-with(@href, '/media/') and contains(@href, '/download')]}
);
Now loop over the links, forming filenames more carefully, and download each like in the program above. I can fill the gaps further if there's need for that.
The code puts the PDF files on disk, in its working directory. Please review that before running it, to make sure that nothing gets overwritten!
See Selenium::Remote::Driver for starters.
Note: there is no need for Selenium for this particular task; it's all straight-up HTTP requests, no JavaScript. So LWP::UserAgent or Mojo would do it just fine. But I take it that you want to learn how to use Selenium, since it often is needed and is useful.
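For the step of forming filenames more carefully from the collected links, here is a rough sketch in Python. It assumes each link has the shape /media/<id>/download, which is what the XPath filter above selects; the function name, the filename pattern, and the sample ID 12345 are hypothetical.

```python
import re

def pdf_filename(href: str, year: int) -> str:
    """Build a per-letter filename from a link of the assumed form /media/<id>/download."""
    match = re.search(r"/media/([^/]+)/download", href)
    if match is None:
        raise ValueError(f"unexpected link format: {href}")
    # Including the media id keeps several letters from the same year apart.
    return f"untitled_letter_{year}_{match.group(1)}.pdf"

# Hypothetical link; real ids come from the href attributes found on the page.
example = pdf_filename("https://www.fda.gov/media/12345/download", 2021)
print(example)
```

Including the year and the ID in the name avoids the overwriting problem that a single download_$year.pdf name would cause once a page yields several links.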

How to write a locator for text between div and span (preferably XPath) which contains a non-breaking space (&nbsp;)

I want to write xpath for the following div:
<div class='someclass' id='someid'>:TEST&nbsp;SELENIUM&nbsp;1234</div>
Please note:
The tag can be anything, such as a div, span, or anchor tag.
&nbsp; can be present anywhere in the text.
What I have tried so far :
//div[contains(text(),":TEST SELENIUM 1234")]
//div[contains(text(),":TEST{ }SELENIUM{ }1234")]
//div[contains(text(),":TEST SELENIUM 1234")]
//div[normalize-space(text()) = ':TEST SELENIUM 1234']
//div[normalize-space(text()) = ':TEST{ }SELENIUM{ }1234']
//div[normalize-space(text()) = ':TEST SELENIUM 1234']
//div[normalize-space(.) = ':TEST SELENIUM 1234']
//div[normalize-space(.) = ':TEST{ }SELENIUM{ }1234']
//div[normalize-space(.) = ':TEST SELENIUM 1234']
//div[normalize-space(.) = ':TEST{\u00a0}SELENIUM{\u00a0}1234']
//div[normalize-space(.) = ':TEST${nbsp}SELENIUM${nbsp}1234']
What has worked for me (thanks to @Andersson):
//div[starts-with(text(), ":TEST") and substring(text(), 7)="SELENIUM" and substring(text(), 16)="1234"]
This is more of a workaround and would work only for known strings.
These are the SO posts which I have already followed:
Link1
Link2
Any help will be highly appreciated.
According to the conversation above, you can use this sample XPath builder (Java):
public class Test {
public static void main(String[] args) {
String s = ":TEST SELENIUM 1234";
String[] parts = s.split(" ");
StringBuilder xpath = new StringBuilder("//*");
for (int i = 0; i < parts.length; i++){
xpath.append((i == 0) ? "[contains(text(), '" + parts[i] + "')" : " and contains(text(), '" + parts[i] + "')");
}
xpath.append("]");
System.out.println(xpath);
}
}
Output:
//*[contains(text(), ':TEST') and contains(text(), 'SELENIUM') and contains(text(), '1234')]
In Python I would do
required_div = [div for div in driver.find_elements_by_xpath('//div') if div.text == ':TEST SELENIUM 1234'][0]
to find the required node by its complete text content, ignoring non-breaking space characters.
P.S. Again, it's just a workaround, but it seems to be quite a simple solution.
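Another option that may be worth trying is to let XPath 1.0 itself turn non-breaking spaces into ordinary spaces with translate() before normalize-space() is applied. The helper below is only a sketch that builds such an expression as a string; the function name is hypothetical, it does not escape single quotes, and it has not been run against the page in question.

```python
NBSP = "\u00a0"  # the character that &nbsp; decodes to

def xpath_for_text(text: str, tag: str = "*") -> str:
    """Build an XPath 1.0 expression matching an element by text, treating NBSP as a space."""
    if "'" in text:
        raise ValueError("this sketch does not escape single quotes")
    # Collapse every run of whitespace (NBSP included) to a single space.
    wanted = " ".join(text.replace(NBSP, " ").split())
    return f"//{tag}[normalize-space(translate(., '{NBSP}', ' ')) = '{wanted}']"

expression = xpath_for_text(":TEST\u00a0SELENIUM\u00a01234", "div")
print(expression)
```

The resulting string contains a literal NBSP character inside the translate() call, so it has to be passed to the driver as a Unicode string rather than typed with an HTML entity.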

Getting a value in a link for later usage in Selenium

I have a link on my webpage from which I need to get a value and save it for later use (constructing a direct URL).
The HTML link I want to obtain the value from looks like this:
<a ng-bind="saving.customerContractName || (saving| savingscontract:$parent.$parent.cmsData) " ng-attr-target="{{(saving.type === 'ASK') ? '_blank' : undefined}}" ng-href="/lpn/mo/Logon.action?avtalenummer=176742" class="ng-binding" target="" href="/lpn/mo/Logon.action?avtalenummer=176742">Fondskonto Link (176742)</a>
The value I need to obtain is 176742.
Any tips on how to extract this value? And further use it in a direct URL call (something) like this:
String url2 = "https://www2-t.storebrand.no/ppjs/#/savings/index/THE_VALUE_HERE";
driver.get(url2);
This might work:
txt = driver.find_element_by_partial_link_text("Fondskonto Link").get_attribute("href").split("=")[1]
url = "https://www2-t.storebrand.no/ppjs/#/savings/index/%s" % txt
driver.get(url)
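Splitting on "=" works while the link has exactly one query parameter. A slightly more defensive sketch, assuming the parameter keeps the name avtalenummer as in the question's markup, parses the query string with the standard library (the function name is hypothetical):

```python
from urllib.parse import urlparse, parse_qs

def contract_number(href: str) -> str:
    """Return the value of the avtalenummer query parameter from a link."""
    query = parse_qs(urlparse(href).query)
    return query["avtalenummer"][0]

# href as it appears in the question's markup
href = "/lpn/mo/Logon.action?avtalenummer=176742"
url2 = "https://www2-t.storebrand.no/ppjs/#/savings/index/" + contract_number(href)
print(url2)
```

This keeps working if further parameters are later appended to the link, where a plain split on "=" would return the wrong piece.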

How to get all the element IDs of a screen in Selenium

How can I get all the element IDs of a screen in Selenium?
Please refer to this screenshot
The element ID changes every time the page loads. I tried XPath contains() and starts-with() on @id, but that doesn't work every time. Now I want to get all the element IDs from the webpage, so I can select the exact element.
My webpage contains text inputs, buttons, and drop-downs.
Use XPath or cssSelector instead. If you insist on getting all page IDs as a list of strings in Java, try executing a JavaScript function.
// The Firefox driver supports javascript
WebDriver driver = new FirefoxDriver();
// Go to the Google Suggest home page
driver.get("http://www.google.com/webhp?complete=1&hl=en");
ArrayList ids = (ArrayList)((JavascriptExecutor) driver).executeScript(
" var allElements = document.getElementsByTagName('*'); var allIds = []; "
+ " for (var i = 0, n = allElements.length; i < n; ++i) { "
+ " var el = allElements[i]; "
+ " if (el.id) { allIds.push(el.id); } } return allIds; "
);
Regards,
Alan Mehio
London, UK
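If the page source is already available as a string (for example from the driver's page-source call), the IDs could also be collected outside the browser. The following is a minimal sketch with Python's standard-library HTML parser; the class name, function name, and sample markup are hypothetical, and it only sees IDs present in the source it is given, not ones added afterwards by scripts.

```python
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Collects the value of every non-empty id attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)

def collect_ids(page_source: str) -> list:
    collector = IdCollector()
    collector.feed(page_source)
    collector.close()
    return collector.ids

# Hypothetical page fragment with an input, a button, and a drop-down.
sample = '<form id="f1"><input id="user_42" type="text"><button>Go</button><select id="sel_7"></select></form>'
found = collect_ids(sample)
print(found)
```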

HtmlAgilityPack: scraping between broken tags

I need to scrape a p tag which is followed by an h3 tag but has no closing p tag. It looks like this:
<script ad>asdasdasd</script>
<p>Translation companies are
-----------------------
-----------------------
<h3 class="this_class">mind blown site</h3>
There is no </p> tag, so I cannot parse it completely. Now I have a few questions:
1) Can this be parsed using HtmlAgilityPack XPath?
2) I have a function to find text between two strings (getbetween). But I have a doubt: if I use "asdasdasd" and " is it guaranteed that VB.NET will use the script tag which is just above the h3, given that there are 2-3 identical "asdasdasd" lines?
3) Is there any other method you are aware of?
(I had to write this as code so the HTML does not get mangled.)
Regards,
It might be a good idea to post some more "real" html to really help you, at least the tags between the h3 and the p.
Anyway, this should get you the p-Tag from the h3-Tag.
HtmlDocument doc = new HtmlDocument();
doc.Load(... //Load the Html...
//Either of these lines will do
HtmlNode pNode = doc.DocumentNode.SelectSingleNode("//h3[@class='this_class']/preceding-sibling::p");
//HtmlNode pNode = doc.DocumentNode.SelectSingleNode("//h3[contains(text(),'mind blown site')]/preceding-sibling::p");
string pInnerHtml = pNode.NextSibling.InnerHtml; //Has the text "Translation companies are...."
So in general, to get all the nodes from the opening p tag to the start of a tag you don't want, you could do this:
var p = doc.DocumentNode.SelectSingleNode("//p");
var h3 = p.SelectSingleNode("following-sibling::h3[@class='this_class']");
var following = new List<string>();
for (var current = p.NextSibling; current != h3; current = current.NextSibling)
{
following.Add(current.InnerText);
}
var innerText = String.Concat(following);
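The same idea, collecting everything from the opening p tag up to the h3 that should end it, can be sketched without HtmlAgilityPack. The version below uses Python's standard-library HTML parser purely as an illustration; the class and function names are hypothetical, and it assumes the h3's class attribute is exactly this_class.

```python
from html.parser import HTMLParser

class UnclosedParagraphText(HTMLParser):
    """Captures text from the first <p> until an <h3> with the given class starts."""

    def __init__(self, stop_class):
        super().__init__()
        self.stop_class = stop_class
        self.capturing = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "p" and not self.capturing:
            self.capturing = True   # start at the first <p>
        elif tag == "h3" and self.capturing and dict(attrs).get("class") == self.stop_class:
            self.capturing = False  # the h3 acts as the missing </p>
            self.done = True

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)

def paragraph_text(html, stop_class="this_class"):
    parser = UnclosedParagraphText(stop_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()

# Markup shaped like the question's sample; the middle line is a placeholder.
sample = '<script ad>asdasdasd</script>\n<p>Translation companies are\nhere\n<h3 class="this_class">mind blown site</h3>'
text = paragraph_text(sample)
print(text)
```

Because capture only starts at the p tag, the script contents above it are never included, which also sidesteps the concern about repeated "asdasdasd" lines.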