HtmlUnit skips table data text after <b> tag - html-table

I am using HtmlUnit version 2.10. I am reading data from an html table. The cell in question contains:
<td colspan="2" id="num_custs_text">
<b>Affected Customers:</b> 22
</td>
If I use:
final List<?> elements = pageHtml.getByXPath(getXPath());
for (Object rowObject : elements) {
(...)
String rowDataString = rowData.asText();
(...)
}
the rowDataString only contains "Affected Customers:". It does not contain "22". I have tried dumping the entire page to a log using pageHtml.asXml() but the output does not contain "22". It looks like HtmlUnit is ignoring the text after the tag on the initial getPage operation.
How do I force HtmlUnit to load?
Thank you,
Neil

Since you have provided neither the full input HTML nor the XPath you use, it is impossible to determine whether your XPath string is wrong. I will assume it is right.
Now, you said:
I have tried dumping the entire page to a log using pageHtml.asXml() but the output does not contain "22"
If that is the case, then how do you know the 22 is actually there? I will assume you have checked that in an actual web browser. Was JavaScript enabled in that browser? I will assume it was.
Then the most likely issue is that the 22 is being set by JavaScript (maybe AJAX) and HtmlUnit is failing to fetch it (or you don't have JavaScript enabled in HtmlUnit).
Were my guesses right?

Related

Selenium xpath failing to find element (other xpath tools prove it's there)

Selenium FindElement:
driver.FindElement(By.XPath($"//*[contains(text(), '{text}')]"));
Throws:
no such element: Unable to locate element:
{
"method":"xpath",
"selector":"//*[contains(text(), '269424ae-4d74-4a68-91e0-1603f2d674a0')]"
}
(Session info: chrome=74.0.3729.169)
(Driver info:
chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729#{#29}),
platform=Linux 4.18.0-20-generic x86_64)
But it's definitely there and the xpath is valid because I can use AngleSharp to parse the driver's page source with the same xpath expression:
new HtmlParser()
.ParseDocument(driver.PageSource)
.SelectSingleNode($"//*[contains(text(), '{text}')]");
The target element is a div containing a guid:
<div class="home-project-title-text"> 269424ae-4d74-4a68-91e0-1603f2d674a0 </div>
This is with
dotnet core 2.2
chrome webdriver
Chrome 74
Ubuntu 18.04
EDIT1
Interestingly the document.evaluate in the browser console also fails with this xpath expression. I use this as a helper function for running xpath:
selectSingle = xpath => document.evaluate(xpath, document).iterateNext()
and then find that this returns null:
> selectSingle("//*[contains(text(), '269424ae-4d74-4a68-91e0-1603f2d674a0')]")
> null
but it's definitely there and has the expected text, e.g. I can use a different xpath expression to manually locate it and check its text content:
> selectSingle("//*[@id='app']/div/div[1]/div[3]/div/div[1]/div/div[1]/div")
.textContent
.trim()
== "269424ae-4d74-4a68-91e0-1603f2d674a0"
> true
EDIT2
So the cause was that the div was being created in react like this:
React.createElement(
"div",
{className: "home-project-title-text"},
" ",
"269424ae-4d74-4a68-91e0-1603f2d674a0",
" ");
I think this roughly means that the div has three textnodes as children (is that valid?). The result looks 100% normal - it renders perfectly and inspecting the element with devtools looks like a single text node and .textContent returns the concatenated string.
Now that you gave some more information (how this element is created):
Yes, it is possible that an XML element has as its children several separate text nodes. However, this is usually not the case if the text nodes are adjacent to each other, instead of separated by child elements.
If '269424ae-4d74-4a68-91e0-1603f2d674a0' is indeed the second text node, then
//*[contains(text(), '269424ae-4d74-4a68-91e0-1603f2d674a0')]
will indeed not return this div element. You should not think of this as "breaking XPath", it is just that the precise semantics of the expression are:
Find an element with any name whose first text node contains '269424ae-4d74-4a68-91e0-1603f2d674a0'.
text() actually selects all text nodes of an element, but XPath 1.0 functions that expect a string, such as contains(), silently use only the first node of that set.
What you actually would like to select is
an element with any name where any text node contains '269424ae-4d74-4a68-91e0-1603f2d674a0'
And an expression to achieve exactly that is:
//*[text()[contains(.,'269424ae-4d74-4a68-91e0-1603f2d674a0')]]
You can test those expressions with a small toy document such as:
<div className="home-project-title-text">
<other/>
269424ae-4d74-4a68-91e0-1603f2d674a0
<other/>
</div>
Where other elements are forcing the div element to contain three separate text nodes, two of them containing whitespace only.
Finally, if you already know that the element you are looking for is a div, then you should look specifically for that:
//div[text()[contains(.,'269424ae-4d74-4a68-91e0-1603f2d674a0')]]
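The difference between the two forms can be checked outside the browser. Below is a minimal sketch that runs both expressions against the toy document using the JDK's built-in XPath 1.0 engine (not Selenium); the class name TextNodeXPathDemo and the helper count are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Selenium or AngleSharp.
final class TextNodeXPathDemo {
    // Toy document: the <other/> elements split the div's text into three text nodes.
    static final String XML =
        "<div className=\"home-project-title-text\">\n"
      + "<other/>\n"
      + "269424ae-4d74-4a68-91e0-1603f2d674a0\n"
      + "<other/>\n"
      + "</div>";

    // Returns how many nodes the given XPath 1.0 expression selects in the toy document.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String guid = "269424ae-4d74-4a68-91e0-1603f2d674a0";
        // Only the first text node (whitespace) is tested, so nothing matches.
        System.out.println(count("//*[contains(text(), '" + guid + "')]"));   // prints 0
        // Every text node is tested individually, so the div matches.
        System.out.println(count("//*[text()[contains(., '" + guid + "')]]")); // prints 1
    }
}
```

The first expression finds nothing because the div's first text node is only a line break; the second finds the div because the predicate is applied to each text node in turn.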
The element might live in an iframe; if so, you will have to use IWebDriver.SwitchTo() to switch to that iframe before attempting to locate the element.
The element might also not be immediately available, e.g. because it is loaded by an AJAX request; in that case, use the WebDriverWait class so that Selenium waits until the element appears in the DOM before interacting with it.
Try the following XPath and see if you have any luck.
//div[@class='home-project-title-text'][contains(.,'269424ae-4d74-4a68-91e0-1603f2d674a0')]
EDIT
//div[contains(.,'269424ae-4d74-4a68-91e0-1603f2d674a0')]

How to write a XPath for the text one4

I want to use XPath to locate a link that comes after a piece of text. For example, locate "one4" by "what10". You can only use the text "what10" and nothing else, because the rest of the information on this page will change. What I want to get is the "one4" link node.
<body>
<p>
so
<br>what1 one
<br>what2two
<br>what11one4
<br>what3three
<br>what4one1
<br>what5two2
<br>what6three3
<br>what7one3
<br>what8two3
<br>what9three3
<br>what10one4
<br>just return
<br></p>
</body>
For certain reasons, the only thing I can rely on to locate one4 is the text what10.
Please help me.
You can use below line
WebElement loginLink = driver.findElement(By.linkText("one4"));
Selenium doesn't support XPath 2.0; it uses XPath 1.0.
The node you are trying to refer to, i.e. the one that contains the text what10, is a text node, and Selenium can't use it as a reference. So finding the node with the text one4 with reference to the text what10 won't be possible. As an alternative, if the desired node is always the last but one node, you can use the following solution:
xpath:
driver.findElement(By.xpath("//body/p//a[position()=last()-1]"));
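That said, in plain XPath 1.0 a text node can still serve as an intermediate step as long as the expression ends on an element. The sketch below shows this with the JDK's built-in XPath engine rather than Selenium. It assumes that each link in the original page is an a element directly following its label text (the markup in the question appears to have lost its anchor tags), and the href values, the class name TextAnchorXPathDemo and the helper hrefAfterLabel are made up for the illustration. Since the result is an element, the same expression should be usable with By.xpath, but that is untested here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; the markup is a shortened, well-formed guess at the page.
final class TextAnchorXPathDemo {
    static final String XHTML =
        "<body><p>so"
      + "<br/>what1 <a href=\"#one\">one</a>"
      + "<br/>what11<a href=\"#first\">one4</a>"
      + "<br/>what10<a href=\"#second\">one4</a>"
      + "<br/>just return<br/></p></body>";

    // Returns the href of the first <a> that follows the text node containing the label.
    static String hrefAfterLabel(String label) {
        String expression =
            "//p/text()[contains(., '" + label + "')]/following-sibling::a[1]/@href";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The two-argument evaluate returns the string value of the first selected node.
            return xpath.evaluate(expression, new InputSource(new StringReader(XHTML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hrefAfterLabel("what10")); // prints #second
        System.out.println(hrefAfterLabel("what11")); // prints #first
    }
}
```

Both links have the text one4, so only the label text distinguishes them. Note that a shorter label such as what1 would also match what10 and what11, so the label has to be specific enough.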
Update
As per @MosheSlavin's counter question, here is a snapshot demonstrating that the XPath works:

Finding text on page with Selenium 2

How can I check whether a given text string is present on the current page using Selenium?
The code is this:
def elem = driver.findElement(By.xpath("//*[contains(.,'search_text')]"));
if (elem == null) println("The text is not found on the page!");
If you're searching the whole page for some text, providing an XPath or selector to find an element is not necessary. The following code might help:
Assert.assertEquals(driver.getPageSource().contains("text_to_search"), true);
For some reason, certain elements don't seem to respond to the "generic" search listed in the other answer. At least not in Selenium2library under Robot Framework which is where I needed this incantation to find the particular element:
xpath=//script[contains(@src, 'super-sekret-url.example.com')]
A simpler (but probably less efficient) alternative to XPaths is to just get all the visible text in the page body like so:
def pageText = browser.findElement(By.tagName("body")).getText();
Then if you're using JUnit or something, you can use an assertion to check that the string you are searching for is contained in it.
assertThat("Text not found on page", pageText, containsString(searchText));
Using an XPath is perhaps more efficient, but this way is simpler to understand for those unfamiliar with it. Also, an AssertionError generated by assertThat will include the text that does exist on the page, which may be desirable for debugging as anybody looking at the logs can clearly see what text is on the page if what we are looking for isn't.

selenium getXpathCount

Hi there,
selenium.getXpathCount does not find the element. Does anyone have any idea? Here is my code:
if (existArtist){
int result = selenium.getXpathCount("//*[@id='chugger-results']/div[1]/ul/li").intValue();
if (result>0){
//DO THIS
Either you have a broken DOM (Do a W3C Validation and see if you have any unclosed tags) or your XPath is looking for an element that doesn't exist.
We would need to see the entire HTML of the page to be able to answer your question (more visibility of your test code would be useful too)

Selenium RC Having problems with XPath for a table

I'm trying to select an element given by:
/html/body[@id='someid']/form[@id='formid']/div[@id='someid2']/div[@id='']/div[@id='']/div[@id='']/table/tbody[@id='tableid']/tr[7]/td[2]
Now the html of that row I'm trying to select looks like this:
<tr>
<td class="someClass">some text</td>
<td class="someClass2">my required text for verifying</td>
</tr>
I need to check whether my required text for verifying exists in the page.
I used selenium.isTextPresent("my required text for verifying"); and it doesn't work.
So now I tried with selenium.isElementPresent("//td[contains(text(),'my required text for verifying')]")
This works sometimes but occasionally gives random failures.
Tried with selenium.isElementPresent("//*[contains(text(),'my required text for verifying')]") too.
How do I verify this text on the page using selenium?
The problem is not with the page taking time to load. I took screenshots before the failure occurs and found that the page was fully loaded, so that shouldn't be the problem.
Could someone please suggest any way to select this element or any way to validate this text on the screen?
Try locating it by CSS:
assertText(selenium.getText("css=.someClass2"), "my required text for verifying");
The above should give a better failure message than isElementPresent, but you can still use that with CSS locators:
assertTrue(selenium.isElementPresent("css=.someClass2"));
If there is an issue with the load times you could try waiting for the element to be present:
selenium.waitForCondition("var value = selenium.isElementPresent('css=.someClass2'); value == true", "60000");
Some other XPath locators that might work for you, if you prefer not to use CSS locators:
//td[contains(@class, 'someClass2')]
xpath=id('tableid')/tr[7]/td[2]
xpath=id('tableid')/descendant::td[contains(@class, 'someClass2')][7]
I've never heard of selenium; but your initial XPath is unnecessarily fragile and verbose.
If an element has an id, it's unique; using such a long XPath just to select a particular element is unnecessary; just select the last element with the id. Further, I see that you're occasionally selecting xyz[@id=''] - if you're trying to select elements without id attributes, you can do xyz[not(@id)] instead.
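The difference between the two predicates is easy to check. Here is a minimal sketch using the JDK's built-in XPath 1.0 engine; the four-element document and the names IdPredicateXPathDemo and count are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing div[@id=''] with div[not(@id)].
final class IdPredicateXPathDemo {
    // One div with a real id, one with an empty id attribute, two with no id attribute.
    static final String XML =
        "<root><div id=\"x\"/><div id=\"\"/><div/><div/></root>";

    // Returns how many nodes the given expression selects in the sample document.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Matches only the element whose id attribute exists and is empty.
        System.out.println(count("//div[@id='']"));    // prints 1
        // Matches the elements that have no id attribute at all.
        System.out.println(count("//div[not(@id)]"));  // prints 2
    }
}
```

So div[@id=''] requires an id attribute that is present but empty, which is rarely what a generated locator actually intends.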
Assuming your initial XPath is basically correct, it would suffice to do something like this:
//tbody[@id='tableid']/tr[7]/td[2]
However, using a specific row and column number like that is asking for trouble if anyone ever changes details of the HTML. Also, it's atypical to have ids on tbody elements; perhaps the table element has the id?
Finally, you may be running into space-normalization issues. In xml, multiple consecutive spaces are often considered equivalent to a single space, and you're not accounting for that. In particular, if the xhtml is pretty-printed and contains a line-break in the middle of your sought-after text, it won't work.
//td[contains(normalize-space(text()),'my required text for verifying')]
Finally, text() explicitly selects child text nodes, so the above XPath won't match elements where the text isn't an immediate child of td (e.g. <td><b>my required text for verifying</b></td>). Perhaps you mean to look up the concatenated text value of all descendants:
//td[contains(normalize-space(string(.)),'my required text for verifying')]
Finally, type conversion can be implicit in XPath, so string(.) can be replaced by . in the above, leading to the version:
//td[contains(normalize-space(.),'my required text for verifying')]
This may be slow on large documents since it needs to normalize the spaces and perform a string search for each td element. If you run into perf problems, try to be more specific about which td elements need to be inspected, or, if you don't care where the text occurs, try to reduce the number of "calls" to normalize-space by normalizing the entire doc in one go (e.g. via /*[contains(normalize-space(.),'my required text for verifying')]).
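The three variants above can be compared side by side. The following is a minimal sketch using the JDK's built-in XPath 1.0 engine instead of Selenium; the three sample cells, their id values and the names NormalizeSpaceXPathDemo and count are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the table is invented to exercise each case.
final class NormalizeSpaceXPathDemo {
    static final String XML =
        "<table>"
        // Text on one line, directly inside the td.
      + "<tr><td id=\"plain\">my required text for verifying</td></tr>"
        // Same text, but pretty-printed with a line break in the middle.
      + "<tr><td id=\"wrapped\">my required\n      text for verifying</td></tr>"
        // Part of the text sits inside a child element.
      + "<tr><td id=\"nested\"><b>my required text</b> for verifying</td></tr>"
      + "</table>";

    // Returns how many td elements the given expression selects.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "'my required text for verifying'";
        System.out.println(count("//td[contains(text(), " + text + ")]"));                  // prints 1
        System.out.println(count("//td[contains(normalize-space(text()), " + text + ")]")); // prints 2
        System.out.println(count("//td[contains(normalize-space(.), " + text + ")]"));      // prints 3
    }
}
```

Each step picks up one more cell: normalize-space handles the line break, and switching from text() to . handles the text split across a child element.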