I would like to search a Beautiful Soup element for a text match and return the sequence of tags that lead to the element containing that text.
For example, if at soup.html.head.meta there is text “Hello everybody”, I would like to search on “soup.head” for “Hello everybody” and return the result “soup.html.head.meta”.
Is there a good way to do this? If there is no simple way, is there a good workaround for quickly finding out where certain known text is located?
Example:
I retrieved the HTML source code from this URL with wget: https://www.gitpod.io/docs/context-urls
I created a Beautiful Soup object from this document like so:
soup = bs4.BeautifulSoup(doc, 'html.parser')
The method soup.html.head.get_text() returns
'\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nGitpod
Contexts\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'
I know that somewhere in the head element is the text "Gitpod Contexts". I would like to find the nearest enclosing tag so I can delete everything except that element. I am trying to prune the Beautiful Soup object myself so that it contains only elements with text in them, rather than calling get_text() on the entire object and pulling the text out automatically.
Example 2
A simpler demonstration would be this:
<html>
<body>
<p>
Hello!
</p>
<p>
Goodbye!
</p>
</body>
</html>
The function:
html.returnLocationOf("Hello!")
returns:
html.body.p
I don't know enough about Beautiful Soup to know how it would specify "the second p" for "Goodbye!" but I imagine it could be incorporated as a method somehow.
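One possible approach, as a sketch only: find the first text node containing the string, then walk its .parents upward and collect the tag names. The name tag_path and the indexed option are hypothetical helpers, not part of Beautiful Soup; the position counts earlier siblings with the same tag name, which is one way to distinguish "the second p".

```python
import bs4


def tag_path(soup, text, indexed=False):
    """Return the dotted tag names leading to the first text node that
    contains `text`, or None if no text node matches."""
    # find(string=...) accepts a callable that is tested against each string.
    node = soup.find(string=lambda s: s is not None and text in s)
    if node is None:
        return None
    parts = []
    # .parents walks from the enclosing tag up to the BeautifulSoup object.
    for parent in node.parents:
        if isinstance(parent, bs4.BeautifulSoup):
            break  # stop at the document root, which is not a real tag
        name = parent.name
        if indexed:
            # Zero-based position among earlier siblings with the same name.
            position = len(parent.find_previous_siblings(name))
            name = f"{name}[{position}]"
        parts.append(name)
    return ".".join(reversed(parts))


doc = "<html>\n<body>\n<p>\nHello!\n</p>\n<p>\nGoodbye!\n</p>\n</body>\n</html>"
soup = bs4.BeautifulSoup(doc, "html.parser")
print(tag_path(soup, "Hello!"))
print(tag_path(soup, "Goodbye!", indexed=True))
```

Without indexed=True both paragraphs give the same path, so the indexed form is the one to use when the exact element matters.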
In the page source I have the script tags below.
How can I validate in Selenium that particular scripts are present?
<script src="/core/assets/vendor/domready/ready.min.js?v=1.0.8"></script>
<script src="/core/misc/drupalSettingsLoader.js?v=8.4.8"></script>
<script src="/core/misc/drupal.js?v=8.4.8"></script>
<script src="/core/misc/drupal.init.js?v=8.4.8"></script>
You can search on the src attribute of the script element, i.e. find the element by attribute:
driver.findElement(By.xpath("//script[@src='/core/assets/vendor/domready/ready.min.js?v=1.0.8']"))
OR
driver.findElement(By.xpath("//script[contains(@src,'/core/assets/vendor/domready/ready.min.js?v=1.0.8')]"))
OR
driver.findElement(By.cssSelector("script[src='/core/assets/vendor/domready/ready.min.js?v=1.0.8']"))
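For a pure presence check it may be more convenient to ask for a list of matches, because findElement throws NoSuchElementException when nothing matches. Here is a minimal sketch in Python rather than Java, assuming the Selenium 4 Python bindings. script_is_present is a hypothetical helper name, and the string "css selector" is the value behind By.CSS_SELECTOR, used directly so the snippet needs no imports.

```python
def script_is_present(driver, src):
    """Return True if the page contains a script tag with exactly this src.

    `driver` is assumed to be a Selenium WebDriver that has already loaded
    the page. `src` is assumed not to contain double quotes.
    """
    # find_elements returns an empty list instead of raising when nothing matches.
    matches = driver.find_elements("css selector", f'script[src="{src}"]')
    return len(matches) > 0


# Usage, assuming a live driver:
# assert script_is_present(driver, "/core/misc/drupal.js?v=8.4.8")
```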
Here is my html file:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<script id="ScriptId" src=""></script>
</body>
</html>
I want to replace the empty src with script.js.
I tried XmlPoke, but I think my XPath query doesn't work, or maybe it can't be done this way:
<XmlPoke XmlInputPath="test.html"
Query="/html/body/script[id='ScriptId']/src"
Value="script.js"/>
Thanks in advance for helping me update this src value.
Attributes in XPath are prefixed with @.
/html/body/script[@id='ScriptId']/@src
You probably shouldn't be using something designed for XML with HTML, as the two are not the same. At best, if the HTML is well-formed, it'll strip out non-XML stuff like the DOCTYPE; at worst it'll blow up.
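Both points can be illustrated outside MSBuild. The sketch below swaps in Python's xml.etree.ElementTree for XmlPoke, so it is only an analogy, and set_script_src is a hypothetical helper name. It selects the element with an @id predicate and sets the attribute on it, since ElementTree's XPath subset cannot select attribute nodes directly. The serialized result no longer has the DOCTYPE, which is the kind of loss described above.

```python
import xml.etree.ElementTree as ET

HTML = """<!DOCTYPE html>
<html>
<head>
</head>
<body>
<script id="ScriptId" src=""></script>
</body>
</html>"""


def set_script_src(markup, script_id, new_src):
    """Parse well-formed markup as XML, set src on the matching script
    element, and return the serialized document."""
    root = ET.fromstring(markup)  # root is the <html> element
    # Attribute predicates use @, as in [@id='ScriptId'].
    script = root.find(f"./body/script[@id='{script_id}']")
    if script is None:
        raise LookupError(f"no script element with id {script_id!r}")
    script.set("src", new_src)
    return ET.tostring(root, encoding="unicode")


result = set_script_src(HTML, "ScriptId", "script.js")
print(result)
```

This only works because the sample file happens to be well-formed XML; typical HTML with unclosed tags would make the parser raise an error.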
I am processing some HTML in VBA and want to inject a <base> element into the <head> tag.
oElement.insertAdjacentHTML "beforeEnd", "<base>HELLO</base>"
If I inspect the oElement.OuterHTML all that is added is HELLO
...<LINK rel=stylesheet type=text/css href="css/default.css">HELLO</HEAD>...
If I try adding li tags, it works as expected.
oElement.insertAdjacentHTML "beforeEnd", "<li>HELLO</li>"
Result
....<LINK rel=stylesheet type=text/css href="css/default.css">HELLO <LI>HELLO</LI> </HEAD>...
I've tried using just <base /> or <base href="blah blah , but nothing gets added. Am I missing some key piece of knowledge about insertAdjacentHTML?
Any ideas??
You need to use the IHTMLDOMNode interface for the head object (I don't know why, but it works). Create a "BASE" element, set its href attribute, and finally add it to the head using appendChild.
Is it valid to put an h2 tag inside a span tag, given that the span is displayed as block?
Would it make a difference for search engines (SEO) if I used a div instead?
Sample input:
<!DOCTYPE HTML>
<html>
<head><title></title></head>
<body>
<span style="display: block">
<h2>A</h2>
</span>
</body>
</html>
And results from W3C validator:
Element h2 not allowed as child of element span in this context.
No, you can't. According to the HTML 4.01/XHTML 1.0 DTD, you can include only inline elements in a span tag. They are the following:
a, object, applet, img, map, iframe, br, span, bdo, tt, i, b, u, s, strike, big, small, font, basefont, sub, sup, em, strong, dfn, code, q, samp, kbd, var, cite, abbr, acronym, input, select, textarea, label, button, ins, del, script.
Can't quickly check HTML 5, but don't think it's different here.
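In HTML4, it is not valid to put any block element inside of any inline element.
This changes in HTML5, where it is valid to put block-level elements inside of anchor tags. A span, however, still accepts only phrasing content, so an h2 inside a span remains invalid regardless of the CSS display value.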