Why does HtmlAgilityPack add some characters to my HTML? - vb.net

Here is my code:
Dim input = "<div><textarea>something</div></textarea>"
Dim doc As New HtmlAgilityPack.HtmlDocument
doc.OptionOutputAsXml = True
doc.LoadHtml(input)
Using writer As New StringWriter
    doc.Save(writer)
    Dim res = writer.ToString
End Using
and the value of 'res' is:
"<?xml version="1.0" encoding="windows-1255"?>
<div>
<textarea>
//<![CDATA[
something
//]]>//
</textarea>
</div>"
the result as html is: My textarea
How can I prevent this?

As I understand it, the reason is implied by this answer to "Set textarea value with HtmlAgilityPack":
A <textarea> element doesn't have a value attribute. Its content is its own text node:
<textarea>
Some content
</textarea>
To simulate the same thing safely, HAP has to enclose the content in a //<![CDATA[ section.
The source code for HAP has this comment for the relevant line(s):
// tags whose content may be anything
ElementsFlags.Add("textarea", HtmlElementFlag.CData);
So, you can't prevent it.

Related

How to replace some text with URL in vue?

I want to replace some text in a string with a link, but I don't know how to display it. So far, the code below displays the link markup as a plain string.
<span class="text">{{ $t(myText) }}</span>
myText() {
  var text = "You can click text.";
  var href = "<a href='https://www.google.com'>Click Here</a>";
  var replaced = text.replace("click", href);
  return replaced;
},
To elaborate on my comment: the handlebars/moustache syntax is used to insert plain text into your template. That means that any string that contains HTML will be inserted as-is without parsing it as DOM.
In order to insert HTML into your template, you will need to use the v-html directive, i.e.:
<span class="text" v-html="$t(myText)"></span>
However, note that this presents a security risk if you're allowing users to insert their own content into the element.

replacing attributes within an html image tag

I have 1000+ database entries that contain HTML image tags.
The problem is that 90% of the 'src' attributes are just placeholders. I need to replace all of those placeholders with the appropriate, real sources.
A typical database entry looks like this (the number of image tags varies from entry to entry):
<p>A monster rushes at you!</p>
Monster:<p><img id="d8fh4-gfkj3" src="(image_placeholder)" /></p>
<br />
Treasure: <p><img id="x23zo-115a9" src="(image_placeholder)" /></p>
Please select your action below:
</br />
Using the IDs in the image tags above, 'd8fh4-gfkj3' & 'x23zo-115a9', I can query another function to get the "real" sources for those images.
So I tried using HtmlAgilityPack and came up with this:
Dim doc As New HtmlDocument()
doc.LoadHtml(encounterText)
For Each imgTag As HtmlNode In doc.DocumentNode.SelectNodes("//img")
    'get the ID
    Dim imgId As HtmlAttribute = imgTag.Attributes("id")
    Dim imageId As String = imgId.Value
    'get the new/real path
    Dim newPath = getMediaPath(imageId)
    Dim imgSrc As HtmlAttribute = imgTag.Attributes("src")
    'check to see if the <img> tag "src" attribute has a placeholder
    If imgSrc.Value.Contains("(image_placeholder)") Then
        'replace old image src attribute with 'src=newPath'
    End If
Next
But I can't figure out how to actually replace the old value with the new value.
Is there a way to do this with the HtmlAgilityPack?
Thanks!
You should be able to just set the value for the attribute:
'check to see if the <img> tag "src" attribute has a placeholder
If imgSrc.Value.Contains("(image_placeholder)") Then
    'replace old image src attribute with 'src=newPath'
    imgSrc.Value = newPath
End If
After the replacement, you can get the updated HTML with:
doc.DocumentNode.OuterHtml
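Putting the question's loop and the answer's assignment together, a minimal sketch might look like the following. It assumes the HtmlAgilityPack package is referenced; ReplacePlaceholders and resolvePath are hypothetical names introduced here, with resolvePath standing in for the question's getMediaPath. The sketch also guards against SelectNodes returning Nothing, which it does when no node matches.

```vbnet
Imports System
Imports HtmlAgilityPack

Module PlaceholderReplacer
    ' Hypothetical helper: resolvePath stands in for the question's getMediaPath.
    Function ReplacePlaceholders(encounterText As String,
                                 resolvePath As Func(Of String, String)) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(encounterText)

        ' Only consider <img> tags that carry both an id and a src attribute.
        Dim imgTags = doc.DocumentNode.SelectNodes("//img[@id and @src]")

        ' SelectNodes returns Nothing when nothing matches; return the input untouched.
        If imgTags Is Nothing Then Return encounterText

        For Each imgTag As HtmlNode In imgTags
            Dim src As String = imgTag.GetAttributeValue("src", "")
            If src.Contains("(image_placeholder)") Then
                Dim imageId As String = imgTag.GetAttributeValue("id", "")
                ' Overwrite the placeholder with the real path.
                imgTag.SetAttributeValue("src", resolvePath(imageId))
            End If
        Next

        ' Serialize the modified document back to a string.
        Return doc.DocumentNode.OuterHtml
    End Function
End Module
```

It could be called as ReplacePlaceholders(encounterText, AddressOf getMediaPath), and the returned string written back to the database entry.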

vb.net get src links inside iframes with htmlagilitypack

I'm using HtmlAgilityPack and trying to get both wanted1 and wanted2.
The HTML code looks like this:
<div class='class1' id='id1'>
<iframe id="iframe1" src="wanted1"></iframe>
<iframe id="iframe" src="wanted2"></iframe>
</div>
But no luck. Can someone help me, please?
Here is a commented sample to get you started:
Dim htmlDoc As New HtmlAgilityPack.HtmlDocument
Dim html As String = <![CDATA[<div class='class1' id='id1'>
<iframe id="iframe1" src="wanted1"></iframe>
<iframe id="iframe" src="wanted2"></iframe>
</div>]]>.Value
'load the html string to the HtmlDocument we defined
htmlDoc.LoadHtml(html)
'using LINQ and some XPath you can target any node you want
' the //iframe[@src] XPath passed to the SelectNodes function means: select all iframe nodes that have a src attribute
Dim srcs = From iframeNode In htmlDoc.DocumentNode.SelectNodes("//iframe[@src]")
           Select iframeNode.Attributes("src").Value
'print all the src values you got
For Each src In srcs
    Console.WriteLine(src)
Next
Make sure you learn about XPath.
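As a small illustration of what a slightly more specific XPath can do, the sketch below limits the search to iframes that are direct children of the div with id 'id1'. It assumes the HtmlAgilityPack package is referenced and that the iframe tags are well formed; the module name IframeSources is made up for this example.

```vbnet
Imports System
Imports HtmlAgilityPack

Module IframeSources
    Sub Main()
        Dim html As String = "<div class='class1' id='id1'>" &
                             "<iframe id=""iframe1"" src=""wanted1""></iframe>" &
                             "<iframe id=""iframe"" src=""wanted2""></iframe>" &
                             "</div>"

        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)

        ' @id and @src address attributes; the single slash selects direct children only.
        Dim iframes = htmlDoc.DocumentNode.SelectNodes("//div[@id='id1']/iframe[@src]")

        ' SelectNodes returns Nothing when nothing matches, so guard before looping.
        If iframes IsNot Nothing Then
            For Each iframeNode As HtmlNode In iframes
                Console.WriteLine(iframeNode.GetAttributeValue("src", ""))
            Next
        End If
    End Sub
End Module
```

GetAttributeValue with a default is used here instead of Attributes("src").Value so that a missing attribute yields an empty string rather than a null reference.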

Write text inside a div element using WebBrowser in vb.net

I need to write text into a div element.
This is the HTML code:
<div id="t" class="message-tools__textarea js-scroller" contenteditable="true" data-placeholder="Write a message (Enter to send)">XXXXXX</div>
My question is: how do I write text like XXXXXX into it?
Add this:
<div id="t" runat="server" ...
And then in your codebehind:
t.InnerText = "XXXXXXX" ' or t.InnerHtml if you're adding HTML code
So, if the text to be replaced is fixed and has only 1 instance within the HTML code, it should be rather simple:
Dim OrigCode, ModifiedCode as string
OrigCode = GetGoogleCodeFromURL ' get the code
ModifiedCode = OrigCode.Replace("XXXXXXX","ZZZZZZ")
Dim MyHTML as string = "<head>...</head><body><h1>Hello World!</h1><p> </p>" & _
ModifiedCode & "<p> </p><p>That's it.</p>"
But usually things are a bit more complicated, so I'm not sure whether you have described your requirement precisely. Also, if the code is large and the operation is repeated, it might be better to break it into parts and handle only the relevant part, for performance reasons.

How to get elements inside of commented out code HtmlAgilityPack in VB.NET

Is there a way to use HtmlAgilityPack on HTML that is inside <!-- --> comment blocks? For example, how can I target the inner text of "//div[@class='theClass']" that is inside a block like this:
<!-- <div class="theClass">Hello I am <span class="theSpan">some text.</span> </div>-->
So that I get
Hello I am some text.
The reason I ask is that this kept returning NULL, because the divs are inside comments:
htmlnodes = htmldoc.DocumentNode.SelectNodes("//div[@class='theClass']")
Unfortunately, XPath treats the content of a comment node as plain text, which means you can't query that content the way you query ordinary nodes.
One possible way is to parse the comment node content as another HtmlDocument so you can query from it, for example :
'get desired comment node'
Dim htmlnode As HtmlNode = htmldoc.DocumentNode.SelectSingleNode("//comment()[contains(., 'theClass')]")
Dim comment As New HtmlDocument()
'remove the outer <!-- --> so we have clean content'
comment.LoadHtml(htmlnode.InnerHtml.Replace("<!--", "").Replace("-->", ""))
'here you can use common XPath query again'
Dim result As HtmlNode = comment.DocumentNode.SelectSingleNode("//div[@class='theClass']")
'following line will print "Hello I am some text."'
Console.WriteLine(result.InnerText)
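If a page contains several comment blocks, the same idea can be wrapped in a helper that re-parses every comment and collects the matches. This is only a sketch under the assumption that the HtmlAgilityPack package is referenced; SelectInsideComments is a hypothetical name, not part of the library.

```vbnet
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module CommentedHtml
    ' Hypothetical helper: runs an XPath query against the markup inside each comment.
    Function SelectInsideComments(doc As HtmlDocument, xpath As String) As List(Of HtmlNode)
        Dim found As New List(Of HtmlNode)()

        Dim comments = doc.DocumentNode.SelectNodes("//comment()")
        ' SelectNodes returns Nothing when the document has no comment nodes.
        If comments Is Nothing Then Return found

        For Each commentNode As HtmlNode In comments
            ' Strip the comment delimiters so only the inner markup remains.
            Dim inner As String = commentNode.InnerHtml.Replace("<!--", "").Replace("-->", "")

            ' Parse the inner markup as its own document and query it normally.
            Dim fragment As New HtmlDocument()
            fragment.LoadHtml(inner)
            Dim matches = fragment.DocumentNode.SelectNodes(xpath)
            If matches IsNot Nothing Then found.AddRange(matches)
        Next

        Return found
    End Function
End Module
```

For the example above, SelectInsideComments(htmldoc, "//div[@class='theClass']") would be expected to return the commented-out div, whose InnerText can then be read as before.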