HTML Agility Pack, create new line in HTML file - vb.net

Dim codice As String
Dim doc As New HtmlDocument
Dim coll As HtmlNodeCollection
Dim node As HtmlNode
Dim nuovo As HtmlNode
codice = "<li>� " + T_ClasNome.Text + "</li>"
doc.Load("classifica.html")
coll = doc.GetElementbyId("subnavi").SelectNodes("ul")
node = coll.Last
nuovo = HtmlNode.CreateNode(codice)
node.AppendChild(nuovo)
doc.Save("classifica.html")
This adds the line of HTML in "codice" at the specified position, but I've noticed that every time it writes to my HTML file it doesn't start a new line, so it writes:
**(1st item)**<li>� 3 Class</li>**(2nd item)**<li>� classificagioca.0tori.htm</li>
How can I start a new line in the HTML file so that it is more comfortable to read?

In C# you can try something like this.
var newLineNode = HtmlNode.CreateNode("\r\n");
var nuovo = HtmlNode.CreateNode(codice);
node.AppendChild(newLineNode);
node.AppendChild(nuovo);
node.AppendChild(newLineNode);
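Since the question is in VB.NET, here is a minimal sketch of the same idea in that language. It is a variation rather than a literal translation: it appends one line break after the new item, and it creates the break with doc.CreateTextNode. The module name AppendWithNewLine and the helper AppendItem are hypothetical, and the sketch assumes the page contains an element with the id "subnavi" that has at least one direct "ul" child (SelectNodes returns Nothing otherwise).

```vbnet
Imports System.Linq
Imports HtmlAgilityPack

Module AppendWithNewLine
    ' Hypothetical helper: appends itemHtml to the last <ul> under #subnavi
    ' and follows it with a line break so the next item starts on a new line.
    Sub AppendItem(path As String, itemHtml As String)
        Dim doc As New HtmlDocument()
        doc.Load(path)

        ' Assumes #subnavi exists and has at least one <ul> child.
        Dim node As HtmlNode = doc.GetElementbyId("subnavi").SelectNodes("ul").Last()

        node.AppendChild(HtmlNode.CreateNode(itemHtml))
        ' A text node holding only CR+LF; it changes the file layout,
        ' not how the browser renders the list.
        node.AppendChild(doc.CreateTextNode(vbCrLf))

        doc.Save(path)
    End Sub
End Module
```

It would be called as AppendItem("classifica.html", codice).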

Related

Remove Signature from Xml

I have a file like this <ufile.io/g8dy86x0>
I'd like to remove the signature.
I tried this code, but it doesn't work:
Dim doc As XmlDocument = New XmlDocument
doc.Load(_filename)
Dim rnode As XmlNode = doc.SelectSingleNode("Signature")
doc.OwnerDocument.DocumentElement.RemoveChild(rnode)
What should I try?
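Two things in the code above stand out. doc.OwnerDocument is always Nothing for an XmlDocument, so the last line throws before anything is removed. SelectSingleNode("Signature") also only matches a child of the document node that has no namespace. The sketch below is one possible approach, not a confirmed fix: the linked file could not be inspected, so it assumes the signature is a standard XML-DSig element in the namespace http://www.w3.org/2000/09/xmldsig#. The module name, the helper RemoveSignatures, and the "ds" prefix are hypothetical.

```vbnet
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml

Module RemoveSignatureSketch
    ' Hypothetical helper: removes every XML-DSig <Signature> element,
    ' wherever it sits in the document. Returns how many were removed.
    Function RemoveSignatures(doc As XmlDocument) As Integer
        Dim ns As New XmlNamespaceManager(doc.NameTable)
        ' Assumption: the file uses the standard XML-DSig namespace.
        ns.AddNamespace("ds", "http://www.w3.org/2000/09/xmldsig#")

        ' Copy the matches to a list first so removal does not disturb the query.
        Dim signatures As List(Of XmlNode) =
            doc.SelectNodes("//ds:Signature", ns).Cast(Of XmlNode)().ToList()

        For Each sig As XmlNode In signatures
            sig.ParentNode.RemoveChild(sig)
        Next
        Return signatures.Count
    End Function
End Module
```

After doc.Load(_filename), calling RemoveSignatures(doc) and then doc.Save(_filename) would write the file back without the signature. If the function returns 0, the namespace assumption does not hold for that file.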

Extract Images with text from PDF and Edit it using iTextSharp

I am trying to do the following in Windows Forms:
1) Read a PDF in Windows Forms
2) Get the images with text in them
3) Color / fill the image
4) Save everything to a new file
I have tried the approach from "Problem with PdfTextExtractor in itext!", but it didn't help.
Here is the code I've tried:
Public Shared Sub ExtractImagesFromPDF(sourcePdf As String, outputPath As String)
    'NOTE: This will only get the first image it finds per page.'
    Dim pdf As New PdfReader(sourcePdf)
    Dim raf As RandomAccessFileOrArray = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
    Try
        For pageNumber As Integer = 1 To pdf.NumberOfPages
            Dim pg As PdfDictionary = pdf.GetPageN(pageNumber)
            ' recursively search pages, forms and groups for images.'
            Dim obj As PdfObject = FindImageInPDFDictionary(pg)
            If obj IsNot Nothing Then
                Dim XrefIndex As Integer = Convert.ToInt32(DirectCast(obj, PRIndirectReference).Number.ToString(System.Globalization.CultureInfo.InvariantCulture))
                Dim pdfObj As PdfObject = pdf.GetPdfObject(XrefIndex)
                Dim pdfStrem As PdfStream = DirectCast(pdfObj, PdfStream)
                Dim bytes As Byte() = PdfReader.GetStreamBytesRaw(DirectCast(pdfStrem, PRStream))
                If (bytes IsNot Nothing) Then
                    Using memStream As New System.IO.MemoryStream(bytes)
                        memStream.Position = 0
                        Dim img As System.Drawing.Image = System.Drawing.Image.FromStream(memStream)
                        ' must save the file while stream is open.'
                        If Not Directory.Exists(outputPath) Then
                            Directory.CreateDirectory(outputPath)
                        End If
                        Dim path__1 As String = Path.Combine(outputPath, [String].Format("{0}.jpg", pageNumber))
                        Dim parms As New System.Drawing.Imaging.EncoderParameters(1)
                        parms.Param(0) = New System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0)
                        'Dim jpegEncoder As System.Drawing.Imaging.ImageCodecInfo = iTextSharp.text.Utilities.GetImageEncoder("JPEG")'
                        img.Save(path__1) 'jpegEncoder, parms'
                    End Using
                End If
            End If
        Next
    Catch
        Throw
    Finally
        pdf.Close()
        raf.Close()
    End Try
End Sub
Now, the actual purpose of this is to get something like this.
If this is the actual PDF, I have to check whether there are any items in that bin (by the text in that box).
If there are items, then I have to color it like below.
Can someone help me with this?
The PDF can be retrieved here.

Error "Data at the root level is invalid" after transformation, while doing a LoadXML

I am trying to do an XSLT transformation that converts one XML document to another, using the following lines of code. When I try to create an XmlDocument object from the transformed XML, I get this error:
Data at the root level is invalid. Line 1, position 1.
Dim outputXML As New XmlDocument
Dim stream As New MemoryStream
Dim writer As XmlTextWriter = New XmlTextWriter(stream, System.Text.UnicodeEncoding.UTF8)
Dim navigator As XPathNavigator = illustratePlusXML.CreateNavigator()
Dim transormer As XslCompiledTransform = New XslCompiledTransform()
transormer.Load(ConfigurationManager.AppSettings("XSLT_File_Path"))
transormer.Transform(navigator, Nothing, writer)
Dim output As String = System.Text.UnicodeEncoding.UTF8.GetString(stream.ToArray())
outputXML.LoadXml(output)
Return outputXML
I found a special character (a square box) in the output, and I presume this is causing the error. I attached a snapshot of the output XML. Can somebody please suggest a fix?
If you want to populate an XmlDocument as the result of an XSLT transformation then simply do:
Dim resultDoc As New XmlDocument()
Using xw As XmlWriter = resultDoc.CreateNavigator().AppendChild()
    Dim navigator As XPathNavigator = illustratePlusXML.CreateNavigator()
    Dim transormer As XslCompiledTransform = New XslCompiledTransform()
    transormer.Load(ConfigurationManager.AppSettings("XSLT_File_Path"))
    transormer.Transform(navigator, Nothing, xw)
    xw.Close()
End Using
There is no need to use a MemoryStream. If you really think you need to use a MemoryStream then make sure you reset its Position to 0 before calling the Load method.
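For the MemoryStream route, a minimal sketch could look like the following. A likely cause of the original error is that the UTF-8 byte order mark written to the stream survives GetString and becomes the first character of the string passed to LoadXml. Loading from the rewound stream avoids the string round trip, because XmlDocument.Load detects the encoding itself. The module name and the helper TransformToDocument are hypothetical, and the transformer is assumed to have its stylesheet loaded already.

```vbnet
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Module TransformViaStream
    ' Hypothetical helper: runs the transformation into a MemoryStream and
    ' loads the result from that stream instead of from a decoded string.
    Function TransformToDocument(source As XmlDocument,
                                 transformer As XslCompiledTransform) As XmlDocument
        Dim result As New XmlDocument()
        Using stream As New MemoryStream()
            transformer.Transform(source, Nothing, stream)
            ' Rewind before reading; otherwise Load starts at the end of the data.
            stream.Position = 0
            result.Load(stream)
        End Using
        Return result
    End Function
End Module
```

With the names from the question, the call would be TransformToDocument(illustratePlusXML, transormer).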

Inserting both XmlDocumentType and XmlDeclaration to XmlDocument

I'm having trouble building my XmlDocument. This is what I've tried:
Dim myDoc = New XmlDocument()
Dim docType As XmlDocumentType = myDoc.CreateDocumentType("DtdAttribute", Nothing, "DtdFile.dtd", Nothing)
myDoc.XmlResolver = Nothing
myDoc.AppendChild(docType)
Dim xmldecl As XmlDeclaration = myDoc.CreateXmlDeclaration("1.0", Encoding.GetEncoding("ISO-8859-15").BodyName, "yes")
Dim root As XmlElement = myDoc.CreateElement("RootElement")
myDoc.AppendChild(root)
myDoc.InsertBefore(xmldecl, root)
This results in the error: Cannot insert the node in the specified location. The line throwing this error is myDoc.InsertBefore(xmldecl, root)
I just cannot figure this out. In which order should I insert these elements? I've tried different orders, but I think I'm doing something totally wrong and this shouldn't even work in the first place :) How should this be done?
This works for me:
Dim myDoc As New XmlDocument()
Dim xmldecl As XmlDeclaration = myDoc.CreateXmlDeclaration("1.0", Encoding.GetEncoding("ISO-8859-15").BodyName, "yes")
myDoc.AppendChild(xmldecl)
Dim docType As XmlDocumentType = myDoc.CreateDocumentType("DtdAttribute", Nothing, "DtdFile.dtd", Nothing)
myDoc.XmlResolver = Nothing
myDoc.AppendChild(docType)
Dim root As XmlElement = myDoc.CreateElement("DtdAttribute")
myDoc.AppendChild(root)
Note that the root element name must be the same as the name parameter given to XmlDocument.CreateDocumentType.
You may find, however, that for building an XML document from scratch like this, it's easier to just use the XmlTextWriter:
Using writer As New XmlTextWriter("C:\Test.xml", Encoding.GetEncoding("ISO-8859-15"))
    writer.WriteStartDocument()
    writer.WriteDocType("DtdAttribute", Nothing, "DtdFile.dtd", Nothing)
    writer.WriteStartElement("DtdAttribute")
    writer.WriteEndElement()
    writer.WriteEndDocument()
End Using

Is there any easy method for splitting text in VB.NET?

Is there any easy method for splitting text in VB.NET (using a start and end string to grab what's in between)?
I do this all the time in JScript with the following:
<junk>
<blah>
<data>someData1</data>
<data>someData2</data>
<data>someData3</data>
</blah>
</junk>
var data = string.split('<data>')[1].split('</data>')[0];
This gives me "someData1", and changing the [1] index to [2] gives me "someData2". Very easy.
For some reason this seems to be very difficult to achieve in VB.NET.
Here is a chunk of the actual HTML I'm dealing with:
<...malformed html>
<div style='font-size:10pt;font-family:Times;color:#000000;position:absolute;top:2731.068;left:48'>Total</div>
<div style='font-size:10pt;font-family:Times;color:#000000;position:absolute;top:2731.068;left:346.2141'>18,072.59</div>
<div style='font-size:10pt;font-family:Times;color:#000000;position:absolute;top:2731.068;left:444.3433'>100.00%</div>
<div style='font-size:10pt;font-family:Times;color:#000000;position:absolute;top:2731.068;left:567.1293'>21,687.11</div>
<div style='font-size:10pt;font-family:Times;color:#000000;position:absolute;top:2731.068;left:666.3433'>100.00%</div>
<malformed html...>
I need to find the <div>Total</div> index then grab the data between the 1st and 3rd divs after that.
Dim e = XElement.Parse(str)
Dim a = e.XPathSelectElements("./blah").Elements().ToArray()
a(0).Value 'someData1
a(1).Value 'someData2
EDIT:
To parse HTML, try using the Html Agility Pack.
I got it working, although this is some of the worst code I've ever written...
Dim sr As StreamReader
sr = New StreamReader("C:\test.html")
Dim xactHTML As String = sr.ReadToEnd
Dim left As Integer = xactHTML.IndexOf("Total</div>")
Dim chunk1 As String = xactHTML.Substring(left + 12)
Dim right As Integer = chunk1.IndexOf("<div style='position")
Dim chunk2 As String = chunk1.Substring(0, right - 1)
Dim xHTML As String = "<xml>" & chunk2 & "</xml>"
Dim e = XElement.Parse(xHTML)
Dim a = e.Elements().ToArray()
Dim damageAmmount As String = a(2).Value()
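For comparison, the JScript one-liner from the question maps fairly directly onto String.Split with string delimiters. The following is a minimal sketch of that; the module name SplitBetween and the helper Between are hypothetical names chosen for illustration. It does plain text splitting and makes no attempt to parse the markup.

```vbnet
Imports System

Module SplitBetween
    ' Hypothetical helper: returns the text between the index-th occurrence
    ' (1-based) of startTag and the next endTag, or Nothing if there is none.
    Function Between(text As String, startTag As String,
                     endTag As String, index As Integer) As String
        Dim parts As String() = text.Split(New String() {startTag}, StringSplitOptions.None)
        If index < 1 OrElse index >= parts.Length Then Return Nothing
        Return parts(index).Split(New String() {endTag}, StringSplitOptions.None)(0)
    End Function

    Sub Main()
        Dim sample As String = "<blah><data>someData1</data><data>someData2</data></blah>"
        Console.WriteLine(Between(sample, "<data>", "</data>", 1)) ' someData1
        Console.WriteLine(Between(sample, "<data>", "</data>", 2)) ' someData2
    End Sub
End Module
```

The index plays the same role as [1] and [2] in the JScript version. Because it works on raw text, it also tolerates the malformed HTML that XElement.Parse would reject.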