VB.NET: How can I read a large XML file quickly?

I am trying to read this XML document.
An excerpt:
<datafile xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="wiitdb.xsd">
<WiiTDB version="20100217113738" games="2368"/>
<game name="Help Wanted: 50 Wacky Jobs (DEMO) (USA) (EN)">
<id>DHKE18</id>
<type/>
<region>NTSC-U</region>
<languages>EN</languages>
<locale lang="EN">
<title>Help Wanted: 50 Wacky Jobs (DEMO)</title>
<synopsis/>
</locale>
<developer>HUDSON SOFT CO., LTD.</developer>
<publisher>Hudson Entertainment, Inc.</publisher>
<date year="2009" month="" day=""/>
<genre>party</genre>
<rating type="ESRB" value="E10+">
<descriptor>comic mischief</descriptor>
<descriptor>mild cartoon violence</descriptor>
<descriptor>mild suggestive themes</descriptor>
</rating>
<wi-fi players="0"/>
<input players="2">
<control type="wiimote" required="true"/>
<control type="nunchuk" required="true"/>
</input>
<rom version="" name="Help Wanted: 50 Wacky Jobs (DEMO) (USA) (EN).iso" size="4699979776"/>
</game>
So far I have this:
Dim doc as XPathDocument
Dim nav as XPathNavigator
Dim iter as XPathNodeIterator
Dim lstNav As XPathNavigator
Dim iterNews As XPathNodeIterator
doc = New XPathDocument("wiitdb.xml")
nav = doc.CreateNavigator
iter = nav.Select("/WiiTDB/game") 'Your node name goes here
'Loop through the records in that node
While iter.MoveNext
'Get the data we need from the node
lstNav = iter.Current
iterNews = lstNav.SelectDescendants(XPathNodeType.Element, False)
'Loop through the child nodes
txtOutput.Text = txtOutput.Text & vbNewLine & iterNews.Current.Name & ": " & iterNews.Current.Value
End While
It just skips the "While iter.MoveNext" part of the code. I tried it with a simple XML file, and it works fine.

I think your XPath query is off. WiiTDB is a self-closing (empty) element, so the game elements are its siblings rather than its children. You need to look for /datafile/game or //game.
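To see why the original path matches nothing, here is a minimal sketch using Python's standard xml.etree.ElementTree, chosen only because it makes a compact runnable illustration; the same reasoning applies to XPathNavigator. The trimmed sample document is an assumption based on the excerpt in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for wiitdb.xml (assumed shape, based on the excerpt).
SAMPLE = """<datafile>
  <WiiTDB version="20100217113738" games="2368"/>
  <game name="Help Wanted: 50 Wacky Jobs (DEMO) (USA) (EN)">
    <id>DHKE18</id>
    <region>NTSC-U</region>
  </game>
</datafile>"""

root = ET.fromstring(SAMPLE)  # root is the <datafile> element

# Looking for game elements under WiiTDB: it is empty, so nothing matches.
under_wiitdb = root.findall("WiiTDB/game")

# Equivalent of /datafile/game: game elements are children of the root.
games = root.findall("game")

# Equivalent of //game: game elements at any depth.
games_anywhere = root.findall(".//game")

game_ids = [g.findtext("id") for g in games]
print(len(under_wiitdb), game_ids)
```

If the file only ever has game elements directly under the root, the explicit path is the more precise of the two queries.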

Use the System.Xml.Serialization namespace instead: create a dedicated, serializable class to hold the data you wish to load and define shared serialize / deserialize functions with strongly typed arguments to do the work for you.
As the structure of the new classes will closely follow that of your XML data, there should be no confusion as to which data is located where within a run time instance.
See my answer here for an idea of how to create a class from an example XML file.

Related

How to extract individual/child nodes from a KML file in VisualBasic?

I need to be able to extract individual nodes from this file into variables for further manipulation. I'm writing to the console to see what information is being pulled, but I am struggling to pull the name or description.
I can successfully print the entire file. I've tried getting individual nodes using placemark.<name>.Value and placemark.Element("name").Value, the second of which throws a NullReferenceException. Any ideas on how to be able to pull out the name and description in this instance?
Imports System.Xml
Imports System.Xml.Linq 'Visual Studio 2015 tells me this isn't needed
Imports System.Core 'Visual Studio 2015 tells me this isn't needed
Dim file As XDocument = XDocument.Load(filePath)
Dim placemarks As IEnumerable(Of XElement) = From test In file.Root.Elements()
For Each placemark As XElement In placemarks
Console.WriteLine(placemark) 'This works
Console.WriteLine(placemark.<name>.Value) 'This prints an empty line
Console.WriteLine(placemark.Element("description").Value) 'This throws a NullReferenceException
Next
This is the structure
<?xml version='1.0' encoding='UTF-8'?>
<kml xmlns='http://www.opengis.net/kml/2.2'>
<Document>
<name>Untitled layer</name>
<Placemark>
<name>Name 1</name>
<description>Description 1</description>
<ExtendedData>
<Data name='Test data one'>
<value>Test data 1</value>
</Data>
</ExtendedData>
<Point>
<coordinates>34725567547</coordinates>
</Point>
</Placemark>
<Placemark>
<name>Name 2</name>
<description>Description 2</description>
<ExtendedData>
<Data name='Test data two'>
<value>Test data 2</value>
</Data>
</ExtendedData>
<Point>
<coordinates>056795763767</coordinates>
</Point>
</Placemark>
If I have understood you correctly, you are trying to fetch the name and description inside each Placemark node. Since you are only querying Root.Elements(), your query returns just the direct children of the root node (here, the single Document element), not the Placemark elements nested inside it.
You need the Descendants of the Document node that are named Placemark, and then the name and description elements inside each one. Also, since the root node kml declares a default namespace, you need to qualify every element name with it.
Here is the code:
Dim ns As XNamespace = "http://www.opengis.net/kml/2.2"
Dim placeMarks = From test In file.Root.Element(ns + "Document")
.Descendants(ns + "Placemark") Select test
For Each pm In placeMarks
Console.WriteLine("Name: {0}", pm.Element(ns + "name").Value)
Console.WriteLine("Description: {0}", pm.Element(ns + "description").Value)
Console.WriteLine()
Next
With the sample file above, this prints the name and description of each Placemark.
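The effect of the default namespace can be reproduced outside .NET as well. The following is a small sketch in Python's standard xml.etree.ElementTree, not the code from the answer; the prefix "k" is an arbitrary local alias, and the KML is shortened from the question's sample.

```python
import xml.etree.ElementTree as ET

# Shortened version of the KML from the question.
KML = """<kml xmlns='http://www.opengis.net/kml/2.2'>
  <Document>
    <name>Untitled layer</name>
    <Placemark>
      <name>Name 1</name>
      <description>Description 1</description>
    </Placemark>
    <Placemark>
      <name>Name 2</name>
      <description>Description 2</description>
    </Placemark>
  </Document>
</kml>"""

# "k" is a local alias for the namespace URI; any prefix would do.
NS = {"k": "http://www.opengis.net/kml/2.2"}
root = ET.fromstring(KML)

# Unqualified lookup finds nothing: every element lives in the KML namespace.
unqualified = root.findall(".//Placemark")

# Qualified lookup finds both placemarks and their child elements.
placemarks = [
    (pm.findtext("k:name", namespaces=NS),
     pm.findtext("k:description", namespaces=NS))
    for pm in root.findall("k:Document/k:Placemark", NS)
]
print(unqualified, placemarks)
```

The empty result of the unqualified lookup corresponds to the empty line and the NullReferenceException seen in the question.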

What is the best way to parse GML in VB.Net

I'm looking for the best way to parse GML to return the spatial data. As example here's a GML file:
<?xml version="1.0" encoding="utf-8"?>
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:onecallgml="http://www.pelicancorp.com/onecallgml"
xsi:schemaLocation="http://www.pelicancorp.com/onecallgml http://www.pelicancorp.com/digsafe/onecallgml.xsd">
<gml:featureMember>
<onecallgml:OneCallReferral gml:id="digsite">
<onecallgml:LocationDetails>
<gml:surfaceProperty>
<gml:Polygon srsName="EPSG:2193">
<gml:exterior>
<gml:LinearRing>
<gml:posList>
1563229.00057526 5179234.72234694 1563576.83066077 5179352.36361939 1563694.22647617 5179123.23451613 1563294.42782719 5179000.13697214 1563229.00057526 5179234.72234694
</gml:posList>
</gml:LinearRing>
</gml:exterior>
</gml:Polygon>
</gml:surfaceProperty>
</onecallgml:LocationDetails>
</onecallgml:OneCallReferral>
</gml:featureMember>
</gml:FeatureCollection>
How do I iterate through each featureMember, and then its polygon(s) and then get the posList coordinates into an array?
When dealing with XML in VB.NET, I recommend using LINQ to XML. You will probably want to extract more information (e.g. something to tie back to the featureMember), but a simple example could be:
' You will need to import the XML namespace
Imports <xmlns:gml = "http://www.opengis.net/gml">
...
Dim xml As XElement = XElement.Parse(myGmlString) ' or some other method
Dim polys = (
From fm In xml...<gml:featureMember>
From poly In fm...<gml:Polygon>
Select New With {
.Name = poly.@srsName,
.Coords = (poly...<gml:posList>.Value.Trim() _
.Split({" "}, StringSplitOptions.RemoveEmptyEntries) _
.Select(Function(x) CDbl(x))).ToArray()
}
).ToList()
This will give you a List of anonymous types with the polygon name and the coordinates as an array of Double.
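If you want coordinate pairs rather than one flat array, the values can be grouped two at a time. Below is a rough sketch of that idea in Python's standard library, not a translation of the VB.NET above; it assumes each position in posList has exactly two values, and the function name polygon_rings is made up for the example.

```python
import xml.etree.ElementTree as ET

# Shortened GML based on the question's sample (fewer positions).
GML = """<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <gml:Polygon srsName="EPSG:2193">
      <gml:exterior><gml:LinearRing><gml:posList>
        1563229.00057526 5179234.72234694 1563576.83066077 5179352.36361939
        1563229.00057526 5179234.72234694
      </gml:posList></gml:LinearRing></gml:exterior>
    </gml:Polygon>
  </gml:featureMember>
</gml:FeatureCollection>"""

NS = {"gml": "http://www.opengis.net/gml"}


def polygon_rings(gml_text):
    """Return [(srsName, [(first, second), ...]), ...], one entry per gml:Polygon."""
    root = ET.fromstring(gml_text)
    result = []
    for member in root.findall("gml:featureMember", NS):
        for poly in member.iterfind(".//gml:Polygon", NS):
            pos_list = poly.find(".//gml:posList", NS)
            # split() with no argument splits on any whitespace, including newlines.
            numbers = [float(tok) for tok in pos_list.text.split()]
            # Assumption: two values per position.
            pairs = list(zip(numbers[0::2], numbers[1::2]))
            result.append((poly.get("srsName"), pairs))
    return result


rings = polygon_rings(GML)
print(rings[0][0], len(rings[0][1]))
```

A closed ring repeats its first position at the end, so the first and last pairs are equal.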

Read an XML node in namespace using Linq

How do I work out what the namespace declaration is for the Extension node?
I want to return all of the child nodes under: GPO->User->ExtensionData->Extension
<?xml version="1.0" encoding="utf-16"?>
<GPO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.microsoft.com/GroupPolicy/Settings">
<User>
<VersionDirectory>4</VersionDirectory>
<VersionSysvol>4</VersionSysvol>
<Enabled>true</Enabled>
<ExtensionData>
<Extension xmlns:q1="http://www.microsoft.com/GroupPolicy/Settings/Scripts" xsi:type="q1:Scripts">
<q1:Script>
<q1:Command>Logon.cmd</q1:Command>
<q1:Type>Logon</q1:Type>
<q1:Order>0</q1:Order>
<q1:RunOrder>PSNotConfigured</q1:RunOrder>
</q1:Script>
</Extension>
<Name>Scripts</Name>
</ExtensionData>
</User>
<LinksTo>
<SOMName>an interesting data value</SOMName>
<SOMPath>some data value</SOMPath>
<Enabled>true</Enabled>
<NoOverride>false</NoOverride>
</LinksTo>
</GPO>
This is my attempt:
Dim NS As XNamespace = "http://www.microsoft.com/GroupPolicy/Settings/Scripts"
Dim UserPolCount = XDoc.Descendants(NS + "Extension").First()
I get the following error: Sequence contains no elements
Also, the XML sample I have provided is only a small snippet, the ExtensionData->Extension nodes can be nested in different areas, so I was hoping to find the way of specifying the full path.
Thanks
The Extension Element is still under the root namespace of:
http://www.microsoft.com/GroupPolicy/Settings
Elements under Extension are under the Scripts namespace:
http://www.microsoft.com/GroupPolicy/Settings/Scripts
So you need:
Dim NS As XNamespace = "http://www.microsoft.com/GroupPolicy/Settings"
Dim NS1 As XNamespace = "http://www.microsoft.com/GroupPolicy/Settings/Scripts"
Dim UserPolCount = XDoc.Descendants(NS + "Extension").First()
Dim ScriptNode = UserPolCount.Elements(NS1 + "Script")
EDIT From the comments:
Dim extension =
XDoc
.Root
.Element(NS + "User")
.Element(NS + "ExtensionData")
.Element(NS + "Extension")
You are using the wrong namespace. You need to use http://www.microsoft.com/GroupPolicy/Settings as the namespace.
The reason is that only the children of Extension are in the Scripts namespace. You can easily see this: the children are all prefixed with q1, but the Extension tag itself is not. It is therefore in the default namespace, which is declared by the attribute xmlns="http://www.microsoft.com/GroupPolicy/Settings" on the root tag GPO.
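A quick way to check which namespace each element is in is to print the fully qualified tag names. Here is a small sketch in Python's standard xml.etree.ElementTree, offered only as an illustration of the namespace split; the prefix "gp" is an arbitrary alias and the XML is shortened from the question.

```python
import xml.etree.ElementTree as ET

# Shortened version of the GPO report from the question.
GPO = """<GPO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="http://www.microsoft.com/GroupPolicy/Settings">
  <User>
    <ExtensionData>
      <Extension xmlns:q1="http://www.microsoft.com/GroupPolicy/Settings/Scripts"
                 xsi:type="q1:Scripts">
        <q1:Script>
          <q1:Command>Logon.cmd</q1:Command>
          <q1:Type>Logon</q1:Type>
        </q1:Script>
      </Extension>
      <Name>Scripts</Name>
    </ExtensionData>
  </User>
</GPO>"""

NS = {
    "gp": "http://www.microsoft.com/GroupPolicy/Settings",          # default namespace
    "q1": "http://www.microsoft.com/GroupPolicy/Settings/Scripts",  # children of Extension
}
root = ET.fromstring(GPO)

# Looking for Extension in the Scripts namespace finds nothing.
wrong = root.find(".//q1:Extension", NS)

# The full path uses the default namespace all the way down to Extension.
extension = root.find("gp:User/gp:ExtensionData/gp:Extension", NS)

# Only the children switch to the Scripts namespace.
child_tags = [child.tag for child in extension]
commands = [c.text for c in extension.findall("q1:Script/q1:Command", NS)]
print(wrong, child_tags, commands)
```

The tag of each child is printed as {namespace}localname, which makes the split between the two namespaces visible at a glance.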

Adding a node to an existing XML file using inno setup

In my inno setup script there is a [code] section and I need to add some code to:
Open an xml file
then add a single node in a specific place
Save the file back to the hard drive
I need to be able to edit a file called config.xml in \documents\docotype
in the file there is some code like this:
<References>
<ArrayOfString xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<string>System.dll</string>
<string>System.Core.dll</string>
<string>System.Drawing.dll</string>
<string>System.Windows.Forms.dll</string>
<string>System.XML.dll</string>
</ArrayOfString>
</References>
I need it to look like this:
<References>
<ArrayOfString xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<string>System.dll</string>
<string>System.Core.dll</string>
<string>System.Drawing.dll</string>
<string>System.Windows.Forms.dll</string>
<string>System.XML.dll</string>
<string>C:\\bin\Custom\cutty109.dll</string>
</ArrayOfString>
</References>
So really I just need to add the following line into the file in the 'ArrayOfString' section
<string>C:\\bin\Custom\cutty109.dll</string>
I'm sure this must be possible but I have no clue how..
Thanks
Please refer to the CodeAutomation.iss example that ships with Inno Setup, and use this code in place of the original code under the 'Modify the XML document' section.
{ Modify the XML document }
NewNode := XMLDoc.createElement('string');
XMLDoc.setProperty('SelectionLanguage', 'XPath');
RootNode := XMLDoc.selectSingleNode('//References/ArrayOfString');
RootNode.appendChild (NewNode);
RootNode.lastChild.text :='C:\\bin\Custom\cutty109.dll';
{ Save the XML document }
I assume you really need some dynamic way to add to this config file. If not, then of course overwriting the old one is the simplest method.
To dynamically add sections to a config file, you have some options:
You can create your own command line utility (exe or script) that does the file manipulation and call that utility in the [Run] section of your install script. This could look something like this:
In the [Files] section, you'll have one line for your utility:
Source: "myUtil.exe"; DestDir: "{app}"
In the [Run] section, you'll have one line for each manipulation you need to do in your config, like this:
FileName: "{app}\myUtil.exe"; Parameters: "/addSection:"
OR
You can use Pascal scripting to manipulate your config file. You can write a Pascal function that uses CreateOleObject to call msxml.dll for the XML file manipulation. Then, in your [Files] section, you can use AfterInstall to call your Pascal function, like this:
Source: "myFileThatNeedsConfigManipulation.dll"; DestDir: ... ;
AfterInstall: MyPascalFunctionThatDoesTheManipulation
Try something like this:
Dim sXPath : sXPath = "/configuration/References/ArrayOfString"
Dim sAdd : sAdd = "C:\\bin\Custom\cutty109.dll"
Dim sElm : sElm = "string"
Dim sFSpec : sFSpec = resolvePath( "..\data\config.xml" )
Dim oXDoc : Set oXDoc = CreateObject( "Msxml2.DOMDocument" )
oXDoc.setProperty "SelectionLanguage", "XPath"
oXDoc.async = False
oXDoc.load sFSpec
If 0 = oXDoc.ParseError Then
    WScript.Echo sFSpec, "looks ok"
    Dim ndFnd : Set ndFnd = oXDoc.selectSingleNode( sXPath )
    If ndFnd Is Nothing Then
        WScript.Echo "|", sXPath, "| not found"
    Else
        WScript.Echo "found |" & ndFnd.tagName & "|"
        Dim ndNew : Set ndNew = oXDoc.createElement( sElm )
        ndNew.appendChild oXDoc.createTextNode( sAdd )
        ndFnd.appendChild ndNew
        WScript.Echo "After appending:"
        WScript.Echo oXDoc.xml
        oXDoc.Save Replace( sFSpec, ".xml", "-2.xml" )
    End If
Else
    WScript.Echo oXDoc.ParseError.Reason
End If
The steps:
create a Msxml2.DOMDocument
use XPath to find the node to change
create a new string element and append the text
append the new node to the found node
save the modified XML
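The same steps can be tried out away from the installer before wiring them into a script. This is a minimal sketch in Python's standard xml.etree.ElementTree rather than MSXML; the configuration root element is an assumption taken from the XPath in the script above, and add_reference is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Assumed shape of config.xml; the <configuration> root comes from the XPath above.
CONFIG = """<configuration>
  <References>
    <ArrayOfString>
      <string>System.dll</string>
      <string>System.XML.dll</string>
    </ArrayOfString>
  </References>
</configuration>"""


def add_reference(xml_text, dll_path):
    """Append a <string> element holding dll_path to References/ArrayOfString."""
    root = ET.fromstring(xml_text)
    # Find the node to change (path is relative to the root element).
    target = root.find("References/ArrayOfString")
    if target is None:
        raise LookupError("References/ArrayOfString not found")
    # Create the new element, set its text, and append it in one go.
    new_node = ET.SubElement(target, "string")
    new_node.text = dll_path
    # Return the modified XML; a real script would write it back to disk.
    return ET.tostring(root, encoding="unicode")


updated = add_reference(CONFIG, r"C:\\bin\Custom\cutty109.dll")
print(updated)
```

The raw string keeps the path exactly as written in the question, including the doubled backslash.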

VB.NET Serialization Missing dot right before new line serialization

I've been using XML serialization for a while, and today I realized something really odd. If I have a new line right after a "dot" (.), when I deserialize, I lose the dot. Has anyone ever had this happen to them? The following is my serialization code:
Serialize
Dim xmlSerializer As New System.Xml.Serialization.XmlSerializer(GetType(SilverWare.Licensing.Common.StoreLicense), New System.Type() {GetType(SilverWare.Licensing.Common.StationLicense)})
Dim gen As LicenseGenerator
If store Is Nothing Then
Throw New ArgumentNullException("store")
ElseIf store.StationLicenses Is Nothing Then
Throw New ArgumentNullException("store.StationLicenses")
ElseIf store.StationLicenses.Length = 0 Then
Throw New ArgumentOutOfRangeException("store.StationLicenses", "Must contain at least one element.")
End If
' Create a license generator for issuing new license keys.
gen = New LicenseGenerator(store)
' Generate store key.
store.LicenseKey = gen.GenerateLicenseKey
' Generate individual station keys.
For Each station In store.StationLicenses
station.LicenseKey = gen.GenerateLicenseKey(station)
Next
' Write license to file.
Using xFile As Xml.XmlWriter = Xml.XmlWriter.Create(licenseFile)
xmlSerializer.Serialize(xFile, store)
xFile.Close()
End Using
Deserialize
Dim xmlDeserializer As New System.Xml.Serialization.XmlSerializer(GetType(SilverWare.Licensing.Common.StoreLicense), New System.Type() {GetType(SilverWare.Licensing.Common.StationLicense)})
Dim result As SilverWare.Licensing.Common.StoreLicense
Using xFile As Xml.XmlReader = Xml.XmlReader.Create(licenseFile)
result = DirectCast(xmlDeserializer.Deserialize(xFile), SilverWare.Licensing.Common.StoreLicense)
xFile.Close()
End Using
Return result
The really funny part is that if I have a space after the dot, or remove the new line character, there are no problems. This only happens with a dot, which I find mind-boggling.
Here is a quick sample of my XML file that was created when I serialized:
<?xml version="1.0" encoding="utf-8" ?>
<StoreLicense xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...
<ReceiptAddress>98 N. Washington St.
Berkeley Springs West Virginia</ReceiptAddress>
<Name>Ambrae House at Berkeley Springs</Name>
<AliasName>Ambrae House</AliasName>
<Address1>98 N. Washington St.</Address1>
<Address2 />
...
</StoreLicense>
The line that is having the problem is the ReceiptAddress Node.
This post on MSDN seems to answer your question.
MSDN: Serialize String containing only whitespace such as a " " character
From that post, try this:
<XmlAttribute("xml:space")> _
Public SpacePreserve As [String] = "preserve"
This creates a root node like the following:
<DataImportBase xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" xml:space="preserve">
Since I was using someone else's DLL, I didn't even think that it would be modifying my data when we imported it. What was happening was that the other programmer had a regex that was looking for a dot before a new line. That was my issue, and my grief for 3 months.
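For anyone chasing a similar symptom, it helps to test the XML round trip and the post-processing separately. The sketch below does that in Python as an analogy only: it uses xml.etree.ElementTree, not XmlSerializer, and the regex is a hypothetical stand-in, since the actual pattern in the third-party DLL is not shown here.

```python
import re
import xml.etree.ElementTree as ET

address = "98 N. Washington St.\nBerkeley Springs West Virginia"

# Step 1: round trip the text through XML on its own.
elem = ET.Element("ReceiptAddress")
elem.text = address
round_tripped = ET.fromstring(ET.tostring(elem)).text

# Step 2: apply a hypothetical cleanup regex of the kind described,
# one that removes a dot sitting directly before a line break.
cleaned = re.sub(r"\.(?=\r?\n)", "", round_tripped)

print(repr(round_tripped))
print(repr(cleaned))
```

In this sketch the round trip leaves the text unchanged and only the regex step drops the dot; the dot in "N." survives because it is followed by a space, which matches the behaviour described in the question.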