XML cleanup - unmatched tags - vb.net

I am trying to format XML entries so that I can read them with XmlTextReader without getting errors. I added a default header and footer for cases where I notice there is no opening or closing tag. I remove illegal characters and check for Unicode, but an entry always slips through and gives the error:
Data at the root level is invalid
When I check, that entry either slipped through the cleaning process or has an unmatched tag somewhere. Currently I use
Dim stringSplitter() As String = {"</entry>"}
' split the file content based on the closing entry tag
sampleResults = _html.Split(stringSplitter, StringSplitOptions.RemoveEmptyEntries)
to split my XML into individual entries before I start the cleanup process. Here are my default header and footer:
Private defaultNameSpace As String = "xmlns=""http://www.w3.org/2005/Atom"""
Private header As String = "<?xml version=""1.0"" encoding=""utf-8""?>" & vbNewLine & "<entry " & defaultNameSpace & ">"
Private footer As String = "</entry>"
Is there any tool in the .NET Framework that can detect and clean up unmatched tags so that I can get this to work?
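As a minimal sketch of the detection half only (in Python, assuming the input is a run of `<entry>` elements with no enclosing `<feed>`), the split-then-check step could look like this. `split_entries` and `find_bad_entries` are hypothetical names, and Python's `xml.etree.ElementTree` parser stands in for XmlTextReader.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CLOSING_TAG = "</entry>"


def split_entries(raw: str) -> List[str]:
    """Split on the closing tag (as the question's String.Split call does),
    drop empty pieces, and put back the closing tag that the split removes."""
    pieces = [piece.strip() for piece in raw.split(CLOSING_TAG)]
    return [piece + CLOSING_TAG for piece in pieces if piece]


def find_bad_entries(raw: str) -> List[Tuple[int, str]]:
    """Return (index, parser message) for every fragment that is not well-formed XML."""
    bad = []
    for index, fragment in enumerate(split_entries(raw)):
        try:
            ET.fromstring(fragment.encode("utf-8"))
        except ET.ParseError as err:
            # Raised for unmatched tags, text outside the root element,
            # and other well-formedness errors
            bad.append((index, str(err)))
    return bad
```

This only reports which fragments are broken; it does not repair them, since an unmatched tag usually has more than one plausible fix.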

I think you are looking in the wrong direction for a solution :)
I think what you need is to check out the IXmlSerializable interface.
Check out this article:
Proper way to implement IXmlSerializable?
My approach would be to create an entry class, make it serializable, and read the entries via the serializer.
Then create another serializable class called CleanedEntry, and pass the entry object to its constructor.
If the input never contains any errors, you should be able to make this work quite easily.
(Of course, this depends a bit on what the source looks like and what you want to do with it.)
Please give an example of the expected input/output if my answer seems hazy, and I will try to elaborate on it (if I have the time ;) ).
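A rough Python analogue of that approach, assuming only `title` and `id` matter: `Entry.from_xml` and `CleanedEntry.to_xml` play the parts of IXmlSerializable's ReadXml and WriteXml. The class names follow the answer; the fields and the whitespace-trimming "cleanup" are placeholders, not part of the original suggestion.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

ATOM = "{http://www.w3.org/2005/Atom}"  # ElementTree's {namespace}tag notation


@dataclass
class Entry:
    """Hypothetical entry type; only two Atom fields are mapped for illustration."""
    title: str
    entry_id: str

    @classmethod
    def from_xml(cls, fragment: str) -> "Entry":
        # Plays the role of ReadXml: pull known child elements out of one <entry>
        root = ET.fromstring(fragment)
        return cls(
            title=root.findtext(ATOM + "title", default=""),
            entry_id=root.findtext(ATOM + "id", default=""),
        )


class CleanedEntry:
    """Built from an Entry; here the placeholder 'cleaning' just trims whitespace."""

    def __init__(self, entry: Entry) -> None:
        self.title = entry.title.strip()
        self.entry_id = entry.entry_id.strip()

    def to_xml(self) -> str:
        # Plays the role of WriteXml: ElementTree always emits matched tags
        root = ET.Element(ATOM + "entry")
        ET.SubElement(root, ATOM + "title").text = self.title
        ET.SubElement(root, ATOM + "id").text = self.entry_id
        return ET.tostring(root, encoding="unicode")
```

Because the output is built by ElementTree rather than by string concatenation, every tag it writes is matched.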

Related

Normalize string from HtmlAgilityPack document

I'm trying to get a web page using VB.NET and HtmlAgilityPack with this code:
Dim mWPage As New HtmlAgilityPack.HtmlDocument
Dim wC As New WebClient()
mWPage.Load(wC.OpenRead(mUrl))
My problem is getting text from a table: when I extract InnerText, I get something like this:
Modificat<!--span-->i dati
instead of (note that I typed the same string, and below it is displayed correctly):
Modificati dati
I've tried to use the answer here, but it doesn't work in this case (or I wasn't able to make it work).
I noticed that the content changes when I change the "User-Agent", so I tried various "User-Agent" values, but I never got a perfect text.
So my questions are:
Can I use the code from that answer to solve the problem?
If not, can I get a perfect text by using the right "User-Agent"?
If so, how can I find the right "User-Agent"?
If not, how can I fix the received string?
The server's response to a given User-Agent depends entirely on the server, so we can't predict which one will yield the response you're looking for.
However, you can use the HttpUtility.HtmlDecode method to decode the HTML-encoded text and turn it into the string you're looking for.
To filter out the HTML comment, you may need to change the XPath you're using. If you append //text(), you should get only the text nodes that match the rest of your expression.
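Both suggestions can be sketched with Python's standard library as a stand-in (this is not HtmlAgilityPack itself): `html.parser.HTMLParser` with `convert_charrefs=True` decodes character references, similar to what HttpUtility.HtmlDecode does, and collecting only the text passed to `handle_data` skips comments, much like selecting `//text()`. The `TextOnly` and `visible_text` names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TextOnly(HTMLParser):
    """Collects text nodes and ignores comments, similar in spirit to //text()."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &egrave; etc. inside text
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

    # handle_comment is inherited and does nothing, so <!--span--> is dropped


def visible_text(markup: str) -> str:
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any text still buffered
    return "".join(parser.parts)
```

For example, `visible_text("Modificat<!--span-->i dati")` joins the two text pieces on either side of the comment.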

How to set "always open by this program"

I want my program to ask the user "Do you want to always open .mp3 files with this program by default?" (the first time only). Is there an example of how to do this?
First, you will need to familiarize yourself with the Windows Registry.
Associations between programs and extensions are handled inside the HKEY_CLASSES_ROOT key.
Each extension appears as a sub-key.
Each extension key's default value names the associated key (the ProgID) that handles most of the currently supported operations for that particular file type.
For example, you might find that the .mp3 key's default value is set to "WMP11.AssocFile.MP3", or perhaps to "VLC.mp3" if you have installed VLC and configured it as the default MP3 player.
So, now you need to locate that key, again, inside HKEY_CLASSES_ROOT.
Although this may vary, you should find that "VLC.mp3" (or whatever key was associated with the .mp3 extension) has a sub-key called "shell".
Under "shell" you will find another sub-key called "Open".
And finally, under "Open" you will find another sub-key called "Command".
The "Command" key is the one containing the information used by Windows (and other programs) to open/start whatever application is currently associated with the ".mp3" (or any other) extension.
Once you understand and feel comfortable with the way associations are handled in the Registry, you can use the Registry class from .NET's Microsoft.Win32 namespace to navigate and query the required keys and their values.
Here's a very basic illustration of what the code would look like:
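Imports Microsoft.Win32
' HKEY_CLASSES_ROOT\.mp3 holds the ProgID (e.g. "VLC.mp3") as its default value
Dim mp3 As RegistryKey = Registry.ClassesRoot.OpenSubKey(".mp3")
Dim associatedValue As String = CStr(mp3.GetValue(""))
' The ProgID's shell\open\command default value is the command line used to open the file
Dim associatedKey As RegistryKey = Registry.ClassesRoot.OpenSubKey(associatedValue)
Dim defaultProgram As String = CStr(associatedKey.OpenSubKey("Shell\Open\Command").GetValue(""))
MsgBox("MP3 Files Are Opened Using: " & vbCrLf & defaultProgram)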
Hope this helps...
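The lookup chain described above (extension, then ProgID, then shell\open\command) can be sketched in Python. This is a translation under assumptions: Python's Windows-only `winreg` module replaces Microsoft.Win32.Registry, and the lookup takes a `read_default` callable (a hypothetical design choice) so the chain can be exercised with a plain dict off Windows. It only reads the current association; setting one is a separate step.

```python
from typing import Callable, Optional

Reader = Callable[[str], Optional[str]]


def resolve_open_command(extension: str, read_default: Reader) -> Optional[str]:
    """Follow .ext -> ProgID -> ProgID\\shell\\open\\command and return its default value."""
    prog_id = read_default(extension)
    if not prog_id:
        return None  # no association registered for this extension
    return read_default(prog_id + r"\shell\open\command")


def windows_reader(path: str) -> Optional[str]:
    """Read the unnamed (default) value of HKEY_CLASSES_ROOT\\<path>; Windows only."""
    import winreg  # imported lazily so the module still loads on other platforms

    try:
        return winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, path)
    except OSError:
        return None  # key does not exist
```

On Windows, `resolve_open_command(".mp3", windows_reader)` would return the same command line the VB.NET snippet shows in its message box.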
You need to set file associations. See this article on CodeProject about setting file associations in VB.NET.
An error shows up again after adding the import and declaring it like this:
Dim rgText As Registry.ClassesRoot.OpenSubKey(".txt")
The error is:
Type 'Registry.ClassesRoot.OpenSubKey' is not defined.

To generate a random, single-use URL

I've published a different take on a login system on CodeProject ( http://www.codeproject.com/KB/aspnet/mlogin.aspx ), and since I've got some free time, I thought I'd have a look at password recovery/reset.
It was suggested on the article that I look into sending the account owner a single-use, random URL where they can reset their password if the account gets locked because of too many invalid login attempts or a forgotten password.
Can anyone provide some guidance to help me do this?
So far, I'm thinking I just need to generate a random string, store it in a "recovery" field in the user's row of the database table, check whether the requested URL matches the value of that field, and then render the page dynamically on the server side.
Am I thinking on the right track here, or way off the mark?
Thanks in advance!
You're on the right track. Rather than a random string, a GUID is sufficient (a uniqueidentifier field in SQL Server). Use the "D" format so the URL doesn't contain curly braces:
MyUser.RecoveryKey = Guid.NewGuid()
Dim EmailBody As String = "http://blah/recoverpass.aspx?key=" & _
MyUser.RecoveryKey.ToString("D")
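A minimal Python sketch of the single-use part, assuming an in-memory dict in place of the "recovery" database column: `uuid.uuid4()` plays the role of Guid.NewGuid(), and `str()` of a UUID gives the same hyphenated, brace-free form as ToString("D"). `RecoveryStore`, `issue` and `redeem` are hypothetical names; if you prefer a longer token than a GUID, `secrets.token_urlsafe()` is another standard-library option.

```python
import uuid
from typing import Dict, Optional


class RecoveryStore:
    """In-memory stand-in for the hypothetical 'recovery' column on the user table."""

    def __init__(self) -> None:
        self._key_to_user: Dict[str, str] = {}

    def issue(self, username: str, base_url: str) -> str:
        key = str(uuid.uuid4())  # hyphenated, no braces, like Guid.ToString("D")
        self._key_to_user[key] = username
        return f"{base_url}?key={key}"

    def redeem(self, key: str) -> Optional[str]:
        # pop() both looks the key up and deletes it, so each link works only once
        return self._key_to_user.pop(key, None)
```

Deleting the key at redemption time is what makes the link single-use; with a real table, the equivalent is clearing the field when the reset page accepts the key.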

MySQL: How can I remove trailing HTML from a field in the database?

I want to remove some rogue HTML from a DB field that is supposed to contain a simple filename. Example of a valid field:
myfile.pdf
Example of an invalid field:
myfile2.pdf<input type="hidden" id="gwProxy" />...
Does anyone know a query I can run that removes the HTML part but leaves the filename, i.e. removes everything from the first < character onwards?
Let's assume the field is called myattachment and is defined as varchar(250), and the table is called mytable in a MySQL database.
Background info (not necessary to read):
The field in our database is supposed to contain filenames; however, due to an issue (documented here), some of the fields now contain a filename plus some rogue HTML. We have fixed the root issue and now need to fix the corrupt fields. In the past I have replaced text using this kind of query:
UPDATE mytable SET myattachment = replace(myattachment, 'JPG', 'jpg') WHERE myattachment LIKE '%JPG';
This query seems to work OK; can anyone see any issues with it?
UPDATE mytable
SET myattachment = SUBSTRING_INDEX(myattachment, '<', 1)
WHERE `myattachment` LIKE '%<%';
For docs on SUBSTRING_INDEX, see the MySQL manual page.
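To see what the UPDATE does to the two sample values, here is a Python sketch against an in-memory SQLite database. SQLite has no SUBSTRING_INDEX, so this swaps in `substr(..., 1, instr(...) - 1)`, which keeps the text before the first `<`; it illustrates the same logic and is not the MySQL query itself. The WHERE clause matters here: without it, rows with no `<` would be cut to an empty string by this substr expression (MySQL's SUBSTRING_INDEX would instead return them unchanged). `strip_trailing_html` is a hypothetical helper name.

```python
import sqlite3
from typing import List


def strip_trailing_html(values: List[str]) -> List[str]:
    """Load values into a throwaway mytable, run the cleanup UPDATE, return the results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, myattachment VARCHAR(250))")
    conn.executemany("INSERT INTO mytable (myattachment) VALUES (?)", [(v,) for v in values])
    # SQLite stand-in for SUBSTRING_INDEX(myattachment, '<', 1)
    conn.execute(
        "UPDATE mytable "
        "SET myattachment = substr(myattachment, 1, instr(myattachment, '<') - 1) "
        "WHERE myattachment LIKE '%<%'"
    )
    rows = conn.execute("SELECT myattachment FROM mytable ORDER BY id").fetchall()
    conn.close()
    return [row[0] for row in rows]
```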

Create files using list of filenames and add content to each

I need to make a bunch of redirect pages, as I've recently updated my website, which previously used .html files; now all the files are .aspx. I have a tab-delimited file containing a list of the original filenames and the corresponding new filenames.
It seems like there should be a language out there that lets me create a file using the first column as the filename and insert the second column as its content, along with some additional text for the 301 redirect.
Could someone point me in the right direction as to which language(s) could accomplish this? Also, could you point out the name of the method/function I would use, so I know where to begin when creating the file?
I've needed to do this type of thing many times and am willing to learn a new language (Perl, Python, or whatever) to accomplish it, but I just need to be pointed in the right direction. I am developing on Windows XP.
Thank you for your time.
This can be done in a few lines of C#. Since you are already working with ASPX, you can process this in the code-behind of a dummy page.
using (System.IO.StreamReader myreader = new System.IO.StreamReader(Server.MapPath("~/Text.txt")))
{
    while (!myreader.EndOfStream)
    {
        // Split on tab; risky if the second column might contain tabs, but the data can be rebuilt when inputString.Length > 2
        string[] inputString = myreader.ReadLine().Split('\t');
        if (inputString.Length < 2) continue; // skip blank or malformed lines
        // Construct the path where you want to save the file (if the first column is already a full path, all the better)
        using (System.IO.StreamWriter filemaker = new System.IO.StreamWriter(@"C:\" + inputString[0]))
        {
            filemaker.Write(inputString[1]);
        }
    }
}
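Since the question mentions Python, here is a hedged Python sketch of the same loop using the standard `csv` and `pathlib` modules. `REDIRECT_TEMPLATE` and `make_redirect_pages` are hypothetical; the template uses a meta refresh only as a placeholder, because a static .html file cannot set its own HTTP status, so a true 301 has to come from the web server's configuration.

```python
import csv
from pathlib import Path
from typing import List

# Hypothetical page body: swap in whatever redirect text your setup actually needs
REDIRECT_TEMPLATE = '<html><head><meta http-equiv="refresh" content="0; url={target}"></head><body></body></html>\n'


def make_redirect_pages(mapping_file: str, output_dir: str) -> List[Path]:
    """Create one file per 'old<TAB>new' line, named after the first column."""
    out = Path(output_dir)
    created = []
    with open(mapping_file, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text and split only on tabs
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            old_name, new_name = row[0].strip(), row[1].strip()
            page = out / old_name
            page.parent.mkdir(parents=True, exist_ok=True)
            page.write_text(REDIRECT_TEMPLATE.format(target=new_name), encoding="utf-8")
            created.append(page)
    return created
```

`csv.reader` with `delimiter="\t"` plays the role of the C# `Split('\t')`, and `Path.write_text` replaces the StreamWriter.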