I have a text file that contains about 60 lines. I would like to parse all the text from that file and display it in a window. The file contains words that are separated by underscores. I would like to use a regular expression to solve this problem.
Update:
This is my code as of now. I am trying to read "filename" in my code.
Dim filename = "D:\databases.txt"
Dim regexpression As String = "/^[^_]*_([^_]*)\w/"
I know I don't have much done here anyway but I am trying to learn VB on my own and have gotten stuck here.
Please feel free to suggest what I should be doing instead.
Something like this:
TextBox1.Lines = IO.File.ReadAllLines("fileName")
To remove underscores:
TextBox1.Lines = IO.File.ReadAllLines("fileName").Select(Function(line) line.Replace("_", String.Empty)).ToArray()
If you also need other special characters removed, you can use Regex.Replace:
Remove special characters from a string
Also on MSDN:
How to: Strip Invalid Characters from a String
Or the old school way - loop through all characters, and filter only those you need:
Most efficient way to remove special characters from string
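As a rough illustration of the Regex.Replace suggestion above, here is a minimal sketch. The helper name RemoveSpecialCharacters and the sample input are made up for this example, and the pattern assumes that "special" means anything other than ASCII letters, digits, and whitespace; adjust the character class to your data.

```vb
Imports System.Text.RegularExpressions

Module StripCharacters
    ' Hypothetical helper: keeps ASCII letters, digits and whitespace,
    ' and removes every other character (including underscores).
    Function RemoveSpecialCharacters(input As String) As String
        Return Regex.Replace(input, "[^A-Za-z0-9\s]", String.Empty)
    End Function

    Sub Main()
        ' Made-up sample text, not taken from the question's file.
        Console.WriteLine(RemoveSpecialCharacters("alpha_beta-gamma!"))
    End Sub
End Module
```

For the sample input this prints alphabetagamma. To keep the words apart instead, you could replace "_" with a space before stripping the rest.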
Related
I currently have a text file that has a little over 500 lines of paths.
(e.g., N:\Fork\Cli\Scripts\ABC01.VB)
The file names vary in length (e.g., ABC01.VB, ABCDEF123.VB, etc.).
How can I use the Substring function to remove the path, the numbers, and the file extension, leaving just the letters?
For example, processing N:\Fork\Cli\Scripts\ABC01.VB, and returning ABC.
Or N:\Fork\Cli\Scripts\ZUBDK22039.VB and returning ZUBDK.
I've only been able to retrieve the first 3 letters using this code
Dim comp As String = sLine.Substring(28, 3)
sw.WriteLine(comp)
As Plutonix points out, the best way to isolate the file name from a path is with System.IO.Path.GetFileNameWithoutExtension.
You can extract just the letters (not digits or other characters) from a filename like this:
Dim myPath As String = "N:\Fork\Cli\Scripts\AB42Cde01.VB"
Dim filename As String = System.IO.Path.GetFileNameWithoutExtension(myPath)
Dim letters As String = filename.Where(Function(c) Char.IsLetter(c)).ToArray
The above code sets letters to ABCde.
The code relies on the fact that Strings are treated like arrays of characters. The Where method processes all the characters in the string (array) and selects only the ones that are letters (using the Char.IsLetter method). The selected characters are converted to an array (string) that is assigned to the letters variable.
I see from your latest comment that it is not possible for numerals to be mixed with the letters (as in my example). However, the code should still work in your case.
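To apply the same idea to every path in the file, something along these lines might work. This is only a sketch: the function name LettersOnly and the in-memory array of paths are assumptions for illustration (in practice the lines would come from the text file), and it assumes the code runs on Windows, where the backslash is treated as a directory separator.

```vb
Imports System.IO
Imports System.Linq

Module ExtractLetters
    ' Hypothetical helper: strips the directory and extension,
    ' then keeps only the letters of the remaining file name.
    Function LettersOnly(fullPath As String) As String
        Dim name As String = Path.GetFileNameWithoutExtension(fullPath)
        Return New String(name.Where(Function(c) Char.IsLetter(c)).ToArray())
    End Function

    Sub Main()
        ' The two example paths from the question; a real program
        ' would read them from the file, e.g. with File.ReadLines.
        Dim paths As String() = {
            "N:\Fork\Cli\Scripts\ABC01.VB",
            "N:\Fork\Cli\Scripts\ZUBDK22039.VB"
        }
        For Each p As String In paths
            Console.WriteLine(LettersOnly(p))
        Next
    End Sub
End Module
```

On Windows this should print ABC and ZUBDK, matching the results asked for in the question.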
I have a CSV file. When I use the Split function, my issue is that the 16th segment of the array has a name in it (in most cases) whose first and last names are separated by a comma. This obviously causes problems, as it puts my array out of sync. Any suggestions on how I can handle this?
The string in the 16th segment is surrounded by "" if that helps; the Split function still splits it, though.
You can use TextFieldParser, as indicated here.
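A minimal sketch of the TextFieldParser approach follows. The sample line is made up, and a StringReader is used only so the example is self-contained; for a real file you would pass the file path to the TextFieldParser constructor instead.

```vb
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module CsvSketch
    Sub Main()
        ' Made-up sample line with one quoted field that contains a comma.
        Dim sample As String = "1,2,""Smith, John"",4"

        Using parser As New TextFieldParser(New StringReader(sample))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' Treat commas inside double quotes as part of the field.
            parser.HasFieldsEnclosedInQuotes = True

            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                Console.WriteLine(String.Join(" | ", fields))
            End While
        End Using
    End Sub
End Module
```

For the sample line this prints 1 | 2 | Smith, John | 4, so the quoted name stays in a single field and the array stays in sync.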
I recommend Lumen CSV Library, it can correctly handle field values with commas.
Also it has a very good performance, and a very simple usage.
See the link above, it won't disappoint you.
I think you're missing the point. Split is only good for simple csv parsing. Anything that gets even a little complicated means a lot of extra code. Something like the TextFieldParser is better suited to what you want. However if you must use Split here's one way:
Dim TempArray() As String
Dim Output As New List(Of String)
If SourceString.Contains("""") Then
    TempArray = SourceString.Split(""""c)
    Output.AddRange(TempArray(0).Split(","c))
    Output.Add(TempArray(1))
    'If the quoted part of the csv line is at the end of the line omit this statement.
    Output.AddRange(TempArray(2).Split(","c))
Else
    Output = New List(Of String)(SourceString.Split(","c))
End If
This assumes that the data is strictly organized apart from the quotes. If it is not, you'll have to add validation code.
Split on "," (quote, comma, quote) instead of on just a comma. Don't forget to take care of the first and last quotes on the line.
This is a more general question. I am reading a document and saving its contents into a string variable. The resulting variable contains approximately 1 million characters (no cleansing). My code would then search the string, and extract key words. However, I am hung-up on an issue:
If I pass the string directly to a message box, it will show me the contents using Mid:
Messagebox.Show(Mid(searchString, startPos, endPos))
However, if I first pass the mid to a string variable, the contents are empty and the messagebox displays nothing:
Dim myString as String
myString = Mid(searchString, startPos, endPos)
Messagebox.Show(myString)
The same effect happens when I use .substring and when I use a stringbuilder.
Does anybody know why this is happening? I assume something is happening during assignment, but I am not sure what is lost?
Here is a snippet of code:
searchPos = textString.IndexOf(searchText, searchPos, StringComparison.OrdinalIgnoreCase)
MessageBox.Show(searchPos)
MessageBox.Show(Mid(textString, searchPos, 100))
So, the inconsistency is as such: the length of textString is around 3,700,000 characters. When I find the indexOf, the value returned in the first Messagebox is 455,225. However, if I try to pull out the characters using Mid, the second messagebox is blank.
Also, although it claims to be 3,700,000 characters, if I do a messagebox on textString, I am only shown around 6 characters of what appears to be XML. The file that is being searched is an old .ppt file, and I know I can just work-around it, but I am confused by how the computer can find the indexof my searchText correctly, but then cannot show me anything.
Thoughts?
I am currently reading in a text file in VB.Net using
Dim fileReader As String
fileReader = My.Computer.FileSystem.ReadAllText(file)
File contains several lines of text and when I read in the text file, it knows that they are on separate lines and prints them out accordingly.
However, when I try to split fileReader into an array of the different lines, the line break seems to stay there, even if I use Split(ControlChars.Cr) or Split(ControlChars.NewLine). It will successfully split it into the separate lines but when I display it, it will "push" the text down a line, like the line break is still there...
Does anyone have any ideas on what is going on and how I can remove these "invisible" control chars.
Text File:
Test1
Test2
Test3
Test4
fileReader:
Test1
Test2
Test3
Test4
lines() printout
Test1
Test2
Test3
Test4
Use Trim() on each line; it will remove extraneous whitespace.
The System.IO.File class has a ReadAllLines method that will actually give you back an array of strings, one per line.
If that method doesn't work either, I would examine exactly which bytes are causing you issues. In the watch window, you can call System.Text.Encoding.ASCII.GetBytes(sampleLine) and examine exactly what you are working with.
I'm assuming you are using ASCII encoding. If not, you'll need to swap out ASCII for the correct encoding, and then modify your file read to use that encoding as well.
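The byte inspection described above can also be done in code. The following is a small sketch with a made-up sampleLine that deliberately ends in a stray carriage return; BitConverter.ToString is used here only as a convenient way to print the bytes as hex.

```vb
Imports System.Text

Module InspectBytes
    Sub Main()
        ' Made-up line that still carries a trailing carriage return.
        Dim sampleLine As String = "Test1" & vbCr

        Dim bytes As Byte() = Encoding.ASCII.GetBytes(sampleLine)

        ' Prints the bytes as hex pairs separated by dashes.
        Console.WriteLine(BitConverter.ToString(bytes))
    End Sub
End Module
```

This prints 54-65-73-74-31-0D, where the trailing 0D is the leftover carriage return (a line feed would show up as 0A).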
As mentioned, use the ReadAllLines method to have it split automatically.
The problem you are having is that text files on a PC usually end each line with a carriage return followed by a line feed, so splitting on just one leaves the other behind. You can split and trim as mentioned, or use the other Split that splits on strings instead of chars.
Dim s() As String = Split(fileReader, vbCrLf)
Trim will remove spaces from the data as well; depending on your situation, that could be a problem for you.
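The string-based split can also be written with the String.Split overload that takes an array of string separators. The sketch below uses a made-up in-memory fileReader value in place of a real file, and the brackets in the output are there only to make any leftover control characters visible.

```vb
Module SplitLines
    Sub Main()
        ' Made-up content standing in for the text read from the file.
        Dim fileReader As String = "Test1" & vbCrLf & "Test2" & vbCrLf & "Test3"

        ' Split on the whole CR+LF pair so that neither character is left behind.
        Dim entries As String() = fileReader.Split(New String() {vbCrLf}, StringSplitOptions.None)

        For Each entry As String In entries
            Console.WriteLine("[" & entry & "]")
        Next
    End Sub
End Module
```

This prints [Test1], [Test2] and [Test3] on separate lines, with no stray line break inside the brackets. If the file might mix line-ending styles, adding vbLf and vbCr to the separator array after vbCrLf is one option.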
I ran into a similar problem recently. Trim() doesn't work because the extra lines are already there after doing the split (or using File.ReadAllLines). Here's what worked for me:
Dim allText As String = System.IO.File.ReadAllText(filePath)
allText = allText.Replace(Chr(13), "")
Dim lines As String() = allText.Split(ControlChars.Lf)
Chr(13) is the Control-M character (carriage return) that results in extra lines when using Split() or File.ReadAllLines.
I know you can put Unicode character codes in a VB.Net string like this:
str = Chr(&H0030) & "More text"
I would like to know how I can put the char code right into the string literal so I can use Unicode symbols from the designer view.
Is this even possible?
Use the ChrW() function to return Unicode characters.
Dim strW As String
strW = ChrW(&H25B2) & "More text"
The C# language supports this with escapes:
var str = "\u0030More text";
But that isn't available in VB.NET. Beware that you almost certainly don't want to use Chr(); it is meant for legacy code that works with the default code page. You'll want ChrW(), passing the Unicode code point.
Your specific example is not a problem: &H0030 is the code for "0", so you can simply put it directly in the string literal.
Dim str As String = "0More text"
You can use the Charmap.exe utility to copy and paste glyphs that don't have an easy ASCII code.
Replace Chr with Convert.ToChar:
str = Convert.ToChar(&H0030) & "More text"
To produce a Unicode character, you can use any of the following methods:
1. ChrW(n), where n is the number representing the Unicode character
2. Convert.ToChar(n)
3. Type the character directly in the editor using the Alt+N key combination
4. Copy and paste the Unicode character directly into the editor
5. Char.ConvertFromUtf32(n)
6. An XML string using the &#x....; syntax
Examples that assign the ♥ character:
s = ChrW(&H2665)
s = Convert.ToChar(&H2665)
s = "♥" 'typed with Alt+2665
s = "♥" 'copied and pasted from another location
s = Char.ConvertFromUtf32(&H2665)
s = <text>I &#x2665; you</text>.Value
BUT when the Unicode character is greater than 0xFFFF (C syntax is more readable 😉), only methods 4, 5 and 6 work:
ChrW() reports an error at build time
Convert.ToChar() crashes at runtime
Alt+N is refused because it accepts only 4 digits
Example
lblCharacter.Text = "This solution works 😉"
Debug.Print(Char.ConvertFromUtf32(&H1F600))
s = <text>diable: &#x1F608;</text>.Value
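The reason characters above 0xFFFF behave differently is that a .NET Char holds a single UTF-16 code unit, so such characters need two of them (a surrogate pair). The following sketch, with variable names chosen just for this example, makes that visible:

```vb
Module AstralCharacters
    Sub Main()
        ' U+2665 fits in a single UTF-16 code unit.
        Dim heart As String = ChrW(&H2665)

        ' U+1F600 is above &HFFFF, so it is stored as a surrogate pair.
        Dim smiley As String = Char.ConvertFromUtf32(&H1F600)

        Console.WriteLine(heart.Length)                     ' 1
        Console.WriteLine(smiley.Length)                    ' 2
        Console.WriteLine(Char.IsSurrogatePair(smiley, 0))  ' True
    End Sub
End Module
```

This is why Char.ConvertFromUtf32 returns a String rather than a Char, and why ChrW and Convert.ToChar cannot represent these characters.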
PS: a smiley (0x1F600) pasted directly into the Visual Studio code editor or Notepad++ loses its background color. Explanation: the smiley pasted in this answer is filled with orange, but in the Visual Studio editor or Notepad++ this color disappears.
To use string literals in the Visual Studio editor, you must use method 3 or 4.
In Form (Design mode)
In Properties (see Text property)
I was hoping you could use XML literals and XML escapes, but it doesn't work. I don't think XML literals allow you to use &#NN;. XML literals are, however, a way of including quotes (") inside strings.
'Does not compile :('
Dim myString = _
<q>This string would contain an escaped character if it actually compiled.</q>.Value
I use the Character Map utility (charmap.exe). Run and select the characters you want in the control's font, such as ©Missico™, copy then paste into the Text property in the property grid. You will have to change the font because the default font for a form is "Microsoft Sans Serif" which is not a Unicode font. I do not think you can use this method for non-printable characters.
Depending on your needs, you can also use Localization, which creates resource files for each language. Again, you would use charmap.exe to select and copy the characters needed and paste them into the resource file. You probably can use non-printable characters, such as tabs, newline, and so on, since this is just a text file (Unicode).
No, it's not possible since VB strings don't support escape sequences. Simply use ChrW, which is a few characters more to type, but also a bit cleaner.