Cutting up a CSV file using split - vb.net

I have a CSV file. When I use the Split function, the 16th element of the array contains (in most cases) a name whose first and last names are separated by a comma. This puts my array out of sync. Any suggestions on how I can handle this?
The string in the 16th element is surrounded by double quotes, if that helps, but the Split function still splits it.

You can use TextFieldParser (in the Microsoft.VisualBasic.FileIO namespace), as indicated here.
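For illustration, here is a minimal sketch of reading one quoted line with TextFieldParser. The sample line and the module name are made up; a real program would pass the path of the CSV file to the constructor instead of a StringReader.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module CsvParserDemo
    Sub Main()
        ' Hypothetical sample line: the second field contains a comma inside quotes.
        Dim sampleLine As String = "1001,""Smith, John"",Sales"

        Using parser As New TextFieldParser(New StringReader(sampleLine))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' Treat a quoted value as a single field even if it contains the delimiter.
            parser.HasFieldsEnclosedInQuotes = True

            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                Console.WriteLine(fields.Length) ' 3
                Console.WriteLine(fields(1))     ' Smith, John
            End While
        End Using
    End Sub
End Module
```

The quotes are stripped from the returned field, so the array stays in sync with the column layout.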

I recommend Lumen CSV Library; it correctly handles field values that contain commas.
It also performs very well and is simple to use.
See the link above; it won't disappoint you.

I think you're missing the point. Split is only good for simple CSV parsing. Anything that gets even a little complicated means a lot of extra code. Something like the TextFieldParser is better suited to what you want. However, if you must use Split, here's one way:
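Dim TempArray() As String
Dim Output As New List(Of String)
If SourceString.Contains("""") Then
    ' Splitting on the quote gives: text before, quoted field, text after.
    TempArray = SourceString.Split(""""c)
    ' The text before the quoted field ends with a delimiter comma; drop it before splitting.
    If TempArray(0).Length > 0 Then
        Output.AddRange(TempArray(0).Substring(0, TempArray(0).Length - 1).Split(","c))
    End If
    Output.Add(TempArray(1))
    ' The text after the quoted field starts with a delimiter comma; skip it.
    ' It is empty when the quoted field is the last one on the line.
    If TempArray(2).Length > 0 Then
        Output.AddRange(TempArray(2).Substring(1).Split(","c))
    End If
Else
    Output = New List(Of String)(SourceString.Split(","c))
End If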
This assumes that the data is strictly organized: at most one quoted field per line and no quotes anywhere else. If that does not hold, you'll have to add validation code.

Split on the three-character sequence "," (quote, comma, quote) instead of on a bare comma. This only works when every field is quoted. Don't forget to take care of the first and last quotes on the line.
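A rough sketch of that idea, assuming every field on the line is quoted (the sample line is invented for the example):

```vbnet
Imports System

Module QuotedSplitDemo
    Sub Main()
        ' Hypothetical line in which every field is quoted: "a","Smith, John","c"
        Dim sampleLine As String = """a"",""Smith, John"",""c"""

        ' Strip the first and last quote, then split on the sequence ","
        Dim fields As String() = sampleLine.Trim(""""c).Split(New String() {""","""}, StringSplitOptions.None)

        For Each field As String In fields
            Console.WriteLine(field)
        Next
    End Sub
End Module
```

Note that Trim removes every leading and trailing quote, so a line that starts or ends with an empty quoted field would need extra handling.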

Related

How can I disable automatic string detection in VS2015?

I'm using VB.NET, and my code contains a lot of strings that very often have double quotes inside them. My problem is that while I'm escaping the double quotes (replacing every '"' with '""' inside the string), the editor messes with the following code: because the double quotes temporarily don't match up, it assumes everything after them is a string and completely messes up the formatting of other strings. It treats the start of a following string as the end of the current one, so the actual string is interpreted and formatted as code, which I then have to go back and fix (since it adds spaces and other formatting characters that shouldn't be there).
Is there any way to disable this behavior? I didn't have the same problem in VS2013. I've been looking under Tools > Options > Text Editor > Basic, but I couldn't find anything relevant.
Additional information: I can modify the strings in a separate text document to escape all of the double quotes (which is what I've resorted to for now), but in VS2013 I could just copy and paste the strings directly into my code without the editor messing up the following strings by temporarily interpreting them as code due to the uneven count of double quotes.
This behavior is especially problematic when manually adding double quotes within strings, because if you don't escape them quickly enough (or make a brief typo when doing so), you get the same issue.
You might notice that for other languages, such as C++, writing a string on one line (even with an uneven number of double quotes) does not affect the following lines. Having this same behavior for VB would be great, assuming that there's some setting to enable it.
Yes, it's an inconvenience.
What I usually do is put some unused character (e.g. an unused symbol on the keyboard, or Alt+{some number}) in place of the double quotes. When I'm done building the string the way I want, I finalize it either by bringing up the Find and Replace box and replacing that character with two double quotes, or by putting a Replace call immediately after the string that replaces the character with Chr(34).
Instead, use Chr(34), or if you end up repeating strings at all, store them as a resource.
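To make the two suggestions concrete, here is a small sketch. The "~" placeholder and the HTML snippet are arbitrary choices for the example; any character that never occurs in your strings would do.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module QuoteDemo
    Sub Main()
        ' Placeholder approach: "~" stands in for each double quote while typing,
        ' and Replace swaps in the real quote character at run time.
        Dim viaReplace As String = "<a href=~page.html~>".Replace("~", Chr(34))

        ' Chr(34) approach: concatenate the quote character explicitly.
        Dim viaChr As String = "<a href=" & Chr(34) & "page.html" & Chr(34) & ">"

        ' The normally escaped literal, for comparison.
        Dim viaEscape As String = "<a href=""page.html"">"

        Console.WriteLine(viaReplace = viaChr AndAlso viaChr = viaEscape) ' True
    End Sub
End Module
```

Neither variant ever leaves an unbalanced quote in the source line, which is what keeps the editor from reformatting the lines that follow.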

Count a specific word using a Function and a Sub

In VB.NET, I want to make a counting program using a Function and a Sub.
Form1 has a textbox for entering a date and a button to run the program.
I have a txt file that was exported from MS Excel, with sequential dates in column A.
From that txt file, I want to count the occurrences of a date (actually a string) such as "18-Jun-12".
The Sub should show the resulting count in a message box.
I really have no idea how to link a Function and a Sub using a variable, because I am just a beginner.
Any help will be gratefully accepted.
If the fields are delimited by commas, you must be careful, since a field itself could contain a comma; then you cannot differentiate between the value and the delimiter. You could enclose the fields in quotes to mask such commas, but then you should use an available CSV parser anyway.
If the values never contain a comma and you want a simple solution, use File.ReadLines or File.ReadAllLines to read the lines and String.Split to get all fields per line.
Here's a simple approach using a little bit of LINQ to count all lines which contain the searched date (as string):
Dim linesWithThatDate = From line In File.ReadLines("Path to File")
                        Where line.Split(","c)(0).Trim() = "18-Jun-12"
Dim count = linesWithThatDate.Count()
As an aside, if the user must enter a date you could use a DateTimePicker control instead. Then you should also use Date.Parse(line.Split(","c)(0).Trim()) or Date.TryParse to get a real date.
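Since the question asks how to link a Function and a Sub, here is one possible arrangement as a console sketch. The names CountDate and ShowCount and the temporary sample file are invented for the example; in the form, the Sub would be the button's click handler and would call MessageBox.Show (or MsgBox) instead of Console.WriteLine.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module DateCounter
    ' Function: returns how many lines have searchText as their first comma-separated field.
    Function CountDate(filePath As String, searchText As String) As Integer
        Return File.ReadLines(filePath).Count(Function(line) line.Split(","c)(0).Trim() = searchText)
    End Function

    ' Sub: calls the Function, stores the result in a variable and reports it.
    Sub ShowCount(filePath As String, searchText As String)
        Dim count As Integer = CountDate(filePath, searchText)
        Console.WriteLine(searchText & " occurs " & count & " time(s).")
    End Sub

    Sub Main()
        ' Hypothetical sample data written to a temporary file.
        Dim filePath As String = Path.GetTempFileName()
        File.WriteAllLines(filePath, New String() {"18-Jun-12,a", "19-Jun-12,b", "18-Jun-12,c"})
        ShowCount(filePath, "18-Jun-12") ' the sample data contains the date twice
        File.Delete(filePath)
    End Sub
End Module
```

The link between the two is simply the return value: the Function computes the number, and the Sub receives it in a local variable and decides how to display it.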

Read a text file and display result in a window

I have a text file which contains about 60 lines. I would like to parse out all the text from that file and display it in a window. The text file contains words that are separated by underscores. I would like to use a regular expression to solve this problem.
Update:
This is my code as of now. I am trying to read "filename" in my code.
Dim filename = "D:\databases.txt"
Dim regexpression As String = "/^[^_]*_([^_]*)\w/"
I know I don't have much done here anyway but I am trying to learn VB on my own and have gotten stuck here.
Please feel free to suggest what I should be doing instead.
Something like this:
TextBox1.Lines = IO.File.ReadAllLines("fileName")
To remove underscores:
TextBox1.Text = IO.File.ReadAllText("fileName").Replace("_", String.Empty)
If you also need other special characters removed, you can use Regex.Replace:
Remove special characters from a string
Also on MSDN:
How to: Strip Invalid Characters from a String
Or the old school way - loop through all characters, and filter only those you need:
Most efficient way to remove special characters from string
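As a rough illustration of the Regex.Replace route, here is a sketch that works on two made-up lines rather than the real file. The character class is an assumption about what counts as "special"; adjust it to your data.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CleanLinesDemo
    Sub Main()
        ' Hypothetical lines standing in for the contents of the text file.
        Dim lines As String() = {"customer_orders", "sales_2012_archive"}

        For Each line As String In lines
            ' Replace every run of characters that are not letters or digits with one space.
            ' Use String.Empty as the replacement to remove them entirely.
            Dim cleaned As String = Regex.Replace(line, "[^A-Za-z0-9]+", " ")
            Console.WriteLine(cleaned)
        Next
    End Sub
End Module
```

In the form, the cleaned lines could be collected into an array and assigned to TextBox1.Lines.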

Remove "Invisible" Control Characters in VB.Net

I am currently reading in a text file in VB.Net using
Dim fileReader As String
fileReader = My.Computer.FileSystem.ReadAllText(file)
File contains several lines of text and when I read in the text file, it knows that they are on separate lines and prints them out accordingly.
However, when I try to split fileReader into an array of the different lines, the line break seems to stay there, even if I use Split(ControlChars.Cr) or Split(ControlChars.NewLine). It will successfully split it into the separate lines but when I display it, it will "push" the text down a line, like the line break is still there...
Does anyone have any ideas on what is going on and how I can remove these "invisible" control chars.
Text File:
Test1
Test2
Test3
Test4
fileReader:
Test1
Test2
Test3
Test4
lines() printout
Test1
Test2
Test3
Test4
Use Trim() on each line; it will remove extraneous whitespace, including stray carriage returns and line feeds.
The System.IO.File class has a ReadAllLines method that will actually give you back an array of strings, one per line.
If that method doesn't work either, I would examine exactly which bytes are causing the issue. In the watch window, you can evaluate System.Text.Encoding.ASCII.GetBytes(sampleLine) and see exactly what you are working with.
I'm assuming you are using ASCII encoding, if not, you'll need to swap out ASCII with the correct option, and then modify your file read to read based on that encoding, as well.
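The same inspection can be done in code. This sketch assumes ASCII text and uses an invented sample line that still carries a trailing carriage return:

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module ByteDump
    Sub Main()
        ' Hypothetical line left over after splitting on the line feed only.
        Dim sampleLine As String = "Test1" & vbCr

        Dim bytes As Byte() = Encoding.ASCII.GetBytes(sampleLine)

        ' Prints the bytes in hex; the trailing 0D is the carriage return.
        Console.WriteLine(BitConverter.ToString(bytes)) ' 54-65-73-74-31-0D
    End Sub
End Module
```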
As mentioned, use the ReadAllLines method to have the text split automatically.
The problem you are having is that Windows text files usually end each line with a carriage return followed by a line feed; splitting on just one leaves the other behind. You can split and trim as mentioned, or use the other Split, which splits on strings instead of chars.
Dim s() As String = Split(fileReader, vbCrLf)
Trim will remove spaces from the data as well; depending on your situation, that could be a problem for you.
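Here is a short sketch of the difference, using a made-up three-line string. It uses the String.Split overload that takes an array of string separators, which is an alternative to the VB Split function shown above:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitLinesDemo
    Sub Main()
        ' Hypothetical file contents with Windows (CR+LF) line endings.
        Dim sample As String = "Test1" & vbCrLf & "Test2" & vbCrLf & "Test3"

        ' Splitting on the carriage return alone leaves the line feed at the start of the next line.
        Dim onCrOnly As String() = sample.Split(ControlChars.Cr)
        Console.WriteLine(onCrOnly(1).Length) ' 6: the line feed plus "Test2"

        ' Splitting on the full sequence (with lone LF and CR as fallbacks) leaves clean lines.
        Dim onStrings As String() = sample.Split(New String() {vbCrLf, vbLf, vbCr}, StringSplitOptions.None)
        Console.WriteLine(onStrings(1).Length) ' 5: just "Test2"
    End Sub
End Module
```

Listing vbCrLf first matters: it lets a CR+LF pair be consumed as one separator instead of producing an empty entry between the two characters.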
Ran into a similar problem recently. Trim() doesn't work because the extra lines are already there after doing the split (or using File.ReadAllLines). Here's what worked for me:
Dim allText As String = System.IO.File.ReadAllText(filePath)
allText = allText.Replace(Chr(13), "")
Dim lines As String() = allText.Split(vbLf)
Chr(13) is the carriage return (Control-M) character that results in the extra lines when using Split() or File.ReadAllLines.

Easy way to write a string with control characters to an nvarchar field in a DB?

EDIT: Accepted answer points out what my issue was. I added another answer which shows an even easier way.
I have a multiline textbox from which I create a single string:
Dim wholeThing As String
Dim line As String
For Each line In txtMultiline.Lines
    wholeThing = wholeThing & line & Environment.NewLine
Next
What I'd like to do is write this to a DB as-is by adding wholeThing as a parameter using SqlCommand.Parameters.AddWithValue, but all of my painstakingly inserted newlines go away (or appear to).
I saw this and was hoping I didn't need to concern myself with CHAR(nnn) SQL stuff.
Any suggestions? Thanks!
What do you mean when you say "appears to"? A newline character can be easily stored in an nvarchar field. What tools are you using to verify the results of your actions?
It's even easier than this. As per the discussion in the accepted answer, I wasn't seeing all of the lines in my view. Putting up a messagebox showed everything.
What's more, I don't need to take things line by line:
wholeThing = txtMultiline.Text
works, too, and it keeps all of the line breaks.
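As a sanity check, a sketch like the following shows that the parameter value still contains the line break after AddWithValue. The Notes table and Body column are hypothetical, and the command is never executed, so no connection is needed here.

```vbnet
Imports System
Imports System.Data.SqlClient

Module NewlineParameterDemo
    Sub Main()
        ' Stand-in for txtMultiline.Text: two lines separated by a line break.
        Dim wholeThing As String = "line one" & Environment.NewLine & "line two"

        ' Hypothetical table and column names; the command is only built, not run.
        Using cmd As New SqlCommand("INSERT INTO Notes (Body) VALUES (@body)")
            cmd.Parameters.AddWithValue("@body", wholeThing)

            ' The value is passed through unchanged, line break included.
            Dim stored As String = CStr(cmd.Parameters("@body").Value)
            Console.WriteLine(stored.Contains(Environment.NewLine)) ' True
        End Using
    End Sub
End Module
```

If the stored text looks like a single line afterwards, the viewing tool is the more likely culprit, as the answers above point out.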