How to split a string in VB.NET

I have a .txt file containing values that I want to extract: the part after "accession_number=" and before "&token".
Example values in the text file:
10.0.0.6:80/ImageSuite/Web/Worklist/DICOMViewer.aspx?patient_id=5049885&study_uid=201702060824&accession_number=20170206082802&token
10.0.0.6:80/ImageSuite/Web/Worklist/DICOMViewer.aspx?patient_id=4409276&study_uid=201702060826&accession_number=20170206083002&token
10.0.0.6:80/ImageSuite/Web/Worklist/DICOMViewer.aspx?patient_id=4402764&study_uid=201702060801&accession_number=20170206080416&token
10.0.0.6:80/ImageSuite/Web/Worklist/DICOMViewer.aspx?patient_id=4402537&study_uid=201702060837&accession_number=20170206084025&token
Expected values after processing:
20170206082802
20170206083002
20170206080416
20170206084025
Thank you

You can read each line into an array of strings using File.ReadAllLines(), then iterate over the array and parse each line using a Regex.
' Requires Imports System.IO and Imports System.Text.RegularExpressions at the top of the file.
'Declare the Regex.
Dim Parser As New Regex("(?<=accession_number\=)\d+", RegexOptions.IgnoreCase)
'Read the file's lines into an array of strings.
Dim Lines As String() = File.ReadAllLines("C:\test.txt")
'Iterate over the array.
For Each Line As String In Lines
Dim m As Match = Parser.Match(Line) 'Match the pattern.
If m.Success Then 'Was the match successful?
Console.WriteLine(m.Value) 'Print the matched value.
End If
Next
Online test: http://ideone.com/VU5Iyj
Regex pattern explanation:
(?<=accession_number\=)\d+
(?<= => Match must be preceded by...
accession_number\= => ..."accession_number=".
) => End of the lookbehind (an assertion, not a capture group).
\d+ => Match one or more numerical characters.
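For comparison, the same lookbehind pattern can be tried in Python with the standard re module. This is a minimal sketch, not part of the answer above; the function name extract_accession_numbers is made up for illustration, and the sample lines are copied from the question. Python's re.search scans the whole line, which is how .NET's Regex.Match behaves here as well.

```python
import re

# (?<=accession_number=) is a lookbehind: the digits must be preceded by
# "accession_number=", but that prefix is not part of the matched text.
PATTERN = re.compile(r"(?<=accession_number=)\d+", re.IGNORECASE)


def extract_accession_numbers(lines):
    """Return the digits following 'accession_number=' on each line that has them."""
    results = []
    for line in lines:
        m = PATTERN.search(line)  # search, not match: the value sits mid-line
        if m:
            results.append(m.group(0))
    return results


sample = [
    "10.0.0.6:80/ImageSuite/Web/Worklist/DICOMViewer.aspx?patient_id=5049885&study_uid=201702060824&accession_number=20170206082802&token",
    "10.0.0.6:80/ImageSuite/Web/Worklist/DICOMViewer.aspx?patient_id=4409276&study_uid=201702060826&accession_number=20170206083002&token",
]
for value in extract_accession_numbers(sample):
    print(value)
```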

You can do it by reading all lines and then running the following LINQ query over them (assuming the file you want to read is located at C:\test.txt):
' Requires Imports System.IO and Imports System.Linq
Dim results = From line In File.ReadAllLines("C:\test.txt") _
Where Not String.IsNullOrWhiteSpace(line) _
From field In line.Split("&"c) _
Where field.StartsWith("accession_number=") _
Select field.Split("="c)(1)
For Each result In results
Console.WriteLine(result)
Next
This takes all lines as input. For each line that is not empty, it splits on &, then checks whether each field starts with accession_number=; if it does, it splits that field on = and returns the second item of the resulting array.
As an extra explanation:
from line in File.ReadAllLines("C:\test.txt")
will evaluate every single line in the file
' input eg: 10.0.0.6:80/ImageSuite/Web/Worklist/DICOMViewer.aspx?patient_id=5049885&study_uid=201702060824&accession_number=20170206082802&token
where not String.IsNullOrWhiteSpace( line )
will exclude all empty lines (or lines that consist only of whitespace)
from field in line.Split("&"c)
will split every found line into an array of strings using the & as a separator
' field eg: accession_number=20170206082802
where field.StartsWith("accession_number=")
will exclude all fields that do not start with accession_number=
select field.Split("="c)(1)
' result sample: 20170206082802
will return, for each match, the part after =
You can find a full example on this dotnetfiddle. It reads from a stream instead, simply because I cannot provide a dummy file in that environment, but it should do the trick.
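The split-based query can also be sketched in Python for comparison. accession_numbers_by_split is a hypothetical name; split("=", 1) limits the split to the first =, and the VB version's Split("="c)(1) gives the same result here because the value itself contains no =.

```python
def accession_numbers_by_split(lines):
    """Split each non-empty line on '&' and return the values of accession_number fields."""
    results = []
    for line in lines:
        if not line.strip():  # like String.IsNullOrWhiteSpace: skip blank lines
            continue
        for field in line.split("&"):
            if field.startswith("accession_number="):
                # Split on the first "=" only and keep the part after it.
                results.append(field.split("=", 1)[1])
    return results


sample = [
    "10.0.0.6:80/ImageSuite/Web/Worklist/DICOMViewer.aspx?patient_id=4402764&study_uid=201702060801&accession_number=20170206080416&token",
    "",
    "10.0.0.6:80/ImageSuite/Web/Worklist/DICOMViewer.aspx?patient_id=4402537&study_uid=201702060837&accession_number=20170206084025&token",
]
print(accession_numbers_by_split(sample))
```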

Related

How to sort Excel values with numbers at the end

I have a macro which reads file names from a folder. The problem is that when the file names are in a series like A1, A2, ..., A200.pdf, as in this image:
then in Excel they are read as A1, A10, A100, A101, ..., A109, A11, A110, ..., A119, A20, as in this image:
How can I sort this so that the values in Excel appear in the same order as the file names in the folder? Or is there a way to sort them in Excel itself?
You can sort this in Excel with a helper column. Create a new column and calculate the length of each filename in it with =LEN(A1). Then use a two-level sort: under Data -> Sort, use the length as the first level and the filenames as the second level. This works because all the names share the same letter prefix and extension, so a shorter name always holds a smaller number.
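The two-level sort (length first, then name) is easy to check outside Excel. A minimal Python sketch, assuming every name has the same one-letter prefix and the same extension; sort_by_length_then_name is a hypothetical helper:

```python
def sort_by_length_then_name(names):
    """Two-level sort: first by length, then by the text itself."""
    # Tuples compare element by element, so the length decides first and
    # the name only breaks ties between names of equal length.
    return sorted(names, key=lambda name: (len(name), name))


names = ["A1.pdf", "A10.pdf", "A100.pdf", "A11.pdf", "A2.pdf", "A20.pdf"]
print(sort_by_length_then_name(names))
# ['A1.pdf', 'A2.pdf', 'A10.pdf', 'A11.pdf', 'A20.pdf', 'A100.pdf']
```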
Another option is to use the RegExp object to extract the numeric digits contained in the file name.
Option Explicit
Sub SortFileNames()
Dim i As Long
With Sheets("Sheet1") ' replace "Sheet1" with your sheet's name
For i = 1 To .Cells(.Rows.Count, "A").End(xlUp).Row
.Range("B" & i).Value = RetractNumberwithRegex(.Range("A" & i)) ' call the Regex function
Next i
End With
End Sub
'========================================================================
Function RetractNumberwithRegex(Rng As Range) As String
' function uses the RegExp object to extract the numeric value inside the cell
Dim Reg1 As Object
Dim Matches As Object
Set Reg1 = CreateObject("vbscript.regexp")
With Reg1
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "[0-9]{1,20}" ' a run of 1 to 20 digits
End With
Set Matches = Reg1.Execute(Rng.Value2)
If Matches.Count <> 0 Then
RetractNumberwithRegex = Matches.Item(0)
End If
End Function
This happens because Windows Explorer and Excel use different sorting algorithms: Explorer compares the numbers inside file names by their numeric value, while Excel compares the text character by character. Refer to this article if you want to understand more.
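To illustrate the difference, here is a rough Python sketch of a "natural" sort key that compares digit runs by numeric value. It only approximates Explorer's ordering (Explorer's exact rules are not reproduced), and natural_key is a name chosen for this example:

```python
import re


def natural_key(name):
    """Sort key that compares digit runs by numeric value and other text case-insensitively."""
    # A capturing group makes re.split keep the digit runs, and the pieces
    # always alternate text, digits, text, ...: "A110.pdf" -> ["A", "110", ".pdf"]
    parts = re.split(r"(\d+)", name)
    # Odd positions hold the digit runs, so compare those as integers.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


names = ["A1.pdf", "A10.pdf", "A100.pdf", "A11.pdf", "A2.pdf", "A20.pdf"]
print(sorted(names))                   # character-by-character text order
print(sorted(names, key=natural_key))  # numeric parts compared by value
```

Because the digit runs always land at odd positions, two keys never compare a number against text, so the sort cannot raise a TypeError.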
To solve your problem, one of the ways is to pull out only the numeric part of file names in a different cell (say column B) and then sort based on those numbers.
If I can assume that the file names follow the pattern AXXX.pdf, i.e. one letter A, then a number, then 4 characters for the file extension, you can use this formula:
=VALUE(MID(A1,2,LEN(A1)-5))
This works by pulling out a number of characters from the middle of the string. Per the assumption, the number starts at the 2nd character, which is why the second parameter is 2. To decide how many characters to pull, note that all characters except 'A' (1 char) and '.pdf' (4 chars) make up the number, so take the length of the whole name and subtract 5. For example, for A119.pdf LEN returns 8, so MID takes 8 - 5 = 3 characters starting at position 2, giving 119. You get the number part, which you can sort.
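The same MID/LEN arithmetic, sketched in Python under the same assumption (one leading letter, a 4-character extension). number_part is a hypothetical name, and the slice converts MID's 1-based start position to Python's 0-based indexing:

```python
def number_part(name):
    """Python equivalent of =VALUE(MID(A1, 2, LEN(A1) - 5)) for names like A119.pdf."""
    # Assumes one leading letter and a 4-character extension such as ".pdf".
    start = 2                                    # MID's 1-based start position
    length = len(name) - 5                       # drop 1 leading letter + 4 extension chars
    digits = name[start - 1:start - 1 + length]  # shift to 0-based slicing
    return int(digits)                           # VALUE: text to number


print(number_part("A119.pdf"))  # 119
names = ["A10.pdf", "A2.pdf", "A1.pdf", "A100.pdf"]
print(sorted(names, key=number_part))
```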
This will be your result:
The best way is to change the file names in your Excel list to have leading zeroes. Instead of A19, refer to the file as A019 and it will sort correctly. Convert the file names using this formula in a helper column.
=Left($A2, 1) & Right("000" & Mid($A2, 2, 3), 3)
Note that the 3 zeroes and string lengths of 3 are all related to each other. To create fixed length numbers of 4 digits, just use 4 zeroes and increase both string lengths to 4.
Copy the formula down from row 2 to the end. Copy the helper column, paste Values in place and, when everything is perfect, replace the original column with the helper.
The formula above can be tweaked to accommodate a fixed number of characters following the number. The formula below accommodates 4 extra characters after the number, for example ".pdf" (including the period), so both string lengths become 3 + 4 = 7.
=Left($A2, 1) & Right("000" & Mid($A2, 2, 7), 7)
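A Python sketch that mirrors the padding formula piece by piece, which can help when checking the lengths before filling the helper column. The excel_left, excel_right and excel_mid helpers are hypothetical stand-ins for Excel's LEFT, RIGHT and MID, and pad_name assumes a single leading letter:

```python
def excel_left(text, n):
    """LEFT(text, n): the first n characters."""
    return text[:n]


def excel_right(text, n):
    """RIGHT(text, n): the last n characters (the whole text if it is shorter)."""
    return text[-n:] if n > 0 else ""


def excel_mid(text, start, n):
    """MID(text, start, n): n characters from a 1-based start position."""
    return text[start - 1:start - 1 + n]


def pad_name(name, width=3, suffix_len=4):
    """Mirror =Left($A2, 1) & Right("000" & Mid($A2, 2, 7), 7) for width=3, suffix_len=4."""
    keep = width + suffix_len  # 3 digits + 4 characters of ".pdf" = 7
    return excel_left(name, 1) + excel_right("0" * width + excel_mid(name, 2, keep), keep)


names = ["A1.pdf", "A10.pdf", "A2.pdf", "A200.pdf"]
padded = [pad_name(n) for n in names]
print(padded)          # ['A001.pdf', 'A010.pdf', 'A002.pdf', 'A200.pdf']
print(sorted(padded))  # a plain text sort now follows the numbers
```

With suffix_len=0 the same function reproduces the first formula, =Left($A2, 1) & Right("000" & Mid($A2, 2, 3), 3), for names without an extension.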

Count lines before a specified string in a text file? In VB

Is there a way to count the number of lines before a specific line/string in a text file?
For Example:
1
2
3
4
5
6
7
8
9
Say I want to count the number of lines before '8'...
How would I do that?
Thanks!
Hopefully this is what you are looking for.
It reads all lines from the specified file, then finds the IndexOf the particular line (searchText). Because the index is zero-based, it equals the number of lines before the match (adding 1 would instead give the 1-based line number of the match). If the line is not found, IndexOf returns -1.
Dim lines = File.ReadAllLines("f:\sample.txt")
Dim searchText As String = "8"
MsgBox(Array.IndexOf(lines, searchText))
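The off-by-one point is easy to check in Python, where list.index is also zero-based: the index of the matching line is exactly the number of lines before it. count_lines_before is a hypothetical helper; it returns -1 when the line is missing, mirroring Array.IndexOf. With a real file, the list could come from open(path).read().splitlines().

```python
def count_lines_before(lines, search_text):
    """Number of lines before the first line equal to search_text, or -1 if it is absent."""
    try:
        # list.index is zero-based, so the index of the match equals
        # the number of lines that come before it.
        return lines.index(search_text)
    except ValueError:
        return -1


lines = [str(n) for n in range(1, 10)]  # "1" .. "9", as in the question
print(count_lines_before(lines, "8"))  # 7
```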
Here's another example using List.FindIndex(), which allows you to pass in a Predicate(T) to define how to make a match:
Dim fileName As String = "C:\Users\mikes\Documents\SomeFile.txt"
Dim lines As New List(Of String)(File.ReadAllLines(fileName))
Dim index As Integer = lines.FindIndex(Function(x) x.Equals("8"))
MessageBox.Show(index.ToString()) ' zero-based index = number of lines before the match
In the example above, we're looking for an exact match with "8", but you can make the predicate match whatever you like for more complex scenarios. Just make the function (the predicate) return True for what you want to be a match.
For example, a line containing "magic":
Function(x) x.ToLower().Contains("magic")
or a line that begins with "FirstStep":
Function(x) x.StartsWith("FirstStep")
The predicate doesn't have to be a simple string function, it can be as complex as you like. Here's one that will find a string that ends with "UnicornFarts", but only on Wednesday and if Notepad is currently open:
Function(x) DateTime.Today.DayOfWeek = DayOfWeek.Wednesday AndAlso Process.GetProcessesByName("notepad").Length > 0 AndAlso x.EndsWith("UnicornFarts")
You get the idea...
Using a List, instead of an Array, is good for situations when you need to delete and/or insert lines into the contents before writing them back out to the file.
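List.FindIndex with a predicate boils down to a simple loop. Here is a minimal Python sketch of that idea; find_index is a hypothetical helper that, like FindIndex, returns -1 when nothing matches:

```python
def find_index(items, predicate):
    """Return the index of the first item for which predicate(item) is true, or -1."""
    for i, item in enumerate(items):
        if predicate(item):
            return i
    return -1


lines = ["alpha", "FirstStep: init", "some Magic here", "8"]
print(find_index(lines, lambda x: x == "8"))                   # 3
print(find_index(lines, lambda x: "magic" in x.lower()))       # 2
print(find_index(lines, lambda x: x.startswith("FirstStep")))  # 1
```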

Read a complex tab-separated file (multi-column lines and line breaks) into objects

I have a tab-separated data file. Objects are separated from each other by 2 empty lines, and each object's first and third rows contain the column names.
My Tab Separated File
ID [TAB] NAME
001 [TAB] Croline
DATE [TAB] DOC
30/06/2010 [TAB] 101435
2 x EMPTY LINE
ID [TAB] NAME
002 [TAB] Grek
DATE [TAB] DOC
30/06/2010 [TAB] 101437
2 x EMPTY LINE
...........
...........
My Object Class
Public Class MyObject
Public Property Id As String
Public Property Name As String
Public Property Date As String
Public Property Doc As String
End Class
How can I read this file into the MyObjects?
It's hard to help you understand how to do this without knowing, more specifically, what part of this task you are having trouble with, but perhaps a simple working example will help you get started.
If you define your data class like this:
Public Class MyObject
Public Property Id As String
Public Property Name As String
Public Property [Date] As String ' Note that "Date" must be surrounded with brackets since it is a keyword in VB
Public Property Doc As String
End Class
Then you can load it like this:
' Requires Imports System.IO
' Create a list to hold the loaded objects
Dim objects As New List(Of MyObject)()
' Read all of the lines from the file into an array of strings
Dim lines() As String = File.ReadAllLines("test.txt")
' Loop through the array of lines from the file. Step by 6 each
' time (4 data lines + 2 empty lines) so that the current value of "i",
' at each iteration, will be the index of the first line of each object
For i As Integer = 0 To lines.Length - 1 Step 6
If lines.Length >= i + 4 Then ' make sure lines(i + 3) exists
' Create a new object to store the data for the current item in the file
Dim o As New MyObject()
' Get the values from the second line
Dim fields() As String = lines(i + 1).Split(ControlChars.Tab)
o.Id = fields(0)
o.Name = fields(1)
' Get the values from the fourth line
fields = lines(i + 3).Split(ControlChars.Tab)
o.Date = fields(0)
o.Doc = fields(1)
' Add this item to the list
objects.Add(o)
End If
Next
The code to load it is very basic. It does no extra validation to ensure that the data in the file is correctly formatted, but, given a valid file, it will work to load the data into a list of objects.
The solution will be something like (pseudocode):
Create an empty list of MyObjects
Open file for reading
While there are lines left to read:
create a MyObject instance i
read a line and ignore it.
read a line into s1
split s1 at tab character into a1 and b1
set i.Id to a1
set i.Name to b1
read a line and ignore it
read a line into s2
split s2 at tab character into a2 and b2
set i.Date to a2
set i.Doc to b2
add i to your list
read a line and ignore it.
read a line and ignore it.
Translating this into vb.net is left as an exercise for the reader.
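For readers who want a starting point, here is one possible translation of the pseudocode, written in Python rather than VB.NET. It deviates in one deliberate way: instead of skipping exactly two lines after each object, it drops all blank lines first, so it tolerates a different number of separators. The MyObject fields mirror the question's class, read_objects is a hypothetical name, and, like the answers above, it does no validation of the file format.

```python
from dataclasses import dataclass


@dataclass
class MyObject:
    id: str
    name: str
    date: str
    doc: str


def read_objects(lines):
    """Build MyObject instances from 4-line blocks: header, values, header, values."""
    # Deviation from the pseudocode: drop every blank line up front instead of
    # skipping exactly two separator lines after each object.
    content = [line for line in lines if line.strip()]
    objects = []
    # Step through the remaining lines 4 at a time; a trailing partial block is ignored.
    for i in range(0, len(content) - 3, 4):
        id_, name = content[i + 1].split("\t")[:2]  # line 2: ID<TAB>NAME values
        date, doc = content[i + 3].split("\t")[:2]  # line 4: DATE<TAB>DOC values
        objects.append(MyObject(id_, name, date, doc))
    return objects


sample = [
    "ID\tNAME", "001\tCroline", "DATE\tDOC", "30/06/2010\t101435", "", "",
    "ID\tNAME", "002\tGrek", "DATE\tDOC", "30/06/2010\t101437", "", "",
]
for obj in read_objects(sample):
    print(obj)
```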
Rather than writing specific code for that data, I'd first convert it to simple CSV. It might make sense if this is just a one-shot thing.
1) Load the file in Notepad++
2) Replace \t with ; (using Extended search mode), giving you this kind of data:
ID;NAME
001;Croline
DATE;DOC
30/06/2010;101435
ID;NAME
002;Grek
DATE;DOC
30/06/2010;101437
3) Remove all DATE;DOC lines by searching for DATE;DOC\n and replacing with ; (you might need to do Edit > EOL Conversions > UNIX Format for that to work)
4) Do the same to replace all ID;NAME lines with a placeholder symbol that is guaranteed not to be used in the data, maybe £. Your data should look like this:
£001;Croline
;30/06/2010;101435
£002;Grek
;30/06/2010;101437
5) Do Edit > Blank Operations > Remove Unnecessary Blank and EOL, which should put all your data on a single line.
6) Search & Replace your placeholder symbol £ with \n
7) Remove extra blank line at the top. Voilà, you have a CSV file.
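If this conversion has to be repeated, the same search-and-replace steps can be replayed in a few lines of Python. This is a sketch that assumes the file content is already in a string and that £ never appears in the data; convert_to_csv is a hypothetical name.

```python
PLACEHOLDER = "£"  # must never appear in the data itself


def convert_to_csv(text):
    """Replay the Notepad++ steps on a string (a sketch, not a general parser)."""
    text = text.replace("\r\n", "\n")              # EOL conversion to UNIX format
    text = text.replace("\t", ";")                 # step 2: tabs -> ;
    text = text.replace("DATE;DOC\n", ";")         # step 3: drop DATE;DOC header lines
    text = text.replace("ID;NAME\n", PLACEHOLDER)  # step 4: ID;NAME headers -> placeholder
    # step 5: trim each line and join everything into a single line
    text = "".join(line.strip() for line in text.split("\n"))
    text = text.replace(PLACEHOLDER, "\n")         # step 6: placeholder -> line break
    return text.lstrip("\n")                       # step 7: remove the leading blank line


sample = (
    "ID\tNAME\n001\tCroline\nDATE\tDOC\n30/06/2010\t101435\n\n\n"
    "ID\tNAME\n002\tGrek\nDATE\tDOC\n30/06/2010\t101437\n\n\n"
)
print(convert_to_csv(sample))
```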

Splitting a string based on multiple characters

I am trying to split a string on multiple characters. The string might contain a - or a /. I have managed to split on the hyphen, but I am not able to split on the slash. Any thoughts on how to split the string on both characters at once? After splitting on -, I add the value after the - to the result list as a separate item, and I would like to do the same for '/'.
For example, if the Split array contains Jet-blue, the code below adds Jet to the result list at index 0 and blue at index 1. In addition to splitting on '-', I would also like to split on '/'. Any suggestions?
Code:
Dim result As New List(Of String)()
For Each str_get As String In Split
Dim splitStr = str_get.Split("-")
For Each str_split As String In splitStr
result.Add(str_split) ' Enter into result list
' result.TrimExcess()
Next
result.Remove("")
Next
You can use either of two overloads of the Split method.
The first one takes an array of Char:
"Hello World".Split({"e"c, "o"c}) ' Notice the c!
The second one takes an array of String and StringSplitOptions:
"Hello World".Split({"el", "o"}, StringSplitOptions.None)

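Applied to the question, the Char-array overload would be called as str_get.Split({"-"c, "/"c}). The same idea in Python, for comparison, needs re.split with a character class, because str.split accepts only a single separator. split_on_dash_or_slash is a hypothetical helper that also drops empty pieces, similar in intent to StringSplitOptions.RemoveEmptyEntries:

```python
import re


def split_on_dash_or_slash(values):
    """Split every string on '-' or '/', dropping empty pieces."""
    result = []
    for value in values:
        # [-/] is a character class: either a hyphen or a slash.
        for piece in re.split(r"[-/]", value):
            if piece:
                result.append(piece)
    return result


print(split_on_dash_or_slash(["Jet-blue", "Air/Canada", "Delta"]))
# ['Jet', 'blue', 'Air', 'Canada', 'Delta']
```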
Creating an array from a CSV text file and selecting certain parts of the array in VB.NET

I've searched the internet over and over for my issue, but I haven't been able to find an answer or word my searches correctly...
My issue is that I have a comma-separated values file in .txt format... simply put, it's a bunch of data delimited by commas, with double quotes (") as the text qualifier.
For example:
"So and so","1234","Blah Blah", "Foo","Bar","","","",""
"foofoo","barbar","etc.."
Wherever there is a carriage return, a new row begins, and every comma separates one column from the next.
My next step is to go into VB.NET and create an array from these values, with the commas serving as the delimiter, and somehow turn the array into a table whose layout matches the text file (I hope I'm explaining myself correctly :/ )
After that array has been created, I need to select only certain parts of it and store the values in variables for later use...
And that's where my trouble comes in... I can't seem to get the logic right for building the array and selecting certain info out of it.
You might perhaps give a more detailed problem description, but I gather you're looking for something like this:
Sub Main()
Dim fileOne As String = "a1,b1,c1,d1,e1,f1,g1" + Environment.NewLine + _
"a2,b2,c2,d2,e2,f2,g2"
Dim table As New List(Of List(Of String))
' Process the file
For Each line As String In fileOne.Split({Environment.NewLine}, StringSplitOptions.None) ' split on the full CR+LF string
Dim row As New List(Of String)
For Each value In line.Split(","c)
row.Add(value)
Next
table.Add(row)
Next
' Search the "table" using LINQ (for example)
Dim v = From c In table _
Where c(2) = "c1"
Console.WriteLine("Rows containing 'c1' in the 3rd column:")
For Each x As List(Of String) In v
Console.WriteLine(x(0)) ' printing the 1st column only
Next
' *** EDIT: added this after clarification
' Fetch value in row 2, column 3 (remember that lists are zero-indexed)
Console.WriteLine("Value of (2, 3): " + table(1)(2))
End Sub
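Note that splitting each line on commas, as above, keeps the double-quote text qualifiers from the question's data (for example, "So and so" would keep its quotes). A CSV parser handles them for you; in VB.NET that is typically Microsoft.VisualBasic.FileIO.TextFieldParser with HasFieldsEnclosedInQuotes set to True. As a sketch of the same load-then-select idea, here it is with Python's csv module on the question's sample rows; load_table is a hypothetical name.

```python
import csv
import io


def load_table(text):
    """Parse CSV text into a list of rows, each a list of strings."""
    # csv.reader removes the "" text qualifiers; skipinitialspace=True also
    # accepts the stray space before "Foo" in the sample data.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [row for row in reader]


sample = '"So and so","1234","Blah Blah", "Foo","Bar","","","",""\n"foofoo","barbar","etc.."\n'
table = load_table(sample)

# Select parts of the table: one cell, and the first column of matching rows.
cell = table[0][2]  # row 1, column 3 (zero-based indexes)
matches = [row[0] for row in table if len(row) > 1 and row[1] == "barbar"]
print(cell, matches)
```

For a real file, open it with open(path, newline="") and pass the file object to csv.reader instead of io.StringIO.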