Compare and merge multiple text files using VB.NET - vb.net

I have multiple text files that I need to merge, but I need to compare the reference numbers before merging them.
Below are the text files:
Text 1
001Email
002Video
003SocialNetwork
Text 2
001Gmail
001Yahoo
002Youtube
002Metacafe
003Facebook
003Myspace
Text 3
www.gmail.com001
www.yahoo.com001
www.youtube.com002
www.metacafe.com002
www.facebook.com003
www.myspace.com003
Output
001Email
001Gmail
www.gmail.com001
001Yahoo
www.yahoo.com001
002Video
002Youtube
www.youtube.com002
002Metacafe
www.metacafe.com002
003SocialNetwork
003Facebook
www.facebook.com003
003Myspace
www.myspace.com003
What would be the fastest way to do this? Should I read the files line by line and compare? The text files consist of thousands of lines.

Here's what might possibly be an overly complex solution. The comments in the code should explain everything hopefully. The output doesn't match exactly what you have because I don't know how much order is important for everything. It sorts everything first by the reference number and then by the text portion of the string (excluding www.). The results you posted were in reference number order and then file parsing order and then alphabetical (002Metacafe came after 002Video). Let me know if that's important.
Option Explicit On
Option Strict On
Imports System.IO
Imports System.Text.RegularExpressions
Public Class Form1
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
''//List of files to process
Dim Files As New List(Of String)
Files.Add(Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "Text1.txt"))
Files.Add(Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "Text2.txt"))
Files.Add(Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "Text3.txt"))
''//Will hold the current line being read
Dim Line As String
''//Holds our main collection of data
Dim MyData As New List(Of Data)
''//Loop through each file
For Each F In Files
''//Open the file for reading
Using FS As New FileStream(F, FileMode.Open, FileAccess.Read, FileShare.Read)
Using SR As New StreamReader(FS)
''//Read each line
Line = SR.ReadLine()
Do While Line IsNot Nothing
''//The data constructor handles parsing of the line
MyData.Add(New Data(Line))
''//Read next line
Line = SR.ReadLine()
Loop
End Using
End Using
Next
''//Our data implements IComparable(Of Data) so we can just sort the list
MyData.Sort()
''//Output our data
For Each D In MyData
Trace.WriteLine(D)
Next
Me.Close()
End Sub
End Class
Public Class Data
Implements IComparable(Of Data)
''//Our RegEx pattern for looking for a string that either starts or ends with numbers
Private Shared ReadOnly Pattern As String = "^(?<RefStart>\d+)?(?<Text>.*?)(?<RefEnd>\d+)?$"
Public Text As String ''//The _text_ portion of the data
Public Reference As String ''//The reference number stored as text
Public ReferenceAtStart As Boolean ''//Whether the reference number was found at the start or end of the line
Public ReadOnly Property ReferenceAsNum() As Integer ''//Numeric version of the reference number for sorting
Get
Return Integer.Parse(Me.Reference)
End Get
End Property
Public ReadOnly Property TextComparable() As String ''//Remove the www for sorting
Get
Return Me.Text.Replace("www.", "")
End Get
End Property
Public Sub New(ByVal line As String)
''//Sanity check
If String.IsNullOrEmpty(line) Then Throw New ArgumentNullException("line")
''//Parse the line
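Dim M = Regex.Match(line, Pattern)
''//Regex.Match never returns Nothing, so test Success and make sure a reference number was actually captured
If Not M.Success OrElse Not (M.Groups("RefStart").Success OrElse M.Groups("RefEnd").Success) Then Throw New ArgumentException("Line does not conform to expected pattern")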
''//If the RefStart has a value then the number is at the beginning of the string
If M.Groups("RefStart").Success Then
Me.ReferenceAtStart = True
Me.Reference = M.Groups("RefStart").Value
Else ''//Otherwise its at the end
Me.ReferenceAtStart = False
Me.Reference = M.Groups("RefEnd").Value
End If
Me.Text = M.Groups("Text").Value
End Sub
Public Function CompareTo(ByVal other As Data) As Integer Implements System.IComparable(Of Data).CompareTo
''//Compare the reference numbers first
Dim Ret = Me.ReferenceAsNum.CompareTo(other.ReferenceAsNum)
''//If they are the same then compare the strings
If Ret = 0 Then Ret = String.Compare(Me.TextComparable, other.TextComparable, StringComparison.InvariantCultureIgnoreCase)
Return Ret
End Function
Public Overrides Function ToString() As String
''//Reproduce the original string
If Me.ReferenceAtStart Then
Return String.Format("{0}{1}", Me.Reference, Me.Text)
Else
Return String.Format("{1}{0}", Me.Reference, Me.Text)
End If
End Function
End Class
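If the order from the question matters (each heading from the first file, then each entry from the second file followed by its URL from the third), a grouping approach might be closer than a sort. The following is only a minimal sketch: the module and function names are made up for illustration, and it assumes that within one reference number the i-th line of Text 3 belongs to the i-th line of Text 2.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module MergeSketch
    ' Reference number at the start of a line, e.g. "001Gmail" -> "001".
    Function LeadingRef(line As String) As String
        Dim length = 0
        While length < line.Length AndAlso Char.IsDigit(line(length))
            length += 1
        End While
        Return line.Substring(0, length)
    End Function

    ' Reference number at the end of a line, e.g. "www.gmail.com001" -> "001".
    Function TrailingRef(line As String) As String
        Dim start = line.Length
        While start > 0 AndAlso Char.IsDigit(line(start - 1))
            start -= 1
        End While
        Return line.Substring(start)
    End Function

    Function Merge(headers As IEnumerable(Of String),
                   items As IEnumerable(Of String),
                   urls As IEnumerable(Of String)) As List(Of String)
        ' ToLookup keeps the lines of each group in their original file order.
        Dim itemsByRef = items.ToLookup(Function(l) LeadingRef(l))
        Dim urlsByRef = urls.ToLookup(Function(l) TrailingRef(l))
        Dim result As New List(Of String)
        For Each header In headers
            Dim ref = LeadingRef(header)
            result.Add(header)
            Dim groupItems = itemsByRef(ref).ToList()
            Dim groupUrls = urlsByRef(ref).ToList()
            For i = 0 To groupItems.Count - 1
                result.Add(groupItems(i))
                ' Assumption: the i-th URL of a group belongs to the i-th item.
                If i < groupUrls.Count Then result.Add(groupUrls(i))
            Next
        Next
        Return result
    End Function
End Module
```

Each file is read once and each lookup is a hash lookup, so this should cope with thousands of lines. The three arguments could come from File.ReadLines on each file.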

Related

Import CSV file with comma and multiline text in quotes in a DataGridView

I'm trying to import a CSV file into a DataGridView but I'm running into some issues when I try to import multiline text.
What I'm trying to import is this:
ID;RW;Name;Description;Def;Unit;Min;Max
0;R;REG_INFO;"state of the
machine";0;ms;0;0xFFFF
1;R/W;REG_NUMBER;current number;0;days;0;65,535
This is what it should look like when imported:
What I've implemented so far:
Private Sub btnOpen_Click(sender As Object, e As EventArgs) Handles btnOpen.Click
Using ofd As OpenFileDialog = New OpenFileDialog() With {.Filter = "Text file|*.csv"}
If ofd.ShowDialog() = DialogResult.OK Then
Dim lines As List(Of String) = File.ReadAllLines(ofd.FileName).ToList()
Dim list As List(Of Register) = New List(Of Register)
For i As Integer = 1 To lines.Count - 1
Dim data As String() = lines(i).Split(";")
list.Add(New Register() With {
.ID = data(0),
.RW = data(1),
.Name = data(2),
.Description = data(3),
.Def = data(4),
.Unit = data(5),
.Min = data(6),
.Max = data(7)
})
Next
DataGridView1.DataSource = list
End If
End Using
End Sub
But I run into problems with multiline text when I try to load the CSV, such as "state of the machine" in the example.
An example, using the TextFieldParser class.
(This class is available in .Net 5)
The TextFieldParser object provides methods and properties for parsing
structured text files. Parsing a text file with the TextFieldParser is
similar to iterating over a text file, while using the ReadFields
method to extract fields of text is similar to splitting the strings.
Your source of data is a delimited (not fixed-length) structure: the header and field values are separated by a symbol, so you can specify TextFieldType = FieldType.Delimited.
The delimiter is not a comma (the C in CSV), so you need to pass the delimiter symbol(s) to the SetDelimiters() method.
Call ReadFields() to extract each record as an array of String representing the fields' values (no conversion is performed here; all values are returned as strings. Make your own type converter in case it's needed).
Imports System.IO
Imports Microsoft.VisualBasic.FileIO
Public Class RegisterParser
Private m_FilePath As String = String.Empty
Private m_delimiters As String() = Nothing
Public Sub New(sourceFile As String, delimiters As String())
m_FilePath = sourceFile
m_delimiters = delimiters
End Sub
Public Function ReadData() As List(Of Register)
    Dim result As New List(Of Register)
    ' The Try wraps the Using block so errors raised while opening the file are caught too
    Try
        Using tfp As New TextFieldParser(m_FilePath)
            tfp.TextFieldType = FieldType.Delimited
            tfp.SetDelimiters(m_delimiters)
            ' Skip the header row
            tfp.ReadFields()
            While Not tfp.EndOfData
                result.Add(New Register(tfp.ReadFields()))
            End While
        End Using
    Catch fnfEx As FileNotFoundException
        MessageBox.Show($"File not found: {fnfEx.Message}")
    Catch exIDX As IndexOutOfRangeException
        MessageBox.Show($"Invalid Data format: {exIDX.Message}")
    Catch exIO As MalformedLineException
        MessageBox.Show($"Invalid Data format at line {exIO.LineNumber}: {exIO.Message}")
    End Try
    Return result
End Function
End Class
Pass the path of the CSV file and the set of delimiters to use (here, just ;).
The ReadData() method returns a List(Of Register), which you can assign to DataGridView.DataSource.
DefaultCellStyle.WrapMode is set to True, so multiline text can actually wrap in the Cell (otherwise it would be clipped).
After that, call AutoResizeRows(), so the wrapped text can be seen.
Dim csvPath = [The CSV Path]
Dim csvParser = New RegisterParser(csvPath, {";"})
DataGridView1.DataSource = csvParser.ReadData()
DataGridView1.Columns("Description").DefaultCellStyle.WrapMode = DataGridViewTriState.True
DataGridView1.AutoResizeRows()
Register class:
Added a constructor that accepts an array of strings. You could change it to Object(), then add a converter to the class to parse and convert the values to another Type.
Public Class Register
Public Sub New(ParamArray values As String())
ID = values(0)
RW = values(1)
Name = values(2)
Description = values(3)
Def = values(4)
Unit = values(5)
Min = values(6)
Max = values(7)
End Sub
Public Property ID As String
Public Property RW As String
Public Property Name As String
Public Property Description As String
Public Property Def As String
Public Property Unit As String
Public Property Min As String
Public Property Max As String
End Class
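As a rough illustration of the converter idea mentioned above, the Min/Max values in the sample file ("0xFFFF" and "65,535") could be turned into integers along these lines. The module and function names are hypothetical, and the sketch assumes that values are either hexadecimal with a 0x prefix or decimal with optional comma thousands separators.

```vbnet
Imports System
Imports System.Globalization

Module RegisterValueConverter
    ' Hypothetical helper: converts the Min/Max text of the sample file to an Integer.
    Function TryParseRegisterValue(raw As String, ByRef value As Integer) As Boolean
        Dim trimmed = raw.Trim()
        If trimmed.StartsWith("0x", StringComparison.OrdinalIgnoreCase) Then
            ' Hexadecimal form, e.g. "0xFFFF"
            Return Integer.TryParse(trimmed.Substring(2), NumberStyles.AllowHexSpecifier,
                                    CultureInfo.InvariantCulture, value)
        End If
        ' Decimal form, optionally with thousands separators, e.g. "65,535"
        Return Integer.TryParse(trimmed, NumberStyles.Integer Or NumberStyles.AllowThousands,
                                CultureInfo.InvariantCulture, value)
    End Function
End Module
```

The invariant culture is used so that the comma is always read as a thousands separator, whatever the regional settings of the machine.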

Sort directory path alphabetically

The code below is used to sort directories in VB.NET.
Dim a As New List(Of String)
a.Add("a\b\c")
a.Add("a\b\c\d")
a.Add("a\b\c d")
a.Add("a\b\c d\e")
a.Add("a\b\c\d\f")
a.Sort(Function(x, Y) (x).CompareTo((Y)))
result:
a\b\c
a\b\c d
a\b\c d\e
a\b\c\d
a\b\c\d\f
In the result list, directories with a space are placed before those with "\" (for example, "a\b\c d" comes before "a\b\c\d").
There are more than 1,500,000 sub-directories and files.
The default sort takes around 50 seconds;
all the other methods we tried take at least 400 seconds.
How can I sort directory paths alphabetically?
Is there any built-in method that orders the backslash before the space?
You need to break the path up into individual folder names and compare each of them in turn until you find a difference. If there is no difference then you use the length to differentiate, i.e. higher-level folder comes first.
a.Sort(Function(x, y)
Dim xFolderNames As New List(Of String)
Dim yFolderNames As New List(Of String)
'Split first path into folder names.
Do Until String.IsNullOrEmpty(x)
xFolderNames.Insert(0, Path.GetFileName(x))
x = Path.GetDirectoryName(x)
Loop
'Split second path into folder names.
Do Until String.IsNullOrEmpty(y)
yFolderNames.Insert(0, Path.GetFileName(y))
y = Path.GetDirectoryName(y)
Loop
Dim result = 0
'Compare up to as many folders as are in the shortest path.
For i = 0 To Math.Min(xFolderNames.Count, yFolderNames.Count) - 1
result = xFolderNames(i).CompareTo(yFolderNames(i))
If result <> 0 Then
'A difference has been found.
Exit For
End If
Next
If result = 0 Then
'No difference was found so put the shortest path first.
result = xFolderNames.Count.CompareTo(yFolderNames.Count)
End If
Return result
End Function)
For good measure, here's a class that encapsulates that functionality:
Imports System.Collections.ObjectModel
Imports System.IO
Public Class FileSystemPath
Implements IComparable, IComparable(Of FileSystemPath)
Public ReadOnly Property FullPath As String
Public ReadOnly Property PartialPaths As ReadOnlyCollection(Of String)
Public Sub New(fileOrFolderPath As String)
FullPath = fileOrFolderPath
Dim partialPaths As New List(Of String)
Do Until String.IsNullOrEmpty(fileOrFolderPath)
partialPaths.Insert(0, Path.GetFileName(fileOrFolderPath))
fileOrFolderPath = Path.GetDirectoryName(fileOrFolderPath)
Loop
Me.PartialPaths = New ReadOnlyCollection(Of String)(partialPaths)
End Sub
Public Function CompareTo(obj As Object) As Integer Implements IComparable.CompareTo
Return CompareTo(DirectCast(obj, FileSystemPath))
End Function
Public Function CompareTo(other As FileSystemPath) As Integer Implements IComparable(Of FileSystemPath).CompareTo
Dim result = 0
'Compare up to as many folders as are in the shortest path.
For i = 0 To Math.Min(PartialPaths.Count, other.PartialPaths.Count) - 1
result = PartialPaths(i).CompareTo(other.PartialPaths(i))
If result <> 0 Then
'A difference has been found.
Exit For
End If
Next
If result = 0 Then
'No difference was found so put the shortest path first.
result = PartialPaths.Count.CompareTo(other.PartialPaths.Count)
End If
Return result
End Function
Public Overrides Function ToString() As String
Return FullPath
End Function
End Class
It can be used with barely a change to your code:
Dim a As New List(Of FileSystemPath)
a.Add(New FileSystemPath("a\b\c"))
a.Add(New FileSystemPath("a\b\c\d"))
a.Add(New FileSystemPath("a\b\c d"))
a.Add(New FileSystemPath("a\b\c d\e"))
a.Add(New FileSystemPath("a\b\c\d\f"))
a.Sort()
Console.WriteLine(String.Join(Environment.NewLine, a))
Console.ReadLine()
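With well over a million paths, the cost of each comparison matters. One possible variation, shown here only as a sketch, is to split every path once up front and compare the segments ordinally. The module and function names are made up, it has not been timed against the data in the question, and ordinal comparison is case-sensitive, which may or may not suit your folder names.

```vbnet
Imports System
Imports System.Collections.Generic

Module PathSortSketch
    ' Compares two pre-split paths segment by segment; a parent sorts before its children.
    Function CompareSegments(x As String(), y As String()) As Integer
        For i = 0 To Math.Min(x.Length, y.Length) - 1
            Dim result = String.CompareOrdinal(x(i), y(i))
            If result <> 0 Then Return result
        Next
        Return x.Length.CompareTo(y.Length)
    End Function

    Function SortPaths(paths As IEnumerable(Of String)) As List(Of String)
        ' Split every path exactly once, up front, rather than inside the comparer.
        Dim keyed As New List(Of KeyValuePair(Of String(), String))
        For Each p In paths
            keyed.Add(New KeyValuePair(Of String(), String)(p.Split("\"c), p))
        Next
        keyed.Sort(Function(a, b) CompareSegments(a.Key, b.Key))
        Dim sorted As New List(Of String)(keyed.Count)
        For Each pair In keyed
            sorted.Add(pair.Value)
        Next
        Return sorted
    End Function
End Module
```

For the five sample paths this puts a\b\c, a\b\c\d and a\b\c\d\f ahead of a\b\c d and a\b\c d\e.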

Read from text file, store data into corresponding structure every 6 element, and then form an array

The text file contains the following, shown inside the []. Anything inside () is not in the text file; it is just there for clarification.
[1(ID)
Jimmy(First name)
Paul (Last name)
78 (marks1)
80 (marks2)
92 (marks3)
2
Ben
James
67
82
73
]
I created a structure that holds student details including their name, id, marks in each subject.
Private Structure StudInfo
Public FName As String
Public LName As String
Public StudentId As Integer
Public ScMark As Integer
Public EnMark As Integer
Public MaMark As Integer
End Structure
The program needs to read the first six elements in a row, storing each element in the corresponding structure field, and make that structure the first element of an array "Students()"; the next six elements then become the second element of that array. I have no idea how to use loops to do that.
Private Sub Button4_Click(sender As Object, e As EventArgs) Handles Button4.Click
'create an array that hold student details
Dim Students() As StudInfo
' read from text file
Dim FileNum As Integer = FreeFile()
Dim TempS As String = ""
Dim TempL As String
FileOpen(FileNum, "some.text", OpenMode.Input)
Do Until EOF(FileNum)
TempL = LineInput(FileNum)
TempS = TempL + vbCrLf
Loop
End Sub
Thank you.
You have to use a BinaryReader (which takes an IO.Stream in its constructor); then you can read the data type you want into the variable you want.
The problem you will have is that the data will not be searchable (i.e. you cannot read the 30th record unless you physically read the first 29, because the strings are of variable length, and therefore the record is of variable length). This also applies to modifying a record (you can't make it bigger, because it will overwrite the next record).
The answer is to work with fixed-length records, field offsets or fixed-length strings. Then you will have records of predictable size, and you can determine the number of records by dividing the file length by the record size.
Hope this helps.
You could try something like this:
Public Class Form1
Private Students As New List(Of StudInfo)
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Students.Clear()
Dim fileName = "c:\some folder\directory\someFile.txt"
Using sr As New System.IO.StreamReader(fileName)
Dim value As Integer
Dim strValue As String
While Not sr.EndOfStream
Try
Dim student As New StudInfo
strValue = sr.ReadLine().Trim("[]")
If Integer.TryParse(strValue, value) Then
student.StudentId = value
Else
MessageBox.Show("Error Converting StudentID to Integer")
Exit Sub
End If
student.FName = sr.ReadLine().Trim("[]")
student.LName = sr.ReadLine().Trim("[]")
strValue = sr.ReadLine().Trim("[]")
If Integer.TryParse(strValue, value) Then
student.ScMark = value
Else
MessageBox.Show("Error Converting ScMark to Integer")
Exit Sub
End If
strValue = sr.ReadLine().Trim("[]")
If Integer.TryParse(strValue, value) Then
student.EnMark = value
Else
MessageBox.Show("Error Converting EnMark to Integer")
Exit Sub
End If
strValue = sr.ReadLine().Trim("[]")
If Integer.TryParse(strValue, value) Then
student.MaMark = value
Else
MessageBox.Show("Error Converting MaMark to Integer")
Exit Sub
End If
Students.Add(student)
Catch ex As Exception
MessageBox.Show("Error reading file. All records may not have been created.")
End Try
End While
MessageBox.Show("Done!")
End Using
End Sub
Private Class StudInfo
Public FName As String
Public LName As String
Public StudentId As Integer
Public ScMark As Integer
Public EnMark As Integer
Public MaMark As Integer
End Class
End Class
It depends a bit on the exact format of your text file.
If the file only contains the data for the two students (no brackets or blank lines), then all you need to do is open the file, read 6 lines, add the data to your structure, and read the next 6 lines. If you have an undetermined number of students in the text file, then you would be better off using a List. Otherwise you're going to have to spend extra processing time on ReDim each time you want to add a student, keep track of the array size, and all sorts of messing around.
However, let's go with the most straightforward answer and assume your data has no brackets or blank lines and that there are only two students.
This code should work just fine. If you have a different definite number of students, then you will need to change the size of the Students array.
I'm also assuming that the data is correctly formatted and that there are no non-numeric characters in the lines where there shouldn't be.
Private Sub ReadStudentInfo()
    'create an array that holds the details of two students (the upper bound is 1, giving indexes 0 and 1)
    Dim Students(1) As StudInfo
    Dim index As Integer = 0
    ' read from text file; Using closes the file when done
    Using datafile As New StreamReader("some.text")
        ' stop at the end of the file or when the array is full
        Do Until datafile.EndOfStream OrElse index > Students.GetUpperBound(0)
            Dim tempStudent As StudInfo
            With tempStudent
                Integer.TryParse(datafile.ReadLine, .StudentId)
                .FName = datafile.ReadLine
                .LName = datafile.ReadLine
                Integer.TryParse(datafile.ReadLine, .ScMark)
                Integer.TryParse(datafile.ReadLine, .EnMark)
                Integer.TryParse(datafile.ReadLine, .MaMark)
            End With
            Students(index) = tempStudent
            index = index + 1
        Loop
    End Using
End Sub
If your text file does contain blank lines, just insert
datafile.ReadLine()
between each line of data - like this
Integer.TryParse(datafile.ReadLine, .ScMark)
datafile.ReadLine()
Integer.TryParse(datafile.ReadLine, .EnMark)
If you have the brackets in your text file, then you'll need to add extra code to remove them.
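For the more general case (brackets present, blank lines possible, unknown number of students), something along these lines might do. It is only a sketch: the module and function names are made up, the structure is repeated so the example is self-contained, and it assumes that every record is exactly six lines and that the numeric lines really are numeric (Integer.Parse throws otherwise).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module StudentReaderSketch
    Structure StudInfo
        Public FName As String
        Public LName As String
        Public StudentId As Integer
        Public ScMark As Integer
        Public EnMark As Integer
        Public MaMark As Integer
    End Structure

    Function ParseStudents(rawLines As IEnumerable(Of String)) As List(Of StudInfo)
        ' Drop the surrounding brackets and any blank lines first.
        Dim lines = rawLines.
            Select(Function(l) l.Trim().Trim("["c, "]"c).Trim()).
            Where(Function(l) l.Length > 0).
            ToList()
        Dim students As New List(Of StudInfo)
        ' Step through the cleaned lines six at a time; an incomplete trailing record is ignored.
        For i = 0 To lines.Count - 6 Step 6
            Dim s As New StudInfo()
            s.StudentId = Integer.Parse(lines(i))
            s.FName = lines(i + 1)
            s.LName = lines(i + 2)
            s.ScMark = Integer.Parse(lines(i + 3))
            s.EnMark = Integer.Parse(lines(i + 4))
            s.MaMark = Integer.Parse(lines(i + 5))
            students.Add(s)
        Next
        Return students
    End Function
End Module
```

The lines could come from File.ReadLines("some.text"), and calling ToArray() on the result gives the Students() array from the question.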

How to write to/read from a "settings" text file

I'm working on a Timer program that allows the user to set up a timer for each individual user account on the computer. I'm having some trouble writing the settings to a text file and reading from it. I want to know if it's possible to write it in this fashion --> username; allowedTime; lastedLoggedIn; remainingTime; <-- in one line for each user, and how would I go about doing that? I also wanted to know if it's possible to alter the text file so that, if there's already an entry for a user, only the allowedTime or the remainingTime is changed, kind of just updating the file?
I'm also having trouble reading from the text file. First of all, I can't figure out how to determine whether a selected user is in the file or not. From there, if the user is listed in the file, how can I access the rest of the line, for example to get only the allowedTime of that user, or the remaining time?
I tried a couple of ways, but I just can't get it to work the way I'm imagining it, if that makes sense.
here's the code so far:
Public Sub saveUserSetting(ByVal time As Integer)
Dim hash As HashSet(Of String) = New HashSet(Of String)(File.ReadAllLines("Settings.txt"))
Using w As StreamWriter = File.AppendText("Settings.txt")
If Not hash.Contains(selectedUserName.ToString()) Then
w.Write(selectedUserName + "; ")
w.Write(CStr(time) + "; ")
w.WriteLine(DateTime.Now.ToLongDateString() + "; ")
Else
w.Write(CStr(time) + "; ")
w.WriteLine(DateTime.Now.ToLongDateString() + "; ")
End If
End Using
End Sub
Public Sub readUserSettings()
Dim currentUser As String = GetUserName()
Dim r As List(Of String) = New List(Of String)(System.IO.File.ReadLines("Settings.txt"))
'For Each i = 0 to r.lenght - 1
'Next
'check to see if the current user is in the file
MessageBox.Show(r(0).ToString())
If r.Contains(selectedUserName) Then
MessageBox.Show(selectedUserName + " is in the file.")
'Dim allowedTime As Integer
Else
MessageBox.Show("the user is not in the file.")
End If
'if the name is in the file then
'get the allowed time and the date
'if the date is older than the current date return the allowed time
'if the date = the current date then check the remaining time and return that
'if the date is ahead of the current date return the remaining time and display a message that the current date needs to be updated.
End Sub
Edit: I just wanted to check whether I'm doing the serialization right, and the same for the deserialization.
This is what I've got so far:
Friend userList As New List(Of Users)
Public Sub saveUserSetting()
Using fs As New System.IO.FileStream("Settings.xml", IO.FileMode.OpenOrCreate)
Dim bf As New BinaryFormatter
bf.Serialize(fs, userList)
End Using
End Sub
Public Sub readUserSettings()
Dim currentUser As String = GetUserName()
Dim useList As New List(Of Users)
Using fs As New System.IO.FileStream("Settings.xml", IO.FileMode.OpenOrCreate)
Dim bf As New BinaryFormatter
useList = bf.Deserialize(fs)
End Using
MessageBox.Show(useList(0).ToString)
End Sub
<Serializable>
Class Users
Public userName As String
Public Property allowedTime As Integer
Public Property lastLoggedInDate As String
Public Property remainingTime As Integer
Public Overrides Function ToString() As String
Return String.Format("{0} ({1}, {2}, {3})", userName, allowedTime, lastLoggedInDate, remainingTime)
End Function
End Class
edit 2:
I'm not too familiar with try/catch but would this work instead?
Public Sub readUserSettings()
If System.IO.File.Exists("Settings") Then
Using fs As New System.IO.FileStream("Settings", FileMode.Open, FileAccess.Read)
Dim bf As New BinaryFormatter
userList = bf.Deserialize(fs)
End Using
Else
MessageBox.Show("The setting file doesn't exists.")
End If
End Sub
You have a few typos and such in your code, but it is pretty close for your first try:
Friend userList As New List(Of Users)
Public Sub saveUserSetting()
' NOTE: Using the BINARY formatter will write a binary file, not XML
Using fs As New System.IO.FileStream("Settings.bin", IO.FileMode.OpenOrCreate)
Dim bf As New BinaryFormatter
bf.Serialize(fs, userList)
End Using
End Sub
Public Sub readUserSettings()
' this doesnt seem needed:
Dim currentUser As String = GetUserName()
' do not want the following line - it will create a NEW
' useRlist which exists only in this procedure
' you probably want to deserialize to the useRlist
' declared at the module/class level
' Dim useList As New List(Of Users)
' a) Check if the filename exists and just exit with an empty
' useRlist if not (like for the first time run).
' b) filemode was wrong - never create here, just read
Using fs As New System.IO.FileStream("Settings.bin",
FileMode.Open, FileAccess.Read)
Dim bf As New BinaryFormatter
' the list is declared above as useRList, not useList
userList = DirectCast(bf.Deserialize(fs), List(Of Users))
End Using
' Console.WriteLine is much better for this
MessageBox.Show(userList(0).ToString)
End Sub
<Serializable>
Class Users
' I would make this a property also
Public userName As String
Public Property allowedTime As Integer
Public Property lastLoggedInDate As String
Public Property remainingTime As Integer
Public Overrides Function ToString() As String
Return String.Format("{0} ({1}, {2}, {3})", userName, allowedTime, lastLoggedInDate, remainingTime)
End Function
End Class
ToDo:
a) decide whether you want XML or binary saves. With XML, users can read/edit the file.
b) Use a file path created from Environment.GetFolderPath(); with a string literal it may end up in 'Program Files' when deployed, and you cannot write there.
c) when reading/loading the useRlist, use something like
FileStream(myUserFile, FileMode.Open, FileAccess.Read)
It won't exist the first time the program runs, so check whether it does and just leave the list empty if not. After that, you just need to open it for reading. For saving use something like:
FileStream(myUserFile, FileMode.OpenOrCreate, FileAccess.Write)
You want to create it and write to it. You might put the Load/Save code inside a Try/Catch so if there are file access issues you can trap and report them, and so you know the list did not get saved or read.
Using a serializer, the entire contents of the list - no matter how long - will get saved with those 3-4 lines of code, and the entire list read back in the 2-3 lines to load/read the file.
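If you go the XML route from point a), the code changes very little. The following is a minimal sketch rather than a drop-in replacement: the class, module and method names are invented for illustration, and it assumes the settings type can be made public with public properties, which XmlSerializer requires.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml.Serialization

' XmlSerializer needs a public type with a parameterless constructor and public members.
Public Class UserSetting
    Public Property UserName As String
    Public Property AllowedTime As Integer
    Public Property LastLoggedInDate As String
    Public Property RemainingTime As Integer
End Class

Public Module SettingsStore
    Public Sub SaveSettings(filePath As String, users As List(Of UserSetting))
        Dim serializer As New XmlSerializer(GetType(List(Of UserSetting)))
        ' FileMode.Create truncates an existing file so no stale XML is left behind.
        Using fs As New FileStream(filePath, FileMode.Create, FileAccess.Write)
            serializer.Serialize(fs, users)
        End Using
    End Sub

    Public Function LoadSettings(filePath As String) As List(Of UserSetting)
        ' First run: no file yet, so start with an empty list.
        If Not File.Exists(filePath) Then Return New List(Of UserSetting)
        Dim serializer As New XmlSerializer(GetType(List(Of UserSetting)))
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            Return DirectCast(serializer.Deserialize(fs), List(Of UserSetting))
        End Using
    End Function
End Module
```

To update one user, load the list, change that entry's AllowedTime or RemainingTime, and save the whole list again.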
I don't have the answer to all your questions; however, I've also been working on a timer application and just recently started using a text file to read and write information. The method I'm using has proven fairly easy to use and not very confusing. Here is an extract of my code:
Dim startup As String = "C:\Users\DigiParent\Desktop\Project data\Digitimeinfo.txt"
Dim reader As New System.IO.StreamReader(startup, Encoding.Default)
Dim data As String = reader.ReadToEnd
Dim aryTextFile(6) As String
aryTextFile = data.Split(",")
This will read everything in the text file, split it at each comma and store the pieces individually in the array. To put the values back into one line, use
Dim LineOfText As String
LineOfText = String.Join(",", aryTextFile)
So you could write something like this to write your info to a text file:
Dim startup As String = "C:\Users\DigiParent\Desktop\Project data\Digitimeinfo.txt"
Dim objWriter As New System.IO.StreamWriter(startup, False)
Dim aryTextFile(2) As String
aryTextFile(0) = pasword
aryTextFile(1) = user
aryTextFile(2) = remainingtime
LineOfText = String.Join(",", aryTextFile)
objWriter.WriteLine(LineOfText)
objWriter.Close()
and to read it back you could use a StreamReader.
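For the one-line-per-user format from the question (username; allowedTime; lastLoggedIn; remainingTime;), looking a user up and updating their line could be sketched roughly like this. The module and function names are hypothetical, and the sketch assumes that no field contains a semicolon.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module DelimitedSettingsSketch
    ' Returns the trimmed fields of the user's line, or Nothing when the user is not listed.
    Function FindUser(lines As IEnumerable(Of String), userName As String) As String()
        For Each line In lines
            Dim fields = line.Split(";"c).Select(Function(f) f.Trim()).ToArray()
            If String.Equals(fields(0), userName, StringComparison.OrdinalIgnoreCase) Then
                Return fields
            End If
        Next
        Return Nothing
    End Function

    ' Replaces the user's line if present, otherwise appends a new one.
    Function Upsert(lines As IEnumerable(Of String), userName As String,
                    allowedTime As Integer, lastLoggedIn As String,
                    remainingTime As Integer) As List(Of String)
        Dim newLine = String.Join("; ", userName, allowedTime.ToString(),
                                  lastLoggedIn, remainingTime.ToString()) & ";"
        Dim result As New List(Of String)
        Dim replaced = False
        For Each line In lines
            If line.Split(";"c)(0).Trim().Equals(userName, StringComparison.OrdinalIgnoreCase) Then
                result.Add(newLine)
                replaced = True
            Else
                result.Add(line)
            End If
        Next
        If Not replaced Then result.Add(newLine)
        Return result
    End Function
End Module
```

In practice the lines would come from File.ReadAllLines and the list returned by Upsert would go back to disk with File.WriteAllLines. A text file cannot be edited in the middle, so rewriting the whole file is the simple option.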

Search engine in vb.net

I am building a search engine in vb.net which would have to search for a word entered by the user in 40 text files within the project directory.
It should return the results as the total number of matches (text files) and the number of times this word appears in each file. Any suggestions for a start would be appreciated.
Regards.
Get a list of the files in the directory with something like: Directory.GetFiles(ProjectDir, "*.*"), then read each file in the list like this:
Using sr As New StreamReader(fileName)
    Dim line As String = sr.ReadLine()
    ' ReadLine returns Nothing at the end of the file
    Do While line IsNot Nothing
        ' scan the line and count the matches here
        line = sr.ReadLine()
    Loop
End Using
Try this code in a console application. It can find not only a plain word; you can also
get the results using a regular expression.
Imports System.Collections.Generic

Class TextFileInfo
    Public File As System.IO.FileInfo
    Public Count As Integer
    Public FileText As String
    Public ItMatch As Boolean = False

    Sub New(FileFullName As String, WordPattern As String)
        File = New System.IO.FileInfo(FileFullName)
        Using Fs As New System.IO.StreamReader(File.FullName)
            FileText = Fs.ReadToEnd() '//===>Read Text
        End Using
        Count = _CountWords(WordPattern, FileText)
        ItMatch = Count > 0
    End Sub

    Public Sub DisplayInfo()
        System.Console.WriteLine("File Name:" & File.Name)
        System.Console.WriteLine("Matched Times:" & Count)
    End Sub

    Private Function _CountWords(Word As String, Text As String) As Integer
        Dim RegEx As New System.Text.RegularExpressions.Regex(Word)
        Return RegEx.Matches(Text).Count '//===>Returns how many times this word matches in the Text
    End Function
End Class

Module Program
    Public Function SearchEngine(PatternWord As String, RootDirectory As String) As List(Of TextFileInfo)
        Dim MatchedFiles As New List(Of TextFileInfo)
        Dim RootDir As New System.IO.DirectoryInfo(RootDirectory)
        For Each iTextFile As System.IO.FileInfo In RootDir.GetFiles("*.txt")
            '//===>Create an object of TextFileInfo and check if the file contains the word
            Dim iMatchFile As New TextFileInfo(iTextFile.FullName, PatternWord)
            If iMatchFile.ItMatch Then
                '//===>Add the object to the list if it has matched
                MatchedFiles.Add(iMatchFile)
            End If
        Next
        Return MatchedFiles '//===>Return the files that contain the matched word
    End Function

    Sub Main()
        Dim SearchResults As List(Of TextFileInfo) = SearchEngine("JajajaWord", "C:\TextFiles\")
        '//===>The number of matching files is SearchResults.Count
        For Each iSearch As TextFileInfo In SearchResults
            iSearch.DisplayInfo()
        Next
    End Sub
End Module
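Because the search term is used as a regular expression, a user typing something like "c++" or "a.b" may get surprising results. If you only want literal, whole-word, case-insensitive matches, a small helper along these lines could be used for the counting. It is a sketch with made-up names, and whether you want whole words and case-insensitivity is an assumption on my part.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WordCountSketch
    ' Counts case-insensitive, whole-word occurrences of a literal word.
    Function CountWord(content As String, word As String) As Integer
        ' Regex.Escape makes characters such as "." or "+" match literally.
        Dim pattern = "\b" & Regex.Escape(word) & "\b"
        Return Regex.Matches(content, pattern, RegexOptions.IgnoreCase).Count
    End Function
End Module
```

Note that \b only matches next to a letter, digit or underscore, so the whole-word check suits ordinary words rather than terms that begin or end with punctuation.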