itextsharp search pdf and extract found pages to another pdf - vb.net

Can anyone show me how to extract pages, based on the page numbers found by a search, into a new PDF that I can then print? What I have in mind is to search a PDF using VB.NET, extract the pages that contain a match into another PDF, and print the new PDF once the search is finished. So far I have the search working, and it returns the page numbers of the matches, but I don't know what to do from here. Please see below:
Public Shared Function SearchTextFromPdf(ByVal sourcePdf As String, ByVal searchPhrase As String, Optional ByVal caseSensitive As Boolean = False) As List(Of Integer)
Dim fBrowse As New OpenFileDialog
With fBrowse
.Filter = "PDF Files(*.pdf)|*.pdf|All Files(*.*)|*.*"
.Title = "Choose Pdf"
End With
If fBrowse.ShowDialog() = Windows.Forms.DialogResult.OK Then
sourcePdf = fBrowse.FileName
Else
Exit Function
End If
Dim foundList As New List(Of Integer)
Dim raf As iTextSharp.text.pdf.RandomAccessFileOrArray = Nothing
Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
Try
raf = New iTextSharp.text.pdf.RandomAccessFileOrArray(sourcePdf)
reader = New iTextSharp.text.pdf.PdfReader(raf, Nothing)
If caseSensitive = False Then
searchPhrase = searchPhrase.ToLower()
End If
For i As Integer = 1 To reader.NumberOfPages()
Dim pageText As String = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader, i)
If caseSensitive = False Then
pageText = pageText.ToLower()
End If
If pageText.Contains(searchPhrase) Then
MsgBox(i)
foundList.Add(i)
End If
Next
reader.Close()
Catch ex As Exception
MessageBox.Show(ex.Message)
End Try
Return foundList
End Function

You can use the following code:
Imports iTextSharp.text.pdf.parser
Imports iTextSharp.text.pdf
Imports iTextSharp.text
Imports System.IO
Public Class Form1
Dim sourceFile As String = "D:\source.pdf"
Dim resultFile As String = "D:\result.pdf"
Dim arrayOfPages As Integer() = {1, 5, 7, 9}
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
ExtractPages(sourceFile, arrayOfPages)
End Sub
Public Sub ExtractPages(sourcePdfFile As String, pagesForExtracting As Integer())
Dim reader As New PdfReader(sourcePdfFile)
Dim document As New Document(reader.GetPageSize(1))
Dim pdfCopy As New PdfCopy(document, New FileStream(resultFile, FileMode.Create))
Try
document.Open()
For Each pageNumber As Integer In pagesForExtracting
Dim importedPage As PdfImportedPage = pdfCopy.GetImportedPage(reader, pageNumber)
pdfCopy.AddPage(importedPage)
Next
Dim text As String = PdfTextExtractor.GetTextFromPage(reader, 1, New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy())
document.Close()
reader.Close()
Catch ex As Exception
Throw ' rethrow without resetting the stack trace (Throw ex would reset it)
End Try
End Sub
End Class
If PdfCopy throws a NullReferenceException while you are debugging, you can ignore it by choosing Continue in the Visual Studio IDE.
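To tie the search from the question to the extraction above, the two steps could be combined roughly as follows. This is only a sketch: the module name PdfSearchExtract and the function ExtractMatchingPages are made up for illustration, and it uses only the iTextSharp calls that already appear in this thread (PdfReader, PdfTextExtractor.GetTextFromPage, PdfCopy.GetImportedPage, PdfCopy.AddPage). It assumes the search should be case-insensitive.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

Public Module PdfSearchExtract
    ' Hypothetical helper: copies every page of sourcePdf whose text contains
    ' searchPhrase (case-insensitive) into resultPdf and returns the page numbers.
    Public Function ExtractMatchingPages(sourcePdf As String, resultPdf As String,
                                         searchPhrase As String) As List(Of Integer)
        Dim foundPages As New List(Of Integer)
        Dim reader As New PdfReader(sourcePdf)
        Try
            ' Step 1: collect the 1-based numbers of the pages that match.
            For i As Integer = 1 To reader.NumberOfPages
                Dim pageText As String = PdfTextExtractor.GetTextFromPage(reader, i)
                If pageText.IndexOf(searchPhrase, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    foundPages.Add(i)
                End If
            Next

            ' Nothing matched: do not create an empty output file.
            If foundPages.Count = 0 Then Return foundPages

            ' Step 2: copy the matching pages into the result file.
            Using output As New FileStream(resultPdf, FileMode.Create)
                Dim document As New iTextSharp.text.Document(reader.GetPageSize(foundPages(0)))
                Dim copy As New PdfCopy(document, output)
                document.Open()
                For Each pageNumber As Integer In foundPages
                    copy.AddPage(copy.GetImportedPage(reader, pageNumber))
                Next
                document.Close()
            End Using
        Finally
            reader.Close()
        End Try
        Return foundPages
    End Function
End Module
```

If you already have the List(Of Integer) from SearchTextFromPdf, calling ExtractPages(sourceFile, foundList.ToArray()) from the answer above should do the same job.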

Related

Import very large .csv to List array, then copy to DataTable

I am trying to import a large CSV file, where I am dumping each row of the input csv file into an array (vector), which is NumColumns long. I fetched some code to copy a list to a DataTable, however, I am not sure the IList (IsEnumerable?) is needed. I also haven't looked into what T is.
My gut feeling is that I can go to some other code I have to load a DataTable with row and column data from a 2-dimensional array x(,), but for some reason I think there may be a fast way to simply .add(x), i.e. add the entire row vector to the DataTable to keep the speed up. You don't want to loop through columns(?)
Below is the code which will open up any .csv.
Imports System.ComponentModel
Imports System.IO
Public Class Form1
Dim NumColumns As Integer
Dim ColumnNames() As String
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim filename As String = Nothing
With OpenFileDialog1
.FileName = "*.csv"
.CheckFileExists = True
.ShowReadOnly = True
.Filter = "Comma delimited *.csv|*.csv"
If .ShowDialog = DialogResult.OK Then
filename = .FileName
End If
End With
Dim csvreader As New StreamReader(filename)
Dim inputLine As String = ""
inputLine = csvreader.ReadLine()
Dim buff() As String = Split(inputLine, ",")
NumColumns = UBound(buff)
ReDim ColumnNames(UBound(buff) + 1)
For j As Integer = 0 To NumColumns
ColumnNames(j + 1) = buff(j)
Next
inputLine = csvreader.ReadLine()
Do While inputLine IsNot Nothing
Dim rowdata = New MyDataArray(NumColumns)
Dim csvArray() As String = Split(inputLine, ",")
For i As Integer = 0 To NumColumns
rowdata.x(i) = csvArray(i)
Next
MyDataArray.DataArray.Add(rowdata)
inputLine = csvreader.ReadLine()
Loop
Dim dgv As New DataGridView
dgv.DataSource = ToDataTable(MyDataArray.DataArray)
dgv.Width = 1000
dgv.Height = 1000
Me.Controls.Add(dgv)
End Sub
Public Shared Function ToDataTable(Of T)(data As IList(Of T)) As DataTable
Dim properties As PropertyDescriptorCollection = TypeDescriptor.GetProperties(GetType(T))
Dim dt As New DataTable()
For i As Integer = 0 To properties.Count - 1
Dim [property] As PropertyDescriptor = properties(i)
dt.Columns.Add([property].Name, [property].PropertyType)
Next
Dim values As Object() = New Object(properties.Count - 1) {}
For Each item As T In data
For i As Integer = 0 To values.Length - 1
values(i) = properties(i).GetValue(item)
Next
dt.Rows.Add(values(1))
Next
Return dt
End Function
End Class
Public Class MyDataArray
Public Shared DataArray As New List(Of MyDataArray)()
Public Property x() As Object
Sub New(ByVal cols As Integer)
ReDim x(cols)
End Sub
End Class
Maybe this code will help?
You can use OleDb to treat the CSV file as a database table and load it into a DataTable. All you need is a form and a DataGridView (if you want to print it). Then you can use the code below to accomplish what you need.
Public Class Form1
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim file As String = "test.txt"
Dim path As String = "C:\Test\"
Dim ds As New DataSet
Try
If IO.File.Exists(IO.Path.Combine(path, file)) Then
Dim ConStr As String = _
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & _
path & ";Extended Properties=""Text;HDR=No;FMT=Delimited"""
Dim conn As New OleDb.OleDbConnection(ConStr)
Dim da As New OleDb.OleDbDataAdapter("Select * from " & _
file, conn)
da.Fill(ds, "TextFile")
End If
Catch ex As Exception
MessageBox.Show(ex.ToString)
End Try
DataGridView1.DataSource = ds.Tables(0)
End Sub
End Class
However, you could always convert it to an xml and work from there
Imports System.IO
Module Module3
Public Function _simpleCSV2tbl(CSVfile As String) As DataTable
Dim watch As Stopwatch = Stopwatch.StartNew() ' StartNew already starts the timer
'A,B,C,D,E
'00001,4,1,2,3560
'00002,4,12,1,2000
'00003,1,4,2,4500
'00004,4,12,1,2538.63
'00005,1,1,2,3400
'00006,2,5,2,2996.48
Dim dTable As New DataTable(CSVfile)
Using reader As New StreamReader(CSVfile)
Dim CSV1stLine As String = reader.ReadLine
Dim getCols = (From s In CSV1stLine.Split(",") Select s).ToList()
Dim setTblColumns = (From c In getCols Select dTable.Columns.Add(c, GetType(String))).ToList
Dim ReadToEnd As String = reader.ReadToEnd()
Dim getRows = (From s In ReadToEnd.Split(vbLf) Select s).ToArray
_setTblRows(getRows, dTable)
Console.WriteLine(String.Format("Elapsed: {0} ms, Rows: {1}", Format(watch.Elapsed.TotalMilliseconds, "F"), dTable.Rows.Count))
End Using ' the Using block closes and disposes the reader
_ShowTbl(dTable, 10)
Return dTable
End Function
Public Function _setTblRows(getRows As String(), dTable As DataTable) As IEnumerable
_setTblRows = getRows.Select(Function(r) dTable.LoadDataRow(r.Split(","), False)).ToArray()
End Function
Public Sub _ShowTbl(ByVal dTable As DataTable, Optional ByVal rPad As Short = 0)
If dTable.TableName = Nothing Then dTable.TableName = "NoName"
If rPad = 0 Then
Console.WriteLine(String.Format("->Unformatted Table: {0}, Count={1}", dTable.TableName, dTable.Rows.Count))
Else
Console.WriteLine(String.Format("->Formatted Table: {0}, Count={1}", dTable.TableName, dTable.Rows.Count))
End If
_ShowTblColumns(dTable.Columns, rPad)
_ShowTblRows(dTable.Rows, rPad)
End Sub
Public Function _ShowTblColumns(ByVal TblColumns As DataColumnCollection, Optional ByVal rPad As Short = 0)
Dim getTblColumns = (From c As DataColumn In TblColumns Select c.ColumnName).ToList()
Console.WriteLine(String.Join(",", getTblColumns.Select(Function(s) String.Format(s.PadLeft(rPad, vbNullChar)).ToString).ToArray))
End Function
Public Function _ShowTblRow(ByVal Row As DataRow, Optional ByVal rPad As Short = 0)
Dim getRowFields = (From r In Row.ItemArray Select r).ToList
Console.WriteLine(String.Join(",", getRowFields.Select(Function(s) String.Format(s.PadLeft(rPad, vbNullChar)).ToString).ToArray))
End Function
Public Function _ShowTblRows(ByVal Rows As DataRowCollection, Optional ByVal rPad As Short = 0)
Dim rCount As Integer
For Each row As DataRow In Rows
_ShowTblRow(row, rPad)
rCount += 1
If rCount Mod 20 = 0 Then
If _NewEscape(String.Format(" {0} out of {1}", rCount.ToString, (Rows.Count).ToString)) Then Exit Function
End If
Next row
Console.WriteLine()
End Function
Public Function _NewEscape(ByVal Message As String) As Boolean
Console.Write("{0} / Press any key to continue or Esc to quit {1}", Message, vbCrLf)
Dim rChar As Char = Console.ReadKey(True).KeyChar
Dim rEscape As Boolean
If rChar = Chr(27) Then rEscape = True
Return rEscape
End Function
End Module
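On the original question of adding a whole row at once: DataTable.Rows.Add accepts an array of values, so no per-column loop is needed. Below is a minimal sketch of that idea. The module and function names (CsvLoader, LoadCsv) are made up for illustration, and it assumes a simple comma-delimited file with a header line, no quoted fields, and no row that has more fields than the header.

```vb
Imports System.Data
Imports System.IO

Public Module CsvLoader
    ' Reads a simple comma-delimited file into a DataTable.
    ' The first line supplies the column names; every column is typed as String.
    Public Function LoadCsv(path As String) As DataTable
        Dim table As New DataTable()
        Using reader As New StreamReader(path)
            Dim header As String = reader.ReadLine()
            If header Is Nothing Then Return table ' empty file

            For Each name As String In header.Split(","c)
                table.Columns.Add(name.Trim(), GetType(String))
            Next

            ' Suspend notifications and index maintenance while bulk loading.
            table.BeginLoadData()
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                If line.Length > 0 Then
                    ' The whole field array is added as one row.
                    Dim fields As Object() = line.Split(","c)
                    table.Rows.Add(fields)
                End If
                line = reader.ReadLine()
            End While
            table.EndLoadData()
        End Using
        Return table
    End Function
End Module
```

The result can be bound directly, for example dgv.DataSource = CsvLoader.LoadCsv(filename). Because the file is read line by line, only the DataTable itself has to fit in memory.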

Reporting Progress for a CopyToAsync Operation

Is there a way to report progress on a CopyToAsync operation on a FileStream? As far as I can tell there are no Events listed for a FileStream object so I can't add a handler to it. The best examples I've found deal with DownloadProgressChanged/DownloadFileComplete for WebClient objects.
For i As Int32 = 0 To strFileList.Count - 1
Try
Using srmSource As FileStream = File.Open(dirSource + strFileList(i), FileMode.Open)
Using srmDestination As FileStream = File.Create(dirDestination + strFileList(i))
Me.lblStatus.Text = "Copying file - " & strFileList(i) & "..."
Await srmSource.CopyToAsync(srmDestination)
End Using
End Using
Me.lblStatus.Text = "Copying complete!"
Catch ex As Exception
MessageBox.Show(ex.Message)
End Try
Next
Here's what I came up with using these links as references:
http://blogs.msdn.com/b/dotnet/archive/2012/06/06/async-in-4-5-enabling-progress-and-cancellation-in-async-apis.aspx (converted from C# to VB.NET)
http://social.msdn.microsoft.com/Forums/en-US/8c121fef-ebc7-42ab-a2f8-3b5e9a6e9854/delegates-with-parameter?forum=vbide
Imports System.IO
Imports System.Net
Imports System.Threading.Tasks
Public Class frmStartup
Private Async Sub frmStartup_Load(sender As Object, e As EventArgs) Handles Me.Load
Dim FileList As List(Of String) = GetFilesToTransfer()
If FileList.Count > 0 Then
UpdateLabel("Found files to transfer...")
Me.prgTransfer.Visible = True
Try
Dim ProgressIndicator As IProgress(Of Int32) = New Progress(Of Int32)(AddressOf ReportProgress)
Await TransferFiles(FileList, ProgressIndicator)
UpdateLabel("File transfer complete!")
Catch ex As Exception
UpdateLabel("Error transferring files!")
Finally
Me.prgTransfer.Visible = False
End Try
End If
End Sub
Private Function GetFilesToTransfer() As List(Of String)
Dim strFilesToTransfer As List(Of String) = New List(Of String)
strFilesToTransfer.Add("aud1.mp3")
strFilesToTransfer.Add("aud2.mp3")
Return strFilesToTransfer
End Function
Public Async Function TransferFiles(ByVal FileList As List(Of String), ByVal Progress As IProgress(Of Int32)) As Task
Dim intTotal As Int32 = FileList.Count
Dim dirSource As String = "\\source\"
Dim dirDestination As String = "c:\destination\"
Await Task.Run(Async Function()
Dim intTemp As Int32 = 0
For i As Int32 = 0 To FileList.Count - 1
UpdateLabel("Copying " & FileList(i) & "...")
Using srmSource As FileStream = File.Open(dirSource + FileList(i), FileMode.Open)
Using srmDestination As FileStream = File.Create(dirDestination + FileList(i))
Await srmSource.CopyToAsync(srmDestination)
End Using
End Using
intTemp += 1
If Progress IsNot Nothing Then
Progress.Report(intTemp * 100 \ intTotal) ' integer division keeps the value an Int32
End If
Next
End Function)
End Function
Private Delegate Sub UpdateLabelInvoker(ByVal LabelText As String)
Private Sub UpdateLabel(ByVal LabelText As String)
If Me.lblStatus.InvokeRequired Then
Me.lblStatus.Invoke(New UpdateLabelInvoker(AddressOf UpdateLabel), LabelText)
Else
Me.lblStatus.Text = LabelText
End If
End Sub
Private Sub ReportProgress(ByVal Value As Int32)
Me.prgTransfer.Value = Value
End Sub
End Class
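Note that the code above reports progress once per finished file, not during a single CopyToAsync call. If you need progress within one large file, one option is to replace CopyToAsync with your own read/write loop. The following is a rough sketch of that approach: the names StreamCopy and CopyWithProgressAsync are made up, and it assumes the source stream supports Length (which a FileStream does).

```vb
Imports System
Imports System.IO
Imports System.Threading.Tasks

Public Module StreamCopy
    ' Copies source to destination in chunks and reports the percentage
    ' completed (0-100) after each chunk has been written.
    Public Async Function CopyWithProgressAsync(source As Stream,
                                                destination As Stream,
                                                progress As IProgress(Of Integer),
                                                Optional bufferSize As Integer = 81920) As Task
        Dim buffer(bufferSize - 1) As Byte
        Dim totalBytes As Long = source.Length
        Dim copied As Long = 0

        Dim read As Integer = Await source.ReadAsync(buffer, 0, buffer.Length)
        While read > 0
            Await destination.WriteAsync(buffer, 0, read)
            copied += read
            If progress IsNot Nothing AndAlso totalBytes > 0 Then
                progress.Report(CInt(copied * 100 \ totalBytes))
            End If
            read = Await source.ReadAsync(buffer, 0, buffer.Length)
        End While
    End Function
End Module
```

In the form you would call it with something like Await StreamCopy.CopyWithProgressAsync(srmSource, srmDestination, ProgressIndicator). If the Progress(Of Integer) object is created on the UI thread, its callback runs on the UI thread, so it can update the progress bar directly.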

how to read from text file to textbox in visual basic 1 line every hit on button

I have several .txt (text) files, each containing multiple lines.
I want to write a program that generates fake information.
That is: when someone presses the Generate button, the program reads from a text file and fills a TextBox in Visual Basic.
Every press of the Generate button should make the program generate new information from the text files (.txt).
I have tried many approaches:
Code:
Dim fileReader As String
fileReader = My.Computer.FileSystem.ReadAllText("C:\test.txt")
Code:
Dim fileReader As String
fileReader = My.Computer.FileSystem.ReadAllText("C:\test.txt", _
System.Text.Encoding.UTF32)
and this
Code:
Dim oFile As System.IO.File
Dim oRead As System.IO.StreamReader
oRead = oFile.OpenText("C:\test.txt")
and this
Code:
Dim FILE_NAME As String = "C:\Users\user\Desktop\test.txt"
Dim objReader As New System.IO.StreamReader(FILE_NAME)
TextBox1.Text = objReader.ReadToEnd
Code:
' Form Load :
Dim text As String = MsgBox("text you want to make the form remember it.")
Or new Sub :
Code:
Private Sub Text
text
Code:
' Button Click :
Text()
Code:
Dim path As String = "THE FILE PATH" 'The file path
Dim reader As New IO.StreamReader(path)
Dim lineIndex As Integer = 2 ' The index of the line that you want to read
For i As Integer = 0 To lineIndex - 1
reader.ReadLine()
Next
TextBox1.Text = reader.ReadLine
Public Class Form1
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
TextBox1.Text = ReadLineFromTxt("THE TXT FILE PATH", 0) '0 is the line index
End Sub
Public Shared Function ReadLineFromTxt(ByVal path As String, ByVal lineIndex As Integer) As String
Dim reader As New IO.StreamReader(path)
For I As Integer = 0 To lineIndex - 1
reader.ReadLine()
Next
Return reader.ReadLine()
End Function
End Class
These approaches were taken from members of this forum:
http://www.mpgh.net/forum/33-visual-basic-programming/693165-help-how-can-i-generate-text-txt-file-textbox-when-button-click-2.html
If any of these approaches work, please tell me the best way to use them.
I have Visual Studio 2012 with Update 1.
With all due respect.
Assuming you are reading from a file and displaying lines on the form, you can use these.
If you have a large file (> 10MB), then you can use this pattern... (syntax from memory, please excuse mistype)
Public Class YourFormNameHere
Private _CurrentLine As Integer = 0
Private Sub btnClicked(sender As Object, e As EventArgs) ' or Enter pressed - this is YOUR event handler.
Using sr As New IO.StreamReader(filePath)
Dim curIndex As Integer = 0
' Skip forward to the line for this click, then display it.
While Not sr.EndOfStream
Dim line As String = sr.ReadLine()
If curIndex = _CurrentLine Then
txtLineDisplay.Text = line
_CurrentLine += 1 ' the next click shows the following line
Exit While
End If
curIndex += 1
End While
End Using
End Sub
End Class
If you have a smaller file, then use this pattern.
Public Class YourFormNameHere
Private _Lines As String()
Private _CurrentLine As Integer = 0
Private Sub formLoad(sender As Object, e As EventArgs) ' one-time load event - this is YOUR form load event
_Lines = IO.File.ReadAllLines(filePath)
End Sub
Private Sub btnClicked(sender As Object, e As EventArgs) ' or Enter pressed - this is YOUR event handler.
If _CurrentLine >= _Lines.Length Then Return ' no more lines to show
txtLineDisplay.Text = _Lines(_CurrentLine)
_CurrentLine += 1
End Sub
End Class

Convert byte array to string in VB.net

I have the byte data of .doc, .txt, and .docx files and I want to convert it to a string. I tried the following, but I am not getting the expected result:
Public ByteData As Byte() = Nothing ' my data
Dim str As String = String.Empty
str = System.Text.Encoding.UTF8.GetString(objCandidateInfo.ByteData, 0, objCandidateInfo.ByteData.Length)
str = Convert.ToBase64String(objCandidateInfo.ByteData)
Edited
I am now doing the conversion through the Word application, and this code works.
This is my code:
Private Shared ObjwordApp As Word.Application
Private Shared nullobj As Object = System.Reflection.Missing.Value
Private Shared doc As Word.Document
Shared Sub New()
ObjwordApp = New Word.Application()
End Sub
Public Shared Sub InitializeClass()
ObjwordApp.Visible = False
End Sub
Private Shared Sub OpenWordFile(ByVal StrFilePath As Object)
Try
ObjwordApp.Visible = False
Catch ex As Exception
ObjwordApp = New Word.Application()
End Try
Try
doc = ObjwordApp.Documents.Open(StrFilePath, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj)
Catch ex As Exception
CloseWordFile()
ObjwordApp.Visible = False
End Try
End Sub
Private Shared Sub CopyWordContent()
Try
doc.ActiveWindow.Selection.WholeStory()
doc.ActiveWindow.Selection.Copy()
Catch ex As Exception
Clipboard.Clear()
End Try
End Sub
Private Shared Sub CloseWordFile()
Try
doc.Close()
Catch ex As Exception
End Try
End Sub
Public Shared Function ReadWordFile(ByVal StrFilePath As String, ByVal StrDataFormat As String) As String
Dim StrFileContent = String.Empty
If (File.Exists(StrFilePath)) Then
Try
OpenWordFile(StrFilePath)
CopyWordContent()
Catch ex As Exception
Finally
CloseWordFile()
End Try
Try
Dim dataObj As IDataObject = Clipboard.GetDataObject()
If (dataObj.GetDataPresent(StrDataFormat)) Then
StrFileContent = dataObj.GetData(StrDataFormat)
Else
StrFileContent = ""
End If
Clipboard.Clear()
Catch ex As Exception
End Try
End If
Return StrFileContent
End Function
When I save the byte array to the database, I call the function below to convert it to RTF, but it does not convert. When I attach the debugger, dataObj is Nothing.
code 1
Dim str As String = String.Empty
Try
'str = System.Text.Encoding.UTF8.GetString(objCandidateInfo.ByteData, 0, objCandidateInfo.ByteData.Length)
'str = Convert.ToBase64String(objCandidateInfo.ByteData)
'str = System.Text.Encoding.ASCII.GetString(objCandidateInfo.ByteData, 0, objCandidateInfo.ByteData.Length)
str = ClsDocumentManager.ReadContent(objCandidateInfo.ByteData, DataFormats.Rtf)
Catch ex As Exception
End Try
I save the data in the database in both byte and text form. When I read the saved byte value back from the database and convert it to RTF, it works. The code is:
Code 2
rtbAttachment.Rtf = ClsDocumentManager.ReadContent(byteAttachment, DataFormats.Rtf)
These are the methods in ClsDocumentManager class
Public Shared Function GetRandomNo() As Integer
Dim RandomNo As New Random()
Return RandomNo.Next(Convert.ToInt32(DateTime.Now().Minute.ToString() & DateTime.Now().Second.ToString() & DateTime.Now().Hour.ToString()))
End Function
Public Shared Function ReadContent(ByVal byteArray As Byte(), ByVal StrReadFormat As String) As String
Dim StrFileContent As String = String.Empty
Try
If (Not IsNothing(byteArray)) Then
Dim StrFileName As String = GetRandomNo().ToString() & ".doc"
StrFileName = ClsSingleton.aTempFolderName & StrFileName
If (CreateWordFile(byteArray, StrFileName)) Then
StrFileContent = ClsWordManager.ReadWordFile(StrFileName, StrReadFormat)
If (File.Exists(StrFileName)) Then
File.Delete(StrFileName)
End If
End If
End If
Catch ex As Exception
End Try
Return StrFileContent
End Function
Public Shared Function CreateWordFile(ByVal byteArray As Byte(), ByVal StrFileName As String) As Boolean
Dim boolResult As Boolean = False
Try
If (Not IsNothing(byteArray)) Then
If (Not File.Exists(StrFileName)) Then
Dim objFileStream As New FileStream(StrFileName, FileMode.Create, FileAccess.Write)
objFileStream.Write(byteArray, 0, byteArray.Length)
objFileStream.Close()
boolResult = True
End If
End If
Catch ex As Exception
boolResult = False
End Try
Return boolResult
End Function
Error Code while debugging
Dim dataObj As IDataObject = Clipboard.GetDataObject()
If (dataObj.GetDataPresent(StrDataFormat)) Then
StrFileContent = dataObj.GetData(StrDataFormat)
Else
StrFileContent = ""
End If
`dataObj` is `Nothing` only when calling from **Code 1**
Updated
**`ClsDocumentManager`**
Imports System.IO
Public Class ClsDocumentManager
Public Shared Function GetRandomNo() As Integer
Dim RandomNo As New Random()
Return RandomNo.Next(Convert.ToInt32(DateTime.Now().Minute.ToString() & DateTime.Now().Second.ToString() & DateTime.Now().Hour.ToString()))
End Function
Public Shared Function ReadContent(ByVal byteArray As Byte(), ByVal StrReadFormat As String) As String
Dim StrFileContent As String = String.Empty
Try
If (Not IsNothing(byteArray)) Then
Dim StrFileName As String = GetRandomNo().ToString() & ".doc"
StrFileName = ClsSingleton.aTempFolderName & StrFileName
If (CreateWordFile(byteArray, StrFileName)) Then
StrFileContent = ClsWordManager.ReadWordFile(StrFileName, StrReadFormat)
If (File.Exists(StrFileName)) Then
File.Delete(StrFileName)
End If
End If
End If
Catch ex As Exception
End Try
Return StrFileContent
End Function
Public Shared Function CreateWordFile(ByVal byteArray As Byte(), ByVal StrFileName As String) As Boolean
Dim boolResult As Boolean = False
Try
If (Not IsNothing(byteArray)) Then
If (Not File.Exists(StrFileName)) Then
Dim objFileStream As New FileStream(StrFileName, FileMode.Create, FileAccess.Write)
objFileStream.Write(byteArray, 0, byteArray.Length)
objFileStream.Close()
boolResult = True
End If
End If
Catch ex As Exception
boolResult = False
End Try
Return boolResult
End Function
End Class
Here is my ClsWordManager Class
Imports System.IO
Imports System.Text
Public Class ClsWordManager
Private Shared ObjwordApp As Word.Application
Private Shared nullobj As Object = System.Reflection.Missing.Value
Private Shared doc As Word.Document
Shared Sub New()
ObjwordApp = New Word.Application()
End Sub
Public Shared Sub InitializeClass()
ObjwordApp.Visible = False
End Sub
Private Shared Sub OpenWordFile(ByVal StrFilePath As Object)
Try
ObjwordApp.Visible = False
Catch ex As Exception
ObjwordApp = New Word.Application()
End Try
Try
doc = ObjwordApp.Documents.Open(StrFilePath, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj, nullobj)
Catch ex As Exception
CloseWordFile()
ObjwordApp.Visible = False
End Try
End Sub
Private Shared Sub CopyWordContent()
Try
doc.ActiveWindow.Selection.WholeStory()
doc.ActiveWindow.Selection.Copy()
Catch ex As Exception
Clipboard.Clear()
End Try
End Sub
Private Shared Sub CloseWordFile()
Try
doc.Close()
Catch ex As Exception
End Try
End Sub
Public Shared Function ReadWordFile(ByVal StrFilePath As String, ByVal StrDataFormat As String) As String
Dim StrFileContent = String.Empty
If (File.Exists(StrFilePath)) Then
Try
OpenWordFile(StrFilePath)
CopyWordContent()
Catch ex As Exception
Finally
CloseWordFile()
End Try
Try
Dim dataObj As IDataObject = Clipboard.GetDataObject()
If (dataObj.GetDataPresent(StrDataFormat)) Then
StrFileContent = dataObj.GetData(StrDataFormat)
Else
StrFileContent = ""
End If
Clipboard.Clear()
Catch ex As Exception
End Try
End If
Return StrFileContent
End Function
End Class
So the problem is this: when I convert it in the following code (look at ByteAttachmets in the arguments), the bytes are converted to a string:
Public Function UpdateCandidateAttachment(ByVal CandidateID As Integer, ByVal ByteAttachmets As Byte(), ByVal StrExtension As String) As Integer
Dim Result As Integer = -1
Try
Dim objDataLayer As New ClsDataLayer()
Dim str As String = Nothing
Try
'str = System.Text.Encoding.UTF8.GetString(objCandidateInfo.ByteData, 0, objCandidateInfo.ByteData.Length)
'str = Convert.ToBase64String(objCandidateInfo.ByteData)
'str = System.Text.Encoding.ASCII.GetString(objCandidateInfo.ByteData, 0, objCandidateInfo.ByteData.Length)
str = ClsDocumentManager.ReadContent(ByteAttachmets, DataFormats.Rtf)
Catch ex As Exception
End Try
objDataLayer.AddParameter("#CANDIDATE_ID", CandidateID)
objDataLayer.AddParameter("#ATTACHMENT_DATA", ByteAttachmets)
objDataLayer.AddParameter("#CREATED_BY", ClsCommons.IntUserId)
objDataLayer.AddParameter("#EXTENSION", StrExtension)
Result = objDataLayer.ExecuteNonQuery("TR_PROC_UpdateCandidateAttachment")
Catch ex As Exception
MsgBox(ex.Message)
End Try
Return Result
End Function
But when I call it from the following code through a property (look at objCandidateInfo.ByteData), it does not work:
Public Function AddUpdateCandidate(ByVal objCandidateInfo As ClsCandidateInfo) As Integer
Dim Result As Integer = -1
Try
If (ClsCommons.IsValidEmail(objCandidateInfo.StrEmail)) Then
Dim str As String = Nothing
Try
'str = System.Text.Encoding.UTF8.GetString(objCandidateInfo.ByteData, 0, objCandidateInfo.ByteData.Length)
'str = Convert.ToBase64String(objCandidateInfo.ByteData)
'str = System.Text.Encoding.ASCII.GetString(objCandidateInfo.ByteData, 0, objCandidateInfo.ByteData.Length)
Dim byteAttachment As Byte() = objCandidateInfo.ByteData
str = ClsDocumentManager.ReadContent(byteAttachment, DataFormats.Rtf)
Catch ex As Exception
End Try
Dim objDataLayer As New ClsDataLayer()
objDataLayer.AddParameter("#REQUIREMENT_ID", objCandidateInfo.RequirementId)
objDataLayer.AddParameter("#Candidate_Name", objCandidateInfo.StrCandidateName)
objDataLayer.AddParameter("#Current_Organization", objCandidateInfo.StrCurrentCompany)
objDataLayer.AddParameter("#Current_Designation", objCandidateInfo.StrCurrentDesignation)
If (objCandidateInfo.StrExp.Trim() = "") Then
objDataLayer.AddParameter("#Overall_Exp", DBNull.Value)
Else
Dim DecExp As Decimal = -1
If (Decimal.TryParse(objCandidateInfo.StrExp, DecExp)) Then
objDataLayer.AddParameter("#Overall_Exp", DecExp)
Else
objDataLayer.AddParameter("#Overall_Exp", DBNull.Value)
End If
End If
objDataLayer.AddParameter("#Qualification", objCandidateInfo.StrQualification)
objDataLayer.AddParameter("#Location", objCandidateInfo.StrCurrentLocation)
objDataLayer.AddParameter("#Current_CTC", objCandidateInfo.StrCurrentCTC)
objDataLayer.AddParameter("#Expected_CTC", objCandidateInfo.StrExpectedCTC)
objDataLayer.AddParameter("#Phone_No", objCandidateInfo.StrPhoneNo)
objDataLayer.AddParameter("#Mobile", objCandidateInfo.StrMobile)
objDataLayer.AddParameter("#Notice_Period", objCandidateInfo.StrNoticePeriod)
objDataLayer.AddParameter("#Remarks", objCandidateInfo.StrRemarks)
If (objCandidateInfo.StrYearofExp.Trim() = "") Then
objDataLayer.AddParameter("#Years_of_Experience", DBNull.Value)
Else
Dim DecExp As Decimal = -1
If (Decimal.TryParse(objCandidateInfo.StrYearofExp, DecExp)) Then
objDataLayer.AddParameter("#Years_of_Experience", DecExp)
Else
objDataLayer.AddParameter("#Years_of_Experience", DBNull.Value)
End If
End If
objDataLayer.AddParameter("#Address", objCandidateInfo.StrAddress)
objDataLayer.AddParameter("#Email", objCandidateInfo.StrEmail)
If (objCandidateInfo.intIndustry > 0) Then
objDataLayer.AddParameter("#Industry", objCandidateInfo.intIndustry)
Else
objDataLayer.AddParameter("#Industry", DBNull.Value)
End If
If (objCandidateInfo.intFunctionalArea > 0) Then
objDataLayer.AddParameter("#Functional_Area", objCandidateInfo.intFunctionalArea)
Else
objDataLayer.AddParameter("#Functional_Area", DBNull.Value)
End If
If (objCandidateInfo.StrDob.Trim() = "") Then
objDataLayer.AddParameter("#DOB", DBNull.Value)
Else
Try
objDataLayer.AddParameter("#DOB", Convert.ToDateTime(objCandidateInfo.StrDob))
Catch ex As Exception
objDataLayer.AddParameter("#DOB", DBNull.Value)
End Try
End If
If (objCandidateInfo.intSourceBy > 0) Then
objDataLayer.AddParameter("#Source", objCandidateInfo.intSourceBy)
Else
objDataLayer.AddParameter("#Source", DBNull.Value)
End If
objDataLayer.AddParameter("#SKILL_SET", objCandidateInfo.strSkillSet)
objDataLayer.AddParameter("#ATTACHMENT_DATA", objCandidateInfo.ByteData)
objDataLayer.AddParameter("#EXTENSION", objCandidateInfo.StrExtension)
objDataLayer.AddParameter("#CREATED_BY", ClsCommons.IntUserId)
Result = objDataLayer.ExecuteNonQuery("TR_PROC_AddUpdateFullCandidateData")
Else
MsgBox("Data is not extracted, Some Error Occured, please update your software.")
End If
Catch ex As Exception
MsgBox(ex.Message)
End Try
Return Result
End Function
I hope my question is clear.
(Edited after several changes to question.)
If you only want to get the text content of the file, you need to handle text files and binary files differently. If the input file format is text-based (.txt, .htm, etc.) you can mostly treat it as a string, although you still need to know what encoding to use.
If, however, the input file format is binary (like .doc, .docx, etc.), you cannot just convert your byte array directly to a string, because the file contents do not represent only text: the bytes also describe layout, formatting, and other information about the file. In that case you need to use Word or some other third-party library to handle the file data for you.
To get the content of a Word document using automation, just create an instance of Word.Application, open a document, select all text in its active window and use the Selection.Text property to get the text into a string. Something like:
oDocument.ActiveWindow.Selection.WholeStory()
sText = oDocument.ActiveWindow.Selection.Text
The Selection object behaves much like a Range in Word, and its Text property gives you the plain, unformatted content of the document. You can either convert it to a byte array or use it as a string. To convert it to a byte array, you need to use an encoding, because in-memory characters must be translated to bytes.
If you want to convert your content to RTF format, you need third-party tools (or you have to implement the RTF format yourself). RTF is not a plain text format; it has a fairly complex structure.
You can also use Word to save a document in RTF format: look up the Document.SaveAs2() method to do this. This saves the document to disk in RTF format. If you need this data in a database, just read the .rtf file (File.ReadAllBytes()) and then save the bytes to the database.
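For the text-based case mentioned at the start of this answer, here is a small sketch of decoding a byte array with an explicit encoding. The names TextDecoding and DecodeText are made up for illustration. It relies on StreamReader honoring a byte order mark when one is present and otherwise using the fallback encoding you pass in, so choosing the right fallback is still up to you.

```vb
Imports System.IO
Imports System.Text

Public Module TextDecoding
    ' Decodes the bytes of a text-based file (.txt, .htm, ...) into a String.
    ' A byte order mark, if present, overrides the fallback encoding.
    Public Function DecodeText(data As Byte(), fallback As Encoding) As String
        Using stream As New MemoryStream(data)
            Using reader As New StreamReader(stream, fallback, True)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

For example, DecodeText(objCandidateInfo.ByteData, Encoding.UTF8) would be a reasonable first attempt for a .txt attachment. It will not produce meaningful text for .doc or .docx data, for the reasons given above.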

The best way to extract data from a CSV file into a searchable datastructure?

I have a csv file with 48 columns of data.
I need to open this file, place it into a data structure and then search that data and present it in a DataRepeater.
So far I have successfully used CSVReader to extract the data and bind it to myDataRepeater. However I am now struggling to place the data in a table so that I can filter the results. I do not want to use SQL or any other database.
Does anyone have a suggestion on the best way to do this?
So far, this is working in returning all records:
Private Sub BindCsv()
' open the file "data.csv" which is a CSV file with headers
Dim dirInfo As New DirectoryInfo(Server.MapPath("~/ftp/"))
Dim fileLocation As String = dirInfo.ToString & "data.txt"
Using csv As New CsvReader(New StreamReader(fileLocation), True)
myDataRepeater.DataSource = csv
myDataRepeater.DataBind()
End Using
End Sub
Protected Sub myDataRepeater_ItemDataBound(ByVal sender As Object, ByVal e As RepeaterItemEventArgs) Handles myDataRepeater.ItemDataBound
Dim dataItem As String() = DirectCast(e.Item.DataItem, String())
DirectCast(e.Item.FindControl("lblPropertyName"), ITextControl).Text = dataItem(2).ToString
DirectCast(e.Item.FindControl("lblPrice"), ITextControl).Text = dataItem(7).ToString
DirectCast(e.Item.FindControl("lblPricePrefix"), ITextControl).Text = dataItem(6)
DirectCast(e.Item.FindControl("lblPropertyID"), ITextControl).Text = dataItem(1)
DirectCast(e.Item.FindControl("lblTYPE"), ITextControl).Text = dataItem(18)
DirectCast(e.Item.FindControl("lblBedrooms"), ITextControl).Text = dataItem(8)
DirectCast(e.Item.FindControl("lblShortDescription"), ITextControl).Text = dataItem(37)
Dim dirInfo As New DirectoryInfo(Server.MapPath("~/ftp/images/"))
DirectCast(e.Item.FindControl("imgMain"), Image).ImageUrl = dirInfo.ToString & "pBRANCH_" & dataItem(1) & ".jpg"
DirectCast(e.Item.FindControl("linkMap"), HyperLink).NavigateUrl = "http://www.multimap.com/map/browse.cgi?client=public&db=pc&cidr_client=none&lang=&pc=" & dataItem(5) & "&advanced=&client=public&addr2=&quicksearch=" & dataItem(5) & "&addr3=&addr1="
End Sub
Code added to filter the results:
Try
Dim csv As New CSVFile(fileLocation)
Dim ds As DataSet = csv.ToDataSet("MyTable")
If Not ds Is Nothing Then
Dim strExpr As String = "Bedrooms >= '3'"
Dim strSort As String = "PropertyID ASC"
'Use the Select method to find all rows matching the filter.
Dim myRows() As DataRow
'myRows = Dt.Select(strExpr, strSort)
myRows = csv.ToDataSet("MyTable").Tables("MyTable").Select(strExpr, strSort)
myDataRepeater.DataSource = myRows
myDataRepeater.DataBind()
End If
Catch ex As Exception
End Try
This does return the two rows I am expecting, but when it binds to the DataRepeater I get the following error:
DataBinding: 'System.Data.DataRow' does not contain a property with the name 'PropertyName'.
Corrected code, filter not being applied:
Public Sub PageLoad(ByVal Sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
If Not Page.IsPostBack Then
ReadCsv()
lblSearch.Text = "Lettings Search"
End If
End Sub
Private Sub ReadCsv()
Dim dirInfo As New DirectoryInfo(Server.MapPath("~/ftp/"))
Dim fileLocation As String = dirInfo.ToString & "data.txt"
Try
Dim csv As New CSVFile(fileLocation)
Dim ds As DataSet = csv.ToDataSet("MyTable")
If Not ds Is Nothing Then
myDataRepeater.DataSource = ds
myDataRepeater.DataMember = ds.Tables.Item(0).TableName
myDataRepeater.DataBind()
End If
ds = Nothing
csv = Nothing
Catch ex As Exception
MsgBox(ex.Message)
End Try
End Sub
Protected Sub btnSubmit_Click(ByVal sender As Object, ByVal e As System.Web.UI.ImageClickEventArgs) Handles btnSubmit.Click
Dim rowCount As Integer
rowCount = QueryCsv()
pnlSearch.Visible = False
lblResults.Visible = True
lblSearch.Text = "Search Results"
lblResults.Text = "Your search returned " & rowCount.ToString & " results"
If rowCount > 0 Then
myDataRepeater.Visible = True
pnlResults.Visible = True
btnBack.Visible = True
End If
End Sub
Protected Function QueryCsv() As Integer
Dim dirInfo As New DirectoryInfo(Server.MapPath("~/ftp/"))
Dim fileLocation As String = dirInfo.ToString & "data.txt"
Dim numberofRows As Integer
Try
Dim csv As New CSVFile(fileLocation)
Dim ds As DataSet = csv.ToDataSet("MyTable")
If Not ds Is Nothing Then
Dim strExpr As String = "PropertyID = 'P1005'"
Dim strSort As String = "PropertyID DESC"
Try
ds.Tables.Item(0).DefaultView.RowFilter = strExpr
ds.Tables.Item(0).DefaultView.Sort = strSort
myDataRepeater.DataSource = ds.Tables.Item(0).DefaultView
Catch ex As Exception
End Try
End If
numberofRows = ds.Tables("MyTable").Rows.Count
Catch ex As Exception
End Try
Return numberofRows
End Function
Why not use the built-in TextFieldParser to get the data into a DataTable? Something like Paul Clement's answer in this thread.
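As a rough illustration of that suggestion, here is a minimal sketch that reads a delimited file with Microsoft.VisualBasic.FileIO.TextFieldParser into a DataTable and then filters it through a DataView. The module name CsvTableLoader and the helpers LoadCsv and FilterRows are made up for this example, and it assumes the first row of the file holds the column names and that every column can be treated as a String.

```vbnet
Imports System.Data
Imports Microsoft.VisualBasic.FileIO

Public Module CsvTableLoader
    ' Hypothetical helper: loads a comma-delimited file whose first row
    ' contains the column names. All columns are created as Strings.
    Public Function LoadCsv(ByVal path As String) As DataTable
        Dim table As New DataTable("MyTable")
        Using parser As New TextFieldParser(path)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True

            ' The header row becomes the column names.
            If Not parser.EndOfData Then
                For Each name As String In parser.ReadFields()
                    table.Columns.Add(name, GetType(String))
                Next
            End If

            ' Each remaining row becomes a DataRow. A row with more values
            ' than there are columns makes Rows.Add throw.
            While Not parser.EndOfData
                table.Rows.Add(parser.ReadFields())
            End While
        End Using
        Return table
    End Function

    ' Hypothetical helper: returns a filtered, sorted view of the table.
    Public Function FilterRows(ByVal table As DataTable, ByVal filter As String, ByVal sort As String) As DataView
        Return New DataView(table, filter, sort, DataViewRowState.CurrentRows)
    End Function
End Module
```

With something like this in place, FilterRows(LoadCsv(fileLocation), "PropertyID = 'P1005'", "PropertyID DESC") gives a view that can be assigned to myDataRepeater.DataSource, followed by a call to DataBind(). The view's Count property is the number of matching rows. Items bound from a DataView are DataRowView objects, which expose the columns by name, whereas an array of DataRow does not; that is the likely cause of the "does not contain a property" error. Because the columns are Strings, a numeric comparison would need something like Convert(Bedrooms, 'System.Int32') >= 3 in the filter expression.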
One of the ways I've done this is by using a structure array and reflection.
First, set up your structure in a module: CSVFileFields.vb
Imports System.Reflection
Public Module CSVFileFields
#Region " CSV Fields "
Public Structure CSVFileItem
Dim NAME As String
Dim ADDR1 As String
Dim ADDR2 As String
Dim CITY As String
Dim ST As String
Dim ZIP As String
Dim PHONE As String
Public Function FieldNames() As String()
Dim rtn() As String = Nothing
Dim flds() As FieldInfo = Me.GetType.GetFields(BindingFlags.Instance Or BindingFlags.Public)
If Not flds Is Nothing Then
ReDim rtn(flds.Length - 1)
Dim idx As Integer = -1
For Each fld As FieldInfo In flds
idx += 1
rtn(idx) = fld.Name
Next
End If
Return rtn
End Function
Public Function ToStringArray() As String()
Dim rtn() As String = Nothing
Dim flds() As FieldInfo = Me.GetType.GetFields(BindingFlags.Instance Or BindingFlags.Public)
If Not flds Is Nothing Then
ReDim rtn(flds.Length - 1)
Dim idx As Integer = -1
For Each fld As FieldInfo In flds
idx += 1
rtn(idx) = CStr(fld.GetValue(Me))
Next
End If
Return rtn
End Function
Public Overloads Function ToString(ByVal Delimiter As String) As String
' Join the field values; unlike trimming one trailing character,
' this also works when the delimiter is longer than one character.
Return String.Join(Delimiter, ToStringArray())
End Function
End Structure
#End Region
End Module
Next we will make our own collection out of the structure we just made. This makes it easy to use .Add(), .Remove(), etc. with our structure, and to remove individual items with .RemoveAt(Index). File: CSVFileItemCollection.vb
#Region " CSVFileItem Collection "
Public Class CSVFileItemCollection
Inherits System.Collections.CollectionBase
Public Sub Add(ByVal NewCSVFileItem As CSVFileItem)
Me.List.Add(NewCSVFileItem)
End Sub
Public Sub Remove(ByVal RemoveCSVFileItem As CSVFileItem)
Me.List.Remove(RemoveCSVFileItem)
End Sub
Default Public Property Item(ByVal index As Integer) As CSVFileItem
Get
Return Me.List.Item(index)
End Get
Set(ByVal value As CSVFileItem)
Me.List.Item(index) = value
End Set
End Property
Public Shadows Sub Clear()
MyBase.Clear()
End Sub
Public Shadows Sub RemoveAt(ByVal index As Integer)
' Remove by position; Remove(Item(index)) would delete the first equal item instead.
MyBase.RemoveAt(index)
End Sub
End Class
#End Region
Next you need a class to handle the reflection-based import. File: CSVFile.vb
Imports System.Reflection
Imports System.IO
Imports Microsoft.VisualBasic.PowerPacks
Public Class CSVFile
#Region " Private Variables "
Private _CSVFile As CSVFileItem, _Delimiter As String, _Items As New CSVFileItemCollection
#End Region
#Region " Private Methods "
Private Sub FromString(ByVal Line As String, ByVal Delimiter As String)
Dim CSVFileElements() As String = Line.Split(Delimiter)
If Not CSVFileElements Is Nothing Then
Dim fldInfo() As FieldInfo = _CSVFile.GetType.GetFields(BindingFlags.Instance Or BindingFlags.Public)
If Not fldInfo Is Nothing Then
Dim itm As System.ValueType = CType(_CSVFile, System.ValueType)
For fldIdx As Integer = 0 To CSVFileElements.Length - 1
fldInfo(fldIdx).SetValue(itm, CSVFileElements(fldIdx).Replace(Chr(34), ""))
Next
_CSVFile = itm
Else
Dim itms As Integer = 0
If Not fldInfo Is Nothing Then
itms = fldInfo.Length
End If
Throw New Exception("Invalid line definition.")
End If
Else
Dim itms As Integer = 0
If Not CSVFileElements Is Nothing Then
itms = CSVFileElements.Length
End If
Throw New Exception("Invalid line definition.")
End If
End Sub
#End Region
#Region " Public Methods "
Public Sub New()
_CSVFile = New CSVFileItem
End Sub
Public Sub New(ByVal Line As String, ByVal Delimiter As String)
_CSVFile = New CSVFileItem
_Delimiter = Delimiter
FromString(Line, Delimiter)
End Sub
Public Sub New(ByVal Filename As String)
LoadFile(Filename)
End Sub
Public Sub LoadFile(ByVal Filename As String)
' Using closes the file even if a line fails to parse.
Using inFile As StreamReader = File.OpenText(Filename)
' Peek returns -1 once the end of the file is reached.
Do While inFile.Peek >= 0
FromString(inFile.ReadLine, ",")
_Items.Add(_CSVFile)
_CSVFile = Nothing
Loop
End Using
End Sub
#End Region
#Region " Public Functions "
Public Function ToDataSet(ByVal TableName As String) As DataSet
Dim dsCSV As DataSet = Nothing
If Not _Items Is Nothing AndAlso _Items.Count > 0 Then
Dim flds() As FieldInfo = _Items.Item(0).GetType.GetFields(BindingFlags.Instance Or BindingFlags.Public)
If Not flds Is Nothing Then
dsCSV = New DataSet
dsCSV.Tables.Add(TableName)
For Each fld As FieldInfo In flds
'Add Column Names
With dsCSV.Tables.Item(TableName)
.Columns.Add(fld.Name, fld.FieldType)
End With
Next
'Populate Table with Data
For Each itm As CSVFileItem In _Items
dsCSV.Tables.Item(TableName).Rows.Add(itm.ToStringArray)
Next
End If
End If
Return dsCSV
End Function
#End Region
#Region " Public Properties "
Public ReadOnly Property Item() As CSVFileItem
Get
Return _CSVFile
End Get
End Property
Public ReadOnly Property Items() As CSVFileItemCollection
Get
Return _Items
End Get
End Property
#End Region
End Class
Okay, a little explanation. The class first takes a line of delimited (",") text and splits it into a string array. It then walks the fields of your CSVFileItem structure and fills each one by index. It doesn't matter how many fields you have, 1 or 1,000, so long as the order in which the structure declares them matches the order of the values you are loading. For example, your input CSV should match CSVFileItem as "Name,Address1,Address2,City,State,Zip,Phone". Two caveats: this relies on GetFields returning the fields in declaration order, which is what usually happens in practice but is not guaranteed by the documentation, and the plain Split means a line with more values than the structure has fields, or with a comma inside a quoted value, will not load correctly. The filling is done by this loop from the above code:
Dim fldInfo() As FieldInfo = _CSVFile.GetType.GetFields(BindingFlags.Instance Or BindingFlags.Public)
If Not fldInfo Is Nothing Then
Dim itm As System.ValueType = CType(_CSVFile, System.ValueType)
For fldIdx As Integer = 0 To CSVFileElements.Length - 1
fldInfo(fldIdx).SetValue(itm, CSVFileElements(fldIdx).Replace(Chr(34), ""))
Next
_CSVFile = itm
Else
Dim itms As Integer = 0
If Not fldInfo Is Nothing Then
itms = fldInfo.Length
End If
Throw New Exception("Invalid line definition.")
End If
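The cast to System.ValueType in that loop is easy to overlook, so here is a small standalone sketch of why it matters. The Person structure and the BoxingDemo module are invented for this illustration only. Passing a structure straight to SetValue boxes a temporary copy, so the original never changes, whereas boxing it once into a ValueType variable lets every SetValue call work on the same boxed copy, which is then converted back.

```vbnet
Imports System
Imports System.Reflection

Module BoxingDemo
    ' Hypothetical two-field structure, used only for this illustration.
    Public Structure Person
        Dim NAME As String
        Dim CITY As String
    End Structure

    Sub Main()
        Dim p As New Person
        Dim fld As FieldInfo = GetType(Person).GetField("NAME")

        ' Passing the structure directly boxes a temporary copy,
        ' so the field of p itself is not changed by this call.
        fld.SetValue(p, "Alice")
        Console.WriteLine(p.NAME Is Nothing)

        ' Box once into a ValueType variable, set the field on that
        ' boxed copy, then convert it back to the structure.
        Dim boxed As ValueType = p
        fld.SetValue(boxed, "Alice")
        p = DirectCast(boxed, Person)
        Console.WriteLine(p.NAME)
    End Sub
End Module
```

Declaring the variable as ValueType rather than Object is deliberate: with an Object variable, VB.NET is known to copy the boxed value again when it is passed along, which would discard the change.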
To make things easy, instead of having to load the file from our main class, we can simply pass this class the file path and it will do all of the work and build a collection of our structure. I know this seems like a lot of setup, but it's worth it: you can come back and change the fields of your original structure, and as long as they remain Strings in the same order as the file's columns, the rest of the code will still work.
Now to get everything going, and to see how easy this is to implement with only a few lines of code. File: frmMain.vb
Public Class Form1
Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
Try
Dim csv As New CSVFile("C:\myfile.csv")
Dim ds As DataSet = csv.ToDataSet("MyTable")
If Not ds Is Nothing Then
'Add Labels
Dim lblSize As New Size(60, 22), lblY As Integer = 10, lblX As Integer = 10, lblSpacing As Integer = 10
For Each fldName As String In csv.Items.Item(0).FieldNames
Dim lbl As New Label
lbl.AutoSize = True
lbl.Size = lblSize
lbl.Location = New Point(lblX, lblY)
lbl.Name = "lbl" & fldName
lblX += lbl.Width + lblSpacing
lbl.DataBindings.Add(New Binding("Text", ds.Tables.Item(0), fldName, True))
drMain.ItemTemplate.Controls.Add(lbl)
Next
drMain.DataSource = ds
drMain.DataMember = ds.Tables.Item(0).TableName
End If
ds = Nothing
csv = Nothing
Catch ex As Exception
MsgBox(ex.Message)
End Try
End Sub
End Class
This really makes for some flexible code. You can wrap these in generic classes and call the functions for any structure. You then have reusable code that cuts down on programming time!
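One way the generic idea could look is sketched below. This is not the code from the classes above: the GenericCsv module and its ParseLine and ParseLines functions are hypothetical names, and the sketch constrains T to a class with a parameterless constructor instead of a structure, which sidesteps the boxing step. It assumes every public field of T is a String declared in the same order as the columns.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Reflection

Public Module GenericCsv
    ' Hypothetical generic counterpart of FromString. T must be a class
    ' whose public fields are Strings, in the same order as the columns.
    Public Function ParseLine(Of T As {Class, New})(ByVal line As String, ByVal delimiter As Char) As T
        Dim values() As String = line.Split(delimiter)
        Dim flds() As FieldInfo = GetType(T).GetFields(BindingFlags.Instance Or BindingFlags.Public)

        ' Fail with a clear message instead of running past the last field.
        If values.Length > flds.Length Then
            Throw New FormatException("Line has more values than " & GetType(T).Name & " has fields.")
        End If

        Dim item As New T()
        For i As Integer = 0 To values.Length - 1
            ' Strip double quotes, as the original FromString does.
            flds(i).SetValue(item, values(i).Replace("""", ""))
        Next
        Return item
    End Function

    ' Parses every line in the sequence into a list of T.
    Public Function ParseLines(Of T As {Class, New})(ByVal lines As IEnumerable(Of String), ByVal delimiter As Char) As List(Of T)
        Dim result As New List(Of T)
        For Each line As String In lines
            result.Add(ParseLine(Of T)(line, delimiter))
        Next
        Return result
    End Function
End Module
```

A call such as ParseLines(Of Customer)(System.IO.File.ReadAllLines("C:\myfile.csv"), ","c) would then return a List(Of Customer), where Customer stands for any class of your own with matching String fields. Like the original, it splits naively and does not handle delimiters inside quoted values.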
Edit:
Added the ability to dump the structure collection to a DataSet and then dynamically fill a DataRepeater.
Hope this helps. (I know this was a lot of information and seems like a lot of work, but I guarantee you that once you get this in place, it will really cut down on coding time in future projects!)