How can I make this recursive function more efficient? - vb.net

I created this function to recursively copy an entire directory from an FTP server. It works just fine except that it is about 4 times slower than using FileZilla to do the same operation. It takes approximately 55 seconds to download the directory in FileZilla but it takes 229 seconds with this function. What can I do to make it download/run faster?
Private Sub CopyEntireDirectory(ByVal directory As String)
Dim localPath = localDirectory & formatPath(directory)
'creates directory in destination path
IO.Directory.CreateDirectory(localPath)
'Gets the directory details so I can separate folders from files
Dim fileList As ArrayList = Ftp.ListDirectoryDetails(directory, "")
For Each item In fileList
'checks if it's a folder or file: d=folder
If (item.ToString().StartsWith("d")) Then
'gets the directory from the details
Dim subDirectory As String = item.ToString().Substring(item.ToString().LastIndexOf(" ") + 1)
CopyEntireDirectory(directory & "/" & subDirectory)
Else
Dim remoteFilePath As String = directory & "/" & item.ToString().Substring(item.ToString().LastIndexOf(" ") + 1)
Dim destinationPath = localPath & "\" & item.ToString().Substring(item.ToString().LastIndexOf(" ") + 1)
'downloads file to destination directory
Ftp.DownLoadFile(remoteFilePath, destinationPath)
End If
Next
End Sub
Below is the download function that is taking up all the time.
Public Sub DownLoadFile(ByVal fromFilename As String, ByVal toFilename As String)
Dim files As ArrayList = Me.ListDirectory(fromFilename, "")
Dim request As FtpWebRequest = Me.CreateRequestObject(fromFilename)
request.Method = WebRequestMethods.Ftp.DownloadFile
Dim response As FtpWebResponse = CType(request.GetResponse(), FtpWebResponse)
If response.StatusCode <> FtpStatusCode.OpeningData AndAlso response.StatusCode <> FtpStatusCode.DataAlreadyOpen Then
Throw New ApplicationException(Me.BuildCustomFtpErrorMessage(request, response))
End If
Dim fromFilenameStream As Stream = response.GetResponseStream()
Dim toFilenameStream As FileStream = File.Create(toFilename)
Dim buffer(BLOCK_SIZE) As Byte
Dim bytesRead As Integer = fromFilenameStream.Read(buffer, 0, buffer.Length)
Do While bytesRead > 0
toFilenameStream.Write(buffer, 0, bytesRead)
Array.Clear(buffer, 0, buffer.Length)
bytesRead = fromFilenameStream.Read(buffer, 0, buffer.Length)
Loop
response.Close()
fromFilenameStream.Close()
toFilenameStream.Close()
End Sub

The slowness is almost certainly in the FTP commands. The rest of your recursive code does so little work that it could run a million times per second.
The FTP download (whatever it is) should let you define the size of the chunks it grabs. This is key to your speed. It needs to be tuned to your connection speed and file size; there is no single right number for everyone.
EDIT
Based on the new code, the issue is in your BLOCK_SIZE which I assume is a constant. Play with the size of this to get your optimal speed.
HINT: This should be a multiple of 1024
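A minimal sketch of a leaner download routine, for illustration only. Three things stand out in the code as posted: DownLoadFile calls Me.ListDirectory for every file and never uses the result (an extra FTP round trip per file), Array.Clear on the buffer is unnecessary because only bytesRead bytes are ever written, and Dim buffer(BLOCK_SIZE) allocates BLOCK_SIZE + 1 bytes because VB array bounds are inclusive. The names BufferSize, CopyStream and DownloadFile, the 64 KB value, and the credentials parameter are assumptions of mine; the original's CreateRequestObject is not shown, so the request is built directly here.

```vbnet
Imports System.IO
Imports System.Net

Module FtpDownloadSketch
    ' Hypothetical starting point; measure and tune for your connection.
    Private Const BufferSize As Integer = 64 * 1024

    ' Copies everything from input to output in BufferSize chunks.
    Public Sub CopyStream(input As Stream, output As Stream)
        ' BufferSize - 1 as the upper bound gives exactly BufferSize bytes.
        Dim buffer(BufferSize - 1) As Byte
        Dim bytesRead As Integer = input.Read(buffer, 0, buffer.Length)
        Do While bytesRead > 0
            ' Only the bytes actually read are written, so the buffer
            ' never needs to be cleared between reads.
            output.Write(buffer, 0, bytesRead)
            bytesRead = input.Read(buffer, 0, buffer.Length)
        Loop
    End Sub

    ' Downloads one file with a single FTP request (no directory listing first).
    Public Sub DownloadFile(remoteUri As String, localPath As String,
                            credentials As NetworkCredential)
        Dim request As FtpWebRequest =
            DirectCast(WebRequest.Create(remoteUri), FtpWebRequest)
        request.Method = WebRequestMethods.Ftp.DownloadFile
        request.Credentials = credentials
        request.UseBinary = True
        ' Lets the control connection be reused by later requests to the same server.
        request.KeepAlive = True

        ' Using blocks close the response and both streams even if an exception occurs.
        Using response As FtpWebResponse =
                DirectCast(request.GetResponse(), FtpWebResponse)
            Using source As Stream = response.GetResponseStream()
                Using target As FileStream = File.Create(localPath)
                    CopyStream(source, target)
                End Using
            End Using
        End Using
    End Sub
End Module
```

If the per-file round trips turn out to dominate, dropping the unused listing call is likely to help more than any particular buffer size, but that is something to measure rather than assume.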

Related

How to convert encoding of FTP Getlisting array of strings?

I am using the following vb code to get the list of files in a ftp directory and populate a database table with it to be used in another integration process. Please forgive my bad bad programming skills (I am not a vb.net developer).
Public Sub Main()
Dim StrFolderArrary As String() = Nothing
Dim StrFileArray As String() = Nothing
Dim fileName As String
Dim RemotePath As String
RemotePath = Dts.Variables("User::FTPFullPath").Value.ToString()
Dim ADODBConnection As SqlClient.SqlConnection
ADODBConnection = DirectCast(Dts.Connections("DB_Connection").AcquireConnection(Dts.Transaction), SqlClient.SqlConnection)
Dim cm As ConnectionManager = Dts.Connections("FTP_Connection") 'FTP connection manager name
Dim ftp As FtpClientConnection = New FtpClientConnection(cm.AcquireConnection(Nothing))
ftp.Connect() 'Connecting to FTP Server
ftp.SetWorkingDirectory(RemotePath) 'Provide the Directory on which you are working on FTP Server
ftp.GetListing(StrFolderArrary, StrFileArray) 'Get all the files and Folders List
'If there are no files in the folder, StrFileArray will be Nothing, so just close the connection.
If StrFileArray Is Nothing Then
ftp.Close()
'If there are files, loop through StrFileArray and insert each name into the table
Else
For Each fileName In StrFileArray
'MessageBox.Show(fileName)
Dim SQLCommandText As String
SQLCommandText = "INSERT INTO dbo._FTPFileList ([DirName],[FileName]) VALUES (N'" + RemotePath + "', N'" + fileName + "')"
'MessageBox.Show(SQLCommandText)
Dim cmdDatabase As SqlCommand = New SqlCommand(SQLCommandText, ADODBConnection)
cmdDatabase.ExecuteNonQuery()
Next
ftp.Close()
End If
' Add your code here
'
Dts.TaskResult = ScriptResults.Success
End Sub
It works fine and I get the results in the database table. The problem is that the encoding of the strings coming from the FTP server causes file names with accented characters to be written incorrectly, as shown in the example below.
[screenshot: database table]
The correct file name is Razão and I know that the db collation is correct since it can be written like this.
So I tried to convert the strings using this code for each file name in the string array but without any success.
For Each fileName In StrFileArray
Dim utf8 As UTF8Encoding = New UTF8Encoding(True, True)
Dim bytes As Byte() = New Byte(utf8.GetByteCount(fileName) + utf8.GetPreamble().Length - 1) {}
Array.Copy(utf8.GetPreamble(), bytes, utf8.GetPreamble().Length)
utf8.GetBytes(fileName, 0, fileName.Length, bytes, utf8.GetPreamble().Length)
Dim fileName2 As String = utf8.GetString(bytes, 0, bytes.Length)
I believe it is coming with different encoding from the FTP side so I would like to know how to convert the strings during the GetListing method.
Or do you have any ideas how to deal with this?
Thanks in advance.
edit:
I also tried the following code without success.
Dim utf8 As Encoding = Encoding.UTF8
Dim w1252 As Encoding = Encoding.GetEncoding(1252)
Dim w1252Bytes As Byte() = w1252.GetBytes(fileName)
Dim utf8Bytes As Byte() = Encoding.Convert(w1252, utf8, w1252Bytes)
Dim utf8Chars As Char() = New Char(utf8.GetCharCount(utf8Bytes, 0, utf8Bytes.Length) - 1) {}
utf8.GetChars(utf8Bytes, 0, utf8Bytes.Length, utf8Chars, 0)
Dim fileName2 As String = New String(utf8Chars)

How can I improve the efficiency of my simple file-splitting program

I have a simple program that reads a .txt file and splits it into many files of "pMaxRows" rows each. These .txt files are huge; some are nearly 25 GB. Right now it is not running fast enough for my liking. I feel there should be a way to improve efficiency, maybe by reading/writing multiple lines at once, but I am not very experienced with the VB.NET StreamReader/StreamWriter.
Code is below:
Public Sub Execute(ByVal pFileLocation As String, _
ByVal pMaxRows As Int32)
Dim sr As IO.StreamReader
Dim Row As String
Dim SourceRowCount As Int64
Dim TargetRowCount As int64
Dim TargetFileNumber As Int32
''Does the file exist in that location?
If IO.File.Exists(pFileLocation) = False Then
Throw New Exception("File does not exist at " & pFileLocation)
End If
''Split FileLocation into FileName and Folder Location
Dim arrFileLoc() As String = pFileLocation.Split("\")
Dim i As Integer = arrFileLoc.Length - 1
Dim FileName As String = arrFileLoc(i)
Dim FileLocationLength As Integer = pFileLocation.Length
Dim FileNameLength As Integer = FileName.Length
Dim Folder As String = pFileLocation.Remove(FileLocationLength - FileNameLength, FileNameLength)
''Read the file
sr = New IO.StreamReader(pFileLocation)
SourceRowCount = 0
TargetRowCount = 0
TargetFileNumber = 1
''Create First Target File Name
Dim TargetFileName As String
TargetFileName = TargetFileNumber & "_" & FileName
''Open streamreader and start reading lines
Do While Not sr.EndOfStream
''if it hits the target number of rows:
If (TargetRowCount = pMaxRows) Then
''Advance target file number
TargetFileNumber += 1
''Create New file with target file number
TargetFileName = TargetFileNumber & "_" & FileName
''Set target row count back to 0
TargetRowCount = 0
End If
''Read line
Row = sr.ReadLine()
''Write line
Using sw As New StreamWriter(Folder & TargetFileName, True)
sw.WriteLine(Row)
End Using
SourceRowCount += 1
TargetRowCount += 1
Loop
End Sub
Anyone have any suggestions? Even directing me to the right place if this has been answered before would be much appreciated
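One thing visible in the code above is that it opens and closes a StreamWriter for every single row. A minimal sketch of the same split with one writer kept open per target file follows, on the assumption that the per-row open/close is the main cost (I have not measured it). SplitFile and its parameter names are hypothetical, and unlike the original it overwrites existing target files instead of appending to them.

```vbnet
Imports System.IO

Module FileSplitSketch
    ' Splits sourcePath into files named 1_<name>, 2_<name>, ... of at most
    ' maxRows lines each, written next to the source file.
    ' Returns the number of target files created.
    Public Function SplitFile(sourcePath As String, maxRows As Integer) As Integer
        Dim folder As String = Path.GetDirectoryName(Path.GetFullPath(sourcePath))
        Dim fileName As String = Path.GetFileName(sourcePath)
        Dim fileNumber As Integer = 0
        Dim rowsInTarget As Integer = 0
        Dim writer As StreamWriter = Nothing
        Try
            Using reader As New StreamReader(sourcePath)
                Dim row As String = reader.ReadLine()
                Do While row IsNot Nothing
                    ' Open a new target only for the first row and
                    ' whenever the current target is full.
                    If writer Is Nothing OrElse rowsInTarget = maxRows Then
                        If writer IsNot Nothing Then writer.Dispose()
                        fileNumber += 1
                        Dim targetPath As String =
                            Path.Combine(folder, fileNumber.ToString() & "_" & fileName)
                        writer = New StreamWriter(targetPath, False)
                        rowsInTarget = 0
                    End If
                    writer.WriteLine(row)
                    rowsInTarget += 1
                    row = reader.ReadLine()
                Loop
            End Using
        Finally
            ' Dispose flushes the last buffered rows to disk.
            If writer IsNot Nothing Then writer.Dispose()
        End Try
        Return fileNumber
    End Function
End Module
```

StreamReader and StreamWriter already buffer internally, so reading line by line is usually fine; the saving here comes from not reopening the target file on every row.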

How to get the file's size using FtpWebResponse

I am trying to download a file from an FTP server. To show download progress I plan to use a BackgroundWorker; the following code displays the download progress, speed, and number of KB downloaded in the UI.
The following is the code I wrote in backgroundWorker_doWork:
'Creating the request and getting the response
Dim theResponse As FtpWebResponse
Dim theRequest As FtpWebRequest
Try
'Checks if the file exist
theRequest = WebRequest.Create(Me.txtFileName.Text)
theResponse = theRequest.GetResponse
Catch ex As Exception
MessageBox.Show("An error occurred while downloading file. Possible causes:" & ControlChars.CrLf & _
"1) File doesn't exist" & ControlChars.CrLf & _
"2) Remote server error", "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
Dim cancelDelegate As New DownloadCompleteSafe(AddressOf DownloadComplete)
Me.Invoke(cancelDelegate, True)
Exit Sub
End Try
Dim length As Long = theResponse.ContentLength 'Size of the response (in bytes)
Dim safedelegate As New ChangeTextsSafe(AddressOf ChangeTexts)
Me.Invoke(safedelegate, length, 0, 0, 0) 'Invoke the thread-safe delegate
Dim writeStream As New IO.FileStream(Me.whereToSave, IO.FileMode.Create)
'Replacement for Stream.Position (webResponse stream doesn't support seek)
Dim nRead As Integer
'To calculate the download speed
Dim speedtimer As New Stopwatch
Dim currentspeed As Double = -1
Dim readings As Integer = 0
Do
If BackgroundWorker1.CancellationPending Then 'If user abort download
Exit Do
End If
speedtimer.Start()
Dim readBytes(4095) As Byte
Dim bytesread As Integer = theResponse.GetResponseStream.Read(readBytes, 0, 4096)
nRead += bytesread
Dim percent As Short = (nRead * 100) / length
Me.Invoke(safedelegate, length, nRead, percent, currentspeed)
If bytesread = 0 Then Exit Do
writeStream.Write(readBytes, 0, bytesread)
speedtimer.Stop()
readings += 1
If readings >= 5 Then 'For better precision, the speed is calculated only every five cycles
currentspeed = 20480 / (speedtimer.ElapsedMilliseconds / 1000)
speedtimer.Reset()
readings = 0
End If
Loop
'Close the streams
theResponse.GetResponseStream.Close()
writeStream.Close()
If Me.BackgroundWorker1.CancellationPending Then
IO.File.Delete(Me.whereToSave)
Dim cancelDelegate As New DownloadCompleteSafe(AddressOf DownloadComplete)
Me.Invoke(cancelDelegate, True)
Exit Sub
End If
Dim completeDelegate As New DownloadCompleteSafe(AddressOf DownloadComplete)
Me.Invoke(completeDelegate, False)
The following is the error I got:
Dim length As Long = theResponse.ContentLength returns -1
For example,
ftp://username:mypassword#ftp.drivehq.com/masters/5/party/party.csv
is the FTP URL I am passing to download the file; party.csv is the file to download.
NOTE: The problem occurs when getting the size of the file to download (party.csv in my case). theResponse.ContentLength returns -1, although the actual size of the file is 420 KB. If I hard-code the value,
Dim length As Long = 430080 '420 KB = 430080 bytes
the code works.
I found the following solution, which gets the file's exact size:
Dim request As FtpWebRequest = DirectCast(WebRequest.Create(Me.txtFileName.Text), FtpWebRequest)
request.Method = WebRequestMethods.Ftp.GetFileSize
Dim response As FtpWebResponse = DirectCast(request.GetResponse(), FtpWebResponse)
Dim fileSize As Long = response.ContentLength
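For illustration, the same size lookup can be wrapped in a small helper that also closes the response. This is only a sketch: GetFtpFileSize and CreateSizeRequest are hypothetical names, and it assumes the credentials are embedded in the URL as in the example above.

```vbnet
Imports System.Net

Module FtpSizeSketch
    ' Builds a request that asks the server for the file size
    ' instead of the file contents.
    Public Function CreateSizeRequest(fileUri As String) As FtpWebRequest
        Dim request As FtpWebRequest =
            DirectCast(WebRequest.Create(fileUri), FtpWebRequest)
        request.Method = WebRequestMethods.Ftp.GetFileSize
        Return request
    End Function

    ' Returns the size in bytes as reported by the server.
    Public Function GetFtpFileSize(fileUri As String) As Long
        Dim request As FtpWebRequest = CreateSizeRequest(fileUri)
        ' Using closes the response once the size has been read.
        Using response As FtpWebResponse =
                DirectCast(request.GetResponse(), FtpWebResponse)
            Return response.ContentLength
        End Using
    End Function
End Module
```

In the DoWork handler this would be called once before the download request, for example Dim length As Long = GetFtpFileSize(Me.txtFileName.Text), and the result used in place of theResponse.ContentLength when computing the percentage.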

Can't write to files properly in VB.NET

I am trying to make a program that writes the binary code of a file in a separate text file.
When I use this program on a text file, it doesn't write anything in the new file. I then tested this for .jpg and .mp3 files and the program seems to write most of the binary code but leaves out the last couple of bytes. Here is my code:
Sub Main()
Console.Write("Filename: ")
Dim Filename As String = Console.ReadLine()
Console.Write("Extension: ")
Dim Extension As String = Console.ReadLine()
Console.WriteLine()
Dim Stream_1 As FileStream = New FileStream(Filename & "." & Extension, FileMode.Open)
Dim Stream_2 As FileStream = New FileStream(Filename & "_b.txt", FileMode.Create)
Dim Reader_1 As BinaryReader = New BinaryReader(Stream_1)
Dim Writer_2 As StreamWriter = New StreamWriter(Stream_2)
Dim File_Bytes() As Byte = Reader_1.ReadBytes(Convert.ToInt32(Stream_1.Length))
Dim Binary_String As String = ""
'These are used to a add line break after every 8 bytes
Dim Binary_String_Collection As String = ""
Dim Counter As Integer
For Each File_Byte In File_Bytes
Counter += 1
Binary_String = Convert.ToString(File_Byte, 2)
For I = 1 To 8 - Binary_String.Length
Binary_String = "0" & Binary_String
Next
Binary_String_Collection = Binary_String_Collection & Binary_String & " "
If Counter = 8 Then
Writer_2.WriteLine(Binary_String_Collection)
Counter = 0
Binary_String_Collection = ""
End If
Next
If Binary_String_Collection <> "" Then
Writer_2.WriteLine(Binary_String_Collection)
End If
Console.ReadLine()
End Sub
At first I thought that my program wasn't reading the binary code properly so I added console outputs at locations where it writes to the file. The program displayed correct output so I'm confused why it isn't writing properly.
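Make sure you close the file and dispose of the streams correctly. StreamWriter buffers its output, so whatever is still sitting in the buffer when the program ends is never written unless the writer is flushed, closed, or disposed.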
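A minimal sketch of what that could look like, with the writer wrapped in a Using block so it is flushed and closed automatically. WriteBinaryDump and its parameters are hypothetical names, and the console prompts from the question are left out.

```vbnet
Imports System.IO
Imports System.Text

Module BinaryDumpSketch
    ' Writes each byte of sourcePath as 8 binary digits to targetPath,
    ' eight bytes per line, each byte followed by a space.
    Public Sub WriteBinaryDump(sourcePath As String, targetPath As String)
        Dim fileBytes As Byte() = File.ReadAllBytes(sourcePath)
        ' End Using flushes the writer's buffer and closes the file,
        ' which is the step the original code is missing.
        Using writer As New StreamWriter(targetPath, False)
            Dim line As New StringBuilder()
            For i As Integer = 0 To fileBytes.Length - 1
                ' Convert to base 2 and left-pad with zeros to 8 digits.
                line.Append(Convert.ToString(fileBytes(i), 2).PadLeft(8, "0"c))
                line.Append(" "c)
                If (i + 1) Mod 8 = 0 Then
                    writer.WriteLine(line.ToString())
                    line.Clear()
                End If
            Next
            ' Write the last, partially filled line if there is one.
            If line.Length > 0 Then writer.WriteLine(line.ToString())
        End Using
    End Sub
End Module
```

This also explains why a small text file produced no output at all in the original: its whole dump fit inside the writer's buffer and was never flushed.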

VB.NET - download zip in Memory and extract file from memory to disk

I'm having some trouble with this, despite finding examples. I think it may be an encoding problem, but I'm just not sure. I am trying to programmatically download a file from an HTTPS server that uses cookies (hence I'm using HttpWebRequest). I'm debug-printing the capacity of the streams to check, but the output [raw] files look different. I have tried other encodings to no avail.
Code:
Sub downloadzip(strURL As String, strDestDir As String)
Dim request As HttpWebRequest
Dim response As HttpWebResponse
request = Net.HttpWebRequest.Create(strURL)
request.UserAgent = strUserAgent
request.Method = "GET"
request.CookieContainer = cookieJar
response = request.GetResponse()
If response.ContentType = "application/zip" Then
Debug.WriteLine("Is Zip")
Else
Debug.WriteLine("Is NOT Zip: is " + response.ContentType.ToString)
Exit Sub
End If
Dim intLen As Int64 = response.ContentLength
Debug.WriteLine("response length: " + intLen.ToString)
Using srStreamRemote As StreamReader = New StreamReader(response.GetResponseStream(), Encoding.Default)
'Using ms As New MemoryStream(intLen)
Dim fullfile As String = srStreamRemote.ReadToEnd
Dim memstream As MemoryStream = New MemoryStream(New UnicodeEncoding().GetBytes(fullfile))
'test write out to file
Dim data As Byte() = memstream.ToArray()
Using filestrm As FileStream = New FileStream("c:\temp\debug.zip", FileMode.Create)
filestrm.Write(data, 0, data.Length)
End Using
Debug.WriteLine("Memstream capacity " + memstream.Capacity.ToString)
'Dim strData As String = srStreamRemote.ReadToEnd
memstream.Seek(0, 0)
Dim buffer As Byte() = New Byte(2048) {}
Using zip As New ZipInputStream(memstream)
Debug.WriteLine("zip stream cap " + zip.Length.ToString)
zip.Seek(0, 0)
Dim e As ZipEntry
Dim flag As Boolean = True
Do While flag ' daft, but won't assign e=zip... tries to evaluate
e = zip.GetNextEntry
If IsNothing(e) Then
flag = False
Exit Do
Else
e.UseUnicodeAsNecessary = True
End If
If Not e.IsDirectory Then
Debug.WriteLine("Writing out " + e.FileName)
' e.Extract(strDestDir)
Using output As FileStream = File.Open(Path.Combine(strDestDir, e.FileName), _
FileMode.Create, FileAccess.ReadWrite)
Dim n As Integer
Do While (n = zip.Read(buffer, 0, buffer.Length) > 0)
output.Write(buffer, 0, n)
Loop
End Using
End If
Loop
End Using
'End Using
End Using 'srStreamRemote.Close()
response.Close()
End Sub
So I get the right size file downloaded, but dotnetzip does not recognise it, and the files that get copied out are incomplete/invalid zips. I've spent most of today on this, and am ready to give up.
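I think the answer will be to break down the problem and perhaps change a couple of aspects of the code.
For example, let's get rid of converting the response stream to a string. A zip file is binary data, and decoding it to a String with one encoding and re-encoding it with another is not guaranteed to give back the original bytes: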
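Dim memStream As MemoryStream
Using rdr As System.IO.Stream = response.GetResponseStream()
'Assumes the server sent a Content-Length header (ContentLength is -1 otherwise)
Dim count = Convert.ToInt32(response.ContentLength)
'VB array bounds are inclusive, so count - 1 allocates exactly count bytes
Dim buffer = New Byte(count - 1) {}
Dim bytesRead As Integer
Dim n As Integer
Do
n = rdr.Read(buffer, bytesRead, count - bytesRead)
bytesRead += n
'Stop when everything has arrived, or when the stream ends early (Read returns 0)
Loop Until bytesRead = count OrElse n = 0
'Wrap only the bytes that were actually read
memStream = New MemoryStream(buffer, 0, bytesRead)
End Using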
Next, there's an easier way to output the contents of a memory stream to a file. Consider your code
Dim data As Byte() = memstream.ToArray()
Using filestrm As FileStream = New FileStream("c:\temp\debug.zip", FileMode.Create)
filestrm.Write(data, 0, data.Length)
End Using
can be replaced with
Using filestrm As FileStream = New FileStream("c:\temp\debug.zip", FileMode.Create)
memstream.WriteTo(filestrm)
End Using
That eliminates the need to transfer your memory stream into another byte array, and then push the byte array down the stream, when in fact the memory stream can transfer data directly to file (via the filestream) saving the middle-man buffer.
I'll admit I haven't worked with the Zip/compression libraries you're using, but with the above amendments you have removed unnecessary transfers between streams, byte arrays, strings, etc, and hopefully eliminated the encoding issues you were having.
Give that a try and let us know how you get on. Consider attempting to open the file that you saved ("C:\temp\debug.zip") to see if it is listed as corrupt. If not, then you know at least as far as that in the code, it is working ok.
I thought I'd post my full working solution to my own question, it combines the two excellent replies I've had, thank you guys.
Sub downloadzip(strURL As String, strDestDir As String)
Try
Dim request As HttpWebRequest
Dim response As HttpWebResponse
request = Net.HttpWebRequest.Create(strURL)
request.UserAgent = strUserAgent
request.Method = "GET"
request.CookieContainer = cookieJar
response = request.GetResponse()
If response.ContentType = "application/zip" Then
Debug.WriteLine("Is Zip")
Else
Debug.WriteLine("Is NOT Zip: is " + response.ContentType.ToString)
Exit Sub
End If
Dim intLen As Int32 = response.ContentLength
Debug.WriteLine("response length: " + intLen.ToString)
Dim memStream As MemoryStream
Using stmResponse As IO.Stream = response.GetResponseStream()
'Using ms As New MemoryStream(intLen)
'VB array bounds are inclusive: intLen - 1 allocates exactly intLen bytes
Dim buffer = New Byte(intLen - 1) {}
'Dim memstream As MemoryStream = New MemoryStream(buffer)
Dim bytesRead As Integer
Do
bytesRead += stmResponse.Read(buffer, bytesRead, intLen - bytesRead)
Loop Until bytesRead = intLen
memStream = New MemoryStream(buffer)
Dim res As Boolean = False
res = ZipExtracttoFile(memStream, strDestDir)
End Using 'srStreamRemote.Close()
response.Close()
Catch ex As Exception
'to do :)
End Try
End Sub
Function ZipExtracttoFile(strm As MemoryStream, strDestDir As String) As Boolean
Try
Using zip As ZipFile = ZipFile.Read(strm)
For Each e As ZipEntry In zip
e.Extract(strDestDir)
Next
End Using
Catch ex As Exception
Return False
End Try
Return True
End Function
You can download into a MemoryStream, then examine it:
Public Sub Download(url as String)
Dim req As HttpWebRequest = System.Net.WebRequest.Create(url)
req.Method = "GET"
Dim resp As HttpWebResponse = req.GetResponse()
If resp.ContentType = "application/zip" Then
Console.Error.Write("The result is a zip file.")
Dim length As Int64 = resp.ContentLength
If length = -1 Then
Console.Error.WriteLine("... length unspecified")
length = 16 * 1024
Else
Console.Error.WriteLine("... has length {0}", length)
End If
Dim ms As New MemoryStream
CopyStream(resp.GetResponseStream(), ms) '' **see note below!!!!
'' list contents of the zip file
ms.Seek(0,SeekOrigin.Begin)
Using zip As ZipFile = ZipFile.Read (ms)
Dim e As ZipEntry
Console.Error.WriteLine("Entries:")
Console.Error.WriteLine(" {0,22} {1,10} {2,12}", _
"Name", "compressed", "uncompressed")
Console.Error.WriteLine("----------------------------------------------------")
For Each e In zip
Console.Error.WriteLine(" {0,22} {1,10} {2,12}", _
e.FileName, _
e.CompressedSize, _
e.UncompressedSize)
Next
End Using
Else
Console.Error.WriteLine("The result is Not a zip file.")
CopyStream(resp.GetResponseStream(), Console.OpenStandardOutput)
End If
End Sub
Private Shared Sub CopyStream(input As Stream, output As Stream)
Dim buffer(32768 - 1) As Byte
Dim n As Int32
Do
n = input.Read(buffer, 0, buffer.Length)
If n = 0 Then Exit Do
output.Write(buffer, 0, n)
Loop
End Sub
EDIT
Just one note - I would not advise using this code (this approach) if the zip file is very large. How large is "very large"? Well, that depends, of course. The code I suggested above downloads the file into a memory stream, which of course means the entire contents of the zip file are held in memory. If it is a 28 KB zip file, then there's no problem. But if it is a 2 GB zip file, then you may have a big problem.
In that case you will want to stream it to a temporary file on disk, not to a MemoryStream. I'll leave that as an exercise for the reader.
The above will work for "reasonably sized" zip files, where "reasonable" depends on your machine configuration and application scenario.
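For the large-file case, a minimal sketch of the temporary-file variant might look like the following. CopyToTempFile is a hypothetical helper name, and Stream.CopyTo assumes .NET Framework 4 or later (on older versions the CopyStream routine above does the same job).

```vbnet
Imports System.IO

Module TempFileDownloadSketch
    ' Streams input to a new temporary file and returns that file's path.
    ' Only one internal copy buffer is held in memory at a time.
    Public Function CopyToTempFile(input As Stream) As String
        ' GetTempFileName creates an empty, uniquely named file in the temp folder.
        Dim tempPath As String = Path.GetTempFileName()
        Using output As FileStream = File.Create(tempPath)
            input.CopyTo(output)
        End Using
        Return tempPath
    End Function
End Module
```

In the Download routine above this would replace the MemoryStream: call Dim zipPath As String = CopyToTempFile(resp.GetResponseStream()), open the archive from zipPath with the zip library (DotNetZip's ZipFile.Read also accepts a file name), and delete the temporary file with File.Delete once you are finished with it.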