I am writing a program that writes data to a text file at different points in my code, for example in different subroutines, functions, or at different parts of subroutines (the calls are scattered around).
First, I Dim the file writer:
Dim CurrentHisWriter As System.IO.StreamWriter
I tell it where to write to:
CurrentHisWriter = New System.IO.StreamWriter("C:\ProgramData\Japanese Conjugation Helper\LastSearch.txt")
Then, I actually write things:
CurrentHisWriter.WriteLine("thing to write")
The problem is that I then have to move to a different subroutine and keep on writing to the same file, so I have to close the writer and then Dim another one in the other subroutine:
CurrentHisWriter.Close()
NewSubroutine()
[NewSubroutine]:
Dim CurrentHisWriter As System.IO.StreamWriter
CurrentHisWriter = New System.IO.StreamWriter("C:\ProgramData\Japanese Conjugation Helper\LastSearch.txt")
But then when I do this, it gives me one of a couple of errors:
The program already has an instance of the file running
Something to do with there being no object (I don't remember exactly)
What is a reliable way of writing to files without having to worry about closing the writer at every point where I change subroutines? I'm not sure how objects and instances work, so the only thing I can do right now is put a Try/Catch around every single line that uses CurrentHisWriter.WriteLine, but that isn't really working either.
I know my lack of knowledge in this doesn't help explain, but I tried my best.
The naive approach would be like:
Sub Main()
    MethodA()
    MethodB()
End Sub

Sub MethodA()
    Log("Starting method A")
End Sub

Sub MethodB()
    Log("Starting method B")
End Sub

Sub Log(message As String)
    ' AppendAllText opens the file, appends, and closes it again in one call.
    ' Add the newline yourself; AppendAllText doesn't do it for you.
    System.IO.File.AppendAllText("C:\temp\my.log", message & Environment.NewLine)
End Sub
File.AppendAllText opens the file, writes, and closes it again in one call, so you can subsequently write to the same file from anywhere else.
A better approach would be to have a class whose job it is to build this file: it accumulates everything into a StringBuilder and then writes it out once, and multiple of your methods use that class to build the file. The class can either implement some timed/periodic dumping of data to disk (if it's like logging, never ending, thousands of events per second... but then perhaps you'd just use a logging framework rather than reinvent the wheel), or it can have a Save method that writes the rendered content to disk.
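For example, a rough sketch of such a class (the names ReportBuilder, AppendLine and Save are just illustrative):

Imports System.Text

' Collects output in memory; nothing touches the disk until Save is called.
Public Class ReportBuilder
    Private ReadOnly _sb As New StringBuilder()

    Public Sub AppendLine(text As String)
        _sb.AppendLine(text)
    End Sub

    ' Writes everything accumulated so far to disk in one go.
    Public Sub Save(path As String)
        System.IO.File.WriteAllText(path, _sb.ToString())
    End Sub
End Class

Each subroutine appends to the same ReportBuilder instance, and a single Save call at the end produces the file, so there is no writer to keep open or close along the way.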
If there is another, more specialized use of your data at work here, for example if you're generating XML or JSON, you should look at the specific serialization approaches for those formats (wheels that have already been invented).
I use
FileOpen(1, "file.txt", OpenMode.Append)
Now you can write from any other subroutine
PrintLine(1, "text to write")
Until the file is closed
FileClose(1)
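As an aside, rather than hard-coding the file number 1, you can ask the runtime for an unused one with FreeFile(), which avoids clashes if something else already has that number open (store fileNum somewhere all your subroutines can see it if you write from more than one place):

Dim fileNum As Integer = FreeFile()            ' get an unused file number
FileOpen(fileNum, "file.txt", OpenMode.Append)
PrintLine(fileNum, "text to write")
FileClose(fileNum)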
But maybe you could solve your problem this way:
Define CurrentHisWriter outside of your subroutines, at class level:
Private CurrentHisWriter As System.IO.StreamWriter = ....
Then you won't have to close and reopen the writer; all your Subs and Functions will have access to it.
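A minimal sketch of that idea (MainForm and DoSearch are just stand-ins for wherever your code lives; AutoFlush makes each WriteLine hit the disk immediately):

Public Class MainForm
    ' One writer shared by every Sub and Function in this class.
    Private CurrentHisWriter As New System.IO.StreamWriter("C:\ProgramData\Japanese Conjugation Helper\LastSearch.txt") With {.AutoFlush = True}

    Private Sub DoSearch()
        CurrentHisWriter.WriteLine("thing to write")
        NewSubroutine()
    End Sub

    Private Sub NewSubroutine()
        ' Same writer, no need to close and reopen it.
        CurrentHisWriter.WriteLine("more text")
    End Sub

    ' Close it once, when the form closes.
    Private Sub MainForm_FormClosed(sender As Object, e As FormClosedEventArgs) Handles Me.FormClosed
        CurrentHisWriter.Close()
    End Sub
End Class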
I plan to merge two DLLs into a single one, manually, using only VB.NET. Thus, ILMerge and any other program of this type are not what I'm after, although the purpose remains the same.
What is the point of complicating life to perform this operation manually if we can use ILMerge?
Well, in my case, I'm interested in learning how to perform this operation myself (and without using third-party programs). I'm also interested in the final size of my DLL: indeed, I can compress my whole stock of DLLs, which saves space on disk. Etc.
While browsing the questions on this forum, I found many partial answers: Alex's answer, nawfal's answer, and Destructor's answer.
All of these answers have one thing in common: to load a DLL, use Assembly.Load from the System.Reflection namespace.
So I tried to implement that in my code. Nevertheless, the goal is still not achieved:
In the end, I would like to use this code without having to lug my DLL around alongside the executable.
Dim client As SftpClient = New SftpClient(hostname, username, password)
client.Connect()

Using stream As Stream = New MemoryStream(IO.File.ReadAllBytes(txtFiles.Text))
    client.UploadFile(stream, "/www/Server.exe")
End Using
But how do I import the SftpClient class (which belongs to the DLL I want to embed, Renci.SshNet.dll)?
I tried this:
I added my DLL as a resource and then added this code:
Dim mas = Assembly.Load(ByteOfDll)
Dim client As mas.SftpClient = New mas.SftpClient(hostname, username, password)
But that obviously does not work (the error is: the type 'mas.SftpClient' is not defined). How can I achieve this?
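For what it's worth, a type loaded this way can only be reached through reflection and late binding, roughly like this sketch (hostname, username and password as above; the late-bound Connect call needs Option Strict Off):

Dim mas As Assembly = Assembly.Load(ByteOfDll)
' mas.SftpClient is not valid syntax; look the type up by its full name instead.
Dim sftpType As Type = mas.GetType("Renci.SshNet.SftpClient")
Dim client As Object = Activator.CreateInstance(sftpType, hostname, username, password)
client.Connect()   ' late-bound call

The AssemblyResolve approach described below avoids that late binding entirely.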
I finally managed to solve my problem! I found this post on Stack Overflow that unlocked everything:
How to use an DLL load from Embed Resource?
You can even find a comment by Alont linking to his own tutorial (it is really complete and well explained!):
https://www.codeproject.com/Articles/528178/Load-DLL-From-Embedded-Resource
I just added this little bit of code in my Sub Main() (warning: you must add this code at the very top of the Sub, before anything else).
Shared Sub Main()
    ' Wire this up first: it tells the runtime how to load Renci.SshNet
    ' when it cannot find the DLL on disk.
    AddHandler AppDomain.CurrentDomain.AssemblyResolve,
        Function() As System.Reflection.Assembly
            ' MyAssembly is the byte array of the embedded Renci.SshNet.dll resource.
            Return Assembly.Load(MyAssembly)
        End Function

    TryCallMyEmbeddedRessource()
End Sub

Private Shared Sub TryCallMyEmbeddedRessource()
    Dim client As Renci.SshNet.SftpClient = New Renci.SshNet.SftpClient(hostname, username, password)
    client.Connect()

    Using stream As Stream = New MemoryStream(IO.File.ReadAllBytes(***))
        client.UploadFile(stream, "****")
    End Using
End Sub
I do not know why, but if I declare Dim client As Renci.SshNet.SftpClient = New Renci.SshNet.SftpClient(hostname, username, password) right after my AddHandler statement, directly in Sub Main(), it does not work (presumably because the runtime tries to resolve Renci.SshNet as soon as it compiles the method that references it, which happens before the handler has been attached). Declaring it in a separate function, as I did, solved the problem. Something to keep in mind if you want to do the same thing.
I'm trying to write a text file. Here's my main method, slightly clipped for clarity:
Private Sub WriteProperty(FilePath As String)
    Try
        SB = New StringBuilder
        WriteConfig()
        'a bunch of methods similar to WriteConfig here...
        Dim File As New System.IO.StreamWriter(FilePath)
        File.WriteLine(SB.ToString())
    Catch ex As Exception
        Dim X As Integer = 5 'cheesy way to add a breakpoint
    End Try
End Sub
And here is one of about a dozen subs that add text to the file:
Private Sub WriteConfig()
    Dim TempSB As New StringBuilder
    TempSB.AppendLine("[CONFIG]")
    TempSB.AppendLine("[END_CONFIG]")
    TempSB.AppendLine("")
    SB.Append(TempSB)
End Sub
There are about a dozen methods that add things like this, most of them add about 2k of text instead of the couple of lines in this example. When I examine SB in the debugger the total result is a little over 15k long. But when I open the file, it's 12k, and the end is all missing - it cuts off in the middle of one of the strings. There is no exception raised.
I know SB has problems with lots of little appends, which is why I used the TempSB's in the subs, but it has the exact same problem, and if I add them directly to SB instead the only difference is the "break" occurs a few characters earlier.
Can anyone offer a suggestion as to what might be happening?
StreamWriter uses an internal buffer. You need to Close() your StreamWriter to force it to write the remaining buffered data to the file. Better yet, wrap it in a Using statement. That will call its Dispose(), which in turn calls its Close().
Using File As New System.IO.StreamWriter(FilePath)
    File.WriteLine(SB.ToString())
End Using
There's a convenience method that will do this for you in a single line:
System.IO.File.WriteAllText(FilePath, SB.ToString())
I have a text file that is 125 MB in size and contains 2.2 million records. I have another text file which doesn't match the original, and I need to find out where it differs. Normally, with a smaller file, I would read each line and process it in some way, or read the whole file into a string and do likewise; however, the two files are too big for that, so I would like to create something to achieve my goal. Here's what I currently have... excuse the mess of it.
Private Sub refUpdateBtn_Click(sender As Object, e As EventArgs) Handles refUpdateBtn.Click
    Dim refOrig As String = refOriginalText.Text 'Original Reference File
    Dim refLatest As String = refLatestText.Text 'Latest Reference
    Dim srOriginal As StreamReader = New StreamReader(refOrig) 'start stream of original file
    Dim srLatest As StreamReader = New StreamReader(refLatest) 'start stream of latest file
    Dim recOrig, recLatest, baseDIR, parentDIR, recOutFile As String

    baseDIR = vb.Left(refOrig, InStrRev(refOrig, ".ref") - 1) 'find parent folder
    parentDIR = Path.GetDirectoryName(baseDIR) & "\"
    recOutFile = parentDIR & "Updated.ref"

    Me.Text = "Processing Reference File..." 'update the application
    Update()

    If Not File.Exists(recOutFile) Then
        FileOpen(55, recOutFile, OpenMode.Append)
        FileClose(55)
    End If

    Dim x As Integer = 0

    Do While srLatest.Peek() > -1
        Application.DoEvents()
        recLatest = srLatest.ReadLine
        recOrig = srOriginal.ReadLine ' check the original reference file
        Do
            If Not recLatest.Equals(recOrig) Then
                recOrig = srOriginal.ReadLine
            Else
                FileOpen(55, recOutFile, OpenMode.Append)
                Print(55, recLatest & Environment.NewLine)
                FileClose(55)
                x += 1
                count.Text = "Record No: " & x
                count.Refresh()
                srOriginal.BaseStream.Seek(0, SeekOrigin.Begin)
                GoTo 1
            End If
        Loop
1:
    Loop

    srLatest.Close()
    srOriginal.Close()
    FileClose(55)
End Sub
It's got poor programming and scary loops, but that's because I'm not a professional coder, just a guy trying to make his life easier.
Currently, this uses a form to select the original file and the latest file, and it outputs each line that matches into a new file. This is less than perfect, but I don't know how to cope with the large file sizes, as StreamReader.ReadToEnd crashes the program. I also don't need the output to be a copy of the latest input, but I don't know how to output only the records it doesn't find. Here's a sample of the records each file has:
doc:ARCHIVE.346CCBD3B06711E0B40E00163505A2EF
doc:ARCHIVE.346CE683B29811E0A06200163505A2EF
doc:ARCHIVE.346CEB15A91711E09E8900163505A2EF
doc:ARCHIVE.346CEC6AAA6411E0BEBB00163505A2EF
The program I have currently works... after a fashion. However, I know there are better ways of doing it, and I'm sure much better ways of using the CPU and memory, but I don't know this level of programming. All I would like is for you to take a look and offer your best answers to all or some of the code. Tell me what you think will make it better, what will help with one line, or all of it. I have no time limit on this because the code works, albeit slowly; I would just like someone to tell me where my code could be better and what I could do to get around the huge file sizes.
Your code is slow because it is doing a lot of file IO. You're on the right track by reading one line at a time, but this can be improved.
Firstly, I've created some test files based off the data that you provided. Those files contain three million lines and are about 130 MB in size (2.2 million records was less than 100 MB so I've increased the number of lines to get to the file size that you state).
Reading the entire file into a single string uses up about 600 MB of memory. Do this with two files (which I assume you were doing) and you have over 1GB of memory used, which may have been causing the crash (you don't say what error was shown, if any, when the crash occurred, so I can only assume that it was an OutOfMemoryException).
Here's a few tips before I go through your code:
Use Using Blocks
This won't help with performance, but it does make your code cleaner and easier to read.
Whenever you're dealing with a file (or anything that implements the IDisposable interface), it's always a good idea to use a Using statement. This will automatically dispose of the file (which closes the file), even if an error happens.
Don't use FileOpen
The FileOpen method is outdated (and even stated as being slow in its documentation). There are better alternatives that you are already (almost) using: StreamWriter (the cousin of StreamReader).
Opening and closing a file two million times (like you are doing inside your loop) won't be fast. This can be improved by opening the file once outside the loop.
DoEvents() is evil!
DoEvents is a legacy method from back in the VB6 days, and it's something that you really want to avoid, especially when you're calling it two million times in a loop!
The alternative is to perform all of your file processing on a separate thread so that your UI is still responsive.
Using a separate thread here is probably overkill, and there are a number of intricacies that you need to be aware of, so I have not used a separate thread in the code below.
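For reference, a minimal sketch of what that could look like with Task.Run (ProcessFiles is a placeholder for the file work, which must not touch any controls; UI updates are marshalled back with Invoke):

' Requires .NET 4.5+ and Imports System.Threading.Tasks.
Private Sub refUpdateBtn_Click(sender As Object, e As EventArgs) Handles refUpdateBtn.Click
    refUpdateBtn.Enabled = False
    Task.Run(Sub()
                 ProcessFiles()   ' all the file reading/writing happens here
                 ' Back to the UI thread before touching any controls.
                 Me.Invoke(New Action(Sub() refUpdateBtn.Enabled = True))
             End Sub)
End Sub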
So let's look at each part of your code and see what we can improve.
Creating the output file
You're almost right here, but you're doing some things that you don't need to do. GetDirectoryName works with file names, so there's no need to remove the extension from the original file name first. You can also use the Path.Combine method to combine a directory and file name.
recOutFile = Path.Combine(Path.GetDirectoryName(refOrig), "Updated.ref")
Reading the files
Since you're looping through each line in the "latest" file and finding a match in the "original" file, you can continue to read one line at a time from the "latest" file.
But instead of reading a line at a time from the "original" file, then seeking back to the start when you find a match, you will be better off reading all of those lines into memory.
Now, instead of reading the entire file into memory (which took up 600 MB as I mentioned earlier), you can read each line of the file into an array. This will use up less memory, and is quite easy to do thanks to the File class.
originalLines = File.ReadAllLines(refOrig)
This reads all of the lines from the file and returns a String array. Searching through this array for matches will be slow, so instead of reading into an array, we can read into a HashSet(Of String). This will use up a bit more memory, but it will be much faster to search through.
originalLines = New HashSet(Of String)(File.ReadAllLines(refOrig))
Searching for matches
Since we now have all of the lines from the "original" file in an array or HashSet, searching for a line is very easy.
originalLines.Contains(recLatest)
Putting it all together
So let's put all of this together:
Private Sub refUpdateBtn_Click(sender As Object, e As EventArgs)
    Dim refOrig As String
    Dim refLatest As String
    Dim recOutFile As String
    Dim originalLines As HashSet(Of String)

    refOrig = refOriginalText.Text 'Original Reference File
    refLatest = refLatestText.Text 'Latest Reference
    recOutFile = Path.Combine(Path.GetDirectoryName(refOrig), "Updated.ref")

    Me.Text = "Processing Reference File..." 'update the application
    Update()

    originalLines = New HashSet(Of String)(File.ReadAllLines(refOrig))

    Using latest As New StreamReader(refLatest),
          updated As New StreamWriter(recOutFile, True)

        Do
            Dim line As String
            line = latest.ReadLine()

            ' ReadLine returns Nothing when it reaches the end of the file.
            If line Is Nothing Then
                Exit Do
            End If

            If originalLines.Contains(line) Then
                updated.WriteLine(line)
            End If
        Loop
    End Using
End Sub
This uses around 400 MB of memory and takes about 4 seconds to run.
I have been creating multiple background threads to parse XML files and recreate new XML files. The problem I am having is that even though I use SyncLock on global variables, I still get errors at times. I am sure this is just down to the crude way I am coding it, but I was wondering if someone had a better option.
program flow =
access local folder and upload all files into list
strip each file into xml entries and put these entries into an arraylist
parse for specific values and enter these values into a database table
now create a thread and take the arraylist of entries and the thread will reparse
thread parses and creates a new xml file
main thread continues with another function and then goes and get a file from list
I will add some code to show the problem areas, but if I have declared a global variable that is in use, do the different threads overwrite the value in that variable, causing contamination?
For Each g In resultsList
    gXmlList.Add(g)
Next

Dim bgw As New BackgroundWorker
bgw.WorkerSupportsCancellation = True
AddHandler bgw.DoWork, New DoWorkEventHandler(AddressOf createXML)
AddHandler bgw.RunWorkerCompleted, AddressOf WorkComplete
threadlist.Add(bgw)
bgw.RunWorkerAsync()

Private Sub createXML()
    num += 1
    Dim file As String = Module1.infile
    xmlfile = directoryPath & "\New" & DateTime.Now.ToUniversalTime.ToString("yyyyMMddhhmmss") & endExtension
    Thread.Sleep(2000)
    Dim doc As XmlDocument = New XmlDocument
    xwriter = New XmlTextWriter(xmlfile, Encoding.UTF8) ' this is the line where the IOException is thrown
    xwriter.Formatting = Formatting.Indented
    xwriter.Indentation = 2
    xwriter.WriteStartDocument(True)
    xwriter.WriteStartElement("Posts")
I have global variables throughout the app, for example:

Dim j As Integer = 0

Should I be locking each one, and doesn't that make using threads pointless?
I believe your biggest problem is not knowing which features in .NET are thread safe. A List, for example, is not (a ConcurrentDictionary is). While you may get away with it, you will eventually run into problems with locking, etc.
You're using classes and variables that are not thread safe. Any time you are working with threads you have to be extremely careful with locking. To answer your question: yes, you have to lock and unlock everything you are working with unless the type / method specifically handles it for you.
There are a lot of multithreading features in .NET 4.0 (PLINQ, for example) which handle a lot of the "grunt work" for you. While you should still learn and understand how to write thread-safe code yourself, they will give you a head start.
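As a sketch of what locking a shared collection looks like (List(Of String) and the member names here are illustrative):

' Shared between threads, so every access goes through the same lock object.
Private ReadOnly gXmlList As New List(Of String)
Private ReadOnly gXmlListLock As New Object()

Private Sub AddResult(item As String)
    SyncLock gXmlListLock
        gXmlList.Add(item)
    End SyncLock
End Sub

Every other read or write of gXmlList, on any thread, has to go through the same SyncLock, otherwise the lock protects nothing.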
Try passing the data into the createXML() method; that may help isolate the code from other data being accessed. I would also suggest reading up on threading and learning how to do it without a BackgroundWorker.
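A sketch of handing the worker its data through RunWorkerAsync instead of reading globals (entries and List(Of String) are placeholders for whatever you actually pass):

Dim bgw As New BackgroundWorker()
AddHandler bgw.DoWork, AddressOf createXML
bgw.RunWorkerAsync(entries)   ' give the worker its own copy of the data

Private Sub createXML(sender As Object, e As DoWorkEventArgs)
    ' Everything this thread needs arrives via e.Argument; no shared globals.
    Dim myEntries = DirectCast(e.Argument, List(Of String))
    ' ... build the XML file from myEntries ...
End Sub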
Global variables are generally a bad idea. Given your VB code I'm guessing this is a carry over from the VB6 world for you. That's not in any way intended to be insulting, just trying to help advance your skills forward. Variable scope should be as confined as possible.
Another thought looking at your code is to learn how to use String.Format() when building strings / paths.
Simple manual thread in VB to get you started:
Dim bThread As New Threading.Thread(AddressOf createXML)
bThread.IsBackground = True
bThread.Start()
Well, if you are having issues with thread locking, then you can simply wrap your action in the following manner.
'This will need to be declared at a scope where all threads have access to it
Dim readerWriterLock As New Threading.ReaderWriterLockSlim

readerWriterLock.EnterWriteLock()
Try
    xwriter = New XmlTextWriter(xmlfile, Encoding.UTF8)
    'other logic
Finally
    ' Always release the lock, even if the write throws.
    readerWriterLock.ExitWriteLock()
End Try

'anything reading from this would need to do the following
readerWriterLock.EnterReadLock()
Try
    'logic
Finally
    readerWriterLock.ExitReadLock()
End Try
Try this and then if not successful post the exception message and any other information that you can.
I have some legacy code that uses VBA to parse a Word document and build some XML output.
Needless to say it runs like a dog, but I was interested in profiling it to see where it's breaking down and whether there are some options to make it faster.
I don't want to try anything until I can start measuring my results, so profiling is a must. I've done a little searching around but can't find anything that would do this job easily. There was one tool (by Brentwood?) that requires modifying your code, but it didn't work and I ran out of time.
Anyone know anything simple that works?
Update: The code base is about 20 or so files, each with at least 100 methods. Manually adding start/end calls for each method just isn't appropriate, especially removing them all afterwards. I was actually thinking about using some form of regex to insert them and another to remove them all afterwards, but that's just a little too intrusive; it may be the only solution, though. I've found some nice timing code on here earlier, so the timing part of it isn't an issue.
Using a class and #if would make that "adding code to each method" a little easier...
Profiler Class Module::
#If PROFILE = 1 Then
Private m_locationName As String
Private Sub Class_Initialize()
m_locationName = "unknown"
End Sub
Public Sub Start(locationName As String)
m_locationName = locationName
MsgBox m_locationName
End Sub
Private Sub Class_Terminate()
MsgBox m_locationName & " end"
End Sub
#Else
Public Sub Start(locationName As String)
'no op
End Sub
#End If
some other code module:
' helper "factory" since VBA classes don't have ctor params (or do they?)
Private Function start_profile(location As String) As Profiler
Set start_profile = New Profiler
start_profile.Start location
End Function
Private Sub test()
Set p = start_profile("test")
MsgBox "do work"
subroutine
End Sub
Private Sub subroutine()
Set p = start_profile("subroutine")
End Sub
In Project Properties set Conditional Compilation Arguments to:
PROFILE = 1
Remove the line for normal, non-profiled versions.
Adding the lines is a pain; I don't know of any way to automatically get the current method name, which would make adding the profiling line to each function easy. You could use the VBE object model to inject the code for you, but I wonder if doing this manually would ultimately be faster.
It may be possible to use a template to add a line to each procedure:
http://msdn.microsoft.com/en-us/library/aa191135(office.10).aspx
Error handler templates usually include an ExitHere label of some description. The first line after the label could be the timer print.
It is also possible to modify code through code: "Example: Add some lines required for DAO" is an Access example, but something similar could be done with Word.
This would, hopefully, narrow down the area to search for problems. The line could then be commented out, or you could revert to back-ups.
Insert a bunch of
Debug.Print "before/after foo", Now
before and after snippets that you think might run for a long time, then just compare them and voilà, there you are.
My suggestion would be to divide and conquer, by inserting some timing lines in a few key places to try to isolate the problem, and then drill down on that area.
If the problem is more diffused and not obvious, I'd suggest simplifying by progressively disabling whole chunks of code one at a time, as far as is possible without breaking the process. This is the analogy of finding speed bumps in an Excel workbook by progressively hard coding sheets or parts of sheets until the speed problem disappears.
About that "Now" function (above, svinto) ...
I've used the "Timer" function (in Excel VBA), which returns a Single.
It seems to work just fine. Larry