VBA - Excel unable to detect file size increase

This one is driving me crazy. Any help is welcome.
I have a numeric simulation that constantly writes to a text file, so the file size keeps increasing. I need to report the file size and check whether it is still increasing or has stopped.
VBA can't seem to detect the file's increased size; it keeps registering the same size. But if I have Windows Explorer open on the file's folder and press F5, the size increases in VBA.
I need to know whether it is supposed to work this way because of how Windows indexes files, or whether I'm doing something wrong.
I have used FileLen(), the FileSystemObject File.Size property, DateLastModified and DateLastAccessed, and nothing...
I've already found a workaround: if I copy the file in question to a temp file and then read the temp file's size, I can detect the change in size. But this is an ugly solution. I would very much like to just check the file size and get the correct result.
Sorry if it isn't very clear. I will be happy to clarify further, should it be necessary.
Here is the code I've been using, but to test it you would have to mimic the file-writing situation, as in video encoding for example, because if you do it with a text file edited in Notepad, the act of saving it also updates the information seen by VBA.
Public fileSizeLastStep As Long

Public Sub sizeWatch()
    Dim fso As New FileSystemObject
    Dim fileToMeasure As File
    Dim fileSizeNow As Long

    Set fileToMeasure = fso.GetFile("C:\filename.txt")
    fileSizeNow = fileToMeasure.Size
    If fileSizeNow <> fileSizeLastStep Then
        fileSizeLastStep = fileSizeNow
        Set fso = Nothing
        Call Application.OnTime(Now + TimeValue("00:02:00"), "sizeWatch")
    Else
        MsgBox "Simulation Finished!"
        'calls whatever function starts the next simulation
    End If
End Sub
Thank you for your time.

Perhaps it's an issue with this:
https://blogs.msdn.microsoft.com/oldnewthing/20111226-00/?p=8813
Quote from the article:
If you really need the actual file size right now, you can do what
the first customer did and call GetFileSize. That function operates
on the actual file and not on the directory entry, so it gets the real
information and not the shadow copy. Mind you, if the file is being
continuously written-to, then the value you get is already wrong the
moment you receive it.
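So try using GetFileSize instead. Note that it takes a handle to an already opened file (for example one returned by CreateFile), not a path.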
Declare Function GetFileSize Lib "kernel32.dll" (ByVal hFile As Long, lpFileSizeHigh As Long) As Long

Related

VB.NET (2013) - Check string against huge file

I have a text file that is 125 MB in size and contains 2.2 million records. I have another text file which doesn't match the original, and I need to find out where it differs. Normally, with a smaller file, I would read each line and process it in some way, or read the whole file into a string and do likewise. However, the two files are too big for that, so I would like to create something to achieve my goal. Here's what I currently have; excuse the mess of it.
Private Sub refUpdateBtn_Click(sender As Object, e As EventArgs) Handles refUpdateBtn.Click
    Dim refOrig As String = refOriginalText.Text 'Original Reference File
    Dim refLatest As String = refLatestText.Text 'Latest Reference
    Dim srOriginal As StreamReader = New StreamReader(refOrig) 'start stream of original file
    Dim srLatest As StreamReader = New StreamReader(refLatest) 'start stream of latest file
    Dim recOrig, recLatest, baseDIR, parentDIR, recOutFile As String
    baseDIR = vb.Left(refOrig, InStrRev(refOrig, ".ref") - 1) 'find parent folder
    parentDIR = Path.GetDirectoryName(baseDIR) & "\"
    recOutFile = parentDIR & "Updated.ref"
    Me.Text = "Processing Reference File..." 'update the application
    Update()
    If Not File.Exists(recOutFile) Then
        FileOpen(55, recOutFile, OpenMode.Append)
        FileClose(55)
    End If
    Dim x As Integer = 0
    Do While srLatest.Peek() > -1
        Application.DoEvents()
        recLatest = srLatest.ReadLine
        recOrig = srOriginal.ReadLine ' check the original reference file
        Do
            If Not recLatest.Equals(recOrig) Then
                recOrig = srOriginal.ReadLine
            Else
                FileOpen(55, recOutFile, OpenMode.Append)
                Print(55, recLatest & Environment.NewLine)
                FileClose(55)
                x += 1
                count.Text = "Record No: " & x
                count.Refresh()
                srOriginal.BaseStream.Seek(0, SeekOrigin.Begin)
                GoTo 1
            End If
        Loop
1:
    Loop
    srLatest.Close()
    srOriginal.Close()
    FileClose(55)
End Sub
It's got poor programming and scary loops, but that's because I'm not a professional coder, just a guy trying to make his life easier.
Currently, this uses a form to insert the original file and the latest file and outputs each line that matches into a new file. This is less than perfect, but I don't know how to cope with the large file sizes as streamreader.readtoend crashes the program. I also don't need the output to be a copy of the latest input, but I don't know how to only output the records it doesn't find. Here's a sample of the records each file has:
doc:ARCHIVE.346CCBD3B06711E0B40E00163505A2EF
doc:ARCHIVE.346CE683B29811E0A06200163505A2EF
doc:ARCHIVE.346CEB15A91711E09E8900163505A2EF
doc:ARCHIVE.346CEC6AAA6411E0BEBB00163505A2EF
The program I have currently works... to a fashion, however I know there are better ways of doing it and I'm sure much better ways of using the CPU and memory, but I don't know this level of programming. All I would like is for you to take a look and offer your best answers to all or some of the code. Tell me what you think will make it better, what will help with one line, or all of it. I have no time limit on this because the code works, albeit slowly, I would just like someone to tell me where my code could be better and what I could do to get round the huge file sizes.
Your code is slow because it is doing a lot of file IO. You're on the right track by reading one line at a time, but this can be improved.
Firstly, I've created some test files based off the data that you provided. Those files contain three million lines and are about 130 MB in size (2.2 million records was less than 100 MB so I've increased the number of lines to get to the file size that you state).
Reading the entire file into a single string uses up about 600 MB of memory. Do this with two files (which I assume you were doing) and you have over 1GB of memory used, which may have been causing the crash (you don't say what error was shown, if any, when the crash occurred, so I can only assume that it was an OutOfMemoryException).
Here are a few tips before I go through your code:
Use Using Blocks
This won't help with performance, but it does make your code cleaner and easier to read.
Whenever you're dealing with a file (or anything that implements the IDisposable interface), it's always a good idea to use a Using statement. This will automatically dispose of the file (which closes the file), even if an error happens.
Don't use FileOpen
The FileOpen method is outdated (and even stated as being slow in its documentation). There are better alternatives that you are already (almost) using: StreamWriter (the cousin of StreamReader).
Opening and closing a file two million times (like you are doing inside your loop) won't be fast. This can be improved by opening the file once outside the loop.
DoEvents() is evil!
DoEvents is a legacy method from back in the VB6 days, and it's something that you really want to avoid, especially when you're calling it two million times in a loop!
The alternative is to perform all of your file processing on a separate thread so that your UI is still responsive.
Using a separate thread here is probably overkill, and there are a number of intricacies that you need to be aware of, so I have not used a separate thread in the code below.
So let's look at each part of your code and see what we can improve.
Creating the output file
You're almost right here, but you're doing some things that you don't need to do. GetDirectoryName works with file names, so there's no need to remove the extension from the original file name first. You can also use the Path.Combine method to combine a directory and file name.
recOutFile = Path.Combine(Path.GetDirectoryName(refOrig), "Updated.ref")
Reading the files
Since you're looping through each line in the "latest" file and finding a match in the "original" file, you can continue to read one line at a time from the "latest" file.
But instead of reading a line at a time from the "original" file, then seeking back to the start when you find a match, you will be better off reading all of those lines into memory.
Now, instead of reading the entire file into memory (which took up 600 MB as I mentioned earlier), you can read each line of the file into an array. This will use up less memory, and is quite easy to do thanks to the File class.
originalLines = File.ReadAllLines(refOrig)
This reads all of the lines from the file and returns a String array. Searching through this array for matches will be slow, so instead of reading into an array, we can read into a HashSet(Of String). This will use up a bit more memory, but it will be much faster to search through.
originalLines = New HashSet(Of String)(File.ReadAllLines(refOrig))
Searching for matches
Since we now have all of the lines from the "original" file in an array or HashSet, searching for a line is very easy.
originalLines.Contains(recLatest)
Putting it all together
So let's put all of this together:
' Requires Imports System.IO and System.Collections.Generic at the top of the file.
Private Sub refUpdateBtn_Click(sender As Object, e As EventArgs) Handles refUpdateBtn.Click
    Dim refOrig As String
    Dim refLatest As String
    Dim recOutFile As String
    Dim originalLines As HashSet(Of String)

    refOrig = refOriginalText.Text 'Original Reference File
    refLatest = refLatestText.Text 'Latest Reference
    recOutFile = Path.Combine(Path.GetDirectoryName(refOrig), "Updated.ref")
    Me.Text = "Processing Reference File..." 'update the application
    Update()

    ' Load every line of the original file once, for fast lookups.
    originalLines = New HashSet(Of String)(File.ReadAllLines(refOrig))

    ' Both files are opened once and closed automatically by the Using block.
    Using latest As New StreamReader(refLatest),
          updated As New StreamWriter(recOutFile, True)
        Do
            Dim line As String
            line = latest.ReadLine()
            ' ReadLine returns Nothing when it reaches the end of the file.
            If line Is Nothing Then
                Exit Do
            End If
            If originalLines.Contains(line) Then
                updated.WriteLine(line)
            End If
        Loop
    End Using
End Sub
This uses around 400 MB of memory and takes about 4 seconds to run.
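The question also asks how to output only the records that are not found. A minimal sketch of that variant, assuming a console program and hypothetical names (ReferenceDiff, FindMissing) that are not part of the original code:

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module ReferenceDiff
    ' Returns the lines of latestLines that do not occur anywhere in originalLines.
    ' FindMissing is a hypothetical helper name used for illustration only.
    Function FindMissing(originalLines As IEnumerable(Of String),
                         latestLines As IEnumerable(Of String)) As List(Of String)
        Dim known As New HashSet(Of String)(originalLines)
        Dim missing As New List(Of String)
        For Each line As String In latestLines
            If Not known.Contains(line) Then
                missing.Add(line)
            End If
        Next
        Return missing
    End Function

    Sub Main(args As String())
        If args.Length < 3 Then
            Console.WriteLine("usage: ReferenceDiff <original> <latest> <output>")
            Return
        End If
        ' File.ReadLines reads lazily, so only the HashSet holds a whole file in memory.
        Dim missing = FindMissing(File.ReadLines(args(0)), File.ReadLines(args(1)))
        File.WriteAllLines(args(2), missing)
    End Sub
End Module
```

The only change from the matching version is the Not in front of Contains; swapping the two arguments would instead list records that were dropped from the original.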

SearchAllSubDirectories Inefficient Under Load

To detail the problem a bit here, I need to obtain the file "DKIM.txt" hundreds, potentially thousands of times from within different directories.
The file will always appear in folders such as:
C:\CONX\Users\Jason\222\DKIM.txt
C:\CONX\Users\Donald\12\DKIM.txt
C:\CONX\Users\Yuri\1251\DKIM.txt
The folder depth will never change; the username and the identifier (e.g. Jason and 222) will always change.
My currently working code is as follows:
For Each UserDirectory As String In My.Computer.FileSystem.GetFiles("C:\CONX\Users", FileIO.SearchOption.SearchAllSubDirectories, "DKIM.txt")
Console.WriteLine(UserDirectory)
Next
The problem with the above is it's slow on our extremely populated machines. Their load at times can be 80-90% CPU and simply cycling through all the sub directories to search for one file we know will always exist is inefficient and slow.
So my question is: how would I just wildcard the username and identifier directories?
Example: Return all directories that match: C:\CONX\Users\*\*\DKIM.txt where * is our wildcard.
Thank you.
Depending on the number of subdirectories, it could be faster to do your own custom search with GetDirectories and GetFiles:
Loop over all directories under C:\CONX\Users
Loop over all of their subdirectories
Check whether that subdirectory contains the file.
This also depends on whether each username has multiple directories under it. I would also suggest caching the result. If you only need to run it once, it might not matter that it's slow.
Also, Console.WriteLine is very slow. If you write to a file instead, you might see some improvement.
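The loop described above can be sketched as follows. This is only an illustration under the assumption that the layout is always root\user\identifier\file; the names DkimFinder and FindAtFixedDepth are made up, and access-denied errors are not handled:

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module DkimFinder
    ' Returns every <root>\<user>\<id>\<fileName> that exists,
    ' looking exactly two directory levels below root and no deeper.
    Function FindAtFixedDepth(root As String, fileName As String) As List(Of String)
        Dim found As New List(Of String)
        For Each userDir As String In Directory.EnumerateDirectories(root)
            For Each idDir As String In Directory.EnumerateDirectories(userDir)
                ' No file enumeration at all: just probe the one path we expect.
                Dim candidate As String = Path.Combine(idDir, fileName)
                If File.Exists(candidate) Then
                    found.Add(candidate)
                End If
            Next
        Next
        Return found
    End Function

    Sub Main()
        For Each filePath As String In FindAtFixedDepth("C:\CONX\Users", "DKIM.txt")
            Console.WriteLine(filePath)
        Next
    End Sub
End Module
```

Because the file name is known in advance, File.Exists on the combined path avoids listing the contents of each identifier folder.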
As Hans Passant pointed out the flaws so eloquently I'm going to go ahead and propose what I think can work. Here it goes.
Use a program to monitor your \Users folder for file changes (created/deleted) and have it trigger a batch script that updates a text file in which you maintain all your users and identifiers. You can write your own Windows process (daemon) for this if you like, or use off-the-shelf software such as Watch 4 Folder. Here is a guide that describes how to use it: http://www.guidingtech.com/9861/automate-folder-actions-windows-watch-4-folder/
Now you will just be searching the new text file (master list) as opposed to traversing the entire directory and sub directory structure. It will be better if you can store the master list of user/identifier in the database and use a batch script or powershell to update the database.
Good luck.
Solution: Search with a predefined depth limit while also matching with a filename. This will eliminate wasted CPU cycles recursively searching for the file we knew existed in each directory.
Public Sub GetFiles(ByVal strFileFilter As String, ByVal strDirectory As String, ByVal intDepthLimit As Integer, ByVal intCurrentDepth As Integer)
    Dim folderInfo As New DirectoryInfo(strDirectory)
    ' Is the current depth on this recursion less than our limit?
    ' If so, find any directories and get into them by calling GetFiles recursively (incrementing depth count)
    If intCurrentDepth < intDepthLimit Then
        Dim directories() As DirectoryInfo
        directories = folderInfo.GetDirectories()
        For Each fDirectory In directories
            ' Recursively call ourselves incrementing the depth using the given folder path.
            GetFiles(strFileFilter, fDirectory.FullName, intDepthLimit, intCurrentDepth + 1)
        Next
    End If
    ' After we can't go further down, add any files which match our filter to listbox (in this case lstFiles)
    Dim files() As FileInfo
    files = folderInfo.GetFiles(strFileFilter)
    For Each fFile In files
        lstFiles.Items.Add(fFile.FullName)
    Next
End Sub
Source: http://www.coderslexicon.com/playing-with-recursive-directory-diving-in-vb-net/
eg. GetFiles("DKIM.txt", "C:\CONX\Users", 3, 0)
Dropped the execution time for hundreds of large directories from multiple seconds using FileIO.SearchOption.SearchAllSubDirectories to less than half a second.

Visual Basic (2010) - Using variables in embedded text files?

I've always been able to just search for what I need on here, and I've usually found it fairly easily, but this seems to be an exception.
I'm writing a program in Visual Basic 2010 Express, it's a fairly simple text based adventure game.
I have a story, with multiple possible paths based on what button/option you choose.
The text of each story path is saved in its own embedded resource .txt file. I could just write the contents of the text files straight into VB, and that would solve my problem, but that's not the way I want to do this, because that would end up looking really messy.
My problem is that I need to use variable names within my story, here's an example of the contents of one of the embedded text files,
"When "+playername+" woke up, "+genderheshe+" didn't recognise "+genderhisher+" surroundings."
I have used the following code to read the file into my text box
Private Sub frmAdventure_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
Dim thestorytext As String
Dim imageStream As Stream
Dim textStreamReader As StreamReader
Dim assembly As [Assembly]
assembly = [assembly].GetExecutingAssembly()
imageStream = assembly.GetManifestResourceStream("Catastrophe.CatastropheStoryStart.png")
textStreamReader = New StreamReader(assembly.GetManifestResourceStream("Catastrophe.CatastropheStoryStart.txt"))
thestorytext = textStreamReader.ReadLine()
txtAdventure.Text = thestorytext
End Sub
Which works to an extent, but displays it exactly as it is in the text file, keeps the quotes and the +s and the variable names instead of removing the quotes and the +s and replacing the variable names with what's stored within the variables.
Can anyone tell me what I need to change or add to make this work?
Thanks, and apologies if this has been answered somewhere and I just didn't recognise it as the solution or didn't know what to search to find it or something.
Since your application is compiled, you cannot just put some of your VB code in the text file and have it executed when it is read.
What you can do, and what is usually done, is that you leave certain tags inside your text file, then locate them and replace them with the actual values.
For example:
When %playername% woke up, %genderheshe% didn't recognise %genderhisher% surroundings.
Then in your code, you would find all the tags:
Dim matches = Regex.Matches(thestorytext, "%(\w+?)%")
For Each match As Match In matches
    ' the tag name is now in: match.Groups(1).Value
    ' replace the tag with the value and replace it back into the original string
Next
Of course the big problem still remains - which is how to fill in the actual values. Unfortunately, there is no clean way to do this, especially using any local variables.
You can either manually maintain a Dictionary of tag names and their values, or use Reflection to get the values directly at the runtime. While it should be used carefully (speed, security, ...), it will work just fine for your case.
Assuming you have all your variables defined as properties in the same class (Me) as the code that reads and processes this text, the code will look like this:
Dim matches = Regex.Matches(thestorytext, "%(\w+?)%")
For Each match As Match In matches
    Dim tag = match.Groups(1).Value
    ' Look up the public property named by the tag on this form and read its current value
    Dim value = Me.GetType().GetProperty(tag).GetValue(Me, Nothing).ToString()
    thestorytext = thestorytext.Replace(match.Value, value) ' Lazy code
Next
txtAdventure.Text = thestorytext
If you don't use properties, but only fields, change the line to this:
Dim value = Me.GetType().GetField(tag).GetValue(Me).ToString()
Note that this example is rough and the code will happily crash if the tags are misspelled or do not exist (you should do some error checking), but it should get you started.
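The other option mentioned above, a manually maintained Dictionary of tag names and values, might look like the sketch below. The names StoryTemplate and FillTags and the sample value "Alex" are assumptions for illustration; unknown tags are left in place rather than causing a crash:

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module StoryTemplate
    ' Replaces each %tag% with its value from the dictionary.
    ' Tags that are not in the dictionary are left exactly as written.
    Function FillTags(template As String, values As IDictionary(Of String, String)) As String
        Return Regex.Replace(template, "%(\w+?)%",
            Function(m As Match) As String
                Dim replacement As String = Nothing
                If values.TryGetValue(m.Groups(1).Value, replacement) Then
                    Return replacement
                End If
                Return m.Value
            End Function)
    End Function

    Sub Main()
        ' Hypothetical values; in the game these would come from the player's choices.
        Dim values As New Dictionary(Of String, String) From {
            {"playername", "Alex"},
            {"genderheshe", "she"},
            {"genderhisher", "her"}
        }
        Dim story As String = "When %playername% woke up, %genderheshe% didn't recognise %genderhisher% surroundings."
        Console.WriteLine(FillTags(story, values))
    End Sub
End Module
```

Doing the substitution in a single Regex.Replace pass also avoids the repeated String.Replace calls of the loop version, at the cost of keeping the dictionary up to date by hand.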

Open and save problems

I have a function in my VB project where I scan an image and can then change its contrast.
I scan it and save it to C:\temp\my_img.tif.
In the WinForm the image is displayed in a PictureBox.
If, in the contrast function, I call img.Save("C:\temp\my_img.tif", ImageFormat.Tiff), I get "A generic error occurred in GDI+.". If I set the filename to something else, however, it works just fine.
So, how do I release the image that is in use before saving it?
The whole function, in short:
Sub setContrast(ByVal C As Single)
    'filename(1) is a "global" variable that stores the used file path, in this case "C:\temp\my_img.tif"
    Dim img As Image = Image.FromFile(filename(1)) '<--- I get the image
    'A bunch of contrast stuff in some rows.....
    'Here, I should release the image...
    img.Save(filename(1), ImageFormat.Tiff) '<---Tries to save
    PictureBox1.Refresh()
End Sub
Save it using a different file name, and then, if necessary, delete the old file and rename the new file to match the old, having Disposed of the Image beforehand.
From Image.FromFile:
The file remains locked until the Image is disposed.
There's no wording anywhere else that says that this is somehow worked around if the same Image instance is trying to Save back to the file.
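A different approach that is often used, and not the one described in the answer above, is to copy the image into memory and dispose of the file-backed instance straight away, so that nothing holds the lock when Save runs. A rough sketch, assuming a single-page image; LoadUnlocked and SaveOver are hypothetical names, and the copy made by New Bitmap(image) keeps only the first frame and may change the pixel format:

```vbnet
Imports System.Drawing
Imports System.Drawing.Imaging

Module ImageOverwrite
    ' Loads an image and releases the file lock before returning.
    Function LoadUnlocked(fileName As String) As Bitmap
        Using locked As Image = Image.FromFile(fileName)
            ' New Bitmap(image) draws into a fresh in-memory bitmap;
            ' the file is unlocked when "locked" is disposed at End Using.
            Return New Bitmap(locked)
        End Using
    End Function

    ' Reads the file, leaves room for edits, and writes back to the same path.
    Sub SaveOver(fileName As String)
        Using img As Bitmap = LoadUnlocked(fileName)
            ' ... adjust the contrast of img here ...
            img.Save(fileName, ImageFormat.Tiff)
        End Using
    End Sub
End Module
```

If the PictureBox was itself filled with Image.FromFile on the same path, that instance holds a lock too, so it would need to be loaded the same way (or disposed) before the save can succeed.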

How can I tell what module my code is executing in?

For a very long time, when I have an error handler I make it report what Project, Module, and Procedure the error was thrown in. I have always accomplished this by simply storing their name via constants. I know that in a Class you get the name programmatically with TypeName(Me), but obviously that only gets me one out of three pieces of information and only when I'm not in a "Standard" module.
I don't have a really huge problem with using constants, it's just that people don't always keep them up to date, or worse they copy and paste and then you have the wrong name being reported, etc. So what I would like to do is figure out a way to get rid of the Constants shown in the example, without losing the information.
Option Compare Binary
Option Explicit
Option Base 0
Option Private Module
Private Const m_strModuleName_c As String = "MyModule"
Private Sub Example()
Const strProcedureName_c As String = "Example"
On Error GoTo Err_Hnd
Exit_Proc:
On Error Resume Next
Exit Sub
Err_Hnd:
ErrorHandler.FormattedErrorMessage strProcedureName_c, m_strModuleName_c, _
Err.Description, Err.Source, Err.Number, Erl
Resume Exit_Proc
End Sub
Does anyone know of a way for the code to tell where it is? If you can conclusively show it can't be done, that's an answer too :)
Edit: I am also aware that the project name is in Err.Source. I was hoping to be able to get it without an exception for other purposes. If you know how, great; if not, we can define that as outside the scope of the question.
I am also aware of how to get the error line, but that information is of course only somewhat helpful without knowing Module.Procedure.
For the project name, the only way I can think of doing this is by deliberately throwing an error somewhere in Sub Main() and, in the error handling code, saving the resulting Err.Source into a global variable g_sProjectName. Otherwise, I seem to remember that there was a free third-party DLL called TLBINF32.DLL which did COM reflection, but that seems way over the top for what you want to do, and in any case there is probably a difference between public and private classes. And finally, you could use a binary editor to search for the project name string in your EXE, and then try to read the string from that position. Frustratingly, although the names of every project and code module are embedded in the EXE, there seems to be no predictable way of locating them, so this is NOT recommended.
There are several questions here.
You can get the Project Name by calling App.Name
You cannot get the name of the method you are in. I recommend using the automated procedure templates from MZ Tools, which will automatically put in all the constants you need and your headache will be over.
The last piece is possibly having to know the name of the EXE (or lib) that invoked your ActiveX DLL. To figure this out, try the following:
'API Declarations'
Private Declare Function GetModuleFileName Lib _
    "kernel32" Alias "GetModuleFileNameA" (ByVal _
    hModule As Long, ByVal lpFileName As String, _
    ByVal nSize As Long) As Long

Private Function WhosYourDaddy() As String
    Dim AppPath As String
    Const MAX_PATH = 260
    On Error Resume Next
    'allocate space for the string'
    AppPath = Space$(MAX_PATH)
    If GetModuleFileName(0, AppPath, Len(AppPath)) Then
        'Remove NULLs from the result'
        AppPath = Left$(AppPath, InStr(AppPath, vbNullChar) - 1)
        WhosYourDaddy = AppPath
    Else
        WhosYourDaddy = "Not Found"
    End If
End Function
Unfortunately, you'll need to have individual On Error GoTo X statements for individual modules and procedures. The project is always stored in Err.Source. The VBA error handling isn't all that great in this area. After all, how much good does the project name as the source of the error, as opposed to the procedure/module, do you?
If you manually or programmatically number your lines (like old-school BASIC), you can use ERL to list the line number the error occurred on. Be warned, though, that an error that occurs on a line without a number will make ERL throw its own error, instead of returning a zero. More information can be found at this blog post.
If you're using Access 2007 (not sure about other Office apps/versions), try this code snippet dug out of the help documentation:
Sub PrintOpenModuleNames()
    Dim i As Integer
    Dim modOpenModules As Modules
    Set modOpenModules = Application.Modules
    For i = 0 To modOpenModules.Count - 1
        Debug.Print modOpenModules(i).Name
    Next
End Sub
And Microsoft includes these remarks:
All open modules are included in the
Modules collection, whether they are
uncompiled, are compiled, are in
break mode, or contain the code
that's running.
To determine whether an individual
Module object represents a standard
module or a class module, check the
Module object's Type property.
The Modules collection belongs to the
Microsoft Access Application object.
Individual Module objects in the
Modules collection are indexed
beginning with zero.
So far, I haven't been able to find anything on referencing the current Project or Procedure, but this should point you in the right direction.
I suggest you take a look at CodeSMART for VB6. This VB6 add-in has a customizable Error Handling Schemes Manager, with macros that will insert strings for module name, method name, etc., into your error handling code with a single context menu selection.
It has some other very nice features as well: a Find In Files search that's superior to anything I'd seen until ReSharper, a Tab Order designer, and much more.
At my previous employer, we used this tool for many years, starting with the 2005 version. Once you get used to it, it's really hard to do without it.