I am currently running a find and replace through hundreds of .txt files at a time. I am looking for a way to pull a count of the number of finds for my value.
Here is the code I am currently running, hoping to be able to add to or modify this code.
Dim flatfiles As String() = IO.Directory.GetFiles("C:\DATA\TEST\", "*.txt").Where(Function(x) File.ReadAllText(x).Contains("Bob")).ToArray
For Each f As String In flatfiles
Dim contents As String = File.ReadAllText(f)
File.WriteAllText(f, contents.Replace("Bob", "Bill"))
Next
One (inefficient) way to do this would be to include a counter outside of the For Each loop (Dim itemsFound as Integer = 0), then increment it by the count of find in each file, using something like:
itemsFound = itemsFound + (Regex.Split(contents, find).Length - 1)
Regex.Split splits the string up whenever it finds find, which means the count you're looking for is one less than the number of items in the list.
I would say as well, that you're calling File.ReadAllText twice in your code, so you could improve it by getting rid of the Where code, and just check in your For each loop (seeing as you're now counting the number of instances in the file anyway, it's easy enough to check for 0 occurrences). Alternatively, you could replace the .Where code to store the contents of the file in an array rather than the name of the files (although this can be dangerous if the files are large); or you could even just do it all in Linq if you want some obfuscated code...
Related
Here is what i have tried so far , but no new shot variable is being declared
Module Module1
Dim shotlist As New List(Of Boolean)
Dim shono As Integer = 0
Dim shonos As String
Dim shotname As String
Dim fshot As Boolean
Dim shots As String
Sub Main()
For i As Integer = 0 To 1000
Dim ("shots" & i) as String = "shots" & i
fshot = Convert.ToBoolean(shots)
Next
End Sub
End Module
You can't do things this way. The variable names you see when you write a program are lost, turned into memory addresses by the compiler when it compiles your program. You cannot store any information using a variable's name - the name is just a reference for you, the programmer, during the time you write a program (design time)
Think of variables like buckets, with labels on the outside. Buckets only hold certain kinds of data inside
Dim shotname as String
This declares a bucket labelled shotname on the outside and it can hold a string inside. You can put a string inside it:
shotname = "Shot 1"
You can put any string you like inside this bucket, and thus anything you can reasonably represent as a string can also be put inside this bucket:
shotname = DateTime.Now.ToString()
This takes the current time and turns it into a string that looks like a date/time, and puts it in the bucket. The thing in the bucket is always a string, and lots of things (nearly anything actually) can be represented as a string, but we don't type all our buckets as strings because it isn't very useful - if you have two buckets that hold numbers for example, you can multiply them:
Dim a as Integer
a=2
Dim b as Integer
b=3
Dim c as Integer
c=a*b
But you can't multiply strings, even if they're strings trust look like numbers; strings would have to be converted to numbers first. Storing everything as a string and converting it to some other type before working on it then converting it back to a string to store it would be very wearisome
So that's all well and good and solves the problem of storing varying information in the computer memory and giving it a name that you can reference it by, but it means the developer has to know all the info that will ever be entered into the program. shotname is just one bucket storing one bit of info. Sure you could have
Dim shotname1 as String
Dim shotname2 as String
Dim shotname3 as String
But it would be quite tedious copy paste exercise to do this for a thousand shotname, and after you've done it you have to refer to all of them individually using their full name. There isn't a way to dynamically refer to bucket labels when you write a program, so this won't work:
For i = 0 To 1000
shotname&i = "This is shotname number " & i
Next i
Remember, shotname is lost when compiling, and i is too, so shotname&i just flat out cannot work. Both these things become, in essence, memory addresses that mean something to the compiler and you can't join two memory addresses together to get a third memory address that stores some data. They were only ever nice names for you to help you understand how to write the program, pick good names and and not get confused about what is what
Instead you need the mechanism that was invented to allow varying amounts of data not known at the design-time of the program - arrays
At their heart, arrays are what you're trying to do. You're trying to have load of buckets with a name you can vary and form the name programmatically.
Array naming is simple; an array has a name like any other variable (what I've been calling a bucket up to now - going to switch to calling them variables because everyone else does) and the name refers to the array as a whole, but the array is divided up into multiple slots all of which store the same type of data and have a unique index. Though the name of the array itself is fixed, the index can be varied; together they form a reference to a slot within the array and provide a mechanism for having names that can be generated dynamically
Dim shotnames(999) as String
This here is an array of 1000 strings, and it's called shotnames. It's good to use a plural name because it helps remind you that it's an array, holding multiple something. The 999 defines the last valid slot in the array - arrays start at 0, so with this one running 0 to 999 means there are 1000 entries in it
And critically, though the shotnames part of the variable name remains fixed and must be what you use to refer to the array itself(if you want to do something with the entire thing, like pass it to a function), referring to an individual element/slot in the array is done by tacking a number onto the end within brackets:
shotnames(587) = "The 588th shotname"
Keep in mind that "starts at 0 thing"
This 587 can come from anything that supplies a number; it doesn't have to be hard coded by you when you write the program:
For i = 0 to 999
shotnames(i) = "The " & (i+1) & "th shotname"
Next i
Or things that generate numbers:
shotnames(DateTime.Now.Minute) = "X"
If it's 12:59 when this code runs, then shotnames(59), the sixtieth slot in the array, will become filled with X
There are other kinds of varying storage; a list or a dictionary are commonly used examples, but they follow these same notions - you'll get a variable where part of the name is fixed and part of the name is an index you can vary.
At their heart they are just arrays too- if you looked inside a List you'd find an array with some extra code surrounding it that, when the array gets full, it makes a new array of twice the size and copies everything over. This way it provides "an array that expands" type functionality - something that arrays don't do natively.
Dictionaries are worth mentioning because they provide a way to index by other things than a number and can achieve storage of a great number of things not known at design time as a result:
Dim x as New Dictionary(Of String, Object)
This creates a dictionary indexed by a string that stores objects (I.e. anything - dates, strings, numbers, people..)
You could do like:
x("name") = "John"
x("age") = 32
You could even realize what you were trying to do earlier:
For i = 0 to 999
x("shotname"&i) = "The " & (i+1) & "th shotname"
Next i
(Though you'd probably do this in a Dictionary(Of string, string), ie a dictionary that is both indexed by string and stores strings.. and you probably wouldn't go to this level of effort to have a dictionary that stores x("shotname587") when it's simpler to declare an array shotname(587))
But the original premise remains true: your variable has a fixed part of the name (ie x) and a changeable part of the name (ie the string in the brackets) that is used as an index.
And there is no magic to the indexing by string either. If you opened up a dictionary and looked inside it, you'd find an array, together with some bits of code that take the string you passed in as an index, and turn it into a number like 587, and store the info you want in index 587. And there is a routine to deal with the case where two different strings both become 587 when converted, but other than that it's all just "if you want a variable where part of the name is changeable/formable programmatically rather than by the developer, it's an array"
I am trying to program a way to read a text file and match all the values and their quantites. For example if the text file is like this:
Bread-10 Flour-2 Orange-2 Bread-3
I want to create a list with the total quantity of all the common words. I began my code, but I am having trouble understanding to to sum the values. I'm not asking for anyone to write the code for me but I am having trouble finding resources. I have the following code:
Dim query = From data In IO.File.ReadAllLines("C:\User\Desktop\doc.txt")
Let name As String = data.Split("-")(0)
Let quantity As Integer = CInt(data.Split("-")(1))
Let sum As Integer = 0
For i As Integer = 0 To query.Count - 1
For j As Integer = i To
Next
Thanks
Ok, lets break this down. And I not seen the LET command used for a long time (back in the GWBASIC days!).
But, that's ok.
So, first up, we going to assume your text file is like this:
Bread-10
Flour-2
Orange-2
Bread-3
As opposed to this:
Bread-10 Flour-2 Orange-2 Bread-3
Now, we could read one line, and then process the information. Or we can read all lines of text, and THEN process the data. If the file is not huge (say a few 100 lines), then performance is not much of a issue, so lets just read in the whole file in one shot (and your code also had this idea).
Your start code is good. So, lets keep it (well ok, very close).
A few things:
We don't need the LET for assignment. While older BASIC languages had this, and vb.net still supports this? We don't need it. (but you will see examples of that still floating around in vb.net - especially for what we call "class" module code, or "custom classes". But again lets just leave that for another day.
Now the next part? We could start building up a array, look for the existing value, and then add it. However, this would require a few extra arrays, and a few extra loops.
However, in .net land, we have a cool thing called a dictionary.
And that's just a fancy term of for a collection VERY much like an array, but it has some extra "fancy" features. The fancy feature is that it allows one to put into the handly list things by a "key" name, and then pull that "value" out by the key.
This saves us a good number of extra looping type of code.
And it also means we don't need a array for the results.
This key system is ALSO very fast (behind the scene it uses some cool concepts - hash coding).
So, our code to do this would look like this:
Note I could have saved a few lines here or there - but that would make this code hard to read.
Given that you look to have Fortran, or older BASIC language experience, then lets try to keep the code style somewhat similar. it is stunning that vb.net seems to consume even 40 year old GWBASIC type of syntax here.
Do note that arrays() in vb.net do have some fancy "find" options, but the dictionary structure is even nicer. It also means we can often traverse the results with out say needing a for i = 1 to end of array, and having to pull out values that way.
We can use for each.
So this would work:
Dim MyData() As String ' an array() of strings - one line per array
MyData = File.ReadAllLines("c:\test5\doc.txt") ' read each line to array()
Dim colSums As New Dictionary(Of String, Integer) ' to hold our values and sum them
Dim sKey As String
Dim sValue As Integer
For Each strLine As String In MyData
sKey = Split(strLine, "-")(0)
sValue = Split(strLine, "-")(1)
If colSums.ContainsKey(sKey) Then
colSums(sKey) = colSums(sKey) + sValue
Else
colSums.Add(sKey, sValue)
End If
Next
' display results
Dim KeyPair As KeyValuePair(Of String, Integer)
For Each KeyPair In colSums
Debug.Print(KeyPair.Key & " = " & KeyPair.Value)
Next
The above results in this output in the debug window:
Bread = 13
Flour = 2
Orange = 2
I was tempted here to write this code using just pure array() in vb.net, as that would give you a good idea of the "older" types of coding and syntax we could use here, and a approach that harks all the way back to those older PC basic systems.
While the dictionary feature is more advanced, it is worth the learning curve here, and it makes this problem a lot easier. I mean, if this was for a longer list? Then I would start to consider introduction of some kind of data base system.
However, without some data system, then the dictionary feature is a welcome approach due to that "key" value lookup ability, and not having to loop. It also a very high speed system, so the result is not much looping code, and better yet we write less code.
I'm a beginner in VB and I'm throwing together a quick tool to pull data out of excel sheets into an SQL server.
The actual opening/manipulation of the excel files I can do, but I would like to limit the files I'm dealing with based on the created date, but I'm struggling with googling a solution
So in order to get all of the paths, I'm simply using:
Dim fname As String
For Each file As String In Directory.GetFiles(pathtoscan)
fname = Path.GetFileName(file)
Which works fine to get everything (writing to the SQL server table as I go), but of course the above means getting every single path, where as I'd like to "optimise" it by only getting paths created after a certain defined date.
Is this doable, or would it simply be a matter of filtering after grabbing all paths anyway, and thus mean no better "performance"?
Many thanks in advance.
You can use LINQ and File.GetCreationTime:
Dim relevantFiles = From f In Directory.EnumerateFiles(pathtoscan)
Where File.GetCreationTime(f) > yourDate
For Each file As String In relevantFiles
' ... '
Next
Also use EnumerateFiles instead of GetFiles. The latter returns an array of all files before you start filtering them. The former returns one after the other.
I've got a folder browser dialogue populating the directory location (path) of a system.io.directory.getfiles. The issue is if you accidentally select a folder with hundereds or thousands of files (which there's no reason you would ever need this for this app) it will lock up the app while it grabs all the files. All I'm grabbing are the directory locations as strings and want to put a limit on the amount of files that can be grabbed. Here's my current code that isn't working.
If JigFolderBrowse.ShowDialog = DialogResult.OK Then
Dim dirs(50) As String
dirs = System.IO.Directory.GetFiles(JigFolderBrowse.SelectedPath.ToString, "*", System.IO.SearchOption.AllDirectories)
If dirs.Length> 50 Then
MsgBox("Too Many Files Selected" + vbNewLine + "Select A Smaller Folder To Be Organized")
Exit Sub
End If
'Seperate Each File By Type
For i = 0 To dirs.Length - 1
If Not dirs(i).Contains("~$") Then
If dirs(i).Contains(".SLDPRT") Or dirs(i).Contains(".sldprt") Then
PartsListBx.Items.Add(dirs(i))
ElseIf dirs(i).Contains(".SLDASM") Or dirs(i).Contains(".sldasm") Then
AssemListBx.Items.Add(dirs(i))
ElseIf dirs(i).Contains(".SLDDRW") Or dirs(i).Contains(".slddrw") Then
DrawingListBx.Items.Add(dirs(i))
ElseIf dirs(i).Contains(".pdf") Or dirs(i).Contains(".PDF") Then
PDFsListBx.Items.Add(dirs(i))
ElseIf dirs(i).Contains(".DXF") Or dirs(i).Contains(".dxf") Then
DXFsListBx.Items.Add(dirs(i))
ElseIf Not dirs(i).Contains(".db") Then
OtherFilesListBx.Items.Add(dirs(i))
End If
End If
The Directory.GetFiles method always retrieves the full list of matching files before returning. There is no way to limit it (outside of specifying a more narrow search pattern, that is). There is, however, the Directory.EnumerateFiles method which does what you need. From the MSDN article:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
So, for instance, you could do something like this:
dirs = Directory.
EnumerateFiles(
JigFolderBrowse.SelectedPath.ToString(),
"*",
SearchOption.AllDirectories).
Take(50).
ToArray()
Take is a LINQ extension method which returns only the first x-number of items from any IEnumerable(Of T) list. So, in order for that line to work, you'll need to import the System.Linq namespace. If you can't, or don't want to, use LINQ, you can just implement your own method that does the same sort of thing (iterates an IEnumerable list in a for loop and returns after reading only the first 50 items).
Side Note 1: Unused Array
Also, it's worth mentioning, in your code, you initialize your dirs variable to point to a 50-element string array. You then, in the very next line, set it to point to a whole new array (the one returned by the Directory.GetFiles method). While it's not breaking functionality, it is unnecessarily inefficient. You're creating that extra array, just giving the garbage collector extra work to do, for no reason. You never use that first array. It just gets dereferenced and discarded in the very next line. It would be better to create the array variable as null:
Dim dirs() As String
Or
Dim dirs() As String = Nothing
Or, better yet:
Dim dirs() As String = Directory.
EnumerateFiles(
JigFolderBrowse.SelectedPath.ToString(),
"*",
SearchOption.AllDirectories).
Take(50).
ToArray()
Side Note 2: File Extension Comparisons
Also, it looks like you are trying to compare the file extensions in a case-insensitive way. There are two problems with the way you are doing it. First, you only comparing it against two values: all lowercase (e.g. ".pdf") and all uppercase (e.g. ".PDF). That won't work with mixed-case (e.g. ".Pdf").
It is admittedly annoying that the String.Contains method does not have a case-sensitivity option. So, while it's a little hokey, the best option would be to make use of the String.IndexOf method, which does have a case-insensitive option:
If dirs(i).IndexOf(".pdf", StringComparison.CurrentCultureIgnoreCase) <> -1 Then
However, the second problem, which invalidates my last point of advice, is that you are checking to see if the string contains the particular file extension rather than checking to see if it ends with it. So, for instance, a file name like "My.pdf.zip" will still match, even though it's extension is ".zip" rather than ".pdf". Perhaps this was your intent, but, if not, I would recommend using the Path.GetExtension method to get the actual extension of the file name and then compare that. For instance:
Dim ext As String = Path.GetExtension(dirs(i))
If ext.Equals("pdf", StringComparison.CurrentCultureIgnoreCase) Then
' ...
Im sure someone out there can help, im totally new to coding but getting into it and really enjoying. I know this is such a simple question out there for you folks but i have the following, I load a spread sheet of strings (2 columns) into a datagridview the reason i do this because there is over 100,000 find and replaces and these will generally sit within and existing string when searching, then from there i want to simply search a txt file and find and replace a number of strings in it. So it would check each row in a datagrid take from column 1 the find and use column 2 to replace then outputs the string to another txt file once the find and replace has taken place. My current results are that it just takes what was in the first file and copies without replacing in the second find.
Any assistance is gratefully received, many thanks.
Please see below my amateur code:-
Private Sub CmdBtnTestReplace_Click(sender As System.Object, e As System.EventArgs) Handles CmdBtnTestReplace.Click
Dim fName As String = "c:\backup\logs\masterUser.txt"
Dim wrtFile As String = "c:\backup\logs\masterUserFormatted.txt"
Dim strRead As New System.IO.StreamReader(fName)
Dim strWrite As New System.IO.StreamWriter(wrtFile)
Dim s As String
Dim o As String
For Each row As DataGridViewRow In DataGridView1.Rows
If Not row.IsNewRow Then
Dim Find1 As String = row.Cells(0).Value.ToString
Dim Replace1 As String = row.Cells(1).Value.ToString
Cursor.Current = Cursors.WaitCursor
s = strRead.ReadToEnd()
o = s.Replace(Find1, Replace1)
strWrite.Write(o)
End If
Next
strRead.Close()
strWrite.Close()
Cursor.Current = Cursors.Default
MessageBox.Show("Finished Replacing")
End Sub
1. What you are doing is :
creating a StreamReader whose purpose is to read chars from a File/Stream in sequence.
creating a StreamWriter whose purpose is to add content to a File/Stream.
then looping
a) read the remaining content of file fName and put it in s
b) replace words from s and put the result in o
c) add o to the existing content of the file wrtFile
then the usual closing of the stream reader/writer...
But that doesn't work because, on the secund iteration of the loop, strRead is already at the end of your loaded file, then there is nothing to read anymore, and s is always an empty string starting from the secund iteration.
Furthermore, because s is empty, o will be empty aswell.
And last of all, even if you manage to re-read the content of the file and replace the words, strWrite will not clear the initial content of the output file, but will write the resulting replaced string (o) after the previously updated content of the file.
2. Since you loaded the content of the file in a string (s = strRead.ReadToEnd()), why don't you :
load that s string before the For-Next block
loop the datagridview rows in a For-Next block
replace using the pair Find1/Replace1 s = s.Replace(Find1, Replace1)
then, save the content of s in the targeted file outside the For-Next block
3. However, improving your understanding of how streams work, what should be considered and what are forbidden is a bit outside the scope of SO I think; such documentation could be found/gathered on the MSDN page or with the help of your friend : google. The same applies for finding out/thinking of how you should arrange your code, how to achieve your goal.Let's take an example :
' Content of your file :
One Two Three Four Five Six
' Content of your DataGridView :
One | Two
Two | Three
Three | Four
Four | Five
Five | Six
Six | Seven
The resulting replacement text at the end of a similar routine as yours will be :
Seven Seven Seven Seven Seven Seven ' :/
' while the expected result would be :
Two Three Four Five Six Seven
And that's because of the iteration : already replaced portions of your file (or loaded file content) could get replaced again and again. To avoid that, either :
split the loaded content in single words, and use a "replaced" flag for each word (to avoid replacing that word more than once)
or preload all the pair Find/Replace, and parse the file content in sequence once, replacing that instance when required.
So, before using an interesting object in the framework :
you should know what it does and how it behaves
otherwise -> read the documentation
otherwise -> create a minimalistic test solution which purpose is to brute force testings on that particular object to debunck all its powers and flaws.
So, like I said in 2., move those ReadAllText() and Write() outside the For/Next block to start from and have a look at the resulting output (Ask specific questions in comments when google can't answer) Then if you're OK with it even if issue like the One Two Three example above could occur, then voila ! Otherwise, use google to gather more examples on "splitting text in words" and reformating the whole, have some tries, then get back here if you're stuck on precise issues.