Creating multiple .txt files while restricting size of each - vb.net

In my program, I collect bits of information on a massive scale, hundreds of thousands to millions of lines each. I am trying to limit each file I create to a certain size in order to be able to quickly open it and read the data. I am using a HashSet to collect all the data without duplicates.
Here's my code so far:
Dim Founds As HashSet(Of String)
Dim filename As String = (Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\Sorted_byKING\sorted" + Label4.Text + ".txt")
Using writer As New System.IO.StreamWriter(filename)
For Each line As String In Founds
writer.WriteLine(line)
Next
Label4.Text = Label4.Text + 1 'Increments sorted1.txt, sorted2.txt etc
End Using
So, my question is:
How do I go about saving, let's say 250,000 lines in a text file before moving to another one and adding the next 250,000?

First of all, do not use Labels to simply store values. You should use variables instead, that's what variables are for.
Another piece of advice: always use Path.Combine to concatenate paths. That way you don't have to worry about whether each part of the path ends with a separator character.
Now, to answer your question:
If you'd like to insert the text line by line, you can use something like:
Sub SplitAndWriteLineByLine()
Dim Founds As HashSet(Of String) 'Don't forget to initialize and fill your HashSet
Dim maxLinesPerFile As Integer = 250000
Dim fileNum As Integer = 0
Dim counter As Integer = 0
Dim filename As String = String.Empty
Dim writer As IO.StreamWriter = Nothing
For Each line As String In Founds
If counter Mod maxLinesPerFile = 0 Then
fileNum += 1
filename = IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop),
$"Sorted_byKING\sorted{fileNum.ToString}.txt")
If writer IsNot Nothing Then writer.Close()
writer = New IO.StreamWriter(filename)
End If
writer.WriteLine(line)
counter += 1
Next
If writer IsNot Nothing Then writer.Dispose() 'writer is still Nothing when Founds is empty
End Sub
However, if you will be inserting the text from the HashSet as is, you probably don't need to write line by line; instead, you can write each "bunch" of lines at once. You could use something like the following:
Sub SplitAndWriteAll()
Dim Founds As HashSet(Of String) 'Don't forget to initialize and fill your HashSet
Dim maxLinesPerFile As Integer = 250000
Dim fileNum As Integer = 0
Dim filename As String = String.Empty
For i = 0 To Founds.Count - 1 Step maxLinesPerFile
fileNum += 1
filename = IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop),
$"Sorted_byKING\sorted{fileNum.ToString}.txt")
IO.File.WriteAllLines(filename, Founds.Skip(i).Take(maxLinesPerFile))
Next
End Sub
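One thing to be aware of with the Skip/Take version: each Skip(i) call walks the set again from the start, so later files cost more to produce than earlier ones. A minimal sketch of a single-pass alternative follows. The helper name InBatches and the module name are made up for illustration and are not part of the framework.

```vbnet
Imports System
Imports System.Collections.Generic

Module ChunkSketch
    ' Hypothetical helper: yields consecutive batches of at most batchSize items,
    ' enumerating the source only once.
    Iterator Function InBatches(Of T)(source As IEnumerable(Of T), batchSize As Integer) As IEnumerable(Of List(Of T))
        If batchSize <= 0 Then Throw New ArgumentOutOfRangeException(NameOf(batchSize))
        Dim batch As New List(Of T)(batchSize)
        For Each item As T In source
            batch.Add(item)
            If batch.Count = batchSize Then
                Yield batch
                batch = New List(Of T)(batchSize) ' start a fresh batch
            End If
        Next
        ' Hand back the final, partially filled batch (if any).
        If batch.Count > 0 Then Yield batch
    End Function
End Module
```

Each batch could then be passed to IO.File.WriteAllLines in a For Each loop, incrementing fileNum once per batch as in the code above.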

Related

exclude header from csv in vb.net

I got a .csv and I want to load it into a datagridview. I have a button called button1 and I got a datagridview called datagridview1. I click the button and it appears... including the header, which I don't want.
Please:
How do I exclude the header from the .csv ?
code:
Imports System.IO
Imports System.Text
Public Class CSV_Reader
Private Sub CSV_Reader_Load(sender As Object, e As EventArgs) Handles MyBase.Load
End Sub
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim filename As String = "C:\Users\Gaius\Desktop\meepmoop.csv"
Dim thereader As New StreamReader(filename, Encoding.Default)
Dim colsexpected As Integer = 7
Dim sline As String = ""
DataGridView1.Rows.Clear()
Do
sline = thereader.ReadLine
If sline Is Nothing Then Exit Do
Dim words() As String = sline.Split(";")
DataGridView1.Rows.Add("")
If words.Length = colsexpected Then
For ix As Integer = 0 To 6
DataGridView1.Rows(DataGridView1.Rows.Count - 2).Cells(ix).Value = words(ix)
Next
Else
DataGridView1.Rows(DataGridView1.Rows.Count - 2).Cells(0).Value = "ERROR"
End If
Loop
thereader.Close()
End Sub
End Class
meepmoop.csv:
alpha;bravo;charlie;delta;echo;foxtrot;golf
1;meep;moop;meep;moop;meep;moop
2;moop;meep;moop;meep;moop;meep
3;meep;moop;meep;moop;meep;moop
4;moop;meep;moop;meep;moop;meep
5;meep;moop;meep;moop;meep;moop
6;moop;meep;moop;meep;moop;meep
7;meep;moop;meep;moop;meep;moop
8;moop;meep;moop;meep;moop;meep
9;meep;moop;meep;moop;meep;moop
10;moop;meep;moop;meep;moop;meep
edit:
[...]
Dim sline As String = ""
DataGridView1.Rows.Clear()
Dim line As String = thereader.ReadLine()
If line Is Nothing Then Return
Do
sline = thereader.ReadLine
[...]
The above addition to the code works but I have no idea why. Nor do I understand why I have to -2 rather than -1. I can't rely on guesswork, I'm expected to one day do this professionally. But I just can't wrap my head around it. Explanation welcome.
edit:
Do
sline = thereader.ReadLine
If sline Is Nothing Then Exit Do
Dim words() As String = sline.Split(";")
If words.Count = 7 Then
DataGridView1.Rows.Add(words(0), words(1), words(2), words(3), words(4), words(5), words(6))
Else
MsgBox("ERROR - There are " & words.Count & " columns in this row and there must be 7!")
End If
Loop
I've shortened the Loop on the advice of a coworker, taking his word on it being 'better this way'.
Another method, using Enumerable.Select() + .Skip()
As noted in Ondřej's answer, there's a specific tool for these operations: TextFieldParser
But if there are no special requirements and the string parsing is straightforward enough, it can be done with the standard tools, as shown in Tim Schmelter's answer.
This method enumerates the string arrays returned by the Split() method and collects them in a list that can then be used in different ways: as a raw text source (as in this case) or as a DataSource.
Dim FileName As String = "C:\Users\Gaius\Desktop\meepmoop.csv"
Dim Delimiter As Char = ";"c
Dim ColsExpected As Integer = 7
If Not File.Exists(FileName) Then Return
Dim Lines As String() = File.ReadAllLines(FileName, Encoding.Default)
Dim StringColumns As List(Of String()) =
Lines.Select(Function(line) Split(line, Delimiter, ColsExpected, CompareMethod.Text)).
Skip(1).ToList()
DataGridView1.Rows.Clear()
'If the DataGridView is empty, add a `[ColsExpected]` number of `Columns`:
DataGridView1.Columns.AddRange(Enumerable.Range(0, ColsExpected).
Select(Function(col) New DataGridViewTextBoxColumn()).ToArray())
StringColumns.Select(Function(row) DataGridView1.Rows.Add(row)).ToList()
If you instead want to include and use the Header because your DataGridView is empty (it has no predefined Columns), you could use the Header line in the .csv file to create the control's Columns:
'Include the header (no .Skip())
Dim StringColumns As List(Of String()) =
Lines.Select(Function(line) Split(line, Delimiter, ColsExpected, CompareMethod.Text)).ToList()
'Insert the columns with the .csv header columns description
DataGridView1.Columns.AddRange(Enumerable.Range(0, ColsExpected).
Select(Function(col, idx) New DataGridViewTextBoxColumn() With {
.HeaderText = StringColumns(0)(idx)
}).ToArray())
'Remove the header line...
StringColumns.RemoveAt(0)
StringColumns.Select(Function(row) DataGridView1.Rows.Add(row)).ToList()
You can skip the header by calling ReadLine once before the loop, so that the first line is read and discarded. Also use the Using statement:
Using thereader As New StreamReader(filename, Encoding.Default)
Dim colsexpected As Integer = 7
Dim sline As String = ""
Dim line As String = thereader.ReadLine() ' header
If line Is Nothing Then Return
Do
sline = thereader.ReadLine()
If sline Is Nothing Then Exit Do
Dim words() As String = sline.Split(";"c)
' ... '
Loop
End Using
You should use the VB.NET class that is designed and tested for this purpose: Microsoft.VisualBasic.FileIO.TextFieldParser. You can skip the header by calling ReadFields() once before you start parsing in the loop.
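A minimal sketch of that TextFieldParser approach, assuming a semicolon-delimited file like meepmoop.csv. The function name ReadRowsWithoutHeader and the module name are invented for this example. The returned rows could then be added to the grid with DataGridView1.Rows.Add(row).

```vbnet
Imports System.Collections.Generic
Imports Microsoft.VisualBasic.FileIO

Module CsvSketch
    ' Hypothetical helper: returns every record of a delimited file except the first one.
    Function ReadRowsWithoutHeader(path As String, delimiter As String) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(path)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(delimiter)
            ' Read the header record once and throw it away.
            If Not parser.EndOfData Then parser.ReadFields()
            ' Everything that follows is data.
            While Not parser.EndOfData
                rows.Add(parser.ReadFields())
            End While
        End Using
        Return rows
    End Function
End Module
```

The column-count check from the question (words.Length = colsexpected) would still apply to each returned array.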

Why is each String being written to a different file?

I am trying to generate a size-based list of files. The current size being passed is 10 MB worth of file-names per text file. Instead of it counting to 10 MB and then incrementing the version letter, it is writing each file-name to its own individual file. This is strange as each file is ~150 kb, but I cannot figure out why it is reporting total as > number every time the code loops.
Private Function GenerateListsForSize(source As String, destination As String, name As String, number As Integer)
Dim files As ArrayList = New ArrayList
Dim total As Integer
Dim version As Char = "A"
Dim path As String
Dim counter As Integer = 0
Dim passTexts As ArrayList = New ArrayList
Dim infoReader As System.IO.FileInfo
For Each foundFile As String In My.Computer.FileSystem.GetFiles(source)
files.Add(foundFile)
Next
If files.Count > 1 Then 'If files exist in dir, count them and get how many lists
path = destination & "\" & name & version & ".txt"
Dim fs As FileStream = File.Create(path) 'creates the first text file
fs.Close()
passTexts.Add(path)
For Each foundfile As String In files
Using sw As StreamWriter = New StreamWriter(path)
Console.WriteLine(foundfile)
sw.WriteLine(foundfile)
End Using
infoReader = My.Computer.FileSystem.GetFileInfo(foundfile)
total = total + infoReader.Length
If total >= number Then 'If max file size is reached
version = Chr(Asc(version) + 1) 'Increments Version
path = destination & "\" & name & version & ".txt" 'Corrects path
fs = File.Create(path) 'creates the new text file with updated path
fs.Close()
passTexts.Add(path)
total = 0 'resets total
End If
Next
End If
Return passTexts
End Function
Every time through the loop, you open the file (using the StreamWriter), which overwrites the previous contents. Your file will only ever have one filename inside it. Instead of opening and writing every time through the loop, only write the file when you have accumulated all the filenames. I removed the calls to File.Create as they aren't necessary. The StreamWriter will create the file if it doesn't exist. And I changed the ArrayLists to List(Of String) since they're easier to work with. Also, be sure to turn Option Strict On. This code has not been tested, but it should get my point across. I hope I haven't misunderstood what you were trying to do.
Private Function GenerateListsForSize(source As String, destination As String, name As String, number As Integer) As List(Of String)
Dim files As New List(Of String)()
Dim filenamesToWrite As New List(Of String)()
Dim total As Long 'FileInfo.Length is a Long
Dim version As Char = "A"c
Dim filename As String
Dim passTexts As New List(Of String)()
Dim infoReader As System.IO.FileInfo
files.AddRange(My.Computer.FileSystem.GetFiles(source))
If files.Count > 0 Then 'If files exist in dir, count them and get how many lists
'Path.Combine is preferable to concatenating strings.
filename = Path.Combine(destination, String.Format("{0}{1}.txt", name, version))
For Each foundfile As String In files
filenamesToWrite.Add(foundfile)
infoReader = My.Computer.FileSystem.GetFileInfo(foundfile)
total = total + infoReader.Length
If total >= number Then 'If max file size is reached
'Only write when the list is complete for this batch.
Using sw As StreamWriter = New StreamWriter(filename)
For Each fname As String In filenamesToWrite
Console.WriteLine(fname)
sw.WriteLine(fname)
Next
End Using
passTexts.Add(filename) 'record the list file only once it has been written
version = Chr(Asc(version) + 1) 'Increments Version
filename = Path.Combine(destination, String.Format("{0}{1}.txt", name, version)) 'corrects path
total = 0 'resets total
filenamesToWrite.Clear() 'clear the list of file names to write
End If
Next
'Write whatever is left over as the final, partial batch.
If filenamesToWrite.Count > 0 Then
Using sw As StreamWriter = New StreamWriter(filename)
For Each fname As String In filenamesToWrite
sw.WriteLine(fname)
Next
End Using
passTexts.Add(filename)
End If
End If
Return passTexts
End Function

VB "Index was out of range, must be non-negative and less than the size of the collection." When trying to generate a random number more than once

So I'm trying to generate a random number on button click. This number needs to be between two numbers that are inside my text file, along with various other things, all separated by the "|" symbol. The number is then put into the text of a textbox which is created after I run the form. I can get everything to work perfectly once, but as soon as I try to generate a different random number it gives me the error: "Index was out of range, must be non-negative and less than the size of the collection." Here is the main code, as well as the block that generates the textbox after loading the form, and the contents of my text file.
Private Sub generate()
Dim newrandom As New Random
Try
Using sr As New StreamReader(itemfile) 'Create a stream reader object for the file
'While we have lines to read in
Do Until sr.EndOfStream
Dim line As String
line = sr.ReadLine() 'Read a line out one at a time
Dim tmp()
tmp = Split(line, "|")
rows(lineNum).buybutton.Text = tmp(1)
rows(lineNum).buyprice.Text = newrandom.Next(tmp(2), tmp(3)) 'Generate the random number between two values
rows(lineNum).amount.Text = tmp(4)
rows(lineNum).sellprice.Text = tmp(5)
rows(lineNum).sellbutton.Text = tmp(1)
lineNum += 1
If sr.EndOfStream = True Then
sr.Close()
End If
Loop
End Using
Catch x As Exception ' Report any errors in reading the line of code
Dim errMsg As String = "Problems: " & x.Message
MsgBox(errMsg)
End Try
End Sub
Private Sub Form2_Load(sender As Object, e As EventArgs) Handles MyBase.Load
rows = New List(Of duplicate)
For dupnum = 0 To 11
'There are about 5 more of these above this one but they all have set values, this is the only troublesome one
Dim buyprice As System.Windows.Forms.TextBox
buyprice = New System.Windows.Forms.TextBox
buyprice.Width = textbox1.Width
buyprice.Height = textbox1.Height
buyprice.Left = textbox1.Left
buyprice.Top = textbox1.Top + 30 * dupnum
buyprice.Name = "buypricetxt" + Str(dupnum)
Me.Controls.Add(buyprice)
pair = New itemrow
pair.sellbutton = sellbutton
pair.amount = amounttxt
pair.sellprice = sellpricetxt
pair.buybutton = buybutton
pair.buyprice = buypricetxt
rows.Add(pair)
next
end sub
'textfile contents
0|Iron Sword|10|30|0|0
1|Steel Sword|20|40|0|0
2|Iron Shield|15|35|0|0
3|Steel Shield|30|50|0|0
4|Bread|5|10|0|0
5|Cloak|15|30|0|0
6|Tent|40|80|0|0
7|Leather Armour|50|70|0|0
8|Horse|100|200|0|0
9|Saddle|50|75|0|0
10|Opium|200|500|0|0
11|House|1000|5000|0|0
Not sure what else to add, if you know whats wrong please help :/ thanks
Add the following two lines to the start of generate():
Private Sub generate()
Dim lineNum
lineNum = 0
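This ensures that lineNum never points outside the collection. Judging by the code shown, lineNum is declared outside generate(), so it keeps its value between calls: after the first run it is already 12, and the second run starts by indexing rows(12), which does not exist.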
I usually consider it a good idea to add
Option Explicit
to my code - it forces me to declare my variables, and then I think about their initialization more carefully. It helps me consider their scope, too.
Try this little modification.
I took your original Sub and changed it a little. Give it a try and let us know if it solves the issue.
Private Sub generate()
Dim line As String
Dim lineNum As Integer = 0
Dim rn As New Random(Now.Millisecond)
Try
Using sr As New StreamReader(_path) 'Create a stream reader object for the file
'While we have lines to read in
While sr.Peek() >= 0 'Peek returns -1 once there is nothing left to read
line = sr.ReadLine() 'Read a line out one at a time
If Not String.IsNullOrWhiteSpace(line) Then 'also covers Nothing and empty strings
Dim tmp() As String
tmp = Split(line, "|")
rows(lineNum).buybutton.Text = tmp(1)
rows(lineNum).buyprice.Text = rn.Next(CInt(tmp(2)), CInt(tmp(3))).ToString() 'Generate the random number between two values (the upper bound is exclusive)
rows(lineNum).amount.Text = tmp(4)
rows(lineNum).sellprice.Text = tmp(5)
rows(lineNum).sellbutton.Text = tmp(1)
lineNum += 1
End If
End While
End Using
Catch x As Exception ' Report any errors in reading the line of code
Dim errMsg As String = "Problems: " & x.Message
MsgBox(errMsg)
End Try
End Sub
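One detail worth keeping in mind with Random.Next(min, max): the upper bound is exclusive, so for the line 0|Iron Sword|10|30|0|0 the call above can return 10 through 29 but never 30. If the maximum price should be reachable, something like the sketch below could be used. The helper name RandomPriceInclusive is made up for illustration, and it assumes fields 2 and 3 are always valid integers well below Integer.MaxValue.

```vbnet
Imports System

Module PriceSketch
    ' Hypothetical helper: picks a price from fields 2 and 3 of a "|"-separated line,
    ' with both ends of the range included.
    Function RandomPriceInclusive(line As String, rng As Random) As Integer
        Dim fields As String() = line.Split("|"c)
        Dim minPrice As Integer = Integer.Parse(fields(2))
        Dim maxPrice As Integer = Integer.Parse(fields(3))
        ' Random.Next(min, max) never returns max, so add 1 to include it.
        Return rng.Next(minPrice, maxPrice + 1)
    End Function
End Module
```

Passing in one shared Random instance, rather than creating a new one per call, avoids getting repeated values from generators seeded at nearly the same time.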

permutation not accepting large words

The VB.NET code below generates the permutations of a given word. The problem I have is that it does not accept larger words like "photosynthesis", "Calendar", etc. but accepts smaller words like "book", "land", etc. What is missing? Please help.
Module Module1
Sub Main()
Dim strInputString As String = String.Empty
Dim lstPermutations As List(Of String)
'Loop until exit character is read
While strInputString <> "x"
Console.Write("Please enter a string or x to exit: ")
strInputString = Console.ReadLine()
If strInputString = "x" Then
Continue While
End If
'Create a new list and append all possible permutations to it.
lstPermutations = New List(Of String)
Append(strInputString, lstPermutations)
'Sort and display list+stats
lstPermutations.Sort()
For Each strPermutation As String In lstPermutations
Console.WriteLine("Permutation: " + strPermutation)
Next
Console.WriteLine("Total: " + lstPermutations.Count.ToString)
Console.WriteLine("")
End While
End Sub
Public Sub Append(ByVal pString As String, ByRef pList As List(Of String))
Dim strInsertValue As String
Dim strBase As String
Dim strComposed As String
'Add the base string to the list if it doesn't exist
If pList.Contains(pString) = False Then
pList.Add(pString)
End If
'Iterate through every possible set of characters
For intLoop As Integer = 1 To pString.Length - 1
'we need to slide and call an interative function.
For intInnerLoop As Integer = 0 To pString.Length - intLoop
'Get a base insert value, example (a,ab,abc)
strInsertValue = pString.Substring(intInnerLoop, intLoop)
'Remove the base insert value from the string eg (bcd,cd,d)
strBase = pString.Remove(intInnerLoop, intLoop)
'insert the value from the string into spot and check
For intCharLoop As Integer = 0 To strBase.Length - 1
strComposed = strBase.Insert(intCharLoop, strInsertValue)
If pList.Contains(strComposed) = False Then
pList.Add(strComposed)
'Call the same function to review any sub-permutations.
Append(strComposed, pList)
End If
Next
Next
Next
End Sub
End Module
Without actually creating a project to run this code, nor knowing how it 'doesn't accept' long words, my answer would be that there are a lot of permutations for long words and your program is just taking much longer than you're expecting to run. So you probably think it has crashed.
UPDATE:
The problem is the recursion: it's blowing up the stack. You'll have to rewrite your code to use iteration instead of recursion. This is explained in general terms here:
http://www.refactoring.com/catalog/replaceRecursionWithIteration.html
The pseudocode here uses iteration instead of recursion:
Generate list of all possible permutations of a string
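As a rough illustration of the iterative idea, here is a sketch that uses the classic "next lexicographic permutation" technique. This is a different algorithm from the insert-and-recurse approach in the question, not a line-by-line conversion of it, and the names NextPermutation and Permutations are made up for this example. It yields each distinct permutation once, in sorted order, without recursion and without a Contains check. Note that it is still bounded by the sheer number of results: a 14-letter word has up to 14! = 87,178,291,200 orderings (fewer when letters repeat), so the results are streamed rather than collected in a list.

```vbnet
Imports System
Imports System.Collections.Generic

Module PermutationSketch
    ' Rearranges chars in place into the next permutation in lexicographic order.
    ' Returns False when chars is already the last (fully descending) permutation.
    Function NextPermutation(chars As Char()) As Boolean
        ' Find the rightmost position whose character is smaller than its successor.
        Dim i As Integer = chars.Length - 2
        While i >= 0 AndAlso chars(i) >= chars(i + 1)
            i -= 1
        End While
        If i < 0 Then Return False
        ' Find the rightmost character that is larger than chars(i) and swap the two.
        Dim j As Integer = chars.Length - 1
        While chars(j) <= chars(i)
            j -= 1
        End While
        Dim tmp As Char = chars(i)
        chars(i) = chars(j)
        chars(j) = tmp
        ' Reverse the tail so it becomes the smallest possible suffix.
        Array.Reverse(chars, i + 1, chars.Length - i - 1)
        Return True
    End Function

    ' Lazily yields every distinct permutation of word, starting from the sorted one.
    Iterator Function Permutations(word As String) As IEnumerable(Of String)
        Dim chars As Char() = word.ToCharArray()
        Array.Sort(chars)
        Do
            Yield New String(chars)
        Loop While NextPermutation(chars)
    End Function
End Module
```

In Main, the results could be consumed with For Each strPermutation As String In Permutations(strInputString), printing each one as it arrives instead of sorting a list afterwards.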

How to check if lines in string are separated by space?

I'm building a program that gets the publisher of a book by scanning its title page and using OCR … since publishers are always at the bottom of the title page, I'm thinking that detecting lines that are separated by space is a solution, but I don't know how to test for that. Here is my code:
Dim builder As New StringBuilder()
Dim reader As New StringReader(txtOCR.Text)
Dim iCounter As Integer = 0
While True
Dim line As String = reader.ReadLine()
If line Is Nothing Then Exit While
'i want to put the condition here
End While
txtPublisher.Text = builder.ToString()
Do you mean empty lines? Then you can do this:
Dim bEmpty As Boolean
And then inside the loop:
If line.Trim().Length = 0 Then
bEmpty = True
Else
If bEmpty Then
'...
End If
bEmpty = False
End If
Why not do the following: from the bottom, go up until you find the first non-empty line (no idea how the OCR works … maybe the bottom-most line is always non-empty, in which case this is redundant). In the next step, go up until the first empty line. The text in the middle is the publisher.
You don’t need the StringReader for that:
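' Split on the full newline sequence so that "\r\n" does not produce spurious empty entries.
Dim lines As String() = txtOCR.Text.Split({Environment.NewLine}, StringSplitOptions.None)
Dim bottom As Integer = lines.Length - 1
' Find bottom-most non-empty line.
Do While bottom >= 0 AndAlso String.IsNullOrWhiteSpace(lines(bottom))
bottom -= 1
Loop
' Find empty line above that (top ends up at -1 if there is none).
Dim top As Integer = bottom - 1
Do Until top < 0 OrElse String.IsNullOrWhiteSpace(lines(top))
top -= 1
Loop
Dim publisher As String = String.Empty
If bottom >= 0 Then 'bottom is -1 when every line is blank
' VB arrays are declared by upper bound, so this holds bottom - top elements.
Dim publisherSubset(bottom - top - 1) As String
Array.Copy(lines, top + 1, publisherSubset, 0, bottom - top)
publisher = String.Join(Environment.NewLine, publisherSubset)
End If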
But to be honest I don’t think this is a particularly good approach. It’s inflexible and doesn’t cope well with unexpected formatting. I’d instead use a regular expression to describe what the publisher string (and its context) looks like. And maybe even that isn’t enough and you have to put some thought into describing the whole page to extrapolate which of the bits is the publisher.
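Assuming the publisher is always on the last line and always comes after an empty line, perhaps the following?
Dim lines As New List(Of String)
Dim currentLine As String = ""
Dim previousLine As String = ""
'The OCR result is text, not a file path, so read it with a StringReader.
Using reader As New StringReader(txtOCR.Text)
currentLine = reader.ReadLine()
Do While currentLine IsNot Nothing
'Keep only non-blank lines that directly follow a blank line.
If String.IsNullOrWhiteSpace(previousLine) AndAlso Not String.IsNullOrWhiteSpace(currentLine) Then lines.Add(currentLine)
previousLine = currentLine
currentLine = reader.ReadLine()
Loop
End Using
txtPublisher.Text = lines.LastOrDefault()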
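To take the last line regardless of whether the previous line is blank/empty:
Dim lines As New List(Of String)
Using reader As New StringReader(txtOCR.Text)
Dim currentLine As String = reader.ReadLine()
Do While currentLine IsNot Nothing
lines.Add(currentLine)
currentLine = reader.ReadLine()
Loop
End Using
txtPublisher.Text = lines.LastOrDefault()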