Split a text file into (x) number of files with VB.NET - vb.net

I searched everywhere on stackoverflow.com but still have not found a solution.
My code is very simple:
'Get Full Path of File
Dim elements As String = Path.GetTempPath() & "file.txt"
'Create A new folder for outputs if not exist
If (Not System.IO.Directory.Exists(Path.GetTempPath() & "folder")) Then
System.IO.Directory.CreateDirectory(Path.GetTempPath() & "folder")
End If
I want to divide the contents of file.txt into (x) files inside the new folder.
Example:
if x = 3
the output files will be created automatically:
/folder/file_1.txt
/folder/file_2.txt
/folder/file_3.txt

Read the data from the existing text file.
Divide the data into an arbitrary number of strings of arbitrary length.
If the new directory doesn't exist create it.
Create files in this directory to store the arbitrary sections of data until an arbitrary number of files have been created and all data has been stored.
Since you refuse to provide the logic (I guess it is a secret), I will have to make some assumptions.
The original file contains lines that are approximately the same length.
The intent is to divide the original file into files of approximately equal size.
The number of files is based on the size of the original file.
I split the files based on lines so a word would not be split between 2 files.
'Requires Imports System.IO and Imports System.Text at the top of the file
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim OriginalFilePath = "C:/devlist.txt" '16KB file
    Dim lines = File.ReadAllLines(OriginalFilePath)
    Dim NumOfLines = lines.Length
    Dim NumOfFiles As Integer = GetNumberOfFiles(OriginalFilePath, NumOfLines)
    If NumOfFiles = -1 Then
        MessageBox.Show("No data in file.")
        Return
    End If
    'The "\" operator is integer division
    Dim LinesPerFile = NumOfLines \ NumOfFiles
    Directory.CreateDirectory("C:\Some Directory")
    Dim StartIndex As Integer
    Dim EndIndex As Integer
    Dim sb As New StringBuilder
    For i = 0 To NumOfFiles - 1
        If i = NumOfFiles - 1 Then
            'The last file also takes any leftover lines
            EndIndex = NumOfLines - 1
        Else
            'Each of the other files gets exactly LinesPerFile lines
            EndIndex = StartIndex + LinesPerFile - 1
        End If
        For index = StartIndex To EndIndex
            sb.AppendLine(lines(index))
        Next
        Dim NewFilePath = $"C:\Some Directory\SplitFile{i.ToString}.txt"
        '.WriteAllText will create the new file or overwrite it if it exists
        File.WriteAllText(NewFilePath, sb.ToString)
        StartIndex = EndIndex + 1
        sb.Clear()
    Next
End Sub
Private Function GetNumberOfFiles(FilePath As String, NumOfLines As Integer) As Integer
    'Length is the file size in bytes
    Dim OriginalFileLength = New FileInfo(FilePath).Length
    Dim NumOfFiles As Integer
    Select Case OriginalFileLength
        Case 0
            'The caller reports the empty file to the user
            Return -1
        Case 1
            NumOfFiles = 1
        Case 2
            If NumOfLines < 2 Then
                NumOfFiles = 1
            Else
                NumOfFiles = 2
            End If
        Case 3 To 10_000
            If NumOfLines < 3 Then
                NumOfFiles = NumOfLines
            Else
                NumOfFiles = 3
            End If
        'You can continue the If statements but I assumed
        'a file of this size would have at least 4 lines
        Case 10_001 To 100_000
            NumOfFiles = 4
        Case 100_001 To 500_000
            NumOfFiles = 5
        Case Else
            NumOfFiles = 6
    End Select
    Return NumOfFiles
End Function

This is just a sample:
Dim Divider = 3
Dim myNewFile(Divider - 1) As String
Dim fileReader As String
fileReader = My.Computer.FileSystem.ReadAllText("C:\test.txt")
'Integer division gives the size of each part
Dim myNewSize As Integer = fileReader.Length \ Divider
If myNewSize = 0 Then
MessageBox.Show("Can't Be Processed")
Exit Sub
End If
'Substring is zero-based, so part myCnt starts at myCnt * myNewSize
For myCnt As Integer = 0 To Divider - 2
myNewFile(myCnt) = fileReader.Substring(myCnt * myNewSize, myNewSize)
Next
'The last part also includes the remainder
myNewFile(Divider - 1) = fileReader.Substring((Divider - 1) * myNewSize)
Then save each part to its own file.
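For the case described in the question, where the number of output files x is chosen by the caller, the split can be sketched roughly as follows. This is only a sketch under assumptions: the module name SplitSketch and the helpers SplitLines and SplitFile are hypothetical, the split is done on whole lines, and the output names follow the file_1.txt pattern from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SplitSketch
    ' Hypothetical helper: distributes the lines over x parts as evenly as possible.
    ' The first (lines.Length Mod x) parts receive one extra line.
    Function SplitLines(lines As String(), x As Integer) As List(Of String())
        If x < 1 Then Throw New ArgumentOutOfRangeException(NameOf(x))
        Dim parts As New List(Of String())
        Dim baseSize As Integer = lines.Length \ x
        Dim extra As Integer = lines.Length Mod x
        Dim start As Integer = 0
        For i As Integer = 0 To x - 1
            Dim size As Integer = baseSize + If(i < extra, 1, 0)
            Dim part(size - 1) As String ' size = 0 gives an empty array
            Array.Copy(lines, start, part, 0, size)
            parts.Add(part)
            start += size
        Next
        Return parts
    End Function

    ' Hypothetical helper: writes the parts as <name>_1.txt ... <name>_x.txt in targetFolder.
    Sub SplitFile(sourcePath As String, targetFolder As String, x As Integer)
        Directory.CreateDirectory(targetFolder) ' does nothing if the folder already exists
        Dim baseName = Path.GetFileNameWithoutExtension(sourcePath)
        Dim parts = SplitLines(File.ReadAllLines(sourcePath), x)
        For i As Integer = 0 To parts.Count - 1
            Dim target = Path.Combine(targetFolder, $"{baseName}_{i + 1}.txt")
            File.WriteAllLines(target, parts(i))
        Next
    End Sub
End Module
```

With the paths from the question, a call might look like SplitFile(Path.GetTempPath() & "file.txt", Path.GetTempPath() & "folder", 3).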

Related

Splitting string every 100 characters not working

I am having a problem where I just can't seem to get it to split or even display the message. The message variable is predefined in another part of my code and I have debugged to make sure that the value comes through. I am trying to get it so that every 100 characters it goes onto a new line and with every message it also goes onto a new line.
y = y - 13
messagearray.AddRange(Message.Split(ChrW(100)))
Dim k = messagearray.Count - 1
Dim messagefin As String
messagefin = ""
While k > -1
messagefin = messagefin + vbCrLf + messagearray(k)
k = k - 1
End While
k = 0
Label1.Text = Label1.Text & vbCrLf & messagefin
Label1.Location = New Point(5, 398 + y)
You can use a regular expression. It produces a collection of matches where every match contains 100 characters; if fewer than 100 characters remain, the last match contains all of them.
Dim input = New String("A"c, 310)
Dim mc = Regex.Matches(input, ".{1,100}")
For Each m As Match In mc
'// Do something
MsgBox(m.Value)
Next
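One caveat with this pattern: by default "." does not match a line feed, so a message that already contains line breaks would be cut at those breaks as well. A minimal sketch that avoids this, assuming that is not the behavior you want, passes RegexOptions.Singleline; the module name RegexChunks and the function SplitEvery are hypothetical names used only for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RegexChunks
    ' Hypothetical helper: splits input into chunks of at most size characters.
    ' RegexOptions.Singleline makes "." match line feeds too.
    Function SplitEvery(input As String, size As Integer) As List(Of String)
        Dim chunks As New List(Of String)
        Dim pattern As String = ".{1," & size & "}"
        For Each m As Match In Regex.Matches(input, pattern, RegexOptions.Singleline)
            chunks.Add(m.Value)
        Next
        Return chunks
    End Function
End Module
```

The chunks can then be joined with String.Join(vbCrLf, chunks) before assigning them to the label.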
You can use LINQ to do that.
When you do a Select you can get the index of the item by including a second parameter. Then group the characters by that index divided by the line length so, the first character has index 0, and 0 \ 100 = 0, all the way up to the hundredth char which has index 99: 99 \ 100 = 0. The next hundred chars have 100 \ 100 = 1 to 199 \ 100 = 1, and so on (\ is the integer division operator in VB.NET).
Dim message = New String("A"c, 100)
message &= New String("B"c, 100)
message &= New String("C"c, 99)
Dim lineLength = 100
Dim q = message.Select(Function(c, i) New With {.Char = c, .Idx = i}).
GroupBy(Function(a) a.Idx \ lineLength).
Select(Function(b) String.Join("", b.Select(Function(d) d.Char)))
TextBox1.AppendText(vbCrLf & String.Join(vbCrLf, q))
It is easy to see how to change the line length because it is in a variable with a meaningful name, for example I set it to 50 to get the output
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
You can use String.Substring to do that, like this:
Dim Message As String = "your message here"
Dim MessageList As New List(Of String)
For i As Integer = 0 To Message.Length - 1 Step 100
If (Message.Length < i + 100) Then
MessageList.Add(Message.Substring(i, Message.Length - i))
Exit For
Else
MessageList.Add(Message.SubString (i, 100))
End If
Next
Dim k = MessageList.Count - 1
...
Here is what your code produced with a bit of clean up. I ignored the new position of the label.
Private Sub OpCode()
Dim messagearray As New List(Of String) 'I guessed that messagearray was a List(Of T)
messagearray.AddRange(Message.Split(ChrW(100))) 'ChrW(100) is lowercase d
Dim k = messagearray.Count - 1
Dim messagefin As String
messagefin = ""
While k > -1
messagefin = messagefin + vbCrLf + messagearray(k)
k = k - 1
End While
k = 0 'Why reset k? It falls out of scope at End Sub
Label1.Text = Label1.Text & vbCrLf & messagefin
End Sub
I am not sure why you think that splitting a string by lowercase d would have anything to do with getting 100 characters. As you can see the code reversed the order of the list items. It also added a blank line between the existing text in the label (In this case Label1) and the new text.
To accomplish your goal, I first created a List(Of String) to store the chunks. The For loop starts at the beginning of the input string and keeps going to the end, increasing by 10 on each iteration (use 100 for your case).
The If statement avoids an index out of range error at the end of the string. Say we only had 6 characters left from the start index: if we tried to retrieve 10 characters, Substring would throw an ArgumentOutOfRangeException.
At the end we join the elements of the list with a new line as the separator.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
BreakInto10CharacterChunks("The quick brown fox jumped over the lazy dogs.")
End Sub
Private Sub BreakInto10CharacterChunks(input As String)
Dim output As New List(Of String)
Dim chunk As String
For StartIndex = 0 To input.Length - 1 Step 10
If StartIndex + 10 > input.Length Then
chunk = input.Substring(StartIndex, input.Length - StartIndex)
Else
chunk = input.Substring(StartIndex, 10)
End If
output.Add(chunk)
Next
Label1.Text &= vbCrLf & String.Join(vbCrLf, output)
End Sub
Be sure to look up String.SubString and String.Join to fully understand how these methods work.
https://learn.microsoft.com/en-us/dotnet/api/system.string.substring?view=netframework-4.8
and https://learn.microsoft.com/en-us/dotnet/api/system.string.join?view=netframework-4.8

vb.net efficiently finding byte sequence in byte array

I am creating a piece of software that, in short, has a list of original byte sequences and the new sequences those bytes need to be changed into, something like this in text form: "original location (currently irrelevant, as the sequence can be in different places) $ 56,69,71,73,75,77 : 56,69,71,80,50,54".
I already have code that works, but there can be 600+ of these sequences to find and change, and in some cases it takes a really long time (15+ minutes). I think most of that time goes into finding the sequences, so I am looking for a better way to do this; as it stands the program is unusable.
I have copied the whole code for this function below in the hope that one of you kind souls can have a look and help =)
Dim originalbytes() As Byte
Dim fd As OpenFileDialog = New OpenFileDialog()
fd.Title = "Select the file"
fd.Filter = "All files (*.*)|*.*|All files (*.*)|*.*"
fd.FilterIndex = 2
If fd.ShowDialog() = DialogResult.OK Then
TextBox2.Text = fd.FileName
originalbytes = File.ReadAllBytes(fd.FileName)
End If
Dim x As Integer = 0
Dim y As Integer = 0
Dim textbox1array() = TextBox1.Lines
Dim changedbytes() = originalbytes
Dim startvalue As Integer = 0
Dim databoxarray() As String
Dim databoxarray2() As String
While x < textbox1array.Length - 1
'for each change to make
databoxarray = textbox1array(x).Replace(" $ ", vbCr).Replace(" : ", vbCr).Split
databoxarray2 = databoxarray(1).Replace(",", vbCr).Split
Dim databox2bytes() As String = databoxarray2
'copy original bytes line to databox2 lines
y = 0
While y < (originalbytes.Length - databox2bytes.Length)
'repeat for all bytes in ori file - size of data to find
If originalbytes(y) = databox2bytes(0) Then
startvalue = y
Dim z As String = 1
Dim samebytecounter As Integer = 1
While z < databox2bytes.Length
'repeat for all ori bytes
If originalbytes(y + z) = databox2bytes(z) Then
samebytecounter = samebytecounter + 1
End If
z = z + 1
End While
If samebytecounter = databox2bytes.Length Then
'same original data found, make changes
Dim bytestoinsert() As String = databoxarray(2).Replace(",", vbCr).Split
Dim t As Integer = 0
While t < bytestoinsert.Length
changedbytes(startvalue + t) = bytestoinsert(t)
t = t + 1
End While
End If
End If
y = y + 1
End While
x = x + 1
End While
File.WriteAllBytes(TextBox2.Text & " modified", changedbytes)
Let's take a look at that inner While loop in your code; there are some things that can be optimized.
There is no need to recompute the search length on every iteration:
Dim length as Integer = originalbytes.Length - databox2bytes.Length
While y < length
'repeat for all bytes in ori file - size of data to find
If originalbytes(y) = databox2bytes(0) Then
startvalue = y
z is not necessary, samebytecounter does exactly the same
Dim samebytecounter As Integer = 1
This While loop is a real bottleneck because it always checks the full length of databox2bytes. It is better to leave the loop as soon as a byte doesn't match:
While samebytecounter < databox2bytes.Length AndAlso originalbytes(y + samebytecounter ) = databox2bytes(samebytecounter )
samebytecounter = samebytecounter + 1
End While
The replacement itself is fine, but the replacement bytes are the same for every match, so there is no need to split databoxarray(2) again each time a match is found. Create bytestoinsert once, right after databoxarray2 at the top of the outer loop, and reuse it here:
If samebytecounter = databox2bytes.Length Then
'same original data found, make changes
Dim t As Integer = 0
While t < bytestoinsert.Length
changedbytes(startvalue + t) = bytestoinsert(t)
t = t + 1
End While
End If
End If
y = y + 1
End While
Beyond that, I would agree that the algorithm is hugely inefficient. In theory your code could be rewritten along these lines (I didn't really test this code; stringToFind and stringToSet are the byte arrays to search for and to write):
'ISO-8859-1 maps every byte to exactly one character, so arbitrary bytes survive the round trip
'(UTF-8 would replace invalid byte sequences and corrupt binary data)
Dim latin1 = System.Text.Encoding.GetEncoding("ISO-8859-1")
Dim text = latin1.GetString(originalbytes)
Dim findText = latin1.GetString(stringToFind)
Dim replaceWith = latin1.GetString(stringToSet)
text = text.Replace(findText, replaceWith)
Dim outbytes = latin1.GetBytes(text)
which would probably be a huge time saver.
Finally, your code is written in such a way that nobody, including yourself, will really understand it after it has been lying around for a month or so.
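The search can also be done directly on the byte arrays, without going through strings at all. The following is a minimal sketch rather than a tuned implementation. It assumes, as in the question's example, that the replacement has the same length as the sequence being searched for, and the names ByteReplaceSketch, ParseBytes, IndexOfSequence and ReplaceAll are hypothetical. Parsing each comma-separated list into a Byte array once also avoids comparing bytes with strings inside the inner loop.

```vbnet
Imports System

Module ByteReplaceSketch
    ' Hypothetical helper: turns "56,69,71" into a Byte array, once per rule.
    Function ParseBytes(csv As String) As Byte()
        Dim parts = csv.Split(","c)
        Dim result(parts.Length - 1) As Byte
        For k As Integer = 0 To parts.Length - 1
            result(k) = Byte.Parse(parts(k).Trim())
        Next
        Return result
    End Function

    ' Returns the index of the first occurrence of pattern at or after start, or -1.
    Function IndexOfSequence(data As Byte(), pattern As Byte(), start As Integer) As Integer
        If pattern.Length = 0 Then Return -1
        Dim last As Integer = data.Length - pattern.Length
        Dim i As Integer = start
        While i <= last
            ' Let Array.IndexOf find the next candidate for the first byte
            i = Array.IndexOf(data, pattern(0), i, last - i + 1)
            If i < 0 Then Return -1
            Dim j As Integer = 1
            ' Stop comparing at the first mismatch
            While j < pattern.Length AndAlso data(i + j) = pattern(j)
                j += 1
            End While
            If j = pattern.Length Then Return i
            i += 1
        End While
        Return -1
    End Function

    ' Overwrites every occurrence of find in data and returns the number of replacements.
    Function ReplaceAll(data As Byte(), find As Byte(), replacement As Byte()) As Integer
        If find.Length <> replacement.Length Then
            Throw New ArgumentException("This sketch only handles same-length replacements.")
        End If
        Dim count As Integer = 0
        Dim pos As Integer = IndexOfSequence(data, find, 0)
        While pos >= 0
            Array.Copy(replacement, 0, data, pos, replacement.Length)
            count += 1
            pos = IndexOfSequence(data, find, pos + find.Length)
        End While
        Return count
    End Function
End Module
```

Each line of the rule list would then need one ParseBytes call per side and one ReplaceAll call on the file's bytes.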

How to convert (3-digit) decimal to ASCII from textbox.text?

When I input decimal numbers into the textbox, the output should be one word.
Example:
input:
textbox.text = 11311711511597105
output:
textbox.text = qussai
You should show us what you have tried.
The full code should be like this:
Module VBModule
Sub Main()
Dim output As String = DecimalToASCII("113117115115097105")
Console.WriteLine(output)
End Sub
Function DecimalToASCII(ByVal input As String) As String
Dim current As String = ""
Dim temp As Integer = 0
If input.Length Mod 3 <> 0 Then
Return "Wrong Input"
End If
For i As Integer = 0 To input.Length - 1 Step 3
temp = 0
For j As Integer = i To i + 2
temp *= 10
temp += CType(input(j).ToString(), Integer)
Next
current &= Chr(temp).ToString()
Next
Return current
End Function
End Module
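Note that the function above expects every code to be padded to three digits (097 for "a"), while the input in the question contains 97 without padding. If the input cannot be padded, one possible sketch is shown below. It assumes that only printable ASCII codes (32 to 126) occur, so a code starting with "1" has three digits and any other code has two; the names UnpaddedSketch and DecodeUnpadded are hypothetical.

```vbnet
Imports System
Imports System.Text

Module UnpaddedSketch
    ' Hypothetical helper: decodes a run of unpadded decimal ASCII codes.
    ' Assumption: all codes are printable ASCII (32-126), so codes starting
    ' with "1" are three digits long (100-126) and all others are two digits.
    Function DecodeUnpadded(input As String) As String
        Dim sb As New StringBuilder()
        Dim i As Integer = 0
        While i < input.Length
            Dim width As Integer = If(input(i) = "1"c, 3, 2)
            If i + width > input.Length Then
                Throw New FormatException("Truncated code at position " & i)
            End If
            Dim code As Integer = Integer.Parse(input.Substring(i, width))
            sb.Append(ChrW(code))
            i += width
        End While
        Return sb.ToString()
    End Function
End Module
```

With the input from the question, DecodeUnpadded("11311711511597105") reads the codes 113, 117, 115, 115, 97 and 105 and returns "qussai".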

VBA to load very large file in one go (no buffering)

I am experiencing an unexpected vb limitation on the string max size, as explained in this post:
VBA unexpected reach of string size limit
While I was expecting to be able to load files up to 2GB (2^31 char) using open path for binary and get function, I get an out of string space error when I try to load a string larger than 255,918,061 characters.
I managed to work around this issue by buffering the input stream of Get. The problem is that I need to load the file as an array of strings by splitting the buffer on vbCrLf characters.
This means building the array line by line. Moreover, since I cannot be sure whether the buffer ends on a line break or not, I need additional operations. This solution is time and memory consuming: loading a 300MB file with this code costs 900MB (!) of memory in Excel. Is there a better solution?
Here is my code:
Function Load_File(path As String) As Variant
Dim MyData As String, FNum As Integer
Dim LenRemainingBytes As Long
Dim BufferSizeCurrent As Long
Dim FileByLines() As String
Dim CuttedLine As Boolean
Dim tmpSplit() As String
Dim FinalSplit() As String
Dim NbOfLines As Long
Dim LastLine As String
Dim count As Long, i As Long
Const BufferSizeMax As Long = 100000
FNum = FreeFile()
Open path For Binary As #FNum
LenRemainingBytes = LOF(FNum)
NbOfLines = FileNbOfLines(path)
ReDim FinalSplit(NbOfLines)
CuttedLine = False
Do While LenRemainingBytes > 0
MyData = ""
If LenRemainingBytes > BufferSizeMax Then
BufferSizeCurrent = BufferSizeMax
Else
BufferSizeCurrent = LenRemainingBytes
End If
MyData = Space$(BufferSizeCurrent)
Get #FNum, , MyData
tmpSplit = Split(MyData, vbCrLf)
If CuttedLine Then
count = count - 1
tmpSplit(0) = LastLine & tmpSplit(0)
For i = 0 To UBound(tmpSplit)
If count > NbOfLines Then Exit For
FinalSplit(count) = tmpSplit(i)
count = count + 1
Next i
Else
For i = 0 To UBound(tmpSplit)
If count > NbOfLines Then Exit For
FinalSplit(count) = tmpSplit(i)
count = count + 1
Next i
End If
Erase tmpSplit
LastLine = Right(MyData, Len(MyData) - InStrRev(MyData, vbCrLf) - 1)
CuttedLine = Len(LastLine) > 1
LenRemainingBytes = LenRemainingBytes - BufferSizeCurrent
Loop
Close FNum
Load_File = FinalSplit
Erase FinalSplit
End Function
The function FileNbOfLines efficiently returns the number of line break characters.
Edit:
My Needs are:
To look for a specific string within the file
To get a specific number of lines coming after this string
Here you go, not pretty but should give you the general concept:
Sub GetLines()
Const fileName As String = "C:\Users\bloggsj\desktop\testfile.txt"
Const wordToFind As String = "FindMe"
Dim lineStart As String
Dim lineCount As String
Dim linesAfterWord As Long
With CreateObject("WScript.Shell")
lineCount = .Exec("CMD /C FIND /V /C """" """ & fileName & """").StdOut.ReadAll
lineStart = Split(.Exec("CMD /C FIND /N """ & wordToFind & """ """ & fileName & """").StdOut.ReadAll, vbCrLf)(2)
End With
linesAfterWord = CLng(Trim(Mid(lineCount, InStrRev(lineCount, ":") + 1))) - CLng(Trim(Mid(lineStart, 2, InStr(lineStart, "]") - 2)))
Debug.Print linesAfterWord
End Sub
This uses CMD to count the number of lines, then finds the line on which the word appears, then subtracts one from the other to give you the number of lines after the word.
Answer: Yes, using ReadAll from FSO should do the job.
Best answer: Just avoid it !
My needs were:
Identify a specific string within the file
Extract a certain number of lines after this string
As long as you know exactly how much data you want to extract, and assuming that amount is below the VBA string size limit (!), here is what does the job fastest.
Computation time is reduced by using binary comparison of strings. My code is as follows:
Function GetFileLines(path As String, str As String, NbOfLines As Long) As String()
Const BUFSIZE As Long = 100000
Dim StringFound As Boolean
Dim lfAnsi As String
Dim strAnsi As String
Dim F As Integer
Dim BytesLeft As Long
Dim Buffer() As Byte
Dim strBuffer As String
Dim BufferOverlap As String
Dim PrevPos As Long
Dim NextPos As Long
Dim LineCount As Long
Dim data As String
F = FreeFile(0)
strAnsi = StrConv(str, vbFromUnicode) 'Looked String
lfAnsi = StrConv(vbLf, vbFromUnicode) 'LineBreak character
Open path For Binary Access Read As #F
BytesLeft = LOF(F)
ReDim Buffer(BUFSIZE - 1)
'Overlapping buffer is 3/2 times the size of strBuffer
'(two bytes per character)
BufferOverlap = Space$(Int(3 * BUFSIZE / 4))
StringFound = False
Do Until BytesLeft = 0
If BytesLeft < BUFSIZE Then ReDim Buffer(BytesLeft - 1)
Get #F, , Buffer
strBuffer = Buffer 'Binary copy of bytes.
BytesLeft = BytesLeft - LenB(strBuffer)
Mid$(BufferOverlap, Int(BUFSIZE / 4) + 1) = strBuffer 'Overlapping Buffer
If Not StringFound Then 'Looking for the the string
PrevPos = InStrB(BufferOverlap, strAnsi) 'Position of the looked string within the buffer
StringFound = PrevPos <> 0
If StringFound Then strBuffer = BufferOverlap
End If
If StringFound Then 'When string is found, loop until NbOfLines
Do Until LineCount = NbOfLines
NextPos = InStrB(PrevPos, strBuffer, lfAnsi)
If NextPos = 0 And LineCount < NbOfLines Then 'Buffer end reached, NbOfLines not reached
'Adding end of buffer to data
data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos)
PrevPos = 1
Exit Do
Else
'Adding New Line to data
data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos, NextPos - PrevPos + 1)
End If
PrevPos = NextPos + 1
LineCount = LineCount + 1
If LineCount = NbOfLines Then Exit Do
Loop
End If
If LineCount = NbOfLines then Exit Do
Mid$(BufferOverlap, 1, Int(BUFSIZE / 4)) = Mid$(strBuffer, Int(BUFSIZE / 4))
Loop
Close F
GetFileLines = Split(data, vbCrLf)
End Function
To cut computation time even further, it is highly advisable to use fast string concatenation as explained here.
For instance, the following procedure can be used:
Sub FastConcat(ByRef Dest As String, ByVal Source As String, ByRef ccOffset)
Dim L As Long, Buffer As Long
Buffer = 50000
L = Len(Source)
If (ccOffset + L) >= Len(Dest) Then
If L > Buffer Then
Dest = Dest & Space$(L)
Else
Dest = Dest & Space$(Buffer)
End If
End If
Mid$(Dest, ccOffset + 1, L) = Source
ccOffset = ccOffset + L
End Sub
And then use the procedure as follows:
NbOfChars = 0
Do until...
FastConcat MyString, AddedString, NbOfChars
Loop
MyString = Left$(MyString,NbOfChars)

How do I get an ASCII value to wrap to a certain value after it has passed 122?

I am trying to write an encryption program. The problem I am facing is that I am converting the text to ASCII and then adding the offset. However, when it goes past the letter 'z' I want it to wrap back to 'a' and go from there.
Sub enc()
Text = TextBox1.Text
finalmessage = ""
letters = Text.ToCharArray
offset = ComboBox1.SelectedItem
For x = LBound(letters) To UBound(letters)
finalmessage = finalmessage + Chr(Asc(letters(x)) + offset)
Next
TextBox2.Text = finalmessage
End Sub
I guess to make it easy to decode afterwards, you should do it somewhat along the lines of base64 encoding: first encode everything to a normalized binary string, then encode in the range you want (since it is binary, the size of the range has to be a power of two, 2^X).
To match your range, I used a base set of 32 characters and a simple encoding/decoding example (a bit more verbose than it should be, perhaps).
Module Module1
Dim encodeChars As String = "abcdefghijklmnopqrstuvwxyzABCDEF" ' use 32 as a base
Function Encode(text As String) As String
Dim bitEncoded As String = ""
Dim outputMessage As String = ""
For Each ch As Char In text.ToCharArray()
Dim i As Integer = Convert.ToByte(ch)
bitEncoded &= Convert.ToString(i, 2).PadLeft(8, "0"c)
Next
While bitEncoded.Length Mod 5 <> 0
bitEncoded &= "0"
End While
For position As Integer = 0 To bitEncoded.Length - 1 Step 5
Dim range As String = bitEncoded.Substring(position, 5)
Dim index As Integer = Convert.ToInt32(range, 2)
outputMessage &= encodeChars(index).ToString()
Next
Return outputMessage
End Function
Function Decode(encodedText As String) As String
Dim bitEncoded As String = ""
Dim outputMessage As String = ""
For Each ch In encodedText
Dim index As Integer = encodeChars.IndexOf(ch)
If index < 0 Then
Throw New FormatException("Invalid character in encodedText!")
End If
bitEncoded &= Convert.ToString(index, 2).PadLeft(5, "0"c)
Next
' strip the extra 0's
While bitEncoded.Length Mod 8 <> 0
bitEncoded = bitEncoded.Substring(0, bitEncoded.Length - 1)
End While
For position As Integer = 0 To bitEncoded.Length - 1 Step 8
Dim range As String = bitEncoded.Substring(position, 8)
Dim index As Integer = Convert.ToInt32(range, 2)
outputMessage &= Chr(index).ToString()
Next
Return outputMessage
End Function
Sub Main()
Dim textToEncode As String = "This is a small test, with some special characters! Just testing..."
Dim encodedText As String = Encode(textToEncode)
Dim decodedText As String = Decode(encodedText)
Console.WriteLine(textToEncode)
Console.WriteLine(encodedText)
Console.WriteLine(decodedText)
If Not String.Equals(decodedText, textToEncode) Then
Console.WriteLine("Encoding / decoding failed!")
Else
Console.WriteLine("Encoding / decoding completed succesfully!")
End If
Console.ReadLine()
End Sub
End Module
This then gives the following output:
This is a small test, with some special characters! Just testing...
krugsCzanfzsayjaonwwcBdmebAgkCBufqqhoAlunaqhgBBnmuqhgCdfmnuwcBbamnugcCtbmnAgkCtteeqeuDltoqqhizltoruwCzzofyxa
This is a small test, with some special characters! Just testing...
Encoding / decoding completed succesfully!
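The wrap-around that the question asks about can also be done directly with modular arithmetic, which is a different technique from the base-32 encoding above. The sketch below assumes that only the lowercase letters a to z should be shifted and that every other character is left unchanged; the names WrapSketch and ShiftLetters are hypothetical.

```vbnet
Imports System
Imports System.Text

Module WrapSketch
    ' Hypothetical helper: shifts lowercase letters by offset and wraps
    ' past "z" back to "a". Other characters are copied unchanged.
    Function ShiftLetters(text As String, offset As Integer) As String
        Dim sb As New StringBuilder(text.Length)
        For Each ch As Char In text
            If ch >= "a"c AndAlso ch <= "z"c Then
                ' Position in the alphabet (0-25) plus the offset, wrapped with Mod.
                ' The extra "+ 26) Mod 26" keeps the result positive for negative offsets.
                Dim shifted As Integer = ((AscW(ch) - AscW("a"c) + offset) Mod 26 + 26) Mod 26
                sb.Append(ChrW(AscW("a"c) + shifted))
            Else
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

Decoding is the same call with the negated offset, for example ShiftLetters(ShiftLetters("xyz", 3), -3) gives back "xyz".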