Efficiently storing a GUID in text? - vb.net

Our app requires us to write out a large list of key, GUID value pairs for export to a plain text file.
6004, aa20dc0b-1e10-4efa-b683-fc98820b0fba
There will be potentially millions of these GUIDs, and there may not be a lot of room on the device where the file will be written, so is there a more efficient way to print the GUID to keep the file size down?
I experimented with Hex encoding
Dim g As Guid = Guid.NewGuid
Dim sb As New Text.StringBuilder
For Each b As Byte In g.ToByteArray
sb.Append(String.Format("{0:X2}", b))
Next
Console.WriteLine(sb.ToString)
This got each GUID down to 32 characters, so each line is a bit shorter:
9870, EBB7EF914C29A6459A34EDCB61EB8C8F
Are there any other approaches to write the GUIDs to the file that are more efficient?

I agree with the previous comment: SQL (or another DB) would be ideal, as text files can be very unstable. I've just done some quick testing, generating millions of GUIDs and storing them in text files.
Here is the outcome: basically, anything more than 1 million GUIDs starts to cause issues. You can safely store 10 million, but the file will struggle to open (on an average PC) as it's > 150 MB.
The code I used is below if you want to try it out. I know it's not perfect, but it gives you an idea of where the time goes. The main conclusion is to write to the file with one bulk append; don't append each GUID individually if you can help it. This saves a large amount of processing and time!
Even if you convert to another format such as Base64 or binary, I think you will still see similar variations in file size. Don't forget that you are still saving a text file; binary written out as text will no doubt give a larger file because of the string length!
Hope this helps. Comment back if you need anything!
Chicken
Dim Report_List As List(Of String)
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Report_List = New List(Of String)
    Dim Zeros As Integer = 2
    ' Time runs of 100, 1,000, ... up to 10,000,000 GUIDs
    Do While Zeros < 8
        Dim sw As Stopwatch = New Stopwatch
        sw.Start()
        Dim CountName As String = "1" & Microsoft.VisualBasic.Left("00000000000000000000000", Zeros)
        Dim CountNum As Integer = CInt(CountName)
        Dim Master_String As System.Text.StringBuilder = New System.Text.StringBuilder
        ' The For loop advances i itself; an extra "i += 1" here would skip every second GUID
        For i = 1 To CountNum
            Dim g As Guid = Guid.NewGuid
            'Hex alternative:
            'Dim sb As New System.Text.StringBuilder
            'For Each b As Byte In g.ToByteArray
            '    sb.Append(String.Format("{0:X2}", b))
            'Next
            'Master_String.AppendLine(sb.ToString)
            Master_String.AppendLine(Convert.ToBase64String(g.ToByteArray))
        Next
        ' Write everything in one go rather than appending GUID by GUID
        Using writer As New IO.StreamWriter("c:\temp\test-" & CountName & ".txt", False)
            writer.Write(Master_String.ToString)
        End Using
        sw.Stop()
        Report_List.Add(sw.Elapsed.ToString & " - " & CountName)
        Zeros += 1
    Loop
    For Each lr In Report_List
        Me.ListBox1.Items.Add(lr)
    Next
End Sub
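As a rough comparison of the text encodings discussed above, here is a minimal sketch. The module and function names are made up for illustration. A GUID is 16 bytes, so its Base64 form is always 24 characters ending in "==". Dropping that fixed padding leaves 22 characters, against 32 for hex and 36 for the default format. The padding can be added back before decoding.

```vbnet
Imports System

Module GuidTextEncoding
    ' Encode a GUID as 22 Base64 characters by dropping the two trailing "=" padding characters.
    Function ToShortBase64(g As Guid) As String
        Return Convert.ToBase64String(g.ToByteArray()).TrimEnd("="c)
    End Function

    ' Restore the fixed padding and decode back to the original GUID.
    Function FromShortBase64(s As String) As Guid
        Return New Guid(Convert.FromBase64String(s & "=="))
    End Function

    Sub Main()
        Dim g As Guid = Guid.NewGuid()
        Dim s As String = ToShortBase64(g)
        Console.WriteLine(g.ToString("D") & " (" & g.ToString("D").Length & " chars)") ' 36 chars, with hyphens
        Console.WriteLine(g.ToString("N") & " (" & g.ToString("N").Length & " chars)") ' 32 chars, plain hex
        Console.WriteLine(s & " (" & s.Length & " chars)")                             ' 22 chars
        Console.WriteLine(FromShortBase64(s) = g)                                      ' True
    End Sub
End Module
```

Base64 output never contains a comma, so the "key, value" line format still parses. If even 22 characters per GUID is too much, the remaining option is a true binary file (16 bytes per GUID), which is no longer plain text.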

Related

Reading from text files in Visual Basic

This is the first challenge on Day 1 of the 2018 Advent of Code
(link: https://adventofcode.com/2018/day/1)
So I am trying to create a program that reads a long list of positive and negative numbers (e.g +1, -2, +3, etc.) and then add them up to create a total. I have researched some methods of file handling in Visual Basic, and have come up with the below method:
Sub Main()
Dim objStreamReader As StreamReader
Dim strLine As String = ""
Dim total As Double = 0
objStreamReader = New StreamReader(AppDomain.CurrentDomain.BaseDirectory & "frequencies.txt")
strLine = objStreamReader.ReadLine
Do While Not strLine Is Nothing
Console.WriteLine(strLine)
strLine = objStreamReader.ReadLine
total += strLine
Loop
Console.WriteLine(total)
objStreamReader.Close()
Console.ReadLine()
End Sub
Here is a link to the list of numbers: https://adventofcode.com/2018/day/1/input
It is not a syntax error I am getting but a logic error. The answer is somehow wrong, but I cannot seem to figure out where! I have tried to remove the signs from each number but that throws me a NullException error when it compiles.
So far I have come up with the answer 549, which the Advent of Code website rejects. Any ideas?
Make your life easier by using File.ReadLines(fileName) instead of dealing with StreamReader. Use Path.Combine instead of string concatenation to create a path. Path.Combine takes care of adding missing \ or removing extra ones etc.
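Your file might contain an extra empty line at its end, which does not convert to a number. Use Double.TryParse to make sure you have a valid number before adding it to the total. Note also that your loop reads the next line before adding to the total, so the first line of the file is never added. You should have Option Strict On anyway to enforce explicit conversions.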
Dim fileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "frequencies.txt")
Dim total As Double = 0
For Each strLine As String In File.ReadLines(fileName)
Console.WriteLine(strLine)
Dim n As Double
If Double.TryParse(strLine, n) Then
total += n
End If
Next
Console.WriteLine(total)
Console.ReadLine()
To append strings, use a StringBuilder:
Dim test As New System.Text.StringBuilder()
test.Append("your string")
This will not hurt performance.

Vernam cipher key file synchronization

I am working in VB 2010 on one-time pad software.
The problem is the following. I have a txt file with many continuous characters like this: "KHKJHDKJDHAKHDAKHDAKHAKDHADHAKJDHASJDHA". I need to read a variable number of these characters and then replace them with the same number of dots (e.g. read the four chars "KHKJ" and replace them with four dots: "....HDKJDETCETC"). After this I will have to remember the position of the last dot in the txt key file, for the next read of the remaining keys. How can I do it?
Thanks to info.de for the quick reply, and my apologies for not having explained my request in more depth. Let me explain better: my project is a Vernam chat, and I need to have one single txt key file that must remain synchronized in some way between sender and receiver. When I send a message, the key used must be deleted, and whoever receives it must delete it too after decrypting.
The goal is synchronizing the two key files!
I thought of something like this:
'READ PART
Using stream = File.OpenRead("c:\key.txt")
    stream.Seek(v, SeekOrigin.Begin)
    Dim b = New Byte(a - 1) {}
    stream.Read(b, 0, a)
    Dim str = Encoding.ASCII.GetString(b)
    txtPad.Text = str ' portion of the key read from the txt file for encoding/decoding
End Using
v = start position for reading in the file key.txt
a = length of text to take from key.txt = length of the text to encode
'WRITE PART
Using stream = File.OpenWrite("c:\key.txt")
    stream.Seek(v, SeekOrigin.Begin)
    Dim b = New Byte(a - 1) {}
    stream.Write(b, 0, 1)
End Using
But how can I overwrite (not delete) the characters I have taken with dots using this code?
And above all, what will be the next coordinates for repeating a subsequent encoding/decoding operation? (v changes constantly.)
Thanks in advance.
Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        '//--- -----------------------------------------------
        '//--- KHKJHDKJDHAKHDAKHDAKHAKDHADHAKJDHASJDHA-line1#
        '//--- KHKJHDKJDHAKHDAKHDAKHAKDHADHAKJDHASJDHA-line2#
        Dim lines() As String = System.IO.File.ReadAllLines(Application.StartupPath & "\TextFile1.txt")
        Dim iReplaceLength As Integer
        Dim sReplaceSubString As String
        For Each line As String In lines
            MessageBox.Show(line)
            sReplaceSubString = "KHKJ"
            iReplaceLength = sReplaceSubString.Length
            ' Show the line with the used key characters replaced by dots
            MessageBox.Show(line.Replace(sReplaceSubString, New String("."c, iReplaceLength)))
            ' Show the remaining, unused part of the key
            MessageBox.Show(line.Substring(iReplaceLength, line.Length - iReplaceLength))
        Next
    End Sub
End Class
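For the in-place overwrite and the "next position" part of the question, one possible approach is sketched below. It assumes the key file is plain ASCII with no line breaks, so one character is one byte. The module name, TakeKey and the demo file name are made up for illustration. The file is opened for reading and writing, the key characters are read at the current offset, the same range is overwritten with dots, and the offset moves forward by the number of characters taken.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module KeyFilePad
    ' Reads "count" key characters starting at byte offset "position",
    ' overwrites them in place with dots, and advances "position".
    ' Assumption: ASCII key file without line breaks (1 character = 1 byte).
    Function TakeKey(keyFile As String, ByRef position As Long, count As Integer) As String
        Using fs As New FileStream(keyFile, FileMode.Open, FileAccess.ReadWrite)
            If position + count > fs.Length Then
                Throw New InvalidOperationException("Not enough key material left.")
            End If
            ' Read the requested key characters
            fs.Seek(position, SeekOrigin.Begin)
            Dim bytes(count - 1) As Byte
            Dim done As Integer = 0
            Do While done < count
                Dim n As Integer = fs.Read(bytes, done, count - done)
                If n = 0 Then Exit Do
                done += n
            Loop
            ' Go back and overwrite the same range with dots
            Dim dots As Byte() = Encoding.ASCII.GetBytes(New String("."c, count))
            fs.Seek(position, SeekOrigin.Begin)
            fs.Write(dots, 0, dots.Length)
            ' The next read starts right after the last dot
            position += count
            Return Encoding.ASCII.GetString(bytes, 0, done)
        End Using
    End Function

    Sub Main()
        Dim keyFile As String = Path.Combine(Path.GetTempPath(), "key-demo.txt")
        File.WriteAllText(keyFile, "KHKJHDKJDHAKHDAKHDAK", Encoding.ASCII)
        Dim position As Long = 0
        Console.WriteLine(TakeKey(keyFile, position, 4)) ' KHKJ
        Console.WriteLine(TakeKey(keyFile, position, 3)) ' HDK
        Console.WriteLine(File.ReadAllText(keyFile))     ' .......JDHAKHDAKHDAK
        Console.WriteLine(position)                      ' 7
    End Sub
End Module
```

The position has to survive between runs. It could be saved separately, or recovered at startup by scanning the file for the first character that is not a dot, provided the key itself never contains dots. Both sides stay in step only if they consume exactly the same number of characters for every message.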

Visual Basic Read File Line by Line storing each Line in str

I am trying to loop through the contents of a text file, reading it line by line. During the looping process there are several times I need to use the file's contents.
Dim xRead As System.IO.StreamReader
xRead = File.OpenText(TextBox3.Text)
Do Until xRead.EndOfStream
Dim linetext As String = xRead.ReadLine
Dim aryTextFile() As String = linetext.Split(" ")
Dim firstname As String = Val(aryTextFile(0))
TextBox1.Text = firstname.ToString
Dim lastname As String = Val(aryTextFile(0))
TextBox2.Text = lastname.ToString
Loop
Edit: What I am trying to do is read, say, the first five lines of a text file, perform some processing, then read the next 5 lines of the text file.
I would like to be able to use the lines pulled from the text file as separate string variables.
It is not clear why you would need to have 5 lines stored at any time, according to your code sample, since you are only processing one line at a time. If you think that doing 5 lines at once will be faster, this is unlikely, because .NET buffers file reads internally, so both approaches will probably perform the same. However, reading one line at a time is a much simpler pattern to use, so it is better to look into that first.
Still, here is an approximate version of the code that does processing every 5 lines:
Sub Main()
Dim bufferMaxSize As Integer = 5
Using xRead As New System.IO.StreamReader(TextBox3.Text)
Dim buffer As New List(Of String)
Do Until xRead.EndOfStream
If buffer.Count < bufferMaxSize Then
buffer.Add(xRead.ReadLine)
Continue Do
Else
PerformProcessing(buffer)
buffer.Clear()
End If
Loop
If buffer.Count > 0 Then
'if line count is not divisible by bufferMaxSize, 5 in this case
'there will be a remainder of 1-4 records,
'which also needs to be processed
PerformProcessing(buffer)
End If
End Using
End Sub
Here is mine. Really easy. It just reads each location from the file and copies the Copy1 folder to those locations. This is my first program :) and I'm really proud of it.
Imports System.IO
Module Module1
    Sub Main()
        ' Each line of location.txt is a destination folder
        For Each Line In File.ReadLines("C:\location.txt")
            My.Computer.FileSystem.CopyDirectory("C:\Copy1", Line, True)
        Next
        Console.WriteLine("Done")
        Console.ReadLine()
    End Sub
End Module

How To Read From Text File & Store Data So To Modify At A Later Time

What I am trying to do may be better for use with SQL Server but I have seen many applications in the past that simply work on text files and I am wanting to try to imitate the same behaviour that those applications follow.
I have a list of URLs in a text file. This is simple enough to open and read line by line, but how can I store additional data from the file and query the data?
E.g.
Text File:
http://link1.com/ - 0
http://link2.com/ - 0
http://link3.com/ - 1
http://link4.com/ - 0
http://link5.com/ - 1
Then I will read the data with:
Private Sub ButtonX2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ButtonX2.Click
OpenFileDialog1.Filter = "*txt Text Files|*.txt"
If OpenFileDialog1.ShowDialog() = DialogResult.OK Then
Dim AllText As String = My.Computer.FileSystem.ReadAllText(OpenFileDialog1.FileName)
Dim Lines() = Split(AllText, vbCrLf)
Dim list = New List(Of Test)
Dim URLsLoaded As Integer = 0
For i = 0 To UBound(Lines)
If Lines(i) = "" Then Continue For
Dim URLInfo As String() = Split(Lines(i), " - ")
If URLInfo.Count < 6 Then Continue For
list.Add(New Test(URLInfo(0), URLInfo(1)))
URLsLoaded += 1
Next
DataGridViewX1.DataSource = list
LabelX5.Text = URLsLoaded.ToString()
End If
End Sub
So as you can see, above I am prompting the user to open a text file, afterwards it is displayed back to the user in a datagridview.
Now here is my issue, I want to be able to query the data, E.g. Select * From URLs WHERE active='1' (Too used to PHP + MySQL!)
Where the 1 is the corresponding 1 or 0 after the URL in the text file.
In the above example the data is being stored in a simple class as per below:
Public Class Test
Public Sub New(ByVal URL As String, ByVal Active As Integer)
_URL = URL
_Active = Active
End Sub
Private _URL As String
Public Property URL() As String
Get
Return _URL
End Get
Set(ByVal value As String)
_URL = value
End Set
End Property
Private _Active As String
Public Property Active As String
Get
Return _Active
End Get
Set(ByVal value As String)
_Active = value
End Set
End Property
End Class
Am I going completely the wrong way about storing the data after importing from a text file?
I am new to VB.NET and still learning the basics but I find it much easier to learn by playing around before hitting the massive books!
Working example:
Dim myurls As New List(Of Test)
myurls.Add(New Test("http://link1.com/", 1))
myurls.Add(New Test("http://link2.com/", 0))
myurls.Add(New Test("http://link3.com/", 0))
Dim result = From t In myurls Where t.Active = 1
For Each testitem As Test In result
MsgBox(testitem.URL)
Next
By the way, LINQ is magic. You can shorten your loading/parse code to 3 rows of code:
Dim Lines() = IO.File.ReadAllLines("myfile.txt")
Dim myurls As List(Of Test) = (From t In lines Select New Test(Split(t, " - ")(0), Split(t, " - ")(1))).ToList
DataGridViewX1.DataSource = myurls
The first line reads all lines in the file into an array of strings.
The second line splits each line in the array, creates a Test item from it, and then converts the results to a List(Of Test).
Of course, this could be taken to silly extremes by turning it into a one-liner:
DataGridViewX1.DataSource = (From t In IO.File.ReadAllLines("myfile.txt") Select New Test(Split(t, " - ")(0), Split(t, " - ")(1))).ToList
Which would reduce your load function to just the following 4 lines:
If OpenFileDialog1.ShowDialog() = DialogResult.OK Then
    DataGridViewX1.DataSource = (From t In IO.File.ReadAllLines(OpenFileDialog1.FileName) Select New Test(Split(t, " - ")(0), Split(t, " - ")(1))).ToList
    LabelX5.Text = CType(DataGridViewX1.DataSource, List(Of Test)).Count.ToString()
End If
You can query your class using LINQ, as long as it is in an appropriate collection type, like List(Of Test). I am not completely familiar with the VB syntax for LINQ, but it would be something like below.
list.Where(Function(x) x.Active = "1").Select(Function(x) x.URL)
However, this isn't actually storing anything in a database, which I think your question might be asking?
I think you are reinventing the wheel, which is not generally a good thing. If you want SQL like functionality just store the data in a SQL DB and query it.
There are a lot of reasons you should just use an existing DB:
Your code will be less tested and thus more likely to have bugs.
Your code will be less optimized and probably perform worse. (You were planning on implementing a query optimizer and indexing engine for performance, right?)
Your code won't have as many features (locking, constraints, triggers, backup/recovery, a query language, etc.)
There are lots of free RDBMS options out there so it might even be cheaper to use an existing system than spending your time writing an inferior one.
That said, if this is just an academic exercise, go for it. However, I wouldn't do this for a real-world system.

linq submitchanges runs out of memory

I have a database with about 180,000 records. I'm trying to attach a PDF file to each of those records. Each PDF is about 250 KB in size. However, after about a minute my program starts taking about a GB of memory and I have to stop it. I tried making it so the reference to each LINQ object is removed once it's updated, but that doesn't seem to help. How can I make it clear the reference?
Thanks for your help
Private Sub uploadPDFs(ByVal args() As String)
Dim indexFiles = (From indexFile In dataContext.IndexFiles
Where indexFile.PDFContent = Nothing
Order By indexFile.PDFFolder).ToList
Dim currentDirectory As IO.DirectoryInfo
Dim currentFile As IO.FileInfo
Dim tempIndexFile As IndexFile
While indexFiles.Count > 0
tempIndexFile = indexFiles(0)
indexFiles = indexFiles.Skip(1).ToList
currentDirectory = 'I set the directory that I need
currentFile = 'I get the file that I need
writePDF(currentDirectory, currentFile, tempIndexFile)
End While
End Sub
Private Sub writePDF(ByVal directory As IO.DirectoryInfo, ByVal file As IO.FileInfo, ByVal indexFile As IndexFile)
Dim bytes() As Byte
bytes = getFileStream(file)
indexFile.PDFContent = bytes
dataContext.SubmitChanges()
counter += 1
If counter Mod 10 = 0 Then Console.WriteLine(" saved file " & file.Name & " at " & directory.Name)
End Sub
Private Function getFileStream(ByVal fileInfo As IO.FileInfo) As Byte()
Dim fileStream = fileInfo.OpenRead()
Dim bytesLength As Long = fileStream.Length
Dim bytes(bytesLength) As Byte
fileStream.Read(bytes, 0, bytesLength)
fileStream.Close()
Return bytes
End Function
I suggest you perform this in batches, using Take (before the call to ToList) to process a particular number of items at a time. Read (say) 10, set the PDFContent on all of them, call SubmitChanges, and then start again. (I'm not sure offhand whether you should start with a new DataContext at that point, but it might be cleanest to do so.)
As an aside, your code to read the contents of a file is broken in at least a couple of ways - but it would be simpler just to use File.ReadAllBytes in the first place.
Also, your way of handling the list gradually shrinking is really inefficient - after fetching 180,000 records, you're then building a new list with 179,999 records, then another with 179,998 records etc.
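To illustrate the File.ReadAllBytes suggestion, here is a minimal sketch of a replacement for getFileStream. The module and function names are made up for illustration. The likely problems in the original are that Dim bytes(bytesLength) allocates one element more than the file length (VB array bounds are inclusive), that a single Read call is not guaranteed to fill the buffer, and that the stream is not closed if an exception is thrown.

```vbnet
Imports System
Imports System.IO

Module PdfBytes
    ' Returns the complete contents of the file as a byte array.
    ' File.ReadAllBytes opens the file, reads until the end, and closes it again.
    Function GetFileBytes(fileInfo As FileInfo) As Byte()
        Return File.ReadAllBytes(fileInfo.FullName)
    End Function

    Sub Main()
        ' Demo with a small temporary file standing in for a PDF
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllBytes(tempFile, New Byte() {1, 2, 3, 4})
        Dim bytes As Byte() = GetFileBytes(New FileInfo(tempFile))
        Console.WriteLine(bytes.Length) ' 4
        File.Delete(tempFile)
    End Sub
End Module
```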
Does the DataContext have ObjectTrackingEnabled set to true (the default value)? If so, then it will try to keep a record of essentially all the data it touches, thus preventing the garbage collector from being able to collect any of it.
If so, you should be able to fix the situation by periodically disposing the DataContext and creating a new one, or turning object tracking off.
OK. To use the smallest amount of memory we have to update the DataContext in blocks. I've put some sample code below. It might have syntax errors since I'm typing it in Notepad, and names such as YourDataContext, Items, recordID, PDF and GetPDF are placeholders for your own.
Dim DB As New YourDataContext
Dim BlockSize As Integer = 25
' Materialise the records that still need a PDF so they can be indexed
Dim AllItems = DB.Items.Where(Function(i) i.PDF Is Nothing).ToList()
Dim count = 0
Dim tmpDB As New YourDataContext
While count < AllItems.Count
    Dim id = AllItems(count).recordID
    Dim _item = tmpDB.Items.Single(Function(i) i.recordID = id)
    _item.PDF = GetPDF()
    count += 1
    If count Mod BlockSize = 0 OrElse count = AllItems.Count Then
        tmpDB.SubmitChanges()
        ' Start a fresh context so the objects tracked so far can be collected
        tmpDB.Dispose()
        tmpDB = New YourDataContext
        GC.Collect()
    End If
End While
To further optimise the speed, you can fetch just the recordIDs from AllItems into an array (as an anonymous type) and turn on delay loading for the PDF field.