I have a database with about 180,000 records, and I'm trying to attach a PDF file to each of those records. Each PDF is about 250 KB in size. However, after about a minute my program is using about a GB of memory and I have to stop it. I tried removing the reference to each LINQ object once it's updated, but that doesn't seem to help. How can I get the references released?
Thanks for your help
Private Sub uploadPDFs(ByVal args() As String)
    Dim indexFiles = (From indexFile In dataContext.IndexFiles
                      Where indexFile.PDFContent = Nothing
                      Order By indexFile.PDFFolder).ToList
    Dim currentDirectory As IO.DirectoryInfo
    Dim currentFile As IO.FileInfo
    Dim tempIndexFile As IndexFile
    While indexFiles.Count > 0
        tempIndexFile = indexFiles(0)
        indexFiles = indexFiles.Skip(1).ToList
        currentDirectory = 'I set the directory that I need
        currentFile = 'I get the file that I need
        writePDF(currentDirectory, currentFile, tempIndexFile)
    End While
End Sub

Private Sub writePDF(ByVal directory As IO.DirectoryInfo, ByVal file As IO.FileInfo, ByVal indexFile As IndexFile)
    Dim bytes() As Byte
    bytes = getFileStream(file)
    indexFile.PDFContent = bytes
    dataContext.SubmitChanges()
    counter += 1
    If counter Mod 10 = 0 Then Console.WriteLine(" saved file " & file.Name & " at " & directory.Name)
End Sub

Private Function getFileStream(ByVal fileInfo As IO.FileInfo) As Byte()
    Dim fileStream = fileInfo.OpenRead()
    Dim bytesLength As Long = fileStream.Length
    Dim bytes(bytesLength) As Byte
    fileStream.Read(bytes, 0, bytesLength)
    fileStream.Close()
    Return bytes
End Function
I suggest you perform this in batches, using Take (before the call to ToList) to process a particular number of items at a time. Read (say) 10, set the PDFContent on all of them, call SubmitChanges, and then start again. (I'm not sure offhand whether you should start with a new DataContext at that point, but it might be cleanest to do so.)
As an aside, your code to read the contents of a file is broken in at least a couple of ways - but it would be simpler just to use File.ReadAllBytes in the first place.
Also, your way of handling the list gradually shrinking is really inefficient - after fetching 180,000 records, you're then building a new list with 179,999 records, then another with 179,998 records etc.
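A batched version along those lines might look roughly like this (a sketch only, not tested; YourDataContext and pdfPathFor are placeholders for your actual context class and file-lookup logic, and File.ReadAllBytes replaces getFileStream as suggested):

Private Sub uploadPDFsInBatches()
    Const batchSize As Integer = 50
    Dim moreToDo As Boolean = True
    While moreToDo
        ' a fresh DataContext per batch keeps the change tracker from growing
        Using dc As New YourDataContext()
            Dim batch = (From indexFile In dc.IndexFiles
                         Where indexFile.PDFContent Is Nothing
                         Order By indexFile.PDFFolder).Take(batchSize).ToList()
            For Each indexFile In batch
                ' pdfPathFor is a placeholder for however you locate the PDF for this record
                indexFile.PDFContent = File.ReadAllBytes(pdfPathFor(indexFile))
            Next
            dc.SubmitChanges()
            moreToDo = batch.Count > 0
        End Using
    End While
End Sub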
Does the DataContext have ObjectTrackingEnabled set to true (the default value)? If so, then it will try to keep a record of essentially all the data it touches, thus preventing the garbage collector from being able to collect any of it.
If so, you should be able to fix the situation by periodically disposing the DataContext and creating a new one, or turning object tracking off.
OK. To use the smallest amount of memory we have to update the DataContext in blocks. I've put sample code below; it might have syntax errors since I'm typing it in Notepad.
Dim DB As New YourDataContext
Dim BlockSize As Integer = 25
Dim AllItems = DB.Items.Where(Function(i) i.PDFfile.HasValue = False).ToList()
Dim count = 0
Dim tmpDB As New YourDataContext
While count < AllItems.Count
    Dim _item = tmpDB.Items.Single(Function(i) i.recordID = AllItems(count).recordID)
    _item.PDF = GetPDF()
    count += 1
    If count Mod BlockSize = 0 Or count = AllItems.Count Then
        tmpDB.SubmitChanges()
        tmpDB = New YourDataContext
        GC.Collect()
    End If
End While
To further optimise the speed, you can pull just the recordIDs from AllItems into an array (via a projection/anonymous type) and turn delay loading on for the PDF field.
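For example (a sketch only, reusing the Items / recordID / PDFfile names from the code above), you could pull just the IDs up front and look each record up by ID inside the loop:

' project only the record IDs instead of materialising full entities
Dim ids = DB.Items.Where(Function(i) i.PDFfile.HasValue = False).
                   Select(Function(i) i.recordID).ToArray()

For Each id In ids
    Dim item = tmpDB.Items.Single(Function(i) i.recordID = id)
    item.PDF = GetPDF()
    ' SubmitChanges and swap in a new tmpDB every BlockSize records, as above
Next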
Random() doesn't seem to be very random at all; it keeps repeating the same pattern.
How can I make this "more" random?
Dim ioFile As New System.IO.StreamReader("C:\names.txt")
Dim lines As New List(Of String)
Dim rnd As New Random()
Dim line As Integer
While ioFile.Peek <> -1
    lines.Add(ioFile.ReadLine())
End While
line = rnd.Next(lines.Count + 0)
NAMES.AppendText(lines(line).Trim())
ioFile.Close()
ioFile.Dispose()
Clipboard.SetText(NAMES.Text)
This works fine for me. I changed a few things: implemented a Using block, removed the redundant addition of 0, and added a loop that writes 100 results to the debug output. A sample of 200 that you are just "eyeballing" is not enough to say that a random sequence is "not working".
Using ioFile As New System.IO.StreamReader("C:\names.txt")
    Dim lines As New List(Of String)
    Dim rnd As New Random()
    Dim line As Integer
    While ioFile.Peek <> -1
        lines.Add(ioFile.ReadLine())
    End While
    For i As Integer = 1 To 100
        line = rnd.Next(lines.Count)
        Debug.WriteLine(lines(line).Trim())
    Next
End Using
You don't need a stream reader to read a text file. File.ReadAllLines will return an array of the lines in the file, and calling .ToList on the result gets you the desired List(Of String).
We loop through the length of the list in a For loop, subtracting one because indexes start at zero.
To get the random index we call .Next on an instance of the Random class declared outside the method (a form-level variable). The .Next method is inclusive of the first argument and exclusive of the second. I used a variable to store the original value of lines.Count because this value changes inside the loop, and using lines.Count - 1 directly in the To portion of the For would interfere with it.
Once we get the random index, we add that line to the TextBox and remove it from the list.
Private Sub ShuffleNames()
    Dim index As Integer
    Dim lines = File.ReadAllLines("C:\Users\xxx\Desktop\names.txt").ToList
    Dim loopLimit = lines.Count - 1
    For i = 0 To loopLimit
        index = rnd.Next(0, lines.Count)
        TextBox1.AppendText(lines(index).Trim & Environment.NewLine)
        lines.RemoveAt(index)
    Next
End Sub
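The rnd used above is assumed to be the form-level Random mentioned earlier, declared once so the same instance is reused across calls:

' form-level field: one Random instance for the whole form
Private rnd As New Random()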
I have a variable Queue into which I write information from a stream. The variable is initialized as follows:
Public Shared Queue As List(Of String) = New List(Of String)(1024)
The code to read the stream is
Public Shared Sub ReadStreamForever(ByVal stream As Stream)
    Dim encoder = New UTF8Encoding()
    Dim buffer = New Byte(2047) {}
    Dim counter As Integer = 0
    While True
        If stream.CanRead Then
            Dim len As Integer = stream.Read(buffer, 0, 2048)
            counter = counter + 1
            If len > 0 Then
                Dim text = encoder.GetString(buffer, 0, len)
                SSEApplication.Push(text)
            Else
                Exit While
            End If
        Else
            Exit While
        End If
    End While
End Sub
The Push method just does some string manipulation and adds the lines one after another to the Queue variable:
Public Shared Sub Push(ByVal text As String)
    If String.IsNullOrWhiteSpace(text) Then
        Return
    End If
    Dim lines = text.Trim().Split(vbLf)
    SSEApplication.Queue.AddRange(lines)
End Sub
I stream different big datasets, but the Queue length after filling it up is always 2691, so it looks like it is somehow limited in length. I just don't know where I'm limiting the Queue variable or how to enlarge it. Could anyone help me here?
In general, a List doesn't have a fixed length. The 1024 passed to the constructor is only an initial capacity, and the Add/AddRange methods resize the list to make space for more elements.
If you want a fixed length, you could use a simple array: Dim Queue(1023) As String (VB array declarations give the upper bound, so this creates 1024 elements).
But then you will get an exception when trying to add more elements than it can hold, so you can check the condition in the Push method:
If SSEApplication.Queue.Count + lines.Length <= 1024 Then
    SSEApplication.Queue.AddRange(lines)
End If
That check will also keep the List from growing past 1024 elements, but if you really want a collection of fixed length, I would recommend a simple array.
Useful resource: Arrays in Visual Basic. There you can also read how to enlarge an array when you want to add extra elements, using the ReDim keyword.
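For example, a minimal sketch of growing a fixed array with ReDim Preserve (using the Queue array declared above):

Dim Queue(1023) As String      ' 1024 elements, indexes 0 through 1023
' ... later, when more room is needed:
ReDim Preserve Queue(2047)     ' doubles the size and keeps the existing contents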
Our app requires us to write out a large list of key, GUID value pairs for export to a plain text file.
6004, aa20dc0b-1e10-4efa-b683-fc98820b0fba
There will be potentially millions of these GUIDs, and there may not be a lot of room on the device where the file will be written, so is there a more efficient way to print the GUID to keep the file size down?
I experimented with Hex encoding
Dim g As Guid = Guid.NewGuid
Dim sb As New Text.StringBuilder
For Each b As Byte In g.ToByteArray
    sb.Append(String.Format("{0:X2}", b))
Next
Console.WriteLine(sb.ToString)
This got it down to 32 chars, so each line is a bit shorter:
9870, EBB7EF914C29A6459A34EDCB61EB8C8F
Are there any other approaches to write the GUIDs to the file that are more efficient?
I agree with the previous comment: SQL would be ideal (or another DB); text files can be very unwieldy at this scale. I've just done some quick testing, iterating over millions of GUIDs and storing them in text files.
Here is the outcome: basically, anything more than 1 million GUIDs starts to cause issues. You can safely store 10 million, but the file will struggle to open (on an average PC) as it's over 150 MB; see below.
The code I used is below if you want to try it out. I know it's not perfect, but it gives you an idea of where the time goes. The main conclusions are to write the file in one bulk append; don't append each GUID individually if you can help it. This saves a large amount of processing and time!
Even if you convert to other formats such as Base64 or binary, I think you will still see similar variation in file size; don't forget you are still just saving a text file, and a binary representation written out as a string will, if anything, be larger due to the string length.
Hope this helps; comment back if you need anything!
Chicken
Dim Report_List As List(Of String)

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Report_List = New List(Of String)
    Dim Zeros As Integer = 2
    Do While Zeros < 8
        Dim sw As Stopwatch = New Stopwatch
        sw.Start()
        Dim CountName As String = "1" & Microsoft.VisualBasic.Left("00000000000000000000000", Zeros)
        Dim CountNum As Integer = CInt(CountName)
        Dim Master_String As System.Text.StringBuilder = New System.Text.StringBuilder
        For i = 1 To CountNum
            Dim g As Guid = Guid.NewGuid
            'Dim sb As New System.Text.StringBuilder
            'For Each b As Byte In g.ToByteArray
            '    sb.Append(String.Format("{0:X2}", b))
            'Next
            'Master_String.AppendLine(sb.ToString)
            Master_String.AppendLine(Convert.ToBase64String(g.ToByteArray))
        Next
        Using sr As StreamWriter = New StreamWriter("c:\temp\test-" & CountName & ".txt", False)
            sr.Write(Master_String.ToString)
        End Using
        sw.Stop()
        Report_List.Add(sw.Elapsed.ToString & " - " & CountName)
        Zeros += 1
    Loop
    For Each lr In Report_List
        Me.ListBox1.Items.Add(lr)
    Next
End Sub
I'm using this code to encrypt/decrypt files:
Public Shared Sub encryptordecryptfile(ByVal strinputfile As String, _
                                       ByVal stroutputfile As String, _
                                       ByVal bytkey() As Byte, _
                                       ByVal bytiv() As Byte, _
                                       ByVal direction As CryptoAction)
    Try
        fsInput = New System.IO.FileStream(strinputfile, FileMode.Open, FileAccess.Read)
        fsOutput = New System.IO.FileStream(stroutputfile, FileMode.OpenOrCreate, FileAccess.Write)
        fsOutput.SetLength(0)
        Dim bytbuffer(4096) As Byte
        Dim lngbytesprocessed As Long = 0
        Dim lngfilelength As Long = fsInput.Length
        Dim intbytesincurrentblock As Integer
        Dim cscryptostream As CryptoStream
        Dim csprijndael As New System.Security.Cryptography.RijndaelManaged
        Select Case direction
            Case CryptoAction.ActionEncrypt
                cscryptostream = New CryptoStream(fsOutput, _
                                                  csprijndael.CreateEncryptor(bytkey, bytiv), _
                                                  CryptoStreamMode.Write)
            Case CryptoAction.ActionDecrypt
                cscryptostream = New CryptoStream(fsOutput, _
                                                  csprijndael.CreateDecryptor(bytkey, bytiv), _
                                                  CryptoStreamMode.Write)
        End Select
        While lngbytesprocessed < lngfilelength
            intbytesincurrentblock = fsInput.Read(bytbuffer, 0, 4096)
            cscryptostream.Write(bytbuffer, 0, intbytesincurrentblock)
            lngbytesprocessed = lngbytesprocessed + CLng(intbytesincurrentblock)
        End While
        cscryptostream.Close()
        fsInput.Close()
        fsOutput.Close()
    Catch ex As Exception
    End Try
End Sub
I need to get the percentage of this process that has completed, as an integer. I am going to use a BackgroundWorker, so I need to call this sub from the BackgroundWorker and keep refreshing a progress bar that the worker reports to. Is this possible?
Thanks in advance.
There are a couple of things you can do to make your cryptor more efficient, and a few other issues:
A method like encryptordecryptfile which requires a "mode" argument to know which action to take might really be better off as two methods.
The way you are going, you will raise a blizzard of ProgressChanged events which the ProgressBar won't be able to keep up with, given the animation. A 700 KB file will result in 170 or so progress reports of tiny amounts.
Some of the crypto setup steps (such as creating the key and IV) can be incorporated into the method.
You have a lot of things not being disposed of; you could run out of resources if you run a number of files through it in a loop.
It might be worth noting that you can replace the entire While block with fsInput.CopyTo(cscryptostream) to process the file all at once. That doesn't allow progress reporting, though, and it isn't any faster.
Rather than a BackgroundWorker (which would work fine), you might want to implement it as a Task. The reason is that all those variables need to make their way from something like a button click to the DoWork event where your method is actually called. Rather than using global variables or a class to hold them, a Task works a bit more directly (but does involve one extra step when reporting progress). First, a revised EncryptFile method:
Private Sub EncryptFile(inFile As String,
                        outFile As String,
                        pass As String,
                        Optional reporter As ProgressReportDelegate = Nothing)

    Const BLOCKSIZE = 4096
    Dim percentDone As Integer = 0
    Dim totalBytes As Int64 = 0
    Dim buffSize As Int32

    ' Note A
    Dim key = GetHashedBytes(pass)
    Dim iv = GetRandomBytes(16)
    Dim cryptor As ICryptoTransform

    ' Note B
    Using fsIn As New FileStream(inFile, FileMode.Open, FileAccess.Read),
          fsOut As New FileStream(outFile, FileMode.OpenOrCreate, FileAccess.Write)
        fsOut.SetLength(0)

        ' Note C
        'ToDo: work out optimal block size for Lg vs Sm files
        If fsIn.Length > (2 * BLOCKSIZE) Then
            ' use buffer size to limit to 20 progress reports
            buffSize = CInt(fsIn.Length \ 20)
            ' round up to a multiple of 4096
            buffSize = CInt(((buffSize + BLOCKSIZE - 1) \ BLOCKSIZE) * BLOCKSIZE)
            ' optional, limit to some max size like 256k?
            'buffSize = Math.Min(buffSize, BLOCK256K)
        Else
            buffSize = BLOCKSIZE
        End If
        Dim buffer(buffSize - 1) As Byte

        ' Note D
        ' write the IV to the "naked" fs
        fsOut.Write(iv, 0, iv.Length)

        Using rij = Rijndael.Create()
            rij.Padding = PaddingMode.ISO10126
            Try
                cryptor = rij.CreateEncryptor(key, iv)
                Using cs As New CryptoStream(fsOut, cryptor, CryptoStreamMode.Write)
                    Dim bytesRead As Int32
                    Do Until fsIn.Position = fsIn.Length
                        bytesRead = fsIn.Read(buffer, 0, buffSize)
                        cs.Write(buffer, 0, bytesRead)
                        If reporter IsNot Nothing Then
                            totalBytes += bytesRead
                            percentDone = CInt(Math.Floor((totalBytes / fsIn.Length) * 100))
                            reporter(percentDone)
                        End If
                    Loop
                End Using
            Catch crEx As CryptographicException
                ' ToDo: Set breakpoint and inspect message
            Catch ex As Exception
                ' ToDo: Set breakpoint and inspect message
            End Try
        End Using
    End Using
End Sub
Note A
One of the standard crypto tasks the method can handle for you is creating the Key and IV arrays. These are pretty simple and could be Shared/static members.
Public Shared Function GetHashedBytes(data As String) As Byte()
    Dim hBytes As Byte()
    ' or SHA512Managed
    Using hash As HashAlgorithm = New SHA256Managed()
        ' convert data to bytes:
        Dim dBytes = Encoding.UTF8.GetBytes(data)
        ' hash the result:
        hBytes = hash.ComputeHash(dBytes)
    End Using
    Return hBytes
End Function

Public Shared Function GetRandomBytes(size As Integer) As Byte()
    Dim data(size - 1) As Byte
    Using rng As New RNGCryptoServiceProvider
        ' fill the array
        rng.GetBytes(data)
    End Using
    Return data
End Function
As will be seen later, you can store the IV in the encrypted file rather than saving and managing it in code.
Note B
Using blocks close and dispose of resources for you. Basically, if something has a Dispose method, then you should wrap it in a Using block.
Note C
You don't want to report progress for every block read; that would just overwhelm the ProgressBar. Rather than keeping another variable to track when the progress has changed by some amount, this code creates a buffer size which is about 5% of the input file size, so there will be roughly 20 reports (one every 5%).
As the comments indicate, you may want to add some code to set a minimum/maximum buffer size. Doing so would change the progress-report frequency.
Note D
You can write the IV to the FileStream before you wrap it in the CryptoStream (and, of course, read it back first when decrypting). This saves you from having to store the IV separately.
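On the decrypt side, the corresponding read would look roughly like this (a sketch only, mirroring the layout above; key comes from GetHashedBytes(pass) just as in EncryptFile):

Using fsIn As New FileStream(inFile, FileMode.Open, FileAccess.Read),
      fsOut As New FileStream(outFile, FileMode.Create, FileAccess.Write)
    ' the first 16 bytes are the IV that EncryptFile wrote before the CryptoStream
    Dim iv(15) As Byte
    fsIn.Read(iv, 0, iv.Length)
    Using rij = Rijndael.Create()
        rij.Padding = PaddingMode.ISO10126
        Using cs As New CryptoStream(fsIn, rij.CreateDecryptor(key, iv), CryptoStreamMode.Read)
            cs.CopyTo(fsOut)   ' decrypts the rest of the file into fsOut
        End Using
    End Using
End Using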
The last part is kicking this off as a Task:
Dim t As Task
t = Task.Run(Sub() EncryptFile(inFile, oFile, "MyWeakPassword",
                               AddressOf ReportProgress))
...
What a BackgroundWorker does is execute the work on a background thread while reporting progress on the UI thread. With a Task, all we need to do is use Invoke:
Delegate Sub ProgressReportDelegate(value As Int32)

Private Sub ReportProgress(v As Int32)
    If progBar.InvokeRequired Then
        progBar.Invoke(Sub() progBar.Value = v)
    Else
        progBar.Value = v
        progBar.Invalidate()
    End If
End Sub
The Encryptor will work either directly or as a Task. For small files, you can omit the progress report entirely:
' small file, no progress report:
EncryptFile(ifile, oFile, "MyWeakPassword")

' report progress, but run on the UI thread
EncryptFile(ifile, oFile, "MyWeakPassword",
            AddressOf ReportProgress)

' run as a Task
Dim t As Task
t = Task.Run(Sub() EncryptFile(ifile, oFile, "MyWeakPassword",
                               AddressOf ReportProgress))
...and if you had a list of files to do, you could run them all at once and perhaps report total progress.
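For instance, a rough sketch of that last idea (inFiles and the ".enc" output naming are assumptions, and per-file progress aggregation is left out):

Private Async Sub EncryptAllFiles(inFiles As IEnumerable(Of String))
    ' start one Task per file and wait for them all to finish
    Dim tasks = inFiles.
        Select(Function(f) Task.Run(Sub() EncryptFile(f, f & ".enc", "MyWeakPassword"))).
        ToList()
    Await Task.WhenAll(tasks)
End Sub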
What I am trying to do may be better suited to SQL Server, but I have seen many applications in the past that simply work on text files, and I want to imitate the behaviour those applications follow.
I have a list of URLs in a text file. It is simple enough to open and read line by line, but how can I store additional data from the file and then query it?
E.g.
Text File:
http://link1.com/ - 0
http://link2.com/ - 0
http://link3.com/ - 1
http://link4.com/ - 0
http://link5.com/ - 1
Then I will read the data with:
Private Sub ButtonX2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ButtonX2.Click
    OpenFileDialog1.Filter = "*txt Text Files|*.txt"
    If OpenFileDialog1.ShowDialog() = DialogResult.OK Then
        Dim AllText As String = My.Computer.FileSystem.ReadAllText(OpenFileDialog1.FileName)
        Dim Lines() = Split(AllText, vbCrLf)
        Dim list = New List(Of Test)
        Dim URLsLoaded As Integer = 0
        For i = 0 To UBound(Lines)
            If Lines(i) = "" Then Continue For
            Dim URLInfo As String() = Split(Lines(i), " - ")
            If URLInfo.Count < 2 Then Continue For
            list.Add(New Test(URLInfo(0), URLInfo(1)))
            URLsLoaded += 1
        Next
        DataGridViewX1.DataSource = list
        LabelX5.Text = URLsLoaded.ToString()
    End If
End Sub
So as you can see, above I am prompting the user to open a text file; afterwards it is displayed back to them in a DataGridView.
Now here is my issue: I want to be able to query the data, e.g. SELECT * FROM URLs WHERE active='1' (too used to PHP + MySQL!),
where the 1 is the corresponding 1 or 0 after the URL in the text file.
In the above example the data is being stored in a simple class as per below:
Public Class Test
    Public Sub New(ByVal URL As String, ByVal Active As Integer)
        _URL = URL
        _Active = Active
    End Sub

    Private _URL As String
    Public Property URL() As String
        Get
            Return _URL
        End Get
        Set(ByVal value As String)
            _URL = value
        End Set
    End Property

    Private _Active As String
    Public Property Active As String
        Get
            Return _Active
        End Get
        Set(ByVal value As String)
            _Active = value
        End Set
    End Property
End Class
Am I going about storing the data completely the wrong way after importing it from a text file?
I am new to VB.NET and still learning the basics, but I find it much easier to learn by playing around before hitting the massive books!
Working example:
Dim myurls As New List(Of Test)
myurls.Add(New Test("http://link1.com/", 1))
myurls.Add(New Test("http://link2.com/", 0))
myurls.Add(New Test("http://link3.com/", 0))

Dim result = From t In myurls Where t.Active = 1

For Each testitem As Test In result
    MsgBox(testitem.URL)
Next
By the way, LINQ is magic. You can shorten your loading/parsing code to three lines:
Dim Lines() = IO.File.ReadAllLines("myfile.txt")
Dim myurls As List(Of Test) = (From t In lines Select New Test(Split(t, " - ")(0), Split(t, " - ")(1))).ToList
DataGridViewX1.DataSource = myurls
The first line reads all the lines in the file into an array of strings.
The second line splits each line in the array, creates a Test item from the parts, and then converts the resulting items to a List(Of Test).
Of course this could be taken to silly lengths by making it a one-liner:
DataGridViewX1.DataSource = (From t In IO.File.ReadAllLines("myfile.txt") Select New Test(Split(t, " - ")(0), Split(t, " - ")(1))).ToList
which would reduce your load function to only the following four lines:
If OpenFileDialog1.ShowDialog() = DialogResult.OK Then
    DataGridViewX1.DataSource = (From t In IO.File.ReadAllLines("myfile.txt") Select New Test(Split(t, " - ")(0), Split(t, " - ")(1))).ToList
    LabelX5.Text = CType(DataGridViewX1.DataSource, List(Of Test)).Count.ToString()
End If
You can query your class using LINQ, as long as the objects are in an appropriate collection type, like List(Of Test). I am not completely familiar with the VB syntax for LINQ, but it would be something like the following.
list.Where(Function(x) x.Active = "1").Select(Function(x) x.URL)
However, this isn't actually storing anything in a database, which I think your question might really be asking about.
I think you are reinventing the wheel, which is not generally a good thing. If you want SQL-like functionality, just store the data in a SQL DB and query it.
There are a lot of reasons you should just use an existing DB:
Your code will be less tested and thus more likely to have bugs.
Your code will be less optimized and probably perform worse. (You were planning on implementing a query optimizer and indexing engine for performance, right?)
Your code won't have as many features (locking, constraints, triggers, backup/recovery, a query language, etc.)
There are lots of free RDBMS options out there so it might even be cheaper to use an existing system than spending your time writing an inferior one.
That said, if this is just an academic exercise, go for it. However, I wouldn't do this for a real-world system.