Datarow ends up with wrong or lost data - vb.net

I'm scraping tweets from Twitter. I launch multiple BackgroundWorkers, and they do the following:
For x as Integer = 0 to 5
Dim BGW As New BackgroundWorker
AddHandler BGW.DoWork, AddressOf TweetGrab
BGW.RunWorkerAsync(tweeturl)
Next
Public TemporaryRows As New List(Of DataRow)
Private Sub TweetGrab(tweeturl as String)
'some html stuff here
Dim ImageUrl as String = twitterImage.Attributes("src").Value
Dim ThumbnailUrl As String = ImageUrl & ":small"
Dim DataRowTemporary As DataRow = DataTable1.NewRow()
DataRowTemporary("ImageUrl") = ImageUrl
DataRowTemporary("ThumbnailUrl") = ThumbnailUrl
DataRowTemporary("Checked") = False
'I detect the error even here
TemporaryRows.Add(DataRowTemporary)
End Sub
Later on, I do stuff with the TemporaryRows. I loop over the rows and check if they meet some conditions.
The problem is that DataRowTemporary("Checked") ends up being DBNull and DataRowTemporary("ThumbnailUrl") is completely different from ImageUrl, even though I specified Dim ThumbnailUrl As String = ImageUrl & ":small"
This happens in about 2/10 cases. I would guess it has something to do with background threads but I don't have any ideas how to solve it. I can reedit the fields after the error occurs, but I would like to prevent the error from occurring in the first place.

The problem was changing the collection while it was being accessed by other threads (in parallel).
You cannot add or remove items in parallel; the collection must be locked to avoid strange errors.
SyncLock TemporaryRows
TemporaryRows.Add(DataRowTemporary) ' every Add/Remove goes inside the lock
End SyncLock
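As a minimal, self-contained sketch of the locking pattern above: the module, the RowsLock object and the AddRow helper are hypothetical names used only for illustration, and a List(Of String) stands in for the list of DataRow objects. A dedicated private lock object is used here instead of locking on the list itself, which is a common convention rather than a requirement.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading.Tasks

Module LockedListDemo
    ' Hypothetical names: RowsLock guards every access to Rows.
    Private ReadOnly RowsLock As New Object()
    Private ReadOnly Rows As New List(Of String)()

    ' Called from many threads; only one thread at a time may enter the SyncLock block.
    Sub AddRow(value As String)
        SyncLock RowsLock
            Rows.Add(value)
        End SyncLock
    End Sub

    Sub Main()
        ' 1000 concurrent additions; without the lock some could be lost or throw.
        Parallel.For(0, 1000, Sub(i) AddRow("row " & i.ToString()))
        Console.WriteLine(Rows.Count) ' prints 1000
    End Sub
End Module
```

Any code that later reads or enumerates the list while workers may still be running would need to take the same lock.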

Ping multiple device names (hostname) on the Network

A DataGridView displays hostnames at Column index 0, computer / printer names on the network.
pc1
pc2
print3
pc5
print
....
There are more than 500 such names.
I know how to ping them:
For i = 0 To DataGridView1.Rows.Count - 1
Try
If My.Computer.Network.Ping(DataGridView1.Item(0, i).Value) = True Then
DataGridView1.Rows(i).DefaultCellStyle.BackColor = Color.Lime
Else
DataGridView1.Rows(i).DefaultCellStyle.BackColor = Color.Red
End If
Catch ex As Exception
DataGridView1.Rows(i).DefaultCellStyle.BackColor = Color.Red
End Try
Next
The problem is that the Ping takes a very long time and the application freezes.
How can you speed up this procedure?
And let's say if the node is not available, then simply remove it from the list.
Here is an example that pings multiple addresses at the same time, using the awaitable method provided by the Ping class, Ping.SendPingAsync().
This version is awaitable, unlike the Ping.SendAsync() method, which is also asynchronous but event-driven.
Since you're using a DataGridView to both store the IpAddress/HostName and to present the PingReply results, you need to determine a way to match the Ping result to correct Cell of the DataGridView from which the Ip/Host address was taken.
Here, I'm passing to the method the Row's Index, so when the Ping result comes back, asynchronously, we can match the response to a specific Cell in the DataGridView.
To make the initialization method more generic, I'm passing also the index of the Column where the Ip/Host address is stored and the index of the Column that will show the result (you could also just pass all indexes, not a DataGridView Control reference to the method and handle the results in a different way).
A loop extracts the addresses from the DataGridView and creates a List(Of Task), adding a PingAsync() Task for each address found.
When the collection is completed, the List(Of Task) is passed to the Task.WhenAll() method, which is then awaited.
Each Task is already running by the time it is added to the list (an Async method starts executing as soon as it is called); Task.WhenAll() returns a Task that completes when all the Tasks in the list have a result.
► Note that the Ping procedure sets a TimeOut, to 5000ms here, so all the Tasks will return before or within that interval, successful or not.
You can then decide if you want to reschedule the failed Pings or not.
The UI update is handled using a Progress delegate. It's just a method (Action delegate) that is called when the Ping procedure has a result to show.
It can also be used when the method that updates the UI runs in a different Thread: the Report() method will call the Progress object delegate in the Thread that created the delegate: the UI Thread, here (in the example, we're not actually ever leaving it, though).
This is how it works:
Assume you're starting the ping sequence from Button.Click event handler.
Note that the handler is declared async.
Private Async Sub btnMassPing_Click(sender As Object, e As EventArgs) Handles btnMassPing.Click
Await MassPing(DataGridView1, 1, 2)
End Sub
Initialization method and IProgress(Of T) report handler:
Imports System.Drawing
Imports System.Net.NetworkInformation
Imports System.Net.Sockets
Imports System.Threading.Tasks
Private Async Function MassPing(dgv As DataGridView, statusColumn As Integer, addressColumn As Integer) As Task
Dim obj = New Object()
Dim tasks = New List(Of Task)()
Dim progress = New Progress(Of (sequence As Integer, reply As Object))(
Sub(report)
SyncLock obj
Dim status = IPStatus.Unknown
If TypeOf report.reply Is PingReply Then
status = DirectCast(report.reply, PingReply).Status
ElseIf TypeOf report.reply Is SocketError Then
Dim socErr = DirectCast(report.reply, SocketError)
status = If(socErr = SocketError.HostNotFound,
IPStatus.DestinationHostUnreachable,
IPStatus.Unknown)
End If
Dim color As Color = If(status = IPStatus.Success, Color.Green, Color.Red)
Dim cell = dgv(statusColumn, report.sequence)
cell.Style.BackColor = color
cell.Value = If(status = IPStatus.Success, "Online", status.ToString())
End SyncLock
End Sub)
For row As Integer = 0 To dgv.Rows.Count - 1
If row = dgv.NewRowIndex Then Continue For
Dim ipAddr = dgv(addressColumn, row).Value.ToString()
tasks.Add(PingAsync(ipAddr, 5000, row, progress))
Next
Try
Await Task.WhenAll(tasks)
Catch ex As Exception
' Log / report the exception
Console.WriteLine(ex.Message)
End Try
End Function
PingAsync worker method:
Private Async Function PingAsync(ipAddress As String, timeOut As Integer, sequence As Integer, progress As IProgress(Of (seq As Integer, reply As Object))) As Task
Dim buffer As Byte() = New Byte(31) {} ' 32 bytes: VB array bounds are inclusive upper bounds
Dim ping = New Ping()
Try
Dim options = New PingOptions(64, True)
Dim reply = Await ping.SendPingAsync(ipAddress, timeOut, buffer, options)
progress.Report((sequence, reply))
Catch pex As PingException
If TypeOf pex.InnerException Is SocketException Then
Dim socEx = DirectCast(pex.InnerException, SocketException)
progress.Report((sequence, socEx.SocketErrorCode))
End If
Finally
ping.Dispose()
End Try
End Function
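The question also asks how to drop nodes that are not available. A minimal sketch of that step, independent of the DataGridView, could look like the following. IsReachableAsync and FilterReachableAsync are hypothetical helper names, and treating every PingException as "unreachable" is an assumption made for brevity.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Net.NetworkInformation
Imports System.Threading.Tasks

Module PingFilterDemo
    ' Hypothetical helper: True when the host answers within timeoutMs.
    ' Assumption: any PingException (e.g. name resolution failure) counts as unreachable.
    Async Function IsReachableAsync(host As String, timeoutMs As Integer) As Task(Of Boolean)
        Using p As New Ping()
            Try
                Dim reply = Await p.SendPingAsync(host, timeoutMs)
                Return reply.Status = IPStatus.Success
            Catch ex As PingException
                Return False
            End Try
        End Using
    End Function

    ' Hypothetical helper: pings all hosts concurrently and keeps only the ones that answered.
    Async Function FilterReachableAsync(hosts As IEnumerable(Of String)) As Task(Of List(Of String))
        Dim hostList = hosts.ToList()
        ' Task.WhenAll returns the results in the same order as the tasks were supplied.
        Dim results = Await Task.WhenAll(hostList.Select(Function(h) IsReachableAsync(h, 5000)))
        Return hostList.Where(Function(h, i) results(i)).ToList()
    End Function
End Module
```

In the grid scenario, the rows whose address is missing from the returned list could then be removed on the UI thread, iterating from the last row to the first so the remaining indexes stay valid.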

How to pause loop while multithreading is alive

I have 3 threads that are called inside a loop.
For i As Integer = 0 To DG.Rows.Count - 1
Dim thread1 = New System.Threading.Thread(AddressOf processData)
Dim thread2 = New System.Threading.Thread(AddressOf processData2)
Dim thread3 = New System.Threading.Thread(AddressOf processData3)
If Not thread1.IsAlive Then
x1 = i
thread1.Start()
ElseIf Not thread2.IsAlive Then
x2 = i
thread2.Start()
ElseIf Not thread3.IsAlive Then
x3 = i
thread3.Start()
End If
Next
How do I pause the loop while all threads are alive?
What I want is, if one of the threads finishes then continue the loop and get the (i), then pause the loop again if there are no available threads. Because sometimes DG.Rows items are more than 3.
Let the framework handle this for you: use the ThreadPool.
First, create an array to hold thread status for each item:
Dim doneEvents(DG.Rows.Count - 1) As ManualResetEvent ' VB array bounds are inclusive upper bounds
Like the x1,x2,x3 variables, this needs to be accessible from both your main thread and the processData method.
Then modify your processData method to accept an Object argument at the beginning and set a ResetEvent at the end:
Public Sub processData(ByVal data As Object)
Dim x As Integer = CInt(data)
'...
'Existing code here
doneEvents(x).Set()
End Sub
Now you can just queue them all up like this:
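For i As Integer = 0 To DG.Rows.Count - 1
doneEvents(i) = New ManualResetEvent(False) ' each slot must hold an event before it is waited on
ThreadPool.QueueUserWorkItem(AddressOf processData, i)
Next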
WaitHandle.WaitAll(doneEvents)
Console.WriteLine("All data is processed.")
Though I suspect you should also pass the data from your grid for each row to the processData method.
You can also use the newer Async/Await keywords, but I'll have a hard time writing a sample for this without knowing something of the contents of processData.
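One caveat with the approach above: WaitHandle.WaitAll accepts at most 64 handles and is not supported for multiple handles on an STA thread (such as a WinForms UI thread). A possible alternative, sketched here under the assumption that only "everything is finished" needs to be detected, is a single CountdownEvent. The itemCount value and the placeholder work item are illustrative stand-ins for DG.Rows.Count and processData.

```vbnet
Imports System
Imports System.Threading

Module CountdownDemo
    Sub Main()
        Dim itemCount As Integer = 100 ' stands in for DG.Rows.Count
        Dim processed As Integer = 0

        Using done As New CountdownEvent(itemCount)
            For i As Integer = 0 To itemCount - 1
                ThreadPool.QueueUserWorkItem(
                    Sub(state)
                        Try
                            ' Placeholder for the real work, e.g. processData(CInt(state))
                            Interlocked.Increment(processed)
                        Finally
                            ' Signal even if the work throws, so Wait() cannot hang.
                            done.Signal()
                        End Try
                    End Sub, i)
            Next

            ' Blocks until Signal() has been called itemCount times.
            done.Wait()
        End Using

        Console.WriteLine(processed) ' prints 100
    End Sub
End Module
```

The thread pool decides how many items run at once, so no manual bookkeeping of free threads is needed.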
I think you want to do something like this. Don't pause, just launch a thread per loop iteration.
For i As Integer = 0 To DG.Rows.Count - 1
Dim thread1 = New System.Threading.Thread(AddressOf processData)
thread1.Start(i)
Next
But in any case, I don't think you want to call new System.Threading.Thread in each loop. Those should be moved outside the For loop.
Alternatively, you could use the TPL's Parallel methods and write your code like this:
Parallel.For( _
0, _
DG.Rows.Count, _
New ParallelOptions() With {.MaxDegreeOfParallelism = 3}, _
Sub(i) processData(i))
I don't understand why you have processData, processData2, and processData3 though.

Multi-Threading IP Address Pings Causing Application Crash

Boy, learning something new can be a real headache if you can't find a solid source. I have been designing applications in a linear fashion for some time now and want to step up into a more powerful approach. I have been reading up on threading, and perhaps have gone to a larger level than I should. However, one usually steps up when the application calls for it, and there is no better time than the present to learn something new.
My program is designed to do something that seems rather simple, but has become extremely difficult to create in a smooth-running manner. The original design created an object for each device on the network it wished to ping; in my real-world environment they are Kindles. The goal was to ensure they were still connected to the network by pinging them. I used a For loop and an object array to do this, set on a Timer. This had unexpected results, causing the ListView to flicker and load slowly after the ListView1.Items.Clear. I evolved into updating the list items rather than clearing them, and the flicker remained.
I assumed this was due to the slow processing of the array and pings, so I started hunting for solutions and came across multi-threading. I have known about this for some time, but have yet to dive into the practice. My program seemed to need more speed and smoother operation, so I took a stab at it. The code below, in its complete form, is the result; however, it crashes and throws errors. Clearly I have not used threading as it was intended. Using it in simpler functions works fine and I feel I have the grasp. That is, if I want my program to pointlessly run counters.
I don't know what to do next in my steps for getting this task done, and figure I am combining several different methods into a mush of dead program. I could really use some help getting back on track with this. All comments welcome and thank you for checking out my code.
Form1 Code
Public Class Form1
'Obj Array
Public Shared objDevice As New List(Of kDevice)
'Thread Array for each Obj
Public Shared thread() As System.Threading.Thread
Private Sub ipRefresh(objID, itemPos)
Dim objDev As kDevice = objID
If My.Computer.Network.Ping(objDev.kIP) Then
objDev.kStatus = "Online"
objDev.kPings = 0
Else
objDev.kPings += 1
End If
If objDev.kPings >= 8 Then
objDev.kStatus = "Offline"
objDev.kPings = 0
ListView1.Items(itemPos).BackColor = Color.Red
End If
Dim str(4) As String
Dim itm As ListViewItem
str(0) = objDev.kName
str(1) = objDev.kIP
str(2) = objDev.kStatus
str(3) = objDev.kPings
itm = New ListViewItem(str)
ListView1.Items(itemPos) = itm
End Sub
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Me.CheckForIllegalCrossThreadCalls = False
' Adding ListView Columns
ListView1.Columns.Add("Device", 100, HorizontalAlignment.Left)
ListView1.Columns.Add("IP Address", 150, HorizontalAlignment.Left)
ListView1.Columns.Add("Status", 60, HorizontalAlignment.Left)
ListView1.Columns.Add("Pings", 60, HorizontalAlignment.Left)
Dim ipList As New List(Of String)
Dim nameList As New List(Of String)
Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("kDevices.csv")
MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
MyReader.Delimiters = New String() {","}
Dim currentRow As String()
Dim rowP As Integer = 1
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
Dim cellP As Integer = 0
Dim nTemp As String = ""
For Each currentField As String In currentRow
Select Case cellP
Case 0
nameList.Add(currentField.Replace("""", ""))
Case 1
ipList.Add(currentField.Replace("""", ""))
End Select
cellP += 1
Next
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
MsgBox("Line " & ex.Message & " is invalid. Skipping")
End Try
rowP += 1
End While
End Using
Dim nameLAR As String() = nameList.ToArray
Dim ipLAR As String() = ipList.ToArray
ReDim Preserve thread(nameLAR.Length)
For i As Integer = 0 To nameLAR.Length - 1
Dim newDevice As New kDevice
Dim objNum = i
objDevice.Add(newDevice)
newDevice.kName = nameLAR(i)
newDevice.kIP = ipLAR(i)
If My.Computer.Network.Ping(newDevice.kIP) Then
newDevice.kStatus = "Online"
Else
newDevice.kStatus = "Loading"
End If
Dim str(4) As String
Dim itm As ListViewItem
str(0) = newDevice.kName
str(1) = newDevice.kIP
str(2) = newDevice.kStatus
str(3) = newDevice.kPings
itm = New ListViewItem(str)
If newDevice.kStatus = "Loading" Then
itm.BackColor = Color.Yellow
End If
ListView1.Items.Add(itm)
thread(objNum) = New System.Threading.Thread(Sub() Me.ipRefresh(objDevice(objNum), objNum))
Next
End Sub
Private Sub Timer1_Tick(sender As Object, e As EventArgs) Handles Timer1.Tick
For i As Integer = 0 To objDevice.Count - 1
thread(i).Start()
Next
End Sub
End Class
kDevice Class
Public Class kDevice
Private strkName As String
Private strkIP As String
Private strkStatus As String
Private strkLastStatus As String
Private strkPings As Integer = 0
Public Property kName As String
Get
Return strkName
End Get
Set(value As String)
strkName = value
End Set
End Property
Public Property kIP As String
Get
Return strkIP
End Get
Set(value As String)
strkIP = value
End Set
End Property
Public Property kStatus As String
Get
Return strkStatus
End Get
Set(value As String)
strkStatus = value
End Set
End Property
Public Property kPings As Integer
Get
Return strkPings
End Get
Set(value As Integer)
strkPings = value
End Set
End Property
End Class
The Error / Crash on Line 32 of my code which is when it tries to pass the update to the ListView Item
An unhandled exception of type 'System.ArgumentException'
occurred in Microsoft.VisualBasic.dll
Additional information: InvalidArgument=Value of '18'
is not valid for 'index'.
or
An unhandled exception of type 'System.NullReferenceException'
occurred in Microsoft.VisualBasic.dll
Additional information: Object reference not set to an instance
of an object.
If my code does not make sense, or at least the idea of what I was trying to make it do, please let me know and I will explain whichever parts are unclear. Again, thank you for looking over my issue.
Just a possible issue I noticed:
Dim str(4) As String
Dim itm As ListViewItem
str(0) = newDevice.kName
str(1) = newDevice.kIP
str(2) = newDevice.kStatus
str(3) = newDevice.kPings
itm = New ListViewItem(str)
If newDevice.kStatus = "Loading" Then
itm.BackColor = Color.Yellow
End If
ListView1.Items.Add(itm)
In this bit here, you declare str(4), which gives 5 possible indexes (remember it starts at zero), where you should have 4 (str(3)). I don't think this is the whole issue, but just a small bit you should probably fix. You also may want to look into other ways to update the ListView without setting
Me.CheckForIllegalCrossThreadCalls = False
Here's an awesome guide that helped me when I did my first multi threaded application: http://checktechno.blogspot.com/2012/11/multi-thread-for-newbies.html
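One such way, sketched here as an illustration rather than a complete fix for the code in the question, is to marshal each UI update onto the thread that owns the control with Control.InvokeRequired and Control.Invoke. The module and the RunOnUiThread helper are hypothetical names.

```vbnet
Imports System
Imports System.Windows.Forms

Module UiMarshalling
    ' Hypothetical helper: runs the action on the thread that created the control.
    Public Sub RunOnUiThread(ctrl As Control, action As Action)
        If ctrl.InvokeRequired Then
            ' Called from a worker thread: hand the delegate over to the UI thread.
            ctrl.Invoke(action)
        Else
            ' Already on the UI thread: run it directly.
            action()
        End If
    End Sub
End Module
```

Inside a worker method such as ipRefresh, a call could then look like RunOnUiThread(ListView1, Sub() ListView1.Items(itemPos).BackColor = Color.Red), which keeps the ping itself on the background thread and only the ListView access on the UI thread.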

Strange error reported in this code, please explain?

I put together this bit of code from a few other samples, and I am getting an error I can't understand. On this line in the code below, on the word Observer,
Dim Results As ManagementObjectCollection = Worker.Get(Observer)
I get the error
"Value of type 'System.Management.ManagementOperationObserver' cannot be converted to 'Integer'"
Can somebody explain what this means?
There are two signatures for ManagementObjectSearcher.Get(), one has no parameters and the other has one parameter, a ManagementOperationObserver for async operation. That is what I am providing, yet the error indicates conversion involving an integer?
Public Shared Sub WMIDriveDetectionASYNC(ByVal args As String())
Dim Observer As New ManagementOperationObserver()
Dim completionHandler As New MyHandler()
AddHandler Observer.Completed, AddressOf completionHandler.Done
Dim Machine = "192.168.0.15"
Dim Scope = New ManagementScope("\\" & Machine & "\root\cimv2")
Dim QueryString = "select Name, Size, FreeSpace from Win32_LogicalDisk where DriveType=3"
Dim Query = New ObjectQuery(QueryString)
Dim Worker = New ManagementObjectSearcher(Scope, Query)
Dim Results As ManagementObjectCollection = Worker.Get(Observer) 'use parameter to make async
For Each item As ManagementObject In Results
Console.WriteLine("{0} {2} {1}", item("Name"), item("FreeSpace"), item("Size"))
Dim FullSpace As Long = (CLng(item("Size")) - CLng(item("FreeSpace"))) \ 1000000
Console.WriteLine(FullSpace)
Next
End Sub
Public Class MyHandler
Private _isComplete As Boolean = False
Public Sub Done(sender As Object, e As CompletedEventArgs)
_isComplete = True
End Sub 'Done
Public ReadOnly Property IsComplete() As Boolean
Get
Return _isComplete
End Get
End Property
End Class
Thanks for any advice!
I think this overload delivers its results through the object you pass in as a parameter rather than through a return value. So I think the call just needs to look like:
Worker.Get(Observer)
instead of assigning its result to something, since this overload is a Sub and does not return a value.
Then use the events you hook up to the observer to handle whatever you need to do with the items you find.
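A minimal sketch of that event-based pattern follows, assuming a reference to System.Management and a query against the local machine (the remote scope from the question is left out for brevity). ObjectReady is raised once per result and Completed once at the end; the module name and the ManualResetEvent used to keep the console program alive are illustrative choices.

```vbnet
Imports System
Imports System.Management
Imports System.Threading

Module WmiAsyncSketch
    Sub Main()
        Dim observer As New ManagementOperationObserver()

        Using finished As New ManualResetEvent(False)
            ' Raised once for every object the query returns.
            AddHandler observer.ObjectReady,
                Sub(sender As Object, e As ObjectReadyEventArgs)
                    Console.WriteLine("{0} {1} {2}",
                                      e.NewObject("Name"),
                                      e.NewObject("Size"),
                                      e.NewObject("FreeSpace"))
                End Sub

            ' Raised once when the whole operation has finished.
            AddHandler observer.Completed,
                Sub(sender As Object, e As CompletedEventArgs) finished.Set()

            Using searcher As New ManagementObjectSearcher(
                "select Name, Size, FreeSpace from Win32_LogicalDisk where DriveType=3")
                ' This overload returns nothing; results arrive through the events above.
                searcher.Get(observer)
                ' Keep the process alive until Completed fires.
                finished.WaitOne()
            End Using
        End Using
    End Sub
End Module
```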

linq submitchanges runs out of memory

I have a database with about 180,000 records. I'm trying to attach a PDF file to each of those records. Each PDF is about 250 KB in size. However, after about a minute my program starts taking about a GB of memory and I have to stop it. I tried doing it so the reference to each LINQ object is removed once it's updated, but that doesn't seem to help. How can I make it clear the reference?
Thanks for your help
Private Sub uploadPDFs(ByVal args() As String)
Dim indexFiles = (From indexFile In dataContext.IndexFiles
Where indexFile.PDFContent = Nothing
Order By indexFile.PDFFolder).ToList
Dim currentDirectory As IO.DirectoryInfo
Dim currentFile As IO.FileInfo
Dim tempIndexFile As IndexFile
While indexFiles.Count > 0
tempIndexFile = indexFiles(0)
indexFiles = indexFiles.Skip(1).ToList
currentDirectory = 'I set the directory that I need
currentFile = 'I get the file that I need
writePDF(currentDirectory, currentFile, tempIndexFile)
End While
End Sub
Private Sub writePDF(ByVal directory As IO.DirectoryInfo, ByVal file As IO.FileInfo, ByVal indexFile As IndexFile)
Dim bytes() As Byte
bytes = getFileStream(file)
indexFile.PDFContent = bytes
dataContext.SubmitChanges()
counter += 1
If counter Mod 10 = 0 Then Console.WriteLine(" saved file " & file.Name & " at " & directory.Name)
End Sub
Private Function getFileStream(ByVal fileInfo As IO.FileInfo) As Byte()
Dim fileStream = fileInfo.OpenRead()
Dim bytesLength As Long = fileStream.Length
Dim bytes(bytesLength) As Byte
fileStream.Read(bytes, 0, bytesLength)
fileStream.Close()
Return bytes
End Function
I suggest you perform this in batches, using Take (before the call to ToList) to process a particular number of items at a time. Read (say) 10, set the PDFContent on all of them, call SubmitChanges, and then start again. (I'm not sure offhand whether you should start with a new DataContext at that point, but it might be cleanest to do so.)
As an aside, your code to read the contents of a file is broken in at least a couple of ways - but it would be simpler just to use File.ReadAllBytes in the first place.
Also, your way of handling the list gradually shrinking is really inefficient - after fetching 180,000 records, you're then building a new list with 179,999 records, then another with 179,998 records etc.
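For the file-reading aside, a short sketch of the File.ReadAllBytes replacement is shown below. Two of the problems it avoids: Dim bytes(bytesLength) allocates one element more than the file length, because VB array bounds are inclusive, and a single Stream.Read call is not guaranteed to fill the buffer. The module name, the GetFileBytes helper and the temporary file in Main are illustrative only.

```vbnet
Imports System
Imports System.IO

Module ReadBytesDemo
    ' Hypothetical drop-in for getFileStream: reads the whole file and closes it,
    ' even if an exception is thrown part-way through.
    Function GetFileBytes(fileInfo As FileInfo) As Byte()
        Return File.ReadAllBytes(fileInfo.FullName)
    End Function

    Sub Main()
        ' Round-trip a small temporary file to show the byte count is exact.
        Dim tempPath = Path.GetTempFileName()
        Try
            File.WriteAllBytes(tempPath, New Byte() {1, 2, 3, 4, 5})
            Dim bytes = GetFileBytes(New FileInfo(tempPath))
            Console.WriteLine(bytes.Length) ' prints 5
        Finally
            File.Delete(tempPath)
        End Try
    End Sub
End Module
```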
Does the DataContext have ObjectTrackingEnabled set to true (the default value)? If so, then it will try to keep a record of essentially all the data it touches, thus preventing the garbage collector from being able to collect any of it.
If so, you should be able to fix the situation by periodically disposing the DataContext and creating a new one, or turning object tracking off.
OK. To use the smallest amount of memory, we have to update the DataContext in blocks. I've put sample code below. It might have syntax errors since I'm using Notepad to type it in.
Dim DB As New YourDataContext()
Dim BlockSize As Integer = 25
' Materialise the query once so the items can be indexed
Dim AllItems = DB.Items.Where(Function(i) i.PDFfile.HasValue = False).ToList()
Dim count = 0
Dim tmpDB As New YourDataContext()
While count < AllItems.Count
Dim id = AllItems(count).recordID
Dim _item = tmpDB.Items.Single(Function(i) i.recordID = id)
_item.PDF = GetPDF()
count += 1
If count Mod BlockSize = 0 OrElse count = AllItems.Count Then
tmpDB.SubmitChanges()
' Dispose the old context so the entities it tracks can be collected
tmpDB.Dispose()
tmpDB = New YourDataContext()
GC.Collect()
End If
End While
To further optimise the speed, you can get the recordIDs into an array from AllItems as an anonymous type, and turn on delay loading for the PDF field.