Lucene.net result as string / parsing too slow - vb.net

I need some help speeding up a Lucene.net search.
We need the result as a single string with ; as the separator.
Parsing the TopDocs takes too long:
Dim resultDocs As TopDocs = indexSearch.Search(query, indexReader.MaxDoc())
Dim hits As Object = resultDocs.ScoreDocs
Dim strGetDocIDList As String = ""
For Each sDoc As ScoreDoc In hits
Dim documentFromSearcher As Document = indexSearch.Doc(sDoc.Doc)
Dim contentValue As String = documentFromSearcher.Get("id")
strGetDocIDList = strGetDocIDList + Path.GetFileName(contentValue) + ";"
Next
Return strGetDocIDList
How can we speed this up?
Regards
Ingo

There are a few ways to tune performance when loading stored fields in Lucene.
First, by default Lucene loads every stored field of a Document when you load it. Only store what you need; do not systematically store everything.
If you don't need all the stored fields for this particular query, try writing a FieldSelector to gain finer control over field loading.
Finally, add the field whose stored data you load most often before the other stored fields in your Documents. Fields are loaded sequentially, and in some cases adding them first can speed things up a little.
FieldSelector API link
An article that may help you with implementing a FieldSelector
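As a rough illustration of the FieldSelector suggestion, here is a minimal sketch that loads only the stored "id" field for each hit. It assumes a Lucene.Net 2.9/3.0-era API, where FieldSelector and MapFieldSelector live in Lucene.Net.Documents. The helper name GetDocIdList and the maxHits parameter are made up for this example. It also swaps the repeated string concatenation from the question for a StringBuilder, which is a separate change and not part of the FieldSelector technique.

```vbnet
Imports System.IO
Imports System.Text
Imports Lucene.Net.Documents
Imports Lucene.Net.Search

Module SearchHelpers
    ' Hypothetical helper; the searcher and query are supplied by the caller.
    Function GetDocIdList(indexSearch As IndexSearcher, searchQuery As Query, maxHits As Integer) As String
        ' Ask Lucene to load only the stored "id" field of each document.
        Dim idOnly As FieldSelector = New MapFieldSelector(New String() {"id"})
        Dim resultDocs As TopDocs = indexSearch.Search(searchQuery, maxHits)

        ' StringBuilder avoids re-copying the growing string on every hit.
        Dim idList As New StringBuilder()
        For Each sDoc As ScoreDoc In resultDocs.ScoreDocs
            Dim doc As Document = indexSearch.Doc(sDoc.Doc, idOnly)
            idList.Append(Path.GetFileName(doc.Get("id"))).Append(";"c)
        Next
        Return idList.ToString()
    End Function
End Module
```

If "id" is the only stored field in the index, the selector will not change much, and the StringBuilder is likely the larger gain for big result sets.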

Related

Adding a New Element/Field with an Increment Integer as Value

After using mongoimport to import a CSV file into my database, I want to add a new field to each document. The value of this new field is the document's index number plus 2.
Dim documents = DB.GetCollection(Of BsonDocument)(collectionName).Find(filterSelectedDocuments).ToListAsync.Result
For Each doc in documents
DB.GetCollection(Of BsonDocument)(collectionName).UpdateOneAsync(
Builders(Of BsonDocument).Filter.Eq(Of ObjectId)("_id", doc.GetValue("_id").AsObjectId),
Builders(Of BsonDocument).Update.Set(Of Integer)("increment.value", documents.IndexOf(doc) + 2)).Wait()
Next
If I have over a million records to import, is there a better way to achieve this, such as using UpdateManyAsync?
Just as a side note: since you've got Wait() and Result everywhere, the Async methods don't seem to make an awful lot of sense. Also, your logic appears flawed since there is no .Sort() anywhere, so you've got no guarantee about the order of your returned documents. Is it intended that every document just gets a kind of random but unique and increasing number assigned?
Anyway, to make this faster, you'd really want to patch your CSV file and write the increasing "increment.value" field straight into it before the import. This way, you've got your value directly in MongoDB and do not need to query and update the imported data again.
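A minimal sketch of that pre-import patching step might look like the following. It assumes the CSV has a header line, that no field contains an embedded line break, and that the input and output paths are different files. The names CsvPatcher and AddIncrementColumn are made up for this example. The column header "increment.value" relies on mongoimport turning dotted header names into nested fields, so verify that against your mongoimport version.

```vbnet
Imports System.IO
Imports System.Linq

Module CsvPatcher
    ' Appends a running counter column to every data row, starting at 2,
    ' which mirrors IndexOf(doc) + 2 for the first document (index 0).
    Sub AddIncrementColumn(inputPath As String, outputPath As String)
        ' index 0 is the header line; the first data row has index 1 and gets 2.
        Dim patched = File.ReadLines(inputPath).Select(
            Function(line, index) If(index = 0,
                                     line & ",increment.value",
                                     line & "," & (index + 1).ToString()))
        ' ReadLines is lazy, so the file is streamed rather than held in memory.
        File.WriteAllLines(outputPath, patched)
    End Sub
End Module
```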
If this is not an option you could optimize your code like this:
Only retrieve the _id of your documents. That's all you need, and it will significantly improve your .find() performance since a lot less data needs to be transferred/deserialized from MongoDB.
Iterate over the Enumerable of your result instead of using a fully populated list.
Use bulk writes to avoid connecting to MongoDB again and again for every document and use a chunked flushing approach and flush every 1000 documents or so.
Theoretically, you could go further using multithreading or yield semantics for nicer streaming. However, that's getting a little complicated and may not even be needed.
The following should get you going faster already:
' just some cached values
Dim filterDefinitionBuilder = Builders(Of BsonDocument).Filter
Dim updateDefinitionBuilder = Builders(Of BsonDocument).Update
Dim collection = DB.GetCollection(Of BsonDocument)(collectionName)
' load only _id field
Dim documentIds = collection.Find(filterSelectedDocuments).Project(Function(doc) doc.GetValue("_id")).ToEnumerable()
' bulk write buffer (pre-initialized to size 1000 to avoid memory traffic upon array expansion)
Dim updateModelsBuffer = New List(Of UpdateOneModel(Of BsonDocument))(1000)
' starting value for our update counter (Integer, to match Set(Of Integer) below)
Dim i As Integer = 2
For Each objectId In documentIds
    ' for every document we want one update command...
    ' ...that finds exactly one document identified by its _id field
    Dim filterDefinition = filterDefinitionBuilder.Eq(Of ObjectId)("_id", objectId)
    ' ...and updates the "increment.value" with our running counter
    Dim updateDefinition = updateDefinitionBuilder.Set(Of Integer)("increment.value", i)
    updateModelsBuffer.Add(New UpdateOneModel(Of BsonDocument)(filterDefinition, updateDefinition))
    ' every e.g. 1000 documents
    If updateModelsBuffer.Count = 1000 Then
        ' we flush the contents to the database
        collection.BulkWrite(updateModelsBuffer)
        ' and we empty our buffer list
        updateModelsBuffer.Clear()
    End If
    i = i + 1
Next
' flush leftover commands that have not been written yet in case we do not have a multiple of 1000 documents
' (BulkWrite throws on an empty request list, so skip the call when nothing is left)
If updateModelsBuffer.Count > 0 Then
    collection.BulkWrite(updateModelsBuffer)
End If

How to improve Mongo document retrieval performance

I am retrieving documents from a Mongo database and copying them to internal storage. I'm finding it takes more than a few seconds to retrieve and store a hundred of these documents. Is there anything I can do to improve the performance? Some of the collections have more than 1000 documents. Here's what I have (written in vb.net):
' get the documents from collection "reqitems" and put them in "collection"
Dim collection As IFindFluent(Of BsonDocument, BsonDocument) = _
reqitems.Find(Builders(Of BsonDocument).Filter.Empty)
ReDim model.ReqItems(TotalCollection.ToList.Count) ' storage for the processed documents
For Each item As BsonDocument In TotalCollection.ToList()
' note: given a string a=x, "GetRHS" returns x
Dim parentuid As String = GetRHS(item.GetElement("parentuid").ToString)
Dim nodename As String = GetRHS(item.GetElement("nodename").ToString)
' .... about a dozen of these elements
' .... process the elements and copy them to locations in model.ReqItems
Next
You can add indexes to your collection if you haven't done so. Please refer to: https://docs.mongodb.com/manual/indexes/
Also, I would suggest running the particular MongoDB query with execution stats, e.g. db.mycollection.find().explain("executionStats"), which will give you more detail about the performance of the query. https://docs.mongodb.com/manual/reference/explain-results/#executionstats
Adding indices didn't really help. What slows it down is accessing the elements in the document one at a time (GetRHS in the posted code). So, as a fix, I converted the document to a string and then parsed the string for keyword-value pairs. Hopefully what I found can help someone with the same problem.
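For comparison, here is a sketch of an alternative to string parsing. It is not the fix described above. It assumes the 2.x MongoDB .NET driver and that the fields hold string values. GetElement(...).ToString yields "name=value" text, which is why the question needs GetRHS, whereas the BsonDocument indexer returns the value itself. The helper name ReadItems is made up for this example.

```vbnet
Imports System.Collections.Generic
Imports MongoDB.Bson
Imports MongoDB.Driver

Module ReqItemReader
    ' Hypothetical helper: returns (parentuid, nodename) pairs for every document.
    Function ReadItems(reqitems As IMongoCollection(Of BsonDocument)) As List(Of KeyValuePair(Of String, String))
        ' Run the query once and keep the materialized list, rather than
        ' calling ToList() separately for the count and for the loop.
        Dim documents As List(Of BsonDocument) =
            reqitems.Find(Builders(Of BsonDocument).Filter.Empty).ToList()

        Dim result As New List(Of KeyValuePair(Of String, String))(documents.Count)
        For Each item As BsonDocument In documents
            ' The indexer returns the BsonValue directly, so no "name=value" parsing is needed.
            ' AsString throws if the stored value is not a string.
            Dim parentuid As String = item("parentuid").AsString
            Dim nodename As String = item("nodename").AsString
            result.Add(New KeyValuePair(Of String, String)(parentuid, nodename))
        Next
        Return result
    End Function
End Module
```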

Avoid updating textbox in real time in vb.net

I have some very simple code in a VB.NET program that loads all the file paths in a folder into a text box. The code works, but it adds the lines in real time, so it takes about 3 minutes to load 20k files while the interface displays them line by line.
This is my code:
Dim ImageryDB As String() = IO.Directory.GetFiles("c:\myimages\")
For Each image In ImageryDB
txtbAllimg.AppendText(image & vbCrLf)
Next
How can I force my program to load the files in chunks or update the interface every second?
Thanks in advance
Yes, you can do that. You'll need to load the file names into an off-screen data structure of some kind rather than loading them directly into the control. Then you can periodically update the control to display whatever is loaded so far. However, I think you'll find that the slowness comes only from updating the control. Once you remove that part, there will be no need to update the control periodically during the loading process since it will be nearly instantaneous.
You could just load all of the file names into a string and then only set the text box to that string after it's been fully loaded, like this:
Dim imagePaths As String = ""
For Each image As String In Directory.GetFiles("c:\myimages\")
imagePaths &= image & Environment.NewLine
Next
txtbAllimg.Text = imagePaths
However, that's not as efficient as using the StringBuilder:
Dim imagePaths As New StringBuilder()
For Each image As String In Directory.GetFiles("c:\myimages\")
imagePaths.AppendLine(image)
Next
txtbAllimg.Text = imagePaths.ToString()
However, since the GetFiles method is already returning the complete list of paths to you as a string array, it would be even more convenient (and likely even more efficient) to just use the String.Join method to combine all of the items in the array into a single string:
txtbAllimg.Text = String.Join(Environment.NewLine, Directory.GetFiles("c:\myimages\"))
I know that this is not an answer to your actual question, but AppendText is slow. Using a ListBox and Adding the items to it is approx. 3 times faster. The ListBox also has the benefit of being able to select an item easily (at least more easily than a TextBox)
For Each image In ImageryDB
Me.ListBox1.Items.Add(image)
Next
However, there is probably an even more useful and faster way to do this: using FileInfo.
Dim dir As New IO.DirectoryInfo("C:\myImages")
Dim fileInfoArray As IO.FileInfo() = dir.GetFiles()
Dim fileInfo As IO.FileInfo
For Each fileInfo In fileInfoArray
Me.ListBox2.Items.Add(fileInfo.Name)
Next

vb.net for-each on a mongo database

I am using a for/each iteration to get the last document in a mongo database. I pull this document every second and retrieve the 2nd element in the last document. However, my database is probably going to get quite large and I worry this is not a very effective method. Does anyone have any better ideas? I'm new to mongo and vb.net and just trying to learn and track my car at the same time.
Dim counter As Integer
For Each item As BsonDocument In drivingData.FindAll()
Next
counter += 1
If counter = drivingData.FindAll().Count - 1 Then
Dim dataString As String = item.GetElement(1).Value.ToString()...
As was noted in the comments, why not sort in reverse order and limit the result to one document?
Since you are looking for the last document, you don't even need another field such as a timestamp to sort on: with default ObjectId values, _id increases roughly in insertion order, so you can sort on it in descending order and the "last" document becomes the first result.
db.collection.find().sort({ _id: -1 }).limit(1)
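A rough VB.NET equivalent of that shell query, assuming the legacy 1.x driver that the question's FindAll() call suggests, could look like this. The helper name GetLastDocument is made up for this example.

```vbnet
Imports System.Linq
Imports MongoDB.Bson
Imports MongoDB.Driver
Imports MongoDB.Driver.Builders

Module LastDocument
    ' Hypothetical helper: returns the most recently inserted document,
    ' or Nothing when the collection is empty.
    Function GetLastDocument(drivingData As MongoCollection(Of BsonDocument)) As BsonDocument
        ' Sort by _id descending and let the server return a single document,
        ' so the client never iterates the whole collection.
        Return drivingData.FindAll().
            SetSortOrder(SortBy.Descending("_id")).
            SetLimit(1).
            FirstOrDefault()
    End Function
End Module
```

Because _id is always indexed, this should stay cheap as the collection grows, unlike a client-side loop over every document.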
Through trial and error I got the result. Thanks for your responses as they eventually got through my thick skull:
Dim drivingValues As BsonArray
Dim lastDoc As BsonDocument
lastDoc = drivingData.FindAll().SetSortOrder(SortBy.Descending("result")).Last
drivingValues = lastDoc.GetElement(1).Value

How can I read individual lines of a CSV file into a string array, to then be selectively displayed via combobox input?

I need your help, guys! :|
I've got myself a CSV file with the following contents:
1,The Compact,1.8GHz,1024MB,160GB,440
2,The Medium,2.4GHz,1024MB,180GB,500
3,The Workhorse,2.4GHz,2048MB,220GB,650
It's a list of computer systems, basically, that the user can purchase.
I need to read this file, line-by-line, into an array. Let's call this array csvline().
The first line of the text file would be stored in csvline(0). Line two would be stored in csvline(1). And so on. (I've started with zero because that's where VB starts its arrays.) A drop-down list would then enable the user to select 1, 2 or 3 (or however many lines/systems are stored in the file). Upon selecting a number - say, 1 - csvline(0) would be displayed inside a textbox (textbox1, let's say). If 2 was selected, csvline(1) would be displayed, and so on.
It's not the formatting I need help with, though; that's the easy part. I just need someone to help teach me how to read a CSV file line-by-line, putting each line into a string array - csvlines(count) - then increment count by one so that the next line is read into another slot.
So far, I've been able to paste the numbers of each system into a combobox:
Using csvfileparser As New Microsoft.VisualBasic.FileIO.TextFieldParser _
("F:\folder\programname\programname\bin\Debug\systems.csv")
Dim csvalue As String()
csvfileparser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
csvfileparser.Delimiters = New String() {","}
While Not csvfileparser.EndOfData
csvalue = csvfileparser.ReadFields()
combobox1.Items.Add(String.Format("{1}{0}", _
Environment.NewLine, _
csvalue(0)))
End While
End Using
But this only selects individual values. I need to figure out how selecting one of these numbers in the combobox can trigger textbox1 to be appended with just that line (I can handle the formatting, using the string.format stuff). If I try to do this using csvalue = csvtranslator.ReadLine , I get the following error message:
"Error 1 Value of type 'String' cannot be converted to '1-dimensional array of String'."
If I then put it as an array, ie: csvalue() = csvtranslator.ReadLine , I then get a different error message:
"Error 1 Number of indices is less than the number of dimensions of the indexed array."
What's the knack, guys? I've spent hours trying to figure this out.
Please go easy on me - and keep any responses ultra-simple for my newbie brain - I'm very new to all this programming malarkey and just starting out! :)
Structure systemstructure
Dim number As Byte
Dim name As String
Dim procspeed As String
Dim ram As String
Dim harddrive As String
Dim price As Integer
End Structure
Private Sub csvmanagement()
Dim systemspecs As New systemstructure
Using csvparser As New FileIO.TextFieldParser _
("F:\folder\programname\programname\bin\Debug\systems.csv")
Dim csvalue As String()
csvparser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
csvparser.Delimiters = New String() {","}
csvalue = csvparser.ReadFields()
systemspecs.number = csvalue(0)
systemspecs.name = csvalue(1)
systemspecs.procspeed = csvalue(2)
systemspecs.ram = csvalue(3)
systemspecs.harddrive = csvalue(4)
systemspecs.optical = csvalue(5)
systemspecs.graphics = csvalue(6)
systemspecs.audio = csvalue(7)
systemspecs.monitor = csvalue(8)
systemspecs.software = csvalue(9)
systemspecs.price = csvalue(10)
While Not csvparser.EndOfData
csvalue = csvparser.ReadFields()
systemlist.Items.Add(systemspecs)
End While
End Using
End Sub
Edit:
Thanks for your help guys, I've managed to solve the problem now.
It was merely a matter of calling loops at the right point in time.
I would recommend using FileHelpers to do the reading.
The binding shouldn't be an issue after that.
Here is the Quickstart for Delimited Records:
Dim engine As New FileHelperEngine(GetType(Customer))
' To Read Use:
Dim res As Customer() = DirectCast(engine.ReadFile("FileIn.txt"), Customer())
' To Write Use:
engine.WriteFile("FileOut.txt", res)
When you get the file read, put it into a normal class and just bind to the class or use the list of items you have to do custom stuff with the combobox. Basically, get it out of the file and into a real class asap, then things will be easier.
At least take a look at the library. After using it, we use a lot more simple flat files since it is so easy, and we haven't written a file access routine since (for that kinda stuff).
http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
I think your main problem is understanding how arrays work (hence the error message).
You can use split and join functions to convert strings into and out of arrays
Dim s() As String = Split("1,2,3", ",") gives an array of strings with 3 elements
Dim ss As String = Join(s, ",") gives you the string back
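Applied to the question, a minimal sketch of the line-per-array-slot idea could look like this. The module and function names are made up for this example, and the Split call is naive: it assumes no field contains a quoted comma, which TextFieldParser handles and Split does not.

```vbnet
Imports System.IO

Module CsvLines
    ' Reads every line of the file into a String array:
    ' element 0 holds the first line, element 1 the second, and so on.
    Function LoadLines(filePath As String) As String()
        Return File.ReadAllLines(filePath)
    End Function

    ' Splits one raw line into its fields (naive: no quoted commas).
    Function FieldsOf(line As String) As String()
        Return line.Split(","c)
    End Function
End Module
```

With csvline = LoadLines(...) filled once at startup, the combobox's SelectedIndexChanged handler can show the matching line with textbox1.Text = csvline(combobox1.SelectedIndex), provided the combobox items were added in file order.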
Firstly, it's actually really good that you are using the TextFieldParser for reading CSV files - most don't but you won't have to worry about extra commas and quoted text etc...
The ReadLine method only gives you the raw string, hence the "Error 1 Value of type 'String' cannot be converted to '1-dimensional array of String'."
What you may find easier with combo boxes etc is to use an object (e.g. 'systemspecs') rather than strings. Assign the CSV data to the objects and override the "ToString" method of the 'systemspecs' class to display in the combo box how you want with formatting etc. That way when you handle the SelectedIndexChanged event (or similar) you get the "SelectedItem" from the combo box (which can be Nothing so check) and cast it as the 'systemspecs' to use it. The advantage is that you are not restricted to display the exact data in the combo etc.
' in "systemspecs"...
Public Overrides Function ToString() As String
Return Name ' or whatever...
End Function ' ToString
e.g.
dim item as new systemspecs
item.ID = csvalue(1)
item.Name = csvalue(2)
' etc...
combobox1.Items.Add(item)
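Pulling those pieces together, a fuller sketch might look like the following. The class name SystemSpec, its properties, and LoadSystems are illustrative only, and the column positions are assumptions taken from the sample file in the question (number first, name second, price last).

```vbnet
Imports System.Collections.Generic
Imports Microsoft.VisualBasic.FileIO

Public Class SystemSpec
    Public Property Number As Integer
    Public Property Name As String
    Public Property Price As Integer

    ' The combo box calls ToString to decide what text to display for each item.
    Public Overrides Function ToString() As String
        Return Number.ToString() & " - " & Name
    End Function
End Class

Public Module SystemLoader
    ' Parses every row of the CSV file into a SystemSpec object.
    Public Function LoadSystems(filePath As String) As List(Of SystemSpec)
        Dim systems As New List(Of SystemSpec)()
        Using parser As New TextFieldParser(filePath)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            ' ReadFields is called inside the loop so each row becomes its own object.
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                systems.Add(New SystemSpec With {
                    .Number = Integer.Parse(fields(0)),
                    .Name = fields(1),
                    .Price = Integer.Parse(fields(fields.Length - 1))})
            End While
        End Using
        Return systems
    End Function
End Module
```

Each object can then be added with combobox1.Items.Add(spec). In the SelectedIndexChanged handler, TryCast(combobox1.SelectedItem, SystemSpec) returns the chosen object, or Nothing when no item is selected.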
Let me know if that makes sense!
PK :-)