How to improve Mongo document retrieval performance - vb.net

I am retrieving documents from a MongoDB database and copying them to internal storage. I'm finding it takes more than a few seconds to retrieve and store a hundred of these documents. Is there anything I can do to improve the performance? Some of the collections have more than 1000 documents. Here's what I have (written in VB.NET):
' get the documents from collection "reqitems" and put them in "TotalCollection"
Dim TotalCollection As IFindFluent(Of BsonDocument, BsonDocument) =
    reqitems.Find(Builders(Of BsonDocument).Filter.Empty)
Dim docs As List(Of BsonDocument) = TotalCollection.ToList() ' enumerate the cursor once
ReDim model.ReqItems(docs.Count) ' storage for the processed documents
For Each item As BsonDocument In docs
    ' note: given a string "a=x", GetRHS returns "x"
    Dim parentuid As String = GetRHS(item.GetElement("parentuid").ToString)
    Dim nodename As String = GetRHS(item.GetElement("nodename").ToString)
    ' .... about a dozen of these elements
    ' .... process the elements and copy them to locations in model.ReqItems
Next

You can add indexes to your collection if you haven't done so. Please refer to: https://docs.mongodb.com/manual/indexes/
Also, I would suggest running the particular MongoDB query with execution stats, e.g. db.mycollection.find().explain("executionStats"), which will give you more statistics about the performance of the query. https://docs.mongodb.com/manual/reference/explain-results/#executionstats
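If you prefer to create the index from the .NET driver rather than the shell, here is a minimal sketch (assuming a 2.x driver; the field name "parentuid" is only an illustration taken from the posted code):

Dim keys = Builders(Of BsonDocument).IndexKeys.Ascending("parentuid")
reqitems.Indexes.CreateOne(New CreateIndexModel(Of BsonDocument)(keys))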

Adding indices didn't really help. What slows it down is accessing the elements in the document one at a time (GetRHS in the posted code). So, as a fix, I converted each document to a string and then parsed the string for keyword-value pairs. Hopefully what I found can help someone with the same problem.
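A rough sketch of that workaround, assuming flat documents whose values are simple strings (the naive split below would break on nested documents or on values containing commas or colons):

Dim raw As String = item.ToJson() ' serialize the whole document in one call
' raw looks roughly like: { "parentuid" : "123", "nodename" : "abc", ... }
Dim pairs As New Dictionary(Of String, String)
For Each part As String In raw.Trim("{"c, "}"c, " "c).Split(","c)
    Dim kv() As String = part.Split(New Char() {":"c}, 2) ' split on the first colon only
    If kv.Length = 2 Then
        pairs(kv(0).Trim(" "c, """"c)) = kv(1).Trim(" "c, """"c)
    End If
Next
Dim parentuid As String = pairs("parentuid") ' one dictionary lookup per field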

Related

Match Words and Add Quantities vb.net

I am trying to program a way to read a text file and match all the values with their quantities. For example, if the text file is like this:
Bread-10 Flour-2 Orange-2 Bread-3
I want to create a list with the total quantity for each of the common words. I began my code, but I am having trouble understanding how to sum the values. I'm not asking anyone to write the code for me, but I am having trouble finding resources. I have the following code:
Dim query = From data In IO.File.ReadAllLines("C:\User\Desktop\doc.txt")
            Let name As String = data.Split("-")(0)
            Let quantity As Integer = CInt(data.Split("-")(1))
            Let sum As Integer = 0
For i As Integer = 0 To query.Count - 1
    For j As Integer = i To ' (incomplete - this is where I get stuck)
Next
Thanks
OK, let's break this down. I have not seen the Let keyword used for a long time (back in the GWBASIC days!).
But that's OK.
So, first up, we are going to assume your text file is like this:
Bread-10
Flour-2
Orange-2
Bread-3
As opposed to this:
Bread-10 Flour-2 Orange-2 Bread-3
Now, we could read one line and then process the information. Or we can read all lines of text and THEN process the data. If the file is not huge (say a few hundred lines), performance is not much of an issue, so let's just read in the whole file in one shot (your code also had this idea).
Your starting code is good. So let's keep it (well, OK, something very close).
A few things:
We don't need the Let for assignment. While older BASIC languages had this, and VB.NET still supports it, we don't need it. (But you will still see examples of that floating around in VB.NET, especially for what we call "class" module code, or "custom classes". Let's leave that for another day.)
Now the next part? We could start building up an array, look for the existing value, and then add to it. However, this would require a few extra arrays and a few extra loops.
However, in .NET land, we have a cool thing called a dictionary.
And that's just a fancy term for a collection VERY much like an array, but with some extra "fancy" features. The fancy feature is that it lets us put things into the list by a "key" name, and then pull the "value" back out by that key.
This saves us a good amount of extra looping code.
And it also means we don't need an array for the results.
This key system is ALSO very fast (behind the scenes it uses some cool concepts, namely hash coding).
So, our code to do this would look like this:
Note I could have saved a few lines here or there, but that would make the code harder to read.
Given that you look to have Fortran or older BASIC language experience, let's keep the code style somewhat similar. It is stunning that VB.NET still accepts even 40-year-old GWBASIC-style syntax here.
Do note that arrays in VB.NET do have some fancy "find" options, but the dictionary structure is even nicer. It also means we can often traverse the results without, say, a For i = 1 To (end of array) loop, pulling out values by index.
We can use For Each instead.
So this would work:
' (requires Imports System.IO at the top of the file for File.ReadAllLines)
Dim MyData() As String ' an array of strings - one line per element
MyData = File.ReadAllLines("c:\test5\doc.txt") ' read each line into the array
Dim colSums As New Dictionary(Of String, Integer) ' holds each name and its running total
Dim sKey As String
Dim sValue As Integer
For Each strLine As String In MyData
    sKey = Split(strLine, "-")(0)
    sValue = CInt(Split(strLine, "-")(1)) ' convert the quantity text to an Integer
    If colSums.ContainsKey(sKey) Then
        colSums(sKey) = colSums(sKey) + sValue ' existing key - add to the running total
    Else
        colSums.Add(sKey, sValue) ' first time we see this key
    End If
Next
' display results
Dim KeyPair As KeyValuePair(Of String, Integer)
For Each KeyPair In colSums
    Debug.Print(KeyPair.Key & " = " & KeyPair.Value)
Next
The above results in this output in the debug window:
Bread = 13
Flour = 2
Orange = 2
I was tempted to write this code using just pure arrays in VB.NET, as that would give you a good idea of the "older" types of coding and syntax we could use here, an approach that harks all the way back to those older PC BASIC systems.
While the dictionary feature is more advanced, it is worth the learning curve here, and it makes this problem a lot easier. I mean, if this were a much longer list? Then I would start to consider introducing some kind of database system.
However, without such a system, the dictionary feature is a welcome approach due to that "key" lookup ability: there is not much looping code, it is very fast, and better yet, we write less code.
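As an aside, the same totals can be produced with a LINQ query, closer in spirit to the original attempt (a sketch, assuming the same one-item-per-line file):

Dim totals = From line In IO.File.ReadAllLines("c:\test5\doc.txt")
             Let name = line.Split("-"c)(0)
             Let quantity = CInt(line.Split("-"c)(1))
             Group By name Into Sum(quantity)

For Each g In totals
    Debug.Print(g.name & " = " & g.Sum)
Next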

How to limit GetFiles to get all paths created after a certain date?

I'm a beginner in VB and I'm throwing together a quick tool to pull data out of Excel sheets into a SQL server.
The actual opening/manipulation of the Excel files I can do, but I would like to limit the files I'm dealing with based on the created date, and I'm struggling to google a solution.
So in order to get all of the paths, I'm simply using:
Dim fname As String
For Each file As String In Directory.GetFiles(pathtoscan)
    fname = Path.GetFileName(file)
    ' ... write to the SQL server table ...
Next
Which works fine to get everything (writing to the SQL server table as I go), but of course the above means getting every single path, whereas I'd like to "optimise" it by only getting paths created after a certain defined date.
Is this doable, or would it simply be a matter of filtering after grabbing all paths anyway, and thus mean no better "performance"?
Many thanks in advance.
You can use LINQ and File.GetCreationTime:
Dim relevantFiles = From f In Directory.EnumerateFiles(pathtoscan)
                    Where File.GetCreationTime(f) > yourDate

For Each file As String In relevantFiles
    ' ... '
Next
Also, use EnumerateFiles instead of GetFiles. The latter builds an array of all files before you can start filtering them; the former returns them one at a time as you enumerate.
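For completeness, a small sketch wiring that into the original loop (the folder and cutoff date are placeholders):

Imports System.IO

Dim pathtoscan As String = "C:\excel\incoming" ' placeholder folder
Dim yourDate As New DateTime(2018, 1, 1)       ' placeholder cutoff: files created after 1 Jan 2018

Dim relevantFiles = From f In Directory.EnumerateFiles(pathtoscan)
                    Where File.GetCreationTime(f) > yourDate

For Each file As String In relevantFiles
    Dim fname As String = Path.GetFileName(file)
    ' ... open the workbook and write its rows to the SQL server table ...
Next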

Adding a New Element/Field with an Increment Integer as Value

After using mongoimport to import a CSV file into my database, I want to add a new field or element to each document. The value for this new field is the document's index number plus 2.
Dim documents = DB.GetCollection(Of BsonDocument)(collectionName).Find(filterSelectedDocuments).ToListAsync().Result
For Each doc In documents
    DB.GetCollection(Of BsonDocument)(collectionName).UpdateOneAsync(
        Builders(Of BsonDocument).Filter.Eq(Of ObjectId)("_id", doc.GetValue("_id").AsObjectId),
        Builders(Of BsonDocument).Update.Set(Of Integer)("increment.value", documents.IndexOf(doc) + 2)).Wait()
Next
If I have over a million documents to import, is there a better way to achieve this, for example using UpdateManyAsync?
Just as a side note: since you've got Wait() and Result everywhere, the Async methods don't seem to make an awful lot of sense. Also, your logic appears flawed, since there is no .Sort() anywhere, so you've got no guarantee about the order of your returned documents. Is it intended that every document just gets a kind of random but unique and increasing number assigned?
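If a deterministic order matters, a sort can be added to the find. A sketch, assuming ordering by _id is acceptable:

Dim documents = DB.GetCollection(Of BsonDocument)(collectionName).
        Find(filterSelectedDocuments).
        Sort(Builders(Of BsonDocument).Sort.Ascending("_id")).
        ToListAsync().Result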
Anyway, to make this faster, you'd really want to patch your CSV file and write the increasing "increment.value" field straight into it before the import. That way the value is in MongoDB from the start, and you do not need to query and update the imported data again.
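A minimal sketch of that pre-patching step (the file names are placeholders; the first line is assumed to be a header row, and whether a dotted header such as "increment.value" is imported as a nested field depends on your mongoimport version):

Imports System.IO

Dim lines() As String = File.ReadAllLines("data.csv")
Using writer As New StreamWriter("data_patched.csv")
    writer.WriteLine(lines(0) & ",increment.value") ' extend the header row
    For i As Integer = 1 To lines.Length - 1
        ' data row i is document index (i - 1), so its value is (i - 1) + 2
        writer.WriteLine(lines(i) & "," & ((i - 1) + 2))
    Next
End Using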
If that is not an option, you could optimize your code like this:
Only retrieve the _id of your documents - that's all you need, and it will majorly improve your .find() performance, since a lot less data needs to be transferred/deserialized from MongoDB.
Iterate over the Enumerable of your result instead of using a fully populated list.
Use bulk writes to avoid connecting to MongoDB again and again for every document; use a chunked flushing approach and flush every 1000 documents or so.
Theoretically, you could go further using multithreading or yield semantics for nicer streaming. However, that's getting a little complicated and may not even be needed.
The following should get you going faster already:
' just some cached values
Dim filterDefinitionBuilder = Builders(Of BsonDocument).Filter
Dim updateDefinitionBuilder = Builders(Of BsonDocument).Update
Dim collection = DB.GetCollection(Of BsonDocument)(collectionName)
' load only the _id field
Dim documentIds = collection.Find(filterSelectedDocuments).Project(Function(doc) doc.GetValue("_id")).ToEnumerable()
' bulk write buffer (pre-sized to 1000 to avoid memory traffic upon list expansion)
Dim updateModelsBuffer As New List(Of UpdateOneModel(Of BsonDocument))(1000)
' starting value for our update counter
Dim i As Integer = 2
For Each docId In documentIds
    ' for every document we want one update command...
    ' ...that finds exactly one document identified by its _id field
    Dim filterDefinition = filterDefinitionBuilder.Eq(Of ObjectId)("_id", docId.AsObjectId)
    ' ...and sets "increment.value" to our running counter
    Dim updateDefinition = updateDefinitionBuilder.Set(Of Integer)("increment.value", i)
    updateModelsBuffer.Add(New UpdateOneModel(Of BsonDocument)(filterDefinition, updateDefinition))
    ' every 1000 documents...
    If updateModelsBuffer.Count = 1000 Then
        ' ...we flush the buffered commands to the database
        collection.BulkWrite(updateModelsBuffer)
        ' and empty our buffer list
        updateModelsBuffer.Clear()
    End If
    i = i + 1
Next
' flush leftover commands in case the document count is not a multiple of 1000
If updateModelsBuffer.Count > 0 Then
    collection.BulkWrite(updateModelsBuffer)
End If
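As to UpdateManyAsync: a single UpdateMany applies the same update definition to every matched document, so it cannot assign a different counter value per document. The chunked UpdateOneModel bulk writes above are the closest equivalent when each document needs its own value.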

Lucene.net result as string / parsing too slow

I need some help to speed up the lucene.net search.
We need the result in a string with ";" as a separator.
The parsing of the TopDocs takes too long:
Dim resultDocs As TopDocs = indexSearch.Search(query, indexReader.MaxDoc())
Dim hits As ScoreDoc() = resultDocs.ScoreDocs
Dim strGetDocIDList As String = ""
For Each sDoc As ScoreDoc In hits
    Dim documentFromSearcher As Document = indexSearch.Doc(sDoc.Doc)
    Dim contentValue As String = documentFromSearcher.Get("id")
    strGetDocIDList = strGetDocIDList + Path.GetFileName(contentValue) + ";"
Next
Return strGetDocIDList
How can we speed this up?
Regards
Ingo
There are a few ways to tune performance when loading STORED fields in Lucene.
First, by default Lucene loads every stored field of a Document when you load it. Only store what you need, and do not systematically store everything.
If you don't need all the stored fields for this particular query, try writing a FieldSelector to gain further control over field loading.
Finally, add the field whose stored data you load most often before the other stored fields in your Documents. Fields are loaded sequentially, and in some cases adding them first to Documents can speed things up a little.
FieldSelector API link
An article that may help you with implementing a FieldSelector
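For illustration, here is a sketch combining a FieldSelector with a StringBuilder (which also avoids the repeated string concatenation in the posted loop). It assumes a Lucene.Net 2.9/3.x-era API, where Searcher.Doc accepts a FieldSelector and MapFieldSelector loads only the named fields:

Imports System.IO
Imports System.Text
Imports Lucene.Net.Documents
Imports Lucene.Net.Search

Dim idOnly As New MapFieldSelector("id") ' load only the "id" stored field
Dim sb As New StringBuilder()
For Each sDoc As ScoreDoc In resultDocs.ScoreDocs
    ' only "id" is deserialized; all other stored fields are skipped
    Dim doc As Document = indexSearch.Doc(sDoc.Doc, idOnly)
    sb.Append(Path.GetFileName(doc.Get("id"))).Append(";"c)
Next
Return sb.ToString()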

Performance with Strings in a VB Application [closed]

In my VB application, I am copying a huge amount of data using VB strings.
This results in a performance issue.
What should I use in place of VB strings to improve performance?
Try to make sure you are always passing ByRef wherever possible (in VB6); in VB.NET this is not an issue. Also, consider preallocating your strings. Analysis and a better description of your programming task would help provide a better answer.
This two-part article is a good source of information for VB6. Some tips mentioned there are:
Use the $ versions of string functions (e.g. Replace$ instead of Replace).
Use LenB() to check whether a string is empty.
Use the vbNullString constant instead of "".
One way to solve this is to pre-allocate a byte/char array that is big enough to hold all your concatenated strings.
If you are using VB.NET, then there is the StringBuilder class.
Did you try using a StringBuilder from .NET? It is marked as "COM Visible", which means you should be able to use it from VB 6.
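In VB.NET, a minimal sketch of the StringBuilder approach (the loop and sizes are purely illustrative):

Imports System.Text

Dim sb As New StringBuilder(1024) ' pre-size the buffer if the final length is roughly known
For i As Integer = 1 To 100000
    sb.Append("line ").Append(i).AppendLine() ' appends without reallocating on every call
Next
Dim result As String = sb.ToString() ' one final string allocation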
With VB 6, maybe this can help you out. Taken from the page:
The cStringBuilder class takes a different approach. When you initialize the class, it pre-allocates a 1MB buffer of memory to store the string in, so the string can grow and shrink dynamically within that 1MB space and never needs to reallocate any memory. The only time it needs to reallocate is when the string grows bigger than the currently allocated buffer. When the string buffer grows, it is always in increments of 1MB.
How Do I Use It? Here is a run-down of all the functions and properties available in cStringBuilder:
Property StringData As String - Gets or sets the data stored in the StringBuilder.
Property Length As Long - Gets the length of the data stored in the StringBuilder.
Sub Clear - Resets the stored string to nothing, and resizes the in-memory buffer back to 1MB.
Sub Append(str As String) - Adds the string str onto the end of the data stored in the cStringBuilder.
Sub Insert(index As Long, str As String) - Inserts a string (str) into the StringBuilder at a specific index (index). Note that the index is zero-based, so to insert at the beginning of the string, specify 0 as the index.
Sub Overwrite(index As Long, str As String) - Inserts a string (str) into the StringBuilder at a specific index (index), overwriting the data at that point rather than moving it to the right as is the case with Insert. If the data you insert goes past the end of the stored data, the excess will be appended normally.
Sub Remove(index As Long, length As Long) - Removes a section (length characters) of the stored string, starting at index. If length = 0, the entire string starting at index is removed.
And the usage is also quite easy:
' Extremely short example:
Dim sb As New cStringBuilder
sb.StringData = "Hello, "
sb.Append "Wxld!"      ' buffer now holds "Hello, Wxld!"
sb.Insert 8, "o"       ' -> "Hello, Woxld!"
sb.Overwrite 9, "r"    ' -> "Hello, World!"
MsgBox sb.StringData   ' displays "Hello, World!"