"Exceeded max configured index size while indexing document" while running LS agent indexing

In our project we have an LS agent that is supposed to delete and recreate the FT index. We haven't yet figured out why updating the index cannot be accomplished automatically (although in our database options this is set to update the index daily), so we decided to do roughly the same thing with our own very simple agent. The agent looks like this and runs nightly:
Option Public
Option Declare

Dim s As NotesSession
Dim ndb As NotesDatabase

Sub Initialize
    Set s = New NotesSession
    Set ndb = s.CurrentDatabase
    Print("BEFORE REMOVING INDEXES")
    Call ndb.RemoveFTIndex()
    Print("INDEXES HAVE BEEN REMOVED SUCCESSFULLY")
    Call ndb.CreateFTIndex(FTINDEX_ALL_BREAKS, True)
    Print("INDEXES HAVE BEEN CREATED SUCCESSFULLY")
End Sub
In most cases it works very well, but sometimes, when somebody creates a document that exceeds 12 MB (we really don't know how that is possible), the agent fails to create the index (and the old index has already been deleted).
Error message is:
31.05.2018 03:01:25 Full Text Error (FTG): Exceeded max configured index size while indexing document NT000BD992 in database index *path to FT file*.ft
My question is how to avoid this problem. We've already raised the default 6 MB limit with the console command SET CONFIG FTG_INDEX_LIMIT=12582912. Can we raise it even further? And in general, how should we solve the problem? Thanks in advance.

Using FTG_INDEX_LIMIT is one option to avoid this error, yes. But it will impact server performance in two ways: FT index updates will take more time and use more memory.
There's no maximum for this limit (in theory), but! Since the update processes eat memory from the common heap, a very high value can lead to out-of-memory overheaping and a server crash.
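If you do raise it further, it's the same console command with a larger byte value; for example (24 MB here is just an illustration, not a recommendation):
SET CONFIG FTG_INDEX_LIMIT=25165824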
You can try excluding attachments from the index. I don't think anyone can put more than 1 MB of text in one document, but users can attach big text files, and that will produce the error you're describing.
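If you keep the agent, a minimal sketch of what that looks like; this assumes the standard LotusScript option flags, where attachments are only indexed if you pass the attachment flags explicitly:
' Sketch only: FTINDEX_ALL_BREAKS alone indexes document text but not attached files.
Call ndb.CreateFTIndex(FTINDEX_ALL_BREAKS, True)
' Attachments would only be included if you added the attachment flags, e.g.:
' Call ndb.CreateFTIndex(FTINDEX_ALL_BREAKS + FTINDEX_ATTACHED_FILES, True)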
P.S. And yes, I agree with Scott: why do you need such an agent anyway? The built-in FT indexing usually works fine.

Related

Spark - Failed to load collect frame - "RetryingBlockFetcher - Exception while beginning fetch"

We have a Scala Spark application that reads around 70K records from the DB into a data frame; each record has 2 fields.
After reading the data from the DB, we do some minor mapping and load the result as a broadcast variable for later use.
Now, in the local environment, there is an exception: a timeout from the RetryingBlockFetcher while running the following code:
dataframe.select("id", "mapping_id")
  .rdd.map(row => row.getString(0) -> row.getLong(1))
  .collectAsMap().toMap
The exception is:
2022-06-06 10:08:13.077 task-result-getter-2 ERROR org.apache.spark.network.shuffle.RetryingBlockFetcher Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /1.1.1.1:62788
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
In the local environment, I simply create the Spark session with "spark.master" set to local.
When I limit the maximum number of records to 20K, it works well.
Can you please help? Maybe I need to configure something in my local environment so that the original code works properly?
Update:
I tried changing a lot of Spark-related configuration in my local environment (memory, the number of executors, timeout-related settings, and more), but nothing helped; I just hit the timeout after a longer wait.
I realized that the data frame I'm reading from the DB has a single partition of 62K records. When I repartition it into 2 or more partitions, the process works correctly and I manage to map and collect as needed.
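For reference, a minimal sketch of the working version (dataframe is the same data frame as above; the partition count of 4 is an arbitrary illustration, not a tuned value):
val mapping: Map[String, Long] = dataframe
  .select("id", "mapping_id")
  .repartition(4) // splits the single 62K-record partition into smaller tasks
  .rdd
  .map(row => row.getString(0) -> row.getLong(1))
  .collectAsMap()
  .toMap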
Any idea why this solves the issue? Is there a Spark configuration that can solve this instead of repartitioning?
Thanks!

getgroup() is very slow

I am using the GetGroups() function to read all of the groups of a user in Active Directory.
I'm not sure if I'm doing something wrong, but it is very, very slow: each time it reaches this point, it takes several seconds. I also access the rest of Active Directory using the built-in AccountManagement functionality, and that executes instantly.
Here's the code:
For y As Integer = 0 To AccountCount - 1
    Dim UserGroupArray As PrincipalSearchResult(Of Principal) = UserResult(y).GetGroups()
    UserInfoGroup(y) = New String(UserGroupArray.Count - 1) {}
    For i As Integer = 0 To UserGroupArray.Count - 1
        UserInfoGroup(y)(i) = UserGroupArray(i).ToString()
    Next
Next
Later on:
AccountChecker_Listview.Groups.Add(New ListViewGroup(Items(y, 0), HorizontalAlignment.Left))
For i As Integer = 0 To UserInfoGroup(y).Count - 1
    AccountChecker_Listview.Items.Add(UserInfoGroup(y)(i)).Group = AccountChecker_Listview.Groups(y)
Next
Items(,) contains my normal Active Directory data for display; Items(y, 0) contains the username.
y iterates over the user accounts in AD. I also have some other code in this loop for the other information, but it's not the issue here.
Does anyone know how to make this go faster, or is there another solution?
I'd recommend trying to find out where the time is spent. One option is to use a profiler, either the one built into Visual Studio or a third-party profiler such as Redgate's ANTS Profiler or the YourKit .NET Profiler.
Another is to measure the time taken using the System.Diagnostics.Stopwatch class and use the results to guide your optimization efforts. For example, time the function that retrieves data from Active Directory, and separately time the function that populates the view, to narrow down where the bottleneck is.
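A minimal sketch of the Stopwatch approach (the two helper calls are hypothetical stand-ins for your GetGroups() loop and your ListView code):
Dim sw As New System.Diagnostics.Stopwatch()
sw.Start()
LoadGroupsFromActiveDirectory() ' hypothetical: the GetGroups() loop above
sw.Stop()
Console.WriteLine("AD lookup: " & sw.ElapsedMilliseconds & " ms")
sw.Restart()
PopulateListView()              ' hypothetical: the ListView code above
sw.Stop()
Console.WriteLine("ListView: " & sw.ElapsedMilliseconds & " ms")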
If the bottleneck is in the Active Directory lookup, you may want to consider running the operation asynchronously so that the window is not blocked and populates as new data is retrieved. If it's in the ListView, you may want to consider, for example, inserting the data in a batch operation.

Control the timeout for locking Exclusive SQLite3 database

I have a SQLite database that I want to lock for synchronization purposes. I don't want a process that runs asynchronously on a different box to process data that has been added from another box until it has finished with its updates. DataAccess is a class that connects to sPackageFileName and reuses the same connection as long as sPackageFileName is the same, or unless the .Close method is called; DataAccess.ExecCommand simply executes a command.
On Google I found this:
DataAccess.ExecCommand("PRAGMA locking_mode = EXCLUSIVE", sPackageFileName)
DataAccess.ExecCommand("BEGIN EXCLUSIVE", sPackageFileName)
DataAccess.ExecCommand("COMMIT", sPackageFileName)
This works as advertised: if I run this on box A and then on box B, I get a "database locked" exception. The problem is how long that takes. I found PRAGMA busy_timeout, but that pragma controls the timeout for access locks, not database locks; I am starting to think there is no PRAGMA for a database-lock timeout. Right now it takes about 3-4 minutes. One other note: sPackageFileName is not on either box; boxes A and B both connect to it over a shared drive.
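For reference, setting that pragma follows the same pattern as the pragmas above; the value is in milliseconds (the 30000 here is just an illustration):
DataAccess.ExecCommand("PRAGMA busy_timeout = 30000", sPackageFileName)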
I am also using the VB.NET wrapper for the SQLite DLL.
CL got me on the right trail: it was the timeout of the .NET command. Here is the code setting it up from my class:
Dim con As DbConnection = OpenDb(DatabaseName, StoreNumber, ShareExclusive, ExtType)
Dim cmd As DbCommand = con.CreateCommand()
If _QueryTimeOut > -1 Then cmd.CommandTimeout = _QueryTimeOut
Don't get hung up on the variables; the purpose of posting the code is to show the property I was talking about. The default _QueryTimeOut was set to 300 (seconds). I set cmd.CommandTimeout to 1 (second) and it returned as expected.
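In other words, the pattern boils down to this sketch (OpenDb and the variables are the helpers from the code above; the 1-second value is just an illustration):
Dim con As DbConnection = OpenDb(DatabaseName, StoreNumber, ShareExclusive, ExtType)
Dim cmd As DbCommand = con.CreateCommand()
cmd.CommandTimeout = 1 ' fail after about a second instead of the 300-second default
cmd.CommandText = "BEGIN EXCLUSIVE"
Try
    cmd.ExecuteNonQuery() ' throws promptly if the other box holds the lock
Catch ex As Exception
    ' the database is locked elsewhere; back off and retry as needed
End Try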
As CL finally got through to me, the timeout was happening somewhere else. Sometimes it takes a kick to get you out of the box. :-)

Sitecore Lucene indexing - file not found exception using advanced database crawler

I'm having a problem with Sitecore/Lucene on our Content Management environment; we have two Content Delivery environments where this isn't a problem. I'm using the Advanced Database Crawler to index a number of items of defined templates. The index points to the master database.
The index will remain 'stable' for a few hours or so, and then I start to see this error in the logs, and also whenever I try to open a Searcher.
ManagedPoolThread #17 16:18:47 ERROR Could not update index entry. Action: 'Saved', Item: '{9D5C2EAC-AAA0-43E1-9F8D-885B16451D1A}'
Exception: System.IO.FileNotFoundException
Message: Could not find file 'C:\website\www\data\indexes\__customSearch\_f7.cfs'.
Source: Lucene.Net
at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run()
at Sitecore.Search.Index.CreateReader()
at Sitecore.Search.Index.CreateSearcher(Boolean close)
at Sitecore.Search.IndexSearchContext.Initialize(ILuceneIndex index, Boolean close)
at Sitecore.Search.IndexDeleteContext..ctor(ILuceneIndex index)
at Sitecore.Search.Crawlers.DatabaseCrawler.DeleteItem(Item item)
at Sitecore.Search.Crawlers.DatabaseCrawler.UpdateItem(Item item)
at System.EventHandler.Invoke(Object sender, EventArgs e)
at Sitecore.Data.Managers.IndexingProvider.UpdateItem(HistoryEntry entry, Database database)
at Sitecore.Data.Managers.IndexingProvider.UpdateIndex(HistoryEntry entry, Database database)
From what I've read, this can be due to the index being updated while there is an open reader: when a merge operation happens, the reader still holds a reference to the deleted segment, or something to that effect (I'm not an expert on Lucene).
I have tried a few things with no success, including subclassing the Sitecore.Search.Index object and overriding CreateWriter(bool recreate) to change the merge scheduler/policy and tweak the merge factor. See below.
protected override IndexWriter CreateWriter(bool recreate)
{
    IndexWriter writer = base.CreateWriter(recreate);
    LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
    policy.SetMergeFactor(20);
    policy.SetMaxMergeMB(10);
    writer.SetMergePolicy(policy);
    writer.SetMergeScheduler(new SerialMergeScheduler());
    return writer;
}
When I read the index I call SearchManager.GetIndex(Index).CreateSearchContext().Searcher, and when I'm done getting the documents I need I call .Close(), which I thought would be sufficient.
I was thinking I could perhaps override CreateSearcher(bool close) as well, to ensure I open a new reader each time, which I will give a go after this. I don't really know enough about how Sitecore handles Lucene and its readers/writers.
I also tried playing around with the UpdateInterval value in the web.config to see if that would help; alas, it didn't.
I would greatly appreciate anyone who a) knows of any situations in which this can occur, and b) has any advice or solutions, as I'm starting to bang my head against a rather large wall :)
We're running Sitecore 6.5 rev111123 with Lucene 2.3.
Thanks,
James.
It seems like Lucene freaks out when you try to re-index something that is already in the process of being indexed. To verify that, try the following:
Set the UpdateInterval of your index to a really high value (e.g. 8 hours).
Then stop w3wp.exe and delete the index.
After deleting the index, rebuild it in Sitecore and wait for the rebuild to finish.
Test again and see if the error recurs.
If it no longer occurs, the UpdateInterval was set too low, which caused your index (probably still being constructed) to be overwritten by a new one (which won't be finished either), leaving your segments.gen file with the wrong index information.
This .gen file points your index reader to the segments that are part of your index, and it is recreated when the index is rebuilt.
That's why I suggest disabling the updates for a long period and rebuilding the index manually.
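For reference, assuming the standard Sitecore 6.x setting name, the interval lives in web.config and takes a TimeSpan value (08:00:00 being the eight hours suggested above):
<setting name="Indexing.UpdateInterval" value="08:00:00" />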

SELECT through OleDbCommand in VB.NET not picking up recent changes

I'm using the following code to work out the next unique order number in an Access database. serverDB is a System.Data.OleDb.OleDbConnection:
Dim command As New OleDb.OleDbCommand("", serverDB)
command.CommandText = "SELECT MAX(ORDERNO) FROM WORKORDR"
iOrder = command.ExecuteScalar()
NewOrderNo = (iOrder + 1)
If I subsequently create a WORKORDR (using a different DB connection), the code will not pick up the new "next order number".
e.g.
iFoo = NewOrderNo
CreateNewWorkOrderWithNumber(iFoo)
iFoo2 = NewOrderNo
will return the same value in both iFoo and iFoo2.
If I close and then reopen serverDB as part of the NewOrderNo function, it works: iFoo and iFoo2 are correct.
Is there any way to force a System.Data.OleDb.OleDbConnection to refresh the database in this situation without closing and reopening the connection?
E.g. is there anything equivalent to serverDB.Refresh or serverDB.FlushCache?
How I create the order:
I wondered if this could be caused by not updating my transactions after creating the order. I'm using an XSD for the order creation, and the code I use to create the record is:
Sub CreateNewWorkOrderWithNumber(ByVal iNewOrder As Integer)
    Dim OrderDS As New CNC
    Dim OrderAdapter As New CNCTableAdapters.WORKORDRTableAdapter
    Dim NewWorkOrder As CNC.WORKORDRRow = OrderDS.WORKORDR.NewWORKORDRRow
    NewWorkOrder.ORDERNO = iNewOrder
    NewWorkOrder.name = "lots of fields filled in here."
    OrderDS.WORKORDR.AddWORKORDRRow(NewWorkOrder)
    OrderAdapter.Update(NewWorkOrder)
    OrderDS.AcceptChanges()
End Sub
From MSDN:
Microsoft Jet has a read-cache that is updated every PageTimeout milliseconds (default is 5000 ms = 5 seconds). It also has a lazy-write mechanism that operates on a separate thread from main processing and thus writes changes to disk asynchronously. These two mechanisms help boost performance, but in certain situations that require high concurrency, they may create problems.
If you possibly can, just use one connection.
Back in VB6 you could force the connection to refresh itself using ADO. I don't know whether that's possible with VB.NET; my Google-fu seems to be weak today.
You can change the PageTimeout value in the registry, but that will affect all programs on the computer that use the Jet engine (i.e. programmatic use of Access databases).
I always throw away a connection object after I've used it. Thanks to connection pooling, getting a new connection is cheap.
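A minimal sketch of that pattern (connectionString is an assumed variable; Using disposes the connection each time, and pooling makes the reopen cheap):
Function NewOrderNo() As Integer
    Using con As New OleDb.OleDbConnection(connectionString)
        con.Open()
        Using command As New OleDb.OleDbCommand("SELECT MAX(ORDERNO) FROM WORKORDR", con)
            Return CInt(command.ExecuteScalar()) + 1
        End Using
    End Using
End Function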