Lucene indexes are getting deleted while adding a document

I am using Lucene.Net.
On some clients' machines, all the indexed documents are deleted when we try to add a new document.
Process:
Check whether the document already exists.
If it exists, delete it.
Otherwise, add a new document.
Say I had 2,000 indexed documents. When I add a new one, all the existing documents are deleted and only the one just added is kept. A few times it has deleted that one as well.
It only happens on a few customers' machines, and we have not been able to reproduce it in our environment.
var dir = new DirectoryInfo(_reportIndexDirectory);
_directory = FSDirectory.Open(dir);
_ireader = IndexReader.Open(_directory, true);
_iwriter = new IndexWriter(_directory, _analyzer, !dir.Exists || dir.GetFiles().Length == 0, IndexWriter.MaxFieldLength.LIMITED);
_iwriter.SetMaxFieldLength(25000);
_iwriter.SetSimilarity(_similarityOne);
_iwriter.SetRAMBufferSizeMB(900);
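For reference, the third argument to the IndexWriter constructor above is its `create` flag: when it is true, Lucene creates a new index and overwrites any existing one in that directory. The Java sketch below only mirrors the C# condition (a transliteration for illustration, not Lucene.Net code) so its behaviour is easy to check. `CreateFlagSketch` and `shouldCreate` are hypothetical names.

```java
import java.io.File;

public class CreateFlagSketch {
    // Mirrors the C# create-flag expression from the question:
    //   (!dir.Exists || dir.GetFiles().Length == 0) ? true : false
    // The "? true : false" is redundant; the boolean expression is already the value.
    static boolean shouldCreate(File dir) {
        if (!dir.exists()) {
            return true; // missing directory: IndexWriter would be opened with create == true
        }
        // DirectoryInfo.GetFiles() lists files only, so subdirectories are ignored here too.
        File[] files = dir.listFiles(File::isFile);
        // true means IndexWriter would be opened with create == true
        return files == null || files.length == 0;
    }
}
```

The flag depends only on whether at least one file sits directly in the directory; it does not check whether that file belongs to a valid Lucene index.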


Core Data + CloudKit Migration: Cannot create or modify field [...] in record [...] in production schema

I use NSPersistentCloudKitContainer to sync Core Data with CloudKit. To prepare for a new migration, I created a new model version of the .xcdatamodel and marked it as "current". I created a new entity and added a relationship from another entity. Nothing spectacular, and suitable for a lightweight migration, I thought.
Let's name this new entity: EntityNew
This is my code to initialize the NSPersistentCloudKitContainer:
lazy var persistentContainer: NSPersistentContainer = {
let container = NSPersistentCloudKitContainer(name: "MyContainerName")
container.loadPersistentStores(completionHandler: { _, error in
guard let error = error as NSError? else { return }
fatalError("###\(#function): Failed to load persistent stores:\(error)")
})
container.viewContext.automaticallyMergesChangesFromParent = true
return container
}()
shouldMigrateStoreAutomatically and shouldInferMappingModelAutomatically are set to true by default.
Everything worked fine locally. No errors occurred during the migration.
The problems started when I created a new instance of EntityNew:
let newItem = EntityNew(context: context)
newItem.someAttribute = "..." // hypothetical attribute name; the original line omitted it
saveContext()
newItem was created locally without any problems, but iCloud sync stopped working from that moment on. The following error appeared in the console:
"<CKRecordID: 0x283fb1460; recordName=2E2209A1-F9F6-4DF2-960D-2C31F764ED05, zoneID=com.apple.coredata.cloudkit.zone:__defaultOwner__>" = "<CKError 0x2830a5950: \"Batch Request Failed\" (22/2024); server message = \"Atomic failure\"; uuid = ADA626F4-160E-49FE-A0BD-2198E5FBD09A; container ID = \"iCloud.[MyContainerID]\">"
"<CKRecordID: 0x283fb1a00; recordName=3145C837-D80D-47E0-B944-DBC6576A9B0A, zoneID=com.apple.coredata.cloudkit.zone:__defaultOwner__>" = "<CKError 0x2830a4000: \"Invalid Arguments\" (12/2006); server message = \"Cannot create or modify field 'CD_[Fieldname in EntityNew]' in record 'CD_[OtherEntityName]' in production schema\"; uuid = ADA626F4-160E-49FE-A0BD-2198E5FBD09A; container ID = \"iCloud.[ContainerID]\">";
"Cannot create or modify field 'CD_[Fieldname in EntityNew]' in record 'CD_[OtherEntityName]' in production schema"
CloudKit tries to modify the field CD_[Fieldname in EntityNew] (which is correct) on the record CD_[OtherEntityName], which is not the entity I created above! So Core Data tries to modify the wrong entity. This does not happen for all fields (approx. 5 out of 10). I checked the local SQLite file on my iPhone, and the local tables seem correct. The problem occurs in both the Development and the Production iCloud container environments. If I start with an empty database (which already contains the new entity, so no migration is necessary), synchronization works.
What did I miss? Any ideas?
Thank you!

Setting MaxPageSize for EmbeddableDocumentStore

It seems that setting the MaxPageSize configuration inside the program does not work.
I know that it is bad practice, but I want to retrieve all distinct artists in a music database collection and show them in a GUI combo box for selection, and I don't want to loop through batches of 128 documents. The system runs standalone, so the performance impact is the same as a SELECT * on an SQL database.
I do this in my code:
_store = new EmbeddableDocumentStore
{
Configuration = new RavenConfiguration()
{
DataDirectory = @"Database\MusicDatabase\",
RunInMemory = false,
MaxPageSize = 300000
}
};
_store.Initialize();
But I still get only 128 documents back.
I also tried adding a key to my App.config file and adding a web.config file with Raven/MaxPageSize, but the setting seems to be ignored.
Any idea?
thanks,
Helmut
MaxPageSize controls the maximum limit, and raising it is generally a bad idea.
However, if you don't specify how many items you want, we default to 128.
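In other words, the client has to either ask for a page size explicitly or page through the results. A minimal sketch of the paging pattern in plain Java, assuming a hypothetical `fetchPage(skip, take)` function standing in for the query (in the RavenDB C# client that role is played by a LINQ query with `.Skip(skip).Take(take)`). `PagingSketch` and `fetchAll` are also hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class PagingSketch {
    // fetchPage is a hypothetical stand-in for a query that returns at most `take`
    // results after skipping the first `skip` of them.
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<T> all = new ArrayList<>();
        int skip = 0;
        while (true) {
            List<T> page = fetchPage.apply(skip, pageSize);
            all.addAll(page);
            // A page shorter than requested means there is nothing left to read.
            // Keep pageSize <= the server's maximum, or a capped page ends the loop early.
            if (page.size() < pageSize) {
                return all;
            }
            skip += pageSize;
        }
    }
}
```

To fill the artist combo, the collected names could then be passed through a `TreeSet` to get a sorted, distinct list.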

Alfresco: unable to backup alf_data

I am an Alfresco 3.3c user with an instance holding more than 4 million objects. I'm starting to have problems with backups: backing up the alf_data/contentstore folder takes too long, even in incremental mode, because all those files always have to be analyzed for changes.
I've noticed that alf_data/contentstore is organized internally by year. Can I assume that content in older years (e.g. 2012) no longer changes? If so, I could simply exclude those directories from the backup process (after a full backup, obviously).
Thanks, kind regards.
Yes, you can assume that no objects will be created (and items are never updated) in old directories within your content store, although items may be removed by the repository's cleanup jobs after being deleted from Alfresco's trash can.
This is the section from org.alfresco.repo.content.filestore.FileContentStore which generates a new content URL. You can easily see that it always uses the current date and time.
/**
* Creates a new content URL. This must be supported by all
* stores that are compatible with Alfresco.
*
* @return Returns a new and unique content URL
*/
public static String createNewFileStoreUrl()
{
Calendar calendar = new GregorianCalendar();
int year = calendar.get(Calendar.YEAR);
int month = calendar.get(Calendar.MONTH) + 1; // 0-based
int day = calendar.get(Calendar.DAY_OF_MONTH);
int hour = calendar.get(Calendar.HOUR_OF_DAY);
int minute = calendar.get(Calendar.MINUTE);
// create the URL
StringBuilder sb = new StringBuilder(20);
sb.append(FileContentStore.STORE_PROTOCOL)
.append(ContentStore.PROTOCOL_DELIMITER)
.append(year).append('/')
.append(month).append('/')
.append(day).append('/')
.append(hour).append('/')
.append(minute).append('/')
.append(GUID.generate()).append(".bin");
String newContentUrl = sb.toString();
// done
return newContentUrl;
}
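To illustrate the point about dates, here is a stdlib-only Java sketch that mirrors the URL layout produced by `createNewFileStoreUrl()` and reads the year back out. It assumes the URL prefix is `store://` (the values of `STORE_PROTOCOL` and `PROTOCOL_DELIMITER` are not shown above), and `ContentUrlSketch` is a hypothetical name. Because the path encodes the local creation time, a URL's top-level directory is the year the content was written.

```java
import java.time.LocalDateTime;
import java.util.UUID;

public class ContentUrlSketch {
    // Assumption: STORE_PROTOCOL + PROTOCOL_DELIMITER yields "store://".
    static final String PREFIX = "store://";

    // Mirrors createNewFileStoreUrl(): year/month/day/hour/minute of the creation time
    // (no zero padding, month 1-based), followed by a random GUID and ".bin".
    static String newUrl(LocalDateTime now) {
        return PREFIX + now.getYear() + '/' + now.getMonthValue() + '/'
                + now.getDayOfMonth() + '/' + now.getHour() + '/'
                + now.getMinute() + '/' + UUID.randomUUID() + ".bin";
    }

    // Reads back the year, i.e. the top-level contentstore directory the file lives in.
    static int yearOf(String url) {
        String path = url.substring(PREFIX.length());
        return Integer.parseInt(path.substring(0, path.indexOf('/')));
    }
}
```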
Actually, no, you can't, because if the file is modified/updated in Alfresco, the filesystem path doesn't change. Remember that you can take a hot backup of the contentstore directory (but not of the Lucene index folder), and it's not necessary to check every single file for consistency. Just launch a shell/batch script that copies without checking, or use a tool like xxcopy.
(I'm talking about node properties, not the node content.)

avoid indexing documents again Lucene

Each time I run my program in Eclipse, it indexes the documents again. However, I want to index them only once. Perhaps I could delete the index after each use, but I don't know how to go about doing that.
Set your IndexWriter to OpenMode.CREATE. It's probably set to OpenMode.CREATE_OR_APPEND now. Setting it to CREATE causes the existing index in the specified directory to be overwritten when you open the IndexWriter, making way for the new one.
Like:
IndexWriterConfig config = new IndexWriterConfig(version, analyzer);
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
//etc.....
IndexWriter writer = new IndexWriter(directory, config);
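If the goal is to index only once rather than rebuild on every run, another option is to keep CREATE_OR_APPEND and skip the indexing step when an index is already present. Lucene provides `DirectoryReader.indexExists(Directory)` for that check (Lucene 4 and later; `IndexReader.indexExists` in 3.x). The stdlib-only sketch below approximates it by looking for a `segments_N` commit file, which is an assumption about the on-disk format and does not validate the commit. `IndexOnceSketch` and its methods are hypothetical.

```java
import java.io.File;

public class IndexOnceSketch {
    // Approximates "an index has been committed here" by the presence of a
    // segments_N file; a real program would use Lucene's own indexExists check.
    static boolean indexLooksPresent(File indexDir) {
        String[] names = indexDir.list(
                (dir, name) -> name.startsWith("segments_") && new File(dir, name).isFile());
        return names != null && names.length > 0; // null when the directory is missing
    }

    // Build the index only when none exists yet; otherwise reuse it as-is.
    static boolean shouldBuildIndex(File indexDir) {
        return !indexLooksPresent(indexDir);
    }
}
```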

RavenDB, RavenHQ and Appharbor - document size error with very first document

I have a completely empty RavenHQ database that's linked to my Appharbor application. The database is currently using 1.1 MB of the 25 MB available on my Bronze account. It previously had records in it, but I deleted them using "delete collection" in the Management Studio.
The very first time I call session.Store(myobject), and BEFORE I call .SaveChanges(), I get the following error.
System.InvalidOperationException: Url: "/docs/Raven/Hilo/AccItems"
Raven.Database.Exceptions.OperationVetoedException: PUT vetoed by Raven.Bundles.Quotas.Triggers.DatabaseSizeQoutaForDocumetsPutTrigger because: Database size is 45,347 KB, which is over the allowed quota of 25,600 KB. No more documents are allowed in.
Now, the document is definitely not that big, so I don't know what this error can mean, especially as I don't think I've even hit the database at that point since I haven't closed the session by calling SaveChanges(). Any ideas? Here's the code itself.
XDocument doc = XDocument.Parse(rawXml);
var accItems = ExtractItemsFromFeed(doc);
using (IDocumentSession session = _store.OpenSession())
{
var dbItems = session.Query<AccItem>().ToList();
foreach (var item in accItems)
{
var existingRecord = dbItems.SingleOrDefault(x => x.Source == item.Source && x.SourceId == item.SourceId);
if (existingRecord == null)
{
session.Store(item);
_logger.Info("Saved new item {0}.", item.ShortName);
}
else
{
existingRecord.ShortName = item.ShortName;
_logger.Info("Updated item {0}.", item.ShortName);
}
session.SaveChanges();
}
}
Any other comments about the style of this code would be most welcome, as I was unsure of the best way to approach the "update existing item or create if it isn't there" scenario.
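A common shape for the update-or-create step is to load the existing records once, index them by key in a map, and then either update the match in place or queue the new item for `session.Store`, calling `SaveChanges()` once at the end. The Java sketch below shows only that bookkeeping; `UpsertSketch` and `Item` are hypothetical stand-ins for the question's code and `AccItem`, and keying on `sourceId` alone is an assumption. Note also that the unbounded `session.Query<AccItem>().ToList()` above is subject to RavenDB's default page size of 128, so once more items are stored, some existing records would not be found and would be stored again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UpsertSketch {
    // Hypothetical stand-in for AccItem: only the fields the question's code touches.
    static final class Item {
        final String sourceId;
        String shortName;

        Item(String sourceId, String shortName) {
            this.sourceId = sourceId;
            this.shortName = shortName;
        }
    }

    // Index existing records by key once instead of scanning the list for every
    // incoming item; update matches in place and return the items that need storing.
    static List<Item> upsert(List<Item> existing, List<Item> incoming) {
        Map<String, Item> byKey = new HashMap<>();
        for (Item e : existing) {
            byKey.put(e.sourceId, e);
        }
        List<Item> toStore = new ArrayList<>();
        for (Item in : incoming) {
            Item match = byKey.get(in.sourceId);
            if (match == null) {
                toStore.add(in);
                byKey.put(in.sourceId, in); // avoid storing duplicates from the same feed
            } else {
                match.shortName = in.shortName;
            }
        }
        return toStore;
    }
}
```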
The answer here was as follows.
RavenHQ support found that the database was indeed oversized, but it seemed that the size reported in the Appharbor-branded RavenHQ control panel was incorrect. I had filled up the database way over the limit with a previous faulty version of the code posted above, so the error message I received was actually correct.
Fixing this problem without paying to upgrade the database wasn't straightforward, as it's not possible to shrink the database. Since I also wasn't able to delete my single Appharbor/RavenHQ database or create another one, I was left with the choice of creating an entirely new Appharbor application or registering directly with RavenHQ for a new account. I chose the latter. The RavenHQ-branded control panel differs slightly from the Appharbor one in that it can create and delete databases.
So to summarize: there doesn't seem to be any benefit to using RavenHQ as an add-on to Appharbor - you might as well go and get a proper free RavenHQ account.