ChronicleMap cannot store the configured maximum number of entries after removing a few entries? - chronicle-map

Chronicle Map versions used: 3.22ea5 / 3.21.86
I am trying to use ChronicleMap as an LRU cache.
I have two ChronicleMaps configured alike, both with allowSegmentTiering set to false. Consider one the main map and the other the backup.
When the main map fills up, a few entries are removed from it, and in parallel the backup map takes the new entries. Once the entries have been removed from the main map, the entries from the backup map are moved back into the main map.
Sample code is shown below.
ChronicleMap<ByteBuffer, ByteBuffer> main = ChronicleMapBuilder.of(ByteBuffer.class, ByteBuffer.class).name("main")
.entries(61500)
.averageKey(ByteBuffer.wrap(new byte[500]))
.averageValue(ByteBuffer.wrap(new byte[5120]))
.allowSegmentTiering(false)
.create();
ChronicleMap<ByteBuffer, ByteBuffer> backup = ChronicleMapBuilder.of(ByteBuffer.class, ByteBuffer.class).name("backup")
.entries(100)
.averageKey(ByteBuffer.wrap(new byte[500]))
.averageValue(ByteBuffer.wrap(new byte[5120]))
.allowSegmentTiering(false)
.create();
System.out.println("Main Heap Size -> "+main.offHeapMemoryUsed());
SecureRandom random = new SecureRandom();
while (true)
{
System.out.println();
AtomicInteger entriesAdded = new AtomicInteger(0);
try
{
int mainEntries = main.size();
while /*(true) Loop until error is thrown */(mainEntries < 61500)
{
try
{
byte[] keyN = new byte[500];
byte[] valueN = new byte[5120];
random.nextBytes(keyN);
random.nextBytes(valueN);
main.put(ByteBuffer.wrap(keyN), ByteBuffer.wrap(valueN));
mainEntries++;
}
catch (Throwable t)
{
System.out.println("Max Entries is not yet reached!!!");
break;
}
}
System.out.println("Main Entries -> "+main.size());
for (int i = 0; i < 10; i++)
{
byte[] keyN = new byte[500];
byte[] valueN = new byte[5120];
random.nextBytes(keyN);
random.nextBytes(valueN);
backup.put(ByteBuffer.wrap(keyN), ByteBuffer.wrap(valueN));
}
AtomicInteger removed = new AtomicInteger(0);
AtomicInteger i = new AtomicInteger(Math.max( (backup.size() * 5), ( (main.size() * 5) / 100 ) ));
main.forEachEntry(c -> {
if (i.get() > 0)
{
c.context().remove(c);
i.decrementAndGet();
removed.incrementAndGet();
}
});
System.out.println("Removed "+removed.get()+" Entries from Main");
backup.forEachEntry(b -> {
ByteBuffer key = b.key().get();
ByteBuffer value = b.value().get();
b.context().remove(b);
main.put(key, value);
entriesAdded.incrementAndGet();
});
if(backup.size() > 0)
{
System.out.println("It will never be logged");
backup.clear();
}
}
catch (Throwable t)
{
// System.out.println();
// t.printStackTrace(System.out);
System.out.println();
System.out.println("-------------------------Failed----------------------------");
System.out.println("Added "+entriesAdded.get()+" Entries in Main | Lost "+(backup.size() + 1)+" Entries in backup");
backup.clear();
break;
}
}
main.close();
backup.close();
The above code yields the following result.
Main Entries -> 61500
Removed 3075 Entries from Main
Main Entries -> 61500
Removed 3075 Entries from Main
Main Entries -> 61500
Removed 3075 Entries from Main
Max Entries is not yet reached!!!
Main Entries -> 59125
Removed 2956 Entries from Main
Max Entries is not yet reached!!!
Main Entries -> 56227
Removed 2811 Entries from Main
Max Entries is not yet reached!!!
Main Entries -> 53470
Removed 2673 Entries from Main
-------------------------Failed----------------------------
Added 7 Entries in Main | Lost 3 Entries in backup
In the result above, the maximum number of entries the main map could hold decreased in subsequent iterations, and refilling from the backup map also failed.
In Issue 128, it was said that entries are deleted properly.
So why does the sample code above fail? What am I doing wrong here? Is Chronicle Map not designed for such a usage pattern?
Even if I use only one map, the maximum number of entries it can hold shrinks after each round of removals.

Related

Apache Ignite: Scan Query without filter and Iterator over cache both return an empty result set, although the number of entries in the cache is non-zero

Apache Ignite Version: 2.8.0
When the first Ignite cache node starts, the code snippets below give the expected output. When a second Ignite node joins the cluster, the entries object in the first snippet is empty, while in the second snippet iter.hasNext() returns false.
The Ignite cluster is started with ShareNothing persistence.
Cache mode is set to CacheMode.REPLICATED.
Both nodes are in server mode.
Calling size() on both instances gives a non-zero value.
Calling localSize() gives a non-zero value on the first instance, but 0 on the second instance.
Map results = new HashMap();
QueryCursor<Cache.Entry> entries = binaryCache.query(new ScanQuery(null));
logger.log(Level.DEBUG, "CacheSize from entrySet: [%s]", binaryCache.size()); // Outputs non zero number.
try {
for (Cache.Entry e : entries) {
results.put(e.getKey(), e.getValue());
}
} catch (IOException e) {
throw new RuntimeException("Error deserializing ignite entry", e);
}
return results.entrySet();
OR
Map results = new HashMap();
Iterator<Cache.Entry<Object, BinaryObject>> iter = binaryCache.iterator();
logger.log(Level.DEBUG, "CacheSize from entrySet: [%s]", binaryCache.size()); // Outputs non zero number.
try {
while(iter.hasNext()) {
Cache.Entry e = iter.next();
results.put(e.getKey(), e.getValue());
}
} catch (IOException e) {
throw new RuntimeException("Error deserializing ignite entry", e);
}
return results.entrySet();
As null is passed to ScanQuery, ideally, all the entries from the cache should be fetched.
IgniteCache is iterable, as you noticed in the second example.
If you need to get the stored values, the simplest code would look like this:
IgniteCache<Object, BinaryObject> myCache = ignite.getOrCreateCache(ccfg).withKeepBinary();
for(Cache.Entry<Object, BinaryObject> entry: myCache){
System.out.println(entry.getKey() + " " + entry.getValue());
}
System.out.println("trying again");
for(Cache.Entry<Object, BinaryObject> entry: myCache){
System.out.println(entry.getKey() + " " + entry.getValue());
}
If the cache has a cache store, the nodes may get out of sync, since cache store loads are not propagated across the cluster. This is how a cache store works with replicated caches.

Hibernate Search manual indexing throws a "org.hibernate.TransientObjectException: The instance was not associated with this session"

I use Hibernate Search 5.11 in my Spring Boot 2 application to provide full-text search.
This library requires documents to be indexed.
While my app is running, I try to manually re-index the data of an indexed entity (MyEntity.class) every five minutes (for a specific reason related to my server context).
MyEntity.class has a property attachedFiles, a HashSet populated through a lazily loaded @OneToMany() join:
@OneToMany(mappedBy = "myEntity", cascade = CascadeType.ALL, orphanRemoval = true)
private Set<AttachedFile> attachedFiles = new HashSet<>();
I wrote the required indexing process, but an exception is thrown at "fullTextSession.index(result);" when the attachedFiles property of a given entity contains one or more items:
org.hibernate.TransientObjectException: The instance was not associated with this session
In debug mode, a message like "Unable to load [...]" is shown for the entity's HashSet value in this case.
If the HashSet is empty (not null, only empty), no exception is thrown.
My indexing method:
private void indexDocumentsByEntityIds(List<Long> ids) {
final int BATCH_SIZE = 128;
Session session = entityManager.unwrap(Session.class);
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<MyEntity> criteria = builder.createQuery(MyEntity.class);
Root<MyEntity> root = criteria.from(MyEntity.class);
criteria.select(root).where(root.get("id").in(ids));
TypedQuery<MyEntity> query = fullTextSession.createQuery(criteria);
List<MyEntity> results = query.getResultList();
int index = 0;
for (MyEntity result : results) {
index++;
try {
fullTextSession.index(result); //index each element
if (index % BATCH_SIZE == 0 || index == ids.size()) {
fullTextSession.flushToIndexes(); //apply changes to indexes
fullTextSession.clear(); //free memory since the queue is processed
}
} catch (TransientObjectException toEx) {
LOGGER.info(toEx.getMessage());
throw toEx;
}
}
}
Does anyone have an idea?
Thanks!
This is probably caused by the "clear" call you have in your loop.
In essence, what you're doing is:
1. load all entities to reindex into the session
2. index one batch of entities
3. remove all entities from the session (fullTextSession.clear())
4. try to index the next batch of entities, even though they are not in the session anymore... ?
What you need to do is to only load each batch of entities after the session clearing, so that you're sure they are still in the session when you index them.
There's an example of how to do this in the documentation, using a scroll and an appropriate batch size: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#search-batchindex-flushtoindexes
Alternatively, you can just split your ID list in smaller lists of 128 elements, and for each of these lists, run a query to get the corresponding entities, reindex all these 128 entities, then flush and clear.
Thanks for the explanations, @yrodiere, they helped me a lot!
I chose your alternative solution:
"Alternatively, you can just split your ID list in smaller lists of 128 elements, and for each of these lists, run a query to get the corresponding entities, reindex all these 128 entities, then flush and clear."
...and everything works perfectly!
Well spotted!
See the code solution below:
private List<List<Object>> splitList(List<Object> list, int subListSize) {
List<List<Object>> splittedList = new ArrayList<>();
if (!CollectionUtils.isEmpty(list)) {
int i = 0;
int nbItems = list.size();
while (i < nbItems) {
int maxLastSubListIndex = i + subListSize;
int lastSubListIndex = (maxLastSubListIndex > nbItems) ? nbItems : maxLastSubListIndex;
List<Object> subList = list.subList(i, lastSubListIndex);
splittedList.add(subList);
i = lastSubListIndex;
}
}
return splittedList;
}
private void indexDocumentsByEntityIds(Class<Object> clazz, String entityIdPropertyName, List<Object> ids) {
Session session = entityManager.unwrap(Session.class);
List<List<Object>> splittedIdsLists = splitList(ids, 128);
for (List<Object> splittedIds : splittedIdsLists) {
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
Transaction transaction = fullTextSession.beginTransaction();
CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<Object> criteria = builder.createQuery(clazz);
Root<Object> root = criteria.from(clazz);
criteria.select(root).where(root.get(entityIdPropertyName).in(splittedIds));
TypedQuery<Object> query = fullTextSession.createQuery(criteria);
List<Object> results = query.getResultList();
int index = 0;
for (Object result : results) {
index++;
try {
fullTextSession.index(result); //index each element
if (index == splittedIds.size()) {
fullTextSession.flushToIndexes(); //apply changes to indexes
fullTextSession.clear(); //free memory since the queue is processed
}
} catch (TransientObjectException toEx) {
LOGGER.info(toEx.getMessage());
throw toEx;
}
}
transaction.commit();
}
}

Deleting many managed objects selected by fragment name

I want to delete many managed objects, selected by fragment type. There are more than 2000 of them. Unfortunately, I cannot delete them all with one function call; I have to call the function many times until everything is deleted. How can I delete a list of managed objects efficiently? Not defining a page size did not help...
This is my current function:
InventoryFilter filter = new InventoryFilter();
filter.byFragmentType("xy_fragment");
ManagedObjectCollection moc = inventoryApi.getManagedObjectsByFilter(filter);
int count = 0;
// max page size is 2000
for (ManagedObjectRepresentation mo : moc.get(2000).allPages()) {
if (mo.get("c8y_IsBinary") != null) {
binariesApi.deleteFile(mo.getId());
} else {
inventoryApi.delete(mo.getId());
}
LOG.debug(count + " remove: " + mo.getName() + ", " + mo.getType());
count++;
}
LOG.info("all objectes removed, count:" + count);
By calling moc.get(2000).allPages() you already obtain an iterator that queries following pages on demand as you iterate over it.
The problem you are facing is caused by deleting elements from the same list you are iterating over. You delete the elements of the first page, but when the second page is queried from the server, it no longer contains the expected elements, because the first page has already been deleted and all remaining elements have shifted forward by your page size.
You can avoid all of that by making a local copy of all elements you want to delete first:
List<ManagedObjectRepresentation> allObjects = Lists.newArrayList(moc.get(2000).allPages());
for (ManagedObjectRepresentation mo : allObjects) {
//delete here
}
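The shifting effect described above can be reproduced without any server. The following is a minimal sketch, assuming offset-based paging; the in-memory list and the names fetchPage, deleteWhilePaging and deleteFromSnapshot are hypothetical stand-ins and not part of the inventory API.

```java
import java.util.ArrayList;
import java.util.List;

class PagedDeleteSketch {
    // Hypothetical stand-in for a server that returns one page by offset.
    static List<Integer> fetchPage(List<Integer> server, int page, int pageSize) {
        int from = Math.min(page * pageSize, server.size());
        int to = Math.min(from + pageSize, server.size());
        return new ArrayList<>(server.subList(from, to));
    }

    // Builds a toy "server" holding the elements 0..n-1.
    static List<Integer> newServer(int n) {
        List<Integer> server = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            server.add(i);
        }
        return server;
    }

    // Deletes while paging: page N+1 is fetched only after page N was deleted,
    // so the remaining elements have already shifted forward.
    static int deleteWhilePaging(List<Integer> server, int pageSize) {
        int deleted = 0;
        for (int page = 0; ; page++) {
            List<Integer> items = fetchPage(server, page, pageSize);
            if (items.isEmpty()) {
                break;
            }
            for (Integer item : items) {
                server.remove(item); // remove(Object), not remove(int index)
                deleted++;
            }
        }
        return deleted;
    }

    // Collects every page into a local copy first, then deletes from the copy.
    static int deleteFromSnapshot(List<Integer> server, int pageSize) {
        List<Integer> snapshot = new ArrayList<>();
        for (int page = 0; ; page++) {
            List<Integer> items = fetchPage(server, page, pageSize);
            if (items.isEmpty()) {
                break;
            }
            snapshot.addAll(items);
        }
        int deleted = 0;
        for (Integer item : snapshot) {
            server.remove(item);
            deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<Integer> a = newServer(10);
        // prints "while paging: deleted 6, left 4"
        System.out.println("while paging: deleted " + deleteWhilePaging(a, 3) + ", left " + a.size());
        List<Integer> b = newServer(10);
        // prints "from snapshot: deleted 10, left 0"
        System.out.println("from snapshot: deleted " + deleteFromSnapshot(b, 3) + ", left " + b.size());
    }
}
```

With ten elements and a page size of three, deleting while paging skips elements 3 to 5 and 9, which matches the symptom of having to call the function repeatedly.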
There is no bulk delete allowed on the inventory API so your method of looping through the objects is the correct approach.
A bulk delete is already a dangerous tool on the other APIs but on the inventory API it would give you the potential to accidentally delete all your data with just one call (as all data associated with a managedObject is also deleted upon the deletion of the managedObject).
That is why it is not available.
I solved the problem by calling the method until no elements can be found any more. It is not nice but I have no other idea.
public synchronized void removeManagedObjects(String deviceTypeKey) {
int count = 0;
do {
count = deleteManagedObjectes(deviceTypeKey);
}while(count > 0);
}
private int deleteManagedObjectes(String deviceTypeKey) {
InventoryFilter filter = new InventoryFilter();
filter.byFragmentType("xy_fragment");
ManagedObjectCollection moc = inventoryApi.getManagedObjectsByFilter(filter);
int count = 0;
if(moc == null) {
LOG.info("ManagedObjectCollection are NULL");
return count;
}
for (ManagedObjectRepresentation mo : moc.get(2000).allPages()) {
if (mo.get("c8y_IsBinary") != null) {
binariesApi.deleteFile(mo.getId());
} else {
inventoryApi.delete(mo.getId());
}
LOG.debug(count + " remove: " + mo.getName() + ", " + mo.getType());
count++;
}
LOG.info("all objectes removed, count:" + count);
return count;
}

Insert 1000000 documents into RavenDB

I want to insert 1000000 documents into RavenDB.
class Program
{
private static string serverName;
private static string databaseName;
private static DocumentStore documentstore;
private static IDocumentSession _session;
static void Main(string[] args)
{
Console.WriteLine("Start...");
serverName = ConfigurationManager.AppSettings["ServerName"];
databaseName = ConfigurationManager.AppSettings["Database"];
documentstore = new DocumentStore { Url = serverName };
documentstore.Initialize();
Console.WriteLine("Initial Databse...");
_session = documentstore.OpenSession(databaseName);
for (int i = 0; i < 1000000; i++)
{
var person = new Person()
{
Fname = "Meysam" + i,
Lname = " Savameri" + i,
Bdate = DateTime.Now,
Salary = 6001 + i,
Address = "BITS provides one foreground and three background priority levels that" +
"you can use to prioritize transBfer jobs. Higher priority jobs preempt"+
"lower priority jobs. Jobs at the same priority level share transfer time,"+
"which prevents a large job from blocking small jobs in the transfer"+
"queue. Lower priority jobs do not receive transfer time until all the "+
"higher priority jobs are complete or in an error state. Background"+
"transfers are optimal because BITS uses idle network bandwidth to"+
"transfer the files. BITS increases or decreases the rate at which files "+
"are transferred based on the amount of idle network bandwidth that is"+
"available. If a network application begins to consume more bandwidth,"+
"BITS decreases its transfer rate to preserve the user's interactive"+
"experience. BITS supports multiple foreground jobs and one background"+
"transfer job at the same time.",
Email = "Meysam" + i + "@hotmail.com",
};
_session.Store(person);
Console.ForegroundColor = ConsoleColor.Green;
Console.WriteLine("Count:" + i);
Console.ForegroundColor = ConsoleColor.White;
}
Console.WriteLine("Commit...");
_session.SaveChanges();
documentstore.Dispose();
_session.Dispose();
Console.WriteLine("Complete...");
Console.ReadLine();
}
}
But the session doesn't save the changes; I get an error:
An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll
A document session is intended to handle a small number of requests. Instead, experiment with inserting in batches of 1024. After that, dispose the session and create a new one. The reason you get an OutOfMemoryException is because the document session caches all constituent objects to provide a unit of work, which is why you should dispose of the session after inserting a batch.
A neat way to do this is with a Batch LINQ extension, such as the one in MoreLINQ:
foreach (var batch in Enumerable.Range(1, 1000000)
.Select(i => new Person { /* set properties */ })
.Batch(1024))
{
using (var session = documentstore.OpenSession())
{
foreach (var person in batch)
{
session.Store(person);
}
session.SaveChanges();
}
}
The implementations of both Enumerable.Range and Batch are lazy and don't keep all the objects in memory.
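To illustrate why lazy batching keeps memory flat, here is a rough Java sketch of the same idea (the thread's own code is C#). The helper name batches and the string "documents" are hypothetical; the point is only that one batch exists in memory at a time, because the source iterator produces elements on demand.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.IntStream;

class LazyBatchSketch {
    // Hypothetical helper: wraps an iterator and yields lists of at most batchSize elements.
    static <T> Iterator<List<T>> batches(Iterator<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public List<T> next() {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                // Pull just enough elements from the source to fill one batch.
                List<T> batch = new ArrayList<>(batchSize);
                while (batch.size() < batchSize && source.hasNext()) {
                    batch.add(source.next());
                }
                return batch;
            }
        };
    }

    // Walks all batches and returns {number of batches, number of stored items}.
    static int[] countBatches(int total, int batchSize) {
        // The stream's iterator creates each element only when it is requested.
        Iterator<String> people = IntStream.range(0, total).mapToObj(i -> "Person" + i).iterator();
        Iterator<List<String>> it = batches(people, batchSize);
        int batchCount = 0;
        int stored = 0;
        while (it.hasNext()) {
            List<String> batch = it.next();
            // In the thread's scenario: open a session, store the batch, save, dispose.
            stored += batch.size();
            batchCount++;
        }
        return new int[] {batchCount, stored};
    }

    public static void main(String[] args) {
        int[] result = countBatches(10000, 1024);
        // prints "10 batches, 10000 items"
        System.out.println(result[0] + " batches, " + result[1] + " items");
    }
}
```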
RavenDB also has a bulk API that does a similar thing without the need for additional LINQ extensions:
using (var bulkInsert = store.BulkInsert())
{
for (int i = 0; i < 1000 * 1000; i++)
{
bulkInsert.Store(new User
{
Name = "Users #" + i
});
}
}
Note that you don't call .SaveChanges() yourself; the documents are flushed either when a batch size is reached (defined in the BulkInsert() if needed), or when the bulkInsert is disposed of.

RavenDB Paging Behaviour

I have the following test for Skip/Take:
[Test]
public void RavenPagingBehaviour()
{
const int count = 2048;
var eventEntities = PopulateEvents(count);
PopulateEventsToRaven(eventEntities);
using (var session = Store.OpenSession(_testDataBase))
{
var queryable =
session.Query<EventEntity>().Customize(x => x.WaitForNonStaleResultsAsOfLastWrite()).Skip(0).Take(1024);
var entities = queryable.ToArray();
foreach (var eventEntity in entities)
{
eventEntity.Key = "Modified";
}
session.SaveChanges();
queryable = session.Query<EventEntity>().Customize(x => x.WaitForNonStaleResultsAsOfLastWrite()).Skip(0).Take(1024);
entities = queryable.ToArray();
foreach (var eventEntity in entities)
{
Assert.AreEqual(eventEntity.Key, "Modified");
}
}
}
PopulateEventsToRaven simply adds 2048 very simple documents to the database.
The first Skip/Take combination gets the first 1024 documents, modifies them, and then commits the changes.
The next Skip/Take combination should again get the first 1024 documents, but this time it gets documents 1024 to 2048, and hence the test fails. Why is this? I would expect the first 1024 again.
Edit: I have verified that if I don't modify the documents, the behaviour is fine.
The problem is that you don't specify an order by, which means that RavenDB is free to choose which items to return; those aren't necessarily going to be the same items that it returned in the previous call.
Use an OrderBy and it will be consistent.
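The effect can be shown with a toy model in Java (the thread's code is C#). This is only a sketch under a loud assumption: the store below re-appends a document whenever it is updated, which is one possible reason for an unordered query to change its result order and is not a claim about how RavenDB works internally. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class OrderedPagingSketch {
    static final class Doc {
        final int id;
        String key;

        Doc(int id, String key) {
            this.id = id;
            this.key = key;
        }
    }

    // Builds a toy store holding documents with ids 0..n-1.
    static List<Doc> newStore(int n) {
        List<Doc> store = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            store.add(new Doc(i, "Original"));
        }
        return store;
    }

    // Assumption for this toy model: an update moves the document to the end.
    static void update(List<Doc> store, Doc doc, String newKey) {
        store.remove(doc);
        doc.key = newKey;
        store.add(doc);
    }

    // Skip/Take without an explicit order: whatever order the store happens to have.
    static List<Doc> pageUnordered(List<Doc> store, int skip, int take) {
        return store.stream().skip(skip).limit(take).collect(Collectors.toList());
    }

    // Skip/Take with an explicit order by id: stable across updates.
    static List<Doc> pageOrderedById(List<Doc> store, int skip, int take) {
        return store.stream()
                .sorted(Comparator.comparingInt((Doc d) -> d.id))
                .skip(skip)
                .limit(take)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> store = newStore(8);
        for (Doc d : pageUnordered(store, 0, 4)) {
            update(store, d, "Modified");
        }
        // prints "unordered first id: 4, key: Original"
        Doc u = pageUnordered(store, 0, 4).get(0);
        System.out.println("unordered first id: " + u.id + ", key: " + u.key);
        // prints "ordered first id: 0, key: Modified"
        Doc o = pageOrderedById(store, 0, 4).get(0);
        System.out.println("ordered first id: " + o.id + ", key: " + o.key);
    }
}
```

In this model the second unordered page request returns the untouched documents, just like the failing test, while the ordered request returns the same four documents both times.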