Ignite offheapUsedSize doesn't get reset after clearing the cache

I read Ignite's offheapUsedSize through ignite.dataRegionMetrics().getOffheapUsedSize(), but after I clear the cache this value doesn't get reset; it just keeps increasing over time. I have tried all of the following methods and it still doesn't work:
IgniteCache.clear()
IgniteCache.removeAll()
IgniteCache.clearStatistics()
IgniteCache.resetQueryMetrics()
IgniteCache.resetQueryDetailMetrics()
IgniteCache.destroy()
offheapUsedSize gets reset only after I restart the server.

This is working as designed:
http://apache-ignite-users.70518.x6.nabble.com/how-to-monitor-off-heap-size-used-td27733.html
This is how I get the real off-heap usage:
DataRegionMetricsSnapshot.getTotalUsedPages() * DataRegionMetricsSnapshot.getPageSize()
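For example, a small Scala sketch of that calculation (illustrative monitoring code, not from the original post; it assumes a started node and that the DataRegionMetrics objects returned by ignite.dataRegionMetrics() expose the same getTotalUsedPages() and getPageSize() methods as the snapshot class above):
import org.apache.ignite.Ignition
import scala.collection.JavaConverters._
val ignite = Ignition.ignite()
ignite.dataRegionMetrics().asScala.foreach { m =>
  // "real" off-heap usage in bytes = pages actually in use * page size
  val usedBytes = m.getTotalUsedPages * m.getPageSize
  println(s"${m.getName}: $usedBytes bytes used off-heap")
}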

Related

Spark - Failed to load collect frame - "RetryingBlockFetcher - Exception while beginning fetch"

We have a Scala Spark application that reads something like 70K records from the DB into a data frame; each record has 2 fields.
After reading the data from the DB, we do some minor mapping and load the result as a broadcast variable for later use.
Now, in the local environment, there is an exception (a timeout from the RetryingBlockFetcher) while running the following code:
dataframe.select("id", "mapping_id")
  .rdd.map(row => row.getString(0) -> row.getLong(1))
  .collectAsMap().toMap
The exception is:
2022-06-06 10:08:13.077 task-result-getter-2 ERROR org.apache.spark.network.shuffle.RetryingBlockFetcher Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /1.1.1.1:62788
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
In the local environment, I simply create the Spark session with "spark.master" set to local.
When I limit the maximum number of records to 20K, it works well.
Can you please help? Maybe I need to configure something in my local environment so that the original code works properly?
Update:
I tried changing a lot of Spark-related configuration in my local environment (memory, the number of executors, timeout-related settings, and more), but nothing helped! I just got the timeout after more time...
I realized that the data frame I'm reading from the DB has a single partition of 62K records; after repartitioning it into 2 or more partitions, the process worked correctly and I managed to map and collect as needed.
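Roughly, a sketch of that change, reusing the snippet from above:
dataframe.select("id", "mapping_id")
  .repartition(2)  // split the single 62K-row partition before collecting
  .rdd.map(row => row.getString(0) -> row.getLong(1))
  .collectAsMap().toMap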
Any idea why this solves the issue? Is there a Spark configuration that could solve this instead of repartitioning?
Thanks!

Ignite TcpCommunicationSpi : Can slowClientQueueLimit be set to same value as messageQueueLimit as per docs?

I am not completely sure of the meaning or the interplay between slowClientQueueLimit and messageQueueLimit.
As per the documentation, they should ideally both be set to the same value: https://ignite.apache.org/releases/2.4.0/javadoc/org/apache/ignite/spi/communication/tcp/TcpCommunicationSpi.html#setSlowClientQueueLimit-int-
However, when I do set them to the same value, I see the following in the logs. Is this a minor bug in the check, or should I change my configuration?
[WARN ] 2018-06-27 22:32:18.429 [main] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Slow client queue limit is set to a value greater than message queue limit (slow client queue limit will have no effect) [msgQueueLimit=1024, slowClientQueueLimit=1024]
Thanks
Judging from the code, the warning is correct but the Javadoc is not. slowClientQueueLimit has to be less than msgQueueLimit, because when a message is being prepared for sending, the back-pressure limits are checked first and only then slowClientQueueLimit. If the two numbers are equal, the sender thread will be blocked by back pressure before it ever reaches the slow-client check, which means the client would never be dropped.
Set slowClientQueueLimit to msgQueueLimit - 1 or less, and I'll suggest that the community fix the docs.
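For example, a minimal configuration sketch along those lines (Scala calling the Java API; 1024 is just the value from the log above):
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi
val commSpi = new TcpCommunicationSpi()
commSpi.setMessageQueueLimit(1024)
// keep this strictly below the message queue limit so the slow-client
// check can fire before back pressure blocks the sender thread
commSpi.setSlowClientQueueLimit(1023)
val cfg = new IgniteConfiguration().setCommunicationSpi(commSpi)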

How to update Rails.cache TTL on read/fetch

I would like to keep my objects in the Rails cache so long as there is a read within some interval (say 10 minutes). I can successfully set a TTL on cache object creation with:
Rails.cache.fetch('key', expires_in: 10.minutes) do
  some_expensive_operation
end
But I notice that subsequent cache reads for 'key' do not up the TTL (at least not in my setup, which is Rails 3.2 + Redis).
Is there a way to have Rails.cache.{fetch,read} re-up the TTL for cache hits?
My alternative is to do the following, which seems potentially wasteful:
result = Rails.cache.read('key') || some_expensive_operation
Rails.cache.write('key', result, expires_in: 10.minutes)
Here's a redis-specific solution I came up with:
def fetch(key, ttl_seconds)
  cached_value = $redis.get(key)
  if cached_value
    # re-up the TTL
    $redis.expire(key, ttl_seconds)
    result = Marshal.load(cached_value)
  else
    result = yield
    $redis.setex(key, ttl_seconds, Marshal.dump(result))
  end
  result
end

fetch('key', 10.minutes) do
  some_expensive_operation
end
I'd prefer something that's cache-provider neutral, but that just doesn't seem to be part of the Rails caching API, unless I've missed something.
As I understand it, you want to update the TTL on fetch just so the cache keeps only frequently used records. I guess the reason is cache size.
Just don't do that! Let the cache engine decide which records to keep. If you're using Memcached, it does this job out of the box and you never need to bother deleting old records. If you're using Redis, you need to configure it as an LRU cache.
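For example (assuming Redis is used purely as a cache), the eviction behaviour is set in redis.conf; the memory limit below is just a placeholder:
maxmemory 256mb
maxmemory-policy allkeys-lru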

Compass/Lucene in clustered environment

I get the following error in a clustered environment where one node is indexing the objects and the other node is confused about the segments it has in its cache. The node never recovers by itself, even after a server restart. The indexing node might be merging segments and deleting old ones without the other node being aware of it. I did not touch the invalidateCacheInterval setting, and I added the compass.engine.globalCacheIntervalInvalidation property set to 500 ms. It didn't help.
This happens while one node is searching and the other is indexing.
Can someone help me resolve this issue? Maybe there is a way to ask Compass to reload the cache or start from scratch without having to reindex all the objects?
org.compass.core.engine.SearchEngineException: Failed to search with query [+type:...)]; nested exception is org.apache.lucene.store.jdbc.JdbcStoreException: No entry for [_6ge.tis] table index_objects
org.apache.lucene.store.jdbc.JdbcStoreException: No entry for [_6ge.tis] table index_objects
at org.apache.lucene.store.jdbc.index.FetchOnBufferReadJdbcIndexInput$1.execute(FetchOnBufferReadJdbcIndexInput.java:68)
at org.apache.lucene.store.jdbc.support.JdbcTemplate.executeSelect(JdbcTemplate.java:112)
at org.apache.lucene.store.jdbc.index.FetchOnBufferReadJdbcIndexInput.refill(FetchOnBufferReadJdbcIndexInput.java:58)
at org.apache.lucene.store.ConfigurableBufferedIndexInput.readByte(ConfigurableBufferedIndexInput.java:27)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:127)
at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:158)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:250)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:218)
at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:752)
at org.apache.lucene.index.MultiSegmentReader.docFreq(MultiSegmentReader.java:377)
at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:86)
at org.apache.lucene.search.Similarity.idf(Similarity.java:457)
at org.apache.lucene.search.TermQuery$TermWeight.<init>(TermQuery.java:44)
at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:185)
at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:360)
at org.apache.lucene.search.Query.weight(Query.java:95)
at org.apache.lucene.search.Hits.<init>(Hits.java:85)
at org.apache.lucene.search.Searcher.search(Searcher.java:61)
at org.compass.core.lucene.engine.transaction.support.AbstractTransactionProcessor.findByQuery(AbstractTransactionProcessor.java:146)
at org.compass.core.lucene.engine.transaction.support.AbstractSearchTransactionProcessor.performFind(AbstractSearchTransactionProcessor.java:59)
at org.compass.core.lucene.engine.transaction.search.SearchTransactionProcessor.find(SearchTransactionProcessor.java:50)
at org.compass.core.lucene.engine.LuceneSearchEngine.find(LuceneSearchEngine.java:352)
at org.compass.core.lucene.engine.LuceneSearchEngineQuery.hits(LuceneSearchEngineQuery.java:188)
at org.compass.core.impl.DefaultCompassQuery.hits(DefaultCompassQuery.java:199)

How to flush the io buffer in Erlang?

How do you flush the io buffer in Erlang?
For instance:
> io:format("hello"),
> io:format(user, "hello").
This post seems to indicate that there is no clean solution.
Is there a better solution than in that post?
Sadly, short of properly implementing a flush "command" in the io/kernel subsystems and making sure that the low-level drivers that implement the actual I/O support such a command, you really have to simply rely on the system quiescing before closing. A failing, I think.
Have a look at io.erl/io_lib.erl in stdlib and file_io_server.erl/prim_file.erl in kernel for the gory details.
As an example, in file_io_server (which effectively takes the request from io/io_lib and routes it to the correct driver), the command types are:
{put_chars,Chars}
{get_until,...}
{get_chars,...}
{get_line,...}
{setopts, ...}
(i.e. no flush)!
As an alternative you could of course always close your output (which would force a flush) after every write. A logging module I have does something like this every time and it doesn't appear to be that slow (it's a gen_server with the logging received via cast messages):
case file:open(LogFile, [append]) of
    {ok, IODevice} ->
        io:fwrite(IODevice, "~n~2..0B ~2..0B ~4..0B, ~2..0B:~2..0B:~2..0B: ~-8s : ~-20s : ~12w : ",
                  [Day, Month, Year, Hour, Minute, Second, Priority, Module, Pid]),
        io:fwrite(IODevice, Msg, Params),
        io:fwrite(IODevice, "~c", [13]),
        file:close(IODevice);
    {error, _Reason} ->
        %% opening the log file failed; nothing to write or close
        ok
end
io:put_chars(<<>>) at the end of the script works for me.
You could run
flush().
from the shell, or try:
flush() ->
    receive
        _ -> flush()
    after 0 ->
        ok
    end.
That works more or less like a C flush.