Is StreamTransformer concurrent safe? - ignite

Suppose that I have an Ignite cluster with several nodes and a partitioned, non-empty IgniteCache named "TEST_CACHE". Then I run the following code from one of the nodes:
ignite.compute().run(new IgniteRunnable(){
@IgniteInstanceResource
private Ignite ignite;
@Override
public void run() {
IgniteDataStreamer<String,Long> ds = ignite.dataStreamer("TEST_CACHE");
ds.receiver(new StreamTransformer<String,Long>(){
@Override
public Object process(MutableEntry<String, Long> entry, Object... arguments)
throws EntryProcessorException {
Long value = entry.getValue();
entry.setValue(value==null?1L:(value.longValue()+1L));
return null;
}
});
//loop for adding lots of String data
while(...)
ds.addData(...);
}
});
This is similar to the official StreamTransformerExample code. The difference is that each node gets its own DataStreamer instance for the same cache and invokes the addData method concurrently. In other words, for the same string on different nodes, one node may have just read the value with "Long value = entry.getValue()" but not yet executed the next line to set the value and update the cache when another node executes "entry.getValue()". So is it possible for a wrong value to be written in this concurrent StreamTransformer use case?

StreamReceiver.receive calls cache.invoke with your entry processor, so the entry is locked for the duration of this operation. So yes, it is safe for concurrent use.
BTW, did you enable allowOverwrite in your DataStreamer?
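To illustrate why an atomic per-entry update does not lose increments, here is a minimal sketch. It is not Ignite code: a plain ConcurrentHashMap stands in for the cache, and AtomicCounterSketch and countConcurrently are hypothetical names. ConcurrentHashMap.merge performs the read-modify-write for a key atomically, which is the same property the answer attributes to cache.invoke.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stand-in for the cache: NOT Ignite, just a local map with atomic per-key updates.
class AtomicCounterSketch {
    // Runs several concurrent writers that all increment the same key and returns the final count.
    static long countConcurrently(int writers, int incrementsPerWriter) {
        ConcurrentHashMap<String, Long> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int w = 0; w < writers; w++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerWriter; i++) {
                    // Same logic as the transformer: absent -> 1, otherwise value + 1,
                    // executed atomically for this key.
                    cache.merge("word", 1L, Long::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.getOrDefault("word", 0L);
    }
}
```

If the update were split into a separate get and put without any locking, some increments could be lost under the same load.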

Spring Cloud Stream deserialization error handling for Batch processing

I have a question about handling deserialization exceptions in Spring Cloud Stream while processing batches (i.e. batch-mode: true).
Per the documentation here, https://docs.spring.io/spring-kafka/docs/2.5.12.RELEASE/reference/html/#error-handling-deserializer, (looking at the implementation of FailedFooProvider), it looks like this function should return a subclass of the original message.
Is the intent here that a list of both Foos and BadFoos will end up at the original @StreamListener method, and then it will be up to the code (i.e. me) to sort them out and handle them separately? I suspect this is the case, as I've read that automated DLQ sending isn't desirable for batch error handling, since it would resubmit the whole batch.
And if this is the case, what if the app receives more than one message type via different @StreamListeners, say Foos and Bars? What type should the value function return in that case? Below is pseudocode to illustrate the second question.
@StreamListener
public void readFoos(List<Foo> foos) {
    List<BadFoo> badFoos = foos.stream()
        .filter(f -> f instanceof BadFoo)
        .map(f -> (BadFoo) f)
        .collect(Collectors.toList());
    // logic
}
@StreamListener
public void readBars(List<Bar> bars) {
    // logic
}
// Updated to return Object and let apply() determine subclass
public class FailedFooProvider implements Function<FailedDeserializationInfo, Object> {
    @Override
    public Object apply(FailedDeserializationInfo info) {
        if (info.getTopic().equals("foo-topic")) {
            return new BadFoo(info);
        }
        else if (info.getTopic().equals("bar-topic")) {
            return new BadBar(info);
        }
        return null; // unknown topic: the pseudocode does not define this case
    }
}
Yes, the list will contain the function result for failed deserializations; the application needs to handle them.
The function needs to return the same type that would have been returned by a successful deserialization.
You can't use conditions with batch listeners. If the list has a mixture of Foos and Bars, they all go to the same listener.
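As a rough illustration of "the application needs to handle them", the sketch below separates failed records from usable ones in a single pass over the batch. It uses no Spring classes: Foo, BadFoo and BatchSplitter are hypothetical stand-ins that only mirror the naming in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical payload type, mirroring the Foo naming used in the question.
class Foo {
    final String name;
    Foo(String name) { this.name = name; }
}

// Hypothetical subclass that the failure function would return for a record
// that could not be deserialized.
class BadFoo extends Foo {
    final String reason;
    BadFoo(String reason) { super(null); this.reason = reason; }
}

class BatchSplitter {
    final List<Foo> good = new ArrayList<>();
    final List<BadFoo> bad = new ArrayList<>();

    // Walks the batch once and separates failed records from usable ones.
    static BatchSplitter split(List<Foo> batch) {
        BatchSplitter result = new BatchSplitter();
        for (Foo f : batch) {
            if (f instanceof BadFoo) {
                result.bad.add((BadFoo) f);
            } else {
                result.good.add(f);
            }
        }
        return result;
    }
}
```

A batch listener could call BatchSplitter.split(foos) first, process result.good normally, and log or forward result.bad however the application sees fit.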

Is there a way to pass redis commands in jedis, without using the functions?

We are trying to build a console to process Redis queries, but on the back end we need to use Jedis, so commands given as input need to be processed using Jedis. For example, in redis-cli we use " keys * "; the Jedis equivalent is jedis.keys(" * "). I have no idea how to convert " keys * " into jedis.keys(" * "). Any suggestions?
I know this is an old question, but hopefully the following will be useful for others.
Here's something I came up with as the most recent version of Jedis (3.2.0 as of this time) did not support the "memory usage " command which is available on Redis >= 4. This code assumes a Jedis object has been created, probably from a Jedis resource pool:
import redis.clients.jedis.util.SafeEncoder;
// ... Jedis setup code ...
byteSize = (Long) jedis.sendCommand(new ProtocolCommand() {
@Override
public byte[] getRaw() {
return SafeEncoder.encode("memory");
}},
SafeEncoder.encode("usage"),
SafeEncoder.encode(key));
This is a special case command as it has a primary keyword memory with a secondary action usage (other ones are doctor, stats, purge, etc). When sending multi-keyword commands to Redis, the keywords must be treated as a list. My first attempt at specifying memory usage as a single argument failed with a Redis server error.
Subsequently, it seems the current Jedis implementation is geared toward single keyword commands, as underneath the hood there's a bunch of special code to deal with multi-keyword commands such as debug object that doesn't quite fit the original command keyword framework.
Anyway, once my current project that required the ability to call memory usage is complete, I'll try my hand at providing a patch to the Jedis maintainer to implement the above command in a more official/conventional way, which would look something like:
Long byteSize = jedis.memoryUsage(key);
Finally, to address your specific need, your best bet is to use the scan() method of the Jedis class. There are articles here on SO that explain how to use the scan() method.
Hmm... You can do the same thing with the following method.
redis.clients.jedis.Connection.sendCommand(Command, String...)
Create a class that extends Connection.
Create an instance of that class and call its connect() method.
Call super.sendCommand(Protocol.Command.valueOf(args[0].toUpperCase()), args[1~end]).
Here is an example:
public class JedisConn extends Connection {
public JedisConn(String host, int port) {
super(host, port);
}
@Override
protected Connection sendCommand(final Protocol.Command cmd, final String... args) {
return super.sendCommand(cmd, args);
}
public static void main(String[] args) {
JedisConn jedisConn = new JedisConn("host", 6379);
jedisConn.connect();
Connection connection = jedisConn.sendCommand(Protocol.Command.valueOf(args[0].toUpperCase()), Arrays.copyOfRange(args, 1, args.length));
System.out.println(connection.getAll());
jedisConn.close();
}
}
Haha~~
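For a console, the input line first has to be split into a command name and its arguments before it can be passed to something like sendCommand. Here is a minimal sketch of that step only; RedisCommandLine and parse are hypothetical names, nothing here is part of Jedis, and the naive whitespace split is an assumption that ignores quoted arguments.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: splits a console line into a command name and its arguments.
class RedisCommandLine {
    final String command;   // upper-cased, e.g. "KEYS"
    final String[] args;    // remaining tokens, e.g. ["*"]

    RedisCommandLine(String command, String[] args) {
        this.command = command;
        this.args = args;
    }

    static RedisCommandLine parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty command");
        }
        // Naive whitespace split: quoted arguments containing spaces are not handled.
        String[] tokens = trimmed.split("\\s+");
        return new RedisCommandLine(
                tokens[0].toUpperCase(Locale.ROOT),
                Arrays.copyOfRange(tokens, 1, tokens.length));
    }
}
```

The resulting command string can be looked up with Protocol.Command.valueOf(...) and the args array passed along, as in the JedisConn example.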
I have found a way for this. There is a function named eval(). We can use that for this as shown below.
// j is a Jedis instance
Scanner s = new Scanner(System.in);
String query = s.nextLine();
String[] q = query.split(" ");
String cmd = '\'' + q[0] + '\'';
for (int i = 1; i < q.length; i++)
    cmd += ",\'" + q[i] + '\'';
System.out.println(j.eval("return redis.call(" + cmd + ")"));

Concurrent threads in GemFire CacheWriter

We are currently using Cassandra as our NoSQL database and GemFire as our in-memory database. We have been using the GemFire CacheWriter to insert records into Cassandra. Is it good engineering practice to use concurrent threads in a CacheWriter to insert or update records? Your feedback on this would be appreciated.
public class GenericWriter<K, V> extends CacheWriterAdapter<K, V> implements Declarable {
private static Logger log = LoggerFactory.getLogger(GenericWriter.class);
@Autowired
private CassandraOperations cassandraOperations;
ExecutorService executor = null;
@Override
public void beforeCreate(EntryEvent<K, V> e) {
executor = Executors.newSingleThreadExecutor();
executor.submit(() -> {
if (eventOperation.equals("CREATE") || eventOperation.equalsIgnoreCase("PUTALL_CREATE")) {
try {
cassandraOperations.insert(e.getNewValue());
} catch (CassandraConnectionFailureException | CassandraWriteTimeoutException
| CassandraInternalException cassException) {
} catch (Exception ex) {
log.error("Exception in GenericCacheWriter->" + ExceptionUtils.getStackTrace(ex));
throw ex;
}
}
});
executor.shutdown();
}
@Override
public void init(Properties arg0) {
// TODO Auto-generated method stub
}
}
The CacheWriter handler is called synchronously, so the application does not continue until the handler returns. Therefore, it is not recommended to execute long-running operations inside this handler. If a long-running operation is needed, consider processing the operation asynchronously through an AsyncEventListener instead.
Using an ExecutorService to delegate the execution to a different thread is possible but it is an anti-pattern, as it no longer implements the fail-fast property, and the handling of the event is no longer synchronous, so its timing would not be guaranteed relative to the application's completion of the event.
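The loss of fail-fast behavior can be seen without GemFire at all. In the sketch below (FailFastSketch, writeToStore and both handler methods are hypothetical stand-ins, not Geode APIs), the synchronous handler surfaces the failure to its caller, while the handler that delegates to an executor returns normally and the exception stays inside the Future that nobody inspects.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FailFastSketch {
    // Stand-in for a failing backing-store write (hypothetical).
    static void writeToStore(String value) {
        throw new IllegalStateException("store unavailable: " + value);
    }

    // Synchronous handler: returns true if the failure reached the caller.
    static boolean syncHandlerThrows(String value) {
        try {
            writeToStore(value);
            return false;
        } catch (IllegalStateException e) {
            return true; // the caller sees the failure and can veto the cache update
        }
    }

    // Handler that delegates to an executor: returns true if the failure reached the caller.
    static boolean asyncHandlerThrows(String value) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            executor.submit(() -> writeToStore(value)); // returns immediately
            return false; // the caller sees success even though the write fails later
        } catch (RuntimeException e) {
            return true;
        } finally {
            executor.shutdown();
        }
    }
}
```

Here syncHandlerThrows reports the failure and asyncHandlerThrows does not, which is the behavior the answer describes as an anti-pattern for a CacheWriter.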
You can read more about this topic in the Geode Wiki, specifically in CacheWriter and CacheListener Best Practices.
Hope this helps.
Best regards.
Yes, it's a fine pattern but remove the Executor and partition your data such that all updates into GemFire go to one and only one node. Partition Cassandra the same way. Put a write lock around the Cassandra update. Use this only when your throughput is low.
If you need high throughput, use the AsyncEventListener and guarantee eventual consistency to your users. If you must use Executors in the AEL, use them in a way that throws an exception in the main thread. If the update fails after a number of tries, write the failed entry to a different region with an expiration of a few seconds or a minute. When that entry expires, retry the operation. Keep doing this until the update succeeds, and only then delete the expired entry.
If the order of updates is important to you, you will need to track version numbers and compare old and new values for whatever you are updating.
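The bounded-retry part of this advice could be sketched roughly as follows. This is only an illustration under stated assumptions: RetrySketch and writeWithRetry are hypothetical names, the parked queue merely stands in for the separate retry region, and the expiration-driven re-drive is left out.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

class RetrySketch {
    // Entries that still failed after maxAttempts; stands in for the "retry region" (hypothetical).
    final Queue<String> parked = new ArrayDeque<>();

    // Tries the write up to maxAttempts times; parks the entry if every attempt fails.
    // The write predicate returns true when the backing store accepted the entry.
    boolean writeWithRetry(String entry, Predicate<String> write, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (write.test(entry)) {
                return true;
            }
        }
        parked.add(entry);
        return false;
    }
}
```

A later pass would take entries from parked, call writeWithRetry again, and drop them only after a successful write.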

Mutex violations using ServiceStack Redis for distributed locking

I'm attempting to implement a distributed lock manager (DLM) using the locking mechanisms provided by the ServiceStack-Redis library and described here, but I'm finding that the API seems to present a race condition which will sometimes grant the same lock to multiple clients.
BasicRedisClientManager mgr = new BasicRedisClientManager(redisConnStr);
using(var client = mgr.GetClient())
{
client.Remove("touchcount");
client.Increment("touchcount", 0);
}
Random rng = new Random();
Action<object> simulatedDistributedClientCode = (clientId) => {
using(var redisClient = mgr.GetClient())
{
using(var mylock = redisClient.AcquireLock("mutex", TimeSpan.FromSeconds(2)))
{
long touches = redisClient.Get<long>("touchcount");
Debug.WriteLine("client{0}: I acquired the lock! (touched: {1}x)", clientId, touches);
if(touches > 0) {
Debug.WriteLine("client{0}: Oh, but I see you've already been here. I'll release it.", clientId);
return;
}
int arbitraryDurationOfExecutingCode = rng.Next(100, 2500);
Thread.Sleep(arbitraryDurationOfExecutingCode); // do some work of arbitrary duration
redisClient.Increment("touchcount", 1);
}
Debug.WriteLine("client{0}: Okay, I released my lock, your turn now.", clientId);
}
};
Action<Task> exceptionWriter = (t) => {if(t.IsFaulted) Debug.WriteLine(t.Exception.InnerExceptions.First());};
int arbitraryDelayBetweenClients = rng.Next(5, 500);
var clientWorker1 = new Task(simulatedDistributedClientCode, 1);
var clientWorker2 = new Task(simulatedDistributedClientCode, 2);
clientWorker1.Start();
Thread.Sleep(arbitraryDelayBetweenClients);
clientWorker2.Start();
Task.WaitAll(
clientWorker1.ContinueWith(exceptionWriter),
clientWorker2.ContinueWith(exceptionWriter)
);
using(var client = mgr.GetClient())
{
var finaltouch = client.Get<long>("touchcount");
Console.WriteLine("Touched a total of {0}x.", finaltouch);
}
mgr.Dispose();
When running the above code to simulate two clients attempting the same operation within short succession of one another, there are three possible outputs. The first one is the optimal case where the Mutex works properly and the clients proceed in the proper order. The second case is when the 2nd client times out waiting to acquire a lock; also an acceptable outcome. The problem, however, is that as arbitraryDurationOfExecutingCode approaches or exceeds the timeout for acquiring a lock, it is quite easy to reproduce a situation where the 2nd client is granted the lock BEFORE the 1st client releases it, producing output like this:
client1: I acquired the lock! (touched: 0x)
client2: I acquired the lock! (touched: 0x)
client1: Okay, I released my lock, your turn now.
client2: Okay, I released my lock, your turn now.
Touched a total of 2x.
My understanding of the API and its documentation is that the timeOut argument when acquiring a lock is meant to be just that: the timeout for getting the lock. If I have to guess at a timeOut value that is high enough to always be longer than the duration of my executing code just to prevent this condition, that seems pretty error prone. Does anyone have a workaround other than passing null to wait on locks forever? I definitely don't want to do that, or I know I'll end up with ghost locks from crashed workers.
The answer from mythz (thanks for the prompt response!) confirms that the built-in AcquireLock method in ServiceStack.Redis doesn't draw a distinction between the lock acquisition period versus the lock expiration period. For our purposes, we have existing code that expected the distributed locking mechanism to fail quickly if the lock was taken, but allow for long-running processes within the lock scope. To accommodate these requirements, I derived this variation on the ServiceStack RedisLock that allows a distinction between the two.
// based on ServiceStack.Redis.RedisLock
// https://github.com/ServiceStack/ServiceStack.Redis/blob/master/src/ServiceStack.Redis/RedisLock.cs
internal class RedisDlmLock : IDisposable
{
public static readonly TimeSpan DefaultLockAcquisitionTimeout = TimeSpan.FromSeconds(30);
public static readonly TimeSpan DefaultLockMaxAge = TimeSpan.FromHours(2);
public const string LockPrefix = ""; // namespace lock keys if desired
private readonly IRedisClient _client; // note that the held reference to client means lock scope should always be within client scope
private readonly string _lockKey;
private string _lockValue;
/// <summary>
/// Acquires a distributed lock on the specified key.
/// </summary>
/// <param name="redisClient">The client to use to acquire the lock.</param>
/// <param name="key">The key to acquire the lock on.</param>
/// <param name="acquisitionTimeOut">The amount of time to wait while trying to acquire the lock. Defaults to <see cref="DefaultLockAcquisitionTimeout"/>.</param>
/// <param name="lockMaxAge">After this amount of time expires, the lock will be invalidated and other clients will be allowed to establish a new lock on the same key. Defaults to <see cref="DefaultLockMaxAge"/>.</param>
public RedisDlmLock(IRedisClient redisClient, string key, TimeSpan? acquisitionTimeOut = null, TimeSpan? lockMaxAge = null)
{
_client = redisClient;
_lockKey = LockPrefix + key;
ExecExtensions.RetryUntilTrue(
() =>
{
//Modified from ServiceStack.Redis.RedisLock
//This pattern is taken from the redis command for SETNX http://redis.io/commands/setnx
//Calculate a unix time for when the lock should expire
lockMaxAge = lockMaxAge ?? DefaultLockMaxAge; // hold the lock for the default amount of time if not specified.
DateTime expireTime = DateTime.UtcNow.Add(lockMaxAge.Value);
_lockValue = (expireTime.ToUnixTimeMs() + 1).ToString(CultureInfo.InvariantCulture);
//Try to set the lock, if it does not exist this will succeed and the lock is obtained
var nx = redisClient.SetEntryIfNotExists(_lockKey, _lockValue);
if (nx)
return true;
//If we've gotten here then a key for the lock is present. This could be because the lock is
//correctly acquired or it could be because a client that had acquired the lock crashed (or didn't release it properly).
//Therefore we need to get the value of the lock to see when it should expire
string existingLockValue = redisClient.Get<string>(_lockKey);
long lockExpireTime;
if (!long.TryParse(existingLockValue, out lockExpireTime))
return false;
//If the expire time is greater than the current time then we can't let the lock go yet
if (lockExpireTime > DateTime.UtcNow.ToUnixTimeMs())
return false;
//If the expire time is less than the current time then it wasn't released properly and we can attempt to
//acquire the lock. This is done by setting the lock to our timeout string AND checking to make sure
//that what is returned is the old timeout string in order to account for a possible race condition.
return redisClient.GetAndSetEntry(_lockKey, _lockValue) == existingLockValue;
},
acquisitionTimeOut ?? DefaultLockAcquisitionTimeout // loop attempting to get the lock for this amount of time.
);
}
public override string ToString()
{
return String.Format("RedisDlmLock:{0}:{1}", _lockKey, _lockValue);
}
public void Dispose()
{
try
{
// only remove the entry if it still contains OUR value
_client.Watch(_lockKey);
var currentValue = _client.Get<string>(_lockKey);
if (currentValue != _lockValue)
{
_client.UnWatch();
return;
}
using (var tx = _client.CreateTransaction())
{
tx.QueueCommand(r => r.Remove(_lockKey));
tx.Commit();
}
}
catch (Exception ex)
{
// log but don't throw
}
}
}
To simplify use as much as possible, I also expose some extension methods for IRedisClient to parallel the AcquireLock method, along these lines:
internal static class RedisClientLockExtensions
{
public static IDisposable AcquireDlmLock(this IRedisClient client, string key, TimeSpan timeOut, TimeSpan maxAge)
{
return new RedisDlmLock(client, key, timeOut, maxAge);
}
}
Your question highlights the behavior of distributed locking in ServiceStack.Redis: if the specified timeout is exceeded, the timed-out client treats the lock as invalid and attempts to auto-recover it. Without this auto-recovery behavior, a crashed client would never release the lock, and no further operations waiting on that lock would be allowed through.
The locking behavior for AcquireLock is encapsulated in the RedisLock class:
public IDisposable AcquireLock(string key, TimeSpan timeOut)
{
return new RedisLock(this, key, timeOut);
}
Which you can take a copy of and modify to suit the behavior you'd prefer:
using (new MyRedisLock(client, key, timeout))
{
//...
}
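The difference between "how long to wait for the lock" and "how long a held lock stays valid" can be modeled without Redis. The sketch below is an in-memory approximation in Java, not ServiceStack code: a ConcurrentHashMap stands in for the server, putIfAbsent plays the role of SETNX, and a compare-and-replace stands in for the GETSET check. ExpiringLockSketch and its methods are hypothetical names, and the clock is passed in explicitly so the behavior is deterministic.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory model of the SETNX-with-expiry pattern; the map stands in for Redis.
class ExpiringLockSketch {
    private final ConcurrentHashMap<String, Long> store = new ConcurrentHashMap<>();

    // Single attempt: returns the expiry stamp we wrote on success, or -1 if the lock is held.
    // maxAgeMs is how long the lock stays valid; it is independent of how long a caller
    // keeps retrying this method (the acquisition timeout).
    long tryAcquire(String key, long nowMs, long maxAgeMs) {
        long myExpiry = nowMs + maxAgeMs + 1;
        Long existing = store.putIfAbsent(key, myExpiry); // SETNX equivalent
        if (existing == null) {
            return myExpiry;
        }
        if (existing > nowMs) {
            return -1; // still valid and owned by someone else
        }
        // Expired: take over only if nobody replaced the value in the meantime.
        return store.replace(key, existing, myExpiry) ? myExpiry : -1;
    }

    // Releases only if the stored value is still ours.
    boolean release(String key, long myExpiry) {
        return store.remove(key, myExpiry);
    }
}
```

A caller would loop on tryAcquire until its own acquisition timeout elapses, while maxAgeMs is chosen to outlast the longest expected critical section.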

Glassfish - JEE6 - Use of Interceptor to measure performance

For measuring execution time of methods, I've seen suggestions to use
import javax.interceptor.AroundInvoke;
import javax.interceptor.InvocationContext;
public class PerformanceInterceptor {
    @AroundInvoke
    Object measureTime(InvocationContext ctx) throws Exception {
        long beforeTime = System.currentTimeMillis();
        try {
            return ctx.proceed();
        }
        finally {
            long time = System.currentTimeMillis() - beforeTime;
            // Log time
        }
    }
}
Then put
@Interceptors(PerformanceInterceptor.class)
before whatever method you want measured.
Anyway I tried this and it seems to work fine.
I also added a
public static long countCalls = 0;
to the PerformanceInterceptor class and a
countCalls++;
to the measureTime() which also seems to work o.k.
With my newbie hat on, I will ask whether my use of countCalls is o.k., i.e. whether Glassfish/JEE6 is o.k. with me using static variables in a Java class that is used as an Interceptor, in particular with regard to thread safety. I know that normally you are supposed to synchronize setting of class variables in Java, but I don't know what the case is with JEE6/Glassfish. Any thoughts?
The container does not provide any additional thread safety in this case. Each bean instance has its own interceptor instance, so multiple threads can access the static countCalls at the same time.
That's why you have to guard both reads and writes to it as usual. Another possibility is to use AtomicLong:
private static final AtomicLong callCount = new AtomicLong();
private long getCallCount() {
return callCount.get();
}
private void increaseCountCall() {
callCount.getAndIncrement();
}
As expected, these solutions only work as long as all of the instances are in the same JVM; in a cluster, shared storage is needed.
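To make the AtomicLong suggestion concrete, here is a small self-contained sketch that simulates many request threads passing through the counting code at once. It does not use any JEE classes: CallCounterSketch, recordCall and simulate are hypothetical names, and recordCall stands in for what measureTime() would do once per intercepted call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class CallCounterSketch {
    private static final AtomicLong callCount = new AtomicLong();

    // What the interceptor would do once per intercepted call.
    static void recordCall() {
        callCount.incrementAndGet();
    }

    // Simulates several threads hitting the interceptor concurrently and
    // returns how many calls were counted during this run.
    static long simulate(int threads, int callsPerThread) {
        long before = callCount.get();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    recordCall();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return callCount.get() - before;
    }
}
```

With a plain static long and countCalls++ in place of the AtomicLong, the same simulation could report fewer calls than were actually made.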