Apache Ignite data distribution issue - ignite

I am trying to load data into an Apache Ignite cluster using a Java thin client, but the data is not getting loaded evenly. I see that only one machine's memory is increased.
I followed this, added a custom affinity mapper. Version is 2.14.0
https://ignite.apache.org/docs/2.14.0/thin-clients/java-thin-client#partition-awareness
Update1: Adding code
ClientConfiguration cfg = new ClientConfiguration()
.setAddresses("host1:10800","host2:10800",......)
.setTimeout(10*60*1000)
.setPartitionAwarenessEnabled(true)
.setPartitionAwarenessMapperFactory(new ClientPartitionAwarenessMapperFactory() {
#Override public ClientPartitionAwarenessMapper create(String cacheName, int partitions) {
final AffinityFunction aff = new MyAffinityFunction(partitions);
return new ClientPartitionAwarenessMapper() {
#Override
public int partition(Object key) {
return aff.partition(key);
}
};
}
});
ignite = Ignition.startClient(cfg)
kmvpCache = ignite.cache(s"cacheName")
val kmvpBatch = new java.util.HashMap[java.lang.Long, Array[Byte]](params.batchSize)
for(....){
for(....){
kmvpBatch.put(<Key>, <value>)
}
kmvpCache.putAll(kmvpBatch)
kmvpBatch.clear()
}

Related

Gemfire Pdx Serialization Put All

Is it normal that a client application took a longer time to insert data into GemFire Cluster for the first time? For example, my client application took around 4 seconds to insert GemFire Cluster successfully. However , the subsequent insert only took around 1 second. May i know what is the reason behind it?
#Configuration
#EnablePool(name = "sgpool", socketBufferSize = 1000000, prSingleHopEnabled = true, idleTimeout = 10000)
public class RegionConfiguration {
#Bean("People")
public ClientRegionFactoryBean<String, Person> customersRegion(GemFireCache gemfireCache) {
ClientRegionFactoryBean<String, Person> customersRegion = new ClientRegionFactoryBean<>();
customersRegion.setCache(gemfireCache);
customersRegion.setClose(false);
customersRegion.setPoolName("sgpool");
customersRegion.setShortcut(ClientRegionShortcut.CACHING_PROXY);
return customersRegion;
}
}
#ClientCacheApplication
#EnablePdx
#Service
#EnableGemfireRepositories(basePackageClasses = PersonRepository.class)
#Import(RegionConfiguration.class)
public class PersonDataAccess {
#Autowired
#Qualifier("People")
private Region<String, Person> peopleRegion;
#Autowired
private PersonRepository personRepository;
#PostConstruct
public void init() {
peopleRegion.registerInterestForAllKeys();
}
public void saveAll(Iterable<Person> iteratorList) {
personRepository.saveAll(iteratorList);
}
}
#Service
#EnableScheduling
#Log4j2
public class PersonService {
#Autowired
public PersonDataAccess personDataAccess;
private Person createPerson(String ic, int age) {
return new Person(ic, "Jack - " + age, "Kay - " + age, LocalDate.of(2000, 12, 10), age, 2,
new Address("Jack Wonderful Land 12", "Wonderful Land", "Wonderful 101221"), "11111", "22222", "33333",
"Dream", "Space");
}
#Scheduled(fixedDelay = 1000, initialDelay = 5000)
public void testSaveRecord() {
ArrayList<Person> personList = new ArrayList<>();
for (int counter = 0; counter < 30000; counter++) {
personList.add(createPerson("S2011" + counter, counter));
}
log.info("start saving person");
personDataAccess.saveAll(personList);
log.info("Complete saving all the message");
}
}
GemFire Configuration:
Using PDX Serialization
1 Locator and Cache Server (GemFire 9.10.5
Partition Region
Client Application (Using Spring Data GemFire 2.3.4)
Thank you so much
Clearly the 1st time there’s a PDX cache register process going on between client and server kind of like
client server event distribution
The register process does not need to happen 2nd time because it’s now setup.
The PDX registration process is explained more here where it says: Geode maintains a central registry of the PDX domain object metadata.
There must be a way to get the region setup done before your 1st insert, so that each insert is the same.
Maybe this could help

stop polling files when rabbitmq is down: spring integration

I'm working on a project where we are polling files from a sftp server and streaming it out into a object on the rabbitmq queue. Now when the rabbitmq is down it still polls and deletes the file from the server and losses the file while sending it on queue when rabbitmq is down. I'm using ExpressionEvaluatingRequestHandlerAdvice to remove the file on successful transformation. My code looks like this:
#Bean
public SessionFactory<ChannelSftp.LsEntry> sftpSessionFactory() {
DefaultSftpSessionFactory factory = new DefaultSftpSessionFactory(true);
factory.setHost(sftpProperties.getSftpHost());
factory.setPort(sftpProperties.getSftpPort());
factory.setUser(sftpProperties.getSftpPathUser());
factory.setPassword(sftpProperties.getSftpPathPassword());
factory.setAllowUnknownKeys(true);
return new CachingSessionFactory<>(factory);
}
#Bean
public SftpRemoteFileTemplate sftpRemoteFileTemplate() {
return new SftpRemoteFileTemplate(sftpSessionFactory());
}
#Bean
#InboundChannelAdapter(channel = TransformerChannel.TRANSFORMER_OUTPUT, autoStartup = "false",
poller = #Poller(value = "customPoller"))
public MessageSource<InputStream> sftpMessageSource() {
SftpStreamingMessageSource messageSource = new SftpStreamingMessageSource(sftpRemoteFileTemplate,
null);
messageSource.setRemoteDirectory(sftpProperties.getSftpDirPath());
messageSource.setFilter(new SftpPersistentAcceptOnceFileListFilter(new SimpleMetadataStore(),
"streaming"));
messageSource.setFilter(new SftpSimplePatternFileListFilter("*.txt"));
return messageSource;
}
#Bean
#Transformer(inputChannel = TransformerChannel.TRANSFORMER_OUTPUT,
outputChannel = SFTPOutputChannel.SFTP_OUTPUT,
adviceChain = "deleteAdvice")
public org.springframework.integration.transformer.Transformer transformer() {
return new SFTPTransformerService("UTF-8");
}
#Bean
public ExpressionEvaluatingRequestHandlerAdvice deleteAdvice() {
ExpressionEvaluatingRequestHandlerAdvice advice = new ExpressionEvaluatingRequestHandlerAdvice();
advice.setOnSuccessExpressionString(
"#sftpRemoteFileTemplate.remove(headers['file_remoteDirectory'] + headers['file_remoteFile'])");
advice.setPropagateEvaluationFailures(false);
return advice;
}
I don't want the files to get removed/polled from the remote sftp server when the rabbitmq server is down. How can i achieve this ?
UPDATE
Apologies for not mentioning that I'm using spring cloud stream rabbit binder. And here is the transformer service:
public class SFTPTransformerService extends StreamTransformer {
public SFTPTransformerService(String charset) {
super(charset);
}
#Override
protected Object doTransform(Message<?> message) throws Exception {
String fileName = message.getHeaders().get("file_remoteFile", String.class);
Object fileContents = super.doTransform(message);
return new customFileDTO(fileName, (String) fileContents);
}
}
UPDATE-2
I added TransactionSynchronizationFactory on the customPoller as suggested. Now it doesn't poll file when rabbit server is down, but when the server is up, it keeps on polling the same file over and over again!! I cannot figure it out why? I guess i cannot use PollerSpec cause im on 4.3.2 version.
#Bean(name = "customPoller")
public PollerMetadata pollerMetadataDTX(StartStopTrigger startStopTrigger,
CustomTriggerAdvice customTriggerAdvice) {
PollerMetadata pollerMetadata = new PollerMetadata();
pollerMetadata.setAdviceChain(Collections.singletonList(customTriggerAdvice));
pollerMetadata.setTrigger(startStopTrigger);
pollerMetadata.setMaxMessagesPerPoll(Long.valueOf(sftpProperties.getMaxMessagePoll()));
ExpressionEvaluatingTransactionSynchronizationProcessor syncProcessor =
new ExpressionEvaluatingTransactionSynchronizationProcessor();
syncProcessor.setBeanFactory(applicationContext.getAutowireCapableBeanFactory());
syncProcessor.setBeforeCommitChannel(
applicationContext.getBean(TransformerChannel.TRANSFORMER_OUTPUT, MessageChannel.class));
syncProcessor
.setAfterCommitChannel(
applicationContext.getBean(SFTPOutputChannel.SFTP_OUTPUT, MessageChannel.class));
syncProcessor.setAfterCommitExpression(new SpelExpressionParser().parseExpression(
"#sftpRemoteFileTemplate.remove(headers['file_remoteDirectory'] + headers['file_remoteFile'])"));
DefaultTransactionSynchronizationFactory defaultTransactionSynchronizationFactory =
new DefaultTransactionSynchronizationFactory(syncProcessor);
pollerMetadata.setTransactionSynchronizationFactory(defaultTransactionSynchronizationFactory);
return pollerMetadata;
}
I don't know if you need this info but my CustomTriggerAdvice and StartStopTrigger looks like this :
#Component
public class CustomTriggerAdvice extends AbstractMessageSourceAdvice {
#Autowired private StartStopTrigger startStopTrigger;
#Override
public boolean beforeReceive(MessageSource<?> source) {
return true;
}
#Override
public Message<?> afterReceive(Message<?> result, MessageSource<?> source) {
if (result == null) {
if (startStopTrigger.getStart()) {
startStopTrigger.stop();
}
} else {
if (!startStopTrigger.getStart()) {
startStopTrigger.stop();
}
}
return result;
}
}
public class StartStopTrigger implements Trigger {
private PeriodicTrigger startTrigger;
private boolean start;
public StartStopTrigger(PeriodicTrigger startTrigger, boolean start) {
this.startTrigger = startTrigger;
this.start = start;
}
#Override
public Date nextExecutionTime(TriggerContext triggerContext) {
if (!start) {
return null;
}
start = true;
return startTrigger.nextExecutionTime(triggerContext);
}
public void stop() {
start = false;
}
public void start() {
start = true;
}
public boolean getStart() {
return this.start;
}
}
Well, would be great to see what your SFTPTransformerService and determine how it is possible to perform an onSuccessExpression when there should be an exception in case of down broker.
You also should not only throw an exception do not perform delete, but consider to add a RequestHandlerRetryAdvice to re-send the file to the RabbitMQ: https://docs.spring.io/spring-integration/docs/5.0.6.RELEASE/reference/html/messaging-endpoints-chapter.html#retry-advice
UPDATE
So, well, since Gary guessed that you use Spring Cloud Stream to send message to the Rabbit Binder after your internal process (very sad that you didn't share that information originally), you need to take a look to the Binder error handling on the matter: https://docs.spring.io/spring-cloud-stream/docs/Elmhurst.RELEASE/reference/htmlsingle/#_retry_with_the_rabbitmq_binder
And that is true that ExpressionEvaluatingRequestHandlerAdvice is applied only for the SFTPTransformerService and nothing more. The downstream error (in the Binder) is not included in this process already.
UPDATE 2
Yeah... I think Gary is right, and we don't have choice unless configure a TransactionSynchronizationFactory on the customPoller level instead of that ExpressionEvaluatingRequestHandlerAdvice: ExpressionEvaluatingRequestHandlerAdvice .
The DefaultTransactionSynchronizationFactory can be configured with the ExpressionEvaluatingTransactionSynchronizationProcessor, which has similar goal as the mentioned ExpressionEvaluatingRequestHandlerAdvice, but on the transaction level which will include your process starting with the SFTP Channel Adapter and ending on the Rabbit Binder level with the send to AMQP attempts.
See Reference Manual for more information: https://docs.spring.io/spring-integration/reference/html/transactions.html#transaction-synchronization.
The point with the ExpressionEvaluatingRequestHandlerAdvice (and any AbstractRequestHandlerAdvice) that they have a boundary only around handleRequestMessage() method, therefore only during the component they are declared.

How to associate a redis CrudRepository to a database

I am using spring-data-redis on my spring-boot 1.4 application. I have two distinct CrudRepositories. However, I am struggling to associate them with their respective Connection factories.
Bottom line is: I'd like PersonRedisRepository to use db #6 and OtherPurposeRedisRepository to use db #3. To be hoehest, I am not 100% sure if the way I am tackling the matter is correct.
The repository
interface PersonRedisRepository extends CrudRepository<Person, String> {
}
interface OtherPurposeRedisRepository extends CrudRepository<OtherPurpose, String> {
}
Configuration for person repository
#EnableRedisRepositories(basePackageClasses = [PersonRedisRepository.class], redisTemplateRef = "personRedisTemplate")
class RedisConfigurationForPerson {
#Bean(name = "personFactory")
public RedisConnectionFactory personJedisConnectionFactory() {
JedisConnectionFactory jedisConnectionFactory = new JedisConnectionFactory()
jedisConnectionFactory.usePool = true
jedisConnectionFactory.hostName = "127.0.0.1"
jedisConnectionFactory.database = 6
return jedisConnectionFactory
}
#Bean(name = "personRedisTemplate")
public RedisTemplate<byte[], byte[]> availabilityCacheRedisTemplate() {
RedisTemplate<byte[], byte[]> template = new RedisTemplate<byte[], byte[]>()
template.setConnectionFactory(personJedisConnectionFactory())
template
}
}
Configuration for other purpose repository
#EnableRedisRepositories(basePackageClasses = [OtherPurpsoseRepository.class], redisTemplateRef = "otherPurposeRedisTemplate")
class RedisConfigurationForOtherPurpose {
#Bean(name = "otherPurposeFactory")
public RedisConnectionFactory otherPurposeJedisConnectionFactory() {
JedisConnectionFactory jedisConnectionFactory = new JedisConnectionFactory()
jedisConnectionFactory.usePool = true
jedisConnectionFactory.hostName = "127.0.0.1"
jedisConnectionFactory.database = 3
return jedisConnectionFactory
}
#Bean(name = "otherPurposeRedisTemplate")
public RedisTemplate<byte[], byte[]> otherPurposeRedisTemplate() {
RedisTemplate<byte[], byte[]> template = new RedisTemplate<byte[], byte[]>()
template.setConnectionFactory(otherPurposeJedisConnectionFactory())
template
}
}
Everything works just fine, I can read/write using both Repositories. However, they both read/write on the db 6.
Another guy had the same problem as you. Even if the examples are for jpa repositories these links should help you :
Spring Boot Configure and Use Two DataSources
http://www.baeldung.com/spring-data-jpa-multiple-databases
you have first to bind the configuration datasource with the #Primay annotation and specify the datasource you are working on. This is the first part. I've looked quickly the second part and I will go deeper later. Will update my psot when done ;)

Storm Kafkaspout KryoSerialization issue for java bean from kafka topic

Hi I am new to Storm and Kafka.
I am using storm 1.0.1 and kafka 0.10.0
we have a kafkaspout that would receive java bean from kafka topic.
I have spent several hours digging to find the right approach for that.
Found few articles which are useful but none of the approaches worked for me so far.
Following is my codes:
StormTopology:
public class StormTopology {
public static void main(String[] args) throws Exception {
//Topo test /zkroot test
if (args.length == 4) {
System.out.println("started");
BrokerHosts hosts = new ZkHosts("localhost:2181");
SpoutConfig kafkaConf1 = new SpoutConfig(hosts, args[1], args[2],
args[3]);
kafkaConf1.zkRoot = args[2];
kafkaConf1.useStartOffsetTimeIfOffsetOutOfRange = true;
kafkaConf1.startOffsetTime = kafka.api.OffsetRequest.LatestTime();
kafkaConf1.scheme = new SchemeAsMultiScheme(new KryoScheme());
KafkaSpout kafkaSpout1 = new KafkaSpout(kafkaConf1);
System.out.println("started");
ShuffleBolt shuffleBolt = new ShuffleBolt(args[1]);
AnalysisBolt analysisBolt = new AnalysisBolt(args[1]);
TopologyBuilder topologyBuilder = new TopologyBuilder();
topologyBuilder.setSpout("kafkaspout", kafkaSpout1, 1);
//builder.setBolt("counterbolt2", countbolt2, 3).shuffleGrouping("kafkaspout");
//This is for field grouping in bolt we need two bolt for field grouping or it wont work
topologyBuilder.setBolt("shuffleBolt", shuffleBolt, 3).shuffleGrouping("kafkaspout");
topologyBuilder.setBolt("analysisBolt", analysisBolt, 5).fieldsGrouping("shuffleBolt", new Fields("trip"));
Config config = new Config();
config.registerSerialization(VehicleTrip.class, VehicleTripKyroSerializer.class);
config.setDebug(true);
config.setNumWorkers(1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(args[0], config, topologyBuilder.createTopology());
// StormSubmitter.submitTopology(args[0], config,
// builder.createTopology());
} else {
System.out
.println("Insufficent Arguements - topologyName kafkaTopic ZKRoot ID");
}
}
}
I am serializing the data at kafka using kryo
KafkaProducer:
public class StreamKafkaProducer {
private static Producer producer;
private final Properties props = new Properties();
private static final StreamKafkaProducer KAFKA_PRODUCER = new StreamKafkaProducer();
private StreamKafkaProducer(){
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "com.abc.serializer.MySerializer");
producer = new org.apache.kafka.clients.producer.KafkaProducer(props);
}
public static StreamKafkaProducer getStreamKafkaProducer(){
return KAFKA_PRODUCER;
}
public void produce(String topic, VehicleTrip vehicleTrip){
ProducerRecord<String,VehicleTrip> producerRecord = new ProducerRecord<>(topic,vehicleTrip);
producer.send(producerRecord);
//producer.close();
}
public static void closeProducer(){
producer.close();
}
}
Kyro Serializer:
public class DataKyroSerializer extends Serializer<Data> implements Serializable {
#Override
public void write(Kryo kryo, Output output, VehicleTrip vehicleTrip) {
output.writeLong(data.getStartedOn().getTime());
output.writeLong(data.getEndedOn().getTime());
}
#Override
public Data read(Kryo kryo, Input input, Class<VehicleTrip> aClass) {
Data data = new Data();
data.setStartedOn(new Date(input.readLong()));
data.setEndedOn(new Date(input.readLong()));
return data;
}
I need to get the data back to the Data bean.
As per few articles I need to provide with a custom scheme and make it part of topology but till now I have no luck
Code for Bolt and Scheme
Scheme:
public class KryoScheme implements Scheme {
private ThreadLocal<Kryo> kryos = new ThreadLocal<Kryo>() {
protected Kryo initialValue() {
Kryo kryo = new Kryo();
kryo.addDefaultSerializer(Data.class, new DataKyroSerializer());
return kryo;
};
};
#Override
public List<Object> deserialize(ByteBuffer ser) {
return Utils.tuple(kryos.get().readObject(new ByteBufferInput(ser.array()), Data.class));
}
#Override
public Fields getOutputFields( ) {
return new Fields( "data" );
}
}
and bolt:
public class AnalysisBolt implements IBasicBolt {
/**
*
*/
private static final long serialVersionUID = 1L;
private String topicname = null;
public AnalysisBolt(String topicname) {
this.topicname = topicname;
}
public void prepare(Map stormConf, TopologyContext topologyContext) {
System.out.println("prepare");
}
public void execute(Tuple input, BasicOutputCollector collector) {
System.out.println("execute");
Fields fields = input.getFields();
try {
JSONObject eventJson = (JSONObject) JSONSerializer.toJSON((String) input
.getValueByField(fields.get(1)));
String StartTime = (String) eventJson.get("startedOn");
String EndTime = (String) eventJson.get("endedOn");
String Oid = (String) eventJson.get("_id");
int V_id = (Integer) eventJson.get("vehicleId");
//call method getEventForVehicleWithinTime(Long vehicleId, Date startTime, Date endTime)
System.out.println("==========="+Oid+"| "+V_id+"| "+StartTime+"| "+EndTime);
} catch (Exception e) {
e.printStackTrace();
}
}
but if I submit the storm topology i am getting error:
java.lang.IllegalStateException: Spout 'kafkaspout' contains a
non-serializable field of type com.abc.topology.KryoScheme$1, which
was instantiated prior to topology creation.
com.minda.iconnect.topology.KryoScheme$1 should be instantiated within
the prepare method of 'kafkaspout at the earliest.
Appreciate help to debug the issue and guide to right path.
Thanks
Your ThreadLocal is not Serializable. The preferable solution would be to make your serializer both Serializable and threadsafe. If this is not possible, then I see 2 alternatives since there is no prepare method as you would get in a bolt.
Declare it as static, which is inherently transient.
Declare it transient and access it via a private get method. Then you can initialize the variable on first access.
Within the Storm lifecycle, the topology is instantiated and then serialized to byte format to be stored in ZooKeeper, prior to the topology being executed. Within this step, if a spout or bolt within the topology has an initialized unserializable property, serialization will fail.
If there is a need for a field that is unserializable, initialize it within the bolt or spout's prepare method, which is run after the topology is delivered to the worker.
Source: Best Practices for implementing Apache Storm

ServiceStack.Redis: Unable to Connect: sPort:

I regularly get
ServiceStack.Redis: Unable to Connect: sPort: 0 or ServiceStack.Redis: Unable to Connect: sPort: 50071 (or another port number).
This appears to happen when our site is busier. Redis itself appears fine, no real increase in CPU or memory usage.
I'm using connection pooling and have tried changing the timeout values without success.
public sealed class RedisConnection
{
// parameter values are:
// Config.Settings.RedisPoolSize = 10000
// Config.Settings.RedisPoolTimeoutSeconds = 2
// Config.Settings.RemoteCacheServerName = 192.168.10.12
private static readonly PooledRedisClientManager instance
= new PooledRedisClientManager(Config.Settings.RedisPoolSize,
Config.Settings.RedisPoolTimeoutSeconds,
new string[] { Config.Settings.RemoteCacheServerName })
{
ConnectTimeout = 1500
};
static RedisConnection()
{
}
public static PooledRedisClientManager Instance
{
get
{
return instance;
}
}
}
Usage is like this:
public sealed class Caching
{
public static T GetCacheSingle<T>(string key)
{
using (var redisClient = RedisConnection.Instance.GetReadOnlyClient())
{
var value = redisClient.Get<byte[]>(key);
....
}
}
}
This was caused by latency issue due to Redis running on Ubuntu hosted as a virtual machine on Hyper-V.
We reduced latency by 45% by moving to a physical Linux box, fixing the problem.