I have a query that hangs when given a large argument array. The same code works well with a smaller argument array. The code is as follows:
subEntries = dataRDD.sql("SELECT v.id,v.sub,v.obj FROM VPRow v JOIN table(id bigint = ?) i ON v.id = i.id",new Object[] {subKeyEntries.toArray()});
LOG.debug("Reading : "+subEntries.count());
Please note that the Ignite documentation says the input argument can be of any size: "Here you can provide object array (Object[]) of any length as a parameter". In the stalling case, the parameter passed to the query held 23641 long values.
My spring configuration file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<!-- Set a cache name. -->
<property name="name" value="dataRDD"/>
<!-- Set a cache mode. -->
<property name="cacheMode" value="PARTITIONED"/>
<!-- Index Integer pairs used in the example. -->
<property name="indexedTypes">
<list>
<value>java.lang.Long</value>
<value>sample.VPRow</value>
</list>
</property>
<property name="backups" value="0"/>
</bean>
</list>
</property>
<!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
<property name="addresses">
<list>
<value>[IP1]</value>
<value>[...]</value>
<value>[IP5]</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
The VPRow class is defined as follows:
public class VPRow implements Serializable {
@QuerySqlField
private long id;
@QuerySqlField
private String sub;
@QuerySqlField
private String obj;
public VPRow(long id,String sub, String obj) {
this.id = id;
this.sub = sub;
this.obj = obj;
}
...
}
Databases usually limit the number of values allowed in an IN operator.
You can split this SQL query into parts and run them concurrently.
Please see paragraph 2 here: https://apacheignite.readme.io/docs/sql-performance-and-debugging#sql-performance-and-usability-considerations
The IN operator doesn't use indexes, so with a list this long the query will do too many scans. Changing the query as described there should help.
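The suggestion to split the query into parts can be sketched in plain Java as follows. This is only a minimal illustration of the chunking step: the class name KeyChunks, the method split, and the chunk size of 1000 are assumptions made for this example, not part of Ignite or of the original code.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a large key list into fixed-size chunks so each chunk can be bound to its own query. */
final class KeyChunks {
    private KeyChunks() {}

    static <T> List<List<T>> split(List<T> keys, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, keys.size());
            // Copy the slice so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(keys.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Same number of keys as in the stalling case; the values are placeholders.
        List<Long> keys = new ArrayList<>();
        for (long id = 0; id < 23641; id++) {
            keys.add(id);
        }
        // The chunk size of 1000 is an arbitrary assumption; tune it for your data.
        List<List<Long>> chunks = split(keys, 1000);
        System.out.println(chunks.size() + " chunks");
    }
}
```

Each chunk would then be passed as the single array argument of the query, in the same way subKeyEntries.toArray() is passed in the question, and the partial results merged on the caller's side.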
I have the following configuration file
<bean abstract="true" id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="peerClassLoadingEnabled" value="true"/>
<property name="includeEventTypes">
<list>
<!--Task execution events-->
<util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_STARTED"/>
<util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FINISHED"/>
<util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FAILED"/>
</list>
</property>
<property name="metricsUpdateFrequency" value="10000"/>
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
<property name="addresses">
<list>
<!-- In distributed environment, replace with actual host IP address. -->
<value>127.0.0.1:47500..47509</value>
<value>127.0.0.1:48500..48509</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
<!-- Enabling the required Failover SPI. -->
<property name="failoverSpi">
<bean class="org.apache.ignite.spi.failover.jobstealing.JobStealingFailoverSpi"/>
</property>
<property name="collisionSpi">
<bean class="org.apache.ignite.spi.collision.jobstealing.JobStealingCollisionSpi">
<property name="activeJobsThreshold" value="50"/>
<property name="waitJobsThreshold" value="0"/>
<property name="messageExpireTime" value="1000"/>
<property name="maximumStealingAttempts" value="10"/>
<property name="stealingEnabled" value="true"/>
</bean>
</property>
</bean>
The closure gets executed over the server nodes in the grid as expected.
When we add a new node to the grid while the closure is executing, the existing nodes acknowledge the new node, but the closure is not distributed to it.
Below is my closure implementation:
@Override
public AccruedSimpleInterest apply(SimpleInterestParameter simpleInterestParameter) {
BigDecimal si = simpleInterestParameter.getPrincipal()
.multiply(new BigDecimal(simpleInterestParameter.getYears()))
.multiply(new BigDecimal(simpleInterestParameter.getRate())).divide(SimpleInterestClosure.HUNDRED);
System.out.println("Calculated SI for id=" + simpleInterestParameter.getId() + " SI=" + si.toPlainString());
return new AccruedSimpleInterest(si, simpleInterestParameter);
}
Below is the main class
public static void main(String... args) throws IgniteException, IOException {
Factory<SimpleInterestClosure> siClosureFactory = FactoryBuilder.factoryOf(new SimpleInterestClosure());
ClassPathResource ress = new ClassPathResource("example-ignite-poc.xml");
File file = new File(ress.getPath());
try (Ignite ignite = Ignition.start(file.getPath())) {
System.out.println("Started Ignite Cluster");
IgniteFuture<Collection<AccruedSimpleInterest>> igniteFuture = ignite.compute()
.applyAsync(siClosureFactory.create(), createParamCollection());
Collection<AccruedSimpleInterest> res = igniteFuture.get();
System.out.println(res.size());
}
}
As far as I understand, the Job Stealing SPI requires you to implement some additional APIs in order to work.
Please see this discussion on the user list:
Some remarks about the job stealing SPI:
1) You have some nodes that can process the tasks of a compute job.
2) Tasks are executed in the public thread pool by default:
https://apacheignite.readme.io/docs/thread-pools#section-public-pool
3) If a node's thread pool is busy, some tasks of the compute job can be executed on another node.
It will not work in the following cases:
1) If you choose a specific node for your compute task.
2) If you do an affinity call (the same as above, but the node is chosen by affinity mapping).
I use the following process for my Ignite cache with third-party persistence:
empty database
start two instances in server mode
start the first instance in client mode.
The client, in a loop,
creates entities
reads entities by a simple SqlQuery
So far so good. All the code works properly.
Then I start a second instance in client mode. The code is the same.
The second client also, in a loop,
creates entities
reads entities by the SqlQuery
The second client gets an empty ResultSet, while the first client still reads the data properly. By the way, both clients can get entities by key.
Of course, all the data is in memory.
So why can't the second client read by SqlQuery?
Three variants of the code are below. All of them behave identically: the first client started always gets the correct result, and the second client started always gets an empty ResultSet.
SqlQuery<EntryKey, Entry> sql = new SqlQuery<>(Entry.class, "accNumber = ?");
sql.setArgs(number);
List<Cache.Entry<EntryKey, Entry>> res = entryCache.query(sql).getAll();
...
SqlFieldsQuery sql = new SqlFieldsQuery("select d_c, summa from Entry where accNumber = ?");
sql.setArgs(number);
List<List<?>> res = entryCache.query(sql).getAll();
...
SqlQuery<BinaryObject, BinaryObject> query = new SqlQuery<>(Entry.class, "accNumber = ?");
QueryCursor<Cache.Entry<BinaryObject, BinaryObject>> entryCursor = binaryEntry
.query(query.setArgs(number));
List<javax.cache.Cache.Entry<BinaryObject, BinaryObject>> res = entryCursor.getAll();
XML configuration is below:
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<!-- Set a cache name. -->
<property name="name" value="entryCache" />
<!-- Set cache mode. -->
<property name="cacheMode" value="PARTITIONED" />
<property name="atomicityMode" value="TRANSACTIONAL" />
<!-- Number of backup nodes. -->
<property name="backups" value="1" />
<property name="cacheStoreFactory">
<bean class="javax.cache.configuration.FactoryBuilder"
factory-method="factoryOf">
<constructor-arg
value="ru.raiffeisen.cache.store.jdbc.CacheJdbcEntryStore" />
</bean>
</property>
<property name="readThrough" value="true" />
<property name="writeThrough" value="true" />
<property name="queryEntities">
<list>
<bean class="org.apache.ignite.cache.QueryEntity">
<!-- Setting indexed type's key class -->
<property name="keyType"
value="ru.raiffeisen.cache.repository.EntryKey" />
<!-- Setting indexed type's value class -->
<property name="valueType" value="ru.raiffeisen.cache.repository.Entry" />
<!-- Defining fields that will be either indexed or queryable. Indexed
fields are added to 'indexes' list below. -->
<property name="fields">
<map>
<entry key="key.accNumber" value="java.lang.String" />
<entry key="key.d_c" value="ru.raiffeisen.cache.repository.EntryKey.DEB_CRE" />
<entry key="key.valuedate" value="java.util.Date" />
<entry key="summa" value="java.lang.Integer " />
</map>
</property>
<!-- Defining indexed fields. -->
<property name="indexes">
<list>
<!-- Single field (aka. column) index -->
<bean class="org.apache.ignite.cache.QueryIndex">
<constructor-arg value="key.accNumber" />
</bean>
</list>
</property>
</bean>
</list>
</property>
</bean>
With a third-party store, readThrough works only for the key-value API; for SQL you need to call the loadCache method before running queries on Ignite.
If you want to read from disk with persistence, I would recommend using the Ignite native persistence store: https://apacheignite.readme.io/docs/distributed-persistent-store
Also, I see in your configuration:
<property name="indexedTypes" value="true" />
That is definitely a mistake; it should be configured like this:
<property name="indexedTypes">
<list>
<value>java.lang.Integer</value>
<value>java.lang.Long</value>
</list>
</property>
IndexedTypes and QueryEntity configure the same things; internally, IndexedTypes creates a QueryEntity configuration. So it is redundant to configure both.
Thanks to everyone for suggestions.
I found an option that gives a stable and correct result.
I moved the field accNumber from the key class to the value class, so the select now filters on a primitive field of the value class.
Query configuration:
<bean class="org.apache.ignite.cache.QueryEntity">
<!-- Setting indexed type's key class -->
<property name="keyType"
value="ru.raiffeisen.cache.repository.EntryKey" />
<!-- Setting indexed type's value class -->
<property name="valueType" value="ru.raiffeisen.cache.repository.Entry" />
<!-- Defining fields that will be either indexed or queryable. Indexed
fields are added to 'indexes' list below. -->
<property name="fields">
<map>
<entry key="accNumber" value="java.lang.String" />
<entry key="key.d_c" value="ru.raiffeisen.cache.repository.EntryKey.DEB_CRE" />
<entry key="key.valuedate" value="java.util.Date" />
<entry key="summa" value="java.lang.Integer " />
</map>
</property>
<!-- Defining indexed fields. -->
<property name="indexes">
<list>
<!-- Single field (aka. column) index -->
<bean class="org.apache.ignite.cache.QueryIndex">
<constructor-arg value="accNumber" />
</bean>
</list>
</property>
</bean>
Apache Ignite Version is: 2.1.0
I am using the default configuration for client & servers. The following is the client configuration. The server configuration does not have the "clientMode" property.
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:util="http://www.springframework.org/schema/util"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/util
http://www.springframework.org/schema/util/spring-util.xsd">
<bean abstract="true" id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- Set to true to enable distributed class loading for examples, default is false. -->
<property name="peerClassLoadingEnabled" value="true"/>
<property name="clientMode" value="true"/>
<!-- Enable task execution events for examples. -->
<property name="includeEventTypes">
<list>
<!--Task execution events-->
<util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_STARTED"/>
<util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FINISHED"/>
<util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FAILED"/>
<util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_TIMEDOUT"/>
<util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_SESSION_ATTR_SET"/>
<util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_REDUCED"/>
<!--Cache events -->
<util:constant static-field="org.apache.ignite.events.EventType.EVT_CACHE_OBJECT_PUT"/>
<util:constant static-field="org.apache.ignite.events.EventType.EVT_CACHE_OBJECT_READ"/>
<util:constant static-field="org.apache.ignite.events.EventType.EVT_CACHE_OBJECT_REMOVED"/>
</list>
</property>
<!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<!--
Ignite provides several options for automatic discovery that can be used
instead of static IP based discovery. For information on all options refer
to our documentation: http://apacheignite.readme.io/docs/cluster-config
-->
<!-- Uncomment static IP finder to enable static-based discovery of initial nodes. -->
<!--<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">-->
<!-- <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder"> -->
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
<property name="addresses">
<list>
<!-- In distributed environment, replace with actual host IP address. -->
<value>xxx.1y4.1zz.91:47500..47509</value>
<value>xxx.1y4.1zz.92:47500..47509</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
The closure gets executed over the server nodes in the grid as expected.
When we add a new node to the grid by executing the command below while the closure is executing:
.\ignite.bat ..\examples\config\example-ignite.xml
The existing nodes acknowledge the addition of the new node in the grid but the closure is not distributed to the newly added node.
Is there any configuration that enables distributing the closure to a node added while the closure is executing?
Edit 1:
Below is the IgniteClosure implementation class:
public class SimpleInterestClosure implements IgniteClosure<SimpleInterestParam, AccruedSimpleInterest> {
private static final long serialVersionUID = -5542687183747797356L;
private static final BigInteger HUNDRED = new BigInteger("100".getBytes());
private static Logger log = Logger.getLogger("SimpleInterestClosure");
@Override
public AccruedSimpleInterest apply(SimpleInterestParam e) {
BigInteger si = e.getPrincipal().multiply(new BigInteger(e.getDurationInYears().toString().getBytes())).
multiply(new BigInteger(e.getInterestRate().toString().getBytes())).divide(SimpleInterestClosure.HUNDRED);
log.info("Calculated SI for id=" + e.getId());
return new AccruedSimpleInterest(e, si);
}
}
Edit 2:
Below is the method which invokes the IgniteClosure implementation
public void method() throws IgniteException, IOException {
Factory<SimpleInterestClosure> siClosureFactory = FactoryBuilder.singletonfactoryOf( new SimpleInterestClosure());
ClassPathResource ress = new ClassPathResource("example-ignite.xml");
File file = new File(ress.getPath());
try (Ignite ignite = Ignition.start(file.getPath())) {
log.info("Started Ignite Cluster");
IgniteFuture<Collection<AccruedSimpleInterest>> igniteFuture = ignite.compute()
.applyAsync(siClosureFactory.create(), createParamCollection());
Collection<AccruedSimpleInterest> res = igniteFuture.get();
}
}
This sounds like you're looking for job stealing: http://apacheignite.readme.io/docs/load-balancing#job-stealing
Although it currently has a bug that may be an issue in this particular case: http://issues.apache.org/jira/browse/IGNITE-1267
I'm trying to retrieve the cached value for every element in a JavaPairRDD. I'm using the LOCAL cache mode because I want to minimize shuffling of cached data. The Ignite nodes are started in embedded mode within a Spark job. The following code works fine when I run it on a single node. However, when I run it on a cluster of 5 machines, I get zero results.
My first attempt used the IgniteRDD sql method:
dataRDD.sql("SELECT v.id,v.sub,v.obj FROM VPRow v JOIN table(id bigint = ?) i ON v.id = i.id",new Object[] {objKeyEntries.toArray()});
where objKeyEntries is a collected set of entries in an RDD. My second attempt used affinityRun:
JavaPairRDD<Long, VPRow> objEntries = objKeyEntries.mapPartitionsToPair(new PairFlatMapFunction<Iterator<Tuple2<Long, Boolean>>, Long, VPRow>() {
@Override
public Iterator<Tuple2<Long, VPRow>> call(Iterator<Tuple2<Long, Boolean>> tuple2Iterator) throws Exception {
ApplicationContext ctx = new ClassPathXmlApplicationContext("ignite-rdd.xml");
IgniteConfiguration igniteConfiguration = (IgniteConfiguration) ctx.getBean("ignite.cfg");
Ignite ignite = Ignition.getOrStart(igniteConfiguration);
IgniteCache<Long, VPRow> cache = ignite.getOrCreateCache("dataRDD");
ArrayList<Tuple2<Long,VPRow>> lst = new ArrayList<>();
while(tuple2Iterator.hasNext()) {
Tuple2<Long, Boolean> val = tuple2Iterator.next();
ignite.compute().affinityRun("dataRDD", val._1(),()->{
lst.add(new Tuple2<>(val._1(),cache.get(val._1())));
});
}
return lst.iterator();
}
});
The following is the ignite-rdd.xml configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="memoryConfiguration">
<bean class="org.apache.ignite.configuration.MemoryConfiguration">
<property name="systemCacheInitialSize" value="#{100 * 1024 * 1024}"/>
<property name="defaultMemoryPolicyName" value="default_mem_plc"/>
<property name="memoryPolicies">
<list>
<bean class="org.apache.ignite.configuration.MemoryPolicyConfiguration">
<property name="name" value="default_mem_plc"/>
<property name="initialSize" value="#{5 * 1024 * 1024 * 1024}"/>
</bean>
</list>
</property>
</bean>
</property>
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<!-- Set a cache name. -->
<property name="name" value="dataRDD"/>
<!-- Set a cache mode. -->
<property name="cacheMode" value="LOCAL"/>
<!-- Index Integer pairs used in the example. -->
<property name="indexedTypes">
<list>
<value>java.lang.Long</value>
<value>edu.code.VPRow</value>
</list>
</property>
<property name="affinity">
<bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
<property name="partitions" value="50"/>
</bean>
</property>
</bean>
</list>
</property>
<!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
<property name="addresses">
<list>
<value>[IP5]</value>
<value>[IP4]</value>
<value>[IP3]</value>
<value>[IP2]</value>
<value>[IP1]</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
Are you sure that you need to use the LOCAL cache mode?
Most likely you filled the cache on only one node, and the local caches on the other nodes are still empty.
affinityRun doesn't work because you have a LOCAL cache, not a PARTITIONED one, so it's not possible to determine the owner node for a key with the AffinityFunction.
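To illustrate what "determining the owner node for a key" means in a partitioned cache, here is a deliberately simplified model in plain Java. It is an assumption-laden sketch, not Ignite's algorithm: RendezvousAffinityFunction works differently, and the names OwnerLookup, partition, owner, and the node labels are hypothetical. The point is only that a partitioned cache gives every caller the same key-to-node answer, whereas LOCAL caches are independent per node and have no such shared mapping.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of key ownership; NOT Ignite's RendezvousAffinityFunction. */
final class OwnerLookup {
    private OwnerLookup() {}

    /** Maps a key to one of {@code partitions} partitions using its hash code. */
    static int partition(Object key, int partitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Maps a partition to one node; the same inputs always give the same node. */
    static String owner(int partition, List<String> nodes) {
        return nodes.get(partition % nodes.size());
    }

    public static void main(String[] args) {
        // Hypothetical node labels standing in for the five machines in the question.
        List<String> nodes = Arrays.asList("node-1", "node-2", "node-3", "node-4", "node-5");
        long key = 42L;
        // 50 partitions, matching the affinity setting in the configuration above.
        int part = partition(key, 50);
        System.out.println("key " + key + " -> partition " + part + " -> " + owner(part, nodes));
    }
}
```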
When I start a remote compute job with call() or affinityCall(), the remote server creates 6 threads, and these threads never exit, as the VisualVM snapshot below shows:
[VisualVM snapshot]
The threads named from "utility-#153%null%" to "marshaller-cache-#14i%null%" never end.
If the client runs over and over again, the number of threads on the server node increases rapidly. As a result, the server node runs out of memory.
How can I close these threads when the client closes?
Maybe I am not running the client in the correct way.
Client code:
String cacheKey = "jobIds";
String cname = "myCacheName";
ClusterGroup rmts = getIgnite().cluster().forRemotes();
IgniteCache<String, List<String>> cache = getIgnite().getOrCreateCache(cname);
List<String> jobList = cache.get(cacheKey);
Collection<String> res = ignite.compute(rmts).apply(
new IgniteClosure<String, String>() {
@Override
public String apply(String word) {
return word;
}
},
jobList
);
getIgnite().close();
System.out.println("ignite Closed");
if (res == null) {
System.out.println("Error: Result is null");
return;
}
res.forEach(s -> {
System.out.println(s);
});
System.out.println("Finished!");
getIgnite() returns the Ignite instance:
public static Ignite getIgnite() {
if (ignite == null) {
System.out.println("RETURN INSTANCE ..........");
Ignition.setClientMode(true);
ignite = Ignition.start(confCache);
ignite.configuration().setDeploymentMode(DeploymentMode.CONTINUOUS);
}
return ignite;
}
Server config:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<!--
Alter configuration below as needed.
-->
<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="peerClassLoadingEnabled" value="true"/>
<property name="peerClassLoadingMissedResourcesCacheSize" value="0"/>
<property name="publicThreadPoolSize" value="64"/>
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
<property name="addresses">
<list>
<value>172.22.1.72:47500..47509</value>
<value>172.22.1.100:47500..47509</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
<property name="cacheConfiguration">
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="cacheMode" value="PARTITIONED"/>
<property name="memoryMode" value="ONHEAP_TIERED"/>
<property name="backups" value="0"/>
<property name="offHeapMaxMemory" value="0"/>
<property name="swapEnabled" value="false"/>
</bean>
</property>
</bean>
</beans>
These thread pools are static, and the number of threads in them does not depend on load (number of executed operations, jobs, etc.). Having said that, I don't think they are the reason for the OOME, unless you somehow start a new node within the same JVM for each executed job.
I would also recommend always reusing the existing node that is already started in a JVM. Starting a new one and closing it for each job is bad practice.
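One way to reuse a single node per JVM is a small lazy holder that starts the resource on first use and returns the same instance afterwards. The sketch below is plain Java with no Ignite dependency; the class name LazyHolder is hypothetical, and in the client above the supplier would presumably wrap the existing Ignition.start(confCache) call.

```java
import java.util.function.Supplier;

/** Starts an expensive resource once per JVM and hands out the same instance afterwards. */
final class LazyHolder<T> {
    private final Supplier<T> starter;
    private T instance;

    LazyHolder(Supplier<T> starter) {
        this.starter = starter;
    }

    /** Synchronized so that concurrent callers cannot start two instances. */
    synchronized T get() {
        if (instance == null) {
            instance = starter.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        int[] starts = {0};
        // A string stands in for the node here; the counter records how often it is started.
        LazyHolder<String> holder = new LazyHolder<>(() -> {
            starts[0]++;
            return "node";
        });
        holder.get();
        holder.get();
        System.out.println("starts = " + starts[0]);
    }
}
```

With this shape the node is closed once, when the application shuts down, rather than after every job.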
These threads are created in thread pools, so you can set the pool sizes in IgniteConfiguration: setUtilityCachePoolSize(int) and setMarshallerCachePoolSize(int) for Ignite 1.5, setMarshallerCacheThreadPoolSize(int) for Ignite 1.7, and others.