I am trying to run the Ignite basic example and it is failing with a Spark TaskNotSerializable error. Could you please help me out?
val ignite = Ignition.start("/usr/local/ignite/config/example-ignite.xml");
val cfg = ignite.configuration()
val ic = new IgniteContext[Integer, Integer](sc, () => cfg)
Ignition.setClientMode(true);
val sharedRdd = ic.fromCache("example")
val x = sqlContext.sparkContext.parallelize(1 to 10000, 10).map(i => (new Integer(i), new Integer(i)))
sharedRdd.savePairs(x)
If your class does not implement Serializable, Spark also throws this exception, because every object captured by a task closure must be serializable. To avoid the exception, make your class Serializable (implements Serializable).
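For illustration, a minimal sketch of that fix (the class and method names here are hypothetical):
import java.io.Serializable;

// Any object captured by a Spark task closure must be serializable,
// otherwise the job fails with TaskNotSerializable at submission time.
public class MyMapper implements Serializable {
    private static final long serialVersionUID = 1L;

    public Integer apply(Integer i) {
        return i + 1;
    }
}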
I have a service called AcctBeneficiaryService and am trying to write a JUnit test case for it, but it fails with a NullPointerException at a for loop.
My service looks like this:
@Service
public class AcctBeneficiaryService {
public List<AcctBeneficiary> getAccountBeneficiaryDetails(long rltshpId) throws Exception {
RltshpData rlshpData = rltshpClient.getRltshpInfo(rltshpId);
List<AcctBeneficiary> acctBeneficiaries = new ArrayList<>();
List<Long> conidList = new ArrayList<>();
for (BigDecimal account : rlshpData.getAccounts()) { // NullPointerException is thrown here
AcctBeneficiary acctBeneficiary = new AcctBeneficiary();
Account accountInfo = new Account();
My test case looks like this:
public void testGetAccountBeneficiaryDetails() throws Exception {
//Initializing required objects for Mock
RltshpData rlshpData = new RltshpData();
RltshpInfo rltshpInfo = new RltshpInfo();
List<BigDecimal> accounts = new ArrayList<>();
//Mock Data for rltshpInfo
rltshpInfo.setIrNo(135434);
rltshpInfo.setIsoCtryCd("US");
rltshpInfo.setRltshpNa("Individual");
//Mock Data for accounts
accounts.add(new BigDecimal(4534));
//accounts.add(new BigDecimal(4564));
//populate mocking data to the object
rlshpData.setRltshpInfo(rltshpInfo);
rlshpData.setAccounts(accounts);
when(service.getAccountBeneficiaryDetails(new Long(1234)))
.thenReturn(Mockito.<AcctBeneficiary>anyList());
assertNotNull(acctBeneficiaryList);
Error Details
java.lang.NullPointerException
at .fplbeneficiariesrest.service.AcctBeneficiaryService.getAccountBeneficiaryDetails(AcctBeneficiaryService.java:64)
at .fplbeneficiariesrest.service.AcctBeneficiaryServiceTest.testGetAccountBeneficiaryDetails(AcctBeneficiaryServiceTest.java:160)
The null pointer occurs at:
for (BigDecimal account : rlshpData.getAccounts())
This for loop never picks up the mocked data, because rlshpData.getAccounts() is executed before the mock setup takes effect, so it returns null.
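For reference, here is a minimal sketch of a test that stubs the dependency instead of the class under test. It assumes the service's rltshpClient field is injectable and of a type named RltshpClient (that class name is my guess), and it uses the Mockito 1.x runner import (in Mockito 2+ it lives in org.mockito.junit):
import static org.junit.Assert.assertNotNull;
import static org.mockito.Mockito.when;

import java.math.BigDecimal;
import java.util.Collections;
import java.util.List;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.runners.MockitoJUnitRunner;

@RunWith(MockitoJUnitRunner.class)
public class AcctBeneficiaryServiceTest {

    @Mock
    private RltshpClient rltshpClient; // the dependency, mocked (class name assumed)

    @InjectMocks
    private AcctBeneficiaryService service; // the real class under test

    @Test
    public void testGetAccountBeneficiaryDetails() throws Exception {
        // Mock data mirroring the question's values
        RltshpData rlshpData = new RltshpData();
        RltshpInfo rltshpInfo = new RltshpInfo();
        rltshpInfo.setIrNo(135434);
        rltshpInfo.setIsoCtryCd("US");
        rltshpInfo.setRltshpNa("Individual");
        rlshpData.setRltshpInfo(rltshpInfo);
        rlshpData.setAccounts(Collections.singletonList(new BigDecimal(4534)));

        // Stub the dependency; the service's real for loop then iterates this list.
        when(rltshpClient.getRltshpInfo(1234L)).thenReturn(rlshpData);

        List<AcctBeneficiary> result = service.getAccountBeneficiaryDetails(1234L);
        assertNotNull(result);
    }
}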
I'm creating a Neo4j procedure to import RDF data. The RDF data has some complex structural information, and I write tests to cover every case (some triples create labels, some properties, some relationships, etc.).
The procedures are written in Kotlin.
It works fine, and each test succeeds when executed individually, but when I run the whole test class at once, I get one success and then all the other tests fail with the exception:
org.neo4j.kernel.impl.core.ThreadToStatementContextBridge$BridgeDatabaseShutdownException: This database is shutdown.
I'm new to Neo4j and I struggle to find good examples. Here is the structure of a test class:
package mypackage
import org.junit.Assert
import org.junit.Rule
import org.junit.Test
import org.neo4j.driver.internal.value.NullValue
import org.neo4j.driver.v1.Config
import org.neo4j.driver.v1.GraphDatabase
import org.neo4j.harness.junit.Neo4jRule
import java.io.File
import java.net.URI
class PropertyParserTest {
// This rule starts a Neo4j instance and registers the procedure/function under test
@Rule
@JvmField
var neo4j: Neo4jRule = Neo4jRule()
.withProcedure(Mypackage::class.java)
@Test
@Throws(Throwable::class)
fun shouldSetTheNameCorrectly() {
GraphDatabase.driver(neo4j.boltURI(), Config.build().withoutEncryption().toConfig()).use({ driver ->
driver.session().use({ session ->
// Given
val path: String = File("src/test/resources/test_rdf__1.ttl").getAbsolutePath()
val testFile = File(path)
val urlTestFile: URI = testFile.toURI()
session.run("CALL mypackage.import('${urlTestFile}')")
// When
val result = session.run("MATCH (n) WHERE n:Person RETURN n.name as name")
// Then
var rec = result.next()
Assert.assertEquals("Manuel, Niklaus (Niclaus)", rec.get("name").asString())
rec = result.next()
Assert.assertEquals("Fischli / Weiss", rec.get("name").asString())
rec = result.next()
Assert.assertEquals("Hodler, Ferdinand", rec.get("name").asString())
})
})
}
@Test
@Throws(Throwable::class)
fun shouldSetTheAlternateNameCorrectly() {
GraphDatabase.driver(neo4j.boltURI(), Config.build().withoutEncryption().toConfig()).use({ driver ->
driver.session().use({ session ->
// Given
val path: String = File("src/test/resources/test_rdf_2.ttl").absolutePath
val testFile = File(path)
val urlTestFile: URI = testFile.toURI()
session.run("CALL mypackage.import('${urlTestFile}')")
// When
val result = session.run("MATCH (n) WHERE n:Person RETURN n.name as name, n.alternate_names as alternate_names")
// Then
var rec = result.next()
Assert.assertEquals("Holbein, Hans", rec.get("name").asString())
var alternateNames = rec.get("alternate_names").asList()
Assert.assertEquals(9, alternateNames.size)
Assert.assertEquals("Holpenius, Joannes", alternateNames[0])
Assert.assertEquals("Olpenius, Hans", alternateNames[8])
rec = result.next()
Assert.assertEquals("Manuel, Niklaus (Niclaus)", rec.get("name").asString())
alternateNames = rec.get("alternate_names").asList()
Assert.assertEquals(8, alternateNames.size)
rec = result.next()
Assert.assertEquals("Fischli / Weiss", rec.get("name").asString())
Assert.assertTrue(rec.get("alternate_names") is NullValue)
rec = result.next()
Assert.assertEquals("Hodler, Ferdinand", rec.get("name").asString())
Assert.assertTrue(rec.get("alternate_names") is NullValue)
rec = result.next()
Assert.assertEquals("Holbein", rec.get("name").asString())
alternateNames = rec.get("alternate_names").asList()
Assert.assertEquals(3, alternateNames.size)
})
})
}
}
Any ideas? I'm using this code base as a starting point: https://github.com/jbarrasa/neosemantics/blob/3.3/src/test/java/semantics/RDFImportTest.java
OK, I've actually found code in my procedure executing outside a transaction, and I think this was the cause of the problem. When I group everything into a single transaction, the problem disappears.
I'm not entirely sure why it worked for individual tests and failed when running the whole test class, but this works now.
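For reference, a rough sketch of that fix, written in Java against the Neo4j 3.x embedded API (the class skeleton and method body are illustrative, not the actual procedure):
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import org.neo4j.procedure.Context;

public class Mypackage {

    @Context
    public GraphDatabaseService db;

    public void importTriples() {
        // Group all writes into one explicit transaction. Work executed outside
        // a transaction can outlive the test database's lifecycle, which is
        // consistent with the "This database is shutdown" error above.
        try (Transaction tx = db.beginTx()) {
            // ... create labels, properties and relationships from the RDF ...
            tx.success();
        }
    }
}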
I'm using Kafka to send a file with 3 columns, and Spark Streaming 1.3 to insert the data into HBase.
This is what my HBase table looks like:
ROW COLUMN+CELL
zone:bizert column=travail:call, timestamp=1491836364921, value=contact:numero
zone:jendouba column=travail:Big data, timestamp=1491835836290, value=contact:email
zone:tunis column=travail:info, timestamp=1491835897342, value=contact:num
3 row(s) in 0.4200 seconds
And this is how I read the data with Spark Streaming (I'm using spark-shell):
import org.apache.spark.streaming.{ Seconds, StreamingContext }
import org.apache.spark.streaming.kafka.KafkaUtils
import kafka.serializer.StringDecoder
val ssc = new StreamingContext(sc, Seconds(10))
val topicSet = Set ("zed")
val kafkaParams = Map[String, String]("metadata.broker.list" -> "xx.xx.xxx.xx:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicSet)
val lines = stream.map(_._2)
// one output directory per batch interval; empty batches still produce empty output
lines.saveAsTextFiles("hdfs://xxxxx:8020/user/admin/zed/steams3/")
This code works for saving the data into HDFS, even though it also saves a lot of empty output to HDFS.
Before writing this question I searched here and read some other questions like mine, but I didn't find a good solution.
Could you propose the best way to do this?
This is what my code looks like now:
val sc = new SparkContext("local", "Hbase spark")
val tableName = "notz"
val conf = HBaseConfiguration.create()
conf.addResource(new Path("file:///opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/etc/hbase/conf.dist/hbase-site.xml"))
conf.set(TableInputFormat.INPUT_TABLE, tableName)
val admin = new HBaseAdmin(conf)
lines.foreachRDD(rdd => {
  if (!admin.isTableAvailable(tableName)) {
    print("Creating HBase Table")
    val tableDesc = new HTableDescriptor(tableName)
    tableDesc.addFamily(new HColumnDescriptor("zone".getBytes()))
    admin.createTable(tableDesc)
  } else {
    print("Table already exists!!")
  }
  val myTable = new HTable(conf, tableName)
  // I'm blocked here
})
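For the part where I'm blocked, here is a rough sketch of the HBase write itself, in plain Java (the same client API the Scala code calls, HBase 1.x style as shipped with CDH 5.4). The table name comes from the code above; the row key, column family, and qualifier just mirror the sample rows and are assumptions:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "notz");

        // One Put per incoming record: row key from the first column,
        // family:qualifier from the second, value from the third.
        Put put = new Put(Bytes.toBytes("zone:tunis"));
        put.add(Bytes.toBytes("travail"), Bytes.toBytes("info"), Bytes.toBytes("contact:num"));
        table.put(put);

        table.flushCommits();
        table.close();
    }
}
Inside foreachRDD, the same Put logic would go into an rdd.foreachPartition block, creating the table handle once per partition rather than per record.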
I am loading data from MySQL into an Ignite cache with the following code. The code runs with a client-mode Ignite node and loads the data into the Ignite cluster.
I would like to ask:
Which parts of the code run on the server side?
The mechanism for loading data into the cache looks like map-reduce, so what tasks are sent to the servers? The SQL?
In particular, will the following code run on the client side or the server side?
CacheConfiguration cfg = StudentCacheConfig.cache("StudentCache", storeFactory);
IgniteCache cache = ignite.getOrCreateCache(cfg);
The following is the full code that loads the data into the cache:
public class LoadStudentIntoCache {
public static void main(String[] args) {
Ignition.setClientMode(false);
String configPath = "default-config.xml";
Ignite ignite = Ignition.start(configPath);
CacheJdbcPojoStoreFactory<Integer, Student> storeFactory = new CacheJdbcPojoStoreFactory<Integer, Student>();
storeFactory.setDialect(new MySQLDialect());
IDataSourceFactory factory = new MySqlDataSourceFactory();
storeFactory.setDataSourceFactory(new Factory<DataSource>() {
public DataSource create() {
try {
DataSource dataSource = factory.createDataSource();
return dataSource;
} catch (Exception e) {
return null;
}
}
});
//
CacheConfiguration<Integer, Student> cfg = StudentCacheConfig.cache("StudentCache", storeFactory);
IgniteCache<Integer, Student> cache = ignite.getOrCreateCache(cfg);
List<String> sqls = new ArrayList<String>();
sqls.add("java.lang.Integer");
sqls.add("select id, name, birthday from db1.student where id < 1000" );
sqls.add("java.lang.Integer");
sqls.add("select id, name, birthday from db1.student where id >= 1000 and id < 1000" );
cache.loadCache(null, sqls.toArray(new String[0]));
Student s = cache.get(1);
System.out.println(s.getName() + "," + s.getBirthday());
ignite.close();
}
}
The code you showed here will be executed within your application; there is no magic happening. Usually the application is a client node; however, in your case it is started in server mode, probably by mistake: Ignition.setClientMode(false).
The data loading process will happen on each server node, i.e. each server node will execute the provided SQL queries to load the data from the DB.
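A minimal sketch of that correction, reusing cfg and sqls exactly as defined in the question's code:
// Start the loader application as a client node; loadCache is still
// distributed, so each server node runs the provided SQL against MySQL.
Ignition.setClientMode(true);

try (Ignite ignite = Ignition.start("default-config.xml")) {
    IgniteCache<Integer, Student> cache = ignite.getOrCreateCache(cfg);
    // The first argument is an optional entry filter; the varargs carry
    // key-type / SQL pairs consumed by CacheJdbcPojoStore.
    cache.loadCache(null, sqls.toArray(new String[0]));
}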
I am trying to broadcast a variable within a for loop in Spark. During this process, Spark throws a task-not-serializable error. If the same variable is broadcast outside the for loop, there is no error. Below is the code snippet that throws the error. Any help is appreciated.
var Final = computedRDD.filter(x => x.Id == uniqueKey(0))
for (partId <- uniqueKey) {
val FinalBroadcast = sc.broadcast(Final.collect)
val computeNew = computedRDD.filter(x => x.partId == partId).repartition(executors).mapPartitions(performFinalPass(FinalBroadcast))
computeNew.collect.forall(x => Final.add(x))
}