I am starting to use Spark Streaming to process a real-time data feed. My scenario: I have an Akka actor receiver using "with ActorHelper", then my Spark job does some mappings and transformations, and then I want to send the result to another actor.
My issue is the last part. When trying to send to another actor Spark is raising an exception:
15/02/20 16:43:16 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.IllegalStateException: Trying to deserialize a serialized ActorRef without an ActorSystem in scope. Use 'akka.serialization.Serialization.currentSystem.withValue(system) { ... }'
The way I am creating this last actor is the following:
val actorSystem = SparkEnv.get.actorSystem
val lastActor = actorSystem.actorOf(MyLastActor.props(someParam), "MyLastActor")
And then using it like this:
result.foreachRDD(rdd => rdd.foreachPartition(lastActor ! _))
I am not sure where or how to apply the advice "Use 'akka.serialization.Serialization.currentSystem.withValue(system) { ... }'". Do I need to set anything special through configuration? Or create my actor differently?
Look at the following example to access an actor outside of the Spark domain.
/*
* Following is the use of actorStream to plug in custom actor as receiver
*
* An important point to note:
* Since Actor may exist outside the spark framework, It is thus user's responsibility
* to ensure the type safety, i.e type of data received and InputDstream
* should be same.
*
* For example: Both actorStream and SampleActorReceiver are parameterized
* to same type to ensure type safety.
*/
val lines = ssc.actorStream[String](
  Props(new SampleActorReceiver[String]("akka.tcp://test@%s:%s/user/FeederActor".format(
    host, port.toInt))), "SampleReceiver")
I found that if I collect before I send to the actor it works like a charm:
result.foreachRDD(rdd => rdd.collect().foreach(producer ! _))
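A likely explanation for why collect() helps: the function passed to foreachPartition is serialized and run on the executors, so the captured actor reference is deserialized on a thread that has no ActorSystem in scope. After collect() the sends happen on the driver, where the reference was created (at the cost of pulling every element to the driver). The sketch below is a plain-Java analogy of that failure mode, not Akka or Spark code; ScopedHandle, CURRENT_SYSTEM and roundTrips are hypothetical names invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

// Plain-Java analogy, not Akka code: every name here is hypothetical.
final class ScopedHandle implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for the thread-scoped "current system" named in the error message.
    static final ThreadLocal<String> CURRENT_SYSTEM = new ThreadLocal<>();

    final String path;

    ScopedHandle(String path) {
        this.path = path;
    }

    // Java serialization calls this hook after reading the object back.
    private Object readResolve() throws ObjectStreamException {
        if (CURRENT_SYSTEM.get() == null) {
            throw new InvalidObjectException("deserialized without a system in scope");
        }
        return this;
    }

    // Serializes the handle and reads it back on the current thread.
    // Returns false when the handle refuses to be deserialized.
    static boolean roundTrips(ScopedHandle handle, String systemInScope) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(handle);
            }
            if (systemInScope != null) {
                CURRENT_SYSTEM.set(systemInScope);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
                return true;
            } catch (InvalidObjectException e) {
                return false;
            } finally {
                CURRENT_SYSTEM.remove();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling roundTrips(handle, null) models the executor thread and fails, while roundTrips(handle, "driver-system") models deserializing with a system in scope and succeeds.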
There is a flow as per below scenario.
Initiating Party : PartyA
Responding Party : PartyB
Transaction 1: Input StateA - ContractA results in output StateB - ContractA. Participants are PartyA and PartyB
Transaction 2: Input StateB - ContractA and no output. Participants are PartyA and PartyB
Is this possible in Corda? Please do share an example with response. Thanks.
It sounds like you're getting two different error messages:
If you don't try and initiate a second flow-session to get the second signature, you get something like:
net.corda.core.flows.UnexpectedFlowEndException: Counterparty flow on
O=Mock Company 2, L=London, C=GB has completed without sending data
While if you do initiate a second flow-session to get the second signature, you get something like:
java.lang.IllegalStateException: Attempted to initiateFlow() twice in
the same InitiatingFlow
com.example.flow.ExampleFlow$Initiator@312d7fe4 for the same party
O=Mock Company 2, L=London, C=GB. This isn't supported in this version
of Corda. Alternatively you may initiate a new flow by calling
initiateFlow() in an @InitiatingFlow sub-flow.
In the first case, the error is caused by the fact that the counterparty's flow has already completed. You try and get around this by creating a second flow session, but each Initiating flow can only initiate a single flow-session with a given counterparty.
Instead, you simply need to modify the responder flow to sign twice. For example:
@InitiatedBy(Initiator::class)
class Acceptor(val otherPartyFlow: FlowSession) : FlowLogic<Unit>() {
    @Suspendable
    override fun call() {
        val signTransactionFlow = object : SignTransactionFlow(otherPartyFlow) {
            override fun checkTransaction(stx: SignedTransaction) = requireThat {
                // Transaction checks...
            }
        }
        subFlow(signTransactionFlow)
        subFlow(signTransactionFlow)
    }
}
Yes, it is possible. See the following link for more details:
https://docs.corda.net/key-concepts-transactions.html
The situation is the following.
We have set up SSL + ACLs on the Kafka broker.
We are setting up a stream that reads messages from two topics:
KStream<String, String> stringInput
    = kBuilder.stream( STRING_SERDE, STRING_SERDE, inTopicName );
stringInput
    .filter( streamFilter::passOrFilterMessages )
    .map( processor )
    .to( outTopicName );
This is done twice (in a loop).
Then we set a general error handler:
streams.setUncaughtExceptionHandler( ( Thread t, Throwable e ) -> {
    synchronized ( this ) {
        LOG.fatal( ... );
        this.stop();
    }
} );
The problem is the following. If, for example, the certificate for one topic is no longer valid, the stream throws the exception Not authorized to access topics ...
So far so good.
But the exception is handled by general error handler, so the complete application stops even if the second topic has no problems.
The question is: how can this exception be handled per topic?
How can we avoid the situation where the complete application stops because a single topic has authorization problems?
I understand that if the broker is not available, the complete app may stop. But if only one topic is not available, then only that stream should stop, not the complete application, right?
By design, Kafka Streams treats the topology as one unit and cannot distinguish between the two parts. For your specific case, as you loop and build two independent pipelines, you could run two KafkaStreams instances in parallel (within the same application/JVM) to isolate them from each other. Thus, if one fails, the other is not affected. You would need to use a different application.id for each instance.
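As a minimal sketch of the configuration side of this suggestion, the helper below derives one Properties object per pipeline from shared settings and changes only application.id. The class name PipelineConfigs, the "my-app" base id, the pipeline names and the broker address are assumptions made for the example; only the application.id and bootstrap.servers keys come from Kafka itself.

```java
import java.util.Properties;

final class PipelineConfigs {
    // Copies the shared settings and gives the pipeline its own application.id.
    static Properties forPipeline(Properties shared, String pipelineName) {
        Properties props = new Properties();
        props.putAll(shared);
        // "my-app" is a hypothetical fallback used only by this sketch.
        String baseId = shared.getProperty("application.id", "my-app");
        props.setProperty("application.id", baseId + "-" + pipelineName);
        return props;
    }

    public static void main(String[] args) {
        Properties shared = new Properties();
        shared.setProperty("bootstrap.servers", "localhost:9093"); // placeholder address
        shared.setProperty("application.id", "my-app");

        Properties first = forPipeline(shared, "topic-a");
        Properties second = forPipeline(shared, "topic-b");

        // Each Properties object would configure its own KafkaStreams instance,
        // built from its own topology.
        System.out.println(first.getProperty("application.id"));
        System.out.println(second.getProperty("application.id"));
    }
}
```

With this split, each instance can also get its own uncaught exception handler that stops only that instance instead of calling a shared stop() for the whole application.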
I'm trying to implement a Message Broker set up with Lagom 1.2.2 and have run into a wall. The documentation has the following example for the service descriptor:
default Descriptor descriptor() {
    return named("helloservice").withCalls(...)
        // here we declare the topic(s) this service will publish to
        .publishing(
            topic("greetings", this::greetingsTopic)
        )
        ....;
}
And this example for the implementation:
public Topic<GreetingMessage> greetingsTopic() {
    return TopicProducer.singleStreamWithOffset(offset -> {
        return persistentEntityRegistry
            .eventStream(HelloEventTag.INSTANCE, offset)
            .map(this::convertEvent);
    });
}
However, there's no example of what the argument type or return type of the convertEvent() function are, and this is where I'm drawing a blank. On the other end, the subscriber to the MessageBroker, it seems that it's consuming GreetingMessage objects, but when I create a function convertEvent to return GreetingMessage objects, I get a compilation error:
Error:(61, 21) java: method map in class akka.stream.javadsl.Source<Out,Mat> cannot be applied to given types;
required: akka.japi.function.Function<akka.japi.Pair<com.example.GreetingEvent,com.lightbend.lagom.javadsl.persistence.Offset>,T>
found: this::convertEvent
reason: cannot infer type-variable(s) T
(argument mismatch; invalid method reference
incompatible types: akka.japi.Pair<com.example.GreetingEvent,com.lightbend.lagom.javadsl.persistence.Offset> cannot be converted to com.example.GreetingMessage)
Are there any more thorough examples of how to use this? I've already checked the Chirper sample app and it doesn't seem to have an example of this.
Thanks!
The error message you pasted tells you exactly what map expects:
required: akka.japi.function.Function<akka.japi.Pair<com.example.GreetingEvent,com.lightbend.lagom.javadsl.persistence.Offset>,T>
So, you need to pass a function that takes Pair<GreetingEvent, Offset>. What should the function return? Well, update it to take that, and then you'll get the next error, which once again will tell you what it was expecting you to return, and in this instance you'll find it's Pair<GreetingMessage, Offset>.
To explain what these types are: Lagom needs to track which events have been published to Kafka, so that when you restart a service, it doesn't start from the beginning of your event log and republish every event again. It does this by using offsets. So the event log produces pairs of events and offsets, and you need to transform these events into the messages that will be published to Kafka. When you return the transformed message to Lagom, it needs to be in a pair with the offset that you got from the event log, so that after publishing to Kafka, Lagom can persist the offset and use it as the starting point the next time the service is restarted.
A full example can be seen here: https://github.com/lagom/online-auction-java/blob/a32e696/bidding-impl/src/main/java/com/example/auction/bidding/impl/BiddingServiceImpl.java#L91
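The shape of convertEvent described above can be sketched as follows. To keep the snippet compilable without Lagom or Akka on the classpath, Pair, GreetingEvent and GreetingMessage are local stand-ins with an assumed message field, and the offset is left as a type parameter O; none of this is Lagom's actual code.

```java
final class ConvertEventSketch {
    // Local stand-in for akka.japi.Pair, defined here only for the sketch.
    static final class Pair<A, B> {
        private final A first;
        private final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }

        A first() { return first; }
        B second() { return second; }
    }

    // Hypothetical event and message types; the "message" field is an assumption.
    static final class GreetingEvent {
        final String message;
        GreetingEvent(String message) { this.message = message; }
    }

    static final class GreetingMessage {
        final String message;
        GreetingMessage(String message) { this.message = message; }
    }

    // Takes (event, offset), converts only the event, and passes the offset through
    // unchanged so the caller can record how far publishing has progressed.
    static <O> Pair<GreetingMessage, O> convertEvent(Pair<GreetingEvent, O> pair) {
        GreetingMessage message = new GreetingMessage(pair.first().message);
        return new Pair<>(message, pair.second());
    }
}
```

In the real service the pair type would be akka.japi.Pair and the offset type would be Lagom's Offset, as the compiler error states.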
As a follow up to a previous question of mine, I want to find all 30 pathways that exist between two given nodes within a depth of 4. Something to the effect of this:
start startnode = node(1), endnode = node(1000)
match startnode-[r:rel_Type*1..4]->endnode
return r
limit 30;
My database contains ~50k nodes and 2M relationships.
As expected, the computation time for this query is very large; I even ended up with the following GC message in the message.log file: GC Monitor: Application threads blocked for an additional 14813ms [total block time: 182.589s]. This error keeps occurring and blocks all threads for an indefinite period of time. Therefore, I am looking for a way to reduce the computational strain of this query on the server by optimizing it.
Is there any extension I could use to help optimize this query?
Give this one a try:
https://github.com/wfreeman/findpaths
You can query the extension like so:
.../findpathslen/1/1000/4/30
And it will give you a json response with the paths found. Hopefully that helps you.
The meat of it is here, using the built-in graph algorithm to find paths of a certain length:
@GET
@Path("/findpathslen/{id1}/{id2}/{len}/{count}")
@Produces(Array("application/json"))
def fof(@PathParam("id1") id1:Long, @PathParam("id2") id2:Long, @PathParam("len") len:Int, @PathParam("count") count:Int, @Context db:GraphDatabaseService) = {
  val node1 = db.getNodeById(id1)
  val node2 = db.getNodeById(id2)
  val pathFinder = GraphAlgoFactory.pathsWithLength(Traversal.pathExpanderForAllTypes(Direction.OUTGOING), len)
  val pathIterator = pathFinder.findAllPaths(node1,node2).asScala
  val jsonMap = pathIterator.take(count).map(p => obj(p))
  Response.ok(compact(render(decompose(jsonMap))), MediaType.APPLICATION_JSON).build()
}
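The general idea of bounding both the depth and the number of returned paths can be illustrated outside Neo4j with a small depth-limited search that stops as soon as enough paths are found. This is a sketch over an in-memory adjacency map, not the extension's or Neo4j's implementation; BoundedPaths, sampleGraph and the restriction to paths that never revisit a node are assumptions of the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BoundedPaths {
    // Returns at most maxPaths paths from start to end that use at most maxDepth edges.
    static List<List<Integer>> find(Map<Integer, List<Integer>> adj,
                                    int start, int end, int maxDepth, int maxPaths) {
        List<List<Integer>> found = new ArrayList<>();
        Deque<Integer> path = new ArrayDeque<>();
        path.addLast(start);
        dfs(adj, start, end, maxDepth, maxPaths, path, found);
        return found;
    }

    private static void dfs(Map<Integer, List<Integer>> adj, int node, int end,
                            int maxDepth, int maxPaths,
                            Deque<Integer> path, List<List<Integer>> found) {
        if (found.size() >= maxPaths) {
            return; // enough paths collected, stop exploring
        }
        if (node == end && path.size() > 1) {
            found.add(new ArrayList<>(path));
            return;
        }
        if (path.size() - 1 >= maxDepth) {
            return; // depth budget used up
        }
        for (int next : adj.getOrDefault(node, Collections.<Integer>emptyList())) {
            if (found.size() >= maxPaths) {
                return;
            }
            if (path.contains(next)) {
                continue; // assumption of this sketch: do not revisit nodes
            }
            path.addLast(next);
            dfs(adj, next, end, maxDepth, maxPaths, path, found);
            path.removeLast();
        }
    }

    // Tiny hypothetical graph used only to exercise the sketch.
    static Map<Integer, List<Integer>> sampleGraph() {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        adj.put(1, Arrays.asList(2, 3, 4));
        adj.put(2, Arrays.asList(4));
        adj.put(3, Arrays.asList(4));
        adj.put(4, Arrays.asList(5));
        return adj;
    }
}
```

The early return on maxPaths is what keeps the work bounded: the search never enumerates more paths than were asked for, instead of computing everything and applying the limit afterwards.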
Following Don Syme's blog post (http://blogs.msdn.com/b/dsyme/archive/2010/01/10/async-and-parallel-design-patterns-in-f-reporting-progress-with-events-plus-twitter-sample.aspx), I tried to implement a Twitter stream listener. My goal is to follow the guidance of the Twitter API documentation, which says that "tweets should often be saved or queued before processing when building a high-reliability system".
So my code needs to have two components:
A queue that piles up and processes each status/tweet json
Something to read the twitter stream that dumps to the queue the tweet in json strings
I choose the following:
An agent to which I post each tweet, that decodes the json, and dumps it to database
A simple http webrequest
I would also like to dump any error from inserting into the database to a text file. (I will probably switch to a supervisor agent for all the errors.)
Two problems:
Is my strategy here any good? If I understand correctly, the agent behaves like a smart queue and processes its messages asynchronously (if it has 10 items on its queue it will process a bunch of them at a time, instead of waiting for the 1st one to finish, then the 2nd, etc.), correct?
According to Don Syme's post, everything before the while is isolated, so the StreamWriter and the database dump are isolated. But because I need this, I never close my database connection...?
The code looks something like:
let dumpToDatabase databaseName =
    // opens database connection
    fun tweet -> inserts tweet in database
type Agent<'T> = MailboxProcessor<'T>
let agentDump =
    Agent.Start(fun (inbox: MailboxProcessor<string>) ->
        async {
            use w2 = new StreamWriter(@"\Errors.txt")
            let dumpError = fun (error: string) -> w2.WriteLine(error)
            let dumpTweet = dumpToDatabase "stream"
            while true do
                let! msg = inbox.Receive()
                try
                    let tw = decode msg
                    dumpTweet tw
                with
                | :? MySql.Data.MySqlClient.MySqlException as ex ->
                    dumpError (msg + ex.ToString())
                | _ as ex -> ()
        }
    )
let filter_url = "http://stream.twitter.com/1/statuses/filter.json"
let parameters = "track=RT&"
let stream_url = filter_url
let stream = twitterStream MyCredentials stream_url parameters
while true do
    agentDump.Post(stream.ReadLine())
Thanks a lot !
Edit of code with processor agent:
let dumpToDatabase (tweets: tweet list) =
    bulk insert of tweets in database
let agentProcessor =
    Agent.Start(fun (inbox: MailboxProcessor<string list>) ->
        async {
            while true do
                let! msg = inbox.Receive()
                try
                    msg
                    |> List.map(decode)
                    |> dumpToDatabase
                with
                | _ as ex -> Console.WriteLine("Processor " + ex.ToString())
        }
    )
let agentDump =
    Agent.Start(fun (inbox: MailboxProcessor<string>) ->
        let rec loop messageList count = async {
            try
                let! newMsg = inbox.Receive()
                let newMsgList = newMsg::messageList
                if count = 10 then
                    agentProcessor.Post( newMsgList )
                    return! loop [] 0
                else
                    return! loop newMsgList (count+1)
            with
            | _ as ex -> Console.WriteLine("Dump " + ex.ToString())
        }
        loop [] 0)
let filter_url = "http://stream.twitter.com/1/statuses/filter.json"
let parameters = "track=RT&"
let stream_url = filter_url
let stream = twitterStream MyCredentials stream_url parameters
while true do
    agentDump.Post(stream.ReadLine())
I think that the best way to describe an agent is that it is a running process that keeps some state and can communicate with other agents (or web pages or a database). When writing an agent-based application, you can often use multiple agents that send messages to each other.
I think that the idea to create an agent that reads tweets from the web and stores them in a database is a good choice (though you could also keep the tweets in memory as the state of the agent).
I wouldn't keep the database connection open all the time - MSSQL (and MySQL likely too) implements connection pooling, so it will not close the connection automatically when you release it. This means that it is safer and similarly efficient to reopen the connection each time you need to write data to the database.
Unless you expect to receive a large number of error messages, I would probably do the same for file stream as well (when writing, you can open it, so that new content is added to the end).
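The open-append-close pattern for the error file can be sketched like this. It is shown in Java with the standard java.nio.file API rather than F#, and ErrorLog, append and readAll are names made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

final class ErrorLog {
    // Opens the file, appends one line at the end, and closes it again.
    // The file is created on first use.
    static void append(Path file, String line) {
        try {
            Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper used only to check the result of the sketch.
    static List<String> readAll(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because no writer stays open between calls, a failure in the agent cannot leave the file handle dangling; the trade-off is one open and close per error, which is reasonable when errors are rare.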
F# agents process their queue one message at a time (in your example, you wait for a message using inbox.Receive()). When the queue contains multiple messages, you get them one by one (in a loop).
If you wanted to process multiple messages at once, you could write an agent that waits for, say, 10 messages and then sends them as a list to another agent (which would then perform bulk-processing).
You can also specify a timeout parameter to the Receive method, so you could wait for at most 10 messages as long as they all arrive within one second. This way, you can quite elegantly implement bulk processing that doesn't hold messages for a long time.
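The "at most N messages, but no longer than a fixed window" idea can be sketched with a blocking queue and a timed poll. This is a Java analogy to Receive with a timeout, not F# MailboxProcessor code; BatchingMailbox and nextBatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class BatchingMailbox {
    // Collects up to maxSize messages, but gives up waiting once windowMillis
    // has elapsed, so a partial batch is never held back for long.
    static List<String> nextBatch(BlockingQueue<String> inbox, int maxSize, long windowMillis) {
        List<String> batch = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(windowMillis);
        try {
            while (batch.size() < maxSize) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // window is over
                }
                String msg = inbox.poll(remaining, TimeUnit.NANOSECONDS);
                if (msg == null) {
                    break; // nothing arrived before the window closed
                }
                batch.add(msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, return what we have
        }
        return batch;
    }
}
```

A consumer would call nextBatch in a loop and hand each non-empty list to the bulk-insert step, which plays the role of the second agent in the description above.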