
Akka - Discriminated Unions as messages in F#
I am unable to use discriminated unions as messages to Akka.NET actors. If anyone can point me at an example that does this, it would be much appreciated.
My own attempt at this is at git@github.com:Tweega/AkkaMessageIssue.git (snippets below). It is a cut-down version of a sample found at https://github.com/rikace/AkkaActorModel.git (Chat project).
Problem
The DU message never finds its target on the server actor and is sent to the dead-letter box instead. If I send plain objects, they do arrive.
If I send a DU but set my server actor to listen for generic Objects, the message does arrive, but it prints as
seq [seq [seq []]; seq [seq [seq []]]]
and I can't get at the underlying DU.
The DU I am trying to send as a message:
type PrinterJob =
    | PrintThis of string
    | Teardown
The client code
let system = System.create "MyClient" config
let chatClientActor =
    spawn system "ChatClient" <| fun mailbox ->
        let server = mailbox.Context.ActorSelection("akka.tcp://MyServer@localhost:8081/user/ChatServer")
        let rec loop nick = actor {
            let! (msg: PrinterJob) = mailbox.Receive()
            server.Tell(msg)
            return! loop nick
        }
        loop ""
Messages are forwarded to the client actor from console input:
while true do
    let input = Console.ReadLine()
    chatClientActor.Tell(PrintThis(input))
The server code
let system = System.create "MyServer" config
let chatServerActor =
    spawn system "ChatServer" <| fun (mailbox: Actor<_>) ->
        let rec loop (clients: Akka.Actor.IActorRef list) = actor {
            let! (msg: PrinterJob) = mailbox.Receive()
            printfn "Received %A" msg // Received seq [seq [seq []]; seq [seq [seq []]]] ???
            match msg with
            | PrintThis str ->
                Console.WriteLine("Printing: {0} Do we get this?", str)
                return! loop clients
            | Teardown ->
                Console.WriteLine("Tearing down now")
                return! loop clients
        }
        loop []
Dependencies
(I am not using Paket here); the Package Manager commands are below:
Install-Package Akka -Version 1.4.23
Install-Package Akka.Remote -Version 1.4.23
Install-Package Akka.FSharp -Version 1.4.23
I am hosting the application in net5.0
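The config value passed to System.create is not shown in the snippets above. For context, something along these lines is presumably what the sample uses on the server side (a sketch only, with the port taken from the ActorSelection URL in the client code):
open Akka.FSharp

// Sketch of the server-side HOCON (not from the post); the client system would use
// the name "MyClient" and its own port.
let config =
    Configuration.parse """
    akka {
        actor.provider = remote
        remote.dot-netty.tcp {
            hostname = localhost
            port = 8081
        }
    }
    """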
Constructor argument names - oddity?
When passing class instances as messages, Akka seems to be sensitive to the names of constructor parameters. The message gets handled, but the data is not copied across from client to server: if the class has a property called Username, the constructor parameter cannot be, for example, uName, otherwise its value is null by the time it reaches the server. Code for this is in the params branch.
type DoesWork(montelimar: string) =
    member x.Montelimar = montelimar

type DoesNotWork(montelimaro: string) =
    member x.Montelimar = montelimaro
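For what it's worth, my working assumption (not verified against Akka.NET internals) is that the default JSON serializer matches constructor parameters to serialized property names, so keeping the two aligned, or using an F# record whose fields and constructor match by construction, sidesteps the problem. A sketch:
// Sketch only: the parameter name matches the property name, so deserialization can fill it in.
type AlsoWorks(montelimar: string) =
    member _.Montelimar = montelimar

// A record avoids the naming question entirely.
type Nougat = { Montelimar: string }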

I opened an issue in the Akka.NET repository: https://github.com/akkadotnet/akka.net/issues/5194
And added a detailed reproduction for this: https://github.com/akkadotnet/akka.net/pull/5196
But it looks like Newtonsoft.Json really can't perform this deserialization without being given a type hint, which Akka.NET's network serialization does not do by default for JSON:
type TestUnion =
    | A of string
    | B of int * string

type TestUnion2 =
    | C of string * TestUnion
    | D of int

[<Fact(Skip = "JSON.NET really does not support even basic DU serialization")>]
member _.``JSON.NET must serialize DUs`` () =
    let du = C("a-11", B(11, "a-12"))
    let settings = new JsonSerializerSettings()
    settings.Converters.Add(new DiscriminatedUnionConverter())
    let serialized = JsonConvert.SerializeObject(du, settings)
    let deserialized = JsonConvert.DeserializeObject(serialized, settings)
    Assert.Equal(du :> obj, deserialized)
That test does not pass, and it doesn't use any of Akka.NET's infrastructure at all, so the default JSON serializer simply won't work for real-world F# use cases.
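To make the "type hint" point concrete, here is a minimal sketch of my own (not from the issue or the pull request): Json.NET can round-trip a DU when the caller supplies the concrete target type, but the non-generic overload, which is all a transparent network serializer has to work with, hands back a JObject instead of the union value.
open Newtonsoft.Json
open Newtonsoft.Json.Converters

type TestUnion =
    | A of string
    | B of int * string

let settings = JsonSerializerSettings()
settings.Converters.Add(DiscriminatedUnionConverter())

let du = B(11, "a-12")
let serialized = JsonConvert.SerializeObject(du, settings)

// Works: the generic type parameter acts as the type hint.
let typed = JsonConvert.DeserializeObject<TestUnion>(serialized, settings)

// Does not give a TestUnion back: without a target type this is just a JObject boxed as obj.
let untyped = JsonConvert.DeserializeObject(serialized, settings)

printfn "typed = du: %b, untyped type: %s" (typed = du) (untyped.GetType().Name)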
We can try changing the defaults of our serialization system to include a type hint, but that will take a lot of validation testing (for old Akka.Persistence data serialized without one).
A better solution, which my pull request validates, is to use Hyperion for polymorphic serialization instead; it will be similarly transparent to you, but it has much more robust handling for complex types than Newtonsoft.Json and is actually faster: https://getakka.net/articles/networking/serialization.html#how-to-setup-hyperion-as-default-serializer
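For reference, and assuming the Akka.Serialization.Hyperion package is installed, the docs page linked above boils down to a HOCON change along these lines (a sketch; it has to be applied on both the client and the server system):
open Akka.FSharp

// Sketch only: merge this with the existing remoting configuration.
let config =
    Configuration.parse """
    akka.actor {
        serializers {
            hyperion = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
        }
        serialization-bindings {
            "System.Object" = hyperion
        }
    }
    """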

Related

Scalding Unit Test - How to Write A Local File?

I work at a place where Scalding writes are augmented with a specific API to track dataset metadata. When converting from normal writes to these special writes, there are some intricacies with respect to Key/Value, TSV/CSV, Thrift ... datasets. I would like to verify that the binary file is the same prior to conversion and after conversion to the special API.
Given that I cannot share the specific API for the metadata-inclusive writes, I only ask: how can I write a unit test for the .write method on a TypedPipe?
implicit val timeZone: TimeZone = DateOps.UTC
implicit val dateParser: DateParser = DateParser.default
implicit def flowDef: FlowDef = new FlowDef()
implicit def mode: Mode = Local(true)
val fileStrPath = root + "/test"
println("writing data to " + fileStrPath)
TypedPipe
  .from(Seq[Long](1, 2, 3, 4, 5))
  // .map((x: Long) => { println(x.toString); System.out.flush(); x })
  .write(TypedTsv[Long](fileStrPath))
  .forceToDisk
The above doesn't seem to write anything to local (OSX) disk.
So I wonder if I need to use a MiniDFSCluster something like this:
def setUpTempFolder: String = {
  val tempFolder = new TemporaryFolder
  tempFolder.create()
  tempFolder.getRoot.getAbsolutePath
}

val root: String = setUpTempFolder
println(s"root = $root")

val tempDir = Files.createTempDirectory(setUpTempFolder).toFile
val hdfsCluster: MiniDFSCluster = {
  val configuration = new Configuration()
  configuration.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, tempDir.getAbsolutePath)
  configuration.set("io.compression.codecs", classOf[LzopCodec].getName)
  new MiniDFSCluster.Builder(configuration)
    .manageNameDfsDirs(true)
    .manageDataDfsDirs(true)
    .format(true)
    .build()
}
hdfsCluster.waitClusterUp()

val fs: DistributedFileSystem = hdfsCluster.getFileSystem
val rootPath = new Path(root)
fs.mkdirs(rootPath)
However, my attempts to get this MiniCluster to work haven't panned out either - somehow I need to link the MiniCluster with the Scalding write.
Note: the Scalding JobTest framework for unit testing isn't going to work, because the actual data written is sometimes wrapped in a bijection codec or set up with case-class wrappers before the writes made by the metadata-inclusive write APIs.
Any ideas how I can write a local file (without using the Scalding REPL) with either Scalding alone or a MiniCluster? (If using the latter, I need a hint on how to read the file.)
Answering: there is an example of how to use a mini cluster for exactly this, reading from and writing to HDFS. I will be able to cross-read with my different writes and examine them. Here it is in the tests for Scalding's TypedParquet type.
HadoopPlatformJobTest is an extension for JobTest that uses a MiniCluster.
With some hand-waving over details in the link, the bulk of the code is this:
"TypedParquetTuple" should {
"read and write correctly" in {
import com.twitter.scalding.parquet.tuple.TestValues._
def toMap[T](i: Iterable[T]): Map[T, Int] = i.groupBy(identity).mapValues(_.size)
HadoopPlatformJobTest(new WriteToTypedParquetTupleJob(_), cluster)
.arg("output", "output1")
.sink[SampleClassB](TypedParquet[SampleClassB](Seq("output1"))) {
toMap(_) shouldBe toMap(values)
}
.run()
HadoopPlatformJobTest(new ReadWithFilterPredicateJob(_), cluster)
.arg("input", "output1")
.arg("output", "output2")
.sink[Boolean]("output2")(toMap(_) shouldBe toMap(values.filter(_.string == "B1").map(_.a.bool)))
.run()
}
}

Receive message from an Elm process

I'm toying around with Elm processes in order to learn more about how they work. As part of this, I'm trying to implement a timer.
I bumped into an obstacle, however: I can't find a way to access the result of a process' task in the rest of the code.
For a second, I hoped that if I make the task resolve with a Cmd, the Elm runtime would be kind enough to perform that effect for me, but that was a naive idea:
type Msg
    = Spawned Process.Id
    | TimeIsUp

init _ =
    ( Nothing
    , Task.perform Spawned (Process.spawn backgroundTask)
    )

backgroundTask : Task.Task y (Platform.Cmd.Cmd Msg)
backgroundTask =
    Process.sleep 1000
        -- pathetic attempt to send a Msg starts here
        |> Task.map
            (always
                <| Task.perform (always TimeIsUp)
                <| Task.succeed ()
            )
        -- and ends here
        |> Task.map (Debug.log "Timer finished") -- logs "Timer finished: <internals>"

update msg state =
    case msg of
        Spawned id ->
            ( Just id, Cmd.none )

        TimeIsUp ->
            ( Nothing, Cmd.none )

view state =
    case state of
        Just id ->
            text "Running"

        Nothing ->
            text "Time is up"
The docs say
there is no public API for processes to communicate with each other.
I'm not sure if that implies that a process can't communicate with the rest of the app.
Is there any way to have the update function receive a TimeIsUp message once the process exits?
There is one way but it requires a port of hell:
make a fake HTTP request from the process,
then intercept it via JavaScript
and pass it back to Elm.
port ofHell : (() -> msg) -> Sub msg

subscriptions _ =
    ofHell (always TimeIsUp)

backgroundTask : Task.Task y (Http.Response String)
backgroundTask =
    Process.sleep 1000
        -- nasty hack starts here
        |> Task.andThen
            (always
                <| Http.task
                    { method = "EVIL"
                    , headers = []
                    , url = ""
                    , body = Http.emptyBody
                    , resolver = Http.stringResolver (always Ok "")
                    , timeout = Nothing
                    }
            )
Under the hood, Http.task invokes new XMLHttpRequest(), so we can intercept it by redefining that constructor.
<script src="elm-app.js"></script>
<div id="hack"></div>
<script>
var app = Elm.Hack.init({
    node: document.getElementById('hack')
})

var orig = window.XMLHttpRequest
window.XMLHttpRequest = function () {
    var req = new orig()
    var origOpen = req.open
    req.open = function (method) {
        if (method == 'EVIL') {
            app.ports.ofHell.send(null)
        }
        return origOpen.apply(this, arguments)
    }
    return req
}
</script>
The solution is not production ready, but it does let you continue playing around with Elm processes.
Elm Processes aren't a fully fledged API at the moment. It's not possible to do what you want with the Process library on its own.
See the notes in the docs for Process.spawn:
Note: This creates a relatively restricted kind of Process because it cannot receive any messages. More flexibility for user-defined processes will come in a later release!
and the whole Future Plans section, e.g.:
Right now, this library is pretty sparse. For example, there is no public API for processes to communicate with each other.

What is the correct use of consumer groups in Spring Cloud Stream / Data Flow and RabbitMQ?

A follow-up to this question:
one SCDF source, 2 processors but only 1 processes each item
The two processors (del-1 and del-2) in the picture are receiving the same data within milliseconds of each other. I'm trying to rig this so that del-2 never receives the same thing as del-1, and vice versa. So obviously I've got something configured incorrectly, but I'm not sure where.
My processor has the following application.properties
spring.application.name=${vcap.application.name:sample-processor}
info.app.name=@project.artifactId@
info.app.description=@project.description@
info.app.version=@project.version@
management.endpoints.web.exposure.include=health,info,bindings
spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.security.servlet.SecurityAutoConfiguration
spring.cloud.stream.bindings.input.group=input
Is "spring.cloud.stream.bindings.input.group" specified correctly?
Here's the processor code:
@Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
public Object transform(String inputStr) throws InterruptedException {
    ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
    String message = " I AM [" + inputStr + "] AND I HAVE BEEN PROCESSED!!!!!!!";
    log.info("SampleProcessor.transform() incoming inputStr=" + inputStr);
    return message;
}
Is the @Transformer annotation the proper way to link this bit of code with "spring.cloud.stream.bindings.input.group" from application.properties? Are there any other annotations necessary?
Here's my source:
private String format = "EEEEE dd MMMMM yyyy HH:mm:ss.SSSZ";

@Bean
@InboundChannelAdapter(value = Source.OUTPUT, poller = @Poller(fixedDelay = "1000", maxMessagesPerPoll = "1"))
public MessageSource<String> timerMessageSource() {
    ApplicationLog log = new ApplicationLog(this, "timerMessageSource");
    String message = new SimpleDateFormat(format).format(new Date());
    log.info("SampleSource.timeMessageSource() message=[" + message + "]");
    return () -> new GenericMessage<>(new SimpleDateFormat(format).format(new Date()));
}
I'm confused about the "value = Source.OUTPUT". Does this mean my processor needs to be named differently?
Is the inclusion of @Poller causing me a problem somehow?
This is how I define the 2 processor streams (del-1 and del-2) in SCDF shell:
stream create del-1 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
stream create del-2 --definition ":split > processor-that-does-everything-sleeps5 --spring.cloud.stream.bindings.applicationMetrics.destination=metrics > :merge"
Do I need to do anything differently there?
All of this is running in Docker/K8s.
RabbitMQ is given by bitnami/rabbitmq:3.7.2-r1 and is configured with the following props:
RABBITMQ_USERNAME: user
RABBITMQ_PASSWORD: <redacted>
RABBITMQ_ERL_COOKIE: <redacted>
RABBITMQ_NODE_PORT_NUMBER: 5672
RABBITMQ_NODE_TYPE: stats
RABBITMQ_NODE_NAME: rabbit@localhost
RABBITMQ_CLUSTER_NODE_NAME:
RABBITMQ_DEFAULT_VHOST: /
RABBITMQ_MANAGER_PORT_NUMBER: 15672
RABBITMQ_DISK_FREE_LIMIT: "6GiB"
Are any other environment variables necessary?

Which OO programming style in R will be readable to a Python programmer?

I'm the author of the logging package on CRAN. I don't see myself as an R programmer, so I tried to make it as code-compatible with the Python standard logging package as I could, but now I have a question, and I hope it will give me the chance to learn some more R!
It's about hierarchical loggers. In Python I would create a logger and send it logging records:
l = logging.getLogger("some.lower.name")
l.debug("test")
l.info("some")
l.warn("say no")
In my R package, instead, you do not create a logger to which you send messages; you invoke a function where one of the arguments is the name of the logger, something like:
logdebug("test", logger="some.lower.name")
loginfo("some", logger="some.lower.name")
logwarn("say no", logger="some.lower.name")
The problem is that you have to repeat the name of the logger each time you want to send it a logging message. I was thinking I might create a partially applied function object and invoke that instead, something like:
logdebug <- curry(logging::logdebug, logger="some.lower.logger")
but then I would need to do that for each of the logging functions...
How would you R users approach this?
Sounds like a job for a reference class; see ?setRefClass and ?ReferenceClasses.
Logger <- setRefClass("Logger",
    fields = list(name = "character"),
    methods = list(
        log = function(level, ...)
            { levellog(level, ..., logger = name) },
        debug = function(...) { log("DEBUG", ...) },
        info = function(...) { log("INFO", ...) },
        warn = function(...) { log("WARN", ...) },
        error = function(...) { log("ERROR", ...) }
    ))
and then
> basicConfig()
> l <- Logger$new(name="hierarchic.logger.name")
> l$debug("oops")
> l$info("oops")
2011-02-11 11:54:05 NumericLevel(INFO):hierarchic.logger.name:oops
> l$warn("oops")
2011-02-11 11:54:11 NumericLevel(WARN):hierarchic.logger.name:oops
>
This could be done with the proto package. It supports older versions of R (it has been around for years), so you would not have a problem of old vs. new versions of R.
library(proto)
library(logging)
Logger. <- proto(
    new = function(this, name)
        this$proto(name = name),
    log = function(this, ...)
        levellog(..., logger = this$name),
    setLevel = function(this, newLevel)
        logging::setLevel(newLevel, container = this$name),
    addHandler = function(this, ...)
        logging::addHandler(this, ..., logger = this$name),
    warn = function(this, ...)
        this$log(loglevels["WARN"], ...),
    error = function(this, ...)
        this$log(loglevels["ERROR"], ...)
)
basicConfig()
l <- Logger.$new(name = "hierarchic.logger.name")
l$warn("this may be bad")
l$error("this definitely is bad")
This gives the output:
> basicConfig()
> l <- Logger.$new(name = "hierarchic.logger.name")
> l$warn("this may be bad")
2011-02-28 10:17:54 WARNING:hierarchic.logger.name:this may be bad
> l$error("this definitely is bad")
2011-02-28 10:17:54 ERROR:hierarchic.logger.name:this definitely is bad
In the above we have merely layered proto on top of logging but it would be possible to turn each logging object into a proto object, i.e. it would be both, since both logging objects and proto objects are R environments. That would get rid of the extra layer.
See http://r-proto.googlecode.com for more info.
Why would you repeat the name? It would be more convenient to pass the log object directly to the function, i.e.
logdebug("test",logger=l)
# or
logdebug("test",l)
A bit like the way one would use connections in a number of functions. That seems more like the R way of doing it, I guess.

How to handle authentication and authorization with thrift?

I'm developing a system which uses Thrift. I'd like clients' identities to be checked and operations to be ACLed. Does Thrift provide any support for this?
Not directly. The only way to do this is to have an authentication method which creates a (temporary) key on the server, and then change all your methods so that the first argument is this key and they all additionally raise a not-authenticated error. For instance:
exception NotAuthorisedException {
    1: string errorMessage,
}

exception AuthTimeoutException {
    1: string errorMessage,
}

service MyAuthService {
    string authenticate( 1: string user, 2: string pass )
        throws ( 1: NotAuthorisedException e ),
    string mymethod( 1: string authstring, 2: string otherargs, ... )
        throws ( 1: AuthTimeoutException e, ... ),
}
We use this method and save our keys to a secured memcached instance with a 30min timeout for keys to keep everything "snappy". Clients who receive an AuthTimeoutException are expected to reauthorise and retry and we have some firewall rules to stop brute-force attacks.
Tasks like authorisation and permissions are not considered a part of Thrift, mostly because these things are (usually) more related to the application logic than to a general RPC/serialization concept. The only thing that Thrift supports out of the box right now is the TSASLTransport. I can't say much about that one myself, simply because I never felt the need to use it.
The other option could be to make use of THeaderTransport, which unfortunately at the time of writing is only implemented in C++. Hence, if you plan to use it with some other language, you may have to invest some additional work. Needless to say, we accept contributions ...
A bit late (I guess very late), but I modified the Thrift source code for this a couple of years ago.
I have just submitted a ticket with the patch to https://issues.apache.org/jira/browse/THRIFT-4221 for just this.
Have a look at that. Basically, the proposal is to add a "BeforeAction" hook that does exactly that.
Example of the generated Golang diff:
+ // Called before any other action is called
+ BeforeAction(serviceName string, actionName string, args map[string]interface{}) (err error)
+ // Called if an action returned an error
+ ProcessError(err error) error
}
type MyServiceClient struct {
@@ -391,7 +395,12 @@ func (p *myServiceProcessorMyMethod) Process(seqId int32, iprot, oprot thrift.TP
result := MyServiceMyMethodResult{}
var retval string
var err2 error
- if retval, err2 = p.handler.MyMethod(args.AuthString, args.OtherArgs_); err2 != nil {
+ err2 = p.handler.BeforeAction("MyService", "MyMethod", map[string]interface{}{"AuthString": args.AuthString, "OtherArgs_": args.OtherArgs_})
+ if err2 == nil {
+ retval, err2 = p.handler.MyMethod(args.AuthString, args.OtherArgs_)
+ }
+ if err2 != nil {
+ err2 = p.handler.ProcessError(err2)