How to integrate Akka.NET Streams with ASP.NET Core or Giraffe - asp.net-core

When I use Giraffe or ASP.NET Core in general, I can create an actor system, add it as a service, and then resolve it in the request handler, select any actor, and ask/tell it a message.
Whether I use Cluster.Sharding or a normal user actor, I know there will be a single instance of the actor in the whole system processing multiple messages.
How can I do the same kind of communication with Streams? They don't seem to be referenced in the router or the actor system the way actor paths are (see Actor References, Paths and Addresses).
Should this be done differently?
Copying from the IO section, I could materialize one graph to handle each request, but in general I communicate with "singletons" like Domain-Driven Design aggregate roots to handle the domain logic (that's why the sharding module). I'm not sure how to create singleton sinks that can be used in the newly materialized graph in the request handler, as there must be only one sink for all the requests.

There are many ways to integrate Akka.NET Streams with external systems. The one that makes an easy recipient would be Source.queue (somewhat similar to System.Threading.Channels, and predating them). You can materialize your stream at initialization and then register the queue endpoints in Giraffe DI - this way you don't pay the cost of initializing the same stream on every request:
open Akka.Streams
open Akkling
open Akkling.Streams
open FSharp.Control.Tasks.Builders

let run () = task {
    use sys = System.create "sys" <| Configuration.defaultConfig()
    use mat = sys.Materializer()
    // construct a stream with an async queue on both ends and a buffer for 10 elements
    let sender, receiver =
        Source.queue OverflowStrategy.Backpressure 10
        |> Source.map (fun x -> x * x)
        |> Source.toMat Sink.queue Keep.both
        |> Graph.run mat
    // send data to the queue - quite often the result can simply be ignored
    match! sender.OfferAsync 2 with
    | :? QueueOfferResult.Enqueued -> () // successful
    | :? QueueOfferResult.Dropped -> () // doesn't happen with OverflowStrategy.Backpressure
    | :? QueueOfferResult.QueueClosed -> () // queue has already been closed
    | :? QueueOfferResult.Failure as f -> eprintfn "Unexpected failure: %O" f.Cause
    // try to receive data from the queue
    match! receiver.AsyncPull() with
    | Some data -> printfn "Received: %i" data
    | None -> printfn "Stream has been prematurely closed"
    // asynchronously close the queue
    sender.Complete()
    do! sender.WatchCompletionAsync()
}
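To make the "register queue endpoints in Giraffe DI" part concrete, here is a minimal sketch under some assumptions: the sender queue materialized above is an Akka.Streams ISourceQueueWithComplete<int>, and configureServices and squareHandler are illustrative names rather than anything from the original answer:
open Akka.Streams
open Giraffe
open Microsoft.AspNetCore.Http
open Microsoft.Extensions.DependencyInjection

// register the already-materialized sender queue as a singleton service,
// so every request reuses the same long-lived stream
let configureServices (sender: ISourceQueueWithComplete<int>) (services: IServiceCollection) =
    services.AddSingleton<ISourceQueueWithComplete<int>>(sender) |> ignore

// a hypothetical handler that resolves the queue from DI and offers an element to it
let squareHandler (n: int) : HttpHandler =
    fun next (ctx: HttpContext) -> task {
        let queue = ctx.GetService<ISourceQueueWithComplete<int>>()
        let! _ = queue.OfferAsync n // enqueue into the running stream; the result can often be ignored
        return! text "queued" next ctx
    }
With this shape, the per-request cost is a single OfferAsync call, and backpressure comes from the OverflowStrategy.Backpressure setting chosen above.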

Related

Reactor Kafka: consume messages synchronously and process them asynchronously

I'm quite new to the reactive world and am using Spring WebFlux + Reactor Kafka.
kafkaReceiver
    .receive()
    // .publishOn(Schedulers.boundedElastic())
    .doOnNext(a -> log.info("Reading message: {}", a.value()))
    .concatMap(kafkaRecord ->
        // perform DB operation
        // kafkaRecord.receiverOffset.acknowledge
    )
    .doOnError(e -> log.error("Error", e))
    .retry()
    .subscribe();
I understood that in order to parallelise message consumption I have to instantiate one KafkaReceiver per partition, but is it possible/recommended for a partition to read messages synchronously and process them asynchronously (including the manual acknowledge)?
So that this is the desired output:
Reading message:1
Reading message:2
Reading message:3
Reading message:4
Stored message 1 in DB + ack
Reading message:5
Stored message 2 in DB + ack
Stored message 5 in DB + ack
Stored message 3 in DB + ack
Stored message 4 in DB + ack
In case of errors, I'm thinking of publishing the record to a DLT.
I've tried with flatMap too, but it seems that the entire processing happens sequentially on a single thread. And if I publish to a new scheduler, the processing just happens on a different single thread.
If what I'm asking is possible, can someone please help me with a code snippet?
What's the output of your current code's log?

What does it mean that an object handle has so many TimerQueueTimer references

I have an app in which I suspect a memory leak. Not only in the heap: it seems to me the whole working set is growing with each request made to my app. I am trying to debug it according to these instructions, but I am having a hard time interpreting what I see. I am using the dotnet-dump tool to analyze a dump.
All in all, I have 618 DocumentClient instances, if I interpret it correctly. Of course that will add up to a lot of data in strings, byte arrays, etc.
Statistics:
MT Count TotalSize Class Name
00007f853c355110 618 187872 Microsoft.Azure.Cosmos.DocumentClient
Here is a snippet of a single reference taken from the method table of the DocumentClient. See the pastebin for the full reference; it continues for 1200+ lines, mostly TimerQueueTimer references.
00007F85AF2F10D8 (strong handle)
-> 00007F84C80FBAD8 System.Object[]
-> 00007F84C80FBB00 System.Threading.ThreadLocal`1+LinkedSlotVolatile[[System.Collections.Concurrent.ConcurrentBag`1+WorkStealingQueue[[System.IDisposable, System.Private.CoreLib]], System.Collections.Concurrent]][]
-> 00007F84C80FBB40 System.Threading.ThreadLocal`1+LinkedSlot[[System.Collections.Concurrent.ConcurrentBag`1+WorkStealingQueue[[System.IDisposable, System.Private.CoreLib]], System.Collections.Concurrent]]
-> 00007F84C80FBB70 System.Collections.Concurrent.ConcurrentBag`1+WorkStealingQueue[[System.IDisposable, System.Private.CoreLib]]
-> 00007F84C80FBBB0 System.IDisposable[]
-> 00007F84C80FBA90 System.Diagnostics.DiagnosticListener+DiagnosticSubscription
-> 00007F84C80FAF30 Microsoft.ApplicationInsights.AspNetCore.DiagnosticListeners.HostingDiagnosticListener
-> 00007F84C80EB450 Microsoft.ApplicationInsights.Extensibility.TelemetryConfiguration
-> 00007F84C80D5688 Microsoft.ApplicationInsights.Extensibility.Implementation.ApplicationId.ApplicationInsightsApplicationIdProvider
-> 00007F84C80D5A60 Microsoft.ApplicationInsights.Extensibility.Implementation.ApplicationId.ProfileServiceWrapper
-> 00007F84C80D5A88 System.Net.Http.HttpClient
-> 00007F84C80D5AD0 System.Net.Http.HttpClientHandler
-> 00007F84C80D5B00 System.Net.Http.SocketsHttpHandler
-> 00007F84D80D1018 System.Net.Http.RedirectHandler
-> 00007F84D80D1000 System.Net.Http.HttpConnectionHandler
-> 00007F84D80D0D38 System.Net.Http.HttpConnectionPoolManager
-> 00007F84D80D0F70 System.Threading.Timer
-> 00007F84D80D0FE8 System.Threading.TimerHolder
-> 00007F84D80D0F88 System.Threading.TimerQueueTimer
-> 00007F84C80533A0 System.Threading.TimerQueue
-> 00007F84D910F3C0 System.Threading.TimerQueueTimer
-> 00007F84D910EE58 System.Threading.TimerQueueTimer
-> 00007F84D910A680 System.Threading.TimerQueueTimer
https://pastebin.com/V8CNQjR7
Do I have an Application Insights or Cosmos memory leak? Why are there so many TimerQueueTimer references?
await Task.Delay creates a new TimerQueueTimer on every call.
Lots of TimerQueueTimer instances are a sign that someone is using await Task.Delay() in a loop instead of a simple new Timer().
-> Microsoft.Azure.Cosmos.Routing.GlobalEndpointManager+<StartRefreshLocationTimer>d__25
-> Microsoft.Azure.Cosmos.Routing.GlobalEndpointManager
It looks like the GlobalEndpointManager of Microsoft.Azure.Cosmos uses await Task.Delay every time an exception is thrown in the StartRefreshLocationTimer method of the GlobalEndpointManager.cs class.
You can try a few things here:
1) Check which exception is thrown and how to avoid it.
My guess is that this should help log the exception:
DefaultTrace.TraceSource.Listeners.Add(new System.Diagnostics.ConsoleTraceListener())
(check the example)
2) Make sure ShouldRefreshEndpoints returns false, if that's OK for your app :)
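To illustrate the difference (a hedged F# sketch, not the actual Cosmos SDK code; pollWithDelay and pollWithTimer are made-up names, and a task builder such as FSharp.Control.Tasks is assumed):
open System
open System.Threading
open System.Threading.Tasks
open FSharp.Control.Tasks.Builders

// each iteration schedules a brand-new TimerQueueTimer under the hood
let pollWithDelay (work: unit -> unit) = task {
    while true do
        work ()
        do! Task.Delay(TimeSpan.FromSeconds 5.)
}

// a single Timer object fires the callback periodically instead;
// the caller is responsible for disposing the returned timer
let pollWithTimer (work: unit -> unit) =
    new Timer((fun _ -> work ()), null, TimeSpan.Zero, TimeSpan.FromSeconds 5.)
Each pending Task.Delay corresponds to one live TimerQueueTimer, so hundreds of such delay loops show up as the long TimerQueueTimer chains in the dump.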

filterLogging not working on Database.Persist.Sql's runSqlPool function

The Haskell library Database.Persist.Sqlite includes functions that run within a LoggingT context, to control debugging output. So I expected to be able to limit the debugging output they produce, thus:
runStdoutLoggingT . filterLogger (\_ _ -> False) (runSqlPool (insertBy myData) myPool)
(condensed and simplified from my actual code). However, it doesn't suppress the logging. The evaluation of insertBy produces a line on stdout of the form
[Debug#SQL] SELECT "id","key","data_source_row_id","loaded" FROM "data_row" WHERE "key"=? AND "data_source_row_id"=?; [PersistText blahblahblah]
So why isn't the output suppressed by the filterLogger call?
Since the question has received two downvotes, I'll add that the pattern shown above (i.e., runStdoutLoggingT . filterLogger) is used in many GitHub projects and I can't see how my application is any different. It is somewhat frustrating to be downvoted without explanation or means of recourse.
The architecture of Persistent is a little circuitous and underdocumented:
withSqlPool takes a builder. The builder is able to build a SqlBackend out of any "logging function" (basically the internal type that MonadLogger uses). The function then creates a resource pool of SqlBackends, for you to acquire and release and use. This is the continuation argument Pool SqlBackend -> m a you pass in. In return, withSqlPool promises to give you back a bunch of side effects, typed as (MonadIO m, MonadBaseControl IO m, MonadLogger m) => m a.
runSqlPool, on the other hand, takes a MonadBaseControl IO m => ReaderT SqlBackend m a and a Pool SqlBackend and returns m a. We can infer from this that it basically acquires a SqlBackend from the resource pool, uses it to construct and run a SQL query, and then returns MonadBaseControl IO => m a. Indeed, its documentation is "Get a connection from the pool, run the given action, and then return the connection to the pool."
Though named similarly, they do two very different things. The first function constructs the resource pool and the second function uses it. Most Persistent SQL code will have this shape:
withSqlPool (\logFunc -> do
        conn <- makeConnection connectionString
        return SqlBackend { ... , connLogFunc = logFunc })
    numberOfOpenConnections
    (\pool -> do
        runSqlPool (insertBy myData) pool
        runSqlPool (anotherTransaction moreData) pool)
In fact, if you're using persistent-postgresql, the above is simply the expanded form of
withPostgresqlPool connectionString
    numberOfOpenConnections
    (\pool -> do
        runSqlPool (insertBy myData) pool
        runSqlPool (anotherTransaction moreData) pool)
But wait! We can't quite execute this as an IO action yet. MonadIO m, MonadBaseControl IO m, MonadLogger m are our constraints and it's that third one that we have to discharge:
main :: IO ()
main =
    runStdoutLoggingT $
        withPostgresqlPool connectionString
            numberOfOpenConnections
            (\pool -> do
                runSqlPool (insertBy myData) pool
                runSqlPool (anotherTransaction moreData) pool
                return ())
When the third constraint disappears, we're able to unify IO () with (MonadIO m, MonadBaseControl IO m) => m () by realizing m ~ IO.
It's now, at this stage, that we're able to insert our filterLogger – right before the constraint is discharged with runStdoutLoggingT:
main :: IO ()
main =
    runStdoutLoggingT . filterLogger (\_ _ -> False) $
        withPostgresqlPool connectionString
            numberOfOpenConnections
            (\pool -> do
                runSqlPool (insertBy myData) pool
                runSqlPool (anotherTransaction moreData) pool
                return ())
Overall, a mess created by bad naming and the underwhelmingly documented Database.Persist.Sql module.
Let's underline the point: runSqlPool simply inherits the logging behavior from the MonadLogger constraint generated by withSqlPool. It is only at the withSqlPool level that we're able to insert the desired filterLogger call.

eunit: How to test a simple process?

I'm currently writing a test for a module that runs in a simple process started with spawn_link(?MODULE, init, [self()]).
In my eunit tests, I have a setup and teardown function defined and a set of test generators.
all_tests_test_() ->
    {inorder, {
        foreach,
        fun setup/0,
        fun teardown/1,
        [
            fun my_test/1
        ]}
    }.
The setup fun creates the process-under-test:
setup() ->
    {ok, Pid} = protocol:start_link(),
    process_flag(trap_exit, true),
    error_logger:info_msg("[~p] Setting up process ~p~n", [self(), Pid]),
    Pid.
The test looks like this:
my_test(Pid) ->
    [ fun() ->
        error_logger:info_msg("[~p] Sending to ~p~n", [self(), Pid]),
        Pid ! something,
        receive
            Msg -> ?assertMatch(expected_result, Msg)
        after
            500 -> ?assert(false)
        end
    end ].
Most of my modules are gen_servers, but for this one I figured it would be easier without all the gen_server boilerplate code...
The output from the test looks like this:
=INFO REPORT==== 31-Mar-2014::21:20:12 ===
[<0.117.0>] Setting up process <0.122.0>
=INFO REPORT==== 31-Mar-2014::21:20:12 ===
[<0.124.0>] Sending to <0.122.0>
=INFO REPORT==== 31-Mar-2014::21:20:12 ===
[<0.122.0>] Sending expected_result to <0.117.0>
protocol_test: my_test...*failed*
in function protocol_test:'-my_test/1-fun-0-'/0 (test/protocol_test.erl, line 37)
**error:{assertion_failed,[{module,protocol_test},
{line,37},
{expression,"false"},
{expected,true},
{value,false}]}
From the pids you can see that whatever process was running setup (117) was not the same one running the test case (124). The process under test, however, is the same (122). This results in a failing test case, because the receive never gets the message and runs into the timeout.
Is that the expected behaviour that a new process gets spawned by eunit to run the test case?
And generally, is there a better way to test a process or other asynchronous behaviour (like casts)? Or would you suggest always using gen_server to get a synchronous interface?
Thanks!
[EDIT]
To clarify how protocol knows about the calling process, this is the start_link/0 fun:
start_link() ->
    Pid = spawn_link(?MODULE, init, [self()]),
    {ok, Pid}.
The protocol is tightly linked to the caller. If either of them crashes, I want the other one to die as well. I know I could use gen_server and supervisors, and I actually did that in other parts of the application, but for this module I thought it was a bit over the top.
Did you try:
all_tests_test_() ->
    {inorder, {
        foreach,
        local,
        fun setup/0,
        fun teardown/1,
        [
            fun my_test/1
        ]}
    }.
From the doc, it seems to be what you need.
simple solution
Just like in Pascal's answer, adding the local flag to the test description might solve your problem, but it will probably cause you additional problems in the future, especially since you link yourself to the created process.
testing processes
General practice in Erlang is that while the process abstraction is crucial for writing (designing and thinking about) programs, it is not something that you would expose to the user of your code (even if that user is you). Instead of expecting someone to send you a message with the proper data, you wrap it in a function call:
get_me_some_expected_result(Pid) ->
    Pid ! something,
    receive
        Msg ->
            Msg
    after 500 ->
        timeouted
    end.
and then test this function rather than receiving something "by hand".
To distinguish a real timeout from a received timeouted atom, one can use some pattern matching and let it fail in case of error:
get_me_some_expected_result(Pid) ->
    Pid ! something,
    receive
        Msg ->
            {ok, Msg}
    after 500 ->
        timeouted
    end.

in_my_test(Pid) ->
    {ok, ValueToBeTested} = get_me_some_expected_result(Pid).
In addition, since your process could receive many different messages in the meantime, you can make sure that you receive what you think you're receiving with a little pattern matching and a local reference:
get_me_some_expected_result(Pid) ->
    Ref = make_ref(),
    Pid ! {something, Ref},
    receive
        {Ref, Msg} ->
            {ok, Msg}
    after 500 ->
        timeouted
    end.
And now the receive will ignore (leave for later) all messages that do not carry the same Ref that you sent to your process.
major concern
One thing that I do not really understand is how the process you are testing knows where to send back the received message. The only logical solution would be getting the pid of its creator during initialization (the call to self/0 inside the protocol:start_link/0 function). But then our new process can communicate only with its creator, which might not be what you expect, and which is not how the tests are run.
So the simplest solution would be sending a "return address" with each call, which again can be done in our wrapping function:
get_me_some_expected_result(Pid) ->
    Ref = make_ref(),
    Pid ! {something, Ref, self()},
    receive
        {Ref, Msg} ->
            {ok, Msg}
    after 500 ->
        timeouted
    end.
Again, anyone who uses this get_me_some_expected_result/1 function will not have to worry about message passing, and testing such functions is much easier.
Hope this helps at least a little.
Maybe it's simply because you are using the foreach EUnit fixture in place of the setup one.
If so, try the setup fixture: the one that uses {setup, Setup, Cleanup, Tests} instead of {inorder, {foreach, …}}.

Twitter stream API with agents in F#

From Don Syme's blog (http://blogs.msdn.com/b/dsyme/archive/2010/01/10/async-and-parallel-design-patterns-in-f-reporting-progress-with-events-plus-twitter-sample.aspx) I tried to implement a Twitter stream listener. My goal is to follow the guidance of the Twitter API documentation, which says "that tweets should often be saved or queued before processing when building a high-reliability system".
So my code needs to have two components:
A queue that piles up and processes each status/tweet JSON
Something that reads the Twitter stream and dumps each tweet to the queue as a JSON string
I chose the following:
An agent to which I post each tweet, which decodes the JSON and dumps it to the database
A simple HTTP web request
I would also like to dump into a text file any error from inserting into the database. (I will probably switch to a supervisor agent for all the errors.)
Two problems:
Is my strategy here any good? If I understand correctly, the agent behaves like a smart queue and processes its messages asynchronously (if it has 10 items in its queue it will process a bunch of them at a time, instead of waiting for the 1st one to finish, then the 2nd, etc.), correct?
According to Don Syme's post, everything before the while is Isolated, so the StreamWriter and the database dump are Isolated. But because I need this, I never close my database connection...?
The code looks something like:
let dumpToDatabase databaseName =
    // opens database connection
    fun tweet -> inserts tweet in database

type Agent<'T> = MailboxProcessor<'T>

let agentDump =
    Agent.Start(fun (inbox: MailboxProcessor<string>) ->
        async {
            use w2 = new StreamWriter(@"\Errors.txt")
            let dumpError = fun (error: string) -> w2.WriteLine(error)
            let dumpTweet = dumpToDatabase "stream"
            while true do
                let! msg = inbox.Receive()
                try
                    let tw = decode msg
                    dumpTweet tw
                with
                | :? MySql.Data.MySqlClient.MySqlException as ex ->
                    dumpError (msg + ex.ToString())
                | _ as ex -> ()
        })

let filter_url = "http://stream.twitter.com/1/statuses/filter.json"
let parameters = "track=RT&"
let stream_url = filter_url
let stream = twitterStream MyCredentials stream_url parameters
while true do
    agentDump.Post(stream.ReadLine())
Thanks a lot!
Edit of code with processor agent:
let dumpToDatabase (tweets: tweet list) =
    bulk insert of tweets in database

let agentProcessor =
    Agent.Start(fun (inbox: MailboxProcessor<string list>) ->
        async {
            while true do
                let! msg = inbox.Receive()
                try
                    msg
                    |> List.map decode
                    |> dumpToDatabase
                with
                | _ as ex -> Console.WriteLine("Processor " + ex.ToString())
        })

let agentDump =
    Agent.Start(fun (inbox: MailboxProcessor<string>) ->
        let rec loop messageList count = async {
            try
                let! newMsg = inbox.Receive()
                let newMsgList = newMsg :: messageList
                if count = 10 then
                    agentProcessor.Post(newMsgList)
                    return! loop [] 0
                else
                    return! loop newMsgList (count + 1)
            with
            | _ as ex -> Console.WriteLine("Dump " + ex.ToString())
        }
        loop [] 0)

let filter_url = "http://stream.twitter.com/1/statuses/filter.json"
let parameters = "track=RT&"
let stream_url = filter_url
let stream = twitterStream MyCredentials stream_url parameters
while true do
    agentDump.Post(stream.ReadLine())
I think that the best way to describe an agent is that it is a running process that keeps some state and can communicate with other agents (or web pages or databases). When writing an agent-based application, you can often use multiple agents that send messages to each other.
I think that the idea of creating an agent that reads tweets from the web and stores them in a database is a good choice (though you could also keep the tweets in memory as the state of the agent).
I wouldn't keep the database connection open all the time - MSSQL (and likely MySQL too) implements connection pooling, so the underlying connection will not actually be closed when you release it. This means that it is safer, and similarly efficient, to reopen the connection each time you need to write data to the database.
Unless you expect to receive a large number of error messages, I would probably do the same for the file stream as well (you can open it for appending when writing, so that new content is added to the end).
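As a sketch of the reopen-per-write advice for the database side (assuming MySql.Data, since the question catches MySqlException; the tweets table and json column are made up for the example):
open MySql.Data.MySqlClient

let dumpToDatabase (connectionString: string) (tweetJson: string) =
    // open per write: with connection pooling, Dispose merely returns the
    // physical connection to the pool instead of actually closing it
    use conn = new MySqlConnection(connectionString)
    conn.Open()
    use cmd = new MySqlCommand("INSERT INTO tweets (json) VALUES (@json)", conn)
    cmd.Parameters.AddWithValue("@json", tweetJson) |> ignore
    cmd.ExecuteNonQuery() |> ignore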
The way the queue of an F# agent works is that it processes messages one by one (in your example, you're waiting for a message using inbox.Receive()). When the queue contains multiple messages, you'll get them one by one (in a loop).
If you wanted to process multiple messages at once, you could write an agent that waits for, say, 10 messages and then sends them as a list to another agent (which would then perform bulk-processing).
You can also specify a timeout parameter for the Receive method, so you could wait for at most 10 messages as long as they all arrive within one second - this way, you can quite elegantly implement bulk processing that doesn't hold messages for a long time.
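Here is a sketch of that bulk-processing idea, using TryReceive (the variant of Receive that returns None on timeout instead of throwing); processBatch and the 10-message/one-second limits are illustrative:
let batchingAgent (processBatch: string list -> unit) =
    MailboxProcessor<string>.Start(fun inbox ->
        let rec loop batch count = async {
            // wait up to one second for the next message
            let! msg = inbox.TryReceive(1000)
            match msg with
            | Some m when count + 1 >= 10 ->
                processBatch (List.rev (m :: batch)) // flush a full batch
                return! loop [] 0
            | Some m ->
                return! loop (m :: batch) (count + 1)
            | None ->
                // timeout: flush whatever has accumulated so far
                if not (List.isEmpty batch) then processBatch (List.rev batch)
                return! loop [] 0
        }
        loop [] 0)
This is the same loop-with-state pattern as the agentDump in the question, with the timeout driving the flush so messages are never held longer than about a second.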