I need to merge events coming from two different event sourcing systems handled by the Akka.NET Persistence module. The merge must sort events by their timestamp, and I found the MergeSorted operator in Akka.Streams that does exactly what I need (tried with two lists of numbers; for events I wrote a custom EventEnvelopComparer).
In my solution I have one actor system (readSystem1) to read from db1, and a second actor system (readSystem2) to read from db2, both created by passing the right connection string to the database (a Postgres DB).
The problem is: when I use the MergeSorted operator, I need to pass an instance of ActorMaterializer, and if the ActorMaterializer is created in the readSystem1 actor system then only the events from db1 are loaded (and merged with themselves); the opposite happens if I create the ActorMaterializer in readSystem2. I need to load them both.
Here is an example of the code (writing timestamps to a file, just to test them):
var actorMaterializer1 = ActorMaterializer.Create(
readSystem1,
ActorMaterializerSettings.Create(readSystem1).WithDebugLogging(true));
var readJournal1 = PersistenceQuery.Get(readSystem1)
.ReadJournalFor<SqlReadJournal>(SqlReadJournal.Identifier);
var source1 = readJournal1.CurrentEventsByPersistenceId("mypersistenceId", 0L, long.MaxValue);
await source1
.Select(x => ByteString.FromString($"{x.Timestamp.ToString()}{Environment.NewLine}"))
.RunWith(FileIO.ToFile(new FileInfo("c:\\tmp\\timestamps1.txt")), actorMaterializer1);
// just creating the materializer changes the events loaded by the source!!!
var actorMaterializer2 = ActorMaterializer.Create(
readSystem2,
ActorMaterializerSettings.Create(readSystem1).WithDebugLogging(true));
var readJournal2 = PersistenceQuery.Get(readSystem2)
.ReadJournalFor<SqlReadJournal>(SqlReadJournal.Identifier);
var source2 = readJournal2.CurrentEventsByPersistenceId("mypersistenceId", 0L, long.MaxValue);
await source2
.Select(x => ByteString.FromString($"{x.Timestamp.ToString()}{Environment.NewLine}"))
.RunWith(FileIO.ToFile(new FileInfo("c:\\tmp\\timestamps2.txt")), actorMaterializer2);
// RunWith receives actorMaterializer1, so only events coming from db1
// will be loaded and merged with themselves
var source = source1.MergeSorted(source2, new EventEnvelopComparer());
await source
.Select(x => ByteString.FromString($"{x.Timestamp.ToString()}{Environment.NewLine}"))
.RunWith(FileIO.ToFile(new FileInfo("c:\\tmp\\timestamps.txt")), actorMaterializer1);
How can I accomplish this? Is it possible to read two different event sourcing tables from the same actor system, in the same or in different databases? Is there something about the ActorMaterializer that can solve my problem? Is my approach completely wrong?
To use events from two different ActorSystems I think you'd need to use StreamRefs. But what you could do here is configure two read journal IDs, each pointing to a different database. That way you can use one ActorSystem and one materializer.
var source1 = PersistenceQuery.Get(actorSystem).ReadJournalFor<SqlReadJournal>("read-journal-1")
.CurrentEventsByPersistenceId("sample-id-1", 0L, long.MaxValue);
var source2 = PersistenceQuery.Get(actorSystem).ReadJournalFor<SqlReadJournal>("read-journal-2")
.CurrentEventsByPersistenceId("sample-id-1", 0L, long.MaxValue);
var source = source1.MergeSorted(source2, new EventEnvelopComparer())
.RunForeach(x => System.Console.WriteLine($"EVENT: {x.Timestamp}"), actorSystem.Materializer());
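The two read journal IDs referenced above come from HOCON configuration. As a rough sketch only (the plugin class, the write-plugin key, and the journal plugin names are assumptions that depend on the exact Akka.Persistence.Query.Sql / Akka.Persistence.PostgreSql packages and versions you use), it could look something like this, with each read journal pointing at a journal plugin configured with its own connection string:
// Sketch only; requires: using Akka.Actor; using Akka.Configuration;
var config = ConfigurationFactory.ParseString(@"
    read-journal-1 {
        class = ""Akka.Persistence.Query.Sql.SqlReadJournalProvider, Akka.Persistence.Query.Sql""
        write-plugin = ""akka.persistence.journal.journal-db1""   # journal plugin configured with db1's connection string
    }
    read-journal-2 {
        class = ""Akka.Persistence.Query.Sql.SqlReadJournalProvider, Akka.Persistence.Query.Sql""
        write-plugin = ""akka.persistence.journal.journal-db2""   # journal plugin configured with db2's connection string
    }");
var actorSystem = ActorSystem.Create("readSystem", config);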
I think I see what's going on here....
Here's why your choice of Materializer is creating an issue for you: the Materializer is going to compile your Akka.Persistence.Query + Akka.Streams graphs into actors. When you use the Materializer from ActorSystem A, it's going to materialize the actors into that ActorSystem and use its Journal implementation for Akka.Persistence; that's why you only get events from one ActorSystem.
However, it looks like you're doing exactly what I would do - pre-materializing each source before merging them together... Would it be possible to create a reproduction of this on GitHub using a dummy SQLite database or some such? If so I'd be happy to debug it.
We are using a pretty simple flow where messages are retrieved from PubSub, their JSON content is flattened into two types (for BigQuery and Postgres), and then inserted into both sinks.
But we are seeing duplicates in both sinks (Postgres was more or less fixed with a unique constraint and an ON CONFLICT ... DO NOTHING).
At first we trusted the insertId UUID that Apache Beam/BigQuery supposedly creates.
Then we added a "unique_label" attribute to each message before queueing them into PubSub, built from data in the JSON itself, which makes them unique (a device_id plus a reading's timestamp), and subscribed to the topic using that attribute with the withIdAttribute method.
Finally we paid for GCP Support, and their "solutions" do not work. They have even told us to use the Reshuffle transform (which is deprecated, by the way) and some windowing (which we do not want, since we want near-real-time data).
This is the main flow, pretty basic:
[UPDATED WITH LATEST CODE]
Pipeline
val options = PipelineOptionsFactory.fromArgs(*args).withValidation().`as`(OptionArgs::class.java)
val pipeline = Pipeline.create(options)
var mappings = ""
// Value only available at runtime
if (options.schemaFile.isAccessible){
mappings = readCloudFile(options.schemaFile.get())
}
val tableRowMapper = ReadingToTableRowMapper(mappings)
val postgresMapper = ReadingToPostgresMapper(mappings)
val pubsubMessages =
pipeline
.apply("ReadPubSubMessages",
PubsubIO
.readMessagesWithAttributes()
.withIdAttribute("id_label")
.fromTopic(options.pubSubInput))
pubsubMessages
.apply("AckPubSubMessages", ParDo.of(object: DoFn<PubsubMessage, String>() {
@ProcessElement
fun processElement(context: ProcessContext) {
LOG.info("Processing readings: " + context.element().attributeMap["id_label"])
context.output("")
}
}))
val disarmedMessages =
pubsubMessages
.apply("DisarmedPubSubMessages",
DisarmPubsubMessage(tableRowMapper, postgresMapper)
)
disarmedMessages
.get(TupleTags.readingErrorTag)
.apply("LogDisarmedErrors", ParDo.of(object: DoFn<String, String>() {
@ProcessElement
fun processElement(context: ProcessContext) {
LOG.info(context.element())
context.output("")
}
}))
disarmedMessages
.get(TupleTags.tableRowTag)
.apply("WriteToBigQuery",
BigQueryIO
.writeTableRows()
.withoutValidation()
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withFailedInsertRetryPolicy(InsertRetryPolicy.neverRetry())
.to(options.bigQueryOutput)
)
pipeline.run()
DisarmPubsubMessage is a PTransform that uses the FlatMapElements transform to get a TableRow and a ReadingsInputFlatten (our own class for Postgres).
We expect zero duplicates, or at least a best effort (and we may add some cleaning cron job); we paid for these products to run statistics and big data analysis...
[UPDATE 1]
I even appended a new simple transform that logs our unique attribute through a ParDo, which supposedly should ack the PubsubMessage, but this is not the case:
[screenshot: new flow with the AckPubSubMessages step]
Thanks!!
It looks like you are using the global window. One technique would be to window this into an N-minute window, then process the keys in the window and drop any items with duplicate keys.
The supported programming languages are Python and Java; your code appears to be Kotlin, and as far as I know it is not supported. I strongly recommend using Java to avoid any unsupported features for the programming language you use.
In addition, I would recommend the following approaches to deal with duplicates; option 2 could meet your need for near-real-time data:
1. message_id. You have probably already read the FAQ on duplicates, which points to a deprecated doc. However, if you check the PubsubMessage object you will notice that messageId is still available and it will be populated if not set by the publisher:
"ID of this message, assigned by the server when the message is
published ... It must not be populated by the publisher in a
topics.publish call"
2. BigQuery streaming. To check for duplicates while loading the data, you can create a UUID right before inserting into BQ. Please refer to the section Example sink: Google BigQuery.
3. Try the Dataflow template PubSubToBigQuery and validate that there are no duplicates in BQ.
I have created 2 SQL models in Google App Maker. For simplicity's sake, let's say Model 1 has all of the information that can be added and edited for each of the records. Model 2 works as a storage model: once a record in Model 1 is removed, it moves over to Model 2. The idea is that the individual can click a "removed" boolean, which will open a dialog page to add comments for the removal; once complete, the record will be moved to Model 2 for storage and will no longer be visible in Model 1.
Is there any way to do this? If you need more information let me know and I will try to provide it, but the reason I cannot post the existing app is that the information is confidential.
Thanks for your help!
Updated answer: move to another model
If you want to force users to enter a message, you need to forbid them from deleting records through datasources:
// onBeforeDelete model event
throw new Error('You should provide a message prior to deleting a record');
Then you need to implement audit itself:
// server script
function archive(itemKey, message) {
if (!message) {
throw new Error('Message is required');
}
var record = app.models.MyModel.getRecord(itemKey);
if (!record) {
throw new Error('Record was not found');
}
var archive = app.models.Removed.newRecord();
archive.Field1 = record.Field1;
archive.Field2 = record.Field2;
...
archive.Message = message;
app.saveRecords([archive]);
app.deleteRecords([record]);
}
// client script
google.script.run
.withSuccessHandler(function() {
// TODO
})
.withFailureHandler(function() {
// TODO
})
.archive(itemKey, message);
If you need to implement auditing for multiple/all models, then you can generalize the snippet by passing the model's name and using Model Metadata: function archive(modelName, itemKey, message) {}
Original answer: move to another DB
Normally I would recommend just adding a boolean field Deleted to the model and ensuring that records marked as deleted are not sent to the client. Implementing a move of data between databases could be tricky, since transactions are not supported across multiple databases.
If you desperately want to make your app more complex and less reliable, you can create a backup of the record in the onBeforeDelete model event using the JDBC Apps Script service (the External Database Sample could be your friend to start with):
// onBeforeDelete model event
var connection = Jdbc.getConnection(dbUrl, user, userPassword);
var statement = connection.prepareStatement('INSERT INTO ' + TABLE_NAME +
' (Field1, Field2, ...) values (?, ?, ...)');
statement.setString(1, record.Field1);
statement.setString(2, record.Field2);
...
statement.execute();
Why do you need JDBC? Because App Maker natively doesn't support models attached to different databases.
I was able to do what I needed using a query filter in a client script. This keeps the data on the back end when I export, and only shows the active user whatever has not been removed.
var datasource1 = app.datasources.WatchList_Data;
datasource1.query.filters.Remove_from_WatchList._equals = 'No';
datasource1.load();
I have a WCF Data Service (OData) that serves as the data repository for a larger system. I'm trying to fire off specific methods based on operations on Entities in the repository.
Specifically, if someone changes a Message record, I want to hook into the pipeline. I'm using ChangeInterceptors for this.
They work for Add and Delete. However, nothing fires when an entity is updated. I am concerned that the DbContext cannot detect that the entity has changed, since the request is stateless.
This does not trigger the handler:
var whatever = from m in Messages
where m.MessageKey == 3
select m;
whatever.First().UpdatedDate = DateTime.Now;
this.SaveChanges();
Has anyone else faced this problem?
So, I was trying to use AttachTo() to handle the fact that my record was detached. This flat out didn't work, and led to runtime exceptions like the following:
This operation requires the entity be of an Entity Type, and has at least one key property. Parameter name: entity
At any rate, just use the update method and the change will be intercepted (and actually applied)
var whatever = (from m in Messages where m.MessageKey == 1 select m).Single();
whatever.UpdatedDate = DateTime.Now;
this.UpdateObject(whatever);
this.SaveChanges();
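For reference, the service-side interceptor that ends up firing for this looks roughly like the sketch below; the entity set name "Messages", the Message type, and the method name are assumptions based on the question rather than code from it:
// Hypothetical change interceptor on the data service class, for illustration only.
// With the client calling UpdateObject(...) + SaveChanges(), the operation arrives
// here as UpdateOperations.Change.
[ChangeInterceptor("Messages")]
public void OnChangeMessages(Message message, UpdateOperations operations)
{
    if (operations == UpdateOperations.Change)
    {
        // react to the update here
    }
}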
Maybe this can be done without StreamInsight, but I'm curious.
I have an application that is populating a table with "messages" (inserts a row in the table).
I want to create a monitoring application that monitors this table for the rate at which messages are "arriving" and how quickly they are "processed" (a flag gets updated).
As this is a vendor's application, I don't want to drop in a trigger or anything. But I can query the DB, and the table has a PK using an identity column.
How can I get to a hopping window query? I would love to show a line graph for, say, the past 30 minutes, showing the rate at which messages are coming in and the rate at which they are processed.
Depending on what information is captured in this table of messages, I think you could probably do this faster by just running a SQL query.
If you are still wanting to use StreamInsight to do this, here's some code to get you started.
var app = Application;
var interval = TimeSpan.FromSeconds(1);
var windowSize = TimeSpan.FromSeconds(10);
var hopSize = TimeSpan.FromSeconds(1);
/* Replace the Observable.Interval with your logic to poll the database and
convert the messages to instances of TPayload. It just needs to be a class
that implements the IObservable<TPayload> interface. */
var observable = app.DefineObservable(()=> Observable.Interval(interval));
// Convert the observable to a point streamable.
var streamable = observable.ToPointStreamable(
e=> PointEvent.CreateInsert(DateTimeOffset.Now, e),
AdvanceTimeSettings.IncreasingStartTime);
/* Using the streamable from the step before, write your actual LINQ queries
to do the analytics you want. */
var query = from win in streamable.HoppingWindow(windowSize, hopSize)
select new Payload{
Timestamp = DateTime.UtcNow,
Value = win.Count()
};
/* Create a sink to output your events (WCF, etc). It just needs to be a
class that implements the IObserver<TPayload> interface. The
implementation is highly dependent on your needs. */
var observer = app.DefineObserver(()=> Observer.Create<Payload>(e => e.Dump()));
query.Bind(observer).Run();
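To make the placeholder comment concrete: here is one way (a sketch only; the dbo.Messages table, the MessageId column, and connectionString are assumptions, not details from the question) to replace the Observable.Interval definition above with a database poll:
// Hypothetical polling helper; requires: using System.Data.SqlClient;
static long PollLatestMessageId(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT MAX(MessageId) FROM dbo.Messages", conn))
    {
        conn.Open();
        var result = cmd.ExecuteScalar();
        return result == DBNull.Value ? 0L : Convert.ToInt64(result);
    }
}

// Replaces the earlier DefineObservable line: emit the latest identity value once
// per second so the hopping-window query can derive arrival rates from the deltas.
var observable = app.DefineObservable(() =>
    Observable.Interval(interval).Select(_ => PollLatestMessageId(connectionString)));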
In the TestFixtureTearDown part of an NUnit test, I try to delete some test entities created in the TestFixtureSetUp part. I use the following code:
sessionFactory = NHibernateHelper.CreateSessionFactory(cssc["DefaultTestConnectionString"].ConnectionString);
uow = new NHibernateUnitOfWork(sessionFactory);
var g = reposGebruiker.GetByName(gebruiker.GebruikerNaam);
reposGebruiker.Delete(g);
var k = reposKlant.GetByName(klant.Naam);
reposKlant.Delete(k);
// Commit changes to persistant storage
uow.Commit();
However, after the commit, the two entities were still in the database. After some searching I came across this page on SO, and so I added:
uow.Session.Flush();
However, the entities still remain in the DB. Does anyone have an idea why this is?
I've never used the UoW class you're using, but my projects are implemented using ISession.BeginTransaction and ISession.Transaction.Commit in a helper like this:
public void CreateContext(Action logic)
{
    // _session is an open NHibernate ISession (e.g. injected or created from the session factory)
    _session.BeginTransaction();
    logic();
    _session.Transaction.Commit();
}
And then:
CreateContext(() =>
    _session.Delete(someObject));
This should work.
I want to mention that this is an example, and you'd want to make some abstractions.
How are the repositories created? In order for the delete to succeed, the objects must be loaded in the same UoW (ISession) in which the Delete command is issued. The Delete method makes the objects non-persistent and marks them for deletion.
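To illustrate the point, here is a minimal sketch using the raw NHibernate API, loading and deleting inside the same ISession and transaction; the Gebruiker and Klant entity types and their properties are assumptions inferred from the question's repository code:
// Requires: using NHibernate; using NHibernate.Linq; using System.Linq;
// Gebruiker and Klant are assumed entity types based on the repositories in the question.
using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var g = session.Query<Gebruiker>().Single(x => x.GebruikerNaam == gebruiker.GebruikerNaam);
    session.Delete(g);

    var k = session.Query<Klant>().Single(x => x.Naam == klant.Naam);
    session.Delete(k);

    // Committing the transaction flushes the session and issues the DELETE statements.
    tx.Commit();
}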