Is it possible to detect the absence of related events based on a given timeout using Azure Stream Analytics?

Say you have events telling you when given doors in a building are opened and closed, and how long each door is allowed to remain open before an alarm is triggered.
Something like: (might need a timestamp too on events...)
{ "id": 1, "event":"Opened", "toBeClosedInSeconds":30 }
{ "id": 1, "event": "Closed" }
Is it possible to use Azure Stream Analytics to identify doors left open for more than the given timeframe? That is, identify the absence of a Closed event before the given timeout passes? And if so, what would such a query look like?

Note: It’s difficult for the stream processing system to know if there are no events, or if events are delayed.
In this article, we discuss how you can make design choices to solve practical time handling problems in the Azure Stream Analytics service. Time handling design decisions are closely related to event ordering factors.
Hope this helps.
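To make the underlying logic concrete, here is a minimal Python sketch (not an Azure Stream Analytics query) of "detect an Opened event with no matching Closed event before its deadline". It assumes each event carries a `timestamp` in seconds, which the sample payloads in the question do not have yet, and that `now` is a point in time up to which all events are known to have arrived. That second assumption is exactly the delayed-events difficulty mentioned in the note above. The names `DoorEvent` and `doors_left_open` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DoorEvent:
    door_id: int
    event: str                      # "Opened" or "Closed"
    timestamp: float                # seconds; assumed field, not in the sample payloads
    to_be_closed_in_seconds: Optional[int] = None  # only set on "Opened"


def doors_left_open(events: List[DoorEvent], now: float) -> List[Tuple[int, float]]:
    """Return (door_id, deadline) for every Opened event whose deadline has
    passed (deadline <= now) without a Closed event for the same door in time."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    alarms = []
    for i, opened in enumerate(ordered):
        if opened.event != "Opened":
            continue
        deadline = opened.timestamp + opened.to_be_closed_in_seconds
        if deadline > now:
            continue  # too early to decide: the door may still be closed in time
        closed_in_time = any(
            e.door_id == opened.door_id
            and e.event == "Closed"
            and e.timestamp <= deadline
            for e in ordered[i + 1:]
        )
        if not closed_in_time:
            alarms.append((opened.door_id, deadline))
    return alarms
```

A door that is closed only after its deadline still raises an alarm here, since the question asks about the timeout passing rather than about the door's final state.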

Need suggestions: Send multiple images to backend, perform upload operation in backend, send response

I need some best practice guidelines for a backend service in a scenario like this one:
UI sends multiple images for uploading to the backend service
Backend service receives all of the images and processes upload to storage one by one
There can be failure in 1 or multiple image upload
My question is: how should I send the response to the UI if my backend service is unable to upload one or more files?
One way would be to send the failed and successful image links together in a JSON response body, so the UI knows about the failures and handles them in its own way.
Another way would be to send only the successfully uploaded images' links, which is the best case scenario.
Any suggestions will be welcomed with some reference links.
Use an Orchestrator - something specific that can coordinate multiple actions and provide a meaningful result back to the caller.
This might be as simple as a component sitting in the UI that orchestrates calls to the backend. The UI component and the backend service might be designed as parts of a cohesive solution, or the UI component might simply act as a type of client/proxy/facade to some random backend service.
UI calls the orchestrator with references to all the images it needs uploading.
The orchestrator works through the items, uploading each as you prefer (sequentially or in parallel, etc). For each file, handle errors however you prefer - e.g. try once and die gracefully on failure; put errors into a queue or some other mechanism for retry (how many times is up to you); etc.
Based on rules internal to the orchestrator, return status to the caller.
For potentially long-running processes (like file uploads) make sure the call to the orchestrator is asynchronous.
Rather than only returning a "complete" result at the end, the orchestrator might provide a simple status back, allowing callers to get some idea of where processing is at. For example, you might have a call-back (from the orchestrator to its caller) that simply emits very simple statuses like: processing, failed and complete. A more complex solution would be for the orchestrator to return more specific info like %complete and detailed error info.
Have a look at how the big cloud providers do complex file uploads by reading their documentation and studying their APIs.
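As a rough illustration of the orchestrator steps above, here is a minimal Python sketch that uploads files one by one, retries each a fixed number of times, and returns a per-file result the UI can act on. Everything named here is hypothetical: `upload_batch`, the `upload_one` callable (standing in for whatever storage client you use), the retry count, and the extra "partial" status, which is not in the list of statuses above.

```python
from typing import Callable, Dict


def upload_batch(
    files: Dict[str, bytes],
    upload_one: Callable[[str, bytes], str],  # hypothetical: returns the stored URL or raises
    max_attempts: int = 2,
) -> dict:
    """Upload each file sequentially and report successes and failures together."""
    succeeded, failed = [], []
    for name, data in files.items():
        last_error = ""
        for _attempt in range(max_attempts):
            try:
                url = upload_one(name, data)
                succeeded.append({"file": name, "url": url})
                break
            except Exception as exc:  # broad on purpose: any failure counts as a failed attempt
                last_error = str(exc)
        else:
            # The loop finished without a successful upload.
            failed.append({"file": name, "error": last_error, "attempts": max_attempts})

    if not failed:
        status = "complete"
    elif not succeeded:
        status = "failed"
    else:
        status = "partial"  # assumed extra status for mixed results
    return {"status": status, "succeeded": succeeded, "failed": failed}
```

The returned dictionary maps directly onto a JSON response body, so the UI can show the uploaded links and decide for itself whether to retry the failed files.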
I need some best practice guidelines for a backend service
In no particular order:
Keep it as simple as possible - generally, the fewer moving parts the better. E.g. pay attention to the Single Responsibility Principle (SRP).
Clean up after yourself. If the upload service generates any data - make sure you have a clean-up process so you don't end up with mountains of un-needed data lying around, especially stuff like image files. If you design an upload solution that maintains state (which is independent of what happens to the images once they are uploaded) then you'll be storing data which probably won't be needed once the images are all processed.
Think about support - not just developer debugging but also operational support. Getting your solution into production is not the end result, it's just the beginning.
If designing this solution across teams (e.g. frontend and backend teams) make sure both teams are involved in the design. If the backend team can't provide a solution that works for the frontend team then it's not going to end well.
Think about the likely error scenarios and how you can handle them.
This isn't really just a question of best practice, as there are multiple ways you could implement it, more than one of which could be valid. This is actually an architecture and design question, with more than one valid answer, hence I don't think it fits as a Stack Overflow question and you will not get references to any one correct approach.
That said, by way of an answer I will outline what I think you need. At a very high level, and not necessarily in this order but taking these factors into account, I would:
Design the UI process flow. For example, you may decide that the user process will have several stages:
User selects first image for upload;
User selects each subsequent image for upload;
User presses some kind of "Go" button after selecting all images;
System now uploads the batch, and user receives a response confirming success or otherwise;
User has option to click through to detailed success/error details.
Design the required success/error reports
Design the data needed to support the overall functionality
Provide one or more APIs giving the upload function and the report function(s) the CRUD access they need to this data
If you hit any specific technical issues at any stage, then please post new questions accordingly as you go.
As to the point you mentioned, how to send the UI response, there is more than one valid way but I would return a basic success/failure response initially, containing only minimal details such as number of successes, and return more details in further messages in response to user actions (such as clicking through to detailed success/error details), at which point I would retrieve the requested error details from the database.
As I said at the start of my answer, I don't think your question can be answered just in terms of best practices, as it's a whole architecture and design question, but I hope my answer helps you along this path.

Microservices + CQRS implementation

I am working on implementing a microservice architecture using the CQRS pattern. I have a working implementation using API Gateway, Lambda and DynamoDB with one exception - the event sourcing.
Event Sourcing has the applications publishing a notification to an event stream that other services in the platform can consume. This notification represents an event that took place as part of the originating HTTP request. For instance, if the user makes an HTTP POST with a complete "check patient into hospital" model then the Lambda will break that apart and publish multiple events in sequential order.
Patient Checked in (includes Patient Id, hospital id + visit id)
Room Assigned (includes room number, + visit id)
Patient tested (includes tested + visit id)
Patient checked-out (visit id)
The intent for this pattern is to provide an audit trail of all events that took place while the patient was in the hospital. This example (not what I'm actually building) would be stored in an event source that can be replayed at any time. If the VisitId was deleted across all services we could just replay the events one at a time, in order, and reproduce an exact copy of the original record. You consider all records immutable to achieve this. Each POST would push into the event source and then land in the database that would pull the data out during a HTTP GET request. It would also have subscribers that would take pieces of this data and do other things - such as a "Visit Survey" service that would listen to the Patient Checked Out event and prep a post-op survey.
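The replay idea described above can be sketched as a fold over the ordered events. This is only an illustration in Python: the event type names, the field names (`visit_id`, `room_number`, `test`, and so on) and the shape of the rebuilt record are assumptions based on the hospital example, not an actual schema.

```python
from functools import reduce
from typing import Dict, List


def apply_event(visit: Dict, event: Dict) -> Dict:
    """Return a new visit record with one event applied; inputs are never mutated."""
    kind = event["type"]
    updated = dict(visit)
    if kind == "PatientCheckedIn":
        updated.update(
            visit_id=event["visit_id"],
            patient_id=event["patient_id"],
            hospital_id=event["hospital_id"],
            status="checked_in",
            tests=[],
        )
    elif kind == "RoomAssigned":
        updated["room_number"] = event["room_number"]
    elif kind == "PatientTested":
        updated["tests"] = visit.get("tests", []) + [event["test"]]
    elif kind == "PatientCheckedOut":
        updated["status"] = "checked_out"
    else:
        raise ValueError(f"unknown event type: {kind}")
    return updated


def replay_visit(events: List[Dict]) -> Dict:
    """Rebuild the visit record by applying the events one at a time, in order."""
    return reduce(apply_event, events, {})
```

Because the events are treated as immutable and applied in order, replaying the same list always produces the same record.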
I've looked at several AWS services to provide this. I know about Kinesis Data Streams but I don't like the pricing structure nor do I want to deal with shards (no autoscaling). Since my entire platform is built on consumption based pricing (Dynamo, Lambda etc) I want to keep my event source the same way. This makes it easier for me to estimate a per-user cost as I just do math based on estimated requests per month, per user.
I've been using SNS for the stream itself, delivering the notifications, and it's been great. Super fast and not had any major issues while developing it. The issue though is that this is not suitable for a replay store - only delivery of the event messages. For a replay store I thought Kinesis Firehose made a lot of sense... Send it to S3 + SNS at the same time. Turns out SNS isn't a delivery destination available. I can Put to S3 myself and then publish to SNS but that seems like duplicate work in the code base when I can set up an S3 trigger to fire a Lambda and just have another small Lambda that reacts to the Event landing in S3 and does the insert into DynamoDB. I've seen that this can be much slower though than just publishing through SNS. I'm also not sure about retry policies on the Put event. This simplifies retries though as I can just re-use the code in the triggered Lambda to replay all events in a bucket path.
I could just PutObject and then Publish to SNS within the same HTTP POST Lambda. If the SNS Publish fails though then I now have an object in S3 that was never published. I'd have to write a different Lambda to handle the fixing and publishing. Not the end of the world - either way I have two Lambdas to deploy. I'm just not sure which way makes more sense in this pattern with AWS services.
Has anyone done something similar and have any recommendations? Am I working my way into a technical hole that will be difficult to manage later? I'm open to other paths as well if I can keep it to a consumption based pricing model. Thanks!
Event Sourcing has the applications publishing a notification to an event stream that other services in the platform can consume.
You'll want to be a little bit careful here -- there are at least two different definitions of "event sourcing" running around.
If you care about event sourcing, in the sense usually coupled with CQRS (Greg Young, et al), then your events are your book of record. The important complication this introduces is that your service needs to be able to lock the "event stream" when making changes to it (without that lock, you run into "lost edit" scenarios and have to clean up the mess).
So the "pointer to your current changes" needs to live in something that has transactions. DynamoDB should be fine for this (based on my memory of the event sourcing break out room at re:Invent 2017). In theory, you could have the lock in dynamo, which contains a pointer to an immutable document stored in S3. I haven't been able to persuade myself that the trade offs justify the complexity, but as best I can tell there's nothing in that architecture that violates physics and causality.
If your operations team isn't happy with Dynamo, another reasonable option is RDS; choose your preferred relational data engine, deploy an event storage schema to it, and off you go.
As for the pub sub part, I believe you to be on the right track with SNS. It's the right choice for "fanning out" messages from a publisher to multiple consumers. Yes, it doesn't support replay, but that's fine -- replay can happen by pulling events from the book of record. See the later parts of Greg Young's Polyglot Data talk. Yes, sometimes you will get messages on both the push channel and the pull channel, but that's fine; you already signed up for idempotent message handling when you decided a distributed architecture was a good idea.
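To illustrate the idempotent handling mentioned above, here is a minimal Python sketch of a consumer that ignores events it has already processed, so it does not matter if the same event arrives via both the push channel and the pull channel. It assumes every event carries a unique `event_id` (an assumption, not something SNS gives you in this form), and it keeps the seen IDs in memory only; a real consumer would need durable storage for them.

```python
from typing import Callable, Dict, Set


class IdempotentConsumer:
    """Wraps a handler so that each event_id is processed at most once."""

    def __init__(self, handler: Callable[[Dict], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # in-memory only; illustration, not durable

    def handle(self, event: Dict) -> bool:
        """Process the event unless it was seen before. Returns True if processed."""
        event_id = event["event_id"]  # assumed unique per event
        if event_id in self._seen:
            return False
        self._handler(event)
        # Mark as seen only after the handler succeeded, so a failure can be retried.
        self._seen.add(event_id)
        return True
```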
Edit
Why the need to store a pointer in DynamoDB?
Because S3 doesn't offer you any locking; which means that on the unhappy path, where two copies of your logic are trying to write different versions of your data, you end up victim to the lost edit problem.
You could manage the situation with optimistic locking - something analogous to HTTP's conditional PUT; but S3 (last time I checked) doesn't support conditional modification.
You could use S3 as an object store for immutable documents, but now you need some mechanism to determine which document in S3 is the "current" one. If you try to implement that in S3, you run into the same lost edit problem all over again.
So you need a different tool to handle that part of the problem; some tool that is suitable for "state succession". So DynamoDB fits there.
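The lost edit problem and the optimistic locking idea can be shown with a small in-memory Python sketch. It is not DynamoDB code: the `InMemoryEventStore` class and its methods are made up for illustration. The version check in `append` plays the role that a conditional write (for example, a DynamoDB write with a condition expression) would play in a real store.

```python
from typing import Dict, List, Tuple


class ConcurrencyError(Exception):
    """Raised when the stream changed since the caller last read it."""


class InMemoryEventStore:
    def __init__(self) -> None:
        self._streams: Dict[str, List[dict]] = {}

    def load(self, stream_id: str) -> Tuple[List[dict], int]:
        """Return a copy of the events and the current version (number of events)."""
        events = self._streams.get(stream_id, [])
        return list(events), len(events)

    def append(self, stream_id: str, new_events: List[dict], expected_version: int) -> int:
        """Append only if nobody else has written since `expected_version` was read."""
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}"
            )
        stream.extend(new_events)
        return len(stream)
```

Two writers that both read version 0 cannot both succeed: the second `append` raises `ConcurrencyError` instead of silently overwriting the first writer's events, and the caller can reload and retry.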
If you are using DynamoDB for locking, can you also use it for event storage? I don't have enough laps to feel confident that I know the answer there. For small problems, I'm mostly confident that the answer is yes. For large problems...?
Possibly useful discussions:
Rich Hickey; The Language of the System
Kenneth Truyers; Git as a NoSql Database

What are the errors in this BPMN?

I have a BPMN diagram (see below) with some errors that I can't seem to figure out. The diagram depicts the Produce Magazine Article Process, where the writer and Researcher are freelancers who work together to write articles for various publications.
Bigger version: BPMN diagram
There is a bunch of errors here, three of them are logical (two are related), one is BPMN syntax.
Let's start with the syntax.
A message is always a communication between two separate pools, so it has to cross pool boundaries. In your case, you have depicted Freelancers as a single pool, so Send information, being between lanes but not pools, is a syntax error. Before suggesting a solution though, I will focus on the logical errors.
A timer event is not used to show that some time goes by between the activities. That is actually something natural in the process. It is used to indicate that the flow of time is a trigger of the next action(s). For instance, 7 days after choosing a topic the Publication might contact the Researcher to check on the progress. That would be indicated by a timer event. In your case, it seems that the flow continuation is triggered by passing messages, so you should indicate it as an Incoming message event. You actually do that in 2 places, one that is obvious (Get article as a "result" of time event) and the second that correlates to a second problem.
The second point, which is most probably a logical error, is that since we are talking here about freelancers, the Researcher and Writer are most probably two separate entities, not one organisation as your current diagram suggests. If that is the case, you should have them represented as two separate pools. Then your message flow would be valid, but rather than a "Wait for information" time event you should still have a "Receive information" incoming message event (that is BTW the starting event for the Writer pool - similarly, receiving the Article request by the Researcher should be handled by an Incoming message event).
If you prefer to depict the Freelancer as one "organisation", then you should completely abandon the time event (as again you have used it as an indication of time passing and, as I have explained earlier, that is not how it should be used). You have a simple flow, where once the Researcher finishes their job, it is passed to the Writer who carries it on from there. In such a case, you should have a simple sequence flow (solid line) between the actions themselves.
It is also good practice (or at least recommended - some BPM engines verify it) to be consistent in using End events and to always have an End event for every branch of a process. You are missing one or two, depending on how you are going to approach the Freelancers part. Similarly, you should have a Start event for Publication.
Below are the two options shown in the form of diagrams. Note that I also did some minor changes to handle the insufficient information case by Publication. Otherwise, they will be stuck forever waiting for the article to come.
Option with Freelancers as separate pools:
Option with Freelancers considered as a single organisation

Binance order book management using websocket

I have a question regarding the suggested implementation in the Binance documentation. The guidelines are available at the link:
How to manage a local order book correctly
If I need a constant stream of #depth data, why do I need first four steps they suggest. Why would I buffer the stream first and then take snapshot just to determine which data to throw away and then continue listening to stream? I don't understand the logical need for those steps if they are even needed for my use case (which is tracking the real time order book data)
If you take a snapshot and then start listening to the stream you may miss an event between getting the snapshot and starting the stream. This'll mean your local order book will be invalid (and you definitely don't want this in a trading application).
The idea behind taking the snapshot after is that you are guaranteed to have all the events after your snapshot. A side effect of this approach is that you may also have some from before your snapshot. So you can discard the few (if any) you don't need based on their lastUpdateId.
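The buffer-then-snapshot approach described above could look roughly like this in Python. Only `lastUpdateId` is named in this thread; the other field names (`U` and `u` for the first and last update ID of an event, `b` and `a` for bid and ask changes, and a quantity of zero meaning "remove this price level") are my recollection of the Binance diff depth stream and should be checked against the linked guide. The function name and the plain dictionaries are made up for illustration, and no network calls are made.

```python
from typing import Dict, Iterable, List, Tuple

Book = Dict[str, Dict[str, str]]  # {"bids": {price: qty}, "asks": {price: qty}}


def _apply_levels(side: Dict[str, str], levels: Iterable[List[str]]) -> None:
    """Set each price level to its new quantity; a zero quantity removes the level."""
    for price, qty in levels:
        if float(qty) == 0.0:
            side.pop(price, None)
        else:
            side[price] = qty  # prices stay strings, assuming consistent formatting


def sync_order_book(snapshot: dict, buffered_events: Iterable[dict]) -> Tuple[Book, int]:
    """Start from the snapshot, drop buffered events it already covers, apply the rest."""
    book: Book = {"bids": dict(snapshot["bids"]), "asks": dict(snapshot["asks"])}
    last_id = snapshot["lastUpdateId"]
    for event in buffered_events:
        if event["u"] <= last_id:
            continue  # entirely before the snapshot: already reflected, discard
        if event["U"] > last_id + 1:
            # An update is missing between what we have and this event.
            raise RuntimeError("gap in update IDs; take a new snapshot and start over")
        _apply_levels(book["bids"], event["b"])
        _apply_levels(book["asks"], event["a"])
        last_id = event["u"]
    return book, last_id
```

After this initial sync, the same per-event logic would keep running on the live stream, with `last_id` carried forward to detect gaps.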
I'm not sure what language you're using to manage one, but if you want a Java implementation let me know and I'll push mine to GitHub so you can use it.

Is Eventual consistency incompatible with user authentication process?

I practice DDD in my project.
Let's assume the bounded contexts IdentityAndAccessContext and MeetingContext.
Both contexts deal with the following terms:
IdentityAndAccessContext has the notion of User class.
MeetingContext has the notion of Participant class. (let's forget Creator for the example).
Participant represents the user in Meeting bounded context.
First, a User has to be created, leading to a UserCreatedEvent.
Then, in order to apply eventual consistency between those bounded contexts, the message is stored in the IdentityAndAccessContext and then sent, with the help of an event listener and message queuing (still in the IAC context), to the MeetingContext, in order to automatically create the corresponding Participant.
It sounds like a good DDD design (IMO), however I come across an issue with this webapp's workflow:
User is registering through a registration form and he's redirected to the home page.
The home page needs some Participant values...and that's the issue:
The process of eventual consistency might not finish before the redirection to home page, leading to "no values".
How to deal with this case?
Making the user wait before a notification of consistency? Bad UX no?
Inserting the Participant values in the same transaction of the User? ... violating Bounded contexts concept, wouldn't it?
What I would recommend is to design your UI with the eventual consistency in mind. Let's say you owe your ISP $10. You go into your online banking site and perform an EFT. You log onto your ISP account page but your payment does not reflect. In this scenario it sounds almost silly to expect the money to reflect immediately. Eventual consistency is expected and chances are you would either click a 'refresh' button till the funds reflect or simply wait a day or two for the transaction to reflect since that is the expectation.
I don't think that you should ever try to create an interactive system using messaging since it is asynchronous by nature with no real deterministic outcome w.r.t. timing. However, you could track the registration process in the 'source' bounded context and, therefore, know that the message has been sent and report it as such on, say, the participant page; something like: 'Your participation request is in process'.
Then using either some form of polling or server-based push technology you could update the participation page once the participant object is ready.
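A polling variant of this could be sketched as follows in Python. The function name, the `fetch_participant` callable (standing in for whatever query the MeetingContext exposes) and the timeout values are all hypothetical. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Optional


def wait_for_participant(
    fetch_participant: Callable[[], Optional[dict]],  # hypothetical lookup; None if not created yet
    timeout_s: float = 5.0,
    interval_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the Participant exists or the timeout passes, then report the state."""
    deadline = clock() + timeout_s
    while True:
        participant = fetch_participant()
        if participant is not None:
            return {"status": "ready", "participant": participant}
        if clock() >= deadline:
            # Not an error: the message is still on its way, so tell the user that.
            return {
                "status": "pending",
                "message": "Your participation request is in process",
            }
        sleep(interval_s)
```

The "pending" result is what lets the home page render something meaningful instead of "no values" while the Participant is still being created.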
It could sound overly simplistic but I still think one should aim to design with the uncertainty in mind.
Hope that helps.