I have a question regarding the suggested implementation that is in binance documentation. The guidelines are avaliable on the link:
How to manage a local order book correctly
If I need a constant stream of #depth data, why do I need first four steps they suggest. Why would I buffer the stream first and then take snapshot just to determine which data to throw away and then continue listening to stream? I don't understand the logical need for those steps if they are even needed for my use case (which is tracking the real time order book data)
If you take a snapshot and then start listening to the stream you may miss an event
between getting the snapshot and starting the stream. This'll mean your local order book will be invalid (and you definitely don't want this in a trading application).
The idea behind taking the snapshot after is that you are guaranteed to have all the events after your snapshot. A side effect of this approach is that you may also have some from before your snapshot. So you can discard the few (if any) you don't need based on their lastUpdateId.
I'm not sure what language you're using to manage one but if you want a java implementation let me know and i'll push mine to github so you can use it.
Related
I am working on implementing a microservice architecture using the CQRS pattern. I have a working implementation using API Gateway, Lambda and DynamoDB with one exception - the event sourcing.
Event Sourcing has the applications publishing a notification to an event stream that other services in the platform can consume. This notification represents an event that took place as part of the originating HTTP request. For instance, if the user makes a HTTP POST with a complete "check patient into hospital" model then the Lambda will break that apart and publish multiple events in sequential order.
Patient Checked in (includes Patient Id, hospital id + visit id)
Room Assigned (includes room number, + visit id)
Patient tested (includes tested + visit id)
Patient checked-out (visit id)
The intent for this pattern is to provide an audit trail of all events that took place while the patient was in the hospital. This example (not what I'm actually building) would be stored in an event source that can be replayed at any time. If the VisitId was deleted across all services we could just replay the events one at a time, in order, and reproduce an exact copy of the original record. You consider all records immutable to achieve this. Each POST would push into the event source and then land in the database that would pull the data out during a HTTP GET request. It would also have subscribers that would take pieces of this data and do other things - such as a "Visit Survey" service that would listen to the Patient Checked Out event and prep a post-op survey.
I've looked at several AWS services to provide this. I know about Kinesis Data Streams but I don't like the pricing structure nor do I want to deal with shards (no autoscaling). Since my entire platform is built on consumption based pricing (Dynamo, Lambda etc) I want to keep my event source the same way. This makes it easier for me to estimate a per-user cost as I just do math based on estimated requests per month, per user.
I've been using SNS for the stream itself, delivering the notifications, and it's been great. Super fast and not had any major issues while developing it. The issue though is that this is not suitable for a replay store - only delivery of the event messages. For a replay store I thought Kinesis Firehose made a lot of sense... Send it to S3 + SNS at the same time. Turns out SNS isn't a delivery destination available. I can Put to S3 myself and then publish to SNS but that seems like duplicate work in the code base when I can setup an S3 trigger to fire a Lambda and just have another small Lambda that reacts to the Event landing in S3 and do the insert into the DynamoDB. I've seen that this can be much slower though than just publishing through SNS. I'm also not sure about retry policies on the Put event. This simplifies retries though as I can just re-use the code in the triggered Lambda to replay all events in a bucket path.
I could just PutObject and then Publish to SNS within the same HTTP POST Lambda. If the SNS Publish fails though then I now have an object in S3 that was never published. I'd have to write a different Lambda to handle the fixing and publishing. Not the end of the world - either-way I have two Lambdas to deploy. I'm just not sure which way makes more sense in this pattern with AWS services.
Has anyone done something similar and have any recommendations? Am I working my way into a technical hole that will be difficult to manage later? I'm open to other paths as well if I can keep it to a consumption based pricing model. Thanks!
Event Sourcing has the applications publishing a notification to an event stream that other services in the platform can consume.
You'll want to be a little bit careful here -- there are at least two different definitions of "event sourcing" running around.
If you care about event sourcing, in the sense usually coupled with CQRS (Greg Young, et al), then your events are your book of record. The important complication this introduces is that your service needs to be able to lock the "event stream" when making changes to it (without that lock, you run into "lost edit" scenarios and have to clean up the mess).
So the "pointer to your current changes" needs to live in something that has transactions. DynamoDB should be fine for this (based on my memory of the event sourcing break out room at re:Invent 2017). In theory, you could have the lock in dynamo, which contains a pointer to an immutable document stored in S3. I haven't been able to persuade myself that the trade offs justify the complexity, but as best I can tell there's nothing in that architecture that violates physics and causality.
If your operations team isn't happy with Dynamo, another reasonable option is RDS; choose your preferred relational data engine, deploy an event storage schema to it, and off you go.
As for the pub sub part, I believe you to be on the right track with SNS. It's the right choice for "fanning out" messages from a publisher to multiple consumers. Yes, it doesn't support replay, but that's fine -- replay can happen by pulling events from the book of record. See the later parts of Greg Young's Polyglot Data talk. Yes, sometimes you will get messages on both the push channel and the pull channel, but that's fine; you already signed up for idempotent message handling when you decided a distributed architecture was a good idea.
Edit
Why the need to store a pointer in DynamoDB?
Because S3 doesn't offer you any locking; which means that on the unhappy path, where two copies of your logic are trying to write different versions of your data, you end up victim to the lost edit problem.
You could manage the situation with optimistic locking - something analogous to HTTP's conditional PUT; but S3 (last time I checked) doesn't support conditional modification.
You could use S3 as an object store for immutable documents, but now you need some mechanism to determine which document in S3 is the "current" one. If you try to implement that in S3, you run into the same lost edit problem all over again.
So you need a different tool to handle that part of the problem; some tool that is suitable for "state succession". So DynamoDB fits there.
If you are using DynamoDB for locking, can you also use it for event storage? I don't have enough laps to feel confident that I know the answer there. For small problems, I'm mostly confident that the answer is yes. For large problems...?
Possibly useful discussions:
Rich Hickey; The Language of the System
Kenneth Truyers; Git as a NoSql Database
I have encountered a problem, that Google notifications quite often do not arrive without any apparent reason. This makes them almost unusable for me, since it seems like they appear just in 60% of cases.
Is this common? Should I stop relying on them and set up a one minute scheduler for syncing event insted?
Thanks for your opinion
AFAIK, notification is not 100% reliable also stated in the document. As an alternative you could use Incremental sync:
Incremental sync allows you to retrieve all the resources that have been modified since the last sync request. To do this, you need to perform a list request with your most recent sync token specified in the syncToken field. Keep in mind that the result will always contain deleted entries, so that the clients get the chance to remove them from storage.
Hope this helps.
Tamper data
There is terrible thing called Tamper Data. It receives all POST'ing data from FLASH to PHP and give ability for user to change values.
Imagine that in flash game (written in ActionScript 3) are score points and time. After match completed score and time variables are sending to PHP and inserting to database.
But user can easy change values with Tamper Data after match completed. So changed values will be inserted to database.
My idea seems that won't work
I had idea to update data in database on every change? I mean If player get +10 score points I need instant to write It to database. But how about time? I need update my table in database every milisecond? Is that protection solution at all? If user can change POST data he can change It everytime also last time when game completed.
So how to avoid 3rd party software like Tamper Data?
Tokens. I've read article about Tokens, there is talking about how to create random string as token and compare It with database, but It's not detailed and I don't have idea how to realise It. Is that good idea? If yes, maybe someone how to realise It practically?
According to me is better way to send both parameter and value in encrypted format like score=12 send like c2NvcmU9MTI= which is base64
function encrypt($str)
{
$s = strtr(base64_encode(mcrypt_encrypt(MCRYPT_RIJNDAEL_256, md5(SALTKEY), serialize($str), MCRYPT_MODE_CBC, md5(md5(SALTKEY)))), '+/=', '-_,');
return $s;
}
function decrypt($str)
{
$s = unserialize(rtrim(mcrypt_decrypt(MCRYPT_RIJNDAEL_256, md5(SALTKEY), base64_decode(strtr($str, '-_,', '+/=')), MCRYPT_MODE_CBC, md5(md5(SALTKEY))), "\0"));
return $s;
}
In general, there is no way to protect the content generated in Flash and sent to server.
Even if you encrypt the data with a secret key, both the key and the encryption algorithm are contained in the swf file and can be decompiled. It is a bit more harder than simply faking the data so it is kind of usable solution but it will not always help.
To have full security, you need to run all game simulation on the server. For example, if player jumped and catched a coin, Flash does not send "score +10" to the server. Instead, it sends player coordinates and speed, and server does the check: where is the coin, where is the player, what is player's speed and can the player get the coin or not.
If you cannot run the full simulation on the server, you can do a partial check by sending data to server at some intervals.
First, never send a "final" score or any other score. It is very easy to fake. Instead, send an event every time the player does something that changes his score.
For example, every time player catches a coin, you send this event to the server. You may not track player coordinates or coin coordinates, but you know that the level contains only 10 coins. So player cannot catch more than 10 coins anyway. Also, player can't catch coins too fast because you know the minimum distance between coins and the maximum player speed.
You should not write the data to database each time you receive it. Instead you need to keep each player's data in memory and change it there. You can use a noSQL database for that, for example Redis.
First, cheaters will always cheat. There's really no easy solution (or difficult one) to completely prevent it. There are lots of articles on the great lengths developers have gone to discourage cheating, yet it is still rampant in nearly every game with any popularity.
That said, here are a few suggestions to hopefully discourage cheating:
Encrypt your data. This is not unbeatable, but will discourage many lazy hackers since they can't just tamper with plain http traffic, they first have to find your encryption keys. Check out as3corelib for AS3 encryption.
Obfuscate your SWFs. There are a few tools out there to do this for you. Again, this isn't unbeatable, but it is an easy way to make it harder for cheaters to find your encryption keys.
Move all your timing logic to the server. Instead of your client telling the server about time, tell the server about actions like "GAME_STARTED" and "SCORED_POINTS". The server then tracks the user's time and calculates the final score. The important thing here is that the client does not tell the server anything related to time, but merely the action taken and the server uses its own time.
If you can establish any rules about maximum possible performance (for example 10 points per second) you can detect some types of cheating on the server. For example, if you receive SCORED_POINTS=100 but the maximum is 10, you have a cheater. Or, if you receive SCORED_POINTS=10, then SCORE_POINTS=10 a few milliseconds later, and again a few milliseconds later, you probably have a cheater. Be careful with this, and know that it's a back and forth battle. Cheaters will always come up with clever ways to get around your detection logic, and you don't want your detection logic to be so strict that you accidentally reject an honest player (perhaps a really skilled player who is out-performing what you initially thought possible).
When you detect a cheater, "honey pot" them. Don't tell them they are cheating, as this will only encourage them to find ways to avoid detection.
We have a situation where there are 2 modules, with one having a publisher and the other subscriber. The publisher is going to publish some samples using key attributes. Is it possible for the publisher to prevent the subscriber from reading certain samples? This case would arise when the module with the publisher is currently updating the sample, which it does not want anybody else to read till it is done. Something like a mutex.
We are planning on using Opensplice DDS but please give your inputs even if they are not specific to Opensplice.
Thanks.
RTI Connext DDS supplies an option to coordinate writes (in the documentation as "coherent write", see Section 6.3.10, and the PRESENTATION QoS.
myPublisher->begin_coherent_changes();
// (writers in that publisher do their writes) /* data captured at publisher */
myPublisher->end_coherent_changes(); /* all writes now leave */
Regards,
rip
If I understand your question properly, then there is no native DDS mechanism to achieve what you are looking for. You wrote:
This case would arise when the module with the publisher is currently updating the sample, which it does not want anybody else to read till it is done. Something like a mutex.
There is no such thing as a "global mutex" in DDS.
However, I suspect you can achieve your goal by adding some information to the data-model and adjust your application logics. For example, you could add an enumeration field to your data; let's say you add a field called status and it can take one of the values CALCULATING or READY.
On the publisher side, in stead of "taking a the mutex", your application could publish a sample with the status value set to CALCULATING. When the calculation is finished, the new sample can be written with the value of status set to READY.
On the subscriber side, you could use a QueryCondition with status=READY as its expression. Read or take actions should only be done through the QueryCondition, using read_w_condition() or take_w_condition(). Whenever the status is not equal to READY, the subscribing side will not see any samples. This approach takes advantage of the mechanism that newer samples overwrite older ones, assuming that your history depth is set to the default value of 1.
If this results in the behaviour that you are looking for, then there are two remaining disadvantages to this approach. First, the application logics get somewhat polluted by the use of the status field and the QueryCondition. This could easily be hidden by an abstraction layer though. It would even be possible to hide it behind some lock/unlock-like interface. The second disadvantage is due to the extra sample going over the wire when setting the status field to CALCULATING. But extra communications can not be avoided anyway if you want to implement a distributed mutex-like functionality. Only if your samples are pretty big and/or high-frequent, this is an issue. In that case, you might have to resort to a dedicated, small Topic for the single purpose of simulating the locking mechanism.
The PRESENTATION Qos is not specific RTI Connext DDS. It is part of the OMG DDS specification. That said the ability to write coherent changes on multiple DataWriters/Topics (as opposed to using a single DataWriter) is part of one of the optional profiles (object model profile), so no all DDS implementations necessariiy support it.
Gerardo
I wish to use Redis to create a system which publishes stock quote data to subscribers in an internal network. The problem is that publishing is not enough, as I need to find a way to implement an atomic "get snapshot and then subscribe" mechanism. I'm pretty new to Redis so I'm not sure my solution is the "proper way".
In a given moment each stock has a book of orders which contains at most 10 bids and 10 asks. The publisher receives data for the exchange and should publish them to subscribers.
While the publishing of changes in the order book can be easily done using publish and subscribe, each subscriber that connects also needs to get the snapshot of the current order book of the stock and only then subscribe to changes in the order book.
As I understand, Redis channel never saves information, so the publisher also needs to maintain the complete order book in a hash key (Or a sorted set. I'm not sure which is more appropriate) in addition to publishing changes.
I also understand that a Redis client cannot issue any commands except subscribing and unsubscribing once it subscribes to the first channel.
So, once the subscriber application is up, it needs first to get the key which contains the complete order book and then subscribe to changes in that book. However, this may result in a race condition. A change in the book order can be made after the client got the key containing the current snapshot but before it actually subscribed to changes, resulting a change which it will never see.
As it is not possible to use subscribe and then use get in a single connection, the client application needs two connections to the Redis server. At this point I started thinking that I'm probably not doing things in the proper way if I need more than one connection in the same application. Anyway, my idea is that the client will have a subscribing connection and a query connection. First, it will use the subscribing connection to subscribe to changes in order book, but still won't not enter the loop which process events. Afterwards, it will use the query connection to get the complete snapshot of the book. Finally, it will enter the loop which process events, but as he actually subscribed before taking the snapshot, it is guaranteed that it will not miss any changed that occurred after the snapshot was taken.
Is there any better way to accomplish my goal?
I hope you found your way already, if not here we goes a personal suggestion:
If you are in javascript land i would recommend having a look on Meteor.js they do somehow achieve the goal you want to achieve, with the default setup you will end up writing to mongodb in order to "update" the GUI for the "end user".
In any case, you might be interested in reading about how meteor's ddp protocol works: https://meteorhacks.com/introduction-to-ddp/ and https://www.meteor.com/ddp