Is it expected behaviour for "map" to return a node multiple times? (i.e. duplicates) - gun

I was experimenting with using gun in a server side rendering (SSR) context and noticed that I began to receive duplicate items in the map callback. The duplicate count was n, where n was the number of times I had refreshed the page.
I did some poking around and realised that I was spawning a gun instance for every request to my server. So basically a new peer was being created for every request and therefore map was returning a duplicate of each node for each peer in the network.
Is this expected behaviour?

Yes, by default gun is peer-to-peer (P2P), which means that every peer (even peers connected through other peers) will try to reply to requests.
Why? If you are not running a centralized server (which you can with gun, but you don't have to), there is no guarantee that a single replying peer has the latest, or all, of the data you want.
However, you are correct that creating a new gun database instance for every server request is unnecessary. Does this answer the question?
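A minimal sketch of the fix, keeping one instance per process instead of one per request (createDb is a hypothetical stand-in for calling Gun(); it is not the real gun API):

```javascript
// Module-level singleton: the database is constructed once and shared by all
// request handlers, so the server acts as a single peer rather than spawning
// a new peer per page refresh.
let db = null;

function getDb(createDb) {
  if (db === null) {
    db = createDb(); // only ever constructed once per process
  }
  return db;
}
```

Every request handler then calls getDb(...) instead of constructing a new peer.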
Also note: map subscribes to the table and to the items in it (as they are added), meaning that map will get called for every item in the table/list (including items added in the future), and when an item updates it will get called again for just that item.
If you only want to get each item once, use map().val(cb); however, this will still get called for new items as they are added, just once per item.
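The difference between the two subscription modes can be simulated with a tiny stand-in table (this only mimics the semantics described above; it is not the real gun API):

```javascript
// Simulated table with gun-like subscription semantics.
class Table {
  constructor() {
    this.items = new Map();
    this.onSubs = [];   // like map().on(cb): every item, fired again on updates
    this.onceSubs = []; // like map().val(cb): each item once, plus new items once
  }
  mapOn(cb) {
    this.onSubs.push(cb);
    this.items.forEach((v, k) => cb(v, k)); // fire for existing items
  }
  mapOnce(cb) {
    this.onceSubs.push(cb);
    this.items.forEach((v, k) => cb(v, k)); // fire for existing items
  }
  set(key, value) {
    const isNew = !this.items.has(key);
    this.items.set(key, value);
    this.onSubs.forEach(cb => cb(value, key));          // updates refire .on
    if (isNew) this.onceSubs.forEach(cb => cb(value, key)); // .once only on add
  }
}
```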


Why should we not return a response in an HTTP DELETE request?

At my company we have a DELETE endpoint which (obviously) deletes some data that the user selects on the client side. Currently we do another GET request to refresh the data on the page after the delete request completes successfully (so we don't have two conflicting states on the client and server side).
A developer at our company now wants to change the DELETE endpoint to return the updated state after the data is successfully deleted, so we can just use this response value to update our client side.
This makes sense to me (as we can avoid another GET call), but from other threads the general consensus is that we should return an empty response.
Can someone explain why this is the case? I've looked around, and people mostly seem to say 'because that is how REST is supposed to be', which doesn't really seem like a good reason.

How do I clear up Documentum rendition queue?

We have around 300k items in dmi_queue_item.
If I right-click and select "destroy queue item", the row no longer appears if I query by r_object_id.
Would that mean the file will no longer be processed by the CTS service? I need to know whether this is the way to clear up the queue for the rendition process (converting to PDF), or what the best way to clear the queue would be.
Also, for some items/rows I get an error message when doing the right-click "destroy". What does it mean, and how can I avoid it? I'm not sure if the item was already processed and the row no longer exists, or if it's something else.
The dmi_queue_item table is used as a queue for all sorts of events at the Content Server.
Content Transformation Services uses it to read at least two types of events, as far as I know.
According to the Content Transformation Services Administration Guide, ver. 7.1, page 18, it reads dm_register_assets events and performs the configured content actions for these specific objects.
I was using CTS to generate content renditions for some objects using the dm_transcode_content event.
However, be careful when cleaning up dmi_queue_item, since there can be many different event types in it. It is up to system administrators to keep this queue clean, by configuring system components to use events, or not to pile up events that are not supposed to be consumed.
As for cleaning the queue, it is advised to use the destroy API command, though you can also try deleting rows with a DELETE query. Of course, try this in a dev environment first.
You would need to look at 2 queues:
dm_autorender_win31 and dm_mediaserver. In order to delete them you would run a query:
delete dmi_queue_item objects where name = 'dm_mediaserver' or name = 'dm_autorender_win31'

Lagom PersistentEntityRef

I am studying Lagom and trying to understand how persistent entities work.
I've read the following description:
Every PersistentEntity has a fixed identifier (primary key) that can be used to fetch the current state and at any time only one instance (as a "singleton") is kept in memory.
Makes sense.
Then there is the following example to create a customer:
@Override
public ServiceCall<CreateCustomerMessage, Done> createCustomer() {
    return request -> {
        log.info("===> Create or update customer {}", request.toString());
        PersistentEntityRef<CustomerCommand> ref = persistentEntityRegistry.refFor(CustomerEntity.class, request.userEmail);
        return ref.ask(new CustomerCommand.AddCustomer(request.firstName, request.lastName, request.birthDate, request.comment));
    };
}
This confuses me:
Does that mean that the persistentEntityRegistry contains multiple singleton persistentEntities? How exactly does the persistentEntityRegistry get filled and what is in it? Say we have 10k users that are created, does the registry contain 10k persistentEntities, or just 1?
In this case we want to create a new customer. So when we request a persistentEntity using persistentEntityRegistry.refFor(CustomerEntity.class, request.userEmail);, this shouldn't return anything from the registry since the customer doesn't exist yet (?).
Can you shine a light on how this works?
The documentation is good, but there are a few holes in my understanding that I haven't been able to fill.
Great questions. I'm not sure how far you are along with concepts relating to persistent entities that aren't mentioned here, so I'll start from the beginning.
When doing event sourcing, generally, for a given entity (eg, a single customer), you need a single writer. This is because generally reading and then writing to the event log is not done in a single transaction, so you read some events to load your state, validate an incoming command, and then emit one or more new events to be persisted. If two operations came in for the same entity at the same time, then they would both be validated with the same state - not taking into account the state change that the other might get in before they are executed. Hence, event sourcing requires a single writer principle, only one operation can be handled at a time so there's only one writer.
In Lagom, this is implemented using actors. Each entity (ie each instance of a customer) is loaded and managed by an actor. An actor has a mailbox (ie, a queue), where commands are placed, and it processes them one at a time, in order. For each entity, there is a singleton actor managing it (so, one actor per customer, many actors for many customers). Due to the single writer principle, it's very important that this is true.
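The single-writer mailbox can be sketched with a promise chain standing in for the actor's queue (an illustrative model only, not Lagom's actual implementation):

```javascript
// One actor per entity: commands are appended to a queue and processed
// strictly one at a time, so each command validates against the state
// left behind by the previous command.
class EntityActor {
  constructor(initialState, handler) {
    this.state = initialState;
    this.handler = handler;         // (state, command) -> newState
    this.queue = Promise.resolve(); // the "mailbox": a chain of pending work
  }
  ask(command) {
    this.queue = this.queue.then(() => {
      this.state = this.handler(this.state, command);
      return this.state;
    });
    return this.queue; // resolves once this command has been processed
  }
}
```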
But, how does a system like this scale? What happens if you have multiple nodes, do you then have multiple instances of each entity? No. Lagom uses Akka clustering with Akka cluster sharding to shard your entities across many nodes, ensuring that across all of your deployed nodes, you only have one of each entity. So when a command comes in to a node, the entity may live on the same node, in which case it just gets sent straight to the local actor for it to be processed, or it may live on a different node, in which case it gets serialised, sent to the node it lives on, and processed there, with the response being serialised and sent back.
This is one of the reasons why it's a PersistentEntityRef, due to the location transparency (you don't know where the entity lives), you can't hold onto the entity directly, you can only have a reference to it. The same terminology is used for actors, you have the actual Actor that does the behaviour, and an ActorRef is used to communicate with it.
Now, logically, when you get a reference for a customer that, according to the domain model of your system, doesn't exist yet (eg, they haven't registered), they don't exist. But the persistent entity for them can, and must, exist. There is actually no concept in Lagom of a persistent entity not existing: you can always instantiate a persistent entity with any id, and it will load. It's just that there might be no events yet for that entity, in which case, when it loads, it will just have its initialState, with no events applied. For a customer, the state of the customer might be Optional<Customer>. So, when the entity is first created, before any events have been emitted for the customer, the state will be Optional.empty(). The first event emitted for the customer will be a CustomerRegistered event, and this will cause the state to change to Optional.of(someCustomer).
To understand why logically this must be so, you don't want the same customer to be able to register themselves twice, right? You want to ensure that there is only one CustomerRegistered event for each customer. To do that, you need to have a state for the customer in their unregistered state, to validate that they are not already registered before they do register.
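A stripped-down model of that lifecycle (plain functions rather than the actual Lagom API; the event and state shapes are invented for illustration):

```javascript
// State is rebuilt by folding events; no events means the "empty" state (null,
// playing the role of Optional.empty()).
function loadCustomer(events) {
  return events.reduce(
    (state, ev) => (ev.type === 'CustomerRegistered' ? { name: ev.name } : state),
    null
  );
}

// The command handler validates against the current state before emitting:
// a second registration for the same customer is rejected.
function handleRegister(events, name) {
  const state = loadCustomer(events);
  if (state !== null) throw new Error('already registered');
  return [...events, { type: 'CustomerRegistered', name }];
}
```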
So, to make the answer to your first question clear: if you have 10K users, then there will be 10K persistent entity instances, one for each user. Though that is only logical (physically, there will be events for 10K different users in the database). How many entities are actually loaded in memory depends on which entities are active: when an entity goes, by default, 2 minutes without receiving a command, Lagom will passivate it, that is, drop it from memory, so the next time a command comes in for it, it has to be loaded by replaying its events from the database. This helps ensure that you don't run out of memory by holding all your data in memory if you don't want to.

How to keep an API idempotent while receiving multiple requests with the same id at the same time?

From a lot of articles and commercial API I saw, most people make their APIs idempotent by asking the client to provide a requestId or idempotent-key (e.g. https://www.masteringmodernpayments.com/blog/idempotent-stripe-requests) and basically store the requestId <-> response map in the storage. So if there's a request coming in which already is in this map, the application would just return the stored response.
This is all good to me but my problem is how do I handle the case where the second call coming in while the first call is still in progress?
So here are my questions:
I guess the ideal behaviour would be for the second call to keep waiting until the first call finishes, and then return the first call's response? Is this how people do it?
If yes, how long should the second call wait for the first call to finish?
If the second call has a wait time limit and the first call still hasn't finished, what should it tell the client? Should it just not return any response, so the client will time out and retry?
For Wunderlist we use database constraints to make sure that no request id (which is a column in every one of our tables) is ever used twice. Since our database technology (Postgres) guarantees that it is impossible to insert two records that violate this constraint, we only need to react properly to the potential insertion error. Basically, we outsource this detail to our datastore.
I would recommend, no matter how you go about this, trying not to coordinate in your application. If you try to know whether two things are happening at once, there is a high likelihood of bugs. Instead, there might be a system you already use which can make the guarantees you need.
Now, to specifically address your three questions:
For us, since we use database constraints, the database handles making things queue up and wait. This is why I personally prefer the old SQL databases - not for the SQL or relations, but because they are really good at locking and queuing. We use SQL databases as dumb disconnected tables.
This depends a lot on your system. We try to tune all of our timeouts to around 1s in each system and subsystem. We'd rather fail fast than queue up. You can measure and then look at your 99th percentile for timings and just set that as your timeout if you don't know ahead of time.
We would return a 504 HTTP status (and an appropriate response body) to the client. The reason for having an idempotency key is so the client can retry a request, so we are never worried about timing out and letting them do just that. Again, we'd rather time out fast and fix the problems than let things queue up. If things queue up, then even after something is fixed, one has to wait a while for things to get better.
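A toy version of the outsource-to-the-datastore idea, with an in-memory map standing in for the unique constraint (in a real system the INSERT itself would fail for the losing writer; the names here are invented for illustration):

```javascript
// First request to claim a requestId does the work and stores the response;
// a retry with the same requestId gets the stored response back and the
// work is never executed twice.
class IdempotencyStore {
  constructor() {
    this.responses = new Map(); // requestId -> stored response
  }
  claim(requestId, work) {
    if (this.responses.has(requestId)) {
      return { firstTime: false, response: this.responses.get(requestId) };
    }
    const response = work(); // perform the operation exactly once
    this.responses.set(requestId, response);
    return { firstTime: true, response };
  }
}
```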
It's a bit hard to tell whether the second call is from the same client with the same request token, or from a different client.
Normally, in the case of concurrent requests from different clients operating on the same resource, you would also want to implement a versioning strategy alongside a request token for idempotency.
A typical versioning strategy in a relational database might be a version column with a trigger that auto-increments the number each time a record is updated.
With this in place, all clients must specify their request token as well as the version they are updating (typically the If-Match header is used for this, with the version number as the value of the ETag).
On the server side, when it comes time to update the state of the resource, you first check that the version number in the database matches the version supplied in the ETag. If they match, you write the changes and the version increments. Assuming the second request was operating on the same version number as the first, it would then fail with a 412 (or a 409, depending on how you interpret the HTTP specifications) and the client should not retry.
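The version check described above amounts to a compare-and-set; a minimal sketch (class and field names are invented for illustration):

```javascript
// Optimistic concurrency: an update must carry the version the client read
// (the If-Match / ETag value); a stale version is rejected with 412.
class Resource {
  constructor(state) {
    this.state = state;
    this.version = 1;
  }
  update(expectedVersion, newState) {
    if (expectedVersion !== this.version) {
      return { status: 412 }; // precondition failed: someone else won the race
    }
    this.state = newState;
    this.version += 1; // stands in for the trigger-style auto increment
    return { status: 200, version: this.version };
  }
}
```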
If you really want to stop the second request immediately while the first request is in progress, you are going down the route of pessimistic locking, which doesn't suit REST API's that well.
In the case where you are actually talking about the client retrying with the same request token because it received a transient network error, it's almost the same case.
Both requests will be running at the same time; the second request will start because the first request has not finished yet and has not recorded the request token to the database. Whichever one finishes first will succeed and record the request token.
For the other request, it will receive a version conflict (since the first request has incremented the version), at which point it should re-check the request token table, find its own token in there, assume it was a concurrent request that finished before it did, and return 200.
It seems like a lot, but if you want to cover all the weird and wonderful failure modes when you're dealing with REST, idempotency and concurrency, this is the way to do it.

Long polling on a penny auction site?

On a penny auction site, there are a few fundamental requests that happen over time, namely:
Bidding request (when someone places a bid)
Timer updates
Leading bidder updates
I am trying to understand long polling a bit better and I'm stuck on this. As far as I know, long polling is there to reduce Ajax requests, i.e. by only having ONE request for visual updates and ONE for actions. So, therefore:
The bidding request (to place bids) will remain as is, but all the visual update requests will be combined into one "long poll" request, right?
When the user connects to the site for the first time, he will request the current state of the page, also passing in what he was last told the state of the page was. The server will compare it with what the state should be, and if they are different, it will pass the new state back to the user, correct?
When the state is passed back, the long poll will effectively stop, the screen will be updated, and a new long poll will be started, correct?
Is this understanding correct so far?
If that is so, how will this in any way decrease the number of requests to the backend if the server still has to compare the state?
How will this help if the page is opened in 50 different windows by one user?
Long polling is used to simulate a connection in which the server pushes data to the client (rather than what is actually happening - which is the client requesting the information from the server). Basically the client requests data from the server, but rather than returning data to the client immediately the server 'holds' the request open - it can then return data to the client at a later time point - which can be used to simulate the server updating the client in 'real time'.
So in your example of an auction site, the client might long-poll the server for an item's bid amount. The server would hold this request open and, when the bid value on that item changes, return the updated amount to the client. This can be used to give the impression of the server updating the client as the bid amount changes.
As far as requests to the server go, this very much depends on how it is implemented. Obviously, long polling will reduce the number of requests made to the server compared with, say, having the client issue a new 'standard' request every second to check for updates. Multiple instances of the client will still result in multiple requests to the server, and moreover the server still has to deal with the overhead of holding the long-polling requests open and responding to them when appropriate. Different servers, and server architectures, deal with this more effectively than others.
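The hold-the-request-open mechanic can be sketched with promises: a poll carrying a stale version resolves immediately, while an up-to-date poll is parked until the next update (an illustrative model, not a production server):

```javascript
// One "channel" per auction item: clients poll with the last version they saw.
class Channel {
  constructor(initialState) {
    this.state = initialState;
    this.version = 0;
    this.waiters = []; // held requests, waiting for the next update
  }
  poll(lastVersion) {
    if (lastVersion !== this.version) {
      // Client is behind: answer immediately with the current state.
      return Promise.resolve({ version: this.version, state: this.state });
    }
    // Client is up to date: hold the request open until something changes.
    return new Promise(resolve => this.waiters.push(resolve));
  }
  update(newState) {
    this.state = newState;
    this.version += 1;
    const payload = { version: this.version, state: this.state };
    this.waiters.splice(0).forEach(resolve => resolve(payload)); // release all
  }
}
```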