What is application state? - migration

This is a very general question. I am a bit confused with the term state. I would like to know what do people mean by "state of an application"? Why do they call webserver as "stateless" and database as "stateful"?
How is the state of an application (in a VM) transferred, when the VM memory is moved from one machine to another during live migration.
Is transferring the memory, caches and register values of a system enough to transfer the state of the running application?

You've definitely asked a mouthful -- it's unfortunate that the word state is used in so many different contexts, but each one is a valid use of the word.
State of an application
An application's state is roughly the entire contents of its memory. This can be a difficult concept to get behind until you've seen something like Erlang's server loops, which explicitly pass all the state of the application in a variable from one invocation of the function to the next. In more "normal" programming languages, the "state" of the program is all its global variables, static variables, objects allocated on the heap, objects allocated on the stack, registers, open file descriptors and file offsets, open network sockets and associated kernel buffers, and so forth.
You can actually save that state and resume execution of the process elsewhere. The BLCR checkpoint tools for Linux do exactly this. (Though it is an extremely uncommon task to perform.)
State of a protocol
The state of a protocol is a different sort of meaning -- the statelessness of HTTP requests means that every web browser communication with webservers essentially starts over, from scratch -- every cookie is re-transmitted in both directions to try to "fake" some amount of a "session" for the user's sake. The servers don't hold any resources open for any given client across requests -- each one starts from scratch.
Networked filesystems might also be stateless (earlier versions of NFS) or stateful (newer versions of NFS). The earlier versions assumed every individual packet of reading, writing, or metadata control would be committed as it arrived, and every time a specific byte was needed from a file, it would be re-requested. This allowed the servers to be very simple -- they would do what the client packets told them to do and no effort was required to bring servers and clients back to consistency if a server rebooted or routers disappeared. However, this was bad for performance -- every client requested static data hundreds or thousands of times each day. So newer versions of NFS allowed some amount of data caching on the clients, and persistent file handles between servers and clients, and the servers had to keep track of the state of the clients that were connected -- and vice versa: the clients also had to know what promises they had made to the servers.
A stateful firewall will keep track of active TCP sessions. It knows which sessions the system administrators want to allow through, so it looks for those initial packets specifically. Once the session is set up, it then tracks the established connections as entities in their own rights. (This was a real advancement upon previous stateless firewalls which considered packets in isolation -- the rulesets on previous firewalls were much more permissive to achieve the same levels of functionality, but allowed through far too many malicious packets that pretended a session was already active.)

An application state is simply the state at which an application resides with regards to where in a program is being executed and the memory that is stored for the application. The web is "stateless," meaning everytime you reload a page, no information remains from the previous version of the page. All information must be resent from the server in order to display the page.
Technically, browsers get around the statelessness of the web by utilizing techniques like caching and cookies.

Application state is a data repository available to all classes. Application state is stored in memory on the server and is faster than storing and retrieving information in a database. Unlike session state, which is specific to a single user session, application state applies to all users and sessions. Therefore, application state is a useful place to store small amounts of often-used data that does not change from one user to another.
Resource:http://msdn.microsoft.com/en-us/library/ms178594.aspx

Is transferring the memory, caches and register values of a system enough to transfer the state of the running application?
Does the application have a file open, positioned at byte 225? If so, that file is part of the application's state because the next byte written should go to position 226.
Has the application authenticated itself to a secure server with a time-based key? Then that connection is part of the application's state, because if the application were to be suspended for 24 hours after saving memory, cache, and register values, when it resumes it will no longer have a valid connection to the secure server because it will have timed out.
Things which make an application stateful are easy to overlook.

Related

gun db storage model for large centrally stored data, tiny collaborative clients

Use Case:
Say I wanted to create a realtime-collaborative document editing system.
In this scenario many users could create and collaborate on many documents.
Due to client-device constraints, it's not possible for any client to keep a replica of all documents, only just a handful.
There needs to be central storage server where all documents always live, and this server is always backed up.
Each client can 'subscribe' to any document, and all clients subscribed can see realtime changes of all other clients subscribed/editing the same document.
Questions:
Since each client can't store all documents, there needs to be a way to remove the replicas of 'old' documents from the client, without deleting the document from the central store, ideally based on an automatic least-recently-used approach. How is this handled in gun?
In gun, how can a document be deleted from the central store, so it's then effectively permanently removed from, and no longer accessible to, all clients?
When a document is deleted from the central store, how is the physical storage space ever actually reclaimed for later use?
Great questions, #user2672083 . Here is the current lay out:
Collaborative realtime document editing is possible with gun. Here is a quick prototype I recorded a long time ago, however there are no full pre-built examples/implementations yet.
Not all data is stored on every client by default. The browser only keeps the data it requests/gets/subscribes to.
The default server already acts as a backup. I recommend using the S3 storage adapter, because then you do not have to worry about running out of disk space.
Removing old replicas. Currently, if I want the server to act as a central "master", I just put a localStorage.clear() at the top of my browser code. This will force the browser to have to always load the latest from the server. This is not ideal though, an LRU specific feature is coming soon according to the roadmap.
Permanently removing data and reclaiming space. While this should be easy for a central setup, because gun is P2P by default, it uses a technique called tombstoning to delete data. Given a lot of requests (like yours) for LRU/TTL/GC/deleting, there will be better support for this in the future. Currently, you have to use a mix of rm data.json, localStorage.clear() and 30 day lifecycles on S3 to get this to work. This will be more integrated/easier in the future.
Now a question for you: What are you working on, and how can I help? Many of the things you asked about are possible (with some work) now, but slated to be the focus of the next version of gun - I'd love to get your feedback as we build this out.
All peers reply to requests for data (#2), meaning that localStorage and the server will both reply. Because localStorage is physically closer to a user, it will reply first/fastest and then replies from the server will be merged. GUN does not try each peer "in sequence" doing try/catch cascades, GUN replies from all peers in parallel.
GUN has swappable storage and transport interfaces, so yes, it is easy to build other layers on top or into it.

How to handle temporary unreachable online api

This is a more general question, so bear with my abstraction of the following problem.
I'm currently developing an application, that is interfacing with a remote server over a public api. The api in question does provide mechanisms for fetching data based on a timestamp (e.g. "get me everything that changed since xxx"). Since the amount of data is quite high, I keep a local copy in a database and check for changes on the remote side every hour.
While this makes the application robust against network problems (remote server in maintenance, network outage, etc.) and enables employees to continue working with the application, there is one big gaping problem:
The api in question also offers write access. E.g. my application can instruct the remote server to create a new object. Currently I'm sending the request via api, and upon success create the object in my local database, too. It will eventually propagate via the hourly data fetching, where my application (ideally) sees that no changes need to be made to the local database.
Now when the api is unreachable, i create the object in my database, and cache the request until the api is reachable again. This has multiple problems:
If the request fails (due to not beforehand validateble errors), I end up with an object in the database which shouldn't even exist. I could delete it, but it seems hard to explain to the user(s) ("something went wrong with the api, we deleted that object again").
The problem especially cascades when depended actions que up. E.g. creating the object, and two more requests for modifying it. When the initial create fails, so will the modifying requests (since the object does not exist on the remote side)
Worst case is deletion - when an object is deleted locally, but will not be deleted on the remote site, I have no way of restoring it (easily).
One might suggest to never create objects locally, and let them propagate only through the hourly data sync. This unfortunately is not an option. If the api is not accessible, it can be for hours. And it is mandatory that employees can continue working with the application (which they cannot when said objects don't exist locally).
So bottom line:
How to handle such a scenario, where the api might not be reachable, but certain requests must be cached locally and repeated when the api is reachable again. Especially how to handle cases where those requests unpredictable fail.

Scalability issues with server based authentication

I was reading up on problems with server based authentication. I need help with elaboration on the following point.
Scalability: Since sessions are stored in memory, this provides problems with scalability. As our cloud providers start replicating servers to handle application load, having vital information in session memory will limit our ability to scale.
I don't seem to understand why "... having vital information in session memory will limit our ability to scale", will limit the ability to scale. Is it just because the information is being replicated.. so it's to do with redundancy? I don't think so. Anyway, would anyone be kind enough to explain this further? Much appreciated.
What's being referred to is the difference between stateless and stateful server-side ops. Stateful servers keep part of their resources (main memory, mostly) occupied for retaining state pertaining to some client, even when the server is actually doing nothing at all for the client and just waiting for the client to come back. Such systems' performance profile is "linear" only up to the point where all available memory has been filled with state, and beyond that point the server seems to essentially stall. Stateless servers only keep resources occupied when they're actually doing something, and once finished doing stuff, those resources are immediately freed and available for other clients. Such servers are essentially not capped by memory limits and therefore "scale easier".
Also, the explanation given seems to refer to scenario's where a set of distinct machines present themselves to the outside world as being one, when actually they are not (this is often called a "cluster" of machines/servers). In such scenario's, if a client has connected to the "big single virtual machine", then actually he is connected to just one of the "actual machines" in the cluster. If state is kept there, subsequent visits by that same client must then be routed to the same physical machine, or that piece of state must be trafficked around to whatever machine the next visit happens to be to. The former implies the implementation of management functions that take their own set of resources, plus limitations on the freedom the cluster has to distribute the load (the opposite of why you want to do clustering), the latter implies additional network traffic that will cap scalability in essentially the same way as available memory does.
Server-based authentication makes use of sessions, which in turn make use of a local session id. In the cloud, when the servers are replicated to handle application load, it becomes difficult for one server to know which sessions are active on other servers. Now to overcome this problem, extra steps must be performed... for instance to persist the session id on to the database. However, as the servers are increasingly replicated, it becomes more and more difficult to handle all this. Therefore, server-based or session-based authentication can be problematic for scalability.

Zookeeper vs In-memory-data-grid vs Redis

I've found different zookeeper definitions across multiple resources. Maybe some of them are taken out of context, but look at them pls:
A canonical example of Zookeeper usage is distributed-memory computation...
ZooKeeper is an open source Apacheā„¢ project that provides a centralized infrastructure and services that enable synchronization across a cluster.
Apache ZooKeeper is an open source file application program interface (API) that allows distributed processes in large systems to synchronize with each other so that all clients making requests receive consistent data.
I've worked with Redis and Hazelcast, that would be easier for me to understand Zookeeper by comparing it with them.
Could you please compare Zookeeper with in-memory-data-grids and Redis?
If distributed-memory computation, how does zookeeper differ from in-memory-data-grids?
If synchronization across cluster, than how does it differs from all other in-memory storages? The same in-memory-data-grids also provide cluster-wide locks. Redis also has some kind of transactions.
If it's only about in-memory consistent data, than there are other alternatives. Imdg allow you to achieve the same, don't they?
https://zookeeper.apache.org/doc/current/zookeeperOver.html
By default, Zookeeper replicates all your data to every node and lets clients watch the data for changes. Changes are sent very quickly (within a bounded amount of time) to clients. You can also create "ephemeral nodes", which are deleted within a specified time if a client disconnects. ZooKeeper is highly optimized for reads, while writes are very slow (since they generally are sent to every client as soon as the write takes place). Finally, the maximum size of a "file" (znode) in Zookeeper is 1MB, but typically they'll be single strings.
Taken together, this means that zookeeper is not meant to store for much data, and definitely not a cache. Instead, it's for managing heartbeats/knowing what servers are online, storing/updating configuration, and possibly message passing (though if you have large #s of messages or high throughput demands, something like RabbitMQ will be much better for this task).
Basically, ZooKeeper (and Curator, which is built on it) helps in handling the mechanics of clustering -- heartbeats, distributing updates/configuration, distributed locks, etc.
It's not really comparable to Redis, but for the specific questions...
It doesn't support any computation and for most data sets, won't be able to store the data with any performance.
It's replicated to all nodes in the cluster (there's nothing like Redis clustering where the data can be distributed). All messages are processed atomically in full and are sequenced, so there's no real transactions. It can be USED to implement cluster-wide locks for your services (it's very good at that in fact), and tehre are a lot of locking primitives on the znodes themselves to control which nodes access them.
Sure, but ZooKeeper fills a niche. It's a tool for making a distributed applications play nice with multiple instances, not for storing/sharing large amounts of data. Compared to using an IMDG for this purpose, Zookeeper will be faster, manages heartbeats and synchronization in a predictable way (with a lot of APIs for making this part easy), and has a "push" paradigm instead of "pull" so nodes are notified very quickly of changes.
The quotation from the linked question...
A canonical example of Zookeeper usage is distributed-memory computation
... is, IMO, a bit misleading. You would use it to orchestrate the computation, not provide the data. For example, let's say you had to process rows 1-100 of a table. You might put 10 ZK nodes up, with names like "1-10", "11-20", "21-30", etc. Client applications would be notified of this change automatically by ZK, and the first one would grab "1-10" and set an ephemeral node clients/192.168.77.66/processing/rows_1_10
The next application would see this and go for the next group to process. The actual data to compute would be stored elsewhere (ie Redis, SQL database, etc). If the node failed partway through the computation, another node could see this (after 30-60 seconds) and pick up the job again.
I'd say the canonical example of ZooKeeper is leader election, though. Let's say you have 3 nodes -- one is master and the other 2 are slaves. If the master goes down, a slave node must become the new leader. This type of thing is perfect for ZK.
Consistency Guarantees
ZooKeeper is a high performance, scalable service. Both reads and write operations are designed to be fast, though reads are faster than writes. The reason for this is that in the case of reads, ZooKeeper can serve older data, which in turn is due to ZooKeeper's consistency guarantees:
Sequential Consistency
Updates from a client will be applied in the order that they were sent.
Atomicity
Updates either succeed or fail -- there are no partial results.
Single System Image
A client will see the same view of the service regardless of the server that it connects to.
Reliability
Once an update has been applied, it will persist from that time forward until a client overwrites the update. This guarantee has two corollaries:
If a client gets a successful return code, the update will have been applied. On some failures (communication errors, timeouts, etc) the client will not know if the update has applied or not. We take steps to minimize the failures, but the only guarantee is only present with successful return codes. (This is called the monotonicity condition in Paxos.)
Any updates that are seen by the client, through a read request or successful update, will never be rolled back when recovering from server failures.
Timeliness
The clients view of the system is guaranteed to be up-to-date within a certain time bound. (On the order of tens of seconds.) Either system changes will be seen by a client within this bound, or the client will detect a service outage.

LDAP - write concern / guaranteed write to replicas prior to return

Is OpenLDAP (or are any of LDAP's flavors) capable of providing write concern? I know it's an eventually consistent model, but there's more then a few DB's that have eventual consistency + write concern.
After doing some research, I'm still not able to figure out whether or not it's a thing.
The UnboundID Directory Server provides support for an assured replication mode in which you can request that the server delay the response to an operation until it has been replicated in a manner that satisfies your desired constraints. This can be controlled on a per-operation basis by including a special control in the add/delete/modify/modify DN request, or by configuring the server with criteria that can be used to identify which operations should use this assured replication mode (e.g., you can configure the server so that operations targeting a particular set of attributes are subjected to a greater level of assurance than others).
Our assured replication implementation allows you to define separate requirements for local servers (servers in the same data center as the one that received the request from the client) and nonlocal servers (servers in other data centers). This allows you tune the server to achieve a balance between performance and behavior.
For local servers, the possible assurance levels are:
Do not perform any special assurance processing. The server will send the response to the client as soon as it's processed locally, and the change will be replicated to other servers as soon as possible. It is possible (although highly unlikely) that a permanent failure that occurs immediately after the server sends the response to the client but before it gets replicated could cause the change to be lost.
Delay the response to the client until the change has been replicated to at least one other server in the local data center. This ensures that the change will not be lost even in the event of the loss of the instance that the client was communicating with, but the change may not yet be visible on all instances in the local data center by the time the client receives the response.
Delay the response to the client until the result of the change is visible in all servers in the local data center. This ensures that no client accessing local servers will see out-of-date information.
The assurance options available for nonlocal servers are:
Do not perform any special assurance processing. The server will not delay the response to the client based on any communication with nonlocal servers, but a change could be lost or delayed if an entire data center is lost (e.g., by a massive natural disaster) or becomes unavailable (e.g., because it loses network connectivity).
Delay the response to the client until the change has been replicated to at least one other server in at least one other data center. This ensures that the change will not be lost even if a full data center is lost, but does not guarantee that the updated information will be visible everywhere by the time the client receives the response.
Delay the response to the client until the change has been replicated to at least one server in every other data center. This ensures that the change will be processed in every data center even if a network partition makes a data center unavailable for a period of time immediately after the change is processed. But again this does not guarantee that the updated information will be visible everywhere by the time the client receives the response.
Delay the response to the client until the change is visible in all available servers in all other data centers. This ensures that no client will see out-of-date information regardless of the location of the server they are using.
The UnboundID Directory Server also provides features to help ensure that clients are not exposed to out-of-date information under normal circumstances. Our replication mechanism is very fast so that changes generally appear everywhere in a matter of milliseconds. Each server is constantly monitoring its own replication backlog and can take action if the backlog becomes too great (e.g., mild action like alerting administrators or more drastic measures like rejecting client requests until replication has caught up). And because most replication backlogs are encountered when the server is taken offline for some reason, the server also has the ability to delay accepting connections from clients at startup until it has caught up with all changes processed in the environment while it was offline. And if you further combine this with the advanced load-balancing and health checking capabilities of the UnboundID Directory Proxy Server, you can ensure that client requests are only forwarded to servers that don't have a replication backlog or any other undesirable condition that may cause the operation to fail, take an unusually long time to complete, or encounter out-of-date information.
From reviewing RFC3384 discussion of replication requirements with respect to LDAP, it looks as though LDAP only requires eventual consistency and does not require transactional consistency. Therefore any products which support this feature are likely to do this with vendor specific implementations.
CA Directory does support a proprietary replication model called MULTI-WRITE which guarantees that the client obtains write confirmation only after all replicated instances have been updated. In addition it supports the standard X.525 Shadowing Protocol which provides lesser consistency guarantees and better performance.
With typical LDAP implementations, an update request will normally return immediately when the DSA handling this request has been updated, and not when the replica instances have been updated. This is the case with OpenLDAP I believe. The benefits are speed, the downsides are lack of guarantee that an updated has been applied to all replicas.
CA's directory product uses a Memory Mapped system and writes are so fast this is not a concern.