On certain container boxes Chronicle Queue is not working. I am seeing this exception: 2018-11-17 16:30:57.825 [failsafe-sender] WARN n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - Unable to append EOF, skipping
java.util.concurrent.TimeoutException: header: 80000000, pos: 104666
at net.openhft.chronicle.wire.AbstractWire.writeEndOfWire(AbstractWire.java:459)
at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueStore.writeEOF(SingleChronicleQueueStore.java:349)
at
I want to understand why this happens only on certain VMs.
Note: we are using an NFS file system.
I tried to understand the behavior on NFS.
Chronicle Queue does not support operating off any network file system, be it NFS, AFS, SAN-based storage or anything else. The reason is that those file systems do not provide all the primitives required by the memory-mapped files Chronicle Queue uses.
Put another way, Chronicle Queue uses off-heap memory-mapped files, and these files rely on memory-mapped, CAS-based locks. These CAS operations are usually not atomic between processes when using network-attached storage, and certainly not atomic between processes hosted on different machines. If your test sometimes works on certain combinations of file system and OS, then either your test did not hit a concurrency race, or that particular combination of NAS and OS happened to honour these CAS operations; however, we feel the latter is very unlikely. As a solution, we have created a product called chronicle-queue-enterprise, a commercial product that lets you share a queue between machines using TCP/IP. Please contact sales#chronicle.software for more information on chronicle-queue-enterprise.
For reliable distribution of data between machines you need to use Chronicle Queue Enterprise. NFS doesn't support atomic memory operations between machines.
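Since the failures show up only on some boxes, one thing worth checking on each VM is which file system the queue directory actually lives on: the same image can have its data directory on local disk on one host and on an NFS mount on another. Below is a minimal Python sketch, assuming a Linux host where `/proc/self/mounts` is available. The set of network file system type names is an assumption and not exhaustive, and the queue path is a placeholder.

```python
import os

# Assumption: a non-exhaustive set of network/shared file system types as they
# appear in the third field of /proc/self/mounts on Linux.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "afs", "glusterfs", "ceph", "fuse.sshfs"}


def mount_fstype(path, mounts_text):
    """Return the file system type of the mount containing `path`, or None.

    `mounts_text` is the content of /proc/self/mounts, one mount per line:
    "device mount_point fstype options dump pass". Escaped characters in mount
    points (e.g. spaces written as octal escapes) are not decoded here.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # The longest matching mount point wins; ">=" lets a later mount on the
        # same point (which hides the earlier one) take precedence.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def is_on_network_fs(path, mounts_text):
    return mount_fstype(path, mounts_text) in NETWORK_FS_TYPES


if __name__ == "__main__" and os.path.exists("/proc/self/mounts"):
    with open("/proc/self/mounts") as f:
        mounts = f.read()
    queue_dir = "/path/to/queue"  # placeholder: the directory your queue writes to
    print(queue_dir, mount_fstype(queue_dir, mounts), is_on_network_fs(queue_dir, mounts))
```

If it reports `nfs` or `nfs4` on the failing boxes and a local type such as `ext4` or `xfs` on the working ones, that would fit the explanation above.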
Related
I have a producer of tasks and multiple workers to consume those tasks. Many places recommend RabbitMQ and/or Celery. However, Python has a built-in multiprocessing queue that can be shared on an IP/port using a manager/proxy. What would be the advantages of using something like RabbitMQ instead?
RabbitMQ is an enterprise-level tool, typically deployed separately on out-of-process servers / VMs / containers, and plays in the enterprise service bus space.
Rabbit has reliable messaging as an objective - e.g. messages are persisted, and nodes in the cluster can be restarted without losing messages.
Supports a large range of messaging topologies, such as point-to-point, fan-out, and topic subscriptions
Can be scaled for volume by adding multiple nodes to a cluster
Allows for conditional routing of messages to queues using routing keys or header filters
Agnostic of client technology, i.e. clients can be on any platform that supports the AMQP protocol
Has an out-of-the-box administration, monitoring and diagnostics UI
Has a wide range of extensions and tools, such as shovels allowing messages to be replicated across multiple RabbitMQ clusters.
I'm no Python expert, but from what I understand of the multiprocessing package, it serves as a manager for distributing work between worker processes and threads, so IMO it would be regarded as a more local system concern, as opposed to 'enterprise' level.
For example, you would need to handle persistence yourself, i.e. so that messages are not lost during a crash / restart, and would likely need to build your own administration and monitoring tools.
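For comparison, this is roughly what the built-in approach from the question looks like: a `multiprocessing.managers.BaseManager` subclass that exposes a `queue.Queue` over TCP, following the pattern of the remote-manager example in the Python docs. It is a sketch: the server runs in a background thread here only so the example fits in one process, and the authkey, host and port are placeholders.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

AUTHKEY = b"change-me"  # placeholder shared secret

# The task queue lives only in the memory of the process running the server.
_tasks = queue.Queue()


class QueueManager(BaseManager):
    """Manager that exposes the task queue to other processes over TCP."""


QueueManager.register("get_queue", callable=lambda: _tasks)


def start_server(host="127.0.0.1", port=0):
    """Serve the queue from a daemon thread and return the bound (host, port).

    port=0 lets the OS pick a free port; a real deployment would fix the port
    and usually run the server as its own process.
    """
    server = QueueManager(address=(host, port), authkey=AUTHKEY).get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect(address):
    """Connect as a producer or worker and return a proxy for the remote queue."""
    manager = QueueManager(address=address, authkey=AUTHKEY)
    manager.connect()
    return manager.get_queue()


if __name__ == "__main__":
    address = start_server()
    producer = connect(address)
    worker = connect(address)
    for n in range(3):
        producer.put(f"task-{n}")
    print([worker.get(timeout=1) for _ in range(3)])  # ['task-0', 'task-1', 'task-2']
```

Note that the queue only exists in the server process's memory. If that process crashes or is restarted, any queued tasks are lost, and there is no built-in UI for watching queue depth: that is the persistence and monitoring work the answer says you would have to build yourself.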
We are planning to use an iSCSI target to handle the ActiveMQ master/slave setup. In this case we will mount a SAN storage volume on two virtual machines using the iSCSI protocol, and those two VMs would share the same mount (from the SAN). So the question is: will file locking work properly with this approach? And can we anticipate any issues with this design?
Mounting via NFS may need a file server between the SAN and the VMs, so we are not considering this as an option and are planning to use iSCSI. Any help would be greatly appreciated.
You must use a clustered, "shared disk" filesystem on the iSCSI LUNs. Conventional filesystems (ext3, xfs, ntfs, etc.) do not expect (or handle) the data changing out from underneath them. They just won't work.
I don't have any particular one to recommend, but the most accessible of these shared-disk filesystems is probably GFS2. The Wikipedia page for clustered filesystems lists several examples under the Shared-disk file system heading.
https://en.wikipedia.org/wiki/Clustered_file_system
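To make the locking requirement concrete: ActiveMQ's shared-storage master/slave works by having each broker try to grab an exclusive lock on a file in the shared data directory, and whichever broker gets it becomes master. The Python sketch below shows that election pattern with `fcntl.flock` on a single POSIX host; it is not ActiveMQ's code, and the lock file name is hypothetical. Whether a lock taken on one VM is visible to the other VM is exactly what a conventional filesystem on a shared iSCSI LUN will not guarantee, which is why a cluster filesystem is needed.

```python
import fcntl
import os
import tempfile


def try_acquire_master_lock(lock_path):
    """Try to take an exclusive, non-blocking flock() on lock_path.

    Returns the open file (keep it open for as long as this node is master),
    or None if another holder already owns the lock.
    """
    f = open(lock_path, "a+")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()  # someone else is master: act as slave and retry later
        return None
    return f


if __name__ == "__main__":
    lock_file = os.path.join(tempfile.mkdtemp(), "lock")  # stands in for the shared data dir
    master = try_acquire_master_lock(lock_file)
    slave = try_acquire_master_lock(lock_file)
    print("first node is master:", master is not None)  # True
    print("second node is master:", slave is not None)  # False
    master.close()  # the master "dies", which releases the lock
    takeover = try_acquire_master_lock(lock_file)
    print("second node takes over:", takeover is not None)  # True
    takeover.close()
```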
I am new in
Apache ZooKeeper : ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Apache Mesos : Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.
Apache Helix : Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes.
Erlang Language : Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability.
It seems to me that Helix and Mesos are both useful as cluster management systems. How are they related to ZooKeeper? It would help if someone could give me a real-world example of their usage.
I am curious to know how BOINC distributes tasks to its clients. Is it using any of the above technologies? (Forget about Erlang.)
I just need a brief view on it :)
Erlang was built by Ericsson, designed for use in phone systems. By design, it runs hundreds, thousands, or even tens of thousands of small processes that handle tasks by sending messages between them instead of sharing memory or state. This enables all sorts of interesting features that are great for high-availability distributed systems, such as:
Hot code reloading. Each process is paused, its relevant module code is swapped out, and it is resumed where it left off, so deploys can happen without restarting or causing significant interruption.
Easy distributed messaging and clustering. Sending a message to a local process or a remote one is fairly seamless in most instances.
Process-local GC. Garbage collection happens in each process independently, instead of a global stop-the-world event as in Java, which helps achieve low latency.
Supervision trees and complex process hierarchy and monitoring/managing.
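Python has nothing like Erlang's lightweight processes, so this is only an analogy, but it shows the share-nothing, message-passing style described above: a thread owns its state, and other code can only interact with it by sending messages to its mailbox. It does not give you Erlang's per-process GC, hot code reloading, supervision or distribution; `CounterActor` and its message names are hypothetical.

```python
import queue
import threading


class CounterActor:
    """A tiny Erlang-style 'process': private state, reachable only via messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Handle one message at a time, in arrival order.
        while True:
            msg, reply_to = self._mailbox.get()
            if msg == "incr":
                self._count += 1
            elif msg == "get":
                reply_to.put(self._count)
            elif msg == "stop":
                return

    def send(self, msg, reply_to=None):
        """Fire-and-forget message, similar in spirit to Erlang's `Pid ! Msg`."""
        self._mailbox.put((msg, reply_to))

    def ask(self, msg, timeout=1):
        """Send a message and wait for the reply."""
        reply = queue.Queue(maxsize=1)
        self.send(msg, reply)
        return reply.get(timeout=timeout)


if __name__ == "__main__":
    counter = CounterActor()
    senders = [threading.Thread(target=lambda: [counter.send("incr") for _ in range(100)])
               for _ in range(4)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()
    print(counter.ask("get"))  # 400: no locks needed, only one thread touches _count
    counter.send("stop")
```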
A few concrete real-world examples that make great use of Erlang are:
MongooseIM: a highly performant and incredibly scalable, distributed XMPP / chat server
Riak: a distributed key/value store.
Mesos, on the other hand, you can sort of think of as a platform for turning a datacenter of servers into a platform for teams and developers. If I, say as a company, own a datacenter with 10,000 physical servers, and I have 1,000 engineers developing hundreds of services, Mesos is a good way to let the engineers deploy and manage services across that hardware without needing to worry about the servers directly. It's an abstraction layer on top of the physical servers that allows you to share and intelligently allocate resources.
As a user of Mesos, I might say that I have Service X. It's an executable bundle that lives in location Y. Each instance of Service X needs 4 GB of RAM and 2 cores, and I need 8 instances, which will be attached to a load balancer. You can specify this in configuration and deploy based on that config. Mesos will find hardware that has enough RAM and CPU capacity available to handle each instance of that service and start it running in each of those locations.
It can handle a lot of other, more complex orchestration topics as well, but that's probably a bit in-depth for this :)
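The bookkeeping in the Service X example can be sketched as a toy first-fit placement in Python. This is not Mesos's allocation algorithm (in Mesos the master offers resources to frameworks such as Marathon, which decide what to launch); it only illustrates the arithmetic of fitting instances of 4 GB / 2 cores onto hosts with limited free capacity. The host names and sizes are made up.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_ram_gb: float
    free_cores: int


def place(instances, ram_gb, cores, hosts):
    """Toy first-fit: put each instance on the first host with enough free RAM and cores.

    Mutates the hosts' free capacity and returns one host name per instance.
    """
    placement = []
    for _ in range(instances):
        for host in hosts:
            if host.free_ram_gb >= ram_gb and host.free_cores >= cores:
                host.free_ram_gb -= ram_gb
                host.free_cores -= cores
                placement.append(host.name)
                break
        else:  # no host had room for this instance
            raise RuntimeError(f"not enough capacity for instance {len(placement) + 1}")
    return placement


if __name__ == "__main__":
    hosts = [Host("a", 16, 8), Host("b", 8, 4), Host("c", 32, 4)]  # made-up hosts
    # Service X: 8 instances, each needing 4 GB of RAM and 2 cores.
    print(place(8, 4, 2, hosts))  # ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c']
```

Host c has plenty of RAM left after two instances but runs out of cores, which is why a scheduler has to track every resource dimension, not just one.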
ZooKeeper's most common use cases are service discovery and configuration management. You can think of it, fundamentally, as a bit like a nested key-value store, where services can look at predefined paths to see where other services currently live.
A simple example: I have a web service using a shared database cluster. I know a simple name for that database cluster and where its configuration lives in ZooKeeper. I can look up (or repeatedly poll) that path in ZooKeeper to check what the addresses of the active database hosts are. On the other side, if I take a database node out of rotation and replace it with a new one, the config in ZooKeeper gets updated with the new address, and anything continually looking at it will detect this change and change what it's connected to.
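The lookup-and-watch pattern from the database example, as a toy in-memory model in Python. `ToyRegistry` is hypothetical and only mimics the idea (a path holding a value, plus callbacks fired when it changes); it has none of ZooKeeper's sessions, ephemeral nodes, quorum or one-shot watch semantics. With a real ensemble you would use a client library (for Python, kazoo provides watch helpers such as `DataWatch`). The path and addresses are made up.

```python
from collections import defaultdict


class ToyRegistry:
    """In-memory stand-in for a hierarchical config store with change callbacks."""

    def __init__(self):
        self._data = {}
        self._watchers = defaultdict(list)  # path -> callbacks

    def get(self, path):
        return self._data.get(path)

    def set(self, path, value):
        self._data[path] = value
        for callback in list(self._watchers[path]):
            callback(value)  # notify everyone watching this path

    def watch(self, path, callback):
        self._watchers[path].append(callback)
        callback(self.get(path))  # deliver the current value right away


class DbClient:
    """Part of a web service that follows the DB host list published in the registry."""

    def __init__(self, registry, path):
        self.hosts = []
        registry.watch(path, self._on_change)

    def _on_change(self, value):
        # A real client would also close connections to removed hosts here.
        self.hosts = list(value or [])


if __name__ == "__main__":
    registry = ToyRegistry()
    path = "/services/orders-db/hosts"  # made-up path
    registry.set(path, ["10.0.0.1:5432", "10.0.0.2:5432"])
    client = DbClient(registry, path)
    registry.set(path, ["10.0.0.1:5432", "10.0.0.3:5432"])  # node .2 replaced by .3
    print(client.hosts)  # ['10.0.0.1:5432', '10.0.0.3:5432']
```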
A more complex use case for ZooKeeper is how Kafka uses it (or did at the time I last used Kafka). Kafka has streams, and streams have many shards. Each consumer of each stream uses ZooKeeper to save checkpoints in each shard after it has read and processed up to a certain point in the stream. That way, if the consumer crashes or is restarted, it knows where to pick up in the stream.
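The checkpointing idea, sketched in Python with an in-memory dict standing in for the ZooKeeper path where offsets were stored. `CheckpointStore`, `consume`, the consumer name and the data are hypothetical. Newer Kafka consumers keep their offsets in an internal Kafka topic rather than in ZooKeeper, which fits the "or did" caveat above.

```python
class CheckpointStore:
    """Stands in for the ZooKeeper path where a consumer's offsets are saved."""

    def __init__(self):
        self._offsets = {}

    def load(self, consumer, shard):
        return self._offsets.get((consumer, shard), 0)

    def save(self, consumer, shard, offset):
        self._offsets[(consumer, shard)] = offset


def consume(stream, consumer, shard, store, handle, limit=None):
    """Process one shard from the last checkpoint; returns the new checkpoint.

    The checkpoint is the index of the next record to read. `limit` caps how
    many records are processed, which is used here to simulate a crash.
    """
    records = stream[shard]
    pos = store.load(consumer, shard)
    processed = 0
    while pos < len(records) and (limit is None or processed < limit):
        handle(records[pos])
        pos += 1
        store.save(consumer, shard, pos)  # checkpoint only after the record is handled
        processed += 1
    return pos


if __name__ == "__main__":
    stream = {0: ["e1", "e2", "e3", "e4"]}  # made-up shard 0
    store = CheckpointStore()
    consume(stream, "indexer", 0, store, print, limit=2)  # prints e1, e2 then "crashes"
    consume(stream, "indexer", 0, store, print)  # restarts and prints e3, e4
```

Because the checkpoint is saved only after a record is handled, a crash between `handle` and `save` means that record is processed again on restart, i.e. at-least-once delivery.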
I don't know about Mesos and the Erlang language, but this article might help you with Helix and ZooKeeper.
This article tells us:
ZooKeeper is responsible for gluing all parts together, whereas Helix is the cluster management component that registers all cluster details (the cluster itself, nodes, resources).
The article is about clustering in jBPM using Helix and ZooKeeper, but it will give you a basic idea of what Helix and ZooKeeper are used for.
And from most of the articles I read online, it seems ZooKeeper and Helix are used together.
Apache ZooKeeper can be installed on a single machine or on a cluster.
It can be used to keep track of logs. It can provide various services on a distributed platform.
Storm and Kafka rely on ZooKeeper.
Storm uses ZooKeeper to store all state so that it can recover from an outage in any of its (distributed) component services.
Kafka queue consumers can use ZooKeeper to store information on what has been consumed from the queue.
I'm wondering what the pros and cons are of using Redis as a broker in an infrastructure.
At the moment, all my agents send to a central NXLog server, which proxies the requests to Logstash --> ES.
What would I gain by using a Redis server between my NXLog collector and Logstash? To me it seems pointless, as NXLog already has good memory and disk buffers in case Logstash is down.
What would I gain?
Thank you
Under heavy load, calling ES (HTTP) directly can be dangerous, and you can have problems if ES breaks down.
Redis can handle many more write requests and forward them asynchronously to ES (HTTP).
I started using Redis because I felt it would separate the input and the filter parts, at least during periods in which I change the configuration a lot.
As you know, if you change the Logstash configuration you have to restart it. All clients (in my case via syslog) are forced to reconnect to the Logstash daemon when it is back in business.
By putting an indexer in front, which holds the relatively static input configuration, and pushing everything to Redis, I am able to restart Logstash without causing hiccups throughout the datacenter.
I encountered some issues because our developers hadn't found time (yet) to reduce the amount of useless logs sent to syslog, thus overflowing the server. Before we had Logstash they overflowed the disk space for logs - a more general issue though... :)
When used with Logstash, Redis acts as a message queue. You can have multiple writers and multiple readers.
Using Redis (or any other queueing service) allows you to scale Logstash horizontally by adding more servers to the 'cluster'. This will not matter for small operations but can be extremely useful for larger installations.
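The queue behaviour described above comes from using a Redis list: writers push onto one end, readers pop from the other, and a blocking pop hands each entry to exactly one reader, so adding readers spreads the load. A minimal Python sketch, assuming a redis-py style client (`rpush`, `blpop`); the key name is hypothetical, and this only illustrates the pattern, not what Logstash's redis plugins do internally.

```python
import json


def ship(client, key, events):
    """Writer side: append JSON-encoded events to the tail of a Redis list."""
    for event in events:
        client.rpush(key, json.dumps(event))


def work(client, key, handle, timeout=1):
    """Reader side: pop events from the head until the list stays empty for `timeout` seconds.

    BLPOP gives each element to exactly one waiting client, so several
    readers can share one list and split the load between them.
    """
    handled = 0
    while True:
        item = client.blpop([key], timeout=timeout)
        if item is None:  # timed out with nothing to read
            return handled
        _key, raw = item
        handle(json.loads(raw))
        handled += 1
```

With a real server this would be called as `ship(redis.Redis(), "logstash", events)` on the shipping side and `work(redis.Redis(), "logstash", handle)` on each indexer; starting another indexer is all it takes to add a reader.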
When using Logstash with Redis, you can configure Redis to store all the log entries only in memory, which would act like an in-memory queue (like memcache).
You may come to the point where Logstash cannot keep up with the number of logs being sent, and it can bring down your system on a constant basis (observed in our environment).
If you feel Redis is an overhead for your disk, you can configure it to store all the logs in memory until they are processed by Logstash.
As we built our ELK infrastructure, we originally had a lot of problems with the Logstash indexer (reading from Redis). Redis would back up and eventually die. I believe this was because, in the hope of not losing log files, Redis was configured to persist the cache to disk once in a while. When the queue got "too large" (but still within available disk space), Redis would die, taking all of the cached entries with it.
If this is the best Redis can do, I wouldn't recommend it.
Fortunately, we were able to resolve the issues with the indexer, which then typically kept the Redis queue empty. We set our monitoring to alert quickly when the queue did back up, as that was a good sign that the indexer was unhappy again.
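The alerting described above can be as simple as polling the list length. A Python sketch assuming a redis-py style client with `llen`; the key, threshold and number of consecutive checks are hypothetical values to tune. Requiring several consecutive high readings avoids alerting on a short burst.

```python
from collections import deque


class BacklogMonitor:
    """Flag a Redis list queue that stays above `threshold` for several checks.

    `client` is anything with an llen(key) method, e.g. a redis-py client.
    """

    def __init__(self, client, key, threshold, consecutive=3):
        self.client = client
        self.key = key
        self.threshold = threshold
        self.recent = deque(maxlen=consecutive)  # last few queue depths

    def check(self):
        """Return (current depth, whether to alert)."""
        depth = self.client.llen(self.key)
        self.recent.append(depth)
        backed_up = (len(self.recent) == self.recent.maxlen
                     and all(d >= self.threshold for d in self.recent))
        return depth, backed_up
```

Calling `check()` from a cron job or monitoring agent every minute or so would give the kind of early warning that the indexer has stopped draining the queue.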
Hope that helps.
I'm considering Redis for part of the architecture of a new project. It will consist of a lot of clients (node.js connections) SUBSCRIBING to particular channels, with one process PUBLISHING to those channels as needed.
I'm curious about the limits of the PUBLISH/SUBSCRIBE commands and how to mitigate them. An obvious limit is the number of file descriptors open on the machine running Redis, so at some point I'll need to implement master-slave replication or consistent hashing across multiple Redis instances.
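For the consistent-hashing option mentioned in the question, a minimal hash ring in Python that maps channel names to Redis instances. The instance addresses and the 100 virtual points per node are assumptions; both the publisher and every subscriber would need to use the same ring so they agree on which instance a channel lives on.

```python
import bisect
import hashlib


def _point(value):
    # Stable 64-bit position on the ring (built-in hash() of str is randomized per process).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping channel names to Redis instances."""

    def __init__(self, nodes, points_per_node=100):
        ring = []
        for node in nodes:
            # Several virtual points per node smooth out the distribution.
            for i in range(points_per_node):
                ring.append((_point(f"{node}#{i}"), node))
        ring.sort()
        self._positions = [pos for pos, _ in ring]
        self._nodes = [node for _, node in ring]

    def node_for(self, channel):
        # First ring point clockwise from the channel's position (wrapping around).
        idx = bisect.bisect(self._positions, _point(channel)) % len(self._positions)
        return self._nodes[idx]


if __name__ == "__main__":
    nodes = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]  # hypothetical instances
    ring = HashRing(nodes)
    bigger = HashRing(nodes + ["redis-d:6379"])
    channels = [f"room:{i}" for i in range(1000)]
    moved = [c for c in channels if ring.node_for(c) != bigger.node_for(c)]
    print(len(moved), "channels moved, to", {bigger.node_for(c) for c in moved})
```

Adding an instance only moves the channels whose ring position now falls on the new node; every other channel keeps its mapping, so only those subscribers need to reconnect.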
Does anyone have any solutions about how to scale this architecture with Redis' PubSub?
Redis PubSub scales really easily since the Master/Slave replication automatically publishes to all slaves.
The easiest way is to load balance the connections to node.js with, for instance, HAProxy, and run a Redis slave on each web server that syncs with a single master that publishes the messages.
I can't give you exact numbers since that greatly depends on the underlying system, but this should scale extremely well. And you don't need to manage the clients and which server they connect to manually. You obviously need some way to handle session state, so you might need to do that anyway, but that's a lot easier to do in the load balancer than in your application.
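A toy Python model of the topology in this answer: the publisher talks only to the master, each web server's node.js processes subscribe on their local replica, and the master's PUBLISH is propagated to every replica, which delivers it to its own subscribers. `Node` is hypothetical and ignores networking. Real PUBLISH returns the number of clients that received the message on that node only, whereas this toy returns the total across replicas so the fan-out is easy to check.

```python
from collections import defaultdict


class Node:
    """Toy Redis node: local subscribers plus replicas that receive its PUBLISHes."""

    def __init__(self, name):
        self.name = name
        self.subscribers = defaultdict(list)  # channel -> list of callbacks
        self.replicas = []

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver to clients subscribed on this node.
        delivered = 0
        for callback in self.subscribers[channel]:
            callback(message)
            delivered += 1
        # Mimic replication: the PUBLISH is propagated to each replica,
        # which delivers it to its own local subscribers.
        for replica in self.replicas:
            delivered += replica.publish(channel, message)
        return delivered  # unlike real PUBLISH, also counts the replicas' clients


if __name__ == "__main__":
    master = Node("master")
    web1, web2 = Node("web1-replica"), Node("web2-replica")
    master.replicas = [web1, web2]
    web1.subscribe("updates", lambda m: print("web1 client got", m))
    web2.subscribe("updates", lambda m: print("web2 client got", m))
    print(master.publish("updates", "hello"))  # 2
```

Adding capacity in this model means adding web servers with their own replica; the master still handles one PUBLISH per message plus one replication stream per replica, rather than one connection per node.js client.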