MongoDB - Adding new config servers in production

I've been doing some tests with MongoDB and sharding, and at some point I tried to add new config servers to my mongos router (at that time, I was playing with just one config server). But I couldn't find any information on how to do this.
Has anybody tried to do such a thing?

Unfortunately you will need to shut down the entire system.
Shut down all processes (mongod, mongos, config server).
Copy the data subdirectories (dbpath tree) from the config server to the new config servers.
Start the config servers.
Restart mongos processes with the new --configdb parameter.
Restart mongod processes.
From: http://www.mongodb.org/display/DOCS/Changing+Config+Servers
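In shell terms, the procedure is roughly the following sketch (the hostnames, paths, and ports here are hypothetical; adjust them to your deployment):
# stop all mongos, mongod, and config server processes first, then
# copy the existing config server's dbpath to the two new config hosts
$ rsync -a /data/configdb/ cfg2.example.com:/data/configdb/
$ rsync -a /data/configdb/ cfg3.example.com:/data/configdb/
# start all three config servers
$ mongod --configsvr --dbpath /data/configdb --port 20000
# restart every mongos with the full config server list
$ mongos --configdb cfg1.example.com:20000,cfg2.example.com:20000,cfg3.example.com:20000
# finally, restart the shard mongod processes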

Use DNS CNAMEs
Make sure to use DNS entries, or at least /etc/hosts entries, for all mongod and mongo config servers when you set up multiple config servers in /etc/mongos.conf, and when you set up replica sets and/or sharding.
For example, a common pitfall on AWS is to use just the private DNS name of EC2 instances, but these can change over time... and when that happens you'll need to shut down your entire MongoDB system, which can be extremely painful if it's in production.
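For example, a sketch using stable /etc/hosts aliases (the names and addresses here are hypothetical):
# /etc/hosts on every mongod/mongos host
172.31.10.11  mongo-cfg1
172.31.10.12  mongo-cfg2
172.31.10.13  mongo-cfg3
# reference the aliases rather than the raw EC2 names
$ mongos --configdb mongo-cfg1:20000,mongo-cfg2:20000,mongo-cfg3:20000
If an instance's address changes, you update /etc/hosts (or the CNAME record) instead of shutting down the whole system.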

The Configure Sharding and Sample Configuration Session pages appear to have what you're looking for.
You must have either 1 or 3 config servers; anything else will not work as expected.
You need to dump and restore the content from your original config server to the 2 new config servers before adding them to mongos's --configdb.
The relevant section is:
Now you need a configuration server and mongos:
$ mkdir /data/db/config
$ ./mongod --configsvr --dbpath /data/db/config --port 20000 > /tmp/configdb.log &
$ cat /tmp/configdb.log
$ ./mongos --configdb localhost:20000 > /tmp/mongos.log &
$ cat /tmp/mongos.log
mongos does not require a data directory; it gets its information from the config server.
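To sanity-check the result, you can connect to the mongos with the shell and inspect the sharding status (sh.status() assumes a reasonably recent shell; older versions use db.printShardingStatus()):
$ ./mongo --port 27017
> sh.status()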

Related

redis snapshot location not as specified in config

After running fine for a while, I am getting a write error on my redis instance:
(error) MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.
In the log I see:
9948:C 22 Mar 20:49:32.241 # Failed opening the RDB file root (in server root dir /var/spool/cron) for saving: Read-only file system
However, my redis config file is /etc/redis/redis.conf as confirmed by:
redis-cli -p 6379 info | grep 'config_file'
config_file:/etc/redis/redis.conf
And there I have:
dir /mnt/data/redis
And indeed, there is a snapshot there.
But despite the above, redis now thinks my data directory is:
redis-cli -p 6379 CONFIG GET dir
1) "dir"
2) "/var/spool/cron"
This corresponds to the error I was getting, as quoted above.
Can anyone tell me why/how my data directory is changing after redis starts, such that it is no longer what is specified in the config file?
So the answer is that the redis server was hacked and the configuration changed, which, as it turns out, is very easy to do. (I should point out that I had no reason to think it wasn't easy; I just assumed security by obscurity was sufficient in this case. Wrong. No matter, this was just a playground, not any sort of production server.)
So don't open your redis port to the world. Use security groups if on AWS to limit access to the machines that need it, or use AUTH (which is still not great, because all clients then need to know the single password, which also apparently gets sent in the clear), or put some middleware in front to control access.
Hacking redis is easy to do: it can compromise your data and even enable unauthorized SSH access to your server.
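A minimal hardening sketch for /etc/redis/redis.conf (these directives are standard; protected-mode requires Redis 3.2+):
# listen only on loopback (or a private interface)
bind 127.0.0.1
# refuse unauthenticated external connections (Redis 3.2+)
protected-mode yes
# require a password; note it is still sent in cleartext without TLS
requirepass use-a-long-random-secret
# optionally disable CONFIG so a connected attacker can't rewrite dir/dbfilename
rename-command CONFIG ""
The rename-command line directly blocks the attack described above, since the common exploit works by using CONFIG SET to point dir at a directory like /var/spool/cron.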

How to disable NFS client caching?

I'm having trouble with NFS client file caching. The client can still read a file that was removed from the server many minutes earlier.
My two servers are both CentOS 6.5 (kernel: 2.6.32-431.el6.x86_64)
I'm using server A as the NFS server, /etc/exports is written as:
/path/folder 192.168.1.20(rw,no_root_squash,no_all_squash,sync)
And server B is used as the client, the mount options are:
nfsstat -m
/mnt/folder from 192.168.1.7:/path/folder
Flags: rw,sync,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,nosharecache,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.20,minorversion=0,lookupcache=none,local_lock=none,addr=192.168.1.7
As you can see, the "lookupcache=none,noac" options are already used to disable caching, but they don't seem to work...
I did the following steps:
Create a simple text file on server A
Print the file from server B with "cat", and it's there
Remove the file on server A
Wait a couple of minutes and print the file from server B again, and it's still there!
But if I run "ls" on server B at that time, the file is not in the output. This inconsistent state may last a few minutes.
I think I've checked all the NFS mount options... but I can't find the solution.
Are there any other options I missed? Or maybe the issue is not about NFS at all?
Any ideas would be appreciated :)
I have tested the same steps you have given with the parameters below, and it works perfectly. I added one more option, "fg", on the client-side mount:
sudo mount -t nfs -o fg,noac,lookupcache=none XXX.XX.XX.XX:/var/shared/ /mnt/nfs/fuse-shared/
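To reproduce the original test with this mount (the paths and export names here are hypothetical):
# on the NFS server
$ echo hello > /var/shared/testfile
# on the client: visible immediately
$ cat /mnt/nfs/fuse-shared/testfile
# on the NFS server
$ rm /var/shared/testfile
# on the client: with fg,noac,lookupcache=none this should now fail
# (expect: No such file or directory)
$ cat /mnt/nfs/fuse-shared/testfile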

Unable to resolve github.com from OpenShift origin pods

I have a basic OpenShift Origin cluster started with oc cluster up.
Now, in the default 'MyProject', I wanted to build a source from a git repo, and it's failing with the error:
Could not resolve host: github.com; Name or service not known
I even tried setting up Gogs and migrating the publicly hosted source code from github.com to the Gogs pod, but it throws the same error.
Kindly advise whether there are any additional network settings required during OpenShift cluster setup in order to access github.com or any other public domains. I can sense it's a network issue, but I'm not sure what exactly needs to be configured on the cluster.
I know this is an old ticket, but I came across it when looking for a solution to my own problem, which was exactly the same as described here. For me, the problem lay in the combination of Ubuntu 18.04 and Docker. I followed solution B from this answer.
Hopefully this helps someone, as I lost a lot of time trying to resolve this issue by looking for the problem as if it were in openshift/okd, while the actual cause lay in the combination of Docker and Ubuntu (at least for me).
You can edit the node ConfigMap on the master server in order to provide the correct nameserver information to the pods.
# oc get cm -n openshift-node
For all compute nodes, edit the ConfigMap with the command below (this only needs to be performed on the master server):
# oc edit cm node-config-compute -n openshift-node
......
dnsBindAddress: 127.0.0.1:53
dnsDomain: cluster.local
dnsIP: 10.0.80.11
dnsNameservers: null
dnsRecursiveResolvConf: /etc/origin/node/resolv.conf
.......
Edit the dnsIP entry with your DNS IP, then restart the service:
# systemctl restart atomic-openshift-node.service
The DNS IP will be prepended to the /etc/resolv.conf file of all pods.
Shut down the cluster with: oc cluster down
Edit the file openshift.local.clusterup/node/node-config.yml and change dnsIP: "" to dnsIP: 8.8.8.8
Edit the file openshift.local.clusterup/kubedns/resolv.conf
and add
nameserver 8.8.8.8
nameserver 8.8.4.4
Also make sure you have the DNS options inside the Docker config file.
Edit /etc/docker/daemon.json and add:
"dns": ["8.8.8.8", "8.8.4.4"]
Then start your cluster with
oc cluster up
and now it should work fine.
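To verify the fix, you can check DNS resolution from inside a running pod (the pod name below is a placeholder, and this assumes the image ships getent):
$ oc get pods
$ oc rsh <pod-name> getent hosts github.com
If the lookup succeeds, the new nameserver has reached the pod's /etc/resolv.conf.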

RabbitMQ - AWS EC2 Clustering hell

Sorry, I should be shot for even having to ask this, but I've wasted a day on this and feel like I've read everything there is.
I can't create a cluster on my EC2 instances (3) that are spread on three different regions. The hosts:
rabbit@ip-172-31-47-217
rabbit@ip-172-31-1-82
rabbit@ip-172-31-36-111
The initial state before trying to make the cluster:
ubuntu@ip-172-31-47-217:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-172-31-47-217' ...
[{nodes,[{disc,['rabbit@ip-172-31-47-217']}]},
{running_nodes,['rabbit@ip-172-31-47-217']},
{partitions,[]}]
ubuntu@ip-172-31-36-111:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-172-31-36-111' ...
[{nodes,[{disc,['rabbit@ip-172-31-36-111']}]},
{running_nodes,['rabbit@ip-172-31-36-111']},
{partitions,[]}]
ubuntu@ip-172-31-1-82:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-172-31-1-82' ...
[{nodes,[{disc,['rabbit@ip-172-31-1-82']}]},
{running_nodes,['rabbit@ip-172-31-1-82']},
{partitions,[]}]
When I try to check status from one server for another:
sudo rabbitmqctl status -n rabbit@ip-172-31-1-82
Status of node 'rabbit@ip-172-31-1-82' ...
Error: unable to connect to node 'rabbit@ip-172-31-1-82': nodedown
nodes in question: ['rabbit@ip-172-31-1-82']
hosts, their running nodes and ports:
- unable to connect to epmd on ip-172-31-1-82: timeout (timed out)
current node details:
- node name: 'rabbitmqctl3835@ip-172-31-36-111'
- home dir: /var/lib/rabbitmq
- cookie hash: 0tsf/OyQZI7zobmv1Ia97w==
All three servers have the same erlang cookie hash.
I can verify the host names are set up properly:
host ip-172-31-36-111
ip-172-31-36-111.us-west-2.compute.internal has address 172.31.36.111
I know the ports are open:
netstat -plten | grep beam
Because I opened all TCP and UDP ports at this point as a test; no change.
And finally, to see if this would behave differently given those failures:
sudo rabbitmqctl join_cluster --ram rabbit@ip-172-31-1-82
Clustering node 'rabbit@ip-172-31-47-217' with 'rabbit@ip-172-31-1-82' ...
Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}
Please help; I'm being driven insane by this.
The problem is that they are in different regions (presumably in EC2-classic - you didn't mention whether you were using a VPC). This means they cannot communicate via their private IPs (see e.g. Can EC2 instances in different regions communicate over their private IP addresses?)
ping 172.31.36.111
will fail from one of the other servers, for example. Pinging by hostname will probably even fail at the DNS lookup.
Your options are:
Put them in separate availability zones in a single region (in EC2-Classic, they will be able to communicate). You could also use a VPC in this case, putting them in separate subnets but allowing interconnections via appropriately configured security groups.
Set up /etc/hosts on each server to point to the relevant public IPs of the other servers (you could attach Elastic IPs to each server to ensure stability across server restarts). You could also set the hostname of each server for clarity. Set up your security groups to allow access on the relevant ports that RabbitMQ uses, as sketched below. There may be security implications of doing this, since the data will be travelling over the public internet.
Set up a VPN between each server in the cluster. Amazon VPC has a VPN facility, but there are ways of setting it up yourself, I think.
I think option 1 is the simplest. Option 2 has major security implications (I believe there are ways of securing the connection between the cluster servers, but they aren't documented on the rabbitmq website as far as I can tell). Option 3 is complex, but probably the best option if you need multiple regions.
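If you do attempt option 2, here is a rough sketch (the Elastic IPs are hypothetical, and the inter-node distribution port can vary by RabbitMQ/Erlang version and configuration):
# /etc/hosts on each broker: map the other brokers' names to their public/Elastic IPs
54.10.0.1   ip-172-31-47-217
54.10.0.2   ip-172-31-1-82
54.10.0.3   ip-172-31-36-111
# open 4369 (epmd) and the inter-node distribution port between the brokers,
# then on the joining node:
$ sudo rabbitmqctl stop_app
$ sudo rabbitmqctl join_cluster --ram rabbit@ip-172-31-1-82
$ sudo rabbitmqctl start_app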
Note that rabbitmq clusters aren't meant to be run over wide geographical areas, since they aren't too reliable in the face of network partitions. See here: https://www.rabbitmq.com/clustering.html

graceful_stop.sh not found in HDP2.1 Hbase

I was reading the Hortonworks documentation on removing a RegionServer from any host of the cluster (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_system-admin-guide/content/admin_decommission-slave-nodes-3.html).
It uses the graceful_stop.sh script. The same script is described in the Apache HBase book (https://hbase.apache.org/book/node.management.html).
I tried to find this script but was not able to locate it:
[hbase@node ~]$ ls /usr/lib/hbase/bin/
draining_servers.rb hbase.cmd hbase-daemon.sh region_status.rb test
get-active-master.rb hbase-common.sh hbase-jruby replication
hbase hbase-config.cmd hirb.rb start-hbase.cmd
hbase-cleanup.sh hbase-config.sh region_mover.rb stop-hbase.cmd
[hbase@node ~]$
Has this script been removed from HBase?
Is there any other way to stop a RegionServer from another host of the cluster? For example, if I want to stop RegionServer 1, can I do this by logging into RegionServer 2?
Yes, the script is removed from HBase if you use a package install, but you can still find it in the source files.
If you want to stop a RegionServer A from another host B, then host B must have the privilege to access A, e.g. you have added the public key of host B to authorized_keys on A. For a typical cluster, an RS cannot log in to other RSes directly, for security reasons.
For how to write graceful_stop.sh yourself, you can look at: https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/fA3019_vpZY
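If you need to replicate what graceful_stop.sh does by hand on the RegionServer host itself, a rough sketch using the scripts that do ship with the package (see the bin listing above; the exact region_mover.rb arguments vary by HBase version):
# drain regions off this server, then stop the daemon (run as the hbase user on the RS host)
$ /usr/lib/hbase/bin/hbase org.jruby.Main /usr/lib/hbase/bin/region_mover.rb unload `hostname`
$ /usr/lib/hbase/bin/hbase-daemon.sh stop regionserver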