BIND9.7. When several named processes are running, how to judge which process is providing the service? - process

For example, I execute "sudo named" several times, so there are several named processes running. When I use "pidof named", I get several pids.
I want to calculate the CPU usage rate of the BIND process,so I need to get some parameters from "/proc/pid/stat", so I need the pid of the named process which is really providing the domain resolution service.
What's the difference between the named process which is providing the service and the others? Could you give me a detailed explanation?
thanks very much~
(It's my first time to use stackoverflow , to use English to ask quetions , please ignore those syntax errors.)

There should be just one named running, the scripts that manage the service ensure that. You shouldn't start it like that, you should use what your distribution uses to start it, probably something along the lines of service bind start (that is probably a RedHat-ism), or /etc/rc.d/bind start (for bog-standard SysVinit).
I was responsible for DNS for quite some time here. Some tips:
DNS is a very critical service, configure and monitor with extreme care. Do read up on setting up and managing this, don't go ahead until you are absolutely clear.
Get somebody as a backup for the case that you aren't available, and make sure they understand the previous point.
DNS isn't CPU intensive (OK, with signed domains and that newfangled stuff that might have changed), it is memory intensive (and network intensive, or at least sensitive to delays). Our main DNS server was running for months at a time, and clocked up some half hour of CPU time during that kind of period IIRC.
Separate your master server (responsible for the domain(s) from the servers queried by clients (caching servers). There have been vulnerabilities where malformed questions or "answers" to questions that hadn't been asked soiled the database
The master server will have all the domain information in RAM, make sure you have got enough of it
Make sure all machines under your jurisdiction use the same caching server. It makes no sense for more than one, that destroys the idea of cache.
The caching servers collect immense amounts of data over time. This data rarely is performance critical, so configure them with plenty of swap space to accommodate overflows.

Bind issues as many named processes as many CPUs you have:
man named:
-n #cpus
Create #cpus worker threads to take advantage of multiple CPUs. If not specified, named will try to determine the number of CPUs present and create one thread per CPU. If it is unable to determine the number of CPUs, a single worker thread will be created.
External source:
https://unix.stackexchange.com/questions/140986/multiple-named-processes-for-bind9-in-debian

Related

Scalability issues with server based authentication

I was reading up on problems with server based authentication. I need help with elaboration on the following point.
Scalability: Since sessions are stored in memory, this provides problems with scalability. As our cloud providers start replicating servers to handle application load, having vital information in session memory will limit our ability to scale.
I don't seem to understand why "... having vital information in session memory will limit our ability to scale", will limit the ability to scale. Is it just because the information is being replicated.. so it's to do with redundancy? I don't think so. Anyway, would anyone be kind enough to explain this further? Much appreciated.
What's being referred to is the difference between stateless and stateful server-side ops. Stateful servers keep part of their resources (main memory, mostly) occupied for retaining state pertaining to some client, even when the server is actually doing nothing at all for the client and just waiting for the client to come back. Such systems' performance profile is "linear" only up to the point where all available memory has been filled with state, and beyond that point the server seems to essentially stall. Stateless servers only keep resources occupied when they're actually doing something, and once finished doing stuff, those resources are immediately freed and available for other clients. Such servers are essentially not capped by memory limits and therefore "scale easier".
Also, the explanation given seems to refer to scenario's where a set of distinct machines present themselves to the outside world as being one, when actually they are not (this is often called a "cluster" of machines/servers). In such scenario's, if a client has connected to the "big single virtual machine", then actually he is connected to just one of the "actual machines" in the cluster. If state is kept there, subsequent visits by that same client must then be routed to the same physical machine, or that piece of state must be trafficked around to whatever machine the next visit happens to be to. The former implies the implementation of management functions that take their own set of resources, plus limitations on the freedom the cluster has to distribute the load (the opposite of why you want to do clustering), the latter implies additional network traffic that will cap scalability in essentially the same way as available memory does.
Server-based authentication makes use of sessions, which in turn make use of a local session id. In the cloud, when the servers are replicated to handle application load, it becomes difficult for one server to know which sessions are active on other servers. Now to overcome this problem, extra steps must be performed... for instance to persist the session id on to the database. However, as the servers are increasingly replicated, it becomes more and more difficult to handle all this. Therefore, server-based or session-based authentication can be problematic for scalability.

Zookeeper vs In-memory-data-grid vs Redis

I've found different zookeeper definitions across multiple resources. Maybe some of them are taken out of context, but look at them pls:
A canonical example of Zookeeper usage is distributed-memory computation...
ZooKeeper is an open source Apacheā„¢ project that provides a centralized infrastructure and services that enable synchronization across a cluster.
Apache ZooKeeper is an open source file application program interface (API) that allows distributed processes in large systems to synchronize with each other so that all clients making requests receive consistent data.
I've worked with Redis and Hazelcast, that would be easier for me to understand Zookeeper by comparing it with them.
Could you please compare Zookeeper with in-memory-data-grids and Redis?
If distributed-memory computation, how does zookeeper differ from in-memory-data-grids?
If synchronization across cluster, than how does it differs from all other in-memory storages? The same in-memory-data-grids also provide cluster-wide locks. Redis also has some kind of transactions.
If it's only about in-memory consistent data, than there are other alternatives. Imdg allow you to achieve the same, don't they?
https://zookeeper.apache.org/doc/current/zookeeperOver.html
By default, Zookeeper replicates all your data to every node and lets clients watch the data for changes. Changes are sent very quickly (within a bounded amount of time) to clients. You can also create "ephemeral nodes", which are deleted within a specified time if a client disconnects. ZooKeeper is highly optimized for reads, while writes are very slow (since they generally are sent to every client as soon as the write takes place). Finally, the maximum size of a "file" (znode) in Zookeeper is 1MB, but typically they'll be single strings.
Taken together, this means that zookeeper is not meant to store for much data, and definitely not a cache. Instead, it's for managing heartbeats/knowing what servers are online, storing/updating configuration, and possibly message passing (though if you have large #s of messages or high throughput demands, something like RabbitMQ will be much better for this task).
Basically, ZooKeeper (and Curator, which is built on it) helps in handling the mechanics of clustering -- heartbeats, distributing updates/configuration, distributed locks, etc.
It's not really comparable to Redis, but for the specific questions...
It doesn't support any computation and for most data sets, won't be able to store the data with any performance.
It's replicated to all nodes in the cluster (there's nothing like Redis clustering where the data can be distributed). All messages are processed atomically in full and are sequenced, so there's no real transactions. It can be USED to implement cluster-wide locks for your services (it's very good at that in fact), and tehre are a lot of locking primitives on the znodes themselves to control which nodes access them.
Sure, but ZooKeeper fills a niche. It's a tool for making a distributed applications play nice with multiple instances, not for storing/sharing large amounts of data. Compared to using an IMDG for this purpose, Zookeeper will be faster, manages heartbeats and synchronization in a predictable way (with a lot of APIs for making this part easy), and has a "push" paradigm instead of "pull" so nodes are notified very quickly of changes.
The quotation from the linked question...
A canonical example of Zookeeper usage is distributed-memory computation
... is, IMO, a bit misleading. You would use it to orchestrate the computation, not provide the data. For example, let's say you had to process rows 1-100 of a table. You might put 10 ZK nodes up, with names like "1-10", "11-20", "21-30", etc. Client applications would be notified of this change automatically by ZK, and the first one would grab "1-10" and set an ephemeral node clients/192.168.77.66/processing/rows_1_10
The next application would see this and go for the next group to process. The actual data to compute would be stored elsewhere (ie Redis, SQL database, etc). If the node failed partway through the computation, another node could see this (after 30-60 seconds) and pick up the job again.
I'd say the canonical example of ZooKeeper is leader election, though. Let's say you have 3 nodes -- one is master and the other 2 are slaves. If the master goes down, a slave node must become the new leader. This type of thing is perfect for ZK.
Consistency Guarantees
ZooKeeper is a high performance, scalable service. Both reads and write operations are designed to be fast, though reads are faster than writes. The reason for this is that in the case of reads, ZooKeeper can serve older data, which in turn is due to ZooKeeper's consistency guarantees:
Sequential Consistency
Updates from a client will be applied in the order that they were sent.
Atomicity
Updates either succeed or fail -- there are no partial results.
Single System Image
A client will see the same view of the service regardless of the server that it connects to.
Reliability
Once an update has been applied, it will persist from that time forward until a client overwrites the update. This guarantee has two corollaries:
If a client gets a successful return code, the update will have been applied. On some failures (communication errors, timeouts, etc) the client will not know if the update has applied or not. We take steps to minimize the failures, but the only guarantee is only present with successful return codes. (This is called the monotonicity condition in Paxos.)
Any updates that are seen by the client, through a read request or successful update, will never be rolled back when recovering from server failures.
Timeliness
The clients view of the system is guaranteed to be up-to-date within a certain time bound. (On the order of tens of seconds.) Either system changes will be seen by a client within this bound, or the client will detect a service outage.

Redis cache in a clustered web farm? Sync between two member nodes?

Ok, so what I have are 2 web servers running inside of a Windows NLB clustered environment. The servers are identical in every respect, and as you'd expect in an NLB clustered environment, everybody is hitting the cluster name and not the individual members. We also have affinity turned off on the members in the cluster.
But, what I'm trying to do is to turn on some caching for a few large files (MP3s). It's easy enough to dial up a Redis node on one particular member and hit it, everything works like you'd expect. I can pull the data from the cache and serve it up as needed.
Now, let's add the overhead of the NLB. With an NLB in play, you may not be hitting the same web server each time. You might make your first hit to member 01, and the second hit to 02. So, I'd need a way to sync between the two servers. That way it doesn't matter which cluster member you hit, you are going to get the same data.
I don't need to worry about one cache being out of date, the only thing I'm storing in there is read only data from an internal web service.
I've only got 2 servers and it looks like redis clusters need 3. So I guess that's out.
Is this the best approach? Or perhaps there is something else better?
Reasons for redis: We only want the cache to use in-memory only. No writes to the database. Thought this would be a good fit, but need to make sure the data is available in both servers.
It's not possible to have redis multi master (writing on both). And I might say it's replication is blazing fast (check the slaveof command of Redis).
But why you need it in the same server? Access it as a service. So every node will access the actual data. If the main server goes down, the slave will promptly turn itself into a master.
One observation: you might notice that Redis makes use of disk in an async way. An append only file that it does checkpoint depending on the size from time to time so.

Alternative for batch job scheduling (in compute pool)

Since I don't have root rights on the machines in a compute pool, and thus cannot adapt the load parameters of atd for batch, I'm looking for an alternative way to do job scheduling. Since the machines are used by multiple users, it should be able to take the load into account. Optionally, I'm looking for a way to do this for all the machines it the pool, I.e. there is one central queue with jobs that need to be ran, and a script that distributes them (over ssh) over the machines that are under a certain load. Any ideas?
First: go talk to the system administrators of the compute pool. Enterprise wide job schedulers have become a rather common component in infrastructures these days. Typically, these schedulers do not take into account system load though.
If the above doesn't lead to a good solution, you should carefully consider what load your jobs will impose on the machine: your jobs could be stressing the cpu more, consume large amounts of memory, generate lots of network or disk IO activity. Consequently, determining whether your job should start may depend on a lot of measurement, some of which you would not be able to do as an ordinary user (depends a bit on the kind of OS you are running, and how tight security is). In any case: you would only be able to take into account the load at the job's start up. Obviously, if every user would do this, you're back at square one in no time...
It might be a better idea to see with your system administrator if they have some sort of resource controls in place (e.g. projects in Solaris) through which they can make sure your batches are not tearing down the nodes in the compute pool. Next, write your batch jobs in such a way that they can cope with the OS declining requests for resources.
EDIT: As for the distributed nature: queueing up the jobs and having clients on all nodes point to the same queue, consuming as much as they can in the context of the resource controls...

The ideal multi-server LAMP environment

There's alot of information out there on setting up LAMP stacks on a single box, or perhaps moving MySQL onto it's own box, but growing beyond that doesn't seem to be very well documented.
My current web environment is having capacity issues, and so I'm looking for best-practices regarding configuration tuning, determining bottlenecks, security, etc.
I presently host around 400 sites, with a fair need for redundany and security, and so I've grown beyond the single-box solution - but am not at the level of a full ISP or dedicated web-hosting company.
Can anyone point me in the direction of some good expertise on setting up a great apache web-farm with a view to security and future expansion?
My web environment consists of 2 redundant MySQL servers, 2 redundant web-content servers, 2 load balancing front-end apache servers that mount the content via nfs and share apache config and sessions directories between them, and a single "developer's" server which also mounts the web-content via nfs, and contains all the developer accounts.
I'm pretty happy with alot of this setup, but it seems to be choking on the load prematurely.
Thanks!!
--UPDATE--
Turns out the "choking on the load" is related to mod_log_sql, which I use to send my apache logs to a mysql database. By re-configuring the webservers to write their sql statements to a disk file, and then creating a separate process to submit those to the database it allows the webservers to free up their threads much quicker, and handle a much greater load.
You need to be able to identify bottlenecks and test improvements.
To identify bottlenecks, you need to use your system's reporting tools. Some examples:
MySQL has a slow query log.
Linux provides stats like load average, iostat, vmstat, netstat, etc.
Apache has the access log and the server-status page.
Programming languages have profilers, like Pear Benchmark.
Use these tools to identifyy the slowest/biggest offenders and concentrate on them. Try an improvement and measure to see if it actually improves performance.
This becomes a never ending loop for two reasons: there's always something in a complex system that can be faster and as your system grows, different functions will start slowing down.
Based on the description of your system, my first hunch would be disk io and network io on the NFS servers, then I'd look at MySQL query times. I'd also check the performance of the shared sessions.
The schoolbook way of doing it would be to identify the bottlenecks with real empirical data.
Is it the database, apache, network, cpu, memory,io? Do you need more ram, sharding(+), is the DiskIO, the NFS network load, cpu for doing full table scans?
When you find out where the problem is you might run into the problem that its not enough to scale the infrastructure, because of the way the code works, and you end up with the need to either just create more instances of you current setup or make the code different.
I would also recommend as a first step in terms of scalability, off-load your content to a CDN like Edgecast. Use your current two content servers as additional web servers.