Keep time in sync among servers without an internet connection

I have 5 servers on a LAN without an Internet connection. I need them to keep their clocks in sync with each other.
I could configure them as NTP peers, and set a high stratum for the local clock of one of them. In this way, the other four would sync with that clock.
What I actually want is for them to agree on a time using all 5 local clocks (i.e. doing some kind of average), for reasons of robustness and precision. Is this possible with NTP?
PS: I do not want to use an external clock source.
EDIT: and no scripting outside NTP's own features; that could only make precision worse :)

If you average 5 drifting clocks, the only thing you get is another drifting clock that's harder to correct. It won't be more precise. NTP uses multiple servers to increase precision because it takes network latency into account. Since all your systems are on a fast local network, you just need one server.
Set up two systems as NTP servers: one primary and, if you feel the need, one backup. Have all other systems synchronize to them. This will be significantly easier to set up than the clock-averaging solution, and you won't have to develop any crazy scripts.
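A minimal sketch of what that might look like with classic ntpd (the 192.168.1.x addresses are placeholders; adjust for your LAN):

    # /etc/ntp.conf on the designated primary (say, 192.168.1.10)
    # Use the undisciplined local clock as a last-resort reference,
    # deliberately at a poor stratum so any real source would win.
    server 127.127.1.0
    fudge 127.127.1.0 stratum 10
    # Let the rest of the LAN query this server
    restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

    # /etc/ntp.conf on the other four servers
    server 192.168.1.10 iburst
    server 192.168.1.11 iburst   # the optional backup, if you set one up

Newer ntpd releases also offer orphan mode (tos orphan 10) as a cleaner alternative to the local-clock driver on isolated networks.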

You might be able to have one of them listen for the times from each computer, perform an average, set the average as its own time, and broadcast that time to all the other computers. It seems a little excessive, though.

You can set up one of them as an NTP server that broadcasts its time on the local network, and the others as clients that listen on the local network.
EDIT:
I missed the averaging part. Well, in that case, you could write a script on that server to collect the times from all the clients, compute the average, and update its own time with that value.
You may even want to get rid of NTP in that case and just use the script to update the time on all the servers.

I wish I could give a definitive proposal, but I don't know enough about your environment. No matter what, you'll likely be doing some sort of script kung fu.
If it's unix/linux, I would set every host up with SSH authorized keys to poll each other's date +%s command (to get the epoch), average those times with awk or something, and then set the machine's own local date.
Or perhaps it would be more secure (and reliable) to have one authoritative machine check everyone's time in the same manner, average it, and then provision itself and every other host to that average.
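For what it's worth, here is a rough sketch of that second approach; the hostnames are placeholders, and it assumes GNU date plus passwordless SSH (and sudo) are already set up:

    #!/bin/sh
    # Run from one "authoritative" box: poll everyone's epoch time,
    # average it, then push the average back out to every host.
    HOSTS="server1 server2 server3 server4 server5"   # placeholders

    times=""
    for h in $HOSTS; do
        t=$(ssh "$h" date +%s) && times="$times $t"
    done

    # Integer average of whatever we managed to collect
    avg=$(echo $times | awk '{ for (i = 1; i <= NF; i++) s += $i; if (NF) printf "%d", s / NF }')
    [ -n "$avg" ] || exit 1

    for h in $HOSTS; do
        ssh "$h" sudo date -s "@$avg"   # GNU date accepts @<epoch-seconds>
    done

Note that the SSH round-trips themselves take time, so this will never be terribly precise, and as the EDIT below points out, roughly half the machines will end up being stepped backwards.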
On Windows you'll probably be looking into VBScript and WMI.
EDIT:
You may run into some weird problems if anyone's clock drifts ahead of the average, and my guess is that about half of them will ;). Future timestamps can be rather strange. It will be up to you to determine how frequently this synchronization needs to occur.

Related

What is the minimum amount of time you need to wait before calling out to an NTP server again? (wait at least 64/128/10000/etc. seconds)

What is the minimum amount of time you need to wait before requesting time again from an NTP server? I've seen a wide variety of answers, but I'm looking to get a good ballpark.
Is it OK to query an NTP server every 5 minutes? I've heard that if you query too often they'll ban your IP address, and since I'm using the NIST NTP server I'm worried about getting banned from a US government server.
I'm looking to not have to deal with an RTC in my hardware prototype.
It depends on the server; typically, an NTP client will start polling every 64 seconds and back off to every 1024 seconds once it stabilizes. Cisco uses iburst by default, and their devices tend to stay up; on hosts that restart frequently, iburst can cause issues/blocks.
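If you want to bound the interval yourself rather than let the client adapt, ntpd takes minpoll/maxpoll as powers of two (6 = 64 s, 10 = 1024 s); the server name here is just the one mentioned in the question:

    # /etc/ntp.conf
    server time.nist.gov iburst minpoll 6 maxpoll 10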

How can I deal with the webserver UI of one machine being out of sync with backend/API of another?

The system my company sells is software for a multi-machine solution. In some cases, there is a UI on one of the machines and a backend/API on another. These systems communicate and both use their own clocks for various operations and storage values.
When the UI's system clock gets ahead of the backend by 30 seconds or more, the queries start to misbehave, because the UI's timestamp is sent over as key information in the REST request. There is a "what has been updated by me" query that runs every 30 seconds, and the desync causes updated data to be missed because it falls outside the timing window.
Since I do not have any control over the systems that my software is installed on, I need a solution on my code's side. I can't force customers to keep their clocks in sync.
Possible solutions I have considered:
The UI can query the backend for its system time and cache that.
The backend/API can reach back further in time when looking for updates. This will give the clocks some room to slip around, but will cause a much heavier query load on systems with large sets of data.
Any ideas?
Your best bet is to restructure your API somewhat.
First, even though NTP is a good idea, you can't actually guarantee it's in use. Additionally, even when it is enabled, OSs (Windows at least) may reject packets that are too far out of sync, to prevent certain attacks (on the order of minutes, though).
When dealing with distributed services like this, the mantra is "do not trust the client". This applies even when you actually control the client, too, and doesn't necessarily mean the client is attempting anything malicious - it just means that the client isn't the authoritative source.
This should include timestamps.
Consider: the timestamps are a problem here because you're trying to use the client's time to query the server - except we shouldn't trust the client. Instead, what we should do is have the server return a timestamp of when the request was processed, or the update stamp of the latest entry in the database, which can then be used in subsequent queries to retrieve new updates (how far back you go on the initial query is up to you).
Dealing with concurrent updates safely is a little harder, and depends on what is supposed to happen on collision. There's nothing really different here from most of the questions and answers dealing with database-centric versions of the problem, I'm just mentioning it to note you may need to add extra fields to your API to correctly handle or detect the situation, if you haven't already.
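As a concrete illustration of that pattern (the endpoint and field names here are made up, not taken from your API): the client never sends its own clock, it just echoes back whatever cursor the server handed it last time.

    # First sync: no cursor yet, the server decides how far back to go
    curl "https://api.example.com/updates"
    #   -> { "items": [ ... ], "server_cursor": "2015-06-01T10:23:05Z" }

    # Later syncs: pass the server-issued cursor straight back
    curl "https://api.example.com/updates?since=2015-06-01T10:23:05Z"
    #   -> { "items": [ ... ], "server_cursor": "2015-06-01T10:23:35Z" }

The cursor can be the server's clock or, better still, a monotonically increasing row version from the database, so client clock skew never enters into it.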

BIND9.7. When several named processes are running, how to judge which process is providing the service?

For example, I execute "sudo named" several times, so there are several named processes running. When I use "pidof named", I get several pids.
I want to calculate the CPU usage of the BIND process, so I need to read some values from /proc/<pid>/stat, which means I need the pid of the named process that is really providing the domain resolution service.
What's the difference between the named process which is providing the service and the others? Could you give me a detailed explanation?
thanks very much~
(It's my first time using Stack Overflow and asking questions in English, so please ignore any grammar errors.)
There should be just one named running; the scripts that manage the service ensure that. You shouldn't start it like that: use whatever your distribution provides to start it, probably something along the lines of service bind start (that is probably a RedHat-ism) or /etc/rc.d/bind start (for bog-standard SysVinit).
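Strictly speaking, the instance "providing the service" is the one that managed to bind port 53; the extra ones you started will either have exited or be sitting there without the socket. You can check directly (either of these should work on most Linux boxes):

    # Which pid owns the DNS port?
    sudo ss -lunp 'sport = :53'           # UDP listener, shows pid/program name
    sudo netstat -tulpn | grep ':53 '     # older alternative

That pid is the one to feed into /proc/<pid>/stat for your CPU accounting.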
I was responsible for DNS for quite some time here. Some tips:
DNS is a very critical service, configure and monitor with extreme care. Do read up on setting up and managing this, don't go ahead until you are absolutely clear.
Get somebody as a backup for the case that you aren't available, and make sure they understand the previous point.
DNS isn't CPU intensive (OK, with signed domains and that newfangled stuff that might have changed), it is memory intensive (and network intensive, or at least sensitive to delays). Our main DNS server was running for months at a time, and clocked up some half hour of CPU time during that kind of period IIRC.
Separate your master server (responsible for the domain(s)) from the servers queried by clients (caching servers). There have been vulnerabilities where malformed queries, or "answers" to questions that hadn't been asked, soiled the database.
The master server will have all the domain information in RAM, so make sure you have enough of it.
Make sure all machines under your jurisdiction use the same caching server. Having more than one makes no sense; it defeats the idea of a cache.
The caching servers collect immense amounts of data over time. This data rarely is performance critical, so configure them with plenty of swap space to accommodate overflows.
BIND starts as many named worker threads as you have CPUs, which can show up as multiple processes:
man named:
-n #cpus
Create #cpus worker threads to take advantage of multiple CPUs. If not specified, named will try to determine the number of CPUs present and create one thread per CPU. If it is unable to determine the number of CPUs, a single worker thread will be created.
External source:
https://unix.stackexchange.com/questions/140986/multiple-named-processes-for-bind9-in-debian

Justifications for a test/development server

At my current workplace, the production SQL server and web servers are also used as development and test servers. I've asked for dedicated servers but have been refused, as I can't justify them to their satisfaction (the arguments against being the cost of software, software licenses and hardware resources).
So, what justifications are there for a dedicated test/development server (a combined server at the moment - I don't want to push my luck and ask for 6 servers!)?
Summarised list
Resource usage
Prevention of errors
DR purposes
The list doesn't seem as extensive as I'd hoped.
Consider using Virtual Machines to reduce costs.
Well, for starters, the resources available to the production database are restricted.
Also, rogue or accidental developer SQL scripts could play havoc with the production data.
Could there be issues with production data sensitivity? (e.g. personal data)
Just a few to get started :)
Try to calculate the cost of downtime if you take the production system down due to a mistake in development.
Try also to calculate the cost of slow response times in production if/when you are doing performance testing.
As a cost benefit the test/dev hardware can be used as a spare if something bad happens to the production hardware.
Explain how often developers have fat-fingered moments and hit Enter too soon while editing statements starting...
drop table...
UPDATE veryImportantTable SET veryImportantField = '' WHERE 1 = 1 --TODO: make proper condition
This'd be reason enough for me. :)
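On a shared box, one habit that softens (though doesn't remove) that risk is wrapping ad-hoc statements in an explicit transaction and eyeballing the row count before committing; a sketch, with a made-up WHERE clause:

    BEGIN TRANSACTION;

    UPDATE veryImportantTable
    SET veryImportantField = ''
    WHERE customerId = 42;             -- hypothetical condition

    SELECT @@ROWCOUNT AS rowsTouched;  -- sanity-check before you commit

    ROLLBACK TRANSACTION;              -- default to undo; switch to COMMIT once it looks right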
I hope you have at least separate databases and are not developing on production data.
Check the data protection act, and also look into PCI-DSS if you want to be really secure (Payment Card Industry Data Security Standard).
I think it's livable to have a test database on the same physical machine as your production DB. Performance is often not an issue (and assuming it's a multicore machine with plenty of memory, even if you do run a heavy query on test, production will often not noticeably slow down), and so long as the DB connections are separate, the chance of accidental damage is very, very low.
As for a web-server, almost any machine can run one of those (apache is free, and even IIS is free for 10 simultaneous connections or fewer) - you could install a test web server on any old machine, configure it to use your test DB, and have a decent, low-cost solution.
'course a separate machine is "cleaner" - but the difference isn't huge.
One strong argument is availability / reduce downtime / disaster recovery.
i.e. to have another machine on standby to replace the production machine should anything bad happen to it hardware-wise (e.g. disk controllers or motherboards or power supplies dying).
Ideally the additional machine should be identical to the production one so it can be swapped in directly, or individual parts swapped in as required. They can also back each other up, or hold a local copy of their counterpart's last backup so they can be restored from quickly.
Of course, it depends on how critical uptime is to the business as to how much value they'll see in this. If you're able to roughly work out how much they'd lose in $ due to lost business with and without a 'hot spare' server, and present your case from a $-saved viewpoint (hopefully a lot more than the cost of the server), they might go for it.

Is it possible to get sub-1-second latency with transactional replication?

Our database architecture consists of two SQL Server 2005 servers, each with an instance of the same database structure: one for all reads, and one for all writes. We use transactional replication to keep the read database up to date.
The two servers are very high-spec indeed (the write server has 32GB of RAM), and are connected via a fibre network.
When deciding upon this architecture we were led to believe that the latency for data to be replicated to the read server would be on the order of a few milliseconds (depending on load, obviously). In practice we are seeing latency of around 2-5 seconds in even the simplest of cases, which is unsatisfactory. By the simplest case, I mean updating a single value in a single row in a single table on the write db and seeing how long it takes to observe the new value in the read database.
What factors should we be looking at to achieve latency below 1 second? Is this even achievable?
Alternatively, is there a different mode of replication we should consider? What is the best practice for the locations of the data and log files?
Edit
Thanks to all for the advice and insight - I believe that the latency we are experiencing is normal; we were misled by our db hosting company as to what latency to expect!
We're using the technique described near the bottom of this MSDN article (under the heading "scaling databases"), and we'd failed to deal properly with this warning:
The consequence of creating such specialized databases is latency: a write is now going to take time to be distributed to the reader databases. But if you can deal with the latency, the scaling potential is huge.
We're now looking at implementing a change to our caching mechanism that enforces reads from the write database when an item of data is considered to be "volatile".
No. It's highly unlikely you could achieve sub-1s latency with SQL Server transactional replication, even with fast hardware.
If you can get 1-5 seconds of latency, then you are doing well.
From here:
Using transactional replication, it is possible for a Subscriber to be a few seconds behind the Publisher. With a latency of only a few seconds, the Subscriber can easily be used as a reporting server, offloading expensive user queries and reporting from the Publisher to the Subscriber.
In the following scenario (using the Customer table shown later in this section) the Subscriber was only four seconds behind the Publisher. Even more impressive, 60 percent of the time it had a latency of two seconds or less. The time is measured from when the record was inserted or updated at the Publisher until it was actually written to the subscribing database.
I would say it's definitely possible.
I would look at:
Your network
Run ping commands between the two servers and see if there are any issues
If the servers are next to each other you should have < 1 ms.
Bottlenecks on the server
This could be network traffic (volume)
e.g. network cards not being configured for 1 Gb/sec
Anti-virus or other things
Do some analysis on some queries and see if you can identify indexes or locking which might be a problem
See if any of the selects on the read database might be blocking the writes.
Add with (nolock), and see if this makes a difference on one or two queries you're analyzing.
Essentially you have a complicated system which you have a problem with, you need to determine which component is the problem and fix it.
Transactional replication is probably best if the reports/selects you need to run need to be up to date. If they don't, you could look at log shipping, although that would add some downtime with each import.
For data/log files, make sure they're on separate drives so that performance is maximized.
Something to remember about transactional replication is that a single update now requires several operations to happen before that change takes effect on the subscriber.
First you update the source table.
Next the log reader sees the change and writes it to the distribution database.
Next the distribution agent sees the new entry in the distribution database, reads that change, and then runs the correct stored procedure on the subscriber to update the row.
If you monitor the statement run times on the two servers, you'll probably see that they run in just a few milliseconds. However, it is the lag while waiting for the log reader and distribution agent to notice that they have something to do that is going to kill you.
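If you want to see where in that pipeline the time is going, SQL Server 2005 transactional replication has tracer tokens for exactly this. A sketch, run at the publisher in the published database ('MyPublication' is a placeholder for your publication name):

    DECLARE @token_id int;

    -- Drop a tracer token into the replication stream
    EXEC sys.sp_posttracertoken
        @publication = N'MyPublication',
        @tracer_token_id = @token_id OUTPUT;

    -- Give it a little while to flow through, then see how long it spent
    -- at the log reader versus the distribution agent
    EXEC sys.sp_helptracertokenhistory
        @publication = N'MyPublication',
        @tracer_id = @token_id;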
If you truly need sub-second processing time, then you will want to look into writing your own processing engine to handle moving data from one server to another. I would recommend using SQL Service Broker for this, as that way everything is native to SQL Server and no third-party code has to be written.