Closet server versus Colo? - hardware

As a programmer I need a place to store my stuff. I've been running a server in my parents closet for a long time, but I recently came across a decent 2U server.
I have no experience dealing with hosting companies, beyond the very cheap stuff, and I'm wondering what I should look for in a colo or if I should just keep my closet server.

There are three major factors here.
Cost. The colo will obviously be more expensive than sticking a server in your parents' closet.
Quality. The colo should be a lot more reliable than the server in your parents' closet. They aren't as likely to go down when there's a power surge. They should provide some support if things do go wrong on their end. They will also likely give you better bandwidth.
Convenience. It is a lot easier to fix a broken box when you can walk over to it and plug up a monitor. Going to the colo to troubleshoot is probably not going to be convenient, if it's even possible. Transferring files from your laptop to the server in the closet is also going to be a lot faster than transferring over the Internet. On the other hand, if it's your box in the closet, you have to deal with the hardware problems, so it can balance out.
Personally, I pay for a (shared) server. I find that having someone else handle the server is worth it. Uploading large files can get really frustrating, but having to maintain an extra box in the closet is too much hassle for me.
You really have to decide what you value most. Is it worth the extra money to you to have a more reliable, more hands-off server?

If it were my source control server, I would not want to a) pay the added cost, or b)have to drive to the colo because I can't connect to my repository.

I'd absolutely go for the server that's located under my roof, as long as I don't need it to be connected to the internet with a static IP.
Why:
It's a target for hackers, as soon as it is reachable from the net
Any problems with the machine? I'd rather walk to the closet instead of calling a hotline - and probably pay for the service
Connection speed (from me to the server)
I can turn it off as long as I don't need it. Less power consuption, which is better for the environment.
The hosting of a machine costs money all the time. Even when you don't need it.

Related

architecture for high availability

I have this scenario:
You have a factory process line which runs 24/7. Downtime is extremely expensive.
The software controlling all different parts must use a shared form of database storage
The main reason for this is to know in which state the factory is in. For example some products can be mixed when using the same set of equipement and others DEFINITELY not.
requirements:
I want to the software be able to detect that an error in one part of
the plant must result in some machine shutdown more then 1 km away. so stoing data in the plc's is not an option.
Updates and upgrades to the factory environment are frequent
load (in computer terms) will be really low.
The systems handles a few hunderd assignments a day for which calculations / checks are done followed by instructions send for the factory machines. Systems will be bored most of the time. Most important requirement is the central computer system must be correct and always working.
I was thinking to use a dynamo based database (riak or cassandra) where data gets written to multiple machines with each machine having the whole database
When one system goes down it will go down unoticed. A Traditional sql databse might be more of a pain to upgrade when tables changes and this master slave is harder to configure.
What would be your solution?
Network has been made redundant and most other single points of failure to. The database system is critical because downtime of the db means downtime for the entire plant not just one of the machines which is acceptable.
How to solve shared state problem.
complexity in the database will not be a problem. I will be more like a simple key value store to get the most current and correct data.
I don't think this is a sql/nosql question. All of Postgres, MySQL and MS SQL Server have some kind of cluster or hot standby option.
Configuration is a one-time thing, but any NoSQL option is going to give you headaches from top-to-bottom of code, if you are trying to do something fundamentally relational on a platform that has given up relational for the purposes of running things like Amazon or Facebook. The configuration is once, the coding is forever.
So I would say stick with a tried and true solution and get that hot replication going.
This also provides a solution for upgrades. The typical sequence is to "fail over" to the standby, upgrade the master, flip back to the master, upgrade the standby, and resume. With details specific to the situation of course.
Use an established RDBMS that supports such things natively
Do you really want to run a 24/7 mission critical system on something that may be consistent at any point in time?
You need to avoid single points of failure.
All the major players in our dbms world offer at least one way to avoid making the database itself a single point of failure. I might question whether they can propagate changes fast enough for your manufacturing processes. (Or is data update not really an issue? Can't really tell from your question.) My db work in manufacturing is limited to the car and the chemical industry. Microseconds didn't matter to them.
But the dbms isn't the only thing that can fail. "Always working" means that the clients have to always be working, too. Client hardware, connections to the network, the network and network servers themselves all probably have single points of failure. Failure-tolerant servers have multiple power supplies, multiple NICs, etc.
"Always working" is really expensive. I have a feeling that the database isn't going to be the biggest problem for your company.

What is a good access time to a database (SQL)?

Hey.. i wanna know which time is a good accesstime, because i'm searching for a good sql database and hsqldb says their accesstime is 12ms... <-- good?
I think it would depend on your needs. Is it for a web server or a desktop application? The amount of data is also important, because reading lots of small records will perform differently than reading a few large records. Access time is also based upon your hardware, software and maybe even some other factors.
For example, you can use a database with lightning-fast access, but if your users need to connect to it over a 5 megabit VPN connection, passing through three different proxies and with trafic world-wide, your database would then just be a waste of power.
Basically, it's a marketing thing that they're claiming. It's a good product but don't just focus on access time. Make sure you also look at your other needs. Another system might just perform better, even if it has a slower acess time, because it is more optimized in reading it's indices and stuff.
So, what do you want, exactly?
I don't think access time tells you anything, really. If you have slow or incorrectly configured storage, then this access time metric will be dwarfed by how much time is spent on waits and split I/Os. Network latency is also a factor, since I'm guessing you probably won't want to have your code on the same machine as your database, and you will most likely have a few network devices you'll need to traverse in your production environment.
In my experience, all the database platforms these days will all perform adequately if configured correctly and paired with a complementary application. Pick the DBMS that best fits your requirements, follow the best practices for configuration of the DBMS on your hardware, and you should be please with the outcome.

Is using a geographically distributed development team a better approach for running a software startup?

It's commonly agreed that successful software development is as much about teamwork and communication as it is about individual programming expertise. Given this, one might assume that by operating a geographically distributed team you are at an immediate disadvantage to a tight-knit team all working locally.
When my startup company was founded, we couldn't afford shared office space and I was actually located in a different city to the rest of the team, so we all had to work remotely and use tools such as Basecamp, Skype and Trac to communicate. One the whole, this was really successful - we got a huge amount of quality work done in a short space of time and launched a successful product. Working remotely gave our developers the time and space they needed to focus on the job and be productive without having interruptions or enduring office politics. To me, this is a huge advantage.
Given my experience, as well as the success of software companies with distributed teams such as 37signals and StackOverflow (and I'm sure many more), I'm increasingly of the opinion that the advantages of running a distributed team outweighs those of running a centralised team, especially for start-up companies.
Would you agree?
Given my experience, as well as the
success of software companies with
distributed teams such as 37signals
and StackOverflow (and I'm sure many
more), I'm increasingly of the opinion
that the advantages of running a
distributed team outweighs those of
running a centralised team, especially
for start-up companies.
Would you agree?
I half agree.
Running a distributed team definitely has its disadvantages. As you pointed out in your own post, communication is a big problem. There are times, as a developer, I enjoy just bouncing ideas off other developers and swapping ideas that I may not have thought up on my own. In addition, it can be tough to get feedback or to perform code reviews (practices that I have found useful in my development experience).
With that said, I also think there is an advantage to a distributed team. The biggest of these being that developers tend to do better when they can focus and just develop and not have to worry about being interrupted or having to attend frequent meetings, etc. This was a huge advantage at one job I had at a smaller company.
In your specific situation, have you considered that one reason you were so successful was not because you were geographically dispersed, but you were successful because you're a small company? Small companies have an advantage in that you have a limited number of products, there tends to be more focus, and, as a result, you can maintain a better control over your products/schedules/etc.
That's my 2 cents.
I agree that offices are quite distracting due to noise and interruptions. But the distractions that hinder you are the other side of the coin to the ability to ask people around you questions. Although I've not tried remote working for more than a few days at a time, the inability to get an answer to a quick question in 30s is the main disadvantage that I see.
Like-for-like comparisons that might give us empirical data are very hard to do, arguably practically impossible. So that gives us the licence to speculate, right?
My pet theory is that any sufficiently talented and motivated team can make most any system, method, geographical dispersion work.
I totally agree. An office environment provides mainly distractions and opportunities to waste time and look busy. A distributed team doesn't have to pay rent, they can deduct part of their own rent or mortgage from their taxes, and they can recruit talent from virtually anywhere in the world (instead of trying to find capable RoR developers in East Bumwipe, Oklahoma).
Are you a regular reader of Joel Spolsky's blog?
Joel described the centralized offices they have set up in order to increase productivity.
More than enough room for each developer, so they can walk up and down for a while whenever a bug haunts one of them. :)
Separated offices. During work hours, only the developer and the given task exist. Nothing else.
Sound-proof walls. (As far as I can remember.) Generally useful to provide full control over work space. Devs can listen to music without headphones, for example.
As you can see, FogCreek has managed to combine most advantages of remote work, while still keeping live communication as an option.
However, due to lack of teleportation, this customized and professional office is yet to solve the problem of different world-wide locations.
From personal experience I am much more productive when working remotely. I lose the sense that someone is staring over my shoulder, criticizing me for being lazy when I'm really just taking a moment to collect my thoughts.
I also appreciate not having a commute, even if I'm only saving 20 minutes each way it's a huge load off of my back, plus I don't have to dress to be in the office so I save time getting ready in the morning.
I've found that it's fairly easy to mitigate the communication issues by implementing a certain time during the day to be online, we had people on the east and west coast so we had people stay online between 1-4p EST. Also, just making sure that everyone has each other's phone numbers was a good thing, there were many problems that could be resolved with a quick phone call.
I wish that more businesses would support remote developers, I'm in an office right now and I feel that being here is so wasteful. I could get more done in less time without the distractions involved, and would have a better ability to manage my time.
Pros: You can hire the person you like instead of sticking with those available in the neighborhood.
Cons: It can be difficult to communicate if your team members live in various time zones.
I think a start up works best if the core team are physically close in space. As the team grows and the product and processes matures remote work gains traction in my experience. During that critical first year there can't be too much communication between developers and founders.
Once the startup has real direction and good processes in place remote working becomes very effective.
Certainly having some developers working remotely saves real money in overhead costs and makes everyone happy if its possible.
In my startup a lot of our work requires direct physical interaction with expensive equipment, so we can't all be virtual. Some of us can, and our remote developers are good contributors.
I've been working for US based companies from my country for about 4 years (as of Feb 2014). The experience has been very rewarding, and I feel now absolutely comfortable doing my job remotely, but there is a learning curve that needs to be endured, which cannot be overlooked. There are so many subtleties to communication that suddenly get lost when chatting over skype or sending emails. A whole level of information brought by body language and the sheer empathy that comes from knowing personally the person you're dealing with. Over time, you learn strategies around that, but there's no denial that it is a learning process.
Also, even though sometimes having the team working on the same office is perceived as distraction-prone, in my view, it also fosters a more dynamic environment, where ideas flow more freely and faster. It also encourages a "team-attitude" towards problem solving, which is great for consistency.
I think the best approach, whenever possible, is having a bit of both - work a few days from home, so people can focus and self organize their time, and then work a few days on the same office so that they are still part of a team, instead of islands in isolation.

Restart daily or 100% uptime for enterprise applications?

I have a general question that is rather open-ended (i.e. "depends on platform, application type, etc.") but I am looking for general guidelines as an answer.
When is it preferable to design an application for continuous operation (100% uptime) vs. scheduled daily shutdown/restart?
Obviously, web apps need to be up all the time, so assume for this question that we are discussing an internal enterprise application, such as an accounting system, or a B2B system that is only used actively during weekday business hours.
Arguments I've heard for each are as follows:
Pro 100% Uptime: "once you get an application running, it's better to keep it up, because there's a chance it won't restart when you shut it down."
Pro daily restarts: "an application that is up continuously for 3 years might one day go down, and nobody will know how to bring it back online."
Other considerations are memory growth, performance, need for maintenance, etc. This is a programming issue because the choice you make can affect your technical design. For example, you don't need to code certain batch jobs and clear state daily if you know the application will be shutdown/restarted daily.
Thoughts?
The arguments you state both for and against 100% uptime are foolish arguments, in my opinion. If you're worried about the application not restarting when it is shutdown then you have larger issues than uptime concerns. Likewise, if you feel that nobody will know how to bring it back online after a prolonged period of uptime you have training and documentation issues.
The reality is that you should always design an application to be efficient when it comes to memory consumption and performance. Generally, by doing this you end up with an application that can sucessfully survive as a long running process or one that restarts frequently. Keep in mind that your typical computer system is rebooted periodically anyway due to OS updates, etc.
Unless you have requirements and service level agreements that guarantee 100% uptime, this isn't usually something you have to be overly concerned about as long as you design an application efficiently.
Sorry, but I'm not getting the point or this question is totally pointless.
An application, any application, should be designed, IMO, to stay up unless it's needed. If an application/platform needs to be restarted daily, then it has memory leaks, or bugs, or it's, in general, poorly written.
The point "don't make it stay up too long, otherwise you'd risk nobody will ever remember how to turn it up again" is quite laughable. I do Application Management (Operations) as my daily job, and I've never seen an application staying up for more than one month. After that period, you have to cope with OS maintainance, db patching, software upgrades, etc.
So, to summarize: write applications that can stay up as long as it's needed.
When is it preferable to design an application for continuous operation (100% uptime) vs. scheduled daily shutdown/restart?
I think this is really an orthogonal question to application design. Many web servers and application containers can support hot restarts. In other words, this is not a question so much of "application design" but rather a choice of technology. For example, you can avoid the question entirely by simply having N copies of your application (N > 1), then systematically bringing a particular instance down for maintenance and restarting as needed.
Furthermore, business needs and requirements should be determining the appropriate downtime, not your choice of technology.
Pro daily restarts: "an application that is up continuously for 3 years might one day go down, and nobody will know how to bring it back online."
Hogwash. That is a social/organizational argument, not a technical one. This is solved by having an obvious build process which includes starting the server as one of its possible tasks. That reduces the task of "restarting" to a single command.
If you're not extremely confident in your team, it might be better to go down from time to time, to clear everything. Once a day could do it, but there is a range from this to "never" ...
But this is generally dictated by business contraints. If you don't have those constraints yet ...
Well, why don't you also postpone your decision then ?
As others said, if you can't trust your app to start up again you have much larger issues.
From experience my general, personal, recommendation for web-apps is to cycle them once a day (in the early hours of the morning i.e. at the lowest traffic point) staggered over the whole server cluster. No matter how memory efficient your app is web-apps in particular can always have cache bloat issues over extended periods of up-time and one you accept the inevitability of a restart you absolutely want that to happen on your schedule and not t the whim of w3wp.exe.
Of course this all depends on the number of servers you have, the traffic manager you have (if any) and what your traffic profile looks like.
Apart from "Your app is not good enough if you need to restart it" ideas (which I see them perfect and I like them), I would prefer something in the middle as a preventive measure.
If you application is not too big, and one person can restart it without much trouble, it would be fine to restart it maybe once per month or 3/4 times per year. This way you will ensure that the sysadmin knows well how to do it (people sometimes comes and go form the companies) and also his knowledge keeps fresh.
If you have a problem and your sysadmin has not restarted the application since two years ago, he will have several manuals available and courses done, but probably he has forget some steps, or he is not so quick to solve the problem.
Other topic to consider is: "Is a fully implemented application or are you still working on it?" If it's an application made for yourselves, you still code on it and make frequent upgrades for new features, it can be interesting to restart it more frequently. If a problem appears, it has more probabilities to be hidden on the new code. It will help your programmers to fix it and your sysadmin to keep updated about what's happening with the app.
Of course, making a perfect application is always a top-prio element, but... ok, we all know that not always is possible

Is it premature optimization to develop on slow machines?

We should develop on slow boxen because it forces us to optimize early.
Randall Hyde points out in The Fallacy of Premature Optimization, there are plenty of misconceptions around the Hoare quote:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
In particular, even though machines these days scream compared with those in Hoare's day, it doesn't mean "optimization should be avoided." So does my respected colleague have a point when he suggests that we should develop on boxes of modest tempo? The idea is that performance bottlenecks are more irritating on a slow box and so they are likely to receive attention.
This should be community wiki since it's pretty subjective and there's no "right" answer.
That said, you should develop on the fastest machine available to you. Yes, anything slower will introduce irritation and encourage you to fix the slowdowns, but only at a very high price:
Your productivity as a programmer is directly related to the number of things you can hold in your head, and anything which slows down your process or impedes you at all lengthens the amount of time you have to hold those ideas in short term-memory, making you more likely to forget them, and have to go re-learn them.
Waiting for a program to compile allows the stack of bugs, potential issues, and fixes to drop out of your head as you get distracted. Waiting for a dialog to load, or a query to finish interrupts you similarly.
Even if you ignore that effect, you've still got the truth of the later statement - early optimization will leave you chasing yourself round in circles, breaking code that already works, and guessing (with often poor accuracy) about where things might get bogged down. Design your code properly in the first place, and you can forget about optimization until it's had a chance to settle for a bit, at which point any necessary optimization will be obvious.
Slow computers are not going to help you find your performance problems.
If your test data is only a few hundred rows in a table your db will cache it all and you'll never find badly written queries or bad table/index design. If your server application is not multi-threaded server you will not find that out until you stress test it with 500 users. Or if the app bottlenecks on bandwidth.
Optimization is "A Good Thing" but as I say to new developers who have all sorts of ideas about how to do it better 'I don't care how quickly you give me the wrong answer'. Get it right first, then make it faster when you find a bottleneck. An experienced programmer is going to design and build it reasonably well to start with.
If performance is really critical (real time? millisecond-transactions?) then you need to design and implement a set of benchmarks and tools to scientifically prove to yourselves that your changes are making it faster. There are way too many variables out there that affect performance.
Plus there's the classic programmer excuse they will bring out - 'but it's running slow because we have deliberately picked slow computers, it will run much faster when we deploy it.'
If your colleague thinks its important give him a slow computer and put him in charge of 'performance' :-)
I guess it would depend on what you're making and what the intended audience is.
If you're writing software for fixed hardware (say, console games) then use equipment (at least test equipment) that is similar or the same as what you will deploy on.
If you're developing desktop apps or something in that realm then develop on whatever machine you want and then tune it afterward to run on the desired min-spec hardware. Likewise, if you're developing in-house software, there is likely to be a min-spec for the machines that the company wants to buy. In that case, develop on a fast machine (to decrease development time and therefore costs) and test against that min-spec.
Bottom line, develop on the fastest machine you can get your hands on, and test on the minimum or exact hardware that you'll be supporting.
If you are programming on hardware that is close to the final test and production environments, you tend to find that there are less nasty surprises when it comes time to release the code.
I've seen enough programmers get side-swiped by serious, but unexpected problems caused by their machines being way faster than their most of their users. But also, I've seen the same problem occur with data. The code is tested on a small dataset and then "crumbles" on a large one.
Any differences in development and deployment environments can be the source of unexpected problems.
Still, since programming is expensive and time-consuming, if the end-user is running slow out-of-date equipment, the better solution is to deal with it at testing time (and schedule in a few early tests just to check usability and timing).
Why cripple your programmers just because you're worried about missing a potential problem? That's not a sane development strategy.
Paul.
for the love of Codd, use profiling tools, not slow development machines!
Optimization should be avoided, didn't that give us Vista? :p
But in all seriousness, its always a matter of tradeoffs. Important questions to ask yourself
What platform will your end users be using?
Can I drop cycles? What will happen if I do?
I agree with most that initial development should be done on the fastest or most efficient (not neccesarily the same) machine available to you. But for running tests, run it on your target platform, and test often and early.
Depends on your time to delivery. If you are in a 12 month delivery cycle then you should develop on a box with decent speed since your customers' 12 months from now will have better "average" boxes than the current "average".
As your development cycle approaches "today", your development machines should approach the current "average" speed of your clients' boxes.
I typically develop on the fastest machine I can get my hands on.
Most of the time I'm running a debug build, which is slow enough already.
I think it is a sound concept (but maybe because it works for me).
If my developer workstation is too fast I find I don't think ideas through thorougly enough simply because there is little time-penalty in re-generating the software image or downloading it to the target. I'd say at least half my downloads were unnecessary because I just remembered something I'd missed right before I was going to debug the code.
The target machine could well contain a throttled processor. If - on an embedded MCU - you have half the FLASH, RAM and clock cycles per second chances are developers will be a lot more careful when designing their code. I once suggested byte variables for the lengths of individual records in a data area (not in RAM but in a serial eeprom) and received the reply "we don't need to be stingy." A few months later they hit the RAM ceiling (128KiB). My reflection was that for this app there would never be any records larger than 256 bytes simply because there was no RAM to copy them to.
For server applications I think it would be a great idea to have a (much) lower-performing hardware to test on. Two or four cores instead of sixteen (or more). 1.6 GHz istdo 2.8. The list goes on. A server is usually - due to the very fact that everyone talks to it - a bottleneck in the system architecture. And that is long before you start developing the (server) application for it.