I've been reading about application servers, and the term "enterprise-level capabilities" shows up very often. It sounds like a vague term that we developers use a lot (just like "cloud").
What does it mean when the server has "enterprise-level capabilities"?
What can the server do to support enterprise-level applications?
What should I expect from these types of application servers?
Any list of things that can be considered would be great, because I have no idea what makes up "enterprise-level capabilities".
It would be great to have definitions from different perspectives (e.g. from a developer, architecture, and business point of view).
"Expensive."
It's a vague (and thus largely meaningless) term.
It could mean that the solution scales, but you have scaling up and scaling out, and the preference for each could depend on the enterprise.
It could mean that it has high-availability features (again, different enterprises would have different preferences for failover modes).
It could mean that it can handle a large amount of data (for instance, Quicken would not be considered Enterprise if you expected it to handle the accounts of 10m customers). However, an enterprise system that handles a state-wide bank is a lot different from one that can handle an international bank.
In all cases it means the pricing is always "call for quote" and you'll be paying a lot of money. (But "a lot of money" varies by enterprise.)
I have this scenario:
You have a factory process line which runs 24/7. Downtime is extremely expensive.
The software controlling all the different parts must use a shared form of database storage.
The main reason for this is to know which state the factory is in. For example, some products can be mixed when using the same set of equipment and others DEFINITELY cannot.
Requirements:
The software must be able to detect that an error in one part of the plant requires shutting down a machine more than 1 km away, so storing data in the PLCs is not an option.
Updates and upgrades to the factory environment are frequent.
Load (in computer terms) will be really low.
The system handles a few hundred assignments a day, for which calculations and checks are done, followed by instructions sent to the factory machines. The systems will be bored most of the time. The most important requirement is that the central computer system must be correct and always working.
I was thinking of using a Dynamo-based database (Riak or Cassandra), where data gets written to multiple machines, with each machine holding the whole database.
When one system goes down, it will go down unnoticed. A traditional SQL database might be more of a pain to upgrade when tables change, and master-slave replication is harder to configure.
What would be your solution?
The network has been made redundant, and most other single points of failure too. The database system is critical because downtime of the DB means downtime for the entire plant, not just for one of the machines, which would be acceptable.
How do I solve the shared state problem?
Complexity in the database will not be a problem. It will be more like a simple key-value store to get the most current and correct data.
I don't think this is a sql/nosql question. All of Postgres, MySQL and MS SQL Server have some kind of cluster or hot standby option.
Configuration is a one-time thing, but any NoSQL option is going to give you headaches from top-to-bottom of code, if you are trying to do something fundamentally relational on a platform that has given up relational for the purposes of running things like Amazon or Facebook. The configuration is once, the coding is forever.
So I would say stick with a tried and true solution and get that hot replication going.
This also provides a solution for upgrades. The typical sequence is to "fail over" to the standby, upgrade the master, flip back to the master, upgrade the standby, and resume. With details specific to the situation of course.
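As a rough illustration of that upgrade sequence, here is a minimal Python simulation. The `Node` class and `rolling_upgrade` function are hypothetical names made up for this sketch; it does not call any real database's replication API, it only models the order of the steps.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one database server in a master/standby pair."""
    name: str
    version: str
    active: bool = False


def rolling_upgrade(master, standby, new_version):
    """Upgrade both nodes, never touching the node that is currently active."""
    log = []

    def switch_to(target, other):
        # Simulated failover: 'target' starts serving, 'other' goes idle.
        other.active = False
        target.active = True
        log.append(f"active={target.name}")

    def upgrade(node):
        assert not node.active, "never upgrade the node that is serving traffic"
        node.version = new_version
        log.append(f"upgraded={node.name}")

    switch_to(standby, master)   # 1. fail over to the standby
    upgrade(master)              # 2. upgrade the idle master
    switch_to(master, standby)   # 3. flip back to the master
    upgrade(standby)             # 4. upgrade the idle standby
    return log
```

In a real setup each step would also wait for replication to catch up before switching, which is where the situation-specific details come in.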
Use an established RDBMS that supports such things natively.
Do you really want to run a 24/7 mission-critical system on something that may not be consistent at any given point in time?
You need to avoid single points of failure.
All the major players in our dbms world offer at least one way to avoid making the database itself a single point of failure. I might question whether they can propagate changes fast enough for your manufacturing processes. (Or is data update not really an issue? Can't really tell from your question.) My db work in manufacturing is limited to the car and the chemical industry. Microseconds didn't matter to them.
But the dbms isn't the only thing that can fail. "Always working" means that the clients have to always be working, too. Client hardware, connections to the network, the network and network servers themselves all probably have single points of failure. Failure-tolerant servers have multiple power supplies, multiple NICs, etc.
"Always working" is really expensive. I have a feeling that the database isn't going to be the biggest problem for your company.
I am asking this in the context of NoSQL - which achieves scalability and performance without being expensive.
So, if I needed to achieve massively parallel distributed computing across databases ...
What are the various methodologies available today (within the RDBMS paradigm) to achieve distributed computing with high-scalability?
Does database clustering & mirroring contribute in any way towards distributed computing?
I guess you are asking about the scalability of RDBMS databases. NoSQL databases based on Amazon Dynamo or BigTable are a whole other topic; I am talking about HBase, Cassandra, etc. There are also commercial products like Oracle Coherence, which is more like a distributed cache and key-value store, to put it crudely.
Going back to RDBMS:
Sharding
To scale an RDBMS, one can do custom sharding. Sharding is a technique where you have multiple tables, possibly on multiple hosts, and you decide in a certain fashion which rows are assigned to which tables. For example, you can say that rows 1-1M go to table1, rows 1M-2M go to table2, etc. But this is a difficult process from an administration point of view. A lot of large-scale websites scale by relying on sharding. Other techniques worth mentioning are partitioning, MySQL federation, and MySQL Cluster.
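The range-based assignment in that example could be sketched like this in Python. The shard names, the `ROWS_PER_SHARD` constant, and the `shard_for` function are all assumptions for illustration, and I am reading "1-1M" as rows 1 through 1,000,000 inclusive.

```python
ROWS_PER_SHARD = 1_000_000  # assumed shard size, taken from the "1-1M" example

# Hypothetical shard tables; in practice each could live on a different host.
SHARDS = ["table1", "table2", "table3"]


def shard_for(row_id):
    """Map a row id to the table holding it, using fixed-size id ranges."""
    if row_id < 1:
        raise ValueError("row ids are assumed to start at 1")
    index = (row_id - 1) // ROWS_PER_SHARD
    if index >= len(SHARDS):
        # This is the administration pain: someone has to add a new shard.
        raise LookupError(f"no shard configured for row {row_id}")
    return SHARDS[index]
```

Every query then has to go through something like `shard_for` first, which is part of why this is painful to administer.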
MPP databases
Then there are databases that are very much RDBMSs but handle distribution and scaling for you. Teradata is the most successful of these companies. I believe they used Postgres core code at some point. A significant number of Fortune 500 companies and a lot of the airlines use Teradata, but it's ridiculously expensive. There are newer companies like Greenplum, Vertica, and Netezza.
Unless you're a very big company with extreme scalability requirements, you can scale your DB horizontally while keeping ACID guarantees by building a cluster of identical RDBMS instances and synchronizing them with JTA transactions.
Take a look at this Java/JDBC-based article. It uses the JEPLayer framework, but you can use straight JDBC and JTA code.
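To show the idea behind keeping identical instances in sync, here is a small Python simulation of a two-phase commit, which is how JTA transaction managers typically coordinate several XA resources. This is not JTA or JDBC code: `Replica` and `replicated_write` are hypothetical names, and the replicas are plain in-memory objects.

```python
class Replica:
    """Hypothetical in-memory stand-in for one RDBMS instance in the cluster."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}
        self._pending = None

    def prepare(self, key, value):
        # Phase 1: vote on whether this replica can apply the write.
        if not self.healthy:
            return False
        self._pending = (key, value)
        return True

    def commit(self):
        # Phase 2: make the prepared write visible.
        key, value = self._pending
        self.data[key] = value
        self._pending = None

    def rollback(self):
        self._pending = None


def replicated_write(replicas, key, value):
    """Apply the write on every replica, or on none of them."""
    prepared = []
    for replica in replicas:
        if replica.prepare(key, value):
            prepared.append(replica)
        else:
            # One "no" vote aborts the whole transaction.
            for done in prepared:
                done.rollback()
            return False
    for replica in replicas:
        replica.commit()
    return True
```

The trade-off is visible even in this toy version: one unhealthy instance blocks every write, which is why this approach suits modest clusters rather than extreme scale.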
Within the RDBMS paradigm: Sharding.
Outside the RDBMS paradigm: Key-value stores.
My pick (I come from an RDBMS background): key-value stores of the tabular type, such as HBase.
Within the RDBMS paradigm, sharding will not get you far.
Use the RDBMS paradigm to design your model, to get your project up and running.
Use tabular key-value stores to SCALE OUT.
Sharding:
A good way to think about sharding is to see it as user-account-oriented
DB design.
All the schema entities touched by a user account are kept on one host.
The assignment of user to host happens when the user creates an account.
The least loaded host gets that user.
When that user signs on after account creation, he gets connected
to the host that has his data.
Each host has a set of user accounts.
The problem with this approach is that if the host gets hosed,
a fraction of users will be blacked out.
The solution to this is to have a replicated standby host that
becomes the primary when the primary host encounters problems.
Also, it's a fairly rigid setup for processes where the design does
not change dramatically.
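The account-to-host assignment described above can be sketched in a few lines of Python. `ShardDirectory` and its methods are hypothetical names for illustration; a real system would keep this directory in a small, highly available lookup service rather than in memory.

```python
class ShardDirectory:
    """Hypothetical lookup table mapping each user account to its host."""

    def __init__(self, hosts):
        self._load = {host: 0 for host in hosts}  # accounts per host
        self._home = {}                           # user -> host

    def create_account(self, user):
        """Assign a new account to the least loaded host."""
        if user in self._home:
            raise ValueError(f"account {user!r} already exists")
        host = min(self._load, key=self._load.get)
        self._load[host] += 1
        self._home[user] = host
        return host

    def host_for(self, user):
        """On sign-in, find the host that holds this user's data."""
        return self._home[user]

    def fail_over(self, primary, standby):
        """Promote a replicated standby when the primary host has problems."""
        self._load[standby] = self._load.pop(primary)
        for user, host in self._home.items():
            if host == primary:
                self._home[user] = standby
```

Note that an account never moves after creation except on failover, which is one way to see why the setup is rigid.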
From the user standpoint, I've noticed that web sites
with a sharded DB backend are not as quick to "turn on a dime"
to create different business models on their platform.
Contrast this with web sites that have truly distributed
key-value stores. These businesses can host any range of
services. Their platform is just that - a platform.
It's not relational and it does have an API interface,
but it just seems to work.
It's commonly agreed that successful software development is as much about teamwork and communication as it is about individual programming expertise. Given this, one might assume that by operating a geographically distributed team you are at an immediate disadvantage to a tight-knit team all working locally.
When my startup company was founded, we couldn't afford shared office space and I was actually located in a different city from the rest of the team, so we all had to work remotely and use tools such as Basecamp, Skype and Trac to communicate. On the whole, this was really successful: we got a huge amount of quality work done in a short space of time and launched a successful product. Working remotely gave our developers the time and space they needed to focus on the job and be productive without having interruptions or enduring office politics. To me, this is a huge advantage.
Given my experience, as well as the success of software companies with distributed teams such as 37signals and StackOverflow (and I'm sure many more), I'm increasingly of the opinion that the advantages of running a distributed team outweigh those of running a centralised team, especially for start-up companies.
Would you agree?
I half agree.
Running a distributed team definitely has its disadvantages. As you pointed out in your own post, communication is a big problem. There are times, as a developer, I enjoy just bouncing ideas off other developers and swapping ideas that I may not have thought up on my own. In addition, it can be tough to get feedback or to perform code reviews (practices that I have found useful in my development experience).
With that said, I also think there is an advantage to a distributed team. The biggest is that developers tend to do better when they can focus and just develop, without having to worry about being interrupted or attending frequent meetings. This was a huge advantage at one job I had at a smaller company.
In your specific situation, have you considered that one reason you were so successful was not because you were geographically dispersed, but you were successful because you're a small company? Small companies have an advantage in that you have a limited number of products, there tends to be more focus, and, as a result, you can maintain a better control over your products/schedules/etc.
That's my 2 cents.
I agree that offices are quite distracting due to noise and interruptions. But the distractions that hinder you are the other side of the coin to the ability to ask people around you questions. Although I've not tried remote working for more than a few days at a time, the inability to get an answer to a quick question in 30s is the main disadvantage that I see.
Like-for-like comparisons that might give us empirical data are very hard to do, arguably practically impossible. So that gives us the licence to speculate, right?
My pet theory is that any sufficiently talented and motivated team can make most any system, method, geographical dispersion work.
I totally agree. An office environment provides mainly distractions and opportunities to waste time and look busy. A distributed team doesn't have to pay rent, they can deduct part of their own rent or mortgage from their taxes, and they can recruit talent from virtually anywhere in the world (instead of trying to find capable RoR developers in East Bumwipe, Oklahoma).
Are you a regular reader of Joel Spolsky's blog?
Joel described the centralized offices they have set up in order to increase productivity.
More than enough room for each developer, so they can walk up and down for a while whenever a bug haunts one of them. :)
Separated offices. During work hours, only the developer and the given task exist. Nothing else.
Sound-proof walls. (As far as I can remember.) Generally useful to provide full control over work space. Devs can listen to music without headphones, for example.
As you can see, FogCreek has managed to combine most advantages of remote work, while still keeping live communication as an option.
However, due to lack of teleportation, this customized and professional office is yet to solve the problem of different world-wide locations.
From personal experience I am much more productive when working remotely. I lose the sense that someone is staring over my shoulder, criticizing me for being lazy when I'm really just taking a moment to collect my thoughts.
I also appreciate not having a commute. Even if I'm only saving 20 minutes each way, it's a huge load off my back, plus I don't have to dress for the office, so I save time getting ready in the morning.
I've found that it's fairly easy to mitigate the communication issues by setting a certain time during the day for everyone to be online. We had people on the east and west coasts, so we had people stay online between 1 and 4 p.m. EST. Also, just making sure that everyone has each other's phone numbers was a good thing; there were many problems that could be resolved with a quick phone call.
I wish that more businesses would support remote developers. I'm in an office right now and I feel that being here is so wasteful. I could get more done in less time without the distractions involved, and would be better able to manage my time.
Pros: You can hire the person you like instead of sticking with those available in the neighborhood.
Cons: It can be difficult to communicate if your team members live in various time zones.
I think a startup works best if the core team is physically close together. In my experience, as the team grows and the product and processes mature, remote work gains traction. During that critical first year there can't be too much communication between developers and founders.
Once the startup has real direction and good processes in place, remote working becomes very effective.
Certainly, having some developers working remotely saves real money in overhead costs and makes everyone happy, if it's possible.
In my startup a lot of our work requires direct physical interaction with expensive equipment, so we can't all be virtual. Some of us can, and our remote developers are good contributors.
I've been working for US-based companies from my country for about 4 years (as of Feb 2014). The experience has been very rewarding, and I now feel absolutely comfortable doing my job remotely, but there is a learning curve that has to be endured and cannot be overlooked. There are so many subtleties of communication that suddenly get lost when chatting over Skype or sending emails: a whole level of information carried by body language, and the sheer empathy that comes from personally knowing the person you're dealing with. Over time you learn strategies around that, but there's no denying that it is a learning process.
Also, even though having the team working in the same office is sometimes perceived as distraction-prone, in my view it also fosters a more dynamic environment, where ideas flow more freely and faster. It also encourages a "team attitude" towards problem solving, which is great for consistency.
I think the best approach, whenever possible, is having a bit of both: work a few days from home, so people can focus and self-organize their time, and then work a few days in the same office, so that they are still part of a team instead of islands in isolation.
I have custom coded several enterprise applications for mid to large organizations to use internally (some with a minimal external footprint). I now have plans for a web project that may (hopefully) see a large userbase with more daily traffic than my previous projects have ever attained. Obviously I want my design to be scalable and maintainable. The problem is that from a physical layout perspective (servers/VMs) I do not know what to expect.
The question: What are some good resources for this? Books? Websites? I have found plenty on scalable application design, but nothing on scalable physical design.
It's hard to give an exact answer without knowing something about what technologies you plan to use. The approach to the application can't be completely unaware of the planned physical infrastructure if scaling is a major driver.
Caching would have to be a big concern. Also ways to expand the hardware where your data lives.
A very interesting and instructive read is the real-world bio of LiveJournal: a history of scaling, and how they grew their physical presence alongside massive growth in their website. One major offshoot of their work was a new caching technology, memcached, which is now used by Facebook among others. It is surprisingly honest.
The High Scalability blog is good. You can look at some of their examples that go over the physical parts of large sites. I would say the common first level physical scaling technique would be a load balancer. That is pretty easy but at the simplest you still have a database that is a potential bottleneck. Most of the physical parts of scaling require you to just add more and the real issues come in where you are forced to use just one of something.
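To make the load balancer idea concrete, here is a toy round-robin balancer in Python. `RoundRobinBalancer` and the backend names are made up for this sketch; a real deployment would use a dedicated load balancer with health checks rather than application code like this.

```python
import itertools


class RoundRobinBalancer:
    """Toy balancer: hands out backend servers in a fixed rotation."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)
```

Adding a web server is just one more entry in the list. The database behind those servers is still the "just one of something" that the rotation does nothing to relieve.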
I keep hearing from associates about grid computing which, from what I can gather, is highly distributed stuff along the lines of SETI@home.
Is anyone working on these sorts of systems for business use? My interest is in figuring out if there's a commercial reason for starting software development in this field.
Rendering Farms such as Pixar
Model Evaluation e.g. weather, financials, military
Architectural Engineering e.g. earthquakes.
To list a few.
Grid computing is really only needed if you have a lot of WORK that needs to be done, like folding proteins, otherwise a simple server farm will likely be plenty.
Obviously Google are major users of Grid Computing; all their search service relies on it, and many others.
Engines such as BigTable are based on using lots of nodes for storage and computation. These are commercially very useful because they're a good alternative to a small number of big servers, providing better redundancy and cost effective scaling.
The downside is that the software is fiendishly difficult to write, but Google seem to manage that one ok :)
So anything which requires big storage and/or lots of computation.
I used to work for these guys. Grid computing is used all over. Anyone who makes computer chips uses them to test designs before getting physical silicon cut. Financial websites use grids to calculate if you qualify for that loan. These days they are starting to replace big iron in a lot of places, as they tend to be cheaper to maintain over the long term.