Optimization algorithms optimizing an existing system connections - optimization

i am currently working on an existing infrastructure where i have about a 1000 customer sites connected to about 5 different hubs. A customer site may connect to one or two hubs to ensure reliability but each customer site is connected to at least one hub. I want to ensure if the current system is the best or can be optimised to have better connection from customer sites to hubs, to help improve connectivity and reliability. Can you suggest good Optimisation Algorithms to look into?. Thank you

Sounds like you're doing some variation of the Facility Problem.
This is a well-known problem, and while there are algorithms that can solve for the global optimum (Djiskra's Algorithm, or other variants of Dynamic Programming), they do not scale well (i.e. you run into the curse of dimensionality). You could try this, but 1000 sounds already pretty big (depends on your exact problem formulation though).
I'd recommend taking a look at this coursera mooc Discrete Optimization. You don't have to take the whole course, but in the "Assignments" section of the video lectures, he also explains a variant of the Facility problem, some possible approaches to think about, and once you've decided which one you want to use, you can look deeper into that particular approach.

Related

Concurrent page request comparisons

I have been hoping to find out what different server setups equate to in theory for concurrent page requests, and the answer always seems to be soaked in voodoo and sorcery. What is the approximation of max concurrent page requests for the following setups?
apache+php+mysql(1 server)
apache+php+mysql+caching(like memcached or similiar (still one server))
apache+php+mysql+caching+dedicated Database Server (2 servers)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/single dbserver)
apache+php+mysql+caching+dedicatedDB+loadbalancing(multi webserver/multi dbserver)
+distributed (amazon cloud elastic) -- I know this one is "as much as you can afford" but it would be nice to know when to move to it.
I appreciate any constructive criticism, I am just trying to figure out when its time to move from one implementation to the next, because they each come with their own implementation feat either programming wise or setup wise.
In your question you talk about caching and this is probably one of the most important factors in a web architecture r.e performance and capacity.
Memcache is useful, but actually, before that, you should be ensuring proper HTTP cache directives on your server responses. This does 2 things; it reduces the number of requests and speeds up server response times (if you have Apache configured correctly). This can also be improved by using an HTTP accelerator like Varnish and a CDN.
Another factor to consider is whether your system is stateless. By stateless, it usually means that it doesn't store sessions on the server and reference them with every request. A good systems architecture relies on state as little as possible. The less state the more horizontally scalable a system. Most people introduce state when confronted with issues of personalisation - i.e serving up different content for different users. In such cases you should first investigate using the HTML5 session storage (i.e store the complete user data in javascript on the client, obviously over https) or if the data set is smaller, secure javascript cookies. That way you can still serve up cached resources and then personalise with javascript on the client.
Finally, your stack includes a database tier, another potential bottleneck for performance and capacity. If you are only reading data from the system then again it should be quite easy to horizontally scale. If there are reads and writes, its typically better to separate the read write datasets into a separate database and have the read only in another. You can then use more relevant methods to scale.
These setups do not spit out a single answer that you can then compare to each other. The answer will vary on way more factors than you have listed.
Even if they did spit out a single answer, then it is just one metric out of dozens. What makes this the most important metric?
Even worse, each of these alternatives is not free. There is engineering effort and maintenance overhead in each of these. Which could not be analysed without understanding your organisation, your app and your cost/revenue structures.
Options like AWS not only involve development effort but may "lock you in" to a solution so you also need to be aware of that.
I know this response is not complete, but I am pointing out that this question touches on a large complicated area that cannot be reduced to a single metric.
I suspect you are approaching this from exactly the wrong end. Do not go looking for technologies and then figure out how to use them. Instead profile your app (measure, measure, measure), figure out the actual problem you are having, and then solve that problem and that problem only.
If you understand the problem and you understand the technology options then you should have an answer.
If you have already done this and the problem is concurrent page requests then I apologise in advance, but I suspect not.

What is a good access time to a database (SQL)?

Hey.. i wanna know which time is a good accesstime, because i'm searching for a good sql database and hsqldb says their accesstime is 12ms... <-- good?
I think it would depend on your needs. Is it for a web server or a desktop application? The amount of data is also important, because reading lots of small records will perform differently than reading a few large records. Access time is also based upon your hardware, software and maybe even some other factors.
For example, you can use a database with lightning-fast access, but if your users need to connect to it over a 5 megabit VPN connection, passing through three different proxies and with trafic world-wide, your database would then just be a waste of power.
Basically, it's a marketing thing that they're claiming. It's a good product but don't just focus on access time. Make sure you also look at your other needs. Another system might just perform better, even if it has a slower acess time, because it is more optimized in reading it's indices and stuff.
So, what do you want, exactly?
I don't think access time tells you anything, really. If you have slow or incorrectly configured storage, then this access time metric will be dwarfed by how much time is spent on waits and split I/Os. Network latency is also a factor, since I'm guessing you probably won't want to have your code on the same machine as your database, and you will most likely have a few network devices you'll need to traverse in your production environment.
In my experience, all the database platforms these days will all perform adequately if configured correctly and paired with a complementary application. Pick the DBMS that best fits your requirements, follow the best practices for configuration of the DBMS on your hardware, and you should be please with the outcome.

Ideas for a distributed processing project?

I am looking for a project idea in distributed processing on Unix based systems. I wish to use only the C programming language. I have to finish the project in 4 months and it's a part of my course work. Can someone help me with an idea?
Cryptography problems
Distributed Ray Tracer
Chess AI (really, AI for any game)
Large Prime Number Search
Web crawler or other search mechanism
Generic Problem Solver (push out problem definition on the fly, followed by problem data).
Note on the last one:
An example would be if you have a gaming website with lots of board games that you were coming out with all the time. You don't want to have to install new clients on all your servers every time you write a new AI for a board game, so you have a program which you can send new AIs to and then after that you can just send the game data and the pushed AI will be used to solve the problem. This is best used for problems which can be broken into smaller chunks.
It is hard to answer without knowing anything about performance, the scale of the project, what you are trying to accomplish, etc. For example, is it one task or multiple tasks? Is the project just totally open?
4 months is pretty short, but maybe some kind of physics problem or math problem. Sorting or some kind of database work might be dull but beneficial.
Check out mapreduce for ideas! I was really motivated by this work, personally.
We used distributed processing here at work, but it's such a broad field..
Yeah.
Why not write a distributed compiler. You may then present an interface for people to compile things on the fly, and it will be passed to your distribute compilenet. Java is probably well-suited, and you'll get to do fun things, like be very mindful of security and so on.
The BOINC project is always looking for help and is very interesting:
http://boinc.berkeley.edu/
If you want to leave your mark and change the way we search the web,
look into B-Trees.
B-Trees and offspring/variants are the working horse of the internet.
Google uses them extensively to index the web.
Database indexes/indices are B-Tree offspring/variants.
Every LAMP system uses a database and indexes/indices.
Also, they are used extensively in distributed VLDB (Very Large DataBases)
Perhaps you can improve existing distributed databases (Cassandra and HBase)
These are lofty goals, but for me, this would leave a lasting mark
in the way Web data is processed, indexed and stored.
Write a distributed, fault tolerant, redundant network B+Tree or B*Tree.
Read Drozdek's book Data Structures and Algorithms in C++.
It's a good survey of B-Trees.
Read about skip trees
http://www.cs.huji.ac.il/~ittaia/papers/AAY-OPODIS05.pdf
Read about Efficient B-tree Based Indexing for Cloud Data Processing
http://www.comp.nus.edu.sg/~ooibc/vldb10-cgindex.pdf
Google search "Network B+Tree"
https://www.google.com/search?rlz=1C1CHKZ_enUS431US431&sourceid=chrome&ie=UTF-8&q=Network+B%2BTree

Given this expectations, what language or system would you choose to implement the solution? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Here are the estimates the system should handle:
3000+ end users
150+ offices around the world
1500+ concurrent users at peak times
10.000+ daily updates
4-5 commits per second
50-70 transactions per second (reads/searches/updates)
This will be internal only business application, dedicated to help shipping company with worldwide shipment management.
What would be your technology choice, why that choice and roughly how long would it take to implement it? Thanks.
Note: I'm not recruiting. :-)
So, you asked how I would tackle such a project. In the Smalltalk world, people seem to agree that Gemstone makes things scale somewhat magically.
So, what I'd really do is this: I'd start developing in a simple Squeak image, using SandstoneDB. Then, this moment would come where a single image begins being too slow.
GemStone then takes care of copying your public objects (those visible from a certain root) back and forth between all instances. You get sessions and enhanced query functionalities, plus quite a fast VM.
It shares data with C, Java and Ruby.
In fact, they have their own VM for ruby, which is also worth a look.
wikipedia manages much more demanding requirements with MySQL
Your volumes are significant but not likely to strain any credible RDBMS if programmed efficiently. If your team is sloppy (i.e., casually putting SQL queries directly into components which are then composed into larger components), you face the likelihood of a "multiplier" effect where one logical requirement (get the data necessary for this page) turns into a high number of physical database queries.
So, rather than focussing on the capacity of your RDBMS, you should focus on the capacity of your programmers and the degree to which your implementation language and environment facilitate profiling and refactoring.
The scenario you propose is clearly a 24x7x365 one, too, so you should also consider the need for monitoring / dashboard requirements.
There's no way to estimate development effort based on the needs you've presented; it's great that you've analyzed your transactions to this level of granularity, but the main determinant of development effort will be the domain and UI requirements.
Choose the technology your developers know and are familiar with. All major technologies out there will handle such requirements with ease.
Your daily update numbers vs commits do not add up. Four commits per second = 14,400 per hour.
You did not mention anything about expected database size.
In any case, I would concentrate my efforts on choosing a robust back end like Oracle, Sybase, MS etc. This choice will make the most difference in performance. The front end could either be a desktop app or WEB app depending on needs. Since this will be used in many offices around the world, a WEB app might make the most sense.
I'd go with MySQL or PostgreSQL. Not likely to have problems with either one for your requirements.
I love object-databases. In terms of commits-per-second and database-roundtrip, no relational database can hold up. Check out db4o. It's dead easy to learn, check out the examples!
As for the programming language and UI framework: Well, take what your team is good at. Dynamic languages with fewer meta-time wasting will probably save time.
There is not enough information provided here to give a proper recommendation. A little more due diligence is in order.
What is the IT culture like? Do they prefer lots of little servers or fewer bigger servers or big iron? What is their position on virtualization?
What is the corporate culture like? What is the political climate like? The open source offerings may very well handle the load but you may need to go with a proprietary vendor just because they are already used to navigating the political winds of a large company. Perception is important.
What is the maturity level of the organization? Do they already have an Enterprise Architecture team in place? Do they even know what EA is?
You've described the operational side but what about the analytical side? What OLAP technology are they expecting to use or already have in place?
Speaking of integration, what other systems will you need to integrate with?

Commercial uses for grid computing?

I keep hearing from associates about grid computing which, from what I can gather, is highly distributed stuff along the lines of SETI#Home.
Is anyone working on these sort of systems for business use? My interest is in figuring out if there's a commercial reason for starting software development in this field.
Rendering Farms such as Pixar
Model Evaluation e.g. weather, financials, military
Architectural Engineering e.g. earthquakes.
To list a few.
Grid computing is really only needed if you have a lot of WORK that needs to be done, like folding proteins, otherwise a simple server farm will likely be plenty.
Obviously Google are major users of Grid Computing; all their search service relies on it, and many others.
Engines such as BigTable are based on using lots of nodes for storage and computation. These are commercially very useful because they're a good alternative to a small number of big servers, providing better redundancy and cost effective scaling.
The downside is that the software is fiendishly difficult to write, but Google seem to manage that one ok :)
So anything which requires big storage and/or lots of computation.
I used to work for these guys. Grid computing is used all over. Anyone who makes computer chips uses them to test designs before getting physical silicon cut. Financial websites use grids to calculate if you qualify for that loan. These days they are starting to replace big iron in a lot of places, as they tend to be cheaper to maintain over the long term.