What database do online games like Farmville use? [closed]

Apart from graphical features, online games should have a fairly simple relational database structure. I am curious: what database do online games like Farmville and Mafia Wars use?
Is it practical to use SQL-based databases for such programs, with such frequent writes?
If not, how could one store the relational dependence of users in these games?
EDIT: As pointed out, they use NoSQL databases like Couchbase. NoSQL is fast with good concurrency (which is really needed here), but the storage size is much larger (due to the key/value structure).
1. Doesn't it slow down the system (as we need to read large database files from disk)?
2. We will be very limited, as we do not have SQL's JOIN to connect different sets of data.

These databases scale to about 500,000 operations per second, and they're massively distributed. Zynga still uses SQL for logs, but for game data, they presently use code that is substantially the same as Couchbase.
“Zynga’s objective was simple: we needed a database that could keep up with the challenging demands of our games while minimizing our average, fully-loaded cost per database operation – including capital equipment, management costs and developer productivity. We evaluated many NoSQL database technologies but all fell short of our stringent requirements. Our membase development efforts dovetailed with work being done at NorthScale and NHN and we’re delighted to contribute our code to the open source community and to sponsor continuing efforts to maintain and enhance the software.” - Cadir Lee, Chief Technology Officer, Zynga
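To make the key/value model concrete: Membase speaks the memcached protocol, so a game-state write can look like the sketch below. This uses the python-memcached client; the host, key layout, and farm fields are invented for illustration, not Zynga's actual schema.

```python
import json
import memcache  # python-memcached; Membase/Couchbase speak the same protocol

# Host/port and key layout are illustrative assumptions.
mc = memcache.Client(["127.0.0.1:11211"])

def save_farm(user_id, farm_state):
    # The whole farm is one JSON document under a single key, so a game
    # tick is one cheap set() instead of many relational row updates.
    mc.set("user:%d:farm" % user_id, json.dumps(farm_state))

def load_farm(user_id):
    raw = mc.get("user:%d:farm" % user_id)
    return json.loads(raw) if raw is not None else None

save_farm(1001, {"coins": 250, "plots": [{"crop": "corn", "ready": False}]})
print(load_farm(1001))
```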

To answer your edit:
You can decrease storage size by using a non-key => value store like MongoDB. It still has some overhead, but less than trying to maintain a pure key => value store.
It does not slow down the system terribly, since quite a few NoSQL products are memory mapped, which means that unlike SQL they don't go directly to disk but instead to an fsync queue that writes to disk when it is convenient to. Those NoSQL solutions which are not memory mapped tend to have extremely fast read/write speeds anyway, and it is normally a case of a trade-off between the two.
As for JOINs, it is a case of arranging your schema in such a manner that you can avoid huge joins. Small joins, to join say a user with his score record, are fine, but aggregated joins will be a problem, and you will need to find other ways around this (one common way, embedding, is sketched below). There are numerous solutions provided by the user groups of various NoSQL products.
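For instance, here is a minimal pymongo sketch of the embedding approach; the collection and field names are invented for illustration.

```python
from pymongo import MongoClient  # assumes a MongoDB server on localhost

db = MongoClient().game

# Embed the score record inside the user document: the small
# "user joined with his score" lookup becomes a single read, no JOIN.
db.users.insert_one({
    "name": "alice",
    "level": 12,
    "score": {"points": 8450, "wins": 37},
})

# One round trip fetches the user and the score together.
user = db.users.find_one({"name": "alice"})
print(user["score"]["points"])
```

The trade-off is denormalization: data you would otherwise normalize gets duplicated, which is where the extra storage mentioned in the question comes from.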

The database they use has been reported to be Membase. It's open source, and one of the many NoSQL databases.
In January 2012, Membase became Couchbase, and you can download it here if you want to give it a try.

Related

SQL or filesystem for FAST storing files/BLOBs? [closed]

I have an app that stores quite a lot of publications as files on the filesystem, using nested dirs like "6/0/3/6/....". The files are not huge (.jpg, .pdf, similar documents); there are "just" a lot of them, running into hundreds of GB. Once stored in the fs, they typically are never rewritten, just served over HTTP.
Searching and versioning through such files is painfully slow. Copying such dirs is also rather cumbersome.
This got me thinking: would it not be better to store such data as BLOBs in the db? (My app is using Postgres for various purposes anyway.)
Which one -- fs or a scalable SQL db -- could perform better all around? Or would PG collapse under so much weight?
Incremental backup is much easier with the filesystem. So is recovering from partial damage. Versioning is pretty easy to do on top of the file system so long as you don't need atomic change sets, only individual file versioning.
On the other hand, using the DB gets you transactional behaviour: atomic commit, multiple concurrent consistent snapshots, etc. It costs you disk storage efficiency and adds overhead to access. It means you can't just sendfile() the data directly from the file system; you must do several memory copies and some encoding just to get and send the file.
For a high performance server the file system is almost certainly going to win unless you really need atomic commit and the simultaneous consistent visibility of multiple versions.
There are lots of related past questions you should probably read too, most concerning whether it's better to store images in the DB or on the file system.
The file system has other big disadvantages:
You can get problems with user rights
Not atomic
Slow
When dealing with BLOBs < 1 GB, I would 100% store them in the database, since all good database systems can handle BLOBs properly. (They store them in a different manner than structured data, but that is not visible to you.)
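A minimal sketch of that approach with psycopg2 and a Postgres bytea column; the table, connection string, and file name are assumptions for illustration.

```python
import psycopg2  # assumes an existing "docs" table: (name text, data bytea)

conn = psycopg2.connect("dbname=pubs")  # connection string is a placeholder
cur = conn.cursor()

# The write is transactional, unlike a bare filesystem copy.
with open("report.pdf", "rb") as f:
    cur.execute("INSERT INTO docs (name, data) VALUES (%s, %s)",
                ("report.pdf", psycopg2.Binary(f.read())))
conn.commit()

# Reading it back costs extra memory copies compared to sendfile().
cur.execute("SELECT data FROM docs WHERE name = %s", ("report.pdf",))
blob = cur.fetchone()[0]
```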
By the way, note what it says on http://www.postgresql.org/about/ :
Maximum Database Size => Unlimited

Effectively storing and retrieving huge amount of aggregated data for reports [closed]

The problem is the following. We gather some data in real time, let's say 100 entries per second. We want real-time reports. The reports should present data by hour. All we want to do is create some sums of the incoming data and have some smart indexing, so that we can easily serve queries like "give me value2 for featureA = x and featureB = y, for 2012-01-01 09:00 - 10:00".
To avoid too many I/O operations, we aggregate the data in memory (which means we sum them up), then flush it to the database; a sketch of this pattern follows below. Let us say this happens every 10 seconds or so, which is an acceptable latency for our real-time reports.
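A self-contained sketch of that buffer-and-flush mechanism; sqlite3 is used only so the example runs anywhere, and the table and feature names are invented.

```python
import sqlite3
import time
from collections import defaultdict

# 10-second in-memory aggregation buffer keyed by (hour, featureA, featureB).
buffer = defaultdict(lambda: [0, 0])  # -> [value1, value2]

def record(feature_a, feature_b, v1, v2, ts=None):
    hour = int((ts or time.time()) // 3600) * 3600  # truncate to hour precision
    cell = buffer[(feature_a, feature_b, hour)]
    cell[0] += v1
    cell[1] += v2

def flush(conn):
    # One batched write per interval instead of one insert per raw entry.
    rows = [(h, a, b, v[0], v[1]) for (a, b, h), v in buffer.items()]
    conn.executemany(
        "INSERT INTO agg_ab (time, feature_a, feature_b, value1, value2) "
        "VALUES (?, ?, ?, ?, ?)", rows)  # a real system might UPSERT instead
    conn.commit()
    buffer.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agg_ab (time INT, feature_a INT, feature_b INT, "
             "value1 INT, value2 INT)")
record(feature_a=1, feature_b=7, v1=3, v2=1)
record(feature_a=1, feature_b=7, v1=2, v2=5)
flush(conn)  # in production, called from a timer every ~10 seconds
```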
So basically, in SQL terms, we end up with 20 (or more) tables like this (ok, we could have slightly fewer of them by combining sums, but it does not make a lot of difference):
Time, FeatureA, FeatureB, FeatureC, value1, value2, value3
Time, FeatureA, FeatureD, value4, value5
Time, FeatureC, FeatureE, value6, value7
etc.
(I do not say the solution has to be SQL, I only present this to explain the issue at hand.) The Time column is timestamp (with hour precision), Feature columns are some ids of system entities, and values are integer values (counts).
So now the problem arises. Because of the very nature of the data, even when we aggregate it, there are still (too) many inserts into these aggregating tables. This is because some of the data is sparse, which means that for every 100 entries we have, say, 50 inserts into some of the aggregating tables. I understand that we could go forward by upgrading the hardware, but what I feel is that we could do better with a smarter storage mechanism. For example, we could use an SQL database, but we do not need most of its features (transactions, joins, etc.).
So given this scenario my question is the following. How do you guys deal with real-time reporting of high volume traffic? Google somehow does this for web analytics, so it is possible after all. Any secret weapon here? We are open to any solutions - be it Hadoop & Co, NoSQL, clustering or whatever else.
Aside from splitting the storage requirements for collection and reporting/analysis, one of the things we used to do is look at how often significant changes to a value occurred, and how the data would be used.
No idea what your data looks like, but reporting and analysis is usually looking for significant patterns: in-tolerance to out-of-tolerance and vice versa, and particularly oscillation.
Now while it might be laudable to collect an "infinite" amount of data just in case you want to analyse it, when you bump into the finite limits of implementation, choices have to be made.
I did this sort of thing in a manufacturing environment. We had two levels of analysis: one for control, where the granularity was as high as we could afford; then, as the data got further into the past, we summarised it for reporting.
I ran into the issues you appear to be having more than a few times, and while the loss of data was bemoaned, the complaints about how much it would cost were much louder.
So I wouldn't look at this issue from simply a technical point of view, but from a practical business one. Start from how much the business believes it can afford, and see how much you can give them for it.

NoSQL DataBases [closed]

For the last several years I have noticed that interest in NoSQL DBs is increasing.
A lot of new such DBs have been released:
MongoDB
CouchDB
Memcachedb
memcached
Velocity
Cassandra
Tokyo Cabinet
etc..
What do you think: is it targeted to replace the relational model, and in general, how do you see the future of NoSQL?
Why NoSQL (MongoDB)?
Scalable and flexible datastore: this is the primary reason for moving away from a relational database.
Schemaless: represent complex hierarchical relationships with a single record.
Scaling out: partitioning data across more machines.
Amazingly fast: MongoDB uses a binary wire protocol as the primary mode of interaction with the server.
Features:
Indexing with ease
Stored JavaScript
Simple administration (automatic failover if the master goes down in a master-slave architecture)
MongoDB is powerful and attempts to keep many features from relational systems, but it is not intended to do everything that a relational database does. Whenever possible, the database server offloads processing and logic to the client side.
NoSQL systems such as MongoDB are designed for incredibly data-intensive applications - Facebook, for example, created a NoSQL solution called Cassandra to handle the vast amounts of data they had. NoSQL is useful for those who are building highly scalable applications, and it helps to reduce the need for empty table columns by not enforcing a database schema: for instance, if you had a table in which you stored information about your friends, you wouldn't have to include the reading interests of one friend when you only knew the reading interests of another.
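A tiny pymongo sketch of that point, with invented collection and field names:

```python
from pymongo import MongoClient  # assumes a MongoDB server on localhost

friends = MongoClient().social.friends

# No enforced schema: each document carries only the fields you know.
# In SQL, the first row would need a NULL reading_interests column.
friends.insert_one({"name": "bob", "city": "Boston"})
friends.insert_one({"name": "carol", "city": "Austin",
                    "reading_interests": ["sci-fi", "history"]})
```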
Relational databases do have their place, however NoSQL isn't really meant as a replacement, just as a different way of approaching the idea of data storage on a large scale. I would say that in the future more and more companies will begin using NoSQL solutions, but at the moment most people with small websites simply don't have need of a system designed to deal with such quantities of information.
Hope that helps!
I think NoSQL is meant to replace SQL. The title alone sort of alludes to it. NoSQL means approaching the problem differently. Any system making use of both NoSQL and SQL is not fully embracing what it means to be a key-value store.
That is not to say this hybrid approach is not ideal right now (it is, since many of the NoSQL technologies do not yet have the advanced features that SQL databases have had for decades, features that currently solve the problem better than NoSQL does).
As NoSQL technologies mature (once data consistency is assured), companies will feel more comfortable completely removing SQL from their technology stack; and the commercial licensing of open source databases (MySQL being bought by Oracle) will set the pace for the speed of this migration.
There are systems like playORM that can now do joins within partitions, so it is more and more likely that NoSQL can replace SQL systems in the future. In fact, playORM uses S-SQL (Scalable SQL), and you write nearly the same SQL as before, except that you specify the partitions you are querying. As a start, you can move a relational model to NoSQL using playORM with no partitioning and it will be just as fast, and then you can add partitioning to your relational model so that it scales.

Given these expectations, what language or system would you choose to implement the solution? [closed]

Here are the estimates the system should handle:
3000+ end users
150+ offices around the world
1500+ concurrent users at peak times
10,000+ daily updates
4-5 commits per second
50-70 transactions per second (reads/searches/updates)
This will be an internal-only business application, dedicated to helping a shipping company with worldwide shipment management.
What would be your technology choice, why that choice and roughly how long would it take to implement it? Thanks.
Note: I'm not recruiting. :-)
So, you asked how I would tackle such a project. In the Smalltalk world, people seem to agree that Gemstone makes things scale somewhat magically.
So, what I'd really do is this: I'd start developing in a simple Squeak image, using SandstoneDB. Then, the moment would come when a single image begins to be too slow; at that point I'd move to GemStone.
GemStone then takes care of copying your public objects (those visible from a certain root) back and forth between all instances. You get sessions and enhanced query functionalities, plus quite a fast VM.
It shares data with C, Java and Ruby.
In fact, they have their own VM for Ruby, which is also worth a look.
Wikipedia manages much more demanding requirements with MySQL.
Your volumes are significant but not likely to strain any credible RDBMS if programmed efficiently. If your team is sloppy (i.e., casually putting SQL queries directly into components which are then composed into larger components), you face the likelihood of a "multiplier" effect where one logical requirement (get the data necessary for this page) turns into a high number of physical database queries.
So, rather than focussing on the capacity of your RDBMS, you should focus on the capacity of your programmers and the degree to which your implementation language and environment facilitate profiling and refactoring.
The scenario you propose is clearly a 24x7x365 one, too, so you should also consider the need for monitoring / dashboard requirements.
There's no way to estimate development effort based on the needs you've presented; it's great that you've analyzed your transactions to this level of granularity, but the main determinant of development effort will be the domain and UI requirements.
Choose the technology your developers know and are familiar with. All major technologies out there will handle such requirements with ease.
Your daily update numbers vs. commits do not add up: four commits per second = 14,400 per hour, which already exceeds 10,000+ per day many times over.
You did not mention anything about expected database size.
In any case, I would concentrate my efforts on choosing a robust back end like Oracle, Sybase, MS, etc. This choice will make the most difference in performance. The front end could be either a desktop app or a web app depending on needs. Since this will be used in many offices around the world, a web app might make the most sense.
I'd go with MySQL or PostgreSQL. Not likely to have problems with either one for your requirements.
I love object databases. In terms of commits per second and database round-trips, no relational database can hold up. Check out db4o. It's dead easy to learn; check out the examples!
As for the programming language and UI framework: well, take what your team is good at. Dynamic languages, with less time wasted on meta-work, will probably save time.
There is not enough information provided here to give a proper recommendation. A little more due diligence is in order.
What is the IT culture like? Do they prefer lots of little servers or fewer bigger servers or big iron? What is their position on virtualization?
What is the corporate culture like? What is the political climate like? The open source offerings may very well handle the load but you may need to go with a proprietary vendor just because they are already used to navigating the political winds of a large company. Perception is important.
What is the maturity level of the organization? Do they already have an Enterprise Architecture team in place? Do they even know what EA is?
You've described the operational side but what about the analytical side? What OLAP technology are they expecting to use or already have in place?
Speaking of integration, what other systems will you need to integrate with?

Is ORM slow? Does it matter? [closed]

I really like ORM compared to stored procedures, but one thing I am afraid of is that ORM could be slow because of its layers and layers of abstraction. Will using ORM slow down my application? Or does it matter?
Yes, it matters. It is using more CPU cycles and consequently slowing your application down. Hear me out though...
But, consider this: what is more expensive? Server hardware or another programmer? Server hardware, generally, is cheaper than hiring another team of programmers. So, while ORM may be costing you CPU cycles, you need one less programmer to manage your SQL queries, often resulting in a lower net cost.
To determine if it's worth it for you, calculate or determine how many hours you saved by using an ORM. Then, figure out how much money you spent on the server to support ORM. Multiply the hours you saved by your hourly rate and compare to the server cost.
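A worked toy calculation of that comparison (all numbers invented purely for illustration):

```python
# All numbers are invented purely for illustration.
hours_saved = 120            # developer hours saved by using the ORM
hourly_rate = 75.0           # fully-loaded cost per developer hour ($)
extra_server_cost = 4000.0   # hardware added to absorb ORM overhead ($)

savings = hours_saved * hourly_rate     # 120 * 75 = $9,000
print(savings - extra_server_cost)      # $5,000 net in the ORM's favor
```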
Of course, whether an ORM actually saves you time is a whole other debate...
Is ORM slow?
Not inherently. Some heavyweight ORMs can add a general drag to things but we're not talking orders of magnitude slowdown.
What does make ORM slow is naïve usage. If you're using an ORM because it looks easy and you don't know how the underlying relational data model works, you can easily write code that seems reasonable to an OO programmer, but will murder performance.
ORM is a handy tool, but you need the lower-level understanding (that usually comes from writing SQL queries) to go with it.
Does it matter?
If you end up performing a looped query for each of thousands of entities at once, instead of a single fast join, then certainly it can.
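Here is a self-contained sketch of exactly that trap, using sqlite3 so it runs anywhere; the schema is invented. The loop issues one query per user, which a single join avoids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scores (user_id INT, points INT);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO scores VALUES (1, 10), (2, 20);
""")

# The N+1 anti-pattern a naively used ORM generates: one query per entity.
for uid, name in conn.execute("SELECT id, name FROM users").fetchall():
    points = conn.execute("SELECT points FROM scores WHERE user_id = ?",
                          (uid,)).fetchone()[0]

# The single fast join: one round trip no matter how many users.
rows = conn.execute("SELECT u.name, s.points FROM users u "
                    "JOIN scores s ON s.user_id = u.id").fetchall()
```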
ORMs are slower and do add overhead to applications (unless you specifically know how to get around these problems, which is not very common). The database is the most critical element, and web applications should be designed around it.
In many OOP frameworks using Active Record or ORMs, developers in general treat the database as an unimportant afterthought and tend to look at it as something they don't really need to learn. But performance and scalability usually suffer as the db is heavily taxed!
Many large-scale web apps have fallen flat, wasting millions and months to years of time, because they didn't recognize the importance of the database. Hundreds of concurrent users and tables with millions of records require database tuning and optimization. But I believe the problem is noticeable even with a few users and less data.
Why are developers so afraid to learn proper SQL and tuning measures when it's the key to performance?
In a Windows Mobile 5 project against SqlCe, I went from using hand-coded objects to code-generated (CodeSmith) objects built from an ORM template. In the process, all my data access used CSLA as a base layer.
The straight conversion improved my performance by 32% in local testing, almost all of it a result of better access methods.
After that change, we adjusted the templates (after seeing some SqlCe performance material at PDC by Steve Lasker), and in less than 20 minutes our entire data layer was greatly improved: our average 'slow' calls went from 460 ms to ~20 ms. The cool part about the ORM approach is that we only had to implement (and unit test) these changes once and all the data access code got changed. It was an amazing time saver; we probably saved 40 hours or more.
The above being said, we did lose some time by taking out a bunch of 'waiting' and 'progress' dialogs that were no longer needed.
I have used a few of the ORM tools, and I can recommend two of them:
.NET Tiers
CSLA codegen templates
Both of them have performed quite nicely and any performance loss has not been noticeable.
I've always found it doesn't matter. You should use whatever will make you the most productive, responsive to changes, and whatever is easiest to debug and maintain.
Most applications never see enough load for the difference between ORM and SPs to be noticeable. And there are optimizations to make ORM faster.
Finally, a well-written app will have its data access separated from everything else, so that in the future switching from ORM to whatever else would be possible.
Is ORM slow?
Yes (compared with stored procedures)
Does it matter?
No (unless your concern is speed)
I think the problem is that many people think of ORM as an object "trick" for databases, to code less or to simplify SQL usage, while in reality it is... well, an Object-To-Relational (DB) Mapping.
ORM is used to persist your objects to a relational database management system, and not (just) to substitute for SQL or make it easier (although it does a good job of that too).
If you don't have a good object model, or you're using it to make reports, or even if you're just trying to get some information, ORM is not worth it.
If on the other hand you have a complex system modeled through objects, where each one has different rules and they interact dynamically, and your concern is to persist that information into the database rather than to substitute some existing SQL scripts, then go for ORM.
Yes, ORM will slow down your application. By how much depends on how far the abstraction goes, how well your object model maps to the database, and other factors. The question should be: are you willing to spend more developer time on straight data access, or to trade less dev time for slower runtime performance?
Overall, the good ORMs have little overhead and, by and large, are considered well worth the trade off.
Yes, ORMs affect performance, whether that matters ultimately depends on the specifics of your project.
Programmers often love ORM because they like nice front-end coding environments like Visual Studio and dislike coding raw SQL with no IntelliSense, etc.
ORMs have other limitations besides the performance hit: they often do not do what you need 100% of the time, they add the complexity of an additional abstraction layer that must be maintained and re-established every time changes are made, and there are also caching issues to be dealt with.
Just a thought -- if the database vendors would make the SQL programming environment as nice as Visual Studio, and provide a more natural linkage between the db code and front-end code, we wouldn't need the ORMs...I guess things may go in that direction eventually.
Obvious answer: It depends
ORM does a good job of insulating a programmer from SQL. This in effect substitutes mediocre, computer generated queries for the catastrophically bad queries a programmer might give.
Even in the best case, an ORM is going to do some extra work: loading fields it doesn't need, explicitly checking constraints, and so forth.
When these become a bottleneck, most ORMs let you side-step them and inject raw SQL.
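For example, a minimal SQLAlchemy sketch of that escape hatch; the table and query are invented, and SQLAlchemy stands in for whichever ORM you use.

```python
from sqlalchemy import create_engine, text  # any ORM with a raw-SQL escape hatch

engine = create_engine("sqlite:///:memory:")  # placeholder database

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE posts (id INTEGER, author_id INTEGER)"))
    conn.execute(text("INSERT INTO posts VALUES (1, 42), (2, 42)"))
    # When the generated query is the bottleneck, hand-write just that one.
    rows = conn.execute(text(
        "SELECT author_id, COUNT(*) AS n FROM posts GROUP BY author_id"
    )).fetchall()
```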
If your application fits well with objects, but not quite so easily with relations, then this can still be a win. If instead your app fits nicely around a relational model, then the ORM represents a coding bottleneck on top of a possible performance bottleneck.
One thing I've found particularly offensive about most ORMs is their handling of primary keys. Most ORMs require PKs for everything they touch, even if there is no conceivable use for them. Example: authors should have PKs, blog posts should have PKs, but the links (join table) between authors and posts should not.
I have found that the difference between "too slow" and "not too much slower" depends on if you have your ORM's 2nd level (SessionFactory) cache enabled. With it off it handles fine under development load, but will crush your system under mild production load. After turning on the 2nd Level cache the server handled the expected load and scaled nicely.
ORM can get an order of magnitude slower, not just on the grounds of wasting a lot of CPU cycles on its own, but also by using much more memory, which then has to be GC'd.
Much worse than that, however, is that there is no standard for ORM (unlike SQL), and that by and large ORMs use SQL very inefficiently, so at the end of the day you still have to dig into SQL to fix performance issues every time the ORM makes a mess, and you have to debug it. Meaning that you haven't gained anything at all.
It's terribly immature technology for real production-level applications. Very problematic things are the handling of indexes and foreign keys, tweaking tables to fit object hierarchies, and terribly long transactions, which mean many more deadlocks and retries - if an ORM knows how to handle that at all.
It actually makes servers less scalable, which multiplies costs, but these costs don't get mentioned at the beginning - a little inconvenient truth :-) When something uses transactions 10-100 times bigger than optimal, it becomes impossible to scale the SQL side at all. We're talking about serious systems again, not home/toy/academic stuff.
An ORM will always add some overhead because of the layers of abstraction, but unless it is a poorly designed ORM, that overhead should be minimal. The time to actually query the database will be many times greater than the additional overhead of the ORM infrastructure, if you are doing it correctly - for example, not loading the full object graph when it is not required. A good ORM (NHibernate) will also give you many options for the queries run against the database, so you can optimise as required.
Using an ORM is generally slower. But the boost in productivity you get will get your application up and running much faster. And the time you save can later be spent finding the portions of your application that are causing the biggest slow down - you can then spend time optimizing the areas where you get the best return on your development effort. Just because you've decided to use an ORM doesn't mean you can't use other techniques in the sections of code that can really benefit from it.
An ORM can be slower, but this is offset by its ability to cache data; however fast the alternative, you can't get much faster than reading from memory.
I never really understood why people think that this is slower or that is slower... get a real machine, I say. I have had mixed results... I've seen cases where execution time for a stored procedure was much slower than ORM, and vice versa, but in both cases the performance difference was due to differences in hardware.