SimpleDB as backup?

Is there any solution (RDBMS or NoSQL) that can use SimpleDB as a backup?
Thanks,
A.

I'm pretty sure no RDBMS can be configured to do this, and I doubt any ever will. First, SimpleDB is a quasi-competitor to commercial databases, so it wouldn't make sense for most vendors to build this. Second, it would have a huge negative effect on transaction performance, since writing transactions to SimpleDB would probably take at least 100 times as long as writing them to disk.
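To see why the per-transaction cost would be so high, consider what a naive row-by-row copy to SimpleDB would look like. This is just a sketch assuming boto3; the domain, item, and column names are all hypothetical. Every row costs a full HTTP round trip:

```python
# Minimal sketch: naively mirroring rows into SimpleDB with boto3.
# Domain and column names here are hypothetical.
import boto3

sdb = boto3.client("sdb", region_name="us-east-1")

def backup_row(row_id: str, row: dict) -> None:
    """Write one row as a SimpleDB item -- one HTTP round trip per row,
    which is why this is far slower than a local disk write."""
    sdb.put_attributes(
        DomainName="orders_backup",  # assumed to already exist
        ItemName=row_id,
        Attributes=[
            {"Name": k, "Value": str(v), "Replace": True}
            for k, v in row.items()
        ],
    )

backup_row("order-42", {"customer": "alice", "total": "19.99"})
```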

Related

Would MySQL with sharding (Vitess) be faster than any NoSQL database?

I have been trying to understand why a NoSQL DB is considered to be faster than an RDBMS. I understand that NoSQL DBs don't follow the ACID properties but instead follow the BASE principle, which is why NoSQL is able to scale horizontally.
What I am trying to understand is whether there is a big difference for similar queries.
For instance, let's say we have a search which finds all the matching users in our DB. We have the same data in MySQL as well as in a NoSQL DB, and let's assume that for NoSQL we have just one shard. Would there still be a difference in the speed of the query, or would it be the same?
With disks getting faster and cheaper, the speed advantage may not be significant enough to make a difference.
However, you can do a lot more with systems like Vitess in terms of transactions, joins and indexes. At the same time, you can continue to scale like NoSQL systems. For this reason, I think Vitess is an overall better trade-off.
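To make that concrete: on a single shard, a "find all matching users" search is an index lookup in either kind of system, so the per-query work is roughly the same. Here's a rough illustration, not a real benchmark; the connection parameters and the `users` table/index are hypothetical:

```python
# Rough illustration (not a benchmark): on a single shard, a "find matching
# users" query is an index lookup in either system. Connection parameters
# and the `users` table are hypothetical; assumes mysql-connector-python.
import time
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="appdb"
)
cur = conn.cursor()

start = time.perf_counter()
# With an index on `last_name`, this is a B-tree lookup -- the same kind of
# work a key-value store does for a key lookup.
cur.execute("SELECT id, email FROM users WHERE last_name = %s", ("Smith",))
rows = cur.fetchall()
print(f"{len(rows)} rows in {time.perf_counter() - start:.4f}s")

cur.close()
conn.close()
```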

Distributed SQL Database

I need to decide which database to use for a system where I need AP from the CAP theorem. Data is constantly but slowly going in. Big queries are expected. It should be reliable, with no single point of failure. I can use up to 3 instances on different nodes. In-memory solutions are bad for me because of data size:
it should run for years, and I expect data sizes of up to a terabyte. Most people in my team prefer SQL, but I understand that traditional SQL databases are not fault tolerant in terms of hardware failure. Any ideas?
Since this question was asked, there have been some significant changes in the Distributed SQL or NewSQL landscape, the most noteworthy being the viability of CockroachDB. That appears to be the best option in a situation like the one referenced in this question.
No single points of failure. Easy to scale. Can handle tons of volume. You can run it wherever you want. Speaks the Postgres wire protocol. Super fault tolerant.
Amazon Redshift seems to be the best answer (thank you kuujo), but we will try RethinkDB because it has some nice features.

Access a SQL database as NoSQL (Couchbase)

I would like to access a SQL database the NoSQL way, as key-value pairs/documents.
This is for a future upgrade: if the user count increases a lot,
I can migrate from SQL to NoSQL immediately while changing nothing in the application code.
Of course I could write the API/solution myself; I just wonder whether anyone has already done the same thing and published a solution.
Your comments are welcome.
While I agree with everything scalabilitysolved has said, there is an interesting feature in the offing for Postgres, scheduled for the 9.4 release, namely jsonb: http://www.postgresql.org/docs/devel/static/datatype-json.html with some interesting indexing and query possibilities. I mention this as you tagged MongoDB and Couchbase, both of which use JSON (well, technically BSON in MongoDB's case).
Of course, querying, sharding, replication, ACID guarantees, etc. will still be totally different between Postgres (or any other traditional RDBMS) and any document-based NoSQL solution, and migrations between any two RDBMSes tend to be quite painful, let alone between an RDBMS and a NoSQL data store. However, jsonb looks quite promising as a potential half-way house between two of the major paradigms of data storage.
On the other hand, every release of MongoDB brings enhancements to the aggregation pipeline, which is certainly something that seems to appeal to people used to the flexibility that SQL offers and is less "alien" than distributed map/reduce jobs. So it seems reasonable to conclude that the cross-pollination will continue.
See Explanation of JSONB introduced by PostgreSQL for some further insights into jsonb.
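To give a flavour of those indexing and query possibilities, here's a minimal sketch assuming Postgres 9.4+ and psycopg2; the table and document fields are made up:

```python
# Minimal sketch of jsonb storage and containment queries (Postgres 9.4+).
# Table name and document fields are hypothetical; assumes psycopg2.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=appdb user=app password=secret")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE docs (
        id   serial PRIMARY KEY,
        body jsonb NOT NULL
    )
""")
# A GIN index makes containment (@>) queries fast.
cur.execute("CREATE INDEX docs_body_gin ON docs USING GIN (body)")

cur.execute("INSERT INTO docs (body) VALUES (%s)",
            [Json({"type": "user", "name": "alice", "tags": ["admin"]})])

# Find documents containing the given key/value pairs, like a document store.
cur.execute("SELECT id, body FROM docs WHERE body @> %s",
            [Json({"type": "user"})])
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```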
No no no, do not consider this; it's a really bad idea. Pick either an RDBMS or a NoSQL solution based upon how your data is modelled and your usage patterns. Converting from one to the other is going to be painful, especially if your 'user amount increases a lot'.
Let's face it, either approach can deal with a large increase in usage, and both would benefit more from specific optimizations to their database than from simply swapping because one 'scales more'.
If your data model fits an RDBMS and it needs to perform better, then analyze your queries, check that your indexes are optimized, and look into caching and better data access patterns.
If your data model fits a NoSQL database, then as your dataset grows you can add additional nodes (Couchbase), cache expensive map/reduce jobs, and again optimize your data access patterns.
In summary, pick either SQL or NoSQL depending on your data needs; don't just assume that NoSQL is a magic bullet, as with easier scaling comes a much less flexible querying model.

How would I implement separate databases for reading and writing operations?

I am interested in implementing an architecture that has two databases, one for read operations and the other for writes. I have never implemented something like this and have always built single-database, highly normalised systems, so I am not quite sure where to begin. I have a few parts to this question.
1. What would be a good resource to find out more about this architecture?
2. Is it just a question of replicating between two identical schemas, or would your schemas differ depending on the operations, and would normalisation vary too?
3. How do you ensure that data written to one database is immediately available for reading from the second?
Any further help, tips, resources would be appreciated. Thanks.
EDIT
After some research I found this article, which I found very informative, for those interested:
http://www.codefutures.com/database-sharding/
I found this highscalability article very informative
I'm not a specialist, but the read/write master database plus read-only slaves pattern is a "common" pattern, especially for big applications doing mostly read accesses, and for data warehouses:
it allows you to scale (you add more read-only slaves if required)
it allows you to tune the databases differently (for either efficient reads or efficient writes)
What would be a good resource to find out more about this architecture?
There are good resources available on the Internet. For example:
Highscalability.com has good examples (e.g. Wikimedia architecture, the master-slave category,...)
Handling Data in Mega Scale Systems (starting from slide 29)
MySQL Scale-Out approach for better performance and scalability as a key factor for Wikipedia’s growth
Chapter 24. High Availability and Load Balancing in PostgreSQL documentation
Chapter 16. Replication in MySQL documentation
http://www.google.com/search?q=read%2Fwrite+master+database+and+read-only+slaves
Is it just a question of replicating between two identical schemas, or would your schemas differ depending on the operations, and would normalisation vary too?
I'm not sure - I'm eager to read answers from experts - but I think the schemas are identical in traditional replication scenarios (the tuning may be different, though). Maybe people are doing more exotic things, but I wonder if they rely on database replication in that case; it sounds more like "real-time ETL".
How do you ensure that data written to one database is immediately available for reading from the second?
I guess you would need synchronous replication for that (which is of course slower than asynchronous). While some databases do support this mode, not all do AFAIK. But have a look at this answer or this one for SQL Server.
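If synchronous replication isn't available (or is too slow for you), a common workaround is to handle it at the application level and send "read-your-own-writes" queries to the master. A minimal sketch, assuming psycopg2 and asynchronous master/slave replication; the DSNs and table are hypothetical:

```python
# Sketch of application-level read/write routing for a master/replica setup.
# DSNs and the `events` table are hypothetical; assumes psycopg2 and
# asynchronous replication (so replicas may lag behind the master).
import psycopg2

primary = psycopg2.connect("host=db-primary dbname=appdb user=app password=secret")
replica = psycopg2.connect("host=db-replica dbname=appdb user=app password=secret")

def write(sql: str, params=()):
    """All writes go to the master; committed on exit of the context manager."""
    with primary, primary.cursor() as cur:
        cur.execute(sql, params)

def read(sql: str, params=(), fresh: bool = False):
    """Reads go to the replica; pass fresh=True when the caller must see
    its own just-committed write (replication lag may hide it otherwise)."""
    conn = primary if fresh else replica
    with conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()

write("INSERT INTO events (name) VALUES (%s)", ("signup",))
print(read("SELECT count(*) FROM events", fresh=True))  # read-your-writes
```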
You might look up data warehouses.
These serve as databases structured for reporting, while you keep a normalized OLTP-style instance for the data maintenance.
I don't think the idea of 'immediate' equivalence will be a reality. There will be some delay while the new data and changes are migrated into the other system. The schedule and scope will be your big decisions here.
In regard to question 2:
It really depends on what you are trying to achieve by having two databases. If it is for performance reasons (which I suspect it may be), I would suggest you look into denormalizing the read-only database as needed for performance. If performance isn't an issue, then I wouldn't mess with the read-only schema.
I've worked on similar systems where there would be a read/write database that was only lightly used by administrative users. That database would then be replicated to the read only database during a nightly process.
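For what that nightly process might look like, here's a sketch assuming psycopg2; the table names and the denormalized shape are hypothetical:

```python
# Sketch of a nightly job that rebuilds a denormalized read-side table
# from the normalized write-side schema. All names are hypothetical.
import psycopg2

oltp = psycopg2.connect("host=db-write dbname=appdb user=app password=secret")
readonly = psycopg2.connect("host=db-read dbname=appdb user=app password=secret")

def refresh_report_table():
    # Pull a pre-joined, reporting-friendly shape from the normalized schema.
    with oltp.cursor() as src:
        src.execute("""
            SELECT o.id, c.name, c.region, o.total, o.created_at
            FROM orders o JOIN customers c ON c.id = o.customer_id
        """)
        rows = src.fetchall()
    # Rebuild the flat table on the read-only side in one transaction.
    with readonly, readonly.cursor() as dst:
        dst.execute("TRUNCATE report_orders")
        dst.executemany(
            "INSERT INTO report_orders VALUES (%s, %s, %s, %s, %s)", rows
        )

refresh_report_table()
```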
Question 3:
How immediate are we talking here? Less than a second? 10 seconds? Minutes?

SimpleDB as Denormalized DB

In an environment where you have a relational database which handles all business transactions, is it a good idea to utilise SimpleDB for all data queries, to get faster and more lightweight search?
The master data storage would be a relational DB which is "replicated"/"transformed" into SimpleDB to provide very fast read-only queries, since no JOINs and complicated subselects are needed.
What you're considering smells of premature optimization ...
Have you benchmarked your application? Have you identified your search queries as a performance bottleneck? Have you correctly implemented indexes in your database?
IF (and that's a big if) there's no way for a relational database to offer decent search times to your users, going NoSQL might be something worth considering... but not before!
SimpleDB is a good technology but its claim to fame is not faster queries than a relational database. Offloading queries to replicated SimpleDB is not likely to significantly improve your query response time.
I am still finding it hard to believe, but our experiments show that a round trip from an EC2 instance to SimpleDB averages out to 300 milliseconds or so, on a good day! On a bad day we've seen it go up to 1.5 sec. This is for a single insert. I'd love to see somebody replicate the experiment to verify these results, but as it is... SimpleDB is no solution for anything but post-processing; in the request/response cycle it would just be way too slow.
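If anyone wants to replicate the experiment, here's a minimal sketch assuming boto3 with configured AWS credentials; the domain name is hypothetical:

```python
# Minimal sketch for timing single-item SimpleDB insert round trips.
# Assumes boto3 credentials are configured; the domain name is hypothetical.
import time
import boto3

sdb = boto3.client("sdb", region_name="us-east-1")
sdb.create_domain(DomainName="latency_test")  # CreateDomain is idempotent

samples = []
for i in range(20):
    start = time.perf_counter()
    sdb.put_attributes(
        DomainName="latency_test",
        ItemName=f"item-{i}",
        Attributes=[{"Name": "n", "Value": str(i), "Replace": True}],
    )
    samples.append(time.perf_counter() - start)

print(f"avg round trip: {sum(samples) / len(samples) * 1000:.0f} ms")
```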
If the data is largely read-only, try using indexed views. Otherwise, cache the data in the application.