Value of In-Memory DB for a use case without hot partition problem [closed] - redis

DynamoDB gives millisecond latency (roughly 6ms to 10ms if the partition scheme is designed properly), while an in-memory DB offers microsecond latency. Essentially, moving to an in-memory DB would remove about 10ms from our overall latency.
If network plus compute latency is 30ms and the DB fetch is 10ms, for a total of 40ms, how much value does an in-memory DB bring if overall latency only drops from 40ms to 30ms for a service that needs to be as low-latency as possible?
From my research, in-memory is best used when there is a large number of read requests against a single partition key, in order to solve hot-partition problems. We will have a large number of read requests, but if we don't have a hot-partition problem and requests are distributed evenly across different partition keys, does an in-memory DB bring much value beyond the 10ms latency savings?
Thanks

As you state, in-memory databases are best when massive numbers of reads are pointed at a single partition or key; DynamoDB limits strongly consistent reads to 3,000 per second on a single partition.
If you can tolerate the small latency increase, a database like DynamoDB would be much more beneficial: being serverless and HTTP-based makes it really easy to develop with.
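If you want to quantify the trade-off for your own workload rather than rely on published numbers, a quick measurement is easy to set up. Below is a minimal sketch using boto3 and redis-py; the table name, key, item, and Redis endpoint are placeholders for illustration, not part of the question's setup.

```python
# Rough latency comparison: DynamoDB GetItem vs. Redis GET.
# Table name, key, and endpoint below are placeholders -- adjust for your setup.
import time
import boto3
import redis

dynamodb = boto3.client("dynamodb", region_name="us-east-1")
cache = redis.Redis(host="my-cache.example.com", port=6379)

def avg_ms(fn, n=100):
    """Return average latency in milliseconds over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n * 1000

ddb_ms = avg_ms(lambda: dynamodb.get_item(
    TableName="Orders",
    Key={"pk": {"S": "order#123"}},
    ConsistentRead=False,   # eventually consistent reads are the cheaper/faster option
))
redis_ms = avg_ms(lambda: cache.get("order#123"))

print(f"DynamoDB avg: {ddb_ms:.2f} ms, Redis avg: {redis_ms:.2f} ms")
```

In practice the absolute numbers matter less than what fraction of your 40ms budget the database call actually represents.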

Related

What is the simplest and fastest way of storing and querying simply-structured data? [closed]

What is the best way of storing and querying data for a simple task-management application (as an example)? The goal is maximum performance with minimum resource consumption (CPU, disk, RAM) on a single EC2 instance.
This also depends on the use case: will the database see many reads or many writes? When you are talking about task management, you have to know how many records you expect, and whether you expect more INSERTs or more SELECTs, etc.
Regarding SQL databases, an interesting benchmark can be found here:
https://www.sqlite.org/speed.html
The benchmark shows that SQLite can be very fast in many cases, but ineffective in others. (Unfortunately the benchmark is not the newest, but it may still be helpful.)
SQLite is also convenient in that it is just a single file on your disk that contains the whole database, and you can access that database using SQL.
A very long and exhaustive benchmark of NoSQL databases can be found here, for example:
http://www.datastax.com/wp-content/themes/datastax-2014-08/files/NoSQL_Benchmarks_EndPoint.pdf
It is also good to know the database engines; e.g. when using MySQL, choose carefully between MyISAM and InnoDB (a nice answer is here: What's the difference between MyISAM and InnoDB?).
If you just want to optimize performance, you can also lean on hardware resources: if you read a lot from the DB and do not have that many writes, you can cache the database in memory (e.g. by sizing innodb_buffer_pool_size); with enough RAM you can effectively serve the whole database from RAM.
So, long story short: if you are choosing an engine for a very simple and small database, SQLite might be the minimalistic approach you want. If you want to build something larger, first be clear about your needs.
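For a tasks app of the scale described, a minimal sketch using Python's built-in sqlite3 module shows how little infrastructure is needed; the table and column names here are only illustrative.

```python
import sqlite3

# Single-file database on local disk; no separate server process required.
conn = sqlite3.connect("tasks.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS tasks (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        done  INTEGER NOT NULL DEFAULT 0,
        due   TEXT
    )
""")
conn.execute("INSERT INTO tasks (title, due) VALUES (?, ?)",
             ("Write report", "2024-06-01"))
conn.commit()

# Query open tasks ordered by due date.
for row in conn.execute("SELECT id, title, due FROM tasks WHERE done = 0 ORDER BY due"):
    print(row)
conn.close()
```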

SQL summing vs running totals [closed]

I'm currently in disagreement with my colleague regarding the best design of our database.
We need to frequently access the total user balance from our database of transactions; we will potentially need this information several times a second.
He says that SQL is fast and all we need to do is SUM() the transactions. I, on the other hand, believe that eventually, with enough users and a large database, our server will spend most of its time summing the same records over and over. My solution is to keep a separate table that records the totals.
Which one of us is right?
That is an example of database denormalization. It makes the code more complex and introduces potential for inconsistencies, but the query will be faster. Whether that's worth it depends on how much you need the performance boost.
The sum could also be quite fast (i.e. fast enough) if it can be indexed properly.
A third way would be using cached aggregates that are periodically recalculated. Works best if you don't need real-time data (such as for account activity up until yesterday, which you can maybe augment with real-time data from the smaller set of today's data).
Again, the trade-off is between making things fast and keeping things simple (don't forget that complexity also tends to introduce bugs and increase maintenance costs). It's not a matter of one approach being "right" for all situations.
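To make the two options concrete, here is a small sqlite3 sketch with a hypothetical schema: a plain indexed SUM() on the transactions table versus a denormalized balances table kept in sync by a trigger. The same pattern applies in any SQL engine, with the consistency caveats mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        amount  NUMERIC NOT NULL
    );
    -- Index so that SUM() over one user only touches that user's rows.
    CREATE INDEX idx_tx_user ON transactions(user_id);

    -- Denormalized running total, kept in sync by a trigger.
    CREATE TABLE balances (
        user_id INTEGER PRIMARY KEY,
        balance NUMERIC NOT NULL DEFAULT 0
    );
    CREATE TRIGGER tx_after_insert AFTER INSERT ON transactions
    BEGIN
        INSERT OR IGNORE INTO balances (user_id, balance) VALUES (NEW.user_id, 0);
        UPDATE balances SET balance = balance + NEW.amount WHERE user_id = NEW.user_id;
    END;
""")

conn.execute("INSERT INTO transactions (user_id, amount) VALUES (1, 100), (1, -30)")

# Option 1: compute the balance on every read (fast enough with the index, up to a point).
print(conn.execute("SELECT SUM(amount) FROM transactions WHERE user_id = 1").fetchone()[0])

# Option 2: read the pre-maintained total (denormalized, but constant work per read).
print(conn.execute("SELECT balance FROM balances WHERE user_id = 1").fetchone()[0])
```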
I don't think that one solution fits all.
You can go very far with a good set of indexes and well written queries. I would start with querying real time until you can't, and then jump to the next solution.
From there, you can move to storing aggregates for all non-changing data (for example, from the beginning of time up to the prior month), and only query the live sum for data that changes in the current month.
You can save aggregated tables, but how many different kinds of aggregates are you going to save? At some point you have to look into some kind of multi-dimensional structure.
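That hybrid approach can be expressed as a single query: closed months come from a pre-aggregated table and only the current month is summed live. A hedged sketch, assuming a monthly_totals table that some batch job keeps up to date (names and schema are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        user_id INTEGER NOT NULL,
        amount  NUMERIC NOT NULL,
        tx_date TEXT NOT NULL            -- ISO date, e.g. '2024-06-15'
    );
    -- Pre-aggregated totals for fully closed months (refreshed by a batch job).
    CREATE TABLE monthly_totals (
        user_id INTEGER NOT NULL,
        month   TEXT NOT NULL,           -- e.g. '2024-05'
        total   NUMERIC NOT NULL,
        PRIMARY KEY (user_id, month)
    );
""")

# Closed-month totals plus a live sum over the current month only.
balance = conn.execute("""
    SELECT
        COALESCE((SELECT SUM(total) FROM monthly_totals WHERE user_id = ?), 0)
      + COALESCE((SELECT SUM(amount) FROM transactions
                  WHERE user_id = ? AND tx_date >= date('now', 'start of month')), 0)
""", (1, 1)).fetchone()[0]
print(balance)
```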

website speed performance as it relates to database load [closed]

I am new to this, but I am curious: does the size of a database negatively affect page load speeds? For example, if you had to fetch 20 items from a small database with 20,000 records and then fetch those same 20 items from a database of 2,000,000 records, would it be safe to assume that the latter would be much slower, all else being equal? And would buying more dedicated servers improve the speed? I want to educate myself on this so I can be prepared for future events.
It is not safe to assume that the bigger database is much slower. An intelligently designed database will do such page accesses through an index. For most real problems, the index will fit in memory. The cost of any page access is then:
Cost of looking up where the appropriate records are in the database.
Cost of loading the database pages containing those records into memory.
The cost of index lookups varies little (relatively) with the size of the index. So the typical worst-case scenario is about 20 disk accesses to get the data, roughly one per record fetched. And, for a wide range of overall table sizes, this doesn't change.
If the table is small and fits in memory, then you have the advantage of fully caching it in the in-memory page cache. This will speed up queries in that case. But the upper limit on performance is fixed.
If the index doesn't fit in memory, then the situation is a bit more complicated.
What would typically increase performance for a single request is having more memory. You need more processors if you handle many requests at the same time.
And, if you are seeing linear performance degradation, then you have a database poorly optimized for your application needs. Fixing the database structure in that case will generally be better than throwing more hardware at the problem.
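A quick way to convince yourself of this is to time the same indexed lookup against tables of different sizes. A minimal sqlite3 sketch with an illustrative schema follows; it runs in memory, so it isolates the index-lookup cost rather than disk access, but the scaling behaviour is the point.

```python
import sqlite3
import time

def build(n):
    """Create an in-memory table with n rows and an index on the lookup column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO items (category, payload) VALUES (?, ?)",
        ((i % 1000, "x" * 50) for i in range(n)),
    )
    conn.execute("CREATE INDEX idx_items_category ON items(category)")
    conn.commit()
    return conn

for n in (20_000, 2_000_000):
    conn = build(n)
    start = time.perf_counter()
    rows = conn.execute("SELECT * FROM items WHERE category = 42 LIMIT 20").fetchall()
    elapsed = (time.perf_counter() - start) * 1000
    print(f"{n:>9} rows: fetched {len(rows)} items in {elapsed:.3f} ms")
```

The 20-item fetch should cost roughly the same at both table sizes, because the index walk, not the table size, dominates.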

Can we restrict our database to not autogrow? [closed]

Can we make our database (whatever its size) not auto-grow at all (data and log files)?
If we proceed with this choice, maybe we will face problems when the database fills up during business hours.
Typically the way you prevent growth events from occurring during business hours is by pre-allocating the data and log files to a large enough size to minimize or completely eliminate auto-growth events in the first place. This may mean making the files larger than they need to be right now, but large enough to handle all of the data and/or your largest transactions across some time period x.
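As a concrete illustration of pre-allocating, here is a hedged sketch using pyodbc to run the relevant ALTER DATABASE statements; the connection string, database name, logical file names, and sizes are all placeholders, so check them against sys.database_files before running anything like this.

```python
import pyodbc

# Connection string and all names/sizes below are placeholders for illustration.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,   # avoid wrapping ALTER DATABASE in a user transaction
)
cursor = conn.cursor()

# Pre-size the data and log files and set a fixed growth increment,
# so auto-growth events during business hours become rare.
cursor.execute("""
    ALTER DATABASE MyAppDb
    MODIFY FILE (NAME = MyAppDb_data, SIZE = 50GB, FILEGROWTH = 512MB)
""")
cursor.execute("""
    ALTER DATABASE MyAppDb
    MODIFY FILE (NAME = MyAppDb_log, SIZE = 10GB, FILEGROWTH = 512MB)
""")
conn.close()
```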
Other things you can do to minimize the impact of growth events:
balance the growth size so that growth events are rare, but still don't take a lot of time individually. You don't want the default of 10% and 1MB that come from the model database; but there is no one-size-fits-all answer for what your settings should be.
ensure you are in the right recovery model. If you don't need point-in-time recovery, put your database in SIMPLE. If you do, put it in FULL, but make sure you are taking frequent log backups.
ensure you have instant file initialization enabled. This won't help with log files, but when your data file grows, it should be near instantaneous, up to a certain size (again, no one-size-fits-all here).
get off of slow storage.
Much more info here:
How do you clear the SQL Server transaction log?

Why would a webapp need a server online for only a few hours? [closed]

EC2 and RDS charge by the number of hours online, but who actually benefits from this kind of tariff? Why would a webapp need a server online for only a few hours a day/week/etc.?
The hourly tariffs have many use cases. A big one is scientific research: astrophysics, theoretical computer science, mathematics, and so on. Traditionally, universities would have to pay huge amounts of money for computing clusters to be purchased and installed on-site, even though those clusters spend most of their time idle and only a small amount of time actually processing data.
With the advent of cloud computing, researchers can launch a huge server cluster, have it crunch over data for a few hours or days, get the results, and then terminate the cluster. See Amazon's high-performance computing page for more details. You can also read case studies on how NASA's Jet Propulsion Lab and the European Space Agency make use of flexible-tariff cloud compute clusters on EC2 to process their data.
Another use case is auto-scaling. Amazon's Auto Scaling feature allows a load-balanced EC2 cluster to be scaled up and down with demand. During heavy load additional servers are launched and added to the cluster; when load drops again they are removed. Companies can therefore get massive scalability and only pay for the additional capacity if and when demand on their web site requires it.
One of the main benefits of cloud deployment is scalability.
For example, if you had an application that served the UK retail industry you might find that your peak usage occurs between 7-9am, 12-2pm and 5-8pm, when your audience are awake/not working.
You may have multiple servers employed during these peak times but only one through the night when traffic is low.
Hourly charging allows for this scalability.
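That kind of time-of-day scaling can be expressed directly as scheduled actions on an Auto Scaling group. A minimal boto3 sketch follows; the group name, region, and sizes are placeholders chosen to match the UK retail example above.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="eu-west-2")

# Scale the (hypothetical) web tier up for the morning peak, then back down.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="retail-web-asg",
    ScheduledActionName="morning-peak-up",
    Recurrence="0 7 * * *",     # 07:00 every day (UTC unless a time zone is set)
    MinSize=4,
    MaxSize=10,
    DesiredCapacity=6,
)
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="retail-web-asg",
    ScheduledActionName="morning-peak-down",
    Recurrence="0 9 * * *",     # 09:00 every day
    MinSize=1,
    MaxSize=10,
    DesiredCapacity=1,
)
```

Load-based scaling policies (for example, target tracking on CPU) can be layered on top of scheduled actions for the less predictable spikes.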