Website speed performance as it relates to database load [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am new to this, but I am curious: does the size of a database negatively affect page load speeds? For example, if you had to fetch 20 items from a small database with 20,000 records, and then fetch those same 20 items from a database of 2,000,000 records, would it be safe to assume that the latter would be much slower, all else being equal? And would buying more dedicated servers improve the speed? I want to educate myself on this so I can be prepared for future events.

It is not safe to assume that the bigger database is much slower. An intelligently designed database will perform such page accesses through an index. For most real-world problems, the index fits in memory. The cost of any page access is then:
Cost of looking up where the appropriate records are in the database.
Cost of loading the database pages containing those records into memory.
The cost of index lookups varies little (relatively) with the size of the index. So the typical worst-case scenario is about 20 disk accesses to fetch the 20 records, and across a wide range of table sizes this doesn't change.
If the table is small and fits in memory, then you have the advantage of fully caching it in the in-memory page cache. This will speed up queries in that case. But the upper limit on performance is fixed.
If the index doesn't fit in memory, then the situation is a bit more complicated.
What would typically increase performance for a single request is more memory. You need more processors if you are handling many simultaneous requests.
And, if you are seeing linear performance degradation, then you have a database poorly optimized for your application needs. Fixing the database structure in that case will generally be better than throwing more hardware at the problem.
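A quick way to see the index effect this answer describes is SQLite's EXPLAIN QUERY PLAN. The sketch below uses a hypothetical `items` table (names invented for illustration) and Python's built-in sqlite3 module to show the same point lookup switching from a full table scan to an index search:

```python
import sqlite3

# In-memory database with a hypothetical "items" table, used only to
# illustrate the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 ((i, "item%d" % i) for i in range(20_000)))

# Without an index, fetching a row by id forces a full table scan.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM items WHERE id = 12345").fetchone()[3]

# With an index, the same lookup walks a B-tree whose depth grows only
# logarithmically with the row count.
conn.execute("CREATE INDEX idx_items_id ON items(id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM items WHERE id = 12345").fetchone()[3]

print(plan_before)  # a SCAN of the whole table
print(plan_after)   # a SEARCH using idx_items_id
```

The exact plan text varies between SQLite versions, but the shift from a scan to an index search is the point: the lookup cost now depends on the tree depth, not the row count.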

Related

What is the simplest and fastest way of storing and querying simply-structured data? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
What is the best way of storing and querying data for a simple task-management application (for example)? The goal is maximum performance with minimum resource consumption (CPU, disk, RAM) on a single EC2 instance.
This also depends on the use case - will the database see many reads or many writes? For a task-management application, you have to know how many records you expect, and whether you expect more INSERTs or more SELECTs, etc.
Regarding SQL databases, an interesting benchmark can be found here:
https://www.sqlite.org/speed.html
The benchmark shows that SQLite can be very fast in many cases, but also ineffective in some. (Unfortunately the benchmark is not the newest, but it may still be helpful.)
SQLite also has the advantage that the whole database is just a single file on your disk, which you can access using the SQL language.
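As a small sketch of that single-file property (the `tasks` table and file name here are invented for illustration):

```python
import os
import sqlite3
import tempfile

# The whole database lives in one ordinary file on disk.
path = os.path.join(tempfile.mkdtemp(), "tasks.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, done INTEGER)")
conn.execute("INSERT INTO tasks (title, done) VALUES (?, ?)", ("write report", 0))
conn.commit()
conn.close()

print(os.path.exists(path))  # the entire database is this single file

# Reopening the file gives full SQL access again - no server process needed.
conn = sqlite3.connect(path)
title = conn.execute("SELECT title FROM tasks").fetchone()[0]
print(title)
```

This is what makes SQLite attractive for a small single-instance deployment: backup is a file copy, and there is no separate database server to administer.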
A very long and exhaustive benchmark of NoSQL databases can be found, for example, here:
http://www.datastax.com/wp-content/themes/datastax-2014-08/files/NoSQL_Benchmarks_EndPoint.pdf
It is also good to know the database engines, e.g. when using MySQL, choose carefully between MyISAM and InnoDB (a nice answer is here: What's the difference between MyISAM and InnoDB?).
If you just want to optimize performance, you can also think about making better use of hardware resources: if you read a lot from the DB and do not have many writes, you can cache the database in memory (see InnoDB's innodb_buffer_pool_size). If you have enough RAM, the whole database can be served from RAM.
So, long story short: if you are choosing an engine for a very simple and small database, SQLite might be the minimalistic approach you want. If you want to build something larger, first be clear about your needs.

SQL summing vs running totals [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm currently in disagreement with my colleague regarding the best design of our database.
We need to frequently access the total user balance from our database of transactions; we will potentially need to access this information several times a second.
He says that SQL is fast and all we need to do is SUM() the transactions. I, on the other hand, believe that eventually, with enough users and a large database, our server will spend most of its time summing the same records over and over. My solution is to keep a separate table with a record of the totals.
Which one of us is right?
That is an example for database denormalization. It makes the code more complex and introduces potential for inconsistencies, but the query will be faster. If that's worth it depends on the need for the performance boost.
The sum could also be quite fast (i.e. fast enough) if it can be indexed properly.
A third way would be using cached aggregates that are periodically recalculated. Works best if you don't need real-time data (such as for account activity up until yesterday, which you can maybe augment with real-time data from the smaller set of today's data).
Again, the tradeoff is between making things fast and keeping things simple (don't forget that complexity also tends to introduce bugs and increase maintenance costs). It's not a matter of one approach being "right" for all situations.
I don't think that one solution fits all.
You can go very far with a good set of indexes and well-written queries. I would start by querying in real time until you can't, and then move to the next solution.
From there, you can move to storing aggregates for all non-changing data (for example, from the beginning of time up to the prior month), and only query the sum for the data that changes in the current month.
You can save aggregate tables, but how many different kinds of aggregates are you going to save? At some point you have to look into some kind of multidimensional structure.
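A minimal SQLite sketch of the denormalized-totals approach discussed above, using a trigger to keep a hypothetical `balances` table in sync with a `transactions` table (all names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (user_id INTEGER, amount REAL);
-- Denormalized running total, kept in sync by a trigger.
CREATE TABLE balances (user_id INTEGER PRIMARY KEY, balance REAL NOT NULL);
CREATE TRIGGER trg_tx_insert AFTER INSERT ON transactions
BEGIN
    -- Create the row once (ignored if it already exists), then accumulate.
    INSERT OR IGNORE INTO balances (user_id, balance) VALUES (NEW.user_id, 0);
    UPDATE balances SET balance = balance + NEW.amount
        WHERE user_id = NEW.user_id;
END;
""")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5)])

# Both queries agree; the balances read is a single-row primary-key
# lookup instead of a scan-and-sum over all of a user's transactions.
summed = conn.execute(
    "SELECT SUM(amount) FROM transactions WHERE user_id = 1").fetchone()[0]
cached = conn.execute(
    "SELECT balance FROM balances WHERE user_id = 1").fetchone()[0]
print(summed, cached)
```

The trigger is what keeps the denormalized copy consistent; without it (or equivalent application logic), the inconsistency risk mentioned in the answer becomes real.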

Will denormalization improve performance in SQL? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I would like to speed up our SQL queries. I have started to read a book on data warehousing, where you have a separate database with data in different tables, etc. The problem is that I do not want to create a separate reporting database for each of our clients, for a few reasons:
We have over 200 clients, and maintenance on these databases is enough work already
Reporting data must be available immediately
I was wondering if I could simply denormalize the tables that I report on, as there are currently a lot of JOINs, which I believe are expensive (about 20,000,000 rows in the tables). If I copied the data into multiple tables, would this increase performance by a fair bit? I know there are issues with data being copied all over the place, but this could also be good from a history point of view.
Denormalization is no guarantee of an improvement in performance.
Have you considered tuning your application's queries? Take a look at what reports are running, identify places where you can add indexes and partitioning. Perhaps most reports only look at the last month of data - you could partition the data by month, so only a small amount of the table needs to be read when queried. JOINs are not necessarily expensive if the alternative is a large denormalized table that requires a huge full table scan instead of a few index scans...
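As a small illustration of the "read only the relevant slice" idea: SQLite has no table partitioning, so an index on a date column stands in for it in this sketch, and the table and values are invented. A one-month report then only touches the matching range rather than the whole table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2024-01-15", 100.0), ("2024-02-03", 50.0), ("2024-02-20", 25.0)])

# An index on the date column lets a one-month report read only the
# matching slice of the table instead of scanning all of it.
conn.execute("CREATE INDEX idx_sales_date ON sales(sale_date)")
february_total = conn.execute(
    "SELECT SUM(amount) FROM sales "
    "WHERE sale_date >= '2024-02-01' AND sale_date < '2024-03-01'").fetchone()[0]
print(february_total)
```

In a database that supports real partitioning (e.g. by month), the same half-open date range lets the planner skip entire partitions; the index version above is the closest equivalent here.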
Your question is much too general - talk with your DBA about doing some traces on the report queries (and look at the plans) to see what you can do to help improve report performance.
The question is very general, so it is hard to say whether denormalization will increase performance.
Basically, it CAN. But personally, I wouldn't consider denormalization a solution for reporting issues. In my experience, business people love to build huge reports which will kill the OLTP DB at the least appropriate time. I would continue reading about data warehousing :)
Yes, for an OLAP application your performance will improve with denormalization, but if you use the same denormalized table for your OLTP application you will see a performance bottleneck there. I suggest you create new denormalized tables or a materialized view for your reporting purposes; you can also incrementally fast-refresh your MV so reporting data is available immediately.

SQL insert very slow [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
Every 5 seconds I want to insert around 10k rows into the table. The table is unnormalized and has no primary keys or indexes. I noticed that insert performance is very slow - 10k rows in 20 seconds, which is unacceptable for me.
In my understanding, indexing can only improve search performance, not inserts. Is that true? Do you have any suggestions on how to improve performance?
Besides what Miky's suggesting, you can also improve performance by optimizing your DB structure, for example by reducing the length of varchar fields, using enums instead of text, and so on. It is also related to referential integrity; and first of all, I think you should normalize the database anyway. Then you can go on optimizing the queries.
You're right in that indexing will do nothing to improve the insert performance (if anything it might hurt it due to extra overhead).
If inserts are slow it could be due to external factors such as the IO performance of the hardware running your SQL Server instance or it could be contention at the database or table level due to other queries. You'll need to get the performance profiler running to determine the cause.
If you're performing the inserts sequentially, you may want to look into performing a bulk insert operation instead which will have better performance characteristics.
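A minimal sqlite3 sketch of the batched approach (the `readings` table and data are hypothetical). The key point is one transaction and a reused prepared statement, rather than a commit per row; on a real disk, per-row commits each force a durable write and are typically orders of magnitude slower:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor INTEGER, value REAL)")
rows = [(i % 100, float(i)) for i in range(10_000)]

# One transaction for the whole batch: "with conn" commits on exit,
# and executemany reuses a single prepared statement for every row.
with conn:
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)
```

Most databases have a dedicated bulk path that goes further still (e.g. BULK INSERT in SQL Server, LOAD DATA INFILE in MySQL, COPY in PostgreSQL); the batching principle is the same.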
And finally, some food for thought, if you're doing 10K inserts every 5 seconds you might want to consider a NoSQL database for bulk storage since they tend to have better performance characteristics for this type of application where you have large and frequent writes.

Does EF not use the same old concept of creating a large query, which degrades performance? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I know this could be a stupid question, but as a beginner I must ask the experts to clear my doubt.
When we use Entity Framework to query data from the database by joining multiple tables, it creates a SQL query, and this query is then sent to the database to fetch records.
"We know that if we execute a large query from .NET code it will increase network traffic and performance will suffer. So instead of writing a large query, we create and execute a stored procedure, and that significantly increases performance."
My question is: does EF not use the same old concept of creating a large query, which degrades performance?
Experts please clear my doubts.
Thanks.
Contrary to popular myth, stored procedures are not any faster than a regular query. There are some slight possible direct performance improvements when using stored procedures (execution plan caching, precompilation), but with a modern caching environment and newer query optimizers and performance-analysis engines, the benefits are small at best. Combine this with the fact that these potential optimizations were always just a small part of the query-results generation process - the most time-intensive part being the actual collection, seeking, sorting, merging, etc. of data - and these stored-procedure advantages become downright irrelevant.
Now, one other point: there is absolutely no way, ever, that sending 500 bytes of query text versus 50 bytes for the name of a stored procedure is going to have any noticeable effect on a 100 Mb/s link.
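A back-of-the-envelope check of that bandwidth claim:

```python
# How long do the extra bytes of a 500-byte query text (vs a 50-byte
# procedure name) take to transmit on a 100 Mb/s link?
extra_bytes = 500 - 50
link_bits_per_second = 100_000_000  # 100 Mb/s
seconds = extra_bytes * 8 / link_bits_per_second
print("%.0f microseconds" % (seconds * 1e6))  # 36 microseconds
```

Tens of microseconds of transmission time is negligible next to typical query execution and network round-trip latency, which is the point being made.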