What is the simplest and fastest way of storing and querying simply-structured data? [closed] - sql

What is the best way of storing and querying data for a simple task management application, for example? The goal is to have maximum performance with minimum resource consumption (CPU, disk, RAM) on a single EC2 instance.

This also depends on the use case - will the database see many reads or many writes? When you are talking about task management, you have to know how many records you expect, and whether you expect more INSERTs or more SELECTs, etc.
Regarding SQL databases, an interesting benchmark can be found here:
https://www.sqlite.org/speed.html
The benchmark shows that SQLite can be very fast in many cases, but in some cases also ineffective. (Unfortunately the benchmark is not the newest, but it may still be helpful.)
SQLite also has the advantage that the whole database is just a single file on your disk, and you can access it using plain SQL.
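To give a feel for how little is needed, here is a minimal sketch of a task store using Python's built-in sqlite3 module (the file name and schema are invented for the example):

```python
import sqlite3

# The whole database lives in this single file; no server process is needed.
conn = sqlite3.connect("tasks.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS tasks (
        id       INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        done     INTEGER NOT NULL DEFAULT 0,
        due_date TEXT
    )
""")
conn.execute("INSERT INTO tasks (title, due_date) VALUES (?, ?)",
             ("Write report", "2024-06-01"))
conn.commit()

# Query the open tasks, ordered by due date.
for row in conn.execute("SELECT id, title FROM tasks WHERE done = 0 ORDER BY due_date"):
    print(row)
conn.close()
```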
A very long and exhaustive benchmark of NoSQL databases can be found, for example, here:
http://www.datastax.com/wp-content/themes/datastax-2014-08/files/NoSQL_Benchmarks_EndPoint.pdf
It is also good to know the database engines; e.g. when using MySQL, choose carefully between MyISAM and InnoDB (a good answer is here: What's the difference between MyISAM and InnoDB?).
If you just want to optimize performance, you can also think about using hardware resources: if you read a lot from the DB and do not have that many writes, you can cache the database in memory (e.g. via innodb_buffer_pool_size in MySQL) - if you have enough RAM, the whole database can effectively be read from RAM.
So, long story short - if you are choosing an engine for a very simple and small database, SQLite might be the minimalistic approach you want. If you want to build something larger, first be clear about your needs.

Related

Will denormalization improve performance in SQL? [closed]

I would like to speed up our SQL queries. I have started to read a book on data warehousing, where you have a separate database with data in different tables, etc. The problem is I do not want to create a separate reporting database for each of our clients, for a few reasons:
We have over 200 clients; maintaining that many databases is enough work already
Reporting data must be available immediately
I was wondering if I could simply denormalize the tables that I report on, as currently there are a lot of JOINs and I believe these are expensive (about 20,000,000 rows in the tables). If I copied the data into multiple tables, would this increase performance by a fair bit? I know there are issues with data being copied all over the place, but this could also be good from a history point of view.
Denormalization is no guarantee of an improvement in performance.
Have you considered tuning your application's queries? Take a look at what reports are running, identify places where you can add indexes and partitioning. Perhaps most reports only look at the last month of data - you could partition the data by month, so only a small amount of the table needs to be read when queried. JOINs are not necessarily expensive if the alternative is a large denormalized table that requires a huge full table scan instead of a few index scans...
Your question is much too general - talk with your DBA about tracing the report queries (and looking at the execution plans) to see what you can do to improve report performance.
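To make the "add indexes" point concrete, here is a small sketch (using Python's sqlite3 purely for illustration - the table and column names are invented) showing how an index on the report's filter column changes the query plan from a full table scan to an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_rows (id INTEGER PRIMARY KEY, report_month TEXT, amount REAL)"
)

# Without an index, the monthly report has to scan the whole table.
print(conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT SUM(amount) FROM report_rows WHERE report_month = '2024-05'"
).fetchall())

# With an index on the filter column, only the matching slice is read.
conn.execute("CREATE INDEX idx_report_month ON report_rows (report_month)")
print(conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT SUM(amount) FROM report_rows WHERE report_month = '2024-05'"
).fetchall())
```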
The question is very general. It is hard to answer whether denormalization will increase performance.
Basically, it CAN. But personally, I wouldn't consider denormalization as a solution for reporting issues. In my experience, business people love to build huge reports that will kill the OLTP DB at the least appropriate time. I would keep reading about data warehousing :)
Yes, for an OLAP application your performance will improve with denormalization, but if you use the same denormalized tables for your OLTP application you will see a performance bottleneck there. I suggest you create new denormalized tables or a materialized view for your reporting purposes; you can also incrementally fast-refresh your MV so you get reporting data immediately.
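As a rough sketch of that idea (again Python/SQLite purely for illustration - the tasks and clients tables and all column names are hypothetical), a flat reporting table can be rebuilt from the normalized OLTP tables on a schedule, so reports never JOIN against the OLTP schema:

```python
import sqlite3

# Assumes the OLTP tables are reachable from this connection (e.g. via ATTACH).
conn = sqlite3.connect("reporting.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS report_tasks (
        task_id     INTEGER PRIMARY KEY,
        task_title  TEXT,
        client_name TEXT,
        status      TEXT,
        created_at  TEXT
    )
""")

# Full rebuild of the flat reporting table; an incremental refresh would
# only copy rows changed since the last run.
conn.execute("DELETE FROM report_tasks")
conn.execute("""
    INSERT INTO report_tasks (task_id, task_title, client_name, status, created_at)
    SELECT t.id, t.title, c.name, t.status, t.created_at
    FROM tasks t
    JOIN clients c ON c.id = t.client_id
""")
conn.commit()
```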

Why don't databases have good full text indexes [closed]

Why don't any of the major RDBMS systems like MySQL, SQL Server, Oracle, etc. have good full text indexing support?
I realize that most databases support full text indexes to some degree, but that they are usually slower, and with a smaller feature set. It seems that every time you want a really good full text index, you have to go outside the database and use something like Lucene/Solr or Sphinx.
Why isn't the technology in these full text search engines completely integrated into the database engine? There are lots of problems with keeping the data in another system such as Lucene, including keeping the data up to date and the inability to join the results with other tables. Is there a specific technological reason why these two technologies can't be integrated?
RDBMS indexes serve a different purpose. They are there to offer the engine a way to optimize access to the data, both for the user and for the engine itself (to resolve joins, check foreign keys, etc.). As such, they are really not a functional data structure.
Tools like full-text search and tag clouds may be very useful for enhancing the user experience. These serve only the user and the application. They are functional, and require real data structures... secondary tables or derived fields... with, typically, a whole lot of triggers and code to keep them updated.
And IMHO... there are many ways to implement these technologies. RDBMS vendors would have to choose one tech over another... for reasons that have nothing to do with the RDBMS engine itself. That does not really seem like their job.
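That "secondary structures plus triggers" pattern is roughly what the built-in full-text features look like in practice. As one hedged illustration, here is how SQLite's FTS5 module (available in most modern Python builds of sqlite3) can keep an external-content full-text index in sync with a base table via triggers; the table and column names are made up, and the UPDATE trigger is omitted for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);

    -- Full-text index over the base table's content.
    CREATE VIRTUAL TABLE docs_fts USING fts5(body, content='docs', content_rowid='id');

    -- Triggers keep the index in step with the base table.
    CREATE TRIGGER docs_ai AFTER INSERT ON docs BEGIN
        INSERT INTO docs_fts(rowid, body) VALUES (new.id, new.body);
    END;
    CREATE TRIGGER docs_ad AFTER DELETE ON docs BEGIN
        INSERT INTO docs_fts(docs_fts, rowid, body) VALUES ('delete', old.id, old.body);
    END;
""")

conn.execute("INSERT INTO docs(body) VALUES (?)",
             ("full text search kept inside the database",))
conn.commit()

# The matches are plain rowids, so they can still be joined with other tables.
print(conn.execute("SELECT rowid FROM docs_fts WHERE docs_fts MATCH ?",
                   ("search",)).fetchall())
```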

When is a graph database (like Neo4j) not a good use? [closed]

There are a lot of articles on the web supporting the trend of moving to a graph database like Neo4j... but I can't find much against them.
When would a graph database not be the best solution?
Any links to articles that compare graph, NoSQL, and relational databases would be great.
Currently I would not use Neo4j in a high volume write situation. The writes are still limited to a single machine, so you're restricted to a single machine's throughput, until they figure out some way of sharding (which is, by the way, in the works). In high volume write situations, you would probably look at some other store like Cassandra or MongoDB, and sacrifice other benefits a graph database gives you.
Another thing I would not currently use Neo4j for is full-text search, although it does have some built-in facility (as it uses Lucene for indexing under the hood), it is limited in scope and difficult to use from the latest Cypher. I understand that this is going to be improving rapidly in the next couple of releases, and look forward to that. Something like ElasticSearch or Solr would do a better job for FTS-related things.
Contrary to popular belief, tabular data often fits the graph model well, unless you really have very denormalized data, like log records.
The good news is you can take advantage of many of these things together, picking the best tool for the job, and implement a polyglot persistence solution to answer your questions the best way possible.
Also, I would not use Neo4j for serving and storing binary data. There are much better options for images, videos and large text documents out there - keep those elsewhere and either index them alongside Neo4j or just reference them from your graph.
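A minimal sketch of that "reference, don't store" approach with the official Neo4j Python driver (the connection details, labels and the object-storage URL are all hypothetical):

```python
from neo4j import GraphDatabase

# Hypothetical connection details; adjust for your deployment.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # The binary itself lives in object storage; the graph only holds a URL
    # plus the relationships around it.
    session.run(
        "MERGE (u:User {name: $name}) "
        "CREATE (u)-[:UPLOADED]->(:Image {url: $url})",
        name="alice",
        url="https://example-bucket.s3.amazonaws.com/photo.jpg",
    )

driver.close()
```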
When would a graph database not be the best solution?
When you work in a conservative company.
Insert some well thought-out technical reason here.

Does EF not use the same old concept of creating a large query, which degrades performance? [closed]

I know this could be a stupid question, but as a beginner I must ask the experts to clear up my doubt.
When we use Entity Framework to query data from the database by joining multiple tables, it creates a SQL query, and this query is then sent to the database to fetch records.
"We know that if we execute large query from .net code it will
increase network traffic and performance will be down. So instead of
writing large query we create and execute stored procedure and that
significantly increases the performance."
My question is - does EF not use the same old concept of creating a large query, which degrades performance?
Experts please clear my doubts.
Thanks.
Contrary to popular myth, stored procedures are not any faster than a regular query. There are some slight, possible direct performance improvements when using stored procedures (execution plan caching, precompilation), but with a modern caching environment and newer query optimizers and performance analysis engines, the benefits are small at best. Combine this with the fact that these potential optimizations were always just a small part of the query-results-generation process - the most time-intensive part being the actual collection, seeking, sorting, merging, etc. of data - and these stored procedure advantages are downright irrelevant.
Now, one other point. There is absolutely no way, ever, that sending 500 bytes of query text versus 50 bytes for the name of a stored procedure is going to have any noticeable effect on a 100 Mb/s link.
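A quick back-of-the-envelope check of that claim (the 500-byte and 50-byte figures are taken directly from the answer above):

```python
# Extra bytes of query text compared to sending just a stored procedure name.
extra_bytes = 500 - 50
link_bits_per_second = 100_000_000  # 100 Mb/s

extra_seconds = (extra_bytes * 8) / link_bits_per_second
print(f"{extra_seconds * 1_000_000:.0f} microseconds")  # about 36 microseconds
```

Roughly 36 microseconds of extra transfer time, which is noise next to the actual query execution.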

What is so special about Redis? [closed]

I mean: link text
why should one use this over MySQL or something similar?
SPEED
Redis is pretty fast! 110,000 SETs/second, 81,000 GETs/second on an entry-level Linux box. Check the benchmarks.
Most important is speed. There is no way you can get these numbers using SQL.
COMMANDS
It is possible to think of Redis as a data structures server; it is not just another key-value DB. See all the commands supported by Redis to get a first feeling for it.
Sometimes people call Redis "memcached on steroids".
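To illustrate the "data structures server" point, here is a short sketch using the redis-py client (a local default Redis instance and all the key names are assumed for the example):

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Plain key/value with an expiry, much like memcached.
r.set("page:home", "<html>...</html>", ex=60)
print(r.get("page:home"))

# A list used as a simple work queue.
r.lpush("jobs", "send-email:42")
print(r.rpop("jobs"))

# A sorted set used as a leaderboard.
r.zadd("scores", {"alice": 120, "bob": 95})
print(r.zrevrange("scores", 0, 2, withscores=True))
```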
Like many NoSQL databases, one would use Redis if it fits your needs. It does not directly compete with RDBMS solutions like MySQL, PostgreSQL, etc. One may need to use multiple NoSQL solutions in order to replace the functionality of an RDBMS. I personally do not consider Redis to be a primary data store - only something to be used for specialty cases like caching, queuing, etc. Document databases like MongoDB or CouchDB may work as a primary data store and be able to replace an RDBMS, but there are certainly projects where an RDBMS would work better than a document database.
This Wikipedia article on NoSQL will explain.
These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.