If I wanted to run a small personal site that added, say, 2,000 rows of data (150 KB) every hour, would there be any significant difference between using a CSV file and a SQL database? I am very new to databases and currently have a prototype that appends data to a CSV file for simplicity, but I would like to know if there are any downsides in speed or memory. I will only need to write and look up data. Also, if there is a large amount of redundant data, will a relational database be able to store or detect it efficiently? I do not fully understand the concept.
Edit: this question is not a duplicate of my other question. The other concerns an interchange format that should work between a server and a website, while this question is about a method to store data as a flat file or database.
A CSV file is a sequential text file, so lookups are O(n). That is, it will take 10x longer to look something up in a file with 10,000 lines than in one with 1,000.
For this reason, I'd recommend a SQL database, as they have built-in indexing features. You can use something like Access or SQLite for next to nothing.
The only real downside to a SQL database is that you have to learn how to use it.
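To make that concrete, here is a minimal sketch in Python of what the SQLite route could look like for this kind of hourly append-and-lookup workload; the readings table, its columns and the index name are invented for illustration, not anything the question prescribes.

```python
import sqlite3

# Hypothetical schema for illustration: one reading per row.
conn = sqlite3.connect("readings.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        recorded_at TEXT NOT NULL,   -- ISO-8601 timestamp
        sensor      TEXT NOT NULL,
        value       REAL
    )
""")
# The index is what makes lookups use a search instead of a full scan.
conn.execute("CREATE INDEX IF NOT EXISTS idx_readings_time ON readings (recorded_at)")

# Appending ~2000 rows per hour is a single batched insert.
rows = [("2017-01-01T00:00:00", "sensor-1", 1.23)] * 2000
with conn:
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# A lookup by time range is answered via the index rather than reading the whole file.
cur = conn.execute(
    "SELECT sensor, value FROM readings WHERE recorded_at BETWEEN ? AND ?",
    ("2017-01-01T00:00:00", "2017-01-01T01:00:00"),
)
print(len(cur.fetchall()))
```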
So, SQL databases have several features that you would need to implement yourself when using CSV.
CSV won't let you create indexes for fast searching.
If you always need all data from a single table (like for application settings), CSV is faster, otherwise not.
What are some disadvantages of CSV?
No indexing
Cannot be partitioned
No transactions
Cannot have NULL values
Given the amount of data in your case, it is better to go with a database rather than a CSV file.
You can create constraints, such as unique key constraints, to uniquely identify rows and reject redundant data. There are several such features that a trivial CSV flat file will not support; a sketch of one of them follows.
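As an illustration of that point, here is a small Python/SQLite sketch showing how a UNIQUE constraint lets the database itself detect and skip redundant rows; the readings schema is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        recorded_at TEXT NOT NULL,
        sensor      TEXT NOT NULL,
        value       REAL,
        UNIQUE (recorded_at, sensor)   -- no two rows for the same sensor and time
    )
""")

# INSERT OR IGNORE silently drops rows that would violate the UNIQUE constraint,
# so re-importing the same hour of data does not duplicate it.
rows = [
    ("2017-01-01T00:00:00", "sensor-1", 1.23),
    ("2017-01-01T00:00:00", "sensor-1", 1.23),  # duplicate, will be ignored
]
with conn:
    conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM readings").fetchone())  # (1,)
```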
What is the best way of storing and querying data for a simple application (for example, a task-management app)? The goal is to have maximum performance with minimum resource consumption (CPU, disk, RAM) on a single EC2 instance.
This also depends on the use case - will the database see many reads or many writes? When you are talking about task management, you have to know how many records you expect and whether you expect more INSERTs or more SELECTs, etc.
Regarding SQL databases, an interesting benchmark can be found here:
https://www.sqlite.org/speed.html
The benchmark shows that SQLite can be very fast in many cases, but in some cases it is also rather ineffective. (Unfortunately the benchmark is not the newest, but it may still be helpful.)
SQLite is also convenient in that the whole database is a single file on your disk, and you can query it using SQL.
A very long and exhaustive benchmark of NoSQL databases can be found, for example, here:
http://www.datastax.com/wp-content/themes/datastax-2014-08/files/NoSQL_Benchmarks_EndPoint.pdf
It is also good to know the database engines; for example, when using MySQL, choose carefully between MyISAM and InnoDB (a nice answer is here: What's the difference between MyISAM and InnoDB?).
If you just want to optimize performance, you can also think about using hardware resources: if you read a lot from the DB and do not have that many writes, you can cache the database in memory (the InnoDB buffer pool, sized with innodb_buffer_pool_size) - with enough RAM, the whole database can effectively be served from RAM.
So, long story short - if you are choosing an engine for a very simple and small database, SQLite might be the minimalistic approach you want to use. If you want to build something larger, first be clear about your needs.
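For the minimalistic SQLite route, a rough sketch of what the whole setup might amount to in Python is below; the tasks table and its columns are an assumed example schema, not something the question specifies.

```python
import sqlite3

# The whole database lives in one file next to the application.
conn = sqlite3.connect("tasks.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS tasks (
        id       INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        done     INTEGER NOT NULL DEFAULT 0,
        due_date TEXT
    )
""")
# If most queries filter on completion status, an index keeps those SELECTs cheap.
conn.execute("CREATE INDEX IF NOT EXISTS idx_tasks_done ON tasks (done)")

with conn:
    conn.execute("INSERT INTO tasks (title, due_date) VALUES (?, ?)",
                 ("write report", "2017-06-01"))

open_tasks = conn.execute("SELECT id, title FROM tasks WHERE done = 0").fetchall()
print(open_tasks)
```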
I would like to speed up our SQL queries. I have started to read a book on data warehousing, where you have a separate database with data in different tables, etc. The problem is that I do not want to create a separate reporting database for each of our clients, for a few reasons:
We have over 200 clients; maintenance on these databases is already enough work
Reporting data must be available immediately
I was wondering if I could simply denormalize the tables that I report on, as currently there are a lot of JOINs and I believe these are expensive (about 20,000,000 rows in the tables). If I copied the data into multiple tables, would this increase the performance by a fair bit? I know there are issues with data being copied all over the place, but this could also be useful from a history point of view.
Denormalization is no guarantee of an improvement in performance.
Have you considered tuning your application's queries? Take a look at what reports are running, identify places where you can add indexes and partitioning. Perhaps most reports only look at the last month of data - you could partition the data by month, so only a small amount of the table needs to be read when queried. JOINs are not necessarily expensive if the alternative is a large denormalized table that requires a huge full table scan instead of a few index scans...
Your question is much too general - talk with your DBA about doing some traces on the report queries (and look at the plans) to see what you can do to help improve report performance.
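The production database here is presumably SQL Server or similar, but the habit of checking the plan before denormalizing can be sketched with Python's built-in sqlite3 and EXPLAIN QUERY PLAN; the sales table, the index and the date filter are all invented for the example, and in SQL Server the equivalent would be inspecting the actual execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, client_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_date ON sales (sale_date)")

# Before denormalizing, check whether the report query already uses an index.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT client_id, SUM(amount)
    FROM sales
    WHERE sale_date >= '2017-01-01'
    GROUP BY client_id
""").fetchall()
for row in plan:
    # The detail column should mention idx_sales_date rather than a full table scan.
    print(row)
```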
The question is very general. It is hard to answer whether denormalization will increase performance.
Basically, it CAN. But personally, I wouldn't consider denormalizing as a solution for reporting issues. In my experience, business people love to build huge reports that can kill the OLTP database at the least appropriate time. I would continue reading about data warehousing :)
Yes, for an OLAP application your performance will improve with denormalization, but if you use the same denormalized tables for your OLTP application you will see a performance bottleneck there. I suggest you create new denormalized tables or a materialized view for your reporting purposes; you can also incrementally fast-refresh your MV so that reporting data is available immediately.
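SQLite has no materialized views, so purely as an illustration of the idea (not of the vendor-specific refresh syntax this answer refers to), the Python sketch below rebuilds a flattened reporting table from assumed normalized tables; in a real warehouse you would use a materialized or indexed view with incremental refresh instead. All table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect("reporting.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS orders      (id INTEGER PRIMARY KEY, client_id INTEGER, order_date TEXT);
    CREATE TABLE IF NOT EXISTS clients     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE IF NOT EXISTS order_lines (order_id INTEGER, amount REAL);
""")

def refresh_report_table(conn):
    """Rebuild the flattened reporting table; in a real warehouse this would be
    an incremental refresh of a materialized view instead of a full rebuild."""
    with conn:
        conn.executescript("""
            DROP TABLE IF EXISTS report_orders;
            CREATE TABLE report_orders AS
            SELECT o.id AS order_id, c.name AS client, o.order_date,
                   SUM(l.amount) AS total
            FROM orders o
            JOIN clients c     ON c.id = o.client_id
            JOIN order_lines l ON l.order_id = o.id
            GROUP BY o.id, c.name, o.order_date;
        """)

refresh_report_table(conn)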
Currently we have a complex business object which needs around 30 joins on our SQL database to retrieve one item (and this is our main use case). The database is around 2 GB in SQL Server.
We are using Entity Framework to retrieve data and it takes around 3.5 seconds to retrieve one item. We have noticed that using subqueries in a parallel invoke is more performant than using joins when there are a lot of rows in the other table (so we have something like 10 subqueries). We don't use stored procedures because we would like to keep the data access layer in "plain C#".
The goal would be to retrieve the item in under 1 second without changing the environment too much.
We are looking into NoSQL solutions (RavenDB, Cassandra, Redis with the "document client") and SQL Server's new in-memory database feature.
What do you recommend? Do you think that just one stored procedure call with EF would do the job?
EDIT 1:
We have indexes on all the columns that we join on.
In my opinion, if you need 30 joins to retrieve one item, there is something wrong with the design of your database. Maybe it is correct from the relational point of view, but it is certainly impractical from the functional/performance point of view.
A couple of solutions come to mind:
Denormalize your database design.
I am pretty sure that you can reduce the number of joins and improve your performance a lot with that technique.
http://technet.microsoft.com/en-us/library/cc505841.aspx
Use a NoSQL solution like you mention.
Due to the number of SQL tables involved this is not going to be an easy change, but maybe you can start by introducing NoSQL as a cache for these complex objects (see the sketch after the link below).
NoSQL Use Case Scenarios or WHEN to use NoSQL
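A rough sketch of that cache-in-front-of-SQL idea, written in Python with redis-py for brevity (the actual stack here is C#, where a .NET Redis client would play the same role). The key scheme, the TTL and the load_item_from_sql_server placeholder are assumptions, not part of the original setup.

```python
import json
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def load_item_from_sql_server(item_id):
    # Placeholder for the existing Entity Framework / SQL query with ~30 joins.
    return {"id": item_id, "name": "example", "children": []}

def get_item(item_id, ttl_seconds=300):
    """Return the denormalized item, serving it from Redis when possible."""
    key = f"item:{item_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    item = load_item_from_sql_server(item_id)        # slow path (~3.5 s today)
    r.set(key, json.dumps(item), ex=ttl_seconds)     # cache the whole document
    return item
```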
Of course, using stored procedures for this case is much better and it will improve performance, but I do not believe it is going to make a dramatic change. You should try it and compare. Also review all your indexes.
SQL is one of the most powerful and most widely used languages, but out of curiosity and for learning I would like to test newer technologies and know which are the fastest.
I am talking about NoSQL (JSON) and also about plain text files (.txt, .dat, or .ini) containing things like posts, settings, and so on.
Which is the fastest to process? Take for example the WordPress CMS, which is very famous, one of the largest in the world, and uses SQL. Say we request 50 posts from the database using the default template, and compare that with requesting the same 50 posts from a .txt or JSON file: which technology and approach renders faster?
If you only need simple reads or writes, a JSON or text file will be faster than MySQL; if you want to query and process complex data, MySQL is faster.
If you want to work with less overhead, try an SQLite database or something similar.
NoSQL databases like Redis and MongoDB are faster than MySQL for simple access patterns, but to use them you must have personal hosting with root access.
Although I don't have numbers to prove my guess, I think that any database will always be faster than a text file, just consider its indexing capabilities.
If instead you want to compare different databases, then, as others already said, it's a matter of the specific domain / problem you're working on and the structure you gave to the specific database schema.
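Rather than guessing, one can measure it. Below is a rough Python sketch that times loading 50 posts from a JSON file against selecting them from SQLite; the file names, schema and post contents are invented, and real numbers will depend heavily on caching and on how WordPress itself builds its queries.

```python
import json
import sqlite3
import time

# Prepare toy data: the same 50 posts in a JSON file and in an SQLite table.
posts = [{"id": i, "title": f"post {i}", "body": "..."} for i in range(50)]
with open("posts.json", "w") as f:
    json.dump(posts, f)

conn = sqlite3.connect("posts.db")
conn.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
with conn:
    conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?, ?)",
                     [(p["id"], p["title"], p["body"]) for p in posts])

start = time.perf_counter()
with open("posts.json") as f:
    from_file = json.load(f)
file_time = time.perf_counter() - start

start = time.perf_counter()
from_db = conn.execute("SELECT id, title, body FROM posts LIMIT 50").fetchall()
db_time = time.perf_counter() - start

print(f"json file: {file_time:.6f}s, sqlite: {db_time:.6f}s")
```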
In my app I use a static database to store all counties and census areas, along with the states and territories of the US. This gets updated by the government every 10 years. I use it to search within a county, but there are multiple counties with the same name, so a state or territory has to be picked first and then the user selects the county within it. My question: I currently have that data in an SQLite database - is that the most efficient option, or should I use Core Data? There are 3600 lines with 4 items on each line. I just want the most efficient way of storing and reading the data; there will be no writing to it. So which should I choose? I'm open to options other than the two I mentioned.
For your case:
Core Data:
more flexible
(+) it's an ORM (a big plus! you can add methods to your data models or store whole object trees with one command)
(+) better integrated into Xcode
(+) can easily handle migration of the data schema
(-) performance (in your case, 3600 records in only one table)
SQLite:
(+) performance
(-) only C bindings (or you need a wrapper framework)
And don't forget, Core Data is another layer on top of SQLite, so it's not easy to compare the two directly.
Core Data is not a database; it's an object graph model backed by a persistent store, which may (or may not) be SQLite.
Using Core Data would mean rebuilding all the queries in your app. You're using a rather small and static database which, I guess, is working fine. Why do you want to change? Is your app experiencing speed problems (in that case, set up an index on your columns)? Whatever your choice, I bet you won't notice any improvement from one store to another with such a simple and small database.
You can look at this answer about CoreData.
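If the SQLite route is kept and lookups ever do feel slow, the index suggested above might look roughly like this in Python; the (state, county, fips_code, census_area) columns are an assumed layout for the "4 items per line", not the app's actual schema.

```python
import sqlite3

conn = sqlite3.connect("counties.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS counties (
        state       TEXT NOT NULL,   -- illustrative columns, not the real schema
        county      TEXT NOT NULL,
        fips_code   TEXT,
        census_area TEXT
    )
""")
# State first, then county: matches the UI flow of picking a state and then a county,
# and lets SQLite answer both lookups from one composite index.
conn.execute("CREATE INDEX IF NOT EXISTS idx_state_county ON counties (state, county)")

rows = conn.execute(
    "SELECT county, census_area FROM counties WHERE state = ? AND county LIKE ?",
    ("Georgia", "Wash%"),
).fetchall()
print(rows)
```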