To Neo4j or not to Neo4j? If yes, to what extent?

My Rails app relies on matchmaking algorithms, for which I found Neo4j to be a great candidate. One issue is that I would need to switch to JRuby in order to integrate Neo4j directly. Another gem, Neography, doesn't need JRuby, but doesn't cover all of Neo4j's features. I'm not that happy about switching to Java and JBoss.
Should I rely only on Neo4j, or should I have SQL (MySQL or PostgreSQL) store all my data and use Neo4j just for matchmaking?
If the latter, how hard would it be to integrate both databases, how hard would it be to use Neo4j only for the matchmaking, and what should I take into consideration?
Another issue is keeping both DBs synchronized.

Neography supports all the features of Neo4j that you would need.
There is no need to go with JRuby and neo4j.rb if you don't want to.
It shouldn't be too hard to synchronize the two DBs: just write consistently to both, as sketched below.
In Neo4j you probably just want to keep the data needed for the matchmaking queries.
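A minimal dual-write sketch of that advice, assuming an ActiveRecord User model with hypothetical interests and neo_node_id columns (all names below are invented for illustration; Neography's REST client is used as suggested above):

require 'neography'

class User < ActiveRecord::Base
  NEO = Neography::Rest.new("http://localhost:7474")

  after_save :sync_to_neo4j

  private

  # Mirror only the matchmaking attributes into Neo4j; the SQL
  # database remains the system of record.
  def sync_to_neo4j
    if neo_node_id.nil?
      # First save: create the graph node and remember its id locally.
      # update_column skips callbacks, so this won't re-trigger after_save.
      node = NEO.create_node("user_id" => id, "interests" => interests)
      update_column(:neo_node_id, NEO.get_id(node))
    else
      # Subsequent saves: refresh the node's matchmaking properties.
      NEO.set_node_properties(neo_node_id, "interests" => interests)
    end
  end
end

A crash between the two writes can still leave the stores out of step, so some periodic reconciliation pass over the matchmaking data is worth considering.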


For a data-analysis web app, would it be better to use Django's auto-generated relational objects or raw SQL queries?

I am making an application that will serve as a demonstration of the data analysis capabilities our team can provide. I am very new to Django, and the auto-generated APIs seem pretty cool, but I worry about scalability and the quickness that only a carefully constructed and queried database can provide. Has anyone been in this situation and regretted/been satisfied with their choice of raw queries vs. the Django APIs?
Django's ORM is great. It is one of the most complete and easy-to-use ORMs I have ever encountered, but like every ORM, it has its limitations.
If your application requires full control of the database and very efficient queries, you might consider other approaches and compare them to Django and see which one fits better. You'll have to do some research.
Django is great for developing complicated database applications very quickly, but I'm sure that if the application grows large enough, sooner or later you'll have to start working directly with your database engine for optimization reasons. ORMs are generic tools, so database-engine-specific functions will not be available.
There is no rule for deciding whether or not Django is going to work for you, but one thing I can tell you is that its ORM helps you get your app started very quickly, and if you find some specific circumstance where you need to customize your SQL, you can do that in Django as well. If not, just create a Python module that handles the database as you like in those specific circumstances and use it from your Django code. That is probably the best way to show off your very efficient data analysis capabilities.
I hope this brings some light; it is a very broad question. One thing I'm sure of is that you won't regret the choice until your app grows big enough, and when it does, you'll have the resources to find great programmers who can bend Python to handle every specific situation that behaves oddly with Django's database-access tools.
Found this link which may be helpful.
Good luck!
I'd consider this a case of premature optimization.
The Django ORM is good enough in the general case, provided that your database is reasonably designed, has appropriate indexes, etc.
In more than 90% of cases it will be adequate, and often optimal.
When you have identified specific complicated and slow queries, reviewed the ORM-generated SQL, and come up with a better query, then you can add a special case for it.
Maybe you will have more than one such special case. I still think the ORM will save you a lot of legwork in the 90% of database-access cases where it is adequate.
Besides querying, an ORM allows you to describe the DB schema, its constraints, and ways to recreate it and migrate it between versions. Even if the ORM could not query the DB, these management capabilities alone would be reason enough to use one.

How do I add an (SQL) database to my Sinatra application?

When I started coding my Sinatra application I had never used it before. Note that I had, and still have, no experience with RoR. I had one .rb file and one .haml and was happy. Now I have had to split the .rb file into about 10 'library' files as the whole application gets more and more complex.
I store some application logs/info in CSV files, and now I am getting conflicts when accessing them. So I think I need to introduce a 'proper' database solution. I want it to be part of my Ruby (Sinatra) application.
How can I introduce a 'light' SQL database into my Sinatra application?
I am on ruby 1.8.7 (2010-08-16 patchlevel 302) [i386-mingw32] soon upgrading to 1.9 (hopefully)
I'd recommend looking at Sequel. It's very flexible and powerful, and works well with SQLite, MySQL, Postgres, Oracle and other DBMSs. It's not opinionated about how you talk to the database; you can use it as an ORM or with simple datasets, and it allows embedded SQL or more programmatic approaches.
For an ORM, both ActiveRecord and Sequel are recommended. As for the database, SQLite3 will probably be good enough for your needs; you could also choose MySQL or PostgreSQL.
If you want to use active_record, you'll find this article very useful.
And if Sequel is the choice, just read the Sequel documents here.
Once the gems are installed, you can write some code to connect to the DB, then some migration tasks to build the database tables (and don't forget to build the corresponding models). Both gems have similar migration syntax. After that, import your CSV data and you're done.
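A minimal sketch of those steps with Sequel (file, table and column names are invented for illustration):

require 'sequel'
require 'csv'

DB = Sequel.connect("sqlite://app.db")

# Build the table if it doesn't exist yet; a migration file could do
# the same thing in a more controlled way.
DB.create_table? :logs do
  primary_key :id
  String :message
  DateTime :logged_at
end

class LogEntry < Sequel::Model(:logs); end

# One-off import of the existing CSV logs.
CSV.foreach("logs.csv") do |message, logged_at|
  LogEntry.create(:message => message, :logged_at => logged_at)
end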
I have had no trouble using either Active Record or DataMapper to add object persistence to my Sinatra apps. People also tell me Sequel is very good, but philosophically it is not worlds apart from Active Record, imho.
Active Record and Sequel favour a more database-centric approach, whereby you spell out your tables as a set of database and table definitions in a collection of migration files and merge them into a schema which is then used to build or update your database tables. If you really care about the underlying SQL database, then one of these is for you. I find them to be six of one, half a dozen of the other.
DataMapper is more object-centric: it lets you define the properties and object relationships you need in your object's own class definition; then, when your app launches, you make sure you call DataMapper.auto_upgrade! and it upgrades the database to suit your object graph (see the sketch after this answer). The upside is that you only have one place to look to find what properties your object might have. The downside is that you have less control over the specifics of the underlying database, though it's not impossible to tightly define the mappings. DataMapper works well when you care about object graphs more than database tables.
The good news is that they pretty much all work the same way once you have your mappings from object graph to SQL database tables defined. All support lazy or pre-emptive loading of related collections of objects, many_to_many relationships, polymorphism, etc., and tend to vary only in configuration and seeding details.
I often start projects using DataMapper just for its speed at standing up and tearing down database schemas while the app's object graph is still in flux; I refactor quickly to use Active Record when the schema has settled down. Next project I think I'll give Sequel a go, though, as people do seem to rave about it.
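To make the DataMapper style above concrete, a small sketch (model and file names are hypothetical):

require 'dm-core'
require 'dm-migrations'

DataMapper.setup(:default, "sqlite://#{Dir.pwd}/app.db")

# Properties and relationships live in the class itself; there are no
# separate migration files for the basic schema.
class Post
  include DataMapper::Resource
  property :id,         Serial
  property :title,      String
  property :created_at, DateTime
end

DataMapper.finalize
DataMapper.auto_upgrade!  # adds missing tables/columns; auto_migrate! would drop and recreate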
I have had success using DataMapper with Sinatra, but like the other posters said, you can also use Sequel and Active Record. One advantage of Active Record, though, is that if you ever want to use/learn RoR, Active Record is the default ORM, so that might be something you want to consider.
If you don't want to go the ORM route, you can always use the sqlite-ruby gem, which will allow you to create and run SQL queries directly. Here is some example code from the website http://sqlite-ruby.rubyforge.org/
require 'sqlite'

# Open (or create) the database file and print every row.
db = SQLite::Database.new("data.db")
db.execute("select * from logs") do |row|  # 'logs' stands in for your table name
  p row
end
db.close

How to maintain SQL scripts when developing an application working against many databases

Imagine an application which is supposed to work with different database vendors. As we all know, SQL syntax (especially DDL) is not portable. How do you deal with maintaining the SQL scripts?
Until now I see three options:
to store the SQL in the format of one of the databases and have a tool which automatically converts from one syntax to another (do you know of such tools?)
to store the SQL in some artificial language and have a tool which can generate vendor-specific SQL on demand (any recommendations here?)
to store the SQL in many database formats, accepting the redundancy (this is the worst one, isn't it?)
Do you recommend any of them? Do you have a better idea?
The development environment tries to follow the continuous integration principles, so automation is a key feature here.
Have a look at Liquibase (that's essentially your second item on the list)
http://www.liquibase.org
It's not perfect (e.g. it does not support check constraints), but it is quite useful.
This video shows a solution using the Subsonic project http://subsonicproject.com/docs/Using_SimpleRepository and its data migration capabilities. The strategy is to use a general language and apply it to different databases.
Hope this is what you were looking for.
Use some kind of ORM framework with schema generation capability.
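As a sketch of the second option in Ruby terms (an assumption; the answers mention Liquibase and Subsonic, but Sequel's migration DSL illustrates the same idea, with invented table and column names): the schema is written once in a vendor-neutral form, and the adapter emits the vendor-specific DDL when the migration runs.

Sequel.migration do
  up do
    create_table(:customers) do
      primary_key :id           # becomes AUTO_INCREMENT, SERIAL, etc. per adapter
      String :name, :null => false
      index :name
    end
  end

  down do
    drop_table(:customers)
  end
end

Running the same migration directory against different connection URLs (e.g. sequel -m migrations mysql://... or postgres://...) yields the appropriate DDL for each vendor, which fits well with continuous integration.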

Replace SQLite with SQL Server?

Does anyone know if it's a good solution to use SQLite in a multi-threaded environment?
I want to replace SQL Server with a simpler, embedded database, as there is no need for such a big server DB. The expected maximum size of the DB would be 4 gigabytes after 4-5 years of usage. Is that normal for an embedded DB? Could it affect performance?
It depends on the type of queries you would use. If the queries are simple selects with plain joins, then SQLite could do fine, but I think you would still be better off with, e.g., Firebird 2.5 when the stable release gets out (RC3 is available now). You would have somewhat richer SQL to work with. I don't know how important bulk loads are for you, but neither SQLite nor Firebird is very strong in this area. If you need good bulk-insert performance at low cost, then you should look at PostgreSQL or MySQL. There is also a very interesting-looking database I happened to stumble upon recently, called CUBRID. I have only installed it so far, so I can't tell how good or bad it is, but it certainly seems worth a look.
You might also want to look at this wikipedia article:
http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
I don't know which distribution you're talking about here. I've only used SQLite.NET, and I know it works well in multithreaded applications.
It can also be deployed on client-server systems, so you need not worry at all.
As for Vinko's statement about 'real' databases, you can ignore it. SQLite is really worth its salt.
If you're working with .NET, you might find this link useful:
http://sqlite.phxsoftware.com
According to the documentation, SQLite is thread-safe, but there are caveats.
You can use SQLite in a multithreaded environment, but only if you build a special version of it (and find out whether the library you'll be using supports that, and tweak it if it doesn't). So, assuming your library supports multithreaded SQLite, if you really need a high level of concurrency on the database, you may prefer a 'real' database. Whether that's MSSQL or anything else falls outside the scope of the question.
Consider MySQL and SQL Server Express, for example.
If your concurrency level is low, SQLite can cope with it.
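To illustrate the low-concurrency case in Ruby (an assumption, since the question may be .NET-based; the sqlite3 gem is used here, and table and file names are hypothetical), the usual pattern is one connection per thread plus a busy timeout:

require 'sqlite3'

setup = SQLite3::Database.new("app.db")
setup.execute("CREATE TABLE IF NOT EXISTS events (worker INTEGER)")
setup.close

# Each thread opens its own connection instead of sharing one handle.
threads = 4.times.map do |i|
  Thread.new do
    db = SQLite3::Database.new("app.db")
    db.busy_timeout = 1000  # wait up to 1s when another thread holds the write lock
    db.execute("INSERT INTO events (worker) VALUES (?)", [i])
    db.close
  end
end
threads.each(&:join)

Writers still serialize on SQLite's database-level lock, so this pattern only helps while write contention stays genuinely low.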
I would also suggest you take a look at the CUBRID database. It has nice optimizations for web applications and it is easy to learn.

ORM: Handwritten schema or auto-generated?

Should I use a hand-written schema for my project (developed in a high-level language such as Python or Ruby), or should I let my ORM solution auto-generate it?
Eventually I will need to migrate without destroying all the data. It's okay to be tied to a specific RDBMS but it would be nice if features such as constraints and procedures could be supported somehow.
I never go with ORM-generated schema.
I find that the way an ORM wants to generate the schema is often at total odds with how I want my database to be structured. Also, and I know this is trivial, the naming scheme is usually poor.
Database structure has its own constraints, which I find ORM auto-generation tools usually don't consider fully. And if you're going to want to run reports on your database later (and you will), good database structure and design is very important.
See this Coding Horror article and links for discussion on that migration you'll eventually need to do. Plan for it now.
Also see Martin Fowler on database evolution; I particularly recommend the notion that test data generation is part of database set-up. The idea may be a little underdeveloped, in that there is not a clear delineation of the different problems in different environments, development versus QA versus production.
Let the ORM generate the schema it wants. You can always change things that are too slow or that you want done differently, but it lets you get started quickly and have something working. Plus, the ORM people usually know what they're doing when it comes to generating schemas.
Let your ORM solution generate it, but don't just blindly use it; read through it and sanity-check it.
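One way to do that review, sketched here with DataMapper as just one example (names are illustrative): turn on the ORM's logger and read the DDL it emits before trusting it.

require 'dm-core'
require 'dm-migrations'

DataMapper::Logger.new($stdout, :debug)  # echo generated SQL to stdout
DataMapper.setup(:default, "sqlite::memory:")

class Article
  include DataMapper::Resource
  property :id,    Serial
  property :title, String, :length => 200
end

DataMapper.finalize
DataMapper.auto_migrate!  # the generated CREATE TABLE statements appear in the log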