I've got to the point in development where I need to set up a database migration tool. So far I've checked out some Stack Overflow answers on the subject, and it seems like sails-db-migrate would be a good choice: db-migrate, which it's based on, has decent docs, and it was recommended by one of the Balderdash guys.
My problem here is this: I have two databases related to my app, one where we store data like users and models related to our devices, and one where we store the data collected by those devices, like movement data or power-on times. I've been able to set this up in Sails pretty easily in connections.js, but I found no mention of using multiple DBs in the sails-db-migrate or db-migrate docs.
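Roughly, my config/connections.js looks something like this (connection names, adapters and credentials are just placeholders):

// config/connections.js
module.exports.connections = {
  coreDb: {
    adapter: 'sails-mysql',   // users, device models, etc.
    host: 'localhost',
    user: 'app',
    password: 'secret',
    database: 'core'
  },
  telemetryDb: {
    adapter: 'sails-mysql',   // movement data, power-on times, etc.
    host: 'localhost',
    user: 'app',
    password: 'secret',
    database: 'telemetry'
  }
};

Each model then just picks its connection with connection: 'telemetryDb' in its definition.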
Does anyone have experience in how to deal with this situation?
Related
I am currently working on a private project that is going to use Google's GTFS spec to get information about hundreds of public transit agencies, their routes, stations, times, and other related information. I will be getting my information from here and from the Google Code wiki page with similar info. There is a lot of data, and it's partitioned into multiple CSV-formatted text files. These can be huge, some ranging from 80-100 MB of data.
With the data I have, I want to translate it all into a nice solid database that I can build layers on top of to use for my project. I will be using GPS positioning to pinpoint a location and all surrounding stations/stops.
My goal is to access all the information for all these stops and stations with as few calls as possible, while keeping datasets small for queried results.
I am currently leaning towards MongoDB or CouchDB for their geospatial support, which can really optimize getting small datasets. But I also need to be sure to link all the stops on a route, because I will be propagating information along a transit route for that line. In this case I have found that I could benefit from a graph DB like Neo4j or OrientDB, but from what I know neither has geospatial support, nor am I 100% sure that a graph DB is what I need.
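For example, the kind of nearby-stop lookup I have in mind would be something like this in MongoDB (collection and field names are just placeholders):

// One-time: index stop locations stored as GeoJSON points
db.stops.createIndex({ location: '2dsphere' });

// Find all stops within 500 m of the user's GPS position
db.stops.find({
  location: {
    $near: {
      $geometry: { type: 'Point', coordinates: [24.94, 60.17] },  // [lng, lat]
      $maxDistance: 500
    }
  }
});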
The perfect solution might not exist, but I come here asking for help in finding the best possible one for my situation. I know I will possibly have to work around limitations of whatever I choose, but I want to at least have done my research and know that it's the best I can get at the moment.
It has also been suggested that I split the data into multiple DBs, but that could get very messy because all the information is very tightly interconnected through IDs.
Any help would be appreciated.
Obviously a graph database fits your problem 100%. My advice here is to go for a geospatial module on top of Neo4j or OrientDB, although there are a few other free and open-source implementations.
I think the best one right now, with the geospatial features implemented, is the neo4j-spatial package. But as far as I know, you can also reproduce most of the geospatial functionality on your own if necessary.
BTW, talking about splitting: if the amount of data/queries is going to be high, I strongly recommend that you share the load and think about the model in those terms. You can surely work something out.
I've used Mongo's geospatial features and can offer some guidance if you need help with a C# or JavaScript implementation - I would recommend it to start with because it's super easy to use. I'm learning all about Neo4j right now, and I am working on a hybrid approach that takes advantage of both Mongo and Neo4j. You might want to cross-reference the documents in Mongo to the nodes in Neo4j using the Mongo object id.
For my hybrid implementation, I'm storing profiles and any other large static data in Mongo. In Neo4j, I'm storing relationships like friend and friend-of-friend. If I wanted to analyze which movies two friends are most likely to want to watch together (or really any other relationship I hadn't thought of initially), then by keeping that object id reference I can simply add some code instructing each node to go out and grab a list of movies from the related profile.
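Just to make that cross-referencing idea concrete, here's a rough sketch in Node using present-day drivers (the labels, relationship types and field names are illustrative, not from any particular project):

const { MongoClient, ObjectId } = require('mongodb');
const neo4j = require('neo4j-driver');

async function moviesToWatchTogether(mongoIdA, mongoIdB) {
  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  const profiles = mongo.db('app').collection('profiles');
  const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'secret'));
  const session = driver.session();
  try {
    // Each Neo4j node carries the Mongo ObjectId of its profile document.
    const friends = await session.run(
      'MATCH (a:User {mongoId: $a})-[:FRIEND]-(b:User {mongoId: $b}) RETURN a, b',
      { a: mongoIdA, b: mongoIdB }
    );
    if (friends.records.length === 0) return [];  // not friends, nothing to suggest

    // Follow the references back into Mongo for the large, static profile data.
    const [a, b] = await profiles
      .find({ _id: { $in: [new ObjectId(mongoIdA), new ObjectId(mongoIdB)] } })
      .toArray();
    // Naive suggestion: movies that appear in both profiles.
    return (a.movies || []).filter((title) => (b.movies || []).includes(title));
  } finally {
    await session.close();
    await driver.close();
    await mongo.close();
  }
}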
Added 2011-02-12:
Just wanted to follow up on this "hybrid" idea, as I have recently prototyped and implemented a few more solutions where I ended up using more than one database. Martin Fowler refers to this as "Polyglot Persistence."
I'm finding that I am often using a combination of a relational database, document database and a graph database (in my case this is generally SQL Server, MongoDB and Neo4j). Since the question is related to data modeling as much as it is to geospatial, I thought I would touch on that here:
I've used Neo4j for site organization (similar to the idea of hypermedia in the REST model), modeling social data and building recommendations (often based on social data). As a result, I will generally model this part of the application before I begin programming.
I often end up using MongoDB for prototyping the rest of the application because it provides such a simple persistence mechanism. I like to start developing an application with the user interface, so this ends up working well.
When I start moving entities from Mongo to SQL Server, the context is usually important - for instance, if I have an application that allows users to build daily reports based on periodically collected data, it may make sense to run a procedure that builds those reports each night and stores daily report objects in Mongo, which may be combined into larger aggregate reports as needed (obviously this doesn't cover a few special cases, but that is not relevant to the point)... on the other hand, if users need to pull on-demand reports limited to very specific time periods, it may make sense to keep everything in SQL Server and build those reports as needed.
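To make that first option concrete, a minimal sketch of the nightly job and the on-demand combination might look like this (the collection names and report shape are my own assumptions, not from a real project):

const { MongoClient } = require('mongodb');

// Nightly job: reduce the day's raw measurements into one small report document.
async function buildDailyReport(db, day) {
  const totals = await db.collection('measurements').aggregate([
    { $match: { day } },
    { $group: { _id: '$deviceId', events: { $sum: 1 } } }
  ]).toArray();
  await db.collection('dailyReports').updateOne(
    { day },
    { $set: { day, totals } },
    { upsert: true }
  );
}

// On demand: combine the precomputed daily documents instead of touching raw data.
async function combinedReport(db, fromDay, toDay) {
  return db.collection('dailyReports')
    .find({ day: { $gte: fromDay, $lte: toDay } })
    .toArray();
}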
That said, and this deserves more intense thought, here are some considerations that may be helpful:
I generally try to store entities in a relational database if I find that pulling an entity from the database [in other words, in the context of a relational database, querying the data required to generate an entity or list of entities that fulfills the requested parameters] does not require significant processing (multiple joins, for instance).
Do you require ACID compliance? (Aside: if you have a graph problem, you can leverage Neo4j for this.) There are document databases with ACID compliance, but there's a reason Mongo is not: What does MongoDB not being ACID compliant really mean?
One use of Mongo I saw in the wild that I thought was worthy of mention - Hadoop was being used to compute massive hash tables that were then stored in Mongo. I believe a similar approach is used by TripAdvisor for user-based customization in terms of targeting offers, advertising, etc.
NoSQL only exists because MySQL users assume that all databases suffer from MySQL's performance problems when the database grows large and/or becomes complex.
I suggest that you use PostGIS. You can use the same database for the rest of your data needs as well.
http://postgis.refractions.net/
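For instance, the nearby-stops lookup from the question could look something like this with PostGIS from Node (the table and column names are hypothetical, and the geom column is assumed to be a geography):

const { Pool } = require('pg');
const pool = new Pool({ connectionString: 'postgres://localhost/transit' });

// Find every stop within the given radius (metres) of a GPS position.
async function stopsNear(lng, lat, metres) {
  const { rows } = await pool.query(
    `SELECT stop_id, stop_name
       FROM stops
      WHERE ST_DWithin(geom, ST_MakePoint($1, $2)::geography, $3)`,
    [lng, lat, metres]
  );
  return rows;
}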
This may be a pipe dream, but I'm hoping someone knows of a tool which can be configured to compare all or some (keys) of the data in two identically structured databases and merge them, perhaps based on relationships.
Specifically looking for one for SQL Server.
I'm not really asking for the best one, but if it exists it would be nice to hear how it is used.
Any other ideas for how to manage the work done or data added in dev and push it out to production without copying the entire database are welcome.
Thanks!
We use this and personally think it's excellent.
http://www.red-gate.com/products/sql-development/sql-data-compare/
There is also another product for the schema side.
http://www.red-gate.com/products/sql-development/sql-compare/
I don't know of a specific tool, but you can build into your publication process the analysis and execution of delta files containing the diffs from one version to another. Magento and WordPress, for example, use something like this. They have something like this:
-- sql_update_001_002.sql
UPDATE some_table SET some_column = 'new value' WHERE id = 42;
DELETE FROM some_table WHERE is_obsolete = 1;
CREATE TABLE a_new_table (id INT PRIMARY KEY, name VARCHAR(255));
-- compare some keys or do other logic
-- etc.
Then they have a script that analyses the current version and, if needed, executes the corresponding SQL.
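A minimal sketch of that kind of script in Node, assuming a schema_version table and the file-naming convention above (the mssql driver, the table and the connection string are my own assumptions):

const fs = require('fs');
const path = require('path');
const sql = require('mssql');

// Apply every sql_update_<from>_<to>.sql whose target version is above the current one.
async function applyDeltas(dir) {
  const pool = await sql.connect('mssql://user:pass@localhost/mydb');
  const { recordset } = await pool.request()
    .query('SELECT MAX(version) AS v FROM schema_version');
  const current = recordset[0].v || 0;

  const files = fs.readdirSync(dir)
    .filter((f) => /^sql_update_\d+_\d+\.sql$/.test(f))
    .sort();

  for (const f of files) {
    const target = parseInt(f.match(/_(\d+)\.sql$/)[1], 10);
    if (target <= current) continue;
    await pool.request().batch(fs.readFileSync(path.join(dir, f), 'utf8'));
    await pool.request()
      .query(`INSERT INTO schema_version (version) VALUES (${target})`);
  }
  await pool.close();
}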
Navicat lets you do data and structure synchronization between two databases (even ones located on different servers).
In terms of tools, I agree with Chris - Redgate's toolset covers both schema and data comparisons.
If you are also thinking about your overall DB development process, then I have written a blog post on the topic which might be of interest.
It also has some links to how others have tackled this subject.
http://michaelbaylon.wordpress.com/category/data-management/database-development/sql-script-management/
My project (in Ruby on Rails 3) is to develop a "social network" site with the following features:
Users can be friends. Friendships are mutual, not asymmetric like on Twitter.
Users can publish links, to share them. Friends of a user can see what this user has shared.
Friends can comment on those shared links.
So basically we have Users, Links, and Comments, and all of that is connected. An interesting thing about social networks is that the User table has kind of a many-to-many relation with itself.
I think I can handle that level of complexity with SQL and RoR.
My question is: would it be a good idea to use MongoDB (or CouchDB) for such a site?
To be honest, I think the answer is no. MongoDB doesn't seem to fit really well with many-to-many relationships. I can't think of a good MongoDB way to implement the friendship relationships. And I've read that Diaspora started with MongoDB but then switched back to classic SQL.
But some articles on the web defend MongoDB for social networks, and above all I want to make a well-informed decision, and not miss a really cool aspect of MongoDB that would change my life.
Also, I've heard about graph DBs, which are probably great, but they really seem too young to me, and I don't know how they'd fit with RoR (not to mention Heroku).
So, am I missing something?
My advice would be to use whatever you're most familiar with so that you can get up and running quickly. From your question it sounds like that would be SQL rather than MongoDB.
I like MongoDB and use it a lot, but I am of the opinion that if you are dealing with relational data, you should use the right tool for it. We have relational databases for that. Mongo and Couch are document stores.
Mongo has a serious disadvantage if you are going to be maintaining a lot of inter-document links. Writes are only guaranteed to be atomic for one document. So you could have inconsistent updates for relations if you are not careful with your schema.
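For example, something as simple as recording a mutual friendship touches two documents, so it cannot be done in one atomic write (a small sketch, with made-up collection and field names):

const { MongoClient } = require('mongodb');

async function addFriendship(aliceId, bobId) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const users = client.db('app').collection('users');
  try {
    await users.updateOne({ _id: aliceId }, { $addToSet: { friends: bobId } });
    // If the app crashes right here, Alice lists Bob as a friend but not the
    // other way around - exactly the kind of inconsistent relation meant above.
    await users.updateOne({ _id: bobId }, { $addToSet: { friends: aliceId } });
  } finally {
    await client.close();
  }
}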
The good thing about MongoDB is that it is very good at scaling. You can shard and create replica sets. Foursquare currently uses MongoDB and it has been working pretty well for them. MongoDB also does map-reduce and has decent geospatial integration. The team that develops MongoDB is excellent; I live in NY where they are based and have met them. You probably aren't going to have scaling issues starting out, though, I would think.
As far as Diaspora switching... I would not want to follow anything they are doing :)
Your comment about graph DBs is interesting, though. I would probably not use a graph DB as my primary DB either, but when dealing with relationships you can do amazing things with them. In fact, the demo the guys from graph DB companies usually give is extracting relationship knowledge from a social network. However, there is nothing preventing you from playing with these in the future for network analysis.
In conclusion, when you are starting out here, you are not running into the problems of massive scale yet, and are probably limited on time and money. Keep in mind that even Facebook does not use just one technology; they have basically expanded to NoSQL for certain functionality (like Facebook messaging). There is nothing stopping you in the future from using, say, Mongo and GridFS for handling image uploads, or geolocation, etc. It is good to grow as your needs change. I think your gut feeling that you have an SQL app here is right, and the benefits gained with MongoDB would not be realized for a while.
An interesting possibility for this is Riak. It is a cross between a key-value store and a graph database. The userID could be your key and comments and links could be stored in your value. But Riak also has key-to-key linking. This could be used for connecting up your users as friends. The linking is asymmetric so you would need to deal with adding and deleting in both directions, but that shouldn't be too hard.
But note that Riak is not a document datastore, which means it is value-agnostic: it won't help you extract internal parts of your value, so if you want to pull a single comment out you'll first need to retrieve every comment and link stored with the user.
You may also want to check out other graph databases. Social graphs are the prototypical use-case for a graph database.
Sorry, but you've got to start learning about graphs. Use a graph database that spits out data in JSON format and write it to your MongoDB database. You are dealing with a network, which means there must be a graph of some sort that will allow you to create an ego (a user, for example), focal point, or logical center, which is connected to the other nodes around it. Nodes could be properties of a particular user such as name, likes, friends, pictures, etc. An ego graph would resemble a hub and spokes if you were to draw it.
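A rough sketch of that idea, pulling a user's ego graph out of a graph DB (Neo4j here) and writing it into MongoDB as JSON - the labels, relationship types and connection details are all just placeholders:

const neo4j = require('neo4j-driver');
const { MongoClient } = require('mongodb');

async function cacheEgoGraph(username) {
  const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'secret'));
  const session = driver.session();
  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  try {
    // The ego (the user) plus everything directly connected to it.
    const result = await session.run(
      'MATCH (ego:User {name: $name})-[:FRIEND]-(friend:User) ' +
      'RETURN ego.name AS ego, collect(friend.name) AS friends',
      { name: username }
    );
    const record = result.records[0];
    if (!record) return null;
    const egoGraph = { ego: record.get('ego'), friends: record.get('friends') };
    // Materialise the hub-and-spokes view as one JSON document for cheap reads.
    await mongo.db('app').collection('egoGraphs')
      .updateOne({ ego: egoGraph.ego }, { $set: egoGraph }, { upsert: true });
    return egoGraph;
  } finally {
    await session.close();
    await driver.close();
    await mongo.close();
  }
}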
For further reading, you can refer to how Facebook implemented it's Graph API - https://developers.facebook.com/docs/graph-api/overview/
Here is an attempt to connect Mongodb and Neo4j graph database - https://neo4j.com/developer/mongodb/
I am looking for an overview of data synchronization techniques available on the iPhone platform. We need the ability to be able to sync a subset of content from a server to a local database residing on the iPhone.
On other projects I have worked on, the data synchronization was handled by the database. Is that available in SQLite? If not, any suggestions on techniques? Rolling our own would not be my first choice.
Thanks in advance.
Unfortunately, I don't think there's currently a tool/feature that does this. A similar question was posted, which links to an article on how someone rolled their own.
In essence, you could create a "pending" queue table that keeps track of the record IDs that need to be updated. When the app needs to sync, as long as you have a way to match the local record to the server record, it can sync them that way. And of course the same in the opposite direction, from the server down.
iPhone SQLite DB and Web-based DB synchronization and interaction recommendations
There is one framework that could help you if you are using Core Data.
You should have a look at ZSync. An overview is available here:
http://ideveloper.tv/freevideo/details?index=17089146
If you are using SQLite, I strongly suggest that you consider switching to Core Data. You will certainly gain some performance, plus the integration of undo/redo.
You'll have to remove a whole bunch of custom code anyway... :)
I was searching the net for something like a wiki database, just like Wikipedia but one that stores structured content, editable by users. What I was looking for was an online database accessible to everyone, where people can design the schema and the data, with proper versioning of both. I couldn't find any such site. I am not sure if it is my search skills or if there really is no wiki database as of now. Does anyone out there know of anything like this?
I think there is great potential for something like this. A possible example would be a website with a GUI for querying a MySQL DB, where any website visitor can create DB objects and populate data.
UPDATE: I registered the domain wikidatabase.org to get started on a tool, but I haven't found enough time yet. If anyone is interested in spending some time coding on this, please let me know at wikidatabase.org
It's not quite what you're looking for, but Semantic Mediawiki adds database-like features to MediaWiki:
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
It's still fundamentally a Wiki, but you can add semantic tags to pages ([[foo::bar]] [[baz::1000]]) and then do database-type queries across them: SELECT baz FROM pages WHERE foo=bar would be {{#ask: [[foo::bar]] | ?baz}}. There is even an embryonic SPARQL implementation for pseudo-SQL queries.
OK this question is old, but Google led me here, so for anyone else out there looking for a wiki for structured data: Take a look at Foswiki.
This might be like what you're looking for: dbpedia.org. They're working on extracting data from Wikipedia, and encoding it in a structured format using RDF, so that it can be queried using SPARQL.
Linkeddata.org has a big list of RDF data sets.
Do you mean something like http://www.freebase.com?
You should check out https://www.wikidata.org/wiki/Wikidata:Main_Page which is a bit different but still may be of interest.
Something that might come close to your requirements is Google Docs.
What's offered is document editing roughly similar to MS Word, and spreadsheets roughly similar to Excel. I'm thinking of the latter, of course.
In Google Docs, you can create spreadsheets for free; being spreadsheets, they naturally have a row-and-column structure similar to a database, which you can define flexibly. You can also share these sheets with other people. This seems to be a by-invite-only process rather than open-to-all, but there may be other possibilities I'm not aware of, or that level of sharing might be enough for you in any case.
MindTouch should be able to do it. It's rather easy to get data in / out (for example, it's trivial to aggregate all the IPs for servers into one table).
I pretty much use it as a DB in the wiki itself (pages have tables, key/value pairs, inheritance, templates, etc.), but you can also interface with the API, write DekiScript, grab the XML...
I like this idea. I have heard of some sites that are trying to pull together large datasets for various things for open consumption, but none that would allow a wiki feel.
You could start with something as simple as an installation of phpMyAdmin with a known password that would allow people to log in, create a database, edit data and query from any other site on the web.
It might suffer from more accuracy problems than Wikipedia, though.
OpenRecord, development of which seems to have halted in 2008, seems to approach this. It is a structured wiki in which pages are views on the data. Unlike RDBMSes, it is loosely typed - the system tries to make a best guess about what data you entered, but defaults to text when it cannot guess. Schemas appear to be implicit rather than declared.
http://openrecord.org
An example of the typing that is given is that of a date. If you enter '2008' in a record, the system interprets this as a date. If you enter 'unknown' however, the system allows that as well.
Perhaps you might be interested in CouchDB:
Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.
I'm working on an Open Source PHP / Symfony / PostgreSQL app that does this.
It allows multiple projects, each project can have multiple directories, each directory has a defined field structure. Admins set all this up.
Then members of the public can suggest new records, edit or report existing ones. All this is moderated and versioned.
It's early days yet but it basically works and is already in real world use in several projects.
Future plans already in progress include tools to help keep the data up to date, better searching/querying and field types that allow translations of content between languages.
There is more at http://www.directoki.org/
I'm surprised that nobody has mentioned Wikibase yet, which is the software that powers Wikidata.