Is there a tool which I can use to inspect and visualize my SQL database?
I'm using MySQL and MySQL Workbench. MySQL Workbench is fine, but I would like to be able to see my db as a graph of objects.
For example, if I have schools, professors and students, the tool would have to figure out the relations and give me a tree structure (or a forest) of schools that have professors as children, with students as the leaves of the tree.
In the general case it would be a graph.
This looks like a common problem to me, but I could not find any good tool for it.
It does not have to be specifically for MySQL, any other SQL db would be good for me.
If I understand correctly, you want to navigate the data in your relational data source along with its relationships.
This is not a task that relational databases perform well: JOINs in particular are the problem.
The more JOINs a query has, the slower it tends to be, and navigating data this way usually takes a lot of JOINs.
That said, I have tried to give an overview of the JS tools for this in this answer: Big data visualization using "search, show context, and expand on demand" concept
In addition, the developers in my company just released a new post on our blog about visualizing networks from different data sources: while the post talks about KeyLines, the process is pretty much the same for every JS framework out there; the complexity will change mostly when you have to implement the visualization itself.
Disclaimer: I'm part of the KeyLines dev team.
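As a rough illustration of the "figure out the relations" step from the question, the sketch below reads the foreign keys of a database and turns them into a parent-to-child graph of tables. It is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL (where the same information lives in information_schema.KEY_COLUMN_USAGE), and the schema, foreign_key_graph and print_tree are made up for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical demo schema mirroring the schools/professors/students example.
SCHEMA = """
CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE professors (
    id INTEGER PRIMARY KEY, name TEXT,
    school_id INTEGER REFERENCES schools(id)
);
CREATE TABLE students (
    id INTEGER PRIMARY KEY, name TEXT,
    professor_id INTEGER REFERENCES professors(id)
);
"""


def foreign_key_graph(conn):
    """Map each referenced (parent) table to the tables that point at it."""
    graph = defaultdict(list)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # Each row is (id, seq, referenced_table, from_col, to_col, ...).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            graph[fk[2]].append(table)
    return dict(graph)


def print_tree(graph, table, depth=0):
    """Print the tables reachable from `table`; assumes no cyclic references."""
    print("  " * depth + table)
    for child in graph.get(table, []):
        print_tree(graph, child, depth + 1)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
graph = foreign_key_graph(conn)
print_tree(graph, "schools")
```

This only recovers the graph of tables; walking the actual rows would still need one query per edge, which is where the JOIN cost mentioned above comes in.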
After my first try to misappropriate Ms-Access - with your help - turned out to be a great success, I have been sent back to do "more of this".
A bit of introduction you can skip if you want:
I am building a data foundation about certain projects from which I want to create analyses and overviews.
The data and findings are to be represented in programs like Excel or Powerpoint, so the process itself is very open. It will probably be very visual with detailed points on request.
However, the data might be changing periodically and if this turns out well, I might repeat the process.
Therefore I think the ideal way would be to have a data layer, then a fixed set of queries on that data and then I would (semi-)manually compile the results into a report in whatever format fits, maybe using external data analysis tools such as R in between.
Trouble is, the only database I have access to is.. well.. Ms Access 2010. I am not at liberty to install anything on this machine.
I could of course use non-install or online tools if you have recommendations for this.
tl;dr: I want to use Ms Access to query data from a relational db into tabular format to be processed further by hand, using as little of Ms-Access VBA and forms as possible.
I have since started to implement a prototype in ms-Access, a standard relational database.
One interesting problem I have come up with with this kind of design is that I have a table for companies involved in the projects. Along with this, I have a table of "relationship" - like stakeholding, ownerships or cooperations.
So let's say company A is building project A, but is just a subsidiary of company B, which is in turn partly owned by company C, and so on.
Now let's say I want to query all companies involved in a project, but as owners I just want to show the last "elements" of the chain.
Imagine I want to sort the list by net assets, which is usually a figure that is only available for the public companies at the end of the chain, not the project subsidiaries further up the chain.
Is this possible with (Ms-)SQL or would I need to do this in VBA?
Right now I think I could manage to write a VBA function and dump its result into a temporary table, but then I'd have to create forms and such.
Another idea that immediately springs from this is to answer the question "In which projects does company C have a stake?" with a query. You can see where this is going.
I would prefer the database and the queries to be as flexible as possible (and in this case, independent of Access itself).
So this time, no mock-program or user-interface. It was a pain to get what I want from Access in the last project and that was with a very specific question set...
But in general I am also open to use different tools if I can.
Thank you so much!
Modelling hierarchies in an RDBMS is a fairly tricky process - some (like Oracle) have built-in functionality to query hierarchical data, but I don't think Access does.
The best solution is to use a "nested set" model. This allows you to model hierarchical data while using standard SQL; it's also pretty fast for querying.
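To make the nested set idea concrete, here is a minimal sketch. It assumes a strict tree (each company has at most one owner) and uses Python's sqlite3 module rather than Access; the company table, its lft/rgt columns and the company names are invented for the illustration. Each node stores a left and right number, and a node's descendants are exactly the rows whose numbers fall inside its own pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER);
-- Holding owns SubA and SubB; SubA owns ProjectCo (hypothetical names).
INSERT INTO company VALUES
    ('Holding',   1, 8),
    ('SubA',      2, 5),
    ('ProjectCo', 3, 4),
    ('SubB',      6, 7);
""")


def descendants(conn, name):
    """Everything below `name`: rows nested inside its (lft, rgt) interval."""
    rows = conn.execute("""
        SELECT child.name
        FROM company AS child
        JOIN company AS parent
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.name = ?
        ORDER BY child.lft
    """, (name,))
    return [row[0] for row in rows]


def ancestors(conn, name):
    """Everything above `name`: rows whose interval encloses it, root first."""
    rows = conn.execute("""
        SELECT parent.name
        FROM company AS child
        JOIN company AS parent
          ON parent.lft < child.lft AND parent.rgt > child.rgt
        WHERE child.name = ?
        ORDER BY parent.lft
    """, (name,))
    return [row[0] for row in rows]


print(descendants(conn, "Holding"))
print(ancestors(conn, "ProjectCo"))
```

Both queries are plain joins with no recursion, which is why the model reads quickly; the cost is that inserting a node means renumbering part of the tree.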
If your data isn't strictly hierarchical, the nested set isn't so useful; the typical solution in that case is to introduce a table to map the relationship, typically including the two related entities and often a "relationship type" field (e.g. "parent", "part owner" etc.). The resulting structure is often called a Directed Acyclic Graph or DAG. There are several ways of modelling these in a database; a "closure table" is probably the most efficient. This article shows how to do this - it's a heavy read, but I think it answers your question.
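A minimal closure table sketch for the ownership chains in the question follows. It is an illustration under assumptions, not the method from the linked article: it uses Python's sqlite3 module instead of Access, the ownership and closure tables and the helper names are hypothetical, and the closure is simply rebuilt from scratch in Python rather than maintained incrementally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Direct relationships: B owns A; C and D each own part of B.
CREATE TABLE ownership (owner TEXT, owned TEXT);
INSERT INTO ownership VALUES ('B', 'A'), ('C', 'B'), ('D', 'B');
-- One row per (direct or indirect) owner/owned pair.
CREATE TABLE closure (ancestor TEXT, descendant TEXT);
""")


def rebuild_closure(conn):
    """Recompute all direct and indirect ownership pairs from `ownership`."""
    edges = conn.execute("SELECT owner, owned FROM ownership").fetchall()
    pairs = set(edges)
    frontier = set(edges)
    while frontier:
        # Extend every known path by one more ownership edge.
        step = {(anc, owned)
                for anc, mid in frontier
                for owner, owned in edges
                if owner == mid} - pairs
        pairs |= step
        frontier = step
    conn.execute("DELETE FROM closure")
    conn.executemany("INSERT INTO closure VALUES (?, ?)", sorted(pairs))


def ultimate_owners(conn, company):
    """Owners at the end of the chain: ancestors that nobody else owns."""
    rows = conn.execute("""
        SELECT ancestor FROM closure
        WHERE descendant = ?
          AND ancestor NOT IN (SELECT owned FROM ownership)
        ORDER BY ancestor
    """, (company,))
    return [row[0] for row in rows]


def holdings_of(conn, company):
    """Everything `company` has a direct or indirect stake in."""
    rows = conn.execute(
        "SELECT descendant FROM closure WHERE ancestor = ? ORDER BY descendant",
        (company,))
    return [row[0] for row in rows]


rebuild_closure(conn)
print(ultimate_owners(conn, "A"))
print(holdings_of(conn, "C"))
```

Once the closure table exists, both questions from the post ("who sits at the end of the chain?" and "where does company C have a stake?") become single non-recursive queries, which should also be expressible in Access SQL.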
I like to use a GUI application to design databases using ERD. Currently I am using the EER Diagram of the free MySQLWorkbench.
Once I like the way the ERD looks, I Forward Engineer the ERD in MySQLWorkbench to create the actual database. Then I introspect the MySQL database with django-admin.py inspectdb to Reverse Engineer into an output of a Python snippet code for Django's models.py.
But then I have to take the inspectdb output and manually edit it to my liking. One particular part I really don't like to do is manually eliminating each join table from a many-to-many relationship.
Is there a good (and preferably free) GUI ERD design program out there specifically designed for Django?
If you wish to design your models at the database level, in the way that you're describing, you are going to need to do exactly that: design your SQL and then convert that into Django models.
This is not the normal way of designing a Django application: typically you would design the models as you wanted them to be, and only put a lot of effort into schema design if you need to resolve some performance problems. Django models aren't really meant to be an abstraction of a relational database: they're meant to be an abstraction of your application's persisted objects, which happens to be implemented on top of a relational database.
There is nothing wrong with wanting/needing to do an explicit schema design, but it makes you a bit of an outlier (most web developers don't), hence the difficulty you're having finding tools suited to your needs.
The closest thing is the graph_models command which is part of Django command extensions. This lets you visualize your models (you still write them in python code, but the visual representation will help you iterate faster).
It is not usual to design a database layout for django apps as such, and doing so is likely to lead to a sub-optimal design.
Instead, just design your model classes, taking account of how you will query them, and the way that ForeignKey (and the other relations types) work. If you don't do this, you are likely to find that your app suffers from conceptual mismatch.
I am currently working on a private project that is going to use Google's GTFS spec to get information about hundreds of public transit agencies, their routes, stations, times, and other related information. I will be getting my information from here and the google code wiki page with similar info. There is a lot of data and it's partitioned into multiple CSV-formatted text files. These can be huge, some in the range of 80-100 MB.
With the data I have, I want to translate it all into a nice solid database that I can build layers on top of to use for my project. I will be using GPS positioning to pinpoint a location and all surrounding stations/stops.
My goal is to access all the information for all these stops and stations with as few calls as possible, while keeping datasets small for queried results.
I am currently leaning towards MongoDB and CouchDB for their GeoSpatial support that can really optimize getting small datasets. But I also need to be sure to link all the stops on a route because I will be propagating information along a transit route for that line. In this case I have found that I can benefit from a Graph DB like Neo4j and OrientDB, but from what I know, neither has GeoSpatial support nor am I 100% sure that a Graph DB would be what I need.
The perfect solution might not exist, but I come here asking for help finding the best possible one for my situation. I know I will possibly have to work around the limitations of whatever I choose, but I want to at least have done my research and know that it's the best I can get at the moment.
I have also been suggested to splinter the data into multiple DBs, but that could get very messy because all the information is very tightly interconnected through IDs.
Any help would be appreciated.
A graph database is a natural fit for your problem. My advice is to use a geospatial module on top of Neo4j or OrientDB, although there are other free and open source implementations as well.
I think the most complete one right now, with the geospatial features already implemented, is the neo4j-spatial package. As far as I know, you can also reproduce most of the geospatial functionality on your own if necessary.
By the way, about splitting: if the volume of data or queries is going to be high, I strongly recommend sharing the load and designing the model with that in mind. There is certainly something you can do there.
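As a hedged illustration of "reproducing the geospatial part on your own", the sketch below finds stops within a radius of a GPS position using the haversine great-circle distance. It is plain Python with no database; the field names stop_id, stop_lat and stop_lon follow GTFS stops.txt, while the sample stops, the function names and the linear scan are assumptions made for the example (a real dataset would want a spatial index instead of scanning every stop).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def stops_within(stops, lat, lon, radius_km):
    """Return the ids of stops within `radius_km`, nearest first."""
    hits = [(haversine_km(lat, lon, s["stop_lat"], s["stop_lon"]), s["stop_id"])
            for s in stops]
    return [stop_id for dist, stop_id in sorted(hits) if dist <= radius_km]


# Hypothetical sample rows, shaped like parsed lines of a GTFS stops.txt file.
stops = [
    {"stop_id": "S1", "stop_lat": 40.7500, "stop_lon": -73.9900},
    {"stop_id": "S2", "stop_lat": 40.7550, "stop_lon": -73.9900},
    {"stop_id": "S3", "stop_lat": 41.0000, "stop_lon": -73.9900},
]

nearby = stops_within(stops, 40.7500, -73.9900, radius_km=1.0)
print(nearby)
```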
I've used Mongo's GeoSpatial features and can offer some guidance if you need help with a C# or javascript implementation - I would recommend it to start because it's super easy to use. I'm learning all about Neo4j right now and I am working on a hybrid approach that takes advantage of both Mongo and Neo4j. You might want to cross reference the documents in Mongo to the nodes in Neo4j using the Mongo object id.
For my hybrid implementation, I'm storing profiles and any other large static data in Mongo. In Neo4j, I'm storing relationships like friend and friend-of-friend. If I wanted to analyze movies two friends are most likely to want to watch together (or really any other relationship I hadn't thought of initially), by keeping that object id reference I can simply add some code instructing each node go out and grab a list of movies from the related profile.
Added 2011-02-12:
Just wanted to follow up on this "hybrid" idea as I created prototypes for and implemented a few more solutions recently where I ended up using more than one database. Martin Fowler refers to this as "Polyglot Persistence."
I'm finding that I am often using a combination of a relational database, document database and a graph database (in my case this is generally SQL Server, MongoDB and Neo4j). Since the question is related to data modeling as much as it is to geospatial, I thought I would touch on that here:
I've used Neo4j for site organization (similar to the idea of hypermedia in the REST model), modeling social data and building recommendations (often based on social data). As a result, I will generally model this part of the application before I begin programming.
I often end up using MongoDB for prototyping the rest of the application because it provides such a simple persistence mechanism. I like to start developing an application with the user interface, so this ends up working well.
When I start moving entities from Mongo to SQL Server, the context is usually important - for instance, if I have an application that allows users to build daily reports based on periodically collected data, it may make sense to run a procedure that builds those reports each night and stores daily report objects in Mongo that may be combined into larger aggregate reports as needed (obviously this doesn't consider a few special cases, but that is not relevant to the point)...on the other hand, if users need to pull on-demand reports limited to very specific time periods, it may make sense to keep everything in SQL server and build those reports as needed.
That said, and this deserves more intense thought, here are some considerations that may be helpful:
I generally try to store entities in a relational database if pulling an entity from the database (that is, querying the data needed to build an entity or list of entities that matches the requested parameters) does not require significant processing, such as multiple joins.
Do you require ACID compliance? (Aside: if you have a graph problem, you can leverage Neo4j for this.) There are document databases with ACID compliance, but there's a reason Mongo is not: What does MongoDB not being ACID compliant really mean?
One use of Mongo I saw in the wild that I thought was worthy of mention - Hadoop was being used to compute massive hash tables that were then stored in Mongo. I believe a similar approach is used by TripAdvisor for user based customization in terms of targeting offers, advertising, etc..
NoSQL only exists because MySQL users assume that all databases have their performance problems when their database grows large and/or becomes complex.
I suggest that you use PostGIS. You can use the same database for the rest of your data needs as well.
http://postgis.refractions.net/
My project (in Ruby on Rails 3) is to develop a "social network" site with the following features:
Users can be friends. Friendships are mutual, not asymmetric like on Twitter.
Users can publish links, to share them. Friends of a user can see what this user has shared.
Friends can comment on those shared links.
So basically we have Users, Links, and Comments, and all that is connected. An interesting thing in social networks is that the User table has kind of a many-to-many relation with itself.
I think I can handle that level of complexity with SQL and RoR.
My question is: would it be a good idea to use MongoDB (or CouchDB) for such a site?
To be honest, I think the answer is no. MongoDB doesn't seem to fit really well with many-to-many relationships. I can't think of a good MongoDB way to implement the friendship relationships. And I've read that Diaspora started with MongoDB but then switched back to classic SQL.
But some articles on the web defend MongoDB for social networks, and above all I want to make a well-informed decision, and not miss a really cool aspect of MongoDB that would change my life.
Also, I've heard about graph DBs, which are probably great, but they really seem too young to me, and I don't know how they'd fit with RoR (not to mention Heroku).
So, am I missing something?
My advice would be to use whatever you're most familiar with so that you can get up and running quickly. From your question it sounds like that would be SQL rather than MongoDB.
I like MongoDB and use it a lot, but I am of the opinion that if you are dealing with relational data, you should use the right tool for it. We have relational databases for that. Mongo and Couch are document stores.
Mongo has a serious disadvantage if you are going to be maintaining a lot of inter-document links. Writes are only guaranteed to be atomic for one document. So you could have inconsistent updates for relations if you are not careful with your schema.
The good thing about MongoDB is that it is very good at scaling. You can shard and create replica sets. Foursquare currently uses MongoDB and it has been working pretty well for them. MongoDB also does map-reduce and has decent geospatial integration. The team that develops MongoDB is excellent, and I live in NY where they are based and have met them. Starting out, though, I would think you are probably not going to have scaling issues.
As far as Diaspora switching... I would not want to follow anything they are doing :)
Your comment about graph dbs is interesting though. I would probably not use a graph DB as my primary DB either, but when dealing with relationships, you can do amazing things with them. In fact usually the demo the guys from graph DB companies will give you is extracting relationship knowledge from a social network. However, there is nothing preventing you from playing with these in the future for network analysis.
In conclusion, when you are starting out here, you are not running into the problems of massive scale yet, and are probably limited on time and money. Keep in mind that even Facebook does not use just one technology, they have basically expanded to NoSQL for certain functionality (like Facebook messaging). There is nothing stopping you in the future from using say Mongo and gridFS for handling image uploads or geo-location etc. It is good to grow as your needs change. I think your gut feeling that you have an SQL app here is right, and the benefits gained with MongoDB would not be realized for a while.
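For what it's worth, the mutual friendship relation from the question (a many-to-many relation of the users table with itself) can be sketched in plain SQL as follows. This is an assumption-laden illustration, not a Rails schema: it uses Python's sqlite3 module, and the table names, column names and helper functions are invented for the example. Each friendship is stored once, with the smaller user id first, so the relation stays symmetric by construction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE friendships (
    user_a INTEGER NOT NULL REFERENCES users(id),
    user_b INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_a, user_b),
    CHECK (user_a < user_b)          -- one row per mutual friendship
);
CREATE TABLE links (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    url TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO links (user_id, url) VALUES
    (2, 'http://example.com/b'),
    (3, 'http://example.com/c');
""")


def add_friendship(conn, u, v):
    """Store the pair in canonical order so (u, v) and (v, u) are the same row."""
    a, b = sorted((u, v))
    conn.execute("INSERT OR IGNORE INTO friendships VALUES (?, ?)", (a, b))


def links_from_friends(conn, user_id):
    """URLs shared by any friend of `user_id`, whichever side of the pair they are on."""
    rows = conn.execute("""
        SELECT l.url
        FROM links AS l
        JOIN friendships AS f
          ON (f.user_a = ? AND f.user_b = l.user_id)
          OR (f.user_b = ? AND f.user_a = l.user_id)
        ORDER BY l.url
    """, (user_id, user_id))
    return [row[0] for row in rows]


add_friendship(conn, 2, 1)   # bob and alice become friends
add_friendship(conn, 1, 2)   # same friendship, ignored as a duplicate
print(links_from_friends(conn, 1))
```

Comments would hang off links in the same way, with a comments table holding a link id and a user id.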
An interesting possibility for this is Riak. It is a cross between a key-value store and a graph database. The userID could be your key and comments and links could be stored in your value. But Riak also has key-to-key linking. This could be used for connecting up your users as friends. The linking is asymmetric so you would need to deal with adding and deleting in both directions, but that shouldn't be too hard.
But note that Riak is not a document datastore, which means it is value-agnostic, which means it won't help you extract internal parts of your value, which means if you want to pull a comment out you'll need to first retrieve every comment and link that's stored with the user.
You may also want to check out other graph databases. Social graphs are the prototypical use-case for a graph database.
Sorry, but you have got to start learning about graphs. Use a graph database that outputs data in JSON format and write it to your MongoDB database. You are dealing with a network, which means there must be a graph of some sort that lets you create an ego (a user, for example), focal point, or logical center, which is connected to the other nodes around it. Nodes could be properties of a particular user such as name, likes, friends, pictures etc. An ego graph would resemble a hub and spokes if you were to draw it.
For further reading, you can refer to how Facebook implemented its Graph API - https://developers.facebook.com/docs/graph-api/overview/
Here is an attempt to connect Mongodb and Neo4j graph database - https://neo4j.com/developer/mongodb/
I'm new to SQL and could use some help in creating a database schema for my program, which manages and installs programs for my home network. Are there any guidelines/tutorials for creating database schemas?
Probably the most important concept to understand before you design your schema (you'll thank yourself for it later, trust me! :-) is that of Normalisation. The tutorial at db.grussell.org doesn't look too shabby and will give you a good grounding. In fact, if you click the "Up One Level" link and take a look around, some of the other information might be quite useful as well.
My "top tip" is: Write it down on paper or in notepad, or anything other than a database, before you start writing code. Get a good idea of what you need your schema to be able to do before you set it in stone (And by "set it in stone" I mean, realise that you've written a load of code against the schema that would have to be re-written if you change it to do what you've just realised you now need).
Designing Databases is a separate field of study and expertise. It cannot be condensed into one answer. Since you are interested in tutorials, look at the section on Database Design in any text book on Database Management Systems. I would recommend
Database System Concepts, 5e, Abraham Silberschatz, Henry F. Korth, S. Sudarshan
In database design, remember the following
1) Identify the important objects of interest in your home network. Avoid dwelling on the processes themselves, although they do help you identify the data units that you need to capture.
2) Use ER/UML modelling techniques to come up with a data model diagram/design. There are many CASE tools that can help you draw this.
3) Use the principles of database normalization to fine-tune your schema and avoid data redundancy. Redundant data leads to the following side effects: inability to maintain consistency among the redundant copies, and inability to store some data in an elegant manner.
4) Forward engineer your design to DDL statements for the DB of your choice. Most CASE tools support this.
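To give a feel for where these steps end up, here is one possible normalized schema for a tool that tracks which programs are installed on which machines of a home network. Everything in it is an assumption for illustration (the question gives no details): the machine, program and installation tables, their columns, and the programs_on helper are invented, and Python's sqlite3 module is used only so the DDL can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE machine (
    id       INTEGER PRIMARY KEY,
    hostname TEXT NOT NULL UNIQUE
);
CREATE TABLE program (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (name, version)
);
-- Junction table: each fact (program X is on machine Y) is stored exactly once.
CREATE TABLE installation (
    machine_id   INTEGER NOT NULL REFERENCES machine(id),
    program_id   INTEGER NOT NULL REFERENCES program(id),
    installed_on TEXT NOT NULL,
    PRIMARY KEY (machine_id, program_id)
);
INSERT INTO machine VALUES (1, 'desktop'), (2, 'laptop');
INSERT INTO program VALUES (1, 'vim', '9.0'), (2, 'git', '2.40');
INSERT INTO installation VALUES
    (1, 1, '2024-01-05'),
    (1, 2, '2024-01-06'),
    (2, 2, '2024-02-01');
""")


def programs_on(conn, hostname):
    """Names of the programs installed on the given machine."""
    rows = conn.execute("""
        SELECT p.name
        FROM installation AS i
        JOIN machine AS m ON m.id = i.machine_id
        JOIN program AS p ON p.id = i.program_id
        WHERE m.hostname = ?
        ORDER BY p.name
    """, (hostname,))
    return [row[0] for row in rows]


print(programs_on(conn, "desktop"))
```

Because program details live only in the program table, renaming a program or correcting a version touches one row instead of one row per machine, which is the redundancy that normalization is meant to remove.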
Case tools:
Microsoft Visio
ER Studio (very expensive)
TOAD data modeller
There are many open source tools too. You can try Dia. This does not support forward engineering