Free public databases with non-trivial table structures? - sql

I'm looking for some sample database data that I can use for testing and demonstrating a DB tool I am working on. I need a DB that has (preferably) many tables, and many foreign key relationships between the tables.
Ideally the data would be in SQL dump format, or at least in something that maintains the foreign key references, and could be easily imported into an RDBMS (MySQL or H2).
The dataset itself doesn't have to be huge (in fact, best if it's not). I thought about using the Stack Overflow Data Dump, but it only has about 5 tables.

What about using the entire Wikipedia database?

I should learn to RTFM: MySQL has a sample database for exactly this kind of thing, called Sakila. It's small, but it does have a good number of connected tables. I'm still eager to hear more suggestions, though.

Related

What are the pros and cons of using SQL and NoSQL databases in the same project?

Actually, I'm not sure whether Stack Overflow is a suitable platform for asking such questions, but I have looked for this question many times and found many answers, and all of them agreed that NoSQL is well suited to real-time data transfer.
What I want to ask about came up in a conversation: I had used Django with Celery and PostgreSQL to update data in real time, but the other person advised me that SQL is not the preferable kind of database for such a task because it is slower, and suggested a NoSQL database such as MongoDB instead. I took a while to research his advice, and found that NoSQL structures are considered good for graphs and real-time data transactions, while SQL is better for other purposes such as relationships between data.
There are several questions I want to ask about this:
1. First of all, is it true that SQL is slower than NoSQL when dealing with data in real time? If yes, why?
2. If NoSQL is good for some purposes and bad for others, unlike SQL, can I use two different databases such as PostgreSQL and MongoDB together in one project (with Django, for our example)?
3. If I can mix those two databases, I can see things that would make it slower: for example, if a User is stored in both databases, then updating something for that user requires two requests, one to update each database. Am I right about that?
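Regarding question 2, Django does support several configured databases in one project, with a database router deciding which models live where. Below is a minimal sketch; the `analytics` alias and the `realtime` app label are invented for illustration, and note that MongoDB is not a Django ORM backend out of the box, so in practice you would usually talk to it through a separate client such as pymongo rather than adding it to `DATABASES`.

```python
# settings.py sketch: two relational databases configured side by side.
# (MongoDB itself would typically be accessed via pymongo, outside the ORM.)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "main",
    },
    "analytics": {  # hypothetical second database for real-time data
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "realtime",
    },
}


class RealtimeRouter:
    """Route models in the hypothetical 'realtime' app to the second DB."""

    def db_for_read(self, model, **hints):
        if model._meta.app_label == "realtime":
            return "analytics"
        return None  # fall through to the 'default' database

    db_for_write = db_for_read  # same routing rule for writes
```

The router would then be registered via `DATABASE_ROUTERS = ["myproject.settings.RealtimeRouter"]`. This keeps each model's data in exactly one database, which avoids the double-update problem raised in question 3.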

SQL Server Management Studio relationship connections

In databases you can define relationships between tables. But what exactly is the use, besides documentation, of making these relationships explicit in a diagram (for example, by connecting the keys in SQL Server Management Studio)?
Does it give you any advantage in writing SQL statements? Computation time? Memory usage? Usually you "repeat" the relationship in the join statement. I have the feeling I'm missing something trivial.
Thanks
From the fine manual
You can use Object Explorer to create new database diagrams. Database diagrams graphically show the structure of the database. Using database diagrams you can create and modify tables, columns, relationships, and keys. Additionally, you can modify indexes and constraints.
You asked:
Does it give you any advantage in writing SQL statements?
They can show you how the database is structured. That's usually pretty key to understanding it; reading all the FKs into your head and remembering which table relates to what can be quite the puzzle, and even then the schema may not reflect how the data is actually used and related in the application.
Computation time?
Not quite sure what this means, but the presence or absence of a database diagram won't affect the amount of time SQL Server spends planning or executing queries.
Memory usage?
Not really
Usually you "repeat" the relationship in the join statement
Sometimes; there are ways of combining data without explicit joins, and the presence or absence of FKs or database diagrams has nothing to do with SQL Server's ability to join data.
It might be best to think of DB diagrams as a visual design aid and tool.
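One distinction worth adding: the foreign keys themselves, as opposed to the diagram drawn over them, do have a runtime effect, because the engine enforces referential integrity on writes. A minimal sketch using SQLite from Python (the `customer`/`orders` tables are invented for illustration; the same behavior holds in SQL Server, where FK checking is on by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this opt-in

conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer (id) VALUES (1)")
conn.execute("INSERT INTO orders (customer_id) VALUES (1)")  # OK: parent row exists

rejected = False
try:
    conn.execute("INSERT INTO orders (customer_id) VALUES (99)")  # no such customer
except sqlite3.IntegrityError:
    rejected = True  # the constraint, not any diagram, blocked the bad row
print("orphan row rejected:", rejected)
```

So the diagram is purely a design and comprehension aid, but the constraints it draws are doing real work.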

Databases for chess games

A previous question asked how to save each move to a database as a game of chess plays out, and which database to use for this. Various possibilities were given:
MongoDb, CouchDb, MySql, SQLite
One answer in particular mentioned a traditional one to many mapping:
The only advantage I can see to a mongodb or couchdb is that you could conceivably store the entire match in a single record, thus making your data a little simpler. You wouldn't have to do the traditional one to many mapping between moves table and a game table.
What exactly does this mean, and what would it look like in, say, PostgreSQL, so I have a concrete idea of what it means?
Below is an example entity-relationship diagram based on SQL Server 2005; with some tweaks to the data types it can be transferred to MySQL or PostgreSQL.
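Since the diagram itself may not be visible here, a minimal sketch of the traditional one-to-many mapping the quoted answer describes: one row per game, one row per move, with each move pointing back at its game. Driven from Python's built-in SQLite for concreteness; column names such as `move_no` and `san` are my own invention, and in PostgreSQL the types would be `serial`/`integer`/`text`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (
    id      INTEGER PRIMARY KEY,
    white   TEXT NOT NULL,
    black   TEXT NOT NULL
);
CREATE TABLE move (
    id      INTEGER PRIMARY KEY,
    game_id INTEGER NOT NULL REFERENCES game(id),  -- the one-to-many link
    move_no INTEGER NOT NULL,                      -- 1, 2, 3, ... within the game
    san     TEXT NOT NULL                          -- e.g. 'e4', 'Nf6'
);
""")

conn.execute("INSERT INTO game (id, white, black) VALUES (1, 'Alice', 'Bob')")
conn.executemany(
    "INSERT INTO move (game_id, move_no, san) VALUES (1, ?, ?)",
    [(1, "e4"), (2, "e5"), (3, "Nf3")],
)

# Reassembling a whole match requires querying the child table per game,
# which is exactly the mapping a document store would let you skip.
moves = [row[0] for row in conn.execute(
    "SELECT san FROM move WHERE game_id = 1 ORDER BY move_no"
)]
print(moves)
```

In MongoDB or CouchDB, by contrast, the whole match could be one document with an embedded array of moves, which is the simplification the quoted answer is pointing at.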

Database model refactoring for combining multiple tables of heterogeneous data in SQL Server?

I took over the task of re-developing a database of scientific data used by a web interface. The original author had taken a 'table-per-dataset' approach, which didn't scale well and is now fairly difficult to manage, with more than 200 tables having been created. I've spent quite a bit of time trying to figure out how to wrangle the thing, but the datasets contain heterogeneous values, so it is not reasonably possible to combine them into one table with a fixed schema of column definitions.
I've explored the possibility of EAV and XML columns, and ended up attempting a table with many sparse columns, since the database is running on SQL Server 2008. The DBAs are having some issues with my recently created sparse columns causing havoc with their backup scripts, so I'm left wondering again whether there isn't a better way to do this. I know that EAV does not lead to decent performance, and my experiments with the XML data type also showed poor performance, probably thanks to the large number of records in some of the tables.
Here's the summary:
Around 200 tables, most of which have a few columns containing floats and small strings
Some tables have as many as 15,000 records
Table schemas are not consistent, as the columns depended on the number of samples in the original experimental data.
SQL Server 2008
I'll be treating most of this data as legacy in the new version I'm developing, but I still need to be able to display and query it, and I'd rather not do so by dynamically specifying the table name in my stored procedures, as the current multi-table approach requires. Any suggestions?
I would suggest that the first step is looking to rationalise the data through views; attempt to consolidate similar data sets into logical pools through views.
You could then look at refactoring the code to query the views, and see whether the web platform operates effectively. From there you could decide whether or not the view structure is beneficial and, if so, look at physically rationalising the data into new tables.
The benefit of using views in this manner is that you should be able to squeeze a little performance out of indexes on the views, and it should also give you a better handle on the data (that said, since you are developing the new version, I'd suggest you are perfectly capable of understanding the problem domain).
With 200 tables of simple raw data sets, and considering you believe your version will be taking over, I would go through the prototype exercise of naming the views identically to what your final table names will be in V2; that way you can also test whether your new database structure is in fact going to work.
Finally, a word to the wise: when someone has built a database in the way you've described, they did it for a reason. Without looking at the data and really knowing the problem set, you can't tell whether it was bad design or whether there was a cause for what now appears on the surface to be bad design. You raise consistency as an issue, so look to wrap the data and see how consistent you can make it.
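To make the view-based consolidation concrete, here is a minimal sketch using SQLite from Python; the dataset table and column names are invented for illustration, and the `CREATE VIEW ... UNION ALL` statement works the same way in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two of the ~200 'table-per-dataset' tables, with similar shapes.
CREATE TABLE dataset_ocean_2007 (sample_id INTEGER, temperature REAL);
CREATE TABLE dataset_ocean_2008 (sample_id INTEGER, temperature REAL);

INSERT INTO dataset_ocean_2007 VALUES (1, 14.2);
INSERT INTO dataset_ocean_2008 VALUES (1, 14.9);

-- One logical pool: tag each row with its source dataset, so the web
-- code can query a single name instead of picking a table dynamically.
CREATE VIEW ocean_readings AS
    SELECT '2007' AS dataset, sample_id, temperature FROM dataset_ocean_2007
    UNION ALL
    SELECT '2008' AS dataset, sample_id, temperature FROM dataset_ocean_2008;
""")

rows = list(conn.execute(
    "SELECT dataset, temperature FROM ocean_readings ORDER BY dataset"
))
print(rows)
```

Naming such views after the intended V2 tables lets the new application code run against the legacy data unchanged while the physical migration is still being decided.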
Good luck!

How to Design HBase Schema

We currently have one running project which uses an RDBMS database (with lots of tables and stored procedures for manipulating data). The current flow is: the data access layer calls stored procedures, which insert, delete, update, or fetch data from the RDBMS (please note that these stored procedures do not do any bulk processing). The current data structure contains lots of primary key / foreign key relationships and has lots of updates to existing database tables. I just want to know whether we can use HBase for our purpose, and if so, how we can use Hadoop with HBase to replace the RDBMS.
You need to ask yourself, what is the RDBMS not doing for you, and what is it that you hope to achieve by moving to Hadoop/HBase?
This article may help. There are a lot more.
http://it.toolbox.com/blogs/madgreek/nosql-vs-rdbms-apples-and-oranges-37713
If the purpose is trying new technology, I suggest trying their tutorial/getting started.
If it's a clear problem you're trying to solve, then you may want to articulate the problem.
Good Luck!
I hesitate to suggest replacing your current RDBMS simply because of the large developer effort you've already spent. Consider that your organization probably has no employees with the needed HBase experience. Moving to HBase, with the attendant data conversion and application rewriting, will be very expensive and risky.