SQL Database Data Dictionary Inquiry - sql

I'm a university student who is completely new to SQL and I don't know what I'm doing, so go easy on me. I have been given an assignment to design a Facebook-like database for a website for people who read books. I have made a very basic data dictionary which gives the general idea.
Is this a valid approach? Is there a better way of doing this? I'd appreciate any feedback I can get.

First of all, it is good to see your effort.
I would suggest a small enhancement. Instead of maintaining separate tables for
books_liked, books_read, wish_list, etc., you can record the same information with flags in the master table books. This is to normalize your data.
e.g. flg_liked, flg_wish_list, flg_read
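A minimal sketch of the flag idea, using Python's built-in sqlite3 module. The answer places the flags on books; this sketch instead assumes a hypothetical user_books table with one row per user and book, so that each reader's flags stay separate. The users table, the sample rows, and the titles_for helper are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Hypothetical table: one row per (user, book) pair carrying the three flags.
CREATE TABLE user_books (
    user_id       INTEGER NOT NULL REFERENCES users(user_id),
    book_id       INTEGER NOT NULL REFERENCES books(book_id),
    flg_liked     INTEGER NOT NULL DEFAULT 0 CHECK (flg_liked IN (0, 1)),
    flg_wish_list INTEGER NOT NULL DEFAULT 0 CHECK (flg_wish_list IN (0, 1)),
    flg_read      INTEGER NOT NULL DEFAULT 0 CHECK (flg_read IN (0, 1)),
    PRIMARY KEY (user_id, book_id)
);
INSERT INTO users VALUES (1, 'Alice');
INSERT INTO books VALUES (10, 'Dune'), (11, 'Emma');
INSERT INTO user_books (user_id, book_id, flg_read, flg_liked) VALUES (1, 10, 1, 1);
INSERT INTO user_books (user_id, book_id, flg_wish_list) VALUES (1, 11, 1);
""")

ALLOWED_FLAGS = {"flg_liked", "flg_wish_list", "flg_read"}

def titles_for(user_id, flag):
    """Return the titles a user has marked with the given flag, sorted by title."""
    # Column names cannot be bound as parameters, so the flag is whitelisted.
    if flag not in ALLOWED_FLAGS:
        raise ValueError(f"unknown flag: {flag}")
    rows = conn.execute(
        f"SELECT b.title FROM user_books ub "
        f"JOIN books b ON b.book_id = ub.book_id "
        f"WHERE ub.user_id = ? AND ub.{flag} = 1 ORDER BY b.title",
        (user_id,),
    )
    return [title for (title,) in rows]
```

Calling titles_for(1, "flg_read") lists the books user 1 has read. Whether flags or separate tables fit better depends on whether each relationship needs its own extra columns (for example, a date read).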

Related

Sql Database structure for housing historical data and display changes

Good morning,
This is more of a concept question than anything.
I am looking to design a database and interface that will track changes to the entries (in this case people) and display those changes readily.
(user experience would look something like this)
For user A:
Date     Category            Activity
8/8/14   change position     position 1 -> position 2
8/9/14   change department   department a -> department b
...
...
The visual experience seems like it would benefit from an E-A-V design; however, I am designing the database to be easy to data mine and, from my reading, I think that E-A-V is not the right way to go.
Does it make sense to duplicate data just to display it?
If not, does anyone have a suggestion for how to query the history table and display it? (I am currently using jQuery and PHP to leverage the DB... I suppose I could do something interesting from a coding perspective to get it done.)
thank you for your help,
Travis
Creating an efficient operational database environment and creating an 'easy-to-data-mine' environment are two separate (and often opposing) goals.
Others might disagree with me, but in my opinion it is best to design your database for operational readiness (this means using your E-A-V design as mentioned above) and then worry about data transformation later. This may make it inconvenient later to transform the data for easy mining, but it will accomplish an incredibly important goal, which is to eliminate the possibility of data error.
Once you have a good system in place where you can collect data appropriately, then you can create a warehouse or datamart environment to more conveniently extract that data.
This may sound like a lot of work but from a data integrity perspective, it is much safer than trying to create some system that is designed entirely for reporting. That's my personal opinion at least.
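As a rough sketch of how the history display from the question could be queried, here is one possible layout using Python's sqlite3 module. The person_history table, its column names, and the change_log helper are assumptions made for illustration; they are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical E-A-V style history table: one row per changed attribute.
CREATE TABLE person_history (
    history_id INTEGER PRIMARY KEY,
    person     TEXT NOT NULL,
    changed_on TEXT NOT NULL,   -- ISO 8601 date, so text order matches date order
    attribute  TEXT NOT NULL,
    old_value  TEXT,
    new_value  TEXT
);
INSERT INTO person_history (person, changed_on, attribute, old_value, new_value) VALUES
    ('user A', '2014-08-08', 'position',   'position 1',   'position 2'),
    ('user A', '2014-08-09', 'department', 'department a', 'department b');
""")

def change_log(person):
    """Return (date, category, activity) rows for one person, oldest first."""
    rows = conn.execute(
        "SELECT changed_on, 'change ' || attribute, old_value || ' -> ' || new_value "
        "FROM person_history WHERE person = ? ORDER BY changed_on, history_id",
        (person,),
    )
    return rows.fetchall()

for row in change_log("user A"):
    print(*row, sep="  ")
```

The display is built by the query, so nothing has to be duplicated just to show it; a warehouse or data mart could later be loaded from the same table.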
(sorry cannot comment yet)
You have to analyse the data you need to persist.
If you have only a couple of tables with no relationships, you probably don't need a database.
In that case a database solution will probably be slower (connection, transmission, and security overhead...).
If it's only a few MBs of data, I would keep everything in one table.
You can easily load the whole data set in memory and do what you need to do.

Data migration concept in databases

I have a generic question about data migration/conversion.
Here is the situation:
I have data in an Access DB and want to migrate it into an Oracle DB. It's not a 1-to-1 migration. I need to define which data should be imported into which Oracle table and then apply some logic after that.
Do you have any ideas or a concept for how to do this? Maybe a checklist, so that I don't forget something?
Thank you.
Are you going to run it just once and be done? Or do you need to set up a job that routinely transfers the data? The keyword you might not be familiar with for googling is "ETL" - Extract, Transform, Load - and there are lots of products designed to handle ETL tasks.
Common sticking points are converting dates from one DB to another and floating point numbers (especially currency). When you hit specific problems, post new questions. There's a Stack Overflow sister site, serverfault.com, that might be a better fit for a lot of ETL questions. The products you're working with are popular, so the DB experts there are likely to have quick answers on datatype issues.
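To illustrate the "Transform" step and the two sticking points just mentioned, here is a small Python sketch. The column names, the MM/DD/YYYY source date format, and the sample rows are assumptions; a real job would read rows from Access and pass the converted values to an Oracle driver.

```python
from datetime import datetime
from decimal import Decimal

def transform_row(row):
    """Convert one extracted row (all strings) into load-ready Python values."""
    return {
        "customer": row["customer"].strip(),
        # Parse the source date text explicitly instead of relying on driver defaults.
        "order_date": datetime.strptime(row["order_date"], "%m/%d/%Y").date(),
        # Decimal keeps currency exact; float would introduce binary rounding error.
        "amount": Decimal(row["amount"]),
    }

# Hypothetical rows as they might come out of the extract step.
extracted = [
    {"customer": " Acme ", "order_date": "08/09/2014", "amount": "19.99"},
    {"customer": "Globex", "order_date": "12/31/2014", "amount": "0.10"},
]

loaded = [transform_row(r) for r in extracted]
total = sum(r["amount"] for r in loaded)
```

Keeping the conversion in one function per target table makes it easier to turn the mapping rules into a checklist and to test them before the real load.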

Implementing Review flags in Databases; best practices

I need to store some review flags that relate to some entities. Each review flag can only relate to a single entity property group. For example, table Parents has a ParentsStatus flag and table Children has a set of ChildrenStatus flags.
In the current design proposal I have three tables:
ReviewTypes: stores the flags and the properties they relate to.
ReviewPositions: stores the values the flags can have.
Reviews: stores the transaction data, the actual reviews. It is like UsersToFlags in the question "Flags in a database rows, best practices".
The problem is I am getting push back that there is no need to have the Reviews table and it would be better to just store the actual review data on each entity. For example, add an extra column to Parents to hold ParentsStatus. They feel it is a simpler solution and separating the data out is just “overkill” for our scenario.
I don’t like this idea as this means that every time we want to add a new review flag we need to update the core entity table to hold that flag.
Space is not a problem.
Do people have any strong opinions?
Edit:
This comment applies to the three answers. The consensus is that the relational approach is best, but I think I need to read up a little more on the EAV model: from some very basic reading of "Best beginner resources for understanding the EAV database model?" and its related links, it does not appear to be super straightforward, and I don't want to dig myself a hole. Thanks to wildplasser. I'll loop back once I read up a bit more.
Oh yes. Their idea is simpler, until you want to enhance it. Given the scheme they are proposing, what if two reviews were needed per entity? What if you wanted to attach other things such as notes/annotations? Once they find out how much of an inflatable dartboard their idea is, what do you have to do to move to a more useful one? Not to mention you need some way of identifying status fields, either with fragile rubbish like "column name ends with _Status" or by hard-coding them somewhere.
Doing it properly is not that much more work and it's not more complex; in fact, in many ways it's simpler, and it will cope with the inevitable changes at far less cost.
Normalization is always preferable to premature optimization.
One reason why I like the reviews table separate is that you can hold changes you may not want to display yet (as they haven't been reviewed and approved) and still maintain the old data until the new data is approved. I don't know if your situation requires that.
To make future programming simpler for when you want to display the changes, you can write a view that shows the old and new data.
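For what it's worth, here is a minimal sketch of the three-table design from the question, using Python's sqlite3 module. The table names come from the question; every column name, the sample rows, and the CurrentReviews view are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ReviewTypes (
    review_type_id INTEGER PRIMARY KEY,
    entity_name    TEXT NOT NULL,          -- e.g. 'Parents' or 'Children'
    flag_name      TEXT NOT NULL,
    UNIQUE (entity_name, flag_name)
);
CREATE TABLE ReviewPositions (
    review_position_id INTEGER PRIMARY KEY,
    review_type_id     INTEGER NOT NULL REFERENCES ReviewTypes(review_type_id),
    position_name      TEXT NOT NULL
);
CREATE TABLE Reviews (
    review_id          INTEGER PRIMARY KEY,
    review_type_id     INTEGER NOT NULL REFERENCES ReviewTypes(review_type_id),
    entity_id          INTEGER NOT NULL,
    review_position_id INTEGER NOT NULL REFERENCES ReviewPositions(review_position_id),
    reviewed_on        TEXT NOT NULL
);
-- Hypothetical view: the most recent value of each flag for each entity.
CREATE VIEW CurrentReviews AS
SELECT t.entity_name, r.entity_id, t.flag_name, p.position_name
FROM Reviews r
JOIN ReviewTypes t ON t.review_type_id = r.review_type_id
JOIN ReviewPositions p ON p.review_position_id = r.review_position_id
WHERE r.review_id = (
    SELECT MAX(r2.review_id) FROM Reviews r2
    WHERE r2.review_type_id = r.review_type_id AND r2.entity_id = r.entity_id
);
INSERT INTO ReviewTypes VALUES (1, 'Parents', 'ParentsStatus');
INSERT INTO ReviewPositions VALUES (1, 1, 'Pending'), (2, 1, 'Approved');
INSERT INTO Reviews (review_type_id, entity_id, review_position_id, reviewed_on) VALUES
    (1, 42, 1, '2014-01-01'),
    (1, 42, 2, '2014-01-05');
""")

# Adding a new review flag is a plain INSERT; no entity table has to be altered.
conn.execute(
    "INSERT INTO ReviewTypes (entity_name, flag_name) VALUES ('Children', 'ChildrenStatus')"
)

current = conn.execute(
    "SELECT entity_name, entity_id, flag_name, position_name FROM CurrentReviews"
).fetchall()
```

Older rows stay in Reviews, so the previous value remains available while the view reports only the latest one.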

How to understand the business of AdventureWorks2008 DB

I am trying to understand the AdventureWorks database, as most of the good examples on the web and in books use it as their sample database.
I found a few links, such as:
http://msdn.microsoft.com/en-us/library/ms124825(v=sql.100).aspx
http://merc.tv/img/fig/AdventureWorks2008_Conceptual.pdf
http://merc.tv/img/fig/AdventureWorks2008.gif
But I found them insufficient for explaining the business completely.
I tried to practice some queries as well so that I could learn more about its business. But looking at 70 tables, I find myself nowhere near understanding its business.
Can you help by giving me some good links where I can get more details about it?
EDIT
I have never studied the Northwind database. Is it important to understand it to get a good grasp of AdventureWorks2008?
Looking at what you are doing, I think it is not important for you to fully understand the AW business. It would be a better use of your time to understand the queries that are required and just the 2-3 tables in question rather than the entire database.
Even though the MCTS certification book pulls from the AW DB, understanding that database is not part of the certification. If you have trouble with specific questions from your book, I would suggest you post what you are doing and what result you are trying to achieve.
I could go on explaining the business of AW, but it would take a lot of time and I could not give you a complete understanding of the business; you would only understand the things I describe here. I am currently going through this book: The Microsoft Data Warehouse Toolkit by Ralph Kimball. This is the best book for getting a detailed understanding of the AW business.
The business requirements come from a variety of people, from sales to technical people and others. So I suggest you read the first 3 chapters of this book, as it simplifies learning the business.
If reading is not feasible I will show you another approach:
AdventureWorks (AW) is an imaginary manufacturer and seller of bicycles and their accessories (table Production.ProductCategory).
The schemas help a lot in understanding their business, sales, employee data and product info.
The schemas involved in AW database are: HumanResources, Person, Production, Purchasing, Sales.
The table names are self-explanatory to a great extent. A simple query against each table will give you a fair amount of information about data.
You can go through this link that gives a brief overview of schemas used in AW
https://technet.microsoft.com/en-us/library/ms124894(v=sql.100).aspx
Think of yourself as an interviewer and then frame the questions that you would ask the officers/managers of AW. Trust me, there are a lot of pitfalls if you are a newbie. For example, if you ask the business "What do you want in your data warehouse?", that is WRONG. It is your job to figure that out.
All the best with exploration.
I found the link below; follow the entry for whichever sample database you are interested in, and it explains what that database is all about:
https://msdn.microsoft.com/en-us/library/ms124501(v=sql.100).aspx
I have also found this ETL (but could not find one for other databases):
http://msftdbprodsamples.codeplex.com/wikipage?title=%20AdventureWorksLTDiagram&referringTitle=AWSchemaDiag
I have also found these exercises:
http://sqlzoo.net/wiki/AdventureWorks

Many-to-many relationship: use associative table or delimited values in a column?

Update 2009.04.24
The main point of my question is not developer confusion and what to do about it.
The point is to understand when delimited values are the right solution.
I've seen delimited data used in commercial product databases (Ektron lol).
SQL Server even has an XML datatype, so that could be used for the same purpose as delimited fields.
/end Update
The application I'm designing has some many-to-many relationships. In the past, I've often used associative tables to represent these in the database. This has caused some confusion to the developers.
Here's an example DB structure:
Document
---------------
ID (PK)
Title
CategoryIDs (varchar(4000))
Category
------------
ID (PK)
Title
There is a many-to-many relationship between Document and Category.
In this implementation, Document.CategoryIDs is a big pipe-delimited list of CategoryIDs.
To me, this is bad because it requires use of substring matching in queries -- which cannot make use of indexes. I think this will be slow and will not scale.
With that model, to get all Documents for a Category, you would need something like the following:
select * from documents where categoryids like '%|' + @targetCategoryId + '|%'
My solution is to create an associative table as follows:
Document_Category
-------------------------------
DocumentID (PK)
CategoryID (PK)
This is confusing to the developers. Is there some elegant alternate solution that I'm missing?
I'm assuming there will be thousands of rows in Document. Category may be like 40 rows or so. The primary concern is query performance. Am I over-engineering this?
Is there a case where it's preferred to store lists of IDs in database columns rather than pushing the data out to an associative table?
Consider also that we may need to create many-to-many relationships among documents. This would suggest an associative table Document_Document. Is that the preferred design or is it better to store the associated Document IDs in a single column?
Thanks.
"This is confusing to the developers."
Get better developers. That is the right approach.
Your suggestion IS the elegant, powerful, best practice solution.
Since I don't think the other answers said the following strongly enough, I'm going to do it.
If your developers 1) can't understand how to model a many-to-many relationship in a relational database, and 2) strongly insist on storing your CategoryIDs as delimited character data,
Then they ought to immediately lose all database design privileges. At the very least, they need an actual experienced professional to join their team who has the authority to stop them from doing something this unwise and can give them the database design training they are completely lacking.
Last, you should not refer to them as "database developers" again until they are properly up to speed, as this is a slight to those of us who actually are competent developers & designers.
I hope this answer is very helpful to you.
Update
"The main point of my question is not developer confusion and what to do about it.
The point is to understand when delimited values are the right solution."
Delimited values are the wrong solution except in extremely rare cases. If individual values will ever be queried, inserted, deleted, or updated, that proves it was the wrong decision, because you have to parse and touch all the other values just to work with the desired one. By doing this you're violating first (!!!) normal form (this phrase should sound to you like an unbelievably vile expletive). Using XML to do the same thing is wrong, too. Storing delimited values or multi-value XML in a column could make sense when it is treated as an indivisible and opaque "property bag" that is NOT queried on by the database but is always sent whole to another consumer (perhaps a web server or an EDI recipient).
This takes me back to my initial comment. Developers who think violating first normal form is a good idea are very inexperienced developers in my book.
I will grant there are some pretty sophisticated non-relational data storage implementations out there using text property bags (such as Facebook(?) and other multi-million user sites running on thousands of servers). Well, when your database, user base, and transactions per second are big enough to need that, you'll have the money to develop it. In the meantime, stick with best practice.
It's almost always a big mistake to use comma separated IDs.
RDBMS are designed to store relationships.
"My solution is to create an associative table as follows: [...] This is confusing to the developers."
Really? This is database 101. If this is confusing to them, then maybe they need to step away from their wizard-generated code and learn some basic DB normalization.
What you propose is the right solution!!
The Document_Category table in your design is certainly the correct way to approach the problem. If it's possible, I would suggest that you educate the developers instead of coming up with a suboptimal solution (and taking a performance hit, and not having referential integrity).
Your other options may depend on the database you're using. For example, in SQL Server you can have an XML column that would allow you to store your array in a pre-defined schema and then do joins based on the contents of that field. Other database systems may have something similar.
The many-to-many mapping you are doing is fine and normalized. It also allows for other data to be added later if needed. For example, say you wanted to add a time that the category was added to the document.
I would suggest having a surrogate primary key on the document_category table as well. And a Unique(documentid, categoryid) constraint if that makes sense to do so.
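A short sketch of that suggestion, using Python's sqlite3 module. The Document, Category, and Document_Category names come from the question; the AddedAt column, the sample rows, and the try_insert helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Document (ID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Category (ID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Document_Category (
    ID         INTEGER PRIMARY KEY,                      -- surrogate key
    DocumentID INTEGER NOT NULL REFERENCES Document(ID),
    CategoryID INTEGER NOT NULL REFERENCES Category(ID),
    AddedAt    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- extra data about the link itself
    UNIQUE (DocumentID, CategoryID)                      -- each pair may appear only once
);
INSERT INTO Document VALUES (1, 'Spec');
INSERT INTO Category VALUES (7, 'Design');
INSERT INTO Document_Category (DocumentID, CategoryID) VALUES (1, 7);
""")

def try_insert(document_id, category_id):
    """Attempt to link a document to a category; report whether the row was accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Document_Category (DocumentID, CategoryID) VALUES (?, ?)",
                (document_id, category_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert(1, 7)    # rejected by the UNIQUE constraint
orphan_accepted = try_insert(1, 999)     # rejected by the foreign key to Category
```

Both rejections are enforced by the database itself, which a delimited CategoryIDs column cannot do.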
Why are the developers confused?
The 'this is confusing to the developers' design means you have under-educated developers. It is the better relational database design - you should use it if at all possible.
If you really want to use the list structure, then use a DBMS that understands them. Examples of such databases would be the U2 (Unidata, Universe) DBMS, which are (or were, once upon a long time ago) based on the Pick DBMS. There are likely to be other similar DBMS providers.
This is the classic object-relational mapping problem. The developers are probably not stupid, just inexperienced or unaccustomed to doing things the right way. Shouting "3NF!" over and over again won't convince them of the right way.
I suggest you ask your developers to explain to you how they would get a count of documents by category using the pipe-delimited approach. It would be a nightmare, whereas the link table makes it quite simple.
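To make that comparison concrete, here is a small sketch using Python's sqlite3 module. The table names follow the question; the sample rows and the delimited_docs dictionary are made up for illustration.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (ID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Document_Category (
    DocumentID INTEGER NOT NULL,
    CategoryID INTEGER NOT NULL REFERENCES Category(ID),
    PRIMARY KEY (DocumentID, CategoryID)
);
INSERT INTO Category VALUES (1, 'Design'), (2, 'Legal');
INSERT INTO Document_Category VALUES (100, 1), (100, 2), (101, 1);
""")

# Link table: the count per category is a single join plus GROUP BY.
link_counts = conn.execute("""
    SELECT c.Title, COUNT(dc.DocumentID)
    FROM Category c
    LEFT JOIN Document_Category dc ON dc.CategoryID = c.ID
    GROUP BY c.ID, c.Title
    ORDER BY c.Title
""").fetchall()

# Pipe-delimited column: every document row has to be fetched and split in
# application code before anything can be counted.
delimited_docs = {100: "|1|2|", 101: "|1|"}
delimited_counts = Counter(
    int(category_id)
    for ids in delimited_docs.values()
    for category_id in ids.strip("|").split("|")
)
```

Both approaches reach the same numbers here, but only the first lets the database do the work and use an index on CategoryID.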
The number one reason that my developers try this "comma-delimited values in a database column" approach is that they have a perception that adding a new table to address the need for multiple values will take too long to add to the data model and the database.
Most of them know that their workaround is bad for all kinds of reasons, but they choose this suboptimal method simply because they can. They can do this and maybe never get caught, or they will get caught much later in the project when it is too expensive and risky to fix it. Why do they do this? Because their performance is measured solely on speed and not on quality or compliance.
It could also be, as on one of my projects, that the developers had a table to put the multi values in but were under the impression that duplicating that data in the parent table would speed up performance. They were wrong and they were called out on it.
So while you do need an answer to how to handle these costly, risky, and business-confidence damaging tricks, you should also try to find the reason why the developers believe that taking this course of action is better in the short and the long run for the project and company. Then fix both the perception and the data structures.
Yes, it could just be laziness, malicious intent, or cluelessness, but I'm betting most of the time developers do this stuff because they are constantly being told "just get it done". We on the data model and database design sides need to ensure that we aren't sending the wrong message about how responsive we can be to requests to fulfill a business requirement for a new entity/table/piece of information.
We should also see that data people need to be constantly monitoring the "as-built" part of our data architectures.
Personally, I never authorize the use of comma-delimited values in a relational database, because it is actually faster to build a new table than it is to build a parsing routine to create, update, and manage multiple values in a column, and to deal with all the anomalies introduced because sometimes that data has embedded commas, too.
Bottom line, don't do comma delimited values, but find out why the developers want to do it and fix that problem.