What business folks have to understand about database design

A business team has asked me to set up a meeting to explain database design considerations to them. Since they don't know much about RDBMSs, I'm thinking of explaining the following:
What an RDBMS is
What a table is, and what constraints are and why we need them
What a transaction is, and what the ACID properties are
Things to consider before/while developing a database:
a. Decide how much detail you need now and how much you may need in the future
b. Identify fields with unique values
c. Select the appropriate data types for your fields
d. Normalization and Index design
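To make "constraints" concrete for a non-technical audience, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` table and its columns are hypothetical; the point is that the database itself refuses a duplicate email, a missing name and a negative credit limit, instead of relying on every import script to check.

```python
import sqlite3

# In-memory database; table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id           INTEGER PRIMARY KEY,                       -- unique row identifier
        name         TEXT NOT NULL,                             -- every customer needs a name
        email        TEXT NOT NULL UNIQUE,                      -- no two customers share an email
        credit_limit INTEGER NOT NULL CHECK (credit_limit >= 0) -- no negative limits
    )
""")
conn.execute("INSERT INTO customers (name, email, credit_limit) VALUES (?, ?, ?)",
             ("Acme Ltd", "billing@acme.example", 5000))

bad_rows = [
    ("Acme Again", "billing@acme.example", 100),  # duplicate email -> UNIQUE
    (None, "nobody@example.com", 100),            # missing name    -> NOT NULL
    ("Overdrawn Inc", "od@example.com", -50),     # negative limit  -> CHECK
]

rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO customers (name, email, credit_limit) VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))  # the database refuses the row and says which rule failed

for message in rejected:
    print(message)

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print("rows stored:", row_count)
```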
Also, most of the time this team's data comes in from flat files, which we need to load into the DB and present in the format they need. Can anybody suggest what more I could explain, or a better way to explain it? Their data is kind of all over the place, and I want to emphasize thinking things through, because we couldn't set up a stable import process. Any suggestions for me are welcome as well :)
Appreciate your help!
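Since the flat-file import is where things keep breaking, one pattern worth showing the team is: validate every row first, then load the whole file inside a single transaction, so a bad file leaves nothing half-loaded. This is a minimal sketch with Python's csv and sqlite3 modules; the file layout, the `sales` table and the validation rules are all assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical flat file; in practice this would come from open("...", newline="").
RAW = """region,amount,sale_date
North,120.50,2023-01-05
South,not-a-number,2023-01-06
East,75.00,2023-01-07
"""

def validate(row):
    """Return a list of problems with one CSV row (the rules are illustrative)."""
    problems = []
    if not row["region"].strip():
        problems.append("region is empty")
    try:
        float(row["amount"])
    except ValueError:
        problems.append(f"amount {row['amount']!r} is not a number")
    return problems

def load(conn, text):
    """Load all rows or none; return (rows_loaded, errors)."""
    reader = csv.DictReader(io.StringIO(text))
    rows, errors = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 of the file is the header
        problems = validate(row)
        if problems:
            errors.append((line_no, problems))
        else:
            rows.append((row["region"], float(row["amount"]), row["sale_date"]))
    if errors:
        return 0, errors  # reject the whole file and report, rather than loading part of it
    with conn:  # commits on success, rolls back if any insert raises
        conn.executemany("INSERT INTO sales (region, amount, sale_date) VALUES (?, ?, ?)", rows)
    return len(rows), []

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL, sale_date TEXT NOT NULL)")
loaded, errors = load(conn, RAW)
print(loaded, errors)
```

Rejecting the whole file is a design choice; some teams prefer loading good rows and parking bad ones in a separate "rejects" table. Either way, the rule should be agreed with the business up front.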

You haven't said what your audience expects to take away from your presentation. So I'll have to guess, based on my dealings with business people in the past. Your mileage may vary.
Business people typically don't care about the skills and knowledge you put into doing a good job with database design, even when they say they do. They want to understand database design in terms of costs and benefits. That is how business people think.
So if you must cover some technical topic like indexing, do so from a cost benefit point of view. There is a cost to adding an index to a table, and there is a benefit to adding an index to a table. Figuring out in advance whether the benefit is worth the cost is the really tricky part, and they will be interested in this.
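If you want to show the benefit side of an index rather than just assert it, SQLite's EXPLAIN QUERY PLAN makes the difference visible: without an index a lookup scans the whole table, with one it searches the index. The `orders` table and the index name are hypothetical, and the exact plan wording varies by SQLite version. The cost side is that every insert and update must now also maintain the index, and the index takes extra space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"

def plan(sql):
    """Return SQLite's human-readable plan lines for a query."""
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))]

plan_before = plan(QUERY)  # typically a full table scan, e.g. 'SCAN orders'
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY)   # typically 'SEARCH orders USING INDEX idx_orders_customer (customer_id=?)'

print(plan_before)
print(plan_after)
```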
On a larger scale, data is a business asset. There is a cost to managing that asset well, and there is a benefit to managing that asset well. If you can connect your talk to these two concepts, they will be interested.
If they are really good business people, they will have a good understanding of the subject matter that the database covers, provided it's a part of the enterprise data that affects their business. If you have a good ER model of the data in the database, this model will connect every value in every table to an attribute, and every attribute will describe some aspect of the subject matter. This is a very different use of an ER model than just using it as a preliminary to creating a relational model.
Technical people tend to think of ER modeling as "relational modeling light". It's really much deeper than that. It's an analytical handle on the question "what does the data really mean?" And this is a handle on "what is the data really worth?". And this is where the technical world meets the business world.

How about starting with the basics of CRUD operations, then moving on to normalization: give scenarios that show the need for normalization and explain the concept of keys in an RDBMS. Then you can talk about ER modeling.
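For the CRUD part, one tiny end-to-end example can anchor the vocabulary (Create, Read, Update, Delete) and show primary and foreign keys at the same time. This is a sketch using sqlite3 with hypothetical `departments` and `employees` tables; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE departments (
        id   INTEGER PRIMARY KEY,          -- primary key: identifies each department
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employees (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments(id)  -- foreign key
    );
""")

# Create
conn.execute("INSERT INTO departments (id, name) VALUES (1, 'Sales')")
conn.execute("INSERT INTO employees (name, department_id) VALUES ('Dana', 1)")

# Read (joining the two tables through the key)
rows = conn.execute("""
    SELECT e.name, d.name FROM employees e JOIN departments d ON d.id = e.department_id
""").fetchall()

# Update
conn.execute("UPDATE employees SET name = 'Dana K.' WHERE name = 'Dana'")

# The foreign key stops an employee from pointing at a department that doesn't exist.
try:
    conn.execute("INSERT INTO employees (name, department_id) VALUES ('Lee', 99)")
    fk_blocked = False
except sqlite3.IntegrityError:
    fk_blocked = True

# Delete
conn.execute("DELETE FROM employees WHERE name = 'Dana K.'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(rows, fk_blocked, remaining)
```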

Considering that you are presenting to business folks, I think two approaches would suit your needs best.
a) WHEN YOU HAVE LESS TIME:
Only cover topics that need little or no prior knowledge. Cover RDBMS basics and the things to consider.
Keep it simple and easy to understand. Tell them how your solution works and why it is an effective one.
Cover only topics which are relevant and make it layman friendly. Provide them the pros & cons of your DB design. Connect it to business needs.
In all cases, provide contextual examples that they can easily relate to.
b) WHEN YOU HAVE MORE TIME
You may cover topics in detail as suggested in the previous comments. (#SQL_Underworld & #Ramya)

Related

What do I need to know about databases in order to create a quality Django app?

I'm trying to optimize my site and found this nice little Django doc:
Database Access Optimization, which suggests profiling followed by indexing and the selection of proper fields as the starting point for database optimization.
Normally, the django docs explain things pretty well, even things that more experienced programmers might consider "obvious". Not so in this case. After no explanation of indexing, the doc goes on to say:
We will assume you have done the obvious things above.
Uhhh. Wait! What the heck is indexing?
Obviously I can figure out what indexing is via Google; my question is: what do I need to know about databases in order to create a scalable website? What should I be aware of about the Django framework specifically? What other "obvious" things ought I to know? Where can I learn them?
I'm looking to get pointed in a direction here. I don't need to learn anything and everything about SQL, I just want to be informed enough to build my app the right way.
Thanks in advance!
I encourage you to read all that the other answers suggest and whatever else you can find on the subject, because it's all good information to know and will make you a better programmer.
That said, one of the nice things about Django and other similar frameworks is that for the most part you don't have to know what's going on behind the scenes in the DB. Django adds indexes automatically for fields that need them. The encouragement to add more is based on the use cases of your app. If you continually query based on one particular field, you should ensure that that field is indexed. It might be already (if it's a foreign key, primary key, etc.), but other random fields typically aren't.
There are also various optimizations that are specific to each database. Django can't do much here because its goal is to remain database-independent. So, if you're using PostgreSQL, MySQL, or whatever, read about optimizations and best practices for that particular database.
Database design and database normalization (see Wikipedia: http://en.wikipedia.org/wiki/Database_design and http://en.wikipedia.org/wiki/Database_normalization) are two very important concepts, in addition to indexing.
In addition to these, having a basic understanding of your database of choice is necessary. Being able to add users, set permissions, and create a database are key things that you should know.
Learning how to back up your data is also crucial.
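As one concrete example of backing up: for SQLite, Python's sqlite3 module exposes the online backup API as `Connection.backup()` (Python 3.7+). Other databases have their own tools (for example `pg_dump` for PostgreSQL or `mysqldump` for MySQL). This is a minimal sketch; the `notes` table, the `backup_sqlite` helper and the file name are hypothetical.

```python
import os
import sqlite3
import tempfile

def backup_sqlite(source: sqlite3.Connection, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path using the online backup API."""
    dest = sqlite3.connect(dest_path)
    try:
        source.backup(dest)  # copies every page; the source can stay in use
    finally:
        dest.close()

# Hypothetical app database with one row in it.
app_db = sqlite3.connect(":memory:")
app_db.execute("CREATE TABLE notes (body TEXT)")
app_db.execute("INSERT INTO notes VALUES ('remember the backups')")
app_db.commit()

backup_dir = tempfile.mkdtemp()
backup_path = os.path.join(backup_dir, "app-backup.sqlite3")
backup_sqlite(app_db, backup_path)

# Restore check: a backup you have never read back is only a hope.
restored = sqlite3.connect(backup_path)
print(restored.execute("SELECT body FROM notes").fetchall())
```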
The list keeps getting longer: you should also be aware of the relationships Django handles for you (OneToOne, ManyToMany, ManyToOne). https://docs.djangoproject.com/en/dev/topics/db/models/
The performance impact of JOINs shouldn't be ignored. Accessing model attributes in Django is so easy that it's tempting to forget that following some foreign key relationships can have a huge performance impact.
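To see why casually following foreign keys can hurt, here is the classic "N+1 queries" pattern next to a single JOIN, with statements counted via sqlite3's `set_trace_callback`. In Django the same trap appears when a loop touches a foreign key on each object (e.g. `book.author.name`); `QuerySet.select_related()` tells the ORM to fetch the related rows with a JOIN up front. The tables and data here are hypothetical, and plain SQL stands in for the ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT,
                        author_id INTEGER REFERENCES authors(id));
""")
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                 [(i, f"Author {i}") for i in range(1, 51)])
conn.executemany("INSERT INTO books (title, author_id) VALUES (?, ?)",
                 [(f"Book {i}", i) for i in range(1, 51)])

executed = []
conn.set_trace_callback(executed.append)  # records each SQL statement SQLite runs

# N+1: one query for the books, then one extra query per book for its author.
executed.clear()
for title, author_id in conn.execute("SELECT title, author_id FROM books").fetchall():
    conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
n_plus_one_count = len(executed)

# A single JOIN fetches the same information in one statement.
executed.clear()
joined = conn.execute("""
    SELECT b.title, a.name FROM books b JOIN authors a ON a.id = b.author_id
""").fetchall()
join_count = len(executed)

print(n_plus_one_count, join_count)
```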
Once you have a basic understanding of these things you should be at a pretty good starting point for creating a non-trivial django app!
Wikipedia has a nice article about database indexes. They are similar(ish) to the index in a book, i.e. they let you (the computer) find things faster because you just look at the index instead of reading every page (probably a very bad analogy :-)
As for performance, there are many things you can do. Since it is a very detailed subject in itself, and one that is particular to each RDBMS, it would be distracting / irrelevant for the Django docs to go into great detail. The best thing is really to google performance tips for your particular RDBMS. There are some general tips, such as indexing, limiting queries to return only the required data, etc.
I think one of the main things is a good design: sticking as much as possible to normal form, and in general actually taking your database into consideration before programming your models etc. (which you clearly seem to be doing). Naming conventions are also a big plus; remember, explicit is better than implicit :-)
To summarise:
Learn/understand the fundamentals such as the relational model
Decide on a naming convention
Design your database perhaps using an ERM tool
Prefer surrogate IDs
Use the correct data type of minimum possible size
Use indexes appropriately and don't over index
Avoid unnecessary querying and over-querying
Prioritise security and stability over raw performance
Once you have an up-and-running database, 'tune' it by analysing/profiling settings, queries, design, etc.
Back up and archive regularly (e.g. with cron)
Hang out here :-)
If required, move on to replication (master/slave; Django supports this quite well too)
Consider upgrading your hardware
Don't get too hung up about it

SQL vs NOSQL: Which to use for this schema?

I've got an upcoming project and I can't decide whether to stick with SQL or switch over to NoSQL. It's basically a reporting system with the main interface being reporting on the data entered in by users.
Here's the schema I've got mapped out:
Because this schema is so nested, I started thinking about NoSQL. With SQL, I'm afraid I'm going to have a crap-ton of joins to get to the bottom of the tree (the Record model).
My concerns, though, are two-fold:
I'm only just starting to get into NoSQL and I'm worried my knowledge may limit me because of the tight timeframe.
Although creating data at the bottom of the tree will probably be relatively simple, I'm worried that it may be hard to report on without getting into some heavy map/reduce stuff (that I have zero experience with).
My question:
Given my concerns, do you think this schema -- because of how deeply nested it is -- lends itself more to NoSQL? If so, do you think the reporting on the "records" will be difficult?
I realize that it may be difficult to answer these questions without more info, so please let me know what other info may be helpful in coming up with an answer.
Thanks in advance for your help!
Just my opinion:
I stared at the diagram for about 3 seconds: this is clearly relational. The benefits of an RDBMS heavily outweigh a NoSQL solution here. Why would you want to use NoSQL? Are there 100,000+ records (maybe a million plus)? Do you need microsecond/millisecond performance?
As I understand it, NoSQL doesn't exist because people don't like lots of joins. It exists because big relational systems don't suit every situation, such as some kinds of hierarchical data. This situation suits a relational system perfectly, however.
You can probably collapse the whole {organisation, region, campus, event} hierarchy into one hierarchical / tree-based / self-referential relation. Maybe "user", too.
That would drastically reduce the number of tables needed. For an example, please take a look at this implementation: Interesting tree/hierarchical data structure problem (which is actually more complex than yours).
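A minimal sketch of collapsing organisation/region/campus/event into one self-referential table (an adjacency list), queried with a recursive common table expression. It assumes a database with `WITH RECURSIVE` support (SQLite 3.8.3+, PostgreSQL and others); the `nodes` table, its `kind` column and the sample rows are hypothetical, since the original schema diagram isn't available here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nodes (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES nodes(id),  -- NULL for the root
        kind      TEXT NOT NULL,                 -- 'organisation', 'region', 'campus', 'event'
        name      TEXT NOT NULL
    )
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (1, None, "organisation", "Acme"),
    (2, 1, "region", "North"),
    (3, 1, "region", "South"),
    (4, 2, "campus", "North Campus"),
    (5, 4, "event", "Open Day"),
    (6, 3, "campus", "South Campus"),
])

# Everything underneath region 2, at any depth, with its distance from that region.
subtree = conn.execute("""
    WITH RECURSIVE below(id, kind, name, depth) AS (
        SELECT id, kind, name, 0 FROM nodes WHERE id = ?
        UNION ALL
        SELECT n.id, n.kind, n.name, b.depth + 1
        FROM nodes n JOIN below b ON n.parent_id = b.id
    )
    SELECT kind, name, depth FROM below ORDER BY depth
""", (2,)).fetchall()
print(subtree)
```

Because every level lives in one table, adding a new kind of level means adding rows rather than tables; the trade-off is that the database no longer enforces, say, "events only sit under campuses" unless you add checks for it.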
BTW: I don't have the faintest idea what "metric model" means. Inches? Miles to the gallon? Or just "measurements" ? Could you please explain a bit more what you intend to do?
EDIT: BTW2: the model you propose is technically not too difficult for postgres. But it is probably bigger than necessary for humans.
My question: Given my concerns, do you think this schema -- because of how deeply nested it is -- lends itself more to NoSQL?
Deep nesting is not a point pro or contra SQL/NoSQL.
If so, do you think the reporting on the "records" will be difficult?
That's the tipping point and here you don't give us the relevant information: What is this "reporting" thing in your case?
Does one report aggregate much data? E.g. does it simply aggregate all records and return a sum of them?
Does it aggregate over many of your layers?
Does a report evaluate strictly hierarchical or does it correlate event1.metric4.record42 to event2.metric18.record50 (or something like that)?
How much data must be transferred from the NoSQL DB to your application only to be aggregated there, with most of it thrown away?
How unstructured is your data? Well - very structured it seems.
Those are typical situations where RDBMSs have proven their value. If these points are not important in your case, then you can choose freely.
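For a sense of what reporting looks like on the relational side: rolling records up through several layers is a join plus GROUP BY, and the database returns only the summarised rows instead of shipping every record to the application. This is a sketch with hypothetical `events`, `metrics` and `records` tables, since the actual schema diagram isn't available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE metrics (id INTEGER PRIMARY KEY, event_id INTEGER REFERENCES events(id), name TEXT);
    CREATE TABLE records (id INTEGER PRIMARY KEY, metric_id INTEGER REFERENCES metrics(id), value REAL);

    INSERT INTO events  VALUES (1, 'Spring Fair'), (2, 'Open Day');
    INSERT INTO metrics VALUES (1, 1, 'attendance'), (2, 1, 'sign-ups'), (3, 2, 'attendance');
    INSERT INTO records (metric_id, value) VALUES (1, 120), (1, 80), (2, 15), (3, 60), (3, 40);
""")

# Roll every record up to (event, metric) in one query; only the totals leave the database.
report = conn.execute("""
    SELECT e.name AS event, m.name AS metric, COUNT(r.id) AS n, SUM(r.value) AS total
    FROM records r
    JOIN metrics m ON m.id = r.metric_id
    JOIN events  e ON e.id = m.event_id
    GROUP BY e.name, m.name
    ORDER BY e.name, m.name
""").fetchall()
for row in report:
    print(row)
```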

Any Validity to the NoSQL movement?

First of all, I'm relatively new to the database world. I'm graduating with my B.S. in Computer Science this semester, and database technologies have really caught my eye, so I've been studying a lot of T-SQL because I eventually want to get a SQL development job (MS SQL Server seemed like the best choice right now because it's on the rise).
Anyway, I've heard a lot of hoopla about this NoSQL movement of non-relational database management systems. Trying to keep this question as non-subjective as possible, I mainly want to know the advantages/disadvantages of NRDBMSs (like NoSQL) and whether there is really a future in them. Perhaps as a side question: is it a bad time to be studying SQL in general (specifically the normal RDBMSs we are so used to)? I foresee people sticking with them for a long time, but then again... I don't know. I'd hate to see the market for my interest suddenly take a dive.
There is definitely validity to the NoSQL movement, but I wouldn't worry about your SQL skills going to waste. NoSQL storage architectures were born out of the need for highly available and scalable data stores that went beyond what a typical relational database could provide. This comes at a cost, though, and typically that cost is guaranteed consistency. This isn't always a big concern. Something like Facebook, for example, doesn't need complete consistency at every moment for things like your pictures and status updates; as long as the data becomes consistent at some point, it's okay. At the other end, take your bank account. That type of data store needs to provide the strong ACID guarantees that a relational database provides.
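The bank-account case is exactly what a transaction is for: either both sides of a transfer happen or neither does. This is a minimal sketch with sqlite3, using a CHECK constraint so that an overdraft aborts the transfer; the `accounts` table and the `transfer` helper are hypothetical. The credit is deliberately applied before the debit, so the failing case shows that the rollback undoes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically; return True if committed, False if rolled back."""
    try:
        with conn:  # BEGIN ... COMMIT, or ROLLBACK if anything inside raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed, so the credit to dst was undone as well

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # would overdraw bob, so nothing changes
print(ok, failed, balances(conn))
```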
NoSQL isn't something that I see taking over the world; it's an alternative to the common RDBMS approach, and as with everything else it has its strengths and weaknesses.
Here is an excellent article on the subject, written about Netflix.
Others can address the NoSQL specifics better than I can, but as for the second part of your question (worrying about getting into SQL if NoSQL starts becoming more popular): I have customers who still use very old flat-file based mainframes.
SQL hasn't even reached full penetration yet, and it is VERY entrenched in a large number of business processes. The market for SQL development and maintenance won't be going away any time soon, and if it starts to it won't be overnight - you'll have time to learn the Next Big Thing before you're obsolete.
NoSql databases are great for storing unstructured data. Think of it as the next generation of Lotus Notes.
I wouldn't leverage a NoSql database for storing a list of people and addresses, as those are completely structured and well known.
However, if I had a set of dynamic attributes of some type (name/value pairs), or something similar that required a lot of pivoting to get at, then I'd seriously look into it. I might even go that route when there is structure but it isn't known ahead of time, such as with dynamic tables.
That said, when we did some evaluations earlier this year (March 2010), we didn't think the available open source NoSql databases were ready for serious production. There's a lot more to databases than just putting data in and getting it out: automated backups, load balancing, solid query tools, consistency checkers, etc. are an absolute must. We will reevaluate early next year.
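To show what "a lot of pivoting" means: storing dynamic name/value attributes in a relational table (an entity-attribute-value layout) works, but every report has to turn rows back into columns, typically with conditional aggregation. This is a sketch with a hypothetical `product_attributes` table; note that each attribute you want to see needs its own CASE expression, which is part of why a schema-less store can look attractive when attributes aren't known ahead of time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_attributes (product_id INTEGER, name TEXT, value TEXT)")
conn.executemany("INSERT INTO product_attributes VALUES (?, ?, ?)", [
    (1, "color", "red"),  (1, "size", "M"),
    (2, "color", "blue"), (2, "weight_kg", "1.2"),  # product 2 has no size, but has a weight
])

# Pivot rows into columns; attributes a product lacks come back as NULL (None in Python).
pivot = conn.execute("""
    SELECT product_id,
           MAX(CASE WHEN name = 'color'     THEN value END) AS color,
           MAX(CASE WHEN name = 'size'      THEN value END) AS size,
           MAX(CASE WHEN name = 'weight_kg' THEN value END) AS weight_kg
    FROM product_attributes
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()
print(pivot)
```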
SQL ain't going away, and the relational model is a basic information-systems building block that's definitely worth studying and understanding in its own right. I'd stick with it.
Databases based on an object instead of relational model have existed forever. The difference is that in the past they tended to be closed (and expensive!) packages from single vendors. No-one really wants to have their mission-critical apps locked into a proprietary database, dependent on licensing from a single, sometimes unresponsive, supplier.
In contrast today's NoSQL databases are typically free, open, and well-aligned to existing web-oriented technologies, allowing for quick, responsive scaling without worrying about licenses, and potential participation in future development (or local forking/patching if necessary).
What they also are is diverse, such that you can't really classify them all together as being good for a particular kind of task. There are trivial key-value buckets that make no attempt at being ACID-safe, there are object databases with their own safety paradigms (like CouchDB's revision conflicts), there are more traditional relational-like databases that just don't use SQL as a query mechanism (because let's face it, nice though it is that you can use the same query language across databases, hacking together SQL queries into a string just so that the database at the other end can pick the string apart to get the logic of the query you wanted to do, is a bit silly).
There are lots of them, most are very immature compared to the ancient edifice of SQL, and it'll take a while for winners to emerge. Is NoSQL “valid”? Sure. But I would say that choosing a particular NoSQL database as a basis for study today (as opposed to using one that fits your needs for a particular task that SQL is bad at) would be premature.
The future of big systems will require skills with both SQL and NoSQL.
NoSQL is an important paradigm and it's not going anywhere. Joins don't scale horizontally, and SQL databases are effectively just big "join machines". NoSQL is still in its relative infancy; there are tons of players and, just like SQL, each one has its own little variations.
But that's all going to shake out in the next few years.
As a recent grad, you have to start somewhere. SQL is simply the easiest place to start. You will see lots of it going forward. However, once you've got your head around SQL (say you've passed your MS T-SQL course), I strongly suggest taking a look at something like MongoDB/Riak/CouchDB as your next adventure.
You probably won't jump straight into a company using NoSQL, but you will run into problems where NoSQL is actually a much simpler solution. You won't know this, though, until you actually play with NoSQL.
It sounds like you're already pointed in the right direction by looking at job postings to see what the current needs are in data storage and management, if this is your passion. I wouldn't be surprised if interviewers started asking about the advantages/disadvantages of NoSQL just to see whether you're familiar with the latest developments (and if you're applying for a DBA position, they might also ask about ACID compliance and the CAP theorem).
Lots of companies are starting to use NoSQL technologies, so it's valid in the sense that people are using it. And not just small startups, either: companies like Facebook (Cassandra), Yahoo (Hadoop), Google (Bigtable), and Etsy (MongoDB) believe that NoSQL solutions fit certain needs.
I think NoSQL is more of a niche. It's really good for some applications, but will probably never totally displace RDBMSs (although I hear that combinations of NoSQL on top of an RDBMS backend are becoming more common). My advice would be to get good with an old-school RDBMS (it's still much more common, at least from what I've seen), and then get into NoSQL on the side if you want.
Brent Ozar did an excellent writeup on the topic here: NoSQL Basics for Database Administrators

Database modelling or database design: Which comes first?

I would like to know which is the common practice for a domain implementation: designing the business objects that need persistence first, or designing the database schema first, generating it from an entity relationship diagram (and afterwards the ORM POCOs*)?
I am going to start a solution, but I would like to know which is the most preferable "pattern".
(*powered by NHibernate)
Depends on whether you're an object or relational modeler. Preference is dictated by what you know best.
I'm an object person, so I'd say model the problem in objects and then get the relational schema from that.
I think there are lots of issues around data that aren't addressed by objects (e.g., indexing, primary and foreign keys, normalization) that say you still have some work to do when you're finished.
But any relational person will argue that they're primary and should be in the driver's seat.
I doubt that there will be a definitive answer to this one. I don't believe there should be. There's an object-relational impedance mismatch that's real. Objects are instance-centric; relational models are set-based. Both need careful consideration.
Both are common practice, and it comes down to each implementer's preference. As duffymo suggested, you should go with the one you know best.
However you should also take into consideration what your regular patterns for working with data are. Having something that's nicely modeled in either one, but very costly in terms of performance is not a good choice. The balance is somewhere in the middle.
I personally tend to pay more attention to the database side of things, mainly because databases are harder to scale. Keeping this in mind when designing a database helps. You don't necessarily have to make the initial design follow strict scaling rules, but having scaling in mind can keep you from making design decisions that amount to shooting yourself in the foot when the need to scale arises later on.

What are the principles behind, and benefits of, the "party model"?

The "party model" is a "pattern" for relational database design. At least part of it involves finding commonality between many entities, such as Customer, Employee, Partner, etc., and factoring that into some more "abstract" database tables.
I'd like to find out your thoughts on the following:
What are the core principles and motivating forces behind the party model?
What does it prescribe you do to your data model? (My bit above is pretty high level and quite possibly incorrect in some ways. I've been on a project that used it, but I was working with a separate team focused on other issues).
What has your experience led you to feel about it? Did you use it, and if so, would you do so again? What were the pros and cons?
Did the party model limit your choice of ORMs? For example, did you have to eliminate certain ORMs because they didn't allow for enough of an "abstraction layer" between your domain objects and your physical data model?
I'm sure every response won't address every one of those questions ... but anything touching on one or more of them is going to help me make some decisions I'm facing.
Thanks.
What are the core principles and motivating forces behind the party model?
To the extent that I've used it, it's mostly about code reuse and flexibility. We've used it before in the guest / user / admin model and it certainly proves its value when you need to move a user from one group to another. Extend this to having organizations and companies represented with users under them, and it's really providing a form of abstraction that isn't particularly inherent in SQL.
What does it prescribe you do to your data model? (My bit above is pretty high level and quite possibly incorrect in some ways. I've been on a project that used it, but I was working with a separate team focused on other issues).
You're pretty correct in your bit above, though it needs some more detail. You can imagine a situation where an entity in the database (call it a Party) contracts out to another Party, which may in turn subcontract work out. A party might be an Employee, a Contractor, or a Company, all subclasses of Party. From my understanding, you would have a Party table and then more specific tables for each subclass, which could then be further subclassed (Party -> Person -> Contractor).
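A minimal sketch of a Party table with subtype tables, as described above (often called class-table inheritance), using sqlite3. The table and column names are hypothetical, and this is only one of many ways to lay out a party model. It shows both the upside (one query across all party types; contracts that point at `party`, so any type can subcontract to any other) and the cost (extra joins to reach subtype details).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Supertype: anything that can take part in a contract is a party.
    CREATE TABLE party (
        id         INTEGER PRIMARY KEY,
        party_type TEXT NOT NULL CHECK (party_type IN ('person', 'company')),
        name       TEXT NOT NULL
    );
    -- Subtypes share the party's primary key (one-to-one with party).
    CREATE TABLE person  (party_id INTEGER PRIMARY KEY REFERENCES party(id), birth_date TEXT);
    CREATE TABLE company (party_id INTEGER PRIMARY KEY REFERENCES party(id), registration_no TEXT);

    -- Relationships point at party, so any type can contract with any other type.
    CREATE TABLE contract (
        id            INTEGER PRIMARY KEY,
        client_id     INTEGER NOT NULL REFERENCES party(id),
        contractor_id INTEGER NOT NULL REFERENCES party(id)
    );

    INSERT INTO party VALUES (1, 'company', 'Acme Ltd'), (2, 'company', 'BuildCo'), (3, 'person', 'Sam Lee');
    INSERT INTO company VALUES (1, 'UK123'), (2, 'UK456');
    INSERT INTO person  VALUES (3, '1980-04-01');
    INSERT INTO contract (client_id, contractor_id) VALUES (1, 2), (2, 3);  -- Acme -> BuildCo -> Sam
""")

# One query over all party types, with subtype details joined in where they exist.
everyone = conn.execute("""
    SELECT p.name, p.party_type, c.registration_no, pe.birth_date
    FROM party p
    LEFT JOIN company c ON c.party_id = p.id
    LEFT JOIN person pe ON pe.party_id = p.id
    ORDER BY p.id
""").fetchall()

# Who subcontracted to whom, regardless of whether each side is a person or a company.
chain = conn.execute("""
    SELECT cl.name, co.name FROM contract k
    JOIN party cl ON cl.id = k.client_id
    JOIN party co ON co.id = k.contractor_id
    ORDER BY k.id
""").fetchall()
print(everyone)
print(chain)
```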
What has your experience led you to feel about it? Did you use it, and if so, would you do so again? What were the pros and cons?
It has its benefits if you need the flexibility to add new types to your system and to create relationships between types that you didn't anticipate and architect for at the beginning (users moving to a new level, companies hiring other companies, etc.). It also lets you run a single query to retrieve data for multiple types of parties (Companies, Employees, Contractors). On the flip side, you're adding layers of abstraction to get to the data you actually need, and you're increasing the load (or at least the number of joins) on the database when you query for a specific type. If your abstraction goes too far, you'll likely need to run multiple queries to retrieve the data, as the complexity starts to hurt both readability and database load.
Did the party model limit your choice of ORMs? For example, did you have to eliminate certain ORMs because they didn't allow for enough of an "abstraction layer" between your domain objects and your physical data model?
This is an area that I'm admittedly a bit weak in, but I've found that using views and mirrored abstraction in the application layer hasn't made this too much of a problem. The real problem for me has always been "where does piece of data X live?" when I want to read the data source directly (it's not always intuitive for new developers on the system, either).
The idea behind the party models (aka entity schema) is to define a database that leverages some of the scalability benefits of schema-free databases. The party model does that by defining its entities as party type records, as opposed to one table per entity. The result is an extremely normalized database with very few tables and very little knowledge about the semantic meaning of the data it stores. All that knowledge is pushed to the data access in code. Database upgrades using the party model are minimal to none, since the schema never changes. It’s essentially a glorified key-value pair data model structure with some fancy names and a couple of extra attributes.
Pros:
Kick-ass horizontal scalability. Once your 5-6 tables are defined in your entity model, you can go to the beach and sip margaritas. You can virtually scale this database out as much as you want with minimum efforts.
The database supports any data structure you throw at it. You can also change data structures and party/entities definitions on the fly without affecting your application. This is very very powerful.
You can model any arbitrary data entity by adding records, not changing the schema. Meaning you can say goodbye to schema migration scripts.
This is programmers’ paradise, since the code they write will define the actual entities they use in code, and there are no mappings from Objects to Tables or anything like that. You can think of the Party table as the base object of your framework of choice (System.Object for .NET)
Cons:
Party/Entity models never play well with ORMs, so forget about using EF or NHibernate to get semantically meaningful entities out of your entity database.
Lots of joins. Performance tuning challenges. This ‘con’ is relative to the practices you use to define your entities, but it's safe to say that you'll be writing a lot more of those mind-bending queries that give you nightmares.
Harder to consume. Developers and DB pros unfamiliar with your business will have a harder time getting used to the entities exposed by these models. Since everything is abstract, there's no diagram or visualization you can build on top of your database to explain to someone else what is stored.
Heavy data access models or business rules engines will be needed. Basically you have to do the work of understanding what the heck you want out of your database at some point, and your database model is not going to help you this time around.
If you are considering a party or entity schema in a relational database, you should probably take a look at other solutions like a NoSql data store, BigTable or KV Stores. There are some great products out there with massive deployments and traction such as MongoDB, DynamoDB, and Cassandra that pioneered this movement.
This is a vast topic; I would recommend reading The Data Model Resource Book Volume 3 - Universal Patterns for Data Modeling by Len Silverston and Paul Agnew.
I've just received my copy and it's pretty good. It gives an overview of many approaches to data modeling, including hybrid contextual role patterns and so on, with detailed pros and cons for every approach.
There is a plethora of ways to model party relationships and roles, all with their benefits and disadvantages. The accepted answer covers just one instance of a 'party model'.
For instance, in many approaches, notions like "Employee", "Project Manager" etc. are roles that a party can play within a certain context. I will try to give you a better breakdown once I get home.
When I was part of a team implementing these ideas in the early 1980s, it did not limit our choice of ORMs, because those hadn't been invented yet.
I'd fall back on those ideas any time, as that particular project was one of the most convincing proofs-of-concept I have ever seen of a "revolutionary" idea (which it certainly was at the time).
It forces you to nothing. And it doesn't stop you from anything (from any mistake, I mean). The one defining your own information model is you.
All parties have lots of properties in common. The fact that they have a name and such (we called those "signaletics"). The fact that they have principal/primary locations called "addresses". The fact that they all are involved, in some sense, in the business' contracts.
Put simply, from my understanding: party modeling gives you flexibility, but it needs more effort to implement (T-SQL joins and so on).
I also want to point out that using party modeling (generalization/specialization) gives you the ability to have FK relations to other tables. For example, think of different types of users (admin, user, ...) generalized into a User table; you can then have a UserID in your Authorization table.
I'm not sure, but the party model sounds like a particular case of the generalization-specialization pattern. A search on "generalization specialization relational modeling" finds some interesting articles.