I've been working on a database application (Access 2007) and have a couple questions regarding architecture. This is, by far, the most complex application I've written yet.
I presently have 5 tables, each with its own unique ID field:
platforms
vehicles
manuals
procedures
faults
It is possible for the same fault to appear in multiple procedures, and for the same procedure to appear in multiple manuals. I would like to provide users the opportunity to copy and delete records, as well as cross-link records between tables.
Presently, I have join tables between each of the tables listed above,
platforms_vehicles
vehicles_manuals
manuals_procedures
procedures_faults
Editing the relationships between the main tables (by editing contents of the join tables) has gotten quite complicated, as copy and delete actions have a cascading effect on other tables. In my mind, this could be performed efficiently using recursion, but actually getting it to work is another story altogether.
Question #1: Is the architecture I described a "good" way to arrange this data, or is it unnecessarily complicated? (I have a habit of doing things that way.)
Question #2: Is there a "preferred" method of doing this?
Question #3: Would I be better off organizing the tables in a different manner, such as this:
Only 5 tables,
platforms
vehicles
manuals
procedures
faults
Each table has a key for its content fields, as well as another key representing its parent record in the next higher table. For example,
platforms
platformID, platformName
vehicles
platformID, vehicleID, vehicleName
manuals
vehicleID, manualID, manualName
...
This seems more straightforward to me, and certainly less complex. If you all would care to provide some professional feedback, I would greatly appreciate it.
Thanks!
You need to use MS Access' Relationship tool to set up Referential integrity rules. That way, you can cascade deletes and updates automatically. I would avoid "join tables" as they are unnecessary as long as the parent key is in the table (which would be called a foreign key).
Related
I feel that this should be a simple question, but I can't seem to find an answer anywhere.
I have an MS Access database where all the key fields have their proper key icon when I view the tables, but no relationships are defined. I need to create relationships between the "UnitID" key field for all the data tables. Some relationships are one-to-one and others are one-to-many (or one to none), but that doesn't matter, I don't need to enforce referential integrity. I just need to query the database, and worked with the query result tables, not add anything or change the data. All the UnitID fields have the same name.
Right now, I am just pulling up the relationship tab and dragging-and-dropping the names for each table, which takes forever. I can use the edit relationships icon that brings up a form, but it still needs to be re-opened for each table.
I am working with a government, publicly downloadable Access database. I realize Access isn't ideal, but that is the format it comes in and the program I'm am supposed to use for my job.
If there is a way to do it in the interface, that would be the best, since I can share it directly with others in my office who are unfamiliar with macros. But I have used VBA before for Excel and know some basic SQL. I've never used macros in Access, so I don't know what their capacities are; can this be done if there is no in-built functionality?
So are you talking about the Relationship Designer Window (Database Tools | Relationships menu option) in MS Access as pictured? With all the tables added, it takes about 5 seconds to click UnitID on one table, drag/drop to UnitID on another table and click Create. I guess it might take an hour or two to do them all?
Why must you have Relationships created at all? They don't define what Queries you can run. And if you don't need Referential Integrity, then I don't see much practical use for them anyhow.
If you can't get your Queries to run, then I would look elsewhere for the root of the problem.
By the way, once you get this problem solved, consider this: you may not need to actually create any Query result Tables if they are used as intermediate results. Since the result of a Query is a Table, then anywhere that the syntax mentions "Table", you can insert a Query. That is, Queries can be nested inside of other Queries. I mention this because you seem to be saying that you need a whole lot of result Tables, which in itself is going to get messy, not to mention that they will take up and lot of space and, worse, will be redundant and will have to recreated whenever your source Tables change (liable to be a maintenance nightmare).
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I am going to implement several lookup tables in a system. In general all lookup tables have the same structure like
id, name, value, rank, active
We are using AngularJS as front-end and Web API/Entity framework as backend in this project
There are some options on my mind
Option 1 - Create a set of lookup tables with the same structure
e.g. LKRegion, LKStatus, LKPeriod, LKState, LKDepartment, etc.
This option is a traditional design. The data schema is structural and easy to understand. It is easy to implement/enforce foreign key integrity. But you have to create separated web methods to handle CRUD actions. You have to repeat the same thing if you have another lookup table to add in the future.
Option 2 - Create a big lookup table by adding an extra column called LookupType to identify the lookup group
This option reduces the number of tables. Make the lookup table easy to maintain and retrieve (e.g. One schema, one web method can handle all general lookup CRUD actions). But the foreign key integrity is a little bit loose due to the LookupType.
Please share your preference and the tell me why. I would like to get the best practise on this implementation. Thank you!
I'll defend Option 2, although in general, you want Option 1. As others have mentioned, Option 1 is the simpler method and easily allows foreign key relationships.
There are some circumstances where having a single reference table is handy. For instance, if you are writing a system that will support multiple human languages, then having a single reference table with the names of things is much simpler than a zillion reference tables spread throughout the database. Or, I suppose, you could have very arcane security requirements that require complex encryption algorithms -- and dealing with a single table is easier.
Nevertheless, referential integrity on reference tables is important. Some databases have non-trigger-based mechanisms that will support referential integrity for one table (foreign keys and computed columns in Oracle and SQL Server). These mechanisms are a bit cumbersome but they do allow different foreign key references to a single table. And, you can always enforce referential integrity using triggers, although I don't recommend that approach.
As with most things that databases do, there isn't a right answer. There is a generally accepted answer that works/is correct in most cases (Option 1). The second option would only be desirable under limited circumstances depending on system requirements.
I suggest that :
A. Follow the organization standard if this is an enterprise system (some may laugh loud on this, I know). If such a thing exists, it would certainly promote individual tables.
B. Use Enums or 1 aggregated lookup table for programming level lookups only (such as error messages, etc,) if you must only. Any lookup data for business related data should be (in my opinion) be in a separate table for the following reasons at least:
When you have separate tables, you need to use the correct table name when you join and not use a code column of the reference table. This makes writing queries less error prone. Writing "Select ... Where (TableID=12 and State="NY") AND (TableId=133 and Country="USA")"...style of coding is quite error prone during development. This is the major issue for me from coding perspective.
RI errors on inserts and updates may be ambiguous when there is more 1 than reference to the lookup in the row being inserted or updated.
In some cases, the a lookup table may have self references (relationships). For example, a Geographical location can be described as a hierarchy which would add more confusion to the model.
The relationships (references) could loose meaning in your database. You will find that almost every table in your system is linked to this one table. It some how will not make sense.
If you ever decided to allow the user to perform ad-hoc reporting, it would be difficult for them to use codes for lookup tables instead of names.
I feel that the 1 table approach breaks Normalization concepts - Can't prove it now though.
An disadvantage, is that you may need to build an indexes on PKs and FKs for some (or all) of the separate tables. However, in the world of powerful database available today, this may not be a big deal.
There are plenty of discussion in the net that I should have read before answering your question, however I present some of the links if you care to take a look at some yourself:
Link 1, Link 2, Link 3, Link 4, Link 5...
Avoid option 2 at all costs, go with option 1 without even thinking about it.(*)
Referential integrity is far too important to compromise in favour of virtually any other concern.
If there you go, only pain will you find.
If you want to reduce duplication, implement a list of services in your web-api implementation language (java?) and parametrize each service with the name of the lookup table to work with.
Edit
(*) It was wrong on my behalf to say "without even thinking about it". Of course, think about it. If need be, go ahead and even post a question on stackoverflow about it. Thinking is good, and Gordon Linoff's answer above demonstrates this nicely.
I'm a SQL noob, and whilst I'm aware of the major tools available, I'm not experienced enough to know the best tool for certain situations.
As an example, I current have a group of tables where referential integrity is needed. Each table does not have all the necessary columns itself to be able to constrain the data, so I have at least 3 options open to me.
Create other table/tables that connect the data together - apart from duplicated data, this leaves multiple files to keep synced.
Create a trigger - not too difficult, but how trustworthy is a trigger? And is it scalable?
Create a function - not something I've done before, but I came across an example showing how it could be used to constrain data stored across multiple tables.
Given what I'm trying to do - maintain integrity by joining data, what should I consider, and are all 3 methods suited to what I'm trying to do?
Here an example using a bridge table to link missing table:
Using foreign keys are the best (and fastest and lightest footstep) way to guarantee data consistency. If you want a table of States to guarantee that only valid state spellings (and states that you do business in) are added to sales orders (so when you search for all sales to New Jersey you only have to search for one spelling) to your other tables there is no easier way then to use FKs.
I need to use one Access(2007)database on 2 offline locations and then get all the data back in one database. Some advised me to use SharePoint, but after some trial and frustration I wonder if it's really the best way.
Is it possible to manage this in an automated way, with update queries or so?
I have 26 tables, but only 14 need to be updated frequently. I use autonumber to create the parentkey and use cascade updating for the linked tables.
If your data can handle it, it's probably better to use a more natural key for the tables that require frequent updating. I.e. ideally you can uniquely identify a record my some combination of the columns in that record. Autonumbers in two databases can, and very likely will, step on each other, then when you do merge any records based on an old auto number need to be mapped properly. That can be done but is kind of a pain. It'd be nicer to avoid it all from the start.
As for using Sharepoint (I assume the suggestion is to replace your tables with lists, not to just put your accdb on SP) it has a lot of limitations in terms of the kinds of indices that can be created and relationships you can establish. Maybe your data are simple enough to live with this. I'm yet to be able to justify the move.
ultimate the answer to your question is YES it is possible to manage the synchonization with insert/update queries and very likely some VBA (possibly lots depending on how complicated your table hierarchy is). You'll need to be vigilant about two people updating a single record. You'll need to come up with some means to resolve the conflict.
I know, I quite dislike the catch-all survey type questions, but I couldn't think of a better way to find out what I think I need to know. I'm very green in the world of database development, having only worked on a small number of projects that merely interacted with the database rather than having to actually create a new one from scratch. However, things change and now I am faced with creating my own database.
So far, I have created the tables I need and added the columns that I think I need, including any link tables for many-many relationships and columns for one-to-many relationships. I have some specific questions on this, but I felt that rather than get just these answered, it would make more sense to ask about things I may not even know, which I should address now rather than 6 months from now when we have a populated database and client tools using it.
First the questions on my database which have led me to realise I don't know enough:
How do I ensure my many-to-many link tables and my one-to-many columns are up-to-date when changes are made to the referenced tables? What problems may I encounter?
I am using nvarchar(n) and nvarchar(MAX) for various text fields. Should I use varchar equivalents instead (I had read there may be performance risks in using nvarchar)? Are there any other gotchas regarding the selection of datatypes besides being wary of using fixed length char arrays to hold variable length information? Any rules on how to select the appropriate datatype?
I use int for the ID column of each table, which is my primary key in all but the link tables (where I have two primary keys, the IDs of the referenced table rows). This ID is set as the identity. Are there pitfalls to this approach?
I have created metadata tables for things like unit types and statuses, but I don't know if this was the correct thing to do or not. Should you create new tables for things like enumerated lists or is there a better way?
I understand that databases are complex and the subject of many many worthy tomes, but I suspect many of you have some tips and tricks to augment such reading material (though tips for essential reading would also be welcome).
Community wiki'd due to the rather subjective nature of these kinds of posts. Apologies if this is a duplicate, I've conducted a number of searches for something like this but couldn't find any, though this one is certainly related. Thanks.
Update
I just found this question, which is very similar in a roundabout way.
Not normalising
Not using normalisation
Trying to implement a denormalised schema from the start
Seriously:
Foreign keys will disallow deletes or updates from the parent tables. Or they can be cascaded.
Small as possible: 2 recent SO questions datatypes and (n)varchar
May not be portable and your "natural key" (say "product name") still needs a unique constraint. Otherwise no, but remember that an IDENTITY column is a "surrogate key"
Edit: Say you expect to store fruit with columns FruitID and FruitName. You have no way to restrict to one occurence of "Apple" or "Orange" because although this is your "natural key", you are using a surrogate key (FruitID). So, to maintain integrity, you need a unique constraint on FruitName
Not sure or your meaning, sorry. Edit: Don't do it. Ye olde "One true lookup table" idea.
I'll reply to your subjective query with some vague generalities. :)
The most common pitfall of designing a database is the same pitfall of any programming solution, not fully understanding the problem being solved. In the case of a database, it is understanding the nature of the data. How big it is, how it comes and goes, what business rules must it adhere to.
Here are some questions to ponder.
What is updated the most frequently? Is keeping that table write-locked going to lock up queries? Will it become a hot spot? Even a seemingly well normalized schema can be a poor performer if you don't understand your read versus write ratios.
What are your external interface needs? I've been on projects where the dotted line to "that other system" nearly scuttled the whole project because implementing it was delayed until everything else was in place, that is to say, everything else was inflexible.
Any other unspoken requirements? My favorite is date sensitivity. All the data is there, your reports are beautiful, the boss looks them over and asks, when did that datum change? Who did it and when? Is the database supposed to track itself and its users, or just the data? Will your front end do it for you?
Just some things to think about.
It does sounds like you've got a good grasp on what you're meant to be doing, and indeed there isn't "one true path" to doing databases.
Have you set up cascades for your hierarchical objects (i.e., a single delete at the 'head' of your object in the database will delete all entries in tables relating to that entry)?
Your link tables and 1:n columns should be foreign keys, so there isn't much to worry about if the data changes. By "two primary keys" here, did you mean indexes?
As for metadata tables, I've done them in the past, and I've not done them. A single char status with SQL comment can suffice for a limited set of statuses, but beyond a certain amount, or where you can think of adding more in the future, you might want to have a reference to another table of metadata, or maybe a char(8ish). E.g., I've seen user tables have "NORMAL", "ADMIN", "SUPER", "GUEST", etc for user type, which could have been 1,2,3,4,5 fkeys to a "UserType" table, but with such a restricted enumeration does it matter? Other people have a table of permissions (booleans of what a user can do) instead - many ways to skin a cat.
You might find some usable stuff in these slides:
[http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back][1]
I also am a beginner to database design, but I found this online tutorial very, very helpful:
Database design with UML and SQL, 3rd edition
The author explains all the fundamental design aspects of database, and in a very clear manner. Before I found this online guide I did a lot of wikipedia reading about normalization. While that helped, this author explains the exact same stuff (through 3rd normal form, at least) but in a much, much easier to read way. It pretty much addresses all your questions as well.
I'd suggest a good book. The best IMO is this:
http://www.amazon.com/Server-2005-Database-Design-Optimization/dp/1590595297/ref=ntt_at_ep_dpt_1
In addition to not normalizing, a common problem I see is overindexing, done before there are performance measurements that take into account your in-production mix of reads vs. writes.
It's really, really easy to add an index to speed up a query, and harder to figure out which one to remove when you have several that are getting updated during an INSERT or UPDATE.
The middle ground is to go after obvious secondary indexes (e.g., for common, frequent lookups by name on large tables), deferring other candidate indexes until you have reasonable performance tests in place.
Among other things, not using primary keys, not thinking ahead about whether you'll be using indexed views (and designing tables accordingly; I once had to drop and recreate a large table at my site to change its ANSI_NULL attribute to ON so that I could then use it with an indexed view), using indices.