Best way to ensure referential integrity - SQL

I'm a SQL noob, and whilst I'm aware of the major tools available, I'm not experienced enough to know the best tool for certain situations.
As an example, I currently have a group of tables where referential integrity is needed. No single table has all the columns necessary to constrain the data by itself, so I have at least 3 options open to me.
Create another table or tables that connect the data together - apart from the duplicated data, this leaves multiple tables to keep in sync.
Create a trigger - not too difficult, but how trustworthy is a trigger? And is it scalable?
Create a function - not something I've done before, but I came across an example showing how it could be used to constrain data stored across multiple tables.
Given what I'm trying to do - maintain integrity by joining data - what should I consider, and are all 3 methods suited to the task?
Here's an example using a bridge table to link the tables:
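A minimal sketch of the bridge-table approach, using two made-up tables, Student and Course:

    CREATE TABLE Student (
        StudentId INT PRIMARY KEY,
        Name      VARCHAR(100) NOT NULL
    );

    CREATE TABLE Course (
        CourseId INT PRIMARY KEY,
        Title    VARCHAR(100) NOT NULL
    );

    -- The bridge table holds only the keys of the two tables it links.
    -- The composite primary key prevents duplicate pairs, and the two
    -- foreign keys guarantee both sides of every pair actually exist.
    CREATE TABLE StudentCourse (
        StudentId INT NOT NULL,
        CourseId  INT NOT NULL,
        PRIMARY KEY (StudentId, CourseId),
        FOREIGN KEY (StudentId) REFERENCES Student (StudentId),
        FOREIGN KEY (CourseId)  REFERENCES Course (CourseId)
    );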

Using foreign keys is the best (and fastest, and lightest-footprint) way to guarantee data consistency. If you want a table of States to guarantee that only valid state spellings (and states that you do business in) are added to the sales orders in your other tables (so when you search for all sales to New Jersey you only have to search for one spelling), there is no easier way than to use FKs.
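As a sketch of that States example (the table and column names here are assumptions):

    CREATE TABLE States (
        StateCode CHAR(2) PRIMARY KEY,      -- e.g. 'NJ'
        StateName VARCHAR(50) NOT NULL
    );

    CREATE TABLE SalesOrders (
        OrderId   INT PRIMARY KEY,
        StateCode CHAR(2) NOT NULL,
        -- only states listed in States are accepted
        FOREIGN KEY (StateCode) REFERENCES States (StateCode)
    );

Any INSERT into SalesOrders with a misspelled or unknown state is rejected by the database itself, so a search for 'NJ' finds every New Jersey sale.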

Related

Create SQL tables for each user as security measure

I've researched this topic and I'm relatively sure that in most cases the answer is "No", but I would like some second opinions specific to my case.
We're currently working on a multi-user web app where each user will basically have their own "portal/app" within the web app. It's not performance I'm worried about, but security.
I'm considering partitioning the data with a prefix - userid_table1, userid_table2 - to make it more manageable and to ensure no security-validation oversight is made by the team during development, since we can easily add a check that queries can only be run against tables matching userid_*.
Would you still recommend against this method?
I'm considering partitioning the data with a prefix - userid_table1, userid_table2 - to make it more manageable and to ensure no security-validation oversight is made by the team during development, since we can easily add a check that queries can only be run against tables matching userid_*.
More manageable? That sounds like a joke. Your database will end up with a zillion different tables. Any operation that you want to do across all users will be a nightmare:
Declaring foreign key constraints.
Defining a new index on the tables.
Adding a new column.
Restructuring the tables.
And so on. And so on.
Your users may be limited to a single table. But the application developer and DBA need to deal with all of them. I cringe thinking about trying to figure out where performance bottlenecks are in such a system.
I should add that databases are optimized for big tables, not lots of tables, so multiple tables are typically less efficient - and even less efficient when you think about all the half-filled pages in all those tables.
The same entities should not be spread among multiple tables, unless you have a really, really good reason. This is not a really good reason. One simple solution is to prevent users from having access to the base tables. Just give them access to views or user-defined table functions -- and have all of these filter on user ids.
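As a sketch of the view approach (the table, the view, and the use of SESSION_USER are assumptions; the exact session-user function varies by DBMS):

    -- One shared base table; users never query it directly.
    CREATE TABLE Orders (
        OrderId  INT PRIMARY KEY,
        UserName VARCHAR(128) NOT NULL,
        Total    DECIMAL(10, 2) NOT NULL
    );

    -- Each user sees only their own rows through the view.
    CREATE VIEW MyOrders AS
        SELECT OrderId, Total
        FROM Orders
        WHERE UserName = SESSION_USER;

    -- Grant access to the view only, never the base table, e.g.:
    -- GRANT SELECT ON MyOrders TO app_users;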
There are some edge cases where you do want separate tables for each user. Typically, each user would have a very complex tables (think B2B application) and, in fact, they might have their own database. There may also be legal requirements to separate data. In these cases, though, the "separateness" would typically be at the database level, not the table level.

How to create a dynamic database

I'm working with .NET Core. I want to create a database for stock so that the user can add a new type of product with unknown features, and can also add features to an existing product.
I really need help with the design of the database.
Databases have schemas. A schema is a rigid structure that defines both the characteristics and constraints of the data that can be placed in it. You cannot do something like dynamically adding columns without fundamentally impacting the database's integrity.
In true relational databases (SQL Server, MySQL, PostgreSQL, etc.), such changes are flat-out disallowed. However, some less rigid NoSQL solutions are either schema-less or have malleable schemas and will allow you to just start tracking some new data point without first altering the structure of the database. Even then, though, data integrity becomes a serious issue, and you can end up borking your entire dataset if you do this kind of stuff willy-nilly.
Long and short, there's really no "dynamic" where databases are concerned. Even in NoSQL solutions, you're largely expected to plan out your data structure beforehand, and failure to do so results in inconsistencies in the data that can negate its usefulness entirely.
Your best bet for something like the described requirement is to actually have a Features table. In its simplest form, it might just have a string column for a name and a foreign key (or simply an ID-referencing column, depending on whether the database is relational or not) back to the product it's associated with. You'll need a primary key as well, which could either be a composite of the name and product ID (essentially making the combination unique) or an actual identity-type column.
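A minimal relational sketch of that (table and column names are illustrative):

    CREATE TABLE Product (
        ProductId INT PRIMARY KEY,
        Name      VARCHAR(100) NOT NULL
    );

    -- Each feature is just a named row pointing back at its product.
    -- The composite primary key makes each (product, feature name) pair unique.
    CREATE TABLE Feature (
        ProductId INT NOT NULL,
        Name      VARCHAR(100) NOT NULL,
        PRIMARY KEY (ProductId, Name),
        FOREIGN KEY (ProductId) REFERENCES Product (ProductId)
    );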
The key with data, in general, is to generalize. Nothing is completely unique; most things are just variations of other things. Boil your data down to least common denominators to determine your actual schema. Then, where there are outliers, you can take a less rigid strategy like the one described above.

Lookup tables implementation - one table or separate tables [closed]

I am going to implement several lookup tables in a system. In general, all the lookup tables have the same structure, like:
id, name, value, rank, active
We are using AngularJS as the front-end and Web API/Entity Framework as the backend in this project.
There are some options on my mind:
Option 1 - Create a set of lookup tables with the same structure
e.g. LKRegion, LKStatus, LKPeriod, LKState, LKDepartment, etc.
This option is a traditional design. The data schema is structured and easy to understand, and it is easy to implement/enforce foreign key integrity. But you have to create separate web methods to handle CRUD actions, and you have to repeat the same work for every lookup table you add in the future.
Option 2 - Create a big lookup table by adding an extra column called LookupType to identify the lookup group
This option reduces the number of tables and makes the lookup data easier to maintain and retrieve (e.g. one schema, and one web method can handle all general lookup CRUD actions). But the foreign key integrity is a little bit loose due to the LookupType.
Please share your preference and tell me why. I would like to get the best practice on this implementation. Thank you!
I'll defend Option 2, although in general, you want Option 1. As others have mentioned, Option 1 is the simpler method and easily allows foreign key relationships.
There are some circumstances where having a single reference table is handy. For instance, if you are writing a system that will support multiple human languages, then having a single reference table with the names of things is much simpler than a zillion reference tables spread throughout the database. Or, I suppose, you could have very arcane security requirements that require complex encryption algorithms -- and dealing with a single table is easier.
Nevertheless, referential integrity on reference tables is important. Some databases have non-trigger-based mechanisms that will support referential integrity for one table (foreign keys and computed columns in Oracle and SQL Server). These mechanisms are a bit cumbersome but they do allow different foreign key references to a single table. And, you can always enforce referential integrity using triggers, although I don't recommend that approach.
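To make that concrete, here is a sketch of the SQL Server variant (a persisted computed column pinned to one lookup type, used in a composite foreign key); all table and column names are assumptions:

    CREATE TABLE Lookup (
        LookupType VARCHAR(20) NOT NULL,
        Id         INT NOT NULL,
        Name       VARCHAR(50) NOT NULL,
        PRIMARY KEY (LookupType, Id)
    );

    CREATE TABLE Orders (
        OrderId    INT PRIMARY KEY,
        StatusId   INT NOT NULL,
        -- Pinned to 'Status', so the composite FK can only match Status rows.
        StatusType AS CAST('Status' AS VARCHAR(20)) PERSISTED,
        FOREIGN KEY (StatusType, StatusId)
            REFERENCES Lookup (LookupType, Id)
    );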
As with most things that databases do, there isn't a right answer. There is a generally accepted answer that works/is correct in most cases (Option 1). The second option would only be desirable under limited circumstances depending on system requirements.
I suggest that:
A. Follow the organization standard if this is an enterprise system (some may laugh out loud at this, I know). If such a thing exists, it would certainly promote individual tables.
B. Use enums or 1 aggregated lookup table for programming-level lookups only (such as error messages, etc.), and only if you must. Any lookup data for business-related data should (in my opinion) be in a separate table, for at least the following reasons:
When you have separate tables, you join on the correct table name instead of filtering on a code column of one shared reference table, which makes queries less error prone. Writing "SELECT ... WHERE (TableID = 12 AND State = 'NY') AND (TableID = 133 AND Country = 'USA')"-style code is quite error prone during development. This is the major issue for me from a coding perspective.
RI errors on inserts and updates may be ambiguous when there is more than 1 reference to the lookup in the row being inserted or updated.
In some cases, a lookup table may have self-references (relationships). For example, a geographical location can be described as a hierarchy, which would add more confusion to the model.
The relationships (references) could lose meaning in your database. You will find that almost every table in your system is linked to this one table; somehow, it will not make sense.
If you ever decided to allow the user to perform ad-hoc reporting, it would be difficult for them to use codes for lookup tables instead of names.
I feel that the 1-table approach breaks normalization concepts - I can't prove it right now, though.
A disadvantage is that you may need to build indexes on the PKs and FKs of some (or all) of the separate tables. However, with the powerful databases available today, this may not be a big deal.
Avoid option 2 at all costs, go with option 1 without even thinking about it.(*)
Referential integrity is far too important to compromise in favour of virtually any other concern.
If there you go, only pain will you find.
If you want to reduce duplication, implement a list of services in your web-api implementation language (java?) and parametrize each service with the name of the lookup table to work with.
Edit
(*) It was wrong on my behalf to say "without even thinking about it". Of course, think about it. If need be, go ahead and even post a question on stackoverflow about it. Thinking is good, and Gordon Linoff's answer above demonstrates this nicely.

Merge the same Access database frequently (daily/weekly)

I need to use one Access (2007) database in 2 offline locations and then get all the data back into one database. Some advised me to use SharePoint, but after some trial and frustration I wonder if it's really the best way.
Is it possible to manage this in an automated way, with update queries or the like?
I have 26 tables, but only 14 need to be updated frequently. I use an autonumber to create the parent key and use cascade updates for the linked tables.
If your data can handle it, it's probably better to use a more natural key for the tables that require frequent updating, i.e. ideally you can uniquely identify a record by some combination of the columns in that record. Autonumbers in two databases can, and very likely will, step on each other, and when you do merge, any records keyed on an old autonumber need to be remapped properly. That can be done, but it's kind of a pain. It'd be nicer to avoid it all from the start.
As for using SharePoint (I assume the suggestion is to replace your tables with lists, not just to put your .accdb on SharePoint), it has a lot of limitations in terms of the kinds of indices that can be created and the relationships you can establish. Maybe your data are simple enough to live with this; I've yet to be able to justify the move.
Ultimately, the answer to your question is yes: it is possible to manage the synchronization with insert/update queries and very likely some VBA (possibly lots, depending on how complicated your table hierarchy is). You'll need to be vigilant about two people updating a single record, and you'll need to come up with some means of resolving the conflict.
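As a sketch of the query side in Access SQL (LocalOrders, RemoteOrders, and the column names are assumptions; RemoteOrders would be a linked table pointing at the other file). First, append the rows that only exist remotely:

    INSERT INTO LocalOrders
    SELECT r.*
    FROM RemoteOrders AS r
        LEFT JOIN LocalOrders AS l ON r.OrderID = l.OrderID
    WHERE l.OrderID IS NULL;

Then refresh the rows that exist on both sides (this is last-write-wins; real conflict resolution needs a timestamp column or VBA, as noted above):

    UPDATE LocalOrders INNER JOIN RemoteOrders
        ON LocalOrders.OrderID = RemoteOrders.OrderID
    SET LocalOrders.Status = RemoteOrders.Status,
        LocalOrders.Total = RemoteOrders.Total;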

Database Design Without Foreign Keys

After having worked for various employers, I've noticed a trend of "bad" database design at some of these companies - primarily the exclusion of foreign key constraints. It has always bugged me that these transactional systems didn't have FKs, which would have promoted referential integrity.
Are there any scenarios, in transactional systems, whereby the omission of FK's would be beneficial?
Has anyone else experienced this, if so what was the outcome?
What should one do if they're presented with this scenario and they're asked to maintain/enhance the system?
I cannot think of any scenario where, if two columns have a dependency, they should not have a FK constraint set up between them. Removing referential integrity may certainly speed up database operations but there's a pretty high cost to pay for that.
I have experienced such systems, and the usual outcome is corrupted data, in the sense that records exist that shouldn't (or vice versa). These are the sort of systems where people believe they're okay because the application takes care of it, not caring that:
Every application has to take care of it, rather than one DB server.
It only takes one bug, or one malicious app, to screw it up for everyone.
It is the responsibility of the database to protect itself! That is one of its best features.
As to what you should do, I simply put forward the possible things that can go wrong and how using FKs will prevent that (often with a cost/benefit analysis "skewed" toward my viewpoint, if necessary). Then let the company decide - it is their database, after all.
There is a school of thought that a well-written application does not need referential integrity. If the application does things right, the thinking goes, there's no need for constraints.
Such thinking is akin to not doing defensive programming because if you write the code correctly, you won't have bugs. True in principle, but in practice it simply won't happen. Not using appropriate constraints is asking for data corruption.
As for what you should do, you should encourage the company to add constraints at every opportunity. You don't want to push it to the point of getting in trouble or making a bad name for yourself, but as long as the environment is appropriate, keep pushing for it. Everyone's life will be better in the long run.
Personally, I have no problem with a database not having explicit declarations for foreign keys. But, it depends on how the database is being used.
Most of the databases that I work with are relatively static data derived from one or more transactional systems. I am not particularly concerned with rogue updates affecting the database, so an explicit definition of a foreign key relationship is not particularly important.
One thing that I do have is very consistent naming. Basically, every table has a first column called ID, which is exactly how the column is referred to in other tables (or, sometimes with a prefix, when there are multiple relationships between two entities). I also try to insist that every column in such a database has a unique name that describes the attribute (so "CustomerStartDate" is different from "ProductStartDate").
If I were dealing with data that had more "cooks in the pot", then I would want to be more explicit about the foreign key relationships, and I would then be more willing to accept the overhead of foreign key definitions.
This overhead arises in many places. When creating a new table, I may want to use "create table as" or "select into" and not worry about the particulars of constraints. When running update or insert queries, I may not want the database overhead of checking things that I know are ok. However, I must emphasize that consistent naming greatly increases my confidence that things are ok.
Clearly, my perspective is not that of a DBA but of a practitioner. However, invalid relationships between tables are something I -- or the rest of my team -- almost never have to deal with.
As long as there's a single point of entry into the database it ultimately doesn't matter which "layer" is maintaining referential integrity. Using the "built-in layer" of foreign key constraints seems to make the most sense, but if you have a rock solid service layer responsible for the same thing then it has freedom to break the rules if necessary.
Personally I use foreign key constraints and engineer my apps so they don't have to break the rules. Relational data with guaranteed referential integrity is just easier to work with.
The performance gained is probably equivalent to the performance lost from having to maintain integrity outside of the db.
In an OLTP database, the only reason I can think of is if you care about performance more than data integrity. Enforcing a FK when a row is inserted into the child table requires an index seek on the parent table, and I can imagine there may be extreme situations where even this relatively quick index seek is too much. For example, some kind of very intensive logging where you can live with incorrect log entries and where the application doing the writing is simple and unlikely to have bugs.
That being said, if you can live with corrupt data, you can probably live without a database in the first place.
Defensive programming without foreign keys works if you primarily use stored procedures and every application uses those stored procedures instead of writing its own queries. Then you can control it quite easily, and more flexibly than with standard foreign keys.
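A sketch of what that can look like (T-SQL style; the procedure, tables, and error message are all made up):

    -- The parent-existence check lives in the procedure instead of a declared FK.
    CREATE PROCEDURE InsertOrderLine
        @OrderId   INT,
        @ProductId INT,
        @Quantity  INT
    AS
    BEGIN
        IF NOT EXISTS (SELECT 1 FROM Orders WHERE OrderId = @OrderId)
        BEGIN
            RAISERROR('Order %d does not exist.', 16, 1, @OrderId);
            RETURN;
        END;

        INSERT INTO OrderLines (OrderId, ProductId, Quantity)
        VALUES (@OrderId, @ProductId, @Quantity);
    END;

Note that, unlike a declared FK, this check is not race-free: a concurrent delete of the parent row between the check and the insert can still leave an orphan unless you add locking.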
One situation I can think of off the top of my head where foreign key constraints are not readily usable is a permissions module where permissions can be applied per user or per group, determined by a Boolean. So some of the records in the permissions table have a user ID and others have a group ID. If you still wanted foreign key constraints, you would have to have two different fields for the same mutually exclusive information and allow them both to be null. That means adding another constraint saying that one of them may be null but not both, and making a combination of 3 fields unique instead of 2 (user/group ID and permission ID). The alternative is two separate tables containing the same kind of data, meaning maintaining both tables separately.
But perhaps in that scenario it's best to separate the data. Anywhere you need the same field to connect to different tables based on other data in that record, you cannot use foreign key constraints, and it becomes best to keep the constraints in the stored procedures and views instead.
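For reference, a sketch of the nullable-columns variant described above (it assumes Users, Groups, and Permissions tables already exist; all names are made up):

    CREATE TABLE PermissionAssignment (
        PermissionId INT NOT NULL,
        UserId       INT NULL,
        GroupId      INT NULL,
        FOREIGN KEY (PermissionId) REFERENCES Permissions (PermissionId),
        FOREIGN KEY (UserId)       REFERENCES Users (UserId),
        FOREIGN KEY (GroupId)      REFERENCES Groups (GroupId),
        -- Exactly one of UserId / GroupId must be set.
        CHECK ((UserId IS NOT NULL AND GroupId IS NULL)
            OR (UserId IS NULL AND GroupId IS NOT NULL)),
        -- One row per permission/user or permission/group pair
        -- (note: how UNIQUE treats NULLs varies by DBMS).
        UNIQUE (PermissionId, UserId, GroupId)
    );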