How would you write this model validation as a SQL constraint?

I have this validation in a model for a number attribute, and I also want to add a database constraint. How would you do this? Would you add an index or a partial index to satisfy the scope and the if/unless conditions?
validates :number,
  numericality: { greater_than: 0 },
  uniqueness: { scope: :tenant_id },
  unless: -> { state == 'pending' },
  if: :number
I'm thinking about this migration to create the constraint but I'm not sure it is the best way
def change
  reversible do |dir|
    dir.up do
      execute <<-SQL
        ALTER TABLE work_orders
        ADD CONSTRAINT number_uniqueness_constraint
        CHECK ( number IS NULL OR number NOT IN
          ( SELECT number
            FROM work_orders
            WHERE tenant_id = work_orders.tenant_id )
        )
      SQL
    end
  end
end

Business constraints should be defined both in the database as a hard constraint (if possible) as well as within Rails.
The advantage of having the constraint within your database is that it strictly ensures that your dataset is valid, as the database will not allow any query to break the constraint. The disadvantage is that if a query does try to break the constraint, the resulting error message will not make for a great user experience.
Within Rails, you get great messages for each constraint and can handle errors very well. This works great for e.g. format constraints (such as the numericality check). For uniqueness constraints, however, race conditions are possible: in parallel requests, the uniqueness check and the later commit can overlap, so two requests can both pass the check and then both write the same value.
Because of that, it is often a good idea to define constraints in both places. The Rails constraint will result in good error messages for most cases, while the SQL constraint will strictly ensure valid data, even when there are races.
With that being said, your Rails validation currently doesn't match your Postgres constraint: you are missing the unless rule. For the uniqueness constraint, you may also use a much simpler combined UNIQUE constraint on the two columns, rather than writing your own sub-query, depending on how exactly you plan to resolve your edge cases (i.e. the unless and if rules of your Rails validation).
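One possible reading of the validation is "a number, when present on a non-pending row, must be positive and unique per tenant". Note that PostgreSQL does not accept sub-queries inside a CHECK constraint, so the uniqueness part is better expressed as a partial unique index. Below is a minimal sketch of that reading, run against in-memory SQLite from Python so it is self-contained. The column layout of work_orders, the constraint and index names, and the insert_order helper are assumptions for illustration; PostgreSQL accepts the same CREATE UNIQUE INDEX ... WHERE form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_orders (
        id        INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL,
        state     TEXT    NOT NULL,
        number    INTEGER,
        -- numericality: NULL numbers pass (like `if: :number`),
        -- pending rows are exempt (like the `unless` rule)
        CONSTRAINT number_positive
            CHECK (number IS NULL OR state = 'pending' OR number > 0)
    );
    -- uniqueness scoped to tenant_id, only for non-pending rows with a number
    CREATE UNIQUE INDEX work_orders_tenant_number_unique
        ON work_orders (tenant_id, number)
        WHERE number IS NOT NULL AND state <> 'pending';
""")


def insert_order(tenant_id, state, number):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO work_orders (tenant_id, state, number) VALUES (?, ?, ?)",
                (tenant_id, state, number),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

In a Rails migration the same statements could be issued through execute in the up block, with the corresponding DROP statements in the down block.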

Related

How can I enforce constraints on many-to-many relationships in PostgreSQL?

Suppose I have two tables: thread and user, with a join table thread_user that models a many-to-many relationship between them.
Suppose I want to enforce the following constraints:
A thread can have at most 10 users.
No two threads should have the same set of users.
How can this be done? The first constraint seems easy enough to enforce with a trigger (is there a better approach?). The second constraint I have no idea how to enforce.
I could imagine a combination of triggers and constraints/indexes. Actually, you could just use triggers, but that requires handling updates, inserts, and deletes.
Instead, you can modify threads to have two additional columns:
The number of users
An array of the users
You can keep these up-to-date using triggers. Actually, the first isn't really necessary.
Then you can create a check constraint:
alter table t add constraint chk_t_num_users check (num_users <= 10);
If you keep the array in sorted order, you can just add a unique index:
create unique index unq_t_users on t (users);
Or you can define your own sort_array() function, as in this answer.
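The idea of a size check plus a unique index over a sorted member list can be tried out without PostgreSQL. The following is only a rough sketch in Python with SQLite, which has no array type, so a sorted comma-separated text key stands in for the sorted array. The users_key column and the create_thread helper are hypothetical, and the key is computed in application code here rather than by the triggers described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (
        id        INTEGER PRIMARY KEY,
        num_users INTEGER NOT NULL,
        users_key TEXT    NOT NULL,  -- stand-in for a sorted array of user ids
        CONSTRAINT chk_t_num_users CHECK (num_users <= 10)
    );
    CREATE UNIQUE INDEX unq_t_users ON t (users_key);
""")


def create_thread(user_ids):
    """Insert a thread for the given users; return its id, or None if rejected."""
    members = sorted(set(user_ids))  # canonical order, duplicates removed
    key = ",".join(str(u) for u in members)
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO t (num_users, users_key) VALUES (?, ?)",
                (len(members), key),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None
```

Because the key is always built from the sorted ids, create_thread([3, 1, 2]) and create_thread([2, 3, 1]) map to the same key, and the second call is rejected by the unique index.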

What's a foreign key for?

I've been using table associations with SQL (MySQL) and Rails without a problem, and I've never needed to specify a foreign key constraint.
I just add a table_id column in the belongs_to table, and everything works just fine.
So what am I missing? What's the point of using the foreign key clause in MySQL or other RDBMS?
Thanks.
A foreign key is a referential constraint between two tables.
The reason foreign key constraints exist is to guarantee that the referenced rows exist.
The foreign key identifies a column or set of columns in one (referencing or child) table that refers to a column or set of columns in another (referenced or parent) table.
You can also get nice "on delete cascade" behavior, automatically cleaning up tables.
There are lots of reasons for using foreign keys listed over here: Why Should one use foreign keys
Rails (ActiveRecord more specifically) auto-guesses the foreign key for you.
... By default this is guessed to be the name of the association with an “_id” suffix.
Foreign keys enforce referential integrity.
Foreign key: A column or set of columns in a table whose values are required to match at least one PrimaryKey values of a row of another table.
See also:
http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html#method-i-belongs_to
http://c2.com/cgi/wiki?ForeignKey
The basic idea of foreign keys, or any referential constraint, is that the database should not allow you to store obviously invalid data. It is a core component of data consistency, one of the ACID rules.
If your data model says that you can have multiple phone numbers associated with an account, you can define the phone table to require a valid account number. It's therefore impossible to store orphaned phone records because you cannot insert a row in the phone table without a valid account number, and you can't delete an account without first deleting the phone numbers. If the field is birthdate, you might enforce a constraint that the date be prior to tomorrow's date. If the field is height, you might enforce that the distance be between 30 and 4000 cm. This means that it is impossible for any application to store invalid data in the database.
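The account and phone example can be sketched as follows, using Python with in-memory SQLite. The table and column names and the rejected helper are assumptions made for the illustration. Note that SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY);
    CREATE TABLE phone (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES account (id),
        number     TEXT    NOT NULL
    );
    INSERT INTO account (id) VALUES (1);
    INSERT INTO phone (account_id, number) VALUES (1, '555-0100');
""")


def rejected(sql, params=()):
    """Run one statement; return True if the database refused it."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


# A phone row pointing at a non-existent account is an orphan.
orphan_rejected = rejected(
    "INSERT INTO phone (account_id, number) VALUES (?, ?)", (99, "555-0199")
)
# Deleting an account that still has phone rows would create orphans too.
delete_rejected = rejected("DELETE FROM account WHERE id = ?", (1,))
```

Both statements are refused by the database itself, regardless of which application issued them.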
"Well, why can't I just write all that into my application?" you ask. For a single-application database, you can. However, any business with a non-trivial database that stores data used in business operations will want to access data directly. They'll want to be able to import data from finance or HR, or export addresses to sales, or create application user accounts by importing them from Active Directory, etc. For a non-trivial application, the user's data is what's important, and that's what they will want to access. At some point, they will want to access their data without your application code getting in the way. This is the real power and strength of an RDBMS, and it's what makes system integration possible.
However, if all your rules are stored in the application, then your users will need to be extremely careful about how they manipulate their database, lest they cause the application to implode. If you specify relational constraints and referential integrity, you require that other applications modify the data in a way that makes sense to any application that's going to use it. The logic is tied to the data (where it belongs) rather than the application.
Note that MySQL is absolute balls with respect to referential integrity. It will tend to silently succeed rather than throw errors, usually by inserting obviously invalid values like a datetime of today when you try to insert a null date into a datetime field with the constraint not null default null. There's a good reason that DBAs say that MySQL is a joke.
Foreign keys enforce referential integrity. A foreign key constraint will prevent you or any other user from adding incorrect records to the table by mistake. It makes sure that the data (ID) being entered in the foreign key column does exist in the referenced table. If some buggy client code tries to insert incorrect data, then with a foreign key constraint an exception will be raised; if the constraint is absent, your database will end up with inconsistent data.
Some advantages of using foreign key I can think of:
Make data consistent among tables and prevent bad data (e.g. table A has some records that refer to something that does not exist in table B)
Help to document our database
Some frameworks rely on foreign keys to generate the domain model

Unique constraint - these columns currently doesn't have unique values..?

I have created a unique constraint on 3 columns.
My code works perfectly, but once two users entered the same data at the same time and somehow it was saved in the DB. Since that incident,
this UniqueConstraint gives me the error "These columns currently doesn't have unique values".
How can I check whether a user enters this kind of entry, or how can I restrict the entry?
Can you give more details about the constraints and how you are enforcing them?
(I thought) Any modern DBMS should be able to handle concurrency/multiple users at the same time without constraint problems. My guess is that you are using ADO.NET DataSet/DataTable and adding constraints yourself.
If that's the case, I think the easiest/best thing to do is to add the constraint in the database as well. If two users update/save at the same time, the database will handle it correctly; one will successfully save data, the other will receive an error. You can handle that error in the application gracefully.
I guess you're using SQL Server. If the constraint has been defined in the database, it is checked upon insert, and according to the SQL Server documentation, what you describe cannot happen:
The Database Engine automatically creates a UNIQUE index to enforce the uniqueness requirement of the UNIQUE constraint. Therefore, if an attempt to insert a duplicate row is made, the Database Engine returns an error message that states the UNIQUE constraint has been violated and does not add the row to the table.
Even if the inserts happen (almost) simultaneously, the requests will be queued in the database, so that one of the requests will fail if it detects the constraint would be violated.
As Rob P says, it looks as though you are creating the constraints outside of the DB layer.

What are database constraints? [closed]

What is a clear definition of database constraint? Why are constraints important for a database? What are the types of constraints?
Constraints are part of a database schema definition.
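A constraint is usually associated with a table and is created as part of a CREATE TABLE statement or with ALTER TABLE ... ADD CONSTRAINT. The SQL standard also defines a CREATE ASSERTION statement for constraints that span tables, but few database systems implement it.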
They define certain properties that data in a database must comply with. They can apply to a column, a whole table, more than one table or an entire schema. A reliable database system ensures that constraints hold at all times (except possibly inside a transaction, for so called deferred constraints).
Common kinds of constraints are:
not null - each value in a column must not be NULL
unique - value(s) in specified column(s) must be unique for each row in a table
primary key - value(s) in specified column(s) must be unique for each row in a table and not be NULL; normally each table in a database should have a primary key - it is used to identify individual records
foreign key - value(s) in specified column(s) must reference an existing record in another table (via its primary key or some other unique constraint)
check - an expression is specified, which must evaluate to true for constraint to be satisfied
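The five kinds listed above can be seen side by side in one small schema. This is just an illustrative sketch using Python and in-memory SQLite; the department and employee tables, the constraint names and the violates helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
    CREATE TABLE department (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,                      -- primary key
        email         TEXT    NOT NULL,                         -- not null
        salary        INTEGER NOT NULL,
        department_id INTEGER NOT NULL,
        CONSTRAINT uq_employee_email UNIQUE (email),            -- unique
        CONSTRAINT chk_employee_salary CHECK (salary >= 0),     -- check
        CONSTRAINT fk_employee_department                       -- foreign key
            FOREIGN KEY (department_id) REFERENCES department (id)
    );
    INSERT INTO department (id, name) VALUES (1, 'Engineering');
    INSERT INTO employee (id, email, salary, department_id)
        VALUES (1, 'a@example.com', 100, 1);
""")


def violates(row):
    """Return the database's error message if the row is rejected, else None."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee (id, email, salary, department_id) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)
```

For instance, violates((2, None, 100, 1)) trips the not null rule, violates((3, 'a@example.com', 100, 1)) the unique rule, violates((4, 'c@example.com', 100, 99)) the foreign key, and violates((5, 'd@example.com', -1, 1)) the check.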
To understand why we need constraints, you must first understand the value of data integrity.
Data Integrity refers to the validity of data. Are your data valid? Are your data representing what you have designed them to?
What weird questions to ask, you might think, but sadly enough, all too often databases are filled with garbage data: invalid references to rows in other tables that are long gone, and values that no longer mean anything to the business logic of your solution.
All this garbage is not only likely to reduce your performance, but is also a time bomb under your application logic, which will eventually retrieve data that it is not designed to understand.
Constraints are rules you create at design time that protect your data from becoming corrupt. They are essential for the long-term survival of your heart child of a database solution. Without constraints your solution will definitely decay with time and heavy usage.
You have to acknowledge that designing your database is only the birth of your solution. Hereafter it must live for (hopefully) a long time, and endure all kinds of (strange) behaviour by its end-users (i.e. client applications). But this design phase is crucial for the long-term success of your solution! Respect it, and give it the time and attention it requires.
A wise man once said: "Data must protect itself!". And this is what constraints do. They are rules that keep the data in your database as valid as possible.
There are many ways of doing this, but basically they boil down to:
Foreign key constraints are probably the most used constraints. They ensure that references to other tables are only allowed if there actually exists a target row to reference. This also makes it impossible to break such a relationship by deleting the referenced row and creating a dead link.
Check constraints can ensure that only specific values are allowed in a certain column. You could create a constraint only allowing the word 'Yellow' or 'Blue' in a VARCHAR column. All other values would yield an error. For ideas on how to use check constraints, look at the sys.check_constraints view in the AdventureWorks sample database.
Rules in SQL Server are just reusable check constraints (they allow you to maintain the syntax in a single place, making it easier to deploy your constraints to other databases).
As I've hinted here, it takes some thorough considerations to construct the best and most defensive constraint approach for your database design. You first need to know the possibilities and limitations of the different constraint types above. Further reading could include:
FOREIGN KEY Constraints - Microsoft
Foreign key constraint - w3schools
CHECK Constraints
Good luck! ;)
Constraints are nothing but rules on the data. What data is valid and what is invalid can be defined using constraints, so that the integrity of the data can be maintained.
Following are the widely used constraints:
Primary Key : which uniquely identifies the data . If this constraint has been specified for certain column then we can't enter duplicate data in that column
Check : Such as NOT NULL . Here we can specify what data we can enter for that particular column and what is not expected for that column.
Foreign key : A foreign key references a row of another table, so that data referred to from one table is always available in the referenced table.
Constraints can be used to enforce specific properties of data. A simple example is to limit an int column to values [0-100000]. This introduction looks good.
Constraints dictate what values are valid for data in the database. For example, you can enforce that a value is not null (a NOT NULL constraint), or that it exists as a unique value in another table (a FOREIGN KEY constraint), or that it's unique within this table (a UNIQUE constraint or perhaps PRIMARY KEY constraint depending on your requirements). More general constraints can be implemented using CHECK constraints.
The MSDN documentation for SQL Server 2008 constraints is probably your best starting place.
UNIQUE constraint (of which a PRIMARY KEY constraint is a variant). Checks that all values of a given field are unique across the table. This is X-axis constraint (records)
CHECK constraint (of which a NOT NULL constraint is a variant). Checks that a certain condition holds for the expression over the fields of the same record. This is Y-axis constraint (fields)
FOREIGN KEY constraint. Checks that a field's value is found among the values of a field in another table. This is Z-axis constraint (tables).
A database is the computerized logical representation of a conceptual (or business) model, consisting of a set of informal business rules. These rules are the user-understood meaning of the data. Because computers comprehend only formal representations, business rules cannot be represented directly in a database. They must be mapped to a formal representation, a logical model, which consists of a set of integrity constraints. These constraints — the database schema — are the logical representation in the database of the business rules and, therefore, are the DBMS-understood meaning of the data. It follows that if the DBMS is unaware of and/or does not enforce the full set of constraints representing the business rules, it has an incomplete understanding of what the data means and, therefore, cannot guarantee (a) its integrity by preventing corruption, (b) the integrity of inferences it makes from it (that is, query results) — this is another way of saying that the DBMS is, at best, incomplete.
Note: The DBMS-“understood” meaning — integrity constraints — is not identical to the user-understood meaning — business rules — but, the loss of some meaning notwithstanding, we gain the ability to mechanize logical inferences from the data.
"An Old Class of Errors" by Fabian Pascal
There are basically 4 types of main constraints in SQL:
Domain Constraint: if one of the attribute values provided for a new tuple is not of the specified attribute domain
Key Constraint: if the value of a key attribute in a new tuple already exists in another tuple in the relation
Referential Integrity: if a foreign key value in a new tuple references a primary key value that does not exist in the referenced relation
Entity Integrity: if the primary key value is null in a new tuple
Constraints are conditions that the data must satisfy in order to be considered valid.
Constraints related with database are Domain integrity, Entity integrity, Referential Integrity, User Defined Integrity constraints etc.

What is the purpose of constraint naming

What is the purpose of naming your constraints (unique, primary key, foreign key)?
Say I have a table which is using natural keys as a primary key:
CREATE TABLE Order
(
    LoginName VARCHAR(50) NOT NULL,
    ProductName VARCHAR(50) NOT NULL,
    NumberOrdered INT NOT NULL,
    OrderDateTime DATETIME NOT NULL,
    PRIMARY KEY(LoginName, OrderDateTime)
);
What benefits (if any) does naming my PK bring?
Eg.
Replace:
PRIMARY KEY(LoginName, OrderDateTime)
With:
CONSTRAINT Order_PK PRIMARY KEY(LoginName, OrderDateTime)
Sorry if my data model is not the best, I'm new to this!
Here's some pretty basic reasons.
(1) If a query (insert, update, delete) violates a constraint, SQL will generate an error message that will contain the constraint name. If the constraint name is clear and descriptive, the error message will be easier to understand; if the constraint name is a random guid-based name, it's a lot less clear. Particularly for end-users, who will (ok, might) phone you up and ask what "FK__B__B_COL1__75435199" means.
(2) If a constraint needs to be modified in the future (yes, it happens), it's very hard to do if you don't know what it's named. (ALTER TABLE MyTable drop CONSTRAINT um...) And if you create more than one instance of the database "from scratch" and use system-generated default names, no two names will ever match.
(3) If the person who gets to support your code (aka a DBA) has to waste a lot of pointless time dealing with case (1) or case (2) at 3am on Sunday, they're quite probably in a position to identify where the code came from and be able to react accordingly.
To identify the constraint in the future (e.g. you want to drop it in the future), it should have a unique name. If you don't specify a name for it, the database engine will probably assign a weird name (e.g. containing random stuff to ensure uniqueness) for you.
It keeps the DBAs happy, so they let your schema definition into the production database.
When your code randomly violates some foreign key constraint, it sure as hell saves time on debugging to figure out which one it was. Naming them greatly simplifies debugging your inserts and your updates.
It helps someone to know quickly what constraints are doing without having to look at the actual constraint, as the name gives you all the info you need.
So, I know if it is a primary key, unique key or default key, as well as the table and possibly columns involved.
By correctly naming all constraints, we can quickly associate a particular constraint with our data model. This gives us two real advantages:
We can quickly identify and fix any errors.
We can reliably modify or drop constraints.
By naming the constraints you can differentiate violations of them. This is not only useful for admins and developers, but your program can also use the constraint names. This is much more robust than trying to parse the error message. By using constraint names your program can react differently depending on which constraint was violated.
Constraint names are also very useful to display appropriate error messages in the user’s language mentioning which field caused a constraint violation instead of just forwarding a cryptic error message from the database server to the user.
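As a rough illustration of a program reacting to constraint names, here is a sketch in Python with in-memory SQLite. The order_line table, the constraint names, the MESSAGES mapping and the add_line helper are all hypothetical. SQLite only reports the name inside the error text, so this sketch searches the text for known names; where a database driver exposes the constraint name as a separate field, reading that field would be the more robust choice.

```python
import sqlite3

# Hypothetical mapping from constraint names to user-facing messages.
MESSAGES = {
    "chk_quantity_positive": "Quantity must be greater than zero.",
    "chk_price_not_negative": "Price cannot be negative.",
}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line (
        quantity INTEGER NOT NULL,
        price    INTEGER NOT NULL,
        CONSTRAINT chk_quantity_positive CHECK (quantity > 0),
        CONSTRAINT chk_price_not_negative CHECK (price >= 0)
    )
""")


def add_line(quantity, price):
    """Insert a line; translate a named constraint violation into a message."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO order_line (quantity, price) VALUES (?, ?)",
                (quantity, price),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        text = str(exc)
        for name, message in MESSAGES.items():
            if name in text:  # the constraint name appears in the error text
                return message
        return "The data could not be saved."
```

With system-generated names this lookup table could not be written in advance, which is the practical argument for naming the constraints explicitly.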
See my answer on how to do this with PostgreSQL and Java.
While the OP's example used a permanent table, just remember that named constraints on temp tables behave like named constraints on permanent tables (i.e. you can't have multiple sessions with the exact same code handling the temp table without generating an error, because the constraints are named the same). Because named constraints must be unique, if you absolutely must name a constraint on a temp table, try to do so with some sort of randomized GUID (like SELECT NEWID() ) on the end of it to ensure that it will be uniquely named across sessions.
Another good reason to name constraints is if you are using version control on your database schema. In this case, if you have to drop and re-create a constraint using the default database naming (in my case SQL Server) then you will see differences between your committed version and the working copy because it will have a newly generated name. Giving an explicit name to the constraint will avoid this being flagged as a change.