Prevent duplicates for a certain GraphCool model

Prevent duplicates for a certain GraphCool model - sql

I have a GraphCool model called Student which has a field called studentNumber. How can I prevent two (or more) different Student nodes with the same studentNumber? In SQL databases I was able to create a unique index to accomplish this.

The easiest way to do so is by enabling the "Unique" constraint for a certain field (studentNumber in your case for the Student model). You can enable constraints in the structure view when editing a model.
Note: Under the hood, this creates a unique index in the database just like you suggested.

Related

Enable adding unknown properties to table objects

I am creating an database managment webapp that has a strange requirement to provide the user with ability to add new 'fields' to existing objects. For example the table 'Employees' has Names and ID's of emloyers. Suddenly the owner of system wants to know if his employees have driver license. But our table and app did not expect that
The only options that come to my mind is to
1) Add big varchar field storing aditional properties as JSON or something
2) Add table 'additional properties' That will allow creating new objects linked by PK to existing users
Then we will have
TABLE USER TABLE PROPERTIES
-ID <-------------FK
-NAME -NAME (driver livence)
-VALUE (true)
How bad is the second idea? Are there any better options other then going noSQL?

There is no issue with either approach. EAV (entity-attribute-value) models have been part of relational databases, probably since the earliest databases were created.
They do have some downsides:
The value attribute is (generally) a string. That means that other types cannot be coerced into the column.
The values cannot be part of declared foreign key relationships.
Validating values is tricky -- very complicated check constraints, for instance.
There is no (easy) way to ensure that an entity has a particular value.
But for user-defined or sparsely populated values, EAV is definitely a reasonable choice.
JSON is another reasonable choice. That does, however, require a one-time change to the database to add a JSON column. Some databases offer indexing on JSON values, which can improve performance.
If "has-drivers-license" is a one-time change, then you might just want a separate table with the same primary key. The next time that a new column is needed, you can modify the "options" table, rather than the main table. This allows better support for validating values (all values are unique, for example) or defining foreign key constraints.

Adding a row to Table A if it has a required foreign key to Table B which has a required foreign key to Table A

This might sound complicated, so I'll give an example.
Say, I have two tables Instructor and Class.
Instructor has a required field called PreferredClassID which has a foreign key against Class.
Class has a required field called CurrentInstructorID which is a foreign key against Instructor
Is it possible to insert a row to either of these tables?
Cause if I insert a row to Instructor, I won't be able to as I'll need to supply a PreferredClassID, but I can't create a Class row either because it needs a CurrentInstructorID.
If I can't do this, how would I solve this problem? Would I just need to make one of those fields non-required (even if business requirements specifies it really should be required?)

If you find yourself here, reevaluate your data relation model.
In this case, you could simply have a lookup table called PreferredCourse with courseId and instructorId.
This will enforce that both the course and instructor exist before adding the row to the PreferredCourse lookup. Maintaining business model requirements without bending the rules of database model requirements.
While it may seem excessive to have another table, it will prevent a whole lot of maintenance overhead in both your database procedures and jobs, and your application code. Circular references create nothing but headaches and are easily solved with small lookup tables and JOINs.
The Impaler gave an example of how to accomplish this with your current data structure. Please note, that you have to 1: make a key nullable in at least one of the tables, and then 2: Perform INSERTs in a specified order. Or, 3: disable the constraints, 4: perform INSERTS, 5: reenable constraints, 6: roll back transaction if constraints are now broken.
There is a whole lot that can go wrong, simply fix the relation model now before things get out of hand.

As long as one of those foreign keys allows a null value, you're good. So you:
Insert the row that accepts the null value first (say Instructor), with a null value on the FK. Get the ID of the inserted row.
Insert in the other table (say Class). In the FK you use the ID you got from step #1. Once inserted, you get the ID of this new row.
Update the FK on the first row (Instructor) with the ID you got from step #2.
Commit.
Alternatively, if both FKs are NOT NULL then you have a bit of a problem. The options I see for this last case are:
Use deferrable FK integrity check. Some databases do allow you to insert without checking integrity until the COMMIT happens. This is really tricky, and enabling this is looking for trouble.
Disable the FK for a short period of time. Some databases allow you to enable/disable constraints. You are not deleting them, just temporarily disabling them. If you do this, don't forget to enable them back.
Drop the constraint temporarily, while you do the insert, and the add it again. This is really a work around of last resort. Adding/Dropping constraint are DML statements and usually cannot participate in a transaction. Do this at your own peril.

Something to consider (as per user7396598's answer) is looking at how normal forms apply to your data as it fits within your relational model.
In this case, it might be worth looking at the following:
With your Instructor table, is the PreferredClassID a necessary component? Does an instructor -need- to have a preferred class, or is it okay to say "Hey, I'm creating an entry for a new instructor, I don't know their preferred class."
(if they're new, they might not have a preferred class that your school offers)
This is a case where you definitely want to have a foreign key, but it should be okay to say 'I don't necessarily know the value I want to put there.'
In a similar vein, does a Class need to have an instructor when it's created? Is it possible to create a Class that an instructor has not been assigned to yet?
Again, both of these points are really a case of 'I don't know what I want to put here, but when I do, it should be a specific instance that exists in another table.'

Should I include user_id in multiple tables?

I'm at the planning stages of a multi-user application where each user will only have access their own data. There'll be a few tables that relate to each other, so I could use JOINs to ensure they're accessing only their data, but should I include user_id in each table? Would this be faster? It would certainly make some of the queries easier in the long run.
Specifically, the question is about multiple tables containing the user_id field.
For example, each user can configure categories, items (in those categories), and sub-items against those items. There's a logical path from user, to sub-items through the other tables, but it would require 3 JOINs. Should I just include user_id in all the tables?
Thanks!

This is a design decision in multi-tenant databases. With "root" tables, obviously you have to have the user_id. But in the non-"root" tables, you do have a choice when you are using surrogate PKs.
Say you have users with projects and projects with actions. Projects obviously has to have a user_id, but if actions are tied to one and only one project, then the user_id is redundant, and also violates normal form, since if it was to move to another user's project (probably not likely in your use cases), both the project FK and the user FK would have to be updated. Typically in multi-tenant scenarios, this isn't really a possible scenario, and so the primary key of every table is really a combination of tenant and a unique primary key "within" the tenant (which may also happen to be globally unique).
If you use natural keys extensively in your design, then clearly tenant+natural key is necessary so that each tenant's natural keys can be used. It's only when using surrogates like IDENTITY or GUIDs or sequences, that this becomes an issue, since it is tempting to make the IDENTITY the PK, after all, it is unique by definition.
Having the user_id in all tables does allow you to do certain things in views to enhance security (defense in depth), giving you a little bit of defensive programming (in SQL Server you can restrict all access through inline table valued function - essentially parametrized views - which require the app to specify user_id on every "table" access), and also allows you to easily scale out to multiple databases by forklifting everything on shared keys.
See this article for some interesting insights.
(In a massively multi-parallel paradigm like Teradata, the PRIMARY INDEX determines the amp on which the data lives, so I would think that this is a must to stop redistribution of rows to the other amps.)
In general, I would say you have a tenantid in each table, it should be the first column in the table, in most indexes and should be part of the primary key in most cases, unless otherwise justified. Where possible, it should be a required parameter in most stored procedures.

Generally, you use foreign keys to relate data between tables. In many cases, this foreign key is the user id. For example:
users
id
name
phonenumbers
user_id
phonenumber
So yes, that'd make perfect sense.

If a category can only belong to one user then yes, you need to include the user_id in the category table. If a category can belong to multiple people then you would have a separate table that maps category IDs to user IDs. You can still do this if you have a one to one mapping between the two, but there is no real reason for it.
You don't need to include the user_id in further tables if you can guarantee that those child tables will always be accessed via joining to the category table. If there is a chance that you will access them independantly of the category table then you should also have the user_id on those tables.

The extent to which to normalize can be a difficult decision. One of the best StackOverflow answers on this topic (Database Development Mistakes Made by App Developers) warns against both (1) failing to normalize, and (2) over-normalizing.
You mention that it might be easier "in the long run" to repeat the same data in multiple tables (that is, not to normalize that data). Look at the "Not simplifying complex queries through views" topic in the previous link. If you use views effectively, you will only have to do the 3 join query once when writing the view and then you can use a query with no joins for most purposes.
Most developers tend to under-normalize because it seems simpler. Go ahead and normalize. Use views to simplify your daily queries. When your requiremens get more complex or you decide to add features, you will be glad that you put time into a relational database design.
Alternatively, depending on your toolset, you may want to use a database abstraction layer that does the relational design under the covers while you manipulate higher level data object.

if it is Oracle, then you would probably set up a fine grained security rule to do the joins and prevent certain activities based on the existence of the original user id... (SELECT INSERT UPDATE DELETE etc)
You would need a map between the logged in user and the user_id. You could use uid, but then remember this umber may change if the database is reconstructed after some disaster...

How to model a mutually exclusive relationship in SQL Server

I have to add functionality to an existing application and I've run into a data situation that I'm not sure how to model. I am being restricted to the creation of new tables and code. If I need to alter the existing structure I think my client may reject the proposal.. although if its the only way to get it right this is what I will have to do.
I have an Item table that can me link to any number of tables, and these tables may increase over time. The Item can only me linked to one other table, but the record in the other table may have many items linked to it.
Examples of the tables/entities being linked to are Person, Vehicle, Building, Office. These are all separate tables.
Example of Items are Pen, Stapler, Cushion, Tyre, A4 Paper, Plastic Bag, Poster, Decoration"
For instance a Poster may be allocated to a Person or Office or Building. In the future if they add a Conference Room table it may also be added to that.
My intital thoughts are:
Item
{
ID,
Name
}
LinkedItem
{
ItemID,
LinkedToTableName,
LinkedToID
}
The LinkedToTableName field will then allow me to identify the correct table to link to in my code.
I'm not overly happy with this solution, but I can't quite think of anything else. Please help! :)
Thanks!

It is not a good practice to store table names as column values. This is a bad hack.
There are two standard ways of doing what you are trying to do. The first is called single-table inheritance. This is easily understood by ORM tools but trades off some normalization. The idea is, that all of these entities - Person, Vehicle, whatever - are stored in the same table, often with several unused columns per entry, along with a discriminator field that identifies what type the entity is.
The discriminator field is usually an integer type, that is mapped to some enumeration in your code. It may also be a foreign key to some lookup table in your database, identifying which numbers correspond to which types (not table names, just descriptions).
The other way to do this is multiple-table inheritance, which is better for your database but not as easy to map in code. You do this by having a base table which defines some common properties of all the objects - perhaps just an ID and a name - and all of your "specific" tables (Person etc.) use the base ID as a unique foreign key (usually also the primary key).
In the first case, the exclusivity is implicit, since all entities are in one table. In the second case, the relationship is between the Item and the base entity ID, which also guarantees uniqueness.
Note that with multiple-table inheritance, you have a different problem - you can't guarantee that a base ID is used by exactly one inheritance table. It could be used by several, or not used at all. That is why multiple-table inheritance schemes usually also have a discriminator column, to identify which table is "expected." Again, this discriminator doesn't hold a table name, it holds a lookup value which the consumer may (or may not) use to determine which other table to join to.
Multiple-table inheritance is a closer match to your current schema, so I would recommend going with that unless you need to use this with Linq to SQL or a similar ORM.
See here for a good detailed tutorial: Implementing Table Inheritance in SQL Server.

Find something common to Person, Vehicle, Building, Office. For the lack of a better term I have used Entity. Then implement super-type/sub-type relationship between the Entity and its sub-types. Note that the EntityID is a PK and a FK in all sub-type tables. Now, you can link the Item table to the Entity (owner).
In this model, one item can belong to only one Entity; one Entity can have (own) many items.

your link table is ok.
the trouble you will have is that you will need to generate dynamic sql at runtime. parameterized sql does not typically allow the objects inthe FROM list to be parameters.
i fyou want to avoid this, you may be able to denormalize a little - say by creating a table to hold the id (assuming the ids are unique across the other tables) and the type_id representing which table is the source, and a generated description - e.g. the name value from the inital record.
you would trigger the creation of this denormalized list when the base info is modified, and you could use that for generalized queries - and then resort to your dynamic queries when needed at runtime.

Is there any way to fake an ID column in NHibernate?

Say I'm mapping a simple object to a table that contains duplicate records and I want to allow duplicates in my code. I don't need to update/insert/delete on this table, only display the records.
Is there a way that I can put a fake (generated) ID column in my mapping file to trick NHibernate into thinking the rows are unique? Creating a composite key won't work because there could be duplicates across all of the columns.
If this isn't possible, what is the best way to get around this issue?
Thanks!
Edit: Query seemed to be the way to go

The NHibernate mapping makes the assumption that you're going to want to save changes, hence the requirement for an ID of some kind.
If you're allowed to modify the table, you could add an identity column (SQL Server naming - your database may differ) to autogenerate unique Ids - existing code should be unaffected.
If you're allowed to add to the database, but not to the table, you could try defining a view that includes a RowNumber synthetic (calculated) column, and using that as the data source to load from. Depending on your database vendor (and the products handling of views and indexes) this may face some performance issues.
The other alternative, which I've not tried, would be to map your class to a SQL query instead of a table. IIRC, NHibernate supports having named SQL queries in the mapping file, and you can use those as the "data source" instead of a table or view.

If you're data is read only one simple way we found was to wrapper the query in a view and build the entity off the view, and add a newguid() column, result is something like
SELECT NEWGUID() as ID, * FROM TABLE
ID then becomes your uniquer primary key. As stated above this is only useful for read-only views. As the ID has no relevance after the query.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas