Creating a constraint on multiple properties in Neo4j

I'm new to Neo4j and I need some help.
I'm trying to create a constraint on multiple properties of a node at once, in two senses:
I want to declare many properties as constrained without typing the command over and over for every property.
I want to define many properties as ONE combined constraint, like in SQL where three attributes together form the primary key rather than each one separately.
How can I achieve this?

You are actually asking 2 questions.
The APOC procedure apoc.schema.assert is helpful for conveniently ensuring that the DB has the required set of indexes and constraints. (Be aware, though, that this procedure will drop any existing indexes and constraints not specified in the call.)
For example, as shown in the documentation, this call:
CALL apoc.schema.assert(
{Track:['title','length']},
{Artist:['name'],Track:['id'],Genre:['name']});
will return a result like this (also, if an index or constraint had been dropped, a row with the action value of "DROPPED" would have been returned as well):
╒══════╤══════╤══════╤═══════╕
│label │key   │unique│action │
╞══════╪══════╪══════╪═══════╡
│Track │title │false │CREATED│
├──────┼──────┼──────┼───────┤
│Track │length│false │CREATED│
├──────┼──────┼──────┼───────┤
│Artist│name  │true  │CREATED│
├──────┼──────┼──────┼───────┤
│Genre │name  │true  │CREATED│
├──────┼──────┼──────┼───────┤
│Track │id    │true  │CREATED│
└──────┴──────┴──────┴───────┘
Since there is not (yet) any way to create an index or constraint on multiple properties of a node label, one popular workaround is to use an extra property whose value is an array of the values you want to use. You will have to make sure the values are all of the same type, converting some if necessary. Unfortunately, this requires storing some data redundantly, and makes your code a bit more complex.

Enable adding unknown properties to table objects

I am creating a database management webapp that has a strange requirement: provide the user with the ability to add new 'fields' to existing objects. For example, the table 'Employees' has names and IDs of employees. Suddenly the owner of the system wants to know whether his employees have a driver's license. But our table and app did not expect that.
The only options that come to my mind are:
1) Add a big varchar field storing additional properties as JSON or something.
2) Add a table 'additional properties' that allows creating new objects linked by PK to existing users.
Then we will have
TABLE USER              TABLE PROPERTIES
-ID   <--------------   -FK
-NAME                   -NAME (driver license)
                        -VALUE (true)
How bad is the second idea? Are there any better options other than going NoSQL?
There is no issue with either approach. EAV (entity-attribute-value) models have been part of relational databases, probably since the earliest databases were created.
They do have some downsides:
The value attribute is (generally) a string, so values of other types have to be stored as text and the database cannot enforce their types.
The values cannot be part of declared foreign key relationships.
Validating values is tricky -- very complicated check constraints, for instance.
There is no (easy) way to ensure that an entity has a particular value.
But for user-defined or sparsely populated values, EAV is definitely a reasonable choice.
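A minimal sketch of such an EAV table, assuming an employees table with an id primary key (all names here are illustrative, not a prescribed schema):
CREATE TABLE employee_properties (
    employee_id INT NOT NULL REFERENCES employees(id),  -- PK of the main table
    name        VARCHAR(100) NOT NULL,                  -- e.g. 'has_drivers_license'
    value       VARCHAR(255),                           -- stored as text, so no type checking
    PRIMARY KEY (employee_id, name)
);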
JSON is another reasonable choice. That does, however, require a one-time change to the database to add a JSON column. Some databases offer indexing on JSON values, which can improve performance.
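In PostgreSQL, for example, that one-time change could look roughly like this (the column and index names are illustrative, and the table name follows the question's Employees example):
ALTER TABLE employees ADD COLUMN extra_properties jsonb;
CREATE INDEX idx_employees_extra_properties ON employees USING gin (extra_properties);  -- lets queries on the JSON values use an index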
If "has-drivers-license" is a one-time change, then you might just want a separate table with the same primary key. The next time that a new column is needed, you can modify the "options" table, rather than the main table. This allows better support for validating values (all values are unique, for example) or defining foreign key constraints.
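And a sketch of that separate-table variant, again with assumed names, keyed by the same primary key as the main table:
CREATE TABLE employee_options (
    employee_id         INT PRIMARY KEY REFERENCES employees(id),  -- same PK as the main table
    has_drivers_license BOOLEAN NOT NULL DEFAULT FALSE
);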

Behavior of a SORT without BY on standard internal tables? Is it safe?

What exactly does the SORT statement without key specification do when run on a standard internal table? As per the documentation:
If no explicit sort key is entered using the addition BY, the internal table itab is sorted by the primary table key. The priority of the sort is based on the order in which the key fields are specified in the table definition. In standard keys, the sort is prioritized according to the order of the key fields in the row type of the table. If the primary table key of a standard table is empty, no sort takes place. If this is known statically, the syntax check produces a warning.
With the primary table key being defined as:
Each internal table has a primary table key that is either a self-defined key or the standard key. For hashed tables, the primary key is a hash key, for sorted tables, the primary key is a sorted key. Both of these table types are key tables for which key access is optimized and the primary key thus has its own administration. The key fields of these tables are write-protected when you access individual rows. Standard tables also have a primary key, but the corresponding access is not optimized, there is no separate key administration, and the key fields are not write-protected.
And for good measure, the standard key is defined as:
Primary table key of an internal table, whose key fields in a structured row type are all table fields with character-like data types and byte-like data types. If the row type contains substructures, these are broken down into elementary components. The standard key for non-structured row types is the entire table row if the row type itself is not a table type. If there are no corresponding table fields, or the row type itself is a table type, the standard key from standard tables is empty or contains no key fields.
All of which mainly just confuses me as I'm not sure if I can really rely on the basic SORT statement to provide a reliable or safe result. Should I really just avoid it in all situations or does it have a purpose if used properly?
By extension, if I want to run a DELETE ADJACENT DUPLICATES FROM itab COMPARING ALL FIELDS, when would it be safe to do so after a simple SORT itab.? Only if I added a key on all fields? Without an explicit key only if I have an internal table with clike and xsequence columns? If I want to execute that DELETE statement, what is the most optimal SORT statement to run on the internal table?
SORT without BY should be avoided in all situations because it "makes the program difficult to understand and possibly unpredictable" (as the ABAP documentation puts it). I think that if you don't mention BY, a static check in the Code Inspector produces a warning. You should use SORT itab BY table_line, where table_line is a special name ("pseudo-component") meaning "all fields of the line".
Not your question, but you may also define the internal table with primary and secondary keys, so that you don't need to sort explicitly - DELETE ADJACENT DUPLICATES can be used with any of those keys.
Internal tables can have keys that are either inherited from the structures the itab is based on or specified explicitly. As the documentation says, SORT without BY sorts by the primary key, and that is safe assuming the internal table is defined correctly.
I think this is intended as a dynamic feature to be used with smart table key design. If done correctly, SORT without BY lets your program adapt to table key changes in the future (so if your key changes, the sort changes with it). Problems arise when the key is modified in an odd way.
As a rule of thumb:
The more specific your program code is, the less error-prone (and safer) it is.
So sorting by key_id, key_date will always produce the same sort by those two fields.
Dynamic components in an application make it more flexible, but they tend to produce (often hard to notice) bugs when the things they rely on are modified.
So if you take the previous example with two key fields and add one in the middle (say key_is_active between the two existing fields), the sorting results might change in a way you did not expect.
If you had an algorithm that processes based on date, your algorithm might be broken by that change.
In your particular case with DELETE ADJACENT DUPLICATES, I would follow Sandra Rossi's advice.

PostgreSQL - Modify CREATE TABLE syntax in an extension

Is it possible to write a PostgreSQL extension that modifies the DDL syntax?
I'm creating an extension on top of PostgreSQL and PostGIS to support integrity constraints of a specific spatial data model (OMT-G). For that, I want to modify the CREATE TABLE syntax, which accepts constraints with this syntax:
CONSTRAINT constraint_name CHECK ( expression )
But I want to create my own syntax, like the one in the following example, which would then call functions or triggers I have written already.
CREATE TABLE school_district (
    id integer PRIMARY KEY,
    school_name varchar(120),
    geom geometry,
    SPATIAL_CONSTRAINT PLANAR_SUBDIVISION (geom),
    SPATIAL_CONSTRAINT CONTAINS school (geom)
);
Is that possible? If so, how?
As others have commented, it's not possible to alter Postgres grammar via an extension. There's been some discussion related to this on the hacker's mailing list, but no one sees any feasible way to make the Bison grammar extensible.
The other problem with what you're proposing is that CHECK constraints (which it appears is what you're trying to do here) can not safely reference other tables.
It sounds like what you really want here is extensible foreign key support. That's something the community would actually like as well, at least for arrays. The idea is to support something like int[], where each element should be treated as a foreign key reference to another table. What you're describing is similar: instead of a different data type, you want to use a different operator.
For now though, I think the best you could do is to provide your users with a function that will put an appropriate column and trigger on the table. The column would be a foreign key to the school table. The trigger would find an appropriate school and populate its primary key in the new column. (You'd probably want a BEFORE DELETE trigger on the school table to deal with deleting a school too.)
The reason you need the foreign key field is because foreign key triggers operate with different visibility rules than normal queries, so you can't fully simulate them in user space. If you're not quite that paranoid you could just have a single AFTER INSERT trigger that looks for a school and throws an error if it doesn't find one.
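A rough sketch of that approach, assuming a school table with id and geom columns and using a made-up school_id column name (illustrative only, not a drop-in implementation):
ALTER TABLE school_district ADD COLUMN school_id integer REFERENCES school(id);  -- assumes school(id) exists

CREATE OR REPLACE FUNCTION school_district_set_school() RETURNS trigger AS $$
BEGIN
    -- find a school whose geometry lies inside the new/updated district
    SELECT s.id INTO NEW.school_id
    FROM school s
    WHERE ST_Contains(NEW.geom, s.geom)
    LIMIT 1;

    IF NEW.school_id IS NULL THEN
        RAISE EXCEPTION 'No school found inside district %', NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER school_district_school_trg
    BEFORE INSERT OR UPDATE ON school_district
    FOR EACH ROW EXECUTE PROCEDURE school_district_set_school();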
As far as pure-DDL solutions go, the most promising method is CREATE TABLE ... CONSTRAINT ... EXCLUDE, which uses a GiST index on a single table and offers relatively limited operators that work only on the bounding box (e.g. &&):
CREATE TABLE polygons (
    geom geometry(Polygon,4326),
    EXCLUDE USING gist (geom WITH &&)
);
INSERT INTO polygons(geom)
VALUES('SRID=4326;POLYGON ((0 0, 0 1, 1 0, 0 0))');
But then, this conflicts (even though the geometries don't actually overlap):
INSERT INTO polygons(geom)
VALUES('SRID=4326;POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))');
ERROR: conflicting key value violates exclusion constraint "polygons_geom_excl"
DETAIL: Key (geom)=(0103000020E61000000100000005000000000000000000F03F000000000000F03F000000000000F03F0000000000000040000000000000004000000000000000400000000000000040000000000000F03F000000000000F03F000000000000F03F) conflicts with existing key (geom)=(0103000020E61000000100000004000000000000000000000000000000000000000000000000000000000000000000F03F000000000000F03F000000000000000000000000000000000000000000000000).
As mentioned above by @Jim, the best approach for having one table enforce a constraint over another table is to write a good trigger function and use it on both tables. It would normally be written in PL/pgSQL, where you can embed useful messages, such as:
RAISE EXCEPTION 'School % is not located in school district % (it is % away)',
s.name, d.name, ST_Distance(s.geom, d.geom);
This way, if you edit either the school_district or schools table, the trigger would do the check on UPDATE, INSERT, or DELETE to see if the conditions remain valid.

SQL database design: storing the type of a row

I am designing a database to contain a table reference, with a column type that is one of several predefined values (e.g., book, movie, magazine, etc.). I intend the range of possible values to expand over time (e.g. if I realize that I missed the academic_paper type, I want to be able to put that in).
The easiest solution would seem to be to simply store a string representing the type into the table. But this sounds like it would result in a lot of wasted space.
The other solution I thought of is creating a new table reference_types, which the type column references via a foreign key. This seems to have the added benefit of ensuring valid type values (so that I won't accidentally mistype a "magzine" somewhere in my code) and possibly allowing faster queries for all media of a certain type (since integer comparisons should be much faster than string comparisons), but it would also slow my application down a bit, as joins would be required whenever I need the reference type, and probably complicate logic because of those extra joins.
What are your thoughts on schema design for this problem?
Your second solution is the correct one. Create a secondary table to store your reference types and link them using a foreign key.
For further reading on this subject the search term you'd want to use is 'database normalisation'.
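A minimal sketch of that normalised design, using the table names from the question (the column names are illustrative):
CREATE TABLE reference_types (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE       -- 'book', 'movie', 'magazine', ...
);

CREATE TABLE reference (
    id      INT PRIMARY KEY,
    title   VARCHAR(200),                  -- illustrative payload column
    type_id INT NOT NULL REFERENCES reference_types(id)
);
Adding a type you missed (academic_paper, say) is then just an INSERT into reference_types.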
Create the reference_types table. And in your reference table use an integer foreign key, but also add a reference_type_name field.
You can query the reference table to get the integer key and print the type name when needed without performing a join to the other table, and still use the reference_types table for other operations; just keep the type names in both tables equal.
I know it sounds redundant, but it's really the fastest way to do a simple query by int key and have it all together.
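As an illustrative sketch of that suggestion, assuming the reference table already has an integer type_id foreign key as above, the redundant name column would simply sit next to it:
ALTER TABLE reference ADD COLUMN reference_type_name VARCHAR(50);  -- redundant copy of reference_types.name, kept in sync by the application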
It depends: if you will want to add some other information to reference types later, then use the second approach. If not, use the first one because it's faster and the information stored is only a string (you can always select distinct values to retrieve your types).

Setting the right foreign key on insert

Morning all,
I'm doing a lot of work to drag a database (SQL Server 2005, in 2000 compatibility mode) kicking and screaming towards having a sane design.
At the moment, all the tables' primary keys are nvarchar(32), and are set using uniqId() (oddly, this gets run through a special hashing function, no idea why)
So in several phases, I'm making some fundamental changes:
Introducing ID_int columns to each table, auto increment and primary key
Adding some extra indexing, removing unused indexes, dropping unused columns
This phase has worked well so far, test db seems a bit faster, total index sizes for each table are MUCH smaller.
My problem is with the next phase: foreign keys. I need to be able to set these INT foreign keys on insert in the other tables.
There are several applications pointing at this DB, only one of which I have much control over. It also contains many stored procs and triggers.
I can't physically make all the changes needed in one go.
So what I'd like to be able to do is add the integer FKs to each table and have them automatically set to the right thing on insert.
To illustrate this with an example:
Two tables, Call and POD, linked pod.Call_ID -> Call.Call_ID. This is an nvarchar(32) field.
I've altered Call such that Call_ID_int is an identity, auto-increment primary key. I need to add POD.Call_ID_int such that, on insert, it gets the right value from Call.Call_ID_int.
I'm sure I could do this with a BEFORE trigger, but I'd rather avoid this for maintenance and speed reasons.
I thought I could do this with a constraint, but after much research found I can't. I tried this:
alter table POD
add constraint
pf_callIdInt
default([dbo].[map_Call_ID_int](Call_ID))
for Call_ID_int
Where the map_Call_ID_int function takes the Call_ID and returns the right Call_ID_int, but I get this error:
The name "Call_ID" is not permitted in this context. Valid expressions
are constants, constant expressions, and (in some contexts) variables.
Column names are not permitted.
Any ideas how I can achieve this?
Thanks very much in advance!
-Oli
Triggers are the easiest way.
You'll have odd concurrency issues with defaults based on UDFs too (like you would for CHECK constraints).
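A minimal sketch of the trigger approach, using the Call/POD names from the question (AFTER INSERT, since SQL Server has no BEFORE triggers; the POD_ID key column is an assumption):
CREATE TRIGGER trg_POD_SetCallIdInt
ON POD
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- copy the new integer key across, matching on the legacy nvarchar key
    UPDATE p
    SET p.Call_ID_int = c.Call_ID_int
    FROM POD p
    JOIN inserted i ON i.POD_ID = p.POD_ID   -- assumes POD has a POD_ID key
    JOIN Call c ON c.Call_ID = i.Call_ID;
END;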
Another trick is to use views to hide schema changes, but still with triggers to intercept DML. So your "old" table no longer exists except as a view on the "new" table. A write to the "old" table/view actually happens on the new table.
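A rough illustration of that pattern with hypothetical names (the original Call table renamed to NewCall, and a view named Call taking its place):
EXEC sp_rename 'Call', 'NewCall';
GO
CREATE VIEW Call AS
    SELECT Call_ID, Call_ID_int /*, other columns */ FROM NewCall;
GO
CREATE TRIGGER trg_Call_InsteadOfInsert
ON Call
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- route writes against the old name to the new table
    INSERT INTO NewCall (Call_ID /*, other columns */)
    SELECT Call_ID /*, other columns */ FROM inserted;
END;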