Hibernate and IDs

Is it possible in Hibernate to have an entity where some IDs are assigned and some are generated?
For instance:
Some objects have an ID between 1 and 10000 that is generated outside of the database, while other entities come in with no ID and need one generated by the database.

You could use 'assigned' as the ID generation strategy, but you would have to give the entity its ID before you saved it to the database. Alternatively, you could build your own implementation of org.hibernate.id.IdentifierGenerator to provide the ID in the manner you've suggested.
I have to agree with Cade Roux though: doing so seems like it would be much more difficult than using the built-in increment, uuid, or another form of ID generation.

I would avoid this and simply have an auxiliary column for the information about the source of the object and a column for the external identifier (assuming the external identifier was an important value you wanted to keep track of).
It's generally a bad idea to use columns for mixed purposes - in this case to infer from the nature of a surrogate key the source of an object.

Use any generator you like, as long as it can start at an offset (when you use a sequence, you can initialize it accordingly).
For all other entities, call setId() before you insert them. Hibernate will only generate an id if the id property is 0. Note that you should first insert objects with ids into the DB and only then work with them. There is a lot of code in Hibernate which expects the object to be in the DB when id != 0.
Another solution is to use negative ids for entities which come with an id. This will also make sure that there are no collisions when you insert a new object.
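To make the offset idea concrete, here is a minimal sketch in plain Python (not Hibernate code). The names Entity, MixedIdAllocator and ensure_id are made up for illustration, and the in-memory counter only stands in for a database sequence that has been initialized past the reserved range.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

RESERVED_MAX = 10_000  # IDs 1..10000 are assigned outside the database (from the question)


@dataclass
class Entity:
    name: str
    id: Optional[int] = None  # None means "no ID yet, generate one"


class MixedIdAllocator:
    """Hypothetical helper: keeps assigned IDs, generates the rest above an offset."""

    def __init__(self, start: int = RESERVED_MAX + 1) -> None:
        # Stand-in for a database sequence that starts past the reserved range.
        self._sequence = count(start)

    def ensure_id(self, entity: Entity) -> int:
        if entity.id is None:
            entity.id = next(self._sequence)
        elif not 1 <= entity.id <= RESERVED_MAX:
            raise ValueError(f"assigned id {entity.id} is outside 1..{RESERVED_MAX}")
        return entity.id


allocator = MixedIdAllocator()
external = Entity("imported", id=42)   # arrives with an ID
fresh = Entity("created locally")      # needs a generated ID
allocator.ensure_id(external)
allocator.ensure_id(fresh)
```

Because generated values start above the reserved range, an assigned ID and a generated ID can never collide.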

Related

Domain Driven Design Auto Incremented Entity Key

I'm just starting with Domain Driven Design and I've learned that you should keep your model in a valid state, and that when creating a new instance of a class it's recommended to pass all required attributes as constructor parameters.
But when working with auto-incremented keys, I only get the new ID when I call an Add method on my persistence layer. If I instantiate my objects without a key, I think they will be in an invalid state because they need some sort of unique identifier.
How should I implement my architecture in order to have my IDs before creating a new instance of my entity?

Generated Random IDs
The pragmatic approach here is to use random IDs and generate them before instantiating an entity, e.g. in a factory. GUIDs are a common choice.
And before you ask: No, you won't run out of GUIDs :-)
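A minimal sketch of such a factory, in Python. Customer and new_customer are hypothetical names used only for illustration.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    id: uuid.UUID
    name: str


def new_customer(name: str) -> Customer:
    # The ID is generated before the entity is constructed,
    # so a Customer never exists without an identifier.
    return Customer(id=uuid.uuid4(), name=name)


alice = new_customer("Alice")
bob = new_customer("Bob")
```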
Sequential IDs with ID reservation
If you must use a sequential ID for some reason, then you still have options:
Query a sequence on the DB to get the next ID. Whether this is available depends on your DB product (Oracle, for example, has sequences).
Create a table with an auto-increment key that you use only as key reservation table. To get an ID, insert a row into that table - the generated key is now reserved for you, so you can use it as ID for the entity.
Note that both approaches for sequential IDs require a DB round-trip before you even start creating the entity. This is why the random IDs are usually simpler. So if you can, use random IDs.
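The reservation-table variant can be sketched as follows. This uses Python's built-in sqlite3 module and an in-memory database purely for illustration; the table name id_reservation and the function reserve_id are assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table exists only to hand out keys; it carries no other data.
conn.execute("CREATE TABLE id_reservation (id INTEGER PRIMARY KEY AUTOINCREMENT)")


def reserve_id(connection: sqlite3.Connection) -> int:
    # Inserting a row reserves the generated key for the caller.
    cursor = connection.execute("INSERT INTO id_reservation DEFAULT VALUES")
    connection.commit()
    return cursor.lastrowid


first = reserve_id(conn)
second = reserve_id(conn)
```

The extra INSERT is the round-trip mentioned above: it has to happen before the entity can be constructed.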
DB-generated IDs
Another possibility is to just live with the fact that you don't have the ID at creation time, but only when the insert operation on the DB succeeds. In my experience, this makes entity creation awkward to use, so I avoid it. But for very simple cases, it may be a valid approach.

In addition to theDmi's comments:
1) In your factory method, you can make sure your entity gets stored to the database. This may or may not be applicable to your domain, but if you are sure that the entity is going to be saved, it might be a valid approach.
2) You can separate the ID from the database primary key. I've worked on a case where something was only an order once the customer had paid, and at that point it was identified by its invoice ID (a sequential ID). That doesn't mean the database needs an ID column that is also the primary key of the object. You could have a primary key in the database (a random GUID) and still have an ID (int?) that is sequential and null until it has been filled in.
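Point 2 could look roughly like this. It is only a sketch: Order, key, invoice_id and mark_paid are illustrative names, and Optional[int] plays the role of the nullable int? mentioned above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Order:
    # Database primary key: random, available from the moment of creation.
    key: uuid.UUID = field(default_factory=uuid.uuid4)
    # Business identifier: sequential, unset until the customer has paid.
    invoice_id: Optional[int] = None

    def mark_paid(self, invoice_id: int) -> None:
        if self.invoice_id is not None:
            raise ValueError("order already has an invoice id")
        self.invoice_id = invoice_id


order = Order()
key_before = order.key
order.mark_paid(1001)
```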

Why use sql tags in struct in some go libs like gorm?

I know why struct tags are needed in Go and how they are accessed through reflect. But I have searched and could not find a reliable answer to the question of why I should use sql tags in a struct that holds SQL results. I have explored a lot of sample code, and people are using sql:"index" and sql:"primary_key" in their structs.
I have already done the indexing in the database layer; isn't that enough? Do I have to use sql:"index" too to get the best results? Likewise, I have defined the primary key attribute in the database; do I have to specify sql:"primary_key" as well?
My code seems to work fine without those. Just want to know their benefit and usages.

I think you are referring to an ORM library like gorm
In that case, metadata like sql:"primary_key" or sql:"index" will just tell the ORM to create an index while trying to setup the tables or maybe migrate them.
A couple of examples in gorm could be: indexes, primary keys, foreign keys, many2many relations, or, when trying to adapt an existing schema to your gorm models, setting the type explicitly, for example:
type Address struct {
    ID       int
    Address1 string         `sql:"not null;unique"` // Set field as not nullable and unique
    Address2 string         `sql:"type:varchar(100);unique"`
    Post     sql.NullString `sql:"not null"`
}

It depends on the package you are using and your use case. Are untagged structs enough for CRUD? Almost always, unless the package says otherwise, which is rare but possible. A few packages do under-the-hood magic which may give rise to bugs. If you are aware of these behaviours, or are quite explicit in your code, you'll probably avoid them.
Indexing tags mostly allow you to use the package's migration tools, which translate your model declarations into SQL (CREATE statements). So if you always want to do this yourself, you probably needn't bother adding such tags.
But you may run into a bug if your package requires a tag. For example, in gorm, the Model method takes a struct pointer as input. If this struct has a field named ID, gorm uses it as the primary key; that is, if ID has a value of "4", it will add a WHERE id=4 automatically. If your struct has an ID field, you needn't even add a primary_key tag and it will still be treated as one. This behaviour may cause issues when you have both a "non-primary-key" ID field and another field which you are actually using as the primary key. Another example for gorm is this. Another possible behaviour is checking the nullable property and throwing an error if an INSERT statement would put a NULL value into a NOT NULL field.
On a different note, adding tags to your structs can be considered good practice since it gives context of its properties in the DB.

How does rails come up with the ID for a new model/record?

How does activerecord assign an ID to a newly created record? The ID values seem to be all over the place. Sometimes they are sequential, but sometimes they seem to be some kind of a hash.
Is there a way to control the behavior?

Within a relational database you'll see that IDs are usually sequential. This happens to be an automatically incrementing field called id by default in these databases with Rails. This is the 99% case, meaning that 99% of the time you can expect to see it done this way. It's the sane way.
However, there are some cases in which the "id" field within the database may not be automatically incrementing and may instead be a string. In a database I am working with at the moment, the id field is called client_id, is a 6-character string such as "RAB001", and needs to be manually assigned by the code itself. This is due to a legacy system we are supporting and there's nothing we can do to fix that. It's just how it is.
With other data stores, such as MongoDB accessed through Mongoid, the ids are, once again, generated automatically. There's a difference here though: instead of being automatically incrementing numbers, they are a hash. In a Mongo database I happen to have handy, one of the object's _id fields (note the underscore) is this lovely, easy-to-understand[1] hash: 4e22b5812f8b7d6f6d000001. This is automatically generated by Mongo and I don't really care what it is, except when I need to find an object and there's no other unique value to find it by.
I would recommend sticking with an automatically generating ID system, be it something provided by the traditional database systems such as PostgreSQL or MySQL or something by Mongo.
Any system where you need to generate the primary key for a record manually needs to have a huge "HERE BE DRAGONS" label on it and should be handled like a case of nitroglycerin or similarly to this apt analogy. Avoid this system if you can.
[1] I am being sarcastic here.

Should one include ID as a property on objects persisted to a database?

I am creating the model for a web application. The tables have ID fields as primary keys. My question is whether one should define ID as a property of the class?
I am divided on the issue because it is not clear to me whether I should treat the object as a representation of the table structure or whether I should regard the table as a means to persist the object.
If I take the former route, then ID becomes a property because it is part of the structure of the database table. If I take the latter approach, then ID could be viewed as a piece of metadata belonging to the database which is not strictly a part of the object's model.
And then we arrive at the middle ground. While the ID is not really a part of the object I'm trying to model, I do realise that the objects are retrieved from and persisted to the database, and that the ID of an object in the database is critical to many operations of the system, so it might be advantageous to include it to ease interactions where an ID is used.
I'm a solo developer, so I'd really like some other, probably more experienced, perspectives on the issue.

Basically: yes.
All the persistence frameworks I've used (including Hibernate and iBATIS) require the ID to be on the Object.
I understand your point about metadata, but an Object from a database should really derive its identity in the same way the database does, usually an int primary key. Object-level equality should then be derived from that.
Sometimes you have primary keys that are composite, e.g. first name and last name (don't ever do this!), in which case the primary key isn't 'metadata' because it is part of the Object's identity.
I generally reserve the ID column of an object for the database. My opinion is that to use it for any 'customer-facing' purpose, (for example, use the primary key ID as a customer number) you will always shoot yourself in the foot later.

If you ever make changes to the existing data (instead of exclusively adding new data), you need the PK. Otherwise you don't know which record to change in the DB.

You should have the ID in the object. It is essential.
The easiest use case to give as an example is testing equality:
public bool Equals(Entity a, Entity b) { return a.ID == b.ID; } // Entity stands for any persisted class that carries an ID
Anything else is subject to errors, and you'll find that out when you start getting primary key violations or start overwriting existing data.
By counterargument:
Say you don't have the ID in the object. Once you change an object, and don't have its ID from the database, how will you know which record to update?
At the same time, you should note that the operations I mention are really private to the object instance, so ID does not necessarily have to be a public property.
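The same idea, sketched in Python with the ID kept non-public as suggested above. Entity and its fields are illustrative names; the sketch assumes every instance already has an ID.

```python
class Entity:
    def __init__(self, entity_id: int, name: str) -> None:
        self._id = entity_id  # non-public: only equality and persistence need it
        self.name = name

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Entity):
            return NotImplemented
        # Two objects represent the same record when their IDs match,
        # regardless of any other field values.
        return self._id == other._id

    def __hash__(self) -> int:
        # Keep hashing consistent with equality.
        return hash(self._id)


a = Entity(7, "John")
b = Entity(7, "John Smith")  # same record, edited name
c = Entity(8, "John")        # different record, same name
```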

I include the ID as a property. Having a simple unique identifier for an object is often very handy regardless of whether the object is persisted in a database or not. It also makes your database queries much more simple.
I would say that the table is just a means to persist an object, but that doesn't mean the object can't have an ID.

I'm very much of the mindset that the table is a means to persist the object, but, even so, I always expose the IDs on my objects for two primary reasons:
The database ID is the most convenient way to uniquely identify an object, either within a class (if you're using a per-table serial/autonumber ID) or universally (if you're maintaining a separate "ID-to-class" mapping). In the context of web applications, it makes everything much simpler and more efficient if your forms are able to just specify <input type=hidden name=id value=12345> instead of having to provide multiple fields which collectively contain sufficient information to identify the target object (or, worse, use some scheme to concatenate enough identifying information into a single string, then break it back down when the form is submitted).
It needs to have an ID anyhow in order to maintain a sane database structure and there's no reason not to expose it.

Should the ID in the object be read-only or not? In my mind it should be read-only, as by definition the ID will never change (it uniquely identifies a record in the database).
This creates a problem when you create a new object (ID not set yet) and save it to the database through a stored procedure which returns the newly created ID: how do you store it back in the object if the ID property is read-only?
Example:
Employee employee = new Employee();
employee.FirstName="John";
employee.LastName="Smith";
EmployeeDAL.Save(employee);
How does the Save method (which actually connects to the database to save the new employee) update the EmployeeId property in the Employee object if this property is read-only (which it should be, as the EmployeeId will never change once it's created)?

Database-wide unique-yet-simple identifiers in SQL Server

First, I'm aware of this question, and the suggestion (using GUID) doesn't apply in my situation.
I want simple UIDs so that my users can easily communicate this information over the phone:
"Hello, I've got a problem with order 1584"
as opposed to
"Hello, I've got a problem with order 4daz33-d4gerz384867-8234878-14"
I want those to be unique (database-wide) because I have a few different kinds of 'objects': there are order IDs, delivery IDs, and billing IDs, and since there's no one-to-one relationship between those, I have no way to guess what kind of object an ID is referring to.
With database-wide unique IDs, I can immediately tell what object my customer is referring to. My user can just enter an ID in a search tool, and I save him the extra click to further refine what he is looking for.
My current idea is to use identity columns with different seeds 1, 2, 3, etc, and an increment value of 100.
This raises a few questions though:
What if I eventually get more than 100 object types? Granted, I could use 1000 or 10000, but something that doesn't scale well "smells".
Is there a possibility that the seed is "lost" (during replication, a database problem, etc.)?
More generally, are there other issues I should be aware of?
Is it possible to use a non-integer type (I currently use bigints) for an identity column, so that I can prefix the ID with something representing the object type (for example, a varchar column)?
Would it be a good idea to use a "master table" containing only an identity column, and maybe the object type, so that I can just insert a row into it whenever I need a new ID? I feel like it might be a bit overkill, and I'm afraid it would complicate all my insert statements. Plus, I wouldn't be able to determine an object type without looking at the database.
Are there other clever ways to address my problem?
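For reference, the arithmetic behind the seed-and-increment idea can be sketched like this. It only illustrates the scheme described in the question; the seed assignments and function names are assumptions.

```python
INCREMENT = 100  # identity increment shared by every table
SEEDS = {"order": 1, "delivery": 2, "billing": 3}  # one identity seed per object type
TYPE_BY_SEED = {seed: name for name, seed in SEEDS.items()}


def nth_id(object_type: str, n: int) -> int:
    """ID of the n-th row (counting from 0) in the table for object_type."""
    return SEEDS[object_type] + n * INCREMENT


def object_type_of(identifier: int) -> str:
    # The remainder modulo the increment recovers the seed, and so the type.
    return TYPE_BY_SEED[identifier % INCREMENT]


delivery_id = nth_id("delivery", 15)   # 2 + 15 * 100
```

The modulo step is also where the limit in the first question comes from: with an increment of 100 there are only 100 distinct remainders, so at most 100 object types.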

Why not use identities on all the tables, but any time you present it to the user, simply tack on a single char for the type? e.g. O1234 is an order, D123213 is a delivery, etc.? That way you don't have to engineer some crazy scheme...

Handle it at the user interface--add a prefix letter (or letters) onto the ID number when reporting it to the users. So o472 would be an order, b531 would be a bill, and so on. People are quite comfortable mixing letters and digits when giving "numbers" over the phone, and are more accurate than with straight digits.

You could use an autoincrement column to generate the unique id. Then have a computed column which takes the value of this column and prepends a fixed identifier that reflects the entity type. For example, OR1542 and DL1542 would represent order #1542 and delivery #1542, respectively. Your prefix could be extended as much as you want, and the format could be arranged to help distinguish between items with the same autoincrement value, say OR011542 and DL021542, with the prefixes being OR01 and DL02.
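If the prefix is applied in application code rather than in a computed column, the formatting and parsing could look like the sketch below. The prefix table and function names are made up for illustration, and only the simple letters-plus-number form (OR1542) is handled.

```python
import re

PREFIXES = {"order": "OR", "delivery": "DL"}  # assumed type-to-prefix mapping
_PATTERN = re.compile(r"^([A-Z]+)(\d+)$")


def display_id(object_type: str, number: int) -> str:
    """Build the user-facing identifier from the type and the autoincrement value."""
    return f"{PREFIXES[object_type]}{number}"


def parse_display_id(text: str) -> tuple:
    """Split a user-supplied identifier back into (object_type, number)."""
    match = _PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a valid identifier: {text!r}")
    prefix, digits = match.groups()
    for object_type, known_prefix in PREFIXES.items():
        if known_prefix == prefix:
            return object_type, int(digits)
    raise ValueError(f"unknown prefix: {prefix!r}")
```

A search box can then call parse_display_id on whatever the user types and go straight to the right table.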

I would implement this by defining a generic root table. For lack of a better name, call it Entity. The Entity table should have, at a minimum, a single identity column. You could also include other fields that are common across all your objects, or even metadata that tells you, for example, that this row is an order.
Each of your actual Order, Delivery, ... tables will have an FK reference back to the Entity table. This will give you a single unique ID column.
Using the seeds is, in my opinion, a bad idea, and one that could lead to problems.
Edit
Some of the problems you mentioned already. I also see this being a pain to track, and it would be hard to ensure that all new entities are set up correctly. Imagine a developer updating the system two years from now.
After I wrote this answer I thought a bit more about why you're doing this, and I came to the same conclusion that Matt did.
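A rough sketch of the root-table layout, using Python's sqlite3 module and an in-memory database for illustration only. The table and column names (entity, kind, orders, deliveries) are assumptions; in SQL Server the id column would be an IDENTITY column instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE entity (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- the single database-wide ID
    kind TEXT NOT NULL                       -- metadata: what this row is
);
CREATE TABLE orders (
    id       INTEGER PRIMARY KEY REFERENCES entity(id),
    customer TEXT NOT NULL
);
CREATE TABLE deliveries (
    id      INTEGER PRIMARY KEY REFERENCES entity(id),
    address TEXT NOT NULL
);
""")


def new_entity_id(kind: str) -> int:
    # Every object, whatever its type, draws its ID from the same table.
    return conn.execute("INSERT INTO entity (kind) VALUES (?)", (kind,)).lastrowid


def kind_of(entity_id: int):
    row = conn.execute("SELECT kind FROM entity WHERE id = ?", (entity_id,)).fetchone()
    return row[0] if row else None


order_id = new_entity_id("order")
conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)", (order_id, "Smith"))
delivery_id = new_entity_id("delivery")
conn.execute("INSERT INTO deliveries (id, address) VALUES (?, ?)", (delivery_id, "1 Main St"))
conn.commit()
```

Looking up the type of an ID is then a single SELECT against the root table.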

MS's Intentional Programming project had a GUID-to-word system that produced pronounceable names from random IDs.

Why not a simple Base36 representation of a bigint? http://en.wikipedia.org/wiki/Base_36
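A small sketch of what that conversion could look like in Python. The function names are illustrative; decoding relies on the built-in int(text, 36).

```python
import string

ALPHABET = string.digits + string.ascii_uppercase  # 0-9 then A-Z, 36 symbols


def to_base36(number: int) -> str:
    if number < 0:
        raise ValueError("only non-negative IDs are supported")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def from_base36(text: str) -> int:
    # int() accepts bases up to 36 and is case-insensitive.
    return int(text, 36)


largest_bigint = 2**63 - 1
encoded = to_base36(largest_bigint)
```

Even the largest signed 64-bit value fits in 13 characters, and typical IDs are much shorter.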

We faced a similar problem on a project. We solved it by first creating a simple table that has only one column: a BIGINT set as an auto-increment identity.
We then created a stored procedure that inserts a new row into that table, using default values, inside a transaction. It stores SCOPE_IDENTITY() in a variable, rolls back the transaction, and then returns the stored value.
This gives us a unique ID inside the database without filling up a table, because identity values are not handed out again after a rollback.
If you want to know what kind of object the ID is referring to, I'd lose the transaction rollback and also store the type of object alongside the ID. That way, finding out what kind of object the ID refers to is only one select (or inner join) away.

I use a high/low algorithm for this. I can't find a description for this online though. Must blog about it.
In my database, I have an ID table with a counter field. This is the high part. In my application, I have a counter that goes from 0 to 99. This is the low part. The generated key is 100 * high + low.
To get a key, I do the following:
initially high = -1
initially low = 0
method GetNewKey()
begin
    if high = -1 then
        high = GetNewHighFromDatabase
    newKey = 100 * high + low
    Inc low
    if low = 100 then
        low = 0
        high = -1
    return newKey
end
The real code is more complicated with locks etc but that is the general gist.
There are a number of ways of getting the high value from the database including auto inc keys, generators etc. The best way depends on the db you are using.
This algorithm gives simple keys while avoiding most of the database hits of looking up a new key every time. In testing, I found it had similar performance to GUIDs and vastly better performance than retrieving an auto-increment key every time.
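Here is a runnable sketch of the same algorithm in Python, with a lock added for thread safety. HiLoKeyGenerator and fetch_high_from_database are illustrative names, and the in-memory counter is only a stand-in for whatever the database uses to hand out the high value.

```python
import threading
from itertools import count


class HiLoKeyGenerator:
    def __init__(self, fetch_high, block_size: int = 100) -> None:
        self._fetch_high = fetch_high  # callable that returns the next high value
        self._block_size = block_size
        self._high = None              # None means "no block reserved yet"
        self._low = 0
        self._lock = threading.Lock()

    def next_key(self) -> int:
        with self._lock:
            if self._high is None:
                self._high = self._fetch_high()  # the only database hit
            key = self._block_size * self._high + self._low
            self._low += 1
            if self._low == self._block_size:
                # Block exhausted: force a new high value on the next call.
                self._low = 0
                self._high = None
            return key


# Stand-in for the database counter (assumption for this sketch).
_db_counter = count(1)
db_calls = []


def fetch_high_from_database() -> int:
    high = next(_db_counter)
    db_calls.append(high)
    return high


generator = HiLoKeyGenerator(fetch_high_from_database)
keys = [generator.next_key() for _ in range(250)]
```

Generating 250 keys this way touches the stand-in database only three times, once per block of 100.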

You could create a master UniqueObject table with your identity and a subtype field. Subtables (Orders, Users, etc.) would have a FK to UniqueObject. INSTEAD OF INSERT triggers should keep the pain to a minimum.

Maybe an itemType-year-week-orderNumberThisWeek variant?
o2009-22-93402
Such an identifier can be built from several database column values and simply formatted into a single identifier by the software.
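A possible formatting function, as a sketch in Python. The name weekly_id is made up, and the use of ISO week numbers is an assumption, since the answer doesn't say how weeks are counted.

```python
from datetime import date


def weekly_id(type_code: str, day: date, number_this_week: int) -> str:
    # Assumption: ISO 8601 year and week number.
    year, week, _weekday = day.isocalendar()
    return f"{type_code}{year}-{week:02d}-{number_this_week}"


example = weekly_id("o", date(2009, 5, 27), 93402)
```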

I had a similar situation with a project.
My solution: By default, users only see the first 7 characters of the GUID.
It's sufficiently random that collisions are extremely unlikely (1 in 268 million), and it's efficient for speaking and typing.
Internally, of course, I'm using the entire GUID.
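The arithmetic behind that figure, as a small Python sketch. short_id is an illustrative name and the GUID below is an arbitrary example value.

```python
import uuid

SHORT_LENGTH = 7  # number of hex characters shown to users


def short_id(value: uuid.UUID) -> str:
    # .hex is the 32-character form without dashes.
    return value.hex[:SHORT_LENGTH]


full = uuid.UUID("4e22b581-2f8b-4d6f-8d00-0001aabbccdd")  # arbitrary example
shown = short_id(full)

# Each hex character has 16 possible values, so 7 characters give 16**7 prefixes.
possible_prefixes = 16 ** SHORT_LENGTH
```

That is the chance that one particular pair of GUIDs shares a prefix; the more IDs are in circulation, the more pairs there are to compare.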
