Storing dynamic fields with Doctrine2

In our app, we are looking to use Doctrine 2. However, there is one feature we want to offer, and we are completely confused as to how it would work.
We want our customers to be able to define custom fields on our standard objects. These fields would be created on the fly and would not be part of the object definition that Doctrine knows about and maps.
Our first thought was to use NoSQL (MongoDB or Amazon DynamoDB) to store some of this data. But since we want Doctrine to handle our core objects, we would like to stay within Doctrine for this as well, without having to reach beyond it to store this data.
One idea I had was to use Doctrine's ability to serialize/unserialize complex objects and simply keep a hash of custom field names and their values as an extra property on the object. However, this would not let us offer search on these fields if we ever wanted to.
Has anyone attempted this with Doctrine 2 or any other ORM?

You could consider using Doctrine ODM, which is Doctrine 2 for NoSQL databases; I believe it supports at least MongoDB.
Another approach would be to use serialization, as you said. You probably shouldn't worry about search too much: I would recommend using a separate full-text search engine (Solr, Elasticsearch, or another) anyway, as it provides much more versatility and performance than SQL full-text search.
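A minimal sketch of the serialization approach, assuming the custom fields are stored as a JSON string in a single text column. It uses Python's built-in `sqlite3` and `json` rather than Doctrine so that it runs standalone (Doctrine's serializing column types, such as `array`, follow the same idea), and the `customer_object` table and `custom_fields` column are hypothetical. It also shows the search limitation raised in the question: without database-specific JSON functions, the only portable way to search is to load and decode every row.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table whose custom fields live in one JSON text column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_object ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " custom_fields TEXT NOT NULL DEFAULT '{}')"  # hypothetical blob column
    )
    return conn


def save(conn: sqlite3.Connection, name: str, custom_fields: dict) -> int:
    """Serialize the user-defined fields to JSON and store them next to the core columns."""
    cur = conn.execute(
        "INSERT INTO customer_object (name, custom_fields) VALUES (?, ?)",
        (name, json.dumps(custom_fields)),
    )
    return cur.lastrowid


def find_by_custom_field(conn: sqlite3.Connection, key: str, value) -> list:
    """Portable search: read every row and filter after decoding; no index can help here."""
    matches = []
    for row_id, name, blob in conn.execute(
        "SELECT id, name, custom_fields FROM customer_object ORDER BY id"
    ):
        if json.loads(blob).get(key) == value:
            matches.append((row_id, name))
    return matches


if __name__ == "__main__":
    conn = make_db()
    save(conn, "Acme", {"region": "EU", "tier": "gold"})
    save(conn, "Globex", {"region": "US"})
    print(find_by_custom_field(conn, "region", "EU"))  # [(1, 'Acme')]
```

The full scan in `find_by_custom_field` is the cost of keeping the schema fixed; pushing these documents into a search engine is one way to avoid it.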
Third, you could use Doctrine alongside a NoSQL store. In this case, you should probably abstract your querying into a service class or similar, so that Doctrine queries the data in your SQL database and another client queries the remaining data.
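One way to read the service-class suggestion, as a sketch: core columns come from the SQL database, custom fields come from a second store, and callers only ever see the merged result. Here `sqlite3` stands in for the Doctrine-managed database and a plain dict stands in for the NoSQL store; `CustomerService`, its methods, and the `customer` table are hypothetical names.

```python
import sqlite3
from typing import Any, Dict, Optional


class CustomerService:
    """Hides the fact that one logical object is split across two stores."""

    def __init__(self, sql_conn: sqlite3.Connection, doc_store: Dict[int, Dict[str, Any]]):
        self.sql = sql_conn    # stand-in for the ORM-managed relational database
        self.docs = doc_store  # stand-in for a NoSQL store keyed by entity id

    def create(self, name: str, custom_fields: Dict[str, Any]) -> int:
        # Core data goes to SQL; the generated id links it to the custom fields.
        cur = self.sql.execute("INSERT INTO customer (name) VALUES (?)", (name,))
        self.docs[cur.lastrowid] = dict(custom_fields)
        return cur.lastrowid

    def get(self, customer_id: int) -> Optional[Dict[str, Any]]:
        row = self.sql.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        # Custom fields sit under their own key so they cannot overwrite core columns.
        return {"id": row[0], "name": row[1], "custom": dict(self.docs.get(row[0], {}))}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    service = CustomerService(conn, {})
    cid = service.create("Acme", {"region": "EU"})
    print(service.get(cid))  # {'id': 1, 'name': 'Acme', 'custom': {'region': 'EU'}}
```

Note that the two writes in `create` are not atomic across the two stores; a real implementation would need to decide how to handle a failure between them.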
Finally, you could consider using a key-value table: one column holds the key, another the value, plus a column referencing the object the field belongs to.
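A minimal sketch of such a key-value table, a layout often called entity-attribute-value (EAV), again using Python's `sqlite3` instead of Doctrine; all table and column names are hypothetical. The `customer_id` column ties each field to its object, and because every custom field is its own row, the database can index it and search it with ordinary SQL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE customer_field (
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    field_key   TEXT NOT NULL,
    field_value TEXT,
    PRIMARY KEY (customer_id, field_key)
);
-- Lets the database find objects by custom field without scanning every row.
CREATE INDEX idx_field_key_value ON customer_field (field_key, field_value);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_customer(conn: sqlite3.Connection, name: str, fields: dict) -> int:
    cid = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,)).lastrowid
    # One row per user-defined field; adding a new field needs no schema change.
    conn.executemany(
        "INSERT INTO customer_field (customer_id, field_key, field_value) VALUES (?, ?, ?)",
        [(cid, k, v) for k, v in fields.items()],
    )
    return cid


def find_by_field(conn: sqlite3.Connection, key: str, value: str) -> list:
    """Search on a custom field entirely in SQL."""
    return conn.execute(
        "SELECT c.id, c.name FROM customer c"
        " JOIN customer_field f ON f.customer_id = c.id"
        " WHERE f.field_key = ? AND f.field_value = ?"
        " ORDER BY c.id",
        (key, value),
    ).fetchall()


def load_fields(conn: sqlite3.Connection, customer_id: int) -> dict:
    rows = conn.execute(
        "SELECT field_key, field_value FROM customer_field WHERE customer_id = ?",
        (customer_id,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = make_db()
    acme = add_customer(conn, "Acme", {"region": "EU", "tier": "gold"})
    add_customer(conn, "Globex", {"region": "US"})
    print(find_by_field(conn, "region", "EU"))  # [(1, 'Acme')]
    print(load_fields(conn, acme))
```

The trade-off is that every value is stored as text here, so numeric or date comparisons on custom fields need casting or extra typed value columns.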

Related

MongoDB: querying documents from multiple collections

There are two tables, student and class:
SELECT student.name, class.subj
FROM student
INNER JOIN class
ON student.class_id = class.class_id;
In SQL this is fine, but how do I do it in MongoDB?
I know MongoDB does not support joins, but I don't want to put everything in one collection. I want to keep two collections, query them, and get the data back as one result.
For the reason I want to do it this way, please see this.
So how can I do it?
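When this was written, MongoDB did not support cross-collection queries, and AFAIK there was no plan to add such functionality; it runs against the whole concept of document databases. (MongoDB 3.2 later added the $lookup aggregation stage, which performs a left outer join to another collection in the same database.)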
We faced the same issue with MongoDB earlier, on a Node.js project. Our solution was to put subdocuments into another collection, each with a reference to its parent document via MongoDB's _id field. A large part of this was handled by Mongoose, but at its core it still makes two separate requests: one to retrieve the parent document and another to retrieve all of its children, with the parent document keeping an array of the _id values of all its children.
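The two-request pattern described above can be sketched without a database server by using lists of dicts as stand-ins for the two collections, here shaped after the question's student/class example. With a real driver, each step would be one query (the second one a filter of the form `{"_id": {"$in": [...]}}`); the sample data and the helper names `find_in` and `students_with_subjects` are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Stand-ins for two MongoDB collections (hypothetical sample data).
students: List[Dict[str, Any]] = [
    {"_id": 1, "name": "Ann", "class_id": 10},
    {"_id": 2, "name": "Bob", "class_id": 20},
    {"_id": 3, "name": "Cid", "class_id": 10},
]
classes: List[Dict[str, Any]] = [
    {"_id": 10, "subj": "Math"},
    {"_id": 20, "subj": "History"},
]


def find_in(collection: Iterable[Dict[str, Any]], field: str, values: Iterable[Any]) -> List[Dict[str, Any]]:
    """Emulates a {field: {"$in": values}} filter on one collection."""
    wanted = set(values)
    return [doc for doc in collection if doc.get(field) in wanted]


def students_with_subjects(student_coll, class_coll) -> List[Tuple[str, str]]:
    # Request 1: fetch the student documents.
    student_docs = list(student_coll)
    # Request 2: fetch every referenced class at once, then index them by _id.
    class_ids = [s["class_id"] for s in student_docs]
    by_id = {c["_id"]: c for c in find_in(class_coll, "_id", class_ids)}
    # Application-side join, playing the role of the SQL INNER JOIN in the question.
    return [
        (s["name"], by_id[s["class_id"]]["subj"])
        for s in student_docs
        if s["class_id"] in by_id
    ]


if __name__ == "__main__":
    print(students_with_subjects(students, classes))
    # [('Ann', 'Math'), ('Bob', 'History'), ('Cid', 'Math')]
```

Fetching all referenced classes in one batched request, rather than one request per student, keeps this at exactly two round trips regardless of how many students there are.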
This is a difference in schema design between SQL and NoSQL. In SQL the schema is fixed and changing it is sometimes painful, but in return the fixed schema lets you run complex queries. In NoSQL there is no enforced schema: the schema lives in your head (and perhaps your documentation) and you have to follow it yourself, but in return you get good speed at the database level.
UPDATE: In the end we merged the two collections into one. There were still some problems with querying subdocuments from the parent document, but they were easy to solve and did not change much for us. I would recommend looking into this rather than splitting into two separate collections. It also depends heavily on your workflow with the database: will you be doing more reads or more writes? With a NoSQL schema you need to consider this as well. If reads dominate, a single collection is the way to go.

Can I use straight SQL in Django models?

And if I can, does that mean I lose the advantage of treating the results as objects? I find complex queries confusing in many ORMs, not just Django's, but that is probably because I have never really used an ORM. Does anyone still write straight SQL?
Edit: Am I defeating the purpose of having a framework if I bypass the ORM completely? They all have a "nifty" ORM, but queries with lots of subqueries and derived tables don't look pretty.
Using Django's QuerySet API you have different possibilities:
You can use extra(), which returns a queryset that evaluates to model instances. As the name suggests, it is somewhat limited, because returning model instances requires querying the model's table, but you can add extra SQL, e.g. WHERE conditions or ORDER BY terms. Querysets that use extra() can still use ORM features, such as chaining multiple filter() calls.
raw() returns a RawQuerySet, which can also be iterated over to get model instances, but you lose many features the ORM normally provides (for example, you cannot chain filter() on it).
And you can execute SQL directly through the low-level database connection cursor, which gives you plain rows rather than model instances.
Study the documentation on raw queries. It also explains, for example, how to map a model's fields onto the columns returned by a raw query, and documents a few gotchas when passing parameters into the query.
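Django's raw() and connection.cursor() need a configured project, so as a standalone sketch this uses Python's built-in `sqlite3`, whose DB-API cursor has the same `execute(sql, params)` shape that Django's cursor wraps. (One difference worth knowing: Django's cursor expects `%s` placeholders on every backend, while `sqlite3` itself uses `?`.) The sketch shows two of the points above: passing parameters separately instead of formatting them into the SQL string, and the extra step of turning plain rows into objects, which raw() would otherwise do by matching column names to model fields. The `Student` class and `student` table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Student:
    id: int
    name: str


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO student (name) VALUES (?)", [("Ann",), ("O'Brien",)])
    return conn


def students_named(conn: sqlite3.Connection, name: str) -> List[Student]:
    cur = conn.cursor()
    # Parameters go in a separate argument; the driver handles quoting, so a
    # value like "O'Brien" cannot break (or inject into) the SQL string.
    cur.execute("SELECT id, name FROM student WHERE name = ?", (name,))
    # A raw cursor returns plain tuples; mapping them to objects is manual work.
    return [Student(id=row[0], name=row[1]) for row in cur.fetchall()]


if __name__ == "__main__":
    conn = make_db()
    print(students_named(conn, "O'Brien"))  # [Student(id=2, name="O'Brien")]
```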
To answer your edited question as well: I wouldn't use raw SQL when the ORM can do the job, but the ORM is limited, and for more complex queries you will have to switch to SQL (sometimes extra() is enough, so you keep the advantages of the ORM). Don't forget that ORM queries work with every database backend, while custom SQL might not.
You can use raw SQL and still get model objects back, or, if you prefer, bypass the ORM completely.

Problems in reusing a single POJO class for different databases

I am using POJO (Plain Old Java Object) classes to map the relational database, and Apache Solr to index the database.
I don't know whether I can reuse the POJO classes for Apache Solr.
Since the mapping classes are very specific and designed around foreign-key relationships, it is very difficult to use them with Solr (a single-schema search server), but creating new POJO classes for Apache Solr is also difficult.
So I want to know which design approach is better for reuse.
I would also like to know the pitfalls of reusing the same POJO class.
Solr is very different from a relational database: basically, it is something like one big table (with differences such as multivalued columns).
Your real problem lies one step before the concrete implementation (the POJOs).
First, you have to denormalize your table(s). That is the really hard part of working with Solr: going from your ER model to a Solr schema. Once that is done, you can use SolrJ to map entities to POJOs, but that is only the last part of the story.
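A sketch of the denormalization step, assuming a hypothetical author/book schema: the normalized rows are flattened into one document per author, and the one-to-many side becomes multivalued fields. Plain Python dicts stand in both for the relational rows and for the documents you would then send to Solr (in Java, for example, as SolrJ `@Field`-annotated beans); the field names are assumptions and in practice would follow your search requirements.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Normalized rows as they might come from two relational tables (hypothetical).
authors = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
books = [
    {"id": 10, "author_id": 1, "title": "Search Basics", "year": 2010},
    {"id": 11, "author_id": 1, "title": "Indexing", "year": 2012},
    {"id": 12, "author_id": 2, "title": "Schemas", "year": 2011},
]


def denormalize(author_rows, book_rows) -> List[Dict[str, Any]]:
    """Flatten author + books into one flat search document per author."""
    books_by_author = defaultdict(list)
    for b in book_rows:
        books_by_author[b["author_id"]].append(b)
    docs = []
    for a in author_rows:
        own = books_by_author.get(a["id"], [])
        docs.append({
            "id": f"author-{a['id']}",
            "author_name": a["name"],
            # The foreign key disappears: related values are copied into
            # multivalued fields on the same document.
            "book_titles": [b["title"] for b in own],
            "book_years": [b["year"] for b in own],
        })
    return docs


if __name__ == "__main__":
    for doc in denormalize(authors, books):
        print(doc)
```

Whether the document is per author or per book is exactly the kind of decision that depends on which queries the search needs to answer.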
Still on denormalization: it doesn't make sense to do a literal translation of a set of POJOs mapped onto a relational database. Relational databases are general-purpose, and their design approach is data-centric: you first decide how to store your data, and afterwards SQL clients can get whatever they need.
Solr works differently: to determine your schema more or less exactly, you need to know your search requirements (i.e. the queries). The schema is not general-purpose (like a database's) but is tailored to the search requirements. That is why you decide whether to index an attribute or not, what kind of analysis a particular field needs, whether it is multivalued or single-valued, whether to apply stemming, and so on.
So basically, it's all about denormalization and query requirements.

What is the recommendation on using NHibernate CreateSQLQuery?

My gut tells me that advanced NHibernate users would be against it, but although I have looked for an actual analysis of this, I have found nothing. I'd like the answer to address these questions:
What are the pros and cons of using it?
Are there any performance implications, good or bad (e.g. using it to call stored procedures)?
In which scenarios should we use or avoid it?
Who should use or avoid it?
Basically, what are the reasons to use or avoid it, and why?
CreateSQLQuery exists for a reason, which is executing queries that, using any of the other methods, are either:
Not supported
Hard to write
Of course it's usually the last choice, because:
It's not object oriented (i.e. you're back to thinking of tables and columns instead of entities, properties and relationships)
It ties you to the physical model
It ties you to a specific RDBMS
It usually forces you to do more work in order to retrieve entities
It doesn't automatically support features like paging
But if you think it's needed for a particular query, go ahead. Just make sure you learn all the other methods first (HQL, LINQ, QueryOver, Criteria and Get) to avoid doing unnecessary work.
One of the main reasons to avoid SQL in favor of HQL is to keep the code base independent of the RDBMS (e.g. MySQL, Oracle). Another is that SQL makes your code depend on table and column names rather than on entity names and properties.
If you are comparing raw SQL with the NHibernate LINQ provider, there are other compelling reasons to prefer LINQ queries (when they work), such as type safety and being able to use Visual Studio's reference search to find which queries reference a given table or column through its mapped entity or property.
My opinion is that CreateSQLQuery() is a last-resort option. It exists because there are things you cannot do with the other NHibernate APIs, but it should be avoided, since it more or less goes against the whole idea of using NHibernate in the first place.

What should one map strings to in a database ORM?

Strings are unbounded, but it seems every mainstream relational database requires a column to declare its maximum length. This seems like a rather significant discrepancy, and I'm curious how typical ORMs handle it.
Using a 'text' column type would in theory give much more string-like storage, but as I understand it, text columns are not queryable, or at least not efficiently (not indexable?).
I'm thinking of using something like NHibernate, but my ORM needs are relatively simple, so if I can just write it myself, it might save some bloat.
SQL Server, for instance, stores only the actual size of the data in variable-length columns (varchar/nvarchar). For me, defining a large enough size for strings is sufficient, so that users never notice the limits.
Examples:
Name of a product, person, etc.: 500
Paths, URLs, etc.: 1000
Comments, free text: 2000 or even more
NHibernate does not do anything with the size at runtime. You need some kind of validator, or you let the database either truncate the value or throw an exception.
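Because the length check has to happen somewhere before the database truncates or rejects the value, here is a minimal application-side validator using the example limits listed above. The field names and the `MAX_LENGTHS` table are hypothetical; in practice they would mirror your actual column definitions.

```python
from typing import Dict, List

# Hypothetical limits mirroring the example column sizes (product names, URLs, comments).
MAX_LENGTHS: Dict[str, int] = {
    "product_name": 500,
    "url": 1000,
    "comment": 2000,
}


def validate_lengths(values: Dict[str, str], limits: Dict[str, int] = MAX_LENGTHS) -> List[str]:
    """Return one error message per field whose value exceeds its column size.

    len() counts Unicode code points; some databases or column definitions count
    bytes instead, in which case the check should compare encoded lengths.
    Fields without a configured limit are not checked.
    """
    errors = []
    for field, value in values.items():
        limit = limits.get(field)
        if limit is not None and len(value) > limit:
            errors.append(f"{field}: {len(value)} characters exceeds limit of {limit}")
    return errors


if __name__ == "__main__":
    print(validate_lengths({"product_name": "Widget", "url": "u" * 1001}))
    # ['url: 1001 characters exceeds limit of 1000']
```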
Quote: "my ORM needs are relatively simple". It's hard to say whether NHibernate is overkill or not. Data access is generally not that simple.
As a simple guide off the top of my head, choose NHibernate if:
You have a fine-grained or complex domain model, or you need to map inheritance.
You want your domain model somewhat independent of the database model.
You need lazy loading features.
You want to be database-independent, e.g. run on SQL Server or Oracle.
If a class per table is all you need, you don't actually need an ORM.
As far as I know, they don't handle it. Either the ORM defines the schema, in which case:
The ORM decides the size, either a default or one defined in your configuration.
Or the schema is not defined by the ORM, in which case the ORM just has to obey its rules: if you insert strings that are too long, you'll get errors from the database.
I would stick to varchar-style types, e.g. VARCHAR2 for Oracle or NVARCHAR for SQL Server, unless you're dealing with CLOBs.