Problems in reusing a single POJO class for different databases

I am using POJO (Plain Old Java Object) classes to map a relational database, and Apache Solr to index that database.
I don't know whether I can reuse the same POJO classes for Apache Solr or not.
Since the mapping classes are very specific and were designed with foreign-key relationships in mind, it is difficult to use them with Solr (a single-schema search server); but creating new POJO classes just for Solr is also laborious.
So I want to know which design approach is better for reuse.
I would also like to know the pitfalls of reusing the same POJO classes.

Solr is very different from a relational database: basically it is something like one big table, with several differences such as multivalued columns.
Now, I see your problem as sitting one step before the concrete implementation (the POJOs).
First, you have to denormalize your tables; that is the really hard part of working with Solr: the passage from your ER model to the Solr schema. Once that is done, you can use SolrJ to map each entity to a POJO, but that is only the last part of the story.
Still about denormalization: it doesn't make sense to do a raw translation of a set of POJOs mapped on top of a relational database. Relational databases are general-purpose; their design approach is data-centric. First you decide how to store your data, and after that SQL clients can get whatever they need.
Solr works in a different way: in order to determine your schema more or less exactly, you need to know your search requirements (i.e., the queries). The schema is not general-purpose like a database's; it is tailored to those search requirements. That is why you might index one attribute and not another, and why you decide per field what kind of analysis it needs: multivalued or single-valued, stemming, and so on.
So basically, it's all about denormalization and query requirements.
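To make that last step concrete, here is a minimal SolrJ sketch. The core name, URL, and field names are assumptions for illustration; the point is that the POJO mirrors the denormalized Solr schema, not the relational tables:

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class OrderDocument {
    @Field("id")            public String id;
    @Field("customer_name") public String customerName; // copied (denormalized) from the customer table
    @Field("tags")          public List<String> tags;   // multivalued field: no flat SQL equivalent

    public static void main(String[] args) throws Exception {
        OrderDocument doc = new OrderDocument();
        doc.id = "order-42";
        doc.customerName = "Jane Doe";
        doc.tags = Arrays.asList("priority", "export");

        // Hypothetical local core named "orders".
        try (SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/orders").build()) {
            solr.addBean(doc); // SolrJ maps the @Field-annotated members to the schema
            solr.commit();
        }
    }
}

Notice that customerName would be a join in the relational model; in the Solr document it is simply flattened in.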

Related

Benefits to Abstracting SQL Tables

I'm going to use Drupal as my example, but it extends to other situations as well.
I've seen database schemas that are abstracted away from what a DBA would implement, most notably in Drupal. For example, when you create a Content Type in Drupal (the equivalent of a table), each field is abstracted away into its own table, named field_{machineName}, which then relates back to the original "parent" table (node_type in Drupal).
When I'm dealing with MVC frameworks like Rails, Django, or Laravel, we don't abstract away the tables; fields are stored as columns on the table itself, not related back.
What benefits do you get from implementing an abstracted table rather than a concrete table? Are there situations where this should be used, or is it generally a bad idea? It seems like a bad design choice to me, but I'm a fairly isolated programmer.
A feeble attempt to illustrate my question, using a "Book" example.
EDIT
I see that my diagram isn't exactly accurate. I will post a new one reflecting that node_id should relate to a node table, which in turn stores a reference to node_type.
My 2 cents:
Pros of abstraction:
You can handle any entity type in exactly the same way.
You can define a generic UI and plugin system based on the node type.
You can define generic behaviours (like ACLs based on a node's title field) applicable to any model you build.
Cons of abstraction:
You cannot see the "final" model directly (however, you can rebuild an image of it).
Performance and querying complexity (which can be mitigated with "flat" index tables).
So I would say:
For an "open" data model, able to suit any need of data representation, abstraction has many advantages, at the cost of readability and performance. That is the typical case of many "multipurpose meta-builders" like Drupal.
If you know what you are modelling and are defining an "application" rather than an "application factory", you are better off using a specific data model scoped to the application.
Another "meta" database construction pattern i like to use is :
Defining entity specific tables with associated "generic" table. (typed base table & open "key/value" property table associated with each entry of the base entity table). So it gives the ability to add "extra info" to existing base entity without having to modify the core model at each iteration. Letting the choice to find out what "properties" to migrate in the base table over time.
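Here is a minimal JDBC sketch of that pattern, assuming a hypothetical product entity and an in-memory H2 database (the table and column names are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class KeyValueExtensionDemo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:demo")) {
            try (Statement st = con.createStatement()) {
                // Typed base table: the columns we are sure about today.
                st.execute("CREATE TABLE product (id BIGINT PRIMARY KEY, name VARCHAR(255))");
                // Open key/value table: extra info added without altering the core model.
                st.execute("CREATE TABLE product_property ("
                        + " product_id BIGINT REFERENCES product(id),"
                        + " prop_key   VARCHAR(64),"
                        + " prop_value VARCHAR(255),"
                        + " PRIMARY KEY (product_id, prop_key))");
                st.execute("INSERT INTO product VALUES (1, 'Widget')");
            }
            // A property nobody anticipated when the base table was designed.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO product_property VALUES (?, ?, ?)")) {
                ps.setLong(1, 1L);
                ps.setString(2, "color");
                ps.setString(3, "red");
                ps.executeUpdate();
            }
        }
    }
}

If "color" turns out to be queried constantly, that is the signal to promote it into a real column on product.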
Another variant of this is the EAV model, used for example in Magento.
IMHO, here are the two main reasons why the Drupal schema is built this way:
Fields are dynamic: they can be added to and removed from an entity bundle at any time from the web UI. Using separate tables eases these mutations of the schema.
Field values can be translated: in Drupal 7, translation is done at the field level. The title field could be translatable while the content field is not.
Note that most of the time, when using the Drupal APIs, you don't have to deal with these tables directly.

What design patterns for marshalling JSON APIs to/from SQL

I'm working on my first JSON-RPC/JSON-REST API. One of the conveniences of JSON is that it can easily represent structured data: a user may have multiple email addresses, multiple street addresses, and so on.
For example, the Facebook Graph API nicely represents the kind of thing that's handy to return as JSON objects:
https://fbcdn-dragon-a.akamaihd.net/hphotos-ak-ash3/851559_339008529558010_1864655268_n.png
However, in implementing an API such as this with a relational database, we end up shattering structured objects into very many tables (at least one for each list in the JSON object) and un-shattering them when responding to requests. So:
it requires a lot of modelling (separate models for the JSON objects and the SQL tables);
inconsistencies creep in between the models, e.g. user_id (in SQL) vs. userID (in JSON);
marshalling data between one model and the other is very time-consuming (tedious, error-prone, and pointless boilerplate).
What design-patterns exist to help in this situation?
I'm not sure you are looking for design patterns. I would look for tools that handle this better.
I assume that you want to be able to query these objects, and not just store them in TEXT fields. Many databases support XML fairly well, so I would convert the JSON to XML (with a library) and then store that in the database.
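For instance, Jackson's jackson-dataformat-xml module can do the JSON-to-XML conversion in a few lines; this is only a sketch, and the "user" root name below is an arbitrary choice:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

public class JsonToXml {
    public static void main(String[] args) throws Exception {
        String json = "{\"name\":\"Jane\",\"emails\":[\"a@x.com\",\"b@x.com\"]}";

        // Parse the JSON into a tree, then re-serialize that tree as XML.
        JsonNode tree = new ObjectMapper().readTree(json);
        String xml = new XmlMapper().writer()
                .withRootName("user")      // arbitrary root element name
                .writeValueAsString(tree);

        // Prints something like:
        // <user><name>Jane</name><emails>a@x.com</emails><emails>b@x.com</emails></user>
        System.out.println(xml);
    }
}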
You may also want to consider a JSON document based database. That will definitely get you where you want to go.
If you don't need to be able to query these, or only need to query a very small subset of fields, just store the objects as text and extract the queryable fields into actual columns. This way you don't need to touch the majority of the data, but you can still query the few fields you care about. (Plus you can index them for speedier lookup.)
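A sketch of that extraction step, assuming a hypothetical users table with an indexed email column alongside an opaque payload column:

import java.sql.Connection;
import java.sql.PreparedStatement;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class UserStore {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Assumes: CREATE TABLE users (email VARCHAR(255), payload TEXT);
    //          CREATE INDEX idx_users_email ON users(email);
    public static void save(Connection con, String userJson) throws Exception {
        // Lift out the single field we want to query on...
        JsonNode root = MAPPER.readTree(userJson);
        String email = root.path("email").asText();

        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO users (email, payload) VALUES (?, ?)")) {
            ps.setString(1, email);    // queryable, indexed column
            ps.setString(2, userJson); // ...and keep the rest as opaque text
            ps.executeUpdate();
        }
    }
}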
I have always chosen to implement this functionality in a facade pattern. Since the point of the facade is to simplify (abstract) an underlying complexity as a boundary between two or more systems, it seemed like the perfect place to handle this.
I realize however that this does not quite answer the question. I am talking about the container for the marshalling while the question is about how to better manage the contents (the code that does the job).
My approach here is somewhat old-fashioned, but since this is an old question maybe that's okay. I employ, as much as possible, stored procedures in the DB. This promotes better encapsulation than one typically finds with a code layer outside of the DB that has to "know about" the DB structure. What inevitably happens in the latter case is that more than one system will be written to do this (one large company I worked at had at least six competing ESBs), and there will be conflicts. Also, stored-procedure scripting usually benefits from some sort of IDE that helps maintain contextual awareness of the DB structure.
So this approach, even though it is not a pattern per se, makes managing the ORM a lot easier.
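From the Java side, such a procedure is reached through JDBC's CallableStatement. The procedure name and parameters below are hypothetical; the idea is that the caller never sees the table layout:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

public class UserDao {
    // Assumes a stored procedure get_user_json(IN id BIGINT, OUT doc VARCHAR)
    // that assembles the marshalled JSON document inside the database.
    public static String fetchUserJson(Connection con, long userId) throws SQLException {
        try (CallableStatement cs = con.prepareCall("{call get_user_json(?, ?)}")) {
            cs.setLong(1, userId);
            cs.registerOutParameter(2, Types.VARCHAR);
            cs.execute();
            return cs.getString(2); // the DB did the un-shattering for us
        }
    }
}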

Storing dynamic fields with Doctrine2

In our app we are looking to use Doctrine2; however, there is one feature we want to offer and I am completely confused as to how it would work.
We want our customers to be able to define custom fields on our standard objects. These fields would be created on the fly, and would not be part of the object definition that is known and mapped by Doctrine.
Our first thought was to use NoSQL (MongoDB or Amazon DynamoDB) to store some of this data, but since we want to use Doctrine to handle our core objects, we would like to stay within the realm of Doctrine to achieve this, without having to extend beyond it to store this data.
One thing on my mind was using Doctrine's ability to serialize/unserialize complex objects and just keep a hash of custom field names and their values as an extra property on the object; however, this would not allow us to offer a feature that searches these fields, if we ever wanted to do that.
Has anyone ever attempted to do this with Doctrine2 or any ORM variant?
You could consider using Doctrine ODM, which is Doctrine 2 but for NoSQL; I believe they support at least MongoDB.
Another approach would be to use serialization, as you said. You probably shouldn't worry about search too much: I would recommend using a separate full-text search engine (Solr, ElasticSearch, or another), as they provide much more versatility and performance than SQL full-text search.
Third, you could use Doctrine alongside NoSQL. In this case, you should probably abstract your querying into a service class or similar, so that you can use Doctrine to query the data in your SQL DB and something else to query the remaining data.
Finally, you could consider using a key/value table: one column represents the key, another the value.

Hibernate vs SQL

I'm currently implementing new functionality in a tool in an e-learning platform. I need to retrieve some columns from three different tables in the database. The tool is implemented with Hibernate, where a class is mapped to a database table.
However, I need to use information from different tables to build a single class.
Can Hibernate provide this sort of implementation?
If not, would it be appropriate for me to use plain SQL in this situation?
Is it good practice to have two database technologies in one place?
Hibernate is made to do this: multiple tables, either through relationships such as many-to-one, or multiple tables which represent different subtypes. You can also use entity-name to map a single class to two different tables for different situations. So the answer is yes.
As for mixing Hibernate and hand-coded SQL in the same application, I think it's a very common practice; sometimes it's 200% easier than figuring out the Hibernate mapping for a small detail. I'm referring to something like JDBC, but as nowaq points out, you can do this in Hibernate as well.
Yes, Hibernate can do this. Take a look at this post: Mapping One Java Class to Two Database Tables with Hibernate.
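For reference, in JPA/Hibernate annotations this is the @SecondaryTable mapping; the entity, table, and column names below are made up for illustration:

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.PrimaryKeyJoinColumn;
import javax.persistence.SecondaryTable;
import javax.persistence.Table;

@Entity
@Table(name = "student")
@SecondaryTable(name = "student_profile",
        pkJoinColumns = @PrimaryKeyJoinColumn(name = "student_id"))
public class Student {
    @Id
    private Long id;

    private String name; // stored in the primary "student" table

    @Column(table = "student_profile")
    private String bio;  // stored in the secondary "student_profile" table
}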
Maybe it's not your case, but in general, having two tables mapped to a single class may be a signal that something is wrong with your design. Make sure to take a second look.
Try not to mix too many frameworks and technologies in your app, or you may end up with a bunch of dependencies and very complex, unclean code.

Complex taxonomy ORM mapping - looking for suggestions

In my project (ASP.NET MVC + NHibernate) I have all my entities, let's say Documents, described by a set of custom metadata. The metadata is contained in a structure that can have multiple tags, categories, etc. These terms matter most to users seeking the document they want, so they have an impact on the views as well as on the underlying data structures, database querying, and so on.
From the view side of the application, what interests me most are the string values of the terms. Ideally I would like to operate directly on collections of strings, like this:
class MetadataAsSeenInViews
{
    public IList<string> Categories;
    public IList<string> Tags;
    // etc.
}
From the model perspective, I could use the same structure, do the simplest possible ORM mapping, and use it in queries like "fetch all documents with metadata exactly like this".
But that kind of structure could turn out useless if the application needs to perform complex database queries like "fetch all documents for which at least one category is IN (cat1, cat2, ..., catN) OR at least one tag is IN (tag1, ..., tagN)". In that case, for performance reasons, we would probably use numeric keys for categories and tags.
So one can imagine a structure opposite to MetadataAsSeenInViews that operates on numeric keys and provides complex mappings from integers to strings and the other way round. But that solution doesn't really satisfy me, for several reasons:
it smells like a single-responsibility violation, as we're dealing with database-specific issues when we just want to describe the Document business object;
database keys are leaking through all layers;
it adds unnecessary complexity in the views;
and I believe it doesn't take advantage of what a good ORM can do.
Ideally I would like to have:
a single, as-simple-as-possible metadata structure (ideally like the one at the top) throughout the whole application;
complex querying issues addressed only in the database layer (meaning DB + ORM + as little additional data-layer code as possible).
Do you have any ideas on how to structure the code and the ORM mappings to be as elegant, effective, and performant as possible?
I have found that it is problematic to use domain entities directly in the views. To help decouple things I apply two different techniques.
Most importantly I'm using separate ViewModel classes to pass data to views. When the data corresponds nicely with a domain model entity, AutoMapper can ease the pain of copying data between them, but otherwise a bit of manual wiring is needed. Seems like a lot of work in the beginning but really helps out once the project starts growing, and is especially important if you haven't just designed the database from scratch. I'm also using an intermediate service layer to obtain ViewModels in order to keep the controllers lean and to be able to reuse the logic.
The second option is mostly for performance reasons, but I usually end up creating custom repositories for fetching data that spans entities. That is, I create a custom class to hold the data I'm interested in, and then write custom LINQ (or whatever) to project the result into that. This can often dramatically increase performance over just fetching entities and applying the projection after the data has been retrieved.
Let me know if I haven't elaborated enough.
The solution I've finally implemented doesn't fully satisfy me, but it will do for now.
I've divided my Tags/Categories into "real entities", mapped in NHibernate as separate entities, and "references", mapped as components belonging to the entities they describe.
So in my C# code I have two separate classes, TagEntity and TagReference, which both carry the same information from the domain perspective. TagEntity knows its database id and is managed by NHibernate sessions, whereas TagReference carries only the tag name as a string, so it is quite handy to use throughout the application and, if needed, is still easily convertible to TagEntity using a static lookup dictionary.
That entity/reference separation allows me to query the database in a more efficient way, joining only two tables, like select from articles join articles_tags ... where articles_tags.tag_id = X, without joining the tags table, which would also be joined when doing simple, fully object-oriented NHibernate queries.