What should one map strings to in a database ORM?

Strings are unbounded, but it seems every normal relational database requires that a column declare its maximum length. This seems to be a rather significant discrepancy and I'm curious how typical ORMs handle this.
Using a 'text' column type would theoretically give you much more string-like storage, but as I understand it text columns are not queryable, or at least not efficiently (non-indexed?).
I'm thinking of using something like NHibernate perhaps, but my ORM needs are relatively simple, so if I can just write it myself it might save some bloat.

SQL Server, for instance, stores only the actual size of the data. For me, defining a large enough size for strings is sufficient that the user never notices the limits.
Examples:
name of a product, person etc: 500
Paths, Urls etc: 1000
Comments, free text: 2000 or even more
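For illustration, the sizes above as SQL Server DDL might look like this (table and column names are hypothetical):

CREATE TABLE product (
    id      INT IDENTITY PRIMARY KEY,
    name    NVARCHAR(500)  NOT NULL,  -- names of products, people, etc.
    url     NVARCHAR(1000) NULL,      -- paths, URLs
    comment NVARCHAR(2000) NULL       -- comments, free text
);

Since NVARCHAR stores only the actual length of each value, the generous limits cost nothing for short strings.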
NHibernate does not do anything with the size at runtime. You need to use some kind of validator, or let the database either truncate the value or throw an exception.
Quote: "my ORM needs are relatively simple". It's hard to say if NHibernate is overkill or not. Data access isn't generally that simple.
As a simple guide off the top of my head, take NHibernate if:
You have a fine-grained or complex domain model.
You need to map inheritance.
You want your domain model somewhat independent from the database model.
You need some lazy loading features
You want to be database independent, e.g. run it on SQL Server or Oracle.
If you think that a class per table is what you need, you don't actually need an ORM.

As far as I know, they don't handle it. Either you let the ORM define the schema, in which case:
the ORM decides the size, either from a default or from your mapping configuration;
or the schema is not defined by the ORM, in which case it just has to obey the rules: if you insert strings that are too large, you'll get errors from the DB.
I would stick with varchar-like types, e.g. VARCHAR2 for Oracle or NVARCHAR for SQL Server, unless you're really dealing with CLOBs.


SQL database design with some user defined fields

I'm developing a database schema to handle collection of data and later, reporting on this data.
After a requirements discussion, it seems that either an entity-attribute-value (EAV) solution, or a flat table solution would be alright - since the data is somewhat sparse but not highly sparse.
However, user defined fields will become a must in the future, but I understand that querying and optimizing an RDBMS with EAV tables can become complex.
I've taken a look at the discussion here, and I was thinking an option similar to option 1 would be possible. For example, have a number of set fields, then a number of spare fields that users can define the labels of.
In terms of reporting, is there any downside in using this approach rather than using EAV?
You will regret EAV, especially when it comes to reporting
Make sure you're aware of existing data model patterns before you try anything: Ready to use database model patterns
Familiarize yourself with Table Inheritance: How can you represent inheritance in a database?
Consider allowing users to modify their own schemas: https://martinfowler.com/bliki/UserDefinedField.html
EAV is almost always a really bad idea. If you still need custom fields after trying the above, use a blob type (like JSON or XML) with indexing: http://backchannel.org/blog/friendfeed-schemaless-mysql . Postgres's binary jsonb type is fast and supports indexing and querying.
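A minimal Postgres sketch of that last suggestion (table and column names are hypothetical): fixed columns for the known fields, a jsonb column for the user-defined ones, and a GIN index so the custom fields stay queryable:

CREATE TABLE record (
    id         bigserial PRIMARY KEY,
    name       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    custom     jsonb NOT NULL DEFAULT '{}'   -- user-defined fields live here
);
CREATE INDEX record_custom_idx ON record USING GIN (custom);

-- Containment queries on the custom fields can use the index:
SELECT id, name FROM record WHERE custom @> '{"region": "EMEA"}';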

What is the best way to represent a form with hundreds of questions in a model

I am trying to design income tax return software.
What is the best way to represent/store a form with hundreds of questions in a model?
Just for this example, I need at least 6 models (T4, T4A(OAS), T4A(P), T1032, UCCB, T4E) which possibly contain hundreds of fields.
Is it by creating hundreds of fields? Storing values in a map? An array?
One very generic approach could be XML
XML allows you to
nest your data to any degree
combine values and meta information (attributes and elements)
describe your data in detail with XSD
store it externally
maintain it easily
even combine it with additional information (look at processing instructions)
and (last but not least) store the real data in almost the same format as the model...
and (laster but even not leaster :-) ) there is XSLT to transform your XML data into any other format (such as HTML for nice presentation)
There is strong support for XML in all major languages and database systems.
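If the XML ends up in a relational database anyway, most engines can store and query it natively. A hedged SQL Server sketch (table and element names are hypothetical):

CREATE TABLE tax_form (
    id       INT PRIMARY KEY,
    taxpayer INT NOT NULL,
    form     XML NOT NULL   -- the whole form, nested to any depth
);

-- Individual answers can still be pulled out with XQuery:
SELECT form.value('(/T4/Box14)[1]', 'decimal(10,2)') AS employment_income
FROM tax_form
WHERE taxpayer = 42;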
Another way could be a typical parts list (or bill of materials/BOM)
This tree structure is typically implemented as a table with a self-referencing parent ID. Working with such a table requires a lot of recursion...
It is highly recommended to store your data in a type-safe way: either use a character storage format plus a type identifier (which means you have to cast all your values here and there), or use separate type-safe side tables linked by reference.
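A rough sketch of such a table (names are hypothetical), with a parent reference and a type identifier, plus a recursive CTE (Postgres/SQLite-style WITH RECURSIVE; SQL Server spells it plain WITH) to read a whole subtree:

CREATE TABLE form_node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES form_node(id),  -- NULL for the root of a form
    name      VARCHAR(200) NOT NULL,
    data_type VARCHAR(20)  NOT NULL,             -- e.g. 'int', 'decimal', 'text', 'date'
    value     VARCHAR(4000)                      -- cast according to data_type
);

WITH RECURSIVE subtree AS (
    SELECT * FROM form_node WHERE id = 1         -- some root node
    UNION ALL
    SELECT child.*
    FROM form_node child
    JOIN subtree parent ON child.parent_id = parent.id
)
SELECT * FROM subtree;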
Furthermore, if your data is to be filled from lists, you should define a datasource to load a selection list dynamically.
Conclusion
What is best for you mainly depends on your needs: How often will the model change? How many rules are there to guarantee data integrity? Are you using an RDBMS? Which language/tools are you using?
With a case like this, the monolithic aggregate is probably unavoidable (unless you can deduce common fields). I'm going to exclude RDBMSes, since the topic seems to focus more on lower-level data structures and a more proprietary-style solution, though an RDBMS could be a very valid option for managing all these fields.
In this case, I think it ceases to become so much about formalities as just daily practicalities.
Probably worst from that standpoint in this case is a formal object aggregating fields, like a class or struct with a boatload of data members. Those tend to be the most awkward and the most unattractive as monoliths, since they tend to have a static nature about them. Depending on the language, declaration/definition/initialization could be separate, which means 2-3 lines of code to maintain per field. If you want to read/write these fields from a file, you have to write a separate line of code for each and every field, and maintain and update all that code as fields are added or removed. If you start approaching anything resembling polymorphic needs in this case, you might have to write a boatload of branching code for each and every field, and that too has to be maintained.
So I'd say hundreds of fields in a static kind of aggregate is, by far, the most unmaintainable.
Arrays and maps are effectively the same thing to me here in a very language-agnostic sense provided that you need those key/value pairs, with only potential differences in where you store the keys and what kind of algorithmic complexity is involved. Whatever you do, probably a key search in this monolith should be logarithmic time or better. 'Maps/associative arrays' in most languages tend to inherently have this quality.
Those can be far more suitable, and you can achieve the kind of runtime flexibility that you like on top of those (like being able to manage these from a file and add the fields on the fly with no pre-existing knowledge). They'll be far more forgiving here.
So if the choice is between a bunch of fields in a class and something resembling a map, I'd suggest going for a map. The dynamic nature of it will be far more forgiving for these kinds of cases and will typically far outweigh the compile-time benefits of, say, checking to make sure a field actually exists and producing a syntax error otherwise. That kind of checking is easy to add back in and more if we just accept that it will occur at runtime.
An exception that might make the field solution more appealing is if you involve reflection and more dynamic techniques to generate an object with the appropriate fields on the fly. Then you get back those dynamic benefits and flexibility at runtime. But that might be more unwieldy to initialize the structure, could involve leaning a lot more heavily on heavy-duty (and possibly very computationally-expensive) introspection and type manipulation and code generation mechanisms, and also end up with more funky code that's hard to maintain.
So I think the safest bet is the map or associative array, and a language that lets you easily add new fields, inspect existing ones, etc. with very fast turnaround. If the language doesn't inherently have that quality, you could look to an external file to dynamically add fields, and just maintain the file.

What design patterns for marshalling JSON APIs to/from SQL

I'm working on my first JSON-RPC/JSON-REST API. One of the conveniences of JSON is that it can easily represent structured data (a user may have multiple email addresses, multiple postal addresses, etc.).
For example, the Facebook Graph API nicely represents the kind of thing that's handy to return as JSON objects:
https://fbcdn-dragon-a.akamaihd.net/hphotos-ak-ash3/851559_339008529558010_1864655268_n.png
However, in implementing an API such as this with a relational database, we end up shattering structured objects into very many tables (at least one for each list in the JSON object), and un-shattering them when responding to requests. So:
requires a lot of modelling (separate models for JSON object and SQL tables).
inconsistencies creep in between the models: e.g. user_id (in SQL) vs. userID (in JSON)
marshalling stuff between one model and the other is very time-consuming (tedious, error-prone, and pointless boilerplate).
What design-patterns exist to help in this situation?
I'm not sure you are looking for design patterns. I would look for tools that handle this better.
I assume that you want to be able to query these objects, and not just store them in TEXT fields. Many databases support XML fairly well, so I would convert the JSON to XML (with a library) and then store that in the database.
You may also want to consider a JSON document based database. That will definitely get you where you want to go.
If you don't need to be able to query these, or only need to query a very small subset of fields, just store the objects as text, and extract those query-able fields into actual columns. This way you don't need to touch the majority of the data, but you can still query the few fields you care about. (Plus you can index them for speedier lookup.)
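A hedged sketch of that hybrid (table and column names are hypothetical): the full JSON payload stays in one column, and only the handful of fields you actually query are pulled out and indexed:

CREATE TABLE user_profile (
    id      BIGINT PRIMARY KEY,
    email   VARCHAR(320) NOT NULL,  -- extracted from the JSON, queryable
    country VARCHAR(2),             -- extracted from the JSON, queryable
    payload TEXT NOT NULL           -- the original JSON document, stored verbatim
);
CREATE INDEX user_profile_email_idx   ON user_profile (email);
CREATE INDEX user_profile_country_idx ON user_profile (country);

Reads return the payload untouched; only writes have to keep the extracted columns in sync.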
I have always chosen to implement this functionality in a facade pattern. Since the point of the facade is to simplify (abstract) an underlying complexity as a boundary between two or more systems, it seemed like the perfect place to handle this.
I realize however that this does not quite answer the question. I am talking about the container for the marshalling while the question is about how to better manage the contents (the code that does the job).
My approach here is somewhat old fashioned, but since this is an old question maybe that's okay. I employ (as much as possible) stored procedures in the DB. This promotes better encapsulation than one typically finds with a code layer outside of the DB that has to "know about" the DB structure. What inevitably happens in the latter case is that more than one system will be written to do this (one large company I worked at had at least 6 competing ESBs) and there will be conflicts. Also, the stored procedure scripting will usually benefit from some sort of IDE that helps maintain contextual awareness of the DB structure.
So this approach - even though it is not a pattern per se - makes managing the ORM a lot easier.
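As one hedged illustration of that approach (SQL Server 2016+ syntax with OPENJSON; procedure, table, and column names are hypothetical), the procedure owns the mapping from JSON names to column names, so callers never see the table structure:

CREATE PROCEDURE dbo.SaveUserFromJson
    @payload NVARCHAR(MAX)
AS
BEGIN
    INSERT INTO dbo.users (user_id, full_name, email)
    SELECT j.user_id, j.full_name, j.email
    FROM OPENJSON(@payload)
         WITH (
             user_id   INT           '$.userID',   -- the JSON naming stays at the boundary
             full_name NVARCHAR(200) '$.name',
             email     NVARCHAR(320) '$.email'
         ) AS j;
END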

Storing dynamic fields with Doctrine2

In our app, we are looking to use Doctrine2; however, there is one feature we want to offer, and I am completely confused as to how it would work.
We want our customers to be able to define custom fields on our standard objects. These fields would be created on the fly, and would not be part of the object definition that is known and mapped by Doctrine.
Our first thought was to use NoSQL (MongoDB or Amazon DynamoDB) to store some of this data, but since we want to use Doctrine to handle our core objects, we would like to stay within the realm of Doctrine to achieve this without having to extend beyond it to store this data.
One thing on my mind was using Doctrine's ability to serialize/unserialize complex objects and just having a hash of custom field names and their values as an extra property on the object; however, this would not allow us to search these fields if we ever wanted to offer that...
Has anyone ever attempted to do this with Doctrine2 or any ORM variant?
You could consider using Doctrine ODM, which is Doctrine 2 but for NoSQL - I believe they support at least MongoDB.
Another approach would be to use serialization, as you said. You probably shouldn't worry about search too much - I would recommend using a separate full-text search engine (Solr, Elasticsearch, or similar), as they provide much more versatility and performance for search than SQL full-text search.
Third, you could use Doctrine alongside NoSQL. In this case, you should probably abstract your querying into a service class or similar, so that you can use Doctrine to query the data in your SQL DB and something else to query the remaining data.
Finally, you could consider using a key-value table. One column represents the key, another the value.
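A minimal sketch of that key-value table (names are hypothetical), kept next to the Doctrine-mapped entity so the custom values remain searchable with ordinary SQL:

CREATE TABLE custom_field_value (
    entity_id  INTEGER      NOT NULL,  -- the Doctrine-managed object this row belongs to
    field_name VARCHAR(100) NOT NULL,  -- defined by the customer at runtime
    value      VARCHAR(255),
    PRIMARY KEY (entity_id, field_name)
);
CREATE INDEX custom_field_value_name_idx ON custom_field_value (field_name, value);

-- Find every entity whose custom 'industry' field is 'retail':
SELECT entity_id FROM custom_field_value
WHERE field_name = 'industry' AND value = 'retail';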

When is sqlite's manifest typing useful?

sqlite uses something that the authors call "Manifest Typing", which basically means that sqlite is dynamically typed: you can store a varchar value in an "int" column if you want to.
This is an interesting design decision, but whenever I've used sqlite, I've used it like a standard RDBMS and treated the types as if they were static. Indeed, I've never even wished for dynamically typed columns when designing databases in other systems.
So, when is this feature useful? Has anybody found a good use for it in practice that could not have been done just as easily with statically typed columns?
It really just makes types easier to use. You don't need to worry about how big a field needs to be at the database level anymore, or how many digits your integers can have. More or less it is a 'why not?' thing.
On the other side, static typing in SQL Server allows the system to search and index better, in some cases much better, but for half of all applications I doubt that database performance improvements would matter, or their performance is 'poor' for other reasons (temp tables being created on every select, exponential selects, etc.).
I use SQLite all the time for my .NET projects as a client cache because it is just too easy to use. Now if they could only get it to handle GUIDs the same way as SQL Server, I would be a happy camper.
Dynamic typing is useful for storing things like configuration settings. Take the Windows Registry, for example. Each key is a lot like having an SQLite table of the form:
CREATE TABLE Settings (Name TEXT PRIMARY KEY, Value);
where Value can be NULL (REG_NONE) or an INTEGER (REG_DWORD/REG_QWORD), TEXT (REG_SZ), or BLOB (REG_BINARY).
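To make that concrete, here is a short hedged example (the setting names and values are made up): with SQLite's manifest typing, all of these go into the same Value column and keep their own types:

INSERT INTO Settings (Name, Value) VALUES ('InstallDate', 20240101);     -- INTEGER
INSERT INTO Settings (Name, Value) VALUES ('DisplayName', 'My App');     -- TEXT
INSERT INTO Settings (Name, Value) VALUES ('IconData',    x'89504E47');  -- BLOB
INSERT INTO Settings (Name, Value) VALUES ('ProxyUrl',    NULL);         -- NULL

SELECT Name, typeof(Value) FROM Settings;  -- reports integer, text, blob, null per row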
Also, I'll have to agree with the Jasons about the usefulness of not enforcing a maximum size for strings. Much of the time, those limits are purely arbitrary, and you can count on someday finding a 32-byte string that needs to be stored in your VARCHAR(30).