Related
In a nutshell here's the situation, we have a database that is used to build a hierarchy of "locations". (Example: Street Address > Building 1 > First Floor > Room).
Each of the locations are stored in a table. Each location can be of a different "type". The types are defined in another table. (We use the types of locations to restrict what locations can be added to a location).
Here's the quandary we are facing: We need to be able to store different types of information for different types of locations. For example, a location type of "building" may need to have the address stored where as a location type of "room" may need to have dimensions or paint color stored.
Obviously we could create a table for each location type we define to hold the properties required for the particular location type and then use application logic to query the appropriate table to pull in a particular location's additional information. Is there a more elegant or practical way to accomplish this relationally in the database without having to rely on application logic?
Thanks!
The relationally pure way to do this would be to implement your initial suggestion i.e. have a separate table such as BUILDING_LOCATIONS that holds the attributes that are only applicable to this type of location. A different [TYPE]_LOCATIONS table would be created for each type of location that has its own attributes. In this way you could use standard database constraint functionality to ensure the integrity of the data in these tables.
Another method would be to add a series of nullable columns to the LOCATIONS table such as BUILDING_ADDRESS and ROOM_DIMENSIONS. This is not relationally pure as it means null values can exist in this table. However, you can still use standard database functionality to ensure the integrity of the data. It can be a bit more convoluted if certain values are mandatory in certain situations. Also, if there are many types of location with many differing attributes the number of columns in the table can become unwieldy.
Another method is the Entity-Attribute-Value model. Generally, this is to be avoided if at all possible. It is not relationally pure, as your column values are now no longer defined over domains, and it is extremely difficult, if not impossible, to ensure the integrity of the data. Any real attempt to do so will require a lot of bespoke coding (which needs to be carefully implemented to cater for things like concurrency control that database constraints give you for free) as you cannot use standard database constraints. However, if you are just interested in storing values for information and not doing anything with them you could use this method.
The EAV method does have a danger that because it appears so easy to add attributes to an entity, it becomes the default way of doing so. It is then used to add attributes for which vital processing is dependent and, because you cannot ensure the integrity of the data using this method, you find the values being used are meaningless and the whole logical basis for the processing is destroyed.
I am new to ABAP if anyone can tell me a website which I can refer to learn ABAP in depth or to understand it better, I have confusions with this global structures, internal tables, and work areas, someone please explain the need of them clearly with difference in each.
thanks in advance.
Internal tables are a bit like lists in other languages, for instance like a List< T > in c#. They exist only in memory and only within the program they were defined in. I never encountered the term "global structure" as such, but structures are pretty much the same as in other languages. In abap they can be used to define the row structure of tables. Translate this to c# and you would end up with a class X with some properties (your row structure) and a List< X >, your internal table.
Work areas are essentially a single row of a defined structure. Work areas are for instance used to hold the contents of a single row when looping over an internal table. For instance:
data:
it_vbak type standard table of vbak,
wa_vbak type vbak.
select * from vbak into corresponding fields of table it_vbak.
loop at it_vbak into wa_vbak.
....
endloop.
this defines both an internal table it_vbak and a work area wa_vbak. Both are defined using the structure of the DDIC table VBAK, that is one of the SAP ERP tables and contains sales order header data. The example selects some data (in this case: all of it, not a good idea) into the internal table and then loops over the entries in the internal table. At the beginning of each loop, the contents of the current row are transferred to the work area. You can for instance manipulate the contents within the loop and then move the modifications back into the table by using the abap command modify:
modify it_vbak from wa_vbak.
You can define structures both within a program (using the TYPES keyword) and in the SAP ERP Data Dictionary (DDIC). The DDIC is globally available to all programs and contains definitions for tables, structures, views, table definitions, data elements and domains (and a bunch of stuff more which isn't too relevant here).
as a general reference for abap have a look at the SAP Help portal
I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Folwler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved in by using two separate pieces of technology. The first is to use a relatively well designed database schema with a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have few points that can help you address your questions regarding the relational database structure.
First thing that I see right away is, you are talking about inheritance (at least conceptually). Your objects inherit from each-other, thus you have additional attributes for derived objects. Say you are adding a new type of object, first thing you need to do (conceptually) is to find a base/super (parent) object type for it, that has subset of the attributes and you are adding on top of them (extending base object type).
Once you get used to thinking like said above, next thing is about inheritance mapping patterns for relational databases. I'll steal terms from Martin Fowler to describe it here.
You can hold inheritance chain in the database by following one of the 3 ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to search, and it must be faster than a join for example.
Disadvantages: table grows faster than with option 2 for example; you have to add a type column that says what type of object is the row; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if search affects only one type, you search only one table at a time; each table grows slower than in option 1 for example.
Disadvantages: you need to use union of queries if searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one table so easy to search all together using common attributes.
Disadvantages: base table grows fast because it contains part of child tables too; you need to use join to search all types of objects with all attributes.
Which one to choose?
It's a trade-off obviously. If you expect to have many types of objects added, I would go with Concrete table inheritance that gives reasonable query and scaling options. Class table inheritance seems to be not very friendly with fast queries and scalability. Single table inheritance seems to work with small number of types better.
Your call, my friend!
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three table set. You will see it referred to as entity value pair logic on the web...it's a way of handling multiple dynamic attributes for items. Lets say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes...same theory will work for hundreds of products and thousands of attributes. Standard way of holding this in one table requires the product info along with 6 columns to store the data (in this setup at least one third of them are null). New attribute added means altering the table to add another column to it and coming up with a script to populate existing or just leaving it null for all existing. Not the most fun, can be a head ache.
The alternative to this is a name value pair setup. You want a 'header' table to hold the common values amoungst your products (like name, or price...things that all rpoducts always have). In our example above, you will notice that attribute 'a' is being used on each record...this does mean attribute a can be a part of the header table as well. We'll call the key column here 'header_id'.
Second table is a reference table that is simply going to store the attributes that can be assigned to each product and assign an ID to it. We'll call the table attribute with atrr_id for a key. Rather straight forwards, each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
I beleive there is a wiki for it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply figuring out the best methodology to pivot out your data (I'd recommend Postgres as an opensource db option here)
We are using SQL Server 2008 and one of the requirements is to have extendable user defined attributes on the entities that are defined for the system. For example, we might have a entity called Doctor, we want the admins of the system to be able to define extra attributes that normally are not in the system. These attributes will most likely be needed as query criteria linking parent or joiner tables.
There will be tables that define the attributes (Name, description, type) and so forth, but my question is on the storage of the actual data values.
Im not a DBA (just a programmer pretending to be one) but my first thought was to store them in one generic column as a
nvarchar(450)
This would cover most of the basic types and still allow an index, but I thought I would run into lots of conversion type issues (converting to dates, numbers, etc.) as well as unusual query issues since everything is a nvarchar.
So, my latest thinking is to create a column for each data type that we would support:
ColNVarCharData
nvarchar(450)
ColBitData
bit
ColIntData
int
..And so forth
When the user defined the extendable attribute, they would pick a data type and then we would store attribute value in that column for that type. For example, if they picked int, the data value would be stored ColIntData and the other two columns would be null in this example.
I think this solves the conversion issues rather than storing each attribute as a generic type. Additionally I could add indexes as needed for each type depending on the query's being used.
I am leaning towards using this, but wondered if anyone else had any suggestions. I have breifly looked at the XML data type, but the "schema" could be changing quite frequently, so I thought this was a better fit.
Here are just a few SO questions/answers relevant to the topic.
One
Two
Three
Based on the link the #gbn sent me in the comments, I think this is an effective answer (Taken from the WIKI link):
The Value
Coercing all values into strings, as in the EAV data example above, results in a simple, but non-scalable, structure: constant data type inter-conversions are required if one wants to do anything with the values, and an index on the value column of an EAV table is essentially useless. Also, it is not convenient to store large binary data, such as images, in Base64 encoded form in the same table as small integers or strings. Therefore larger systems use separate EAV tables for each data type (including binary large objects, "BLOBS"), with the metadata for a given attribute identifying the EAV table in which its data will be stored. This approach is actually quite efficient because the modest amount of attribute metadata for a given class or form that a user chooses to work with can be cached readily in memory. However, it requires moving of data from one table to another if an attribute’s data type is changed. (This does not happen often, but mistakes can be made in metadata definition just as in database schema design.)
My friend is building a product to be used by different independent medical units.
The database stores a vast collection of measurements taken at different times, like the temperature, blood pressure, etc...
Let us assume these are held in a table called exams with columns temperature, pressure, etc... (as well as id, patient_id and timestamp). Most of the measurements are stored as floats, but some are of other types (strings, integers...)
While many of these measurements are handled by their product, it needs to allow the different medical units to record and process other custom measurements. A very nifty UI allows the administrator to edit these customs fields, specify their name, type, possible range of values, etc...
He is unsure as to how to store these custom fields.
He is leaning towards a separate table (say a table custom_exam_data with fields like exam_id, custom_field_id, float_value, string_value, ...)
I worry that this will make searching both more difficult to achieve and less efficient.
I am leaning towards modifying the exam table directly (while avoiding conflicts on column names with some scheme like prefixing all custom fields with an underscore or naming them custom_1, ...)
He worries about modifying the database dynamically and having different schemas for each medical unit.
Hopefully some people which more experience can weigh in on this issue.
Notes:
he is using Ruby on Rails but I think this question is pretty much framework agnostic, except from the fact that he is only looking for solutions in SQL databases only.
I simplified the problem a bit since the custom fields need to be available for more than one table, but I believe this doesn`t really impact the direction to take.
(added) A very generic reporting module will need to search, sort, generate stats, etc.. of this data, so it is required that this data be stored in the columns of the appropriate type
(added) User inputs will be filtered, for the standard fields as well as for the custom fields. For example, numbers will be checked within a given range (can't have a temperature of -12 or +444), etc... Thus, conversion to the appropriate SQL type is not a problem.
I've had to deal with this situation many times over the years, and I agree with your initial idea of modifying the DB tables directly, and using dynamic SQL to generate statements.
Creating string UserAttribute or Key/Value columns sounds appealing at first, but it leads to the inner-platform effect where you end up having to re-implement foreign keys, data types, constraints, transactions, validation, sorting, grouping, calculations, et al. inside your RDBMS. You may as well just use flat files and not SQL at all.
SQL Server provides INFORMATION_SCHEMA tables that let you create, query, and modify table schemas at runtime. This has full type checking, constraints, transactions, calculations, and everything you need already built-in, don't reinvent it.
It's strange that so many people come up with ad-hoc solutions for this when there's a well-documented pattern for it:
Entity-Attribute-Value (EAV) Model
Two alternatives are XML and Nested Sets. XML is easier to manage but generally slow. Nested Sets usually require some type of proprietary database extension to do without making a mess, like CLR types in SQL Server 2005+. They violate first-normal form, but are nevertheless the fastest-performing solution.
Microsoft Dynamics CRM achieves this by altering the database design each time a change is made. Nasty, I think.
I would say a better option would be to consider an attribute table. Even though these are often frowned upon, it gives you the flexibility you need, and you can always create views using dynamic SQL to pivot the data out again. Just make sure you always use LEFT JOINs and FKs when creating these views, so that the Query Optimizer can do its job better.
I have seen a use of your friend's idea in a commercial accounting package. The table was split into two, first contained fields solely defined by the system, second contained fields like USER_STRING1, USER_STRING2, USER_FLOAT1 etc. The tables were linked by identity value (when a record is inserted into the main table, a record with same identity is inserted into the second one). Each table that needed user fields was split like that.
Well, whenever I need to store some unknown type in a database field, I usually store it as String, serializing it as needed, and also store the type of the data.
This way, you can have any kind of data, working with any type of database.
I would be inclined to store the measurement in the database as a string (varchar) with another column identifying the measurement type. My reasoning is that it will presumably, come from the UI as a string and casting to any other datatype may introduce a corruption before the user input get's stored.
The downside is that when you go to filter result-sets by some measurement metric you will still have to perform a casting but at least the storage and persistence mechanism is not introducing corruption.
I can't tell you the best way but I can tell you how Drupal achieves a sort of schemaless structure while still using the standard RDBMSs available today.
The general idea is that there's a schema table with a list of fields. Each row really only has two columns, the 'table':String column and the 'column':String column. For each of these columns it actually defines a whole table with just an id and the actual data for that column.
The trick really is that when you are working with the data it's never more than one join away from the bundle table that lists all the possible columns so you end up not losing as much speed as you might otherwise think. This will also allow you to expand much farther than just a few medical companies unlike the custom_ prefix you were proposing.
MySQL is very fast at returning row data for short rows with few columns. In this way this scheme ends up fairly quick while allowing you lots of flexibility.
As to search, my suggestion would be to index the page content instead of the database content. Use Solr to parse through rendered pages and hold links to the actual page instead of trying to search through the database using clever SQL.
Define two new tables: custom_exam_schema and custom_exam_data.
custom_exam_data has an exam_id column, plus an additional column for every custom attribute.
custom_exam_schema would have a row to describe how to interpret each of the columns of the custom_exam_data table. It would have columns like name, type, minValue, maxValue, etc.
So, for example, to create a custom field to track the number of fingers a person has, you would add ('fingerCount', 'number', 0, 10) to custom_exam_schema and then add a column named fingerCount to the exam table.
Someone might say it's bad to change the database schema at run time, but I'd argue that configuring these custom fields is part of set up and won't happen too often. Still, this method lets you handle changes at any time and doesn't risk messing around with your core table schemas.
lets say that your friend's database has to store data values from multiple sources such as demogrphic values, diagnosis, interventions, physionomic values, physiologic exam values, hospitalisation values etc.
He might have as well to define choices, lets say his database is missing the race and the unit staff need the race of the patient (different races are more unlikely to get some diseases), they might want to use a drop down with several choices.
I would propose to use an other table that would have these choices or would you just use a "Custom_field_choices" table, which at some point is exactly the same but with a different name.
Considering that the database :
- needs to be flexible
- that data from multiple tables can be added and be customized
- that you might want to keep the integrity of the main structure of your database for distribution and uniformity purpose
- that data MUST have a limit and alarms and warnings
- that data must have units ( 10 kg or 10 pounds) ?
- that data can have a selection of choices
- that data can be with different rights (from simple user to admin)
- that these data might be needed to generate reports without modifying the code (automation)
- that these data might be needed to make cross reference analysis within the system without modifying the code
the custom table would be my solution, modifying each table would end up being too risky.
I would store those custom fields in a table where each record ( dataType, dataValue, dataUnit ) would use in one row. So there would be a relation oneToMany from one sample to the data. You can also create a table to record all the kind of cutsom types you would use. For example:
create table DataType
(
id int primary key,
name varchar(100) not null unique
description text,
uri varchar(255) //<-- can be used for an ONTOLOGY
)
create table DataRecord
(
id int primary key,
sample_id int not null,//<-- reference to the sample
dataType_id int not null, //<-- references DataType
value varchar(100),//<-- the value as string
unit varchar(50)//<-- g, mg/ml, etc... but it could also be a link to a table describing the units just like DataType
)