Normalizing DBs - Can this be normalized further? - sql

I have this mockup for a database I will be creating. I'm wondering how I can further normalize it; so far my thoughts are breaking out date into its own table. What would be common practice?

The answer is: probably yes. But without having an exact definition of every field, i.e. what do they mean in the context of your data model, it's hard for us to give a good answer on this.
Looking at the trips table, I'm seeing the column zip_code, which looks out of place. The zip_code field is not directly related to the primary key of the trips table (AFAICT anyway). A zip code is a property of a city, so I would say that zip_code should be stored in the city table.
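For illustration, a minimal sketch of that change (the exact columns are assumptions, since the original diagram isn't shown):

create table city (
    city_id   integer primary key,
    city_name varchar(100) not null,
    zip_code  varchar(10)  not null  -- moved here from trips
);

create table trips (
    trip_id   integer primary key,
    city_id   integer not null references city (city_id),
    trip_date date
    -- zip_code is gone; it is reachable through the city foreign key
);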
What you are aiming for is probably to end up in a database normalized to the third normal form (3NF). You should read up on normalization and apply the rules up to 3NF. To go further into what this entails would be duplicating numerous tutorials, courses and books. You could take this question on SO as a starting point and try to apply this to your data model.

Related

Table architecture best practice

I have a table where I am storing configurations for a tool I have. It has a ConfigID, which is just an identity field, a customer name, an application name, and then 18 well-known fields (wellknownfield1, wellknownfield2, ..., wellknownfield18) whose contents I can determine from values in another table.
Now my problem comes in: I also need custom values. Currently I have a dumb solution of customfieldname1, customfieldvalue1, ..., customfieldname20, customfieldvalue20, where the value columns hold all the random values I need, delimited by pipes. I am using a SQL Server database. Anyone have any suggestions? Please comment if anything is unclear.
Strictly speaking, you should not put groups of values in a column. It violates the first normal form of relational data. Create a separate table called Custom Data (Config_ID, CUSTOM_NAME, CUSTOM_DATA_VALUE, CUSTOM_DATA_TYPE) and store the custom values in it.
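A sketch of that table, using the column names from the answer (the data types and the parent table name are assumptions):

create table CustomData (
    Config_ID         int          not null,  -- FK to the existing configurations table's ConfigID
    CUSTOM_NAME       varchar(100) not null,
    CUSTOM_DATA_VALUE varchar(400) null,
    CUSTOM_DATA_TYPE  varchar(20)  null,
    primary key (Config_ID, CUSTOM_NAME)
);

One row per custom value replaces the pipe-delimited packing, so reading or updating a single custom field no longer requires string parsing.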
Use another table with a foreign key. Save there all the customfieldname values you need to save. Use the ConfigID as foreign key to reference the ConfigID on main table that has the extra custom value.
There is a standard way to lay out database tables to make them easy to manage, called normalization. There are different levels of normalization: first normal form, second normal form, third normal form... and higher (the forms above third normal form are somewhat esoteric, in my opinion).
Explanations of these definitions here :
Normalization in plain English
What are 1NF, 2NF and 3NF in database design?
It can seem quite abstract - but the point is to get rid of any ambiguity or duplication in your database, and prevent problems further down the line.
As srini.venigalla points out, your table doesn't meet the criteria for first normal form: every row should have the same number of data values, one per DB column. Again, it might seem an abstract rule, but it's there to prevent real-world problems. Like: how do I parse this column value? How do I know what the separator is? What if it doesn't have enough data points? What if there are extra columns, and what are their names? All of these problems go away if you stick to one value per column.
The same is true for second normal form and third normal form: they disallow repeated values and redundancy in your database, which prevents the real-world problem of getting your DB into an inconsistent state.
There are debates and trade-offs about how far to normalize your database, but making everything meet third normal form is an acceptable rule of thumb for a beginner.
(this is my conclusion after having to write code workarounds for my own non-1NF and non-2NF database schema)

Survey Data Model - How to avoid EAV and excessive denormalization?

My database skills are mediocre at best and I have to design a data model for survey data. I have given this some thought and right now I feel that I am stuck between some kind of EAV model and a design involving hundreds of tables, each with hundreds of columns (and thousands of records). There must be a better way to do this and I hope that the wise folks on this forum can help me.
My question is: how should I model the answers to survey questions in an RDBMS? Using SQL Server is mandatory. So alternative data storage systems should be excluded from this discussion. (Sure, some should and will be evaluated, but not here please.) I don't need a solution for the entire data model, for now I'm only interested in the Answers part.
I have already searched various forums, but I couldn't really find a solution. If it has already been given elsewhere, please excuse me and provide me with a link so I can read it up.
Some assumptions about the data I have to deal with:
Each survey consists of 1 to n questionnaires
Each questionnaire consists of 100-2,000 questions (please ignore that 2,000 questions really sound like a lot to answer...)
Questions can be of various types: multiple-choice, free text, a number (like age, income, percentages, ...)
Each survey involves 10-200 countries (These are not the respondents. The respondents are actually people in the countries.)
Depending on the type of questionnaire, each questionnaire is answered by 100-20,000 respondents per country.
A country can adapt the questionnaires for a survey, i.e. add, remove or edit questions
The data for one country is gathered in a separate database in that country. There is no possibility for online integration from the start.
The data for all countries has to be integrated later. This means, for example, that if a country has deleted a question, the missing data must somehow be derived from what they sent in order to achieve a uniform design across all countries
I will have to write the integration and cleaning software, which will need to work with every country's data
In the end the data needs to be exported to flat files, one rectangular grid per country and questionnaire.
I have already discussed this topic with people from various backgrounds and have not come to a good solution yet. I mainly got two kinds of opinions.
The domain experts, who are used to working with flat files (spreadsheet-style) for data processing and analysis, vote for a denormalized structure with loads of tables and columns, as I described above (1 table per country and questionnaire). This sounds terrible to me, because I learned that wide tables are to be avoided, it will be annoying to determine which columns are actually in a table when working with it, the database will become cluttered with hundreds of tables (or I would even need to set up multiple databases, each with a similar yet slightly different design), etc.
O-O programmers vote for a strongly "normalized" design, which would effectively lead to a central table containing all the answers from all respondents to all questions. This table would either need a column of type sql_variant, or multiple answer columns of different types to store answers of different types (multiple choice, free text, ...). The former would essentially be an EAV model. I tend to follow Joe Celko here, who strongly discourages its use (he calls it OTLT or "One True Lookup Table"). The latter would imply that each row contains null cells for the inapplicable types by design.
Another alternative I could think of would be to create one table per answer type, i.e. one for multiple-choice questions, one for free-text questions, etc. That's not very generic, it would lead to a lot of union joins I think, and I would have to add a table whenever a new answer type is invented.
Sorry for boring you with all this text and thank you for your input!
Cheers,
Alex
PS: I asked the same question here: http://www.eggheadcafe.com/community/aspnet/13/10242616/survey-data-model--how-to-avoid-eav-and-excessive-denormalization.aspx
Well, imgur is down, so I'll post the pic later.
I think this is completely feasible within a relational model. I've built a CDM (conceptual data model) to show how I would do this.
Outbound
It takes 4 entities to define a country's survey: some parent Survey, the Country, and a list of Questions. Your questions have an internal relationship, so when one country "edits" a question, you can track both the question asked by the country and the question it came from. The fourth thing you need is a Possible Answer entity/table. Each question may have an associated list of possible answers (multiple choice, ranges, etc.). Those 4 should completely define the "OUTBOUND" side of this.
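For illustration, a rough sketch of those four outbound tables (all names and types here are assumptions drawn from the description):

create table survey (
    survey_id int primary key,
    title     varchar(200)
);

create table country (
    country_id int primary key,
    name       varchar(100)
);

create table question (
    question_id        int primary key,
    survey_id          int references survey (survey_id),
    country_id         int references country (country_id),
    parent_question_id int references question (question_id),  -- the original question this one was edited from
    question_text      varchar(2000)
);

create table possible_answer (
    possible_answer_id int primary key,
    question_id        int references question (question_id),
    answer_text        varchar(400)
);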
Inbound
The "INBOUND" side is just 2 new entities, The Respondent and the answer. The respondent is straightforward, just the demographics of that person if you know them and here you can include a relationship back to country. Each respondent answered the survey in a given country. (Person may be 1:n with Respondent if the person travels or has dual citizenship)
The answer is basic; either it is one of the choices listed in the list of Possible Answers or it is provided. Don't get all caught up in the fact that the answer may be a number, date, etc just yet. Either it's a FK or a string of characters.
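A matching sketch of the two inbound tables, under the same assumptions as the outbound sketch above:

create table respondent (
    respondent_id int primary key,
    country_id    int references country (country_id)
    -- plus whatever demographics you know about the person
);

create table answer (
    respondent_id      int references respondent (respondent_id),
    question_id        int references question (question_id),
    possible_answer_id int null references possible_answer (possible_answer_id),  -- one of the listed choices...
    provided_text      varchar(4000) null,                                        -- ...or a provided string of characters
    primary key (respondent_id, question_id)
);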
Reporting
A report is a join over all of these... You'll choose a country and a survey, get the list of questions and answers.
Answer Complexity
Depends on where you want to do your calculations. If you used a Varchar2(4000) column for your user-provided answers, you could add an attribute to question to describe the datatype of the answer. Q: Age? DT: Integer Between (0 and 130). Then your integration layer can do the validation instead of the database enforcing it. Or you can have 4 columns, one for number, date, character and CLOB. And your integration layer will determine the column to use. When you report those answers out, you'll just select all four columns with Coalesce().
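For illustration, a sketch of the four-column variant and the Coalesce() readback described above (table and column names are assumptions; adapt the types to your dialect):

create table provided_answer (
    respondent_id int,
    question_id   int,
    answer_number numeric(18,4) null,   -- the integration layer fills exactly one of these four
    answer_date   date          null,
    answer_char   varchar(4000) null,
    answer_clob   clob          null
);

select respondent_id,
       question_id,
       coalesce(cast(answer_number as varchar(4000)),
                cast(answer_date   as varchar(4000)),
                answer_char,
                cast(answer_clob   as varchar(4000))) as answer_value
from   provided_answer;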
Is this an EAV because there's a slight ambiguity to the datatype of "Answer"?
No, it's not.
An EAV model breaks down an entity into a list of attributes, like so:
Entity  Attribute  Value
1       Fname      Stephanie
1       Lname      Page
1       Age        30
Because you see the Answer column of the survey schema holding both words and numbers, like the Value column does here, you think that defines an EAV. It does not, just as adding 3 datatype columns to this model wouldn't change it from being an EAV.
I soooo hate it when
I've had people tell me that the query I'm tuning has to go "as fast as possible". OK, so give me a billion dollars and 30 years. "Wait, a billion what?" "As much as" and "as fast as" aren't requirements. You can validate anything you want in a database... build a shedload of BEFORE triggers, voila! Validation galore.
What's the datatype of an Age column? Or a Birthdate column? It depends on what your data source is. Some older records may only have month and year, or just a year, or "around" or "circa" some year. You couldn't have just a number column and do "as much validation as possible". And NUMBER(2) may be BETTER validation than just NUMBER. So now you'll have NUMBER(1), NUMBER(2), NUMBER... to have "as much as".
Where I think you are getting tripped up
Think of this as a conceptual data model, not a physical one. In those terms, Survey is an entity. Is Question an entity, or just an attribute of Survey? If you built one table per survey, you'd clearly be saying that Question is just an attribute of Survey, and storing questions vertically would make this an EAV. What this model shows is that Question is actually another entity. There is a relationship between questions, e.g. "a country [can] edit questions": there was the original question and the edited one. Each question has a collection of possible answers. And the most important thing is that they are all questions. In an EAV I call fname, lname, bdate, age, major, salary, etc., all very disparate things, just attributes. In this case we're not including the name of the agency that originated the survey, the date it was issued, the date it is due back, and so on, as questions.
Let me put this another way. You're Fedex. You want to store timestamps for certain events: each time a package enters or leaves a facility or vehicle. Time onto the pickup truck, time off the truck and into the first facility, time out of that facility and onto a plane, etc. Do you store them horizontally? How do you know the number of hops in advance? If you store them vertically, does that automatically make it an EAV? And if so, why?
You're a weather company getting temps from stations around the country. Let's say the sensors are designed to send a reading when the temperature changes +/- a full degree. If you store sensor_ID|timestamp|temp in a Reading table, is that an EAV? Each reading isn't an attribute of the sensor; the readings are themselves entities which belong to a collection/series.
One thing that vertical storage of answers has in common with an EAV is the difficulty of performing analytic queries. Getting a list of all the people who answered TRUE to questions 5 and 10 but FALSE to 6 and 11 would be very difficult when done vertically. Maybe that's why you see this as an EAV. If you want to do that, you need a different storage; the relational storage of the questions and answers isn't the best reporting database. Let's go back to the Fedex example: it's not simple to do "transit time" reporting when the rows are vertical.
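To make that concrete, here is a sketch of that TRUE/FALSE filter against a vertical layout (assuming an answer table with respondent_id, question_id and a text answer_value column, which is a simplification of the sketches above):

select respondent_id
from   answer
where  (question_id = 5  and answer_value = 'TRUE')
    or (question_id = 10 and answer_value = 'TRUE')
    or (question_id = 6  and answer_value = 'FALSE')
    or (question_id = 11 and answer_value = 'FALSE')
group by respondent_id
having count(distinct question_id) = 4;  -- a respondent must match all four conditions

A horizontal layout would express the same thing as four simple column predicates; vertically it becomes a grouped self-match like this.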
This sounds like you are wrestling with a common problem: how to use a hammer to fasten a screw.
Both alternatives you listed are bad, each for different reasons. But that's because you are trying to stuff your particular data model into a relational database system. A good approach would be to look beyond the relational database at some other database/storage systems, try a couple out, and find the best fit for your project.
I have tried the EAV model and gave up because it was far too complex, and I am afraid to try the multi-table model with a relational database system. The easiest solution I have found with a relational database is to store each complete response as a single CLOB, serialized into JSON or YAML (or something else lightweight), in a responses table:
create table responses (
    id uuid primary key,
    questionnaire_id uuid references questionnaires (id),
    data text
);
If I was using SQL Server (Express will be OK), then I would do this:
A table with the list of questions, with flags for type (bit) and required (bit), the correct answer if one exists, etc.
A table with the list of countries.
A table linking countries and questions (some countries may not get some questions).
A table for answers, with columns for the questions and an xml column for the optional questions, including those which are added.
If you are not versed in shredding XML, then use sparse columns for all the optional questions. I do not recall the exact limit on the number of sparse columns in a table, but I believe it is above 30,000. SQL Server internally stores sparse columns as XML and will shred them when one selects the column, and yes, they can be indexed.
The diagram below shows a schema created with SQL Server. The column AL_A4 will hold the answer to QL_Id = 4 and is of type sparse. The QL_Id in the QuestionList table is not flagged required, letting you know to make the corresponding column in AnswerList sparse.
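For reference, a T-SQL sketch of what such a sparse column declaration looks like (the column types are assumptions; only the AL_A4 and QL_Id names come from the description above):

create table AnswerList (
    Answer_Id int identity primary key,
    AL_A1 varchar(200) not null,        -- a required question gets an ordinary column
    AL_A4 varchar(200) sparse null      -- an optional question gets a sparse, nullable column
);

Sparse columns must be nullable, and rows that leave them empty cost almost no storage, which is the point of using them for rarely-answered optional questions.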
Since countries will add questions, create QuestionListCustom, QuestiontoCountryCustom and AnswerListCustom tables and add the information from the custom questions there.
I am sure there are other ways to design the storage; this is the way I would turn in the homework. If this is not homework, then you surely work for the UN.
Have you considered not reinventing the wheel? There are open source survey applications already built. Even if they don't meet your needs, download a few and check out their data models.

Sql design question - many tables or not?

15 ECTS credits' worth of database design down the bin... I really can't come up with the best design solution for my problem.
Which is this: basically, I'm making a tool that gathers a lot of information about the user. At most, the user would fill in 50 fields of data, ranging from simple checkboxes to text input. I'm designing the DB right now (with MySQL) and can't decide whether to use a single User table with all of those fields, or to have a table for each category of input.
One example would be "type of payment". This one has three options, and if I went the "table" way I would add a paymentType table and give it binary fields for each payment type. Then I would need an id table to identify which paymentType the user has chosen, whereas if I use a single user table, the data would already be there.
The site will probably see a lot of users (TV, internet and radio marketing), so I'm concerned about which alternative would be best.
I'll be happy to provide more details if you need more to base a decision.
Thanks for reading.
Read this article "Database Normalization Basics", and come back here if you still have questions. It should help a lot.
The most fundamental idea behind these decisions, as you will see in this article, is that each table should represent one and only one "thing", and each field should relate directly and only to that thing.
In your payment types example, it probably makes sense to break it out into a separate table if you anticipate the need to store additional information about each payment type.
Create your "Type of Payment" table; there's no real question there. That's proper normalization and the power behind using relational databases. One of the many reasons to do so is the ability to update a Type of Payment record and not have to touch the related data in your users table. Your join between the two tables will allow your app to see the updated type of payment info by changing it in just the 1 place.
Regarding your other fields, they may not be as clear cut. The question to ask yourself about each field is "does this field relate only to a user or does it have meaning and possible use in its own right?". If you can never imagine a field having meaning outside of the context of a user you're safe leaving it as a field on the user table, otherwise do the primary key-foreign key relationship and put the information in its own table.
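For illustration, a minimal sketch of that break-out in MySQL (the table and column names are assumptions):

create table payment_type (
    payment_type_id int primary key,
    name            varchar(50) not null
);

create table app_user (                  -- 'app_user' is a stand-in for your user table
    user_id         int primary key,
    payment_type_id int not null,
    -- ...the fields that really are one-per-user stay here...
    foreign key (payment_type_id) references payment_type (payment_type_id)
);

Renaming a payment type is then a single-row change in payment_type, and every user row picks it up through the join.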
If you are building a form with variable inputs, I wouldn't recommend building it as one table. This is inflexible and dirty.
Normalization is the key. However, if you end up with a key/value setup, or effectively a scalar-type implementation across many tables, and you can't cache:
a) the form definition from table data, and
b) the joined result of storage (either a caching view or otherwise),
c) or don't build in proper sharding,
then you may hit a performance boundary.
In this KVP setup, you might want to look at something like CouchDB or a less table-driven storage format.
You may also want to look at trickier setups such as serialized object storage and cache tables, if your internal data is heavily related to other data already in the database.
50 columns is a lot. Have you considered a table that stores values like a property sheet? This would only be useful if you didn't need to regularly query the values it contains.
INSERT INTO UserProperty(UserID, Name, Value)
VALUES(1, 'PaymentType', 'Visa')
INSERT INTO UserProperty(UserID, Name, Value)
VALUES(1, 'TrafficSource', 'TV')
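The backing table for those inserts might look like this (the column types are assumptions):

CREATE TABLE UserProperty (
    UserID int          NOT NULL,
    Name   varchar(100) NOT NULL,
    Value  varchar(400) NULL,
    PRIMARY KEY (UserID, Name)
);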
I think I figured out a great way of solving this. Thanks to a friend of mine for suggesting it!
I have three tables: Field {IdField, FieldName, FieldType}, FieldInput {IdInput, IdField, IdUser} and User {IdUser, UserName, ... etc}.
This way it becomes very easy to see what a user has answered, the solution is somewhat scalable, and it provides a good overview. I will constrain the alternatives in another layer, farther away from the DB. I believe it's a trade-off worth making.
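In SQL, that might look like the sketch below; note the description doesn't list a value column on FieldInput, so the Value column here is my assumption about where the actual input would live:

create table User (
    IdUser   int primary key,
    UserName varchar(100)
    -- etc.
);

create table Field (
    IdField   int primary key,
    FieldName varchar(100),
    FieldType varchar(20)
);

create table FieldInput (
    IdInput int primary key,
    IdField int not null,
    IdUser  int not null,
    Value   varchar(400),   -- assumed: the user's actual input
    foreign key (IdField) references Field (IdField),
    foreign key (IdUser)  references User (IdUser)
);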
Any suggestions or critics to this solution?

What should I name a table that maps two tables together? [closed]

Let's say I have two tables:
Table: Color
Columns: Id, ColorName, ColorCode
Table: Shape
Columns: Id, ShapeName, VertexList
What should I call the table that maps color to shape?
Table: ???
Columns: ColorId, ShapeId
There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton
Coming up with a good name for a table that represents a many-to-many relationship makes the relationship easier to read and understand. Sometimes finding a great name is not trivial, but usually it is worth spending some time thinking about it.
An example: Reader and Newspaper.
A Newspaper has many Readers and a Reader has many Newspapers
You could call the relationship NewspaperReader but a name like Subscription might convey better what the table is about.
The name Subscription also is more idiomatic in case you want to map the table to objects later on.
The convention for naming many-to-many tables is a concatenation of the names of both tables that are involved in the relation. ColourShape would be a sensible default in your case. That said, I think Nick D came up with two great suggestions: Style and Texture.
How about ColorShapeMap, or Style, or Texture?
Interestingly, about half of the answers give a general term for any table that implements a many-to-many relationship, and the other half suggest a name for this specific table.
I generally call these intersection tables.
In terms of naming conventions, most people give such a table a name that is an amalgam of the two tables in the many-to-many relationship. So in this case, "ColorShape" or "ShapeColor". But I find this looks artificial and awkward.
Joe Celko recommends in his book "SQL Programming Style" to name these tables in some natural language manner. For instance, if a Shape is colored by a Color, then name the table ColoredBy. Then you could have a diagram that more or less reads naturally like this:
Shape <-- ColoredBy --> Color
Conversely, you could say a Color colors a Shape:
Color <-- Colors --> Shape
But this looks like the middle table is the same thing as Color with a plural naming convention. Too confusing.
It's probably clearest to use the ColoredBy naming convention. Interestingly, using the passive voice makes the naming convention clearer.
Name the table whatever you like, as long as it is informative:
COLOR_SHAPE_XREF
From a model perspective, the table is called a join/corollary/cross-reference table. I've kept the habit of using _XREF at the end to make the relationship obvious.
A mapping table is what this is usually called.
ColorToShape
ColorToShapeMap
This is an Associative Entity and is quite often significant in its own right.
For example, a many to many relationship between TRAINS and TIMES gives rise to a TIMETABLE.
If there's no obvious new entity (such as timetable) then the convention is to run the two words together, giving COLOUR_SHAPE or similar.
I've worked with DBAs that call it a join table.
Colour_Shape is fairly typical - unless the relationship has an explicit domain-specific name.
Junction table
OR Bridge Table
OR Join Table
OR Map Table
OR Link Table
OR Cross-Reference Table
These come into use for many-to-many relationships, where the keys from both tables form the composite primary key of the junction table.
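For example, a sketch of such a junction table for the Color and Shape tables from the question, whatever name you settle on:

CREATE TABLE ColorShape (
    ColorId INT NOT NULL REFERENCES Color (Id),
    ShapeId INT NOT NULL REFERENCES Shape (Id),
    PRIMARY KEY (ColorId, ShapeId)  -- the two foreign keys together form the composite primary key
);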
I recommend using a combination of the names of the entities, put in the plural. The table name will then express the many-to-many connection.
In your case:
Color + Shape = ColorsShapes
I usually hear that called a Junction Table. I name the table by what it joins, so in your case either ColorShape, or ShapeColor. I think it makes more sense for a Shape to have a color than for a Color to have a shape, so I would go with ShapeColor.
Intermediate Table or a Join Table
I would name it "ColorShapes" or "ColorShape", depending on your preference
I've also heard the term Associative table used.
A name for your table might be ColorShapeAssociations, meaning that each row represents an association between that color and that shape. The existence of a row implies that the color comes in that shape, and that the shape comes in that color. All rows with a specific color would be the set of all shapes the color is associated with, and the rows for a specific shape would be the set of all colors that shape comes in.
In general, most databases have some sort of naming convention for indexes, primary keys and so forth. In PostgreSQL the following naming has been suggested:
primary key: tablename_columnname_pkey
unique constraint: tablename_columnname_key
exclusive constraint: tablename_columnname_excl
index for other purposes: tablename_columnname_idx
foreign key: tablename_columnname_fkey
sequence: tablename_columnname_seq
triggers: tablename_actionname_after|before_trig
Your table is a linked table to me. To stay in line with the naming above I would choose the following:
linked table: tablename1_tablename2_lnk
In a list of table objects, the linked table will appear right after tablename1. This might be visually more appealing. But you could also choose a name that describes the purpose of the link, like others have suggested. This might help to keep the name of the id column short (if your link must have its own named id and is referenced in other tables).
or linked table: purposename_lnk
A convention I see a lot for joining tables that I personally like is 'Colour_v_Shape', which I've heard folk refer to colloquially as 'versus tables'.
It makes it very clear at a glance that the table represents a many-to-many relationship, and helps avoid that (albeit rare) confusing situation when you try to concatenate two words that might otherwise form a compound word, for example 'Butter' and 'Milk' may become 'ButterMilk', but what if you also needed to represent an entity called 'Buttermilk'?
Doing it this way, you'd have 'Butter_v_Milk' and 'Buttermilk' - no confusion.
Also, I like to think there's a Foo Fighters reference in the original question.
"Many-Many" table. I'd call it "ColourShape" or vice versa.
I've always been partial to the term "Hamburger Table". Don't know why - it just sounds good.
Oh, and I would call the table ShapeColor or ColorShape depending on which is the more commonly used table.
It's hard to answer something as arbitrary as this, but I tend to prefer tosh's idea of naming it after something in the actual domain instead of some generic description of the underlying relationships.
Quite often this sort of table will evolve into something richer for the domain model and will take on additional attributes above and beyond the linked foreign keys.
For example, what if you need to store a texture in addition to color? It might seem a bit funky to expand the SHAPE_COLOR table to hold its texture.
On the other hand, there's also something to be said for making a well-informed decision based on what requirements you have today and being prepared to refactor when additional requirements are introduced later.
All that said, I would call it SURFACE if I had insight that there would be additional surface-like properties introduced later. If not, I'd have no problems calling it SHAPE_COLOR or something of the sort and moving on to more pressing design problems.
Maybe just ColoredShape?
I'm not sure I get the question. Is this about this specific case or are you looking for general guidelines?
I would name it with the exact names of the tables being joined = ColorShape.
In addition to what Developer Art has related,
ColorShape
would be a usual naming convention. In ER diagram, it would be a relation.
Call it a cross reference table.
XREF_COLOR_SHAPE
(
    XCS_ID INTEGER,
    C_ID   INTEGER,
    S_ID   INTEGER
)
I'd use r_shape_colors or r_shape_color depending on its meaning.
r_ would be a replacement for xref_ in this case.
My vote is for a name that describes the table best. In this case it might be ShapeColor but in many cases a name different from a concatenation is better. I like readability and for me, that means no suffixes, no underscores and no prefixes.
I would personally go for Colour_Shape, with the underscore, just because I have seen this convention turn up quite a bit. (But I agree with the other posts here that there are probably more "poetic" ways of doing this.)
Bear in mind that foreign keys should also be built on this join table, referencing both the Colour and Shape tables, which will also help with identifying the relationship.

Optimal DB structure for additional fields entity

I have a table in a DB (Postgres-based) which acts like a superclass in object-oriented programming. It has a column 'type' which determines which additional columns should be present in the table (sub-class properties). But I don't want the table to include all possible columns (all properties of all possible types).
So I decided to make a table containing 'key' and 'value' columns (i.e. 'filename' = '/file', or 'some_value' = '5'), which can hold any property of the object not included in the superclass table. And I also made one related table to contain the available 'key' values.
But there is a problem with such an architecture: the 'value' column has to be of a string data type by default, to be able to contain anything. And I don't think converting to and from strings is a good decision. What is the best way to bypass this limitation?
The design you're experimenting with is a variation of Entity-Attribute-Value, and it comes with a whole lot of problems and inefficiencies. It's not a good solution for what you're doing, except as a last resort.
What could be a better solution is what fallen888 describes: create a "subtype" table for each of your subtypes. This is okay if you have a finite number of subtypes, which it sounds like you have. Then your subtype-specific attributes can have real data types, and also a NOT NULL constraint if appropriate, which is impossible if you use the EAV design.
One remaining weakness of the subtype-table design is that you can't enforce that a row exists in the subtype table just because the main row in the superclass table says it should. But that's a milder weakness than those introduced by the EAV design.
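For illustration, a minimal Postgres sketch of the subtype-table design (the vehicle/car/truck names are made up):

create table vehicle (                -- the superclass table
    vehicle_id integer primary key,
    type       varchar(10) not null   -- discriminator: 'car' or 'truck'
);

create table car (                    -- one subtype table per type
    vehicle_id integer primary key references vehicle (vehicle_id),
    seats      integer not null       -- subtype columns get real types and NOT NULL
);

create table truck (
    vehicle_id integer primary key references vehicle (vehicle_id),
    payload_kg numeric(8,2) not null
);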
edit: Regarding your additional information about comments-to-any-entity, yes this is a pretty common pattern. Beware of a broken solution called "polymorphic association" which is a technique many people use in this situation.
How about this instead: each sub-type gets its own DB table, and the base/super table just has a varchar column that holds the name of the sub-type DB table. Then you can have something like this...
Entity
------
ID
Name
Type
SubTypeName (value of this column will be 'Dog')
Dog
---
VetName
VetNumber
etc
If you don't want your (sub-)table names to be varchar values in the base table, you can also just have a SubType table whose primary key will be in the base table.
The only workaround (while retaining your structure) is to have separate tables:
create table IntProps(...);
create table StringProps(...);
create table CurrencyProps(...);
But I do not think that this is a good idea...
One common approach is to have the key-value table contain multiple value columns, one for each data type, i.e. StringValue, DecimalValue, etc.
Just know you're trading queryability and performance for a database schema you don't need to change. You could also consider ORM mapping or an object database.
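A sketch of that multi-typed layout (the names are assumptions; property_key stands in for the "available keys" table from the question):

create table property_key (
    key_id   integer primary key,
    key_name varchar(100) not null
);

create table extra_property (
    object_id     integer not null,   -- row in the superclass table
    key_id        integer not null references property_key (key_id),
    string_value  text    null,
    int_value     integer null,
    numeric_value numeric null,       -- exactly one *_value column is non-null per row
    primary key (object_id, key_id)
);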
You could have a per-type key/value table. The available-keys table would then need to encode which typed key/value table each key/type pair points to.
This seems like a highly inefficient architecture for a row-based relational database, however.
Perhaps you should take a look at a column-oriented relational database?
Thanks for the answers. I'll explain a little bit more specifically what I need.
There's a need to program a blog+forum website, and I've been looking at the WordPress DB structure.
There's a strong need for the ability to attach comments to any kind of 'object', like a blog entry, or a video file attached to one. The above DB structure being very easy to scale and able to fulfill all our needs was the reason for choosing it.
But it's not too late to change it, because this is at the early engineering stage. Also, our model now smells like a completely tree-hierarchy-based DB. For now I'll accept Bill Karwin's and fallen888's answers, but maybe I'm going in a totally wrong direction?
About the user being able to add a new field to the table:
I admire all these people making comments.
I used to be interested in this kind of thing a few years ago, but have written little code recently (apart from a little bit of PHP and MYSQL).
I think it's fine if you want to keep going - you may end up with something new.
Sorry to pour any cold water on the scheme - I admire your efforts. My personal belief is that if you go far enough in this direction, you will end up with a system that interprets more of natural language than SQL does. (Around 1970, SQL was actually spelt SEQUEL, and it stood for "Structured English Query Language", but after it was standardized in the 1970s - I think someone said that Oracle was the first commercial implementation, in 1979 - the "English" got dropped, because I guess they decided that it was only a tiny subset of English.)
I have run out of steam in this area, because I haven't got a job. Without an easy job that pays the bills, where I can experiment with these ideas, it's a bit hard to concentrate on this area.
Best wishes to all.
Oracle got their first contract writing a database for the CIA.