design for vehicle identification number (VIN) - sql

I've designed a few Vehicle Identification Number (VIN) decoders for different OEMs. The thing about VINs is that, despite being somewhat standardized, each OEM can overload the character position codes and treat them differently, add "extra" metadata (e.g. asterisks pointing to more data outside the VIN), and so on. Despite all that, I've been able to build several different OEM VIN decoders. Now I'm trying to build a GM VIN decoder, and it is giving me a headache.
The gist of the problem is that GM treats the vehicle attributes section (positions 4-7) differently depending on whether the vehicle is a truck or a car. Here is the breakdown:
GM Passenger Car VIN breakdown
GM Truck VIN breakdown
Normally what I do is design my own crude ETL process to import the data into an RDBMS, where each table roughly correlates with a major part of the VIN breakdown. For example, there will be a WMI table, EngineType table, ModelYear table, AssemblyPlant table, etc. Then I construct a view that joins on some contextual data that may or may not be gleaned directly from the character codes in the VIN itself (e.g. some vehicle types only have certain engines).
Looking up a VIN is then simply a matter of querying the view with each major character-code position of the VIN string. For example, the VIN 1FAFP53UX4A162757 breaks down like this in a different OEM's VIN structure:
| WMI | Restraint | LineSeriesBody | Engine | CheckDigit | Year | Plant | Seq |
| 123 | 4 | 567 | 8 | 9 | 10 | 11 | 12-17 |
---------------------------------------------------------------------------------
| 1FA | F | P53 | U | X | 4 | A | ... |
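To make the slicing concrete, here is a minimal sketch in Python (used purely for illustration). The positions come from the table above; the field names and the `split_vin` helper are my own labels, not part of any OEM specification.

```python
# Position layout taken from the table above (1-based, inclusive).
# The field names are illustrative labels; each OEM assigns its own meanings.
VIN_LAYOUT = [
    ("wmi", 1, 3),
    ("restraint", 4, 4),
    ("line_series_body", 5, 7),
    ("engine", 8, 8),
    ("check_digit", 9, 9),
    ("year", 10, 10),
    ("plant", 11, 11),
    ("sequence", 12, 17),
]


def split_vin(vin, layout=VIN_LAYOUT):
    """Slice a 17-character VIN into named code fields."""
    if len(vin) != 17:
        raise ValueError("a VIN must be exactly 17 characters long")
    # Convert 1-based inclusive positions into Python slices.
    return {name: vin[start - 1:end] for name, start, end in layout}


fields = split_vin("1FAFP53UX4A162757")
print(fields)
```

Each resulting code can then be used as a lookup key against the corresponding table or view column.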
GM has thrown a wrench into this...depending on whether it is a car or truck, the character code positions mean different things.
Example of what I mean - each ASCII table below correlates somewhat to a SQL table. etc.. means there is a whole lot of other columnar data
Passenger Car
Here's an example of position 4,5 (corresponds to vehicle line/series). These really go together, the VIN source data doesn't really differentiate between position 4 and 5 despite the breakdown illustrated above.
| Code (45)| Line | Series | etc..
--------------------------------------
| GA | Buick | Lacrosse | etc..
..and position 6 corresponds to body style
| Code (6) | Style | etc..
--------------------------------------
| 1 | Coupe, 2-Door | etc..
Trucks
..but for trucks, the structure is completely different. Position 4 stands on its own as the Gross Vehicle Weight Rating (GVWR).
| Code (4) | GVWR | etc..
-------------------------------
| L | 6000 lbs | etc..
..and positions 5,6 (Chassis/Series) now mean something similar to position 4,5 of passenger car:
| Code (56) | Line | Series | etc..
---------------------------------------
| RV | Buick | Enclave | etc..
I'm looking for a crafty way to resolve this in the relational design. I would like to return a common structure when a VIN is decoded -- if possible (i.e. not returning a different structure for cars vs. trucks)

Based on your answer to my comment regarding if you can identify the type of vehicle by using other values, a possible approach could be to have a master table with the common fields and 2 detail tables, each one with the appropriate fields for either cars or trucks.
Approximately something like the following (here I am guessing WMI is the PK):
Master table
| WMI | Restraint | Engine | CheckDigit | Year | Plant | Seq |
| 123 | 4 | 8 | 9 | 10 | 11 | 12-17 |
Car detail table
| WMI | Veh Line | Series | Body Type |
| 123 | 2 | 3 | 4 |
Truck detail table
| WMI | GVWR | Chassis | Body Type |
| 123 | 7 | 8 | 9 |
Having this, you could use a single select to retrieve the needed data, like the following:
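Select *
From
(
-- Car rows: truck-only columns come back as Null
Select M.*,
C.Veh_Line,
C.Series,
C.Body_Type As Car_Body_Type,
Null As GVWR,
Null As Chassis,
Null As Truck_Body_Type
From Master_Table M
Inner Join Car_Table C
on M.WMI = C.WMI
Union All
-- Truck rows: car-only columns come back as Null
Select M.*,
Null As Veh_Line,
Null As Series,
Null As Car_Body_Type,
T.GVWR,
T.Chassis,
T.Body_Type As Truck_Body_Type
From Master_Table M
Inner Join Truck_Table T
on M.WMI = T.WMI
) Decoded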
As for DML, before each insert or update statement you would only need to check whether you have a car or a truck model.
Of course you would need to make sure that only one detail exists for each master row, either on the car detail table or on the truck detail table.
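A rough, runnable sketch of the master/detail idea using Python's built-in sqlite3 module. Note the assumptions: it keys the tables on a hypothetical `vin` column rather than WMI, the sample rows are placeholders, and it uses two LEFT JOINs instead of a UNION to produce the common structure. The second query shows one way to check the "only one detail per master row" rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: 'vin' as the key is an assumption for this sketch.
CREATE TABLE master_table (vin TEXT PRIMARY KEY, wmi TEXT, engine TEXT);
CREATE TABLE car_table (
    vin TEXT PRIMARY KEY REFERENCES master_table(vin),
    veh_line TEXT, series TEXT, body_type TEXT);
CREATE TABLE truck_table (
    vin TEXT PRIMARY KEY REFERENCES master_table(vin),
    gvwr TEXT, chassis TEXT, body_type TEXT);

-- Placeholder rows, not real vehicles.
INSERT INTO master_table VALUES ('CAR-1', 'AAA', 'E1'), ('TRUCK-1', 'BBB', 'E2');
INSERT INTO car_table VALUES ('CAR-1', 'Buick', 'Lacrosse', 'Coupe, 2-Door');
INSERT INTO truck_table VALUES ('TRUCK-1', '6000 lbs', 'RV', NULL);
""")

# One row per master record, same columns for cars and trucks;
# the columns that do not apply are simply NULL.
rows = conn.execute("""
SELECT m.vin,
       c.veh_line, c.series, c.body_type AS car_body_type,
       t.gvwr, t.chassis, t.body_type AS truck_body_type
FROM master_table m
LEFT JOIN car_table c ON c.vin = m.vin
LEFT JOIN truck_table t ON t.vin = m.vin
ORDER BY m.vin
""").fetchall()

# Master rows that have BOTH details break the "one detail only" rule.
both = conn.execute("""
SELECT m.vin
FROM master_table m
JOIN car_table c ON c.vin = m.vin
JOIN truck_table t ON t.vin = m.vin
""").fetchall()

for row in rows:
    print(row)
```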
HTH

Why not define both of these rules for the decoding? Only one will resolve to a valid result.
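A minimal sketch of that idea in Python, assuming the car and truck code sets really do not overlap (which would need checking against the actual GM data). The lookup rows are copied from the sample tables in the question; the `decode_attributes` helper and the two test strings are hypothetical placeholders, not real VINs.

```python
# Lookup rows copied from the sample tables in the question.
CAR_LINE_SERIES = {"GA": ("Buick", "Lacrosse")}    # car positions 4-5
CAR_BODY_STYLE = {"1": "Coupe, 2-Door"}            # car position 6
TRUCK_GVWR = {"L": "6000 lbs"}                     # truck position 4
TRUCK_LINE_SERIES = {"RV": ("Buick", "Enclave")}   # truck positions 5-6


def decode_attributes(vin):
    """Apply the car rule and the truck rule; return the one that resolves."""
    p4, p45, p56, p6 = vin[3], vin[3:5], vin[4:6], vin[5]
    results = []
    if p45 in CAR_LINE_SERIES and p6 in CAR_BODY_STYLE:
        line, series = CAR_LINE_SERIES[p45]
        results.append({"kind": "car", "line": line, "series": series,
                        "body_style": CAR_BODY_STYLE[p6], "gvwr": None})
    if p4 in TRUCK_GVWR and p56 in TRUCK_LINE_SERIES:
        line, series = TRUCK_LINE_SERIES[p56]
        results.append({"kind": "truck", "line": line, "series": series,
                        "body_style": None, "gvwr": TRUCK_GVWR[p4]})
    # Zero matches or two matches both mean the rules cannot decide.
    if len(results) != 1:
        raise LookupError(f"expected exactly one match, got {len(results)}")
    return results[0]


# Placeholder strings, not real VINs: only positions 4-6 matter here.
car = decode_attributes("000GA100000000000")
truck = decode_attributes("000LRV00000000000")
print(car)
print(truck)
```

Both branches return the same keys, so the caller gets a common structure regardless of vehicle type.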


Want to update table in MS Access to add column and populate based on other column

I have a table (Fruit) in access that is in the form
Fruit, Cost
and I want to update the table so that I will have
Fruit, Cost, Cost bracket
and the cost bracket will be based on cost, e.g. Cost = .89 gives Cost bracket = '<1 dollar', Cost = 2 gives Cost bracket = '1-5 dollars', etc.
So far I have:
Alter Table [Fruit]
Add Column [Cost Bracket] Varchar(50)
Update [Fruit]
Set [Cost Bracket] = Switch(Cost<1,'<1 dollar',Cost Between 1 and 5,'1- 5 dollars' etc...)
Rather than modifying the structure of your current table to include a cost bracket description* and then populating this additional field with fixed values dependent upon the value held by the Cost field, an alternative might be to construct a separate table containing the upper & lower bounds of the cost brackets and the corresponding descriptions.
For example, assuming that your Fruit table looks something along the lines of:
+----+-----------+-------+
| ID | fruit | cost |
+----+-----------+-------+
| 1 | Apple | £0.50 |
| 2 | Orange | £0.80 |
| 3 | Pineapple | £3.00 |
| 4 | Grape | £1.50 |
+----+-----------+-------+
You might create a Cost Brackets table with an ID, a lower bound, an upper bound, and a description for each bracket, and populate it with the following cost bracket data:
+----+--------+--------+-------------+
| ID | lbound | ubound | description |
+----+--------+--------+-------------+
| 1 | £0.00 | £1.00 | < £1 |
| 2 | £1.01 | £2.00 | < £2 |
| 3 | £2.01 | £5.00 | < £5 |
+----+--------+--------+-------------+
Then, you can link the two using a query such as:
select f.*, c.description
from fruit f left join cost_brackets c on (f.cost between c.lbound and c.ubound)
Yielding the following result for the above sample data:
+----+-----------+-------+-------------+
| ID | fruit | cost | description |
+----+-----------+-------+-------------+
| 1 | Apple | £0.50 | < £1 |
| 2 | Orange | £0.80 | < £1 |
| 3 | Pineapple | £3.00 | < £5 |
| 4 | Grape | £1.50 | < £2 |
+----+-----------+-------+-------------+
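For anyone who wants to try this outside Access, here is a small sketch using Python's sqlite3 module with the sample data above. The column types are assumptions (plain REAL rather than a currency type), so treat it as an illustration of the range join rather than a drop-in Access solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruit (id INTEGER PRIMARY KEY, fruit TEXT, cost REAL);
CREATE TABLE cost_brackets (
    id INTEGER PRIMARY KEY, lbound REAL, ubound REAL, description TEXT);

INSERT INTO fruit VALUES
    (1, 'Apple', 0.50), (2, 'Orange', 0.80),
    (3, 'Pineapple', 3.00), (4, 'Grape', 1.50);
INSERT INTO cost_brackets VALUES
    (1, 0.00, 1.00, '< £1'), (2, 1.01, 2.00, '< £2'), (3, 2.01, 5.00, '< £5');
""")

# Range join: each fruit picks up the bracket whose bounds contain its cost.
# LEFT JOIN keeps fruit that fall outside every bracket (description is NULL).
rows = conn.execute("""
SELECT f.fruit, c.description
FROM fruit f
LEFT JOIN cost_brackets c ON f.cost BETWEEN c.lbound AND c.ubound
ORDER BY f.id
""").fetchall()

for fruit, description in rows:
    print(fruit, description)
```

One thing to watch with inclusive bounds like these: a cost that falls between two brackets (say 1.005) matches none of them, so the bounds need to suit the precision of the stored values.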
This approach has the distinct advantage that, if you subsequently decide to change the ranges of your cost brackets and their associated descriptions, the change need only be made in one place and the values will cascade through all queries which reference the Cost Brackets table.
Whereas, with your current approach, any change to the cost brackets would involve:
Changing the hard-coded cost brackets found within the Switch functions used in every query (and hoping that you've covered them all).
Updating the values held by every table which contains the cost bracket description, and hoping that each table has been suitably updated and reflects the current cost brackets.
*Which seems like it should be a one-time task and part of your database design, rather than an operation to be performed by code (unless perhaps you are generating the table on-the-fly?)

Correct Database Design / Relationship

Below I have shown a basic example of my proposed database tables.
I have two questions:
Categories "Engineering", "Client" and "Vendor" will have exactly the same "Disciplines", "DocType1" and "DocType2", does this mean I have to enter these 3 times over in the "Classification" table, or is there a better way? Bear in mind there is the "Vendor" category that is also covered in the classification table.
In the "Documents" table I have shown "category_id" and "classification_id". I'm not sure if this will depend on the answer to the first question, but is "category_id" necessary, or should I just be using a JOIN to allow me to filter the category based on the classification_id?
Thank you in advance.
Table: Category
id | name
---|-------------
1 | Engineering
2 | Client
3 | Vendor
4 | Commercial
Table: Discipline
id | name
---|-------------
1 | Electrical
2 | Instrumentation
3 | Proposals
Table: DocType1
id | name
---|-------------
1 | Specifications
2 | Drawings
3 | Lists
4 | Tendering
Table: Classification
id | category_id | discipline_id | doctype1_id | doctype2
---|-------------|---------------|-------------|----------
1 | 1 | 1 | 2 | 00
2 | 1 | 1 | 2 | 01
3 | 2 | 1 | 2 | 00
4 | 4 | 3 | 4 | 00
Table: Documents
id | title | doc_number | category_id | classification_id
---|-----------------|------------|-------------|-------------------
1 | Electrical Spec | 0001 | 1 | 1
2 | Electrical Spec | 0002 | 2 | 3
3 | Quotation | 0003 | 3 | 4
From what you've provided, it looks like we have three simple lookup tables: category, discipline, and doctype1. The part that's not intuitively obvious to me and may also be causing confusion on your end, is that the last two tables are both serving as cross-references of the lookup tables. The classification table in particular seems like it might be out of place. If there are only certain combinations of category, discipline, and doctype that would ever be valid, then the classification table makes sense and the right thing to do would be to look up that valid combination by way of the classification ID from the document table. If this is not the case, then you would probably just want to reference the category, discipline, and document type directly from the document table.
In your example, the need to make this distinction is illuminated by the fact that the document table has a reference to the classification table and a reference to the category table. However, the row that is looked up in the classification table also references a category ID. This is not only redundant but also opens the door to the possibility of having conflicting category IDs.
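As a quick illustration of that risk, the sketch below loads the Classification and Documents sample rows from the question into an in-memory SQLite database (via Python's sqlite3; the column types are my assumption) and looks for documents whose own category_id disagrees with the one implied by their classification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classification (
    id INTEGER PRIMARY KEY, category_id INTEGER,
    discipline_id INTEGER, doctype1_id INTEGER, doctype2 TEXT);
CREATE TABLE documents (
    id INTEGER PRIMARY KEY, title TEXT, doc_number TEXT,
    category_id INTEGER, classification_id INTEGER);

-- Sample rows copied from the question.
INSERT INTO classification VALUES
    (1, 1, 1, 2, '00'), (2, 1, 1, 2, '01'),
    (3, 2, 1, 2, '00'), (4, 4, 3, 4, '00');
INSERT INTO documents VALUES
    (1, 'Electrical Spec', '0001', 1, 1),
    (2, 'Electrical Spec', '0002', 2, 3),
    (3, 'Quotation', '0003', 3, 4);
""")

# Documents whose stored category differs from their classification's category.
conflicts = conn.execute("""
SELECT d.id, d.category_id, c.category_id
FROM documents d
JOIN classification c ON c.id = d.classification_id
WHERE d.category_id <> c.category_id
""").fetchall()

print(conflicts)
```

With the sample data, document 3 already shows the problem: it is stored under category 3 while its classification row points at category 4. Dropping category_id from the document table and deriving it through the join removes that possibility.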
I hope this helps.

What type of data structure should I use for mimicking a file-system?

The title might be worded strangely, but it's probably because I don't even know if I'm asking the right question.
So essentially what I'm trying to build is a "breadcrumbish" categorization system (like a file directory) where each node has a parent (except for root) and each node can contain either data or another node. This will be used for organizing email addresses in a database. I have a system right now where you can create a "group" and add email addresses to that group, but it would be very nice to add an organizational system to it.
This (in my head) is in a tree format, but I don't know what tree.
The issue I'm having is building it using MySQL. It's easy to traverse trees that are in memory, but in a database, it's a bit trickier.
Image of tree: http://j.imagehost.org/0917/asdf.png
SELECT * FROM Businesses:
Tim's Hardware Store, 7-11, Kwik-E-Mart, Cub Foods, Bob's Grocery Store, CONGLOM-O
SELECT * FROM Grocery Stores:
Cub Foods, Bob's Grocery Store, CONGLOM-O
SELECT * FROM Big Grocery Stores:
CONGLOM-O
SELECT * FROM Churches:
St. Peter's Church, St. John's Church
I think this should be enough information so I can accurately describe what my goal is.
Well, there are a few patterns you could use. Which one is right depends on your needs.
Do you need to select a node and all its children? If so, then a Nested Set Model may be better for you. The table would look like this:
| Name | Left | Right |
| Emails | 1 | 12 |
| Business | 2 | 7 |
| Tim's | 3 | 4 |
| 7-11 | 5 | 6 |
| Churches | 8 | 11 |
| St. Pete | 9 | 10 |
So then, to find anything below a node, just do
SELECT name FROM nodes WHERE `Left` > *yourleftnode* AND `Right` < *yourrightnode*
To find everything above the node:
SELECT name FROM nodes WHERE `Left` < *yourleftnode* AND `Right` > *yourrightnode*
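Here is a small runnable sketch of both nested set queries, using Python's sqlite3 module and the sample rows from the table above. The column names `lft` and `rgt` are an assumption on my part (LEFT and RIGHT are reserved words in MySQL, so shorter names avoid quoting), and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    ("Emails", 1, 12), ("Business", 2, 7), ("Tim's", 3, 4),
    ("7-11", 5, 6), ("Churches", 8, 11), ("St. Pete", 9, 10),
])


def descendants(name):
    """Everything below a node: its interval strictly contains theirs."""
    sql = """SELECT child.name
             FROM nodes AS child
             JOIN nodes AS parent
               ON child.lft > parent.lft AND child.rgt < parent.rgt
             WHERE parent.name = ?
             ORDER BY child.lft"""
    return [row[0] for row in conn.execute(sql, (name,))]


def ancestors(name):
    """Everything above a node: their interval strictly contains its own."""
    sql = """SELECT parent.name
             FROM nodes AS child
             JOIN nodes AS parent
               ON parent.lft < child.lft AND parent.rgt > child.rgt
             WHERE child.name = ?
             ORDER BY parent.lft"""
    return [row[0] for row in conn.execute(sql, (name,))]


print(descendants("Business"))
print(ancestors("Tim's"))
```

Ordering ancestors by `lft` gives them root-first, which is convenient for building a breadcrumb.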
If you only want to query for a specific level, you could use an Adjacency List Model:
| Id | Name | Parent_Id |
| 1 | Email | null |
| 2 | Business | 1 |
| 3 | Tim's | 2 |
To find everything on the same level, just do:
SELECT name FROM nodes WHERE parent_id = *yourparentnode*
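A short sketch of the adjacency list in action, again with Python's sqlite3 and the three sample rows above. The `children` helper mirrors the query just shown. The `subtree` helper goes a step beyond the answer: it uses a recursive common table expression, which only works on databases that support `WITH RECURSIVE` (SQLite does, and so does MySQL from 8.0), so treat that part as an optional extra.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
                 [(1, "Email", None), (2, "Business", 1), (3, "Tim's", 2)])


def children(parent_id):
    """Direct children only: one level below the given node."""
    rows = conn.execute(
        "SELECT name FROM nodes WHERE parent_id = ? ORDER BY id", (parent_id,))
    return [row[0] for row in rows]


def subtree(root_id):
    """The node and everything below it, with depth relative to the root."""
    sql = """
    WITH RECURSIVE sub(id, name, depth) AS (
        SELECT id, name, 0 FROM nodes WHERE id = ?
        UNION ALL
        SELECT n.id, n.name, sub.depth + 1
        FROM nodes n
        JOIN sub ON n.parent_id = sub.id
    )
    SELECT name, depth FROM sub ORDER BY depth, id"""
    return conn.execute(sql, (root_id,)).fetchall()


print(children(1))
print(subtree(1))
```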
Of course, there's nothing stopping you from taking a hybrid approach, which lets you use whichever columns suit the query at hand:
| Id | Name | Parent_Id | Left | Right | Path |
| 1 | Email | null | 1 | 6 | / |
| 2 | Business | 1 | 2 | 5 | /Email/ |
| 3 | Tim's | 2 | 3 | 4 | /Email/Business/ |
Really, it's just a matter of your needs...
The easiest way to do it would be something like this:
Group
- GroupID (PK)
- ParentGroupID
- GroupName
People
- PersonID (PK)
- EmailAddress
- FirstName
- LastName
GroupMembership
- GroupID (PK)
- PersonID (PK)
That should establish a structure where you can have groups that have parent groups and people that can be members of groups (or multiple groups). If a person can only be a member of one group, then get rid of the GroupMembership table and just put a GroupID on the People table.
Complex queries against this structure can get difficult though. There are other less intuitive ways to model this that make querying easier (but often make updates more difficult). If the number of groups is small, the easiest way to handle queries against this is often to load the whole tree of Groups into memory, cache it, and use that to build your queries.
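The "load the whole tree into memory" approach could look roughly like the sketch below. The rows stand in for the result of a `SELECT GroupID, ParentGroupID, GroupName` query; the sample values reuse group names from the question but the IDs are made up, and both helper functions are hypothetical.

```python
from collections import defaultdict

# (GroupID, ParentGroupID, GroupName) rows as they might come back from the
# Group table; the IDs are made up for illustration.
group_rows = [
    (1, None, "Businesses"),
    (2, 1, "Grocery Stores"),
    (3, 2, "Big Grocery Stores"),
    (4, None, "Churches"),
]


def build_children_index(rows):
    """Map each parent ID to the list of its direct child group IDs."""
    index = defaultdict(list)
    for group_id, parent_id, _name in rows:
        index[parent_id].append(group_id)
    return index


def group_and_descendants(group_id, children_index):
    """Collect a group's ID plus the IDs of every group nested below it."""
    found, stack = [], [group_id]
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children_index.get(current, []))
    return sorted(found)


children_index = build_children_index(group_rows)  # build once, then cache
ids = group_and_descendants(1, children_index)
print(ids)
```

The resulting ID list can then be fed into a single `WHERE GroupID IN (...)` query against GroupMembership, so the database never has to walk the hierarchy itself.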
As always when I see questions about modeling trees and hierarchies, my suggestion is that you get a hold of a copy of Joe Celko's book on the subject. He presents various ways to model them in a RDBMS, some of which are fairly imaginative, and he gives the pros and cons for each pattern.
Create an object Group which has a name, many email addresses, and a parent, which can be null.

How to store Goals (think RPG Quest) in SQL

Someone asked me today how they should store quest goals in a SQL database. In this context, think of an RPG. Goals could include some of the following:
Discover [Location]
Kill n [MOB Type]
Acquire n of [Object]
Achieve a [Skill] in [Skillset]
All the other things you get in RPGs
The best I could come up with is:
Quest 1-* QuestStep
QuestStep 1-* MobsToKill
QuestStep 1-* PlacesToFind
QuestStep 1-* ThingsToAcquire
QuestStep 1-* etc.
This seems a little clunky - Should they be storing a query of some description instead (or a formula or ???)
Any suggestions appreciated
User can embark on many quests.
One quest belongs to one user only (in this model).
One quest has many goals, one goal belongs to one quest only.
Each goal is one of possible goals.
A possible goal is an allowed combination of an action and an object of the action.
PossibleGoals table lists all allowed combinations of actions and objects.
Goals are ordered by StepNo within a quest.
Quantity defines how many objects an action should act upon (e.g. kill 5 MOBs).
Object is a super-type for all possible objects.
Location, MOBType, and Skill are object sub-types, each with different properties (columns).
I would create something like this.
For the Quest table:
| ID | Title | FirstStep (Foreign key to QuestStep table) | etc.
The QuestStep table
| ID | Title | Goal (Foreign key to Goal table) | NextStep (ID of next QuestStep) | etc.
Of course, this is where the hard part starts: how do we describe the goals? I'd say create one record for the goal in the Goal table and save each of the fields of the goal (i.e. how many mobs of what type to kill, what location to visit, etc.) in a GoalFields table, thus:
Goal table:
| ID | Type (type is one from an Enum of goal types) |
The GoalFields Table
| ID | Goal (Foreign key to goal) | Field | Value |
I understand that this can be a bit vague, so here is an example of what data in the database could look like.
Quest table
| 0 | "Opening quest" | 0 | ...
| 1 | "Time for a Sword" | 2 | ...
QuestStep table
| 0 | "Go to the castle" | 0 | 1 | ...
| 1 | "Kill two fireflies" | 1 | NULL | ...
| 2 | "Get a sword" | 2 | NULL | ...
Goal table
| 0 | PlacesToFind |
| 1 | MobsToKill |
| 2 | ThingsToAcquire |
GoalFields table
| 0 | 0 | Place | "Castle" |
| 1 | 1 | Type | "firefly" |
| 2 | 1 | Amount | 2 |
| 3 | 2 | Type | "sword" |
| 4 | 2 | Amount | 1 |
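To show how the pieces fit together when read back, here is a rough Python sketch that holds the sample rows above in plain dictionaries and lists (standing in for query results) and follows the FirstStep/NextStep chain. The `load_goal` and `quest_plan` helpers are hypothetical names, not part of the design itself.

```python
# Sample rows copied from the tables above, held in memory for illustration.
quests = {0: ("Opening quest", 0), 1: ("Time for a Sword", 2)}  # id -> (title, first_step)
quest_steps = {  # id -> (title, goal_id, next_step)
    0: ("Go to the castle", 0, 1),
    1: ("Kill two fireflies", 1, None),
    2: ("Get a sword", 2, None),
}
goals = {0: "PlacesToFind", 1: "MobsToKill", 2: "ThingsToAcquire"}  # id -> type
goal_fields = [  # (id, goal_id, field, value)
    (0, 0, "Place", "Castle"),
    (1, 1, "Type", "firefly"),
    (2, 1, "Amount", 2),
    (3, 2, "Type", "sword"),
    (4, 2, "Amount", 1),
]


def load_goal(goal_id):
    """Fold the GoalFields rows for one goal into a single dictionary."""
    fields = {field: value
              for _id, gid, field, value in goal_fields if gid == goal_id}
    return {"type": goals[goal_id], **fields}


def quest_plan(quest_id):
    """Follow FirstStep and then NextStep links until the chain ends."""
    _title, step_id = quests[quest_id]
    plan = []
    while step_id is not None:
        title, goal_id, next_step = quest_steps[step_id]
        plan.append((title, load_goal(goal_id)))
        step_id = next_step
    return plan


for title, goal in quest_plan(0):
    print(title, goal)
```

Because every goal type is just a set of field/value rows, adding a new kind of goal needs no schema change, at the cost of the database no longer checking the field names or value types for you.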

Retrieve comma delimited data from a field

I've created a form in PHP that collects basic information. I have a list box that allows multiple items selected (i.e. Housing, rent, food, water). If multiple items are selected they are stored in a field called Needs separated by a comma.
I have created a report ordered by the person's needs. The people who only have one need are sorted correctly, but the people who have multiple are sorted exactly as the string passed to the database (i.e. housing, rent, food, water) --> which is not what I want.
Is there a way to separate the multiple values in this field using SQL to count each need instance/occurrence as 1 so that there are no comma delimitations shown in the results?
Your database is not in the first normal form. A non-normalized database will be very problematic to use and to query, as you are actually experiencing.
In general, you should be using at least the following structure. It can still be normalized further, but I hope this gets you going in the right direction:
CREATE TABLE users (
user_id int,
name varchar(100)
);
CREATE TABLE users_needs (
need varchar(100),
user_id int
);
Then you should store the data as follows:
-- TABLE: users
+---------+-------+
| user_id | name |
+---------+-------+
| 1 | joe |
| 2 | peter |
| 3 | steve |
| 4 | clint |
+---------+-------+
-- TABLE: users_needs
+---------+----------+
| need | user_id |
+---------+----------+
| housing | 1 |
| water | 1 |
| food | 1 |
| housing | 2 |
| rent | 2 |
| water | 2 |
| housing | 3 |
+---------+----------+
Note how the users_needs table is defining the relationship between one user and one or many needs (or none at all, as for user number 4.)
To normalise your database further, you should also use another table called needs, as follows:
-- TABLE: needs
+---------+---------+
| need_id | name |
+---------+---------+
| 1 | housing |
| 2 | water |
| 3 | food |
| 4 | rent |
+---------+---------+
Then the users_needs table should just refer to a candidate key of the needs table instead of repeating the text.
-- TABLE: users_needs (instead of the previous one)
+---------+----------+
| need_id | user_id |
+---------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 4 | 2 |
| 2 | 2 |
| 1 | 3 |
+---------+----------+
You may also be interested in checking out the following Wikipedia article for further reading about repeating values inside columns:
Wikipedia: First normal form - Repeating groups within columns
UPDATE:
To fully answer your question, if you follow the above guidelines, sorting, counting and aggregating the data should then become straight-forward.
To sort the result-set by needs, you would be able to do the following:
SELECT users.name, needs.name
FROM users
INNER JOIN users_needs ON (users_needs.user_id = users.user_id)
INNER JOIN needs ON (needs.need_id = users_needs.need_id)
ORDER BY needs.name;
You would also be able to count how many needs each user has selected, for example:
SELECT users.name, COUNT(users_needs.need_id) as number_of_needs
FROM users
LEFT JOIN users_needs ON (users_needs.user_id = users.user_id)
GROUP BY users.user_id, users.name
ORDER BY number_of_needs;
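If you want to try the normalised design end to end, the following sketch loads the sample users, needs, and users_needs rows from above into an in-memory SQLite database through Python's sqlite3 module. The column types and the composite primary key are assumptions; the two queries count users per need and needs per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE needs (need_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users_needs (
    need_id INTEGER REFERENCES needs(need_id),
    user_id INTEGER REFERENCES users(user_id),
    PRIMARY KEY (need_id, user_id));

INSERT INTO users VALUES (1, 'joe'), (2, 'peter'), (3, 'steve'), (4, 'clint');
INSERT INTO needs VALUES (1, 'housing'), (2, 'water'), (3, 'food'), (4, 'rent');
INSERT INTO users_needs VALUES
    (1, 1), (2, 1), (3, 1), (1, 2), (4, 2), (2, 2), (1, 3);
""")

# How many users selected each need (each selection counts once).
per_need = conn.execute("""
SELECT n.name, COUNT(*)
FROM users_needs un
JOIN needs n ON n.need_id = un.need_id
GROUP BY n.name
ORDER BY n.name
""").fetchall()

# How many needs each user selected; LEFT JOIN keeps users with none.
per_user = conn.execute("""
SELECT u.name, COUNT(un.need_id) AS number_of_needs
FROM users u
LEFT JOIN users_needs un ON un.user_id = u.user_id
GROUP BY u.user_id, u.name
ORDER BY number_of_needs, u.name
""").fetchall()

print(per_need)
print(per_user)
```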
I'm a little confused by the goal. Is this a UI problem or are you just having trouble determining who has multiple needs?
The number of needs is the number of commas plus one, which you can get from the difference in length:
Len([Needs]) - Len(Replace([Needs],',','')) + 1
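The same arithmetic in Python, as a sketch for checking the idea outside the database. `count_needs` mirrors the expression above; `split_needs` is an extra hypothetical helper showing how the stored string could be broken into individual values.

```python
def count_needs(needs):
    """Comma-separated item count: number of commas plus one."""
    return len(needs) - len(needs.replace(",", "")) + 1


def split_needs(needs):
    """Break the stored string into individual, trimmed need names."""
    return [item.strip() for item in needs.split(",") if item.strip()]


sample = "housing, rent, food, water"
print(count_needs(sample))
print(split_needs(sample))
```

Note that the length trick reports 1 for an empty string, so rows with no needs at all have to be handled separately.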
Can you provide more information about the Sort you're trying to accomplish?
UPDATE:
I think these Oracle-based posts may have what you're looking for: post and post. The only difference is that you would probably be better off using the method I list above to find the number of comma-delimited pieces rather than doing the translate(...) that the author suggests. Hope this helps - it's Oracle-based, but I don't see .