ER diagram relationship and Bridge Tables - orm

I have to design a database for bus timetables.
Entities:
Bus (idBus*)
Stop (idStop*,stopDescription)
Line (idLine*,lineDescription)
Position (lat,lon)
Some constraints are the following:
Multiple Buses may operate on one Line (therefore BUS:LINE = N:1)
One Line has many Stops, and many Lines pass through one Stop (therefore STOP:LINE = N:N)
One Bus passes through many Stops and vice versa (therefore BUS:STOP = N:N)
A Stop has one Position (therefore STOP:POSITION = 1:1)
A Bus has multiple Positions (therefore BUS:POSITION = 1:N)
E-R DIAGRAM
An example of modelling would be a bridging table for the STOP-POSITION relationship that would look like this:
STOP_POSITION(idStop*,lat,lon), where idStop is the foreign key.
In general:
If I have an idBus, I would like to be able to get the associated idLine.
If I have an idBus and an idStop, I would like to have info on the itinerary of the Bus (which is the next stop, time of arrival, direction).
If I have an idBus and an idLine, I would like to get the itinerary of the Bus (all the Stops the Bus will pass through, and their order).
Questions
The problem arises when considering the BUS-STOP relationship: once I know the id of the Stop and the id of the Bus, I also know a number of attributes such as Direction, ID of NextStop, and TimeOfArrival.
How should I model those attributes?
For example, every Bus passes through multiple Stops, and the progression is denoted by an attribute (e.g. progressiveStop). How should I model this attribute?
Does it really make sense to model the LINE-STOP association?
Does it really make sense to store dynamic data in the database? I am referring to the BUS-STOP relationship.

Database design solution

I have the following case:
There is an entity 'Master_Entity'. This entity has properties such as name, type, duration, etc. There are two other types of entities, 'Entity' and 'Sub-Entity'. They are identical to 'Master_Entity' (they have exactly the same properties).
In the end, 'Master_Entity' should hold a collection of 'Entity' and 'Entity' should hold a collection of 'Sub-Entity'. The tricky part is that records of type 'Entity' can be part of different 'Master_Entity' records (same for 'Sub-Entity'), but they can have different values for duration, for example. How can I achieve such modularity?
Here is what I came up with, but it doesn't quite do the job. Could you help me with this?
Edit: Imagine this as some sort of work tracker. For example, you have 'Create PHP App' (Master Entity). This entity contains the duration of how long it will take to finish this job. In addition, it contains an entity 'Writing Code' (Entity), which can be divided into 'Writing Http Client' (Sub-Entity), which has a duration property that is specific to this job.
On the other side, you might have another job, 'Create a Java App' (Master Entity), which will contain the same 'Writing Code' entity, but with a duration that has a different value because of the context of the application you are building.
I want to have a single record 'Writing Code', but its duration value should be different for every job it's assigned to. How can I achieve that while creating a minimum of duplicate records of type 'Entity'?
It sounds like something like these 3 tables will work for you:
Entity
* Id
* Name
* Type
EntityGroup
* Id
* Name
* ParentEntityGroupId
* ParentEntityId
EntityRelationship
* Id
* EntityId
* ParentEntityId
* EntityGroupId
With this structure, you can have an Entity be a member of a Group, or a solo Child of another Entity. You can also have a Group be a Child of an Entity, or even a Child of another Group. Without knowing specifics of your data, it's hard to know what you might need, but this should get you started.
From what you have said, it seems that you don't need EAV at all, because you don't have different properties for each item, just different values. So you should not be using it.
What you need is a combination of lookup tables and tables that record the actual tracking history of the work, because this is time-sensitive data. The tasks at the time a project was created may be substantially different from the tasks associated with that task group two years from now, but you need to record the tasks as they were at the time of creation. Note that this is not denormalizing; it is capturing a picture of the data at a point in time. The real duration always belongs to the project, never to the Task. In the Task, you can have a suggested duration to use as a starting point. I used a similar design (with far more fields, of course) to design a database for building sales proposals for technical-hardware-related projects. The real key here is to recognize what data needs to be stored as of a point in time and what is lookup data used to build the final project data. If someone adds a new task to the "Create a Java App" group, you don't want to change details about projects already completed or in progress, only new projects.
So you need:
Task group
    Task Group ID
    Task Group Name
Task
    TaskID
    Task Name
    SuggestedDuration (can be null if you have tasks that are always different, but filled in for tasks that usually have a similar duration)
Task_Taskgroup
    TaskID
    TaskGroupID
Project
    ProjectID
    ProjectName
    TaskGroupID
ProjectTask (should be filled in automatically when the task group is chosen for the project)
    ProjectID
    Task ID
    EstimatedDuration (fills in the default value, but can be changed by the person creating the work project)
    ActualDuration (filled in after the task is done; can be used by an analyst to create more reflective task default duration values)
Of course each of these tables may have other fields depending on the need.
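As a rough illustration of the tables above, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the `create_project` helper, and the duration values are assumptions made for the example, not part of the original design; the point is only to show the suggested durations being copied into ProjectTask when a project is created, so that each project keeps its own estimate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TaskGroup (TaskGroupID INTEGER PRIMARY KEY, TaskGroupName TEXT NOT NULL);
CREATE TABLE Task (TaskID INTEGER PRIMARY KEY, TaskName TEXT NOT NULL,
                   SuggestedDuration REAL);  -- may be NULL
CREATE TABLE Task_TaskGroup (
    TaskID INTEGER NOT NULL REFERENCES Task(TaskID),
    TaskGroupID INTEGER NOT NULL REFERENCES TaskGroup(TaskGroupID),
    PRIMARY KEY (TaskID, TaskGroupID));
CREATE TABLE Project (ProjectID INTEGER PRIMARY KEY, ProjectName TEXT NOT NULL,
    TaskGroupID INTEGER NOT NULL REFERENCES TaskGroup(TaskGroupID));
CREATE TABLE ProjectTask (
    ProjectID INTEGER NOT NULL REFERENCES Project(ProjectID),
    TaskID INTEGER NOT NULL REFERENCES Task(TaskID),
    EstimatedDuration REAL, ActualDuration REAL,
    PRIMARY KEY (ProjectID, TaskID));
""")

def create_project(conn, name, task_group_id):
    """Hypothetical helper: create a project and snapshot its group's tasks."""
    cur = conn.execute(
        "INSERT INTO Project (ProjectName, TaskGroupID) VALUES (?, ?)",
        (name, task_group_id))
    project_id = cur.lastrowid
    # Copy the current suggested durations in as the starting estimates.
    conn.execute("""
        INSERT INTO ProjectTask (ProjectID, TaskID, EstimatedDuration)
        SELECT ?, t.TaskID, t.SuggestedDuration
        FROM Task t JOIN Task_TaskGroup tg ON tg.TaskID = t.TaskID
        WHERE tg.TaskGroupID = ?""", (project_id, task_group_id))
    return project_id

# Illustrative data only: one 'Writing Code' task shared by two task groups.
conn.execute("INSERT INTO TaskGroup VALUES (1, 'Create PHP App')")
conn.execute("INSERT INTO TaskGroup VALUES (2, 'Create a Java App')")
conn.execute("INSERT INTO Task VALUES (1, 'Writing Code', 10.0)")
conn.executemany("INSERT INTO Task_TaskGroup VALUES (?, ?)", [(1, 1), (1, 2)])

php_id = create_project(conn, "PHP project", 1)
java_id = create_project(conn, "Java project", 2)
# The estimate is changed for one project only; the Task row is untouched.
conn.execute("UPDATE ProjectTask SET EstimatedDuration = 25.0 WHERE ProjectID = ?",
             (java_id,))

estimates = conn.execute(
    "SELECT ProjectID, EstimatedDuration FROM ProjectTask ORDER BY ProjectID"
).fetchall()
task_count = conn.execute("SELECT COUNT(*) FROM Task").fetchone()[0]
```

There is still a single 'Writing Code' row in Task, while each project carries its own duration in ProjectTask.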

Optaplanner VRP with Pickups Before DropOffs

I am working on using Optaplanner to solve a complex VRP problem with many requirements. I was able to handle most of them except for the following two aspects:
Pickups before DropOffs only
Enforce a specific path on the way to pickup customers.
The goal is to pick up a group of customers who are going to destinations that are close together and put them in the same vehicle.
Thanks in Advance! I appreciate the help!
The problem is very similar to the VRP TimeWindow example, but with the following changes:
Customers will be picked up at fixed locations (in a circuit)
Every pickup Customer will have a drop-off destination (outside of the circuit)
The vehicle will not head to a drop-off and then come back to pick up again. (Once the vehicle leaves the circuit, all it does is drop off its customers at set locations)
The vehicle moving in the circuit has to follow a specific path (imagine a one-way street)
I am planning on using road distances, with the score between each pickup-to-pickup pair known. The pickup -> drop-off distance is not known (I am planning on using air distance).
I'm having a hard time enforcing that, after leaving the circuit to drop off customers, a vehicle may not come back to pick up more customers, and making this work with the fixed path a vehicle can take in the circuit.
My main idea was to do the following.
Added a TYPE attribute to the customer to differentiate between pickup and drop-off
Added a variable listener to the customer class that keeps track of all the DropOffIds currently on board when a vehicle arrives at it, so that the vehicle only goes to a dropOffLocation if it has a passenger heading to that place. When a vehicle arrives at a drop-off, it removes that item from the list. (Essentially it serves as a stack.)
The problem is that, theoretically, this doesn't stop a vehicle from picking up a customer, dropping him off, and then picking up another, if the customers' locations are relatively close.
I am also having a hard time enforcing a fixed route that a vehicle must take in the circuit. I was planning on using a cost matrix with a soft constraint to enforce the route implicitly (a vehicle won't go backwards or skip a point because the cost would be too high), but it is not working the way it should.
I might consider a domain model like this:
@PlanningEntity
class Pickup implements PickupOrVehicle {
    Customer customer;
    @PlanningVariable
    PickupOrVehicle previousPickup;
    @PlanningVariable
    int dropOffPriority;
}
@PlanningEntity // Shadow entity
class Vehicle implements PickupOrVehicle {
    ...
    @ShadowVariable(based on dropOffPriority and previousPickup)
    List<Customer> dropOffOrderList;
    // For consistency we might also add pickUpOrderList
}
That dropOffPriority should either be globally unique (by initializing it uniquely and only configuring SwapMoves for that variable), or otherwise the VariableListener should order two assignments with the same dropOffPriority by their customer's id (because the ordering must be deterministic!).
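To make the tie-breaking rule concrete, here is a small plain-Python sketch (not OptaPlanner code) of how a drop-off order could be derived from the priorities. The `Pickup` dataclass, its field names, and the sample values are hypothetical stand-ins for the domain classes above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pickup:
    # Hypothetical stand-in for the Pickup planning entity.
    customer_id: int
    drop_off_priority: int

def drop_off_order(pickups):
    """Order customers by drop-off priority, breaking ties by customer id.

    The secondary key makes the result independent of the input order,
    which is what keeps the derived list deterministic.
    """
    ordered = sorted(pickups, key=lambda p: (p.drop_off_priority, p.customer_id))
    return [p.customer_id for p in ordered]

# Customers 5 and 7 share priority 2, so the customer id decides between them.
pickups = [Pickup(7, 2), Pickup(3, 1), Pickup(5, 2)]
order = drop_off_order(pickups)
```

Feeding the same pickups in a different order gives the same list, which is the property the listener needs.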
Not sure if it will work well. If you do try it out, do let us know here whether it works well or not.

Linking two separate sets of data codes without a common identifier

I have two large sets of data. Both sets are a form of structured coding system and are used to categorize groups of people based on their occupation. The two sets of data have no common identifier. Besides a column that contains a unique identifier, each table has a description for said identifier, but although they may be describing similar things, the descriptions are not identical.
How do I create a table that connects the two sets of data without having to go back and manually figure out the connection between the two identifiers? I am not sure if this can be done in Access or SQL. If there is a way to do this, I would like to know what software is out there.
Here's some example data:
Table 1:
Z Identifier   DescriptionA
162000         Pharmacist
3123566        Electronic Repairman
143246         Banker
8444455        Doctor
Table 2:
Q Identifier   DescriptionB
XX134556       COPY/PRINT/SCAN EQUIP
666Q1224       DRUGS
722WWYZ        Financial Svc
8456435T       Medical Services
15666PP        Health Services
Desired Output:
Table 3:
Z Identifier   DescriptionA       Q Identifier   DescriptionB
162000         Pharmacist         666Q1224       DRUGS
3123566        Electr Repairman   XX134556       COPY/PRINT/SCAN EQUIP
143246         Banker             722WWYZ        Financial Svc
8444455        Doctor             8456435T       Medical Services
Conventional tools that you are used to (like Access, Excel, and SQL) can only go so far with comparing the meaning and usage of words.
In other words (forgive the pun), in order to do this, you need some sort of natural language processing toolkit (NLPT). Along with that, you also need some knowledge of how to program, because I don't think there exist front-end interfaces that can give you the output you want, given only the input you listed, by just filling out some forms.
So with that in mind, in order to solve your problem (I'll assume you know how to program and can pick up an NLPT in a language of your choice), you need to do the following:
Put your two datasets in some tables.
Manipulate DescriptionA and DescriptionB to be something meaningful to the NLPT you are using. It won't like a string such as "COPY/PRINT/SCAN EQUIP". It will want the slashes removed and the words separated.
Compare DescriptionA with DescriptionB in a permutation-style manner by using a path_similarity type of function in the library. For example, path_similarity('animal.definition1', 'dog.definition1') should return a high value, say .60, while path_similarity('animal.definition1', 'book.definition1') should return a low value, like .10.
If the path_similarity is above a certain value (up to you to decide), join the two items together and append them as a single row to a results table, while removing them from their respective tables. Continue doing this until no DescriptionA remains whose similarity to some DescriptionB is above the threshold. Then do something else with the rows that are left in Table 1 and Table 2.
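The steps above can be sketched roughly as follows. This is only an outline under stated assumptions: `token_overlap` is a deliberately crude stand-in (word-overlap ratio), not a semantic path_similarity from an NLP library, and the rows and the 0.2 threshold are made up for illustration. In practice you would pass your toolkit's similarity function as the `similarity` argument.

```python
import re

def normalize(description):
    """Lower-case a description and split it into a set of word tokens
    (slashes and other punctuation act as separators)."""
    return set(re.findall(r"[a-z]+", description.lower()))

def token_overlap(a, b):
    """Stand-in similarity: Jaccard overlap of word tokens, 0.0 to 1.0."""
    ta, tb = normalize(a), normalize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def greedy_match(table_a, table_b, similarity, threshold):
    """Pair rows best-score-first; each row is used at most once.

    Returns (matches, leftover ids from A, leftover ids from B).
    """
    scored = sorted(
        ((similarity(da, db), ia, ib)
         for ia, da in table_a.items()
         for ib, db in table_b.items()),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for score, ia, ib in scored:
        if score < threshold:
            break  # everything after this point scores lower still
        if ia in used_a or ib in used_b:
            continue
        used_a.add(ia)
        used_b.add(ib)
        matches.append((ia, ib, score))
    left_a = [ia for ia in table_a if ia not in used_a]
    left_b = [ib for ib in table_b if ib not in used_b]
    return matches, left_a, left_b

# Made-up rows for illustration (identifier -> description).
table_a = {"A1": "Medical Doctor", "A2": "Bank Teller"}
table_b = {"B1": "MEDICAL/HEALTH SERVICES", "B2": "Bank Services", "B3": "Drugs"}

matches, left_a, left_b = greedy_match(table_a, table_b, token_overlap, 0.2)
matched_pairs = [(ia, ib) for ia, ib, _ in matches]
```

Word overlap would never pair "Pharmacist" with "DRUGS", which is exactly why a semantic similarity function is needed for the real data; the matching loop itself stays the same.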
This should all be fairly easy to do programmatically. You may find you are not getting proper matches in some places with this method because you are randomly choosing two words to compare. Because of that, you may want to find another algorithm other than just permutations, perhaps one that looks at the statistics of the path_similarity of every piece of your data to every other piece and acts more appropriately.
Additionally, you may want to allow more than two words to be paired up. For example, "lumberjack", "tree cutter", and "tree chopper" make more sense grouped in one row, with two additional columns created, than if one of them is thrown out and likely left without a pair. I'm sure none of the problems I listed in this paragraph are new, and you can search around the internet for ways to solve them. Best of luck!

How to name my enum elements?

I have a problem naming the elements in my application's data model.
In the application, the user can create his own metamodel. He does so by creating entity types; a type defines which properties an entity has. However, there are three kinds of entity types:
There is always exactly one instance of the type.
For instance, I want to model the company I am working for. It has a name, a share price and a number of employees. These values change over time, but there is always exactly one company.
There are different instances of the type, each is unique.
Example: Cities. A city has a name and a population count, there are different cities and each city exists exactly once.
Each instance of the type defines multiple entities.
Example: Cars. A car has a color and a manufacturer. But there is not only one red Mercedes. And even though they are similar, red Mercedes #1 is different from red Mercedes #2.
So let's say you are a user of this tool and you understand the concept of these three flavors. You want to create a new entity type and are prompted to choose between options 1, 2 and 3. How would you name these options?
Edit:
Documentation and help are available to the user. Also, the user can be expected to have a technical/programming background, so understanding these three concepts should be no problem.
First of all let me make sure I understand the problem,
Here's what you have (correct me if I'm wrong):
# of instances, is/are unique
(1, true)
(n, true)
(n, false)
If so:
for # of instances I would use single / plural
for is/are unique (or not unique) I would use unique / ununique.
so you'll get:
singleUnique
pluralUnique
pluralUnunique
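For illustration, the three names map onto the (# of instances, unique) pairs like this. The `EntityTypeKind` class, its properties, and the `kind_for` helper are hypothetical names introduced for this sketch, with the member names taken from the list above.

```python
from enum import Enum

class EntityTypeKind(Enum):
    # value = (allows many instances, each instance is unique)
    SINGLE_UNIQUE = (False, True)    # e.g. the company
    PLURAL_UNIQUE = (True, True)     # e.g. cities
    PLURAL_UNUNIQUE = (True, False)  # e.g. cars

    @property
    def allows_many(self):
        return self.value[0]

    @property
    def is_unique(self):
        return self.value[1]

def kind_for(allows_many, is_unique):
    """Look up the enum member for a (many, unique) combination."""
    return EntityTypeKind((allows_many, is_unique))

cars_kind = kind_for(True, False)
```

The fourth combination (a single, non-unique instance) has no member, so `kind_for(False, False)` raises a ValueError.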
That's the best I could think of. I don't know exactly who your users are or what the environment is, but if you have the option of adding tips (or documentation), you should definitely use it.

I am designing a bus timetable using SQL. Each bus route has multiple stops, do I need a different table for each route?

I am trying to come up with the most efficient database possible. My bus routes all have about 10 stops. The bus starts at stop number one and runs until it reaches the 10th stop, then it comes back again. This cycle happens 3 times a day.
I am really stuck as to how I can efficiently generate the times for the buses and where I should store the stops. If I put all the stops in one field and the times in another, the database won't be very dynamic.
If I store all the stops one by one in a column and then the times in another column, there will be a lot of repetition further down, as one stop has multiple times.
Maybe I am missing something; I've only just started learning SQL, and this is a task we have been set.
Thanks in advance.
You will need one table that contains your Timetable:
Route ID
Stop ID
Time
Possibly other fields as needed (direction, sequence #'s, Block #, etc)
I would recommend creating separate tables Bus Stop (to store stop names, lat/longs, etc) and Route (to store route name, first stop, last stop, direction, etc).
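A minimal sketch of this layout, using Python's built-in sqlite3 module. The exact column names, the 'HH:MM' text format for times, the primary key, and the sample rows are assumptions for the example; one Timetable table serves every route, so no per-route tables are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Route (RouteID INTEGER PRIMARY KEY, RouteName TEXT NOT NULL);
CREATE TABLE BusStop (StopID INTEGER PRIMARY KEY, StopName TEXT NOT NULL);
CREATE TABLE Timetable (
    RouteID  INTEGER NOT NULL REFERENCES Route(RouteID),
    StopID   INTEGER NOT NULL REFERENCES BusStop(StopID),
    Time     TEXT NOT NULL,      -- 'HH:MM', which sorts correctly as text
    Sequence INTEGER NOT NULL,   -- position of the stop within the run
    PRIMARY KEY (RouteID, StopID, Time)
);
""")

# Illustrative data: one route, two stops, two runs a day.
conn.execute("INSERT INTO Route VALUES (1, 'Route 1')")
conn.executemany("INSERT INTO BusStop VALUES (?, ?)",
                 [(1, "First Street"), (2, "Second Street")])
conn.executemany("INSERT INTO Timetable VALUES (?, ?, ?, ?)", [
    (1, 1, "08:00", 1), (1, 2, "08:10", 2),
    (1, 1, "12:00", 1), (1, 2, "12:10", 2),
])

def times_at_stop(conn, route_id, stop_id):
    """All scheduled times for one route at one stop, earliest first."""
    rows = conn.execute(
        "SELECT Time FROM Timetable WHERE RouteID = ? AND StopID = ? ORDER BY Time",
        (route_id, stop_id))
    return [row[0] for row in rows]

second_street_times = times_at_stop(conn, 1, 2)
stop_count = conn.execute("SELECT COUNT(*) FROM BusStop").fetchone()[0]
```

Each stop name is stored once in BusStop; only the small (RouteID, StopID, Time) rows repeat, which is the kind of repetition a relational design expects.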
You are probably aware of this already, but bus scheduling can get complicated very quickly. For example:
You may need to designate certain stops as "Time Points" which show up in the printed schedules
Each route may have multiple variations. For example, some versions may start or end at a different bus stop
The schedule will probably be different on Saturday and Sunday, and most agencies change their schedules quarterly
You may need to consider some of these cases, and build them into your schema.
Does that help?
Here's just one (of the many) ways to do this:
It sounds like you probably want to have a routes table, which describes each route, and has a start time.
Then, a stops table with descriptions and wait times for the bus at each stop.
A stopDistanceMapping table would describe the distance between two stops, and the drive time between them.
Finally, your routeMap table will link individual routes with a list of stops. You can then fill your routes table distance and time in using the wait time from each individual stop, and the times/distances from stopDistanceMapping.
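As a sketch of how the times could be filled in from these tables, the function below walks a route's stop list and accumulates wait and drive times from the route's start time. The function name, the in-memory dictionaries standing in for the tables, and the minute values are all assumptions for illustration.

```python
from datetime import datetime, timedelta

def build_schedule(start_time, stop_ids, wait_minutes, drive_minutes):
    """Return (stop_id, arrival 'HH:MM') pairs for one run of a route.

    stop_ids      -- ordered stops, as the routeMap table would list them
    wait_minutes  -- stop_id -> minutes the bus waits at that stop
    drive_minutes -- (from_stop, to_stop) -> drive time, as in stopDistanceMapping
    """
    clock = datetime.strptime(start_time, "%H:%M")
    schedule = []
    for i, stop in enumerate(stop_ids):
        if i > 0:
            prev = stop_ids[i - 1]
            # Leave the previous stop after its wait, then drive to this one.
            clock += timedelta(minutes=wait_minutes[prev] + drive_minutes[(prev, stop)])
        schedule.append((stop, clock.strftime("%H:%M")))
    return schedule

# Illustrative values only.
route_map = ["A", "B", "C"]
wait_minutes = {"A": 1, "B": 2, "C": 1}
drive_minutes = {("A", "B"): 5, ("B", "C"): 7}

schedule = build_schedule("08:00", route_map, wait_minutes, drive_minutes)
```

Running the same function once per start time gives each of the day's cycles without storing the arrival times by hand.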
Good luck!
On a (very rough) 1st pass, I would keep the bus route times in a table like this:
RouteID StartingLocationID EndingLocationID TravelTime
Also I would keep a table of stops such as:
StopID Address City etc... (whatever other information you need about each location)
For the routes themselves I would store:
RouteID StartingLocationID RouteStartTime
Obviously you should tailor this to your own needs, but this should give you a place to start.