Hi, I just want to organize these fields into tables, but I'm having trouble.
I have details of cities.
These are examples of the main city details:
CityName CityID
Colombo 001
Kandy 002
Kandy is directly connected to these cities:
CityName CityID DistancefromKandy
Katugastota 006 1km
Peradeniya 008 2km
I want to store the distance between every pair of cities: for example, Katugastota to Peradeniya, Colombo to Katugastota, Colombo to Kandy, and Kandy to Peradeniya.
And for a single city I want to store which cities are directly connected to it and the distance to each of those cities.
How can I organize this data into tables? Any ideas? I have given the table structure I tried, but I can't fit the distance between each pair of cities into it, nor the directly connected cities and the distances to them.
I'd appreciate any help with this.
I don't just need the SQL; if someone can suggest a better table design, that would be a great help.
Like #EvilEpidemic suggested above, my first choice would be to store coordinates (latitude & longitude) for each city and calculate the distances between them.
That said, if you need to store your pre-calculated distances for specific pairs of cities, then you may want to try the following:
Add a table that includes two (2) CityID columns (for example, SourceCityId and DestinationCityId) as well as a NOT NULL distance column (of a numeric data type).
For example, in SQL Server you might have a table like (this oversimplified example assumes you store distances as int kilometers, but feel free to change the data type as needed):
CREATE TABLE Distances (
    [SourceCityId] int NOT NULL,
    [DestinationCityId] int NOT NULL,
    [DistanceInKilometers] int NOT NULL,
    CONSTRAINT [PK_Distances] PRIMARY KEY CLUSTERED (
        [SourceCityId] ASC,
        [DestinationCityId] ASC
    )
)
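If you also need the "directly connected cities" part of the question answered from the same table, one option (my assumption, not part of the design above) is a flag column that marks neighbouring pairs; store each pair once and query both directions:

-- Sketch only: the IsDirectlyConnected column is an illustrative addition.
ALTER TABLE Distances ADD [IsDirectlyConnected] bit NOT NULL DEFAULT 0;

-- Kandy (2) is directly connected to Katugastota (6) and Peradeniya (8):
INSERT INTO Distances (SourceCityId, DestinationCityId, DistanceInKilometers, IsDirectlyConnected)
VALUES (2, 6, 1, 1),
       (2, 8, 2, 1);

-- All cities directly connected to Kandy, whichever side of the pair it is on:
SELECT CASE WHEN SourceCityId = 2 THEN DestinationCityId ELSE SourceCityId END AS ConnectedCityId,
       DistanceInKilometers
FROM Distances
WHERE IsDirectlyConnected = 1
  AND (SourceCityId = 2 OR DestinationCityId = 2);

Distances between non-adjacent pairs (e.g. Colombo to Katugastota) can then be stored in the same table with the flag left at 0.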
I want to represent a vehicle (think car or truck) in a database. I have up to 62 pieces of information I'd like to store for each. Examples: year, make, model, drive type, brake system, Mfr. body code, steering type, wheel base, etc. The information consists of Ids that reference a 3rd-party database, which provides the labels for each Id. The provider has 1 table to list all makes, 1 table to list all "steering types", etc.
All vehicles will populate the year, make, and model columns. Almost no record (if any) will populate more than 10 columns. But if I looked at all vehicles, then every column would be used by at least one record.
One approach would be to have a single table that has 62 columns. Again most records will have NULL values in most columns.
Alternatively I can do something like this (ignoring indices and primary key for sake of example):
create table vehicles (
    id int identity(1,1),
    year int,
    make int,
    model int
)

create table constraints (
    id int identity(1,1),
    vehicleId int, -- foreign key to vehicles.id
    constraintTypeId int, -- foreign key to constraintTypes.id
    value int
)

create table constraintTypes (
    id int identity(1,1),
    name nvarchar(200) -- Example: "wheel base", "brake system" etc
)
With this second method if a vehicle only stores 2 pieces of information (aside from year, make, model), then it would have 2 records in table constraints.
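For example (the ids here are purely hypothetical), a vehicle whose only extra attributes are wheel base and brake system would get two rows:

-- Assuming vehicle 1 exists and constraint types 7 = "wheel base",
-- 3 = "brake system" (both ids invented for illustration):
insert into constraints (vehicleId, constraintTypeId, value)
values (1, 7, 112),
       (1, 3, 2);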
Users wish to have a page to view all applications. If I have a table with 62 columns I'd need 62 joins in the query to get the labels. I could store labels on the vehicle to make retrieval faster, but then when labels change in the source data it might be slow to update my vehicles table.
At current there are over 12 million vehicle records, and the source data changes monthly (additions, deletes, and a few label changes).
Is it a better design to have more columns, even if most are always just NULL, or is the second approach better? How does one even calculate the best approach? Even though I have 62 columns, they are all valid for every vehicle, but for cataloging purposes most are left empty. For example, if a record should match any "1999 Dodge Viper" (regardless of steering type, body style, etc.) the user doesn't want to have to populate all 62 columns; they want to just see one record for "1999 Dodge Viper".
Your question is a specific case of the general issue related to data anomalies and normalisation. https://en.wikipedia.org/wiki/Database_normalization
There is no 'right' answer, although experience suggests there are 'better' and 'worse' answers. So here is a question to help you with your planning:
Will the requirements ever change? E.g. will someone one day want to record the brake shoe type, or the driver's seat type? If yes, what are the implications of your 62-column table becoming a 63 (or 99) column table? (In my mind this leads me towards your second method.)
Also remember that, thanks to Views, the presentation of the data, even in the DB, does not have to match its storage. E.g. you can have well-normalised tables and a view to show users 62 (or 63 or 99) columns.
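A minimal sketch of that idea against the tables above (only two of the 62 attributes are shown; the view name and the left-join-per-attribute pattern are my assumptions):

create view vehicleWide as
select v.id,
       v.year,
       v.make,
       v.model,
       wb.value as wheelBase,   -- NULL when the vehicle has no such row
       bs.value as brakeSystem
from vehicles v
left join constraints wb
    on wb.vehicleId = v.id
   and wb.constraintTypeId = (select id from constraintTypes where name = 'wheel base')
left join constraints bs
    on bs.vehicleId = v.id
   and bs.constraintTypeId = (select id from constraintTypes where name = 'brake system')

Queries and users see a wide row while storage stays normalised; adding a 63rd attribute is a new constraintTypes row plus a view change, not a table rebuild.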
So I am an SQL newbie, and I would like to organize a structured database that has metadata and data in SQLite. I am not sure how to do this, and I have looked around at different internet sites, but I haven't found anything helpful.
Basically what I would want is something like this (using different data collection stations as an example):
SQL TABLE:
Location
Lat
Long
Other Important info about the station
And then somehow, when I query this table and want to see a specific station's data, I would be able to pull up data that would look something like this:
datetime data
1/1/1980 11.6985
1/2/1980 43.6431
1/3/1980 54.9089
1/4/1980 63.1225
1/5/1980 72.4399
1/6/1980 79.1363
1/7/1980 82.2778
1/8/1980 86.0785
1/9/1980 86.8612
1/10/1980 84.3342
1/11/1980 80.4646
1/12/1980 77.1508
1/13/1980 74.827
1/14/1980 73.387
1/15/1980 72.1774
1/16/1980 71.6423
Since I don't know much about table hierarchy, I don't know how to do this, but I feel like it is probably possible. Any help would be appreciated!
using different data collection stations
Immediately indicates that a separate table for stations should be used and that the readings should relate to/associate with/reference the stations table.
For the stations table you could have something like :-
CREATE TABLE IF NOT EXISTS stations (id INTEGER PRIMARY KEY, station_name TEXT, station_latitude REAL, station_longitude REAL);
This will create a table (if it doesn't already exist) that has 4 columns :-
The first column id is a unique identifier that will be generated automatically and is what you would use to reference a specific station.
The second column, station_name, is for the name of the station and is of type TEXT.
The third and fourth columns are for the station's location as latitude and longitude.
You could add a couple of stations using :-
INSERT INTO stations (station_name, station_latitude,station_longitude) VALUES("Zebra", 100.7892, 60.789);
INSERT INTO stations (station_name, station_latitude,station_longitude) VALUES("Yankee", 200.2967, 95.234);
You could display/return these using :-
SELECT * FROM stations
that is, SELECT all columns (*) FROM the table called stations. The result would be :-

id  station_name  station_latitude  station_longitude
1   Zebra         100.7892          60.789
2   Yankee        200.2967          95.234
Next you could create the readings table e.g. :-
CREATE TABLE IF NOT EXISTS readings(recorded_datetime INTEGER DEFAULT (datetime('now')), data_recorded REAL, station_reference INTEGER);
This will create a table named readings (if it doesn't already exist); it will have 3 columns :-
recorded_datetime, which is of type INTEGER (it can store an integer of up to 8 bytes, i.e. pretty large). This will be used to store a timestamp. Although perhaps not what you want, as an example a default value, the current datetime, will be used if no value is specified for this column.
data_recorded, a REAL, which is for the data itself.
station_reference, which will refer to the station's id.
You could then insert a reading for the Zebra station using :-
INSERT INTO readings (data_recorded,station_reference) VALUES(11.6985,1);
As the recorded_datetime column has not been provided, the current datetime will be used.
If :-
INSERT INTO readings VALUES(datetime('1980-01-01 10:40'),11.6985,1);
Then this reading would be for 1/1/1980 at 10:40 for station 1.
Using :-
INSERT INTO readings VALUES(datetime('1980-01-01 10:40'),13.6985,2);
INSERT INTO readings VALUES(datetime('1966-03-01 10:40'),15.6985,2);
INSERT INTO readings VALUES(datetime('2000-01-01 10:40'),11.6985,2);
This will add some readings for the Yankee station (id 2).
Using SELECT station_reference, recorded_datetime, data_recorded FROM readings; will select all the columns, but with station_reference as the first column in the result, and so on.
The obvious progression is to display the data including the respective station. For this a JOIN will be used. That is, the readings table will be joined with the stations table, the respective station's details being found where the station_reference value matches the station's id.
However, let's say that we wanted the Station info to be something like stationname (Long=???? - Lat=????) date/time data and be sorted according to station name and then according to date/time. Then the following could be used :-
SELECT
stations.station_name ||
'(Long='||station_longitude||' - Lat='||station_latitude||')'
AS stationinfo,
readings.recorded_datetime,
readings.data_recorded
FROM readings
JOIN stations ON readings.station_reference = stations.id
ORDER BY stations.station_name ASC, readings.recorded_datetime
Note this is shown more as an example of the quite complex things you can do in SQL than with any expectation of fully understanding the coding.
This would result in one row per reading, prefixed with the station info and sorted by station name and then date/time.
You may (or some would) ask: why can't I just have a single table with reading, datetime, station name, station latitude and station longitude?
Well you could. BUT :-
Say there were a directive to change station Zebra's name; then you'd have to trawl through all the rows and change the name numerous times. Easy to code but relatively costly in terms of resource usage. Using the two tables means that just a single update is required.
For each row you would have to duplicate data, very likely wasting disk space and thus increasing the resources needed to access the data. That is, Zebra would take at least 5 bytes, and REALs take 8 bytes (2 of them), so that's 21 bytes. A reference to one occurrence of that repeated data would take a maximum of 8 bytes for an integer (initially just a single byte). So there would be a cost of roughly 13 bytes per reading.
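As a refinement (my assumption; the table above doesn't declare it), the relationship could be made explicit with a foreign key, so SQLite rejects readings that reference a non-existent station. If the readings table were created from scratch :-

-- Foreign key enforcement must be switched on per connection in SQLite
PRAGMA foreign_keys = ON;

CREATE TABLE IF NOT EXISTS readings(
    recorded_datetime INTEGER DEFAULT (datetime('now')),
    data_recorded REAL,
    station_reference INTEGER REFERENCES stations(id)
);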
I have country, region, county, town data and I'm currently deciding between 2 schema designs (if there's a better one, do tell).
I first thought
Country
  Id
  Name
Region
  Id
  CountryId
  Name
County
  Id
  RegionId
  Name
Town
  Id
  CountyId
  Name
This does the job; however, to get all the towns in a country you need 3 inner joins to do the filtering (sketched below). I guess this could be OK, but potentially expensive?
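For illustration, the towns-in-a-country query under this first design would look something like the following (names taken from the sketch above; 'France' is just a placeholder):

SELECT Town.Name
FROM Town
INNER JOIN County  ON Town.CountyId    = County.Id
INNER JOIN Region  ON County.RegionId  = Region.Id
INNER JOIN Country ON Region.CountryId = Country.Id
WHERE Country.Name = 'France';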
The other design was:
Country
  Id
  Name
Region
  Id
  Name
County
  Id
  Name
Town
  Id
  CountryId
  RegionId
  CountyId
  Name
This way all the hierarchical data, so to speak, is at the bottom and you can go back up. However, if you want all the regions in a country you're a bit screwed, which makes me wonder whether the first design is best.
What do you think is the best schema design?
The best database design depends on how the data is being used.
If this is pretty static data that will all be updated at one time and external references are all to towns, then I would probably go for a denormalized dimension. That is, store the information all in one row:
Town Id
Town name
County name
Region name
Country name
Under the above scenario, the ids for county, region, and country are not necessary (by assumption).
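A minimal sketch of such a denormalized dimension (the names and sizes are illustrative assumptions):

CREATE TABLE TownDim (
    TownId      int PRIMARY KEY,
    TownName    varchar(100) NOT NULL,
    CountyName  varchar(100) NOT NULL,
    RegionName  varchar(100) NOT NULL,
    CountryName varchar(100) NOT NULL
);

-- "All towns in a country" becomes a single-table query:
SELECT TownName FROM TownDim WHERE CountryName = 'France';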
If the data is being provided as separate tables with separate ids, and these tables can be updated independently or row-by-row, then a separate table for each makes sense. Putting all the ids into the towns table may or may not be a good idea. You will have to verify and maintain the hierarchies when data is inserted and updated.
If ids for each level are necessary for your application, then you should have an appropriate table structure for declaring foreign key constraints. But this can get complicated. Will an external entity have a "geography" attribute that can be at any level? Will an external entity always know which level it is going to refer to?
In other words, you need to know how the data is going to be used in order to define an appropriate data model.
I have 5 tables that have the same structure and the same columns: id, name, description. So I wonder what the best way is to design this and avoid having 5 tables with the same columns:
1. Create a category table that includes my three common columns and another column "enum" that differentiates my categories, e.g. (city, country, continent, etc.).
2. Create a category table that includes my three common columns, and create the other five tables so that they just include an id.
Note that I would have an association table that should include the id of cities, the id of countries, the id of continents, etc., so I can display them in a report.
Thank you for your advice.
The decision on how many tables to have under these circumstances simply depends.
The most important factor is whether the five things are independent entities or whether they are related. A simple way to understand this is by understanding foreign key relationships: Will other tables have a column that could refer to any of the five (say "geoid")? Or will other tables have a column that generally refers to one of the five ("cityid", "countryid")? The ability to define helpful foreign key constraints often drives the table structure.
There are other considerations. If your data is at the geographic level, then it might represent hierarchies: cities are in countries, countries are on continents. Some databases (such as MySQL before version 8.0) do not support hierarchical queries at all. Under these circumstances, you might consider denormalizing the data for querying purposes.
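For engines that do support them, hierarchical lookups are usually written as recursive CTEs. A sketch, assuming (my assumption, not something from the question) a single self-referencing places table with a parent_id column:

WITH RECURSIVE descendants AS (
    SELECT id, name, parent_id
    FROM places
    WHERE name = 'Europe'          -- start at the continent
    UNION ALL
    SELECT p.id, p.name, p.parent_id
    FROM places p
    JOIN descendants d ON p.parent_id = d.id
)
SELECT * FROM descendants;         -- the continent, its countries, their cities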
Other considerations can also come into play. If your application is going to be internationalized, then having all the reference tables in a single place is handy -- for providing cultural-specific information (language, currency symbol, and so on). One way of handling this situation is to put all such references in a single table (and perhaps using more sophisticated foreign key relationships).
The column names are not important, just the data in the columns. If city description, country description and continent description are different information, then you are already doing this the right way. The only time you would aim to reduce this data would be if you were repeating information; for the titles of the data it's fine.
In fact, you are doing this correctly. Country will have different values from city for every field mentioned. Id is just an id; every table should have one. Name and description won't be the same across country and city.
Also, this way if you want a country's name you don't have to go through every country, continent and city; you only have 192 or so entries to go through. If you had all of that in one massive table you would have to use it for everything and go through every result every time you wanted data. You would also have to distinguish between cities, countries and continents in some other way than the separate tables.
E.g.:
method 1, with 5 tables:
SELECT * FROM country
does the same as
method 2, with 1 table:
SELECT * FROM category WHERE enumvalue = 'country';
If you have tables representing city, country and continent, and they all have exactly the same fields, you have a fundamental problem. In real life, every city is in a country and every country is in at least one continent (more or less) but your data structure does not reflect that. Your city table should look something like this:
id (primary key)
countryId (foreign key to country)
name
other fields
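As concrete DDL, that might look something like this (a sketch; the foreign key assumes a country table keyed on id):

CREATE TABLE city (
    id        INTEGER PRIMARY KEY,
    countryId INTEGER NOT NULL REFERENCES country(id),
    name      VARCHAR(100) NOT NULL
    -- other fields go here
);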
You'll need a similar relationship between countries and continents. However, before you do so, you have to decide what to do about countries like Russia which is in two continents and Palau which isn't really in any.
You may also want to have a provinceStateTerritory table to help you sort out the 38 places in the United States named Springfield. Or, you may want to handle that situation differently.
Suppose I have the following:
Table region_city
id name parent_id
==============================
1 North null
2 South null
3 Manchester 1
4 London 2
In my user table I store the ID of the City that the user is in.
Now in my search form I need to be able to perform a top-level search, i.e. find all Users that belong to a given region (North or South).
Will it make life easier if I included a region_id field in my user table? Or is that going against the normalisation concept?
It does denormalize the table structures, and it could introduce data update anomalies. Consider: the user moves from Manchester to London and the city_id changes, but the region_id could still point to the North.
The region_id depends only on the city, so it does not belong in the user table; it can be derived from the city.
If the design absolutely calls for only two levels (region and city), and you are willing to forgo the possible addition of other levels in the future (not a decision I would be inclined to make, but you know your data better than I do), then do not include the regionID in your user table; that would denormalize your database. Instead, you have several choices for representing the data (including two related tables, region and city), and you would perform your search by JOINing the city table to the user table or by using an IN clause in your search.
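With the self-referencing region_city table from the question, the region search costs one extra join. A sketch (the users table and its city_id column are assumptions based on the question's description):

SELECT u.*
FROM users u
JOIN region_city city   ON u.city_id = city.id
JOIN region_city region ON city.parent_id = region.id
WHERE region.name = 'North';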