How to represent a set of entities from separate tables? - sql

I have a few tables representing geographical entities (CITIES, COUNTIES, ZIPCODES, STATES, COUNTRIES, etc).
I need to way represent sets of geographical entities. A set can contain records from more than one table. For example, one set may contain 3 records from CITIES, 1 record from COUNTIES and 4 from COUNTRIES.
Here are two possible solutions:
A table which contains three columns - one record for each entity. The table will contain multiple records for each set, all sharing the the set number.
set_id INT, foreign_table VARTEXT(255), foreign_id INT
Sample entries for set #5:
(5,'CITIES',4)
(5,'CITIES',12)
(5,'ZIPCODES',91)
(5,'ZIPCODES',92)
(5,'COUNTRIES',15)
A table which contains a TEXT column for each entity type, which will include a string set with the appropriate entries:
set_id INT,cities TEXT,counties TEXT,zipcodes TEXT,states TEXT,countries TEXT
So the above set will be represented with a single record
(5,'4,12','','91,92','','15')
Any other ideas? Would love to hear your input.
Thanks!

Both solutions you propose don't have real foreign keys. In the first solution, one foreign_id can point to many tables, which is hard (or at least inefficient) for a database to enforce. The second solution stores multiple values in one column, which is the one thing everyone agrees you shouldn't do (it breaks first normal form.)
What I would do is this: cities, zip codes, and states all "have a" geographical location. The normal way to implement that is a one to many relation. Create a geolocation table, and add a geolocation_id column to the cities, zip code, and state tables.
EDIT: Per your comment, to get from a geolocation to its cities:
select *
from geolocation g
left join cities c
on g.id = c.geolocation_id
left join zipcodes z
on g.id = z.geolocation_id
....
The database will resolve the joins using the foreign key index, which is very fast.

One Location Set can have many Geography items
One Geography item can belong to many Location Sets
Regarding the Geography item table, two approaches are possible. In the first case the super-type/subtype relationship is overlapping -- more than one sub-type can be linked to the super-type.
For example, there can be GeographyID = 5 in Geography and Zipcodes, Cities, States, Countries tables.
In the second case, we can consider the exclusive (disjoint) relationship, in which only one subtype can be connected to the super-type. The parent-child relationship is used to create paths, like ZIP/City/State/Country -- that is if actual administrative areas allow for this type of relationship.
In this example, there can be GeographyID = 5 in the Geography and only one more sub-type table.

Related

Design Sql Tables with common columns

I have 5 tables that have the same structure and same columns: id, name, description. So I wonder what is the best way to design or to avoid having 5 tables that have the same columns:
Create a category table that will include my three common
columns and another column "enum" that will differentiate my categories
ex (city, country, continent, etc.)
Create a category table that will include my three common
columns and create the other five tables that will just include an
id.
Note that I would have an assocation table that should include the id of cities, id countries, id continents, etc. so i can display them into a report
Thank you for your advice.
The decision on how many tables to have under these circumstances simply depends.
The most important factor is whether the five things are independent entities or whether they are related. A simple way to understand this is by understanding foreign key relationships: Will other tables have a column that could refer to any of the five (say "geoid")? Or will other tables have a column that generally refers to one of the five ("cityid", "countryid")? The ability to define helpful foreign key constraints often drives the table structure.
There are other considerations. If your data is at the geographic level, then it might represent hierarchies . . . cities are in countries, countries are on continents. Some databases (such as MySQL) do not support hierarchical queries at all. Under these circumstances, you might consider denormalizing the data for querying purposes.
Other considerations can also come into play. If your application is going to be internationalized, then having all the reference tables in a single place is handy -- for providing cultural-specific information (language, currency symbol, and so on). One way of handling this situation is to put all such references in a single table (and perhaps using more sophisticated foreign key relationships).
The column names are not important, just the data in the columns. If City description, country description and continent description are different information then you are already doing this the right way. The only time you would aim to reduce this data would be if you were repeating information but for the titles of the data it's fine.
In fact. You are doing this correctly. Country will have different values from city for every field mentioned. Id is just an id, every table should have one. Name and description wont be the same across country and city.
Also, this way if you want a countrys name you dont have to go through every country, continent and city. You only have 192 or so entries to go through. If you had all of that in one massive table you would have to use it for everything and go through every result every time you want data. You would also have to distinguish between cities, countries and continents in some other way than the separate tables.
Eg:
method 1, with 5 tables:
SELECT * FROM country
does the same as
method 2, 1 table:
SELECT * FROM table WHERE enumvalue = 'country';
If you have tables representing city, country and continent, and they all have exactly the same fields, you have a fundamental problem. In real life, every city is in a country and every country is in at least one continent (more or less) but your data structure does not reflect that. Your city table should look something like this:
id (primary key)
countryId (foreign key to country)
name
other fields
You'll need a similar relationship between countries and continents. However, before you do so, you have to decide what to do about countries like Russia which is in two continents and Palau which isn't really in any.
You may also want to have a provinceStateTerritory table to help you sort out the 38 places in the United States named Springfield. Or, you may want to handle that situation differently.

Can this table structure work or should it change

I have an existing table which is expected to work for a new piece of functionality. I have the opinion that a new table is needed to achieve the objective and would like an opinion if it can work as is, or is the new table a must? The issue is a query returning more records than it should, I believe this is why:
There is a table called postcodes. Over time this has really become a town table because different town names have been entered so it has multiple records for most postcodes. In reference to the query below the relevant fields in the postcode table are:
postcode.postcode - the actual postcode, as mentioned this is not unique
postcode.twcid - is a foreign key to the forecast table, this is not unique either
The relevant fields in the forecast table are:
forecast.twcid - identifyer for the table however not unique because there four days worth of forecasts in the table. Only ever four, newver more, never less.
And here is the query:
select * from forecast
LEFT OUTER JOIN postcodes ON forecast.TWCID = postcodes.TWCID
WHERE postcodes.postcode = 3123
order by forecast.twcid, forecast.theDate;
Because there are two records in the postcode table for 3123 the results are doubled up. Two forecasts for day 1, two for day 2 etc......
Given that the relationship between postcodes and forecast is many to many (there are multiple records in the postcode tables for each postcode and twcid. And there are multiple records for each twcid in the forecast table because it always holds four days worth of forecasts) is there a way to re-write the query to only get four forecast records for a post code?
Or is my thought of creating a new postcode table which has unique records for each post code necessary?
You have a problem that postcodes can be in multiple towns. And towns can have multiple postcodes. In the United States, the US Census Bureau and the US Post Office have defined very extensive geographies for various coding schemes. The basic idea is that a zip code has a "main" town.
I would suggest that you either create a separate table with one row per postcode and the main town. Or, add a field to your database indicating a main town. You can guarantee uniqueness of this field with a filtered index:
create unique index postcode_postcode_maintown on postcodes(postcode) where IsMainTown = 1;
You might need the same thing for IsMainPostcode.
(Filtered indexes are a very nice feature in SQL Server.)
With this construct, you can change your query to:
select *
from forecast LEFT OUTER JOIN
postcodes
ON forecast.TWCID = postcodes.TWCID and postcodes.IsMainPostcode = 1
WHERE postcodes.postcode = 3123
order by forecast.twcid, forecast.theDate;
You should really never have a table without a primary key. Primary keys are, by definition, unique. The primary key should be the target for your foreign keys.
You're having problems because you're fighting against a poor database design.

Reproduce certain rows in PostgreSQL

I have a table with IDs that represent Individuals (call it id_table) and another table with characteristics an individual can have (call it char_table).
Now I want to add a column to the id_table containing certain values from the char_table. The difficulty, for me, is, that some ID shall get only one characteristic (simple case) and some ID shall get several characteristics. For that reason the rows with these ID that get several characteristics must be reproduced so often until I have matched every characteristic to that ID.
For Example:
ID '001' shall get the characteristics 'a', 'b', and 'c'. So the row ID='001' must be reproduced 2 times (to get the same row 3 times) and each of these rows shall get one of these 3 characteristics.
I hope I explained intelligible enough.
Anyone an idea how to do this?
Thanks.
In almost all cases, you want to do this with an association or junction table. This table would like like:
create table IndividualCharacteristicsr (
individualId int not null,
characterId int not null
);
(It might also have a unique id itself.)
Each characteristic an individual has would be on a separate row. So, three characteristics mean three rows. A typical query using this information would join all three tables:
from id_table i left outer join
IndividualCharacteristicsr ic
on i.individualId = ic.individualId left outer join
char_table c
on c.characteristicId = ic.characteristicId
(The left outer join includes individuals with no characteristics.)
In Postgres, you could store the characteristics in a string or an array. Neither is very convenient for joining back to id_char. The better approach is an additional table.

PowerPivot relationship based on two columns per table

I have a table geoLocations which holds among others the two columns latitude and longitude. There is a second table (let's name it cities), which holds for each unique pair of laitude and longitude the corresponding city.
How can I model this relationship using PowerPivot? Creating two separate relationships will fail, as one coordinate may occur several times in the lookup-table (only the combination of both is unique).
Thanks in advance :)
You should create a key in each table that concatenates the 2 fields together. This could be done in a calculate column eg:
= LATITUDE & "-" & LONGITUDE
Or on import which would yield better performance on a large data set.
Jacob

Can a many-to-many join table have more than two columns?

I have some tables that benefit from many-to-many tables. For example the team table.
Team member can hold more than one 'position' in the team, all the positions are listed in the position db table. The previous positions held are also stored for this I have a separate table, so I have
member table (containing team details)
positions table (containing positions)
member_to_positions table (id of member and id of position)
member_to_previous_positions (id of member and id of position)
Simple, however the crux comes now that a team member can belong to many teams aghhh.
I already have a team_to_member look-up table.
Now the problem comes how do I tie a position to a team? A member may have been team leader on one team, and is currently team radio man and press officer on a different team. How do I just pull the info per member to show his current position, but also his past history including past teams.
Do I need to add a position_to team table and somehow cross reference that, or can I add the team to the member to positions table?
It's all very confusing, this normalization.
Yes, a many-to-many junction table can have additional attributes (columns).
For example, if there's a table called PassengerFlight table that's keyed by PassengerID and FlightID, there could be a third column showing the status of the given passenger on the given flight. Two different statuses might be "confirmed" and "wait listed", each of them coded somehow.
In addition, there can be ternary relationships, relationships that involve three entities and not just two. These tables are going to have three foreign keys that taken together are the primary key for the relationship table.
It's perfectly legitimate to have a TeamPositionMember table, with the columns
Team_Id
Position_Code
Member_Id
Start_Date
End_Date NULLABLE
And and a surrogate ID column for Primary Key if you want; otherwise it's a 3-field composite Primary Key. (You'll want a uniqueness constraint on this anyway.)
With this arrangement, you can have a team with any set of positions. A team can have zero or more persons per position. A person can fill zero or more positions for zero or more teams.
EDIT:
If you want dates, just revise as shown above, and add Start_Date to the PK to allow the same person to hold the same position at different times.
My first thought:
Give your many-to-many teams/members table an ID column. Every team-to-member relationship now has an ID.
Then create a many-to-many linking positions to team-member relationships.
This way, teams can have multiple members, members can have multiple teams, and members can have multiple positions on a per-team basis.
Now everything is nice and DRY, and all the linking up seems to work. Does that sound right to anyone else?
Sounds like you need a many-to-many positions to teams table now.
Your team_to_member table can indeed have an extra column position_id to describe (or in this case point to) the position the member has within that team.
Get rid of member_to_previous_position table. Just use member_to_positions and have these columns:
MemberToPositionID (autoincrement OK only)
MemberID
PositionID
StartDate
EndDate
Then to find current positions, you do:
select *
from member_to_positions
where EndDate is null