Designing database schema for models with multiple sorts of data

This question is about database design and how to best split your
entities when they serve more than one purpose.
My database schema models sports events around the world: the type of
event and sport, when and where it is played, who the participants
are, who won, and so on. One of the entities in the schema is
Country, which records the country in which each sports event took
place.
This works well, but then I also need to add auxiliary data to the
Country model, which is not related to the sports events model per se,
but is required for rendering the data nicely on a web page.
Examples of that data are the offset of the country's flag in the
sprite image, a long description of the country, the adjective for the
country (China - Chinese etc.), the number of visitors on the country
page, and the subjective importance of the country on a scale from 1-5
(events in countries rated five are shown on the front page).
I could easily put all those attributes on the Country object itself,
but it seems wrong, and pollutes my clean sports event schema. I don't
think the structure of the data should be mixed with details like how
to render it nicely... So the question is how I should organise it
instead?

I would keep this data on a 'Countries' table and create a FK to your sports data. This will allow you to easily maintain each Country's attributes as you will only have to update them in one place and the changes will take effect everywhere it is referenced. I don't think this pollutes your data as the information is relevant to your application. Furthermore, if you do try to separate this data it will only make your schema more complex and maintenance more difficult.
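
A minimal sketch of that layout, using Python's built-in sqlite3 module as a stand-in for whatever database the question actually uses. The table and column names (countries, events, flag_sprite_offset, and so on) are assumptions made up for illustration, not taken from the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to

conn.executescript("""
CREATE TABLE countries (
    country_id         INTEGER PRIMARY KEY,
    name               TEXT NOT NULL UNIQUE,
    -- presentation attributes (hypothetical names) live next to the core data
    adjective          TEXT,
    description        TEXT,
    flag_sprite_offset INTEGER,
    page_visits        INTEGER NOT NULL DEFAULT 0,
    importance         INTEGER CHECK (importance BETWEEN 1 AND 5)
);
CREATE TABLE events (
    event_id   INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES countries(country_id)
);
""")

conn.execute(
    "INSERT INTO countries (country_id, name, adjective, importance) "
    "VALUES (1, 'China', 'Chinese', 5)"
)
conn.execute("INSERT INTO events (title, country_id) VALUES ('Table tennis open', 1)")


def front_page_events(conn):
    """Events held in countries rated 5, with the country's display data."""
    return conn.execute(
        """
        SELECT e.title, c.name, c.adjective
        FROM events e
        JOIN countries c ON c.country_id = e.country_id
        WHERE c.importance = 5
        ORDER BY e.event_id
        """
    ).fetchall()
```

Because events only store country_id, changing a country's adjective or importance is a single UPDATE on countries, and every query that joins to it picks up the new value.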

Related

Wikipedia-typed Django models relations

I need to connect articles with the people mentioned in them, where the people come from different fields, for example science, cinema, music, etc. How should I build the models in this situation? I'm sure it is a many-to-many relation, but how do I control the third (join) table, and what data does it need to make the connection? I need a model that holds the name of the article, the text of the article, the field (cinema), and the person (50 Cent). How do I deal with the fact that one person may be related to several fields, for example music and cinema?

Object oriented model for tables in MS Access

I have an MS Access database with several tables. Almost all tables contain inventory information about different classes of items (there are some utility tables which store extra information, such as a list of classes and lists of commonly used lookup values). Some classes of items have particular data specific to them - for instance, volume is relevant for liquids but not solid objects, but all objects have a location. The logical structure of my database is a textbook example of a case where an object oriented model provides clarity and maintainability benefits:
There is one basic table which is a catch-all table for all items that don't fit into other categories. It contains a few columns, like item name, date, location and notes, that are applicable to any item. This would be the top superclass, e.g. class InventoryTable.
There are tables for specific classes, such as a table for printer cartridges. This table will have all the columns that InventoryTable has, but also include some specialized information that is only relevant for printer cartridges, such as printer model, ink color and brand. This table would be a subclass, e.g. class PrinterCartridgeTable : InventoryTable.
Sometimes there is a deeper inheritance structure. For example, there may be a table for all documents (class DocumentTable : InventoryTable, includes extra field for how many pages a document has) and then another table for letters (class LetterTable : DocumentTable which also has columns for sender and recipient of the letter). The assumption is that one would look for letters in the LetterTable, and if not found there, could try looking in the DocumentTable and the top level InventoryTable.
Let's say my dates are currently displayed as MM/DD/YYYY. I want to change them to ISO format (YYYY-MM-DD). Currently, I have to open every single table I have (about 20) and change the format in each one of them one by one. If there was some kind of inheritance mechanism, I could instead change the format only in my top-level InventoryTable, and all my other tables would inherit the change.
Or, suppose I decide to store a new piece of data, called "Owner", for all items. This would describe who entered the item into the inventory. I could simply add this column to InventoryTable, and it would appear in all the child tables automatically.
Lastly, let's say I make cosmetic changes such as rearranging the order of columns. Let's say in my document-related tables, the page number appeared at the end. I instead move the page number to the very beginning of the table - this would propagate to both DocumentTable as well as LetterTable but not unrelated tables.
Bear in mind that I am editing these tables manually using the GUI of MS Access 2013. When editing information pertaining to a single class of items, I would not like to switch back and forth between tables or queries to edit different parts of the same record - I want to be able to see and edit all of the information for any given record in one place. Therefore, some complicated solutions based on chaining queries may be impractical.
Is it possible for me to accomplish what I want (the inheritance structure) in Access using some kind of object oriented scheme? Is there an alternative way of obtaining the same benefits? Do I have no choice except to give up and manually propagate every change to all tables?

The relational data model does not have inheritance built in. There are several design patterns that allow the database designer to mimic the behavior of inheritance in a system of relational tables. Two common designs are known as "Single Table Inheritance" and "Class Table Inheritance". There are two tags in this area with questions that relate to these two techniques, and a brief description in the info under the tag. With one of these two techniques, you will be able to model a superclass/subclass situation.
For a more complete description, you could search for Martin Fowler's treatment of the two techniques on the web. There is a third technique, called "Shared Primary Key" which allows you to enforce the one-to-one nature of the IS-A relationship between members of the subclasses and members of the superclass.
Your big problem in MS Access is going to be implementing the code that these techniques leave to the application programmer. Get ready to do plenty of coding in VBA, and tying this code to the user's dashboard.
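
To make the "Class Table Inheritance" plus "Shared Primary Key" idea concrete, here is a rough sketch. It uses Python's sqlite3 module rather than Access, purely so that it is self-contained and runnable; the table names (inventory, document, letter), the view letter_full, and the helper add_letter are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- Superclass: columns shared by every item
CREATE TABLE inventory (
    item_id  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    location TEXT,
    notes    TEXT
);
-- Subclass: its primary key IS the parent's key (shared primary key),
-- so each document row matches exactly one inventory row
CREATE TABLE document (
    item_id INTEGER PRIMARY KEY REFERENCES inventory(item_id),
    pages   INTEGER
);
-- Sub-subclass: same trick, one level deeper
CREATE TABLE letter (
    item_id   INTEGER PRIMARY KEY REFERENCES document(item_id),
    sender    TEXT,
    recipient TEXT
);
-- One place to see a whole letter record
CREATE VIEW letter_full AS
SELECT i.item_id, i.name, i.location, d.pages, l.sender, l.recipient
FROM letter l
JOIN document d  ON d.item_id = l.item_id
JOIN inventory i ON i.item_id = d.item_id;
""")


def add_letter(conn, name, location, pages, sender, recipient):
    """Insert one letter as three rows that share a single item_id."""
    cur = conn.execute(
        "INSERT INTO inventory (name, location) VALUES (?, ?)", (name, location)
    )
    item_id = cur.lastrowid
    conn.execute("INSERT INTO document (item_id, pages) VALUES (?, ?)", (item_id, pages))
    conn.execute(
        "INSERT INTO letter (item_id, sender, recipient) VALUES (?, ?, ?)",
        (item_id, sender, recipient),
    )
    return item_id


letter_id = add_letter(conn, "Lease renewal", "Shelf 3", 2, "Landlord", "Tenant")
row = conn.execute(
    "SELECT * FROM letter_full WHERE item_id = ?", (letter_id,)
).fetchone()
```

A column common to all items is defined once, in inventory. The cost is the one mentioned above: inserts and edits touch several tables, and that glue (here add_letter, in Access probably VBA behind a form) has to be written by hand.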

It is not possible to make tables in Access object-oriented because it is not possible to directly associate methods with tables. An object is defined to be both properties and methods. Access is not designed to do that.
Also note that Access is not the best that Microsoft has to offer. You will get more power and capabilities with SQL Server.

How to design an entity which is sub-entity of similar entity in UML diagram?

I'm designing an XML and Oracle database based on the following specification. The question looks confusing, but it is very simple. I am trying to design a UML diagram for the following:
A multinational network of hotels owns the hotels in the countries all over the world. etc. The
network of hotels is described by a name and an address of its headquarters. An address of
headquarters consists of country, city, street and building number and it optionally includes
phone and email address.
The network has many hotels in many different countries. However each one of its hotels is
located in a different city. A hotel is described by a name and address that consists of country,
city, street and building number and it optionally includes phone and email address. A name of a
hotel together with a city name uniquely identifies each hotel, e.g. Sheraton at Suwa, or Holiday
Inn at Port Villa.
My problem is with the first paragraph. When we say "The
network of hotels is described by a name and an address of its headquarters.", does it mean that one hotel can have one or more headquarters (because it uses the plural "headquarters")? Is the following diagram correct, or do I need to create another entity, define headquarters, and then say that "headquarters" contains "HOTEL"?

The diagram is almost correct; I would just use a composition or aggregation instead of an association.

This looks like a "homework" question. If it were a real task, you MUST get the requirements clarified before any serious analysis, design, and coding takes place.
So in the real world, if a sentence in the requirements is not clear, stop and ask your customer to make it clear to you.
In the "homework" world, it looks to me like hotels and headquarters are two completely different things that may exist in completely different places and different buildings.
You can have headquarters in a business center in a big city with owner, secretary, accounting etc.
Then you can have 10 hotels at different vacation resorts spread near the sea.
For the modeling of headquarters I'd use a hierarchical, recursive, tree-style data structure, as the structure of company owners, headquarters, etc. should be very flexible. A company can be bought easily and its headquarters moved anywhere (at least from the point of view of the law). Hotels, on the other hand, are physical entities that cannot change their address or move to another city once built from bricks, iron, and glass.
For tips on how to model the headquarters data structure, you can use Google or the following older, related Stack Overflow question: Relationships in a UML class diagram

How far can I take this database design?

I am interested in knowing the pros and cons of creating a custom system supported by a database like the one described below:
It has 6 tables that support it.
Entity: Let's say, anything "physical" that can exist and have detail stored against it
(Hilton Hotel, Tony Taxi, One Bar)
Entity Type: A grouping/type of entity
(Bar, Hotel, Restaurant)
Metadata: Any detail describing or belonging to an entity item
(IR232PH, foo#bar.com, 555-555-555)
Metadata Type: A grouping/type of metadata
(Post Code, Telephone, Email, address)
Entity Relationship: The ability to group any entity item to another
(Entity1-Entity2, Entity3)
Entity Relationship Type: The grouping/type of entity relationship.
I can see how this model is good for entities that are similar but don't always have the same number of attributes.
What are the pro/cons of using it as it is for entities as described?
An artist can be performing (relationship type) at a venue.
An artist can be supporting (relationship type) another artist
What would be the pro/cons of using it also to store more standard entities like users of the system?
A user can have a favourite (relationship type) venue/artist/bar etc
A user can have an attending (relationship type) event
Would you take it as far as having the news and blog posts in it?

This is highly subjective, but before I went up the abstraction ladder to where you are suggesting, I'd rather code my application to use DDL to modify the database schema to match the concrete aspects of the actual entities it was using, rather than having a static schema abstracted so far as to be able to store data about any potential entities.
In a way, to be a bit facetious, IMHO, what you are suggesting has already been done.... It is called a Relational Database. Every RDBMS is a software tool designed to be able to model any possible set of entities, and their attributes, in a way that accurately models those entities and the relationships between them.

Although you can certainly store the data in such a data model, there are a couple of problems (at least) with it.
The first problem is controlling the data. When an 'hotel' is described, what is the set of attributes and metadata that must be defined? Which metadata types can legitimately be entered for an hotel? Related to that is 'when I delete an hotel from the list, what else do I have to delete'? When I delete all hotels from the list (and I never want to store information about hotels again), what else do I have to delete? It is terrifically (terrifyingly?) easy to get all sorts of stray extraneous, unreferenced data into the database.
The second problem is retrieving the data. Suppose I want to know all the information about a specific hotel. How do I write a query for that? Actually, even inserting the data is hard, but selecting it is, if anything, harder. If I only want three attributes, it is easy - if the hotel actually has them all. It is harder if the hotel only has two of the three specified. But suppose the hotel has 30 attributes, which is not a lot. Then it is terrifically difficult.
What you are describing is a souped-up version of a model known as the EAV or Entity-Attribute-Value model of data. It is generally accepted to be a 'bad idea', for all it is a common idea.
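
To illustrate the retrieval problem, here is a small sketch using Python's sqlite3 module. It collapses the question's six tables down to two (entity and metadata, with the metadata type stored as plain text), which is a simplifying assumption; the function name fetch_attributes is invented. Turning attributes back into columns takes one LEFT JOIN per attribute, so that hotels missing an attribute still show up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    entity_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE metadata (
    entity_id INTEGER NOT NULL REFERENCES entity(entity_id),
    meta_type TEXT NOT NULL,   -- e.g. 'Post Code', 'Telephone'
    value     TEXT NOT NULL
);
INSERT INTO entity VALUES (1, 'Hilton Hotel'), (2, 'One Bar');
INSERT INTO metadata VALUES
    (1, 'Post Code', 'IR232PH'),
    (1, 'Telephone', '555-555-555'),
    (2, 'Post Code', 'IR232PH');
""")


def fetch_attributes(conn, wanted):
    """Return one row per entity with one column per requested attribute.

    Each attribute costs an extra LEFT JOIN against the metadata table;
    LEFT (not INNER) so entities lacking the attribute are kept, with NULL.
    """
    select_cols = ["e.name"]
    joins = []
    params = []
    for i, meta_type in enumerate(wanted):
        alias = f"m{i}"
        select_cols.append(f"{alias}.value")
        joins.append(
            f"LEFT JOIN metadata {alias} "
            f"ON {alias}.entity_id = e.entity_id AND {alias}.meta_type = ?"
        )
        params.append(meta_type)
    sql = (
        f"SELECT {', '.join(select_cols)} FROM entity e "
        f"{' '.join(joins)} ORDER BY e.entity_id"
    )
    return conn.execute(sql, params).fetchall()


rows = fetch_attributes(conn, ["Post Code", "Telephone", "Email"])
```

Three attributes already mean three joins; thirty attributes would mean thirty. Nothing in the schema stops a metadata row from carrying a type that makes no sense for a hotel, either.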

What you described is also known as a triplestore. A triple is a subject-predicate-object statement (Hotel HAS Rooms, Joe Likes HotelX, etc.). There are mechanisms for running these things (triplestore implementations), controlling the data (e.g. with ontologies) and for querying them, too (e.g. the SPARQL language). However, this is all fairly bleeding edge stuff and is known to have scalability problems. Nevertheless, combined with NoSQL approaches (index all your hotels in a big document store, etc.), it's an interesting area to keep an eye on.
See: http://en.wikipedia.org/wiki/Triplestore.

Modeling Geographic Locations in a Relational Database

I am designing a contact management system and have come across an interesting issue regarding modeling geographic locations in a consistent way. I would like to be able to record locations associated with a particular person (mailing address(es) for work, school, home, etc.) My thought is to create a table of locales such as the following:
Locales (ID, LocationName, ParentID) where autonomous locations (such as countries, e.g. USA) are parents of themselves. This way I can have an arbitrarily deep nesting of 'political units' (COUNTRY > STATE > CITY or COUNTRY > STATE > CITY > UNIVERSITY). Some queries will necessarily involve recursion.
I would appreciate any other recommendations or perhaps advice regarding predictable issues that I am likely to encounter with such a scheme.

You might want to have a look at Freebase.com as a site that's had some open discussion about what a "location" means and what it means when a location is included in another. These sorts of questions can generate a lot of discussion.
For example, there is the obvious "geographic nesting", but there are less obvious logical nestings. For example, in a strictly geographic sense, Vatican City is nested within Italy. But it's not nested politically. Similarly, if your user is located in a research center that belongs to a university, but isn't located on the University's property, do you model that relationship or not?

Sounds like a good approach to me. The one thing that I'm not clear on when reading your post is what "parents of themselves" means - if this is to indicate that the locale does not have a parent, you're better off using null than the ID of itself.
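
A rough sketch of the Locales table with a NULL parent for root rows, plus the kind of recursive query the question anticipates. It uses Python's sqlite3 module and a recursive common table expression; the snake_case column names, the sample rows, and the helper path_to_root are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locales (
    id            INTEGER PRIMARY KEY,
    location_name TEXT NOT NULL,
    parent_id     INTEGER REFERENCES locales(id)  -- NULL marks a root (a country)
);
INSERT INTO locales VALUES
    (1, 'USA', NULL),
    (2, 'Rhode Island', 1),
    (3, 'Providence', 2),
    (4, 'Brown University', 3);
""")


def path_to_root(conn, locale_id):
    """Walk parent_id links upward and return the names from root to leaf."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, location_name, parent_id, depth) AS (
            SELECT id, location_name, parent_id, 0
            FROM locales
            WHERE id = ?
            UNION ALL
            SELECT l.id, l.location_name, l.parent_id, c.depth + 1
            FROM locales l
            JOIN chain c ON l.id = c.parent_id
        )
        SELECT location_name FROM chain ORDER BY depth DESC
        """,
        (locale_id,),
    ).fetchall()
    return [name for (name,) in rows]


full_path = path_to_root(conn, 4)
# full_path == ['USA', 'Rhode Island', 'Providence', 'Brown University']
```

With a NULL root the recursion stops by itself, because no row joins to a NULL parent_id. If a root pointed at itself instead, this query would keep finding the same row and would need an extra stop condition.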

I think you might be overthinking this. There's a reason most systems just store addresses and maybe a table of countries. Here are some things to look out for:
Would an address in the Bronx include the borough as a level in the hierarchy? Would an address in an unincorporated area eliminate the "city" level of the hierarchy? How do you model an address within a university vs an address that's not within one? You'll end up with a ragged hierarchy which will force you to traverse the tree every time you need to display an address in your application. If you have an "address book" page the performance hit could be significant.
I'm not sure that you even have just one hierarchy. Brown University has facilities in Providence, RI and Bristol, RI. The only clean solution would be to have a double hierarchy with two campuses that each belong to their respective cities in one hierarchy but that both belong to Brown University on the other hierarchy. (A university is fundamentally unlike a political region. You shouldn't really mix them.)
What about zip codes? Some zip codes encompass multiple towns, other times a city is broken into multiple zip codes. And (rarely) some zip codes even cross state lines. (According to Wikipedia, at least...)
How will you enter the data? Building out the database by parsing conventionally-formatted addresses can be difficult when you take into account vanity addresses, alternate names for certain streets, different international formats, etc. And I think that entering every address hierarchically would be a PITA.
It sounds like you're trying to model the entire world in your application. Do you really want or need to maintain a table that could conceivably contain every city, state, province, postal code, and country in the world? (Or at least every one where you know somebody?) The only thing I can think of that this scheme would buy you is proximity, but if that's what you want I'd just store state and country separately (and maybe the zip code) and add latitude and longitude data from Google.
Sorry for the extreme pessimism, but I've gone down that road myself. It's logically beautiful and elegant, but it doesn't work so well in practice.

Here's a suggestion for a pretty flexible schema. An immediate warning: it could be too flexible/complex for what you actually need
Location
(LocationID, LocationName)
-- Basic building block
LocationGroup
(LocationGroupID, LocationGroupName, ParentLocationGroupID)
-- This can effectively encapsulate multiple hierarchies. You have one root node and then you can create multiple independent branches. E.g. you can split by state first and then create several sub-hierarchies e.g. ZIP/city/xxxx
LocationGroupLocation
(LocationID, LocationGroupID)
-- Here's how you link Location with one or more hierarchies. E.g. you can link your house to a ZIP, as well as a City... What you need to implement is a constraint that you should not be able to link up a location with any two hierarchies where one of them is a parent of the other (as the relationship is already implicit).
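
One possible way to sketch these three tables and the constraint described above, using Python's sqlite3 module. The constraint is enforced in application code here, which is an assumption about where it would live; the snake_case names, the sample groups ("State X", "City Y", "ZIP Z"), and the helpers ancestors and link are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    location_id   INTEGER PRIMARY KEY,
    location_name TEXT NOT NULL
);
CREATE TABLE location_group (
    location_group_id        INTEGER PRIMARY KEY,
    location_group_name      TEXT NOT NULL,
    parent_location_group_id INTEGER REFERENCES location_group(location_group_id)
);
CREATE TABLE location_group_location (
    location_id       INTEGER NOT NULL REFERENCES location(location_id),
    location_group_id INTEGER NOT NULL REFERENCES location_group(location_group_id),
    PRIMARY KEY (location_id, location_group_id)
);
INSERT INTO location VALUES (1, 'My house');
INSERT INTO location_group VALUES
    (1, 'Root', NULL),
    (2, 'State X', 1),
    (3, 'City Y', 2),   -- city branch under the state
    (4, 'ZIP Z', 2);    -- ZIP branch under the same state
""")


def ancestors(conn, group_id):
    """Return the ids of every group above group_id in its hierarchy."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT parent_location_group_id
            FROM location_group
            WHERE location_group_id = ?
            UNION ALL
            SELECT g.parent_location_group_id
            FROM location_group g
            JOIN up ON g.location_group_id = up.id
        )
        SELECT id FROM up WHERE id IS NOT NULL
        """,
        (group_id,),
    ).fetchall()
    return {row[0] for row in rows}


def link(conn, location_id, group_id):
    """Link a location to a group unless an existing link already implies it."""
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT location_group_id FROM location_group_location "
            "WHERE location_id = ?",
            (location_id,),
        )
    }
    new_ancestors = ancestors(conn, group_id)
    for other in existing:
        if other in new_ancestors or group_id in ancestors(conn, other):
            raise ValueError(
                f"group {group_id} and group {other} are in a parent/child line"
            )
    conn.execute(
        "INSERT INTO location_group_location VALUES (?, ?)",
        (location_id, group_id),
    )


link(conn, 1, 3)  # house -> City Y
link(conn, 1, 4)  # house -> ZIP Z (a sibling branch, so this is allowed)
```

Linking the same house to "State X" as well would be rejected, since that group is already implied by both existing links.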

I would think carefully about this since it may not be a necessary feature.
Why not just use a text field and let users type in an address?
Remember the KISS principle (Keep It Simple, Stupid).

I agree with the other posts that you need to be very careful here about your requirements. Location can become a tricky issue and this is why GIS systems are so complicated.
If you are sure you just need a basic hierarchy structure, I have the following suggestions:
I support the previous comment that root level items should not have themselves as the parent. Root level items should have a null value for the parent. Always be careful about putting data into a field that has no meaning (i.e. a "special" value to represent no data). This practice is rarely necessary and way overused in the developer community.
Consider XPath / XML. This is something to consider both for recording the hierarchy structure and for processing / parsing the data at retrieval. If you are using MSSQL Server, the XPath expressions in select statements are perfect for tasks such as returning the full location/hierarchy path of a record, as the code is simple and the results are fast.

For Geographic locations you may wish to resolve an address to a Latitude, Longitude array (perhaps using Google maps etc.) to calculate proximities etc.. For Geopolitical nesting ... I'd go with the KISS response.
If you really want to model it, perhaps you need the types to be more generic ... Country -> State -> County -> Borough -> Locality -> City -> Suburb -> Street or PO Box -> Number -> Apartment etc. -> Institution (University or Employer) -> Division -> Subdivision-1 -> Subdivision-n ... Are you sure you can't do KISS?
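
For the proximity idea, once each address has been resolved to a latitude and longitude (by whatever geocoding service you choose), the distance between two points can be estimated with the haversine formula. This is a generic sketch rather than anything specific to Google Maps; the function name and the choice of a mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # min() guards against floating-point values a hair above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Storing two numeric columns per address and computing distances like this avoids the nested locale hierarchy altogether when proximity is all that is needed.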

I'm modeling an app for global users and I have the same problem, but I think this approach may already be in use in many enterprises. Why doesn't this problem have a universal solution? Or does it have one best solution that can serve as a starting point, or does everybody in the world need to think up a solution from the beginning?
In IT, unfortunately, we build the same things many times and in many places. For example, who hasn't built more than one user, customer, or product database? And worse, every enterprise in the world has built one. I think there could be universal solutions for universal problems.
