Recommend SQL data model for Semantic Network nodes?

We're building an RDBMS-based web site for a federal semantic network (RDF, Protege, etc). This is basically a large collection of nodes, each having a large and indefinite set of named relationships to (and from) other nodes.
My first thought is a single table for all the nodes (name, description, etc), plus one table per named relationship. Any better ideas out there?

On further reflection, two tables in total might do: one for nodes (id, name, description) and another for relations (id, name, description, from, to),
where from and to are ids in the nodes table (ints). Still on the right track?
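A minimal sketch of the two-table layout described above, using Python's built-in sqlite3 module as a stand-in for whichever RDBMS is chosen. The table and column names (nodes, relations, from_id, to_id) and the sample rows are illustrative assumptions, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE nodes (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT
);
CREATE TABLE relations (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    from_id     INTEGER NOT NULL REFERENCES nodes(id),
    to_id       INTEGER NOT NULL REFERENCES nodes(id)
);
""")

# Hypothetical sample data: three nodes linked by "is_a" relations.
conn.executemany("INSERT INTO nodes (id, name) VALUES (?, ?)",
                 [(1, "Dog"), (2, "Mammal"), (3, "Animal")])
conn.executemany(
    "INSERT INTO relations (name, from_id, to_id) VALUES (?, ?, ?)",
    [("is_a", 1, 2), ("is_a", 2, 3)])

# Outgoing relations of node 1, together with the target node's name.
outgoing = conn.execute("""
    SELECT r.name, n.name
    FROM relations r
    JOIN nodes n ON n.id = r.to_id
    WHERE r.from_id = ?
""", (1,)).fetchall()
```

Adding a new kind of relationship is then just a new value in relations.name rather than a new table.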

You could optimize the performance by creating 2 rows per relation.
Let's say you have a table Items and a table Relations and that Person A has a relation with Person B. The Relations table has a left and right column, both referring to Items. Now, if you only have one row for this relation, and you want all relations for a certain Item, you would have a query looking like this:
SELECT * FROM Relations WHERE LeftItemId = @ItemId OR RightItemId = @ItemId
The OR in this query can ruin your performance, because the database usually cannot satisfy both conditions with a single index seek. If you duplicate the row and switch the relation (left becomes right and vice versa), the query looks like this:
SELECT * FROM Relations WHERE LeftItemId = @ItemId
With the right index this one will go blazingly fast.
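A rough sketch of the mirrored-row idea, assuming SQLite (via Python's sqlite3) in place of the actual database. The add_relation helper and the composite primary key are assumptions for illustration; the key doubles as the index on LeftItemId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Relations (
    LeftItemId  INTEGER NOT NULL REFERENCES Items(Id),
    RightItemId INTEGER NOT NULL REFERENCES Items(Id),
    PRIMARY KEY (LeftItemId, RightItemId)  -- also serves as the lookup index
);
""")
conn.executemany("INSERT INTO Items VALUES (?, ?)",
                 [(1, "Person A"), (2, "Person B"), (3, "Person C")])


def add_relation(left_id, right_id):
    """Store the relation in both directions inside one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO Relations (LeftItemId, RightItemId) VALUES (?, ?)",
            [(left_id, right_id), (right_id, left_id)])


add_relation(1, 2)
add_relation(3, 1)

# One-sided lookup: no OR is needed because every relation is mirrored.
related_to_a = [row[0] for row in conn.execute(
    "SELECT RightItemId FROM Relations"
    " WHERE LeftItemId = ? ORDER BY RightItemId", (1,))]
```

The trade-off is twice the rows and the need to insert and delete both directions together, which is why the helper wraps them in a single transaction.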

No, that should be fine. Pay attention to the primary key and indexes, so that the performance is good.

If you didn't have a single table for the nodes, you'd have to define a lot of relation tables. Each new node type would require a new relation table with every old node type. That could get out of hand quickly.
So a single table sounds best. You can always use a 1:1 relation to extend it, if you need additional fields for certain node types.

If you're using SQL Server 2008, you might want to consider the new hierarchyid data type to store your hierarchy in. It's optimized for storage.

Related

SQL Server Table Design ; one table with type column vs multiple tables

I have made a website in which I have a blog and a products page. Posts and products are stored in different tables. I use Microsoft SQL Server.
I want to create a table to store the views for each post and for each product. My 2 possible designs are:
One table for all views
(id, ref_id, date, ip, type)
where type is either post or product
Two separate tables, table post_views and table product_views
post_views(id, post_id, date, ip)
product_views(id, product_id, date, ip)
Which design is better and why?
My reasoning:
Solution 1 requires more database space (we need to store the type value for each view), also requires a more "complex" query. We need to search by id and type.
Solution 1 is more compact. We have fewer tables, but the performance won't be so great. If we have 1 million records, with 500k views for the posts and 500k views for the products, we would have to search all the views to filter by date (just an example).
Pros of 1st solution are the compactness and that I use one less table.
The second solution requires one more table, but the query performance and disk space would be better.
This is a very common design question that I face and I would love to receive an answer from a very good expert.
Since posts and products each have their own tables, go with the second option - meaning different tables for post views and product views.
The reason it's a better option is that this option allows you to use foreign keys between the views tables and the posts and products tables and keep the posts and products separated.
The first option will also allow you to use foreign keys between these tables, but it will mean that you can only insert views where the ref_id exists in both the posts and products tables. Also, it will force more cumbersome select statements that include different joins based on the type column.
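To make the foreign key argument concrete, here is a small sketch of the two-table option, assuming SQLite through Python's sqlite3 module rather than SQL Server. The posts and products columns and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE posts    (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE post_views (
    id      INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES posts(id),
    date    TEXT NOT NULL,
    ip      TEXT NOT NULL
);
CREATE TABLE product_views (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(id),
    date       TEXT NOT NULL,
    ip         TEXT NOT NULL
);
""")
conn.execute("INSERT INTO posts VALUES (1, 'Hello')")
conn.execute("INSERT INTO post_views (post_id, date, ip)"
             " VALUES (1, '2020-01-01', '10.0.0.1')")

# A view that points at a non-existent post is rejected by the foreign key.
try:
    conn.execute("INSERT INTO post_views (post_id, date, ip)"
                 " VALUES (99, '2020-01-02', '10.0.0.2')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

view_count = conn.execute(
    "SELECT COUNT(*) FROM post_views WHERE post_id = ?", (1,)).fetchone()[0]
```

With a single views table and a type column, ref_id could not carry a plain foreign key to just one of the two parent tables in this way.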
With no more information than you have provided, I would go with the first option, simply because it is more concise. Unless you have a compelling reason to break them into two tables, you should keep them in one. It's easy enough to filter any queries by type as needed.
Another consideration is that it's more common practice in stored procs to have a variable field parameter (such as your type field) than it is to have a variable table name, so the function of any stored procs that you create will be a bit more obvious to those who come after you if you keep them in one table.

What is the most correct way to store a "list" in a SQL Database?

So, I've read a lot about how stashing multiple values into one column is a bad idea and violates the first rule of data normalisation (which, surprisingly, is not "Do Not Talk About Data Normalisation") so I need some help.
At the moment I'm designing an ASP .NET webpage for the place I work for. I want to display data on a web page depending on what Active Directory groups the person belongs to. The first way of doing this that comes to mind is to have a table with, essentially, one column containing the AD group and a second column containing the list of computers that belong to that group.
I've learnt that this is showing great disregard for relational databases, so what is a better way to do it? I want to control this access by SQL tables, so I can add/remove from these tables and change end users access accordingly.
Thanks for the help! :)
EDIT: To describe exactly what I want to do is this:
We have a certain group of computers that need to be checked up on, however these computers are in physically difficult to reach locations. The organisation I belong to has remote control enabled for these computers, however they're not in the business of giving out the remote control password (understandable).
The added layer of complexity is that, depending on who you are, our clients should only be able to see a certain group of computers (that is, the group of computers that their area owns). So, if Group A has Thomas in it, and Group B has Jones in it, if you belong to either group then you would just see one entry. However, if you belong to both groups you should see both Thomas and Jones computers in it.
The reason why I think that storing this data in a SQL cell is the way to go is because, to store them in tables would require (in my mind) a new table for each new "group" of computers. I don't want to crank out SQL tables for every new group, I'd much rather just have an added row in a SQL table somewhere.
Does this make any sense?
You basically have three options in SQL Server:
Storing the values in a single column.
Storing the values in a junction table.
Storing the values as XML (or as some other structured data format).
(Other databases have other options, such as arrays, nested tables, and JSON.)
In almost all cases, using a junction table is the correct approach. Why? Here are some reasons:
SQL Server has (relatively) lousy string manipulation, so doing something as simple as ensuring a unique list is really, really hard.
A junction table allows you to store lots of other information (When was a machine added? What is the full description of the machine? etc. etc.).
Most queries that you want are pretty easy with a junction table (with the one exception of getting a comma-delimited list, alas -- which is just counterintuitive rather than "hard").
All the types are stored natively.
A junction table allows you to enforce constraints (both check and foreign key) on the elements of the list.
Although a delimited list is almost never the right solution, it is possible to think of cases where it might be useful:
The list doesn't change and presentation of the list is very important.
Space usage is an issue (alas, denormalization often results in fewer pages).
Queries do not really access elements of the list, just the entire thing.
XML is also a reasonable choice under some circumstances. In the most recent versions of SQL Server, this can be made pretty efficient. However, it incurs the overhead of reading and parsing XML -- and things like duplicate elimination are still not obvious.
So, you do have options. In almost all cases, the junction table is the right approach.
There is an "it depends" that you should consider. If the data is never going to be queried (or queried very rarely) storing it as XML or JSON would be perfectly acceptable. Many DBAs would freak out but it is much faster to get the blob of data that you are going to send to the client than to recompose and decompose a set of columns from a secondary table. (There is a reason document and object databases are becoming so popular.)
... though I would ask why are you replicating active directory to your database and how are you planning on keeping these in sync.
It's not really a bad idea to store multiple values in one column, but it depends on the searches you want to run.
If you only want to know which persons are part of a group, then you can store the persons in one column with the group id as key. To update, you just rewrite the entire list for a group.
But if you want to search for a specific person that belongs to a group, then it's not recommended to store multiple persons in one column. In this case it's better to use an intermediate table that stores the person id and the group id.
Sounds like you want a table that maps users to group IDs and a second table that maps group IDs to which computers are in that group. I'm not sure, your language describing the problem was a bit confusing to me.
A list has some columns like: name, family name, phone number etc.
and rows like name=john familyName=lee number=12321321
name=... familyname=... number=...
An SQL database works the same way. Every row in an SQL database is a record, so you just add the records of your list to your database using an INSERT query.
A complete explanation is here:
http://www.w3schools.com/sql/sql_insert.asp
This sounds like a typical many-to-many problem. You have many groups and many computers and they are related to each other. In this situation, it is often recommended to use a mapping table, a.k.a. a "junction table" or "cross-reference table". This table consists solely of the two foreign keys to your other tables.
If your tables look like this:
Computer
- computerId
- otherComputerColumns
Group
- groupId
- othergroupColumns
Then your mapping table would look like this:
GroupComputer
- groupId
- computerId
And you would insert a single record for every relationship between a group and computer. This is in compliance with the rules for third normal form in regards to database normalization.
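The mapping table above could be sketched as follows, using Python's sqlite3 as a stand-in for SQL Server. Since GROUP is a reserved word in SQL, the sketch names that table AdGroup; that name, the name columns, and the visible_computers helper are assumptions for illustration. The sample rows reuse the Thomas and Jones example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Computer (computerId INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE AdGroup  (groupId INTEGER PRIMARY KEY,
                       name TEXT NOT NULL UNIQUE);
CREATE TABLE GroupComputer (
    groupId    INTEGER NOT NULL REFERENCES AdGroup(groupId),
    computerId INTEGER NOT NULL REFERENCES Computer(computerId),
    PRIMARY KEY (groupId, computerId)  -- one row per group/computer pair
);
""")
conn.executemany("INSERT INTO Computer VALUES (?, ?)",
                 [(1, "Thomas"), (2, "Jones")])
conn.executemany("INSERT INTO AdGroup VALUES (?, ?)",
                 [(1, "Group A"), (2, "Group B")])
conn.executemany("INSERT INTO GroupComputer VALUES (?, ?)",
                 [(1, 1), (2, 2)])


def visible_computers(group_names):
    """Return the computer names visible to a user in the given AD groups."""
    if not group_names:
        return []
    placeholders = ", ".join("?" for _ in group_names)
    rows = conn.execute(f"""
        SELECT DISTINCT c.name
        FROM Computer c
        JOIN GroupComputer gc ON gc.computerId = c.computerId
        JOIN AdGroup g ON g.groupId = gc.groupId
        WHERE g.name IN ({placeholders})
        ORDER BY c.name
    """, list(group_names)).fetchall()
    return [row[0] for row in rows]
```

A new group of computers is then just new rows in AdGroup and GroupComputer, with no new table per group.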
You can have a table with the group and group id, another table with the computer and computer id and a third table with the relation of group id and computer id.

Multiple record types and how to split them amongst tables

I'm working on a database structure and trying to imagine the best way to split up a host of related records into tables. Records all have the same base type they inherit from, but each then expands on it for their particular use.
These 4 properties are present for every type.
id, name, groupid, userid
Here are the types that expand off those 4 properties.
"Static": value
"Increment": currentValue, maxValue, overMaxAllowed, underNegativeAllowed
"Target": targetValue, result, lastResult
What I tried initially was to create a "records" table with the 4 base properties in it. I then created 3 other tables named "records_static/increment/target", each with their specific properties as columns. I then forged relationships between a "rowID" column in each of these secondary tables with the main table's "id".
Populating the tables with dummy data, I am now having some major problems attempting to extract the data with a query. The only parameter is the userid, beyond that what I need is a table with all of the columns and data associated with the userid.
I am unsure if I should abandon that table design, or if I just am going about the query incorrectly.
I hope I explained that well enough, please let me know if you need additional detail.
Make the design as simple as possible.
First I'd try a single table that contains all attributes that might apply to a record. Irrelevant attributes can be null. You can enforce null values for a specific type with a check constraint.
If that doesn't work out, you can create three tables, one for each record type, without a common table.
If that doesn't work out, you can create a base table with 1:1 extension tables. Be aware that querying that is much harder, requiring join for every operation:
select *
from fruit f
left join apple a
    on a.fruit_id = f.id
left join pear p
    on p.fruit_id = f.id
left join
    ...
The more complex the design, the more room for an inconsistent database state. In the second option you could have a pear and an apple with the same id. In the third option you can have missing rows in either the base or the extension table. Or the tables can contradict each other, for example a base row saying "pear" with an extension row in the Apple table. I fully trust end users to find a way to get that into your database :)
Throw out the complex design and start with the simplest one. Your first attempt was not a failure: you now know the cost of adding relations between tables. Which can look deceptively trivial (or even "right") at design time.
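The single-table option suggested above (irrelevant attributes left null and enforced with a check constraint) might look roughly like this. It is a sketch in SQLite via Python's sqlite3; the type discriminator column is an assumption, and only a subset of the question's type-specific columns is shown to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE records (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    groupid      INTEGER NOT NULL,
    userid       INTEGER NOT NULL,
    -- hypothetical discriminator column, not in the original question
    type         TEXT NOT NULL
                 CHECK (type IN ('static', 'increment', 'target')),
    value        INTEGER,   -- "Static" only
    currentValue INTEGER,   -- "Increment" only
    maxValue     INTEGER,   -- "Increment" only
    targetValue  INTEGER,   -- "Target" only
    -- columns that do not belong to the row's type must stay NULL
    CHECK (type = 'static' OR value IS NULL),
    CHECK (type = 'increment'
           OR (currentValue IS NULL AND maxValue IS NULL)),
    CHECK (type = 'target' OR targetValue IS NULL)
)
""")

conn.execute("INSERT INTO records (name, groupid, userid, type, value)"
             " VALUES ('height', 1, 7, 'static', 180)")

# A static record that also sets an increment column violates the check.
try:
    conn.execute(
        "INSERT INTO records (name, groupid, userid, type, value, maxValue)"
        " VALUES ('bad', 1, 7, 'static', 5, 10)")
    mixed_row_rejected = False
except sqlite3.IntegrityError:
    mixed_row_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

Fetching everything for a user is then a plain SELECT with WHERE userid = ?, with no joins involved.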
This is a typical "object-oriented to relational" mapping problem. You can find books about this. Also a lot of google hits like
http://www.ibm.com/developerworks/library/ws-mapping-to-rdb/
The easiest for you to implement is to have one table containing all columns necessary to store all your types. Make sure you define them as nullable. Only the common columns can be not null if necessary.
Just because objects share some of the same properties does not mean you need to have one table for both objects. That leads to unnecessary right outer joins on a 1 to 1 relationship, which is not what I think of as good database design.
but...
If you want to continue in your fashion, I think all you need is a primary key in the table with the common columns "id, name, groupid, userid" (I assume id), which the table with currentValue, maxValue, overMaxAllowed, underNegativeAllowed would then reference as a foreign key.
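For the original base-plus-extension layout, the query the asker is after could be sketched like this, again with SQLite via Python's sqlite3. The table names follow the question (records, records_static, records_increment, records_target, rowID); the sample rows and the reduced column set are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    groupid INTEGER NOT NULL, userid INTEGER NOT NULL
);
CREATE TABLE records_static (
    rowID INTEGER PRIMARY KEY REFERENCES records(id), value INTEGER
);
CREATE TABLE records_increment (
    rowID INTEGER PRIMARY KEY REFERENCES records(id),
    currentValue INTEGER, maxValue INTEGER
);
CREATE TABLE records_target (
    rowID INTEGER PRIMARY KEY REFERENCES records(id), targetValue INTEGER
);
INSERT INTO records VALUES (1, 'height', 1, 7), (2, 'steps', 1, 7),
                           (3, 'other', 1, 8);
INSERT INTO records_static VALUES (1, 180);
INSERT INTO records_increment VALUES (2, 40, 100);
""")

# LEFT JOIN every extension table; columns of non-matching types are NULL.
rows_for_user = conn.execute("""
    SELECT r.id, r.name, s.value, i.currentValue, i.maxValue, t.targetValue
    FROM records r
    LEFT JOIN records_static    s ON s.rowID = r.id
    LEFT JOIN records_increment i ON i.rowID = r.id
    LEFT JOIN records_target    t ON t.rowID = r.id
    WHERE r.userid = ?
    ORDER BY r.id
""", (7,)).fetchall()
```

Each result row carries the base columns plus whichever extension columns apply; the rest come back as NULL (None in Python).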

Difference between a db view and a lookuptable

When I create a view I can base it on multiple columns from different tables.
When I want to create a lookup table I need information from one table, for example the foreign key of an order table, to get customer details from another table. I can create a view having parameters to make sure it will get all data that I need. I could also - from what I have been reading - make a lookup table. What is the difference in this case and when should I choose for a lookup table?? I hope this ain't a bad question, I'm not very into db's yet ;).
Creating a view gives you a "live" representation of the data as it is at the time of querying. This comes at the cost of higher load on the server, because it has to determine the values for every query.
This can be expensive, depending on table sizes, database implementations and the complexity of the view definition.
A lookup table on the other hand is usually filled "manually", i.e. not every query against it will cause an expensive operation to fetch values from multiple tables. Instead your program has to take care of updating the lookup table should the underlying data change.
Usually lookup tables lend themselves to things that change seldom, but are read often. Views on the other hand - while more expensive to execute - are more current.
I think your usage of "Lookup Table" is slightly awry. In normal parlance a lookup table is a code or reference data table. It might consist of a CODE and a DESCRIPTION or a code expansion. The purpose of such tables is to provide a list of permitted values for restricted columns, things like CUSTOMER_TYPE or PRIORITY_CODE. This category of table is often referred to as "standing data" because it changes very rarely if at all. The value of defining this data in Lookup tables is that they can be used in foreign keys and to populate Dropdowns and Lists Of Values.
What you are describing is a slightly different scenario:
"I need information from one table, for example the foreign key of an order table, to get customer details from another table"
Both these tables are application data tables. Customer and Order records are dynamic. Now it is obviously valid to retrieve additional data from the Customer table to display along side the Order data, and in that sense Customer is a "lookup table". More pertinently it is the parent table of Order, because it has the primary key referenced by the foreign key on Order.
By all means build a view to capture the joining logic between Order and Customer. Such views can be quite helpful when building an application that uses the same joined tables in several places.
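A small sketch of such a view, assuming SQLite through Python's sqlite3. ORDER is a reserved word, so the tables are named orders and customers here; those names, the columns, and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL
);
-- The view captures the join once so callers do not repeat it.
CREATE VIEW order_details AS
    SELECT o.id AS order_id, o.total, c.name AS customer_name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
INSERT INTO customers VALUES (1, 'Alice');
INSERT INTO orders VALUES (10, 1, 25.0);
""")

before = conn.execute("SELECT * FROM order_details").fetchall()

# The view is evaluated at query time, so it reflects the change right away.
conn.execute("UPDATE customers SET name = 'Alice B.' WHERE id = 1")
after = conn.execute("SELECT * FROM order_details").fetchall()
```

Nothing has to be refreshed after the update, which is the "live" behaviour of views described earlier in this thread.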
Here's an example of a lookup table. We have a system that tracks Jurors, one of the tables is JurorStatus. This table contains all the valid StatusCodes for Jurors:
Code: Value
WS : Will Serve
PP : Postponed
EM : Excuse Military
IF : Ineligible Felon
This is a lookup table for the valid codes.
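As a sketch of how such a code table is typically used, here is the JurorStatus table with a foreign key pointing at it, in SQLite via Python's sqlite3. The Juror table and its columns are hypothetical; only JurorStatus and its four codes come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE JurorStatus (Code TEXT PRIMARY KEY, Value TEXT NOT NULL);
-- Hypothetical table that restricts StatusCode to the codes above.
CREATE TABLE Juror (
    Id         INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    StatusCode TEXT NOT NULL REFERENCES JurorStatus(Code)
);
""")
conn.executemany("INSERT INTO JurorStatus VALUES (?, ?)", [
    ("WS", "Will Serve"), ("PP", "Postponed"),
    ("EM", "Excuse Military"), ("IF", "Ineligible Felon"),
])
conn.execute("INSERT INTO Juror (Name, StatusCode) VALUES ('Pat', 'WS')")

# A status code that is not in the lookup table is rejected.
try:
    conn.execute("INSERT INTO Juror (Name, StatusCode) VALUES ('Sam', 'XX')")
    invalid_code_rejected = False
except sqlite3.IntegrityError:
    invalid_code_rejected = True

# The same table can populate a dropdown of permitted values.
dropdown = conn.execute(
    "SELECT Code, Value FROM JurorStatus ORDER BY Code").fetchall()
```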
A view is like a query.
Read this tutorial and you may find helpful info when a lookup table is needed:
SQL: Creating a Lookup Table
Just learn to write sql queries to get exactly what you need. No need to create a view! Views are not good to use in many instances, especially if you start to base them on other views, when they will kill performance. Do not use views just as a shorthand for query writing.

Hierarchical Database, multiple tables or column with parent id?

I need to store info about county, municipality and city in Norway in a mysql database. They are related in a hierarchical manner (a city belongs to a municipality which again belongs to a county).
Is it best to store this as three different tables and reference by foreign key, or should I store them in one table and relate them with a parent_id field?
What are the pros and cons of either solution? (both structural and efficiency wise)
If you've really got a limit of these three levels (county, municipality, city), I think you'll be happiest with three separate tables with foreign keys reaching up one level each. This will make queries almost trivial to write.
Using a single table with a parent_id field referencing the same table allows you to represent arbitrary tree structures, but makes querying to extract the full path from node to root an iterative process best handled in your application code.
The separate table solution will be much easier to use.
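To compare the two layouts side by side, here is a rough sketch using SQLite via Python's sqlite3 instead of MySQL. All table names, column names, and the placeholder place names are assumptions; path_to_root is a hypothetical helper showing the iterative walk in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1: three tables, each referencing one level up.
CREATE TABLE county (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE municipality (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    county_id INTEGER NOT NULL REFERENCES county(id)
);
CREATE TABLE city (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    municipality_id INTEGER NOT NULL REFERENCES municipality(id)
);
-- Option 2: one self-referencing table.
CREATE TABLE place (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    parent_id INTEGER REFERENCES place(id)
);
INSERT INTO county VALUES (1, 'Example County');
INSERT INTO municipality VALUES (1, 'Example Municipality', 1);
INSERT INTO city VALUES (1, 'Example City', 1);
INSERT INTO place VALUES (1, 'Example County', NULL),
                         (2, 'Example Municipality', 1),
                         (3, 'Example City', 2);
""")

# Option 1: the full path is a fixed two-join query.
three_table_path = conn.execute("""
    SELECT ci.name, m.name, co.name
    FROM city ci
    JOIN municipality m ON m.id = ci.municipality_id
    JOIN county co ON co.id = m.county_id
    WHERE ci.id = ?
""", (1,)).fetchone()


def path_to_root(place_id):
    """Option 2: follow parent_id links one query at a time."""
    path = []
    while place_id is not None:
        name, place_id = conn.execute(
            "SELECT name, parent_id FROM place WHERE id = ?",
            (place_id,)).fetchone()
        path.append(name)
    return path
```

With a fixed depth of three, the join version stays a single statement, while the parent_id version needs one round trip per level.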
three different tables:
more efficient, if your application mostly accesses information about only one entity (county, municipality, city)
owner-member-relationship is a clear and elegant model ;)
County, Municipality, and City don't sound like they are the same kind of data ; so, I would use three different tables : one per data-type.
And, then, I would indeed use foreign keys between those.
Efficiency-speaking, not sure it'll change much :
you'll do joins on 3 tables instead of joining 3 times on the same table ; I suppose it's quite the same.
it might make a little difference when you need to work on only one of those three type of data ; but with the right indexes, the differences should be minimal.
But, structurally speaking, if those are three different kind of entities, it makes sense to use three different tables.
I would recommend using three different tables, as they are three different entities.
I would use only one table in cases where you don't know the depth of the hierarchy, but that is not the case here.
I would put them in three different tables, just on the grounds that they are 3 different concepts. This will hamper speed and will complicate your queries. However, given that MySQL does not have any special support for hierarchical queries (like Oracle's CONNECT BY clause; MySQL 8.0 later added recursive common table expressions), these would be complicated anyway.
Different tables: it's just "right". I doubt you'll see any performance gains/losses either way but this is one where modelling it properly up-front will probably save you lots of headaches later on. For one thing it'll make SQL SELECTs easier to write and read.
You'll get different opinions coming back to you on this but my personal preference would be to have separate tables because they are separate entities.
In reality you need to think about the queries you will be doing on this data, and usually your answer will come from that. With separate tables your queries will look much cleaner, and in the end you're not saving yourself anything because you'll still be joining tables together, even if they are the same table.
I would use three separate tables, since you know exactly what categories of information you are working with, and won't need to dynamically alter the 'depth' of your hierarchy.
It'll also make the data simpler to manage, as you'll be able to tell if the data is for a city, municipality or a county just by knowing the table (and without having to discern the 'depth' of a record in the hierarchy first!).
Since you'll probably be doing self joins anyway to get the hierarchy to work, I'd doubt there would be any benefits from having all the data in a single table.
In data warehousing applications, adherents of the Kimball methodology might place these fields in the same attribute table:
create table city (
    id int not null,
    county varchar(50) not null,
    municipality varchar(50),
    city varchar(50),
    primary key(id)
);
The idea being that attributes should never be more than one join away from the fact table.
I just state this as an alternative view. I would go with the 3 table design personally.
This is a case of ‘Database Normalization’, which is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. The purpose is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Multiple tables will also help if the work is distributed among different developers, if users at different levels require different rights to view and change the data, or if you need the smaller tables for other purposes as well.
My vote would be for multiple tables - with data appropriately distributed.