SQL database problems with addressbook table design - sql

I am writing a addressbook module for my software right now. I have the database set up so far that it supports a very flexible address-book configuration.
I can create n-entries for every type I want. Type means here data like 'email', 'address', 'telephone' etc.
I have a table named 'contact_profiles'.
This only has two columns:
id Primary key
date_created DATETIME
And then there is a table called contact_attributes. This one is a little more complex:
id PK
#profile (Foreign key to contact_profiles.id)
type VARCHAR describing the type of the entry (name, email, phone, fax, website, ...) I should probably change this to a SET later.
value Text (containing the value for the attribute).
I can now link to these profiles, for example from my user's table. But from here I run into problems.
At the moment I would have to create a JOIN for each value that I want to retrieve.
Is there a possibility to somehow create a View, that gives me a result with the type's as columns?
So right now I would get something like
#profile type value
1 email name#domain.tld
1 name Sebastian Hoitz
1 website domain.tld
But it would be nice to get a result like this:
#profile email name website
1 name#domain.tld Sebastian Hoitz domain.tld
The reason I do not want to create the table layout like this initially is, that there might always be things to add and I want to be able to have multiple attributes of the same type.
So do you know if there is any possibility to convert this dynamically?
If you need a better description please let me know.

You have reinvented a database design called Entity-Attribute-Value. This design has a lot of weaknesses, including the weakness you've discovered: it's very hard to reproduce a query result in a conventional format, with one column per attribute.
Here's an example of what you must do:
SELECT c.id, c.date_created,
c1.value AS name,
c2.value AS email,
c3.value AS phone,
c4.value AS fax,
c5.value AS website
FROM contact_profiles c
LEFT OUTER JOIN contact_attributes c1
ON (c.id = c1.profile AND c1.type = 'name')
LEFT OUTER JOIN contact_attributes c1
ON (c.id = c1.profile AND c1.type = 'email')
LEFT OUTER JOIN contact_attributes c1
ON (c.id = c1.profile AND c1.type = 'phone')
LEFT OUTER JOIN contact_attributes c1
ON (c.id = c1.profile AND c1.type = 'fax')
LEFT OUTER JOIN contact_attributes c1
ON (c.id = c1.profile AND c1.type = 'website');
You must add another LEFT OUTER JOIN for every attribute. You must know the attributes at the time you write the query. You must use LEFT OUTER JOIN and not INNER JOIN because there's no way to make an attribute mandatory (the equivalent of simply declaring a column NOT NULL).
It's far more efficient to retrieve the attributes as they are stored, and then write application code to loop through the result set, building an object or associative array with an entry for each attribute. You don't need to know all the attributes this way, and you don't have to execute an n-way join.
SELECT * FROM contact_profiles c
LEFT OUTER JOIN contact_attributes ca ON (c.id = ca.profile);
You asked in a comment what to do if you need this level of flexibility, if not use the EAV design? SQL is not the correct solution if you truly need unlimited metadata flexibility. Here are some alternatives:
Store a TEXT BLOB, containing all the attributes structured in XML or YAML format.
Use a semantic data modeling solution like Sesame, in which any entity can have dynamic attributes.
Abandon databases and use flat files.
EAV and any of these alternative solutions is a lot of work. You should consider very carefully if you truly need this degree of flexibility in your data model, because it's hugely more simple if you can treat the metadata structure as relatively unchanging.

If you are limiting yourself to displaying a single email, name, website, etc. for each person in this query, I'd use subqueries:
SELECT cp.ID profile
,cp.Name
,(SELECT value FROM contact_attributes WHERE type = 'email' and profile = cp.id) email
,(SELECT value FROM contact_attributes WHERE type = 'website' and profile = cp.id) website
,(SELECT value FROM contact_attributes WHERE type = 'phone' and profile = cp.id) phone
FROM contact_profiles cp
If you're using SQL Server, you could also look at PIVOT.
If you want to show multiple emails, phones, etc., then consider that each profile must have the same number of them or you'll have blanks.
I'd also factor out the type column. Create a table called contact_attribute_types which would hold "email", "website", etc. Then you'd store the contact_attribute_types.id integer value in the contact_attributes table.

You will need to generate a query like:
select #profile,
max(case when type='email' then value end) as email,
max(case when type='name' then value end) as name,
max(case when type='website' then value end) as website
from mytable
group by #profile
However, that will only show one value for each type per #profile. Your DBMS may have a function you can use instead of MAX to concatenate all the values as a comma-separated string, or you may be able to write one.
This kind of data model is generally best avoided for the reasons you have already mentioned!

You create a view for each contact type
When you want all the information you pull from the entire table, when you want a subset of a specific contact type, you pull from the view.
I'd create a stored procedure that takes the intent {all, phone, email, address} as one of the parameters and then derive the data. All my app code would call this stored procedure to get the data. Also, when a new type is added (which should be very infrequently, you create another view and modify only this sproc).
I've implemented a similar design for multiple small/med size systems and have had no issues.
Am I missing something? This seems trivial?
EDIT:
I see what I was missing... You are trying to be normalized and denormalized at the same time. I'm not sure what the rest of your business rules are for pulling records. You could have profiles with multiple or null values for phone/email/addresses etc. I would keep your data format the same and again use a sproc to create the specific view you want. As your business needs change, you leave your data alone and just create another sproc to access it.

There is no one right answer for this question, as one would need to know, for your specific organization or application, how many of those contact methods the business wants to collect, how current they want the information to be, and how much flexibility they are willing to invest in.
Of course, many of here could make some good guesses as to what the average business would want to do, but the real answer is to find out what your project, what your users, are interested in.
BTW, all architecture questions about "best"-ness require this sort of cost, benefit, and risk analysis.

Now that the approach of document-oriented databases is getting more and more popular, one could use one of them to store all this information in one entry - and therefor deleting all those extra joins and queries.

Related

MS Access SQL, sort by age in a specific order

My task: "Compile an SQL query that outputs a specific store (enter parameter window) the age of the youngest buyer"
I´ve tried some things, but because i´m new to SQL and i have no idea what i´m doing non of them seem to work.
I´d really appreciate, if someone would help me.
Thanks!
First you need to know the fields to SELECT (or return) and the table FROM which you are querying (asking) data; let's say you have the following tables: tblStores (containing a list of stores and related info), tblCustomers (containing customers and related info, e.g. ages, names, phone numbers, etc.), and tblPurchases (containing all the purchases at all stores by all customers and related info). You want the minimum age of a customer making a purchase at a specfic store, so you could use a MIN aggregating function. You would want to join (or relate) the tables based on customers and purchases. See my INNER JOINs in the example below. Then you filter the result by the user-inputted store name (inputStoreName) using WHERE; since the inputStoreName is undefined, in Access this would cause the parameter entry popup window to appear.
SELECT list of fields or aggregating functions you want (comma-separated)
FROM list of tables the fields are in (comma-separated) and how to join the tables
WHERE list of conditions to filter the data (separated by AND or OR)
Example:
SELECT tblStores.Name, tblStores.Description, MIN(tblCustomers.age)
FROM tblStores INNER JOIN ( tblPurchases INNER JOIN tblCustomers on tblPurchases.customerID = tblCustomer.customerID) ON tblStores.storeID = tblPurchases.storeID
WHERE (tblStores.Name = inputStoreName);
I recommend checking W3 schools. They are usually helpful for most programming tasks. If you provide more info about your database, we can provide more directed help.

SQL query join different tables based on value in a given column?

Well I am designing a domain model and data mapper system for my site. The system has different kind of user domain objects, with the base class for users with child classes for admins, mods and banned users. Every domain object uses data from a table called 'users', while the child classes have an additional table to store admin/mod/banned information. The user type is determined by a column in the table 'users' called 'userlevel', its value is 3 for admins, 2 for mods, 1 for registered users and -1 for banned users.
Now it comes a problem when I work on a members list feature for my site, since the members list is supposed to load all users from the database(with pagination, but lets not worry about this now). The issue is that I want to load the data from both the base user table and additional admin/mod/banned table. As you see, the registered users do not have additional table to store extra data, while for admin/mod/banned users the table is different. Moreover, the columns in these tables are also different.
So How am I supposed to handle this situation using SQL queries? I know I can simply just select from the base user table and then use multiple queries to load additional data if the user level is found to be a given value, but this is a bad idea since it will results in n+1 queries for n admins/mods/banned users, a very expensive trip to database. What else am I supposed to do? Please help.
If you want to query all usertypes with one query you will have to have the columns from all tables in your result-table, several of them filled with null-values.
To get them filled with data use a left-join like this:
SELECT *
FROM userdata u
LEFT OUTER JOIN admindata a
ON ( u.userid = a.userid
AND u.usertype = 3 )
LEFT OUTER JOIN moddata m
ON ( u.userid = m.userid
AND u.usertype = 2 )
LEFT OUTER JOIN banneddata b
ON ( u.userid = b.userid
AND u.usertype = -1 )
WHERE...
You could probably drop the usertype-condition, since there should only be data in one of the joined tables, but you never know...
Then your program-code will have the job to pick the correct columns based on the usertype.
P.S.: Not that select * is only for sake of simplicity, in real code better list all of the column-names...
While is totally fine having this hierarchy in your domain classes, I would suggest changing the approach in your database. Otherwise your queries are going to be very complex and slow.
You can have just another table called e.g. users_additional_info with the mix of the columns that you need for all your user types. Then you can do
SELECT * FROM users
LEFT JOIN users_additional_info ON users.id = users_additional_info.user_id
to get all the information in a single simple query. Believe me or not, this approach will save you a lots of headaches in the future when your tables start to grow or you decide to add another type of user.

Storing Multiple data type

I want to store user metadata setting for user. but the metadata value is multiple datatype, can be integer, string, date, or boolean.
So I came with my own solution
user(user_id(PK), ...)
meta(
meta_id (PK)
, user_id (FK)
, data_type
, meta_name
, ...)
meta_user(
user_id(FK)
, meta_id(FK)
, number_value
, decimal_value
, string_value
, date_value
, time_value
, boolean_value)
But I'm not sure that's the right way to store multiple data type. I hope some one here can help me share their solution.
UPDATE:
User may have many metadata, and user must first register their metadata.
My contribution to this is:
As you should implement names for the metadata, add a metadefinition table:
meta_definition (metadefinition_ID(PK), name (varchar), datatype)
Then modify your meta_user table
meta_user (meta_ID (PK), user_ID(FK), metadefinition_ID(FK))
You have 3 choices (A,B,C) or even more...
Variant A: Keep your design of storing the values for all possible datatypes in one single row resulting in a sparse table.
This is the easiest to implement but the ugliest in terms of 'clean design' (my opinion).
Variant B: Use distinct value tables per data type: Instead of having one sparse table meta_user you could use 6 tables meta_number, meta_decimal, meta_string. Each table has the form:
meta_XXXX (metadata_ID(PK), meta_ID(FK), value)
Imho, this is the cleanest design (but a bit complicated to work with).
Variant C: reduce meta_user to hold three columns (I renamed it to meta_values, as it hold values and not users).
meta_values (metavalue_ID(PK), meta_ID(FK), value (varchar))
Format all values as string/varchar and stuff them into the value column. This is not well designed and a bad idea if you are going to use the values within SQL as you would have to do expensive and complicated casting in order to use the 'real' values.
This is imho the most compact design.
To list all metadata of a specific user, you can use
select u.name,
md.name as 'AttributeName',
md.DataType
from user u
join meta_user mu on u.user_ID = mu.userID
join meta_definition md on md.metadefinition_ID = mu. metadefinition_ID
selecting the values for a given user would be
Variant A:
select u.name,
md.name as 'AttributeName',
mv.* -- show all different data types
from user u
join meta_user mu on u.user_ID = mu.userID
join meta_definition md on md.metadefinition_ID = mu. metadefinition_ID
join meta_value mv on mv.meta_ID = mu.metaID
Disadvantage: When new datatypes are available, you would have to add a column, recompile the query and change your software as well.
select u.name,
md.name as 'AttributeName',
mnum.value as NumericValue,
mdec.value as DecimalValue
...
from user u
join meta_user mu on u.user_ID = mu.userID
join meta_definition md on md.metadefinition_ID = mu. metadefinition_ID
left join meta_numeric mnum on mnum.meta_ID = mu.metaID
left join meta_decimal mdec on mdec.meta_ID = mu.metaID
...
Disadvantage: Slow if many users and attributes are being stored. Needs a new table when a new datatype is being introduced.
Variant C:
select u.name,
md.name as 'AttributeName',
md.DataType -- client needs this to convert to original datatype
mv.value -- appears formatted as string
from user u
join meta_user mu on u.user_ID = mu.userID
join meta_definition md on md.metadefinition_ID = mu. metadefinition_ID
join meta_value mv on mv.meta_ID = mu.metaID
Advantage: Don't have to change the query in case new datatypes are being introduced.
Every data type in PostgreSQL can be cast to text, which is therefore the natural common ground for data or variable type.
I suggest you have a look at the additional module hstore. I quote the manual:
This module implements the hstore data type for storing sets of
key/value pairs within a single PostgreSQL value. This can be useful
in various scenarios, such as rows with many attributes that are
rarely examined, or semi-structured data. Keys and values are simply
text strings.
That's a fast, proven, versatile solution and easy to extend for more attributes.
In addition you could have a table where you register meta-data (like data type and more) for each attribute. You could use this meta-data to check whether the attribute can be cast to its attached type (and passes additional tests stored in the meta-table) in a trigger on INSERT OR UPDATE to maintain integrity.
Be aware that many standard-features of the RDBMS are not easily available for such a storage regime. Basically you run into most of the problems you have with an EAV model ("entity-attribute-value"). You can find very good advice for that under this related question on dba.SE

Query to get all revisions of an object graph

I'm implementing an audit log on a database, so everything has a CreatedAt and a RemovedAt column. Now I want to be able to list all revisions of an object graph but the best way I can think of for this is to use unions. I need to get every unique CreatedAt and RemovedAt id.
If I'm getting a list of countries with provinces the union looks like this:
SELECT c.CreatedAt AS RevisionId from Countries as c where localId=#Country
UNION
SELECT p.CreatedAt AS RevisionId from Provinces as p
INNER JOIN Countries as c ON p.CountryId=c.LocalId AND c.LocalId = #Country
UNION
SELECT c.RemovedAt AS RevisionId from Countries as c where localId=#Country
UNION
SELECT p.RemovedAt AS RevisionId from Provinces as p
INNER JOIN Countries as c ON p.CountryId=c.LocalId AND c.LocalId = #Country
For more complicated queries this could get quite complicated and possibly perform very poorly so I wanted to see if anyone could think of a better approach. This is in MSSQL Server.
I need them all in a single list because this is being used in a from clause and the real data comes from joining on this.
You have most likely already implemented your solution, but to address a few issues; I would suggest considering Aleris's solution, or some derivative thereof.
In your tables, you have a "removed at" field -- well, if that field were active (populated), technically the data shouldn't be there -- or perhaps your implementation has it flagged for deletion, which will break the logging once it is removed.
What happens when you have multiple updates during a reporting period -- the previous log entries would be overwritten.
Having a separate log allows for archival of the log information and allows you to set a different log analysis cycle from your update/edit cycles.
Add whatever "linking" fields required to enable you to get back to your original source data OR make the descriptions sufficiently verbose.
The fields contained in your log are up to you but Aleris's solution is direct. I may create an action table and change the field type from varchar to int, as a link into the action table -- forcing the developers to some standardized actions.
Hope it helps.
An alternative would be to create an audit log that might look like this:
AuditLog table
EntityName varchar(2000),
Action varchar(255),
EntityId int,
OccuranceDate datetime
where EntityName is the name of the table (eg: Contries, Provinces), the Action is the audit action (eg: Created, Removed etc) and the EntityId is the primary key of the modified row in the original table.
The table would need to be kept synchronized on each action performed to the tables. There are a couple of ways to do this:
1) Make triggers on each table that will add rows to AuditTable
2) From your application add rows in AuditTable each time a change is made to the repectivetables
Using this solution is very simple to get a list of logs in audit.
If you need to get columns from original table is also possible using joins like this:
select *
from
Contries C
join AuditLog L on C.Id = L.EntityId and EntityName = 'Contries'
You could probably do it with a cross join and coalesce, but the union is probably still better from a performance standpoint. You can try testing each though.
SELECT
COALESCE(C.CreatedAt, P.CreatedAt)
FROM
dbo.Countries C
FULL OUTER JOIN dbo.Provinces P ON
1 = 0
WHERE
C.LocalID = #Country OR
P.LocalID = #Country

Weird many to many and one to many relationship

I know I'm gonna get down votes, but I have to make sure if this is logical or not.
I have three tables A, B, C. B is a table used to make a many-many relationship between A and C. But the thing is that A and C are also related directly in a 1-many relationship
A customer added the following requirement:
Obtain the information from the Table B inner joining with A and C, and in the same query relate A and C in a one-many relationship
Something like:
alt text http://img247.imageshack.us/img247/7371/74492374sa4.png
I tried doing the query but always got 0 rows back. The customer insists that I can accomplish the requirement, but I doubt it. Any comments?
PS. I didn't have a more descriptive title, any ideas?
UPDATE:
Thanks to rcar, In some cases this can be logical, in order to have a history of all the classes a student has taken (supposing the student can only take one class at a time)
UPDATE:
There is a table for Contacts, a table with the Information of each Contact, and the Relationship table. To get the information of a Contact I have to make a 1:1 relationship with Information, and each contact can have like and an address book with; this is why the many-many relationship is implemented.
The full idea is to obtain the contact's name and his address book.
Now that I got the customer's idea... I'm having trouble with the query, basically I am trying to use the query that jdecuyper wrote, but as he warns, I get no data back
This is a doable scenario. You can join a table twice in a query, usually assigning it a different alias to keep things straight.
For example:
SELECT s.name AS "student name", c1.className AS "student class", c2.className as "class list"
FROM s
JOIN many_to_many mtm ON s.id_student = mtm.id_student
JOIN c c1 ON s.id_class = c1.id_class
JOIN c c2 ON mtm.id_class = c2.id_class
This will give you a list of all students' names and "hardcoded" classes with all their classes from the many_to_many table.
That said, this schema doesn't make logical sense. From what I can gather, you want students to be able to have multiple classes, so the many_to_many table should be where you'd want to find the classes associated with a student. If the id_class entries used in table s are distinct from those in many_to_many (e.g., if s.id_class refers to, say, homeroom class assignments that only appear in that table while many_to_many.id_class refers to classes for credit and excludes homeroom classes), you're going to be better off splitting c into two tables instead.
If that's not the case, I have a hard time understanding why you'd want one class hardwired to the s table.
EDIT: Just saw your comment that this was a made-up schema to give an example. In other cases, this could be a sensible way to do things. For example, if you wanted to keep track of company locations, you might have a Company table, a Locations table, and a Countries table. The Company table might have a 1-many link to Countries where you would keep track of a company's headquarters country, but a many-to-many link through Locations where you keep track of every place the company has a store.
If you can give real information as to what the schema really represents for your client, it might be easier for us to figure out whether it's logical in this case or not.
Perhaps it's a lack of caffeine, but I can't conceive of a legitimate reason for wanting to do this. In the example you gave, you've got students, classes and a table which relates the two. If you think about what you want the query to do, in plain English, surely it has to be driven by either the student table or the class table. i.e.
select all the classes which are attended by student 1245235
select all the students which attend class 101
Can you explain the requirement better? If not, tell your customer to suck it up. Having a relationship between Students and Classes directly (A and C), seems like pure madness, you've already got table B which does that...
Bear in mind that the one-to-many relationship can be represented through the many-to-many, most simply by adding a field there to indicate the type of relationship. Then you could have one "current" record and any number of "history" ones.
Was the customer "requirement" phrased as given, by the way? I think I'd be looking to redefine my relationship with them if so: they should be telling me "what" they want (ideally what, in business domain language, their problem is) and leaving the "how" to me. If they know exactly how the thing should be implemented, then I'd be inclined to open the source code in an editor and leave them to it!
I'm supposing that s.id_class indicates the student's current class, as opposed to classes she has taken in the past.
The solution shown by rcar works, but it repeats the c1.className on every row.
Here's an alternative that doesn't repeat information and it uses one fewer join. You can use an expression to compare s.id_class to the current c.id_class matched via the mtm table.
SELECT s.name, c.className, (s.id_class = c.id_class) AS is_current
FROM s JOIN many_to_many AS mtm ON (s.id_student = mtm.id_student)
JOIN c ON (c.id_class = mtm.id_class);
So is_current will be 1 (true) on one row, and 0 (false) on all the other rows. Or you can output something more informative using a CASE construct:
SELECT s.name, c.className,
CASE WHEN s.id_class = c.id_class THEN 'current' ELSE 'past' END AS is_current
FROM s JOIN many_to_many AS mtm ON (s.id_student = mtm.id_student)
JOIN c ON (c.id_class = mtm.id_class);
It doesn't seem to make sense. A query like:
SELECT * FROM relAC RAC
INNER JOIN tableA A ON A.id_class = RAC.id_class
INNER JOIN tableC C ON C.id_class = RAC.id_class
WHERE A.id_class = B.id_class
could generate a set of data but inconsistent. Or maybe we are missing some important part of the information about the content and the relationships of those 3 tables.
I personally never heard a requirement from a customer that would sound like:
Obtain the information from the Table
B inner joining with A and C, and in
the same query relate A and C in a
one-many relationship
It looks like that it is what you translated the requirement to.
Could you specify the requirement in plain English, as what results your customer wants to get?