Storing multiple data types - SQL

I want to store metadata settings for users, but the metadata values come in multiple data types: integer, string, date, or boolean.
So I came up with my own solution:
user(user_id(PK), ...)
meta(
meta_id (PK)
, user_id (FK)
, data_type
, meta_name
, ...)
meta_user(
user_id(FK)
, meta_id(FK)
, number_value
, decimal_value
, string_value
, date_value
, time_value
, boolean_value)
But I'm not sure that's the right way to store multiple data types. I hope someone here can share their solution.
UPDATE:
A user may have many metadata entries, and users must first register their metadata.

My contribution to this is:
Since you need names for the metadata, add a metadefinition table:
meta_definition (metadefinition_ID(PK), name (varchar), datatype)
Then modify your meta_user table
meta_user (meta_ID (PK), user_ID(FK), metadefinition_ID(FK))
You have 3 choices (A,B,C) or even more...
Variant A: Keep your design of storing the values for all possible datatypes in one single row resulting in a sparse table.
This is the easiest to implement but the ugliest in terms of 'clean design' (my opinion).
Variant B: Use distinct value tables per data type: instead of one sparse meta_user table, use six tables meta_number, meta_decimal, meta_string, meta_date, meta_time, and meta_boolean. Each table has the form:
meta_XXXX (metadata_ID(PK), meta_ID(FK), value)
Imho, this is the cleanest design (but a bit complicated to work with).
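For illustration, here is a minimal DDL sketch of Variant B (the column types and constraint details are my assumptions; adjust them to your RDBMS):
CREATE TABLE meta_definition (
    metadefinition_ID INT PRIMARY KEY,
    name              VARCHAR(100) NOT NULL,
    datatype          VARCHAR(20)  NOT NULL  -- 'number', 'decimal', 'string', ...
);
CREATE TABLE meta_user (
    meta_ID           INT PRIMARY KEY,
    user_ID           INT NOT NULL,  -- FK to user(user_id); "user" may need quoting
    metadefinition_ID INT NOT NULL REFERENCES meta_definition (metadefinition_ID)
);
-- one value table per datatype, all of the same shape:
CREATE TABLE meta_number (
    metadata_ID INT PRIMARY KEY,
    meta_ID     INT NOT NULL REFERENCES meta_user (meta_ID),
    value       BIGINT
);
CREATE TABLE meta_string (
    metadata_ID INT PRIMARY KEY,
    meta_ID     INT NOT NULL REFERENCES meta_user (meta_ID),
    value       VARCHAR(255)
);
-- ... likewise meta_decimal, meta_date, meta_time, meta_boolean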
Variant C: Reduce meta_user to hold three columns (I renamed it to meta_values, as it holds values, not users).
meta_values (metavalue_ID(PK), meta_ID(FK), value (varchar))
Format all values as string/varchar and stuff them into the value column. This is not well designed and a bad idea if you are going to use the values within SQL, as you would have to do expensive and complicated casting in order to use the 'real' values.
This is imho the most compact design.
To list all metadata of a specific user, you can use:
select u.name,
       md.name as 'AttributeName',
       md.DataType
from user u
join meta_user mu on u.user_ID = mu.user_ID
join meta_definition md on md.metadefinition_ID = mu.metadefinition_ID
Selecting the values for a given user would be:
Variant A:
select u.name,
       md.name as 'AttributeName',
       mv.* -- show all the different data type columns
from user u
join meta_user mu on u.user_ID = mu.user_ID
join meta_definition md on md.metadefinition_ID = mu.metadefinition_ID
join meta_value mv on mv.meta_ID = mu.meta_ID
Disadvantage: When new datatypes become available, you have to add a column, adapt the query, and change your software as well.
Variant B:
select u.name,
       md.name as 'AttributeName',
       mnum.value as NumericValue,
       mdec.value as DecimalValue
       ...
from user u
join meta_user mu on u.user_ID = mu.user_ID
join meta_definition md on md.metadefinition_ID = mu.metadefinition_ID
left join meta_number mnum on mnum.meta_ID = mu.meta_ID
left join meta_decimal mdec on mdec.meta_ID = mu.meta_ID
...
Disadvantage: Slow if many users and attributes are being stored. Needs a new table when a new datatype is being introduced.
Variant C:
select u.name,
       md.name as 'AttributeName',
       md.DataType, -- the client needs this to convert back to the original datatype
       mv.value -- appears formatted as string
from user u
join meta_user mu on u.user_ID = mu.user_ID
join meta_definition md on md.metadefinition_ID = mu.metadefinition_ID
join meta_values mv on mv.meta_ID = mu.meta_ID
Advantage: You don't have to change the query when new datatypes are introduced.
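To illustrate the casting cost mentioned under Variant C, getting typed values back out could look like this in PostgreSQL-flavored SQL (a sketch; the DataType labels and the user id are assumptions):
select md.name as attribute_name,
       case when md.DataType = 'number'  then mv.value::bigint  end as number_value,
       case when md.DataType = 'decimal' then mv.value::numeric end as decimal_value,
       case when md.DataType = 'date'    then mv.value::date    end as date_value
from meta_user mu
join meta_definition md on md.metadefinition_ID = mu.metadefinition_ID
join meta_values mv on mv.meta_ID = mu.meta_ID
where mu.user_ID = 1;  -- some user
Each CASE guards its cast, so a value is only cast when its registered DataType matches; all other columns come back NULL.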

Every data type in PostgreSQL can be cast to text, which therefore makes text the natural common ground for values of varying type.
I suggest you have a look at the additional module hstore. I quote the manual:
This module implements the hstore data type for storing sets of key/value pairs within a single PostgreSQL value. This can be useful in various scenarios, such as rows with many attributes that are rarely examined, or semi-structured data. Keys and values are simply text strings.
That's a fast, proven, versatile solution and easy to extend for more attributes.
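A minimal sketch of the hstore approach (table and key names are my assumptions):
CREATE EXTENSION IF NOT EXISTS hstore;  -- PostgreSQL 9.1+
CREATE TABLE user_meta (
    user_id int PRIMARY KEY,
    attrs   hstore NOT NULL DEFAULT ''
);
INSERT INTO user_meta (user_id, attrs)
VALUES (1, 'age => 42, newsletter => true, signup => 2015-01-01');
SELECT attrs -> 'age' AS age   -- values always come back as text
FROM   user_meta
WHERE  attrs ? 'newsletter';   -- does the row have this key?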
In addition, you could have a table where you register meta-data (like the data type and more) for each attribute. In a trigger ON INSERT OR UPDATE you could use this meta-data to check whether the attribute can be cast to its attached type (and passes any additional tests stored in the meta table), to maintain integrity.
Be aware that many standard features of the RDBMS are not easily available with such a storage regime. Basically, you run into most of the problems of an EAV model ("entity-attribute-value"). You can find very good advice on that under this related question on dba.SE.


How can I remove the repetitive joins from this SELECT query?

I have a postgres database with two tables: services and meta.
The first table stores the "core" information the application needs, and the app also has a "custom field" feature implemented similar to how Wordpress's wp_post_meta table works.
Users can add on meta rows with arbitrary keys and values, in a one-to-many relationship with the service.
The schema of the meta table is:
id
key (string)
value (string)
service_id (foreign key)
That works great for the app, so I'm not interested in changing the schema, but for some infrequently used admin dashboards I need to get back a list of services with several of the meta rows joined on as columns.
Here's what I have so far:
SELECT
  services.*,
  meta1.value AS funding,
  meta2.value AS ownership
FROM services
JOIN meta meta1
  ON services.id = meta1.service_id
  AND meta1.key = 'Funding'
JOIN meta meta2
  ON services.id = meta2.service_id
  AND meta2.key = 'Ownership'
Now, this works great, but I have to do another join every time I want to add another meta value.
That seems like it will slow down the query and make it less readable.
Is there a good way to refactor this to keep it easy to read and fast to run?
Here's an attempted refactor using OR, which doesn't work:
SELECT
*,
meta.value AS funding,
meta.value AS ownership
FROM services
JOIN meta
ON services.id = meta.service_id
AND meta.key = 'Funding' OR meta.key = 'Ownership'
One way would be to aggregate the key/value pairs into a JSON value with a derived table:
select srv.*,
       mv.vals ->> 'Funding' as funding,
       mv.vals ->> 'Ownership' as ownership
from services srv
cross join lateral (
  select jsonb_object_agg(m.key, m.value) as vals
  from meta m
  where m.key in ('Funding', 'Ownership')
    and m.service_id = srv.id
) as mv
If your application can handle the JSON, then maybe the conversion into two separate columns isn't actually necessary which would avoid the repetition of the keys.
You can use conditional aggregation:
SELECT s.*, m.funding, m.ownership
FROM services s JOIN
     (SELECT m.service_id,
             MAX(value) FILTER (WHERE key = 'Funding') as Funding,
             MAX(value) FILTER (WHERE key = 'Ownership') as Ownership
      FROM meta m
      GROUP BY m.service_id
     ) m
     ON m.service_id = s.id
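Note that the inner join drops services that have no meta rows at all. If those should still show up (with NULL values), a LEFT JOIN variant of the same idea would be:
SELECT s.*, m.funding, m.ownership
FROM services s LEFT JOIN
     (SELECT m.service_id,
             MAX(value) FILTER (WHERE key = 'Funding') as Funding,
             MAX(value) FILTER (WHERE key = 'Ownership') as Ownership
      FROM meta m
      GROUP BY m.service_id
     ) m
     ON m.service_id = s.id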

Subquery that matches column with several ranges defined in table

I've got a pretty common setup for an address database: a person is tied to a company with a join table, the company can have an address and so forth.
All pretty normalized and easy to use. But for search performance, I'm creating a materialized, rather denormalized view. I only need a very limited set of information and quick queries. Most of everything that's usually done via a join table is now in an array. Depending on the query, I can either search it directly or join it via unnest.
As a complement to my zipcodes column (varchar[]), I'd like to add a states column that has the (German federal) states already precomputed, so that I don't have to transform a query to include all kinds of range comparisons.
My mapping data is in a table like this:
CREATE TABLE zip2state (
state TEXT NOT NULL,
range_start CHARACTER VARYING(5) NOT NULL,
range_end CHARACTER VARYING(5) NOT NULL
)
Each state has several ranges, and ranges can overlap (one zip code can be for two different states). Some ranges have range_start = range_end.
Now I'm a bit at wit's end on how to get that into a materialized view all at once. Normally, I'd feel tempted to just do it iteratively (via trigger or on the application level).
Or, as we're just talking about 5 digits, I could create a big table mapping each zip directly to its state instead of doing it via ranges (my current favorite, yet ugly enough that it prompted me to ask whether there's a better way).
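(A sketch of how that flat mapping could be generated from the range table in one statement, assuming all codes are purely numeric:)
CREATE TABLE zip2state_flat AS
SELECT DISTINCT to_char(z, 'FM00000') AS zipcode, s.state
FROM   zip2state s,
       generate_series(s.range_start::int, s.range_end::int) AS z;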
Any way to do that in SQL, with a table like the above (or something similar)? I'm at postgres 9.3, all features allowed...
For completeness' sake, here's the subquery for the zip codes:
(select array_agg(distinct address.zipcode)
from affiliation
join company
on affiliation.ins_id = company.id
join address
on address.com_id = company.id
where affiliation.per_id = person.id) AS zipcodes,
I suggest a LATERAL join instead of the correlated subquery to conveniently compute both columns at once. Could look like this:
SELECT p.*, z.*
FROM person p
LEFT JOIN LATERAL (
SELECT array_agg(DISTINCT d.zipcode) AS zipcodes
, array_agg(DISTINCT z.state) AS states
FROM affiliation a
-- JOIN company c ON a.ins_id = c.id -- suspect you don't need this
JOIN address d ON d.com_id = a.ins_id -- c.id
LEFT JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
WHERE a.per_id = p.id
) z ON true;
If referential integrity is guaranteed, you don't need to join to the table company at all. I took the shortcut.
Be aware that varchar or text behaves differently than expected for numbers. For example: '333' > '0999'. If all zip codes have 5 digits you are fine.
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

SQL query join different tables based on value in a given column?

Well, I am designing a domain model and data mapper system for my site. The system has different kinds of user domain objects, with a base class for users and child classes for admins, mods and banned users. Every domain object uses data from a table called 'users', while the child classes each have an additional table to store admin/mod/banned information. The user type is determined by a column in the 'users' table called 'userlevel'; its value is 3 for admins, 2 for mods, 1 for registered users and -1 for banned users.
Now a problem comes up as I work on a members-list feature for my site, since the members list is supposed to load all users from the database (with pagination, but let's not worry about that now). The issue is that I want to load data from both the base user table and the additional admin/mod/banned tables. As you see, registered users have no additional table for extra data, while for admin/mod/banned users the table differs, and the columns in these tables differ as well.
So how am I supposed to handle this situation using SQL queries? I know I could simply select from the base user table and then use additional queries to load the extra data when the user level has a given value, but this is a bad idea since it results in n+1 queries for n admins/mods/banned users, a very expensive trip to the database. What else can I do? Please help.
If you want to query all user types with one query, you will have to have the columns from all tables in your result table, several of them filled with NULL values.
To get them filled with data, use a left join like this:
SELECT *
FROM userdata u
LEFT OUTER JOIN admindata a
ON ( u.userid = a.userid
AND u.usertype = 3 )
LEFT OUTER JOIN moddata m
ON ( u.userid = m.userid
AND u.usertype = 2 )
LEFT OUTER JOIN banneddata b
ON ( u.userid = b.userid
AND u.usertype = -1 )
WHERE...
You could probably drop the usertype-condition, since there should only be data in one of the joined tables, but you never know...
Then your program-code will have the job to pick the correct columns based on the usertype.
P.S.: Note that select * is used only for the sake of simplicity; in real code, better to list all the column names...
While it is totally fine to have this hierarchy in your domain classes, I would suggest changing the approach in your database. Otherwise your queries are going to be very complex and slow.
You could have just one extra table called e.g. users_additional_info with the mix of the columns that you need for all your user types. Then you can do:
SELECT * FROM users
LEFT JOIN users_additional_info ON users.id = users_additional_info.user_id
to get all the information in a single simple query. Believe me or not, this approach will save you a lot of headaches in the future when your tables start to grow or you decide to add another type of user.

SQL database problems with addressbook table design

I am writing an address-book module for my software right now. I have set up the database so that it supports a very flexible address-book configuration.
I can create n entries for every type I want. 'Type' here means data like 'email', 'address', 'telephone', etc.
I have a table named 'contact_profiles'.
This only has two columns:
id Primary key
date_created DATETIME
And then there is a table called contact_attributes. This one is a little more complex (a rough DDL sketch follows the column list):
id PK
#profile (Foreign key to contact_profiles.id)
type VARCHAR describing the type of the entry (name, email, phone, fax, website, ...) I should probably change this to a SET later.
value Text (containing the value for the attribute).
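A minimal DDL sketch of that layout (the column types are my guesses from the description):
CREATE TABLE contact_profiles (
    id           INT PRIMARY KEY,
    date_created DATETIME NOT NULL
);
CREATE TABLE contact_attributes (
    id      INT PRIMARY KEY,
    profile INT NOT NULL,          -- FK to contact_profiles.id
    type    VARCHAR(50) NOT NULL,  -- 'name', 'email', 'phone', ...
    value   TEXT
);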
I can now link to these profiles, for example from my users table. But from here I run into problems.
At the moment I would have to create a JOIN for each value that I want to retrieve.
Is there a possibility to somehow create a view that gives me a result with the types as columns?
So right now I would get something like:
#profile  type     value
1         email    name#domain.tld
1         name     Sebastian Hoitz
1         website  domain.tld
But it would be nice to get a result like this:
#profile  email            name             website
1         name#domain.tld  Sebastian Hoitz  domain.tld
The reason I do not want to create the table layout like this initially is that there may always be things to add, and I want to be able to have multiple attributes of the same type.
So do you know if there is any possibility to convert this dynamically?
If you need a better description please let me know.
You have reinvented a database design called Entity-Attribute-Value. This design has a lot of weaknesses, including the weakness you've discovered: it's very hard to reproduce a query result in a conventional format, with one column per attribute.
Here's an example of what you must do:
SELECT c.id, c.date_created,
       c1.value AS name,
       c2.value AS email,
       c3.value AS phone,
       c4.value AS fax,
       c5.value AS website
FROM contact_profiles c
LEFT OUTER JOIN contact_attributes c1
  ON (c.id = c1.profile AND c1.type = 'name')
LEFT OUTER JOIN contact_attributes c2
  ON (c.id = c2.profile AND c2.type = 'email')
LEFT OUTER JOIN contact_attributes c3
  ON (c.id = c3.profile AND c3.type = 'phone')
LEFT OUTER JOIN contact_attributes c4
  ON (c.id = c4.profile AND c4.type = 'fax')
LEFT OUTER JOIN contact_attributes c5
  ON (c.id = c5.profile AND c5.type = 'website');
You must add another LEFT OUTER JOIN for every attribute. You must know the attributes at the time you write the query. You must use LEFT OUTER JOIN and not INNER JOIN because there's no way to make an attribute mandatory (the equivalent of simply declaring a column NOT NULL).
It's far more efficient to retrieve the attributes as they are stored, and then write application code to loop through the result set, building an object or associative array with an entry for each attribute. You don't need to know all the attributes this way, and you don't have to execute an n-way join.
SELECT * FROM contact_profiles c
LEFT OUTER JOIN contact_attributes ca ON (c.id = ca.profile);
You asked in a comment what to do if you truly need this level of flexibility, if not the EAV design. SQL is not the correct solution if you truly need unlimited metadata flexibility. Here are some alternatives:
Store a TEXT BLOB, containing all the attributes structured in XML or YAML format.
Use a semantic data modeling solution like Sesame, in which any entity can have dynamic attributes.
Abandon databases and use flat files.
EAV and all of these alternative solutions are a lot of work. You should consider very carefully whether you truly need this degree of flexibility in your data model, because everything is hugely simpler if you can treat the metadata structure as relatively unchanging.
If you are limiting yourself to displaying a single email, name, website, etc. for each person in this query, I'd use subqueries:
SELECT cp.ID profile
,cp.Name
,(SELECT value FROM contact_attributes WHERE type = 'email' and profile = cp.id) email
,(SELECT value FROM contact_attributes WHERE type = 'website' and profile = cp.id) website
,(SELECT value FROM contact_attributes WHERE type = 'phone' and profile = cp.id) phone
FROM contact_profiles cp
If you're using SQL Server, you could also look at PIVOT.
If you want to show multiple emails, phones, etc., then consider that each profile must have the same number of them or you'll have blanks.
I'd also factor out the type column. Create a table called contact_attribute_types which would hold "email", "website", etc. Then you'd store the contact_attribute_types.id integer value in the contact_attributes table.
You will need to generate a query like:
select #profile,
max(case when type='email' then value end) as email,
max(case when type='name' then value end) as name,
max(case when type='website' then value end) as website
from mytable
group by #profile
However, that will only show one value for each type per #profile. Your DBMS may have a function you can use instead of MAX to concatenate all the values as a comma-separated string, or you may be able to write one.
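In MySQL, for example, that could look like this (assuming GROUP_CONCAT is available; other systems have STRING_AGG or LISTAGG):
select `#profile`,
       group_concat(case when type='email' then value end) as emails,
       group_concat(case when type='phone' then value end) as phones
from mytable
group by `#profile`
GROUP_CONCAT skips NULLs, so each column collects only the values of its own type.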
This kind of data model is generally best avoided for the reasons you have already mentioned!
You create a view for each contact type.
When you want all the information, you pull from the entire table; when you want a subset of a specific contact type, you pull from the view.
I'd create a stored procedure that takes the intent {all, phone, email, address} as one of its parameters and then derives the data. All my app code would call this stored procedure to get the data. Also, when a new type is added (which should happen very infrequently), you create another view and modify only this sproc.
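A rough T-SQL sketch of such a procedure (all names and the exact filtering are assumptions):
CREATE PROCEDURE GetContactAttributes
    @intent VARCHAR(20)  -- 'all', 'phone', 'email', 'address', ...
AS
BEGIN
    SELECT p.id, a.type, a.value
    FROM contact_profiles p
    JOIN contact_attributes a ON a.profile = p.id
    WHERE @intent = 'all' OR a.type = @intent;
END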
I've implemented a similar design for multiple small/med size systems and have had no issues.
Am I missing something? This seems trivial?
EDIT:
I see what I was missing... You are trying to be normalized and denormalized at the same time. I'm not sure what the rest of your business rules for pulling records are. You could have profiles with multiple or null values for phone/email/address etc. I would keep your data format the same and again use a sproc to create the specific view you want. As your business needs change, you leave your data alone and just create another sproc to access it.
There is no one right answer for this question, as one would need to know, for your specific organization or application, how many of those contact methods the business wants to collect, how current they want the information to be, and how much flexibility they are willing to invest in.
Of course, many of here could make some good guesses as to what the average business would want to do, but the real answer is to find out what your project, what your users, are interested in.
BTW, all architecture questions about "best"-ness require this sort of cost, benefit, and risk analysis.
Now that document-oriented databases are getting more and more popular, one could use one of them to store all this information in a single entry, thereby eliminating all those extra joins and queries.

Query to get all revisions of an object graph

I'm implementing an audit log on a database, so everything has a CreatedAt and a RemovedAt column. Now I want to be able to list all revisions of an object graph, and the best way I can think of is to use unions. I need to get every unique CreatedAt and RemovedAt revision id.
If I'm getting a list of countries with provinces, the union looks like this:
SELECT c.CreatedAt AS RevisionId from Countries as c where localId=@Country
UNION
SELECT p.CreatedAt AS RevisionId from Provinces as p
INNER JOIN Countries as c ON p.CountryId=c.LocalId AND c.LocalId = @Country
UNION
SELECT c.RemovedAt AS RevisionId from Countries as c where localId=@Country
UNION
SELECT p.RemovedAt AS RevisionId from Provinces as p
INNER JOIN Countries as c ON p.CountryId=c.LocalId AND c.LocalId = @Country
For more complicated object graphs this could get quite unwieldy and possibly perform very poorly, so I wanted to see if anyone can think of a better approach. This is in MS SQL Server.
I need them all in a single list because this is being used in a from clause and the real data comes from joining on this.
You have most likely already implemented your solution, but to address a few issues, I would suggest considering Aleris's solution, or some derivative thereof.
In your tables you have a "removed at" field -- well, if that field were populated, technically the data shouldn't be there any more -- or perhaps your implementation flags the row for deletion, which will break the logging once it is actually removed.
What happens when you have multiple updates during a reporting period? The previous log entries would be overwritten.
Having a separate log allows for archival of the log information and lets you set a log analysis cycle different from your update/edit cycles.
Add whatever "linking" fields are required to enable you to get back to your original source data, or make the descriptions sufficiently verbose.
The fields contained in your log are up to you, but Aleris's solution is direct. I might create an action table and change the Action field type from varchar to int as a link into that table, forcing the developers to use standardized actions.
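That normalization could be as simple as this sketch (names assumed):
CREATE TABLE AuditAction (
    Id   INT PRIMARY KEY,
    Name VARCHAR(50) NOT NULL  -- 'Created', 'Removed', ...
);
-- AuditLog.Action then becomes an INT foreign key into AuditAction.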
Hope it helps.
An alternative would be to create an audit log that might look like this:
AuditLog table
EntityName varchar(2000),
Action varchar(255),
EntityId int,
OccurrenceDate datetime
where EntityName is the name of the table (eg: Countries, Provinces), Action is the audit action (eg: Created, Removed, etc.) and EntityId is the primary key of the modified row in the original table.
The table would need to be kept synchronized with each action performed on the tables. There are a couple of ways to do this:
1) Make triggers on each table that add rows to AuditLog (see the sketch below)
2) From your application, add rows to AuditLog each time a change is made to the respective tables
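A rough T-SQL sketch of option 1 for the Countries table (the column names are assumptions):
CREATE TRIGGER trg_Countries_Audit
ON Countries
AFTER INSERT, DELETE
AS
BEGIN
    -- log newly created rows
    INSERT INTO AuditLog (EntityName, Action, EntityId, OccurrenceDate)
    SELECT 'Countries', 'Created', i.Id, GETDATE() FROM inserted i;
    -- log removed rows
    INSERT INTO AuditLog (EntityName, Action, EntityId, OccurrenceDate)
    SELECT 'Countries', 'Removed', d.Id, GETDATE() FROM deleted d;
END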
With this solution it is very simple to get a list of audit entries.
If you need columns from the original table, that is also possible using joins like this:
select *
from Countries C
join AuditLog L on C.Id = L.EntityId and L.EntityName = 'Countries'
You could probably do it with a cross join and coalesce, but the union is probably still better from a performance standpoint. You can try testing each though.
SELECT
    COALESCE(C.CreatedAt, P.CreatedAt)
FROM
    dbo.Countries C
    FULL OUTER JOIN dbo.Provinces P ON 1 = 0
WHERE
    C.LocalID = @Country OR
    P.LocalID = @Country