I want to try to create a boolean table with the same structure as another table. I know how to create the table; my issue is the updating.
Let's say I have a table A1 with 10 columns holding different attributes for a person, such as height, run speed, name, hair colour, etc.
I then want to be able to modify this table by either removing or adding columns to A1, and have those updates apply to my other table B1 so it has the same columns but boolean values (the boolean values are not based on A1).
My first question is whether this is doable.
My second is: will the updates be terribly inefficient for, say, 200-300 records?
(I could probably create an external program that reads the table and manually removes and adds columns via ADD/DROP SQL statements, as sketched below, but I was hoping there was a more dynamic/efficient solution.)
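For illustration, the manual mirroring would be something like this (eye_colour is just an example attribute; BIT stands in for whatever boolean type your database uses):
ALTER TABLE A1 ADD eye_colour VARCHAR(20);  -- new attribute on the data table
ALTER TABLE B1 ADD eye_colour BIT;          -- boolean mirror on the flag table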
What you want, as another answer posted, is an EAV ("entity-attribute-value") schema. This allows you to add new attributes dynamically without changing any physical table schema. It is also horrible for performance (but with only a few hundred entities it shouldn't be too bad).
Another equally ugly solution is to add as many columns as you think you'll ever need, named Attribute_1, Attribute_2, etc. Then you have a lookup table which maps attributes to their definitions.
This is less flexible than the EAV schema, but it allows you to index specific attributes so that your queries are a little more performant.
Another solution would be to use an XML data type to hold the attributes and values. SQL Server has built-in functionality for XML data; while it's not as easy to use as plain SQL, it works quite well.
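A rough sketch of the XML approach in SQL Server (the table and attribute names here are made up for illustration):
CREATE TABLE PersonFlags (
    PersonId INT PRIMARY KEY,
    Flags    XML
);

INSERT INTO PersonFlags (PersonId, Flags)
VALUES (1, '<attrs><height>1</height><runSpeed>0</runSpeed></attrs>');

-- Pull one "boolean" attribute back out:
SELECT PersonId,
       Flags.value('(/attrs/height)[1]', 'bit') AS HeightFlag
FROM PersonFlags;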
Rather than adding and removing columns on the table, I would suggest that you have a table with the fixed attributes, then another table which stores the additional attributes (the names of the columns), and a third table which holds the id of the person, the id of the attribute, and the value of the attribute.
For example, the user table:
UserId
Firstname
Surname
The attribute table:
AttrId
AttrName
The UserAttribute table:
UserId
AttrId
AttrValue
For this to answer your question, you could have two sets of these tables, but with AttrValue being boolean in the second set.
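A minimal sketch of that layout in generic SQL (the user table is named Users here to avoid reserved words; in the second, boolean set of tables, AttrValue would be a BIT/BOOLEAN column instead):
CREATE TABLE Users (
    UserId    INT PRIMARY KEY,
    Firstname VARCHAR(50),
    Surname   VARCHAR(50)
);

CREATE TABLE Attribute (
    AttrId   INT PRIMARY KEY,
    AttrName VARCHAR(50) NOT NULL
);

CREATE TABLE UserAttribute (
    UserId    INT NOT NULL REFERENCES Users (UserId),
    AttrId    INT NOT NULL REFERENCES Attribute (AttrId),
    AttrValue VARCHAR(255),   -- BIT/BOOLEAN in the second set of tables
    PRIMARY KEY (UserId, AttrId)
);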
An intermediate option is to have multiple spare columns in the table and use the attribute table to store a column name plus a boolean that indicates whether the column is in use.
Is there a way in Microsoft SQL to reference a specific item of data based on table, column and record?
For example, table A (COL1 INT, COL2 INT) has 2 records (1,2) and (3,4). Can I somehow capture value 4 by reference, rather than as "4"?
The purpose is to allow me to create an audit method that can point to specific value in a (table, column, record) without having to duplicate that value in my audit table (which could be large, therefore bloating my database size).
I am thinking ... just like Object_Id identifies a particular SQL object, so would this reference (some kind of GUID, perhaps?) identify a specific piece of data.
Many thanks in advance.
The answer is No. In MS SQL (and, as far as I know, in other popular databases) there are no such references to specific values.
Moreover, even table rows in MS SQL do not have embedded unique identifiers, unless you take care to create an IDENTITY column.
You could implement such references yourself. For example, create a table with columns
data_id,
table_name,
row_id,
column_name
and fill it up every time you need a reference. Then you can refer to a piece of data by its data_id.
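A rough sketch in T-SQL, with invented names (and assuming every referenced table has an IDENTITY column called Id):
CREATE TABLE DataRef (
    data_id     INT IDENTITY PRIMARY KEY,
    table_name  SYSNAME NOT NULL,
    row_id      INT     NOT NULL,
    column_name SYSNAME NOT NULL
);

-- Dereferencing data_id = 1 still requires dynamic SQL:
DECLARE @tbl SYSNAME, @col SYSNAME, @row INT, @sql NVARCHAR(MAX);

SELECT @tbl = table_name, @col = column_name, @row = row_id
FROM DataRef
WHERE data_id = 1;

SET @sql = N'SELECT ' + QUOTENAME(@col) +
           N' FROM ' + QUOTENAME(@tbl) +
           N' WHERE Id = @row';

EXEC sp_executesql @sql, N'@row INT', @row = @row;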
But this is not a good solution:
in most cases, a single entry in this table will consume more space than the referenced data value itself
to get the values, you still have to use dynamic SQL
this only works for tables that have an IDENTITY column, and only if it has the same name in all tables
and so on
Abstract
I have one main table with columns named A through Z. I have a dozen other tables that also have strictly letter-named columns (with the same data types), just not all columns from A to Z (e.g. one table may contain columns A, C, H while another contains columns A, B, C, X, Y, Z, etc.).
Assuming the main table has defaults for all columns A through Z, is there a way to select each of the other tables' records into it without specifying every column name?
I know it is best to be explicit, but in reality I'm dealing with hundreds of fields, not 26, so I'm hoping there's a better way that's eluded me.
Concrete
I am trying to store annual data in one table (has additional year column). Each year, all existing fields remain from the previous year, but new fields may be added. I have imported each year's data into its own table (25 of them) and would like to consolidate. Most recent year/main table contains hundreds of columns.
I found a similar question here, Insert into table from another with different columns count (MySql), but the answer lists all columns explicitly in the insert.
Understanding that there could be various reasons to do this, I think you would be better off looking at something like table inheritance to provide a unified query interface over the previous tables. There are of course difficulties here, but that is probably the better solution if you can manage it. Note that adding new fields would work very differently if you go with inheritance.
The next option would be to create an SQL mapping function for each table into the archive table. So if you have an additional year int field you need to fill, you could:
CREATE FUNCTION mytable_to_archive(mytable) RETURNS archive LANGUAGE sql AS
$$
SELECT $1.*, NULL::int AS year;
$$ IMMUTABLE;
Note that, for performance reasons, this just takes in a tuple and returns a tuple; no db lookups. Then you could:
INSERT INTO archive
SELECT (mytable_to_archive(mytable)).* FROM mytable;
I have a problem finding a way to represent multiple hash tables in a single table.
Say I have 3 tables with the format:
Table1(Table1_PK1,Table1_PK2,Table1_PK3,Table1_Hash)
Table2(Table2_PK1,Table2_PK2,Table2_Hash)
Table3(Table3_Pk1,Table3_PK2,Table3_PK3,Table3_PK4,Table3_PK5,Table3_Hash)
Table1_PK1, Table1_PK2, Table1_PK3, ... are columns, and they might have different datatypes (VARCHAR, INT or DATETIME, ...).
My question is whether there is a way to create a single table (with a fixed number of columns) that can represent all 3 of these tables (maybe more in practice).
I am trying to do this for my database tool. Each table contains primary key columns and a hash value associated with them.
Since you're apparently building a database tool, not a database, it might make more sense to do this in application code rather than in a database table.
In a different answer, you commented
I am still looking for a dynamic way to do it without knowing how many primary keys a table can have.
A table can have only one primary key. That primary key can consist of more than one column, though. (You already knew this; you were just using the wrong words, which might confuse others.)
A table can also have an arbitrary number of other keys, which will be either declared (as NOT NULL UNIQUE) or "undeclared" (by creating an index that guarantees uniqueness over a set of columns).
You can look all that stuff up at run time in one or both of two ways. (Links go to documentation for PostgreSQL.)
System tables, sometimes called system catalogs
information_schema views
As far as I know, all modern SQL platforms implement at least one of these interfaces. The information_schema views are covered in the SQL standards, but there seems to be some room for interpretation. They don't look quite the same on all platforms.
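For example, one way to discover a table's primary-key columns at run time via the information_schema views (PostgreSQL syntax; 'mytable' is a placeholder):
SELECT kcu.column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.constraint_schema = tc.constraint_schema
WHERE tc.table_name = 'mytable'
  AND tc.constraint_type = 'PRIMARY KEY'
ORDER BY kcu.ordinal_position;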
Why combine the 3 tables into one? That would be really bad db design. But here's a way to do it:
The one table will have a column for each of the 3 tables' columns you want in the final table. I am making the assumption that TableX_Hash is the same type, so that remains as one unique column:
Table_All_in_One (
Table1_PK1,
Table1_PK2,
Table1_PK3,
# space just for clarity of grouping
Table2_PK1,
Table2_PK2,
Table3_PK1,
Table3_PK2,
Table3_PK3,
Table3_PK4,
Table3_PK5,
TableX_Hash # Assuming all the _Hash'es are the same type+length,
# otherwise, add Table1_Hash, Table2_Hash, Table3_Hash
# This can be your new primary key
)
The primary keys (PKx) are required to be non-NULL only in their own tables; in this table, they have to allow NULLs. The idea is that each row of this new table holds the data for only one of the original tables, and the other columns are empty for that row. If you want to associate a row of one table with another, you can add that to the same row, or add FK_Table1_Hash, FK_Table2_Hash and FK_Table3_Hash columns which refer to the TableX_Hash value of a record.
PS: I wonder if what you are really looking for is a View and not this really bad all-in-one table.
Edit: Combining them into one "without knowing how many primary keys a table can have", as per your comment:
Store all the _PKs concatenated into one column:
Table_All_in_One (
New_PK,
TableX_Hash,
Table1_PKx, # Concatenated PKs of Table1
Table2_PKx, # Concatenated PKs of Table2, etc.
...,
# OR just one
TableX_PKs, # concatenate all the PK's into one VARCHAR field
# Add a pipe `|` between them optionally.
Table_Num # If using just one, then you'll need to store the table number
)
You will not be able to conveniently pick records based on part of their composite primary key; it will always have to be TableX_PKs = CONCAT_WS('|', Table1_PK1, Table1_PK2, ...). So your only dependency is on the number of PKs in the original table.
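A sketch of how that might look in MySQL, using the single-column variant from the listing above (assuming New_PK is auto-generated):
-- Filling the all-in-one table from Table1 (table number 1):
INSERT INTO Table_All_in_One (TableX_Hash, TableX_PKs, Table_Num)
SELECT Table1_Hash,
       CONCAT_WS('|', Table1_PK1, Table1_PK2, Table1_PK3),
       1
FROM Table1;

-- Looking a row back up by its original composite key:
SELECT TableX_Hash
FROM Table_All_in_One
WHERE Table_Num = 1
  AND TableX_PKs = CONCAT_WS('|', 'pk1_value', 'pk2_value', 'pk3_value');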
In order to model a bunch of tables this way you will need three tables. An entity table (sometimes called a factor table) contains the names of the tables you wish to set up this way. A factor_detail table contains all the columns of those tables and their associated properties. A third table, factor_detail_value, stores things like lookup values for lookup tables.
I'm trying to learn more about this myself, because we are using this technique at work: generate SQL on the fly for any table encoded this way, and store the data in a repository pertinent to the data itself. That way, if a table changes and you need to add a column or change a datatype, you can add a row to the factor_detail table without a database shutdown in production. In most businesses, a four-hour shutdown to make a table change can cost thousands of dollars. If you are dealing with insurance, for example, each additional state that you sell insurance in has different requirements, and that results in table changes. We reduced our table count from over 700 tables in this manner, and we can make changes without a database shutdown, avoiding the lost revenue.
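A loose sketch of those three metadata tables (the column names are my guesses at the intent, not an established standard):
CREATE TABLE factor (
    factor_id  INT PRIMARY KEY,
    table_name VARCHAR(128) NOT NULL
);

CREATE TABLE factor_detail (
    factor_detail_id INT PRIMARY KEY,
    factor_id        INT NOT NULL REFERENCES factor (factor_id),
    column_name      VARCHAR(128) NOT NULL,
    data_type        VARCHAR(30)  NOT NULL
);

CREATE TABLE factor_detail_value (
    factor_detail_id INT NOT NULL REFERENCES factor_detail (factor_detail_id),
    lookup_value     VARCHAR(255) NOT NULL
);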
My application has one table called 'events', and each event has approx 30 standard fields, but also user-defined fields that could have any name or type, in an 'eventdata' table. Users can define these event data tables by specifying x number of fields (either text/double/datetime/boolean) and the names of those fields. This 'eventdata' table can be different for each 'event'.
My current approach is to create a lookup table for the definitions. So if I need to query each 'event' with its 'eventdata', I do so in a master-detail relationship using two queries (i.e. select * from events, then for each record in 'events', select * from 'some table').
Is there a better approach? I have implemented this so far, but most of my queries require two distinct calls to the DB; I cannot simply join my master 'events' table with a different 'eventdata' table for each record in 'events'.
I guess my main question is: can I join my master table with different detail tables for each record?
E.g.
SELECT E.*, E.Tablename
FROM events E
LEFT JOIN 'E.tablename' T ON E._ID = T.ID
If not, is there a better way to design my database, considering I have no idea how many user-defined fields there may be or what type they will be?
There are four ways of handling this.
Add several additional fields named "Custom1", "Custom2", "Custom3", etc. These should have a datatype of varchar(?) or similar.
Add a field to hold the unstructured data (like an XML column).
Create a table of name/value pairs which are associated with some type of template. Let the users manage the template. You'll have to use pivot queries or similar to get the data out.
Use a database like MongoDB or another NoSql style product to store this.
That said, the first one has the advantage of being fast but limits the number of custom fields to the number you defined. Older mainframe-type applications work this way. SalesForce CRM used to.
The second option means that each record can have its own custom fields. However, depending on your database, there are definite challenges here. Tried this; don't recommend it.
The third one is generally harder to code for but allows for extreme flexibility. SalesForce and other applications have gone this route; including a couple I'm responsible for. The downside is that Microsoft apparently acquired a patent on doing things this way and is in the process of suing a few companies over it. Personally, I think that's bullcrap; but whatever. Point is, use at your own risk.
The fourth option is interesting. We've played with it a bit and the performance is great while coding is pretty darn simple. This might be your best bet for the unstructured data.
That type of join won't work, because you would need to pivot the eventdata table to turn its rows into columns. How to do that depends on which database technology you are using.
Here is an example with MySQL: How to pivot a MySQL entity-attribute-value schema
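The usual shape of such a pivot, sketched with hypothetical field names ('severity' and 'reporter' are just examples):
SELECT e.id,
       MAX(CASE WHEN d.field_name = 'severity' THEN d.field_value END) AS severity,
       MAX(CASE WHEN d.field_name = 'reporter' THEN d.field_value END) AS reporter
FROM events e
LEFT JOIN eventdata d ON d.event_id = e.id
GROUP BY e.id;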
My approach would be to avoid using a different table for each event, if that's possible.
I would use something like:
Event (EventId, ..., ...)
EventColumnType (EventColumnTypeId, EventTypeId, ColumnName)
EventColumnData (EventColumnTypeId, Data)
You are then limited in the type of data you can store (everything would have to be strings, for example), but the number of events and columns is unrestricted.
What I'm getting from your description is you have an event table, and then a separate EventData table for each and every event.
Rather than that, why not have a single EventCustomFields table that contains a foreign key to the event table, a field Name (event+field being the PK) and a field value.
Sure, it's not the best. You'd be stuck serializing the value or storing everything as a string, and you'd still be stuck doing two queries, one for the event table and one to get its custom fields. But at least you wouldn't have a new table for every event in the system (yuck x10).
Another (arguably worse) option is to serialize the custom fields into a single column of a customFields table and then deserialize them when you need them. So your query would be something like:
SELECT E.*, C.*
FROM events E, customFields C
WHERE E.ID = C.ID
Is it possible to just impose a limit on your users? I know the tables underneath SharePoint 2007 had a bunch of columns for custom data that were just named like CustomString1, CustomDate2, etc. That may end up easier than some of the approaches above where everything is in one column (though that's an approach I've taken as well), and I would think it would scale up better.
The answer to your main question is: no. You can't have different rows in the result set with different columns. The result set is kind of like a table, so each row has to have the same columns. You can fake it with padding and dummy columns, but that's probably not much better.
You could try defining a fixed event data table, with (say) ten of each type of column. Then you'd store the usage metadata in a separate table and just read that in at system startup. The metadata would tell you that event type "foo" has a field "name" mapped to column string0 in the event data table, a field named "reporter" mapped to column string1, and a field named "reportDate" mapped to column date0. It's ugly and wastes space, but it's reasonably flexible. If you're in charge of the database, you can even define a view on the table so to the client it looks like a "normal" table. If the clients create their own tables and just stick the table name in the event record, then obviously this won't fly.
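A rough sketch of that fixed table and its metadata (all names invented, and only two columns of each type shown):
-- Fixed event data table: ten generic columns of each type.
CREATE TABLE event_data (
    event_id INT PRIMARY KEY,
    string0  VARCHAR(255),
    string1  VARCHAR(255),  -- ...through string9 in the real thing
    date0    DATETIME,
    date1    DATETIME       -- ...and likewise for the other types
);

-- Metadata read at startup: maps logical field names to physical columns.
CREATE TABLE event_field_map (
    event_type  VARCHAR(50) NOT NULL,  -- e.g. 'foo'
    field_name  VARCHAR(50) NOT NULL,  -- e.g. 'reporter'
    column_name VARCHAR(30) NOT NULL,  -- e.g. 'string1'
    PRIMARY KEY (event_type, field_name)
);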
If you're really hardcore you can write a database procedure to query the table structures, serialize everything to a list of key/type/value tuples, and return that as one long string in the last column, but that's probably not much handier than what you're doing now.
I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most 10-15 possible values.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create repeating groups (violating 3NF) in the main table if you left them there, then that would be normalization.
If you want normalization, break it out.
I think of these types of data in DBs as the equivalent of enums in C, C++, C#. Mostly you put them in a table as documentation.
I often have ID, Name, Description, and auditing columns for them (e.g. modified by, modified date, create date, created by, active). The Description field is rarely used.
Example (some might say there are more than just 2)
Gender
ID Name Audit Columns...
1 Male
2 Female
Then in your contacts you would have a GenderID column which would link to this one.
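As a minimal sketch (generic syntax; the audit columns are omitted and Contact is an invented table name):
CREATE TABLE Gender (
    GenderID INT PRIMARY KEY,
    Name     VARCHAR(20) NOT NULL UNIQUE
);

INSERT INTO Gender (GenderID, Name) VALUES (1, 'Male'), (2, 'Female');

CREATE TABLE Contact (
    ContactID INT PRIMARY KEY,
    FullName  VARCHAR(100) NOT NULL,
    GenderID  INT NOT NULL REFERENCES Gender (GenderID)
);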
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.