SQL Direct Reference to Data - sql

Is there a way in Microsoft SQL to reference a specific item of data based on table, column and record?
For example, table A (COL1 INT, COL2 INT) has 2 records (1,2) and (3,4). Can I somehow capture value 4 by reference, rather than as "4"?
The purpose is to allow me to create an audit method that can point to specific value in a (table, column, record) without having to duplicate that value in my audit table (which could be large, therefore bloating my database size).
I am thinking ... just like Object_Id identifies a particular SQL object, so would this reference (some kind of GUID, perhaps?) identify a specific piece of data.
Many thanks in advance.

The answer is No. In MS SQL (and, as far as I know in other popular databases) there are no such references to specific values.
Moreover, even table rows in MS SQL do not have embedded unique identifiers, unless you take care to create an IDENTITY column.
You can yourself make the implementation of such references. For example, create a table with columns
data_id,
table_name,
row_id,
column_name
and fill it up every time you need a reference. Then you can refer to piece of data by data_id.
But this is not a good solution.
in most cases, a single entry in this table will consume more space
than the referenced data value itself
to get the values ​​you still have to use dynamic sql
this will only work for tables that have an IDENTITY column and it
has the same name for all tables
and so on

Related

MS SQL Server, Create and update a table based on another table

I want to try and create a boolean table with the same structure as another table. I know how to create the table but my issue is the updating.
Lets say i have the table A1 with 10 columns with different attributes for a person such as, height, run speed, name, hair colour etc.
I then want to be able to modifiy this table by either removing or adding columns to table A1 and these updates apply to my other column B1 so it has the same columns but a boolean value (the boolean value is not based on A1).
My first question is if it's doable.
My second is: Will the updates be super ineffecient for lets say 200-300 records.
(I could probably create an external program that reads the table and manually removes and adds columns via ADD/DROP sql statements, but i was hoping there was a more dynamic/efficent solution)
What you want, as another answer posted, is EAV schema "entity - attribute - value". This allows you to dynamically add new attributes without changing any physical table schema. It is also horrible for performance (but with only a few hundred entities it shouldn't be too bad).
Another equally ugly solution is to add as many columns as you think you'll ever need, named Attribute_1, Attribute_2, etc. Then you have a lookup table which allows you to map attributes to their definitions.
This is less flexible than the EAV schema, but allows you to index on specific attributes so that your queries are a little more performant.
Another solution would be to use XML data types to hold the attributes and values. SQL Server has built-in functionality for XML data, while it's not as easy to use as normal SQL, it does work quite well.
Rather than add and subtract columns to the table. I would suggest that you have a table with the fixed attributes. Then have another table which stores additional attributes (the names of the columns) then a third table which holds the id of the person, the is off the attribute and the value of the attribute.
For example the user table :
UserId
Firstname
Surname
The attribute table
AttrId
AttrName
The UserAttribute table:
UserId
AttrId
AttrValue
For this to answer your question you could have two sets of these tables but the AttrValue would be boolean for the second table.
An intermediate option is to go for multiple spare columns in the table and use the attribute table to store a column name and a boolean to indicate if the column is in use

Is there any use to duplicate column names in a table?

In sqlite3, I can force two columns to alias to the same name, as in the following query:
SELECT field_one AS overloaded_name,
field_two AS overloaded_name
FROM my_table;
It returns the following:
overloaded_name overloaded_name
--------------- ---------------
1 2
3 4
... ...
... and so on.
However, if I create a named table using the same syntax, it appends one of the aliases with a :1:
sqlite> CREATE TABLE temp AS
SELECT field_one AS overloaded_name,
field_two AS overloaded_name
FROM my_table;
sqlite> .schema temp
CREATE TABLE temp(
overloaded_name TEXT,
"overloaded_name:1" TEXT
);
I ran the original query just to see if this was possible, and I was surprised that it was allowed. Is there any good reason to do this? Assuming there isn't, why is this allowed at all?
EDIT:
I should clarify: the question is twofold: why is the table creation allowed to succeed, and (more importantly) why is the original select allowed in the first place?
Also, see my clarification above with respect to table creation.
I can force two columns to alias to the same name...
why is [this] allowed in the first place?
This can be attributed to the shackles of compatibility. In the SQL Standards, nothing is ever deprecated. An early version of the Standard allowed the result of a table expression to include columns with duplicate names, probably because an influential vendor had allowed it, possibly due to the inclusion of a bug or the omission of a design feature, and weren't prepared to take the risk of breaking their customers' code (the shackles of compatibility again).
Is there any use to duplicate column names in a table?
In the relational model, every attribute of every relation has a name that is unique within the relevant relation. Just because SQL allows duplicate column names that doesn't mean that as a SQL coder you should utilise such as feature; in fact I'd say you have to vigilant not to invoke this feature in error. I can't think of any good reason to have duplicate column names in a table but I can think of many obvious bad ones. Such a table would not be a relation and that can't be a good thing!
why is the [base] table creation allowed to succeed
Undoubtedly an 'extension' to (a.k.a purposeful violation of) the SQL Standards, I suppose it could be perceived as a reasonable feature: if I attempt to create columns with duplicate names the system automatically disambigutes them by suffixing an ordinal number. In fact, the SQL Standard specifies that there be an implementation dependent way to ensure the result of a table expression does not implicitly have duplicate column names (but as you point out in the question this does not perclude the user from explicitly using duplicate AS clauses). However, I personally think the Standard behaviour of disallowing the duplicate name and raising an error is the correct one. Aside from the above reasons (i.e. that duplicate columns in the same table are of no good use), a SQL script that creates an object without knowing if the system has honoured that name will be error prone.
The table itself can't have duplicate column names because inserting and updating would be messed up. Which column gets the data?
During selects the "duplicates" are just column labels so do not hurt anything.
I assume you're talking about the CREATE TABLE ... AS SELECT command. This looks like an SQL extension to me.
Standard SQL does not allow you to use the same column name for different columns, and SQLite appears to be allowing that in its extension, but working around it. While a simple, naked select statement simply uses as to set the column name, create table ... as select uses it to create a brand new table with those column names.
As an aside, it would be interesting to see what the naked select does when you try to use the duplicated column, such as in an order by clause.
If you were allowed to have multiple columns with the same name, it would be a little difficult for the execution engine to figure out what you meant with:
select overloaded_name from table;
The reason why you can do it in the select is to allow things like:
select id, surname as name from users where surname is not null
union all
select id, firstname as name from users where surname is null
so that you end up with a single name column.
As to whether there's a good reason, SQLite is probably assuming you know what you're doing when you specify the same column name for two different columns. Its essence seems to be to allow a great deal of latitude to the user (as evidenced by the fact that the columns are dynamically typed, for example).
The alternative would be to simply refuse your request, which is what I'd prefer, but the developers of SQLite are probably more liberal (or less anal-retentive) than I :-)

Define One to Many Relationships with SQL

I'm looking for a way to set up a one to many relationship between 2 tables. The table structures is explained below but I've tried to leave everything off that has nothing to do with the problem.
Table objects has 1 column called uuid.
Table contents has 3 columns called content, object_uuid and timestamp.
The basic idea is to insert a row into objects and get a new uuid from the database. This uuid is then used stored for every row in contents to associate contents with objects.
Now I'm trying to use the database to enforce that:
Each row in contents references a row in objects (a foreign key should do)
No row in objects exists without at least a row in contents
These constraints should be enforced on commit of transactions.
Ordinary triggers can't help probably because when a row in the objects table is written, there can't be a row in contents yet. Postgres does have so called constraint triggers that can be deferred until the end of the transaction. It would be possible to use those but they seem to be some sort of internal construct not intended for everyday use.
Ideas or solutions should be standard SQL (preferred) or work with Postgres (version does not matter). Thanks for any input.
Your main problem is that other than foreign key constraints; no constraint can reference another table.
Your best bet is to denormalize this a little and have a column on object containing the count of contents that reference it. You can create a trigger to keep this up to date.
contents_count INTEGER NOT NULL DEFAULT 0
This won't be as unbreakable unless you put some user security over who can update this column. But if you keep it up to date with a trigger and all you're looking to avoid is accidental corruption, this should be sufficient.
EDIT: As per the comment, CHECK constraints are not deferrable. This solution would raise an error if all the contents are removed even if the intention is to add more in the same transaction.
Maybe what you want to do is normalize a little bit more. You need a third table, that references elements of the other tables. Table objects should have its own uuid and table contents sholud have also its own uuid and no reference to the table objects. The third table should have only the references to the other two tables, but the primary key is the combination of both references.
so for example you have an uuid of the table objects and you want all the contents of that uuid, assuming that the third table has as columns object_uuid and content_uuid, and the table contents has its own serial column named uuid, your query should be like this:
SELECT * FROM thirdtable,contents
WHERE thirdtable.content_uuid = contents.uuid AND thirdtable.object_uuid=34;
Then you can use an on insert trigger on every table
CREATE TRIGGER my_insert_trigger AFTER INSERT OR UPDATE ON contents
FOR EACH ROW EXECUTE PROCEDURE my_check_function();
and then in function my_check_function() delete every row in objects that is not present in the third table. Somebody else answered first while I was answering, if you guys like my solution I could help you to make the my_check_function() function.

Normalization Help

I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most, 10-15 possible values. In my mind, it seems that this field is an example of unnecessary data duplication and should be extracted to a separate table.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create repeating groups (violating 3NF) in the main table if you left them there, then that would be normalization.
If you want normalization break it out.
I think of these types of data in DBs as the equivalent of enums in C,C++,C#. Mostly you put them in the table as documentation.
I often have an ID, Name, Description, and auditing columns for them (eg modified by, modified date, create date, create by, active.) The description field is rarely used.
Example (some might say there are more than just 2)
Gender
ID Name Audit Columns...
1 Male
2 Female
Then in your contacts you would have a GenderID column which would link to this one.
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.

identity column in Sql server

Why does Sql server doesn't allow more than one IDENTITY column in a table?? Any specific reasons.
Why would you need it? SQL Server keeps track of a single value (current identity value) for each table with IDENTITY column so it can have just one identity column per table.
An Identity column is a column ( also known as a field ) in a database table that :-
Uniquely identifies every row in the table
Is made up of values generated by the database
This is much like an AutoNumber field in Microsoft Access or a sequence in Oracle.
An identity column differs from a primary key in that its values are managed by the server and ( except in rare cases ) can't be modified. In many cases an identity column is used as a primary key, however this is not always the case.
SQL server uses the identity column as the key value to refer to a particular row. So only a single identity column can be created. Also if no identity columns are explicitly stated, Sql server internally stores a separate column which contains key value for each row. As stated if you want more than one column to be having unique value, you can make use of UNIQUE keyword.
The SQL Server stores the identity in an internal table, using the id of the table as it's key. So it's impossible for the SQL Server to have more than one Identity column per table.
Because MS realized that better than 80% of users would only want one auto-increment column per table and the work-around to have a second (or more) is simple enough i.e. create an IDENTITY with seed = 1, increment = 1 then a calculated column multiplying the auto-generated value by a factor to change the increment and adding an offset to change the seed.
Yes , Sequences allow more than one identity like columns in atable , but there are some issues here . In a typical development scenario i have seen developers manually inserting valid values in a column (which is suppose to be inserted through sequence) . Later on when a sequence try inserting value in to the table , it may fail due to unique key violation.
Also , in a multi developer / multi vendor scenario, developers might use the same sequence for more than one table (as sequences are not linked to tables) . This might lead to missing values in one of the table . ie tableA might get the value 1 while tableB might use value 2 and tableA will get 3. This means that tableA will have 1 and 3 (missing 2).
Apart from this , there is another scenario where you have a table which is truncated every day . Since Sequences are not having any link with table , the truncated table will continue to use the Seq.NextVal again (unless you manually reset the sequence) leading to missing values or even more dangerous arthmetic overflow error after sometime.
Owing to above reason , i feel that both Oracle sequences and SQL server identity column are good for their purposes. I would prefer oracle implementing the concept of Identity column and SQL Server implementing the sequence concept so that developers can implement either of the two as per their requirement.
The whole purpose of an identity column is that it will contain a unique value for each row in the table. So why would you need more than one of them in any given table?
Perhaps you need to clarify your question, if you have a real need for more than one.
An identity column is used to uniquely identify a single row of a table. If you want other columns to be unique, you can create a UNIQUE index for each "identity" column that you may need.
I've always seen this as an arbitrary and bad limitation for SQL Server. Yes, you only want one identity column to actually identify a row, but there are valid reasons why you would want the database to auto-generate a number for more than one field in the database.
That's the nice thing about sequences in Oracle. They're not tied to a table. You can use several different sequences to populate as many fields as you like in the same table. You could also have more than one table share the same sequence, although that's probably a really bad decision. But the point is you could. It's more granular and gives you more flexibility.
The bad thing about sequences is that you have to write code to actually increment them, whether it's in your insert statement or in an on-insert trigger on the table. The nice thing about SQL Server identity is that all you have to do is change a property or add a keyword to your table creation and you're done.