I'm copying a subset of some data, so that the copy will be independently modifiable in future.
One of my SQL statements looks something like this (I've changed table and column names):
INSERT Product(
ProductRangeID,
Name, Weight, Price, Color, And, So, On
)
SELECT
#newrangeid AS ProductRangeID,
Name, Weight, Price, Color, And, So, On
FROM Product
WHERE ProductRangeID = #oldrangeid and Color = 'Blue'
That is, we're launching a new product range which initially just consists of all the blue items in some specified current range, under new SKUs. In future we may change the "blue-range" versions of the products independently of the old ones.
I'm pretty new at SQL: is there something clever I should do to avoid listing all those columns, or at least avoid listing them twice? I can live with the current code, but I'd rather not have to come back and modify it if new columns are added to Product. In its current form it would just silently fail to copy the new column if I forget to do that, which should show up in testing but isn't great.
I am copying every column except for the ProductRangeID (which I modify), the ProductID (incrementing primary key) and two DateCreated and timestamp columns (which take their auto-generated values for the new row).
Btw, I suspect I should probably have a separate join table between ProductID and ProductRangeID. I didn't define the tables.
This is in a T-SQL stored procedure on SQL Server 2008, if that makes any difference.
In a SELECT, you can either use * or list all columns and expressions explicitly, there are no partial wildcards.
You could possibly do a SELECT * into a temporary table or a table variable followed by an UPDATE of that table if that is just an one-off query and performance is of no importance.
You can omit the field names for the target table, but generally you shouldn't.
If you don't specify the fields for the target table, you will be relying on the order that they are defined in the table. If someone changes that order, the query will stop working. Or ever worse, it can put the values in the wrong field.
Another aspect is that it's easier to see what the query does, so that you don't have to open the table to see what the fields are to understand the query.
Related
I am developing an application for making quotations. First you make cost break down (or calculation) and upon that result you add item to quotation. The problem is that i have many product, so each category of a product will have its own cost break down form with different parameters to be filled in. If I will have only one table for cost breakdown, then it will be huge (a lot of fields in table). I have a feeling that this is not the right approach. So I came up with diagram below:
Is this solution even possible, or I must have "N" (if I have N-tables) different FK for each cost break down table? Do you have any better solutions?
I have another question if my linking table "Quotation_QtnDetail" is necessary?
It would be possible to store a reference to a particular value in one of these tables by having a CalculationType column indicating which table the record is in, along with a generic reference ID column (containing the ID of the relevant record). For example, if you were storing a CalcId of 123 and a CalculationType of 2, this would point to the record with ID 123 in the Calc2 table.
The downside to doing this is you're going to lose the ability to validate your data using FK constraints, and it will also make joins to your calculation tables a bit more complicated.
Regarding the Quotation_QtnDetail table, unless a QtnDetail record could ever be linked to multiple Quotation records, there is no need for this extra linking table. Instead, just link it directly by adding a QtnId column to the QtnDetail table. Similarly, you may also be able to remove the Calc_QtnItm table if an item is only ever linked to a single calculation record.
I want to try and create a boolean table with the same structure as another table. I know how to create the table but my issue is the updating.
Lets say i have the table A1 with 10 columns with different attributes for a person such as, height, run speed, name, hair colour etc.
I then want to be able to modifiy this table by either removing or adding columns to table A1 and these updates apply to my other column B1 so it has the same columns but a boolean value (the boolean value is not based on A1).
My first question is if it's doable.
My second is: Will the updates be super ineffecient for lets say 200-300 records.
(I could probably create an external program that reads the table and manually removes and adds columns via ADD/DROP sql statements, but i was hoping there was a more dynamic/efficent solution)
What you want, as another answer posted, is EAV schema "entity - attribute - value". This allows you to dynamically add new attributes without changing any physical table schema. It is also horrible for performance (but with only a few hundred entities it shouldn't be too bad).
Another equally ugly solution is to add as many columns as you think you'll ever need, named Attribute_1, Attribute_2, etc. Then you have a lookup table which allows you to map attributes to their definitions.
This is less flexible than the EAV schema, but allows you to index on specific attributes so that your queries are a little more performant.
Another solution would be to use XML data types to hold the attributes and values. SQL Server has built-in functionality for XML data, while it's not as easy to use as normal SQL, it does work quite well.
Rather than add and subtract columns to the table. I would suggest that you have a table with the fixed attributes. Then have another table which stores additional attributes (the names of the columns) then a third table which holds the id of the person, the is off the attribute and the value of the attribute.
For example the user table :
UserId
Firstname
Surname
The attribute table
AttrId
AttrName
The UserAttribute table:
UserId
AttrId
AttrValue
For this to answer your question you could have two sets of these tables but the AttrValue would be boolean for the second table.
An intermediate option is to go for multiple spare columns in the table and use the attribute table to store a column name and a boolean to indicate if the column is in use
I have finished all my changes to a database table in sql server management studio 2012, but now I have a large gap between some values due to editing. Is there a way to keep my data, but re-assign all the ID's from 1 up to my last value?
I would like this cleaned up as I populate dropdownlists with these values and then I make interactions with my database with the assumption that my dropdownlist index and the table's ID match up, which is not the case right now.
My current DB has a large gap from 7 to 28, I would like to shift everything from 28 and up, back down to 8, 9, 10, 11, ect... so that my database has NO gaps from 1 and onward.
If the solution is tricky please give me some steps as I am new to SQL.
Thank you!
Yes, there are any number of ways to "close the gaps" in an auto generated sequence. You say you're new to SQL so I'll assume you're also new to relational concepts. Here is my advice to you: don't do it.
The ID field is a surrogate key. There are several aspects of surrogates one must be mindful of when using them, but the one I want to impress upon you is,
-- A surrogate key is used to make the row unique. Other than the guarantee that
-- the value is unique, no other assumptions may be made concerning the value.
-- In particular, no meaning may be derived from the value as to the contents of
-- the row or the row's relationship to any other row.
You have designed your app with a built-in assumption of the value of the key field (that they will be consecutive). Already it is causing you problems. Do you really want to go through this every time you make changes to the table? And suppose a future feature requires you to filter out some of the choices according to an option the user has selected? Or enable the user to specify the order of the items? Not going to be easy. So what is the solution?
You can create an additional (non-visible) field in the dropdown list that contains the key value. When the user makes a selection, use that index to get the key value of the selection and then go out to the database and get whatever additional data you need. This will work if you populate the list from the entire table or just select a few according to some as yet unknown filtering criteria or change the order in any way.
Viola. You never have this problem again, no matter how often you add and remove rows in the table.
However, on the off chance that you are as stubborn as me (not likely!) or just refuse to listen to the melodious voice of reason and experience, then try this:
Create a new table exactly like the old table, including auto incrementing PK.
Populate the new table using a Select from the old table. You can specify any order you want.
Drop the old table.
Rename the new table to the old table name.
You will have to drop and redefine any FKs from other tables. But this entire process
can be placed in a script because if you do this once, you'll probably do it again.
Now all the values are consecutive. Until you edit the table again...
You should refactor the code for your dropdown list and not the PK of the table.
If you do not agree, you can do one of the following:
Insert another column holding the dropdown's "order of appearance", make a unique index on it and fill this by hand (or programmatically).
Replace the SERIAL with an INT would work, make a unique index on the column and fill this by hand (or programmatically).
Remove the large ids and reseed your serial - the code depending on your DBMS
This happens to me all the time. If you don't have any foreign key constraints then it should be an easy fix.
Remember a DELETE statement will remove the record but keep the identity seed the same. (If I remove id # 5 and #5 was the last record inserted then SQL server still stores the identity seed value at "6").
TRUNCATING the table will reset the identity seed back to it's original value.
INSERT_IDENTITY [TABLE] ON can also be used to insert the correct data in the correct order if tuncating cannot happen.
SELECT *
INTO #tempTable
FROM [TableTryingToFix]
TRUNCATE TABLE [TableTryingToFix];
INSERT INTO [TableTryingToFix] (COL1, COL2, COL3, ETC)
SELECT COL1, COL2, COL2, ETC
FROM #tempTable
ORDER BY oldTableID
In sqlite3, I can force two columns to alias to the same name, as in the following query:
SELECT field_one AS overloaded_name,
field_two AS overloaded_name
FROM my_table;
It returns the following:
overloaded_name overloaded_name
--------------- ---------------
1 2
3 4
... ...
... and so on.
However, if I create a named table using the same syntax, it appends one of the aliases with a :1:
sqlite> CREATE TABLE temp AS
SELECT field_one AS overloaded_name,
field_two AS overloaded_name
FROM my_table;
sqlite> .schema temp
CREATE TABLE temp(
overloaded_name TEXT,
"overloaded_name:1" TEXT
);
I ran the original query just to see if this was possible, and I was surprised that it was allowed. Is there any good reason to do this? Assuming there isn't, why is this allowed at all?
EDIT:
I should clarify: the question is twofold: why is the table creation allowed to succeed, and (more importantly) why is the original select allowed in the first place?
Also, see my clarification above with respect to table creation.
I can force two columns to alias to the same name...
why is [this] allowed in the first place?
This can be attributed to the shackles of compatibility. In the SQL Standards, nothing is ever deprecated. An early version of the Standard allowed the result of a table expression to include columns with duplicate names, probably because an influential vendor had allowed it, possibly due to the inclusion of a bug or the omission of a design feature, and weren't prepared to take the risk of breaking their customers' code (the shackles of compatibility again).
Is there any use to duplicate column names in a table?
In the relational model, every attribute of every relation has a name that is unique within the relevant relation. Just because SQL allows duplicate column names that doesn't mean that as a SQL coder you should utilise such as feature; in fact I'd say you have to vigilant not to invoke this feature in error. I can't think of any good reason to have duplicate column names in a table but I can think of many obvious bad ones. Such a table would not be a relation and that can't be a good thing!
why is the [base] table creation allowed to succeed
Undoubtedly an 'extension' to (a.k.a purposeful violation of) the SQL Standards, I suppose it could be perceived as a reasonable feature: if I attempt to create columns with duplicate names the system automatically disambigutes them by suffixing an ordinal number. In fact, the SQL Standard specifies that there be an implementation dependent way to ensure the result of a table expression does not implicitly have duplicate column names (but as you point out in the question this does not perclude the user from explicitly using duplicate AS clauses). However, I personally think the Standard behaviour of disallowing the duplicate name and raising an error is the correct one. Aside from the above reasons (i.e. that duplicate columns in the same table are of no good use), a SQL script that creates an object without knowing if the system has honoured that name will be error prone.
The table itself can't have duplicate column names because inserting and updating would be messed up. Which column gets the data?
During selects the "duplicates" are just column labels so do not hurt anything.
I assume you're talking about the CREATE TABLE ... AS SELECT command. This looks like an SQL extension to me.
Standard SQL does not allow you to use the same column name for different columns, and SQLite appears to be allowing that in its extension, but working around it. While a simple, naked select statement simply uses as to set the column name, create table ... as select uses it to create a brand new table with those column names.
As an aside, it would be interesting to see what the naked select does when you try to use the duplicated column, such as in an order by clause.
If you were allowed to have multiple columns with the same name, it would be a little difficult for the execution engine to figure out what you meant with:
select overloaded_name from table;
The reason why you can do it in the select is to allow things like:
select id, surname as name from users where surname is not null
union all
select id, firstname as name from users where surname is null
so that you end up with a single name column.
As to whether there's a good reason, SQLite is probably assuming you know what you're doing when you specify the same column name for two different columns. Its essence seems to be to allow a great deal of latitude to the user (as evidenced by the fact that the columns are dynamically typed, for example).
The alternative would be to simply refuse your request, which is what I'd prefer, but the developers of SQLite are probably more liberal (or less anal-retentive) than I :-)
I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most, 10-15 possible values. In my mind, it seems that this field is an example of unnecessary data duplication and should be extracted to a separate table.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create repeating groups (violating 3NF) in the main table if you left them there, then that would be normalization.
If you want normalization break it out.
I think of these types of data in DBs as the equivalent of enums in C,C++,C#. Mostly you put them in the table as documentation.
I often have an ID, Name, Description, and auditing columns for them (eg modified by, modified date, create date, create by, active.) The description field is rarely used.
Example (some might say there are more than just 2)
Gender
ID Name Audit Columns...
1 Male
2 Female
Then in your contacts you would have a GenderID column which would link to this one.
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.