I was looking through the beloved W3Schools and found this page, and actually learned something interesting. I didn't know you could call an INSERT command without specifying which columns the values go into. For example:
INSERT INTO table_name
VALUES (value1, value2, value3,...)
Pulling from my hazy memory, I seem to remember the SQL prof mentioning that you have to treat fields as if they are not in any particular order (there is an order on the RDB side, but it's not guaranteed).
My question is, how does the server know which values get assigned to which fields? I would test this myself, but I am not going to use a production server to do so, and that is all I have access to at the moment.
If this is technology-specific, I am working with PostgreSQL. How is this particular syntax even useful?
Your prof was right - you should name the columns explicitly before naming the values.
In this case though the values will be inserted in the order that they appear in the table definition.
The problem with this is that if that order changes, or columns are removed or added (even if they are nullable), then the insert will break.
In terms of its usefulness, not that much in production code. If you're hand coding a quick insert then it might just help save you typing all the column names out.
They get inserted into the fields in the order they are in the table definition.
So if your table has fields (a,b,c), a=value1, b=value2, c=value3.
Your professor was right, this is lazy, and liable to break. But useful for a quick and dirty lazy insert.
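To make the (a,b,c) mapping concrete, here is a minimal sketch using Python's built-in sqlite3 module, since the asker has no test server to hand. The table name t and the sample values are made up for illustration, and SQLite stands in for PostgreSQL here; the left-to-right assignment is the same.

```python
import sqlite3

# In-memory database, so nothing touches a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT)")

# No column list: values are matched to columns in declared order.
conn.execute("INSERT INTO t VALUES ('value1', 'value2', 'value3')")

row = conn.execute("SELECT a, b, c FROM t").fetchone()
print(row)
```

The first value lands in a, the second in b, and the third in c.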
I cannot resist putting an "RTFM" here.
The PostgreSQL manual details what happens in the chapter on INSERT:
The target column names can be listed in any order. If no list of
column names is given at all, the default is all the columns of the
table in their declared order; or the first N column names, if there
are only N columns supplied by the VALUES clause or query. The values
supplied by the VALUES clause or query are associated with the
explicit or implicit column list left-to-right.
Bold emphasis mine.
The values are just added in the same order as the columns appear in the table. It's useful in situations where you don't know the names of the columns you're working with but know what data needs to go in. It's generally not a good idea to do this though, as it of course breaks if the order of the columns changes or new columns are inserted in the middle.
In many databases, that syntax works without a column list only if you provide the same number of values as the table has columns (PostgreSQL also accepts fewer values and assigns them to the first N columns). The second, more important thing is that columns in a SQL table are always in the same order, which depends on your table definition. The only things that have no innate order in a SQL table are rows.
When the table is created, each column gets an order number in the system table, so each value is inserted according to that order.
The first value goes to the first column, and so on.
In SQL Server, the system table syscolumns maintains this order. PostgreSQL should have something similar.
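As a rough illustration of a catalog recording column order, the sketch below reads the ordinal of each column from SQLite via Python's sqlite3 module. SQLite is only a stand-in here: its PRAGMA table_info is not what SQL Server or PostgreSQL expose (both offer information_schema.columns with an ordinal_position column instead), and the table t is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c REAL)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk), where cid is the 0-based ordinal.
columns = [(cid, name) for cid, name, *_ in conn.execute("PRAGMA table_info(t)")]
print(columns)
```

A positional INSERT fills the columns in exactly this ordinal order.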
Related
In Oracle SQL Developer, is it possible to alias multiple column names as part of a SELECT statement using an expression (as opposed to manually specifying the aliases for each column)?
Specifically, I have a mapping table that stores the task-relevant subset of columns from a large data table. Each entry in the mapping table ties a data table column name to a human readable description. I want to select the data table columns listed in the mapping table and display them with the mapping table descriptions as the column headers, but WITHOUT manually typing in the column names and their human-readable aliases one-by-one. Is this possible?
The closest I've found to an answer online is this SO question which suggests what I want to do is NOT possible: Oracle rename columns from select automatically?
But, that question is from 2010. I'm hoping the situation has changed. Thank you for your help.
This still cannot be done with 100% native SQL. These overly-dynamic situations are usually best avoided; a little extra typing is generally better than adding complicated code.
If you truly have an exceptional case and are willing to pay the price, there is a way to do this. It doesn't use 100% native SQL, but it could be considered "pure" SQL since it uses the Oracle Data Cartridge framework to extend the database.
You can use my open source project Method4 to run dynamic SQL in SQL. Follow the GitHub steps to download and install the objects. The code is painfully complicated, but luckily you won't need to understand most of it. Only the simple changes below are necessary to get started on customizing column names.
Method4 Changes
Create a variable to hold the new column name. Add it to the declaration section of the function ODCITableDescribe, on line 12 of the file METHOD4_OT.TPB.
v_new_column_name varchar2(32);
Create a SQL statement to map the old column to the new column. Add this to line 31, where it will be run for each column.
--Get mapped column name if it exists. If none, use the existing name.
select nvl(max(target_column_name), r_sql.description(i).col_name)
into v_new_column_name
from column_names
where source_column_name = r_sql.description(i).col_name;
Change line 42 to refer to the new variable name:
substr(v_new_column_name, 1, 30),
Mapping Table
drop table column_names;
create table column_names
(
source_column_name varchar2(30),
target_column_name varchar2(30),
constraint column_names_pk primary key(source_column_name)
);
insert into column_names values('A1234', 'BETTER_COLUMN_NAME');
insert into column_names values('B4321', 'Case sensitive column name.');
Query Example
Now the column names from any query can magically change to whatever values you want. And this doesn't simply use text replacement; the columns from a * will also change.
SQL> select * from table(method4.query('select 1 a1234, 2 b4321, 3 c from dual'));
BETTER_COLUMN_NAME Case sensitive column name. C
------------------ --------------------------- ----------
1 2 3
Warnings
Oracle SQL is horrendously complicated and any attempt to build a layer on top of it has many potential problems. For example, performance will certainly be slower. Although I've created many unit tests, I'm sure there are some weird data types that won't work correctly. (But if you find any, please create a GitHub issue so I can fix it.)
In my experience, when people ask for this type of dynamic behavior, it's usually not worth the cost. Sometimes a little extra typing is the best solution.
I was told at my last position never to do this; it wasn't explained why. I understand it enhances the opportunity for mistakes, but if there are just a couple of columns and I'm confident of their order, why can't I shorthand it and just insert the values in the right order without explicitly matching up the column names? Is there a large performance difference? If so, does it matter on a small scale?
If there's no performance hit and this isn't a query that will be saved for others to view, why shouldn't I?
Thanks in advance.
This is acceptable only when you type your query by hand into an interactive DB tool. When your SQL statement is executed by your program, you cannot be absolutely confident about the order of columns in a table, unless you are the only developer who has access to your database. In other words, in any team environment there is a chance that someone will break your query simply by re-ordering columns in your database. Logically, your table would remain the same, but your program would still break.
Devil's advocate: if there are only a couple of columns, what does short-handing it gain you? You saved a few keystrokes, big deal.
For ad hoc queries you're writing once and throwing away, it's not a big deal. I suspect you were told to never do this in production code or anything anyone else would later have to reverse engineer (either simply to understand it, or to account for underlying schema changes). Remember that code that you write may only ever be viewed and maintained by you right now but you should be writing code with the intention that it will outlast you.
Another reason including the column list is good is if you later want to search for all references to a specific column name in your data model...
The reason is to make the code more robust.
Specifying the fields makes the code less dependent on the table layout staying exactly the same, and also gives you the ability to add fields to the table without needing to change the code, as long as you provide default values for the new fields.
It also makes it easier to see what the query is supposed to do, without the need to look up the table layout to see where the data will end up.
In most development shops, you will not be the only developer working on a given project, in which case you run the risk of unintended inserts. You may be confident now that there won't be any schema changes, but things can change, now or after you're gone.
Also, if a column is added, your inserts will fail.
It's mostly for readability, so if it's just you, it's fine. But if you do muck up the order, it can be a hard one to explain if people find out you were not using good practices, especially if someone modifies the structure while you were writing your query. Also, if you have complex things to do in a column, naming it can help visualisation, and if, as you say, the table is small, then adding a few column names is hardly going to take long, so you may as well.
Because things change. If you code that query in your program thinking that the column order will never change, and then someone else comes along and alters the table by adding a new column, your query will fail.
Also, there's a quick trick if you don't want to type column names out. In Management Studio, you can set it up so that your result set is returned as CSV text (with column headers): press CTRL-T.
Then do a quick,
select top 1 * from <tablename>
and copy and paste the column list from the resultset window.
A relation has no left-to-right ordering of columns. The same cannot be said of a SQL table, which is not a good thing. However, just because SQL has non-relational features doesn't mean you have to use them! Specifying a column list as part of a table or row value constructor is one way of mitigating SQL's column ordering.
Consider that the following SQL statements are semantically equivalent:
INSERT INTO Table1 (col1, col2, col3) VALUES (1, 2, 3);
INSERT INTO Table1 (col2, col1, col3) VALUES (2, 1, 3);
INSERT INTO Table1 (col3, col2, col1) VALUES (3, 2, 1);
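A quick way to check that equivalence is to run the three statements and compare the stored rows. This is a sketch using Python's sqlite3 module with an in-memory database; the integer column types are an assumption, since the original does not give a table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")

# The same three statements: only the order of the column list differs.
conn.execute("INSERT INTO Table1 (col1, col2, col3) VALUES (1, 2, 3)")
conn.execute("INSERT INTO Table1 (col2, col1, col3) VALUES (2, 1, 3)")
conn.execute("INSERT INTO Table1 (col3, col2, col1) VALUES (3, 2, 1)")

rows = conn.execute("SELECT col1, col2, col3 FROM Table1").fetchall()
print(rows)
```

All three rows come out identical, because each value is paired with its column by name rather than by position in the table.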
I've got a SQL Express database I need to extract some data from. I have three fields: ID, NAME, DATA. In the DATA column there are values like "654;654;526". Yes, semicolons included. Now those numbers relate to another table (two fields, ID and NAME). The numbers in the DATA column relate to the ID field in the 2nd table. How can I, via SQL, do a replace or lookup so that instead of getting the numbers 654;653;526 I get the NAME field instead?
See the photo. Might explain this better
http://i.stack.imgur.com/g1OCj.jpg
Redesign the database unless this is a third party database you are supporting. This will never be a good design and should never have been built this way. This is one of those times you bite the bullet and fix it before things get worse, which they will. You need a related table to store the values in. One of the very first rules of database design is never store more than one piece of information in a field.
And hopefully those aren't your real field names; they are atrocious too. You need more descriptive field names.
Since it is a third party database, you need to look up the split function or create your own. You will want to transform the data to a relational form in a temp table or table variable to use in the join later.
The following may help: How to use GROUP BY to concatenate strings in SQL Server?
This can be done, but it won't be nice. You should create a scalar-valued function that takes in the string of IDs and returns a string of names.
This denormalized structure is similar to the way values were stored in the quasi-object-relational database known as PICK. Cool database, in many respects ahead of its time, though in other respects, a dinosaur.
If you want to return the multiple names as a delimited string, it's easy to do with a scalar function. If you want to return the multiple rows as a table, your engine has to support functions that return a type of TABLE.
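The split-then-join idea from the answers above can be sketched as follows. This is not SQL Server code: it uses Python's sqlite3 module and does the splitting in Python instead of in a split function, and every table name, column name, and sample value (main_table, lookup, main_lookup, 'alpha', and so on) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (id INTEGER PRIMARY KEY, name TEXT, data TEXT);
CREATE TABLE lookup (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO main_table (id, name, data) VALUES (1, 'first', '654;653;526');
INSERT INTO lookup (id, name) VALUES (654, 'alpha'), (653, 'beta'), (526, 'gamma');
-- Relational form of the delimited column: one row per referenced ID.
CREATE TEMP TABLE main_lookup (main_id INTEGER, pos INTEGER, lookup_id INTEGER);
""")

# Split each delimited string and store the pieces with their position.
for main_id, data in conn.execute("SELECT id, data FROM main_table").fetchall():
    conn.executemany(
        "INSERT INTO main_lookup (main_id, pos, lookup_id) VALUES (?, ?, ?)",
        [(main_id, pos, int(part))
         for pos, part in enumerate(data.split(";")) if part.strip()],
    )

# Join back to the lookup table, keeping the original order of the IDs.
names = {}
query = """
SELECT ml.main_id, l.name
FROM main_lookup AS ml
JOIN lookup AS l ON l.id = ml.lookup_id
ORDER BY ml.main_id, ml.pos
"""
for main_id, name in conn.execute(query):
    names.setdefault(main_id, []).append(name)

result = {main_id: ";".join(parts) for main_id, parts in names.items()}
print(result)
```

Keeping the position of each ID in the temp table is a design choice: it lets the names be reassembled in the same order as the original string.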
I'm reading CJ Date's SQL and Relational Theory: How to Write Accurate SQL Code, and he makes the case that positional queries are bad — for example, this INSERT:
INSERT INTO t VALUES (1, 2, 3)
Instead, you should use attribute-based queries like this:
INSERT INTO t (one, two, three) VALUES (1, 2, 3)
Now, I understand that the first query is out of line with the relational model since tuples (rows) are unordered sets of attributes (columns). I'm having trouble understanding where the harm is in the first query. Can someone explain this to me?
The first query breaks pretty much any time the table schema changes. The second query accommodates any schema change that leaves its columns intact and doesn't add defaultless columns.
People who do SELECT * queries and then rely on positional notation for extracting the values they're concerned about are software maintenance supervillains for the same reason.
While the order of columns is defined in the schema, it should generally not be regarded as important because it's not conceptually important.
Also, it means that anyone reading the first version has to consult the schema to find out what the values are meant to mean. Admittedly this is just like using positional arguments in most programming languages, but somehow SQL feels slightly different in this respect - I'd certainly understand the second version much more easily (assuming the column names are sensible).
I don't really care about theoretical concepts in this regard (as in practice, a table does have a defined column order). The primary reason I would prefer the second one to the first is an added layer of abstraction. You can modify columns in a table without screwing up your queries.
You should try to make your SQL queries depend on the exact layout of the table as little as possible.
The first query relies on the table only having three fields, and in that exact order. Any change at all to the table will break the query.
The second query only relies on there being those three fields in the table, and the order of the fields is irrelevant. You can change the order of fields in the table without breaking the query, and you can even add fields as long as they allow null values or have a default value.
Although you don't rearrange the table layout very often, adding more fields to a table is quite common.
Also, the second query is more readable. You can tell from the query itself what the values put in the record mean.
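Here is a small sketch of that difference, run against SQLite through Python's sqlite3 module. The column names one, two, three come from the question; the added column four is made up, and other engines will report the failure with their own error messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (one INTEGER, two INTEGER, three INTEGER)")

# Both forms work against the original three-column layout.
conn.execute("INSERT INTO t VALUES (1, 2, 3)")
conn.execute("INSERT INTO t (one, two, three) VALUES (1, 2, 3)")

# Schema change: someone adds a nullable column.
conn.execute("ALTER TABLE t ADD COLUMN four INTEGER")

# The positional insert now fails because the value count no longer matches.
try:
    conn.execute("INSERT INTO t VALUES (1, 2, 3)")
    positional_ok = True
except sqlite3.OperationalError as exc:
    positional_ok = False
    print("positional insert failed:", exc)

# The named insert still works; the new column is simply left NULL.
conn.execute("INSERT INTO t (one, two, three) VALUES (1, 2, 3)")
rows = conn.execute("SELECT one, two, three, four FROM t").fetchall()
print(rows)
```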
Something that hasn't been mentioned yet is that you will often be having a surrogate key as your PK, with auto_increment (or something similar) to assign a value. With the first one, you'd have to specify something there — but what value can you specify if it isn't to be used? NULL might be an option, but that doesn't really fit in considering the PK would be set to NOT NULL.
But apart from that, the whole "locked to a specific schema" is a much more important reason, IMO.
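The surrogate key point can be sketched like this, using SQLite's INTEGER PRIMARY KEY AUTOINCREMENT through Python's sqlite3 module as a stand-in for auto_increment. The table layout is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, one INTEGER, two INTEGER)"
)

# With a column list, the key column is simply left out and gets generated.
conn.execute("INSERT INTO t (one, two) VALUES (1, 2)")
conn.execute("INSERT INTO t (one, two) VALUES (3, 4)")

# Without a column list, supplying only the two data values is rejected here.
try:
    conn.execute("INSERT INTO t VALUES (5, 6)")
    positional_ok = True
except sqlite3.OperationalError as exc:
    positional_ok = False
    print("positional insert failed:", exc)

rows = conn.execute("SELECT id, one, two FROM t ORDER BY id").fetchall()
print(rows)
```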
SQL gives you syntax for specifying the name of the column for both INSERT and SELECT statements. You should use this because:
Your queries are stable to changes in the column ordering, so that maintenance takes less work.
Column names map better to how people think, so the query is more readable. It's clearer to think of a column as the "Name" column rather than the 2nd column.
I prefer to use the UPDATE-like syntax (a MySQL extension):
INSERT t SET one = 1 , two = 2 , three = 3
Which is far easier to read and maintain than both the examples.
Long term, if you add one more column to your table, your INSERT will not work unless you explicitly specify the list of columns. If someone changes the order of columns, your INSERT may silently succeed while inserting values into the wrong columns.
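The silent failure is the nastier of the two, so here is a minimal sketch of it with Python's sqlite3 module. The people table and its values are hypothetical, and the reorder is simulated by dropping and recreating the table with the columns swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original layout: (name, email).
conn.execute("CREATE TABLE people (name TEXT, email TEXT)")
conn.execute("INSERT INTO people VALUES ('Alice', 'alice@example.com')")
before = conn.execute("SELECT name, email FROM people").fetchone()

# Someone rebuilds the table with the two columns swapped.
conn.execute("DROP TABLE people")
conn.execute("CREATE TABLE people (email TEXT, name TEXT)")

# The very same positional insert still succeeds, with no error at all.
conn.execute("INSERT INTO people VALUES ('Alice', 'alice@example.com')")
after = conn.execute("SELECT name, email FROM people").fetchone()

print(before)
print(after)
```

After the reorder the name column holds the email address and vice versa, and nothing complained because both columns have compatible types.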
I'm going to add one more thing: the second query is less prone to error originally, even before tables are changed. Why do I say that? Because with the second form you can (and should, when you write the query) visually check that the columns in the insert table and the data in the values clause or select clause are in fact in the right order to begin with. Otherwise you may end up putting the Social Security Number in the Honoraria field by accident and paying speakers their SSN instead of the amount they should make for a speech (example not chosen at random, except we did catch it before it actually happened thanks to that visual check!).
I have a statement that looks something like this:
MERGE INTO someTable st
USING
(
SELECT id,field1,field2,etc FROM otherTable
) ot on st.field1=ot.field1
WHEN NOT MATCHED THEN
INSERT (field1,field2,etc)
VALUES (ot.field1,ot.field2,ot.etc)
where otherTable has an autoincrementing id field.
I would like the insertion into someTable to be in the same order as the id field of otherTable, such that the order of ids is preserved when the non-matching fields are inserted.
A quick look at the docs would appear to suggest that there is no feature to support this.
Is this possible, or is there another way to do the insertion that would fulfil my requirements?
EDIT: One approach to this would be to add an additional field to someTable that captures the ordering. I'd rather not do this if possible.
... upon reflection the approach above seems like the way to go.
I cannot speak to what the Questioner is asking for here because it doesn't make any sense.
So let's assume a different problem:
Let's say, instead, that I have a Heap-Table with no Identity-Field, but it does have a "Visited" Date field.
The Heap-Table logs Person WebPage Visits and I'm loading it into my Data Warehouse.
In this Data Warehouse I'd like to use the Surrogate-Key "WebHitID" to reference these relationships.
Let's use Merge to do the initial load of the table, then continue calling it to keep the tables in sync.
I know that if I'm inserting records into a table, then I'd prefer the IDs (that are being generated by an Identity-Field) to be sequential based on whatever Order-By I choose (let's say the "Visited" Date).
It is not uncommon to expect an Integer-ID to correlate to when it was created relative to the rest of the records in the table.
I know this is not always 100% the case, but humor me for a moment.
This is possible with Merge.
Using (what feels like a hack) TOP will allow for Sorting in our Insert:
MERGE DW.dbo.WebHit AS Target --This table has an Identity Field called WebHitID.
USING
(
SELECT TOP 9223372036854775807 --Biggest BigInt (to be safe).
PWV.PersonID, PWV.WebPageID, PWV.Visited
FROM ProdDB.dbo.Person_WebPage_Visit AS PWV
ORDER BY PWV.Visited --Works only with TOP when inside a MERGE statement.
) AS Source
ON Source.PersonID = Target.PersonID
AND Source.WebPageID = Target.WebPageID
AND Source.Visited = Target.Visited
WHEN NOT MATCHED BY Target THEN --Not in Target-Table, but in Source-Table.
INSERT (PersonID, WebPageID, Visited) --This Insert populates our WebHitID.
VALUES (Source.PersonID, Source.WebPageID, Source.Visited)
WHEN NOT MATCHED BY Source THEN --In Target-Table, but not in Source-Table.
DELETE --In case our WebHit log in Prod is archived/trimmed to save space.
;
You can see I opted to use TOP 9223372036854775807 (the largest BIGINT value) to pull everything.
If you have the resources to merge more than that, then you should be chunking it out.
While this screams "hacky workaround" to me, it should get you where you need to go.
I have tested this on a small sample set and verified it works.
I have not studied the performance impact of it on larger complex sets of data though, so YMMV with and without the TOP.
Following up on MikeTeeVee's answer.
Using TOP will allow you to Order By within a sub-query, however instead of TOP 9223372036854775807, I would go with
SELECT TOP 100 PERCENT
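You are unlikely to reach that number anyway, but this way just makes more sense and looks cleaner. One caveat: SQL Server does not guarantee that an ORDER BY paired with TOP 100 PERCENT in a subquery is honored, so check that the ordering actually takes effect.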
Why would you care about the order of the ids matching? What difference would that make to how you query the data? Related tables should be connected through primary and foreign keys, not by the order in which records were inserted. Tables are not inherently ordered in any particular way in databases. Order should come from the ORDER BY clause.
More explanation as to why you want to do this might help us steer you to an appropriate solution.
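To illustrate ordering at query time instead of at insert time, here is a small sketch with Python's sqlite3 module. The web_hit table, its columns, and the dates are invented for the example, loosely following the WebHit scenario above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_hit (person_id INTEGER, visited TEXT)")

# Rows are deliberately inserted out of chronological order.
conn.executemany(
    "INSERT INTO web_hit (person_id, visited) VALUES (?, ?)",
    [(3, "2020-03-01"), (1, "2020-01-01"), (2, "2020-02-01")],
)

# The order of the result comes from ORDER BY, not from insertion order.
ordered = conn.execute(
    "SELECT person_id, visited FROM web_hit ORDER BY visited"
).fetchall()
print(ordered)
```

If a stable sequence is needed later, storing the ordering value (here the visited date) and sorting on it is more dependable than relying on generated ids.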