Instead of SELECT * FROM mytable, I would like to select all fields EXCEPT one (namely, the 'serialized' field, which stores a serialized object). This is because I think that dropping that field will speed up my query by a lot. However, I have so many fields and am quite the lazy guy. Is there a way to say...
`SELECT ALL_ROWS_EXCEPT(serialized) FROM mytable`
?
thanks!
No, there is no convention in SQL to get all but one (or a number of designated) column(s).
Being explicit about what column(s) are being returned, preferably using a table alias (even if only for one table), is ideal.
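Since there is no "all but one" syntax, one workaround is to generate the explicit column list from the schema once and paste it into the query. Here is a minimal sketch using Python's built-in sqlite3 module. The helper name `columns_except` and the `id` and `name` columns are made up for illustration, and `PRAGMA table_info` is SQLite-specific (other servers expose the same information through INFORMATION_SCHEMA.COLUMNS or similar).

```python
import sqlite3

def columns_except(conn, table, excluded):
    """Return the column names of `table`, skipping those in `excluded`.

    Hypothetical helper. `table` is interpolated into the statement,
    so it must come from trusted code, never from user input.
    """
    # PRAGMA table_info returns one row per column; index 1 is the name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows if row[1] not in excluded]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT, serialized BLOB)")
conn.execute(
    "INSERT INTO mytable (id, name, serialized) VALUES (1, 'widget', x'00ff')"
)

cols = columns_except(conn, "mytable", {"serialized"})
query = "SELECT " + ", ".join(f'"{c}"' for c in cols) + " FROM mytable"
result = conn.execute(query).fetchall()
print(query)
print(result)
```

The generated statement is still an explicit column list, so it keeps the benefits of naming columns; it only saves the typing.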
I always hear from SQL specialists that it is not efficient to use the '*' sign in SELECT statement and it is better to list all the field names instead.
But I don't find it efficient for me personally when it comes to adding new fields to a table and then updating all the stored procedures accordingly.
So what are the pros and cons in using '*' ?
Thanks.
In general, the use of SELECT * is not a good idea.
Pros:
When you add/remove columns, you don't have to make changes where you did use SELECT *
It is shorter to write
Also see the answers to: Can select * usage ever be justified?
Cons:
You are returning more data than you need. Say you add a VARBINARY column that contains 200k per row. You only need this data in one place for a single record - using SELECT * you can end up returning 2MB per 10 rows that you don't need
It is not explicit about what data is used
You get no error when a column is removed; specifying columns means you do
The query processor has to do some more work - figuring out what columns exist on the table (thanks #vinodadhikary)
You cannot easily find where a column is used
You get all columns from every table in a join if you use SELECT *
You can't safely use ordinal referencing (though using ordinal references for columns is bad practice in itself)
Also see the answers to: What is the reason not to use select *?
Pros:
when you really need all the columns, it's shorter to write select *
Cons:
most of the time, you don't need all the columns, but only some of them. It's more efficient to only retrieve what you want
you have no guarantee of the order of the retrieved columns (or at least, the order is not obvious from the query), which forbids accessing columns by index (only by name). But the names are also far from obvious
when joining multiple tables that may have columns with the same name, select * gives you no way to alias those columns; with an explicit list you can define aliases for them
If I create a view and select my fields in the order I want to "receive" them in can I be fully assured that I can call "Select * from myView" from my apps instead of specifying ALL of the fieldnames yet again in my select query?
I ask this because I pass whole datarows to my DataModels and construct the objects by assigning properties to the different indexes in the itemarray attached to this datarow. If these fields get out of order there's no telling what could happen to my object.
I know that I can't rely on an order-by that lives inside of a view (been burned before on this one). But the order of the fields I was not sure about.
Sorry if this is sql noob level. We all start somewhere with it. Right now all the extraneous field names in my app code is making readability somewhat difficult so if I can safely go back and replace a lot of syntax with a * then that would be great.
These tables are small so i'm not worried about implications of using a * over individual fields. I'm just looking to not code unnecessary syntax.
Column order is guaranteed, row order (as you noted) is not.
Column order may not be guaranteed or reliable if both of these are true
the view definition has SELECT * or SELECT tableA.* internally
any changes are made to the table(s) concerned
You'd need to run sp_refreshview: see this question/answer for potential issues.
Of course, if you have simple SELECT * FROM table in a view, why not just use the table and save some maintenance pain?
Finally, and I have to say it, it isn't recommended to use SELECT *... :-)
Yes, left-to-right ordering of columns is guaranteed in SQL. In fact, it's one of the top three flaws used to prove that SQL is not truly relational (e.g. see The Importance of Column Names by Hugh Darwen), duplicate rows and the NULL value being the other two.
Yes, I've always relied on select * returning fields in the order specified in the view or table.
For example Microsoft SQL - "* Specifies that all columns from all tables and views in the FROM clause should be returned. The columns are returned by table or view, as specified in the FROM clause, and in the order in which they exist in the table or view."
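Even where column order is reliable, building objects by column name rather than by index removes the dependency altogether. Below is a small sketch with Python's sqlite3 module; the `person` table, the `person_view` view and their columns are invented for illustration, and the same idea applies to a DataRow accessed by field name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be indexed by column name
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO person (id, name, city) VALUES (1, 'Ada', 'London')")
# The view lists its columns explicitly, in the order we want them.
conn.execute("CREATE VIEW person_view AS SELECT name, city, id FROM person")

row = conn.execute("SELECT * FROM person_view").fetchone()
column_order = row.keys()  # follows the view's own column list

# Name-based access keeps working even if the view is later reordered.
person = {"id": row["id"], "name": row["name"], "city": row["city"]}
print(column_order)
print(person)
```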
My query returns a column that can hold types of real estate. Values can be condo or duplex or house and so on. Instead of displaying condo, I just want a C in the column. My plan was to use a huge case/when structure to cover all the cases; is there an easier way? Just displaying the first letter in upper case won't work, by the way, because sometimes that rule can't be applied to create the short code. Duplex for example is DE...
Thanks :-)
If you don't want to use a CASE statement how about creating a lookup table to map the column value to the lookup code you want. Join on to this in your query.
NB - Only worth considering if your query is running over a fairly small resultset or you'll hit performance issues. Indexing the column would help.
Some other options, depending on your DB server's features:
Use a UDF to do the conversion.
Computed column on the source table.
Helper table with a column for each shorthand matching the long string?
The obvious thing to do would be to have another table which maps your value to a code, to which you can then join your results. But it smells a bit wrong. I'd want to join this other table to key values, not strings (which I assume aren't key values)
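The lookup-table approach suggested in these answers could look roughly like the sketch below, run here through Python's sqlite3 module. The table and column names (`listing`, `property_code`, `type_code`) and the code 'H' for house are assumptions; only condo = C and duplex = DE come from the question. The LEFT JOIN plus COALESCE falls back to the full value when no code is defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listing (id INTEGER PRIMARY KEY, property_type TEXT);
-- Hypothetical mapping table: one row per long value and its short code.
CREATE TABLE property_code (
    property_type TEXT PRIMARY KEY,
    code          TEXT NOT NULL
);
INSERT INTO listing (id, property_type)
VALUES (1, 'condo'), (2, 'duplex'), (3, 'house'), (4, 'yurt');
INSERT INTO property_code (property_type, code)
VALUES ('condo', 'C'), ('duplex', 'DE'), ('house', 'H');
""")

rows = conn.execute("""
    SELECT l.id,
           COALESCE(pc.code, l.property_type) AS type_code
    FROM listing AS l
    LEFT JOIN property_code AS pc
           ON pc.property_type = l.property_type
    ORDER BY l.id
""").fetchall()
print(rows)
```

Adding a new property type then means inserting one row into the mapping table instead of editing a CASE expression in every query.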
Why don't you use the DECODE function (available in Oracle)? Note that string literals take single quotes:
select decode(your_column_name, 'condo', 'C', your_column_name) from your_table
I'm reading CJ Date's SQL and Relational Theory: How to Write Accurate SQL Code, and he makes the case that positional queries are bad — for example, this INSERT:
INSERT INTO t VALUES (1, 2, 3)
Instead, you should use attribute-based queries like this:
INSERT INTO t (one, two, three) VALUES (1, 2, 3)
Now, I understand that the first query is out of line with the relational model since tuples (rows) are unordered sets of attributes (columns). I'm having trouble understanding where the harm is in the first query. Can someone explain this to me?
The first query breaks pretty much any time the table schema changes. The second query accommodates any schema change that leaves its columns intact and doesn't add defaultless columns.
People who do SELECT * queries and then rely on positional notation for extracting the values they're concerned about are software maintenance supervillains for the same reason.
While the order of columns is defined in the schema, it should generally not be regarded as important because it's not conceptually important.
Also, it means that anyone reading the first version has to consult the schema to find out what the values are meant to mean. Admittedly this is just like using positional arguments in most programming languages, but somehow SQL feels slightly different in this respect - I'd certainly understand the second version much more easily (assuming the column names are sensible).
I don't really care about theoretical concepts in this regard (as in practice, a table does have a defined column order). The primary reason I would prefer the second one to the first is an added layer of abstraction. You can modify columns in a table without screwing up your queries.
You should try to make your SQL queries depend on the exact layout of the table as little as possible.
The first query relies on the table only having three fields, and in that exact order. Any change at all to the table will break the query.
The second query only relies on there being those three fields in the table, and the order of the fields is irrelevant. You can change the order of fields in the table without breaking the query, and you can even add fields as long as they allow null values or have a default value.
Although you don't rearrange the table layout very often, adding more fields to a table is quite common.
Also, the second query is more readable. You can tell from the query itself what the values put in the record mean.
Something that hasn't been mentioned yet is that you will often be having a surrogate key as your PK, with auto_increment (or something similar) to assign a value. With the first one, you'd have to specify something there — but what value can you specify if it isn't to be used? NULL might be an option, but that doesn't really fit in considering the PK would be set to NOT NULL.
But apart from that, the whole "locked to a specific schema" is a much more important reason, IMO.
SQL gives you syntax for specifying the name of the column for both INSERT and SELECT statements. You should use this because:
Your queries are stable to changes in the column ordering, so that maintenance takes less work.
Column names map better to how people think, so the query is more readable. It's clearer to think of a column as the "Name" column rather than the 2nd column.
I prefer to use the UPDATE-like syntax (a MySQL extension, not standard SQL):
INSERT INTO t SET one = 1, two = 2, three = 3
Which is far easier to read and maintain than both the examples.
Long term, if you add one more column to your table, your INSERT will not work unless you explicitly specify a list of columns. If someone changes the order of columns, your INSERT may silently succeed while inserting values into the wrong columns.
I'm going to add one more thing: the second query is less prone to error originally, even before tables are changed. Why do I say that? Because with the second form you can (and should, when you write the query) visually check that the columns in the insert list and the data in the values clause or select clause are in fact in the right order to begin with. Otherwise you may end up putting the Social Security Number in the Honoraria field by accident and paying speakers their SSN instead of the amount they should make for a speech (example not chosen at random, except we did catch it before it actually happened thanks to that visual check!).
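The schema-change argument made in these answers can be reproduced in a few lines. This sketch uses Python's sqlite3 module and the table `t` with columns `one`, `two`, `three` from the question; the added column `four` and its default are made up. Other database engines report the column-count mismatch with different error messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (one INTEGER, two INTEGER, three INTEGER)")

# The positional INSERT works while the table has exactly three columns.
conn.execute("INSERT INTO t VALUES (1, 2, 3)")

# A routine schema change: add a column that has a default value.
conn.execute("ALTER TABLE t ADD COLUMN four INTEGER DEFAULT 0")

try:
    conn.execute("INSERT INTO t VALUES (1, 2, 3)")  # now one value short
    positional_ok = True
except sqlite3.Error as exc:
    positional_ok = False
    print("positional INSERT failed:", exc)

# The INSERT with a column list is unaffected by the new column.
conn.execute("INSERT INTO t (one, two, three) VALUES (1, 2, 3)")
rows = conn.execute("SELECT one, two, three, four FROM t").fetchall()
print(rows)
```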
How do you select all fields of two joined tables, without having conflicts with the common field?
Suppose I have two tables, Products and Services. I would like to make a query like this:
SELECT Products.*, Services.*
FROM Products
INNER JOIN Services ON Products.IdService = Services.IdService
The problem with this query is that IdService will appear twice and lead to a bunch of problems.
The alternative I found so far is to discriminate every field from Products except the IdService one. But this way I'll have to update the query every time I add a new field to Products.
Is there a better way to do this?
What are the most common SQL anti-patterns?
You've hit anti-pattern #1.
The better way is to provide a fieldlist. One way to get a quick field list is to
sp_help tablename
And if you want to create a view from this query - using select * gets you in more trouble. SQL Server captures the column list at the time the view is created. If you edit the underlying tables and don't recreate the view - you're signing up for trouble (I had a production fire of this nature - view was against tables in a different database though).
You should NEVER have SELECT * in production code (well, almost never, but the times where it is justified can be easily counted).
As far as I am aware you'll have to avoid SELECT *, but this isn't really a problem.
SELECT * is usually regarded as a problem waiting to happen for the reason you quote as an advantage! Usually extra results columns appearing for queries when the database has been modified will cause problems.
Does your dialect of SQL support COMPOSE? COMPOSE gets rid of the extra copy of the column that's used on an equijoin, like the one in your example.
As others have said, SELECT * is bad news, especially if other fields are added to the tables you are querying. You should select the exact fields you want from the tables; for fields with the same name you can use an alias or just use table.columnName.
Do not use *. Use something like this:
SELECT P.field1 AS 'Field from P'
, P.field2
, S.field1 AS 'Field from S'
, S.field4
FROM Products P
INNER JOIN
Services S
ON P.IdService = S.IdService
That would be correct: list the fields you want (in SQL Server you can drag them over from the object browser, so you don't have to type them all). Incidentally, if there are fields your specific query does not need, do not list them. Returning them creates extra work for the server and uses up extra network resources, and it can be one of the causes of poor performance when it is done throughout your system and such wasteful queries are run thousands of times a day.
As to it being a maintenance problem, you only need to add the fields if the part of the application that uses your query would be affected by them. If you don't know what effect the new field would have or where you need to add it, you shouldn't be adding the field. Also, adding new fields unexpectedly through the use of select * can cause maintenance problems as well. Creating performance problems to avoid doing maintenance (maintenance you may never even need to do, as column changes should be rare; if they aren't, you need to look at your design) is pretty short-sighted.
The best way is to specify the exact fields that you want from the query. You shouldn't use * anyway.
It is convenient to use * to get all fields, but it doesn't produce robust code. Any change in the table will change the result that is returned from the query, and that is not always desirable.
You should return only the data that you really want from the query, specified in the exact order you want it. That way the result looks exactly the same even if you add fields to the table or change the order of the fields in the table.
It's a little more work to specify the exact output, but in the long run it usually pays off. When you make a change, only what you actually change is affected; you don't get cascading effects that break code that you didn't even know was affected.
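To make the duplicate-column problem concrete, here is a sketch run through Python's sqlite3 module. `Products`, `Services` and `IdService` come from the question; `IdProduct`, the two `Name` columns and the aliases are hypothetical. The column names reported for `*` are what SQLite returns with its default settings, and other drivers may label them differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Services (IdService INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Products (IdProduct INTEGER PRIMARY KEY, Name TEXT,
                       IdService INTEGER);
INSERT INTO Services (IdService, Name) VALUES (10, 'Delivery');
INSERT INTO Products (IdProduct, Name, IdService) VALUES (1, 'Desk', 10);
""")

# SELECT * on both tables returns IdService (and Name) twice.
star = conn.execute("""
    SELECT Products.*, Services.*
    FROM Products
    INNER JOIN Services ON Products.IdService = Services.IdService
""")
star_names = [d[0] for d in star.description]

# An explicit, aliased list returns each value once under a unique name.
explicit = conn.execute("""
    SELECT P.IdProduct AS IdProduct,
           P.Name      AS ProductName,
           S.IdService AS IdService,
           S.Name      AS ServiceName
    FROM Products AS P
    INNER JOIN Services AS S ON P.IdService = S.IdService
""")
explicit_names = [d[0] for d in explicit.description]
explicit_rows = explicit.fetchall()
print(star_names)
print(explicit_names, explicit_rows)
```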