What is the reason not to use select *?

I've seen a number of people claim that you should specifically name each column you want in your select query.
Assuming I'm going to use all of the columns anyway, why would I not use SELECT *?
Even considering the question *SQL query - Select * from view or Select col1, col2, … colN from view*, I don't think this is an exact duplicate as I'm approaching the issue from a slightly different perspective.
One of our principles is to not optimize before it's time. With that in mind, it seems like using SELECT * should be the preferred method until it is proven to be a resource issue or the schema is pretty much set in stone. Which, as we know, won't occur until development is completely done.
That said, is there an overriding issue to not use SELECT *?

The essence of the quote about not prematurely optimizing is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.
When you use select * you're making it impossible to profile, therefore you're not writing clear and straightforward code and you are going against the spirit of the quote. select * is an anti-pattern.
So selecting columns is not a premature optimization. A few things off the top of my head ....
If you specify columns in a SQL statement, the SQL execution engine will error if that column is removed from the table and the query is executed.
You can more easily scan code where that column is being used.
You should always write queries to bring back the least amount of information.
As others mention, if you use ordinal column access you should never use select *.
If your SQL statement joins tables, select * gives you all the columns from all the tables in the join (see the sketch at the end of this answer).
The corollary is that using select * ...
The columns used by the application are opaque
DBAs and their query profilers are unable to help with your application's poor performance
The code is more brittle when changes occur
Your database and network are suffering because they are bringing back too much data (I/O)
Database engine optimizations are minimal as you're bringing back all data regardless (logical).
Writing correct SQL is just as easy as writing Select *. So the real lazy person writes proper SQL because they don't want to revisit the code and try to remember what they were doing when they did it. They don't want to explain to the DBA's about every bit of code. They don't want to explain to their clients why the application runs like a dog.
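As a sketch of the join point above (table and column names are hypothetical, not from the original answer):
SELECT *
FROM Orders o
JOIN Customers c ON c.CustomerId = o.CustomerId
-- brings back every column of Orders and Customers, including two copies of CustomerId
SELECT o.OrderId, o.OrderDate, c.CustomerName
FROM Orders o
JOIN Customers c ON c.CustomerId = o.CustomerId
-- brings back only what the application actually uses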

If your code depends on the columns being in a specific order, your code will break when there are changes to the table. Also, you may be fetching too much from the table when you select *, especially if there is a binary field in the table.
Just because you are using all the columns now, it doesn't mean someone else isn't going to add an extra column to the table.
It also adds overhead to the plan execution caching since it has to fetch the meta data about the table to know what columns are in *.
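As a minimal sketch of the binary-field point above (the Documents table and its columns are hypothetical):
SELECT DocumentId, Title, CreatedAt
FROM Documents
-- the large Content varbinary(max) column never leaves the server,
-- whereas SELECT * would drag it across the wire on every call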

One major reason is that if you ever add/remove columns from your table, any query/procedure that is making a SELECT * call will now be getting more or less columns of data than expected.

In a roundabout way you are breaking the modularity rule about using strict typing wherever possible. Explicit is almost universally better.
Even if you now need every column in the table, more could be added later, which will be pulled down every time you run the query and could hurt performance. It hurts performance because:
You are pulling more data over the wire; and
You might defeat the optimizer's ability to pull the data right out of the index (for queries on columns that are all part of an index) rather than doing a lookup in the table itself.
When TO use select *
When you explicitly NEED every column in the table, as opposed to needing every column in the table THAT EXISTED AT THE TIME YOU WROTE THE QUERY. For example, if you were writing a DB management app that needed to display the entire contents of the table (whatever they happened to be), you might use that approach.

There are a few reasons:
If the number of columns in a table changes and your application expects there to be a certain number...
If the order of columns in a table changes and your application expects them to be in a certain order...
Memory overhead. 8 unnecessary INTEGER columns would add 32 bytes of wasted memory. That doesn't sound like a lot, but this is for each query and INTEGER is one of the small column types... the extra columns are more likely to be VARCHAR or TEXT columns, which adds up quicker.
Network overhead. Related to memory overhead: if I issue 30,000 queries and have 8 unnecessary INTEGER columns, I've wasted 960kB of bandwidth. VARCHAR and TEXT columns are likely to be considerably larger.
Note: I chose INTEGER in the above example because they have a fixed size of 4 bytes.

If your application gets data with SELECT * and the table structure in the database is changed (say a column is removed), your application will fail in every place that you reference the missing field. If you instead include all the columns in your query, your application will break in the (hopefully) one place where you initially get the data, making the fix easier.
That being said, there are a number of situations in which SELECT * is desirable. One is a situation that I encounter all the time, where I need to replicate an entire table into another database (like SQL Server to DB2, for example). Another is an application written to display tables generically (i.e. without any knowledge of any particular table).
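For the replication case, a minimal sketch (the linked-server and table names are hypothetical; the four-part name is SQL Server syntax):
INSERT INTO DB2LINK.TARGETDB.MYSCHEMA.MYTABLE
SELECT * FROM dbo.MyTable
-- SELECT * is appropriate here: the whole row, whatever its current shape,
-- is exactly what we want to copy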

I actually noticed a strange behaviour when I used select * in views in SQL Server 2005.
Run the following query and you will see what I mean.
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
[id] [int] IDENTITY(1,1) NOT NULL,
[A] [varchar](50) NULL,
[B] [varchar](50) NULL,
[C] [varchar](50) NULL
) ON [PRIMARY]
GO
insert into dbo.starTest
select 'a1','b1','c1'
union all select 'a2','b2','c2'
union all select 'a3','b3','c3'
go
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vStartest]'))
DROP VIEW [dbo].[vStartest]
go
create view dbo.vStartest as
select * from dbo.starTest
go
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vExplicittest]'))
DROP VIEW [dbo].[vExplicittest]
go
create view dbo.[vExplicittest] as
select a,b,c from dbo.starTest
go
select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicitTest
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
[id] [int] IDENTITY(1,1) NOT NULL,
[A] [varchar](50) NULL,
[B] [varchar](50) NULL,
[D] [varchar](50) NULL,
[C] [varchar](50) NULL
) ON [PRIMARY]
GO
insert into dbo.starTest
select 'a1','b1','d1','c1'
union all select 'a2','b2','d2','c2'
union all select 'a3','b3','d3','c3'
select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicittest
Compare the results of the last two select statements.
I believe what you will see is a result of Select * referencing columns by index instead of name.
If you rebuild the view it will work fine again.
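On SQL Server the rebuild can be done with sp_refreshview, which refreshes the view's metadata in place (using the view from the script above):
EXEC sp_refreshview N'dbo.vStartest'
-- after this, select a,b,c from dbo.vStartest returns the correct columns again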
EDIT
I have added a separate question, *“select * from table” vs “select colA, colB, etc. from table” interesting behaviour in SQL Server 2005* to look into that behaviour in more details.

You might join two tables and use column A from the second table. If you later add column A to the first table (with the same name but possibly a different meaning) you'll most likely get the values from the first table and not from the second one, as before. That won't happen if you explicitly specify the columns you want to select.
Of course specifying the columns also sometimes causes bugs, if you forget to add the new columns to every select clause. If the new column is not needed every time the query is executed, it may take some time before the bug gets noticed.
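A sketch of the shadowing scenario described above (tables t1 and t2 are hypothetical; initially only t2 has a column named A):
SELECT *
FROM t1
JOIN t2 ON t2.t1_id = t1.id
-- if a column A is later added to t1, code that reads "A" from this result
-- set may silently start seeing t1.A instead of t2.A
SELECT t1.id, t2.A
FROM t1
JOIN t2 ON t2.t1_id = t1.id
-- explicit selection keeps the meaning fixed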

I understand where you're going regarding premature optimization, but that really only goes so far. The intent is to avoid unnecessary optimization in the beginning. Are your tables unindexed? Would you use nvarchar(4000) to store a zip code?
As others have pointed out, there are other positives to specifying each column you intend to use in the query (such as maintainability).

When you're specifying columns, you're also tying yourself into a specific set of columns and making yourself less flexible, making Feuerstein roll over in, well, wherever he is. Just a thought.

SELECT * is not always evil. In my opinion, at least. I use it quite often for dynamic queries returning a whole table, plus some computed fields.
For instance, I want to compute geographical geometries from a "normal" table, that is, a table without any geometry field but with fields containing coordinates.
I use postgresql and its spatial extension postgis, but the principle applies to many other cases.
An example:
a table of places, with coordinates stored in fields labeled x, y, z:
CREATE TABLE places (place_id integer, x numeric(10, 3), y numeric(10, 3), z numeric(10, 3), description varchar);
let's feed it with a few example values:
INSERT INTO places (place_id, x, y, z, description)
VALUES
(1, 2.295, 48.863, 64, 'Paris, Place de l''Étoile'),
(2, 2.945, 48.858, 40, 'Paris, Tour Eiffel'),
(3, 0.373, 43.958, 90, 'Condom, Cathédrale St-Pierre');
I want to be able to map the contents of this table, using some GIS client. The normal way is to add a geometry field to the table, and build the geometry, based on the coordinates.
But I would prefer to get a dynamic query: this way, when I change coordinates (corrections, more accuracy, etc.), the mapped objects actually move, dynamically.
So here is the query with the SELECT *:
CREATE OR REPLACE VIEW places_points AS
SELECT *,
GeomFromewkt('SRID=4326; POINT ('|| x || ' ' || y || ' ' || z || ')')
FROM places;
Refer to postgis for GeomFromewkt() function usage.
Here is the result:
SELECT * FROM places_points;
place_id | x | y | z | description | geomfromewkt
----------+-------+--------+--------+------------------------------+--------------------------------------------------------------------
1 | 2.295 | 48.863 | 64.000 | Paris, Place de l'Étoile | 01010000A0E61000005C8FC2F5285C02405839B4C8766E48400000000000005040
2 | 2.945 | 48.858 | 40.000 | Paris, Tour Eiffel | 01010000A0E61000008FC2F5285C8F0740E7FBA9F1D26D48400000000000004440
3 | 0.373 | 43.958 | 90.000 | Condom, Cathédrale St-Pierre | 01010000A0E6100000AC1C5A643BDFD73FB4C876BE9FFA45400000000000805640
(3 rows)
The rightmost column can now be used by any GIS program to properly map the points.
If, in the future, some fields get added to the table: no worries, I just have to run the same VIEW definition again.
I wish the definition of the VIEW could be kept "as is", with the *, but alas it is not the case: this is how it is internally stored by postgresql:
SELECT places.place_id, places.x, places.y, places.z, places.description, geomfromewkt(((((('SRID=4326; POINT ('::text || places.x) || ' '::text) || places.y) || ' '::text) || places.z) || ')'::text) AS geomfromewkt FROM places;

Even if you use every column but address the row array by numeric index, you will have problems if another column is added later on.
So basically it is a question of maintainability! If you don't use the * selector you will not have to worry about your queries.

Selecting only the columns you need keeps the dataset in memory smaller and therefore keeps your application faster.
Also, a lot of tools (e.g. stored procedures) cache query execution plans too. If you later add or remove a column (particularly easy if you're selecting off a view), the tool will often error when it doesn't get back results that it expects.

It makes your code more ambiguous and more difficult to maintain; because you're adding extra unused data to the domain, and it's not clear which you've intended and which not. (It also suggests that you might not know, or care.)

To answer your question directly: do not use SELECT * when it makes your code more fragile to changes in the underlying tables. Your code should break only when a change is made to the table that directly affects the requirements of your program.
Your application should take advantage of the abstraction layer that relational access provides.

I don't use SELECT * simply because it is nice to see and know what fields I am retrieving.

It is generally bad to use select * inside of views because you will be forced to recompile the view in the event of a table column change. After changing the underlying table columns of a view you will get errors for non-existent columns until you go back and recompile.

It's ok when you're doing exists(select * ...) since it never gets expanded. Otherwise it's really only useful when exploring tables with temporary select statements, or if you had a CTE defined above and you want every column without typing them all out again.

Just to add one thing that no one else has mentioned. Select * returns all the columns; someone may later add a column that you don't necessarily want the users to be able to see, such as who last updated the data, a timestamp, or notes that only managers should see.
Further, when adding a column, the impact on existing code should be reviewed and considered to see if changes are needed based on what information is stored in the column. By using select *, that review will often be skipped, because the developer will assume that nothing will break. And in fact nothing may explicitly appear to break, but queries may now start returning the wrong thing. Just because nothing explicitly breaks doesn't mean there should not have been changes to the queries.

because "select * " will waste memory when you don't need all the fields.But for sql server, their performence are the same.

Related

Store results of SQL Server query for pagination

In my database I have a table with a rather large data set that users can perform searches on. So for the following table structure for the Person table that contains about 250,000 records:
firstName | lastName | age
----------|----------|----
John      | Doe      | 25
John      | Sams     | 15
the users would be able to perform a query that can return about 500 or so results. What I would like to do is allow the user to see his search results 50 at a time using pagination. I've figured out the client-side pagination stuff, but I need somewhere to store the query results so that the pagination uses the results from his unique query and not from a SELECT * statement.
Can anyone provide some guidance on the best way to achieve this? Thanks.
Side note: I've been trying to use temp tables to do this by using the SELECT INTO statements, but I think that might cause some problems if, say, User A performs a search and his results are stored in the temp table then User B performs a search shortly after and User A's search results are overwritten.
In SQL Server the ROW_NUMBER() function is great for pagination, and may be helpful depending on what parameters change between searches, for example if searches were just for different firstName values you could use:
;WITH search AS (SELECT *,ROW_NUMBER() OVER (PARTITION BY firstName ORDER BY lastName) AS RN_firstName
FROM YourTable)
SELECT *
FROM search
WHERE RN_firstName BETWEEN 51 AND 100
AND firstName = 'John'
You could add additional ROW_NUMBER() lines, altering the PARTITION BY clause based on which fields are being searched.
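For example, a hypothetical second numbering to support lastName searches as well (following the pattern above):
;WITH search AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY firstName ORDER BY lastName) AS RN_firstName,
           ROW_NUMBER() OVER (PARTITION BY lastName ORDER BY firstName) AS RN_lastName
    FROM YourTable
)
SELECT *
FROM search
WHERE RN_lastName BETWEEN 51 AND 100
AND lastName = 'Doe'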
Historically, for us, the best way to manage this is to create a complete new table, with a unique name. Then, when you're done, you can schedule the table for deletion.
The table, if practical, simply contains an index id (a simple sequence: 1,2,3,4,5) and the primary key to the table(s) that are part of the query. Not the entire result set.
Your pagination logic then does something like:
SELECT p.* FROM temp_1234 t, primary_table p
WHERE t.pkey = p.primary_key
AND t.serial_id between 51 and 100
The serial id is your paging index.
So, you end up with something like (note, I'm not a SQL Server guy, so pardon):
CREATE TABLE temp_1234 (
serial_id serial,
pkey number
);
INSERT INTO temp_1234
SELECT 0, primary_key FROM primary_table WHERE <criteria> ORDER BY <sort>;
CREATE INDEX i_temp_1234 ON temp_1234(serial_id); -- I think sql already does this for you
If you can delay the index, it's faster than creating it first, but it's a marginal improvement most likely.
Also, create a tracking table where you insert the table name and the date. You can use this with a reaper process later (late at night) to DROP the day's tables (those more than, say, X hours old).
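A minimal sketch of such a tracking table (the names and types are assumptions):
CREATE TABLE temp_table_registry (
    table_name varchar(128),
    created_at datetime
);
INSERT INTO temp_table_registry VALUES ('temp_1234', CURRENT_TIMESTAMP);
-- the nightly reaper job can then DROP any registered table older than X hours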
Full table operations are much cheaper than inserting and deleting rows in a single shared table:
INSERT INTO page_table SELECT 'temp_1234', <sequence>, primary_key...
DELETE FROM page_table WHERE page_id = 'temp_1234';
That's just awful.
First of all, make sure you really need to do this. You're adding significant complexity, so go and measure whether the queries and pagination really hurt, or you just "feel like you should". The pagination can be handled with ROW_NUMBER() quite easily.
Assuming you go ahead, once you've got your query, clearly you need to build a cache so first you need to identify what the key is. It will be the SQL statement or operation identifier (name of stored procedure perhaps) and the criteria used. If you don't want to share between users then the user name or some kind of session ID too.
Now when you do a query, you first look up in this table with all the key data then either
a) Can't find it so you run the query and add to the cache, storing the criteria/keys and the data or PK of the data depending on if you want a snapshot or real time. Bear in mind that "real time" isn't really because other users could be changing data under you.
b) Find it, so remove the results (or join the PK to the underlying tables) and return the results.
Of course now you need a background process to go and clean up the cache when it's been hanging around too long.
Like I said - you should really make sure you need to do this before you embark on it. In the example you give I don't think it's worth it.

Index on VARCHAR column

I have a table of 32,589 rows, and one of the columns is called 'Location' and is a Varchar(40) column type. The column holds a location, which is actually a suburb, all uppercase text.
A function that uses this table does a:
IF EXISTS(SELECT * FROM MyTable WHERE Location = 'A Suburb')
...
Would it be beneficial to add an index to this column, for efficiency? This is more of a read-only table, so there are not many edits or inserts except for maintenance.
Without an index SQL Server will have to perform a table scan to find the first instance of the location you're looking for. You might get lucky and have the value be in one of the first few rows, but it could be at row 32,000, which would be a waste of time. Adding an index only takes a few seconds and you'll probably see a big performance gain.
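A minimal sketch of the index creation (the index name is an assumption):
CREATE NONCLUSTERED INDEX IX_MyTable_Location
ON MyTable (Location)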
I concur with @Brian Shamblen's answer.
Also, try using TOP 1 in the inner select
IF EXISTS(SELECT TOP 1 * FROM MyTable WHERE Location = 'A Suburb')
You don't have to select all the records matching your criteria for EXISTS, one is enough.
An opportunistic approach to performance tuning is usually a bad idea.
To answer the specific question - if your function is using location in a where clause, and the table has more than a few hundred rows, and the values in the location column are not all identical, creating an index will speed up your function.
Whether you notice any difference is hard to say - there may be much bigger performance problems lurking in the database, and you might be fixing the wrong problem.

How do I optimize a database for superstring queries?

So I have a database table in MySQL that has a column containing a string. Given a target string, I want to find all the rows that have a substring contained in the target, ie all the rows for which the target string is a superstring for the column. At the moment I'm using a query along the lines of:
SELECT * FROM table WHERE 'my superstring' LIKE CONCAT('%', column, '%')
My worry is that this won't scale. I'm currently doing some tests to see if this is a problem but I'm wondering if anyone has any suggestions for an alternative approach. I've had a brief look at MySQL's full-text indexing but that also appears to be geared toward finding a substring in the data, rather than finding out if the data exists in a given string.
You could create a temporary table with a full text index and insert 'my superstring' into it. Then you could use MySQL's full text match syntax in a join query with your permanent table. You'll still be doing a full table scan on your permanent table because you'll be checking for a match against every single row (what you want, right?). But at least 'my superstring' will be indexed so it will likely perform better than what you've got now.
Alternatively, you could consider simply selecting column from table and performing the match in a high level language. Depending on how many rows are in table, this approach might make more sense. Offloading heavy tasks to a client server (web server) can often be a win because it reduces load on the database server.
If your superstrings are URLs, and you want to find substrings in them, it would be useful to know if your substrings can be anchored on the dots.
For instance, you have the superstrings:
www.mafia.gov.ru
www.mymafia.gov.ru
www.lobbies.whitehouse.gov
If your rules contain "mafia" and you want the first 2 to match, then what I'll say doesn't apply.
Else, you can parse your URLs into things like: [ 'www', 'mafia', 'gov', 'ru' ]
Then, it will be much easier to look up each element in your table.
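A hypothetical token table for that approach, assuming the URLs are split on the dots at insert time:
CREATE TABLE url_tokens (
    url_id INT,
    token VARCHAR(64),
    INDEX (token)
);
-- candidate rows for the target 'www.mafia.gov.ru' then become an indexed lookup:
SELECT DISTINCT url_id
FROM url_tokens
WHERE token IN ('www', 'mafia', 'gov', 'ru')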
Well it appears the answer is that you don't. This type of indexing is generally not available and if you want it within your MySQL database you'll need to create your own extensions to MySQL. The alternative I'm pursuing is to do the indexing in my application.
Thanks to everyone that responded!
I created a search solution using views that needed to be robust enough to grow with the customer's needs. For example:
CREATE TABLE tblMyData
(
MyId bigint identity(1,1),
Col01 varchar(50),
Col02 varchar(50),
Col03 varchar(50)
)
CREATE VIEW viewMySearchData
as
SELECT
MyId,
ISNULL(Col01,'') + ' ' +
ISNULL(Col02,'') + ' ' +
ISNULL(Col03,'') + ' ' AS SearchData
FROM tblMyData
SELECT
t1.MyId,
t1.Col01,
t1.Col02,
t1.Col03
FROM tblMyData t1
INNER JOIN viewMySearchData t2
ON t1.MyId = t2.MyId
WHERE t2.SearchData like '%search string%'
If they then decide to add columns to tblMyData and they want those columns to be searched, then modify viewMySearchData by adding the new columns to the "AS SearchData" section, as sketched below.
If they decide that there are too many columns in the search, then just modify viewMySearchData by removing the unwanted columns from the "AS SearchData" section.
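For instance, if a hypothetical Col04 is later added to tblMyData, the view just gets redefined in the same pattern:
ALTER VIEW viewMySearchData
as
SELECT
MyId,
ISNULL(Col01,'') + ' ' +
ISNULL(Col02,'') + ' ' +
ISNULL(Col03,'') + ' ' +
ISNULL(Col04,'') AS SearchData
FROM tblMyData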

MySQL - Selecting data from multiple tables all with same structure but different data

Ok, here is my dilemma: I have a database set up with about 5 tables, all with the exact same data structure. The data is separated in this manner for localization purposes and to split up a total of about 4.5 million records.
A majority of the time only one table is needed and all is well. However, sometimes data is needed from 2 or more of the tables and it needs to be sorted by a user defined column. This is where I am having problems.
data columns:
id, band_name, song_name, album_name, genre
MySQL statement:
SELECT * from us_music, de_music where `genre` = 'punk'
MySQL spits out this error:
#1052 - Column 'genre' in where clause is ambiguous
Obviously, I am doing this wrong. Anyone care to shed some light on this for me?
I think you're looking for the UNION clause, a la
(SELECT * from us_music where `genre` = 'punk')
UNION
(SELECT * from de_music where `genre` = 'punk')
It sounds like you'd be happier with a single table. The five having the same schema, and sometimes needing to be presented as if they came from one table, point to putting it all in one table.
Add a new column which can be used to distinguish among the five languages (I'm assuming it's language that is different among the tables since you said it was for localization). Don't worry about having 4.5 million records. Any real database can handle that size no problem. Add the correct indexes, and you'll have no trouble dealing with them as a single table.
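A minimal sketch of that single-table design (the locale column name is an assumption):
CREATE TABLE music (
    id INT,
    band_name VARCHAR(100),
    song_name VARCHAR(100),
    album_name VARCHAR(100),
    genre VARCHAR(50),
    locale CHAR(2),   -- 'us', 'de', ...
    INDEX (genre),
    INDEX (locale)
);
SELECT * FROM music WHERE genre = 'punk' ORDER BY band_name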
Any of the above answers are valid, or an alternative way is to expand the column reference to include the table name as well - eg:
SELECT * from us_music, de_music where us_music.`genre` = 'punk' AND de_music.`genre` = 'punk'
The column is ambiguous because it appears in both tables; you would need to specify the where (or sort) field fully, such as us_music.genre or de_music.genre, but you'd usually specify two tables only if you were then going to join them together in some fashion. The structure you're dealing with is occasionally referred to as a partitioned table, although it's usually done to separate the dataset into distinct files as well, rather than to just split the dataset arbitrarily. If you're in charge of the database structure and there's no good reason to partition the data, then I'd build one big table with an extra "origin" field that contains a country code, but you're probably doing it for a legitimate performance reason.
Either use a union to combine the tables you're interested in (http://dev.mysql.com/doc/refman/5.0/en/union.html) or use the Merge storage engine (http://dev.mysql.com/doc/refman/5.1/en/merge-storage-engine.html).
Your original attempt to span both tables creates an implicit JOIN. This is frowned upon by most experienced SQL programmers because it separates the tables to be combined from the conditions describing how to combine them.
The UNION is a good solution for the tables as they are, but there should be no reason they can't be put into the one table with decent indexing. I've seen adding the correct index to a large table increase query speed by three orders of magnitude.
A union statement can take a great deal of time on huge data. It is good to perform the select in 2 steps:
select the ids first,
then select from the main table using them.

Is there a difference between Select * and Select [list each col] [duplicate]

This question already has answers here:
Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc
I'm using MS SQL Server 2005. Is there a difference, to the SQL engine, between
SELECT * FROM MyTable;
and
SELECT ColA, ColB, ColC FROM MyTable;
When ColA, ColB, and ColC represent every column in the table?
If they are the same, is there a reason why you should use the 2nd one anyway? I have a project that's heavy on LINQ, and I'm not sure if the standard SELECT * it generates is a bad practice, or if I should always use a .Select() on it to specify which cols I want.
EDIT: Changed "When ColA, ColB, and ColC are all the columns to the table?" to "When ColA, ColB, and ColC represent every column in the table?" for clarity.
Generally, it's better to be explicit, so Select col1, col2 from Table is better. The reason being that at some point, an extra column may be added to that table, and would cause unneeded data to be brought back from the query.
This isn't a hard and fast rule though.
1) The second one is more explicit about which columns are returned. The value of the 2nd one then is how much you value explicitly knowing which columns come back.
2) This involves potentially less data being returned when there are more columns than the ones explicitly used as well.
3) If you change the table by adding a new column, the first query changes and the second does not. If you have code like "for all columns returned do ..." then the results change if you use the first, but not the 2nd.
I'm going to get a lot of people upset with me, but especially if I'm adding columns later on, I usually like to use SELECT * FROM table. I've been called lazy for this reason: if I make any modifications to my tables, I'd rather not track down all the stored procs that use that table, and instead just change it in the data access layer classes in my application. There are cases in which I will specify the columns, but in the case where I'm trying to get a complete "object" from the database, I'd rather just use the "*". And, yes, I know people will hate me for this, but it has allowed me to be quicker and less bug-prone while adding fields to my applications.
The two sides of the issue are this: Explicit column specification gives better performance as new columns are added, but * specification requires no maintenance as new columns are added.
Which to use depends on what kind of columns you expect to add to the table, and what the point of the query is.
If you are using your table as a backing store for an object (which seems likely in the LINQ-to-SQL case), you probably want any new columns added to this table to be included in your object, and vice-versa. You're maintaining them in parallel. For this reason, for this case, * specification in the SELECT clause is right. Explicit specification would give you an extra bit of maintenance every time something changed, and a bug if you didn't update the field list correctly.
If the query is going to return a lot of records, you are probably better off with explicit specification for performance reasons.
If both things are true, consider having two different queries.
You should specify an explicit column list. SELECT * will bring back more columns than you need creating more IO and network traffic, but more importantly it might require extra lookups even though a non-clustered covering index exists (On SQL Server).
Some reasons not to use the first statement (select *) are:
If you later add some large fields (a BLOB column would be very bad) to that table, you could suffer performance problems in the application
If the query was a JOIN query with two or more tables, some of the fields could have the same name. It would be better to assure that your field names are different.
The purpose of the query is clearer with the second statement, from a programming aesthetics viewpoint
When you select each field individually, it is more clear which fields are actually being selected.
SELECT * is a bad practice in most places.
What if someone adds a 2gb BLOB column to that table?
What if someone adds really any column at all to that table?
It's a bug waiting to happen.
A couple things:
A good number of people have posted here recommending against using *, and given several good reasons for those answers. Out of 10 other responses so far only one doesn't recommend listing columns.
People often make exceptions to that rule when posting to help sites like StackOverflow, because they often don't know what columns are in your table or are important to your query. For that reason, you'll see a lot of code here and elsewhere on the web that uses the * syntax, even though the poster would tend to avoid it in his own code.
It's good for forward-compatibility.
When you use
SELECT * FROM myTable
and in "myTable" are 3 columns. You get same results as
SELECT Column1, Column2, Column3 FROM myTable
But if you add a new column in the future, you get different results.
Of course, if you rename one of the existing columns, in the first case you still get results and in the second case you get an error (I think this is the correct behaviour of the application).
If your code relies on certain columns being in a certain order, you need to list the columns. If not, it doesn't really make a difference if you use "*" or write the column names out in the select statement.
An example is if you insert a column into a table.
Take this table:
ColA ColB ColC
You might have a query:
SELECT *
FROM myTable
Then the code might be:
rs = executeSql("SELECT * FROM myTable")
while (rs.read())
    Print "Col A" + rs[0]
    Print "Col B" + rs[1]
    Print "Col C" + rs[2]
If you add a column between ColB and ColC, the query wouldn't return what you're looking for.
For LinqToSql, if you plan to modify those records later, you should pull the whole record into memory.
It depends on what you mean by "difference". There is the obvious syntax difference, but the real difference is one of performance.
When you say SELECT * FROM MyTable, you are telling the SQL query engine to return a data set with all of the columns from that table, while SELECT ColA, ColB, ColC FROM MyTable tells the query engine to return a data set with only ColA, ColB, and ColC from the table.
Say you have a table with 100 columns defined as CHAR[10]. SELECT * will return 100 columns * 10 bytes worth of data while SELECT ColA, ColB, ColC will return 3 columns * 10 bytes worth of data. This is a huge size difference in the amount of data that is being passed back across the wire.
Specifying the column list also makes it much clearer what columns you are interested in. The drawback is that if you add/remove a column from the table you need to ensure that the column list is updated as well, but I think that's a small price compared to the performance gain.
SELECT * FROM MyTable
select * is dependent on the column order in the schema so if you refer to the result set by the index # of the collection you will be looking at the wrong column.
SELECT Col1,Col2,Col3 FROM MyTable
this query will give you a collection that stays the same over time, but how often are you changing the column order anyways?
A quick look at the query execution plan shows that the queries are the same.
The general rule of thumb is that you will want to limit your queries to only the fields that you need returned.
Selecting each column is better than just * because if you add or delete a column you HAVE to look at the code and check what you were doing with the retrieved data.
Also, it helps you understand your code better and allows you to use aliases as column names (in case you're performing a join of tables with a column sharing the same name).
An example as to why you never (imho) should use SELECT *. This does not relate to MSSQL, but rather MySQL. Versions prior to 5.0.12 returned columns from certain types of joins in a non-standard manner. Of course, if your query defines which columns you want and in which order, you have no problem. Imagine the fun if they don't.
(One possible exception: Your query SELECTs from just one table and you identify columns in your programming language of choice by name rather than position.)
Using "SELECT *" optimizes for programmer typing. That's it. That's the only advantage.