SQL select * vs. selecting specific columns [duplicate]

SQL select * vs. selecting specific columns [duplicate] - sql

This question already has answers here:
Why is SELECT * considered harmful?
(16 answers)
Closed 9 years ago.
I was wondering which is best practice. Lest say I have a table with 10+ columns and I want to select data from it.
I've heard that 'select *' is better since selecting specific columns makes the database search for these columns before selecting while selecting all just grabs everything. On the other hand, what if the table has a lot of columns in it?
Is that true?
Thanks

It is best practice to explicitly name the columns you want to select.
As Mitch just said the performance isn't different. I even heard that looking up the actual columns names when using * is slower.
But the advantage is that when your table changes then your select does not change when you name your columns.

I think these two questions here and here have satisfactory answers.
* is not better, actually it is slower is one reason that select * is not good. In addition to this, according to OMG Ponies, select * is anti-pattern. See the questions in the links for detail.

selecting specific columns is better as it is raises the probability that SQL Server can access the data from indexes rather than querying the table data.
It's also require less changes, since any code that consumes the data will be getting the same data structure regardless of changes you make to the table schema in the future.

Definetly not. Try making a SELECT * from a table which has millions of rows and tens of columns.
The performance with SELECT * will be worse.

It depends on what you're about to do with the result. Selecting unnecessary data is not a good practice either. You wouldn't create a bunch of variables with values you would never use. So selecting many columns you don't need is not a good idea either.

It depends.
Selecting all columns can make query slower because of need of reading all columns from disk -- if there are a lot of string columns (which are not in index) then it can have huge impact on query (IO) performance. And from my practise -- you rely need all columns.
From the other hand -- for small database with a few user and good enough hardware it's much easier to select just all columns -- especially if schema changes often.
However -- I would always recommended to explicitly select columns to make sure it doesn't hurt performance.

Related

Is selecting all columns from a SQL table expensive? [duplicate]

This question already has answers here:
What is the reason not to use select *?
(20 answers)
select * vs select column
(12 answers)
Closed 9 years ago.
Is it expensive to select all columns from a SQL table, compared to specifying which columns to retrieve?
SELECT * FROM table
vs
SELECT col1, col2, col3 FROM table
Might be useful to know that some of the tables I'm querying have over 100 columns.

It may be. It depends on what indexes are defined on the table.
If there's a non-clustered index on col1,col2,col3 then that index may be used to satisfy the query (since it's narrower than the table itself, be it a heap or a clustered index), which should result in lower I/O costs.
It's also, generally, preferred so that you can determine which queries are using particular columns in a table.
If the table has no indexes, or only a single clustered index, or there are no indexes that cover your particular query, then every page of the heap/clustered index is going to have to be accessed anyway. Even then, if you have any off-row data (e.g. a largish varchar(max)) then, if you don't include that column in the SELECT then you can avoid that I/O cost.

performance in this case, depends upon the proper use of indexes in your query.
If your DB is normalized properly and you have made use of indexes in where clause then for sure performance is going to be better.
Eg.
select * from tableName where id=232
Here index is used.
You can refer following link:
Performance issue in using SELECT *?
What is the reason not to use select *?

Lets take it apart into the main issues:
The actual database/application: How you type your query MIGHT change how the SQL application actually optimizes your query, where it gets the data from, etc. Then again, it might not. Its hard to generalize here and depends on the database application and setup.
Programmer resources: Using * instead of typing things out is easier and quicker for you. Yay! And if the "implication" behind the command is literally "get everything", maybe its a nice bit of programmer communication to use * instead of listing all the columns out by hand. Being hit in the face with a list of hundreds of column names as a programmer reading code afterwards is an unpleasant experience. On the other hand, listing things by hand can act as a bit of a signal that there's some reason you're asking for those columns specifically. Its not a strong signal, but its still a signal.
Other resources/IO/memory, etc: Now, if you don't actually need all 100 columns and you're querying them because you're lazy, then we get into further grey area. What's the database being loaded from? Where are the query results going? How fast are the read/write speeds on those things? Do you really want to do that with all the columns? How much memory or resources are going to be used in actioning the query? Will it be using an index? Is it indexed? Do you even need to care about optimization at this stage?
So the long and short of it is, its a grey area...

Generally you should only select the columns that you need for the query.
Sometimes selecting all the columns for a query which is used later in a stored procedure won't make any difference due to how the execution plan optimises the whole stored procedure. Also indexes on columns will have an effect.

Select * from table vs Select col1,col2,col3 from table [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
select * vs select column
I was just having a discussion with one of my colleague on the SQL Server performance on specifying the query command in the stored procedure.
So I want to know which one is preferred over another and whats the concrete reason behind that.
Suppose, We do have one table called
Employees(EmpName,EmpAddress)
And we want to select all the records from the table. So we can write the query in two ways,
Select * from Employees
Select EmpName, EmpAddress from Employees
So I would like to know is there any specific difference or performance issue in the above queries or are they just equal to the SQL Server Engine.
UPDATE:
Lets say the table schema won't change anymore. So no point for future maintenance.
Performance wise, lets say, the usage is very very high i.e. millions of hits per seconds on the database server. I want a clear and precise performance rating on both approaches.
No Indexing is done on the entire table.

The specific difference would show its ugly head if you add a column to the table.
Suddenly, the query you expected to return two columns now returns three. If you coded specifically for the two columns, the rest of your code is now broken.
Performance-wise, there shouldn't be a difference.
I always take the approach that being as specific as possible is the best when dealing with databases. If the table has two columns and you only need those two columns, be specific. Specify those two columns. It'll save you headaches in the future.

I am an avid avokat of the "be as specific as possible" rule, too. Not following it will hurt you in the long run. However, your question seems to be coming from a different background, so let me attempt to answer it.
When you submit a query to SQL Server it goes through several stages:
transmitting of query string over the network.
parsing of query string, producing a parse-tree
linking the referenced objects in the parse tree to existing objects
optimizing based on statistics and row count/size estimates
executing
transmitting of result data over the network
Let's look at each one:
The * query is a few bytes shorter, so step this will be faster
The * query contains fewer "tokens" so this should(!) be faster
During linking the list of columns need to be puled and compared to the query string. Here the "*" gets resolved to the actual column reference. Without access to the code it is impossible to say which version takes less cycles, however the amount of data accessed is about the same so this should be similar.
-6. In these stages there is no difference between the two example queries, as they will both get compiled to the same execution plan.
Taking all this into account, you will probably save a few nanoseconds when using the * notation. However, you example is very simplistic. In a more complex example it is possible that specifying as subset of columns of a table in a multi table join will lead to a different plan than using a *. If that happens we can be pretty certain that the explicit query will be faster.
The above comparison also assumes that the SQL Server process is running alone on a single processor and no other queries are submitted at the same time. If the process has to yield during the compilation those extra cycles will be far more than the ones we are trying to save.
So, the amont of saving we are talking about is very minute compared to the actual execution time and should not be used as an excuse for a "bad" coding practice.
I hope this answers your question.

You should always reference columns explicitly. This way, if the table structure changes (and such changes are made in an intelligent, backward-compatible way), your queries will continue to work and can be modified over time.
Also, unless you actually need all of the columns from the table (not typical), using SELECT * is bringing more data to your application than is necessary, and potentially forcing a clustered index scan instead of what might have been satisfied by a narrower covering index.
Bad habits to kick : using SELECT * / omitting the column list

Performance wise there are no difference between those 2 i think.But those 2 are used in different cases what may be the difference.
Consider a slightly larger table.If your table(Employees) contains 10 columns,then the 1st query will retain all of the information of the table.But for 2nd query,you may specify which columns information you need.So when you need all of the information of employees no.1 is the best one rather than specifying all of the column names.
Ofcourse,when you need to ALTER a table then those 2 would not be equal.

SQL: Using Select * [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc.
Is it bad practice to use Select * ?
I was going through some old code and saw some 'SELECT *' statements. My previous coworker had told me Select * was bad practice, but I couldn't really see the reason why (unless of course I only needed to return a few fields). But for full 'detail retrieves' (Get by Id type queries) Select * seems right.

It's bad practice.
If your schema changes down the road, the calling application may get more fields than it knows what to do with.
Also, you are getting more info than you need, which affects performance.
Also also, it implies you don't know what the columns are.

Using SELECT * is bad practice for two reasons:
It can return extra columns that you don't need, wasting bandwidth
It can break your code if someone adds a column

Yes, Select * is a bad practice. For one, it is not clear to other developers which columns you really are using. Are you actually using all of them? What happens when you add columns are you using those too? That makes it much more difficult to refactor column names should that need arise. Second, there are some instances where some database systems will remember which columns existed at the time you created an object. For example, if you create a stored procedure with Select *, it will bake in the columns that exist in the table at the time it is compiled. If the table changes, it make not reflect those changes in the stored procedure. There really isn't any reason to use Select * beyond laziness.

Yes, it is deemed bad practice.
It is better to specify an explicit column list, especially if the table contains many columns and you only really need some of them.

If any schema changes occur (extra columns are added), these will be caught by your application. This might be undesirable, say, if you bind a grid dynamically to a DataTable. Also it incurs more overhead on network communications.
Even if you are selecting all columns as of today, define the columns by name - its readable and explicit. Any additional columns will then not cause any problems with your code.

When you use SELECT *, you choose to trade immediate productivity (writing a query faster) for potential maintenance productivity (should your underlying query change and thus break dependent code/queries). The "bad-ness" of the practice is a risk management activity.

Even if you need to select all columns it is still better to specify them rather then use 'select *'

Is there a performance difference between select * from tablename and select column1, column2 from tablename? [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicates:
Select * vs Specifying Column Names
Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc.
Is there a performance difference between select * from tablename and select column1, column2 from tablename?
When it is select * from, the database pulls out all fields/columns which are more than 2 fields/columns. So does the first query cost more time/resources?

If you do select * from, there are two performance issues:
the database has to determine which columns exist in the table
there is more data sent from the server to the client (all columns instead of only two)

The answer is YES in general! Because for small databases you don't see a performance difference ... but with biggest databases could be relevant differences if you use the unqualified * selector as shorthand!
In general, it's better to instantiate each column from which you want to retrieve data!
I can suggest you to read the official document about how to optimize SELECT and other statements!

In each case you always should test your changes. You could use a profiler to do that.
For mysql see : http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html

there is difference, specially in case when the other columns are BLOB or (big) TEXT fields. in case, your table contains just the two columns, there is no difference.

I've checked with the profiler - and it seems that the answer is no - both queries took the same time to execute.
The table had relatively small fields so the result set wasn't bloated with large quantities of data I potentially wouldn't need.
If you have large data fields you don't need in your result-set don't include them in your query

And while this may not make a big difference with one query run one time, if you use select * through the application, and especially if you use it when you are doing joins where you are definitely returning unneeded data, you are clearly slowing down the system for no good reason other than developer laziness. Select * should almost never be used on a production system.

Well, I guess it would depend on which type of performance you're talking about: Database or Programmer.
Speaking as someone who has had to clean up database queries that were written in the
select * from foo
format, it's a total nightmare. The database can and does change, so now the neat and tidy normalized database has become denormalized for performance reasons, and the host of sloppy queries written to grab everything are now slogging tons more data back.
If you don't need it, don't ask for it. Take a moment, think of the people who will follow after you and need to deal with your choices.
I've still got an entire section in our LIMS to uncluster, thanks for reminding me. -- Sigh

SQL query - Select * from view or Select col1, col2, ... colN from view [duplicate]

This question already has answers here:
What is the reason not to use select *?
(20 answers)
Closed 3 years ago.
We are using SQL Server 2005, but this question can be for any RDBMS.
Which of the following is more efficient, when selecting all columns from a view?
Select * from view
or
Select col1, col2, ..., colN from view

NEVER, EVER USE "SELECT *"!!!!
This is the cardinal rule of query design!
There are multiple reasons for this. One of which is, that if your table only has three fields on it and you use all three fields in the code that calls the query, there's a great possibility that you will be adding more fields to that table as the application grows, and if your select * query was only meant to return those 3 fields for the calling code, then you're pulling much more data from the database than you need.
Another reason is performance. In query design, don't think about reusability as much as this mantra:
TAKE ALL YOU CAN EAT, BUT EAT ALL YOU TAKE.

It is best practice to select each column by name. In the future your DB schema might change to add columns that you would then not need for a particular query. I would recommend selecting each column by name.

Just to clarify a point that several people have already made, the reason Select * is inefficient is because there has to be an initial call to the DB to find out exactly what fields are available, and then a second call where the query is made using explicit columns.
Feel free to use Select * when you are debugging, running casual queries or are in the early stages of developing a query, but as soon as you know your required columns, state them explicitly.

Select * is a poor programming practice. It is as likely to cause things to break as it is to save things from breaking. If you are only querying one table or view, then the efficiency gain may not be there (although it is possible if you are not intending to actually use every field). If you have an inner join, then you have at least two fields returning the same data (the join fields) and thus you are wasting network resources to send redundant data back to the application. You won't notice this at first, but as the result sets get larger and larger, you will soon have a network pipeline that is full and doesn't need to be. I can think of no instance where select * gains you anything. If a new column is added and you don't need to go to the code to do something with it, then the column shouldn't be returned by your query by definition. If someone drops and recreates the table with the columns in a different order, then all your queries will have information displaying wrong or will be giving bad results, such as putting the price into the part number field in a new record.
Plus it is quick to drag the column names over from the object browser, so that is just pure laziness not efficiency in coding.

It depends. Inheritance of views can be a handy thing and easy to maintain (SQL Anywhere):
create view v_fruit as select F.id, S.strain from F key join S;
create view v_apples as select v_fruit.*, C.colour from v_fruit key join C;

If you're really selecting all columns, it shouldn't make any noticeable difference whether you ask for * or if you are explicit. The SQL server will parse the request the same way in pretty much the same amount of time.

Always do select col1, col2 etc from view. There's no efficieny difference between the two methods that I know of, but using "select *" can be dangerous. If you modify your view definition adding new columns, you can break a program using "select *", whereas selecting a predefined set of columns (even all of them, named), will still work.

I guess it all depends on what the query optimizer does.
If I want to get every record in the row, I will generally use the "SELECT *..." option, since I then don't have to worry should I change the underlying table structure. As well, for someone maintaining the code, seeing "SELECT *" tells them that this query is intended to return every column, whereas listing the columns individually does not convey the same intention.

For performance - look at the query plan (should be no difference).
For maintainability. - always supply a fieldlist (that goes for INSERT INTO too).

select
column1
,column2
,column3
.
.
.
from Your-View
this one is more optimizer than Using the
select *
from Your View

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas