Delphi: How to get table structure into object - sql

Using Delphi I need to create a class which will contain specific table structure (without data), including all fields, constraints, foreign keys, indexes. The goal is having "standard" table, compare them and find differences. This thing ought to be included into my big project so I can't use any "outer" comparators. Furthermore, this functionality might be extended, so I need to have my own realization. The question is how can I retrieve this information, having connection string and knowing specific table name. SQL Server 2008 is being used.

If you look at Delphi source, it's done this way :
Select * from Table where 1=2
Update:
Metadata can be retrieved with Information Schema Views , for example constraints :
SELECT * FROM databaseName.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE
Where TABLE_NAME='tableName'

Been a very long time since I touched Delphi, but I do remember a couple things I used to do. Like
select top 0 * from table
returns 0 records but the TQuery is 'populated' with the meta data. Or I think on a TClientDataSet you can set the rows to -1, which has the same effect.
As I said, it's been a long time since I tinkered in Delphi and I was using the BDE and not native clients so this could all be useless info.
Hope this helps a bit though.

Related

Multiple aliases for a Sybase table?

I am working on a team trying to phase out a legacy system. As this is a rather large system with multiple integrations, the database will live on, even after the legacy system is replaced.
Now, the problem is that all the table names in the database have numerical names: "RT001", "RT002", "RT003", etc. With well over 100 tables it gets harder to know what each table is, and how can be joined to get hold of specific data.
Is there a way to define a global table alias in sybase so that sybase knows that the SQL"select * from Order, OrderItems where ..." is referring to tables RT035 and RT036 ? This way I can keep the original table names as RT035, while having an alias like "Order", or even "RT035_Order" refer to it.
As far as I know there is no such thing as a "synonym" (that's what it's called in an Oracle database) in Sybase ASE. But you could still use simple view to do basically the same thing:
CREATE VIEW Order AS SELECT * FROM RT035;

What is wrong with using SELECT * FROM sometable [duplicate]

I've heard that SELECT * is generally bad practice to use when writing SQL commands because it is more efficient to SELECT columns you specifically need.
If I need to SELECT every column in a table, should I use
SELECT * FROM TABLE
or
SELECT column1, colum2, column3, etc. FROM TABLE
Does the efficiency really matter in this case? I'd think SELECT * would be more optimal internally if you really need all of the data, but I'm saying this with no real understanding of database.
I'm curious to know what the best practice is in this case.
UPDATE: I probably should specify that the only situation where I would really want to do a SELECT * is when I'm selecting data from one table where I know all columns will always need to be retrieved, even when new columns are added.
Given the responses I've seen however, this still seems like a bad idea and SELECT * should never be used for a lot more technical reasons that I ever though about.
One reason that selecting specific columns is better is that it raises the probability that SQL Server can access the data from indexes rather than querying the table data.
Here's a post I wrote about it: The real reason select queries are bad index coverage
It's also less fragile to change, since any code that consumes the data will be getting the same data structure regardless of changes you make to the table schema in the future.
Given your specification that you are selecting all columns, there is little difference at this time. Realize, however, that database schemas do change. If you use SELECT * you are going to get any new columns added to the table, even though in all likelihood, your code is not prepared to use or present that new data. This means that you are exposing your system to unexpected performance and functionality changes.
You may be willing to dismiss this as a minor cost, but realize that columns that you don't need still must be:
Read from database
Sent across the network
Marshalled into your process
(for ADO-type technologies) Saved in a data-table in-memory
Ignored and discarded / garbage-collected
Item #1 has many hidden costs including eliminating some potential covering index, causing data-page loads (and server cache thrashing), incurring row / page / table locks that might be otherwise avoided.
Balance this against the potential savings of specifying the columns versus an * and the only potential savings are:
Programmer doesn't need to revisit the SQL to add columns
The network-transport of the SQL is smaller / faster
SQL Server query parse / validation time
SQL Server query plan cache
For item 1, the reality is that you're going to add / change code to use any new column you might add anyway, so it is a wash.
For item 2, the difference is rarely enough to push you into a different packet-size or number of network packets. If you get to the point where SQL statement transmission time is the predominant issue, you probably need to reduce the rate of statements first.
For item 3, there is NO savings as the expansion of the * has to happen anyway, which means consulting the table(s) schema anyway. Realistically, listing the columns will incur the same cost because they have to be validated against the schema. In other words this is a complete wash.
For item 4, when you specify specific columns, your query plan cache could get larger but only if you are dealing with different sets of columns (which is not what you've specified). In this case, you do want different cache entries because you want different plans as needed.
So, this all comes down, because of the way you specified the question, to the issue resiliency in the face of eventual schema modifications. If you're burning this schema into ROM (it happens), then an * is perfectly acceptable.
However, my general guideline is that you should only select the columns you need, which means that sometimes it will look like you are asking for all of them, but DBAs and schema evolution mean that some new columns might appear that could greatly affect the query.
My advice is that you should ALWAYS SELECT specific columns. Remember that you get good at what you do over and over, so just get in the habit of doing it right.
If you are wondering why a schema might change without code changing, think in terms of audit logging, effective/expiration dates and other similar things that get added by DBAs for systemically for compliance issues. Another source of underhanded changes is denormalizations for performance elsewhere in the system or user-defined fields.
You should only select the columns that you need. Even if you need all columns it's still better to list column names so that the sql server does not have to query system table for columns.
Also, your application might break if someone adds columns to the table. Your program will get columns it didn't expect too and it might not know how to process them.
Apart from this if the table has a binary column then the query will be much more slower and use more network resources.
There are four big reasons that select * is a bad thing:
The most significant practical reason is that it forces the user to magically know the order in which columns will be returned. It's better to be explicit, which also protects you against the table changing, which segues nicely into...
If a column name you're using changes, it's better to catch it early (at the point of the SQL call) rather than when you're trying to use the column that no longer exists (or has had its name changed, etc.)
Listing the column names makes your code far more self-documented, and so probably more readable.
If you're transferring over a network (or even if you aren't), columns you don't need are just waste.
Specifying the column list is usually the best option because your application won't be affected if someone adds/inserts a column to the table.
Specifying column names is definitely faster - for the server. But if
performance is not a big issue (for example, this is a website content database with hundreds, maybe thousands - but not millions - of rows in each table); AND
your job is to create many small, similar applications (e.g. public-facing content-managed websites) using a common framework, rather than creating a complex one-off application; AND
flexibility is important (lots of customization of the db schema for each site);
then you're better off sticking with SELECT *. In our framework, heavy use of SELECT * allows us to introduce a new website managed content field to a table, giving it all of the benefits of the CMS (versioning, workflow/approvals, etc.), while only touching the code at a couple of points, instead of a couple dozen points.
I know the DB gurus are going to hate me for this - go ahead, vote me down - but in my world, developer time is scarce and CPU cycles are abundant, so I adjust accordingly what I conserve and what I waste.
SELECT * is a bad practice even if the query is not sent over a network.
Selecting more data than you need makes the query less efficient - the server has to read and transfer extra data, so it takes time and creates unnecessary load on the system (not only the network, as others mentioned, but also disk, CPU etc.). Additionally, the server is unable to optimize the query as well as it might (for example, use covering index for the query).
After some time your table structure might change, so SELECT * will return a different set of columns. So, your application might get a dataset of unexpected structure and break somewhere downstream. Explicitly stating the columns guarantees that you either get a dataset of known structure, or get a clear error on the database level (like 'column not found').
Of course, all this doesn't matter much for a small and simple system.
Lots of good reasons answered here so far, here's another one that hasn't been mentioned.
Explicitly naming the columns will help you with maintenance down the road. At some point you're going to be making changes or troubleshooting, and find yourself asking "where the heck is that column used".
If you've got the names listed explicitly, then finding every reference to that column -- through all your stored procedures, views, etc -- is simple. Just dump a CREATE script for your DB schema, and text search through it.
Performance wise, SELECT with specific columns can be faster (no need to read in all the data). If your query really does use ALL the columns, SELECT with explicit parameters is still preferred. Any speed difference will be basically unnoticeable and near constant-time. One day your schema will change, and this is good insurance to prevent problems due to this.
definitely defining the columns, because SQL Server will not have to do a lookup on the columns to pull them. If you define the columns, then SQL can skip that step.
It's always better to specify the columns you need, if you think about it one time, SQL doesn't have to think "wtf is *" every time you query. On top of that, someone later may add columns to the table that you actually do not need in your query and you'll be better off in that case by specifying all of your columns.
The problem with "select *" is the possibility of bringing data you don't really need. During the actual database query, the selected columns don't really add to the computation. What's really "heavy" is the data transport back to your client, and any column that you don't really need is just wasting network bandwidth and adding to the time you're waiting for you query to return.
Even if you do use all the columns brought from a "select *...", that's just for now. If in the future you change the table/view layout and add more columns, you'll start bring those in your selects even if you don't need them.
Another point in which a "select *" statement is bad is on view creation. If you create a view using "select *" and later add columns to your table, the view definition and the data returned won't match, and you'll need to recompile your views in order for them to work again.
I know that writing a "select *" is tempting, 'cause I really don't like to manually specify all the fields on my queries, but when your system start to evolve, you'll see that it's worth to spend this extra time/effort in specifying the fields rather than spending much more time and effort removing bugs on your views or optimizing your app.
While explicitly listing columns is good for performance, don't get crazy.
So if you use all the data, try SELECT * for simplicity (imagine having many columns and doing a JOIN... query may get awful). Then - measure. Compare with query with column names listed explicitly.
Don't speculate about performance, measure it!
Explicit listing helps most when you have some column containing big data (like body of a post or article), and don't need it in given query. Then by not returning it in your answer DB server can save time, bandwidth, and disk throughput. Your query result will also be smaller, which is good for any query cache.
You should really be selecting only the fields you need, and only the required number, i.e.
SELECT Field1, Field2 FROM SomeTable WHERE --(constraints)
Outside of the database, dynamic queries run the risk of injection attacks and malformed data. Typically you get round this using stored procedures or parameterised queries. Also (although not really that much of a problem) the server has to generate an execution plan each time a dynamic query is executed.
It is NOT faster to use explicit field names versus *, if and only if, you need to get the data for all fields.
Your client software shouldn't depend on the order of the fields returned, so that's a nonsense too.
And it's possible (though unlikely) that you need to get all fields using * because you don't yet know what fields exist (think very dynamic database structure).
Another disadvantage of using explicit field names is that if there are many of them and they're long then it makes reading the code and/or the query log more difficult.
So the rule should be: if you need all the fields, use *, if you need only a subset, name them explicitly.
The result is too huge. It is slow to generate and send the result from the SQL engine to the client.
The client side, being a generic programming environment, is not and should not be designed to filter and process the results (e.g. the WHERE clause, ORDER clause), as the number of rows can be huge (e.g. tens of millions of rows).
Naming each column you expect to get in your application also ensures your application won't break if someone alters the table, as long as your columns are still present (in any order).
Performance wise I have seen comments that both are equal. but usability aspect there are some +'s and -'s
When you use a (select *) in a query and if some one alter the table and add new fields which do not need for the previous query it is an unnecessary overhead. And what if the newly added field is a blob or an image field??? your query response time is going to be really slow then.
In other hand if you use a (select col1,col2,..) and if the table get altered and added new fields and if those fields are needed in the result set, you always need to edit your select query after table alteration.
But I suggest always to use select col1,col2,... in your queries and alter the query if the table get altered later...
This is an old post, but still valid. For reference, I have a very complicated query consisting of:
12 tables
6 Left joins
9 inner joins
108 total columns on all 12 tables
I only need 54 columns
A 4 column Order By clause
When I execute the query using Select *, it takes an average of 2869ms.
When I execute the query using Select , it takes an average of 1513ms.
Total rows returned is 13,949.
There is no doubt selecting column names means faster performance over Select *
Select is equally efficient (in terms of velocity) if you use * or columns.
The difference is about memory, not velocity. When you select several columns SQL Server must allocate memory space to serve you the query, including all data for all the columns that you've requested, even if you're only using one of them.
What does matter in terms of performance is the excecution plan which in turn depends heavily on your WHERE clause and the number of JOIN, OUTER JOIN, etc ...
For your question just use SELECT *. If you need all the columns there's no performance difference.
It depends on the version of your DB server, but modern versions of SQL can cache the plan either way. I'd say go with whatever is most maintainable with your data access code.
One reason it's better practice to spell out exactly which columns you want is because of possible future changes in the table structure.
If you are reading in data manually using an index based approach to populate a data structure with the results of your query, then in the future when you add/remove a column you will have headaches trying to figure out what went wrong.
As to what is faster, I'll defer to others for their expertise.
As with most problems, it depends on what you want to achieve. If you want to create a db grid that will allow all columns in any table, then "Select *" is the answer. However, if you will only need certain columns and adding or deleting columns from the query is done infrequently, then specify them individually.
It also depends on the amount of data you want to transfer from the server. If one of the columns is a defined as memo, graphic, blob, etc. and you don't need that column, you'd better not use "Select *" or you'll get a whole bunch of data you don't want and your performance could suffer.
To add on to what everyone else has said, if all of your columns that you are selecting are included in an index, your result set will be pulled from the index instead of looking up additional data from SQL.
SELECT * is necessary if one wants to obtain metadata such as the number of columns.
Gonna get slammed for this, but I do a select * because almost all my data is retrived from SQL Server Views that precombine needed values from multiple tables into a single easy to access View.
I do then want all the columns from the view which won't change when new fields are added to underlying tables. This has the added benefit of allowing me to change where data comes from. FieldA in the View may at one time be calculated and then I may change it to be static. Either way the View supplies FieldA to me.
The beauty of this is that it allows my data layer to get datasets. It then passes them to my BL which can then create objects from them. My main app only knows and interacts with the objects. I even allow my objects to self-create when passed a datarow.
Of course, I'm the only developer, so that helps too :)
What everyone above said, plus:
If you're striving for readable maintainable code, doing something like:
SELECT foo, bar FROM widgets;
is instantly readable and shows intent. If you make that call you know what you're getting back. If widgets only has foo and bar columns, then selecting * means you still have to think about what you're getting back, confirm the order is mapped correctly, etc. However, if widgets has more columns but you're only interested in foo and bar, then your code gets messy when you query for a wildcard and then only use some of what's returned.
And remember if you have an inner join by definition you do not need all the columns as the data in the join columns is repeated.
It's not like listing columns in SQl server is hard or even time-consuming. You just drag them over from the object browser (you can get all in one go by dragging from the word columns). To put a permanent performance hit on your system (becasue this can reduce the use of indexes and becasue sending unneeded data over the network is costly) and make it more likely that you will have unexpected problems as the database changes (sometimes columns get added that you do not want the user to see for instance) just to save less than a minute of development time is short-sighted and unprofessional.
Absolutely define the columns you want to SELECT every time. There is no reason not to and the performance improvement is well worth it.
They should never have given the option to "SELECT *"
If you need every column then just use SELECT * but remember that the order could potentially change so when you are consuming the results access them by name and not by index.
I would ignore comments about how * needs to go get the list - chances are parsing and validating named columns is equal to the processing time if not more. Don't prematurely optimize ;-)

Can select * usage ever be justified?

I've always preached to my developers that SELECT * is evil and should be avoided like the plague.
Are there any cases where it can be justified?
I'm not talking about COUNT(*) - which most optimizers can figure out.
Edit
I'm talking about production code.
And one great example I saw of this bad practice was a legacy asp application that used select * in a stored procedure, and used ADO to loop through the returned records, but got the columns by index. You can imagine what happened when a new field was added somewhere other than the end of the field list.
I'm quite happy using * in audit triggers.
In that case it can actually prove a benefit because it will ensure that if additional columns are added to the base table it will raise an error so it cannot be forgotten to deal with this in the audit trigger and/or audit table structure.
(Like dotjoe) I am also happy using it in derived tables and column table expressions. Though I habitually do it the other way round.
WITH t
AS (SELECT *,
ROW_NUMBER() OVER (ORDER BY a) AS RN
FROM foo)
SELECT a,
b,
c,
RN
FROM t;
I'm mostly familiar with SQL Server and there at least the optimiser has no problem recognising that only columns a,b,c will be required and the use of * in the inner table expression does not cause any unnecessary overhead retrieving and discarding unneeded columns.
In principle SELECT * ought to be fine in a view as well as it is the final SELECT from the view where it ought to be avoided however in SQL Server this can cause problems as it stores column metadata for views which is not automatically updated when the underlying tables change and the use of * can lead to confusing and incorrect results unless sp_refreshview is run to update this metadata.
There are many scenarios where SELECT * is the optimal solution. Running ad-hoc queries in Management Studio just to get a sense of the data you're working with. Querying tables where you don't know the column names yet because it's the first time you've worked with a new schema. Building disposable quick'n'dirty tools to do a one-time migration or data export.
I'd agree that in "proper" development, you should avoid it - but there's lots of scenarios where "proper" development isn't necessarily the optimum solution to a business problem. Rules and best practices are great, as long as you know when to break them. :)
I'll use it in production when working with CTEs. But, in this case it's not really select *, because I already specified the columns in the CTE. I just don't want to respecify in the final select.
with t as (
select a, b, c from foo
)
select t.* from t;
None that I can think of, if you are talking about live code.
People saying that it makes adding columns easier to develop (so they automatically get returned and can be used without changing the Stored procedure) have no idea about writing optimal code/sql.
I only ever use it when writing ad-hoc queries that will not get reused (finding out the structure of a table, getting some data when I am not sure what the column names are).
I think using select * in an exists clause is appropriate:
select some_field from some_table
where exists
(select * from related_table [join condition...])
Some people like to use select 1 in this case, but it's not elegant, and it doesn't buy any performance improvements (early optimization strikes again).
In production code, I'd tend to agree 100% with you.
However, I think that the * more than justifies its existence when performing ad-hoc queries.
You've gotten a number of answers to your question, but you seem to be dismissing everything that isn't parroting back what you want to hear. Still, here it is for the third (so far) time: sometimes there is no bottleneck. Sometimes performance is way better than fine. Sometimes the tables are in flux, and amending every SELECT query is just one more bit of possible inconsistency to manage. Sometimes you've got to deliver on an impossible schedule and this is the last thing you need to think about.
If you live in bullet time, sure, type in all the column names. But why stop there? Re-write your app in a schema-less dbms. Hell, write your own dbms in assembly. That'd really show 'em.
And remember if you use select * and you have a join at least one field will be sent twice (the join field). This wastes database resources and network resources for no reason.
As a tool I use it to quickly refresh my memory as to what I can possibly get back from a query. As a production level query itself .. no way.
When creating an application that deals with the database, like phpmyadmin, and you are in a page where to display a full table, in that case using SELECT * can be justified, I guess.
About the only thing that I can think of would be when developing a utility or SQL tool application that is being written to run against any database. Even here though, I would tend to query the system tables to get the table structure and then build any necessary query from that.
There was one recent place where my team used SELECT * and I think that it was ok... we have a database that exists as a facade against another database (call it DB_Data), so it is primarily made up of views against the tables in the other database. When we generate the views we actually generate the column lists, but there is one set of views in the DB_Data database that are automatically generated as rows are added to a generic look-up table (this design was in place before I got here). We wrote a DDL trigger so that when a view is created in DB_Data by this process then another view is automatically created in the facade. Since the view is always generated to exactly match the view in DB_Data and is always refreshed and kept in sync, we just used SELECT * for simplicity.
I wouldn't be surprised if most developers went their entire career without having a legitimate use for SELECT * in production code though.
I've used select * to query tables optimized for reading (denormalized, flat data). Very advantageous since the purpose of the tables were simply to support various views in the application.
How else do the developers of phpmyadmin ensure they are displaying all the fields of your DB tables?
It is conceivable you'd want to design your DB and application so that you can add a column to a table without needing to rewrite your application. If your application at least checks column names it can safely use SELECT * and treat additional columns with some appropriate default action. Sure the app could consult system catalogs (or app-specific catalogs) for column information, but in some circumstances SELECT * is syntactic sugar for doing that.
There are obvious risks to this, however, and adding the required logic to the app to make it reliable could well simply mean replicating the DB's query checks in a less suitable medium. I am not going to speculate on how the costs and benefits trade off in real life.
In practice, I stick to SELECT * for 3 cases (some mentioned in other answers:
As an ad-hoc query, entered in a SQL GUI or command line.
As the contents of an EXISTS predicate.
In an application that dealt with generic tables without needing to know what they mean (e.g. a dumper, or differ).
Yes, but only in situations where the intention is to actually get all the columns from a table not because you want all the columns that a table currently has.
For example, in one system that I worked on we had UDFs (User Defined Fields) where the user could pick the fields they wanted on the report, the order as well as filtering. When building a result set it made more sense to simply "select *" from the temporary tables that I was building instead of having to keep track of which columns were active.
I have several times needed to display data from a table whose column names were unknown. So I did SELECT * and got the column names at run time.
I was handed a legacy app where a table had 200 columns and a view had 300. The risk exposure from SELECT * would have been no worse than from listing all 300 columns explicitly.
Depends on the context of the production software.
If you are writing a simple data access layer for a table management tool where the user will be selecting tables and viewing results in a grid, then it would seem *SELECT ** is fine.
In other words, if you choose to handle "selection of fields" through some other means (as in automatic or user-specified filters after retrieving the resultset) then it seems just fine.
If on the other hand we are talking about some sort of enterprise software with business rules, a defined schema, etc. ... then I agree that *SELECT ** is a bad idea.
EDIT: Oh and when the source table is a stored procedure for a trigger or view, "*SELECT **" should be fine because you're managing the resultset through other means (the view's definition or the stored proc's resultset).
Select * in production code is justifiable any time that:
it isn't a performance bottleneck
development time is critical
Why would I want the overhead of going back and having to worry about changing the relevant stored procedures, every time I add a field to the table?
Why would I even want to have to think about whether or not I've selected the right fields, when the vast majority of the time I want most of them anyway, and the vast majority of the few times I don't, something else is the bottleneck?
If I have a specific performance issue then I'll go back and fix that. Otherwise in my environment, it's just premature (and expensive) optimisation that I can do without.
Edit.. following the discussion, I guess I'd add to this:
... and where people haven't done other undesirable things like tried to access columns(i), which could break in other situations anyway :)
I know I'm very late to the party but I'll chip in that I use select * whenever I know that I'll always want all columns regardless of the column names. This may be a rather fringe case but in data warehousing, I might want to stage an entire table from a 3rd party app. My standard process for this is to drop the staging table and run
select *
into staging.aTable
from remotedb.dbo.aTable
Yes, if the schema on the remote table changes, downstream dependencies may throw errors but that's going to happen regardless.
If you want to find all the columns and want order, you can do the following (at least if you use MySQL):
SHOW COLUMNS FROM mytable FROM mydb; (1)
You can see every relevant information about all your fields. You can prevent problems with types and you can know for sure all the column names. This command is very quick, because you just ask for the structure of the table. From the results you will select all the name and will build a string like this:
"select " + fieldNames[0] + ", fieldNames[1]" + ", fieldNames[2] from mytable". (2)
If you don't want to run two separate MySQL commands because a MySQL command is expensive, you can include (1) and (2) into a stored procedure which will have the results as an OUT parameter, that way you will just call a stored procedure and every command and data generation will happen at the database server.

Why is using '*' to build a view bad?

Why is using '*' to build a view bad ?
Suppose that you have a complex join and all fields may be used somewhere.
Then you just have to chose fields needed.
SELECT field1, field2 FROM aview WHERE ...
The view "aview" could be SELECT table1.*, table2.* ... FROM table1 INNER JOIN table2 ...
We have a problem if 2 fields have the same name in table1 and table2.
Is this only the reason why using '*' in a view is bad?
With '*', you may use the view in a different context because the information is there.
What am I missing ?
Regards
I don't think there's much in software that is "just bad", but there's plenty of stuff that is misused in bad ways :-)
The example you give is a reason why * might not give you what you expect, and I think there are others. For example, if the underlying tables change, maybe columns are added or removed, a view that uses * will continue to be valid, but might break any applications that use it. If your view had named the columns explicitly then there was more chance that someone would spot the problem when making the schema change.
On the other hand, you might actually want your view to blithely
accept all changes to the underlying tables, in which case a * would
be just what you want.
Update: I don't know if the OP had a specific database vendor in mind, but it is now clear that my last remark does not hold true for all types. I am indebted to user12861 and Jonny Leeds for pointing this out, and sorry it's taken over 6 years for me to edit my answer.
Although many of the comments here are very good and reference one common problem of using wildcards in queries, such as causing errors or different results if the underlying tables change, another issue that hasn't been covered is optimization. A query that pulls every column of a table tends to not be quite as efficient as a query that pulls only those columns you actually need. Granted, there are those times when you need every column and it's a major PIA having to reference them all, especially in a large table, but if you only need a subset, why bog down your query with more columns than you need.
Another reason why "*" is risky, not only in views but in queries, is that columns can change name or change position in the underlying tables. Using a wildcard means that your view accommodates such changes easily without needing to be changed. But if your application references columns by position in the result set, or if you use a dynamic language that returns result sets keyed by column name, you could experience problems that are hard to debug.
I avoid using the wildcard at all times. That way if a column changes name, I get an error in the view or query immediately, and I know where to fix it. If a column changes position in the underlying table, specifying the order of the columns in the view or query compensates for this.
These other answers all have good points, but on SQL server at least they also have some wrong points. Try this:
create table temp (i int, j int)
go
create view vtemp as select * from temp
go
insert temp select 1, 1
go
alter table temp add k int
go
insert temp select 1, 1, 1
go
select * from vtemp
SQL Server doesn't learn about the "new" column when it is added. Depending on what you want this could be a good thing or a bad thing, but either way it's probably not good to depend on it. So avoiding it just seems like a good idea.
To me this weird behavior is the most compelling reason to avoid select * in views.
The comments have taught me that MySQL has similar behavior and Oracle does not (it will learn about changes to the table). This inconsistency to me is all the more reason not to use select * in views.
Using '*' for anything production is bad. It's great for one-off queries, but in production code you should always be as explicit as possible.
For views in particular, if the underlying tables have columns added or removed, the view will either be wrong or broken until it is recompiled.
Using SELECT * within the view does not incur much of a performance overhead if columns aren't used outside the view - the optimizer will optimize them out; SELECT * FROM TheView can perhaps waste bandwidth, just like any time you pull more columns across a network connection.
In fact, I have found that views which link almost all the columns from a number of huge tables in my datawarehouse have not introduced any performance issues at all, even through relatively few of those columns are requested from outside the view. The optimizer handles that well and is able to push the external filter criteria down into the view very well.
However, for all the reasons given above, I very rarely use SELECT *.
I have some business processes where a number of CTEs are built on top of each other, effectively building derived columns from derived columns from derived columns (which will hopefully one day being refactored as the business rationalizes and simplifies these calculations), and in that case, I need all the columns to drop through each time, and I use SELECT * - but SELECT * is not used at the base layer, only in between the first CTE and the last.
The situation on SQL Server is actually even worse than the answer by #user12861 implies: if you use SELECT * against multiple tables, adding columns to a table referenced early in the query will actually cause your view to return the values of the new columns under the guise of the old columns. See the example below:
-- create two tables
CREATE TABLE temp1 (ColumnA INT, ColumnB DATE, ColumnC DECIMAL(2,1))
CREATE TABLE temp2 (ColumnX INT, ColumnY DATE, ColumnZ DECIMAL(2,1))
GO
-- populate with dummy data
INSERT INTO temp1 (ColumnA, ColumnB, ColumnC) VALUES (1, '1/1/1900', 0.5)
INSERT INTO temp2 (ColumnX, ColumnY, ColumnZ) VALUES (1, '1/1/1900', 0.5)
GO
-- create a view with a pair of SELECT * statements
CREATE VIEW vwtemp AS
SELECT *
FROM temp1 INNER JOIN temp2 ON 1=1
GO
-- SELECT showing the columns properly assigned
SELECT * FROM vwTemp
GO
-- add a few columns to the first table referenced in the SELECT
ALTER TABLE temp1 ADD ColumnD varchar(1)
ALTER TABLE temp1 ADD ColumnE varchar(1)
ALTER TABLE temp1 ADD ColumnF varchar(1)
GO
-- populate those columns with dummy data
UPDATE temp1 SET ColumnD = 'D', ColumnE = 'E', ColumnF = 'F'
GO
-- notice that the original columns have the wrong data in them now, causing any datatype-specific queries (e.g., arithmetic, dateadd, etc.) to fail
SELECT *
FROM vwtemp
GO
-- clean up
DROP VIEW vwTemp
DROP TABLE temp2
DROP TABLE temp1
It's because you don't always need every variable, and also to make sure that you are thinking about what you specifically need.
There's no point getting all the hashed passwords out of the database when building a list of users on your site for instance, so a select * would be unproductive.
Once upon a time, I created a view against a table in another database (on the same server) with
Select * From dbname..tablename
Then one day, a column was added to the targetted table. The view started returning totally incorrect results until it was redeployed.
Totally incorrect : no rows.
This was on Sql Server 2000.
I speculate that this is because of syscolumns values that the view had captured, even though I used *.
A SQL query is basically a functional unit designed by a programmer for use in some context. For long-term stability and supportability (possibly by someone other than you) everything in a functional unit should be there for a purpose, and it should be reasonably evident (or documented) why it's there - especially every element of data.
If I were to come along two years from now with the need or desire to alter your query, I would expect to grok it pretty thoroughly before I would be confident that I could mess with it. Which means I would need to understand why all the columns are called out. (This is even more obviously true if you are trying to reuse the query in more than one context. Which is problematic in general, for similar reasons.) If I were to see columns in the output that I couldn't relate to some purpose, I'd be pretty sure that I didn't understand what it did, and why, and what the consequences would be of changing it.
It's generally a bad idea to use *. Some code certification engines mark this as a warning and advise you to explicitly refer only the necessary columns. The use of * can lead to performance louses as you might only need some columns and not all. But, on the other hand, there are some cases where the use of * is ideal. Imagine that, no matter what, using the example you provided, for this view (aview) you would always need all the columns in these tables. In the future, when a column is added, you wouldn't need to alter the view. This can be good or bad depending the case you are dealing with.
I think it depends on the language you are using. I prefer to use select * when the language or DB driver returns a dict(Python, Perl, etc.) or associative array(PHP) of the results. It makes your code alot easier to understand if you are referring to the columns by name instead of as an index in an array.
No one else seems to have mentioned it, but within SQL Server you can also set up your view with the schemabinding attribute.
This prevents modifications to any of the base tables (including dropping them) that would affect the view definition.
This may be useful to you for some situations. I realise that I haven't exactly answered your question, but thought I would highlight it nonetheless.
And if you have joins using select * automatically means you are returning more data than you need as the data in the join fields is repeated. This is wasteful of database and network resources.
If you are naive enough to use views that call other views, using select * can make them even worse performers (This is technique that is bad for performance on its own, calling mulitple columns you don't need makes it much worse).

Which are the SQL improvements you are waiting for?

Dealing with SQL shows us some limitations and gives us an opportunity to imagine what could be.
Which improvements to SQL are you waiting for? Which would you put on top of the wish list?
I think it can be nice if you post in your answer the database your feature request lacks.
T-SQL Specific: A decent way to select from a result set returned by a stored procedure that doesn't involve putting it into a temporary table or using some obscure function.
SELECT * FROM EXEC [master].[dbo].[xp_readerrorlog]
I know it's wildly unrealistic, but I wish they'd make the syntax of INSERT and UPDATE consistent. Talk about gratuitous non-orthogonality.
Operator to manage range of dates (or numbers):
where interval(date0, date1) intersects interval(date3, date4)
EDIT: Date or numbers, of course are the same.
EDIT 2: It seems Oracle have something to go, the undocumented OVERLAPS predicate. More info here.
A decent way of walking a tree with hierarchical data. Oracle has CONNECT BY but the simple and common structure of storing an object and a self-referential join back to the table for 'parent' is hard to query in a natural way.
More SQL Server than SQL but better integration with Source Control. Preferably SVN rather than VSS.
Implicit joins or what it should be called (That is, predefined views bound to the table definition)
SELECT CUSTOMERID, SUM(C.ORDERS.LINES.VALUE) FROM
CUSTOMER C
A redesign of the whole GROUP BY thing so that every expression in the SELECT clause doesn't have to be repeated in the GROUP BY clause
Some support for let expressions or otherwise more legal places to use an alias, a bit related to the GROUP BY thing, but I find other times what I just hate Oracle for forcing me to use an outer select just to reference a big expression by alias.
I would like to see the ability to use Regular Expressions in string handling.
A way of dynamically specifying columns/tables without having to resort to full dynamic sql that executes in another context.
Ability to define columns based on other columns ad infinitum (including disambiguation).
This is a contrived example and not a real world case, but I think you'll see where I'm going:
SELECT LTRIM(t1.a) AS [a.new]
,REPLICATE(' ', 20 - LEN([a.new])) + [a.new] AS [a.conformed]
,LEN([a.conformed]) as [a.length]
FROM t1
INNER JOIN TABLE t2
ON [a.new] = t2.a
ORDER BY [a.new]
instead of:
SELECT LTRIM(t1.a) AS [a.new]
,REPLICATE(' ', 20 - LEN(LTRIM(t1.a))) + LTRIM(t1.a) AS [a.conformed]
,LEN(REPLICATE(' ', 20 - LEN(LTRIM(t1.a))) + LTRIM(t1.a)) as [a.length]
FROM t1
INNER JOIN TABLE t2
ON LTRIM(t1.a) = t2.a
ORDER BY LTRIM(t1.a)
Right now, in SQL Server 2005 and up, I would use a CTE and build up in successive layers.
I'd like the vendors to actually standardise their SQL. They're all guilty of it. The LIMIT/OFFSET clause from MySQL and PostGresql is a good solution that no-one else appears to do. Oracle has it's own syntax for explicit JOINs whilst everyone else uses ANSI-92. MySQL should deprecate the CONCAT() function and use || like everyone else. And there are numerous clauses and statements that are outside the standard that could be wider spread. MySQL's REPLACE is a good example. There's more, with issues about casting and comparing types, quirks of column types, sequences, etc etc etc.
parameterized order by, as in:
select * from tableA order by #columName
Support in SQL to specify if you want your query plan to be optimized to return the first rows quickly, or all rows quickly.
Oracle has the concept of FIRST_ROWS hint, but a standard approach in the language would be useful.
Automatic denormalization.
But I may be dreaming.
Improved pivot tables. I'd like to tell it to automatically create the columns based on the keys found in the data.
On my wish list is a database supporting sub-queries in CHECK-constraints, without having to rely on materialized view tricks. And a database which supports the SQL standard's "assertions", i.e. constraints which may span more than one table.
Something else: A metadata-related function which would return the possible values of a given column, if the set of possible values is low. I.e., if a column has a foreign key to another column, it would return the existing values in the column being referred to. Of if the column has a CHECK-constraint like "CHECK foo IN(1,2,3)", it would return 1,2,3. This would make it easier to create GUI elements based on a table schema: If the function returned a list of two values, the programmer could decide that a radio button widget would be relevant - or if the function returned - e.g. - 10 values, the application showed a dropdown-widget instead. Etc.
UPSERT or MERGE in PostgreSQL. It's the one feature whose absence just boggles my mind. Postgres has everything else; why can't they get their act together and implement it, even in limited form?
Check constraints with subqueries, I mean something like:
CHECK ( 1 > (SELECT COUNT(*) FROM TABLE WHERE A = COLUMN))
These are all MS Sql Server/T-SQL specific:
"Natural" joins based on an existing Foreign Key relationship.
Easily use a stored proc result as a resultset
Some other loop construct besides while
Unique constraints across non NULL values
EXCEPT, IN, ALL clauses instead of LEFT|RIGHT JOIN WHERE x IS [NOT] NULL
Schema bound stored proc (to ease #2)
Relationships, schema bound views, etc. across multiple databases
WITH clause for other statements other than SELECT, it means for UPDATE and DELETE.
For instance:
WITH table as (
SELECT ...
)
DELETE from table2 where not exists (SELECT ...)
Something which I call REFERENCE JOIN. It joins two tables together by implicitly using the FOREIGN KEY...REFERENCES constraint between them.
A relational algebra DIVIDE operator. I hate always having to re-think how to do all elements of table a that are in all of given from table B.
http://www.tc.umn.edu/~hause011/code/SQLexample.txt
String Agregation on Group by (In Oracle is possible with this trick):
SELECT deptno, string_agg(ename) AS employees
FROM emp
GROUP BY deptno;
DEPTNO EMPLOYEES
---------- --------------------------------------------------
10 CLARK,KING,MILLER
20 SMITH,FORD,ADAMS,SCOTT,JONES
30 ALLEN,BLAKE,MARTIN,TURNER,JAMES,WARD
More OOP features:
stored procedures and user functions
CREATE PROCEDURE tablename.spname ( params ) AS ...
called via
EXECUTE spname
FROM tablename
WHERE conditions
ORDER BY
which implicitly passes a cursor or a current record to the SP. (similar to inserted and deleted pseudo-tables)
table definitions with inheritance
table definition as derived from base table, inheriting common columns etc
Btw, this is not necessarily real OOP, but only syntactic sugar on existing technology, but it would simplify development a lot.
Abstract tables and sub-classing
create abstract table person
(
id primary key,
name varchar(50)
);
create table concretePerson extends person
(
birth date,
death date
);
create table fictionalCharacter extends person
(
creator int references concretePerson.id
);
Increased temporal database support in Sql Server. Intervals, overlaps, etc.
Increased OVER support in Sql Server, including LAG, LEAD, and TOP.
Arrays
I'm not sure what's holding this back but lack of arrays lead to temp tables and related mess.
Some kind of UPGRADE table which allows to make changes on the table to be like the given:
CREATE OR UPGRADE TABLE
(
a VARCHAR,
---
)
My wish list (for SQLServer)
Ability to store/use multiple execution plans for a stored procedure concurrently and have the system automatically understand the best stored plan to use at each execution.
Currently theres one plan - if it is no longer optimal its used anyway or a brand new one is computed in its place.
Native UTF-8 storage
Database mirroring with more than one standby server and the ability to use a recovery model approaching 'simple' provided of course all servers are up and the transaction commits everywhere.
PCRE in replace functions
Some clever way of reusing fragments of large sql queries, stored match conditions, select conditions...etc. Similiar to functions but actually implemented more like preprocessor macros.
Comments for check constraints. With this feature, an application (or the database itself when raising an error) can query the metadata and retrieve that comment to show it to the user.
Automated dba notification in the case where the optimizer generates a plan different that the plan that that the query was tested with.
In other words, every query can be registered. At that time, the plan is saved. Later when the query is executed, if there is a change to the plan, the dba receives a notice, that something unexpected occurred.