Due to the rollout/architecture of our system, I would like to create a scalar UDF to use within a SELECT statement in a stored procedure
Because our various DEV/TEST/LIVE environments have different schema names, I would ideally like to name my UDF as dbo.MyFunctionName, HOWEVER, because my function contains logic that uses a SELECT statement to get values from a database table to return the answer which means I cannot use dbo but must put it under the schema of the database (ie: TEST) NOTE - this is right, right?
When calling the function from a SELECT statement, you must provide the two-part name, so I must use TEST.MyFunctionName
SELECT Name, Age, TEST.MyFunctionName(PersonID) FROM People
Is there a way to parameterise the schema name so that when I roll it out to different schema, I do not need to create the function under each one and/or amend how it is called? Ideally, I'm looking for something like
SELECT Name, Age, SCHEMA_NAME().MyFunctionName(PersonID) FROM People
but I'd like to NOT use Dynamic SQL if possible
Many thanks
Your question is really two questions:
Does my function need to have the same schema as the tables it's selecting from? The answer is no. dbo.MyFunction() can select from test.MyTable just fine. However, you may run into issues with ownership chaining -- different schemas mean explicit permissions must exist for the user to select from the table. See Books Online for details.
Can I invoke a function with a parameterized schema, without using dynamic SQL? The answer is no. An object name is static -- although SQL does have a "feature" in the form of deferred name resolution that allows you to refer to objects in stored procedures even if they do not exist, this still doesn't allow you to make the schema name variable.
I would more so highly recommend changing the architecture to be different Servers (ideally--to keep instance/database/schema names the same), else difference Instances (less ideal, but keeps database/schema names the same), or at the very least different Databases (even less ideal, but keeps the schema names the same). Just from a pure coding and testing perspective, you are co-mingling your test and production code and data. It is way to easy to make a mistake and cross schemas for data. And why force production to suffer from performance issues in dev? You can't even backup/restore dev separately. Even if these different schemas are in different databases or servers, from a QA perspective, the code that you are testing in "dev" is not the code being deployed to production.
HOWEVER, you can accomplish this without much trickery by making use of different Users. Each user has a default schema. If you had three users: DevUser, TestUser, and LiveUser, each with their respective default schemas, then you would simply not use schema names when referencing objects. SQL Server will check the default schema first. This way, each user would get their respective objects.
Related
I asked a previous question about shortening the table names in SQL Server, and came across this article: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2012/dn148262(v=msdn.10), which suggests using the fully scoped table name, such as [dbo].[Title] instead of just Title.
What is the reason to use the fully-scoped table name? I've never come across a place where there have been multiple identical table names in a database (actually I didn't even think it's possible?), so for me that seems like it's unnecessary, but I'm very new to SQL Server, so was wondering why this is the preferred way to do it.
From the documentation that you gave link to:
"For ad-hoc queries and prepared statements, query reuse will not occur if object name resolution needs to occur. Queries that are contained within stored procedures, functions, and triggers do not have this limitation.
However, in general it should be noted that the use of two-part object names (that is, schema.object) provides more opportunities for plan reuse and should be encouraged."
query reuse will not occur if object name resolution needs to occur
Is it possible with Netezza queries to include sql files (which contain specific sql code) or it is not the right way of usage ?
Here is an example.
I have some common sql code (lets say common.sql) which creates some temp table and needs to be used across multiple other queries (lets say analysis1.sql, analysis2.sql etc.) - . From a code management perspective it is quite overwhelming to maintain if the code in common.sql is repeated across the many other queries. Is there a DRY way to do this - something like #include <common.sql> from the other queries to call the reused code common.sql ?
Including sql files is not the right way to do it. If you wish to persist with this you could use a preprocessor like cpp or even php to assemble the files for you and have a build process to generate finished ones.
However from a maintainability perspective you are better off creating views and functions for reusable content. Note that this can pose optimization barriers so large queries are often the way to go.
I agree, views, functions (table values if needed) or more likely: stored procedures is the way to go.
We have had a lot of luck letting stored procedures generate complex but repeatable code patterns on the fly based on input parameters and metadata on the tables being processed.
An example: all tables has a 'unique constraint' (which is not really unique, but that doesn't matter since it isn't enforced in Netezza) that has a fixed name of UBK_[tablename]
UBK is used as a 'signal' to the stored procedure identifying the columns of the BusinessKey for a classic 'kimball style' type 2 dimension table.
The SP can then apply the 'incoming' rows to the target table just by being supplied With the name of the target table and a 'stage' table containing all the same column names and data types.
Other examples could be a SP that takes a tablename and three arguments each with a 'string,of,columns' and does a 'excel style pivot' with group-by on the columns in the first argument and uses the second argument as to do a 'select distinct' to generate new column names for the pivoted columns, and does a 'sum' on the column in the third argument into some target table you specify the name for...
Can you follow me?
I think that the nzsql command line tool may be able to do an 'include', but a combination of strong 'building block stored procedures' and perl/python and/or an ETL tool will most likely proove a better choice
In an ad-hoc query using Select ColumnName is better, but does it matter in a Stored Procedure after it's saved in the plan guide?
Always explicitly state the columns, even in a stored procedure. SELECT * is considered bad practice.
For instance you don't know the column order that will be returned, some applications may be relying on a specific column order.
I.e. the application code may look something like:
Id = Column[0]; // bad design
If you've used SELECT * ID may no longer be the first column and cause the application to crash. Also, if the database is modified and an additional 5 fields have been added you are returning additional fields that may not be relevant.
These topics always elicit blanket statements like ALWAYS do this or NEVER do that, but the reality is, like with most things it depends on the situation. I'll concede that it's typically good practice to list out columns, but whether or not it's bad practice to use SELECT * depends on the situation.
Consider a variety of tables that all have a common field or two, for example we have a number of tables that have different layouts, but they all have 'access_dt' and 'host_ip'. These tables aren't typically used together, but there are instances when suspicious activity prompts a full report of all activity. These aren't common, and they are manually reviewed, as such, they are well served by a stored procedure that generates a report by looping through every log table and using SELECT * leveraging the common fields between all tables.
It would be a waste of time to list out fields in this situation.
Again, I agree that it's typically good practice to list out fields, but it's not always bad practice to use SELECT *.
Edit: Tried to clarify example a bit.
It's a best practice in general but if you actually do need all the column, you'd better use the quickly read "SELECT *".
The important thing is to avoid retreiving data you don't need.
It is considered bad practice in situations like stored procedures when you are querying large datasets with table scans. You want to avoid using table scans because it causes a hit to the performance of the query. It's also a matter of readability.
SOme other food for thought. If your query has any joins at all you are returning data you don't need because the data in the join columns is the same. Further if the table is later changed to add some things you don't need (such as columns for audit purposes) you may be returning data to the user that they should not be seeing.
Nobody has mentioned the case when you need ALL columns from a table, even if the columns change, e.g. when archiving table rows as XML. I agree one should not use "SELECT *" as a replacement for "I need all the columns that currently exist in the table," just out of laziness or for readability. There needs to be a valid reason. It could be essential when one needs "all the columns that could exist in the table."
Also, how about when creating "wrapper" views for tables?
I've always preached to my developers that SELECT * is evil and should be avoided like the plague.
Are there any cases where it can be justified?
I'm not talking about COUNT(*) - which most optimizers can figure out.
Edit
I'm talking about production code.
And one great example I saw of this bad practice was a legacy asp application that used select * in a stored procedure, and used ADO to loop through the returned records, but got the columns by index. You can imagine what happened when a new field was added somewhere other than the end of the field list.
I'm quite happy using * in audit triggers.
In that case it can actually prove a benefit because it will ensure that if additional columns are added to the base table it will raise an error so it cannot be forgotten to deal with this in the audit trigger and/or audit table structure.
(Like dotjoe) I am also happy using it in derived tables and column table expressions. Though I habitually do it the other way round.
WITH t
AS (SELECT *,
ROW_NUMBER() OVER (ORDER BY a) AS RN
FROM foo)
SELECT a,
b,
c,
RN
FROM t;
I'm mostly familiar with SQL Server and there at least the optimiser has no problem recognising that only columns a,b,c will be required and the use of * in the inner table expression does not cause any unnecessary overhead retrieving and discarding unneeded columns.
In principle SELECT * ought to be fine in a view as well as it is the final SELECT from the view where it ought to be avoided however in SQL Server this can cause problems as it stores column metadata for views which is not automatically updated when the underlying tables change and the use of * can lead to confusing and incorrect results unless sp_refreshview is run to update this metadata.
There are many scenarios where SELECT * is the optimal solution. Running ad-hoc queries in Management Studio just to get a sense of the data you're working with. Querying tables where you don't know the column names yet because it's the first time you've worked with a new schema. Building disposable quick'n'dirty tools to do a one-time migration or data export.
I'd agree that in "proper" development, you should avoid it - but there's lots of scenarios where "proper" development isn't necessarily the optimum solution to a business problem. Rules and best practices are great, as long as you know when to break them. :)
I'll use it in production when working with CTEs. But, in this case it's not really select *, because I already specified the columns in the CTE. I just don't want to respecify in the final select.
with t as (
select a, b, c from foo
)
select t.* from t;
None that I can think of, if you are talking about live code.
People saying that it makes adding columns easier to develop (so they automatically get returned and can be used without changing the Stored procedure) have no idea about writing optimal code/sql.
I only ever use it when writing ad-hoc queries that will not get reused (finding out the structure of a table, getting some data when I am not sure what the column names are).
I think using select * in an exists clause is appropriate:
select some_field from some_table
where exists
(select * from related_table [join condition...])
Some people like to use select 1 in this case, but it's not elegant, and it doesn't buy any performance improvements (early optimization strikes again).
In production code, I'd tend to agree 100% with you.
However, I think that the * more than justifies its existence when performing ad-hoc queries.
You've gotten a number of answers to your question, but you seem to be dismissing everything that isn't parroting back what you want to hear. Still, here it is for the third (so far) time: sometimes there is no bottleneck. Sometimes performance is way better than fine. Sometimes the tables are in flux, and amending every SELECT query is just one more bit of possible inconsistency to manage. Sometimes you've got to deliver on an impossible schedule and this is the last thing you need to think about.
If you live in bullet time, sure, type in all the column names. But why stop there? Re-write your app in a schema-less dbms. Hell, write your own dbms in assembly. That'd really show 'em.
And remember if you use select * and you have a join at least one field will be sent twice (the join field). This wastes database resources and network resources for no reason.
As a tool I use it to quickly refresh my memory as to what I can possibly get back from a query. As a production level query itself .. no way.
When creating an application that deals with the database, like phpmyadmin, and you are in a page where to display a full table, in that case using SELECT * can be justified, I guess.
About the only thing that I can think of would be when developing a utility or SQL tool application that is being written to run against any database. Even here though, I would tend to query the system tables to get the table structure and then build any necessary query from that.
There was one recent place where my team used SELECT * and I think that it was ok... we have a database that exists as a facade against another database (call it DB_Data), so it is primarily made up of views against the tables in the other database. When we generate the views we actually generate the column lists, but there is one set of views in the DB_Data database that are automatically generated as rows are added to a generic look-up table (this design was in place before I got here). We wrote a DDL trigger so that when a view is created in DB_Data by this process then another view is automatically created in the facade. Since the view is always generated to exactly match the view in DB_Data and is always refreshed and kept in sync, we just used SELECT * for simplicity.
I wouldn't be surprised if most developers went their entire career without having a legitimate use for SELECT * in production code though.
I've used select * to query tables optimized for reading (denormalized, flat data). Very advantageous since the purpose of the tables were simply to support various views in the application.
How else do the developers of phpmyadmin ensure they are displaying all the fields of your DB tables?
It is conceivable you'd want to design your DB and application so that you can add a column to a table without needing to rewrite your application. If your application at least checks column names it can safely use SELECT * and treat additional columns with some appropriate default action. Sure the app could consult system catalogs (or app-specific catalogs) for column information, but in some circumstances SELECT * is syntactic sugar for doing that.
There are obvious risks to this, however, and adding the required logic to the app to make it reliable could well simply mean replicating the DB's query checks in a less suitable medium. I am not going to speculate on how the costs and benefits trade off in real life.
In practice, I stick to SELECT * for 3 cases (some mentioned in other answers:
As an ad-hoc query, entered in a SQL GUI or command line.
As the contents of an EXISTS predicate.
In an application that dealt with generic tables without needing to know what they mean (e.g. a dumper, or differ).
Yes, but only in situations where the intention is to actually get all the columns from a table not because you want all the columns that a table currently has.
For example, in one system that I worked on we had UDFs (User Defined Fields) where the user could pick the fields they wanted on the report, the order as well as filtering. When building a result set it made more sense to simply "select *" from the temporary tables that I was building instead of having to keep track of which columns were active.
I have several times needed to display data from a table whose column names were unknown. So I did SELECT * and got the column names at run time.
I was handed a legacy app where a table had 200 columns and a view had 300. The risk exposure from SELECT * would have been no worse than from listing all 300 columns explicitly.
Depends on the context of the production software.
If you are writing a simple data access layer for a table management tool where the user will be selecting tables and viewing results in a grid, then it would seem *SELECT ** is fine.
In other words, if you choose to handle "selection of fields" through some other means (as in automatic or user-specified filters after retrieving the resultset) then it seems just fine.
If on the other hand we are talking about some sort of enterprise software with business rules, a defined schema, etc. ... then I agree that *SELECT ** is a bad idea.
EDIT: Oh and when the source table is a stored procedure for a trigger or view, "*SELECT **" should be fine because you're managing the resultset through other means (the view's definition or the stored proc's resultset).
Select * in production code is justifiable any time that:
it isn't a performance bottleneck
development time is critical
Why would I want the overhead of going back and having to worry about changing the relevant stored procedures, every time I add a field to the table?
Why would I even want to have to think about whether or not I've selected the right fields, when the vast majority of the time I want most of them anyway, and the vast majority of the few times I don't, something else is the bottleneck?
If I have a specific performance issue then I'll go back and fix that. Otherwise in my environment, it's just premature (and expensive) optimisation that I can do without.
Edit.. following the discussion, I guess I'd add to this:
... and where people haven't done other undesirable things like tried to access columns(i), which could break in other situations anyway :)
I know I'm very late to the party but I'll chip in that I use select * whenever I know that I'll always want all columns regardless of the column names. This may be a rather fringe case but in data warehousing, I might want to stage an entire table from a 3rd party app. My standard process for this is to drop the staging table and run
select *
into staging.aTable
from remotedb.dbo.aTable
Yes, if the schema on the remote table changes, downstream dependencies may throw errors but that's going to happen regardless.
If you want to find all the columns and want order, you can do the following (at least if you use MySQL):
SHOW COLUMNS FROM mytable FROM mydb; (1)
You can see every relevant information about all your fields. You can prevent problems with types and you can know for sure all the column names. This command is very quick, because you just ask for the structure of the table. From the results you will select all the name and will build a string like this:
"select " + fieldNames[0] + ", fieldNames[1]" + ", fieldNames[2] from mytable". (2)
If you don't want to run two separate MySQL commands because a MySQL command is expensive, you can include (1) and (2) into a stored procedure which will have the results as an OUT parameter, that way you will just call a stored procedure and every command and data generation will happen at the database server.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
What is the point (if any) in having a table in a database with only one row?
Note: I'm not talking about the possibility of having only one row in a table, but when a developer deliberately makes a table that is intended to always have exactly one row.
Edit:
The sales tax example is a good one.
I've just observed in some code I'm reviewing three different tables that contain three different kinds of certificates (a la SSL), each having exactly one row. I don't understand why this isn't made into one large table; I assume I'm missing something.
I've seen something like this when a developer was asked to create a configuration table to store name-value pairs of data that needs to persist without being changed often. He ended up creating a one-row table with a column for each configuration variable. I wouldn't say it's a good idea, but I can certainly see why the developer did it given his instructions. Needless to say it didn't pass review.
I've just observed in some code I'm reviewing three different tables that contain three different kinds of certificates (a la SSL), each having exactly one row. I don't understand why this isn't made into one row; I assume I'm missing something.
This doesn't sound like good design, unless there are some important details you don't know about. If there are three pieces of information that have the same constraints, the same use and the same structure, they should be stored in the same table, 99% of the time. That's a big part of what tables are for fundamentally.
For some things you only need one row - typically system configuration data. For example, "current sales tax rate". This might change in the future and so shouldn't be hardcoded, but you'll typically only ever need one at any given time. This kind of data needs to be in the database so that queries can use it in computations.
It's not necessarily a bad idea.
What if you had some global state (say, a boolean) that you wanted to store somewhere? And you wanted your stored procedures to easily access this state?
You could create a table with a primary key whose value range was limited to exactly one value.
Single row is like a singleton class. purpose: to control or manage some other process.
Single row table could act as a critical section or as deterministic automaton (kind of dispatcher based on row values)
Single row is use full in a table COMPANY_DESCRIPTION, to obtain consistent data about that company. Use full on company letters and addressing.
Single row is use full to contain an actual value like VAT or Date or Time, and so on.
It can be useful sometime to emulate some features the Database system doesn't provide. I'm thinking of sequences in MySQL for instance.
If your database is your application, then it probably makes sense for storing configuration data that might be required by stored procedures implementing business logic.
If you have an application that could use the file system to store information, then I don't think there is an advantage to using the database over an XML or flat file, except maybe that most developers are now far more well versed in using SQL to store and retrieve data than accessing the file system.
What is the point (if any) in having a table in a database with only one row?
A relational database stores things as relations: a tuples of data satisfying some relation.
Like, this one: "a VAT of this many percent is in effect in my country now".
If only one tuple satisifies this relation, then yes, it will be the only one in the table.
SQL cannot store variables: it can store a set consisting of 1 element, this is a one-row table.
Also, SQL is a set based language, and for some operations you need a fake set of only one row, like, to select a constant expression.
You cannot just SELECT out of nothing in Oracle, you need a FROM clause.
Oracle has a pseudotable, dual, which contains only one row and only one column.
Once, long time ago, it used to have two rows (hence the name dual), but lost its second row somewhere on its way to version 7.
MySQL has this pseudotable too, but MySQL is able to do selects without FROM clause. Still, it's useful when you need an empty rowset: SELECT 1 FROM dual WHERE NULL
I've just observed in some code I'm reviewing three different tables that contain three different kinds of certificates (a la SSL), each having exactly one row. I don't understand why this isn't made into one large table; I assume I'm missing something.
It may be a kind of "have it all or lose" scenario, when all three certificates are needed at once:
SELECT *
FROM ssl1
CROSS JOIN
ssl2
CROSS JOIN
ssl3
If any if the certificates is missing, the whole query returns nothing.
A table with a single row can be used to store application level settings that are shared across all database users. 'Maximum Allowed Users' for example.
Funny... I asked myself the same question. If you just want to store some simple value and your ONLY method of storage is an SQL server, that's pretty much what you have to do. If I have to do this, I usually end up creating a table with several columns and one row. I've seen a couple commercial products do this as well.
We have used a single-row table in the past (not often). In our case, this table was used to store system-wide configuration values that were updatable via a web interface. We could have gone the route of a simple name/value table, but the end client preferred a single row. I personally would have preferred the latter, but it really is up to preference, especially if this table will never have any sort of relationship with another table.
I really cannot figure out why this would be the best solution. It seams more efficient to just have some kind of config file that will contain the data that would be in the tables one row. The cost of connecting to the database and querying the one row would be more costly. However if this is going to be some kind of config for the database logic. Then this would make a little bit more sense depending on the type of database you are using.
I use the totally awesome rails-settings plugin for this http://github.com/Squeegy/rails-settings/tree/master
It's really easy to set up and provides a nice syntax:
Settings.admin_password = 'supersecret'
Settings.date_format = '%m %d, %Y'
Settings.cocktails = ['Martini', 'Screwdriver', 'White Russian']
Settings.foo = 123
Want a list of all the settings?
Settings.all # returns {'admin_password' => 'super_secret', 'date_format' => '%m %d, %Y'}
Set defaults for certain settings of your app. This will cause the defined settings to return with the Specified value even if they are not in the database. Make a new file in config/initializers/settings.rb with the following:
Settings.defaults[:some_setting] = 'footastic'
A use for this might be to store the current version of the database.
If one were storing database versions for schema changes it would need to reside within the database itself.
I currently analyse the schema and update accordingly but am thinking of moving to versioning. Unless someone has a better idea.
I use vb.net and sql express
Unless there are insert constraints on the table a timestamp for versioning then this sounds like a bad idea.
There was a table set up like this in a project I inherited. It was for configuration data, and the reason that was given was that it made for very simple queries:
SELECT WidgetSize FROM ConfigTable
SELECT FooLength FROM ConfigTable
Okay fine. We converted to a generalized configuration table:
ID Name IntValue StringValue TextValue
This has served our purposes well.
CREATE TABLE VERSION (VERSION_STRING VARCHAR2(20 BYTE))
?
I used a single datum in a SQLite database as a counter in a dynamic web page. That's the simplest way I can think of to make it thread-safe (or process-safe to be precise). But I am not sure whether it's a good idea.
I think the best way to deal with these scenarios is to, rather than using a database at all, use the configuration file (which is usually XML) or make your own configuration file that is read during start up of the application. It only takes a few minutes to write the code to read the file in.
The advantage here is that the there is no chance accidentally adding additional values for the same XML variable, and its great for testing because you don't need to write a lot of code to test the different inputs, just a simple change to the text value and re-run the application.