I'm working with CA (Broadcom) UIM. I want the most efficient method of pulling distinct values from several views. I have views that start with "V_" for every QOS that exists in the S_QOS_DATA table. I specifically want to pull data for any view that starts with "V_QOS_XENDESKTOP."
The inefficient method that gave me quick results was the following:
select * from s_qos_data where qos like 'QOS_XENDESKTOP%';
Take that data and put it in Excel.
Use CONCAT to turn just the qos names into queries such as:
SELECT DISTINCT samplevalue, 'QOS_XENDESKTOP_SITE_CONTROLLER_STATE' AS qos
FROM V_QOS_XENDESKTOP_SITE_CONTROLLER_STATE union
Copy the formula cell down for all rows and remove Union from the last query as well
as add a semicolon.
This worked, I got the output, but there has to be a more elegant solution. Most of the answers I've found related to iterating through SQL uses numbers or doesn't seem quite what I'm looking for. Examples: Multiple select queries using while loop in a single table? Is it Possible? and Syntax of for-loop in SQL Server
The most efficient method to do what you want to do is to do something like what CA's scripts do (the ones you linked to). That is, use dynamic SQL: create a string containing the SQL you want from system tables, and execute it.
A more efficient method would be to write a different query based on the underlying tables, mimicking the criteria in the views you care about.
Unless your view definitions are changing frequently, though, I recommend against dynamic SQL. (I doubt they change frequently. You regenerate the views no more frequently than you get a new script, right? CA isn't adding tables willy nilly.) AFAICT, that's basically what you're doing already.
Get yourself a list of the view names, and write your query against a union of them, explicitly. Job done: easy to understand, not much work to modify, and you give the server its best opportunity to optimize.
I can imagine that it's frustrating and error-prone not to be able to put all that work into your own view, and query against it at your convenience. It's too bad most organizations don't let users write their own views and procedures (owned by their own accounts, not dbo). The best I can offer is to save what would be the view body to a file, and insert it into a WITH clause in your queries
WITH (... query ...) as V select ... from V
I have a table with over 300 million records only containing key, source and hash value. The inbuilt sql of the application runs a large IN predicate on hash value in its sql to fetch the data.The sqls are performing slowly so need suggestions on how to improve the performance of the sql. I wouldn't be able to change the sql as its an internal sql already built-in the application. So far I have tried to put in an index on the key and another on hash column but doesn't provide much help.
Off the top of my head, you could create a table, perhaps a temporary table (which can also accept indices), containing the values in the IN predicate of your current query. Then, a simple inner join of your original query to this table would have the same effect, except it might be substantially faster if DB2 can take advantage of the index. You would need to make sure that the columns involved in the join both have an index.
You (or your DBA) might like to try the Design Advisor to advise on new indexes etc.
This can be run from the command line tool db2advis https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.perf.doc/doc/c0005144.html
Or from Data Server Manager which provides a Web Based front end for the
Query Advisor and Access Path Advisor. https://www-01.ibm.com/support/docview.wss?uid=swg27048195
Problem:
We use entity framework (6.21) as our ORM manager.
Our database is Azure Sql Database.
Because some of the parametrized queries (frequently used in our app) are slow on some of the inputs (on some input it runs 60 seconds on other input it runs 0.4 seconds)
We started investigate those queries using QueryStore and QueryStore explorer in MS SQL Management Studio (MSSMS -> Object Explorer -> Query Store).
We found out, that QueryStore stores two same (same sql query but different params - params are not even stored) queries as different queries (with different query_id).
By different query I mean different row in table
sys.query_store_query).
I checked this by looking into QueryStore tables:
SELECT
qStore.query_id,
qStore.query_text_id,
queryTextStore.query_sql_text
ROW_NUMBER() OVER(PARTITION BY query_sql_text ORDER BY query_sql_text ASC) AS rn
FROM
sys.query_store_query qStore
INNER JOIN
sys.query_store_query_text queryTextStore
ON qStore.query_text_id = queryTextStore.query_text_id
I am not able to compare plans of those queries easily in MSSMS, because each query has its own associated plan.
Expected behaviour:
I would assume that each subsequent run of same query with different parametres would result in either:
1/ re-use of existing plan
or
2/ in creation of another plan based on passed params values...
Example:
The query would look like this (in reality queries are much more complex as they are generated by EntityFramework):
SELECT * FROM tbl WHERE a = #__plinq__
and it's two subsequent runs (with different params) would result in two rows in sys.query_store_query.
Question:
How can I make Azure to save queries with same text as same queries? Or am I missing something or is this expected behaviour?
Or more generally how to tune database queries if they are generated by Entity Framework?
How SQL Server Query Store considers two queries same or different?
Edit1: Update
Based on #PeterB comment (Adding a query hint when calling Table-Valued Function) we were able to solve our problem with slow queries on some params values (we added hint "recompile" on problematic queries).
Based on #GrantFritchey hint I checked context_settings, but there are still multiple rows in query_store table which have same query_sql_text and same context_settings_id but with different query_id .
So we still wonder how SQL Server Query Store consider two queries same or different?
As for the different query entries, the key that Query Store uses for a query consists of:
query_text_id,
context_settings_id,
object_id,
batch_sql_handle,
query_parameterization_type
If any of these is different for a query it will generate a new entry in the query table. Note that batch_sql_handle is only populated for queries referencing temp tables.
So you can check which of these values is different for the queries that you listed.
Currently there are no settings that control the way Query Store aggregates queries. The only way to make it treat them as same is to change your workload so that the fields listed above match. But alternatively probably better approach is that you write your own reporting queries that will aggregate queries and their statistics according to your needs.
I have a web app that has a large number of tables and variables that the user can select (or not select) at run time. Something like this:
In the DB:
Table A
Table B
Table C
At run time the user can select any number of variables to return. Something like this:
Result Display = A.field1, A.Field3, B.field19
There are up to 100+ total fields spread across 15+ tables that can be returned in a single result set.
We have a query that currently works by creating a temp table to select and aggregate the desired fields then selecting the desired variables from that table. However, this query takes quite some time to execute (30 seconds). I would like to try and find a more efficient way to return the desired results while still allowing the ability for the user to configure the variables to see. I know this can be done as I have seen it done in other areas. Any suggestions?
Instead of using a temporary table, use a view and recompile the view each time your run the query (or just use a subquery or CTE instead of a view). SQL Server might be able to optimize the view based on the fields being selected.
The best reason to use a temporary table would be when intra-day updates are not needed. Then you could create the "temporary" table at night and just select from that table.
The query optimization approach (whether through a view, CTE, or subquery) may not be sufficient. This is a rather hard problem to solve in general. Typically, though, there are probably themes of variables that come from particular subqueries. If so, you can write a stored procedure to generate dynamic SQL that just has the requisite joins for the variables chosen for a given run. Then use that SQL for fetching from the database.
And finally, perhaps there are other ways to optimize the query regardless of the fields being chosen. If you think that might be the case, then simplify the query for human consumption and ask another question
I've always preached to my developers that SELECT * is evil and should be avoided like the plague.
Are there any cases where it can be justified?
I'm not talking about COUNT(*) - which most optimizers can figure out.
Edit
I'm talking about production code.
And one great example I saw of this bad practice was a legacy asp application that used select * in a stored procedure, and used ADO to loop through the returned records, but got the columns by index. You can imagine what happened when a new field was added somewhere other than the end of the field list.
I'm quite happy using * in audit triggers.
In that case it can actually prove a benefit because it will ensure that if additional columns are added to the base table it will raise an error so it cannot be forgotten to deal with this in the audit trigger and/or audit table structure.
(Like dotjoe) I am also happy using it in derived tables and column table expressions. Though I habitually do it the other way round.
WITH t
AS (SELECT *,
ROW_NUMBER() OVER (ORDER BY a) AS RN
FROM foo)
SELECT a,
b,
c,
RN
FROM t;
I'm mostly familiar with SQL Server and there at least the optimiser has no problem recognising that only columns a,b,c will be required and the use of * in the inner table expression does not cause any unnecessary overhead retrieving and discarding unneeded columns.
In principle SELECT * ought to be fine in a view as well as it is the final SELECT from the view where it ought to be avoided however in SQL Server this can cause problems as it stores column metadata for views which is not automatically updated when the underlying tables change and the use of * can lead to confusing and incorrect results unless sp_refreshview is run to update this metadata.
There are many scenarios where SELECT * is the optimal solution. Running ad-hoc queries in Management Studio just to get a sense of the data you're working with. Querying tables where you don't know the column names yet because it's the first time you've worked with a new schema. Building disposable quick'n'dirty tools to do a one-time migration or data export.
I'd agree that in "proper" development, you should avoid it - but there's lots of scenarios where "proper" development isn't necessarily the optimum solution to a business problem. Rules and best practices are great, as long as you know when to break them. :)
I'll use it in production when working with CTEs. But, in this case it's not really select *, because I already specified the columns in the CTE. I just don't want to respecify in the final select.
with t as (
select a, b, c from foo
)
select t.* from t;
None that I can think of, if you are talking about live code.
People saying that it makes adding columns easier to develop (so they automatically get returned and can be used without changing the Stored procedure) have no idea about writing optimal code/sql.
I only ever use it when writing ad-hoc queries that will not get reused (finding out the structure of a table, getting some data when I am not sure what the column names are).
I think using select * in an exists clause is appropriate:
select some_field from some_table
where exists
(select * from related_table [join condition...])
Some people like to use select 1 in this case, but it's not elegant, and it doesn't buy any performance improvements (early optimization strikes again).
In production code, I'd tend to agree 100% with you.
However, I think that the * more than justifies its existence when performing ad-hoc queries.
You've gotten a number of answers to your question, but you seem to be dismissing everything that isn't parroting back what you want to hear. Still, here it is for the third (so far) time: sometimes there is no bottleneck. Sometimes performance is way better than fine. Sometimes the tables are in flux, and amending every SELECT query is just one more bit of possible inconsistency to manage. Sometimes you've got to deliver on an impossible schedule and this is the last thing you need to think about.
If you live in bullet time, sure, type in all the column names. But why stop there? Re-write your app in a schema-less dbms. Hell, write your own dbms in assembly. That'd really show 'em.
And remember if you use select * and you have a join at least one field will be sent twice (the join field). This wastes database resources and network resources for no reason.
As a tool I use it to quickly refresh my memory as to what I can possibly get back from a query. As a production level query itself .. no way.
When creating an application that deals with the database, like phpmyadmin, and you are in a page where to display a full table, in that case using SELECT * can be justified, I guess.
About the only thing that I can think of would be when developing a utility or SQL tool application that is being written to run against any database. Even here though, I would tend to query the system tables to get the table structure and then build any necessary query from that.
There was one recent place where my team used SELECT * and I think that it was ok... we have a database that exists as a facade against another database (call it DB_Data), so it is primarily made up of views against the tables in the other database. When we generate the views we actually generate the column lists, but there is one set of views in the DB_Data database that are automatically generated as rows are added to a generic look-up table (this design was in place before I got here). We wrote a DDL trigger so that when a view is created in DB_Data by this process then another view is automatically created in the facade. Since the view is always generated to exactly match the view in DB_Data and is always refreshed and kept in sync, we just used SELECT * for simplicity.
I wouldn't be surprised if most developers went their entire career without having a legitimate use for SELECT * in production code though.
I've used select * to query tables optimized for reading (denormalized, flat data). Very advantageous since the purpose of the tables were simply to support various views in the application.
How else do the developers of phpmyadmin ensure they are displaying all the fields of your DB tables?
It is conceivable you'd want to design your DB and application so that you can add a column to a table without needing to rewrite your application. If your application at least checks column names it can safely use SELECT * and treat additional columns with some appropriate default action. Sure the app could consult system catalogs (or app-specific catalogs) for column information, but in some circumstances SELECT * is syntactic sugar for doing that.
There are obvious risks to this, however, and adding the required logic to the app to make it reliable could well simply mean replicating the DB's query checks in a less suitable medium. I am not going to speculate on how the costs and benefits trade off in real life.
In practice, I stick to SELECT * for 3 cases (some mentioned in other answers:
As an ad-hoc query, entered in a SQL GUI or command line.
As the contents of an EXISTS predicate.
In an application that dealt with generic tables without needing to know what they mean (e.g. a dumper, or differ).
Yes, but only in situations where the intention is to actually get all the columns from a table not because you want all the columns that a table currently has.
For example, in one system that I worked on we had UDFs (User Defined Fields) where the user could pick the fields they wanted on the report, the order as well as filtering. When building a result set it made more sense to simply "select *" from the temporary tables that I was building instead of having to keep track of which columns were active.
I have several times needed to display data from a table whose column names were unknown. So I did SELECT * and got the column names at run time.
I was handed a legacy app where a table had 200 columns and a view had 300. The risk exposure from SELECT * would have been no worse than from listing all 300 columns explicitly.
Depends on the context of the production software.
If you are writing a simple data access layer for a table management tool where the user will be selecting tables and viewing results in a grid, then it would seem *SELECT ** is fine.
In other words, if you choose to handle "selection of fields" through some other means (as in automatic or user-specified filters after retrieving the resultset) then it seems just fine.
If on the other hand we are talking about some sort of enterprise software with business rules, a defined schema, etc. ... then I agree that *SELECT ** is a bad idea.
EDIT: Oh and when the source table is a stored procedure for a trigger or view, "*SELECT **" should be fine because you're managing the resultset through other means (the view's definition or the stored proc's resultset).
Select * in production code is justifiable any time that:
it isn't a performance bottleneck
development time is critical
Why would I want the overhead of going back and having to worry about changing the relevant stored procedures, every time I add a field to the table?
Why would I even want to have to think about whether or not I've selected the right fields, when the vast majority of the time I want most of them anyway, and the vast majority of the few times I don't, something else is the bottleneck?
If I have a specific performance issue then I'll go back and fix that. Otherwise in my environment, it's just premature (and expensive) optimisation that I can do without.
Edit.. following the discussion, I guess I'd add to this:
... and where people haven't done other undesirable things like tried to access columns(i), which could break in other situations anyway :)
I know I'm very late to the party but I'll chip in that I use select * whenever I know that I'll always want all columns regardless of the column names. This may be a rather fringe case but in data warehousing, I might want to stage an entire table from a 3rd party app. My standard process for this is to drop the staging table and run
select *
into staging.aTable
from remotedb.dbo.aTable
Yes, if the schema on the remote table changes, downstream dependencies may throw errors but that's going to happen regardless.
If you want to find all the columns and want order, you can do the following (at least if you use MySQL):
SHOW COLUMNS FROM mytable FROM mydb; (1)
You can see every relevant information about all your fields. You can prevent problems with types and you can know for sure all the column names. This command is very quick, because you just ask for the structure of the table. From the results you will select all the name and will build a string like this:
"select " + fieldNames[0] + ", fieldNames[1]" + ", fieldNames[2] from mytable". (2)
If you don't want to run two separate MySQL commands because a MySQL command is expensive, you can include (1) and (2) into a stored procedure which will have the results as an OUT parameter, that way you will just call a stored procedure and every command and data generation will happen at the database server.