I have a web app that has a large number of tables and variables that the user can select (or not select) at run time. Something like this:
In the DB:
Table A
Table B
Table C
At run time the user can select any number of variables to return. Something like this:
Result Display = A.field1, A.Field3, B.field19
There are up to 100+ total fields spread across 15+ tables that can be returned in a single result set.
We have a query that currently works by creating a temp table to select and aggregate the desired fields then selecting the desired variables from that table. However, this query takes quite some time to execute (30 seconds). I would like to try and find a more efficient way to return the desired results while still allowing the ability for the user to configure the variables to see. I know this can be done as I have seen it done in other areas. Any suggestions?
Instead of using a temporary table, use a view and recompile the view each time you run the query (or just use a subquery or CTE instead of a view). SQL Server might be able to optimize the view based on the fields being selected.
The best reason to use a temporary table would be when intra-day updates are not needed. Then you could create the "temporary" table at night and just select from that table.
The query optimization approach (whether through a view, CTE, or subquery) may not be sufficient. This is a rather hard problem to solve in general. Typically, though, there are probably themes of variables that come from particular subqueries. If so, you can write a stored procedure to generate dynamic SQL that just has the requisite joins for the variables chosen for a given run. Then use that SQL for fetching from the database.
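As a rough sketch of the "only the requisite joins" idea, the query text can be assembled from a map of fields to tables. This uses Python with SQLite rather than a SQL Server stored procedure, purely so that it runs anywhere. The column names, the a_id join keys, and the choice of A as the driving table are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical catalogue: maps each selectable field to the table that owns it.
FIELD_TABLE = {
    "A.field1": "A",
    "A.field3": "A",
    "B.field19": "B",
    "C.field7": "C",
}

# Hypothetical join clauses; table A is assumed to be the driving table.
JOINS = {
    "B": "JOIN B ON B.a_id = A.id",
    "C": "JOIN C ON C.a_id = A.id",
}


def build_query(selected_fields):
    """Build a SELECT that joins only the tables the chosen fields need."""
    if not selected_fields:
        raise ValueError("no fields selected")
    # Whitelist check: only known field names ever reach the SQL string.
    unknown = [f for f in selected_fields if f not in FIELD_TABLE]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    needed = {FIELD_TABLE[f] for f in selected_fields}
    joins = [JOINS[t] for t in sorted(needed) if t != "A"]
    return " ".join([f"SELECT {', '.join(selected_fields)} FROM A", *joins])


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE A (id INTEGER PRIMARY KEY, field1 TEXT, field3 TEXT);
        CREATE TABLE B (a_id INTEGER, field19 TEXT);
        CREATE TABLE C (a_id INTEGER, field7 TEXT);
        INSERT INTO A VALUES (1, 'x', 'y');
        INSERT INTO B VALUES (1, 'z');
    """)
    sql = build_query(["A.field1", "A.field3", "B.field19"])
    return sql, conn.execute(sql).fetchall()
```

Because no field from C was requested, C never appears in the generated statement, which is the saving this approach is after. Checking the selected names against a fixed list also keeps user input out of the SQL string.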
And finally, perhaps there are other ways to optimize the query regardless of the fields being chosen. If you think that might be the case, then simplify the query for human consumption and ask another question.
Does it take more time to create a table using select as statement or to just run the select statement? Is the time difference too large or can it be neglected?
For example between
create table a
as select * from b
where c = d;
and
select * from b
where c = d;
which one should run faster and can the time difference be neglected?
Creating the table will take more time. There is more overhead. If you look at the metadata for your database, you will find lots of tables or views that contain information about tables. In particular, the table names and column names need to be stored.
That said, the data processing effort is pretty similar. However, there might be overhead in storing the result set in permanent storage rather than in the data structures needed for a result set. In fact, the result set may never need to be stored "on disk" (i.e. permanently). But with a table creation, that is needed.
Depending on the database, the two queries might also be optimized differently. The SELECT query might be optimized to return the first row as fast as possible. The CREATE query might be optimized to return all rows as fast as possible. Also, the SELECT query might just look faster if your database and interface start returning rows when they first appear.
I should point out that under most circumstances, the overhead might not really be noticeable. But, you can get other errors with the create table statement that you would not get with just the select. For instance, the table might already exist. Or duplicate column names might pose a problem (although some databases don't allow duplicate column names in result sets either).
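The "table might already exist" point can be reproduced with a small sketch. It uses Python's sqlite3 module as a stand-in for whichever database is actually in use, and the sample rows are made up for illustration.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE b (c INTEGER, d INTEGER);
        INSERT INTO b VALUES (1, 1), (2, 3);
    """)
    # The plain SELECT can be repeated as often as needed.
    first = conn.execute("SELECT * FROM b WHERE c = d").fetchall()
    second = conn.execute("SELECT * FROM b WHERE c = d").fetchall()
    # The first CREATE TABLE ... AS succeeds and stores the rows.
    conn.execute("CREATE TABLE a AS SELECT * FROM b WHERE c = d")
    stored = conn.execute("SELECT * FROM a").fetchall()
    # Running it a second time fails because table a now exists.
    try:
        conn.execute("CREATE TABLE a AS SELECT * FROM b WHERE c = d")
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)
    return first, second, stored, error
```

The exact error class and message differ between databases, so this shows the failure mode only, not the wording you should expect elsewhere.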
I'm working with CA (Broadcom) UIM. I want the most efficient method of pulling distinct values from several views. I have views that start with "V_" for every QOS that exists in the S_QOS_DATA table. I specifically want to pull data for any view that starts with "V_QOS_XENDESKTOP."
The inefficient method that gave me quick results was the following:
select * from s_qos_data where qos like 'QOS_XENDESKTOP%';
Take that data and put it in Excel.
Use CONCAT to turn just the qos names into queries such as:
SELECT DISTINCT samplevalue, 'QOS_XENDESKTOP_SITE_CONTROLLER_STATE' AS qos
FROM V_QOS_XENDESKTOP_SITE_CONTROLLER_STATE union
Copy the formula cell down for all rows, remove the UNION from the last query, and add a semicolon.
This worked and I got the output, but there has to be a more elegant solution. Most of the answers I've found about iterating in SQL use numbers or aren't quite what I'm looking for. Examples: "Multiple select queries using while loop in a single table? Is it Possible?" and "Syntax of for-loop in SQL Server".
The most efficient way to do what you want is something like what CA's scripts do (the ones you linked to). That is, use dynamic SQL: build a string containing the SQL you want from the system tables, and execute it.
A more efficient method would be to write a different query based on the underlying tables, mimicking the criteria in the views you care about.
Unless your view definitions are changing frequently, though, I recommend against dynamic SQL. (I doubt they change frequently. You regenerate the views no more frequently than you get a new script, right? CA isn't adding tables willy-nilly.) AFAICT, that's basically what you're doing already.
Get yourself a list of the view names, and write your query against a union of them, explicitly. Job done: easy to understand, not much work to modify, and you give the server its best opportunity to optimize.
I can imagine that it's frustrating and error-prone not to be able to put all that work into your own view, and query against it at your convenience. It's too bad most organizations don't let users write their own views and procedures (owned by their own accounts, not dbo). The best I can offer is to save what would be the view body to a file, and insert it into a WITH clause in your queries:
WITH V AS (... query ...) SELECT ... FROM V
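If the explicit union is long, its text can be generated once from the list of QOS names and then saved. The following is a minimal sketch of that step in Python in place of the Excel formula. It runs against SQLite with two made-up views so that it is self-contained; the view contents, the tables t1 and t2, and the function name build_union_query are assumptions.

```python
import sqlite3


def build_union_query(qos_names):
    """Return one UNION query over the V_<qos> view for each QOS name."""
    if not qos_names:
        raise ValueError("no QOS names given")
    parts = []
    for qos in qos_names:
        # Names become identifiers, so reject anything but alphanumerics and underscores.
        if not qos.replace("_", "").isalnum():
            raise ValueError(f"unexpected QOS name: {qos!r}")
        parts.append(f"SELECT DISTINCT samplevalue, '{qos}' AS qos FROM V_{qos}")
    return "\nUNION\n".join(parts) + ";"


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE s_qos_data (qos TEXT);
        INSERT INTO s_qos_data VALUES
            ('QOS_XENDESKTOP_ONE'), ('QOS_XENDESKTOP_TWO'), ('QOS_OTHER');
        CREATE TABLE t1 (samplevalue INTEGER);
        CREATE TABLE t2 (samplevalue INTEGER);
        INSERT INTO t1 VALUES (1), (1), (2);
        INSERT INTO t2 VALUES (5);
        CREATE VIEW V_QOS_XENDESKTOP_ONE AS SELECT samplevalue FROM t1;
        CREATE VIEW V_QOS_XENDESKTOP_TWO AS SELECT samplevalue FROM t2;
    """)
    names = [row[0] for row in conn.execute(
        "SELECT qos FROM s_qos_data WHERE qos LIKE 'QOS_XENDESKTOP%' ORDER BY qos")]
    sql = build_union_query(names)
    return sql, sorted(conn.execute(sql).fetchall())
```

The generated text is an ordinary static query, so it can be reviewed, saved to a file, and pasted into a WITH clause like any hand-written one.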
I have a very massive PostgreSQL database table. One of its columns stores a boolean parameter. What I want is a very simple query:
SELECT COUNT(*) FROM myTable WHERE myTable.isSthTrue;
The problem I have is that it is very slow as it needs to check every row if it satisfies the criteria. Adding an index to this parameter speeds up the calculation roughly two times, but doesn't really improve the complexity in general.
Some people on the internet suggest adding triggers that update the count which is stored in a separate table. But that feels like too much effort for an easy problem like this.
If you need an exact result, then yes, a trigger-based solution is likely the best path to go.
If an estimate is okay, consider Materialized Views. (Postgres 9.3+)
Something like CREATE MATERIALIZED VIEW myCount AS SELECT COUNT(*) FROM myTable WHERE myTable.isSthTrue; would store the result of the expensive query you reference. The only caveat is that this aggregate view is not updated automatically. To update it you need to call REFRESH MATERIALIZED VIEW, which could be done from a cron job or another timed task.
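For the exact-count route, the trigger idea can be sketched as follows. This uses SQLite through Python so that it runs as-is; PostgreSQL trigger syntax is different (it needs a trigger function), so treat this only as an illustration of the bookkeeping. The myCount table and the trigger names are assumptions, and the boolean is stored as 0 or 1.

```python
import sqlite3

SCHEMA = """
CREATE TABLE myTable (
    id INTEGER PRIMARY KEY,
    isSthTrue INTEGER NOT NULL CHECK (isSthTrue IN (0, 1))
);
-- Single-row table holding the running count of true rows.
CREATE TABLE myCount (n INTEGER NOT NULL);
INSERT INTO myCount VALUES (0);

CREATE TRIGGER myTable_ins AFTER INSERT ON myTable
BEGIN
    UPDATE myCount SET n = n + NEW.isSthTrue;
END;

CREATE TRIGGER myTable_del AFTER DELETE ON myTable
BEGIN
    UPDATE myCount SET n = n - OLD.isSthTrue;
END;

CREATE TRIGGER myTable_upd AFTER UPDATE OF isSthTrue ON myTable
BEGIN
    UPDATE myCount SET n = n + NEW.isSthTrue - OLD.isSthTrue;
END;
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO myTable (isSthTrue) VALUES (?)",
                     [(1,), (0,), (1,), (1,)])
    conn.execute("UPDATE myTable SET isSthTrue = 0 WHERE id = 1")
    conn.execute("DELETE FROM myTable WHERE id = 3")
    # Full scan versus the single-row lookup kept up to date by the triggers.
    counted = conn.execute(
        "SELECT COUNT(*) FROM myTable WHERE isSthTrue").fetchone()[0]
    stored = conn.execute("SELECT n FROM myCount").fetchone()[0]
    return counted, stored
```

Reading the count becomes a single-row lookup, at the price of one extra update on every write to myTable.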
Is there any difference between the time taken for Select * and Select count(*) for a table having no primary key or other indexes in SQL Server 2008 R2?
I have tried select count(*) from a view and it has taken 00:05:41 for 410063922 records.
Select * from view has already taken 10 minutes for the first 600000 records and the query is still running. So it looks like it will take more than 1 hour.
Is there any way through which I can make this view faster without any change in the structure of the underlying tables?
Can I create indexed view for tables without indexes?
Can I use caching for the view inside sql server so if it is called again, it takes less time?
It's a view which contains 20 columns from one table only. The table does not have any indexes. The user is able to query the view. I am not sure whether the user does select * or select somecolumn from the view with some where conditions. The only thing I want to do is propose some changes that will make their queries on the view return results faster. I am thinking of indexing and caching, but I am not sure whether they are possible on a view over a table having no indexes. Indexing is not possible here, as mentioned in one of the answers.
Can anyone shed some light on caching within SQL Server 2008 R2?
count(*) returns just a number and select * returns all the data. Imagine having to move all that data and the time it takes for your hundreds of thousands of records. Even if your table were indexed properly, running select * on your hundreds of thousands of records would still take a lot of time, even if less than before, and should never be needed in the first place.
Can I create indexed view for tables without indexes?
No, you have to add indexes for indexed results
Can I use caching for the view inside sql server so if it is called again, it takes less time?
Yes you can, but it's of no use for such a requirement. Why are you selecting so many records in the first place? You should never have to return millions or thousands of rows of complete data in any query.
Edit
In fact you are trying to get billions of rows without any where clause. This is bound to fail on any server that you can get hold of, so better stop there :)
TL;DR
Indexes do not matter for a SELECT * FROM myTABLE query because there is no condition and there are billions of rows. Unless you change your query, no optimization can help you.
The execution time difference is due to the fact that SELECT * will show the entire content of your table, while SELECT COUNT(*) will only count how many rows are present without showing them.
Answer about optimisation
In my opinion you're approaching the problem from the wrong angle. First of all, it's important to define the real need of your clients. Once the requirements are defined, you'll certainly be able to improve your view in order to get better performance and avoid returning billions of rows.
Optimisations can even be made on the table structure sometimes (we don't have any info about your current structure).
SQL Server will automatically use a system of caching in order to make the execution quicker but that will not solve your problem.
SQL Server apparently does very different work when its result set field list is different. I just did a test of a query joining several tables where many millions of rows were in play. I tested different queries, which were all the same except for the list of fields in the SELECT clause. Also, the base query (for all tests) returned zero rows.
The SELECT COUNT(*) took 6 seconds and the SELECT MyPrimaryKeyField took 6 seconds. But once I added any other column (even small ones) to the SELECT list, the time jumped to 20 minutes - even though there were no records to return.
When SQL Server thinks it needs to leave its indexes (e.g., to access table columns not included in an index) then its performance is very different - we all know this (which is why SQL Server supports including base columns when creating indexes).
Getting back to the original question, the SQL Server optimizer apparently chooses to access the base table data outside of the indexes before it knows that it has no rows to return. In the poster's original scenario, though, there were no indexes or PK (don't know why), but maybe SQL Server is still accessing table data differently with COUNT(*).
I have two potential roads to take on the following problem. A try-it-and-see methodology won't pay off here, as the load on the server is constantly in flux. The two approaches are as follows:
select *
from
(
select foo.a,bar.b,baz.c
from foo,bar,baz
-- updated for clarity sake
where foo.a=bar.b
and bar.b=baz.c
)
group by a,b,c
versus
create table results as
select foo.a,bar.b,baz.c
from foo,bar,baz
where foo.a=bar.b
and bar.b=baz.c ;
create index results_spanning on results(a,b,c);
select * from results group by a,b,c;
In case it isn't clear: the top query performs the group by directly against the multi-table select, which prevents me from using an index. The second approach creates a new table that stores the results of the query, then creates a spanning index, then runs the group by query so that it can use the index.
What is the complexity difference between these two approaches, i.e. how do they scale, and which is preferable for large quantities of data? The main issue is the performance of the overall select, so that is what I am attempting to fix here.
Comments
Are you really doing a CROSS JOIN on three tables? Are those three columns indexed in their own right? How often do you want to run the query which delivers the end result?
1) No.
2) Yes, where clause omitted for the sake of discussion as this is clearly a super trivial example
3) Doesn't matter.
2nd Update
This is a temporary table as it is only valid for a brief moment in time, so yes this table will only be queried against one time.
If your query is executed frequently and unacceptably slow, you could look into creating materialized views to pre-compute the results. This gives you the benefit of an indexable "table", without the overhead of creating a table every time.
You'll need to refresh the materialized view (preferably fast if the tables are large) either on commit or on demand. There are some restrictions on how you can create on commit, fast refreshable views, and they will add to your commit time processing slightly, but they will always give the same result as running the base query. On demand MVs will become stale as the underlying data changes until these are refreshed. You'll need to determine whether this is acceptable or not.
So the question is, which is quicker?
Run a query once and sort the result set?
Run a query once to build a table, then build an index, then run the query again and sort the result set?
Hmmm. Tricky one.
The use cases for temporary tables are pretty rare in Oracle. They normally only apply when we need to freeze a result set which we are then going to query repeatedly. That is apparently not the case here.
So, take the first option and just tune the query if necessary.
The answer is, as is so often the case with tuning questions, it depends.
Why are you doing a GROUP BY in the first place? The query as you posted it doesn't do any aggregation, so the only reason for doing a GROUP BY would be to eliminate duplicate rows, i.e. a DISTINCT operation. If this is actually the case, then you are doing some form of Cartesian join, and one way of tuning the query would be to fix the WHERE clause so that it only returns distinct records.
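The equivalence between a GROUP BY with no aggregates and DISTINCT can be checked with a small sketch. It uses SQLite through Python instead of Oracle, with made-up rows in a results table like the one from the question.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE results (a INTEGER, b INTEGER, c INTEGER);
        INSERT INTO results VALUES (1, 1, 1), (1, 1, 1), (2, 2, 2);
    """)
    # GROUP BY on every selected column, with no aggregate functions...
    grouped = conn.execute(
        "SELECT a, b, c FROM results GROUP BY a, b, c ORDER BY a, b, c"
    ).fetchall()
    # ...removes duplicates exactly as DISTINCT does.
    distinct = conn.execute(
        "SELECT DISTINCT a, b, c FROM results ORDER BY a, b, c"
    ).fetchall()
    return grouped, distinct
```

Both statements collapse the duplicated row, so if the duplicates come from the join conditions, fixing those conditions removes the need for either operation.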