Query optimization in SQL

Query optimization in SQL - sql

I have to optimize the physical design of several queries. I have tried several techniques such as indexes or clusters but in most of the queries the best option in terms of consistent gets is creating a materialized view. Is there any reason why not to choose materialized views for optimizing the queries? Because if we could optimize all queries using only materialized views, everything would be much easier and faster.

You can optimize using materialized views. In my experience, they have one major downside: timing. Materialized views are not all updated at exactly the same time.
As a result, different tables that you think might be related might be missing rows. As a trivial example, you might have a foreign key relationship from T1 to T2. However, T2 gets materialized before T1. Then when T1 is materialized, some foreign key values could be missing. I have spent a lot of time dealing with the issues that this causes.
There are ways to adjust for this. For instance, all rows could have create dates and the materialized views could restrict rows only to those where were created or updated up to the previous hour boundary.
There are other issues, in terms of performance and maintenance. For instance, your database load might shift to materializing the views. Or the views might fail due to underlying schema changes. However, once you have a process in place, you will probably find that these are quite manageable.

For some application like OLTP it does not make sense because queries don't read a lot of data.
For some application like datawarehousing applications it could be an option but there are sometimes limitation to SQL statements that can be used to create materialized views (MV).
And in general one must take into account the cost to refresh the MV: if you don't want stale data, refresh should be automatic and there is some overhead to be taken into account.

Related

Redshift query planner and views

I have seen in a few non-Amazon sources that the Redshift query planner has problems working with views (here is one source, here is another, here is a third). By views I mean standard SQL views, not the newly-available materialized views. However I can't find anything about this in the developer's guide, and these sources listed above are a few years out of date. Does anyone know what the current situation is with the Redshift query planner and views, and if there is official Redshift documentation that describes it, where it is located?

The arguments of the blogs are, as you say, a bit outdated as they present as one of the main drawbacks of views the fact that they couldn’t be materialized at the time of writing, which is not the case anymore.
The first link just says that Redshift has trouble at optimizing queries involving views but doesn’t show any benchmark/proof of that nor it explains why and in which way.
The second and third sources have some more merit in that they actually provide alternatives, which are creating an actual table or materialize the view.
My understanding is that views in Redshift don’t inherently suffer from bad performances but that instead, given their transient nature, they don’t take advantage of the clustered architecture of Redshift. Additionally, as mentioned by some of the resources you linked as well, the queries that make up a view get executed every time you query the view and that definitely doesn’t help performances.
I would definitely suggest you to consider aggregating your data in actual tables or look into materializing these views.
To better understand how the planner works I’d take a look at this Query planning and execution workflow

Redshift has no problem working with views. The logic of the view is combined with the rest of the query that calls the view, similar to a subquery or CTE. Redshift plans and optimizes the entire statement (outer query + view logic) as a single statement.
The are 2 main "issues" that people have with views:
Views are bound to the tables (or other views) that they reference. You cannot drop them or make certain changes to them without first dropping the view. To address this Redshift offers WITH NO SCHEMA BINDING syntax so that the view is not bound to its objects. The compromise is that the view is not checked and queries against it may fail if underlying objects are changed.
Views make it very easy to generate extremely complex and inefficient queries that look "simple". This particularly happens when you nest views on top of views. You can use EXPLAIN to see the query plan that Redshift will use for a given query to see how your view is processed.

SQL Views v Stored procedures

I've just started joining Stored Procedures together using views and it seems a simple way of building up a short query using the results of others.
Are there any disadvantages to over relying on Views before I plough on with this method? Am I better to pursue the temporary table option?

The main differences are that a view only actually stores the query not the results (with the exception of materialised views) and views persist after the end of your session. Views are an excellent way of hiding complexity, but does not make the queries run more quickly than if you wrote out the whole thing in one query. Views also do not use up storage space (except for a very small amount for the metadata).
I would recommend using views if you do not have any requirements to speed the queries up further or if you need to be able to reference the data without recreating it subsequent sessions.
Temporary tables do store the result, but just for the current session, so if you need a base query to speed up further queries for the duration of your session, this can be useful.

In fact, views are mostly used for security reasons, and they also make queries more simple (for some cases.) So it just depends on what you are doing, based on if it requires storing and other requirements.

Performance of Tables vs. Views

Recently started working with a database in which the convention is to create a view for every table. If you assume that there is a one to one mapping between tables and views, I was wondering if anyone could tell me the performance impacts of doing something like this. BTW, this is on Oracle.

Assuming the question is about non-materialized views -- Really depends on the query that the view is based on, and what is being done to it. Sometimes, the predicates can be pushed into the view query by the optimizer. If not, then it wouldn't be as good as against the table itself. Views are built on top of tables -- why would you expect that the performance would be better?
Layering views, where you build one view on top of another, is a bad practice because you won't know about issues until run time. It's also less of a chance that predicate pushing will occur with layered views.
Views can also be updateable -- they aren't a reliable means to restricting access to resources if someone has INSERT/UPDATE/DELETE privileges on the underlying tables.
Materialized views are as good as tables, but are notoriously restrictive in what they support.

You don't explain what you're doing in the views? A 1:1 with the tables sounds like you are using the views more like synonyms than a view. IOW, are the views = "SELECT * FROM table", then you'll see no performance hit except on hard parse.
If you are joining to other tables or placing filter clauses in them which prevent predicate pushing than you're bound to see a major hit sometime.

The only pain I have had with views is a distributed query over a DB link. The local optimizer gets some details about the remote object, but the view doesn't tell it about any indexes so you can get some kooky plans.
I've heard about some places that use it as a standard since they can easily 're-order' the columns in a view. Not a big benefit in my opinion by YMMV

Is a view faster than a simple query?

Is a
select * from myView
faster than the query itself to create the view (in order to have the same resultSet):
select * from ([query to create same resultSet as myView])
?
It's not totally clear to me if the view uses some sort of caching making it faster compared to a simple query.

Yes, views can have a clustered index assigned and, when they do, they'll store temporary results that can speed up resulting queries.
Microsoft's own documentation makes it very clear that Views can improve performance.
First, most views that people create are simple views and do not use this feature, and are therefore no different to querying the base tables directly. Simple views are expanded in place and so do not directly contribute to performance improvements - that much is true. However, indexed views can dramatically improve performance.
Let me go directly to the documentation:
After a unique clustered index is created on the view, the view's result set is materialized immediately and persisted in physical storage in the database, saving the overhead of performing this costly operation at execution time.
Second, these indexed views can work even when they are not directly referenced by another query as the optimizer will use them in place of a table reference when appropriate.
Again, the documentation:
The indexed view can be used in a query execution in two ways. The query can reference the indexed view directly, or, more importantly, the query optimizer can select the view if it determines that the view can be substituted for some or all of the query in the lowest-cost query plan. In the second case, the indexed view is used instead of the underlying tables and their ordinary indexes. The view does not need to be referenced in the query for the query optimizer to use it during query execution. This allows existing applications to benefit from the newly created indexed views without changing those applications.
This documentation, as well as charts demonstrating performance improvements, can be found here.
Update 2: the answer has been criticized on the basis that it is the "index" that provides the performance advantage, not the "View." However, this is easily refuted.
Let us say that we are a software company in a small country; I'll use Lithuania as an example. We sell software worldwide and keep our records in a SQL Server database. We're very successful and so, in a few years, we have 1,000,000+ records. However, we often need to report sales for tax purposes and we find that we've only sold 100 copies of our software in our home country. By creating an indexed view of just the Lithuanian records, we get to keep the records we need in an indexed cache as described in the MS documentation. When we run our reports for Lithuanian sales in 2008, our query will search through an index with a depth of just 7 (Log2(100) with some unused leaves). If we were to do the same without the VIEW and just relying on an index into the table, we'd have to traverse an index tree with a search depth of 21!
Clearly, the View itself would provide us with a performance advantage (3x) over the simple use of the index alone. I've tried to use a real-world example but you'll note that a simple list of Lithuanian sales would give us an even greater advantage.
Note that I'm just using a straight b-tree for my example. While I'm fairly certain that SQL Server uses some variant of a b-tree, I don't know the details. Nonetheless, the point holds.
Update 3: The question has come up about whether an Indexed View just uses an index placed on the underlying table. That is, to paraphrase: "an indexed view is just the equivalent of a standard index and it offers nothing new or unique to a view." If this was true, of course, then the above analysis would be incorrect! Let me provide a quote from the Microsoft documentation that demonstrate why I think this criticism is not valid or true:
Using indexes to improve query performance is not a new concept; however, indexed views provide additional performance benefits that cannot be achieved using standard indexes.
Together with the above quote regarding the persistence of data in physical storage and other information in the documentation about how indices are created on Views, I think it is safe to say that an Indexed View is not just a cached SQL Select that happens to use an index defined on the main table. Thus, I continue to stand by this answer.

Generally speaking, no. Views are primarily used for convenience and security, and won't (by themselves) produce any speed benefit.
That said, SQL Server 2000 and above do have a feature called Indexed Views that can greatly improve performance, with a few caveats:
Not every view can be made into an indexed view; they have to follow a specific set of guidelines, which (among other restrictions) means you can't include common query elements like COUNT, MIN, MAX, or TOP.
Indexed views use physical space in the database, just like indexes on a table.
This article describes additional benefits and limitations of indexed views:
You Can…
The view definition can reference one or more tables in the
same database.
Once the unique clustered index is created, additional nonclustered
indexes can be created against the view.
You can update the data in the underlying tables – including inserts,
updates, deletes, and even truncates.
You Can’t…
The view definition can’t reference other views, or tables
in other databases.
It can’t contain COUNT, MIN, MAX, TOP, outer joins, or a few other
keywords or elements.
You can’t modify the underlying tables and columns. The view is
created with the WITH SCHEMABINDING option.
You can’t always predict what the query optimizer will do. If you’re
using Enterprise Edition, it will automatically consider the unique
clustered index as an option for a query – but if it finds a “better”
index, that will be used. You could force the optimizer to use the
index through the WITH NOEXPAND hint – but be cautious when using any
hint.

EDIT: I was wrong, and you should see Marks answer above.
I cannot speak from experience with SQL Server, but for most databases the answer would be no. The only potential benefit that you get, performance wise, from using a view is that it could potentially create some access paths based on the query. But the main reason to use a view is to simplify a query or to standardize a way of accessing some data in a table. Generally speaking, you won't get a performance benefit. I may be wrong, though.
I would come up with a moderately more complicated example and time it yourself to see.

In SQL Server at least, Query plans are stored in the plan cache for both views and ordinary SQL queries, based on query/view parameters. For both, they are dropped from the cache when they have been unused for a long enough period and the space is needed for some other newly submitted query. After which, if the same query is issued, it is recompiled and the plan is put back into the cache. So no, there is no difference, given that you are reusing the same SQL query and the same view with the same frequency.
Obviously, in general, a view, by it's very nature (That someone thought it was to be used often enough to make it into a view) is generally more likely to be "reused" than any arbitrary SQL statement.

Definitely a view is better than a nested query for SQL Server. Without knowing exactly why it is better (until I read Mark Brittingham's post), I had run some tests and experienced almost shocking performance improvements when using a view versus a nested query. After running each version of the query several hundred times in a row, the view version of the query completed in half the time. I'd say that's proof enough for me.

It may be faster if you create a materialized view (with schema binding). Non-materialized views execute just like the regular query.

My understanding is that a while back, a view would be faster because SQL Server could store an execution plan and then just use it instead of trying to figure one out on the fly. I think the performance gains nowadays is probably not as great as it once was, but I would have to guess there would be some marginal improvement to use the view.

I would expect the two queries to perform identically. A view is nothing more than a stored query definition, there is no caching or storing of data for a view. The optimiser will effectively turn your first query into your second query when you run it.

It all depends on the situation. MS SQL Indexed views are faster than a normal view or query but indexed views can not be used in a mirrored database invironment (MS SQL).
A view in any kind of a loop will cause serious slowdown because the view is repopulated each time it is called in the loop. Same as a query. In this situation a temporary table using # or # to hold your data to loop through is faster than a view or a query.
So it all depends on the situation.

There should be some trivial gain in having the execution plan stored, but it will be negligible.

In my finding, using the view is a little bit faster than a normal query. My stored procedure was taking around 25 minutes (working with a different larger record sets and multiple joins) and after using the view (non-clustered), the performance was just a little bit faster but not significant at all. I had to use some other query optimization techniques/method to make it a dramatic change.

Select from a View or from a table will not make too much sense.
Of course if the View does not have unnecessary joins, fields, etc. You can check the execution plan of your queries, joins and indexes used to improve the View performance.
You can even create index on views for faster search requirements. http://technet.microsoft.com/en-us/library/cc917715.aspx
But if you are searching like '%...%' than the sql engine will not benefit from an index on text column. If you can force your users to make searches like '...%' than that will be fast
referred to answer on asp forums :
https://forums.asp.net/t/1697933.aspx?Which+is+faster+when+using+SELECT+query+VIEW+or+Table+

Against all expectation, views are way slower in some circumstances.
I discovered this recently when I had problems with data which was pulled from Oracle which needed to be massaged into another format. Maybe 20k source rows. A small table. To do this we imported the oracle data as unchanged as I could into a table and then used views to extract data.
We had secondary views based on those views. Maybe 3-4 levels of views.
One of the final queries, which extracted maybe 200 rows would take upwards of 45 minutes! That query was based on a cascade of views. Maybe 3-4 levels deep.
I could take each of the views in question, insert its sql into one nested query, and execute it in a couple of seconds.
We even found that we could even write each view into a temp table and query that in place of the view and it was still way faster than simply using nested views.
What was even odder was that performance was fine until we hit some limit of source rows being pulled into the database, performs just dropped off a cliff over the space of a couple of days - a few more source rows was all it took.
So, using queries which pull from views which pull from views is much slower than a nested query - which makes no sense for me.

There is no practical different and if you read BOL you will find that ever your plain old SQL SELECT * FROM X does take advantage of plan caching etc.

The purpose of a view is to use the query over and over again. To that end, SQL Server, Oracle, etc. will typically provide a "cached" or "compiled" version of your view, thus improving its performance. In general, this should perform better than a "simple" query, though if the query is truly very simple, the benefits may be negligible.
Now, if you're doing a complex query, create the view.

No. view is just a short form of your actual long sql query. But yes, you can say actual query is faster than view command/query.
First view query will tranlate into simple query then it will execute, so view query will take more time to execute than simple query.
You can use sql views when you are using joins b/w multiple tables, to reuse complicated query again and again in simple manners.

I ran across this thread and just wanted to share this post from Brent Ozar as something to consider when using availability groups.
Brent Ozar bug report

Is it a good idea to use normalised tables with denormalised indexed views?

I'm architecting a new app at the moment, with a high read:write ratio. At my current employer we have lots of denormalised data on our tables for performance reasons. Is it better practice to have totally 3NF tables and then use indexed views to do all the denormalisation? Should I run queries against the tables or views?
An example of some of the things I am interested are aggregates of columns child tables (e.g. having user post count stored somewhere).

In general it's a good idea to have denormalized views if you need to access across multiple normalized tables very frequently. In most cases it'll be a significant performance increase over using a join and querying directly against the tables, and it's usually not any less maintainable, since either your view or join can be written to be agnostic about changes to parts of the tables that it doesn't use.
Whether all your tables should be in the third normal form is another question. In most applications I've worked with the answer is most tables should be normalized this way, but there are exceptions. Whether to make an exception has to do with how the data is used, and whether you can be very confident about that use not changing in the future.
Having to go back and re-normalize later because you did something the wrong way can be costly, but over-normalizing data that should be straightforward to use and understand can make things more complicated and difficult to maintain than they need to be. Your mileage may vary.

If you are going to use views to present denormalized data to the user (and you're using SQL Server), you should check out the SCHEMABINDING clause. If a view is schemabound, you can index it, and the index will be updated when the underlying tables are updated. In this way, if the indexes are set up well, people who are looking for data can actually select from the index, so it won't need to rebuild the complex view for every query, but users will still see up-to-date date when the underlying tables change.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas