Redshift query planner and views - sql

I have seen in a few non-Amazon sources that the Redshift query planner has problems working with views (here is one source, here is another, here is a third). By views I mean standard SQL views, not the newly-available materialized views. However I can't find anything about this in the developer's guide, and these sources listed above are a few years out of date. Does anyone know what the current situation is with the Redshift query planner and views, and if there is official Redshift documentation that describes it, where it is located?

The blog posts’ arguments are, as you say, a bit outdated: one of the main drawbacks of views they cite is that views couldn’t be materialized at the time of writing, which is no longer the case.
The first link just says that Redshift has trouble optimizing queries involving views, but it doesn’t show any benchmark or proof of that, nor does it explain why or in which way.
The second and third sources have more merit in that they actually provide alternatives, namely creating an actual table or materializing the view.
My understanding is that views in Redshift don’t inherently suffer from bad performance but that, given their transient nature, they don’t take advantage of Redshift’s clustered architecture. Additionally, as some of the resources you linked also mention, the queries that make up a view get executed every time you query the view, and that definitely doesn’t help performance.
I would definitely suggest that you consider aggregating your data in actual tables or look into materializing these views.
To better understand how the planner works, I’d take a look at this Query planning and execution workflow.

Redshift has no problem working with views. The logic of the view is combined with the rest of the query that calls the view, similar to a subquery or CTE. Redshift plans and optimizes the entire statement (outer query + view logic) as a single statement.
There are 2 main "issues" that people have with views:
Views are bound to the tables (or other views) that they reference. You cannot drop those objects or make certain changes to them without first dropping the view. To address this, Redshift offers the WITH NO SCHEMA BINDING syntax so that the view is not bound to its objects. The compromise is that the view is not checked, and queries against it may fail if underlying objects are changed.
Views make it very easy to generate extremely complex and inefficient queries that look "simple". This particularly happens when you nest views on top of views. You can use EXPLAIN to see the query plan that Redshift will use for a given query to see how your view is processed.
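The idea that a view is merged into the calling query, much like a subquery, can be tried out locally. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in engine (not Redshift), and the sales table, the eu_sales view and the sample rows are all made up for illustration. SQLite's EXPLAIN QUERY PLAN plays the role that EXPLAIN plays in Redshift, and its output format differs between versions, so the plans are printed rather than compared.

```python
import sqlite3

# Hypothetical schema and data, SQLite standing in for Redshift.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER);
    INSERT INTO sales (region, amount) VALUES
        ('EU', 10), ('EU', 20), ('US', 5), ('US', 7);
    CREATE VIEW eu_sales AS
        SELECT id, amount FROM sales WHERE region = 'EU';
""")

# The same question asked through the view and with the view's logic inlined.
via_view = "SELECT id, amount FROM eu_sales WHERE amount > 15"
inlined = """
    SELECT id, amount
    FROM (SELECT id, amount FROM sales WHERE region = 'EU') AS eu_sales
    WHERE amount > 15
"""

rows_view = conn.execute(via_view).fetchall()
rows_inlined = conn.execute(inlined).fetchall()

# Show the plan the engine picks for each form; the last column is the
# human-readable step description.
for label, sql in (("view", via_view), ("inlined", inlined)):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label, [row[-1] for row in plan])
```

Both forms return the same rows here. On Redshift itself you would run EXPLAIN on your real query against the view to see how the combined statement is planned.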

Related

SQL View From a View Running Slow

I created a view that pulls data from another view and it is running extremely slow. The original view runs fine so I'm not sure what the hold up would be. Is this typically an issue when querying off a view?
I did some brief research and can conclude that you'll have a hard time maintaining efficient nested views. In short, your SQL Server query optimizer is going to have a near-impossible time figuring out how to quickly execute your query. At best it has to execute three statements: Source statement, View 1, View 2 (Nested). Without indexes on the tables and views performance is going to continually be slow because it's running so many operations.
You'd be much better off by searching for an alternate solution to your nested view, such as replicating your SELECT statement from your first view within your nested view. The duplication is also less than ideal, but it means that your optimizer won't have to work so hard. You can also look into applying indexes to your views. When done properly, you will improve SELECT performance while sacrificing speed when you perform INSERT operations. This should be done if you don't expect your source table to change too often, but is probably the best solution available if you can't restructure view layout.
You probably use some filters on both the internal (fast) and external (slow) views.
Make sure that the filters used on the external view are well 'linked' to the internal view. Well 'linked' means that if, for example, the external view uses some more tables and joins the internal view, you should try to join on the filter columns as well, or on the lower-detail base table.

Is it bad to call views inside a View in sql

I have created 8 different views and I am using all of these views inside a view.
Before I go any further with this idea, I want to know whether it affects performance badly or not.
No, it's fine. In many cases I personally consider it preferable to writing one view with a giant and difficult to understand definition. In my opinion, using multiple views allows you to:
Encapsulate discrete logic in individual views.
Re-use logic in the individual views without having to repeat the logic (eliminating update problems later).
Name your logic so that it's easier for the next programmer to understand what you were trying to accomplish.
Views get "compiled" away during execution plan creation. Therefore there is only a very small penalty for using them: The extra time it takes SQL Server to look up the definition. Usually this delay is not measurable.
That means using views for the purposes mentioned by Larry Lustig is perfectly fine and to be encouraged.
HOWEVER: Make sure that you do not introduce unnecessary JOINs using this technique. While SQL Server has mechanisms to eliminate unneeded tables from a query, it quickly gives up if the query becomes too complex. Executing those additional JOINs can cause a significant slowdown. This is the reason that many companies have a no-view rule in place.
So: Use views, but make sure to not misuse them.
It's not bad for performance just for being a view. It may add some complexity to maintain, and cause additional consideration when you want to change the schema of the underlying tables. If you were using views and they joined to the same tables, I think that would be less efficient than joining to the table once in one view.
I favour using nested views, with each view encapsulating and naming some cross section of data.
As for performance, it can actually improve performance if the alternative required that same data to be queried multiple times: A nested view is a bit like a temporary table - fired once.
The best, and recommended, way to discover performance implications is to try both options and examine the explain output.
The pure fact of querying a view from within a view does not have any negative performance implications. It is not different from querying a table from within a view.
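As a small illustration of the point that a view on top of a view is still just one query, here is a sketch using Python's built-in sqlite3 module as a stand-in engine (the answers above are about SQL Server). The orders table, the two views and the sample rows are invented for the example. It only checks that the nested views and the hand-flattened query return the same rows; whether the plans also match on your engine is something to confirm with its explain output, as suggested above.

```python
import sqlite3

# Hypothetical schema and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer TEXT, total INTEGER, status TEXT
    );
    INSERT INTO orders (customer, total, status) VALUES
        ('ann', 100, 'paid'), ('bob', 40, 'paid'),
        ('ann', 60, 'open'), ('cy', 250, 'paid');

    -- First layer: one named piece of logic.
    CREATE VIEW paid_orders AS
        SELECT id, customer, total FROM orders WHERE status = 'paid';

    -- Second layer: a view that reads from the first view.
    CREATE VIEW big_paid_orders AS
        SELECT id, customer, total FROM paid_orders WHERE total >= 100;
""")

# Query through the nested views.
nested = conn.execute(
    "SELECT customer, total FROM big_paid_orders ORDER BY id"
).fetchall()

# The same logic written by hand against the base table.
flat = conn.execute(
    "SELECT customer, total FROM orders "
    "WHERE status = 'paid' AND total >= 100 ORDER BY id"
).fetchall()

print(nested)
print(flat)
```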

Is there a reason not to use views in Oracle?

I have recently noticed that nobody uses views in my company (and it's a big company).
I want to create a few views largely because they make my queries simpler to the eye, and these views are on somewhat big tables that don't get very frequent updates (once a day).
My alternative is to create a type table of type record and populate it each time a SP is called. Is this better than using a view? (my guess is no)
PS: database is oracle 10g and
EDIT:
- Yes, I have asked around, but no one could give me a reason.
- Both the views and the queries that will be using them are heavy on joins.
Aesthetics doesn't have a place in SQL, or coding in general, when there's performance implications.
If the optimizer determines that predicate pushing can occur, a view will be as good as directly querying the table(s) the view represents. And as Justin mentions, because a view is a macro that expands into the underlying query the view represents -- a soft parse (re-use of the query from cache) is very likely because the cache check needs to match queries exactly.
But be aware of the following:
layering views (one view based on another) is a bad practice -- errors won't be encountered until the view is run
joining views to other tables and/or views is highly suspect -- the optimizer might not see things as well as it would if the underlying query were in place of the view reference. I've had such experiences, because the views joined to were doing more than what the query needed -- sometimes, the queries from all the views used were condensed into a single query that ran much better.
I recommend creating your views, and comparing the EXPLAIN plans to make sure that you are at least getting identical performance. I'd need to see your code for populating a TYPE before commenting on the approach, but it sounds like a derived table in essence...
It's possible you would benefit from using materialized views, but they are notoriously restricted in what they support.
It certainly sounds like creating some views would be helpful in this case.
Have you asked around to see why no one uses views? That seems quite odd and would certainly tend to indicate that you're not reusing your SQL very efficiently. Without views, you'd tend to put the same logic in many different SQL statements rather than in a single view which would make maintenance a pain.
One reason not to use views which may or may not be valid... is that they have the potential to create complexity where there isn't any
For example I could write
CREATE VIEW foo as <SOME COMPLEX QUERY>
then later I could write
CREATE PROCEDURE UseFoo AS
BEGIN
    SELECT
        somefields
    FROM
        x
        INNER JOIN foo
    .....
So now I'm creating two objects that need to be deployed, maintained, version controlled etc...
Or I could write either
CREATE PROCEDURE UseFoo AS
BEGIN
    WITH foo AS (<SOME COMPLEX QUERY>)
    SELECT
        somefields
    FROM
        x
        INNER JOIN foo
    .....
or
CREATE PROCEDURE UseFoo AS
BEGIN
    SELECT
        somefields
    FROM
        x
        INNER JOIN <SOME COMPLEX QUERY> foo
    .....
And now I only need to deploy, maintain, and version control a single object.
If <SOME COMPLEX QUERY> only exists in one context, maintaining two separate objects creates an unnecessary burden. Also, after deployment, any change to <SOME COMPLEX QUERY> requires evaluating everything that relies on UseFoo. With two objects, you need to evaluate everything that relies on UseFoo and on foo.
Of course on the other hand if Foo represents some shared logic the evaluation is required anyway but you only have to find and change a single object.
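The three ways of packaging the same logic described above (a separate view, a CTE, and an inline derived table) can be compared side by side. This is a sketch only, run through Python's built-in sqlite3 module rather than Oracle or SQL Server, and without the stored procedure wrapper since SQLite has none. The tables x and y, the aggregate standing in for the complex query, and the join condition are all assumptions made for the example.

```python
import sqlite3

# Hypothetical stand-in for <SOME COMPLEX QUERY>.
COMPLEX_QUERY = "SELECT x_id, SUM(qty) AS total FROM y GROUP BY x_id"

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE x (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE y (x_id INTEGER, qty INTEGER);
    INSERT INTO x VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO y VALUES (1, 5), (1, 7), (2, 1);

    -- Option 1: the logic lives in a separate, deployable view.
    CREATE VIEW foo AS {COMPLEX_QUERY};
""")

outer = "SELECT x.label, foo.total FROM x INNER JOIN {src} ON foo.x_id = x.id ORDER BY x.id"

# Option 1: join to the view.
with_view = outer.format(src="foo")
# Option 2: the same logic as a CTE inside the statement.
with_cte = f"WITH foo AS ({COMPLEX_QUERY}) " + outer.format(src="foo")
# Option 3: the same logic as an inline derived table.
with_derived = outer.format(src=f"({COMPLEX_QUERY}) AS foo")

rows_view = conn.execute(with_view).fetchall()
rows_cte = conn.execute(with_cte).fetchall()
rows_derived = conn.execute(with_derived).fetchall()
print(rows_view, rows_cte, rows_derived)
```

All three forms return the same rows, so the choice between them is about how many objects you want to deploy and maintain, which is the trade-off this answer describes.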
It has been my experience that when you have a large/complex database and some complex queries and no views, it is just because the users don't know what views are, or how to use them. Once I explained the benefits of using a view, most people used them without any problems.
From your description, I would just make a view, not a new table.
Views are great for hiding complexity -- if your users can just run the views you create as-is (as opposed to writing a query against the view), this is good.
But views also come with performance issues -- if your users know how to write sql, and they understand the tables they're using, it might be better to let them keep doing that.
Consider also that stored procedures are less prone to (the same) performance issues that views are.
Here is a link to, and a snippet from, a nice article that describes views as well as how to tune them for better performance.
Uses of Views
Views are useful for providing a horizontal or vertical subset of data
from a table (possibly for security reasons) ; for hiding the
complexity of a query; for ensuring that exactly the same SQL is used
throughout your application; and in n-tier applications to retrieve
supplementary information about an item from a related table......
http://www.smart-soft.co.uk/Oracle/oracle-tuning-part4-vw-use.htm

Performance of Tables vs. Views

Recently started working with a database in which the convention is to create a view for every table. If you assume that there is a one to one mapping between tables and views, I was wondering if anyone could tell me the performance impacts of doing something like this. BTW, this is on Oracle.
Assuming the question is about non-materialized views -- Really depends on the query that the view is based on, and what is being done to it. Sometimes, the predicates can be pushed into the view query by the optimizer. If not, then it wouldn't be as good as against the table itself. Views are built on top of tables -- why would you expect that the performance would be better?
Layering views, where you build one view on top of another, is a bad practice because you won't know about issues until run time. There's also less chance that predicate pushing will occur with layered views.
Views can also be updateable -- they aren't a reliable means to restricting access to resources if someone has INSERT/UPDATE/DELETE privileges on the underlying tables.
Materialized views are as good as tables, but are notoriously restrictive in what they support.
You don't explain what you're doing in the views. A 1:1 mapping with the tables sounds like you are using the views more like synonyms than views. IOW, if the views are just "SELECT * FROM table", then you'll see no performance hit except on hard parse.
If you are joining to other tables or placing filter clauses in them which prevent predicate pushing, then you're bound to see a major hit sometime.
The only pain I have had with views is a distributed query over a DB link. The local optimizer gets some details about the remote object, but the view doesn't tell it about any indexes so you can get some kooky plans.
I've heard about some places that use it as a standard since they can easily 're-order' the columns in a view. Not a big benefit in my opinion, but YMMV.

Is a view faster than a simple query?

Is a
select * from myView
faster than the query itself to create the view (in order to have the same resultSet):
select * from ([query to create same resultSet as myView])
?
It's not totally clear to me if the view uses some sort of caching making it faster compared to a simple query.
Yes, views can have a clustered index assigned and, when they do, they'll store temporary results that can speed up resulting queries.
Microsoft's own documentation makes it very clear that Views can improve performance.
First, most views that people create are simple views and do not use this feature, and are therefore no different to querying the base tables directly. Simple views are expanded in place and so do not directly contribute to performance improvements - that much is true. However, indexed views can dramatically improve performance.
Let me go directly to the documentation:
After a unique clustered index is created on the view, the view's result set is materialized immediately and persisted in physical storage in the database, saving the overhead of performing this costly operation at execution time.
Second, these indexed views can work even when they are not directly referenced by another query as the optimizer will use them in place of a table reference when appropriate.
Again, the documentation:
The indexed view can be used in a query execution in two ways. The query can reference the indexed view directly, or, more importantly, the query optimizer can select the view if it determines that the view can be substituted for some or all of the query in the lowest-cost query plan. In the second case, the indexed view is used instead of the underlying tables and their ordinary indexes. The view does not need to be referenced in the query for the query optimizer to use it during query execution. This allows existing applications to benefit from the newly created indexed views without changing those applications.
This documentation, as well as charts demonstrating performance improvements, can be found here.
Update 2: the answer has been criticized on the basis that it is the "index" that provides the performance advantage, not the "View." However, this is easily refuted.
Let us say that we are a software company in a small country; I'll use Lithuania as an example. We sell software worldwide and keep our records in a SQL Server database. We're very successful and so, in a few years, we have 1,000,000+ records. However, we often need to report sales for tax purposes and we find that we've only sold 100 copies of our software in our home country. By creating an indexed view of just the Lithuanian records, we get to keep the records we need in an indexed cache as described in the MS documentation. When we run our reports for Lithuanian sales in 2008, our query will search through an index with a depth of just 7 (Log2(100) with some unused leaves). If we were to do the same without the VIEW and just relying on an index into the table, we'd have to traverse an index tree with a search depth of 21!
Clearly, the View itself would provide us with a performance advantage (3x) over the simple use of the index alone. I've tried to use a real-world example but you'll note that a simple list of Lithuanian sales would give us an even greater advantage.
Note that I'm just using a straight b-tree for my example. While I'm fairly certain that SQL Server uses some variant of a b-tree, I don't know the details. Nonetheless, the point holds.
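The depth figures in this example can be checked with a few lines of arithmetic. The sketch below follows the same simplification as the answer, namely a plain binary tree where the search depth is the base-2 logarithm of the row count rounded up; the helper name binary_search_depth is made up for illustration. Real B-tree indexes hold many keys per page and are much shallower, so the numbers only illustrate the ratio, not actual SQL Server behaviour.

```python
import math


def binary_search_depth(n_rows: int) -> int:
    """Depth of a balanced binary search over n_rows entries (ceil of log2)."""
    return math.ceil(math.log2(n_rows))


# Index over only the 100 Lithuanian rows (the indexed view in the example).
depth_view = binary_search_depth(100)

# Index over the whole table at exactly 1,000,000 rows.
depth_table = binary_search_depth(1_000_000)

# A depth of 21 is reached once the table grows past 2**20 = 1,048,576 rows,
# which fits the "1,000,000+" wording in the answer.
depth_past_threshold = binary_search_depth(2**20 + 1)

print(depth_view, depth_table, depth_past_threshold)
print(depth_table / depth_view, depth_past_threshold / depth_view)
```

Under this model the view's index has depth 7, the full-table index has depth 20 at exactly one million rows and 21 just past 1,048,576 rows, which gives the roughly 3x ratio quoted above.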
Update 3: The question has come up about whether an Indexed View just uses an index placed on the underlying table. That is, to paraphrase: "an indexed view is just the equivalent of a standard index and it offers nothing new or unique to a view." If this were true, of course, then the above analysis would be incorrect! Let me provide a quote from the Microsoft documentation that demonstrates why I think this criticism is not valid:
Using indexes to improve query performance is not a new concept; however, indexed views provide additional performance benefits that cannot be achieved using standard indexes.
Together with the above quote regarding the persistence of data in physical storage and other information in the documentation about how indices are created on Views, I think it is safe to say that an Indexed View is not just a cached SQL Select that happens to use an index defined on the main table. Thus, I continue to stand by this answer.
Generally speaking, no. Views are primarily used for convenience and security, and won't (by themselves) produce any speed benefit.
That said, SQL Server 2000 and above do have a feature called Indexed Views that can greatly improve performance, with a few caveats:
Not every view can be made into an indexed view; they have to follow a specific set of guidelines, which (among other restrictions) means you can't include common query elements like COUNT, MIN, MAX, or TOP.
Indexed views use physical space in the database, just like indexes on a table.
This article describes additional benefits and limitations of indexed views:
You Can…
The view definition can reference one or more tables in the
same database.
Once the unique clustered index is created, additional nonclustered
indexes can be created against the view.
You can update the data in the underlying tables – including inserts,
updates, deletes, and even truncates.
You Can’t…
The view definition can’t reference other views, or tables
in other databases.
It can’t contain COUNT, MIN, MAX, TOP, outer joins, or a few other
keywords or elements.
You can’t modify the underlying tables and columns. The view is
created with the WITH SCHEMABINDING option.
You can’t always predict what the query optimizer will do. If you’re
using Enterprise Edition, it will automatically consider the unique
clustered index as an option for a query – but if it finds a “better”
index, that will be used. You could force the optimizer to use the
index through the WITH NOEXPAND hint – but be cautious when using any
hint.
EDIT: I was wrong, and you should see Mark's answer above.
I cannot speak from experience with SQL Server, but for most databases the answer would be no. The only potential benefit that you get, performance wise, from using a view is that it could potentially create some access paths based on the query. But the main reason to use a view is to simplify a query or to standardize a way of accessing some data in a table. Generally speaking, you won't get a performance benefit. I may be wrong, though.
I would come up with a moderately more complicated example and time it yourself to see.
In SQL Server at least, Query plans are stored in the plan cache for both views and ordinary SQL queries, based on query/view parameters. For both, they are dropped from the cache when they have been unused for a long enough period and the space is needed for some other newly submitted query. After which, if the same query is issued, it is recompiled and the plan is put back into the cache. So no, there is no difference, given that you are reusing the same SQL query and the same view with the same frequency.
Obviously, a view, by its very nature (someone thought it would be used often enough to make it into a view), is generally more likely to be "reused" than any arbitrary SQL statement.
Definitely a view is better than a nested query for SQL Server. Without knowing exactly why it is better (until I read Mark Brittingham's post), I had run some tests and experienced almost shocking performance improvements when using a view versus a nested query. After running each version of the query several hundred times in a row, the view version of the query completed in half the time. I'd say that's proof enough for me.
It may be faster if you create a materialized view (with schema binding). Non-materialized views execute just like the regular query.
My understanding is that a while back, a view would be faster because SQL Server could store an execution plan and then just use it instead of trying to figure one out on the fly. I think the performance gains nowadays are probably not as great as they once were, but I would guess there would be some marginal improvement from using the view.
I would expect the two queries to perform identically. A view is nothing more than a stored query definition, there is no caching or storing of data for a view. The optimiser will effectively turn your first query into your second query when you run it.
It all depends on the situation. MS SQL indexed views are faster than a normal view or query, but indexed views cannot be used in a mirrored database environment (MS SQL).
A view in any kind of a loop will cause serious slowdown because the view is repopulated each time it is called in the loop. Same as a query. In this situation a temporary table using # or # to hold your data to loop through is faster than a view or a query.
So it all depends on the situation.
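The loop scenario above can be sketched as follows: the view is re-evaluated on every call, while a temporary table is filled once and then only read. This is an illustration under assumptions, using Python's built-in sqlite3 module instead of SQL Server (so CREATE TEMP TABLE ... AS SELECT replaces the SQL Server temp table syntax), with a made-up readings table and sensor_totals view. It shows the pattern, not a benchmark.

```python
import sqlite3

# Hypothetical schema and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (sensor TEXT, value INTEGER);
    INSERT INTO readings VALUES
        ('a', 1), ('a', 3), ('b', 10), ('b', 20), ('c', 7);

    CREATE VIEW sensor_totals AS
        SELECT sensor, SUM(value) AS total FROM readings GROUP BY sensor;

    -- Materialize the view's result once, before the loop starts.
    CREATE TEMP TABLE sensor_totals_tmp AS
        SELECT sensor, total FROM sensor_totals;
""")

sensors = ["a", "b", "c"]

# Each iteration runs the view's aggregate query again.
from_view = [
    conn.execute(
        "SELECT total FROM sensor_totals WHERE sensor = ?", (s,)
    ).fetchone()[0]
    for s in sensors
]

# Each iteration only reads the already computed rows.
from_temp = [
    conn.execute(
        "SELECT total FROM sensor_totals_tmp WHERE sensor = ?", (s,)
    ).fetchone()[0]
    for s in sensors
]

print(from_view, from_temp)
```

The temporary table is a snapshot: rows inserted into readings after it is created show up in the view but not in the temporary table, so this only suits loops that can work on data as of the start of the loop.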
There should be some trivial gain in having the execution plan stored, but it will be negligible.
In my finding, using the view is a little bit faster than a normal query. My stored procedure was taking around 25 minutes (working with a different larger record sets and multiple joins) and after using the view (non-clustered), the performance was just a little bit faster but not significant at all. I had to use some other query optimization techniques/method to make it a dramatic change.
Selecting from a view rather than from a table will not make much difference, provided of course that the view does not contain unnecessary joins, fields, etc. You can check the execution plan of your queries, and the joins and indexes used, to improve the view's performance.
You can even create index on views for faster search requirements. http://technet.microsoft.com/en-us/library/cc917715.aspx
But if you are searching like '%...%' then the SQL engine will not benefit from an index on a text column. If you can force your users to make searches like '...%' then that will be fast.
referred to answer on asp forums :
https://forums.asp.net/t/1697933.aspx?Which+is+faster+when+using+SELECT+query+VIEW+or+Table+
Against all expectation, views are way slower in some circumstances.
I discovered this recently when I had problems with data which was pulled from Oracle which needed to be massaged into another format. Maybe 20k source rows. A small table. To do this we imported the oracle data as unchanged as I could into a table and then used views to extract data.
We had secondary views based on those views. Maybe 3-4 levels of views.
One of the final queries, which extracted maybe 200 rows would take upwards of 45 minutes! That query was based on a cascade of views. Maybe 3-4 levels deep.
I could take each of the views in question, insert its sql into one nested query, and execute it in a couple of seconds.
We even found that we could even write each view into a temp table and query that in place of the view and it was still way faster than simply using nested views.
What was even odder was that performance was fine until we hit some limit of source rows being pulled into the database; then performance just dropped off a cliff over the space of a couple of days. A few more source rows was all it took.
So, using queries which pull from views which pull from views is much slower than a nested query, which makes no sense to me.
There is no practical difference, and if you read BOL you will find that even your plain old SQL SELECT * FROM X does take advantage of plan caching etc.
The purpose of a view is to use the query over and over again. To that end, SQL Server, Oracle, etc. will typically provide a "cached" or "compiled" version of your view, thus improving its performance. In general, this should perform better than a "simple" query, though if the query is truly very simple, the benefits may be negligible.
Now, if you're doing a complex query, create the view.
No. A view is just a short form of your actual long SQL query. If anything, you can say the actual query is faster than the query against the view.
The view query is first translated into the plain query and then executed, so the view query will take more time to execute than the plain query.
You can use SQL views when you are joining multiple tables, to reuse a complicated query again and again in a simple manner.
I ran across this thread and just wanted to share this post from Brent Ozar as something to consider when using availability groups.
Brent Ozar bug report