Performance when querying a View - sql

I'm wondering if this is a bad practice or if in general this is the correct approach.
Let's say that I've created a view that combines a few attributes from a few tables.
My question, what do I need to do so I can query against this view as if it were a table without worrying about performance?
All attributes in the original tables are indexed, my concern is that the result view will have hundreds of thousands of records, which I will want to narrow down quite a bit based on user input.
What I'd like to avoid, is having multiple versions of the code that generates this view floating around with a few extra "where" conditions to facilitate the user input filtering.
For example, assume my view has this header: VIEW(Name, Type, DateEntered). It may have 100,000+ rows (possibly millions). I'd like to be able to create this view in SQL Server, and then in my application write queries like this:
SELECT Name, Type, DateEntered FROM MyView WHERE DateEntered BETWEEN #date1 and #date2;
Basically, I am denormalizing my data for a series of reports that need to be run, and I'd like to centralize where I pull the data from, maybe I'm not looking at this problem from the right angle though, so I'm open to alternative ways to attack this.

My question, what do I need to do so I can query against this view as if it were a table without worrying about performance?
SQL Server is very good at view unnesting.
Your queries will be as efficient as if the view's query were used in the query itself.
This means that
CREATE VIEW myview AS
SELECT *
FROM /* complex joins */
SELECT *
FROM mytable
JOIN myview
ON …
and
SELECT *
FROM mytable
JOIN (
SELECT *
FROM /* complex joins */
) myview
ON …
will have the same performance.
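As a rough illustration of that equivalence, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here, and the tables (customers, orders) and the view name order_names are hypothetical. It shows that the view and the inlined derived table return the same rows; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 50), (11, 1, 70), (12, 2, 20);

    -- Hypothetical stand-in for the "complex joins" view.
    CREATE VIEW order_names AS
    SELECT o.id AS order_id, c.name AS name, o.total AS total
    FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

# Query through the view.
via_view = conn.execute(
    "SELECT order_id, name, total FROM order_names "
    "WHERE total > 30 ORDER BY order_id"
).fetchall()

# Same query with the view's definition pasted in as a derived table.
via_inline = conn.execute("""
    SELECT order_id, name, total
    FROM (
        SELECT o.id AS order_id, c.name AS name, o.total AS total
        FROM orders o JOIN customers c ON c.id = o.customer_id
    ) AS order_names
    WHERE total > 30
    ORDER BY order_id
""").fetchall()

print(via_view == via_inline)
```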

SQL Server 2005 has indexed views - these provide indexes on views. That should help with performance. If the underlying tables already have good indexes on the queried fields, these will be used - you should only add indexed views when this is not the case.
These are known in other database systems as materialized views.

The view will make use of the index in your WHERE clause to filter the results.
Views aren't stored result sets. They're stored queries, so you'll have the performance gained from your indexes each time you query the view.
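A small sketch of the "stored query, not stored result" point, again using SQLite through Python's sqlite3 module as a stand-in for SQL Server. The table name entries, the index name, and the sample dates are made up for illustration. A row inserted after the view is created shows up in the next query against the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (name TEXT, type TEXT, date_entered TEXT);
    CREATE INDEX idx_entries_date ON entries (date_entered);
    CREATE VIEW my_view AS
    SELECT name, type, date_entered FROM entries;
    INSERT INTO entries VALUES ('a', 'x', '2009-01-05');
""")

query = "SELECT COUNT(*) FROM my_view WHERE date_entered BETWEEN ? AND ?"
before = conn.execute(query, ("2009-01-01", "2009-01-31")).fetchone()[0]

# The view is re-evaluated on each query, so the new row is visible immediately.
conn.execute("INSERT INTO entries VALUES ('b', 'y', '2009-01-20')")
after = conn.execute(query, ("2009-01-01", "2009-01-31")).fetchone()[0]

print(before, after)  # prints: 1 2
```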

Why would it perform badly? I mean, you can think of a view as a compiled select statement. It makes use of existing indexes on the underlying tables, even when you add extra where clauses. In my opinion it is a good approach. In any case it's better than having virtually the same select statement scattered all over your application (from a design and maintainability point of view at least).

If not indexed then...
When you query a view, the view itself is ignored as an object: its definition is expanded into the main query.
It is the same as querying the main tables directly.
What will kill you is view on top of view on top of view, in my experience.

It should, in general, perform no worse than the inline code.
Note that it is possible to make views which hide very complex processing (joins, pivots, stacked CTEs, etc), and you may never want anyone to be able to SELECT * FROM view on such a view for all time or all products or whatever. If you have standard filter criteria, you can use an inline table-valued function (effectively a parameterized view), which would require all users to supply the expected parameters.
In your case, for instance, they would always have to do:
SELECT Name, Type, DateEntered
FROM MyITVF(#date1, #date2);
To share the view logic between multiple ITVFs, you can build many inline table-valued functions on top of the view, but not give access to the underlying tables or views to users.
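SQLite has no table-valued functions, so the following is only an application-side analogue of the idea, not a T-SQL ITVF: a hypothetical helper, entries_between, that is the single way to reach the view and that cannot be called without the date range. The table and view names are assumptions for the sketch.

```python
import sqlite3

def entries_between(conn, date1, date2):
    """Return (name, type, date_entered) rows from my_view within a required date range."""
    return conn.execute(
        "SELECT name, type, date_entered FROM my_view "
        "WHERE date_entered BETWEEN ? AND ? ORDER BY date_entered",
        (date1, date2),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (name TEXT, type TEXT, date_entered TEXT);
    CREATE VIEW my_view AS SELECT name, type, date_entered FROM entries;
    INSERT INTO entries VALUES
        ('a', 'x', '2009-01-05'),
        ('b', 'y', '2009-02-10'),
        ('c', 'x', '2009-03-15');
""")

# Callers must always supply both bounds; there is no unfiltered entry point.
rows = entries_between(conn, "2009-02-01", "2009-03-31")
print(rows)
```

In SQL Server itself the same restriction would be enforced by the function signature plus permissions, as described above.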

Related

What exactly is a sql "view"

I suppose the definition might be different for different databases (I've tagged a few databases in the question), but suppose I have the following (in pseudocode):
CREATE VIEW myview AS
SELECT * FROM mytable GROUP BY name
And then I can query the view like so:
SELECT * FROM myview WHERE name like 'bob%'
What exactly is the "view" doing in this case? Is it just a short-hand and the same as doing:
SELECT * FROM (
SELECT * FROM mytable GROUP BY name
) myview WHERE name like 'bob%'
Or does creating a view reserve storage (or memory, indexes, whatever else)? In other words, what are the internals of what happens when a view is created and accessed?
A view is a name that refers to a stored SQL query. When the view is referenced, the query's definition is substituted into the referencing query. It is basically the shorthand that you describe.
A view is defined by the standard and is pretty much the same thing across all databases.
A view does not permanently store data. Each time it is referenced the code is run. One caveat is that -- in some databases -- the view may be pre-compiled, so the pre-compiled code is actually included in the query plan.
By contrast, some databases support materialized views. These are very different beasts and they do store data.
Some other reasons for views:
Not everyone is a SQL expert, so the database administrator (DBA) might develop views consisting of complex joins on multiple tables, giving users easy access to data they need but might not know how best to retrieve.
On some databases you can also create read-only views. Again, a DBA might create these to limit what operations a user can perform on certain tables.
A DBA might also create a view to limit what columns of a table a user can see.

use of views to protect the actual tables in sql

How do views act as a mediator between the actual tables and an end user? What internal process occurs when a view is created? I mean, when a view is created on a table, does it stand like a wall between the table and the end user, or something else? How do views protect the actual tables: only with the CHECK OPTION? And if a user inserts directly into the table, how do I protect the actual tables?
If he/she does not use insert into **vw** values(), but uses insert into **table_name** values(), then how is the table protected?
Non-materialized views are just prepackaged SQL queries. They execute the same as any derived table/inline view. Multiple references to the same view will run the query the view contains for every reference. For example:
CREATE VIEW vw_example AS
SELECT id, column, date_column FROM ITEMS
SELECT x.*, y.*
FROM vw_example x
JOIN vw_example y ON y.id = x.id
...translates into being:
SELECT x.*, y.*
FROM (SELECT id, column, date_column FROM ITEMS) x
JOIN (SELECT id, column, date_column FROM ITEMS) y ON y.id = x.id
Caching
The primary benefit is caching because the query will be identical. Queries are cached, including the execution plan, in order to make the query run faster later on because the execution plan has been generated already. Caching often requires queries to be identical to the point of case sensitivity, and expires eventually.
Predicate Pushing
Another potential benefit is that views often allow "predicate pushing", where the optimizer pushes criteria specified on the view into the query the view represents. This means the query could scan the table once with the filter applied, rather than scanning the table to build the view's full result set and then filtering it for the outer/ultimate query.
SELECT x.*
FROM vw_example x
WHERE x.column = 'y'
...could be interpreted by the optimizer as:
SELECT id, column, date_column
FROM ITEMS
WHERE column = 'y'
The decision for predicate pushing lies solely with the optimizer. I'm unaware of any ability for a developer to force the decision, only that it really depends on the query the view uses and what additional criteria are being applied.
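One way to see what an optimizer did with a filter on a view is to ask for the plan. The sketch below does this in SQLite via Python's sqlite3 module, purely as an illustration; SQL Server's optimizer and plan output are different. The column is named col rather than column to avoid the reserved word, and the index name idx_items_col is an assumption. The plan text varies by SQLite version, so it is printed rather than relied upon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, col TEXT, date_column TEXT);
    CREATE INDEX idx_items_col ON items (col);
    CREATE VIEW vw_example AS SELECT id, col, date_column FROM items;
    INSERT INTO items VALUES
        (1, 'y', '2009-01-01'),
        (2, 'n', '2009-01-02'),
        (3, 'y', '2009-01-03');
""")

# Filter applied on the view, not on the base table.
filtered = conn.execute(
    "SELECT id FROM vw_example WHERE col = 'y' ORDER BY id"
).fetchall()

# Ask SQLite how it would run the filtered view query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, col, date_column FROM vw_example WHERE col = 'y'"
).fetchall()
for row in plan:
    print(row[-1])  # the last column holds the human-readable plan step
```

If the plan step mentions a search on items using the index on col, the predicate was applied against the base table rather than after materializing the view's rows.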
Commentary on Typical Use of Non-materialized Views
Sadly, it's very common to see a non-materialized SQL view used for nothing more than encapsulation to simplify writing queries -- simplification which isn't a recommended practice either. SQL is SET based, and doesn't optimize well using procedural approaches. Layering views on top of one another is also not a recommended practice.
Updateable Views
Non-materialized views are also updatable, but there are restrictions because a view can be made of numerous tables joined together. An updatable, non-materialized view can stop a user from inserting new records while still allowing updates to existing ones. How much the CHECK OPTION restricts updates depends on the query used to create the view, and it is not enough to ensure that no unwanted changes ever happen. The only reliable means of securing against unwanted inserts, updates, and deletes is to grant proper privileges to the user, preferably via a role.
Views do not protect tables, though they can be used in a permissions-based table-protection scheme. Views simply provide a convenient way to access tables. If you give a user access to views and not tables, then you have probably greatly restricted access.

Why would you want to put an index on a view?

Microsoft SQL Server allows you to add an index to a view, but why would you want to do this?
My understanding is that a view is really just a subquery, i.e., if I say SELECT * FROM myView, I'm really saying SELECT * FROM (myView's Query)
It seems like the indexes on the underlying tables would be the ones that matter the most. So why would you want a separate index on the view?
If the view is indexed then any queries that can be answered using the index only will never need to refer to the underlying tables. This can lead to an enormous improvement in performance.
Essentially, the database engine is maintaining a "solved" version of the query (or, rather, the index of the query) as you update the underlying tables, then using that solved version rather than the original tables when possible.
Here is a good article in Database Journal.
Microsoft SQL Server allows you to add an index to a view, but why would you want to do this?
To speed up the queries.
My understanding is that a view is really just a subquery, i.e., if I say SELECT * FROM myView, I'm really saying SELECT * FROM (myView's Query)
Not always.
By creating a clustered index on a view, you materialize the view, and updates to the underlying tables physically update the view. The queries against this view may or may not access the underlying tables.
Not all views can be indexed.
For instance, if you use GROUP BY in a view, then for the view to be indexable it must contain a COUNT_BIG, and all aggregate functions in it must distribute over UNION ALL (only SUM and COUNT_BIG actually do). This is required so that the index is maintainable and updates to the underlying tables can be applied to the view in a timely fashion.
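To make the maintenance idea concrete, here is a hand-rolled analogy in SQLite (via Python's sqlite3 module): a summary table kept in sync by triggers. This is not how SQL Server implements indexed views; the table names, trigger names, and sample data are all hypothetical. The row_count column plays a role loosely comparable to COUNT_BIG in this sketch: it lets the delete trigger notice when a group has become empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        product TEXT NOT NULL,
        amount INTEGER NOT NULL
    );

    -- Precomputed "solved" version of: SELECT product, SUM(amount), COUNT(*) ... GROUP BY product
    CREATE TABLE sales_summary (
        product TEXT PRIMARY KEY,
        total INTEGER NOT NULL,
        row_count INTEGER NOT NULL
    );

    CREATE TRIGGER sales_after_insert AFTER INSERT ON sales BEGIN
        INSERT OR IGNORE INTO sales_summary (product, total, row_count)
        VALUES (NEW.product, 0, 0);
        UPDATE sales_summary
        SET total = total + NEW.amount, row_count = row_count + 1
        WHERE product = NEW.product;
    END;

    CREATE TRIGGER sales_after_delete AFTER DELETE ON sales BEGIN
        UPDATE sales_summary
        SET total = total - OLD.amount, row_count = row_count - 1
        WHERE product = OLD.product;
        -- Without the count there would be no way to tell an empty group from a zero total.
        DELETE FROM sales_summary WHERE product = OLD.product AND row_count = 0;
    END;
""")

conn.executemany(
    "INSERT INTO sales (product, amount) VALUES (?, ?)",
    [("pen", 3), ("pen", 4), ("ink", 10)],
)
conn.execute("DELETE FROM sales WHERE product = 'ink'")

summary = conn.execute(
    "SELECT product, total, row_count FROM sales_summary ORDER BY product"
).fetchall()
recomputed = conn.execute(
    "SELECT product, SUM(amount), COUNT(*) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(summary == recomputed)
```

SUM and a row count can be adjusted incrementally like this, which is the intuition behind the restriction on which aggregates are allowed.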
The following link explains this better than I can, especially in the section on performance increases. Hope it helps.
http://technet.microsoft.com/en-us/library/cc917715.aspx
You create an index on a view for the same reason as on a base table: to improve the performance of queries against that view. Another reason for doing it is to implement some uniqueness constraint you can't implement against base tables. SQL Server unfortunately doesn't allow constraints to be created on views.

A SQL story in 2 parts - Are SQL views always good and how can I solve this example?

I'm building a reporting app, and so I'm crunching an awful lot of data. Part of my approach to creating the app in an agile way is to use SQL views to take the strain off the DB if multiple users are all bashing away.
One example is:
mysql_query("CREATE VIEW view_silverpop_clicks_baby_$email AS SELECT view_email_baby_position.EmailAddress, view_email_baby_position.days, silverpop_campaign_emails.id, silverpop_actions.`Click Name` , silverpop_actions.`Mailing Id`
FROM silverpop_actions
INNER JOIN view_email_baby_position ON (silverpop_actions.Email = view_email_baby_position.EmailAddress ) , silverpop_campaign_emails
WHERE silverpop_campaign_emails.id = $email
AND view_email_baby_position.days
BETWEEN silverpop_campaign_emails.low
AND silverpop_campaign_emails.high
AND silverpop_actions.`Event Type` = 'Click Through'") or die(mysql_error());
And then later in the script this view is used to calculate the number of clicks a particular flavour of this email has had.
$sql = "SELECT count(*) as count FROM `view_silverpop_clicks_baby_$email` WHERE `Click Name` LIKE '$countme%'";
My question is in 2 parts really:
1.) Are views always good? Can you have too many?
2.) Could I create yet another set of views to cache the count variable in the second snippet of code? If so, how could I approach this? I can't quite make this out yet.
Thanks!
To answer your questions.
1.) I don't know that I can think of an instance where views are BAD in and of themselves, but it would be bad to use them unnecessarily. Whether you can have too many really depends on your situation.
2.) Having another set of views will not cache the count variable so it wouldn't be beneficial from that standpoint.
Having said that, I think you have a misunderstanding on what a view actually does. A view is just a definition of a particular SQL statement and it does not cache data. When you execute a SELECT * FROM myView;, the database is still executing the select statement defined in the CREATE VIEW definition just as it would if a user was executing that statement.
Some database vendors offer a different kind of view called a materialized view. In this case the table data needed to create the view is stored/cached and is usually updated based on a refresh rate specified when you create it. This is "heavy" in the sense that your data is stored twice, but can create better execution plans because the data is already joined, aggregated, etc. Note though, you only see the data based on the last refresh of the materialized view, where with a normal view you see the data as it currently exists in the underlying tables. Currently, MySQL does not support materialized views.
Some useful uses of views are to:
Create easier/cleaner SQL statements for complex queries (which is something you are doing)
Security. If you have tables where you want a user to be able to see some columns or rows, but not other columns/rows, you restrict access to the base table and create a view of the base table that only selects the columns/rows that the user should have access to.
Create aggregations of tables
Views are used by the query optimizer, so they often help in querying for information more efficiently.
Indexed or materialized views, however, create a table with the required information, which can make quite a difference. Think of it as denormalizing your DB schema without changing the existing schema. You get the best of both worlds.
Some views are never used, so they represent needless complexity, which is bad.
Indexed views cannot reference other views (MSSQL), so there's hardly a point in creating such a view.

Views or table functions or something else

I designed 5 stored procedures which use almost the same join conditions, but the parameters or values in the WHERE clause differ between runs.
Is the best solution to create a view with all the join conditions but no WHERE clause, and then query from or work on that view? Will a view update itself automatically once I create it?
Can I use sub-queries, or a query similar to the one below? (I think I read somewhere that views do not support sub-queries, but I'm not 100% sure.)
select count(x1) as x1cnt, count(x2) as x2count
from (
select x1,x2,
(
case when x1 = 'y' then 1 else 0 end +
case when x2 = 'y' then 1 else 0 end
) per
from vw_viewname) v1
where v1.per = 1
Updated below:
In my queries I also use joins similar to this:
select c1,c2,c3
FROM [[join conditions - 5 tables]]
Inner join
(
select x1,x2,x3, some case statements
FROM [[join conditions - 5 tables]]
where t1.s1 = val1 and t2.s2 = v2 etc
) s
on s.id = id
So I'm using the join twice, and I wondered whether I could reduce that by using views.
Leaving out their where clause could make the query run more slowly or just give more results than a specific query would. But you will have to determine if that is advantageous based on your system.
You will get the common view result set to work with. A view basically runs its query when you use it, so you will get the same results as if you had run the query yourself by some other mechanism. You can do sub-queries on a view just as if it were another table; that should not be a problem. But if you have 5 different queries doing 5 specific things, then it is probably beneficial to leave them that way. One or two of them may be called more often, and you would be trading off their performance for a general view and gaining nothing other than view reuse.
I would only construct the view if you have some specific benefit from doing so.
Also, I found this post that may be similar. I don't know whether you will find it helpful or not.
EDIT: Well, I think it would just make it worse, really. You would just be calling the view twice, and if it's a generic view, each of those calls is going to get a lot of generic results to deal with.
I would say just focus on optimizing those queries to give you exactly what you need. That's really what you have 5 different procedures for anyway, right? :)
It's 5 different queries so leave it like that.
It's seductive to encapsulate similar JOINs in a view, but before you know it you have views on top of views and awful performance. I've seen it many times.
The "subquery in a view" thing probably refers to indexed views which have limitations.
Unless you're talking about an indexed view, the view will actually run its query to generate the results on demand. In that regard, it would be the same as using a subquery.
If I were you, I would leave it as it is. It may seem like you should compact your code (each of the 5 scripts has almost the same code), but it's what is different that is important here.
You can have subqueries in a view, and that approach is perfectly acceptable.
SQL Server views do support sub-queries. And, in a sense, views do auto-update themselves, because a view is not a persisted object (unless you use an indexed view). With a non-indexed view, each time you query the view, it uses the underlying tables. So your view will be as up to date as the tables it is based upon.
It sounds to me like a view would be a good choice here.
It's fine to create a view, even if it contains a subselect. You can remove the where for the view.
Are you sure you want to use COUNT like that without a GROUP BY? It counts the number of rows that contain non-null values for the parameter.
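A quick sketch of that COUNT behaviour, run against SQLite through Python's sqlite3 module with a made-up three-row table t. COUNT(x1) counts every non-null x1, whatever its value, while a conditional SUM counts only the 'y' rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x1 TEXT, x2 TEXT);
    INSERT INTO t VALUES ('y', NULL), ('n', 'y'), (NULL, 'n');
""")

count_x1, count_x2, y_in_x1, total_rows = conn.execute("""
    SELECT COUNT(x1),                                   -- non-null x1 values
           COUNT(x2),                                   -- non-null x2 values
           SUM(CASE WHEN x1 = 'y' THEN 1 ELSE 0 END),   -- rows where x1 is 'y'
           COUNT(*)                                     -- all rows
    FROM t
""").fetchone()

print(count_x1, count_x2, y_in_x1, total_rows)  # prints: 2 2 1 3
```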
I've done a lot of presentations recently on the simplification offered by the Query Optimiser. Essentially if you have planned your joins well enough, the system can see that they're redundant and ignore them completely.
http://msmvps.com/blogs/robfarley/archive/2008/11/09/join-simplification-in-sql-server.aspx
Stored procedures will do the same work each time (parameters having some effect), but a view (or inline TVF) will be expanded into the outer query and simplified out.