Taking into consideration that:
My cube's DSV reads only from views on the DW
I have access to create and alter these views
Let's say that I need a new field in my table CUSTOMER on the DSV, which is mapped directly to a view vwCustomer on the DW.
All the information necessary for this new field can be found in the view vwCustomer. Is there any advantage to creating that field as a named calculation over altering the view?
No advantages, only pros and cons...
Pros of adding to DSV:
Doesn't interfere with other applications that use vwCustomer with SELECT * queries
...
Cons of adding to DSV:
Calculation isn't available to other applications/reports that may need to leverage it.
...
That's actually all I can think of off the top of my head. There is definitely no performance benefit to either method. My preference would be to implement it in the view.
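To make the SELECT * trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the warehouse. The customer table, its columns, and the full_name field are hypothetical, and SQLite has no ALTER VIEW, so the view is dropped and recreated instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO customer VALUES (1, 'Ada', 'Lovelace');
    CREATE VIEW vwCustomer AS SELECT id, first_name, last_name FROM customer;
""")

def column_names(query):
    """Return the column names a query exposes to its consumers."""
    return [d[0] for d in conn.execute(query).description]

before = column_names("SELECT * FROM vwCustomer")

# Add the new field in the view itself (drop and recreate, since SQLite
# lacks ALTER VIEW). Every consumer of the view now sees the extra column.
conn.executescript("""
    DROP VIEW vwCustomer;
    CREATE VIEW vwCustomer AS
        SELECT id, first_name, last_name,
               first_name || ' ' || last_name AS full_name
        FROM customer;
""")

after = column_names("SELECT * FROM vwCustomer")
full_name = conn.execute(
    "SELECT full_name FROM vwCustomer WHERE id = 1"
).fetchone()[0]
```

The new field becomes available to every application that reads the view, which is the benefit, but any SELECT * consumer that depends on the old column list will also see its shape change.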
Related
Microsoft SQL Server allows you to add an index to a view, but why would you want to do this?
My understanding is that a view is really just a subquery, i.e., if I say SELECT * FROM myView, I'm really saying SELECT * FROM (myView's Query)
It seems like the indexes on the underlying tables would be the ones that matter the most. So why would you want a separate index on the view?
If the view is indexed then any queries that can be answered using the index only will never need to refer to the underlying tables. This can lead to an enormous improvement in performance.
Essentially, the database engine is maintaining a "solved" version of the query (or, rather, the index of the query) as you update the underlying tables, then using that solved version rather than the original tables when possible.
Here is a good article in Database Journal.
Microsoft SQL Server allows you to add an index to a view, but why would you want to do this?
To speed up the queries.
My understanding is that a view is really just a subquery, i.e., if I say SELECT * FROM myView, I'm really saying SELECT * FROM (myView's Query)
Not always.
By creating a clustered index on a view, you materialize the view, and updates to the underlying tables physically update the view. The queries against this view may or may not access the underlying tables.
Not all views can be indexed.
For instance, if you are using GROUP BY in a view, for it to be indexable it should contain a COUNT_BIG and all aggregate functions in it should distribute over UNION ALL (only SUM and COUNT_BIG actually do). This is required so that the index is maintainable and updates to the underlying tables can update the view in a timely fashion.
The following link provides better-worded information than I can give, especially in the section on performance increases. Hope it helps.
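The maintainability requirement can be illustrated with a rough emulation. This is not how SQL Server implements indexed views; it is a sketch in Python's sqlite3 where a hypothetical summary table (sales_by_region) is kept in step with a hypothetical base table (sales) by triggers. SUM can be adjusted by a delta on each insert or delete, and the row count tells the engine when a group has become empty and must disappear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount INTEGER NOT NULL);

    -- Stand-in for the materialized "GROUP BY region" result.
    CREATE TABLE sales_by_region (
        region TEXT PRIMARY KEY, total INTEGER NOT NULL, row_count INTEGER NOT NULL);

    -- On insert: create the group if needed, then apply the delta.
    CREATE TRIGGER sales_ai AFTER INSERT ON sales BEGIN
        INSERT OR IGNORE INTO sales_by_region VALUES (NEW.region, 0, 0);
        UPDATE sales_by_region
           SET total = total + NEW.amount, row_count = row_count + 1
         WHERE region = NEW.region;
    END;

    -- On delete: subtract the delta; the count reveals an emptied group.
    CREATE TRIGGER sales_ad AFTER DELETE ON sales BEGIN
        UPDATE sales_by_region
           SET total = total - OLD.amount, row_count = row_count - 1
         WHERE region = OLD.region;
        DELETE FROM sales_by_region
         WHERE region = OLD.region AND row_count = 0;
    END;
""")

conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10), ("north", 5), ("south", 7)],
)
conn.execute("DELETE FROM sales WHERE region = 'south'")
conn.execute("DELETE FROM sales WHERE region = 'north' AND amount = 5")

# The incrementally maintained result should match a full recomputation.
maintained = conn.execute(
    "SELECT region, total, row_count FROM sales_by_region ORDER BY region"
).fetchall()
recomputed = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

An aggregate such as MAX could not be maintained this way on delete without rescanning the group, which is the intuition behind the restriction.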
http://technet.microsoft.com/en-us/library/cc917715.aspx
You create an index on a view for the same reason as on a base table: to improve the performance of queries against that view. Another reason for doing it is to implement some uniqueness constraint you can't implement against base tables. SQL Server unfortunately doesn't allow constraints to be created on views.
Can you update a view in a database?
If so, how?
If not, why not?
The actual answer is "it depends"; there are no absolutes.
The basic criterion is that it has to be an updateable view in the opinion of the database engine: that is to say, can the engine uniquely identify the row(s) to be updated, and are the fields themselves updateable? If your view has a calculated field or represents the product of a parent/child join then the default answer is probably no.
However, it's also possible to cheat... in MS SQL Server and Oracle (to take just two examples) you can have triggers that fire when you attempt to insert into or update a view, so that you can make something that the server doesn't think is updateable into something that is, usually because you have knowledge that the server can't easily infer from the schema.
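As a small illustration of the trigger approach, here is a sketch using Python's sqlite3 (SQLite also supports INSTEAD OF triggers on views; the syntax in SQL Server and Oracle differs). The person table and person_names view are hypothetical. The view exposes a calculated column that the engine cannot update by itself, and the trigger supplies the missing knowledge of how to split it back into base columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO person VALUES (1, 'Grace', 'Hopper');
    CREATE VIEW person_names AS
        SELECT id, first_name || ' ' || last_name AS full_name FROM person;
""")

# Without a trigger, SQLite refuses to update the view.
try:
    conn.execute("UPDATE person_names SET full_name = 'Grace Murray' WHERE id = 1")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

# The trigger encodes our assumption that full_name is "first last",
# separated by the first space; the engine cannot infer that itself.
conn.executescript("""
    CREATE TRIGGER person_names_upd INSTEAD OF UPDATE ON person_names
    BEGIN
        UPDATE person
           SET first_name = substr(NEW.full_name, 1, instr(NEW.full_name, ' ') - 1),
               last_name  = substr(NEW.full_name, instr(NEW.full_name, ' ') + 1)
         WHERE id = OLD.id;
    END;
""")

conn.execute("UPDATE person_names SET full_name = 'Grace Murray' WHERE id = 1")
row = conn.execute("SELECT first_name, last_name FROM person WHERE id = 1").fetchone()
```

Note that the engine does not check that the trigger body is a faithful translation of the update; correctness is entirely up to whoever writes it.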
The correct answer is "it depends". You can't update an aggregate column in a view for example. For Oracle views you can Google for "updatable join view" for some examples of when you can and cannot update a view.
Yes, they are updatable, but not always. Views can be updated under the following conditions:
If the view includes the primary key of the table on which it is based.
If the view is defined on one and only one table.
If the view has not been defined using grouping and aggregate functions.
If the view does not have any DISTINCT clause in its definition.
If the view that is supposed to be updated is based on another view, the latter should be updatable.
If the definition of the view does not have any subqueries.
PostgreSQL has RULEs to create updatable VIEWs. Check the examples in the manual to see how to use them.
Ps. In PostgreSQL a VIEW is a RULE, a select rule.
In the past it wasn't possible to update any views. The main purpose of a view is to look at data, hence the name. It could also have been called a stored query.
Today, many database engines support updating views. This is subject to restrictions, and some updates are virtually impossible (e.g. calculated columns, GROUP BY, etc.).
There are two approaches:
INSTEAD OF triggers, which basically shift the problem to the user. You write some procedural code that does the job. Certainly, no guarantee is made about correctness, consistency, etc. From the RDBMS engine's perspective, a trigger that deletes everything from the base tables, no matter what update is made to the view, is perfectly fine.
Much more ambitious is having view updates handled exclusively by the RDBMS engine. Not much progress has been made here: to put it mildly, if you have some good ideas there, you can turn them into a PhD thesis. In practice, your favorite RDBMS might allow some limited ad hoc view updates; check the manual :-)
Yes you can, but have a look at CREATE VIEW (Transact-SQL) and see the section Updatable Views
http://msdn.microsoft.com/en-us/library/ms187956.aspx
See Remarks\updateable view
Yes they are - the syntax is the same as updating a table
UPDATE MyView
SET Col1 = 'Testing'
WHERE Col2 = 3
GO
There are a few conditions for creating a view that can be updated. They can be found here
EDIT:
I must add that this is based on MS SQL
When a view is created in SQL Server, metadata for the referenced table columns (column name and ordinal position) is persisted in the database. Any change to the referenced base table(s) (column re-ordering, new column addition, etc) will not be reflected in the view until the view is either:
• Altered with an ALTER VIEW statement
• Recreated with DROP VIEW/CREATE VIEW statements
• Refreshed using the system stored procedure sp_refreshview
Yes, using an INSTEAD OF trigger.
We generally don't update a view. A view is written to fetch data from various tables based on the joins and WHERE conditions it contains.
A view is just a piece of logic that returns the desired data set when invoked.
I'm not sure in what scenario one would need to update a view.
I'm building a reporting app, and so I'm crunching an awful lot of data. Part of my approach to creating the app in an agile way is to use SQL views to take the strain off the DB if multiple users are all bashing away.
One example is:
mysql_query("CREATE VIEW view_silverpop_clicks_baby_$email AS SELECT view_email_baby_position.EmailAddress, view_email_baby_position.days, silverpop_campaign_emails.id, silverpop_actions.`Click Name` , silverpop_actions.`Mailing Id`
FROM silverpop_actions
INNER JOIN view_email_baby_position ON (silverpop_actions.Email = view_email_baby_position.EmailAddress ) , silverpop_campaign_emails
WHERE silverpop_campaign_emails.id = $email
AND view_email_baby_position.days
BETWEEN silverpop_campaign_emails.low
AND silverpop_campaign_emails.high
AND silverpop_actions.`Event Type` = 'Click Through'") or die(mysql_error());
And then later in the script this view is used to calculate the number of clicks a particular flavour of this email has had.
$sql = "SELECT count(*) as count FROM `view_silverpop_clicks_baby_$email` WHERE `Click Name` LIKE '$countme%'";
My question is in 2 parts really:
Are views always good? Can you have too many?
Could I create yet another set of views to cache the count variable in the second snippet of code? If so, how could I approach this? I can't quite make this out yet.
Thanks!
To answer your questions.
1.) I don't know that I can think of an instance where views are BAD in and of themselves, but it would be bad to use them unnecessarily. Whether you can have too many really depends on your situation.
2.) Having another set of views will not cache the count variable so it wouldn't be beneficial from that standpoint.
Having said that, I think you have a misunderstanding on what a view actually does. A view is just a definition of a particular SQL statement and it does not cache data. When you execute a SELECT * FROM myView;, the database is still executing the select statement defined in the CREATE VIEW definition just as it would if a user was executing that statement.
Some database vendors offer a different kind of view called a materialized view. In this case the table data needed to create the view is stored/cached and is usually updated based on a refresh rate specified when you create it. This is "heavy" in the sense that your data is stored twice, but can create better execution plans because the data is already joined, aggregated, etc. Note though, you only see the data based on the last refresh of the materialized view, where with a normal view you see the data as it currently exists in the underlying tables. Currently, MySQL does not support materialized views.
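The difference between a plain view and a stored copy can be sketched as follows, using Python's sqlite3 as a stand-in (SQLite, like MySQL, has no materialized views). The clicks table and the snapshot table are hypothetical; the snapshot is just an ordinary table that is refreshed by hand, which is one common way to emulate a materialized view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clicks (id INTEGER PRIMARY KEY, click_name TEXT);
    INSERT INTO clicks (click_name) VALUES ('banner_a'), ('banner_b');

    -- A plain view: only the query text is stored, no data.
    CREATE VIEW click_counts AS SELECT COUNT(*) AS n FROM clicks;

    -- A hand-rolled "materialized" copy: the result itself is stored.
    CREATE TABLE click_counts_snapshot AS SELECT COUNT(*) AS n FROM clicks;
""")

def refresh_snapshot():
    """Recompute the stored result; must be called explicitly or on a schedule."""
    with conn:
        conn.execute("DELETE FROM click_counts_snapshot")
        conn.execute("INSERT INTO click_counts_snapshot SELECT COUNT(*) FROM clicks")

conn.execute("INSERT INTO clicks (click_name) VALUES ('banner_c')")

# The view re-runs its query and sees the new row; the snapshot does not.
view_n = conn.execute("SELECT n FROM click_counts").fetchone()[0]
stale_n = conn.execute("SELECT n FROM click_counts_snapshot").fetchone()[0]

refresh_snapshot()
fresh_n = conn.execute("SELECT n FROM click_counts_snapshot").fetchone()[0]
```

Reading the snapshot is cheap because the count is already computed, at the cost of storing the data twice and seeing it only as of the last refresh.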
Some useful uses of views are to:
Create easier/cleaner SQL statements for complex queries (which is something you are doing)
Security. If you have tables where you want a user to be able to see some columns or rows, but not other columns/rows, you restrict access to the base table and create a view of the base table that only selects the columns/rows that the user should have access to.
Create aggregations of tables
Views are used by query optimizer so they often help in querying for information more efficiently.
Indexed or materialized views, however, create a table with the required information, which can make quite a difference. Think of it as denormalization of your db schema without changing the existing schema. You get the best of both worlds.
Some views are never used, so they represent needless complexity, which is bad.
Indexed views cannot reference other views (mssql), so there's hardly a point in creating such a view.
The MySQL certification guide suggests that views can be used for:
creating a summary that may involve calculations
selecting a set of rows with a WHERE clause, hide irrelevant information
result of a join or union
allow for changes made to base table via a view that preserve the schema of original table to accommodate other applications
but from how to implement search for 2 different table data?
And maybe you're right that it doesn't work since mysql views are not good friends with indexing. But still. Is there anything to search for in the shops table?
I have learned that views don't work well with indexing, so will it be a big performance hit for the convenience it may provide?
A view can be simply thought of as a SQL query stored permanently on the server. Whatever indices the query optimizes to will be used. In that sense, there is no difference between the SQL query or a view. It does not affect performance any more negatively than the actual SQL query. If anything, since it is stored on the server, and does not need to be evaluated at run time, it is actually faster.
It does afford you these additional advantages
reusability
a single source for optimization
This mysql-forum-thread about indexing views gives a lot of insight into what mysql views actually are.
Some key points:
A view is really nothing more than a stored select statement
The data of a view is the data of tables referenced by the View.
creating an index on a view will not work as of the current version
If merge algorithm is used, then indexes of underlying tables will be used.
The underlying indices are not visible, however. DESCRIBE on a view will show no indexed columns.
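The point about underlying indexes can be checked empirically. The sketch below uses Python's sqlite3 rather than MySQL, so it only illustrates the general idea of a view being merged into the outer query; MySQL's own EXPLAIN output and merge rules differ. The shops table, its index, and the shop_list view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE INDEX idx_shops_city ON shops (city);
    -- The view itself has no index; it is only a stored SELECT.
    CREATE VIEW shop_list AS SELECT id, name, city FROM shops;
""")

# Ask the planner how it would run a filtered query against the view.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM shop_list WHERE city = ?",
    ("Oslo",),
).fetchall()

# The human-readable step description is the fourth column of each row.
plan_text = " ".join(row[3] for row in plan)
uses_index = "idx_shops_city" in plan_text
```

If the planner merges the view into the outer query, the plan refers to the base table's index even though the view exposes none of its own.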
MySQL views, according to the official MySQL documentation, are stored queries that when invoked produce a result set.
A database view is nothing but a virtual or logical table (commonly consisting of a SELECT query with joins). Because a database view, like a database table, consists of rows and columns, you can query data against it.
Views should be used when:
Simplifying complex queries (like IF ELSE and JOIN or working with triggers and such)
Adding an extra layer of security to limit or restrict data access (since views are merely virtual tables, they can be made read-only for a specific set of DB users, restricting INSERT)
Backward compatibility and query reusability
Working with computed columns. Computed columns should NOT be on DB tables, because the DB schema would be a bad design.
Views should not be used when:
The associated table(s) are tentative or subject to frequent structural change.
According to http://www.mysqltutorial.org/introduction-sql-views.aspx
A database table should not have calculated columns however a database view should.
I tend to use a view when I need to calculate totals, counts etc.
Hope that helps!
One more downside of views is that they don't work well with MySQL replication, which can leave the master and the slave out of step.
http://bugs.mysql.com/bug.php?id=30998
I have a data warehouse containing typical star schemas, and a whole bunch of code which does stuff like this (obviously a lot bigger, but this is illustrative):
SELECT cdim.x
,SUM(fact.y) AS y
,dim.z
FROM fact
INNER JOIN conformed_dim AS cdim
ON cdim.cdim_dim_id = fact.cdim_dim_id
INNER JOIN nonconformed_dim AS dim
ON dim.ncdim_dim_id = fact.ncdim_dim_id
INNER JOIN date_dim AS ddim
ON ddim.date_id = fact.date_id
WHERE fact.date_id = #date_id
GROUP BY cdim.x
,dim.z
I'm thinking of replacing it with a view (MODEL_SYSTEM_1, say), so that it becomes:
SELECT m.x
,SUM(m.y) AS y
,m.z
FROM MODEL_SYSTEM_1 AS m
WHERE m.date_id = #date_id
GROUP BY m.x
,m.z
But the view MODEL_SYSTEM_1 would have to contain unique column names. I'm also concerned about performance with the optimizer if I go ahead and do this: since the view would span a whole star, I'm not sure that all the items in the WHERE clause across different facts and dimensions would still get optimized, and views cannot be parametrized (boy, wouldn't that be cool!)
So my questions are -
Is this approach OK, or is it just going to be an abstraction which hurts performance and doesn't give me anything but a lot nicer syntax?
What's the best way to code-gen these views, eliminating duplicate column names (even if the view later needs to be tweaked by hand), given that all the appropriate PKs and FKs are in place? Should I just write some SQL to pull it out of the INFORMATION_SCHEMA, or is there a good example already available?
Edit: I have tested it, and the performance seems the same, even on the bigger processes - even joining multiple stars which each use these views.
The automation is mainly because there are a number of these stars in the data warehouse, and the FK/PK has been done properly by the designers, but I don't want to have to pick through all the tables or the documentation. I wrote a script to generate the view (it also generates abbreviations for the tables), and it works well to generate the skeleton automagically from INFORMATION_SCHEMA, and then it can be tweaked before committing the creation of the view.
If anyone wants the code, I could probably publish it here.
I’ve used this technique on several data warehouses I look after. I have not noticed any performance degradation when running reports based off of the views versus a table direct approach but have never performed a detailed analysis.
I created the views using the designer in SQL Server management studio and did not use any automated approach. I can’t imagine the schema changing often enough that automating it would be worthwhile anyhow. You might spend as long tweaking the results as it would have taken to drag all the tables onto the view in the first place!
To remove ambiguity a good approach is to preface the column names with the name of the dimension it belongs to. This is helpful to the report writers and to anyone running ad hoc queries.
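For anyone who does want to generate the skeleton, here is one possible sketch of the prefixing idea. It is not the asker's script: it uses Python's sqlite3 and SQLite's PRAGMA table_info and PRAGMA foreign_key_list in place of INFORMATION_SCHEMA, and the tables (fact, date_dim, product_dim) and the helper build_star_view_sql are hypothetical. It assumes each foreign key names its target column explicitly and that no dimension is referenced twice by the same fact.

```python
import sqlite3

def q(name):
    """Quote an identifier for SQLite."""
    return '"' + name.replace('"', '""') + '"'

def build_star_view_sql(conn, fact, view_name):
    """Generate CREATE VIEW text joining a fact table to its FK targets,
    prefixing every column with its table name so names stay unique."""
    def columns(table):
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        return [r[1] for r in conn.execute(f"PRAGMA table_info({q(table)})").fetchall()]

    select_items = [f"{q(fact)}.{q(c)} AS {q(fact + '_' + c)}" for c in columns(fact)]
    joins = []
    # foreign_key_list rows: (id, seq, table, from, to, ...)
    fks = conn.execute(f"PRAGMA foreign_key_list({q(fact)})").fetchall()
    for fk in sorted(fks, key=lambda r: r[2]):
        dim, from_col, to_col = fk[2], fk[3], fk[4]
        joins.append(
            f"INNER JOIN {q(dim)} ON {q(dim)}.{q(to_col)} = {q(fact)}.{q(from_col)}"
        )
        select_items += [f"{q(dim)}.{q(c)} AS {q(dim + '_' + c)}" for c in columns(dim)]

    return (
        f"CREATE VIEW {q(view_name)} AS\nSELECT "
        + ",\n       ".join(select_items)
        + f"\nFROM {q(fact)}\n"
        + "\n".join(joins)
    )

# Usage on a tiny hypothetical star.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact (
        date_id INTEGER REFERENCES date_dim (date_id),
        product_id INTEGER REFERENCES product_dim (product_id),
        y INTEGER);
    INSERT INTO date_dim VALUES (1, 2009);
    INSERT INTO product_dim VALUES (10, 'widget');
    INSERT INTO fact VALUES (1, 10, 3), (1, 10, 4);
""")

view_sql = build_star_view_sql(conn, "fact", "MODEL_SYSTEM_1")
conn.execute(view_sql)  # in practice, review and tweak the text before running it

view_columns = [d[0] for d in conn.execute("SELECT * FROM MODEL_SYSTEM_1").description]
totals = conn.execute(
    "SELECT product_dim_name, SUM(fact_y) FROM MODEL_SYSTEM_1 "
    "WHERE fact_date_id = ? GROUP BY product_dim_name",
    (1,),
).fetchall()
```

Generating the text first and executing it separately keeps the "tweak by hand before committing" step described in the question.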
Make the view or views into one or more summary fact tables and materialize them. These only need to be refreshed when the main fact table is refreshed. The materialized views will be faster to query, and this can be a win if you have a lot of queries that can be satisfied by the summary.
You can use the data dictionary or information schema views to generate SQL to create the tables if you have a large number of these summaries or wish to change them about frequently.
However, I would guess that it's not likely that you would change these very often so auto-generating the view definitions might not be worth the trouble.
If you happen to use MS SQL Server, you could try an Inline UDF which is as close to a parameterized view as it gets.