The BigQuery documentation describes each kind of view but doesn't give concrete use cases or say when to use one rather than the other. So the question is: what are the best use cases for each?
Views are generally used when the data is accessed infrequently and the data in the underlying tables is updated frequently.
Some use cases where you might benefit from using views are:
When you want to query a table's current data.
Views have no storage or refresh cost, unlike materialized views (although every query re-runs the view's SQL).
Using them is highly recommended when creating a dashboard or a print label.
Materialized views are used when the data is accessed frequently and the data in the underlying tables is not updated frequently.
Some use cases where you might benefit from using materialized views are:
Pre-aggregate data: aggregation of streaming data.
Pre-filter data: running queries that only read a particular subset of the table.
Pre-join data: query joins, especially between large and small tables.
Recluster data: running queries that would benefit from a clustering scheme that differs from the base tables.
Note that these are only some of the use cases for views and materialized views.
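The rule of thumb above (views for frequently changing data, materialized views for stable data that is read often) comes down to freshness versus precomputation. The sketch below illustrates that trade-off with Python's standard-library sqlite3 module rather than BigQuery. SQLite has no materialized views, so a table built with CREATE TABLE ... AS SELECT plus a hypothetical refresh_totals_mv() helper stands in for one; the table and column names are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite database. SQLite has no native materialized views, so a
# table created with CREATE TABLE ... AS SELECT stands in for one here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, amount INTEGER);
    INSERT INTO events VALUES (1, 10), (1, 5), (2, 7);

    -- Logical view: only the query text is stored; it runs on every read.
    CREATE VIEW totals_view AS
        SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id;

    -- Emulated materialized view: the aggregated result itself is stored.
    CREATE TABLE totals_mv AS
        SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id;
""")


def refresh_totals_mv(conn):
    """Hypothetical manual refresh: rebuild the stored aggregate from the base table."""
    conn.executescript("""
        DELETE FROM totals_mv;
        INSERT INTO totals_mv
            SELECT user_id, SUM(amount) FROM events GROUP BY user_id;
    """)


# New data arrives in the base table.
conn.execute("INSERT INTO events VALUES (2, 3)")

view_rows = conn.execute("SELECT * FROM totals_view ORDER BY user_id").fetchall()
stale_rows = conn.execute("SELECT * FROM totals_mv ORDER BY user_id").fetchall()
refresh_totals_mv(conn)
fresh_rows = conn.execute("SELECT * FROM totals_mv ORDER BY user_id").fetchall()

print(view_rows)   # [(1, 15), (2, 10)]: always current
print(stale_rows)  # [(1, 15), (2, 7)]: stale until refreshed
print(fresh_rows)  # [(1, 15), (2, 10)]
```

Reading the stored table is cheap, but it is only as current as its last refresh; the view is always current but recomputes the aggregate on every query.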
I have seen in a few non-Amazon sources that the Redshift query planner has problems working with views (here is one source, here is another, here is a third). By views I mean standard SQL views, not the newly available materialized views. However, I can't find anything about this in the developer guide, and the sources listed above are a few years out of date. Does anyone know the current situation with the Redshift query planner and views, and, if there is official Redshift documentation that describes it, where it is located?
As you say, the blogs' arguments are a bit outdated: one of the main drawbacks of views they present is that views couldn't be materialized at the time of writing, which is no longer the case.
The first link just says that Redshift has trouble optimizing queries involving views, but it doesn't show any benchmark or proof of that, nor does it explain why or in what way.
The second and third sources have more merit, in that they actually provide alternatives: creating an actual table or materializing the view.
My understanding is that views in Redshift don't inherently suffer from bad performance; instead, given their transient nature, they don't take advantage of Redshift's clustered architecture. Additionally, as some of the resources you linked also mention, the queries that make up a view are executed every time you query the view, and that definitely doesn't help performance.
I would suggest aggregating your data in actual tables or looking into materializing these views.
To better understand how the planner works, I'd take a look at the "Query planning and execution workflow" page of the Redshift documentation.
Redshift has no problem working with views. The logic of the view is combined with the rest of the query that calls the view, similar to a subquery or CTE. Redshift plans and optimizes the entire statement (outer query + view logic) as a single statement.
There are two main "issues" that people have with views:
Views are bound to the tables (or other views) that they reference. You cannot drop those objects, or make certain changes to them, without first dropping the view. To address this, Redshift offers the WITH NO SCHEMA BINDING syntax, so that the view is not bound to its objects. The compromise is that the view is not checked, and queries against it may fail if the underlying objects are changed.
Views make it very easy to generate extremely complex and inefficient queries that look "simple". This particularly happens when you nest views on top of views. You can use EXPLAIN to see the query plan that Redshift will use for a given query to see how your view is processed.
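Both points can be seen on a small scale without a Redshift cluster. The sketch below uses SQLite through Python's sqlite3 module as a stand-in (an assumption: SQLite's planner and Redshift's differ in detail). EXPLAIN QUERY PLAN shows the view's filter being merged with the outer query into a single lookup on the base table, and, because SQLite views are not bound to their tables (roughly like Redshift's WITH NO SCHEMA BINDING views), dropping the base table succeeds and the failure only appears when the view is queried. Table and view names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    INSERT INTO customers VALUES (1, 'Ann', 1), (2, 'Bob', 0), (5, 'Eve', 1);
    CREATE VIEW active_customers AS
        SELECT id, name FROM customers WHERE active = 1;
""")

# EXPLAIN QUERY PLAN describes how SQLite will execute the statement; the last
# column of each returned row is the human-readable plan step.
plan_steps = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM active_customers WHERE id = 5"
    )
]
# The view is flattened into the outer query, so the plan is a primary-key
# lookup on the base table rather than a scan of a separately built view.
print(plan_steps)

# Views are not schema-bound in SQLite: the DROP succeeds, and the breakage
# only surfaces when the view is queried.
conn.execute("DROP TABLE customers")
broken_view_error = None
try:
    conn.execute("SELECT * FROM active_customers").fetchall()
except sqlite3.OperationalError as exc:
    broken_view_error = str(exc)
print(broken_view_error)
```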
I have to optimize the physical design of several queries. I have tried several techniques, such as indexes or clusters, but for most of the queries the best option in terms of consistent gets is creating a materialized view. Is there any reason not to choose materialized views for optimizing the queries? If we could optimize all queries using only materialized views, everything would be much easier and faster.
You can optimize using materialized views. In my experience, they have one major downside: timing. Materialized views are not all updated at exactly the same time.
As a result, materialized views that you expect to be consistent with each other may be missing rows. As a trivial example, you might have a foreign key relationship from T1 to T2. However, T2 gets materialized before T1. When T1 is materialized later, it can contain rows whose foreign key values point to T2 rows that did not yet exist when T2 was materialized. I have spent a lot of time dealing with the issues that this causes.
There are ways to adjust for this. For instance, all rows could have create dates, and the materialized views could restrict rows to those that were created or updated up to the previous hour boundary.
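The timing problem and the hour-boundary fix can be simulated in a few lines. This is a sketch in Python with SQLite, which has no materialized views: refresh() is a hypothetical helper that rebuilds a snapshot table from rows created before a cutoff, t1/t2 follow the example above, and created_at is an assumed column holding minutes since midnight. Only creation times are modelled, not updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base tables from the example: t1.t2_id references t2.id.
    -- created_at is minutes since midnight, to keep the arithmetic simple.
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, created_at INTEGER);
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, t2_id INTEGER REFERENCES t2(id),
                     created_at INTEGER);
    INSERT INTO t2 VALUES (1, 480);       -- 08:00
    INSERT INTO t1 VALUES (10, 1, 485);   -- 08:05
""")


def refresh(conn, mv_name, source, cutoff):
    """Hypothetical stand-in for a materialized-view refresh: rebuild `mv_name`
    from rows of `source` created before `cutoff`. The names are trusted
    constants, so interpolating them into the SQL is acceptable in this sketch."""
    conn.execute(f"DROP TABLE IF EXISTS {mv_name}")
    conn.execute(f"CREATE TABLE {mv_name} AS SELECT * FROM {source} WHERE 0")
    conn.execute(
        f"INSERT INTO {mv_name} SELECT * FROM {source} WHERE created_at < ?",
        (cutoff,),
    )


def orphans(conn, child_mv, parent_mv):
    """Return child rows whose t2_id has no matching row in the parent snapshot."""
    return conn.execute(
        f"SELECT c.id FROM {child_mv} c LEFT JOIN {parent_mv} p ON c.t2_id = p.id "
        "WHERE p.id IS NULL"
    ).fetchall()


# Naive schedule: T2's snapshot is refreshed at 09:30 ...
refresh(conn, "mv_t2", "t2", cutoff=570)
# ... new related rows arrive ...
conn.execute("INSERT INTO t2 VALUES (2, 580)")       # 09:40
conn.execute("INSERT INTO t1 VALUES (11, 2, 585)")   # 09:45, references t2 row 2
# ... and T1's snapshot is refreshed at 10:10: it sees row 11, but mv_t2 lacks row 2.
refresh(conn, "mv_t1", "t1", cutoff=610)
naive_orphans = orphans(conn, "mv_t1", "mv_t2")
print(naive_orphans)  # [(11,)]

# Adjusted schedule: both snapshots use the same previous hour boundary
# (09:00 = 540). Every child row kept was created before the cutoff, and so
# was its parent, because a parent must exist before its children.
refresh(conn, "mv_t2", "t2", cutoff=540)
refresh(conn, "mv_t1", "t1", cutoff=540)
fixed_orphans = orphans(conn, "mv_t1", "mv_t2")
print(fixed_orphans)  # []
```

The price of the adjustment is that both views lag behind the base tables by up to an hour.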
There are other issues, in terms of performance and maintenance. For instance, your database load might shift to materializing the views. Or the views might fail due to underlying schema changes. However, once you have a process in place, you will probably find that these are quite manageable.
For some applications, such as OLTP, it does not make sense, because queries don't read much data.
For other applications, such as data warehousing, it could be an option, but there are sometimes limitations on the SQL statements that can be used to create materialized views (MVs).
And in general, one must take into account the cost of refreshing the MV: if you don't want stale data, the refresh should be automatic, and that adds some overhead.
I'm trying to understand the difference between the two, and when it would be advisable to use one over the other.
A data mart is a whole database: generally a simpler kind of data warehouse, in that it is usually the source for reporting or analysis. It is usually the end point of ETL processes pulling and aggregating data from multiple sources.
A materialised view is a stored query. It is 'materialised' in the sense that some aspect of it is permanently stored, as opposed to an ordinary view, which is evaluated on the fly. Often this is done in order to apply indexes to a view (as with SQL Server indexed views): the view has to be schema-bound to the underlying data, and updates to the underlying data cause the materialised view's indexes to be updated, so they are ready before the view is queried.
So really, the question of which to use doesn't make sense: they're completely different things.
If the problem is complex querying, then go for views.
If performance is the problem, then go for data marts.
I've just started joining stored procedures together using views, and it seems a simple way of building up a short query from the results of others.
Are there any disadvantages to over-relying on views before I plough on with this method? Would I be better off pursuing the temporary table option?
The main differences are that a view stores only the query, not the results (with the exception of materialised views), and that views persist after the end of your session. Views are an excellent way of hiding complexity, but they do not make queries run any faster than if you wrote out the whole thing as one query. Views also do not use up storage space (except for a very small amount for the metadata).
I would recommend using views if you do not need to speed the queries up further, or if you need to be able to reference the data in subsequent sessions without recreating it.
Temporary tables do store the result, but just for the current session, so if you need a base query to speed up further queries for the duration of your session, this can be useful.
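The difference in lifetime can be checked directly. The sketch below uses SQLite through Python's sqlite3 module as an illustrative engine (an assumption; session semantics differ in detail between databases): a view created in one connection is still there in a second connection, while a temporary table holding the same rows is gone. Table and view names are hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # A file-backed database, so a second connection (a "new session") sees the same schema.
    db_path = os.path.join(tmp_dir, "demo.db")

    session1 = sqlite3.connect(db_path)
    session1.executescript("""
        CREATE TABLE sales (region TEXT, amount INTEGER);
        INSERT INTO sales VALUES ('north', 10), ('south', 20), ('north', 5);
        -- The view stores only the query, in the database schema.
        CREATE VIEW north_sales AS SELECT * FROM sales WHERE region = 'north';
        -- The temporary table stores the result rows, but only for this connection.
        CREATE TEMP TABLE north_sales_tmp AS SELECT * FROM sales WHERE region = 'north';
    """)
    session1.close()

    session2 = sqlite3.connect(db_path)
    north_total = session2.execute("SELECT SUM(amount) FROM north_sales").fetchone()[0]
    try:
        session2.execute("SELECT * FROM north_sales_tmp")
        temp_table_visible = True
    except sqlite3.OperationalError:  # the temp table does not exist in this session
        temp_table_visible = False
    session2.close()

print(north_total)         # 15: the view survives into a new session
print(temp_table_visible)  # False: the temp table was dropped with session1
```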
In fact, views are mostly used for security reasons, and in some cases they also make queries simpler. So it really depends on what you are doing: whether you need to store results, and what your other requirements are.
When should a view actually be used instead of a table? What gains should I expect this to produce?
Overall, what are the advantages of using a view over a table? Shouldn't I design the table in the way the view should look like in the first place?
Oh, there are many differences you will need to consider.
Views for selection:
Views provide abstraction over tables. You can add/remove fields easily in a view without modifying your underlying schema
Views can model complex joins easily.
Views can hide database-specific details from you, e.g. if you need to do some checks using Oracle's SYS_CONTEXT function, among many other things
You can easily manage your GRANTS directly on views, rather than the actual tables. It's easier to manage if you know a certain user may only access a view.
Views can help you with backwards compatibility. You can change the underlying schema, but the views can hide those facts from a certain client.
Views for insertion/updates:
You can handle security issues with views by using such functionality as Oracle's "WITH CHECK OPTION" clause directly in the view
Drawbacks
You lose information about relations (primary keys, foreign keys)
It's not obvious whether you will be able to insert/update a view, because the view hides its underlying joins from you
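The abstraction and backwards-compatibility points can be sketched as follows, using SQLite through Python's sqlite3 module (an assumption: SQLite has no GRANT, so the permission side is not shown). The view exposes only the columns a client needs, and when the underlying schema changes (here, a hypothetical split of name into first_name and last_name), only the view is redefined while the client's query text stays the same. All table, column, and view names are hypothetical.

```python
import sqlite3

# The client only ever runs this query against the view.
CLIENT_QUERY = "SELECT name, email FROM customer_contacts ORDER BY name"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Version 1 of the schema, including a column clients should not see.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                            credit_limit INTEGER);
    INSERT INTO customers VALUES (1, 'Ann Lee', 'ann@example.com', 5000);
    -- The view exposes only the fields the client needs; credit_limit stays hidden.
    CREATE VIEW customer_contacts AS SELECT name, email FROM customers;
""")
before = conn.execute(CLIENT_QUERY).fetchall()

# Version 2: the name is split into two columns. Only the view is redefined.
conn.executescript("""
    DROP VIEW customer_contacts;
    CREATE TABLE customers_v2 (id INTEGER PRIMARY KEY, first_name TEXT,
                               last_name TEXT, email TEXT, credit_limit INTEGER);
    INSERT INTO customers_v2 VALUES (1, 'Ann', 'Lee', 'ann@example.com', 5000);  -- migrated row
    DROP TABLE customers;
    CREATE VIEW customer_contacts AS
        SELECT first_name || ' ' || last_name AS name, email FROM customers_v2;
""")
after = conn.execute(CLIENT_QUERY).fetchall()

print(before)           # [('Ann Lee', 'ann@example.com')]
print(after == before)  # True: same output structure and data for the client
```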
Views can:
Simplify a complex table structure
Simplify your security model by allowing you to filter sensitive data and assign permissions in a simpler fashion
Allow you to change the logic and behavior without changing the output structure (the output remains the same but the underlying SELECT could change significantly)
Increase performance (SQL Server indexed views)
Offer query optimizations specific to the view that might be difficult to achieve otherwise
And you should not design tables to match views. Your base model should concern itself with efficient storage and retrieval of the data. Views are partly a tool that mitigates the complexities that arise from an efficient, normalized model by allowing you to abstract that complexity.
Also, asking "what are the advantages of using a view over a table?" is not a great comparison. You can't go without tables, but you can do without views. They each exist for a very different reason. Tables are the concrete model and Views are an abstracted, well, View.
Views are acceptable when you need to ensure that complex logic is followed every time. For instance, we have a view that creates the raw data needed for all financial reporting. By having all reports use this view, everyone is working from the same data set, rather than one report using one set of joins and another forgetting to use one which gives different results.
Views are acceptable when you want to restrict users to a particular subset of data. For instance, if you do not delete records but only mark the current one as active and the older versions as inactive, you want a view that selects only the active records. This prevents people from forgetting to put the WHERE clause in the query and getting bad results.
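A minimal sketch of the active-records pattern, using SQLite through Python's sqlite3 module as an illustrative engine; the prices table, its is_active flag, and the current_prices view are hypothetical names. Querying the table without the filter silently includes old versions, while the view applies the filter every time, which is also how a single shared view keeps every report on the same logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical versioned table: old versions are kept but marked inactive.
    CREATE TABLE prices (product TEXT, price INTEGER, is_active INTEGER);
    INSERT INTO prices VALUES
        ('widget', 100, 0),   -- old version
        ('widget', 120, 1),   -- current version
        ('gadget',  80, 1);
    -- Every report reads from this view, so nobody can forget the filter.
    CREATE VIEW current_prices AS
        SELECT product, price FROM prices WHERE is_active = 1;
""")

# Querying the table directly without the WHERE clause also counts the old widget price.
forgot_filter = conn.execute("SELECT SUM(price) FROM prices").fetchone()[0]
via_view = conn.execute("SELECT SUM(price) FROM current_prices").fetchone()[0]
print(forgot_filter, via_view)  # 300 200
```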
Views can be used to ensure that users only have access to a set of records - for instance, a view of the tables for a particular client and no security rights on the tables can mean that the users for that client can only ever see the data for that client.
Views are very helpful when refactoring databases.
Views are not acceptable when you use views to call other views, which can result in horrible performance (at least in SQL Server). We almost lost a multimillion-dollar client because someone chose to abstract the database that way, and performance was horrendous and timeouts frequent. We had to pay for the fix too, not the client, as the performance issue was completely our fault. When views call views, they have to completely generate the underlying view. I have seen a case where a view called a view which called another view, and many millions of records were generated in order to return the three the user ultimately needed. I remember one of these views took 8 minutes to do a simple count(*) of the records. Views calling views are an extremely poor idea.
Views are often a bad idea to use to update records as usually you can only update fields from the same table (again this is SQL Server, other databases may vary). If that's the case, it makes more sense to directly update the tables anyway so that you know which fields are available.
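Other engines are even stricter than SQL Server here. As an illustration (an assumption about which engine you use: this is SQLite via Python's sqlite3 module), a plain SQLite view is entirely read-only unless INSTEAD OF triggers are defined on it, so an UPDATE through a join view fails, while updating the base table directly makes it obvious which columns are writable. The table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, status TEXT);
    INSERT INTO clients VALUES (1, 'Acme');
    INSERT INTO orders VALUES (100, 1, 'open');
    -- The join is hidden inside the view.
    CREATE VIEW client_orders AS
        SELECT o.id AS order_id, c.name AS client_name, o.status
        FROM orders o JOIN clients c ON c.id = o.client_id;
""")

view_update_error = None
try:
    conn.execute("UPDATE client_orders SET status = 'shipped' WHERE order_id = 100")
except sqlite3.OperationalError as exc:
    # SQLite refuses to modify a view that has no INSTEAD OF trigger.
    view_update_error = str(exc)

# Updating the base table directly makes it explicit which table and column change.
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 100")
status = conn.execute(
    "SELECT status FROM client_orders WHERE order_id = 100"
).fetchone()[0]
print(view_update_error)
print(status)  # shipped
```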
According to Wikipedia,
Views can provide many advantages over tables:
Views can represent a subset of the data contained in a table.
Views can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view, while denied access to the rest of the base table.
Views can join and simplify multiple tables into a single virtual table.
Views can act as aggregated tables, where the database engine aggregates data (sum, average, etc.) and presents the calculated results as part of the data.
Views can hide the complexity of data. For example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table.
Views take very little space to store; the database contains only the definition of a view, not a copy of all the data that it presents.
Views can provide extra security, depending on the SQL engine used.
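Several of these points (subsets, transparent partitioning in the style of Sales2000/Sales2001, aggregation, and storing only a definition) fit in one small sketch. It uses SQLite through Python's sqlite3 module as an illustrative engine; the sales table and sales_by_year view are hypothetical. The last query reads the schema table sqlite_master to show that only the CREATE VIEW text is stored, not a copy of the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One underlying table; the per-year views partition it transparently.
    CREATE TABLE sales (sale_year INTEGER, amount INTEGER);
    INSERT INTO sales VALUES (2000, 10), (2000, 15), (2001, 40);
    CREATE VIEW Sales2000 AS SELECT * FROM sales WHERE sale_year = 2000;
    CREATE VIEW Sales2001 AS SELECT * FROM sales WHERE sale_year = 2001;
    -- An aggregating view: the engine computes the totals at query time.
    CREATE VIEW sales_by_year AS
        SELECT sale_year, SUM(amount) AS total FROM sales GROUP BY sale_year;
""")

totals = conn.execute("SELECT * FROM sales_by_year ORDER BY sale_year").fetchall()
count_2000 = conn.execute("SELECT COUNT(*) FROM Sales2000").fetchone()[0]

# Only the view's definition is stored in the schema, not a copy of its rows.
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'Sales2000'"
).fetchone()[0]
print(totals)      # [(2000, 25), (2001, 40)]
print(count_2000)  # 2
print(definition)  # the stored CREATE VIEW statement text
```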
A common practice is to hide joins in a view to present the user a more denormalized data model. Other uses involve security (for example by hiding certain columns and/or rows) or performance (in case of materialized views)
Views are handy when you need to select from several tables, or just to get a subset of a table.
You should design your tables in such a way that your database is well normalized (minimum duplication). This can make querying somewhat difficult.
Views provide a layer of separation, allowing you to view the data in the tables differently from how it is stored.
You should design your table WITHOUT considering the views.
Apart from saving joins and conditions, Views do have a performance advantage: SQL Server may calculate and save its execution plan in the view, and therefore make it faster than "on the fly" SQL statements.
Views may also ease your work when managing user access at the field (column) level.
First of all, as the name suggests, a view is generally read-only. That's because a view is nothing more than a virtual table created from a query stored in the DB.
Because of this, views have the following characteristics:
you can show only a subset of the data
you can join multiple tables into a single view
you can aggregate data in a view (select count)
views don't actually hold data; they don't need any tablespace, since they are virtual aggregations of the underlying tables
So there are countless use cases for which views are better suited than tables. Just think about displaying only the active users on a website: a view would be better, because you operate only on the subset of the data you need (the active users) rather than on everything in your DB (active and inactive users).
Check out this article.
Hope this helped.