I have existing tables that are pretty much denormalized. There are no lookup tables for things like status, type, country, etc. The original design was done just to simplify the application's access to the database, so there was no performance reason for this denormalization.
This has resulted in tables with tons of duplicate data, and I would like to normalize properly by introducing lookup tables for various status/type/country columns.
Is there some way I can do this in the database (Oracle) that would remain transparent to clients? Applications would continue to do inserts, but the database would map things to the proper lookup tables behind the scenes.
I've been experimenting with a combination of views and triggers that will do the mapping, but it feels like there should be a more automatic way of doing this.
In the general case, you can make your changes transparent to the users if you can create updatable views.
Normalize a base table to 3NF, BCNF, or 5NF.
Rename the original base table.
Build an updatable view that has the same name, columns, and rows as the original, denormalized base table.
Make sure the permissions on the new view correlate with the permissions on the original base table.
Test.
Repeat until done.
Any client software that tries to SELECT, INSERT, UPDATE, or DELETE the original base table will hit the updatable view instead. (That's because tables and views share a namespace, and that's not an accident.) The DBMS and your supporting code will make sure the Right Thing happens.
Depending on your platform and decomposition, building an updatable view might be easy, and it might be impossible. On Oracle, I think the worst case is that you'd have to write INSTEAD OF triggers to support all the query operations. That's not too bad.
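The recipe above can be sketched end to end. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Oracle (SQLite also supports INSTEAD OF triggers on views), the names orders, orders_base, country and country_id are hypothetical, and the old table is dropped after copying rather than renamed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The original denormalized table (hypothetical): country stored as text.
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, country TEXT);
INSERT INTO orders (customer, country) VALUES ('Ann', 'France'), ('Bob', 'France');

-- Normalize: move the country names into a lookup table.
CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
INSERT INTO country (name) SELECT DISTINCT country FROM orders;

-- The new base table references the lookup table.
CREATE TABLE orders_base (
    id INTEGER PRIMARY KEY,
    customer TEXT,
    country_id INTEGER REFERENCES country(id)
);
INSERT INTO orders_base (id, customer, country_id)
SELECT o.id, o.customer, c.id FROM orders o JOIN country c ON c.name = o.country;

-- Replace the old table with a view of the same name and columns.
DROP TABLE orders;
CREATE VIEW orders AS
SELECT b.id, b.customer, c.name AS country
FROM orders_base b JOIN country c ON c.id = b.country_id;

-- An INSTEAD OF trigger maps client inserts onto the new tables.
CREATE TRIGGER orders_insert INSTEAD OF INSERT ON orders
BEGIN
    INSERT OR IGNORE INTO country (name) VALUES (NEW.country);
    INSERT INTO orders_base (customer, country_id)
    SELECT NEW.customer, id FROM country WHERE name = NEW.country;
END;
""")

# A client keeps using the old statement, unaware of the lookup table.
conn.execute("INSERT INTO orders (customer, country) VALUES ('Cy', 'Spain')")
rows = conn.execute("SELECT customer, country FROM orders ORDER BY id").fetchall()
countries = conn.execute("SELECT name FROM country ORDER BY name").fetchall()
print(rows)
print(countries)
```

UPDATE and DELETE would need their own INSTEAD OF triggers in the same style; on Oracle the trigger syntax differs, so treat this as a shape to adapt rather than code to copy.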
But based on a few months knocking around on SO, I have to say I'm not 100% confident you really need to do this, or that you really want to do this. Post your tables' DDL and representative sample data as SQL INSERT statements, and we can offer better, more concrete suggestions.
I am using PostgreSQL and I want to create a layer of views on top of all the tables in my database schema. I will implement a 1-to-1 mapping of view to table, so these views will not have joins in them. The purpose is to provide a read-only abstraction so that I can change the underlying table structure of the database over time but control what is exposed through the views.
The question I have is when I start querying (SELECT statements only) using the views, including some complex joins and other complex query dynamics like aggregation/grouping, will PostgreSQL make use of the indexes on the underlying tables as if I was querying them directly?
I am starting with a PoC of this now. I don't have any results yet, but wanted to hear from other people's knowledge, experiences and opinions.
Yes, the engine will use available indexes and optimize the code. It will basically replace the view with its definition and build the plan.
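One way to check this kind of claim yourself is to ask the planner. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module rather than PostgreSQL, and the names users, users_v and idx_users_email are hypothetical. In PostgreSQL the equivalent check would be running EXPLAIN on the query against the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT);
CREATE INDEX idx_users_email ON users (email);
-- A 1-to-1 view with no joins, as in the question.
CREATE VIEW users_v AS SELECT id, email, name FROM users;
""")

# Ask the planner how it would run a query that goes through the view.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM users_v WHERE email = ?",
    ("a@example.com",),
).fetchall()

# The last column of each plan row is the human-readable step description.
details = [row[-1] for row in plan]
for line in details:
    print(line)
```

The exact wording of the plan varies by SQLite version, but it should name idx_users_email, which shows the view was replaced by its definition before planning.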
Here you can find some examples and test it further.
I am learning SQL. It seems that PostgreSQL allows you to update a table through a 'view', if you have visibility of a few select columns of the table. On the other hand, SQLite simply does not support this (which makes more sense to me).
I wonder whether it is a good practice to update tables through views even when it is allowed?
This question may be a matter of opinion, but I would say that updating data through a view is generally not a good practice, although there are exceptions.
One of the main reasons to define views is to isolate users from changes in underlying data structures. Because not all views are updatable, that means that a change to the definition of a view (but not the result set) could invalidate code.
In some databases, it is possible to get around this by using triggers on views.
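As a rough illustration of the trigger workaround, here is a sketch using SQLite through Python's sqlite3 module. The table person, the view person_v and the trigger name are hypothetical. SQLite rejects a plain UPDATE on a view, but accepts it once an INSTEAD OF trigger says how to apply it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
INSERT INTO person (name, salary) VALUES ('Ann', 100);
-- The view exposes only some columns of the table.
CREATE VIEW person_v AS SELECT id, name FROM person;
""")

# Without a trigger, SQLite refuses to modify a view.
try:
    conn.execute("UPDATE person_v SET name = 'Anna' WHERE id = 1")
    plain_update_failed = False
except sqlite3.OperationalError:
    plain_update_failed = True

# An INSTEAD OF trigger tells the database how to apply the update.
conn.executescript("""
CREATE TRIGGER person_v_update INSTEAD OF UPDATE ON person_v
BEGIN
    UPDATE person SET name = NEW.name WHERE id = OLD.id;
END;
""")
conn.execute("UPDATE person_v SET name = 'Anna' WHERE id = 1")
name = conn.execute("SELECT name FROM person WHERE id = 1").fetchone()[0]
print(plain_update_failed, name)
```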
I should add that is "general" thinking. Another reason to have views is for access control and security. For instance, some users may not be able to see some columns in some tables; they have access to the view but not the underlying table. In this case, updates to views are a bit more reasonable.
All that said, I should point out that I'm not really a fan of having users update data directly at all. My preference is to do such updates through stored procedures, so there is much better control over the data model, auditing, and user-access.
When should a View actually be used over an actual Table? What gains should I expect this to produce?
Overall, what are the advantages of using a view over a table? Shouldn't I design the table in the way the view should look like in the first place?
Oh, there are many differences you will need to consider.
Views for selection:
Views provide abstraction over tables. You can add/remove fields easily in a view without modifying your underlying schema
Views can model complex joins easily.
Views can hide database-specific stuff from you, e.g. if you need to do some checks using Oracle's SYS_CONTEXT function or many other things.
You can easily manage your GRANTS directly on views, rather than the actual tables. It's easier to manage if you know a certain user may only access a view.
Views can help you with backwards compatibility. You can change the underlying schema, but the views can hide those facts from a certain client.
Views for insertion/updates:
You can handle security issues with views by using such functionality as Oracle's "WITH CHECK OPTION" clause directly in the view
Drawbacks
You lose information about relations (primary keys, foreign keys)
It's not obvious whether you will be able to insert/update a view, because the view hides its underlying joins from you
Views can:
Simplify a complex table structure
Simplify your security model by allowing you to filter sensitive data and assign permissions in a simpler fashion
Allow you to change the logic and behavior without changing the output structure (the output remains the same but the underlying SELECT could change significantly)
Increase performance (SQL Server indexed views)
Offer specific query optimization with the view that might be difficult to glean otherwise
And you should not design tables to match views. Your base model should concern itself with efficient storage and retrieval of the data. Views are partly a tool that mitigates the complexities that arise from an efficient, normalized model by allowing you to abstract that complexity.
Also, asking "what are the advantages of using a view over a table? " is not a great comparison. You can't go without tables, but you can do without views. They each exist for a very different reason. Tables are the concrete model and Views are an abstracted, well, View.
Views are acceptable when you need to ensure that complex logic is followed every time. For instance, we have a view that creates the raw data needed for all financial reporting. By having all reports use this view, everyone is working from the same data set, rather than one report using one set of joins and another forgetting to use one which gives different results.
Views are acceptable when you want to restrict users to a particular subset of data. For instance, if you do not delete records but only mark the current one as active and the older versions as inactive, you want a view to use to select only the active records. This prevents people from forgetting to put the where clause in the query and getting bad results.
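A minimal sketch of the active-records case, using SQLite through Python's sqlite3 module. The table price, the flag is_active and the view active_price are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Rows are never deleted; old versions are flagged inactive instead.
CREATE TABLE price (item TEXT, amount INTEGER, is_active INTEGER);
INSERT INTO price VALUES ('widget', 10, 0), ('widget', 12, 1), ('gadget', 7, 1);
-- The filter lives in one place, so no query can forget it.
CREATE VIEW active_price AS
SELECT item, amount FROM price WHERE is_active = 1;
""")

active = conn.execute(
    "SELECT item, amount FROM active_price ORDER BY item"
).fetchall()
print(active)
```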
Views can be used to ensure that users only have access to a set of records - for instance, a view of the tables for a particular client and no security rights on the tables can mean that the users for that client can only ever see the data for that client.
Views are very helpful when refactoring databases.
Views are not acceptable when you use views to call views which can result in horrible performance (at least in SQL Server). We almost lost a multimillion dollar client because someone chose to abstract the database that way and performance was horrendous and timeouts frequent. We had to pay for the fix too, not the client, as the performance issue was completely our fault. When views call views, they have to completely generate the underlying view. I have seen this where the view called a view which called a view and so many millions of records were generated in order to see the three the user ultimately needed. I remember one of these views took 8 minutes to do a simple count(*) of the records. Views calling views are an extremely poor idea.
Views are often a bad idea to use to update records as usually you can only update fields from the same table (again this is SQL Server, other databases may vary). If that's the case, it makes more sense to directly update the tables anyway so that you know which fields are available.
According to Wikipedia,
Views can provide many advantages over tables:
Views can represent a subset of the data contained in a table.
Views can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view, while denied access to the rest of the base table.
Views can join and simplify multiple tables into a single virtual table.
Views can act as aggregated tables, where the database engine aggregates data (sum, average, etc.) and presents the calculated results as part of the data.
Views can hide the complexity of data. For example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table.
Views take very little space to store; the database contains only the definition of a view, not a copy of all the data that it presents.
Views can provide extra security, depending on the SQL engine used.
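To make the aggregation and storage points in the list above concrete, here is a small sketch using SQLite through Python's sqlite3 module. The table sale and the view sale_summary are hypothetical. Because only the query is stored, the aggregates reflect new rows as soon as they are inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (region TEXT, amount INTEGER);
INSERT INTO sale VALUES ('north', 10), ('north', 30), ('south', 5);
-- The view stores only this query, not the aggregated rows themselves.
CREATE VIEW sale_summary AS
SELECT region, COUNT(*) AS sales, SUM(amount) AS total, AVG(amount) AS average
FROM sale GROUP BY region;
""")

summary = conn.execute("SELECT * FROM sale_summary ORDER BY region").fetchall()

# New base rows show up immediately, because the view is recomputed on read.
conn.execute("INSERT INTO sale VALUES ('south', 15)")
south = conn.execute(
    "SELECT sales, total FROM sale_summary WHERE region = 'south'"
).fetchone()
print(summary, south)
```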
A common practice is to hide joins in a view to present the user a more denormalized data model. Other uses involve security (for example by hiding certain columns and/or rows) or performance (in case of materialized views)
Views are handy when you need to select from several tables, or just to get a subset of a table.
You should design your tables in such a way that your database is well normalized (minimum duplication). This can make querying somewhat difficult.
Views are a bit of separation, allowing you to view the data in the tables differently than they are stored.
You should design your table WITHOUT considering the views.
Apart from saving joins and conditions, views can have a performance advantage in SQL Server when they are indexed: an indexed view stores its result set, so queries can read it instead of recomputing the joins. A plain view is expanded into the calling query and optimized like any other statement.
View may also ease your work regarding user access at field level.
First of all, as the name suggests, a view is essentially read-only. That's because a view is nothing other than a virtual table created from a stored query in the DB.
Because of this you have some characteristics of views:
you can show only a subset of the data
you can join multiple tables into a single view
you can aggregate data in a view (select count)
views don't actually hold data; they don't need any tablespace, since they are virtual aggregations of underlying tables
So there are a gazillion use cases for which views are better fitted than tables. Just think about displaying only active users on a website: a view would be better because you operate only on a subset of the data that is actually in your DB (active and inactive users).
check out this article
hope this helped..
I'm just trying to get a general idea of what views are used for in RDBMSes. That is to say, I know what a view is and how to make one. I also know what I've used them for in the past.
But I want to make sure I have a thorough understanding of what a view is useful for and what a view shouldn't be useful for. More specifically:
What is a view useful for?
Are there any situations in which it is tempting to use a view when you shouldn't use one?
Why would you use a view in lieu of something like a table-valued function or vice versa?
Are there any circumstances that a view might be useful that aren't apparent at first glance?
(And for the record, some of these questions are intentionally naive. This is partly a concept check.)
In a way, a view is like an interface. You can change the underlying table structure all you want, but the view gives a way for the code to not have to change.
Views are a nice way of providing something simple to report writers. If your business users want to access the data from something like Crystal Reports, you can give them some views in their account that simplify the data -- maybe even denormalize it for them.
1) What is a view useful for?
IOPO: In One Place Only
• Whether you consider the data itself or the queries that reference the joined tables, utilizing a view avoids unnecessary redundancy.
• Views also provide an abstracting layer preventing direct access to the tables (and the resulting handcuffing referencing physical dependencies). In fact, I think it's good practice [1] to offer only abstracted access to your underlying data (using views & table-valued functions), including views such as CREATE VIEW AS SELECT * FROM tblData
[1] I hafta admit there's a good deal of "Do as I say; not as I do" in that advice ;)
2) Are there any situations in which it is tempting to use a view when you shouldn't use one?
Performance in view joins used to be a concern (e.g. SQL 2000). I'm no expert, but I haven't worried about it in a while. (Nor can I think of where I'm presently using view joins.)
Another situation where a view might be overkill is when the view is only referenced from one calling location and a derived table could be used instead, just as an anonymous type is preferable to a class in .NET if the anonymous type is only used/referenced once.
• See the derived table description in http://msdn.microsoft.com/en-us/library/ms177634.aspx
3) Why would you use a view in lieu of something like a table-valued function or vice versa?
(Aside from performance reasons) A table-valued function is functionally equivalent to a parameterized view. In fact, a common simple table-valued function use case is simply to add a WHERE clause filter to an already existing view in a single object.
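The "view plus a WHERE filter" idea can be sketched as follows. This is an analogy rather than a real table-valued function: it uses SQLite through Python's sqlite3 module, and a Python function (the hypothetical order_totals_for) plays the role of the parameterized object on top of a hypothetical view order_totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER);
INSERT INTO orders (customer, total) VALUES ('Ann', 10), ('Bob', 20), ('Ann', 5);
CREATE VIEW order_totals AS SELECT customer, total FROM orders;
""")

def order_totals_for(conn, customer):
    """Client-side stand-in for a table-valued function: the existing
    view plus a WHERE filter supplied as a parameter."""
    return conn.execute(
        "SELECT customer, total FROM order_totals "
        "WHERE customer = ? ORDER BY total",
        (customer,),
    ).fetchall()

ann = order_totals_for(conn, "Ann")
print(ann)
```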
4) Are there any circumstances that a view might be useful that aren't apparent at first glance?
I can't think of any non-apparent uses off the top of my head. (I suppose if I could, that would make them apparent ;)
Views can be used to provide security (i.e., users can have access to views that only access certain columns in a table), and they can provide additional security for updates, inserts, etc. Views also provide a way to alias column names (as do stored procedures), but views are more of an isolation from the actual table.
In a sense views denormalize. Denormalization is sometimes necessary to provide data in a more meaningful manner. This is what a lot of applications do anyway by way of domain modeling in their objects. They help present the data in a way that more closely matches a business' perspective.
In addition to what the others have stated, views can also be useful for removing more complicated SQL queries from the application.
As an example, instead of in an application doing:
sql = "select a, b from table1 union
select a, b from table2";
You could abstract that to a view:
create view union_table1_table2_v as
select a,b from table1
union
select a,b from table2
and in the app code, simply have:
sql = "select a, b from union_table1_table2_v";
Also, if the data structures ever change, you won't have to change the app code, recompile, and redeploy. You would just change the view in the DB.
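A runnable version of the example above, as a sketch using SQLite through Python's sqlite3 module. The sample rows and the later addition of table3 are assumptions made up to show a schema change; the application's SQL string stays the same throughout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (a INTEGER, b TEXT);
CREATE TABLE table2 (a INTEGER, b TEXT);
INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
INSERT INTO table2 VALUES (2, 'y'), (3, 'z');
CREATE VIEW union_table1_table2_v AS
SELECT a, b FROM table1
UNION
SELECT a, b FROM table2;
""")

# The application only ever knows this one statement.
APP_SQL = "SELECT a, b FROM union_table1_table2_v ORDER BY a"
before = conn.execute(APP_SQL).fetchall()

# Later the schema changes: a (hypothetical) third table joins the union.
conn.executescript("""
CREATE TABLE table3 (a INTEGER, b TEXT);
INSERT INTO table3 VALUES (4, 'w');
DROP VIEW union_table1_table2_v;
CREATE VIEW union_table1_table2_v AS
SELECT a, b FROM table1
UNION SELECT a, b FROM table2
UNION SELECT a, b FROM table3;
""")
after = conn.execute(APP_SQL).fetchall()
print(before, after)
```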
Views hide the database complexity. They are great for a lot of reasons and are useful in a lot of situations, but if you have users that are allowed to write their own queries and reports, you can use them as a safeguard to make sure they don't submit badly designed queries with nasty cartesian joins that take down your database server.
The OP asked if there were situations where it might be tempting to use a view, but it's not appropriate.
What you don't want to use a view for is a substitute for complex joins. That is, don't let your procedural programming habit of breaking a problem down into smaller pieces lead you toward using several views joined together instead of one larger join. Doing so will kill the database engine's efficiency since it's essentially doing several separate queries rather than one larger one.
For example, let's say you have to join tables A, B, C, and D together. You may be tempted to make a view out of tables A & B and a view out of C & D, then join the two views together. It's much better to just join A, B, C, and D in one query.
Views can centralize or consolidate data. Where I'm at we have a number of different databases on a couple of different linked servers. Each database holds data for a different application. A couple of those databases hold information that is relevant to a number of different applications. What we'll do in those circumstances is create a view in that application's database that just pulls data from the database where the data is really stored, so that the queries we write don't look like they're going across different databases.
The responses so far are correct -- views are good for providing security, denormalization (although there is much pain down that road if done wrong), data model abstraction, etc.
In addition, views are commonly used to implement business logic (a lapsed user is a user who has not logged in in the last 40 days, that sort of thing).
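The lapsed-user rule could look something like this. It is a sketch using SQLite through Python's sqlite3 module, with a hypothetical table app_user and view lapsed_user; the date arithmetic uses SQLite's date() function, so other engines would need their own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (name TEXT, last_login TEXT);
-- Dates are generated relative to today so the example does not go stale.
INSERT INTO app_user VALUES
    ('recent', date('now', '-10 days')),
    ('lapsed', date('now', '-100 days'));
-- The business rule "lapsed = no login in the last 40 days" lives here only.
CREATE VIEW lapsed_user AS
SELECT name FROM app_user WHERE last_login < date('now', '-40 days');
""")

lapsed = [row[0] for row in conn.execute("SELECT name FROM lapsed_user")]
print(lapsed)
```

If the definition of "lapsed" changes to, say, 60 days, only the view needs to change, and every report that reads it picks up the new rule.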
Views save a lot of repeated complex JOIN statements in your SQL scripts. You can just encapsulate some complex JOIN in some view and call it in your SELECT statement whenever needed. This would sometimes be handy, straight forward and easier than writing out the join statements in every query.
A view is simply a stored, named SELECT statement. Think of views like library functions.
I wanted to highlight the use of views for reporting. Often, there is a conflict between normalizing the database tables to speed up performance, especially for editing and inserting data (OLTP uses), and denormalizing to reduce the number of table joins for queries for reporting and analysis (OLAP uses). Of necessity, OLTP usually wins, because data entry must have optimal performance. Creating views, then, for optimal reporting performance, can help to satisfy both classes of users (data entry and report viewers).
I remember a very long SELECT which involved several UNIONs. Each UNION included a join to a price table which was created on the fly by a SELECT that was itself fairly long and hard to understand. I think it would have been a good idea to have a view to create the price table. It would have shortened the overall SELECT by about half.
I don't know if the DB would evaluate the view once, or once each time it was invoked. Anyone know? If the former, using a view would improve performance.
Anytime you need [my_interface] != [user_interface].
Example:
TABLE A:
id
info
VIEW for TABLE A:
Customer Information
this is a way you might hide the id from the customer and rename the info to a more verbose name both at once.
The view will use underlying index for primary key id, so you won't see a performance loss, just better abstraction of the select query.
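The TABLE A example above might be written like this. It is a sketch using SQLite through Python's sqlite3 module; the view name a_v and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, info TEXT);
INSERT INTO a (info) VALUES ('Ann, Paris'), ('Bob, Rome');
-- Expose only info, under a friendlier name; id stays internal.
CREATE VIEW a_v AS SELECT info AS "Customer Information" FROM a;
""")

cur = conn.execute("SELECT * FROM a_v ORDER BY 1")
columns = [d[0] for d in cur.description]  # column names the client sees
rows = cur.fetchall()
print(columns, rows)
```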
I'm architecting a new app at the moment, with a high read:write ratio. At my current employer we have lots of denormalised data on our tables for performance reasons. Is it better practice to have totally 3NF tables and then use indexed views to do all the denormalisation? Should I run queries against the tables or views?
An example of the things I am interested in is aggregates of columns in child tables (e.g. having a user's post count stored somewhere).
In general it's a good idea to have denormalized views if you need to access across multiple normalized tables very frequently. In most cases it'll be a significant performance increase over using a join and querying directly against the tables, and it's usually not any less maintainable, since either your view or join can be written to be agnostic about changes to parts of the tables that it doesn't use.
Whether all your tables should be in the third normal form is another question. In most applications I've worked with the answer is most tables should be normalized this way, but there are exceptions. Whether to make an exception has to do with how the data is used, and whether you can be very confident about that use not changing in the future.
Having to go back and re-normalize later because you did something the wrong way can be costly, but over-normalizing data that should be straightforward to use and understand can make things more complicated and difficult to maintain than they need to be. Your mileage may vary.
If you are going to use views to present denormalized data to the user (and you're using SQL Server), you should check out the SCHEMABINDING clause. If a view is schemabound, you can index it, and the index will be updated when the underlying tables are updated. In this way, if the indexes are set up well, people who are looking for data can actually select from the index, so it won't need to rebuild the complex view for every query, but users will still see up-to-date data when the underlying tables change.