Is it possible to add indices to a View in Teradata? I am aiming to make querying of Views faster by adding indices.
Tried using SQL to check for indexes on existing Views with SELECT * FROM DBC.Indices, yet there are only results for Tables, none for Views.
I have also searched the internet, but have so far been unable to find anything for Teradata.
I would have expected to be able to find an index on some of the existing Views, if it was possible.
An index is always associated with a table (or with multiple tables in the case of a join index), never with a view.
But: the execution plan of a query (and consequently its performance) depends, among other things, on the indices defined on the tables involved in the query.
So while you can't create an index on a view, you can create indices on the underlying tables, and doing so will change how queries that reference the view are executed.
However, before you start creating additional indices in the hope of fixing your performance problem, you should first inspect the execution plan of the problematic queries and determine what the correct plan ought to look like. The problem might not be a lack of indices, but rather a lack of up-to-date statistics, poorly written queries, or bad table design (e.g. a wrong PI or poor partitioning).
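As a minimal sketch of that workflow in Teradata (v_sales, sales, and order_date are invented names; the exact COLLECT STATISTICS syntax may vary slightly between Teradata versions):

-- Inspect the plan for a query against the view:
EXPLAIN SELECT * FROM v_sales WHERE order_date = DATE '2023-01-01';
-- If the plan shows a full-table scan, consider a secondary index
-- on the underlying table (not the view):
CREATE INDEX (order_date) ON sales;
-- And give the optimizer statistics to work with:
COLLECT STATISTICS ON sales COLUMN (order_date);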
Related
I found that my query runs extremely slowly when two tables are inner-joined at a large data scale, while for a small dataset it was fine. I was told that SQLite3 can only use one index for every table in a query, but I am not sure about that. Does anyone know where this information can be found? Thanks a lot!
Much of the information about how SQLite uses indexes can be found in the following documentation URLs:
https://www.sqlite.org/queryplanner.html
https://www.sqlite.org/optoverview.html
https://www.sqlite.org/queryplanner-ng.html
Basically, each time a table is used in a query, at most one index of the table can normally be used (a notable exception is that OR conditions in a WHERE or ON might allow multiple indexes, one per term). The query planner tries to pick the one that will be most appropriate and fastest. Running ANALYZE on a populated database generates statistics that can be used to improve the selection when there are multiple possible indexes.
Understanding EXPLAIN QUERY PLAN reports is essential to figuring out how a particular query ends up being resolved and how and what indexes are involved.
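As a small hedged example (the tables and index below are invented), you can see the planner's index choice and then refresh its statistics:

-- Hypothetical schema:
CREATE TABLE big_a (id INTEGER PRIMARY KEY, b_id INTEGER, val TEXT);
CREATE TABLE big_b (id INTEGER PRIMARY KEY, name TEXT);
CREATE INDEX idx_a_bid ON big_a(b_id);
-- Show which index (at most one per table) the planner picks for the join:
EXPLAIN QUERY PLAN
SELECT a.val, b.name FROM big_a a JOIN big_b b ON a.b_id = b.id;
-- Gather statistics so the planner can choose between candidate indexes:
ANALYZE;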
Scenario:
I have 3 tables needing to be joined together, a where clause to limit the result set, and only a few columns from each table being selected. Simple. However, the query to do this isn't very pretty, and when using an ORM between the database and the application, it's like trying to put a square peg into a round hole.
My way to get around this is to create a view that encapsulates the query, so my application model maps directly to a view in the database; no more crazy mapping in the ORM layer.
Question:
Assuming no other factors come into play here, will the query against the view incur any additional performance penalties that I wouldn't have hit if I executed the SQL statement directly? This is not an indexed view; assume the same WHERE clause, and keep this simple.
I am being led to believe that a view suffers from the extra overhead of "being built". My understanding is that, with all else the same, the two should have identical performance.
Please clarify. Thanks!
From MSDN:
View resolution
When an SQL statement references a nonindexed view, the parser and query optimizer analyze the source of both the SQL statement and the view and then resolve them into a single execution plan. There is not one plan for the SQL statement and a separate plan for the view.
There should not be any difference in performance. Views help you organize your queries; they are not a performance enhancement in themselves, unless you are using indexed views.
Only the definition of a nonindexed view is stored, not the rows of the view. The query optimizer incorporates the logic from the view definition into the execution plan it builds for the SQL statement that references the nonindexed view.
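To illustrate view resolution concretely (a hedged sketch; dbo.users and the view name are invented), the two queries below compile to the same execution plan:

CREATE VIEW dbo.v_active_users AS
SELECT id, name FROM dbo.users WHERE active = 1;
GO
-- Querying the view...
SELECT id, name FROM dbo.v_active_users WHERE name = 'Alice';
-- ...is resolved exactly as if you had written:
SELECT id, name FROM dbo.users WHERE active = 1 AND name = 'Alice';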
In Oracle, the performance is the same. A view is really a named SQL statement. But fancier.
When you start nesting views, and joining views with other tables or views, things get complicated very quickly. If Oracle can't push your filters down through the view to the table, it often has to materialize (build a temp table of) parts of the query, and this is when you get bad performance.
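One classic Oracle example of a filter that can't be pushed down (a hedged sketch with invented names): ROWNUM inside a view forces the view to be evaluated first, because pushing the filter in would change the ROWNUM values.

CREATE VIEW v_numbered AS
SELECT ROWNUM AS rn, o.* FROM orders o;
-- Oracle must materialize v_numbered before applying this filter:
SELECT * FROM v_numbered WHERE customer_id = 42;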
Microsoft SQL Server allows you to add an index to a view, but why would you want to do this?
My understanding is that a view is really just a subquery, i.e., if I say SELECT * FROM myView, I'm really saying SELECT * FROM (myView's Query)
It seems like the indexes on the underlying tables would be the ones that matter the most. So why would you want a separate index on the view?
If the view is indexed then any queries that can be answered using the index only will never need to refer to the underlying tables. This can lead to an enormous improvement in performance.
Essentially, the database engine is maintaining a "solved" version of the query (or, rather, the index of the query) as you update the underlying tables, then using that solved version rather than the original tables when possible.
Here is a good article in Database Journal.
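For a concrete (hedged) illustration of such a "solved" version in SQL Server, here is a sketch with invented names. The view must be schema-bound before it can be indexed, and the unique clustered index is what materializes the join result:

CREATE VIEW dbo.v_order_details WITH SCHEMABINDING AS
SELECT o.order_id, o.customer_id, d.product_id, d.quantity
FROM dbo.orders o JOIN dbo.order_details d ON d.order_id = o.order_id;
GO
-- Assumes (order_id, product_id) is unique across the join:
CREATE UNIQUE CLUSTERED INDEX ix_v_order_details
ON dbo.v_order_details (order_id, product_id);
GO
-- On non-Enterprise editions, hint the optimizer to use the materialized view:
SELECT order_id, quantity
FROM dbo.v_order_details WITH (NOEXPAND)
WHERE product_id = 7;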
Microsoft SQL Server allows you to add an index to a view, but why would you want to do this?
To speed up the queries.
My understanding is that a view is really just a subquery, i.e., if I say SELECT * FROM myView, I'm really saying SELECT * FROM (myView's Query)
Not always.
By creating a clustered index on a view, you materialize the view, and updates to the underlying tables physically update the view. The queries against this view may or may not access the underlying tables.
Not all views can be indexed.
For instance, if you are using GROUP BY in a view, for it to be indexable it must contain COUNT_BIG(*), and all aggregate functions in it must distribute over UNION ALL (only SUM and COUNT_BIG actually do). This is required so the index can be maintained incrementally: an update to the underlying tables can then update the view in a timely fashion.
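As a hedged sketch of an indexable GROUP BY view (all names invented; note the mandatory COUNT_BIG(*) and the ISNULL wrapper, since SUM over a nullable expression is not allowed in an indexed view):

CREATE VIEW dbo.v_sales_by_product WITH SCHEMABINDING AS
SELECT product_id,
       SUM(ISNULL(amount, 0)) AS total_amount,
       COUNT_BIG(*) AS row_count
FROM dbo.sales
GROUP BY product_id;
GO
CREATE UNIQUE CLUSTERED INDEX ix_v_sales_by_product
ON dbo.v_sales_by_product (product_id);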
The following link provides better-worded information than I can give, especially in the section on performance increases. Hope it helps:
http://technet.microsoft.com/en-us/library/cc917715.aspx
You create an index on a view for the same reason as on a base table: to improve the performance of queries against that view. Another reason for doing it is to implement some uniqueness constraint you can't implement against base tables. SQL Server unfortunately doesn't allow constraints to be created on views.
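A hedged sketch of that uniqueness trick (table and column names invented): enforce uniqueness over non-NULL emails only, something a plain UNIQUE constraint on the table can't express:

CREATE VIEW dbo.v_unique_emails WITH SCHEMABINDING AS
SELECT email FROM dbo.users WHERE email IS NOT NULL;
GO
-- The unique clustered index rejects duplicate non-NULL emails:
CREATE UNIQUE CLUSTERED INDEX ix_v_unique_emails
ON dbo.v_unique_emails (email);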
The MySQL certification guide suggests that views can be used for:
creating a summary that may involve calculations
selecting a set of rows with a WHERE clause, hiding irrelevant information
result of a join or union
allowing changes to a base table via a view that preserves the schema of the original table, to accommodate other applications
But from "how to implement search for 2 different table data?":
And maybe you're right that it doesn't work since mysql views are not good friends with indexing. But still. Is there anything to search for in the shops table?
I learned that views don't work well with indexing, so will it be a big performance hit for the convenience they provide?
A view can be thought of simply as a SQL query stored permanently on the server. Whatever indices the stored query would use will still be used. In that sense, there is no difference between the SQL query and the view: it does not affect performance any more negatively than the actual SQL query. If anything, since the query text is stored on the server and does not need to be sent and parsed at run time, it can be marginally faster.
It does afford you these additional advantages:
reusability
a single source for optimization
This MySQL forum thread about indexing views gives a lot of insight into what MySQL views actually are.
Some key points:
A view is really nothing more than a stored SELECT statement
The data of a view is the data of tables referenced by the View.
creating an index on a view will not work as of the current version
If the MERGE algorithm is used, then the indexes of the underlying tables will be used.
The underlying indices are not visible, however. DESCRIBE on a view will show no indexed columns.
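A short hedged sketch of the MERGE behaviour (table and view names invented): EXPLAIN on the view shows the base table's index being used, even though DESCRIBE on the view lists no indexed columns.

CREATE ALGORITHM = MERGE VIEW v_orders AS
SELECT order_id, customer_id, total FROM orders;
-- The optimizer folds the view into the outer query, so an index
-- on orders.customer_id can be used here:
EXPLAIN SELECT * FROM v_orders WHERE customer_id = 42;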
MySQL views, according to the official MySQL documentation, are stored queries that when invoked produce a result set.
A database view is nothing but a virtual table or logical table (commonly consisting of a SELECT query with joins). Because a database view is similar to a database table, consisting of rows and columns, you can query data against it.
Views should be used when:
Simplifying complex queries (like IF ELSE and JOIN or working with triggers and such)
Putting an extra layer of security and limiting or restricting data access (since views are merely virtual tables, they can be set to be read-only for a specific set of DB users, and can restrict INSERT)
Backward compatibility and query reusability
Working with computed columns. Computed columns should NOT be stored in DB tables, because storing derived data makes for bad schema design.
Views should not be used when:
the associated table(s) are tentative or subject to frequent structural change.
According to http://www.mysqltutorial.org/introduction-sql-views.aspx
A database table should not have calculated columns; however, a database view should.
I tend to use a view when I need to calculate totals, counts etc.
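For example (a hedged sketch with invented names), totals stay out of the base table and live in the view instead:

CREATE VIEW order_totals AS
SELECT order_id,
       SUM(quantity * unit_price) AS order_total,
       COUNT(*) AS line_count
FROM order_items
GROUP BY order_id;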
Hope that helps!
One more downside of views: they don't work well with MySQL replication, and can cause replication lag between the master and the slave.
http://bugs.mysql.com/bug.php?id=30998
What are the patterns you use to determine the frequent queries?
How do you select the optimization factors?
What are the types of changes one can make?
This is a nice question, if rather broad (and none the worse for that).
If I understand you, then you're asking how to attack the problem of optimisation starting from scratch.
The first question to ask is: "is there a performance problem?"
If there is no problem, then you're done. This is often the case. Nice.
On the other hand...
Determine Frequent Queries
Logging will get you your frequent queries.
If you're using some kind of data access layer, then it might be simple to add code to log all queries.
It is also a good idea to log when the query was executed and how long each query takes. This can give you an idea of where the problems are.
Also, ask the users which bits annoy them. If a slow response doesn't annoy the user, then it doesn't matter.
Select the optimization factors?
(I may be misunderstanding this part of the question)
You're looking for any patterns in the queries / response times.
These will typically be queries over large tables or queries which join many tables in a single query. ... but if you log response times, you can be guided by those.
Types of changes one can make?
You're specifically asking about optimising tables.
Here are some of the things you can look for:
Denormalisation. This brings several tables together into one wider table, so instead of your query joining several tables together, you can just read one table. This is a very common and powerful technique. NB. I advise keeping the original normalised tables and building the denormalised table in addition - this way, you're not throwing anything away. How you keep it up to date is another question. You might use triggers on the underlying tables, or run a refresh process periodically.
Normalisation. This is not often considered to be an optimisation process, but it is in 2 cases:
updates. Normalisation makes updates much faster because each update is the smallest it can be (you are updating the smallest possible table, in terms of columns and rows). This is almost the very definition of normalisation.
Querying a denormalised table to get information which exists on a much smaller (fewer rows) table may be causing a problem. In this case, store the normalised table as well as the denormalised one (see above).
Horizontal partitioning. This means making tables smaller by putting some rows in another, identical table. A common use case is to have all of this month's rows in table ThisMonthSales, and all older rows in table OldSales, where both tables have an identical schema (see the sketch after this list). If most queries are for recent data, this strategy can mean that 99% of all queries are only looking at 1% of the data - a huge performance win.
Vertical partitioning. This means chopping fields off a table and putting them in a new table which is joined back to the main table by the primary key. This can be useful for very wide tables (e.g. with dozens of fields), and may possibly help if tables are sparsely populated.
Indices. I'm not sure if your question covers these, but there are plenty of other answers on SO concerning the use of indices. A good way to find a case for an index is: find a slow query, look at the query plan and find a table scan, then index fields on that table so as to remove the table scan. I can write more on this if required - leave a comment.
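Here is the horizontal-partitioning sketch promised above (column definitions invented; the table names come from the example):

CREATE TABLE ThisMonthSales (sale_id INT PRIMARY KEY, sale_date DATE, amount DECIMAL(10,2));
CREATE TABLE OldSales       (sale_id INT PRIMARY KEY, sale_date DATE, amount DECIMAL(10,2));
-- Most queries touch only the small, hot table:
SELECT SUM(amount) FROM ThisMonthSales WHERE sale_date >= '2008-09-01';
-- A UNION ALL view covers the rare queries that need full history:
CREATE VIEW AllSales AS
SELECT * FROM ThisMonthSales
UNION ALL
SELECT * FROM OldSales;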
You might also like my post on this.
That's difficult to answer without knowing which system you're talking about.
In Oracle, for example, the Enterprise Manager lets you see which queries took up the most time, lets you compare different execution profiles, and lets you analyze queries over a block of time so that you don't add an index that's going to help one query at the expense of every other one you run.
Your question is a bit vague. Which DB platform?
If we are talking about SQL Server:
Use the Dynamic Management Views. Use SQL Profiler. Install the SP2 and the performance dashboard reports.
After determining the most costly queries (i.e. number of times run x cost of one query), examine their execution plans, look at the sizes of the tables involved, and check whether they are predominantly read, write, or a mixture of both.
If the system is under your full control (apps and DB) you can often re-write queries that are badly formed (quite a common occurrence), such as deep correlated sub-queries, which can often be re-written as derived-table joins with a little thought. Otherwise, your options are to create covering non-clustered indexes and ensure that statistics are kept up to date.
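As a hedged starting point using those DMVs (the views are real, but ranking by total CPU is a choice you should adapt to your workload):

SELECT TOP 10
       qs.execution_count,
       qs.total_worker_time AS total_cpu,
       st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;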
For MySQL there is a feature called the slow query log.
The rest is based on what kind of data you have and how it is set up.
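On reasonably recent MySQL versions you can switch the slow query log on at runtime (a minimal sketch; the one-second threshold is an arbitrary choice):

SET GLOBAL slow_query_log = 'ON';
-- Log anything slower than 1 second:
SET GLOBAL long_query_time = 1;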
In SQL Server you can use a trace to find out how your query is performing. Use Ctrl+K or Ctrl+L to view the execution plan.
For example, if you see a full table scan happening on a table with a large number of records, then it probably is not a good query.
A more specific question will definitely fetch you better answers.
If your table is predominantly read, place a clustered index on the table.
My experience is with mainly DB2 and a smattering of Oracle in the early days.
If your DBMS is any good, it will have the ability to collect stats on specific queries and explain the plan it used for extracting the data.
For example, if you have a table (x) with two columns (date and diskusage) and only have an index on date, the query:
select diskusage from x where date = '2008-01-01'
will be very efficient since it can use the index. On the other hand, the query
select date from x where diskusage > 90
would not be so efficient. In the former case, the "explain plan" would tell you that it could use the index. In the latter, it would have said that it had to do a table scan to get the rows (that's basically looking at every row to see if it matches).
Really intelligent DBMSs may also explain what you should do to improve the performance (add an index on diskusage in this case).
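The fix for the second query would then be (index name invented):

CREATE INDEX x_diskusage ON x (diskusage);
-- Re-running "select date from x where diskusage > 90" should now show
-- an index range scan instead of a table scan in the explain plan.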
As to how to see what queries are being run, you can either collect that from the DBMS (if it allows it) or force everyone to do their queries through stored procedures so that the DBA control what the queries are - that's their job, keeping the DB running efficiently.
Indices on PKs and FKs, and one thing that always helps: PARTITIONING...
1. What are the patterns you use to determine the frequent queries?
Depends on what level you are dealing with the database at. If you're a DBA or have access to the tools, DBs like Oracle allow you to run jobs and generate stats/reports over a specified period of time. If you're a developer writing an application against a DB, you can just do performance profiling within your app.
2. How do you select the optimization factors?
I try to get a general feel for how the table is being used and the data it contains. I work through the following questions.
Is it going to be updated a ton and on what fields do updates occur?
Does it have columns with low cardinality?
Is it worth indexing? (tables that are very small can be slowed down if accessed by an index)
How much maintenance/headache is it worth to have it run faster?
Ratio of updates/inserts vs queries?
etc.
3. What are the types of changes one can make?
-- If using Oracle, keep statistics up to date! =)
-- Normalization/De-Normalization: either one can improve performance depending on the usage of the table. I almost always normalize, and only de-normalize if I can in no other practical way make the query faster. A nice way to denormalize for queries, when your situation allows it, is to keep the real tables normalized and create a denormalized "table" with a materialized view (see the sketch at the end of this list).
-- Index judiciously. Too many can be bad on many levels. BitMap indexes are great in Oracle as long as you're not updating the column frequently and that column has a low cardinality.
-- Using Index organized tables.
-- Partitioned and sub-partitioned tables and indexes
-- Use stored procedures to reduce round trips by applications, increase security, and enable query optimization without affecting users.
-- Pin tables in memory if appropriate (accessed a lot and fairly small)
-- Device partitioning between index and table database files.
..... the list goes on. =)
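Here is the materialized-view sketch promised above, in Oracle syntax (all names invented; a COMPLETE refresh on demand keeps the example simple, though FAST refresh with materialized view logs is also possible):

CREATE MATERIALIZED VIEW sales_summary
REFRESH COMPLETE ON DEMAND
AS
SELECT product_id, SUM(amount) AS total_amount, COUNT(*) AS order_count
FROM sales
GROUP BY product_id;
-- Refresh on your own schedule (SQL*Plus syntax):
EXEC DBMS_MVIEW.REFRESH('SALES_SUMMARY');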
Hope this is helpful for you.