In my case, the fact table (sales) comes from a view on a SQL Server 2005 database. The view covers two years of data. For the partition design, I want to build one partition for each year. Can I use different queries to build the partitions? What's the rule here? Do I only need to make each query return exactly the same columns as the view? If so, does the DSV only serve as metadata describing the relationships between facts and dimensions, while the actual fact data comes from the partitions?
Yes, it is exactly as you described.
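As a rough illustration of that rule, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. It checks that each per-year partition query returns exactly the same columns as the view, and that the partitions together cover the view's rows (overlapping partitions would double count). The table, view, and partition names (sales_raw, sales_v, sales_2008, sales_2009) are hypothetical, and nothing here is an SSAS API.

```python
import sqlite3

# Hypothetical stand-in for the SQL Server view: sales rows for two years.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_raw (sale_date TEXT, product_id INTEGER, amount REAL);
    INSERT INTO sales_raw VALUES
        ('2008-03-01', 1, 10.0),
        ('2008-11-15', 2, 20.0),
        ('2009-02-10', 1, 30.0);
    CREATE VIEW sales_v AS
        SELECT sale_date, product_id, amount FROM sales_raw;
""")

# One source query per partition; each must expose exactly the view's columns.
partition_queries = {
    "sales_2008": "SELECT sale_date, product_id, amount FROM sales_raw "
                  "WHERE sale_date >= '2008-01-01' AND sale_date < '2009-01-01'",
    "sales_2009": "SELECT sale_date, product_id, amount FROM sales_raw "
                  "WHERE sale_date >= '2009-01-01' AND sale_date < '2010-01-01'",
}

def columns(sql):
    """Return the column names a query produces."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

view_cols = columns("SELECT * FROM sales_v")
for name, sql in partition_queries.items():
    assert columns(sql) == view_cols, f"{name} does not match the view"

# Together the partitions should contain every row of the view exactly once.
total_view = conn.execute("SELECT COUNT(*) FROM sales_v").fetchone()[0]
total_parts = sum(conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]
                  for sql in partition_queries.values())
print(view_cols, total_view, total_parts)
```

Using half-open date ranges (>= start, < next start) is one simple way to keep adjacent partitions from overlapping.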
Related
Maybe I'm using the wrong search terms for this, but I'm trying to understand if there's a difference between view vs. table query performance in Netezza. I have an Inventory view, which also considers Currency, that has grown to nearly 2B records since inception several years ago. The view is created by joining several tables and query performance has degraded over time. I'm wondering if it would help to create a new physical table that does the joins the view is currently doing and then create my new view as basically a "SELECT * FROM [THIS_NEW_TABLE]". Would this new view then theoretically perform better than the original one, where the joins are in the view? I know I can test this to see the results, but I'm trying to understand why one would be better than the other.
The answer is "yes", at least under most circumstances. Selecting from a base table versus a view has these advantages:
The engine only has to read the data in the columns of the resulting table. With a view, additional columns may need to be read.
The engine has accurate statistics on the resulting table, which can be used for the rest of the query.
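The resulting table can be indexed to further speed up queries that use it (in Netezza, which does not use conventional indexes, the equivalent is choosing a suitable distribution key and organizing keys).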
The downside, of course, is that the table is out of date as soon as new data is inserted into (or updated in or deleted from) the base tables. If you can live with that (say, by recreating the table once per day or once per week), you should see a performance improvement.
Some databases offer materialized views to overcome this issue. Alas, Netezza restricts materialized views to a single table, so that doesn't particularly help you.
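A minimal sketch of the "pre-joined table plus thin view" approach, assuming a nightly refresh is acceptable. It uses Python's sqlite3 module as a stand-in for Netezza, so Netezza-specific tuning (distribution and organizing keys) is not modeled; the table and view names (inventory, fx_rate, inventory_snapshot, inventory_fast_v) are hypothetical. It also shows the staleness trade-off described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (item_id INTEGER, qty INTEGER, currency TEXT, unit_cost REAL);
    CREATE TABLE fx_rate (currency TEXT, usd_rate REAL);
    INSERT INTO inventory VALUES (1, 5, 'EUR', 2.0), (2, 3, 'USD', 4.0);
    INSERT INTO fx_rate VALUES ('EUR', 1.5), ('USD', 1.0);

    -- Original approach: the join is evaluated every time the view is queried.
    CREATE VIEW inventory_v AS
        SELECT i.item_id, i.qty, i.qty * i.unit_cost * f.usd_rate AS value_usd
        FROM inventory i JOIN fx_rate f ON f.currency = i.currency;

    -- Proposed approach: store the joined result once...
    CREATE TABLE inventory_snapshot AS SELECT * FROM inventory_v;
    -- ...and point the new view at the physical table (no joins at query time).
    CREATE VIEW inventory_fast_v AS SELECT * FROM inventory_snapshot;
""")

def refresh_snapshot(conn):
    """Re-run the expensive join and replace the snapshot (e.g. from a nightly job)."""
    with conn:  # delete and reload in a single transaction
        conn.execute("DELETE FROM inventory_snapshot")
        conn.execute("INSERT INTO inventory_snapshot SELECT * FROM inventory_v")

# New base-table rows are invisible through the fast view until the next refresh.
conn.execute("INSERT INTO inventory VALUES (3, 1, 'USD', 10.0)")
stale = conn.execute("SELECT COUNT(*) FROM inventory_fast_v").fetchone()[0]
refresh_snapshot(conn)
fresh = conn.execute("SELECT COUNT(*) FROM inventory_fast_v").fetchone()[0]
print(stale, fresh)  # 2 3
```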
I have a question about querying a star schema in an MS SQL data warehouse.
I have a fact table and 8 dimensions, and I am confused: to get the metrics from the fact table, do we have to join all the dimensions to it, even when we are not retrieving any data from them? Is this required to get the right metrics?
My fact table is huge, so I am asking both for performance reasons and to learn the right way to query it.
Thanks!
No, you do not have to join all 8 dimensions. You only need to join the dimensions that contain data you need for analyzing the metrics in the fact table. Also, to increase performance, make sure to include only the dimension columns that are needed for the analysis. Including all columns from the dimensions you join will decrease performance.
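A small sketch of that advice, using Python's sqlite3 module and a hypothetical two-dimension star schema (dim_product, dim_store, fact_sales). A per-category total needs only dim_product, and a grand total needs no dimension at all; joining the unused dim_store as well returns the same numbers here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT, name TEXT);
    CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales  (product_key INTEGER, store_key INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Toys', 'Ball'), (2, 'Books', 'Atlas');
    INSERT INTO dim_store   VALUES (10, 'North'), (20, 'South');
    INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 20, 7.0), (2, 10, 3.0);
""")

# Sales by category: only dim_product is joined, and only its category column is used.
by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# Joining an unused dimension adds work but (with complete keys) changes nothing.
with_unused_join = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# A grand total needs no dimension at all.
grand_total = conn.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
print(by_category, grand_total)
```

The two joined queries agree only because every fact row has a matching key in each dimension. With inner joins, an unmatched key would silently drop fact rows, which is one more reason to leave unneeded dimensions out.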
It is not necessary to include all the dimensions. Indeed, when exploring fact tables, it is very important to be able to join only some dimensions and drop the others. Performance issues must not be an excuse to give up this capability.
You have a number of techniques for solving performance issues, depending on the database you are using. Some common ones:
aggregate tables: this is one of the best ways to solve performance issues. If you have a huge fact table, you can create an aggregated version of it that keeps only the most frequently queried columns, so it is much smaller. Users (or the ad hoc query application) then have to query the aggregate table instead of the original fact table whenever possible. The good news is that many databases can manage aggregate tables automatically (materialized views, for example): queries that target the original fact table are transparently redirected to the aggregate table whenever possible.
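indexing: bitmap indexes (in Oracle, for example) can be an efficient way to speed up queries on a star schema fact table. SQL Server has no bitmap indexes; there, columnstore indexes (SQL Server 2012 and later) are the usual choice for large fact tables.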
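A sketch of the aggregate-table idea, using Python's sqlite3 module with hypothetical names (fact_sales, agg_sales_month). SQLite has no automatic query rewrite, so here the query is pointed at the aggregate by hand; the point is only that a monthly question gets the same answer from far fewer rows once the customer grain is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (
        sale_date TEXT, product_key INTEGER, customer_key INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES
        ('2024-01-03', 1, 100, 5.0),
        ('2024-01-20', 1, 101, 7.0),
        ('2024-02-11', 2, 100, 3.0);

    -- Aggregate table: monthly totals per product; the customer grain is dropped.
    CREATE TABLE agg_sales_month AS
        SELECT substr(sale_date, 1, 7) AS sale_month, product_key,
               SUM(amount) AS amount, COUNT(*) AS n_rows
        FROM fact_sales
        GROUP BY substr(sale_date, 1, 7), product_key;
""")

# The same monthly question, answered from the detailed table and from the aggregate.
from_fact = conn.execute("""
    SELECT substr(sale_date, 1, 7), SUM(amount) FROM fact_sales
    GROUP BY substr(sale_date, 1, 7) ORDER BY 1""").fetchall()
from_agg = conn.execute("""
    SELECT sale_month, SUM(amount) FROM agg_sales_month
    GROUP BY sale_month ORDER BY 1""").fetchall()

n_fact = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
n_agg = conn.execute("SELECT COUNT(*) FROM agg_sales_month").fetchone()[0]
print(from_agg, n_fact, n_agg)
```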
I have a SQL Server 2008 R2 database containing a very large table that we use for reporting. Every night around 40,000 records are inserted into it. I have read in many articles that indexed views are suitable for OLAP or data warehouse databases, not for transactional tables.
My goal is not to query the whole table but a subset, say the last 3 months of data. I don't want to use triggers to maintain that subset. Would an indexed view be suitable for my scenario? If not, are there any better ideas?
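Before using an indexed view, check the restrictions and side effects that come with it. For example, an indexed view cannot use a nondeterministic function such as GETDATE(), so it cannot define a sliding "last 3 months" window on its own. More items to consider are listed here: http://msdotnetbuddy.blogspot.com/2010/12/indexed-view-in-mssql-server.html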
You could also partition your big table, say by quarter, so that queries on recent data only touch the relevant partitions. If that is not an option, you could create a separate cache table that contains only the data needed for this report.
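A sketch of the cache-table option, assuming it is rebuilt by the nightly load job rather than by triggers. It uses Python's sqlite3 module as a stand-in; the names big_table, report_cache, and event_date are hypothetical, and in SQL Server the cutoff would be computed with DATEADD(month, -3, ...) instead of SQLite's date() modifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_table (event_date TEXT, value REAL);
    CREATE TABLE report_cache (event_date TEXT, value REAL);
    INSERT INTO big_table VALUES
        ('2024-01-10', 1.0), ('2024-03-01', 2.0), ('2024-05-01', 3.0);
""")

def refresh_report_cache(conn, as_of):
    """Replace the cache contents with rows from the three months up to as_of."""
    with conn:  # the DELETE and INSERT commit together, or not at all
        conn.execute("DELETE FROM report_cache")
        conn.execute("""
            INSERT INTO report_cache (event_date, value)
            SELECT event_date, value FROM big_table
            WHERE event_date >= date(?, '-3 months') AND event_date <= ?
        """, (as_of, as_of))

# A nightly job would call this after the new rows have been loaded.
refresh_report_cache(conn, "2024-05-15")
cached = conn.execute("SELECT COUNT(*) FROM report_cache").fetchone()[0]
print(cached)  # 2
```

Passing the as-of date explicitly keeps the refresh reproducible; the job can simply supply today's date.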
You could use an indexed view. You will need to create the view WITH SCHEMABINDING (and then create a unique clustered index on it); search for those keywords to find the implications of doing so.
Is there any database server that offers a global projection of the entire database? For example, suppose we have 30 tables that each have a 'Year' column, the database holds data for the last 5 years, and we are interested in only one year of data at a time. Is there any way to do a global projection so that we get a view of the database that includes only one year of data at a time?
If you really cannot alter existing code to make it show only the past year, create a view for every table that shows only the 'current year'. To show anything other than the current year, query the source table directly. Rename each table and give its view the table's original name (though this is generally sloppy practice).
Otherwise you're going to have to use a WHERE clause in all your queries.
Realistically, this is something your ORM should deal with, NOT your RDBMS, unless you're issuing raw SQL queries in your code (in which case see the VIEW option at the start of this answer).
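A sketch of the rename-and-view trick, using Python's sqlite3 module. The table orders, the helper expose_single_year, and the _all suffix are hypothetical. Views cannot take parameters, so the year is baked into each view definition, and changing the year means dropping and recreating the views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, year INTEGER, amount REAL);
    INSERT INTO orders (year, amount) VALUES (2022, 1.0), (2023, 2.0), (2023, 3.0);
""")

def expose_single_year(conn, table, year):
    """Rename `table` to `<table>_all` and put a year-filtered view under the old name.

    Table names are interpolated directly, so only pass trusted identifiers.
    """
    conn.execute(f"ALTER TABLE {table} RENAME TO {table}_all")
    conn.execute(f"CREATE VIEW {table} AS SELECT * FROM {table}_all WHERE year = {int(year)}")

expose_single_year(conn, "orders", 2023)

# Existing code that queries "orders" now sees only the 2023 rows.
rows = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(rows)  # (2, 5.0)
```

Running the helper once per table (30 times in the example from the question) gives the "one year at a time" picture without touching existing queries.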
A UNION query with a WHERE clause to filter by a year date range should solve what you are describing.
All the major RDBMS support this functionality.
If the tables all have the same schema then it's easy; if not, you will probably have to introduce 'dummy' columns for some portions of the UNION.
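A sketch of the UNION approach with a 'dummy' column, using Python's sqlite3 module. The tables invoices and refunds are hypothetical; refunds has no customer column, so NULL is selected in its place to keep the column lists aligned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (year INTEGER, customer TEXT, total REAL);
    CREATE TABLE refunds  (year INTEGER, total REAL);  -- no customer column
    INSERT INTO invoices VALUES (2022, 'acme', 10.0), (2023, 'acme', 20.0);
    INSERT INTO refunds  VALUES (2023, -5.0), (2021, -1.0);
""")

year = 2023
rows = conn.execute("""
    SELECT 'invoice' AS source, year, customer, total FROM invoices WHERE year = ?
    UNION ALL
    SELECT 'refund', year, NULL, total FROM refunds WHERE year = ?  -- NULL as dummy column
    ORDER BY source
""", (year, year)).fetchall()
print(rows)
```

UNION ALL is used here because rows from different tables are not meant to be deduplicated; plain UNION would also remove duplicates, at some extra cost.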
[SGBD is the French term for an RDBMS: What does SGBD mean?]
The MySQL certification guide suggests that views can be used for:
creating a summary that may involve calculations
selecting a set of rows with a WHERE clause, hiding irrelevant information
the result of a join or union
allowing changes to be made to a base table while a view preserves the original table's schema, to accommodate other applications
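A sketch of the first two uses (a summary involving calculations, and a WHERE clause hiding irrelevant rows), using Python's sqlite3 module as a stand-in for MySQL. The table order_lines and the view order_totals are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_lines (order_id INTEGER, qty INTEGER, unit_price REAL, status TEXT);
    INSERT INTO order_lines VALUES
        (1, 2, 5.0, 'shipped'), (1, 1, 3.0, 'shipped'), (2, 4, 2.5, 'cancelled');

    -- Summary with a calculation, restricted to relevant rows by a WHERE clause.
    CREATE VIEW order_totals AS
        SELECT order_id, SUM(qty * unit_price) AS order_total, COUNT(*) AS n_lines
        FROM order_lines
        WHERE status <> 'cancelled'
        GROUP BY order_id;
""")

totals = conn.execute("SELECT * FROM order_totals").fetchall()
print(totals)  # [(1, 13.0, 2)]
```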
But from "How to implement search for 2 different table data?":
"And maybe you're right that it doesn't work since mysql views are not good friends with indexing. But still. Is there anything to search for in the shops table?"
I've learned that views don't work well with indexing. So will using them be a big performance hit, in exchange for the convenience they may provide?
A view can simply be thought of as an SQL query stored permanently on the server. Whatever indexes the optimizer chooses for that query will be used. In that sense, there is no difference between the SQL query and a view, and the view does not affect performance any more negatively than the query itself. Note, though, that the view's query is still executed every time the view is used, so it is not inherently faster than running the same query directly.
It does give you these additional advantages:
reusability
a single source for optimization
This MySQL forum thread about indexing views gives a lot of insight into what MySQL views actually are.
Some key points:
A view is really nothing more than a stored select statement
The data of a view is the data of tables referenced by the View.
Creating an index on a view will not work as of the current version.
If the MERGE algorithm is used, the indexes of the underlying tables will be used.
The underlying indices are not visible, however. DESCRIBE on a view will show no indexed columns.
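A quick way to see the "underlying indexes are still used" point is to compare query plans for the table and for a view over it. This sketch uses Python's sqlite3 module, where SQLite's query flattener substitutes the view definition into the outer query, roughly analogous to MySQL's MERGE algorithm (in MySQL you would run EXPLAIN instead and could request MERGE with CREATE ALGORITHM = MERGE VIEW). The shops table, the shop_v view, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE INDEX idx_shops_city ON shops (city);
    CREATE VIEW shop_v AS SELECT id, name, city FROM shops;
""")

def plan(sql, *params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

via_table = plan("SELECT * FROM shops WHERE city = ?", "Paris")
via_view = plan("SELECT * FROM shop_v WHERE city = ?", "Paris")

# Both plans should mention a search on shops using idx_shops_city
# (the exact wording varies between SQLite versions).
print(via_table)
print(via_view)
```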
MySQL views, according to the official MySQL documentation, are stored queries that when invoked produce a result set.
A database view is nothing but a virtual (logical) table, commonly defined by a SELECT query with joins. Because a view is similar to a table, consisting of rows and columns, you can query data against it.
Views should be used when:
Simplifying complex queries (for example, ones with IF/ELSE logic and JOINs, or when working with triggers)
Adding an extra layer of security to limit or restrict data access (since views are merely virtual tables, they can be made read-only for a specific set of DB users, restricting INSERTs)
Backward compatibility and query reusability
Working with computed columns. Computed columns should NOT be stored in DB tables, because that would make the DB schema a bad design.
Views should not be used when:
the associated table(s) are tentative or subject to frequent structural changes.
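A sketch of the security use: a view that exposes only non-sensitive columns and rejects writes. It uses Python's sqlite3 module, where every view is read-only unless INSTEAD OF triggers are defined. MySQL behaves differently (simple views are updatable there, so writes are usually restricted with privileges, such as granting only SELECT on the view), and that is not modeled here. The employees table and employee_directory view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL);
    INSERT INTO employees (name, salary) VALUES ('Ann', 5000.0), ('Bob', 4200.0);
    -- Expose only the non-sensitive columns.
    CREATE VIEW employee_directory AS SELECT id, name FROM employees;
""")

directory = conn.execute("SELECT * FROM employee_directory").fetchall()
print(directory)  # [(1, 'Ann'), (2, 'Bob')]

# Writing through the view is refused because SQLite views are read-only by default.
try:
    conn.execute("INSERT INTO employee_directory (name) VALUES ('Eve')")
    insert_blocked = False
except sqlite3.OperationalError as exc:
    insert_blocked = True
    print("rejected:", exc)
```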
According to http://www.mysqltutorial.org/introduction-sql-views.aspx
A database table should not have calculated columns however a database view should.
I tend to use a view when I need to calculate totals, counts etc.
Hope that helps!
One more downside of views is that they don't work well with MySQL replication and can cause the master and the slave to get out of sync.
http://bugs.mysql.com/bug.php?id=30998