Refresh strategy for materialized views in a data warehouse - sql

I have a system with a materialized view that contains roughly 1 billion items; every two hours I need to update about 200 million of them (20% of the records). My question is: what should the refresh strategy on my materialized view be? Right now it refreshes on an interval. I am curious about the performance impact of refreshing on an interval versus refreshing never and renaming/replacing the old materialized view with the new one. The underlying issue is the indexes maintained by Oracle, which generate a massive amount of redo. Any suggestions are appreciated.
UPDATE
Since some people seem to think this is off topic, here is my current viewpoint:
Create an Oracle Scheduler chain that invokes a series of PL/SQL (the programming language, I promise) procedures to refresh the materialized view in a pseudo-parallel fashion. However, since I fell into the position of a DBA of sorts, I am looking to solve a data problem with an algorithm and/or some code.

OK, so here is the solution I came up with; your mileage may vary, and any feedback after the fact is appreciated. The overall strategy was to do the following:
1) Utilize the Oracle Scheduler making use of parallel execution of chains (jobs)
2) Utilize views (the regular kind) as the interface from the application into the database
3) Rely on materialized views to be built in the following manner
create materialized view foo
parallel
nologging
never refresh
as
select statement
As needed, use the following:
create index baz on foo(bar) nologging
The advantage of this is that we can build the new materialized view in the background before dropping and recreating the view described in step 2. The materialized views get dynamically generated names, while the view the application queries keeps the same name; the key is to not blow away the original materialized view until the new one is finished. This also allows for quick drops, as there is minimal redo to care about. This enabled materialized view creation over ~1 billion records in 5 minutes, which met our requirement of "refreshes" every thirty minutes. Furthermore, this can be handled on a single database node, so it is possible even with constrained hardware.
Here is a PL/SQL procedure that will create it for you:
CREATE OR REPLACE PROCEDURE foo_bar AS
  -- Dynamically generated MV name, e.g. FOO_21_JUN_2024_10_30_00
  -- (hh24 avoids AM/PM name collisions between runs 12 hours apart)
  foo_view varchar2(500) := 'foo_' || to_char(sysdate, 'dd_MON_yyyy_hh24_mi_ss');
BEGIN
  -- DDL must be issued through dynamic SQL inside PL/SQL
  execute immediate
    'create materialized view ' || foo_view || '
     parallel
     nologging
     never refresh
     as
     select * from cats';
END foo_bar;
/
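To complete the loop, here is a hedged sketch of the swap described above; the view name (foo_v) and the timestamped MV names are assumptions for illustration, not part of the original solution:

-- Repoint the stable-named view (the application interface from step 2)
-- at the freshly built MV, then drop the old one.
create or replace view foo_v as
select * from foo_21_JUN_2024_10_30_00;           -- hypothetical new MV

drop materialized view foo_20_JUN_2024_08_30_00;  -- old MV; cheap to drop since it was built nologging

The procedure can then be driven by the Scheduler, for example (a minimal single-job sketch, not the full chain described in the update):

begin
  dbms_scheduler.create_job(
    job_name        => 'refresh_foo',               -- hypothetical job name
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'FOO_BAR',                   -- the procedure above
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=30', -- every thirty minutes
    enabled         => TRUE);
end;
/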

Related

Why a query on VIEW is executing faster than a query on MATERIALIZED VIEW

I have created two views of a table in a Snowflake database with the same select statement; one is a normal view and the other is a materialized view, as below:
create view view1
as ( select *
from customer
where name ilike 'a%');
create materialized view view2
as ( select *
from customer
where name ilike 'a%');
Then I queried the views as below:
Select *
from view1 ----normal view
Select *
from view2 -----materialized view
(I suspended and resumed the warehouse to remove any cache before executing the above queries individually, and repeated the execution many times in the same manner.)
But contrary to expectation, the materialized view always takes longer than the normal view.
Why is this?
It could be a number of things. Here is what I would suggest:
Ensure that the result cache is turned off
ALTER SESSION SET USE_CACHED_RESULT = FALSE
Run them in a warehouse that has been turned off for hours. In my experience, restarting the virtual warehouse does not completely delete cached data. Do not run the query while the warehouse is off; manually resume it first, to avoid the query being delayed while the warehouse is provisioned.
Run them and check the following columns in the QUERY_HISTORY view to get a better idea of what happened (a sample query is sketched at the end of this answer):
PERCENTAGE_SCANNED_FROM_CACHE
COMPILATION_TIME
QUEUED_REPAIR_TIME
TRANSACTION_BLOCKED_TIME
EXECUTION_TIME - I believe this holds the actual execution time, excluding the time spent in compilation, as opposed to TOTAL_ELAPSED_TIME
QUEUED_OVERLOAD_TIME
See the QUERY_HISTORY documentation for more details.
You might also want to check the Query Profile; I would expect the query using the MV to show a straightforward single-step retrieval, but it is still worth comparing the profiles to understand both queries.
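For a concrete starting point, a query along these lines against ACCOUNT_USAGE pulls those columns for the two test queries (a sketch; the ILIKE filters are assumptions, and ACCOUNT_USAGE views lag real time by up to about 45 minutes):

ALTER SESSION SET USE_CACHED_RESULT = FALSE;

SELECT query_text,
       compilation_time,
       execution_time,
       total_elapsed_time,
       percentage_scanned_from_cache,
       queued_overload_time
FROM snowflake.account_usage.query_history
WHERE query_text ILIKE '%view1%'
   OR query_text ILIKE '%view2%'
ORDER BY start_time DESC
LIMIT 20;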

Generated de-normalised View table

We have a system that makes use of a database view, which takes data from a few reference (lookup) tables and then does a lot of pivoting and complex work on a hierarchy table of (pretty much fixed and static) locations, returning a view of the data to the application.
This view is getting slow, as new requirements are added.
A solution that may be an option would be to create a normal table, select from the view into this table, and let the application use that highly indexed and fast table for its querying.
The issue is, I guess, that if the underlying tables change, the new table will show old results. But the data that drives this table changes very infrequently. And if it does, a business/technical process could be put in place so that an 'update the table' procedure is run to refresh the data. Or even an update/insert trigger on the primary driving table?
Is this practice advised/ill-advised? And are there ways of making it safer?
The ideal solution is to optimise the underlying queries.
In SSMS, run the slow query with the actual execution plan included (Ctrl + M); this will give you a graphical representation of how the query is executed against your database.
Another helpful tool is to turn on IO statistics, since IO is usually the main bottleneck for queries. Put this line at the top of your query window:
SET STATISTICS IO ON;
Check whether SQL Server recommends any missing indexes (displayed in green in the execution plan). As you say the data changes infrequently, it should be safe to add additional indexes if needed.
In the execution plan you can hover over any element for more information. Check the estimated rows versus the actual rows returned; if these vary greatly, update the statistics for the tables, which can help the query optimiser find the best execution plan.
To do this for all tables in a database:
USE [Database_Name]
GO
exec sp_updatestats
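To update statistics for a single table instead (the table name here is hypothetical):

UPDATE STATISTICS dbo.LocationHierarchy;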
Still no luck in optimising the view / query?
Be careful with update triggers: if the schema of the source table changes (say you add a new column), the new column will not be inserted into your 'optimised' table unless you also update the trigger.
If it is not a business requirement to report on real-time data, there is not too much harm in having a separate optimised table for reporting (much like a data mart); just use a SQL Agent job to refresh it nightly during non-peak hours (a minimal refresh step is sketched after the list of cons below).
There are a few cons to this approach though:
More storage space / duplicated data
More complex database
Additional workload during the refresh
Decreased cache hits
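For reference, the nightly job step could be as simple as the following sketch; the table and view names are assumptions, and it presumes no foreign keys reference the reporting table (TRUNCATE would fail otherwise):

-- Rebuild the reporting table from the slow view inside one transaction,
-- so concurrent readers never observe a half-empty table.
BEGIN TRANSACTION;
    TRUNCATE TABLE dbo.LocationReport;       -- transactional in SQL Server
    INSERT INTO dbo.LocationReport
    SELECT * FROM dbo.vw_LocationHierarchy;  -- the slow view being materialized
COMMIT TRANSACTION;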

Materialized view in Oracle? [closed]

Closed 9 years ago. This question needs to be more focused and is not currently accepting answers.
What is a materialized view in Oracle? What is it used for? I searched this topic on the net but could not get a clear idea of it. Can you please explain it with a clear example, so that it is easier for me to understand the topic?
A materialized view is an RDBMS-provided mechanism to trade additional storage consumption for better query performance.
For example, suppose that you have a really big query with 10 table joins that takes a long time to return data. If you convert the query into a materialized view, the results of this query are materialized into a special DB table on disk automatically. What's even better is that as rows are added/updated/deleted, the changes are automatically reflected in the materialized view.
The tradeoff of this handy tool is slower inserts and updates on the underlying tables, though. Materialized views are one of the few redeeming qualities of Oracle, IMHO.
Here is an example of a two table join MATERIALIZED VIEW.
CREATE MATERIALIZED VIEW MV_Test
NOLOGGING
CACHE
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT V.*, P.*, V.ROWID as V_ROWID, P.ROWID as P_ROWID, (2+1) as Calc1, 'Constant1' as Const1
FROM TPM_PROJECTVERSION V
INNER JOIN TPM_PROJECT P ON P.PROJECTID = V.PROJECTID
Now, instead of running the same big query every time, you can run this simpler query against your new view, which will be faster. The really cool thing is that you can add derived and calculated columns too.
SELECT * FROM MV_Test WHERE ...
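One prerequisite worth noting: a fast-refresh-on-commit join view like MV_Test needs materialized view logs created WITH ROWID on every base table (which is also why the ROWIDs appear in the select list above). A sketch, using the table names from the example:

CREATE MATERIALIZED VIEW LOG ON TPM_PROJECT WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON TPM_PROJECTVERSION WITH ROWID;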
P.S.
MATERIALIZED VIEWS are not a panacea. Use them in cases where you have a really slow query with lots of joins that is frequently used, and where the reads far outweigh the writes.
A (nearly) real-world example.
Suppose you were asked to develop an enterprise-wide real-time inventory report that outputs the total worth of inventory across all warehouses of the enterprise.
You would then need to create a query to:
sum up all transactions stored in the inventory transaction table, grouped by item and warehouse
join the sums with the table storing the current price per unit of measure
sum up again per warehouse
In an enterprise, such a query could take hours to complete (even medium-sized companies may have hundreds of thousands of different items), and its performance would deteriorate over time (imagine this query running over 5 years of data).
So, you would write (more or less) the same query as a materialized view. When it is created, Oracle will populate a table (think of it as a hidden table) with the results of your query, and then, each time a transaction is committed to the inventory, it will update the records that relate to that specific item. If an item's price changes, it will update its worth. In general, every change to the underlying tables is reflected in your materialized view immediately. Your report will then run in a very reasonable time.
On top of that, by using GROUP BY extensions (ROLLUP, CUBE) and the GROUPING function, you may get different levels of drill-down from the same materialized view.
Keep in mind, though, that this is an idealized example. In practice, ON COMMIT (i.e. updating the materialized view at the same time as the underlying tables) may cause problems when you create a materialized view over frequently updated tables (and inventory transactions usually are just that), and you may, depending on the case, write intermediate MVs to boost performance. Refreshing such a view every 5 minutes is a viable alternative.
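For illustration, a minimal sketch of that scheduled-refresh variant; all table and column names here are assumptions, not a definitive design:

CREATE MATERIALIZED VIEW mv_inventory_worth
BUILD IMMEDIATE
REFRESH COMPLETE
START WITH sysdate NEXT sysdate + interval '5' minute
AS
SELECT t.warehouse_id,
       SUM(t.quantity * p.unit_price) AS total_worth  -- inventory worth per warehouse
FROM inventory_transactions t
JOIN item_prices p ON p.item_id = t.item_id
GROUP BY t.warehouse_id;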
MVs are a very powerful feature, but you need to use them with care.
What is a Materialized View?
A materialized view is a replica of a target master from a single
point in time. The master can be either a master table at a master
site or a master materialized view at a materialized view site.
Whereas in multimaster replication tables are continuously updated by
other master sites, materialized views are updated from one or more
masters through individual batch updates, known as refreshes, from a
single master site or master materialized view site, as illustrated in
Figure 3-1. The arrows in Figure 3-1 represent database links.
--http://docs.oracle.com/cd/A97630_01/server.920/a96567/repmview.htm
Example: this creates a materialized view for the employees table. The refresh can be set to your preference, so read the documentation in the link above.
CREATE MATERIALIZED VIEW employee_mv
BUILD IMMEDIATE
REFRESH FORCE ON DEMAND
AS
SELECT * FROM employees;
A materialized view can also contain only a subset of the data:
CREATE MATERIALIZED VIEW employee_mv
BUILD IMMEDIATE
REFRESH FORCE ON DEMAND
AS
SELECT name, ssn, address FROM employees;
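Because these examples use ON DEMAND, the refresh has to be requested explicitly, for example through DBMS_MVIEW (a minimal sketch):

BEGIN
  -- FORCE means: fast refresh if possible, otherwise a complete refresh
  DBMS_MVIEW.REFRESH('EMPLOYEE_MV');
END;
/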

Replicate table from external database to internal

I need to replicate a table from an external DB to an internal DB for performance reasons. Several apps will use this local DB to do joins and compare data. I only need to replicate every hour or so, but if there is a performant solution, I would prefer to replicate every 5 to 10 minutes.
What would be the best way to replicate? The first thing that comes to mind is DROP and then CREATE:
DROP TABLE clonedTable;
CREATE TABLE clonedTable AS SELECT * FROM foo.extern@data.sourceTable;
There has to be a better way, right? Hopefully an atomic solution, to avoid the fraction of a second where the table doesn't exist but someone might try to query it.
The simplest possible solution would be a materialized view that is set to refresh every hour.
CREATE MATERIALIZED VIEW mv_cloned_table
REFRESH COMPLETE
START WITH sysdate + interval '1' minute
NEXT sysdate + interval '1' hour
AS
SELECT *
FROM foo.external_table@database_link;
This will delete all the data currently in mv_cloned_table, insert all the data from the table in the external database, and then schedule itself to run again an hour after it finishes (so it will actually be 1 hour + however long it takes to refresh between refreshes).
There are lots of ways to optimize this.
If the folks that own the source database are amenable to it, you can ask them to create a materialized view log on the source table. That would allow your materialized view to replicate just the changes which should be much more efficient and would allow you to schedule refreshes much more frequently.
If you have the cooperation of the folks that own the source database, you could also use Streams instead of materialized views which would let you replicate the changes in near real time (a lag of a few seconds would be common). That also tends to be more efficient on the source system than maintaining the materialized view logs would be. But it tends to take more admin time to get everything working properly-- materialized views are much less flexible and less efficient but pretty easy to configure.
If you don't mind the table being empty during a refresh (it would exist; it would just have no data), you can do a non-atomic refresh of the materialized view, which does a TRUNCATE followed by a direct-path INSERT rather than a DELETE and a conventional-path INSERT. The former is much more efficient but means the table appears empty while the refresh runs, which seems unlikely to be appropriate when you're doing joins and data comparisons on the local server.
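For reference, that non-atomic complete refresh can be requested explicitly through DBMS_MVIEW (a sketch, using the materialized view from the example above):

BEGIN
  -- method => 'C' asks for a complete refresh; atomic_refresh => FALSE
  -- allows TRUNCATE + direct-path INSERT instead of DELETE + conventional INSERT.
  DBMS_MVIEW.REFRESH(list           => 'MV_CLONED_TABLE',
                     method         => 'C',
                     atomic_refresh => FALSE);
END;
/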
If you want to go down the path of having the source side create a materialized view log so that you can do an incremental refresh, then on the source side (assuming the source table has a primary key) you'd ask them to:
CREATE MATERIALIZED VIEW LOG ON foo.external_table
WITH PRIMARY KEY
INCLUDING NEW VALUES;
The materialized view that you would create would then be
CREATE MATERIALIZED VIEW mv_cloned_table
REFRESH FAST
START WITH sysdate + interval '1' minute
NEXT sysdate + interval '1' hour
WITH PRIMARY KEY
AS
SELECT *
FROM foo.external_table@database_link;

ORACLE - Materialized View LOG

I have a table with an MVIEW log, and I would like to know if it is suspicious to have:
SELECT count(*) from Table
8036132 rows
and
SELECT count(*) from MLOG$_Table
81657998 rows
I'm asking this question because I get an error when trying to refresh my MVIEW:
ORA-30036: unable to extend segment by 4 in undo tablespace 'UNDOTBS1'. I would like to know if something can be done other than extending the undo tablespace.
Thanks in advance
Yes, that is suspicious.
You need materialized view logs to be able to do a fast refresh. A fast refresh is really an incremental refresh: a refresh that applies only the recent changes, to avoid having to do a complete refresh, which could be time-consuming. If your materialized view log contains 10 times as many rows as the original table, that defeats the purpose of a fast refresh.
I'd first look into why this materialized view log contains this much rows. If you can avoid that, then your other problem - the ORA-30036 - will likely disappear as well.
Regards,
Rob.
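As a concrete follow-up to the advice above: a common cause of a bloated materialized view log is a registered materialized view that has stopped fast-refreshing, so its log entries are never purged. If that turns out to be the case here, the stale entries can be discarded with DBMS_MVIEW.PURGE_LOG (a sketch; note this forces the affected materialized views to do a complete refresh next time):

BEGIN
  -- Purge log entries for up to 9999 of the least recently refreshed
  -- materialized views registered against this master table.
  DBMS_MVIEW.PURGE_LOG(master => 'TABLE', num => 9999, flag => 'DELETE');
END;
/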