Materialized views are great, but there is a cost associated with the jobs that maintain them. We want to create a materialized view on top of a table into which we're streaming about 50 million events per day, and we're worried about the cost implications of that materialized view.
How can we track the cost of maintaining those materialized views?
Try this.
select ref_tables.table_id, jobs.*
from `project-id`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT jobs,
unnest(referenced_tables) as ref_tables
left join `project-id`.`region-us`.INFORMATION_SCHEMA.TABLES tb
on ref_tables.table_id = tb.table_name
and ref_tables.dataset_id = tb.table_schema
where tb.table_type = 'MATERIALIZED VIEW'
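To put a rough dollar figure on it, you can aggregate bytes billed per materialized view over the same join (a sketch; the $6.25 per TiB on-demand rate is an assumption — check the rate for your region and pricing model):

```sql
-- Sketch: approximate cost attributed to jobs touching each materialized view.
SELECT
  ref_tables.table_id,
  SUM(jobs.total_bytes_billed) AS bytes_billed,
  SUM(jobs.total_bytes_billed) / POW(1024, 4) * 6.25 AS approx_cost_usd  -- assumed on-demand rate
FROM `project-id`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS jobs,
  UNNEST(referenced_tables) AS ref_tables
LEFT JOIN `project-id`.`region-us`.INFORMATION_SCHEMA.TABLES AS tb
  ON ref_tables.table_id = tb.table_name
WHERE tb.table_type = 'MATERIALIZED VIEW'
GROUP BY ref_tables.table_id
ORDER BY bytes_billed DESC;
```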
I have created two views of a table in a Snowflake database with the same SELECT statement; one is a normal view and the other is a materialized view, as below:
create view view1
as ( select *
from customer
where name ilike 'a%')
create materialized view view2
as ( select *
from customer
where name ilike 'a%')
Then queried the views as below,
Select *
from view1 ----normal view
Select *
from view2 -----materialized view
(I suspended and resumed the warehouse to clear any cache before executing each query individually, and repeated the runs many times in the same manner.)
But contrary to expectation, the materialized view consistently takes longer than the normal view.
Why is this?
It could be a number of things. Here is what I would suggest:
Ensure that the result cache is turned off
ALTER SESSION SET USE_CACHED_RESULT = FALSE
Run them in a warehouse that has been turned off for hours. In my experience, restarting the virtual warehouse does not completely clear cached data. Do not run the query while the warehouse is suspended; manually resume it first, so that warehouse-provisioning delays don't skew the timings.
Run them and check the following columns in the QUERY_HISTORY view to get a better idea of what happened:
PERCENTAGE_SCANNED_FROM_CACHE
COMPILATION_TIME
QUEUED_REPAIR_TIME
TRANSACTION_BLOCKED_TIME
EXECUTION_TIME - I believe this holds the actual execution time, excluding the time spent in compilation, as opposed to TOTAL_ELAPSED_TIME
QUEUED_OVERLOAD_TIME
See the QUERY_HISTORY documentation for more details.
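A quick way to pull those columns for the two runs side by side (a sketch; the ILIKE filters on the view names are just one way to find the relevant queries):

```sql
-- Compare timing breakdowns for the two queries in QUERY_HISTORY.
SELECT query_text,
       total_elapsed_time,
       compilation_time,
       execution_time,
       queued_overload_time,
       percentage_scanned_from_cache
FROM TABLE(information_schema.query_history(result_limit => 100))
WHERE query_text ILIKE '%view1%'
   OR query_text ILIKE '%view2%'
ORDER BY start_time DESC;
```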
You might also want to check the Query Profile. I'd expect the query against the materialized view to show a straightforward single-step retrieval, but it is still worth comparing the profiles to understand both queries.
I build statistical output generated on demand from data stored in BigQuery tables. Some data is imported daily via Stitch using "Append-Only" mode. This results in duplicated observations in the imported tables (around 20 million rows, growing by about 8 million per year).
I could either schedule a BigQuery query to store deduplicated values in a cleaned table, or build views to do the same, but I don't understand the tradeoffs in terms of:
costs on BigQuery for storing/running scheduled queries and views.
speed of later queries dependent on deduplicated views. Do the views cache?
Am I correct to assume that daily scheduled queries to store deduplicated data are more costly (because the stored tables are rewritten) but speed up later queries against the deduplicated data (saving on usage costs)?
The deduplicated data will in turn be queried hundreds of times daily to produce dashboard output, for which responsiveness is a concern.
What arguments should I weigh when deciding on the better solution?
Let's go over the facts:
The price you pay for the query itself is the same regardless of whether you use a view or a scheduled query.
When using a scheduled query, you also pay for the data you store in the de-duplicated table. A view stores no data, so it incurs no extra storage charges.
In terms of speed, the scheduled-query approach wins because your data is already de-duplicated and cleaned. If you feed dashboards from a view, every dashboard load re-runs the de-duplication, which can make loading sluggish.
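For reference, a scheduled de-duplication query along those lines might look like this (a sketch; the table names, the event_id key, and the Stitch _sdc_batched_at column are assumptions — adjust to your schema):

```sql
-- Keep only the most recent copy of each duplicated row.
CREATE OR REPLACE TABLE `project-id.dataset.events_deduped` AS
SELECT * EXCEPT(rn)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY event_id           -- assumed unique business key
           ORDER BY _sdc_batched_at DESC   -- assumed Stitch batch timestamp
         ) AS rn
  FROM `project-id.dataset.events_raw`
)
WHERE rn = 1;
```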
Another possible approach for you is using Materialized Views, which are smarter Views that periodically cache results in order to improve performance. In this guide you can find some information about choosing between Scheduled Queries and Materialized Views:
When should I use scheduled queries versus materialized views?
Scheduled queries are a convenient way to run arbitrarily complex
calculations periodically. Each time the query runs, it is being run
fully. The previous results are not used, and you pay the full price
for the query. Scheduled queries are great when you don't need the
freshest data and you have a high tolerance for data staleness.
Materialized views are suited for when you need to query the latest
data while cutting down latency and cost by reusing the previously
computed result. You can use materialized views as pseudo-indexes,
accelerating queries to the base table without updating any existing
workflows.
As a general guideline, whenever possible and if you are not running
arbitrarily complex calculations, use materialized views.
I think it is also affected by how often your view/table will be queried.
For example - a very complex query over a large dataset will be costly every time it's run. If the result is a significantly smaller dataset, it will be more cost-effective to schedule a query to save the results, and query the results directly - rather than using a view which will perform the very complex query time and time again.
For the speed factor - it definitely is better to query a reduced table directly and not a view.
For the cost factor - I would try to understand how often this view/table will be queried and how are the processing+storage costs for it:
For a view: roughly calculate the processing costs * amount of times it will be queried monthly, for example
For a stored table: scheduled queries performed per month * processing costs + monthly storage costs for the table results
This should give you pretty much the entire case you need to build in order to argue for your solution.
I have 14 tables in BQ, which are updated several times a day.
Via JOIN of three of them, I have created a new one.
My question is: will this new table be updated each time new data is pushed into the BigQuery tables on which it is based? If not, is there a way to make this JOIN "live", so that the newly created table is updated automatically?
Thank you!
BigQuery also supports views, virtual tables defined by a SQL query.
BigQuery's views are logical views, not materialized views, which means that the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query.
BigQuery supports up to eight levels of nested views.
You can create a view or a materialized view so that your required data is always up to date, but a logical view re-queries the underlying tables on every read, so beware of joining massive tables.
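For example, a logical view over the join stays current automatically because it is re-evaluated on every query (a sketch; the project, dataset, table, and column names are placeholders):

```sql
-- A logical view: nothing is stored, the JOIN runs on every query.
CREATE VIEW `project-id.dataset.joined_v` AS
SELECT a.id, a.col_a, b.col_b, c.col_c
FROM `project-id.dataset.table_a` AS a
JOIN `project-id.dataset.table_b` AS b USING (id)
JOIN `project-id.dataset.table_c` AS c USING (id);
```

Note that BigQuery materialized views support only a limited set of constructs (joins in particular are restricted), so check the current documentation before relying on one for a multi-table JOIN.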
For more complex table syncing from/to BigQuery and other apps (two-way sync), I ended up using https://www.stacksync.cloud/
It offers real-time updates and, eventually, two-way sync. It may be worth a look for less technical folks.
What is a materialized view in Oracle? What is it used for? I searched this topic on the net but could not get a clear idea of it, so could you please explain it with a clear example? That would help me understand the topic.
A materialized view is an RDBMS-provided mechanism that trades additional storage consumption for better query performance.
For example, suppose you have a really big query with 10 table joins that takes a long time to return data. If you convert the query into a materialized view, the results of the query are materialized into a special database table on disk automatically. What's even better is that as rows are added, updated, or deleted, the changes are automatically reflected in the materialized view.
The tradeoff of this handy tool is slower inserts and updates on the underlying tables. Materialized views are one of the few redeeming qualities of Oracle, IMHO.
Here is an example of a two table join MATERIALIZED VIEW.
CREATE MATERIALIZED VIEW MV_Test
NOLOGGING
CACHE
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT V.*, P.*, V.ROWID as V_ROWID, P.ROWID as P_ROWID, (2+1) as Calc1, 'Constant1' as Const1
FROM TPM_PROJECTVERSION V
INNER JOIN TPM_PROJECT P ON P.PROJECTID = V.PROJECTID
Now, instead of running the same query every time, you can run this simpler query against your new view, which will run faster. The really cool thing is that you can also add derived and calculated columns.
SELECT * FROM MV_Test WHERE ...
P.S.
MATERIALIZED VIEWs are not a panacea. Use them where you have a really slow query with lots of joins that is frequently used, and where reads far outweigh writes.
A (nearly) real-world example.
Suppose you were asked to develop an enterprise-wide real-time inventory report that will output total worth of inventory across all warehouses of the enterprise.
You would then need to create a query to
sum up all transaction stored in the inventory transaction table grouped by item and warehouse
join the sums with the table storing current price per unit of measure
sum up again per warehouse
In an enterprise, such a query could take hours to complete (even medium-sized companies may have hundreds of thousands of different items), and its performance would deteriorate over time (imagine this query running over 5 years of data).
So, you would write (more or less) the same query as a materialized view. When it is created, Oracle populates a table (think of it as a hidden table) with the results of your query; then, each time a transaction is committed to the inventory, it updates the records relating to that specific item. If an item's price changes, its worth is updated. In general, every change to the underlying tables is reflected in your materialized view immediately. Your report will then run in a very reasonable time.
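A sketch of what that inventory materialized view might look like (the table and column names are assumptions, and FAST REFRESH ON COMMIT additionally requires materialized view logs on the base tables, shown first):

```sql
-- Materialized view logs are required for FAST REFRESH ON COMMIT.
CREATE MATERIALIZED VIEW LOG ON inventory_txn
  WITH ROWID (warehouse_id, item_id, quantity) INCLUDING NEW VALUES;
CREATE MATERIALIZED VIEW LOG ON item_price
  WITH ROWID (item_id, unit_price) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW mv_inventory_worth
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT
AS
SELECT t.warehouse_id,
       SUM(t.quantity * p.unit_price)   AS total_worth,
       COUNT(*)                         AS row_cnt,   -- count columns are
       COUNT(t.quantity * p.unit_price) AS worth_cnt  -- needed for fast refresh
FROM inventory_txn t, item_price p
WHERE p.item_id = t.item_id
GROUP BY t.warehouse_id;
```

Fast-refreshable join/aggregate MVs come with a long list of restrictions, so verify eligibility with DBMS_MVIEW.EXPLAIN_MVIEW before relying on this shape.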
On top of that, by using GROUP BY together with the GROUPING function you can get different levels of drill-down from the same materialized view.
Keep in mind, though, that this is an idealized example. In practice, ON COMMIT (i.e., updating the materialized view at the same time as the underlying tables) may cause problems when the materialized view is built over frequently updated tables (and inventory transactions usually are), and you may need, depending on the case, intermediate MVs to boost performance. Refreshing such a view every 5 minutes is a viable alternative.
MVs are a very powerful feature, but you need to use them with care.
What is a Materialized View?
A materialized view is a replica of a target master from a single
point in time. The master can be either a master table at a master
site or a master materialized view at a materialized view site.
Whereas in multimaster replication tables are continuously updated by
other master sites, materialized views are updated from one or more
masters through individual batch updates, known as a refreshes, from a
single master site or master materialized view site, as illustrated in
Figure 3-1. The arrows in Figure 3-1 represent database links.
--http://docs.oracle.com/cd/A97630_01/server.920/a96567/repmview.htm
Example: this creates a materialized view over the employees table. The refresh can be set to your preference; read the documentation in the link above.
CREATE MATERIALIZED VIEW employee_mv
REFRESH FORCE
BUILD IMMEDIATE
ON DEMAND
AS
SELECT * FROM employees
A materialized view can also only contain a subset of data
CREATE MATERIALIZED VIEW employee_mv
REFRESH FORCE
BUILD IMMEDIATE
ON DEMAND
AS
SELECT name, ssn, address FROM employees
I have a system with a materialized view that contains roughly 1 billion rows, and every two hours I need to update about 200 million of them (20% of the records). My question is: what should the refresh strategy on my materialized view be? Right now it refreshes on an interval. I am curious about the performance tradeoffs between refreshing on an interval versus REFRESH NEVER combined with renaming/replacing the old materialized view with a new one. The underlying issue is that the indexes Oracle maintains generate a massive amount of redo. Any suggestions are appreciated.
UPDATE
Since some people seem to think this is off topic my current view point is to do the following:
Create an Oracle Scheduler chain that invokes a series of PL/SQL (programming language, I promise) functions to refresh the materialized views in a pseudo-parallel fashion. However, having fallen into the position of a DBA of sorts, I am looking to solve a data problem with an algorithm and/or some code.
OK, so here is the solution I came up with; your mileage may vary, and any feedback after the fact is appreciated. The overall strategy is the following:
1) Utilize the Oracle Scheduler making use of parallel execution of chains (jobs)
2) Utilize views (the regular kind) as the interface from the application into the database
3) Rely on materialized views to be built in the following manner
create materialized view foo
parallel
nologging
never refresh
as
select statement
as needed use the following:
create index baz on foo(bar) nologging
The advantage is that we can build the materialized view in the background before dropping and recreating the view described in step 2. We get dynamically named materialized views while the view keeps the same name. The key is not to blow away the original materialized view until the new one is finished; this also allows for quick drops, as there is minimal redo to worry about. This enabled materialized view creation over ~1 billion records in 5 minutes, which met our requirement of a "refresh" every thirty minutes. Further, this can be handled on a single database node, so it is possible even on constrained hardware.
Here is a PL/SQL function that will create it for you:
CREATE OR REPLACE procedure foo_bar as
foo_view varchar2(500) := 'foo_'|| to_char(sysdate,'dd_MON_yyyy_hh_mi_ss');
BEGIN
execute immediate
'Create materialized view '|| foo_view || '
parallel
nologging
never refresh
as
select * from cats';
END foo_bar;
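For completeness, the swap described in step 2 might look like this (a sketch; the view and materialized view names are illustrative, and in practice you would track the previous MV's name so you know which one to drop):

```sql
-- Repoint the stable view at the freshly built MV, then drop the old one.
-- The timestamped names below are placeholders.
CREATE OR REPLACE VIEW foo_current AS
  SELECT * FROM foo_26_DEC_2024_01_00_00;

DROP MATERIALIZED VIEW foo_25_DEC_2024_11_30_00;
```

Because the view swap is a fast metadata operation, readers see the old data right up until the new materialized view is ready.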