Snowflake Create Materialized view on self Join - sql

I am trying to create a materialized view on a table so that the view holds only the latest data.
The query looks like this:
Create Materialized view t1_latest as
select c1,c2,dt from t1
join
(select max(dt) maxdt from t1) t2
ON t1.dt = t2.maxdt
dt is the date field.
As we know, a materialized view does not allow subqueries or window functions. Is there a way to rewrite the query so that the materialized view holds the rows for the latest date? The latest date cannot be assumed to be current_date, and it cannot be hardcoded.
Another approach is to create a view with the join and then create the materialized view on top of that. But the problem there is that we would lose the advantage of the materialized view being computed beforehand.
Any suggestions?

You'd actually get the same performance benefit if your table were clustered by dt and you just put a standard view over it with the same logic you have above. This will prune the table in the same manner as a materialized view would.
If the table is already clustered on something else, then creating a simple materialized view with dt as its cluster key will provide the same benefits. It also lets you simply query the base table and have the query optimizer help choose the best pruning option:
https://docs.snowflake.com/en/user-guide/views-materialized.html#how-the-query-optimizer-uses-materialized-views
Edited per comment:
Not sure I understand why you can't use a task/stream, but what if you did something a bit different with a task and a stream?
Create a stream. When the stream has data, you could run a task that executes a stored procedure.
The stored procedure would check the table to see whether there is more than 1 distinct value for dt. If there is, it would delete the older data from the table.
In Snowflake, the delete would essentially be just a metadata operation, since each micropartition would contain a single date, based on how you describe the data being loaded.

The view with UNION is failing to return the result of the first part of the query when the dblink fails

I have one view that combines two tables holding the same information. The only difference is that the first one is a subset of the second one.
The first table, SMALL_TABLE, holds 1 month of data so that various queries run quicker; the other table, BIG_TABLE, holds more than 6 months of data and lives in another database.
The view lives in the DB where the small table is located and uses a db_link to combine both tables with the UNION operator, so that it can return results when the date range is greater than one month.
This has worked perfectly so far. But the issue is that when the second DB is not available, the view fails to return even the data from the first table. How can I still get results from the first part of the view, for queries with a date range of less than one month, when the second DB is unavailable for whatever reason?
SELECT COL1 DATEFIELD1, COL2 ALIASFIELD2, COL3 ALIASFIELD3
FROM SMALL_TABLE
WHERE DATEFIELD < TRUNC(SYSDATE)-31
UNION
SELECT COL1 DATEFIELD1, COL2 ALIASFIELD2, COL3 ALIASFIELD3
FROM BIG_TABLE@DBLINK_MAINDB
WHERE DATEFIELD >= TRUNC(SYSDATE)-31
Switching from a view to a materialized view could be one option. Unlike an "ordinary" view, which is just a stored query and doesn't contain any data itself, a materialized view actually contains data and acts as if it were a table; for example, you can create indexes on it.
The benefit is that, even if the database link is down for some reason, the materialized view will still hold data and let you run queries.
How does it get refreshed? That's up to you: schedule it to refresh at e.g. 02:00 when nobody is working and it'll be ready in the morning. Or do it twice a day. Or refresh it on demand (you'd explicitly perform the refresh). Or do it when a commit happens in the underlying table(s). Those options are described in the documentation; once you figure out which one to use, see how to set it up.
Furthermore, the materialized view would select only from BIG_TABLE (as it contains all the data; you said that SMALL_TABLE is its subset), so you'd avoid UNION, which can be expensive as it eliminates duplicate rows.
In its simplest form, you'd
create materialized view mv_big_table as
select col1 datefield1,
col2 aliasfield2,
col3 aliasfield3
from big_table@dblink_maindb;
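To illustrate the remark that UNION eliminates duplicate rows (and that the big table alone already covers the small one), here is a minimal sketch using Python's built-in sqlite3 rather than Oracle. The two-column tables and their sample rows are made up for the illustration; only the table names come from the question.

```python
import sqlite3

# In-memory stand-ins for the two tables; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE small_table (col1 TEXT, col2 TEXT);
    CREATE TABLE big_table   (col1 TEXT, col2 TEXT);
    INSERT INTO small_table VALUES ('2024-01-30', 'a');
    INSERT INTO big_table   VALUES ('2024-01-30', 'a'), ('2023-10-01', 'b');
""")

# UNION has to de-duplicate: the row present in both tables appears once.
union_rows = conn.execute(
    "SELECT col1, col2 FROM small_table UNION SELECT col1, col2 FROM big_table"
).fetchall()

# UNION ALL skips de-duplication and keeps both copies.
union_all_rows = conn.execute(
    "SELECT col1, col2 FROM small_table UNION ALL SELECT col1, col2 FROM big_table"
).fetchall()

# Because small_table is a subset, big_table alone gives the same distinct rows.
big_only_rows = conn.execute("SELECT col1, col2 FROM big_table").fetchall()

print(len(union_rows), len(union_all_rows), len(big_only_rows))
```

The de-duplication step is the extra work UNION does; selecting from the big table only avoids it altogether.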

postgresql query returned successfully with no result while creating view

I have created a view from table as
CREATE VIEW dp_val_view
AS
select dp_id,dp_id,dp_s,dp_n,dp_ord,id,answer,date,eny_date
from
(
select dp_id,dp_id,dp_s,dp_n,dp_ord,id,answer,date,eny_date,row_number()
over (partition by dp_id ,dp_ord ,id order by eny_date desc ) as rn
from values
) dt
where rn < 2
The view was created successfully and I got the message "query returned successfully with no result". After that, when I try to access the table data, it says "refreshing table please". Does this mean the values from the table are being inserted into the view in the background?
A view is only a "shortcut" for a query. No data is stored in a (regular) view and no data is "inserted" when you run create view.
And because create view only stores the query in the database, it does not "return" any results, just as create table does not return any results from that table.
When you select from the view, the database runs the underlying query. It's exactly the same as if you had run the query directly.
Views are just wrappers; in other words, an alias for a longer query.
Nothing gets inserted into the view. When you query the view, the database internally executes the select query that you specified in its definition.
A good explanation is in the manual: http://www.postgresql.org/docs/9.2/static/sql-createview.html
"The view is not physically materialized. Instead, the query is run every time the view is referenced in a query."
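A quick way to see this for yourself is the sketch below. It uses Python's built-in sqlite3 instead of PostgreSQL, and the table and view names are made up, but ordinary views behave the same way in this respect: creating the view returns no rows, and a row inserted into the base table afterwards shows up the next time the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (dp_id INTEGER, answer TEXT)")

# CREATE VIEW only stores the query; there is no result set to fetch.
cur = conn.execute("CREATE VIEW vals_view AS SELECT dp_id, answer FROM vals")
create_result = cur.fetchall()

# The base table is empty, so the view returns nothing yet.
rows_before = conn.execute("SELECT * FROM vals_view").fetchall()

# Insert into the base table only; the view itself is never written to.
conn.execute("INSERT INTO vals VALUES (1, 'yes')")

# The view's query runs again on every SELECT, so the new row is visible.
rows_after = conn.execute("SELECT * FROM vals_view").fetchall()

print(create_result, rows_before, rows_after)
```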

Indexing views with a CTE

So, I just found out that SQL Server 2008 doesn't let you index a view with a CTE in the definition, but it does let you alter the view to add WITH SCHEMABINDING to its definition. Is there a good reason for this? Does it make sense for some reason I am unaware of? I was under the impression that WITH SCHEMABINDING's main purpose was to allow you to index a view.
new and improved with more query action
;with x
as
(
select rx.pat_id
,rx.drug_class
,count(*) as counts
from rx
group by rx.pat_id,rx.drug_class
)
select x.pat_id
,x.drug_class
,x.counts
,SUM(c.std_cost) as [Healthcare Costs]
from x
inner join claims as c
on c.pat_id=x.pat_id
group by x.pat_id,x.drug_class,x.counts
And the code to create the index
create unique clustered index [TestIndexName] on [dbo].[MyView]
( pat_id asc, drug_class asc, counts asc)
You can't index a view with a CTE, even though the view can have SCHEMABINDING. Think of it this way. In order to index a view, it must meet two conditions (and many others): (a) that it has been created WITH SCHEMABINDING and (b) that it does not contain a CTE. In order to schemabind a view, it does not need to meet the condition that it does not contain a CTE.
I'm not convinced there is a scenario where a view has a CTE and will benefit from being indexed. This is peripheral to your actual question, but my instinct is that you are trying to index this view to magically make it faster. An indexed view isn't necessarily going to be any faster than a query against the base tables - there are restrictions for a reason, and there are only particular use cases where they make sense. Please be careful to not just blindly index all of your views as a magic "go faster" button. Also remember that an indexed view requires maintenance. So it will increase the cost of any and all DML operations in your workload that affect the base table(s).
Schemabinding is not just for indexing views. It can also be used on things like UDFs to help persuade determinism, can be used on views and functions to prevent changes to the underlying schema, and in some cases it can improve performance (for example, when a UDF is not schema-bound, the optimizer may have to create a table spool to handle any underlying DDL changes). So please don't think that it is weird that you can schema-bind a view but you can't index it. Indexing a view requires it, but the relationship is not mutual.
For your specific scenario, I recommend this:
CREATE VIEW dbo.PatClassCounts
WITH SCHEMABINDING
AS
SELECT pat_id, drug_class,
COUNT_BIG(*) AS counts
FROM dbo.rx
GROUP BY pat_id, drug_class;
GO
-- CREATE INDEX needs an index name; IX_PatClassCounts is a placeholder
CREATE UNIQUE CLUSTERED INDEX IX_PatClassCounts ON dbo.PatClassCounts(pat_id, drug_class);
GO
CREATE VIEW dbo.ClaimSums
WITH SCHEMABINDING
AS
SELECT pat_id,
SUM(std_cost) AS [Healthcare Costs],
COUNT_BIG(*) AS counts
FROM dbo.claims
GROUP BY pat_id;
GO
-- CREATE INDEX needs an index name; IX_ClaimSums is a placeholder
CREATE UNIQUE CLUSTERED INDEX IX_ClaimSums ON dbo.ClaimSums(pat_id);
GO
Now you can create a non-indexed view that just does a join between these two indexed views, and it will utilize the indexes (you may have to use NOEXPAND on a lower edition, not sure):
CREATE VIEW dbo.OriginalViewName
WITH SCHEMABINDING
AS
SELECT p.pat_id, p.drug_class, p.counts, c.[Healthcare Costs]
FROM dbo.PatClassCounts AS p
INNER JOIN dbo.ClaimSums AS c
ON p.pat_id = c.pat_id;
GO
Now, this all assumes that it is worthwhile to pre-aggregate this information - if you run this query infrequently, but the data is modified a lot, it may be better to NOT create indexed views.
Also note that the SUM(std_cost) from the ClaimSums view will be the same for every pat_id + drug_class combination, since it's only aggregated to pat_id. I guess there might be a drug_class in the claims table that should be part of the join criteria too, but I'm not sure. If that is the case, I think this could be collapsed to a single indexed view.
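To make that last caveat concrete, here is a small sketch with made-up data, run through Python's built-in sqlite3 rather than SQL Server. The two derived tables stand in for the two pre-aggregated views; total_cost is just an illustrative column alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rx     (pat_id INTEGER, drug_class TEXT);
    CREATE TABLE claims (pat_id INTEGER, std_cost REAL);
    -- One patient with two drug classes and two claims (hypothetical data).
    INSERT INTO rx     VALUES (1, 'A'), (1, 'A'), (1, 'B');
    INSERT INTO claims VALUES (1, 100.0), (1, 50.0);
""")

# p aggregates per (pat_id, drug_class); c aggregates per pat_id only.
rows = conn.execute("""
    SELECT p.pat_id, p.drug_class, p.counts, c.total_cost
    FROM (SELECT pat_id, drug_class, COUNT(*) AS counts
          FROM rx GROUP BY pat_id, drug_class) AS p
    JOIN (SELECT pat_id, SUM(std_cost) AS total_cost
          FROM claims GROUP BY pat_id) AS c
      ON p.pat_id = c.pat_id
    ORDER BY p.drug_class
""").fetchall()

for row in rows:
    print(row)
```

Both drug classes of patient 1 come back with the same total cost of 150.0, because claims carries no drug_class to split the sum by.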

SQL Server 2000 Indexed View with Max Subquery

I have a large contracts table, and we have many stored procedures that query it for contracts with a status of Open. Less than 10% of the contracts are open, and this share is shrinking as the DB grows. I thought I could create an indexed view of the open contracts in order to speed up some of our queries. The problem is that the status is not on the contract table, so I need a subquery to retrieve the data I want. (In the queries I have looked at, SQL Server then does a clustered index scan on the whole table.)
Here is the condensed version of the view (I removed the 30 other columns from the contract table)
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [dbo].[vw_OpenContractsIndexed]
WITH SCHEMABINDING
AS
SELECT c.ContractID
FROM dbo.NMPT_Contract AS c INNER JOIN
dbo.NMPT_ContractStatus AS cs ON c.ContractID = cs.ContractID AND cs.ContractStatusCreated =
(SELECT MAX(ContractStatusCreated) AS Expr1
FROM dbo.NMPT_ContractStatus AS cs2
WHERE (ContractID = c.ContractID)) INNER JOIN
dbo.CMSS_Status AS s ON cs.StatusID = s.StatusID
WHERE (s.StatusCode = 'OPN')
If I try to create an index on the view (unique clustered on contractid) I get the following
Creation Failed for Index
It contains one or more disallowed constructs. (Microsoft SQL Server, Error 1936)
From what I can gather it is the Max in the subquery that is the problem??
Other than putting the status on the contracts table (where I personally think it belongs), are there any suggestions for optimising this situation? Failing that, will other versions of SQL Server allow this indexed view?
From TechNet regarding Indexed Views in SS 2000:
There are several restrictions on the syntax of the view definition.
The view definition must not contain the following:
COUNT(*)
ROWSET function
Derived table
self-join
DISTINCT
STDEV, VARIANCE, AVG
Float*, text, ntext, image columns
Subquery
full-text predicates (CONTAINS, FREETEXT)
SUM on nullable expression
MIN, MAX
TOP
OUTER join
UNION
You're using MAX, and a subquery, both of which are not allowed.
To get advice on how to get around this, you need to share some data and what you are trying to do.
It is not a "view" solution and will require more work, but you can create a denormalized table that holds the result of the view. This way, all reads for Open contracts can go against that table. This will be the fastest option, but the new table has to be maintained.
Creating an indexed view is quite a difficult task because it has so many restrictions, and one of them concerns self-joins. You have a self-join here. Other restrictions include no references to other views, etc.
Another thing for this kind of master table: if you only use a single status, like 'OPEN' in your case, I would suggest that instead of joining to the master table (the one with the status code), you declare a statusid variable, store the id of the OPEN status in it, and use that value in the final query. This avoids the extra join with the master table.
I would also suggest that you store the data for the open status in a temp table before joining with the contract table in the final statement. You can have an index on statusid, customerid and contractcreationdate. Then force this index to get the contractid values into a temp table, like
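-- datefield stands for the status date column, @statusid for the OPEN status id
select cs.contractid into #temp
from NMPT_ContractStatus as cs
where cs.statusid = @statusid
and cs.datefield = (select max(cs2.datefield) from NMPT_ContractStatus as cs2
where cs2.contractid = cs.contractid)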
Now join this temp table with the Contract table.
But before creating any kind of index, make sure that its overhead is much smaller than the benefit you are getting.
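For reference, the "latest status is open" filter that this thread revolves around can be tried out on a toy data set. The sketch below uses Python's built-in sqlite3 rather than SQL Server, and the table layout, status ids and dates are made up: status 10 plays the role of OPEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contract_status (contract_id INTEGER, status_id INTEGER, created TEXT);
    INSERT INTO contract_status VALUES
        (1, 10, '2020-01-01'),  -- contract 1 was opened ...
        (1, 20, '2020-06-01'),  -- ... and later closed
        (2, 10, '2020-03-01');  -- contract 2 is still open
""")

OPEN_STATUS_ID = 10  # hypothetical id of the OPEN status

# Keep a contract only if its most recent status row is the open one.
open_contracts = conn.execute("""
    SELECT cs.contract_id
    FROM contract_status AS cs
    WHERE cs.status_id = ?
      AND cs.created = (SELECT MAX(cs2.created)
                        FROM contract_status AS cs2
                        WHERE cs2.contract_id = cs.contract_id)
    ORDER BY cs.contract_id
""", (OPEN_STATUS_ID,)).fetchall()

print(open_contracts)
```

Contract 1 is filtered out because its latest status row is no longer the open one; filtering on the status id alone would wrongly keep it.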

Query only the required mview from the union

I have materialized views that are manually partitioned by month:
create materialized view MV_MVNAME_201001 as
select MONTH_ID, AMOUNT, NAME
from TABLE_NAME_201001
and likewise for 201002, 201003, ..., 201112, 2012, 2009.
Now I need to query these views, reading only the ones that are actually required.
Is that possible without involving the client side?
example query
select AMOUNT, NAME
from (
--union all mview
)
where month_id >= 201003
and month_id < 201101
should only look at MV_MVNAME_201003 .. MV_MVNAME_201012.
A materialized view is "materialized": it is a physical table with data in it.
The query that produces the materialized view is used only when you refresh the data, not when you query it.
Oracle doesn't know where the data came from (in your case, a union of several distinct tables) unless you specify it somehow, for example with a column.
But in your specific case you have the column month_id, on which you can partition the table.
When you specify a month or a range of months, it will scan only the corresponding partitions.
UPDATE:
Now I understand your question better, but I cannot give you a better answer. Your question has nothing to do with mviews; the mviews might as well be tables and the problem would be the same. You want to select from only some of the tables, dynamically. This is what partitioning was created for. Probably old dogs know a trick...