Indexing views with a CTE - sql

So, I just found out that SQL Server 2008 doesn't let you index a view with a CTE in the definition, but it does let you alter the view to add WITH SCHEMABINDING to the definition. Is there a good reason for this? Does it make sense for some reason I am unaware of? I was under the impression that WITH SCHEMABINDING's main purpose was to allow you to index a view.
new and improved with more query action
;with x
as
(
select rx.pat_id
,rx.drug_class
,count(*) as counts
from rx
group by rx.pat_id,rx.drug_class
)
select x.pat_id
,x.drug_class
,x.counts
,SUM(c.std_cost) as [Healthcare Costs]
from x
inner join claims as c
on c.pat_id=x.pat_id
group by x.pat_id,x.drug_class,x.counts
And the code to create the index
create unique clustered index [TestIndexName] on [dbo].[MyView]
( pat_id asc, drug_class asc, counts asc)

You can't index a view that contains a CTE, even though such a view can be created WITH SCHEMABINDING. Think of it this way: in order to index a view, it must meet two conditions (and many others): (a) that it has been created WITH SCHEMABINDING and (b) that it does not contain a CTE. In order to schema-bind a view, it does not need to meet the condition that it does not contain a CTE.
I'm not convinced there is a scenario where a view has a CTE and will benefit from being indexed. This is peripheral to your actual question, but my instinct is that you are trying to index this view to magically make it faster. An indexed view isn't necessarily going to be any faster than a query against the base tables - there are restrictions for a reason, and there are only particular use cases where they make sense. Please be careful to not just blindly index all of your views as a magic "go faster" button. Also remember that an indexed view requires maintenance. So it will increase the cost of any and all DML operations in your workload that affect the base table(s).
Schemabinding is not just for indexing views. It can also be used
on things like UDFs to help persuade determinism, can be used on
views and functions to prevent changes to the underlying schema, and
in some cases it can improve performance (for example, when a UDF is
not schema-bound, the optimizer may have to create a table spool to
handle any underlying DDL changes). So please don't think that it is
weird that you can schema-bind a view but you can't index it.
Indexing a view requires it, but the relationship is not mutual.
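To make the difference between schema-binding and indexing concrete, here is a minimal sketch. It assumes a dbo.rx table with pat_id and drug_class columns, as in the question; the view and index names are made up for illustration.

```sql
-- Hypothetical view name; dbo.rx and its columns are taken from the question.
CREATE VIEW dbo.RxCountsCte
WITH SCHEMABINDING
AS
WITH x AS
(
    SELECT pat_id, drug_class, COUNT_BIG(*) AS counts
    FROM dbo.rx
    GROUP BY pat_id, drug_class
)
SELECT pat_id, drug_class, counts
FROM x;
GO
-- The CREATE VIEW above should succeed: schema-binding alone does not forbid a CTE.
-- The statement below is expected to fail, because an indexed view cannot contain a CTE.
CREATE UNIQUE CLUSTERED INDEX CIX_RxCountsCte
    ON dbo.RxCountsCte (pat_id, drug_class);
GO
```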
For your specific scenario, I recommend this:
CREATE VIEW dbo.PatClassCounts
WITH SCHEMABINDING
AS
SELECT pat_id, drug_class,
COUNT_BIG(*) AS counts
FROM dbo.rx
GROUP BY pat_id, drug_class;
GO
CREATE UNIQUE CLUSTERED INDEX CIX_PatClassCounts ON dbo.PatClassCounts(pat_id, drug_class); -- index name is illustrative
GO
CREATE VIEW dbo.ClaimSums
WITH SCHEMABINDING
AS
SELECT pat_id,
SUM(std_cost) AS [Healthcare Costs],
COUNT_BIG(*) AS counts
FROM dbo.claims
GROUP BY pat_id;
GO
CREATE UNIQUE CLUSTERED INDEX CIX_ClaimSums ON dbo.ClaimSums(pat_id); -- index name is illustrative
GO
Now you can create a non-indexed view that just does a join between these two indexed views, and it will utilize the indexes (you may have to use NOEXPAND on a lower edition, not sure):
CREATE VIEW dbo.OriginalViewName
WITH SCHEMABINDING
AS
SELECT p.pat_id, p.drug_class, p.counts, c.[Healthcare Costs]
FROM dbo.PatClassCounts AS p
INNER JOIN dbo.ClaimSums AS c
ON p.pat_id = c.pat_id;
GO
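If the optimizer does not pick up the view indexes on its own, one option is to query the two indexed views directly with the NOEXPAND table hint. This is only a sketch: the filter value is hypothetical, and the data type of pat_id is an assumption.

```sql
-- Query the indexed views directly and ask SQL Server not to expand them
-- into their base-table definitions.
SELECT p.pat_id, p.drug_class, p.counts, c.[Healthcare Costs]
FROM dbo.PatClassCounts AS p WITH (NOEXPAND)
INNER JOIN dbo.ClaimSums AS c WITH (NOEXPAND)
    ON p.pat_id = c.pat_id
WHERE p.pat_id = 42;  -- hypothetical value; assumes pat_id is numeric
```

Comparing the execution plan with and without the hint shows whether the view indexes are actually being used.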
Now, this all assumes that it is worthwhile to pre-aggregate this information - if you run this query infrequently, but the data is modified a lot, it may be better to NOT create indexed views.
Also note that the SUM(std_cost) from the ClaimSums view will be the same for every pat_id + drug_class combination, since it's only aggregated to pat_id. I guess there might be a drug_class in the claims table that should be part of the join criteria too, but I'm not sure. If that is the case, I think this could be collapsed to a single indexed view.

Related

Snowflake Create Materialized view on self Join

I am trying to create a materialized view on a table so that the view always holds the latest data.
The query looks like this:
Create Materialized view t1_latest as
select c1,c2,dt from t1
join
(select max(dt) maxdt from t1) t2
ON t1.dt = t2.maxdt
dt being the date field.
Now, as we know, a materialized view does not allow subqueries or window functions. Is there a way to rewrite the query so that the materialized view holds the rows for the latest date? The latest date cannot be assumed to be current_date, and it cannot be hardcoded.
Another approach is to create a view with the join and then create the materialized view on top of that. But the problem there is that we would lose the advantage of the materialized view being calculated beforehand.
Any suggestions?
You'd actually get exactly the same performance gain if your table was clustered by dt and you just put a standard view over it with the same logic you have above. This will prune the table in the same manner as a materialized view would.
If the table is already clustered on something else, then creating a simple materialized view with dt as its cluster key will provide the same benefit. It also lets you simply query the base table and have the query optimizer assist with choosing the best pruning option:
https://docs.snowflake.com/en/user-guide/views-materialized.html#how-the-query-optimizer-uses-materialized-views
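The two options above might look roughly like this in Snowflake SQL. The table and column names come from the question; the materialized view name t1_by_dt is made up, and you would normally pick one option, not both.

```sql
-- Option 1: cluster the base table by dt and use an ordinary view.
ALTER TABLE t1 CLUSTER BY (dt);

CREATE OR REPLACE VIEW t1_latest AS
SELECT t1.c1, t1.c2, t1.dt
FROM t1
JOIN (SELECT MAX(dt) AS maxdt FROM t1) AS t2
  ON t1.dt = t2.maxdt;

-- Option 2: the table is clustered on something else, so keep a simple
-- materialized view (no join, no subquery) that is clustered by dt.
CREATE OR REPLACE MATERIALIZED VIEW t1_by_dt
  CLUSTER BY (dt)
AS
SELECT c1, c2, dt
FROM t1;
```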
Edited per comment:
Not sure I understand why you can't use a task/stream, but what if you did something a bit different with task & stream.
Create a stream...when the stream has data, you could execute a task that executes a stored procedure.
The Stored Procedure would evaluate the table to see if there were
more than 1 distinct value for dt. If there is, it would delete the older data from the table.
In Snowflake, the delete would essentially just be a metadata operation, since all of the micropartitions would contain the same date based on how you describe the data being loaded.
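A rough sketch of that idea follows. It is a simplification rather than the exact design described above: the task runs the DELETE directly instead of calling a stored procedure, and it assumes every load carries a dt newer than what is already in the table. The stream, task, and warehouse names are hypothetical.

```sql
-- Track only newly inserted rows on t1.
CREATE OR REPLACE STREAM t1_stream ON TABLE t1 APPEND_ONLY = TRUE;

-- Run only when the stream has data, and delete everything older than
-- the newest dt that just arrived.
CREATE OR REPLACE TASK purge_old_t1
  WAREHOUSE = my_wh                       -- hypothetical warehouse name
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('T1_STREAM')
AS
  DELETE FROM t1
  WHERE dt < (SELECT MAX(dt) FROM t1_stream);
  -- Assumption: reading the stream inside this DML statement consumes it,
  -- so the task does not fire again until new rows arrive.

-- Tasks are created suspended and must be resumed before they run.
ALTER TASK purge_old_t1 RESUME;
```

If the check for more than one distinct dt value is needed exactly as described, the DELETE could be wrapped in a stored procedure and the task could CALL it instead.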

Unindexed views v/s Queries

I'm creating a view that contains a subquery, as specified with the following SQL query on SQL Server 2012.
CREATE VIEW [dbo].[VIEW_Detail] WITH SCHEMABINDING
AS
SELECT a.ID, a.Name1, a.Name2,
STUFF
((SELECT CAST(',' AS varchar(max)) + t.Name1
FROM dbo.Synonyms AS s
INNER JOIN dbo.Details AS t ON s.SynonymTSN = t.TSN
WHERE s.oID = a.ID FOR XML PATH('')), 1, 1, '') AS Synonym
FROM dbo.Details AS a
WHERE (a.Rank <= 100)
Since the definition contains a subquery, I'm not able to create an indexed view. Will it be faster to use a query instead of the view to retrieve data if my tables are indexed? Or will an unindexed view still perform better than a query? The view currently contains more than 50,000 rows. What other query optimizations can I use?
PS: I don't care about performance on insert/update
A view is simply a single SELECT statement saved under a name (the view name). There is no performance benefit to using a non-indexed view over an ad-hoc query.
Yes, indexed views can improve performance, but they come with a long, long list of limitations. Because they are materialized, other queries that do not reference the indexed view but could benefit from its indexes can also make use of them.
In your case you have a subquery and you are also using the FOR XML clause; neither is allowed inside an indexed view.
To optimize your query, look at the execution plan first and see whether the query is doing table or clustered index scans. Try adding some indexes to get a seek instead of a scan.
Looking at this query, I think indexes on the TSN, ID and Rank columns of the dbo.Details table and on the SynonymTSN column of the dbo.Synonyms table could improve its performance.
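As a sketch, those indexes could be created as follows. The index names are made up, and the INCLUDE lists are an assumption meant to cover the columns the view reads; ID may already be indexed if it is the primary key.

```sql
-- Supports the join from Synonyms to Details on TSN.
CREATE NONCLUSTERED INDEX IX_Details_TSN
    ON dbo.Details (TSN)
    INCLUDE (Name1);

-- Supports the outer filter WHERE Rank <= 100.
CREATE NONCLUSTERED INDEX IX_Details_Rank
    ON dbo.Details ([Rank])
    INCLUDE (ID, Name1, Name2);

-- Supports the join column on the Synonyms side; oID is included because
-- the correlated subquery filters on it.
CREATE NONCLUSTERED INDEX IX_Synonyms_SynonymTSN
    ON dbo.Synonyms (SynonymTSN)
    INCLUDE (oID);
```

Check the execution plan before and after; if the optimizer still scans, the index choice needs another look.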
On a side note, 50,000 rows isn't really a big number. As long as you have primary keys defined on these two tables, you should get reasonable performance with this pretty simple query.

Indexed View looking for null references without INNER JOIN or Subquery

So I have a legacy database with table structure like this (simplified)
Create Table Transaction
{
TransactionId INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
ReplacesTransactionId INT
..
..
}
So I want to create an indexed view such that the following example would return only the second row (because it replaces the first one)
Insert Into Transaction (TransactionId, ReplacesTransactionId, ..) Values (1,0 ..)
Insert Into Transaction (TransactionId, ReplacesTransactionId, ..) Values (2,1 ..)
There are a number of ways of creating this query but I would like to create an indexed view which means I cannot use Subqueries, Left joins or Excepts. An example query (using LEFT JOIN) could be.
SELECT trans1.* FROM Transaction trans1
LEFT JOIN Transaction trans2 on trans1.TransactionId = trans2.ReplacesTransactionId
Where trans2.TransactionId IS NULL
Clearly I'm stuck with the structure of the database and am looking to improve performance of the application using the data.
Any suggestions?
What you have here is essentially a hierarchical dataset in which you want to pre-traverse the hierarchy and store the result in an indexed view, but AFAIK, indexed views do not support that.
On the other hand, this may not be the only angle of attack to your larger goal of improving performance. First, the most obvious question: can we assume that TransactionId is clustered and ReplacesTransactionId is indexed? If not, those would be my first two changes. If the indexing is already good, then the next step would be to look at the query plan of your left join and see if anything leaps out.
In general terms (not having seen the query plan): one possible approach could be to try and convert your SELECT statement to a "covered query" (see https://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/). This would most likely entail some combination of:
Reducing the number of columns in the SELECT statement (replacing SELECT *)
Adding a few "included" columns to the index on ReplacesTransactionId (either in SSMS or using the INCLUDE clause of CREATE INDEX).
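A minimal sketch of the indexing and narrower-SELECT idea, assuming the table lives in the dbo schema and that TransactionId is the clustered primary key. The Amount column is purely hypothetical, standing in for whichever elided columns the application actually needs.

```sql
-- Index the self-referencing column. Because TransactionId is assumed to be
-- the clustered key, it is carried in this index automatically, so the
-- "replaced by" side of the join can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_Transaction_ReplacesTransactionId
    ON dbo.[Transaction] (ReplacesTransactionId);

-- Same anti-join as in the question, but with an explicit column list
-- instead of SELECT *.
SELECT t1.TransactionId,
       t1.Amount                 -- hypothetical column
FROM dbo.[Transaction] AS t1
LEFT JOIN dbo.[Transaction] AS t2
    ON t1.TransactionId = t2.ReplacesTransactionId
WHERE t2.TransactionId IS NULL;
```

An INCLUDE list on the index only becomes useful if the query needs further columns from the t2 side.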
Good luck!

Fastest and Best Performance in Select Views

This is My View:
CREATE VIEW [Products].[VProductFull] AS
SELECT
[Pr].[Id],
[Pr].[Title],
[Pr].[IsScanAllowed],
[Pr].[Category_Id],
[Cat].[Title] AS [Category],
[Cat].[MajorCategory_Id],
[Mc].[Title] AS [MajorCategory]
FROM [Products].[Product] AS [Pr]
INNER JOIN [Products].[Category] AS [Cat] ON [Pr].[Category_Id] = [Cat].[Id]
INNER JOIN [Products].[MajorCategory] AS [Mc] ON [Cat].[MajorCategory_Id] = [Mc].[Id];
GO
I need a stored procedure to get VProductFull rows by MajorCategoryId. There are two candidates: the first selects with the same joins as the view definition, and the second selects from the view itself:
-- SP#1
CREATE PROCEDURE [Products].[GetFullProductByMajorCategory](
@MajorCategoryid [bigint]
)
AS
BEGIN
BEGIN TRANSACTION [FullProductByMajor]
SELECT
[Pr].[Id],
[Pr].[Title],
[Pr].[IsScanAllowed],
[Pr].[Category_Id],
[Cat].[Title] AS [Category],
[Cat].[MajorCategory_Id],
[Mc].[Title] AS [MajorCategory]
FROM [Products].[Product] AS [Pr]
INNER JOIN [Products].[Category] AS [Cat] ON [Pr].[Category_Id] = [Cat].[Id]
INNER JOIN [Products].[MajorCategory] AS [Mc] ON [Cat].[MajorCategory_Id] = [Mc].[Id]
WHERE [Mc].[Id] = @MajorCategoryid;
COMMIT TRANSACTION [FullProductByMajor]
END
GO
And
-- SP#2
CREATE PROCEDURE [Products].[GetFullProductByMajorCategory](
@MajorCategoryid [bigint]
)
AS
BEGIN
BEGIN TRANSACTION [FullProductByMajor]
SELECT
[VPF].[Id],
[VPF].[Title],
[VPF].[IsScanAllowed],
[VPF].[Category_Id],
[VPF].[Category],
[VPF].[MajorCategory_Id],
[VPF].[MajorCategory]
FROM [Products].[VProductFull] AS [VPF]
WHERE [VPF].[MajorCategory_Id] = @MajorCategoryid;
COMMIT TRANSACTION [FullProductByMajor]
END
GO
Which of the above SPs returns faster and performs better, and why? Is there another way to get VProductFull by MajorCategoryId that performs better than either of these SPs?
Both have the same execution times and there will be no difference between them. At runtime the view will just be expanded to its underlying query. You can see this for yourself by looking at the query plans for both versions.
To optimize, you need to make VProductFull an indexed view, meaning a materialized view. Then, when selecting from it, use the NOEXPAND hint. If you want to extend your knowledge about indexed views, you can read more here.
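A sketch of what that could look like. It uses a separate, hypothetical view name so the original view stays untouched, and it assumes [Product].[Id] is unique and that the column types are allowed in an indexed view. The index names are also made up.

```sql
CREATE VIEW [Products].[VProductFullIndexed]
WITH SCHEMABINDING
AS
SELECT
    [Pr].[Id],
    [Pr].[Title],
    [Pr].[IsScanAllowed],
    [Pr].[Category_Id],
    [Cat].[Title] AS [Category],
    [Cat].[MajorCategory_Id],
    [Mc].[Title] AS [MajorCategory]
FROM [Products].[Product] AS [Pr]
INNER JOIN [Products].[Category] AS [Cat] ON [Pr].[Category_Id] = [Cat].[Id]
INNER JOIN [Products].[MajorCategory] AS [Mc] ON [Cat].[MajorCategory_Id] = [Mc].[Id];
GO
-- The first index on a view must be unique and clustered; this materializes it.
CREATE UNIQUE CLUSTERED INDEX [CIX_VProductFullIndexed]
    ON [Products].[VProductFullIndexed] ([Id]);
GO
-- Optional secondary index to support the lookup by major category.
CREATE NONCLUSTERED INDEX [IX_VProductFullIndexed_MajorCategory_Id]
    ON [Products].[VProductFullIndexed] ([MajorCategory_Id]);
GO
DECLARE @MajorCategoryid bigint = 1;  -- hypothetical value
SELECT [Id], [Title], [IsScanAllowed], [Category_Id],
       [Category], [MajorCategory_Id], [MajorCategory]
FROM [Products].[VProductFullIndexed] WITH (NOEXPAND)
WHERE [MajorCategory_Id] = @MajorCategoryid;
```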
If you do not want to use an indexed view, then use a non-indexed view but make sure you create two non-clustered indexes on the two tables: on [Pr].[Category_Id] and on [Cat].[MajorCategory_Id].
You need these indexes in order to avoid clustered index scans, and use the much faster index seek plan operators.
For the first index you should include the following columns in the index (as included columns, not index columns): [Title], [IsScanAllowed], [Id]. For the second index you should include column [Cat].[Title].
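In T-SQL, the two indexes described above might be written like this (the index names are illustrative):

```sql
-- Seek on Category_Id in Product, with the selected columns included
-- so no lookup into the clustered index is needed.
CREATE NONCLUSTERED INDEX [IX_Product_Category_Id]
    ON [Products].[Product] ([Category_Id])
    INCLUDE ([Title], [IsScanAllowed], [Id]);

-- Seek on MajorCategory_Id in Category, with Title included.
CREATE NONCLUSTERED INDEX [IX_Category_MajorCategory_Id]
    ON [Products].[Category] ([MajorCategory_Id])
    INCLUDE ([Title]);
```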
I think you can experiment with both and then compare the query plans as well as the execution times (with SET STATISTICS TIME ON).
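One way to run that comparison in a query window, as a sketch (the argument value is hypothetical, and STATISTICS IO is added here to show logical reads as well):

```sql
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Run each variant in turn and compare the CPU time, elapsed time,
-- and logical reads reported on the Messages tab.
EXEC [Products].[GetFullProductByMajorCategory] 1;  -- hypothetical value

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```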
My bet is that the indexed view will be faster but, if you have large base tables, the indexed view will have an impact on inserts/updates in the base tables. So you may want a trade-off in order to get balanced performance in all situations.
For reference, and for whoever will read this question, please post the current execution plans and times and the ones after you apply each modification. That's if this is not too much trouble.
Why are you using Transactions in Select Statements?
Have you considered using SQL Profiler? That means you can check the Reads and Duration for a particular Query.
Do you have Indexes on the column being used in the Where clause?
The above mentioned View should be Indexed View.
Try below Query directly in your Stored Proc and compare the Reads and Duration in SQL Profiler
Select
K.[MajorCategory],
[Pr].[Id],
[Pr].[Title],
[Pr].[IsScanAllowed],
[Pr].[Category_Id],
[Cat].[Title] AS [Category],
[Cat].[MajorCategory_Id]
From
(
Select [Title] AS [MajorCategory], [Id]
From [Products].[MajorCategory]
WHERE [Id] = @MajorCategoryid
)K
INNER JOIN [Products].[Category] AS [Cat] ON [Cat].[MajorCategory_Id] = K.[Id]
INNER JOIN [Products].[Product] AS [Pr] ON [Pr].[Category_Id] = [Cat].[Id]
With this suggestion, as I understand it, the scan over all matching Category and Product rows (which is what the view's query does) should not take place. The work is instead driven by the single matching row of the MajorCategory table.

How does SQL Server treat indexes on a table behind a view?

So I'm trying to understand how SQL Server makes use of indexes on tables behind views. Here's the scenario: Table A has a composite clustered index on fields 1 & 2 and a nonclustered index on fields 3 & 4.
View A is written against Table A to filter out additional fields, but fields 1-4 are part of the view. So we write a query that joins the view to another table on the nonclustered index fields.
The resulting query plan hits Table A with a clustered index scan (instead of the expected nonclustered index seek). However, if we replace the view in the FROM clause with the table, the query plan then hits the nonclustered index and we get the index seek we expected.
Shouldn't the SQL engine make use of the index on the table the view is constructed on? Since it doesn't, why not?
When you're thinking of non-materialized views and optimizations -- think of them like this:
The engine is "cutting and pasting" the view text into every query you perform.
OK, that's not exactly 100% true, but it's probably the most helpful way to think of what to expect in terms of performance.
Views can be tricky, though. People tend to think that just because a column is in a view, that it means something significant when it comes to query performance. The truth is, if the query which uses your view doesn't include a set of columns, it can be "optimized away". So if you were to SELECT every column from your base tables in your view, and then you were to only select one or two columns when you actually use the view, the query will be optimized considering only those two columns you select.
Another consequence of this is that you can use views to very aggressively flatten out table structures. So let's say for example I have the following schema:
Widget
-------
ID (UNIQUE)
Name
Price
WidgetTypeID (FK to WidgetType.ID)
WidgetType
----------
ID (UNIQUE)
Name
vw_Widgets
----------
SELECT w.ID, w.Name, w.Price, w.WidgetTypeID, wt.Name AS TypeName
FROM Widget w
LEFT JOIN WidgetType wt
ON wt.ID = w.WidgetTypeID;
Note the LEFT JOIN in the view definition. If you were to simply SELECT Name, Price FROM vw_Widgets, you'd notice that WidgetType wasn't even involved in the query plan! It's completely optimized away! This works with LEFT JOINS across unique columns because the optimizer knows that since WidgetType's ID is UNIQUE, it won't generate any duplicate rows from the join. And since there's a FK, you know that you can leave the join as a LEFT join because you'll always have a corresponding row.
So the moral of the story here with views is that the columns you select at the end of the day are the ones that matter, not the ones in the view. Views aren't optimized when they're created -- they're optimized when they're used.
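To try this out, the schema above can be sketched as runnable T-SQL. The dbo schema, data types, and nullability are assumptions; only the table, column, and view names come from the example.

```sql
CREATE TABLE dbo.WidgetType
(
    ID   int          NOT NULL PRIMARY KEY,   -- unique
    Name nvarchar(50) NOT NULL
);

CREATE TABLE dbo.Widget
(
    ID           int          NOT NULL PRIMARY KEY,
    Name         nvarchar(50) NOT NULL,
    Price        money        NOT NULL,
    WidgetTypeID int          NULL REFERENCES dbo.WidgetType (ID)
);
GO
CREATE VIEW dbo.vw_Widgets
AS
SELECT w.ID, w.Name, w.Price, w.WidgetTypeID, wt.Name AS TypeName
FROM dbo.Widget AS w
LEFT JOIN dbo.WidgetType AS wt
    ON wt.ID = w.WidgetTypeID;
GO
-- No WidgetType column is requested, so the plan is expected to touch
-- only dbo.Widget.
SELECT Name, Price FROM dbo.vw_Widgets;

-- TypeName is requested, so the join to dbo.WidgetType has to stay.
SELECT Name, TypeName FROM dbo.vw_Widgets;
```

Comparing the actual execution plans of the two SELECT statements shows whether the join was eliminated.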
Your question isn't really about views
Your question is actually more generic -- why can't you use the NC index? I can't really tell you, because I can't see your schema or your specific query. Suffice it to say that at a certain point the optimizer estimates that seeking on the nonclustered index and then looking up the additional fields row by row would cost more than simply scanning the table, so it ignores your nonclustered index.
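One way to check whether that cost trade-off is what you are seeing is to force the nonclustered index on the base table and compare the estimated costs of the two plans. Every name below (TableA, OtherTable, Field1 to Field5, the index name) is hypothetical, since the real schema is not shown; Field5 stands for any column the nonclustered index does not carry.

```sql
-- Plan chosen by the optimizer.
SELECT a.Field1, a.Field2, a.Field3, a.Field4, a.Field5
FROM dbo.TableA AS a
INNER JOIN dbo.OtherTable AS o
    ON o.Field3 = a.Field3
   AND o.Field4 = a.Field4;

-- Same query with the nonclustered index forced, for diagnosis only.
-- If this plan shows many key lookups and a higher cost, the optimizer's
-- choice of a clustered index scan is explained.
SELECT a.Field1, a.Field2, a.Field3, a.Field4, a.Field5
FROM dbo.TableA AS a WITH (INDEX (IX_TableA_Field3_Field4))
INNER JOIN dbo.OtherTable AS o
    ON o.Field3 = a.Field3
   AND o.Field4 = a.Field4;
```

The hint is a diagnostic tool here, not a recommendation to leave it in production code.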