Indexed views: query ignores the view and uses the table instead

My task is to optimize this query:
Declare @sumBalance NUMERIC = (select SUM(CURRENT_BALANCE) as Balance
from dbo.ACCOUNT_DETAILS)
select @sumBalance
I've read that the best solution for aggregation functions is using indexed views instead of tables.
I've created the view with SCHEMABINDING:
CREATE VIEW dbo.CURRENT_BALANCE_VIEW
WITH SCHEMABINDING
AS
SELECT id,CURRENT_BALANCE
FROM dbo.ACCOUNT_DETAILS
After that I created two indexes.
The first on ID:
CREATE UNIQUE CLUSTERED INDEX index_ID_VIEW ON dbo.CURRENT_BALANCE_VIEW(ID);
The second on CURRENT_BALANCE, my second column:
CREATE NONCLUSTERED INDEX index_CURRENT_BALANCE_VIEW
ON dbo.CURRENT_BALANCE_VIEW(CURRENT_BALANCE);
And here I ran into trouble with the new query:
Declare @sumBalance NUMERIC = (select SUM(CURRENT_BALANCE) as Balance
from dbo.CURRENT_BALANCE_VIEW)
select @sumBalance
The new query doesn't use the view:
http://i.stack.imgur.com/jlPEd.png
Somehow my indexes were added to the folder Statistics
Look in another post
I don't understand why I can see the index 'index_current_balance', because there is no such index on the table.
Look in another post
P.S. I already tried creating an index on the table and it helped: the estimated operator cost dropped from 0.2 to 0.009, but it should still be possible to make it faster.
p.s.s Sorry for making you click on the link, my reputation doesn't allow me to paste images properly =\
p.s.s.s Working with SQL Server 2014
p.s.s.s.s Just realized that I don't need to sum the zeros. Expected them from the function.
Thanks in advance.

If you use the Standard Edition of SQL Server, you have to specify the NOEXPAND hint for the view's index to be used.
For example:
SELECT *
FROM dbo.CURRENT_BALANCE_VIEW WITH (NOEXPAND);
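Applied to the query from the question, a minimal sketch might look like the following. It assumes the unique clustered index on the view already exists and reuses the question's variable and view names.

```sql
-- Sketch: ask the optimizer to read the view's own index instead of
-- expanding the view back into dbo.ACCOUNT_DETAILS.
DECLARE @sumBalance NUMERIC = (
    SELECT SUM(CURRENT_BALANCE)
    FROM dbo.CURRENT_BALANCE_VIEW WITH (NOEXPAND)
);
SELECT @sumBalance AS Balance;
```

Because this particular view keeps one row per account, the hint only changes which index is scanned; the SUM is still computed at query time.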

This query:
Declare @sumBalance NUMERIC = (select SUM(CURRENT_BALANCE) as Balance
from dbo.ACCOUNT_DETAILS);
select @sumBalance;
is not easy to optimize. The only index that will help it is:
create index idx_account_details_current_balance on account_details(current_balance);
This is a covering index for the query, and can be used for the SUM(). However, the index still needs to be scanned to do the SUM(). Scanning the index should be faster than scanning the table because it is likely to be much smaller.
SQL Server 2012+ has a facility called columnstore indexes that would have the same effect.
The advice for using indexed views for aggregation functions doesn't seem like good advice. For instance, if the above query used MIN() or MAX(), then the above index should be the optimal index for the query, and it should run quite fast.
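As a rough illustration of the columnstore option mentioned in this answer, a nonclustered columnstore index on the balance column could be created as below. The index name is made up for this sketch. One caveat worth checking for your version: in SQL Server 2012 and 2014, a table with a nonclustered columnstore index is read-only until the index is dropped or disabled, so this may only suit tables that are loaded in batches.

```sql
-- Hypothetical index name; table and column come from the question.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_account_details_current_balance
ON dbo.ACCOUNT_DETAILS (CURRENT_BALANCE);
```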
EDIT:
Your reference article is quite reasonable. If you want to create an indexed view for this purpose, then create it with aggregation.
CREATE VIEW dbo.CURRENT_BALANCE_VIEW
WITH SCHEMABINDING
AS
SELECT SUM(CURRENT_BALANCE) as bal, COUNT_BIG(CURRENT_BALANCE) as cnt
FROM dbo.ACCOUNT_DETAILS;
This is a little weird, because it returns one row. I think the following will work:
create unique clustered index idx_account_details on dbo.current_balance_view(bal);
If not, you may need to introduce a dummy column for the index.
Then:
select *
from dbo.current_balance_view;
should have the precomputed value.

Related

Fastest Way To Get Count From A Table With Conditions?

I am using SQL Server 2017 and EF Core 2.2. One of my tables currently has 5 million records in it.
I want to group all these records by "CategoryId" and then get a count for each group.
I also need to filter with a WHERE clause.
However, even if I write the query directly in SQL, it still takes around a minute to get these numbers.
This is way too slow and I need something faster.
select CategoryId, count(*) from Items where Deleted = 'False'
group by CategoryId
I am guessing that EF Core probably won't have a solution that is fast enough, so I am open to using ADO.NET if needed. I just need something that is fast.
Consider creating an indexed view to materialize the aggregation:
CREATE VIEW dbo.ItemCategory
WITH SCHEMABINDING
AS
SELECT CategoryId, COUNT_BIG(*) AS CountBig
FROM dbo.Items
WHERE Deleted = 'False'
GROUP BY CategoryId;
GO
CREATE UNIQUE CLUSTERED INDEX cdx_ItemCategory
ON dbo.ItemCategory (CategoryId);
GO
Using this view for the aggregated result will improve performance significantly:
SELECT CategoryId, CountBig
FROM dbo.ItemCategory;
Depending on your SQL Server edition, you may need to specify the NOEXPAND hint for the view index to be used:
SELECT CategoryId, CountBig
FROM dbo.ItemCategory WITH (NOEXPAND);
You'd better add indexes on Deleted and CategoryId.
Or move all deleted items to a separate table.
You should have a covering index for your query to make it fast. Beyond that there is no shortcut: without one, the query has to read every page of the table to count the rows per CategoryId.
I have a table with 5 million rows, of which almost 4.7 million have IsDeleted = 'False'. Without the covering index my query takes about 12 seconds, and the execution plan looks like this.
Once I create the following covering index on my table the query is executed in less than a second and the execution plan looks exactly the same but it is doing a seek on the nonclustered index rather than doing a scan on the clustered index:
Index Definition:
CREATE NONCLUSTERED INDEX [Test_Index]
ON [dbo].[Test] ([IsDeleted])
INCLUDE ([CategoryId])
With this covering Index SQL Server will only need to look into the index and return the results rather than looking into your whole table.
If you really want to speed this query up further, there is another, very specific option: create a filtered index tailored to the query.
Index definition would be:
CREATE NONCLUSTERED INDEX [Test_Index2]
ON [dbo].[Test] ([CategoryId])
WHERE IsDeleted = 'False'
With this filtered index my query was nearly instant. I didn't turn on STATISTICS IO or TIME for the query, but I would expect to see a few milliseconds. The execution plan changed slightly with this index.
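To compare the indexes with numbers rather than impressions, one option is to switch on the session statistics before running the query. This is only a sketch against the dbo.Test table used in this answer; the ItemCount alias is made up here.

```sql
-- Report logical reads and CPU/elapsed time in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT CategoryId, COUNT(*) AS ItemCount
FROM dbo.Test
WHERE IsDeleted = 'False'
GROUP BY CategoryId;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running it once per index variant and comparing the logical reads gives a more stable comparison than wall-clock time alone.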

Improve the performance of a query on a view which references external tables

I have a view which looks like this:
CREATE VIEW My_View AS
SELECT * FROM My_Table UNION
SELECT * FROM My_External_Table
What I have found is that performance is very slow when ordering the data which I need to do for pagination. For example the following query takes almost 2 minutes despite only returning 20 rows:
SELECT * FROM My_View
ORDER BY My_Column
OFFSET 20 ROWS FETCH NEXT 20 ROWS ONLY
In contrast the following (useless) query takes less than 2 seconds:
SELECT * FROM My_View
ORDER BY GETDATE()
OFFSET 20 ROWS FETCH NEXT 20 ROWS ONLY
I cannot add indexes to the view as it is not SCHEMABOUND and I cannot make it SCHEMABOUND as it references an external table.
Is there any way I can improve the performance of the query or otherwise get the desired result. All the databases involved are AzureSQL.
If all rows are unique across My_Table and My_External_Table, using UNION ALL instead of UNION would help performance, because it skips the duplicate-elimination step.
Adding an index on the table would also help the query run faster.
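A sketch of that change to the view from the question, assuming the two tables really contain no overlapping rows (otherwise the result would differ from the UNION version):

```sql
-- UNION ALL concatenates the two inputs without the sort or hash
-- step that UNION needs to remove duplicates.
ALTER VIEW My_View AS
SELECT * FROM My_Table
UNION ALL
SELECT * FROM My_External_Table;
```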
You can't really get around the order by so I don't think there is anything you can do.
I'm a bit surprised the order by getdate() works, because ordering by a constant does not usually work. I imagine it is equivalent to order by (select null) and no ordering takes place.
My recommendation? You probably need to replicate the external table on the local system and have a process to create a new local table. That sounds complicated, but you may be able to do it using a materialized view. How this works with the "external" table depends on what you mean by "external".
Note that you will also want an index on my_column to avoid the sort.
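One possible shape for the local copy is sketched below. The copy's table name and the index name are hypothetical, the statement takes a one-time snapshot only, and the process that keeps the copy fresh is left out because it depends on how the external table is exposed.

```sql
-- Hypothetical names: My_External_Table_Copy, ix_Copy_My_Column.
-- Snapshot the external rows into an ordinary local table.
SELECT *
INTO dbo.My_External_Table_Copy
FROM My_External_Table;

-- Index the paging column so ORDER BY My_Column can avoid a full sort.
CREATE INDEX ix_Copy_My_Column
ON dbo.My_External_Table_Copy (My_Column);
```

The view would then select from the local copy instead of the external table, and My_Table would need a matching index on My_Column as well.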

Unindexed views v/s Queries

I'm creating a view which contains a subquery, as specified with the following SQL query on SQL Server 2012.
CREATE VIEW [dbo].[VIEW_Detail] WITH SCHEMABINDING
AS
SELECT a.ID, a.Name1, a.Name2,
STUFF
((SELECT CAST(',' AS varchar(max)) + t.Name1
FROM dbo.Synonyms AS s
INNER JOIN dbo.Details AS t ON s.SynonymTSN = t.TSN
WHERE s.oID = a.ID FOR XML PATH('')), 1, 1, '') AS Synonym
FROM dbo.Details AS a
WHERE (a.Rank <= 100)
Since the definition contains a subquery, I'm not able to create an indexed view. Will it be faster to use a query instead of the view to retrieve the data if my tables are indexed? Or will an unindexed view still perform better than a query? The view currently contains more than 50,000 rows. What other query optimizations can I use?
PS: I don't care about performance on insert/update
A view is simply a single SELECT statement saved under a name (the view name). There is no performance benefit to using a view over an ad hoc query.
Yes, indexed views can improve performance, but they come with a very long list of limitations. Because they are materialized, other queries that do not reference the indexed view but could benefit from its indexes can also use them.
In your case you have a subquery and a FOR XML clause, and neither is allowed inside an indexed view.
To optimize your query, look at the execution plan first and check whether the query is doing table or clustered index scans. Try adding some indexes to get a seek instead of a scan.
Looking at this query, I think indexes on the TSN, ID and Rank columns of dbo.Details and on the SynonymTSN column of dbo.Synonyms could improve its performance.
On a side note, 50,000 rows isn't really a large number. As long as you have primary keys defined on these two tables, you should get reasonable performance with this pretty simple query.
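The suggested indexes could be sketched roughly as follows. All index names are hypothetical, ID is assumed to be covered by the primary key already, and the last index goes slightly beyond the answer, so treat it as an assumption to test against the execution plan.

```sql
-- Hypothetical index names; columns are the ones named in the answer.
CREATE NONCLUSTERED INDEX ix_Details_TSN  ON dbo.Details (TSN);
CREATE NONCLUSTERED INDEX ix_Details_Rank ON dbo.Details ([Rank]);
-- Supports the join from Synonyms to Details.
CREATE NONCLUSTERED INDEX ix_Synonyms_SynonymTSN ON dbo.Synonyms (SynonymTSN);
-- Assumption beyond the answer: the correlated subquery filters on oID,
-- so an index leading on oID may matter more than one on SynonymTSN alone.
CREATE NONCLUSTERED INDEX ix_Synonyms_oID
ON dbo.Synonyms (oID) INCLUDE (SynonymTSN);
```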

Not able to create indexed view

Consider this SQL:
CREATE VIEW [dbo].[MyView1] ([ID],[VisitDate],[StartDate] ,[EndDate],[MyCount])
WITH SCHEMABINDING
AS
SELECT id, VisitDate,dateadd(dd,-10,VisitDate),dateadd(dd,10,VisitDate),
count_BIG(*)as MyCount
FROM dbo.Visits2
group by id,VisitDate
I am trying to create a clustered index on this view on id, VisitDate. I am getting the following error.
Cannot create the clustered index 'IX_!!' on view 'CI_DB.dbo.MyView4'
because the select list of the view contains an expression on result of
aggregate function or grouping column.
Consider removing expression on result of aggregate function or
grouping column from select list.
This is a known issue since 2006.
If you have an aggregation in an indexed view, and both a field and an expression applied to the field are in the GROUP BY (which I'm assuming you just left out of your sample code), the engine won't allow you to create it.
There are some workarounds but they aren't very straightforward. Basically you need to fool the engine into thinking the fields are different.
That's a pretty clear error message: your view is not one that can be indexed. There are many, many conditions on what kinds of views can be indexed.
Changed the SQL to
CREATE VIEW [dbo].[MyView2] ([ID],[VisitDate],[StartDate] ,[EndDate],[MyCount])
WITH SCHEMABINDING
AS
SELECT id, VisitDate,
dateadd(day,duration,VisitDate) startdate
,dateadd(day,duration,VisitDate) enddate,
count_BIG(*)as MyCount
FROM dbo.Visits3
group by id,VisitDate,dateadd(day,duration,VisitDate),dateadd(day,duration,VisitDate)
GO
It seems you can't specify a literal value like 10 inside the function and the GROUP BY clause. Now it works!

How to speed up SQL query with group by statement + max function?

I have a table with millions of rows, and I need to do LOTS of queries which look something like:
select max(date_field)
where varchar_field1 = 'something'
group by varchar_field2;
My questions are:
Is there a way to create an index to help with this query?
What (other) options do I have to enhance performance of this query?
An index on (varchar_field1, varchar_field2, date_field) would be of most use. The database can use the first index field for the where clause, the second for the group by, and the third to calculate the maximum date. It can complete the entire query just using that index, without looking up rows in the table.
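As a sketch, with my_table and the index name standing in as hypothetical names (the question does not give the table name):

```sql
-- Hypothetical table and index names.
-- Column order matters: equality filter first, then the grouping
-- column, then the column being aggregated.
CREATE INDEX ix_my_table_f1_f2_date
ON my_table (varchar_field1, varchar_field2, date_field);

-- Every column the query touches is in the index, so the base table
-- does not need to be read.
SELECT varchar_field2, MAX(date_field) AS max_date
FROM my_table
WHERE varchar_field1 = 'something'
GROUP BY varchar_field2;
```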
Obviously, an index on varchar_field1 will help a lot.
You can create yourself an extra table with the columns
varchar_field1 (unique index)
max_date_field
You can set up triggers on inserts, updates, and deletes on the table you're searching that will maintain this little table -- whenever a row is added or changed, set a row in this table.
We've had good success with performance improvement using this refactoring technique. In our case it was made simpler because we never delete rows from the table until they're so old that nobody ever looks up the max field. This is an especially helpful technique if you can add max_date_field to some other table rather than create a new one.
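A minimal T-SQL sketch of such a trigger-maintained table is shown below. The question does not name a database engine, so SQL Server syntax is an assumption here, as are all object names (big_table, max_dates, trg_big_table_max_date) and the column types. It only ever raises the stored maximum, which matches the "rows are never deleted" situation described above; deletes, or updates that lower a date, would need extra handling.

```sql
-- Hypothetical names and types throughout.
CREATE TABLE dbo.max_dates (
    varchar_field1 varchar(100) NOT NULL PRIMARY KEY,
    max_date_field datetime2    NOT NULL
);
GO
CREATE TRIGGER dbo.trg_big_table_max_date
ON dbo.big_table
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Collapse the affected rows to one candidate maximum per key.
    WITH changed AS (
        SELECT varchar_field1, MAX(date_field) AS new_max
        FROM inserted
        WHERE varchar_field1 IS NOT NULL
          AND date_field IS NOT NULL
        GROUP BY varchar_field1
    )
    MERGE dbo.max_dates AS target
    USING changed AS source
        ON target.varchar_field1 = source.varchar_field1
    -- Only move the stored maximum upwards.
    WHEN MATCHED AND source.new_max > target.max_date_field THEN
        UPDATE SET max_date_field = source.new_max
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (varchar_field1, max_date_field)
        VALUES (source.varchar_field1, source.new_max);
END;
```

Lookups then become a single-row read from dbo.max_dates by varchar_field1 instead of an aggregate over the large table.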