This is my view:
CREATE VIEW [Products].[VProductFull] AS
SELECT
[Pr].[Id],
[Pr].[Title],
[Pr].[IsScanAllowed],
[Pr].[Category_Id],
[Cat].[Title] AS [Category],
[Cat].[MajorCategory_Id],
[Mc].[Title] AS [MajorCategory]
FROM [Products].[Product] AS [Pr]
INNER JOIN [Products].[Category] AS [Cat] ON [Pr].[Category_Id] = [Cat].[Id]
INNER JOIN [Products].[MajorCategory] AS [Mc] ON [Cat].[MajorCategory_Id] = [Mc].[Id];
GO
And I need an SP to get VProductFull by MajorCategoryId. There are two candidate SPs: the first one selects using joins, like the CREATE VIEW, and the second one uses the view itself:
-- SP#1
CREATE PROCEDURE [Products].[GetFullProductByMajorCategory](
@MajorCategoryid [bigint]
)
AS
BEGIN
BEGIN TRANSACTION [FullProductByMajor]
SELECT
[Pr].[Id],
[Pr].[Title],
[Pr].[IsScanAllowed],
[Pr].[Category_Id],
[Cat].[Title] AS [Category],
[Cat].[MajorCategory_Id],
[Mc].[Title] AS [MajorCategory]
FROM [Products].[Product] AS [Pr]
INNER JOIN [Products].[Category] AS [Cat] ON [Pr].[Category_Id] = [Cat].[Id]
INNER JOIN [Products].[MajorCategory] AS [Mc] ON [Cat].[MajorCategory_Id] = [Mc].[Id]
WHERE [Mc].[Id] = @MajorCategoryid;
COMMIT TRANSACTION [FullProductByMajor]
END
GO
And
-- SP#2
CREATE PROCEDURE [Products].[GetFullProductByMajorCategory](
@MajorCategoryid [bigint]
)
AS
BEGIN
BEGIN TRANSACTION [FullProductByMajor]
SELECT
[VPF].[Id],
[VPF].[Title],
[VPF].[IsScanAllowed],
[VPF].[Category_Id],
[VPF].[Category],
[VPF].[MajorCategory_Id],
[VPF].[MajorCategory]
FROM [Products].[VProductFull] AS [VPF]
WHERE [VPF].[MajorCategory_Id] = @MajorCategoryid;
COMMIT TRANSACTION [FullProductByMajor]
END
GO
Which of the above SPs returns faster and has better performance, and why? And is there another way to get VProductFull by MajorCategoryId that is faster and performs better than the above SPs?
Both have the same execution times and there will be no difference between them. At runtime the view will just be expanded to its underlying query. You can see this for yourself by looking at the query plans for both versions.
To optimize, you need to make VProductFull an indexed view, meaning a materialized view. Then, when selecting from it, use the NOEXPAND hint. If you want to extend your knowledge about indexed views, the SQL Server documentation covers them in detail.
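A minimal sketch of that approach (index names are hypothetical, and the view must first be recreated WITH SCHEMABINDING):

-- The view must be (re)created WITH SCHEMABINDING before it can be indexed:
-- CREATE VIEW [Products].[VProductFull] WITH SCHEMABINDING AS SELECT ...

-- A unique clustered index materializes the view's result set:
CREATE UNIQUE CLUSTERED INDEX [UIX_VProductFull_Id]
ON [Products].[VProductFull] ([Id]);
GO
-- An optional non-clustered index to support the filter column:
CREATE NONCLUSTERED INDEX [IX_VProductFull_MajorCategory_Id]
ON [Products].[VProductFull] ([MajorCategory_Id]);
GO
-- Inside the procedure, read the materialized rows directly:
SELECT [Id], [Title], [IsScanAllowed], [Category_Id],
       [Category], [MajorCategory_Id], [MajorCategory]
FROM [Products].[VProductFull] WITH (NOEXPAND)
WHERE [MajorCategory_Id] = @MajorCategoryid;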
If you do not want to use an indexed view, then use a non-indexed view but make sure you create two non-clustered indexes on the two tables: on [Pr].[Category_Id] and on [Cat].[MajorCategory_Id].
You need these indexes in order to avoid clustered index scans, and use the much faster index seek plan operators.
For the first index you should include the following columns (as included columns, not key columns): [Title], [IsScanAllowed], [Id]. For the second index you should include the column [Cat].[Title].
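Concretely, those two indexes could look like this (index names are hypothetical):

-- Index on Product(Category_Id), covering the columns the view selects:
CREATE NONCLUSTERED INDEX [IX_Product_Category_Id]
ON [Products].[Product] ([Category_Id])
INCLUDE ([Id], [Title], [IsScanAllowed]);

-- Index on Category(MajorCategory_Id), covering the category title:
CREATE NONCLUSTERED INDEX [IX_Category_MajorCategory_Id]
ON [Products].[Category] ([MajorCategory_Id])
INCLUDE ([Title]);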
I think you can experiment with both and then compare the query plans as well as execution times (with SET STATISTICS TIME ON).
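For example (a sketch: the question deploys both SPs under the same name, so distinct names like the ones below are hypothetical):

SET STATISTICS TIME ON;
SET STATISTICS IO ON; -- logical reads are also worth comparing

EXEC [Products].[GetFullProductByMajorCategory_Join] @MajorCategoryid = 1; -- SP#1
EXEC [Products].[GetFullProductByMajorCategory_View] @MajorCategoryid = 1; -- SP#2

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;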
My bet is that the indexed view will be faster but, if you have large base tables, the indexed view will have an impact on inserts/updates in the base tables. So you may want a trade-off in order to get balanced performance in all situations.
For reference, and for whoever will read this question, please post the current execution plans and times and the ones after you apply each modification. That's if this is not too much trouble.
Why are you using Transactions in Select Statements?
Have you considered using SQL Profiler? With it, you can check the Reads and Duration for a particular query.
Do you have indexes on the column being used in the WHERE clause?
The above-mentioned view should be an indexed view.
Try the query below directly in your stored proc and compare the Reads and Duration in SQL Profiler:
Select
K.[MajorCategory],
[Pr].[Id],
[Pr].[Title],
[Pr].[IsScanAllowed],
[Pr].[Category_Id],
[Cat].[Title] AS [Category],
[Cat].[MajorCategory_Id]
From
(
Select [Title] AS [MajorCategory], [Id]
From [Products].[MajorCategory]
WHERE [Id] = @MajorCategoryid
) K
INNER JOIN [Products].[Category] AS [Cat] ON [Cat].[MajorCategory_Id] = K.[Id]
INNER JOIN [Products].[Product] AS [Pr] ON [Pr].[Category_Id] = [Cat].[Id]
With this suggestion, as per my understanding, a scan will not take place over all matching records in the Category and Product tables (which is what the view's query does). Instead, the plan starts from the single matching record in the MajorCategory table.
Related
I've created a stored procedure that selects from a few different tables in different dbs into temp tables to work with in the proc. Everything seems to be fine with filling the tables. My problem arises when I attempt to join them. This is the query I wrote:
SELECT #temp1.id,
#temp2.first_name,
#temp2.last_name,
#temp3.DOB,
#temp4.Sex,
#temp5.SSN
FROM (((#temp1
LEFT JOIN #temp3 ON #temp1.id = #temp3.id)
LEFT JOIN #temp4 ON #temp1.id = #temp4.id)
LEFT JOIN #temp2 ON #temp1.id = #temp2.id)
LEFT JOIN #temp5 ON #temp1.id = #temp5.id;
The query works to an extent. The output window is filled with the results of the select. The problem is that the query doesn't exit. It stops adding new records to the output but continues executing and thus the procedure hangs because it can't move on to the next statement. Any ideas?
I can see that your query is resulting in table scans for all five tables in the execution plan. You can create clustered indexes on the joining column (ID) in all five temp tables as follows:
CREATE CLUSTERED INDEX IDX_C_t1_ID ON #temp1(ID)
CREATE CLUSTERED INDEX IDX_C_t2_ID ON #temp2(ID)
CREATE CLUSTERED INDEX IDX_C_t3_ID ON #temp3(ID)
CREATE CLUSTERED INDEX IDX_C_t4_ID ON #temp4(ID)
CREATE CLUSTERED INDEX IDX_C_t5_ID ON #temp5(ID)
It'd be really helpful if you could include the number of rows and columns for all five tables.
How many records are in #temp1? Are you seeing that same number of records in your output window? If not, then your query is still returning records.
Also, did you explicitly begin a transaction somewhere in your script? Maybe it is waiting for you to commit or rollback.
Also, I'm not sure why you have parentheses around your joins. They aren't needed (and they may be producing a weird execution plan).
The running time of this query depends (among other things) on the number of rows in #temp1 and on the indexes it has (none by default).
The first step, as I see it, would be to isolate the execution plan for this specific query. You can do so by filling your data into a permanent table temp1 (rather than #temp1) and then running only the query against temp1. Seeing the execution plan for the query may suggest which indexes would help, and is a first step towards optimizing it.
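A minimal sketch of that isolation step, assuming the temp tables from the question:

-- Persist the temp tables once so the join can be analyzed on its own:
SELECT * INTO dbo.temp1 FROM #temp1;
SELECT * INTO dbo.temp2 FROM #temp2;
-- ...and likewise for #temp3, #temp4, #temp5

-- Then run just the join against the permanent copies, with
-- "Include Actual Execution Plan" enabled in SSMS:
SELECT temp1.id, temp2.first_name, temp2.last_name
FROM dbo.temp1 AS temp1
LEFT JOIN dbo.temp2 AS temp2 ON temp1.id = temp2.id;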
I'm creating a view which contains a subquery, as specified with the following SQL query, on SQL Server 2012.
CREATE VIEW [dbo].[VIEW_Detail] WITH SCHEMABINDING
AS
SELECT a.ID, a.Name1, a.Name2,
STUFF
((SELECT CAST(',' AS varchar(max)) + t.Name1
FROM dbo.Synonyms AS s
INNER JOIN dbo.Details AS t ON s.SynonymTSN = t.TSN
WHERE s.oID = a.ID FOR XML PATH('')), 1, 1, '') AS Synonym
FROM dbo.Details AS a
WHERE (a.Rank <= 100)
Since the definition contains a subquery, I'm not able to create an indexed view. Will it be faster to use a query instead of the view to retrieve data if my tables are indexed? Or will an unindexed view still perform better than using a query? The view currently contains more than 50,000 rows. What other query optimizations can I use?
PS: I don't care about performance on insert/update
A view is simply a single SELECT statement saved under a name (the view name); there is no performance benefit to using a view over an ad-hoc query.
Yes, indexed views can improve performance, but they come with a long list of limitations, because they are materialized. One upside is that other queries which do not reference the indexed view can still benefit from the indexes defined on it.
In your case you have a subquery and are also using the FOR XML clause; neither is allowed inside an indexed view.
To optimize your query, you need to look at the execution plan first and see whether the query is doing table or clustered index scans. Try adding some indexes and try to get a seek instead of a scan.
Looking at this query, I think that indexes on the TSN, ID, and Rank columns of the dbo.Details table and the SynonymTSN column of the dbo.Synonyms table could improve its performance.
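As a sketch, those suggestions translate to something like this (index names are hypothetical):

CREATE NONCLUSTERED INDEX [IX_Details_TSN] ON dbo.Details ([TSN]);
CREATE NONCLUSTERED INDEX [IX_Details_Rank] ON dbo.Details ([Rank]);
CREATE NONCLUSTERED INDEX [IX_Synonyms_SynonymTSN] ON dbo.Synonyms ([SynonymTSN]);
-- [ID] is presumably the primary key of dbo.Details and therefore already indexed.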
On a side note, 50,000 rows isn't really a big number of rows. As long as you have primary keys defined on these two tables, you should get reasonable performance with this pretty simple query.
For the DB gurus out there, I was wondering if there is any functional/performance difference between joining to the results of a SELECT statement and joining to a previously filled table variable. I'm working in SQL Server 2008 R2.
Example (TSQL):
-- Create a test table
DROP TABLE [dbo].[TestTable]
CREATE TABLE [dbo].[TestTable](
[id] [int] NOT NULL,
[value] [varchar](max) NULL
) ON [PRIMARY]
-- Populate the test table with a few rows
INSERT INTO [dbo].[TestTable]
SELECT 1123, 'test1'
INSERT INTO [dbo].[TestTable]
SELECT 2234, 'test2'
INSERT INTO [dbo].[TestTable]
SELECT 3345, 'test3'
-- Create a reference table
DROP TABLE [dbo].[TestRefTable]
CREATE TABLE [dbo].[TestRefTable](
[id] [int] NOT NULL,
[refvalue] [varchar](max) NULL
) ON [PRIMARY]
-- Populate the reference table with a few rows
INSERT INTO [dbo].[TestRefTable]
SELECT 1123, 'ref1'
INSERT INTO [dbo].[TestRefTable]
SELECT 2234, 'ref2'
-- Scenario 1: Insert matching results into its own table variable, then Join
-- Create a table variable
DECLARE #subset TABLE ([id] INT NOT NULL, [refvalue] VARCHAR(MAX))
INSERT INTO #subset
SELECT * FROM [dbo].[TestRefTable]
WHERE [dbo].[TestRefTable].[id] = 1123
SELECT t.*, s.*
FROM [dbo].[TestTable] t
JOIN #subset s
ON t.id = s.id
-- Scenario 2: Join directly to SELECT results
SELECT t.*, s.*
FROM [dbo].TestTable t
JOIN (SELECT * FROM [dbo].[TestRefTable] WHERE id = 1123) s
ON t.id = s.id
In the "real" world, the tables and table variable are pre-defined. What I'm looking at is being able to have the matched reference rows available for further operations, but I'm concerned that the extra steps will slow the query down. Are there technical reasons as to why one would be faster than the other? What sort of performance difference may be seen between the two approaches? I realize it is difficult (if not impossible) to give a definitive answer, just looking for some advice for this scenario.
The database engine has an optimizer to figure out the best way to execute a query. There is more under the hood than you probably imagine. For instance, when SQL Server is doing a join, it has a choice of at least four join algorithms:
Nested Loop
Index Lookup
Merge Join
Hash Join
(not to mention the multi-threaded versions of these.)
It is not important that you understand how each of these works. You just need to understand two things: different algorithms are best under different circumstances and SQL Server does its best to choose the best algorithm.
The choice of join algorithm is only one thing the optimizer does. It also has to figure out the ordering of the joins, the best way to aggregate results, whether a sort is needed for an order by, how to access the data (via indexes or directly), and much more.
When you break the query apart, you are making an assumption about optimization. In your case, you are making the assumption that the first best thing is to do a select on a particular table. You might be right. If so, your result with multiple queries should be about as fast as using a single query. Well, maybe not. When in a single query, SQL Server does not have to buffer all the results at once; it can stream results from one place to another. It may also be able to take advantage of parallelism in a way that splitting the query prevents.
In general, the SQL Server optimizer is pretty good, so you are best letting the optimizer do the query all in one go. There are definitely exceptions, where the optimizer may not choose the best execution path. Sometimes fixing this is as easy as being sure that statistics are up-to-date on tables. Other times, you can add optimizer hints. And other times you can restructure the query, as you have done.
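For example, a quick way to rule out stale statistics on the question's tables (WITH FULLSCAN is optional and can be expensive on large tables):

-- Refresh statistics so the optimizer plans with accurate row counts:
UPDATE STATISTICS dbo.TestTable WITH FULLSCAN;
UPDATE STATISTICS dbo.TestRefTable WITH FULLSCAN;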
For instance, one place where loading data into a local table is useful is when the table comes from a different server. The optimizer may not have full information about the size of the table to make the best decisions.
In other words, keep the query as one statement. If you need to improve it, then focus on optimization after it works. You generally won't have to spend much time on optimization, because the engine is pretty good at it.
Would this give the same result?
SELECT t.*, s.*
FROM dbo.TestTable AS t
JOIN dbo.TestRefTable AS s ON t.id = s.id AND s.id = 1123
Basically, this is a cross join of all records from TestTable and TestRefTable with id = 1123.
Joining to table variables will also result in bad cardinality estimates by the optimizer. Table variables are always assumed by the optimizer to contain only a single row; the more rows they actually contain, the worse that estimate becomes. This causes the optimizer to assume the wrong number of rows not only for the table variable itself: for operators elsewhere in the plan that join to that result, it can also produce wrong estimates of the number of executions of those operators.
Personally I think Table parameters should be used for getting data into and out of the server conveniently using client apps (C# .Net apps make good use of them), or for passing data between Stored Procs, but should not be used too much within the proc itself. The importance of getting rid of them within the Proc code itself increases with the expected number of rows to be carried by the parameter.
Sub Selects will perform better, or immediately copying into a temp table will work well. There is overhead for copying into the temp table, but again, the more rows you have the more worth it that overhead becomes because the estimates by the optimizer get worse and worse.
In general, a derived table in the query is probably going to be faster than joining to a table variable, because the derived table can make use of indexes, which are not available on table variables. However, temp tables can also have indexes created, and that might remove the potential performance difference.
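For instance, Scenario 1 from the question could be rewritten with an indexed temp table; a sketch, reusing the question's tables:

-- Temp table instead of a table variable: real statistics, indexable
CREATE TABLE #subset ([id] INT NOT NULL, [refvalue] VARCHAR(MAX));
CREATE CLUSTERED INDEX [IX_subset_id] ON #subset ([id]);

INSERT INTO #subset ([id], [refvalue])
SELECT [id], [refvalue]
FROM [dbo].[TestRefTable]
WHERE [id] = 1123;

SELECT t.*, s.*
FROM [dbo].[TestTable] AS t
JOIN #subset AS s ON t.id = s.id;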
Also, if the number of table variable records is expected to be small, then indexes won't make a great deal of difference anyway, so there would be little or no difference.
As always, you need to test on your own system, since the number of records, the table design, and the index design have a great deal to do with what works best.
I'd expect the direct Table join to be faster than the Table to TableVariable, and use less resources.
So I have a legacy database with table structure like this (simplified)
Create Table [Transaction]
(
TransactionId INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
ReplacesTransactionId INT
..
..
)
So I want to create an indexed view such that the following example would return only the second row (because it replaces the first one):
Insert Into [Transaction] (TransactionId, ReplacesTransactionId, ..) Values (1, 0, ..)
Insert Into [Transaction] (TransactionId, ReplacesTransactionId, ..) Values (2, 1, ..)
There are a number of ways of writing this query, but I would like to create an indexed view, which means I cannot use subqueries, LEFT JOINs, or EXCEPT. An example query (using LEFT JOIN) could be:
SELECT trans1.* FROM [Transaction] trans1
LEFT JOIN [Transaction] trans2 ON trans1.TransactionId = trans2.ReplacesTransactionId
WHERE trans2.TransactionId IS NULL
Clearly I'm stuck with the structure of the database and am looking to improve performance of the application using the data.
Any suggestions?
What you have here is essentially a hierarchical dataset in which you want to pre-traverse the hierarchy and store the result in an indexed view, but AFAIK, indexed views do not support that.
On the other hand, this may not be the only angle of attack to your larger goal of improving performance. First, the most obvious question: can we assume that TransactionId is clustered and ReplacesTransactionId is indexed? If not, those would be my first two changes. If the indexing is already good, then the next step would be to look at the query plan of your left join and see if anything leaps out.
In general terms (not having seen the query plan): one possible approach could be to try to convert your SELECT statement into a "covered query" (see https://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/). This would most likely entail some combination of the following (a sketch follows the list):
Reducing the number of columns in the SELECT statement (replacing SELECT *)
Adding a few "included" columns to the index on ReplacesTransactionId (either in SSMS or using the INCLUDE clause of CREATE INDEX).
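For example (a sketch; the included column names are hypothetical, standing in for whatever SELECT * actually needs):

-- Seek on ReplacesTransactionId; carry the selected columns in the
-- index leaf so the base table is never touched.
CREATE NONCLUSTERED INDEX [IX_Transaction_ReplacesTransactionId]
ON dbo.[Transaction] ([ReplacesTransactionId])
INCLUDE ([SomeColumn1], [SomeColumn2]); -- hypothetical included columns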
Good luck!
I have a contracts table that is large and that we have many stored procedures that query for contracts with a status of Open. Less than 10% of the contracts are open and this number is shrinking as the DB grows. I thought I could create an Indexed view of the open contracts in order to speed up some of our queries. The problem is that the status is not on the contract table and I need a subquery to retrieve the data I want. (SQL Server then does a clustered index scan on the whole table in the queries I have looked at)
Here is the condensed version of the view (I removed the 30 other columns from the contract table)
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [dbo].[vw_OpenContractsIndexed]
WITH SCHEMABINDING
AS
SELECT c.ContractID
FROM dbo.NMPT_Contract AS c
INNER JOIN dbo.NMPT_ContractStatus AS cs
    ON c.ContractID = cs.ContractID
    AND cs.ContractStatusCreated = (SELECT MAX(cs2.ContractStatusCreated)
                                    FROM dbo.NMPT_ContractStatus AS cs2
                                    WHERE cs2.ContractID = c.ContractID)
INNER JOIN dbo.CMSS_Status AS s
    ON cs.StatusID = s.StatusID
WHERE s.StatusCode = 'OPN'
If I try to create an index on the view (unique clustered on contractid) I get the following
Creation Failed for Index
It contains one or more disallowed constructs. (Microsoft SQL Server, Error 1936)
From what I can gather, it is the MAX in the subquery that is the problem?
Other than putting the status on the contracts table (where I personally think it belongs), are there any suggestions for optimising this situation? Failing that, will other versions of SQL Server allow this indexed view?
From TechNet regarding Indexed Views in SS 2000:
There are several restrictions on the syntax of the view definition.
The view definition must not contain the following:
COUNT(*)
ROWSET function
Derived table
self-join
DISTINCT
STDEV, VARIANCE, AVG
Float*, text, ntext, image columns
Subquery
full-text predicates (CONTAINS, FREETEXT)
SUM on nullable expression
MIN, MAX
TOP
OUTER join
UNION
You're using MAX, and a subquery, both of which are not allowed.
To get advice on how to get around this, you need to share some data and what you are trying to do.
It is not a "view" solution and will require more work to accomplish, but you can create a denormalized table which will hold the result of the view. This way, all reads for open contracts can go against that table. This will be the fastest option, but it requires maintaining the new table.
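A minimal sketch of that approach (table and column names are hypothetical; the refresh could run from a trigger or a scheduled job):

-- Denormalized copy of the view's result, readable without the joins
CREATE TABLE dbo.OpenContract
(
    ContractID INT NOT NULL PRIMARY KEY
);

-- Periodic (or trigger-driven) refresh from the existing view
BEGIN TRANSACTION;
    DELETE FROM dbo.OpenContract;
    INSERT INTO dbo.OpenContract (ContractID)
    SELECT ContractID FROM dbo.vw_OpenContractsIndexed;
COMMIT TRANSACTION;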
Creating an indexed view is quite a difficult task, as it has many restrictions, and one of them relates to self-joins: you have a self-join here. (Referencing other views is not allowed either.)
Another point for these kinds of master tables: if you are using just a single status, like 'OPEN' in your case, I would suggest that instead of joining to the master table with the status code, you simply declare a statusid variable, store the value for the OPEN status in it, and use that value in the final query. This avoids the extra join with the master table.
I would suggest that you store the data for the open status in a temp table before joining with the contract table in the final statement. You can have an index on statusid, customerid, and contractcreationdate, then force this index to get the ContractId into a temp table like:
-- only contracts whose latest status row carries @statusid
select cs.contractid into #temp
from NMPT_ContractStatus cs
where cs.statusid = @statusid
  and cs.datefield = (select max(datefield) from NMPT_ContractStatus
                      where contractid = cs.contractid)
Now join this temp table with the Contract table.
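For example (a sketch):

SELECT c.*
FROM dbo.NMPT_Contract AS c
INNER JOIN #temp AS t ON c.ContractID = t.contractid;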
But before creating any kind of indexes, make sure that their overhead is much smaller than the benefit you are getting.