I have a large contracts table, and many of our stored procedures query for contracts with a status of Open. Fewer than 10% of the contracts are open, and that fraction shrinks as the DB grows. I thought I could create an indexed view of the open contracts in order to speed up some of our queries. The problem is that the status is not on the contract table, so I need a subquery to retrieve the data I want. (SQL Server then does a clustered index scan of the whole table in the queries I have looked at.)
Here is a condensed version of the view (I removed the 30 other columns from the contract table):
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE VIEW [dbo].[vw_OpenContractsIndexed]
WITH SCHEMABINDING
AS
SELECT c.ContractID
FROM dbo.NMPT_Contract AS c
INNER JOIN dbo.NMPT_ContractStatus AS cs
    ON c.ContractID = cs.ContractID
    AND cs.ContractStatusCreated = (SELECT MAX(cs2.ContractStatusCreated)
                                    FROM dbo.NMPT_ContractStatus AS cs2
                                    WHERE cs2.ContractID = c.ContractID)
INNER JOIN dbo.CMSS_Status AS s
    ON cs.StatusID = s.StatusID
WHERE s.StatusCode = 'OPN'
If I try to create an index on the view (unique clustered on ContractID), I get the following:
Creation Failed for Index
It contains one or more disallowed constructs. (Microsoft SQL Server, Error 1936)
From what I can gather, it is the MAX in the subquery that is the problem?
Other than putting the status on the contracts table (where I personally think it belongs), are there any suggestions for optimising this situation? Failing that, will other versions of SQL Server allow this indexed view?
From TechNet, regarding indexed views in SQL Server 2000:
There are several restrictions on the syntax of the view definition.
The view definition must not contain the following:
COUNT(*)
ROWSET function
Derived table
self-join
DISTINCT
STDEV, VARIANCE, AVG
Float*, text, ntext, image columns
Subquery
full-text predicates (CONTAINS, FREETEXT)
SUM on nullable expression
MIN, MAX
TOP
OUTER join
UNION
You're using MAX and a subquery, neither of which is allowed.
To get advice on how to work around this, you need to share some data and explain what you are trying to do.
It is not a "View" solution and will require more work to accomplish, but you can create denormalized table which will hold the result of the view. This way, all reads for Open contracts can go against that table. This will be the fastest, but will require maintenance of the new table.
Creating an indexed view is quite a difficult task, as it has so many restrictions, and one of them concerns self-joins; you have a self-join here (NMPT_ContractStatus appears twice). Referencing other views etc. is not allowed either.
Another thing: for these kinds of master tables, if you are using just a single status ('OPN' in your case), I would suggest that instead of joining to the master table with the status code, you declare a status-id variable, store the value for the open status in it, and then use that value in the final query. This avoids the extra join to the master table.
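As a sketch of that idea (using the CMSS_Status table from the question):
-- Look up the StatusID for 'OPN' once, then filter on @statusid directly,
-- avoiding the join to the CMSS_Status master table in the main query.
DECLARE @statusid INT;
SELECT @statusid = StatusID
FROM dbo.CMSS_Status
WHERE StatusCode = 'OPN';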
I would suggest that you store the ContractIDs for the open status in a temp table before joining to the contract table in the final statement. You can have an index on StatusID, ContractID and ContractStatusCreated, then use (or force) that index to get the ContractIDs into a temp table, along these lines:
SELECT cs.ContractID INTO #temp
FROM dbo.NMPT_ContractStatus AS cs
WHERE cs.StatusID = @statusid
  AND cs.ContractStatusCreated = (SELECT MAX(cs2.ContractStatusCreated)
      FROM dbo.NMPT_ContractStatus AS cs2 WHERE cs2.ContractID = cs.ContractID);
Now join this temp table with the Contract table.
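For example, continuing from the snippet above (and assuming #temp from it):
SELECT c.*
FROM dbo.NMPT_Contract AS c
INNER JOIN #temp AS t ON t.ContractID = c.ContractID;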
But before creating any kind of indexes, make sure that their overhead is much less than the benefit you are getting.
Related
Hello, I created a view to run a subquery (a select from two tables).
This is the SQL statement:
CREATE OR REPLACE VIEW EMPLOYEER_VIEW
AS
SELECT A.ID,A.FIRST_NAME||' '||A.LAST_NAME AS NAME,B.COMPANY_NAME
FROM EMPLOY A, COMPANY B
WHERE A.COMPANY_ID=B.COMPANY_ID
AND A.DEPARTEMENT !='DEP_004'
ORDER BY A.ID;
If I select data from EMPLOYEER_VIEW, the average execution time is 135.953 s.
Table EMPLOY contains 124,600,329 rows.
Table COMPANY contains 609 rows.
My question is: how can I make the execution faster?
I created two indexes:
emply_index (ID,COMPANY_ID,DEPARTEMENT)
and company_index(COMPANY_ID)
Can you help me make the selects run faster (by creating another index or changing the join)?
PS: I can't create a materialized view in this database.
Thanks in advance for your help.
You have a lot of things to do.
If you must work with a view and cannot create a scheduled job to insert the data into a table, I will remove my answer.
Views do not have the scope to support hundreds of millions of rows; they are meant for a few million.
Indexes must be maintained as data is inserted: if you insert data with the indexes in place, the process can be 100 times slower. (You can drop and recreate them, or update them afterwards.)
Create partitions on the company column.
If you have a lot of IDs, use RANGE partitioning.
If you have around 100 IDs, use LIST partitioning. (See the sketch below.)
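A minimal sketch of the RANGE variant (Oracle syntax; the column types, partition boundaries, and the assumption that the large EMPLOY table is the partitioning target are all mine):
CREATE TABLE EMPLOY_PARTITIONED (
    ID          NUMBER,
    FIRST_NAME  VARCHAR2(100),
    LAST_NAME   VARCHAR2(100),
    COMPANY_ID  NUMBER,
    DEPARTEMENT VARCHAR2(20)
)
PARTITION BY RANGE (COMPANY_ID) (
    PARTITION p_low  VALUES LESS THAN (200),
    PARTITION p_mid  VALUES LESS THAN (400),
    PARTITION p_high VALUES LESS THAN (MAXVALUE)
);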
You do not need the indexes, because the JOIN clause does not optimize through them; indexes are suited to strict WHERE clauses.
We had a project with 433,000,000 rows, and the only way to make it work was playing with partitions.
I'm creating a view which contains a subquery, as specified with the following SQL query, on SQL Server 2012.
CREATE VIEW [dbo].[VIEW_Detail] WITH SCHEMABINDING
AS
SELECT a.ID, a.Name1, a.Name2,
       STUFF((SELECT CAST(',' AS varchar(max)) + t.Name1
              FROM dbo.Synonyms AS s
              INNER JOIN dbo.Details AS t ON s.SynonymTSN = t.TSN
              WHERE s.oID = a.ID
              FOR XML PATH('')), 1, 1, '') AS Synonym
FROM dbo.Details AS a
WHERE a.Rank <= 100
Since the definition contains a subquery, I'm not able to create an indexed view. Will it be faster to use a query instead of the view to retrieve data if my tables are indexed? Or will an unindexed view still perform better than a query? The view currently contains more than 50,000 rows. What other query optimizations can I use?
PS: I don't care about performance on insert/update
A view is simply a single SELECT statement saved under a name (the view name); there is no performance benefit to using a view over an ad-hoc query.
Yes, indexed views can increase performance, but they come with a long list of limitations, since they are materialized. Also, other queries which do not reference the indexed view directly can still benefit from the indexes defined on it.
In your case you have a sub-query and are also using the FOR XML clause; neither is allowed inside an indexed view.
To optimize your query, look at the execution plan first and see whether the query is doing table or clustered index scans. Try adding some indexes, aiming for a seek instead of a scan.
Looking at this query, I think that indexes on the TSN, ID and Rank columns of the dbo.Details table and on the SynonymTSN column of the dbo.Synonyms table could improve its performance.
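A sketch of what those indexes might look like (the index names and INCLUDE lists are mine):
CREATE NONCLUSTERED INDEX IX_Details_Rank
    ON dbo.Details ([Rank]) INCLUDE (ID, Name1, Name2);
CREATE NONCLUSTERED INDEX IX_Details_TSN
    ON dbo.Details (TSN) INCLUDE (Name1);
CREATE NONCLUSTERED INDEX IX_Synonyms_oID
    ON dbo.Synonyms (oID) INCLUDE (SynonymTSN);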
On a side note, 50,000 rows isn't really a big number; as long as you have primary keys defined on these two tables, you should get reasonable performance from this pretty simple query.
So I have a legacy database with a table structure like this (simplified):
Create Table [Transaction]
(
    TransactionId INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    ReplacesTransactionId INT
    ..
    ..
)
So I want to create an indexed view such that the following example would return only the second row (because it replaces the first one):
Insert Into Transaction (TransactionId, ReplacesTransactionId, ..) Values (1,0 ..)
Insert Into Transaction (TransactionId, ReplacesTransactionId, ..) Values (2,1 ..)
There are a number of ways of writing this query, but I would like to create an indexed view, which means I cannot use subqueries, LEFT JOINs or EXCEPT. An example query (using a LEFT JOIN) could be:
SELECT trans1.* FROM [Transaction] trans1
LEFT JOIN [Transaction] trans2 ON trans1.TransactionId = trans2.ReplacesTransactionId
WHERE trans2.TransactionId IS NULL
Clearly I'm stuck with the structure of the database and am looking to improve performance of the application using the data.
Any suggestions?
What you have here is essentially a hierarchical dataset in which you want to pre-traverse the hierarchy and store the result in an indexed view, but AFAIK, indexed views do not support that.
On the other hand, this may not be the only angle of attack to your larger goal of improving performance. First, the most obvious question: can we assume that TransactionId is clustered and ReplacesTransactionId is indexed? If not, those would be my first two changes. If the indexing is already good, then the next step would be to look at the query plan of your left join and see if anything leaps out.
In general terms (not having seen the query plan): one possible approach could be to try and convert your SELECT statement to a "covered query" (see https://www.simple-talk.com/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/). This would most likely entail some combination of:
Reducing the number of columns in the SELECT statement (replacing SELECT *)
Adding a few "included" columns to the index on ReplacesTransactionId (either in SSMS or using the INCLUDES clause of CREATE INDEX).
Good luck!
This is my view:
CREATE VIEW [Products].[VProductFull] AS
SELECT
[Pr].[Id],
[Pr].[Title],
[Pr].[IsScanAllowed],
[Pr].[Category_Id],
[Cat].[Title] AS [Category],
[Cat].[MajorCategory_Id],
[Mc].[Title] AS [MajorCategory]
FROM [Products].[Product] AS [Pr]
INNER JOIN [Products].[Category] AS [Cat] ON [Pr].[Category_Id] = [Cat].[Id]
INNER JOIN [Products].[MajorCategory] AS [Mc] ON [Cat].[MajorCategory_Id] = [Mc].[Id];
GO
And I need an SP to get VProductFull rows by MajorCategory_Id. There are two candidate SPs: the first one repeats the joins from the CREATE VIEW, and the second one uses the view itself:
-- SP#1
CREATE PROCEDURE [Products].[GetFullProductByMajorCategory](
@MajorCategoryid [bigint]
)
AS
BEGIN
BEGIN TRANSACTION [FullProductByMajor]
SELECT
[Pr].[Id],
[Pr].[Title],
[Pr].[IsScanAllowed],
[Pr].[Category_Id],
[Cat].[Title] AS [Category],
[Cat].[MajorCategory_Id],
[Mc].[Title] AS [MajorCategory]
FROM [Products].[Product] AS [Pr]
INNER JOIN [Products].[Category] AS [Cat] ON [Pr].[Category_Id] = [Cat].[Id]
INNER JOIN [Products].[MajorCategory] AS [Mc] ON [Cat].[MajorCategory_Id] = [Mc].[Id]
WHERE [Mc].[Id] = @MajorCategoryid;
COMMIT TRANSACTION [FullProductByMajor]
END
GO
And
-- SP#2
CREATE PROCEDURE [Products].[GetFullProductByMajorCategory](
@MajorCategoryid [bigint]
)
AS
BEGIN
BEGIN TRANSACTION [FullProductByMajor]
SELECT
[VPF].[Id],
[VPF].[Title],
[VPF].[IsScanAllowed],
[VPF].[Category_Id],
[VPF].[Category],
[VPF].[MajorCategory_Id],
[VPF].[MajorCategory]
FROM [Products].[VProductFull] AS [VPF]
WHERE [VPF].[MajorCategory_Id] = @MajorCategoryid;
COMMIT TRANSACTION [FullProductByMajor]
END
GO
Which of the above SPs returns faster and has better performance, and why? And is there another way to get VProductFull by MajorCategory_Id that is faster and performs better than the above SPs?
Both have the same execution times and there will be no difference between them. At runtime the view will just be expanded to its underlying query. You can see this for yourself by looking at the query plans for both versions.
To optimize, you need to make VProductFull an indexed view, meaning a materialized view, and then use the NOEXPAND hint when selecting from it.
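A sketch of that approach (note the view must first be created WITH SCHEMABINDING before it can be indexed; the index name is mine):
-- Materialize the view:
CREATE UNIQUE CLUSTERED INDEX IX_VProductFull_Id
    ON [Products].[VProductFull] ([Id]);

-- Read from the materialized data directly:
SELECT [Id], [Title], [IsScanAllowed], [Category_Id],
       [Category], [MajorCategory_Id], [MajorCategory]
FROM [Products].[VProductFull] WITH (NOEXPAND)
WHERE [MajorCategory_Id] = @MajorCategoryid;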
If you do not want to use an indexed view, then use a non-indexed view but make sure you create two non-clustered indexes on the two tables: on [Pr].[Category_Id] and on [Cat].[MajorCategory_Id].
You need these indexes in order to avoid clustered index scans, and use the much faster index seek plan operators.
For the first index you should add the following columns to the index (as included columns, not key columns): [Title], [IsScanAllowed], [Id]. For the second index you should include the column [Cat].[Title]. Sketched out below:
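A sketch of those two indexes (the index names are mine):
CREATE NONCLUSTERED INDEX IX_Product_Category_Id
    ON [Products].[Product] ([Category_Id])
    INCLUDE ([Title], [IsScanAllowed], [Id]);  -- if [Id] is the clustered key it is carried anyway
CREATE NONCLUSTERED INDEX IX_Category_MajorCategory_Id
    ON [Products].[Category] ([MajorCategory_Id])
    INCLUDE ([Title]);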
I think you can experiment with both and then compare the query plans as well as execution times (with SET STATISTICS TIME ON).
My bet is that the indexed view will be faster but, if you have large base tables, the indexed view will have an impact on inserts/updates in the base tables. So you may want a trade-off in order to get balanced performance in all situations.
For reference, and for whoever will read this question, please post the current execution plans and times and the ones after you apply each modification. That's if this is not too much trouble.
Why are you using transactions in SELECT statements?
Have you considered using SQL Profiler? With it you can check the reads and duration for a particular query.
Do you have indexes on the columns used in the WHERE clause?
The above-mentioned view should be an indexed view.
Try the query below directly in your stored proc and compare the reads and duration in SQL Profiler:
Select
    K.[MajorCategory],
    [Pr].[Id],
    [Pr].[Title],
    [Pr].[IsScanAllowed],
    [Pr].[Category_Id],
    [Cat].[Title] AS [Category],
    [Cat].[MajorCategory_Id]
From
(
    Select [Title] AS [MajorCategory], [Id]
    From [Products].[MajorCategory]
    WHERE [Id] = @MajorCategoryid
) K
INNER JOIN [Products].[Category] AS [Cat] ON [Cat].[MajorCategory_Id] = K.[Id]
INNER JOIN [Products].[Product] AS [Pr] ON [Pr].[Category_Id] = [Cat].[Id]
In this suggestion, as per my understanding, the scan will not run over all matching records of the Category and Product tables (as it does with the query in the view); instead, everything hangs off the single matching record of the MajorCategory table.
I have a table named Projects that has the following relationships:
has many Contributions
has many Payments
In my result set, I need the following aggregate values:
Number of unique contributors (DonorID on the Contribution table)
Total contributed (SUM of Amount on Contribution table)
Total paid (SUM of PaymentAmount on Payment table)
Because there are so many aggregate functions and multiple joins, it gets messy to use standard aggregate functions in the GROUP BY clause. I also need the ability to sort and filter on these fields. So I've come up with two options:
Using subqueries:
SELECT Project.ID AS PROJECT_ID,
(SELECT SUM(PaymentAmount) FROM Payment WHERE ProjectID = Project.ID) AS TotalPaidBack,
(SELECT COUNT(DISTINCT DonorID) FROM Contribution WHERE RecipientID = Project.ID) AS ContributorCount,
(SELECT SUM(Amount) FROM Contribution WHERE RecipientID = Project.ID) AS TotalReceived
FROM Project;
Using a temporary table:
DROP TABLE IF EXISTS Project_Temp;
CREATE TEMPORARY TABLE Project_Temp (project_id INT NOT NULL, total_payments INT, total_donors INT, total_received INT, PRIMARY KEY(project_id)) ENGINE=MEMORY;
INSERT INTO Project_Temp (project_id,total_payments)
SELECT `Project`.ID, IFNULL(SUM(PaymentAmount),0) FROM `Project` LEFT JOIN `Payment` ON ProjectID = `Project`.ID GROUP BY 1;
INSERT INTO Project_Temp (project_id,total_donors,total_received)
SELECT `Project`.ID, IFNULL(COUNT(DISTINCT DonorID),0), IFNULL(SUM(Amount),0) FROM `Project` LEFT JOIN `Contribution` ON RecipientID = `Project`.ID GROUP BY 1
ON DUPLICATE KEY UPDATE total_donors = VALUES(total_donors), total_received = VALUES(total_received);
SELECT * FROM Project_Temp;
Tests for both are pretty comparable, in the 0.7 to 0.8 second range with 1,000 rows. But I'm really concerned about scalability, and I don't want to have to re-engineer everything as my tables grow. What's the best approach?
Knowing the timing for each 1K rows is good, but the real question is how they'll be used.
Are you planning to send all these back to a UI? Google doles out results 25 per page; maybe you should, too.
Are you planning to do calculations in the middle tier? Maybe you can do those calculations on the database and save yourself bringing all those bytes across the wire.
My point is that you may never need to work with 1,000 or one million rows if you think carefully about what you do with them.
You can use EXPLAIN to see what the difference between the two queries is.
I would go with the first approach. You are allowing the RDBMS to do its job, rather than trying to do its job for it.
By creating a temp table, you will always build the full table for each query. If you only want data for one project, you still end up building the full table (unless you restrict each INSERT statement accordingly). Sure, you can code around it, but it's already a fair amount of code and complexity for a small performance gain.
With a SELECT, the db can fetch the appropriate amount of data, optimizing the whole query based on context. If other users have queried the same data, it may even be cached (the query, and possibly the data, depending on your db). If performance is truly a concern, you might consider using indexed/materialized views, or generating a table via an INSERT/UPDATE/DELETE trigger. Scaling out, you can use server clusters and partitioned views, something that I believe will be difficult if you are creating temporary tables.
EDIT: the above was written without any specific RDBMS in mind, although the OP has added that MySQL is the target db.
There is a third option, which is derived tables:
Select Project.ID AS PROJECT_ID
    , Payments.Total As TotalPaidBack
    , Coalesce(ContributionStats.ContributorCount, 0) As ContributorCount
    , ContributionStats.Total As TotalReceived
From Project
Left Join (
    Select C1.RecipientID, Sum(C1.Amount) As Total, Count(Distinct C1.DonorID) As ContributorCount
    From Contribution As C1
    Group By C1.RecipientID
    ) As ContributionStats
    On ContributionStats.RecipientID = Project.ID
Left Join (
    Select P1.ProjectID, Sum(P1.PaymentAmount) As Total
    From Payment As P1
    Group By P1.ProjectID
    ) As Payments
    On Payments.ProjectID = Project.ID
I'm not sure if it will perform better, but you might give it a shot.
A few thoughts:
The derived table idea would be good on other platforms, but MySQL has the same issue with derived tables that it does with views: they aren't indexed. That means that MySQL will execute the full content of the derived table before applying the WHERE clause, which doesn't scale at all.
Option 1 is good for being compact, but syntax might get tricky when you want to start putting the derived expressions in the WHERE clause.
The suggestion of materialized views is a good one, but MySQL unfortunately doesn't support them. I like the idea of using triggers: you could turn that temporary table into a real table that persists, then use INSERT/UPDATE/DELETE triggers on the Payment and Contribution tables to keep the project stats table up to date. For example:
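A hedged sketch of the trigger approach (table, column and trigger names follow Query #2 above; only the Payment INSERT path is shown, and the DECIMAL type is my assumption):
-- Persistent stats table, maintained incrementally by triggers.
CREATE TABLE Project_Stats (
    project_id     INT NOT NULL PRIMARY KEY,
    total_payments DECIMAL(12,2) NOT NULL DEFAULT 0
);

DELIMITER //
CREATE TRIGGER payment_after_insert
AFTER INSERT ON Payment
FOR EACH ROW
BEGIN
    -- Add each new payment to the project's running total.
    INSERT INTO Project_Stats (project_id, total_payments)
    VALUES (NEW.ProjectID, NEW.PaymentAmount)
    ON DUPLICATE KEY UPDATE
        total_payments = total_payments + NEW.PaymentAmount;
END//
DELIMITER ;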
Finally, if you don't want to mess with triggers, and if you aren't too concerned with freshness, you can always keep the separate stats table and update it offline, with a cron job that runs every few minutes and does the work you specified in Query #2 above, except against the real table. Depending on the nuances of your application, this slight delay in updating the stats may or may not be acceptable to your users.