I have a table in my DB which contains 5 million records:
CREATE TABLE [dbo].[PurchaseFact](
[Branch] [int] NOT NULL,
[ProdAnal] [varchar](30) NULL,
[Account] [varchar](12) NULL,
[Partno] [varchar](24) NULL,
[DteGRN] [date] NULL,
[DteAct] [date] NULL,
[DteExpect] [date] NULL,
[OrderNo] [bigint] NULL,
[GRNNO] [varchar](75) NULL,
[SuppAdv] [varchar](75) NULL,
[Supplier] [varchar](12) NULL,
[OrdType] [varchar](4) NULL,
[UnitStock] [varchar](4) NULL,
[OrderQty] [float] NULL,
[RecdQty] [float] NULL,
[Batch] [varchar](100) NULL,
[CostPr] [float] NULL,
[Reason] [varchar](2) NULL,
[TotalCost] [float] NULL,
[Magic] [bigint] IDENTITY(1,1) NOT NULL,
PRIMARY KEY CLUSTERED
(
[Magic] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
As you can see from the above, a CLUSTERED INDEX is used on the Magic column, which is unique.
Data retrieval time for the following SELECT statement is well over 8 minutes, which causes reporting issues:
SELECT Branch,
Supplier,
ProdAnal,
DteGRN AS Date,
PartNo AS Partno,
OrderNo,
OrderQty,
TotalCost,
CostPr
FROM dbo.PurchaseFact src
WHERE YEAR(DteGRN) = 2016
Excluding the WHERE clause also doesn't make the query run any faster.
Together with the CLUSTERED index, I have also tried adding a UNIQUE index in the hope that it would run faster, but to no avail:
CREATE UNIQUE INDEX Unique_Index ON dbo.PurchaseFact ([Branch], [Supplier], [Magic])
INCLUDE ([ProdAnal], [Account], [Partno], [DteAct], [DteExpect], [OrderNo], [GRNNO],
[SuppAdv], [OrdType], [UnitStock])
Is there any way I can optimize performance time on this table or should I resort to archiving old data?
Any advice would be greatly appreciated.
This is your WHERE clause:
WHERE YEAR(DteGRN) = 2016
If the table has 5 million rows, then this is going to return a lot of data, assuming any reasonable distribution of dates. The volume of data is probably responsible for the length of time for the query.
One thing you can do is to rephrase the WHERE and then put an index on the appropriate column:
WHERE DteGRN >= '2016-01-01' and DteGRN < '2017-01-01'
This can then take advantage of an index on PurchaseFact(DteGRN). However, given the likely number of rows being returned, the index probably will not help very much.
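For reference, a minimal sketch of that index (the name is my own):
CREATE INDEX IX_PurchaseFact_DteGRN ON dbo.PurchaseFact (DteGRN);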
The bigger question is why your reporting application is bringing back all the rows from 2016, rather than summarizing them inside the database. I suspect you have an architecture issue with the reporting application.
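For example, if the report only needs totals per branch and supplier, something along these lines (an assumed aggregation, not from the original post) pushes the work into the database and keeps the row volume down:
SELECT Branch,
       Supplier,
       SUM(TotalCost) AS TotalCost,
       SUM(OrderQty) AS OrderQty
FROM dbo.PurchaseFact
WHERE DteGRN >= '2016-01-01' AND DteGRN < '2017-01-01'
GROUP BY Branch, Supplier;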
To help improve performance further (if you can live with the UPDATE overhead), create a COVERING INDEX that INCLUDES only the columns in the SELECT part of the query.
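For the query above, such a covering index might look like this (a sketch only; the name is made up, and the key column assumes the rewritten date-range WHERE clause):
CREATE NONCLUSTERED INDEX IX_PurchaseFact_DteGRN_Covering
ON dbo.PurchaseFact (DteGRN)
INCLUDE (Branch, Supplier, ProdAnal, Partno, OrderNo, OrderQty, TotalCost, CostPr);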
I have a table in SQL Server which has millions of records.
I was trying to do a select by passing id in the where condition, like this:
select id,processid value
from table1
where processid= 5
It returns around 1 million records and takes around 25 minutes to execute.
There is one index on the table. Do I need to create a separate non-clustered index?
Please see my table script:
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [Schema1].[Table1](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[ProcessId] [bigint] NOT NULL,
[Amount2] [decimal](21, 6) NULL,
[Amount1] [decimal](21, 6) NULL,
[Amount3] [decimal](21, 6) NULL,
[Amount4] [decimal](21, 6) NULL,
[CreatedById] [int] NOT NULL,
[CreatedDate] [datetime] NOT NULL,
[UpdatedById] [int] NULL,
[UpdatedDate] [datetime] NULL,
[IsActive] [bit] NOT NULL,
[IsDeleted] [bit] NOT NULL,
CONSTRAINT [PK_Schema1_Table1] PRIMARY KEY NONCLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [Table1_INDEX_FG]
) ON [Allocation_DATA_FG]
GO
ALTER TABLE [Schema1].[Table1] WITH CHECK ADD CONSTRAINT [CHK_Table1ComputeNode] CHECK ((([ProcessId]%(3)+(1))=(2)))
GO
ALTER TABLE [Schema1].[Table1] CHECK CONSTRAINT [CHK_Table1ComputeNode]
GO
I have to improve the performance of a stored procedure that has lots of joins with this table, but this select by itself is taking too much time.
Please give suggestions to improve the performance.
Optimal variant:
Since Id is an ever-increasing identity field, change the primary key from non-clustered to clustered. This will also prevent fragmentation.
Add a non-clustered index on ProcessId. It will be covering for your query (the clustered key Id travels with every non-clustered index), which makes it ideal in terms of SELECT performance.
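A sketch of both steps (the non-clustered index name is my own; rebuilding the primary key as clustered is an expensive one-time operation on a table this size):
-- 1. Recreate the primary key as clustered
ALTER TABLE [Schema1].[Table1] DROP CONSTRAINT [PK_Schema1_Table1];
ALTER TABLE [Schema1].[Table1]
    ADD CONSTRAINT [PK_Schema1_Table1] PRIMARY KEY CLUSTERED ([Id]);
-- 2. Non-clustered index on ProcessId; Id is carried in automatically
--    once the table has a clustered key
CREATE NONCLUSTERED INDEX [IX_Table1_ProcessId]
    ON [Schema1].[Table1] ([ProcessId]);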
Less optimal:
In case the primary key cannot be adjusted:
CREATE INDEX ProcessID ON table1 (ProcessID) INCLUDE (Id)
First, if you just need to extract a list of IDs for a certain ProcessId and you do not want to create the clustered index on Id, you can create a clustered index on the ProcessId column. If you do not want to create a clustered index at all, create an index on ProcessId with Id included:
CREATE INDEX IDX_Table1 ON [Table1](processid) INCLUDE ([Id])
Second, your query can't return millions of records, because according to the CHECK CONSTRAINT [CHK_Table1ComputeNode], ProcessId can only have the values 1, 4, 7, ...
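You can verify this by evaluating the constraint expression for the ProcessId used in the query:
SELECT 5 % 3 + 1;  -- returns 3, but the constraint requires ProcessId % 3 + 1 = 2,
                   -- so no row with ProcessId = 5 can exist in this table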
I have a table in a SQL Server 2012 instance, like so
CREATE TABLE [dbo].[Test](
[SampleDateTime] [datetime] NULL,
[Unit ID] [nvarchar](4) NULL,
[WS Avg 2min] [float] NULL,
[WD Avg 2min] [float] NULL,
[WGS 10min] [float] NULL,
[WGD 10min] [float] NULL,
[Air Temp] [float] NULL,
[Rel Humidity] [float] NULL,
[Dew Point] [float] NULL,
[Pyranometer] [float] NULL,
[Quantum] [float] NULL,
[Air Pressure] [float] NULL,
[Snow Level] [float] NULL,
[Rainfall] [float] NULL,
[PW Current] [varchar](10) NULL,
[Visibility] [float] NULL,
[CBase 1] [float] NULL,
[CBase 2] [float] NULL,
[CBase 3] [float] NULL,
[Vert Vis] [float] NULL
) ON [PRIMARY]
connected to MS Access (2010) via an ODBC linked table (SQL Server Native Client 11.0)
When I open the table, I see all of the data.
However, when I try a simple query:
SELECT dbo_Test.* FROM dbo_Test
WHERE ( (dbo_Test.[Unit ID])="BASE") ;
I am still getting all the rows, not just the rows where [Unit ID] is "BASE".
The same query in SQL Server Mgt. Studio works just fine with only the expected results returned.
I also notice that when sorting the linked table by [Unit ID], it does not sort properly. There will be rows with data just not sorted like I would expect. (See image below, sorted Ascending by [Unit ID])
Is there a way to get this linked table to behave properly?
Your table seems to be lacking a primary key. This can cause all sorts of issues (e.g. you won't be able to write to that linked table). Although I would be surprised if the effect you see is caused by that.
But try adding an IDENTITY column as primary key, and relink the table from Access.
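A minimal sketch of that (the column and constraint names are my own):
ALTER TABLE [dbo].[Test] ADD [Id] int IDENTITY(1,1) NOT NULL;
ALTER TABLE [dbo].[Test] ADD CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED ([Id]);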
// Edit: too late :)
It turns out the issue was that the table did not have a primary key established. Due to our data, I could not use SampleDateTime as a primary key, so I created a unique key on SampleDateTime + Unit ID instead:
ALTER TABLE [dbo].[Test] ADD CONSTRAINT [KEY_AWAData_DateandUnitID] UNIQUE NONCLUSTERED
(
[SampleDateTime] ASC,
[Unit ID] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now MS Access is happy to execute queries and sort as expected.
I have a stored procedure which executes a simple SELECT. Any time I run it manually, it runs in under a second. But in production (a SQL Azure S2 database) it runs inside a scheduled task every 12 hours, so I think it is reasonable to expect it to run "cold" every time, with no cached data. And the performance is very unpredictable: sometimes it takes 5 seconds, sometimes 30, and sometimes even 100.
The SELECT is optimized as far as I know how: I created a filtered index including all the columns returned from the SELECT, so the only operation in the execution plan is an index scan. There is a huge difference between estimated and actual rows.
But overall the query seems pretty lightweight. I do not blame the environment (SQL Azure), because there are A LOT of queries executing all the time, and this one is the only one with this performance problem.
Here is the XML execution plan for SQL ninjas willing to help: http://pastebin.com/u5GCz0vW
EDIT:
Table structure:
CREATE TABLE [myproject].[Purchase](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ProductId] [nvarchar](50) NOT NULL,
[DeviceId] [nvarchar](255) NOT NULL,
[UserId] [nvarchar](255) NOT NULL,
[Receipt] [nvarchar](max) NULL,
[AppVersion] [nvarchar](50) NOT NULL,
[OSType] [tinyint] NOT NULL,
[IP] [nchar](15) NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ValidationState] [smallint] NOT NULL,
[ValidationInfo] [nvarchar](max) NULL,
[ValidationError] [nvarchar](max) NULL,
[ValidatedOn] [datetime] NULL,
[PurchaseId] [nvarchar](255) NULL,
[PurchaseDate] [datetime] NULL,
[ExpirationDate] [datetime] NULL,
CONSTRAINT [PK_Purchase] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
)
Index definition:
CREATE NONCLUSTERED INDEX [IX_AndroidRevalidationTargets3] ON [myproject].[Purchase]
(
[ExpirationDate] ASC,
[ValidatedOn] ASC
)
INCLUDE ( [ProductId],
[DeviceId],
[UserId],
[Receipt],
[AppVersion],
[OSType],
[IP],
[CreatedOn],
[ValidationState],
[ValidationInfo],
[ValidationError],
[PurchaseId],
[PurchaseDate])
WHERE ([OSType]=(1) AND [ProductId] IS NOT NULL AND [ProductId]<>'trial' AND ([ValidationState] IN ((1), (0), (-2))))
The data can be considered sensitive, so I can't provide a sample.
Since your query returns only 1 match, I think you should trim down your index to a bare minimum. You can get the remaining columns via a Key Lookup from the clustered index:
CREATE NONCLUSTERED INDEX [IX_AndroidRevalidationTargets3] ON [myproject].[Purchase]
(
[ExpirationDate] ASC,
[ValidatedOn] ASC
)
WHERE ([OSType]=(1) AND [ProductId] IS NOT NULL AND [ProductId]<>'trial' AND ([ValidationState] IN ((1), (0), (-2))))
This doesn't eliminate the scan, but it makes the index much leaner for a fast read.
Edit: the OP stated that the slimmed-down index was ignored by SQL Server. You can force SQL Server to use the filtered index:
SELECT *
FROM [myproject].[Purchase] WITH (INDEX(IX_AndroidRevalidationTargets3))
All the examples of table partitioning that I have found are quite simple, but I need to partition many tables by one criterion.
For example I have tables: Contractors and Products where ContractorId in Products table is a foreign key.
I created a partition function and scheme for ContractorId. It works perfectly for the Contractors table, but when it comes to the Products table I have no idea how I should use it, because when I try I always get the error: "The filegroup 'PRIMARY' specified for the clustered index 'PK_dbo.Products' was used for table 'dbo.Products' even though partition scheme 'scheme_Contractors' is specified for it". My Products table looks like:
CREATE TABLE [dbo].[Products](
[ProductId] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NULL,
[Amount] [int] NULL,
[Color] [nvarchar](max) NULL,
[Price] [decimal](18, 2) NULL,
[Guarantee] [nvarchar](max) NULL,
[GuaranteeType] [int] NULL,
[AdditionalFeatures] [nvarchar](max) NULL,
[Valid] [bit] NULL,
[ContractorId] [int] NOT NULL,
[ProducerId] [int] NOT NULL,
[ProductCategoryId] [int] NOT NULL,
CONSTRAINT [PK_dbo.Products] PRIMARY KEY ( [ProductId] ASC ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] )
GO
ALTER TABLE [dbo].[Products] WITH CHECK ADD CONSTRAINT [FK_dbo.Products_dbo.Contractors_ContractorId] FOREIGN KEY([ContractorId])
REFERENCES [dbo].[Contractors] ([ContractorId])
GO
ALTER TABLE [dbo].[Products] CHECK CONSTRAINT [FK_dbo.Products_dbo.Contractors_ContractorId]
GO
Could anyone please tell me: is it possible to use my partition scheme on the ContractorId column, and how? Thank you in advance!
In agreement with Dan Guzman, I'd like to point out there should be no ON [PRIMARY] specification in the table definition.
We use partitioning at large scale. It is very convenient to partition all tables on the same partition scheme, because the SQL engine will use its multi-processor parallelisation capabilities to the full.
When a certain group of partitions is in one database file and another partition is in another file, you even gain flexibility with disk usage and backups.
So you first need a partition function to define the boundary values of the partition scheme:
CREATE PARTITION FUNCTION [ContractorPartitionFunction](int) AS RANGE LEFT
FOR VALUES (contractor1,contractor2,...)
Then you need to create the partition scheme
CREATE PARTITION SCHEME [ContractorPartitionScheme]
AS PARTITION [ContractorPartitionFunction]
TO ([File_001],[File_002],...,[PRIMARY])
Then, for all tables and indexes you now create, remove ON [PRIMARY] as the target filegroup from the definitions; instead use
ON [ContractorPartitionScheme](ContractorId)
So your table definition should now read:
CREATE TABLE [dbo].[Products](
[ProductId] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NULL,
[Amount] [int] NULL,
[Color] [nvarchar](max) NULL,
[Price] [decimal](18, 2) NULL,
[Guarantee] [nvarchar](max) NULL,
[GuaranteeType] [int] NULL,
[AdditionalFeatures] [nvarchar](max) NULL,
[Valid] [bit] NULL,
[ContractorId] [int] NOT NULL,
[ProducerId] [int] NOT NULL,
[ProductCategoryId] [int] NOT NULL)
ON [ContractorPartitionScheme](ContractorId)
CREATE UNIQUE NONCLUSTERED INDEX [PK_dbo.Products] ON [dbo].[Products]
(
[ProductId],
[ContractorId]
) WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [ContractorPartitionScheme](ContractorId)
I have a table like this
CREATE TABLE [dbo].[tbl_LandRigs](
[ID] [int] IDENTITY(700000,1) NOT NULL,
[Company] [nvarchar](500) NULL,
[Rig] [nvarchar](500) NULL,
[RigType] [nvarchar](200) NULL,
[DrawWorks] [nvarchar](500) NULL,
[TopDrive] [nvarchar](200) NULL,
[RotaryTable] [nvarchar](500) NULL,
[MudPump] [nvarchar](500) NULL,
[MaxDD] [nvarchar](50) NULL,
[Operator] [nvarchar](500) NULL,
[Country] [nvarchar](200) NULL,
[Location] [nvarchar](500) NULL,
[OPStatus] [nvarchar](200) NULL,
[CreatedDate] [datetime] NULL,
[CreatedByID] [int] NULL,
[CreatedByName] [nvarchar](50) NULL,
CONSTRAINT [PK_tbl_LandRigs] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
And I am trying to get data from the MaxDD column in descending order:
SELECT distinct "MaxDD" FROM [tbl_LandRigs] ORDER BY "MaxDD" Desc
But this returns data in the following order:
According to my calculation, 4000 should be the first value, followed by the others, but this result astonished me. Can anyone help me out with this?
You are storing them as text (nvarchar); that's why you get lexicographic order. That means the characters are compared with each other from left to right. Hence 4000 is "higher" than 30000 (the last zero doesn't matter, since the first 4 is already higher than the 3).
So the correct way is to store the value as a numeric type. However, that seems to be impossible, since you also use values like 16.000 with 4.1/2"DP. In that case I would add another column: one for the numeric value you want to order by, and the other for the textual representation.
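For example, a computed column along these lines (the column name is made up; TRY_CAST needs SQL Server 2012 or later and yields NULL for text that is not a valid number):
ALTER TABLE [dbo].[tbl_LandRigs]
    ADD [MaxDD_Numeric] AS TRY_CAST([MaxDD] AS decimal(18, 3));
-- ordering then becomes:
SELECT [Rig], [MaxDD]
FROM [dbo].[tbl_LandRigs]
ORDER BY [MaxDD_Numeric] DESC;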
As MaxDD is an nvarchar, not a number, it is sorted in lexicographic order (i.e. ordered by the first character, then the second, ...), not numerically. You should convert it to a numeric value.
This behaviour is due to the nvarchar type.
Try this (with DISTINCT the ORDER BY expression must also appear in the select list, so GROUP BY is used here instead, and TRY_CAST returns NULL for values that are not valid integers rather than raising a conversion error):
SELECT "MaxDD"
FROM [tbl_LandRigs]
GROUP BY "MaxDD"
ORDER BY TRY_CAST("MaxDD" AS int) DESC