Just keen to see a working example of how to partition a large table (150 million rows, 30 columns). What are the best practices for partitioning such a big table by date (sample code please)?
Also, I want to know how these partitions are merged, switched out, and archived. Any T-SQL-based implementation example is much appreciated.
Below is how you partition a table by a date field:
CREATE TABLE [dbo].[FactInternetSales]
(
[ProductKey] int NOT NULL
, [OrderDateKey] int NOT NULL
, [CustomerKey] int NOT NULL
, [PromotionKey] int NOT NULL
, [SalesOrderNumber] nvarchar(20) NOT NULL
, [OrderQuantity] smallint NOT NULL
, [UnitPrice] money NOT NULL
, [SalesAmount] money NOT NULL
)
WITH
( CLUSTERED COLUMNSTORE INDEX
, DISTRIBUTION = HASH([ProductKey])
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101,20010101,20020101
,20030101,20040101,20050101
)
)
)
;
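Before copying this pattern, a sizing note: SQL Data Warehouse already spreads every table across 60 distributions, so each partition is divided a further 60 ways, and clustered columnstore indexes compress best with around a million rows per rowgroup. With 150 million rows, the 7 partitions created above work out to roughly 150M / (7 × 60) ≈ 357K rows per partition per distribution, so err on the side of fewer, coarser partitions rather than more.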
Below is a sample partitioned columnstore table containing one row in each partition:
CREATE TABLE [dbo].[FactInternetSales]
(
[ProductKey] int NOT NULL
, [OrderDateKey] int NOT NULL
, [CustomerKey] int NOT NULL
, [PromotionKey] int NOT NULL
, [SalesOrderNumber] nvarchar(20) NOT NULL
, [OrderQuantity] smallint NOT NULL
, [UnitPrice] money NOT NULL
, [SalesAmount] money NOT NULL
)
WITH
( CLUSTERED COLUMNSTORE INDEX
, DISTRIBUTION = HASH([ProductKey])
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101
)
)
)
;
INSERT INTO dbo.FactInternetSales
VALUES (1,19990101,1,1,1,1,1,1);
INSERT INTO dbo.FactInternetSales
VALUES (1,20000101,1,1,1,1,1,1);
CREATE STATISTICS Stat_dbo_FactInternetSales_OrderDateKey ON dbo.FactInternetSales(OrderDateKey);
SQL Data Warehouse supports partition splitting, merging, and switching. Each of these operations is executed using the ALTER TABLE statement.
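For example, splitting and merging boundaries on the table above might look like the sketch below; note that with a clustered columnstore index a partition must be empty before it can be split.
-- Add a 20060101 boundary, creating a new (empty) partition for 2006
ALTER TABLE dbo.FactInternetSales SPLIT RANGE (20060101);
-- Remove the 20010101 boundary, merging the 2000 and 2001 partitions
ALTER TABLE dbo.FactInternetSales MERGE RANGE (20010101);
Switching a partition out to an archive table is shown after the CTAS example below.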
To create a partitioned table on Azure SQL Data Warehouse from data coming from another table, you can use CTAS as shown below:
CREATE TABLE dbo.FactInternetSales_20000101
WITH ( DISTRIBUTION = HASH(ProductKey)
, CLUSTERED COLUMNSTORE INDEX
, PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
(20000101
)
)
)
AS
SELECT *
FROM FactInternetSales
WHERE 1=2
;
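Because the CTAS above copies the distribution, the columnstore index, and a matching partition boundary, the empty table can serve as a switch target for archiving. A sketch, continuing the example:
ALTER TABLE dbo.FactInternetSales SWITCH PARTITION 2 TO dbo.FactInternetSales_20000101 PARTITION 2;
After the switch, the year-2000 rows live in dbo.FactInternetSales_20000101, which can be kept as an archive table or dropped.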
For more information, see the Azure SQL Data Warehouse partitioning documentation.
I have a simple task (they said!): I need to update a table column using a select statement.
Something like below:
Let's say in this table A, I have bad data in the pcsProduces column.
Now I want to multiply cavities by heatcyclecount and then update the pcsProduces column to the proper value.
The problem is, I have thousands of records, so I would really appreciate it if someone could show me how to do this with a simple update and select query.
Just fire a SQL UPDATE command like:
update tablename set pcsProduces = cavities * heatcyclecount
You have at least 2 options:
Updating the whole table with an update statement (where clause is optional):
update TbYourTable
set pcsProduces = cavities * heatsOrCyclecount
where pcsProduces != cavities * heatsOrCyclecount
Using a computed column (SQL Server syntax):
create table [TbYourTable]
(
[Id] int identity(1,1) not null
, [domainS] int not null
, [tStations] int not null
, [itemNo] int not null
, [defaultCavities] int not null
, [missingCavities] int not null
, [cavities] int not null
, [heatsOrCyclecount] int not null
, [shift] nvarchar (max) null
, [pcsProduces] as ([cavities] * [heatsOrCyclecount]) persisted not null -- persisted clause is optional
, constraint [PK_TbYourTable] primary key nonclustered
(
[Id] asc
)
) on [primary];
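For illustration, a quick check of the computed column (the inserted values are made up):
insert into TbYourTable (domainS, tStations, itemNo, defaultCavities, missingCavities, cavities, heatsOrCyclecount, shift)
values (1, 1, 1, 8, 0, 8, 100, N'Day');
select cavities, heatsOrCyclecount, pcsProduces from TbYourTable; -- returns 8, 100, 800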
I have a table variable in SQL Server 2014 in which I have an identity column declared.
DECLARE @TempTable TABLE
(
ID int IDENTITY(1,1) PRIMARY KEY
, IncidentResolvedOn date
, IncidentCreatedOn date
, IncidentClosedOn date
, TaskAssigned date
, TaskCompleted date
, TaskID float
, IncidentTeamID int
, TotalDaysOutstanding int
, TierInfo varchar(15)
, Task_NoTask varchar(15)
, Tier_2_Days_Outstanding int
, Tier_1_Days_Outstanding int
, DaysToResolve int
, BadDays int
, StartDate date
, EndDate date
)
When I run the rest of the query, the ID column sometimes doesn't start at 1; instead it starts with some random number. The code below is what I use to insert into this table variable.
INSERT INTO @TempTable(IncidentResolvedOn, IncidentCreatedOn, IncidentClosedOn, TaskAssigned, TaskCompleted, TaskID, IncidentTeamID)
SELECT [Incident Resolved On]
, [Incident Created On]
, [Incident Closed On]
, [Task Assigned On]
, [Task Completed On]
, [Task ID]
, IncidentTeamID
FROM HEATData
This happens in both a table variable and temp table. I've never seen this happen before. Usually when I use the IDENTITY(1,1) phrase it always starts with 1 no matter how many times I create that table. Any suggestions out there?
I imagine your connection is staying open and thus the identity isn't resetting for your table variable. Here's an example.
DECLARE @TempTable TABLE
(
ID int IDENTITY(1,1) PRIMARY KEY,
ID2 int)
insert into @TempTable
values
(1),(2)
select * from @TempTable
delete from @TempTable
insert into @TempTable
values
(1),(2)
select * from @TempTable
Now, if you wrap this in its own batch using GO, you can see this doesn't happen. In fact, you have to re-declare your table variable.
DECLARE @TempTable TABLE
(
ID int IDENTITY(1,1) PRIMARY KEY,
ID2 int)
insert into @TempTable
values
(1),(2)
select * from @TempTable
go
DECLARE @TempTable TABLE
(
ID int IDENTITY(1,1) PRIMARY KEY,
ID2 int)
insert into @TempTable
values
(1),(2)
select * from @TempTable
The same would apply to a #TempTable if you didn't explicitly drop it and the connection remained open. Of course, for permanent tables the increment will continue, just as in the first example.
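If a guaranteed 1-based, gap-free sequence matters more than the stored identity value, one option is to compute it at read time instead; a minimal sketch:
select ROW_NUMBER() over (order by ID) as SeqNo, ID2
from @TempTable;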
I have a query that creates a table variable holding a population of interest. Its structure is like this:
DECLARE @SepsisTbl TABLE (
PK INT IDENTITY(1, 1) PRIMARY KEY
, Name VARCHAR(500)
, MRN INT
, Account INT
, Age INT -- Age at arrival
, Arrival DATETIME
, Triage_StartDT DATETIME
, Left_ED_DT DATETIME
, Disposition VARCHAR(500)
, Mortality CHAR(1)
);
WITH Patients AS (
SELECT UPPER(Patient) AS [Name]
, MR#
, Account
, DATEDIFF(YEAR, AgeDob, Arrival) AS [Age_at_Arrival]
, Arrival
, Triage_Start
, TimeLeftED
, Disposition
, CASE
WHEN Disposition IN (
'Medical Examiner', 'Morgue'
)
THEN 'Y'
ELSE 'N'
END AS [Mortality]
FROM SMSDSS.c_Wellsoft_Rpt_tbl
WHERE Triage_Start IS NOT NULL
AND (
Diagnosis LIKE '%SEPSIS%'
OR
Diagnosis LIKE '%SEPTIC%'
)
)
INSERT INTO @SepsisTbl
SELECT * FROM Patients
From this point forward I have 5 more queries of the same sort, looking for different types of orders, that I then LEFT OUTER JOIN onto this table. My question is: why does my performance degrade so much when I change the WHERE clause of those queries from this:
AND A.Account IN (
SELECT Account
FROM SMSDSS.c_Wellsoft_Rpt_tbl
WHERE (
Diagnosis LIKE '%SEPSIS%'
OR
Diagnosis LIKE '%SEPTIC%'
    )
)
to this:
AND A.Account IN (
SELECT Account
FROM @SepsisTbl
)
The run time goes from 2.5 minutes to over 10 minutes with still no results. The CTE itself runs as fast as I can press F5.
Thank you,
I suspect the problem is that the table variable doesn't have an index on Account. If you add an index on Account, I would expect better performance.
See the answer to this question for details on how to add an index: Creating an index on a table variable
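For reference, on SQL Server 2014 and later the index can also be declared inline in the table variable itself (on older versions, a UNIQUE constraint covering the column plus the PK is the usual workaround); a sketch using the table from the question, with the remaining columns omitted:
DECLARE @SepsisTbl TABLE (
    PK INT IDENTITY(1, 1) PRIMARY KEY
  , Account INT INDEX IX_Account -- inline nonclustered index (SQL Server 2014+)
  -- ... remaining columns as declared in the question
);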
I use SQL Server 2012. I have a large table that I divided into several tables, like below:
Create Table A2013
(
Id int identity(1,1),
CountA int ,
Name varchar(50),
ADate DATETIME NULL
CHECK (DATEPART(yy, ADate) = 2013)
)
Create Table A2014
(
Id int identity(1,1),
CountA int ,
Name varchar(50),
ADate DATETIME NULL
CHECK (DATEPART(yy, ADate) = 2014)
)
Insert Into A2013 Values ( 102 , 'A','20131011' )
Insert Into A2013 Values (15 , 'B' ,'20130211' )
Insert Into A2013 Values ( 54, 'C' ,'20131211' )
Insert Into A2013 Values ( 54, 'D' ,'20130611' )
Insert Into A2013 Values ( 95, 'E' ,'20130711' )
Insert Into A2013 Values (8754 , 'F' ,'20130310' )
Insert Into A2014 Values ( 102 , 'A','20141011' )
Insert Into A2014 Values (15 , 'B' ,'20140911' )
Insert Into A2014 Values ( 54, 'C' ,'20140711' )
Insert Into A2014 Values ( 54, 'D' ,'20141007' )
Insert Into A2014 Values ( 95, 'E' ,'20140411' )
Insert Into A2014 Values (8754 , 'F' ,'20140611' )
I created a partitioned view like below:
Create View A
As
Select * From A2013
Union
Select * From A2014
I hoped the SQL optimizer would use a good plan and use my CHECK constraint definitions to determine which member table contains the rows, but it scans both tables when it runs this query:
Select * From A Where A.ADate = '20140611'
I expected the optimizer not to touch table A2013 at all. Why does it?
The CHECK CONSTRAINT expression must be sargable in order for the optimizer to eliminate the unneeded tables in the execution plan. The constraints below avoid applying a function to the column and are sargable:
CREATE TABLE dbo.A2013
(
Id int IDENTITY(1, 1)
, CountA int
, Name varchar(50)
, ADate datetime NULL
CONSTRAINT CK_A2013_ADate
CHECK ( ADate >= '20130101'
AND ADate < '20140101' )
);
CREATE TABLE dbo.A2014
(
Id int IDENTITY(1, 1)
, CountA int
, Name varchar(50)
, ADate datetime NULL
CONSTRAINT CK_A2014_ADate
CHECK ( ADate >= '20140101'
AND ADate < '20150101' )
);
The issue is not whether the expression is sargable. As far as I know, the term "sargable" applies to the use of indexes in queries. The question is whether SQL Server recognizes the where clause as matching the check constraint.
The check constraint you have is:
CHECK (DATEPART(yy, ADate) = 2014)
The where clause is:
Where A.ADate = '20140611'
The problem is that the second is not recognized as a subset of the first. You could fix this by adding redundancy:
Where A.ADate = '20140611' and DATEPART(yy, A.ADate) = 2014
Or, you could fix this by using ranges -- but be careful about data types, because data type conversion can definitely confuse the optimizer. I think the following will work:
CHECK (ADate BETWEEN '2014-01-01' AND '2014-12-31')
WHERE A.ADate = '2014-06-11'
(The hyphens are optional and can be dropped.)
The documentation (as far as I can tell) is not really explicit about the cause:
The SQL Server query optimizer recognizes that the search condition in
this SELECT statement references only rows in the May1998Sales and
Jun1998Sales tables. Therefore, it limits its search to those tables.
. . .
CHECK constraints are not needed for the partitioned view to return
the correct results. However, if the CHECK constraints have not been
defined, the query optimizer must search all the tables instead of
only those that cover the search condition on the partitioning column.
Without the CHECK constraints, the view operates like any other view
with UNION ALL. The query optimizer cannot make any assumptions about
the values stored in different tables and it cannot skip searching the
tables that participate in the view definition.
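One more thing worth checking: the view in the question uses UNION rather than UNION ALL. Partitioned views are defined with UNION ALL, and the implicit de-duplication of a plain UNION can by itself stop the optimizer from skipping member tables. A sketch of the view rewritten to match:
Create View A
As
Select * From A2013
Union All
Select * From A2014
With the sargable range constraints and UNION ALL in place, the query Select * From A Where A.ADate = '20140611' should touch only A2014.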
I am loading data from a CSV file into a temp staging table and this temp table is being queried a lot. I looked at my execution plan and saw that a lot of the time is spent scanning the temp table.
Is there any way to create an index on this table when I SELECT INTO it?
SELECT *
FROM TradeTable.staging.Security s
WHERE (
s.Identifier IS NOT NULL
OR s.ConstituentTicker IS NOT NULL
OR s.CompositeTicker IS NOT NULL
OR s.CUSIP IS NOT NULL
OR s.ISIN IS NOT NULL
OR s.SEDOL IS NOT NULL
OR s.eSignalTicker IS NOT NULL)
The table created by SELECT INTO is always a heap. If you want a PK/identity column, you can either do as you suggest in the comments:
CREATE TABLE #T
(
Id INT IDENTITY(1,1) PRIMARY KEY,
/*Other Columns*/
)
INSERT INTO #T
SELECT *
FROM TradeTable.staging.Security
Or avoid the explicit CREATE and the need to list all the columns out with:
SELECT TOP (0) IDENTITY(int,1,1) As Id, *
INTO #T
FROM TradeTable.staging.Security
ALTER TABLE #T ADD PRIMARY KEY(Id)
INSERT INTO #T
SELECT *
FROM TradeTable.staging.Security
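If a plain nonclustered index is all that's needed (rather than a PK/identity column), another option is to keep the simple SELECT INTO and index the heap afterwards; a sketch using one of the columns from the question's filter:
SELECT *
INTO #T
FROM TradeTable.staging.Security;
CREATE NONCLUSTERED INDEX IX_T_Identifier ON #T (Identifier);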