how can I improve performance for this query - sql

I have been struggling for a long time with a data task. I first tried it in C#, which I am proficient in, but it was taking forever, so I moved it to SQL Server. It still looks like it will take a very long time, possibly as long as 20 days to complete.
Does anyone know a more efficient way to write my stored procedure?
ALTER PROCEDURE [dbo].[TotalWins]
AS
BEGIN
SET NOCOUNT ON
DECLARE @meeting_date DATE;
DECLARE @idStore INT;
DECLARE @race_idStore INT;
DECLARE @runner_id INT;
SET @idStore = 0;
SET @race_idStore = -1;
SET @runner_id = 0;
WHILE(@idStore IS NOT NULL)
BEGIN
SET @race_idStore = -1;
SET @runner_id = 0;
SELECT
@idStore = MIN(runners.id)
FROM
dbHorseRacing.dbo.historic_runners AS runners
WHERE
runners.id > @idStore;
IF @idStore IS NOT NULL
BEGIN
SELECT
@runner_id = runners.runner_id, @meeting_date = races.meeting_date
FROM
dbHorseRacing.dbo.historic_runners AS runners
INNER JOIN
dbHorseRacing.dbo.historic_races AS races ON races.race_id = runners.race_id
WHERE
runners.id > @idStore;
INSERT INTO dbHorseRacing.dbo.total_wins
SELECT
@idStore, COUNT(*) AS total_wins
FROM
dbHorseRacing.dbo.historic_runners AS runners
INNER JOIN
dbHorseRacing.dbo.historic_races AS races ON races.race_id = runners.race_id
WHERE
runners.runner_id = @runner_id
AND races.meeting_date < @meeting_date
AND runners.finish_position = 1;
END
END
END
I am updating this question with the DDL and a data sample for the races and runners tables. Sorry that they are quite large...
Race sample data:
race_id meeting_id meeting_date course conditions race_name race_abbrev_name race_type_id race_type race_num going direction class draw_advantage num_fences handicap all_weather seller claimer apprentice maiden amateur num_runners num_finishers rating group_race min_age max_age distance_yards added_money official_rating speed_rating private_handicap scheduled_time off_time winning_time_disp winning_time_secs standard_time_disp standard_time_secs loaded_at
-1 2941 2003-07-03 Newbury Arab Race The Emirates Arabian International Arab Race 12 Flat 1 Good Left Handed 1 High numbers best in large fields, especially in very soft ground. NULL 0 0 0 0 0 0 0 8 8 NULL NULL NULL NULL 1320 0 NULL NULL NULL 2003-07-03 18:10:00.000 2003-07-03 00:00:00.000 0:00.00 0 1:14.38 74.379997253418 0x00000000000007DB
Race DDL:
CREATE TABLE [dbo].[historic_races]
(
[race_id] [int] NOT NULL,
[meeting_id] [int] NOT NULL,
[meeting_date] [date] NOT NULL,
[course] [varchar](255) NOT NULL,
[conditions] [varchar](255) NOT NULL,
[race_name] [varchar](255) NOT NULL,
[race_abbrev_name] [varchar](80) NOT NULL,
[race_type_id] [int] NOT NULL,
[race_type] [varchar](80) NOT NULL,
[race_num] [tinyint] NOT NULL,
[going] [varchar](80) NULL,
[direction] [varchar](80) NULL,
[class] [tinyint] NULL,
[draw_advantage] [varchar](255) NULL,
[num_fences] [tinyint] NULL,
[handicap] [tinyint] NULL,
[all_weather] [tinyint] NULL,
[seller] [tinyint] NULL,
[claimer] [tinyint] NULL,
[apprentice] [tinyint] NULL,
[maiden] [tinyint] NULL,
[amateur] [tinyint] NULL,
[num_runners] [tinyint] NULL,
[num_finishers] [tinyint] NULL,
[rating] [int] NULL,
[group_race] [int] NULL,
[min_age] [tinyint] NULL,
[max_age] [tinyint] NULL,
[distance_yards] [int] NULL,
[added_money] [float] NULL,
[official_rating] [int] NULL,
[speed_rating] [int] NULL,
[private_handicap] [int] NULL,
[scheduled_time] [datetime] NULL,
[off_time] [datetime] NULL,
[winning_time_disp] [varchar](20) NULL,
[winning_time_secs] [float] NULL,
[standard_time_disp] [varchar](20) NULL,
[standard_time_secs] [float] NULL,
[loaded_at] [timestamp] NULL,
PRIMARY KEY CLUSTERED
(
[race_id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Sample data for runners:
runner_id race_id name foaling_date colour distance_travelled form_figures gender age bred cloth_number stall_number num_fences_jumped long_handicap how_easy_won in_race_comment official_rating official_rating_type speed_rating speed_rating_type private_handicap private_handicap_type trainer_name trainer_id owner_name owner_id jockey_name jockey_id jockey_claim dam_name dam_id sire_name sire_id dam_sire_name dam_sire_id forecast_price forecast_price_decimal starting_price starting_price_decimal betting_text position_in_betting finish_position amended_position unfinished distance_beaten distance_won distance_behind_winner prize_money tote_win tote_place days_since_ran last_race_type_id last_race_type last_race_beaten_fav weight_pounds penalty_weight over_weight tack_hood tack_visor tack_blinkers tack_eye_shield tack_eye_cover tack_cheek_piece tack_pacifiers tack_tongue_strap id total_wins
1 82 401251 David Jack 2010-03-21 CH 143 NULL C 2 UK 4 3 NULL NULL NULL slowly into stride, took keen hold and soon in touch, pushed along and kept on same pace inside final furlong NULL NULL 32 Flat 18 Flat B J Meehan 9262 Roldvale Limited 2311 T E Durcan 18761 NULL NULL NULL NULL NULL NULL NULL 8/1 9 9/1 10 op 8/1 tchd 10/1 5 4 NULL NULL 2 NULL 3.5 216.449996948242 NULL NULL NULL NULL NULL NULL 129 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1267937 NULL
Runners DDL:
CREATE TABLE [dbo].[historic_runners]
(
[runner_id] [int] NOT NULL,
[race_id] [int] NOT NULL,
[name] [varchar](255) NOT NULL,
[foaling_date] [date] NULL,
[colour] [varchar](20) NOT NULL,
[distance_travelled] [int] NULL,
[form_figures] [varchar](80) NULL,
[gender] [varchar](20) NULL,
[age] [int] NULL,
[bred] [varchar](4) NULL,
[cloth_number] [int] NULL,
[stall_number] [int] NULL,
[num_fences_jumped] [int] NULL,
[long_handicap] [int] NULL,
[how_easy_won] [int] NULL,
[in_race_comment] [text] NULL,
[official_rating] [int] NULL,
[official_rating_type] [varchar](80) NULL,
[speed_rating] [int] NULL,
[speed_rating_type] [varchar](80) NULL,
[private_handicap] [int] NULL,
[private_handicap_type] [varchar](80) NULL,
[trainer_name] [varchar](80) NULL,
[trainer_id] [int] NULL,
[owner_name] [varchar](255) NULL,
[owner_id] [int] NULL,
[jockey_name] [varchar](80) NULL,
[jockey_id] [int] NULL,
[jockey_claim] [int] NULL,
[dam_name] [varchar](80) NULL,
[dam_id] [int] NULL,
[sire_name] [varchar](80) NULL,
[sire_id] [int] NULL,
[dam_sire_name] [varchar](80) NULL,
[dam_sire_id] [int] NULL,
[forecast_price] [varchar](20) NULL,
[forecast_price_decimal] [float] NULL,
[starting_price] [varchar](20) NULL,
[starting_price_decimal] [float] NULL,
[betting_text] [text] NULL,
[position_in_betting] [int] NULL,
[finish_position] [int] NULL,
[amended_position] [int] NULL,
[unfinished] [varchar](30) NULL,
[distance_beaten] [float] NULL,
[distance_won] [float] NULL,
[distance_behind_winner] [float] NULL,
[prize_money] [float] NULL,
[tote_win] [float] NULL,
[tote_place] [float] NULL,
[days_since_ran] [int] NULL,
[last_race_type_id] [int] NULL,
[last_race_type] [varchar](80) NULL,
[last_race_beaten_fav] [int] NULL,
[weight_pounds] [int] NULL,
[penalty_weight] [int] NULL,
[over_weight] [int] NULL,
[tack_hood] [int] NULL,
[tack_visor] [int] NULL,
[tack_blinkers] [int] NULL,
[tack_eye_shield] [int] NULL,
[tack_eye_cover] [int] NULL,
[tack_cheek_piece] [int] NULL,
[tack_pacifiers] [int] NULL,
[tack_tongue_strap] [int] NULL,
[id] [int] NOT NULL,
[total_wins] [int] NULL,
CONSTRAINT [PK_RunnerRaceID] PRIMARY KEY CLUSTERED
(
[runner_id] ASC,
[race_id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Desired results - total_wins table
CREATE TABLE [dbo].[total_wins]
(
[id] [int] NOT NULL,
[total_wins] [int] NULL
)
The id on the total_wins table corresponds to the id of the runners table. I have 2 million rows in the runners table, each with a unique identifier called id (not to be confused with the runner_id column, which contains duplicates because one runner can run in many races). So I would hope to end up with 2 million rows in the total_wins table, where total_wins reflects how many races the runner has won prior to the date of the race that the row relates to.
Any help would be really appreciated! I have been struggling with this, and I even considered flattening the data and using a big data solution such as Hadoop or MongoDB.
Thanks
Laura

Thanks to David's suggestion about using GROUP BY and avoiding the loop, I think this is the potential solution...
SELECT runners.id, count(*) as total_wins
FROM dbo.historic_runners as runners
inner join dbo.historic_races as races on races.race_id = runners.race_id
where races.meeting_date <
(
select meeting_date
FROM dbo.historic_runners as ru
inner join dbo.historic_races as ra on ra.race_id = ru.race_id
where ru.id = runners.id
)
and runners.finish_position = 1
group by runners.id
Thanks for your answers on this problem, I appreciate it :)
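For comparison, here is a minimal sketch of a set-based statement that fills total_wins in one pass. It assumes that a "prior win" is a row for the same runner_id with finish_position = 1 in a race whose meeting_date is strictly earlier. The aliases cur, prev, cur_race and prev_race are illustrative only; table and column names are taken from the DDL above. It joins each row to the other races of the same runner, so rows with no earlier wins still appear with a count of 0:

```sql
-- Sketch only: one row per historic_runners.id, counting earlier wins
-- by the same runner_id. Aliases are illustrative, not from the original.
INSERT INTO dbo.total_wins (id, total_wins)
SELECT
    cur.id,
    COUNT(prev_race.race_id) AS total_wins   -- counts only matched earlier wins
FROM dbo.historic_runners AS cur
INNER JOIN dbo.historic_races AS cur_race
        ON cur_race.race_id = cur.race_id
LEFT JOIN dbo.historic_runners AS prev
        ON prev.runner_id = cur.runner_id
       AND prev.finish_position = 1
LEFT JOIN dbo.historic_races AS prev_race
        ON prev_race.race_id = prev.race_id
       AND prev_race.meeting_date < cur_race.meeting_date
GROUP BY cur.id;
```

COUNT(prev_race.race_id) ignores the NULLs produced by the outer joins, which is what keeps wins on the same or a later date out of the total. It is worth checking the result against a handful of runners by hand before trusting it on the full table.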

Laura, I do not know the exact properties of your database, so I will only give you general ideas of what could be improved.
Detecting what is slow
You need to measure which part is slow. Make a copy of your database and run the queries without the inserts, then run a large number of inserts without the selects. This will tell you whether the reads or the writes are the bottleneck. If neither is slow on its own, something else is slowing the process down.
Take a look at the schema
Check whether the schema is adequate: is your database in a normal form and, if so, which one? If it is not, it might be a good idea to normalize it.
Indexing
Take a look at your indexes. If reads are slow, add indexes on the columns involved in your query, but read up on indexing first if you are not well versed in the area. If writes are slow, consider removing unnecessary indexes, such as those on columns that are not used in queries.
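As an illustration of this advice for the tables in the question, here is a sketch of two narrow nonclustered indexes. The index names are made up, and whether they help depends on the actual query plans, so treat this as something to test on a copy of the database rather than a recommendation:

```sql
-- Hypothetical index names; columns come from the DDL in the question.

-- Lets the procedure find the next id, and the runner_id/race_id that belong
-- to it, without scanning the wide clustered index. The clustered key columns
-- (runner_id, race_id) are carried in a nonclustered index automatically.
CREATE NONCLUSTERED INDEX IX_historic_runners_id
    ON dbo.historic_runners (id)
    INCLUDE (finish_position);

-- Narrow copy of the two race columns the win count needs, so the join on
-- race_id and the filter on meeting_date avoid reading full race rows.
CREATE NONCLUSTERED INDEX IX_historic_races_race_id_meeting_date
    ON dbo.historic_races (race_id)
    INCLUDE (meeting_date);
```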
Larger batches
I understand that you are iterating over the set row by row, but there are probably not so many races that you need to iterate over them one by one. You could process them in batches of 100: first get the minimum id, then get the maximum with a SELECT TOP 100 query ordered by runners.id. This will probably speed up the process. Note that in later steps the previous maximum becomes the new minimum, so after the very first iteration you need only one query to determine the limits.
Last but not least
If writes are slow, you can make a copy of the main table that has plenty of indexes, periodically copy only the relevant subsets into it, and use it as the source for the stored procedure so that the queries are quick. This would increase performance, but avoid it unless you are forced to, since it adds a lot of maintenance work and many additional opportunities for error.

Related

Query not using index in exists statement

I have the index IDX_tbl_SpeedRun_StatusTypeID_GameID_CategoryID_LevelID_PlusInclude on table dbo.tbl_SpeedRun below. The exists statement in the query below is taking a while (1m 10s) saying there is a missing index ON [dbo].[tbl_SpeedRun] ([StatusTypeID],[LevelID]).
Why is the exists statement not using the index I created? It already includes the columns [StatusTypeID],[LevelID].
Table:
CREATE TABLE [dbo].[tbl_SpeedRun]
(
[OrderValue] [int] NOT NULL IDENTITY(1,1),
[ID] [varchar] (50) NOT NULL,
[StatusTypeID] [int] NOT NULL,
[GameID] [varchar] (50) NOT NULL,
[CategoryID] [varchar] (50) NOT NULL,
[LevelID] [varchar] (50) NULL,
[SubCategoryVariableValues] [varchar] (1000) NULL,
[PlayerIDs] [varchar] (1000) NULL,
[PlatformID] [varchar] (50) NULL,
[RegionID] [varchar] (50) NULL,
[IsEmulated] [bit] NOT NULL,
[Rank] [int] NULL,
[PrimaryTime] [bigint] NULL,
[RealTime] [bigint] NULL,
[RealTimeWithoutLoads] [bigint] NULL,
[GameTime] [bigint] NULL,
[Comment] [varchar] (MAX) NULL,
[ExaminerUserID] [varchar] (50) NULL,
[RejectReason] [varchar] (MAX) NULL,
[SpeedRunComUrl] [varchar] (2000) NOT NULL,
[SplitsUrl] [varchar] (2000) NULL,
[RunDate] [datetime] NULL,
[DateSubmitted] [datetime] NULL,
[VerifyDate] [datetime] NULL,
[ImportedDate] [datetime] NOT NULL CONSTRAINT [DF_tbl_SpeedRun_ImportedDate] DEFAULT(GETDATE()),
[ModifiedDate] [datetime] NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tbl_SpeedRun]
ADD CONSTRAINT [PK_tbl_SpeedRun]
PRIMARY KEY NONCLUSTERED ([ID]) WITH (FILLFACTOR=90) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [IDX_tbl_SpeedRun_OrderValue]
ON [dbo].[tbl_SpeedRun] ([OrderValue]) WITH (FILLFACTOR=90) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_tbl_SpeedRun_StatusTypeID_GameID_CategoryID_LevelID_PlusInclude]
ON [dbo].[tbl_SpeedRun] ([StatusTypeID], [GameID], [CategoryID],[LevelID])
INCLUDE ([SubCategoryVariableValues], [PlayerIDs], [Rank],[PrimaryTime])
GO
Query:
SELECT
CASE
WHEN EXISTS (SELECT 1 FROM dbo.tbl_SpeedRun rn WITH (NOLOCK)
WHERE rn.LevelID = l.ID AND rn.StatusTypeID = 1)
THEN 1
ELSE 0
END
FROM
dbo.tbl_Level l WITH (NOLOCK)
WHERE
l.GameID = 'pd0wq901'
ORDER BY
l.OrderValue
This is the WHERE clause of your subquery:
WHERE rn.LevelID = l.ID AND rn.StatusTypeID = 1
A helpful index for this predicate would have these two columns as its leading keys, in either order.
Your existing index does not satisfy that requirement. It has columns:
([StatusTypeID], [GameID], [CategoryID], [LevelID])
INCLUDE ([SubCategoryVariableValues], [PlayerIDs], [Rank], [PrimaryTime])
Both columns are there, but LevelID is buried behind GameID and CategoryID, so the database cannot seek on it to speed up the subquery.
Bottom line: creating a large index that involves a lot of columns does not automatically speed up queries. Instead, analyze each query individually and define the index it needs.
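A minimal sketch of such an index, matching the missing-index hint quoted in the question. The index name is hypothetical; the table and columns are the ones from the question:

```sql
-- Hypothetical name. Both predicate columns lead the key, so the EXISTS
-- subquery can seek directly on (StatusTypeID = 1, LevelID = l.ID).
CREATE NONCLUSTERED INDEX [IDX_tbl_SpeedRun_StatusTypeID_LevelID]
    ON [dbo].[tbl_SpeedRun] ([StatusTypeID], [LevelID]);
```

Because the subquery only selects a constant, no INCLUDE columns should be needed for it.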

SQL Server partition and index

I need to design a table that will hold around 80 million records. I created a partition for every month using a persisted computed column (if this is wrong, please suggest a better way). Below are the scripts I used to create the tables and partitions, and the queries that will be run most often. Only inserts and deletes will be performed on this table.
-- Create the Partition Function
CREATE PARTITION FUNCTION PF_Invoice_item (int)
AS RANGE LEFT FOR VALUES (1,2,3,4,5,6,7,8,9,10,11,12);
-- Create the Partition Scheme
CREATE PARTITION SCHEME PS_Invoice_item
AS PARTITION PF_Invoice_item ALL TO ([Primary]);
CREATE TABLE [Invoice]
(
[invoice_id] [bigint] NOT NULL,
[Invoice_Number] [varchar](255) NULL,
[Invoice_Date] [date] NULL,
[Invoice_Total] [numeric](18, 2) NULL,
[Outstanding_Balance] [decimal](18, 2) NULL,
CONSTRAINT [PK_Invoice_id] PRIMARY KEY CLUSTERED([invoice_id] ASC)
)
CREATE TABLE [InvoiceItem](
[invoice_item_id] [bigint] NOT NULL,
[invoice_id] [bigint] NOT NULL,
[invoice_Date] [date] NULL,
[make] [varchar](255) NULL,
[serial_number] [varchar](255) NULL,
[asset_id] [varchar](100) NULL,
[application] [varchar](255) NULL,
[customer] [varchar](255) NULL,
[ucid] [varchar](255) NULL,
[dcn] [varchar](255) NULL,
[dcn_name] [varchar](255) NULL,
[device_serial_number] [varchar](255) NULL,
[subscription_name] [varchar](255) NULL,
[product_name] [varchar](255) NULL,
[subscription_start_date] [date] NULL,
[subscription_end_date] [date] NULL,
[duration] [varchar](50) NULL,
[promo_name] [varchar](255) NULL,
[promo_end_date] [date] NULL,
[discount] [decimal](18, 2) NULL,
[tax] [decimal](18, 2) NULL,
[line_item_total] [decimal](18, 2) NULL,
[mth] AS (datepart(month,[invoice_date])) PERSISTED NOT NULL,
[RELATED_PRODUCT_RATEPLAN_NAME] [varchar](250) NULL,
[SUB_TOTAL] [decimal](18, 2) NULL,
[BILLING_START_DATE] [date] NULL,
[BILLING_END_DATE] [date] NULL,
[SUBSCRIPTION_ID] [varchar](200) NULL,
[DEVICE_TYPE] [varchar](200) NULL,
[BASE_OR_PROMO] [varchar](200) NULL,
CONSTRAINT [PK_InvoiceItem_ID] PRIMARY KEY CLUSTERED ([invoice_item_id]
ASC,[mth] ASC))
ON PS_Invoice_item(mth);
GO
ALTER TABLE [InvoiceItem] WITH CHECK ADD CONSTRAINT [FK_Invoice_ID]
FOREIGN KEY([invoice_id])
REFERENCES [Invoice] ([invoice_id])
GO
I will be using the queries below:
select subscription_name,duration,start_date,end_date,promotion_name,
promotion_end_date,sub_total,discount,tax,line_item_total from InvoiceItem
lt inner join Invoice on lt.invoice_id=invoice.invoice_id where
invoice.invoice_number='' and lt.customer='' and lt.ucid='' and lt.make='' and
lt.SERIAL_NUMBER='' and lt.dcn='' and lt.application=''
select customer,make,application from billing.AssetApplicationTotals
lineItem inner join billing.Invoice invoice on
lineItem.invoice_id=invoice.invoice_id where invoice.invoice_number='';
SELECT [invoice_Date],[make],[serial_number],[application],[customer],
[ucid],[dcn],[dcn_name],[device_serial_number]
,[subscription_name],[product_name],[subscription_start_date],
[subscription_end_date],[duration],[promo_name],[promo_end_date]
FROM [InvoiceItem] where [application]=''
SELECT [invoice_Date],[make],[serial_number],[application],[customer],
[ucid],[dcn],[dcn_name],[device_serial_number]
,[subscription_name],[product_name],[subscription_start_date],
[subscription_end_date],[duration],[promo_name],[promo_end_date]
FROM [InvoiceItem] where [customer]=''
What is the best way to index this table? Should I create a separate nonclustered index for each filter, or a composite index, and should it be a covering index to avoid key lookups?

Update on view over partitioned tables updating all clustered indexes

We have a table that is partitioned on a date field into separate tables, one per year.
There is a view over all of these tables (Call).
The schema is as follows:
CREATE TABLE [dbo].[Call_2015](
[calID] [uniqueidentifier] NOT NULL,
[calPackageID] [int] NULL,
[calClientID] [int] NULL,
[calStartDate] [datetime] NOT NULL,
[calEndDate] [datetime] NOT NULL,
[calTimeIn] [char](5) NULL,
[calTimeOut] [char](5) NULL,
[calMinutes] [smallint] NULL,
[calPreferredTimeIn] [char](5) NULL,
[calPreferredTimeOut] [char](5) NULL,
[calActualTimeIn] [char](5) NULL,
[calActualTimeOut] [char](5) NULL,
[calActualMinutes] [smallint] NULL,
[calConfirmed] [smallint] NULL,
[calCarerID] [int] NULL,
[calRepCarerID] [int] NULL,
[calOriginalCarerID] [int] NULL,
[calContractID] [int] NULL,
[calNeedID] [int] NULL,
[calMedicationID] [int] NULL,
[calFrequency] [smallint] NULL,
[calFromDate] [datetime] NULL,
[calWeekNo] [smallint] NULL,
[calAlert] [smallint] NULL,
[calNoLeave] [smallint] NULL,
[calTimeCritical] [smallint] NULL,
[calStatus] [smallint] NULL,
[calClientAwayReasonID] [int] NULL,
[calCarerAwayReasonID] [int] NULL,
[calOutsideShift] [smallint] NULL,
[calHistoryID] [int] NULL,
[calInvoiceID] [int] NULL,
[calWagesheetID] [int] NULL,
[calReasonID] [int] NULL,
[calCallConfirmID] [varchar](50) NULL,
[calCreated] [datetime] NULL,
[calUpdated] [datetime] NULL,
[calVariation] [int] NULL,
[calVariationUserID] [int] NULL,
[calException] [smallint] NULL,
[calRetained] [smallint] NULL,
[calDoubleUpID] [uniqueidentifier] NULL,
[calDoubleUpOrder] [smallint] NULL,
[calNeedCount] [smallint] NULL,
[calNoStay] [smallint] NULL,
[calCoverCarerID] [int] NULL,
[calPayAdjustment] [real] NULL,
[calChargeAdjustment] [real] NULL,
[calTeamID] [int] NULL,
[calExpenses] [money] NULL,
[calMileage] [real] NULL,
[calOverrideStatus] [smallint] NULL,
[calLocked] [smallint] NULL,
[calDriver] [smallint] NULL,
[calPostcode] [char](10) NULL,
[calDayCentreID] [int] NULL,
[calMustHaveCarer] [smallint] NULL,
[calRoleID] [int] NULL,
[calUnavailableCarerID] [int] NULL,
[calClientInformed] [smallint] NULL,
[calFamilyInformed] [smallint] NULL,
[calMonthlyDay] [smallint] NULL,
[calOriginalTimeIn] [char](5) NULL,
[calLeadCarer] [smallint] NULL,
[calCallTypeID] [int] NULL,
[calActualStartDate] [datetime] NULL,
[calActualEndDate] [datetime] NULL,
[Table_Year] [int] NOT NULL,
CONSTRAINT [PK_Call_2015] PRIMARY KEY CLUSTERED
(
[Table_Year] ASC,
[calID] ASC,
[calStartDate] ASC,
[calEndDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[Call_2015] WITH CHECK ADD CONSTRAINT [CK_Call_Year_2015] CHECK (([Table_Year]=(2015)))
GO
ALTER TABLE [dbo].[Call_2015] CHECK CONSTRAINT [CK_Call_Year_2015]
GO
ALTER TABLE [dbo].[Call_2015] WITH CHECK ADD CONSTRAINT [CK_calStartDate_2015] CHECK (([calStartDate]>=CONVERT([datetime],'01 Jan 2015 00:00:00',(0)) AND [calStartDate]<=CONVERT([datetime],'31 DEC 2015 23:59:59',(0))))
GO
ALTER TABLE [dbo].[Call_2015] CHECK CONSTRAINT [CK_calStartDate_2015]
GO
ALTER TABLE [dbo].[Call_2015] ADD CONSTRAINT [DF_Call_2015_Table_Year] DEFAULT ((2015)) FOR [Table_Year]
GO
The update to the table is as follows:
UPDATE Call SET
calStartDate = CASE
WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay, MONTH(calStartDate), YEAR(calStartDate))
WHEN calFrequency IN (15,16) THEN dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate)
ELSE DateAdd(d, @StartDay-1, (calStartDate - datepart(dw,calStartDate)+1))
END,
calEndDate = CASE
WHEN calFrequency = 14 THEN dbo.funDate(@MonthlyDay + @EndDay - @StartDay, MONTH(calStartDate), YEAR(calStartDate))
WHEN calFrequency IN (15,16) THEN DATEADD(D, @EndDay - @StartDay, dbo.funMonthlyCallDate(calFrequency, @MonthlyDay, calStartDate))
ELSE DateAdd(d, @StartDay-1+@DayCount, (calStartDate - datepart(dw,calStartDate)+1))
END,
calTimeIn = @TimeIn,
calTimeOut = @TimeOut,
calMinutes = @Minutes,
calMonthlyDay = @MonthlyDay,
calClientInformed = Null,
calFamilyInformed = Null
WHERE calPackageID = @PackageID
AND calClientID = @ClientID
AND calWeekNo = @WeekNo
AND (DatePart(dw, calStartDate) = @OriginalDay OR calFrequency IN (14,15,16))
AND calStartDate BETWEEN @StartDate AND @EndDate
AND (calInvoiceID = 0 OR calInvoiceID Is Null OR @InvoicesFinalised = 1)
AND (calWagesheetID = 0 OR calWagesheetID Is Null OR @WagesFinalised = 1)
AND (calLocked = 0 OR calLocked Is Null)
AND (Table_Year = YEAR(@StartDate)
OR Table_Year =YEAR(@EndDate))
The SP updates a batch of rows depending on the values passed in @StartDate and @EndDate (it updates all rows with a calStartDate between the two).
The problem then comes with the execution plan. There are huge IO costs to the operation, and I've nailed it down to how SQL is dealing with the update.
Currently we have 20 of these tables; partitioned per year. Each update is causing an update of every single table's indexes, regardless of whether the table is actually touched by the update operation or not.
Execution Plan
Below this section it goes on to update, in the exact same manner, every table in the view.
I cannot see why this is, as I have specified the Table_Year (which the table is partitioned on) within the query text. Shouldn't SQL only update the necessary table?

cannot convert varchar to float in sql

These are my 2 tables
CREATE TABLE [dbo].[dailyRate](
[SYMBOL] [varchar](50) NULL,
[SERIES] [varchar](50) NULL,
[OPENPRICE] [varchar](50) NULL,
[HIGHPRICE] [varchar](50) NULL,
[LOWPRICE] [varchar](50) NULL,
[CLOSEPRICE] [varchar](50) NULL,
[LASTPRICE] [varchar](50) NULL,
[PREVCLOSE] [varchar](50) NULL,
[TOTTRDQTY] [varchar](50) NULL,
[TOTTRDVAL] [varchar](50) NULL,
[TIMESTAMPDAY] [varchar](50) NULL,
[TOTALTRADES] [varchar](50) NULL,
[ISIN] [varchar](50) NULL
)
CREATE TABLE [dbo].[cmpDailyRate](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[SYMBOL] [varchar](50) NULL,
[SERIES] [varchar](50) NULL,
[OPENPRICE] [decimal](18, 4) NULL,
[HIGHPRICE] [decimal](18, 4) NULL,
[LOWPRICE] [decimal](18, 4) NULL,
[CLOSEPRICE] [decimal](18, 4) NULL,
[LASTPRICE] [decimal](18, 4) NULL,
[PREVCLOSE] [decimal](18, 4) NULL,
[TOTTRDQTY] [bigint] NULL,
[TOTTRDVAL] [decimal](18, 4) NULL,
[TIMESTAMPDAY] [smalldatetime] NULL,
[TOTALTRADES] [bigint] NULL,
[ISIN] [varchar](50) NULL,
[M_Avg] [decimal](18, 4) NULL
)
This is my insert query, which copies data from one table to the other with casting:
INSERT into [Stock].[dbo].[cmpDailyRate]
SELECT [SYMBOL],[SERIES],Str([OPENPRICE], 18,4),Str([HIGHPRICE],18,4),
Str([LOWPRICE],18,4),Str([CLOSEPRICE],18,4),Str([LASTPRICE],18,4),Str([PREVCLOSE],18,4),convert(bigint,[TOTTRDQTY]),Str([TOTTRDVAL],18,4),
convert(date, [TIMESTAMPDAY], 105),convert(bigint,[TOTALTRADES]),[ISIN],null
FROM [Stock].[dbo].[DailyRate]
This query ran perfectly in SQL Server 2005, but it causes errors in SQL Server 2008 (it also ran in SQL Server 2008 when first installed; the error only started in the last few days).
Error :
Error cannot convert varchar to float
What to do?
One of your rows contains invalid data in the columns you are doing the float conversion (Str) on. Use the following strategy to work out which:
SELECT *
FROM [dailyRate]
WHERE IsNumeric([OPENPRICE]) = 0
OR IsNumeric([HIGHPRICE]) = 0
etc etc.
If you do not want to filter out data, a CASE statement might work better for you.
SELECT CASE
WHEN IsNumeric([OPENPRICE]) = 1 THEN [OPENPRICE]
ELSE NULL -- or 0 or whatever
END AS OPENPRICE,
CASE
WHEN IsNumeric([HIGHPRICE]) = 1 THEN [HIGHPRICE]
ELSE NULL -- or 0 or whatever
END AS [HIGHPRICE]
FROM [dailyRate]
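Spelled out for every column that the INSERT in the question passes through Str, the diagnostic query might look like the sketch below. The IS NOT NULL guards are an addition: ISNUMERIC returns 0 for NULL, so without them every row with an empty price would be reported as well.

```sql
-- Sketch: list rows whose price/value columns hold non-numeric text.
-- Column list taken from the Str(...) calls in the question's INSERT.
SELECT *
FROM [dbo].[dailyRate]
WHERE ([OPENPRICE]  IS NOT NULL AND ISNUMERIC([OPENPRICE])  = 0)
   OR ([HIGHPRICE]  IS NOT NULL AND ISNUMERIC([HIGHPRICE])  = 0)
   OR ([LOWPRICE]   IS NOT NULL AND ISNUMERIC([LOWPRICE])   = 0)
   OR ([CLOSEPRICE] IS NOT NULL AND ISNUMERIC([CLOSEPRICE]) = 0)
   OR ([LASTPRICE]  IS NOT NULL AND ISNUMERIC([LASTPRICE])  = 0)
   OR ([PREVCLOSE]  IS NOT NULL AND ISNUMERIC([PREVCLOSE])  = 0)
   OR ([TOTTRDVAL]  IS NOT NULL AND ISNUMERIC([TOTTRDVAL])  = 0);
```

Keep in mind that ISNUMERIC also returns 1 for some strings that still fail conversion to float, such as a lone currency symbol, so an empty result here does not guarantee that the INSERT will succeed.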

Joining multiple columns in one table to a single column in another table

I am looking to create a view that pulls data from two tables "Schedule" and "Reference".
Schedule has 50+ columns (it's almost completely denormalized -- not my design), most of which contain a value that could be joined to a column in the Reference table.
How do I write the SQL statement to correctly join each column in Schedules to the single column in Reference?
The Schedule table is defined as:
CREATE TABLE [dbo].[Schedule](
[ID] [int] NOT NULL,
[SCHEDULEWEEK] [datetime] NOT NULL,
[EMPNO] [numeric](10, 0) NOT NULL,
[EMPLNAME] [varchar](32) NULL,
[EMPFNAME] [varchar](32) NULL,
[EMPSENDATE] [datetime] NULL,
[EMPHIREDATE] [datetime] NULL,
[EMPTYPE] [char](1) NULL,
[EMPSTATUS] [char](1) NULL,
[SNREFUSALS] [tinyint] NULL,
[QUALSTRING] [varchar](128) NULL,
[JOBOVERSHIFTTYPE] [bit] NULL,
[SHORTNOTICE] [bit] NULL,
[SHORTNOTICEWAP] [bit] NULL,
[SHORTNOTICEPHONE] [varchar](32) NULL,
[LEADHAND] [bit] NULL,
[DUALCURRENCY] [bit] NULL,
[MIN100WINDOW] [bit] NULL,
[STATHOLIDAY] [bit] NULL,
[AREAOVERHOURS] [bit] NULL,
[DOUBLEINTERZONES] [bit] NULL,
[MAXDAYSPERWEEK] [tinyint] NULL,
[MAXHOURSPERWEEK] [numeric](10, 2) NULL,
[MAXHOURSPERSHIFT] [numeric](10, 2) NULL,
[MAXDOUBLESPERWEEK] [tinyint] NULL,
[ASSIGNEDDAYS] [tinyint] NULL,
[ASSIGNEDHOURS] [numeric](10, 2) NULL,
[ASSIGNEDDOUBLES] [tinyint] NULL,
[ASSIGNEDLOAHOURS] [numeric](10, 2) NULL,
[SHIFTNO1] [int] NULL,
[TEXT1_1] [varchar](64) NULL,
[TEXT2_1] [varchar](64) NULL,
[DAYFLAG1] [bit] NULL,
[COMMENT1] [text] NULL,
[SHIFTNO2] [int] NULL,
[TEXT1_2] [varchar](64) NULL,
[TEXT2_2] [varchar](64) NULL,
[DAYFLAG2] [bit] NULL,
[COMMENT2] [text] NULL,
[SHIFTNO3] [int] NULL,
[TEXT1_3] [varchar](64) NULL,
[TEXT2_3] [varchar](64) NULL,
[DAYFLAG3] [bit] NULL,
[COMMENT3] [text] NULL,
[SHIFTNO4] [int] NULL,
[TEXT1_4] [varchar](64) NULL,
[TEXT2_4] [varchar](64) NULL,
[DAYFLAG4] [bit] NULL,
[COMMENT4] [text] NULL,
[SHIFTNO5] [int] NULL,
[TEXT1_5] [varchar](64) NULL,
[TEXT2_5] [varchar](64) NULL,
[DAYFLAG5] [bit] NULL,
[COMMENT5] [text] NULL,
[SHIFTNO6] [int] NULL,
[TEXT1_6] [varchar](64) NULL,
[TEXT2_6] [varchar](64) NULL,
[DAYFLAG6] [bit] NULL,
[COMMENT6] [text] NULL
-- Snip
) ON [PRIMARY]
And the Reference table is defined as:
CREATE TABLE [dbo].[Reference](
[ID] [int] NOT NULL,
[CODE] [varchar](21) NOT NULL,
[LOCATIONCODE] [varchar](4) NOT NULL,
[SCHAREACODE] [varchar](16) NOT NULL,
[LOCATIONNAME] [varchar](32) NOT NULL,
[FLTAREACODE] [varchar](16) NOT NULL
) ON [PRIMARY]
I am trying to join each [TEXT1_]/[TEXT2_] column in Schedule to the [SCHAREACODE] column in reference. All the reference table contains is a list of areas where the employee could work.
I think he means to join on the Reference table multiple times:
SELECT *
FROM Schedule AS S
INNER JOIN Reference AS R1
ON R1.ID = S.FirstID
INNER JOIN Reference AS R2
ON R2.ID = S.SecondID
INNER JOIN Reference AS R3
ON R3.ID = S.ThirdID
INNER JOIN Reference AS R4
ON R4.ID = S.ForthID
Your description is a bit lacking, so I'm going to assume that
Schedule has 50+ columns (it's almost completely denormalized -- not my design), most of which contain a value that could be joined to a column in the Reference table.
means that 1 of the 50+ columns in Schedule is a ReferenceId. So, given a table design like:
Schedule ( MaybeReferenceId1, MaybeReferenceId2, MaybeReferenceId3, ... )
Reference ( ReferenceId )
Something like:
SELECT *
FROM Schedule
JOIN Reference ON
Schedule.MaybeReferenceId1 = Reference.ReferenceId
OR Schedule.MaybeReferenceId2 = Reference.ReferenceId
OR Schedule.MaybeReferenceId3 = Reference.ReferenceId
OR Schedule.MaybeReferenceId4 = Reference.ReferenceId
...
would work. You could simplify it by using IN if your RDBMS supports it:
SELECT *
FROM Schedule
JOIN Reference ON
Reference.ReferenceId IN (
Schedule.MaybeReferenceId1,
Schedule.MaybeReferenceId2,
Schedule.MaybeReferenceId3,
Schedule.MaybeReferenceId4,
...
)
From updated question
Perhaps something like this? It will be messy no matter what you do.
SELECT S.ID,
S.TEXT1_1,
TEXT1_1_RID = COALESCE((SELECT MAX(R.ID) FROM Reference R WHERE R.SCHAREACODE = S.TEXT1_1), 0),
S.TEXT1_2,
TEXT1_2_RID = COALESCE((SELECT MAX(R.ID) FROM Reference R WHERE R.SCHAREACODE = S.TEXT1_2), 0),
...
FROM Schedule S
Agree with TheSoftwareJedi, but can I just suggest using LEFT JOINs so that failures-to-match don't cause your Schedule row to disappear?
Of course, doing 28 JOINs is going to be a bit cumbersome whatever the details.
I'm not sure I'd call this "denormalized", more "abnormalized" ... :-)
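A sketch of the LEFT JOIN variant against the tables in the question, shown for the first three text columns only. It assumes SCHAREACODE is unique in Reference (otherwise Schedule rows would be duplicated), and the aliases and output column names are illustrative:

```sql
-- One LEFT JOIN per Schedule column that holds an area code.
-- Unmatched codes yield NULL instead of dropping the Schedule row.
SELECT
    S.ID,
    S.TEXT1_1, R1_1.LOCATIONNAME AS TEXT1_1_LOCATION,
    S.TEXT2_1, R2_1.LOCATIONNAME AS TEXT2_1_LOCATION,
    S.TEXT1_2, R1_2.LOCATIONNAME AS TEXT1_2_LOCATION
FROM dbo.Schedule AS S
LEFT JOIN dbo.Reference AS R1_1 ON R1_1.SCHAREACODE = S.TEXT1_1
LEFT JOIN dbo.Reference AS R2_1 ON R2_1.SCHAREACODE = S.TEXT2_1
LEFT JOIN dbo.Reference AS R1_2 ON R1_2.SCHAREACODE = S.TEXT1_2;
```

The remaining TEXT1_n/TEXT2_n columns would follow the same pattern, each with its own alias of Reference.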
Try a query like this:
select s.*, r.schareacode from schedule s, reference r
where
s.text1_1 = r.schareacode
or s.text2_1 = r.schareacode
or s.textx_x = r.schareacode
..
You should be able to get the same results with traditional joins so I recommend you experiment with that as well.