SSIS stored procedure performance issue - sql

I am using SSIS to transform a raw data row into a transaction. Everything was going well until I added logic for a new field called "SplitPercentage" to the SQL command. The new field simply converts the value to a decimal, for example 02887 would transform into 0.2887.
The new logic works as intended, but now it takes 8 hours to run instead of 5 minutes.
Please see entire original code vs new code here:
Greatly appreciate any help!
New logic resulting in poor performance:
IF TRIM(SUBSTRING(@line, 293, 1)) = 1
BEGIN
SET @SplitPercentage = 1
END
ELSE
BEGIN
SET @SplitPercentage = CAST(''.'' + TRIM(SUBSTRING(@line, 294, 4)) AS decimal(7, 4))
END

While your current code is not ideal, I don't see anything in your new expression (SUBSTRING(), TRIM(), concatenation, CAST) that would account for such a drastic performance hit. I suspect the cause lies elsewhere.
However, I believe your expression can be simplified to eliminate the IF. Given a 5-character field "nnnnn" that you wish to treat as a decimal n.nnnn, you should be able to do this in a single statement using STUFF() to inject the decimal point:
SET @SplitPercentage = CAST(STUFF(SUBSTRING(@line, 293, 5), 2, 0, '.') AS decimal(7, 4))
The STUFF() injects the decimal point at position 2 (replacing 0 characters). I see no need for the TRIM().
(You would need to double up the quotes for use within your Exec ('...') statement.)
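As a quick sanity check, here is a minimal sketch of the STUFF() approach run against a few made-up values. The sample strings are assumptions standing in for SUBSTRING(@line, 293, 5); they are not taken from the real file.

```sql
-- Hypothetical 5-character field values, as if sliced out of the raw line
SELECT
    v.RawValue,
    -- STUFF inserts '.' at position 2 and deletes 0 characters: '02887' -> '0.2887'
    CAST(STUFF(v.RawValue, 2, 0, '.') AS decimal(7, 4)) AS SplitPercentage
FROM (VALUES ('02887'), ('10000'), ('00500')) AS v(RawValue);
-- Returns 0.2887, 1.0000 and 0.0500
```

Note that '10000' comes out as 1.0000 without any special case, which is why the IF is no longer needed.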

Please try changing the IF/ELSE block as follows:
SET @SplitPercentage = IIF(TRIM(SUBSTRING(@line, 293, 1)) = ''1''
, 1.0000
, CAST(''.'' + TRIM(SUBSTRING(@line, 294, 4)) AS DECIMAL(7, 4)));

A challenge you've run into is "I have a huge dynamic query process that I cannot debug." When I run into these issues, I try to break the problem down into smaller, solvable, set-based options.
Reading that wall of code, my pseudocode would be something like
For all the data in Inbound_Transaction_Source by a given Source value (@SourceName)
Do all this data validation, type correction and cleanup by slicing out the current line into pieces
You can then lose the row-based approach by slicing your data up. I favor using CROSS APPLY at this point in my life, but a CTE, derived table, or whatever makes sense in your head is valid.
Why I favor this approach though, is you can see what you're building, test it, and then modify it without worrying you're going to upset a house of cards.
-- Column ordinal declaration and definition is offsite
SELECT
*
FROM
[dbo].[Inbound_Transaction_Source] AS ITS
CROSS APPLY
(
SELECT
CurrentAgentNo = SUBSTRING(ITS.line, @CurrentAgentStartColumn, 10)
, CurrentCompMemo = SUBSTRING(ITS.line, @CompMemoStartColumn + @Multiplier, 1)
, CurrentCommAmount = SUBSTRING(ITS.line, @CommAmountStartColumn + @Multiplier, 9)
, CurrentAnnCommAmount = SUBSTRING(ITS.line, @AnnCommAmountStartColumn + @Multiplier, 9)
, CurrentRetainedCommAmount = SUBSTRING(ITS.line, @RetainedCommAmountStartColumn + @Multiplier, 9)
, CurrentRetainedSwitch = SUBSTRING(ITS.line, @RetainedSwitchStartColumn + @Multiplier, 9)
-- etc
-- A sample of your business logic
, TransactionSourceSystemCode = SUBSTRING(ITS.line, 308, 3)
)NamedCols
CROSS APPLY
(
SELECT
-- There's some business rules to be had here for first year processing
-- Something special with position 102
SUBSTRING(ITS.line,102 , 1) AS SeniorityBit
-- If department code? is 0079, we have special rules
, TRIM(SUBSTRING(ITS.line,141, 4)) As DepartmentCode
)BR0
CROSS APPLY
(
SELECT
CASE
WHEN NamedCols.TransactionSourceSystemCode in ('LVV','UIV','LMV') THEN
CASE WHEN BR0.SeniorityBit = '0' THEN '1' ELSE '0' END
WHEN NamedCols.TransactionSourceSystemCode in ('CMP','FAL') AND BR0.DepartmentCode ='0079' THEN
CASE WHEN BR0.SeniorityBit = '1' THEN '0' ELSE '1' END
WHEN NamedCols.TransactionSourceSystemCode in ('UIA','LMA','RIA') AND BR0.SeniorityBit > '1' THEN
'1'
WHEN NamedCols.TransactionSourceSystemCode in ('FAL') THEN
'1'
ELSE '0'
END
)FY(IsFirstYear)
WHERE Source = @SourceName
ORDER BY Id;
Why did processing time increase from 5 minutes to 8 hours?
It likely had nothing to do with the change to the dynamic SQL. When an SSIS package run is "taking forever" relative to normal, then preferably while it's still running, look at your sources and destinations and make note of what is happening, as it's likely one of the two.
A cursor complicates your life and is not needed once you start thinking in sets, but it's unlikely to be the source of the performance problems given that you have a solid baseline of what normal is. Plus, this query is a single-table query with a single filter.
Your SSIS package's data flow is probably a chip-shot Source to Destination Extract and Load, or Slurp and Burp, with no intervening transformation (as the logic is all in the stored procedure). If that's the case, then the only two possible performance points of contention are the source and destination. Since the source appears trivial, it's likely that some other process had the destination tied up for those 8 hours. Had you run something like sp_whoisactive on the source and destination, you could have identified the process that was blocking your run.
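If sp_whoisactive isn't installed (it is a community procedure, not something that ships with SQL Server), a rough sketch of the same check using the built-in DMVs might look like the following. Run it on the source or destination server while the package appears stuck; it assumes you have VIEW SERVER STATE permission.

```sql
-- List requests that are currently waiting on another session
SELECT
    r.session_id,
    r.blocking_session_id,   -- the session holding things up
    r.wait_type,
    r.wait_time,             -- milliseconds spent in the current wait
    r.status,
    t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

If your package's session shows up here, the blocking_session_id column points at the process to investigate.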

Related

Matching on Values, but Erroring on New Value in SQL Server

I am comparing data from two different databases (one MariaDB and one SQL Server) within my Node project, and am then doing inserts and updates as necessary depending on the comparison results.
I have a question about this code that I use to iterate through results in Node, going one at a time and passing in values to check against (note - I am more familiar with Node and JS than with SQL, hence this question):
SELECT TOP 1
CASE
WHEN RM00101.CUSTCLAS LIKE ('%CUSR%')
THEN CAST(REPLACE(LEFT(LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', '') AS INT)
ELSE 0
END AS Id,
CASE
WHEN LR301.RMDTYPAL = 7 THEN LR301.ORTRXAMT * -1
WHEN LR301.RMDTYPAL = 9 THEN LR301.ORTRXAMT * -1
ELSE LR301.ORTRXAMT
END DocumentAmount,
GETDATE() VerifyDate
FROM
CRDB..RM20101
INNER JOIN
CRDB..RM00101 ON LR301.CUSTNMBR = RM00101.CUSTNMBR
WHERE
CONVERT(BIGINT, (REPLACE(LEFT(LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', ''))) = 589091
Currently, the above works for me for finding records that match. However, if I enter a value that doesn't yet exist - in this line below, like so:
WHERE CONVERT(BIGINT, (REPLACE(LEFT( LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', ''))) = 789091
I get this error:
Error converting data type varchar to bigint.
I assume the issue is that, if the value isn't found, it can't cast it to an INTEGER, and so it errors out. Sound right?
What I ideally want is for the query to execute successfully, but just return 0 results when a match is not found. In JavaScript I might do something like an OR clause to handle this:
const array = returnResults || [];
But I'm not sure how to handle this with SQL.
By the way, the value in SQL Server that's being matched is of type char(21), and the values look like this: 00000516542-000. The value in MariaDB is of type INT.
So two questions:
Will this error out when I enter a value that doesn't currently match?
If so, how can I handle this so as to just return 0 rows when a match isn't found?
By the way, as an added note, someone suggested using TRY_CONVERT, but while this works in SQL Server, it doesn't work when I use it with the NODE mssql package.
I think the issue is happening because the varchar value is not always made of numbers. You can make the comparison in varchar format itself to avoid this issue:
WHERE (REPLACE(LEFT( LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)), '-', '')) = '789091'
Hope this helps.
Edit: based on the format in the comment, this should do the trick;
WHERE REPLACE(LTRIM(REPLACE(REPLACE(LEFT( LR301.DOCNUMBR, CHARINDEX('-', LR301.DOCNUMBR)),'0',' '),'-','')),' ','0') = '789091'
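To see what that nested REPLACE/LTRIM expression does, here is a small sketch run against made-up values in the 00000516542-000 format from the question. The sample values and the DocNumbr alias are assumptions for illustration only.

```sql
-- Hypothetical DOCNUMBR values in the char(21) format described in the question
SELECT
    v.DocNumbr,
    -- 1. keep everything up to and including the first '-'
    -- 2. turn zeros into spaces and drop the '-'
    -- 3. LTRIM removes the leading (former zero) spaces
    -- 4. turn the remaining spaces back into zeros
    REPLACE(LTRIM(REPLACE(REPLACE(LEFT(v.DocNumbr, CHARINDEX('-', v.DocNumbr)), '0', ' '), '-', '')), ' ', '0') AS DocKey
FROM (VALUES ('00000516542-000'), ('00000500042-000')) AS v(DocNumbr);
-- Returns '516542' and '500042'
```

Because the comparison stays in varchar, no row ever has to be converted to a number, so non-numeric rows simply fail to match instead of raising an error.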

Error checking in T-SQL Script

I am self taught in T-SQL, so I am sure that I can gain efficiency in my code writing, so any pointers are welcomed, even if unrelated to this specific problem.
I am having a problem during a nightly routine I wrote. The database program that is creating the initial data is out of my control and is loosely written, so I have bad data that can blow up my script from time to time. I am looking for assistance in adding error checking into my script so I lose one record instead of the whole thing blowing up.
The code looks like this:
SELECT convert(bigint,(SUBSTRING(pin, 1, 2)+ SUBSTRING(pin, 3, 4)+ SUBSTRING(pin, 7, 5) + SUBSTRING(pin, 13, 3))) AS PARCEL, taxyear, subdivisn, township, propclass, paddress1, paddress2, pcity
INTO [ASSESS].[dbo].[vpams_temp]
FROM [ASSESS].[dbo].[Property]
WHERE parcelstat='F'
GO
The problem is in the first part of this where the concatenation occurs. I am attempting to convert this string (11-1111-11111.000) into this number (11111111111000). If they put their data in correctly, there is punctuation in exactly the correct spots and numbers in the right spots. If they make a mistake, then I end up with punctuation in the wrong spots and it creates a string that cannot be converted into a number.
How about simply replacing "-" and "." with "" before CONVERT to BIGINT?
To do that you would simply replace part of your code with
SELECT CONVERT(BIGINT,REPLACE(REPLACE(pin,'-',''), '.','')) AS PARCEL, ...
Hope it helps.
First, I would use replace() (twice). Second, I would use try_convert():
SELECT try_convert(bigint,
replace(replace(pin, '-', ''), '.', '')
) as PARCEL,
taxyear, subdivisn, township, propclass, paddress1, paddress2, pcity
INTO [ASSESS].[dbo].[vpams_temp]
FROM [ASSESS].[dbo].[Property]
WHERE parcelstat = 'F' ;
You might want to check if there are other characters in the value:
select pin
from [ASSESS].[dbo].[Property]
where pin like '%[^-0-9.]%';
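For a feel of how try_convert() behaves on bad input, here is a minimal sketch using made-up pin values (the second and third are deliberately malformed). try_convert() needs SQL Server 2012 or later.

```sql
-- Hypothetical pin values: one clean, two with bad characters
SELECT
    v.pin,
    TRY_CONVERT(bigint, REPLACE(REPLACE(v.pin, '-', ''), '.', '')) AS PARCEL
FROM (VALUES
    ('11-1111-11111.000'),   -- converts to 11111111111000
    ('11-1111-1111A.000'),   -- letter in the value: NULL
    ('11/1111-11111.000')    -- wrong punctuation: NULL
) AS v(pin);
```

The bad rows come back as NULL instead of aborting the whole statement, so you can filter them out or log them afterwards.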
Why not just:
select cast(replace(replace('11-1111-11111.000','-',''),'.','') as bigint)
Simply use the following code:
declare @var varchar(100)
set @var = '11-1111-11111.000'
select convert(bigint, replace(replace(@var,'-',''),'.',''))
Result:
11111111111000

Can 2 character length variables cause SQL injection vulnerability?

I am taking a text input from the user, then converting it into 2 character length strings (2-Grams)
For example
RX480 becomes
"rx","x4","48","80"
Now if I directly query server like below can they somehow make SQL injection?
select *
from myTable
where myVariable in ('rx', 'x4', '48', '80')
SQL injection is not a matter of length of anything.
It happens when someone adds code to your existing query. They do this by sending in the malicious extra code as a form submission (or something). When your SQL code executes, it doesn't realize that there is more than one thing to do. It just executes what it's told.
You could start with a simple query like:
select *
from thisTable
where something=$something
So you could end up with a query that looks like:
select *
from thisTable
where something=; DROP TABLE employees;
This is an odd example. But it does more or less show why it's dangerous. The first query will fail, but who cares? The second one will actually work. And if you have a table named "employees", well, you don't anymore.
Two characters in this case are sufficient to cause an error in the query and possibly reveal some information about it. For example, try the string ')480 and watch how your application behaves.
Although not much of an answer, this really doesn't fit in a comment.
Your code scans a table checking to see if a column value matches any pair of consecutive characters from a user supplied string. Expressed in another way:
declare @SearchString as VarChar(10) = 'Voot';
select Buffer, case
when DataLength( Buffer ) != 2 then 0 -- NB: Len() right trims.
when PatIndex( '%' + Buffer + '%', @SearchString ) != 0 then 1
else 0 end as Match
from ( values
( 'vo' ), ( 'go' ), ( 'n ' ), ( 'po' ), ( 'et' ), ( 'ry' ),
( 'oo' ) ) as Samples( Buffer );
In this case you could simply pass the value of @SearchString as a parameter and avoid the issue of the IN clause.
Alternatively, the character pairs could be passed as a table parameter and used with IN: where Buffer in ( select CharacterPair from @CharacterPairs ).
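A minimal sketch of the table-of-pairs idea follows. A table variable stands in for the table parameter here; in a real procedure you would declare a user-defined table type and pass it in as a READONLY parameter. The sample pairs and column values are assumptions for illustration.

```sql
-- Stand-in for a table-valued parameter holding the user's 2-grams
DECLARE @CharacterPairs TABLE (CharacterPair char(2) NOT NULL);

INSERT INTO @CharacterPairs (CharacterPair)
VALUES ('rx'), ('x4'), ('48'), ('80');

-- The pairs are treated as data, never spliced into the query text
SELECT s.Buffer
FROM (VALUES ('rx'), ('go'), ('48')) AS s(Buffer)
WHERE s.Buffer IN (SELECT CharacterPair FROM @CharacterPairs);
-- Returns 'rx' and '48'
```

Since the user's text only ever arrives as row values, a pair like ') cannot change the shape of the statement.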
As far as SQL injection goes, limiting the text to character pairs does preclude adding complete statements. It does, as others have noted, allow for corrupting the query and causing it to fail. That, in my mind, constitutes a problem.
I'm still trying to imagine a use-case for this rather odd pattern matching. It won't match a column value longer (or shorter) than two characters against a search string.
There definitely should be a canonical answer to all these innumerable "if I have [some special kind of data treatment] will be my query still vulnerable?" questions.
First of all, you should ask yourself: why are you looking to buy yourself such an indulgence? What is the reason? Why do you want to add an exception to your data processing? Why separate your data into the sheep and the goats, telling yourself "this data is 'safe', so I won't process it properly, and that data is unsafe, so I'll have to do something"?
The only reason such a question could even appear is your application architecture. Or, rather, lack of architecture. Because only in spaghetti code, where user input is added directly to the query, can such a question ever occur. Otherwise, your database layer should be able to process any kind of data, being totally ignorant of its nature, origin or alleged "safety".

Pinpoint a particular variable error in sql 2008 script

I have big SQL query like this:
Select Distinct [Student].[Class].roll_nbr as [PERIOD-NBR],[Student].[Class].ent_nbr as [CLASS-NBR],
IsNull(Stuff((SELECT CAST(', ' AS Varchar(MAX)) + CAST([Student].[Subject].ent_nbr AS Varchar(MAX))
FROM [Student].[Subject]
WHERE [Student].[Subject].roll_nbr = [Student].[Class].roll_nbr
and ([Student].[Subject].class_nbr = [Student].[Class].roll_assignment_nbr
or ([Student].[Class].roll_assignment_nbr = '0'
and [Student].[Subject].class_nbr = [Student].[School].bus_stop) )
AND [Student].[Subject].ent_nbr <> ''
FOR XML PATH ('')), 1, 2, ''), '')
AS [OLD-STUDENT-NBR.OLD],IsNull(Stuff((SELECT CAST(', ' AS Varchar(MAX)) + ....
It goes on and on; it is a page-long query that builds a report. The problem I am having is that some variable is erroring out with the message:
Error converting data type varchar to numeric.
This is a very generic error that does not tell me which variable is at fault. Is there any way to pinpoint which variable is erroring out in SQL 2008?
Comment out half the columns, if the error continues, comment out another half. If the error stops, it's in the section you just commented out. Rinse-repeat.
When faced with this type of error in the past, I've narrowed it down by commenting out portions of the query, seeing if it executes, then uncommenting portions of the query until it points right to the error.
Not that I know of. However, you could try the following procedure:
1) Identify what columns are being converted.
2) Execute the select with half of them. If it executes well, then the problem is in the other half.
3) Repeat 2 (halving the number of columns) until you have come to a single candidate.
If query execution is long, keep track of all combinations tried and their result, as the problem could be affecting more than one column. This leads to:
4) If the problem continues, then there is a second affected column. Discard all columns present in queries that have executed without problem, plus the incorrect one just discovered, and start again with this set on 2).
5) Repeat until the original query (and necessary modifications) executes with no issue.

How does one filter based on whether a field can be converted to a numeric?

I've got a report that has been in use quite a while - in fact, the company's invoice system rests in a large part upon this report (Disclaimer: I didn't write it). The filtering is based upon whether a field of type VarChar(50) falls between two numeric values passed in by the user.
The problem is that the field the data is being filtered on now not only has simple non-numeric values such as '/A', 'TEST' and a slew of other non-numeric data, but also has numeric values that seem to be defying any type of numeric conversion I can think of.
The following (simplified) test query demonstrates the failure:
Declare @StartSummary Int,
@EndSummary Int
Select @StartSummary = 166285,
@EndSummary = 166289
Select SummaryInvoice
From Invoice
Where IsNull(SummaryInvoice, '') <> ''
And IsNumeric(SummaryInvoice) = 1
And Convert(int, SummaryInvoice) Between @StartSummary And @EndSummary
I've also attempted conversions using bigint, real and float and all give me similar errors:
Msg 8115, Level 16, State 2, Line 7
Arithmetic overflow error converting
expression to data type int.
I've tried other larger numeric datatypes such as BigInt with the same error. I've also tried using sub-queries to sidestep the conversion issue by only extracting fields that have numeric data and then converting those in the wrapper query, but then I get other errors which are all variations on a theme indicating that the value stored in the SummaryInvoice field can't be converted to the relevant data type.
Short of extracting only those records with numeric SummaryInvoice fields to a temporary table and then querying against the temporary table, is there any one-step solution that would solve this problem?
Edit: Here's the field data that I suspect is causing the problem:
SummaryInvoice
11111111111111111111111111
IsNumeric states that this field is numeric - which it is. But attempting to convert it to BigInt causes an arithmetic overflow. Any ideas? It doesn't appear to be an isolated incident, there seems to have been a number of records populated with data that causes this issue.
It seems that you are going to have problems with the ISNUMERIC function, since it returns 1 if the value can be cast to any numeric type (so it accepts things like ., ,, e0, etc.). If you have numbers longer than 2^63-1, you can use DECIMAL or NUMERIC. I'm not sure whether PATINDEX can do a regex-style check on SummaryInvoice, but if it can, then you should try this:
SELECT SummaryInvoice
FROM Invoice
WHERE ISNULL(SummaryInvoice, '') <> ''
AND CASE WHEN PATINDEX('%[^0-9]%',SummaryInvoice) = 0 THEN CONVERT(DECIMAL(30,0), SummaryInvoice) ELSE -1 END
BETWEEN @StartSummary And @EndSummary
You can't guarantee what order the WHERE clause filters will be applied.
One ugly option to decouple inner and outer.
SELECT
*
FROM
(
Select TOP 2000000000
SummaryInvoice
From Invoice
Where IsNull(SummaryInvoice, '') <> ''
And IsNumeric(SummaryInvoice) = 1
ORDER BY SummaryInvoice
) foo
WHERE
Convert(int, SummaryInvoice) Between @StartSummary And @EndSummary
Another using CASE
Select SummaryInvoice
From Invoice
Where IsNull(SummaryInvoice, '') <> ''
And
CASE WHEN IsNumeric(SummaryInvoice) = 1 THEN Convert(int, SummaryInvoice) ELSE -1 END
Between @StartSummary And @EndSummary
YMMV
Edit: after question update
use decimal(38,0) not int
Change ISNUMERIC(SummaryInvoice) to ISNUMERIC(SummaryInvoice + '0e0')
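Putting the CASE and decimal(38,0) suggestions together, a rough sketch might look like this. It swaps the ISNUMERIC test for a digits-only LIKE check, which is my own substitution rather than anything from the answers above, and the inline sample rows are made up to mimic the data described in the question.

```sql
DECLARE @StartSummary int = 166285,
        @EndSummary   int = 166289;

SELECT s.SummaryInvoice
FROM (VALUES
    ('166286'), ('/A'), ('TEST'), ('1e5'),
    ('11111111111111111111111111'), (NULL)
) AS s(SummaryInvoice)            -- stand-in for the Invoice table
WHERE CASE
          -- convert only non-empty, digits-only values that fit in decimal(38,0)
          WHEN s.SummaryInvoice <> ''
           AND s.SummaryInvoice NOT LIKE '%[^0-9]%'
           AND LEN(s.SummaryInvoice) <= 38
          THEN CONVERT(decimal(38, 0), s.SummaryInvoice)
      END BETWEEN @StartSummary AND @EndSummary;
-- Returns only '166286'
```

Everything that fails the check yields NULL from the CASE, and NULL never satisfies BETWEEN, so those rows drop out without a conversion being attempted.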
An AND with IsNumeric(SummaryInvoice) = 1 will not short-circuit in SQL Server.
But maybe you can use
AND (CASE WHEN IsNumeric(SummaryInvoice) = 1 THEN Convert(int, SummaryInvoice) ELSE 0 END)
Between @StartSummary And @EndSummary
Your first issue is to fix your database structure so bad data cannot get into the field. You are putting a band-aid on a wound that needs stitches and wondering why it doesn't heal.
Database refactoring is not fun, but it needs to be done when there is a data integrity problem. I assume you aren't really invoicing someone for 11,111,111,111,111,111,111,111,111 or 'test'. So don't allow those values to ever get entered (if you can't change the structure to the correct data type, consider a trigger to prevent bad data from going in) and delete the ones you do have that are bad.