Function to get the substring after the Nth occurrence of a delimiter, chosen dynamically - sql

In one of my database tables, I have an nvarchar column whose values are several strings joined by a special character. For example:
'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD'
or
'SJDGh-SUDYSUI-jhsdhsj-YTsagh-ytetyyuwte-sagd'
or
'hwerweyri~sdjhfkjhsdkjfhds~jsdfhjsdhf~mdnfsd,mfn'
Based on a formula, I always need the substring that follows the special character (-, _ or ~), but it may be the one after the first, second or third occurrence of that character. I used the CHARINDEX and SUBSTRING functions in SQL Server, but they always return only the first part of the string, up to the first occurrence of the character. For example:
select SUBSTRING ('hwerweyri~sdjhfkjhsdkjfhds~jsdfhjsdhf~mdnfsd,mfn', 0, CHARINDEX('~', 'hwerweyri~sdjhfkjhsdkjfhds~jsdfhjsdhf~mdnfsd,mfn', 0))
returned value: hwerweyri
If there is a solution for this, or you have a piece of code that solves this problem, please advise.
It is important to mention that the occurrence of the special character must be passed to the function by the caller: for example, after the third, the second or the tenth occurrence. The position must be supplied dynamically, not hard-coded in the function definition.
For example:
'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD' ==> 3rd substring ==> 'GFSDGFSHDGF'
'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD' ==> 2nd substring ==> 'HGSDHGJD'
'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD' ==> 1st substring ==> 'JHJSD'
The position will be sent to the function from a form written in C#, and it will be a number between 1 and 15 (these numbers actually represent the production efficiency of a product). The number varies from call to call and must be applied to whichever string is passed in. The output should look like the examples above. I hope I have managed to explain my request clearly.

Try the following function:
CREATE FUNCTION [dbo].[SplitWithCte]
(
    @String NVARCHAR(4000),
    @Delimiter NCHAR(1),
    @PlaceOfDelimiter int
)
RETURNS Table
AS
RETURN
(
    -- Each row holds the start of one piece (Ends) and the position of the delimiter that closes it (Endsp)
    WITH SplitedStrings(Ends,Endsp)
    AS (
        SELECT 0 AS Ends, CHARINDEX(@Delimiter,@String) AS Endsp
        UNION ALL
        SELECT Endsp+1, CHARINDEX(@Delimiter,@String,Endsp+1)
        FROM SplitedStrings
        WHERE Endsp > 0
    )
    SELECT f.DataStr
    FROM (
        -- Number the pieces by start position so the numbering is deterministic
        SELECT 'RowId' = ROW_NUMBER() OVER (ORDER BY Ends),
               'DataStr' = SUBSTRING(@String,Ends,COALESCE(NULLIF(Endsp,0),LEN(@String)+1)-Ends)
        FROM SplitedStrings
    ) f WHERE f.RowId = @PlaceOfDelimiter + 1
)
How to use:
select * from [dbo].[SplitWithCte](N'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD', N'_', 3)
or
select DataStr from [dbo].[SplitWithCte](N'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD', N'_', 3)
Result: GFSDGFSHDGF
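To apply the function to every row of a table rather than to a literal, it can be combined with CROSS APPLY. The following is a minimal sketch, assuming the SplitWithCte function above has already been created; the inline sample rows and the column names Code, Delim and Place are hypothetical stand-ins for the real table and for the number sent from the form.

```sql
-- Hypothetical sample rows; in practice this would be the real table
-- plus the position (1 to 15) supplied by the C# form.
SELECT t.Code, t.Place, s.DataStr
FROM (VALUES
        (N'HGHGSD_JHJSD_HGSDHGJD_GFSDGFSHDGF_GFSD', N'_', 3),
        (N'SJDGh-SUDYSUI-jhsdhsj-YTsagh-ytetyyuwte-sagd', N'-', 2),
        (N'hwerweyri~sdjhfkjhsdkjfhds~jsdfhjsdhf~mdnfsd,mfn', N'~', 1)
     ) AS t(Code, Delim, Place)
-- The function is evaluated once per row with that row's delimiter and position
CROSS APPLY [dbo].[SplitWithCte](t.Code, t.Delim, t.Place) AS s;
```

With CROSS APPLY, a row whose Place is larger than the number of pieces simply drops out of the result; OUTER APPLY would keep it with a NULL DataStr. Because the function is built on a recursive CTE, a string with more than 100 delimiters would hit SQL Server's default recursion limit unless the calling query adds OPTION (MAXRECURSION n).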

Related

Problem with using SUBSTRING and CHARINDEX

I have a column (RCV1.ECCValue) in a table which 99% of the time has a constant string format, for example:
T0-11.86-273
The part between the two hyphens is a percentage. I'm using the SQL below to extract it; it works fine and returns 11.86 for the example above when the data is in that format:
'Percentage' = round(SUBSTRING(RCV1.ECCValue,CHARINDEX('-',RCV1.ECCValue)+1, CHARINDEX('-',RCV1.ECCValue,CHARINDEX('-',RCV1.ECCValue)+1) -CHARINDEX('-',RCV1.ECCValue)-1),2) ,
However, this table is updated from an external source and very occasionally the separators differ, for example:
T0-11.86_273
When this occurs I get the error:
Invalid length parameter passed to the LEFT or SUBSTRING function.
I'm very new to SQL and have got myself out of many challenges, but this one has got me stuck. Any help would be much appreciated. Is there a better way to extract this percentage value?
Replace '_' with '-' in the string passed to the CHARINDEX call that computes the length of the substring:
'Percentage' = round(SUBSTRING(RCV1.ECCValue,CHARINDEX('-',RCV1.ECCValue)+1, CHARINDEX('-',replace(RCV1.ECCValue,'_','-'),CHARINDEX('-',RCV1.ECCValue)+1) -CHARINDEX('-',RCV1.ECCValue)-1),2) ,
If you can guarantee the structure of these strings, you can try PARSENAME:
select round(parsename(translate(replace('T0-11.86_273','.',''),'-_','..'),2), 2)/100
Breakdown of steps:
Replace the . character in the percentage value with an empty string using replace.
Replace - or _, whichever is present, with . using translate.
Parse the second element using parsename.
Round it to 2 digits, which will also automatically cast it to the desired numeric type.
Divide by 100 to restore the number as a percentage.
Documentation & Gotchas
Use NULLIF to turn a CHARINDEX result of 0 into NULL, so that SUBSTRING returns NULL for such values instead of raising an error:
round(
SUBSTRING(
RCV1.ECCValue,
NULLIF(CHARINDEX('-', RCV1.ECCValue), 0) + 1,
NULLIF(CHARINDEX('-',
RCV1.ECCValue,
NULLIF(CHARINDEX('-', RCV1.ECCValue), 0) + 1
), 0)
- NULLIF(CHARINDEX('-', RCV1.ECCValue), 0) - 1
),
2)
I strongly recommend that you place the repeated values in a CROSS APPLY (VALUES ...) clause to avoid having to repeat yourself. And do use whitespace, it's free.
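A minimal sketch of that recommendation, combining NULLIF with CROSS APPLY (VALUES ...). The inline sample rows stand in for the RCV1 table, and the aliases p1, p2, FirstDash and SecondDash are names made up for this illustration; the cast to decimal(9, 2) is also an assumption about the desired output type.

```sql
SELECT
    v.ECCValue,
    -- SUBSTRING returns NULL when either position is NULL, so no length error is raised
    CAST(SUBSTRING(v.ECCValue,
                   p1.FirstDash + 1,
                   p2.SecondDash - p1.FirstDash - 1) AS decimal(9, 2)) AS Percentage
FROM (VALUES ('T0-11.86-273'), ('T0-11.86_273')) AS v(ECCValue)
-- Position of the first hyphen, or NULL when there is none
CROSS APPLY (VALUES (NULLIF(CHARINDEX('-', v.ECCValue), 0))) AS p1(FirstDash)
-- Position of the second hyphen, searched from just after the first one
CROSS APPLY (VALUES (NULLIF(CHARINDEX('-', v.ECCValue, p1.FirstDash + 1), 0))) AS p2(SecondDash);
```

Each position is computed once and referenced by name, so the row with the unexpected separator yields NULL rather than an error. If those rows should still produce a value, the separators would need to be normalised first, for example with REPLACE as in the earlier answer.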

SQL Query to select a value between two known strings

I need a SQL query to get the value between two known strings in a text column.
The column name is d_info and the table name is Details.
The text is an XML fragment, but stored as a text value.
What I need is to get the value between the bookends <nettoeinkommen> and </nettoeinkommen> which is 718 in this example.
I also need the output to be saved in new column named income with data type float(8).
land>DE</land></wohnanschrift><taetigkeit>rentner</taetigkeit><dkbkundenstatus><bestandskunde>false</bestandskunde></dkbkundenstatus><haushaltsangaben><einnahmen><einkommen><nettoeinkommen>718</nettoeinkommen></einkommen><kindergeld>0</kindergeld><vermietungverpachtungnetto>0</vermietungverpachtungnetto><elterngeld>0</elterngeld><rentenunbefristet>0</rentenunbefristet><unselbststaendigetaetigkeit>740</unselbststaendigetaetigkeit><geringfuegigebeschaeftigung>0</geringfuegigebeschaeftigung></einnahmen><ausgaben><warmmiete>550</warmmiete><ratenimmobilienfinanzierung>0</ratenimmobilienfinanzierung>
I tried this code:
SELECT cast(SUBSTRING(d_info, CHARINDEX('<nettoeinkommen>', d_info)
, CHARINDEX('</nettoeinkommen>', d_info) - CHARINDEX('<nettoeinkommen>', d_info)) as float(8)) as income
from dbo.Details
But it returns the error "Error converting data type varchar to real."
When I remove the cast function, the script works but it returns <nettoeinkommen>718 instead of only 718.
Thanks.
The substring starts at the start of the opening tag, not at the end of it:
SELECT cast(
SUBSTRING(
d_info,
CHARINDEX('<nettoeinkommen>', d_info) + len('<nettoeinkommen>'),
CHARINDEX('</nettoeinkommen>', d_info) - (CHARINDEX('<nettoeinkommen>', d_info) + len('<nettoeinkommen>'))
) as float(8)) as income
from dbo.Details
You might even define the tags in variables:
DECLARE @startTag varchar(50) = '<nettoeinkommen>';
DECLARE @endTag varchar(50) = '</nettoeinkommen>';
SELECT cast(
SUBSTRING(
d_info,
CHARINDEX(@startTag, d_info) + len(@startTag),
CHARINDEX(@endTag, d_info) - (CHARINDEX(@startTag,d_info)+ len(@startTag))
) as float(8)) as income
from dbo.Details
I think the code is much easier to understand with the variables.
You need to add the length of your opening tag to the start index and subtract it from the length argument of SUBSTRING:
SUBSTRING(d_info, CHARINDEX('<nettoeinkommen>', d_info)+16,
CHARINDEX('</nettoeinkommen>', d_info) - CHARINDEX('<nettoeinkommen>', d_info)-16)
It seems you are querying plain XML data; for that purpose SQL Server provides XQuery functionality:
SELECT CAST(r.d_info AS XML).value('(/haushaltsangaben/einnahmen/einkommen/nettoeinkommen)[1]', 'decimal(19,2)')
FROM
(
SELECT '<taetigkeit>rentner</taetigkeit>
<dkbkundenstatus>
<bestandskunde>false</bestandskunde>
</dkbkundenstatus>
<haushaltsangaben>
<einnahmen>
<einkommen>
<nettoeinkommen>718</nettoeinkommen>
</einkommen>
</einnahmen>
</haushaltsangaben>' AS d_info
) AS r
If you intend to query more information from your source, you will otherwise end up with a pile of nested SUBSTRING and PATINDEX calls, or even your own user-defined functions. XQuery should be more readable and maintainable.
Using XQuery: https://learn.microsoft.com/en-us/sql/t-sql/xml/query-method-xml-data-type
As for your initial issue: the SUBSTRING function returns a portion of a string, starting at a given index and running for a given length. For example, SELECT SUBSTRING('whatever',5,4) returns 'ever'.
CHARINDEX returns the index of the first match of a given search string within a string. For example, SELECT CHARINDEX('ever','whatever') returns 5, as 'ever' starts at the fifth position in 'whatever'.
In your case you need to add the length of '<nettoeinkommen>' to the starting CHARINDEX and subtract that same length from the length of the substring.
Also consider using the decimal or numeric type instead of float if you need precise calculations: https://technet.microsoft.com/en-us/library/ms187912(v=sql.105).aspx
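Putting those two points together, the following sketch extracts the value as decimal and returns NULL, rather than an error, for rows where the tag is missing or the content is not numeric. It assumes SQL Server 2012 or later for TRY_CAST; the inline sample rows and the aliases s, e, StartPos and EndPos are hypothetical and only serve the illustration.

```sql
SELECT
    -- TRY_CAST yields NULL instead of a conversion error when the text is not a number
    TRY_CAST(SUBSTRING(x.d_info, s.StartPos, e.EndPos - s.StartPos) AS decimal(19, 2)) AS income
FROM (VALUES
        ('<einkommen><nettoeinkommen>718</nettoeinkommen></einkommen>'),
        ('<kindergeld>0</kindergeld>')            -- no nettoeinkommen tag in this row
     ) AS x(d_info)
-- First character after the opening tag, or NULL when the tag is absent
CROSS APPLY (VALUES (NULLIF(CHARINDEX('<nettoeinkommen>', x.d_info), 0)
                     + LEN('<nettoeinkommen>'))) AS s(StartPos)
-- Position of the closing tag, or NULL when it is absent
CROSS APPLY (VALUES (NULLIF(CHARINDEX('</nettoeinkommen>', x.d_info), 0))) AS e(EndPos);
```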

SQL: Replacing dates contained within a text string

I am using SQL Server Management Studio 2012. I work with medical records and need to de-identify reports. The reports are structured in a table with columns Report_Date, Report_Subject, Report_Text, etc... The string I need to update is in report_text and there are ~700,000 records.
So if I have:
"patient had an EKG on 04/09/2012"
I need to replace that with:
"patient had an EKG on [DEIDENTIFIED]"
I tried
UPDATE table
SET Report_Text = REPLACE(Report_Text, '____/___/____', '[DEIDENTIFED]')
because I need to replace anything in there that looks like a date. It runs but doesn't actually replace anything, because apparently I can't use the _ wildcard in REPLACE.
Any recommendations on this? Thanks in advance!
You can use PATINDEX to find the location of the date and then use SUBSTRING and REPLACE to replace it.
Since there may be multiple dates in the text, you have to run a WHILE loop to replace all of them.
The SQL below will work for all dates in the form MM/DD/YYYY:
WHILE EXISTS( SELECT 1 FROM dbo.MyTable WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0 )
BEGIN
UPDATE t
SET Report_Text = REPLACE(Report_Text, DateToBeReplaced, '[DEIDENTIFIED]')
FROM ( SELECT * ,
SUBSTRING(Report_Text,PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text), 10) AS DateToBeReplaced
FROM dbo.MyTable AS a
WHERE PATINDEX('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%',Report_Text) > 0
) AS t
END
I have tested the above SQL on a dummy table with a few rows. I don't know how it will scale for your data, but I recommend you give it a try.
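A possible variation on the same idea uses STUFF to overwrite the 10 characters found by PATINDEX directly, instead of extracting the date and passing it to REPLACE. This is only a sketch on made-up sample rows, not a tested replacement for the loop above; the variable name @pattern and the column alias Deidentified are assumptions.

```sql
DECLARE @pattern varchar(100) = '%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]%';

SELECT
    s.Report_Text,
    CASE
        -- STUFF returns NULL for a start position of 0, so only call it when a date was found
        WHEN PATINDEX(@pattern, s.Report_Text) > 0
            THEN STUFF(s.Report_Text, PATINDEX(@pattern, s.Report_Text), 10, '[DEIDENTIFIED]')
        ELSE s.Report_Text
    END AS Deidentified
FROM (VALUES
        ('patient had an EKG on 04/09/2012'),
        ('04/09/2012 EKG for patient'),
        ('no date in this report')
     ) AS s(Report_Text);
```

Like the loop, this handles one date per pass, so text containing several dates would still need to be processed repeatedly until PATINDEX no longer finds a match.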
To keep it simple, assume that any number in the string is an identifying element, so look for the position of the first digit and the position of the last digit in the string. I'm not sure whether this will apply to your entire set of records, but here is the code.
I created two test strings: the one you supplied and one with the date at the beginning of the string.
Declare @tstString varchar(100)
Set @tstString = 'patient had an EKG on 04/09/2012'
-- The second assignment overrides the first; comment it out to test the first string
Set @tstString = '04/09/2012 EKG for patient'
Select @tstString
-- Calculate 1st Occurrence of a Number
,PATINDEX('%[0-9]%',@tstString)
-- Calculate last Occurrence of a Number
,LEN(@tstString) - PATINDEX('%[0-9]%',REVERSE(@tstString))
,CASE
-- No numbers in the string, return the string
WHEN PATINDEX('%[0-9]%',@tstString) = 0 THEN @tstString
-- Number is the first character: find the last digit and replace everything up to it
WHEN PATINDEX('%[0-9]%',@tstString) = 1 THEN
CONCAT('[DEIDENTIFIED]',SUBSTRING(@tstString, LEN(@tstString)-PATINDEX('%[0-9]%',REVERSE(@tstString))+2,LEN(@tstString)))
-- Just select string up to the first number
ELSE CONCAT(SUBSTRING(@tstString,1,PATINDEX('%[0-9]%',@tstString)-1),'[DEIDENTIFIED]')
END AS 'newString'
As you can see, this is messy in SQL.
I would rather achieve this with a parser service and move the data with SSIS and call the service.

DB2 SQL Anything left of a /

I've been working on this for days and can't seem to work it out. Basically I need to return the digits from a field that come before a forward slash, e.g. if the field was 1234/TEXT I want to return 1234. I can't just take the leftmost 4 characters of the field, as the digits vary in length, e.g. 12345/TEXT, so it needs to be anything left of the forward slash. In the world of MS Access, it is something like this, and it works:
Left(TABLE!FIELD,InStr(1,TABLE!FIELD,"/")-1)
However, how do I convert this to be used in an IBM\DB2 system? The DB2 SQL seems somewhat different to 'normal' SQL.
Thanks!
Rather than INSTR, maybe LOCATE
LOCATE(char, string)
char is the search term
string is the string being searched
You can achieve this by combining LOCATE with SUBSTR;
Cheat sheet (for this example);
SUBSTRING('FIELD','START POSITION', 'LENGTH')
LOCATE('SEARCH STRING', 'SOURCE STRING')
SUBSTRING lets you retrieve specific characters from a string, i.e.;
AFIELD = 'Hello'
SUBSTRING(AFIELD,4,2)
Result = 'lo' (position 4 and 5 of Hello)
LOCATE returns the position of the first character of the search string it finds as a number, i.e.;
AFIELD = 'Hello'
LOCATE('ello', AFIELD)
Result = 2 (it starts at position 2)
So you can combine these to do what you want, example;
XTABLE has 1 column called ACOL with the following values in it;
123467/ABCD
1321/ABDD
1123467/ABCD
To just retrieve the numbers;
SELECT SUBSTRING(ACOL,1, LOCATE('/',ACOL)-1)
FROM XRDK/XTABLE
Result;
123467
1321
1123467
What are we doing?
SUBSTRING(
ACOL,
1,
LOCATE('/',ACOL)-1
)
SUBSTRING(
Field ACOL,
Starting at position 1,
Length; using locate set this to where I find a '/' and subtract 1 from the
resulting position (without the -1 you'd have the / on the end)
)
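One case the walkthrough above does not cover is a value with no '/' at all: LOCATE then returns 0, so the computed length becomes -1, which is not a valid length. A minimal sketch of a guard follows, assuming the DB2 version in use accepts a VALUES list as an inline table; the sample rows, the alias T and the output column name DIGITS are made up for illustration.

```sql
SELECT ACOL,
       CASE
           -- Only cut the string when a slash is actually present
           WHEN LOCATE('/', ACOL) > 0
               THEN SUBSTR(ACOL, 1, LOCATE('/', ACOL) - 1)
           -- No slash: return the value unchanged
           ELSE ACOL
       END AS DIGITS
FROM (VALUES ('123467/ABCD'), ('1321/ABDD'), ('NOSLASH')) AS T(ACOL)
```

Whether a value without a slash should come back unchanged or as NULL depends on the data; the ELSE branch is the place to make that choice.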
Try this
SELECT SUBSTRING(CAST (ROUND(COLUMN,2) AS DECIMAL(6,2)), 0, locate('/',CAST (ROUND(COLUMN,2) AS DECIMAL(6,2))))
FROM TABLE

Entering more characters than a function parameter allows - SQL

I have this code snippet I'm playing with (forgive the generic names):
create function GetList
(@d1 varchar(3), @d2 varchar(3), @d3 varchar(3))
returns table
as
return
with List
as
(
select x.pattern
from (values (@d1), (@d2), (@d3)) as x(pattern)
)
select * from list
This is eventually going to be a user-supplied list which they will use to query something else out, but playing around with this made me curious. If I were to run
select * from GetList('1111111','222','333')
I will get the same results as if I had only entered 3 characters for each. Since I limited the varchar parameter to 3 characters, are the others completely ignored? Is there any potential nastiness that can happen if I have a varchar parameter that is 'overflowed' like this (other than the loss of data at the end of the string, of course)?
The other characters are totally ignored, since you limited the parameter to a length of 3.
The only issue that you could have is if you actually wanted to return the characters that are over the length of 3.
For example, you pass in 1234567 and you actually want the whole value, you will only get 123.
If you are limiting the input parameter to 3, then there would be no reason to try and pass in a longer value. If there is a chance that you will pass in longer values, then you should increase the length of the parameter.
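The same silent truncation can be seen with a plain variable, which makes it easy to check without creating the function. A small sketch (the variable name simply mirrors the parameter above):

```sql
-- Assigning a longer literal to a varchar(3) variable keeps only the first 3 characters
DECLARE @d1 varchar(3) = '1111111';

SELECT @d1 AS Value,        -- '111'
       LEN(@d1) AS Length;  -- 3
```

No error or warning is raised for variables and parameters, so a caller has no way to tell that the input was cut short.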