SQL Query to select a value between two known strings

I need a SQL query to get the value between two known strings in a text column.
The column name is d_info and the table name is Details.
The text is an XML fragment, but stored as a text value.
What I need is to get the value between the bookends <nettoeinkommen> and </nettoeinkommen> which is 718 in this example.
I also need the output to be saved in new column named income with data type float(8).
land>DE</land></wohnanschrift><taetigkeit>rentner</taetigkeit><dkbkundenstatus><bestandskunde>false</bestandskunde></dkbkundenstatus><haushaltsangaben><einnahmen><einkommen><nettoeinkommen>718</nettoeinkommen></einkommen><kindergeld>0</kindergeld><vermietungverpachtungnetto>0</vermietungverpachtungnetto><elterngeld>0</elterngeld><rentenunbefristet>0</rentenunbefristet><unselbststaendigetaetigkeit>740</unselbststaendigetaetigkeit><geringfuegigebeschaeftigung>0</geringfuegigebeschaeftigung></einnahmen><ausgaben><warmmiete>550</warmmiete><ratenimmobilienfinanzierung>0</ratenimmobilienfinanzierung>
I tried this code:
SELECT cast(SUBSTRING(d_info, CHARINDEX('<nettoeinkommen>', d_info)
, CHARINDEX('</nettoeinkommen>', d_info) - CHARINDEX('<nettoeinkommen>', d_info)) as float(8)) as income
from dbo.Details
But it's returning an Error converting data type varchar to real.
When I remove the cast function, the script works but it returns <nettoeinkommen>718 instead of only 718.
Thanks.

Your SUBSTRING starts at the beginning of the opening tag rather than at the end of it, so the tag text ends up in the result and the cast fails.
SELECT cast(
SUBSTRING(
d_info,
CHARINDEX('<nettoeinkommen>', d_info) + len('<nettoeinkommen>'),
CHARINDEX('</nettoeinkommen>', d_info) - (CHARINDEX('<nettoeinkommen>', d_info) + len('<nettoeinkommen>'))
) as float(8)) as income
from dbo.Details
You might even define the tags as variables:
SELECT cast(
SUBSTRING(
d_info,
CHARINDEX(@startTag, d_info) + len(@startTag),
CHARINDEX(@endTag, d_info) - (CHARINDEX(@startTag, d_info) + len(@startTag))
) as float(8)) as income
from dbo.Details
I think the code is much easier to understand with the variables.
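For completeness, here is a minimal sketch of the variable version with the declarations included. The variable sizes and the one-row VALUES list (a stand-in for dbo.Details) are my assumptions, not part of the original query.

```sql
-- Tag names held in variables; varchar(50) is an assumed size.
DECLARE @startTag varchar(50) = '<nettoeinkommen>';
DECLARE @endTag   varchar(50) = '</nettoeinkommen>';

-- The VALUES row is hypothetical sample data standing in for dbo.Details.
SELECT CAST(
         SUBSTRING(
           s.d_info,
           CHARINDEX(@startTag, s.d_info) + LEN(@startTag),
           CHARINDEX(@endTag, s.d_info)
             - (CHARINDEX(@startTag, s.d_info) + LEN(@startTag))
         ) AS float(8)) AS income   -- 718 for the sample row
FROM (VALUES ('<einkommen><nettoeinkommen>718</nettoeinkommen></einkommen>')) AS s(d_info);
```

Note that a row without the tags would still fail with a negative length, so this assumes every row contains both tags.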

You need to add the length of your opening tag to the start index and subtract it from the length argument of your SUBSTRING call:
SUBSTRING(d_info, CHARINDEX('<nettoeinkommen>', d_info)+16,
CHARINDEX('</nettoeinkommen>', d_info) - CHARINDEX('<nettoeinkommen>', d_info)-16)

It seems you are querying plain XML data. For that purpose SQL Server provides XQuery functionality:
SELECT CAST(r.d_info AS XML).value('(/haushaltsangaben/einnahmen/einkommen/nettoeinkommen)[1]', 'decimal(19,2)')
FROM
(
SELECT '<taetigkeit>rentner</taetigkeit>
<dkbkundenstatus>
<bestandskunde>false</bestandskunde>
</dkbkundenstatus>
<haushaltsangaben>
<einnahmen>
<einkommen>
<nettoeinkommen>718</nettoeinkommen>
</einkommen>
</einnahmen>
</haushaltsangaben>' AS d_info
) AS r
If you intend to query more info from your source, you will end up with a bunch of stacked SUBSTRING and PATINDEX calls, or even your own user-defined functions. The XQuery approach should be more readable and maintainable.
Using XQuery: https://learn.microsoft.com/en-us/sql/t-sql/xml/query-method-xml-data-type
As for your initial issue: the SUBSTRING function in SQL returns the part of a string that starts at a given index and has a given length. For example, SELECT SUBSTRING('whatever',5,4) returns 'ever'.
CHARINDEX returns the index of the first match of a given search string within a string. For example, SELECT CHARINDEX('ever','whatever') returns 5, as 'ever' starts at the fifth position in 'whatever'.
So in your case you need to add the length of '<nettoeinkommen>' to the starting CHARINDEX and subtract that same length from the length of the substring.
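The arithmetic can be checked on a shortened string. This is only a sketch: the literal below is a hypothetical stand-in for d_info, and the column aliases are made up for illustration.

```sql
-- Hypothetical one-row input standing in for the d_info column.
SELECT
    CHARINDEX('<nettoeinkommen>', x.d_info)  AS tag_start,    -- 1
    CHARINDEX('<nettoeinkommen>', x.d_info)
        + LEN('<nettoeinkommen>')            AS value_start,  -- 1 + 16 = 17
    CHARINDEX('</nettoeinkommen>', x.d_info) AS value_end,    -- 20
    SUBSTRING(
        x.d_info,
        CHARINDEX('<nettoeinkommen>', x.d_info) + LEN('<nettoeinkommen>'),
        CHARINDEX('</nettoeinkommen>', x.d_info)
            - CHARINDEX('<nettoeinkommen>', x.d_info)
            - LEN('<nettoeinkommen>')        -- 20 - 1 - 16 = 3
    )                                        AS income_text   -- '718'
FROM (VALUES ('<nettoeinkommen>718</nettoeinkommen>')) AS x(d_info);
```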
Also consider using the decimal or numeric type instead of float if you need precise calculations: https://technet.microsoft.com/en-us/library/ms187912(v=sql.105).aspx

Related

Problem with using SUBSTRING and CHARINDEX

I have a column (RCV1.ECCValue) in a table which 99% of the time has a constant string format, for example:
T0-11.86-273
The part between the two hyphens is a percentage. I'm using the SQL below to obtain this figure; it works fine and returns 11.86 for the example above when the data in the table is in that format:
'Percentage' = round(SUBSTRING(RCV1.ECCValue,CHARINDEX('-',RCV1.ECCValue)+1, CHARINDEX('-',RCV1.ECCValue,CHARINDEX('-',RCV1.ECCValue)+1) -CHARINDEX('-',RCV1.ECCValue)-1),2) ,
However...this table is updated from an external source and very occasionally the separators differ, for example:
T0-11.86_273
when this occurs I get the error:
Invalid length parameter passed to the LEFT or SUBSTRING function.
I'm very new to SQL and have got myself out of many challenges, but this one has got me stuck. Any help would be much appreciated. Is there a better way to extract this percentage value?
Replace '_' with '-' in the string passed to the CHARINDEX call that computes the substring length:
'Percentage' = round(SUBSTRING(RCV1.ECCValue,CHARINDEX('-',RCV1.ECCValue)+1, CHARINDEX('-',replace(RCV1.ECCValue,'_','-'),CHARINDEX('-',RCV1.ECCValue)+1) -CHARINDEX('-',RCV1.ECCValue)-1),2) ,
If you can guarantee the structure of these strings, you can try parsename
select round(parsename(translate(replace('T0-11.86_273','.',''),'-_','..'),2), 2)/100
Breakdown of steps:
Replace the . character in the percentage value with an empty string using replace.
Replace - or _, whichever is present, with . using translate.
Parse the second element using parsename.
Round it to 2 digits, which will also automatically cast it to the desired numeric type.
Divide by 100 to restore the number as a percentage.
Documentation & Gotchas
Use NULLIF to null out such values
round(
SUBSTRING(
RCV1.ECCValue,
NULLIF(CHARINDEX('-', RCV1.ECCValue), 0) + 1,
NULLIF(CHARINDEX('-',
RCV1.ECCValue,
NULLIF(CHARINDEX('-', RCV1.ECCValue), 0) + 1
), 0)
- NULLIF(CHARINDEX('-', RCV1.ECCValue), 0) - 1
),
2)
I strongly recommend that you place the repeated values in CROSS APPLY (VALUES ...) to avoid having to repeat yourself. And do use whitespace, it's free.
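A minimal sketch of that CROSS APPLY (VALUES ...) idea follows. The two-row VALUES list stands in for the real table, the aliases p1 and p2 are names I made up, and the explicit CAST to float is my addition to make the numeric conversion visible.

```sql
-- Each hyphen position is computed once and reused by name.
SELECT
    t.ECCValue,
    ROUND(
        CAST(SUBSTRING(t.ECCValue, p1.pos + 1, p2.pos - p1.pos - 1) AS float),
        2) AS Percentage          -- 11.86 for the first row, NULL for the second
FROM (VALUES ('T0-11.86-273'), ('T0-11.86_273')) AS t(ECCValue)  -- hypothetical sample rows
CROSS APPLY (VALUES (NULLIF(CHARINDEX('-', t.ECCValue), 0))) AS p1(pos)
CROSS APPLY (VALUES (NULLIF(CHARINDEX('-', t.ECCValue, p1.pos + 1), 0))) AS p2(pos);
```

When the second hyphen is missing, p2.pos is NULL, so the SUBSTRING length is NULL and the whole expression returns NULL instead of raising the invalid length error.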

SQL Server - Combine string to integer where integer can have a variable number of leading zeros

I have a report in SQL Server Report Builder which brings back the profession acronym (string) and registration number (integer) for each professional in a separate SQL database.
The registration number can be 5 or more digits long, and may start with one or more zeros. For example:
Profession Registration #
AB 00162
PH 02272
SA 13925
SA 026025
DA 1025927
I'm trying to put the profession acronym and registration number together into a registration ID, because I need to compare this with the registration ID from another (non SQL) database.
I'm trying to get something like this:
Registration ID
AB00162
PH02272
SA13925
SA026025
DA1025927
I've tried converting the integers to strings using the following in my query:
REGISTRY.PROFESSION + right('00000' + cast(REGISTRY.REGISTRATION_NO as varchar(8)), 5) as Full_Reg_Number
However, with the above the integers that are more than 5 digits long get cut off, and if I increase '00000' to, say, '0000000' and the number '5' to '7' in the above, the integers that only have 5 digits are padded with extra leading zeros.
I do not have permission to change the formatting of the integers in either database.
Integers aren't stored with leading zeroes. If the values are stored like that, then the field is NOT of integer type in the first place. Simply do:
Registry.profession + registry.registration_no
You can confirm that the stored type is not an integer as follows:
select data_type
from information_schema.columns
where table_name = 'registry'
and column_name = 'registration_no'
If you're getting a type conversion error as you mention in your comments, then most likely the error is not coming due to this concatenation. It's probably down the line, such as if you're using 'Full_Reg_Number' in a 'where' statement or other comparison that expects a comparison to an integer, and instead is getting a varchar. After all, you called the column 'Full_Reg_Number' even though it's not a number.
Based on your problems, I suspect those really are integers. You've just shown them with leading zeros in the question.
A simple solution is to use case:
(REGISTRY.PROFESSION +
CASE WHEN REGISTRY.REGISTRATION_NO < 10000 THEN right('00000' + cast(REGISTRY.REGISTRATION_NO as varchar(8)), 5)
ELSE cast(REGISTRY.REGISTRATION_NO as varchar(8))
END
) as Full_Reg_Number
An even simpler method uses FORMAT():
(REGISTRY.PROFESSION + FORMAT(REGISTRY.REGISTRATION_NO, '00000')
) as Full_Reg_Number
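To see how the '00000' format string behaves for short and long numbers, here is a small sketch. The VALUES rows are hypothetical integers based on the question's examples, and the alias r is made up for illustration.

```sql
-- '00000' pads to at least five digits and never truncates longer numbers.
SELECT r.PROFESSION + FORMAT(r.REGISTRATION_NO, '00000') AS Full_Reg_Number
FROM (VALUES ('AB', 162),
             ('SA', 13925),
             ('DA', 1025927)) AS r(PROFESSION, REGISTRATION_NO);
-- AB00162
-- SA13925
-- DA1025927
```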

Translate function not returning relevant string in amazon redshift

I am trying to use a simple TRANSLATE function to remove "-" from a 23-character string. An example of one such string is "1049477-1623095-2412303". The expected outcome of my query is 104947716230952412303.
All such values are stored in a single table, "table1". The name of the column is "data".
My query is
Select TRANSLATE(t.data, '-', '')
from table1 as t
However, it is returning 104947716230952000000 as the output.
At first, I thought it was an overflow error since the resulting integer has 21 digits, so I also tried the following:
SELECT CAST(TRANSLATE(t.data,'-','') AS VARCHAR)
from table1 as t
but this does not work either.
Please suggest a way to get the desired output.
This is too long for a comment.
This code:
select translate('1049477-1623095-2412303', '-', '')
is going to return:
'104947716230952412303'
The return value is a string, not a number.
There is no way that it can return '104947716230952000000'. I could only imagine that happening if somehow the value is being converted to a numeric or bigint type.
Try regexp_replace()
Taking your own example, execute:
select regexp_replace('[string / column_name]','-');
It can be achieved with RPAD; try the code below.
SELECT RPAD(TRANSLATE(CAST(t.data as VARCHAR),'-','') ,20,'00000000000000000000')

How to substring records with variable length

I have a table which has a column with doc locations, such as AA/BB/CC/EE
I am trying to get only one of these parts, let's say just the CC part (which has variable length). So far I've tried the following:
SELECT RIGHT(doclocation,CHARINDEX('/',REVERSE(doclocation),0)-1)
FROM Table
WHERE doclocation LIKE '%CC %'
But I'm not getting the expected result
Use the PARSENAME function like this:
DECLARE @s VARCHAR(100) = 'AA/BB/CC/EE'
SELECT PARSENAME(replace(@s, '/', '.'), 2)
This is painful to do in SQL Server. One method is a series of string operations. I find this simplest using outer apply (unless I need subqueries for a different reason):
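select *
from t outer apply
-- remove everything up to and including the second '/'
(select stuff(t.doclocation, 1, charindex('/', t.doclocation, charindex('/', t.doclocation) + 1), '') as doclocation2) t2 outer apply
-- keep the text before the next '/'; the appended '/' covers the case where there is none
(select left(t2.doclocation2, charindex('/', t2.doclocation2 + '/') - 1) as cc
) t3;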
The PARSENAME function is used to get the specified part of an object name and should not be used for this purpose, as it will only parse strings with at most 4 parts (see the SQL Server PARSENAME documentation at MSDN).
SQL Server 2016 has a new function, STRING_SPLIT, but if you don't use SQL Server 2016 you have to fall back on the solutions described here: How do I split a string so I can access item x?
The question is not entirely clear. Can you please specify which value you need? If you need the values after CC, then you can use CHARINDEX on "CC". Also, the query does not seem correct: the string you provided is "AA/BB/CC/EE", which has no space in it, but the query searches for a space with WHERE doclocation LIKE '%CC %'.
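On versions that support the optional third argument of STRING_SPLIT (as far as I know SQL Server 2022 and Azure SQL Database, not SQL Server 2016), the position of each part is available as an ordinal column. A minimal sketch, using the literal from the question as hypothetical input:

```sql
-- The third argument (1) enables the ordinal column; requires a version that supports it.
SELECT s.value AS cc          -- 'CC'
FROM STRING_SPLIT('AA/BB/CC/EE', '/', 1) AS s
WHERE s.ordinal = 3;          -- third part, counted from the left
```

Without the ordinal argument, STRING_SPLIT does not document any output order, so picking "item x" from it is not reliable.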
SELECT SUBSTRING(doclocation,CHARINDEX('CC',doclocation)+2,LEN(doclocation))
FROM Table
WHERE doclocation LIKE '%CC %'

How to use function in subquery

I have one table named MemberCheque where the fields are:
MemberName, Amount
I want to show the name and the respective amount in numbers as well as in words, after separating the integer part of the amount from the decimal part. So my query is like:
SELECT MemName, Amount, (SELECT (Amount)%1*100 AS lefAmn, dbo.fnNumberToWords(lefAmn)+
'Hundred ', (Amount) - (Amount)%1 AS righAmnt, dbo.fnNumberToWords (righAmnt)+' Cents'
from MemberCheque) AS AmountInWords FROM MemberCheque
but my function can only take an integer value to convert into words. So I am separating the Amount into the parts before and after the decimal point, but when I try to run this query it gives me an error that lefAmn and righAmnt are not recognised, because I am trying to pass them as parameters within the same query.
The first problem is that you have a subquery that is returning more than one value, and that is not allowed for a subquery in the select clause.
The answer to your specific question is to use cast() (or convert()) to make the numbers integers:
select leftAmt, rightAmt,
(dbo.fnNumberToWords(cast(leftAmt as int))+'Hundred ' +
dbo.fnNumberToWords(cast(rightAmt as int))+' Cents'
) as AmountInWords
from (SELECT (Amount%1)*100 AS leftAmt,
(Amount) - (Amount)%1 AS rightAmt
from MemberCheque
) mc
If you can't alter your function, then CAST the left/right values as INT:
CAST((Amount)%1*100 AS INT) AS lefAmn
CAST((Amount) - (Amount)%1 AS INT) AS righAmnt
You can't pass an alias created in the same statement as your function parameter; you need to repeat the expression:
dbo.fnNumberToWords (CAST((Amount)%1*100 AS INT))
dbo.fnNumberToWords (CAST((Amount) - (Amount)%1 AS INT))
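To check the two CAST expressions on their own, here is a small sketch without the user-defined function. The single decimal(10,2) value is hypothetical sample data, and the aliases WholePart and CentsPart are names I made up.

```sql
-- Split a decimal amount into its whole part and its two-digit fractional part.
SELECT
    m.Amount,
    CAST(m.Amount - m.Amount % 1 AS int) AS WholePart,   -- 123
    CAST(m.Amount % 1 * 100 AS int)      AS CentsPart    -- 45
FROM (VALUES (CAST(123.45 AS decimal(10, 2)))) AS m(Amount);
```

Each of these integer expressions can then be passed directly to dbo.fnNumberToWords, as in the calls above.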