Retrieve text between two periods in a value - sql

I’ve been spinning around a bit on how to accomplish this in SQL DW. I need to extract the text between two periods in a returned value. So my value returned for Result is:
I’m trying to extract the value between the first and second periods (the portion highlighted in red above):
These values vary widely in length.
I’ve got this code:
substring(Result,charindex('.',Result)+1,3) as ResultMid
that results in this:
My problem is that I’m not sure how to compute a variable length, so that I can pull the full value between the two periods. Does anyone know how I can accomplish this?
Thx,
Joe

We can build on your current attempt:
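select result,
       substring(
           result,
           charindex('.', result) + 1,                          -- start just after the first dot
           charindex('.', result, charindex('.', result) + 1)   -- position of the second dot ...
             - charindex('.', result) - 1                       -- ... minus the first dot's position, minus 1
       ) as result_mid
from your_table;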
Rationale: you already have the first two arguments to substring() right. The third argument defines the number of characters to capture. For this, we compute the position of the next dot (.) with the expression charindex('.', result, charindex('.', result) + 1). Then we subtract the position of the first dot from that value, plus 1 more so that the dot itself is excluded, which gives the number of characters to capture.
Demo on DB Fiddle:
result | result_mid
:----------------------- | :---------
sam.pdc.sys.paas.l.com | pdc
sm.ridl.sys.paas.m.com | ridl
s.sandbox.sys.paas.g.com | sandbox
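To check the length arithmetic outside the database, here is a minimal Python sketch. The functions charindex, substring and result_mid are hypothetical stand-ins that assume T-SQL's 1-based positions and its "0 when not found" convention. They are not part of any SQL Server API.

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """Hypothetical stand-in for T-SQL CHARINDEX: 1-based position, 0 if not found."""
    return haystack.find(needle, max(start, 1) - 1) + 1  # find() gives -1 when missing


def substring(s: str, start: int, length: int) -> str:
    """Hypothetical stand-in for T-SQL SUBSTRING, assuming start >= 1."""
    if length < 0:
        # SQL Server raises an error for a negative length, e.g. when there is no second dot
        raise ValueError("Invalid length parameter passed to SUBSTRING")
    return s[start - 1:start - 1 + length]


def result_mid(result: str) -> str:
    first = charindex('.', result)              # position of the first dot
    second = charindex('.', result, first + 1)  # position of the next dot
    # characters strictly between the two dots
    return substring(result, first + 1, second - first - 1)


for value in ["sam.pdc.sys.paas.l.com", "sm.ridl.sys.paas.m.com", "s.sandbox.sys.paas.g.com"]:
    print(value, "->", result_mid(value))
```

A value with no second period ends up with a negative length. In that case the sketch raises an error, just as SQL Server does.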

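If each delimited part of the string has at most 128 characters, try PARSENAME as below. PARSENAME handles names of at most four dot-separated parts, so the string is first cut just before the second period, and part 1 (the last part) of that is the middle value. Otherwise, GMB's solution above is pretty solid.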
select *, parsename(left(result,charindex('.',result,charindex('.',result)+1)-1),1) as mid
from your_table;
Another method uses CROSS APPLY and can easily be modified to extract the 3rd, 4th, ... part of the string (hopefully not too remote a part).
select result, mid
from your_table t1
cross apply (select charindex('.',result) as i1) t2
cross apply (select charindex('.',result,(i1 + 1)) as i2) t3
cross apply (select substring(result,(i1+1),(i2-i1-1)) as mid) t4;
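The chained CROSS APPLY steps generalize naturally, because each step finds the next period after the previous one. Here is a rough Python sketch of that idea. nth_part and charindex are hypothetical helper names. Returning the rest of the string when a period is missing is an assumption of this sketch; the SQL above would instead fail on a negative SUBSTRING length.

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """Hypothetical stand-in for T-SQL CHARINDEX: 1-based position, 0 if not found."""
    return haystack.find(needle, max(start, 1) - 1) + 1


def nth_part(value: str, n: int, sep: str = '.') -> str:
    """Return the n-th sep-delimited part, mirroring i1, i2, ... from the CROSS APPLYs."""
    positions = [0]  # a virtual separator just before the first character
    for _ in range(n):
        nxt = charindex(sep, value, positions[-1] + 1)  # next separator after the previous one
        if nxt == 0:
            nxt = len(value) + 1  # assumption: no more separators means "until the end"
        positions.append(nxt)
    start, end = positions[-2], positions[-1]
    return value[start:end - 1]  # same as SUBSTRING(value, start + 1, end - start - 1)


print(nth_part("sam.pdc.sys.paas.l.com", 2))
```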
DEMO

Related

Write a query to find 2nd best word from the string using sql

I have a table like this:
ID Name
1 Ram played best And he best one
2 Shiv is best in reading and he is best player
3 X is best cook
I need output like this:
ID Second best word
1 one
2 player
3 n/a
I need to return the word that follows the second occurrence of 'best' in the Name string, or n/a if there is none. Can anyone help me crack this?
XQuery makes this task simple.
XQuery is based on ordered sequences. Exactly what we need.
Notable points:
Input string is tokenized as XML (first CROSS APPLY).
XML element with the 2nd "best" word is found (second CROSS APPLY).
Finally, using its position, we get the very next word via the .value() method.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(MAX));
INSERT INTO @tbl (tokens) VALUES
('Ram played best And he best one'),
('Shiv is best in reading and he is best player'),
('X is best cook');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
SELECT ID, tokens
, COALESCE(c.value('(/root/r[position()=sql:column("t2.x")]/text())[1]','VARCHAR(20)'), 'N/A') AS Result
FROM @tbl
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(tokens, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT c.query('for $i in /root/r[./text()="best"][2]
let $pos := count(root/*[. << $i]) + 2
return $pos').value('.','INT')) AS t2(x);
Output
+----+-----------------------------------------------+--------+
| ID | tokens | Result |
+----+-----------------------------------------------+--------+
| 1 | Ram played best And he best one | one |
| 2 | Shiv is best in reading and he is best player | player |
| 3 | X is best cook | N/A |
+----+-----------------------------------------------+--------+
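For comparison, here is the core of the XQuery approach in plain Python: tokenize the string, locate the second "best" by position, and take the next token. second_best_word is a hypothetical name. The sketch assumes single-space separators and an exact, case-sensitive match on the token "best".

```python
def second_best_word(tokens: str, separator: str = " ") -> str:
    words = tokens.split(separator)  # one entry per <r> element
    best_positions = [i for i, word in enumerate(words) if word == "best"]
    if len(best_positions) < 2:
        return "N/A"  # no second "best": mirrors COALESCE(..., 'N/A')
    next_index = best_positions[1] + 1  # the word right after the 2nd "best"
    return words[next_index] if next_index < len(words) else "N/A"


rows = ["Ram played best And he best one",
        "Shiv is best in reading and he is best player",
        "X is best cook"]
for row in rows:
    print(second_best_word(row))
```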
You can use the following:
select
id,
name,
case
when PATINDEX('%best %best %',name)= 0 then 'N/A'
else SUBSTRING(
SUBSTRING(name, PATINDEX('%best %', name)+ 5, LEN(name)),
PATINDEX('%best %', SUBSTRING(name, PATINDEX('%best %', name)+ 5, LEN(name)))+ 5,
LEN(name)
)
end
from
test
You could split the string into rows, but that becomes tricky because SQL Server's STRING_SPLIT does not guarantee ordering.
This method uses CROSS APPLY (VALUES ...) to compute the key positions in the string, and it should work whether or not the next word is at the end of the string.
select Id,
case when IsNull(b2,0)=0 then 'n/a' else
Substring(name2, w, IsNull(NullIf(CharIndex(' ', name2, w), 0), Len(name2) + 1) - w)
end SecondBestWord
from t
cross apply (values(CharIndex('best',name)))x(b1)
cross apply (values(Stuff(name,b1,4,'')))y(name2)
cross apply (values(CharIndex('best',name2)))z(b2)
cross apply (values(Iif(b2=0, 0, b2+5)))w(w)
Example DB<>Fiddle
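To trace the arithmetic, here is a small Python sketch that mirrors the CROSS APPLY (VALUES ...) steps (b1, name2, b2, w). charindex and stuff are hypothetical stand-ins for the T-SQL functions. The stuff stand-in returns None where T-SQL STUFF returns NULL, for example for a start position of 0. The word's length is the next space minus w, so no trailing space is included.

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """Hypothetical stand-in for T-SQL CHARINDEX: 1-based position, 0 if not found."""
    return haystack.find(needle, max(start, 1) - 1) + 1


def stuff(s: str, start: int, length: int, replacement: str):
    """Hypothetical stand-in for T-SQL STUFF; returns None where T-SQL returns NULL."""
    if start < 1 or start > len(s):
        return None
    return s[:start - 1] + replacement + s[start - 1 + length:]


def second_best_word(name: str) -> str:
    b1 = charindex('best', name)          # first 'best'
    name2 = stuff(name, b1, 4, '')        # remove it
    if name2 is None:
        return 'n/a'
    b2 = charindex('best', name2)         # originally the second 'best'
    if b2 == 0:
        return 'n/a'
    w = b2 + 5                            # skip 'best' and the following space
    space = charindex(' ', name2, w)      # end of the next word, 0 if it is the last word
    end = space if space else len(name2) + 1
    return name2[w - 1:end - 1]           # same as SUBSTRING(name2, w, end - w)


print(second_best_word("this is best example where best is not the end of the best sentence"))
```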
One more solution calling XQuery to the rescue, but this one might be a bit simpler. In particular, the predicate *[. << $i] used above can be very slow with larger XML documents.
Using a few more replacements, we can get what you need with this:
(Credit for the sample data goes to Yitzhak Kabinsky.)
I added one more row in which the wanted word is not the last word and another "best" appears later in the sentence...
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(MAX));
INSERT INTO @tbl (tokens) VALUES
('Ram played best And he best one'),
('Shiv is best in reading and he is best player'),
('X is best cook'),
('this is best example where best is not the end of the best sentence');
--the query
SELECT t.tokens
,A.x.value('best[3]/text()[1]','nvarchar(100)')
FROM @tbl t
CROSS APPLY(SELECT CAST(CONCAT('<best>',REPLACE(REPLACE(t.tokens,' best ','</best><best>'),' ','<blank/>'),'</best>') AS XML)) A(x);
One of the intermediate XMLs looks like this:
<best>Shiv<blank/>is</best>
<best>in<blank/>reading<blank/>and<blank/>he<blank/>is</best>
<best>player</best>
The idea in short:
We use " best " to find the keywords, and every other blank to separate the tokens within each chunk.
Now we can pick the 3rd <best> element (we added an extra one at the beginning) and take its first text() node, which is NULL if it does not exist.
You can use COALESCE() to return "N/A" instead of NULL.
Hint:
Using <blank/> (=> self-closing tags) instead of the blanks allows us to address the text()-nodes beneath <best> by their index.
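The same chunking idea can be sketched in Python. Splitting on " best " plays the role of the <best> elements, and splitting a chunk on blanks plays the role of <blank/>. second_best_word is a hypothetical name, and the sketch does not deal with XML escaping.

```python
def second_best_word(tokens: str):
    chunks = tokens.split(" best ")  # each chunk corresponds to one <best> element
    if len(chunks) < 3:
        return None  # best[3] does not exist, so the XPath yields NULL
    words = [w for w in chunks[2].split(" ") if w]  # <blank/> separates the text() nodes
    return words[0] if words else None  # text()[1]


for row in ["Ram played best And he best one",
            "this is best example where best is not the end of the best sentence"]:
    print(second_best_word(row))
```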
My suggestion would be this:
SELECT
ID,
CASE
WHEN Name NOT LIKE '%best %best %' THEN 'n/a'
ELSE REVERSE(LEFT(REVERSE(Name),CHARINDEX('tseb',REVERSE(Name))-2))
END AS [Second best word]
FROM [Table];
First, check whether the word best occurs at least twice. If so, reverse the whole string, take everything before the first occurrence of tseb (best reversed), and reverse that again.
Maybe this is the solution you're looking for :)
EDIT: this always picks the text after the last best, so if best occurs 20 times, you won't get the word after the second one...
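To illustrate the caveat in the EDIT, here is a Python sketch of the REVERSE trick. The function names are hypothetical. SQL Server's default collations are case-insensitive, but the Python comparisons here are case-sensitive.

```python
def has_two_bests(name: str) -> bool:
    """Rough equivalent of name LIKE '%best %best %' (case-sensitive here)."""
    first = name.find("best ")
    return first >= 0 and name.find("best ", first + len("best ")) >= 0


def word_after_last_best(name: str) -> str:
    if not has_two_bests(name):
        return "n/a"
    reversed_name = name[::-1]                 # REVERSE(Name)
    p = reversed_name.find("tseb") + 1         # CHARINDEX('tseb', REVERSE(Name))
    return reversed_name[:p - 2][::-1]         # REVERSE(LEFT(REVERSE(Name), p - 2))


for row in ["Ram played best And he best one",
            "this is best example where best is not the end of the best sentence"]:
    print(word_after_last_best(row))
```

The second row prints sentence rather than is, which is exactly the limitation described in the EDIT.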

Get name after and before certain character in SQL Server

I got the following entry in my database:
\\folder.abc\es\Folder-A\\2020-08-03\namefile.csv
So basically, I want everything after the last \ and before the ., which is namefile in that example.
Thanks in advance.
If you are using an older version of SQL Server that doesn't support STRING_SPLIT, the REVERSE function comes in handy, as follows.
The steps are: reverse the string, get the position of ".", get the position of "\", then apply SUBSTRING to slice out the data between the two positions. Finally, reverse the result again to get the proper value.
Here is an example
with data
as(select '\\folder.abc\es\Folder-A\\2020-08-03\namefile.csv' as col
)
select reverse(substring(reverse(col)
,charindex('.',reverse(col))+1
,charindex('\',reverse(col))
-
charindex('.',reverse(col))-1
)
) as file_name
from data
+-----------+
| file_name |
+-----------+
| namefile |
+-----------+
dbfiddle link
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=8c0fc11f5ec813671228c362f5375126
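The REVERSE/CHARINDEX/SUBSTRING steps can be traced with a short Python sketch. file_name is a hypothetical helper that follows the same 1-based arithmetic. Because the reversed string is searched, the last "." in the original path is the one that counts, so a name like archive.tar.gz yields archive.tar.

```python
def file_name(path: str) -> str:
    rev = path[::-1]                 # REVERSE(col)
    dot = rev.find('.') + 1          # CHARINDEX('.', REVERSE(col)): last dot in the original
    slash = rev.find('\\') + 1       # CHARINDEX('\', REVERSE(col)): last backslash in the original
    start, length = dot + 1, slash - dot - 1
    return rev[start - 1:start - 1 + length][::-1]  # SUBSTRING(...), then REVERSE back


print(file_name(r'\\folder.abc\es\Folder-A\\2020-08-03\namefile.csv'))
```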
You can use:
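select t.*,
       left(s.value, charindex('.', s.value + '.') - 1) as file_name  -- text before the dot
from t cross apply
     string_split(t.entry, '\') s
where s.value <> ''                       -- '\\' yields empty parts, which would match '%'
  and t.entry like concat('%', s.value);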
This splits the string into different components and matches on the one at the end of the string. If components can repeat, the above can return duplicates. That is easily addressed by moving more logic into the apply:
select t.*, s.val
from t cross apply
     (select top (1) left(s.value, charindex('.', s.value + '.') - 1) as val
      from string_split(t.entry, '\') s
      where s.value <> ''
        and t.entry like concat('%', s.value)
     ) s;
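Here is a Python sketch of the split-and-match idea, for tracing which parts survive the filter. file_names is a hypothetical helper. str.endswith stands in for LIKE CONCAT('%', value), ignoring LIKE wildcards and collation. Empty parts are skipped because an empty value turns the pattern into '%', which matches every row.

```python
def file_names(entry: str) -> list:
    results = []
    for value in entry.split('\\'):                  # STRING_SPLIT(entry, '\')
        # skip '' (produced by '\\'); keep the part the whole entry ends with
        if value and entry.endswith(value):
            dot = (value + '.').find('.') + 1        # CHARINDEX('.', value + '.')
            results.append(value[:dot - 1])          # LEFT(value, dot - 1)
    return results


print(file_names(r'\\folder.abc\es\Folder-A\\2020-08-03\namefile.csv'))
```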
You can just use the string functions REVERSE, CHARINDEX, and SUBSTRING.
SELECT
REVERSE(
SUBSTRING(REVERSE('\\folder.abc\es\Folder-A\\2020-08-03\namefile.csv'),
CHARINDEX('.',REVERSE('\\folder.abc\es\Folder-A\\2020-08-03\namefile.csv'))+1,
CHARINDEX('\',REVERSE('\\folder.abc\es\Folder-A\\2020-08-03\namefile.csv'))-
CHARINDEX('.',REVERSE('\\folder.abc\es\Folder-A\\2020-08-03\namefile.csv'))-1))
OR
SELECT
REVERSE
(
SUBSTRING( --get filename
reverse(path), --reverse so that the last \ comes first
CHARINDEX('.',reverse(path))+1,
CHARINDEX('\',reverse(path))- CHARINDEX('.',reverse(path))-1)
) AS file_name
FROM your_table; --hypothetical table with a path column

Split a String based on a specific pattern of characters

Basically, I would like to split a long text field after each date into separate rows that correspond to the dates. The source field "Notes" is a long-running text field containing multiple comments over time, each with a distinct date. Initially, I tried splitting on the '-' after the date, which works to some degree, except where there are dashes elsewhere in the text. So I'm thinking of splitting on each instance of a date (mm/dd/yy). One issue is that the length is not consistent; it could be:
'm/d/yy-' or 'mm/dd/yy-' or 'mm/d/yy-'
Example Data > 'Notes' Column:
3/30/16-Had a meeting 2/5/16-LVM 10/5/15-Spoke to customer
*A single cell could have multiple dates and comments in it
Looking for end result like this:
Date Value
3/30/16 Had a meeting
2/5/16 LVM
10/5/15 Spoke to customer
I am using something basic like the query below, but I'm wondering whether I can get a little more sophisticated with STRING_SPLIT:
SELECT NOTES, VALUE
FROM SRC_TABLE
CROSS APPLY STRING_SPLIT(NOTES, '-')
Appreciate any insights or ideas!
Hmmm. I think this does what you want:
select t.*, s.*
from src_table t outer apply
(select value as value_date
from string_split(t.notes, '-') s
where s.value like '%/%/%' and s.value not like '%/%/%/%'
) s;
EDIT:
If you just want to split on the first -, you can use:
select left(notes, charindex('-', notes + '-') - 1),
stuff(notes, 1, charindex('-', notes + '-'), '')
from src_table;
Here is a db<>fiddle.
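STRING_SPLIT only accepts a single-character separator, so splitting on a date pattern needs a different tool. As an illustration of that alternative technique (not of the answer above), here is a Python sketch that uses a regular expression for m/d/yy, mm/dd/yy or mm/d/yy followed by '-'. split_notes and DATE_PREFIX are hypothetical names, and any text before the first date is dropped.

```python
import re

# Hypothetical pattern: m/d/yy, mm/dd/yy or mm/d/yy directly followed by '-'
DATE_PREFIX = re.compile(r'(\d{1,2}/\d{1,2}/\d{2})-')


def split_notes(notes: str):
    # With a capturing group, re.split returns [before, date1, text1, date2, text2, ...]
    parts = DATE_PREFIX.split(notes)
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


for date, value in split_notes("3/30/16-Had a meeting 2/5/16-LVM 10/5/15-Spoke to customer"):
    print(date, value)
```

A dash that does not follow a date, as in follow-up, is left inside the comment text.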

How can I extract part of a string with different lengths and insert it into a table?

I uploaded some data from an Excel sheet to a table in SQL. I would like to take part of the string that I inserted into the column PPRName and insert it into another table, [Verify].
The data in the column when inserted looks like this:
August 2018 [ NW: Construction MTP021 - Building and Civil Construction: Masonry NQF 3 ]
I want to insert this part of the string :
NW: Construction MTP021 - Building and Civil Construction: Masonry NQF 3
into another table [Verify] for every PPR name in the PPRName column. The PPR names vary in length, but all come in the same format.
I would also like to extract the August 2018 part, cast it as a date and insert it into my table [Verify].
I am not sure how to use CHARINDEX and SUBSTRING to achieve this.
I tried this, but no data was returned:
select SUBSTRING([PPR_Caption],charindex('[',[PPR_Caption]),charindex([PPR_Caption],']'))
FROM [dbo].[PPRS]
You are using the 2nd CHARINDEX incorrectly, and SUBSTRING as well.
SELECT LTRIM(RTRIM(SUBSTRING(PPR_Caption, CHARINDEX('[', PPR_Caption) + 1, CHARINDEX(']', PPR_Caption) - CHARINDEX('[', PPR_Caption) - 1))) AS PPR_Name
FROM PPRS;
SUBSTRING takes a start and a length, not a start and an end point. To get the length, take your end point and subtract the start point (and correct the one-position offset with -1). Use single quotes for string literals: by default, double quotes delimit identifiers in SQL Server. LTRIM(RTRIM(...)) removes the spaces just inside the brackets.
In your 2nd CHARINDEX, you swapped the string to search in and the string to search for.
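A small Python sketch can check the start/length arithmetic and show why the swapped arguments returned nothing. charindex and substring are hypothetical stand-ins for the T-SQL functions with 1-based positions, and .strip() removes the spaces just inside the brackets.

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """Hypothetical stand-in for T-SQL CHARINDEX: 1-based position, 0 if not found."""
    return haystack.find(needle, max(start, 1) - 1) + 1


def substring(s: str, start: int, length: int) -> str:
    """Hypothetical stand-in for T-SQL SUBSTRING(s, start, length) with start >= 1."""
    return s[start - 1:start - 1 + length]


caption = ("August 2018 [ NW: Construction MTP021 - Building and Civil "
           "Construction: Masonry NQF 3 ]")

open_pos = charindex('[', caption)
close_pos = charindex(']', caption)
# length = end point - start point - 1, so neither bracket is included
name = substring(caption, open_pos + 1, close_pos - open_pos - 1).strip()
print(name)

# Swapped arguments: searching for the whole caption inside ']' finds nothing (0),
# so the original SUBSTRING was asked for 0 characters.
print(charindex(caption, ']'))
```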
String operations like this are cumbersome in SQL Server.
Try this:
select replace(v2.str_rest, ' ]', '') as name, cast(str_start as date) as dte
from (values ('August 2018 [ NW: Construction MTP021 - Building and Civil Construction: Masonry NQF 3 ]')
) v(str) cross apply
(values (stuff(v.str, 1, charindex('[', str) + 1, ''), substring(v.str, 1, charindex('[', str) -1))
) v2(str_rest, str_start);
SQL Server recognizes a number of alphabetic date formats, so it can convert 'August 2018' to a date even without a day of the month; it supplies the first day of the month. Month names are interpreted according to the session language.
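To trace the STUFF/SUBSTRING positions and the month-only date, here is a Python sketch. It is only an analogy. The variable names follow the query, and datetime.strptime with "%B %Y" (which assumes English month names in the default locale) fills in day 1, like the conversion described above.

```python
from datetime import datetime

caption = ("August 2018 [ NW: Construction MTP021 - Building and Civil "
           "Construction: Masonry NQF 3 ]")

bracket = caption.find('[') + 1     # CHARINDEX('[', str), 1-based
str_rest = caption[bracket + 1:]    # STUFF(str, 1, bracket + 1, ''): drop up to "[" plus one space
str_start = caption[:bracket - 1]   # SUBSTRING(str, 1, bracket - 1): "August 2018 "
name = str_rest.replace(' ]', '')   # REPLACE(str_rest, ' ]', '')
dte = datetime.strptime(str_start.strip(), "%B %Y").date()  # missing day defaults to 1

print(name)
print(dte)
```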

SQL Summing digits of a number

I'm using Presto. I have a numeric ID field, and I want a column that adds up the digits of the ID. So if ID = 1234, I want the column to output 10, i.e. 1+2+3+4.
I could use substring to extract each digit and sum them, but is there a function I can use, or a simpler way?
You can combine regexp_extract_all from @akuhn's answer with the lambda support recently added to Presto. That way you don't need to UNNEST. The code would be quite self-explanatory if not for the need to cast to and from varchar:
presto> select
reduce(
regexp_extract_all(cast(x as varchar), '\d'), -- split into digits array
0, -- initial reduction element
(s, x) -> s + cast(x as integer), -- reduction function
s -> s -- finalization
) sum_of_digits
from (values 1234) t(x);
sum_of_digits
---------------
10
(1 row)
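For comparison, here is the same split-then-fold idea sketched in Python. re.findall plays the role of regexp_extract_all, and functools.reduce the role of Presto's reduce with an initial state of 0. sum_of_digits is a hypothetical name.

```python
import re
from functools import reduce


def sum_of_digits(x: int) -> int:
    digits = re.findall(r'\d', str(x))                 # split into a list of digit strings
    return reduce(lambda s, d: s + int(d), digits, 0)  # fold, starting from 0


print(sum_of_digits(1234))
```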
If I'm reading your question correctly, you want to avoid hard-coding a substring grab for each numeral in the ID, like substring(ID,1,1) + substring(ID,2,1) + ... + substring(ID,n,1). That approach is inelegant and only works if all your ID values are the same length anyway.
What you can do instead is use a recursive CTE. This approach also works for ID values of varying lengths.
Disclaimer: this still technically uses substring, but it avoids the clumsy hard-coded grabs.
WITH recur (ID, place, ID_sum)
AS
(
SELECT ID, 1 , CAST(substring(CAST(ID as varchar),1,1) as int)
FROM SO_rbase
UNION ALL
SELECT ID, place + 1, ID_sum + substring(CAST(ID as varchar),place+1,1)
FROM recur
WHERE len(ID) >= place + 1
)
SELECT ID, max(ID_SUM) as ID_sum
FROM recur
GROUP BY ID
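Here is a Python sketch of what the recursive CTE computes. Each step moves one place to the right and adds that digit to the running sum. digit_sum is a hypothetical name, and the sketch assumes non-negative IDs. It returns the final running sum directly; MAX(ID_sum) picks out the same value because the sum never decreases.

```python
def digit_sum(id_value: int) -> int:
    text = str(id_value)                 # CAST(ID AS varchar)
    place, id_sum = 1, int(text[0])      # anchor member: the first digit
    while len(text) >= place + 1:        # WHERE len(ID) >= place + 1
        place += 1
        id_sum += int(text[place - 1])   # substring(..., place, 1)
    return id_sum


print(digit_sum(1234))
```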
First, use REGEXP_EXTRACT_ALL to split the string into its digits. Then use CROSS JOIN UNNEST with GROUP BY to group the extracted digits by their number and sum them.
Here:
WITH my_table AS (SELECT * FROM (VALUES ('12345'), ('42'), ('789')) AS a (num))
SELECT
num,
SUM(CAST(digit AS BIGINT))
FROM
my_table
CROSS JOIN
UNNEST(REGEXP_EXTRACT_ALL(num,'\d')) AS b (digit)
GROUP BY
num
;
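Here is a Python analogue of the UNNEST plus GROUP BY version. Each number is expanded into one entry per digit, and the digits are summed per number. digit_sums is a hypothetical name. Like GROUP BY num, it merges duplicate num values into one total, and a num with no digits produces no row.

```python
import re
from collections import defaultdict


def digit_sums(nums):
    totals = defaultdict(int)
    for num in nums:                              # one row per num
        for digit in re.findall(r'\d', num):      # CROSS JOIN UNNEST(REGEXP_EXTRACT_ALL(num, '\d'))
            totals[num] += int(digit)             # SUM(CAST(digit AS BIGINT)) ... GROUP BY num
    return dict(totals)


print(digit_sums(['12345', '42', '789']))
```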