How to find part of a string in an HL7-like free-text parameter between e.g. the 2nd and 3rd | character? - sql

In a program I created some HL7-like strings.
They look like this:
ID=1610968|EAD=02962|CNR=0|ACT=10968|ACTNAME=bijkomend honorarium voor toezicht COVID-19-patient|TIME=2/02/2023 16:21:00|EENHEID=30016|AFDCODE=KANE|AANTAL=1|URG=0|INF=0|TOPO=0|ARTS=avdbro9|SUP=avdbro9
I am looking for a SQL query with which I can split this string into separate parts at the | character.
For instance, I would like to isolate the part |ACT=10968|, which will always be between the third and fourth |.
How can I do this?

If you are using SQL Server, try the STRING_SPLIT function and then select the element you need from its output.
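The split-then-pick idea can be sketched outside the database as well. The snippet below is only an illustration in Python, not SQL Server code; the variable names (`record`, `fields`, `act_field`, `parsed`) are made up for this example, and it assumes that no field value itself contains a | character.

```python
# Sample record taken from the question.
record = (
    "ID=1610968|EAD=02962|CNR=0|ACT=10968|ACTNAME=bijkomend honorarium voor "
    "toezicht COVID-19-patient|TIME=2/02/2023 16:21:00|EENHEID=30016|AFDCODE=KANE|"
    "AANTAL=1|URG=0|INF=0|TOPO=0|ARTS=avdbro9|SUP=avdbro9"
)

# Split on the delimiter; the part between the third and fourth '|' is index 3.
fields = record.split("|")
act_field = fields[3]

# Optionally turn the KEY=VALUE pairs into a lookup table.
# maxsplit=1 keeps any '=' inside a value intact.
parsed = dict(field.split("=", 1) for field in fields)

print(act_field)       # ACT=10968
print(parsed["ACT"])   # 10968
```

Looking fields up by key rather than by position is more robust if the order of the segments ever changes.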


TERADATA REGEXP_SUBSTR Get string between two values

I am fairly new to Teradata, and I am trying to understand how to use REGEXP_SUBSTR.
For example, I have the following cell value: ABCD^1234567890^1
How can I extract 1234567890?
What I attempted to do is the following:
REGEXP_SUBSTR(x, '(?<=^).*?(?=^)')
But this didn't seem to work.
Can anyone help?
It might (or might not) be possible to use REGEXP_SUBSTR() to handle this, but you would need to use a capture group. An alternative here would be to do a regex replacement instead:
SELECT x, REGEXP_REPLACE(x, '^.*?\^|\^.*$', '') AS output
FROM yourTable;
The regex pattern used here matches:
^.*?\^ everything from the start to the first ^
| OR
\^.*$ everything from the second ^ to the end
We then replace with empty string to remove the content being matched.
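To check the pattern outside Teradata, here is a small Python sketch using the `re` module. It is an illustration only: Python's regex engine is not Teradata's, although this particular pattern uses only constructs common to both. It also shows the capture-group alternative mentioned above. Note that an unescaped ^ is a start-of-string anchor, so the literal caret has to be written as \^.

```python
import re

value = "ABCD^1234567890^1"

# Replacement approach: delete everything up to the first '^'
# and everything from the next '^' to the end.
by_replace = re.sub(r"^.*?\^|\^.*$", "", value)

# Capture-group approach: keep only what sits between the first two '^'.
match = re.search(r"\^(.*?)\^", value)
by_capture = match.group(1) if match else None

print(by_replace)   # 1234567890
print(by_capture)   # 1234567890
```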

How do I print the first occurrence of a string after a special character in Hive using regexp_extract or split?

I am having a deep dilemma in hive. My data set in Hive looks like this:
In each instance, I want to print only the first string after the #. So the output should be something like this:
I tried using the regexp_extract function, but alas I am getting only NULL values. Since Hive doesn't support regexp_substr, the following syntax doesn't work:
Any suggestions are welcome!
You can use a combination of regexp_replace and substr.
First collapse every run of consecutive # characters into a single # using regexp_replace().
regexp_replace(col,'#+','#') -- for data '#####123##' this will produce '#123#'
Then remove the leading # using substr, and use instr to take everything from the start up to the next #.
substr(substr(str,2),1, instr(substr(str,2),'#')-1) this will produce '123'
The whole query is shown below.
select substr(substr(str,2),1, instr(substr(str,2),'#')-1) as result
from (
SELECT regexp_replace('#####123##','#+','#') as str) a
I assumed the string always starts with #. If it may not, add a check such as if left(str,1)='#'... and handle the other case according to your data.
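The same steps can be traced in Python to see what each one produces. This is a sketch of the logic only, not Hive code; the function name `first_token_after_hash` is hypothetical, and like the query it assumes the value starts with #.

```python
import re

def first_token_after_hash(value):
    # Step 1: collapse runs of '#' into one, e.g. '#####123##' -> '#123#'.
    collapsed = re.sub(r"#+", "#", value)
    # Step 2: drop the leading '#', e.g. '#123#' -> '123#'.
    rest = collapsed[1:]
    # Step 3: keep everything before the next '#'.
    end = rest.find("#")
    # Unlike the SQL version, return the whole remainder when no '#' follows.
    return rest[:end] if end != -1 else rest

print(first_token_after_hash("#####123##"))   # 123
```

The fallback in the last step is a deliberate difference from the query, where a missing trailing # makes instr return 0 and the result comes back empty.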

Using Regex to get the substring between underscore 2 and underscore 3 of a string

I have a string like: Title Name_2021-04-13_A+B+C_Division.txt. I need to extract the A+B+C. The A+B+C may be other letters. I believe that using Regex would be the simplest way to do this. In other words I need to get the substring between underscore 2 and underscore 3 of string. All of my code is written in I have tried:
boatClass = Regex.Match(myFile, "(?<=_)(.*)(?=_)").ToString
I know this is not right but I think it is close. What do I need to add or change?
The regex code that will extract a substring between the second and third underscore of a string is:
However, I chose to use the split function:
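As an illustration of both approaches, here is a minimal Python sketch. It is not the poster's original code and is written in a different language than the question; the pattern and the variable names are assumptions chosen for this example.

```python
import re

file_name = "Title Name_2021-04-13_A+B+C_Division.txt"

# Regex approach: skip two underscore-terminated chunks, then capture
# everything up to the third underscore.
match = re.match(r"^[^_]*_[^_]*_([^_]*)_", file_name)
via_regex = match.group(1) if match else None

# Split approach: the text between underscore 2 and underscore 3 is
# element 2, and it only exists if there are at least three underscores.
parts = file_name.split("_")
via_split = parts[2] if len(parts) > 3 else None

print(via_regex)   # A+B+C
print(via_split)   # A+B+C
```

The pattern in the question, (?<=_)(.*)(?=_), is greedy, so it matches from the first underscore to the last one rather than stopping at the third.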

How to replace part of a string in SQLite using Regex?

If I have the following strings:
"123456abcd" and "123456"
I would like to find and replace where the numbers 123456 appear in that order. I know that regex has to be used in SQLite, but I cannot seem to find which function to use (either REPLACE or UPDATE).
For example, in the above I would like to replace 123456 with the string "cheese", so that I end up with the following:
"cheeseabcd" and "cheese"
I am stuck on how to solve this in SQLite! Every time I make a change, it changes the entire string and not just part of the string!
I don't see any regular expressions here. Just:
replace(col, '123456', 'cheese')
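To see how replace() and UPDATE fit together, here is a small sketch that runs against an in-memory SQLite database through Python's built-in sqlite3 module. The table name `demo` is made up for this example; the column name `col` follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col TEXT)")
conn.executemany(
    "INSERT INTO demo (col) VALUES (?)",
    [("123456abcd",), ("123456",)],
)

# replace() rewrites only the matching substring; UPDATE stores the result.
conn.execute(
    "UPDATE demo SET col = replace(col, '123456', 'cheese') "
    "WHERE col LIKE '%123456%'"
)

rows = [row[0] for row in conn.execute("SELECT col FROM demo ORDER BY rowid")]
print(rows)   # ['cheeseabcd', 'cheese']
conn.close()
```

UPDATE and replace() are not alternatives: replace() computes the new value and UPDATE writes it back to the row.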

How to escape delimiter found in value - pig script?

In a Pig script, I would like to find a way to escape the delimiter character in my data so that it doesn't get interpreted as extra columns. For example, if I'm using a colon as the delimiter and I have a column with the value "foo:bar", I want that string interpreted as a single column without having the loader pick up the colon in the middle.
You can try
A = LOAD 'somefile' AS (s:chararray);
The regex might have to be adapted.
Pig seems to treat the input as a plain string; it is not smart enough to tell which characters are data and which are delimiters.
PigStorage works on a string tokenizer, so if you do something like
a = LOAD '/abc/def/file.txt' USING PigStorage(':');
this does not seem to solve your problem. If we write our own load function in place of PigStorage(), we might arrive at a solution.
I will try to post code to resolve this.
You can use STRSPLIT(string, regex, limit) to split the column based on the delimiter.
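The regex-split idea can be sketched in Python. This is not Pig code, and it rests on an assumption that is not stated in the question: that delimiters inside values are escaped with a backslash (foo\:bar). The function name `split_unescaped` is hypothetical.

```python
import re

def split_unescaped(line, delimiter=":", escape="\\"):
    # Split only on delimiters NOT preceded by the escape character
    # (negative lookbehind), then strip the escapes from each field.
    pattern = r"(?<!" + re.escape(escape) + r")" + re.escape(delimiter)
    parts = re.split(pattern, line)
    return [part.replace(escape + delimiter, delimiter) for part in parts]

fields = split_unescaped(r"1:foo\:bar:baz")
print(fields)   # ['1', 'foo:bar', 'baz']
```

A comparable lookbehind pattern could in principle be passed as the regex argument of STRSPLIT, but that has not been verified here. The sketch also does not handle an escaped backslash directly before a delimiter.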