match multiline strings with regex - openrefine

Is it possible to match multiline strings with match() function?
I tried to apply match(/(abc)\rdef/) to a cell containing 2 lines of text abc & def, but it does not work. Is there a way to get "abc" as result?

Simply use \n (newline) instead of \r (carriage return).
value.match(/(abc)\ndef/)
But you have to indicate where the newline is. match has no "multiline" parameter, so the dot (.) doesn't match line breaks.
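
The same behaviour can be reproduced outside OpenRefine. This is a minimal sketch in Python rather than GREL (Python's `re` module has the same default, where the dot stops at line breaks), assuming the cell holds `abc` and `def` separated by a single `\n`. The variable names are illustrative only.

```python
import re

cell = "abc\ndef"  # assumed cell content: two lines separated by \n

# Spelling out the newline in the pattern works.
explicit = re.search(r"(abc)\ndef", cell)

# By default the dot does not match a line break, so this finds nothing.
dot_default = re.search(r"(abc).def", cell)

# With the DOTALL flag the dot matches line breaks as well.
dot_all = re.search(r"(abc).def", cell, flags=re.DOTALL)

print(explicit.group(1))
print(dot_default)
print(dot_all.group(1))
```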

Of course! Thanks, Ettore.
And I found a way to do what I wanted with value.match(/(.*?\n)*(def)\n?(.*?\n?)*/)

using Regex get substring between underscore 2 and underscore 3 of string, vb.net

I have a string like: Title Name_2021-04-13_A+B+C_Division.txt. I need to extract the A+B+C. The A+B+C may be other letters. I believe that using Regex would be the simplest way to do this. In other words I need to get the substring between underscore 2 and underscore 3 of string. All of my code is written in vb.net. I have tried:
boatClass = Regex.Match(myFile, "(?<=_)(.*)(?=_)").ToString
I know this is not right but I think it is close. What do I need to add or change?
The regex code that will extract a substring between the second and third underscore of a string is:
(?:[^_]+_){2}([^_]+)
However, I chose to use the split function:
myString.Split("_"c)(2)
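
For comparison, here is a small sketch of both approaches in Python rather than VB.NET, using the file name from the question. It assumes the name always contains at least three underscores; the variable names are made up for the example.

```python
import re

file_name = "Title Name_2021-04-13_A+B+C_Division.txt"

# Skip two "field + underscore" pairs, then capture the third field.
m = re.match(r"(?:[^_]+_){2}([^_]+)", file_name)
by_regex = m.group(1) if m else None

# Split-based approach: index 2 is the third underscore-separated field.
by_split = file_name.split("_")[2]

print(by_regex, by_split)
```

Both give the same field here; the split version fails with an index error on names with fewer than three fields, while the regex version yields no match.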

Using groups in OpenRefine regex

I'm wondering if it is possible to use "groups" in regular expressions in OpenRefine's GREL syntax. I mean, I'd like to replace every dot that is preceded and followed by a character with the same preceding character and dot, then a space, then the following character.
Something like:
s.replace(/(.{1})\..({1})/,/(1).\s(2)/)
It should, but your last argument needs to be a string, not a regular expression. Internally Refine uses Java's Matcher#replaceAll method which accepts a string argument.
I think I found out how to deal with this. You need to put $X in your string value to address a Xth capture group.
It should be like this:
s.replace(/.?(#capture group 1).?(#capture group 2).*?/, " some text $1 some text $2 some text")
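
The idea of referring back to capture groups in the replacement can be sketched in Python. Note the syntax difference: Python's `re.sub` writes group references as `\1` and `\2`, whereas the Java-based replace described above uses `$1` and `$2`. The sample text is made up for illustration.

```python
import re

text = "a.b c.d"  # hypothetical sample input

# Group 1 is the character before the dot, group 2 the character after it.
# The replacement string puts them back with ". " in between.
spaced = re.sub(r"(.)\.(.)", r"\1. \2", text)

print(spaced)
```

Because matches do not overlap, runs such as `a.b.c` are only partly rewritten by a single pass with this pattern.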

Regex for letters, digits, no spaces

I'm trying to create a regex to check for 6-12 characters, one being a digit, the rest being any characters, no spaces. Can regex do this? I'm trying to do this in Objective-C and I'm not familiar with regex at all. I've been reading a couple of tutorials, but most are for matching simple cases of a number, or a set of numbers, which is not exactly what I'm looking for. I can do it with methods, but I was wondering if that would be too slow, and I figured I could try learning something new.
asdfg1 == ok
asdfg 1 != ok
asdfgh != ok
123456 != ok
asdfasgdasgdasdfasdf != ok
Use this regex: ^(?=.*\d)(?=.*[a-zA-Z])[^ ]{6,12}$
It seems that you mean "letter" when you say "character", right? And (thanks to burning_LEGION for pointing that out) there may be only one digit?
In that case, use
^(?=\D*\d\D*$)[^\W_]{6,12}$
Explanation:
^ # Start of string
(?=\D*\d\D*$) # Assert that there is exactly one digit in the string
[^\W_] # Match a letter or digit (explanation below)
{6,12} # 6-12 times
$ # End of string
[^\W_] might look a little odd. How does it work? Well, \w matches any letter, digit or underscore. \W matches anything that \w doesn't match. So [^\W] (meaning "match any character that is not not alphanumeric/underscore") is essentially the same as \w, but by adding _ to this character class, we can remove the underscore from the list of allowed characters.
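
To check the pattern against the examples from the question, here is a short sketch in Python rather than Objective-C. The `re.ASCII` flag is an assumption added so that `\w`, `\W` and `\d` only cover ASCII characters; `is_valid` is a hypothetical helper name.

```python
import re

# Pattern from the answer above; re.ASCII keeps \w, \W and \d limited to ASCII.
PATTERN = re.compile(r"^(?=\D*\d\D*$)[^\W_]{6,12}$", re.ASCII)


def is_valid(candidate: str) -> bool:
    """Return True if candidate is 6-12 letters/digits with exactly one digit."""
    return PATTERN.match(candidate) is not None


samples = ["asdfg1", "asdfg 1", "asdfgh", "123456", "asdfasgdasgdasdfasdf"]
results = {s: is_valid(s) for s in samples}

for sample, ok in results.items():
    print(repr(sample), ok)
```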
I haven't tried it, but I think this is the answer:
(^[^\d\x20]*\d[^\d\x20]*$){6,12}
This is for one digit: ^[^\d\x20]{0,11}\d{1}[^\d\x20]{0,11}$ but I can't limit it to a length of 6-12. You can use another function to check the length first and, if it is from 6 to 12, check with the regex I wrote.

Match words but could possibly contain spaces within word

Is there a way to match words in regexp (or SQL) with spaces so, for example,
This would match to
T h i s
T hi s
Th is
You can use \s* after each letter, meaning 0 or more whitespace characters. But there is a simpler solution using replace(): strip the spaces first, then compare case-insensitively.
WordThis.replace(" ", "").equalsIgnoreCase("this")
You can try an expression like this (t\s*h\s*i\s*s\s*).
You'll need to ensure your settings are case insensitive.
I suppose the only way is to add \s* manually between every character in the word.
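
Adding \s* between the letters does not have to be done by hand. A minimal sketch in Python, assuming the target word is known in advance; `spaced_word_pattern` is a hypothetical helper, not a library function.

```python
import re


def spaced_word_pattern(word: str) -> re.Pattern:
    """Build a case-insensitive pattern allowing whitespace between letters."""
    # Escape each character, then join the pieces with \s* (0 or more whitespace).
    body = r"\s*".join(re.escape(ch) for ch in word)
    return re.compile(body, re.IGNORECASE)


pattern = spaced_word_pattern("this")
candidates = ["This", "T h i s", "T hi s", "Th is", "That"]
matches = [c for c in candidates if pattern.fullmatch(c)]

print(pattern.pattern)
print(matches)
```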

skip to next non whitespace character in string array

I have a problem...
In attempting to parse a file separated by nothing more than whitespace, I have an issue. I have decided the best way to do this is to tokenise the string I have. So far I have put all my lines into an array (splitting on the newline character), so my array may contain 5 entries such as these (each entry in the array is one line of the file):
1)mary julia anne steve
2)alex james david katie
3)omegle yikes craxy horse
4)foo bar foobar matt maximus
5)capital or not smack
As you can see, each entry in the file may contain differing amounts of undefined whitespace... which can be one or more tab spaces, or many regular space characters.
I've considered looping through the string char by char until non whitespace is detected, but this seems ugly...
any help?
Thanks :)
sscanf does it all for you:
char *s="\nmary julia anne \t steve", o[100];
int n=0;
while( sscanf(s+=n,"%99s%n",o,&n)==1 )
puts(o);
str += strspn(str, " \t\r\n" );
Use isspace().
From man isspace
isspace()
checks for white-space characters. In the "C" and "POSIX" locales, these are: space, form-feed ('\f'), newline ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v').