How can I add a string character based on a position in OpenRefine? - openrefine

I have a column in OpenRefine, and I would like to insert a character string into each of its rows at a given position in the string.
For example:
I have an 8-character numeric string, 85285296, and would like to add "-" after the fourth character: "8528-5296".
Can anyone help me find the specific function in OpenRefine?
Thanks
Tzipy

The simplest approach is to just use the expression language's built-in string indexing and concatenation:
value[0,4]+'-'+value[4,8]
or more generally, if you don't know that your value is exactly 8 characters long:
value[0,4]+'-'+value[4,999]

Possible solution (not sure if it's the most straightforward):
value.replace(/(\d{4})(.+)/, "$1-$2")
Here $1 stands for the content of the first parenthesized group in the regular expression and $2 for the content of the second, so each value in the column is replaced with $1-$2.
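To try the same capture-group substitution outside OpenRefine, here is a minimal Python sketch. The function name insert_dash is made up for illustration, and Python writes back-references as \1 and \2 where GREL uses $1 and $2.

```python
import re

def insert_dash(value: str) -> str:
    # Group 1: the first run of four digits; group 2: everything after it.
    # Values with no such match are returned unchanged.
    return re.sub(r"(\d{4})(.+)", r"\1-\2", value)

print(insert_dash("85285296"))  # 8528-5296
print(insert_dash("123"))       # 123 (too short to match, left as-is)
```

Note that a value which does not match is passed through silently, which may or may not be what you want.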

Some other options:
value.splitByLengths(4,4).join("-")
value.match(/(\d{4})(\d{4})/).join("-")
value.substring(0,4)+"-"+value.substring(4,8)
I think 'splitByLengths' is the neatest, but I might use 'match' instead because it fails with an error if your starting string isn't 8 digits. That means you don't accidentally process data that doesn't conform to your assumption about what is in the column. With any of the others, you could use a facet/filter to check this first.
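The trade-off between a strict and a lenient approach can be sketched in Python. This is only an analogy to the GREL expressions, assuming that GREL's match has to match the whole value; the names split_strict and split_lenient are hypothetical.

```python
import re

def split_strict(value: str) -> str:
    # fullmatch succeeds only when the whole value is exactly eight digits.
    m = re.fullmatch(r"(\d{4})(\d{4})", value)
    if m is None:
        raise ValueError(f"not an 8-digit value: {value!r}")
    return "-".join(m.groups())

def split_lenient(value: str) -> str:
    # Plain slicing never raises, so malformed values are reshaped silently.
    return value[:4] + "-" + value[4:]

print(split_strict("85285296"))   # 8528-5296
print(split_lenient("123"))       # 123-
```

The strict version surfaces unexpected data as an error, while the lenient one quietly produces output such as "123-" that may be hard to spot later.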

Related

Postgres - substring from the beginning to the second last occurrence of a char within a string

I need to retrieve the leading section (SEALS_LME_TRADES_MBL) of the string below. This value is in a column within my Postgres database table.
SEALS_LME_TRADES_MBL_20220919_00212.csv
I tried to utilize the functions; substring, reverse, strpos but they all have limitations. It seems like regex is the best option, however I was not able to do it.
Essentially I need to substring from beginning till the second last '_'. I do not want the date and sequence number along with the file extension at the end.
The closest regex I managed to get is: ^(([^]*){4})
https://regex101.com/
This looks a little wonky, but how about this?
select substring ('SEALS_LME_TRADES_MBL_20220919_00212.csv', '^(.+)_[^_]+_[^_]+')
Translation
^ from the beginning
(.+) any characters (capture and return this value), followed by
_ an underscore, followed by
[^_]+ one or more non-underscores, followed by
_ an underscore, followed by
[^_]+ one or more non-underscores
Regex greediness will cause any incidental underscores to be captured in the initial string.
Technically speaking the last portion (one or more non-underscores) can probably be omitted.
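The greedy behaviour described above can be checked quickly in Python. This is a sketch only: Python's re module is not the same engine as Postgres, but for this particular pattern the two should agree. The variable names are made up for illustration.

```python
import re

name = "SEALS_LME_TRADES_MBL_20220919_00212.csv"

# Full pattern from the answer: capture everything before the last two "_" parts.
full = re.match(r"^(.+)_[^_]+_[^_]+", name)
# Same pattern with the final "one or more non-underscores" portion omitted.
short = re.match(r"^(.+)_[^_]+_", name)

print(full.group(1))   # SEALS_LME_TRADES_MBL
print(short.group(1))  # SEALS_LME_TRADES_MBL
```

Both variants capture the same prefix because the greedy (.+) grabs as much as it can and only backs off far enough to leave the final two underscore-delimited parts.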

How to Trim right and left a String in VB .net

I want to take the value of
T.GS.+0.220kg
but I don't know how to remove the surrounding text.
I just want to take the number from the weight,
like 0.220
Can someone help me?
You can use regular expressions to extract a decimal value from basically any string. First you'd need to import the namespace:
Imports System.Text.RegularExpressions
Then using this will return just the decimal value:
Regex.Match("T.GS.+0.220kg", "\d+\.\d+").Value
This particular expression looks for one or more digits, followed by a literal dot (escaped as \. so that it does not match any character), followed by one or more digits, so the earlier dots (between T and G, for example) aren't included.
This returns exactly 0.220. You can then replace the literal string with any string variable and assign the result as needed.
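In a regular expression, an unescaped dot matches any character, while \. matches only a literal dot. The difference does not show up on the sample input, but it does on other strings. A short Python sketch, where the second input is a made-up example:

```python
import re

loose = r"\d+.\d+"    # "." matches any single character
strict = r"\d+\.\d+"  # "\." matches only a literal dot

print(re.search(strict, "T.GS.+0.220kg").group())  # 0.220
print(re.search(loose, "T.GS.+12x5kg").group())    # 12x5
print(re.search(strict, "T.GS.+12x5kg"))           # None
```

The escaped form is the safer choice when the goal is to pull out a decimal number and nothing else.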
If you haven't worked with regular expressions before and want something that looks a little nicer, you could use the String.Split method.
Dim input As String = "T.GS.+0.220kg"
input = input.Split("+"c)(1) ' grabs the part after the "+": "0.220kg"
input = input.Substring(0, input.Length - 2) ' then drops the last 2 chars ("kg")
In English:
Split the string into 2 separate pieces, grabbing the part to the right of the first '+' symbol.
Then remove the last 2 chars from the end.

Retrieving the value which have '#' character in ColdFusion

I'm trying to assign the value of a column from the query to a variable using cfset tag. For example, if the value is 'abcd#1244', then if I use <cfset a = #trim(queryname.column)#> it will return only abcd. But I need the whole value of that column.
You will need to escape the # symbol. You can get clever and do it all in one swoop (# acts as an escape character when placed next to another #).
Example being
The item## is #variable#.
In order to print "The item# is 254."
There are plenty of text and string functions at your disposal.
I'd recommend first trying to escape the value as soon as it is drawn from your database.
http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec1a60c-7ffc.html

Using groups in OpenRefine regex

I'm wondering if it is possible to use "groups" in regular expressions in OpenRefine's GREL syntax. I mean, I'd like to replace every dot that is preceded and followed by a character with the same character and dot, followed by a space and then the other character.
Something like:
s.replace(/(.{1})\..({1})/,/(1).\s(2)/)
It should, but your last argument needs to be a string, not a regular expression. Internally Refine uses Java's Matcher#replaceAll method which accepts a string argument.
I think I found out how to deal with this. You need to put $X in your string value to address the Xth capture group.
It should be like this:
s.replace(/.?(#capture group 1).?(#capture group 2).*?/, " some text $1 some text $2 some text")
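As an illustration of the same idea outside OpenRefine, here is a minimal Python sketch of the "dot between two characters" replacement. It assumes that "character" means a word character (\w), and the function name space_after_dot is hypothetical. Python refers to groups in the replacement string as \1 and \2 where GREL (via Java) uses $1 and $2.

```python
import re

def space_after_dot(s: str) -> str:
    # (\w) captures one word character on each side of a literal dot;
    # the replacement string re-emits both and adds a space after the dot.
    return re.sub(r"(\w)\.(\w)", r"\1. \2", s)

print(space_after_dot("foo.bar"))   # foo. bar
print(space_after_dot("foo. bar"))  # foo. bar (already spaced, unchanged)
```

The key point matches the answers above: the pattern is a regular expression, but the replacement is an ordinary string containing group references.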

How should a string be matched with a regular expression in Objective C

I'm finding it hard to match strings using NSRegularExpression. Generic alpha characters are not a problem with [a-z] but if I need to match a word like 'import' I'm struggling to make it work. I'm sure I have to escape the word in some manner but I can't find any docs around this. A really basic example would be
{{import "hello"}}
where I want to get hold of the string: hello
edit: to clarify - 'hello' could be any string - it's the bit I want returned
This regular expression matches the text between the double quotes in your example:
\{\{import "([^"]+)"\}\}
The match will be stored in the first match group.
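The pattern itself is not specific to NSRegularExpression, so it can be tried out in any regex engine first. A minimal Python sketch, with the helper name imported_name made up for illustration:

```python
import re

# Literal braces are escaped; ([^"]+) captures everything between the quotes.
PATTERN = re.compile(r'\{\{import "([^"]+)"\}\}')

def imported_name(text):
    m = PATTERN.search(text)
    # Group 1 holds the quoted string; None signals that nothing matched.
    return m.group(1) if m else None

print(imported_name('{{import "hello"}}'))  # hello
```

Note that the word import needs no escaping; only the braces do, because they have a special meaning in regular expressions.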