Extra blank space between words - openrefine

Please help me with two questions about GREL expressions:
If there are double spaces between two words in a column, how can I eliminate one space? Example: Robert--Smith to Robert-Smith (the minus character stands for a blank, for illustration).
How can I look for an exact word in a text filter?
Thanks!

1°) Try Transform ---> value.replace("  ", " ") (two spaces in the first argument, one in the second)
Or simply Common transforms ----> Collapse consecutive white space
2°) Column ---> Text filter and enter your word
Or use Column ---> Facet ---> Custom text facet and type: value.contains(" your_word ")
or value.contains(/(yourexactword)/)
This will return a true/false facet.
H.

@hpiedcoq's answer is the right one if you need to do this in GREL. If not, you can just use the point-and-click interface:
For the first question: select your column, then Edit cells > Common transforms > Collapse consecutive white space.
For the second question: select your column > Text filter > enter the word you are looking for. You can select "case sensitive" if you want the search to distinguish upper and lower case.

1.1 Transform --> value.replace("  ", " ")
Replaces every double space with a single space.
1.2 Transform --> value.trim()
Removes whitespace before and after the string. Note that trim() on its own does not collapse double spaces inside the string.
1.3 Transform --> value.replace(/\b  \b/, " ")
Replaces with a regular expression; collapses only a double space that sits between two words.
Text filter > turn on regular expression and use \b.
Text filter with regular expression: \bWord\b = exact word; the word may be preceded or followed by whitespace, punctuation, or the start or end of the text.
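
For readers who want to try the two regular-expression ideas outside OpenRefine, here is a minimal Python sketch. It is not GREL: the function names collapse_spaces and has_exact_word are made up for illustration, and Python's re module stands in for OpenRefine's regex engine, which may differ in details.

```python
import re

def collapse_spaces(text):
    # Replace any run of two or more spaces with a single space.
    return re.sub(r" {2,}", " ", text)

def has_exact_word(text, word):
    # \b is a word boundary, so "Smith" is not found inside "Smithson".
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

print(collapse_spaces("Robert  Smith"))             # Robert Smith
print(has_exact_word("Robert Smith", "Smith"))      # True
print(has_exact_word("Robert Smithson", "Smith"))   # False
```

Unlike value.contains(" your_word "), the \b version also finds the word at the very start or end of a cell, where there is no surrounding space.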

Related

Remove the last punctuation in list of numbers in Python

I have a variable holding strings of numbers and letters, and I want code that removes the apostrophes between the values, keeping only the first and last apostrophe of the variable. The desired output is shown below.
numbers = 'V7780T103', '494368103', '003654100', '26210C104'
output should be
numbers = 'V7780T103, 494368103, 003654100, 26210C104'
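
One possible approach, assuming numbers is the tuple of strings that the first assignment above creates, is to join the items with a comma and a space. This is only a sketch; the name joined is an illustrative choice.

```python
numbers = 'V7780T103', '494368103', '003654100', '26210C104'

# str.join concatenates the items, putting ", " between them.
joined = ", ".join(numbers)

print(repr(joined))  # 'V7780T103, 494368103, 003654100, 26210C104'
```

The inner apostrophes disappear because they were never part of the data; they only delimit the separate strings in the source code.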

Regex to replace multiple patterns with single not working

I am working on replacing multiple occurrences of the string 0000 with a single random number in HANA SQL.
I have used these patterns:
'(\w+)\s+\1'
'([0000 ]+) \1'
but all occurrences are replaced except the last occurrence of the pattern
SELECT REPLACE_REGEXPR('(\w+)\s+\1' IN '0000 0000 0000' WITH ROUND(RAND()*1000) OCCURRENCE ALL) AS a2
FROM DUMMY;
Current output is
RANDOM 0000
expected output is
RANDOM
Try this regex:
((0000) +)+(0000)
And if it's OK to match any digits, with more or fewer than 4 of them:
(\d+ +)+\d+
Good Luck!
You may use
\b(\d+)(?:\s+\1)+\b
You need \d to match digits (if you need to match letters and _ keep on using \w).
Also, to match 1 or more repetitions of a sequence of patterns you need (?:....)+, a + quantified non-capturing group.
Pattern details
\b - word boundary
(\d+) - Group 1: one or more digits
(?:\s+\1)+ - 1+ repetitions of 1+ whitespaces and the same value as captured in Group 1
\b - word boundary
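
To see why the original pattern leaves one 0000 behind, the two patterns can be compared in Python. This is a sketch only: Python's re module is used as a stand-in for HANA's REPLACE_REGEXPR, the helper name collapse is hypothetical, and a fixed token replaces the random number so the result is reproducible.

```python
import re

TEXT = "0000 0000 0000"

def collapse(pattern, text, token="RANDOM"):
    # Replace every match of the pattern with a fixed token
    # (a stand-in for ROUND(RAND()*1000) in the SQL above).
    return re.sub(pattern, token, text)

# Matches exactly one pair, so the third 0000 has no partner left.
print(collapse(r"(\w+)\s+\1", TEXT))            # RANDOM 0000

# The quantified group (?:\s+\1)+ swallows every repetition in one match.
print(collapse(r"\b(\d+)(?:\s+\1)+\b", TEXT))   # RANDOM
```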

Perl6 split function adding extra elements to array

my @r = split("", "hi");
say @r.elems;
--> output: 4
split is adding two extra elements to the array, one at the beginning and another at the end.
I have to do shift and pop after every split to correct for this.
Is there a better way to split a string?
If you're splitting on the empty string, you will get an empty element at the start and the end of the returned list as there is also an empty string before and after the string.
What you want is .comb without parameters, written out completely functionally:
"hi".comb.elems.say; # 2
See https://docs.raku.org/routine/comb#(Str)_routine_comb for more info.
The reason for this is that when you use an empty Str "" as the delimiter, it is the same as if you had used the regex /<|wb>/, which matches next to characters. So it also matches before the first character and after the last character. Perl 5 removes these "extra" strings for you in this case (and in this case only), which is likely where the confusion lies.
What Perl 6 does instead is allow you to explicitly :skip-empty values
'hi'.split('') :skip-empty
'hi'.split('', :skip-empty)
split("", "hi") :skip-empty
split("", "hi", :skip-empty)
Or to specify what you actually want
'hi'.comb( /./ )
'hi'.comb( 1 )
'hi'.comb
comb( /./, 'hi' )
comb( 1, 'hi' )
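
The same effect can be reproduced in another language. The sketch below is Python, not Raku, and is offered only as an analogy: it assumes Python 3.7 or later, where re.split accepts a pattern that matches the empty string.

```python
import re

# An empty pattern matches before, between and after the characters,
# so the result starts and ends with an empty string.
parts = re.split("", "hi")
print(len(parts), parts)

# Dropping the empty strings is the rough counterpart of :skip-empty.
non_empty = [p for p in parts if p]
print(len(non_empty), non_empty)

# Asking directly for the characters is the rough counterpart of .comb.
chars = list("hi")
print(len(chars), chars)
```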

Find Each Occurrence of X and Insert a Carriage Return

A colleague has some data he is putting into a flat file (.txt) and needs to insert a carriage return before EACH occurrence of 'POL01', 'SUB01','VEH01','MCO01'.
I did use:
For Each line1 As String In System.IO.File.ReadAllLines(BodyFileLoc)
    If line1.Contains("POL01") Or line1.Contains("SUB01") Or line1.Contains("VEH01") Or line1.Contains("MCO01") Then
        Writer.WriteLine(Environment.NewLine & line1)
    Else
        Writer.WriteLine(line1)
    End If
Next
But unfortunately it turns out that the file is not formatted in 'lines' by SSIS but as one whole string.
How can I insert a carriage return before every occurrence of the above?
Test Text
POL01CALT302276F 332 NBPM 00101 20151113201511130001201611132359 2015111300010020151113000100SUB01CALT302276F 332 NBPMP01 Akl Abi-Khalil 19670131 M U33 Stoford Close SW19 6TJ 2015111300010020151113000100VEH01CALT302276F 332 NBPM001LV56 LEJ N 2006VAUXHALL CA 2015111300010020151113000100MCO01CALT302276F 332 NBPM0101 0 2015111300010020151113000100POL01CALT742569N
You can use regular expressions for this, specifically by using Regex.Replace to find and replace each occurrence of the strings you're looking for with a newline followed by the matching text:
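' Requires: Imports System.Text.RegularExpressions
Dim str As String = "xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx"
' Put a newline in front of every POL01, SUB01, VEH01 or MCO01
Dim output As String = Regex.Replace(str, "((?:POL|SUB|VEH|MCO)01)", Environment.NewLine + "$1")
'output contains:
'xxx
'POL01xxx
'SUB01xxx
'VEH01xxx
'MCO01xxx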
There may be a better way to construct this regular expression, but this is a simple alternation on the different letters, followed by 01. This matched text is represented by the $1 in the replacement string.
If you're new to regular expressions, there are a number of tools that help you understand them. For example, regex101.com will show you an explanation of the one I have used here.
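
For comparison, the same substitution can be sketched in Python. This is an illustration rather than a replacement for the VB.NET code: the function name break_before_records is hypothetical, and "\n" is used where the VB.NET version uses Environment.NewLine.

```python
import re

def break_before_records(text):
    # Insert a newline in front of each POL01, SUB01, VEH01 or MCO01.
    # \1 in the replacement refers to the text captured by the outer group.
    return re.sub(r"((?:POL|SUB|VEH|MCO)01)", r"\n\1", text)

sample = "xxxPOL01xxxSUB01xxxVEH01xxxMCO01xxx"
print(break_before_records(sample))
```

If the data itself begins with one of the markers, the result starts with an empty line, which may need trimming.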

Vi: how to automatically insert spaces

I'm trying to write a nice feature for crazy people like me who like their lines to be perfectly aligned.
I often write files in which the format is "key = value".
Since the key may contain any number of characters, one has to align the "=" symbols manually, which is not cool.
Is there a way to tell vi "when someone types the equal character, insert as many spaces as necessary to reach column 25, then write the equal symbol"?
The second step will be to define a shortcut that applies this format to an entire file.
Any help would be appreciated.
Ben.
Map the behavior of = in Insert Mode.
The following code adds spaces from the current cursor position up to column 24 and then inserts an equal sign. If there are characters after the cursor position (for example, when the cursor is in the middle of a word), those characters are moved after column 25. Add it to your vimrc file and try it.
"" If length of the line is more or equal to 24, add an equal sign at the end.
"" Otherwise insert spaces from current position of cursor until column 24
"" and an equal sign, moving characters after it.
function My_align()
    let line_len = strlen( getline('.') )
    if line_len >= 24
        s/$/=/
        return
    endif
    let col_pos = col('.')
    exe 's/\%#\(.\|$\)/\=submatch(1) . printf( "%' . (24 - col_pos) . 's%s", " ", "=" )/'
endfunction
inoremap = <Esc>:call My_align()<CR>A
For the second step, use the multiple repeats command (:g): check for an equal sign and insert spaces just before it until column 25. It won't work if the equal sign is already past column 25 before you run it, but you get the idea.
:g/=/exe 's/=/\=printf( "%' . ( 24 - stridx( getline('.'), "=" ) ) . 's", " " ) . submatch(0)/'
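
The alignment rule itself can also be written outside Vim. The Python sketch below is an illustration under stated assumptions, not a translation of the Vimscript above: the function name align_equals is hypothetical, columns are counted from 1, and only the first "=" on a line is aligned.

```python
def align_equals(line, column=25):
    # Lines without "=" are returned unchanged.
    if "=" not in line:
        return line
    key, value = line.split("=", 1)
    # Pad the key so that "=" lands in the requested 1-based column.
    # Keys longer than column - 1 are left as they are.
    return key.rstrip().ljust(column - 1) + "=" + value

for line in ["name = Ben", "editor=vi", "# a comment"]:
    print(align_equals(line))
```

Applying it to every line of a file gives the whole-file behaviour the question asks for as its second step.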