Remove all occurrences of a list of words vim - variables

Having a document whose first line is foo,bar,baz,qux,quux, is there a way to store these words in a variable as a list ['foo','bar','baz','qux','quux'] and remove all their occurrences in the document with Vim?
For example, running a command such as :removeall in visual mode with the list highlighted on:
foo,bar,baz,qux,quux
hello foo how are you
doing foo bar baz qux
good quux
will change the text to:
hello how are you
doing good

A safer way is to write a function that checks each item of your "list" for anything that needs to be escaped, and then does the substitution (removal). A quick and dirty way to do it with your input is this mapping:
nnoremap <leader>R :s/,/\|/g<cr>dd:%s/\v<c-r>"<c-h>//g<cr>
Then, in Normal mode, go to the line that contains the words to delete (it must be in CSV format) and press <leader>R to get the expected output.
The substitution would fail if that line has regex special chars, like /, *, . or \ etc.
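To illustrate the "safer way" (escape each item, then substitute), here is a minimal sketch in Python rather than Vim script. The function name remove_words and the sample list are my own, and like the mapping it matches the words anywhere, including inside longer words.

```python
import re

def remove_words(lines):
    """Treat the first line as a comma-separated word list and strip those words from the rest."""
    words = [w for w in lines[0].split(",") if w]
    if not words:
        return lines[1:]
    # Longest first, so a word is never shadowed by a shorter word that is its prefix.
    words.sort(key=len, reverse=True)
    # re.escape neutralises regex metacharacters such as / * . and \
    alternatives = "|".join(re.escape(w) for w in words)
    # " *" also swallows the spaces in front of each removed word.
    pattern = re.compile(" *(?:" + alternatives + ")")
    return [pattern.sub("", line) for line in lines[1:]]

text = [
    "foo,bar,baz,qux,quux",
    "hello foo how are you",
    "doing foo bar baz qux",
    "good quux",
]
print(remove_words(text))  # ['hello how are you', 'doing', 'good']
```

Because every word goes through re.escape, a list such as a.b,c* is handled literally instead of being read as a regex.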

Something like this one liner should work:
:for f in split(getline("."), ",") | execute "%s/" . f . "//g" | endfor | 0d
Note that you'll end up with a lot of trailing spaces.
edit
This version of the command above takes care of those pesky trailing spaces (but not the one on line 2 of your sample text):
:for f in split(getline("."), ",") | execute "%s/ *" . f . "//g" | endfor | 0d
Result:
hello how are you
doing
good

Related

Awk - How to escape the | in sub?

I'd like to substitute a string that contains a |
My STDIN :
13|Test|123|6232
14|Move|126|6692
15|Test|123|6152
I'd like to obtain :
13|Essai|666|6232
14|Move|126|6692
15|Essai|666|6152
I tried this:
{sub("|Test|123","|Essai|666") ;} {print;}
But I think the | is what gets in my way. I really need to replace the complete string, including the |.
What should I do to get this result?
Many thanks for your precious help
You can use
awk '{sub(/\|Test\|123\|/,"|Essai|666|")}1' file
See the online demo.
Note:
/\|Test\|123\|/ is a regex that matches |Test|123| substring
sub(/\|Test\|123\|/,"|Essai|666|") - replaces the first occurrence of the regex pattern in the whole record (since the target argument is omitted, $0 is assumed)
1 triggers the default print action, no need to explicitly call print here.
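The same idea carries over to other regex engines. As a rough sketch in Python (not awk; the function names are mine), the | has to be escaped in a regex, while a plain string replacement needs no escaping at all:

```python
import re

def replace_first(line):
    # "|" means alternation in a regex, so each one is escaped to match literally.
    # count=1 mirrors awk's sub(), which only replaces the first match.
    return re.sub(r"\|Test\|123\|", "|Essai|666|", line, count=1)

def replace_literal(line):
    # Plain string replacement: no regex involved, so nothing to escape.
    return line.replace("|Test|123|", "|Essai|666|", 1)

for row in ["13|Test|123|6232", "14|Move|126|6692", "15|Test|123|6152"]:
    print(replace_first(row))
```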

How to remove diacritics in Perl 6

Two related questions.
Perl 6 is so smart that it understands a grapheme as one character, whether it is one Unicode symbol (like ä, U+00E4) or two or more combined symbols (like p̄ and ḏ̣). This little piece of code
my @symb;
@symb.push("ä");
@symb.push("p" ~ 0x304.chr); # "p̄"
@symb.push("ḏ" ~ 0x323.chr); # "ḏ̣"
say "$_ has {$_.chars} character" for @symb;
gives the following output:
ä has 1 character
p̄ has 1 character
ḏ̣ has 1 character
But sometimes I would like to be able to do the following.
1) Remove diacritics from ä. So I need some method like
"ä".mymethod → "a"
2) Split "combined" symbols into parts, i.e. split p̄ into p and Combining Macron U+0304. E.g. something like the following in bash:
$ echo p̄ | grep . -o | wc -l
2
Perl 6 has great Unicode processing support in the Str class. To do what you are asking in (1), you can use the samemark method/routine.
Per the documentation:
multi sub samemark(Str:D $string, Str:D $pattern --> Str:D)
method samemark(Str:D: Str:D $pattern --> Str:D)
Returns a copy of $string with the mark/accent information for each character changed such that it matches the mark/accent of the corresponding character in $pattern. If $string is longer than $pattern, the remaining characters in $string receive the same mark/accent as the last character in $pattern. If $pattern is empty no changes will be made.
Examples:
say 'åäö'.samemark('aäo'); # OUTPUT: «aäo␤»
say 'åäö'.samemark('a'); # OUTPUT: «aao␤»
say samemark('Pêrl', 'a'); # OUTPUT: «Perl␤»
say samemark('aöä', ''); # OUTPUT: «aöä␤»
This can be used both to remove marks/diacritics from letters, as well as to add them.
For (2), there are a few ways to do this (TIMTOWTDI). If you want a list of all the codepoints in a string, you can use the ords method to get a List (technically a Positional) of all the codepoints in the string.
say "p̄".ords; # OUTPUT: «(112 772)␤»
You can use the uniname method/routine to get the Unicode name for a codepoint:
.uniname.say for "p̄".ords; # OUTPUT: «LATIN SMALL LETTER P␤COMBINING MACRON␤»
or just use the uninames method/routine:
.say for "p̄".uninames; # OUTPUT: «LATIN SMALL LETTER P␤COMBINING MACRON␤»
If you just want the number of codepoints in the string, you can use codes:
say "p̄".codes; # OUTPUT: «2␤»
This is different than chars, which just counts the number of characters in the string:
say "p̄".chars; # OUTPUT: «1␤»
Also see @hobbs' answer using NFD.
This is the best I was able to come up with from the docs — there might be a simpler way, but I'm not sure.
my $in = "Él está un pingüino";
my $stripped = Uni.new($in.NFD.grep: { !uniprop($_, 'Grapheme_Extend') }).Str;
say $stripped; # El esta un pinguino
The .NFD method converts the string to normalization form D (decomposed), which separates graphemes out into base codepoints and combining codepoints whenever possible. The grep then returns a list of only those codepoints that don't have the "Grapheme_Extend" property, i.e. it removes the combining codepoints. The Uni.new(...).Str then assembles those codepoints back into a string.
You can also put these pieces together to answer your second question; e.g.:
$in.NFD.map: { Uni.new($_).Str }
will return a list of 1-character strings, each with a single decomposed codepoint, or
$in.NFD.map(&uniname).join("\n")
will make a nice little unicode debugger.
I can't say this is better or faster, but I strip diacritics in this way:
my $s = "åäö";
say $s.comb.map({.NFD[0].chr}).join; # output: "aao"
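For comparison, the decompose-then-filter approach can be sketched outside Perl 6 too. This is a rough Python version, with one swap: the standard unicodedata module does not expose Grapheme_Extend, so I filter on the general category (anything starting with "M", i.e. a mark) instead. The function names are mine.

```python
import unicodedata

def strip_marks(text):
    # Decompose to NFD, then drop every codepoint whose general category is a mark.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.category(ch).startswith("M"))

def codepoint_names(text):
    # One Unicode name per decomposed codepoint, like .NFD.map(&uniname).
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]

print(strip_marks("Él está un pingüino"))  # El esta un pinguino
print(codepoint_names("p\u0304"))          # ['LATIN SMALL LETTER P', 'COMBINING MACRON']
```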

Need solution for break line issue in string

I have the string below, in which newline characters appear at random positions. Fields are separated by ~$~ and each record ends with ##&.
Please help me merge the broken lines into one.
In the string below, the newline occurs in the address field (4/79A):
-------String----------
23510053~$~ABC~$~4313708~$~19072017~$~XYZ~$~CHINNUSAMY~$~~$~R~$~~$~~$~~$~42~$~~$~~$~~$~~$~28022017~$~
4/79A PQR Marg, Mumbai 4000001~$~TN~$~637301~$~Owns~$~RAT~$~31102015~$~12345~$~##&
Thanks in advance.
Rupesh
Seems to be a (more or less) duplicate of https://stackoverflow.com/a/802439/3595749
Note that you should ask your client to remove the CRLF characters at the source (rather than applying the code below).
Nevertheless, try this:
cat inputfile | tr -d '\n' | sed 's/##&/##\&\n/g' >outputfile
Explanation:
tr removes every newline,
sed adds one back (only where ##& is encountered). s/##&/##\&\n/g substitutes "##&" with "##&\n" (it appends a newline, and "&" must be escaped because it has a special meaning in the replacement). This applies globally (the "g" flag at the end).
Note that, depending on the source (Unix or Windows), "\n" must be replaced by "\r\n" in some cases.
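If a script is easier to maintain than the pipeline, the same two steps (drop all line breaks, then break again after each ##&) can be sketched in Python. The function name merge_records and the shortened sample data are made up for illustration. Stripping both "\r" and "\n" sidesteps the Unix/Windows difference.

```python
def merge_records(raw, terminator="##&"):
    # Step 1: remove every line break, whether it came as LF or CRLF.
    flat = raw.replace("\r", "").replace("\n", "")
    # Step 2: cut after each terminator; the last piece is whatever follows the final one.
    pieces = flat.split(terminator)
    records = [piece + terminator for piece in pieces[:-1]]
    if pieces[-1]:
        # Trailing data without a terminator is kept as-is rather than dropped.
        records.append(pieces[-1])
    return records

raw = "1~$~A~$~\n4/79A PQR Marg~$~TN~$~##&\n2~$~B~$~##&\n"
for record in merge_records(raw):
    print(record)
```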

Write multiple lines to text file with '\n'

I have a program that iterates over all lines of a text file, adds spaces between the characters, and writes the output to the same file. However, if there are multiple lines in the input, I want the output to have separate lines as well. I tried:
let text = format!(r"{}\n", line); // Add newline character to each line (while iterating)
file.write_all(text.as_bytes()); // Write each line + newline
Here is an example input text file:
foo
bar
baz
And its output:
f o o\n b a r\n b a z
It seems that Rust treats "\n" as an escaped n character, but using r"\n" treats it as a string. How can I have Rust treat \n as a newline character to write multiple lines to a text file?
Note: I can include the rest of my code if you need it, let me know.
Edit: I am on Windows 7 64 bit
The problem is the 'r' in front of your string. Remove it and your program will print newlines instead of '\n'.
Also note that '\n' alone is the newline convention on most Unices; Windows uses "\r\n".
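The raw-string prefix is not specific to Rust. As a small illustration in Python (not Rust), where the r prefix works the same way for this case, the raw version keeps the backslash and the n as two literal characters, while the normal version produces one line feed:

```python
raw = r"{}\n".format("f o o")    # backslash + "n" stay two literal characters
cooked = "{}\n".format("f o o")  # "\n" is a single line-feed character

print(len(raw), len(cooked))     # 7 6
print(repr(raw), repr(cooked))
```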

escaping characters for substitution into a PDF

Can anyone tell me the set of control characters for a PDF file, and how to escape them? I have a (non-deflated (inflated?)) PDF document that I would like to edit the text in, but I'm afraid of accidentally making some control sequence using parentheses and stuff.
Thanks.
Okay, I think I found it. On page 15 of the PDF 1.7 spec (PDF link), it appears that the only characters I need to worry about are the parentheses and the backslash.
Sequence | Meaning
---------------------------------------------
\n | LINE FEED (0Ah) (LF)
\r | CARRIAGE RETURN (0Dh) (CR)
\t | HORIZONTAL TAB (09h) (HT)
\b | BACKSPACE (08h) (BS)
\f | FORM FEED (0Ch) (FF)
\( | LEFT PARENTHESIS (28h)
\) | RIGHT PARENTHESIS (29h)
\\ | REVERSE SOLIDUS (5Ch) (Backslash)
\ddd | Character code ddd (octal)
Hopefully this was helpful to someone.
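For what it's worth, escaping text before pasting it into a literal string could look roughly like this in Python. The function name escape_pdf_literal is made up, and the sketch assumes the text goes inside a ( ... ) literal string and ignores character encoding. As far as I understand the spec, balanced parentheses don't strictly need escaping, but escaping all of them is always safe.

```python
def escape_pdf_literal(text):
    # Escape the backslash first; otherwise the backslashes added
    # for the parentheses below would themselves get doubled.
    out = text.replace("\\", "\\\\")
    out = out.replace("(", "\\(").replace(")", "\\)")
    return out

print(escape_pdf_literal("f(x) = a\\b"))  # f\(x\) = a\\b
```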
You likely already know this, but PDF files have an index at the end that contains byte offsets to everything in the document. If you edit the doc by hand, you must ensure that the new text you write has exactly the same number of characters as the original.
If you want to extract PDF page content and edit that, it's pretty straightforward. My CAM::PDF library lets you do it programmatically or via the command line:
use CAM::PDF;
my $pdf = CAM::PDF->new($filename);
my $page_content = $pdf->getPageContent($pagenum);
# ...
$pdf->setPageContent($pagenum, $page_content);
$pdf->cleanoutput($out_filename);
or
getpdfpage.pl in.pdf 1 > page1.txt
setpdfpage.pl in.pdf page1.txt 1 out.pdf