Unix cut in c shell - cut

in file ~/x,
--- //zep/arod/jo/new/ded/main/changes 2013-05-13 17:14:34.000000000 -0700
--- //zep/arod/jo/new/ded/main/lib/soph/tool.py 2013-05-16 14:14:34.000000000 -0700
--- //zep/arod/jo/new/ded/main/lib/soph/pomp.py 2013-05-16 14:14:34.000000000 -0700
in c shell,
set F=`grep '^---' ~/x | cut -d/ -f7-99 | cut and somehow cut number`
then, ls $F should give
ded/main/changes
ded/main/lib/soph/tool.py
ded/main/lib/soph/pomp.py
I don't quite understand the -f flag, and I'm not sure how to cut off the timestamp part.
Any suggestions?

-f7-99 means "include fields 7 through 99". In this case -f7- would do the same job, since it selects field 7 and everything after it.
cut splits each line into fields at the delimiter, which is what -d/ specifies (here the delimiter is the / character), and then prints the fields you ask for (in your example, 7 through 99). Note that the leading "--- " counts as field 1 and the empty string between the two slashes of // counts as field 2, which is why ded is field 7.
Your second cut command could be cut -d' ' -f1, which uses a space as the delimiter and keeps only the first field (in other words, everything before the first space, which is just the path).
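As a rough sketch of how the two cut calls combine, the snippet below recreates the sample lines in a temporary file (the mktemp file is only there for illustration) and prints the trimmed paths. It is written for sh/bash rather than csh; the pipeline itself is the same in csh, only the assignment would use set F=`...` instead.

```shell
#!/bin/sh
# Recreate the sample input in a temporary file (illustrative only).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
--- //zep/arod/jo/new/ded/main/changes 2013-05-13 17:14:34.000000000 -0700
--- //zep/arod/jo/new/ded/main/lib/soph/tool.py 2013-05-16 14:14:34.000000000 -0700
--- //zep/arod/jo/new/ded/main/lib/soph/pomp.py 2013-05-16 14:14:34.000000000 -0700
EOF

# First cut: split on "/" and keep field 7 onward (drops "--- //zep/arod/jo/new/").
# Second cut: split on spaces and keep field 1 (drops the timestamp).
paths=$(grep '^---' "$tmp" | cut -d/ -f7- | cut -d' ' -f1)
printf '%s\n' "$paths"
rm -f "$tmp"
```

This relies on the paths themselves containing no spaces; a path with a space in it would be truncated by the second cut.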

Related

Need solution for break line issue in string

I have the string below, which contains newline characters at random positions. Fields are separated by ~$~ and each record ends with ##&.
Please help me merge the broken lines into one.
In the string below, the newline occurs in the address field (4/79A).
-------String----------
23510053~$~ABC~$~4313708~$~19072017~$~XYZ~$~CHINNUSAMY~$~~$~R~$~~$~~$~~$~42~$~~$~~$~~$~~$~28022017~$~
4/79A PQR Marg, Mumbai 4000001~$~TN~$~637301~$~Owns~$~RAT~$~31102015~$~12345~$~##&
Thanks in advance.
Rupesh
Seems to be a (more or less) duplicate of https://stackoverflow.com/a/802439/3595749
Note: ideally you should ask your client to remove the stray line breaks at the source, rather than applying the code below.
Nevertheless, try this:
cat inputfile | tr -d '\n' | sed 's/##&/##\&\n/g' >outputfile
Explanation:
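tr deletes every newline character,
sed adds one back (only where ##& is encountered). s/##&/##\&\n/g substitutes "##&" with "##&\n": it appends a newline, and the "&" must be escaped because an unescaped & in the replacement stands for the whole match. This applies globally (the "g" flag at the end). Note that \n in the replacement is understood by GNU sed but not by every sed.
Note: depending on where the file comes from (Unix or Windows), the line breaks may be "\r\n" rather than "\n"; in that case delete both characters with tr -d '\r\n'.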
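A minimal sketch of the same idea, run on a shortened version of the record from the question plus one made-up second record (the 99~$~DEF line is hypothetical, added only to show that records stay separate). It deletes both CR and LF, and it uses a backslash followed by a real line break in the sed replacement, which should also work in sed implementations that do not understand \n there.

```shell
#!/bin/sh
# Shortened sample: the first record is broken across two lines,
# the second record (hypothetical) is intact.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
23510053~$~ABC~$~28022017~$~
4/79A PQR Marg, Mumbai 4000001~$~TN~$~637301~$~##&
99~$~DEF~$~##&
EOF

# Delete all CR and LF characters, then re-insert a newline after each ##&.
# The replacement ends with backslash + literal line break (portable sed syntax).
merged=$(tr -d '\r\n' < "$tmp" | sed 's/##&/##\&\
/g')
printf '%s\n' "$merged"
rm -f "$tmp"
```

Because tr joins the whole file into one line before sed sees it, this approach assumes the file is small enough to be handled as a single line.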

How to remove a specific pattern of junk values from a file using awk or sed?

I have two types of pattern in my xml file which I want to remove without disturbing any other meaningful patterns.
testname="#TEST-Loop${c}- 05030502956 #TEST - verify that the Handler returns an error indicating â~#~\call barredâ~#~]." enabled="true">
I want to change it to
testname="#TEST-Loop${c}- 05030502956 #TEST - verify that the Handler returns an error indicating call barred." enabled="true">
I tried the code below, but it didn't work:
awk '{if(match($0,/#TEST.*" enabled="true">$/))
gsub(/â~#~\\/,"");
gsub(/â~#~\]/,"");
print}' $file >> tmp.jmx && mv tmp.jmx $file
The pattern you are attempting to replace looks like a mangled UTF-8 character viewed in some legacy 8-bit encoding. Because you don't specify which encoding that is, we have to do a fair amount of guesswork.
You are asking about Unix tools, so this answer assumes that you are using some U*x derivative or have access to similar tools on your local box (Cygwin?)
To find the actual bytes in the string you want to replace, you can do something like
bash$ grep -o '...~#~...' -m1 "$file" |
> od -Ax -tx1o1
0000000  67  20  e2  7e  23  7e  5c  63  61  0a
        147 040 342 176 043 176 134 143 141 012
000000a
I use od for portability reasons; you might prefer hexdump or xxd or some other tool. The output includes both hex and octal, as octal is preferred in Awk but hex is ubiquitous in programming otherwise. I keep a couple of characters of context around the match in case â would in fact be stored in a multibyte encoding in your sample, but here, in this somewhat speculative example, it turns out it is represented by the single byte 0xE2 (octal 342). (This would identify your terminal encoding as Latin-1 or some close relative; maybe one of the CP125x Windows encodings.)
Armed with this information, we can proceed with
awk '{ gsub(/\342~#~./, "") }1' "$file"
to replace the pesky character sequence, or perhaps
sed $'s/\xe2~#~.//g' "$file"
which assumes your shell is Bash or some near-compatible which allows you to use C-style strings $'...' -- alternatively, if you know your sed dialect supports a particular notation for unprintable characters, you can use that, but that's even less portable.
(If your sed supports the -i option, or you have GNU Awk 4.1 or later with its -i inplace extension, you can replace the file in-place, i.e. have the script replace the file with a modified version without the need for redirection or temporary files. Again, this has portability issues.)
I want to emphasize that we cannot guess your encoding so your question should ideally include this information. See further the Stack Overflow character-encoding tag wiki for guidance on what to include in a question like this.
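To try the Awk command without touching the real file, here is a small sketch that builds a test line with printf. It assumes, as in the speculative od output above, that the stray character is the single byte 0xE2 (octal 342); if od shows different bytes in your file, the octal escape has to change accordingly. LC_ALL=C is set so that Awk treats the input as plain bytes rather than trying to decode it as UTF-8.

```shell
#!/bin/sh
# Build a sample line containing the raw byte 0xE2 (octal 342) twice,
# followed by ~#~ and one more junk character (\ or ]).
# Assumption: the stray character is that single byte; verify with od first.
cleaned=$(printf 'indicating \342~#~\\call barred\342~#~].\n' |
  LC_ALL=C awk '{ gsub(/\342~#~./, "") } 1')
printf '%s\n' "$cleaned"
```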

Grep for Multiple instances of string between a substring and a character?

Can you please tell me how to Grep for every instance of a substring that occurs multiple times on multiple lines within a file?
I've looked at
https://unix.stackexchange.com/questions/131399/extract-value-between-two-search-patterns-on-same-line
and How to use sed/grep to extract text between two words?
But my problem is slightly different: each substring will be immediately preceded by the string name"> and will be terminated by a < character immediately after the last character of the substring I want.
So one line might be
<"name">Bob<125><adje></name><"name">Dave<123><adfe></name><"name">Fred<125><adfe></name>
And I would like the output to be:
Bob
Dave
Fred
Although awk is not the best tool for XML processing, it can help if your XML structure and data are simple enough.
$ awk -F"[<>]" '{for(i=1;i<NF;i++) if($i=="\"name\"") print $(++i)}' file
Bob
Dave
Fred
I doubt that the tag really is <"name">, though. If it's <name>, without the quotes, change the condition in the script to $i=="name".
With gawk:
awk -vRS='<"name">|<' '/^[A-Z]/' file
Bob
Dave
Fred
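Since the question asks about grep specifically, here is a hedged alternative sketch that follows the stated rule directly: take everything after name"> up to the next <. It assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX), and the temporary file is only there to make the example self-contained.

```shell
#!/bin/sh
# Sample line from the question, stored in a temporary file (illustrative).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<"name">Bob<125><adje></name><"name">Dave<123><adfe></name><"name">Fred<125><adfe></name>
EOF

# grep -o prints each match on its own line: name"> plus everything up to the next <.
# sed then strips the leading name"> so only the value remains.
names=$(grep -o 'name">[^<]*' "$tmp" | sed 's/^name">//')
printf '%s\n' "$names"
rm -f "$tmp"
```

The closing </name> tags are not matched because they contain name> without the double quote.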

Remove all occurrences of a list of words vim

Given a document whose first line is foo,bar,baz,qux,quux, is there a way to store these words in a variable as a list ['foo','bar','baz','qux','quux'] and remove all their occurrences in the document with Vim?
Something like a :removeall command, run in visual mode with the list highlighted, so that:
foo,bar,baz,qux,quux
hello foo how are you
doing foo bar baz qux
good quux
will change the text to:
hello how are you
doing good
A safer way is to write a function that checks each item in your "list" for characters that need escaping and then does the substitution (removal). A quick and dirty way to handle your input is this mapping:
nnoremap <leader>R :s/,/\|/g<cr>dd:%s/\v<c-r>"<c-h>//g<cr>
Then, in Normal mode, move the cursor to the line that holds the words to delete (it must be comma-separated) and press <leader>R; you will get the expected output.
The substitution would fail if that line has regex special chars, like /, *, . or \ etc.
Something like this one liner should work:
:for f in split(getline("."), ",") | execute "%s/" . f . "//ge" | endfor | 0d
Note that you'll end up with a lot of trailing spaces.
edit
This version of the command above takes care of those pesky trailing spaces (but not the one on line 2 of your sample text):
:for f in split(getline("."), ",") | execute "%s/ *" . f . "//ge" | endfor | 0d
Result:
hello how are you
doing
good

escaping characters for substitution into a PDF

Can anyone tell me the set of control characters for a PDF file, and how to escape them? I have a (non-deflated (inflated?)) PDF document that I would like to edit the text in, but I'm afraid of accidentally making some control sequence using parentheses and stuff.
Thanks.
Okay, I think I found it. On page 15 of the PDF 1.7 spec, it appears that the only characters I need to worry about are the parentheses and the backslash.
Sequence | Meaning
---------------------------------------------
\n       | LINE FEED (0Ah) (LF)
\r       | CARRIAGE RETURN (0Dh) (CR)
\t       | HORIZONTAL TAB (09h) (HT)
\b       | BACKSPACE (08h) (BS)
\f       | FORM FEED (0Ch) (FF)
\(       | LEFT PARENTHESIS (28h)
\)       | RIGHT PARENTHESIS (29h)
\\       | REVERSE SOLIDUS (5Ch) (Backslash)
\ddd     | Character code ddd (octal)
Hopefully this was helpful to someone.
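As a small sketch of what that escaping could look like in practice, the function below puts a backslash in front of every backslash and parenthesis in a piece of text before it is pasted into a PDF literal string. The name pdf_escape is made up for this example and is not part of any PDF tool. The spec also allows balanced parentheses to appear unescaped, so escaping all of them is simply the cautious option.

```shell
#!/bin/sh
# Escape the three characters that are special inside a PDF literal string:
# backslash, "(" and ")". Each one gets a backslash in front of it.
# pdf_escape is a hypothetical helper name, not part of any PDF tool.
pdf_escape() {
  printf '%s\n' "$1" | sed 's/[\\()]/\\&/g'
}

escaped=$(pdf_escape 'Total (net) is 5\6')
printf '%s\n' "$escaped"
```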
You likely already know this, but PDF files have an index at the end (the cross-reference table) that contains byte offsets to every object in the document. If you edit the file by hand, the new text must occupy exactly the same number of bytes as the original, or those offsets will no longer be correct.
If you want to extract PDF page content and edit that, it's pretty straightforward. My CAM::PDF library lets you do it programmatically or via the command line:
use CAM::PDF;
my $pdf = CAM::PDF->new($filename);
my $page_content = $pdf->getPageContent($pagenum);
# ...
$pdf->setPageContent($pagenum, $page_content);
$pdf->cleanoutput($out_filename);
or
getpdfpage.pl in.pdf 1 > page1.txt
setpdfpage.pl in.pdf page1.txt 1 out.pdf