How do I decrement all array indexes in a text file? - awk

Background
I have a text file that looks like the following:
$SomeText.element_[1]="MoreText[3]";\r"
$SomeText.element_[2]="MoreText[6]";\r"
$SomeText.element_[3]="MoreText[2]";\r"
$SomeText.element_[4]="MoreText[1]";\r"
$SomeText.element_[5]="MoreText[5]";\r"
This goes on for over a thousand lines. I want to do the following:
$SomeText.element_[0]="MoreText[3]";\r"
$SomeText.element_[1]="MoreText[6]";\r"
$SomeText.element_[2]="MoreText[2]";\r"
$SomeText.element_[3]="MoreText[1]";\r"
$SomeText.element_[4]="MoreText[5]";\r"
Each line of text in the file should have the left most index reduced by one, with the rest of the text unchanged.
Attempted Solutions
So far I have tried the following...but the issue for me is I do not know how to feed it back into the file properly:
Attempt 1
I tried a double cutting technique:
cat file.txt | cut -d '[' -f2 | cut -d ']' -f1 | xargs -I {} expr {} + 1
This properly outputs all of the indicies reduced by one to the command line.
Attempt 2
I tried using awk with a mix of sed, but this caused by machine to hang:
awk -F'[' '{printf("%d\n", $2-1)}' file.txt | xargs -I {} sed -i 's/\[\d+\]/{}/g' file.txt
Question
How to I properly decrement all of the array indexes in the file by one and properly write the decremented indexes into the right location of the text file?

A Perl one-liner makes this easy, overwriting the input file:
perl -pi -e 's/(\d+)/$1-1/e' your-file-name-here
(assuming the first number on each line is the index)

With simple awk you could try following, written and tested with shown samples.
awk '
match($0,/\[[^]]*/){
print substr($0,1,RSTART) count++ substr($0,RSTART+RLENGTH)
}
' Input_file
OR in case your Input_file's count in between [..] is in any order then simply reduce 1 from them as follows.
awk '
match($0,/\[[^]]*/){
print substr($0,1,RSTART) substr($0,RSTART+1,RLENGTH)-1 substr($0,RSTART+RLENGTH)
}
' Input_file

With GNU sed and bash:
sed -E "s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e" file
Or, if it is possible that the lines contain ' character:
sed -E "
/\[[0-9]+]/{
s/'/'\\\''/g
s/([^[]*\[)([0-9]+)(].*)/printf '%s%d%s\n' '\1' \$((\2 - 1)) '\3'/e
}" file

Related

Convert multiple lines to a line separated by brackets and "|"

I have the following data in multiple lines:
1
2
3
4
5
6
7
8
9
10
I want to convert them to lines separated by "|" and "()":
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|10
I made a mistake. I'm sorry,I want to convert them to lines separated by "|" and "()":
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
What I have tried is:
seq 10 | sed -r 's/(.*)/(\1)/'|paste -sd"|"
What's the best unix one-liner to do that?
This might work for you (GNU sed):
sed 's/.*/(&)/;H;1h;$!d;x;s/\n/|/g' file
Surround each line by parens.
Append all lines to the hold space except for the first line which replaces the hold space.
Delete all lines except the last.
On the last line, swap to the hold space and replace all newlines by |'s.
N.B. When a line is deleted no further commands are invoked and the command cycle begins again. That is why the last two commands are only executed on the last line of the file.
Alternative:
sed -z 's/\n$//;s/.*/(&)/mg;y/\n/|/' file
With your shown samples please try following awk code. This should work in any version of awk.
awk -v OFS="|" '{val=(val?val OFS:"") "("$0")"} END{print val}' Input_file
Using GNU sed
$ sed -Ez ':a;s/([0-9]+)\n/(\1)|/;ta;s/\|$/\n/' input_file
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
Here is another simple awk command:
awk 'NR>1 {printf "%s|", p} {p="(" $0 ")"} END {print p}' file
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
Here it is:
sed -z 's/^/(/;s/\n/)|(/g;s/|($//' your_input
where -z allows you to treat the whole file as a single string with embedded \ns.
In detail, the sed script above consists of 3 commands separated by ;s:
s/^/(/ inserts a ( at the beginning of the whole file,
s/\n/)|(/g changes every \n to )|(;
s/|($// removes the trailing |( resulting from the \n at EOF, that is likely in your file since you are on linux.
With perl:
$ seq 10 | perl -pe 's/.*/($&)/; s/\n/|/ if !eof'
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
s/.*/($&)/ to surround input lines with ()
s/\n/|/ if !eof will change newline to | except for the last input line.
Here's a solution with paste (just for fun):
$ seq 10 | paste -d'()' /dev/null - /dev/null | paste -sd'|'
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)
Using any awk:
$ seq 10 | awk '{printf "%s(%s)", sep, $0; sep="|"} END{print ""}'
(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(10)

How can I print only lines that are immediately preceeded by an empty line in a file using sed?

I have a text file with the following structure:
bla1
bla2
bla3
bla4
bla5
So you can see that some lines of text are preceeded by an empty line.
I understand that sed has the concept of two buffers, a pattern space buffer and a hold space buffer, so I'm guessing these need to come in to play here, but I'm unclear how to specify them to accomplish what I need.
In my contrived example above, I'd expect to see the following lines outputted:
bla3
bla5
sed is for doing s/old/new on individual lines, that is all. Any time you start talking about buffers or doing anything related to multi-lines comparisons you're using the wrong tool.
You could do this with awk:
$ awk -v RS= -F'\n' 'NR>1{print $1}' file
bla3
bla5
but it would fail to print the first non-empty line if the first line(s) in the file were empty so this may be what you want if you want lines of all space chars considered to be empty lines:
$ awk 'NF && !p{print} {p=NF}' file
bla3
bla5
and this otherwise:
$ awk '($0!="") && (p==""){print} {p=$0}' file
bla3
bla5
All of the above will work even if there are multiple empty lines preceding any given non-empty line.
To see the difference between the 3 approaches (which you won't see given the sample input in the question):
PS1> printf '\nfoo\n \nbar\n\netc\n' | cat -E
$
foo$
$
bar$
$
etc$
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk -v RS= -F'\n' 'NR>1{print $1}'
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk 'NF && !p{print} {p=NF}'
foo
bar
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk '($0!="") && (p==""){print} {p=$0}'
foo
etc
You can use the hold buffer easily to print the line before the blank like this:
sed -n -e '/^$/{x; p;}' -e h input
But I don't see an easy way to use it for your use case. For your case, instead of using the hold buffer, you could do:
sed -n -e '/^$/ba' -e d -e :a -e n -e p input
But I would do this with awk.
awk 'NR!=1{print $1}' RS= FS=\\n input-file
awk 'p;{p=/^$/}' file
above command does these for each line:
if p is 1, print line;
if line is empty, set p to 1.
if lines consisting of one or more spaces are also considered empty:
awk 'p;{p=!NF}' file
to print non-empty lines each coming right after an empty line, you can use this:
awk 'p*!(p=/^$/)' file
if p is 1 and this line is not empty (1*!(0) = 1*1 = 1), print this line;
otherwise (1*!(1) = 1*0 = 0, 0*anything = 0), don't print anything.
note that this one may not work with all awks, a portable version of this would look like:
awk 'p*(/./);{p=/^$/}' file
if lines consisting of one or more spaces are also considered empty:
awk 'p*NF;{p=!NF}' file
see them online here, and here.
If sed/awk is not mandatory, you can do it with grep:
grep -A 1 '^$' input.txt | grep -v -E '^$|--'
You can use sed to match a range of lines and do sub-matches inside the matches, like so:
# - use the "-n" option to omit printing of lines
# - match lines between a blank line (/^$/) and a non-blank one (/^./),
# then print only the line that contains at least a character,
# i.e, the non-blank line.
sed -ne '
/^$/,/^./ {
/^./{ p; }
}' input.txt
tested by gnu sed, your data in 'a':
$ sed -nE '/^$/{N;s/\n(.+)/\1/p}' a
bla3
bla5
add -i option precedes -n to real editing

How to grab the content of command "who" except 'pts/0'? [duplicate]

I have a file with three columns. I would like to delete the 3rd column(in-place editing). How can I do this with awk or sed?
123 abc 22.3
453 abg 56.7
1236 hjg 2.3
Desired output
123 abc
453 abg
1236 hjg
try this short thing:
awk '!($3="")' file
With GNU awk for inplace editing, \s/\S, and gensub() to delete
1) the FIRST field:
awk -i inplace '{sub(/^\S+\s*/,"")}1' file
or
awk -i inplace '{$0=gensub(/^\S+\s*/,"",1)}1' file
2) the LAST field:
awk -i inplace '{sub(/\s*\S+$/,"")}1' file
or
awk -i inplace '{$0=gensub(/\s*\S+$/,"",1)}1' file
3) the Nth field where N=3:
awk -i inplace '{$0=gensub(/\s*\S+/,"",3)}1' file
Without GNU awk you need a match()+substr() combo or multiple sub()s + vars to remove a middle field. See also Print all but the first three columns.
This might work for you (GNU sed):
sed -i -r 's/\S+//3' file
If you want to delete the white space before the 3rd field:
sed -i -r 's/(\s+)?\S+//3' file
It seems you could simply go with
awk '{print $1 " " $2}' file
This prints the two first fields of each line in your input file, separated with a space.
Try using cut... its fast and easy
First you have repeated spaces, you can squeeze those down to a single space between columns if thats what you want with tr -s ' '
If each column already has just one delimiter between it, you can use cut -d ' ' -f-2 to print fields (columns) <= 2.
for example if your data is in a file input.txt you can do one of the following:
cat input.txt | tr -s ' ' | cut -d ' ' -f-2
Or if you better reason about this problem by removing the 3rd column you can write the following
cat input.txt | tr -s ' ' | cut -d ' ' --complement -f3
cut is pretty powerful, you can also extract ranges of bytes, or characters, in addition to columns
excerpt from the man page on the syntax of how to specify the list range
Each LIST is made up of one range, or many ranges separated by commas.
Selected input is written in the same order that it is read, and is
written exactly once. Each range is one of:
N N'th byte, character or field, counted from 1
N- from N'th byte, character or field, to end of line
N-M from N'th to M'th (included) byte, character or field
-M from first to M'th (included) byte, character or field
so you also could have said you want specific columns 1 and 2 with...
cat input.txt | tr -s ' ' | cut -d ' ' -f1,2
Try this :
awk '$3="";1' file.txt > new_file && mv new_file file.txt
or
awk '{$3="";print}' file.txt > new_file && mv new_file file.txt
Try
awk '{$3=""; print $0}'
If you're open to a Perl solution...
perl -ane 'print "$F[0] $F[1]\n"' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-a autosplit mode – split input lines into the #F array. Defaults to splitting on whitespace
-e execute the following perl code

Removing blank lines

I have a csv file in which every other line is blank. I have tried everything, nothing removes the lines. What should make it easier is that the the digits 44 appear in each valid line. Things I have tried:
grep -ir 44 file.csv
sed '/^$/d' <file.csv
cat -A file.csv
sed 's/^ *//; s/ *$//; /^$/d' <file.csv
egrep -v "^$" file.csv
awk 'NF' file.csv
grep '\S' file.csv
sed 's/^ *//; s/ *$//; /^$/d; /^\s*$/d' <file.csv
cat file.csv | tr -s \n
Decided I was imagining the blank lines, but import into Google Sheets and there they are still! Starting to question my sanity! Can anyone help?
sed -n -i '/44/p' file
-n means skip printing
-i inplace (overwrite same file)
- /44/p print lines where '44' exists
without '44' present
sed -i '/^\s*$/d' file
\s is matching whitespace, ^startofline, $endofline, d delete line
Use the -i option to replace the original file with the edited one.
sed -i '/^[ \t]*$/d' file.csv
Alternatively output to another file and rename it, which is doing the exactly what -i does.
sed '/^[[:blank:]]*$/d' file.csv > file.csv.out && mv file.csv.out file.csv
Given:
$ cat bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
You can remove blank lines with Perl:
$ perl -lne 'print unless /^\s*$/' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
awk:
$ awk 'NF>0' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
sed + tr:
$ cat bl.txt | tr '\t' ' ' | sed '/^ *$/d'
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
Just sed:
$ sed '/^[[:space:]]*$/d' bl.txt
Line 1 (next line has a tab)
Line 2 (next has several space)
Line 3
Aside from the fact that your commands do not show that you capture their output in a new file to be used in place of the original, there's nothing wrong with them, EXCEPT that:
cat file.csv | tr -s \n
should be:
cat file.csv | tr -s '\n' # more efficient alternative: tr -s '\n' < file.csv
Otherwise, the shell eats the \ and all that tr sees is n.
Note, however, that the above only eliminates only truly empty lines, whereas some of your other commands also eliminate blank lines (empty or all-whitespace).
Also, the -i (for case-insensitive matching) in grep -ir 44 file.csv is pointless, and while using -r (for recursive searches) will not change the fact that only file.csv is searched, it will prepend the filename followed by : to each matching line.
If you have indeed captured the output in a new file and that file truly still has blank lines, the cat -A (cat -et on BSD-like platforms) you already mention in your question should show you if any unusual characters are present in the file, in the form of ^<char> sequences, such as ^M for \r chars.
If you like awk, this should do:
awk '/44/' file
It will only print lines that contains 44

tips with awk with changing parameters

i got several pieces of code that look like:
for ff in `seq 3 $nlpN`;
do
npc1[$ff]=`awk 'NR=='$ff' {print $1}' p_walls.raw`;
echo ${npc1[$ff]};
npc2[$ff]=`awk 'NR=='$ff' {print $2}' p_walls.raw`;
npc3[$ff]=`awk 'NR=='$ff' {print $3}' p_walls.raw`;
npRs[$ff]=`awk 'NR=='$ff' {print $4}' p_walls.raw`;
echo $ff
done
as You can see i'm invoking awk several times. Is there a faster way to do this, like invoking awk once and do the assignments with the changin parameters?
thanks a lot in advance!
input looks like:
...
3.76023 0.79528 0.307771 8729.82
3.76024 0.814664 0.307849 8650.2
3.76026 0.845679 0.307978 8802.97
3.76025 0.826293 0.307897 8690.43
3.76017 0.65959 0.30722 8936.07
...
im looking for sth like:
TY
That does look pretty inefficient. As written, awk is processing the input file in its entirety four times with every pass of the loop. I'm also pretty sure that cut is completely unnecessary unless you have the FS environment variable set to something strange. The following will replace the multiple awk runs with a single pass over the data file that stops after it finds the line. Then you can use cut to extract the individual fields.
for ff in `seq 3 $nlpN`
do
data=`awk 'NR=='$ff' { print $1, $2, $3, $4; exit }' p_walls.raw`
npc1[$ff]=`echo "$data" | cut -f1 -d ' '`
echo ${npc1[$ff]}
npc2[$ff]=`echo "$data" | cut -f2 -d ' '`
npc3[$ff]=`echo "$data" | cut -f3 -d ' '`
npRs[$ff]=`echo "$data" | cut -f4 -d ' '`
echo $ff
done
Note that I added an exit statement so that awk will exit after processing the line. This prevents it from reading the entire file on every pass. If all that you need to do is extract a single line from a file, then you might want to use sed instead since (IMHO) the script is easier to read and it seems to be a little faster on large files. The following sed expression is equivalent to the awk line:
data=`sed -n -e "$ff p" -e "$ff q" p_walls.raw`
The -n tells sed to only output from the lines that are selected by the script. In this case, the script, supplied as two -e parameters. Each is an address followed by processing command. Multiple commands are separated newlines in sed scripts but they can also be specified by multiple -e parameters with the same address. Putting this all together, the expression 42 p tells sed to select line 42 and run the p command which prints the selected pattern space (the 42nd line). The 42 q command tells the utility to exit after processing the 42nd line. So, our sed expression reads the first $ffth lines from "p_walls.raw", prints the $ffth one and exits.
Run awk a single time and process the output on each iteration separately.
awk "(NR > 3 && NR <= $nlpN)"' { print NR, $1, $2, $3, $4 }' p_walls.raw |
while read ff c1 c2 c3 Rs
do
npc1[$ff]=$c1
echo ${npc1[$ff]};
npc2[$ff]=$c2
npc3[$ff]=$c3
npRs[$ff]=$Rs
echo $ff
done