Extract each word immediately preceded by an asterisk - awk

I'm a computer science student, and we were asked to extract a word from the output of the lpoptions -l command using sed. The output looks like this:
PageSize/Page Size: Letter *A4 11x17 A3 A5 B5 Env10 EnvC5 EnvDL EnvISOB5 EnvMonarch Executive Legal
Resolution/Resolution: *default 150x150dpi 300x300dpi 600x600dpi 1200x600dpi 1200x1200dpi 2400x600dpi 2400x1200dpi 2400x2400dpi
InputSlot/Media Source: Default Tray1 *Tray2 Tray3 Manual
Duplex/Double-Sided Printing: DuplexNoTumble DuplexTumble *None
PreFilter/GhostScript pre-filtering: EmbedFonts Level1 Level2 *No
I need to get only the words preceded by a *, but I can't figure out how to do it with sed. I already did it with cut, which is easier, but I want to know how to do it with sed.
I expect:
A4
default
Tray2
None
No
I tried:
sed -E 's/.*\*=(\S+).*/\1/'
but it didn't do anything.

With any POSIX sed (assuming there is always at least one non-space character following the asterisk):
sed 's/.*\*\([^[:space:]]*\).*/\1/'
With GNU sed it'd be:
sed -E 's/.*\*(\S+).*/\1/'
Given your sample they both output:
A4
default
Tray2
None
No
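Both commands print a line unchanged when it contains no asterisk. A minimal sketch of a variant that drops such lines, assuming that is the desired behaviour (starred_values is a hypothetical helper name, and the middle sample line is made up for illustration):

```shell
# starred_values is a hypothetical helper name, not part of lpoptions or sed.
starred_values() {
    # -n suppresses automatic printing; the p flag prints only the lines
    # on which the substitution actually matched.
    sed -n 's/.*\*\([^[:space:]]\{1,\}\).*/\1/p'
}

printf '%s\n' \
    'PageSize/Page Size: Letter *A4 11x17 A3' \
    'a line without any default marker' \
    'Duplex/Double-Sided Printing: DuplexNoTumble DuplexTumble *None' |
    starred_values
# prints:
# A4
# None
```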

Could you please try the following, in case you are OK with an awk solution.
awk '{for(i=1;i<=NF;i++){if($i~/^\*/){sub(/^\*/,"",$i);print $i}}}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
for(i=1;i<=NF;i++){ ##Starting for loop here to loop through each field of current line.
if($i~/^\*/){ ##Checking condition if current field starts with * then do following.
sub(/^\*/,"",$i) ##Substituting starting * with NULL in current field.
print $i ##Printing current field value here.
}
}
}
' Input_file ##Mentioning Input_file name here.
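The same field loop can also report which option each default belongs to. This is only a sketch under the assumption that the option keyword is everything before the first slash on the line; option_defaults is a hypothetical name:

```shell
# option_defaults is a hypothetical helper; it reads lpoptions-style lines from stdin.
option_defaults() {
    awk '{
        split($0, parts, "/")          # parts[1] is the option keyword before the slash
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\*/)            # current field starts with an asterisk
                print parts[1] "=" substr($i, 2)   # drop the asterisk without changing $i
    }'
}

printf '%s\n' \
    'PageSize/Page Size: Letter *A4 11x17 A3' \
    'InputSlot/Media Source: Default Tray1 *Tray2 Tray3 Manual' |
    option_defaults
# prints:
# PageSize=A4
# InputSlot=Tray2
```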

Related

right pad regex with spaces using sed or awk

I have a file with two fields separated with :, both fields are varying length, second field can have all sort of characters(user input). I want the first field to be right padded with spaces to fixed length of 15 characters, for first field I have a working regex #.[A-Z0-9]{4,12}.
sample:
#ABC123:"wild things here"
#7X3Z:"":":#":";:*:-user input:""
#99999X999:"also, imagine: unicode, yay!"
desired output:
#ABC123        :"wild things here"
#7X3Z          :"":":#":";:*:-user input:""
#99999X999     :"also, imagine: unicode, yay!"
There are plenty of examples of how to zero-pad a number, but surprisingly little about padding a general regex match or a field. Any help using (preferably) sed or awk?
Here is another awk solution that would work with any version of awk:
awk 'BEGIN {FS=OFS=":"} {$1 = sprintf("%-15s", $1)} 1' file
#ABC123        :"wild things here"
#7X3Z          :"":":#":";:*:-user input:""
#99999X999     :"also, imagine: unicode, yay!"
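Because trailing padding is hard to see, one way to check the result is to measure the first field afterwards. A small sketch, where pad_first_field and its width argument are hypothetical names introduced only for illustration:

```shell
# pad_first_field is a hypothetical helper; the width argument defaults to 15.
pad_first_field() {
    awk -v w="${1:-15}" 'BEGIN {FS = OFS = ":"} {$1 = sprintf("%-" w "s", $1)} 1'
}

# Print the length of the padded first field of each line.
printf '%s\n' '#ABC123:"wild things here"' '#7X3Z:"":":#"' |
    pad_first_field 15 | awk -F: '{ print length($1) }'
# prints:
# 15
# 15
```

Assigning to $1 makes awk rebuild the line with OFS, which is why FS and OFS are both set to the colon.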
With perl:
$ perl -pe 's/^[^:]+/sprintf("%-15s",$&)/e' ip.txt
#ABC123        :"wild things here"
#7X3Z          :"":":#":";:*:-user input:""
#99999X999     :"also, imagine: unicode, yay!"
The e flag allows you to use Perl code in replacement section. $& will have the matched portion which gets formatted by sprintf.
With awk:
# should work with any awk
awk 'match($0, /^[^:]+/){printf "%-15s%s\n", substr($0,1,RLENGTH), substr($0,RLENGTH+1)}'
# can be simplified with GNU awk
awk 'match($0, /^[^:]+/, m){printf "%-15s%s\n", m[0], substr($0,RLENGTH+1)}'
# or
awk 'match($0, /^([^:]+)(.+)/, m){printf "%-15s%s\n", m[1], m[2]}'
substr($0,1,RLENGTH) or m[0] will give contents of first field. I have used 1 instead of the usual RSTART here since we are matching start of line
substr($0,RLENGTH+1) will give rest of the line contents (i.e. from the first :)
See awk manual: String-Manipulation for details about match function.
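To see what match sets for one of the sample lines, here is a quick sketch (show_match is a hypothetical helper name):

```shell
# show_match is a hypothetical helper: it prints RSTART, RLENGTH and the
# remainder of the line after the matched first field.
show_match() {
    awk 'match($0, /^[^:]+/) { print RSTART, RLENGTH, substr($0, RLENGTH + 1) }'
}

printf '%s\n' '#ABC123:"wild things here"' | show_match
# prints: 1 7 :"wild things here"
```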
Adding one more way of padding the 1st column here; anubhava's answer with sprintf is the better answer, so this is just another option. Here I have created a variable named spaces, which holds the width that the first field should be padded to.
awk -v spaces="15" 'BEGIN{FS=OFS=":"} {sub(/:/,sprintf("%" (spaces-length($1)+1) "s",":"))} 1' Input_file
Explanation: Adding detailed explanation for above.
awk -v spaces="15" ' ##Starting awk program from here, setting spaces to 15 here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS=OFS=":" ##Setting FS and OFS as colon here.
}
{
sub(/:/,sprintf("%" (spaces-length($1)+1) "s",":")) ##Substituting first colon with a right-aligned colon of width spaces-length($1)+1, so that the colon lands right after column number spaces.
}
1 ##Printing current line here.
' Input_file ##Mentioning Input_file name here.
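I believe anubhava's solution of
awk 'BEGIN {FS=OFS=":"} {$1 = sprintf("%-15s", $1)} 1' file
can be simplified even further as:
awk -F: 'BEGIN{OFS=FS} $1=sprintf("%-15s",$1)'
The { } and the final 1 are optional here: the assignment itself acts as the pattern, and its value (the padded, non-empty string) is true, so each line is printed.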

How can I extract using sed or awk between newlines after a specific pattern?

I'd like to know whether there are alternatives, using other bash commands, for printing the range of IPs under #Hiko, other than the sed, tail and head pipeline below, which I worked out to get what I needed from my hosts file.
I'm just curious and keen to learn more about bash, and I hope to gain more knowledge from the community.
:D
$ sed -n '/#Hiko/,/#Pico/p' /etc/hosts | tail -n +3 | head -n -2
/etc/hosts
#Tito
192.168.1.21
192.168.1.119
#Hiko
192.168.1.243
192.168.1.125
192.168.1.94
192.168.1.24
192.168.1.242
#Pico
192.168.1.23
192.168.1.93
192.168.1.121
1st solution: With shown samples could you please try following. Written and tested in GNU awk.
awk -v RS= '/#Pico/{exit} /#Hiko/{found=1;next} found' Input_file
Explanation:
awk -v RS= ' ##Starting awk program from here.
/#Pico/{ ##Checking condition if line has #Pico then do following.
exit ##exiting from program.
}
/#Hiko/{ ##Checking condition if line has #Hiko is present in line.
found=1 ##Setting found to 1 here.
next ##next will skip all further statements from here.
}
found ##Checking condition if found is SET then print the line.
' Input_file ##mentioning Input_file name here.
2nd solution: Without setting RS, try the following.
awk '/#Pico/{exit} /#Hiko/{found=1;next} NF && found' Input_file
3rd solution: With the shown samples, you could look for the record #Hiko, print the record after it, and exit.
awk -v RS= '/#Hiko/{found=1;next} found{print;exit}' Input_file
NOTE: All of the solutions above check whether the string #Hiko or #Pico is present anywhere in the line. If you want an exact match, change /#Hiko/ and /#Pico/ to /^#Hiko$/ and /^#Pico$/ respectively.
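The paragraph-mode idea can be wrapped so that the header is a parameter. This is a sketch only, assuming that an empty line separates each header from its block of IPs (which is what the RS= solutions rely on); block_after is a hypothetical helper name and the sample data is shortened:

```shell
# block_after is a hypothetical helper; it reads the hosts data from stdin.
block_after() {
    # RS= enables paragraph mode: records are separated by blank lines.
    # The record right after the one equal to the header is printed.
    awk -v RS= -v hdr="$1" 'found { print; exit } $0 == hdr { found = 1 }'
}

printf '%s\n' '#Tito' '' '192.168.1.21' '' '#Hiko' '' \
    '192.168.1.243' '192.168.1.125' '' '#Pico' '' '192.168.1.23' |
    block_after '#Hiko'
# prints:
# 192.168.1.243
# 192.168.1.125
```

Comparing $0 with the header as a string avoids having to escape regex metacharacters in the header.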
With sed (checked with GNU sed, syntax might differ for other implementations)
$ sed -n '/#Hiko/{n; :a n; /^$/q; p; ba}' /etc/hosts
192.168.1.243
192.168.1.125
192.168.1.94
192.168.1.24
192.168.1.242
-n turn off automatic printing of pattern space
/#Hiko/ if line contains #Hiko
n get next line (assuming there's always an empty line)
:a label a
n get next line (using n will overwrite any previous content in the pattern space, so only single line content is present in this case)
/^$/q if the current line is empty, quit
p print the current line
ba branch to label a
You can use
awk -v RS= '/^#Hiko$/{getline;print;exit}' file
awk -v RS= '$0 == "#Hiko"{getline;print;exit}' file
Which means:
RS= - make awk read the file paragraph by paragraph
/^#Hiko$/ or $0 == "#Hiko" - finds a paragraph that is equal to #Hiko
{getline;print;exit} - gets the next paragraph, prints it and exits.
You may use:
awk -v RS= 'p && NR == p + 1; $1 == "#Hiko" {p = NR}' /etc/hosts
192.168.1.243
192.168.1.125
192.168.1.94
192.168.1.24
192.168.1.242
This might work for you (GNU sed):
sed -n '/^#/h;G;/^[0-9].*\n#Hiko/P' file
Copy the header to the hold buffer.
Append the hold buffer to each line.
If the line begins with a digit and contains the required header, print the first line in the pattern space.
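Since this approach tags every line with the most recent header, it should not depend on blank lines between the blocks. A short sketch on shortened sample data without any blank lines (hiko_ips is a hypothetical helper name):

```shell
# hiko_ips is a hypothetical helper wrapping the hold-buffer command above.
hiko_ips() {
    # h: remember the latest header; G: append it to every line;
    # P: print only the first line of the pattern space when it starts
    # with a digit and the appended header is #Hiko.
    sed -n '/^#/h;G;/^[0-9].*\n#Hiko/P'
}

printf '%s\n' '#Tito' '192.168.1.21' '#Hiko' '192.168.1.243' '192.168.1.125' \
    '#Pico' '192.168.1.23' |
    hiko_ips
# prints:
# 192.168.1.243
# 192.168.1.125
```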

Remove bracket from a particular string

If some text is like
cell (ABC)
(A1)
(A2)
function (A1.A2)
I want output as
cell ABC
A1
A2
function (A1.A2)
I want to remove the brackets from each line of the file except the function line.
Using the code
sed 's/[()]//g' file
removes the brackets from every line. How can I modify the above code to get the desired output?
You can add a jump out condition to your sed command:
sed '/^function /b;s/[()]//g' file
Or, condition the substitute on not matching a function:
sed '/^function /!s/[()]//g' file
Could you please try the following. Written and tested with the shown samples in GNU awk.
awk '!/function/{gsub(/[()]/,"")} 1' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
!/function/{ ##Checking condition if line does not have function in it then do following.
gsub(/[()]/,"") ##Globally substituting ( OR ) with null in current line.
}
1 ##1 will print current line.
' Input_file ##Mentioning Input_file name here.
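Note that !/function/ skips any line containing the word function anywhere. A sketch of an anchored variant, assuming only lines that begin with "function " should keep their brackets (strip_parens is a hypothetical name, and the third sample line is made up to show the difference):

```shell
# strip_parens is a hypothetical helper; it reads text from stdin.
strip_parens() {
    # Only lines that start with "function " are left untouched.
    awk '!/^function /{gsub(/[()]/,"")} 1'
}

printf '%s\n' 'cell (ABC)' '(A1)' 'cell (function_x)' 'function (A1.A2)' |
    strip_parens
# prints:
# cell ABC
# A1
# cell function_x
# function (A1.A2)
```

With the unanchored pattern, the third line would have kept its brackets.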

Delete third-to-last line of file using sed or awk

I have several text files with different numbers of lines, and I have to delete the third-to-last line in all of them. Here is a sample file:
bear
horse
window
potato
berry
cup
Expected result for this file:
bear
horse
window
berry
cup
Can we delete the third-to-last line of a file:
a. not based on any string/pattern.
b. based only on a condition that it has to be the third-to-last line
I have a problem with how to index my files starting from the last line. I have tried this, from another SO question, for the second-to-last line:
> sed -i 'N;$!P;D' output1.txt
With a tac + awk solution, could you please try the following. Just set the awk variable line to the line number (counted from the bottom) that you want to skip.
tac Input_file | awk -v line="3" 'line==FNR{next} 1' | tac
Explanation: tac reads Input_file in reverse (from the last line to the first) and passes its output to awk, which skips the record whose number equals line (the line we want to drop) and prints all the others via 1. The second tac restores the original order.
2nd solution: With awk + wc solution, kindly try following.
awk -v lines="$(wc -l < Input_file)" -v skipLine="3" 'FNR!=(lines-skipLine+1)' Input_file
Explanation: Starting awk program here and creating a variable lines which has total number of lines present in Input_file in it. variable skipLine has that line number which we want to skip from bottom of Input_file. Then in main program checking condition if current line is NOT equal to lines-skipLine+1 then printing the lines.
3rd solution: Adding solution as per Ed sir's comment here.
awk -v line=3 '{a[NR]=$0} END{for (i=1;i<=NR;i++) if (i != (NR-line+1)) print a[i]}' Input_file
Explanation: Adding detailed explanation for 3rd solution.
awk -v line=3 ' ##Starting awk program from here, setting awk variable line to 3(line which OP wants to skip from bottom)
{
a[NR]=$0 ##Creating array a with index of NR and value is current line.
}
END{ ##Starting END block of this program from here.
for(i=1;i<=NR;i++){ ##Starting for loop till value of NR here.
if(i != (NR-line+1)){ ##Checking condition if i is NOT equal to NR-line+1 (the line-th line from the bottom) then do following.
print a[i] ##Printing a with index i here.
}
}
}
' Input_file ##Mentioning Input_file name here.
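If holding the whole file in an array is a concern, a rolling window of the last n lines is enough. The following is a sketch of that idea rather than one of the posted answers; drop_nth_from_end is a hypothetical helper name and it reads from stdin:

```shell
# drop_nth_from_end is a hypothetical helper: $1 is n (1 means the last line).
drop_nth_from_end() {
    awk -v n="$1" '
        NR > n { print buf[NR % n] }   # line NR-n has left the window, so it is safe to print
        { buf[NR % n] = $0 }           # keep the last n lines in a circular buffer
        END {
            # The window now holds lines NR-n+1 .. NR; skip the oldest one.
            for (i = NR - n + 2; i <= NR; i++)
                if (i >= 1) print buf[i % n]
        }'
}

printf '%s\n' bear horse window potato berry cup | drop_nth_from_end 3
# prints:
# bear
# horse
# window
# berry
# cup
```

Input with fewer than n lines is passed through unchanged, since there is no n-th line from the end to drop.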
With ed
ed -s ip.txt <<< $'$-2d\nw'
# thanks Shawn for a more portable solution
printf '%s\n' '$-2d' w | ed -s ip.txt
This does in-place editing. $ refers to the last line, and you can specify a negative relative offset, so $-2 refers to the third-to-last line. The w command then writes the changes.
See ed: Line addressing for more details.
This might work for you (GNU sed):
sed '1N;N;$!P;D' file
Open a window of 3 lines in the file then print/delete the first line of the window until the end of the file.
At the end of the file, do not print the first line in the window i.e. the 3rd line from the end of the file. Instead, delete it, and repeat the sed cycle. This will try to append a line after the end of file, which will cause sed to bail out, printing the remaining lines in the window.
A generic solution for n lines back (where n is 2 or more lines from the end of the file), is:
sed ':a;N;s/[^\n]*/&/3;Ta;$!P;D' file
Of course you could use:
tac file | sed 3d | tac
But then you would be reading the file 3 times.
To delete the 3rd-to-last line of a file, you can use head and tail:
{ head -n -3 file; tail -2 file; }
For a large input file, when performance matters, this is very fast because it doesn't read and write line by line. Also, do not modify the semicolons or the spaces next to the braces; see the documentation on command grouping.
Or use sed with tac:
tac file | sed '3d' | tac
Or use awk with tac:
tac file | awk 'NR!=3' | tac

How to delete the last nth characters from the nth line with sed?

Ubuntu 16.04
Bash 4.4
I want to delete the last n characters from the nth line. Here is a simple file in which the last 4 characters of each line are the digit 4.
root#0o0o0o0o0 ~/.ssh # cat remove.txt
00000000004444
55555555555555555555555554444
222222222222222224444
000033334444
111114444
To remove the last 4 characters from each line I can execute sed -i 's/....$//' remove.txt
root#0o0o0o0o0 ~/.ssh # sed -i 's/....$//' remove.txt
root#0o0o0o0o0 ~/.ssh # cat remove.txt
0000000000
5555555555555555555555555
22222222222222222
00003333
11111
But what if I wanted to remove the last 4 characters from the 4th line, removing the 3's so the file would look like this:
0000000000
5555555555555555555555555
22222222222222222
0000
11111
With GNU sed:
sed -i 's/....$//; 4s/....$//' file
4s/....$// limits search and replace to 4th row.
See: man sed and info sed
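The line address and the character count can also be passed in as shell parameters. A minimal sketch, assuming both arguments are plain positive integers (no validation is done); trim_line_tail is a hypothetical helper name and it reads from stdin instead of editing a file in place:

```shell
# trim_line_tail is a hypothetical helper.
# $1 = line number, $2 = number of trailing characters to drop.
trim_line_tail() {
    # With 4 and 4 the script passed to sed is: 4s/.\{4\}$//
    sed "${1}s/.\{${2}\}\$//"
}

printf '%s\n' '0000000000' '5555' '22222' '00003333' '11111' |
    trim_line_tail 4 4
# prints:
# 0000000000
# 5555
# 22222
# 0000
# 11111
```

A line shorter than the requested count does not match the pattern and is left as it is.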
Could you please try the following in awk. Written and tested in GNU awk.
awk -v line="4" -v nofChar="4" '
{
sub(".{"nofChar"}$","")
}
FNR==line{
sub(".{"nofChar"}$","")
}
1
' Input_file
Detailed explanation:
awk -v line="4" -v nofChar="4" ' ##Starting awk program and setting line variable value, nofChar variable value here.
{
sub(".{"nofChar"}$","") ##Removing the last nofChar characters from every line here.
}
FNR==line{ ##Checking if this is same line which OP wants to do 2nd time substitution.
sub(".{"nofChar"}$","") ##Removing the last nofChar characters a second time, on this line only.
}
1 ##Mentioning 1 will print edited/non-edited line.
' Input_file ##Mentioning Input_file name here.
2nd solution: In case the number of characters to remove from every line differs from the number to remove from the specific line, try the following. Change the value of the variable nofUsualChar accordingly.
awk -v line="4" -v nofChar="4" -v nofUsualChar="4" '
{
sub(".{"nofUsualChar"}$","")
}
FNR==line{
sub(".{"nofChar"}$","")
}
1
' Input_file