Remove bracket from a particular string - awk

If some text is like
cell (ABC)
(A1)
(A2)
function (A1.A2)
I want output as
cell ABC
A1
A2
function (A1.A2)
I want to remove the brackets from every line of the file except the line that starts with function.
This command:
sed 's/[()]//g' file
removes the brackets from every line. How can I modify it to get the desired output?

You can add a jump out condition to your sed command:
sed '/^function /b;s/[()]//g' file
Or apply the substitution only to lines that do not start with function:
sed '/^function /!s/[()]//g' file

Could you please try the following? Written and tested with the shown samples in GNU awk.
awk '!/function/{gsub(/[()]/,"")} 1' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
!/function/{ ##Checking condition if line does not have function in it then do following.
gsub(/[()]/,"") ##Globally substituting ( OR ) with null in current line.
}
1 ##1 will print current line.
' Input_file ##Mentioning Input_file name here.
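Note that !/function/ skips any line containing the word function anywhere. A minimal sketch of a stricter variant, assuming the keyword is always the first whitespace-separated field (as in the sample), compares $1 instead. The temporary file and the variable names input and result are only for illustration.

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' 'cell (ABC)' '(A1)' '(A2)' 'function (A1.A2)' > "$input"

# Strip parentheses unless the first field is exactly "function";
# the trailing 1 prints every line, modified or not.
result=$(awk '$1 != "function" { gsub(/[()]/, "") } 1' "$input")
printf '%s\n' "$result"

rm -f "$input"
```

With the sample input this prints the desired output from the question, leaving the function line untouched.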

Related

Select first and last column using regex or linux command

I have a text file that looks something like this...
("oo" (set CANDRA-E-O 0) "ऊ")
("o" (set CANDRA-E-O ?ऑ) "ओ")
("oa" "ऑ")
("au" "औ")
I need to extract the first and last columns like:
"oo", "ऊ"
"o", "ओ"
"oa", "ऑ"
"au", "औ"
I have managed to extract the first column, but I am not sure how to select the last one.
\ {2}\(\".+\"\
With your shown samples/attempts, please try the following awk command. Written and tested in GNU awk.
awk -v FPAT='"[^"]*"' '{for(i=1;i<=NF;i++){printf("%s%s",$i,i==NF?ORS:OFS)}}' Input_file
Explanation: FPAT is set to '"[^"]*"', which tells GNU awk what a field looks like (a double-quoted string) rather than what separates the fields. The main block then loops over all fields of each line and prints them separated by OFS (a space by default), with ORS (a newline) after the last field. FPAT is a GNU awk feature, so this does not work in other awk implementations.
With this awk solution:
awk -v OFS="," '{sub(/^\(/,"",$1);sub(/\)$/,"",$NF);print $1, $NF}' file
"oo","ऊ"
"o","ओ"
"oa","ऑ"
"au","औ"
The first sub() removes the opening parenthesis ( from the first field.
The second sub() likewise removes the closing parenthesis ) from the last field.
The two fields are then printed separated by a comma, because OFS=",".
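If gawk is not available, or the exact comma-plus-space format from the question is wanted, a plain sed substitution is one possible alternative. This is only a sketch: it assumes every line contains at least two double-quoted strings and that the quoted strings contain no embedded double quotes. The variable names are illustrative.

```shell
# Sample lines from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
("oo" (set CANDRA-E-O 0) "ऊ")
("o" (set CANDRA-E-O ?ऑ) "ओ")
("oa" "ऑ")
("au" "औ")
EOF

# Capture the first quoted string and the last quoted string on each line,
# then rewrite the line as: first, last
result=$(sed 's/^[^"]*\("[^"]*"\).*\("[^"]*"\)[^"]*$/\1, \2/' "$input")
printf '%s\n' "$result"

rm -f "$input"
```

The greedy .* in the middle swallows everything between the two captures, so any number of unquoted tokens between the first and last column is ignored.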

Extract each word immediately preceded by an asterisk

I'm a computer science student, and we were asked to use sed to extract words from the output of the lpoptions -l command, which looks like this:
PageSize/Page Size: Letter *A4 11x17 A3 A5 B5 Env10 EnvC5 EnvDL EnvISOB5 EnvMonarch Executive Legal
Resolution/Resolution: *default 150x150dpi 300x300dpi 600x600dpi 1200x600dpi 1200x1200dpi 2400x600dpi 2400x1200dpi 2400x2400dpi
InputSlot/Media Source: Default Tray1 *Tray2 Tray3 Manual
Duplex/Double-Sided Printing: DuplexNoTumble DuplexTumble *None
PreFilter/GhostScript pre-filtering: EmbedFonts Level1 Level2 *No
I need to get only the words preceded by a *, but I can't find how to do it with sed. I already did it using cut, which is easier, but I want to know how to do it with sed.
I expect :
A4
default
Tray2
None
No
and I had tried :
sed -E 's/.*\*=(\S+).*/\1/'
but it didn't do anything.
With any POSIX sed (assuming there is always at least one non-space character following the asterisk):
sed 's/.*\*\([^[:space:]]*\).*/\1/'
With GNU sed it'd be:
sed -E 's/.*\*(\S+).*/\1/'
Given your sample they both output:
A4
default
Tray2
None
No
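Both commands print lines that contain no asterisk unchanged, because the substitution simply does not apply to them. A minimal sketch of a stricter variant uses -n together with the p flag so that only lines where the substitution succeeded are printed. The Copies line below is a made-up example of an option line without a marked default; it is not in the original output.

```shell
# A few lines from the question plus one hypothetical line without an asterisk.
input=$(mktemp)
cat > "$input" <<'EOF'
PageSize/Page Size: Letter *A4 11x17 A3
Resolution/Resolution: *default 150x150dpi
Copies/Number of copies: 1 2 3
Duplex/Double-Sided Printing: DuplexNoTumble DuplexTumble *None
EOF

# -n suppresses automatic printing; the p flag prints only substituted lines.
result=$(sed -n 's/.*\*\([^[:space:]]*\).*/\1/p' "$input")
printf '%s\n' "$result"

rm -f "$input"
```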
Could you please try the following, in case you are OK with an awk solution.
awk '{for(i=1;i<=NF;i++){if($i~/^\*/){sub(/^\*/,"",$i);print $i}}}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
{
for(i=1;i<=NF;i++){ ##Starting for loop here to loop through each field of current line.
if($i~/^\*/){ ##Checking condition if current field starts with * then do following.
sub(/^\*/,"",$i) ##Substituting starting * with NULL in current field.
print $i ##Printing current field value here.
}
}
}
' Input_file ##Mentioning Input_file name here.

How to extract data in such a pattern using grep or awk?

I have multiple instances of the following pattern in my document:
Dipole Moment: [D]
X: 1.5279 Y: 0.1415 Z: 0.1694 Total: 1.5438
I want to extract the total dipole moment, so 1.5438. How can I pull this off?
When I throw in grep "Dipole Moment: [D]" filename, I don't get the line after. I am new to these command line interfaces. Any help you can provide would be greatly appreciated.
Could you please try the following? Written and tested with the shown samples in GNU awk.
awk '/Dipole Moment: \[D\]/{found=1;next} found{print $NF;found=""}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/Dipole Moment: \[D\]/{ ##Checking if line contains Dipole Moment: \[D\] escaped [ and ] here.
found=1 ##Setting found to 1 here.
next ##next will skip all further statements from here.
}
found{ ##Checking condition if found is NOT NULL then do following.
print $NF ##Printing last field of current line here.
found="" ##Nullifying found here.
}
' Input_file ##Mentioning Input_file name here.
Sed alternative:
sed -rn '/^Dipole/{n;s/(^[[:space:]]{5}.*[[:space:]]{5})(.*)(([[:space:]]{5}.*+[:][[:space:]]{5}.*){3})/\2/p}' file
Search for the line beginning with "Dipole" then read the next line. Split this line into three sections based on regular expressions and substitute the line for the second section only, printing the result.
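The expression above depends on the exact column widths of the data line. A simpler sketch, assuming only that the line after the header contains the label Total: followed by the value, could look like this. The spacing in the sample and the second record are invented for illustration.

```shell
# Two hypothetical records in the layout shown in the question.
input=$(mktemp)
cat > "$input" <<'EOF'
Dipole Moment: [D]
     X:      1.5279      Y:      0.1415      Z:      0.1694  Total:      1.5438
Some other line
Dipole Moment: [D]
     X:      0.1000      Y:      0.2000      Z:      0.3000  Total:      0.3742
EOF

# On each header line, read the next line (n) and keep only the token
# that follows "Total:"; p prints the line only if the substitution matched.
result=$(sed -n '/Dipole Moment: \[D\]/{n;s/.*Total:[[:space:]]*\([^[:space:]]*\).*/\1/p;}' "$input")
printf '%s\n' "$result"

rm -f "$input"
```

The brackets in the address are escaped because an unescaped [D] would be read as a bracket expression matching the single character D, which is also why the grep attempt in the question did not match the header literally.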

Delete third-to-last line of file using sed or awk

I have several text files with different numbers of rows, and I have to delete the third-to-last line in all of them. Here is a sample file:
bear
horse
window
potato
berry
cup
Expected result for this file:
bear
horse
window
berry
cup
Can we delete the third-to-last line of a file:
a. not based on any string/pattern.
b. based only on a condition that it has to be the third-to-last line
My problem is how to index the lines of my files counting from the last line. I have tried this, taken from another SO question about the second-to-last line:
> sed -i 'N;$!P;D' output1.txt
With a tac + awk solution, could you please try the following. Just set the awk variable line to whichever line (counted from the bottom) you want to skip.
tac Input_file | awk -v line="3" 'line==FNR{next} 1' | tac
Explanation: tac reads Input_file in reverse (from the last line to the first) and passes its output to awk, which checks whether the current line number FNR equals line (the line we want to skip). If so, that line is not printed; 1 prints all other lines. The second tac restores the original order.
2nd solution: With an awk + wc solution, kindly try the following.
awk -v lines="$(wc -l < Input_file)" -v skipLine="3" 'FNR!=(lines-skipLine+1)' Input_file
Explanation: The variable lines holds the total number of lines in Input_file, and the variable skipLine holds the number, counted from the bottom of Input_file, of the line we want to skip. The main program prints every line whose line number is NOT equal to lines-skipLine+1.
3rd solution: Adding solution as per Ed sir's comment here.
awk -v line=3 '{a[NR]=$0} END{for (i=1;i<=NR;i++) if (i != (NR-line+1)) print a[i]}' Input_file
Explanation: Adding detailed explanation for 3rd solution.
awk -v line=3 ' ##Starting awk program from here, setting awk variable line to 3 (the line, counted from the bottom, which OP wants to skip).
{
a[NR]=$0 ##Creating array a with index of NR and value is current line.
}
END{ ##Starting END block of this program from here.
for(i=1;i<=NR;i++){ ##Starting for loop till value of NR here.
if(i != (NR-line+1)){ ##Checking condition if i is NOT equal to NR-line+1 (the line-th line from the bottom) then do following.
print a[i] ##Printing a with index i here.
}
}
}
' Input_file ##Mentioning Input_file name here.
With ed
ed -s ip.txt <<< $'$-2d\nw'
# thanks Shawn for a more portable solution
printf '%s\n' '$-2d' w | ed -s ip.txt
This will do in-place editing. $ refers to the last line, and you can specify a negative relative offset, so $-2 refers to the third-to-last line. The w command then writes the changes.
See ed: Line addressing for more details.
This might work for you (GNU sed):
sed '1N;N;$!P;D' file
Open a window of 3 lines in the file then print/delete the first line of the window until the end of the file.
At the end of the file, do not print the first line in the window i.e. the 3rd line from the end of the file. Instead, delete it, and repeat the sed cycle. This will try to append a line after the end of file, which will cause sed to bail out, printing the remaining lines in the window.
A generic solution for n lines back (where n is 2 or more lines from the end of the file), is:
sed ':a;N;s/[^\n]*/&/3;Ta;$!P;D' file
Of course you could use:
tac file | sed 3d | tac
But then you would be reading the file 3 times.
To delete the 3rd-to-last line of a file, you can use head and tail:
{ head -n -3 file; tail -2 file; }
For a large input file, when performance matters, this is very fast because it doesn't read and write line by line. Note that head -n with a negative count is a GNU extension. Also, do not change the semicolons or the spaces next to the braces; see the documentation on command grouping.
Or use sed with tac:
tac file | sed '3d' | tac
Or use awk with tac:
tac file | awk 'NR!=3' | tac
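Where tac or GNU head is not available, a single-pass awk that keeps only the last n lines in memory is another option. This is a sketch rather than a tuned solution: n, buf, input and result are names chosen for illustration, and the input is assumed to end with a newline.

```shell
# Sample file from the question.
input=$(mktemp)
printf '%s\n' bear horse window potato berry cup > "$input"

# n is the line to drop, counted from the end of the file.
result=$(awk -v n=3 '
  NR > n { print buf[NR % n] }   # slot about to be overwritten holds line NR-n
  { buf[NR % n] = $0 }           # keep only the most recent n lines
  END {
    # The buffer now holds lines NR-n+1 .. NR; skip the first, print the rest.
    for (i = NR - n + 2; i <= NR; i++)
      if (i > 0) print buf[i % n]
  }
' "$input")
printf '%s\n' "$result"

rm -f "$input"
```

The file is read once and only n lines are held at a time, unlike the array-based awk solution above, which stores the whole file. If the file has fewer than n lines, nothing is deleted.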

AIX/KSH Extract string from a comma separated line

I want to extract the part "virtual_eth_adapters" from the following comma separated line:
lpar_io_pool_ids=none,max_virtual_slots=300,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1","virtual_scsi_adapters=166/client/1/ibm/166/0,266/client/2/ibm/266/0",virtual_eth_adapters=116/0/263,proc_mode=shared,min_proc_units=0.5,desired_proc_units=2.0,max_proc_units=8.0
I'm using AIX with ksh.
I found a workaround using awk with the -F flag to split the string on a delimiter and then print the field by its number. But if the input string changes, the field number may differ...
1st solution: Could you please try the following, in case you want to print the string virtual_eth_adapters in the output too.
awk '
match($0,/virtual_eth_adapters[^,]*/){
print substr($0,RSTART,RLENGTH)
}
' Input_file
Output will be as follows.
virtual_eth_adapters=116/0/263
2nd solution: In case you want to print only the value for the string virtual_eth_adapters, try the following. The offset 21 is the length of virtual_eth_adapters= including the equals sign.
awk '
match($0,/virtual_eth_adapters[^,]*/){
print substr($0,RSTART+21,RLENGTH-21)
}
' Input_file
Output will be as follows.
116/0/263
Explanation: Adding explanation for code.
awk ' ##Starting awk program here.
match($0,/virtual_eth_adapters[^,]*/){ ##Using match function of awk here, to match from string virtual_eth_adapters till first occurrence of comma(,)
print substr($0,RSTART,RLENGTH) ##Printing sub-string whose starting value is RSTART and till value of RLENGTH, where RSTART and RLENGTH variables will set once a regex found by above line.
}
' Input_file ##Mentioning Input_file name here.
I use this approach to get data out of the middle of lines.
awk -F'virtual_eth_adapters=' 'NF>1{split($2,a,",");print a[1]}' file
116/0/263
It's short and easy to learn (no counting or regex needed).
-F'virtual_eth_adapters=' splits the line on virtual_eth_adapters=
NF>1 is true if there is more than one field (the line contains virtual_eth_adapters=)
split($2,a,",") splits the part of the line after the key into array a, separated by ,
print a[1] prints the first element of array a
And one more solution (assuming the position of the string is fixed):
awk -F\, '{print $7}'
If you need only the value, try this:
awk -F\, '{print $7}'|awk -F\= '{print $2}'
It is also possible to get the value this way:
awk -F\, '{split($7,a,"=");print a[2]}'
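If the line is already in a shell variable, the value can also be taken out with the shell's own parameter expansion, without starting awk at all. The following is a minimal sketch using the # and %% expansions, which ksh supports. The variable names line, key, rest and value are illustrative, and the approach assumes the value itself contains no comma (unlike the quoted virtual_serial_adapters entry).

```shell
line='lpar_io_pool_ids=none,max_virtual_slots=300,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1","virtual_scsi_adapters=166/client/1/ibm/166/0,266/client/2/ibm/266/0",virtual_eth_adapters=116/0/263,proc_mode=shared,min_proc_units=0.5,desired_proc_units=2.0,max_proc_units=8.0'
key='virtual_eth_adapters='

case $line in
  *"$key"*)
    rest=${line#*"$key"}   # drop everything up to and including the key
    value=${rest%%,*}      # drop everything from the first comma onward
    ;;
  *)
    value=''               # key not present in the line
    ;;
esac

printf '%s\n' "$value"
```

The case guard matters: without it, a line that lacks the key would pass through the first expansion unchanged and the second would return the first comma-separated field instead of an empty result.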