AWK remove leading and trailing whitespaces from fields - awk

I need to remove leading and trailing whitespace characters from an input like this below using awk:
27 mA; 25 mA ; 24 mA ; 22 mA;
Required output:
27 mA;25 mA;24 mA;22 mA;
What I've tried:
awk -F";" '{$1=$1}1': doesn't remove the whitespaces, but removes ;s
awk -F";" '{gsub(/^[ \t]+|[ \t]+$/,""); print;}': doesn't remove the whitespaces
How can I modify these commands above to remove all leading and trailing whitespace (0x20) characters?
Update
Input might contain leading whitespace(s) at the first field:
27 mA; 25 mA ; 24 mA ; 22 mA;

With your shown samples, please try following awk code. Written and tested with GNU awk, should work in any awk.
awk -F'[[:space:]]*;[[:space:]]*' -v OFS=";" '{sub(/^[[:space:]]+/,"");$1=$1} 1' Input_file
OR
awk -F'[[:blank:]]*;[[:blank:]]*' -v OFS=";" '{sub(/^[[:blank:]]+/,"");$1=$1} 1' Input_file
Explanation: the field separator for each line is set to spaces (0 or more occurrences) followed by ;, which is further followed by 0 or more occurrences of spaces. OFS is set to ;. In the main program, the 1st field is re-assigned (which forces the record to be rebuilt with OFS), then the line is simply printed.
2nd solution: as per @jhnc's nice suggestion in the comments, you could try the following with your shown samples:
awk '{sub(/^[[:space:]]+/,"");gsub(/ *; */,";")} 1' Input_file
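As a quick check, the first command can be fed the sample line directly (a minimal sketch; printf stands in for Input_file):

```shell
printf '27 mA; 25 mA ; 24 mA ; 22 mA;\n' |
awk -F'[[:space:]]*;[[:space:]]*' -v OFS=';' '{sub(/^[[:space:]]+/,"");$1=$1} 1'
# 27 mA;25 mA;24 mA;22 mA;
```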

A simple sed:
sed -E 's/[[:blank:]]*;[[:blank:]]*/;/g; s/^[[:blank:]]+|[[:blank:]]+$//' file
27 mA;25 mA;24 mA;22 mA;
or this awk:
awk '{gsub(/[[:blank:]]*;[[:blank:]]*/, ";"); gsub(/^[[:blank:]]+|[[:blank:]]+$/, "")} 1' file

Related

removing lines with special characters in awk

I have a text file like this:
VAREAKAVVLRDRKSTRLN 2888
ACP*VRWPIYTACGP 292
RDRKSTRLNSSHVVTSRMP 114
VAREA*KAVVLRDRRAHV*T 73
in the 1st column in some rows there is a "*". I want to remove all the lines with that '*'. here is the expected output:
expected output:
VAREAKAVVLRDRKSTRLN 2888
RDRKSTRLNSSHVVTSRMP 114
to do so, I am using this code:
awk -F "\t" '{ if(($1 == '*')) { print $1 "," $2} }' infile.txt > outfile.txt
this code does not return the expected output. how can I fix it?
You did
awk -F "\t" '{ if(($1 == '*')) { print $1 "," $2} }' infile.txt > outfile.txt
by writing $1 == "*" you are asking "is the first field exactly *?", not "does the first field contain *?". You might use the index function, which returns the position of the match if found, or 0 otherwise. Let infile.txt content be
VAREAKAVVLRDRKSTRLN 2888
ACP*VRWPIYTACGP 292
RDRKSTRLNSSHVVTSRMP 114
VAREA*KAVVLRDRRAHV*T 73
then
awk 'index($1,"*")==0{print $1,$2}' infile.txt
output
VAREAKAVVLRDRKSTRLN 2888
RDRKSTRLNSSHVVTSRMP 114
Note that if you use index rather than a pattern /.../ you do not have to care about characters with special meaning, e.g. the dot (.). Note that for the data you have, you do not need to set the field separator (FS) explicitly. Important: ' is not a legal string delimiter in GNU AWK; you should use " for that purpose, unless your intent is to summon hard-to-find bugs.
(tested in gawk 4.2.1)
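The equality-vs-containment difference is easy to see in isolation (a minimal sketch):

```shell
# index() returns the 1-based position of the substring, or 0 when absent
awk 'BEGIN {
  print index("ACP*VRWPIYTACGP", "*")      # "*" is the 4th character
  print index("VAREAKAVVLRDRKSTRLN", "*")  # no "*" at all
}'
# 4
# 0
```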
with your shown samples, please try following awk program.
awk '$1!~/\*/' Input_file
The above will print the complete line when the condition is NOT matched; in case you want to print only the 1st and 2nd fields of such lines, then try the following:
awk '$1!~/\*/{print $1,$2}' Input_file
Use grep like so to remove the lines that contain literal asterisk (*). Note that it should be escaped with a backslash (\*) or put in a character class ([*]) to prevent grep from interpreting * as a modifier meaning 0 or more characters:
echo "A*B\nCD" | grep -v '[*]'
CD
Here, GNU grep uses the following options:
-v : Print lines that do not match.

gawk - Delimit lines with custom character and no similar ending character

Let's say I have a file like so:
test.txt
one
two
three
I'd like to get the following output: one|two|three
And am currently using this command: gawk -v ORS='|' '{ print $0 }' test.txt
Which gives: one|two|three|
How can I print it so that the last | isn't there?
Here's one way to do it:
$ seq 1 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1
$ seq 3 | awk -v ORS= 'NR>1{print "|"} 1; END{print "\n"}'
1|2|3
With paste:
$ seq 1 | paste -sd'|'
1
$ seq 3 | paste -sd'|'
1|2|3
Convert one column to one row with field separator:
awk '{$1=$1} 1' FS='\n' OFS='|' RS='' file
Or in another notation:
awk -v FS='\n' -v OFS='|' -v RS='' '{$1=$1} 1' file
Output:
one|two|three
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
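A quick way to check the paragraph-mode (RS='') command without a file (sketch using printf in place of file):

```shell
printf 'one\ntwo\nthree\n' |
awk -v FS='\n' -v OFS='|' -v RS='' '{$1=$1} 1'
# one|two|three
```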
awk solutions work great. Here is a tr + sed solution:
tr '\n' '|' < file | sed 's/|$//'
one|two|three
just flatten it :
gawk/mawk 'BEGIN { FS = ORS; RS = "^[\n]*$"; OFS = "|"
} NF && ( $NF ? NF=NF : --NF )'
ascii | = octal \174 = hex 0x7C. The reason for --NF is that more often than not, the input includes a trailing newline, which makes the field count 1 too many and results in
1|2|3|
Both NF=NF and --NF are similar concepts to $1=$1. Empty inputs, regardless of whether trailing new lines exist or not, would result in nothing printed.
At the OFS spot, you can delimit it with any string combo you like instead of being constrained by tr, which has inconsistent behavior. For instance :
gtr '\012' '高' # UTF8 高 = \351\253\230 = xE9 xAB x98
on bsd-tr, \n will get replaced by the unicode properly 1高2高3高 , but if you're on gnu-tr, it would only keep the leading byte of the unicode, and result in
1 \351 2 \351 . . .
For unicode equiv-classes, bsd-tr works as expected while gtr '[=高=]' '\v' results in
gtr: ?\230: equivalence class operand must be a single character
and if you attempt equiv-classes with an arbitrary non-ASCII byte, bsd-tr does nothing while gnu-tr would gladly oblige, even if it means slicing straight through UTF8-compliant characters:
g3bn 77138 | (g)tr '[=\224=]' '\v'
bsd-tr : 77138=Koyote 코요태 KYT✜ 高耀太
gnu-tr : 77138=Koyote ?
?
태 KYT✜ 高耀太
I would do it the following way, using GNU AWK. Let test.txt content be
one
two
three
then
awk '{printf NR==1?"%s":"|%s", $0}' test.txt
output
one|two|three
Explanation: if it is the first line, print the line content as-is (printf adds no trailing newline); otherwise print | followed by the line content. Note that I assumed test.txt has no trailing newline; if that is not the case, test this solution before applying it.
(tested in gawk 5.0.1)
Also you can try this with awk:
awk '{ORS = (NR%3 ? "|" : RS)} 1' file
one|two|three
% is the modulo operator and NR%3 ? "|" : RS is a ternary expression.
See Ed Morton's explanation here: https://stackoverflow.com/a/55998710/14259465
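Note that NR%3 ties this to a three-line file. A count-independent variant in the same spirit (a sketch, not from the answer above) prints an accumulating separator instead:

```shell
# sep is empty for the first record, "|" afterwards
printf 'one\ntwo\nthree\n' |
awk '{printf "%s%s", sep, $0; sep="|"} END{if (NR) print ""}'
# one|two|three
```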
With a GNU sed, you can pass -z option to match line breaks, and thus all you need is replace each newline but the last one at the end of string:
sed -z 's/\n\(.\)/|\1/g' test.txt
perl -0pe 's/\n(?!\z)/|/g' test.txt
perl -pe 's/\n/|/g if !eof' test.txt
See the online demo.
Details:
s - substitution command
\n\(.\) - an LF char followed with any one char captured into Group 1 (so \n at the end of string won't get matched)
|\1 - a | char and the captured char
g - all occurrences.
The first perl command matches any LF char (\n) not at the end of string ((?!\z)) after slurping the whole file into a single string input (again, to make \n visible to the regex engine).
The second perl command replaces an LF char at the end of each line except the one at the end of file (eof).
To make the changes inline add -i option (mind this is a GNU sed example):
sed -i -z 's/\n\(.\)/|\1/g' test.txt
perl -i -0pe 's/\n(?!\z)/|/g' test.txt
perl -i -pe 's/\n/|/g if !eof' test.txt

How to remove 0's from the second column

I have a file that looks like this :
k141_173024,001
k141_173071,002
k141_173527,021
k141_173652,034
k141_173724,041
...
How do I remove 0's from each line of the second field?
The desired result is :
k141_173024,1
k141_173071,2
k141_173527,21
k141_173652,34
k141_173724,41
...
What I've tried was:
cut -f 2 -d ',' file | awk '{print $1 + 0}' > file2
cut -f 1 -d ',' file > file1
paste file1 file2 > final_file
This was an inefficient way to edit it.
Thank you.
awk 'BEGIN{FS=OFS=","} {print $1 OFS $2+0}' Input.txt
Adding 0 forces awk to treat the value as a number, which drops the leading zeros.
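The coercion can be seen on its own (a minimal sketch):

```shell
# string-to-number conversion discards leading zeros
awk 'BEGIN { print "001" + 0; print "021" + 0 }'
# 1
# 21
```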
If it's only the zeros directly following the comma (,001 to ,1 but ,010 to ,10; that is not quite "remove 0's from the second column", but the example doesn't clearly show the requirement), you could replace the comma and zeros with another comma:
$ awk '{gsub(/,0+/,",")}1' file
k141_173024,1
k141_173071,2
k141_173527,21
k141_173652,34
k141_173724,41
Could you please try following.
awk 'BEGIN{FS=OFS=","} {gsub(/0/,"",$2)}1' Input_file
EDIT: To remove only leading zeros try following.
awk 'BEGIN{FS=OFS=","} {sub(/^0+/,"",$2)}1' Input_file
If the second field is a number, you can do this to remove the leading zeroes:
awk 'BEGIN{FS=OFS=","} {print $1 OFS int($2)}' file
As per #Inian's suggestion, this can be further simplified to:
awk -F, -v OFS=, '{$2=int($2)}1' file
This might work for you (GNU sed):
sed 's/,0\+/,/' file
This removes leading zeroes from the second column by replacing a comma followed by one or more zeroes by a comma.
P.S. I guess the OP did not mean to remove zeroes that are part of the number.
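To confirm that only the zeros right after the comma are removed (per the P.S.), try a value with a zero in the middle (sketch; GNU sed for \+):

```shell
printf 'k141_173527,010\n' | sed 's/,0\+/,/'
# k141_173527,10
```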

How to remove field separators in awk when printing $0?

eg, each row of the file is like :
1, 2, 3, 4,..., 1000
How can I print out
1 2 3 4 ... 1000
?
If you just want to delete the commas, you can use tr:
$ tr -d ',' <file
1 2 3 4 1000
If it is something more general, you can set FS and OFS (read about FS and OFS) in your begin block:
awk 'BEGIN{FS=","; OFS=""} ...' file
You need to set OFS (the output field separator). Unfortunately, this has no effect unless you also modify the record, leading to the rather cryptic:
awk '{$1=$1}1' FS=, OFS=
Although, if you are happy with some additional space being added, you can leave OFS at its default value (a single space), and do:
awk -F, '{$1=$1}1'
and if you don't mind omitting blank lines in the output, you can simplify further to:
awk -F, '$1=$1'
You could also remove the field separators:
awk -F, '{gsub(FS,"")} 1'
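The point above about OFS having no effect on an unmodified record can be verified directly (a minimal sketch):

```shell
# without touching any field, $0 is printed exactly as read
printf '1, 2, 3\n' | awk -F, -v OFS='' '{print}'
# 1, 2, 3

# assigning a field forces $0 to be rebuilt with OFS
printf '1, 2, 3\n' | awk -F, -v OFS='' '{$1=$1}1'
# 1 2 3
```

Note that with FS=',' the spaces after each comma stay inside the fields, which is why the rebuilt record still has single spaces even with an empty OFS.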
Set FS to the input field separator pattern. Assigning to $1 will then reformat the record using the output field separator, which defaults to a space:
awk -F',\s*' '{$1 = $1; print}'
(Note: \s in a regexp is a GNU awk extension.)
See the GNU Awk Manual for an explanation of $1 = $1

awk command to change field seperator from tilde to tab

I want to replace the tilde delimiter with a tab in an awk command; I have shown below the input and the output I expect.
input
~1~2~3~
Output
1 2 3
this won't work for me:
awk -F"~" '{ OFS ="\t"; print }' inputfile
It's really a job for tr:
tr '~' '\t'
but in awk you just need to force the record to be recompiled by assigning one of the fields to its own value:
awk -F'~' -v OFS='\t' '{$1=$1}1'
awk NF=NF FS='~' OFS='\t'
Result
1 2 3
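One caveat with the sample input ~1~2~3~: the leading and trailing ~ delimit empty fields, so the rebuilt record starts and ends with a tab, which is easy to miss since tabs are invisible (a quick check):

```shell
printf '~1~2~3~\n' | awk -F'~' -v OFS='\t' '{$1=$1} 1'
# the output is "<TAB>1<TAB>2<TAB>3<TAB>", with a tab at both ends
```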
Code for sed:
$ echo '~1~2~3~' | sed 'y/~/\t/'
1 2 3