Assigning a variable on each line in a quick awk command - awk

Awk is awesome for text manipulation, but a little opaque to me. I would like to run an awk command that boils down to something like this:
awk '{$x = ($3 > 0 ? 1 : -1); print $1*$x "\t" $2*$x}' file
I want to assign $x on each line, i.e. not using the -v option, and then use it inside my print statement. Unfortunately, after the ; awk has forgotten the values of $1 and $2. And putting the assignment outside the braces doesn't seem to work either. How does this work?

AWK doesn't use dollar signs on its variables:
awk '{x = ($3 > 0 ? 1 : -1); print $1*x "\t" $2*x}' file
In your version you're assigning a 1 or -1 to $0 (the entire input line): x is unset (effectively 0) when the first line of the input file is read, so $x means $0. That's why $1 and $2 seem to be "forgotten".
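A quick way to see this, on a made-up input line: with x unset, $x is $0, so the assignment clobbers the whole record before print runs.

```shell
# x is unset (numeric 0), so $x refers to $0: the assignment
# replaces the entire input line before print sees it
printf '2 3 4\n' | awk '{$x = 9; print $0}'
# prints: 9
```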

Related

Can I delete a field in awk?

This is test.txt:
0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76
If I run
awk -F, 'BEGIN{OFS=","}{$2="";print $0}' test.txt
the result is:
0x01,,0x93,0x65,0xF8
0x01,,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,,0x00,0x76
The $2 wasn't deleted, it just became empty.
What I want, when printing $0, is this result:
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
All the existing solutions are good, though this is actually a tailor-made job for cut:
cut -d, -f 1,3- file
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
If you want to remove 3rd field then use:
cut -d, -f 1,2,4- file
To remove 4th field use:
cut -d, -f 1-3,5- file
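As a quick sanity check of the 3rd-field variant, using the first sample line from the question:

```shell
# Keep fields 1, 2, and 4 onward, dropping field 3
printf '0x01,0xDF,0x93,0x65,0xF8\n' | cut -d, -f 1,2,4-
# prints: 0x01,0xDF,0x65,0xF8
```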
I believe the simplest would be to use the sub function to replace the first occurrence of the consecutive ,, (created after you set the 2nd field to NULL) with a single ,. But this assumes that you don't have any commas within field values.
awk 'BEGIN{FS=OFS=","}{$2="";sub(/,,/,",");print $0}' Input_file
2nd solution: Or you could use the match function to capture the regex from the first comma to the next comma's occurrence, and print the parts of the line before and after the matched string.
awk '
match($0,/,[^,]*,/){
print substr($0,1,RSTART-1)","substr($0,RSTART+RLENGTH)
}' Input_file
It's a bit heavy-handed, but this moves each field after field 2 down a place, and then changes NF so the unwanted field is not present:
$ awk -F, -v OFS=, '{ for (i = 2; i < NF; i++) $i = $(i+1); NF--; print }' test.txt
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
$
Tested with both GNU Awk 4.1.3 and BSD Awk ("awk version 20070501" on macOS Mojave 10.14.6 — don't ask; it frustrates me too, but sometimes employers are not very good at forward thinking). Setting NF may or may not work on older versions of Awk — I was a little surprised it did work, but the surprise was a pleasant one, for a change.
If Awk is not an absolute requirement, and the input is indeed as trivial as in your example, sed might be a simpler solution.
sed 's/,[^,]*//' test.txt
This is especially elegant if you want to remove the second field. A more generic approach to remove the nth field would require you to put in a regex which matches the first n - 1 fields followed by the nth, then replace that with just the first n - 1.
So for n = 4 you'd have
sed 's/\([^,]*,[^,]*,[^,]*,\)[^,]*,/\1/' test.txt
or more generally, if your sed dialect understands braces for specifying repetitions
sed 's/\(\([^,]*,\)\{3\}\)[^,]*,/\1/' test.txt
Some sed dialects allow you to lose all those pesky backslashes with an option like -r or -E but again, this is not universally supported or portable.
In case it's not obvious, [^,] matches a single character which is not (newline or) comma; and \1 recalls the text from first parenthesized match (back reference; \2 recalls the second, etc).
Also, this is completely unsuitable for escaped or quoted fields (though I'm not saying it can't be done). Every comma acts as a field separator, no matter what.
With GNU sed you can add a number modifier to substitute nth match of non-comma characters followed by comma:
sed -E 's/[^,]*,//2' file
Using awk in a regex-free way, with the option to choose which line will be deleted:
awk '{ col = 2; n = split($0,arr,","); line = ""; for (i = 1; i <= n; i++) line = line ( i == col ? "" : ( line == "" ? "" : "," ) arr[i] ); print line }' test.txt
Step by step:
{
col = 2 # defines which column will be deleted
n = split($0,arr,",") # each line is split into an array
# n is the number of elements in the array
line = "" # this will be the new line
for (i = 1; i <= n; i++) # roaming through all elements in the array
line = line ( i == col ? "" : ( line == "" ? "" : "," ) arr[i] )
# appends a comma (except if line is still empty)
# and the current array element to the line (except when on the selected column)
print line # prints line
}
Another solution:
You can just pipe the output to sed and squeeze the doubled delimiters.
$ awk -F, 'BEGIN{OFS=","}{$2=""}1 ' edward.txt | sed 's/,,/,/g'
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
$
Commenting on the first solution of @RavinderSingh13 using the sub() function:
awk 'BEGIN{FS=OFS=","}{$2="";sub(/,,/,",");print $0}' Input_file
The gnu-awk manual: https://www.gnu.org/software/gawk/manual/html_node/Changing-Fields.html
"It is important to note that making an assignment to an existing field changes the value of $0 but does not change the value of NF, even when you assign the empty string to a field." (4.4 Changing the Contents of a Field)
So, following the first solution of RavinderSingh13, but without using sub() in this case: "The field is still there; it just has an empty value, delimited by the two colons":
awk 'BEGIN {FS=OFS=","} {$2="";print $0}' file
0x01,,0x93,0x65,0xF8
0x01,,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,,0x00,0x76
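A small sketch confirming the manual's point, on a shortened version of the sample line: NF is unchanged by the assignment, so the empty field survives in $0.

```shell
# Emptying $2 leaves NF at 3; the field is still there, just empty
printf '0x01,0xDF,0x93\n' | awk 'BEGIN{FS=OFS=","}{$2=""; print NF ": " $0}'
# prints: 3: 0x01,,0x93
```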
My solution:
awk -F, '
{
regex = "^"$1","$2
sub(regex, $1, $0);
print $0;
}'
or one line code:
awk -F, '{regex="^"$1","$2;sub(regex, $1, $0);print $0;}' test.txt
I found that OFS="," was not necessary
I would do it following way, let file.txt content be:
0x01,0xDF,0x93,0x65,0xF8
0x01,0xB0,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0xB2,0x00,0x76
then
awk 'BEGIN{FS=",";OFS=""}{for(i=2;i<=NF;i+=1){$i="," $i};$2="";print}' file.txt
output
0x01,0x93,0x65,0xF8
0x01,0x01,0x03,0x02,0x00,0x64,0x06,0x01,0xB0
0x01,0x00,0x76
Explanation: I set OFS to nothing (an empty string), then for the 2nd and following columns I prepend , to each value. Finally I set the 2nd field (which is now a comma plus its value) to nothing. Keep in mind this solution would need rework if you wished to remove the 1st column.

Replace empty fields in tab separated file with "0"

Say I have a data file with rows like a^I^I^I^I^I^I^Ib^Ic, where ^I denotes a tab character.
Now, I want to change the empty value of each column into 0, so the result should be like: a^I0^I0^I0^I0^I0^I0^Ib^Ic.
How can I achieve it with only one sed command?
This is easier to do using a tool with support for look-ahead:
perl -pe 's/\t(?=\t)/\t0/g' file
This puts a "0" in between any pair of tab characters. The look-ahead is useful as it matches the second tab without consuming it, so it can be used in the next match.
Here's a way you could do it using awk:
awk -F'\t' -v OFS='\t' '{ for (i = 1; i <= NF; ++i) sub(/^$/, 0, $i) } 1' file
This loops through all the fields, substituting all empty ones with a 0.
With GNU sed:
sed ':a;s/\t\(\t\|$\)/\t0\1/;ta' file
Replace all \t followed by \t or end of line with \t0.
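To see why the loop is needed (GNU sed, made-up input): a single s///g cannot fill adjacent gaps, because each match consumes the trailing tab; the :a/ta loop keeps substituting until nothing matches.

```shell
# Three consecutive empty fields all get filled by the loop
printf 'a\t\t\t\tb\n' | sed ':a;s/\t\(\t\|$\)/\t0\1/;ta'
# prints: a<TAB>0<TAB>0<TAB>0<TAB>b  (tabs rendered as <TAB> here)
```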
Another awk:
$ awk -v RS='\t' -v ORS='\t' '$0==""{$0=0}1'
or with BEGIN block
$ awk 'BEGIN{RS=ORS="\t"} $0==""{$0=0}1'
$ awk '{while(gsub(/\t\t/,"\t0\t"));} 1' file
a 0 0 0 0 0 0 b c

AWK - How can I get the values that it equal among them? if ... $1== $1?

I am working with a list of DNA sequences. I would like to get all the sequences with the same name ($1). I was thinking of using if ($1 == "$1"), but this does not work.
result_file:
name1 number1 number2 sequenceofname1
name1 number3 number4 sequenceofname1
script:
awk '{if ($1 == "$1") printf("%s_%s_%s \t%s\n", $1,$2,$3,$4);}' <result_file >file.txt
How do I pass $1 to my awk command?
You can use the -v option:
awk -v name="name1" '{
if ($1 == name) printf("%s_%s_%s \t%s\n", $1,$2,$3,$4);
}' result_file > file.txt
or, if this statement is in a shell script:
awk -v name="$1" '{
if ($1 == name) printf("%s_%s_%s \t%s\n", $1,$2,$3,$4);
}' result_file > file.txt
-v var=val, Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the BEGIN block of an AWK program.
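A minimal contrast between the two mechanisms (hypothetical variable names): a -v assignment is visible in BEGIN, while a var=val argument only takes effect once awk reaches it in the file list.

```shell
# -v: assigned before the program starts, so BEGIN sees it
awk -v a=1 'BEGIN{print "a=" a}' /dev/null
# prints: a=1

# var=val in the argument list: not yet assigned when BEGIN runs
awk 'BEGIN{print "b=" b}' b=2 /dev/null
# prints: b=
```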
If I've understood correctly, you want to use $1 from your shell script as an argument to the awk command within it.
In which case, you want to not quote the $1 that you want expanding, but do quote the rest of the awk command. One possibility is to double-quote the command:
awk "{if (\$1 == \"$1\") printf(\"%s_%s_%s \\t%s\\n\", \$1,\$2,\$3,\$4);}"
It can get hard to manage all the backslashes, so you probably prefer to single-quote most of the command, but double-quote the part to be expanded:
awk '{if ($1 == "'"$1"'") printf("%s_%s_%s \t%s\n", $1,$2,$3,$4);}'
That's slightly tricky to read - the critical bit divides as '...($1 == "' "$1" '")...'. So there's a double-quote that's part of the Awk command, and one that's for the shell, to keep $1 in one piece.
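A runnable sketch of that splice, with a made-up script argument and input: the shell glues '...' + "$1" + '...' into a single awk program before awk ever sees it.

```shell
# Pretend the surrounding script was invoked with first argument "name1"
set -- name1
printf 'name1 x\nname2 y\n' | awk '{if ($1 == "'"$1"'") print $2}'
# prints: x   (awk actually receives: {if ($1 == "name1") print $2})
```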
Oh, and no need to invoke cat - just provide the file as input:
awk ... <result_file >file.txt

Awk with a variable

I have an awk command to print out the total number of times "200" occurred in column 26.
awk '$26 ~ /200/{n++}; END {print n+0}' testfile
How do I modify this statement so I can pass 200 as a variable? e.g. if I have a variable $code with a value of 200
Thanks in advance
awk '$26 ~ code {n++} END {print n+0}' code=200 testfile
If a filename on the command line has the form var=val it is treated as a variable
assignment. The variable var will be assigned the value val.
§ Awk Program Execution
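A quick demonstration of the mechanism, with made-up single-column input (counting on $1 instead of $26, and using - for stdin): the code=200 assignment is processed before the input that follows it, so the pattern sees the value.

```shell
# The assignment argument takes effect before awk reads "-" (stdin)
printf '200\nfoo\n200\n' | awk '$1 ~ code {n++} END {print n+0}' code=200 -
# prints: 2
```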
awk -v var="$shellVar" '$26~var{n++} END{print n}' file
The line above shows how to use a shell variable in awk. Some notes on your awk one-liner:
print n+0 is actually worth keeping: if no line matches, n is never set, so print n would output an empty line, whereas n+0 forces a numeric 0. (Once at least one line has matched, n is already a number and the +0 changes nothing.)
the ; before END should be removed
I copied the 200-checking part from your code, but it is risky. If $26 holds just a number, consider using 1*$26 == 200 or $26 == "200" instead; using a regex in this situation may give a wrong result. Think about a $26 value of 20200: it contains 200 and would be counted.
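The 20200 pitfall is easy to demonstrate on a made-up two-line input (using $1 instead of $26 for brevity):

```shell
# Regex match: 20200 contains "200", so both lines count
printf '20200\n200\n' | awk '$1 ~ /200/ {n++} END {print n+0}'
# prints: 2

# String comparison: only the exact "200" line counts
printf '20200\n200\n' | awk '$1 == "200" {n++} END {print n+0}'
# prints: 1
```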

How to find if substring is in a variable in awk

I am using awk and need to find whether a variable, in this case $24, contains the string 3:2; if so, print the line (for a sed command). The variable may include more letters or spaces or \n.
For example:
$24 == "3:2" {print "s/(inter = ).*/\\1\"" "3:2_pulldown" "\"/" >> NR }
In my above line it never finds such a string, although it exists.
Can you help me with the command, please?
If you're looking for "3:2" within $24, then you want $24 ~ /3:2/ or index($24, "3:2") > 0
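The difference is easy to see on a tiny made-up sample (using $1 rather than $24 for brevity): == demands an exact match, while ~ matches the substring anywhere in the field.

```shell
# Exact comparison misses "3:2_pulldown"
printf '3:2_pulldown\n3:2\n' | awk '$1 == "3:2" {c++} END {print c+0}'
# prints: 1

# Regex match catches both lines
printf '3:2_pulldown\n3:2\n' | awk '$1 ~ /3:2/ {c++} END {print c+0}'
# prints: 2
```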
Why are you using awk to generate a sed script?
Update
To pass a variable from the shell to awk, use the -v option:
val="3:2" # or however you determine this value
awk -v v="$val" '$24 ~ v {print}'
awk '$24~/3:2/' file_name
This will search for "3:2" in field 24.