How to suppress the original line in gawk - awk

Why does gawk print the input line first?
ws#i7$ echo "8989889898 jAAA_ALL_filenames.txt" | gawk 'match($0, /([X0-9\\\-]{9,13})/, arr); {print arr[1];}'
My output:
8989889898 jAAA_ALL_filenames.txt
8989889898
I do not want the original line to be printed as well.
Thanks
Walter

You have a stray semicolon in there.
$ echo "8989889898 jAAA_ALL_filenames.txt" | gawk 'match($0, /([X0-9\\\-]{9,13})/, arr); {print arr[1];}'
8989889898 jAAA_ALL_filenames.txt
8989889898
$ echo "8989889898 jAAA_ALL_filenames.txt" | gawk 'match($0, /([X0-9\\\-]{9,13})/, arr) {print arr[1];}'
8989889898

The semicolon after match($0, /([X0-9\\\-]{9,13})/, arr) means that your script is effectively:
match($0, /([X0-9\\\-]{9,13})/, arr) { print $0 } # default action block inserted
1 {print arr[1];} # default condition inserted
match() returns a non-zero ("true") value when it finds a match, so the whole line gets printed by the implicit default action.
To fix it, remove the semicolon:
match($0, /([X0-9\\\-]{9,13})/, arr) {print arr[1];}
Now the code only has one condition { action } structure, as you intended, so it does what you want.
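For illustration, here is a minimal demonstration of the default-action behaviour (my own example, not from the original question): a bare condition gets the implicit { print } action, while an explicit action replaces it.
$ echo "foo bar" | gawk '/foo/'                # condition only; default action is { print $0 }
foo bar
$ echo "foo bar" | gawk '/foo/ { print $1 }'   # condition with an explicit action
foo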

Related

Why don't awk and gsub replace only the dot?

This awk command:
awk -F ',' 'BEGIN {line=1} {print line "\n0" gsub(/\./, ",", $2) "0 --> 0" gsub(/\./, ",", $3) "0\n" $10 "\n"; line++}' file
is supposed to convert these lines:
Dialogue: 0,1:51:19.56,1:51:21.13,Default,,0000,0000,0000,,Hello!
into these:
1273
01:51:19.560 --> 01:51:21.130
Hello!
But somehow I'm not able to make gsub replace the . with ,; instead I get 010 as both gsub results. Can anyone spot the issue?
Thanks
The return value of gsub is not the result of the substitution; it is the number of substitutions performed.
gsub modifies its third argument in place, so you want to call gsub first and then print the modified string (the third argument you passed to gsub):
awk -F ',' 'BEGIN { line = 1 }
{
    gsub(/\./, ",", $2)
    gsub(/\./, ",", $3)
    print line "\n0" $2 "0 --> 0" $3 "0\n" $10 "\n"
    line++
}' file
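A quick way to see the difference (my own demo, not part of the original answer): gsub returns the number of substitutions and edits its target, here $0, in place.
$ echo "1:51:19.56" | awk '{ n = gsub(/\./, ","); print n; print $0 }'
1
1:51:19,56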
Another way is to use GNU awk's gensub instead of gsub:
$ awk -F ',' '
{
print NR ORS "0" gensub(/\./, ",","g", $2) "0 --> 0" gensub(/\./, ",","g",$3) "0" ORS $10 ORS
}' file
Output:
1
01:51:19,560 --> 01:51:21,130
Hello!
It's not as readable as the gsub solution by @tripleee, but there is a place for it.
Also, I replaced the line variable with the builtin NR and the \n strings with ORS.
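The key behavioural difference, shown with a tiny example of my own: gensub returns the modified copy and leaves its target untouched, whereas gsub modifies the target and only returns a count.
$ echo "a.b.c" | gawk '{ print gensub(/\./, "-", "g"); print $0 }'
a-b-c
a.b.c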

Find "complete cases" with awk

Using awk, how can I output the lines of a file that have all fields non-null without manually specifying each column?
foo.dat
A||B|X
A|A|1|
|1|2|W
A|A|A|B
Should return:
A|A|A|B
In this case we can do:
awk -F"|" -v OFS="|" '$1 != "" && $2 != "" && $3 != "" && $4 != "" { print }' foo.dat
But is there a way to do this without specifying each column?
You can loop over all fields and skip the record if any of the fields are empty:
$ awk -F'|' '{ for (i=1; i<=NF; ++i) { if (!$i) next } }1' foo.dat
A|A|A|B
if (!$i) means "if field i is empty", and the trailing 1 is short for "print the line"; it is only reached if next was not executed for any field of the current line.
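One caveat worth adding (not from the original answer): !$i is also true when a field contains the literal string 0, because the field is evaluated as a number. If 0 is a legitimate value, compare against the empty string instead:
$ awk -F'|' '{ for (i=1; i<=NF; ++i) { if ($i == "") next } }1' foo.dat
A|A|A|B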
Another in awk:
$ awk -F\| 'gsub(/[^|]+(\||$)/,"&")==NF' file
A|A|A|B
This prints the record if there are exactly NF non-empty, |-free strings, each terminated by | or by the end of the line.
A third approach simply rejects any line that has an empty field, i.e. a leading |, a trailing |, or || anywhere:
awk '!/\|\|/&&!/\|$/&&!/^\|/' file
A|A|A|B

print unique lines based on field

I would like to print unique lines based on the first field: keep the first occurrence of each line and remove the other, duplicate occurrences.
Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc
Desired Output:
10,15-10-2014,abc
20,12-10-2014,bcd
40,06-10-2014,ghi
I have tried the command below, but it is incomplete:
awk 'BEGIN { FS = OFS = "," } { !seen[$1]++ } END { for ( i in seen) print $0}' Input.csv
Looking for your suggestions ...
You put your test for "seen" in the action part of the script instead of the condition part. Change it to:
awk -F, '!seen[$1]++' Input.csv
Yes, that's the whole script:
$ cat Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc
$
$ awk -F, '!seen[$1]++' Input.csv
10,15-10-2014,abc
20,12-10-2014,bcd
40,06-10-2014,ghi
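For anyone new to the idiom, !seen[$1]++ used as a condition is roughly equivalent to this longer sketch of the same logic:
awk -F, '{ if (seen[$1] == 0) print; seen[$1]++ }' Input.csv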
This should give you what you want:
awk -F, '{ if (!($1 in a)) a[$1] = $0; } END '{ for (i in a) print a[i]}' input.csv
There is a typo in the syntax there; it should be:
awk -F, '{ if (!($1 in a)) a[$1] = $0 } END { for (i in a) print a[i] }' input.csv
(Note that for (i in a) does not guarantee the original input order, unlike the !seen[$1]++ approach.)

awk command to split nth field

I am learning AWK and was trying some exercises on built-in string functions.
Here's my exercise:
I have a file containing the following:
RecordType:83
1,2,3,a|x|y|z,4,5
And my desired output is as below:
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
I wrote an awk command for the above output.
awk -F',' '$1 ~ /RecordType:83/ { print $0 }
$1 == 1 {
    split($4, splt, "|")
    for (i in splt) {
        if (i == 1)
            print $1, $2, $3, splt[i], $5, $6
        else
            print $1, 0, 0, splt[i], $5, $6
    }
}' OFS=, file_name
The above command looks clumsy. Is there any way to shorten it?
Thanks in advance
The shortest possible one-liner I could manage:
awk -F, 'NR>1{n=split($4,a,"|");for(;i++<n;){$4=a[i];print;$2=$3=0}}NR==1' OFS=, file
RecordType:83    
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
The much more readable script (recommended):
BEGIN {
    FS = OFS = ","                                   # Comma delimiter
}
NR == 1 {                                            # If this is the first line of the file
    print $0                                         # Print the whole line
    next                                             # Skip to the next line
}
{
    n = split($4, a, "|")                            # Split field four on |
    for (i = 1; i <= n; i++)                         # For each sub-field
        print $1, (i == 1 ? $2 OFS $3 : "0" OFS "0"), a[i], $5, $6   # Print the output
}
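If you save the readable version to a file, say split_fields.awk (a name chosen here just for illustration), you can run it with awk -f:
awk -f split_fields.awk file_name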
Another, shorter one-liner:
awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
with your example:
kent$ awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5

How to print out a specific field in AWK?

A very simple question, to which I found no answer. How do I print out a specific field in awk?
awk '/word1/' will print out the whole line, when I need just word1. Or I need a chain of patterns (word1 + word2) to be printed out from the text.
Well, if the pattern is a single word (which you want to print and which can't contain FS, the input field separator), why not:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print MYPATTERN }' INPUTFILE
If your pattern is a regex:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print gensub(".*(" MYPATTERN ").*","\\1","1",$0) }' INPUTFILE
If your pattern must be checked in every single field:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN {
    for (i = 1; i <= NF; i++) {
        if ($i ~ MYPATTERN) { print "Field " i " in row " NR " matches: " MYPATTERN }
    }
}' INPUTFILE
Modify any of the above to your taste.
The fields in awk are represented by $1, $2, etc:
$ echo this is a string | awk '{ print $2 }'
is
$0 is the whole line, $1 is the first field, $2 is the next field (or blank),
$NF is the last field, $(NF - 1) is the 2nd-to-last field, etc.
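For example (a quick demo of my own):
$ echo "one two three four" | awk '{ print $NF, $(NF - 1) }'
four three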
EDIT (in response to comment).
You could try:
awk '/crazy/{ print substr( $0, match( $0, "crazy" ), RLENGTH )}'
Here match() returns the starting position of the match and sets RLENGTH, so substr() extracts just the matched text.
I know you can do this with awk, but an alternative would be:
sed -nr "s/.*(PATTERN_TO_MATCH).*/\1/p" file
Or you can use grep -o.
Something like this perhaps:
awk '{split("bla1 bla2 bla3",a," "); print a[1], a[2], a[3]}'
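Note that this splits a literal string; the same idea applied to the input line itself (my own variant) would be:
$ echo "bla1 bla2 bla3" | awk '{ split($0, a, " "); print a[2] }'
bla2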