Print Specific Columns using AWK - awk

I'm trying to fetch the data from columns B to D of a tab-delimited file "FILE". The simple AWK code I use fetches the data, but unfortunately prints the output in a single column and removes the identifiers (shown below).
Any suggestions, please?
CODE
awk '{for(i=2;i<=4;++i)print $i}' FILE
FILE
A B C D E F G
1_at 10.8435630935 10.8559287854 8.6666141543 8.820310681 9.9024050571 8.613199083 11.9807771094
2_at 4.7615531106 4.5209119307 11.2467919586 8.8105151099 7.1831990104 11.0645055836 4.3726598561
3_at 6.0025262754 5.4058080843 3.2475272982 3.1869728585 3.5654989547
OUTPUT OBTAINED
B
C
D
10.8435630935
10.8559287854
8.6666141543
4.7615531106
4.5209119307
11.2467919586
6.0025262754
5.4058080843
3.2475272982

Why don't you directly use cut?
$ cut -d$'\t' -f2-4 < file
B C D
10.8435630935 10.8559287854 8.6666141543
4.7615531106 4.5209119307 11.2467919586
6.0025262754 5.4058080843 3.2475272982
With awk you would need printf to avoid the newline that print appends:
awk -F"\t" '{for(i=2;i<=4;++i) printf "%s%s", $i, (i==4?RS:FS)}'
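A self-contained check of the printf approach, with a small sample table inlined via printf (the single-letter headers and digits stand in for the real data):

```shell
# Print columns 2-4 of a tab-delimited stream, one record per line:
# FS ("\t") is printed after columns 2 and 3, RS ("\n") after column 4.
printf 'A\tB\tC\tD\tE\n1\t2\t3\t4\t5\n' |
awk -F'\t' '{for(i=2;i<=4;++i) printf "%s%s", $i, (i==4 ? RS : FS)}'
# B C D     (tab-separated)
# 2 3 4
```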

Related

Remove empty columns by awk

I have an input file, which is tab-delimited, but I want to remove all empty columns. The empty columns are: $13=$14=$15=$84=$85=$86=$87=$88=$89=$91=$94
INPUT: tsv file with more than 90 columns
a b d e g...
a b d e g...
OUTPUT: tsv file without empty columns
a b d e g....
a b d e g...
Thank you
This might be what you want:
$ printf 'a\tb\tc\td\te\n'
a b c d e
$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=""} 1'
a c e
$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a c e
Note that the above doesn't remove all empty columns, as some other potential solutions might; it only removes exactly the column numbers you asked for:
$ printf 'a\tb\t\td\te\n'
a b d e
$ printf 'a\tb\t\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a e
remove ALL empty columns:
If you have a tab-delimited file with empty columns and you want to remove all of them, then by definition the file contains runs of consecutive tabs. You can replace each run with a single tab, and then delete a leading tab in case the first column was removed as well:
sed 's/\t\+/\t/g;s/^\t//' <file>
remove SOME columns: see Ed Morton's answer above, or just use cut (note that --complement is a GNU extension):
cut --complement -f 13,14,15,84,85,86,87,88,89,91,94 <file>
remove selected columns if and only if they are empty:
Basically a simple adaptation of Ed Morton's approach (note that col must be passed with -v so it is already set when BEGIN runs; a trailing var=value assignment is processed too late for split to see it):
awk -v col=13,14,15,84,85,86,87,88,89,91,94 'BEGIN{FS=OFS="\t"; n=split(col,a,",")}
{ for(i=1;i<=n;++i) if ($a[i]=="") $a[i]=RS; gsub("(^|"FS")"RS,"") }
1' <file>
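A quick demonstration of the conditional removal, using hypothetical small column numbers (2 and 4) in place of the long list above:

```shell
# Input has four fields: "a", "" (empty), "c", "d".
# Column 2 is empty, so it is marked with RS and deleted together
# with one adjacent tab; column 4 is non-empty and is kept.
printf 'a\t\tc\td\n' |
awk -v col=2,4 'BEGIN{FS=OFS="\t"; n=split(col,a,",")}
  { for(i=1;i<=n;++i) if ($a[i]=="") $a[i]=RS; gsub("(^|"FS")"RS,"") }
  1'
# a c d     (tab-separated)
```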

awk remove mirrored duplicates from 2 columns

Big question:
I want a list of the unique combinations between two fields in a data frame.
Example data:
A B
C D
E F
B A
C F
E F
I would like to be able to get the result of 4 unique combinations: AB, CD, EF, and CF. Since AB and BA contain the same components, just in a different order, I only want one copy (it is a mutual relationship, so BA is the same thing as AB).
Attempt:
So far I have tried sorting and keeping unique lines:
sort file | uniq
but of course that produces 5 combinations:
A B
C D
E F
B A
C F
I do not know how to approach AB/BA being considered the same. Any suggestions on how to do this?
The idiomatic awk approach is to order the index parts:
$ awk '!seen[$1>$2 ? $1 FS $2 : $2 FS $1]++' file
A B
C D
E F
C F
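To see why the ternary works: it builds a single canonical key for each unordered pair by always putting the lexically larger field first, so a pair and its mirror hash to the same array index:

```shell
# "A B" and "B A" both produce the key "B A"; only the first
# occurrence of a key passes the !seen[...]++ filter.
printf 'A B\nB A\nC D\n' | awk '!seen[$1>$2 ? $1 FS $2 : $2 FS $1]++'
# A B
# C D
```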
Another bit of awk magic:
awk '!a[$1,$2] && !a[$2,$1]++' file
In awk:
$ awk '($1$2 in a){next}{a[$1$2];a[$2$1]}1' file
A B
C D
E F
C F
Explained:
($1$2 in a) { next } # if duplicate in hash, next record
{ a[$1$2]; a[$2$1] } 1 # hash reverse also and output
It works for single-character fields. If you want to use it for longer strings, add FS between the fields, e.g. a[$1 FS $2] (thanks @EdMorton).
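A sketch of that multi-character-safe variant (hypothetical multi-word data; the extra parentheses keep the concatenated key unambiguous for the in operator):

```shell
# Joining with FS means "foo"/"bar" cannot collide with e.g. "fo"/"obar":
# the key is "foo bar", not "foobar".
printf 'foo bar\nbaz qux\nbar foo\n' |
awk '(($1 FS $2) in a){next} {a[$1 FS $2]; a[$2 FS $1]} 1'
# foo bar
# baz qux
```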

awk '{print $<number>}' but without knowing <number> before hand

Printing a specific column per line by piping to awk is fine.
But how do I do it if I do not know the column number in advance, only that it is the column whose first row matches something?
Example.
Title1 Title2 TargetTitle Title3
x y z a
b c d e
The above table, I want to filter out only:
z
d
BUT, two problems:
1) I don't know the column number in advance.
2) I don't want the first row (not a big deal, I can just sed lines 2 to $).
Thanks.
You can build your output using awk like this:
awk -v OFS='\t' 'NR>1{for (i=1; i<=NF; i++) {
if ($i=="b"||$i=="d") $i=""; printf "%s%s", $i, (i==NF)?ORS:OFS}}' file
x y z a
c e
To filter out one column, you could use something like this:
awk -v title="TargetTitle" 'NR==1 { for (i=1;i<=NF;++i) if ($i==title) col=i }
NR>1 { for (i=1;i<=NF;++i) if (i!=col) printf "%s%s", $i, (i<NF?OFS:ORS)}' file
Output:
x y a
b c e
If you want to add more space between each column in the output, you can change the value of the OFS variable or change the first format specifier from %s to %4s, for example.
If you want to only print one column, you can do something like this:
awk -v title="TargetTitle" 'NR==1 { for (i=1;i<=NF;++i) if ($i==title) col=i }
NR>1 { print $col }' file
Output:
z
d
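Putting the title-lookup version together as a self-contained run against the sample table from the question (space-separated here, as in the question):

```shell
# Pass 1 (NR==1): scan the header for the wanted title and remember
# its position in col. Pass 2 (NR>1): print only that column.
printf 'Title1 Title2 TargetTitle Title3\nx y z a\nb c d e\n' |
awk -v title="TargetTitle" 'NR==1 { for (i=1;i<=NF;++i) if ($i==title) col=i }
  NR>1 { print $col }'
# z
# d
```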

Split large single column into two columns

I need to split a single column of data in a large file into two columns as follows:
A
B B A
C ----> D C
D F E
E H G
F
G
H
Is there an easy way of doing it with unix shell commands and/or a small shell script? awk?
$ awk 'NR%2{s=$0;next} {print $0,s}' file
B A
D C
F E
H G
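How the one-liner works, demonstrated on a four-line sample: NR%2 is 1 (truthy) on odd lines, so they are saved and skipped; on even lines the current line is printed followed by the saved one, which gives the reversed pairing the question shows.

```shell
printf 'A\nB\nC\nD\n' | awk 'NR%2{s=$0;next} {print $0,s}'
# B A
# D C
```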
You can use the following awk script:
awk 'NR % 2 != 0 {cache=$0}; NR % 2 == 0 {print $0 cache}' data.txt
Output:
BA
DC
FE
HG
It caches each odd line and prints each even line with the cache appended. Use print $0, cache instead if you want a space between the two columns, as in the desired output.
I know this is tagged awk, but I just can't stop myself from posting a sed solution, since the question left it open for "easy way . . . with unix shell commands":
$ sed -n 'h;n;G;s/\n/ /g;p' data.txt
B A
D C
F E
H G

add column of 1's to tab-delimited file

Can't find a solution, even though thousands of variations of this question have been asked.
I want to add a column of 1's to a tab-delimited file using awk or sed.
The file will have about 20 million lines, so something efficient would be nice.
turn this:
a b c
r j k
i t w
into this:
a b c 1
r j k 1
i t w 1
One simple way: set the input and output field separators to a tab. NF holds the index of the last column, so assign 1 to the new field NF+1 and print:
awk 'BEGIN { FS = OFS = "\t" } { $(NF+1) = 1; print $0 }' infile
It yields:
a b c 1
r j k 1
i t w 1
Code for sed (note that \t in the replacement is a GNU sed extension; with BSD sed, use a literal tab):
sed 's/$/&\t1/' file
Assuming you used awk -F'\t' instead of just awk:
{
print $0 FS 1;
}
If you didn't use the -F option, replace FS 1 with "\t1".
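For completeness, that program body run as a one-shot command with -F'\t', so FS is a tab and $0 FS 1 appends a tab followed by the 1:

```shell
# Appends "<TAB>1" to each tab-delimited record.
printf 'a\tb\tc\n' | awk -F'\t' '{ print $0 FS 1 }'
```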