Changing the field separator of awk to newline - awk

The -F option lets you specify the field separator for awk, but using '\n' as the field separator doesn't work as expected; that is, it doesn't make $1 the first line of the input, $2 the second line, and so on.
I suspect that this is because awk looks for the field separator within each line. Is there a way to get around this with awk, or with some other Linux command? Basically, I want to separate my input by newline characters and put the pieces into an Excel file.
I'm still warming up to Linux and shell scripts, which is the reason for my lack of creativity with this problem.
Thank you!

You may need to override the input record separator (RS), whose default is newline.
See my example below:
$ cat test.txt
a
b
c
d
$ awk 'BEGIN{ RS = "" ; FS = "\n" }{print $1,$2,$3,$4}' test.txt
a b c d
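Since the end goal here is an Excel file, the same RS/FS trick can emit a CSV row directly by also setting OFS. This is a minimal sketch that assumes the data itself contains no commas or quotes (real CSV would need quoting):

```shell
# Paragraph mode: RS="" makes the whole block one record, FS="\n" makes
# each line a field, and OFS="," joins them when the record is rebuilt.
# Assumes the input values contain no commas (no CSV quoting is done).
printf 'a\nb\nc\nd\n' > test.txt
awk 'BEGIN{ RS = "" ; FS = "\n" ; OFS = "," }{ $1 = $1 ; print }' test.txt
```

The `$1 = $1` assignment forces awk to rebuild the record with OFS, producing `a,b,c,d`, which Excel opens as a single row.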

Note that you can change both the input and output record separators, so you can do something like this to achieve a similar result to the accepted answer:
$ cat test.txt
a
b
c
d
$ awk -v ORS=" " '{print $1}' test.txt
a b c d
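If awk isn't required at all, paste gives the same one-line result without the trailing space that ORS=" " leaves behind: -s serializes all input lines into one, and -d sets the join character.

```shell
# Join all lines of the file into a single space-separated line.
printf 'a\nb\nc\nd\n' > test.txt
paste -sd' ' test.txt
```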

One can simplify it to just the following, with a minor caveat: the output has an extra trailing space and no trailing newline:
% echo "a\nb\nc\nd"
a
b
c
d
% echo "a\nb\nc\nd" | mawk 8 ORS=' '
a b c d %
To rectify that, plus handle the edge case of input without a trailing newline, one can modify it as follows (the two runs below use input without and with a trailing newline, and produce identical output):
% echo -n "a\nb\nc\nd" | mawk 'NF-=_==$NF' FS='\n' RS='^$' | odview
0000000 543301729 174334051
a b c d \n
141 040 142 040 143 040 144 012
a sp b sp c sp d nl
97 32 98 32 99 32 100 10
61 20 62 20 63 20 64 0a
0000010
% echo "a\nb\nc\nd" | mawk 'NF -= (_==$NF)' FS='\n' RS='^$' | odview
0000000 543301729 174334051
a b c d \n
141 040 142 040 143 040 144 012
a sp b sp c sp d nl
97 32 98 32 99 32 100 10
61 20 62 20 63 20 64 0a
0000010

Related

how to append a file to the second column of another tsv file

I have a file first.txt that looks like this:
45
56
74
62
I want to append this file to second.tsv, which looks like this (there are 17 columns):
2 a ...
3 b ...
5 c ...
6 d ...
The desired output is :
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
How can I append to the second column?
I've tried
awk -F, '{getline f1 <"first.txt" ;print $1,f1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' second.tsv
but it did not work. It added the column of first.txt after the last column of second.tsv, and the output was not tab-separated.
Thank you.
Your code works if you remove the -F, bit. This tells awk that the file is comma-separated, which it is not.
Another option would be to go for a piped version with paste, e.g.:
paste first.txt second.tsv | awk '{ t=$2; $2=$1; $1=t } 1' OFS='\t'
Output:
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
$ awk 'NR==FNR{a[FNR]=$0;next} {$1=$1 OFS a[FNR]} 1' file1 file2
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
If your files are tab-separated add BEGIN{FS=OFS="\t"} at the front.
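Putting that together, the full tab-separated command would look like this (a sketch that recreates small versions of the question's files with sample data):

```shell
# Recreate miniature versions of the input files from the question.
printf '45\n56\n74\n62\n' > file1
printf '2\ta\n3\tb\n5\tc\n6\td\n' > file2

# First pass (NR==FNR) caches file1 by line number; second pass splices
# each cached value in after column 1 of file2, keeping tabs throughout.
awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[FNR]=$0;next} {$1=$1 OFS a[FNR]} 1' file1 file2
```

This prints `2<TAB>45<TAB>a` and so on, with the inserted values as the new second column.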

How to replace multiple occurrences of a letter with that letter?

I have a file with 5 columns that looks like this:
15642 G A.aa,, 0.77501 107
15643 G A.a,.A, 0.7570 17
15644 C t.TtTt,.T, 0.7501 10
I'm trying to convert the 3rd column of Aa's and Tt's to just "A" or "T".
Output:
15642 G A 0.77501 107
15643 G A 0.7570 17
15644 C T 0.7501 10
I've tried various awk methods without success. I'd sincerely appreciate any help. Thanks!
The following awk may help with this:
awk '$3~/[Aa]/{$3="A"} $3~/[Tt]/{$3="T"} 1' Input_file
There are many possibilities, including:
$ awk '{sub(/\..*/,"",$3)} 1' file
15642 G A 0.77501 107
15643 G A 0.7570 17
15644 C t 0.7501 10
or
$ awk '{$3=substr($3,1,1)} 1' file
15642 G A 0.77501 107
15643 G A 0.7570 17
15644 C t 0.7501 10
or
$ awk '{$3=toupper(substr($3,1,1))} 1' file
15642 G A 0.77501 107
15643 G A 0.7570 17
15644 C T 0.7501 10
This might work for you (GNU sed):
sed -ri 's/(\S)\S*/\U\1/3' file
Convert the first character of the third field to uppercase.
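A non-destructive run of the same substitution (dropping -i so the file is untouched; note that \U and -r are GNU sed extensions):

```shell
# Replace the 3rd whitespace-delimited field with its first character,
# uppercased: the 3rd match of (\S)\S* is rewritten as \U\1.
printf '15644 C t.TtTt,.T, 0.7501 10\n' | sed -r 's/(\S)\S*/\U\1/3'
```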

Print Distinct Values from Field AWK

I'm looking for a way to print the distinct values in a field while in the command-prompt environment using AWK.
ID Title Promotion_ID Flag
12 Purse 7 Y
24 Wallet 7 Y
709 iPhone 1117 Y
74 Satchel 7 Y
283 Xbox 84 N
Ideally I'd like to return the promotion_ids: 7, 1117, 84.
I've researched the question on Google and have found some examples such as:
`cut -f 3 | uniq *filename.ext*` (returned error)
`awk cut -f 3| uniq *filename.ext*` (returned error)
`awk cut -d, -f3 *filename.ext* |sort| uniq` (returned error)
awk 'NR>1{a[$3]++} END{for(b in a) print b}' file
Output:
7
84
1117
First solution: a simple awk may help (the following removes the header of the Input_file):
awk 'FNR>1 && !a[$3]++{print $3}' Input_file
Second solution: in case you need to keep the header of the Input_file, the following may help:
awk 'FNR==1{print;next} !a[$3]++{print $3}' Input_file
With a pipeline:
$ sed 1d file | # remove header
tr -s ' ' '\t' | # normalize space delimiters to tabs
cut -f3 | # isolate the field
sort -nu # sort numerically and report unique entries
7
84
1117
[root@test ~]# cat test
ID Title Promotion_ID Flag
12 Purse 7 Y
24 Wallet 7 Y
709 iPhone 1117 Y
74 Satchel 7 Y
283 Xbox 84 N
Output:
[root@test ~]# awk -F" " '!s[$3]++' test
ID Title Promotion_ID Flag
12 Purse 7 Y
709 iPhone 1117 Y
283 Xbox 84 N
[root@test ~]#
mawk '!__[$!NF=$--NF]--^(!_<NR)'
or
gawk '!__[$!--NF=$NF]--^(!_<NR)'
or perhaps
gawk '!__[$!--NF=$NF]++^(NF<NR)'
or even
mawk '!__[$!--NF=$NF]++^(NR-!_)' # mawk-only
gawk '!__[$!--NF=$--NF]--^(NR-NF)' # gawk-equiv of similar idea
7
1117
84

compare a text file with another files

I have a file named file.txt as shown below
12 2
15 7
134 8
154 12
155 16
167 6
175 45
45 65
812 54
I have another five files named A.txt, B.txt, C.txt, D.txt, E.txt. The contents of these files are shown below.
A.txt
45
134
B.txt
15
812
155
C.txt
12
154
D.txt
175
E.txt
167
I need to check which of these files contains each value in the first column of file.txt, and print that file's name as a third column.
Output:
12 2 C
15 7 B
134 8 A
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B
This should work:
One-liner:
awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) }1' {A..E}.txt file.txt
Formatted with comments:
awk '
#Check if the filename is not of the main file
FILENAME != "file.txt" {
#Create a hash. Store column 1 values of look up files as key and assign filename as values
a[$1]=FILENAME
#Skip the rest of the action
next
}
#Check the first column of main file is a key in the hash
$1 in a {
#If the key exists, assign the value of the key (which is filename) as Column 3 of main file
$3=a[$1]
#Using sub function, strip the extension of the file name as desired in your output
sub(/\..*/,"",$3)
#1 is a non-zero value forcing awk to print. {A..E} is brace expansion of your files.
}1' {A..E}.txt file.txt
Note: The main file needs to be passed at the end.
Test:
[jaypal:~/Temp] awk 'FILENAME != "file.txt"{ a[$1]=FILENAME; next } $1 in a { $3=a[$1]; sub(/\..*/,"",$3) ; printf "%-5s%-5s%-5s\n",$1,$2,$3}' {A..E}.txt file.txt
12 2 C
15 7 B
134 8 A
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B
#! /usr/bin/awk -f
FILENAME == "file.txt" {
a[FNR] = $0;
c=FNR;
}
FILENAME != "file.txt" {
split(FILENAME, name, ".");
k[$1] = name[1];
}
END {
for (line = 1; line <= c; line++) {
split(a[line], seg, FS);
print a[line], k[seg[1]];
}
}
# $ awk -f script.awk *.txt
This solution does not preserve the order:
join <(sort file.txt) \
<(awk '
FNR==1 {filename = substr(FILENAME, 1, length(FILENAME)-4)}
{print $1, filename}
' [ABCDE].txt |
sort) |
column -t
12 2 C
134 8 A
15 7 B
154 12 C
155 16 B
167 6 E
175 45 D
45 65 A
812 54 B

awk + Need to print everything (all rest fields) except $1 and $2

I have the following file and I need to print everything except $1 and $2 using awk.
File:
INFORMATION DATA 12 33 55 33 66 43
INFORMATION DATA 45 76 44 66 77 33
INFORMATION DATA 77 83 56 77 88 22
...
the desirable output:
12 33 55 33 66 43
45 76 44 66 77 33
77 83 56 77 88 22
...
Well, given your data, cut should be sufficient:
cut -d' ' -f3- infile
Although it adds an extra space at the beginning of each line compared to yael's expected output, here is a shorter and simpler awk based solution than the previously suggested ones:
awk '{$1=$2=""; print}'
or even:
awk '{$1=$2=""}1'
$ cat t
INFORMATION DATA 12 33 55 33 66 43
INFORMATION DATA 45 76 44 66 77 33
INFORMATION DATA 77 83 56 77 88 22
$ awk '{for (i = 3; i <= NF; i++) printf $i " "; print ""}' t
12 33 55 33 66 43
45 76 44 66 77 33
77 83 56 77 88 22
danben's answer leaves a whitespace at the end of the resulting string, so the correct way to do it would be:
awk '{for (i=3; i<NF; i++) printf $i " "; print $NF}' filename
If the first two words don't change, probably the simplest thing would be:
awk -F 'INFORMATION DATA ' '{print $2}' t
Here's another awk solution, that's more flexible than the cut one and is shorter than the other awk ones. Assuming your separators are single spaces (modify the regex as necessary if they are not):
awk --posix '{sub(/([^ ]* ){2}/, ""); print}'
If Perl is an option:
perl -lane 'splice @F,0,2; print join " ",@F' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print it
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the perl code
splice @F,0,2 cleanly removes columns 0 and 1 from the @F array
join " ",@F joins the elements of the @F array, using a space in-between each element
Variation for csv input files:
perl -F, -lane 'splice @F,0,2; print join " ",@F' file
This uses the -F field separator option with a comma
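For comparison, the cut approach from the first answer extends to CSV just as directly; -d sets the delimiter and -f3- keeps everything from the third field onward:

```shell
# Drop the first two comma-separated columns, keep the rest.
printf 'INFORMATION,DATA,12,33,55,33,66,43\n' | cut -d, -f3-
```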