my_file is like this:
SELECTED NAME AGE
* adam 30
bob 70
I'd like to output:
adam
bob
however, if I try: cat my_file|awk '{print $2}' it outputs
NAME
adam
70
Any suggestions on how you get awk to account for a blank column?
with gawk field widths
$ awk -v FIELDWIDTHS='11 8 3' '{print $2}' file
NAME
adam
bob
Could you please try following.
awk '{printf("%s%s",/^ +/?$1:$2,ORS)}' Input_file
Output will be as follows.
NAME
adam
bob
Here are multiple tsv files, where I want to add 'XX' characters only in the second column (everywhere except in the header) and save it to this same file.
Input:
$ls
file1.tsv file2.tsv file3.tsv
$head -n 4 file1.tsv
a b c
James England 25
Brian France 41
Maria France 18
Ouptut wanted:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
I tried this, but the result is not kept in the file, and a simple redirection won't work:
# this works, but doesn't save the changes
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}’ $f | sed "s|^|X${i}_|"
i=$((i+1))
done
# adding '-i' option to sed: this throws an error but would be perfect (sed no input files error)
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}’ $f | sed -i "s|^|T${i}_|"
i=$((i+1))
done
Some help would be appreciated.
The second column is particularly easy because you simply replace the first occurrence of the separator.
for file in *.tsv; do
sed -i '2,$s/\t/\tX1_/' "$file"
done
If your sed doesn't recognize the symbol \t, use a literal tab (in many shells, you type it with ctrlv tab.) On *BSD (and hence MacOS) you need -i ''
AWK solution:
awk -i inplace 'BEGIN { FS=OFS="\t" } NR!=1 { $2 = "X1_" $2 } 1' file1.tsv
Input:
a b c
James England 25
Brian France 41
Maria France 18
Output:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
How do I find duplicates in a column?
$ head countries_lat_long_int_code3.csv | cat -n
1 country,latitude,longitude,name,code
2 AD,42.546245,1.601554,Andorra,376
3 AE,23.424076,53.847818,United Arab Emirates,971
4 AF,33.93911,67.709953,Afghanistan,93
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
7 AL,41.153332,20.168331,Albania,355
8 AM,40.069099,45.038189,Armenia,374
9 AN,12.226079,-69.060087,Netherlands Antilles,599
10 AO,-11.202692,17.873887,Angola,244
For instance this has duplicates in the 5th column.
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
How do I view all the others in this file?
I know I can do this:
awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort
And I can eyeball and see if there is any duplicates, but is there a better way?
Or I can do this:
Find out how may are there completely
$ awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort | wc -l
210
Find out how many unique values are there
$ awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort | uniq | wc -l
183
Therefore there are at most 27 (210-183) duplicates.
EDIT1
My desired output would be something as follows, basically all the columns but just showing the rows that are duplicates:
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
This will give you the duplicated codes
awk -F, 'a[$5]++{print $5}'
if you're only interested in count of duplicate codes
awk -F, 'a[$5]++{count++} END{print count}'
To print duplicated rows try this
awk -F, '$5 in a{print a[$5]; print} {a[$5]=$0}'
This will print the whole row with duplicates found in col $5:
awk -F, 'a[$5]++{print $0}'
This is the less memory aggressive i can guess:
$ cat infile
country,latitude,longitude,name,code
AD,42.546245,1.601554,Andorra,376
AE,23.424076,53.847818,United Arab Emirates,971
AF,33.93911,67.709953,Afghanistan,93
AG,17.060816,-61.796428,Antigua and Barbuda,1
AI,18.220554,-63.068615,Anguilla,1
AL,41.153332,20.168331,Albania,355
AM,40.069099,45.038189,Armenia,374
AN,12.226079,-69.060087,Netherlands Antilles,599
AO,-11.202692,17.873887,Angola,355
$ awk -F\, '$NF in a{if (a[$NF]!=0){print a[$NF];a[$NF]=0}print;next}{a[$NF]=$0}' infile
AG,17.060816,-61.796428,Antigua and Barbuda,1
AI,18.220554,-63.068615,Anguilla,1
AL,41.153332,20.168331,Albania,355
AO,-11.202692,17.873887,Angola,355
NOTE: I have included another duplicate for testing purposes.
If you just want to print out a unique value that repeat over the same file just add at the end of the awk:
awk ... ... | sort | uniq -u
That will print the unique values only on alphabetic order
I have been search aorund and could not find the answer... wonder if anyone can help here.
suppose I have a file contents the following:
File1:
name Joe
day Wednesday
lunch was fish
name John
dinner pie
day tuesday
lunch was noodles
name Mary
day Friday
lunch was fish pie
I wanted to grep and print only their name and what they had for lunch.
I suppose i can do
cat file1 | grep -iE 'name|lunch'
but what if i want to do a awk to just have their name and food like this output below?
Joe
fish
John
noodles
Mary
fish pie
I am aware to use awk to print, but this may require awk, is it possible for awk to lets say print $2 on one line, and print $3 on another?
Can I also output it in this format:
Person food
Joe fish
John noodles
Mary fish pie
Thanks
You can for example say:
$ awk '/name/ {print $2} /lunch/ {$1=$2=""; print}' file
Joe
fish
John
noodles
Mary
fish pie
Or remove the lunch was text:
awk '/name/ {print $2} /lunch/ {gsub("lunch was ",""); print}' file
To make the output in two columns:
$ awk -v OFS="\t" '/name/ {name=$2} /lunch/ {gsub("lunch was ",""); print name, $0}' a
Joe fish
John noodles
Mary fish pie
awk
with awk you can do it in one shot,
awk -v RS="" '{n=$2;sub(/.*lunch was\s*/,"");print n,$0}' file
Note that with this one-liner, the format of your input file should be fixed. Your data should be stored in data blocks and lunch was line should be at the end of each data block.
test with your example:
kent$ awk -v RS="" '{n=$2;sub(/.*lunch was\s*/,"");print n,$0}' file
Joe fish
John noodles
Mary fish pie
grep & sed
also you can do it in two steps, grep the values out, and merge lines
grep -Po 'name\s*\K.*|lunch was\s*\K.*' file|sed 'N;s/\n/ /'
with your input file, it outputs:
kent$ grep -Po 'name\s*\K.*|lunch was\s*\K.*' file|sed 'N;s/\n/ /'
Joe fish
John noodles
Mary fish pie
I am trying to find a way to exclude numbers on a file when I cat ti but I only want to exclude the numbers on print $1 and I want to keep the number that is in front of the word. I have something that I thought might might work but is not quite giving me what I want. I have also showed an example of what the file looks like.The file is separated by pipes.
cat files | awk -F '|' ' {print $1 "\t" $2}' |sed 's/0123456789//g'
input:
b1ark45 | dog23 | brown
m2eow66| cat24 |yellow
h3iss67 | snake57 | green
Output
b1ark dog23
m2eow cat24
h3iss nake57
try this:
awk -F'|' -v OFS='|' '{gsub(/[0-9]/,"",$1)}7' file
the output of your example would be:
bark | dog23 | brown
meow| cat24 |yellow
hiss | snake57 | green
EDIT
this outputs col1 (without ending numbers and spaces) and col2, separated by <tab>
kent$ echo "b1ark45 | dog23 | brown
m2eow66| cat24 |yellow
h3iss67 | snake57 | green"|awk -F'|' -v OFS='\t' '{gsub(/[0-9]*\s*$/,"",$1);print $1,$2}'
b1ark dog23
m2eow cat24
h3iss snake57
This might work for you (GNU sed):
sed -r 's/[0-9]*\s*\|\s*(\S*).*/ \1/' file