Input:
dog
fish
elephant
...
Output:
dog |
fish |
elephant|
... |
I want to add a "|" as the 9th character of every row
You should first space-pad the lines to the max line width (e.g. 8 chars, as you say).
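For the padding step, something like this should work (input.txt is just a placeholder for your file; 8 is the length of the longest line, elephant):
awk '{printf "%-8s\n",$0}' input.txt > padded.txt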
Then, you can use
sed 's/./&|/8' <padded.txt >output.txt
which appends the | right after the 8th character, so it ends up as the 9th character of every row.
Hard-coding the output field width:
$ awk '{printf "%-*s|\n",8,$0}' file
dog |
fish |
elephant|
... |
or specifying the output field width as an argument:
$ awk -v wid=8 '{printf "%-*s|\n",wid,$0}' file
dog |
fish |
elephant|
... |
or dynamically determining the output field width from the input field widths:
$ awk 'NR==FNR{lgth=length($0); wid=(lgth > wid ? lgth : wid); next} {printf "%-*s|\n",wid,$0}' file file
dog |
fish |
elephant|
... |
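The file is passed on the command line twice: during the first pass NR==FNR is true and only the maximum line length is recorded; the second pass does the printing. Written out with comments, the same logic is:
awk '
NR==FNR {                             # first pass: measure only
    lgth = length($0)
    wid = (lgth > wid ? lgth : wid)   # keep the longest line length seen
    next
}
{ printf "%-*s|\n", wid, $0 }         # second pass: pad to that width, append |
' file file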
If you need to further process the records, it might be a good idea to actually make $0 itself 9 chars wide:
$ awk '{$0=$0 sprintf("%" 9-length() "s","|")}1' file
Output:
dog |
fish |
elephant|
... |
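The trick there is that the format string itself is built at run time: for a 3-character line like dog, "%" 9-length() "s" becomes "%6s", so the | is right-justified in the remaining 6 columns. You can check the generated format with, for example:
$ echo dog | awk '{print "%" 9-length() "s"}'
%6s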
How do I find duplicates in a column?
$ head countries_lat_long_int_code3.csv | cat -n
1 country,latitude,longitude,name,code
2 AD,42.546245,1.601554,Andorra,376
3 AE,23.424076,53.847818,United Arab Emirates,971
4 AF,33.93911,67.709953,Afghanistan,93
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
7 AL,41.153332,20.168331,Albania,355
8 AM,40.069099,45.038189,Armenia,374
9 AN,12.226079,-69.060087,Netherlands Antilles,599
10 AO,-11.202692,17.873887,Angola,244
For instance this has duplicates in the 5th column.
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
How do I view all the other duplicated rows in this file?
I know I can do this:
awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort
And I can eyeball it to see if there are any duplicates, but is there a better way?
Or I can do this:
Find out how many values there are in total:
$ awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort | wc -l
210
Find out how many unique values there are:
$ awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort | uniq | wc -l
183
Therefore there are at most 27 (210-183) duplicates.
EDIT1
My desired output would be something like the following: all the columns, but only showing the rows that are duplicates:
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
This will give you the duplicated codes
awk -F, 'a[$5]++{print $5}'
if you're only interested in count of duplicate codes
awk -F, 'a[$5]++{count++} END{print count}'
To print duplicated rows try this
awk -F, '$5 in a{print a[$5]; print} {a[$5]=$0}'
This will print the whole row whenever a duplicate is found in column $5:
awk -F, 'a[$5]++{print $0}'
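If you also want to see the first occurrence of each duplicated code (the a[$5]++ versions above only start printing from the second occurrence), a two-pass variant should do it (a sketch; the file is read twice, once to count and once to print):
awk -F, 'NR==FNR{cnt[$5]++; next} cnt[$5]>1' countries_lat_long_int_code3.csv countries_lat_long_int_code3.csv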
This is the least memory-hungry approach I can think of:
$ cat infile
country,latitude,longitude,name,code
AD,42.546245,1.601554,Andorra,376
AE,23.424076,53.847818,United Arab Emirates,971
AF,33.93911,67.709953,Afghanistan,93
AG,17.060816,-61.796428,Antigua and Barbuda,1
AI,18.220554,-63.068615,Anguilla,1
AL,41.153332,20.168331,Albania,355
AM,40.069099,45.038189,Armenia,374
AN,12.226079,-69.060087,Netherlands Antilles,599
AO,-11.202692,17.873887,Angola,355
$ awk -F\, '$NF in a{if (a[$NF]!=0){print a[$NF];a[$NF]=0}print;next}{a[$NF]=$0}' infile
AG,17.060816,-61.796428,Antigua and Barbuda,1
AI,18.220554,-63.068615,Anguilla,1
AL,41.153332,20.168331,Albania,355
AO,-11.202692,17.873887,Angola,355
NOTE: I have included another duplicate for testing purposes.
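Spelled out with comments, that one-liner reads (it keys on the last field, $NF, i.e. the code column):
awk -F, '
$NF in a {              # this code has been seen before
    if (a[$NF] != 0) {  # its first row has not been printed yet
        print a[$NF]    # print the first occurrence now
        a[$NF] = 0      # mark it as already printed
    }
    print               # print the current duplicate row
    next
}
{ a[$NF] = $0 }         # remember the first row seen for each code
' infile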
If you just want to print out the values that appear only once in the file, just add this at the end of the awk:
awk ... ... | sort | uniq -u
That will print only those unique values, in alphabetical order.
What's wrong with my syntax here?
awk -F '|' 'sub/\s*\w*/,"Visit our website!","$3"' merchant_report
It's supposed to turn
|bob|jones| blagblag| texas
|tom|markus| | alabama
into
|bob|jones|Visit our website!| texas
|tom|markus| | alabama
this line may do what you want:
awk -F'|' -v OFS="|" 'NR==1{$4="Visit our website!"}1' file
In your awk code:
you have already set FS to split the fields, so you don't need the sub() function; just assign the field directly.
even with the sub() function, your syntax is not correct; you can get the details from man gawk.
it is actually $4, not $3, because your lines start with |, so $1 is empty.
if you want to change only the first line, you should add NR==1; otherwise awk will apply the change to every line.
example with the code:
kent$ cat file
|bob|jones| blagblag| texas
|tom|markus| | alabama
kent$ awk -F'|' -v OFS="|" 'NR==1{$4="Visit our website!"}1' file
|bob|jones|Visit our website!| texas
|tom|markus| | alabama
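If you are curious what a correct sub() call would look like here, something along these lines should work (a sketch; it still changes $4 and only on the first line):
kent$ awk -F'|' -v OFS='|' 'NR==1{sub(/.*/,"Visit our website!",$4)}1' file
|bob|jones|Visit our website!| texas
|tom|markus| | alabama
though the direct assignment above is simpler.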
In awk you would just assign the new value to the field on the given line. If you are more comfortable with the substitution approach, try sed:
sed '1s/|[^|]*/|Visit our website!/3' file
|bob|jones|Visit our website!| texas
|tom|markus| | alabama
I have a file with the following lines of text:
jeremy , thomas , 123
peter , paul , 456
jack , jill , 789
I would like to remove all of the data except for the center item. For example ending up with a file which contains:
thomas
paul
jill
I have tried so many awk patterns my brain is exploding. Any help would be appreciated.
Try awk:
awk -F '[[:space:]]*,[[:space:]]*' '{print $2}' input.txt
Try this
cat <filepath> | tr -d ' ' | cut -d',' -f2
grep with look-around assertions:
grep -Po '(?<=, ).*(?= ,)' file
Try this:
$ cat (your file) | cut -d ',' -f2 | tr -d ' ' > (new file)
for instance:
$ cat /home/file1.txt | cut -d ',' -f2 | tr -d ' ' > /home/file2.txt
I have a line that looks like:
Feb 21 1:05:14 host kernel: [112.33000] SRC=192.168.0.1 DST=90.90.90.90 PREC=0x40 TTL=51 ....
I would like to get a list of unique IPs from SRC=.
How can I do this? Thanks
This will work, although you could probably simplify it further in a single awk script if you wanted:
awk '{print $7}' <your file> | awk -F= '{print $2}' | sort -u
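A single-awk version might look like this (a sketch; it assumes SRC= is always the 7th whitespace-separated field, as in the sample line):
awk '{split($7,a,"="); if (!seen[a[2]]++) print a[2]}' <your file>
Unlike the pipeline above, this prints the IPs in order of first appearance rather than sorted.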
grep -o 'SRC=\([^ ]\+\)' | cut -d= -f2 | sort -u
cat thefile | grep SRC= | sed -r 's/^.*SRC=([^ ]+).*$/\1/' | sort | uniq
This awk script will do:
{a[$7]=1}
END{for (i in a) print i}
This will print the IP addresses in order without the "SRC=" string:
awk '{a[$7] = $7} END {asort(a); for (i in a) {split(a[i], b, "="); print b[2]}}' inputfile
Example output:
192.168.0.1
192.168.0.2
192.168.1.1
grep -Po "SRC=(.[^\s]*)" file | sed 's/SRC=//' | sort -u
Ruby (1.9+)
ruby -ne 'puts $_.scan(/SRC=(.[^\s]*)/)[0] if /SRC=/' file| sort -u