Input:
dog
fish
elephant
...
Output:
dog |
fish |
elephant|
... |
I want to add a "|" as the 9th character of every row
You should first space-pad the lines to the max line width (e.g. 8 chars, as you say).
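For the padding step, something like this should work (input.txt is just a placeholder for your file; 8 is the length of the longest line, elephant):
awk '{printf "%-8s\n",$0}' input.txt > padded.txt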
Then, you can use
sed 's/./&|/8' <padded.txt >output.txt
which appends the | right after the 8th character, so it ends up as the 9th character of every row.
Hard-coding the output field width:
$ awk '{printf "%-*s|\n",8,$0}' file
dog |
fish |
elephant|
... |
or specifying the output field width as an argument:
$ awk -v wid=8 '{printf "%-*s|\n",wid,$0}' file
dog |
fish |
elephant|
... |
or dynamically determining the output field width from the input field widths:
$ awk 'NR==FNR{lgth=length($0); wid=(lgth > wid ? lgth : wid); next} {printf "%-*s|\n",wid,$0}' file file
dog |
fish |
elephant|
... |
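The file is passed on the command line twice: during the first pass NR==FNR is true and only the maximum line length is recorded; the second pass does the printing. Written out with comments, the same logic is:
awk '
NR==FNR {                             # first pass: measure only
    lgth = length($0)
    wid = (lgth > wid ? lgth : wid)   # keep the longest line length seen
    next
}
{ printf "%-*s|\n", wid, $0 }         # second pass: pad to that width, append |
' file file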
If you need to further process the records, it might be a good idea to actually make $0 itself 9 chars wide:
$ awk '{$0=$0 sprintf("%" 9-length() "s","|")}1' file
Output:
dog |
fish |
elephant|
... |
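The trick there is that the format string itself is built at run time: for a 3-character line like dog, "%" 9-length() "s" becomes "%6s", so the | is right-justified in the remaining 6 columns. You can check the generated format with, for example:
$ echo dog | awk '{print "%" 9-length() "s"}'
%6s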
How do I find duplicates in a column?
$ head countries_lat_long_int_code3.csv | cat -n
1 country,latitude,longitude,name,code
2 AD,42.546245,1.601554,Andorra,376
3 AE,23.424076,53.847818,United Arab Emirates,971
4 AF,33.93911,67.709953,Afghanistan,93
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
7 AL,41.153332,20.168331,Albania,355
8 AM,40.069099,45.038189,Armenia,374
9 AN,12.226079,-69.060087,Netherlands Antilles,599
10 AO,-11.202692,17.873887,Angola,244
For instance this has duplicates in the 5th column.
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
How do I view all the other duplicated rows in this file?
I know I can do this:
awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort
And I can eyeball it to see if there are any duplicates, but is there a better way?
Or I can do this:
Find out how many values there are in total:
$ awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort | wc -l
210
Find out how many unique values there are:
$ awk -F, 'NR>1{print $5}' countries_lat_long_int_code3.csv | sort | uniq | wc -l
183
Therefore there are at most 27 (210-183) duplicates.
EDIT1
My desired output would be something like the following: all the columns, but only showing the rows that are duplicates:
5 AG,17.060816,-61.796428,Antigua and Barbuda,1
6 AI,18.220554,-63.068615,Anguilla,1
This will give you the duplicated codes
awk -F, 'a[$5]++{print $5}'
if you're only interested in count of duplicate codes
awk -F, 'a[$5]++{count++} END{print count}'
To print duplicated rows try this
awk -F, '$5 in a{print a[$5]; print} {a[$5]=$0}'
This will print the whole row whenever a duplicate is found in column $5:
awk -F, 'a[$5]++{print $0}'
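If you also want to see the first occurrence of each duplicated code (the a[$5]++ versions above only start printing from the second occurrence), a two-pass variant should do it (a sketch; the file is read twice, once to count and once to print):
awk -F, 'NR==FNR{cnt[$5]++; next} cnt[$5]>1' countries_lat_long_int_code3.csv countries_lat_long_int_code3.csv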
This is the least memory-hungry approach I can think of:
$ cat infile
country,latitude,longitude,name,code
AD,42.546245,1.601554,Andorra,376
AE,23.424076,53.847818,United Arab Emirates,971
AF,33.93911,67.709953,Afghanistan,93
AG,17.060816,-61.796428,Antigua and Barbuda,1
AI,18.220554,-63.068615,Anguilla,1
AL,41.153332,20.168331,Albania,355
AM,40.069099,45.038189,Armenia,374
AN,12.226079,-69.060087,Netherlands Antilles,599
AO,-11.202692,17.873887,Angola,355
$ awk -F\, '$NF in a{if (a[$NF]!=0){print a[$NF];a[$NF]=0}print;next}{a[$NF]=$0}' infile
AG,17.060816,-61.796428,Antigua and Barbuda,1
AI,18.220554,-63.068615,Anguilla,1
AL,41.153332,20.168331,Albania,355
AO,-11.202692,17.873887,Angola,355
NOTE: I have included another duplicate for testing purposes.
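Spelled out with comments, that one-liner reads (it keys on the last field, $NF, i.e. the code column):
awk -F, '
$NF in a {              # this code has been seen before
    if (a[$NF] != 0) {  # its first row has not been printed yet
        print a[$NF]    # print the first occurrence now
        a[$NF] = 0      # mark it as already printed
    }
    print               # print the current duplicate row
    next
}
{ a[$NF] = $0 }         # remember the first row seen for each code
' infile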
If you just want to print out the values that appear only once in the file, just add this at the end of the awk:
awk ... ... | sort | uniq -u
That will print only those unique values, in alphabetical order.
What's wrong with my syntax here?
awk -F '|' 'sub/\s*\w*/,"Visit our website!","$3"' merchant_report
It's supposed to turn
|bob|jones| blagblag| texas
|tom|markus| | alabama
into
|bob|jones|Visit our website!| texas
|tom|markus| | alabama
this line may do what you want:
awk -F'|' -v OFS="|" 'NR==1{$4="Visit our website!"}1' file
In your awk code:
you have already set FS to split the fields, so you don't need the sub() function; just assign the field directly.
even with the sub() function, your syntax is not correct; you can get the details from man gawk.
it is actually $4, not $3, because your lines start with |, so $1 is empty.
if you want to change only the first line, you should add NR==1; otherwise awk will apply the change to every line.
example with the code:
kent$ cat file
|bob|jones| blagblag| texas
|tom|markus| | alabama
kent$ awk -F'|' -v OFS="|" 'NR==1{$4="Visit our website!"}1' file
|bob|jones|Visit our website!| texas
|tom|markus| | alabama
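If you are curious what a correct sub() call would look like here, something along these lines should work (a sketch; it still changes $4 and only on the first line):
kent$ awk -F'|' -v OFS='|' 'NR==1{sub(/.*/,"Visit our website!",$4)}1' file
|bob|jones|Visit our website!| texas
|tom|markus| | alabama
though the direct assignment above is simpler.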
In awk you would just assign the new value to the field on the given line. If you are more comfortable with the substitution approach, try sed:
sed '1s/|[^|]*/|Visit our website!/3' file
|bob|jones|Visit our website!| texas
|tom|markus| | alabama
I have a file with the following lines of text:
jeremy , thomas , 123
peter , paul , 456
jack , jill , 789
I would like to remove all of the data except for the center item. For example ending up with a file which contains:
thomas
paul
jill
I have tried so many awk patterns my brain is exploding. Any help would be appreciated.
Try awk:
awk -F '[[:space:]]*,[[:space:]]*' '{print $2}' input.txt
Try this
cat <filepath> | tr -d ' ' | cut -d',' -f2
grep with look-around assertions:
grep -Po '(?<=, ).*(?= ,)' file
Try this:
$ cat (your file) | cut -d ',' -f2 | tr -d ' ' > (new file)
for instance:
$ cat /home/file1.txt | cut -d ',' -f2 | tr -d ' ' > /home/file2.txt
I have a line that looks like:
Feb 21 1:05:14 host kernel: [112.33000] SRC=192.168.0.1 DST=90.90.90.90 PREC=0x40 TTL=51 ....
I would like to get a list of unique IPs from SRC=.
How can I do this? Thanks
This will work, although you could probably simplify it further in a single awk script if you wanted:
awk '{print $7}' <your file> | awk -F= '{print $2}' | sort -u
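A single-awk version might look like this (a sketch; it assumes SRC= is always the 7th whitespace-separated field, as in the sample line):
awk '{split($7,a,"="); if (!seen[a[2]]++) print a[2]}' <your file>
Unlike the pipeline above, this prints the IPs in order of first appearance rather than sorted.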
grep -o 'SRC=\([^ ]\+\)' | cut -d= -f2 | sort -u
cat thefile | grep SRC= | sed -r 's/^.*SRC=([^ ]+).*$/\1/' | sort | uniq
This awk script will do:
{a[$7]=1}
END{for (i in a) print i}
This will print the IP addresses in order without the "SRC=" string:
awk '{a[$7] = $7} END {asort(a); for (i in a) {split(a[i], b, "="); print b[2]}}' inputfile
Example output:
192.168.0.1
192.168.0.2
192.168.1.1
grep -Po "SRC=(.[^\s]*)" file | sed 's/SRC=//' | sort -u
Ruby (1.9+)
ruby -ne 'puts $_.scan(/SRC=(.[^\s]*)/)[0] if /SRC=/' file| sort -u