I have a file with the following lines of text:
jeremy , thomas , 123
peter , paul , 456
jack , jill , 789
I would like to remove all of the data except for the center item, for example ending up with a file which contains:
thomas
paul
jill
I have tried so many awk patterns my brain is exploding. Any help would be appreciated.
Try awk:
awk -F '[[:space:]]*,[[:space:]]*' '{print $2}' input.txt
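For the sample above, a quick sanity check of that field separator (using printf here just for the demo):

```shell
# the regex FS absorbs the spaces on both sides of each comma
printf 'jeremy , thomas , 123\npeter , paul , 456\njack , jill , 789\n' |
  awk -F '[[:space:]]*,[[:space:]]*' '{print $2}'
# thomas
# paul
# jill
```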
Try this
cat <filepath> | tr -d ' ' | cut -d',' -f2
grep lookaround:
grep -Po '(?<=, ).*(?= ,)' file
Try this (the tr strips the spaces that cut leaves around each name):
$ cut -d ',' -f2 (your file) | tr -d ' ' > (new file)
for instance:
$ cut -d ',' -f2 /home/file1.txt | tr -d ' ' > /home/file2.txt
cut -f2 words.txt | sort | uniq -c | sort -nr > counted.txt
This command selects the second column from the file, counts each entry, sorts by frequency, and writes the result to counted.txt, ranked by frequency, as follows:
71321 good
14945 bad
5891 nice
4641 pretty
4494 slow
3671 quick
...
However, the output delimiter is a white space, and I want a tab as the delimiter. I tried the following with --output-delimiter:
cut -f2 --output-delimiter='\t' words.txt | sort | uniq -c | sort -nr > counted.txt
But it reports an "invalid option" error.
How can I make the output delimiter a tab?
Try the following:
cut -d$'\t' -f2 words.txt | sort | uniq -c | sort -nr > counted.txt
The -d option specifies the input delimiter; it must be a single character, which is why the tab is written with the shell's $'\t' quoting (a plain '\t' reaches cut as two characters and is rejected).
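Note, though, that the whitespace in counted.txt comes from uniq -c, not from cut, so changing cut's delimiter won't affect it. One sketch that puts a tab between the count and the word is to reformat at the end of the pipeline with awk:

```shell
# uniq -c left-pads the count with spaces; awk re-emits count and word tab-separated
cut -f2 words.txt | sort | uniq -c | sort -nr |
  awk -v OFS='\t' '{print $1, $2}' > counted.txt
```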
I have multiple TSV files, and I want to prefix every value in the second column (everywhere except in the header) with a marker like 'X1_', saving the result back to the same file.
Input:
$ ls
file1.tsv file2.tsv file3.tsv
$ head -n 4 file1.tsv
a b c
James England 25
Brian France 41
Maria France 18
Output wanted:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
I tried this, but the result is not kept in the file, and a simple redirection won't work:
# this works, but doesn't save the changes
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' "$f" | sed "s|^|X${i}_|"
i=$((i+1))
done
# adding '-i' to sed would be perfect, but it throws an error (sed: no input files)
i=1
for f in *tsv
do awk '{if (NR!=1) print $2}' "$f" | sed -i "s|^|X${i}_|"
i=$((i+1))
done
Some help would be appreciated.
The second column is particularly easy because you simply replace the first occurrence of the separator.
for file in *.tsv; do
sed -i '2,$s/\t/\tX1_/' "$file"
done
If your sed doesn't recognize \t, use a literal tab (in many shells you can type one with Ctrl-V Tab). On *BSD (and hence macOS) you need -i ''
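If you also want the per-file counter from the question (X1_ in the first file, X2_ in the second, and so on) rather than a fixed X1_, the same sed drops into a counting loop — a sketch, again assuming GNU sed:

```shell
i=1
for file in *.tsv; do
  # from line 2 on, prefix the second column (i.e. right after the first tab)
  sed -i "2,\$s/\t/\tX${i}_/" "$file"
  i=$((i+1))
done
```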
AWK solution (GNU awk 4.1+, which provides -i inplace):
awk -i inplace 'BEGIN { FS=OFS="\t" } NR!=1 { $2 = "X1_" $2 } 1' file1.tsv
Input:
a b c
James England 25
Brian France 41
Maria France 18
Output:
a b c
James X1_England 25
Brian X1_France 41
Maria X1_France 18
I am trying to find a way to exclude numbers in a file when I cat it, but I only want to exclude the numbers in print $1 and keep the number that is in front of the word. I have something that I thought might work, but it is not quite giving me what I want. I have also included an example of what the file looks like. The file is separated by pipes.
cat files | awk -F '|' ' {print $1 "\t" $2}' |sed 's/0123456789//g'
input:
b1ark45 | dog23 | brown
m2eow66| cat24 |yellow
h3iss67 | snake57 | green
Output:
b1ark dog23
m2eow cat24
h3iss snake57
try this (the trailing 7 is just a true pattern, so awk takes the default action and prints the record):
awk -F'|' -v OFS='|' '{gsub(/[0-9]/,"",$1)}7' file
the output of your example would be:
bark | dog23 | brown
meow| cat24 |yellow
hiss | snake57 | green
EDIT
this outputs col1 (without trailing numbers and spaces) and col2, separated by a <tab>
kent$ echo "b1ark45 | dog23 | brown
m2eow66| cat24 |yellow
h3iss67 | snake57 | green"|awk -F'|' -v OFS='\t' '{gsub(/[0-9]*\s*$/,"",$1);print $1,$2}'
b1ark dog23
m2eow cat24
h3iss snake57
This might work for you (GNU sed):
sed -r 's/[0-9]*\s*\|\s*(\S*).*/ \1/' file
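A quick check against the sample input (GNU sed, as noted; printf stands in for the file):

```shell
printf 'b1ark45 | dog23 | brown\nm2eow66| cat24 |yellow\nh3iss67 | snake57 | green\n' |
  sed -r 's/[0-9]*\s*\|\s*(\S*).*/ \1/'
# b1ark dog23
# m2eow cat24
# h3iss snake57
```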
I want to read a file from line 4 to the very end. Is there any way to do this with awk or something?
This sed command will do:
sed -n '4,$p' file.txt
Or using awk:
awk 'NR>=4' file.txt
Or using tail:
tail -n +4 file.txt
awk 'NR >= 4 {print $0}'
For example
$> seq 101 110 | awk 'NR >= 4 {print $0}'
104
105
106
107
108
109
110
tail -n +4 filename will serve your purpose.
More on tail.
here's a method (shell-dependent; bash should work):
tmpvar=$(wc -l < a_file); tail -n $((tmpvar - 3)) a_file
here's another method that should work in more shells:
cat -n a_file | awk '$1 >= 4 { sub(/^ *[0-9]+\t/, ""); print }'
I have e a line that looks like:
Feb 21 1:05:14 host kernel: [112.33000] SRC=192.168.0.1 DST=90.90.90.90 PREC=0x40 TTL=51 ....
I would like to get a list of unique IPs from SRC=.
How can I do this? Thanks
This will work, although you could probably simplify it further in a single awk script if you wanted:
awk '{print $7}' <your file> | awk -F= '{print $2}' | sort -u
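The two passes can indeed be collapsed into one awk — a sketch, assuming SRC= is always the seventh whitespace-separated field as in the sample line (logfile is a placeholder name; append | sort if you want the addresses ordered):

```shell
# split field 7 on '=' and print each address only the first time it is seen
awk '{ split($7, a, "="); if (!seen[a[2]]++) print a[2] }' logfile
```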
grep -o 'SRC=\([^ ]\+\)' | cut -d= -f2 | sort -u
cat thefile | grep SRC= | sed -r 's/^.*SRC=([^ ]+).*$/\1/' | sort | uniq
This awk script will do (though it keeps the SRC= prefix, since it stores field 7 as-is):
{a[$7]=1}
END{for (i in a) print i}
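Saved to a file and run with awk -f, a variant of the script above that also strips the SRC= prefix by splitting field 7 on '=' (the srcips.awk name is just for illustration):

```shell
# write the script out, then run it against a sample line
cat > srcips.awk <<'EOF'
{ split($7, a, "="); ips[a[2]] = 1 }
END { for (ip in ips) print ip }
EOF
printf 'Feb 21 1:05:14 host kernel: [112.33000] SRC=192.168.0.1 DST=90.90.90.90 PREC=0x40 TTL=51\n' |
  awk -f srcips.awk
# 192.168.0.1
```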
This will print the IP addresses in sorted order without the "SRC=" string (GNU awk, for asort):
awk '{a[$7] = $7} END {n = asort(a); for (i = 1; i <= n; i++) {split(a[i], b, "="); print b[2]}}' inputfile
Example output:
192.168.0.1
192.168.0.2
192.168.1.1
grep -Po "SRC=(.[^\s]*)" file | sed 's/SRC=//' | sort -u
Ruby (1.9+)
ruby -ne 'puts $_.scan(/SRC=(.[^\s]*)/)[0] if /SRC=/' file| sort -u