How to delete first three columns in a delimited file - awk

For example, I have a csv file as follows:
12345432|1346283301|5676438284971|13564357342151697 ...
87540258|1356433301|1125438284971|135643643462151697 ...
67323266|1356563471|1823543828471|13564386436651697 ...
and hundreds more columns, but I want to remove the first three columns and save the result to a new file (if possible, writing back to the same file would be even better for me).
This is the result I want.
13564357342151697 ...
135643643462151697 ...
13564386436651697 ...
I have been looking and trying, but I am not able to do it. Below is the code I have.
awk -F'|' '{print $1 > "newfile"; sub(/^[^|]+\|/,"")}1' old.csv > new.csv
I'd appreciate it if someone could help me. Thank you.

You can use cut:
cut -f4- -d'|' old.csv > new.csv
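If you would rather overwrite the original file, as the question mentions, one way (just a sketch, going through a temporary file) is:
cut -f4- -d'|' old.csv > tmp.csv && mv tmp.csv old.csv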

@Heng: try:
awk -F"|" '{for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"":"|")};print ""}' Input_file
OR
awk -F"|" '{for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"\n":"|")};}' Input_file
You could redirect this command's output into a file as per your need.
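For example (new.csv is only an illustrative name here):
awk -F"|" '{for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"\n":"|")};}' Input_file > new.csv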
EDIT:
awk -F"|" 'FNR==1{++e;fi="REPORT_A1_"e;} {for(i=4;i<=NF;i++){printf("%s%s",$i,i==NF?"\n":"|") > fi}}' Input_file1 Input_file2 Input_file3

This is what you're looking for:
awk -F '|' '{$1=$2=$3=""; print $0}' oldfile > newfile
But it will have leading whitespace/separators, so then add the following substitution:
sub(/^[ \t\|]+/,"")
awk -F '|' '{$1=$2=$3=""; OFS="|"; sub(/^[ \t\|]+/,""); print $0}' oldFile > newFile

awk -F\| '{print $NF}' file >newfile
13564357342151697 ...
135643643462151697 ...
13564386436651697 ...

Related

Print line modified and the line after using awk

I want to modify lines in a file using awk and print the new lines with the following line.
My file is like this
Name_Name2_ Name3_Name4
ASHRGSJFSJRGDJRG
Name5_Name6_Name7_Name8
ADGTHEGHGTJKLGRTIWRK
I want
Name-Name2
ASHRGSJFSJRGDJRG
Name5-Name6
ADGTHEGHGTJKLGRTIWRK
I have used awk to modify my file:
awk -F'_' '{print $1 "-" $2}' file > newfile
but I don't know how to tell it to also print the line just after (the sequence line, e.g. ASHRGSJFSJRGDJRG).
Surely it is possible with awk, something like x=NR+1; NR<=x?
thanks
The following awk may help:
awk -F"_" '/_/{print $1"-"$2;next} 1' Input_file
Assuming the structure of your sample (no field separator in the "data" lines):
awk '$0=$1' Input_file
# or with sed
sed 's/[[:space:]].*//' Input_file

Using AWK to extract one column from a tab separated file

I know this is a simple question, but the awk command is literally melting my brain. I have a tab separated file "inputfile.gtf" and I need to extract one column from it and put it into a new file "newfile.tsv". I cannot for the life of me figure out the proper syntax to do this with awk. Here is what I've tried:
awk -F, 'BEGIN{OFS="/t"} {print $8}' inputfile.gtf > newfile.tsv
also
awk 'BEGIN{OFS="/t";FS="/t"};{print $8}' inputfile.gtf > newfile.tsv
Both of these just give me an empty file. Everywhere I search, people seem to have completely different ways of trying to achieve this simple task, and at this point I am completely lost. Any help would be greatly appreciated. Thanks.
Why not keep it simpler:
awk -F'\t' '{print $8}' inputfile.gtf > newfile.tsv
You have specified the wrong delimiter /t; the tab character is typed as \t:
awk 'BEGIN{ FS=OFS="\t" }{ print $8 }' inputfile.gtf > newfile.tsv
Your 1st command:
awk -F, 'BEGIN{OFS="/t"} {print $8}' inputfile.gtf > newfile.tsv
You are setting -F, (a comma as the field separator), which is not what you want, as your file is not comma-separated.
Next, OFS="/t": the syntax is incorrect, it should be OFS="\t". But you don't need it anyway: you are printing only a single field, and OFS is not involved unless you print at least two fields.
Your 2nd command:
awk 'BEGIN{OFS="/t";FS="/t"};{print $8}' inputfile.gtf > newfile.tsv
Again, it's not /t, it should be \t. Also, FS="\t" is equivalent to -F "\t".
What you actually need is :
awk -F"\t" '{print $8}' inputfile.gtf > newfile.tsv
or
awk -v FS="\t" '{print $8}' inputfile.gtf > newfile.tsv
And if your file has just tabs and your fields don't have spaces in between, then you can simply use:
awk '{print $8}' inputfile.gtf > newfile.tsv

awk, print all columns and add new column with substr

I have this table
USI,Name,2D-3D
RO0001,Patate,2D
RO0002,Haricot,3D
RO0003,Banane,2D
RO0004,Pomme,2D
RO0005,Poire,2D
and I want this
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire
I managed to obtain the construction "RO_2D_Patate" with awk:
awk -F "," '{print substr($1,1,2)"_"substr($3,1,2)"_"$2}' Test4.txt
But I want to print all my columns ($0) before it, as in my second table above.
I tried everything, but I am still a novice!
Any idea over there?
awk -F, '{print $0 (NR>1 ? FS substr($1,1,2)"_"$3"_"$2 : "")}' Test4.txt
$ awk -F, -v OFS=, 'NR>1{$4=substr($1,1,2)"_"$3"_"$2}1' Test4.txt
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire
awk -F, 'NR>1{print $0,substr($1,1,2)"_"$NF"_"$2}/USI/' OFS=, file
USI,Name,2D-3D
RO0001,Patate,2D,RO_2D_Patate
RO0002,Haricot,3D,RO_3D_Haricot
RO0003,Banane,2D,RO_2D_Banane
RO0004,Pomme,2D,RO_2D_Pomme
RO0005,Poire,2D,RO_2D_Poire

initialising field separators on condition in awk

I know that initialising FS in BEGIN is the correct practice, but what if I need different field separators for different lines (lines containing a particular pattern)? E.g., my awk script is:
{if($0 ~ /.*youtube.*/){FS="=";print $2}}
This code is not processing the first line. How do I fix this?
You can use split. E.g., get the middle value, green, from the third field:
echo "on,cat ,blue|green|red,more" | awk -F, '{split($3,a,"|");print a[2]}'
green
And the BEGIN block is not the only place where you can set the field separator:
echo "on,two,three" | awk -F, '{print $2}'
echo "on,two,three" | awk '{print $2}' FS=,
echo "on,two,three" | awk 'BEGIN{FS=","} {print $2}'
echo "on,two,three" | awk -v FS=, '{print $2}'
All these will print two
But they differ in when the assignment takes effect:
awk -F, 'BEGIN{print FS}'
,
whereas this gives no visible output, because the trailing FS=, assignment is only processed when input is read, i.e., after BEGIN:
awk 'BEGIN{print FS}' FS=,
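By contrast, a -v assignment is applied before the program starts, including the BEGIN block, so this should print the comma as well:
awk -v FS=, 'BEGIN{print FS}'
,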
Back to your problem:
This:
awk '{if($0 ~ /.*youtube.*/){FS="=";print $2}}' file
should be:
awk '{if($0 ~ /.*youtube.*/){split($0,a,"=");print a[2]}}' file
You do not need to match any characters before and after the regex, so:
awk '{if($0 ~ /youtube/){split($0,a,"=");print a[2]}}' file
And this could be simplified even more:
awk '/youtube/ {split($0,a,"=");print a[2]}' file
If data is like this:
cat file
youtube=thisisyoutube1 //starts here
youtube=thisisyoutube2
youtube=thisisyoutube3
youtube=thisisyoutube4
yautube=thisisnottobeprinted
Then do it like this:
awk -F= '/youtube/ {split($2,a," ");print a[1]}' file
thisisyoutube1
thisisyoutube2
thisisyoutube3
thisisyoutube4

use awk to print a column, adding a comma

I have a file, from which I want to retrieve the first column, and add a comma between each value.
Example:
AAAA 12345 xccvbn
BBBB 43431 fkodks
CCCC 51234 plafad
to obtain
AAAA,BBBB,CCCC
I decided to use awk, so I did
awk '{ $1=$1","; print $1 }'
Problem is: this also adds a comma to the last value, which is not what I want to achieve, and I also get a space between values.
How do I remove the comma on the last element, and how do I remove the space? Spent 20 minutes looking at the manual without luck.
$ awk '{printf "%s%s",sep,$1; sep=","} END{print ""}' file
AAAA,BBBB,CCCC
or if you prefer:
$ awk '{printf "%s%s",(NR>1?",":""),$1} END{print ""}' file
AAAA,BBBB,CCCC
or if you like golf and don't mind it being inefficient for large files:
$ awk '{r=r s $1;s=","} END{print r}' file
AAAA,BBBB,CCCC
awk '{print $1","$2","$3}' file_name
This is the shortest I know.
Why make it complicated :) (as long as the file is not too large)
awk '{a=NR==1?$1:a","$1} END {print a}' file
AAAA,BBBB,CCCC
For better portability:
awk '{a=(NR>1?a",":"")$1} END {print a}' file
You can do this:
awk 'a++{printf ","}{printf "%s", $1}' file
a++ is interpreted as a condition. In the first row its value is 0, so the comma is not added.
EDIT:
If you want a newline, you have to add END{printf "\n"}. If you have problems reading in the file, you can also try:
cat file | awk 'a++{printf ","}{printf "%s", $1}'
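On the sample input from the question, either form should produce:
AAAA,BBBB,CCCC
(with no trailing newline unless you add the END{printf "\n"} block mentioned above).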
awk 'NR==1{printf "%s",$1;next;}{printf "%s%s",",",$1;}' input.txt
It says: if it is the first line, only print the first field; for the other lines, first print a comma, then print the first field.
Output:
AAAA,BBBB,CCCC
In this case, a simple cut-and-paste solution:
cut -d" " -f1 file | paste -s -d,
In case somebody like me wants to use awk for cleaning up Docker images:
docker image ls | grep tag_name | awk '{print $1":"$2}'
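Presumably the intent is to feed that list to docker rmi; one way to do that (just a sketch, using xargs) would be:
docker image ls | grep tag_name | awk '{print $1":"$2}' | xargs docker rmi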
Surprised that no one is using OFS (output field separator). Here is probably the simplest solution that sticks with awk and works on Linux and Mac: use -v OFS=, to output with a comma as the delimiter:
$ echo '1:2:3:4' | awk -F: -v OFS=, '{print $1, $2, $4, $3}' generates:
1,2,4,3
It works for multiple char too:
$ echo '1:2:3:4' | awk -F: -v OFS=., '{print $1, $2, $4, $3}' outputs:
1.,2.,4.,3
Using Perl
$ cat group_col.txt
AAAA 12345 xccvbn
BBBB 43431 fkodks
CCCC 51234 plafad
$ perl -lane ' push(@x,$F[0]); END { print join(",",@x) } ' group_col.txt
AAAA,BBBB,CCCC
$
This can be done very simply:
awk -F',' '{print $1","$1","$2","$3}' inputFile
where the input file is:
1,2,3
2,3,4
etc.
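With that input, the command above should print something like:
1,1,2,3
2,2,3,4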
I used the following because it lists the api-resource names with it, which is useful if you want to access them directly. I also use the label "application" to find specific apps in a namespace:
kubectl -n ops-tools get $(kubectl api-resources --no-headers=true --sort-by=name | awk '{printf "%s%s",sep,$1; sep=","}') -l app.kubernetes.io/instance=application