cut -f command for selecting multiple fields

I have a file with contents as below:
A,B
1,hi there
2, Heloo there
I am trying to print column B first and then column A:
cat file.txt | cut -d "," -f2,1
However, this does not work. Any ideas how to do it?
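The reason it does not work is that cut cannot reorder fields: whatever order you list after -f, it always writes the fields in the order they appear in the input, so -f2,1 behaves exactly like -f1,2. A tool that can reorder, such as awk, is needed; a minimal sketch for the sample file above:
awk -F, -v OFS=, '{print $2, $1}' file.txt
This swaps the two comma-separated columns, producing "B,A", "hi there,1", and so on.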

Count b or B in even lines

I need to count the number of times the letter 'b' or 'B' appears in the even lines of file.txt, e.g. for a file.txt like:
everyB or gbnBra
uitiakB and kanapB bodddB
Kanbalis astroBominus
I got the first part, but I do not know how to count the b and B letters together:
awk '!(NR%2)' file.txt
$ awk '!(NR%2){print gsub(/[bB]/,"")}' file
4
gsub() returns the number of substitutions it makes, so printing its return value gives the number of b or B characters on each even line.
Could you please try the following, one more approach with awk. Written on mobile, I will test it in a few minutes, but it should work:
awk -F'[bB]' 'NR%2 == 0{print (NF ? NF - 1 : 0)}' Input_file
With [bB] as the field separator, NF - 1 is the number of b or B characters on each even line; the (NF ? NF - 1 : 0) guard stops an empty line from printing -1. Thanks to @Ed sir for solving the zero-matches-found line problem in the comments.
In a single awk:
awk '!(NR%2){gsub(/[^Bb]/,"");print length}' file.txt
gsub(/[^Bb]/,"") deletes every character in the line except for B and b.
print length prints the length of the resulting string.
awk '!(NR%2)' file.txt | tr -cd 'Bb' | wc -c
Explanation:
awk '!(NR%2)' file.txt : keep only even lines from file.txt
tr -cd 'Bb' : keep only B and b characters
wc -c : count characters
Example:
With the file below, the result is 4.
everyB or gbnBra
uitiakB and kanapB bodddB
Kanbalis astroBominus
Here is another way
$ sed -n '2~2s/[^bB]//gp' file | wc -c
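Note that wc -c also counts the newline that sed prints after each selected line, so the total can be high by one per even line; deleting newlines first avoids that (a small variation on the same idea):
$ sed -n '2~2s/[^bB]//gp' file | tr -d '\n' | wc -c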

Pipe Command Output into Awk

I want to pipe the output of a command into awk. I want to add that number to every row of a new column in an existing .txt file. The new column should be at the end, and won't necessarily be column 2.
$ command1
4512438
$ input.txt
A
B
C
D
$ desired_ouput.txt
A 4512438
B 4512438
C 4512438
D 4512438
I think I need to do something along the lines of the following. I'm not sure how to designate that the pipe goes into the new column - this awk command will simply add integers to the column.
$ command1 | awk -F, '{$(NF+1)=++i;}1' OFS=, input.txt > desired_ouput.txt
It doesn't seem like you really want to pipe the value to awk. Instead, you want to pass it as a parameter. You could read it from the pipe with something like:
cmd1 | awk 'NR==FNR{a=$0} NR!=FNR{print $0,a}' - input.txt
but it seems much more natural to do:
awk '{print $0,a}' a="$(cmd1)" input.txt
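In the first form, - tells awk to read the piped value from standard input as its first "file"; in the second, command substitution assigns the value to the awk variable a before input.txt is read. With command1 and input.txt from the question, the second form would give:
$ awk '{print $0,a}' a="$(command1)" input.txt
A 4512438
B 4512438
C 4512438
D 4512438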

Sed/Awk: how to find and remove two lines if a pattern in the first line is being repeated; bash

I am processing text file(s) with thousands of records per file. Each record is made up of two lines: a header that starts with ">", followed by a line with a long string of the characters "-AGTCNR". The header has 10 fields separated by "|", whose first field is a unique identifier for each record, e.g. ">KEN096-15"; a record is termed a duplicate if it has the same identifier. Here is how a simple record looks:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTAGGAAATTGATTAGTACCTTTAATATT----CCGAAT---
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTTTCCT----TAAATAAT-----
Now I am trying to delete repeats, like the duplicate records of "ACRJP458-10" and "PMANL2431-12".
Using a bash script I have extracted the unique identifiers and stored the repeated ones in a variable "$duplicate_headers". Currently, I am trying to find any repeated instances of their two-line records and delete them as follows:
for i in "$@"
do
unset duplicate_headers
duplicate_headers=`grep ">" $1 | awk 'BEGIN { FS="|"}; {print $1 "\n"; }' | sort | uniq -d`
for header in `echo -e "${duplicate_headers}"`
do
sed -i "/^.*\b${header}\b.*$/,+1 2d" $i
#sed -i "s/^.*\b${header}\b.*$//,+1 2g" $i
#sed -i "/^.*\b${header}\b.*$/{$!N; s/.*//2g; }" $i
done
done
The final result (with thousands of records in mind) will look like:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
$ awk -F'[|]' 'NR%2{f=seen[$1]++} !f' file
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
To run it on multiple files at once, use this to remove duplicates across all files:
awk -F'[|]' 'FNR%2{f=seen[$1]++} !f' *
or this to only remove duplicates within each file:
awk -F'[|]' 'FNR==1{delete seen} FNR%2{f=seen[$1]++} !f' *
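For reference, a minimal commented expansion of the same one-liner (identical logic, just spread out):
awk -F'[|]' '
  FNR % 2 { f = seen[$1]++ }   # header (odd) lines: f is 0 the first time this identifier is seen
  !f                           # print the header and its following sequence line only while f is 0
' file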

Awk, order foreach 12 lines to insert query

I have the following script:
curl -s 'https://someonepage=5m' | jq '.[]|.[0],.[1],.[2],.[3],.[4],.[5],.[6],.[7],.[8],.[9],.[10],.[11],.[12]' | perl -p -e 's/\"//g' |awk '/^[0-9]/{print; if (++onr%12 == 0) print ""; }'
This is part of result:
1517773500000
0.10250100
0.10275700
0.10243500
0.10256600
257.26700000
1517773799999
26.38912220
1229
104.32200000
10.70579910
0
1517773800000
0.10256600
0.10268000
0.10231600
0.10243400
310.64600000
1517774099999
31.83806883
1452
129.70500000
13.29758266
0
1517774100000
0.10243400
0.10257500
0.10211800
0.10230000
359.06300000
1517774399999
36.73708621
1296
154.78500000
15.84041910
0
I want to insert this data into a MySQL database, and for each record I want a line like this:
(1517773800000,0.10256600,0.10268000,0.10231600,0.10243400,310.64600000,1517774099999,31.83806883,1452,129.70500000,13.29758266,0)
(1517774100000,0.10243400,0.10257500,0.10211800,0.10230000,359.06300000,151774399999,36.73708621,1296,154.78500000,15.84041910,0)
I need to merge the lines in groups of 12; can anyone help me get this result?
Here's an all-jq solution:
.[] | .[0:12] | @tsv | gsub("\t";",") | "(\(.))"
In the sample, all the subarrays have length 12, so you might be able to drop the .[0:12] part of the pipeline. If using jq 1.5 or later, you could use join(",") instead of the @tsv|gsub portion of the pipeline. You might, for example, want to consider:
.[] | join(",") | "(\(.))"   # jq 1.5 or later
Invocation: use the -r command-line option
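Put together with the curl command from the question, the invocation might look something like:
curl -s 'https://someonepage=5m' | jq -r '.[] | .[0:12] | @tsv | gsub("\t";",") | "(\(.))"'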
Sample output:
(1517627400000,0.10452300,0.10499000,0.10418200,0.10449400,819.50400000,1517627699999,85.57150693,2340,452.63400000,47.27213035,0)
(1517627700000,0.10435700,0.10449200,0.10366000,0.10370000,717.37000000,1517627999999,74.60582079,1996,321.25500000,33.42273846,0)
(1517628000000,0.10376600,0.10390000,0.10366000,0.10370400,519.59400000,1517628299999,53.88836170,1258,239.89300000,24.88613854,0)
$ awk 'BEGIN {RS=""; OFS=","} {$1=$1; $0="("$0")"}1' file
(1517773500000,0.10250100,0.10275700,0.10243500,0.10256600,257.26700000,1517773799999,26.38912220,1229,104.32200000,10.70579910,0)
(1517773800000,0.10256600,0.10268000,0.10231600,0.10243400,310.64600000,1517774099999,31.83806883,1452,129.70500000,13.29758266,0)
(1517774100000,0.10243400,0.10257500,0.10211800,0.10230000,359.06300000,1517774399999,36.73708621,1296,154.78500000,15.84041910,0)
RS="":
Treat groups of lines separated by one or more blank lines as a record
OFS=","
Set the output separator to be a ","
$1=$1
Reconstitute the line, replacing the input separators with the output separator
$0="("$0")"
Surround the record with parens
1
Print the record

awk: how to delete first and last value on entire column

I have data that is comprised of several columns. In one column I would like to delete the two commas located at the beginning and at the end of the field. My data looks something like this:
a ,3,4,3,2,
b ,3,4,5,1,
c ,1,5,2,4,5,
d ,3,6,24,62,3,54,
Can someone teach me how to delete the first and last commas on this data? I would appreciate it.
$ awk '{gsub(/^,|,$/,"",$NF)}1' file
a 3,4,3,2
b 3,4,5,1
c 1,5,2,4,5
d 3,6,24,62,3,54
awk '{sub(/,/,"",$0); print substr($0,1,length($0)-1)}' input.txt
Output:
a 3,4,3,2
b 3,4,5,1
c 1,5,2,4,5
d 3,6,24,62,3,54
You can do it with sed too:
sed -e 's/,//' -e 's/,$//' file
That says "substitute the first comma on the line with nothing" and then "substitute a comma followed by end of line with nothing".
If you want it to write a new file, do this:
sed -e 's/,//' -e 's/,$//' file > newfile.txt
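If you would rather edit the file in place instead of writing a new file, GNU sed supports -i (BSD/macOS sed needs an explicit empty suffix, -i ''):
sed -i -e 's/,//' -e 's/,$//' file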