I have a data set, created by a tool, in a file named test.deg. The file contents are as follows:
1 I0.XPDIN1 1.581e-01 1.507e-01 3.662e-04 3.891e-02
2 I0.XPXA1 1.577e-01 1.502e-01 3.653e-04 3.859e-02
3 I0.XPXA2 1.538e-01 1.444e-01 3.552e-04 3.471e-02
I have a second file, test.spf, containing the following information:
XPDIN1 XPDIN1#d XPDIN1#g XPDIN1#s VPP
XPXA1 XPXA1#d XPXA1#g XPXA1#s VPP
XPXA2 XPXA2#d XPXA2#g XPXA2#s VPP
I am trying to write an awk script that matches the instance name from test.deg to the instance name in test.spf. When the script sees a match, I would like the contents of the 5th column of test.deg appended to the end of the matching line in test.spf. Example output for I0.XPDIN1 in test.deg would be XPDIN1 XPDIN1#d XPDIN1#g XPDIN1#s VPP 3.662e-04
The script needs to match the instance name from test.deg (the part after the prefix I0.) against the first field of each line in test.spf, then add the 5th column's data.
Thanks,
Bad Awk
GNU Awk
$ awk 'FNR==NR{a[$2]=$5; next} ("I0."$1 in a){$6=a["I0."$1]}1' test.deg test.spf
XPDIN1 XPDIN1#d XPDIN1#g XPDIN1#s VPP 3.662e-04
XPXA1 XPXA1#d XPXA1#g XPXA1#s VPP 3.653e-04
XPXA2 XPXA2#d XPXA2#g XPXA2#s VPP 3.552e-04
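If the hierarchy prefix is not always I0., one possible variation is to strip everything up to the last dot while loading test.deg, so that the stored keys match the first field of test.spf directly. This is only a sketch using the sample data from the question; it assumes that instance names themselves never contain a dot.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/test.deg" <<'EOF'
1 I0.XPDIN1 1.581e-01 1.507e-01 3.662e-04 3.891e-02
2 I0.XPXA1 1.577e-01 1.502e-01 3.653e-04 3.859e-02
3 I0.XPXA2 1.538e-01 1.444e-01 3.552e-04 3.471e-02
EOF
cat > "$tmp/test.spf" <<'EOF'
XPDIN1 XPDIN1#d XPDIN1#g XPDIN1#s VPP
XPXA1 XPXA1#d XPXA1#g XPXA1#s VPP
XPXA2 XPXA2#d XPXA2#g XPXA2#s VPP
EOF

result=$(awk '
  # First file (test.deg): key = instance name without its prefix.
  FNR == NR {
    key = $2
    sub(/^.*\./, "", key)   # assumption: drop everything up to the last dot
    val[key] = $5
    next
  }
  # Second file (test.spf): append the stored value when the name is known.
  ($1 in val) { $0 = $0 " " val[$1] }
  1
' "$tmp/test.deg" "$tmp/test.spf")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Lines of test.spf whose instance name does not appear in test.deg are printed unchanged.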
I need help with a simple dash script solution. A script reading the values of file: "Install.txt", (sample content):
TRUE 203
TRUE 301
TRUE 602
TRUE 603
The numbers will correspond with the same number (at the end of line) in file: "ExtraExt.sys", (sample content):
# Read $[EXTRA_DIR]/diaryThumbPlace #202
# Read $[CORE_DIR]/myBorderStyle #203
# Read $[EXTRA_DIR]/mMenu #301
# Read $[EXTRA_DIR]/dDecor #501
# Read $[EXTRA_DIR]/controlPg #601
# Read $[EXTRA_DIR]/DashToDock #602
# Read $[EXTRA_DIR]/deskSwitch #603
All lines are tagged (#). The script will untag each line whose number matches a number in the file "Install.txt". For example, TRUE 203 will untag the line ending with #203
# Read $[CORE_DIR]/myBorderStyle #203
to (without "#" before Read)
Read $[CORE_DIR]/myBorderStyle #203
I've searched for awk/sed solution, but this requires a loop to go through the numbers in Install.txt.
Any help is appreciated. Thank you.
you can try,
# store "203|301|602|603" in search variable
search=$(awk 'BEGIN{OFS=ORS=""}{if(NR>1){print "|"}print $2;}' Install.txt)
sed -r "s/^# (.*#($search))$/\1/g" ExtraExt.sys
you get,
# Read $[EXTRA_DIR]/diaryThumbPlace #202
Read $[CORE_DIR]/myBorderStyle #203
Read $[EXTRA_DIR]/mMenu #301
# Read $[EXTRA_DIR]/dDecor #501
# Read $[EXTRA_DIR]/controlPg #601
Read $[EXTRA_DIR]/DashToDock #602
Read $[EXTRA_DIR]/deskSwitch #603
or, add the -i option to edit the file in place,
sed -i -r "s/^# (.*#($search))$/\1/g" ExtraExt.sys
One option using awk is to read the first file, Install.txt, and store the values of the second field as keys in arr.
While reading the second file, ExtraExt.sys, take the last field using $NF and match it against the pattern ^#[0-9]+$
If there is a match, strip the # from the matched text, leaving only the number, and check whether that number is in arr.
If it is, print the current line without the leading #
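awk '
FNR==NR{
    arr[$2]    # remember each number from Install.txt
    next
}
match ($NF, /^#[0-9]+$/) {
    # RLENGTH includes the leading #, so the number itself is RLENGTH-1 long
    if (substr($NF, RSTART+1, RLENGTH-1) in arr) {
        sub(/^#[[:space:]]+/, "", $0); print $0
    }
}
' Install.txt ExtraExt.sys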
Output
Read $[CORE_DIR]/myBorderStyle #203
Read $[EXTRA_DIR]/mMenu #301
Read $[EXTRA_DIR]/DashToDock #602
Read $[EXTRA_DIR]/deskSwitch #603
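The awk script above prints only the lines it untags. If the whole file should be written out, with the remaining lines left tagged (as in the sed answer), a variant along the following lines might work. Treat it as a sketch checked only against the sample data; the test for TRUE in the first column is an assumption about how Install.txt is meant to be read.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/Install.txt" <<'EOF'
TRUE 203
TRUE 301
TRUE 602
TRUE 603
EOF
cat > "$tmp/ExtraExt.sys" <<'EOF'
# Read $[EXTRA_DIR]/diaryThumbPlace #202
# Read $[CORE_DIR]/myBorderStyle #203
# Read $[EXTRA_DIR]/mMenu #301
# Read $[EXTRA_DIR]/dDecor #501
# Read $[EXTRA_DIR]/controlPg #601
# Read $[EXTRA_DIR]/DashToDock #602
# Read $[EXTRA_DIR]/deskSwitch #603
EOF

result=$(awk '
  # First file: store each wanted number with its leading "#".
  # Assumption: only rows marked TRUE should untag a line.
  FNR == NR { if ($1 == "TRUE") wanted["#" $2]; next }
  # Second file: drop the leading "# " when the last field is wanted.
  ($NF in wanted) { sub(/^#[[:space:]]+/, "") }
  { print }   # print every line, changed or not
' "$tmp/Install.txt" "$tmp/ExtraExt.sys")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Redirecting the output to a temporary file and moving it over ExtraExt.sys would give the same effect as sed -i.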
I have the following .txt file:
##fileformat=VCFv4.0
##FILTER=<ID=PASS,Description="All filters passed">
##Tassel=<ID=GenotypeTable,Version=5,Description="Reference allele is not known. The major allele was used as reference allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the reference and alternate alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=G,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##bcftools_viewVersion=1.12-57-g0c2765b+htslib-1.12-45-g1830551
##bcftools_viewCommand=view -h 20Perc.SNPs.mergedAll.vcf; Date=Tue Sep 28 09:46:59 2021
I would like to make two edits to this text. First, in the line:
##FORMAT=<ID=PL,Number=.,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">
I would like to replace "Number=." with "Number=G"
And immediately after the line:
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
I would like to add a new line of text (and a line break):
##INFO=<ID=QualityScore,Number=.,Type=Float,Description="Quality score">
I was wondering if this could be done with one or two awk commands.
Thanks for any suggestions!
My solution is similar to @Daweo's. Consider this script, replace.awk:
/^##FORMAT=<ID=PL,/ { sub(/Number=\./, "Number=G") }   # PL line only; other FORMAT lines (e.g. AD) keep Number=.
/##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">/ {
    print
    print "##INFO=<ID=QualityScore,Number=.,Type=Float,Description=\"Quality score\">"
    next
}
1
Run it:
awk -f replace.awk file.txt
Notes
The first line is a straightforward substitution: on lines that match the pattern, sub() replaces Number=. with Number=G
The next group of lines deals with your second requirement. First, the print statement prints out the current line
The second print statement prints out your new line
The next command skips to the next input line, so the current line is not printed twice
Finally, the pattern 1 tells awk to print every other line
I would use GNU AWK in the following way. Let the content of file.txt be
##fileformat=VCFv4.0
##FILTER=<ID=PASS,Description="All filters passed">
##Tassel=<ID=GenotypeTable,Version=5,Description="Reference allele is not known. The major allele was used as reference allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the reference and alternate alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=G,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##bcftools_viewVersion=1.12-57-g0c2765b+htslib-1.12-45-g1830551
##bcftools_viewCommand=view -h 20Perc.SNPs.mergedAll.vcf; Date=Tue Sep 28 09:46:59 2021
then
awk '/##FORMAT=<ID=PL/{gsub("Number=\\.","Number=G")}/##INFO=<ID=AF/{print;print "##INFO=<ID=QualityScore,Number=.,Type=Float,Description=\x22Quality score\x22>";next}{print}' file.txt
output
##fileformat=VCFv4.0
##FILTER=<ID=PASS,Description="All filters passed">
##Tassel=<ID=GenotypeTable,Version=5,Description="Reference allele is not known. The major allele was used as reference allele">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the reference and alternate alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=G,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=QualityScore,Number=.,Type=Float,Description="Quality score">
##bcftools_viewVersion=1.12-57-g0c2765b+htslib-1.12-45-g1830551
##bcftools_viewCommand=view -h 20Perc.SNPs.mergedAll.vcf; Date=Tue Sep 28 09:46:59 2021
Explanation: If the current line contains ##FORMAT=<ID=PL, replace Number=. with Number=G. The pattern is written as the string "Number=\\." because the backslashes are required to match a literal . rather than . meaning any character. If the current line contains ##INFO=<ID=AF, print it, then print ##INFO=<ID=QualityScore,Number=.,Type=Float,Description="Quality score"> and go to the next line. Here \x22 is the hex escape code for "; a bare " cannot be used inside a "-delimited string. The final {print} handles all lines except those containing ##INFO=<ID=AF, as these have already been printed.
(tested in gawk 4.2.1)
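The sample input shown above already has Number=G on the PL line, so the substitution leaves no visible trace in the output. The following sketch runs the same two edits on a shortened header in which the PL line still has Number=. (the Description texts are abbreviated for the example and are not the real ones). Printing the current line before the extra line avoids the need for next.

```shell
tmp=$(mktemp -d)
# Shortened, partly made-up header: descriptions are abbreviated on purpose.
cat > "$tmp/file.txt" <<'EOF'
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths">
##FORMAT=<ID=PL,Number=.,Type=Float,Description="Likelihoods">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##bcftools_viewVersion=1.12-57-g0c2765b+htslib-1.12-45-g1830551
EOF

result=$(awk '
  # Edit 1: change Number=. to Number=G on the PL line only.
  /^##FORMAT=<ID=PL,/ { sub(/Number=\./, "Number=G") }
  # Print every line (after any substitution).
  { print }
  # Edit 2: right after the AF line, emit the new INFO line.
  /^##INFO=<ID=AF,/ {
    print "##INFO=<ID=QualityScore,Number=.,Type=Float,Description=\"Quality score\">"
  }
' "$tmp/file.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The AD line keeps its Number=. because the first pattern is anchored to ID=PL.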
I want to replace strings in a target file (target.txt) by strings in a lookup table (lookup.tab), which looks as follows.
Seq_1 Name_one
Seq_2 Name_two
Seq_3 Name_three
...
Seq_10 Name_ten
Seq_11 Name_eleven
Seq_12 Name_twelve
The target.txt file is a large file with a tree structure (Nexus format). It is not arranged in columns.
Therefore I use the following command:
awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' "lookup.tab" "target.txt"
Unfortunately, this command does not take the full length of the elements from the first column, so that Seq_1, Seq_10, Seq_11, Seq_12 end up as Name_one, Name_one0, Name_one1, Name_one2 etc...
How can the awk command be made more specific to correctly substitute the strings?
Try this please, see if it meets your need:
awk 'FNR==NR { le=length($1); a[le][$1]=$2; if (maxL<le) maxL=le; next } { for(le=maxL;le>0;le--) if(length(a[le])) for (i in a[le]) gsub(i, a[le][i]) }1' "lookup.tab" "target.txt"
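It's based on your own attempt, but instead of replacing in the arbitrary order in which awk walks the array, it replaces using the longer keys first, so Seq_10 is substituted before Seq_1 can match inside it. Note that it needs GNU awk, because it uses arrays of arrays.
Done this way, and based on your examples, I think it's enough to avoid the wrong substitutions.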
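Another possible approach, which also runs in non-GNU awk, is to scan each line for whole identifiers and look each one up, so that a short key can never match inside a longer one. The sketch below assumes every key has the form Seq_ followed by digits, which holds for the lookup table shown in the question but may not hold for other data; the target line is made up for the example.

```shell
tmp=$(mktemp -d)
cat > "$tmp/lookup.tab" <<'EOF'
Seq_1 Name_one
Seq_10 Name_ten
Seq_11 Name_eleven
EOF
# Hypothetical Nexus-like line, invented for this example.
cat > "$tmp/target.txt" <<'EOF'
tree t1 = ((Seq_1:0.1,Seq_10:0.2),Seq_11:0.3,Seq_99:0.4);
EOF

result=$(awk '
  # First file: lookup table, key -> replacement name.
  FNR == NR { name[$1] = $2; next }
  {
    out = ""
    rest = $0
    # match() finds the leftmost, longest identifier, so Seq_10 is
    # taken as a whole token and never as Seq_1 followed by 0.
    while (match(rest, /Seq_[0-9]+/)) {
      token = substr(rest, RSTART, RLENGTH)
      out = out substr(rest, 1, RSTART - 1) ((token in name) ? name[token] : token)
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }
' "$tmp/lookup.tab" "$tmp/target.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Identifiers without an entry in the table (Seq_99 here) are left as they are, and the keys are used as plain strings rather than as regular expressions.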
I have a text file named stat.txt which contains lines, each in the format
<User Name>-<IP>-<File Name>-<Size>. Each line contains a user name, an IP address, a file name and a download file size. I need to create a script, userstat.awk, which produces the following data when the corresponding command is given:
userstat.awk u - will list all files
userstat.awk total - will list the total size of all files
So far, I have tried to list all the files for a user using the default commands, but I can't get it to work.
Given stat.txt:
user-1.1.1.1-file.jpg-20
root-1.1.1.1-file.jpg-20
user-1.1.1.1-img.jpg-20
root-1.1.1.1-thing.jpg-20
You could use the command (improved by @ClasesWikner):
awk -F- '{print $3; s+=$4}END {print "total: " s}' stat.txt
To output:
file.jpg
file.jpg
img.jpg
thing.jpg
total: 80
As mentioned by @Scheff, this will not work when user names or file names contain a -, because the line is then split into more than four fields.
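One way to cope with a - inside user or file names might be to locate the IP address first and split around it, instead of relying on -F-. The sketch below assumes the IP is always an IPv4 dotted quad and that the size is whatever follows the last -; the first sample line is invented to exercise that case.

```shell
tmp=$(mktemp -d)
# Sample data; the first line is a made-up case with "-" in both names.
cat > "$tmp/stat.txt" <<'EOF'
my-user-1.1.1.1-my-file.jpg-20
root-1.1.1.1-file.jpg-20
user-1.1.1.1-img.jpg-20
EOF

result=$(awk '
  # Find the "-<IP>-" separator (assumption: IPv4 dotted quad).
  match($0, /-[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+-/) {
    user = substr($0, 1, RSTART - 1)       # text before the IP
    rest = substr($0, RSTART + RLENGTH)    # "<File Name>-<Size>"
    size = rest
    sub(/^.*-/, "", size)                  # keep only what follows the last "-"
    file = substr(rest, 1, length(rest) - length(size) - 1)
    print user ": " file
    total += size
  }
  END { print "total: " total }
' "$tmp/stat.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

A file name that itself contains something shaped like -1.2.3.4- after a user name without an IP would still confuse this, so it is a heuristic rather than a full parser.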