awk dictionary not storing value [closed] - awk

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
I have written an awk statement, and the "a" dictionary is not storing values properly for an unknown reason.
I have the following file:
cat lookup.txt
1 a
2 b
3 c
However when I write the following awk statement the "a" dictionary seems not to be storing values properly:
awk 'NR==FNR{a[$1]=$2;print a[$2];print $2}' lookup.txt
a
b
c
I would expect that statement to print as follows:
a
a
b
b
c
c
Any help would be greatly appreciated.

You store a[$1] but fetch a[$2]. Once the whole file has been read, the array a holds three elements: a["1"] = "a", a["2"] = "b", and a["3"] = "c". There are no elements a["a"], a["b"] or a["c"], so a[$2] evaluates to the empty string and print a[$2] outputs an empty line. You probably want to print a[$1] instead.
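As a minimal sketch of that fix, the command below stores and fetches with the same key. It feeds the sample data through a pipe instead of lookup.txt so that it is self-contained, and the variable name out is only there for illustration. The NR==FNR condition is dropped because it is always true when a single input is read.

```shell
# Sample data from the question, piped in instead of read from lookup.txt.
out=$(printf '1 a\n2 b\n3 c\n' | awk '{
    a[$1] = $2      # store the value under the key from column 1
    print a[$1]     # fetch it back with the same key
    print $2
}')
printf '%s\n' "$out"
```

Each value is now printed twice, once from the array and once from the field.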

Related

Quantitatively replace digit (as counter) with string in sed [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 months ago.
Let's say I have the following file:
balloons:
- 2
- 3
Each number above represents how many times I want to print the string. For example, I would like to process this to produce the following output:
balloons:
- red
- red
- blue
- blue
- blue
I only have red and blue balloons. The digits will vary from one file to another, so my search string would be a simple regex search sed -e "/[[:digit:]]\+/ perform_my_action"
Try:
awk 'BEGIN{idx[2]="red"; idx[3]="blue"}
/^-[ \t]+[0-9]+/{for(i=1;i<=$2;i++) print idx[$2]; next}
1
' file
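The answer above looks the color up by the count itself (2 gives red, 3 gives blue), so it only fits this exact file, and it prints the color without the leading "- ". A hedged alternative, assuming the first numeric entry is always red and the second always blue, picks the color by position instead. The array name color and the counter k are illustrative names, not part of the original answer.

```shell
out=$(printf 'balloons:\n- 2\n- 3\n' | awk '
BEGIN { split("red blue", color, " ") }     # assumed order: 1st entry red, 2nd blue
/^-[ \t]+[0-9]+[ \t]*$/ {
    k++                                     # which numeric entry this is
    for (i = 1; i <= $2; i++)               # $2 is the repeat count
        print "- " color[k]
    next
}
{ print }                                   # pass every other line through unchanged
')
printf '%s\n' "$out"
```

With this variant the digits can change from file to file without touching the script.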

Return not so similar codes from a single group [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 months ago.
I have a list of product codes grouped in 2 or 3 lines. I need to return the groups whose codes are neither the same nor consecutive.
9003103
9003103

9003978
9003979

9003763
9003728

9003543
9003543
9003543
In this case, only the third group should be returned:
9003763
9003728
I would use GNU AWK for this task in the following way. Let the content of file.txt be (groups separated by blank lines)
9003103
9003103

9003978
9003979

9003763
9003728

9003543
9003543
9003543
then
awk 'BEGIN{RS=""}{diff=$NF-$1;diff=diff>0?diff:-diff}diff>NF' file.txt
gives output
9003763
9003728
Explanation: I set RS to the empty string to enable paragraph mode, so every blank-line-separated block is treated as a single record and each code in it becomes a field. For each block I compute the absolute difference between the first and last field; if that difference is bigger than the number of fields, the block is printed.
(tested in GNU Awk 5.0.1)
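To make the comparison visible, here is a small sketch that prints the field count and the absolute difference for each block rather than the block itself. It assumes the groups are separated by blank lines, which paragraph mode requires; the "block N:" report format is made up for illustration.

```shell
out=$(printf '9003103\n9003103\n\n9003978\n9003979\n\n9003763\n9003728\n\n9003543\n9003543\n9003543\n' | awk '
BEGIN { RS = "" }                        # paragraph mode: one record per block
{
    diff = $NF - $1                      # last code minus first code
    if (diff < 0) diff = -diff           # absolute value
    verdict = (diff > NF) ? "print" : "skip"
    printf "block %d: fields=%d diff=%d -> %s\n", NR, NF, diff, verdict
}')
printf '%s\n' "$out"
```

Only the third block has a difference (35) larger than its field count (2), which is why it is the one the original command prints.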

Returning the position of pattern matches in a text file with multiple lines [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have a long text file with the following format:
>foo_bar
TATGTTCTGCAACTGTATAATGGTATAAAAACATTGCAAAATGTAATGAAACTTGTTATTTTGTGAAATACATTCTATAAATATCACTATTTCATGAAAA
ATATTGAAAATCATTTATTTTCGACAAGTAGAACCATAGGTTCTGTAATTGTAAATAGTTCTGCAAACTTAACCTGTTTTGCAGAAGAATATGTTTTCAC
TAGTTAACTTGTAGAATGTTTAGGATTGTTAAAATTTTTAACAAAATAAGATTTTATAGAACATGATTTGCAAAATAACACATTTTGCAATATTTTTATA
CCATATATAGTTGCAGAACATATGGGGACTACGGGCAGCCGGTAAATATGTGGACTACATGGAACTTGTTCAGATACATCTGGAGCAAAGAGCCACCGCT
CTAAATTATCTCTTCTCATTTCCAGTATTATATCTCTCATGCTAAATTATCTCTACAAATCATGACCTCTCTTAGCAATCTCCCTGAGCATCTCCGTAGG
GAGCAGATATTCACCCGTCTTCCGATGAAAGACCTAATGGTCCTCGCATCTGCAAGTCATGTCTTGCGTTAATCTTTCTCTCTCTTTTTGTGGAATCCCA
TCTCTCCTCTTATCAACTAAACCAGATACAGTTTGCACCAACTTTCTTCACTCCCCTGTTACATGAGAAGGCCAGACTTAGGTAGCTTCTGAATCAGAAC
CCGGTCATTCCAAGCATGGGATTTCTTGTTGATCTCTTGTTTTTATGTAATAGTGATCATTTGATATCTGGTGTTGATGGGAATTCAGATGTATGGGACT
TTGTTTATTGTTGATGTGGAATTCTTATATTTTACTGTGTACTATAAAATTTTAGTGATACCTACTATCTATTGTATAAATTGATTAATTGATGTTCTTA
>bar_foo
TATGTTCTGCAACTGTATAATGGTATAAAAACATTGCAAAATGTAATGAAACTTGTTATTTTGTGAAATACATTCTATAAATATCACTATTTCATGAAAA
ATATTGAAAATCATTTATTTTCGACAAGTAGAACCATAGGTTCTGTAATTGTAAATAGTTCTGCAAACTTAACCTGTTTTGCAGAAGAATATGTTTTCAC
TAGTTAACTTGTAGAATGTTTAGGATTGTTAAAATTTTTAACAAAATAAGATTTTATAGAACATGATTTGCAAAATAACACATTTTGCAATATTTTTATA
CCATATATAGTTGCAGAACATATGGGGACTACGGTACTACGGTAAATATGTGGACTACATGGAACTTGTTCAGATACATCTGGAGCAAAGAGCCACCGCT
CTAAATTATCTCTTCTCATTTCCAGCTGCATATCTCTCATGCTAAATTATCTCTACAAATCATGACCTCTCTTAGCAATCTCCCTGAGCATCTCCGTAGG
GAGCAGATATTCACCCGTCTTCCGATGAAAGACCTAATGGTCCTCGCATCTGCAAGTCATGTCTTGCGTTAATCTTTCTCTCTCTTTTTGTGGAATCCCA
TCTCTCCTCTTATCAACTAAACCAGATACAGTTTGCACCAACTTTCTTCACTCCCCTGTTACATGAGAAGGCCAGACTTAGGTAGCTTCTGAATCAGAAC
CCGGTCATTCCAAGCATGGGATTTCTTGTTGATCTCTTGTTTTTATGTAATAGTGATCATTTGATATCTGGTGTTGATGGGAATTCAGATGTATGGGACT
TTGTTTATTGTTGATGTGGAATTCTTATATTTTACTGTGTACTATAAAATTTTAGTGATACCTACTATCTATTGTATAAATTGATTAATTGATGTTCTTA
I.e., there is a header line which begins with a ">", and then an arbitrary number of lines with no more than 100 letters in them. I would like to find the positions within the non-header lines that match either "GCAGC" or "GCTGC". Overlapping match sites would both get recorded individually.
An example output would be a three column text file where the first column contained the header line for that block of text minus the ">", the second column contained the start position of a pattern match (i.e., the number of characters into the text block, excluding line-break characters), and the third column recorded which of the two patterns were matched. E.g.:
foo_bar 109 GCAGC
bar_foo 58289 GCTGC
Not sure how complex this task is, and in particular whether there is a memory-efficient way to perform this operation in a streaming fashion. awk or sed seem like two utilities which might work, but the required command is beyond my limited understanding of the programs.
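A tiny tweak on yesterday's answer:
# Header line: strip the leading ">" and remember the name.
sub(/^>/,"") {
    hdr = $0
    next
}
# Sequence line: report every match; RSTART is the position within this line.
{
    while ( match($0,/GC[AT]GC/) ) {
        print hdr, RSTART, substr($0,RSTART,RLENGTH)
        # Blank out the first matched character so overlapping matches are still found.
        $0 = substr($0,1,RSTART-1) " " substr($0,RSTART+1)
    }
}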
Please get the book Effective AWK Programming, 5th Edition, by Arnold Robbins to learn the basics of awk.
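The script above reports positions relative to each line. If positions should instead count characters from the start of the block, as the question describes, one possible variant keeps a running offset and carries the last four characters of the previous line forward so that matches spanning a line break are also seen. This is a sketch under that reading of the question, not the answerer's method; the names off, carry, full, buf and cut are illustrative.

```shell
out=$(printf '>foo\nAAGCA\nGCTGCAGC\n' | awk '
sub(/^>/, "") { hdr = $0; off = 0; carry = ""; next }   # new block: reset state
{
    full = carry $0                  # previous tail + current line
    buf  = full
    base = off - length(carry)       # block characters that precede buf
    cut  = 0                         # characters already trimmed from buf
    while (match(buf, /GC[AT]GC/)) {
        print hdr, base + cut + RSTART, substr(buf, RSTART, RLENGTH)
        cut += RSTART
        buf  = substr(buf, RSTART + 1)   # step one character past the match start
    }
    off  += length($0)
    # keep at most 4 characters: shorter than a 5-character match, so nothing is reported twice
    carry = (length(full) > 4) ? substr(full, length(full) - 3) : full
}')
printf '%s\n' "$out"
```

Only one line plus four characters is held in memory at a time, so the input is still processed as a stream.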

Indexing words in a file according to their line with AWK [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
Suppose I have a file similar to the following:
hello
hello
hi
hi
hello
hey
I would like to list, for every unique line, the line numbers where it occurs, separated by commas. Ideally, the output would look like this:
hello 1,2,5
hi 3,4
hey 6
So far I can only list the distinct lines, using the following code:
{ arr[$0]++ }
END {
    for (i in arr) {
        print i
    }
}
The result is:
hey
hi
hello
Try this script:
{
    # append the current line number to the comma-separated list kept for this line's text
    words[$0] = words[$0] == "" ? FNR : words[$0] "," FNR
}
END {   # runs once the whole file has been read
    for (w in words)    # the traversal order of "for (w in ...)" is unspecified and varies between awk implementations
    {
        print w, words[w]   # the text followed by its line numbers
    }
}
Please see Controlling Array Traversal for more details on how the words are going to be printed out and how to control it. For more details on FNR see What are NR and FNR.

Syntax error in range function declaration [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 2 years ago.
Error in range function syntax
for x in range(20),
if x % 2 == 0
print x
else
print 'odd',
output:
File "<ipython-input-106-a3bbe30e4016>", line 1
for x in range(20),
^
SyntaxError: invalid syntax
Replace the comma at the end of the first line with a colon (:).
Then, in Python 3.x, print is a function, so write print("toto") instead of print "toto".
Finally, the if and else lines also need to end with a colon (just like a for or a while loop), and the body of each must be indented:
for x in range(20):
    if x % 2 == 0:
        print(x)
    else:
        print('odd')