Print next word after pattern match for two strings in same line for one file in se [closed] - awk

I have a file with several different lines and want to achieve the output below: print the word that comes after aaaa and the word that comes after test, with , as the delimiter.
Input is
Line Aaaa orange test match
Colour Aaaa banana test sun
Ball Aaaa guava test Saturday
Basket Aaaa tomato test sunset
Output has to be
Orange ,match
Banana ,sun
Guava, Saturday
Tomato,sunset
Could anyone please help with this?
I have tried using sed and grep commands but I didn't get the expected output.

perl, using look-behinds:
perl -nE 'if (/(?<=aaaa )(\w+).*(?<=test )(\w+)/i) {say "$1,$2"}' file

awk solution
Assuming you want to ignore lines where "Aaaa" does not occur, the following awk process should achieve your needs (if the relative positions of "Aaaa" and "test" are constant - see edit below):
awk '{for(i=1;i<NF;i++) if ($i=="Aaaa") {print $(i+1) ", " $(i+3); next}}' file
explanation
Each line of file is processed (by default) as white-space separated fields. A loop examines each field for the required pattern ("Aaaa") and (if found) prints the values of the next field, the required comma, and the value of the final required field.
Edit
For cases where the position of "test" may also vary, providing "test" is never before "Aaaa", the following procedure should work:
awk '{for(i=1;i<NF;i++) {if ($i=="Aaaa") {line= $(i+1) ", "; } if ($i=="test") {line=line $(i+1); print line;next}}}' file
It will need reworking if "test" can come before "Aaaa" as follows:
awk '{partA=partB=""; for(i=1;i<NF;i++) {if($i=="Aaaa") partA=$(i+1); if($i=="test") partB=$(i+1); if(partB && partA) {print partA ", " partB; next}}}' file
This version requires both "Aaaa" and "test" are present but they can be in any order on the line.
Each version was tested on this file:
Line Aaaa orange test match
Colour Aaaa banana test sun
Ball Aaaa guava test Saturday
Basket Aaaa tomato test sunset
output (for all three versions)
orange, match
banana, sun
guava, Saturday
tomato, sunset
(using GNU Awk 5.1.0)
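Since the question mentions trying sed, a sed sketch is also possible; it assumes every line contains "Aaaa" and then "test" in that order (with the capitalisation shown in the sample input) and that the wanted words contain no spaces:
sed -E 's/.*Aaaa ([^ ]+).*test ([^ ]+).*/\1,\2/' file
Lines that do not match the pattern would pass through unchanged, which may or may not be what you want.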

how to remove part of the string if the condition exists [closed]

I have a file similar to this:
A*01:03:05
B1*02:06:08
F2*03:01:06
R5*02:01
S1*02:08
And I would like to remove the last 2 numbers and the colon, only when there are 2 colon separators, so it will be:
A*01:03
B1*02:06
F2*03:01
R5*02:01
S1*02:08
The last 2 lines remain unchanged because they do not have 2 colon separators after the *, so no changes are made to those values.
I used sed and gsub to remove everything after the last underscore, but was not sure how to add a condition that skips the change when there are not 2 colons after the *.
This might work for you (GNU sed):
sed 's/:..//2' file
This removes the second occurrence of a : followed by 2 characters.
If this is too lax, use:
sed -E 's/^([^:]*:[^:]*):[0-9]{2}/\1/' file
With cut, you can set : as the delimiter and print only up to the first two fields:
cut -d: -f-2 ip.txt
Similar logic can be done with awk, assuming the implementation supports manipulating NF
awk 'BEGIN{FS=OFS=":"} NF==3{NF=2} 1' ip.txt
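As noted above, manipulating NF is implementation-dependent; for an awk where assigning NF=2 does not rebuild the record, a sketch that rebuilds $0 explicitly from the first two fields:
awk 'BEGIN{FS=OFS=":"} NF==3{$0=$1 OFS $2} 1' ip.txt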
This works:
$ sed -E 's/^([^:]*:[^:]*):[0-9][0-9]$/\1/' file
The [^:] means 'any character other than a :', so the trailing :NN is deleted only when it follows exactly one earlier colon, i.e. when the line has two colons in total.
This awk works too:
$ awk 'gsub(/:/,":")==2 {sub(/:[0-9][0-9]$/,"")} 1' file
In this case, gsub returns the number of replacements made. So if there are two colons, delete the ending.
You can also use GNU grep (with PCRE) to only match the template of what you are looking for:
$ grep -oP '^\w+\*\d\d:\d\d' file
Or perl same way:
$ perl -lnE 'say "$1" if /(^\w+\*\d\d:\d\d)/' file

Combine 2 Conditions of AWK in 1 set of forward slashs [closed]

I have a file where I need only the 18th column, and that 18th column must not contain any of around 30 words like
AAA, BBB, CCC, etc.
Sample file
$ cat a.csv
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,Aaa
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,BBB
awk -F, '!($18 ~ /AAA/) && !($18 ~ /BBB/) {print $18 }'
Is it possible to write something like
awk -F, '!($18 ~ /AAA, BBB /) {print $18 }'
EDIT
If I use
i=$("AAA|BBB")
awk -F, '!($18 ~ /$i/) {print $18 }'
it produces a "command not found" error.
You could use the alternation operator | and use something like
awk -F',' '$18 !~ /AAA|BBB|CCC/ {print $18}' a.csv
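Regarding the edit in the question: i=$("AAA|BBB") is command substitution, so the shell tries to run AAA|BBB as a command (hence "command not found"), and $i is never expanded inside the single-quoted awk program anyway. A sketch that passes the pattern in from the shell with awk's -v option (the variable names here are just examples):
pattern='AAA|BBB|CCC'
awk -F, -v pat="$pattern" '$18 !~ pat {print $18}' a.csv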
If you want to simply strip lines where a field is one of a set of blacklist entries, you can create the blacklist once in the BEGIN section, then use a ~ (or !~) match to check whether that blacklist contains your field.
Perhaps the easiest way to do this is to construct the blacklist using the input field separator (so you know it won't be part of the field). With an input.csv file of:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,AAA
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,BBB
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,CCC
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,DDD
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,EEE
Let's say you don't want lines where field 18 is AAA, BBB or DDD:
pax> awk -F, 'BEGIN{ss=",AAA,BBB,DDD,"}ss!~","$18","{print $18}' input.csv
CCC
EEE
Below, we'll break down how it works:
BEGIN {
ss=",AAA,BBB,DDD," # This is the blacklist, note IFS separator and start/end.
}
ss !~ ","$18"," { # If ",<column 18>," not in blacklist, print.
print $18
}
The trick is to create a string which is the column we're checking surrounded by the delimiters (which cannot be in the column). If we find that in the blacklist (which is every unwanted item surrounded by the delimiter), we can discard it.
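An alternative to wrapping the field in delimiters is to hold each blacklisted value as an array key and test membership with awk's in operator, which avoids any possibility of a partial match; a sketch with the same fixed blacklist:
awk -F, 'BEGIN{n=split("AAA,BBB,DDD",t,","); for(i=1;i<=n;i++) bl[t[i]]} !($18 in bl){print $18}' input.csv
With the input.csv above this should print just CCC and EEE.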
Note that you're not restricted to a fixed blacklist (either in your string or if you decide to use a regex solution); you can, if you wish, read the entries from a file and construct the list dynamically. For example, consider the file blacklist.txt:
AAA
BBB
DDD
and the input.csv file shown above. The following awk command can dynamically create the blacklist from that file:
pax> awk -F, 'BEGIN{ss=","}NR==FNR{ss=ss""$1",";next}ss!~","$18","{print}' blacklist.txt input.csv
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,CCC
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,EEE
Again, breaking it down:
BEGIN {
ss = "," # Start blacklist.
}
NR==FNR { # Only true for first file in list (blacklist).
ss = ss""$1"," # Extend blacklist.
next # Go get next line.
}
ss !~ ","$18"," { # Only get here for second file (input).
print
}
Here, we process the first file to construct the blacklist (rather than having a fixed one). Lines in the second file are treated as per my original script above.
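The file-driven blacklist can likewise be stored as array keys rather than one long string, a sketch assuming the same blacklist.txt and input.csv:
awk -F, 'NR==FNR{bl[$1]; next} !($18 in bl)' blacklist.txt input.csv
The bare !($18 in bl) pattern prints the whole line by default, so this should produce the same CCC and EEE lines as the version above.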

Using awk on a folder and adding file name to output rows

I should start by thanking you all for all the work you put into the answers on this site. I have spent many hours reading through them but have not found anything fitting my question yet. Hence my own post.
I have a folder with multiple subfolders and txt-files within those. In column 7 of those files, there are gene names (I do genetics for a living :)). These are the strings I am trying to extract. In short, I would like to search the whole folder for any rows within any of the files that contain a particular gene name/string. I have been using grep for this, writing something like:
grep -r GENE . > GENE.txt
Simple, but I need to be able to tweak the search further, and it seems that awk is the way to go.
So I tried using awk. I wrote something like this:
awk '$7 == "GENENAME"' FOLDER/* > GENENAME.txt
This works well (and now I can specify that the string has to be in a particular column, which I cannot do with grep, right?).
However, in contrast to grep, which writes the file name at the start of each row, I now cannot directly see which file each row in my output file comes from (which mostly defeats the point of the search). Adding the name of the origin file somewhere to each row seems like something that should absolutely be doable, but I am not able to figure it out.
The files I am searching within change (or rather get more numerous), but otherwise my search will always be for some specific string in column 7 of the same big folder. How can I get this working?
Thank you in advance,
Elisabet E
You can use FNR (the record number within the current file) to print the row number and FILENAME to print the file's name; that way you can see which file and which row each matching line comes from. For instance:
sample.csv:
aaa 123
bbb 456
aaa 789
command:
awk '$1 =="aaa"{print $0, FNR, FILENAME}' sample.csv
The output is:
aaa 123 1 sample.csv
aaa 789 3 sample.csv
Sounds like you're looking for:
awk '$7 == "GENENAME"{print FILENAME, $0}' FOLDER/*
If not then edit your question to clarify with sample input and expected output.
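Since the question describes subfolders (and used grep -r), a recursive variant is worth sketching; it assumes the data files end in .txt, which the question does not actually state:
find FOLDER -type f -name '*.txt' -exec awk '$7 == "GENENAME"{print FILENAME, $0}' {} + > GENENAME.txt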

interview questions on sed or awk [closed]

Hi, I recently crashed and burned on an interview question about file editing. It's kinda been bugging me. I searched on the forum but didn't quite get it to work.
The question was:
testfile.txt has the following:
some test text 1
some test text 2
some test text 3
Change testfile.txt so it looks like this (without using vi or gedit):
some test text 1
some more test text
some test text 2
some more test text
some test text 3
some more test text
I have tried to use sed and awk but the text does not come out where it needs to be. Any help would be greatly appreciated.
This might work for you (GNU sed):
sed -i 'asome more test text' file
This will append the text some more test text to every line and amend the original file in place.
Given:
$ echo "$txt"
some test text 1
some test text 2
some test text 3
In awk:
$ echo "$txt" | awk '{print; print "some more test text"}'
some test text 1
some more test text
some test text 2
some more test text
some test text 3
some more test text
Posting this for the original question about inserting a second line into a text file.
Here is one way to do it with sed:
$ seq 3 | sed 2iinserted
1
inserted
2
3
For inserting text after every line
$ seq 3 | sed 1~1ainserted
1
inserted
2
inserted
3
inserted
If you don't understand what this is, read some of the tutorials; the internet is full of them.
Given the output matching against a repeating phrase some test text, I think
sed 's/some test text.*/&\nsome more test text/' file
will do
some test text 1
some more test text
some test text 2
some more test text
some test text 3
some more test text
Of course the & char in the RHS of the substitute command captures all chars matched and inserts them into the output. We add in \n some more test text on a new-line and we're good.
Note that non-GNU/Linux seds may not recognize \n as a newline, in which case (from the command line) you need to use the key combination Ctrl-V Ctrl-M to add line breaks to your output. (I've never seen it explained how ^M is converted in the file to ^J, but it goes back to the 80s, so go figure.)
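If portability is a concern, a sketch that avoids \n in the replacement entirely by using the standard a (append) command with an escaped literal newline, appending after every line that contains the phrase:
sed '/some test text/a\
some more test text' file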
IHTH

Select required field from line [closed]

$ cat file1.txt
test 15 18 17
test1 11 12 14
test2 13 16 19
Need to extract only
test 12 17 19
Preferably using an awk one-liner.
Here you go:
awk '{a=$1;c=$4;getline;b=$3;getline;d=$4;print a,b,c,d}' file
test 12 17 19
You do not say how you arrive at that result!
awk ' # start
{
a=$1 # from first line set a="test"
c=$4 # from first line set c=17
getline # get next line
b=$3 # from second line set b=12
getline # get next line
d=$4 # from third line set d=19
print a,b,c,d # print this
}' file
Using GNU awk for arrays of arrays (can do similar in any awk with slightly different syntax and a loop):
$ awk '{a[NR][1]=""; split($0,a[NR])} END{print a[1][1], a[2][3], a[1][4], a[3][4]}' file
test 12 17 19
For comparison with the getline-based solution posted: to modify the above to print "found" to stderr every time the word "test" appears on a line, you would simply add that condition and action once:
$ awk '{a[NR][1]=""; split($0,a[NR])} /test/{print "found" |"cat>&2"} END{print a[1][1], a[2][3], a[1][4], a[3][4]}' file
test 12 17 19
found
found
found
With the getline solution you'd need to add the test+action repeatedly, once for every line of the input file:
$ awk '{a=$1;c=$4;if (/test/)print "found" |"cat>&2";getline;b=$3;if (/test/)print "found" |"cat>&2";getline;d=$4;if (/test/)print "found" |"cat>&2";print a,b,c,d}' file
and consider what you'd need to do if your file became 20 lines long instead of 3. This is 100% not an appropriate problem to solve using getline.
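To make that concrete, here is a sketch of how the array version scales: the cells variable and its row:col format are invented for illustration (not part of the question), and it assumes GNU awk for arrays of arrays:
awk -v cells="1:1 2:3 1:4 3:4" '
    { a[NR][1]=""; split($0,a[NR]) }              # store the fields of every row
    END {
        n = split(cells, want, " ")               # the requested row:col pairs
        for (i=1; i<=n; i++) {
            split(want[i], rc, ":")
            printf "%s%s", a[rc[1]][rc[2]], (i<n ? OFS : ORS)
        }
    }' file
With the three-line file above and those four cells it should print the same test 12 17 19, and selecting more cells is just a longer cells string rather than another round of getline calls.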