AIX Find 2 lines below the search String - aix

I am trying to come up with a script or a command to find the line that is 2 lines below a search parameter in AIX. Please help with the correct command syntax.
Thanks in advance.

I found the answer, based on the one given here: How to print 5 consecutive lines after a pattern in file using awk. The command below prints the matching line and then the line two below it:
awk '/pattern/ {print $0; {getline;getline; print}}' filename
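If only the line two below the match is wanted (without the matching line itself), a getline-free variant might look like the sketch below. The sample file contents and the array name want are made up for illustration; the idea is to remember the line number two past each match and print a line only when its number was remembered.

```shell
# Sample input (hypothetical, for illustration only).
tmp=$(mktemp)
printf '%s\n' 'alpha' 'pattern here' 'one below' 'two below' 'three below' > "$tmp"

# On each match, remember the line number two lines further down.
# A line is printed only if its number was remembered earlier.
result=$(awk '/pattern/ { want[NR + 2] = 1 } (NR in want)' "$tmp")
echo "$result"
rm -f "$tmp"
```

Because the targets are stored per line number, matches that sit close together should each still get their own "two below" line.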

Related

filter unique parameters from file

I have a file that contains URLs with parameters, like the following:
https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123
https://example.com/endpoint/endpoint2/?param1=123
https://example.com/endpoint/endpoint2/
https://example.com/endpoint/endpoint5/"//i.example.com/00/s/Nzk5WDEwMjQ=/z/47IAAOSwBu5hXIKF
and I need to keep only the URLs with unique params.
The desired output:
https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123
I managed to filter only the URLs that have params with grep:
grep -E '(\?[a-zA-Z0-9]{1,9}\=)'
but I need to filter on the params at the same time, so I tried awk with the same regex,
but it gives an error:
awk '{sub(\?[a-zA-Z0-9]{1,9}\=)} !seen[$0]++'
Update:
I am sorry for editing the desired output, but when I tried the scripts I found that there is a lot of garbage in my file that needs to be filtered out too.
I tried @James Brown's answer with some editing, and it looks good except for the last line, which unfortunately it does not filter out:
awk -F '?|&' '$2&&!a[$2]++'
To make it clearer why that output is good for me:
it chose the 1st line because it has at least param1,
the 2nd line because it has at least param3,
the 3rd line because it has at least param2.
The comparison rule is to pick lines by unique parameter, regardless of whether that parameter is concatenated with others using the & character.
Edited version, after the requirements changed somewhat:
$ awk -F? '{                     # ? as field delimiter
    split($2,b,/&/)              # split at & to get what is between ? and &
    if(b[1]!=""&&!a[b[1]]++)     # no ? means no $2
        print
}' file
Output as expected. Original answer was:
A short one:
$ awk -F? '$2&&!a[$2]++' file
Explained: Split records at ? (-F?). If there is a second field ($2) and (&&) it has not been seen so far, which is tracked by counting occurrences of the parameter string in the array a (!a[$2]++), output the record.
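If the intent is to compare on the name of the first parameter only (param1, param2, ...) rather than on name=value, the edited version could be adapted along the lines of the sketch below. This is an assumption about the requirement, not something the question confirms; the sample URLs and the names parts and seen are made up for illustration.

```shell
# Sample input (hypothetical URLs, for illustration only).
tmp=$(mktemp)
printf '%s\n' \
  'https://example.com/a/?param1=123&param2=1' \
  'https://example.com/b/?param1=456' \
  'https://example.com/c/?param2=789' \
  'https://example.com/d/' > "$tmp"

# Split each record at the literal ? and look only at lines that have a query
# string. Splitting the query at & or = leaves the first parameter NAME in
# parts[1]; a line is printed the first time that name is seen.
result=$(awk -F'[?]' '
  NF > 1 {
    split($2, parts, /[&=]/)
    if (parts[1] != "" && !seen[parts[1]]++) print
  }' "$tmp")
echo "$result"
rm -f "$tmp"
```

Here the second URL is dropped because param1 was already seen on the first one, even though the value differs.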
EDIT: Following solution may help when query string has ? as well as & present in it and we want to consider both of them for removing duplicates.
awk '
/\?/{
  match($0,/\?[^&]*/)
  val=substr($0,RSTART,RLENGTH)
  match($0,/&.*/)
  if(!seen[val]++ && !seen[substr($0,RSTART,RLENGTH)]++){
    print
  }
}' Input_file
2nd solution (this may help when there are no & parameters in the query string): With your shown samples, please try the following awk program.
awk 'match($0,/\?.*$/) && !seen[substr($0,RSTART,RLENGTH)]++' Input_file
Or the above can be shortened as follows (as per Ed's suggestion). The parentheses around the assignment are needed because = binds less tightly than &&:
awk '(s=index($0,"?")) && !seen[substr($0,s)]++' Input_file
Explanation: The match function matches everything from the ? to the end of the line. The AND condition then makes sure that only the first line carrying each matched value is printed.
With GNU awk (whose match function accepts an array as a third argument), you could also match the URL up to the first question mark, and then capture what follows using your initial pattern for the first parameter name, followed by = and any characters except &: ([a-zA-Z0-9]{1,9}=[^&]+).
Then you can use the !seen[...]++ idiom with the value of capture group 1 as the key.
awk '
match($0, /https?:\/\/[^?]+\?([a-zA-Z0-9]{1,9}=[^&]+)/, arr) && !seen[arr[1]]++
' file
Output
https://example.com/endpoint/?param1=123&param2=1212
https://example.com/endpoint/?param3=123&param1=98989
https://example.com/endpoint/endpoint3/?param2=123
Using awk you can check that the string starts with the protocol and contains a question mark.
Then to get the first parameter only, you can split on ? and & and use the second part of the split for seen
awk '
/^https?:\/\/[^?]*\?/ && split($0, arr, /[?&]/) > 1 && !seen[arr[2]]++
' file

Finding sequence in data

I want to use awk to find sequences matching a pattern in DNA data, but I cannot figure out how to do it. I have a text file "test.txt" which contains a lot of data, and I want to match any sequence that starts with ATG and ends with TAA, TGA or TAG, and print each match.
For instance, if my text file has data that looks like the lines below, I want to find all the existing sequences and output them as shown.
AGACGCCGGAAGGTCCGAACATCGGCCTTATTTCGTCGCTCTCTTGCTTTGCTCGAATAAACGAGTTTGGCTTTATCGAATCTCCGTACCGTAAGGTCGAAAACGGCCGGGTCATTGAGTACGTGAAAGTACAAAATGG
GTCCGCGAATTTTTCGGTTCGTCTCAGCTTTCGCAGTTTATGGATCAGACGAACCCGCTCTCTGAAATTACTCATAAACGCAGGCTCTCGGCGCTCGGGCCCGGCGGACTCTCGCGGGAGCGTGCAGGTTTCGAAGTTC
GGATGATATCGACCATCTCGGCAATCGACGCGTTCGGGCCGTAGGCGAACTGCTCGAAAATCAATTCCGAATCGGGCTTGAGCGAATGGAGCGGGCCATCAAGGAAAAAATGTCTATCCAGCAGGATATGCAAACGACG
AAAGTATGTTTTTCGATCCGCGCCGATTCGACCTCTCAAGAGTCGGAAGGCTTAAATTCAATATCAAAATGGGACGCCCCGAGCGCGACCGTATAGACGATCCGCTGCTTGCGCCGATGGATTTCATCGACGTTGTGAA
ATGAGACCGGGCGATCCGCCGACTGTGCCAACCGCCTACCGGCTTCTGG
Print out matches:
ATGATATCGACCATCTCGGCAATCGACGCGTTCGGGCCGTAG
ATGATATCGACCATCTCGGCAATCGACGCGTTCGGGCCGTAG
ATGTTTTTCGATCCGCGCCGATTCGACCTCTCAAGAGTCGGAAGGCTTAA
I tried something like this, but it only displays the rows that start with ATG. It doesn't actually solve my problem:
awk '/^ATG/{print $0}' test.txt
Assuming the records do not span multiple lines:
$ grep -oP 'ATG.*?T(AA|AG|GA)' file
ATGGATCAGACGAACCCGCTCTCTGA
ATGATATCGACCATCTCGGCAATCGACGCGTTCGGGCCGTAG
ATGTTTTTCGATCCGCGCCGATTCGACCTCTCAAGAGTCGGAAGGCTTAA
ATGGGACGCCCCGAGCGCGACCGTATAG
ATGGATTTCATCGACGTTGTGA
This is a non-greedy match, which requires the -P switch, so each match ends at the first stop codon rather than at the last one on the line.
Could you please try following.
awk 'match($0,/ATG.*TAA|ATG.*TGA|ATG.*TAG/){print substr($0,RSTART,RLENGTH)}' Input_file
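Note that awk regular expressions are greedy and match is called once per line here, so the command above prints at most one match per line, running from the first ATG to the last stop codon. A non-greedy, all-matches variant in plain awk might look like the following sketch. The input line and the variable names s, i and len are made up for illustration; like the grep answer, it assumes sequences do not span lines.

```shell
# Sample input (hypothetical sequence, for illustration only).
tmp=$(mktemp)
printf '%s\n' 'CCATGAAATAAGGATGCCTAGTT' > "$tmp"

result=$(awk '{
  s = $0
  while ((i = index(s, "ATG")) > 0) {
    s = substr(s, i)                      # s now starts at the next ATG
    if (!match(substr(s, 4), /TAA|TAG|TGA/))
      break                               # no stop codon left on this line
    len = 3 + RSTART + RLENGTH - 1        # ATG + text up to and including the stop
    print substr(s, 1, len)
    s = substr(s, len + 1)                # continue after this match
  }
}' "$tmp")
echo "$result"
rm -f "$tmp"
```

Searching for the stop codon only after the ATG itself keeps the two from overlapping, which mirrors what the non-greedy grep pattern does.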

parse and concatenate phrases

I have a file dump which has the records of individuals:
.....Detail....account=xxxxx,......state=yyyyy,....
.....Detail....account=aaaaa,......state=bbbbb,....
What would be a way to extract the 2 phrases concatenated together using awk,sed or grep?
Would it be possible in a single-pass command line?
Expected output(delimiter does not matter):
xxxxx-yyyyy
aaaaa-bbbbb
awk -F'[=,]' '{print $2"-"$4}' file
xxxxx-yyyyy
aaaaa-bbbbb
The details about the input data are a bit vague, but the following sed filter will probably have the desired effect, and could most likely be tweaked to do so otherwise:
s/.*account=\([^,]*\).*state=\([^,]*\),.*/\1-\2/
@IUnknown: I assume that the ..... (dots) in your Input_file stand for data. Could you please try the following and let me know if this helps.
awk '{for(i=1;i<=NF;i++){if($i ~ /=/){split($i, A,"=");Q=Q?Q"-"A[2]:A[2]}};print Q;Q=""}' Input_file
It assumes that you need only the whitespace-separated fields that contain an =, and that you want the value after the = in each.
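If the records may contain other = or , characters before the two fields of interest, picking the values out by key name could be more robust. A minimal sketch, assuming each value ends at the next comma; the sample records and the variable names acct and state are made up for illustration:

```shell
# Sample input (hypothetical records, for illustration only).
tmp=$(mktemp)
printf '%s\n' \
  '.....Detail....account=xxxxx,......state=yyyyy,....' \
  '.....Detail....account=aaaaa,......state=bbbbb,....' > "$tmp"

result=$(awk '{
  acct = state = ""
  # "account=" is 8 characters long, "state=" is 6; skip the key, keep the value.
  if (match($0, /account=[^,]*/)) acct  = substr($0, RSTART + 8, RLENGTH - 8)
  if (match($0, /state=[^,]*/))   state = substr($0, RSTART + 6, RLENGTH - 6)
  print acct "-" state
}' "$tmp")
echo "$result"
rm -f "$tmp"
```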

awk lines in file between header and footer strings

I'm trying to parse out all of the lines in between different headers and footers to different files using an awk script in a for loop. For example, I have a file with a list of mismatches with sample-name headers (compiled.csv) that looks like this:
19-T00,,,,,,,,,,,,,,,,
1557,WT,,,,,,,,,,,,,,,
6,109-G->A,110-G->A,,,,,,,,,,,,,,
3,183-G->A,,,,,,,,,,,,,,,
19-T10,,,,,,,,,,,,,,,,
642,WT,,,,,,,,,,,,,,,
206,24->G,,,,,,,,,,,,,,,
19-T21,,,,,,,,,,,,,,,,
464,24->G,,,,,,,,,,,,,,,
19-TSpl,,,,,,,,,,,,,,,,
2219,24->G,,,,,,,,,,,,,,,
20-T00,,,,,,,,,,,,,,,,,,
...
...
My goal for the lines above would be to pass all the lines from the 19-T00 to the 2219,24->G,,,,,,,,,,,,,,, in a sample output file called sample-19.csv.
The sample names all share the pattern [0-9][0-9]-T*. And my approach to doing this first was based on creating an array with all 20 sample names (i.e. 19, 20, 21...). I am trying to execute the following loop, and output files are created but they are blank.
for i in {0,19}
do a="$i"
b=`echo $i+1 | bc`
header="${array[$a]}-T"; footer="${array[$b]}-T"
name=`echo $header | cut -d"-" -f1`
awk -F, -v start="$header" -v finish="$footer" '/^start*/,/^finish*/' compiled.csv >"sample-"$name".csv"
done
If I do this manually with the one-liner:
awk '/^19-T*/,/^20-T*/' compiled.csv >sample-19.csv
it works fine. So I think there may be a problem in the variable passing, but I don't know how to fix it.
I know there are some other threads discussing the header-footer approach using awk, but I just think my syntax needs some help. If anyone has any advice by way of more experienced eyes, it would be much appreciated. Let me know if anything isn't clear.
Thanks,
Matt
All you need is something like this (untested):
awk '
/^[0-9][0-9]-T00,/ {
    close(out)
    out = "sample-" $0
    sub(/-T00.*/,".csv",out)
}
{ print > out }
' compiled.csv
If you're ever again considering processing text with a shell loop make sure to read why-is-using-a-shell-loop-to-process-text-considered-bad-practice first
Using awk:
awk --posix '/[0-9]{2}-T00/{split($0,a,"-"); name=a[1]} {print $0 > ("sample-" name ".csv")}' file
The output will be two files, "sample-19.csv" and "sample-20.csv", for your contents.
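As for why the loop in the question produced empty files: inside /.../ awk does not expand variables, so /^start*/ looks for the literal text "star" followed by zero or more "t", not for the value passed with -v. (Separately, in bash {0,19} expands to just 0 and 19; {0..19} gives the twenty indices.) A minimal sketch of passing the header and footer as variables, using index() so they are compared as plain strings; the sample lines and the flag name inside are made up for illustration:

```shell
# Sample input (hypothetical, shortened from the question's layout).
tmp=$(mktemp)
printf '%s\n' '19-T00,,' '1557,WT,' '19-T10,,' '642,WT,' '20-T00,,' '5,WT,' > "$tmp"

# index(s, t) == 1 means the line begins with the string t.
result=$(awk -v start="19-T" -v finish="20-T" '
  index($0, finish) == 1 { inside = 0 }   # footer reached: stop copying
  index($0, start)  == 1 { inside = 1 }   # header seen: start copying
  inside                                  # print while inside the block
' "$tmp")
echo "$result"
rm -f "$tmp"
```

Unlike a /start/,/finish/ range, this leaves the footer line (the next sample's header) out of the output. If a regex match is preferred, a dynamic pattern such as $0 ~ ("^" start) should also work.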

Extract data from ASCII file with grep/AWK

I have a long ASCII log-file from a simulation and need to extract some data from it.
The lines I want have the structure:
Main step= 1 a= 0.00E+00 b=-6.85E-08 c= 4.58E-08
The phrase "Main step" is only used in the lines I want. This is easy to grep for, but I also want to include the next line following the line above, which has the structure:
Fine step= 1 t=-1.31854E+01
Note that "Fine step" is used other places in the log-file.
My question boils down to this: How can I extract the lines containing a keyword/phrase (here "Main step") and also make sure that I get the next following line using grep or AWK or some other standard Linux program?
You can use sed
sed -n '/Main step/,/./p' inputFile
This prints the range that starts at a line matching Main step and ends at the next line matching /./, that is, the next non-empty line. Effectively, each Main step line and the line following it are printed (provided the following line is not empty).
Since the question is tagged awk, here is one using awk's getline function:
awk '/Main step/{print; getline; print}' file
It would print the Main step line and also the next line.
Because you tagged "grep", and since this is the most obvious solution to me:
grep -A1 'Main step' file
...although this will add "--" between matches. So to get the same output as the awk and sed answer:
grep -A1 'Main step' file | grep -v '^--$'
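For completeness, the awk answer can also be written without getline by counting down the lines still to be printed. A minimal sketch; the log lines and the counter name left are made up for illustration, and changing the 2 should give the equivalent of a different -A value:

```shell
# Sample input (hypothetical log excerpt, for illustration only).
tmp=$(mktemp)
printf '%s\n' \
  'Fine step= 0 t= 1.00000E+00' \
  'Main step= 1 a= 0.00E+00 b=-6.85E-08 c= 4.58E-08' \
  'Fine step= 1 t=-1.31854E+01' \
  'some other output' > "$tmp"

# A match sets the counter to 2 (the matching line plus one more);
# every line is printed while the counter is still positive.
result=$(awk '/Main step/ { left = 2 } left > 0 { print; left-- }' "$tmp")
echo "$result"
rm -f "$tmp"
```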