Search for pattern with alphanumeric and print the lines - awk

I have data in format
A ((!(A1+A2)))
B (A1A2)
C (A1+A2)
D (!(A1+A2)B1)
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)
I want output as
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)
I want to get that line which has ! only with alphanumeric.
I used
awk -F' ' '$2 ~ /\!/' file
And this also
awk '$2'~/^[(!)_[:alnum:]]+[(!)+_[:alnum:]]+$/' file
But this lists every line that contains !, not only the lines where ! is directly followed by an alphanumeric character.

I want to get that line which has ! only with alphanumeric:
You can just use this regex for matching ! followed by an alphanumeric character:
/![[:alnum:]]/
awk command:
awk '$2 ~ /![[:alnum:]]/' file
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)

$ grep '![[:alpha:]]' file
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)
$ awk '/![[:alpha:]]/' file
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)
$ sed -n '/![[:alpha:]]/p' file
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)
If that's not all you need, then edit your question to state your requirements more clearly and provide sample input/output that this doesn't work for.

I guess "I want to get that line which has ! only with alphanumeric" means: ! not followed by (
$ awk '/![^(]/' file
Output:
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)
If your real data is not so tight, you may want to throw in some space checking, for example: /! *[^(]/
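The patterns suggested so far agree on the sample data but differ on input it does not contain. As a rough illustration, the sketch below runs three of them against two made-up lines (H and I are hypothetical, not from the question): `![^(]` accepts any character other than `(` after the `!`, while `![[:alnum:]]` requires a letter or digit.

```shell
# H and I are hypothetical lines added to show where the patterns diverge.
data='D (!(A1+A2)B1)
E (!A1+!A2)
H (A1!)
I (! A1)'

# "!" followed by anything except "(": also matches "!)" and "! ".
loose=$(printf '%s\n' "$data" | awk '/![^(]/')

# "!" immediately followed by a letter or digit.
strict=$(printf '%s\n' "$data" | awk '/![[:alnum:]]/')

# "!" followed by optional spaces, then a letter or digit.
spaced=$(printf '%s\n' "$data" | awk '/! *[[:alnum:]]/')

printf 'loose:\n%s\nstrict:\n%s\nspaced:\n%s\n' "$loose" "$strict" "$spaced"
```

Which one is appropriate depends on whether a stray `!)` or `! A1` should count as a match in the real data.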

Also with awk:
awk '/![[:alpha:]]+/' file
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)
Or
awk '/![[:alpha:]][[:digit:]]/' file
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)

You can require a leading word boundary (\<, a GNU awk extension) right after !:
awk '$2 ~ /!\</' file
Note that whitespace is the default field separator, so you do not need -F' '.
See the online awk demo:
s='A ((!(A1+A2)))
B (A1A2)
C (A1+A2)
D (!(A1+A2)B1)
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)'
awk '$2 ~ /!\</' <<< "$s"
Output:
E (!A1+!A2)
F (!(!A1!A2)+A3+A4)
G ((A1!A2)B1)

Related

Replace space in a specific column with a character

I have data in format
A ((!(A1+A2)))
B (A1+A2)
C (A1 A2)
D (!(A1 A2) B1)
E (!A1+!A2)
F ((A1+A2) A3 A4)
G ((A1 A2)+(A3 A4))
I want output as
A ((!(A1+A2)))
B (A1+A2)
C (A1&A2)
D (!(A1&A2)&B1)
E (!A1+!A2)
F ((A1+A2)&A3&A4)
G ((A1&A2)+(A3&A4))
So whenever there is space in column2 I want it to get replaced with &
I tried
sed 's/ /&/2' file
But there is no change
I also tried
awk -F' ' '{if($2==" ")$2="&";}1' file
This also changes nothing; I just get the input file back.
You may use this awk with 2 spaces as the input/output field separator (this relies on the two columns being separated by two spaces in the file):
awk 'BEGIN {FS=OFS="  "} {gsub(/ +/, "\\&", $2)} 1' file
A ((!(A1+A2)))
B (A1+A2)
C (A1&A2)
D (!(A1&A2)&B1)
E (!A1+!A2)
F ((A1+A2)&A3&A4)
G ((A1&A2)+(A3&A4))
You can use sed for this task in the following way:
sed 's/\([^ ]\) \([^ ]\)/\1\&\2/g'
gives for input
A ((!(A1+A2)))
B (A1+A2)
C (A1 A2)
D (!(A1 A2) B1)
E (!A1+!A2)
F ((A1+A2) A3 A4)
G ((A1 A2)+(A3 A4))
output
A ((!(A1+A2)))
B (A1+A2)
C (A1&A2)
D (!(A1&A2)&B1)
E (!A1+!A2)
F ((A1+A2)&A3&A4)
G ((A1&A2)+(A3&A4))
Explanation: I used capturing groups here. The 1st is any character but a space, the 2nd is also any character but a space, and there is exactly one space between them. Such a match is replaced by the content of the 1st group (\1), followed by & (\&), followed by the content of the 2nd group (\2). We want multiple replacements per line, hence g. Disclaimer: this solution assumes there are no leading or trailing spaces in your input. It also replaces a single space between the two columns, so it relies on the columns being separated by more than one space.
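Both answers above depend on how the two columns are separated. If they are separated by a single space (an assumption, since the spacing in the sample is ambiguous), one possible approach is to split each line at its first space and replace spaces only in the remainder. A minimal sketch using POSIX awk:

```shell
# Assumes exactly one space separates column 1 from column 2 (an assumption).
data='C (A1 A2)
D (!(A1 A2) B1)
E (!A1+!A2)'

out=$(printf '%s\n' "$data" | awk '{
  i = index($0, " ")            # position of the first space
  if (i == 0) { print; next }   # no second column: print unchanged
  head = substr($0, 1, i)       # column 1 plus its separator
  rest = substr($0, i + 1)      # everything after the first space
  gsub(/ /, "\\&", rest)        # "\\&" yields a literal ampersand
  print head rest
}')

printf '%s\n' "$out"
```

The first space is kept as the column separator, and every later space becomes &.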

Move every first row down to the second row

I have the following sample text:
a b c
x_y_
d e f
x_y_
g h i
x_y_
k l m
x_y_
I need it to be formatted as follows:
x_y_ a b c
x_y_ d e f
x_y_ g h i
x_y_ k l m
Using sed, awk or something else in bash, how do we accomplish this?
Another awk:
$ awk 'NR%2==0{print $0,p}{p=$0}' file
Output:
x_y_ a b c
x_y_ d e f
x_y_ g h i
x_y_ k l m
Explained:
$ awk '
NR%2==0 {         # on every even numbered record
    print $0,p    # output current record and previous
}
{
    p=$0          # buffer record for next round
}' file
Update:
In case of an odd number of records (mostly due to the peer pressure :), you need to deal with the left-over x y z:
$ awk 'NR%2==0{print $0,p}{p=$0}END{if(NR%2)print p}' file
Output:
...
x_y_ g h i
x_y_ k l m
x y z
With sed:
sed -E 'N;s/(.*)\n(.*)/\2 \1/g' sample.txt
A short pipeline:
tac file | paste -d ' ' - - | tac
$ awk 'NR%2{s=$0; next} {print $0, s}' file
x_y_ a b c
x_y_ d e f
x_y_ g h i
x_y_ k l m
1st solution: Could you please try the following, written and tested with GNU awk.
awk -v RS="" -v FS="\n" '{for(i=2;i<=NF;i+=2){printf("%s\n",$i OFS $(i-1))}}' Input_file
Or (with print):
awk -v RS="" -v FS="\n" '{for(i=2;i<=NF;i+=2){print $i,$(i-1)}}' Input_file
2nd solution: If the line number is evenly divisible by 2, print the current line followed by the previous one. If the total number of lines in Input_file is odd, the END block also prints the last remaining line (by checking whether the prev variable is still set).
awk 'prev && FNR%2==0{print $0 OFS prev;prev="";next} {prev=$0} END{if(prev){print prev}}' Input_file
Output will be as follows.
x_y_ a b c
x_y_ d e f
x_y_ g h i
x_y_ k l m
This might work for you (GNU sed):
sed '1~2{h;d};G;s/\n/ /' file
Save the odd-numbered lines in the hold space, append them to the even-numbered lines, and replace the newline with a space.
Another variation:
sed -n 'h;n;G;s/\n/ /p' file
There are many more ways to achieve this, as can be seen by answers above.
How about this:
parallel -N2 echo "{2} {1}" :::: file
See here for parallel.
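The pairing logic shared by these answers can also be written as a plain shell loop. This is only a sketch: `swap_pairs` is a hypothetical helper name, the last input line is made up to show the odd-count case, and a `while read` loop is usually much slower than awk or sed on large files.

```shell
# Sample input from the question, plus a hypothetical unpaired last line.
data='a b c
x_y_
d e f
x_y_
x y z'

swap_pairs() {
  while IFS= read -r first; do
    if IFS= read -r second; then
      printf '%s %s\n' "$second" "$first"   # second line, then first
    else
      printf '%s\n' "$first"                # unpaired trailing line
    fi
  done
}

out=$(printf '%s\n' "$data" | swap_pairs)
printf '%s\n' "$out"
```

Reading two lines per iteration plays the same role as buffering the odd line in an awk variable or in the sed hold space.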

Put a comma in a specific column

I would like to know how to put a comma in one column (space). For example.
a b c d e
And I would like this.
a b c d, e
A comma in the 4th space.
I tried with this command.
awk -F '{print $4}' < file.txt | cut -d"," -f4-
$ awk '{$4=$4","}1' file
a b c d, e
If your Input_file has only 5 fields (or it has more and you want to do this for the second-to-last field), then the following may also help.
awk '{$(NF-1)=$(NF-1)","} 1' Input_file
Or with sed, simply replace the 4th space with a comma followed by a space, as follows.
sed 's/ /, /4' Input_file
echo a b c d e| awk '{$0=gensub(/ /,", ",4)}1'
a b c d, e
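The field-based and the space-counting commands give the same result on the sample, but they can diverge when the spacing is irregular. A small sketch, using a hypothetical input with two spaces after the first field:

```shell
# Hypothetical input with two spaces after "a".
line='a  b c d e'

# Field-based: appends "," to field 4; awk rebuilds the record with single spaces.
by_field=$(printf '%s\n' "$line" | awk '{ $4 = $4 "," } 1')

# Space-counting: replaces the 4th space character, wherever it falls.
by_space=$(printf '%s\n' "$line" | sed 's/ /, /4')

printf '%s\n%s\n' "$by_field" "$by_space"
```

The awk version normalizes the whitespace as a side effect of assigning to a field, while the sed version keeps the spacing but puts the comma after the 4th space character rather than after the 4th field.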

Random selection of ids from a file

I have a text file in the following format; the letters are ids separated by a space.
OG1: A B C D E
OG2: C F G D R
OG3: A D F F F
I would like to randomly extract one id from each group as
OG1: E
OG2: D
OG3: A
I tried using
shuf -n 1 data.txt
which gives me
OG2: C F G D R
awk to the rescue!
$ awk -v seed=$RANDOM 'BEGIN{srand(seed)} {print $1,$(rand()*(NF-1)+2)}' file
OG1: D
OG2: F
OG3: F
To skip a certain letter, you can change the main block to
... {while ("C"==(r=$(rand()*(NF-1)+2))); print $1,r}' file
perl -lane 'print "$F[0] ".$F[rand($#F)+1]' data.txt
Explanation:
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-e execute the perl code
@F is the array of words in each line, indexed starting with $F[0]
$#F is the index of the last element of @F (one less than the number of words)
output:
OG1: A
OG2: F
OG3: F
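The index arithmetic in these one-liners is easy to get off by one, so here is a sketch that makes it explicit. `pick_ids` is a hypothetical helper name, and the fixed seed 42 is only there to make a run repeatable; `int(rand() * (NF - 1))` yields 0..NF-2, so adding 2 gives a field number in 2..NF, which skips the group label in field 1 and can reach the last id.

```shell
data='OG1: A B C D E
OG2: C F G D R
OG3: A D F F F'

# pick_ids SEED: print the group label and one randomly chosen id per line.
pick_ids() {
  awk -v seed="$1" 'BEGIN { srand(seed) }
  {
    idx = int(rand() * (NF - 1)) + 2   # integer field number in 2..NF
    print $1, $idx
  }'
}

out=$(printf '%s\n' "$data" | pick_ids 42)
printf '%s\n' "$out"
```

Passing `$RANDOM` (in bash) instead of a constant gives a different selection on each run.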

Split large single column into two columns

I need to split a single column of data in a large file into two columns as follows:
A
B            B A
C   ---->    D C
D            F E
E            H G
F
G
H
Is there an easy way of doing it with unix shell commands and/or small shell script? awk?
$ awk 'NR%2{s=$0;next} {print $0,s}' file
B A
D C
F E
H G
You can use the following awk script:
awk 'NR % 2 != 0 {cache=$0}; NR % 2 == 0 {print $0 cache}' data.txt
Output:
BA
DC
FE
HG
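It caches each odd line and, on each even line, prints that line with the cached line appended. Because print $0 cache concatenates the two strings, the output has no space between them.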
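If the space-separated layout from the question is wanted, the same caching idea works with a comma in the print statement, which joins the two values with the output field separator. The sketch below also covers an odd number of lines; the ninth line "I" is a hypothetical addition to the sample.

```shell
# Sample column from the question, plus a hypothetical unpaired ninth line "I".
data=$(printf '%s\n' A B C D E F G H I)

out=$(printf '%s\n' "$data" | awk '
  NR % 2 { cache = $0; held = 1; next }   # odd line: remember it
  { print $0, cache; held = 0 }           # even line: print it before the cached one
  END { if (held) print cache }           # odd line count: emit the leftover
')

printf '%s\n' "$out"
```

The `held` flag is used instead of testing `cache` directly so that a leftover line containing "0" or an empty string is still printed.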
I know this is tagged awk, but I just can't stop myself from posting a sed solution, since the question left it open for "easy way . . . with unix shell commands":
$ sed -n 'h;n;G;s/\n/ /g;p' data.txt
B A
D C
F E
H G