Grep a pattern and select the portion of a line after the matching patterns 41572:, 90000: and 90002:
input
hyt : generation
str : 122344
stks : 9000233
dhy : 9000aaaa
sjyt : hist : hhh9000kkk
Count ch : 41572:47149-47999/2(14485-14910) 41584:47149-47999/2(14911-15449) 90000:47919-47999/2(15447-15477) 90002:47919-47999/2(15478-15418)
drx : 12345
Here is the code I used:
awk '
{
flag=""
for(i=1;i<=NF;i++){
if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){
flag=1
printf("%s%s",$i,i==NF?ORS:OFS)
}
}
}
!flag
' Input_file
With the code above from Mr. RavinderSingh13, I got the following output:
hyt : generation
str : 122344
stks : 9000233
dhy : 9000aaaa
sjyt : hist : hhh9000kkk
41572:47149-47999/2(14485-14910) 90000:47919-47999/2(15447-15477) 90002:47919-47999/2(15478-15418)
drx : 12345
The desired output is:
hyt : generation
str : 122344
stks : 9000233
dhy : 9000aaaa
sjyt : hist : hhh9000kkk
Count ch : 41572:47149-47999/2(14485-14910) 90000:47919-47999/2(15447-15477) 90002:47919-47999/2(15478-15418)
drx : 12345
Thanks in advance
EDIT: Adding solution as per OP's new question.
awk '{flag="";for(i=1;i<=NF;i++){if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){flag=1;printf("%s%s",$i,i==NF?ORS:OFS)}}} !flag' Input_file
OR
awk '
{
flag=""
for(i=1;i<=NF;i++){
if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){
flag=1
printf("%s%s",$i,i==NF?ORS:OFS)
}
}
}
!flag
' Input_file
Could you please try the following (the requirement is not fully clear, so this goes only by the shown sample output).
awk 'NF>1{for(i=1;i<=NF;i++){if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){printf("%s%s",$i,i==NF?ORS:OFS)}};next} 1' Input_file
Adding a non-one liner form of solution too now.
awk '
NF>1{
for(i=1;i<=NF;i++){
if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){
printf("%s%s",$i,i==NF?ORS:OFS)
}
}
next
}
1
' Input_file
Explanation: Adding explanation for above code too here.
awk '
NF>1{ ##Checking if NF is greater than 1.
for(i=1;i<=NF;i++){ ##Using for loop to go through from value 1 to till value of NF.
if($i ~ /41572/ || $i ~ /90000/ || $i ~ /90002/){ ##Checking if the current field contains 41572 OR 90000 OR 90002; if so, do the following.
printf("%s%s",$i,i==NF?ORS:OFS) ##Print the field value in case above condition is TRUE with NEW line if i==NF or space if not.
}
}
next ##Next will skip all further statements from here.
}
1 ##1 will print all edited/non-edited lines here.
' Input_file ##Mentioning Input_file name here.
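Note that the commands above print only the matching entries, so the Count ch : label shown in the desired output is lost. A minimal sketch that keeps it, under two assumptions of mine that the question does not state: every entry to filter starts with digits followed by a colon, and the wanted keys are exactly 41572, 90000 and 90002. The function name keep_keys is made up for illustration.

```shell
# keep_keys: hypothetical helper; reads stdin, writes the filtered lines.
# Assumption: entries to filter look like "<digits>:..."; every other
# whitespace-separated word (labels, plain values) passes through.
keep_keys() {
    awk '
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            # Drop "<digits>:" entries whose key is not in the wanted list.
            if ($i ~ /^[0-9]+:/ && $i !~ /^(41572|90000|90002):/)
                continue
            out = (out == "" ? $i : out OFS $i)
        }
        print out    # side effect: runs of blanks collapse to single spaces
    }'
}

printf '%s\n' 'str : 122344' \
    'Count ch : 41572:47149-47999/2(14485-14910) 41584:47149-47999/2(14911-15449) 90000:47919-47999/2(15447-15477)' |
    keep_keys
```

Because the line is rebuilt field by field, it always ends with a newline, even when the last field of the input line is not one of the wanted entries.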
I am trying to add a prefix to a field in awk if it is not already present. That is, if chr isn't present before the number, it is inserted; if it is already there, the field is left alone.
The first awk adds the prefix to every $2 even if it is already present, and the second awk does skip the $2 values with chr in them, but does not add chr to the ones without. Thank you :).
file
ASPA,17:3483575-3483585
ATM,11:108289609-108289613
ATP7B,13:51937469-51937480
ATR,chr3:142562768-142562773
BAG3,chr10:119670120-119670123
desired
ASPA,chr17:3483575-3483585
ATM,chr11:108289609-108289613
ATP7B,chr13:51937469-51937480
ATR,chr3:142562768-142562773
BAG3,chr10:119670120-119670123
awk
awk -F, '{$2="chr"$2; print}' file
awk 2
awk -F, '$2 !~/chr/{gsub("chr","chr",$2)}1' file
You can use:
awk 'BEGIN {FS=OFS=","} $2 !~ /^chr/ {$2="chr" $2} 1' file
ASPA,chr17:3483575-3483585
ATM,chr11:108289609-108289613
ATP7B,chr13:51937469-51937480
ATR,chr3:142562768-142562773
BAG3,chr10:119670120-119670123
Or without using any regex:
awk 'BEGIN {FS=OFS=","} index($2 , "chr") != 1 {$2="chr" $2} 1' file
Another solution that might be shortest of all:
awk '{sub(/,(chr)?/, ",chr")} 1' file
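One property worth checking for any of these is that running them twice changes nothing. A small sketch using the anchored-regex version above, wrapped in a function whose name add_chr is hypothetical:

```shell
# add_chr: hypothetical wrapper; prefixes field 2 with "chr" unless it
# already starts with "chr" (the ^ anchor matters: an unanchored /chr/
# would also skip values that merely contain "chr" somewhere).
add_chr() {
    awk 'BEGIN { FS = OFS = "," } $2 !~ /^chr/ { $2 = "chr" $2 } 1'
}

# Feeding the output back in should leave it unchanged.
printf '%s\n' 'ASPA,17:3483575-3483585' 'ATR,chr3:142562768-142562773' |
    add_chr | add_chr
# prints:
# ASPA,chr17:3483575-3483585
# ATR,chr3:142562768-142562773
```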
1st solution: With your shown samples, please try following awk code.
awk '
BEGIN{FS=OFS=":"}
{
split($1,arr,",")
if(int(arr[2]) || arr[2]==0){
$1=arr[1] ",chr" arr[2]
}
}
1
' Input_file
2nd solution: With GNU awk using its match function which captures values into an array from capturing groups try following code.
awk '
match($0,/^([^,]*,)([^:]*)(:.*)/,arr){
if(int(arr[2]) || arr[2]==0){
arr[2]="chr" arr[2]
}
print arr[1] arr[2] arr[3]
}
' Input_file
3rd solution(Bonus one): Just in case your 2nd field is having Negative values(integers) and you want to change it Eg: from -11 to -chr11 then you can try following GNU awk code.
awk '
match($0,/^([^,]*,)(-)?([^:]*)(:.*)/,arr){
if(int(arr[3]) || arr[3]==0){
if(arr[2]=="-"){
arr[3]="-chr" arr[3]
}
else{
arr[3]="chr" arr[3]
}
$0=arr[1] arr[3] arr[4]
}
print
}
' Input_file
mawk NF=NF FS=',(chr)?' OFS=',chr'
ASPA,chr17:3483575-3483585
ATM,chr11:108289609-108289613
ATP7B,chr13:51937469-51937480
ATR,chr3:142562768-142562773
BAG3,chr10:119670120-119670123
Input file data:
"1","123","hh
KKK,111,ll
Jk"
"2","124","jj"
Output data:
"1","123","hh KKK,111,ll jk"
"2","124","jj"
I tried the code below in an awk file, but it still does not produce the desired output:
BEGIN{
FS="\",\"";
record_lock_flag=0;
total_feilds=3;
tmp_field_count=0;
tmp_rec_buff="";
lines=0;
}
{
if(NR>0)
{
if( record_lock_flag == 0 && NF == total_feilds && substr($NF,length($NF)-1,length($NF)) ~ /^"/ )
{
print $0;
}
else
{
tmp_rec_buff=tmp_rec_buff$0 ;
tmp_field_count=tmp_field_count+NF ;
if ( $0 != "")
{ lines++ ;}
rec_lock_flag=1 ;
if(tmp_field_count==exp_fields+lines-1){
print tmp_rec_buff;
record_lock_flag=0;
tmp_field_count=0;
tmp_rec_buff="";
lines=0;
}
}
}
}
END{
}
Using any awk in any shell on every Unix box:
$ awk 'BEGIN{RS=ORS="\""} !(NR%2){gsub(/\n/," ")} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
See also "What's the most robust way to efficiently parse CSV using awk?"
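Using gnu-awk we can set RS to a regex so that a record ends at a closing quote followed by a newline and then either an opening quote or the end of input. We then replace the newlines left inside each record with spaces and put the consumed separator back by setting ORS to RT (assuming there are no blank fields with opening and closing quotes on separate lines):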
awk -v RS='"\n("|$)' '{gsub(/\n/, " "); ORS=RT} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
Another version using gnu-awk if you already know number of fields in each record as shown in your question:
awk -v n=3 -v FPAT='"[^"]*"' 'p {$0 = p " " $0; p=""}
NF < n {p = $0; next} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
With your shown samples only, you could try following awk code. Written and tested with GNU awk.
awk -v RS="" -v FS="\n" '
{
for(i=1;i<=NF;i++){
sum+=gsub(/"/,"&",$i)
val=(val?val OFS:"")$i
if(sum%2==0){
print val
sum=0
val=""
}
}
}
' Input_file
Explanation: Adding detailed explanation for above.
awk -v RS="" -v FS="\n" ' ##Starting awk program from here, setting RS as NULL and field separator as new line.
{
for(i=1;i<=NF;i++){ ##Traversing through all fields here.
sum+=gsub(/"/,"&",$i) ##Globally substituting " with itself and keeping its count to sum variable.
val=(val?val OFS:"")$i ##Creating val which has current field in it and keep appending its value to it.
if(sum%2==0){ ##Checking if sum is even number then do following.
print val ##Printing val here.
sum=0 ##Setting sum to 0 here.
val="" ##Nullifying val here.
}
}
}
' Input_file ##Mentioning Input_file name here.
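The same quote-parity idea can be sketched without paragraph mode, reading one line at a time, which should work in any POSIX awk. This is my variation rather than the code above, and the function name join_quoted is hypothetical:

```shell
# join_quoted: hypothetical helper; joins physical lines until the running
# count of double quotes is even, meaning every quoted field is closed.
join_quoted() {
    awk '
    {
        quotes += gsub(/"/, "&")              # gsub returns how many quotes it replaced
        buf = (buf == "" ? $0 : buf " " $0)   # append this line to the pending record
    }
    quotes % 2 == 0 {                         # balanced: the record is complete
        print buf
        buf = ""
        quotes = 0
    }'
}

printf '%s\n' '"1","123","hh' 'KKK,111,ll' 'Jk"' '"2","124","jj"' | join_quoted
# prints:
# "1","123","hh KKK,111,ll Jk"
# "2","124","jj"
```

An escaped quote written as two double quotes adds 2 to the count, so it does not disturb the parity test.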
With awk setting ORS:
awk '{ORS = (!/"$/) ? " " : "\n"} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
Here is my data:
NAME1,NAME1_001,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
NAME1,NAME1_003,LIC000_1,NULL,NULL,NULL,NULL
NAME2,NAME2_001,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_001,NULL,LIC400_2,NULL,NULL,LIC500_1
NAME3,NAME3_005,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_006,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME4,NAME4_002,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
Expected result:
NAME1|NAME1_001|NULL|LIC100_1|NULL|LIC300-3|LIC300-6|NAME1_003|LIC000_1|NULL|NULL|NULL|NULL
NAME2|NAME2_001|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME3|NAME3_001|NULL|LIC400_2|NULL|NULL|LIC500_1|NAME3_005|LIC000_1|NULL|LIC400_2|NULL|NULL|NAME3_006|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME4|NAME4_002|NULL|LIC100_1|NULL|LIC300-3|LIC300-6
I tried the command below, but have no idea how to add the details ($3 to $7):
awk '
BEGIN{FS=","; OFS="|"};
{ arr[$1] = arr[$1] == ""? $2 : arr[$1] "|" $2 }
END {for (i in arr) print i, arr[i] }' file.csv
Any suggestion? thanks!!
Could you please try following. Written and tested with shown samples in GNU awk.
awk '
BEGIN{
FS=","
OFS="|"
}
FNR==NR{
first=$1
$1=""
sub(/^\|/,"")
arr[first]=(first in arr?arr[first] OFS:"")$0
next
}
($1 in arr){
print $1, arr[$1]
delete arr[$1]
}
' Input_file Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS="," ##Setting FS as comma here.
OFS="|" ##Setting OFS as | here.
}
FNR==NR{ ##Checking FNR==NR which will be TRUE when first time Input_file is being read.
first=$1 ##Setting first as 1st field here.
$1="" ##Nullifying first field here.
sub(/^\|/,"") ##Removing the leading | that the rebuilt record starts with after $1 was nullified.
arr[first]=(first in arr?arr[first] OFS:"")$0 ##Creating arr with index of first and keep adding same index value to it.
next ##next will skip all further statements from here.
}
($1 in arr){ ##Checking condition if 1st field is present in arr then do following.
print $1, arr[$1] ##Printing 1st field, OFS and arr value here.
delete arr[$1] ##Deleting arr item here.
}
' Input_file Input_file ##Mentioning Input_file names here.
Another awk:
$ awk '
BEGIN { # set them field separators
FS=","
OFS="|"
}
{
if($1 in a) { # if $1 already has an entry in a hash
t=$1 # store key temporarily
$1=a[$1] # set the a hash entry to $1
a[t]=$0 # and hash the record
} else { # if $1 seen for the first time
$1=$1 # rebuild record to change the separators
a[$1]=$0 # and hash the record
}
}
END { # afterwards
for(i in a) # iterate a
print a[i] # and output
}' file
Assuming your input is grouped by the key field as shown in your example (if it isn't then sort it first) you don't need to store the whole file in memory or read it twice and this will output the lines in the same order they appear in the input:
$ cat tst.awk
BEGIN { FS=","; OFS="|" }
$1 != prev {
if (NR>1) {
print rec
}
prev = rec = $1
}
{
$1 = ""
rec = rec $0
}
END { print rec }
$ awk -f tst.awk file
NAME1|NAME1_001|NULL|LIC100_1|NULL|LIC300-3|LIC300-6|NAME1_003|LIC000_1|NULL|NULL|NULL|NULL
NAME2|NAME2_001|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME3|NAME3_001|NULL|LIC400_2|NULL|NULL|LIC500_1|NAME3_005|LIC000_1|NULL|LIC400_2|NULL|NULL|NAME3_006|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME4|NAME4_002|NULL|LIC100_1|NULL|LIC300-3|LIC300-6
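If the input is not grouped by key and sorting it first is not wanted, one option is to store the records in memory and remember the order in which keys first appear. A minimal sketch under that assumption; group_by_key, rec and order are names I made up:

```shell
# group_by_key: hypothetical helper; merges lines sharing the first field
# and prints the groups in first-seen order. Holds all groups in memory.
group_by_key() {
    awk '
    BEGIN { FS = ","; OFS = "|" }
    {
        key = $1
        if (!(key in rec)) {      # first time this key is seen
            order[++n] = key      # remember arrival order
            rec[key] = key
        }
        $1 = ""                   # rebuilds $0 as "|f2|f3|..." using OFS
        rec[key] = rec[key] $0
    }
    END {
        for (i = 1; i <= n; i++)
            print rec[order[i]]
    }'
}

printf '%s\n' 'NAME3,NAME3_001,NULL' 'NAME1,NAME1_001,LIC1' 'NAME3,NAME3_005,LIC0' |
    group_by_key
# prints:
# NAME3|NAME3_001|NULL|NAME3_005|LIC0
# NAME1|NAME1_001|LIC1
```

Looping over the numeric order array avoids the unspecified iteration order of for (i in a).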
I need to find password fields that are empty, or that contain only a space or a tab, and replace them with x (in the /etc/passwd file).
I found this awk syntax, which shows the users whose second field (using : as delimiter) is empty or contains a space or a tab:
awk -F":" '($2 == "" || $2 == " " || $2 == "\t") {print $0}' $file
and the result is the following:
user1::53556:100::/home/user1:/bin/bash
user2: :53557:100::/home/user2:/bin/bash
user3: :53558:100::/home/user3:/bin/bash
How can I tell awk to replace this 2nd field (empty, or with a space or tab) with another character (for example x)?
Could you please try following.
awk 'BEGIN{FS=OFS=":"} {$2=$2=="" || $2~/^[[:space:]]+$/?"X":$2} 1' Input_file
Explanation: Adding explanation of above code.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section here which will be executed before Input_file is being read.
FS=OFS=":" ##Setting FS and OFS as colon here for all lines of Input_file.
} ##Closing BEGIN section block here.
{
$2=$2=="" || $2~/^[[:space:]]+$/?"X":$2 ##Checking if $2 (2nd field) of the current line is either NULL or consists only of whitespace; if so, set its value to X, otherwise keep $2 as it is.
}
1 ##mentioning 1 will print edited/non-edited current line.
' Input_file ##Mentioning Input_file name here.
EDIT: As per OP, the last line of Input_file must NOT be touched, so adding the following solution now.
tac Input_file | awk 'BEGIN{FS=OFS=":"} FNR==1{print;next} {$2=$2=="" || $2~/^[[:space:]]+$/?"X":$2} 1' | tac
EDIT2: In case you want to do it in a single awk itself, then try the following.
awk '
BEGIN{
FS=OFS=":"
}
prev{
num=split(prev,array,":")
array[2]=array[2]=="" || array[2]~/^[[:space:]]+$/?"X":array[2]
for(i=1;i<=num;i++){
val=(val?val OFS array[i]:array[i])
}
print val
val=""
}
{
prev=$0
}
END{
if(prev){
print prev
}
}' Input_file
In case you want to change Input_file itself, append > temp_file && mv temp_file Input_file to the above code.
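A shorter single-awk sketch of the same idea, holding each line back by one so that the last line is printed untouched. It is not the code above: it re-splits the held line by assigning it to $0 instead of calling split, it treats only spaces and tabs as blank (as in the question), and the name fix_pw_keep_last is hypothetical.

```shell
# fix_pw_keep_last: hypothetical helper; blank 2nd fields become "x" on
# every line except the last one, which is passed through unchanged.
fix_pw_keep_last() {
    awk '
    BEGIN { FS = OFS = ":" }
    { cur = $0 }                      # save the line just read
    NR > 1 {
        $0 = prev                     # re-split the previous line into fields
        if (NF > 1 && $2 ~ /^[ \t]*$/)
            $2 = "x"
        print
    }
    { prev = cur }
    END { if (NR) print prev }        # last line, untouched
    '
}

printf '%s\n' 'user1::53556:100::/home/user1:/bin/bash' \
    'user2: :53557:100::/home/user2:/bin/bash' \
    'user3::53558:100::/home/user3:/bin/bash' | fix_pw_keep_last
# prints:
# user1:x:53556:100::/home/user1:/bin/bash
# user2:x:53557:100::/home/user2:/bin/bash
# user3::53558:100::/home/user3:/bin/bash
```

Empty lines in the middle of the file are kept as they are, because the held line is printed whether or not it was edited.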
$ awk 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1' file
user1:x:53556:100::/home/user1:/bin/bash
user2:x:53557:100::/home/user2:/bin/bash
user3:x:53558:100::/home/user3:/bin/bash
To change the original file using GNU awk:
awk -i inplace 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1' file
or with any awk:
awk 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1' file > tmp && mv tmp file
The test for NF>1 ensures we only operate on lines that already have at least 2 fields and so we don't create a line like :x in the output when there's an empty line in the input file. The rest is hopefully obvious.
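To see what the NF>1 test buys, here is a small sketch that runs the same rule over input containing an empty line. The sample lines and the name fix_pw are mine, and [ \t] stands in for [[:space:]] so that only spaces and tabs count as blank:

```shell
# fix_pw: hypothetical helper; replaces an empty or blank 2nd field with "x",
# but only on lines that already have at least 2 colon-separated fields.
fix_pw() {
    awk 'BEGIN { FS = OFS = ":" } NF > 1 && $2 ~ /^[ \t]*$/ { $2 = "x" } 1'
}

# Second input line is empty; without the NF > 1 guard it would become ":x".
printf 'user1::53556:100::/home/user1:/bin/bash\n\nuser2:\t:53557:100::/home/user2:/bin/bash\n' |
    fix_pw
# prints:
# user1:x:53556:100::/home/user1:/bin/bash
# (an empty line)
# user2:x:53557:100::/home/user2:/bin/bash
```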
I would like to read and compare the first field from two files, then print:
Matching lines from both files (available in f11.txt and f22.txt) -> Op_Match.txt
Non-matching lines from f11.txt (available in f11.txt, not available in f22.txt) -> Op_NonMatch_f11.txt
Non-matching lines from f22.txt (available in f22.txt, not available in f11.txt) -> Op_NonMatch_f22.txt
I am using the 3 separate commands below to achieve this.
f11.txt
10,03-APR-14,abc
20,02-JUL-13,def
10,19-FEB-14,abc
20,02-AUG-13,def
10,22-JAN-07,abc
10,29-JUN-07,abc
40,11-SEP-13,ghi
f22.txt
50,DL,3000~4332,ABC~XYZ
10,DL,5000~2503,ABC~XYZ
30,AL,2000~2800,DEF~PQZ
Matching lines from both files:
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1] = $0; next} ($1 in a) {print $0,a[$1]}' f22.txt f11.txt> Op_Match.txt
10,03-APR-14,abc,10,DL,5000~2503,ABC~XYZ
10,19-FEB-14,abc,10,DL,5000~2503,ABC~XYZ
10,22-JAN-07,abc,10,DL,5000~2503,ABC~XYZ
10,29-JUN-07,abc,10,DL,5000~2503,ABC~XYZ
Non-matching lines from f11.txt:
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1] = $0; next} !($1 in a) {print $0}' f22.txt f11.txt > Op_NonMatch_f11.txt
20,02-JUL-13,def
20,02-AUG-13,def
40,11-SEP-13,ghi
Non-matching lines from f22.txt:
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1] = $0; next} !($1 in a) {print $0}' f11.txt f22.txt > Op_NonMatch_f22.txt
50,DL,3000~4332,ABC~XYZ
30,AL,2000~2800,DEF~PQZ
The 3 separate commands above cover the scenarios mentioned. Is there a simpler way that avoids 3 different commands? Any suggestions?
Something like this, untested:
awk '
BEGIN{ FS=OFS="," }
NR==FNR {
fname1 = FILENAME
keys[NR] = $1
recs[NR] = $0
key2nrs[$1] = ($1 in key2nrs ? key2nrs[$1] RS : "") NR
next
}
{
if ($1 in key2nrs) {
split (key2nrs[$1],nrs,RS)
for (i=1; i in nrs; i++) {
print recs[nrs[i]], $0 > "Op_Match.txt"
}
matched[$1]
}
else {
print > ("Op_NonMatch_" FILENAME)
}
}
END {
for (i=1; i in recs; i++) {
if (! (keys[i] in matched) ) {
print recs[i] > ("Op_NonMatch_" fname1)
}
}
}
' f11.txt f22.txt
The main difference between this and Kent's and Etan's answers is that theirs assume that the $1 in f22.txt can only appear once within that file, while the above would work if, say, 10 occurred as the first field on multiple lines of f22.txt.
The other difference is that the above will output lines in the same order that they occurred in the input files while the other answers will output some of them in random order based on how they're stored internally in a hash table.
I haven't checked @EdMorton's answer but he will quite likely have gotten it right.
My solution (which looks slightly less generic than his at first glance) is:
awk -F, -v OFS=, '
FNR==NR {
a[$1]=$0;
next
}
($1 in a){
print $0,a[$1] > "Op_Match.txt"
am[$1]++
}
!($1 in a) {
print $0 > "Op_NonMatch_f11.txt"
}
END {
for (i in a) {
if (!(i in am)) {
print a[i] > "Op_NonMatch_f22.txt"
}
}
}
' f22.txt f11.txt
here is one:
awk -F, -v OFS="," 'NR==FNR{a[$1]=$0;next}
$1 in a{print $0,a[$1]>("common.txt");c[$1];next}
{print $0>("NonMatchFromFile1.txt")}
END{for(x in a)
if(!(x in c))
print a[x]>("NonMatchFromFile2.txt")}' f2 f1
With this, you will get 3 files: common.txt, NonMatchFromFile1.txt and NonMatchFromFile2.txt.
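For trying the one-pass approach end to end, here is a minimal sketch wrapped in a function. The wrapper name split3, the dir variable and the order array are my additions. It assumes each key appears at most once in the f22-style file (a repeated key keeps its last line), that the first file read is not empty (otherwise NR == FNR also holds for the second file), and that the output directory already exists.

```shell
# split3: hypothetical wrapper. $1 = f11-style file, $2 = f22-style file,
# $3 = existing directory that receives the three output files.
split3() {
    awk -v dir="$3" '
    BEGIN { FS = OFS = "," }
    NR == FNR {                            # first file on the command line (f22)
        if (!($1 in rec)) order[++n] = $1  # remember first-seen key order
        rec[$1] = $0                       # a repeated key keeps its last line
        next
    }
    ($1 in rec) {
        print $0, rec[$1] > (dir "/Op_Match.txt")
        matched[$1]                        # referencing the element marks the key
        next
    }
    { print > (dir "/Op_NonMatch_f11.txt") }
    END {
        for (i = 1; i <= n; i++)
            if (!(order[i] in matched))
                print rec[order[i]] > (dir "/Op_NonMatch_f22.txt")
    }' "$2" "$1"
}

dir=$(mktemp -d)
printf '%s\n' '10,03-APR-14,abc' '20,02-JUL-13,def' > "$dir/f11.txt"
printf '%s\n' '50,DL,3000~4332,ABC~XYZ' '10,DL,5000~2503,ABC~XYZ' > "$dir/f22.txt"
split3 "$dir/f11.txt" "$dir/f22.txt" "$dir"
cat "$dir/Op_Match.txt"
```

Keeping the keys in a numerically indexed array means the unmatched f22 lines come out in input order rather than in hash order.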