awk search pattern in a specific field and replace its content - awk

I need to find password fields that are empty or contain only a space or a tab, and replace them with x (in the /etc/passwd file).
I found this awk syntax, which shows the users whose second field (using : as the delimiter) is empty or contains a space or a tab:
awk -F":" '($2 == "" || $2 == " " || $2 == "\t") {print $0}' $file
and the result is as follows:
user1::53556:100::/home/user1:/bin/bash
user2: :53557:100::/home/user2:/bin/bash
user3: :53558:100::/home/user3:/bin/bash
How can I tell awk to replace this 2nd field (empty, or containing a space or a tab) with another character (for example x)?

Could you please try following.
awk 'BEGIN{FS=OFS=":"} {$2=$2=="" || $2~/^[[:space:]]+$/?"X":$2} 1' Input_file
Explanation: Adding explanation of above code.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section here which will be executed before Input_file is being read.
FS=OFS=":" ##Setting FS and OFS as colon here for all lines of Input_file.
} ##Closing BEGIN section block here.
{
$2=$2=="" || $2~/^[[:space:]]+$/?"X":$2 ##Checking condition if $2(2nd field) of current line is either NULL or consists only of whitespace, then set its value to X, else keep $2 as it is.
}
1 ##mentioning 1 will print edited/non-edited current line.
' Input_file ##Mentioning Input_file name here.
EDIT: As per OP, the last line of Input_file must NOT be touched, so adding the following solution now.
tac Input_file | awk 'BEGIN{FS=OFS=":"} FNR==1{print;next} {$2=$2=="" || $2~/^[[:space:]]+$/?"X":$2} 1' | tac
EDIT2: In case you want to do it in a single awk itself then try following.
awk '
BEGIN{
FS=OFS=":"
}
NR>1{
num=split(prev,array,":")
array[2]=array[2]=="" || array[2]~/^[[:space:]]+$/?"X":array[2]
for(i=1;i<=num;i++){
val=(i>1?val OFS array[i]:array[i])
}
print val
val=""
}
{
prev=$0
}
END{
if(NR){
print prev
}
}
}' Input_file
In case you want to change Input_file itself, append > temp_file && mv temp_file Input_file to the above code.

$ awk 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1' file
user1:x:53556:100::/home/user1:/bin/bash
user2:x:53557:100::/home/user2:/bin/bash
user3:x:53558:100::/home/user3:/bin/bash
To change the original file using GNU awk:
awk -i inplace 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1' file
or with any awk:
awk 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[[:space:]]*$/){$2="x"} 1' file > tmp && mv tmp file
The test for NF>1 ensures we only operate on lines that already have at least 2 fields and so we don't create a line like :x in the output when there's an empty line in the input file. The rest is hopefully obvious.
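To see what the NF>1 guard buys, here is a small sketch on made-up input. The two accounts and the blank line are hypothetical, the function names are mine, and the whitespace class is spelled [ \t] only so that awks without POSIX character classes behave the same:

```shell
# Hypothetical passwd-style input: an empty password field, a blank line, a tab-only field.
sample() { printf 'a::1:1::/home/a:/bin/sh\n\nb:\t:2:2::/home/b:/bin/sh\n'; }

# With the guard: the blank line has NF == 0, so it passes through untouched.
with_guard()    { awk 'BEGIN{FS=OFS=":"} (NF>1) && ($2~/^[ \t]*$/){$2="x"} 1'; }

# Without the guard: assigning $2 on a blank line creates fields 1 and 2, giving ":x".
without_guard() { awk 'BEGIN{FS=OFS=":"} ($2~/^[ \t]*$/){$2="x"} 1'; }

sample | with_guard
sample | without_guard
```

The first call should leave the blank line blank, while the second should turn it into :x.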

Related

awk to add prefix if not present in field

I am trying to add a prefix to a field in awk if it is not already present. That is if chr isn't present before the number it is inserted. However, if it is there it is skipped.
The first awk adds the prefix to each $2 even if it is already present, and the second awk does skip the $2 with chr in them, but does not add chr to the $2 without. Thank you :).
file
ASPA,17:3483575-3483585
ATM,11:108289609-108289613
ATP7B,13:51937469-51937480
ATR,chr3:142562768-142562773
BAG3,chr10:119670120-119670123
desired
ASPA,chr17:3483575-3483585
ATM,chr11:108289609-108289613
ATP7B,chr13:51937469-51937480
ATR,chr3:142562768-142562773
BAG3,chr10:119670120-119670123
awk
awk -F, '{$2="chr"$2; print}' file
awk 2
awk -F, '$2 !~/chr/{gsub("chr","chr",$2)}1' file
You can use:
awk 'BEGIN {FS=OFS=","} $2 !~ /^chr/ {$2="chr" $2} 1' file
ASPA,chr17:3483575-3483585
ATM,chr11:108289609-108289613
ATP7B,chr13:51937469-51937480
ATR,chr3:142562768-142562773
BAG3,chr10:119670120-119670123
Or without using any regex:
awk 'BEGIN {FS=OFS=","} index($2 , "chr") != 1 {$2="chr" $2} 1' file
Another solution that might be shortest of all:
awk '{sub(/,(chr)?/, ",chr")} 1' file
1st solution: With your shown samples, please try following awk code.
awk '
BEGIN{FS=OFS=":"}
{
split($1,arr,",")
if(int(arr[2]) || arr[2]==0){
$1=arr[1] ",chr" arr[2]
}
}
1
' Input_file
2nd solution: With GNU awk using its match function which captures values into an array from capturing groups try following code.
awk '
match($0,/^([^,]*,)([^:]*)(:.*)/,arr){
if(int(arr[2]) || arr[2]==0){
arr[2]="chr" arr[2]
}
print arr[1] arr[2] arr[3]
}
' Input_file
3rd solution(Bonus one): Just in case your 2nd field is having Negative values(integers) and you want to change it Eg: from -11 to -chr11 then you can try following GNU awk code.
awk '
match($0,/^([^,]*,)(-)?([^:]*)(:.*)/,arr){
if(int(arr[3]) || arr[3]==0){
if(arr[2]=="-"){
arr[3]="-chr" arr[3]
}
else{
arr[3]="chr" arr[3]
}
$0=arr[1] arr[3] arr[4]
}
print
}
' Input_file
mawk NF=NF FS=',(chr)?' OFS=',chr'
ASPA,chr17:3483575-3483585
ATM,chr11:108289609-108289613
ATP7B,chr13:51937469-51937480
ATR,chr3:142562768-142562773
BAG3,chr10:119670120-119670123
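The mawk one-liner above is terse: NF=NF is the whole program (the assignment forces the record to be rebuilt and is true for any line with fields), and FS and OFS are passed as variable-assignment operands. A spelled-out sketch of the same idea for any awk, with add_chr being a name made up for illustration:

```shell
# Same idea as the one-liner: the separator regex swallows an existing "chr",
# and rebuilding the record writes ",chr" back as the output separator.
# Assumes exactly one comma per line: every separator is rewritten, so any
# extra comma would gain a "chr" as well.
add_chr() {
  awk 'BEGIN { FS = ",(chr)?"; OFS = ",chr" } { $1 = $1 } 1' "$@"
}

printf 'ASPA,17:3483575-3483585\nATR,chr3:142562768-142562773\n' | add_chr
```

With no arguments the function reads standard input; with a file name (add_chr file) it reads that file.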

Append nextline to current line until pattern matched in awk

Input file data:
"1","123","hh
KKK,111,ll
Jk"
"2","124","jj"
Output data:
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
I tried the code below in an awk file, but it still does not produce the desired output:
BEGIN{
FS="\",\"";
record_lock_flag=0;
total_feilds=3;
tmp_field_count=0;
tmp_rec_buff="";
lines=0;
}
{
if(NR>0)
{
if( record_lock_flag == 0 && NF == total_feilds && substr($NF,length($NF)-1,length($NF)) ~ /^"/ )
{
print $0;
}
else
{
tmp_rec_buff=tmp_rec_buff$0 ;
tmp_field_count=tmp_field_count+NF ;
if ( $0 != "")
{ lines++ ;}
rec_lock_flag=1 ;
if(tmp_field_count==exp_fields+lines-1){
print tmp_rec_buff;
record_lock_flag=0;
tmp_field_count=0;
tmp_rec_buff="";
lines=0;
}
}
}
}
END{
}
Using any awk in any shell on every Unix box:
$ awk 'BEGIN{RS=ORS="\""} !(NR%2){gsub(/\n/," ")} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
See also What's the most robust way to efficiently parse CSV using awk?.
Using gnu-awk we can break records using text "\n" then remove \n from each record and finally append "\n" in the end using same ORS (assuming there are no blank fields with opening and closing quotes on separate lines):
awk -v RS='"\n("|$)' '{gsub(/\n/, " "); ORS=RT} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
Another version using gnu-awk if you already know number of fields in each record as shown in your question:
awk -v n=3 -v FPAT='"[^"]*"' 'p {$0 = p " " $0; p=""}
NF < n {p = $0; next} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
With your shown samples only, you could try following awk code. Written and tested with GNU awk.
awk -v RS="" -v FS="\n" '
{
for(i=1;i<=NF;i++){
sum+=gsub(/"/,"&",$i)
val=(val?val OFS:"")$i
if(sum%2==0){
print val
sum=0
val=""
}
}
}
' Input_file
Explanation: Adding detailed explanation for above.
awk -v RS="" -v FS="\n" ' ##Starting awk program from here, setting RS as NULL and field separator as new line.
{
for(i=1;i<=NF;i++){ ##Traversing through all fields here.
sum+=gsub(/"/,"&",$i) ##Globally substituting " with itself and keeping its count to sum variable.
val=(val?val OFS:"")$i ##Creating val which has current field in it and keep appending its value to it.
if(sum%2==0){ ##Checking if sum is even number then do following.
print val ##Printing val here.
sum=0 ##Setting sum to 0 here.
val="" ##Nullifying val here.
}
}
}
' Input_file ##Mentioning Input_file name here.
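The quote-counting idea above does not depend on paragraph mode. Here is a sketch of the same parity check applied line by line, which should work in any POSIX awk. The join_quoted name is made up, and it assumes that quotes inside a field are escaped by doubling them, so they never change the parity:

```shell
join_quoted() {
  awk '
    {
      buf = (buf == "" ? $0 : buf " " $0)   # append this line to the pending record
      quotes += gsub(/"/, "&")              # gsub returns how many quotes it replaced
    }
    quotes % 2 == 0 {                       # even count: every quoted field is closed
      print buf
      buf = ""; quotes = 0
    }
  ' "$@"
}

printf '"1","123","hh\nKKK,111,ll\nJk"\n"2","124","jj"\n' | join_quoted
```

A record is printed only once its running quote count is even, so a line break inside a quoted field keeps the record open until the closing quote arrives.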
With awk setting ORS:
awk '{ORS = (!/"$/) ? " " : "\n"} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"

How to fetch a particular string using a sed command

I have an input string like below:
VAL:1|b:2|c:3|VAL:<har:919876543210#abc.com>; tag=vy6r5BpcvQ|VAl:1234|name:mnp|VAL:91987654321
Like this, there are more than 1000 rows.
I want to fetch the value of the first parameter, i.e., the a field and d field, but for the d field I want only har:919876543210#abc.com.
I tried like this:
cat $filename | grep -v Orig |sed -e 's/['a:','d:']//g' |awk -F'|' -v OFS=',' '{print $1 "," $4}' >> $NGW_DATA_FILE
The output I got is below:
1,<har919876543210#abc.com>; tag=vy6r5BpcvQ
I want it like this,
1,har:919876543210#abc.com
Where did I make the mistake and how do I solve it?
EDIT: As per OP's change of Input_file and OP's comments, adding following now.
awk '
BEGIN{ FS="|"; OFS="," }
{
sub(/[^:]*:/,"",$1)
gsub(/^[^<]*|; .*/,"",$4)
gsub(/^<|>$/,"",$4)
print $1,$4
}' Input_file
With shown samples, could you please try following, written and tested with shown samples in GNU awk.
awk '
BEGIN{
FS="|"
OFS=","
}
{
val=""
for(i=1;i<=NF;i++){
split($i,arr,":")
if(arr[1]=="a" || arr[1]=="d"){
gsub(/^[^:]*:|; .*/,"",$i)
gsub(/^<|>$/,"",$i)
val=(val?val OFS:"")$i
}
}
print val
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of this program from here.
FS="|" ##Setting FS as pipe here.
OFS="," ##Setting OFS as comma here.
}
{
val="" ##Nullify val here(to avoid conflicts of its value later).
for(i=1;i<=NF;i++){ ##Traversing through all fields here
split($i,arr,":") ##Splitting current field into arr with delimiter by :
if(arr[1]=="a" || arr[1]=="d"){ ##Checking condition if first element of arr is either a OR d
gsub(/^[^:]*:|; .*/,"",$i) ##Globally substituting from starting till 1st occurrence of colon OR from semi colon to everything with NULL in $i.
gsub(/^<|>$/,"",$i) ##Globally substituting the leading < and the trailing > with NULL in $i.
val=(val?val OFS:"")$i ##Creating variable val which has current field value and keep adding in it.
}
}
print val ##printing val here.
}
' Input_file ##Mentioning Input_file name here.
You may also try this AWK script:
cat file
VAL:1|b:2|c:3|VAL:<har:919876543210#abc.com>; tag=vy6r5BpcvQ|VAl:1234|name:mnp|VAL:91987654321
awk -F '[|;]' '{
s=""
for (i=1; i<=NF; ++i)
if ($i ~ /^VAL:/) {
gsub(/^[^:]+:|[<>]*/, "", $i)
s = (s == "" ? "" : s "," ) $i
}
print s
}' file
1,har:919876543210#abc.com
You can do the same thing with sed rather easily using Extended Regex, two capture groups and two back-references, e.g.
sed -E 's/^[^:]*:(\w+)[^<]*[<]([^>]+).*$/\1,\2/'
Explanation
's/find/replace/' standard substitution, where the find is;
^[^:]*: from the beginning skip through the first ':', then
(\w+) capture one or more word characters ([a-zA-Z0-9_]), then
[^<]*[<] consume zero or more characters not a '<', then the '<', then
([^>]+) capture everything not a '>', and
.*$ discard all remaining chars in line, then the replace is
\1,\2 reinsert the captured groups separated by a comma.
Example Use/Output
$ echo 'a:1|b:2|c:3|d:<har:919876543210#abc.com>; tag=vy6r5BpcvQ|' |
sed -E 's/^[^:]*:(\w+)[^<]*[<]([^>]+).*$/\1,\2/'
1,har:919876543210#abc.com
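Note that \w is a GNU extension. Where that matters, the same substitution can be sketched with the POSIX class [[:alnum:]_] instead (extract is a name made up for illustration):

```shell
# [[:alnum:]_] is the POSIX spelling of the class that \w stands for in GNU sed.
extract() {
  sed -E 's/^[^:]*:([[:alnum:]_]+)[^<]*<([^>]+).*$/\1,\2/' "$@"
}

echo 'a:1|b:2|c:3|d:<har:919876543210#abc.com>; tag=vy6r5BpcvQ|' | extract
```

Lines that do not match the pattern are printed unchanged. If they should be dropped instead, sed -n with the p flag on the substitution prints only the lines that were rewritten.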

If one string matches at the beginning of last line in a specific file then replace the other string from same line, using regex groups?

I have a file "test"
Below is the content
235788###235788###20200724_103122###SUCCESS
235791###235791###20200724_105934###SUCCESS
235833###235833###20200724_130652###FAILURE
235842###235842###20200724_132721###FAILURE
235852###235852###20200724_134607###FAILURE
235791###235791###20200724_105934###SUCCESS
If the last line of this file begins with 235791, then replace the string "SUCCESS" with "FAILURE" on just that line.
Expected Output
235788###235788###20200724_103122###SUCCESS
235791###235791###20200724_105934###SUCCESS
235833###235833###20200724_130652###FAILURE
235842###235842###20200724_132721###FAILURE
235852###235852###20200724_134607###FAILURE
235791###235791###20200724_105934###FAILURE
Below is the sample code
id = 235791
last_build_id = `tail -1 test | awk -F'###' '{print \$1}'`
if (id == last_build_id ){
sed -i '$s/SUCCESS/FAILURE/' test
}
I would like to avoid this many lines and use a one-line shell command, using regex groups or any other simple way.
sed might be easier here
$ sed -E '$s/(^235791#.*)SUCCESS$/\1FAILURE/' file
you can add -i for in place update.
To pass id as a variable
$ id=235791; sed -E '$s/(^'$id'#.*)SUCCESS$/\1FAILURE/' file
you should double quote "$id" ideally, but if you're sure about the contents you may get away without.
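For reference, a sketch of the double-quoted form, wrapped in a hypothetical helper (mark_last_failed is my name, not part of the answer). The id is still spliced into the regex, so it is assumed to hold digits only:

```shell
# Hypothetical helper: $1 is the build id, $2 the file to read.
# The id becomes part of the regex, so it is assumed to contain
# no regex metacharacters and no slashes.
mark_last_failed() {
  sed -E '$s/^('"$1"'#.*)SUCCESS$/\1FAILURE/' "$2"
}

tmp=$(mktemp)
printf '235791###235791###20200724_105934###SUCCESS\n235791###235791###20200724_105934###SUCCESS\n' > "$tmp"
mark_last_failed 235791 "$tmp"
rm -f "$tmp"
```

Only the second (last) line should come out with FAILURE. The single-quoted parts keep the $ address and the $ anchor away from the shell, and the double-quoted "$1" in the middle is the only piece the shell expands.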
With GNU sed
sed -E '${/^235791\>/ s/SUCCESS$/FAILURE/}' file
Or with the BSD sed on MacOS
sed -E '${/^235791#/ s/SUCCESS$/FAILURE/;}' file
When working with "the last X in the file", it's often easier to reverse the file and work with "the first X":
tac file | awk '
BEGIN {FS = OFS = "###"}
NR == 1 && $1 == 235791 && $NF == "SUCCESS" {$NF = "FAILURE"}
1
' | tac
Could you please try following, written and tested with shown samples in GNU awk. You need not to use many commands for this one, we could do this in a single awk itself.
One liner form of code:
awk -v id="$your_shell_variable" 'BEGIN{ FS=OFS="###" } NR>1{print prev} {prev=$0} END{if($1==id && $NF=="SUCCESS"){$NF="FAILURE"}; print}' Input_file > temp && mv temp Input_file
Explanation: Adding detailed explanation for above.
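awk -v id="$your_shell_variable" ' ##Starting awk program from here and creating awk variable id from the shell variable.
BEGIN{ FS=OFS="###" } ##Setting field separator and output field separator as ### here.
NR>1{ ##Checking condition if current line number is greater than 1 then do following.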
print prev ##Printing prev here.
}
{
prev=$0 ##Assigning current line to prev here.
}
END{ ##Starting END block of this program from here.
if($1==id && $NF=="SUCCESS"){ ##Checking condition if first field is equal to id and last field is SUCCESS then do following.
$NF="FAILURE" ##Setting last field FAILURE here.
}
print ##Printing last line here.
}
' Input_file > temp && mv temp Input_file ##Mentioning Input_file name here.
2nd solution: As per Ed sir's comment, some awks don't retain $1, $NF in the END section, so if the above doesn't work for someone please try the more generic solution as follows.
One liner form of solution(since specifically asking it):
awk -v id="$your_shell_variable" 'BEGIN{ FS=OFS="###" } NR>1{print prev} {prev=$0} END{num=split(prev,array,"###");if(array[1]==id && array[num]=="SUCCESS"){array[num]="FAILURE"};for(i=1;i<=num;i++){val=(val?val OFS:"")array[i]};print val}' Input_file > temp && mv temp Input_file
Detailed level(non-one liner code):
awk -v id="$your_shell_variable" '
BEGIN{ FS=OFS="###" }
NR>1{
print prev
}
{
prev=$0
}
END{
num=split(prev,array,"###")
if(array[1]==id && array[num]=="SUCCESS"){
array[num]="FAILURE"
}
for(i=1;i<=num;i++){
val=(val?val OFS:"")array[i]
}
print val
}
' Input_file > temp && mv temp Input_file
$ awk -v val='235791' '
BEGIN { FS=OFS="###" }
NR>1 { print prev }
{ prev=$0 }
END {
$0=prev
if ($1 == val) {
$NF="FAILURE"
}
print
}
' file
235788###235788###20200724_103122###SUCCESS
235791###235791###20200724_105934###SUCCESS
235833###235833###20200724_130652###FAILURE
235842###235842###20200724_132721###FAILURE
235852###235852###20200724_134607###FAILURE
235791###235791###20200724_105934###FAILURE

How to join two files using awk?

Have two files: 1.txt and 2.txt
1.txt has items and their order in this form:
item-code|order-value|label
2.txt has items and their properties in this form:
item-code|property-A|property-B| ... |property-Z
For example, 1.txt looks like this:
ITEM-CODE|_o_o_|prefLabel-EN-ANSI
6|8719|disparlure
7|3300|acids,-bases,-and-salts
8|3299|chemical-compounds
2.txt looks like this:
ITEM-CODE|TERM|AV-FTC|DB-PEDIA-IRI|LCSH-1|LCSH-2|LCSH-3|LCSH-4|LCSH-5|LCSH-6|LCSH-7|GACS-IRI
2|positive-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C4028
4|negative-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C3806
6|disparlure|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_
7|acids,-bases,-and-salts|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_
8|chemical-compounds|c_49870|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C29686
sample 3.txt (result - see below) looks like this:
ITEM-CODE|TERM|AV-FTC|DB-PEDIA-IRI|LCSH-1|LCSH-2|LCSH-3|LCSH-4|LCSH-5|LCSH-6|LCSH-7|GACS-IRI|_o_o_
2|positive-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C4028|NULL
4|negative-sense,-single-stranded-RNA-viruses|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|http://id.agrisemantics.org/gacs/C3806|NULL
6|disparlure|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|_0_|8719
this awk script:
BEGIN { FS=OFS="|" }
NR==FNR{
a[$1]=$2
next
}
{
if ($1 in a)
$(NF+1)=a[$1]
else
$(NF+1)="NULL"
print
}
generates:
item-code|label|property-A|property-B| ... |property-Z|order-value
if no item-code from 1.txt matches item-code in 2.txt, NULL is substituted for the missing order-value
How to modify the awk function to keep 1.txt on the left (the "constant") and 2.txt on the right (the "variables") and generate a result like this:
item-code|order-value|label|property-A|property-B| ... |property-Z
or, if no property-value is available for item-code, then
item-code|order-value|label|NULL
the command looks like this:
C:\gnu\GnuWin32\bin\awk.exe -f a.awk 1.txt 2.txt > 3.txt
where a.awk is the awk script above.
Am running awk on Win10 and using double quotes
Could you please try following.
awk '
BEGIN{
FS=OFS="|"
}
FNR==1 && ++count==1{
val=$2
next
}
FNR==1 && count==2{
print $0,val
next
}
FNR==NR{
a[$1]=$2
next
}
{
print $0,a[$1]?a[$1]:"NULL"
}
' 1.txt 2.txt
Explanation: Adding explanation for above code too now.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section for awk program here.
FS=OFS="|" ##Setting field separator and output field separator as pipe here.
} ##Closing BEGIN section here.
FNR==1 && ++count==1{ ##Checking condition if FNR==1 and variable count value is 1 means first Input_file header is being read.
val=$2 ##Creating variable val and setting its value as $2 here.
next ##Next will skip all further statements from here onwards.
} ##Closing this condition block.
FNR==1 && count==2{ ##Checking condition where FNR==1 and count variable value is 2 here (count was already incremented by the condition above), means second Input_file header is being read.
print $0,val ##Printing current line with variable val here.
next ##Next will skip all further statements from here.
}
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when 1.txt is being read.
a[$1]=$2 ##Creating an array named a whose index is $1 and value is $2.
next ##next will skip all further statements from here.
}
{
print $0,a[$1]?a[$1]:"NULL" ##Printing current line and printing value of a[$1] if a[$1] is having no value then print NULL.
}
' 1.txt 2.txt ##Mentioning Input_file names here.
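The code above still appends the order-value to the lines of 2.txt. To keep 1.txt on the left as the question asks, one possible sketch reads 2.txt first and appends its properties to each line of 1.txt. The left_join and props names are made up, and the small sample files are hypothetical:

```shell
# Sketch: remember everything after the key of each 2.txt line, then append
# that to the matching 1.txt line, or NULL when there is no match.
left_join() {
  awk '
    BEGIN { FS = OFS = "|" }
    NR == FNR {                    # first file on the awk command line (2.txt)
      key = $1
      sub(/^[^|]*[|]/, "")         # drop the key and its separator from $0
      props[key] = $0
      next
    }
    { print $0, (($1 in props) ? props[$1] : "NULL") }
  ' "$2" "$1"
}

dir=$(mktemp -d)
printf 'ITEM-CODE|_o_o_|label\n6|8719|disparlure\n9|1|missing\n' > "$dir/1.txt"
printf 'ITEM-CODE|TERM|GACS-IRI\n6|disparlure|_0_\n' > "$dir/2.txt"
left_join "$dir/1.txt" "$dir/2.txt"
rm -rf "$dir"
```

The header lines join like any other line because both start with ITEM-CODE. On Windows the awk body between the single quotes can go into a.awk and be run with -f a.awk 2.txt 1.txt; note the reversed file order.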
You can do that with join, provided both files are sorted on the join field.
1.txt
1|48000|first
2|67500|second
3|81990|third
4|55000|fourth
2.txt
1|fred|sara|anthony
3|steve|jane|mike
4|tim
Then run:
join -a 1 -e "NULL" -t '|' -o 1.1,1.2,1.3,2.2,2.3,2.4 1.txt 2.txt
Sample Result
1|48000|first|fred|sara|anthony
2|67500|second|NULL|NULL|NULL
3|81990|third|steve|jane|mike
4|55000|fourth|tim|NULL|NULL
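join expects both inputs to be sorted on the join field. If the files are not sorted already, a wrapper along these lines could sort them first (join_sorted is a made-up name and the sample data is hypothetical):

```shell
# Hypothetical wrapper: sort both inputs on field 1, then join them.
# The fixed -o list assumes 3 fields in the first file and up to 4 in the second.
join_sorted() {
  s1=$(mktemp) s2=$(mktemp)
  sort -t '|' -k 1,1 "$1" > "$s1"
  sort -t '|' -k 1,1 "$2" > "$s2"
  join -a 1 -e NULL -t '|' -o 1.1,1.2,1.3,2.2,2.3,2.4 "$s1" "$s2"
  rm -f "$s1" "$s2"
}

dir=$(mktemp -d)
printf '3|81990|third\n1|48000|first\n2|67500|second\n' > "$dir/1.txt"
printf '3|steve|jane|mike\n1|fred|sara|anthony\n' > "$dir/2.txt"
join_sorted "$dir/1.txt" "$dir/2.txt"
rm -rf "$dir"
```

The output comes back in key order rather than in the original line order, and a header line would be sorted along with the data, so it would need to be split off first.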