Bash - Update a value in a file using sed command - awk

I am trying to update the value of key1[count] to 2 in the following file.
#file1
//some additional lines
key1 = { var1 = "1", var2 = "12", "count" = 1 }
key2 = { var1 = "32", var2 = "23", "count" = 1 }
key3 = { var1 = "22", var2 = "32", "count" = 3 }
key4 = { var1 = "32", var2 = "12", "count" = 3 }
//some additional lines
I tried awk with in-place editing:
awk -i inplace '{ if ( $1 == "key1" ) $12 = 2 ; print $0 }' file1
However, -i inplace is only available in GNU awk 4.1.0 and higher.

With your shown samples, please try the following awk code.
awk -v newVal="2" '
/^key1/ && match($0,/"count"[[:space:]]+=[[:space:]]+[0-9]+/){
value=substr($0,RSTART,RLENGTH)
sub(/[0-9]+$/,"",value)
$0=substr($0,1,RSTART-1) value newVal substr($0,RSTART+RLENGTH)
}
1
' Input_file
Explanation: For lines that start with key1, match() looks for the regex "count"[[:space:]]+=[[:space:]]+[0-9]+, i.e. the count key together with its current value. The matched text is copied into the variable value, and the trailing digits are removed from it. The line is then rebuilt from the text before the match, value, the new value (newVal), and the text after the match. Finally, the pattern 1 prints every line, changed or not.
NOTE: The above only prints the result to the terminal. Once you are happy with the output, append > temp && mv temp Input_file to the command to save it back to the file.
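The redirect-and-move step from the note can be wrapped in a small function. This is a minimal sketch rather than part of the original answer: the function name update_count and its arguments are hypothetical, it compares the first field with the key instead of using /^key1/, and it assumes mktemp is available. Note that the replaced file ends up with the temporary file's permissions.

```shell
#!/bin/bash
# update_count FILE KEY NEWVAL  (hypothetical helper, for illustration only)
# Rewrites the "count" value on the line whose first field is KEY,
# writes the result to a temporary file, then replaces FILE with it.
update_count() {
    local file=$1 key=$2 newVal=$3 tmp
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v newVal="$newVal" '
        $1 == key && match($0, /"count"[[:space:]]+=[[:space:]]+[0-9]+/) {
            value = substr($0, RSTART, RLENGTH)   # the matched text, e.g. "count" = 1
            sub(/[0-9]+$/, "", value)             # strip the old digits
            $0 = substr($0, 1, RSTART - 1) value newVal substr($0, RSTART + RLENGTH)
        }
        1
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

Usage would look like: update_count file1 key1 2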

You can use sed with the -r option, which enables extended regular expressions in GNU sed (-E is the more portable spelling).
For your example, you can use this command.
sed -i -r 's/(key1.*"count" = )([0-9]*)(.*)/\12\3/g' <file>
More information:
Command template:
sed -i -r 's/(<bef>)(<tar>)(<aft>)/\1<new value>\3/g' <file>
Where
<bef> is a regex that targets /key1... "count" = /
<tar> is a regex that targets /1/
<aft> is a regex that targets / }/
Together, /<bef><tar><aft>/ will expand to become:
/key1 = { var1 = "1", var2 = "12", "count" = 1 }/
Since you want to replace "1" with "2"
<new value> is 2
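To see how the three groups fit together, the template can be tried on a single line without touching a file. This is a sketch under assumptions: it uses -E in place of -r, anchors the match with ^key1 so that only lines starting with key1 are changed, and drops -i so nothing is modified on disk.

```shell
#!/bin/bash
line='key1 = { var1 = "1", var2 = "12", "count" = 1 }'

# \1 = everything up to and including the text: "count" =
# \2 = the old digits (discarded), \3 = the rest of the line.
# In the replacement, \12 means group 1 followed by a literal 2.
printf '%s\n' "$line" |
    sed -E 's/^(key1 .*"count" = )([0-9]+)(.*)/\12\3/'
```

For the sample line this prints key1 = { var1 = "1", var2 = "12", "count" = 2 }.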

Related

change field value of one file based on another input file using awk

I have a sparse matrix ("matrix.csv") with 10k rows and 4 columns (1st column is "user", and the rest columns are called "slots" and contain 0s or 1s), like this:
user1,0,1,0,0
user2,0,1,0,1
user3,1,0,0,0
Some of the slots that contain a "0" should be changed to contain a "1".
I have another file ("slots2change.csv") that tells me which slots should be changed, like this:
user1,3
user3,2
user3,4
So for user1, I need to change slot3 to contain a "1" instead of a "0", and for user3 I should change slot2 and slot4 to contain a "1" instead of a "0", and so on.
Expected result:
user1,0,1,1,0
user2,0,1,0,1
user3,1,1,0,1
How can I achieve this using awk or sed?
Looking at this post: awk or sed change field of file based on another input file, a user proposed an answer that is valid if the "slots2change.csv" file does not contain the same user in different rows, which is not the case here.
The solution proposed was:
awk 'BEGIN{FS=OFS=","}
NR==FNR{arr[$1]=$2;next}
NR!=FNR {for (i in arr)
if ($1 == i) {
F=arr[i] + 1
$F=1
}
print
}
' slots2change.csv matrix.csv
But that answer doesn't apply when the "slots2change.csv" file contains the same user in different rows, as is now the case.
Any ideas?
Using GNU awk for arrays of arrays:
$ cat tst.awk
BEGIN { FS=OFS="," }
NR == FNR {
users2slots[$1][$2]
next
}
$1 in users2slots {
for ( slot in users2slots[$1] ) {
$(slot+1) = 1
}
}
{ print }
$ awk -f tst.awk slots2change.csv matrix.csv
user1,0,1,1,0
user2,0,1,0,1
user3,1,1,0,1
or using any awk:
$ cat tst.awk
BEGIN { FS=OFS="," }
NR == FNR {
if ( !seen[$0]++ ) {
users2slots[$1] = ($1 in users2slots ? users2slots[$1] FS : "") $2
}
next
}
$1 in users2slots {
split(users2slots[$1],slots)
for ( idx in slots ) {
slot = slots[idx]
$(slot+1) = 1
}
}
{ print }
$ awk -f tst.awk slots2change.csv matrix.csv
user1,0,1,1,0
user2,0,1,0,1
user3,1,1,0,1
Using sed
while IFS="," read -r user slot; do
sed -Ei "/^$user,/{s/(([^,]*,){$slot})[^,]*/\11/}" matrix.csv
done < slots2change.csv
$ cat matrix.csv
user1,0,1,1,0
user2,0,1,0,1
user3,1,1,0,1
If the order in which the users are outputted doesn't matter then you could do something like this:
awk '
BEGIN { FS = OFS = "," }
FNR == NR {
fieldsCount[$1] = NF
for (i = 1; i <= NF; i++ )
matrix[$1,i] = $i
next
}
{ matrix[$1,$2+1] = 1 }
END {
for ( id in fieldsCount ) {
nf = fieldsCount[id]
for (i = 1; i <= nf; i++)
printf "%s%s", matrix[id,i], (i < nf ? OFS : ORS)
}
}
' matrix.csv slots2change.csv
user1,0,1,1,0
user2,0,1,0,1
user3,1,1,0,1
This might work for you (GNU sed):
sed -E 's#(.*),(.*)#/^\1,/s/,[01]/,1/\2#' fileChanges | sed -f - fileCsv
Create a sed script from the file containing the changes and apply it to the intended file.
The first sed invocation builds an address and a substitution command from each line of the changes file. That generated script is then piped to a second invocation of sed, which applies it to the csv file.
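It can help to look at the generated script before applying it. The sketch below prints the intermediate sed commands for the sample slots2change.csv rows. The function name make_script is hypothetical, and the user name is anchored with a trailing comma (an added assumption) so that user1 cannot also match user10.

```shell
#!/bin/bash
# make_script: turn "user,slot" rows on stdin into one sed command per row
# (hypothetical helper, for illustration only).
make_script() {
    sed -E 's#(.*),(.*)#/^\1,/s/,[01]/,1/\2#'
}

# Each generated command replaces the Nth ",0" or ",1" on the matching
# line with ",1", where N is the slot number.
printf '%s\n' 'user1,3' 'user3,2' 'user3,4' | make_script
```

For the three sample rows this prints /^user1,/s/,[01]/,1/3, /^user3,/s/,[01]/,1/2 and /^user3,/s/,[01]/,1/4, one per line.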

How to make an array of alphabets from a file and update in a new file

I have a single column file.
A
A
A
B
B
B
C
C
D
I want to use this file to create a new one, as below:
command="A" "B" "C" "D"
TYPE=1 1 1 2 2 2 3 3 4,
A, B, C and D are arbitrary letters that vary from file to file.
I tried to solve this with the shell script below:
#!/bin/bash
NQ=$(cat RLP.txt | wc -l)
ELEMENT='element='
echo "$ELEMENT" > element.txt
TYPE='types='
echo "$TYPE" > types.txt
for i in `seq 1 1 $NQ`
do
RLP=$(echo "$i" | tail -n 1)
cat RLP.txt | head -n "$RLP" | tail -n 1 > el.$RLP.txt
done
paste element.txt el.*.txt
paste types.txt
The output of paste element.txt el.*.txt is element= A A A B B B C C D
I could not remove the repeated letters and put the remaining letters in double quotes,
and could not work out the second command to get
TYPE=1 1 1 2 2 2 3 3 4,
which represents that the 1st letter is repeated three times, the 2nd letter three times, the 3rd letter twice, and so on.
$ cat tst.awk
!seen[$1]++ {
cmd = cmd sep "\"" $1 "\""
cnt++
}
{
type = type sep cnt
sep = OFS
}
END {
print "command=" cmd
print "TYPE=" type ","
}
$ awk -f tst.awk file
command="A" "B" "C" "D"
TYPE=1 1 1 2 2 2 3 3 4,
Instead of using multiple text processing tools in a pipeline, this can be achieved by one awk command as below
awk '
{
unique[$0]
}
prev != $0 {
idx++
}
{
prev = $0
alpha[NR] = idx
}
END {
for (i in unique) {
str = str ? (str " " "\"" i "\"") : "\"" i "\""
}
first = "command=" str
str = ""
for (i = 1; i <= NR; i++) {
str = str ? (str " " alpha[i]) : alpha[i]
}
second = "TYPE=" str ","
print(first "\n" second) > "types.txt"
close("types.txt")
}' RLP.txt
The command works as follows
Each unique line in the file is saved as an index in the array unique
The array alpha keeps track of the unique value counter, i.e. every time a value in the file changes, the counter is incremented at the corresponding line number NR
The END block is all about constructing the output from the array to a string value and writing the result to the new file "types.txt"
Pure bash implementation. Requires at least Bash version 4 for the associative array
#!/bin/bash
outfile="./RLP.txt"
infile="./infile"
declare -A map
while read line; do
(( map["$line"]++ ))
done < "$infile"
command="command="
command+=$(printf "\"%s\" " "${!map[@]}")
type="$(
for i in "${map[@]}"; do
((k++))
for (( j=0; j < i; j++ )); do
printf " %d" "$k"
done
done
),"
echo "$command" >> "$outfile"
echo "TYPE=${type#* }" >> "$outfile"

read file and extract variables based on what is in the line

I have a file that looks like this:
$ cat file_test
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
I want to go through each line and extract specific "variables" to use in the loop. And if a line doesn't have a variable then set it to an empty string.
So, for the above example, let's say I want to extract the variables A, B, and C; then for each line, the loop would have this:
garbage text A=one B=two C=three D=four
A = "one"
B = "two"
C = "three"
garbage text A= B=six D=seven
A = ""
B = "six"
C = ""
garbage text A=eight E=nine D=ten B=eleven
A = "eight"
B = "eleven"
C = ""
My original plan was to use sed but that won't work since the order of the "variables" is not consistent (the last line for example) and a "variable" may be missing (the second line for example).
My next thought is to go through line by line, then split the line into fields using awk and set variables based on each field but I have no clue where or how to start.
I'm open to other ideas or better suggestions.
The right answer depends on what you're going to do with the variables.
Assuming you need them as shell variables, here is a different approach:
$ while IFS= read -r line;
do A=""; B=""; C="";
source <(echo "$line" | grep -oP "(A|B|C)=\w*" );
echo "A=$A B=$B C=$C";
done < file
A=one B=two C=three
A= B=six C=
A=eight B=eleven C=
The trick is to source the variable assignments that grep extracts from each line. Since assigned values carry over, you need to reset them before each new line.
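The same reset-then-assign idea can also be written without grep -P (a GNU grep extension) or source, using only shell pattern matching. This is a hedged sketch, assuming values contain no whitespace, as in the sample; the function name parse_abc is hypothetical.

```shell
#!/bin/bash
# parse_abc LINE: set the shell variables A, B and C from KEY=value words
# found anywhere in LINE (hypothetical helper, for illustration only).
parse_abc() {
    local word
    local -a words
    A="" B="" C=""                    # reset so values do not carry over between lines
    read -r -a words <<< "$1"         # split the line on whitespace, no globbing
    for word in "${words[@]}"; do
        case $word in
            A=*) A=${word#A=} ;;
            B=*) B=${word#B=} ;;
            C=*) C=${word#C=} ;;
        esac
    done
}

while IFS= read -r line; do
    parse_abc "$line"
    echo "A=$A B=$B C=$C"
done <<'EOF'
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
EOF
```

Unlike the grep pattern, the case patterns only match words that begin with the variable name, so a word such as XA=1 is ignored.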
If perl is your option, please try:
perl -ne 'undef %a; while (/([\w]+)=([\w]*)/g) {$a{$1}=$2;}
for ("A", "B", "C") {print "$_=\"$a{$_}\"\n";}' file_test
Output:
A="one"
B="two"
C="three"
A=""
B="six"
C=""
A="eight"
B="eleven"
C=""
It parses each line for assignments with =, stores the key-value pairs in an associative array %a, then finally reports the values for A, B and C.
I'm partial to the awk solution, e.g.
$ awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[A-Za-z_][^=]*[=]/) print $i}' file
A=one
B=two
C=three
D=four
A=
B=six
D=seven
A=eight
E=nine
D=ten
B=eleven
Explanation
for (i = 1; i <= NF; i++) loop over each space separated field;
if ($i ~ /^[A-Za-z_][^=]*[=]/) if the field begins with a letter or underscore, followed by any number of non-'=' characters and then an '='; then
print $i print the field.
My first three solutions assume that you need the values of A, B and C as shell variables rather than simply printing them; if so, the following may help.
1st Solution: It assumes that A, B and C always appear in the same field positions.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=${third#*=}
b_var=${fourth#*=}
c_var=${fifth#*=}
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
It simply prints the variable values for each line, since you have not said how you are going to use them; adapt it to your use case.
2nd Solution: This assumes the variables come in the same order, but it checks whether A is in the 3rd field, B in the 4th, and so on, and prints accordingly.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=$(echo "$third" | awk '$0 ~ /^A/{sub(/.*=/,"");print}')
b_var=$(echo "$fourth" | awk '$0 ~ /^B/{sub(/.*=/,"");print}')
c_var=$(echo "$fifth" | awk '$0 ~ /^C/{sub(/.*=/,"");print}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
3rd Solution: This looks like the best fit for your requirement, though I am not sure how efficient it is (I am still looking into whether it can be improved). It does not depend on the order of A, B or C: it matches each one anywhere in the line and assigns its value if found, or an empty value otherwise.
while read line
do
a_var=$(echo "$line" | awk 'match($0,/A=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
b_var=$(echo "$line" | awk 'match($0,/B=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
c_var=$(echo "$line" | awk 'match($0,/C=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
Output will be as follows.
Using new values of variables here....
NEW A=one
NEW B=two
NEW C=three
Using new values of variables here....
NEW A=
NEW B=six
NEW C=
Using new values of variables here....
NEW A=eight
NEW B=eleven
NEW C=
EDIT1: In case you simply want to print values of A,B,C then try following.
awk '{
delete a
for(i=1;i<=NF;i++){
if($i ~ /^[ABC]=/){
name=substr($i,1,1) ##Variable name is the first character.
a[name]=substr($i,3) ##Value is everything after the = sign.
}
}
print "A="a["A"] ORS "B=" a["B"] ORS "C="a["C"]
}' Input_file
Another Perl
perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() '
with the input file
$ perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() ' file_test
A = one
B = two
C = three
A =
B = six
C =
A = eight
B = eleven
C =
$
A generic, self-documented awk version.
It assumes that = is the variable separator and appears neither in the text before the variables nor in the variable contents.
awk 'BEGIN {
# load the list of variable and order to print
VarSize = split( "A B C", aIdx )
# create a pattern filter for variable catch in lines
for ( Idx in aIdx ) VarEntry = ( VarEntry ? ( VarEntry "|^" ) : "^" ) aIdx[Idx] "="
}
{
# reset variable values
split( "", aVar )
# for each part of the line
for ( Fld=1; Fld<=NF; Fld++ ) {
# if part is a variable assignment
if( $Fld ~ VarEntry ) {
# separate variable name and content in array
split( $Fld, aTemp, /=/ )
# put variable content in corresponding variable name container
aVar[aTemp[1]] = aTemp[2]
}
}
# print all variable content (empty or not) found on this line, in the listed order
for ( Idx=1; Idx<=VarSize; Idx++ ) printf( "%s = \042%s\042\n", aIdx[Idx], aVar[aIdx[Idx]] )
}
' YourFile
It's unclear whether you're trying to set awk variables or shell variables, but here's how to populate an associative awk array and then use that to populate an associative shell array:
$ cat tst.awk
BEGIN {
numKeys = split("A B C",keys)
}
{
delete f
for (i=1; i<=NF; i++) {
if ( split($i,t,/=/) == 2 ) {
f[t[1]] = t[2]
}
}
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
printf "[%s]=\"%s\"%s", key, f[key], (keyNr<numKeys ? OFS : ORS)
}
}
$ awk -f tst.awk file
[A]="one" [B]="two" [C]="three"
[A]="" [B]="six" [C]=""
[A]="eight" [B]="eleven" [C]=""
$ while IFS= read -r out; do declare -A arr="( $out )"; declare -p arr; done < <(awk -f tst.awk file)
declare -A arr=([A]="one" [B]="two" [C]="three" )
declare -A arr=([A]="" [B]="six" [C]="" )
declare -A arr=([A]="eight" [B]="eleven" [C]="" )
$ echo "${arr["A"]}"
eight

gsub for substituting translations not working

I have a dictionary dict with records separated by ":" and data fields by new lines, for example:
:one
1
:two
2
:three
3
:four
4
Now I want awk to substitute all occurrences of each record in the input file, e.g.
onetwotwotwoone
two
threetwoone
four
My first awk script looked like this and works just fine:
BEGIN { RS = ":" ; FS = "\n"}
NR == FNR {
rep[$1] = $2
next
}
{
for (key in rep)
gsub(key,rep[key])
print
}
giving me:
12221
2
321
4
Unfortunately another dict file contains some character used by regular expressions, so I have to substitute escape characters in my script. By moving key and rep[key] into a string (which can then be parsed for escape characters), the script will only substitute the second record in the dict. Why? And how to solve?
Here's the current second part of the script:
{
for (key in rep)
orig=key
trans=rep[key]
gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
gsub(orig,trans)
print
}
All scripts are run by awk -f translate.awk dict input
Thanks in advance!
Your fundamental problem is using strings in regexp and backreference contexts when you don't want them and then trying to escape the metacharacters in your strings to disable the characters that you're enabling by using them in those contexts. If you want strings, use them in string contexts, that's all.
You don't want this:
gsub(regexp,backreference-enabled-string)
You want something more like this:
index(...,string) substr(string)
I think this is what you're trying to do:
$ cat tst.awk
BEGIN { FS = ":" }
NR == FNR {
if ( NR%2 ) {
key = $2
}
else {
rep[key] = $0
}
next
}
{
for ( key in rep ) {
head = ""
tail = $0
while ( start = index(tail,key) ) {
head = head substr(tail,1,start-1) rep[key]
tail = substr(tail,start+length(key))
}
$0 = head tail
}
print
}
$ awk -f tst.awk dict file
12221
2
321
4
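The difference between the two contexts is easy to demonstrate. In this sketch the sample strings and the function names replace_as_regexp and replace_as_string are made up for illustration; the key a.b is first used as a dynamic regexp by gsub() and then as a plain string by index():

```shell
#!/bin/bash
# Regexp context: the "." in the key matches any character.
replace_as_regexp() {
    awk -v key="$1" -v new="$2" '{ gsub(key, new); print }'
}

# String context: index() and substr() treat the key as literal text.
replace_as_string() {
    awk -v key="$1" -v new="$2" '{
        out = ""
        while ( (pos = index($0, key)) > 0 ) {
            out = out substr($0, 1, pos - 1) new
            $0 = substr($0, pos + length(key))
        }
        print out $0
    }'
}

echo 'a.b axb' | replace_as_regexp 'a.b' X    # prints: X X
echo 'a.b axb' | replace_as_string 'a.b' X    # prints: X axb
```

With the regexp version, axb is replaced as well, which is the kind of surprise that escaping metacharacters tries to patch over.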
Never mind, I should not have asked...
It was just some missing braces:
{
for (key in rep)
{
orig=key
trans=rep[key]
gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
gsub(orig,trans)
}
print
}
works like a charm.
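This also explains the original symptom. Without braces, only the first statement after for belongs to the loop, so the remaining statements ran once, after the loop had finished, with whichever key the loop visited last. A minimal sketch with made-up values (the function name demo_braces is hypothetical):

```shell
#!/bin/bash
demo_braces() {
    awk 'BEGIN {
        n = split("a b c", arr)

        # No braces: only "last = arr[i]" is the loop body; "runs++" executes once.
        for (i = 1; i <= n; i++)
            last = arr[i]
            runs++
        print "without braces:", last, runs

        # With braces: both statements execute on every iteration.
        runs = 0
        for (i = 1; i <= n; i++) {
            last = arr[i]
            runs++
        }
        print "with braces:", last, runs
    }'
}

demo_braces
# prints:
#   without braces: c 1
#   with braces: c 3
```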

awk | Rearrange fields of CSV file on the basis of column value

I need your help writing an awk script for the problem below. I have a source file and the required output for it.
Source File
a:5,b:1,c:2,session:4,e:8
b:3,a:11,c:5,e:9,session:3,c:3
Output File
session:4,a=5,b=1,c=2
session:3,a=11,b=3,c=5|3
Notes:
The fields are not in a fixed order in the source file.
In the output file, the fields are arranged in a specific order: for example, all a values are in the 2nd column, then b, then c.
The value c can occur any number of times on a line (as in the second line); in the output, its values are merged with a pipe symbol.
Please help.
Will work in any modern awk:
$ cat file
a:5,b:1,c:2,session:4,e:8
a:5,c:2,session:4,e:8
b:3,a:11,c:5,e:9,session:3,c:3
$ cat tst.awk
BEGIN{ FS="[,:]"; split("session,a,b,c",order) }
{
split("",val) # or delete(val) in gawk
for (i=1;i<NF;i+=2) {
val[$i] = (val[$i]=="" ? "" : val[$i] "|") $(i+1)
}
for (i=1;i in order;i++) {
name = order[i]
printf "%s%s", (i==1 ? name ":" : "," name "="), val[name]
}
print ""
}
$ awk -f tst.awk file
session:4,a=5,b=1,c=2
session:4,a=5,b=,c=2
session:3,a=11,b=3,c=5|3
If you actually want the e values printed, unlike your posted desired output, just add ,e to the string in the split() in the BEGIN section wherever you'd like those values to appear in the ordered output.
Note that when b was missing from the input on line 2 above, it output a null value as you said you wanted.
Try with:
awk '
BEGIN {
FS = "[,:]"
OFS = ","
}
{
for ( i = 1; i <= NF; i+= 2 ) {
if ( $i == "session" ) { printf "%s:%s", $i, $(i+1); continue }
hash[$i] = hash[$i] (hash[$i] ? "|" : "") $(i+1)
}
asorti( hash, hash_orig )
for ( i = 1; i <= length(hash); i++ ) {
printf ",%s:%s", hash_orig[i], hash[ hash_orig[i] ]
}
printf "\n"
delete hash
delete hash_orig
}
' infile
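This splits each line on commas and colons, walks the odd-numbered fields (the names), and collects each name with its values in a hash that is printed, sorted by name, once the whole line has been read. Note that asorti() requires GNU awk. It yields: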
session:4,a:5,b:1,c:2,e:8
session:3,a:11,b:3,c:5|3,e:9