awk: printing lines side by side when the first field is the same in the records - awk

I have a file containing lines like
a x1
b x1
q xq
c x1
b x2
c x2
n xn
c x3
I would like to test on the fist field in each line, and if there is a match I would like to append the matching lines to the first line. The output should look like
a x1
b x1 b x2
q xq
c x1 c x2 c x3
n xn
any help will be greatly appreciated

Using awk you can do this:
awk '{arr[$1]=arr[$1]?arr[$1] " " $0:$0} END {for (i in arr) print arr[i]}' file
n xn
a x1
b x1 b x2
c x1 c x2 c x3
q xq

To preserve input ordering:
$ awk '
{
if ($1 in vals) {
prev = vals[$1] " "
}
else {
prev = ""
keys[++k] = $1
}
vals[$1] = prev $0
}
END {
for (k=1;k in keys;k++)
print vals[keys[k]]
}
' file
a x1
b x1 b x2
q xq
c x1 c x2 c x3
n xn

What I ended up doing. (The answers by Ed Morton and Jonte are obviously more elegant.)
First I saved the 1st column of the input file in a separate file.
awk '{print $1}' input.file.txt > tmp0
Then saved the input file with lines, which has duplicate values at $1 field, removed.
awk 'BEGIN { FS = "\t" }; !x[$1]++ { print $0}' input_file.txt > tmp1
Then saved all the lines with duplicate $1 field.
awk 'BEGIN { FS = "\t" }; x[$1]++ { print $0}' input_file.txt >tmp2
Then saved the $1 fields of the non-duplicate file (tmp1).
awk '{ print $1}' tmp1 > tmp3
I used a for loop to pull in lines from the duplicate file (tmp2) and the duplicates removed file (tmp1) into an output file.
for i in $(cat tmp3)
do
if [ $(grep -w $i tmp0 | wc -l) = 1 ] #test for single instance in the 1st col of input file
then
echo "$(grep -w $i tmp1)" >> output.txt #if single then pull that record from no dupes
else
echo -e "$(grep -w $i tmp1) \t $(grep -w $i tmp2 | awk '{
printf $0"\t" }; END { printf "\n" }')" >> output.txt # if not single then pull that record from no_dupes first then all the records from dupes in a single line.
fi
done
Finally remove the tmp files
rm tmp* # remove all the tmp files

Related

Find and replace and move a line that contains a specific string

Assuming I have the following text file:
a b c d 1 2 3
e f g h 1 2 3
i j k l 1 2 3
m n o p 1 2 3
How do I replace '1 2 3' with '4 5 6' in the line that contains the letter (e) and move it after the line that contains the letter (k)?
N.B. the line that contains the letter (k) may come in any location in the file, the lines are not assumed to be in any order
My approach is
Remove the line I want to replace
Find the lines before the line I want to move it after
Find the lines after the line I want to move it after
append the output to a file
grep -v 'e' $original > $file
grep -B999 'k' $file > $output
grep 'e' $original | sed 's/1 2 3/4 5 6/' >> $output
grep -A999 'k' $file | tail -n+2 >> $output
rm $file
mv $output $original
but there is a lot of issues in this solution:
a lot of grep commands that seems unnecessary
the argument -A999 and -B999 are assuming the file would not contain lines more than 999, it would be better to have another way to get lines before and after the matched line
I am looking for a more efficient way to achieve that
Using sed
$ sed '/e/{s/1 2 3/4 5 6/;h;d};/k/{G}' input_file
a b c d 1 2 3
i j k l 1 2 3
e f g h 4 5 6
m n o p 1 2 3
Here is a GNU awk solution:
awk '
/\<e\>/{
s=$0
sub("1 2 3", "4 5 6", s)
next
}
/\<k\>/ && s {
printf("%s\n%s\n",$0,s)
next
} 1
' file
Or POSIX awk:
awk '
function has(x) {
for(i=1; i<=NF; i++) if ($i==x) return 1
return 0
}
has("e") {
s=$0
sub("1 2 3", "4 5 6", s)
next
}
has("k") && s {
printf("%s\n%s\n",$0,s)
next
} 1
' file
Either prints:
a b c d 1 2 3
i j k l 1 2 3
e f g h 4 5 6
m n o p 1 2 3
This works regardless of the order of e and k in the file:
awk '
function has(x) {
for(i=1; i<=NF; i++) if ($i==x) return 1
return 0
}
has("e") {
s=$0
sub("1 2 3", "4 5 6", s)
next
}
FNR<NR && has("k") && s {
printf("%s\n%s\n",$0,s)
s=""
next
}
FNR<NR
' file file
This awk should work for you:
awk '
/(^| )e( |$)/ {
sub(/1 2 3/, "4 5 6")
p = $0
next
}
1
/(^| )k( |$)/ {
print p
p = ""
}' file
a b c d 1 2 3
i j k l 1 2 3
e f g h 4 5 6
m n o p 1 2 3
This might work for you (GNU sed):
sed -n '/e/{s/1 2 3/4 5 6/;s#.*#/e/d;/k/s/.*/\&\\n&/#p};' file | sed -f - file
Design a sed script by passing the file twice and applying the sed instructions from the first pass to the second.
Another solution is to use ed:
cat <<\! | ed file
/e/s/1 2 3/4 5 6/
/e/m/k/
wq
!
Or if you prefer:
<<<$'/e/s/1 2 3/4 5 6/\n.m/k/\nwq' ed -s file

Compare two columns of two files, print the row if it matches and print zero in third column

I need to compare column 1 and column 2 of my file1.txt and file2.txt. If both columns match, print the entire row of file1.txt, but where a row in file1.txt is not present in file2.txt, also print that missing row in the output and add "0" as its value in third column.
# file1.txt #
AA ZZ
JB CX
CX YZ
BB XX
SU BY
DA XZ
IB KK
XY IK
TY AB
# file2.txt #
AA ZZ 222
JB CX 345
BB XX 3145
DA XZ 876
IB KK 234
XY IK 897
Expected output
# output.txt #
File1.txt
AA ZZ 222
JB CX 345
CX YZ 0
BB XX 3145
SU BY 0
DA XZ 376
IB KK 234
XY IK 897
TY AB 0
I tried this code but couldn't figure out how to add rows that did not match and add "0" to it
awk 'BEGIN { while ((getline <"file2.txt") > 0) {REC[$1]=$0}}{print REC[$1]}' < file1.txt > output.txt
With your shown samples, could you please try following.
awk '
FNR==NR{
arr[$1 OFS $2]
next
}
(($1 OFS $2) in arr){
print
arr1[$1 OFS $2]
}
END{
for(i in arr){
if(!(i in arr1)){
print i,0
}
}
}
' file1.txt file2.txt
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking FNR==NR condition which will be TRUE when file1.txt is being read.
arr[$1 OFS $2] ##Creating array with 1st and 2nd field here.
next ##next will skip all further statements from here.
}
(($1 OFS $2) in arr){ ##Checking condition if 1st and 2nd field of file2.txt is present in arr then do following.
print ##Print the current line here.
arr1[$1 OFS $2] ##Creating array arr1 with index of 1st and 2nd fields here.
}
END{ ##Starting END block of this program from here.
for(i in arr){ ##Traversing through arr all elements from here.
if(!(i in arr1)){ ##Checking if an element/key is NOT present in arr1 then do following.
print i,0 ##Printing index and 0 here.
}
}
}
' file1.txt file2.txt ##Mentioning Input_file names here.
You may try this awk:
awk '
FNR == NR {
map[$1,$2] = $3
next
}
{
print $1, $2, (($1,$2) in map ? map[$1,$2] : 0)
}' file2 file1
AA ZZ 222
JB CX 345
CX YZ 0
BB XX 3145
SU BY 0
DA XZ 876
IB KK 234
XY IK 897
TY AB 0
$ awk '
{ key = $1 FS $2 }
NR==FNR { map[key]=$3; next }
{ print $0, map[key]+0 }
' file2.txt file1.txt
AA ZZ 222
JB CX 345
CX YZ 0
BB XX 3145
SU BY 0
DA XZ 876
IB KK 234
XY IK 897
TY AB 0

How to print out lines starting with keyword and connected with backslash with sed or awk

For example, I'd like to print out the line starting with set_2 and connected with \ like this. I'd like to know whether it's possible do to it with sed, awk or any other text process command lines.
< Before >
set_1 abc def
set_2 a b c d\
e f g h\
i j k l\
m n o p
set_3 ghi jek
set_2 aaa bbb\
ccc ddd\
eee fff
set_4 1 2 3 4
< After text process >
set_2 a b c d\
e f g h\
i j k l\
m n o p
set_2 aaa bbb\
ccc ddd\
eee fff
Try the following:
awk -v st="set_2" '/^set/ {set=$1} /\\$/ && set==st { prnt=1 } prnt==1 { print } !/\\$/ { prnt=0 }' file
Explanation:
awk -v st="set_2" ' # Pass the set to track as a variable st
/^set/ {
set=$1 # When the line begins with "set", track the set in the variable set
}
/\\$/ && set==st {
prnt=1 # When we are in the required set block and the line ends with "/", set a print marker (prnt) to 1
}
prnt==1 {
print # When the print marker is 1, print the line
}
!/\\$/ {
prnt=0 # If the line doesn't end with "/". set the print marker to 0
}' file
Would you try the sed solution:
sed -nE '
/^set_2/ { ;# if the line starts with "set_2" execute the block
:a ;# define a label "a"
/\\[[:space:]]*$/! {p; bb} ;# if the line does not end with "\", print the pattern space and exit the block
N ;# append the next line to the pattern space
ba ;# go to label "a"
} ;# end of the block
:b ;# define a label "b"
' file
Please note the character class [[:space:]]* is inserted just because the OP's posted example contains whitespaces after the slash.
[Alternative]
If perl is your option, following will also work:
perl -ne 'print if /^set_2/..!/\\\s*$/' file
This simple awk command should do the job:
awk '!/^[[:blank:]]/ {p = ($1 == "set_2")} p' file
set_2 a b c d\
e f g h\
i j k l\
m n o p
set_2 aaa bbb\
ccc ddd\
eee fff
And with this awk :
awk -F'[[:blank:]]*' '$1 == "set_2" || $NF ~ /\$/ {print $0;f=1} f && $1 == ""' file
set_2 a b c d\
e f g h\
i j k l\
m n o p
set_2 aaa bbb\
ccc ddd\
eee fff
This might work for you (GNU sed):
sed ':a;/set_2/{:b;n;/set_/ba;bb};d' file
If a line contains set_2 print it and go on printing until another line containing set_ then repeat the first test.
Otherwise delete the line.

manipulate input file and create a new file using awk

Input
1473697,5342715,256,0.3
1473697,7028427,256,0.1
1473697,5342716,256,0.3
1473697,5342715,257,0.3
1473697,7028427,257,0.1
1473610,7028427,256,0.1
1473610,5342715,256,0.3
1473610,7028422,256,0.1
Output
1473697,256,5342715 0.3 7028427 0.1 5342716 0.3
1473697,257,5342715 0.3 7028427 0.1
1473610,256,7028427 0.1 5342715 0.3 7028422 0.1
OFS and FS is = ,
is there a way to search unique lines base on column 1 and 3
then print the line with the details from column 2 and 4
It took awhile to figure out what you want, but I think you're looking for:
awk '!a[$1 $3] {a[$1 $3] = $1","$3","}
{a[$1 $3] = a[$1 $3] " " $2 " " $4}
END {for(i in a) print a[i]}' FS=, input-file
or
awk '{a[$1","$3] = a[$1","$3] " " $2 " " $4}
END {for(i in a) print i","a[i]}' FS=, input-file
There are many variations on the theme.

Compare Two Files Using awk Script

I need to validate two files using condition and each record in file is separated using comma.
File1
Prakash,10,20,3(Field Index)
File2
10,25,100
20,25,200
30,20,300
From reading the file 1 FieldIndex I need to sum the corresponding column in File 2(ie 100+200+300) needs to be added.
$ cat > file1
Prakash,10,20,3
$ awk -F, '
NR == FNR {f = $NF; next} # last field of the first file
{ sum += $f } # we're in the 2nd file here since NR != FNR
END { print "sum of field index " f " is " sum }
' file1 file2
sum of field index 3 is 600