How to replace specific numbers at the end of a line - awk

I have a text file like this:
ID
MQ2427D17-01_1_12
MQ2427D17-01_1_1
MQ2427D17-01_1_2
MQ2427D17-01_1_3
MQ2427D17-01_1_4
MQ2427D17-02_2_5
MQ2427D17-02_2_25
MQ2427D17-02_2_1
MQ2427D17-02_2_2
MQ2427D17-02_2_3
MQ2427D17-02_2_4
MQ2427D17-01_1_28
MQ3427D17-01_1_29
MQ3427D17-01_1_1
MQ3427D17-01_1_2
MQ3427D17-01_3_3
MQ3427D17-01_3_30
MQ3427D17-01_3_33
I want to change the number at the end of each line: 1 becomes 13, 2 becomes 14, 3 becomes 15, 4 becomes 16, 5 becomes 17, 6 becomes 18, 7 becomes 19, ..., and 12 becomes 24.
So the output should look like this:
ID
MQ2427D17-01_1_24
MQ2427D17-01_1_13
MQ2427D17-01_1_14
MQ2427D17-01_1_15
MQ2427D17-01_1_16
MQ2427D17-02_2_17
MQ2427D17-02_2_25
MQ2427D17-02_2_13
MQ2427D17-02_2_14
MQ2427D17-02_2_15
MQ2427D17-02_2_16
MQ2427D17-01_1_28
MQ3427D17-01_1_29
MQ3427D17-01_1_13
MQ3427D17-01_1_14
MQ3427D17-01_3_15
MQ3427D17-01_3_30
MQ3427D17-01_3_33
I was trying to do it with this:
sed 's/1/13/g' myfile.txt > modified.txt
sed = Stream EDitor
The command string:
s = the substitute command
1 = a regular expression describing the number to replace
13 = the replacement
g = global (i.e. replace all occurrences, not just the first)
myfile.txt = my data
modified.txt = the output
but this changes the digit wherever it appears in the line, not just at the end.
I don't know why the solutions below do not work, for example on this data:
ID
MQ3HHD2D17-01_1_1
MQ3HHD2D17-01_1_2
MQ3HHD2D17-01_1_3
MQ3HHD2D17-01_1_4
MQ3HHD2D17-01_1_5
MQ3HHD2D17-01_1_6
MQ3HHD2D17-01_1_7
MQ3HHD2D17-01_1_8
MQ3HHD2D17-01_1_9
MQ3HHD2D17-01_1_10
MQ3HHD2D17-01_1_11
MQ3HHD2D17-01_1_12
MQ4HHD2D17-01_2_1
MQ4HHD2D17-01_2_2
MQ4HHD2D17-01_2_3
MQ4HHD2D17-01_2_4
MQ4HHD2D17-01_2_5
MQ4HHD2D17-01_2_6
MQ4HHD2D17-01_2_7
MQ4HHD2D17-01_2_8
MQ4HHD2D17-01_2_9
MQ4HHD2D17-01_2_10
MQ4HHD2D17-01_2_11
MQ4HHD2D17-01_2_12
It should be
ID
MQ3HHD2D17-01_1_13
MQ3HHD2D17-01_1_14
MQ3HHD2D17-01_1_15
MQ3HHD2D17-01_1_16
MQ3HHD2D17-01_1_17
MQ3HHD2D17-01_1_18
MQ3HHD2D17-01_1_19
MQ3HHD2D17-01_1_20
MQ3HHD2D17-01_1_21
MQ3HHD2D17-01_1_22
MQ3HHD2D17-01_1_23
MQ3HHD2D17-01_1_24
MQ4HHD2D17-01_2_13
MQ4HHD2D17-01_2_14
MQ4HHD2D17-01_2_15
MQ4HHD2D17-01_2_16
MQ4HHD2D17-01_2_17
MQ4HHD2D17-01_2_18
MQ4HHD2D17-01_2_19
MQ4HHD2D17-01_2_20
MQ4HHD2D17-01_2_21
MQ4HHD2D17-01_2_22
MQ4HHD2D17-01_2_23
MQ4HHD2D17-01_2_24

From your description, we can observe a pattern: add 12 to the end number if it is 12 or less. (Here, the end number is the number after the last underscore.)
awk can accomplish this task.
awk -F_ -v OFS=_ '{if($NF <= 12) $NF += 12;}1' myfile.txt >modified.txt
Flags:
-F_: input delimiter is _
-v OFS=_: sets one of awk's special variables, the Output Field Separator (i.e. the output delimiter)
Others:
NF: another one of awk's special variables, denoting the Number of Fields
$NF: the last field itself (the field whose number is NF).
{...}1: the 1 at the end is a pattern that is always true and has no action, so awk prints every line, modified or not.
I personally wouldn't recommend sed here, since you would need to replace 1 with 13, 2 with 14, 3 with 15 (and so on) individually, which is tedious. awk, on the other hand, can do basic arithmetic (such as the +12 above) while still parsing the input.
Output:
ID
MQ2427D17-01_1_24
MQ2427D17-01_1_13
MQ2427D17-01_1_14
MQ2427D17-01_1_15
MQ2427D17-01_1_16
MQ2427D17-02_2_17
MQ2427D17-02_2_25
MQ2427D17-02_2_13
MQ2427D17-02_2_14
MQ2427D17-02_2_15
MQ2427D17-02_2_16
MQ2427D17-01_1_28
MQ3427D17-01_1_29
MQ3427D17-01_1_13
MQ3427D17-01_1_14
MQ3427D17-01_3_15
MQ3427D17-01_3_30
MQ3427D17-01_3_33
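A slightly more defensive sketch of the same idea, wrapped in a shell function. The function name shift_suffix and the lo/hi/add variables are hypothetical, chosen so the range and offset are easy to adjust; it also assumes the last _-separated field should only change when it consists entirely of digits, so a header line like ID is never compared numerically.

```shell
# Hypothetical helper: add `add` to the last _-separated field when that
# field is an integer in the range lo..hi; all other lines pass through.
shift_suffix() {
  awk -F_ -v OFS=_ -v lo=1 -v hi=12 -v add=12 '
    # Only touch lines with at least two fields whose last field is all digits.
    NF > 1 && $NF ~ /^[0-9]+$/ && $NF + 0 >= lo + 0 && $NF + 0 <= hi + 0 {
      $NF += add
    }
    1  # print every line, modified or not
  ' "$@"
}

printf 'ID\nMQ2427D17-01_1_12\nMQ2427D17-02_2_25\n' | shift_suffix
```

Passing a file name works too (shift_suffix myfile.txt), since "$@" is forwarded to awk.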

Could you please try the following.
awk 'BEGIN{FS=OFS="_"} $NF>=1 && $NF<=12{$NF+=12} 1' Input_file
OR
awk 'BEGIN{FS=OFS="_"} {gsub(/\r/,"")} $NF>=1 && $NF<=12{$NF+=12} 1' Input_file
OR
tr -d '\r' < Input_file > temp && mv temp Input_file
awk 'BEGIN{FS=OFS="_"} $NF>=1 && $NF<=12{$NF+=12} 1' Input_file
After troubleshooting with the OP in chat, it turned out that the input file contains control-M characters (carriage returns, \r), which the OP does not want. I advised removing them with tr -d '\r' < Input_file > temp && mv temp Input_file and then running the code above.
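Carriage returns are easy to miss because most terminals do not display them. A minimal sketch, assuming the file may have Windows (CRLF) line endings: first count the lines that end in \r, then strip the \r inside awk before comparing, so a last field of 1\r is treated as the number 1. Both function names are hypothetical.

```shell
# Hypothetical helper: count lines ending in a carriage return (CRLF endings).
count_crlf() {
  awk '/\r$/ { n++ } END { print n + 0 }' "$@"
}

# Hypothetical helper: drop a trailing \r, then apply the +12 rule.
bump_crlf_safe() {
  awk 'BEGIN { FS = OFS = "_" }
       { sub(/\r$/, "") }                    # modifying $0 re-splits the fields
       $NF >= 1 && $NF <= 12 { $NF += 12 }
       1' "$@"
}

printf 'ID\r\nMQ3HHD2D17-01_1_1\r\n' | count_crlf
printf 'ID\r\nMQ3HHD2D17-01_1_1\r\n' | bump_crlf_safe
```

Stripping inside awk leaves the original file untouched, whereas the tr approach rewrites it; note that the awk output has Unix line endings either way.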

A generic solution using a Perl one-liner:
perl -pe ' s/(\d+)$/$1<13?$1+12:$1/ge '
With the sample input:
$ perl -pe ' s/(\d+)$/ $1<13 ? $1+12 : $1/ge ' learner.txt
ID
MQ2427D17-01_1_24
MQ2427D17-01_1_13
MQ2427D17-01_1_14
MQ2427D17-01_1_15
MQ2427D17-01_1_16
MQ2427D17-02_2_17
MQ2427D17-02_2_25
MQ2427D17-02_2_13
MQ2427D17-02_2_14
MQ2427D17-02_2_15
MQ2427D17-02_2_16
MQ2427D17-01_1_28
MQ3427D17-01_1_29
MQ3427D17-01_1_13
MQ3427D17-01_1_14
MQ3427D17-01_3_15
MQ3427D17-01_3_30
MQ3427D17-01_3_33

Related

Using awk pattern to file filter data

I have the following file (named /tmp/test99), which contains the rows:
"0","15","wall15"
123132,09808098,"0","15"
I am trying to filter the rows that contains "0" in the 3rd place, and "15" in 4th place (like in the second row)
I tried running:
cat /tmp/test99 | awk '/"0","15"/{print>"/tmp/0_15_file.out"} '
but instead of getting only the second row, I also get the first row, which starts with "0","15".
Could you please help with the pattern?
Thanks:)
You may check whether fields 3 and 4 are equal to hard-coded values using:
awk -F, '$3=="\"0\"" && $4=="\"15\""'
Set the field separator to a comma; then, if field 3 is "0" and field 4 is "15" (quotes included), print the line, otherwise discard it.
See the online demo:
s='"0","15","wall15"
123132,09808098,"0","15"'
awk -F, '$3=="\"0\"" && $4=="\"15\""' <<< "$s"
# => 123132,09808098,"0","15"
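If the values to match change often, they can be passed in with -v instead of being hard-coded, which also avoids escaping double quotes inside the awk program. A sketch, assuming the fields are always wrapped in double quotes exactly as in the sample; the function name filter_34 and the awk variables a and b are hypothetical.

```shell
# Hypothetical helper: print lines whose 3rd and 4th comma-separated fields
# are exactly "$1" and "$2", including the surrounding double quotes.
filter_34() {
  awk -F, -v a="\"$1\"" -v b="\"$2\"" '$3 == a && $4 == b'
}

printf '%s\n' '"0","15","wall15"' '123132,09808098,"0","15"' | filter_34 0 15
```

Because the comparison is an exact string match with ==, "wall15" or "150" in those positions would not match.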
Could you please try the following. (A comment on your attempt: you do not need cat with awk; awk can read Input_file by itself.)
awk -F, '$3 ~ /^"0"$/ && $4 ~ /^"15"$/' Input_file

Duplicate Lines 2 times and transpose from row to column

I would like to print each line twice, keeping columns 1 to 4 and splitting the values of columns 5 and 6 onto separate lines (transposing columns 5 and 6 from columns into rows).
That is, the first copy ends with the value of column 5 and the second copy ends with the value of column 6.
Input file:
08,1218864123180000,3201338573,VV,22,27
08,1218864264864000,3243738789,VV,15,23
08,1218864278580000,3244738513,VV,3,13
08,1218864310380000,3243938789,VV,15,23
08,1218864324180000,3244538513,VV,3,13
08,1218864334380000,3200538561,VV,22,27
Desired Output
08,1218864123180000,3201338573,VV,22
08,1218864123180000,3201338573,VV,27
08,1218864264864000,3243738789,VV,15
08,1218864264864000,3243738789,VV,23
08,1218864278580000,3244738513,VV,3
08,1218864278580000,3244738513,VV,13
08,1218864310380000,3243938789,VV,15
08,1218864310380000,3243938789,VV,23
08,1218864324180000,3244538513,VV,3
08,1218864324180000,3244538513,VV,13
08,1218864334380000,3200538561,VV,22
08,1218864334380000,3200538561,VV,27
I use this code to print each line twice, but I can't figure out how to handle the values of columns 5 and 6:
awk '{print;print}' file
Thanks in advance
To print the start of a line once for each of the last N fields, where N is 2 in this case:
$ awk -v n=2 '
BEGIN { FS=OFS="," }
{
base = $0
sub("("FS"[^"FS"]+){"n"}$","",base)
for (i=NF-n+1; i<=NF; i++) {
print base, $i
}
}
' file
08,1218864123180000,3201338573,VV,22
08,1218864123180000,3201338573,VV,27
08,1218864264864000,3243738789,VV,15
08,1218864264864000,3243738789,VV,23
08,1218864278580000,3244738513,VV,3
08,1218864278580000,3244738513,VV,13
08,1218864310380000,3243938789,VV,15
08,1218864310380000,3243938789,VV,23
08,1218864324180000,3244538513,VV,3
08,1218864324180000,3244538513,VV,13
08,1218864334380000,3200538561,VV,22
08,1218864334380000,3200538561,VV,27
In this simple case, where the last field has to be removed and moved onto a second copy of the line, you can do:
awk -F , -v OFS=, '{ x = $6; NF = 5; print; $5 = x; print }'
Here -F , and -v OFS=, will set the input and output field separators to a comma, respectively, and the code does
{
x = $6 # remember sixth field
NF = 5 # Set field number to 5, so the last one won't be printed
print # print those first five fields
$5 = x # replace value of fifth field with remembered value of sixth
print # print modified line
}
This approach can be extended to handle fields in the middle with a function like the one in the accepted answer of this question.
EDIT: As Ed notes in the comments, writing to NF is not explicitly defined to trigger a rebuild of $0 (the whole-line record that print prints) in the POSIX standard. The above code works with GNU awk and mawk, but with BSD awk (as found on *BSD and probably Mac OS X) it fails to do anything.
So to be standards-compliant, we have to be a little more explicit and force awk to rebuild $0 from the modified field state. This can be done by assigning to any of the field variables $1...$NF, and it's common to use $1=$1 when this problem pops up in other contexts (for example: when only the field separator needs to be changed but not any of the data):
awk -F , -v OFS=, '{ x = $6; NF = 5; $1 = $1; print; $5 = x; print }'
I've tested this with GNU awk, mawk and BSD awk (which are all the awks I can lay my hands on), and I believe this to be covered by the awk bit in POSIX where it says "setting any other field causes the re-evaluation of $0" right at the top. Mind you, the spec could be more explicit on this point, and I'd be interested to test if more exotic awks behave the same way.
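The $1 = $1 idiom is easy to try in isolation. A minimal sketch of the "change only the field separator" case: without a field assignment awk prints $0 exactly as it was read, while assigning a field to itself forces $0 to be rebuilt with the current OFS.

```shell
# Without a field assignment, $0 is printed exactly as read.
printf 'a b c\n' | awk -v OFS=, '{ print }'            # a b c

# Assigning $1 to itself rebuilds $0, joining the fields with OFS.
printf 'a b c\n' | awk -v OFS=, '{ $1 = $1; print }'   # a,b,c
```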
Could you please try the following (assuming your Input_file always looks as shown, and that you want to print the first four fields followed by each of the remaining fields in turn):
awk 'BEGIN{FS=OFS=","}{for(i=5;i<=NF;i++){print $1,$2,$3,$4,$i}}' Input_file
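If the number of leading fields can vary, the prefix can be built with a loop instead of listing $1,$2,$3,$4. A sketch, assuming every line has more than n fields and the last n are the ones to split out; the function name and the n parameter are hypothetical. It only uses POSIX awk features, so it does not depend on how a given awk handles assignments to NF.

```shell
# Hypothetical helper: for each of the last n fields, print the leading
# fields followed by that one field (n defaults to 2).
split_last_fields() {
  awk -v n="${1:-2}" 'BEGIN { FS = OFS = "," }
  {
    prefix = $1
    for (i = 2; i <= NF - n; i++)       # join fields 1..NF-n with OFS
      prefix = prefix OFS $i
    for (i = NF - n + 1; i <= NF; i++)  # one output line per trailing field
      print prefix, $i
  }'
}

printf '08,1218864123180000,3201338573,VV,22,27\n' | split_last_fields 2
```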
This might work for you (GNU awk):
awk '{print gensub(/((.*,).*),/,"\\1\n\\2",1)}' file
It replaces the last comma with a newline followed by everything up to and including the second-to-last comma, i.e. all fields except the last two.

How can I replace all middle characters with '*'?

I would like to replace the middle of each word with asterisks, keeping the first two and last two characters.
For example :
ifbewofiwfib
wofhwifwbif
iwjfhwi
owfhewifewifewiwei
fejnwfu
fehiw
wfebnueiwbfiefi
Should become :
if********ib
wo*******if
iw***wi
ow**************ei
fe***fu
fe*iw
wf***********fi
So far I managed to replace all but the first 2 chars with:
sed -e 's/./*/g3'
Or do it the long way:
grep -o '^..' file > start
cat file | sed 's:^..\(.*\)..$:\1:' | awk -F. '{for (i=1;i<=length($1);i++) a=a"*";$1=a;a=""}1' > stars
grep -o '..$' file > end
paste -d "" start stars > temp
paste -d "" temp end > final
I would use awk for this, if you have GNU awk, which lets you set the field separator to an empty string (see How to set the field separator to an empty string?).
This way, you can loop through the chars and replace the desired ones with "*". In this case, replace from the 3rd to the 3rd last:
$ awk 'BEGIN{FS=OFS=""}{for (i=3; i<=NF-2; i++) $i="*"} 1' file
if********ib
wo*******if
iw***wi
ow**************ei
fe***fu
fe*iw
wf***********fi
If perl is okay:
$ perl -pe 's/..\K.*(?=..)/"*" x length($&)/e' ip.txt
if********ib
wo*******if
iw***wi
ow**************ei
fe***fu
fe*iw
wf***********fi
..\K.*(?=..) to match characters other than first/last two characters
See regex lookarounds section for details
e modifier evaluates the replacement section as Perl code
"*" x length($&) use length function and string repetition operator to get desired replacement string
You can do it with a repetitive substitution, e.g.:
sed -E ':a; s/^(..)([*]*)[^*](.*..)$/\1\2*\3/; ta'
Explanation
This works by repeating the substitution until no change happens; that is what the :a; ...; ta bit does. The substitution consists of 3 matched groups and a non-asterisk character:
(..) the start of the string.
([*]*) any already replaced characters.
[^*] the character to be replaced next.
(.*..) any remaining characters to replace and the end of the string.
Alternative GNU sed answer
You could also do this using the hold space, which might be simpler to read. Save the following as parse.sed:
h # save a copy to hold space
s/./*/g3 # replace all but 2 by *
G # append hold space to pattern space
s/^(..)([*]*)..\n.*(..)$/\1\2\3/ # reformat pattern space
Run it like this:
sed -Ef parse.sed input.txt
Output in both cases
if********ib
wo*******if
iw***wi
ow**************ei
fe***fu
fe*iw
wf***********fi
The following awk may help you here. It should work in any version of awk.
awk '{len=length($0);for(i=3;i<=(len-2);i++){val=val "*"};print substr($0,1,2) val substr($0,len-1);val=""}' Input_file
Adding a non-one-liner form of the solution too:
awk '
{
len=length($0);
for(i=3;i<=(len-2);i++){
val=val "*"};
print substr($0,1,2) val substr($0,len-1);
val=""
}
' Input_file
Explanation: here is the code above with comments.
awk '
{
len=length($0); ##Create variable len holding the length of the current line.
for(i=3;i<=(len-2);i++){ ##Loop from i=3 to len-2, i.e. once per character to mask:
val=val "*"}; ##Append one * to val on each iteration.
print substr($0,1,2) val substr($0,len-1);##Print the first 2 chars, then val, then the last 2 chars of the current line.
val="" ##Reset val so it does not carry over to the next line.
}
' Input_file ##Mentioning the Input_file name here.
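A portable variant of the substr approach with the number of kept characters as a parameter. A sketch: the function name mask_middle and the keep parameter are hypothetical, and it assumes that words too short to have a middle (length at most 2*keep) should be printed unchanged.

```shell
# Hypothetical helper: keep the first and last `keep` characters of each
# line and replace everything in between with '*' (keep defaults to 2).
mask_middle() {
  awk -v keep="${1:-2}" '{
    len = length($0)
    if (len <= 2 * keep) { print; next }     # nothing in the middle to mask
    stars = ""
    for (i = 1; i <= len - 2 * keep; i++)
      stars = stars "*"
    print substr($0, 1, keep) stars substr($0, len - keep + 1)
  }'
}

printf 'ifbewofiwfib\nfehiw\n' | mask_middle 2
```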

awk: print each column of a file into separate files

I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files. I am trying to use:
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors:
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How to correctly use $i inside the {print }?
The following single awk command may also help here:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){f="file" i; print $1,$i >> f; close(f)}}' Input_file
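One caveat when combining close() with output redirection: in awk, > truncates a file when it is opened, and reopening it after close() counts as opening it again, so only the last write survives. A small demonstration (demo_close is a hypothetical helper; the files live in a throwaway temporary directory):

```shell
# Write three lines to one file, closing it after every write, using either
# '>' (truncate on open) or '>>' (append on open); then show the file.
demo_close() {
  tmp=$(mktemp -d)
  if [ "$1" = append ]; then
    printf '1\n2\n3\n' | awk -v f="$tmp/out" '{ print >> f; close(f) }'
  else
    printf '1\n2\n3\n' | awk -v f="$tmp/out" '{ print > f; close(f) }'
  fi
  cat "$tmp/out"
  rm -r "$tmp"
}

demo_close truncate   # prints only: 3
demo_close append     # prints 1, 2 and 3 on separate lines
```

Since >> appends, output files left over from an earlier run should be removed before running the command again.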
An all awk solution. First test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for loop iterates from 2 to the last field. If there are more fields than you want to process, change NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, you'd need to put the print into a block and add a close call:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to keep your original approach:
for i in {2..99}; do
awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done
Notes:
the -v switch of awk passes a shell variable into awk
$x is the x-th column, where x is set from your shell variable i
Note 2: this is not the fastest solution (a single awk call is faster), but it corrects your logic. Ideally, take time to learn awk; it's never wasted time.

how to get the output of 'system' command in awk

I have a file in which one field is a timestamp like 20141028 20:49:49. I want to get the hour (20), so I use the system command:
hour=system("date -d\""$5"\" +'%H'")
The timestamp is the fifth field in my file, so I used $5. But when I ran the program, the command above just printed 20 and returned 0, so hour is 0, not 20. How can I get the hour from the timestamp?
I know a method that uses the split function twice, like this:
split($5, vec, " " )
split(vec[2], vec2, ":")
But this method is a little inefficient and ugly.
So are there any other solutions? Thanks
Another way using gawk:
gawk 'match($5, " ([0-9]+):", r){print r[1]}' input_file
If you want to know how to capture the output of an external process in awk:
awk '{cmd="date -d \""$5"\" +%H";cmd|getline hour;print hour;close(cmd)}' input_file
You can use the substr function to extract the hour without running an external command.
For example:
awk 'BEGIN{print substr("20:49:49",1,2)}'
will produce output as
20
Or, more specifically to the question:
$ awk 'BEGIN{print substr("20141028 20:49:49",10,2)}'
20
substr(str, pos, len) extracts the substring of str that starts at position pos and has length len.
If the value of $5 is 20141028 20:49:49:
$ awk '{print substr($5,10,2)}' input_file
20
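The three-argument match() used above is a gawk extension. A portable sketch uses plain match(), which sets RSTART and RLENGTH, to find the hour without assuming a fixed column. It assumes the time is written as HH:MM:SS after a single space, and the function name hour_of is hypothetical.

```shell
# Hypothetical helper: print the two-digit hour from a "YYYYMMDD HH:MM:SS"
# timestamp anywhere on the line (POSIX awk: match() sets RSTART/RLENGTH).
hour_of() {
  awk '{
    if (match($0, / [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/))
      print substr($0, RSTART + 1, 2)   # skip the leading space
  }' "$@"
}

printf 'a,b,c,d,20141028 20:49:49\n' | hour_of
```

Matching on $0 sidesteps the field separator question entirely; to restrict the search to the fifth field, pass $5 to match() instead.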