I have the following file (named /tmp/test99) which contains these rows:
"0","15","wall15"
123132,09808098,"0","15"
I am trying to filter the rows that contain "0" in the 3rd field and "15" in the 4th field (like the second row).
I tried running:
cat /tmp/test99 | awk '/"0","15"/{print>"/tmp/0_15_file.out"} '
but instead of getting only the second row, I also get the first row, which starts with "0","15".
Could you please help with the pattern?
Thanks:)
You may check if Fields 3 and 4 are equal to some hardcoded value using
awk -F, '$3=="\"0\"" && $4=="\"15\""'
Set the field separator to a comma and then, if Field 3 is "0" and Field 4 is "15", print the line; otherwise discard it.
See the online demo:
s='"0","15","wall15"
123132,09808098,"0","15"'
awk -F, '$3=="\"0\"" && $4=="\"15\""' <<< "$s"
# => 123132,09808098,"0","15"
Could you please try the following. (A comment on your effort: you need NOT use cat with awk; it can read the Input_file by itself.)
awk -F, '$3~/"0"/ && $4~/"15"/' Input_file
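If it helps, here is a quick check of that command against the two sample lines from the question (same data as above, fed through a here-string):
s='"0","15","wall15"
123132,09808098,"0","15"'
awk -F, '$3~/"0"/ && $4~/"15"/' <<< "$s"
# => 123132,09808098,"0","15"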
I would like to duplicate each line (print it 2 times) and print the values of columns 5 and 6 separately (i.e. transpose the values of columns 5 and 6 from columns to rows) for each line.
I mean the value of column 5 on the first copy of the line and the value of column 6 on the second.
Input File
08,1218864123180000,3201338573,VV,22,27
08,1218864264864000,3243738789,VV,15,23
08,1218864278580000,3244738513,VV,3,13
08,1218864310380000,3243938789,VV,15,23
08,1218864324180000,3244538513,VV,3,13
08,1218864334380000,3200538561,VV,22,27
Desired Output
08,1218864123180000,3201338573,VV,22
08,1218864123180000,3201338573,VV,27
08,1218864264864000,3243738789,VV,15
08,1218864264864000,3243738789,VV,23
08,1218864278580000,3244738513,VV,3
08,1218864278580000,3244738513,VV,13
08,1218864310380000,3243938789,VV,15
08,1218864310380000,3243938789,VV,23
08,1218864324180000,3244538513,VV,3
08,1218864324180000,3244538513,VV,13
08,1218864334380000,3200538561,VV,22
08,1218864334380000,3200538561,VV,27
I use this code to duplicate the lines 2 times, but I can't figure out how to handle the values of columns 5 and 6:
awk '{print;print}' file
Thanks in advance
To repeatedly print the start of a line for each of the last N fields, where N is 2 in this case:
$ awk -v n=2 '
BEGIN { FS=OFS="," }
{
    base = $0
    # strip the last n fields (and their separators) from the copy
    sub("("FS"[^"FS"]+){"n"}$","",base)
    # print the shortened line once per trailing field, appending that field
    for (i=NF-n+1; i<=NF; i++) {
        print base, $i
    }
}
' file
08,1218864123180000,3201338573,VV,22
08,1218864123180000,3201338573,VV,27
08,1218864264864000,3243738789,VV,15
08,1218864264864000,3243738789,VV,23
08,1218864278580000,3244738513,VV,3
08,1218864278580000,3244738513,VV,13
08,1218864310380000,3243938789,VV,15
08,1218864310380000,3243938789,VV,23
08,1218864324180000,3244538513,VV,3
08,1218864324180000,3244538513,VV,13
08,1218864334380000,3200538561,VV,22
08,1218864334380000,3200538561,VV,27
In this simple case, where the last field has to be removed and its value placed on a second copy of the line, you can do
awk -F , -v OFS=, '{ x = $6; NF = 5; print; $5 = x; print }'
Here -F , and -v OFS=, will set the input and output field separators to a comma, respectively, and the code does
{
x = $6 # remember sixth field
NF = 5 # Set field number to 5, so the last one won't be printed
print # print those first five fields
$5 = x # replace value of fifth field with remembered value of sixth
print # print modified line
}
This approach can be extended to handle fields in the middle with a function like the one in the accepted answer of this question.
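For illustration only, here is a minimal sketch of that idea; the without() helper is a hypothetical name of mine, not the function from the linked answer. It rebuilds the record as a string with one chosen field dropped, which also sidesteps the NF caveat discussed in the EDIT below:
awk -F , -v OFS=, '
# return a copy of the current record with field number pos removed
function without(pos,    i, out, sep) {
    out = ""; sep = ""
    for (i = 1; i <= NF; i++)
        if (i != pos) { out = out sep $i; sep = OFS }
    return out
}
{ print without(6); print without(5) }   # same two-line output as above
' file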
EDIT: As Ed notes in the comments, writing to NF is not explicitly defined to trigger a rebuild of $0 (the whole-line record that print prints) in the POSIX standard. The above code works with GNU awk and mawk, but with BSD awk (as found on *BSD and probably Mac OS X) it fails to do anything.
So to be standards-compliant, we have to be a little more explicit and force awk to rebuild $0 from the modified field state. This can be done by assigning to any of the field variables $1...$NF, and it's common to use $1=$1 when this problem pops up in other contexts (for example: when only the field separator needs to be changed but not any of the data):
awk -F , -v OFS=, '{ x = $6; NF = 5; $1 = $1; print; $5 = x; print }'
I've tested this with GNU awk, mawk and BSD awk (which are all the awks I can lay my hands on), and I believe this to be covered by the awk bit in POSIX where it says "setting any other field causes the re-evaluation of $0" right at the top. Mind you, the spec could be more explicit on this point, and I'd be interested to test if more exotic awks behave the same way.
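As a small aside, the $1=$1 idiom from that other context can be seen in isolation (input invented for demonstration); without the field assignment the record is printed untouched, and with it awk rebuilds $0 using the new OFS:
$ echo 'a,b,c' | awk -F , -v OFS=';' '{ print }'
a,b,c
$ echo 'a,b,c' | awk -F , -v OFS=';' '{ $1 = $1; print }'
a;b;c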
Could you please try the following (assuming that your Input_file is always in the format shown and you need to print the first four fields every time, followed by each of the remaining fields one by one along with those first four).
awk 'BEGIN{FS=OFS=","}{for(i=5;i<=NF;i++){print $1,$2,$3,$4,$i}}' Input_file
This might work for you (GNU awk):
awk '{print gensub(/((.*,).*),/,"\\1\n\\2",1)}' file
Replace the last comma with a newline followed by a copy of the preceding fields minus the penultimate one.
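For instance, feeding a single record from the input above through that same GNU awk command:
$ echo '08,1218864123180000,3201338573,VV,22,27' | awk '{print gensub(/((.*,).*),/,"\\1\n\\2",1)}'
08,1218864123180000,3201338573,VV,22
08,1218864123180000,3201338573,VV,27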
I would like to replace the middle of each word with ****.
For example :
ifbewofiwfib
wofhwifwbif
iwjfhwi
owfhewifewifewiwei
fejnwfu
fehiw
wfebnueiwbfiefi
Should become :
if********ib
wo*******if
iw***wi
ow**************ei
fe***fu
fe*iw
wf***********fi
So far I managed to replace all but the first 2 chars with:
sed -e 's/./*/g3'
Or do it the long way:
grep -o '^..' file > start
cat file | sed 's:^..\(.*\)..$:\1:' | awk -F. '{for (i=1;i<=length($1);i++) a=a"*";$1=a;a=""}1' > stars
grep -o '..$' file > end
paste -d "" start stars > temp
paste -d "" temp end > final
I would use Awk for this if you have GNU Awk, which lets you set the field separator to an empty string (How to set the field separator to an empty string?).
This way, you can loop through the characters and replace the desired ones with "*". In this case, replace from the 3rd to the 3rd-from-last:
$ awk 'BEGIN{FS=OFS=""}{for (i=3; i<=NF-2; i++) $i="*"} 1' file
if********ib
wo*******if
iw***wi
ow**************ei
fe***fu
fe*iw
wf***********fi
If perl is okay:
$ perl -pe 's/..\K.*(?=..)/"*" x length($&)/e' ip.txt
if********ib
wo*******if
iw***wi
ow**************ei
fe***fu
fe*iw
wf***********fi
..\K.*(?=..) matches all characters other than the first/last two characters
See the regex lookarounds section for details
The e modifier allows Perl code to be used in the replacement section
"*" x length($&) uses the length function and the string repetition operator to build the desired replacement string
You can do it with a repetitive substitution, e.g.:
sed -E ':a; s/^(..)([*]*)[^*](.*..)$/\1\2*\3/; ta'
Explanation
This works by repeating the substitution until no change happens; that is what the :a; ...; ta bit does (a short trace is sketched after the list below). The substitution consists of 3 matched groups and a non-asterisk character:
(..) the start of the string.
([*]*) any already replaced characters.
[^*] the character to be replaced next.
(.*..) any remaining characters to replace and the end of the string.
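Tracing it by hand on one of the sample words (my own walk-through, not part of the original answer):
$ echo iwjfhwi | sed -E ':a; s/^(..)([*]*)[^*](.*..)$/\1\2*\3/; ta'
iw***wi
The pattern space evolves as iwjfhwi -> iw*fhwi -> iw**hwi -> iw***wi; the fourth attempt finds no non-asterisk character that still has two characters after it, so ta falls through and the line is printed.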
Alternative GNU sed answer
You could also do this by using the hold space which might be simpler to read, e.g.:
h # save a copy to hold space
s/./*/g3 # replace all but 2 by *
G # append hold space to pattern space
s/^(..)([*]*)..\n.*(..)$/\1\2\3/ # reformat pattern space
Run it like this:
sed -Ef parse.sed input.txt
Output in both cases
if********ib
wo*******if
iw***wi
ow**************ei
fe***fu
fe*iw
wf***********fi
The following awk may help you with the same. It should work with any version of awk.
awk '{len=length($0);for(i=3;i<=(len-2);i++){val=val "*"};print substr($0,1,2) val substr($0,len-1);val=""}' Input_file
Adding a non-one-liner form of the solution too now.
awk '
{
len=length($0);
for(i=3;i<=(len-2);i++){
val=val "*"};
print substr($0,1,2) val substr($0,len-1);
val=""
}
' Input_file
Explanation: Adding an explanation for the above code now too.
awk '
{
len=length($0); ##Creating variable named len whose value is length of current line.
for(i=3;i<=(len-2);i++){ ##Starting a for loop that runs from i=3 to len-2 and does the following:
val=val "*"}; ##Appending a "*" to variable val on each iteration.
print substr($0,1,2) val substr($0,len-1);##Printing the first 2 chars, then variable val (the asterisks), then the last 2 chars of the current line.
val="" ##Resetting variable val so its value does not carry over to the next line.
}
' Input_file ##Mentioning the Input_file name here.
I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files, so I am trying to use
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How do I correctly use $i inside the {print}?
The following single awk may help you here too:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){print $1,$i > "file"i;close("file"i)}}' Input_file
An all-awk solution. First, the test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for loop iterates from 2 to the last field. If there are more fields than you want to process, change NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, you'd need to put the print into a block and add a close call:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to do what you are trying to accomplish:
for i in {2..99}; do
awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done
Notes
the -v switch of awk passes shell variables into awk
$x is the nth column, where n is the value stored in the awk variable x
Note 2: this is not the fastest solution (a single awk call is fastest), but I am just trying to correct your logic. Ideally, take the time to understand awk; it is never time wasted.
I have a file in which one field is a time stamp like 20141028 20:49:49. I want to get the hour, 20, so I used the system command:
hour=system("date -d\""$5"\" +'%H'")
The time stamp is the fifth field in my file, so I used $5. But when I executed the program, I found that the command above just outputs 20 and returns 0, so hour is 0, not 20. So my question is: how do I get the hour from the time stamp?
I know a method which uses the split function two times, like this:
split($5, vec, " " )
split(vec[2], vec2, ":")
But this method is a little inefficient and ugly.
So are there any other solutions? Thanks
Another way using gawk:
gawk 'match($5, " ([0-9]+):", r){print r[1]}' input_file
If you want to know how to manage external process output in awk:
awk '{cmd="date -d \""$5"\" +%H";cmd|getline hour;print hour;close(cmd)}' input_file
You can use the substr function to extract the hour without using the system command.
For example:
awk {'print substr("20:49:49",1,2)}'
will produce output as
20
Or, more specifically, as in the question:
$ awk {'print substr("20141028 20:49:49",10,2)}'
20
substr(str, pos, len) extracts a substring of length len from str starting at position pos.
If the value of $5 is 20141028 20:49:49:
$ awk '{print substr($5,10,2)}'
20