Changing value of first column in a file using awk

I have a .csv file with multiple columns, and I want to change the value of the first column to "positive" in every row, except rows where it is already "negative".
So if I have the file:
product,0 0,no way
brand,0 0 0,detergent
product,0 0 1,sugar
negative,0 0 1, sight
I want to make it
positive,0 0,no way
positive,0 0 0,detergent
positive,0 0 1,sugar
negative,0 0 1, sight
How can I accomplish this using awk?

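This awk one-liner sets the first column to "positive" on every row:
awk -F, -v OFS="," '$1="positive"' file
Since you also tagged the question with sed:
sed 's/[^,]*/positive/' file
Update for the OP's new requirement (rows whose first column is "negative" stay unchanged):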
awk -F, -v OFS="," '$1="negative"==$1?$1:"positive"' file
or:
awk -F, -v OFS="," '"negative"==$1||$1="positive"' file
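For readers who find the one-liners above terse, the same rule can be spelled out as an explicit condition. This is only a minimal sketch, not the answerer's code: the inline sample reproduces the question's data, and the variable names input and result are purely illustrative.

```shell
# Sample rows from the question, held in a variable so no file is needed.
input='product,0 0,no way
brand,0 0 0,detergent
product,0 0 1,sugar
negative,0 0 1, sight'

# FS = OFS = "," keeps the commas when awk rebuilds a modified record.
# Rows whose first field is not "negative" get "positive"; the trailing 1 prints every row.
result=$(printf '%s\n' "$input" |
  awk 'BEGIN { FS = OFS = "," } $1 != "negative" { $1 = "positive" } 1')

printf '%s\n' "$result"
```

Only the first field is touched, so the spaces inside the other columns are preserved as they are.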

This might work for you (GNU sed):
sed '/^negative,/!s/^[^,]*/positive/' file
If the line does not start with "negative,", its first field is replaced with "positive".


How can I merge every block of 3 lines together while ignoring lower numbers of consecutive lines?

I have a text file like the following. It contains blocks of text separated by blank lines, and each block is either a multiple of 3 lines long or just 1 line:
AAAAAAAAAAAAA
BBBBBBBBBBBBB
CCCCCCCCCCCCC
DDDDDDDDDDDDD
EEEEEEEEEEEEE
FFFFFFFFFFFFF

GGGGGGGGGGGGG

HHHHHHHHHHHHH
IIIIIIIIIIIII
JJJJJJJJJJJJJ

KKKKKKKKKKKKK

LLLLLLLLLLLLL
MMMMMMMMMMMMM
NNNNNNNNNNNNN
OOOOOOOOOOOOO
PPPPPPPPPPPPP
QQQQQQQQQQQQQ
RRRRRRRRRRRRR
SSSSSSSSSSSSS
TTTTTTTTTTTTT

UUUUUUUUUUUUU

VVVVVVVVVVVVV
WWWWWWWWWWWWW
XXXXXXXXXXXXX
YYYYYYYYYYYYY
ZZZZZZZZZZZZZ
1111111111111
I would like to merge every group of 3 consecutive lines into one line, starting with the first line of each block. Runs of fewer than 3 consecutive lines should be left as they are.
The characters and line lengths always differ (I made the lines the same size in the example so that it doesn't look too ugly).
So the output would be:
AAAAAAAAAAAAA BBBBBBBBBBBBB CCCCCCCCCCCCC
DDDDDDDDDDDDD EEEEEEEEEEEEE FFFFFFFFFFFFF

GGGGGGGGGGGGG

HHHHHHHHHHHHH IIIIIIIIIIIII JJJJJJJJJJJJJ

KKKKKKKKKKKKK

LLLLLLLLLLLLL MMMMMMMMMMMMM NNNNNNNNNNNNN
OOOOOOOOOOOOO PPPPPPPPPPPPP QQQQQQQQQQQQQ
RRRRRRRRRRRRR SSSSSSSSSSSSS TTTTTTTTTTTTT

UUUUUUUUUUUUU

VVVVVVVVVVVVV WWWWWWWWWWWWW XXXXXXXXXXXXX
YYYYYYYYYYYYY ZZZZZZZZZZZZZ 1111111111111
I have tried to use
xargs -n3
However, I'm not sure how to ignore single lines.
How can I achieve this?
With GNU awk for gensub():
$ awk -v RS= -v ORS='\n\n' '{$1=$1; print gensub(/(([^ ]+ ){2}[^ ]+) /,"\\1\n","g")}' file
AAAAAAAAAAAAA BBBBBBBBBBBBB CCCCCCCCCCCCC
DDDDDDDDDDDDD EEEEEEEEEEEEE FFFFFFFFFFFFF

GGGGGGGGGGGGG

HHHHHHHHHHHHH IIIIIIIIIIIII JJJJJJJJJJJJJ

KKKKKKKKKKKKK

LLLLLLLLLLLLL MMMMMMMMMMMMM NNNNNNNNNNNNN
OOOOOOOOOOOOO PPPPPPPPPPPPP QQQQQQQQQQQQQ
RRRRRRRRRRRRR SSSSSSSSSSSSS TTTTTTTTTTTTT

UUUUUUUUUUUUU

VVVVVVVVVVVVV WWWWWWWWWWWWW XXXXXXXXXXXXX
YYYYYYYYYYYYY ZZZZZZZZZZZZZ 1111111111111
In awk:
$ awk -v FS="\n" -v RS="" '{for(i=1;i<=NF;i+=3)print $i,$(i+1),$(i+2);print ""}' file
Output:
AAAAAAAAAAAAA BBBBBBBBBBBBB CCCCCCCCCCCCC
DDDDDDDDDDDDD EEEEEEEEEEEEE FFFFFFFFFFFFF
GGGGGGGGGGGGG
HHHHHHHHHHHHH IIIIIIIIIIIII JJJJJJJJJJJJJ
...
Updated version that won't leave trailing spaces:
$ awk -v FS="\n" -v RS="" '{for(i=1;i<=NF;i++)printf "%s%s",$i,(i%3==0||i==NF?ORS:OFS);print ""}' file
Please see discussion on some features in the comments. Thanks to the commentators for the constructive feedback.
Here is a different approach, which reads the file line by line instead of relying on paragraph mode:
awk '(NF==0){print rec ORS; rec="";c=0; next}
{rec = rec (c ? (c%3==0 ? ORS : OFS) : "") $0; c++ }
END {print rec}' file
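To see the paragraph-mode approach at work, here is a small self-contained run. It is a sketch under the assumption that blocks are separated by single blank lines; the short sample lines and the variable names input and result are made up for illustration.

```shell
# Hypothetical sample: a 6-line block, a 1-line block, and a 3-line block.
input='A
B
C
D
E
F

G

H
I
J'

# RS="" makes each blank-line-separated block one record; FS="\n" makes each line a field.
# Every third field, or the last field of a block, ends the output line; the others get a space.
result=$(printf '%s\n' "$input" |
  awk -v FS='\n' -v RS='' '{
    for (i = 1; i <= NF; i++)
      printf "%s%s", $i, (i % 3 == 0 || i == NF ? ORS : OFS)
    print ""    # blank line between blocks
  }')

printf '%s\n' "$result"
```

The single-line block G comes through unchanged and without a trailing space, because the last field of a block always gets ORS rather than OFS.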
This might work for you (GNU sed):
sed '/\S/{N;/\n\s*$/b;N;//b;s/\n/ /g}' file
If the current line is not empty, append the next line.
If the appended line is not empty, append the next line.
If that line is also not empty, replace the newlines by spaces.
In all other cases print the line(s) as is.
An alternative that is more programmatic:
sed ':a;N;s/\n/&/2;Ta;/^\s*$/M{P;D};s/\n/ /g' file

awk condition for 30th column of a line is not working

My input file looks like this:
1,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,,kapeta,,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,C
2,,B4,3000,Rushab,UNI,20130919T22:45:05+0100,20190930T23:59:59+0100,20181006T11:57:13+0100,,vsuser,6741948090816,2285917436,971078887,1283538808965528,20181102_20001,,,,,,,,,,,,,,,H
1,,F1,100000,RAWBANK,UNI,20180416T15:25:00+0100,20190416T23:59:59+0100,,enrruac,,7522609506635,3101315044,998445487,1290161608965816,20181102_20001,,,,,,,,,,,,,,,C
4,,F1,100000,RAWBANK,UNI,20180416T15:25:00+0100,20190416T23:59:59+0100,20181007T22:25:13+0100,,vsuser,7522609506635,3101315044,998445487,1290161608965816,20181102_20001,,,,,,,,,,,,,,,H
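I want to print only the lines that start with '1' and end with 'C', so I am trying the command below,
awk -F, '$1=='1' && $31=='C'{print $0}' input_file.txt
but I am not getting any output.
Use double quotes inside the awk program. The inner single quotes end the shell's quoting, so awk actually receives $1==1 && $31==C, where C is an unset variable: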
awk -F, '$1=="1" && $31=="C"{print $0}' file
or
awk -F, '$1=="1" && $31=="C"' file
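The quoting problem is easy to reproduce on a smaller input. The sketch below assumes a made-up 3-column sample, so $3 stands in for the question's $31; the names data, broken, and fixed are illustrative only.

```shell
# Hypothetical 3-column sample; the real file has 31 columns, so $3 stands in for $31.
data='1,x,C
2,x,H
1,y,H'

# Inner single quotes close the shell string, so awk receives: $1==1 && $3==C{print $0}
# C is then an unset awk variable (empty), and no row has an empty third field.
broken=$(printf '%s\n' "$data" | awk -F, '$1=='1' && $3=='C'{print $0}')

# Double quotes inside the single-quoted program are awk string literals.
fixed=$(printf '%s\n' "$data" | awk -F, '$1=="1" && $3=="C"')

printf 'broken: [%s]\nfixed:  [%s]\n' "$broken" "$fixed"
```

The broken variant raises no syntax error, which is why the original command silently printed nothing.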
As other users suggested, this can be done with a simple regex, so you can use sed as well as awk:
sed '/^1,.*,C$/!d' file

awk print several substrings

I would like to be able to print several substrings via awk.
Here is an example of what I usually do:
awk '{print substr($0,index($0,string),10)}' test.txt > result.txt
This allows me to print 10 characters starting from where my string is found.
But the result is only the first substring, instead of several as I expected.
Here is an example if I use the string "ATGC":
test.txt
ATGCATATAAATGCTTTTTTTTT
result.txt
ATGCATATAA
instead of
ATGCATATAA
ATGCTTTTTT
What do I have to add?
I'm sure the answer is easy for you guys!
Thank you for your help.
If you have gawk (GNU awk), you can make use of FPAT:
awk -v FPAT='ATGC.{6}' '{for(i=1;i<=NF;i++)print $i}' file
With your example:
$ awk -v FPAT='ATGC.{6}' '{for(i=1;i<=NF;i++)print $i}' <<<"ATGCATATAAATGCTTTTTTTTT"
ATGCATATAA
ATGCTTTTTT
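FPAT is specific to gawk. As a portable alternative, the sketch below loops with index() and substr(), which stays closer to the asker's original attempt. The variable names s, n, and rest are illustrative, and unlike the FPAT pattern this version would also print a shorter final piece if fewer than 10 characters remained after a match.

```shell
# Repeatedly find the search string s and print n characters starting at each match.
result=$(printf '%s\n' 'ATGCATATAAATGCTTTTTTTTT' |
  awk -v s='ATGC' -v n=10 '{
    rest = $0
    while ((i = index(rest, s)) > 0) {
      print substr(rest, i, n)
      rest = substr(rest, i + n)   # resume after the printed window (non-overlapping matches)
    }
  }')

printf '%s\n' "$result"
```

Resuming after the printed window mirrors how FPAT splits the line into non-overlapping fields; resuming at i + 1 instead would report overlapping occurrences as well.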
When the offsets are known in advance, as in this example, two substr calls are enough:
awk '{print substr($0,1,10) RS substr($0,length-12,10)}' file
ATGCATATAA
ATGCTTTTTT

Print line modified and the line after using awk

I want to modify lines in a file using awk and print the new lines with the following line.
My file is like this
Name_Name2_ Name3_Name4
ASHRGSJFSJRGDJRG
Name5_Name6_Name7_Name8
ADGTHEGHGTJKLGRTIWRK
I want
Name-Name2
ASHRGSJFSJRGDJRG
Name5-Name6
ADGTHEGHGTJKLGRTIWRK
I have used awk to modify my file:
awk -F'_' '{print $1 "-" $2}' file > newfile
but I don't know how to also print the line that follows each modified line (ABDJRH).
I'm sure it is possible with awk, using something like x=NR+1 and NR<=x.
thanks
The following awk may help. It rewrites the lines that contain an underscore and prints all other lines unchanged:
awk -F"_" '/_/{print $1"-"$2;next} 1' Input_file
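The asker's own idea of remembering x=NR+1 can also be made to work. The following is a sketch of that variant, not the answerer's code: it prints a non-matching line only when it directly follows a rewritten one. The sample is the question's data, and input and result are illustrative names.

```shell
input='Name_Name2_ Name3_Name4
ASHRGSJFSJRGDJRG
Name5_Name6_Name7_Name8
ADGTHEGHGTJKLGRTIWRK'

result=$(printf '%s\n' "$input" |
  awk -F_ '
    /_/     { print $1 "-" $2; x = NR + 1; next }  # rewrite the line, remember the next line number
    NR == x { print }                               # print only the line right after a rewritten one
  ')

printf '%s\n' "$result"
```

On this input both variants give the same output; they differ only if the file contains extra lines that neither have an underscore nor follow a line that does.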
Assuming the structure in your sample (no whitespace within the data lines), this keeps only the first whitespace-separated field of each line:
awk '$0=$1' Input_file
# or with sed
sed 's/[[:space:]].*//' Input_file

Using each line of awk output as grep pattern

I want to find every line of a file that contains any of the strings held in a column of a different file.
I have tried
grep "$(awk '{ print $1 }' file1.txt)" file2.txt
but that just outputs file2.txt in its entirety.
I know I've done this before with a pattern I found on this site, but I can't find that question anymore.
I see in the OP's comment that maybe the question is no longer a question. However, the following slight modification will handle the blank line situation. Just add a check to make sure the line has at least one field:
grep "$(awk '{if (NF > 0) print $1}' file1)" file2
And if the file with the patterns is simply a set of patterns per line, then a much simpler version of it is:
grep -f file1 file2
That causes grep to use the lines in file1 as the patterns.
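The blank-line problem and the fix can be reproduced with two tiny files. This is a sketch with made-up sample data; the temporary directory, the file names, and the variables all and result are assumptions for illustration. It also adds -F so the patterns are treated as fixed strings rather than regular expressions, which the original commands do not do.

```shell
# Hypothetical sample files; note the blank second line in file1.
tmp=$(mktemp -d)
printf 'apple 1\n\nkiwi 2\n' > "$tmp/file1"
printf 'green apple pie\nbanana split\nkiwi jam\n' > "$tmp/file2"

# The blank line becomes an empty pattern, and an empty pattern matches every line.
all=$(grep "$(awk '{ print $1 }' "$tmp/file1")" "$tmp/file2")

# Skipping blank lines (NF) and matching fixed strings (-F) leaves only the real hits.
awk 'NF { print $1 }' "$tmp/file1" > "$tmp/patterns"
result=$(grep -F -f "$tmp/patterns" "$tmp/file2")

rm -rf "$tmp"
printf '%s\n' "$result"
```

Here all holds every line of file2, while result holds only the two lines that contain "apple" or "kiwi".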
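There is no need to use grep when you have awk. This prints the lines of file1 whose first field is identical to some non-blank line of file2 (an exact match, not a substring search):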
awk 'FNR==NR&&NF{a[$0];next}($1 in a)' file2 file1
awk '{ print $1 }' file1.txt | grep text > file.txt