Edit header file with awk - header

I have a whitespace-separated-value file that I need to convert so that:
header = tab separated,
records = " ; " separated (space-semicolon-space).
What I'm doing now is:
cat ${original} | awk 'END {FS=" "} { for(i=1; i<=NR; i++) {if (i==1) { OFS="\t"; print $0; } else { OFS=";" ;print $0; }}}' > ${new}
But it is only partly working: first, it produces millions of lines, while the original file has about 90000.
Second, the header, which should be modified here:
if (i==1) { OFS="\t"; print $0; }
is not modified at all.
Another option would be sed; I can get the job partially done that way, but again the header remains untouched:
cat ${original} | sed 's/\t/ ;/g' > ${new}

This line should change all the separators in the file:
awk -F'\t' -v OFS=";" '$1=$1' file
This will leave the header untouched:
awk -F'\t' -v OFS=";" 'NR>1{$1=$1}1' file
This will change only the header line:
awk -F'\t' -v OFS=";" 'NR==1{$1=$1}1' file
You could paste some sample input to let us know why your header was not modified.
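The original command prints each line NR times (the for loop runs once per record already read), which is why the output explodes, and setting OFS before print $0 changes nothing because $0 is rebuilt with the new OFS only when a field is assigned, hence the $1=$1 above. A minimal sketch for the question as asked (tab-separated header, " ; "-separated records), assuming the input really is plain whitespace-separated:
awk 'NR==1{OFS="\t"} NR==2{OFS=" ; "} {$1=$1} 1' "${original}" > "${new}"  # OFS is switched per line; $1=$1 forces the rebuild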

Related

If one string matches at the beginning of the last line in a specific file, then replace another string on that same line, using regex groups?

I have a file "test"
Below is the content
235788###235788###20200724_103122###SUCCESS
235791###235791###20200724_105934###SUCCESS
235833###235833###20200724_130652###FAILURE
235842###235842###20200724_132721###FAILURE
235852###235852###20200724_134607###FAILURE
235791###235791###20200724_105934###SUCCESS
If the last line of this file begins with 235791, then replace the string "SUCCESS" with "FAILURE" on just that line.
Expected Output
235788###235788###20200724_103122###SUCCESS
235791###235791###20200724_105934###SUCCESS
235833###235833###20200724_130652###FAILURE
235842###235842###20200724_132721###FAILURE
235852###235852###20200724_134607###FAILURE
235791###235791###20200724_105934###FAILURE
Below is the sample code:
id=235791
last_build_id=$(tail -1 test | awk -F'###' '{print $1}')
if [ "$id" = "$last_build_id" ]; then
    sed -i '$s/SUCCESS/FAILURE/' test
fi
I would like to avoid all these lines and use a one-line shell command, using regex groups or any other simple way.
sed might be easier here:
$ sed -E '$s/(^235791#.*)SUCCESS$/\1FAILURE/' file
You can add -i for an in-place update.
To pass id as a variable:
$ id=235791; sed -E '$s/(^'$id'#.*)SUCCESS$/\1FAILURE/' file
You should ideally double-quote "$id", but if you're sure about its contents you may get away without it.
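The fully quoted form would look like this (a minimal sketch; -i with no suffix assumes GNU sed):
$ id=235791; sed -i -E '$s/(^'"$id"'#.*)SUCCESS$/\1FAILURE/' file  # "$id" stays a single word even if it ever contains spaces or glob characters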
With GNU sed
sed -E '${/^235791\>/ s/SUCCESS$/FAILURE/}' file
Or with BSD sed on macOS
sed -E '${/^235791#/ s/SUCCESS$/FAILURE/;}' file
When working with "the last X in the file", it's often easier to reverse the file and work with "the first X":
tac file | awk '
BEGIN {FS = OFS = "###"}
NR == 1 && $1 == 235791 && $NF == "SUCCESS" {$NF = "FAILURE"}
1
' | tac
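Since the pipeline cannot edit the file in place, the result would be written to a temporary file first; a minimal sketch (file.new is an arbitrary name):
tac file | awk 'BEGIN{FS=OFS="###"} NR==1 && $1==235791 && $NF=="SUCCESS"{$NF="FAILURE"} 1' | tac > file.new && mv file.new file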
Could you please try the following, written and tested with the shown samples in GNU awk. You need not use many commands for this; it can be done with a single awk.
One-liner form of the code:
awk -v id="$your_shell_variable" 'BEGIN{ FS=OFS="###" } NR>1{print prev} {prev=$0} END{if($1==id && $NF=="SUCCESS"){$NF="FAILURE"}; print}' Input_file > temp && mv temp Input_file
Explanation: a detailed explanation of the above.
awk -v id="$your_shell_variable" ' ##Starting awk program from here.
BEGIN{ FS=OFS="###" } ##Setting ### as input and output field separator.
NR>1{ ##From the 2nd line onwards do following.
print prev ##Printing prev here.
}
{
prev=$0 ##Assigning current line to prev here.
}
END{ ##Starting END block of this program from here.
if($1==id && $NF=="SUCCESS"){ ##Checking condition if first field equals id and last field is SUCCESS then do following.
$NF="FAILURE" ##Setting last field to FAILURE here.
}
print ##Printing last line here.
}
' Input_file > temp && mv temp Input_file ##Mentioning Input_file name here.
2nd solution: As per Ed sir's comment, some awks don't support $1 and $NF in END sections, so if the above doesn't work for you, please try the more generic solution as follows.
One-liner form of the solution (since it was specifically asked for):
awk -v id="$your_shell_variable" 'BEGIN{ FS=OFS="###" } NR>1{print prev} {prev=$0} END{num=split(prev,array,"###");if(array[1]==id && array[num]=="SUCCESS"){array[num]="FAILURE"};for(i=1;i<=num;i++){val=(val?val OFS:"")array[i]};print val}' Input_file > temp && mv temp Input_file
Detailed (non-one-liner) form:
awk -v id="$your_shell_variable" '
BEGIN{ FS=OFS="###" }
NR>1{
print prev
}
{
prev=$0
}
END{
num=split(prev,array,"###")
if(array[1]==id && array[num]=="SUCCESS"){
array[num]="FAILURE"
}
for(i=1;i<=num;i++){
val=(val?val OFS:"")array[i]
}
print val
}
' Input_file > temp && mv temp Input_file
$ awk -v val='235791' '
BEGIN { FS=OFS="###" }
NR>1 { print prev }
{ prev=$0 }
END {
$0=prev
if ($1 == val) {
$NF="FAILURE"
}
print
}
' file
235788###235788###20200724_103122###SUCCESS
235791###235791###20200724_105934###SUCCESS
235833###235833###20200724_130652###FAILURE
235842###235842###20200724_132721###FAILURE
235852###235852###20200724_134607###FAILURE
235791###235791###20200724_105934###FAILURE

Awk column with pattern array

Is it possible to do this but use an actual array of strings where it says "array"?
array=(cat
dog
mouse
fish
...)
awk -F "," '{ if ( $5!="array" ) { print $0; } }' file
I would like to use spaces in some of the strings in my array.
I would also like partial matches to work, so "snow" in my array would match "snowman".
It should be case sensitive.
Example csv
s,dog,34
3,cat,4
1,african elephant,gd
A,African Elephant,33
H,snowman,8
8,indian elephant,3k
7,Fish,94
...
Example array
snow
dog
african elephant
Expected output
s,dog,34
H,snowman,8
1,african elephant,gd
Cyrus posted this, which works well, but it doesn't allow spaces in the array strings and won't do partial matches.
echo "${array[@]}" | awk 'FNR==NR{len=split($0,a," "); next} {for(i=1;i<=len;i++) {if(a[i]==$2){next}} print}' FS=',' - file
The brief approach using a single regexp for all array contents:
$ array=('snow' 'dog' 'african elephant')
$ printf '%s\n' "${array[@]}" | awk -F, 'NR==FNR{r=r s $0; s="|"; next} $2~r' - example.csv
s,dog,34
1,african elephant,gd
H,snowman,8
Or if you prefer string comparisons:
$ cat tst.sh
#!/bin/env bash
array=('snow' 'dog' 'african elephant')
printf '%s\n' "${array[@]}" |
awk -F',' '
NR==FNR {
array[$0]
next
}
{
for (val in array) {
if ( index($2,val) ) { # or $2 ~ val for a regexp match
print
next
}
}
}
' - example.csv
$ ./tst.sh
s,dog,34
1,african elephant,gd
H,snowman,8
This prints every line of the csv file whose column 5 is not equal to any element of the array:
echo "${array[@]}" | awk 'FNR==NR{len=split($0,a," "); next} {for(i=1;i<=len;i++) {if(a[i]==$5){next}} print}' FS=',' - file

How to not remove the header while executing awk

I have a file, file, like this:
k_1_1
k_1_3
k_1_6
...
I have a second file, file2:
0,1,2,3,...
k_1_1,17,16,15,...
k_1_2,17,89,15,...
k_1_3,10,26,45,...
k_1_4,17,16,15,...
k_1_5,10,26,45,...
k_1_6,17,16,15,...
...
I want to print the lines of file2 that match file. The desired output is:
0,1,2,3,...
k_1_1,17,16,15,...
k_1_3,10,26,45,...
k_1_6,17,16,15,...
I tried
awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1];next}$1 in a {print $0}' file file2 > result
But the header line is gone in result, like this:
k_1_1,17,16,15,...
k_1_3,10,26,45,...
k_1_6,17,16,15,...
How can I maintain it? Thank you.
Always print the first line, unconditionally.
awk 'BEGIN{FS=OFS=","}
NR==FNR{a[$1];next}
FNR==1 || $1 in a' file file2 > result
Notice also how { print $0 } is not necessary because it's the default action.
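For example, these two commands both print just the first line of a file; the bare condition triggers the default print action:
awk 'NR==1' file
awk 'NR==1 { print $0 }' file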
A very ad-hoc solution to your problem could be to compose the output in a command group:
{ head -1 file2; awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1];next}$1 in a {print $0}' file file2; } > result
Could you please try the following.
awk -F, 'FNR==NR{a[$1]=$0;next} FNR==1 && ++count==1{print;next} a[$1]' Input_file Input_file2
OR
awk -F, 'FNR==NR{a[$1]=$0;next} FNR==1{print;next} a[$1]' Input_file Input_file2

awk command to split nth field

I am learning AWK and was trying some exercises on built-in string functions.
Here's my exercise:
I have a file containing as below
RecordType:83
1,2,3,a|x|y|z,4,5
And my desired output is as below:
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
I wrote an awk command for the above output.
awk -F',' '$1 ~ /RecordType:83/{print $0}
$1 == 1{
split($4,splt,"|")
for(i in splt)
{
if(i==1)
print $1,$2,$3,splt[i],$5,$6
else
print $1,0,0,splt[i],$5,$6
}
}' OFS=, file_name
The above command looks clumsy. Is there any way to minimize it?
Thanks in advance
The shortest possible one-liner I could manage:
awk -F, 'NR>1{n=split($4,a,"|");for(i=0;i++<n;){$4=a[i];print;$2=$3=0}}NR==1' OFS=, file
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
The much more readable script (recommended):
BEGIN {
FS=OFS="," # Comma delimiter
}
NR==1 { # If the first line in file
print $0 # Print the whole line
next # Skip to next line
}
{
n=split($4,a,"|") # Split field four on |
for(i=1;i<=n;i++) # For each sub-field
print $1,i==1?$2OFS$3:"0"OFS"0",a[i],$5,$6 # Print the output
}
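Saved to a file (say split.awk, the name is just an example), the script above would be run as:
awk -f split.awk file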
Another, shorter one-liner:
awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");i=0;while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
with your example:
kent$ awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");i=0;while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5

using awk to count characters and modify file accordingly

I have a file that looks like this
#FCD17BKACXX:8:1101:2703:2197#0/1
CAGCTTTACTCGTCATTTCCCCCAAGGGTAAAATGCGTCCGTCCATTAAGTTCACAGTCATCGTCT
+FCD17BKACXX:8:1101:2703:2197#0/1
^`^\eggcghheJ`dffhhhffhe`ecd^a^_ceacecfhf\beZegfhh_fghhgfZbdg]c^a`
#FCD17BKACXX:8:1101:4434:2244#0/1
CTGCGTTCATCGCGTTGTTGGGAGGAATCTCTACCCCAGGTTCTCGCTGTGAA
+FCD17BKACXX:8:1101:4434:2244#0/1
eeecgeceeffhhihi_fhhiicdgfghiiihiiihiiihVbcdgfhge`cee
#FCD17BKACXX:8:1101:6394:2107#0/1
CAGCAGGACTAGGGCCTGCAGACGTACTG
+FCD17BKACXX:8:1101:6394:2107#0/1
eeeccggeghhiihiihihihhhhcfghf
I would like to go to every second line and count the number of characters. If the line contains less than e.g. 66 characters then fill it to 66 with 'A' and print to new file. If it contains 66 characters then just print the line as is.
The output file would look like this;
#FCD17BKACXX:8:1101:2703:2197#0/1
CAGCTTTACTCGTCATTTCCCCCAAGGGTAAAATGCGTCCGTCCATTAAGTTCACAGTCATCGTCT
+FCD17BKACXX:8:1101:2703:2197#0/1
^`^\eggcghheJ`dffhhhffhe`ecd^a^_ceacecfhf\beZegfhh_fghhgfZbdg]c^a`
#FCD17BKACXX:8:1101:4434:2244#0/1
CTGCGTTCATCGCGTTGTTGGGAGGAATCTCTACCCCAGGTTCTCGCTGTGAAAAAAAAAAAAAAA
+FCD17BKACXX:8:1101:4434:2244#0/1
eeecgeceeffhhihi_fhhiicdgfghiiihiiihiiihVbcdgfhge`ceeAAAAAAAAAAAAA
#FCD17BKACXX:8:1101:6394:2107#0/1
CAGCAGGACTAGGGCCTGCAGACGTACTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+FCD17BKACXX:8:1101:6394:2107#0/1
eeeccggeghhiihiihihihhhhcfghfAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
I have a very basic knowledge of awk so from a learning perspective I would like to use awk to solve the problem.
One way:
awk '!(NR%2) && length<66{for(i=length;i<66;i++)$0=$0 "A"}1' file
This should be faster than the accepted approach:
awk 'NR%2==0 { x = sprintf("%-66s", $0); gsub(/ /,"A",x); $0 = x }1' file
Results:
#FCD17BKACXX:8:1101:2703:2197#0/1
CAGCTTTACTCGTCATTTCCCCCAAGGGTAAAATGCGTCCGTCCATTAAGTTCACAGTCATCGTCT
+FCD17BKACXX:8:1101:2703:2197#0/1
^`^\eggcghheJ`dffhhhffhe`ecd^a^_ceacecfhf\beZegfhh_fghhgfZbdg]c^a`
#FCD17BKACXX:8:1101:4434:2244#0/1
CTGCGTTCATCGCGTTGTTGGGAGGAATCTCTACCCCAGGTTCTCGCTGTGAAAAAAAAAAAAAAA
+FCD17BKACXX:8:1101:4434:2244#0/1
eeecgeceeffhhihi_fhhiicdgfghiiihiiihiiihVbcdgfhge`ceeAAAAAAAAAAAAA
#FCD17BKACXX:8:1101:6394:2107#0/1
CAGCAGGACTAGGGCCTGCAGACGTACTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+FCD17BKACXX:8:1101:6394:2107#0/1
eeeccggeghhiihiihihihhhhcfghfAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Here is another (maybe strange) one-liner; it pre-builds a string of 65 As in BEGIN and appends just the tail of it that each even line needs:
awk 'BEGIN{while(++i<66)t=t"A"}!(NR%2){$0=$0substr(t,length)}1' file
awk 'NR%2 == 0 {
    printf("%s", $0)
    for (i=length($0); i<66; i++) printf("A")
    print ""; next
}
{ print }'
With an empty FS, GNU awk treats every character as a separate field, so NF is the line length:
awk -v FS= '{printf "%s",$0} !(NR%2){for (i=NF+1;i<=66;i++) printf "A"} {print ""}'
or if you don't like loops:
awk -v FS= '{sfx=(NR%2 ? "" : sprintf("%*s",66-NF,"")); gsub(/ /,"A",sfx); print $0 sfx}'