How can I store the length of a line in a variable within an awk script? - awk

I have this simple awk script with which I attempt to check the number of characters in the first line.
If the first line has more or fewer than 10 characters, I want to store that number of characters
in a variable.
Somehow the first print statement works, but storing that result in a variable doesn't.
Please help.
I tried removing the dollar sign ("thelength=(length($0))")
and removing the parentheses ("thelength=length($0)"), but then it doesn't print anything...
Thanks!
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=$(length($0))
print "The length of the first line is: ",$thelength;
exit 1;
}
}
END { print "STOP" }' $1

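Two issues come from mixing ksh and awk syntax:
$(...) is not command substitution inside awk; $(expr) is a field reference, so thelength=$(length($0)) assigns the field whose number equals the line length (usually empty). Use thelength=length($0)
awk variables do not take a leading $ when referenced ($thelength means "the field numbered by thelength"); use print ... ,thelength
So your code becomes: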
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=length($0)
print "The length of the first line is: ",thelength;
exit 1;
}
}
END { print "STOP" }' "$1"
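To see the difference in isolation, here is a small sketch that stores the first line's length in an awk variable, next to the buggy `$(...)` form from the question. The function names are hypothetical; only the awk expressions come from the question and answer. Inside awk, `$(expr)` selects the field whose number is `expr`, so for a short line it usually yields an empty string rather than the length.

```shell
# Hypothetical helper names; only the awk expressions come from the post.

# Correct: length($0) is an ordinary awk expression stored in a plain variable.
first_line_length() {
  awk 'NR == 1 { thelength = length($0); print thelength; exit }'
}

# Buggy form from the question: $(length($0)) means "the field whose number
# is length($0)", which for a short line is a field that does not exist.
first_line_field() {
  awk 'NR == 1 { thelength = $(length($0)); print "[" thelength "]"; exit }'
}

printf 'abcdefghij\n' | first_line_length   # prints 10
printf 'a b\n' | first_line_field           # prints [] (field 3 of a 2-field line)
```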

Related

Awk: I want to use the input filename to generate an output file with same name different extension

I have a script that looks like this:
#! /bin/awk -f
BEGIN { print "start" }
{ print $0 }
END { print "end" }
Call the script like this: ./myscript.awk test.txt
Pretty simple - takes a file and adds "start" to the start and "end" to the end.
Now I want to take the input filename, let's call it test.txt, and print the output to a file called test.out.
So I tried to print the input filename:
BEGIN { print "fname: '" FILENAME "'" }
But that printed: fname: '' :(
The rest I can figure out, I think. I have the following to print to a hard-coded filename:
#! /bin/awk -f
BEGIN { print "start" > "test.out" }
{ print $0 >> "test.out" }
END { print "end" >> "test.out" }
And that works great.
So the questions are:
How do I get the input filename?
Assuming somehow I get the input file name in a variable, e.g. FILENAME which contains "test.txt" how would I make another variable, e.g. OUTFILE, which contains "test.out"?
Note: I will be doing much more awk processing so please don't suggest to use sed or other languages :))
Try something like this:
#! /bin/awk -f
BEGIN {
file = gensub(/\.txt$/, ".out", 1, ARGV[1])   # gawk only; ARGV[1] is already set in BEGIN
print "start" > file
}
{ print $0 >> file }
END {
print "end" >> file
close(file)
}
I'd suggest closing the file too, with close() in the END{} block. Credit to Sundeep for pointing out that FILENAME is empty in BEGIN.
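`gensub()` is a gawk extension, so the script above will not run under mawk or a strict POSIX awk. A sketch of the same idea using only POSIX features (`sub()` on a copy of `ARGV[1]`, which, unlike `FILENAME`, is already populated in `BEGIN`) might look like this. The `wrap_file` name is hypothetical, and so is the fallback of appending `.out` when the name does not end in `.txt`; it is there only so the output can never overwrite the input.

```shell
# wrap_file is a hypothetical wrapper name; pass exactly one input file.
wrap_file() {
  awk '
    BEGIN {
      out = ARGV[1]                  # ARGV is populated before BEGIN runs
      # Replace a trailing ".txt"; if there is none, append ".out" instead
      # so the output file is never the input file itself.
      if (!sub(/\.txt$/, ".out", out)) out = out ".out"
      print "start" > out            # the first ">" truncates the file
    }
    { print > out }                  # later writes append while it stays open
    END { print "end" > out; close(out) }
  ' "$1"
}
```

Usage would be `wrap_file test.txt`, which writes `test.out` next to the input.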
$ echo 'foo' > ip.txt
$ awk 'NR==1{op=FILENAME; sub(/\.[^.]+$/, ".log", op); print "start" > op}
{print > op}
END{print "end" > op}' ip.txt
$ cat ip.log
start
foo
end
Save FILENAME to a variable on the first record (it is still empty in BEGIN), change the extension using sub(), and then print as required.
From the gawk manual:
Inside a BEGIN rule, the value of FILENAME is "", because there are no input files being processed yet
If you're using GNU awk (gawk), you can use the patterns BEGINFILE and ENDFILE
awk 'BEGINFILE{
outfile=FILENAME;
sub(/\.txt$/, ".out", outfile);
print "start" > outfile
}
ENDFILE{
print "stop" > outfile
close(outfile)
}' file1.txt file2.txt
You can then use the variable outfile in the main {...} block.
Doing so allows you to process more than one file in a single awk command.
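`BEGINFILE`/`ENDFILE` are gawk-only. With other awks, a common substitute is to react to `FNR == 1`, which is true on the first record of each input file. Below is a minimal sketch under the assumption that the inputs end in `.txt`; the function name is hypothetical. One caveat: an empty input file never triggers `FNR == 1`, so it gets no output file.

```shell
# wrap_files is a hypothetical name; pass one or more .txt files.
wrap_files() {
  awk '
    FNR == 1 {                        # first line of a new input file
      if (outfile != "") {            # finish the previous output file
        print "end" > outfile
        close(outfile)                # keep the number of open files low
      }
      outfile = FILENAME
      if (!sub(/\.txt$/, ".out", outfile)) outfile = outfile ".out"
      print "start" > outfile
    }
    { print > outfile }
    END {
      if (outfile != "") { print "end" > outfile; close(outfile) }
    }
  ' "$@"
}
```

For example, `wrap_files file1.txt file2.txt` would produce `file1.out` and `file2.out`.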

Removing Quote From Field For Filename Using AWK

I've been playing around with this for an hour trying to work out how to embed the removal of quotes from a specific field using AWK.
Basically, the file wraps text fields in quotes. I want to split the rows into files based on the first field and name each file using the second field.
ID,Name,Value1,Value2,Value3
1,"AAA","DEF",1,2
1,"AAA","GGG",7,9
2,"BBB","DEF",1,2
2,"BBB","DEF",9,0
3,"CCC","AAA",1,1
What I want to get out are three files, each starting with the header row, named:
AAA [1].csv
BBB [2].csv
CCC [3].csv
I have got it all working, except for the fact that I can't for the life of me work out how to remove the quotes around the filename!!
So, this command does everything, except that the file is named with quotes around $2; I need to do some kind of transformation on $2 before it goes into evname. In the actual file, I want to keep the encapsulating quotes.
awk -F, 'NR==1{h=$0;next}!($1 in files){evname=$2" ["$1"].csv";files[$1]=1;print h>evname}{print > evname}' DataExtract.csv
I've tried to push a gsub into this, but I'm struggling to work out exactly how this should look.
This is, I think, as close as I have got, but it just names everything "2" instead of using $2. I'm not sure whether I need to escape $2 somehow in the gsub, but trying that doesn't seem to work, so I'm at a loss as to what I'm doing wrong.
awk -F, 'NR==1{h=$0;next}!($1 in files){evname=gsub(""\","", $2)" - Event ID ["$1"].csv";files[$1]=1;print h>evname}{print > evname}' DataExtract.csv
Any help greatly appreciated.
Thanks in advance!!
Gannon
If I understand what you are attempting correctly, then
awk -F, 'NR==1{h=$0;next}!($1 in files){stub=$2; gsub(/"/, "", stub); evname=stub" ["$1"].csv";files[$1]=1;print h>evname}{print > evname}' DataExtract.csv
should work. Spread out and commented, that is:
NR == 1 {
h = $0;
next
}
!($1 in files) {
stub = $2 # <-- this is the new bit: make a working copy
# of $2 (so that $2 is unchanged and the line
# is not rebuilt with changes for printing),
gsub(/"/, "", stub) # remove the quotes from it, and
evname = stub " [" $1 "].csv" # use it to assemble the filename.
files[$1] = 1;
print h > evname
}
{
print > evname
}
You can, of course, use
evname = stub " - Event ID [" $1 "].csv"
or any other format after the substitution (this one seems to be what you tried to get in your second code snippet).
The gsub function returns the number of substitutions made, not the result of the substitutions; that is why evname=gsub(""\","", $2)" - Event ID ["$1"].csv" does not work.
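A quick way to see this: `gsub()` edits its third argument in place and returns a count. The sketch below (the function name is hypothetical) applies it to a copy of a quoted value like `$2` and prints the count, the edited copy, and the untouched original.

```shell
gsub_demo() {
  awk 'BEGIN {
    field = "\"AAA\""              # the quoted value as it appears in $2
    stub = field                   # work on a copy, leave the original alone
    n = gsub(/"/, "", stub)        # returns the number of replacements made
    print n, stub, field
  }'
}

gsub_demo    # prints: 2 AAA "AAA"
```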
Things are always clearer with a little white space:
awk -F, '
NR==1 { hdr=$0; next }
!seen[$1]++ {
evname = $2
gsub(/"/,"",evname)
outfile = evname " [" $1 "].csv"
print hdr > outfile
}
{ print > outfile }
' DataExtract.csv
Aside: It's pretty unusual for someone to WANT to create files with spaces in their names given the complexity that introduces in any later scripts you write to process them. You sure you want to do that?
P.S. here's the gawk version as suggested by #JID below
awk -F, '
NR==1 { hdr=$0; next }
!seen[$1]++ {
outfile = gensub(/"/,"","g",$2) " [" $1 "].csv"
print hdr > outfile
}
{ print > outfile }
' DataExtract.csv
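To check the copy-then-gsub approach against the sample data from the question, it can be wrapped in a function and run in a scratch directory; `split_by_id` is a hypothetical name. Unlike the scripts above, this sketch keeps the output name for each ID in an array, so it also works if rows for the same ID are not adjacent (the scripts above assume the input is grouped by ID). Because `gsub()` only touches the copy, the data rows keep their quotes while the file names lose them.

```shell
# split_by_id is a hypothetical wrapper; the argument is the CSV file.
split_by_id() {
  awk -F, '
    NR == 1 { hdr = $0; next }        # remember the header row
    !($1 in out) {                    # first row seen for this ID
      name = $2
      gsub(/"/, "", name)             # strip quotes from the copy only
      out[$1] = name " [" $1 "].csv"
      print hdr > out[$1]
    }
    { print > out[$1] }               # route every row by its own ID
  ' "$1"
}
```

Run as `split_by_id DataExtract.csv`; with the sample data this creates `AAA [1].csv`, `BBB [2].csv` and `CCC [3].csv`.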
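Apply the gsub before you make the assignment, but to a copy of $2 (modifying $2 itself makes awk rebuild the record using OFS, a space by default, so the line printed afterwards would lose its commas and quotes):
awk -F, 'NR==1{h=$0;next}
!($1 in files){
stub=$2; gsub("\"","",stub); # Add this line
evname=stub" ["$1"].csv";files[$1]=1;print...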

Awk iterating without a loop construct

I was reading a tutorial on awk scripting and observed this strange behaviour: why does this awk script ask for a number repeatedly even without a loop construct like while or for? If I enter CTRL+D (EOF), it stops prompting for another number.
#!/bin/awk -f
BEGIN {
print "type a number";
}
{
print "The square of ", $1, " is ", $1*$1;
print "type another number";
}
END {
print "Done"
}
Please explain this behaviour of the above awk script
awk keeps reading and processing input lines until it reaches end of file; the main {...} block is implicitly run once for every input line. Here the input is your terminal (STDIN), which never ends while you keep entering numbers or hitting Enter, so the script keeps prompting.
When you hit CTRL+D, you signal end of file to awk, which ends its reading loop, runs the END block and exits.
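Feeding the script a finite input instead of the terminal makes the built-in read loop visible: the main block runs once per line, and `END` runs when the input is exhausted, just as it does after CTRL+D. `squares` is a hypothetical wrapper around the tutorial script, with the "type another number" prompt trimmed.

```shell
squares() {
  awk '
    BEGIN { print "type a number" }                # runs once, before any input
    { print "The square of", $1, "is", $1 * $1 }   # runs once per input line
    END { print "Done" }                           # runs once, after EOF
  '
}

printf '2\n3\n' | squares
```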
Try this, and enter 0 to exit:
BEGIN {
print "type a number";
}
{
if($1==0)
exit;
print "The square of ", $1, " is ", $1*$1;
print "type another number";
}
END {
print "Done"
}
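One detail worth knowing about that version: `exit` in the main block stops reading input, but awk still runs the `END` block, so "Done" is printed. A sketch with piped input (the function name `squares_until_zero` is hypothetical; the zero test is written as a pattern instead of an `if`):

```shell
squares_until_zero() {
  awk '
    $1 == 0 { exit }                          # stop reading; END still runs
    { print "The square of", $1, "is", $1 * $1 }
    END { print "Done" }
  '
}

printf '2\n0\n5\n' | squares_until_zero       # the 5 is never read
```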
From the well-known book The AWK Programming Language:
If you don't provide an input file to the awk script on the command line, awk will apply the program to whatever you type next on your terminal until you type an end-of-file signal (control-d on Unix systems).

awk: non-terminated string

I'm trying to run the command below, and it's giving me an error. Any thoughts on how to fix it? I would rather have this be a one-line command than a script.
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } ' |
awk -F\" ' { print "url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/'{ print $1 }'\?schema\=1\.3\.0\&form\=json\&pretty\=true\&token\=582EVTY78-03iBkTAf0JAhwOBx\&account\=room_event\"" } '
awk: non-terminated string url = "ht... at source line 1
context is
>>> <<<
awk: giving up
source line number 2
The pipeline below outputs a single column of IDs:
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } '
156512145
898545774
454658748
898432413
I'm looking to get the IDs above into a string like so:
" url = "string...'ID'string"
Take a look at what you have in the last awk:
awk -F\"
' #single start here
{ print " #double starts for print, no ends
url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/
' #single ends here???
{ print $1 }'..... #single again??? ...
(rest codes)
Do you really want to print the literal text {print } as output? I don't think so. Why are you nesting print?
Most of the elements of your pipe can be expressed right inside awk.
I can't tell exactly what you want to do with the last awk script, but here are some points:
Your "grep" is really just looking for a string of text, not a regexp.
You can save time and simplify things if you use awk's index() function instead of a regular expression.
Output formats are almost always best handled using printf().
Since you haven't provided your input data, I can't test this code, so you'll need to adapt it if it doesn't work. But here goes:
awk -F/ '
BEGIN {
string="id\": \"http://room.event.assist.com/event/room/event/";
fmt="url = http://example.com/event/room/event/%s?schema=whatever\n";
}
count == 1217 { nextfile; }
index($0, string) {
split($7, a, "\"");
printf(fmt, a[1]);
count++;
}' failed_events.txt
If you like, you can use awk's -v option to pass in the string variable from a shell script calling this awk script. Or if this is a stand-alone awk script (using #! shebang), you could refer to command line options with ARGV.
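Here is a runnable sketch of that approach wrapped in a shell function, under some assumptions: `build_urls` is a hypothetical name, the line limit is passed with `-v` as suggested above, the URL template is shortened (the real query string, including the token, would go into `fmt`), and `exit` stops reading once the limit is reached. Arrays filled by `split()` start at index 1, so the bare ID ends up in `a[1]`.

```shell
# build_urls FILE MAX: hypothetical wrapper; the URL template is shortened.
build_urls() {
  awk -F/ -v max="$2" '
    BEGIN {
      needle = "id\": \"http://room.event.assist.com/event/room/event/"
      fmt = "url = \"http://room.event.assist.com/event/room/event/%s?schema=1.3.0\"\n"
    }
    count == max { exit }          # same effect as head -n MAX
    index($0, needle) {            # plain substring test, no regexp
      split($7, a, "\"")           # $7 looks like 156512145", so a[1] is the ID
      printf(fmt, a[1])
      count++
    }
  ' "$1"
}
```

Something like `build_urls failed_events.txt 1217` would then replace the whole grep/head/awk pipeline.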

Awk command to insert corresponding line numbers except for blank lines

I'm doing an assignment at the moment and the question that's stumped me is:
"Write an awk command to insert the corresponding line number before
each line in the text file above. The blank line should NOT be
numbered in this case."
I have an answer, but I'm struggling to find the explanation of what each component does.
The command is:
awk '{print (NF? ++a " " :"") $0}' <textfile.txt>
I know that NF is the number of fields in the current record, and that $0 refers to the whole input record. I tried playing around with the command to find out what does what, but I always seem to get syntax errors whenever I omit something.
So, my question is what does each component do? What does the ++a do? The ? after NF? and what does the bit with the quotations do?
Thanks in advance!
The construct ... ? ... : ... is the conditional (ternary) operator, an inline if-else. So it's the same as:
if ( NF > 0 ) {
++a;
print a " " $0;
} else {
print $0;
}
a is a variable that is incremented only when a line with at least one field is found.
print (NF? ++a " " :"") $0
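A ternary (conditional) operator is used in your solution:
cond ? true case : false case
For a blank line (or one containing only spaces or tabs), NF is always 0,
so if NF is > 0, the line is prefixed with the new value of a followed by a space; otherwise the prefix is the empty string "".
++a is a pre-increment: a is increased by 1 before its value is used, so the first non-blank line gets 1, the next one 2, and so on.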
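Running the one-liner on a small input confirms this reading. The `number_nonblank` wrapper name is hypothetical; the awk program is exactly the one from the question.

```shell
number_nonblank() {
  # (NF ? ++a " " : "") yields e.g. "1 " for a line with fields, "" otherwise
  awk '{ print (NF ? ++a " " : "") $0 }' "$@"
}

printf 'alpha\n\nbeta\n   \ngamma\n' | number_nonblank
```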
awk 'BEGIN{count=1}{if($0~/^$/){print}else{print count,$0;count++}}' your_file
tested below:
> cat temp.cc
int main ()
{
}
> awk 'BEGIN{count=1}{if($0~/^$/){print}else{print count,$0;count++}}' temp.cc
1 int main ()
2 {
3 }
>
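The two answers differ on lines that contain only spaces or tabs: `$0 ~ /^$/` matches only truly empty lines, so a whitespace-only line gets a number, while the `NF` test leaves it unnumbered. A sketch that makes the difference visible (both function names are hypothetical; the awk programs are the ones from the answers):

```shell
by_regex() { awk 'BEGIN{count=1}{if($0~/^$/){print}else{print count,$0;count++}}'; }
by_nf()    { awk '{ print (NF ? ++a " " : "") $0 }'; }

printf 'x\n \ny\n' | by_regex    # the space-only second line is numbered
printf 'x\n \ny\n' | by_nf       # the space-only second line is not
```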