Print Specific Line In Lex - yacc

I want to print only the lines that begin with the word "Hello".
How can I do it?
For example:
Line            Print?
Hello World     YES
World Hello     NO
Thanks!

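^"Hello".*$ printf("%s\n", yytext);
.|\n ;
^ matches the beginning of a line. The second rule discards everything else; without it, (f)lex's default rule would echo the unmatched lines to the output as well.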

Related

Fail to continue parsing after correct input

I have two input numbers separated by ','.
The program works fine for the first try, but for the second try it always ends with error.
How do I keep parsing?
lex file snippet:
#include "y.tab.h"
%%
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
. return yytext[0];
%%
yacc file snippet:
%{
#include <stdio.h>
int yylex();
int yyerror();
%}
%start s
%token NUMBER
%%
s: NUMBER ',' NUMBER{
if(($1 % 3 == 0) && ($3 % 2 == 0)) {printf("OK");}
else{printf("NOT OK, try again.");}
};
%%
int main(){ return yyparse(); }
int yyerror() { printf("Error Occured.\n"); return 0; }
output snippet:
benjamin#benjamin-VirtualBox:~$ ./ex1
15,4
OK
15,4
Error Occured.
Your start rule (indeed, your only rule) is:
s: NUMBER ',' NUMBER
That means that an input consists of a NUMBER, a ',' and another NUMBER.
That's it. After the parser encounters those three things, it expects an end of input indicator, because that's what you've told it a complete input looks like.
If you want to accept multiple lines, each consisting of two numbers separated by a comma, you'll need to write a grammar which describes that input. (And in order to describe the fact that they are lines, you'll probably want to make a newline character a token. Right now, it falls through to the scanner's default rule, because in (f)lex . doesn't match a newline character.) You'll also probably want to include an error production so that your parser doesn't suddenly terminate on the first error.
Alternatively, you could parse your input one line at a time by reading the lines yourself, perhaps using fgets or the POSIX-standard getline function, and then passing each line to your scanner using yy_scan_string.

How can I store the length of a line into a var within an awk script?

I have this simple awk script with which I attempt to check the number of characters in the first line.
If the first line has more or fewer than 10 characters, I want to store the number
of characters in a var.
Somehow the first print statement works, but storing that result in a var doesn't.
Please help.
I tried removing the dollar sign, "thelength=(length($0))",
and removing the parentheses, "thelength=length($0)", but it doesn't print anything...
Thanks!
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=$(length($0))
print "The length of the first line is: ",$thelength;
exit 1;
}
}
END { print "STOP" }' $1
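Two issues, both from mixing ksh and awk syntax:
Inside an awk program, $(...) is not command substitution. $ is awk's field operator, so $(length($0)) means "the field whose number is length($0)". To store the length, use thelength=length($0)
awk variables do not take a leading $ when being referenced ($thelength would again be a field reference); use print ... ,thelength
So your code becomes: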
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=length($0)
print "The length of the first line is: ",thelength;
exit 1;
}
}
END { print "STOP" }' $1
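To see the difference between a plain awk variable and a $ field reference, here is a minimal sketch. The helper name show_length and the sample line are made up for illustration; the field separator is the same ";" used in the script above.

```shell
# show_length (hypothetical helper): report the length of the first input line.
show_length() {
    awk -F';' '
    NR == 1 {
        thelength = length($0)        # plain assignment: no $ and no $( )
        print "length:", thelength    # a variable is read by its bare name
        n = 2
        print "field " n ":", $n      # $n is a field reference: field number n
        exit
    }'
}

printf '%s\n' 'abc;defgh;ij' | show_length
# length: 12
# field 2: defgh
```

The second print shows why $thelength misbehaves: awk looks up the field whose number is stored in the variable instead of printing the variable itself.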

awk: match pattern and print lines after and before till next pattern

Using awk:
Find a pattern.
Print all lines after that pattern till next pattern.
Print all lines before that pattern till next pattern.
E.g., if this is the content of the file:
?hello#
line-0
?type=A;so on
line-1
short-description
line-2
line-3
ending#
line-4
?bye#
Match the pattern short-description, then print the lines after it up to the line containing # and the lines before it back to the line containing ?, so the output should be:
?type=A;so on
line-1
short-description
line-2
line-3
ending#
I tried: awk '/short-description/{copy=1;next} /#/{copy=0;next} copy' file
but I don't know how to get the lines before the pattern. I have very limited knowledge of awk. A one-line solution would be ideal.
Please help. Thanks a lot.
Try:
/^\?/ { delete arr ; len = 0 ; hit = 0 }
/^\?/,/#$/ {
arr[len++] = $0
if ( /short-description/ )
hit = 1
}
/#$/ {
if(hit)
for(i=0;i<len;++i)
print arr[i]
}
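Or, this one-liner (RS="?" splits the input at each ?, so the ? is added back when printing, and everything after the first # of the block is dropped):
awk 'BEGIN { RS="?" } /short-description/ { sub(/#.*/, "#"); print "?" $0 }' file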
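For anyone who wants to check the buffering idea against the sample from the question, here is a small sketch. It is a variant of the multi-line script above, not the same code: it tracks the open block with a flag instead of a range pattern and resets a counter instead of using delete arr, which some older awks may not support. The helper name print_block is made up, and the search text is matched as a literal substring.

```shell
# print_block (hypothetical helper): print the ?...# block containing the text in $1.
print_block() {
    awk -v pat="$1" '
    /^[?]/ { n = 0; hit = 0; inblock = 1 }   # a line starting with ? opens a new block
    inblock {
        buf[n++] = $0                         # remember every line of the open block
        if (index($0, pat)) hit = 1           # literal substring match, not a regex
    }
    inblock && /#$/ {                         # a line ending with # closes the block
        if (hit) for (i = 0; i < n; i++) print buf[i]
        inblock = 0
    }'
}

printf '%s\n' '?hello#' line-0 '?type=A;so on' line-1 short-description \
    line-2 line-3 'ending#' line-4 '?bye#' | print_block short-description
```

Lines outside any ?...# block (line-0 and line-4 in the sample) are never buffered, so they cannot leak into the output.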

Removing SOME line breaks from srt/txt file

I have a text file which has numbered entries, a timecode and a transcript. I am trying to remove the line breaks in the transcript and leave the others. I'm trying to use grep or awk.
File is like
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.
2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit,
and it's formatted into two lines
3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on
these long lines, leaving all other formatting.
4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single
line no matter how long that line.
Output would look like:
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.
2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit, and it's formatted into two lines
3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on these long lines, leaving all other formatting.
4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single line no matter how long that line.
thanks to all who have provided help
Don't rely on lines starting (or not) with any specific characters - just attach the 4th and subsequent lines in each record to the end of the 3rd line of that record:
$ awk '
BEGIN { RS=ORS=""; FS=OFS="\n" }
{
print $1,$2,$3
for (i=4;i<=NF;i++)
printf " %s", $i
print "\n\n"
}
' file
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.
2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit, and it's formatted into two lines
3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on these long lines, leaving all other formatting.
4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single line no matter how long that line.
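A self-contained way to try the paragraph-mode approach, as a sketch only: it assumes the entries are separated by blank lines (as in a normal .srt file) and that the file uses Unix line endings. The helper name join_srt is made up, and unlike the script above it builds the dialogue line in a variable and prints no blank line after the last entry.

```shell
# join_srt (hypothetical helper): fold multi-line dialogue into one line per entry.
join_srt() {
    awk '
    BEGIN { RS = ""; FS = "\n" }      # paragraph mode: one record per blank-line-separated entry
    {
        if (NR > 1) print ""          # blank line between entries, none after the last
        print $1                      # entry number
        print $2                      # timecode
        text = $3
        for (i = 4; i <= NF; i++)     # append any further dialogue lines
            text = text " " $i
        print text
    }'
}

join_srt <<'EOF'
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.

2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit,
and it's formatted into two lines
EOF
```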
I think you need something like
awk '/[0-9]+/,/^$/{ if(NR<3) print $0; else {while($0!=""){ printf $0;next; }}}' file
It's not working, but you may get the idea.
You can try something like this with awk:
awk '!NF{print}/[a-z]/{printf "%s ", $0;next}1' file
$ cat file
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.
2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit,
and it's formatted into two lines
3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on
these long lines, leaving all other formatting.
4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single
line no matter how long that line.
$ awk '!NF{print}/[a-z]/{printf "%s ", $0;next}1' file
1
00:00:27,160 --> 00:00:29,054
Sometimes there's not much dialogue.
2
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit, and it's formatted into two lines
3
00:00:31,500 --> 00:00:33,700
I want to remove the line breaks only on these long lines, leaving all other formatting.
4
00:00:33,805 --> 00:00:37,285
So that all dialogue ends up being on a single line no matter how long that line.
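In case the one-liner is hard to read, here is the same logic spread over several lines with comments. This is just a sketch for illustration: the helper name join_dialogue is made up, and it assumes blank lines between entries.

```shell
# join_dialogue (hypothetical helper): same rules as the one-liner, one per line.
join_dialogue() {
    awk '
    !NF     { print }                     # blank line: emit the newline that ends the joined dialogue
    /[a-z]/ { printf "%s ", $0; next }    # has a lowercase letter: treat as dialogue, print without newline
            { print }                     # numbers, timecodes and the blank separator itself
    '
}

join_dialogue <<'EOF'
1
00:00:30,100 --> 00:00:31,090
But other times there is quite a bit,
and it's formatted into two lines

2
EOF
```

Two things to be aware of with this approach: every dialogue line ends with a trailing space, and a dialogue line with no lowercase letter at all (for example "OK!") is not recognized as dialogue, so it is printed on its own line.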
Delete all new lines that are preceded by a letter or a space or tab:
perl -pe 's/([a-zA-Z \t])\n$/$1/'
I had the same problem and wrote this little code, which solved my problem:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
FILE *quelle,*ziel;
int i;
long maxsub,count,tmp,sub;
char puffer[10][200], *ptr,line[400];
if(argc != 3)
{
printf("Usage: srtlinejoin Filename CountOfSubtitles\n");
return EXIT_FAILURE;
}
maxsub = strtol( argv[2], &ptr, 10);
if( (quelle=fopen(argv[1],"r")) == NULL) {
fprintf(stderr, "Can't open %s\n", argv[1]);
return EXIT_FAILURE;
}
if( (ziel=fopen("out.srt","w")) == NULL) {
fprintf(stderr, "Can't open out.srt\n");
fclose(quelle);
return EXIT_FAILURE;
}
//read and write first line
fgets(puffer[0], 200, quelle);
fputs(puffer[0], ziel);
for(count=1; count < maxsub;count++)
//for(count=1; count <= 3;count++)
{
//printf("Processing subtitle %d\n",count);
tmp=0;
//Read and write time
fgets(puffer[0], 200, quelle);
fputs(puffer[0], ziel);
do {
fgets(puffer[tmp], 200, quelle);
//Scan for next Subtitle
sub = strtol( puffer[tmp], &ptr, 10);
tmp++;
}
while(sub != (count+1));
//The subtitle has only one line (the empty line is expected to read as "\r\n", i.e. 2 characters)
if (strlen(puffer[1]) == 2)
{
fputs(puffer[0], ziel); //New Subtitle
fputs(puffer[1], ziel); //Next empty line
fputs(puffer[2], ziel); //Next number
}
//The subtitle has two lines
if ((strlen(puffer[1]) > 2) && (strlen(puffer[2]) == 2))
{
for(i=0;i<400;i++)
line[i]=0;
strncpy(line,puffer[0],(strlen(puffer[0])-2));
strcat(line," ");
strcat(line,puffer[1]);
fputs(line, ziel); //New Subtitle
fputs(puffer[2], ziel); //Next empty line
fputs(puffer[3], ziel); //Next number
}
//The subtitle has more than two lines
if ((strlen(puffer[1]) > 2) && (strlen(puffer[2]) > 2))
{
printf("Attention: The subtitle has more than two lines\n");
}
}
printf("Check last subtitle!\n");
fclose(quelle);
fclose(ziel);
return EXIT_SUCCESS;
}

Awk command to insert corresponding line numbers except for blank lines

I'm doing an assignment at the moment and the question that's stumped me is:
"Write an awk command to insert the corresponding line number before
each line in the text file above. The blank line should NOT be
numbered in this case."
I have an answer, but I'm struggling to find the explanation of what each component does.
The command is:
awk '{print (NF? ++a " " :"") $0}' <textfile.txt>
I know that NF is the field number, and that $0 refers to the whole input record. I tried playing around with the command to find what does what, but it always seems to have syntax errors whenever I omit something.
So, my question is what does each component do? What does the ++a do? The ? after NF? and what does the bit with the quotations do?
Thanks in advance!
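The ... ? ... : ... construct is the conditional (ternary) operator, which works like an inline if-else. NF is the number of fields in the current line, so it is 0 for a blank line. So, it's the same as: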
if ( NF > 0 ) {
++a;
print a " " $0;
} else {
print $0;
}
a is a variable that is incremented only when a line has fields. Because ++a is a pre-increment, the new value is the one that gets printed, so the numbering starts at 1.
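A quick way to watch the command at work, as a sketch; the helper name number_lines and the three-line sample input are made up for illustration.

```shell
# number_lines (hypothetical helper): number non-blank lines, leave blank lines as they are.
number_lines() {
    awk '{ print (NF ? ++a " " : "") $0 }'
}

printf 'first\n\nsecond\n' | number_lines
# 1 first
#
# 2 second
```

The blank line passes through without a number and without advancing the counter, so "second" gets 2 rather than 3.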
print (NF? ++a " " :"") $0
Your solution uses the ternary operator:
cond ? true case : false case
For a blank line NF is always 0, so the condition is false there.
If NF is > 0, the line number and a space are printed before the line; otherwise "" is printed before it.
++a increments a by 1 before its value is used, so each non-blank line gets the next number.
awk 'BEGIN{count=1}{if($0~/^$/){print}else{print count,$0;count++}}' your_file
tested below:
> cat temp.cc
int main ()
{
}
> awk 'BEGIN{count=1}{if($0~/^$/){print}else{print count,$0;count++}}' temp.cc
1 int main ()
2 {
3 }
>