AWK script to process information spread over multiple file lines

I am trying to write a text-processing script for what seems to be a rather simple task.
I have a file that contains the following repeated pattern:
111 0 1000 other stuff #<- here a new element begins
some text & #<- "&" or white spaces increment -
some more #<- signal continue on next line
last line
221 1 1.22E22 # new element $2!=0 must be followed by float
text &
contiuned text
c comment line in between
more text &
last line
2221 88 -12.123 &
line1
line2
c comment line
last line
223 0 lll -111 $ element given by line
22 22 -3.14 $ element given by new line
I would like to get
111 0 1000 other stuff #<- here a new element begins
some text & #<- "&" or white spaces increment -
some more #<- signal continue on next line
last line &
xyz=1
221 1 1.22E22 # new element $2!=0 must be followed by float
text &
contiuned text
c comment line in between
more text &
last line &
xyz=1
2221 88 -12.123 &
line1
line2
c comment line
last line &
xyz=1
223 0 lll -111 & $ element given by line
xyz=1
22 22 -3.14 & $ element given by new line
xyz=1
I would like to develop an awk script that appends a string to the last line of each element. To do so, my script looks for the new-element pattern and continues reading until one of the next-element indicators is found. Unfortunately, it does not work properly: it prints the last line twice and fails to append to the very last line of the file.
function newelement(line) {
split(line, s, " ")
if (s[1] ~/^[0-9]+$/ && ((s[2] ~/^[0-9]+$/ && s[3] ~/\./) || (s[2] == 0 && s[3] !~/\./))) {
return 1
} else {
return -1
}
}
function contline(line) {
if (line~/&/ || line~/^[cC]/ || line~/^\s{3,10}[^\s]./) {
return 1
} else {
return -1
}
}
BEGIN {
subs = " xyz=1 "
} #increment to have the next line in store
FNR == 1 {
getline nextline < FILENAME
}
{
# get the next line
getline nextline < FILENAME
if (newelement($0) == 1 && NR < 3673) {
if (length(old) > 0 || $0~/^$/) {
printf("%s &\n%20s\n", old, subs)
print $0
}
# to capture one line elements with no following continuation
# i.e.
# 221 91 0.5 33333
# 22 0 11
#look at the next line
else if (($0!~/&/ && contline(nextline) == -1)) {
printf("%s &\n%20s\n", $0, subs)
}
}
else {
print "-" $0
}
# store last not - commented line
if ($0!~/^\s{0,20}[cC]/) old = $0
}
A comment line starts with c or C followed by a space. Comment lines should be preserved, but no string should be appended to them.

Please check the following code and let me know if it works for you:
$ cat 3.1.awk
BEGIN{
subs = " xyz=1 "
threshold = 3673
}
# return boolean if the current line is a new element
function is_new_element(){
return ($1~/^[0-9]+$/) && (($2 ~ /^[0-9]+$/ && $3~/\./) || ($2 == 0 && $3 !~/\./))
}
# return boolean if the current line is a comment or empty line
function is_comment() {
return /^\s*[cC] / || /^\s*$/
}
# function to append extra text to line
# and followed by comments if applicable
function get_extra_text( extra_text) {
extra_text = sprintf("%s &\n%20s", prev, subs)
text = (text ? text ORS : "") extra_text
if (prev_is_comment) {
text = text ORS comment
prev_is_comment = 0
comment = ""
}
return text
}
NR < threshold {
# replace the above line with the following one if
# you want to process up to the first EMPTY line
#NR==1,/^\s*$/ {
# if the current line is a new element
if (is_new_element()) {
# save the last line and preceding comments
# into the variable 'text', skip the first new element
if (has_hit_first_new_element) text = get_extra_text()
has_hit_first_new_element = 1
prev_is_new = 1
# before hitting the first new_element line, all lines
# should be printed as-is
} else if (!has_hit_first_new_element) {
print
next
# if current line is a comment
} else if (is_comment()) {
comment = (comment ? comment ORS : "") $0
prev_is_comment = 1
next
# if the current line is neither new nor comment
} else {
# if previous line a new element
if (prev_is_new) {
print (text ? text ORS : "") prev
text = ""
# if previous line is comment
} else if (prev_is_comment) {
print prev ORS comment
prev_is_comment = 0
comment = ""
} else {
print prev
}
prev_is_new = 0
}
# prev saves the last non-comment line
prev = $0
next
}
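# print the last block if NR >= threshold
!is_last_block_printed {
print get_extra_text()
is_last_block_printed = 1;
}
# print lines when NR >= threshold or after the first EMPTY line
{ print "-" $0 }
# flush the last block when the input ends before the threshold is reached
END {
if (has_hit_first_new_element && !is_last_block_printed) print get_extra_text()
}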
Where
The lines are divided into 3 categories and processed differently:
is_new_element() returns true when the current line is a new element; the flag prev_is_new records that the previous non-comment line was a new element
is_comment() returns true when the current line is a comment or an empty line; the flag prev_is_comment records that comment lines precede the current line
other lines: all lines that fall into neither of the above two categories
Other notes:
You can use NR < threshold (3673 in your code), or the range pattern NR==1,/^\s*$/, to process only a range of lines.
The is_last_block_printed flag and its related code make sure the last processed block is printed either at the end of the above range or in the END{} block.
I did not check for the trailing & on continuing lines. If such a line is followed by a comment or a new element, the logic still has to be defined, i.e. which one should take precedence.
Lines before the first is_new_element() line need special handling: a dedicated flag, rather than a check such as if (NR > 1), should decide when text is updated.
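The three categories can be checked in isolation before running the full script. The following is a minimal sketch, not part of the answer: the helper name classify_lines and the printed labels are hypothetical, and [ \t] is used instead of \s so that it does not depend on gawk.

```shell
# classify_lines: hypothetical helper that prefixes each input line with its category
classify_lines() {
  awk '
    # category 1: integer, then either integer + field containing ".", or 0 + field without "."
    function is_new_element() {
      return ($1 ~ /^[0-9]+$/) && (($2 ~ /^[0-9]+$/ && $3 ~ /\./) || ($2 == 0 && $3 !~ /\./))
    }
    # category 2: comment (c or C followed by a space) or blank line
    function is_comment() {
      return ($0 ~ /^[ \t]*[cC] /) || ($0 ~ /^[ \t]*$/)
    }
    {
      if (is_new_element())  kind = "element"
      else if (is_comment()) kind = "comment"
      else                   kind = "other"    # category 3: everything else
      print kind ": " $0
    }
  '
}
```

Usage: `classify_lines < 3.1.txt` labels every line of the test sample, which makes it easy to spot lines that land in an unexpected category.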
Testing Sample:
$ cat 3.1.txt
111 0 1000 other stuff #<- here a new element begins
some text & #<- "&" or white spaces increment -
some more #<- signal continue on next line
last line
221 1 1.22E22 # new element $2!=0 must be followed by float
text &
contiuned text
c comment line in between
more text &
last line
2221 88 -12.123 &
line1
line2
c comment line 1
last line
c comment line 2
c comment line 3
c comment line 4
c comment line 5
223 0 lll -111
223 0 22 -111
223 0 22 -111
c comment line in between 1
c comment line in between 2
22 22 -3.14
c comment line at the end
Output:
$ awk -f 3.1.awk 3.1.txt
111 0 1000 other stuff #<- here a new element begins
some text & #<- "&" or white spaces increment -
some more #<- signal continue on next line
last line &
xyz=1
221 1 1.22E22 # new element $2!=0 must be followed by float
text &
contiuned text
c comment line in between
more text &
last line &
xyz=1
2221 88 -12.123 &
line1
line2
c comment line 1
last line &
xyz=1
c comment line 2
c comment line 3
c comment line 4
c comment line 5
223 0 lll -111 &
xyz=1
223 0 22 -111 &
xyz=1
223 0 22 -111 &
xyz=1
c comment line in between 1
c comment line in between 2
22 22 -3.14 &
xyz=1
c comment line at the end
Some extra explanation:
One concern when processing the text is the trailing newline "\n" when appending subs to the prev line. It is especially important when consecutive new_element lines occur.
It is important to notice that the variable prev in the code is defined as the previous non-comment line (categories 1 and 3 defined above). There can be zero or more comment (category-2) lines between the prev line and the current line. That is also why we use print prev ORS comment instead of print comment ORS prev when printing regular comments (not those preceding a new_element line).
A block of comment lines (1 or more consecutive comment lines) is saved into the variable comment. If the block comes right before a new_element line, it is appended to the variable text. All other comment blocks are printed by the line print prev ORS comment mentioned above.
The function get_extra_text() builds the extra_text in the order: prev subs ORS comments, where comments is appended only when the prev_is_comment flag is 1. Do notice that the same variable text can hold multiple prev subs ORS comments blocks if there are consecutive new_element lines.
We only print on the category-3 lines mentioned above (neither a new_element nor a comment). This is a safe place where we do not have to worry about the trailing newline or extra_text:
if prev_is_new is set, we print the cached text and then the variable prev (which is a new_element)
if prev_is_comment is set, we just print prev ORS comment. Notice again that the variable prev holds the last non-comment line before the current line; it does not have to be the line right above the current line.
in all other cases, we just print the prev line as-is
Since we are concatenating lines into text and comment variables, we use the following syntax to avoid the leading ORS (which is "\n" by default)
text = (text ? text ORS : "") prev
If the leading ORS is not a concern, you can just use the following:
text = text ORS prev
and because the lines are appended to these variables, we will need to reset
them (i.e. text = "") each time after we consume them, otherwise, the
concatenated variable will contain all previously processed lines.
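The append-and-reset idiom described above can be tried in isolation. This is a minimal sketch, not part of the answer's script: the helper name collect_blocks, the brackets around each block, and the use of a blank line as a block separator are assumptions made for illustration.

```shell
# collect_blocks: hypothetical helper that gathers lines into a buffer
# and prints each blank-line-separated block inside square brackets
collect_blocks() {
  awk '
    # blank line: emit the collected block, then reset the buffer
    /^$/ { if (buf != "") print "[" buf "]"; buf = ""; next }
    # any other line: append it, adding ORS only between lines, never in front
    { buf = (buf ? buf ORS : "") $0 }
    # the last block has no blank line after it, so flush it here
    END { if (buf != "") print "[" buf "]" }
  '
}
```

Without the `buf = ""` reset, the second block would also contain every line of the first one.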
Final notes
added a flag has_hit_first_new_element, so that any lines before the first new_element line are printed as-is. In this code the first new_element line has to be treated differently, and relying on NR==1 for that is not safe.
removed the code in the END{} block which is redundant

Try this:
function newelement(line){
split(line,s," ")
if(s[1]~/^[0-9]+$/ && ((s[2]~/^[0-9]+$/ && s[3]~/\./)|| (s[2]==0 && s[3]!~/\./))){return 1}
else{return -1}
}
BEGIN{
subs=" xyz=1 "
}
{
if (length($0)==0) next # Skip empty lines, remove or change it according to your needs.
if (newelement($0)==1){
if (length(last_data)>0) {
printf("%s &\n%20s\n",last_data,subs)
if (last_type=="c") {
print comments
}
}
last_data=$0
last_type="i"
} else if($0 ~/^\s*[cC] /) {
if (last_type=="c") comments = comments ORS $0
else comments = $0
last_type="c"
} else {
if (last_type=="c") print comments
else if(length(last_data)>0) print last_data
last_data=$0
last_type="d"
}
}
END{
printf("%s &\n%20s\n",last_data,subs)
if (last_type=="c") print comments
}
Three variables:
last_data to hold last data line.
last_type to hold the type of the last line: i for indicator (new element), c for comments, d for other data lines.
comments to hold comments line(s).

Related

Printing input statements to a specific row(?)

I'm doing a project for a QBasic class. I need the 1st row to ask for input, i.e. "Enter projected depletion rate: ". After that, a loop runs under it, in which I need to print another input prompt on that same 1st row: "Enter another projected depletion rate or 0 to quit: ". The issue I'm having is that if I use LOCATE, the next results of the loop are printed directly under that prompt, when I'd like them printed below the last results in the list, at the lowest unused row. It also doesn't clear the top row of old text. I know part of it is that the LOCATE is repeated because of the loop, but I'm genuinely stuck. Sorry for the format, I'm new :)
CLS
DIM percent AS DOUBLE
DIM ozLevel AS DOUBLE
DIM counter AS INTEGER
DIM change AS DOUBLE
INPUT "enter a projected depletion rate, or 0 to quit: ", percent
PRINT
PRINT TAB(2); "Loss"; TAB(17); "Final Ozone"
PRINT TAB(2); "Rate"; TAB(10); "Years"; TAB(17); "Concentration"
change = (percent / 100)
DO WHILE percent <> 0
counter = 0
ozLevel = 450
DO UNTIL ozLevel < 200
counter = counter + 1
ozLevel = ozLevel - (ozLevel * change)
LOOP
PRINT USING "##.##%"; TAB(2); percent;
PRINT TAB(10); counter;
PRINT USING "###.##"; TAB(17); ozLevel;
LOCATE 1, 1
INPUT "enter new projection: ", percent
change = (percent / 100)
LOOP
LOCATE 1, 35
PRINT "DONE"
END

QBasic has the CRSLIN function that tells you where the cursor is.
Make sure that printing the 3rd result does a carriage return and linefeed. Just remove the ;
Now store the index to the next available row in a suitable variable like TableRow.
Input as before on the 1st row of the screen.
Position the cursor on the next available row using this variable after each following input.
...
PRINT USING "###.##"; TAB(17); ozLevel
tablerow = CRSLIN
LOCATE 1, 1
INPUT "enter new projection: ", percent
change = (percent / 100)
LOCATE tablerow, 1
LOOP
...

How to remove " " in the filename only, but not in the entire file?

Below is my awk code, which sorts and splits the input file and writes the pieces to new output files. My problem is that I don't want to keep " " in the filenames, but the output filenames are built from the first four columns of the input file: file= path ""$1""$2""$3""$4"_03042017.csv". I have tried using gsub to remove " ", but that also removes " " inside the files, which is not the outcome I want. I just want to remove " " in the filename only. Can anyone help? I'd appreciate it a lot.
awk -F"|" 'BEGIN { OFS = "|" } NR==1 {
for( i=1;i<5;i++) $i = ""
h = substr($0, index($0,$5)); sub(/^[[:blank:]]+/,"", h)
next
}
{
file= path ""$1""$2"_"$3"_"$4"_03042017.csv"
# remove 4 first field
for( i=1;i<5;i++) $i = ""
# cleaning starting space
Cleaned = substr($0, index($0,$5)); sub( /^[[:blank:]]+/, "", Cleaned)
print ( a[file]++ ? "" : "DM9 03042017" ORS h ORS ) Cleaned > file
}
END {
for(file in a) print "EOF " a[file] > file
} ' file1
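One possible approach, sketched under assumptions: gsub() accepts an optional third argument naming the string to modify, so calling it on the file variable changes only the name and leaves $0 (and therefore the data written out) untouched. The sketch assumes the unwanted characters are spaces; for double quotes the regex would be /"/ instead. The helper name clean_name and the shortened filename are hypothetical, and the name is printed rather than used to write files.

```shell
# clean_name: hypothetical helper that builds a filename from the first
# four |-separated fields and strips spaces from the name only
clean_name() {
  awk -F"|" '
    {
      file = $1 $2 "_" $3 "_" $4 ".csv"
      gsub(/ /, "", file)      # third argument: substitute in file only, not in $0
      print file ": " $0       # $0 still contains its original spaces
    }
  '
}
```

In the script above, the same gsub call would go right after the line that assigns file.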

Not completed file read

I'm trying to debug what went wrong in my code. My file.txt contains 1763 lines, but when I run the code, it never reads the whole file. It always stops somewhere at line 1680 or later (as printed by row in my code). The thing is, it stops at a different line every time I run it, so I don't think the problem is with my text file.
row = 0
for line in io.lines("file.txt") do
row = row+1
local new_row1 = {}
for n in line:gmatch'%S+' do
table.insert(new_row1, tonumber(n))
end
if #new_row1 > 0 then
table.insert(input, new_row1)
end
print(row)
end
Is there something wrong in my code?

It looks like in your code, you opened a file handle to "file.txt" at the beginning of your script and it remains open till the end where you close the file. During that time, you attempt to reopen "file.txt" again in your loop which is causing the strange behavior you're seeing.
When I moved your file open and close scopes to the middle section after first loop but before the last outer loop, that fixes the issue:
file = assert(io.open("file.txt", "w"))
for i = 1, 1000 do
j = math.random(i, row-one)
u[i], u[j] = u[j], u[i]
for k = 1, 11 do
file:write(input2[u[i]][k], " ")
end
file:write"\n"
end
num = (row-one)+1
for i = 1, one do
for k=1, 11 do
file:write(input2[num][k], " ") --writes to the file all the rows starting from where '1' in column11 was seen
end
file:write("\n")
num = num + 1
end
file:close()
-----------------------------------Access file.txt.--------------------------
-- ...
This gives the expected output:
Done 1762 1762
--------------------------

adding vertical text each item in text file writeline with some text

I am populating a listbox with some text & saving the output to textfile (sObj.txt)
'Saving items of lb1 in a file under C:\temp
Dim i As Integer
W = New IO.StreamWriter("C:\temp\sObj.txt")
For i = 0 To lb1.Items.Count - 1
W.WriteLine(lb1.Items.Item(i))
Next
W.Close()
This text file contains 3 (for example) entries, let's say abc in 1st line, def in 2nd line & ghi in the 3rd line.
Now I want to append another text file (MPadd.txt) using sObj.txt entries such that I get something like the following:
'Appending listbox items to the file MPadd.txt
Using SW As New IO.StreamWriter("c:\temp\MPadd.txt", True)
SW.WriteLine("some text" & abc & "some text")
SW.WriteLine("some text" & def & "some text")
SW.WriteLine("some text" & ghi & "some text")
End Using
Please help in getting it correctly. thanks.

Just read all the lines from the first file (just three lines so it is not a problem) and then loop over these lines adding prefix and postfix text as you like
EDIT
Following your last example
Dim commands() =
{
"cdhdef -t ftpv2 -c r -f {0} -x ",
"cdhdsdef -v CPUSRG {0} ",
"cacls K:\AES\data\Cdh\ftp\{0}\Archive /E /G OSSUSER:C"
}
Dim counter As Integer = 0
Dim objLines = File.ReadAllLines("C:\temp\sObj.txt")
Using SW As New IO.StreamWriter("c:\temp\MPadd.txt", True)
' Loop only for the number of strings in commands (max 3 now)
For x = 0 To commands.Length - 1
Dim line = objLines(x).Trim()
' This check prevents empty lines from being used for the output
If line <> String.Empty Then
SW.WriteLine(String.Format(commands(counter), line))
counter += 1
End If
Next
End Using
This example uses composite formatting, where you define a format string with a numbered placeholder at each position where you want to insert another value.
Of course this will work only if you have just 3 lines in your input file

Way to Jump to Next i in For..Next Loop?

I'm reverse engineering in QuickBasic and I have code like this:
FOR i = star TO fin
IF a < 1 THEN
CALL clrbot
COLOR 15
PRINT force$(side); " army in "; city$(armyloc(i)); " is CUT OFF !";
TICK turbo!
GOTO alone
END IF
size = size + 1
max = 11: IF LEN(armyname$(i)) < 11 THEN max = LEN(armyname$(i))
mtx$(size) = LEFT$(armyname$(i), max)
array(size) = i
alone:
NEXT i
I'd like to get rid of the line label (alone) and instead do something like:
IF a < 1 THEN
CALL clrbot
COLOR 15
PRINT force$(side); " army in "; city$(armyloc(i)); " is CUT OFF !";
TICK turbo!
NEXT i
END IF

You could replace the GOTO with an Else:
For i = star To Fin
If a < 1 Then
' Do something
Else
' Do Something else
End If
Next
This would follow the same logic - the Else takes the place of the GOTO alone statement.
In the original code (QuickBASIC), if the If block is entered, everything after the GOTO alone statement is skipped.
If the If block is not entered (i.e., a >= 1) then everything after the If block is executed.
The Else statement in the VB.NET code will produce the same behavior. If a < 1, the first block will be executed and the Else block will be ignored, and the loop will advance to the next increment of i.
If a >= 1, then the Else block will be executed and the loop will then advance to the next increment of i.
The above assumes labels in QuickBASIC are like labels in DOS batch files.
