How to delete lines after first checking next 3 lines - awk

I have a text file similar to this
00:00:24.752
8,594
3,847
0
00:00:25.228
0
1,692
0
00:00:25.738
6,548
5,304
0
00:00:26.248
1,807
417
0
00:00:26.758
3,913
5,335
0
00:00:26.792
0
00:00:27.234
0
00:00:27.268
0
0
0
00:00:27.778
9,903
2,345
0
00:00:27.812
0
00:00:28.322
0
9,501
0
This is network traffic: the first part is a timestamp, and the next two values are sent and received traffic. The third value is a zero, and I do not know why it is there.
So my goal is to keep only the records that have at least one nonzero sent/received value, and also to delete the third 0 every time, so I will have a result like this.
00:00:24.752
8,594
3,847
00:00:25.228
0
1,692
00:00:25.738
6,548
5,304
00:00:26.248
1,807
417
00:00:26.758
3,913
5,335
00:00:27.778
9,903
2,345
00:00:28.322
0
9,501
I have tried using awk to check the length of the current line: if the line is less than 8 characters long, print that line and the next 2. But since the file does not always have at least 2 values after the timestamp, this does not work properly.

awk '
/[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}/ {
    if (NR > 1) p()
    i = 0
}
{ buf[++i] = $0 }
END { p() }
function p() {
    if (buf[2] || buf[3]) {
        print buf[1]
        print buf[2]
        print buf[3]
    }
    delete buf
}' file
p is a function that prints the buffered lines if the 2nd and 3rd of them are not empty or zero, and then clears the buffer. It is called whenever a timestamp is seen (unless it is the first line of the file) and when EOF is hit. So the script above buffers the lines between two timestamps and prints them only if they meet the criteria: there must be at least two lines after the timestamp, and they must not both be zero.
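Note that deleting a whole array with "delete buf" is a widely supported extension rather than strict POSIX. If your awk rejects it, here is a sketch of a drop-in replacement for the p() function above, using the split() idiom to empty the array:

function p() {
    if (buf[2] || buf[3]) {
        print buf[1]
        print buf[2]
        print buf[3]
    }
    split("", buf)    # splitting the empty string empties the array portably
}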

This might work for you (GNU sed):
sed '/:/!{H;$!d};x;/\n.*\n.*\n/{/\n0\n0\n0/!s/\n0$//p};x;h;d' file
If the current line is not a time stamp (it does not contain a :), append it to the hold space and, if it is not the last line, delete it.
If the current line is either the last line or a time stamp, swap to the hold space and check that the previous record contains 4 lines and that the last 3 lines are not all zero; if so, remove the last line of the record and print the amended record.
Swap back to the pattern space, replace the hold space with the current line (the time stamp), and delete it.
N.B. When a line is deleted, no further sed processing takes place for the current line.
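For readability, here is the same one-liner written as an annotated sed script (GNU sed; run as sed -f keep.sed file, where keep.sed is an illustrative file name):

# non-timestamp line (no ":"): append it to the hold space
# and delete it unless it is the last line of the file
/:/!{
  H
  $!d
}
# timestamp (or final) line: swap the held record into the pattern space
x
# if the record has 4 lines and its 3 data lines are not all zero,
# strip the trailing 0 and print the amended record
/\n.*\n.*\n/{
  /\n0\n0\n0/!s/\n0$//p
}
# restore the current timestamp, save it as the start of the next record,
# and delete the pattern space
x
h
d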

If you want to omit every 4th line, this is an awk script to achieve it:
awk 'NR % 4{print}' input.txt
This prints every line whose number is not a multiple of 4, which gives your desired output when every record is exactly 4 lines long.
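For illustration, the same modulo idea on a uniform 8-line input:

$ printf '%s\n' a b c d e f g h | awk 'NR % 4{print}'
a
b
c
e
f
g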

Related

awk/sed replace multiple newlines in the record except end of record

I have a file where:
the field delimiter is \x01
the record delimiter is \n
Some lines contain multiple newlines that I need to remove; however, I don't want to remove the legitimate newlines at the end of each line. I have tried this with awk:
awk -F '\x01' 'NF < 87 {getline s; $0 = $0 s} 1' infile > outfile
But this only works when the record contains one embedded newline (besides the end-of-record newline). It does not work for multiple newlines.
Note: the record contains 87 fields.
What am I doing wrong here?
Example of file:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000
test^A00000000
Test^A^A^A^A
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000
test^A00000000
Test^A^A^A^A
SL^ANov-21^A30-11-2021^AB^A0000^A1234567^A00000
test^A12102120^A00000^A00^A^A
NOTE: The example records contain 11 delimiters; field separator \x01; record separator \n
Expected result:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
SL^ANov-21^A30-11-2021^AB^A0000^A1234567^A00000test^A12102120^A00000^A00^A^A
Note: I need to preserve the field delimiter (\x01) and record delimiter (\n)
Thank you very much in advance for looking into this.
The file always contains 87 fields;
The field delimiter is '\x01', but when viewed in Linux it is displayed as '^A'
Some lines contain newlines - I need to remove them, but I don't want to remove the legitimate newlines at the end of each line.
The newline appears twice in the first and second records and once in the third record; these are the newlines I want to remove.
In the examples/expected results there are 11 delimiters "\x01", represented as "^A".
I expect to have 3 records and not 6, i.e.:
First record:
test^A00000000 should be joined to the previous line
Test^A^A^A^A should be joined to the first line as well
forming one record:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
Second record
test^A00000000 should be joined to the previous line
Test^A^A^A^A should be joined to that previous line as well
forming one record:
PL^ANov-21^A29-11-2021^A0^A00^A00^A0000000test^A00000000 Test^A^A^A^A
Third record:
test^A12102120^A00000^A00^A^A should be joined to the previous line
forming one record:
SL^ANov-21^A30-11-2021^AB^A0000^A1234567^A00000test^A12102120^A00000^A00^A^A
Note:
The awk example provided works when there is one unwanted newline in the record, but not when there are multiple newlines.
Thank you so very much. It works perfectly. Thank you for explaining it so well to me too.
This might work for you (GNU sed):
sed ':a;N;s/\x01/&/87;Ta;s/\n//g' file
Gather up lines until there are 87 separators, remove any newlines and print the result.
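Here is the same one-liner as an annotated sed script (GNU sed; run as sed -f join.sed file, join.sed being an illustrative name):

:a
# append the next input line to the pattern space
N
# substitute the 87th \x01 with itself; this succeeds only once the
# pattern space holds 87 separators
s/\x01/&/87
# T branches back to :a if the last s/// failed, i.e. the record is incomplete
Ta
# record complete: remove the embedded newlines (the result is auto-printed)
s/\n//g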
What's wrong with your attempt is that you concatenate two lines, print the result, and move on to the next line. NF is then reset to the next line's field count. As all your lines have fewer than 87 fields, the NF < 87 condition is useless; your script would work the same without it.
Try this awk script:
$ awk -F'\x01' -vn=87 -vi=0 '
{printf("%s", $0); i+=NF; if(i==n) {i=0; print "";} else i-=1;}' file
Here, we use the real \x01 field separator and the NF field count. The variable i counts the number of fields already printed. We first print the current line without a trailing newline (printf("%s", $0)). Then we update our i field counter. If it is equal to n, we reset it and print a newline. Otherwise we decrement it so that we do not count the last field of this line and the first field of the next as 2 separate fields.
Demo with n=12 instead of 87 and your own input file (with \x01 field separators):
$ awk -F'\x01' -vn=12 -vi=0 '
{printf("%s", $0); i+=NF; if(i==n) {i=0; print "";} else i-=1;}' file |
sed 's/\x01/|/g'
PL|Nov-21|29-11-2021|0|00|00|0000000test|00000000 Test||||
PL|Nov-21|29-11-2021|0|00|00|0000000test|00000000 Test||||
SL|Nov-21|30-11-2021|B|0000|1234567|00000test|12102120|00000|00||
The sed command shows the result with the \x01 replaced by | for easier viewing.

AWK Merge line accumulator not working

Below is a single record split over 2 lines with an embedded newline after field 3 (the blank line is the embedded newline)
peter,9,ghi
mno
The algorithm is: if there are fewer than 4 fields in a record, continue merging subsequent lines until you have 4 fields, then output the record (a working sketch follows at the end of this question).
I have awk code that supposedly does this. There are 2 cases.
CASE 1
> If the number of fields in the current line plus the accumulated
> number of fields = 4 then if there were no previous fields
> print current line else
> print previously accumulated line plus current line
CASE 2 Append the current line to the accumulated previous lines
BEGIN {
    FS=","
    flds=4
    prevF=0
}
flds == NF + prevF {
    print (prevF==0) ? $0 : prevLine $0
    prevF=0
    prevLine=""
    next
}
{
    prevLine = (prevF==0) ? $0 FS : prevLine $0
    prevF = prevF + NF
}
Simple enough algorithm. When I run it against the data snippet I get
,mnohi,9,ghi
instead of the 2nd line tacked on to the end of the first.
I am interested in understanding why the code is behaving as it is and in awk solutions.
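For reference, a minimal sketch of the merging algorithm described above, assuming (as the original code's $0 FS suggests) that the pieces are rejoined with the field separator, and that a blank line may stand in for the embedded newline:

awk '
BEGIN { FS = "," }
NF == 0 { next }                       # skip a blank line (the embedded newline)
{
    rec = (nf == 0) ? $0 : rec FS $0   # join continuation lines with the separator
    nf += NF                           # accumulate the field count
    if (nf >= 4) {                     # record complete: emit and reset
        print rec
        rec = ""
        nf = 0
    }
}' file

For the snippet above this prints peter,9,ghi,mno.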

Awk: Append output to new field in existing file

Is there a way to print the output of an awk script to an existing file as a new field every time?
Hi!
I'm very new at awk (so my terminology might not be correct, sorry about that!) and I'm trying to print the output of a script that will operate on several hundred files to the same file, in different fields.
For example, my data files have this structure:
#File1
1
Values, 2, Hanna
20
15
Values, 2, Josh
30
56
Values, 2, Anna
50
70
#File2
2
Values, 2, Hanna
45
60
Values, 2, Josh
98
63
Values, 2, Anna
10
56
I have several of these files, one per numbered month, with the same names but different values. I want files that are named after each person, with the values in fields by month, like so:
#Hanna
20 45
15 60
#Josh
30 98
56 63
#Anna
50 10
70 56
In my script, I search for the word "values", and determine which records to print (based on the number after "value"). This works fine. Then I want to print these values. It works fine for one file, with the command:
print $0 > name # I have saved the variable name as the $3 of the correct row
This creates three files correctly named "Hanna", "Josh" and "Anna", with their values. However, I would like to run the script on all my data files and append the output to a single "Hanna" file etc., in a new field.
So what I'm looking for is something like print $0 > $month name, reading as "print the record to the field corresponding to the month".
I have tried to find a solution, but most solutions either just paste temporary files together or append the values after the existing ones (so that they all end up in field 1). I want to avoid the temporary files and have the values in different fields (so that I get a kind of matrix structure).
Thank you in advance!
Try the following; I have not checked all permutations and combinations and have only considered your post. Also, the Josh column in your output is not consistent (please let us know if there are more conditions for it). Let me know how it goes.
awk 'FNR==NR{if($0 ~ /^Values/){Q=$NF;B[$NF]=$NF;i="";next};A[Q,++i]=$0;next} /^Values/{V=$NF;print "#"B[V];i="";next} B[V]{print A[V,++i],$0}' file1 file2
EDIT: Adding a non-one-liner form of the solution too.
awk 'FNR==NR{
    if($0 ~ /^Values/){
        Q=$NF
        B[$NF]=$NF
        i=""
        next
    }
    A[Q,++i]=$0
    next
}
/^Values/{
    V=$NF
    print "#"B[V]
    i=""
    next
}
B[V]{
    print A[V,++i],$0
}
' file1 file2
EDIT2: Now adding an explanation as well.
awk 'FNR==NR{ ###The condition FNR==NR is TRUE only while the first file, file1, is being read. FNR and NR both count the lines of the input; the only difference between them is that FNR is RESET whenever the next Input_file starts being read, while NR keeps increasing until all the Input_files are read.
if($0 ~ /^Values/){ ###Check if a line starts with the string Values; if so, perform the following operations.
Q=$NF; ###Create a variable named Q whose value is the last field of the line.
B[$NF]=$NF; ###Create an array named B whose index is $NF (the last field of the line) and whose value is the same.
i=""; ###Set the variable i to NULL now.
next ###next is a built-in awk keyword; it skips all further statements.
};
A[Q,++i]=$0; ###Create an array named A indexed by Q and by the variable i, which is incremented by 1 each time this statement is reached.
next ###next skips all further statements.
}
/^Values/{ ###All statements from here on are executed while the second file, file2, is being read. Check if a line starts with the string Values; if so, do the following.
V=$NF; ###Create a variable V whose value is the $NF of the current line.
print "#"B[V]; ###Print the string # followed by the value of array B at index V.
i=""; ###Nullify the variable i here.
next ###next skips all further statements.
}
B[V]{ ###Check if array B at index V has a value; if so, perform the following.
print A[V,++i],$0 ###Print the value of array A at index (V, i incremented by 1), followed by the current line.
}
' file1 file2 ###The Input_files, named file1 and file2.
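Run on the sample files, the script prints the combined table on standard output (script.awk here is an illustrative name for the non-one-liner form above, and file1/file2 hold the sample data shown in the question):

$ awk -f script.awk file1 file2
#Hanna
20 45
15 60
#Josh
30 98
56 63
#Anna
50 10
70 56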

Print lines containing the same second field for more than 3 times in a text file

Here is what I am doing.
The text file is comma separated and has three fields, and I want to extract all the lines whose second field occurs more than three times.
Text file (filename is "text"):
11,keyword1,content1
4,keyword1,content3
5,keyword1,content2
6,keyword2,content5
6,keyword2,content5
7,keyword1,content4
8,keyword1,content2
1,keyword1,content2
My command is below. I cat the whole text file inside awk, grep for the second field of each line, and count the matching lines. If the count is greater than 2, I print the whole line.
The command:
awk -F "," '{ "cat text | grep "$2 " | wc -l" | getline var; if ( 2 < var ) print $0}' text
However, the command output contains only the first three consecutive lines, instead of also printing the last three lines containing "keyword1", which occurs in the text six times.
Result:
11,keyword1,content1
4,keyword1,content3
5,keyword1,content2
My expected result:
11,keyword1,content1
4,keyword1,content3
5,keyword1,content2
7,keyword1,content4
8,keyword1,content2
1,keyword1,content2
Can somebody tell me what I am doing wrong?
It is relatively straight-forward to make just two passes over the file. In the first pass, you count the number of occurrences of each value in column 2. In the second pass, you print out the rows where the value in column 2 occurs more than your threshold value of 3 times.
awk -F, 'FNR == NR { count[$2]++ }
FNR != NR { if (count[$2] > 3) print }' text text
The first line of code handles the first pass; it counts the occurrences of each different value of the second column.
The second line of code handles the second pass; if the value in column 2 was counted more than 3 times, print the whole line.
This doesn't work if the input is only available on a pipe rather than as a file (so you can't make two passes over the data). Then you have to work much harder.
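If the input does arrive on a pipe, one way is to buffer the lines in memory during a single pass (a sketch, assuming the data fits in memory):

$ cat text | awk -F, '
{ line[NR] = $0; key[NR] = $2; count[$2]++ }
END { for (i = 1; i <= NR; i++) if (count[key[i]] > 3) print line[i] }'

This prints the six keyword1 lines in their original order.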

Awk - use particular line again to match with patterns

Suppose I have a file:
1Alorem
2ipsuml
3oremip
4sumZAl
5oremip
6sumlor
7emZips
I want to split the text from lines containing A to lines containing Z, matching with a range:
/A/,/Z/ {
print > "rangeX.txt"
}
I want this particular input to give me 2 files:
1Alorem
2ipsuml
3oremip
4sumZAl
and
4sumZAl
5oremip
6sumlor
7emZips
The problem is that line 4 is consumed only once and is matched as the end of the first range, so the 2nd range never starts, because there is no A on the remaining lines.
Is there a way to try to match line 4 again against all patterns, or to tell awk that it has to start a new range?
Thanks
As Arne pointed out, the second section will not be caught by the current pattern. Here is an alternative without the range.
awk 'p==0 {p=($0~/A/)>0; filenr++} p==1 {print > "range"filenr".txt"; p=($0~/Z/)==0; if(!p && $0~/A/){filenr++; p=1; print > "range"filenr".txt"}}' test.txt
It also handles more than two sections
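The same logic expanded for readability (a reformatted sketch; the redirection target is parenthesized here, which some awks require):

awk '
p == 0 {                            # outside a section: look for a start line
    p = ($0 ~ /A/) > 0
    filenr++
}
p == 1 {                            # inside a section: write the line out
    print > ("range" filenr ".txt")
    p = ($0 ~ /Z/) == 0             # a Z ends the section
    if (!p && $0 ~ /A/) {           # the Z line also contains an A:
        filenr++                    # it starts the next section too
        p = 1
        print > ("range" filenr ".txt")
    }
}' test.txt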
All you need to do is save the last line of the first range to a variable and then reprint that variable, along with the following range, for the second file.
In other words, since you're just looping through each line, define an empty variable in your BEGIN block and then update it on each line. When your range ends, the variable holds the last line. Write that line out to the next file before you begin again.
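A sketch of that idea; here the boundary line is simply written into the next file as soon as the range ends on a line that also contains an A:

awk '
/A/ && !p { p = 1; n++ }            # a line with A starts a new range
p { print > ("range" n ".txt") }    # write the current range
p && /Z/ {                          # a line with Z ends the range
    p = 0
    if (/A/) {                      # the boundary line also contains A:
        n++                         # reprint it as the start of the next range
        p = 1
        print > ("range" n ".txt")
    }
}' test.txt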
There is no way to rematch a record, but writing a variant of the pattern is an option. Here the second range pattern matches from a line containing both A and Z to a line containing Z but not A:
awk "/A/,/Z/ {print 1, $0} (/A/ && /Z/),(/Z/ && !/A/) {print 2, $0}"
prints:
1 1Alorem
1 2ipsuml
1 3oremip
1 4sumZAl
2 4sumZAl
2 5oremip
2 6sumlor
2 7emZips
As your sample is a bit synthetic I don't know if that solution fits your real problem.