How to restart line numbers in a file at every empty line [closed]

I want to restart the counting at 1 wherever there is an empty line in my file.
My file is like this:
cat test.txt
333|111|222|333|
222|111|333|222|
I am using cat test.txt | sed 's/|/\n/g' | nl
output:
1 333
2 111
3 222
4 333
5 222
6 111
7 333
What I want is that after an empty line the counting starts again from 1.
desired output:
1 333
2 111
3 222
4 333
1 222
2 111
3 333
Please help.

This task is quite trivial in awk, or one of its implementations such as gawk or nawk.
Create a script linerecount.awk with content:
/^$/ { lineno=0; next}
{ print ++lineno, $0}
The first rule resets the line number whenever an empty line is seen and skips to the next record.
The second rule increments the line number, prints it, then prints the line itself.
Apply with gawk -f linerecount.awk to whatever you want.
Input:
$ cat input.txt
First line
Second line
Third line
Second part, First line
Second line
Third line
Final part, First line
Second line
Output:
$ gawk -f linerecount.awk < input.txt
1 First line
2 Second line
3 Third line
1 Second part, First line
2 Second line
3 Third line
1 Final part, First line
2 Second line
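Applied to the data in the question, the same script can simply replace nl at the end of the pipeline; the empty line produced by each trailing | is what triggers the reset:
$ sed 's/|/\n/g' test.txt | gawk -f linerecount.awk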


I want to use awk to delete every nth line starting at line 27

I have a large text file, it has a header that has 27 lines. I want to keep the header in the new file, but then on line 28 where the data starts I only want to keep every 10th line.
So my new file will look just like my original file where it will include 27 lines of header, but instead of all of the data being there only every 10th line will be included in the new file.
I've been trying to use awk, and I've also tried sed. I can get it to give me every 10th line, but I can't get it to include the header lines, i.e. start on line 28 and then give every 10th line.
Could you please try the following.
awk 'FNR<=27{print;next} ++count%10==0{print;count=""}' Input_file
Or, in case you want to make it more generic, here is a version with variables; change their values to get the results you need.
awk -v lines_till_print="27" -v nth_line_print="10" '
FNR<=lines_till_print{
  print
  next
}
++count%nth_line_print==0{
  print
  count=""
}
' Input_file
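To see it working on a small scale, here is a hypothetical run with a 3-line header and every 2nd data line instead of the question's 27 and 10:
$ seq 12 | awk -v lines_till_print="3" -v nth_line_print="2" 'FNR<=lines_till_print{print;next} ++count%nth_line_print==0{print;count=""}'
1
2
3
5
7
9
11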
To print every 3rd line after a 5-line header:
$ seq 20 | awk -v h=5 -v r=3 'NR<=h || !((NR-h)%r)'
1
2
3
4
5
8
11
14
17
20
or if you want the first line after the 5-line header printed:
$ seq 20 | awk -v h=6 -v r=3 'NR<=h || !((NR-h)%r)'
1
2
3
4
5
6
9
12
15
18
or if you prefer:
$ seq 20 | awk -v h=5 -v r=3 'NR<=h || ((NR-h)%r)==1'
1
2
3
4
5
6
9
12
15
18
Change the variable values to suit.

script to read a file with many columns into a file with one column [duplicate]

For example, if I have a file as follows:
1 2 3 4 5 6
7 8 9 10 11 12
And I want to reorganize this file as:
1
2
3
4
5
6
7
8
9
10
11
12
Can I use the awk command for that or not?
There are multiple ways to achieve this.
With grep:
grep -oE "[0-9]+" file
The -o flag prints only the matched patterns (the numbers in this case), one per line;
-E enables extended regular expressions.
With awk:
awk 'OFS="\n"{$1=$1}1' file
OFS defines the output field separator.
$1=$1: because we changed the OFS, we need to rebuild the line; assigning the first field to itself forces the rebuild.
1: an always-true pattern with no action, so every (rebuilt) line is printed.
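For example, on the sample input above:
$ awk 'OFS="\n"{$1=$1}1' file
1
2
3
4
5
6
7
8
9
10
11
12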
With sed:
TMP$ sed -r 's/ +/\n/g' File
1
2
3
4
5
6
7
8
9
10
11
12
This replaces each run of spaces with a newline.
The naive AWK approach:
#!/usr/bin/awk -f
# Print each field of every record on its own line.
{ for (i = 1; i <= NF; i++) print $i; }
Chaos's approach is probably more efficient.

Lookup and Replace with two files in awk

I am trying to correct one file with another with a single line of AWK code. I am trying to take $1 from FILE2, look it up in FILE1, get the corresponding $3 and $4. After I set them as variables I want the program to stop evaluating FILE1, change $10 and $11 from FILE2 to the values of the variables, and print this out.
I am having trouble getting the awk to switch from FILE1 to FILE2 after I have extracted the variables. I've tried nextfile, but this resets the program and it tries to extract variables from FILE2; I set NR to the last record, but it did not switch.
I am also doing a loop to get each line out of FILE1, but if that can be part of the script I am sure it would speed things up not having to reopen awk over and over again.
Here are the parts I have figured out:
for file in `cut -f 1 FILE2`; do
awk -v a=$file '$1=a{s=$2;q=$4; ---GO TO FILE1---}{if ($1==a) {$10=s; $11=q; print 0;exit}' FILE1 FILE2 >> FILEOUT
done
A quick example set. NOTE: despite how I have this written, the two files are not in the same order and are on the order of 8 GB in size, so a little unwieldy to sort.
FILE1
A 12345 + AJD$JD
B 12504 + DKFJ#%
C 52042 + DSJTJE
FILE2
A 2 3 4 5 6 7 8 9 345 D$J
B 2 3 4 5 6 7 8 9 250 KFJ
C 2 3 4 5 6 7 8 9 204 SJT
OUTFILE
A 2 3 4 5 6 7 8 9 12345 AJD$JD
B 2 3 4 5 6 7 8 9 12504 DKFJ#%
C 2 3 4 5 6 7 8 9 52042 DSJTJE
This is the code I got to work based on Kent's answer below.
awk 'NR==FNR{a[$1]=$2" "$4;next}$1 in a{$9=$9" "a[$1]}{$10="";$11=""}2' f1 f2
try this one-liner:
kent$ awk 'NR==FNR{a[$1]=$2" "$4;next}$1 in a{NF-=2;$0=$0" "a[$1]}7' f1 f2
A 2 3 4 5 6 7 8 9 12345 AJD$JD
B 2 3 4 5 6 7 8 9 12504 DKFJ#%
C 2 3 4 5 6 7 8 9 52042 DSJTJE
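For readability, here is the same logic expanded into a commented script (same behavior; note that shrinking NF to drop fields relies on awk rebuilding the record, which gawk does but POSIX leaves undefined):
NR==FNR {                 # true only while reading the first file, f1
    a[$1] = $2 " " $4     # remember fields 2 and 4, keyed on field 1
    next
}
$1 in a {                 # f2 lines whose key was seen in f1
    NF -= 2               # drop the last two fields
    $0 = $0 " " a[$1]     # append the stored replacement fields
}
7                         # any nonzero pattern is true: print the line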
No need to loop over the files repeatedly - just read one file and store the relevant fields in arrays keyed on $1, then go through the other file and use those arrays to look up the values you want to insert.
awk '(FILENAME=="FILE1"){y[$1]=$2;z[$1]=$4}; (FILENAME=="FILE2" && $1 in y){$10=y[$1];$11=z[$1];print $0}' FILE1 FILE2
That said, it sounds like you might have a use for the join command here rather than messing about with awk (the above script assumes all your $1/$2/$4 values will fit in memory).
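For reference, a minimal join sketch under the assumption of whitespace-delimited fields (join needs both inputs sorted on the key, which may be impractical at 8 GB):
join <(sort -k1,1 FILE1) <(sort -k1,1 FILE2) |
    awk '{print $1,$5,$6,$7,$8,$9,$10,$11,$12,$2,$4}'
After the join, each record is the key, then FILE1's remaining fields ($2-$4), then FILE2's remaining fields ($5-$14); the awk rearranges them into the OUTFILE layout.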

extracting data from a file with awk

I have a data set like below
first 0 1
first 1 2
first 2 3
second 0 1
second 1 2
second 2 3
third 0 1
third 1 2
third 2 3
I need to check this file, extract the third column for first, second and third, and store each in a different file.
The output files should contain:
1
2
3
This is pretty straightforward: awk '{print $3>$1}' file, i.e. print the third field and redirect the output to a file whose name is the first field.
Demo:
$ ls
file
$ awk '{print $3>$1}' file
$ ls
file first second third
$ cat first
1
2
3
$ cat second
1
2
3
$ cat third
1
2
3
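One caveat: if the first column has many distinct values, some awk implementations will run out of open file descriptors, since each output file stays open. A sketch of the usual workaround is to close each file after writing, appending with >> because > would truncate the file on every reopen after close():
awk '{print $3 >> $1; close($1)}' file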

How to Add Column with Percentage

I would like to calculate each line's value as a percentage of the total over all lines, and add it as another column.
Input (delimiter is \t):
1 10
2 10
3 20
4 40
Desired output with added third column showing calculated percentage based on values in second column:
1 10 12.50
2 10 12.50
3 20 25.00
4 40 50.00
I have tried to do it myself, but once I had calculated the total for all lines I didn't know how to preserve the rest of the line unchanged. Thanks a lot for any help!
Here you go, an awk solution that passes over the file twice -
awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
[jaypal:~/Temp] cat file
1 10
2 10
3 20
4 40
[jaypal:~/Temp] awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
1 10 12.5
2 10 12.5
3 20 25
4 40 50
Update: If a tab is required in the output, just set the OFS variable to "\t".
[jaypal:~/Temp] awk -v OFS="\t" 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
1 10 12.5
2 10 12.5
3 20 25
4 40 50
Breakdown of the pattern {action} statements:
The first pattern is NR==FNR. FNR is awk's built-in variable that keeps track of the number of records (by default, lines) in the current file, so it resets at the start of each input file; in our case it runs from 1 to 4 in each pass. NR is similar to FNR but it never gets reset; it continues to grow, so by the end of the second pass NR is 8. NR==FNR is therefore true only while the file is being read the first time.
This pattern is true only for the first 4 records, and that's exactly what we want. While reading those 4 records, we accumulate the total in a variable a. Notice that we did not initialize it; in awk we don't have to. However, this would break if the entire second column is 0, so you can handle that with an if statement in the second action, i.e. do the division only if a > 0, else report division by zero or something similar.
next is needed because we don't want the second pattern {action} statement to execute during the first pass. next tells awk to skip the remaining actions and move on to the next record.
Once the four records are parsed, i.e. on the second pass, the next pattern {action} takes over, which is pretty straightforward: compute the percentage and print columns 1 and 2 with the percentage next to them.
Note: As @lhf mentioned in the comments, this one-liner only works as long as you have the data set in a file; it won't work if you pass data through a pipe.
In the comments, there is a discussion going on about ways to make this awk one-liner take input from a pipe instead of a file. The only way I could think of is to store the column values in an array and then use a for loop to print each value along with its percentage.
Now, arrays in awk are associative and not kept in order, i.e. pulling the values out of the array will not necessarily follow the order in which they went in. If that is acceptable, then the following one-liner should work.
[jaypal:~/Temp] cat file
1 10
2 10
3 20
4 40
[jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}'
2 10 12.5
3 20 25
4 40 50
1 10 12.5
To get them in order, you can pipe the result to sort.
[jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}' | sort -n
1 10 12.5
2 10 12.5
3 20 25
4 40 50
You can do it in a couple of passes:
#!/bin/bash
total=$(awk '{total=total+$2}END{print total}' file)
awk -v total="$total" '{ printf ("%s\t%s\t%.2f\n", $1, $2, ($2/total)*100)}' file
To print a literal percent sign with printf, you need to escape it as %%. For instance:
printf("%s\t%s\t%s%%\n", $1, $2, $3)
Perhaps there is a better way, but I would pass the file twice.
Content of 'infile':
1 10
2 10
3 20
4 40
Content of 'script.awk':
BEGIN {
## Tab as field separator.
FS = "\t";
}
## First pass of input file. Get total from second field.
ARGIND == 1 {
total += $2;
next;
}
## Second pass of input file. Print each original line and percentage as third field.
{
printf( "%s\t%2.2f\n", $0, $2 * 100 / total );
}
Running the script on my Linux box:
gawk -f script.awk infile infile
And the result:
1 10 12.50
2 10 12.50
3 20 25.00
4 40 50.00
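One caveat: ARGIND is gawk-specific. With a POSIX awk, the same two-pass layout can be written with the NR==FNR idiom used earlier; a minimal sketch (default field splitting already handles the tab delimiter):
awk 'NR==FNR { total += $2; next }
     { printf( "%s\t%2.2f\n", $0, $2 * 100 / total ) }' infile infile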