How to extract lines between multiline patterns? - awk

I have a file which looks like:
blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah
<empty line here>
Total DOS and NOS and partial (IT) DOSDOWN
<empty line here>
E Total 1
<empty line here>
-1.5000 0.004 0.000 0.004
-1.4953 0.004 0.000 0.004
-1.4906 0.004 0.000 0.004
-1.4859 0.004 0.000 0.004
-1.4812 0.004 0.000 0.004
0.3563 0.708 5.510 0.708
0.3609 0.562 5.513 0.562
0.3656 0.381 5.515 0.381
0.3703 0.149 5.517 0.149
<empty line here>
Sublattice 1 Atom Fe spin DOWN
What I want is to extract all lines between (first pattern)
Total DOS and NOS and partial (IT) DOSUP
<empty line here>
E Total 1
<empty line here>
and (second pattern)
<empty line here>
Sublattice 1 Atom Fe spin DOWN
i.e. I want to get
-1.5000 0.004 0.000 0.004
-1.4953 0.004 0.000 0.004
-1.4906 0.004 0.000 0.004
-1.4859 0.004 0.000 0.004
-1.4812 0.004 0.000 0.004
0.3563 0.708 5.510 0.708
0.3609 0.562 5.513 0.562
0.3656 0.381 5.515 0.381
0.3703 0.149 5.517 0.149
So, at the end of the day I want to have lines between two multiline patterns.
As I understand it, awk can match multiline patterns via a state machine (see here), but I failed to make it work in my case.
Any suggestion how to resolve this problem would be very much appreciated.

Here's a solution based on Ed Morton's trick.
awk -v RS= 'n==2; /Total DOS/ || n {n++;next} {n=0}' input.txt
Here's how this works.
RS= puts awk into paragraph mode, so that each record is a whole blank-line-separated block of lines.
n==2; prints any record processed while this condition is met.
/RE/ || n is a condition that evaluates to true if EITHER the RE (pattern) is matched within the current record or the variable n is non-zero.
{n++;next} obviously increments n and skips to the next record.
{n=0} And if we haven't already skipped to the next record, we reset n.
The effect of all this is that we print the record that is two records after the one with the matched pattern. You could of course adjust the condition that begins the counter to whatever you like. $2=="Total" for example. Salt to taste.
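As a toy illustration of paragraph mode itself (made-up input, not the question's file), RS= turns each blank-line-separated block into one record, so NR counts blocks rather than lines:

```shell
# RS= (paragraph mode): blank-line-separated blocks become single records.
printf 'a\nb\n\nc\n\nd\n' |
awk -v RS= '{ print "record " NR ": " $0 }'
# record 1 spans two lines ("a" and "b"); "c" and "d" are records 2 and 3.
```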
sh-3.2$ cat input.txt
blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah
Total DOS and NOS and partial (IT) DOSUP
E Total 1
-1.5000 0.004 0.000 0.004
-1.4953 0.004 0.000 0.004
-1.4906 0.004 0.000 0.004
....... ..... ..... .....
0.3609 0.562 5.513 0.562
0.3656 0.381 5.515 0.381
0.3703 0.149 5.517 0.149
blah blah blah blah
sh-3.2$ awk -v RS= 'n==2; /Total DOS and NOS/||n{n++;next} {n=0}' input.txt
-1.5000 0.004 0.000 0.004
-1.4953 0.004 0.000 0.004
-1.4906 0.004 0.000 0.004
....... ..... ..... .....
0.3609 0.562 5.513 0.562
0.3656 0.381 5.515 0.381
0.3703 0.149 5.517 0.149

Using sed: sed -n '5,/^$/{/^$/!p}'
But that assumes that "multiline starting pattern" is always at the beginning of the file. Otherwise it gets a bit more complicated. Like this:
/Total/{N;N;N}
/Total.*Total/,/^$/{
/Total/d
/^$/d
}
Here I am assuming that 'Total' matches the beginning of the multiline pattern and 'Total.*Total' matches the whole pattern. Replace N;N;N with something more complex if there are other patterns that start with the first line of your multiline pattern but are shorter than 4 lines.

From your comments it sounds like all you need is:
awk -v RS= '/Total DOS/{tgt=NR+2} NR==tgt' file
If not, then edit your question to clarify. Make it NR==tgt{print; exit} if you only want the first matching block printed and efficiency is a concern. Change the regexp if necessary to match as much of the Total DOS... line as you need to make it unique.
Here it is running against your provided sample input:
$ cat file
blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah
blah blah blah blah blah blah blah blah
Total DOS and NOS and partial (IT) DOSUP
E Total 1
-1.5000 0.004 0.000 0.004
-1.4953 0.004 0.000 0.004
-1.4906 0.004 0.000 0.004
....... ..... ..... .....
0.3609 0.562 5.513 0.562
0.3656 0.381 5.515 0.381
0.3703 0.149 5.517 0.149
blah blah blah blah
$ awk -v RS= '/Total DOS/{tgt=NR+2} NR==tgt' file
-1.5000 0.004 0.000 0.004
-1.4953 0.004 0.000 0.004
-1.4906 0.004 0.000 0.004
....... ..... ..... .....
0.3609 0.562 5.513 0.562
0.3656 0.381 5.515 0.381
0.3703 0.149 5.517 0.149
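For completeness, here is a sketch of the print; exit variant mentioned above, run against a trimmed-down version of the sample input (the file name is assumed):

```shell
# The exit stops reading as soon as the first matching block is printed.
cat > file <<'EOF'
blah blah blah

Total DOS and NOS and partial (IT) DOSUP

E Total 1

-1.5000 0.004 0.000 0.004
-1.4953 0.004 0.000 0.004

Sublattice 1 Atom Fe spin DOWN
EOF
awk -v RS= '/Total DOS/{tgt=NR+2} NR==tgt{print; exit}' file
# prints only the two data lines (record 4, two records after the match)
```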

Related

Plot secondary x_axis in ggplot

Dear All seniors and members,
Hope you are doing great. I have a dataset for which I would like to plot a secondary x-axis in ggplot. I could not make it work in the last 4 hours. Below is my dataset.
Pathway ES NES p_value q_value Group
1 HALLMARK_HYPOXIA 0.49 2.25 0.000 0.000 Top
2 HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 0.44 2.00 0.000 0.000 Top
3 HALLMARK_UV_RESPONSE_DN 0.45 1.98 0.000 0.000 Top
4 HALLMARK_TGF_BETA_SIGNALING 0.48 1.77 0.003 0.004 Top
5 HALLMARK_HEDGEHOG_SIGNALING 0.52 1.76 0.003 0.003 Top
6 HALLMARK_ESTROGEN_RESPONSE_EARLY 0.38 1.73 0.000 0.004 Top
7 HALLMARK_KRAS_SIGNALING_DN 0.37 1.69 0.000 0.005 Top
8 HALLMARK_INTERFERON_ALPHA_RESPONSE 0.37 1.54 0.009 0.021 Top
9 HALLMARK_TNFA_SIGNALING_VIA_NFKB 0.32 1.45 0.005 0.048 Top
10 HALLMARK_NOTCH_SIGNALING 0.42 1.42 0.070 0.059 Top
11 HALLMARK_COAGULATION 0.32 1.39 0.031 0.067 Top
12 HALLMARK_MITOTIC_SPINDLE 0.30 1.37 0.025 0.078 Top
13 HALLMARK_ANGIOGENESIS 0.40 1.37 0.088 0.074 Top
14 HALLMARK_WNT_BETA_CATENIN_SIGNALING 0.35 1.23 0.173 0.216 Top
15 HALLMARK_OXIDATIVE_PHOSPHORYLATION -0.65 -3.43 0.000 0.000 Bottom
16 HALLMARK_MYC_TARGETS_V1 -0.49 -2.56 0.000 0.000 Bottom
17 HALLMARK_E2F_TARGETS -0.45 -2.37 0.000 0.000 Bottom
18 HALLMARK_DNA_REPAIR -0.46 -2.33 0.000 0.000 Bottom
19 HALLMARK_ADIPOGENESIS -0.42 -2.26 0.000 0.000 Bottom
20 HALLMARK_FATTY_ACID_METABOLISM -0.41 -2.06 0.000 0.000 Bottom
21 HALLMARK_PEROXISOME -0.43 -2.01 0.000 0.000 Bottom
22 HALLMARK_MYC_TARGETS_V2 -0.43 -1.84 0.003 0.001 Bottom
23 HALLMARK_CHOLESTEROL_HOMEOSTASIS -0.42 -1.83 0.003 0.001 Bottom
24 HALLMARK_ALLOGRAFT_REJECTION -0.34 -1.78 0.000 0.003 Bottom
25 HALLMARK_MTORC1_SIGNALING -0.32 -1.67 0.000 0.004 Bottom
26 HALLMARK_P53_PATHWAY -0.29 -1.52 0.000 0.015 Bottom
27 HALLMARK_UV_RESPONSE_UP -0.28 -1.41 0.013 0.036 Bottom
28 HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY -0.35 -1.39 0.057 0.040 Bottom
29 HALLMARK_HEME_METABOLISM -0.26 -1.34 0.014 0.061 Bottom
30 HALLMARK_G2M_CHECKPOINT -0.23 -1.20 0.080 0.172 Bottom
I would like to produce a plot like the following (plot # 1).
Here is my current code chunk.
ggplot(data, aes(reorder(Pathway, NES), NES, fill= Group)) +
theme_classic() + geom_col() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size = 8),
axis.title = element_text(face = "bold", size = 12),
axis.text = element_text(face = "bold", size = 8), plot.title = element_text(hjust = 0.5)) + labs(x="Pathway", y="Normalized Enrichment Score",
title="2Gy_5f vs. 0Gy") + coord_flip()
This code produces the following plot (plot # 2)
So I would like to generate the plot with a secondary x-axis showing q_value (just like the first bar plot I attached). Any help is greatly appreciated. Note: I used coord_flip, so it turns the angle of the x-axis.
Kind Regards,
synat
[1]: https://i.stack.imgur.com/dBFIS.jpg
[2]: https://i.stack.imgur.com/yDbC5.jpg
Maybe you don't need a secondary axis per se to get the plot style you seek.
library(tidyverse)
ggplot(data, aes(x = NES, y = reorder(Pathway, NES), fill= Group)) +
theme_classic() +
geom_col() +
geom_text(aes(x = 2.5, y = reorder(Pathway, NES), label = q_value), hjust = 0) +
annotate("text", x = 2.5, y = length(data$Pathway) + 1, hjust = 0, fontface = "bold", label = "q_value" ) +
coord_cartesian(xlim = c(NA, 3),
ylim = c(NA, length(data$Pathway) + 1),
clip = "off") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size = 8),
axis.title = element_text(face = "bold", size = 12),
axis.text = element_text(face = "bold", size = 8),
plot.title = element_text(hjust = 0.5)) +
labs(x="Pathway", y="Normalized Enrichment Score",
title="2Gy_5f vs. 0Gy")
And for future reference you can read in data in the format you pasted like so:
data <- read_table(
"
Pathway ES NES p_value q_value Group
HALLMARK_HYPOXIA 0.49 2.25 0.000 0.000 Top
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 0.44 2.00 0.000 0.000 Top
HALLMARK_UV_RESPONSE_DN 0.45 1.98 0.000 0.000 Top
HALLMARK_TGF_BETA_SIGNALING 0.48 1.77 0.003 0.004 Top
HALLMARK_HEDGEHOG_SIGNALING 0.52 1.76 0.003 0.003 Top
HALLMARK_ESTROGEN_RESPONSE_EARLY 0.38 1.73 0.000 0.004 Top
HALLMARK_KRAS_SIGNALING_DN 0.37 1.69 0.000 0.005 Top
HALLMARK_INTERFERON_ALPHA_RESPONSE 0.37 1.54 0.009 0.021 Top
HALLMARK_TNFA_SIGNALING_VIA_NFKB 0.32 1.45 0.005 0.048 Top
HALLMARK_NOTCH_SIGNALING 0.42 1.42 0.070 0.059 Top
HALLMARK_COAGULATION 0.32 1.39 0.031 0.067 Top
HALLMARK_MITOTIC_SPINDLE 0.30 1.37 0.025 0.078 Top
HALLMARK_ANGIOGENESIS 0.40 1.37 0.088 0.074 Top
HALLMARK_WNT_BETA_CATENIN_SIGNALING 0.35 1.23 0.173 0.216 Top
HALLMARK_OXIDATIVE_PHOSPHORYLATION -0.65 -3.43 0.000 0.000 Bottom
HALLMARK_MYC_TARGETS_V1 -0.49 -2.56 0.000 0.000 Bottom
HALLMARK_E2F_TARGETS -0.45 -2.37 0.000 0.000 Bottom
HALLMARK_DNA_REPAIR -0.46 -2.33 0.000 0.000 Bottom
HALLMARK_ADIPOGENESIS -0.42 -2.26 0.000 0.000 Bottom
HALLMARK_FATTY_ACID_METABOLISM -0.41 -2.06 0.000 0.000 Bottom
HALLMARK_PEROXISOME -0.43 -2.01 0.000 0.000 Bottom
HALLMARK_MYC_TARGETS_V2 -0.43 -1.84 0.003 0.001 Bottom
HALLMARK_CHOLESTEROL_HOMEOSTASIS -0.42 -1.83 0.003 0.001 Bottom
HALLMARK_ALLOGRAFT_REJECTION -0.34 -1.78 0.000 0.003 Bottom
HALLMARK_MTORC1_SIGNALING -0.32 -1.67 0.000 0.004 Bottom
HALLMARK_P53_PATHWAY -0.29 -1.52 0.000 0.015 Bottom
HALLMARK_UV_RESPONSE_UP -0.28 -1.41 0.013 0.036 Bottom
HALLMARK_REACTIVE_OXYGEN_SPECIES_PATHWAY -0.35 -1.39 0.057 0.040 Bottom
HALLMARK_HEME_METABOLISM -0.26 -1.34 0.014 0.061 Bottom
HALLMARK_G2M_CHECKPOINT -0.23 -1.20 0.080 0.172 Bottom")
Created on 2021-11-23 by the reprex package (v2.0.1)

How do I get awk to print fields from the second row of a file?

I have a file that looks like this:
measured 10.8 0.0000 0.0000 0.0236 0.0304 0.0383 0.0433 0.0437 0.0442 0.0452
0.0455 0.0448 0.0440 0.0423 0.0386 0.0344 0.0274 0.0000 0.0000
I want gawk to print all the numbers in one long single column like this:
0.0000
0.0000
0.0236
0.0304
0.0383
0.0433
0.0437
0.0442
0.0452
0.0455
0.0448
0.0440
0.0423
0.0386
0.0344
0.0274
0.0000
0.0000
I run the command gawk '/measured/ { printf $3"\n" $4"\n" $5"\n" $6"\n" $7"\n" $8"\n" $9"\n" $10"\n" $11"\n" $12"\n" $13"\n" $14"\n" $15"\n" $16"\n" $17"\n" $18"\n" }' filename.txt
But I just get the first row of numbers:
0.0000
0.0000
0.0236
0.0304
0.0383
0.0433
0.0437
0.0442
0.0452
How do I get gawk to print the second row?
$ cat tst.awk
BEGIN { OFS = "\n" }
/measured/ { c=2; $1=$2=""; $0=$0 }
c && c-- { $1=$1; print }
$ awk -f tst.awk file
0.0000
0.0000
0.0236
0.0304
0.0383
0.0433
0.0437
0.0442
0.0452
0.0455
0.0448
0.0440
0.0423
0.0386
0.0344
0.0274
0.0000
0.0000
$ grep -A1 measured file | tr -s ' ' \\n | tail -n+4
0.0000
0.0000
0.0236
0.0304
0.0383
0.0433
0.0437
0.0442
0.0452
0.0455
0.0448
0.0440
0.0423
0.0386
0.0344
0.0274
0.0000
0.0000
With awk:
$ awk -v OFS='\n' '/measured/ {p=1; for(i=3;i<=NF;i++) print $i; next}
p {$1=$1; print; exit}' file
If the number of fields is guaranteed to be as in the example, you can use the following command:
awk '{for(i=NF-8;i<=NF;i++){print $i}}' input.file
The GNU implementation of Awk allows an arbitrary regular expression as the RS record separator. If the keyword measured occurs before each batch of numbers, we can use that keyword as the separator:
$ gawk 'BEGIN { RS = "measured" } { for (i = 1; i <= NF ; i++) print "field " i " = " $i }'
measured 10.8 0.0000 0.0000 0.0236 0.0304 0.0383 0.0433 0.0437 0.0442 0.0452
0.0455 0.0448 0.0440 0.0423 0.0386 0.0344 0.0274 0.0000 0.000
field 1 = 10.8
field 2 = 0.0000
field 3 = 0.0000
field 4 = 0.0236
field 5 = 0.0304
field 6 = 0.0383
field 7 = 0.0433
field 8 = 0.0437
field 9 = 0.0442
field 10 = 0.0452
field 11 = 0.0455
field 12 = 0.0448
field 13 = 0.0440
field 14 = 0.0423
field 15 = 0.0386
field 16 = 0.0344
field 17 = 0.0274
field 18 = 0.0000
field 19 = 0.000
As you can see, all the fields between the measured record separators are parsed out regardless of line breaks. Fields are separated on any mixture of spaces, tabs and newlines.
Note that because measured appears first, we get an empty record: the output you see above is, effectively, from the second record. The first record is the whitespace before measured, which contains no fields.
In other words, the record separator is really treated as a terminator, except that it may be missing after the last record.

AWK failing to sum floats

I am trying to sum the last 12 values in a field in a particular csv file, but AWK is failing to correctly sum the values. If I output the data to a new file then run the same AWK statement against the new file it works.
Here are the contents of the original file. The fields are separated by ";"
I want to sum the values in the 3rd field
$ tail -12 OriginalFile.csv
02/02/2020 10:30:00;50727.421;0.264;55772.084;0.360;57110.502;0.384
02/02/2020 10:35:00;50727.455;0.408;55772.126;0.504;57110.548;0.552
02/02/2020 10:40:00;50727.489;0.408;55772.168;0.504;57110.593;0.540
02/02/2020 10:45:00;50727.506;0.204;55772.193;0.300;57110.621;0.336
02/02/2020 10:50:00;50727.541;0.420;55772.236;0.516;57110.667;0.552
02/02/2020 10:55:00;50727.566;0.300;55772.269;0.396;57110.703;0.432
02/02/2020 11:00:00;50727.590;0.288;55772.300;0.372;57110.737;0.408
02/02/2020 11:05:00;50727.605;0.180;55772.321;0.252;57110.762;0.300
02/02/2020 11:10:00;50727.621;0.192;55772.344;0.276;57110.786;0.288
02/02/2020 11:15:00;50727.659;0.456;55772.389;0.540;57110.835;0.588
02/02/2020 11:20:00;50727.681;0.264;55772.417;0.336;57110.866;0.372
02/02/2020 11:25:00;50727.704;0.276;55772.448;0.372;57110.900;0.408
I used the following code to print the original value and the summed value of field 3 for each record, but it just returns the same output for the summed value for each line
$ awk 'BEGIN { FS = ";" } ; { sum += $3 } { print $3, sum }' OriginalFile.csv | tail -12
0.264 2.00198e+09
0.408 2.00198e+09
0.408 2.00198e+09
0.204 2.00198e+09
0.420 2.00198e+09
0.300 2.00198e+09
0.288 2.00198e+09
0.180 2.00198e+09
0.192 2.00198e+09
0.456 2.00198e+09
0.264 2.00198e+09
0.276 2.00198e+09
If I output the contents of the file into a different file, the same code works as expected
$ tail -12 OriginalFile.csv > testfile2.csv
$ awk 'BEGIN { FS = ";" } ; { sum += $3 } { print $3, sum }' testfile2.csv
0.264 0.264
0.408 0.672
0.408 1.08
0.204 1.284
0.420 1.704
0.300 2.004
0.288 2.292
0.180 2.472
0.192 2.664
0.456 3.12
0.264 3.384
0.276 3.66
How can I get the correct output from the original file without having to create a new file?
As #Shawn's excellent comment points out, the order in which you pipe your data is the problem. By the time awk reaches the 12th line from the end, sum is already 2.00198e+09; adding small fractions to a number that large does not change its default six-significant-digit display (awk's OFMT is "%.6g"), so every line appears to show "the same output".
Simply:
tail -12 OriginalFile.csv | awk 'BEGIN { FS = ";" } ; { sum += $3 } { print $3, sum }'
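If you would rather not have a separate tail process at all, one alternative sketch (my own, not from the answer above) keeps a ring buffer of the last 12 field-3 values inside awk; the sample lines below are copied from the question, and the file name is assumed:

```shell
cat > OriginalFile.csv <<'EOF'
02/02/2020 10:30:00;50727.421;0.264;55772.084;0.360;57110.502;0.384
02/02/2020 10:35:00;50727.455;0.408;55772.126;0.504;57110.548;0.552
02/02/2020 10:40:00;50727.489;0.408;55772.168;0.504;57110.593;0.540
EOF
# NR%12 overwrites slots cyclically, so at END only the last 12 values
# (or fewer, if the file is shorter) remain to be summed.
awk -F';' '{ vals[NR%12] = $3 }
     END  { for (i in vals) s += vals[i]; printf "%.3f\n", s }' OriginalFile.csv
# → 1.080
```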

Extract date from date time - change . to , and print sum up of different field

aNumber bNumber startDate cost balanceAfter trafficCase Operator unknown3 MainAmount BALANCEBEFORE
22676239633 433 2014-07-02 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-02 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-02 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-02 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
22665799922 70110055 2014-07-03 10:16:45.000 20,00 0.50 0 Telmob 126260244 20.0000 0.5000
22676239633 433 2014-07-03 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-04 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-04 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-05 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
Here is a sample of the data I have. I want to sum up cost, balanceAfter, MainAmount and BALANCEBEFORE each time the date changes, but my concern is that the date is combined with the time, and my decimal separator is a dot instead of a comma, so my awk script can't perform the operation.
Can I have an AWK script which will first extract only the date so in the end I will have an output looking like:
Date Cost balanceAfter MainAmount BALANCEBEFORE
02/07/2014 2,00 379,3 0 379,3
03/07/2014 20,00 0,7 20 0,7
04/07/2014 2,00 309,6 0 309,6
05/07/2014 0,00 69,5 0 69,5
HERE IS MY AWK SCRIPT
awk -F 'NR==1 {header=$0; next} {a[$3]+=$4 a[$3]+=$5 a[$3]+=$9 a[$3]+=$10} END {for (i in a) {printf "%d\t%d\n", i, a[i]}; tot+=a[i]};' out.txt>output.doc
EDIT: avoids the pre-processing step, per Etan Reisner's suggestion to use $NF to work around the differing number of tokens in the Operator column.
$ cat data.txt
aNumber bNumber startDate cost balanceAfter trafficCase Operator unknown3 MainAmount BALANCEBEFORE
22676239633 433 2014-07-02 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-02 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-02 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-02 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
22665799922 70110055 2014-07-03 10:16:45.000 20,00 0.50 0 Telmob 126260244 20.0000 0.5000
22676239633 433 2014-07-03 10:16:48.000 0,00 0.20 0 Short Code 397224944 0.0000 0.2000
22677277255 76919167 2014-07-04 10:16:51.000 1,00 92.60 0 Airtel 126268625 0.0000 92.6000
22676777508 76701575 2014-07-04 10:16:55.000 1,00 217.00 0 Airtel 4132186103 0.0000 217.0000
22665706841 433 2014-07-05 10:16:57.000 0,00 69.50 0 Short Code 4133821554 0.0000 69.5000
$ cat so2.awk
NR > 1 {
cost = $5;
balanceAfter = $6;
mainAmount = $(NF - 1);
balanceBefore = $NF;
sub(",", ".", cost);
sub(",", ".", balanceAfter);
sub(",", ".", mainAmount);
sub(",", ".", balanceBefore);
dateCost[$3] += cost;
dateBalanceAfter[$3] += balanceAfter;
dateMainAmount[$3] += mainAmount;
dateBalanceBefore[$3] += balanceBefore;
}
END {
printf("%s\t%s\t%s\t%s\t%s\n", "Date", "Cost", "BalanceAfter", "MainAmount", "BalanceBefore");
for (i in dateCost) {
printf("%s\t%f\t%f\t%f\t%f\n", i, dateCost[i], dateBalanceAfter[i], dateMainAmount[i], dateBalanceBefore[i]);
}
}
$ awk -f so2.awk data.txt
Date Cost BalanceAfter MainAmount BalanceBefore
2014-07-02 2.000000 379.300000 0.000000 379.300000
2014-07-03 20.000000 0.700000 20.000000 0.700000
2014-07-04 2.000000 309.600000 0.000000 309.600000
2014-07-05 0.000000 69.500000 0.000000 69.500000
This requires no pre-processing of the file:
awk '
BEGIN {print "Date Cost BalanceAfter MainAmount BalanceBefore"}
NR == 1 {next}
function showday() {
printf "%s\t%.2f\t%.1f\t%d\t%.1f\n", date, cost, bAfter, main, bBefore
}
date != $3 {
if (date) showday()
date = $3
cost = bAfter = main = bBefore = 0
}
{
sub(/,/, ".", $5)
cost += $5
bAfter += $6
main += $(NF-1)
bBefore += $NF
}
END {showday()}
' file | column -t
Date Cost BalanceAfter MainAmount BalanceBefore
2014-07-02 2.00 379.3 0 379.3
2014-07-03 20.00 0.7 20 0.7
2014-07-04 2.00 309.6 0 309.6
2014-07-05 0.00 69.5 0 69.5

Copy lines from one file and paste it to other n times

I have two files such as the following:
file1
t=10
HELLO
AAAAAA
BBBBBB
CCCCCC
DDDDDD
END
t=20
HELLO
EEEEEE
FFFFFF
GGGGGG
HHHHHH
END
file2
HELLO
AAAAAA
BBBBBB
CCCCCC
DDDDDD
111111
222222
333333
END
HELLO
EEEEEE
FFFFFF
GGGGGG
HHHHHH
444444
555555
666666
END
Is it possible to copy the t=10 and t=20 lines that appear above HELLO and paste them at the corresponding locations in file2, making it like:
t=10
HELLO
AAAAAA
BBBBBB
CCCCCC
DDDDDD
111111
222222
333333
END
t=20
HELLO
EEEEEE
FFFFFF
GGGGGG
HHHHHH
444444
555555
666666
END
Of course my files are not this small; imagine that I would like to do this over 100000 times in a file.
With the help of other members of the community I created this script, but it doesn't give the right result:
for frame in $(seq 1 1 2)
do
add=$(awk '/t=/{i++}i=='$frame' {print; exit}' $file1)
awk -v var="$add" 'NR>1 && NR%9==0 {print var} {print $0}' $file2
done
If anyone can help me, I would appreciate it.
Thanks in advance.
You can try the following awk script. It reads file1 and saves each line preceding a HELLO line in an indexed array, then emits the next saved entry each time it finds a HELLO line in the second file:
awk '
NR == 1 { prev_line = $0 }
FNR == NR {
if ( $1 == "HELLO" ) {
hash[ i++ ] = prev_line
}
prev_line = $0
next
}
$1 == "HELLO" {
printf "%s\n", hash[ j++ ]
}
{ print }
' file1 file2
It yields:
t=10
HELLO
AAAAAA
BBBBBB
CCCCCC
DDDDDD
111111
222222
333333
END
t=20
HELLO
EEEEEE
FFFFFF
GGGGGG
HHHHHH
444444
555555
666666
END
awk 'BEGIN{FS="\n";RS="END\n"}
NR==FNR{for(i=2;i<=NF;i++) a[$1]=a[$1]==""?$i:a[$1] FS $i;next}
{for (i in a) {if ($0~a[i]) printf i ORS $0 RS}
}' file1 file2
Result:
t=10
HELLO
AAAAAA
BBBBBB
CCCCCC
DDDDDD
111111
222222
333333
END
t=20
HELLO
EEEEEE
FFFFFF
GGGGGG
HHHHHH
444444
555555
666666
END