I would like to create bins to get a histogram with totals and percentages, e.g. starting from 0.
If possible, I would also like to set the minimum and maximum values of the bins (in my case min=0 and max=20).
Input file
8 5
10 1
11 4
12 4
12 4
13 5
16 7
18 9
16 9
17 7
18 5
19 5
20 1
21 7
output desired
0 0 0.0%
0 - 2 0 0.0%
2 - 4 0 0.0%
4 - 6 0 0.0%
6 - 8 0 0.0%
8 - 10 5 6.8%
10 - 12 5 6.8%
12 - 14 13 17.8%
14 - 16 0 0.0%
16 - 18 23 31.5%
18 - 20 19 26.0%
> 20 8 11.0%
---------------------
Total: 73
I use this code from Mr Ed Morton; it works perfectly, but the percentage is missing.
awk 'BEGIN { delta = (delta == "" ? 2 : delta) }
{
    bucketNr = int(($0+delta) / delta)
    cnt[bucketNr]++
    numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
    for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
        end = beg + delta
        printf "%0.1f %0.1f %d\n", beg, end, cnt[bucketNr]
        beg = end
    }
}' file
Thanks in advance
Your expected output doesn't seem to correspond to your sample input data, but try this variation of the awk code in your question (intended to be put in an executable file and run as a script rather than as a one-liner, because of its size):
#!/usr/bin/awk -f
BEGIN { delta = (delta == "" ? 2 : delta) }
{
    bucketNr = int(($0+delta) / delta)
    cnt[bucketNr]++
    max[bucketNr] = max[bucketNr] < $2 ? $2 : max[bucketNr]
    sum += $2
    numBuckets = (numBuckets > bucketNr ? numBuckets : bucketNr)
}
END {
    for (bucketNr=1; bucketNr<=numBuckets; bucketNr++) {
        end = beg + delta
        printf "%d-%d %d %.1f\n", beg, end, max[bucketNr],
            (cnt[bucketNr] / NR) * 100
        beg = end
    }
    print "-------------"
    print "Total " sum
}
It adds tracking the maximum of the second column for each bin the first column falls in, and prints out a percentage instead of a count of how many rows were in each bin. Plus some tweaks to the output format to better match your desired output.
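For what it is worth, the desired output in the question does look consistent with the sample input if each bin holds the sum of the second column and the percentage is that sum's share of the grand total (for example, the only row in 8 - 10 is "8 5", and 5/73 is 6.8%). Below is a minimal sketch under that reading; it is not the code from the answer above. The function name histogram, the variables min, max, delta, under, over and the helper pct are my own names. Bins are assumed to be half-open [lo, lo+delta), and (max - min) is assumed to be a multiple of delta.

```shell
# Hypothetical helper (not from the original answer): sums column 2 per bin of
# column 1 and prints each bin's share of the grand total.
# Assumes bins are [lo, lo+delta) and that (max - min) is a multiple of delta.
histogram() {
    awk -v min=0 -v max=20 -v delta=2 '
    # percentage of the grand total, guarding against an empty input
    function pct(x) { return total ? 100 * x / total : 0 }
    {
        total += $2
        if ($1 < min)       under += $2      # below the first bin
        else if ($1 >= max) over  += $2      # at or above the last bin edge
        else                sum[int(($1 - min) / delta)] += $2
    }
    END {
        printf "< %d %d %.1f%%\n", min, under, pct(under)
        n = int((max - min) / delta)         # number of regular bins
        for (b = 0; b < n; b++) {
            lo = min + b * delta
            printf "%d - %d %d %.1f%%\n", lo, lo + delta, sum[b], pct(sum[b])
        }
        printf ">= %d %d %.1f%%\n", max, over, pct(over)
        print "---------------------"
        print "Total: " total + 0
    }' "$@"
}

# Example call on two rows read from standard input
printf '8 5\n21 7\n' | histogram
```

Called as "histogram file" it reads the file instead of standard input. The leading "0" row of the desired output is printed here as an explicit "< 0" row, which is only a guess at what that row was meant to hold.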
price.txt file has two columns: (name and value)
Mary 134
Lucy 56
Jack 88
range.txt file has three columns: (fruit and min_value and max_value)
apple 57 136
banana 62 258
orange 88 99
blueberry 98 121
My aim is to test whether each value in price.txt is between the min_value and max_value in range.txt. If yes, output 1; if not, output "x".
I tried:
awk 'FNR == NR { name=$1; price[name]=$2; next} {
for (name in price) {
if ($2<=price[name] && $3>=price[name]) {print 1} else {print "x"}
}
}' price.txt range.txt
But my results are all in one column, just like follows:
1
1
x
x
x
x
x
x
1
1
1
x
Actually, I want my result to be like: (Each name has one column)
1 x 1
1 x 1
x x 1
x x x
Because I need to use paste to add the output file and range.txt file together. The final result should be like:
apple 57 136 1 x 1
banana 62 258 1 x 1
orange 88 99 x x 1
blueberry 98 121 x x x
So, how can I get the result of each loop in different columns? And is there any way to output the final result without paste, based on my current code? Thank you.
This builds on what you provided,
# load prices by index to maintain read order;
# names counts the entries, which avoids the non-standard length(array)
FNR == NR {
    price[names++] = $2
    next
}
# for each range line, append one flag per price, in read order
{
    l = $1 " " $2 " " $3
    for (i = 0; i < names; i++) {
        if ($2 <= price[i] && $3 >= price[i]) {
            l = l " 1"
        } else {
            l = l " x"
        }
    }
    print l
}
and generates output,
apple 57 136 1 x 1
banana 62 258 1 x 1
orange 88 99 x x 1
blueberry 98 121 x x x
However, you don't have the person name for the score (anonymous results) - maybe that's intentional?
The change here is to explicitly index array populated in first block to maintain order.
Question: Why does it seem that date_list[d] and isin_list[i] are not getting populated, in the code segment below?
AWK Code (on GNU-AWK on a Win-7 machine)
BEGIN { FS = "," } # This SEBI data set has comma-separated fields (NSE snapshots are pipe-separated)
# UPDATE the lists for DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5).
( $17~/_EQ\>/ ) {
    if (date[$10]++ == 0) date_list[d++] = $10;   # Dates appear in order in raw data
    if (isin[$9]++ == 0) isin_list[i++] = $9;     # ISINs appear out of order in raw data
    print $10, date[$10], $9, isin[$9], date_list[d], d, isin_list[i], i
}
input data
49290,C198962542782200306,6/30/2003,433581,F5811773991200306,S5405611832200306,B5086397478200306,NESTLE INDIA LTD.,INE239A01016,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,591.13,5655,3342840.15,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49291,C198962542782200306,6/30/2003,433563,F6292896459200306,S6344227311200306,B6110521493200306,GRASIM INDUSTRIES LTD.,INE047A01013,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,495.33,3700,1832721,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49292,C198962542782200306,6/30/2003,433681,F6513202607200306,S1724027402200306,B6372023178200306,HDFC BANK LTD,INE040A01018,6/26/2003,1,E745964372424200306,REG_DL_STLD_02,242,2600,629200,REG_DL_INSTR_EQ,REG_DL_DLAY_D,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49293,C7885768925200306,6/30/2003,48128,F4406661052200306,S7376401565200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,44600,5575000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49294,C7885768925200306,6/30/2003,48129,F4500260787200306,S1312094035200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,4,E912851176274200306,REG_DL_STLD_04,125,445600,55700000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49295,C7885768925200306,6/30/2003,48130,F6425024637200306,S2872499118200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,48000,6000000,REG_DL_INSTR_EU,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
output that I am getting
6/27/2003 1 INE239A01016 1 1 1
6/27/2003 2 INE047A01013 1 1 2
6/26/2003 1 INE040A01018 1 2 3
6/28/2003 1 INE585B01010 1 3 4
6/28/2003 2 INE585B01010 2 3 4
Expected output
As far as I can tell, the print is correctly printing (i) $10 (the date), (ii) date[$10], the count for each date, (iii) $9 (the firm ID, called ISIN), (iv) isin[$9], the count for each ISIN, (v) d (the index of date_list, i.e. the number of unique dates) and (vi) i (the index of isin_list, i.e. the number of unique ISINs). I should also get two more columns, columns 5 and 7 below, for date_list[d] and isin_list[i], which will have values that look like $10 and $9.
6/27/2003 1 INE239A01016 1 6/27/2003 1 INE239A01016 1
6/27/2003 2 INE047A01013 1 6/27/2003 1 INE047A01013 2
6/26/2003 1 INE040A01018 1 6/26/2003 2 INE040A01018 3
6/28/2003 1 INE585B01010 1 6/28/2003 3 INE585B01010 4
6/28/2003 2 INE585B01010 2 6/28/2003 3 INE585B01010 4
The actual code I now use is:
{
    if (date[$10]++ == 0) date_list[d++] = $10;
    if (isin[$9]++ == 0) isin_list[i++] = $9;
}
( $11~/1|2|3|5|9|1[24]/ ) { ++BNR[$10,$9,$12,$5] }
END {
    for (u = 0; u < d; u++) {
        for (v = 0; v < i; v++) {
            if (BNR[date_list[u],isin_list[v]] > 0)
                BR = BNR[date_list[u],isin_list[v]]
            print(date_list[u], isin_list[v], BR)
        }
    }
}
Thanks a lot to everyone.
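One likely explanation for the empty columns in the first snippet, offered as a sketch rather than a confirmed diagnosis: d++ and i++ are post-increments, so by the time print runs, d and i already point at the next unused slot, and date_list[d] and isin_list[i] have not been assigned yet. Printing date_list[d-1] and isin_list[i-1] shows the most recently stored entries instead. The function name show_lists is made up, and /_EQ$/ is used here as a portable stand-in for the gawk-specific /_EQ\>/, assuming field 17 ends right after _EQ as in the sample data.

```shell
# Hypothetical wrapper around the snippet from the question; reads the CSV
# files given as arguments, or standard input if there are none.
show_lists() {
    awk -F, '
    $17 ~ /_EQ$/ {      # assumption: portable replacement for the gawk test /_EQ\>/
        if (date[$10]++ == 0) date_list[d++] = $10
        if (isin[$9]++ == 0)  isin_list[i++] = $9
        # d and i were incremented after the store, so the newest entries
        # sit at index d-1 and i-1, not at d and i
        print $10, date[$10], $9, isin[$9], date_list[d-1], d, isin_list[i-1], i
    }' "$@"
}
```

Note that date_list[d-1] is the most recently added date, which equals the current row's date only when dates arrive grouped, as they do in the sample data.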
How can I read only lines 3, 9, 12 and 15 from the file containing the following lines?
The idea is that whenever I get x and y, I want to print the last line of each run of lines containing x and y.
What I mean is, for example, if I have an awk script like: BEGIN { name = $2; value=$3; } { if(name == x && value==y && the scan reaches at lines 3, 9, 12 and 15) printf("hello world") }, what expression can I use instead of "the scan reaches at lines 3, 9, 12 and 15"?
1 x y
2 x y
3 x y
4 a d
5 e f
6 x y
7 x y
8 x y
9 x y
10 g f
11 x y
12 x y
13 p r
14 w c
15 x y
16 a z
One way with awk:
$ awk '/^[0-9]+ x y$/{a=$0;f=1;next}f{print a;f=0}' file
3 x y
9 x y
12 x y
15 x y
One way without awk:
$ tac file | uniq -f1 | fgrep -w 'x y' | tac
3 x y
9 x y
12 x y
15 x y
Something like this?
awk 'a=="xy" && $2$3!="xy" {print b} {a=$2$3;b=$0}' file
3 x y
9 x y
12 x y
15 x y
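Both awk one-liners above print the last line of a run only once a following non-matching line has been read, so a run that reaches the end of the file would not be printed. A minimal sketch that also covers that case with an END rule is below; the function name last_of_runs and the variables last and inrun are my own names.

```shell
# Hypothetical helper: prints the last line of every run of "x y" lines,
# including a run that ends at the end of the file.
last_of_runs() {
    awk '
    $2 == "x" && $3 == "y" { last = $0; inrun = 1; next }  # remember the newest x y line
    inrun                  { print last; inrun = 0 }        # run just ended: print its last line
    END                    { if (inrun) print last }        # run reached end of input
    ' "$@"
}
```

On the sample file this should give the same four lines as the answers above; the END rule only makes a difference when the input ends on an x y line.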
You need two while loops here: one to check the line and another to iterate. Something like this. Hope that helps.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LastOfRun {
    // True when the line looks like "<n> x y", i.e. fields 2 and 3 are x and y.
    static boolean isXY(String line) {
        String[] f = line.trim().split("\\s+");
        return f.length >= 3 && f[1].equals("x") && f[2].equals("y");
    }

    public static void main(String[] args) {
        String line;
        int i = 0; // number of lines read so far
        try (BufferedReader in = new BufferedReader(new FileReader("D:\\readline.txt"))) {
            while ((line = in.readLine()) != null) {
                i++;
                if (isXY(line)) {
                    System.out.println("Line containing X and Y");
                    String searchline = line;
                    int last = i; // line number of the newest x y line in this run
                    // Iterate until the run of x y lines ends
                    while ((line = in.readLine()) != null) {
                        i++; // keep count of the lines read
                        if (!isXY(line)) {
                            break;
                        }
                        searchline = line;
                        last = i;
                    }
                    System.out.println("Printing the line ::" + last + ":: containing X and Y::::::::" + searchline);
                }
            }
        } catch (IOException e) {
            System.out.println("Exception Caught::::" + e.getMessage());
        }
    }
}
I am trying to transpose a really long file and I am concerned that it will not be transposed entirely.
My data looks something like this:
Thisisalongstring12345678 1 AB abc 937 4.320194
Thisisalongstring12345678 1 AB efg 549 0.767828
Thisisalongstring12345678 1 AB hi 346 -4.903441
Thisisalongstring12345678 1 AB jk 193 7.317946
I want my data to look like this:
Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678
1 1 1 1
AB AB AB AB
abc efg hi jk
937 549 346 193
4.320194 0.767828 -4.903441 7.317946
Would the length of the first string prove to be an issue? My file is much longer than this, approximately 2000 lines. Also, is it possible to change the name of the first string to Thisis234 and then transpose?
I don't see why it will not be - unless you don't have enough memory. Try the below and see if you run into problems.
Input:
$ cat inf.txt
a b c d
1 2 3 4
. , + -
A B C D
Awk program:
$ cat mkt.sh
awk '
{
    for(c = 1; c <= NF; c++) {
        a[c, NR] = $c
    }
    if(max_nf < NF) {
        max_nf = NF
    }
}
END {
    for(r = 1; r <= NR; r++) {
        for(c = 1; c <= max_nf; c++) {
            printf("%s ", a[r, c])
        }
        print ""
    }
}
' inf.txt
Run:
$ ./mkt.sh
a 1 . A
b 2 , B
c 3 + C
d 4 - D
Credits:
http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_12.html#SEC121
Hope this helps.
This can be done with the rs BSD command:
http://www.unix.com/man-page/freebsd/1/rs/
Check out the -T option.
I tried icyrock.com's answer, but found that I had to change:
for(r = 1; r <= NR; r++) {
for(c = 1; c <= max_nf; c++) {
to
for(r = 1; r <= max_nf; r++) {
for(c = 1; c <= NR; c++) {
to get the NR columns and max_nf rows. So icyrock's code becomes:
$ cat mkt.sh
awk '
{
    for(c = 1; c <= NF; c++) {
        a[c, NR] = $c
    }
    if(max_nf < NF) {
        max_nf = NF
    }
}
END {
    for(r = 1; r <= max_nf; r++) {
        for(c = 1; c <= NR; c++) {
            printf("%s ", a[r, c])
        }
        print ""
    }
}
' inf.txt
If you don't do that and use an asymmetrical input, like:
a b c d
1 2 3 4
. , + -
You get:
a 1 .
b 2 ,
c 3 +
i.e. still 3 rows and 4 columns (the last of which is blank).
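The question also asked about renaming the first string before transposing. A minimal sketch of one way to do that, using the corrected loop bounds from above: assign to $1 before storing the fields. The function name transpose, the variable newname and the array name cell are my own; the value Thisis234 is taken from the question. Building each output line by concatenation also avoids the trailing space that printf("%s ") leaves.

```shell
# Hypothetical helper: renames the first field of every line, then transposes.
transpose() {
    awk -v newname="Thisis234" '
    {
        $1 = newname                        # rename the first field before storing the row
        for (c = 1; c <= NF; c++) cell[c, NR] = $c
        if (NF > max_nf) max_nf = NF        # the widest row sets the number of output rows
    }
    END {
        for (r = 1; r <= max_nf; r++) {
            line = cell[r, 1]
            for (c = 2; c <= NR; c++) line = line " " cell[r, c]
            print line
        }
    }' "$@"
}
```

Called as "transpose file", this keeps every field in memory, which should not be a problem for a file of roughly 2000 lines.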
For @ScubaFishi's and @icyrock's code:
"if (max_nf < NF)" seems unnecessary. I deleted it, and the code works just fine.