AWK output to separate files - awk

I am trying to split my file based on the patterns it contains and store the lines in separate text files named after those patterns.
I am using AWK to split my lines on two different separators ("\t" and "_"), use $2 as the file name, and append each line ($0) to the related file.
My command is:
awk 'BEGIN {FS="[_ \t]"}; {fn=$2"_HP100.txt"}; {print $0 >fn}' my_file.txt
awk does generate text files named after my $2, and it gives $0 as the whole line when I print it separately, but my generated text files are always empty... Am I missing something?
My text file contains:
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
The expected output is a CGTCGCCAAACTTAAC_HP100.txt file that contains all the lines carrying the _CGTCGCCAAACTTAAC tag.

You can just use this awk command:
awk -F '[\t_]' '{print $0 >> ($2 "_HP100.txt")}' file
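If the file contains many distinct tags, a non-GNU awk can run out of simultaneously open files. A slower but safer variant (just a sketch; it simply reopens the output file for every line) would be:
awk -F '[\t_]' '{fn = $2 "_HP100.txt"; print $0 >> fn; close(fn)}' file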

With your shown samples and attempts, please try the following awk code. It is a combination of awk + sort + cut + awk; this helps when multiple lines share the same 2nd field and you want all of them saved into the same output file. It sorts the lines on $2 (by prepending the 2nd field, sorting on it, and then cutting it off again so only the original input remains).
awk -F'_|\t' '{print $2,$0}' Input_file | sort | cut -d' ' -f2- |
awk -F'_|\t' '
prev!=$2{                          # 2nd field changed, so start a new output file
  close(outputFile)                # close the previous one to keep only 1 file open
  outputFile=$2"_HP100.txt"
}
{
  print $0 > (outputFile)
  prev=$2
}
'
NOTE: Pass your Input_file to the first awk used in the code above. Also, this approach is only suitable for small files.

The approach I have in mind for dealing with the 50 GB case is to collect a certain amount of input each round, and then, when the in-memory arrays reach a fixed threshold, write the buffered rows out to their respective files
instead of nonstop concatenating into one very, very long string. The current approach uses one array for the rows themselves, plus one more mapping file names to row numbers (i.e. NR); a plain sketch follows this list:
collect until the threshold is reached (I didn't set one; that's up to you)
dump all the data in the arrays out to files
(this also allows re-creating the original input row order)
use print (...) >> (fn) instead of >, just in case
use split("", arr) to reset both arrays to a mint state
(it's cleaner and more portable than a delete arr statement)
friendly to host system resources:
minimizes concurrent resource usage by requiring only a single instance of awk
the threshold you set also constrains RAM usage at any moment to a sane level
avoids the need for shell loops, xargs, or GNU parallel
avoids nonstop close(fn)
avoids wasting any time sorting the data
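A plainer sketch of that batching idea might look like the following (FLUSH_EVERY and the variable names are illustrative, not the exact code; the compact mawk demo below only exercises the two bookkeeping arrays and prints them instead of writing files):
awk -F'_|[ \t]' -v FLUSH_EVERY=100000 '
  function dump(   fn, idx, n, i) {              # flush both buffers to disk
    for (fn in rowsOf) {
      n = split(rowsOf[fn], idx, " ")
      for (i = 1; i <= n; i++) print row[idx[i]] >> fn
      close(fn)                                  # closed once per flush, not once per line
    }
    split("", row); split("", rowsOf)            # reset both arrays to a mint state
    buffered = 0
  }
  {
    row[NR] = $0                                             # buffer the row, keyed by NR
    fn = $2 "_HP100.txt"
    rowsOf[fn] = (fn in rowsOf) ? rowsOf[fn] " " NR : NR     # file name -> its row numbers
    if (++buffered >= FLUSH_EVERY) dump()
  }
  END { dump() }                                 # write out whatever is left over
' my_file.txt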
# duplicate every input row 7 times, shuffle, then pipe into the collector below
mawk '1;1;1;1;1;1;1' raw_input.txt | shuf |
mawk2 '
BEGIN { OFS = "_HP100.txt"
        FS  = "^[^_]+_|[ \t].+$"       # strip the prefix and trailing columns, leaving the tag as $2
} {     ___[ NR ] = $_                 # row buffer, keyed by NR ($_ is $0 here)
        sub(OFS, _, $!(NF = NF))       # rebuild $0 as "<tag>_HP100.txt"
        sub( "$", "=" NR, ____[$_])    # file name -> "=NR=NR..." list of its row numbers
} END {
        OFS = "\f\r\t\t"
        for (_ in  ___) { print _,  ___[_] }    # dump the row buffer
        for (_ in ____) { print _, ____[_] }    # dump the file-name-to-row-numbers map
} '
1
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
2
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
3
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
4
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
5
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
6
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
7
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
8
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
9
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
10
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
11
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
12
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
13
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
14
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
15
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
16
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
17
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
18
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
19
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
20
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
21
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
22
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
23
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
24
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
25
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
26
K00295:342:HNYVTBBXY:5:1101:1955:1578_AGCGTCTTTCATGCTG 0 9 41603892 1 50M
27
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
28
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
29
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
30
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
31
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
32
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
33
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
34
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
35
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
36
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
37
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
38
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
39
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
40
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
41
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
42
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
43
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
44
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
45
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
46
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
47
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
48
K00295:342:HNYVTBBXY:5:1101:1976:1578_TCATACCAAGTCTCCG 16 9 113429747 42 49M
49
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
50
K00295:342:HNYVTBBXY:5:1101:1773:1578_CGTCGCCATCGCTAGG 0 12 115297976 24 51M
51
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
52
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
53
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
54
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
55
K00295:342:HNYVTBBXY:5:1101:2057:1578_GATGCGTTTTCTGGTT 0 14 68520409 42 50M
56
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
57
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
58
K00295:342:HNYVTBBXY:5:1101:2098:1578_CGTCGCCAAACTTAAC 0 8 94503004 42 50M
59
K00295:342:HNYVTBBXY:5:1101:1996:1578_TCATCGAACCTCGTTG 16 20 21594558 42 51M
60
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
61
K00295:342:HNYVTBBXY:5:1101:1935:1578_GCCTATTCCCTCGTTG 16 18 54707729 42 51M
62
K00295:342:HNYVTBBXY:5:1101:2016:1578_TGGATCAACAGGACCA 0 16 13244975 27 51M
63
K00295:342:HNYVTBBXY:5:1101:1834:1578_TCGAACGACCGTTGCG 16 2 22709262 42 50M
TCATACCAAGTCTCCG_HP100.txt
=5=12=15=23=24=40=48
CGTCGCCATCGCTAGG_HP100.txt
=29=30=34=38=44=46=50
TCATCGAACCTCGTTG_HP100.txt
=9=25=27=31=36=49=59
AGCGTCTTTCATGCTG_HP100.txt
=3=8=14=17=18=22=26
GCCTATTCCCTCGTTG_HP100.txt
=11=37=52=54=56=57=61
GATGCGTTTTCTGGTT_HP100.txt
=6=7=10=20=39=41=55
TCGAACGACCGTTGCG_HP100.txt
=16=19=32=42=43=60=63
TGGATCAACAGGACCA_HP100.txt
=1=13=33=35=45=47=62
CGTCGCCAAACTTAAC_HP100.txt
=2=4=21=28=51=53=58

Related

Sorting values (ascending & descending) within groups of the same date?

I want to sort values (ascending/descending) within each group of the same date. Can anyone help me figure out how to achieve this?
df =
a b c
21-12-30 2 12 21
21-12-30 3 13 22
21-12-30 5 14 23
22-01-30 6 15 24
22-01-30 7 16 25
22-01-30 8 17 26
22-02-28 9 18 27
22-02-28 10 19 28
22-02-28 11 20 29
desired output =
a b c
21-12-30 5 14 23
21-12-30 3 13 22
21-12-30 2 12 21
22-01-30 8 17 26
22-01-30 7 16 25
22-01-30 6 15 24
22-02-28 11 20 29
22-02-28 10 19 28
22-02-28 9 18 27
One option:
out = (df.groupby(level=0, group_keys=False, sort=False)
.apply(lambda x: x.sort_values(by='a', ascending=False))
)
Another:
out = df.sort_values(by='a', ascending=False).sort_index(kind='stable')
output:
a b c
21-12-30 5 14 23
21-12-30 3 13 22
21-12-30 2 12 21
22-01-30 8 17 26
22-01-30 7 16 25
22-01-30 6 15 24
22-02-28 11 20 29
22-02-28 10 19 28
22-02-28 9 18 27
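Both options assume the dates are in the index (groupby(level=0) groups on the index). A minimal reproducible setup, as I read the sample, would be:
import pandas as pd

# Rebuild the sample frame with the dates as the index (assumed layout).
df = pd.DataFrame(
    {'a': [2, 3, 5, 6, 7, 8, 9, 10, 11],
     'b': [12, 13, 14, 15, 16, 17, 18, 19, 20],
     'c': [21, 22, 23, 24, 25, 26, 27, 28, 29]},
    index=['21-12-30'] * 3 + ['22-01-30'] * 3 + ['22-02-28'] * 3,
)

# Descending 'a' inside each date group, with the date groups kept in their original order:
out = df.sort_values(by='a', ascending=False).sort_index(kind='stable')
print(out)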

Pandas - cross columns reference

My data is a bit complicated; I have separated this into 2 sections: (A) Explaining the data, (B) Desired output.
(A) - Explaining the data:
My data is as follows:
comp date adj_date val
0 a 1999-12-31 NaT 50
1 a 2000-01-31 NaT 51
2 a 2000-02-29 NaT 52
3 a 2000-03-31 NaT 53
4 a 2000-04-30 NaT 54
5 a 2000-05-31 NaT 55
6 a 2000-06-30 NaT 56
----------------------------------
7 a 2000-07-31 2000-01-31 57
8 a 2000-08-31 2000-02-29 58
9 a 2000-09-30 2000-03-31 59
10 a 2000-10-31 2000-04-30 60
11 a 2000-11-30 2000-05-31 61
12 a 2000-12-31 2000-06-30 62
13 a 2001-01-31 2000-07-31 63
14 a 2001-02-28 2000-08-31 64
15 a 2001-03-31 2000-09-30 65
16 a 2001-04-30 2000-10-31 66
17 a 2001-05-31 2000-11-30 67
18 a 2001-06-30 2000-12-31 68
----------------------------------
19 a 2001-07-31 2001-01-31 69
20 a 2001-08-31 2001-02-28 70
21 a 2001-09-30 2001-03-31 71
22 a 2001-10-31 2001-04-30 72
23 a 2001-11-30 2001-05-31 73
24 a 2001-12-31 2001-06-30 74
25 a 2002-01-31 2001-07-31 75
26 a 2002-02-28 2001-08-31 76
27 a 2002-03-31 2001-09-30 77
28 a 2002-04-30 2001-10-31 78
29 a 2002-05-31 2001-11-30 79
30 a 2002-06-30 2001-12-31 80
----------------------------------
31 a 2002-07-31 2002-01-31 81
32 a 2002-08-31 2002-02-28 82
33 a 2002-09-30 2002-03-31 83
34 a 2002-10-31 2002-04-30 84
35 a 2002-11-30 2002-05-31 85
36 a 2002-12-31 2002-06-30 86
37 a 2003-01-31 2002-07-31 87
38 a 2003-02-28 2002-08-31 88
39 a 2003-03-31 2002-09-30 89
40 a 2003-04-30 2002-10-31 90
41 a 2003-05-31 2002-11-30 91
42 a 2003-06-30 2002-12-31 92
----------------------------------
date: the actual date, as end of month.
adj_date = date + MonthEnd(-6)
val: the given value
I want to create a new column val_new where:
it references the val of the previous year's December
val_new then applies from July of a year through June of the following year in terms of date, or equivalently from January through December in terms of adj_date
(B) - Desired output:
comp date adj_date val val_new
0 a 1999-12-31 NaT 50 NaN
1 a 2000-01-31 NaT 51 NaN
2 a 2000-02-29 NaT 52 NaN
3 a 2000-03-31 NaT 53 NaN
4 a 2000-04-30 NaT 54 NaN
5 a 2000-05-31 NaT 55 NaN
6 a 2000-06-30 NaT 56 NaN
-------------------------------------------
7 a 2000-07-31 2000-01-31 57 50.0
8 a 2000-08-31 2000-02-29 58 50.0
9 a 2000-09-30 2000-03-31 59 50.0
10 a 2000-10-31 2000-04-30 60 50.0
11 a 2000-11-30 2000-05-31 61 50.0
12 a 2000-12-31 2000-06-30 62 50.0
13 a 2001-01-31 2000-07-31 63 50.0
14 a 2001-02-28 2000-08-31 64 50.0
15 a 2001-03-31 2000-09-30 65 50.0
16 a 2001-04-30 2000-10-31 66 50.0
17 a 2001-05-31 2000-11-30 67 50.0
18 a 2001-06-30 2000-12-31 68 50.0
-------------------------------------------
19 a 2001-07-31 2001-01-31 69 62.0
20 a 2001-08-31 2001-02-28 70 62.0
21 a 2001-09-30 2001-03-31 71 62.0
22 a 2001-10-31 2001-04-30 72 62.0
23 a 2001-11-30 2001-05-31 73 62.0
24 a 2001-12-31 2001-06-30 74 62.0
25 a 2002-01-31 2001-07-31 75 62.0
26 a 2002-02-28 2001-08-31 76 62.0
27 a 2002-03-31 2001-09-30 77 62.0
28 a 2002-04-30 2001-10-31 78 62.0
29 a 2002-05-31 2001-11-30 79 62.0
30 a 2002-06-30 2001-12-31 80 62.0
-------------------------------------------
31 a 2002-07-31 2002-01-31 81 74.0
32 a 2002-08-31 2002-02-28 82 74.0
33 a 2002-09-30 2002-03-31 83 74.0
34 a 2002-10-31 2002-04-30 84 74.0
35 a 2002-11-30 2002-05-31 85 74.0
36 a 2002-12-31 2002-06-30 86 74.0
37 a 2003-01-31 2002-07-31 87 74.0
38 a 2003-02-28 2002-08-31 88 74.0
39 a 2003-03-31 2002-09-30 89 74.0
40 a 2003-04-30 2002-10-31 90 74.0
41 a 2003-05-31 2002-11-30 91 74.0
42 a 2003-06-30 2002-12-31 92 74.0
-------------------------------------------
I have two solutions, but both come at a cost:
Solution 1: create a sub_dec dataframe that takes the val of December of each year, then merge it back into the main data. This works fine, but I don't like it because our actual data would involve a lot of merges, and it is not easy or convenient to keep track of all of them.
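For reference, a minimal sketch of what Solution 1 could look like (column names follow the sample data; this is illustrative, not the exact code used):
# Sketch of Solution 1: take each year's December val and merge it back on the
# adjusted year (the December 1999 value applies to adj_date year 2000, and so on).
sub_dec = data.loc[data['date'].dt.month == 12, ['comp', 'date', 'val']].copy()
sub_dec['adj_year'] = sub_dec['date'].dt.year + 1.0   # float, to match adj_date.dt.year below
sub_dec = sub_dec.rename(columns={'val': 'val_new'})[['comp', 'adj_year', 'val_new']]

data['adj_year'] = data['adj_date'].dt.year            # NaN where adj_date is NaT
data = data.merge(sub_dec, on=['comp', 'adj_year'], how='left')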
Solution 2: (1) create a lag with shift(7), (2) set val_new to None for every adj_date except January, (3) then use groupby with ffill. This works nicely, but if any rows are missing or the dates are not continuous, the entire output is wrong.
create adj_year:
data['adj_year'] = data['adj_date'].dt.year
cross-reference by shift(7):
data['val_new'] = data.groupby('comp')['val'].shift(7)
set val_new to None for every adj_date month except January:
data.loc[data['adj_date'].dt.month != 1, 'val_new'] = None
use ffill to fill in the None values within each group of ['comp', 'adj_year']:
data['val_new'] = data.groupby(['comp', 'adj_year'])['val_new'].ffill()
Any suggestion to overcome the drawback of Solution 2, or any other new solution, would be appreciated.
Thank you.
You can use Timedelta with the correct conversion from seconds to months, according to your needs.
Check these two resources for more info:
https://docs.python.org/3/library/datetime.html
pandas: function equivalent to SQL's datediff()?

How to assign different values to a variable in different if statements

I have a text file with 4 columns. I need to modify the file so that the first sequence of 1's in the 4th column remains 1, but all other values in the 4th column are changed to 0.
I have tried the following awk command with multiple if statements, but the variable fat doesn't seem to be updating properly.
`cat sample_data.txt`
72 29 16 0
73 30 16 0
74 31 16 0
75 32 16 1
76 33 16 1
77 34 16 1
78 35 16 0
79 36 16 0
80 37 16 0
81 38 16 0
82 39 16 0
83 40 16 0
84 41 16 0
85 42 16 0.55
86 43 16 0.57
87 44 16 0.41
88 45 16 0.58
89 46 16 1
90 47 16 1
91 48 16 1
92 49 16 1
93 50 16 0.59
94 51 16 0.52
95 52 16 0.43
`awk -v fat=1 '{if($4<1 && fat=1) {print $1,$2,$3,0;} else if($4=1 && fat=1) {fat=2;print $1,$2,$3,1;} else if($4=1 && fat=2) {fat=2;print $1,$2,$3,1;} else if($4<1 && fat=2) {fat=3;print $1,$2,$3,0} else if($4<1 && fat=3) {fat=3;print $1,$2,$3,0;} else if($4=1 && fat=3) {fat=3;print $1,$2,$3,0;}}' sample_data.txt`
I want this output:
72 29 16 0
73 30 16 0
74 31 16 0
75 32 16 1
76 33 16 1
77 34 16 1
78 35 16 0
79 36 16 0
80 37 16 0
81 38 16 0
82 39 16 0
83 40 16 0
84 41 16 0
85 42 16 0
86 43 16 0
87 44 16 0
88 45 16 0
89 46 16 0
90 47 16 0
91 48 16 0
92 49 16 0
93 50 16 0
94 51 16 0
95 52 16 0
Regarding your code:
if($4<1 && fat=1)
fat=1 is an assignment, an equivalency test would be fat==1 (double equals) instead.
But anyway, here's how to do what you appear to want with a simple finite-state machine (FSM):
$ awk '(state==0) && ($4==1){state=1} (state==1) && ($4!=1){state=2} state==2{$4=0} 1' file
72 29 16 0
73 30 16 0
74 31 16 0
75 32 16 1
76 33 16 1
77 34 16 1
78 35 16 0
79 36 16 0
80 37 16 0
81 38 16 0
82 39 16 0
83 40 16 0
84 41 16 0
85 42 16 0
86 43 16 0
87 44 16 0
88 45 16 0
89 46 16 0
90 47 16 0
91 48 16 0
92 49 16 0
93 50 16 0
94 51 16 0
95 52 16 0
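For readability, here is the same state machine written out as a commented script (behaviour identical to the one-liner above; run it with awk -f):
# state 0: before the first run of 1s
# state 1: inside the first run of 1s
# state 2: after the first run has ended
(state == 0) && ($4 == 1) { state = 1 }   # first 1 seen: enter the run
(state == 1) && ($4 != 1) { state = 2 }   # the run is over
(state == 2)              { $4 = 0 }      # everything after the first run becomes 0
1                                         # print every (possibly modified) record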
Tested on GNU awk; this keeps the first run of 1s and zeroes the 4th column everywhere else:
awk -v f=1 '$4==1 && (f||s){f=0; s=1; print; next} {s=0; $4=0} 1' sample_data.txt

How to interpret the log output of the docplex optimisation library

I am having a problem interpreting this log that I get after trying to maximise an objective function using docplex:
Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap
0 0 6.3105 0 10.2106 26
0 0 5.9960 8 Cone: 5 34
0 0 5.8464 5 Cone: 8 47
0 0 5.8030 11 Cone: 10 54
0 0 5.7670 12 Cone: 13 64
0 0 5.7441 13 Cone: 16 72
0 0 5.7044 9 Cone: 19 81
0 0 5.6844 14 5.6844 559
* 0+ 0 4.5362 5.6844 25.31%
0 0 5.5546 15 4.5362 Cuts: 322 1014 22.45%
0 0 5.4738 15 4.5362 Cuts: 38 1108 20.67%
* 0+ 0 4.6021 5.4738 18.94%
0 0 5.4296 16 4.6021 Cuts: 100 1155 17.98%
0 0 5.3779 19 4.6021 Cuts: 34 1204 16.86%
0 0 5.3462 17 4.6021 Cuts: 80 1252 16.17%
0 0 5.3396 19 4.6021 Cuts: 42 1276 16.03%
0 0 5.3364 24 4.6021 Cuts: 57 1325 15.96%
0 0 5.3269 17 4.6021 Cuts: 66 1353 15.75%
0 0 5.3188 20 4.6021 Cuts: 42 1369 15.57%
0 0 5.2975 21 4.6021 Cuts: 62 1387 15.11%
0 0 5.2838 24 4.6021 Cuts: 72 1427 14.81%
0 0 5.2796 21 4.6021 Cuts: 70 1457 14.72%
0 0 5.2762 24 4.6021 Cuts: 73 1471 14.65%
0 0 5.2655 24 4.6021 Cuts: 18 1479 14.42%
* 0+ 0 4.6061 5.2655 14.32%
* 0+ 0 4.6613 5.2655 12.96%
0 0 5.2554 26 4.6613 Cuts: 40 1492 12.75%
0 0 5.2425 27 4.6613 Cuts: 11 1511 12.47%
0 0 5.2360 23 4.6613 Cuts: 3 1518 12.33%
0 0 5.2296 19 4.6613 Cuts: 7 1521 12.19%
0 0 5.2213 18 4.6613 Cuts: 8 1543 12.01%
0 0 5.2163 24 4.6613 Cuts: 15 1552 11.91%
0 0 5.2106 21 4.6613 Cuts: 4 1558 11.78%
0 0 5.2106 21 4.6613 Cuts: 3 1559 11.78%
* 0+ 0 4.6706 5.2106 11.56%
0 2 5.2106 21 4.6706 5.2106 1559 11.56%
Elapsed time = 9.12 sec. (7822.43 ticks, tree = 0.01 MB, solutions = 5)
51 29 4.9031 3 4.6706 5.1575 1828 10.42%
260 147 4.9207 1 4.6706 5.1575 2699 10.42%
498 242 infeasible 4.6706 5.0909 3364 9.00%
712 346 4.7470 6 4.6706 5.0591 4400 8.32%
991 497 4.7338 6 4.6706 5.0480 5704 8.08%
1358 566 4.8085 11 4.6706 5.0005 7569 7.06%
1708 708 4.7638 14 4.6706 4.9579 9781 6.15%
1985 817 cutoff 4.6706 4.9265 11661 5.48%
2399 843 infeasible 4.6706 4.9058 15567 5.04%
3619 887 4.7066 4 4.6706 4.7875 23685 2.50%
Elapsed time = 17.75 sec. (10933.85 ticks, tree = 3.05 MB, solutions = 5)
4623 500 4.6863 13 4.6706 4.7274 35862 1.22%
What I don't understand is the following:
What is the difference between the third column (Objective) and the fifth column (Best Integer)?
How come the third column (Objective) has higher values than the actual solution of the problem given by CPLEX, which is 4.6706?
Do the values in the third column take into consideration the constraints given to the optimization problem?
This webpage didn't help me understand either; the explanation of Best Integer is really confusing.
Thank you in advance for your feedback.
Regards.
The user manual includes a detailed explanation of this log in section
CPLEX->User's Manual for CPLEX->Discrete Optimization->Solving Mixed Integer Programming Problems (MIP)->Progress Reports: interpreting the node log
(see https://www.ibm.com/support/knowledgecenter/SSSA5P_12.8.0/ilog.odms.cplex.help/CPLEX/UsrMan/topics/discr_optim/mip/para/52_node_log.html)
I also suggest having a look at the slides at
https://fr.slideshare.net/mobile/IBMOptimization/2013-11-informsminingthenodelog

How to use pd.cut in pandas

Can anyone help me figure out why this isn't working:
ages = ['15-19','20-24','25-29','30-34','35-39','40-44','45-49','50-54','55-59','60-64','65-69','70-74','75-79','80-84']
race['age_group'] = pd.cut(race.Age,range(13,84,5),right=False, labels=ages)
race[['Age','age_group']].head(15)
This is the result I get:
Age age_group
0 31 30-34
1 38 40-44
2 45 45-49
3 30 30-34
4 45 45-49
5 35 35-39
6 32 30-34
7 33 35-39
8 29 30-34
9 42 40-44
10 34 35-39
11 48 50-54
12 35 35-39
13 51 50-54
14 38 40-44
Your "range" is not correct, try:
ages = ['15-19','20-24','25-29','30-34','35-39','40-44','45-49','50-54','55-59','60-64','65-69','70-74','75-79','80-84']
race['age_group'] = pd.cut(race.Age,range(15,86,5),right=False, labels=ages)
race[['Age','age_group']].head(15)
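To see why the original bins are off, here is a quick standalone check (an illustrative Series, not the asker's race data):
import pandas as pd

ages = ['15-19','20-24','25-29','30-34','35-39','40-44','45-49','50-54',
        '55-59','60-64','65-69','70-74','75-79','80-84']
s = pd.Series([31, 38, 33, 29, 48])

# range(13, 84, 5) gives edges 13, 18, ..., 83, so e.g. 33 falls in [33, 38) -> '35-39'
shifted = pd.cut(s, range(13, 84, 5), right=False, labels=ages)
# range(15, 86, 5) gives edges 15, 20, ..., 85, so 33 falls in [30, 35) -> '30-34'
correct = pd.cut(s, range(15, 86, 5), right=False, labels=ages)

print(pd.DataFrame({'Age': s, 'shifted_bins': shifted, 'correct_bins': correct}))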