Why can't Pig find a column from the original relation's alias even though it exists? - apache-pig

I have this relation :
S10 = FOREACH E1EEFRE GENERATE
CD_SI AS CD_SI,
IDT_ETT_CTR AS IDT_ETT_CTR,
CD_EFS AS CD_EFS,
(BigDecimal) null AS MT_DSP,
(BigDecimal) null AS MT_NAL,
(BigDecimal) null AS MT_ENC_MOY,
(BigDecimal) null AS MT_UTI,
(BigDecimal) null AS MT_ITT_M,
MT_ENMO AS MT_ENMO;
and
S5 = FOREACH E1EEFAU GENERATE
CD_SI AS CD_SI,
IDT_ETT_CTR AS IDT_ETT_CTR,
CD_EFS AS CD_EFS,
MT_DSP AS MT_DSP,
MT_NAL AS MT_NAL,
MT_ENC_MOY AS MT_ENC_MOY,
MT_ENC_FIN_PER AS MT_UTI,
'EEFAU' AS CD_ETT_ORI,
MT_DSP AS MT_DSP_CVE,
MT_NAL AS MT_NAL_CVE,
(BigDecimal) null AS MT_ENC_MOY_CVE,
MT_IMP AS MT_IMP,
MT_PROR AS MT_PROR,
MT_DEM AS MT_DEM,
(BigDecimal) null AS MT_ITT_M;
Now I want to generate the final entity, where MT_ENC_EFF and MT_NAL_LIG depend on S5 and S10:
S26 = UNION S19, S22, S21;
S27 = FOREACH S26 GENERATE
'$CD_TY_TT' AS CD_TY_TT,
'$DA_TT' AS DA_TT,
'$A_ARR' AS A_ARR,
'$M_ARR' AS M_ARR,
'$CD_ETS' AS CD_ETS,
$0 AS CD_SI,
$1 AS IDT_ETT_CTR,
$2 AS CD_EFS,
$3 AS MT_DSP,
$4 AS MT_NAL,
$5 AS MT_ENC_MOY,
S10::MT_ENMO + S5::MT_ENC_MOY AS MT_ENC_EFF,
$6 AS MT_IMP,
$7 AS MT_PROR,
$8 AS MT_DEM,
$9 AS MT_ITT_M,
(S6::IDT_ETT_CTR_LIG == '' ? (S6::MT_NAL_BIL + S6::MT_AUT) :99) AS MT_NAL_LIG;
STORE S27 INTO '$PathDataWorkingFile' USING CSVExcelStorage(',', 'YES_MULTILINE');
The error shown is:
Invalid field projection. Projected field [S10::MT_ENMO] does not
exist.
But MT_ENMO does exist!
When I use S10.MT_ENMO instead of S10::MT_ENMO,
I get this error in the Hadoop Application Manager:
xecException: ERROR 0: Scalar has more than one row in the output. 1st
: (001,1708104234,01,,,,,,,,,,,,,,,,,,,,,,,,,,0.0), 2nd
:(001,1715803812,01,,,,,,,,,,,,,,,,,,,,,,,,,,0.0) (common cause:
"JOIN" then "FOREACH ... GENERATE foo.bar" should be "foo::bar" ) at
org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:122) at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:326)
Edit: This is the output of S26
001,DQ0017751107,29,0.0,246327.35,0.0,,162234.16,0.0,0.0,0.0,,ECRFI,0.0,246327.35,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
001,DQ0067947801,29,0.0,25217.33,0.0,,20433.19,0.0,0.0,0.0,,ECRFI,0.0,25217.33,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
001,DQ0067947802,29,0.0,16666.67,0.0,,13496.64,0.0,0.0,0.0,,ECRFI,0.0,16666.67,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
001,DQ0067947803,29,0.0,-16666.67,0.0,,-13496.64,0.0,0.0,0.0,,ECRFI,0.0,-16666.67,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
001,DQ0067947804,29,0.0,25217.33,0.0,,21156.29,0.0,0.0,0.0,,ECRFI,0.0,25217.33,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
001,DQ0067947805,29,0.0,16666.67,0.0,,13638.92,0.0,0.0,0.0,,ECRFI,0.0,16666.67,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
001,DQ0067947806,29,0.0,-16666.67,0.0,,-13638.92,0.0,0.0,0.0,,ECRFI,0.0,-16666.67,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
001,DQ0067947901,29,0.0,961900.0,0.0,,667228.77,0.0,0.0,0.0,,ECRFI,0.0,961900.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
001,DQ0067948001,29,0.0,6250000.0,0.0,,4669082.64,0.0,0.0,0.0,,ECRFI,0.0,6250000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
001,DQ0067948101,29,0.0,1730000.0,0.0,,1314314.02,0.0,0.0,0.0,,ECRFI,0.0,1730000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,
Note
The full Pig script is here:
How can I resolve this please ?

The issue is that MT_ENMO is not projected in any relation other than S10 and your final relation S27.
S26 is built from three relations: S23, S22, and S20. None of them contains MT_ENMO.
S20 is built from six relations, including S16, and S16 doesn't contain MT_ENMO either.
S16 projects data from S10, so add MT_ENMO there first. Then, in S27, you should be able to get the field by projecting S16::MT_ENMO.
Example:
S16 = FOREACH S16_JOIN_LEFT GENERATE
S10::CD_SI AS CD_SI,
S10::IDT_ETT_CTR AS IDT_ETT_CTR,
S10::CD_EFS AS CD_EFS,
S10::MT_DSP AS MT_DSP,
S10::MT_NAL AS MT_NAL,
S10::MT_ENC_MOY AS MT_ENC_MOY,
S10::MT_UTI AS MT_UTI,
S10::MT_CAP_RST_DU AS MT_CAP_RST_DU,
S10::MT_ITT_CRU AS MT_ITT_CRU,
S10::MT_CAP_ECN_IMP AS MT_CAP_ECN_IMP,
S10::MT_ITT_IMP AS MT_ITT_IMP,
S10::MT_DNR_ECN AS MT_DNR_ECN,
S10::CD_ETT_ORI AS CD_ETT_ORI,
S10::MT_DSP_CVE AS MT_DSP_CVE,
S10::MT_NAL_CVE AS MT_NAL_CVE,
S10::MT_ENC_MOY_CVE AS MT_ENC_MOY_CVE,
S10::MT_CAP_IMP_CVE AS MT_CAP_IMP_CVE,
S10::MT_ITT_IMP_CVE AS MT_ITT_IMP_CVE,
S10::MT_GLB_IMP AS MT_GLB_IMP,
S10::MT_GLB_IMP_CVE AS MT_GLB_IMP_CVE,
S10::MT_BN_INST AS MT_BN_INST,
S10::MT_BN_INST_CVE AS MT_BN_INST_CVE,
S10::MT_BN_NV AS MT_BN_NV,
S10::MT_BN_NV_CVE AS MT_BN_NV_CVE,
S10::MT_IMP AS MT_IMP,
S10::MT_PROR AS MT_PROR,
S10::MT_DEM AS MT_DEM,
S10::MT_ITT_M AS MT_ITT_M,
S10::MT_ENMO AS MT_ENMO;
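To see why the field disappears, it can help to model each FOREACH ... GENERATE as a projection that keeps only the listed fields. The sketch below is a Python analogy, not Pig: the `project` helper and the sample row are made up for illustration, and only the relation and field names come from the question.

```python
def project(rows, fields):
    """Keep only the named fields of each row, like FOREACH ... GENERATE."""
    return [{name: row[name] for name in fields} for row in rows]

# Hypothetical single-row stand-in for S10.
s10 = [{"CD_SI": "001", "MT_ENC_MOY": None, "MT_ENMO": 12.5}]

# An intermediate relation that does not list MT_ENMO loses it for good.
s16_without = project(s10, ["CD_SI", "MT_ENC_MOY"])
# Listing MT_ENMO in the intermediate projection keeps it available downstream.
s16_with = project(s10, ["CD_SI", "MT_ENC_MOY", "MT_ENMO"])

print("MT_ENMO" in s16_without[0])  # False
print("MT_ENMO" in s16_with[0])     # True
```

Every relation between S10 and S27 has to carry the field in the same way, otherwise the last projection has nothing to refer to.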


Something like OR operator in CLIPS (CLIPS Rule Based Programming Language)

I need help with this program. The user enters yes/no for each characteristic, and I have defined each type of animal by its characteristics. For a mammal, for example, milk must be yes and legs (number of legs) can be 2 or 4, while the other characteristics can be either yes or no:
mammal - milk yes, legs 2 or 4, but backbone can be yes or no, predator yes or no, and so on. I don't know how to express this (an OR condition or something like it). The user enters the values, and the program should find the type of animal defined in the deffacts. Thanks for your help :)
;*********** DEFTEMPLATE ***********;
(deftemplate animal_type
(slot type (type SYMBOL) (allowed-symbols mammal bird fish))
(slot milk (type SYMBOL) (allowed-symbols yes no))
(slot feathers (type SYMBOL) (allowed-symbols yes no))
(slot fins (type SYMBOL) (allowed-symbols yes no))
(slot backbone (type SYMBOL) (allowed-symbols yes no))
(slot fly (type SYMBOL) (allowed-symbols yes no))
(slot predator (type SYMBOL) (allowed-symbols yes no))
(multislot legs (type INTEGER) (allowed-integers 0 2 4 6 8))
)
(deftemplate finding_type
(slot milk (type SYMBOL) (allowed-symbols yes no))
(slot feathers (type SYMBOL) (allowed-symbols yes no))
(slot fins (type SYMBOL) (allowed-symbols yes no))
(slot backbone (type SYMBOL) (allowed-symbols yes no))
(slot fly (type SYMBOL) (allowed-symbols yes no))
(slot predator (type SYMBOL) (allowed-symbols yes no))
(multislot legs (type INTEGER) (allowed-integers 0 2 4 6 8))
)
;*********** DEFFACTS ***********;
(deffacts characters_type
(animal_type (type mammal) (milk yes) (legs 2 4))
(animal_type (type bird) (milk no) (feathers yes) (fly yes))
(animal_type (type fish) (milk no) (feathers no) (fins yes) (legs 0))
)
;something like this (animal_type (type mammal) (milk yes) (backbone or(yes no)) ... (legs 2 4))
(deffacts temp_fact
(next_search)
)
(defrule input_characters
?gone<-(next_search)
=> (retract ?gone)
(printout t " " crlf)
(printout t "Enter - yes/no, legs - 0/2/4/6/8" crlf)
(printout t "==================================================================" crlf)
(printout t "Milk:")
(bind ?o1 (read))
(printout t "Feathers:")
(bind ?o2 (read))
(printout t "Fins:")
(bind ?o3 (read))
(printout t "Backbone:")
(bind ?o4 (read))
(printout t "Fly:")
(bind ?o5 (read))
(printout t "Predator:")
(bind ?o6 (read))
(printout t "Legs:")
(bind ?o7 (read))
(assert (finding_type (milk ?o1) (feathers ?o2) (fins ?o3) (backbone ?o4) (fly ?o5) (predator ?o6) (legs ?o7)) )
)
(defrule find_out_type
(finding_type (milk ?o1) (feathers ?o2) (fins ?o3) (backbone ?o4) (fly ?o5) (predator ?o6) (legs ?o7))
(animal_type (type ?type) (milk ?o1) (feathers ?o2) (fins ?o3) (backbone ?o4) (fly ?o5) (predator ?o6) (legs ?o7))
=>
(printout t " " crlf)
(printout t "Type of animal is: " ?type crlf)
)
(defrule not_found
(finding_type (milk ?o1) (feathers ?o2) (fins ?o3) (backbone ?o4) (fly ?o5) (predator ?o6) (legs ?o7))
(not (animal_type (type ?type) (milk ?o1) (feathers ?o2) (fins ?o3) (backbone ?o4) (fly ?o5) (predator ?o6) (legs ?o7)) )
=>
(printout t " " crlf)
(printout t "Nothing found!" crlf)
)
(defrule cancel (declare (salience -10))
?gone<-(finding_type (milk ?o1) (feathers ?o2) (fins ?o3) (backbone ?o4) (fly ?o5) (predator ?o6) (legs ?o7))
=>
(retract ?gone)
)
Modify the animal_type pattern in your rules to use multifield wildcards to match extraneous values to the left and right of the specified number of legs. As your rules are currently written, the animal_type patterns will only be matched by facts containing exactly one value in the legs slot.
(animal_type (type ?type)
(milk ?o1)
(feathers ?o2)
(fins ?o3)
(backbone ?o4)
(fly ?o5)
(predator ?o6)
(legs $? ?o7 $?))
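As a rough illustration of what `$? ?o7 $?` does, the Python sketch below treats the multislot as a list and checks whether the entered value appears anywhere in it. This is an analogy only: `legs_match` is a hypothetical helper, not part of CLIPS.

```python
def legs_match(entered_legs, allowed_legs):
    """Mimic (legs $? ?o7 $?): true if the entered value occurs anywhere in the multislot."""
    return entered_legs in allowed_legs

mammal_legs = [2, 4]  # (legs 2 4) from the mammal fact
fish_legs = [0]       # (legs 0) from the fish fact

print(legs_match(4, mammal_legs))  # True
print(legs_match(4, fish_legs))    # False
```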

velocity template loop through array to create string

I'm trying to use the Velocity templating language in an AWS AppSync resolver to create a string by looping through an array of characters.
Given the array listOfWords = ["好", "克力"], how would I achieve the string output queryString = "+\"好\" +\"克力\""?
So far I have managed something like this:
24: #set($listOfWords = ["好", "克力"])
25: #set($queryString = "")
26: #foreach($word in $listOfWords)
27: #if( $velocityCount == 1 )
28: #set($queryString = "+\"$word\"")
29: #else
30: #set($queryString = $queryString +"+\"$word\"")
31: #end
32: #end
This returns the error:
Encountered \"$word\" at velocity[line 28, column 37]\nWas expecting one of:\n <RPAREN> ...\n <WHITESPACE> ...\n \"-\" ...\n \"+\" ...\n \"*\" ...\n \"/\" ...\n \"%\" ...\n <LOGICAL_AND> ...\n <LOGICAL_OR> ...\n <LOGICAL_LT> ...\n <LOGICAL_LE> ...\n <LOGICAL_GT> ...\n <LOGICAL_GE> ...\n <LOGICAL_EQUALS> ...\n <LOGICAL_NOT_EQUALS> ...\n
I have also tried
#foreach( $word in $listOfWords )
#if( $velocityCount == 1 )
#set($queryString = "+" + "\\" + "\"" + $word + "\\" + "\"") line 27
#else
#set($queryString = $queryString + "+" + "\\" + "\"" + $word + "\\" + "\"")
#end
#end
)
But this seems to cause a lexical error:
"Lexical error, Encountered: \"\\\"\" (34), after : \"\\\\\\\\\" at *unset*[line 27, column 64]"
You can do it like this:
#set($listOfWords = ["好", "克力"])
#set($q = '"')
#set($queryString = "")
#foreach($word in $listOfWords)
#if( $velocityCount == 1 )
#set($queryString = "+$q$word$q")
#else
#set($queryString = "$queryString +$q$word$q")
#end
#end
Rather than building up a VTL variable, you could just build up the string as the output directly, much like this example from the VTL docs:
<ul>
#foreach( $product in $allProducts )
<li>$product</li>
#end
</ul>
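For comparison, the target string from the question can be sketched outside VTL. The Python below is only a reference for the expected result, assuming each word is wrapped in double quotes, prefixed with +, and the pieces are separated by single spaces; `build_query_string` is a hypothetical name.

```python
def build_query_string(words):
    """Prefix each word with + and wrap it in double quotes, joined by spaces."""
    return " ".join('+"{}"'.format(word) for word in words)

query_string = build_query_string(["好", "克力"])
print(query_string)  # +"好" +"克力"
```

Checking a template's output against a reference like this makes it easier to spot a missing + or separator.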

Apply a calculation on columns (awk)

I am trying to apply this calculation to every column from column 41 to the end of the line:
awk '{ { split($10,a,":") } { split( a[4], b ,",") } {print b[1]+b[2]}}' filename
I know how to do this for a single column, but when I try to loop over the columns it fails:
awk '{for (i=10;i<=NF;i++) {split($i,a,":")} {split(a[4],b,",")} {print ( b[1]+b[2])}}' filename
The aim is to split each column and sum those numbers:
./.:0:.,.,.:0,0:0,0
Here is what my file looks like :
Contig POS ID REF ALT QUAL FILTER INFO FORMAT S155 S158 S168 S173 S175 S178 S180 S188 S189 S191 S193 S194 S196 S201 S205 S206 S208 S209 S210
NODE_14985_length_2800_cov_1.38384 67 999978 A C . PASS Ty=SNP;Rk=1;UL=19;UR=31;CL=.;CR=.;Genome=A;Sd=1 GT:DP:PL:AD:HQ ./.:8:.,.,.:8,0:71,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0 ./.:0:.,.,.:0,0:0,0
Here is my actual output:
awk '{for (i=10;i<=NF;i++) {split($i,a,":")} {split(a[4],b,",")} {print b[1]+b[2]}}' file.vcf | head
0
0
0
0
0
I want a matrix of the calculated sums, one value per column:
0 0 0 0
1 2 0 6
2 0 0 8
...
Thank you in advance for your help
I changed the print to a printf and added a print "" at the end of each line (the printf needs at least a space to separate the results on a line).
Based on your sample, change the 41 to a number no greater than 28 (awk sees only 28 fields in this dataset, and the sample columns start at field 10).
Your split calls were AFTER the loop body; they must be INSIDE the loop scope (check where the braces are).
Modified code:
awk 'NR > 1 {
for( i=41; i<=NF; i++) {
split( $i, a, ":" )
#print NF ":" i "[" $i "] a[4]:" a[4]
split( a[4], b, ",")
#print i ": " b[1] " + " b[2] " : " b[1] + b[2]
printf( "%d ", b[1] + b[2])
}
print ""
}' YourFile
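The same per-sample calculation can be sketched in Python, which may be easier to debug on a single line. This is a minimal sketch under assumptions: fields are whitespace-separated, sample columns start at field 10 (as in the sample shown), and the fourth colon-separated subfield holds two comma-separated integers. The function names and the shortened input line are made up for illustration.

```python
def sum_fourth_subfield(sample):
    """Split a sample such as './.:8:.,.,.:8,0:71,0' and sum the two values of its 4th subfield."""
    fourth = sample.split(":")[3]          # e.g. '8,0'
    first, second = fourth.split(",")[:2]
    return int(first) + int(second)

def row_sums(line, first_sample_field=10):
    """Return one sum per sample column of a whitespace-separated line."""
    fields = line.split()
    return [sum_fourth_subfield(f) for f in fields[first_sample_field - 1:]]

# Shortened, hypothetical line with nine leading fields and two sample columns.
line = "NODE_1 67 999978 A C . PASS Ty=SNP GT:DP:PL:AD:HQ ./.:8:.,.,.:8,0:71,0 ./.:0:.,.,.:0,0:0,0"
print(" ".join(str(s) for s in row_sums(line)))  # 8 0
```

As in the awk version, the split has to happen once per column inside the loop, here inside the list comprehension.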

Optimize Pig request

I want to execute a Pig script from an embedded Java program. For the moment, I am trying Pig in local mode. My data file is around 15 MB, but execution takes a very long time, so I think my script needs optimizing...
My script :
A = LOAD 'data' USING PigPrismeLoader('data.xml');
filter_response_time_less_than_1_s = FILTER A BY (response_time < 1000.0);
filter_response_time_between_1_s_and_2_s = FILTER A BY (response_time >= 1000.0 AND response_time < 2000.0);
filter_response_time_between_greater_than_2_s = FILTER A BY (response_time >= 2000.0);
star__zne_asfo_access_log = FOREACH ( COGROUP A BY (date_day,url,date_minute,ret_code,serveur), filter_response_time_between_greater_than_2_s BY (date_day,url,date_minute,ret_code,serveur), filter_response_time_less_than_1_s BY (date_day,url,date_minute,ret_code,serveur), filter_response_time_between_1_s_and_2_s BY (date_day,url,date_minute,ret_code,serveur) )
{
GENERATE
FLATTEN(group) AS (date_day,zne_asfo_url,date_minute,zne_http_code,zne_asfo_server),
(long)SUM((bag{tuple(long)})A.response_time) AS response_time,
COUNT(filter_response_time_less_than_1_s) AS response_time_less_than_1_s,
COUNT(filter_response_time_between_1_s_and_2_s) AS response_time_between_1_s_and_2_s,
COUNT(filter_response_time_between_greater_than_2_s) AS response_time_between_greater_than_2_s,
COUNT(A) AS nb_hit;
};
agg__zne_asfo_access_log_ymd = FOREACH ( COGROUP A BY (date_day,date_year,date_month), filter_response_time_between_greater_than_2_s BY (date_day,date_year,date_month), filter_response_time_less_than_1_s BY (date_day,date_year,date_month), filter_response_time_between_1_s_and_2_s BY (date_day,date_year,date_month) )
{
GENERATE
FLATTEN(group) AS (date_day,date_year,date_month),
(long)SUM((bag{tuple(long)})A.response_time) AS response_time,
COUNT(filter_response_time_less_than_1_s) AS response_time_less_than_1_s,
COUNT(filter_response_time_between_1_s_and_2_s) AS response_time_between_1_s_and_2_s,
COUNT(filter_response_time_between_greater_than_2_s) AS response_time_between_greater_than_2_s,
COUNT(A) AS nb_hit;
};
agg__zne_asfo_access_log_ymd_ret_url = FOREACH ( COGROUP A BY (date_day,url,date_year,date_month), filter_response_time_between_greater_than_2_s BY (date_day,url,date_year,date_month), filter_response_time_less_than_1_s BY (date_day,url,date_year,date_month), filter_response_time_between_1_s_and_2_s BY (date_day,url,date_year,date_month) )
{
GENERATE
FLATTEN(group) AS (date_day,zne_asfo_url,date_year,date_month),
(long)SUM((bag{tuple(long)})A.response_time) AS response_time,
COUNT(filter_response_time_less_than_1_s) AS response_time_less_than_1_s,
COUNT(filter_response_time_between_1_s_and_2_s) AS response_time_between_1_s_and_2_s,
COUNT(filter_response_time_between_greater_than_2_s) AS response_time_between_greater_than_2_s,
COUNT(A) AS nb_hit;
};
agg__zne_asfo_access_log_ymd_ret_code = FOREACH ( COGROUP A BY (date_day,ret_code,date_year,date_month), filter_response_time_between_greater_than_2_s BY (date_day,ret_code,date_year,date_month), filter_response_time_less_than_1_s BY (date_day,ret_code,date_year,date_month), filter_response_time_between_1_s_and_2_s BY (date_day,ret_code,date_year,date_month) )
{
GENERATE
FLATTEN(group) AS (date_day,zne_http_code,date_year,date_month),
(long)SUM((bag{tuple(long)})A.response_time) AS response_time,
COUNT(filter_response_time_less_than_1_s) AS response_time_less_than_1_s,
COUNT(filter_response_time_between_1_s_and_2_s) AS response_time_between_1_s_and_2_s,
COUNT(filter_response_time_between_greater_than_2_s) AS response_time_between_greater_than_2_s,
COUNT(A) AS nb_hit;
};
agg__zne_asfo_access_log_ymd_ret_url_server = FOREACH ( COGROUP A BY (date_day,url,date_year,date_month,serveur), filter_response_time_between_greater_than_2_s BY (date_day,url,date_year,date_month,serveur), filter_response_time_less_than_1_s BY (date_day,url,date_year,date_month,serveur), filter_response_time_between_1_s_and_2_s BY (date_day,url,date_year,date_month,serveur) )
{
GENERATE
FLATTEN(group) AS (date_day,zne_asfo_url,date_year,date_month,zne_asfo_server),
(long)SUM((bag{tuple(long)})A.response_time) AS response_time,
COUNT(filter_response_time_less_than_1_s) AS response_time_less_than_1_s,
COUNT(filter_response_time_between_1_s_and_2_s) AS response_time_between_1_s_and_2_s,
COUNT(filter_response_time_between_greater_than_2_s) AS response_time_between_greater_than_2_s,
COUNT(A) AS nb_hit;
};
agg__zne_asfo_access_log_ymd_ret_code_server = FOREACH ( COGROUP A BY (date_day,ret_code,date_year,date_month,serveur), filter_response_time_between_greater_than_2_s BY (date_day,ret_code,date_year,date_month,serveur), filter_response_time_less_than_1_s BY (date_day,ret_code,date_year,date_month,serveur), filter_response_time_between_1_s_and_2_s BY (date_day,ret_code,date_year,date_month,serveur) )
{
GENERATE
FLATTEN(group) AS (date_day,zne_http_code,date_year,date_month,zne_asfo_server),
(long)SUM((bag{tuple(long)})A.response_time) AS response_time,
COUNT(filter_response_time_less_than_1_s) AS response_time_less_than_1_s,
COUNT(filter_response_time_between_1_s_and_2_s) AS response_time_between_1_s_and_2_s,
COUNT(filter_response_time_between_greater_than_2_s) AS response_time_between_greater_than_2_s,
COUNT(A) AS nb_hit;
};
agg__zne_asfo_access_log_ymdi_server = FOREACH ( COGROUP A BY (date_day,date_minute,date_year,date_month,serveur), filter_response_time_between_greater_than_2_s BY (date_day,date_minute,date_year,date_month,serveur), filter_response_time_less_than_1_s BY (date_day,date_minute,date_year,date_month,serveur), filter_response_time_between_1_s_and_2_s BY (date_day,date_minute,date_year,date_month,serveur) )
{
GENERATE
FLATTEN(group) AS (date_day,date_minute,date_year,date_month,zne_asfo_server),
(long)SUM((bag{tuple(long)})A.response_time) AS response_time,
COUNT(filter_response_time_less_than_1_s) AS response_time_less_than_1_s,
COUNT(filter_response_time_between_1_s_and_2_s) AS response_time_between_1_s_and_2_s,
COUNT(filter_response_time_between_greater_than_2_s) AS response_time_between_greater_than_2_s,
COUNT(A) AS nb_hit;
};
agg__zne_asfo_access_log_ymdhi_url = FOREACH ( COGROUP A BY (date_day,url,date_minute,date_year,date_month), filter_response_time_between_greater_than_2_s BY (date_day,url,date_minute,date_year,date_month), filter_response_time_less_than_1_s BY (date_day,url,date_minute,date_year,date_month), filter_response_time_between_1_s_and_2_s BY (date_day,url,date_minute,date_year,date_month) )
{
GENERATE
FLATTEN(group) AS (date_day,zne_asfo_url,date_minute,date_year,date_month),
(long)SUM((bag{tuple(long)})A.response_time) AS response_time,
COUNT(filter_response_time_less_than_1_s) AS response_time_less_than_1_s,
COUNT(filter_response_time_between_1_s_and_2_s) AS response_time_between_1_s_and_2_s,
COUNT(filter_response_time_between_greater_than_2_s) AS response_time_between_greater_than_2_s,
COUNT(A) AS nb_hit;
};
agg__zne_asfo_access_log_ymdhi = FOREACH ( COGROUP A BY (date_day,date_minute,date_year,date_month), filter_response_time_between_greater_than_2_s BY (date_day,date_minute,date_year,date_month), filter_response_time_less_than_1_s BY (date_day,date_minute,date_year,date_month), filter_response_time_between_1_s_and_2_s BY (date_day,date_minute,date_year,date_month) )
{
GENERATE
FLATTEN(group) AS (date_day,date_minute,date_year,date_month),
(long)SUM((bag{tuple(long)})A.response_time) AS response_time,
COUNT(filter_response_time_less_than_1_s) AS response_time_less_than_1_s,
COUNT(filter_response_time_between_1_s_and_2_s) AS response_time_between_1_s_and_2_s,
COUNT(filter_response_time_between_greater_than_2_s) AS response_time_between_greater_than_2_s,
COUNT(A) AS nb_hit;
};
STORE star__zne_asfo_access_log INTO 'star__zne_asfo_access_log' USING PigStorage('\t', '-schema');
STORE agg__zne_asfo_access_log_ymd INTO 'agg__zne_asfo_access_log_ymd' USING PigStorage('\t', '-schema');
STORE agg__zne_asfo_access_log_ymd_ret_url INTO 'agg__zne_asfo_access_log_ymd_ret_url' USING PigStorage('\t', '-schema');
STORE agg__zne_asfo_access_log_ymd_ret_code INTO 'agg__zne_asfo_access_log_ymd_ret_code' USING PigStorage('\t', '-schema');
STORE agg__zne_asfo_access_log_ymd_ret_url_server INTO 'agg__zne_asfo_access_log_ymd_ret_url_server' USING PigStorage('\t', '-schema');
STORE agg__zne_asfo_access_log_ymd_ret_code_server INTO 'agg__zne_asfo_access_log_ymd_ret_code_server' USING PigStorage('\t', '-schema');
STORE agg__zne_asfo_access_log_ymdi_server INTO 'agg__zne_asfo_access_log_ymdi_server' USING PigStorage('\t', '-schema');
STORE agg__zne_asfo_access_log_ymdhi_url INTO 'agg__zne_asfo_access_log_ymdhi_url' USING PigStorage('\t', '-schema');
STORE agg__zne_asfo_access_log_ymdhi INTO 'agg__zne_asfo_access_log_ymdhi' USING PigStorage('\t', '-schema');
Any ideas ?
Your script might need optimization but, as noted in the comments, this is a tiny speck of data for Hadoop.
Hadoop does not perform well on such small datasets (even up to gigabytes).
This is because Hadoop, which is designed to process massive amounts of data, involves a complex processing framework that takes time to set up. For a large dataset this setup time is negligible, but if you're working with 15 MB of data, setting up the framework can take much longer than actually processing the data.

Cannot get scroll bar to work PyGTK treeview

First, let me say that I read on here a lot and have googled around for an answer, but I have not been able to turn one up.
Basically, I am trying to add a scroll bar to my vbox window. I know it's simply something I am not understanding. Here is the code (please ignore the MySQL statements; I haven't gotten around to cleaning them up yet):
#!/usr/bin/python
import pygtk
pygtk.require('2.0')
import gtk
import os
import sys
import MySQLdb
from Tkinter import *
database_connection = MySQLdb.connect('localhost', 'root', '', 'nmap');
cursor = database_connection.cursor()
class Table_GUI:
    cells = {}
    columns = {}
    sort_order = gtk.SORT_ASCENDING
    def delete_event(self, widget, event, data=None):
        return False
    def destroy(self, widget, data=None):
        gtk.main_quit()
    def __init__(self):
        # create a new window
        self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)
        self.window.set_geometry_hints(min_width=400, min_height=200)
        self.window.connect("delete_event", self.delete_event)
        self.window.connect("destroy", self.destroy)
        self.vbox = gtk.VBox(False, 0)
        self.window.add(self.vbox)
        self.vbox.show()
        self.vbox.pack_start(self.scrolledwindow)
        self.frm_table = gtk.Frame()
        self.frm_table.set_shadow_type(gtk.SHADOW_NONE)
        self.frm_table.modify_bg(gtk.STATE_NORMAL, gtk.gdk.color_parse('#fff'))
        self.show_Table(oper_sys, ip_get)
        self.frm_table.show()
        self.vbox.pack_start(self.frm_table, True, True, 0)
        self.window.show()
    def show_Table(self, search_os, search_ip):
        ### Create the table
        # List of items to display which represent IP, OS, DNS, Port number and Port description
        self.liststore = gtk.ListStore(str, str, str, str, str)
        if search_os != "" and search_ip !="":
            #Set up the queries. If the user has activted the checkbox, we need to include the ports in the query
            if ports_check == 1:
                pass
            #Otherwise just return the relevent data
            else:
                pass
        elif search_os != "" and search_ip == "":
            if ports_check == 1:
                pass
            else:
                pass
        elif search_os =="" and search_ip != "":
            if ports_check == 1:
                pass
            else:
                pass
        #get the results and prepare to put them inside of lists
        fetch_results = cursor.fetchall()
        host_name_list = []
        operating_list = []
        ip_list = []
        ports = []
        #The element chosen to append to each list based on the order of retrieval in the mysql query
        for individual_result in fetch_results:
            ip_list.append(individual_result[0])
            operating_list.append(individual_result[1])
            host_name_list.append(individual_result[2])
            if ports_check == 1:
                ports.append(individual_result[3])
        #we are going to add blanks to the files in order to help readability
        #when putting this into the chart
        cleaned_host =[]
        cleaned_ip = []
        cleaned_os_list = []
        index_counter = 0
        #this loop will check to see if the entry already exists in the cleaned variables. If it does, it 'omitts' them by inserting a blank line
        while index_counter < len(host_name_list):
            if host_name_list[index_counter] in cleaned_host:
                #print "found a duplicate in HOST....OMITTING"
                cleaned_host.append("")
            else:
                #print "adding ", host_name_list[index_counter]
                cleaned_host.append(host_name_list[index_counter])
            if operating_list[index_counter] in cleaned_os_list and ip_list[index_counter] in cleaned_ip:
                #print "found a duplicate in OPERATING....OMITTING"
                cleaned_os_list.append("")
            else:
                #print "adding ", operating_list[index_counter]
                cleaned_os_list.append(operating_list[index_counter])
            if ip_list[index_counter] in cleaned_ip:
                #print "Found a duplicate in IP.... OMITTING "
                cleaned_ip.append("")
            else:
                #print "adding ", ip_list[index_counter]
                cleaned_ip.append(ip_list[index_counter])
            index_counter +=1
        #this section appends to the list store depending on whether the user wants to see the ports or not
        counter = 0
        for single_result in fetch_results:
            if ports_check == 1:
                self.liststore.append(
                    [ cleaned_ip[counter], cleaned_os_list[counter], cleaned_host[counter], single_result[4], single_result[3] ]
                )
            else:
                self.liststore.append(
                    [ single_result[0], single_result[1], single_result[2], "" , "" ]
                )
            counter +=1
        # Treeview
        self.treeview = gtk.TreeView()
        self.treeview.set_property("fixed-height-mode", False)
        #For some reason I can't get the scrolled window to work...
        self.scrolledwindow = gtk.ScrolledWindow()
        self.scrolledwindow.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC)
        # Columns
        self.newColumn("IP Address", 0)
        self.newColumn("Operating System", 1)
        self.newColumn("Hostname",2)
        if ports_check == 1:
            self.newColumn("Ports", 3)
            self.newColumn("Protocol name", 4)
        self.treeview.set_model(self.liststore)
        self.treeview.set_headers_clickable(True)
        self.frm_table.add(self.treeview)
        self.treeview.show()
    #this function allows for the sorting of the columns
    #Given the way this works with the ports_check == 1, this will confuse the output
    def on_column_clicked(self, tc, user_data):
        self.liststore.set_sort_column_id(user_data, self.sort_order)
        if self.sort_order == gtk.SORT_ASCENDING:
            self.sort_order = gtk.SORT_DESCENDING
        else:
            self.sort_order = gtk.SORT_ASCENDING
        tc.set_sort_order(self.sort_order)
    def newColumn(self, title, index):
        self.cells[index] = gtk.CellRendererText()
        self.cells[index].set_property('cell-background-gdk', gtk.gdk.color_parse("#FFF"))
        self.columns[index] = gtk.TreeViewColumn(title, self.cells[index], text=index)
        self.columns[index].set_resizable(True)
        self.columns[index].set_reorderable(True)
        self.columns[index].set_sort_indicator(True)
        if(index == 0) :
            self.columns[index].set_min_width(130)
        if (index == 1) :
            self.columns[index].set_min_width(300)
        if (index == 2) :
            self.columns[index].set_min_width(200)
        self.columns[index].connect("clicked", self.on_column_clicked, index)
        self.treeview.insert_column(self.columns[index], -1)
    # The main function
    def main(self):
        gtk.main()
class createUI:
    def pushButton(self, parent):
        global ports_check, oper_sys, ip_get
        ports_check = display_ports.get()
        oper_sys = OS.get()
        ip_get = IP.get()
        gui = Table_GUI()
        gui.main()
    def __init__(self, parent):
        #export variables from this class so they are available in other classes
        global OS, IP, counter, display_ports
        self.panel1 = Frame(parent.title("main window"))
        self.panel1.pack()
        self.frame1 = Frame(self.panel1)
        #in the first frame, create the directions and the input boxes
        self.OS_label = Message(self.frame1, text="Search by operating system", justify=LEFT, width=180).pack(pady=2)
        OS = Entry(self.frame1)
        OS.pack()
        OS.focus_set()
        self.IP_label = Message(self.frame1, text="Search by IP", justify=LEFT, width=180).pack(pady=3)
        IP = Entry(self.frame1)
        IP.pack(pady=14, padx=60)
        self.frame1.pack()
        self.frame5 = Frame(self.panel1)
        #set the variables used by the checkboxes to an IntVar so that they can be evaluated as off or on
        display_ports = IntVar()
        ports_checkbutton = Checkbutton(self.frame5, text='Display Ports', onvalue = 1, offvalue = 0, variable=display_ports, width=10)
        ports_checkbutton.pack(side=LEFT)
        self.frame5.pack()
        self.frame6 = Frame(self.panel1)
        #lambda was used so that the button does not execute the addToDB class before click. addToDB requires an argument and self.database_button didn't work
        self.database_button = Button(self.frame6, text="Get results!")
        self.database_button.pack()
        self.database_button.configure(command=lambda btn = self.database_button: self.pushButton(btn))
        self.quit_button = Button(self.frame6, text="Get me outta here", command=self.panel1.quit).pack()
        self.frame6.pack()
if __name__ == "__main__":
    root = Tk()
    ui = createUI(root)
    ui.panel1.mainloop()
You will see the following in there:
# For some reason I can't get the scrolled window to work...
self.scrolledwindow = gtk.ScrolledWindow()
self.scrolledwindow.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC)
When I have attempted to attach it via
self.scrolledwindow.add_with_viewport(self.treeview)
I get the message
GtkWarning: Attempting to add a widget with type GtkTreeView to a container of type GtkFrame, but the widget is already inside a container of type GtkViewport
I went to the FAQ page but I really didn't understand what it was telling me.
If I try to set it on self.window I get:
GtkWarning: Can't set a parent on a toplevel widget
self.scrolledwindow.add_with_viewport(self.window)
That one is pretty self explanatory
If I try to add it to vbox I get:
GtkWarning: IA__gtk_scrolled_window_add_with_viewport: assertion `child->parent == NULL' failed
I am not looking for a freebie but I am hoping that someone could help to point me in the right direction. (as you can see I am not coming with a "can you make me a program that does X" scenario)
--->New GTK only interface below<----
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pygtk
import gtk
import os
import sys
import MySQLdb
database_connection = MySQLdb.connect('localhost', 'root', '', 'nmap');
cursor = database_connection.cursor()
#---------------------------------------------------------------
class Application(gtk.Window):
    cells = {}
    columns = {}
    sort_order = gtk.SORT_ASCENDING
    ####################
    def __init__(self):
        gtk.Window.__init__( self )
        self.set_title("Netowrk Scanner")
        self.create_widgets()
        self.connect_signals()
        #self.window.show_all()
        self.show_all()
        gtk.main()
    #####################
    def create_widgets(self):
        #Ask the user to search by operating system
        self.vbox = gtk.VBox(spacing=10)
        self.operating_system_label_hbox_1 = gtk.HBox(spacing=10)
        self.label = gtk.Label("Search by Operating System :")
        self.operating_system_label_hbox_1.pack_start(self.label)
        #Set a check box so the user can choose to display ports
        self.ports_hbox_8 = gtk.HBox(spacing=10)
        self.ports_check = gtk.CheckButton("Display Ports")
        self.ports_hbox_8.pack_start(self.ports_check)
        self.halign_ports = gtk.Alignment(0,1,1,0)
        self.halign_ports.add(self.ports_hbox_8)
        self.os_entry_hbox_2 = gtk.HBox(spacing=10)
        self.OS = gtk.Entry()
        self.os_entry_hbox_2.pack_start(self.OS)
        self.hostname_label_hbox_3 = gtk.HBox(spacing=10)
        self.label = gtk.Label("Search by Hostname:")
        self.hostname_label_hbox_3.pack_start(self.label)
        self.hostname_entry_hbox_4 = gtk.HBox(spacing=10)
        self.HOSTNAME = gtk.Entry()
        self.hostname_entry_hbox_4.pack_start(self.HOSTNAME)
        self.ip_label_hbox_5 = gtk.HBox(spacing=10)
        self.label = gtk.Label("Search by IP:")
        self.ip_label_hbox_5.pack_start(self.label)
        self.ip_entry_hbox_6 = gtk.HBox(spacing=10)
        self.IP = gtk.Entry()
        self.ip_entry_hbox_6.pack_start(self.IP)
        self.buttons_hbox_7 = gtk.HBox(spacing=10)
        self.button_ok = gtk.Button("Get Results!")
        self.buttons_hbox_7.pack_start(self.button_ok)
        self.button_exit = gtk.Button("Get me Outta Here!")
        self.buttons_hbox_7.pack_start(self.button_exit)
        #The order in which you pack_start a widget is the order in which it is displayed on the screen
        self.vbox.pack_start(self.operating_system_label_hbox_1)
        self.vbox.pack_start(self.os_entry_hbox_2)
        self.vbox.pack_start(self.hostname_label_hbox_3)
        self.vbox.pack_start(self.hostname_entry_hbox_4)
        self.vbox.pack_start(self.ip_label_hbox_5)
        self.vbox.pack_start(self.ip_entry_hbox_6)
        self.vbox.pack_start(self.halign_ports, False, False, 3)
        self.vbox.pack_start(self.buttons_hbox_7)
        self.add(self.vbox)
    ##########################
    def connect_signals(self):
        #Have the buttons start 'listening' for user interaction
        self.button_ok.connect("clicked", self.button_click)
        self.button_exit.connect("clicked", self.exit_program)
    def button_click(self, clicked):
        #This function gets the values of the input boxes as well as the check box
        #And then passes them to the show_table function so it can get the correct results from the database
        global ports_check, os, ip, hostname
        os = self.OS.get_text()
        ip = self.IP.get_text()
        hostname = self.HOSTNAME.get_text()
        ports_check = self.ports_check.get_active()
        self.frm_table = gtk.Window()
        self.frm_table.set_title("Network scan results")
        #Change the background to white instead of grey
        self.frm_table.modify_bg(gtk.STATE_NORMAL, gtk.gdk.color_parse('#fff'))
        self.frm_table.show()
        self.show_Table(os, ip, hostname)
    ############################
    def show_Table(self, search_os, search_ip, search_hostname):
        ### Create the table
        # List of items to display which represent IP, OS, DNS, Port number and Port description
        self.liststore = gtk.ListStore(str, str, str, str, str)
        #If the user is running a search on the hostname run these queries
        if search_hostname != "":
            if ports_check == True:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name, Port_Description, Open_Port FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE DNS_Name LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address ), Open_Port" % (search_hostname))
#Otherwise just return the relevant data
else:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE DNS_Name LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address )" % (search_hostname))
#If the user has specified the IP and the OS to search, run this query
if search_os != "" and search_ip !="":
#Set up the queries. If the user has activated the checkbox, we need to include the ports in the query
if ports_check == True:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name, Port_Description, Open_Port FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE OS_Name LIKE '%%%s%%' and Computer_IP_Address LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address ), Open_Port" % (search_os, search_ip))
#Otherwise just return the relevant data
else:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE OS_Name LIKE '%%%s%%' and Computer_IP_Address LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address )" % (search_os, search_ip))
#If the user has specified an OS but not an IP run this
elif search_os != "" and search_ip == "":
if ports_check == True:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name, Port_Description, Open_Port FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE OS_Name LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address ), Open_Port" % search_os)
else:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE OS_Name LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address )" % search_os)
#If the user has specified an IP but not an OS run this
elif search_os =="" and search_ip != "":
if ports_check == True:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name, Port_Description, Open_Port FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE Computer_IP_Address LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address ), Open_Port" % search_ip)
else:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE Computer_IP_Address LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address )" % search_ip)
#get the results and prepare to put them inside of lists
fetch_results = cursor.fetchall()
host_name_list = []
operating_list = []
ip_list = []
ports = []
#The element chosen to append to each list based on the order of retrieval in the mysql query
for individual_result in fetch_results:
ip_list.append(individual_result[0])
operating_list.append(individual_result[1])
host_name_list.append(individual_result[2])
if ports_check == True:
ports.append(individual_result[3])
#we are going to add blanks to the files in order to help readability
#when putting this into the chart
cleaned_host =[]
cleaned_ip = []
cleaned_os_list = []
index_counter = 0
#this loop checks whether the entry already exists in the cleaned variables. If it does, it omits the entry by inserting a blank string
while index_counter < len(host_name_list):
if host_name_list[index_counter] in cleaned_host:
#print "found a duplicate in HOST....OMITTING"
cleaned_host.append("")
else:
#print "adding ", host_name_list[index_counter]
cleaned_host.append(host_name_list[index_counter])
if operating_list[index_counter] in cleaned_os_list and ip_list[index_counter] in cleaned_ip:
#print "found a duplicate in OPERATING....OMITTING"
cleaned_os_list.append("")
else:
#print "adding ", operating_list[index_counter]
cleaned_os_list.append(operating_list[index_counter])
if ip_list[index_counter] in cleaned_ip:
#print "Found a duplicate in IP.... OMITTING "
cleaned_ip.append("")
else:
#print "adding ", ip_list[index_counter]
cleaned_ip.append(ip_list[index_counter])
index_counter +=1
#this section appends to the list store depending on whether the user wants to see the ports or not
counter = 0
for single_result in fetch_results:
if ports_check == True:
self.liststore.append(
[ cleaned_ip[counter], cleaned_os_list[counter], cleaned_host[counter], single_result[4], single_result[3] ]
)
else:
self.liststore.append(
[ single_result[0], single_result[1], single_result[2], "" , "" ]
)
counter +=1
# Treeview
self.treeview = gtk.TreeView()
#In lieu of getting the scroll bar to work, force a max height requirement on creation
self.treeview.set_property('height-request', 600)
# Columns
self.newColumn("IP Address", 0)
self.newColumn("Operating System", 1)
self.newColumn("Hostname",2)
#I only want the ports columns to show if the user requests it because this calls different mysql queries
if ports_check == True:
self.newColumn("Ports", 3)
self.newColumn("Protocol name", 4)
#put the liststore inside of the tree view
self.treeview.set_model(self.liststore)
self.treeview.set_headers_clickable(True)
#Still can't get the scroll bar to work properly so leaving them commented out
#self.scrolledwindow = gtk.ScrolledWindow()
#self.scrolledwindow.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC)
#add the tree view to the frm_table
self.frm_table.add(self.treeview)
self.treeview.show()
##########################
def on_column_clicked(self, tc, user_data):
#This function allows the columns to be resorted upon click
self.liststore.set_sort_column_id(user_data, self.sort_order)
if self.sort_order == gtk.SORT_ASCENDING:
self.sort_order = gtk.SORT_DESCENDING
else:
self.sort_order = gtk.SORT_ASCENDING
tc.set_sort_order(self.sort_order)
###########################
def newColumn(self, title, index):
#This function cleans up code because I want these options on all of the columns
#This code block would have to be duplicated for each column otherwise...
self.cells[index] = gtk.CellRendererText()
self.cells[index].set_property('cell-background-gdk', gtk.gdk.color_parse("#FFF"))
self.columns[index] = gtk.TreeViewColumn(title, self.cells[index], text=index)
self.columns[index].set_resizable(True)
self.columns[index].set_reorderable(True)
self.columns[index].set_sort_indicator(True)
if(index == 0) :
self.columns[index].set_min_width(130)
if (index == 1) :
self.columns[index].set_min_width(300)
if (index == 2) :
self.columns[index].set_min_width(200)
self.columns[index].connect("clicked", self.on_column_clicked, index)
self.treeview.insert_column(self.columns[index], -1)
# The main function
########################
def exit_program(self, widget, callback_data=None):
gtk.main_quit()
#---------------------------------------------
if __name__ == "__main__":
app = Application()
database_connection.commit()
cursor.close()
database_connection.close()
--->end new interface<---
I am still having the same problems with the scroll bars. I suspect it is the way I am adding widgets to containers.
NOTE: I removed the MySQL statements from the first GUI to save post space.
The only difference from my working code is that I do NOT use "add_with_viewport" (a gtk.TreeView supports scrolling natively, so a plain add() is enough):
swH = gtk.ScrolledWindow()
swH.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC)
swH.add(treeviewH)
It works fine. Please try it and tell me if it's OK. If not, I will dig further.
EDIT
Try this sample, adapt it to your DB data (I cannot run your program because I do not have the DB), and enjoy!
#!/usr/bin/env python
# -*- Encoding: Latin-1 -*-
import pygtk
pygtk.require('2.0')
import gtk

def window():
    win = gtk.Window()
    win.set_default_size(300, 150)
    win.set_position(gtk.WIN_POS_CENTER)
    win.connect('delete_event', gtk.main_quit)
    cols = ['Date', 'Index', 'Program', 'Comments']
    # One string column in the store per displayed column
    sequence = [str] * len(cols)
    starStore = gtk.TreeStore(*sequence)
    starView = gtk.TreeView(starStore)
    starView.cell = [None] * len(cols)
    tvcolumn = [None] * len(cols)
    for colnum, col in enumerate(cols):
        starView.cell[colnum] = gtk.CellRendererText()
        tvcolumn[colnum] = gtk.TreeViewColumn(col, starView.cell[colnum])
        tvcolumn[colnum].add_attribute(starView.cell[colnum], 'text', colnum)
        starView.append_column(tvcolumn[colnum])
    # The tree view goes directly into the scrolled window (no viewport)
    scrollTree = gtk.ScrolledWindow()
    scrollTree.set_policy(gtk.POLICY_NEVER, gtk.POLICY_AUTOMATIC)
    scrollTree.add(starView)
    win.add(scrollTree)
    data = [['2010', '123', 'P02', 'Initial'],
            ['2008', '456', 'P08', 'not finished'],
            ['2007', '456', 'P08', 'not finished'],
            ['2006', '456', 'P08', 'not finished'],  # copy and paste a line to add, delete a line to shorten the treeview. The scrollbar appears when needed.
            ['2005', '456', 'P08', 'not finished'],
            ['2004', '456', 'P08', 'not finished'],
            ['2001', '999', 'P999', 'A space Odissey']]
    for line in data:
        mother = starStore.append(None, [line[0], line[1], line[2], line[3]])
    win.show_all()
    gtk.main()

if __name__ == '__main__':
    window()
Thank you so much for that!
After some playing around I was able to adapt my program with your guidance!
If anyone is interested, I am posting the code below for completeness:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pygtk
import gtk
import os
import sys
import MySQLdb
database_connection = MySQLdb.connect('localhost', 'root', '', 'nmap')
cursor = database_connection.cursor()
class Application(gtk.Window):
cells = {}
columns = {}
sort_order = gtk.SORT_ASCENDING
####################
def __init__(self):
gtk.Window.__init__( self )
self.set_title("Network Scanner")
self.set_position(gtk.WIN_POS_CENTER)
self.create_widgets()
self.connect_signals()
#self.window.show_all()
self.show_all()
gtk.main()
##################
def create_widgets(self):
#Ask the user to search by operating system
self.vbox = gtk.VBox(spacing=10)
self.operating_system_label_hbox_1 = gtk.HBox(spacing=10)
self.label = gtk.Label("Search by Operating System :")
self.operating_system_label_hbox_1.pack_start(self.label)
#Set a check box so the user can choose to display ports
self.ports_hbox_8 = gtk.HBox(spacing=10)
self.ports_check = gtk.CheckButton("Display Ports")
self.ports_hbox_8.pack_start(self.ports_check)
self.halign_ports = gtk.Alignment(0,1,1,0)
self.halign_ports.add(self.ports_hbox_8)
self.os_entry_hbox_2 = gtk.HBox(spacing=10)
self.OS = gtk.Entry()
self.os_entry_hbox_2.pack_start(self.OS)
self.hostname_label_hbox_3 = gtk.HBox(spacing=10)
self.label = gtk.Label("Search by Hostname:")
self.hostname_label_hbox_3.pack_start(self.label)
self.hostname_entry_hbox_4 = gtk.HBox(spacing=10)
self.HOSTNAME = gtk.Entry()
self.hostname_entry_hbox_4.pack_start(self.HOSTNAME)
self.ip_label_hbox_5 = gtk.HBox(spacing=10)
self.label = gtk.Label("Search by IP:")
self.ip_label_hbox_5.pack_start(self.label)
self.ip_entry_hbox_6 = gtk.HBox(spacing=10)
self.IP = gtk.Entry()
self.ip_entry_hbox_6.pack_start(self.IP)
self.buttons_hbox_7 = gtk.HBox(spacing=10)
self.button_ok = gtk.Button("Get Results!")
self.buttons_hbox_7.pack_start(self.button_ok)
self.button_exit = gtk.Button("Get me Outta Here!")
self.buttons_hbox_7.pack_start(self.button_exit)
#The order in which you pack_start a widget is the order in which it is displayed on the screen
self.vbox.pack_start(self.operating_system_label_hbox_1)
self.vbox.pack_start(self.os_entry_hbox_2)
self.vbox.pack_start(self.hostname_label_hbox_3)
self.vbox.pack_start(self.hostname_entry_hbox_4)
self.vbox.pack_start(self.ip_label_hbox_5)
self.vbox.pack_start(self.ip_entry_hbox_6)
self.vbox.pack_start(self.halign_ports, False, False, 3)
self.vbox.pack_start(self.buttons_hbox_7)
self.add(self.vbox)
##########################
def connect_signals(self):
#Have the buttons start 'listening' for user interaction
self.button_ok.connect("clicked", self.button_click)
self.button_exit.connect("clicked", self.exit_program)
########################
def button_click(self, clicked):
#This function gets the values of the input boxes as well as the check box
#And then passes them to the show_table function so it can get the correct results from the database
global ports_check, os, ip, hostname
os = self.OS.get_text()
ip = self.IP.get_text()
hostname = self.HOSTNAME.get_text()
ports_check = self.ports_check.get_active()
self.show_Table(os, ip, hostname)
##############
def show_Table(self, search_os, search_ip, search_hostname):
### Create the table
# List of items to display which represent IP, OS, DNS, Port number and Port description
# Columns
if ports_check == True:
cols = ['IP Address', 'Operating System', 'Hostname', 'Ports', 'Protocol Name']
else:
cols = ['IP Address', 'Operating System', 'Hostname']
"""
self.newColumn("IP Address", 0)
self.newColumn("Operating System", 1)
self.newColumn("Hostname",2)
#I only want the ports columns to show if the user requests it because this calls different mysql queries
if ports_check == True:
self.newColumn("Ports", 3)
self.newColumn("Protocol name", 4)
"""
sequence = [str] * len(cols)
self.treestore = gtk.TreeStore( * sequence)
self.treestore.connect("rows-reordered", self.on_column_clicked)
self.treeview = gtk.TreeView(self.treestore)
self.treeview.cell = [None] * len(cols)
self.treeview_column = [None] * len(cols)
for column_number, col in enumerate(cols):
self.treeview.cell[column_number] = gtk.CellRendererText()
self.treeview_column[column_number] = gtk.TreeViewColumn(col, self.treeview.cell[column_number])
self.treeview_column[column_number].add_attribute(self.treeview.cell[column_number], 'text', column_number)
self.treeview_column[column_number].set_resizable(True)
self.treeview_column[column_number].set_reorderable(True)
self.treeview_column[column_number].set_sort_indicator(True)
self.treeview_column[column_number].set_sort_column_id(column_number)
self.treeview.append_column(self.treeview_column[column_number])
self.scrollTree = gtk.ScrolledWindow()
self.scrollTree.set_policy(gtk.POLICY_NEVER, gtk.POLICY_AUTOMATIC)
self.scrollTree.add(self.treeview)
#If the user is running a search on the hostname run these queries
if search_hostname != "":
if ports_check == True:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name, Port_Description, Open_Port FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE DNS_Name LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address ), Open_Port" % (search_hostname))
#Otherwise just return the relevant data
else:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE DNS_Name LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address )" % (search_hostname))
#If the user has specified the IP and the OS to search, run this query
if search_os != "" and search_ip !="":
#Set up the queries. If the user has activated the checkbox, we need to include the ports in the query
if ports_check == True:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name, Port_Description, Open_Port FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE OS_Name LIKE '%%%s%%' and Computer_IP_Address LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address ), Open_Port" % (search_os, search_ip))
#Otherwise just return the relevant data
else:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE OS_Name LIKE '%%%s%%' and Computer_IP_Address LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address )" % (search_os, search_ip))
#If the user has specified an OS but not an IP run this
elif search_os != "" and search_ip == "":
if ports_check == True:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name, Port_Description, Open_Port FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE OS_Name LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address ), Open_Port" % search_os)
else:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE OS_Name LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address )" % search_os)
#If the user has specified an IP but not an OS run this
elif search_os =="" and search_ip != "":
if ports_check == True:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name, Port_Description, Open_Port FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE Computer_IP_Address LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address ), Open_Port" % search_ip)
else:
cursor.execute("SELECT DISTINCT Computer_IP_Address, OS_Name, DNS_Name FROM Computer_Ports AS CP \
JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) \
JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) \
JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) \
JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) \
WHERE Computer_IP_Address LIKE '%%%s%%' ORDER BY inet_aton( Computer_IP_Address )" % search_ip)
#get the results and prepare to put them inside of lists
fetch_results = cursor.fetchall()
host_name_list = []
operating_list = []
ip_list = []
ports = []
#The element chosen to append to each list based on the order of retrieval in the mysql query
for individual_result in fetch_results:
ip_list.append(individual_result[0])
operating_list.append(individual_result[1])
host_name_list.append(individual_result[2])
if ports_check == True:
ports.append(individual_result[3])
#we are going to add blanks to the files in order to help readability
#when putting this into the chart
cleaned_host =[]
cleaned_ip = []
cleaned_os_list = []
index_counter = 0
#this loop checks whether the entry already exists in the cleaned variables. If it does, it omits the entry by inserting a blank string
while index_counter < len(host_name_list):
if host_name_list[index_counter] in cleaned_host:
#print "found a duplicate in HOST....OMITTING"
cleaned_host.append("")
else:
#print "adding ", host_name_list[index_counter]
cleaned_host.append(host_name_list[index_counter])
if operating_list[index_counter] in cleaned_os_list and ip_list[index_counter] in cleaned_ip:
#print "found a duplicate in OPERATING....OMITTING"
cleaned_os_list.append("")
else:
#print "adding ", operating_list[index_counter]
cleaned_os_list.append(operating_list[index_counter])
if ip_list[index_counter] in cleaned_ip:
#print "Found a duplicate in IP.... OMITTING "
cleaned_ip.append("")
else:
#print "adding ", ip_list[index_counter]
cleaned_ip.append(ip_list[index_counter])
index_counter +=1
#this section appends to the list store depending on whether the user wants to see the ports or not
counter = 0
for single_result in fetch_results:
if ports_check == True:
self.treestore.append( None,
[ cleaned_ip[counter], cleaned_os_list[counter], cleaned_host[counter], single_result[4], single_result[3] ]
)
else:
self.treestore.append(None,
[ single_result[0], single_result[1], single_result[2] ]
)
counter +=1
self.frm_table = gtk.Window()
self.frm_table.set_default_size(600, 800)
self.frm_table.set_title("Network scan results")
#Change the background to white instead of grey
self.frm_table.modify_bg(gtk.STATE_NORMAL, gtk.gdk.color_parse('#fff'))
self.frm_table.add(self.scrollTree)
self.frm_table.show_all()
######################
def on_column_clicked(self, col1, col2, col3, col4 ):
#This function allows the columns to be resorted upon click
if self.sort_order == gtk.SORT_ASCENDING:
self.sort_order = gtk.SORT_DESCENDING
else:
self.sort_order = gtk.SORT_ASCENDING
#tc.set_sort_order(self.sort_order)
###############
def exit_program(self, widget, callback_data=None):
gtk.main_quit()
#---------------------------------------------
if __name__ == "__main__":
app = Application()
database_connection.commit()
cursor.close()
database_connection.close()
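A possible cleanup for anyone adapting the code above: the eight near-identical cursor.execute calls differ only in the selected columns, the WHERE filters and the ORDER BY tail, and they interpolate user input straight into the SQL string. The sketch below is only an illustration under some assumptions: the helper names build_query and blank_repeats are hypothetical, and combining all filled-in filters with AND is a simplification of the original if/elif chain (which lets a later branch overwrite the hostname query). It relies on MySQLdb accepting %s placeholders with a separate parameter sequence, so the wildcards travel in the parameters rather than in the query text.

```python
# Hypothetical helpers; the names and the AND-combination of filters are
# assumptions for illustration, not part of the original program.

BASE_QUERY = (
    "SELECT DISTINCT {columns} FROM Computer_Ports AS CP "
    "JOIN Computer_Info AS CI ON ( CP.Computer_ID = CI.Computer_ID ) "
    "JOIN Ports_Table AS PT ON ( CP.Port_ID = PT.Port_ID ) "
    "JOIN OS_Table AS OS ON ( CI.Computer_ID = OS.OS_ID ) "
    "JOIN Port_Description AS PS ON ( PT.Open_Port = PS.Port_Number ) "
    "WHERE {where} ORDER BY {order}"
)


def build_query(search_os, search_ip, search_hostname, with_ports):
    """Return (sql, params) suitable for cursor.execute(sql, params)."""
    columns = "Computer_IP_Address, OS_Name, DNS_Name"
    order = "inet_aton( Computer_IP_Address )"
    if with_ports:
        # Same extra columns and sort key as the "Display Ports" branches
        columns += ", Port_Description, Open_Port"
        order += ", Open_Port"

    conditions = []
    params = []
    filters = (
        ("DNS_Name", search_hostname),
        ("OS_Name", search_os),
        ("Computer_IP_Address", search_ip),
    )
    for column, value in filters:
        if value:
            # The placeholder stays in the SQL; the wildcards go in the parameter
            conditions.append(column + " LIKE %s")
            params.append("%" + value + "%")

    # With no filter filled in, match every row (an assumption; the
    # original code executes no query at all in that case)
    where = " AND ".join(conditions) if conditions else "1 = 1"
    sql = BASE_QUERY.format(columns=columns, where=where, order=order)
    return sql, params


def blank_repeats(values):
    """Replace every repeated value with an empty string, keeping the first."""
    seen = set()
    cleaned = []
    for value in values:
        if value in seen:
            cleaned.append("")
        else:
            seen.add(value)
            cleaned.append(value)
    return cleaned
```

In show_Table this would be used roughly as `sql, params = build_query(search_os, search_ip, search_hostname, ports_check)` followed by `cursor.execute(sql, params)`. blank_repeats mirrors the blank-filling loop for the hostname and IP lists; the operating system list in the original is blanked only when both the OS and the IP were already seen, so that column would still need its own check.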