I am trying to measure the length of the list under Original Query and subsequently find the mean and std dev but I cannot seem to measure the length. How do I do it?
This is what I tried:
import numpy as np
import pandas as pd

filepath = "yandex_users_paired_queries.csv"  # path to the CSV with the query dataset
queries = pd.read_csv(filepath)
totalNum = queries.groupby('Original Query').size().reset_index(name='counts')
sessions = queries.groupby(['UserID','Original Query'])
print(sessions.size())
print("----------------------------------------------------------------")
print("~~~Mean & Average~~~")
sessionsDF = sessions.size().to_frame('counts')
sessionsDFbyBool = sessionsDF.groupby(['Original Query'])
print(sessionsDFbyBool["counts"].agg([np.mean,np.std]))
And this is my output:
UserID Original Query
154 [1228124, 388107, 1244921, 3507784] 1
[1237207, 1974238, 1493311, 1222688, 733390, 868851, 428547, 110871, 868851, 235307] 1
[1237207, 1974238, 1493311, 1222688, 733390, 868851, 428547] 1
[1237207, 1974238, 1493311, 1222688, 733390] 1
[1237207] 1
..
343 [919873, 551537, 1841361, 1377305, 610887, 1196372, 3724298] 1
[919873, 551537, 1841361, 1377305, 610887, 1196372] 1
345 [3078369, 3613096, 4249887, 2383044, 2366003, 4043437] 1
[3531370, 3078369, 284354, 4300636] 1
347 [1617419] 1
Length: 612, dtype: int64
You want to apply the len function to the 'Original Query' column.
queries['oq_len'] = queries['Original Query'].apply(len)
sessionsDF = queries.groupby('UserID').oq_len.agg([np.mean,np.std])
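One caveat, since the data comes from read_csv: the 'Original Query' column then holds strings such as "[1228124, 388107]", so applying len directly would count characters, not list items. A minimal sketch that parses the strings first (the mini-dataset below is made up to stand in for the CSV):

```python
import ast
import pandas as pd

# Toy stand-in for yandex_users_paired_queries.csv: after read_csv,
# 'Original Query' is a string representation of a list, not a list.
queries = pd.DataFrame({
    'UserID': [154, 154, 343],
    'Original Query': ['[1, 2, 3]', '[4, 5]', '[6]'],
})

# Parse each string back into a list, then take its length.
queries['oq_len'] = queries['Original Query'].apply(ast.literal_eval).apply(len)

# Per-user mean and standard deviation of the query-list length.
stats = queries.groupby('UserID')['oq_len'].agg(['mean', 'std'])
print(stats)
```

If the column already contained real lists (e.g. built in memory rather than round-tripped through a CSV), the plain `.apply(len)` from the answer would be enough.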
[EDIT] Changed df size to 1k and provided piecemeal code for expected result.
Have the following df:
import random
import numpy as np
import pandas as pd

random.seed(1234)
np.random.seed(1234)
sz = 1000
typ = ['a', 'b', 'c']
sub_typ = ['s1', 's2', 's3', 's4']
ifs = ['A', 'D']
col_sort = np.random.randint(0, 10, size=sz)
col_val = np.random.randint(100, 1000, size=sz)
df = pd.DataFrame({'typ': random.choices(typ, k=sz),
                   'sub_typ': random.choices(sub_typ, k=sz),
                   'col_if': random.choices(ifs, k=sz),
                   'col_sort': col_sort,
                   'value': col_val})
Within each group of the [typ] and [sub_typ] fields, I would like to sort the [col_sort] field in ascending order if [col_if] == 'A' and in descending order if [col_if] == 'D', then pick the first 3 rows of the sorted dataframe, in one line of code.
Expected result is like df_result below:
df_A = df[df.col_if == 'A']
df_D = df[df.col_if == 'D']
df_A_sorted_3 = df_A.groupby(['typ', 'sub_typ'], as_index=False).apply(lambda x:
x.sort_values('col_sort', ascending=True)).\
groupby(['typ', 'sub_typ', 'col_sort']).head(3)
df_D_sorted_3 = df_D.groupby(['typ', 'sub_typ'], as_index=False).apply(lambda x:
x.sort_values('col_sort', ascending=False)).\
groupby(['typ', 'sub_typ', 'col_sort']).head(3)
df_result = pd.concat([df_A_sorted_3, df_D_sorted_3]).reset_index(drop=True)
Tried:
df.groupby(['typ', 'sub_typ']).apply(lambda x: x.sort_values('col_sort', ascending=True)
if x.col_if == 'A' else x.sort_values('col_sort',
ascending=False)).groupby(['typ', 'sub_typ', 'col_sort']).head(3)
...but it gives the error:
ValueError: The truth value of a Series is ambiguous.
Sorting within groups is the same as sorting by multiple columns, but if you need identical output, kind='mergesort' (a stable sort) is necessary.
So to improve performance I suggest NOT sorting per group inside groupby:
import numpy as np
import pandas as pd

np.random.seed(1234)
sz = 1000
typ = ['a', 'b', 'c']
sub_typ = ['s1', 's2', 's3', 's4']
ifs = ['A', 'D']
col_sort = np.random.randint(0, 10, size=sz)
col_val = np.random.randint(100, 1000, size=sz)
df = pd.DataFrame({'typ': np.random.choice(typ, sz),
'sub_typ': np.random.choice(sub_typ, sz),
'col_if': np.random.choice(ifs, sz),
'col_sort': col_sort,
'value': col_val})
# print (df)
df_A = df[df.col_if == 'A']
df_D = df[df.col_if == 'D']
df_A_sorted_3 = (df_A.sort_values(['typ', 'sub_typ','col_sort'])
.groupby(['typ', 'sub_typ', 'col_sort'])
.head(3))
df_D_sorted_3 = (df_D.sort_values(['typ', 'sub_typ','col_sort'], ascending=[True, True, False])
.groupby(['typ', 'sub_typ', 'col_sort'])
.head(3))
df_result = pd.concat([df_A_sorted_3, df_D_sorted_3]).reset_index(drop=True)
print (df_result)
typ sub_typ col_if col_sort value
0 a s1 A 0 709
1 a s1 A 0 710
2 a s1 A 0 801
3 a s1 A 1 542
4 a s1 A 1 557
.. .. ... ... ... ...
646 c s4 D 1 555
647 c s4 D 1 233
648 c s4 D 0 501
649 c s4 D 0 436
650 c s4 D 0 695
[651 rows x 5 columns]
Compare outputs:
df_A_sorted_3 = df_A.groupby(['typ', 'sub_typ'], as_index=False).apply(lambda x:
x.sort_values('col_sort', ascending=True, kind='mergesort')).\
groupby(['typ', 'sub_typ', 'col_sort']).head(3)
df_D_sorted_3 = df_D.groupby(['typ', 'sub_typ'], as_index=False).apply(lambda x:
x.sort_values('col_sort', ascending=False, kind='mergesort')).\
groupby(['typ', 'sub_typ', 'col_sort']).head(3)
df_result = pd.concat([df_A_sorted_3, df_D_sorted_3]).reset_index(drop=True)
print (df_result)
typ sub_typ col_if col_sort value
0 a s1 A 0 709
1 a s1 A 0 710
2 a s1 A 0 801
3 a s1 A 1 542
4 a s1 A 1 557
.. .. ... ... ... ...
646 c s4 D 1 555
647 c s4 D 1 233
648 c s4 D 0 501
649 c s4 D 0 436
650 c s4 D 0 695
[651 rows x 5 columns]
EDIT: Possible, but slow:
def f(x):
a = x[x.col_if == 'A'].sort_values('col_sort', ascending=True, kind='mergesort')
d = x[x.col_if == 'D'].sort_values('col_sort', ascending=False, kind='mergesort')
return pd.concat([a,d], sort=False)
df_result = (df.groupby(['typ', 'sub_typ','col_if'], as_index=False, group_keys=False)
.apply(f)
.groupby(['typ', 'sub_typ', 'col_sort', 'col_if'])
.head(3))
print (df_result)
typ sub_typ col_if col_sort value
242 a s1 A 0 709
535 a s1 A 0 710
589 a s1 A 0 801
111 a s1 A 1 542
209 a s1 A 1 557
.. .. ... ... ... ...
39 c s4 D 1 555
211 c s4 D 1 233
13 c s4 D 0 501
614 c s4 D 0 436
658 c s4 D 0 695
[651 rows x 5 columns]
You wrote that col_if should act as a "switch" for the sort order.
But note that each group (at least for your random seeding) contains
both A and D in the col_if column, so your requirement is ambiguous.
One possible solution is to perform a "majority vote" in each group,
i.e. the sort order in a particular group is ascending if there are
at least as many A values as D values. Note that I arbitrarily chose
ascending order in the "equal" case; maybe you should take the other option.
A doubtful point in your requirements (and hence the code) is that you
put .head(3) after the group processing.
This way you get the first 3 rows from the first group only.
Maybe you want the 3 initial rows from each group?
In that case head(3) should be inside the lambda function (as written
below).
So change your code to:
df.groupby(['typ', 'sub_typ']).apply(lambda x: x.sort_values('col_sort',
ascending=(x.col_if.eq('A').sum() >= x.col_if.eq('D').sum())).head(3))
As you can see, the sort order can be expressed as a boolean expression for
ascending, instead of two similar expressions differing only in the ascending
parameter.
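To see that ascending really does accept a single boolean computed from the group, here is a small self-contained sketch on toy data (column names follow the question; the values are made up, with one group majority-'A' and one majority-'D'):

```python
import pandas as pd

# One group is majority 'A' (ascending), the other majority 'D' (descending).
df = pd.DataFrame({
    'typ': ['a'] * 4 + ['b'] * 4,
    'sub_typ': ['s1'] * 8,
    'col_if': ['A', 'A', 'A', 'D', 'D', 'D', 'D', 'A'],
    'col_sort': [3, 1, 2, 0, 3, 1, 2, 0],
})

# Majority vote per group decides the sort direction; head(3) is inside
# the lambda so each group contributes its first 3 sorted rows.
out = (df.groupby(['typ', 'sub_typ'], group_keys=False)
         .apply(lambda x: x.sort_values(
             'col_sort',
             ascending=bool(x.col_if.eq('A').sum() >= x.col_if.eq('D').sum()))
           .head(3)))
print(out)
```

Group 'a' comes back ascending (col_sort 0, 1, 2) and group 'b' descending (3, 2, 1).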
I need to build a report collecting the QtyOnHand in the store, grouped by Vendor => Shipment.
I am trying to build a SQL query.
This code is not working:
CREATE or REPLACE VIEW purchase_custom_report as (
WITH currency_rate as (
SELECT
r.currency_id,
COALESCE(r.company_id, c.id) as company_id,
r.rate,
r.name AS date_start,
(SELECT name FROM res_currency_rate r2
WHERE r2.name > r.name AND
r2.currency_id = r.currency_id AND
(r2.company_id is null or r2.company_id = c.id)
ORDER BY r2.name ASC
LIMIT 1) AS date_end
FROM res_currency_rate r
JOIN res_company c ON (r.company_id is null or r.company_id = c.id)
)
SELECT
min(l.id) as id,
s.date_order as date_order,
s.partner_id as partner_id,
s.user_id as user_id,
s.company_id as company_id,
t.list_price as std_price,
l.product_id,
p.temp_qty as available_in_store,
p.product_tmpl_id,
t.categ_id as category_id,
s.currency_id,
case when invoice_status='invoiced' then True else False end as po_paid_flag,
t.uom_id as product_uom,
t.standard_price as cost,
l.price_subtotal/l.product_qty as real_price,
l.price_subtotal/l.product_qty - t.standard_price as profit_val,
s.date_order as purchase_id_date,
-- ((l.price_subtotal/COALESCE(NULLIF(l.product_qty, 0), 1.0)) - t.standard_price) * 100/ COALESCE(NULLIF(t.standard_price, 0), 1.0) as profit_percentage,
p.temp_qty * t.standard_price as stock_value,
sum(l.product_qty/u.factor*u2.factor) as unit_quantity,
sum(l.price_unit / COALESCE(NULLIF(cr.rate, 0), 1.0) * l.product_qty)::decimal(16,2) as price_total,
(sum(l.product_qty * l.price_unit / COALESCE(NULLIF(cr.rate, 0), 1.0))/NULLIF(sum(l.product_qty/u.factor*u2.factor),0.0))::decimal(16,2) as price_average
FROM (
purchase_order_line l
join purchase_order s on (l.order_id=s.id)
join res_partner partner on s.partner_id = partner.id
left join product_product p on (l.product_id=p.id)
left join product_template t on (p.product_tmpl_id=t.id)
LEFT JOIN ir_property ip ON (ip.name='standard_price' AND ip.res_id=CONCAT('product.product,',p.id) AND ip.company_id=s.company_id)
left join uom_uom u on (u.id=l.product_uom)
left join uom_uom u2 on (u2.id=t.uom_id)
left join stock_quant sq on (sq.product_id = p.id)
left join currency_rate cr on (cr.currency_id = s.currency_id and
cr.company_id = s.company_id and
cr.date_start <= coalesce(s.date_order, now()) and
(cr.date_end is null or cr.date_end > coalesce(s.date_order, now())))
)
GROUP BY
s.company_id,
s.user_id,
s.partner_id,
s.currency_id,
l.price_unit,
l.price_subtotal,
l.date_planned,
l.product_uom,
l.product_id,
p.product_tmpl_id,
t.categ_id,
s.date_order,
u.uom_type,
u.category_id,
t.uom_id,
u.id,
t.list_price,
p.temp_qty,
t.standard_price,
l.product_qty,
s.invoice_status
)
What I am expecting is the table & field name, and I will do the rest.
It's nowhere on the database as it is a computed field.
See <path_to_v12>/addons/stock/models/product.py
class Product(models.Model):
    _inherit = "product.product"
...
    qty_available = fields.Float(
        'Quantity On Hand', compute='_compute_quantities', search='_search_qty_available',
        digits=dp.get_precision('Product Unit of Measure'),
        help="Current quantity of products.\n"
             "In a context with a single Stock Location, this includes "
             "goods stored at this Location, or any of its children.\n"
             "In a context with a single Warehouse, this includes "
             "goods stored in the Stock Location of this Warehouse, or any "
             "of its children.\n"
             "In a context with a single Shop, this includes goods "
             "stored in the Stock Location of the Warehouse of this Shop, "
             "or any of its children.\n"
             "Otherwise, this includes goods stored in any Stock Location "
             "with 'internal' type.")
...
    def _compute_quantities(self):
        res = self._compute_quantities_dict()
        for template in self:
            template.qty_available = res[template.id]['qty_available']
            template.virtual_available = res[template.id]['virtual_available']
            template.incoming_qty = res[template.id]['incoming_qty']
            template.outgoing_qty = res[template.id]['outgoing_qty']

    def _product_available(self, name, arg):
        return self._compute_quantities_dict()

    def _compute_quantities_dict(self):
        # TDE FIXME: why not using directly the function fields ?
        variants_available = self.mapped('product_variant_ids')._product_available()
        prod_available = {}
        for template in self:
            qty_available = 0
            virtual_available = 0
            incoming_qty = 0
            outgoing_qty = 0
            for p in template.product_variant_ids:
                qty_available += variants_available[p.id]["qty_available"]
                virtual_available += variants_available[p.id]["virtual_available"]
                incoming_qty += variants_available[p.id]["incoming_qty"]
                outgoing_qty += variants_available[p.id]["outgoing_qty"]
            prod_available[template.id] = {
                "qty_available": qty_available,
                "virtual_available": virtual_available,
                "incoming_qty": incoming_qty,
                "outgoing_qty": outgoing_qty,
            }
        return prod_available
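The aggregation the ORM performs can be sketched in plain Python: each template's qty_available is simply the sum over its variants' quantities. (The variant data and ids below are invented for illustration; real values come from `_product_available` at runtime.)

```python
# Toy stand-in for the per-variant dict returned by _product_available().
variants_available = {
    1: {"qty_available": 5.0, "virtual_available": 7.0},
    2: {"qty_available": 3.0, "virtual_available": 2.0},
}
template_variants = {10: [1, 2]}  # hypothetical template id -> variant ids

# Mirror of _compute_quantities_dict: sum each variant's figures
# into the owning template's totals.
prod_available = {}
for tmpl_id, variant_ids in template_variants.items():
    totals = {"qty_available": 0.0, "virtual_available": 0.0}
    for vid in variant_ids:
        for key in totals:
            totals[key] += variants_available[vid][key]
    prod_available[tmpl_id] = totals
print(prod_available)
```

This is why the quantity exists nowhere as a plain column: it is recomputed from stock moves/quants on every read, so a raw SQL view cannot simply join on it.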
Question: Why does it seem that date_list[d] and isin_list[i] are not getting populated, in the code segment below?
AWK Code (on GNU-AWK on a Win-7 machine)
BEGIN { FS = "," } # This SEBI data set has comma-separated fields (NSE snapshots are pipe-separated)
# UPDATE the lists for DATE ($10), firm_ISIN ($9), EXCHANGE ($12), and FII_ID ($5).
( $17~/_EQ\>/ ) {
if (date[$10]++ == 0) date_list[d++] = $10; # Dates appear in order in raw data
if (isin[$9]++ == 0) isin_list[i++] = $9; # ISINs appear out of order in raw data
print $10, date[$10], $9, isin[$9], date_list[d], d, isin_list[i], i
}
input data
49290,C198962542782200306,6/30/2003,433581,F5811773991200306,S5405611832200306,B5086397478200306,NESTLE INDIA LTD.,INE239A01016,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,591.13,5655,3342840.15,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49291,C198962542782200306,6/30/2003,433563,F6292896459200306,S6344227311200306,B6110521493200306,GRASIM INDUSTRIES LTD.,INE047A01013,6/27/2003,1,E9035083824200306,REG_DL_STLD_02,495.33,3700,1832721,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49292,C198962542782200306,6/30/2003,433681,F6513202607200306,S1724027402200306,B6372023178200306,HDFC BANK LTD,INE040A01018,6/26/2003,1,E745964372424200306,REG_DL_STLD_02,242,2600,629200,REG_DL_INSTR_EQ,REG_DL_DLAY_D,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49293,C7885768925200306,6/30/2003,48128,F4406661052200306,S7376401565200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,44600,5575000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49294,C7885768925200306,6/30/2003,48129,F4500260787200306,S1312094035200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,4,E912851176274200306,REG_DL_STLD_04,125,445600,55700000,REG_DL_INSTR_EQ,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
49295,C7885768925200306,6/30/2003,48130,F6425024637200306,S2872499118200306,B4576522576200306,Maruti Udyog Limited,INE585B01010,6/28/2003,3,E912851176274200306,REG_DL_STLD_04,125,48000,6000000,REG_DL_INSTR_EU,REG_DL_DLAY_P,DL_RPT_TYPE_N,DL_AMDMNT_DEL_00
output that I am getting
6/27/2003 1 INE239A01016 1 1 1
6/27/2003 2 INE047A01013 1 1 2
6/26/2003 1 INE040A01018 1 2 3
6/28/2003 1 INE585B01010 1 3 4
6/28/2003 2 INE585B01010 2 3 4
Expected output
As far as I can tell, the print is correctly printing out (i) $10 (the date), (ii) date[$10], the count for each date, (iii) $9 (the firm ID, called ISIN), (iv) isin[$9], the count for each ISIN, (v) d (the index of date_list, i.e. the number of unique dates), and (vi) i (the index of isin_list, i.e. the number of unique ISINs). I should also get two more columns -- columns 5 and 7 below -- for date_list[d] and isin_list[i], whose values should look like $10 and $9.
6/27/2003 1 INE239A01016 1 6/27/2003 1 INE239A01016 1
6/27/2003 2 INE047A01013 1 6/27/2003 1 INE047A01013 2
6/26/2003 1 INE040A01018 1 6/26/2003 2 INE040A01018 3
6/28/2003 1 INE585B01010 1 6/28/2003 3 INE585B01010 4
6/28/2003 2 INE585B01010 2 6/28/2003 3 INE585B01010 4
actual code I now use is
{ if (date[$10]++ == 0) date_list[d++] = $10;
  if (isin[$9]++ == 0) isin_list[i++] = $9; }
( $11~/1|2|3|5|9|1[24]/ ) { ++BNR[$10,$9,$12,$5] }
END { for (u = 0; u < d; u++)
        for (v = 0; v < i; v++)
          if (BNR[date_list[u],isin_list[v]] > 0) {
            BR = BNR[date_list[u],isin_list[v]]
            print date_list[u], isin_list[v], BR
          }
}
Thanks a lot to everyone.
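The missing columns are most likely an off-by-one: after `date_list[d++] = $10`, the post-increment leaves d pointing one past the last stored element, so `date_list[d]` (and likewise `isin_list[i]`) reads an empty, never-assigned slot; printing `date_list[d-1]` would show the last stored value. A minimal Python sketch of the same indexing issue:

```python
# Emulate AWK's associative array and `date_list[d++] = $10`.
date_list = {}
d = 0
for value in ["6/27/2003", "6/26/2003"]:
    date_list[d] = value
    d += 1  # post-increment: d now points PAST the last stored element

# date_list[d] was never assigned (AWK would print an empty string);
# the last stored entry lives at date_list[d - 1].
print(repr(date_list.get(d, "")), date_list[d - 1])
```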
Is there a way to fetch an array while counting the results?
I'm trying to create an attendance monitor which counts the number of presents, absents, and the total number of meetings per Sched. ID and Student ID.
here is my database
SCHED_ID | STUDENT_ID  | DATE       | A_STAT
1234567  | 2014-000003 | 08/01/14   | Absent
123456   | 2014-000003 | 08/04/2014 | Present
1234567  | 2014-000003 | 08/10/2014 | Present
123456   | 2014-000003 | 08/10/2014 | Present
the output is supposed to be like this
Subject Tot Num| Num of Days Present | Num of Days Absent
dasdasdasd 3 2 1
testing123 3 2 1
ASDASD 1 0 1
but it always comes out like this
Subject Tot Num | Num of Days Present | Num of Days Absent
dasdasdasd 3 1 1
testing123 3 1 1
ASDASD 3 1 1
$query = $this->db->query("SELECT * FROM (tbl_schedule c, tbl_subject d, tbl_attendance e) where e.SCHED_ID = c.SCHED_ID AND d.SUBJECT_CODE = c.SUBJECT_CODE AND (e.STUDENT_ID = '$si') GROUP BY e.SCHED_ID")->result_array();
$query1 = $this->db->query("SELECT * FROM (tbl_schedule c, tbl_subject d, tbl_attendance e) where e.SCHED_ID = c.SCHED_ID AND d.SUBJECT_CODE = c.SUBJECT_CODE AND e.A_STAT='Present' AND (e.STUDENT_ID = '$si') GROUP BY e.SCHED_ID AND e.A_STAT")->num_rows();
$query2 = $this->db->query("SELECT * FROM (tbl_schedule c, tbl_subject d, tbl_attendance e) where e.SCHED_ID = c.SCHED_ID AND d.SUBJECT_CODE = c.SUBJECT_CODE AND e.A_STAT='Absent' AND (e.STUDENT_ID = '$si') GROUP BY e.SCHED_ID AND e.A_STAT")->num_rows();
$query3 = $this->db->query("SELECT * FROM (tbl_schedule c, tbl_subject d, tbl_attendance e) where e.SCHED_ID = c.SCHED_ID AND d.SUBJECT_CODE = c.SUBJECT_CODE AND (e.STUDENT_ID = '$si') GROUP BY e.SCHED_ID ")->num_rows();
for($i=0;$i<sizeof($query);$i++){
$data[$i][0]=$query1; // present
$data[$i][1]=$query2; //absent
$data[$i][2]=$query3; //total
$data[$i][3]=$query[$i]['SUBJECT_DESC']; // subj
}
return $data;
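The root cause is that $query1/$query2/$query3 are single num_rows() scalars, so the same counts get copied into every row. One fix is to compute all three counts per SCHED_ID in a single query with conditional aggregation. A runnable sketch using sqlite3 (the real code is CodeIgniter/MySQL; table and column names follow the question, the rows are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl_attendance (SCHED_ID TEXT, STUDENT_ID TEXT, A_STAT TEXT)")
con.executemany(
    "INSERT INTO tbl_attendance VALUES (?, ?, ?)",
    [("1234567", "2014-000003", "Absent"),
     ("123456",  "2014-000003", "Present"),
     ("1234567", "2014-000003", "Present"),
     ("123456",  "2014-000003", "Present")],
)

# One GROUP BY produces the total plus per-status counts for each schedule,
# instead of one scalar reused for every row.
rows = con.execute("""
    SELECT SCHED_ID,
           COUNT(*)                                            AS total_meetings,
           SUM(CASE WHEN A_STAT = 'Present' THEN 1 ELSE 0 END) AS days_present,
           SUM(CASE WHEN A_STAT = 'Absent'  THEN 1 ELSE 0 END) AS days_absent
    FROM tbl_attendance
    WHERE STUDENT_ID = ?
    GROUP BY SCHED_ID
    ORDER BY SCHED_ID
""", ("2014-000003",)).fetchall()
print(rows)
```

The same SELECT (joined to tbl_schedule/tbl_subject for the subject description) would replace all four CodeIgniter queries with one result_array() loop.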