I am using tabulate in my program to produce the decision table depicted below. What I want is to get subtotals on both the row side and the column side. Is that possible in Python?
+-------+----------+---------+----------+---------+----------+
| Env | xsmall | small | medium | large | xlarge |
|-------+----------+---------+----------+---------+----------|
| dev | 2 | 4 | 6 | 1 | 2 |
| prod | 1 | 9 | 4 | 0 | 0 |
| qa | 0 | 10 | 1 | 0 | 1 |
| uat | 0 | 0 | 2 | 0 | 0 |
+-------+----------+---------+----------+---------+----------+
I want to get subtotals like this:
Row: 2+4+6+1+2 = 15, and so on
Column: 2+1+0+0 = 3
+----------+----------+---------+----------+---------+----------+-----------+
| Env      |   xsmall |   small |   medium |   large |   xlarge | Sub Total |
|----------+----------+---------+----------+---------+----------+-----------|
| dev      |        2 |       4 |        6 |       1 |        2 |        15 |
| prod     |        1 |       9 |        4 |       0 |        0 |        14 |
| qa       |        0 |      10 |        1 |       0 |        1 |        12 |
| uat      |        0 |       0 |        2 |       0 |        0 |         2 |
| SubTotal |        3 |      23 |       13 |       1 |        3 |        43 |
+----------+----------+---------+----------+---------+----------+-----------+
Will appreciate your help!
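For reference, one way to sketch this in plain Python, with the data hard-coded from the table above (the tabulate call is shown commented out so the snippet stands alone):

```python
rows = {
    "dev":  [2, 4, 6, 1, 2],
    "prod": [1, 9, 4, 0, 0],
    "qa":   [0, 10, 1, 0, 1],
    "uat":  [0, 0, 2, 0, 0],
}
headers = ["Env", "xsmall", "small", "medium", "large", "xlarge", "Sub Total"]

# Append a row subtotal to each row, then a column-subtotal row at the bottom.
table = [[env] + vals + [sum(vals)] for env, vals in rows.items()]
col_totals = [sum(col) for col in zip(*(r[1:] for r in table))]
table.append(["SubTotal"] + col_totals)

# from tabulate import tabulate
# print(tabulate(table, headers=headers, tablefmt="psql"))
```

The grand total (43) lands naturally in the bottom-right corner, since the column-subtotal pass also sums the row-subtotal column.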
I have DataFrame
| ind | A | B |
------------------------
| 1.01 | 10 | -1.734 |
| 1.04 | 10 | -1.244 |
| 1.05 | 10 | 0.016 |
| 1.11 | NaN | -2.737 | <-
| 1.13 | NaN | -4.232 | <-
| 1.19 | 11 | -3.241 | <=
| 1.20 | 12 | -2.832 |
| 1.21 | 10 | -4.277 |
and would like to back-fill the NaN values with a decreasing sequence that ends at the next valid value:
| ind | A | B |
------------------------
| 1.01 | 10 | -1.734 |
| 1.04 | 10 | -1.244 |
| 1.05 | 10 | 0.016 |
| 1.11 | 13 | -2.737 | <-
| 1.13 | 12 | -4.232 | <-
| 1.19 | 11 | -3.241 | <=
| 1.20 | 12 | -2.832 |
| 1.21 | 10 | -4.277 |
Is there a way to do this?
Get positions where NaNs are found
positions = df['A'].isna().astype(int)
| positions |
--------------
| 0 |
| 0 |
| 0 |
| 1 |
| 1 |
| 0 |
| 0 |
| 0 |
then do a reverse cumulative sum (note the mask must stay boolean, so that ~mask is a valid condition for .where()):
mask = df['A'].isna().loc[::-1]
cumSum = mask.astype(int).cumsum()
posCumSum = (cumSum - cumSum.where(~mask).ffill().fillna(0)).astype(int).loc[::-1]
| posCumSum |
--------------
| 0 |
| 0 |
| 0 |
| 2 |
| 1 |
| 0 |
| 0 |
| 0 |
then add it to the back-filled original column:
df['A'] = df['A'].bfill() + posCumSum
| ind | A | B |
------------------------
| 1.01 | 10 | -1.734 |
| 1.04 | 10 | -1.244 |
| 1.05 | 10 | 0.016 |
| 1.11 | 13 | -2.737 | <-
| 1.13 | 12 | -4.232 | <-
| 1.19 | 11 | -3.241 | <=
| 1.20 | 12 | -2.832 |
| 1.21 | 10 | -4.277 |
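Putting the steps above together, here is a self-contained sketch with the sample frame rebuilt by hand:

```python
import numpy as np
import pandas as pd

# Reconstruct the example frame from the question
df = pd.DataFrame({
    "ind": [1.01, 1.04, 1.05, 1.11, 1.13, 1.19, 1.20, 1.21],
    "A":   [10, 10, 10, np.nan, np.nan, 11, 12, 10],
    "B":   [-1.734, -1.244, 0.016, -2.737, -4.232, -3.241, -2.832, -4.277],
})

mask = df["A"].isna().loc[::-1]        # boolean NaN mask, reversed
cumSum = mask.astype(int).cumsum()     # running NaN count from the bottom
# Within each NaN run, distance (counting up) to the next valid value below
posCumSum = (cumSum - cumSum.where(~mask).ffill().fillna(0)).astype(int).loc[::-1]

df["A"] = (df["A"].bfill() + posCumSum).astype(int)
print(df["A"].tolist())   # [10, 10, 10, 13, 12, 11, 12, 10]
```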
I have a query that runs much slower (from 6.82% to 76% of DB time) after I switched to hash clustering, and the AWR report points to latch free -> multiblock read objects as the key wait.
The query is
SELECT count(1) INTO result FROM (
SELECT s_w_id, s_i_id, s_quantity
FROM bmsql_stock
WHERE s_w_id = in_w_id AND s_quantity < in_threshold AND s_i_id IN (
SELECT ol_i_id
FROM bmsql_district
JOIN bmsql_order_line ON ol_w_id = d_w_id
AND ol_d_id = d_id
AND ol_o_id >= d_next_o_id - 20
AND ol_o_id < d_next_o_id
WHERE d_w_id = in_w_id AND d_id = in_d_id
)
);
The table DDL is as follows (note that the order_line table is not clustered):
create cluster bmsql_stock_cluster (
s_w_id integer,
s_i_id integer
)
single table
hashkeys 300000000
hash is ( (s_i_id-1) * 3000 + s_w_id-1 )
size 270
pctfree 0 initrans 2 maxtrans 2
storage (buffer_pool keep) parallel (degree 96);
create table bmsql_stock (
s_w_id integer not null,
s_i_id integer not null,
s_quantity integer,
s_ytd integer,
s_order_cnt integer,
s_remote_cnt integer,
s_data varchar(50),
s_dist_01 char(24),
s_dist_02 char(24),
s_dist_03 char(24),
s_dist_04 char(24),
s_dist_05 char(24),
s_dist_06 char(24),
s_dist_07 char(24),
s_dist_08 char(24),
s_dist_09 char(24),
s_dist_10 char(24)
)
cluster bmsql_stock_cluster(
s_w_id, s_i_id
);
create unique index bmsql_stock_pkey
on bmsql_stock (s_i_id, s_w_id)
parallel 32
pctfree 1 initrans 3
compute statistics;
create cluster bmsql_district_cluster (
d_id integer,
d_w_id integer
)
single table
hashkeys 30000
hash is ( (((d_w_id-1)*10)+d_id-1) )
size 3496
initrans 4
storage (buffer_pool default) parallel (degree 32);
create table bmsql_district (
d_id integer not null,
d_w_id integer not null,
d_ytd decimal(12,2),
d_tax decimal(4,4),
d_next_o_id integer,
d_name varchar(10),
d_street_1 varchar(20),
d_street_2 varchar(20),
d_city varchar(20),
d_state char(2),
d_zip char(9)
)
cluster bmsql_district_cluster(
d_id, d_w_id
);
create unique index bmsql_district_pkey
on bmsql_district (d_w_id, d_id)
pctfree 5 initrans 3
parallel 1
compute statistics;
create table bmsql_order_line (
ol_w_id integer not null,
ol_d_id integer not null,
ol_o_id integer sort,
ol_number integer sort,
ol_i_id integer not null,
ol_delivery_d timestamp,
ol_amount decimal(6,2),
ol_supply_w_id integer,
ol_quantity integer,
ol_dist_info char(24)
);
create unique index bmsql_order_line_pkey
on bmsql_order_line (ol_w_id, ol_d_id, ol_o_id, ol_number)
compute statistics;
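As an aside, the user-defined hash expression on the district cluster can be sanity-checked in a few lines of Python. Assuming 3000 warehouses with 10 districts each (which is what hashkeys 30000 implies for this TPC-C-style schema), the expression maps every (d_w_id, d_id) pair to its own distinct bucket, so hash collisions on this cluster are not the issue:

```python
# hash is ( ((d_w_id-1)*10) + d_id - 1 ), assuming 3000 warehouses x 10 districts
values = {(w - 1) * 10 + d - 1
          for w in range(1, 3001)
          for d in range(1, 11)}
print(len(values), min(values), max(values))   # 30000 0 29999
```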
Finally, the multiblock read objects latch metrics:
Latch Activity:
Latch Name Get Requests Pct Get Miss Avg Slps /Miss Wait Time (s) NoWait Requests
// before
multiblock read objects 42,906 0.24 0.00 0 0
// after
multiblock read objects 302,570,536 87.49 0.04 22385 0
Latch Sleep Breakdown:
Latch Name Get Requests Misses Sleeps Spin Gets
multiblock read objects 302,570,536 264,712,892 11,385,692 254,114,619
Latch Miss Source:
Latch Name Where NoWait Misses Sleeps Waiter Sleeps
multiblock read objects kcbzibmlt 0 5,886,699 5,927,472
multiblock read objects kcbzibmlt: normal mbr free 0 5,498,253 5,457,785
And the execution plan with statistics:
Plan hash value: 485221244
--------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.11 | 1472 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.11 | 1472 | | | |
| 2 | NESTED LOOPS | | 1 | | 4 |00:00:00.11 | 1472 | | | |
| 3 | NESTED LOOPS | | 1 | 103 | 220 |00:00:00.10 | 1240 | | | |
| 4 | VIEW | VW_NSO_1 | 1 | 390 | 220 |00:00:00.10 | 578 | | | |
| 5 | HASH UNIQUE | | 1 | 1 | 220 |00:00:00.10 | 578 | 1397K| 1397K| 1342K (0)|
| 6 | MERGE JOIN | | 1 | 390 | 221 |00:00:00.10 | 578 | | | |
|* 7 | TABLE ACCESS HASH | BMSQL_DISTRICT | 1 | 62 | 1 |00:00:00.01 | 1 | | | |
|* 8 | FILTER | | 1 | | 221 |00:00:00.10 | 577 | | | |
|* 9 | FILTER | | 1 | | 221 |00:00:00.10 | 577 | | | |
| 10 | TABLE ACCESS BY INDEX ROWID| BMSQL_ORDER_LINE | 1 | 2500 | 31249 |00:00:00.02 | 577 | | | |
|* 11 | INDEX RANGE SCAN | BMSQL_ORDER_LINE_PKEY | 1 | 2500 | 31249 |00:00:00.01 | 106 | | | |
|* 12 | INDEX UNIQUE SCAN | BMSQL_STOCK_PKEY | 220 | 1 | 220 |00:00:00.01 | 662 | | | |
|* 13 | TABLE ACCESS BY INDEX ROWID | BMSQL_STOCK | 220 | 103 | 4 |00:00:00.01 | 232 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("BMSQL_DISTRICT"."D_ID"=:SYS_B_5 AND "BMSQL_DISTRICT"."D_W_ID"=:SYS_B_4)
8 - filter("OL_O_ID"<"D_NEXT_O_ID")
9 - filter("OL_O_ID">="D_NEXT_O_ID"-:SYS_B_3)
11 - access("OL_W_ID"=:SYS_B_4 AND "OL_D_ID"=:SYS_B_5)
12 - access("S_I_ID"="OL_I_ID" AND "S_W_ID"=:SYS_B_1)
13 - filter("S_QUANTITY"<:SYS_B_2)
Note
-----
- dynamic sampling used for this statement (level=6)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
828 consistent gets
2 physical reads
0 redo size
525 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Multiblock reads are associated with index/table scans, but as far as I can tell the clustered tables should not require any full-scan operation here. I hope someone can kindly explain this behavior. Thanks in advance.
Update: the query becomes fast again after I un-cluster the DISTRICT table.
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2385307489
------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 802 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 802 | | | |
| 2 | NESTED LOOPS | | 1 | | 9 |00:00:00.01 | 802 | | | |
| 3 | NESTED LOOPS | | 1 | 137 | 191 |00:00:00.02 | 605 | | | |
| 4 | VIEW | VW_NSO_1 | 1 | 6 | 191 |00:00:00.01 | 30 | | | |
| 5 | HASH UNIQUE | | 1 | 1 | 191 |00:00:00.01 | 30 | 1397K| 1397K| 1321K (0)|
| 6 | NESTED LOOPS | | 1 | 6 | 195 |00:00:00.02 | 30 | | | |
| 7 | TABLE ACCESS BY INDEX ROWID| BMSQL_DISTRICT | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
|* 8 | INDEX UNIQUE SCAN | BMSQL_DISTRICT_PKEY | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
| 9 | TABLE ACCESS BY INDEX ROWID| BMSQL_ORDER_LINE | 1 | 6 | 195 |00:00:00.02 | 27 | | | |
|* 10 | INDEX RANGE SCAN | BMSQL_ORDER_LINE_PKEY | 1 | 383 | 195 |00:00:00.01 | 4 | | | |
|* 11 | INDEX UNIQUE SCAN | BMSQL_STOCK_PKEY | 191 | 1 | 191 |00:00:00.01 | 575 | | | |
|* 12 | TABLE ACCESS BY INDEX ROWID | BMSQL_STOCK | 191 | 137 | 9 |00:00:00.01 | 197 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
8 - access("BMSQL_DISTRICT"."D_W_ID"=:SYS_B_4 AND "BMSQL_DISTRICT"."D_ID"=:SYS_B_5)
10 - access("OL_W_ID"=:SYS_B_4 AND "OL_D_ID"=:SYS_B_5 AND "OL_O_ID">="D_NEXT_O_ID"-:SYS_B_3 AND "OL_O_ID"<"D_NEXT_O_ID")
11 - access("S_I_ID"="OL_I_ID" AND "S_W_ID"=:SYS_B_1)
12 - filter("S_QUANTITY"<:SYS_B_2)
Note
-----
- dynamic sampling used for this statement (level=6)
43 rows selected.
I have a long running Oracle Query which uses a bunch of:
WHERE EXISTS (SELECT NULL FROM Table WHERE TableColumn IN (...))
Instead of using SELECT NULL, which I assume scans the entire table for matches, can't I just put FETCH NEXT 1 ROW ONLY after it, since I only care whether TableColumn is IN (...)?
Like this:
WHERE EXISTS (SELECT NULL FROM Table WHERE TableColumn IN (...) FETCH NEXT 1 ROW ONLY)
That way the WHERE EXISTS would be evaluated more quickly.
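For background, EXISTS is already defined to succeed on the first matching row, so the row-limit clause does not change the semantics, only (potentially) the plan. A small sqlite3 sketch (standing in for Oracle here; sqlite spells the clause LIMIT 1 rather than FETCH NEXT 1 ROW ONLY) shows the two predicates agree:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t(id INTEGER, col INTEGER);
    INSERT INTO t VALUES (1, 10), (2, 20), (3, 30);
""")

# EXISTS without and with an explicit single-row limit
plain = con.execute(
    "SELECT EXISTS (SELECT NULL FROM t WHERE col IN (20, 30))").fetchone()[0]
limited = con.execute(
    "SELECT EXISTS (SELECT NULL FROM t WHERE col IN (20, 30) LIMIT 1)").fetchone()[0]
print(plain, limited)   # 1 1
```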
EDIT:
Below is the query plan without the FETCH NEXT clause attached:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 75 | 521611 | |
| 1 | SORT AGGREGATE | | 1 | 75 | | |
| 2 | HASH JOIN | | 531266 | 39844950 | 521611 | |
| 3 | TABLE ACCESS FULL | ACCT | 47574 | 523314 | 418 | |
| 4 | HASH JOIN | | 531224 | 33998336 | 521185 | |
| 5 | INDEX FAST FULL SCAN | PK_ACTVTYP | 454 | 2270 | 2 | |
| 6 | HASH JOIN | | 531224 | 31342216 | 521177 | |
| 7 | INDEX FULL SCAN | PK_ACTVCAT | 67 | 335 | 1 | |
| 8 | HASH JOIN SEMI | | 531224 | 28686096 | 521169 | |
| 9 | NESTED LOOPS SEMI | | 531224 | 28686096 | 521169 | |
| 10 | STATISTICS COLLECTOR | | | | | |
| 11 | HASH JOIN RIGHT SEMI | | 531224 | 25498752 | 112887 | |
| 12 | TABLE ACCESS FULL | AMSACTVGRPEMPL | 2364 | 35460 | 10 | |
| 13 | TABLE ACCESS FULL | ACTV | 12779986 | 421739538 | 112712 | |
| 14 | INDEX RANGE SCAN | ACTVSUBACTV_DX2 | 163091724 | 978550344 | 251246 | |
| 15 | INDEX FAST FULL SCAN | ACTVSUBACTV_DX2 | 163091724 | 978550344 | 251246 | |
------------------------------------------------------------------------------------------------
Below is the query plan with the FETCH NEXT clause attached:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 69 | 113148 | |
| 1 | SORT AGGREGATE | | 1 | 69 | | |
| 2 | FILTER | | | | | |
| 3 | HASH JOIN | | 531221 | 36654249 | 113144 | |
| 4 | TABLE ACCESS FULL | ACCT | 47574 | 523314 | 418 | |
| 5 | HASH JOIN | | 531179 | 30808382 | 112718 | |
| 6 | INDEX FAST FULL SCAN | PK_ACTVTYP | 454 | 2270 | 2 | |
| 7 | HASH JOIN | | 531179 | 28152487 | 112710 | |
| 8 | INDEX FULL SCAN | PK_ACTVCAT | 67 | 335 | 1 | |
| 9 | HASH JOIN RIGHT SEMI | | 531179 | 25496592 | 112702 | |
| 10 | TABLE ACCESS FULL | AMSACTVGRPEMPL | 2167 | 32505 | 10 | |
| 11 | TABLE ACCESS FULL | ACTV | 12778893 | 421703469 | 112527 | |
| 12 | VIEW | | 1 | 13 | 4 | |
| 13 | WINDOW BUFFER PUSHED RANK | | 8 | 48 | 4 | |
| 14 | INDEX RANGE SCAN | ACTVSUBACTV_DX2 | 8 | 48 | 4 | |
------------------------------------------------------------------------------------------------
From what I can see, the plan without FETCH NEXT adds overhead through additional full-table access operations.
EDIT #2
Adding AND ROWNUM = 1 instead of FETCH NEXT 1 ROW ONLY:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 54 | 128114 | |
| 1 | SORT AGGREGATE | | 1 | 54 | | |
| 2 | FILTER | | | | | |
| 3 | HASH JOIN | | 12779902 | 690114708 | 113296 | |
| 4 | TABLE ACCESS FULL | ACCT | 47574 | 523314 | 418 | |
| 5 | HASH JOIN | | 12778893 | 549492399 | 112713 | |
| 6 | MERGE JOIN CARTESIAN | | 30418 | 304180 | 31 | |
| 7 | INDEX FULL SCAN | PK_ACTVCAT | 67 | 335 | 1 | |
| 8 | BUFFER SORT | | 454 | 2270 | 30 | |
| 9 | INDEX FAST FULL SCAN | PK_ACTVTYP | 454 | 2270 | 0 | |
| 10 | TABLE ACCESS FULL | ACTV | 12778893 | 421703469 | 112517 | |
| 11 | COUNT STOPKEY | | | | | |
| 12 | INLIST ITERATOR | | | | | |
| 13 | INDEX UNIQUE SCAN | PK_AMSACTVGRPEMPL | 1 | 15 | 2 | |
| 14 | COUNT STOPKEY | | | | | |
| 15 | INDEX RANGE SCAN | ACTVSUBACTV_DX2 | 2 | 12 | 4 | |
------------------------------------------------------------------------------------------------
FETCH NEXT is new in 12c. To avoid the performance issue it causes here, add a FIRST_ROWS hint like the one below:
WHERE EXISTS (SELECT /*+ first_rows(1) */ * FROM Table WHERE TableColumn IN (...) FETCH NEXT 1 ROW ONLY)
Try it and check its query plan.
Note: I also recommend adding indexes on the ACCT and ACTV tables to improve performance.
If I have two int columns in SQL Server, how can I write a query that returns the maximum value of one column at the maximum value of the other column?
Let me give an example. Let's say I have this table:
| Name | Version | Category | Value | Number | Replication |
|:-----:|:-------:|:--------:|:-----:|:------:|:-----------:|
| File1 | 1.0 | Time | 123 | 1 | 1 |
| File1 | 1.0 | Size | 456 | 1 | 1 |
| File2 | 1.0 | Time | 312 | 1 | 1 |
| File2 | 1.0 | Size | 645 | 1 | 1 |
| File1 | 1.0 | Time | 369 | 1 | 2 |
| File1 | 1.0 | Size | 258 | 1 | 2 |
| File2 | 1.0 | Time | 741 | 1 | 2 |
| File2 | 1.0 | Size | 734 | 1 | 2 |
| File1 | 1.1 | Time | 997 | 2 | 1 |
| File1 | 1.1 | Size | 997 | 2 | 1 |
| File2 | 1.1 | Time | 438 | 2 | 1 |
| File2 | 1.1 | Size | 735 | 2 | 1 |
| File1 | 1.1 | Time | 786 | 2 | 2 |
| File1 | 1.1 | Size | 486 | 2 | 2 |
| File2 | 1.1 | Time | 379 | 2 | 2 |
| File2 | 1.1 | Size | 943 | 2 | 2 |
| File1 | 1.2 | Time | 123 | 3 | 1 |
| File1 | 1.2 | Size | 456 | 3 | 1 |
| File2 | 1.2 | Time | 312 | 3 | 1 |
| File2 | 1.2 | Size | 645 | 3 | 1 |
| File1 | 1.2 | Time | 369 | 3 | 2 |
| File1 | 1.2 | Size | 258 | 3 | 2 |
| File2 | 1.2 | Time | 741 | 3 | 2 |
| File2 | 1.2 | Size | 734 | 3 | 2 |
| File1 | 1.3 | Time | 997 | 4 | 1 |
| File1 | 1.3 | Size | 997 | 4 | 1 |
| File2 | 1.3 | Time | 438 | 4 | 1 |
| File2 | 1.3 | Size | 735 | 4 | 1 |
How could I write a query that selects the maximum Replication value at the maximum Number value? As you can see, in this table the maximum value in Number is 4, but the maximum Replication value where Number = 4 is 1.
All I can think to do is this:
SELECT MAX(Replication) FROM Table
WHERE Number IS MAX;
which is obviously wrong and doesn't work.
You can try GROUP BY and HAVING:
select max([Replication]) from Table_Name
group by [Number]
having [Number] = (select max([Number]) from Table_Name)
Just use a subquery to find the max number in the where clause. If you just want one single number as the result there is no need to use group by and having (which would make the query a lot more expensive):
select max([replication]) from tab
where number = (select max(number) from tab)
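As a quick check, that query can be run against a slice of the sample data, here using Python's sqlite3 in place of SQL Server (identifier quoting differs slightly):

```python
import sqlite3

# Rebuild a slice of the sample table from the question
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab(Name TEXT, Number INTEGER, Replication INTEGER)")
con.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [("File1", 3, 1), ("File1", 3, 2), ("File2", 3, 2),
     ("File1", 4, 1), ("File2", 4, 1)])

# Max Replication among the rows that have the overall max Number
row = con.execute(
    "SELECT MAX(Replication) FROM tab "
    "WHERE Number = (SELECT MAX(Number) FROM tab)").fetchone()
print(row[0])   # 1
```

The subquery pins Number to its overall maximum (4), and the outer MAX then looks only at those rows, so the Replication = 2 rows at Number = 3 are correctly ignored.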