Trying to query certain groups that satisfy a value_count condition - pandas

I'm playing around with stock data, and I'm trying to filter for the groups that have more buys than sells with respect to their Transaction values.
So the code I'm running to display the data below is
df.groupby('Stock').Transaction.value_counts()
which outputs:
Stock Transaction
ADC Buy 2
AKAM Option Exercise 51
Sale 34
Buy 9
AMNB Buy 10
ARCC Buy 15
ARL Buy 12
ASA Buy 7
ASRV Buy 12
Option Exercise 1
AUBN Buy 4
Sale 11
BAC Option Exercise 23
Buy 15
Sale 7
BCBP Buy 3
Sale 11
BKSC Buy 55
BMRA Buy 5
Option Exercise 3
Sale 1
..
I'm grouping the data by stock ticker and then looking at each group's Transaction values. I'm trying to keep only the groups whose Transaction value counts have more Buy than Sale entries.
I can't figure out how to do this.
I tried something like this:
df.groupby('Stock').filter(lambda x: x.Transaction.value_counts().Buy > x.value_counts().Sale)
which oddly doesn't work despite this working:
df.Transaction.value_counts().Buy
>>>2674
I also tried things along the lines of
df.groupby('Stock').Transaction.filter(lambda x: x if x.value_counts().Buy > x.value_counts().Sale)
But I can't think of which pandas tools are ideal in this case.
The output can be anything from just the names of the stocks that satisfy this condition to printing out the entire group (stock name and Transaction).
So the output would be something like this
ADC Buy 2
AMNB Buy 10
ARCC Buy 15
ARL Buy 12
ASA Buy 7
ASRV Buy 12
Option Exercise 1
BAC Option Exercise 23
Buy 15
Sale 7
BKSC Buy 55
BMRA Buy 5
Option Exercise 3
Sale 1
Or just the stock names.
Thanks.

I'd unstack, then query:
d1 = df.groupby('Stock').Transaction.value_counts()
d1.unstack(fill_value=0).query('Buy > Sale')
We can get it back all nice and tidy with this:
import numpy as np

d1.unstack(fill_value=0).query('Buy > Sale') \
    .replace(0, np.nan).stack().astype(int)
Stock Transaction
ADC Buy 2
AMNB Buy 10
ARCC Buy 15
ARL Buy 12
ASA Buy 7
ASRV Buy 12
Option Exercise 1
BAC Buy 15
Option Exercise 23
Sale 7
BKSC Buy 55
BMRA Buy 5
Option Exercise 3
Sale 1
dtype: int64
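For comparison, the filter approach from the question can also be made to work: the lambda needs to count within the Transaction column, and it should use .get(label, 0) rather than attribute access so groups with no Sale (or no Buy) rows don't raise a KeyError. A sketch on a small hypothetical frame with the same layout:

```python
import pandas as pd

# Hypothetical sample with the same two columns as the question's df
df = pd.DataFrame({
    'Stock': ['ADC', 'ADC', 'AUBN', 'AUBN', 'AUBN'],
    'Transaction': ['Buy', 'Buy', 'Buy', 'Sale', 'Sale'],
})

# value_counts() may lack a 'Buy' or 'Sale' entry for some groups,
# so look labels up with .get(label, 0) instead of attributes.
out = df.groupby('Stock').filter(
    lambda g: g.Transaction.value_counts().get('Buy', 0)
              > g.Transaction.value_counts().get('Sale', 0)
)
print(out.Stock.unique())  # only the stocks with more buys than sales
```

This keeps the full rows of the qualifying groups, whereas the unstack/query answer gives the per-group counts.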

Related

PostgreSQL how to find row IDs based on aggregated metrics

I have a table in a PostgreSQL database that looks more or less like this:
ID
Side
Amount
Price
1
BUY
8
107295.000000000
2
SELL
18
107300.000000000
3
SELL
21
107305.000000000
4
BUY
17
107310.000000000
And I have some aggregated metrics that look like this:
{'BUY': {'amount_sum': 6655, 'price_avg': 105961.497370398197}, 'SELL': {'amount_sum': 6655, 'price_avg': 106214.787377911345}}
And I need to find the row IDs that match these metrics. How would I go about doing this?
I've read a bit into PostgreSQL documentation and I've tried using GROUP BY on SIDE, then using the HAVING clause, but wasn't successful.
===================================================
To clarify, given this table and input:
ID
Side
Amount
Price
1
BUY
2
1
2
SELL
1
2
3
SELL
2
1
4
BUY
1
3
5
SELL
8
1
6
BUY
5
2
{'BUY': {'amount_sum': 3, 'price_avg': 2}, 'SELL': {'amount_sum': 10, 'price_avg': 1}}
I would expect the output to be:
BUY: ids [1,4], SELL: ids [3,5]. That's because for ids 1 and 4, which have BUY as their side, the sum of the amount column is 3 and the average of the price column is 2; and for ids 3 and 5, which have SELL, the sum of the amount column is 10 and the average of the price column is 1.
@Gabriel Tkacz, I don't have the 50 reputation needed to comment, so I'm asking my question as an "answer".
From input table was expected:
{'BUY': {'Ids': [1,4,6], 'amount_sum': 8, 'price_avg': 2},
 'SELL': {'Ids': [2,3,5], 'amount_sum': 11, 'price_avg': 1.3333333333333333}}
Why are id [6] excluded on the BUY side and id [2] on the SELL side in your explanation of
BUY: ids [1,4], SELL: ids [3,5]?
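Matching row IDs to pre-aggregated metrics like this is really a subset search: several different subsets (or none at all) may reproduce the same sum and average, which is why GROUP BY with HAVING alone can't recover the IDs. A brute-force sketch in Python on the clarified example (fine for tiny tables only; the search is exponential in the number of rows):

```python
from itertools import combinations

# (id, side, amount, price) rows from the clarified example table
rows = [(1, 'BUY', 2, 1), (2, 'SELL', 1, 2), (3, 'SELL', 2, 1),
        (4, 'BUY', 1, 3), (5, 'SELL', 8, 1), (6, 'BUY', 5, 2)]
targets = {'BUY': {'amount_sum': 3, 'price_avg': 2},
           'SELL': {'amount_sum': 10, 'price_avg': 1}}

def matching_ids(side, amount_sum, price_avg):
    """Return the first subset of rows on `side` whose amounts sum to
    amount_sum and whose prices average price_avg."""
    candidates = [r for r in rows if r[1] == side]
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            if (sum(r[2] for r in combo) == amount_sum and
                    sum(r[3] for r in combo) / len(combo) == price_avg):
                return [r[0] for r in combo]
    return []

result = {side: matching_ids(side, t['amount_sum'], t['price_avg'])
          for side, t in targets.items()}
print(result)
```

On this data it recovers BUY: [1, 4] and SELL: [3, 5], matching the expected output; a real solution over a large PostgreSQL table would need something smarter (or extra constraints) than exhaustive search.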

how to access repeat purchase records for the next three months without self join?

I have a table with customer transaction information, for example records for one customer (identified by customer_id) look like this:
order_id
bk_date
booking_has_insurance_indicator
1
7/20
0
2
8/2
0
3
8/3
1
4
8/9
1
5
11/6
0
6
12/2
0
7
12/6
0
8
12/7
0
I'd like to find out, for each customer and each order_id, whether there's a repeat purchase within 90 days and how many, and if so, whether any of those has insurance attached. For example, for order_id = 1 there are three repeat purchases (order_id = 2, 3, 4) within 90 days, and some of them have insurance attached (order_id = 3, 4). The ideal output would look like
order_id
bk_date
repeat_count
repeat_has_insurance_indicator
1
7/20
3
1
2
8/2
2
1
3
8/3
2
1
4
8/9
1
0
5
11/6
3
0
6
12/2
2
0
7
12/6
1
0
8
12/7
0
0
I'm aware that if I only wanted to access the next order record I could use the LEAD window function without joining, but for the question above I could only think of a self join, joining each order_id to the ones with bk_date within 90 days of it. However, given the volume of the data, with millions of customers, a self join is not an option due to memory limits. Could someone help me find a more efficient solution?
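Since the orders are sorted by date, one join-free idea is a single pass per customer with binary search: for each order, find the last order within 90 days, and use a precomputed suffix sum of the insurance flag to test the window in O(1). A sketch in plain Python on hypothetical data mirroring the sample (dates simplified to day offsets from the first order; note that with a strict 90-day cutoff, order 3 picks up only order 4, since 11/6 is 95 days after 8/3, so that row differs from the sample table):

```python
from bisect import bisect_right

# One customer's orders: (order_id, day_offset, has_insurance),
# already sorted by date. Day offsets are computed from the sample
# dates, taking 7/20 as day 0.
orders = [(1, 0, 0), (2, 13, 0), (3, 14, 1), (4, 20, 1),
          (5, 109, 0), (6, 135, 0), (7, 139, 0), (8, 140, 0)]

days = [d for _, d, _ in orders]
# Suffix sums of the insurance flag: ins_suffix[i] counts insured
# orders at positions i..end, so any window is a subtraction.
ins_suffix = [0] * (len(orders) + 1)
for i in range(len(orders) - 1, -1, -1):
    ins_suffix[i] = ins_suffix[i + 1] + orders[i][2]

result = []
for i, (oid, day, _) in enumerate(orders):
    j = bisect_right(days, day + 90)   # one past the last order in window
    repeat_count = j - i - 1
    has_ins = 1 if ins_suffix[i + 1] - ins_suffix[j] > 0 else 0
    result.append((oid, repeat_count, has_ins))
print(result)
```

The same shape translates to SQL window functions with a RANGE frame (e.g. `COUNT(*) OVER (PARTITION BY customer_id ORDER BY bk_date RANGE BETWEEN CURRENT ROW AND INTERVAL '90 days' FOLLOWING) - 1` on engines that support range frames over dates), which avoids the self join entirely.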

In UniQuery, how do you get the count of unique values found while doing a BREAK.ON

I know I can get the counts for how many individual entries are in each unique group of records with the following.
LIST CUSTOMER BREAK-ON CITY TOTAL EVAL "1" COL.HDG "Customer Count" TOTAL CUR_BALANCE BY CITY
And I end up with something like this.
Cust...... City...... Customer Count Currently Owes
6 Arvada 1 4.54
********** -------------- --------------
Arvada 1 4.54
190 Boulder 1 0.00
1 Boulder 1 13.65
********** -------------- --------------
Boulder 2 13.65
...
============== ==============
TOTAL 29 85.28
29 records listed
Which becomes this, after we suppress the details and focus on the groups themselves.
City...... Customer Count Currently Owes
Arvada 1 4.54
Boulder 2 13.65
Chicago 3 4.50
Denver 6 0.00
...
============== ==============
TOTAL 29 85.28
29 records listed
But can I get a count of how many unique groupings there are in the same report? Something like this.
City...... Customer Count Currently Owes City Count
Arvada 1 4.54 1
Boulder 2 13.65 1
Chicago 3 4.50 1
Denver 6 0.00 1
...
============== ============== ==========
TOTAL 29 85.28 17
29 records listed
Essentially, I want the unique value count integrated into the other report so that I don't have to create an extra report just for something so simple.
SELECT CUSTOMER SAVING UNIQUE CITY
17 records selected to list 0.
I swear that this should be easier. I see various # variables in the documentation that hint at the possibility of doing this easily, but I have never been able to get one of them to work.
If your data is structured in such a way that your id is what you would be grouping by, the data you want is stored in a value-delimited field, and you don't want to include or exclude anything, you can use something like the following.
In UniVerse, using the CUSTOMER table in the demo HS.SALES account installed on many systems, you can do this. CUSTID is the record #ID, and attribute 13 is where the PRICE is stored in a value-delimited array.
LIST CUSTOMER BREAK-ON CUSTID TOTAL EVAL "DCOUNT(#RECORD<13>,#VM)" TOTAL PRICE AS P.PRICE BY CUSTID DET.SUP
Which outputs this.
DCOUNT(#RECORD<13>,#
Customer ID VM)................. P.PRICE
1 1 $4,200
2 3 $19,500
3 1 $4,250
4 1 $16,500
5 2 $3,800
6 0 $0
7 2 $5,480
8 2 $12,900
9 0 $0
10 3 $10,390
11 0 $0
12 0 $0
==================== =======
15 $77,020
That is a little juice for a lot of squeeze, but I hope you find it useful.
Good Luck!
Since the system variable #NB is set only on the total lines, this will allow your counter to calculate the number of TOTAL lines, which occur per unique city, excluding the grand total.
LIST CUSTOMER BREAK-ON CITY TOTAL EVAL "IF #NB < 127 THEN 1 ELSE 0" COL.HDG "Customer Count" TOTAL CUR_BALANCE BY CITY
I don't have a system to try this on, but this is my understanding of the variable.

Graphically represent SQL Data

Given a table with the following structure with 11+M transactions.
ID ProductKey CloseDate Part PartAge Sales
1 XXXXP1 5/10/15 P1 13 100
2 XXXXP2 6/1/16 P1 0 15
3 XXXXP3 4/1/08 P1 0 280
4 XXXXP1 3/18/11 P1 0 10
5 XXXXP3 6/29/15 P1 45 15
6 XXXXP1 8/11/13 P1 30 360
Products XXXXP1 and XXXXP3 are entered multiple times since they are resales. PartAge = 0 indicates it's a new sale. So these products went from:
New Sale --> ReSale --> ReSale
Using a self-joining query, I can retrieve all the products which were resales. But is there a way to display these in a pretty graph or tree format?
Something which depicts the life-span of the sale transaction of the product?
Any ideas will be appreciated.
TIA,
B
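For what it's worth, the life-span chains themselves can be assembled without a self join by sorting each product's transactions by date; the result could then feed a charting tool such as Graphviz or matplotlib for the visual part. A minimal sketch in plain Python on the sample rows (grouping by ProductKey is an assumption):

```python
from datetime import datetime

# Sample rows from the question: (ID, ProductKey, CloseDate, PartAge)
rows = [(1, 'XXXXP1', '5/10/15', 13), (2, 'XXXXP2', '6/1/16', 0),
        (3, 'XXXXP3', '4/1/08', 0), (4, 'XXXXP1', '3/18/11', 0),
        (5, 'XXXXP3', '6/29/15', 45), (6, 'XXXXP1', '8/11/13', 30)]

chains = {}
# Walk the transactions in date order; PartAge == 0 marks a new sale,
# anything later in the same product's timeline is a resale.
for rid, key, close, age in sorted(
        rows, key=lambda r: datetime.strptime(r[2], '%m/%d/%y')):
    label = 'New Sale' if age == 0 else 'ReSale'
    chains.setdefault(key, []).append(label)

for key, chain in sorted(chains.items()):
    print(key, ' --> '.join(chain))
```

On this data XXXXP1 comes out as New Sale --> ReSale --> ReSale, matching the life-span described in the question; the per-product chains are exactly the edges a tree or timeline chart would need.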

Sum quantities from 2 different tables where they share an ID from another table

ProdStock
ID_Prod Description
1 tshirt
2 pants
3 hat
Donation
id_dona ID_Prod Quantity
1 1 10
2 2 20
3 1 30
4 3 5
Beneficiation
id_bene ID_Prod Quantity
1 1 -5
2 2 -10
3 1 -15
Table expected
ID_Prod Description Quantity
1 tshirt 20
2 pants 10
3 hat 5
Donation = what is given to the institution
beneficiation= institution gives to people in need
I need to achieve the "Table expected"; I tried with SUM. I don't have much knowledge of SQL, so it would be great if someone could help.
Since I have no idea what database you're actually working with, here is an idea of how you might head in the right direction. Note that each table is aggregated on its own before the join: joining the raw rows directly would pair every donation with every beneficiation for the same product and inflate both sums, and an inner join would drop products (like the hat) that appear in only one of the two tables.
Select ProdStock.ID_Prod, ProdStock.Description,
(Coalesce(D.Qty, 0) + Coalesce(B.Qty, 0)) as Quantity
From ProdStock
Left Join (Select ID_Prod, Sum(Quantity) as Qty
From Donation Group By ID_Prod) as D
on ProdStock.ID_Prod = D.ID_Prod
Left Join (Select ID_Prod, Sum(Quantity) as Qty
From Beneficiation Group By ID_Prod) as B
on ProdStock.ID_Prod = B.ID_Prod;
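A quick way to sanity-check a query like this is to run it against the sample rows in an in-memory SQLite database (a sketch; same table and column names as in the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE ProdStock (ID_Prod INTEGER, Description TEXT);
CREATE TABLE Donation (id_dona INTEGER, ID_Prod INTEGER, Quantity INTEGER);
CREATE TABLE Beneficiation (id_bene INTEGER, ID_Prod INTEGER, Quantity INTEGER);
INSERT INTO ProdStock VALUES (1,'tshirt'),(2,'pants'),(3,'hat');
INSERT INTO Donation VALUES (1,1,10),(2,2,20),(3,1,30),(4,3,5);
INSERT INTO Beneficiation VALUES (1,1,-5),(2,2,-10),(3,1,-15);
""")

# Aggregate each table separately before joining, so a product with
# several donation AND beneficiation rows is not double-counted, and
# LEFT JOIN keeps products that appear in only one table (the hat).
rows = cur.execute("""
SELECT p.ID_Prod, p.Description,
       COALESCE(d.qty, 0) + COALESCE(b.qty, 0) AS Quantity
FROM ProdStock p
LEFT JOIN (SELECT ID_Prod, SUM(Quantity) AS qty
           FROM Donation GROUP BY ID_Prod) d ON p.ID_Prod = d.ID_Prod
LEFT JOIN (SELECT ID_Prod, SUM(Quantity) AS qty
           FROM Beneficiation GROUP BY ID_Prod) b ON p.ID_Prod = b.ID_Prod
ORDER BY p.ID_Prod
""").fetchall()
print(rows)  # should reproduce the "Table expected"
```

On this data it yields tshirt 20, pants 10, hat 5, matching the expected table.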