PostgreSQL how to find row IDs based on aggregated metrics - sql

I have a table in a PostgreSQL database that looks more or less like this:
ID | Side | Amount | Price
---+------+--------+------------------
 1 | BUY  |      8 | 107295.000000000
 2 | SELL |     18 | 107300.000000000
 3 | SELL |     21 | 107305.000000000
 4 | BUY  |     17 | 107310.000000000
And I have some aggregated metrics that look like this:
{'BUY': {'amount_sum': 6655, 'price_avg': 105961.497370398197}, 'SELL': {'amount_sum': 6655, 'price_avg': 106214.787377911345}}
And I need to find the row IDs that match these metrics. How would I go about doing this?
I've read a bit of the PostgreSQL documentation and tried using GROUP BY on Side with a HAVING clause, but wasn't successful.
===================================================
To clarify, given this table and input:
ID | Side | Amount | Price
---+------+--------+------
 1 | BUY  |      2 |     1
 2 | SELL |      1 |     2
 3 | SELL |      2 |     1
 4 | BUY  |      1 |     3
 5 | SELL |      8 |     1
 6 | BUY  |      5 |     2
{'BUY': {'amount_sum': 3, 'price_avg': 2}, 'SELL': {'amount_sum': 10, 'price_avg': 1}}
I would expect the output to be:
BUY: ids [1, 4], SELL: ids [3, 5]
That's because for ids 1 and 4, which have Side = BUY, the sum of the Amount column is 3 and the average of the Price column is 2; and for ids 3 and 5, which have Side = SELL, the sum of the Amount column is 10 and the average of the Price column is 1.

@Gabriel Tkacz, I don't have the 50 reputation needed to comment, so I'm asking my question as an "answer".
From the input table I would have expected:
{'BUY': {'Ids': [1, 4, 6], 'amount_sum': 8, 'price_avg': 2}}
{'SELL': {'Ids': [2, 3, 5], 'amount_sum': 11, 'price_avg': 1.3333333333333333}}
Why are [6] excluded on the BUY side and [2] on the SELL side in your explanation?
BUY: ids[1,4] SELL: ids[3,5]
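
The reason ids 6 and 2 are excluded is that the metrics describe a target subset of each side, not the whole side, and GROUP BY/HAVING can only test aggregates computed over all rows per group. That makes this a search problem rather than a grouping problem. Below is a minimal brute-force sketch in Python rather than SQL (row data hard-coded from the small example above; names are illustrative, and the exponential search is only viable for small candidate sets):

from itertools import combinations

# Rows from the small example table: (id, side, amount, price)
rows = [
    (1, 'BUY', 2, 1), (2, 'SELL', 1, 2), (3, 'SELL', 2, 1),
    (4, 'BUY', 1, 3), (5, 'SELL', 8, 1), (6, 'BUY', 5, 2),
]
targets = {'BUY': {'amount_sum': 3, 'price_avg': 2},
           'SELL': {'amount_sum': 10, 'price_avg': 1}}

def matching_ids(rows, side, amount_sum, price_avg, tol=1e-9):
    candidates = [r for r in rows if r[1] == side]
    # Try every subset size; several subsets may match, this returns the first.
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            if (sum(r[2] for r in combo) == amount_sum
                    and abs(sum(r[3] for r in combo) / k - price_avg) <= tol):
                return [r[0] for r in combo]
    return None

for side, t in targets.items():
    print(side, matching_ids(rows, side, t['amount_sum'], t['price_avg']))
# BUY [1, 4]
# SELL [3, 5]

Against the example this prints [1, 4] for BUY and [3, 5] for SELL; note that several different subsets can satisfy the same aggregates, and this sketch only returns the first one found.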

Related

increase rank based on particular value in column

I would appreciate some help for below issue. I have below table
id | items
---+--------
 1 | Product
 2 | Tea
 3 | Coffee
 4 | Sugar
 5 | Product
 6 | Rice
 7 | Wheat
 8 | Product
 9 | Beans
10 | Oil
I want output like below. Basically, I want to increase the rank when the item is 'Product'. May I know how I can do that? For data privacy and compliance purposes I have modified the data and column names.
id | items   | ranks
---+---------+------
 1 | Product |     1
 2 | Tea     |     1
 3 | Coffee  |     1
 4 | Sugar   |     1
 5 | Product |     2
 6 | Rice    |     2
 7 | Wheat   |     2
 8 | Product |     3
 9 | Beans   |     3
10 | Oil     |     3
I have tried the LAG and LEAD functions but was unable to get the expected output.
Here is a solution using a derived value of 1 or 0 to denote data boundaries, SUMmed up with the ROWS UNBOUNDED PRECEDING option, which is key here.
SELECT
    id,
    items,
    SUM(CASE WHEN items = 'Product' THEN 1 ELSE 0 END)
        OVER (ORDER BY id ROWS UNBOUNDED PRECEDING) AS ranks
FROM your_table;  -- substitute your actual table name
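
For comparison, here is the same boundary-flag-plus-running-sum idea in pandas; a sketch, assuming a DataFrame df holding the id and items columns from the example above:

import pandas as pd

df = pd.DataFrame({
    'id': range(1, 11),
    'items': ['Product', 'Tea', 'Coffee', 'Sugar', 'Product',
              'Rice', 'Wheat', 'Product', 'Beans', 'Oil'],
})

# Flag each 'Product' row with 1, then take a running sum down the column,
# which bumps the rank at every boundary row.
df['ranks'] = df['items'].eq('Product').astype(int).cumsum()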

SQL - Sum a row of values based on Dates

I have the following type of data.
Ingredients:
Milk
Apple
Rice
...
Then their purchase dates:
26.10.2020
25.10.2020
etc.
Each item is recorded when it is purchased.
I now want a column at the right-hand side showing how many times I bought apples, rice & milk.
As of now I only see:

Dates --->   25.10.2020 | 24.10.2020
Rice                  1 | NULL
Milk                  1 | 1
Apples             NULL | 1

My goal is to see:

Dates --->   25.10.2020 | 24.10.2020 | SUM
Rice                  1 | NULL       |   1
Milk                  1 | 1          |   2
Apples             NULL | 1          |   1
Thank you for your support!
[Screenshot of the example data.]
Now I want to see the total SUM at the end, as there would be multiple days.
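
The ask is essentially a pivot with a row total appended. A minimal pandas sketch of the idea (the purchase-log data and names here are invented for illustration, since the original table isn't shown):

import pandas as pd

# Hypothetical purchase log: one row per ingredient per purchase date.
purchases = pd.DataFrame({
    'ingredient': ['Rice', 'Milk', 'Milk', 'Apples'],
    'date': ['25.10.2020', '25.10.2020', '24.10.2020', '24.10.2020'],
})

# One row per ingredient, one column per date, counts in the cells...
pivot = pd.crosstab(purchases['ingredient'], purchases['date'])
# ...then the row total appended as a SUM column.
pivot['SUM'] = pivot.sum(axis=1)
print(pivot)

(crosstab shows 0 where the question shows NULL, but the SUM column comes out the same.)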

In UniQuery, how do you get the count of unique values found while doing a BREAK.ON

I know I can get the counts for how many individual entries are in each unique group of records with the following.
LIST CUSTOMER BREAK-ON CITY TOTAL EVAL "1" COL.HDG "Customer Count" TOTAL CUR_BALANCE BY CITY
And I end up with something like this.
Cust...... City...... Customer Count Currently Owes
6          Arvada                  1           4.54
           ********** -------------- --------------
           Arvada                  1           4.54
190        Boulder                 1           0.00
1          Boulder                 1          13.65
           ********** -------------- --------------
           Boulder                 2          13.65
...
                      ============== ==============
           TOTAL                   29          85.28
29 records listed
Which becomes this, after we suppress the details and focus on the groups themselves.
City...... Customer Count Currently Owes
Arvada                  1           4.54
Boulder                 2          13.65
Chicago                 3           4.50
Denver                  6           0.00
...
           ============== ==============
TOTAL                  29          85.28
29 records listed
But can I get a count of how many unique groupings are in the same report? Something like this.
City...... Customer Count Currently Owes City Count
Arvada                  1           4.54          1
Boulder                 2          13.65          1
Chicago                 3           4.50          1
Denver                  6           0.00          1
...
           ============== ============== ==========
TOTAL                  29          85.28         17
29 records listed
Essentially, I want the unique value count integrated into the other report so that I don't have to create an extra report just for something so simple.
SELECT CUSTOMER SAVING UNIQUE CITY
17 records selected to list 0.
I swear that this should be easier. I see various # variables in the documentation that hint at the possibility of doing this easily, but I have never been able to get one of them to work.
If your data is structured in such a way that your id is what you would be grouping by, the data you want is stored in a value-delimited field, and you don't want to include or exclude anything, you can use something like the following.
In UniVerse, using the CUSTOMER table in the demo HS.SALES account installed on many systems, you can do this. The CUSTID is the record #ID, and attribute 13 is where the PRICE is stored in a value-delimited array.
LIST CUSTOMER BREAK-ON CUSTID TOTAL EVAL "DCOUNT(#RECORD<13>,#VM)" TOTAL PRICE AS P.PRICE BY CUSTID DET.SUP
Which outputs this.
Customer ID  DCOUNT(#RECORD<13>,#VM).................  P.PRICE
1            1                                          $4,200
2            3                                         $19,500
3            1                                          $4,250
4            1                                         $16,500
5            2                                          $3,800
6            0                                              $0
7            2                                          $5,480
8            2                                         $12,900
9            0                                              $0
10           3                                         $10,390
11           0                                              $0
12           0                                              $0
             ====================                      =======
             15                                        $77,020
That is a little juice for a lot of squeeze, but I hope you find it useful.
Good Luck!
Since the system variable #NB is set only on the total lines, this will allow your counter to calculate the number of TOTAL lines, which occur per unique city, excluding the grand total.
LIST CUSTOMER BREAK-ON CITY TOTAL EVAL "IF #NB < 127 THEN 1 ELSE 0" COL.HDG "Customer Count" TOTAL CUR_BALANCE BY CITY
I don't have a system to try this on, but this is my understanding of the variable.
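
The trick in that last answer is to emit a 1 once per unique city (on each subtotal line) and let the grand total add them up. For comparison, a rough pandas rendering of that idea, not UniQuery, with invented column names:

import pandas as pd

customers = pd.DataFrame({
    'city': ['Arvada', 'Boulder', 'Boulder', 'Chicago'],
    'cur_balance': [4.54, 0.00, 13.65, 4.50],
})

# One subtotal row per city: customer count plus balance owed.
report = customers.groupby('city').agg(
    customer_count=('cur_balance', 'size'),
    currently_owes=('cur_balance', 'sum'),
)
report['city_count'] = 1  # one 1 per unique city, like the EVAL on each subtotal line
print(report)
print('TOTAL city count:', report['city_count'].sum())  # == customers['city'].nunique()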

How to count the ID with the same prefix and store the total number in another column

I have a dataset in which I noticed that the ID comes with info for classification. Basically, the last 2 digits of the ID stand for its sub-ID (01, 02, 03, etc.) in the same family. Below is an example. I am trying to get another column (the 2nd column) to store how many sub-IDs there are in the same family; e.g., 22302 belongs to family 223, which has 3 members: 22301, 22302, and 22303. That gives me a new feature for classification modeling. Not sure if there is a better way to extract this information. Anyway, can someone let me know how to compute the count for each class (as shown in the 2nd column)?
ID     | Same class
-------+-----------
23401  | 1
22302  | 3
43201  | 1
144501 | 2
144502 | 2
22301  | 3
22303  | 3
You can do it with a str slice and transform:
# Group by the ID minus its last two digits (the family) and count members
df['New'] = df.groupby(df.ID.astype(str).str[:-2]).ID.transform('size')
df
Out[223]:
       ID  Sameclass  New
0   23401          1    1
1   22302          3    3
2   43201          1    1
3  144501          2    2
4  144502          2    2
5   22301          3    3
6   22303          3    3
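
A self-contained version of the same approach, with the DataFrame built from the example values so it can be run directly:

import pandas as pd

df = pd.DataFrame({'ID': [23401, 22302, 43201, 144501, 144502, 22301, 22303]})

# Family key = the ID with its last two digits (the sub-ID) stripped off.
family = df['ID'].astype(str).str[:-2]
df['New'] = df.groupby(family)['ID'].transform('size')
print(df)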

Trying to query certain groups that satisfy a value_count condition

I'm playing around with stock data, and I'm trying to filter for the groups that have more buys than sells with respect to their Transaction values.
So the code I'm running to display the below data is
df.groupby('Stock').Transaction.value_counts()
Stock  Transaction
ADC    Buy                 2
AKAM   Option Exercise    51
       Sale               34
       Buy                 9
AMNB   Buy                10
ARCC   Buy                15
ARL    Buy                12
ASA    Buy                 7
ASRV   Buy                12
       Option Exercise     1
AUBN   Buy                 4
       Sale               11
BAC    Option Exercise    23
       Buy                15
       Sale                7
BCBP   Buy                 3
       Sale               11
BKSC   Buy                55
BMRA   Buy                 5
       Option Exercise     3
       Sale                1
..
I'm grouping the data by stock ticker and then looking at each group's Transaction value counts. I'm trying to filter down to the groups whose Transaction value counts have more Buy than Sale, but I can't figure out how to do this.
I tried something like this:
df.groupby('Stock').filter(lambda x: x.Transaction.value_counts().Buy > x.value_counts().Sale)
which oddly doesn't work despite this working:
df.Transaction.value_counts().Buy
>>>2674
I also tried things along the lines of
df.groupby('Stock').Transaction.filter(lambda x: x if x.value_counts().Buy > x.value_counts().Sale)
But I can't think of which pandas tools are ideal in this case.
The output can be anything from just the names of the stocks that satisfy this condition to the entire group (stock name and Transaction counts) printed out. So the output would be something like this:
ADC    Buy                 2
AMNB   Buy                10
ARCC   Buy                15
ARL    Buy                12
ASA    Buy                 7
ASRV   Buy                12
       Option Exercise     1
BAC    Option Exercise    23
       Buy                15
       Sale                7
BKSC   Buy                55
BMRA   Buy                 5
       Option Exercise     3
       Sale                1
Or just the stock names.
Thanks.
I'd unstack, then query:
d1 = df.groupby('Stock').Transaction.value_counts()
d1.unstack(fill_value=0).query('Buy > Sale')
We can get it back all nice and tidy with this:
import numpy as np

d1.unstack(fill_value=0).query('Buy > Sale') \
    .replace(0, np.nan).stack().astype(int)
Stock  Transaction
ADC    Buy                 2
AMNB   Buy                10
ARCC   Buy                15
ARL    Buy                12
ASA    Buy                 7
ASRV   Buy                12
       Option Exercise     1
BAC    Buy                15
       Option Exercise    23
       Sale                7
BKSC   Buy                55
BMRA   Buy                 5
       Option Exercise     3
       Sale                1
dtype: int64
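
A self-contained sketch of the same approach (data invented for illustration), with a guard for the edge case where 'Buy' or 'Sale' never occurs at all, since query('Buy > Sale') would then raise on the missing column:

import pandas as pd

df = pd.DataFrame({
    'Stock': ['ADC', 'ADC', 'AUBN', 'AUBN', 'AUBN', 'BAC', 'BAC', 'BAC'],
    'Transaction': ['Buy', 'Buy', 'Buy', 'Sale', 'Sale', 'Buy', 'Buy', 'Sale'],
})

# Counts per (Stock, Transaction), spread into one column per transaction type.
wide = df.groupby('Stock').Transaction.value_counts().unstack(fill_value=0)

# Add an all-zero column if a transaction type is absent from the data.
for col in ('Buy', 'Sale'):
    if col not in wide.columns:
        wide[col] = 0

print(wide.query('Buy > Sale').index.tolist())  # ['ADC', 'BAC']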