Grouping twice by different criteria, same column - sql

I have data with the following columns:
OFFICER_ID, CLIENT_ID, SECURITY_CODE, POSITION_SIZE
and then per each row for instance:
officer1, client100, securityZYX, $100k,
officer2, client124, securityADF, $200k,
officer1, client130, securityARR, $150k,
officer4, client452, securityADF, $200k,
officer2, client124, securityARR, $500k,
officer7, client108, securityZYX, $223k,
and so on.
As you can see, each client has a single officer assigned to buy or sell securities, but each client can have bought different securities.
Apart from ranking officers by the total US$ amount of securities held by their clients (which I've done), I need to bucket client accounts by the total securities held per client ID: for example, total held < $1 million, between $1 million and $3 million, and > $3 million.
I've tried:
SELECT officer_ID, SUM(position_size) as AUM
FROM trades
GROUP BY client_ID
HAVING AUM > 1000000 AND AUM < 3000000;
and I get a list of all officers appearing several times, no totals.
I'd need a simple:
Officer_ID | range < 1m | range 1m-3m | range > 3m
officer1, [total amount of client accounts with securities adding up < 1m totals], [total amount of client accounts with securities adding up between 1m and 3m totals], etc.
Please, would you point me in the right direction?
UPDATE
I modified Tim's suggested code and obtained the desired output:
SELECT
OFFICER_ID,
SUM(CASE WHEN total < 1000000 THEN total END) AS "range < 1m",
SUM(CASE WHEN total >= 1000000 AND total < 3000000 THEN total END) AS "range 1m-3m",
SUM(CASE WHEN total >= 3000000 THEN total END) AS "range > 3m"
FROM
(
SELECT OFFICER_ID, CLIENT_ID, SUM(POSITION_SIZE) AS total
FROM trades
GROUP BY OFFICER_ID, CLIENT_ID
) t
GROUP BY
OFFICER_ID;
Too kind, Tim, thanks!

We can try aggregating twice, first by both officer and client, to get the client totals, and a second time by officer alone, to get the counts:
SELECT
OFFICER_ID,
COUNT(CASE WHEN total < 1000000 THEN 1 END) AS "range < 1m",
COUNT(CASE WHEN total >= 1000000 AND total < 3000000 THEN 1 END) AS "range 1m-3m",
COUNT(CASE WHEN total >= 3000000 THEN 1 END) AS "range > 3m"
FROM
(
SELECT OFFICER_ID, CLIENT_ID, SUM(POSITION_SIZE) AS total
FROM trades
GROUP BY OFFICER_ID, CLIENT_ID
) t
GROUP BY
OFFICER_ID;
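To see the two-level aggregation on concrete rows, here is a minimal sketch that runs the same query shape in an in-memory SQLite database from Python. The sample rows, the short column aliases, and the function name `clients_per_range` are made up for illustration; only the `trades` table and its column names come from the question.

```python
import sqlite3

def clients_per_range():
    """Count clients per officer in each total-holdings range (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE trades (
            officer_id TEXT, client_id TEXT, security_code TEXT, position_size INTEGER
        );
        -- Hypothetical sample rows, not the asker's real data.
        INSERT INTO trades VALUES
            ('officer1', 'client100', 'securityZYX',  100000),
            ('officer1', 'client100', 'securityARR',  500000),
            ('officer1', 'client130', 'securityARR', 1500000),
            ('officer1', 'client130', 'securityADF',  700000),
            ('officer2', 'client124', 'securityADF',  200000),
            ('officer2', 'client124', 'securityARR',  500000),
            ('officer2', 'client200', 'securityZYX', 3500000);
    """)
    rows = con.execute("""
        SELECT officer_id,
               COUNT(CASE WHEN total < 1000000 THEN 1 END) AS lt_1m,
               COUNT(CASE WHEN total >= 1000000 AND total < 3000000 THEN 1 END) AS from_1m_to_3m,
               COUNT(CASE WHEN total >= 3000000 THEN 1 END) AS ge_3m
        FROM (
            -- Inner level: one row per officer and client with the client's total.
            SELECT officer_id, client_id, SUM(position_size) AS total
            FROM trades
            GROUP BY officer_id, client_id
        ) t
        -- Outer level: collapse the client totals into per-officer counts.
        GROUP BY officer_id
        ORDER BY officer_id
    """).fetchall()
    con.close()
    return rows

print(clients_per_range())
```

Swapping `COUNT(... THEN 1 END)` for `SUM(... THEN total END)` gives the summed amounts per range instead of the number of clients, which is the variant shown in the question's update.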

Related

Window function or Recursive query in Redshift

I'm trying to classify customers based on monthly sales. The logic for the calculation:
new customer – has sales and appears for the first time, or has sales and returned after being lost (after a 4-month period, based on sales_date)
active customer – not new and not lost.
lost customer – no sales and a period (based on sales_date) of more than 4 months
This is the desired output I'm trying to achieve:
The window function below classifies customers in Redshift, but the result is not correct.
It classifies a row as lost when the difference between months is > 4, but it does not keep classifying the customer as lost on later rows where the customer was already lost and revenue stays 0 until a new status appears. How can it be updated?
with customer_status as (
select customer_id,customer_name,sales_date,sum(sales_amount) as sales_amount_calc,
nvl(DATEDIFF(month, LAG(reporting_date) OVER (PARTITION BY customer_id ORDER by sales_date ASC), sales_date),0) AS months_between_sales
from customer
GROUP BY customer_id,customer_name,sales_date
)
select *,
case
WHEN months_between_sales = 0 THEN 'New'
WHEN months_between_sales > 0 THEN 'Active'
WHEN months_between_sales > 0 AND months_between_sales <= 4 and sales_amount_calc = 0 THEN 'Active'
WHEN /* months_between_sales > 0 and */ months_between_sales > 4 and sales_amount_calc = 0 THEN 'Lost'
ELSE 'Unknown'
END AS status
from customer_status
One way to solve this might be to get a cumulative sales amount partitioned on the subset of potentially lost customers (sales_amount = 0).
Cumulative amount partitioned by customer:
sum(months_between_sales) over (PARTITION BY customer_id ORDER by sales_date ASC rows unbounded preceding) as cumulative_amount,
How can it be changed to sub-partition, for example where sales amount = 0, so that lost is classified correctly?
Does anyone have ideas for translating the above logic into a recursive query in Redshift?
Thank you

sql multiplying a sum

I'm struggling with the code below. I'm trying to work out the total balance of a folio from its total charges and total payments. I'm almost there, but with the code below the payments total is incorrect: it seems to multiply the total sum of payments by the number of entries in charges. I'm guessing this is because I've joined payments to charges, which I needed to do in order to filter on the check-in date, which exists only on PMS sales.
can anyone help?
thanks
SELECT DISTINCT 'PMS AD' AS APP,
FOLIO_ID,
SUM(TOTAL_CHARGES) as TOTAL_CHARGES,
SUM(TOTAL_PAYMENTS) as TOTAL_PAYMENTS
FROM ((SELECT DISTINCT P1.FOLIO_ID AS FOLIO_ID, SUM(P1.CHARGE_CODE_AMOUNT) AS TOTAL_CHARGES, 0 AS TOTAL_PAYMENTS
FROM DEV.VR_PMS_SALES P1
WHERE P1.CHARGE_CODE_AMOUNT <> 0 AND
P1.ITEM_OPERATING_DAY IS NOT NULL
AND P1.ITEM_OPERATING_DAY <= '03-DEC-2014'
AND P1.CHECKIN_DATE <= '03-DEC-2014'
GROUP BY P1.FOLIO_ID
) UNION ALL
(SELECT DISTINCT P2.FOLIO_ID AS FOLIO_ID, 0 AS TOTAL_CHARGES, SUM(P2.AMOUNT) AS TOTAL_PAYMENTS
FROM DEV.VR_PMS_PAYMENTS P2,
DEV.VR_PMS_SALES P3
WHERE P2.FOLIO_ID = P3.FOLIO_ID
AND P2.AMOUNT <> 0
AND P2.PMS_OPERATING_DAY <= '03-DEC-2014'
AND P3.CHECKIN_DATE <= '03-DEC-2014'
GROUP BY P2.FOLIO_ID
)
) F
GROUP BY FOLIO_ID
EDIT:
Sorry I didn't provide examples. Table data below:
VR_PMS_PAYMENTS
VR_PMS_SALES
The issue I am having is that when I run the SQL, it multiplies the sum of P2.AMOUNT by the number of entries in VR_PMS_SALES. E.g. folio 4 returns 165 instead of 55. I need it to return the below...
desired outcome
I hope this is clearer.
thank you
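The multiplication described above is typical of join fan-out: each payment row is repeated once per matching sales row before SUM runs. A minimal sketch, using an in-memory SQLite database from Python, that reproduces the 165 versus 55 result for folio 4 and shows one possible fix, an EXISTS filter instead of a join. The simplified tables `payments` and `sales`, the sample rows, the ISO date strings, and the function name `payment_totals` are all assumptions made for illustration; the real views may need a different filter.

```python
import sqlite3

def payment_totals():
    """Return (sum via join, sum via EXISTS) for folio 4 on hypothetical data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE payments (folio_id INTEGER, amount REAL);
        CREATE TABLE sales (folio_id INTEGER, charge REAL, checkin_date TEXT);
        -- Hypothetical rows: two payments totalling 55, three sales entries.
        INSERT INTO payments VALUES (4, 30), (4, 25);
        INSERT INTO sales VALUES
            (4, 10, '2014-12-01'),
            (4, 20, '2014-12-01'),
            (4,  5, '2014-12-01');
    """)
    # Joining repeats every payment once per sales row, so the sum is tripled.
    joined = con.execute("""
        SELECT SUM(p.amount)
        FROM payments p
        JOIN sales s ON s.folio_id = p.folio_id
        WHERE s.checkin_date <= '2014-12-03'
    """).fetchone()[0]
    # EXISTS only tests whether a matching sales row exists; no rows are duplicated.
    filtered = con.execute("""
        SELECT SUM(p.amount)
        FROM payments p
        WHERE EXISTS (
            SELECT 1 FROM sales s
            WHERE s.folio_id = p.folio_id
              AND s.checkin_date <= '2014-12-03'
        )
    """).fetchone()[0]
    con.close()
    return joined, filtered

print(payment_totals())  # (165.0, 55.0)
```

The same idea would apply inside the second branch of the UNION ALL in the question: keep the check-in date condition, but express it as a subquery filter rather than a join to the sales view.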

Account balance of more than 1000 over x days consecutively

I have a transaction database with the following fields. The requirement is to alert the admin of a user if the user holds a balance of USD 1000 for more than 30 consecutive days.
TransactionID
TransactionType - deposit, withdrawal, transfer
TransactionFromUserID
TransactionToUserID
TransactionValueUsd
TransactionDateTime
Note:
- Currently I only have this table and do not have another table that stores the balance. The balance is calculated on the fly.
- If the balance is not more than 1000 USD on any one of the days, the count has to start over.
- No need to worry about performance. I just need a general idea of how I should design another table to hold the value, and maybe a trigger, to solve this issue.
eg:
2019-01-01: deposit 500 usd
2019-02-01: deposit 2000 usd - balance 2500 usd, start count from here
2019-02-10: withdraw 2500 usd - balance 500, reset date
2019-02-15: deposit 2000 usd - balance 2500 - start exceed date again here
2019-04-15: withdraw 1000 usd - balance 1500 - flag here and reset last exceed date
Your question is not 100% clear, but I think this query is close enough to the answer:
select
*
from (
select
transactiontouserid,
transactiondatetime,
sum(case when balance < 1000 then 1 else 0 end)
over(
partition by transactiontouserid
order by transactiondatetime
range between interval '30' day preceding and current row
) as disqualify_points
from (
select
transactiontouserid,
transactiondatetime,
sum(case when transactiontype = 'deposit' then 1 else -1 end
* transactionvalueusd)
over(
partition by transactiontouserid
order by transactiondatetime
) as balance
from t -- your table name
) x
) y
where disqualify_points = 0
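The running-balance idea can also be traced by hand on the example from the question. Below is a minimal Python sketch for a single user, not a translation of the query above. Several things are assumptions: transactions are given as (date, signed amount) pairs, "holding 1000" is read as balance >= 1000, the function name `long_streaks` is made up, and the 2019-02-10 withdrawal is taken as 2000 USD so that the running balance matches the balances stated in the question (500, 2500, 1500).

```python
from datetime import date

def long_streaks(transactions, as_of, threshold=1000, min_days=30):
    """Return (start, end) pairs where the balance stayed >= threshold for more than min_days."""
    balance = 0
    streak_start = None  # date on which the balance last reached the threshold
    flagged = []
    for day, amount in sorted(transactions):
        balance += amount
        if streak_start is None and balance >= threshold:
            streak_start = day  # start counting from here
        elif streak_start is not None and balance < threshold:
            # Balance dropped below the threshold: report the streak if long enough, then reset.
            if (day - streak_start).days > min_days:
                flagged.append((streak_start, day))
            streak_start = None
    # A streak that is still open is measured up to the evaluation date.
    if streak_start is not None and (as_of - streak_start).days > min_days:
        flagged.append((streak_start, as_of))
    return flagged

# Example from the question, as signed amounts (deposit positive, withdrawal negative).
example = [
    (date(2019, 1, 1), 500),
    (date(2019, 2, 1), 2000),
    (date(2019, 2, 10), -2000),  # assumed amount, see the note above
    (date(2019, 2, 15), 2000),
    (date(2019, 4, 15), -1000),
]
print(long_streaks(example, as_of=date(2019, 4, 15)))
```

On this data the only streak longer than 30 days runs from 2019-02-15 to 2019-04-15, which matches the "flag here" line in the question. A stored "last exceed date" column per user, updated on each transaction, would play the role of `streak_start`.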

How can I return a count of how many orders fall within incremental ranges?

I have columns called order_id and purchase_amount, and I need to write a query to count how many orders fall within each incremental range of $100, along with the values of the range. For example, it has to return something like 12 orders are between $0-100, 9 orders are between $101-200 and continuing on that way, increasing by $100 each time, like below. And I'm stumped how to begin.
Count | Range
12 | $0-100
9 | $101-200
Look at using the SQL function SUM combined with CASE to build a conditional count.
Select SUM(CASE WHEN OrderValue >= 0 AND OrderValue < 100 THEN 1 ELSE 0 END) AS Count FROM Table;
That should give you a starting point.
You didn't provide a sample of the source table data, so I used the standard Orders table from TPC-H to build the following:
select
case
when o_totalprice < 1000 then '1k-'
when o_totalprice >= 1000 and o_totalprice < 5000 then '1k-5k'
when o_totalprice >= 5000 and o_totalprice < 10000 then '5k-10k'
when o_totalprice >= 10000 and o_totalprice < 20000 then '10k-20k'
when o_totalprice >= 20000 and o_totalprice < 50000 then '20k-50k'
else '50k+'
end as 'Range',
count(*) as 'Range-Count'
from
orders
group by 1
;
Hope this answers your question.
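Since the ranges in the question all have the same width, the bucket can also be computed instead of listed in a CASE. A minimal sketch, run against an in-memory SQLite database from Python, that groups by the lower bound of each $100 range. The table name `orders`, the sample amounts, and the function name `order_counts_by_range` are assumptions, and the ranges here are half-open ([0, 100), [100, 200), ...) rather than the $0-100 / $101-200 labels in the question.

```python
import sqlite3

def order_counts_by_range(width=100):
    """Return (range_start, order_count) rows for fixed-width ranges (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE orders (order_id INTEGER, purchase_amount REAL);
        -- Hypothetical sample rows.
        INSERT INTO orders VALUES (1, 20), (2, 99), (3, 100), (4, 150), (5, 250);
    """)
    rows = con.execute("""
        -- Truncating amount / width to an integer gives the bucket number;
        -- multiplying back by the width gives the lower bound of the range.
        SELECT CAST(purchase_amount / :w AS INTEGER) * :w AS range_start,
               COUNT(*) AS order_count
        FROM orders
        GROUP BY range_start
        ORDER BY range_start
    """, {"w": width}).fetchall()
    con.close()
    return rows

print(order_counts_by_range())  # [(0, 2), (100, 2), (200, 1)]
```

Ranges with no orders do not appear in the result; producing them would need a separate list of buckets to join against.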

Query to find and average weighted price for day trades

I found this old question, which offers a nice approach for calculating weighted average prices. It basically consists of grouping by the stock name and then computing sum(quantity*price)/sum(quantity)
But in day trades you buy and sell the asset in the same day, meaning that the final quantity of the day is zero and sql returns: Divide by zero error encountered
An example would be 3 trades for the same stock:
1. Price 10 Quantity 100
2. Price 8 Quantity 100
3. Price 30 Quantity 200
Do you know of a workaround? Is there a way to group trades with positive and negative quantities separately?
Sure, add a grouping bucket defined by the sign of the amount...
Select assetIdentifier,
case when amount > 0 then 'debit' else 'credit' end typeTx,
Avg(Amount)
from table
group by assetIdentifier,
case when amount > 0 then 'debit' else 'credit' end
or, if you want both values on a single output row,
Select assetIdentifier,
avg(case when amount > 0 then amount end) debit ,
avg(case when amount < 0 then amount end) credit
from table
group by assetIdentifier
The formula for the weighted average is:
sum(quantity*price)/sum(quantity)
------------------------^ NOT price
If you want to ignore the direction of the trade, then just use absolute value:
sum(abs(quantity)*price)/sum(abs(quantity))
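A minimal sketch of both ideas on the three trades from the question, run in an in-memory SQLite database from Python. It assumes that sells are stored with a negative quantity (so the third trade is -200 and the day nets to zero), and the table name `trades`, the stock name, and the function name `weighted_prices` are made up for illustration.

```python
import sqlite3

def weighted_prices():
    """Return (direction-free average, buy average, sell average) for one stock."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE trades (stock TEXT, price REAL, quantity INTEGER);
        -- Trades from the question; the sell is assumed to carry a negative quantity.
        INSERT INTO trades VALUES
            ('ABC', 10, 100),
            ('ABC',  8, 100),
            ('ABC', 30, -200);
    """)
    row = con.execute("""
        SELECT
            -- Ignore trade direction: weight every trade by its absolute size.
            SUM(ABS(quantity) * price) / SUM(ABS(quantity)) AS avg_any,
            -- Weighted average over buys only (positive quantities).
            SUM(CASE WHEN quantity > 0 THEN quantity * price END)
                / SUM(CASE WHEN quantity > 0 THEN quantity END) AS avg_buy,
            -- Weighted average over sells only (negative quantities).
            SUM(CASE WHEN quantity < 0 THEN quantity * price END)
                / SUM(CASE WHEN quantity < 0 THEN quantity END) AS avg_sell
        FROM trades
        GROUP BY stock
    """).fetchone()
    con.close()
    return row

print(weighted_prices())  # (19.5, 9.0, 30.0)
```

Each denominator sums only quantities of one sign (or absolute values), so it cannot cancel to zero the way sum(quantity) does when the day's position is flat.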