PostgreSQL average for every field - sql

I'm trying to calculate an average. The query below works, but only for one explicitly specified ID (2958 in this case). I'm having a hard time remembering how to calculate it for every ID in the database, with each average limited to 20 entries, rather than having the developer specify the ID. I.e., it would be optimal to get the following rows (assuming the result is grouped by primary key, with a limit of 20 values per average):
ID: 1 -> avg 5
ID: 2 -> avg 2
ID: 3 -> avg 7
etc....
select avg(acc.amt)
from (
select amt
from main_acc main_acc
join transactions trans on main_acc.id = trans.main_acc_id
where main_acc.id = 2958
order by main_acc.track_id, trans.transaction_time desc
limit 20
) acc;
Any help at all would be greatly appreciated. The only relevant columns are the ones shown above, I can add a schema definition if requested. Thank you!

select acc.id, avg(acc.amt)
from (
select main_acc.id, amt
from main_acc main_acc
join transactions trans on main_acc.id = trans.main_acc_id
order by main_acc.track_id, trans.transaction_time desc
) acc
group by acc.id;
In fact you do not need the subquery.
select acc.id, avg(acc.amt)
from main_acc acc
join transactions trans on acc.id = trans.main_acc_id
group by acc.id

How do you define "most recent"? Is there a timestamp that is not shown, do you use the greatest id, or something else? Basically, you order by that criterion and then use limit. So expanding the answer from @Tarik and assuming the highest ids are the most recent would yield something like:
select acc.id, avg(acc.amt) avg_for_id
from main_acc acc
join transactions trans on acc.id = trans.main_acc_id
group by acc.id
order by acc.id desc
limit 20;
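Note that limit 20 in the query above caps the number of ids returned, not the number of entries averaged per id. If the goal is the average over each id's most recent entries, a window function is one option. The following is only a sketch run against SQLite from Python: it assumes that amt and transaction_time live in transactions, uses made-up sample rows, and keeps the last 2 entries per id instead of 20 so the effect is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real tables were not shown in the question.
conn.executescript("""
CREATE TABLE main_acc (id INTEGER PRIMARY KEY);
CREATE TABLE transactions (main_acc_id INTEGER, amt REAL, transaction_time TEXT);
INSERT INTO main_acc VALUES (1), (2);
INSERT INTO transactions VALUES
  (1, 10, '2024-01-01'), (1, 20, '2024-01-02'), (1, 30, '2024-01-03'),
  (2, 5,  '2024-01-01'), (2, 7,  '2024-01-02');
""")

def avg_of_last_n(conn, n):
    """Average amt over the n most recent transactions of every id."""
    return conn.execute("""
        SELECT main_acc_id, AVG(amt)
        FROM (
            SELECT t.main_acc_id, t.amt,
                   -- number each id's rows from newest to oldest
                   ROW_NUMBER() OVER (PARTITION BY t.main_acc_id
                                      ORDER BY t.transaction_time DESC) AS rn
            FROM main_acc m
            JOIN transactions t ON m.id = t.main_acc_id
        ) ranked
        WHERE rn <= ?
        GROUP BY main_acc_id
        ORDER BY main_acc_id
    """, (n,)).fetchall()

rows = avg_of_last_n(conn, 2)
print(rows)  # id 1 averages 30 and 20, id 2 averages 7 and 5
```

The same ROW_NUMBER() pattern should carry over to PostgreSQL with 20 in place of 2.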

Related

How to join a query result with a value, received from another query?

I want to calculate transaction costs in USD for a number of most recent transactions on the Rootstock blockchain. I have a PostgreSQL database table with token prices, reports.token_prices, from which I select the value of the latest available RBTC price in USD:
select tp.price_in_usd
from reports.token_prices tp
where tp.chain_id = 30
and tp.coingecko_token_id = 'rootstock'
order by tp.dt desc
limit 1
(note that tp.dt is a timestamp)
Result of the query:
16995.771
Then I have a table with all transactions, chain_rsk_mainnet.block_transactions, from which I select the gas fees for the 5 most recent ones:
select
bt.fees_paid
from chain_rsk_mainnet.block_transactions bt
order by bt.block_id desc, bt.tx_offset
limit 5
(note that instead of using a timestamp, I'm using bt.block_id and bt.tx_offset for transaction order)
Result:
0
4469416300800
4469416300800
16450260000000
0
Now I want to multiply each of these numbers by the result of the first query. How can I do this in SQL?
Without further information, your simplest option would be to convert the first query into a CTE and then join its result in the second query.
with price_cte(price_in_usd) as
(select tp.price_in_usd
from reports.token_prices tp
where tp.chain_id = 30
and tp.coingecko_token_id = 'rootstock'
order by tp.dt desc
limit 1
)
select (bt.fees_paid * p.price_in_usd) "Fees Paid in USD"
from chain_rsk_mainnet.block_transactions bt
cross join price_cte p
order by bt.block_id desc, bt.tx_offset
limit 5;
NOTE: Not tested, no sample data nor results.
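As a rough check of the CTE plus cross join pattern, here is a small sketch that runs it against SQLite from Python. The table names are shortened (SQLite has no schemas here), the chain_id and coingecko_token_id filters are left out, and the prices and fees are invented sample values, so treat it as an illustration rather than a test of the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data standing in for the two real tables.
conn.executescript("""
CREATE TABLE token_prices (dt TEXT, price_in_usd REAL);
CREATE TABLE block_transactions (block_id INTEGER, tx_offset INTEGER, fees_paid INTEGER);
INSERT INTO token_prices VALUES ('2024-01-01', 1.5), ('2024-01-02', 2.0);
INSERT INTO block_transactions VALUES (100, 0, 10), (101, 0, 0), (101, 1, 4), (102, 0, 6);
""")

rows = conn.execute("""
    WITH price_cte(price_in_usd) AS (
        -- single-row CTE: the latest available price
        SELECT price_in_usd FROM token_prices ORDER BY dt DESC LIMIT 1
    )
    SELECT bt.fees_paid, bt.fees_paid * p.price_in_usd AS fees_paid_in_usd
    FROM block_transactions bt
    CROSS JOIN price_cte p   -- attaches the one price row to every transaction
    ORDER BY bt.block_id DESC, bt.tx_offset
    LIMIT 3
""").fetchall()

for fees_paid, fees_usd in rows:
    print(fees_paid, fees_usd)
```

Because the CTE returns exactly one row, the cross join does not multiply the number of transaction rows.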

How to select 1000 customers who were the first to gain 1000 bonus points for purchases in categories "Taxi" and "Books"? (SQLite)

The BONUS table has attributes: client_id, bonus_date, the number of accrued bonuses (bonus_cnt), mcc code of the transaction for which added bonuses (mcc_code). The MCC_CATEGORIES table is a mcc code reference.
Attributes:
mcc-code (mcc_code), category (for example, supermarkets, transport, pharmacies, etc., mcc_category)
How to select 1000 customers who were the first to gain 1000 bonus points for purchases in
categories "Taxi" and "Books"?
BONUS table looks like:
CLIENT_ID BONUS_DATE BONUS_CNT MCC_CODE
1121 2020-01-02 23 5432
3421 2020-04-15 7 654
...
MCC_CATEGORIES table looks like:
MCC_CODE MCC_CATEGORY
5432 Taxi
3532 Music
...
I would use window functions and aggregation: first join the tables and compute the running sum of bonus per user and category. Then aggregate by user and category, and get the date when they reached a bonus of 1000. Finally, compute the date when each user reached the target on both categories, order by that, and limit:
select client_id, max(bonus_date) bonus_date
from (
select client_id, mcc_category, min(bonus_date) bonus_date
from (
select b.client_id, b.bonus_date, c.mcc_category,
sum(bonus_cnt) over(partition by b.client_id, c.mcc_category order by b.bonus_date) sum_bonus
from bonus b
inner join mcc_categories c on c.mcc_code = b.mcc_code
where mcc_category in ('Taxi', 'Books')
) t
where sum_bonus >= 1000
group by client_id, mcc_category
) t
group by client_id
having count(*) = 2
order by bonus_date
limit 1000
Window functions are available in SQLite starting with version 3.25.
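To see the three steps in action, the query can be run from Python against an in-memory SQLite database. This is only a sketch: the rows are invented, the threshold is lowered from 1000 to 10 to keep the data small, and the outer date alias is renamed to reached_date to avoid any ambiguity in the order by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite 3.25+ for window functions
conn.executescript("""
CREATE TABLE mcc_categories (mcc_code INTEGER, mcc_category TEXT);
CREATE TABLE bonus (client_id INTEGER, bonus_date TEXT, bonus_cnt INTEGER, mcc_code INTEGER);
INSERT INTO mcc_categories VALUES (1, 'Taxi'), (2, 'Books'), (3, 'Music');
INSERT INTO bonus VALUES
  (1, '2020-01-01', 6, 1), (1, '2020-01-02', 5, 1), (1, '2020-01-03', 10, 2),
  (2, '2020-01-01', 10, 1), (2, '2020-01-01', 4, 2), (2, '2020-01-02', 7, 2),
  (3, '2020-01-01', 20, 1), (3, '2020-01-01', 50, 3);
""")

THRESHOLD = 10  # stands in for 1000 in the question

rows = conn.execute("""
    SELECT client_id, MAX(bonus_date) AS reached_date
    FROM (
        -- first date each client reached the threshold in each category
        SELECT client_id, mcc_category, MIN(bonus_date) AS bonus_date
        FROM (
            SELECT b.client_id, b.bonus_date, c.mcc_category,
                   SUM(b.bonus_cnt) OVER (PARTITION BY b.client_id, c.mcc_category
                                          ORDER BY b.bonus_date) AS sum_bonus
            FROM bonus b
            JOIN mcc_categories c ON c.mcc_code = b.mcc_code
            WHERE c.mcc_category IN ('Taxi', 'Books')
        ) running
        WHERE sum_bonus >= ?
        GROUP BY client_id, mcc_category
    ) per_category
    GROUP BY client_id
    HAVING COUNT(*) = 2          -- reached the threshold in both categories
    ORDER BY reached_date
    LIMIT 1000
""", (THRESHOLD,)).fetchall()

print(rows)
```

Client 3 drops out because its Music bonuses are filtered away and it never reaches the threshold in Books.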
How to select 1000 customers who were the first to gain 1000 bonus points for purchases in categories "Taxi" and "Books"?
I am guessing you want to combine the bonuses for the two categories together. If so:
select client_id, min(bonus_date) as min_bonus_date
from (select b.client_id, b.bonus_date, b.bonus_cnt,
sum(b.bonus_cnt) over (partition by b.client_id order by b.bonus_date) as running_bonus_cnt
from bonus b join
mcc_categories c
on c.mcc_code = b.mcc_code
where mcc_category in ('Taxi', 'Books')
) bc
where running_bonus_cnt >= 1000 and
running_bonus_cnt - bonus_cnt < 1000
group by client_id
order by min_bonus_date
limit 1000;
Note how this works. The subquery calculates the running bonus amount. The where clause then keeps the one row on which the running total first reaches or exceeds 1000, i.e. the total is at least 1000 after that row but was below 1000 before it.
The rest is just aggregation.
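The crossing-row filter can be tried out with a small sketch, again using SQLite from Python with invented rows and a threshold of 10 instead of 1000. One caveat worth keeping in mind: with the default window frame, rows that share the same bonus_date are peers and receive the same running total, so the sample data below uses one row per client per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite 3.25+ for window functions
conn.executescript("""
CREATE TABLE mcc_categories (mcc_code INTEGER, mcc_category TEXT);
CREATE TABLE bonus (client_id INTEGER, bonus_date TEXT, bonus_cnt INTEGER, mcc_code INTEGER);
INSERT INTO mcc_categories VALUES (1, 'Taxi'), (2, 'Books'), (3, 'Music');
INSERT INTO bonus VALUES
  (1, '2020-01-01', 6, 1), (1, '2020-01-02', 5, 2), (1, '2020-01-03', 10, 1),
  (2, '2020-01-01', 10, 1),
  (3, '2020-01-01', 50, 3);
""")

THRESHOLD = 10  # stands in for 1000 in the question

rows = conn.execute("""
    SELECT client_id, MIN(bonus_date) AS min_bonus_date
    FROM (
        SELECT b.client_id, b.bonus_date, b.bonus_cnt,
               -- running total over both categories combined
               SUM(b.bonus_cnt) OVER (PARTITION BY b.client_id
                                      ORDER BY b.bonus_date) AS running_bonus_cnt
        FROM bonus b
        JOIN mcc_categories c ON c.mcc_code = b.mcc_code
        WHERE c.mcc_category IN ('Taxi', 'Books')
    ) bc
    WHERE running_bonus_cnt >= :t                 -- at or past the threshold now
      AND running_bonus_cnt - bonus_cnt < :t      -- but below it before this row
    GROUP BY client_id
    ORDER BY min_bonus_date
    LIMIT 1000
""", {"t": THRESHOLD}).fetchall()

print(rows)
```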

Using a stored procedure in Teradata to build a summary history table

I am using Teradata SQL Assistant connected to an enterprise DW. I have written the query below to show an inventory of outstanding items as of a specific point in time. The referenced table loads and stores a new record, by load date, each time a record's state changes, and it does not delete historical records. The output of my query is 1 row for the specified date. Can I create a stored procedure or recursive query of some sort to build a history of these summary rows (with 1 new row per day)? I have not used such features in the past; links to pertinent previously answered questions, or suggestions on how I could get on the right track in researching other possible solutions, are totally fine if applicable. I'm just trying to bridge this gap in my knowledge.
SELECT
'2017-10-02' as Dt
,COUNT(DISTINCT A.RECORD_NBR) as Pending_Records
,SUM(A.PAY_AMT) AS Total_Pending_Payments
FROM DB.RECORD_HISTORY A
INNER JOIN
(SELECT MAX(LOAD_DT) AS LOAD_DT
,RECORD_NBR
FROM DB.RECORD_HISTORY
WHERE LOAD_DT <= '2017-10-02'
GROUP BY RECORD_NBR
) B
ON A.RECORD_NBR = B.RECORD_NBR
AND A.LOAD_DT = B.LOAD_DT
WHERE
A.RECORD_ORDER =1 AND Final_DT Is Null
GROUP BY Dt
ORDER BY 1 desc
Here is my interpretation of your query:
For the most recent load_dt (up until 2017-10-02) for record_order #1,
return
1) the number of different pending records
2) the total amount of pending payments
Is this correct? If you're looking for this info, but one row for each "Load_Dt", you just need to remove that INNER JOIN:
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
ORDER BY 1 DESC
If you want to get the summary info per record_order, just add record_order as a grouping column:
SELECT
load_Dt,
record_order,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE final_Dt IS NULL
GROUP BY load_Dt, record_order
ORDER BY 1,2 DESC
If you want to get one row per day (if there are calendar days with no corresponding "load_dt" days), then you can SELECT from the sys_calendar.calendar view and LEFT JOIN the query above on the "load_dt" field:
SELECT cal.calendar_date, src.Pending_Records, src.Total_Pending_Payments
FROM sys_calendar.calendar cal
LEFT JOIN (
SELECT
load_Dt,
COUNT(DISTINCT record_nbr) AS Pending_Records,
SUM(pay_amt) AS Total_Pending_Payments
FROM DB.record_history
WHERE record_order = 1
AND final_Dt IS NULL
GROUP BY load_Dt
) src ON cal.calendar_date = src.load_Dt
WHERE cal.calendar_date BETWEEN <start_date> AND <end_date>
ORDER BY 1 DESC
I don't have access to a TD system, so you may get syntax errors. Let me know if that works or you're looking for something else.
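If what is needed is instead the original "as of" snapshot repeated for every calendar day (the latest load per record on or before each day), one possible shape is to join a calendar to the history table with the "latest load" condition correlated on the calendar date. The sketch below is not Teradata: it runs on SQLite from Python, builds the calendar with a recursive CTE in place of sys_calendar.calendar, and uses invented rows, so the syntax would need adapting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; column names follow the question.
conn.executescript("""
CREATE TABLE record_history (
    record_nbr INTEGER, load_dt TEXT, record_order INTEGER,
    final_dt TEXT, pay_amt INTEGER);
INSERT INTO record_history VALUES
  (1, '2017-10-01', 1, NULL,         100),
  (1, '2017-10-03', 1, '2017-10-03', 100),  -- record 1 is finalized on Oct 3
  (2, '2017-10-02', 1, NULL,          50);
""")

rows = conn.execute("""
    WITH RECURSIVE cal(d) AS (
        SELECT :start
        UNION ALL
        SELECT date(d, '+1 day') FROM cal WHERE d < :end
    )
    SELECT cal.d,
           COUNT(DISTINCT a.record_nbr)  AS pending_records,
           COALESCE(SUM(a.pay_amt), 0)   AS total_pending_payments
    FROM cal
    LEFT JOIN record_history a
           ON a.record_order = 1
          AND a.final_dt IS NULL
          -- keep only each record's latest load on or before this calendar day
          AND a.load_dt = (SELECT MAX(b.load_dt)
                           FROM record_history b
                           WHERE b.record_nbr = a.record_nbr
                             AND b.load_dt <= cal.d)
    GROUP BY cal.d
    ORDER BY cal.d
""", {"start": "2017-09-30", "end": "2017-10-03"}).fetchall()

for row in rows:
    print(row)
```

Days with nothing pending still appear, with zero counts, because the calendar is on the left side of the join.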

SQL aggregate functions and sorting

I am still new to SQL and getting my head around the whole sub-query aggregation to display some results and was looking for some advice:
The tables might look something like:
Customer: (custID, name, address)
Account: (accountID, reward_balance)
Shop: (shopID, name, address)
Relational tables:
Holds (custID*, accountID*)
With (accountID*, shopID*)
How can I find the store that has the least reward_balance?
(The customer info is not required at this point)
I tried:
SELECT accountID AS ACCOUNT_ID, shopID AS SHOP_ID, MIN(reward_balance) AS LOWEST_BALANCE
FROM Account, Shop, With
WHERE With.accountID = Account.accountID
AND With.shopID=Shop.shopID
GROUP BY
Account.accountID,
Shop.shopID
ORDER BY MIN(reward_balance);
This works in a way that is not intended:
ACCOUNT_ID | SHOP_ID | LOWEST_BALANCE
1 | 1 | 10
2 | 2 | 40
3 | 3 | 100
4 | 4 | 1000
5 | 4 | 5000
As you can see Shop_ID 4 actually has a balance of 6000 (1000+5000) as there are two customers registered with it. I think I need to SUM the lowest balance of the shops based on their balance and display it from low-high.
I have been trying to aggregate the data prior to display but this is where I come unstuck:
SELECT shopID AS SHOP_ID, MIN(reward_balance) AS LOWEST_BALANCE
FROM (SELECT accountID, shopID, SUM(reward_balance)
FROM Account, Shop, With
WHERE
With.accountID = Account.accountID
AND With.shopID=Shop.shopID
GROUP BY
Account.accountID,
Shop.shopID;
When I run something like this statement I get an invalid identifier error.
Error at Command Line : 1 Column : 24
Error report -
SQL Error: ORA-00904: "REWARD_BALANCE": invalid identifier
00904. 00000 - "%s: invalid identifier"
So I figured I might have my joining condition incorrect and the aggregate sorting incorrect, and would really appreciate any general advice.
Thanks for the lengthy read!
Approach this problem one step at time.
We're going to assume (and we should probably check this) that "least reward_balance" refers to the total of all reward_balance values associated with a shop, and that we're not just looking for the shop that has the lowest individual reward balance.
First, get all of the individual "reward_balance" for each shop. Looks like the query would need to involve three tables...
SELECT s.shop_id
, a.reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
That will get us the detail rows: every shop along with the individual reward_balance amounts associated with the shop, if there are any. (We're using outer joins for this query because we don't see any guarantee that a shop is going to be related to at least one account. Even if that's true for this use case, it's not always true in the more general case.)
Once we have the individual amounts, the next step is to total them for each shop. We can do that using a GROUP BY clause and a SUM() aggregate.
SELECT s.shop_id
, SUM(a.reward_balance) AS tot_reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
GROUP BY s.shop_id
At this point, with MySQL we could add an ORDER BY clause to arrange the rows in ascending order of tot_reward_balance, and add a LIMIT 1 clause if we only want to return a single row. We can also handle the case when tot_reward_balance is NULL, assigning a zero in place of the NULL.
SELECT s.shop_id
, IFNULL(SUM(a.reward_balance),0) AS tot_reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
GROUP BY s.shop_id
ORDER BY tot_reward_balance ASC, s.shop_id ASC
LIMIT 1
If there are two (or more) shops with the same least value of tot_reward_balance, this query returns only one of those shops.
Oracle doesn't have a LIMIT clause like MySQL, but we can get an equivalent result using an analytic function (not available in MySQL before version 8.0). We also replace the MySQL IFNULL() function with the Oracle equivalent, NVL(). The row number is computed in an inline view and filtered in the outer query, because the rn alias can't be referenced at the level where it is defined...
SELECT r.shop_id
, r.tot_reward_balance
FROM (
SELECT v.shop_id
, v.tot_reward_balance
, ROW_NUMBER() OVER (ORDER BY v.tot_reward_balance ASC, v.shop_id ASC) AS rn
FROM (
SELECT s.shop_id
, NVL(SUM(a.reward_balance),0) AS tot_reward_balance
FROM shop s
LEFT
JOIN with w
ON w.shop_id = s.shop_id
LEFT
JOIN account a
ON a.account_id = w.account_id
GROUP BY s.shop_id
) v
) r
WHERE r.rn = 1
Like the MySQL query, this returns at most one row, even when two or more shops have the same "least" total of reward_balance.
If we want to return all of the shops that have the lowest tot_reward_balance, we need to take a slightly different approach.
The best approach to building queries is stepwise refinement. In this case, start by getting all of the individual reward_balance values for each shop. The next step is to aggregate the individual reward_balance values into a total per shop. The final step is to pick out the row(s) with the lowest total reward_balance.
In SQL Server, you can try using a CTE:
;with cte_minvalue as
(
select rank() over (order by Sum_Balance) as RowRank,
ShopId,
Sum_Balance
from (SELECT Shop.shopID, SUM(reward_balance) AS Sum_Balance
FROM
[With]
JOIN Shop ON [With].ShopId = Shop.ShopId
JOIN Account ON [With].AccountId = Account.AccountId
GROUP BY
Shop.shopID)ShopSum
)
select ShopId, Sum_Balance from cte_minvalue where RowRank = 1
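The difference between returning one shop and returning every tied shop can be seen in a small sketch, run here on SQLite from Python. The data is invented so that two shops tie for the lowest total, and the link table is called shop_account (a hypothetical name) because WITH is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite 3.25+ for window functions
conn.executescript("""
CREATE TABLE shop (shop_id INTEGER PRIMARY KEY);
CREATE TABLE account (account_id INTEGER PRIMARY KEY, reward_balance INTEGER);
CREATE TABLE shop_account (account_id INTEGER, shop_id INTEGER);
INSERT INTO shop VALUES (1), (2), (3);
INSERT INTO account VALUES (1, 10), (2, 4), (3, 6), (4, 1000), (5, 5000);
-- shop 1 totals 10, shop 2 totals 4 + 6 = 10, shop 3 totals 6000
INSERT INTO shop_account VALUES (1, 1), (2, 2), (3, 2), (4, 3), (5, 3);
""")

rows = conn.execute("""
    SELECT shop_id, tot_reward_balance
    FROM (
        SELECT v.shop_id, v.tot_reward_balance,
               -- RANK() gives tied totals the same rank, unlike ROW_NUMBER()
               RANK() OVER (ORDER BY v.tot_reward_balance) AS rnk
        FROM (
            SELECT s.shop_id,
                   COALESCE(SUM(a.reward_balance), 0) AS tot_reward_balance
            FROM shop s
            LEFT JOIN shop_account w ON w.shop_id = s.shop_id
            LEFT JOIN account a ON a.account_id = w.account_id
            GROUP BY s.shop_id
        ) v
    ) ranked
    WHERE rnk = 1
    ORDER BY shop_id
""").fetchall()

print(rows)  # both tied shops come back
```

Swapping RANK() for ROW_NUMBER() with a tiebreaker column would return a single shop instead.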

Count the number of occurrences grouped by some rows

I have made a query to bring me the number of products that have not been in stock (I know that by looking at the orders which the manufacturer returned with some status code), by product, date and storage, that looks like this:
SELECT count(*) as out_of_stock,
prod.id as product_id,
ped.data_envio::date as date,
opl.id as storage_id
from sub_produtos_pedidos spp
left join cad_produtos prod ON spp.ean_produto = prod.cod_ean
left join sub_pedidos sp ON spp.id_pedido = sp.id
left join pedidos ped ON sp.id_pedido = ped.id
left join op_logisticos opl ON sp.id_op_logistico = opl.id
where spp.motivo = '201' -- this is the code that means 'not in inventory'
group by storage_id,product_id,date
That produces an answer like this:
out_of_stock | product_id | date | storage_id
--------------|------------|-------------|-------------
1 | 5 | 2012-10-16 | 1
5 | 4 | 2012-10-16 | 2
Now I need to get the number of occurrences, by product and storage, of products that have been out of stock for 2 or more days, 5 or more days and so on.
So I guess I need to do a new count on the first query, aggregating the resultant rows in some defined day intervals.
I tried looking at the datetime functions in Postgres (http://www.postgresql.org/docs/7.3/static/functions-datetime.html), but couldn't find what I need.
Maybe I didn't understand your question correctly, but it looks like you need to use a subquery.
Now I need to get the number of occurrences, by product and storage, of products that have been out of stock for 2 or more days
So:
SELECT COUNT(*), date, product_id FROM ( YOUR BIG QUERY GOES HERE ) a
WHERE a.date < (CURRENT_DATE - interval '2' day)
GROUP BY date, product_id
Since you seem to want every row in the result individually, you cannot aggregate. Use a window function instead to get the count per day. The well known aggregate function count() can also serve as window aggregate function:
SELECT current_date - ped.data_envio::date AS days_out_of_stock
,count(*) OVER (PARTITION BY ped.data_envio::date)
AS count_per_days_out_of_stock
,ped.data_envio::date AS date
,p.id AS product_id
,opl.id AS storage_id
FROM sub_produtos_pedidos spp
LEFT JOIN cad_produtos p ON p.cod_ean = spp.ean_produto
LEFT JOIN sub_pedidos sp ON sp.id = spp.id_pedido
LEFT JOIN op_logisticos opl ON opl.id = sp.id_op_logistico
LEFT JOIN pedidos ped ON ped.id = sp.id_pedido
WHERE spp.motivo = '201' -- code for 'not in inventory'
ORDER BY ped.data_envio::date, p.id, opl.id
Sort order: Products having been out of stock for the longest time first.
Note, you can just subtract dates to get an integer in Postgres.
If you want a running count in the sense of "n rows have been out of stock for this number of days or more", use:
count(*) OVER (ORDER BY ped.data_envio::date) -- ascending order!
AS running_count_per_days_out_of_stock
You get the same count for the same day, peers are lumped together.
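The two window counts can be compared side by side in a small sketch. It runs on SQLite from Python rather than Postgres, and it replaces the joins with a single hypothetical table out_of_stock holding three invented rows, so only the window behaviour is being illustrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # needs SQLite 3.25+ for window functions
# Hypothetical flattened table standing in for the joined query result.
conn.executescript("""
CREATE TABLE out_of_stock (product_id INTEGER, storage_id INTEGER, day TEXT);
INSERT INTO out_of_stock VALUES
  (5, 1, '2012-10-15'), (4, 2, '2012-10-15'), (4, 1, '2012-10-16');
""")

rows = conn.execute("""
    SELECT day, product_id,
           -- rows sharing the same day
           COUNT(*) OVER (PARTITION BY day) AS count_per_day,
           -- rows on this day or earlier; same-day peers share one value
           COUNT(*) OVER (ORDER BY day)     AS running_count
    FROM out_of_stock
    ORDER BY day, product_id
""").fetchall()

for row in rows:
    print(row)
```

Both rows for 2012-10-15 report a running count of 2, which is the "peers are lumped together" behaviour described above.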