SQL aggregate functions and sorting

I am still new to SQL and getting my head around the whole sub-query aggregation to display some results and was looking for some advice:
The tables might look something like:
Customer: (custID, name, address)
Account: (accountID, reward_balance)
Shop: (shopID, name, address)
Relational tables:
Holds (custID*, accountID*)
With (accountID*, shopID*)
How can I find the store that has the least reward_balance?
(The customer info is not required at this point)
I tried:
SELECT accountID AS ACCOUNT_ID, shopID AS SHOP_ID, MIN(reward_balance) AS LOWEST_BALANCE
FROM Account, Shop, With
WHERE With.accountID = Account.accountID
AND With.shopID=Shop.shopID
GROUP BY
Account.accountID,
Shop.shopID
ORDER BY MIN(reward_balance);
This works in a way that is not intended:
ACCOUNT_ID | SHOP_ID | LOWEST_BALANCE
1 | 1 | 10
2 | 2 | 40
3 | 3 | 100
4 | 4 | 1000
5 | 4 | 5000
As you can see Shop_ID 4 actually has a balance of 6000 (1000+5000) as there are two customers registered with it. I think I need to SUM the lowest balance of the shops based on their balance and display it from low-high.
I have been trying to aggregate the data prior to display but this is where I come unstuck:
SELECT shopID AS SHOP_ID, MIN(reward_balance) AS LOWEST_BALANCE
FROM (SELECT accountID, shopID, SUM(reward_balance)
FROM Account, Shop, With
WHERE
With.accountID = Account.accountID
AND With.shopID=Shop.shopID
GROUP BY
Account.accountID,
Shop.shopID;
When I run something like this statement I get an invalid identifier error.
Error at Command Line : 1 Column : 24
Error report -
SQL Error: ORA-00904: "REWARD_BALANCE": invalid identifier
00904. 00000 - "%s: invalid identifier"
So I figured I might have my joining condition incorrect and the aggregate sorting incorrect, and would really appreciate any general advice.
Thanks for the lengthy read!

Approach this problem one step at a time.
We're going to assume (and we should probably check this) that "least reward_balance" refers to the total of all reward_balance values associated with a shop, and that we're not just looking for the shop that has the lowest individual reward balance.
First, get all of the individual "reward_balance" for each shop. Looks like the query would need to involve three tables...
SELECT s.shop_id
, a.reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
That will get us the detail rows: every shop along with the individual reward_balance amounts associated with the shop, if there are any. (We're using outer joins for this query because we don't see any guarantee that a shop is going to be related to at least one account. Even if that's true for this use case, it's not always true in the more general case.)
Once we have the individual amounts, the next step is to total them for each shop. We can do that using a GROUP BY clause and a SUM() aggregate.
SELECT s.shop_id
, SUM(a.reward_balance) AS tot_reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
GROUP BY s.shop_id
At this point, with MySQL we could add an ORDER BY clause to arrange the rows in ascending order of tot_reward_balance, and add a LIMIT 1 clause if we only want to return a single row. We can also handle the case when tot_reward_balance is NULL, assigning a zero in place of the NULL.
SELECT s.shop_id
, IFNULL(SUM(a.reward_balance),0) AS tot_reward_balance
FROM `shop` s
LEFT
JOIN `with` w
ON w.shop_id = s.shop_id
LEFT
JOIN `account` a
ON a.account_id = w.account_id
GROUP BY s.shop_id
ORDER BY tot_reward_balance ASC, s.shop_id ASC
LIMIT 1
If there are two (or more) shops with the same least value of tot_reward_balance, this query returns only one of those shops.
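The steps so far can be tried out end to end with a small script. This is only a sketch: it uses Python's sqlite3 module as a stand-in database (not MySQL or Oracle), the column names follow the question's schema (shopID, accountID), the balances come from the question's sample output, and the shop names are made up.

```python
import sqlite3

def build_sample_db():
    """In-memory database seeded with the balances from the question's output."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Account (accountID INTEGER PRIMARY KEY, reward_balance INTEGER);
        CREATE TABLE Shop (shopID INTEGER PRIMARY KEY, name TEXT);
        -- WITH is a keyword, so the table name has to be quoted
        CREATE TABLE "With" (accountID INTEGER, shopID INTEGER);
        INSERT INTO Account VALUES (1, 10), (2, 40), (3, 100), (4, 1000), (5, 5000);
        INSERT INTO Shop VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');  -- names are made up
        INSERT INTO "With" VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 4);
        """
    )
    return conn

def shop_totals(conn):
    """Return (shopID, total reward_balance) rows, lowest total first."""
    return conn.execute(
        """
        SELECT s.shopID,
               IFNULL(SUM(a.reward_balance), 0) AS tot_reward_balance
          FROM Shop s
          LEFT JOIN "With" w ON w.shopID = s.shopID
          LEFT JOIN Account a ON a.accountID = w.accountID
         GROUP BY s.shopID
         ORDER BY tot_reward_balance ASC, s.shopID ASC
        """
    ).fetchall()

conn = build_sample_db()
totals = shop_totals(conn)
print(totals)     # every shop with its total, lowest first
print(totals[0])  # the shop with the least total
```

With this data, shop 4 ends up with a single row totalling 6000 rather than two separate rows, which is the behaviour the question was after.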
Oracle (before 12c) doesn't have a LIMIT-style clause like MySQL, but we can get an equivalent result using an analytic function (not available in MySQL before 8.0). The alias of an analytic function can't be filtered at the same query level, so we wrap the query in one more inline view and filter there. We also replace the MySQL IFNULL() function with the Oracle equivalent NVL() function...
SELECT r.shop_id
     , r.tot_reward_balance
  FROM (
         SELECT v.shop_id
              , v.tot_reward_balance
              , ROW_NUMBER() OVER (ORDER BY v.tot_reward_balance ASC, v.shop_id ASC) AS rn
           FROM (
                  SELECT s.shop_id
                       , NVL(SUM(a.reward_balance),0) AS tot_reward_balance
                    FROM shop s
                    LEFT
                    JOIN with w
                      ON w.shop_id = s.shop_id
                    LEFT
                    JOIN account a
                      ON a.account_id = w.account_id
                   GROUP BY s.shop_id
                ) v
       ) r
 WHERE r.rn = 1
Like the MySQL query, this returns at most one row, even when two or more shops have the same "least" total of reward_balance.
If we want to return all of the shops that have the lowest tot_reward_balance, we need to take a slightly different approach.
The best approach to building queries is stepwise refinement; in this case, start by getting all of the individual reward_balance values for each shop. The next step is to aggregate the individual reward_balance values into a total. The last step is to pick out the row(s) with the lowest total reward_balance.
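For the tie case mentioned above, one possible shape is to compute the totals once and keep every shop whose total equals the minimum. The sketch below again uses sqlite3 as a stand-in, with the question's column names and made-up sample rows in which shops 1 and 2 both total 10.

```python
import sqlite3

def shops_with_lowest_total(conn):
    """Return every (shopID, total) row whose total equals the lowest total."""
    return conn.execute(
        """
        WITH totals AS (
            SELECT s.shopID,
                   IFNULL(SUM(a.reward_balance), 0) AS tot_reward_balance
              FROM Shop s
              LEFT JOIN "With" w ON w.shopID = s.shopID
              LEFT JOIN Account a ON a.accountID = w.accountID
             GROUP BY s.shopID
        )
        SELECT shopID, tot_reward_balance
          FROM totals
         WHERE tot_reward_balance = (SELECT MIN(tot_reward_balance) FROM totals)
         ORDER BY shopID
        """
    ).fetchall()

tie_conn = sqlite3.connect(":memory:")
tie_conn.executescript(
    """
    CREATE TABLE Account (accountID INTEGER PRIMARY KEY, reward_balance INTEGER);
    CREATE TABLE Shop (shopID INTEGER PRIMARY KEY);
    CREATE TABLE "With" (accountID INTEGER, shopID INTEGER);
    -- made-up rows: shop 1 totals 10, shop 2 totals 4 + 6 = 10, shop 3 totals 50
    INSERT INTO Account VALUES (1, 10), (2, 4), (3, 6), (4, 50);
    INSERT INTO Shop VALUES (1), (2), (3);
    INSERT INTO "With" VALUES (1, 1), (2, 2), (3, 2), (4, 3);
    """
)
lowest = shops_with_lowest_total(tie_conn)
print(lowest)
```

Both tied shops come back. A RANK() analytic function filtered on rank 1 in an outer query would be another way to get the same rows.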

In SQL Server, you can try using a CTE (WITH is a reserved keyword there, so a table named With has to be written as [With]):
;with cte_minvalue as
(
select rank() over (order by Sum_Balance) as RowRank,
ShopId,
Sum_Balance
from (SELECT Shop.shopID, SUM(reward_balance) AS Sum_Balance
FROM
[With]
JOIN Shop ON [With].ShopId = Shop.ShopId
JOIN Account ON [With].AccountId = Account.AccountId
GROUP BY
Shop.shopID)ShopSum
)
select ShopId, Sum_Balance from cte_minvalue where RowRank = 1

Related

PostgreSQL average for every field

I'm trying to calculate the average in this sample. This example works, but only when I select a specific ID (in this case 2958). I'm having a hard time remembering how to calculate it for every ID in the database, with each average limited to 20 entries, rather than having the developer specify the ID explicitly. I.e., it would be optimal to get the following rows (grouped by primary key, with a limit of 20 values per average):
ID: 1 -> avg 5
ID: 2 -> avg 2
ID: 3 -> avg 7
etc....
select avg(acc.amt)
from (
select acc.amt amt
from main_acc main_acc
join transactions trans on main_acc.id = trans.main_acc_id
where main_acc.id = 2958
order by main_acc.track_id, transactions.transaction_time desc
limit 20
) acc;
Any help at all would be greatly appreciated. The only relevant columns are the ones shown above, I can add a schema definition if requested. Thank you!
select acc.id, avg(acc.amt)
from (select main_acc.id, amt -- qualify amt with its table alias if both tables have it
from main_acc main_acc
join transactions trans on main_acc.id = trans.main_acc_id) acc
group by acc.id;
In fact you do not need the subquery.
select acc.id, avg(acc.amt)
from main_acc acc
join transactions trans on acc.id = trans.main_acc_id
group by acc.id
How do you define most recent? Is there a timestamp which is not shown, do you use the greatest id, or something else? Basically, you order by that criterion and then use limit. So expanding the answer from @Tarik and assuming the highest ids are the most recent would yield something like:
select acc.id, avg(acc.amt) avg_for_id
from main_acc acc
join transactions trans on acc.id = trans.main_acc_id
group by acc.id
order by acc.id desc
limit 20;
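If the goal is instead an average over the N most recent transactions of each id, one common pattern is to number each account's rows with ROW_NUMBER() and average only the first N. The sketch below is not from the answers above: it uses sqlite3 (3.25 or newer, for window functions) as a stand-in for PostgreSQL, assumes amt and transaction_time live on the transactions table, and uses made-up rows with N = 2 instead of 20 to keep the data small.

```python
import sqlite3

def avg_of_most_recent(conn, n):
    """Average amt over each account's n most recent transactions."""
    return conn.execute(
        """
        SELECT main_acc_id, AVG(amt) AS avg_amt
          FROM (SELECT t.main_acc_id,
                       t.amt,
                       ROW_NUMBER() OVER (PARTITION BY t.main_acc_id
                                          ORDER BY t.transaction_time DESC) AS rn
                  FROM transactions t) ranked
         WHERE rn <= ?
         GROUP BY main_acc_id
         ORDER BY main_acc_id
        """,
        (n,),
    ).fetchall()

acc_conn = sqlite3.connect(":memory:")
acc_conn.executescript(
    """
    -- assumed layout: amt and transaction_time are columns of transactions
    CREATE TABLE transactions (main_acc_id INTEGER, amt REAL, transaction_time TEXT);
    INSERT INTO transactions VALUES
        (1, 100, '2024-01-01'),  -- oldest row for id 1, excluded when n = 2
        (1,   4, '2024-01-02'),
        (1,   6, '2024-01-03'),
        (2,   2, '2024-01-01');
    """
)
recent_avgs = avg_of_most_recent(acc_conn, 2)
print(recent_avgs)
```

The same SELECT should carry over to PostgreSQL with a literal or bound value in place of the `?` placeholder.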

SELECT TOP 10 rows

I have built an SQL query that returns the top 10 customers with the highest outstanding. The outstanding is at product level (each product has its own outstanding).
Until now everything works fine. My only problem is that if a certain customer has more than one product, then the second and further products should be listed under the same customer_id, as in the second picture (because the first product, which has the highest outstanding, pulls in the customer's other products, which may have a lower outstanding than those of the other 9 clients in the top 10).
How can I modify my query in order to do that? Is it possible in SQL Server 2012?
My query is:
select top 10 CUSTOMER_ID
,S90T01_GROSS_EXPOSURE_THSD_EUR
,S90T01_COGNOS_PROD_NAME
,S90T01_DPD_C
,PREVIOUS_BUCKET_DPD_REP
,S90T01_BUCKET_DPD_REP
from [dbo].[DM_07MONTHLY_DATA]
where S90T01_CLIENT_SEGMENT = 'PI'
and YYYY_MM = '2017_01'
group by CUSTOMER_ID
,S90T01_GROSS_EXPOSURE_THSD_EUR
,S90T01_COGNOS_PROD_NAME
,S90T01_DPD_C
,PREVIOUS_BUCKET_DPD_REP
,S90T01_BUCKET_DPD_REP
order by S90T01_GROSS_EXPOSURE_THSD_EUR desc;
You need to calculate the top Customers first, then pull out all their products. You can do this with a Common Table Expression.
As you haven't provided any test data this is untested, but I think it will work for you:
with top10 as
(
select top 10 CUSTOMER_ID
,sum(S90T01_GROSS_EXPOSURE_THSD_EUR) as TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR
from [dbo].[DM_07MONTHLY_DATA]
where S90T01_CLIENT_SEGMENT = 'PI'
and YYYY_MM = '2017_01'
group by CUSTOMER_ID
order by TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR desc
)
select m.CUSTOMER_ID
,m.S90T01_GROSS_EXPOSURE_THSD_EUR
,m.S90T01_COGNOS_PROD_NAME
,m.S90T01_DPD_C
,m.PREVIOUS_BUCKET_DPD_REP
,m.S90T01_BUCKET_DPD_REP
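from [dbo].[DM_07MONTHLY_DATA] m
join top10 t
on m.CUSTOMER_ID = t.CUSTOMER_ID
where m.S90T01_CLIENT_SEGMENT = 'PI' -- repeat the filters so other segments and months are not pulled in
and m.YYYY_MM = '2017_01'
order by t.TOTAL_S90T01_GROSS_EXPOSURE_THSD_EUR desc
,m.CUSTOMER_ID -- keeps a customer's rows together when totals tie
,m.S90T01_GROSS_EXPOSURE_THSD_EUR;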

How to use rounded value for join in SQL

I'm just learning SQL today, and I never realized how fun it is until I started fiddling with it.
I've run into a problem and need some help.
I have 2 tables, Customer and Rate, with the details stated below.
Customer
idcustomer = int
namecustomer = varchar
rate = decimal(3,0)
with value as described:
idcustomer---namecustomer---rate
1---JOHN DOE---100
2---MARY JANE---90
3---CLIVE BAKER---12
4---DANIEL REYES---47
Rate
rate = decimal(3,0)
description = varchar(40)
with value as described:
rate---description
10---G Rank
20---F Rank
30---E Rank
40---D Rank
50---C Rank
60---B Rank
70---A Rank
80---S Rank
90---SS Rank
100---SSS Rank
Then I ran the query below to round all values in the customer.rate field and inner join them with the rate table.
SELECT *, round(rate,-1) as roundedrate
FROM customer INNER JOIN rate ON customer.roundedrate = rate.rate
It didn't produce this result:
idcustomer---namecustomer---rate---roundedrate---description
1---JOHN DOE---100---100---SSS Rank
2---MARY JANE---90---90---SS Rank
3---CLIVE BAKER---12---10---G Rank
4---DANIEL REYES---47---50---C Rank
Is there anything wrong with my code ?
Your query should produce an 'ambiguous column' error because you're not specifying a table name when referring to rate (in round(rate,-1)), which exists in both tables.
Also, the join condition of a SQL query is evaluated before the select list, so you can't refer to the alias roundedrate in the ON clause (and since it is not a column of the customer table, customer.roundedrate doesn't exist).
Try this instead:
SELECT *, round(customer.rate,-1) as roundedrate
FROM customer INNER JOIN rate ON round(customer.rate,-1) = rate.rate
http://sqlfiddle.com/#!9/e94a60/2
I would suggest a correlated subquery for this:
select c.*,
(select r.description
from rate r
where r.rate <= c.rate
order by r.rate desc
fetch first 1 row only
) as description
from customer c;
Note: fetch first 1 row only is ANSI standard SQL, which some databases do not support. MySQL uses limit. Older versions of SQL Server use select top 1 instead.
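The two answers do not behave identically: the join rounds to the nearest ten, while the correlated subquery picks the highest rate at or below the customer's rate, i.e. it rounds down. A quick way to see the difference is the sketch below, which loads the question's data into sqlite3 as a stand-in. SQLite's round() treats a negative digit count as 0, so ROUND(rate / 10.0) * 10 is used here in place of round(rate,-1), and LIMIT 1 in place of fetch first 1 row only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (idcustomer INTEGER, namecustomer TEXT, rate INTEGER);
    CREATE TABLE rate (rate INTEGER, description TEXT);
    INSERT INTO customer VALUES
        (1, 'JOHN DOE', 100), (2, 'MARY JANE', 90),
        (3, 'CLIVE BAKER', 12), (4, 'DANIEL REYES', 47);
    INSERT INTO rate VALUES
        (10, 'G Rank'), (20, 'F Rank'), (30, 'E Rank'), (40, 'D Rank'),
        (50, 'C Rank'), (60, 'B Rank'), (70, 'A Rank'), (80, 'S Rank'),
        (90, 'SS Rank'), (100, 'SSS Rank');
    """
)

# Join on the rate rounded to the nearest ten (SQLite substitute for round(rate,-1)).
rounded = conn.execute(
    """
    SELECT c.namecustomer, r.description
      FROM customer c
      JOIN rate r ON r.rate = CAST(ROUND(c.rate / 10.0) * 10 AS INTEGER)
     ORDER BY c.idcustomer
    """
).fetchall()

# Correlated subquery: highest rate that does not exceed the customer's rate.
floored = conn.execute(
    """
    SELECT c.namecustomer,
           (SELECT r.description
              FROM rate r
             WHERE r.rate <= c.rate
             ORDER BY r.rate DESC
             LIMIT 1) AS description
      FROM customer c
     ORDER BY c.idcustomer
    """
).fetchall()

print(rounded)
print(floored)
```

Only DANIEL REYES (rate 47) differs: the rounded join gives C Rank, as in the expected output of the question, while the subquery gives D Rank.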

SQL percentage of the total

Hi, how can I get each record's percentage of the total?
Let's imagine I have one table with the following:
ID code Points
1 101 2
2 201 3
3 233 4
4 123 1
The percentage for ID 1 is 20%, for ID 2 it is 30%, and so on.
How do I get it?
There are a couple of approaches to getting that result.
You essentially need the "total" points from the whole table (or whatever subset), repeated on each row. Getting the percentage is then a simple matter of arithmetic; the expression you use depends on the datatypes and on how you want the result formatted.
Here's one way (out of a couple of possible ways) to get the specified result:
SELECT t.id
, t.code
, t.points
-- , s.tot_points
, ROUND(t.points * 100.0 / s.tot_points,1) AS percentage
FROM onetable t
CROSS
JOIN ( SELECT SUM(r.points) AS tot_points
FROM onetable r
) s
ORDER BY t.id
The view query s is run first, that gives a single row. The join operation matches that row with every row from t. And that gives us the values we need to calculate a percentage.
Another way to get this result, without using a join operation, is to use a subquery in the SELECT list to return the total.
Note that the join approach can be extended to get percentage for each "group" of records.
id type points %type
-- ---- ------ -----
1 sold 11 22%
2 sold 4 8%
3 sold 25 50%
4 bought 1 50%
5 bought 1 50%
6 sold 10 20%
To get that result, we can use the same query, but with a view query for s that returns the total GROUP BY r.type; the join operation then isn't a CROSS join, but a match based on type:
SELECT t.id
, t.type
, t.points
-- , s.tot_points_by_type
, ROUND(t.points * 100.0 / s.tot_points_by_type,1) AS `%type`
FROM onetable t
JOIN ( SELECT r.type
, SUM(r.points) AS tot_points_by_type
FROM onetable r
GROUP BY r.type
) s
ON s.type = t.type
ORDER BY t.id
To get that same result with a subquery in the SELECT list, it has to be a correlated subquery, and that subquery is likely to be executed for every row in t.
This is why it's more natural for me to use a join operation, rather than a subquery in the SELECT list... even when a subquery works the same. (The patterns we use for more complex queries, like assigning aliases to tables, qualifying all column references, and formatting the SQL... those patterns just work their way back into simple queries. The rationale for these patterns is kind of lost in simple queries.)
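The per-type join can be checked against the sample rows shown above. This is a minimal sketch using sqlite3 as a stand-in database; the table name onetable comes from the answer, and the percentage column is named pct_of_type here only to avoid quoting a `%` in the alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE onetable (id INTEGER PRIMARY KEY, type TEXT, points INTEGER);
    INSERT INTO onetable VALUES
        (1, 'sold', 11), (2, 'sold', 4), (3, 'sold', 25),
        (4, 'bought', 1), (5, 'bought', 1), (6, 'sold', 10);
    """
)

pct_rows = conn.execute(
    """
    SELECT t.id,
           t.type,
           t.points,
           ROUND(t.points * 100.0 / s.tot_points_by_type, 1) AS pct_of_type
      FROM onetable t
      JOIN (SELECT r.type,
                   SUM(r.points) AS tot_points_by_type
              FROM onetable r
             GROUP BY r.type) s
        ON s.type = t.type
     ORDER BY t.id
    """
).fetchall()

for row in pct_rows:
    print(row)
```

The "sold" rows are divided by 50 and the "bought" rows by 2, which reproduces the 22 / 8 / 50 / 50 / 50 / 20 percentages in the table.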
Try it like this (multiplying by 100.0 avoids integer division):
select id, code, points, (points * 100.0) / (select sum(points) from table1) as percentage from table1
To add to a good list of responses, this should be fast performance-wise, and rather easy to understand:
DECLARE @T TABLE (ID INT, code VARCHAR(256), Points INT)
INSERT INTO @T VALUES (1,'101',2), (2,'201',3),(3,'233',4), (4,'123',1)
;WITH CTE AS
(SELECT * FROM @T)
SELECT C.*, CAST(ROUND((C.Points/B.TOTAL)*100, 2) AS DEC(32,2)) [%_of_TOTAL]
FROM CTE C
JOIN (SELECT CAST(SUM(Points) AS DEC(32,2)) TOTAL FROM CTE) B ON 1=1
Just replace the table variable with your actual table inside the CTE.

Count the number of occurrences grouped by some rows

I have made a query to bring me the number of products that have not been in stock (I know that by looking at the orders which the manufacturer returned with some status code), by product, date and storage, that looks like this:
SELECT count(*) as out_of_stock,
prod.id as product_id,
ped.data_envio::date as date,
opl.id as storage_id
from sub_produtos_pedidos spp
left join cad_produtos prod ON spp.ean_produto = prod.cod_ean
left join sub_pedidos sp ON spp.id_pedido = sp.id
left join pedidos ped ON sp.id_pedido = ped.id
left join op_logisticos opl ON sp.id_op_logistico = opl.id
where spp.motivo = '201' -- this is the code that means 'not in inventory'
group by storage_id,product_id,date
That produces an answer like this:
out_of_stock | product_id | date | storage_id
--------------|------------|-------------|-------------
1 | 5 | 2012-10-16 | 1
5 | 4 | 2012-10-16 | 2
Now I need to get the number of occurrences, by product and storage, of products that have been out of stock for 2 or more days, 5 or more days and so on.
So I guess I need to do a new count on the first query, aggregating the resultant rows in some defined day intervals.
I tried looking at the datetime functions in Postgres (http://www.postgresql.org/docs/7.3/static/functions-datetime.html), but couldn't find what I need.
Maybe I didn't understand your question correctly, but it looks like you need to leverage a sub-query.
"Now I need to get the number of occurrences, by product and storage, of products that have been out of stock for 2 or more days"
So:
SELECT COUNT(*), date, product_id FROM ( YOUR BIG QUERY GOES HERE ) a
WHERE a.date < (CURRENT_DATE - interval '2' day)
GROUP BY date, product_id
Since you seem to want every row in the result individually, you cannot aggregate. Use a window function instead to get the count per day. The well known aggregate function count() can also serve as window aggregate function:
SELECT current_date - ped.data_envio::date AS days_out_of_stock
,count(*) OVER (PARTITION BY ped.data_envio::date)
AS count_per_days_out_of_stock
,ped.data_envio::date AS date
,p.id AS product_id
,opl.id AS storage_id
FROM sub_produtos_pedidos spp
LEFT JOIN cad_produtos p ON p.cod_ean = spp.ean_produto
LEFT JOIN sub_pedidos sp ON sp.id = spp.id_pedido
LEFT JOIN op_logisticos opl ON opl.id = sp.id_op_logistico
LEFT JOIN pedidos ped ON ped.id = sp.id_pedido
WHERE spp.motivo = '201' -- code for 'not in inventory'
ORDER BY ped.data_envio::date, p.id, opl.id
Sort order: Products having been out of stock for the longest time first.
Note, you can just subtract dates to get an integer in Postgres.
If you want a running count in the sense of "n rows have been out of stock for this number of days or more", use:
count(*) OVER (ORDER BY ped.data_envio::date) -- ascending order!
AS running_count_per_days_out_of_stock
You get the same count for the same day, peers are lumped together.
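The difference between the per-day count and the running count is easiest to see on a few rows. The sketch below uses sqlite3 (3.25 or newer, for window functions) as a stand-in for Postgres, with a made-up out_of_stock table standing in for the joined result of the big query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- hypothetical flattened result: one row per out-of-stock product and day
    CREATE TABLE out_of_stock (product_id INTEGER, day TEXT);
    INSERT INTO out_of_stock VALUES
        (5, '2012-10-14'), (4, '2012-10-14'), (7, '2012-10-16');
    """
)

count_rows = conn.execute(
    """
    SELECT product_id,
           day,
           COUNT(*) OVER (PARTITION BY day) AS count_per_day,
           COUNT(*) OVER (ORDER BY day)     AS running_count
      FROM out_of_stock
     ORDER BY day, product_id
    """
).fetchall()

for row in count_rows:
    print(row)
```

Both rows for 2012-10-14 get a running count of 2 because peers in the ORDER BY share the same frame end, and the 2012-10-16 row gets 3.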