How to sum up row values by passing a limit - SQL

I want to sum up the price column while enforcing a limit on the sum.
For example, given the table below, I want to limit the selected records to a total of 10k or 15k.
ID PRICE
x1 10,000
x2 20,000
x3 5,000
x4 7,500.00
The result I want:
For <=10000
ID PRICE
x1 10,000
For <=15000
ID PRICE
x1 10,000
x3 5,000
For <=14000
ID PRICE
x3 5,000
x4 7,500.00
I searched around and found PostgreSQL's window functions (the OVER clause), so I wrote the query below:
WITH cte AS (
    SELECT *, SUM(price) OVER (ORDER BY id) AS total
    FROM test1
)
SELECT *
FROM cte
WHERE total <= amount
But for the conditions <=15000 and <=14000 this does not bring back the right results.
I want to sum up the price column and fetch the records whose running total fits within the given amount; more specifically, it should also verify whether any other record can still be accommodated within that amount, and return it if so.
Please help me with it.
Thanks

Your question is not entirely clear, but if you sort on the price column you can get the minimal set of rows whose sum is <= the limit.
Try ordering by the price column:
WITH cte AS (
    SELECT *, SUM(price) OVER (ORDER BY price) AS total
    FROM test1
)
SELECT *
FROM cte
WHERE total <= amount
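To see how the sorted running total behaves, here is a runnable sketch using Python's sqlite3 (SQLite 3.25+ for window functions); the test1 table and prices come from the question, while `rows_within` is just an illustrative helper. Note that sorting by price packs the smallest prices first, so for <=15000 it returns x3 and x4 (total 12,500) rather than the question's expected x1 and x3:

```python
# Sketch of the sorted running-total approach, using SQLite (3.25+
# for window functions); table/columns follow the question's test1.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (id TEXT, price REAL);
    INSERT INTO test1 VALUES ('x1', 10000), ('x2', 20000),
                             ('x3', 5000), ('x4', 7500);
""")

def rows_within(limit):
    # Running total over prices sorted ascending; keep the rows whose
    # cumulative sum still fits within the limit.
    return conn.execute("""
        WITH cte AS (
            SELECT *, SUM(price) OVER (ORDER BY price) AS total
            FROM test1
        )
        SELECT id, price FROM cte
        WHERE total <= ?
        ORDER BY total
    """, (limit,)).fetchall()

print(rows_within(10000))  # [('x3', 5000.0)]
print(rows_within(14000))  # [('x3', 5000.0), ('x4', 7500.0)]
```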

Play with it here: https://www.db-fiddle.com/f/mckDPhtrY4vRrkF2NjhQcC/0
WITH b AS (
    SELECT *, SUM(price) OVER (ORDER BY price) AS total
    FROM a
)
SELECT *
FROM b
WHERE (total <= your_max OR price <= your_max) AND total >= your_min

Related

Aggregate sum of rows identified by max value

I am trying to take the max value of each type, then sum these maxima into one value:
resource_id price
a 100
a 84
b 33
b 100
A 100 and B 100 would be selected (the max value of each of the types A and B).
Expected return:
200
What I have so far:
SELECT resource_id, MAX(price)
FROM costs
GROUP BY resource_id
It is currently returning A = 100 and B = 100... just need a little help on how to sum all this into a return of just 200
Thanks!
Wrap your query:
select sum(m_price)
from (
    SELECT resource_id, MAX(price) as m_price
    FROM costs
    GROUP BY resource_id
) z
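A quick runnable check of the wrapped query, here sketched with Python's sqlite3 (the costs table and its values come from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE costs (resource_id TEXT, price INTEGER);
    INSERT INTO costs VALUES ('a', 100), ('a', 84), ('b', 33), ('b', 100);
""")

# Inner query keeps the max per resource_id; the outer query sums them.
total = conn.execute("""
    SELECT SUM(m_price)
    FROM (
        SELECT resource_id, MAX(price) AS m_price
        FROM costs
        GROUP BY resource_id
    ) z
""").fetchone()[0]

print(total)  # 200
```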

Postgresql frequency table with percentage partition over and group by

I'm trying to create a frequency table with percentages in PostgreSQL.
If anyone is familiar with SAS, I'm trying to recreate a PROC FREQ table.
Below, I'm trying to get the frequency with a group by on var1, var2:
var1 var2 frequency percentage
A 20 1 33%
A 30 1 33%
A 40 1 33%
B 20 4 80%
B 30 1 20%
Now that is easy with just:
select var1
     , var2
     , count(*)
from table
group by 1, 2
What gets tricky is when I try to add a percentage column that computes the % based on the var1 distribution:
select var1
     , var2
     , count(*)
     , count(*) / count(*) over (partition by var1)
from table
group by 1, 2
I get a wrong answer with the code above.
You want to sum the count(*) values. So:
select var1, var2, count(*),
count(*) * 1.0 / sum(count(*)) over (partition by var1)
from table
group by 1, 2;
Your original code just counts the number of rows for each var1 after the aggregation (and divides two integers, so the division truncates as well). Hence, it is not the distribution you want -- something that might be useful, but not what you asked for.
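The corrected query can be checked end to end; here is a sketch with Python's sqlite3 (SQLite 3.25+ also supports window functions over grouped results), using made-up sample data shaped to reproduce the question's expected table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (var1 TEXT, var2 INTEGER);
    INSERT INTO t VALUES
        ('A', 20), ('A', 30), ('A', 40),
        ('B', 20), ('B', 20), ('B', 20), ('B', 20), ('B', 30);
""")

# The * 1.0 forces floating-point division; the window SUM adds up
# the per-group counts within each var1 partition.
rows = conn.execute("""
    SELECT var1, var2, COUNT(*) AS frequency,
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY var1) AS pct
    FROM t
    GROUP BY var1, var2
    ORDER BY var1, var2
""").fetchall()

for row in rows:
    print(row)  # e.g. ('B', 20, 4, 0.8) and ('B', 30, 1, 0.2)
```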

SQL (DB2) query to get count of top 10% revenue contributing customers

I work for a telecom company, and I need to run a scheme for top-value customers who contributed 10% of the company's total revenue in the month. I want to know the count of customers who are eligible for this scheme. I am using DB2.
Ex - in the table below, the sum of Revenue is 1,828 and 10% of it is about 183; I want to know the count of the minimum number of customers whose combined revenue is either that figure or just above it.
Customers Revenue
A 156
B 259
C 389
D 125
E 578
F 321
To find all customers where their total revenue is at least 10 percent of the overall revenue:
select customer
from the_table
group by customer
having sum(revenue) >= (select sum(revenue) * 0.1 from the_table);
Your sample data doesn't show it, but this also handles multiple rows per customer in the table (your example has only a single row per customer).
To get the count of that:
select count(*)
from (
select customer
from the_table
group by customer
having sum(revenue) >= (select sum(revenue) * 0.1 from the_table)
) t
I interpret the question as wanting the highest revenue customers whose sum is at least 10% of the total revenue.
You need a cumulative sum for this:
select count(*)
from (select t.*, sum(revenue) over (order by revenue desc) as cume_rev,
sum(revenue) over () as tot_rev
from t
) t
where cume_rev <= tot_rev * 0.1;
This assumes that there is one row per customer.
EDIT:
For "just above", the where clause should be:
where cume_rev - revenue < tot_rev * 0.1;
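The cumulative-sum answer (with the "just above" edit) can be sketched with Python's sqlite3 in place of DB2; note that with the question's sample data the actual total is 1,828, so 10% is about 183 and the single customer E (578) already covers it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (customer TEXT, revenue INTEGER);
    INSERT INTO t VALUES ('A', 156), ('B', 259), ('C', 389),
                         ('D', 125), ('E', 578), ('F', 321);
""")

# cume_rev is the running total from the highest revenue down;
# a row is kept while the rows before it have not yet reached 10%.
count = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT t.*,
                 SUM(revenue) OVER (ORDER BY revenue DESC) AS cume_rev,
                 SUM(revenue) OVER () AS tot_rev
          FROM t) t
    WHERE cume_rev - revenue < tot_rev * 0.1
""").fetchone()[0]

print(count)  # 1 -- customer E alone reaches 10% of 1828
```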

how to select values that sum up to 60% of the total

Assume I have a table of this kind:
A B 3
C D 1
E F 2
G H 4
The sum of the last column is 10, and I want the biggest values that sum up to at least 60% of the total value. So, in this case, it will return
G H 4
A B 3
That goes up to 70%, but if only the first value were selected, it would only reach 40%. Even though some combination might add up to exactly 60%, we want to take the largest numbers.
So I know how to sort the values from biggest to smallest and how to sum them all up, but I don't know how to then keep only the rows that sum up to 60%.
-- save the whole sum into a variable
summa = select sum(val) from sometable;

select *
from sometable o
where (
    select sum(val)
    from sometable i
    where i.val <= o.val
) >= 0.6 * summa;
I think this gives you the correct result. You need to work with a temporary table, though; I'm not sure whether that can be avoided.
DECLARE @total bigint
select @total = SUM(value) from SampleTable

select st.*,
       convert(decimal(10,2), (select SUM(value) from SampleTable st2 where st2.Value >= st.Value)) / @total as percentage
into #temptable
from SampleTable st

select * from #temptable
where Value >= (select max(Value) from #temptable where percentage >= 0.6)
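Both answers can also be condensed with a window function where one is available; here is a sketch with Python's sqlite3 (SQLite 3.25+), using the question's four rows and the condition "the rows above this one have not yet reached 60%":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sometable (c1 TEXT, c2 TEXT, val INTEGER);
    INSERT INTO sometable VALUES
        ('A', 'B', 3), ('C', 'D', 1), ('E', 'F', 2), ('G', 'H', 4);
""")

# Running total from the largest value down; keep a row while the
# rows above it have not yet reached 60% of the grand total.
rows = conn.execute("""
    WITH ranked AS (
        SELECT *,
               SUM(val) OVER (ORDER BY val DESC) AS cume,
               SUM(val) OVER () AS total
        FROM sometable
    )
    SELECT c1, c2, val
    FROM ranked
    WHERE cume - val < 0.6 * total
    ORDER BY val DESC
""").fetchall()

print(rows)  # [('G', 'H', 4), ('A', 'B', 3)] -- 70% of the total
```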

SQL: Show average and min/max within standard deviations

I have the following SQL table -
Date StoreNo Sales
23/4 34 4323.00
23/4 23 564.00
24/4 34 2345.00
etc
I am running a query that returns average sales, max sales and min sales for a certain period -
select avg(Sales), max(sales), min(sales)
from tbl_sales
where date between etc
But some really extreme values are coming through in the min and max - perhaps because the data entry was bad, or perhaps because some anomaly occurred on that date at that store.
What I'd like is a query that returns the average, max and min but somehow excludes the extreme values. I am open to how this is done, but perhaps it would use standard deviations in some way (for example, only using data within x standard deviations of the true mean).
Many thanks
In order to calculate the standard deviation, you need to go through all of the elements first, so this is hard to do in a single pass. The lazy way is to just do it in two passes:
DECLARE
    @Avg int,
    @StDev int

SELECT @Avg = AVG(Sales), @StDev = STDEV(Sales)
FROM tbl_sales
WHERE ...

SELECT AVG(Sales) AS AvgSales, MAX(Sales) AS MaxSales, MIN(Sales) AS MinSales
FROM tbl_sales
WHERE ...
    AND Sales >= @Avg - @StDev * 3
    AND Sales <= @Avg + @StDev * 3
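The same two-pass idea can be sketched outside T-SQL; here with Python's sqlite3 (which has no STDEV aggregate, so the statistics are computed in Python) and made-up sales figures containing one obvious outlier:

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_sales (sales REAL)")
values = [1800, 1900, 2000, 2100, 2200] * 4 + [99999]  # one outlier
conn.executemany("INSERT INTO tbl_sales VALUES (?)", [(v,) for v in values])

# Pass 1: mean and sample standard deviation, like T-SQL's STDEV.
avg = statistics.mean(values)
sd = statistics.stdev(values)

# Pass 2: aggregate only the rows within avg +/- 3 standard deviations.
row = conn.execute("""
    SELECT AVG(sales), MAX(sales), MIN(sales)
    FROM tbl_sales
    WHERE sales BETWEEN ? AND ?
""", (avg - 3 * sd, avg + 3 * sd)).fetchone()

print(row)  # (2000.0, 2200.0, 1800.0) -- the outlier is excluded
```

One caveat worth knowing: with only a handful of rows, a single outlier inflates the standard deviation so much that no point can ever be 3 standard deviations out, so this filter needs a reasonably large sample to bite.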
Another simple option that might work (fairly common in the analysis of scientific data) is to just drop the lowest and highest x values, which works well if you have a lot of data to process. You can use ROW_NUMBER to do this in one statement:
WITH OrderedValues AS
(
    SELECT
        Sales,
        ROW_NUMBER() OVER (ORDER BY Sales) AS RowNumAsc,
        ROW_NUMBER() OVER (ORDER BY Sales DESC) AS RowNumDesc
    FROM tbl_sales
)
SELECT ...
FROM tbl_sales
WHERE ...
    AND Sales >
    (
        SELECT MAX(Sales)
        FROM OrderedValues
        WHERE RowNumAsc <= @ElementsToDiscard
    )
    AND Sales <
    (
        SELECT MIN(Sales)
        FROM OrderedValues
        WHERE RowNumDesc <= @ElementsToDiscard
    )
Replace ROW_NUMBER with RANK or DENSE_RANK if you want to discard a certain number of unique values.
Beyond these simple tricks you start to get into some pretty heavy stats. I have to deal with similar kinds of validation and it's far too much material for a SO post. There are a hundred different algorithms that you can tweak in a dozen different ways. I would try to keep it simple if possible!
Expanding on DuffyMo's post, you could do something like:
With SalesStats As
(
Select Sales, NTILE( 100 ) OVER ( Order By Sales ) As NtileNum
From tbl_Sales
)
Select Avg( Sales ), Max( Sales ), Min( Sales )
From SalesStats
Where NtileNum Between 5 And 95
This will exclude the lowest 5% and the highest 5%. If you have numbers that vary wildly, you may find that the average isn't a quality summary statistic, and you should consider using the median instead. You can do that with something like:
With SalesStats As
(
    Select Sales
        , NTILE( 100 ) Over ( Order By Sales ) As NtileNum
        , ROW_NUMBER() Over ( Order By Sales ) As RowNum
    From tbl_Sales
)
, TotalSalesRows As
(
    Select COUNT(*) As Total
    From tbl_Sales
)
, Median As
(
    Select Sales
    From SalesStats
    Cross Join TotalSalesRows
    Where RowNum In ( (TotalSalesRows.Total + 1) / 2, (TotalSalesRows.Total + 2) / 2 )
)
Select Avg( SalesStats.Sales ), Max( SalesStats.Sales ), Min( SalesStats.Sales ), Median.Sales
From SalesStats
Cross Join Median
Where NtileNum Between 5 And 95
Group By Median.Sales
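The NTILE trimming works as described in any engine with window functions; a sketch with Python's sqlite3 (SQLite 3.25+) follows, using a made-up uniform series of 100 sales values so the ntile buckets are easy to predict:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_sales (sales REAL)")
conn.executemany("INSERT INTO tbl_sales VALUES (?)",
                 [(float(v),) for v in range(1, 101)])  # values 1..100

# NTILE(100) assigns each row a percentile bucket; with exactly 100
# rows, value v lands in bucket v, so BETWEEN 5 AND 95 trims both tails.
row = conn.execute("""
    WITH SalesStats AS (
        SELECT sales, NTILE(100) OVER (ORDER BY sales) AS ntile_num
        FROM tbl_sales
    )
    SELECT AVG(sales), MAX(sales), MIN(sales)
    FROM SalesStats
    WHERE ntile_num BETWEEN 5 AND 95
""").fetchone()

print(row)  # (50.0, 95.0, 5.0)
```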
Maybe what you're looking for is percentiles.
Standard deviation tends to be sensitive to outliers, since it's calculated using the square of the difference between a value and the mean.
Maybe a more robust, less sensitive measure like absolute value of difference between a value and the mean would be more appropriate in your case.