SQL Query: SUM on three columns with criteria - sql

I have a table with columns like these :
idx | amount | usercol1 | usercol2 | usercol3 | percentage1 | percentage2 | percentage3
Data is typically like this :
0 | 1500 | 1 | null | null | 100 | null | null
1 | 3000 | 2 | 3 | null | 50 | 50 | null
I would like to make a SUM() of every user's amount.
Example :
user1= 1500*100/100 (amount*percentage1/100)
user2= 3000*50/100 (amount*percentage1/100)
user3= 3000*50/100 (amount*percentage2/100)
I tried UNION to no avail (did not sum the SUMs).
Is there a way to do this ? The problem being that it should GROUP BY the username (which I get with a LEFT OUTER JOIN usernames ON exampletable.usercol1=usernames.idx).
I know this is non standard and would be better with relations from another table. But I am not allowed to change the table structure.
Many many many thanks ! :=)
Here is an example that gives a wrong result (it seems to return only the results of the middle query):
(
SELECT SUM(projects.amount * (projects.percentage1/100)) as totalproj,
entities.idx as idx,
COUNT(projects.idx) as numproj,
entities.name
FROM projects
INNER JOIN entities ON projects.usercol1=entities.idx
WHERE projects.usercol1=entities.idx
GROUP BY name ORDER BY totalproj DESC
)
UNION ALL
(
SELECT SUM(projects.amount * (projects.percentage2/100)) as totalproj,
entities.idx as idx,
COUNT(projects.idx) as numproj,
entities.name
FROM projects
INNER JOIN entities ON projects.usercol2=entities.idx
WHERE projects.usercol2=entities.idx
GROUP BY name ORDER BY totalproj DESC
)
UNION ALL
(
SELECT SUM(projects.amount * (projects.percentage3/100)) as totalproj,
entities.idx as idx,
COUNT(projects.idx) as numproj,
entities.name
FROM projects
INNER JOIN entities ON projects.usercol3=entities.idx
WHERE projects.usercol3=entities.idx
GROUP BY name ORDER BY totalproj DESC
)
ORDER BY totalproj DESC
LIMIT 10

You could use a derived table to simulate a first normal form table then join onto that.
SELECT SUM(P.amount * (P.percentage/100)) as totalproj,
entities.idx as idx,
COUNT(P.idx) as numproj,
entities.name
FROM
(
SELECT idx, amount, usercol1 AS usercol, percentage1 AS percentage
FROM projects
UNION ALL
SELECT idx, amount, usercol2 AS usercol, percentage2 AS percentage
FROM projects
UNION ALL
SELECT idx, amount, usercol3 AS usercol, percentage3 AS percentage
FROM projects
) P
INNER JOIN entities ON P.usercol = entities.idx
GROUP BY entities.idx, entities.name
ORDER BY totalproj DESC

Using this data (I added some stranger data to make sure the math was working properly):
0 1500 1 NULL NULL 100 NULL NULL
1 3000 2 3 NULL 50 50 NULL
2 780 4 1 3 70 20 50
3 3800 2 4 1 30 20 10
I got these results:
user commission
------- -------------
1 2036
2 2640
3 1890
4 1306
Is this what you were looking for? Below is my query:
SELECT [user]
,SUM([commission]) AS commission
FROM ( SELECT [usercol1] AS [user]
,( [amount] * [percentage1] ) / 100 AS commission
FROM [dbo].[projects]
WHERE [usercol1] IS NOT NULL
AND [percentage1] IS NOT NULL
UNION ALL
SELECT [usercol2]
,( [amount] * [percentage2] ) / 100
FROM [dbo].[projects]
WHERE [usercol2] IS NOT NULL
AND [percentage2] IS NOT NULL
UNION ALL
SELECT [usercol3]
,( [amount] * [percentage3] ) / 100
FROM [dbo].[projects]
WHERE [usercol3] IS NOT NULL
AND [percentage3] IS NOT NULL
) x
GROUP BY [user]
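To check the arithmetic, here is a minimal sketch that runs the same UNION ALL unpivot against an in-memory SQLite database from Python. The use of SQLite, the function name commission_per_user and the constant SAMPLE_ROWS are assumptions made for illustration; the table layout and the four rows are the ones quoted above. Dividing by 100.0 is a deliberate choice to avoid integer division in engines that truncate.

```python
import sqlite3

# The four sample rows from the answer above:
# (idx, amount, usercol1..3, percentage1..3)
SAMPLE_ROWS = [
    (0, 1500, 1, None, None, 100, None, None),
    (1, 3000, 2, 3, None, 50, 50, None),
    (2, 780, 4, 1, 3, 70, 20, 50),
    (3, 3800, 2, 4, 1, 30, 20, 10),
]


def commission_per_user(rows):
    """Sum amount * percentage / 100 per user over the three column pairs."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE projects (idx INTEGER, amount INTEGER, "
        "usercol1 INTEGER, usercol2 INTEGER, usercol3 INTEGER, "
        "percentage1 INTEGER, percentage2 INTEGER, percentage3 INTEGER)"
    )
    con.executemany("INSERT INTO projects VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    query = """
        SELECT usercol, SUM(amount * percentage / 100.0) AS commission
        FROM (
            -- one row per (project, user slot): simulates a normalized table
            SELECT usercol1 AS usercol, amount, percentage1 AS percentage FROM projects
            UNION ALL
            SELECT usercol2, amount, percentage2 FROM projects
            UNION ALL
            SELECT usercol3, amount, percentage3 FROM projects
        ) AS p
        WHERE usercol IS NOT NULL AND percentage IS NOT NULL
        GROUP BY usercol
        ORDER BY usercol
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for user, commission in commission_per_user(SAMPLE_ROWS):
        print(user, commission)
```

On the sample rows this reproduces the totals listed above (2036, 2640, 1890 and 1306 for users 1 to 4). The key point is that the SUM happens once, outside the UNION ALL, rather than once per branch.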

Related

Aggregating columns inside a CASE statement

I have a case such that
~id ~from ~to ~label ~weight
100 A B knows 2
100 A B knows 3
100 A B knows 4
But I want only the weight for the maximum Date.
How can I modify the CASE statement below so that there is only one entry per ID?
Query:
(
select distinct
CASE WHEN *some-condition* as "~id"
,CASE *some-condition* as "~from"
,CASE *some-condition* as "~to"
,CASE *some-condition* as "~label"
,CASE ??? as "weight"
from
(select
dense_rank() over(partition by t.job_id order by start_time desc) rnk,
t.Date,
t.job_id,
t.start_time,
t.end_time,
t.dep_id,
t.table_name
.....
t.region_id,
from Table1 t
,Tabel2 J
where t.JOB_ID=J.JOB_ID
)
where rnk=1
order by JOB_ID,table_name
)
where "~id" is NOT NULL and "~label" is NOT NULL and "~from" is NOT NULL and "~to" is NOT NULL;
Table t
job_id Date table_name ....... dep_id weight
100 2020-10-20 abc 1 2
100 2020-10-20 abc 2 3
100 2020-10-20 abc 3 4
100 2020-10-20 abc 4 10
100 2020-10-19 abc 3 2
The weight in the output should correspond to the maximum dep_id.
~id ~from ~to ~label ~weight
100 A B knows 10
It's quite hard to come up with a solution since you didn't state how ~id, ~from, ~to, ~label are calculated. You should be able to achieve your desired output with window functions, i.e. FIRST_VALUE():
...
,CASE *some-condition* as "~label"
,FIRST_VALUE(weight) OVER (ORDER BY dep_id DESC) AS "weight"
...
You may need to add a PARTITION BY clause, depending on whether you want the first value overall or the first value within some grouping.
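As a rough illustration of the FIRST_VALUE() idea, the sketch below loads the five sample rows from the question into an in-memory SQLite database and picks one weight per job_id. Several things here are assumptions: SQLite itself (window functions need version 3.25 or later), the column name run_date standing in for Date, the function name latest_weight_per_job, and the choice to partition by job_id and order by date and then dep_id, which is one reading of the stated requirement.

```python
import sqlite3

# Rows from "Table t" in the question: (job_id, Date, table_name, dep_id, weight)
SAMPLE_ROWS = [
    (100, "2020-10-20", "abc", 1, 2),
    (100, "2020-10-20", "abc", 2, 3),
    (100, "2020-10-20", "abc", 3, 4),
    (100, "2020-10-20", "abc", 4, 10),
    (100, "2020-10-19", "abc", 3, 2),
]


def latest_weight_per_job(rows):
    """Return one (job_id, weight) pair per job: latest date, then highest dep_id."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (job_id INTEGER, run_date TEXT, table_name TEXT, "
        "dep_id INTEGER, weight INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT DISTINCT
            job_id,
            FIRST_VALUE(weight) OVER (
                PARTITION BY job_id                  -- one winner per job
                ORDER BY run_date DESC, dep_id DESC  -- newest date, then largest dep_id
            ) AS weight
        FROM t
        ORDER BY job_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(latest_weight_per_job(SAMPLE_ROWS))
```

Because FIRST_VALUE() yields the same value for every row of a partition, SELECT DISTINCT collapses the five rows into a single row with weight 10, which matches the expected output in the question.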

SQL GROUP BY where either column has same value

I have the following table
User A | User B | Value
-------+--------+------
1 | 2 | 60
3 | 1 | 10
4 | 5 | 50
3 | 5 | 50
5 | 1 | 80
2 | 3 | 10
I want to group together records where either user a = x or user b = x, in order to find averages.
e.g. User 1 appears in the table 3 times, once as 'User A' and twice as 'User B'. So I would want to carry out my AVG() function using those three rows.
I need the highest and lowest average values. Such a query would break down the above table into the following groups:
User | Avg Value
-----+-----
1 | 50
2 | 35
3 | 23.33
4 | 50
5 | 60
and then return
Highest Avg | Lowest Avg
------------+-----------
60 | 23.33
I know that GROUP BY collects together records where a column has the same value. I want to collect together records where either one of two columns has the same value. I have searched through many solutions but can't seem to find one that meets my problem.
A portable option uses union all:
select usr, avg(value) avg_value
from (
select usera usr, value from mytable
union all select userb, value from mytable
) t
group by usr
This gives you the first resultset. Then, you can add another level of aggregation to get the maximum and minimum average:
select min(avg_value) min_avg_value, max(avg_value) max_avg_value
from (
select usr, avg(value) avg_value
from (
select usera usr, value from mytable
union all select userb, value from mytable
) t
group by usr
) t
In databases that support lateral joins and values(), this is most conveniently (and efficiently) expressed as follows:
select min(avg_value) min_avg_value, max(avg_value) max_avg_value
from (
select usr, avg(value) avg_value
from mytable t
cross join lateral (values (usera, value), (userb, value)) as x(usr, value)
group by usr
) t
This would work in Postgres for example. In SQL Server, you would just replace cross join lateral with cross apply.
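The union all variant above can be checked against the sample table with a short script. This is only a sketch: it assumes an in-memory SQLite database, the table name mytable and column names usera/userb used in the queries above, and the helper names average_per_user and extreme_averages, which are made up for this example. The lateral join variant is not shown because SQLite does not support it.

```python
import sqlite3

# Sample rows from the question: (User A, User B, Value)
ROWS = [(1, 2, 60), (3, 1, 10), (4, 5, 50), (3, 5, 50), (5, 1, 80), (2, 3, 10)]

# Unpivot: every row appears once under User A and once under User B.
UNPIVOT = """
    SELECT usera AS usr, value FROM mytable
    UNION ALL
    SELECT userb, value FROM mytable
"""


def _connect(rows):
    """Create an in-memory table holding the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (usera INTEGER, userb INTEGER, value INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    return con


def average_per_user(rows):
    """Average value per user, counting a row whenever the user is in either column."""
    con = _connect(rows)
    query = f"SELECT usr, AVG(value) FROM ({UNPIVOT}) AS t GROUP BY usr ORDER BY usr"
    result = con.execute(query).fetchall()
    con.close()
    return result


def extreme_averages(rows):
    """Lowest and highest of the per-user averages."""
    con = _connect(rows)
    query = f"""
        SELECT MIN(avg_value), MAX(avg_value)
        FROM (
            SELECT usr, AVG(value) AS avg_value
            FROM ({UNPIVOT}) AS t
            GROUP BY usr
        ) AS a
    """
    result = con.execute(query).fetchone()
    con.close()
    return result


if __name__ == "__main__":
    print(average_per_user(ROWS))
    print(extreme_averages(ROWS))
```

On the sample rows the per-user averages come out as 50, 35, about 23.33, 50 and 60, so the lowest and highest averages are about 23.33 and 60, as in the question.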
You can unpivot using union all and then aggregation:
select usr, avg(value)
from ((select usera as usr, value from mytable) union all
(select userb as usr, value from mytable)
) u
group by usr;
You can get the extremes with another level of aggregation:
select min(avg_value), max(avg_value)
from (select usr, avg(value) as avg_value
from ((select usera as usr, value from mytable) union all
(select userb as usr, value from mytable)
) u
group by usr
) ua;

How to count over rows and avoid duplicates?

For a university project I have to calculate a kpi based on the data of one table. The table stores data about baskets of a supermarket and the shopped line items and their product category. I have to calculate a number of all product categories of products which were bought in a specific store. So in tables it looks like this:
StoreId BasketID CategoryId
1 1 1
1 1 2
1 1 3
1 2 1
1 2 3
1 2 4
2 3 1
2 3 2
2 3 3
2 4 1
As a result of the query I want a table which counts the distinct product categories over all baskets associated with a store.
Something like this:
StoreId Count(CategoryId)
1 4
2 3
If I write a static statement with hard-coded values, it works.
select basket_hash, store_id, count(DISTINCT retailer_category_id)
from promo.checkout_item
where store_id = 2
and basket_hash = 123
GROUP BY basket_hash, store_id;
But when I try to write it in a dynamic way, the sql calculates the amount per basket and adds the single amounts together.
select store_id, Count(DISTINCT retailer_category_id)
from promo.checkout_item
group by store_id;
But like this it isn't comparing the categories over all baskets associated to a store and I'm getting duplicates because a category can be in basket 1 and in basket 2.
Can somebody pls help?!
Thx!
Based on your expected result, is the following statement what you want?
SELECT StoreId, COUNT(*)
FROM (
SELECT DISTINCT StoreId, CategoryId
FROM table_name
) t
GROUP BY StoreId;
Please replace "table_name" in the statement with your table's name.
I'm not sure what "dynamic way" means.
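For comparison, the sketch below runs the distinct-pairs subquery and a plain COUNT(DISTINCT ...) grouped by store against the ten sample rows from the question. It assumes an in-memory SQLite database and the column names from the question's own queries (store_id, basket_hash, retailer_category_id); the function name categories_per_store is invented for the example.

```python
import sqlite3

# Sample rows from the question: (StoreId, BasketID, CategoryId)
ITEMS = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 3),
    (1, 2, 4), (2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 4, 1),
]


def categories_per_store(items):
    """Count distinct categories per store in two ways and return both results."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE checkout_item "
        "(store_id INTEGER, basket_hash INTEGER, retailer_category_id INTEGER)"
    )
    con.executemany("INSERT INTO checkout_item VALUES (?, ?, ?)", items)
    # Variant 1: reduce to distinct (store, category) pairs, then count rows.
    via_subquery = con.execute("""
        SELECT store_id, COUNT(*)
        FROM (
            SELECT DISTINCT store_id, retailer_category_id
            FROM checkout_item
        ) AS t
        GROUP BY store_id
        ORDER BY store_id
    """).fetchall()
    # Variant 2: let the aggregate remove duplicates within each store.
    via_count_distinct = con.execute("""
        SELECT store_id, COUNT(DISTINCT retailer_category_id)
        FROM checkout_item
        GROUP BY store_id
        ORDER BY store_id
    """).fetchall()
    con.close()
    return via_subquery, via_count_distinct


if __name__ == "__main__":
    print(categories_per_store(ITEMS))
```

On this sample both variants return 4 categories for store 1 and 3 for store 2. What matters is that basket_hash is left out of both the SELECT list and the GROUP BY, so duplicates are removed across all baskets of a store.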
I'm confused by your requirements. This is what I suppose you mean:
with checkout_item (store_id, basket_hash, retailer_category_id) as (
values
(1,1,1),(1,1,2),(1,1,3),(1,2,1),(1,2,3),
(1,2,4),(2,3,1),(2,3,2),(2,3,3),(2,4,1)
)
select distinct store_id, basket_hash, store_cats, basket_cats
from (
select store_id, basket_hash,
max(store_cats) over (partition by store_id) as store_cats,
max(basket_cats) over (partition by basket_hash) as basket_cats
from (
select store_id, basket_hash,
dense_rank() over (
partition by store_id
order by retailer_category_id
) as store_cats,
dense_rank() over (
partition by basket_hash
order by retailer_category_id
) as basket_cats
from checkout_item
) s
) s
order by 1, 2
;
store_id | basket_hash | store_cats | basket_cats
----------+-------------+------------+-------------
1 | 1 | 4 | 3
1 | 2 | 4 | 3
2 | 3 | 3 | 3
2 | 4 | 3 | 1

Update column value of one row from other rows

I have the following table:
sno name pid amount total
1 Arif 0 100 null
2 Raj 1 200 null
3 Ramesh 2 100 null
4 Pooja 2 100 null
5 Swati 3 200 null
6 King 4 100 null
I want the total for each person to be the sum of that person's own amount plus the amounts of all of their descendants.
For example:
for Raj, total = amount of (Raj + Ramesh + Pooja + Swati + King)
for Swati, total = amount of Swati only.
You could try something like this:
WITH hierarchified AS (
SELECT
sno,
amount,
hierarchyID = CAST(sno AS varchar(500))
FROM yourTable
WHERE pid = 0
UNION ALL
SELECT
t.sno,
t.amount,
hierarchyID = CAST(h.hierarchyID + '/' + RTRIM(t.sno) AS varchar(500))
FROM yourTable t
INNER JOIN hierarchified h ON t.pid = h.sno
)
UPDATE yourTable
SET total = t.amount + ISNULL(
(
SELECT SUM(amount)
FROM hierarchified
WHERE hierarchyID LIKE h.hierarchyID + '/%'
),
0
)
FROM yourTable t
INNER JOIN hierarchified h ON t.sno = h.sno;
Note that this query (which you can try on SQL Fiddle) would probably not be very efficient on a large dataset. It might do as a one-off query, and then it would likely be better to organise updating the totals each time the table is updated, i.e. using triggers.
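For readers without SQL Server at hand, here is a hedged sketch of the same idea in SQLite, driven from Python. It swaps the path-string technique above for a different one: the recursive CTE enumerates (ancestor, descendant) pairs directly, with every row counted as its own descendant, and the totals are summed per ancestor. The table name people, the CTE name subtree and the function name update_totals are assumptions for this example; the six rows are the ones from the question.

```python
import sqlite3

# Rows from the question: (sno, name, pid, amount); pid 0 marks the root.
PEOPLE = [
    (1, "Arif", 0, 100),
    (2, "Raj", 1, 200),
    (3, "Ramesh", 2, 100),
    (4, "Pooja", 2, 100),
    (5, "Swati", 3, 200),
    (6, "King", 4, 100),
]


def update_totals(rows):
    """Fill total with own amount plus the amounts of all descendants."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE people (sno INTEGER PRIMARY KEY, name TEXT, "
        "pid INTEGER, amount INTEGER, total INTEGER)"
    )
    con.executemany(
        "INSERT INTO people (sno, name, pid, amount) VALUES (?, ?, ?, ?)", rows
    )
    con.execute("""
        WITH RECURSIVE subtree(root, sno, amount) AS (
            -- anchor: every person is the root of their own subtree
            SELECT sno, sno, amount FROM people
            UNION ALL
            -- step: attach the children of any node already in the subtree
            SELECT s.root, p.sno, p.amount
            FROM people AS p
            JOIN subtree AS s ON p.pid = s.sno
        )
        UPDATE people
        SET total = (
            SELECT SUM(subtree.amount)
            FROM subtree
            WHERE subtree.root = people.sno
        )
    """)
    result = con.execute("SELECT name, total FROM people ORDER BY sno").fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for name, total in update_totals(PEOPLE):
        print(name, total)
```

With the sample data Raj ends up with 700 and Swati with 200, in line with the example in the question. Like the query above, this recomputes every subtree and is not meant for large tables.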

How to find range of a number where the ranges come dynamically from another table?

If I had two tables:
PersonID | Count
-----------------
1 | 45
2 | 5
3 | 120
4 | 87
5 | 60
6 | 200
7 | 31
SizeName | LowerLimit
-----------------
Small | 0
Medium | 50
Large | 100
I'm trying to figure out how to do a query to get a result similar to:
PersonID | SizeName
-----------------
1 | Small
2 | Small
3 | Large
4 | Medium
5 | Medium
6 | Large
7 | Small
Basically, one table specifies an unknown number of "range names" and their integer ranges associated. So a count range of 0 to 49 from the person table gets a 'small' designation. 50-99 gets 'medium' etc. But I need it to be dynamic because I do not know the range names or integer values. Can I do this in a single query or would I have to write a separate function to loop through the possibilities?
Try this out:
SELECT PersonID, SizeName
FROM
(
SELECT
PersonID,
(SELECT MAX([LowerLimit]) FROM dbo.[Size] WHERE [LowerLimit] <= [COUNT]) As LowerLimit
FROM dbo.Person
) A
INNER JOIN dbo.[SIZE] B ON A.LowerLimit = B.LowerLimit
With Ranges As
(
Select 'Small' As Name, 0 As LowerLimit
Union All Select 'Medium', 50
Union All Select 'Large', 100
)
, Person As
(
Select 1 As PersonId, 45 As [Count]
Union All Select 2, 5
Union All Select 3, 120
Union All Select 4, 87
Union All Select 5, 60
Union All Select 6, 200
Union All Select 7, 31
)
, RangeStartEnd As
(
Select R1.Name
, Min(R1.LowerLimit) As StartValue
, Coalesce(MIN(R2.LowerLimit), 2147483647) As EndValue
From Ranges As R1
Left Join Ranges As R2
On R2.LowerLimit > R1.LowerLimit
Group By R1.Name
)
Select P.PersonId, P.[Count], RSE.Name
From Person As P
Join RangeStartEnd As RSE
On P.[Count] >= RSE.StartValue
And P.[Count] < RSE.EndValue
Although I'm using common-table expressions (cte for short) which only exist in SQL Server 2005+, this can be done with multiple queries where you create a temp table to store the equivalent of the RangeStartEnd cte. The trick is to create a view that has a start column and end column.
SELECT p.PersonID, Ranges.SizeName
FROM People P
JOIN
(
SELECT SizeName, LowerLimit, MIN(COALESCE(upperlimit, 2000000)) AS upperlimit
FROM (
SELECT rl.SizeName, rl.LowerLimit, ru.LowerLimit AS UpperLimit
FROM Ranges rl
LEFT OUTER JOIN Ranges ru ON rl.LowerLimit < ru.LowerLimit
) r
WHERE r.LowerLimit < COALESCE(r.UpperLimit, 2000000)
GROUP BY SizeName, LowerLimit
) Ranges ON p.Count >= Ranges.LowerLimit AND p.Count < Ranges.upperlimit
ORDER BY PersonID
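The boundary handling is the easy part to get wrong here: the question says 0 to 49 is Small and 50 to 99 is Medium, so a range must include its own lower limit. The sketch below is one way to check that in an in-memory SQLite database. It uses a correlated subquery that picks the largest LowerLimit not exceeding the count, which is a variation on the answers above rather than any one of them; the table names person and size, the column names, and the function name size_per_person are assumptions for the example.

```python
import sqlite3

# Sample data from the question.
PEOPLE = [(1, 45), (2, 5), (3, 120), (4, 87), (5, 60), (6, 200), (7, 31)]
SIZES = [("Small", 0), ("Medium", 50), ("Large", 100)]


def size_per_person(people, sizes):
    """Map each person's count to the range with the largest lower limit <= count."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (person_id INTEGER, cnt INTEGER)")
    con.execute("CREATE TABLE size (size_name TEXT, lower_limit INTEGER)")
    con.executemany("INSERT INTO person VALUES (?, ?)", people)
    con.executemany("INSERT INTO size VALUES (?, ?)", sizes)
    query = """
        SELECT p.person_id,
               (SELECT s.size_name
                FROM size AS s
                WHERE s.lower_limit <= p.cnt   -- inclusive lower bound
                ORDER BY s.lower_limit DESC    -- closest limit first
                LIMIT 1) AS size_name
        FROM person AS p
        ORDER BY p.person_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for person_id, size_name in size_per_person(PEOPLE, SIZES):
        print(person_id, size_name)
```

On the question's data this yields Small, Small, Large, Medium, Medium, Large, Small for persons 1 to 7. Counts that sit exactly on a limit (0, 50, 100) land in the range that starts there, and new rows in the size table are picked up without changing the query.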