Cumulative Percentage Categorization Query - sql

I've got a query that's been driving me up the wall. The T-SQL is as follows, but with about a thousand records in the source table it takes forever and a day to run. Can anyone think of a faster way of accomplishing the same task?
SELECT *, ROUND((SELECT SUM(PartTotal)
                 FROM PartSalesRankings
                 WHERE Item_Rank <= sub.Item_Rank) /
                (SELECT SUM(PartTotal)
                 FROM PartSalesRankings) * 100, 2) as Cum_PC_Of_Total
FROM PartSalesRankings As sub
I'm trying to classify my inventory into A, B, and C categories based on percentage of cost, but it needs to be a cumulative percentage of cost, i.e. 'A' parts make up 80% of my cost, 'B' parts make up the next 15%, and 'C' parts are the last 5%. There's obviously more to the SQL statement than what I included, but the code I posted is the bottleneck.
Any help would be greatly appreciated!
Aj

Try this:
;WITH SumPartTotal AS
(SELECT SUM(PartTotal) as SumPartTotal, Item_Rank
FROM PartSalesRankings
GROUP BY Item_Rank
),
CumSumPartTotal AS
(SELECT SUM(Sub.SumPartTotal) as CumSumPartTotal, SPT.Item_Rank
FROM SumPartTotal as SPT
INNER JOIN SumPartTotal as Sub ON
SPT.Item_Rank >= Sub.Item_Rank
GROUP BY SPT.Item_Rank
),
SumTotal AS
(SELECT SUM(PartTotal) as SumTotal
FROM PartSalesRankings
)
SELECT *,
ROUND((CumSumPartTotal.CumSumPartTotal * 100.0) / SumTotal.SumTotal, 2) as Cum_PC_Of_Total
FROM PartSalesRankings As sub
INNER JOIN CumSumPartTotal ON
sub.Item_Rank = CumSumPartTotal.Item_Rank
CROSS JOIN SumTotal
It should give your query a speed boost.

Sorry for the delay in responding; I've been up to my neck in other stuff. After a great deal of trial and error, I've found the following to be the fastest method:
Select companypartnumber
     , (PartTotal + IsNull(Cum_Lower_Ranks, 0)) / Sum(PartTotal) over() * 100 as Cum_PC_Of_Total
from PartSalesRankings PSRMain
Left join
(
    Select PSRTop.Item_Rank, Sum(PSRBelow.PartTotal) as Cum_Lower_Ranks
    from PartSalesRankings PSRTop
    Left join PartSalesRankings PSRBelow on PSRBelow.Item_Rank < PSRTop.Item_Rank
    Group by PSRTop.Item_Rank
) as PSRLowerCums on PSRLowerCums.Item_Rank = PSRMain.Item_Rank
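On SQL Server 2012 and later, the whole cumulative calculation can also be done in a single pass with a windowed running sum, with no self-join at all. A minimal sketch of that approach, run against SQLite through Python's sqlite3 (the table and column names come from the question; the PartTotal values are invented):

```python
import sqlite3

# A running SUM() OVER (ORDER BY ...) replaces the correlated subquery,
# and SUM() OVER () replaces the grand-total subquery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PartSalesRankings (Item_Rank INTEGER, PartTotal REAL)")
conn.executemany(
    "INSERT INTO PartSalesRankings VALUES (?, ?)",
    [(1, 500.0), (2, 300.0), (3, 150.0), (4, 50.0)],  # invented sample data
)

rows = conn.execute(
    """
    SELECT Item_Rank,
           ROUND(SUM(PartTotal) OVER (ORDER BY Item_Rank) * 100.0
                 / SUM(PartTotal) OVER (), 2) AS Cum_PC_Of_Total
    FROM PartSalesRankings
    ORDER BY Item_Rank
    """
).fetchall()

for rank, cum_pc in rows:
    print(rank, cum_pc)
```

Because both window functions are computed in the same scan, the table is read only once, instead of once per row as in the original correlated subqueries.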


Query to segregate multiple values inserted into a single column

There is a machine made of multiple parts, and the parts come from two different countries. I need to find the percentage of parts used from each country.
Machine | Parts_Used                                                  | % of IN part | % of CH parts
M_001   | IN_001, CH_001, IN_002, CH002, IN_003, IN_004, IN_005,      |              |
M_002   | IN_0011, CH_0011, IN_0012, CH0012, CH_0013, CH_0014, Ch_001 |              |
select count(*) as "% of CH parts"
from tablename
where Parts_Used like 'CH%';
I used this, but it did not give the result I need.
Try this:
SELECT MP.[Machine]
      -- LTRIM because STRING_SPLIT keeps the space after each comma;
      -- match 'IN%'/'CH%' because some codes (CH002, CH0012) lack the underscore
      ,SUM(IIF(LTRIM(PU.[value]) LIKE 'IN%', 1, 0)) * 100.0 / COUNT(*) AS [% of IN part]
      ,SUM(IIF(LTRIM(PU.[value]) LIKE 'CH%', 1, 0)) * 100.0 / COUNT(*) AS [% of CH parts]
FROM machine_parts MP
CROSS APPLY STRING_SPLIT (MP.Parts_Used, ',') PU
WHERE LTRIM(PU.[value]) <> ''  -- a trailing comma produces an empty element
GROUP BY MP.[Machine]
The idea is to split the values so that we know how many parts we have, and then perform the conditional counting easily with SUM and IIF. Basically, we have the following:
SELECT *
FROM machine_parts MP
CROSS APPLY STRING_SPLIT (Parts_Used, ',') PU
Then it is just a matter of counting.
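The same split-and-count logic can be checked in plain Python. This sketch uses the M_001 row from the question; note the trailing comma in the data, which yields an empty element that has to be dropped, and that the prefixes are matched without the underscore because some codes (CH002) lack it:

```python
# Split the comma-separated Parts_Used value (copied from the M_001 row),
# drop the empty element caused by the trailing comma, and count by prefix.
parts_used = "IN_001, CH_001, IN_002, CH002, IN_003, IN_004, IN_005,"

parts = [p.strip() for p in parts_used.split(",") if p.strip()]
total = len(parts)
in_pct = 100.0 * sum(p.upper().startswith("IN") for p in parts) / total
ch_pct = 100.0 * sum(p.upper().startswith("CH") for p in parts) / total

print(round(in_pct, 2), round(ch_pct, 2))  # 71.43 28.57
```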

SQL - Group values by range

I have following query:
SELECT
    polutionmm2 AS metric,
    sum(cnt) AS value
FROM polutiondistributionstatistic AS p
INNER JOIN crates AS c ON p.crateid = c.id
WHERE c.name = '154'
  AND to_timestamp(startts) >= '2021/01/20 00:00:00'
GROUP BY polutionmm2
this query returns these values:
"metric","value"
50,580
100,8262
150,1548
200,6358
250,869
300,3780
350,505
400,2248
450,318
500,1674
550,312
600,7420
650,1304
700,2445
750,486
800,985
850,139
900,661
950,99
1000,550
I need to edit the query so that it groups them together in ranges of 100, starting from 0. Everything with a metric value between 0 and 99 should become one row, with the value being the sum of those rows, like this:
"metric","value"
0,580
100,9810
200,7227
300,4285
400,2566
500,1986
600,8724
700,2931
800,1124
900,760
1000,550
The query will run over about 500,000 rows. Can this be done in the query itself, and is it efficient?
EDIT:
there can be up to 500 ranges, so an automatic way of grouping them would be great.
You can use generate_series() and a range type to generate the ranges you want, e.g.:
select int4range(x.start, case when x.start = 1000 then null else x.start + 100 end, '[)') as range
from generate_series(0,1000,100) as x(start)
This generates the ranges [0,100), [100,200) and so on up until [1000,).
You can adjust the width and the number of ranges by using different parameters for generate_series() and adjusting the expression that evaluates the last range.
This can be used in an outer join to aggregate the values per range:
with ranges as (
select int4range(x.start, case when x.start = 1000 then null else x.start + 100 end, '[)') as range
from generate_series(0,1000,100) as x(start)
)
select r.range as metric,
sum(t.value)
from ranges r
left join the_table t on r.range #> t.metric
group by range;
The expression r.range #> t.metric tests whether the metric value falls into the (generated) range.
You can create a pseudo-table with whatever interval you like and join against it. I'll use a recursive CTE in this case.
WITH RECURSIVE cte AS(
select 0 St, 99 Ed
UNION ALL
select St + 100, Ed + 100 from cte where St <= 1000
)
select cte.st as metric,sum(tb.value) as value from cte
inner join [tableName] tb --with OP query result
on tb.metric between cte.St and cte.Ed
group by cte.st
order by st
Use conditional aggregation:
SELECT
    case when polutionmm2>=0   and polutionmm2<100  then '0'
         when polutionmm2>=100 and polutionmm2<200  then '100'
         ........
         when polutionmm2>=900 and polutionmm2<1000 then '900'
         when polutionmm2>=1000                     then '1000'
    end AS metric,
    sum(cnt) as value
FROM polutiondistributionstatistic as p inner join crates as c on p.crateid = c.id
WHERE
    c.name = '154'
    and to_timestamp(startts) >= '2021/01/20 00:00:00'
group by case when polutionmm2>=0   and polutionmm2<100  then '0'
              when polutionmm2>=100 and polutionmm2<200  then '100'
              ........
              when polutionmm2>=900 and polutionmm2<1000 then '900'
              when polutionmm2>=1000                     then '1000'
         end
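Since the buckets are all a fixed width of 100, the label can also be computed directly with integer division, avoiding both the CASE ladder and a generated lookup table (in PostgreSQL the equivalent expression over an integer column is (polutionmm2 / 100) * 100). A sketch over the sample data from the question, run against SQLite through Python's sqlite3:

```python
import sqlite3

# Bucketing by integer division: metric 0-99 -> 0, 100-199 -> 100, and so on.
# The (metric, value) pairs are the sample rows from the question.
data = [
    (50, 580), (100, 8262), (150, 1548), (200, 6358), (250, 869),
    (300, 3780), (350, 505), (400, 2248), (450, 318), (500, 1674),
    (550, 312), (600, 7420), (650, 1304), (700, 2445), (750, 486),
    (800, 985), (850, 139), (900, 661), (950, 99), (1000, 550),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (metric INTEGER, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", data)

rows = conn.execute(
    """
    SELECT (metric / 100) * 100 AS bucket, SUM(value)
    FROM t
    GROUP BY bucket
    ORDER BY bucket
    """
).fetchall()

for bucket, total in rows:
    print(bucket, total)
```

One caveat compared with the generate_series() approach: buckets with no rows simply do not appear in the output.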

Out of range integer: infinity

So I'm trying to work through a problem that's a bit hard to explain, and I can't expose any of the data I'm working with. The error below comes up when I run the following query; I've renamed some of the tables and columns for sensitivity reasons, but the structure should be the same.
"Error from Query Engine - Out of range for integer: Infinity"
WITH accounts AS (
SELECT t.user_id
FROM table_a t
WHERE t.type like '%Something%'
),
CTE AS (
SELECT
st.x_user_id,
ad.name as client_name,
sum(case when st.score_type = 'Agility' then st.score_value else 0 end) as score,
st.obs_date,
ROW_NUMBER() OVER (PARTITION BY st.x_user_id,ad.name ORDER BY st.obs_date) AS rn
FROM client_scores st
LEFT JOIN account_details ad on ad.client_id = st.x_user_id
INNER JOIN accounts on st.x_user_id = accounts.user_id
--WHERE st.x_user_id IN (101011115,101012219)
WHERE st.obs_date >= '2020-05-18'
group by 1,2,4
)
SELECT
c1.x_user_id,
c1.client_name,
c1.score,
c1.obs_date,
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
FROM CTE c1
LEFT JOIN CTE c2 on c1.x_user_id = c2.x_user_id and c1.client_name = c2.client_name and c1.rn = c2.rn +2
I know the query itself works, because when I get rid of the first CTE and hard-code two IDs into the commented-out WHERE clause, it returns the data I want. But I also need it to run against the first CTE, which has ~5,000 unique IDs.
Here is a sample output if I try with two IDs:
Based on the number of rows returned per ID above, I would expect it to return about 5,000 * 3 = 15,000 rows.
What could be causing the out of range for integer error?
This line is likely your problem:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / c2.score) * 100, 0) AS INT) AS score_diff
When the value of c2.score is 0, 1.0/c2.score will be infinity and will not fit into an integer type that you’re trying to cast it into.
The reason it’s working for the two users in your example is that they don’t have a 0 value for c2.score.
You might be able to fix this by changing to:
CAST(COALESCE (((c1.score - c2.score) * 1.0 / NULLIF(c2.score, 0)) * 100, 0) AS INT) AS score_diff
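The NULLIF guard can be checked in isolation. A minimal sketch through Python's sqlite3 (the score values are invented; engines differ in what a raw division by zero produces, so the exact "Infinity" error may not reproduce here, but the guard works the same way everywhere NULLIF is supported):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULLIF(x, 0) turns a zero divisor into NULL; the division then yields NULL
# instead of infinity, and COALESCE maps that NULL back to 0 before the cast.
guarded = "CAST(COALESCE(((? - ?) * 1.0 / NULLIF(?, 0)) * 100, 0) AS INT)"

ok = conn.execute(f"SELECT {guarded}", (150, 100, 100)).fetchone()[0]
zero_divisor = conn.execute(f"SELECT {guarded}", (150, 0, 0)).fetchone()[0]

print(ok, zero_divisor)  # 50 0
```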

Adding subquery to a grid query

Following on from this question, I have another query in which I need to subtract 10 from all negative values in the data. Sadly, I'm just not sure how to work the subquery given in the previous question into this one.
The query in question is
SELECT 10 * (c.customer_x / 10), 10 * (c.customer_y / 10),
COUNT(*) as num_orders,
SUM(o.order_total)
FROM t_customer c
JOIN t_order o
ON c.customer_id = o.customer_id
GROUP BY c.customer_x / 10, c.customer_y / 10
ORDER BY SUM(o.order_total) DESC;
which calculates the order totals from each grid square.
Your original query doesn't change much in the query below. The main difference is that t_customer is replaced by a subquery that adjusts the negative coordinates first:
SELECT 10 * (c.customer_x / 10) AS col1,
10 * (c.customer_y / 10) AS col2,
COUNT(*) AS num_orders,
SUM(o.order_total) AS order_total_sum
FROM
(
SELECT customer_id,
CASE WHEN customer_x < 0 THEN customer_x - 10 ELSE customer_x END AS customer_x,
CASE WHEN customer_y < 0 THEN customer_y - 10 ELSE customer_y END AS customer_y
FROM t_customer
) c
INNER JOIN t_order o
ON c.customer_id = o.customer_id
GROUP BY c.customer_x / 10,
c.customer_y / 10
ORDER BY SUM(o.order_total) DESC
Note that you could solve this without a subquery, but the subquery makes the query much more readable and lets us compute the adjusted customer_x and customer_y values neatly.
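Why the subtraction of 10 is needed: SQL integer division truncates toward zero, so customer_x = -5 and customer_x = 5 would both land in grid cell 0, merging the cells around the origin. Shifting negative values down by 10 before dividing gives them their own cell. A sketch in plain Python (int(x / 10) is used to mimic SQL truncation, since Python's // operator floors instead):

```python
# Mimic the SQL expression 10 * (customer_x / 10) with the CASE adjustment
# applied first. int(x / 10) truncates toward zero, like SQL integer division.
def cell(x):
    adjusted = x - 10 if x < 0 else x  # the CASE WHEN x < 0 THEN x - 10 branch
    return 10 * int(adjusted / 10)

# Without the adjustment, -5 would fall into cell 0 together with +5;
# with it, negative coordinates get their own cells.
print(cell(5), cell(-5), cell(15), cell(-15))  # 0 -10 10 -20
```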

Why can't columns defined in a SELECT be used in the same SELECT?

I have this select, but it does not work.
select
a.code1,
a.data1,
a.stval,
(select sum(col1+col2+col3) from tad ) as sum1,
(select sum(col7+col8+col9) from tbac) as sum2,
CASE
WHEN (sum1+sum2) > 100 THEN (a.stval * sum1)
WHEN (sum1+sum2) <= 100 THEN (a.stval * sum2)
END as newdat1
from arti as a
Where is the error? Why is (sum1+sum2) an error?
Thanks
(sum1 + sum2) is an error because these identifiers are not defined in the scope where you are trying to use them. In an SQL select list, you cannot use symbols declared in the same select list, irrespective of their position on the list. Use a subquery if you need to access sum1 and sum2.
The specific reason is that SQL is a descriptive language that does not guarantee the order in which expressions are evaluated; this holds for the select clause, the where clause, and the from clause alike. SQL describes what the results look like; it does not prescribe the specific actions taken to produce them. As a result, SQL does not allow identifiers defined in a select clause to be used elsewhere in the same select clause (nor in the where clause at the same level), because the expressions can be processed in any order.
The normal solution in your case is to use a subquery or a CTE. In your case, though, the subqueries are independent of the outer query (as written), so I would move them to the from clause:
select a.code1, a.data1, a.stval, x1.sum1, x2.sum2,
(CASE WHEN x1.sum1 + x2.sum2 > 100 THEN a.stval * x1.sum1
WHEN x1.sum1 + x2.sum2 <= 100 THEN a.stval * x2.sum2
END) as newdat1
from arti a cross join
(select sum(col1+col2+col3) as sum1 from tad ) x1 cross join
(select sum(col7+col8+col9) as sum2 from tbac) x2;
EDIT:
You can use a subquery or CTE. But there is an approach that builds on the above:
select a.code1, a.data1, a.stval, x1.sum1, x2.sum2,
(CASE WHEN x1.sum1 + x2.sum2 > 100 THEN a.stval * x1.sum1
WHEN x1.sum1 + x2.sum2 <= 100 THEN a.stval * x2.sum2
END) as newdat1
from arti a join
(select ascon, sum(col1+col2+col3) as sum1
from tad
group by ascon
) x1
on x1.ascon = a.code1 cross join
(select sum(col7+col8+col9) as sum2 from tbac) x2;
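A minimal end-to-end check of the cross-join rewrite, run against SQLite through Python's sqlite3 (the table contents are invented, and the second WHEN is simplified to ELSE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE arti (code1 TEXT, data1 TEXT, stval REAL);
    CREATE TABLE tad  (col1 REAL, col2 REAL, col3 REAL);
    CREATE TABLE tbac (col7 REAL, col8 REAL, col9 REAL);
    INSERT INTO arti VALUES ('A', 'x', 2.0);
    INSERT INTO tad  VALUES (10, 20, 30), (5, 5, 10);  -- sum1 = 80
    INSERT INTO tbac VALUES (1, 2, 3), (4, 5, 6);      -- sum2 = 21
    """
)

# sum1 and sum2 become columns of the derived tables x1 and x2, so they are
# in scope for the CASE expression, unlike aliases defined in the same SELECT.
row = conn.execute(
    """
    SELECT a.code1, x1.sum1, x2.sum2,
           CASE WHEN x1.sum1 + x2.sum2 > 100 THEN a.stval * x1.sum1
                ELSE a.stval * x2.sum2
           END AS newdat1
    FROM arti a
    CROSS JOIN (SELECT SUM(col1 + col2 + col3) AS sum1 FROM tad)  x1
    CROSS JOIN (SELECT SUM(col7 + col8 + col9) AS sum2 FROM tbac) x2
    """
).fetchone()

print(row)  # ('A', 80.0, 21.0, 160.0)
```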